
Senior Software Reliability Engineer (Production Health) - open to remote across ANZ

Job Description

Join the team redefining how the world experiences design.

Hey, g'day, mabuhay, kia ora, 你好, hallo, vítejte!

Thanks for stopping by. We know job hunting can be a little time consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.

Where and how you can work

Our flagship campus is in Sydney. We also have a campus in Melbourne and co-working spaces in Brisbane, Perth and Adelaide. But you have a choice in where and how you work: we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.

What you’d be doing in this role

As Canva scales, change continues to be part of our DNA. But we like to think that's all part of the fun. So this will give you a flavour of the type of things you'll be working on when you start, but it will likely evolve.

At the moment, this role is focused on:

  • Designing and implementing processes, tools, automation, and libraries that service teams can use to improve the reliability of the services they own. For instance, adding a new long-awaited feature in our circuit breaker library.
  • Working with product engineering teams to ensure reliability best practices and tools are rolled out in every service across the whole organization. It’s not enough to create a new throttling library; we want to make sure it’s successfully used in every service.
  • Fostering a culture within the Engineering org that puts reliability first and establishes processes and policies that drive reliability within product engineering teams. This includes things like SLAs, error budgets, on-call response, incident resolution, and observability best practices.
  • Investigating production incidents in depth and applying the learnings to code.
  • Researching, developing, and justifying the best choices in the form of design docs for tools and processes that will shape the future of reliability at Canva.
  • Proposing new approaches and solutions to ensure we future-proof Canva’s distributed cloud infrastructure as we scale.
  • Participating in design meetings, hiring interviews, and code reviews.

You're probably a match if

  • You have advanced coding proficiency in Python, Java or GoLang and strong object-oriented programming fundamentals.
  • You have five-plus (5+) years of commercial experience developing complex, distributed web applications.
  • You have experience diagnosing and addressing issues across the “full stack”, including front-end code, backend, network/infrastructure and data layer.
  • You have a solid understanding of observability principles, such as metrics, logs, tracing, synthetic testing, query construction, dashboarding and alerting.
  • You have experience with guiding others in the principles of incident review, investigation and remedial activity.
  • You have disciplined coding practices, experience with code reviews and pull requests, and a creative and conceptual problem-solving approach.
  • You have strong communication and team collaboration skills, both written and verbal. As a reliability engineer, you will need to share knowledge and to communicate and coordinate changes across multiple service teams.

Nice to have; not required!

  • Our services and libraries are primarily written in Java 13, so experience in Java is a nice to have. Our platform and infrastructure tooling is primarily written in Python, Go and Terraform.
  • Experience working with microservice architectures in large containerised, distributed cloud environments (ideally AWS). We’re hosted on AWS and leverage the tools they provide as much as possible.
  • Experience working with data warehouse, analytics and reporting tools such as Snowflake, Mode Analytics and Looker.

About the Group

The Reliability Platform Group is responsible for providing the tools and processes to scale reliability across all Canva services. Our teams work together, and with other groups, to deliver preventive and detective tooling, processes and best practices that uplift Canva’s reliability. We do this by driving operational excellence, reducing the impact of incidents, and providing visibility and accountability across the broader Engineering community.

This role sits within the Production Health team, whose focus is on providing tools and guidance for Canva’s engineering teams to measure and maintain their systems’ reliability. Their key areas of practice include on-call management, service-level management, production readiness and operational review.

What's in it for you?

Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva, too. We also offer a range of benefits to set you up for every success in and outside of work.

Here's a taste of what's on offer:

  • Equity packages - we want our success to be yours too
  • Inclusive parental leave policy that supports all parents & carers
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
  • Flexible leave options that empower you to be a force for good, take time to recharge and support you personally

Check out lifeatcanva.com for more info.

Other stuff to know

We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.

We celebrate all types of skills and backgrounds at Canva, so even if you don’t feel like your skills quite match what’s listed above, we still want to hear from you!

Please note that interviews are conducted virtually.

Canva Glassdoor Company Review: 4.3 out of 5 stars
Canva DE&I Review: 4.7 out of 5 stars
CEO of Canva: Melanie Perkins

Average salary estimate

$120,000 / yearly (est.)
Min: $100,000
Max: $140,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Senior Software Reliability Engineer (Production Health) - open to remote across ANZ, Canva

Join Canva as a Senior Software Reliability Engineer (Production Health) and become part of our vibrant team, where creativity meets reliability. Based in Sydney, Australia, but open to remote work across ANZ, this role is a perfect fit for someone who loves making services as reliable as possible while working in a fun and innovative environment. In this dynamic position, you’ll design and implement processes, tools, and automated solutions that help service teams enhance the reliability of their offerings. Collaborating closely with product engineering teams, you'll ensure best practices are effectively rolled out organization-wide, making a tangible impact on the product landscape at Canva. Fostering a culture that prioritizes reliability, you’ll be involved in everything from incident investigations to crafting insightful design documents that pave the way for future reliability improvements. You'll also facilitate design meetings, participate in code reviews, and contribute to a team that values collaboration and knowledge sharing. If you are passionate about coding in Python, Java, or GoLang, possess over five years of experience in developing complex web applications, and have a solid grasp of observability principles, then we want you on our team. At Canva, we promote a friendly atmosphere that encourages personal growth, teamwork, and a balance between work-life and professional aspirations. Don’t miss this opportunity – apply now and help shape the future of reliability at Canva!

Frequently Asked Questions (FAQs) for Senior Software Reliability Engineer (Production Health) - open to remote across ANZ Role at Canva
What are the responsibilities of a Senior Software Reliability Engineer at Canva?

As a Senior Software Reliability Engineer at Canva, you will design and implement processes to enhance service reliability, work with product engineering teams to ensure best practices are adopted, and foster a culture around reliability within engineering teams. You will also investigate production incidents and apply learnings, research and develop tools to improve reliability, and propose solutions for future-proofing our infrastructure.

What qualifications are needed for the Senior Software Reliability Engineer position at Canva?

To be eligible for the Senior Software Reliability Engineer position at Canva, candidates should have advanced coding skills in Python, Java, or GoLang and strong Object-Oriented Programming fundamentals. Additionally, a minimum of five years of experience in developing complex distributed web applications and a solid understanding of observability principles are crucial for this role.

What tools and languages will I use as a Senior Software Reliability Engineer at Canva?

In your role as a Senior Software Reliability Engineer at Canva, you will primarily work with Python, Java, and GoLang. Experience with microservice architectures in containerized cloud environments, particularly on AWS, along with knowledge of tools like Terraform and data analytics platforms such as Snowflake, will also be beneficial.

What is the work culture like for a Senior Software Reliability Engineer at Canva?

The work culture at Canva is collaborative and innovative, emphasizing creativity and fun. As a Senior Software Reliability Engineer, you'll be encouraged to share knowledge, participate in team activities, and contribute to an inclusive environment that supports personal growth alongside professional development.

How does Canva support work-life balance for a Senior Software Reliability Engineer?

Canva values work-life balance for its employees, including Senior Software Reliability Engineers. With flexible leave options, inclusive parental leave policies, and a Vibe & Thrive allowance for personal well-being and connection, the company is committed to nurturing an environment where you can thrive both personally and professionally.

Common Interview Questions for Senior Software Reliability Engineer (Production Health) - open to remote across ANZ
Can you explain your experience with incident investigation as a Senior Software Reliability Engineer?

To effectively respond to this question, discuss a specific incident you've investigated in the past, outlining the steps you took to analyze the issue, the tools you utilized for diagnosis, and how you implemented fixes. Highlight your analytical thinking, attention to detail, and your collaborative approach in resolving incidents.

What best practices do you implement to ensure service reliability?

Share specific best practices you’ve employed in your previous roles, such as implementing SLAs, error budgets, on-call response protocols, and observability practices. Discuss how you measure success and ensure adoption of these practices within teams, emphasizing collaboration and proactive communication.

How do you approach designing tools or libraries for enhancing reliability?

Discuss your methodology for designing tools or libraries. Explain how you gather requirements from stakeholders, your process for brainstorming and validating design ideas, and the importance of documentation and user feedback in shaping the final product. Don't forget to mention the iterative process of development and refinement.

What strategies do you use for mentoring other engineers in reliability principles?

Describe your approach to mentoring, such as conducting training sessions, creating informative resources, or facilitating knowledge-sharing meetings. Highlight how you customize your mentoring techniques to accommodate varying experience levels and encourage a culture of continuous learning and improvement.

How do you prioritize tasks when managing multiple reliability projects?

Explain your strategy for task prioritization, mentioning techniques such as the Eisenhower Matrix or MoSCoW method. Discuss how you assess project impact, deadlines, team workload, and the importance of flexibility in shifting priorities as new information arises.

Can you give an example of a time you improved a service’s reliability?

Share a specific example where your intervention led to measurable improvements in service reliability. Describe the context, the actions you took, the challenges faced, and how you collaborated with other teams to achieve the desired outcome. Highlight the metrics that show the improvements made.

What tools do you prefer for monitoring and observability, and why?

Discuss specific tools you’ve used such as Grafana, Prometheus, Datadog, or New Relic. Explain your rationale for choosing specific tools based on their capabilities, ease of integration, and team familiarity, and how they contribute to proactive reliability management.

How do you ensure that the reliability best practices you establish are adopted across teams?

Share your strategies for advocating best practices, such as conducting informative workshops, developing clear documentation, and collaborating closely with other teams to understand their specific needs. Discuss the importance of fostering buy-in and maintaining open channels of communication for feedback.

What do you believe is the most challenging aspect of a Senior Software Reliability Engineer's role?

Articulate your thoughts on the challenges of the role, such as balancing speed and reliability, dealing with complex multi-service architectures, or ensuring cross-team collaboration. Discuss how you approach these challenges and turn them into opportunities for growth and improvement.

How do you keep up to date with the latest trends in software reliability engineering?

Describe your methods for staying current in the field, such as following industry blogs, attending conferences or webinars, participating in professional networks, or utilizing platforms like GitHub to explore new tools and techniques. Emphasize your commitment to continuous learning and adaptation.

Canva is revolutionizing the design process around the world. The company provides a user-friendly online platform that enables anyone to produce stunning, professional designs - granting them easy access to the realm of visual communication.

BADGES
Bipoc Led
Women Led
Changemaker
Future Maker
Innovator
Future Unicorn
Rapid Growth
CULTURE VALUES
Inclusive & Diverse
Diversity of Opinions
Passion for Exploration
Dare to be Different
Empathetic
Growth & Learning
BENEFITS & PERKS
Paid Holidays
Medical Insurance
Equity
401K Matching
Learning & Development
Social Gatherings
Flex-Friendly
Maternity Leave
Paternity Leave
Sabbatical
EMPLOYMENT TYPE
Full-time, remote
DATE POSTED
March 27, 2025
