Job details

Staff Site Reliability Engineer

Primer is seeking a Staff Site Reliability Engineer to join its Infrastructure team. The role is responsible for designing and maintaining fault-tolerant systems and for collaborating with other teams to ensure high reliability and performance.

Skills

Production systems engineering
Linux systems administration
Observability tools
Microservices architectures
Programming (Python, Go)

Responsibilities

Design and architect solutions for continuous availability and scalability in production.
Define and review Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Develop tools and frameworks to streamline monitoring and incident response.
Participate in on-call rotations and lead incident responses.
Develop and maintain monitoring, logging, and alerting systems.

Benefits

Full medical, dental, and vision coverage
Fertility benefits
Mental health coverage
Gym membership
401(k)
Remote work stipends

Average salary estimate

$205,000 / year (est.)

Min: $180,000
Max: $230,000

If an employer states a salary or salary range in a job posting, we display it as an "Employer Estimate". If a posting has no salary data, Rise displays its own estimate when one is available.

What You Should Know About Staff Site Reliability Engineer, Primer.ai

At Primer, we're on a mission to make the world a safer place through trusted decision-ready AI. As our Staff Site Reliability Engineer based in vibrant Washington, D.C., you will be an essential part of our Infrastructure team. Your role will focus on designing, building, and maintaining fault-tolerant systems that empower some of the world's most critical organizations. Collaborating closely with Product and Engineering teams, you will define and achieve service level objectives, enhance observability, and elevate our Engineering practices. Leveraging your deep expertise in observability, capacity planning, and automation, you'll play a pivotal role in sustaining our mission-critical operations while ensuring developers and customers enjoy a seamless experience. Your responsibilities will include architecting solutions for continuous availability, driving automation to streamline operations, managing incident responses, and developing best-in-class monitoring systems. Your technical skills will shine through as you implement best practices and work closely with cross-functional teams to deliver reliable solutions. If you're passionate about making an impact and thrive in a culture that values collaboration and innovation, we'd love to have you on board!

Frequently Asked Questions (FAQs) for Staff Site Reliability Engineer Role at Primer.ai

What are the main responsibilities of a Staff Site Reliability Engineer at Primer?

As a Staff Site Reliability Engineer at Primer, your primary responsibilities include designing and architecting solutions for continuous availability, defining Service Level Indicators (SLIs), and upholding reliability standards. You'll also develop automation tools and frameworks, manage incident responses, and ensure that our observability practices provide actionable insights into system health.