Job details

Site Reliability Engineer (SRE)

Job Description

Our company specializes in the development of animal health management solutions. We are a multidisciplinary product company: a diverse team of ~450 closely collaborating scientists, AI experts, and software, hardware, and mechanical engineers, working alongside veterinarians and other animal experts. Our passion? Shaping the future of animal health and well-being (for the better!).

 

Our products and platforms identify trends and predict the likelihood of health outcomes for HUNDREDS of MILLIONS of animals each year, from pets, to poultry, farm animals, and even fish. We provide actionable insights for veterinarians, farmers, and producers, changing the way people care for animals in 150 markets.

 

So, if you’re looking to work in a company that combines pioneering science and technology, dedicated colleagues, and animals, you’ll find it all here – come join us!

 

We are looking for an exceptional Senior Site Reliability Engineer (SRE) to help establish and lead the technical practices of SRE within our CloudOps team. This is a hands-on role for an experienced professional who can implement SRE principles, build frameworks and tools to ensure system reliability, and mentor others in adopting these practices.

 

If you are passionate about operational excellence, love solving complex technical challenges, and thrive in highly collaborative environments, this is the role for you.

 

What You’ll Do:

Define and Build the SRE Function

·      Help define and implement SRE principles and practices.

·      Partner with development and DevOps teams to create Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical services.

·      Advocate for and implement system architectures that prioritize reliability, scalability, and fault tolerance.

Develop Automation and Resilience

·      Build automation tools to reduce toil, streamline operations, and improve reliability using Infrastructure as Code (IaC) tools like Terraform and Crossplane.

·      Implement self-healing systems, automate incident detection and response, and integrate chaos engineering practices to test system resilience.

Drive Observability and Monitoring Excellence

·      Create and maintain advanced observability systems with tools like Datadog, Prometheus, and Grafana to ensure uptime and system health.

·      Develop efficient alerting and monitoring strategies, including synthetic tests and automated anomaly detection.

·      Strong, proven experience with AWS services and with IaC using Terraform.

·      Analyze system logs and telemetry data to detect patterns, identify issues, and optimize system performance.

Incident Response and Problem Solving

·      Take ownership of incident response processes, ensuring swift recovery of services and conducting thorough Root Cause Analysis (RCA) for long-term improvements.

·      Document incident learnings and collaborate with teams to enhance on-call processes and system documentation.

Contribute to Continuous Improvement

·      Improve deployment pipelines (CI/CD) using tools like GitHub Actions, Azure DevOps, or Argo CD, ensuring smooth and reliable releases.

·      Continuously evaluate and refine operational processes to reduce manual effort and increase efficiency.

Requirements:

Technical Expert

·      5+ years of hands-on experience in Site Reliability Engineering.

·      Proven expertise in AWS services, with experience working with distributed, event-driven architectures and microservices.

·      Experience with GitOps workflows and tools.

·      Advanced skills in automation tools like Terraform and proficiency in scripting or programming languages (e.g., Python, Go, Bash).

Problem Solver and Collaborator

·      Exceptional problem-solving skills and a proactive approach to identifying and addressing technical challenges.

·      Effective communicator and collaborator with the ability to work across teams to deliver operational excellence.

·      Strong analytical skills, especially in troubleshooting and optimizing complex systems.

Preferred

·      Familiarity with chaos engineering tools like Gremlin or LitmusChaos.

 

MDAHTL

Current Employees apply HERE

Current Contingent Workers apply HERE

Search Firm Representatives Please Read Carefully 
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company.  No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails. 

Employee Status:

Regular

Relocation:

VISA Sponsorship:

Travel Requirements:

Flexible Work Arrangements:

Hybrid

Shift:

Valid Driving License:

Hazardous Material(s):


Required Skills:

Artificial Intelligence (AI), Automation, Automation Solutions, Availability Management, Capacity Management, Change Controls, Design Applications, High Performance Computing (HPC), Incident Management, Information Management, Information Technology (IT) Infrastructure, Infrastructure As Code (IaC), IT Service Management (ITSM), Microsoft Azure DevOps, Operational Excellence, Release Management, Reliability Engineering, SLA Management, Software Development, Software Development Life Cycle (SDLC), Solution Architecture, System Administration, System Designs, Systems Architecture {+ 4 more}


Preferred Skills:

Job Posting End Date:

05/25/2025

*A job posting is effective until 11:59:59PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.

Our purpose: We use the power of leading-edge science to save and improve lives around the world

EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
May 18, 2025
