
Manager, Site Reliability Engineering

Company Description

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today — ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.

Job Description

Position Overview:

We are seeking a highly skilled Technical SRE Manager to lead our Site Reliability Engineering (SRE) team. This role is pivotal in ensuring the scalability, availability, and reliability of our critical systems while driving automation, observability, and operational excellence. You will build and lead a team of NextGenOps and SREs, collaborate with engineering and operations teams, and implement AI/ML-driven strategies to enhance predictive analytics, proactive issue resolution, and self-healing systems.

This is a flexible work schedule position based in Costa Rica.

The NextGenOps team is a forward-thinking, AI-powered Site Reliability Engineering (SRE) group at the forefront of revolutionizing how we approach operations and infrastructure. Our team is dedicated to building resilient, scalable, and self-healing systems using cutting-edge AI/ML-driven technologies. We are not just an operations team; we are engineers on a mission to push the boundaries of automation, observability, and operational excellence. By combining AI with our deep expertise in cloud-native platforms, DevOps, and SRE best practices, we are shaping the future of how technology scales, evolves, and self-heals. If you're passionate about innovation and making a real impact through intelligent, data-driven solutions, you'll thrive in our dynamic, collaborative, and engineering-centric culture.

 

Qualifications

Key Responsibilities:

  • Lead and mentor a team of AI/ML-powered SREs, fostering a culture of automation, observability, and proactive issue resolution.
  • Define and execute AI/ML-driven SRE strategies for incident prediction, anomaly detection, and root cause analysis.
  • Champion AI-powered observability practices and advocate for self-healing architectures with machine learning automation.
  • Develop and enforce SLOs, SLIs, and SLAs using AI-driven insights.
  • Oversee AI-powered incident management, real-time anomaly detection, and auto-remediation.
  • Drive AI-driven automation for issue resolution, anomaly detection, and system fine-tuning.
  • Implement predictive maintenance and auto-remediation through machine learning models.
  • Ensure reliable deployments with AI-assisted rollouts, blue-green deployments, and canary releases.
  • Optimize costs through AI-powered resource allocation and workload balancing.
  • Ensure security and compliance with AI-driven event detection and threat mitigation.
  • Implement chaos engineering with AI-driven failure analysis to strengthen system resilience.
  • Collaborate with security teams to enforce AI-assisted threat detection and automated compliance monitoring.
  • Lead capacity planning and performance optimization using AI/ML for dynamic scaling and resource forecasting.
  • Implement intelligent monitoring, logging, and alerting with AI-powered tools like Prometheus and Grafana.
  • Optimize CI/CD pipelines with AI-driven risk assessments and automated rollbacks.

To be successful in this role you have:

  • Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI’s potential impact on the function or industry.
  • 10+ years in SRE, DevOps, or infrastructure engineering, including 3+ years in a leadership role.
  • Proven experience integrating AI/ML for observability, automation, and incident response.
  • In-depth understanding of monitoring tools (LogicMonitor, Catchpoint, Redgate, ScienceLogic).
  • Demonstrated expertise in implementing and optimizing OpenTelemetry (OTel) for comprehensive observability across endpoints, cloud environments, infrastructure, and SaaS applications, enabling proactive monitoring, tracing, and performance insights.
  • Proficiency in scripting languages (Python, Go, Bash) and infrastructure tools (Terraform, Ansible) with AI/ML integration.
  • In-depth knowledge of observability and data pipeline tools (Datadog, Prometheus, Splunk, AI-driven platforms like Cisco FSO).
  • Extensive experience in incident management and on-call rotations, with AI-enhanced predictive approaches.
  • Experience with CI/CD pipelines, GitOps, and infrastructure-as-code (IaC).

Preferred Qualifications:

  • Experience with data platforms or enterprise automation tools (e.g., ServiceNow, Salesforce, SAP).
  • Knowledge of AI/ML-based data automation technologies.
  • Familiarity with regulatory requirements for data privacy, such as GDPR and CCPA.
  • A passion for leveraging emerging technologies to drive business transformation.
  • A customer-first mentality with an ability to translate user feedback into actionable product features.
  • Experience in leading cross-functional teams in a matrixed organization.
  • Strong communication and leadership skills, with the ability to engage and influence stakeholders across technical and non-technical teams.
  • Ability to thrive in a rapidly evolving industry and adapt to new challenges and opportunities.


Not sure if you meet every qualification? We still encourage you to apply! We value inclusivity, welcoming candidates from diverse backgrounds, including non-traditional paths. Unique experiences enrich our team, and the willingness to dream big makes you an exceptional candidate!

Additional Information

Work Personas

We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work.

Equal Opportunity Employer

ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements. 

Accommodations

We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact [email protected] for assistance. 

Export Control Regulations

For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities. 

From Fortune. ©2024 Fortune Media IP Limited. All rights reserved. Used under license. 

ServiceNow Glassdoor Company Review: 4.5 stars
ServiceNow DE&I Review: 4.6 stars
CEO of ServiceNow: Bill McDermott

Average salary estimate

$135,000 / year (est.)
min: $120,000
max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Manager, Site Reliability Engineering, ServiceNow

At ServiceNow, we're thrilled to announce an exciting opportunity for a Manager, Site Reliability Engineering at our vibrant office in the America Free Zone, Costa Rica! We're not just looking for any candidate; we need a forward-thinking Technical SRE Manager to spearhead our Site Reliability Engineering team. In this pivotal role, you'll ensure that our critical systems are scalable, available, and reliable. You'll get to harness the power of AI and work alongside talented NextGenOps and SREs who are reshaping the future of automation, observability, and operational excellence. Your mission? Lead a dedicated team in developing AI-driven strategies for incident management, anomaly detection, and self-healing systems. With an emphasis on collaboration, you'll work closely with engineering and operations teams to foster a culture that values proactive problem-solving and continuous improvement. You’ll implement cutting-edge machine learning models to optimize performance, conduct capacity planning, and enhance security measures. The ideal candidate will have over 10 years of experience in SRE, DevOps, or infrastructure engineering with a knack for integrating AI into workflows. If you have a passion for innovation and a drive to make a significant impact, this role is perfect for you. Join us on our journey to transform how we operate and deliver exceptional technology solutions for our clients across the globe!

Frequently Asked Questions (FAQs) for Manager, Site Reliability Engineering Role at ServiceNow
What are the key responsibilities of a Manager, Site Reliability Engineering at ServiceNow?

As the Manager, Site Reliability Engineering at ServiceNow, your key responsibilities include leading and mentoring an AI/ML-powered SRE team, defining AI-driven SRE strategies for incident prediction and recovery, and implementing self-healing architectures. You'll also develop SLOs and oversee incident management while optimizing deployments and resources through automation.

What qualifications do you need to apply for the Manager, Site Reliability Engineering role at ServiceNow?

To apply for the Manager, Site Reliability Engineering position at ServiceNow, you should possess over 10 years of experience in SRE, DevOps, or infrastructure engineering, with at least 3 years in a leadership role. A strong understanding of AI/ML integration, observability tools, and scripting languages is also essential for success in this position.

How does the Manager, Site Reliability Engineering role contribute to operational excellence at ServiceNow?

The Manager, Site Reliability Engineering role is crucial in driving operational excellence at ServiceNow. By leading a team focused on automation and observability, you’ll implement AI-driven strategies that enhance the reliability and performance of our systems, ensuring seamless service delivery to our clients.

What is the work culture like for a Manager, Site Reliability Engineering at ServiceNow?

At ServiceNow, the work culture for the Manager, Site Reliability Engineering is dynamic, collaborative, and engineering-centric. You'll be part of a forward-thinking team dedicated to pushing boundaries with AI technologies, fostering creativity, and promoting a culture of continuous learning and impact.

What kind of impact can a Manager, Site Reliability Engineering make at ServiceNow?

As a Manager, Site Reliability Engineering at ServiceNow, you will have a profound impact by pioneering AI applications to enhance the reliability of our systems. Your leadership will facilitate innovative solutions that optimize operations, enabling the company to provide better services to its global customer base.

Common Interview Questions for Manager, Site Reliability Engineering
How do you approach managing a Site Reliability Engineering team?

When managing a Site Reliability Engineering team, I focus on fostering a culture of collaboration and continuous improvement. By setting clear expectations and encouraging innovation, I empower my team to take ownership of their projects while ensuring they have the resources and support they need to excel.

Can you describe your experience with AI and ML as it relates to site reliability?

My experience with AI and ML in site reliability involves integrating these technologies to enhance monitoring and automation processes. I've successfully implemented AI-driven incident prediction systems that have improved our response times significantly and reduced downtime across our services.

What strategies did you use for incident management in your previous roles?

In my previous roles, I employed a mix of proactive and reactive strategies for incident management, including the implementation of SLOs and SLIs to establish performance metrics. Additionally, I utilized AI for predictive analytics to identify potential issues before they escalate, ensuring minimal disruption.

How do you ensure a balance between automation and manual processes in SRE?

To ensure a balance between automation and manual processes in SRE, I evaluate repetitive tasks that can be automated while maintaining a human touch for critical decision-making scenarios. This strategy maximizes efficiency without compromising quality and responsiveness.

Describe your experience with monitoring and observability tools.

I have extensive experience using monitoring and observability tools like Prometheus and Grafana. I’ve implemented these tools to capture key metrics and logs, providing real-time visibility into system performance and enabling proactive issue detection and resolution.

What do you consider the most challenging aspect of being a Manager, Site Reliability Engineering?

The most challenging aspect of being a Manager, Site Reliability Engineering involves balancing the urgent demands of incident response with long-term strategic initiatives. Prioritizing tasks effectively and ensuring team well-being during crisis situations is crucial.

How do you implement SLOs and SLIs in your teams?

I implement SLOs and SLIs by collaborating with cross-functional teams to define accurate metrics that align with business objectives. Regular reviews and adjustments based on performance data ensure these indicators reflect the actual user experience and system reliability.

What techniques do you use for capacity planning?

For capacity planning, I utilize historical data analysis paired with predictive modeling using AI/ML algorithms. This helps forecast system demands, ensuring that resources are allocated efficiently and effectively in anticipation of user growth.

How have you handled a significant outage in your previous roles?

During a significant outage in my previous role, I quickly assembled the team to initiate our incident response protocol. We communicated transparently with stakeholders while working through a root cause analysis to implement preventive measures, ultimately improving our system resilience.

What role does feedback play in your management style?

Feedback is essential in my management style. I regularly seek input from my team and stakeholders to enhance processes and drive improvements. Constructive feedback fosters an environment of trust and collaboration, leading to a more engaged and motivated team.


We're on a mission to become the defining enterprise software company of the 21st century.

CULTURE VALUES
Inclusive & Diverse
Mission Driven
Rise from Within
Diversity of Opinions
Work/Life Harmony
Empathetic
Feedback Forward
Take Risks
Collaboration over Competition
BENEFITS & PERKS
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
Conferences Stipend
Paid Time-Off
Maternity Leave
Equity
EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
April 5, 2025
