Job details

Staff Site Reliability Engineer

Location: Onsite, Sunnyvale, California (5 days a week in the office)

Onwards Together!

Illumio, the pioneer and market leader of Zero Trust Segmentation, prevents breaches from becoming cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk. Illuminate the future with Illumio and join a team that’s passionate about developing cutting-edge security solutions that protect the world’s most critical assets.

Our Team's Vision:

Our Engineering team is driven by a culture that thrives on visionary leadership, autonomy, and ownership, creating a dynamic synergy that drives us forward in the ever-evolving landscape of cybersecurity. 

When you join our team, you become part of the leader in Zero Trust Segmentation. You'll work with a cutting-edge technology stack that spans operating systems, distributed applications, and immersive UI/visualization tools.  

We're shaping the future of cybersecurity. And together, we will continue to build world-class products—led by people with different perspectives, backgrounds, and a commitment to innovation in a time when the world faces its greatest cybersecurity threats in history. 

Your Impact: 

We are seeking a skilled and proactive Product SRE (Site Reliability Engineer) to join our team and take ownership of debugging, troubleshooting, and resolving production escalations in a complex SaaS environment. The ideal candidate will have a deep understanding of AWS and Azure cloud platforms, application performance, and operational excellence, with a passion for automation and continuous improvement.

  1. Production Support:

    • Investigate and resolve production incidents and escalations to ensure minimal downtime and impact to customers.

    • Work closely with engineering and support teams to troubleshoot application and infrastructure issues.

  2. Performance Monitoring and Optimization:

    • Proactively monitor application health, performance, and reliability using modern observability tools.

    • Analyze trends in system behavior and suggest performance improvements.

  3. Automation and Tooling:

    • Develop and maintain automation scripts and tools to improve operational efficiency and incident resolution.

    • Create and enhance runbooks to streamline troubleshooting and reduce mean time to resolution (MTTR).

  4. Root Cause Analysis (RCA):

    • Conduct thorough post-incident reviews to identify root causes and implement preventive measures.

    • Drive a culture of continuous improvement by documenting lessons learned and improving system designs.

  5. Cross-Functional Collaboration:

    • Partner with software engineers, QA, and product teams to improve application stability and user experience.

    • Act as a bridge between development and operations, ensuring smooth and reliable service delivery.

Your Toolkit:

  • Bachelor's degree in Computer Science, Engineering, or related field; or equivalent work experience.

  • 8+ years of relevant SRE experience.

  • Cloud Expertise:
    • Strong hands-on experience with AWS and Azure.
    • Familiarity with Kubernetes and containerized environments.
    • Knowledge of networking concepts, such as DNS, load balancing, and firewalls.
  • Troubleshooting Skills:
    • Proficient in diagnosing and resolving complex issues in SaaS environments, including performance bottlenecks and application errors.
  • Programming and Scripting:
    • Proficiency in at least one programming language (e.g., Python, Go, Java) and scripting languages (e.g., Bash, PowerShell).
  • Monitoring and Observability:
    • Experience with tools like Datadog, New Relic, Prometheus, Grafana, ELK, or Azure Monitor.
  • Automation and Configuration Management:
    • Familiarity with tools like Ansible, Terraform, or CloudFormation.
  • Database Experience:
    • Knowledge of debugging and optimizing relational databases (e.g., PostgreSQL, MySQL) and caching systems (e.g., Redis, Memcached).
  • Incident Management:
    • Experience with incident management tools and processes, including conducting RCAs and improving on-call processes.

Compensation:

$192,000 - $230,000 USD

The pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include the responsibilities of the job, education, location, experience, knowledge, skills, abilities, internal equity, alignment with market data, and applicable laws.

At Illumio we offer a wide range of benefits to our eligible team members. Our benefit programs vary by location and can include: Medical, Dental, and Vision Coverage; Health and Dependent Savings Accounts; Life and Disability Programs; Paid Parental Leave; Voluntary Benefit Programs; Company Sponsored Wellness Program; Wellness Reimbursement Program; Retirement Savings; Equity Opportunities; Paid Time Off and Paid Holidays; Employee Incentive Program.

Our Commitment: 

Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.   

All official job offers from our company are extended directly by our recruitment team and will be sent through an official DocuSign document for your review and signature. Please be aware that we do not ask for any personal information in the process of extending offers of employment, such as financial details or social security numbers. Upon acceptance of any offer, we will request such information as part of the onboarding process prior to or on your first day of employment, and only after completing a background check through an authorized third-party vendor. If you receive any communication asking for personal details outside of these processes, please contact us immediately to verify the authenticity of the request. Your security is important to us, and we are committed to a safe and transparent hiring experience. 



What You Should Know About Staff Site Reliability Engineer, Illumio

Join Illumio as a Staff Site Reliability Engineer in Sunnyvale, California, and be part of a team that's leading the charge in Zero Trust segmentation! At Illumio, we pride ourselves on being at the forefront of cybersecurity, specializing in solutions that prevent breaches from escalating into full-blown disasters. As a Staff Site Reliability Engineer, you will play a pivotal role in ensuring the stability and reliability of our services. Imagine being entrusted with debugging and troubleshooting complex production incidents in a dynamic SaaS environment—where your expertise in AWS and Azure will shine, and your passion for automation will be invaluable! You'll work closely with a dedicated team, proactively monitoring application performance and optimizing operational efficiencies. Root cause analysis and continuous improvement will be your specialties as you collaborate with software engineers and product teams to elevate our user experience. Plus, you'll get to work with cutting-edge technologies in a culture that encourages visionary leadership and fosters an environment of autonomy and ownership. If you're ready to make an impact and innovate in the cybersecurity landscape, Illumio is the place for you! We look forward to welcoming you to our passionate team as we protect the world's most critical assets together.

Frequently Asked Questions (FAQs) for Staff Site Reliability Engineer Role at Illumio
What are the key responsibilities of a Staff Site Reliability Engineer at Illumio?

A Staff Site Reliability Engineer at Illumio is responsible for resolving production incidents, optimizing application performance, and automating operational processes. This role involves investigating production escalations, implementing root cause analysis, and collaborating with teams to improve service stability and user experience.

What qualifications are needed for the Staff Site Reliability Engineer role at Illumio?

To be considered for the Staff Site Reliability Engineer position at Illumio, candidates should have a Bachelor's degree in Computer Science or a related field, along with 8+ years of relevant SRE experience. Expertise in AWS and Azure, knowledge of Kubernetes, and strong programming skills in languages like Python, Go, or Java are essential.

How does Illumio support employee development for the Staff Site Reliability Engineer position?

Illumio is dedicated to fostering a culture of continuous improvement and innovation. Staff Site Reliability Engineers have access to development programs, training sessions, and mentoring opportunities, ensuring they remain at the forefront of industry advancements and can elevate their skills in the cybersecurity domain.

What tools and technologies do Staff Site Reliability Engineers use at Illumio?

Staff Site Reliability Engineers at Illumio utilize a broad toolkit that includes modern observability tools like Datadog, New Relic, and Grafana. They also have experience in automation tools like Ansible, Terraform, and various programming languages to streamline operations and enhance efficiencies.

What is the work culture like for Staff Site Reliability Engineers at Illumio?

The work culture at Illumio is driven by autonomy, visionary leadership, and a collaborative spirit. Staff Site Reliability Engineers are encouraged to own their projects, innovate solutions, and contribute to a diverse and inclusive environment where every team member's contributions are valued.

Common Interview Questions for Staff Site Reliability Engineer
Can you explain your experience with AWS and Azure as a Staff Site Reliability Engineer?

When discussing your experience with AWS and Azure, focus on specific projects where you've implemented services, managed deployments, or resolved issues. Highlight your expertise in cloud architecture and how you used these platforms to enhance application performance and reliability.

What steps do you take for root cause analysis after an incident occurs?

In your response, outline your systematic approach to root cause analysis, which might include collecting logs, analyzing performance metrics, engaging with team members for insights, and documenting findings. Emphasize the importance of learning from incidents to prevent future occurrences.

Describe a time you automated a critical process in your previous role.

Share a specific example where you identified a manual process that was prone to errors or inefficiencies, and describe the automation you implemented. Discuss the tools you used and the measurable outcomes, such as reduced incident response time or enhanced system stability.

How do you stay updated on the latest trends in site reliability and DevOps?

Discuss your commitment to continuous learning by attending industry conferences, participating in relevant online courses, or following key thought leaders in site reliability and DevOps on platforms like LinkedIn and Twitter. Share examples of how you've applied new knowledge to your work.

What performance monitoring tools are you familiar with, and how have you used them?

Mention specific monitoring tools you've used, such as Datadog or Prometheus, and provide examples of how you've implemented them to enhance system observability, identify bottlenecks, and improve application performance over time.

How do you approach incident management during high-pressure situations?

Describe your methodical approach to incident management, emphasizing the importance of remaining calm, communicating effectively with stakeholders, and utilizing your troubleshooting skills to resolve issues swiftly while minimizing downtime for users.

What is your experience with configuration management tools?

Detail your experience with tools like Ansible or Terraform, explaining how you've used them to automate infrastructure provisioning, enforce consistency across environments, or streamline deployment processes in your previous roles.

How do you ensure collaboration between development and operations teams?

Discuss strategies you've implemented to promote collaboration, such as conducting regular sync-up meetings, involving operational insights in the development process, and using collaborative tools like Slack or Jira to maintain clear communication and shared goals.

What methods do you use for performance optimization in cloud environments?

Explain your approach to performance optimization, including regularly analyzing application metrics, conducting load testing, and utilizing caching solutions. Mention specific situations where your optimizations led to measurable improvements.

Can you discuss your experience with databases and performance tuning?

Share your experience working with relational databases, focusing on specific tuning methods you've applied, such as query optimization or indexing strategies. Provide an example of a situation where your improvements significantly enhanced database performance.


Illumio is an American business data center and cloud computing security company founded in 2013 by Andrew Rubin and P. J. Kirner. It was ranked #25 on the Forbes Cloud 100 list in 2019.

Employment type: Full-time, on-site
Date posted: April 8, 2025
