
Sr Site Reliability Engineer (AI Runtime Security)

Company Description

Our Mission

At Palo Alto Networks® everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.
Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers, and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day, from disruptive innovation and collaboration to execution, and from showing up for each other with integrity to creating an environment where we all feel included.

As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and we respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. This includes our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities - just to name a few!

At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.

Job Description

Your Career

Palo Alto Networks has been rapidly moving towards the future where cloud-based applications are increasingly common. As a Site Reliability Engineer, you will develop the frameworks and pathways to help move our internal applications to microservices. You will be a critical link between engineering and the Infrastructure Platform, building Infrastructure as Code and working in partnership with the App developers to deploy the applications in GCP, AWS and data centers across the globe.

As a member of the SRE team, you will work on producing mission-critical platforms, tools, and processes that will ensure the highest levels of availability and reliability of all our applications. We need creative and innovative problem solvers who can partner with our application development teams to make their services more usable. Our SRE team has a standout opportunity to build tools, frameworks, and cloud platforms that will support our company’s growth over the next decade. If you are a self-starter who jumps on new ideas to make the platform more stable, secure, and feature-rich, this is your new career.

Your Impact

  • Write automation code for provisioning and operating infrastructure at massive scale
  • Design, build and operate Cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations
  • Work with development teams to make sure the applications are production ready, scalable, and reliable from the ground up
  • Identify and drive opportunities to improve automation for code deployment, management, and visibility of application services
  • Develop tools and frameworks to automate operational tasks and the deployment of machines, services, and applications
  • Establish end-to-end monitoring and alerting on all critical components of the application
  • Participate in the on-call rotation supporting the platform and/or the production application
  • Direct root cause analysis of critical business and production issues
  • Develop and mentor other SREs on standard methodologies, from infrastructure orchestration to troubleshooting application services in production
  • Represent SRE in design reviews and work cross-functionally with Engineering teams on operational readiness

Qualifications

Your Experience 

  • BS/MS in Computer Science or Computer Engineering or equivalent military experience required
  • Expertise in configuration management with frameworks such as Terraform, Ansible, and Helm
  • Strong Linux administration, internals, and network troubleshooting
  • Experience in DevOps, Site Reliability, or infrastructure engineering 
  • Expertise in Google Cloud Platform (GCP) and its related services
  • Proficiency with a programming language like Python and shell scripting to automate tasks
  • Strong experience with CI/CD pipelines, GitHub, Jenkins, and Artifactory
  • Ability to diagnose and troubleshoot complex distributed systems handling high volume transactions
  • Strong fundamentals in HTTP including HTTP headers and web servers 
  • Excellent problem solving, critical thinking, communication, and teamwork skills
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Passion for automation and monitoring instrumentation as code
  • Excellent interpersonal skills and the ability to work well in a team
  • Passion for learning, understanding, and dissecting new technology stacks quickly and independently
  • Experience building and managing large relational database clusters (MySQL, Percona, etc.) is a plus

Additional Information

The Team

We are on a mission to build the industry's best Security large language model.

Our engineering team is at the core of our products – connected directly to the mission of preventing cyberattacks. We are constantly innovating – challenging the way we, and the industry, think about cybersecurity. Our engineers don’t shy away from building products to solve problems no one has pursued before.

We define the industry, instead of waiting for directions. We need individuals who feel comfortable in ambiguity, excited by the prospect of a challenge, and empowered by the unknown risks facing our everyday lives that are only enabled by a secure digital environment.

Compensation Disclosure 

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $126,000 and $203,500 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We’re problem solvers that take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating, together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at [email protected].

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.

Is this role eligible for immigration sponsorship? Yes

Average salary estimate

$164,750 per year (est.); min $126,000, max $203,500


What You Should Know About Sr Site Reliability Engineer (AI Runtime Security), Palo Alto Networks

At Palo Alto Networks, we’re on a mission to revolutionize cybersecurity, and we're looking for a talented Sr Site Reliability Engineer (AI Runtime Security) to join our innovative team in sunny Santa Clara, CA. In this role, you'll be at the forefront of developing frameworks and pathways that transition our internal applications to a microservices architecture. Imagine collaborating with visionary engineers to build Infrastructure as Code, deploying applications in GCP and AWS, while ensuring they are stable, secure, and feature-rich! We thrive on creativity and collaboration, so you'll have the opportunity to work closely with App developers to enhance service usability. Here, we value your unique contributions, offering you the chance to tackle challenging problems while ensuring the high availability of our mission-critical platforms. Think of it as your chance to be a crucial link in our efforts to safeguard digital experiences across the globe. If you're a self-starter who loves finding innovative solutions, this could be the perfect opportunity for you! Join us in shaping the future of cybersecurity!

Frequently Asked Questions (FAQs) for Sr Site Reliability Engineer (AI Runtime Security) Role at Palo Alto Networks
What responsibilities does a Sr Site Reliability Engineer (AI Runtime Security) have at Palo Alto Networks?

As a Sr Site Reliability Engineer (AI Runtime Security) at Palo Alto Networks, you will be responsible for writing automation code to provision and operate large-scale infrastructure, designing and operating cloud frameworks to support rapid microservices deployment, and ensuring that applications are production-ready and reliable. You're expected to collaborate with development teams, establish end-to-end monitoring, participate in on-call support, and drive root cause analysis for production issues.

What qualifications are required to become a Sr Site Reliability Engineer (AI Runtime Security) at Palo Alto Networks?

To qualify for the Sr Site Reliability Engineer (AI Runtime Security) position at Palo Alto Networks, you will need a BS/MS in Computer Science or Computer Engineering, or equivalent military experience. Essential skills include expertise in configuration management frameworks like Terraform, strong Linux administration skills, experience in DevOps and Site Reliability Engineering, and proficiency in Google Cloud Platform services. Familiarity with programming in Python and shell scripts, as well as knowledge of CI/CD pipelines, will also be beneficial.

How does the Sr Site Reliability Engineer (AI Runtime Security) contribute to overall security?

The Sr Site Reliability Engineer (AI Runtime Security) contributes to overall security at Palo Alto Networks by building resilient infrastructures that ensure applications are secure and usable. Your work in developing automation frameworks helps to increase security by minimizing human error, while monitoring systems allows for the rapid identification and resolution of potential security breaches. Additionally, working across teams ensures that security is integrated at every level of application deployment.

What tools and technologies does a Sr Site Reliability Engineer (AI Runtime Security) work with?

In the role of Sr Site Reliability Engineer (AI Runtime Security), you'll work with an array of sophisticated tools and technologies including configuration management tools like Ansible and Helm, cloud services from Google Cloud Platform (GCP) and AWS, and automation tools such as Jenkins and GitHub. You'll also engage with programming languages like Python for scripting and automation, contributing to the deployment and operational tasks in a high-transaction environment.

What skills make a successful Sr Site Reliability Engineer (AI Runtime Security) at Palo Alto Networks?

A successful Sr Site Reliability Engineer (AI Runtime Security) at Palo Alto Networks combines strong technical skills with creative problem-solving abilities. Essential skills include a deep understanding of distributed systems, proficiency in automation and cloud technologies, and strong troubleshooting capabilities. Additionally, having excellent communication and teamwork skills is vital as collaboration with different teams is a core part of the role.

Common Interview Questions for Sr Site Reliability Engineer (AI Runtime Security)
Can you describe your experience with Infrastructure as Code?

In my previous roles, I have utilized Infrastructure as Code (IaC) through tools like Terraform and Ansible to automate the provisioning of resources and manage configuration consistently. I ensure that the infrastructure is reproducible and version-controlled, which enhances the reliability and speed of deployments.

How do you handle incidents in a production environment?

When dealing with incidents in a production environment, my first step is to quickly assess the situation, identify affected services, and communicate with relevant teams. I employ monitoring tools to diagnose the issue and work to restore services swiftly. After resolution, I conduct a detailed postmortem to identify root causes and implement preventive measures.

What strategies do you use for monitoring distributed systems?

For monitoring distributed systems, I advocate for establishing comprehensive observability frameworks. This includes using proactive monitoring tools that provide metrics, logs, and traces. I configure alerts for critical issues and ensure that dashboards are in place for real-time visibility, helping teams to address issues before they affect users.

How do you collaborate with development teams?

Collaboration with development teams is essential for ensuring applications are reliable and scalable. I involve them early in discussions about infrastructure needs, offer guidance on best practices for production readiness, and maintain open lines of communication throughout the deployment process to resolve any concerns that arise.

Can you explain a time when you improved a CI/CD pipeline?

In a previous position, I noticed our CI/CD pipeline involved repetitive tasks which often caused delays. I introduced automation scripts that streamlined deployment processes by integrating testing and deployment using Jenkins. This change significantly reduced the deployment time and increased confidence in code releases.

What are your thoughts on cloud security?

Cloud security is paramount in today’s digital landscape. I believe in implementing security best practices such as configuring security groups, regularly auditing permissions, and utilizing logging and monitoring tools to maintain visibility and response capabilities. Continuous learning and adapting to new threats is also essential for robust cloud security.

Tell me about your experience with GCP and AWS.

I have extensive experience in managing and deploying applications on both GCP and AWS. I leverage services such as Compute Engine and Google Kubernetes Engine for GCP, while utilizing EC2 and RDS for AWS. My focus has been on optimizing service configurations to ensure high availability and cost-effectiveness.

What challenges have you faced in site reliability engineering?

One of the biggest challenges I faced was managing the performance of services during sudden traffic spikes. I addressed this by implementing auto-scaling groups and optimizing resource allocation based on real-time metrics, which significantly mitigated service disruptions and improved user experience.

How do you ensure your operational processes are documented?

I believe in maintaining thorough documentation of all operational processes and procedures, including on-call responsibilities, incident response protocols, and service-specific configurations. Documentation is reviewed regularly and shared with the team, ensuring everyone is aligned and can access information when needed.

Describe a time you mentored a junior engineer.

I had the opportunity to mentor a junior engineer, guiding them through an infrastructure setup project. We collaborated on designing the architecture, and I provided them with resources on best practices. This experience not only enhanced their skills but also strengthened our team's capabilities on the project.


Employment type: Full-time, hybrid
Date posted: January 3, 2025
