
Sr Staff Site Reliability Engineer (Cortex Data Lake)

Company Description

Our Mission

At Palo Alto Networks® everything starts and ends with our mission:

Being the cybersecurity partner of choice, protecting our digital way of life.
Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.

Who We Are

We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers, and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day, from disruptive innovation and collaboration to execution, and from showing up for each other with integrity to creating an environment where we all feel included.

As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and we respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. This includes our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities - just to name a few!

At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.

Job Description

Your Career

Palo Alto Networks runs a large infrastructure and is one of the largest GCP customers. As a Senior Staff DevOps Engineer for the CDL/SLS team, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability.

Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Vault, Kafka, MySQL, Python, Bash, and Go. 

Your Impact

  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate the deployment of robust services
  • Orchestrate end-to-end monitoring and alerting
  • Participate with SRE and Dev teams in the on-call rotation
  • Lead root cause analysis of critical business and production issues

Qualifications

Your Experience 

  • 4+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering
  • 3+ years building highly available, scalable cloud-native applications on AWS or GCP
  • BS or MS in Computer Science or a related field, or equivalent professional or military experience, required
  • Expertise in configuration management with a framework such as Ansible, Terraform, Helm
  • Passion for infrastructure and monitoring as code
  • Solid experience in container workloads and Kubernetes
  • Familiarity with PKI and networking concepts
  • In-depth knowledge of different security controls (app-id, user-id, security profile, URL category, content, SSL decryption, firewall MFA, etc.)
  • Linux administration, internals, and network troubleshooting
  • Proficiency with programming languages like Golang or Python along with shell scripting to automate tasks
  • Proficiency with CI/CD pipelines, ArgoCD and GitLab CI/CD. Knowledge of GitLab Runners is a plus
  • Ability to diagnose and troubleshoot complex distributed systems handling high volume transactions
  • Experience with managing Kafka is a plus
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Ready to understand and dissect new technology stacks quickly

Additional Information

The Team

Our engineering team is at the core of our products – connected directly to the mission of preventing cyberattacks. We are constantly innovating – challenging the way we, and the industry, think about cybersecurity. Our engineers don’t shy away from building products to solve problems no one has pursued before.

We define the industry instead of waiting for directions. We need individuals who feel comfortable in ambiguity, are excited by the prospect of a challenge, and feel empowered to confront the unknown risks facing our everyday lives, which depend on a secure digital environment.

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be between $126,000 and $203,500 per year. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We’re problem solvers who take risks and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at [email protected].

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.

Average salary estimate

$164,750 per year (est.)
Min: $126,000
Max: $203,500

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Sr Staff Site Reliability Engineer (Cortex Data Lake), Palo Alto Networks

At Palo Alto Networks, we're on the lookout for a dynamic Senior Staff Site Reliability Engineer to join our Cortex Data Lake team in sunny Santa Clara, CA. If you're an innovator who thrives on tackling challenges and shaping the future of cybersecurity, this role is your opportunity! As a critical member of our diverse engineering team, your expertise will support our large-scale infrastructure, which is at the forefront of cloud-native applications. Your journey will include designing and operating reliable cloud infrastructure while ensuring our applications are both scalable and production-ready. You'll partner with a variety of professionals, from developers to data scientists, all while driving automation and building frameworks that elevate our operational excellence. Our tech stack is impressive, featuring tools like Terraform, Kubernetes, and Docker, and you’ll have the scope to delve into new technologies that excite you. In this role, you'll be diving into monitoring and alerting practices and leading root cause analyses of critical production issues to continuously improve our systems. With a strong focus on personal and professional growth, your contributions will not only shape our products but also advance our mission of securing the digital world. Come be part of a team that believes in collaboration, empowers its members, and puts a premium on innovative problem-solving.

Frequently Asked Questions (FAQs) for Sr Staff Site Reliability Engineer (Cortex Data Lake) Role at Palo Alto Networks
What are the responsibilities of a Sr Staff Site Reliability Engineer at Palo Alto Networks?

As a Senior Staff Site Reliability Engineer at Palo Alto Networks, your responsibilities include developing and maintaining scalable cloud infrastructure, ensuring applications are production-ready, and leading automation efforts to streamline deployment processes. You'll also work closely with developers and data scientists, lead root cause analyses for critical issues, and participate in the on-call rotation to guarantee reliability and security.

What qualifications are needed for the Sr Staff Site Reliability Engineer position at Palo Alto Networks?

To qualify for the Sr Staff Site Reliability Engineer position at Palo Alto Networks, you should have at least 4 years of engineering experience in Infrastructure, Operations, or DevOps, alongside 3 years of hands-on experience building cloud-native applications on platforms like AWS or GCP. A degree in Computer Science or a related field, expertise in configuration management, and solid experience with Kubernetes and container workloads are also essential.

What kind of technologies will I work with as a Sr Staff Site Reliability Engineer at Palo Alto Networks?

In the role of Senior Staff Site Reliability Engineer at Palo Alto Networks, you will work with a diverse tech stack including Terraform, Kubernetes, Docker, GitLab CI/CD, Prometheus, and Grafana. You’ll engage with modern programming languages like Python and Go, and have the opportunity to deepen your expertise in cloud-native technologies while ensuring that our infrastructure is secure and reliable.

What does the team culture look like for a Senior Staff Site Reliability Engineer at Palo Alto Networks?

At Palo Alto Networks, team culture for the Senior Staff Site Reliability Engineer revolves around innovation, collaboration, and inclusivity. We value open communication and solving problems together. Our employees work primarily from the office to foster casual interactions and quick brainstorming sessions, all while being supported by various personal development programs and resources aimed at professional growth.

How does Palo Alto Networks support the professional development of Sr Staff Site Reliability Engineers?

Palo Alto Networks is committed to the professional development of its employees, including Senior Staff Site Reliability Engineers. We offer personalized learning opportunities, access to mental and financial health resources, and our FLEXBenefits wellbeing spending account that allows you to choose from a range of eligible benefits. The company cultivates a supportive environment for continuous learning and career progression.

Common Interview Questions for Sr Staff Site Reliability Engineer (Cortex Data Lake)
Can you explain your experience with cloud-native applications?

In answering this question, highlight specific projects where you designed, implemented, or managed cloud-native applications. Discuss the technologies used, challenges faced, and how your approach ensured scalability and reliability. Providing metrics that showcase your contributions can make a strong impact.

How do you ensure reliability and availability in cloud infrastructure?

When responding, detail your strategies such as monitoring performance, implementing CI/CD practices, and using infrastructure as code tools to automate deployments. Be ready to provide examples of handling outages or failures and the steps you took to mitigate such issues.

What is your experience with container orchestration tools like Kubernetes?

Describe your hands-on experience with Kubernetes, including deployment strategies and management of containerized applications. Explain how you’ve leveraged Kubernetes to alleviate scalability challenges, enhance security, and optimize resource usage.

How do you prioritize and manage multiple competing tasks?

Articulate your methods for prioritizing tasks, such as using project management tools, setting clear deadlines, and coordinating with team members. Illustrate your answer with a situation where you successfully managed competing priorities and the impact it had on your project.

What role does monitoring play in your work?

Speak to the importance of proactive monitoring in maintaining system health. Discuss tools you're familiar with, such as Prometheus and Grafana, and provide an example of how effective monitoring has helped you identify a problem before it escalated.

Can you discuss your experience with automation frameworks?

When discussing this, share specific automation frameworks you have worked with and the processes you've automated. Explain how automation has improved efficiency, reduced errors, or enhanced service uptime, putting emphasis on your contribution.

How do you approach root cause analysis for system failures?

Outline your methodical approach to root cause analysis, including gathering data, collaborating with team members, and diagnosing the issue. Share a past experience where your analysis led to significant improvements in system reliability.

What strategies do you employ for incident management?

Discuss your strategies for incident management, including response protocols, communication during incidents, and post-incident reviews. Use a real-world example to frame your response, focusing on how your approach minimizes impact.

How do you keep up with new technologies in site reliability?

Showcase your commitment to continuous learning, whether through online courses, tech blogs, or community engagement. Mention specific technologies or trends you are currently exploring and how you're integrating them into your work processes.

Can you describe a challenging project you worked on?

Provide details on a project that was particularly challenging, focusing on how you navigated obstacles, adapted your strategies, and collaborated with your team. Highlight the lessons learned and how this experience has influenced your approach to future projects.


Employment type: Full-time, on-site
Date posted: December 13, 2024