Job details

Site Reliability Engineer

About the Team

Join the engineering teams that bring OpenAI’s ideas safely to the world!

The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.

 

About the Role

We’re seeking a Site Reliability Engineer with experience in managing systems and infrastructure at scale. You’ll join a nimble team where you’ll help drive deployment of OpenAI’s technology into new environments and infrastructure to enable critical missions in the public sector. This role engages cross-functionally with internal product, security, and compliance teams to build required functionality and ensure we’re delivering a scalable, reliable platform. The proximity to customers provides a unique opportunity to see the impact of your work first-hand.

This role is based in Washington, D.C. and San Francisco, CA. It requires travel to, and work from, customer sites.

In this role, you will:

  • Design and build performant, reliable, and scalable infrastructure, both on-premises and in the cloud, for our public sector customers.

  • Administer systems from the hardware up to Kubernetes, ensuring our teams have standardized infrastructure on which to deploy OpenAI’s technology.

  • Own the reliability of these systems by being on-site with the customer, utilizing observability tooling, and directly troubleshooting issues that arise as the first line of support.

  • Partner with teams across engineering and security to ensure the product supports the unique needs of the infrastructure and use-cases.

  • Automate routine tasks and standardize our infrastructure offerings to allow our team to scale as we continue to grow.

  • Partner with teams across the business, including engineering, security, and compliance, to enable our products to work within the unique constraints of new environments.

You might thrive in this role if you:

  • Hold an active US security clearance

  • Have 5+ years of experience operating infrastructure and systems at scale

  • Have worked out of secure environments, closely collaborating with both on-site clients and remote colleagues.

  • Have hands-on experience with containers (Docker) and orchestration platforms (Kubernetes)

  • Have scripting experience with Python or equivalents for automating routine tasks

  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done, so that both your team and our customers succeed.

  • Have strong troubleshooting skills across the entire stack (infrastructure, systems, and applications)

  • Thrive in dynamic environments and can navigate ambiguity with ease.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. 

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

OpenAI Glassdoor Company Review: 4.2 stars
OpenAI DE&I Review: No rating
CEO of OpenAI: Sam Altman

Average salary estimate

$140,000 / year (est.)
Min: $120,000
Max: $160,000

If an employer mentions a salary or salary range in their job posting, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer, OpenAI

Join OpenAI as a Site Reliability Engineer and become part of a transformative team based in Washington D.C. or San Francisco! This is an exciting opportunity to work alongside innovative professionals who are dedicated to safely deploying cutting-edge artificial intelligence technologies. In this role, you will help manage and build scalable, reliable infrastructure tailored for our essential public sector missions. Your day-to-day tasks will involve designing high-performing systems, managing Kubernetes and Docker containers, and directly troubleshooting any challenges that arise. You’ll take pride in your contributions to ensuring our platforms meet strict security and compliance standards while working closely with various teams to automate processes and enhance our infrastructure offerings. Your unique position allows you to see the impact of your work firsthand, meeting clients directly and understanding their needs. Ideal candidates will have at least five years of experience in system operations and infrastructure management, a solid grasp of automation scripts in Python, and a deep commitment to delivering results. If you thrive in a dynamic environment and enjoy solving complex problems, this opportunity at OpenAI is designed for you. Get ready to make a difference in technology and join a company that values safety, inclusion, and the responsible use of AI!

Frequently Asked Questions (FAQs) for Site Reliability Engineer Role at OpenAI
What responsibilities can I expect as a Site Reliability Engineer at OpenAI?

As a Site Reliability Engineer at OpenAI, you'll design and build scalable infrastructure for public sector customers, manage systems from the hardware up to Kubernetes, and ensure reliability by being onsite with clients. You'll partner with numerous teams to integrate security and compliance measures, automate routine tasks, and troubleshoot issues directly using observability tools.

What qualifications are needed to apply for the Site Reliability Engineer position at OpenAI?

Candidates should have at least 5 years of experience managing infrastructure at scale, strong troubleshooting skills, and hands-on experience with both Docker and Kubernetes. Additionally, an active US security clearance is preferred. Proficiency in scripting, especially Python, is essential for automating routine tasks in this role.

What unique opportunities does this role offer at OpenAI?

This position offers a unique chance for Site Reliability Engineers to see the tangible impact of their work by engaging directly with customers in the public sector. You'll also work cross-functionally with product, security, and compliance teams to develop critical functionalities tailored to specific infrastructures and environments, enhancing your professional experience.

How does OpenAI ensure safety and compliance in its operations?

At OpenAI, safety is a top priority. As a Site Reliability Engineer, you will closely collaborate with security and compliance teams to ensure that our technology aligns with necessary standards and operates within the constraints of regulated environments, ensuring the responsible deployment of AI technologies.

What is the work environment like for a Site Reliability Engineer at OpenAI?

The work environment at OpenAI is dynamic and collaborative, allowing Site Reliability Engineers to thrive amid ambiguity while working on challenging projects. You'll be part of a nimble team that emphasizes innovative solutions, and your contributions will directly impact how AI solutions are deployed in the real world.

Common Interview Questions for Site Reliability Engineer
Can you explain your experience with Kubernetes and how it relates to your role as a Site Reliability Engineer?

In your response, highlight specific projects where you managed container orchestration using Kubernetes. Discuss the challenges you faced and how you overcame them, emphasizing any improvements in deployment efficiency or reliability that resulted from your work.

Describe a situation where you had to troubleshoot a system failure. What steps did you take?

Focus on a specific example where you identified the root cause of a failure, the systematic steps you took to ameliorate the issue, and how you communicated with your team and clients. Highlight your problem-solving skills and how you ensured the system's reliability moving forward.

How do you automate routine tasks in your current workflow?

Share your experience in scripting languages such as Python. Provide examples of tasks you've automated and the impact this had on efficiency and scalability. Emphasize your commitment to improving processes through automation.

What strategies do you use for ensuring system scalability?

Discuss your understanding of designing infrastructure that can grow seamlessly with demand, including considerations like resource allocation, load balancing, and redundancy. Share specific techniques or frameworks you've utilized in past roles.

How would you manage competing priorities from different teams?

Describe your approach to prioritization, suggesting methods like using a triage system or a collaborative tool for managing tasks. Emphasize the importance of communication and maintaining a flexible mindset as challenges arise.

What is your understanding of security compliance in cloud environments?

Explain your experience with security protocols and compliance requirements, particularly in regulated environments. Discuss your approach to ensuring that all operational frameworks adhere to both organizational and external guidelines.

Share your experience in cross-functional collaboration.

Provide examples of times you've worked with project managers, engineers, and security teams. Describe how you navigated differing priorities and ensured alignment towards common goals, showcasing your interpersonal skills.

What tools do you rely on for observability and monitoring?

List tools you've effectively used, such as Grafana, Prometheus, or Splunk. Describe how you utilize these tools to enhance system reliability and troubleshoot issues quickly, ensuring optimal performance.

How do you handle on-call responsibilities?

Discuss your approach to being on-call, touching on the importance of preparedness, documentation, and collaboration. Highlight your problem-solving skills and how you manage stress effectively when responding to urgent issues.

What motivates you to work in Site Reliability Engineering at OpenAI?

Convey your passion for technology and AI, as well as your desire to make a meaningful impact through your work. Personal anecdotes about previous successful projects or collaborations can illustrate your alignment with OpenAI’s mission and vision.


OpenAI is a US-based private research laboratory that aims to develop and direct AI. It is one of the leading artificial intelligence organizations and has developed several large AI language models, including ChatGPT.

574 jobs
BADGES
Changemaker, Future Maker, Innovator, Future Unicorn, Rapid Growth
CULTURE VALUES
Inclusive & Diverse
Feedback Forward
Collaboration over Competition
Growth & Learning
EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
December 20, 2024
