ML Infrastructure Engineer

About the Team

The Runtime team builds the low-level framework components that power our ML training systems. We work on building robust, scalable, high-performance components to support our distributed training workloads. Our priorities are to maximize the productivity of our researchers and our hardware, with the goal of accelerating progress towards AGI.

About the Role

As an ML Infrastructure Engineer, you will work on improving the training throughput of our internal training framework while enabling researchers to experiment with new ideas. This requires good engineering (for example, designing, implementing, and optimizing state-of-the-art AI models), writing bug-free machine learning code (surprisingly difficult!), and acquiring deep knowledge of the performance of supercomputers. In all the projects this role pursues, the ultimate goal is to push the field forward.

We’re looking for people who love optimizing performance and understanding distributed systems, and who cannot stand having bugs in their code. Since our training framework is used for large runs with massive numbers of GPUs, performance improvements here will have a large impact.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

  • Apply the latest techniques in our internal training framework to achieve impressive hardware efficiency for our training runs

  • Profile and optimize our training framework

  • Work with researchers to enable them to develop the next generation of models

You might thrive in this role if you:

  • Have run small-scale ML experiments

  • Love figuring out how systems work and continuously come up with ideas for how to make them faster while minimizing complexity and maintenance burden

  • Have strong software engineering skills and are proficient in Python

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status. 

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US-based candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

OpenAI Glassdoor Company Review: 4.2 stars
OpenAI DE&I Review: No rating
CEO of OpenAI: Sam Altman

Average salary estimate

$135,000 / YEARLY (est.)
Range: $120,000 (min) to $150,000 (max)

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About ML Infrastructure Engineer, OpenAI

Are you an innovative problem-solver looking to make a real impact in the world of artificial intelligence? At OpenAI, we are on the lookout for a passionate ML Infrastructure Engineer to join our Runtime team in San Francisco. Our team is dedicated to building the foundational components that power our machine learning training systems. In this role, you'll be at the forefront of improving training throughput in our internal framework while enabling researchers to test their ideas. You'll need strong engineering skills, as you'll be designing, implementing, and optimizing state-of-the-art AI models. Writing efficient, bug-free machine learning code is paramount, and a deep understanding of supercomputer performance will be invaluable. The projects you work on will push the boundaries of AI research and directly affect both the productivity of our researchers and the efficiency of our hardware. If you're excited about optimizing performance, diving into distributed systems, and eliminating bugs, we want you to bring your skills to OpenAI. The role follows a hybrid work model of three days per week in the office, and relocation assistance is available to new employees. Join us and help shape the future of AI while making a positive impact on humanity!

Frequently Asked Questions (FAQs) for ML Infrastructure Engineer Role at OpenAI
What are the responsibilities of an ML Infrastructure Engineer at OpenAI?

As an ML Infrastructure Engineer at OpenAI, your main responsibilities will include improving training throughput for our internal training framework and collaborating with researchers to develop next-generation AI models. You'll profile and optimize system performance while writing bug-free machine learning code.

What qualifications are needed for the ML Infrastructure Engineer position at OpenAI?

To excel as an ML Infrastructure Engineer at OpenAI, candidates should possess strong software engineering skills, particularly in Python. Experience running ML experiments and a solid understanding of how systems work, including distributed systems, are also valuable. A passion for performance optimization and a keen eye for detail will set you apart.

What type of work environment does OpenAI offer for ML Infrastructure Engineers?

OpenAI offers a hybrid work environment for ML Infrastructure Engineers, requiring three days in the office per week in San Francisco. This structure promotes collaboration while allowing flexibility, ideal for innovative minds dedicated to advancing AI infrastructure.

How does the role of ML Infrastructure Engineer contribute to AI development at OpenAI?

The ML Infrastructure Engineer plays a crucial role at OpenAI by enhancing the efficiency of AI training frameworks. By optimizing performance and enabling researchers to experiment, you directly support the development of cutting-edge AI models, ultimately pushing the boundaries of what AI can achieve.

Is relocation assistance provided for the ML Infrastructure Engineer position at OpenAI?

Yes! For the ML Infrastructure Engineer role at OpenAI, we provide relocation assistance to help new employees transition smoothly to our San Francisco office. We support your journey as you join our dynamic team committed to reshaping the tech landscape.

Common Interview Questions for ML Infrastructure Engineer
Can you describe your experience with optimizing ML systems?

When answering this question, focus on specific projects where you enhanced system performance. Discuss the methodologies you used, any challenges faced, and the measurable outcomes, such as increased throughput or reduced latency.

What tools and technologies do you prefer for profiling code performance?

Highlight your familiarity with profiling tools relevant to ML and Python, such as cProfile, Py-Spy, or TensorBoard. Explain why you prefer certain tools and how they have helped you identify bottlenecks in your projects.

How do you ensure the code you write is bug-free?

Discuss your coding practices, such as extensive testing, code reviews, and using debugging tools. Share examples of how these practices have helped you catch issues early in the development process.

Explain how you would approach a performance bottleneck in an ML training framework.

Detail your systematic approach to identifying bottlenecks, which may include profiling the code, analyzing GPU memory usage, and incremental testing. Provide examples of past experiences where you've successfully resolved similar issues.

What experiences do you have with distributed systems?

Bring up your practical experiences working with distributed systems, mentioning any specific frameworks or architectures you've used. Discuss your understanding of their challenges and your contributions to building or supporting these systems.

Can you give an example of a machine learning experiment you have conducted?

Share a specific project where you applied ML techniques. Explain the objectives, your methodology, tools used, and the outcomes/results. Highlight your role in the experiment and how it contributed to your understanding of ML.

How do you keep up to date with the latest trends in ML and AI?

Discuss your engagement with the community, such as attending conferences, following leading researchers, subscribing to journals, or participating in online forums. Illustrate how these activities have informed your work and knowledge.

What challenges have you faced in ML engineering roles?

Reflect on specific challenges you've encountered, such as scaling issues or debugging complex systems. Share how you addressed these challenges and the lessons you learned that could benefit your role at OpenAI.

How would you collaborate with researchers to improve AI models?

Emphasize your communication skills and teamwork. Describe your approach to understanding researcher needs, providing technical support, and using feedback to enhance the infrastructure that serves their goals.

Describe a time you had to simplify a complex system design.

Share an example that illustrates your ability to analyze a system's complexity and propose simplifications. Discuss your rationale for the changes and the positive impact on efficiency or ease of maintenance.


OpenAI is a US-based private research laboratory that aims to develop and direct AI. It is one of the leading artificial intelligence organizations and has developed several large language models, including those behind ChatGPT.

885 jobs
BADGES
Changemaker, Future Maker, Innovator, Future Unicorn, Rapid Growth
CULTURE VALUES
Inclusive & Diverse
Feedback Forward
Collaboration over Competition
Growth & Learning
TEAM SIZE
No info
EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
March 22, 2025
