ML Infrastructure Engineer

About Us:

Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes in the world by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health. 

Why Join Our Team:

  • Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.

  • Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.

  • Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.

  • World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.

Position Overview:


We are seeking a skilled ML Infrastructure Engineer to help design, build, and maintain a robust orchestration platform for managing a diverse set of Large Language Models (LLMs). The ideal candidate will have hands-on experience with infrastructure orchestration tools such as Kubernetes and Terraform, as well as a strong understanding of multi-cloud environments. This role offers the opportunity to work on cutting-edge technologies and play a key part in scaling our AI infrastructure.

Key Responsibilities:

Infrastructure Development & Maintenance:


• Build and maintain infrastructure for deploying and managing LLMs at scale.
• Implement automated processes using Kubernetes and Infrastructure as Code (IaC) tools like Terraform.


Orchestration Platform Support:


• Contribute to the development and optimization of an orchestration platform for managing a heterogeneous set of LLMs.
• Monitor and troubleshoot issues in the platform to ensure high availability and performance.


Cloud Integration:


• Deploy and manage resources across multiple cloud platforms (e.g., AWS, Azure, Google Cloud).
• Optimize cloud resource usage for cost efficiency and scalability.


Collaboration:


• Work closely with ML engineers and DevOps teams to ensure smooth deployment and operation of AI models.
• Provide feedback on system designs and recommend improvements to infrastructure workflows.


Performance Monitoring:


• Implement tools and processes to monitor system health, identify bottlenecks, and improve model lifecycle management.
• Perform capacity planning to support growing infrastructure needs.

Qualifications:

Technical Skills:

• 3–5 years of experience in infrastructure engineering, DevOps, or a related field.
• Experience with enterprise GPUs such as the NVIDIA H200, H100, and A100.
• Proficiency with Kubernetes, Terraform, and other IaC tools.
• Familiarity with multi-cloud environments and cloud-native services (e.g., AWS Lambda, Google Cloud Run, Azure Functions).
• Programming skills in Python, Bash, or a similar language for automation and scripting.
• Basic understanding of ML workflows and frameworks like TensorFlow, PyTorch, or Hugging Face is a plus.

Soft Skills:

• Strong problem-solving skills and attention to detail.
• Good communication and collaboration abilities to work effectively with cross-functional teams.
• Eagerness to learn new technologies and improve existing systems.

Education & Experience:

• Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent work experience).

Average salary estimate: $110,000/year (est.), ranging from $90,000 (min) to $130,000 (max).

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About the ML Infrastructure Engineer Role at Hippocratic AI

At Hippocratic AI, we're revolutionizing healthcare through cutting-edge technology, and we're seeking a talented ML Infrastructure Engineer to join our team in Palo Alto. Our safety-focused Large Language Model (LLM) is designed to improve health outcomes on a global scale. As an ML Infrastructure Engineer, you'll play a crucial role in designing, building, and maintaining a robust orchestration platform for managing our diverse set of LLMs. If you have hands-on experience with infrastructure orchestration tools like Kubernetes and Terraform, along with a strong understanding of multi-cloud environments, this is your opportunity to work on transformative technologies in a supportive and visionary environment. Collaborating with our world-class team of healthcare and AI experts, you'll have the chance to influence how AI can safely enhance healthcare delivery. Your experience with infrastructure automation and cloud resource optimization will help us scale our AI capabilities efficiently. We thrive on innovation, and your input will be valuable in improving our infrastructure workflows and ensuring high availability of our services. If you're eager to take on challenges that have a profound impact on healthcare accessibility and health outcomes, we invite you to be part of our mission at Hippocratic AI. Join us as we empower healthcare professionals and communities around the world.

Frequently Asked Questions (FAQs) for ML Infrastructure Engineer Role at Hippocratic AI
What are the key responsibilities of the ML Infrastructure Engineer at Hippocratic AI?

As an ML Infrastructure Engineer at Hippocratic AI, your primary responsibilities will include building and maintaining infrastructure for deploying and managing Large Language Models (LLMs) at scale, implementing automated processes using tools like Kubernetes and Terraform, and optimizing resource usage across multiple cloud platforms. You'll also contribute to developing an orchestration platform and collaborate closely with ML engineers and DevOps teams to ensure smooth operations.

What qualifications are required for the ML Infrastructure Engineer position at Hippocratic AI?

Candidates for the ML Infrastructure Engineer role at Hippocratic AI should have 3–5 years of experience in infrastructure engineering, DevOps, or a related field. Proficiency with Kubernetes and Terraform and experience with enterprise GPUs are expected, while a basic understanding of ML workflows and frameworks is a plus. A bachelor’s degree in Computer Science, Engineering, or a related field is listed, but equivalent work experience is also accepted.

How does the ML Infrastructure Engineer role contribute to Hippocratic AI's mission?

The ML Infrastructure Engineer at Hippocratic AI plays a vital role in supporting our mission to improve healthcare outcomes globally by designing and maintaining a high-quality orchestration platform. You will enable the efficient deployment and management of our Large Language Models, which are pivotal in offering safe and innovative AI solutions in healthcare environments, making a direct impact on health accessibility.

What tools and technologies will the ML Infrastructure Engineer use at Hippocratic AI?

In the ML Infrastructure Engineer position at Hippocratic AI, you'll primarily work with Kubernetes and Terraform for infrastructure orchestration, as well as multiple cloud platforms, including AWS, Azure, and Google Cloud. Programming skills in Python, Bash, or a similar language are also expected for automation and scripting, allowing you to streamline processes effectively.

What opportunities for growth does the ML Infrastructure Engineer position offer at Hippocratic AI?

The ML Infrastructure Engineer role at Hippocratic AI provides abundant opportunities for professional development through collaboration with leading experts in AI and healthcare. You will have a chance to learn about emerging technologies, contribute ideas to improve systems, and play a key role in expanding our innovative healthcare solutions.

Common Interview Questions for ML Infrastructure Engineer
Can you explain your experience with Kubernetes in managing LLMs?

In your response, highlight specific projects where you've successfully utilized Kubernetes to manage and orchestrate applications, especially focusing on how you handled scaling issues and maintained high availability of services.

How do you approach automating infrastructure processes using Terraform?

Discuss a particular scenario where you used Terraform for Infrastructure as Code, detailing your thought process for defining your configuration, monitoring deployments, and addressing any issues that arose.

What challenges have you faced in a multi-cloud environment, and how did you overcome them?

Share an experience that illustrates your problem-solving skills in a multi-cloud setup, focusing on the strategies you implemented to maintain performance and optimize resource usage across different cloud platforms.

Describe your experience with capacity planning for infrastructure needs.

Provide examples of how you've conducted capacity planning in previous roles, including the methodologies you used to forecast needs and the tools you relied on to keep the infrastructure scalable and cost-effective.

How do you monitor the health of machine learning models in production?

Explain the various tools and metrics you use to monitor system performance, identify bottlenecks, and ensure that your models function optimally over time.

What programming languages do you prefer for automation and why?

Discuss your proficiency with languages like Python or Bash, providing specific examples of how you’ve used these languages to automate workflows and improve operational efficiencies.

How do you ensure effective collaboration with cross-functional teams?

Illustrate this with examples of successful projects where you've worked with ML engineers, DevOps teams, or other departments to achieve common goals, emphasizing your communication style and teamwork approach.

What strategies do you employ to stay updated on the latest trends in AI infrastructure?

Share your methods for staying current, such as following industry publications, attending conferences, or participating in online communities related to AI and infrastructure engineering.

Can you describe a time when you had to troubleshoot a critical infrastructure issue?

Provide a detailed account of the situation, your analysis of the problem, the steps you took to resolve it, and any lessons learned that improved your approach in the future.

How do you prioritize tasks when managing multiple projects in a fast-paced environment?

Discuss your time management strategies and prioritization techniques, using examples from your experience to show how you balance competing demands and ensure project milestones are met.


Hippocratic AI is building a safety-focused large language model (LLM) for the healthcare industry. We believe that generative AI has the potential to massively increase healthcare access the world over but has to be built and tested responsibly. ...

Employment type: Full-time, on-site
Date posted: January 10, 2025
