
LLM Inference Engineer

About Us:

Hippocratic AI is building a safety-focused large language model (LLM) for the healthcare industry. Our team, comprising former researchers from Microsoft, Meta, NVIDIA, Apple, Stanford, Johns Hopkins, and Hugging Face, is reinventing the next generation of foundation model training and alignment to create AI-powered conversational agents for real-time patient-AI interactions.

About the Role

We're seeking an experienced LLM Inference Engineer to optimize our large language model (LLM) serving infrastructure. The ideal candidate has:

  • Extensive hands-on experience with state-of-the-art inference optimization techniques

  • A track record of deploying efficient, scalable LLM systems in production environments

Key Responsibilities

  • Design and implement multi-node serving architectures for distributed LLM inference

  • Optimize multi-LoRA serving systems

  • Apply advanced quantization techniques (FP4/FP6) to reduce model footprint while preserving quality

  • Implement speculative decoding and other latency optimization strategies

  • Develop disaggregated serving solutions with optimized caching strategies for prefill and decoding phases

  • Continuously benchmark and improve system performance across various deployment scenarios and GPU types
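The speculative-decoding responsibility above can be illustrated with a toy sketch. Everything here is a stand-in: the "models" are simple arithmetic functions, not real LLMs, and a real system would verify all drafted positions in a single parallel target forward pass. What the sketch does capture is the core accept/roll-back loop, whose output is guaranteed to match plain greedy decoding with the target model.

```python
# Toy sketch of draft-model speculative decoding with greedy verification.
# target_next / draft_next are hypothetical stand-ins, not real models.

def target_next(seq):
    # Stand-in target model: next token is the sequence sum mod 10.
    return sum(seq) % 10

def draft_next(seq):
    # Stand-in cheap draft model: agrees with the target except after a 7.
    return (sum(seq) + (1 if seq[-1] == 7 else 0)) % 10

def greedy_decode(seq, n_tokens):
    # Baseline: one (expensive) target call per generated token.
    seq = list(seq)
    while len(seq) < n_tokens:
        seq.append(target_next(seq))
    return seq

def speculative_decode(seq, n_tokens, k=4):
    # Draft k tokens cheaply, then verify them with the target; on the
    # first mismatch, keep the target's token and discard the rest.
    seq = list(seq)
    while len(seq) < n_tokens:
        proposal = list(seq)
        for _ in range(k):
            proposal.append(draft_next(proposal))
        accepted = proposal
        for i in range(len(seq), len(proposal)):
            t = target_next(proposal[:i])
            if t != proposal[i]:
                accepted = proposal[:i] + [t]  # roll back at first mismatch
                break
        seq = accepted
    return seq[:n_tokens]
```

Because verification falls back to the target's token at the first disagreement, the speedup comes entirely from batching draft verification, never from changing the output distribution.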

Required Qualifications

  • 2+ years of experience optimizing LLM inference systems at scale

  • Proven expertise with distributed serving architectures for large language models

  • Hands-on experience implementing quantization techniques for transformer models

  • Strong understanding of modern inference optimization methods, including:

    • Speculative decoding techniques with draft models

    • EAGLE speculative decoding approaches

  • Proficiency in Python and C++

  • Experience with CUDA programming and GPU optimization (familiarity required, expert-level not necessary)

Preferred Qualifications

  • Contributions to open-source inference frameworks such as vLLM, SGLang, or TensorRT-LLM

  • Experience with custom CUDA kernels

  • Track record of deploying inference systems in production environments

  • Deep understanding of systems-level performance optimization

Why Join Us?

Our team is pushing the boundaries of what's possible with LLM deployment. If you're passionate about making state-of-the-art language models more efficient and accessible, we'd love to hear from you!

  • Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.

  • Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.

  • Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.

  • World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.

For more information, visit www.HippocraticAI.com.

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA, unless explicitly noted otherwise in the job description.

References


1. Polaris: A Safety-focused LLM Constellation Architecture for Healthcare, https://arxiv.org/abs/2403.13313
2. Polaris 2: https://www.hippocraticai.com/polaris2
3. Personalized Interactions: https://www.hippocraticai.com/personalized-interactions
4. Human Touch in AI: https://www.hippocraticai.com/the-human-touch-in-ai
5. Empathetic Intelligence: https://www.hippocraticai.com/empathetic-intelligence
6. Polaris 1: https://www.hippocraticai.com/research/polaris
7. Research and clinical blogs: https://www.hippocraticai.com/research

Average salary estimate (Rise): $115,000 / year (min $100,000, max $130,000)

What You Should Know About LLM Inference Engineer, Hippocratic AI

If you're passionate about the intersection of artificial intelligence and healthcare, Hippocratic AI is the place for you! We're on the lookout for an experienced LLM Inference Engineer to help us build a safe and efficient large language model (LLM) tailored for the healthcare sector. Our team is composed of alumni from esteemed organizations like Microsoft, Meta, and Stanford, all converging to create AI-driven conversational agents that can enhance patient interactions. In this role, you will get your hands dirty optimizing our LLM serving infrastructure, which includes designing multi-node serving architectures and implementing quantization techniques to keep our models running smoothly without losing quality. You will play a key role in applying state-of-the-art optimization strategies and benchmarking performance across diverse deployment scenarios. With an emphasis on collaboration, we believe that the best ideas come from in-person teamwork in our Palo Alto office. Join us in creating groundbreaking solutions with a focus on health outcomes that matter. Your expertise in distributed serving architectures and inference optimization will be pivotal as we push the boundaries of LLM technology. Together, we can truly make a difference in how healthcare is delivered globally!

Frequently Asked Questions (FAQs) for LLM Inference Engineer Role at Hippocratic AI
What are the main responsibilities of an LLM Inference Engineer at Hippocratic AI?

As an LLM Inference Engineer at Hippocratic AI, your primary responsibilities will include designing and implementing multi-node serving architectures for distributed LLM inference, optimizing multi-LoRA serving systems, and applying advanced quantization techniques to minimize model size without sacrificing performance. You'll also implement latency optimization strategies, such as speculative decoding, and develop disaggregated serving solutions that include optimized caching strategies. Your role will be crucial in continuously benchmarking and improving the performance of our systems across different deployment scenarios and GPU types.
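The disaggregated-serving idea mentioned above can be sketched in miniature. This is purely illustrative: the KV "cache" here is a list of (position, token) records rather than real attention tensors, and the worker functions are hypothetical stand-ins. The point is the split itself: a compute-bound prefill pass builds the cache for the whole prompt at once, then a memory-bound decode loop extends it one token per step.

```python
# Toy sketch of prefill/decode disaggregation. In a real system the prefill
# and decode workers run on separate GPUs and transfer actual KV tensors;
# here the cache is just a list of (position, token_id) records.

def prefill(prompt_tokens):
    # Compute-bound phase: process the full prompt in one batched pass,
    # producing one KV-cache entry per prompt token.
    return [(pos, tok) for pos, tok in enumerate(prompt_tokens)]

def decode_step(kv_cache, next_token):
    # Memory-bound phase: each step appends a single token's KV entry
    # and reuses everything already in the cache.
    kv_cache.append((len(kv_cache), next_token))
    return kv_cache

prompt = [101, 2054, 2003]           # hypothetical token ids
cache = prefill(prompt)              # runs on the prefill worker
for tok in [1996, 3437, 102]:        # stand-in generated tokens
    cache = decode_step(cache, tok)  # runs on the decode worker
```

Separating the two phases lets each pool be batched and provisioned for its own bottleneck (FLOPs for prefill, memory bandwidth for decode), at the cost of shipping the cache between them.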

What qualifications are required to become an LLM Inference Engineer at Hippocratic AI?

To be considered for the LLM Inference Engineer position at Hippocratic AI, candidates should possess at least 2 years of experience in optimizing LLM inference systems at scale. A proven track record with distributed serving architectures and hands-on experience with quantization techniques for transformer models is essential. You should have a strong grasp of modern inference optimization methods, including speculative decoding techniques and familiarity with CUDA programming and GPU optimization, although expert-level knowledge is not necessary. Proficiency in Python and C++ is also required.

How does Hippocratic AI encourage innovation in the field of LLM deployment?

Hippocratic AI fosters innovation in LLM deployment by actively participating in research and development with a world-class team comprised of healthcare professionals and AI researchers. We're dedicated to creating a safe, healthcare-focused language model designed to revolutionize health outcomes. Our co-founders include experienced individuals from top-tier health and tech organizations, which cultivates an environment of collaboration and forward-thinking. Additionally, we're backed by strategic investors who are equally passionate about pushing the boundaries of what's possible in AI technology.

What programming skills are beneficial for the LLM Inference Engineer role at Hippocratic AI?

For the LLM Inference Engineer position at Hippocratic AI, proficiency in Python and C++ is fundamental. Understanding CUDA programming and experience with GPU optimization will also significantly enhance your ability to contribute effectively to our LLM serving infrastructure. Familiarity with advanced quantization techniques and modern inference optimization methods will further enhance your skills and effectiveness in the role, allowing you to implement innovative solutions in real-world scenarios.

What makes Hippocratic AI an attractive employer for LLM Inference Engineers?

Hippocratic AI stands out as an attractive employer for LLM Inference Engineers due to its commitment to innovation in healthcare through AI technology. With a strong focus on safety and improving health outcomes, the company offers an exciting opportunity to contribute to meaningful projects that impact real lives. Additionally, working alongside leading experts in the field and having access to impressive funding from key investors provides a solid support structure for pushing technological boundaries. The collaborative in-office environment in Palo Alto also promotes teamwork and idea sharing, creating a motivating workplace.

Common Interview Questions for LLM Inference Engineer
What experience do you have optimizing LLM inference systems?

In your answer, provide specific examples from previous work where you successfully optimized LLM inference systems. Discuss the techniques you used, such as quantization or multi-node architecture implementation, and the results achieved. Highlight any measurable improvements in performance or efficiency that came as a direct consequence of your efforts.

Can you discuss your experience with distributed serving architectures?

Be prepared to explain your hands-on experience with distributed serving architectures for large language models. Provide details about past projects, the challenges you faced, and how you overcame them. Mention any specific technologies or frameworks that you have used, emphasizing your role in implementing and optimizing these systems.

What quantization techniques are you familiar with?

Discuss the quantization techniques you've utilized, such as FP4 or FP6, and how you've applied them to reduce model footprints while maintaining accuracy. Include examples of how this work positively impacted deployment outcomes in your previous positions.
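For concreteness, here is a deliberately simplified sketch of symmetric per-tensor quantization on a uniform signed-integer grid. Real FP4/FP6 formats use non-uniform floating-point grids with per-block scales, so treat the bit width, weights, and error bound here as illustrative assumptions only, not a description of any production scheme.

```python
# Illustrative symmetric per-tensor quantization to a low-bit integer grid,
# plus the dequantize round-trip and the reconstruction error it costs.

def quantize(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round each weight to the nearest grid point, clamped to [-qmax-1, qmax].
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07, 0.33]  # made-up example values
q, scale = quantize(weights, bits=4)
recovered = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

The footprint win is that each weight is stored as a few bits plus one shared scale; the cost is a rounding error of at most half a grid step per weight, which is why format choice and per-block scaling matter for preserving model quality.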

How do you approach latency optimization in LLM inference?

Explain your knowledge and experience regarding latency optimization strategies like speculative decoding. Provide examples of how you implemented these techniques, the results obtained, and how they contributed to overall system performance. Sharing quantitative results will strengthen your answer.

What tools and programming languages do you prefer for LLM optimization?

Describe your proficiency with Python, C++, and CUDA. Discuss how you've used these languages and any relevant libraries in the past. Use specific examples to show your familiarity with LLM optimization and how these tools helped you achieve your goals.

Have you worked with any open-source inference frameworks?

If you have experience with open-source inference frameworks like vLLM, SGLang, or TensorRT-LLM, share your contributions or usage examples. If you haven’t, discuss your willingness to learn about these frameworks and how they could be beneficial in your future role as an LLM Inference Engineer.

How do you ensure the quality of your LLM systems?

Talk about the benchmarks and evaluation metrics you utilize to validate the performance and quality of your LLM systems. Describe your approach to continuous testing and optimization, and share any techniques you use to identify and resolve performance bottlenecks.

What motivates you to work in the intersection of AI and healthcare?

Express your passion for making a meaningful impact in healthcare through AI. Discuss any personal or professional experiences that drew you to this field, and how you see your work as an LLM Inference Engineer positively affecting patient outcomes and healthcare providers.

Can you describe a challenging project you worked on and the outcome?

Articulate a specific challenging project related to LLMs or inference optimization. Discuss the obstacles you faced, your thought process in tackling the problems, and the successful outcome of your efforts. This will demonstrate your problem-solving skills and resilience.

Why do you want to work with Hippocratic AI?

Convey your excitement about Hippocratic AI's mission to innovate healthcare through safe and effective LLMs. Mention aspects of the company's vision, team, and culture that align with your values and professional goals. Show that you have done your research on the company and are genuinely interested in contributing to its success.


Employment type: Full-time, on-site
Date posted: March 19, 2025
