Job details

ML Model Serving Engineer

About Sesame

Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

About the role

We are seeking an experienced Machine Learning Model Serving Engineer to join our team. This role focuses on optimizing and deploying scalable machine learning models in production, particularly large language models (LLMs), text-to-speech (TTS), and speaker recognition.

Responsibilities:

  • Optimize and deploy real-time, scalable ML models in production, applying the latest techniques to extract maximum throughput and speed from cutting-edge model architectures.

  • Work with Kubernetes, Ray, and Torch to improve model serving infrastructure.

  • Manage deployments on Google Cloud Platform (GCP) using NVIDIA H100 GPUs.

  • Collaborate with ML engineers and infrastructure teams to ensure performance and reliability.

  • Conduct bottleneck analysis and systems performance tuning for inference workloads.

Required qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

  • Deep experience in deploying and managing scalable machine learning models using modern model-serving approaches.

  • Strong knowledge of cloud platforms (GCP preferred).

  • Performance analysis and optimization, including profiling latency, throughput, and memory usage in ML inference.

Preferred qualifications:

  • PyTorch experience (strongly preferred but not strictly required).

  • Deep experience with Kubernetes, Ray, and Torch.

  • Experience building large-scale Kubernetes infrastructure.

  • Proficiency in infrastructure as code (IaC) for managing deployments.

  • Experience implementing CI/CD pipelines for ML model deployment.

  • Experience with real-time audio processing and media streaming.

Benefits: 

  • 401k matching

  • 100% employer-paid health, vision, and dental benefits 

  • Unlimited PTO and sick time 

  • Flexible spending account matching (medical FSA) 

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities—contact careers@sesame.com for assistance.

Average salary estimate

$110,000 / YEARLY (est.)
min: $90,000
max: $130,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About ML Model Serving Engineer, Sesame

Sesame is on a remarkable mission to bring lifelike computing into everyday life, focusing on seamless voice interactions that feel natural and human. We are thrilled to announce an exciting opening for a Machine Learning Model Serving Engineer in our vibrant San Francisco office. In this role, you'll be at the forefront of optimizing and deploying scalable machine learning models that power our groundbreaking technology. Your expertise will play a crucial part in enhancing large language models (LLMs), text-to-speech (TTS) systems, and speaker recognition, all integral to our vision. You're expected to not only manage infrastructure on Google Cloud Platform (GCP), utilizing cutting-edge NVIDIA H100 GPUs, but also delve into Kubernetes, Ray, and Torch to fortify our model-serving capabilities. Collaborating with our talented ML engineers and infrastructure teams, you'll conduct bottleneck analyses and make performance tweaks to ensure our systems run smoothly. If you hold a degree in Computer Science or a related field, possess deep experience in deploying and managing machine learning models, and have a knack for performance optimization, we would love to hear from you! Join us at Sesame and help shape the future of technology where computers come alive like never before.

Frequently Asked Questions (FAQs) for ML Model Serving Engineer Role at Sesame
What are the primary responsibilities of a Machine Learning Model Serving Engineer at Sesame?

As a Machine Learning Model Serving Engineer at Sesame, your key responsibilities will include optimizing and deploying real-time scalable ML models, particularly large language models (LLMs) and text-to-speech systems. You'll work with tools like Kubernetes and Ray to enhance our model serving infrastructure and manage deployments on Google Cloud Platform (GCP) using powerful NVIDIA H100 GPUs. Collaborating closely with ML engineers, you'll conduct bottleneck analysis and performance tuning to ensure reliable and efficient ML inference.

What qualifications are required for the Machine Learning Model Serving Engineer position at Sesame?

To qualify for the Machine Learning Model Serving Engineer role at Sesame, candidates should possess either a Bachelor's or Master's degree in Computer Science, Engineering, or a related field. Importantly, deep experience in deploying and managing scalable machine learning models is essential, along with strong knowledge of cloud platforms, preferably Google Cloud Platform (GCP). Applicants should also demonstrate expertise in performance analysis and optimization, focusing on profiling latency, throughput, and memory usage in ML inference.

Is experience with PyTorch necessary for the Machine Learning Model Serving Engineer role at Sesame?

While experience with PyTorch is strongly preferred for the Machine Learning Model Serving Engineer role at Sesame, it is not strictly required. However, a solid background in leveraging modern model-serving approaches and working with technologies such as Kubernetes, Ray, and Torch will be beneficial. Familiarity with building large-scale infrastructure and implementing CI/CD pipelines for ML model deployment will enhance your candidacy significantly.

What does the team culture look like for Machine Learning Model Serving Engineers at Sesame?

At Sesame, our team culture is grounded in respect, collaboration, and empowerment. We are dedicated to fostering an inclusive workplace where all team members feel valued and appreciated. As a Machine Learning Model Serving Engineer, you will find yourself in a supportive environment, collaborating with talented professionals from diverse backgrounds who share a common goal of revolutionizing how computers interact with us.

What benefits does Sesame offer to Machine Learning Model Serving Engineers?

Sesame offers a comprehensive benefits package for Machine Learning Model Serving Engineers, which includes 401k matching, 100% employer-paid health, vision, and dental benefits. We also provide unlimited PTO and sick time to support work-life balance, along with matching contributions for flexible spending accounts (FSA) for medical expenses. Our commitment is to ensure our team members are taken care of, both personally and professionally.

Common Interview Questions for ML Model Serving Engineer
How do you approach optimizing machine learning models for performance?

When optimizing machine learning models for performance, I first conduct a thorough analysis of the current model's throughput and latency. I then identify bottlenecks, whether they are related to algorithm efficiency, data handling, or infrastructure limitations. Finally, I apply techniques like quantization, pruning, or faster inference engines, while continuously monitoring performance improvements.
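As a toy illustration of the quantization technique mentioned above, here is a pure-Python sketch of symmetric int8 post-training quantization (illustrative only; a production serving stack would use framework utilities such as PyTorch's quantization APIs rather than hand-rolled code):

```python
# Toy sketch: symmetric int8 quantization of a float weight vector.
# Shows the size/accuracy trade-off: 4x smaller weights, bounded rounding error.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.51, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

The same idea, applied per-channel and combined with int8 matrix-multiply kernels, is what framework-level quantization does at scale.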

Can you explain your experience with cloud platforms like Google Cloud Platform?

Certainly! My experience with Google Cloud Platform (GCP) includes deploying machine learning models using its AI services and managing infrastructure with tools like Kubernetes. I have utilized GCP's capabilities to scale applications, ensuring optimal performance while taking advantage of their robust ML tools, including TensorFlow and BigQuery for data management.

Describe your experience with Kubernetes and how it aids in ML model serving.

I have extensive experience with Kubernetes to automate deployment, scaling, and management of containerized applications. In terms of ML model serving, Kubernetes facilitates the orchestration of services that host our models, enabling us to scale instances based on demand and ensure high availability, which is crucial for real-time ML applications.
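The scale-on-demand behavior described here follows Kubernetes' documented Horizontal Pod Autoscaler rule, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A minimal sketch of that rule:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """Kubernetes HPA scaling rule: desired = ceil(current * current/target)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# e.g. 4 replicas running at 90% utilization against a 60% target -> scale to 6
assert desired_replicas(4, 90, 60) == 6
```

In a real cluster the HPA evaluates this against metrics reported by the metrics pipeline (CPU, or custom metrics such as GPU utilization or queue depth for model servers).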

What methods do you use to conduct bottleneck analysis in machine learning inference?

To investigate bottlenecks in machine learning inference, I use profiling tools to monitor metrics like latency and resource utilization during model execution. By doing so, I can pinpoint exact stages in the model pipeline that slow down the process, allowing me to optimize algorithms and infrastructure accordingly, and refine performance through tuning and adjustments.
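This stage-by-stage timing approach can be sketched as follows (the stage names are hypothetical; real inference profiling would typically use dedicated tools such as torch.profiler or NVIDIA Nsight rather than wall-clock timers):

```python
import time

def profile_pipeline(stages, payload):
    """Time each stage of an inference pipeline and report the slowest one."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    bottleneck = max(timings, key=timings.get)
    return timings, bottleneck

# Hypothetical stages standing in for preprocessing / model forward / postprocessing.
stages = [
    ("preprocess", lambda x: x),
    ("forward", lambda x: (time.sleep(0.05), x)[1]),  # simulated slow model call
    ("postprocess", lambda x: x),
]
timings, bottleneck = profile_pipeline(stages, payload=[1, 2, 3])
assert bottleneck == "forward"
```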

How do you ensure the reliability of ML models in production?

Ensuring the reliability of ML models in production involves implementing robust monitoring systems to track model performance over time. I incorporate automated testing and continuous integration practices to identify and resolve issues early. Furthermore, I maintain logs and establish alerts for unexpected model behavior, allowing for quick interventions when necessary.
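The alerting idea above can be sketched as a sliding-window error-rate check (the window size and threshold are illustrative; a production setup would wire this into a monitoring stack such as Prometheus/Alertmanager):

```python
from collections import deque

class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests exceeds a threshold."""

    def __init__(self, window=100, threshold=0.05):
        self.events = deque(maxlen=window)  # 1 = failed request, 0 = success
        self.threshold = threshold

    def record(self, ok):
        """Record one request outcome; return True if an alert should fire."""
        self.events.append(0 if ok else 1)
        rate = sum(self.events) / len(self.events)
        return rate > self.threshold

alert = ErrorRateAlert(window=10, threshold=0.2)
fired = [alert.record(ok) for ok in [True] * 7 + [False] * 3]
assert fired[-1] is True  # 3/10 failures exceeds the 20% threshold
```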

What experience do you have with real-time streaming of audio data?

I have worked on several projects that involve real-time audio processing, specifically in implementing stream processing frameworks that efficiently manage audio data flow. I focus on maintaining low latency to provide instantaneous feedback for applications such as speech recognition or TTS, ensuring a seamless user experience.

Can you share a project where you implemented CI/CD pipelines for ML model deployment?

In a recent project, I established a CI/CD pipeline for deploying ML models using tools like GitLab CI and Docker. The pipeline included stages for model testing, validation, and automated rollback in case of failures, ensuring that updates were consistently deployed without downtime, and allowing for rapid iteration based on user feedback.

What is your strategy for profiling latency, throughput, and memory usage in ML inference?

My strategy includes leveraging profiling tools that provide insights into runtime performance metrics during inference. I typically set up benchmarks to measure throughput and latency under various loads, and assess memory utilization to identify potential overhead, allowing for informed decisions on which model architectures or optimizations to pursue.
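A minimal sketch of this benchmarking strategy using only the standard library (the sample latencies are illustrative numbers, standing in for real load-test measurements):

```python
import statistics

def latency_report(samples_ms):
    """Summarize latency samples: p50/p95 percentiles and derived throughput."""
    qs = statistics.quantiles(samples_ms, n=100)  # qs[k-1] is the k-th percentile
    return {
        "p50_ms": qs[49],
        "p95_ms": qs[94],
        "throughput_rps": 1000.0 / statistics.fmean(samples_ms),
    }

# Hypothetical per-request latencies from a load test, in milliseconds.
samples = [12, 14, 13, 15, 40, 12, 13, 14, 13, 12]
report = latency_report(samples)
assert report["p50_ms"] < report["p95_ms"]  # tail latency exceeds the median
```

Tracking the tail (p95/p99) rather than the mean is what surfaces the outliers that matter for real-time inference.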

Discuss your familiarity with infrastructure as code (IaC) for managing ML deployments.

I have leveraged infrastructure as code (IaC) tools like Terraform to define and manage cloud resources programmatically, which has streamlined the deployment process for ML models. By using IaC, I can ensure environmental consistency, facilitate reproducibility, and easily roll out changes across different deployment stages in a controlled manner.

How do you stay updated with advancements in machine learning and model serving technologies?

I stay updated by actively engaging in professional communities, attending conferences, and participating in webinars. Additionally, I subscribe to relevant research journals and follow thought leaders in the machine learning space on platforms like Medium and GitHub to keep abreast of the latest innovations and best practices in model serving technologies.

EMPLOYMENT TYPE
Full-time, on-site
DATE POSTED
March 16, 2025
