Job details

ML Model Serving Engineer

About Sesame

Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

About the role

We are seeking an experienced Machine Learning Model Serving Engineer to join our team. This role focuses on optimizing and deploying scalable machine learning models in production, particularly large language models (LLMs), text-to-speech (TTS), and speaker recognition.

Responsibilities:

  • Optimize and deploy real-time, scalable ML models in production, applying the latest techniques to extract maximum throughput and speed from cutting-edge model architectures.

  • Work with Kubernetes, Ray, and Torch to improve model serving infrastructure.

  • Manage deployments on Google Cloud Platform (GCP) using NVIDIA H100 GPUs.

  • Collaborate with ML engineers and infrastructure teams to ensure performance and reliability.

  • Conduct bottleneck analysis and systems performance tuning for inference workloads.

Required qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

  • Deep experience in deploying and managing scalable machine learning models using modern model-serving approaches.

  • Strong knowledge of cloud platforms (GCP preferred).

  • Performance analysis and optimization, including profiling latency, throughput, and memory usage in ML inference.

Preferred qualifications:

  • PyTorch experience (strongly preferred but not strictly required).

  • Deep experience with Kubernetes, Ray, and Torch.

  • Experience building large-scale Kubernetes infrastructure.

  • Proficiency in infrastructure as code (IaC) for managing deployments.

  • Experience implementing CI/CD pipelines for ML model deployment.

  • Experience with real-time audio processing and media streaming.

Benefits: 

  • 401k matching

  • 100% employer-paid health, vision, and dental benefits 

  • Unlimited PTO and sick time 

  • Flexible spending account matching (medical FSA) 

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities—contact careers@sesame.com for assistance.

Average salary estimate

$110,000 / YEARLY (est.)
min: $90,000
max: $130,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About ML Model Serving Engineer, Sesame

Sesame is on a remarkable mission to bring lifelike computing into everyday life, focusing on seamless voice interactions that feel natural and human. We are thrilled to announce an exciting opening for a Machine Learning Model Serving Engineer in our vibrant San Francisco office. In this role, you'll be at the forefront of optimizing and deploying scalable machine learning models that power our groundbreaking technology. Your expertise will play a crucial part in enhancing large language models (LLMs), text-to-speech (TTS) systems, and speaker recognition, all integral to our vision. You're expected to not only manage infrastructure on Google Cloud Platform (GCP), utilizing cutting-edge NVIDIA H100 GPUs, but also delve into Kubernetes, Ray, and Torch to fortify our model-serving capabilities. Collaborating with our talented ML engineers and infrastructure teams, you'll conduct bottleneck analyses and make performance tweaks to ensure our systems run smoothly. If you hold a degree in Computer Science or a related field, possess deep experience in deploying and managing machine learning models, and have a knack for performance optimization, we would love to hear from you! Join us at Sesame and help shape the future of technology where computers come alive like never before.

Frequently Asked Questions (FAQs) for ML Model Serving Engineer Role at Sesame
What are the primary responsibilities of a Machine Learning Model Serving Engineer at Sesame?

As a Machine Learning Model Serving Engineer at Sesame, your key responsibilities will include optimizing and deploying real-time scalable ML models, particularly large language models (LLMs) and text-to-speech systems. You'll work with tools like Kubernetes and Ray to enhance our model serving infrastructure and manage deployments on Google Cloud Platform (GCP) using powerful NVIDIA H100 GPUs. Collaborating closely with ML engineers, you'll conduct bottleneck analysis and performance tuning to ensure reliable and efficient ML inference.

What qualifications are required for the Machine Learning Model Serving Engineer position at Sesame?

To qualify for the Machine Learning Model Serving Engineer role at Sesame, candidates should possess either a Bachelor's or Master's degree in Computer Science, Engineering, or a related field. Importantly, deep experience in deploying and managing scalable machine learning models is essential, along with strong knowledge of cloud platforms, preferably Google Cloud Platform (GCP). Applicants should also demonstrate expertise in performance analysis and optimization, focusing on profiling latency, throughput, and memory usage in ML inference.

Is experience with PyTorch necessary for the Machine Learning Model Serving Engineer role at Sesame?

While experience with PyTorch is strongly preferred for the Machine Learning Model Serving Engineer role at Sesame, it is not strictly required. However, a solid background in leveraging modern model-serving approaches and working with technologies such as Kubernetes, Ray, and Torch will be beneficial. Familiarity with building large-scale infrastructure and implementing CI/CD pipelines for ML model deployment will enhance your candidacy significantly.

What does the team culture look like for Machine Learning Model Serving Engineers at Sesame?

At Sesame, our team culture is grounded in respect, collaboration, and empowerment. We are dedicated to fostering an inclusive workplace where all team members feel valued and appreciated. As a Machine Learning Model Serving Engineer, you will find yourself in a supportive environment, collaborating with talented professionals from diverse backgrounds who share a common goal of revolutionizing how computers interact with us.

What benefits does Sesame offer to Machine Learning Model Serving Engineers?

Sesame offers a comprehensive benefits package for Machine Learning Model Serving Engineers, which includes 401k matching, 100% employer-paid health, vision, and dental benefits. We also provide unlimited PTO and sick time to support work-life balance, along with matching contributions for flexible spending accounts (FSA) for medical expenses. Our commitment is to ensure our team members are taken care of, both personally and professionally.

Common Interview Questions for ML Model Serving Engineer
How do you approach optimizing machine learning models for performance?

When optimizing machine learning models for performance, I first conduct a thorough analysis of the current model's throughput and latency. I then identify bottlenecks, whether they are related to algorithm efficiency, data handling, or infrastructure limitations. Finally, I apply techniques like quantization, pruning, or faster inference engines, while continuously monitoring performance improvements.
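As a toy illustration of the quantization technique mentioned above, here is a pure-Python sketch of symmetric int8 post-training quantization (illustrative only; a production serving stack would use framework utilities such as PyTorch's quantization APIs rather than hand-rolled code):

```python
# Toy sketch: symmetric int8 quantization of a float weight vector.
# Shows the size/accuracy trade-off: 4x smaller weights, bounded rounding error.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.51, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

The same idea, applied per-channel and combined with int8 matrix-multiply kernels, is what framework-level quantization does at scale.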

Can you explain your experience with cloud platforms like Google Cloud Platform?

Certainly! My experience with Google Cloud Platform (GCP) includes deploying machine learning models using its AI services and managing infrastructure with tools like Kubernetes. I have utilized GCP's capabilities to scale applications, ensuring optimal performance while taking advantage of their robust ML tools, including TensorFlow and BigQuery for data management.

Describe your experience with Kubernetes and how it aids in ML model serving.

I have extensive experience with Kubernetes to automate deployment, scaling, and management of containerized applications. In terms of ML model serving, Kubernetes facilitates the orchestration of services that host our models, enabling us to scale instances based on demand and ensure high availability, which is crucial for real-time ML applications.
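The scale-on-demand behavior described here follows Kubernetes' documented Horizontal Pod Autoscaler rule, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A minimal sketch of that rule:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """Kubernetes HPA scaling rule: desired = ceil(current * current/target)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# e.g. 4 replicas running at 90% utilization against a 60% target -> scale to 6
assert desired_replicas(4, 90, 60) == 6
```

In a real cluster the HPA evaluates this against metrics reported by the metrics pipeline (CPU, or custom metrics such as GPU utilization or queue depth for model servers).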

What methods do you use to conduct bottleneck analysis in machine learning inference?

To investigate bottlenecks in machine learning inference, I use profiling tools to monitor metrics like latency and resource utilization during model execution. By doing so, I can pinpoint exact stages in the model pipeline that slow down the process, allowing me to optimize algorithms and infrastructure accordingly, and refine performance through tuning and adjustments.
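This stage-by-stage timing approach can be sketched as follows (the stage names are hypothetical; real inference profiling would typically use dedicated tools such as torch.profiler or NVIDIA Nsight rather than wall-clock timers):

```python
import time

def profile_pipeline(stages, payload):
    """Time each stage of an inference pipeline and report the slowest one."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    bottleneck = max(timings, key=timings.get)
    return timings, bottleneck

# Hypothetical stages standing in for preprocessing / model forward / postprocessing.
stages = [
    ("preprocess", lambda x: x),
    ("forward", lambda x: (time.sleep(0.05), x)[1]),  # simulated slow model call
    ("postprocess", lambda x: x),
]
timings, bottleneck = profile_pipeline(stages, payload=[1, 2, 3])
assert bottleneck == "forward"
```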

How do you ensure the reliability of ML models in production?

Ensuring the reliability of ML models in production involves implementing robust monitoring systems to track model performance over time. I incorporate automated testing and continuous integration practices to identify and resolve issues early. Furthermore, I maintain logs and establish alerts for unexpected model behavior, allowing for quick interventions when necessary.
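The alerting idea above can be sketched as a sliding-window error-rate check (the window size and threshold are illustrative; a production setup would wire this into a monitoring stack such as Prometheus/Alertmanager):

```python
from collections import deque

class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests exceeds a threshold."""

    def __init__(self, window=100, threshold=0.05):
        self.events = deque(maxlen=window)  # 1 = failed request, 0 = success
        self.threshold = threshold

    def record(self, ok):
        """Record one request outcome; return True if an alert should fire."""
        self.events.append(0 if ok else 1)
        rate = sum(self.events) / len(self.events)
        return rate > self.threshold

alert = ErrorRateAlert(window=10, threshold=0.2)
fired = [alert.record(ok) for ok in [True] * 7 + [False] * 3]
assert fired[-1] is True  # 3/10 failures exceeds the 20% threshold
```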

What experience do you have with real-time streaming of audio data?

I have worked on several projects that involve real-time audio processing, specifically in implementing stream processing frameworks that efficiently manage audio data flow. I focus on maintaining low latency to provide instantaneous feedback for applications such as speech recognition or TTS, ensuring a seamless user experience.

Can you share a project where you implemented CI/CD pipelines for ML model deployment?

In a recent project, I established a CI/CD pipeline for deploying ML models using tools like GitLab CI and Docker. The pipeline included stages for model testing, validation, and automated rollback in case of failures, ensuring that updates were consistently deployed without downtime, and allowing for rapid iteration based on user feedback.

What is your strategy for profiling latency, throughput, and memory usage in ML inference?

My strategy includes leveraging profiling tools that provide insights into runtime performance metrics during inference. I typically set up benchmarks to measure throughput and latency under various loads, and assess memory utilization to identify potential overhead, allowing for informed decisions on which model architectures or optimizations to pursue.
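A minimal sketch of this benchmarking strategy using only the standard library (the sample latencies are illustrative numbers, standing in for real load-test measurements):

```python
import statistics

def latency_report(samples_ms):
    """Summarize latency samples: p50/p95 percentiles and derived throughput."""
    qs = statistics.quantiles(samples_ms, n=100)  # qs[k-1] is the k-th percentile
    return {
        "p50_ms": qs[49],
        "p95_ms": qs[94],
        "throughput_rps": 1000.0 / statistics.fmean(samples_ms),
    }

# Hypothetical per-request latencies from a load test, in milliseconds.
samples = [12, 14, 13, 15, 40, 12, 13, 14, 13, 12]
report = latency_report(samples)
assert report["p50_ms"] < report["p95_ms"]  # tail latency exceeds the median
```

Tracking the tail (p95/p99) rather than the mean is what surfaces the outliers that matter for real-time inference.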

Discuss your familiarity with infrastructure as code (IaC) for managing ML deployments.

I have leveraged infrastructure as code (IaC) tools like Terraform to define and manage cloud resources programmatically, which has streamlined the deployment process for ML models. By using IaC, I can ensure environmental consistency, facilitate reproducibility, and easily roll out changes across different deployment stages in a controlled manner.

How do you stay updated with advancements in machine learning and model serving technologies?

I stay updated by actively engaging in professional communities, attending conferences, and participating in webinars. Additionally, I subscribe to relevant research journals and follow thought leaders in the machine learning space on platforms like Medium and GitHub to keep abreast of the latest innovations and best practices in model serving technologies.

EMPLOYMENT TYPE
Full-time, on-site
DATE POSTED
March 16, 2025
