About Sesame
Sesame believes in a future where computers are lifelike, with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.
We are seeking an experienced Machine Learning Model Serving Engineer to join our team. This role focuses on optimizing and deploying scalable machine learning models in production, particularly large language models (LLMs), text-to-speech (TTS), and speaker recognition.
Responsibilities:
Optimize and deploy real-time, scalable ML models in production. Leverage the latest techniques to get as much throughput and speed as possible out of cutting-edge model architectures.
Work with Kubernetes, Ray, and Torch to improve model serving infrastructure.
Manage deployments on Google Cloud Platform (GCP) using NVIDIA H100 GPUs.
Collaborate with ML engineers and infrastructure teams to ensure performance and reliability.
Conduct bottleneck analysis and systems performance tuning for inference workloads.
Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Deep experience in deploying and managing scalable machine learning models using modern model-serving approaches.
Strong knowledge of cloud platforms (GCP preferred).
Experience with performance analysis and optimization, including profiling latency, throughput, and memory usage in ML inference.
Deep experience with Kubernetes, Ray, and Torch.
Experience with building large-scale Kubernetes infrastructure.
Proficiency in infrastructure as code (IaC) for managing deployments.
Experience implementing CI/CD pipelines for ML model deployment.
Experience with real-time audio processing and media streaming.
Benefits:
401k matching
100% employer-paid health, vision, and dental benefits
Unlimited PTO and sick time
Flexible spending account matching (medical FSA)
Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities—contact careers@sesame.com for assistance.
Sesame is on a mission to bring lifelike computing into everyday life through voice interactions that feel natural and human. We have an opening for a Machine Learning Model Serving Engineer in our San Francisco office. In this role, you will optimize and deploy the scalable machine learning models that power our technology, including large language models (LLMs), text-to-speech (TTS) systems, and speaker recognition. You will manage infrastructure on Google Cloud Platform (GCP) using NVIDIA H100 GPUs and work with Kubernetes, Ray, and Torch to strengthen our model-serving capabilities. Collaborating with our ML engineers and infrastructure teams, you will conduct bottleneck analyses and tune performance to keep our systems reliable. If you hold a degree in Computer Science or a related field, have deep experience deploying and managing machine learning models, and have a knack for performance optimization, we would love to hear from you. Join us at Sesame and help shape a future where computers come alive.