At Modal, we build foundational technology, including an optimized container runtime, a GPU-aware scheduler, and a distributed file system.
We're a small team based out of New York, Stockholm, and San Francisco, and have raised over $23M. Our team includes creators of popular open-source projects (e.g., Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
We are looking for strong engineers with experience making ML systems performant at scale. If you are interested in contributing to open-source projects and to Modal's container runtime to push language and diffusion models toward higher throughput and lower latency, we'd love to hear from you!
What we offer:
Work in person in our NYC, San Francisco, or Stockholm office
Full medical, dental, and vision insurance
Competitive salary and equity
What we're looking for:
5+ years of experience writing high-quality production code.
Experience working with PyTorch, Hugging Face libraries, and modern inference engines (e.g., vLLM or TensorRT).
Familiarity with NVIDIA GPU architecture and CUDA.
Familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
Experience with ML performance engineering (tell us a story of when you pushed GPU utilization higher!).
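For a flavor of that last point, here is a minimal micro-benchmark sketch of the kind of measurement ML performance work starts from (the function name and parameters are ours, not Modal's): time a batch of matmuls and convert to achieved TFLOP/s, which you can then compare against the device's peak to estimate utilization.

```python
import time
import torch

def matmul_tflops(n: int = 1024, iters: int = 10) -> float:
    """Time n x n matmuls and return achieved TFLOP/s.

    Multiplying two n x n matrices costs ~2 * n^3 FLOPs, so the
    achieved rate divided by the device's peak rate gives a rough
    utilization number.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    a = torch.randn(n, n, device=device, dtype=dtype)
    b = torch.randn(n, n, device=device, dtype=dtype)

    for _ in range(3):  # warm-up so one-time init isn't timed
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    if device == "cuda":
        # GPU kernel launches are asynchronous; wait before stopping the clock
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    return (2 * n**3 * iters) / elapsed / 1e12
```

Note the synchronize calls: without them, the timer only measures kernel launch overhead rather than execution, a classic pitfall in GPU benchmarking.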