At Replicate, we’re on a mission to redefine AI infrastructure. We’re not just another AI company; we’re a team of developers, engineers, and innovators from organizations like Docker, Spotify, Dropbox, GitHub, Heroku, NVIDIA, and more. We’ve built foundational technologies like Docker Compose and OpenAPI, and now, we’re applying that expertise to make AI deployment as intuitive and reliable as web deployment.
Our goal is straightforward: build the best platform for creating, deploying, and running machine learning models. As an Infrastructure Engineer on the Platform team, you’ll play a key role in making generative AI available to everyone.
The Platform team at Replicate oversees the entire lifecycle of models, from packaging and deployment to serving, scaling, and monitoring. You’ll be developing the infrastructure that supports thousands of models and powers millions of predictions daily. This is a chance to build something truly innovative, where each decision you make has a tangible impact and allows your creativity to shine.
What you’ll be doing:
Designing and building our deployment and model-serving platform.
Building technology to operate the latest advancements in the ML and AI space.
Designing systems to maximize the utilization and reliability of our Kubernetes clusters and GPUs, including multi-regional traffic shifting and failover capabilities.
Owning and optimizing fair and reliable task allocation and queuing across a diverse set of customers with heterogeneous workloads.
Working with our Models team to speed up model inference through techniques like caching, weights management, machine configurations, and runtime optimizations in Python and PyTorch.
Working with technologies such as:
Python, Go, and Node.js
Kubernetes and Terraform
Redis, Google BigQuery, and PostgreSQL
We're looking for the right person rather than someone who just checks boxes, but it's likely you have…
Experience building platforms at scale.
Worked in complex systems with many moving parts; you have opinions on monoliths vs. services.
Designed and implemented developer-friendly APIs to enable scalable and reliable integration.
Hands-on experience setting up and operating Kubernetes.
A passion for building tools that empower developers.
Strong communication and collaboration skills, with the ability to understand customer needs and distill complex topics into clear, actionable insights. We believe that most of programming isn’t just about writing code; building a platform requires a collaborative approach.
At least 3 years of full-time software engineering experience.
These aren’t hard requirements, but we definitely want to talk with you if…
You have worked on machine learning platform teams in the past.
You have experience working with or on teams that have put ML/AI into production, even though this role does not entail building ML models directly.
You have some exposure to serving Generative AI features where GPUs are costly commodities and workloads can take significant time to finish.
This role can be remote (anywhere in the United States) or in-person. We have a strong preference for people in PST. If possible, we like people to come into our San Francisco office at least 3 days a week.
Machine learning can now do some extraordinary things, but it's still hard to use. You spend all day battling with messy Python scripts, broken Colab notebooks, perplexing CUDA errors, misshapen tensors. It's a mess. The reason machine learning is s...