Adaptive ML is helping companies build singular generative AI experiences by democratizing the use of reinforcement learning. We are building the foundational technologies, tools, and products that allow models to learn directly from user interactions and self-improve based on simple guidelines. Our founders come from diverse backgrounds and previously worked together to create state-of-the-art open-access large language models. We closed a $20M seed round with Index & ICONIQ in early 2024 and are live with our first enterprise customers.
Our Technical Staff develops the foundational technology that powers Adaptive ML, guided by the requests and requirements of our Commercial and Product teams. We are committed to building robust, efficient technology and to conducting impactful research at scale to drive our roadmap and deliver value to our customers.
As a GPU Performance Engineer on our Technical Staff, you will help ensure that our LLM stack (Adaptive Harmony) delivers state-of-the-art performance across a wide variety of settings: from latency-bound regimes, where serving requests with sub-second response times is key, to throughput-bound regimes during training and offline inference. You will help build the foundational technology powering Adaptive ML, delivering performance improvements directly to our clients as well as to our internal workloads. We are looking for self-driven, business-minded, and ambitious individuals interested in supporting real-world deployments of a highly technical product. As this is an early role, you will have the opportunity to shape our research efforts and product as we grow.
This role is ideally in-person at our Paris or New York office, but we are also open to fully remote work.
Build and maintain fast and robust GPU code, focusing on delivering performance improvements in real-world applications;
Write high-quality software in CUDA, CUTLASS, or Triton with a focus on performance and robustness;
Profile dedicated GPU kernels, optimizing across latency- and compute-bound regimes for complex workloads (see the illustrative sketch after this list);
Contribute to our product roadmap by identifying promising trends that can improve performance;
Report clearly on your work to a distributed collaborative team, with a bias for asynchronous written communication.
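For illustration only, and not drawn from the Adaptive Harmony codebase, the sketch below shows the flavor of this work: a vectorized, memory-bound CUDA kernel paired with a cudaEvent timing loop that estimates achieved bandwidth. All names and sizes here (silu_kernel, N, the block size) are hypothetical choices made for the example.

```cuda
// Hypothetical example, not Adaptive Harmony code: a vectorized SiLU kernel
// plus a cudaEvent timing harness for estimating achieved memory bandwidth.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void silu_kernel(const float4* __restrict__ in,
                            float4* __restrict__ out, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];                      // 128-bit vectorized load
        v.x = v.x / (1.0f + __expf(-v.x));     // SiLU(x) = x * sigmoid(x)
        v.y = v.y / (1.0f + __expf(-v.y));
        v.z = v.z / (1.0f + __expf(-v.z));
        v.w = v.w / (1.0f + __expf(-v.w));
        out[i] = v;                            // 128-bit vectorized store
    }
}

int main() {
    const int N = 1 << 24;                     // 16M floats (arbitrary size)
    const int n4 = N / 4;
    float4 *d_in, *d_out;
    cudaMalloc(&d_in, n4 * sizeof(float4));
    cudaMalloc(&d_out, n4 * sizeof(float4));
    cudaMemset(d_in, 0, n4 * sizeof(float4));  // avoid reading uninitialized memory

    dim3 block(256), grid((n4 + block.x - 1) / block.x);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    silu_kernel<<<grid, block>>>(d_in, d_out, n4);   // warm-up launch
    cudaEventRecord(start);
    silu_kernel<<<grid, block>>>(d_in, d_out, n4);   // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // The kernel reads and writes N floats once each: 2 * N * 4 bytes moved.
    double gb = 2.0 * N * sizeof(float) / 1e9;
    printf("%.3f ms, %.1f GB/s effective bandwidth\n", ms, gb / (ms / 1e3));

    cudaFree(d_in); cudaFree(d_out);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    return 0;
}
```

Comparing the reported bandwidth against the device's peak is one typical sanity check before reaching for Nsight Compute or more specialized profiling tools.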
The background below is only suggestive of a few pointers we believe could be relevant. We welcome applications from candidates with diverse backgrounds; do not hesitate to get in touch if you think you could be a great fit, even if the below doesn't fully describe you.
An M.Sc. or Ph.D. in computer science, or demonstrated experience in software engineering, preferably with a focus on GPU optimization;
Strong programming skills, preferably with a focus on systems and general-purpose GPU programming;
A track record of writing high-performance kernels, preferably with a demonstrated ability to reach state-of-the-art performance on well-defined tasks;
Contributions to relevant open-source projects, such as CUTLASS, Triton, or MLIR;
A passion for the future of generative AI, and an eagerness to build foundational technology that helps machines deliver more singular experiences.
Comprehensive medical (health, dental, and vision) insurance;
401(k) plan with 4% matching (or equivalent);
Unlimited PTO — we strongly encourage at least 5 weeks each year;
Mental health, wellness, and personal development stipends;
Visa sponsorship if you wish to relocate to New York or Paris.