Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.
We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.
We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors and 90+ angels across a range of industries, including the world's foremost experts in AI.
The Role
• Conduct cutting-edge research to improve the efficiency, scalability, and robustness of inference for state-of-the-art AI models across various modalities, including audio, text, and vision.
• Design and optimize inference pipelines to balance performance, latency, and resource utilization in diverse deployment environments, from edge devices to cloud systems.
• Develop and implement novel techniques for efficient model execution, including quantization, pruning, sparsity, distillation, and hardware-aware optimizations.
• Explore speculative decoding methods, caching strategies, and other advanced techniques to reduce latency and computational overhead during inference.
• Investigate trade-offs between model quality and inference efficiency, designing architectures and workflows that meet real-world application requirements.
• Prototype and refine methods for stateful inference, streaming inference, and task-specific conditioning to enable new capabilities and use cases.
• Collaborate closely with cross-functional teams to ensure inference research seamlessly integrates into production systems and applications.
What We’re Looking For
• Deep expertise in optimizing inference for machine learning models, with a strong understanding of techniques such as speculative decoding, model compression, low-precision computation, and hardware-specific tuning.
• Strong programming skills in Python, with experience in frameworks like PyTorch, TensorFlow, or ONNX, and familiarity with inference deployment tools such as TensorRT or TVM.
• Knowledge of hardware architectures and accelerators, including GPUs, TPUs, and edge devices, and their impact on inference performance.
• Experience in designing and evaluating scalable, low-latency inference pipelines for production systems.
• A solid understanding of the trade-offs between model accuracy, latency, and computational efficiency in deployment scenarios.
• Strong problem-solving skills and a passion for exploring innovative techniques to push the boundaries of real-time and resource-constrained inference.
Nice-to-Haves
• Experience with speculative decoding and other emerging techniques for improving inference performance.
• Familiarity with stateful or streaming inference techniques.
• Background in designing hybrid architectures or task-specific models optimized for inference.
• Early-stage startup experience or a track record of developing and deploying efficient inference systems in fast-paced R&D environments.
Our culture
🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.
🚢 We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.
🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
Our perks
🍽 Lunch, dinner and snacks at the office.
🏥 Fully covered medical, dental, and vision insurance for employees.
🏦 401(k).
✈️ Relocation and immigration support.
🦖 Your own personal Yoshi.