
Researcher: Inference

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world's foremost experts in AI.

The Role

• Conduct cutting-edge research to improve the efficiency, scalability, and robustness of inference for state-of-the-art AI models across various modalities, including audio, text, and vision.

• Design and optimize inference pipelines to balance performance, latency, and resource utilization in diverse deployment environments, from edge devices to cloud systems.

• Develop and implement novel techniques for efficient model execution, including quantization, pruning, sparsity, distillation, and hardware-aware optimizations (a toy quantization sketch follows this list).

• Explore speculative decoding methods, caching strategies, and other advanced techniques to reduce latency and computational overhead during inference.

• Investigate trade-offs between model quality and inference efficiency, designing architectures and workflows that meet real-world application requirements.

• Prototype and refine methods for stateful inference, streaming inference, and task-specific conditioning to enable new capabilities and use cases.

• Collaborate closely with cross-functional teams to ensure inference research seamlessly integrates into production systems and applications.
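
To ground the quantization item above, here is a minimal sketch of post-training dynamic quantization using PyTorch's `torch.ao.quantization.quantize_dynamic`; the toy model, layer sizes, and input are invented for illustration and say nothing about Cartesia's actual stack.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Toy FP32 model; the layers and sizes are invented for this example.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).eval()

# Post-training dynamic quantization: weights are stored in int8,
# activations are quantized on the fly at inference time.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
with torch.inference_mode():
    y = qmodel(x)
print(y.shape)  # torch.Size([1, 10])
```

Dynamic quantization trades a small accuracy loss for lower memory traffic and faster matmuls; static quantization, pruning, and distillation occupy different points on the same accuracy/efficiency curve.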

What We’re Looking For

• Deep expertise in optimizing inference for machine learning models, with a strong understanding of techniques such as speculative decoding (sketched after this list), model compression, low-precision computation, and hardware-specific tuning.

• Strong programming skills in Python, with experience in frameworks like PyTorch, TensorFlow, or ONNX, and familiarity with inference deployment tools such as TensorRT or TVM.

• Knowledge of hardware architectures and accelerators, including GPUs, TPUs, and edge devices, and their impact on inference performance.

• Experience in designing and evaluating scalable, low-latency inference pipelines for production systems.

• A solid understanding of the trade-offs between model accuracy, latency, and computational efficiency in deployment scenarios.

• Strong problem-solving skills and a passion for exploring innovative techniques to push the boundaries of real-time and resource-constrained inference.
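
For readers unfamiliar with the speculative decoding mentioned above, here is a minimal greedy sketch in plain Python; the draft and target "models" are hypothetical next-token functions standing in for real networks.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token predictor

def speculative_decode(
    target: NextToken, draft: NextToken,
    prompt: List[Token], k: int, steps: int,
) -> List[Token]:
    """Greedy speculative decoding: a cheap draft model proposes k tokens,
    the expensive target model verifies them, and decoding falls back to
    the target's token at the first disagreement."""
    out = list(prompt)
    while len(out) < len(prompt) + steps:
        # 1. Draft k tokens autoregressively with the cheap model.
        ctx, proposal = list(out), []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2. Verify: accept draft tokens while the target agrees.
        for t in proposal:
            expected = target(out)
            if t == expected:
                out.append(t)         # accepted draft token
            else:
                out.append(expected)  # target's correction; stop verifying
                break
    return out[: len(prompt) + steps]

# Toy models: the target counts up by 1; the draft errs after multiples of 5.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + (2 if ctx[-1] % 5 == 0 else 1)
print(speculative_decode(target, draft, [0], k=4, steps=10))
# [0, 1, 2, ..., 10] -- identical to decoding with the target alone
```

In a real system the target verifies all k drafted positions in one batched forward pass, which is where the latency win comes from; the sketch spells out the accept/reject logic sequentially for clarity, and greedy verification guarantees output identical to target-only decoding.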

Nice-to-Haves

• Experience with speculative decoding and other emerging techniques for improving inference performance.

• Familiarity with stateful or streaming inference techniques (see the sketch after this list).

• Background in designing hybrid architectures or task-specific models optimized for inference.

• Early-stage startup experience or a track record of developing and deploying efficient inference systems in fast-paced R&D environments.
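
To make "stateful or streaming inference" concrete: a recurrent model such as an SSM carries a fixed-size hidden state between chunks, so a stream can be processed as it arrives with constant work per step. Below is a toy linear state-space recurrence; all dimensions and matrices are invented for illustration.

```python
import torch

# Toy linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
torch.manual_seed(0)
d_state, d_in = 16, 4
A = 0.9 * torch.eye(d_state)          # state transition (decaying memory)
B = 0.1 * torch.randn(d_state, d_in)  # input projection
C = 0.1 * torch.randn(1, d_state)     # output projection

h = torch.zeros(d_state, 1)           # persistent state carried across the stream
for x in torch.randn(8, d_in, 1):     # frames arrive one at a time
    h = A @ h + B @ x                 # constant work per frame
    y = C @ h                         # emit output without re-reading history
    print(y.item())
```

Because the state h summarizes everything seen so far, each new frame costs O(1) memory and compute, in contrast to attention, whose per-step cost grows with the context length.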

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Our perks

🍽 Lunch, dinner and snacks at the office.

🏥 Fully covered medical, dental, and vision insurance for employees.

🏦 401(k).

✈️ Relocation and immigration support.

🦖 Your own personal Yoshi.

Average salary estimate

$125,000 / year (est.)
min: $100,000
max: $150,000

If an employer mentions a salary or salary range in their job posting, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if one is available.

What You Should Know About Researcher: Inference, Cartesia

At Cartesia, we're on an ambitious mission to create the next generation of AI that enhances our everyday lives. As a Researcher specializing in Inference, you will play a pivotal role in how our AI models process modalities such as audio, video, and text. Your expertise will help us optimize inference pipelines, ensuring that our state-of-the-art models run efficiently across diverse environments, from edge devices to cloud systems. In this position, you'll design and implement techniques like model quantization and pruning, and apply speculative decoding methods that reduce latency and resource consumption.

Working closely with our cross-functional team of engineers and designers, you'll prototype and refine workflows that improve model execution. We foster a dynamic environment where collaboration and learning are encouraged, making this an ideal place for people who thrive on cutting-edge research and are eager to push the boundaries of AI. As part of our in-person team in San Francisco, you'll build strong relationships and share knowledge while enjoying perks like complimentary meals and comprehensive health coverage. If you have a deep understanding of inference optimization and love the thrill of fast execution in a supportive culture, we'd love to hear from you!

Frequently Asked Questions (FAQs) for Researcher: Inference Role at Cartesia
What are the key responsibilities of a Researcher: Inference at Cartesia?

As a Researcher: Inference at Cartesia, your main responsibilities will include conducting research aimed at improving the efficiency and scalability of AI models. You will design and optimize inference pipelines, develop novel techniques for model execution, and investigate trade-offs between model quality and inference efficiency, all while collaborating closely with cross-functional teams.

What qualifications are required for the Researcher: Inference position at Cartesia?

Candidates for the Researcher: Inference position at Cartesia should possess deep expertise in optimizing inference for machine learning models. Strong programming skills in Python, experience with frameworks like PyTorch or TensorFlow, and knowledge of hardware architectures are essential qualifications for this role.

What programming skills are necessary for the Researcher: Inference role at Cartesia?

For the Researcher: Inference role at Cartesia, strong programming skills in Python are crucial. Familiarity with machine learning frameworks such as PyTorch, TensorFlow, or ONNX is also important, alongside experience using inference deployment tools like TensorRT or TVM to optimize the efficiency of AI models.

How does Cartesia support career growth for a Researcher: Inference?

At Cartesia, we support career growth for a Researcher: Inference through an open and inclusive culture that encourages collaboration and learning from each other daily. We provide the resources people need to succeed, fostering an environment where innovative ideas thrive and real-time problem-solving is key.

What is the work culture like for a Researcher: Inference at Cartesia?

The work culture for a Researcher: Inference at Cartesia is dynamic and collaborative. We prioritize execution speed and maintain high standards of quality and design. Our team enjoys working in-person at our San Francisco office, building strong relationships and supporting each other while pursuing groundbreaking research.

Common Interview Questions for Researcher: Inference
Can you explain your experience with optimizing inference for machine learning models?

In answering this question, highlight specific projects where you successfully improved inference efficiency. Discuss the techniques you employed, such as model compression or hardware-aware optimizations, and the impact your contributions had on performance and latency.

What novel techniques have you implemented for model execution in your previous roles?

When discussing novel techniques, provide concrete examples, such as effective quantization or pruning strategies. Explain how these methods contributed to better resource utilization and faster inference speeds in your past projects.

How do you approach designing inference pipelines for production systems?

To approach this question effectively, outline your step-by-step process in designing scalable inference pipelines. Mention the importance of considering trade-offs between accuracy, latency, and computational efficiency while ensuring seamless integration into production environments.

Describe a challenging problem you encountered while conducting inference research and how you resolved it.

In response, share a specific challenge related to inference efficiency or model scalability. Explain the strategies you employed to tackle the problem, emphasizing your problem-solving skills and the insights you gained from the experience.

What programming languages and frameworks do you prefer for inference-related tasks, and why?

When discussing your preferred languages and frameworks, clarify your expertise in Python and familiarity with PyTorch or TensorFlow. Explain why these tools are your favorites, and provide examples where they played a pivotal role in your research or project success.

How familiar are you with speculative decoding methods, and how have you applied them?

In your response, discuss your knowledge of speculative decoding methods and cite instances where you implemented them. Emphasize the advantages they brought to inference processes, focusing on reduced latency and improved performance in real-world applications.

What are some common trade-offs between model accuracy and inference efficiency?

When addressing this question, discuss trade-offs such as precision vs. speed, and model complexity vs. resource consumption. Illustrate with examples from your own experience where you've balanced these factors to meet particular business or technical requirements.

Can you describe your experience with hardware accelerators and their influence on inference performance?

Highlight your experience with hardware accelerators, such as GPUs or TPUs. Explain how you’ve optimized models to leverage these technologies for better performance, providing specific projects or successes that illustrate your point.

How do you incorporate collaboration into your research effectively?

Share how you prioritize collaboration in your research work, possibly discussing tools or methodologies that foster communication. Mention examples of successful projects achieved through teamwork, emphasizing your ability to integrate feedback from cross-functional teams.

What excites you the most about AI and inference research?

In your answer, convey your passion for AI by discussing the potential for transformative technologies. Mention specific areas of inference research that intrigue you the most, and how advancements in these areas might lead to new applications or improvements in AI capabilities.



EMPLOYMENT TYPE
Full-time, on-site

DATE POSTED
December 12, 2024
