
Researcher: Multimodal (Data)

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhD students at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

The Role

• Lead the design, creation, and optimization of datasets for training and evaluating multimodal models across diverse modalities, including audio, text, video, and images.

• Develop strategies for curating, aligning, and augmenting multimodal datasets to address challenges in synchronization, variability, and scalability.

• Design innovative methods for data augmentation, synthetic data generation, and cross-modal sampling to enhance the diversity and robustness of datasets.

• Create datasets tailored for specific multimodal tasks, such as audio-visual speech recognition, text-to-video generation, or cross-modal retrieval, with attention to real-world deployment needs.

• Collaborate closely with researchers and engineers to ensure datasets are optimized for target architectures, training pipelines, and task objectives.

• Build scalable pipelines for multimodal data processing, annotation, and validation to support research and production workflows.
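The synchronization and augmentation responsibilities above can be illustrated with a small sketch. It assumes a clip is stored as a mono audio sample array plus a list of decoded video frames with a known sample rate and frame rate. The `Clip` type and `synced_random_crop` function are hypothetical illustrations, not part of any Cartesia tooling or of librosa/OpenCV. The point it shows: a random temporal crop has to cut both streams over the same time window, or the augmentation silently breaks audio-video alignment.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Clip:
    """Hypothetical container for one audio-video training example."""
    audio: Sequence[float]    # mono audio samples
    frames: Sequence[object]  # decoded video frames (any per-frame type)
    sample_rate: int          # audio samples per second, e.g. 16000
    fps: float                # video frames per second, e.g. 25.0


def synced_random_crop(clip: Clip, duration_s: float, rng: random.Random) -> Clip:
    """Hypothetical helper: crop audio and video to the same random time window."""
    total_s = min(len(clip.audio) / clip.sample_rate, len(clip.frames) / clip.fps)
    if duration_s > total_s:
        raise ValueError("clip is shorter than the requested crop")

    # Choose the start on a video-frame boundary so no frame is split.
    max_start_frame = int((total_s - duration_s) * clip.fps)
    start_frame = rng.randint(0, max_start_frame)
    n_frames = round(duration_s * clip.fps)

    # Convert the same start time and duration into audio sample indices.
    start_sample = round(start_frame / clip.fps * clip.sample_rate)
    n_samples = round(duration_s * clip.sample_rate)

    return Clip(
        audio=clip.audio[start_sample:start_sample + n_samples],
        frames=clip.frames[start_frame:start_frame + n_frames],
        sample_rate=clip.sample_rate,
        fps=clip.fps,
    )
```

With 16 kHz audio and 25 fps video, each frame spans 640 samples, so a 1-second crop yields 25 frames and 16,000 samples that start at the same instant. Snapping the start to a frame boundary is one design choice among several; a real pipeline might instead resample or interpolate.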

What We’re Looking For

• Expertise in multimodal data curation and processing, with a deep understanding of challenges in combining diverse data types like audio, text, images, and video.

• Proficiency in tools and libraries for handling specific modalities, such as librosa (audio), OpenCV (video), and Hugging Face (text).

• Familiarity with data alignment techniques, including time synchronization for audio and video, embedding alignment for cross-modal learning, and temporal consistency checks.

• Strong understanding of multimodal dataset design principles, including methods for ensuring data diversity, sufficiency, and relevance for targeted applications.

• Programming expertise in Python and experience with frameworks like PyTorch or TensorFlow for building multimodal data pipelines.

• Comfortable with large-scale data processing and distributed systems for multimodal dataset storage, processing, and management.

• A collaborative mindset with the ability to work cross-functionally with researchers, engineers, and product teams to align data strategies with project goals.
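Time synchronization and temporal consistency checks, as listed above, often form an early filtering stage in a multimodal pipeline. The following is a minimal sketch. It assumes each clip provides an audio sample count, a sample rate, and per-frame presentation timestamps in seconds. The function name, the 0.1 s duration tolerance, and the 1.5-frame gap threshold are hypothetical choices, not a standard API. The check flags clips whose audio and video durations disagree or whose frame timestamps go backwards or skip frames.

```python
from typing import List, Sequence


def temporal_consistency_issues(
    n_audio_samples: int,
    sample_rate: int,
    frame_times_s: Sequence[float],
    fps: float,
    max_duration_gap_s: float = 0.1,  # hypothetical tolerance
) -> List[str]:
    """Hypothetical check: return problems found; an empty list means the clip passed."""
    issues: List[str] = []
    frame_period = 1.0 / fps

    # 1. Audio and video streams should cover roughly the same duration.
    audio_s = n_audio_samples / sample_rate
    video_s = len(frame_times_s) * frame_period
    if abs(audio_s - video_s) > max_duration_gap_s:
        issues.append(f"duration mismatch: audio {audio_s:.3f}s vs video {video_s:.3f}s")

    # 2. Frame timestamps should advance by about one frame period per step.
    for i in range(1, len(frame_times_s)):
        step = frame_times_s[i] - frame_times_s[i - 1]
        if step <= 0:
            issues.append(f"non-increasing timestamp at frame {i}")
        elif step > 1.5 * frame_period:
            issues.append(f"gap of {step:.3f}s before frame {i} (dropped frames?)")
    return issues
```

A clean 4-second clip at 25 fps with 64,000 samples at 16 kHz returns an empty list. A clip with one dropped frame is flagged by the gap check even though its overall duration is still within tolerance.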

Nice-to-Haves

• Experience in creating synthetic multimodal datasets using generative models, simulation environments, or advanced augmentation techniques.

• Background in annotating and aligning multimodal datasets for tasks such as audio-visual speech recognition, video-captioning, or multimodal reasoning.

• Early-stage startup experience or a proven track record of building datasets for cutting-edge research in fast-paced environments.

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Our perks

🍽 Lunch, dinner and snacks at the office.

🏥 Fully covered medical, dental, and vision insurance for employees.

🏦 401(k).

✈️ Relocation and immigration support.

🦖 Your own personal Yoshi.

Average salary estimate

$125,000 / year (est.)
Min: $100,000
Max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Researcher: Multimodal (Data), Cartesia

At Cartesia, we're on a mission to redefine the future of AI, and the Researcher: Multimodal (Data) role, based in the vibrant city of San Francisco, is a key part of it. If you're passionate about shaping the next generation of AI systems that will process and reason over vast streams of audio, video, and text, then this is the opportunity for you. Your main focus will be leading the design and optimization of the datasets used to train and evaluate our multimodal models. You will collaborate with talented researchers and engineers, using your expertise in data curation to develop strategies that tackle synchronization and scalability challenges. You'll get to flex your creative muscles by designing methods for data augmentation and synthetic data generation, increasing the diversity of datasets tailored to specific tasks like audio-visual speech recognition and cross-modal retrieval. At Cartesia, we pride ourselves on a fast-paced, collaborative culture where you can connect with coworkers who share your enthusiasm for cutting-edge technology. Our team values both quality and execution speed, and we support each other every step of the way. If you have a strong background in multimodal data processing, programming expertise in Python, and a desire to work in an exciting and innovative environment, we'd love to hear from you. Join us and help build the interactive intelligence that is set to change the world!

Frequently Asked Questions (FAQs) for Researcher: Multimodal (Data) Role at Cartesia
What are the key responsibilities of a Researcher: Multimodal (Data) at Cartesia?

As a Researcher: Multimodal (Data) at Cartesia, your key responsibilities will include leading the design and optimization of datasets for training multimodal models, developing curation strategies to manage diverse data types, and collaborating closely with engineers and researchers to ensure the datasets meet project goals. You'll also create innovative methods for data augmentation and synthetic data generation to enhance dataset diversity and robustness.

What qualifications are needed for the Researcher: Multimodal (Data) position at Cartesia?

To qualify for the Researcher: Multimodal (Data) position at Cartesia, candidates should have expertise in multimodal data curation, proficiency with relevant tools and libraries like librosa and OpenCV, a strong understanding of dataset design principles, and programming experience in Python. Experience with frameworks such as PyTorch or TensorFlow for building multimodal data pipelines is also expected.

What experience is preferred for the Researcher: Multimodal (Data) role at Cartesia?

Preferred experience for the Researcher: Multimodal (Data) role at Cartesia includes having a background in creating synthetic multimodal datasets, annotating datasets for tasks like audio-visual speech recognition, and familiarity with fast-paced startup environments. Such experience will help ensure you can contribute effectively to our innovative projects.

What programming languages and frameworks should a Researcher: Multimodal (Data) know at Cartesia?

A Researcher: Multimodal (Data) at Cartesia should be proficient in Python and have experience with frameworks like PyTorch or TensorFlow. Knowledge of tools for handling specific data types, such as librosa for audio and OpenCV for video, is essential as well.

What is the work culture like for a Researcher: Multimodal (Data) at Cartesia?

The work culture at Cartesia is vibrant and collaborative. As a Researcher: Multimodal (Data), you will be part of a dedicated team that values open communication and inclusivity. With a focus on innovation and rapid execution, the team supports each other in achieving success while enjoying a fun, engaging office environment in San Francisco.

Common Interview Questions for Researcher: Multimodal (Data)
Can you explain your approach to designing datasets for multimodal training?

When answering this question, discuss your strategies for data curation, including how you would ensure diversity and relevance of the datasets. Mention techniques like data augmentation and synthetic data generation that you have previously applied.

How do you handle synchronization issues in multimodal data?

To effectively answer this question, highlight your methods for addressing synchronization challenges, emphasizing your familiarity with time synchronization for audio and video, embedding alignment, and temporal consistency checks. Providing examples from past experiences can strengthen your response.
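Embedding alignment, mentioned in the answer guidance above, is commonly checked with a cross-modal retrieval metric. The following is a minimal sketch. It assumes paired embeddings where item i in one list corresponds to item i in the other, for example audio clips and their transcripts. The embeddings would come from whatever encoders are being evaluated, and the function names are hypothetical. The sketch computes recall@1, meaning the fraction of queries whose most cosine-similar target is their own pair.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def recall_at_1(queries: Sequence[Sequence[float]], targets: Sequence[Sequence[float]]) -> float:
    """Hypothetical metric helper: share of queries whose nearest target has the same index."""
    if len(queries) != len(targets) or not queries:
        raise ValueError("need the same, non-zero number of queries and targets")
    q = [_unit(v) for v in queries]
    t = [_unit(v) for v in targets]
    hits = 0
    for i, qi in enumerate(q):
        # Cosine similarity of query i against every target.
        sims = [sum(a * b for a, b in zip(qi, tj)) for tj in t]
        best = max(range(len(t)), key=sims.__getitem__)
        if best == i:
            hits += 1
    return hits / len(q)
```

Randomly paired embeddings would score roughly 1/N on N items, so a well-aligned encoder pair should score far above that baseline. This brute-force loop is only for illustration; large evaluations would use batched matrix multiplication.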

What tools do you use for processing multimodal data?

In your response, mention the tools and libraries you are proficient with, such as librosa for audio processing, OpenCV for video handling, and Hugging Face for textual data. Discuss how these tools facilitate efficient data processing in your projects.

Describe a challenge you faced while creating a multimodal dataset and how you overcame it.

Use this opportunity to share a specific example of a challenge you encountered, such as issues with data alignment or diversity. Discuss the solutions you implemented and their outcomes, showcasing your problem-solving skills.

How do you collaborate with engineers and product teams in your workflow?

Highlight the importance of communication and collaboration in your work. Provide examples of how you have worked cross-functionally to align data strategies with project objectives, and emphasize the value of open dialogue in achieving common goals.

What innovative methods for data augmentation have you explored?

Discuss any novel data augmentation techniques you have implemented or researched. Explain how these methods have positively impacted the robustness and diversity of datasets tailored for specific multimodal tasks.

Can you detail your experience with large-scale data processing?

When answering, discuss the scale of datasets you have worked with and your experience with distributed systems for multimodal data management. Providing specific metrics or examples can illustrate your proficiency in handling large data volumes.

What are your thoughts on the future of multimodal AI systems?

Share your insights on trends and advancements in multimodal AI. Reflect on the potential applications of these systems, emphasizing how your role at Cartesia could contribute to this evolving landscape.

Why do you want to work as a Researcher: Multimodal (Data) at Cartesia?

Articulate your passion for AI and data science, and explain how Cartesia's mission resonates with your interests. Discuss your enthusiasm for thriving in a collaborative workplace and your desire to contribute to innovative projects.

How do you prioritize tasks when managing multiple datasets?

In your response, explain your organizational strategies for managing multiple datasets. Discuss how you assess project requirements and align priorities to ensure efficient workflows and timely delivery of results.

Similar Jobs
• AbbVie (Hybrid, South San Francisco, CA, USA), posted 4 days ago
• Amplifier Health (Remote, no location specified), posted 9 days ago

Founded in 1992, Cartesia, Inc. is a group of talented professionals providing custom solutions in the areas of engineering design automation, Web-based applications development, and Microsoft Windows-based software construction and integration. ...

7 jobs
EMPLOYMENT TYPE
Full-time, on-site
DATE POSTED
December 12, 2024
