
Software Engineer, Data

Who we are


At Twelve Labs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.


With a remarkable $107 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.


We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.


As a Software Engineer, Data at Twelve Labs, you will build core data infrastructure for acquiring, preprocessing, cleaning, filtering, and labeling multimodal text-vision datasets for model training. In this role, you will have a larger impact on the quality of our models than perhaps any other engineering role at the company: well-filtered and well-labeled data is core to everything we do. This role is a perfect fit for distributed systems engineers who want to advance video understanding by delivering world-class systems for *unstructured* multimodal corpora.

In this role, you will

  • Acquire, filter, label (leveraging techniques like RLAIF), and sanitize large-scale vision-language datasets for LLM/VLM pretraining

  • Scale our data systems to enable our evolution from double-digit to triple-digit billion-parameter models (and beyond!)

  • Mentor junior engineers and researchers, and hold a high bar for code quality and engineering best practices

  • Establish strong relationships with third-party data vendors and human-in-the-loop data labeling services

  • Build the highest impact, not the flashiest, libraries and services

  • Lead by example in interviewing, hiring, and onboarding passionate and empathetic engineers

  • Work across teams to understand and manage project priorities and product deliverables, evaluate trade-offs, and drive technical initiatives from ideation to execution to shipment

You may be a good fit if you have

  • 7+ years of industry experience (or 4+ with a PhD in a related technical domain)

  • A PhD, or a Master's degree, in machine learning or a closely related discipline

  • Experience leading teams of 3+ engineers as a technical lead

  • Experience building model-bootstrapped language or vision-language datasets (RLAIF, etc.)

  • Experience managing data acquisition for large generative or contrastive models

  • Experience with FFmpeg or other high performance image/video processing libraries (bonus points for past work with such processing on GPUs/accelerators)

  • Deep experience as a backend and/or data engineer, and an interest in ML/AI systems

  • Strong Python expertise and considerable prior work history with at least one statically typed language (we use Golang)

  • Strong communication skills in written and spoken English

Interview and Onboarding Process:


1) Recruiter Phone Screen

2) Initial Technical Assessment

3) Final round technical assessment & culture interview

4) Reference Checks


We're also excited to share that we'll do global onboarding in Seoul for all new hires (paid company travel!).


Even if there are a few checkboxes that aren’t ticked through your prior experience, we still encourage you to apply! If you are a 0-to-1 achiever, a ferocious learner, and a kind and fun team player who motivates others, you will find a home at Twelve Labs.


We welcome applicants from all walks of life and are committed to equal-opportunity employment. We cherish and celebrate diversity not just because it is the right thing to do, but because it makes our company much stronger.



Benefits and Perks


🤝 An open and inclusive culture and work environment.

🧑‍💻 Work closely with a collaborative, mission-driven team on cutting-edge AI technology.

🦷 Full health, dental, and vision benefits

✈️ Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Year's.

🏙 Remote-flexible work, offices in San Francisco and Seoul, and a coworking stipend

🛂 Visa support (such as H-1B and OPT transfers for US employees)

Twelve Labs Glassdoor Company Review: 5.0
Twelve Labs DE&I Review: 3.0

Average salary estimate: $150,000 per year (est.)
Range: $120,000 (min) to $180,000 (max)

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Software Engineer, Data, Twelve Labs

At Twelve Labs, we're on a mission to revolutionize video understanding and multimodal AI, and we're looking for a passionate Software Engineer, Data to join our innovative team! In this role, you'll be at the heart of building our core data infrastructure, which is essential for acquiring, preprocessing, cleaning, filtering, and labeling the multimodal text-vision datasets we need for training our cutting-edge models. Imagine having a direct impact on the quality of our models by ensuring that we have the best-filtered and labeled data! As an engineer in this position, you'll help us scale our data systems and reach new heights as we evolve from double-digit to triple-digit billion parameter models. You'll also mentor junior engineers, maintain high standards in code quality, and foster strong relationships with third-party data vendors. We're looking for someone who not only has extensive experience but is also eager to collaborate across teams, manage project priorities, and drive technical initiatives from ideation to execution. If you hold a PhD or Master’s in a relevant discipline and have a strong background in data engineering, machine learning, and backend technologies, this is the perfect opportunity for you. Join us at Twelve Labs, and be part of a diverse and inclusive culture where your voice matters, and your work transforms the way we interact with media!

Frequently Asked Questions (FAQs) for Software Engineer, Data Role at Twelve Labs
What are the key responsibilities of a Software Engineer, Data at Twelve Labs?

As a Software Engineer, Data at Twelve Labs, you'll be responsible for acquiring, preprocessing, cleaning, filtering, and labeling multimodal text-vision datasets for model training. You will focus on scaling data systems and ensuring high-quality datasets, which are crucial for our video-language models. Additionally, you'll mentor junior engineers and manage relationships with data vendors.

What qualifications are needed for the Software Engineer, Data position at Twelve Labs?

To qualify for the Software Engineer, Data role at Twelve Labs, candidates should have at least 7 years of industry experience, or 4+ years with a PhD in a related technical domain. The posting also lists a PhD or Master's degree in machine learning or a closely related discipline, deep backend and/or data engineering experience, and experience managing data acquisition for large generative or contrastive models.

What skills are valued for the Software Engineer, Data at Twelve Labs?

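Candidates for the Software Engineer, Data position should demonstrate strong Python expertise, prior work with at least one statically typed language (the team uses Golang), experience building model-bootstrapped language or vision-language datasets, and strong written and spoken English communication skills. Experience with FFmpeg or other high-performance image/video processing libraries is also valued, with bonus points for GPU/accelerator work, as is experience leading teams of 3+ engineers as a technical lead.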

How do the interview and onboarding processes work for the Software Engineer, Data role at Twelve Labs?

The interview process consists of a recruiter phone screen, an initial technical assessment, a final-round technical assessment and culture interview, and reference checks. New hires then take part in a global onboarding in Seoul, with travel paid by the company.

What type of work culture can a Software Engineer, Data expect at Twelve Labs?

At Twelve Labs, we're proud of our open and inclusive culture. Engineers in this role work closely with a mission-driven team on cutting-edge AI technology and enjoy flexible PTO, a collaborative work environment, and full health, dental, and vision benefits. We celebrate diversity and welcome applicants from all backgrounds.

Common Interview Questions for Software Engineer, Data
Can you describe your experience with preprocessing multimodal datasets for AI training?

In your response, share specific examples of datasets you've worked with, the preprocessing techniques you utilized, and how your contributions improved the data quality for model training. Highlight any successful projects that demonstrate your ability to manage complex datasets effectively.

What strategies do you use to ensure high-quality data labeling?

Discuss techniques you've implemented for data labeling, such as human-in-the-loop validation or employing RLAIF methods. Emphasize the importance of accurate labels in training successful models and any metrics you've established to evaluate labeling quality.

How do you approach scaling data systems for large generative models?

Share your experience in building or scaling data systems, including any tools or frameworks you've used. Explain how you assessed the scalability needs of the models and the architectural decisions you made to ensure the systems could handle increased data loads.

Can you highlight a challenge you've faced in acquiring large datasets and how you overcame it?

Be prepared to discuss a specific challenge, whether it involved sourcing data, data quality issues, or technical obstacles. Explain the steps you took to address the issue and the outcomes that resulted.

What methodologies do you use to mentor junior engineers in your team?

Talk about your approach to mentorship, such as regular feedback sessions, code reviews, or pair programming. Highlight the positive impact this has had on team dynamics and the professional growth of junior engineers.

How do you stay updated on the latest developments in machine learning and data engineering?

Discuss your habits for staying informed, such as following industry publications, participating in workshops, or being involved in relevant communities. Show your commitment to continuous learning in this rapidly evolving field.

What has been your experience working with third-party data vendors?

Share specific examples of partnerships you've established with data vendors, including your approach to negotiations, ensuring quality, and maintaining ongoing relationships to ensure data integrity for projects.

How do you prioritize project deliverables when working on multiple initiatives?

Discuss your method of evaluating project importance and urgency, including any tools or frameworks you use. Share examples of how you've successfully managed competing priorities in the past.

Can you explain your experience with Python and any other programming languages you have used?

Be specific about your level of expertise in Python, including frameworks or libraries you've worked with. Discuss how you've leveraged other languages, such as Golang, in your data engineering tasks and the benefits they provided.

What role does communication play in your engineering work, especially in collaborative projects?

Emphasize the significance of clear communication and collaboration in engineering, particularly in cross-functional teams. Provide examples of how effective communication has facilitated project success and strengthened team relationships.

Employment type: Full-time, remote
Date posted: December 23, 2024