
Member of Technical Staff, ML Ops

Captions is the leading video AI company, building the future of video creation. Over 10 million creators and businesses have used Captions to create videos for social media, marketing, sales, and more. We're on a mission to serve the next billion.

We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. You'll join an early team and have an outsized impact on the product and the company's culture.

We’re very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures (Series C lead), Kleiner Perkins (Series B lead), Sequoia Capital (Series A and Seed co-lead), Andreessen Horowitz (Series A and Seed co-lead), Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Check out our latest financing milestone and some other coverage:

The Information: 50 Most Promising Startups

Fast Company: Next Big Things in Tech

The New York Times: When A.I. Bridged a Language Gap, They Fell in Love

Business Insider: 34 most promising AI startups

Time: The Best Inventions of 2024

Please note that all of our roles require you to be in person at our NYC HQ (located in Union Square).

We do not work with third-party recruiting agencies; please do not contact us.

About the role:

Captions seeks an exceptional MLOps Research Engineer (MOTS) to architect and scale the machine learning infrastructure for our rapidly growing creative platform used by millions. You'll own the development of our distributed training systems, optimize our expanding GPU clusters, and build performant inference pipelines that power our cutting-edge multimodal video diffusion models. As a key member of our ML Research team in a fast-growing Series C startup, you'll create foundational infrastructure enabling rapid research iteration while maintaining production-grade reliability and efficiency. We're already training large-scale models and are excited to dramatically expand our infrastructure capabilities.

Key Responsibilities:

Core Systems Development:

  • Develop and optimize distributed training frameworks integrating multiple modalities (video, audio, text, and structured metadata)

  • Build flexible systems for cross-modal training orchestration and efficient experimentation

  • Design reproducible training environments with versioned dependencies and configurations

  • Implement comprehensive testing frameworks for validating model training correctness and performance

  • Create infrastructure for systematic model quality assessment and performance benchmarking

Infrastructure Development:

  • Design and implement flexible training orchestration systems that balance research agility with large-scale model training

  • Build robust monitoring and observability systems for complex training and inference pipelines

  • Design and manage GPU clusters optimized for distributed training of multimodal models

  • Build out comprehensive automated metrics collection and alerting across our ML stack

System Optimization:

  • Profile and optimize model training throughput using mixed precision, gradient checkpointing, and advanced memory techniques

  • Develop custom CUDA and Triton kernels to accelerate critical compute paths

  • Implement creative solutions for cost optimization across spot instances and reserved capacity

  • Design and optimize real-time inference systems enabling fast research iteration cycles

Research & Product Impact:

  • Build infrastructure enabling rapid testing of research hypotheses

  • Create systems supporting close collaboration between infrastructure and research teams

  • Develop frameworks for reproducible research experimentation

  • Enable seamless deployment of research innovations to production

Requirements:

Technical Background:

  • Bachelor's or Master's degree in Computer Science, Machine Learning, or related field

  • Strong programming skills in Python and systems programming

  • Experience with distributed systems and scalable infrastructure

  • Track record of building reliable, performant large-scale ML systems

Areas of Expertise (Strong experience in some or all of these areas):

  • Deep expertise in PyTorch internals and distributed training frameworks (FSDP, DeepSpeed)

  • GPU cluster management and optimization

  • Performance profiling and systems optimization

  • CUDA programming and kernel optimization

  • Containerization and orchestration (Docker, Kubernetes)

  • ML model serving and deployment at scale

  • Language models and attention mechanism optimization

  • Video and audio processing pipelines

  • Large-scale diffusion models

Engineering Approach:

  • Love diving deep into complex systems optimization challenges

  • Take ownership of critical infrastructure while collaborating effectively

  • Get excited about pushing the boundaries of ML system performance

  • Want to work directly with researchers on cutting-edge ML problems

  • Thrive in fast-paced, research-driven environments

About the Team:

You'll work full-time, on-site in our NYC office alongside researchers and engineers who are dedicated to building world-class generative models and data infrastructure. We've intentionally built a culture that prizes open discussion of technical approaches, rapid iteration, and direct access to decision makers. Your success will be measured by the performance and reliability of our systems, which enable our researchers to develop ambitious ideas and iterate on them quickly. You'll have significant autonomy to shape our infrastructure direction and a direct impact on our ability to serve millions of creators.

Our team values:

  • Open technical discussions and collaboration

  • Rapid iteration and practical solutions

  • Deep technical expertise and continuous learning

  • Direct impact on research and product outcomes

Benefits:

  • Comprehensive medical, dental, and vision plans

  • 401(k) with employer match

  • Commuter Benefits

  • Catered lunch multiple days per week

  • Dinner stipend every night if you're working late and want a bite!

  • DoorDash DashPass subscription

  • Health & Wellness Perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)

  • Multiple team offsites per year with team events every month

  • Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Please note that benefits apply to full-time employees only.

What You Should Know About Member of Technical Staff, ML Ops, Captions

Are you ready to elevate your career with Captions as a Member of Technical Staff in ML Ops? Located in the heart of New York City, Captions stands at the forefront of video AI, with over 10 million creators and businesses having used it to create videos. This is an opportunity to join an innovative team dedicated to scaling a platform that has captured the attention of investors and media alike. As a pivotal part of this team, you'll help architect machine learning infrastructure, optimize distributed training systems, and build efficient inference pipelines that power our multimodal video diffusion models. With responsibility for ensuring the performance and reliability of our systems, your work will directly impact the creative experiences of millions.

We're looking for someone with strong programming skills in Python and experience with scalable infrastructure who is excited about pushing the envelope on machine learning performance. Your role will involve collaborating closely with our ML research team, allowing you to dive deeply into complex challenges in a workplace culture that values open dialogue and innovative thinking. If you're looking for a position where your contributions matter and want to help shape the future of video technology, Captions is the place for you. Join us in creating the next big thing in AI and video technology, while enjoying a suite of perks that support your career and personal well-being.

Frequently Asked Questions (FAQs) for Member of Technical Staff, ML Ops Role at Captions
What will the Member of Technical Staff, ML Ops at Captions do regarding machine learning infrastructure?

As a Member of Technical Staff in ML Ops at Captions, you will be responsible for architecting and scaling the machine learning infrastructure essential to our video AI platform, which serves millions of users. Your work will include developing distributed training systems, optimizing GPU clusters, and designing robust inference pipelines that support our pioneering multimodal models, ensuring both efficiency and reliability.

What qualifications are required for the Member of Technical Staff, ML Ops position at Captions?

Candidates for the Member of Technical Staff, ML Ops role at Captions should ideally hold a Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field. Strong programming skills, particularly in Python, coupled with experience in distributed systems and scalable infrastructure are essential. Familiarity with PyTorch, GPU cluster management, and performance optimization will greatly enhance your application.

What type of experience is beneficial for the Member of Technical Staff, ML Ops role at Captions?

For the Member of Technical Staff, ML Ops position at Captions, experience in deep learning frameworks, specifically deep expertise in PyTorch and distributed training techniques, is highly beneficial. Candidates should also have hands-on experience with CUDA programming, containerization tools like Docker and Kubernetes, and be well-versed in ML model deployment practices. A passion for tackling complex systems optimization challenges will set you apart.

How does the Member of Technical Staff, ML Ops contribute to Captions' mission?

The Member of Technical Staff in ML Ops plays a crucial role in advancing Captions' mission of building cutting-edge video AI technologies. By developing and optimizing machine learning infrastructure, you will enable rapid research iterations and support the deployment of innovative solutions that enhance creator experiences, thus directly impacting the user experience for our extensive community.

What is the work culture like at Captions for a Member of Technical Staff, ML Ops?

At Captions, the work culture for a Member of Technical Staff in ML Ops thrives on collaboration, open discussion, and technical expertise. The team values rapid iteration, practical solutions, and offers substantial autonomy to drive infrastructure direction. You'll work closely with researchers in a fast-paced environment, actively contributing to projects that have a tangible impact on the future of video technology.

Common Interview Questions for Member of Technical Staff, ML Ops
Can you explain your experience with distributed systems and how it applies to the Member of Technical Staff, ML Ops role?

When answering this question, highlight specific projects where you've developed or optimized distributed systems. Discuss the technologies and frameworks you've worked with, like PyTorch and FSDP, and emphasize how your expertise can contribute to building scalable ML infrastructures at Captions.

What challenges have you faced in ensuring the performance of GPU clusters?

In your response, describe a specific challenge you encountered with GPU cluster performance. Detail how you identified the bottleneck, the measures you took to resolve the issue, and the outcomes. This demonstrates your problem-solving skills and direct relevance to the Member of Technical Staff, ML Ops role.

How do you approach designing a reproducible training environment?

Discuss your systematic approach to building reproducible training environments. Emphasize the importance of versioned dependencies and configurations, and how you ensure that different team members can replicate experiments successfully. Clear communication here will illustrate your technical depth for Captions.

Explain your experience with performance profiling and optimization in machine learning systems.

Provide a clear narrative around your past experiences with performance profiling. Mention the tools you used and the detailed metrics you gathered from profiling sessions, and explain how your optimizations led to measurable improvements in training speed or resource efficiency.

What strategies would you employ for cost optimization in a distributed training environment?

In this answer, talk about your familiarity with cloud services and the use of spot instances versus reserved capacity. Discuss strategies you’ve implemented in the past that achieved significant cost savings while maintaining performance.

Describe your experience with containerization and orchestration tools.

Articulate your hands-on experience with tools like Docker and Kubernetes. Discuss how you've employed them in past projects to ensure scalable deployment environments and enhance collaboration between teams.

How do you ensure close collaboration between infrastructure and research teams?

Explain your methods for facilitating communication between teams, including regular meetings, shared documentation, and joint projects. Highlighting collaborative work signals to Captions your ability to thrive in a research-driven environment.

What excites you about working with cutting-edge ML technology?

Share your personal passion for exploring and implementing new technologies in the machine learning space. Discuss how this aligns with Captions' innovative culture and mission in revolutionizing video creation.

How do you handle fast-paced work environments that require quick iterations?

Describe your time management and prioritization strategies. Give examples of instances where you've quickly adapted to new information and made rapid decisions, demonstrating that you can thrive in the exciting, dynamic atmosphere at Captions.

What contributions would you make as a Member of Technical Staff, ML Ops at Captions?

Concisely outline specific technical contributions you would bring to the team, such as improving model training efficiency, building robust inference pipelines, or facilitating better communication between teams. This showcases both your proactive thinking and alignment with Captions' goals.

Employment type: Full-time, on-site
Date posted: March 23, 2025
