Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built on exclusively licensed and owned data for professional use in Hollywood and enterprise applications.
Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Microsoft, Snap, and Meta have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we’ve raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures, and Y Combinator – and we’re just getting started.
Role Summary:
We're looking for a Data Engineer to build the data pipelines driving our next-generation generative video models. This role is central to our mission of training models exclusively on clean, high-quality data.
In this role, you'll collaborate with the Data Engineering Lead to develop data ingestion pipelines, captioning systems, and high-throughput, distributed architectures for large-scale data processing and curation.
What You'll Do:
Build scalable, high-throughput data pipelines optimized for multi-modal video model training.
Build systems for data ingestion, deduplication, quality assessment, validation, filtering, and labeling to ensure only clean, high-quality data flows through the pipeline.
Optimize distributed data processing and orchestration frameworks (e.g., Apache Spark, Ray, Airflow).
Work with infrastructure teams to scale pipelines across thousands of GPUs.
Implement strong observability and telemetry for all aspects of the data lifecycle.
What We're Looking For:
Deep experience in building and scaling data infrastructure for large-scale ML systems, ideally for video or multi-modal models.
Solid background in ML engineering, including hands-on experience in training and optimizing classifiers.
Experience managing large-scale datasets and pipelines in production.
Expertise in Python, Spark, Airflow, or similar data frameworks.
Understanding of modern infrastructure: Kubernetes, Terraform, object stores (e.g., S3, GCS), and distributed computing environments.
Skilled at balancing rapid, iterative delivery with a focus on long-term technical vision, ensuring solutions are both pragmatic and architecturally elegant.
Nice to Haves:
Experience working on foundational model training pipelines (image, video, or language).
Experience with video-specific data challenges like frame sampling, codec variability, temporal alignment, and perceptual quality scoring.
On our team, we approach our work with a dedication similar to that of Olympic athletes. Expect occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we communicate this expectation openly.
If you're motivated by deeply technical problems, a seemingly never-ending uphill battle, and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.
All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. We meet as a company a few times every year, usually in London, UK, or North America (LA, Toronto).
If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!
The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work.
Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.
Please be assured we'll treat any information you share with us with the utmost care, use it only for recruitment purposes, and never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy for further information.