Job details

AI Data Engineer

Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built on exclusively licensed and owned data for professional use in Hollywood and enterprise applications.

Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Microsoft, Snap, and Meta have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we’ve raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures, and Y Combinator – and we’re just getting started.

Role Summary:

We're looking for a Data Engineer to build the data pipelines driving our next-generation generative video models. This role is central to our mission of training models exclusively on clean, high-quality data.

In this role, you'll collaborate with the Data Engineering Lead to develop data ingestion pipelines, captioning systems, and high-throughput, distributed architectures for large-scale data processing and curation.

What You'll Do:

  • Build scalable, high-throughput data pipelines optimized for multi-modal video model training.

  • Build systems for data ingestion, deduplication, quality assessment, validation, filtering, and labeling to ensure only clean, high-quality data flows through the pipeline.

  • Optimize distributed data processing frameworks (e.g., Apache Spark, Ray, Airflow).

  • Work with infrastructure teams to scale pipelines across thousands of GPUs.

  • Implement strong observability and telemetry for all aspects of the data lifecycle.

What We're Looking For:

  • Deep experience in building and scaling data infrastructure for large-scale ML systems, ideally for video or multi-modal models.

  • Solid background in ML engineering, including hands-on experience in training and optimizing classifiers.

  • Experience managing large-scale datasets and pipelines in production.

  • Expertise in Python, Spark, Airflow, or similar data frameworks.

  • Understanding of modern infrastructure: Kubernetes, Terraform, object stores (e.g. S3, GCS), and distributed computing environments.

  • Skilled at balancing rapid, iterative delivery with a focus on long-term technical vision, ensuring solutions are both pragmatic and architecturally elegant.

Nice to Haves:

  • Experience working on foundational model training pipelines (image, video, or language).

  • Experience with video-specific data challenges like frame sampling, codec variability, temporal alignment, and perceptual quality scoring.

On our team, we approach our work with dedication similar to that of Olympic athletes. Expect occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we communicate this expectation openly.

If you're motivated by deeply technical problems, a seemingly never-ending uphill battle and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.

All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. We meet a few times every year, usually in London, UK or North America (LA, Toronto) as a company.

If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!

The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work

Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.

Please be assured we'll treat any information you share with us with the utmost care, only use your information for recruitment purposes and will never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy located here for further information.

Average salary estimate

$125,000 / year (est.)
Min: $100,000
Max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About AI Data Engineer, Moonvalley AI

At Moonvalley, we’re on a mission to revolutionize the world of advertising and entertainment with our cutting-edge generative AI models. As an AI Data Engineer, you'll play a pivotal role in this exciting venture, focusing on building the data pipelines that will power our next-generation generative video models. Collaborating with our talented team, you'll help develop key systems for data ingestion and processing, ensuring only the highest quality, clean data fuels our magnificent creations. You’ll dive deep into optimizing distributed data processing frameworks such as Apache Spark and Airflow, and work alongside our infrastructure teams to scale pipelines across thousands of GPUs, enhancing the future of media and entertainment. We pride ourselves on the elite talent we've gathered, which includes AI scientists from DeepMind, Microsoft, Snap, and Meta. If you have a solid foundation in ML engineering and experience managing large datasets, this is the place for you to thrive. We’re looking for someone who can balance rapid development with long-term architectural vision. You’ll face deep technical challenges, but also experience the satisfaction of contributing to a generational technology company. At Moonvalley, we work hard with the same dedication as Olympic athletes, but we’re open about our expectations. We invite you to dive into this adventure with us, building a better future through innovative AI technology. If you're excited about what’s coming next in this space and eager to make an impact, we’d love to hear from you!

Frequently Asked Questions (FAQs) for AI Data Engineer Role at Moonvalley AI
What are the main responsibilities of the AI Data Engineer position at Moonvalley?

The AI Data Engineer at Moonvalley is responsible for building scalable, high-throughput data pipelines tailored for multi-modal video model training. This includes developing systems for data ingestion, deduplication, quality assessment, validation, and filtering, ensuring clean, high-quality data flows through the pipeline efficiently. Additionally, you'll optimize distributed processing frameworks like Apache Spark and work with infrastructure teams to scale these pipelines across numerous GPUs.

What qualifications are required for an AI Data Engineer at Moonvalley?

To be successful as an AI Data Engineer at Moonvalley, candidates should have extensive experience in building and scaling data infrastructure for large-scale machine learning systems, ideally for video or multi-modal models. A solid background in ML engineering, including hands-on experience training and optimizing classifiers, is essential, along with expertise in Python and frameworks like Spark and Airflow. Familiarity with modern infrastructure like Kubernetes and distributed computing environments is also highly valued.

What type of data challenges will the AI Data Engineer face at Moonvalley?

As an AI Data Engineer at Moonvalley, you will tackle various data challenges, particularly those related to video data. This may include frame sampling, codec variability, temporal alignment issues, and perceptual quality scoring. The role requires a keen understanding of the intricacies involved in managing large-scale datasets, ensuring optimal performance and high-quality results in generative video model training.

What is Moonvalley's company culture like for the AI Data Engineer role?

Moonvalley fosters a culture of dedication and innovation, where teamwork and collaboration are essential. We approach our work with the commitment and intensity of Olympic athletes. The role of AI Data Engineer involves tackling deeply technical challenges and requires a willingness to invest time into achieving our ambitious goals. Our team values open communication about expectations, which include occasional late nights and weekends dedicated to our mission of redefining media and entertainment.

How does the AI Data Engineer contribute to generative AI technology at Moonvalley?

The AI Data Engineer plays a crucial role in shaping the future of generative AI technology at Moonvalley. By building robust data pipelines and systems for efficient data processing, the engineer ensures that our generative video models are trained on the highest quality data. This foundational work is vital in creating the superb commercial and cinematic experiences that Moonvalley's technology aims to deliver.

Common Interview Questions for AI Data Engineer
Can you describe your experience with building data pipelines for machine learning projects?

When answering this question, focus on specific projects where you've successfully built and optimized data pipelines. Highlight the tools and technologies you utilized, such as Apache Spark or Airflow, and discuss the challenges you faced and how you overcame them. Emphasize the importance of high-quality data and any KPIs you used to measure pipeline success.

What methodologies do you follow when ensuring data quality and integrity?

Discuss your approach to data quality, including techniques like data validation, deduplication, and filtering. Share specific examples of how you’ve implemented these methodologies in previous roles. Highlight any tools or frameworks you've used to monitor data quality continuously and ensure that only clean data is used for training models.

How do you handle large-scale datasets in production?

In your response, outline the strategies you use for managing large datasets in production, such as using appropriate storage solutions, efficient data retrieval methods, and scalable architecture. Provide concrete examples of how you’ve successfully managed these datasets while minimizing latency and maximizing performance.

What experience do you have with distributed computing environments?

Elaborate on your experience with distributed computing environments like Kubernetes or cloud platforms. Discuss specific projects where you utilized these technologies to scale data processing. Highlight the advantages these environments offer for managing large-scale machine learning applications and how you've optimized resource usage.

Can you explain your experience with video-specific data challenges?

Answer this question by detailing your familiarity with video data challenges such as frame sampling and codec variability. Provide examples from past projects where you’ve successfully addressed these issues and emphasize your understanding of the unique requirements for processing video data in machine learning.

How do you prioritize tasks when working on multiple data engineering projects?

Discuss your prioritization strategies, such as using project management tools or following Agile methodologies. Provide examples of how you’ve successfully managed competing deadlines and ensured optimal results for multiple projects, illustrating your ability to remain focused and deliver quality work under pressure.

What techniques do you utilize to optimize data processing frameworks?

Outline your approach to optimizing data processing frameworks like Apache Spark or Airflow. Share specific techniques you’ve applied, including performance tuning, efficient resource allocation, and maximizing throughput. Include any metrics you used to evaluate the effectiveness of your optimizations.

What tools have you used for data observability and telemetry?

Mention the tools and platforms you have experience with for implementing observability and telemetry in data pipelines. Discuss the importance of monitoring and logging in maintaining data quality and performance, and share examples of how you've effectively utilized these tools in previous roles.

How do you balance rapid delivery with long-term technical vision?

Elaborate on your approach to balancing short-term execution with long-term strategy. Discuss examples of when you’ve had to make tough decisions to prioritize immediate results while ensuring that the technical solutions align with broader organizational goals. Showcase your ability to think both tactically and strategically.

What excites you most about working in AI and data engineering?

When responding to this question, express your genuine enthusiasm for innovation in AI and how it can transform industries. Share what aspects of data engineering you find most fulfilling, such as solving complex technical problems, contributing to groundbreaking projects, or working with cutting-edge technology. This helps interviewers gauge your passion for the field.

We've developed a groundbreaking machine learning model that can create visually stunning, high definition videos & animation from simple text prompts.

Employment type: Full-time, hybrid
Date posted: April 16, 2025
