
Member of Technical Staff - Foundational Model Data

Liquid AI, an MIT spin-off, is a foundation model company headquartered in Boston, Massachusetts. Our mission is to build capable and efficient general-purpose AI systems at every scale.


Our goal at Liquid is to build the most capable AI systems to solve problems at every scale, such that users can build, access, and control their AI solutions. This is to ensure that AI will get meaningfully, reliably and efficiently integrated at all enterprises. Long term, Liquid will create and deploy frontier-AI-powered solutions that are available to everyone.


We are seeking a highly skilled Member of Technical Staff, Foundation Model Data to play a critical role in our foundation model development process. This role focuses on consolidating, gathering, and generating high-quality text data for pretraining, midtraining, SFT, and preference optimization.


Key Responsibilities
  • Create and maintain data cleaning, filtering, and selection pipelines that can handle >100TB of data.
  • Watch for releases of public datasets on Hugging Face and other platforms.
  • Create crawlers to gather datasets from the web where public data is lacking.
  • Write and maintain synthetic data generation pipelines.
  • Run ablations to assess new datasets and judging pipelines.


Required Qualifications
  • Experience Level: B.S. + 5 years experience or M.S. + 3 years experience or Ph.D. + 1 year of experience.
  • Dataset Engineering: Expertise in data curation, cleaning, augmentation, and synthetic data generation techniques.
  • Machine Learning Expertise: Ability to write and debug models in popular ML frameworks, and experience working with LLMs.
  • Software Development: Strong programming skills in Python, with an emphasis on writing clean, maintainable, and scalable code.


Preferred Qualifications
  • M.S. or Ph.D. in Computer Science, Electrical Engineering, Math, or a related field.
  • Experience fine-tuning or customizing LLMs.
  • First-author publications in top ML conferences (e.g. NeurIPS, ICML, ICLR).
  • Contributions to popular open-source projects.


Average salary estimate

$125,000 / year (est.)
min: $100,000
max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Member of Technical Staff - Foundational Model Data, Liquid AI

At Liquid AI, an MIT spin-off located in Boston, Massachusetts, we are on a mission to build capable and efficient general-purpose AI systems at every scale. We're currently seeking a highly skilled Member of Technical Staff - Foundational Model Data to join our team. In this role, you will focus on consolidating, gathering, and generating high-quality text data for the pretraining, midtraining, supervised fine-tuning, and preference optimization stages of model development. Your responsibilities will include creating and maintaining data cleaning, filtering, and selection pipelines that handle datasets exceeding 100TB, watching for public dataset releases on platforms like Hugging Face, and building web crawlers to gather data where public data is lacking. You will write and maintain synthetic data generation pipelines and run ablations to assess new datasets and judging pipelines. We require a B.S. with five years of experience, an M.S. with three years, or a Ph.D. with at least one year, alongside a strong background in dataset engineering. The ability to write and debug models in popular machine learning frameworks and experience working with large language models are essential, along with strong programming skills in Python. Join us at Liquid AI, where your contributions will support our vision of making frontier-AI-powered solutions available to everyone.

Frequently Asked Questions (FAQs) for Member of Technical Staff - Foundational Model Data Role at Liquid AI
What are the responsibilities of a Member of Technical Staff - Foundational Model Data at Liquid AI?

As a Member of Technical Staff - Foundational Model Data at Liquid AI, your key responsibilities will include establishing and managing extensive data cleaning and filtering pipelines capable of handling over 100TB of data. You'll also monitor public dataset releases on platforms like Hugging Face and develop web crawlers to source datasets when public data availability is insufficient. Additionally, you'll maintain synthetic data generation pipelines and perform ablation studies to assess new datasets and pipelines.

What qualifications are required for the Member of Technical Staff - Foundational Model Data position at Liquid AI?

To qualify for the Member of Technical Staff - Foundational Model Data role at Liquid AI, candidates need to have a B.S. with five years of experience, an M.S. with three years of experience, or a Ph.D. with at least one year of relevant experience. Essential qualifications include expertise in data curation and cleaning, familiarity with machine learning frameworks, and strong programming skills in Python.

What programming skills are necessary for a Member of Technical Staff - Foundational Model Data at Liquid AI?

For the Member of Technical Staff - Foundational Model Data role at Liquid AI, strong programming skills in Python are crucial. Candidates should be adept at writing clean, maintainable, and scalable code. Knowledge of libraries and frameworks such as TensorFlow or PyTorch may also be advantageous as these tools are integral to machine learning development.

Is experience with large language models required for the Member of Technical Staff - Foundational Model Data position at Liquid AI?

Yes, experience with large language models (LLMs) is a required qualification for the Member of Technical Staff - Foundational Model Data position at Liquid AI. Candidates should be comfortable writing and debugging models in popular machine learning frameworks and have experience working with LLMs. Experience fine-tuning or customizing LLMs is listed as a preferred qualification rather than a requirement.

What are the preferred qualifications for the Member of Technical Staff - Foundational Model Data role at Liquid AI?

In addition to the required qualifications, Liquid AI prefers candidates with an M.S. or Ph.D. in Computer Science, Electrical Engineering, Math, or a related field. Experience fine-tuning or customizing LLMs, first-author publications in top ML conferences (e.g. NeurIPS, ICML, ICLR), and contributions to popular open-source projects are also listed as preferred qualifications for the Member of Technical Staff - Foundational Model Data role.

Common Interview Questions for Member of Technical Staff - Foundational Model Data
Can you describe your experience with data curation and cleaning for large datasets?

In answering this question, highlight specific projects or experiences where you managed or prepared large datasets. Discuss the tools and techniques used for data cleaning and curation, emphasizing scalability and efficiency in handling over 100TB of data.

What methods do you employ to stay updated on public dataset releases relevant to your work?

Here, discuss the strategies you use to monitor releases on platforms such as Hugging Face or GitHub. Mention the importance of networking with other professionals in the field, subscribing to relevant newsletters, or following key contributors on social media.

How would you approach creating a web crawler for data collection?

In your response, outline your understanding of web crawling fundamentals, the programming languages and libraries you would utilize, and your approach to ensuring the crawler adheres to ethical practices and robots.txt rules.

What challenges have you faced when generating synthetic data and how did you overcome them?

Discuss specific challenges such as ensuring data diversity and appropriately simulating real-world data distributions. Provide examples of adjustments made to your synthetic data generation processes or any debugging efforts you undertook.

How do you evaluate the effectiveness of a new dataset in your pipeline?

Explain the ablation studies you conduct, including the criteria for selecting datasets and how you measure their performance in your model training processes. Mention any statistical metrics you prefer to use.

What programming languages are you proficient in and how do they apply to your work in machine learning?

Talk about your proficiency in Python, and elaborate on how you have employed it in various machine learning projects, focusing on libraries like TensorFlow or PyTorch that are essential for developing large-scale AI models.

How do you ensure that your code is clean, maintainable, and scalable?

Discuss your coding practices, such as following design patterns, code reviews with peers, and utilizing version control systems. Emphasize the importance of writing documentation that helps other team members understand and build upon your code.

Can you explain your process in customizing large language models?

Describe any experience you have with fine-tuning LLMs, including the steps taken to adapt the models to specific tasks and how you handled the data required for training during these customizations.

What contributions have you made to open-source projects related to machine learning?

Share any specific projects you've contributed to, highlighting the impact of your contributions and what skills you applied or gained through this process.

Why do you want to work at Liquid AI as a Member of Technical Staff - Foundational Model Data?

Express your enthusiasm for Liquid AI’s mission and innovation in AI technologies. Specific knowledge of the company's relevant projects or product offerings can also help you show alignment between your values and its efforts to make AI solutions accessible.

Employment type: Full-time, on-site
Date posted: April 12, 2025
