ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote

At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users and 100k organizations who have collectively shared over 1M models, 300k datasets, and 300k apps. Our open-source libraries have more than 400k stars on GitHub.

About the Role

Smol models are an exciting area of research: they enable cheaper inference and can run on-device, which allows more customization and helps preserve privacy. The SmolLM team at Hugging Face is pushing the frontier of smol models by building high-quality pre-training and post-training datasets [1,2] and applying the latest architecture and training techniques to develop state-of-the-art models [2,3]. Dataset processing can leverage our scalable CPU cluster, and the models are trained on a state-of-the-art H100 cluster with close to 100 nodes.

In this internship you will work alongside the SmolLM team to build the next generation of smol language models, iterating quickly on datasets and models and ultimately training models on our distributed training infrastructure. If you are passionate about training LLMs and building high-quality datasets, and are proficient in Python, we would love to hear from you! Join the SmolLM team and collaborate on developing the best smol models in the field. Check out hf.co/science for more information about the science team at Hugging Face, and hf.co/HuggingFaceTB for more information on the SmolLM projects.

[1] The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale https://arxiv.org/abs/2406.17557

[2] SmolLM - blazingly fast and remarkably powerful https://huggingface.co/blog/smollm

[3] SmolLM2 https://github.com/huggingface/smollm

About You

If you love open-source but also have an eye for art and creativity, are passionate about making complex technology more accessible to engineers and artists, and want to contribute to one of the fastest-growing ML ecosystems, then we can't wait to see your application!

If you're interested in joining us but don't tick every box above, we still encourage you to apply! We're building a diverse team whose skills, experiences, and backgrounds complement one another. We're happy to consider where you might be able to make the biggest impact.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported—regardless of who you are or where you come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging ourselves to continuously grow. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options. We support our employees wherever they are. While we have office spaces around the world, especially in the US, Canada, and Europe, we're very distributed and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We support the community. We believe significant scientific advancements are the result of collaboration across the field. Join a community supporting the ML/AI community.

Please provide a cover letter mentioning why you would like to work in open-source at Hugging Face. We encourage you to mention your skills, potential expertise, and topics on which you would like to work.

What You Should Know About ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote, Hugging Face

Are you ready to kickstart your career in machine learning? At Hugging Face, we're on a mission to democratize AI, and we're inviting you to join us as a Machine Learning Research Engineer Intern focusing on SmolLMs pretraining and datasets. Our platform is growing rapidly, with over 5 million users and 100,000 organizations globally who have collectively shared over one million models, 300,000 datasets, and 300,000 applications. As part of the SmolLM team, you will delve into the exciting world of smol models, which enable affordable inference and personalized experiences while preserving user privacy. You'll develop high-quality pre-training and post-training datasets, work with our scalable CPU cluster, and train models on our cutting-edge H100 cluster. If you have a passion for crafting language models, are proficient in Python, and want to make an impact on the AI community, we would be thrilled to hear from you! Combine your technical skills with your artistic vision and collaborate with a talented team pushing the envelope in machine learning. We're committed to fostering an inclusive work environment that values diverse backgrounds, so even if you don't tick every box, we encourage you to apply and share your unique perspective on how you can contribute. Get ready to innovate, learn, and grow with Hugging Face!

Frequently Asked Questions (FAQs) for ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote Role at Hugging Face
What does a Machine Learning Research Engineer Internship at Hugging Face involve?

As a Machine Learning Research Engineer Intern at Hugging Face, you will collaborate with the SmolLM team to create and iterate on datasets for smol models. You'll leverage our cutting-edge infrastructure for distributed training and work closely with experienced professionals in the field. It's a unique opportunity to enhance your skills while contributing to impactful AI research.

What qualifications do I need for the Machine Learning Research Engineer Internship at Hugging Face?

While specific qualifications may vary, candidates for the Machine Learning Research Engineer Internship should be proficient in Python and have a strong interest in training language models and developing datasets. A background in machine learning, data science, or related fields is beneficial, but we value passion and creativity above all!

Is the Machine Learning Research Engineer Internship at Hugging Face remote?

Yes! The Machine Learning Research Engineer Internship at Hugging Face is a remote position, offering you the flexibility to work from anywhere in the EMEA region. Whether you're at home or traveling, you can contribute to our exciting projects without geographical constraints.

What is the SmolLM team focusing on at Hugging Face?

The SmolLM team at Hugging Face focuses on developing efficient, high-performance smol language models designed for lower-cost inference and on-device customization. Our goal is to create models that uphold user privacy while advancing the state-of-the-art in machine learning.

How does Hugging Face support employee development?

Hugging Face is committed to your personal and professional growth. We offer reimbursements for relevant conferences, training, and educational opportunities. Our culture fosters continuous improvement, so you will have the chance to work alongside industry leaders and continuously learn.

What kind of projects can I expect to work on as an intern at Hugging Face?

During your internship as a Machine Learning Research Engineer, you can expect to work on projects revolving around dataset creation and model training for our innovative smol models. You’ll be an essential part of the team, refining and enhancing the AI models that drive our platform's success.

How does Hugging Face ensure a diverse workplace?

At Hugging Face, we value diversity, equity, and inclusivity. We actively work towards cultivating a workplace where everyone feels respected and supported, regardless of their background. Joining us means contributing to a community dedicated to diverse perspectives and ideas.

Common Interview Questions for ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote
Can you explain the importance of smol models in the current AI landscape?

In preparing for this question, think about how smol models contribute to more efficient AI applications by ensuring lower-cost inference, easier deployment on devices, and enhanced user privacy. Discuss the potential transformations in industries leveraging these models and your personal understanding of their relevance.

How do you ensure the quality of the datasets you work with?

Highlight your approach to data quality by discussing techniques like data cleaning, validation, and augmentation. Share any personal experiences where you improved dataset quality, stressing your commitment to high standards in machine learning projects.

What tools and libraries are you familiar with for machine learning projects?

Prepare a list of tools and libraries relevant to the role, such as TensorFlow, PyTorch, Hugging Face Transformers, or others you have experience with. Be ready to discuss specific projects where you utilized these tools and how they benefited the outcome.

Describe a time you faced challenges while working on a machine learning project.

Use the STAR method (Situation, Task, Action, Result) to frame your response. Describe a specific challenge, the actions you took to tackle it, and the end result. This shows your problem-solving capabilities and resilience.

What is your experience with Python, especially in relation to machine learning?

Be prepared to showcase your programming proficiency with examples of how you’ve used Python in machine learning projects, whether for data manipulation, model development, or deployment. Discuss any frameworks or libraries you've used extensively.

How do you stay updated with trends in machine learning and AI?

Discuss the resources you rely on, such as academic journals, blogs, podcasts, online courses, and community events. Mention specific steps you take to stay informed, like attending conferences or participating in forums.

Can you describe your experience with parallel computing and distributed systems?

If applicable, share your understanding of or experience with parallel computing technologies and the principles of distributed machine learning. Explain how you've applied those concepts in previous projects and the impact they had.

How would you approach debugging a machine learning model?

Explain your systematic approach to debugging, starting with examining input data, model architecture, and training processes. Discuss the tools you utilize, such as visualization techniques or libraries, to help identify issues within the model.

What do you hope to learn during your internship at Hugging Face?

Articulate your eagerness to learn and grow in specific areas, mentioning technical skills, collaborative experiences, or understanding complex models. Express how working with the SmolLM team can help fulfill these learning objectives.

Why do you want to work in open-source, and what do you feel you can contribute to Hugging Face?

Share your passion for open-source contributions, emphasizing how you believe accessibility and collaboration drive innovation. Highlight your specific skills, experiences, and ideas that align with Hugging Face's mission and ethos.

Employment type: Internship, remote
Date posted: November 28, 2024
