At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users and 100k organizations who have collectively shared over 1M models, 300k datasets, and 300k apps. Our open-source libraries have more than 400k stars on GitHub.
About the Role
High-quality datasets are the foundation of strong LLMs, yet most labs releasing state-of-the-art models are vague about their pretraining data. At Hugging Face, we want to enable the whole community to build the best models by building and open-sourcing the finest datasets. FineWeb and FineWeb-Edu are examples of very strong, web-scale datasets we released this year, along with the open-source distributed processing library datatrove.
During this internship, you will work alongside the FineWeb team to build the next generation of high-quality web data by running distributed data processing and ablating data quality through training small models. Check out hf.co/science for more information about the science team at Hugging Face, and the FineWeb and FineTask blog posts for the work of this team specifically.
About You
If you love open-source but also have an eye for art and creativity, are passionate about making complex technology more accessible to engineers and artists, and want to contribute to one of the fastest-growing ML ecosystems, then we can't wait to see your application!
If you're interested in joining us, but don't tick every box above, we still encourage you to apply! We're building a diverse team whose skills, experiences, and background complement one another. We're happy to consider where you might be able to make the biggest impact.
More about Hugging Face
We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported—regardless of who you are or where you come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
We value development. You will work with some of the smartest people in our industry. We are an organization that has a bias for impact and is always challenging ourselves to continuously grow. We provide all employees with reimbursement for relevant conferences, training, and education.
We care about your well-being. We offer flexible working hours and remote options. We support our employees wherever they are. While we have office spaces around the world, especially in the US, Canada, and Europe, we're very distributed and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.
We support the community. We believe significant scientific advancements are the result of collaboration across the field. Join a community supporting the ML/AI community.
Please provide a cover letter mentioning why you would like to work in open-source at Hugging Face. We encourage you to mention your skills, potential expertise, and topics on which you would like to work.
If you're looking for an exciting opportunity to dive into the world of machine learning, then the ML Research Engineer Internship with the FineWeb team is perfect for you! Working remotely at Hugging Face, a leader in democratizing AI technologies, you'll be part of a vibrant team dedicated to harnessing the power of high-quality web data to enhance machine learning models. You'll engage in meaningful work that directly impacts the AI community by building and open-sourcing exceptional datasets like FineWeb and FineWeb-Edu. This role is not just a technical project; it's about creativity and accessibility in AI. As an intern, you'll work with distributed data processing, explore data quality by training small models, and contribute to projects that redefine what's possible in ML research. If you have a passion for open-source technology, a flair for creativity, and a desire to collaborate with some of the brightest minds in the field, Hugging Face is eager to see your application. We believe that the best teams are diverse and inclusive, so don’t worry if you don’t meet every single requirement: your unique talents could be just what we’re looking for! Join us in shaping the future of AI while enjoying a supportive and flexible work environment. We’re excited to welcome you, whether you’re based in Europe, Africa, or beyond. Apply now and be part of our journey to make complex technology more accessible and impactful!