About Anyscale:
At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies such as OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate bringing AI applications into the real world.
With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.
We’re proud to be backed by Andreessen Horowitz, NEA, and Addition, with more than $250 million raised to date.
About The Role:
Anyscale is looking for a staff software engineer to lead the Model Training Infrastructure team.
The Model Training Infrastructure team leads the development and optimization of Ray’s distributed training libraries, focusing on enabling large-scale ML workloads. The team owns and maintains widely adopted open-source libraries such as Ray Train for distributed model training and Ray Tune for distributed hyperparameter tuning.
As the technical leader for this team, you will be responsible for:
Thinking deeply about delightful, programmatic interfaces for machine learning engineers to scale model training
Building and rethinking distributed training architectures that scale seamlessly from a laptop to the cloud
Implementing and innovating on distributed training algorithms, such as elastic training, to improve model training performance
Working with and leading a robust open-source community around the Ray project
Engaging directly with ML infrastructure teams around the world to iterate on and build the best training infrastructure
Advocating for and sharing your work broadly with the ML community through talks, tutorials, and blog posts
On a day-to-day basis, you will drive the technical direction of the team, mentor engineers, and deliver high-impact projects. You’ll shape the vision for what training infrastructure looks like for enterprises around the world while remaining hands-on with the code and product development.
We’d love to hear from you if you have:
Multiple years of experience building, scaling, and maintaining complex software systems in production
Proven experience leading or mentoring engineering teams in a technical capacity
Expertise in machine learning frameworks (e.g., PyTorch, TensorFlow, XGBoost)
Hands-on experience with distributed systems and designing fault-tolerant infrastructure
Excellent communication and collaboration skills
Bonus points if you have:
Experience with Ray
Experience with cloud technologies (e.g., AWS, GCP, Kubernetes)
Experience building and operating ML training platforms in production
Contributions to or maintenance of open-source libraries
Experience leading open-source or cross-functional teams
Compensation:
At Anyscale, we take a market-based approach to compensation. We are data-driven, transparent, and consistent. The target salary for this role is $237,000 to $284,614. As market data changes over time, the target salary for this role may be adjusted.
This role is also eligible to participate in Anyscale's Equity and Benefits offerings, including the following:
Stock Options
Healthcare plans, with premiums covered by Anyscale at 99%
401k Retirement Plan
Education & Wellbeing Stipend
Paid Parental Leave
Fertility Benefits
Flexible Time Off
Commute reimbursement
100% of in-office meals covered
Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.
Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.