The Chan Zuckerberg Initiative aims to help solve society's toughest challenges. We are seeking a Senior ML Ops Engineer to empower users across the AI lifecycle by building and operating AI Systems Infrastructure.
Skills
MLOps with GPU clusters
Kubernetes management
DevOps tooling for ML
Strong coding skills in Python or a similar language
Experience with cloud platforms
Responsibilities
Manage operations of a large-scale GPU research cluster
Automate model deployment, alerting, and monitoring systems
Integrate and manage MLflow for model tracking
Resolve issues within a Kubernetes-based GPU cluster
Collaborate on designing AI/ML infrastructure engineering solutions
Education
BS, MS, or PhD in Computer Science or equivalent experience
Benefits
Generous employer match on 401(k)
Annual personal benefit allowance
Paid time off for volunteering
Relocation support
Diversity and inclusion commitment