
Director - Cloud Operations

In 2012, Lambda started with a crew of AI engineers publishing research at top machine-learning conferences. We began as an AI company built by AI engineers. That hasn't changed. Today, we're on a mission to be the world's top AI computing platform. We equip engineers with the tools to deploy AI that is fast, secure, affordable, and built to scale. Whether they need powerhouse GPU hardware on-site or the flexibility of cloud-based solutions, we've got the horsepower to make it happen. Lambda’s AI Cloud has been adopted by the world’s leading companies and research institutions including Anyscale, Rakuten, The AI Institute, and multiple enterprises with over a trillion dollars of market capitalization. Our goal is to make computation as effortless and ubiquitous as electricity.


If you'd like to build the world's best deep learning cloud, join us. 

*Note: This position requires presence in our San Francisco/San Jose office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.

We’re looking for a Cloud Operations Director who has experience supporting large-scale logical operations of computing, storage, and networking systems.

As a leader of this pillar within the Engineering organization, you will be responsible for the teams that configure and maintain our fleet and the teams that operate our cloud service. You will define ambitious SLOs and drive towards them in close collaboration with partners across the company (e.g. DC operations, supply chain, observability, tooling development). 

You must be able to work in a “freedom & responsibility” environment with strong context but minimal guidance. Passion for AI and experience working at or closely with Cloud Service Providers are major pluses.

What You’ll Do

  • Lead and grow a multi-region, global cloud systems team to 50+ and beyond.

  • Drive operational excellence. Ensure that:

    • Lambda’s cloud services meet the industry’s highest reliability standards. 

    • Lambda’s fleet operates as cost-effectively as possible.

    • The department responds to and resolves escalated customer issues quickly and thoroughly. 

    • Lambda configures and maintains systems effectively and efficiently.

  • Identify key metrics to measure reliability and efficiency, drive their implementation within the team and with cross-functional partners, then use the data to relentlessly drive continuous improvement. 

  • Spearhead the development of new functions and processes to enhance operational effectiveness.

  • Manage communication of progress and status with internal stakeholders and customer groups across different locations and time zones.

  • Maintain comprehensive documentation of processes and ensure project visibility throughout the deployment.

  • Interview, mentor, and coach new team members to foster their professional growth and development. 

About You

  • 15+ years of experience in technical systems (HPC, networking, storage)

  • 5+ years of experience managing the operation of large-scale environments

  • Experience setting strategy, multi-year execution roadmaps, and metrics for SRE

  • Experience with service management processes (e.g., Incident Management, Change Management, Alert Management)

  • Experience defining, monitoring & tracking SLOs for infrastructure and services.

  • Experience with managing geographically distributed teams

  • Experience managing multiple managers across a wide spectrum of functions

  • Knowledge of distributed systems, redundancy/high availability, and performance optimization

  • Familiar with security and risk mitigation for a cloud-based environment

  • Familiar with compliance regimes (SOC 2, ISO, etc.)

  • Familiarity with using analytics to forecast and optimize operations team capacity, ensuring appropriate resource allocation

  • Ability to work well with deadlines and structured project plans while remaining comfortable with high-frequency change

Nice to Have 

  • Experience in the AI, machine learning or computer hardware industry

  • Bachelor’s degree in EE, CS, Physics, Mathematics, or equivalent work experience

  • Expertise in cloud security protocols and vulnerability assessment to protect data and maintain secure infrastructure

Salary Range Information 

Based on market data and other factors, the annual salary range for this position is $320,000-$450,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, ~350 employees (2024) and growing fast

  • We offer generous cash & equity compensation

  • Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.

  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability

  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Health, dental, and vision coverage for you and your dependents

  • Commuter/Work from home stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.


What You Should Know About Director - Cloud Operations, Lambda

Are you ready to take your career to the next level? Join Lambda as the Director of Cloud Operations in the San Francisco Bay Area! Lambda, established in 2012, is working to become the world’s leading AI computing platform. We’re on a mission to give engineers the tools to deploy AI that is fast, secure, affordable, and built to scale. If you have a passion for AI and deep experience managing large-scale computing, storage, and networking systems, we want to hear from you! In this role, you’ll lead a multi-region, global cloud systems team and grow it to 50+ people, ensuring our cloud services meet the industry’s highest reliability standards. You’ll drive operational excellence in a “freedom & responsibility” environment, set ambitious SLOs, and use key metrics to drive continuous improvement. You’ll also manage communication and collaboration across locations and time zones. We’re all about growth and innovation: you’ll mentor new team members and enhance operational effectiveness as we work to make computation as effortless and ubiquitous as electricity. If you’re excited about building the world’s best deep learning cloud and want to make a significant impact, we’d love to welcome you to our San Francisco/San Jose office. Don’t miss this chance to be part of Lambda’s remarkable journey!

Frequently Asked Questions (FAQs) for Director - Cloud Operations Role at Lambda
What are the primary responsibilities of the Director - Cloud Operations at Lambda?

The Director - Cloud Operations at Lambda is responsible for leading a multi-region global cloud systems team, ensuring operational excellence in cloud services, and driving reliability and efficiency metrics. They also manage customer issue escalations, oversee configuration and maintenance of systems, and spearhead the development of new functions to enhance operational effectiveness.

What qualifications are needed for the Director - Cloud Operations position at Lambda?

Candidates for the Director - Cloud Operations position at Lambda should have 15+ years of experience in technical systems and at least 5 years of experience managing large-scale environments. Knowledge of service management processes, experience with geographically distributed teams, and expertise in distributed systems are crucial for success in this role.

What is the work environment like for the Director - Cloud Operations at Lambda?

At Lambda, the work environment for the Director - Cloud Operations is characterized by a 'freedom & responsibility' ethos. This means you’ll have strong autonomy to make impactful decisions with less guidance, while working closely with partners across various departments in a collaborative manner.

How does the Director - Cloud Operations contribute to Lambda's mission?

The Director - Cloud Operations plays a vital role in Lambda’s mission by ensuring that Lambda’s cloud services meet the industry’s highest reliability standards and that its fleet operates cost-effectively. Through strong leadership and continuous improvement practices, they help enable the deployment of AI that is fast, secure, and scalable.

What are the career growth opportunities for the Director - Cloud Operations at Lambda?

As the Director - Cloud Operations at Lambda, you will not only lead a significant team but also have the chance to shape cloud operations as the company scales. With continuous growth and a focus on learning, there are numerous opportunities for advancement within the organization as Lambda expands its influence in AI technology.

Common Interview Questions for Director - Cloud Operations
What strategies would you employ to ensure the reliability of Lambda’s cloud services?

To ensure the reliability of Lambda’s cloud services, I would focus on defining and monitoring stringent SLOs, implementing robust service management processes, and utilizing metrics to drive continuous improvement. Establishing a culture of accountability within the team is essential to maintain high reliability standards.

How would you handle escalated customer issues in your role at Lambda?

Handling escalated customer issues requires a prompt and thorough approach. I would first ensure clear communication channels are in place, gather relevant data to understand the issue, and then collaborate with cross-functional teams to address and resolve the customer’s concerns efficiently.

Can you discuss your experience managing distributed teams?

Managing distributed teams effectively involves clear communication, regular status updates, and fostering a strong team culture despite geographical distance. I prioritize using collaborative tools to keep everyone aligned on goals, and I encourage mentorship within the team to cultivate professional growth.

Describe a time you drove operational excellence in a previous role.

In a prior position, I implemented a set of key performance metrics that allowed us to identify bottlenecks in our operations. By continuously tracking these metrics and conducting regular review meetings, we achieved our operational goals and improved service delivery to our clients significantly.

What experience do you have with service management processes?

During my career, I’ve gained significant experience with service management processes, including incident management and change management. I have a keen understanding of the importance of these processes in maintaining service reliability and ensuring a systematic approach to addressing operational challenges.

How do you plan to promote continuous improvement in operations at Lambda?

Promoting continuous improvement is about facilitating a culture where team members are encouraged to suggest improvements and embrace data-driven decision-making. Implementing regular feedback loops, utilizing analytics for predictive insights, and refining processes based on team input are key strategies I would employ.

What qualities do you believe are essential for leading a cloud operations team?

Essential qualities for leading a cloud operations team include strong technical expertise, excellent communication skills, decision-making abilities, and the capacity to inspire and mentor team members. Encouraging innovation and adaptability in the face of change is also critical in this fast-paced tech landscape.

How familiar are you with compliance regimes such as SOC2 and ISO?

I am well-versed in compliance regimes like SOC2 and ISO. My experience includes implementing processes to adhere to these standards, conducting audits, and ensuring that all operations meet compliance requirements to protect data and maintain a secure infrastructure.

Can you explain your experience with cloud security protocols?

I have extensive experience with cloud security protocols, focusing on vulnerability assessments and implementing robust security measures. I also believe it is essential to keep up to date with the latest security trends and practices so the infrastructure remains secure and resilient.

What approach do you take when setting ambitious SLOs?

When setting ambitious SLOs, my approach involves collaborating with stakeholders to align on realistic yet challenging goals. I believe in establishing clear metrics and maintaining transparency throughout the process, while continuously reviewing performance to ensure we remain on track.


Lambda provides Artificial Intelligence and Machine Learning infrastructure to companies like Apple, Intel, Microsoft, MIT, Harvard, the Federal Government, and the DOD. We’re headquartered in the Dogpatch and are a short walk from the 22nd Street ...

EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
March 25, 2025
