
Senior Systems Engineer – AI/ML Infrastructure

Company Overview

Deepgram is the leading voice AI platform for developers building speech-to-text (STT), text-to-speech (TTS) and full speech-to-speech (STS) offerings. 200,000+ developers build with Deepgram’s voice-native foundational models – accessed through APIs or as self-managed software – due to our unmatched accuracy, latency and pricing. Customers include software companies building voice products, co-sell partners working with large enterprises, and enterprises solving internal voice AI use cases. The company ended 2024 cash-flow positive with 400+ enterprise customers, 3.3x annual usage growth across the past 4 years, over 50,000 years of audio processed and over 1 trillion words transcribed. There is no organization in the world that understands voice better than Deepgram.

Opportunity:

We are seeking an experienced Senior Systems Engineer – AI/ML Infrastructure to design, implement, and maintain our large-scale distributed systems infrastructure. You'll be responsible for building and optimizing our network architecture, storage solutions, and compute platforms that power our AI/ML workloads. This role combines expertise in network engineering, storage systems, and modern container orchestration platforms, with a focus on reliability, scalability, and cost-effectiveness.

What You’ll Do

  • Build and maintain bare-metal GPU compute clusters for AI training and inference workloads

  • Implement monitoring, alerting, and automation solutions for infrastructure management

  • Manage large-scale deployments using modern orchestration platforms like Kubernetes and Slurm

  • Design and implement reliable, high-performance network architectures for distributed systems

  • Architect and maintain large-scale storage solutions, including backup systems, distributed caching, and object storage

You’ll Love This Role If You

  • Are passionate about building reliable, scalable infrastructure systems

  • Enjoy optimizing complex distributed systems for performance and cost

  • Love solving challenging problems in networking and storage at scale

  • Are excited about working with cutting-edge GPU infrastructure

  • Want to work at the intersection of infrastructure and AI/ML systems

It’s Important To Us That You Have

  • 5+ years of experience in infrastructure engineering or similar roles

  • Strong background in network engineering and design for reliability

  • Experience with large-scale storage systems (distributed file systems, caching solutions)

  • Proven track record of managing bare-metal infrastructure

  • Expertise in container orchestration platforms (Kubernetes, Slurm)

  • Experience with GPU infrastructure management and optimization

  • Strong automation and scripting skills

  

It Would Be Great if You Had 

  • Experience with software-defined networking

  • Experience with infrastructure cost management and capacity planning

  • Familiarity with AI/ML workloads and their infrastructure requirements

  • Experience with multi-region infrastructure deployment

  • Background in performance optimization for distributed systems

Backed by prominent investors including Y Combinator, Madrona, Tiger Global, Wing VC and NVIDIA, Deepgram has raised over $85 million in total funding. If you're looking to work on cutting-edge technology and make a significant impact in the AI industry, we'd love to hear from you!

Deepgram is an equal opportunity employer. We want all voices and perspectives represented in our workforce. We are a curious bunch focused on collaboration and doing the right thing. We put our customers first, grow together and move quickly. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, veteran status, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate.

We are happy to provide accommodations for applicants who need them.

Deepgram Glassdoor company rating: 3.9
Deepgram DE&I rating: No rating
CEO of Deepgram: Scott Stephenson

Average salary estimate

$150,000 / year (est.)
Min: $120,000
Max: $180,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Senior Systems Engineer – AI/ML Infrastructure, Deepgram

If you're looking for an exciting opportunity with Deepgram as a Senior Systems Engineer – AI/ML Infrastructure, you may have just found it. In this fully remote role, you'll work at the heart of our voice AI platform, with a team that supports over 200,000 developers building with speech technology. Your key responsibilities will involve designing, implementing, and enhancing our large-scale distributed systems infrastructure, which is the backbone of our AI/ML workloads. You'll build and maintain bare-metal GPU compute clusters, ensuring they are optimized for both reliability and performance. You’ll also have the chance to shape our network architecture and the scalable, efficient storage solutions that are vital for our success. If you're passionate about solving complex challenges in network engineering and storage, and if managing modern orchestration platforms like Kubernetes excites you, this may be the right fit. Deepgram thrives on collaboration and creativity, and we are committed to helping you grow both personally and professionally within the ever-evolving AI landscape. Come join us and make an impact in the world of voice technology!

Frequently Asked Questions (FAQs) for Senior Systems Engineer – AI/ML Infrastructure Role at Deepgram
What are the responsibilities of a Senior Systems Engineer – AI/ML Infrastructure at Deepgram?

As a Senior Systems Engineer – AI/ML Infrastructure at Deepgram, your primary responsibilities include designing and maintaining large-scale distributed systems infrastructure, focusing on network architecture, storage solutions, and compute platforms essential for AI/ML workloads. You'll manage bare-metal GPU clusters, implement comprehensive monitoring and automation solutions, and ensure high performance and reliability across the infrastructure.

What qualifications are required for the Senior Systems Engineer – AI/ML Infrastructure role at Deepgram?

To qualify for the Senior Systems Engineer – AI/ML Infrastructure position at Deepgram, you'll need a minimum of 5 years of experience in infrastructure engineering or similar roles. A strong background in network engineering, managing large-scale storage systems, and proficiency with container orchestration platforms like Kubernetes and Slurm are critical. Additionally, experience with GPU infrastructure management and strong automation skills will set you apart.

What technologies will I work with as a Senior Systems Engineer – AI/ML Infrastructure at Deepgram?

In the role of Senior Systems Engineer – AI/ML Infrastructure at Deepgram, you will work with cutting-edge technologies including bare-metal GPU compute clusters, Kubernetes for container orchestration, and various large-scale storage solutions. You'll also engage with automation tools for infrastructure management and focus on optimizing networking for distributed systems.

How does Deepgram support the professional growth of a Senior Systems Engineer – AI/ML Infrastructure?

Deepgram is committed to the professional development of its team members, including those in the Senior Systems Engineer – AI/ML Infrastructure role. With opportunities to work on advanced projects, access to the latest technologies, and a collaborative environment, you’ll have many avenues to learn and advance your skills in AI and infrastructure engineering.

What is Deepgram's approach to company culture for Senior Systems Engineer – AI/ML Infrastructure?

At Deepgram, we're proud of our inclusive and collaborative culture, which is vital for roles like the Senior Systems Engineer – AI/ML Infrastructure. We value diverse perspectives, encourage teamwork, and focus on doing what's right for our customers. This connection fosters innovation and growth, helping you thrive in your role.

Common Interview Questions for Senior Systems Engineer – AI/ML Infrastructure
Can you describe your experience with managing GPU infrastructure?

When addressing your experience with GPU infrastructure, focus on specific projects where you managed bare-metal GPU clusters for AI training and inference. Discuss the challenges you faced, the optimizations you implemented, and the results achieved. Highlight any performance metrics that demonstrate your effectiveness.

How do you ensure reliability in distributed systems?

For reliability in distributed systems, discuss methods you have implemented, such as robust monitoring strategies, alerting systems, and redundancy measures. Emphasize your ability to design architectures that mitigate failures and ensure seamless operation under various loads.

What container orchestration platforms are you familiar with?

You should mention your experience with platforms like Kubernetes and Slurm, providing examples of how you've used them to deploy and manage applications. Explain your role in optimizing resource allocation and scaling applications efficiently in a production environment.

Describe a challenging problem you solved related to network architecture.

Think of a specific complex networking issue you've faced and provide details on the technical aspects. Discuss the strategies you deployed, the tools you used, and the overall impact of your solution. This showcases your critical thinking and problem-solving abilities.

How would you approach automation in infrastructure management?

For automation, discuss the tools and scripts you've implemented to streamline infrastructure management tasks, such as deploying resources, backup operations, or monitoring. Explain how your automation efforts improved efficiency and reduced downtime, emphasizing your scripting skills.

What's your experience with large-scale storage solutions?

Provide examples of large-scale storage systems you've architected or maintained, such as distributed file systems or caching solutions. Highlight your understanding of data redundancy, retrieval speeds, and backup strategies, so interviewers can see your expertise in handling massive volumes of data.

Can you explain your familiarity with AI/ML workloads?

Discuss specific AI/ML projects you've been involved in, detailing your role in shaping the infrastructure to support these workloads. Mention how your experience with GPU optimization has enhanced performance and contributed to successful outcomes in machine learning tasks.

How do you handle infrastructure cost management?

Talk about your experience with budgeting and capacity planning for infrastructure projects, providing tangible examples. Discuss strategies you've implemented to minimize costs while maintaining performance, such as using cost-effective storage solutions or optimizing resource utilization.

What steps do you take for capacity planning in infrastructure?

Explain how you assess current usage patterns and future growth needs. Discuss any tools or methodologies you utilize for monitoring capacity and predicting necessary adjustments, ensuring you're prepared for changing demands.

How would you define success in this role as a Senior Systems Engineer?

Define success in terms of performance metrics such as system uptime, efficiency gains from optimizations, successful project completions, and positive feedback from stakeholders. This showcases your understanding of key performance indicators that align with Deepgram’s goals.


Our mission is to unlock the power of voice data to fuel the world’s big ideas and we need people who aren’t afraid to challenge how it’s always been done. Are you in?

Employment type: Full-time, remote
Date posted: March 22, 2025
