
Senior AI/ML Specialist Solutions Architect (Cloud & AI Infra)

About the Company

Our client is a publicly traded company at the forefront of the AI revolution, offering an AI-centric cloud platform that's reshaping the landscape of artificial intelligence. The company provides cutting-edge infrastructure, including large-scale GPU clusters, cloud platforms, tools, and services for developers to service the explosive growth of the global AI industry for Fortune 1000 companies, top-tier innovative startups, and AI researchers.

  • Company type: Publicly traded

  • Industry: AI/ML, Cloud Computing, Infrastructure-as-Code

  • Candidate Location: Remote U.S.

Their mission is to democratize access to AI infrastructure and empower organizations to create, optimize, and deploy AI solutions at any scale. They aim to simplify the complexities of AI development by providing a full-stack AI platform that combines powerful hardware with user-friendly tools and services.

The Opportunity

We are seeking a Senior AI/ML Specialist Solutions Architect to join our client's team. This role offers the chance to design and implement scalable AI solutions for AI-focused customers, working with state-of-the-art technologies and contributing to one of the most powerful commercially available supercomputers.

What You'll Do

  • Architect and optimize distributed training and inference systems for large-scale AI models

  • Design and deliver customer-focused solutions that maximize performance and business value

  • Lead the transition of ML pipelines from POC to scalable production systems

  • Build long-term customer relationships, ensuring satisfaction and alignment with strategic goals

  • Create whitepapers, deliver technical presentations, and host webinars to share insights and best practices

  • Provide technical leadership and mentor teams on AI infrastructure and deployment strategies

  • Collaborate with engineering and product teams to prioritize customer feedback and influence product roadmaps

What You Bring

  • 5+ years of experience with cloud technologies and infrastructure, ideally in senior MLOps or Solutions Architect roles

  • Proven expertise in scaling and optimizing AI workloads across multi-node and multi-GPU environments

  • Demonstrated success delivering ML products, scaling from POC to production

  • Deep knowledge of ML frameworks like PyTorch and JAX

  • Strong background in the NVIDIA HPC ecosystem (CUDA, NCCL, InfiniBand)

  • Active involvement in the ML community (public speaking, open-source contributions, Kaggle competitions, hackathons)

  • Exceptional communication skills to engage both technical teams and business stakeholders

Preferred Technical Skills

  • Programming Languages: Python, Go, Java, C++

  • Infrastructure as Code (IaC): Terraform, Ansible

  • Orchestration: Kubernetes (K8s), Slurm

  • DevOps Tools: Git, Docker, Helm

  • Big Data Frameworks: Spark, Kafka, Hadoop

  • Databases: SQL, NoSQL, and vector databases

  • ML Frameworks: PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn

Why Join?

  • Competitive compensation: $180,000 - $300,000 per year (negotiable based on experience and location)

  • Full medical benefits: 100% company-paid medical, dental, and vision coverage for employees and families

  • 401(k) plan with a 4% match program

  • Stock options plan

  • Flexible remote work environment

  • Company-paid short-term disability, long-term disability, and life insurance coverage

  • 20 weeks paid parental leave for primary caregivers, 12 weeks for secondary caregivers

  • Up to $85/month for mobile and internet

  • Work with state-of-the-art AI and cloud technologies, including the latest NVIDIA GPUs

  • Be part of a team that operates one of the most powerful commercially available supercomputers

  • Contribute to sustainable AI infrastructure, with energy-efficient data centers that recover waste heat to warm nearby residential buildings

Interviewing Process

  • Level 1 - Interview with Talent Acquisition 

  • Level 2 - Interview with the Hiring Manager

  • Level 3 - Technical Assessment

  • Reference and Background Checks: conducted after successful interviews

  • Job Offer: provided to the selected candidate

We are proud to be an equal opportunity workplace and are committed to equal employment opportunity regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or any other characteristic protected by applicable federal, state, or local law.

Average salary estimate

$240,000 / year (est.)
Min: $180,000
Max: $300,000

If an employer mentions a salary or salary range in their job posting, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Senior AI/ML Specialist Solutions Architect (Cloud & AI Infra), Lavendo

Looking for an opportunity to dive into the AI revolution? Our client, a publicly traded innovator in the AI sector, is looking for a Senior AI/ML Specialist Solutions Architect based in San Francisco. You will join a team that is reshaping how Fortune 1000 companies and cutting-edge startups approach AI, designing and implementing AI solutions on a cloud platform equipped with powerful GPU clusters. As a key player, you'll architect distributed training systems for large-scale AI models and tune each customer solution for maximum performance and business value. You'll work closely with engineers and product teams to align product roadmaps with customer needs.

With more than 5 years of experience in cloud technologies and MLOps, you will be comfortable leading machine learning pipelines from POC to production. This role is not just about coding; it is also about building lasting relationships with customers and sharing your knowledge through technical presentations and whitepapers. If you're ready to mentor teams, engage with the ML community, and push the limits of AI infrastructure, this is the role for you. Join our client in their mission to democratize access to AI and make a tangible impact across industries, in a flexible, rewarding environment where you can grow while collaborating with talented people in cloud and AI technologies.

Frequently Asked Questions (FAQs) for Senior AI/ML Specialist Solutions Architect (Cloud & AI Infra) Role at Lavendo
What are the main responsibilities of the Senior AI/ML Specialist Solutions Architect at the company?

The Senior AI/ML Specialist Solutions Architect at our client’s company is tasked with architecting and optimizing distributed training and inference systems for large-scale AI models. You'll design customer-focused solutions that deliver maximum performance while also mentoring teams on AI infrastructure strategies.

What qualifications are required for the Senior AI/ML Specialist Solutions Architect position?

To qualify for the Senior AI/ML Specialist Solutions Architect position, candidates should have at least 5 years of experience in cloud technologies and infrastructure with expertise in AI workloads across multi-GPU environments. Knowledge of ML frameworks like PyTorch and JAX as well as experience with NVIDIA's HPC ecosystem is also essential.

How does the company support the professional development of the Senior AI/ML Specialist Solutions Architect?

The company supports professional development through mentoring opportunities, active participation in the ML community, and encouragement of open-source contributions. It also offers competitive compensation along with comprehensive healthcare and flexible work arrangements.

What technologies will the Senior AI/ML Specialist Solutions Architect be working with?

In this role, the Senior AI/ML Specialist Solutions Architect will be working with cutting-edge technologies, including NVIDIA GPU infrastructure, ML frameworks like PyTorch and TensorFlow, orchestration tools like Kubernetes, and big data frameworks such as Spark and Hadoop.

What is the interviewing process for the Senior AI/ML Specialist Solutions Architect position?

The interviewing process consists of three main levels: an initial interview with Talent Acquisition, followed by a conversation with the Hiring Manager, and a technical assessment. Reference and background checks are conducted after successful interviews, leading to a job offer for the selected candidate.

Common Interview Questions for Senior AI/ML Specialist Solutions Architect (Cloud & AI Infra)
How do you optimize distributed training systems for AI models?

When answering this question, share specific strategies you have used, such as data parallelism and model parallelism, or leveraging frameworks that streamline the training process. Highlight relevant experiences, especially those that involve optimizing performance across multiple GPUs.
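One way to make the data-parallelism point concrete in an interview is a toy example like the one below. It is a minimal sketch in plain Python, not a real multi-GPU setup: the helper names (`local_gradient`, `all_reduce_mean`, `data_parallel_step`) and the tiny dataset are hypothetical, and `all_reduce_mean` only stands in for what a collective library such as NCCL would do across devices.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the average."""
    return sum(values) / len(values)


def data_parallel_step(w: float, shards: List[List[Sample]], lr: float) -> float:
    """Each worker computes a gradient on its own shard, then all workers
    apply the same averaged gradient so their weights stay in sync."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * all_reduce_mean(grads)


# Hypothetical data generated from y = 3x, split evenly across two "workers".
data: List[Sample] = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[0::2], data[1::2]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(f"learned weight: {w:.4f}")
```

Because the shards are the same size, the averaged gradient matches the full-batch gradient, which is the property that lets data parallelism scale without changing the optimization result. Model parallelism, by contrast, splits the model itself across devices and is not shown here.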

What experience do you have with transitioning machine learning models from POC to production?

Discuss your role in previous projects where you successfully transitioned ML models, focusing on any challenges faced and solutions implemented. Emphasize methodologies like validation, testing, and maintaining model performance after deployment.

Can you explain your experience with the NVIDIA HPC ecosystem?

Prepare to elaborate on your experience with NVIDIA technologies like CUDA or NCCL. Provide examples of projects where you utilized these technologies to enhance processing efficiency or reduce computational loads in AI models.

How do you manage customer expectations in AI project delivery?

Showcase your communication skills by discussing your approach to setting realistic timelines and keeping clients informed throughout the development process. Mention strategies like regular updates and feedback sessions to build trust with stakeholders.

What is the importance of infrastructure as code (IaC) in AI development?

Explain how IaC streamlines deployment and management of infrastructure, which is crucial for maintaining consistency across AI environments. Mention your experience with tools like Terraform or Ansible for automation and scaling.

Describe a time you had to mentor someone in your field.

Provide an example where you guided a colleague or a team member, specifying the challenges they faced and how your mentorship made a positive impact in their skill development and project success.

What role do you think community engagement plays in AI/ML?

Discuss your views on collaboration and knowledge sharing through conferences, open-source projects, and platforms like Kaggle. Explain how these activities have enriched your skills and contributed to your professional growth.

What strategies do you use to ensure AI solutions provide maximum business value?

Articulate the importance of aligning technical solutions with business objectives. Share specific methodologies, like ROI calculations or KPI assessments, that you've used to demonstrate value to clients.
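If you mention ROI calculations, it can help to show the arithmetic. The sketch below uses the common definition ROI = (benefit - cost) / cost and a simple payback period; the function names and all dollar figures are hypothetical illustrations, not numbers from this posting.

```python
def roi(benefit: float, cost: float) -> float:
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def payback_months(upfront_cost: float, monthly_net_benefit: float) -> float:
    """Months needed for cumulative net benefit to cover the upfront cost."""
    if monthly_net_benefit <= 0:
        raise ValueError("monthly_net_benefit must be positive")
    return upfront_cost / monthly_net_benefit


# Hypothetical project: $100,000 to build, $150,000 in first-year benefit,
# realized as roughly $25,000 of net benefit per month once deployed.
print(f"ROI: {roi(150_000.0, 100_000.0):.0%}")
print(f"Payback: {payback_months(100_000.0, 25_000.0):.1f} months")
```

In practice the hard part is agreeing with the customer on how benefit is measured (for example, reduced training time or lower inference cost), which is where the KPI discussion comes in.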

How familiar are you with orchestration tools such as Kubernetes in AI deployment?

Discuss your experience with using Kubernetes or similar tools for managing containerized applications, especially in scaling AI services and ensuring high availability.

What are some challenges you foresee in AI infrastructure development?

Talk about challenges such as data security, scalability, and managing distributed systems. Share your thoughts on potential solutions and how you've approached such challenges in previous roles.

Employment type: Full-time, remote
Date posted: January 10, 2025
