
Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization

Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

We are looking for engineers with significant experience designing and maintaining highly efficient systems and code that can be optimized to run on multiple hardware platforms, bringing our state-of-the-art models to as many people as possible at the best performance per dollar.

Responsibilities

  • Ensure efficient implementation of models & systems with a focus on designing, maintaining, and writing abstractions that scale beyond NVIDIA/CUDA hardware.

  • Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, deferring to Triton or similar kernel-level languages as necessary.

  • Benchmark our products across a variety of hardware & software to help the product team understand the optimal tradeoffs between latency, throughput and cost at various degrees of parallelism.

  • Work together with our partners to help them identify bottlenecks and push forward new iterations of hardware and software.

  • Work closely together with the rest of the research team to ensure systems are planned to be as efficient as possible from start to finish and raise potential issues for hardware integration.

Must have experience

  • Experience optimizing for memory, latency and throughput in PyTorch.

    • Bonus: experience with non-NVIDIA systems

  • Experience using torch.compile / PyTorch/XLA (torch_xla).

  • Experience benchmarking and profiling GPU & CPU code in PyTorch for optimal device utilization (examples: torch profiler, memory profilers, trace viewers, custom tooling).

  • Experience building tools & abstractions to ensure models run optimally on different hardware and software stacks.

  • Experience working with transformer models and attention implementations.

  • Experience with parallel inference, particularly tensor parallelism and pipeline parallelism.

Good to have experience

  • Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops. Top candidates will be able to write fused kernels for common hot paths, understand when to make use of lower level features like tensor cores or warp intrinsics, and will understand where these tools can be most impactful.

  • Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, such as for data loading, data processing, or inference code.

  • Experience building inference / demo prototype code (incl. Gradio, Docker, etc.).

Luma AI Glassdoor Company Review: 4.4
Luma AI DE&I Review: 4.3
Average salary estimate

$150,000 / year (est.)
Min: $120,000
Max: $180,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization, Luma AI

At Luma in Palo Alto, we're on a mission to revolutionize AI by integrating multimodal approaches to expand human imagination and capabilities. As a Senior Machine Learning Engineer specializing in Hardware Abstractions & Performance Optimization, you will be at the forefront of this exciting journey. Your role will focus on creating highly efficient systems and code optimized for various hardware platforms. Your expertise will ensure our cutting-edge models perform seamlessly, maximizing performance for our users. You'll work with the latest technologies to implement and maintain robust abstractions that extend beyond NVIDIA/CUDA platforms. Your keen eye for identifying bottlenecks will drive the improvement of memory utilization, speed, and overall efficiency. Collaborating with our talented research team, you'll play a pivotal role in the integration of our models into real-world applications. If you have experience optimizing PyTorch for performance and are eager to tackle the challenges of multimodal AI, we'd love to hear from you!

Frequently Asked Questions (FAQs) for Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization Role at Luma AI
What are the responsibilities of a Senior Machine Learning Engineer at Luma?

As a Senior Machine Learning Engineer at Luma, your core responsibilities include optimizing models for performance across multiple hardware platforms, profiling for efficiency bottlenecks, and collaborating closely with product and research teams. You'll need to benchmark products to assess tradeoffs and ensure that both hardware and software components work seamlessly together.

What qualifications are required for the Senior Machine Learning Engineer position at Luma?

Candidates for the Senior Machine Learning Engineer position at Luma should have extensive experience with PyTorch optimization, particularly regarding memory, latency, and throughput. Familiarity with benchmarking tools and performance profiling in GPU and CPU code is essential. Experience in working with transformer models and an understanding of tensor parallelism are also highly valued.

What experience is beneficial for this Senior Machine Learning Engineer role at Luma?

While primary experience with PyTorch is crucial, having expertise in non-NVIDIA systems, high-performance Triton/CUDA, and writing custom kernels will set you apart. Experience with parallel C++ in machine learning contexts and building inference prototype code will greatly enhance your candidacy for the role at Luma.

How does Luma approach hardware optimization for AI models?

At Luma, we focus on efficient implementation by designing abstractions that can scale beyond traditional CUDA hardware. Our approach entails identifying efficiency bottlenecks and employing high-performance code strategies using PyTorch to achieve optimal device utilization across various systems.

What type of work environment can a Senior Machine Learning Engineer expect at Luma?

Luma fosters a collaborative and innovative work environment where Senior Machine Learning Engineers can actively engage with research teams and partners. You will have opportunities to brainstorm solutions, refine hardware structures, and drive product improvements, all while contributing to groundbreaking AI technology.

Common Interview Questions for Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization
Can you describe your experience optimizing performance in PyTorch?

When answering this question, be specific about the techniques you used to optimize PyTorch models, such as methods for reducing memory consumption, improving throughput, and lowering latency. Highlight any specific projects where your optimizations led to measurable improvements.

What strategies do you use for profiling and benchmarking models?

Discuss various tools you've used, such as torch profiler and memory profilers. Be sure to explain how you interpret the results and the steps you take to address any identified bottlenecks in performance.

How have you handled performance issues with parallel inference?

Provide examples from your past experiences where you've tackled challenges related to tensor and pipeline parallelism. Detail the approaches you took to resolve these issues and enhance performance.

What is your experience with custom kernel development in PyTorch?

Share specific instances where you've written custom kernels, and discuss how that impacted performance. Highlight your understanding of low-level features and the considerations involved in making such optimizations.

How do you collaborate with research and product teams?

Describe your communication strategies, emphasizing your ability to articulate technical details to non-technical stakeholders. Provide examples of collaborative projects where your input aided in product development.

What tools do you prefer for optimization and why?

Be prepared to discuss the optimization tools you most frequently use and the reasons behind your preferences. This could involve discussing tradeoffs, ease of use, or comprehensive capabilities.

Can you explain your familiarity with non-NVIDIA systems?

If applicable, outline your experience with various hardware systems, the challenges you faced, and how you adapted your approach to optimize performance beyond NVIDIA platforms.

What processes do you follow to ensure efficient resource utilization?

Talk about the steps you take to monitor and assess resource usage, including any specific methodologies or tools you employ to achieve optimal performance across hardware and software stacks.

How do you stay up-to-date with the latest advancements in machine learning and hardware optimization?

Highlight your ongoing learning practices, such as reading research papers, attending conferences, or participating in online forums. Share any recent advancements that you find particularly exciting or relevant.

Can you provide an example of a challenging problem you solved in your previous role?

Choose a specific technical problem that demonstrates your problem-solving abilities. Explain the problem, your approach in tackling it, and the eventual outcome to showcase your competencies in a real-world scenario.

Inclusive & Diverse
Rise from Within
Mission Driven
Diversity of Opinions
Work/Life Harmony
Reward & Recognition
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
401K Matching
Performance Bonus
Equity
Maternity Leave
Paternity Leave
Paid Holidays
Paid Time-Off
Sabbatical
Employment type: Full-time, on-site
Date posted: March 12, 2025
