We are now looking for a Senior High-Performance LLM Training Engineer!
NVIDIA is seeking experienced engineers specializing in performance analysis and optimization to improve the efficiency of LLM training workloads, which are shaping the world's most advanced computing systems. This position focuses on optimizing NVIDIA’s LLM software stack in frameworks like PyTorch and JAX for high-performance training on thousands of GPUs, while also helping shape hardware roadmaps for the next generation of GPUs powering the AI revolution.
What you will be doing:
Understand, analyze, profile, and optimize AI training workloads on innovative hardware and software platforms.
Understand the big picture of training performance on GPUs, prioritizing and then solving problems across all state-of-the-art neural networks.
Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
Build and support NVIDIA submissions to the MLPerf Training benchmark suite.
Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
Build tools to automate workload analysis, workload optimization, and other critical workflows.
What we want to see:
PhD in Computer Science, Electrical Engineering, or Computer Engineering with 5+ years of meaningful work experience; or MS (or equivalent experience) with 8+ years of meaningful work experience.
Strong background in deep learning and neural networks, in particular training.
A deep background in computer architecture and familiarity with the fundamentals of GPU architecture.
Proven experience analyzing and tuning application performance, as well as processor- and system-level performance modeling.
Programming skills in C++, Python, and CUDA.
GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with the major systems companies and every major cloud service provider to make GPUs available in data centers and in the cloud. We craft computers and software to bring AI to edge devices, such as self-driving cars and autonomous robots. AI has the potential to spur a wave of social progress unmatched since the industrial revolution.
Widely considered to be one of tech's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. Additionally, this opportunity offers you the ability to collaborate with some of the most forward-thinking and hard-working people in the world, shaping the future of AI in a creative and autonomous work environment that encourages innovation. If you're excited to work across the full hardware & software stack—from GPU architecture to application code—to achieve optimal performance, we want to hear from you!
#LI-Hybrid
The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
Are you passionate about AI and eager to work at the forefront of technology? Join NVIDIA as a Senior High-Performance LLM Training Engineer, where you will play a crucial role in optimizing our high-performance LLM software stack using frameworks like PyTorch and JAX. Based in Santa Clara, California, this role is all about improving the efficiency of LLM training workloads on thousands of GPUs, a key driver in shaping the world’s most advanced computing systems. You'll dive deep into performance analysis, profiling, and optimization, tackling innovative hardware and software platforms to enhance training performance across state-of-the-art neural networks. Your contributions will extend from developing production-quality software across NVIDIA's deep learning platform to building and supporting NVIDIA's submissions to the MLPerf Training benchmark suite. With a PhD in a relevant field or extensive work experience, you’ll leverage your background in deep learning, GPU architecture, and programming skills in C++, Python, and CUDA. At NVIDIA, we believe in harnessing the power of GPUs to push the boundaries of AI and build transformative technologies that impact industries worldwide. Join our team and collaborate with the brightest minds, all while enjoying a competitive salary, robust benefits, and a culture that celebrates innovation and diversity. If you’re excited about working across the full hardware and software stack to achieve optimal performance, we want to hear from you!
NVIDIA is a publicly traded, multinational technology company headquartered in Santa Clara, California. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, and ignited the era of modern AI.