ML Framework Engineer

At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.


Job Description:

TensorWave is seeking an ML Framework Engineer to lead the integration, optimization, and maintenance of PyTorch (and select AI libraries) on AMD ROCm GPUs. This role is critical in ensuring our AI cloud platform remains at the cutting edge of performance, stability, and compatibility by tracking upstream framework changes, debugging compatibility issues, and automating builds, testing, and benchmarking. You will be responsible for maintaining a registry of validated AI libraries, debugging low-level performance issues, and working with external maintainers to upstream fixes. You will collaborate with DevOps, MLOps, and AI researchers to ensure a seamless deployment and development experience across TensorWave’s infrastructure. This role is ideal for an engineer with deep PyTorch internals knowledge, strong GPU debugging experience, and a passion for optimizing AI workloads at the framework level.


Responsibilities
  • Framework Compatibility & Versioning: Track PyTorch and other AI framework updates, maintain a versioned registry of validated builds, and proactively handle breaking changes.
  • Kernel Debugging & Profiling: Triage and debug ROCm-related issues affecting AI workloads, handling small fixes directly and escalating complex issues to MLOps and third-party maintainers.
  • Build & CI/CD Automation: Develop and maintain automated build pipelines for AI frameworks, integrating regression testing and benchmarking, while working with DevOps for large-scale automation.
  • Performance Optimization: Profile and analyze AI workload performance on AMD GPUs, identifying bottlenecks in memory access, kernel execution, and framework overhead.
  • Third-Party Collaboration: Work with PyTorch maintainers, ROCm engineers, and external AI library contributors to improve framework compatibility and push upstream fixes when needed.
  • Container & Environment Management: Maintain and update prebuilt AI container environments, ensuring seamless integration with TensorWave’s inference and training infrastructure.
  • Documentation & Knowledge Sharing: Serve as the SME (Subject Matter Expert) for library compatibility, maintaining internal documentation on framework versions, known issues, and best practices.


Essential Skills & Qualifications
  • 3+ years of experience in ML framework development, optimization, or GPU debugging.
  • Strong expertise in PyTorch internals, model execution, and AI framework architecture.
  • Experience with ROCm or CUDA development, including kernel debugging and profiling.
  • Proficiency in Python and C++, with experience in optimizing AI workloads at the framework level.
  • Familiarity with low-level GPU performance profiling tools (rocprof, Nsight, perf, VTune, etc.).
  • Hands-on experience with CI/CD for AI frameworks, including automated testing and benchmarking.
  • Strong understanding of containerization (Docker, Kubernetes) and dependency management (pip, Conda, Bazel, CMake, etc.).
  • Excellent documentation skills, with a focus on library versioning, compatibility tracking, and regression analysis.


Preferred Qualifications
  • Experience contributing to PyTorch or other open-source ML frameworks.
  • Prior experience maintaining a private pip or Conda package registry for AI software.
  • Familiarity with distributed training, model parallelism, and mixed precision training.
  • Knowledge of LLM-specific optimizations, such as quantization and tensor parallel execution.
  • Exposure to high-performance computing (HPC) environments for AI workloads.


We’re looking for resilient, adaptable people to join our team: folks who enjoy collaborating and tackling tough challenges. We offer real opportunities for growth, letting you dive into complex problems and make a meaningful impact through creative solutions. If you’re a driven contributor, we encourage you to explore opportunities to make an impact at TensorWave. Join us as we redefine the possibilities of intelligent computing.


What We Bring:

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance
  • Life and Voluntary Supplemental Insurance
  • Short Term Disability Insurance
  • Flexible Spending Account
  • 401(k)
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Mental Health Benefits through Spring Health

Average salary estimate: $110,000 / year (est.); min $90,000, max $130,000


Employment type: Full-time, on-site
Date posted: March 22, 2025
