
ML Framework Engineer

At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.


Job Description:

TensorWave is seeking an ML Framework Engineer to lead the integration, optimization, and maintenance of PyTorch (and select AI libraries) on AMD ROCm GPUs. This role is critical in ensuring our AI cloud platform remains at the cutting edge of performance, stability, and compatibility by tracking upstream framework changes, debugging compatibility issues, and automating builds, testing, and benchmarking. You will be responsible for maintaining a registry of validated AI libraries, debugging low-level performance issues, and working with external maintainers to upstream fixes. You will collaborate with DevOps, MLOps, and AI researchers to ensure a seamless deployment and development experience across TensorWave’s infrastructure. This role is ideal for an engineer with deep PyTorch internals knowledge, strong GPU debugging experience, and a passion for optimizing AI workloads at the framework level.


Responsibilities
  • Framework Compatibility & Versioning: Track PyTorch and other AI framework updates, maintain a versioned registry of validated builds, and proactively handle breaking changes.
  • Kernel Debugging & Profiling: Triage and debug ROCm-related issues affecting AI workloads, handling small fixes directly and escalating complex issues to MLOps and third-party maintainers.
  • Build & CI/CD Automation: Develop and maintain automated build pipelines for AI frameworks, integrating regression testing and benchmarking, while working with DevOps for large-scale automation.
  • Performance Optimization: Profile and analyze AI workload performance on AMD GPUs, identifying bottlenecks in memory access, kernel execution, and framework overhead.
  • Third-Party Collaboration: Work with PyTorch maintainers, ROCm engineers, and external AI library contributors to improve framework compatibility and push upstream fixes when needed.
  • Container & Environment Management: Maintain and update prebuilt AI container environments, ensuring seamless integration with TensorWave’s inference and training infrastructure.
  • Documentation & Knowledge Sharing: Serve as the SME (Subject Matter Expert) for library compatibility, maintaining internal documentation on framework versions, known issues, and best practices.
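To make the "versioned registry of validated builds" idea above concrete, here is a minimal, hypothetical sketch. The package names and version strings are illustrative only, not TensorWave's actual registry or tooling:

```python
# Hypothetical sketch of a validated-build registry: a mapping from framework
# name to the exact version strings that have passed validation on ROCm,
# plus a lookup used before a build is promoted. All entries are illustrative.

VALIDATED_BUILDS = {
    "torch": {"2.2.2+rocm6.0", "2.3.0+rocm6.1"},
    "triton": {"2.3.0"},
}

def is_validated(package: str, version: str) -> bool:
    """Return True if this exact package/version pair has passed validation."""
    return version in VALIDATED_BUILDS.get(package, set())

print(is_validated("torch", "2.3.0+rocm6.1"))  # validated build
print(is_validated("torch", "2.4.0"))          # not yet validated
```

In practice such a registry would live in version control or a package index rather than in code, but the lookup-before-deploy pattern is the same.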


Essential Skills & Qualifications
  • 3+ years of experience in ML framework development, optimization, or GPU debugging.
  • Strong expertise in PyTorch internals, model execution, and AI framework architecture.
  • Experience with ROCm or CUDA development, including kernel debugging and profiling.
  • Proficiency in Python and C++, with experience in optimizing AI workloads at the framework level.
  • Familiarity with low-level GPU performance profiling tools (rocprof, Nsight, perf, VTune, etc.).
  • Hands-on experience with CI/CD for AI frameworks, including automated testing and benchmarking.
  • Strong understanding of containerization (Docker, Kubernetes) and dependency management (pip, Conda, Bazel, CMake, etc.).
  • Excellent documentation skills, with a focus on library versioning, compatibility tracking, and regression analysis.


Preferred Qualifications
  • Experience contributing to PyTorch or other open-source ML frameworks.
  • Prior experience maintaining a private pip or Conda package registry for AI software.
  • Familiarity with distributed training, model parallelism, and mixed precision training.
  • Knowledge of LLM-specific optimizations, such as quantization and tensor parallel execution.
  • Exposure to high-performance computing (HPC) environments for AI workloads.


We’re looking for resilient, adaptable people to join our team—folks who enjoy collaborating and tackling tough challenges. We’re all about offering real opportunities for growth, letting you dive into complex problems and make a meaningful impact through creative solutions. If you're a driven contributor, we encourage you to explore opportunities to make an impact at TensorWave. Join us as we redefine the possibilities of intelligent computing.


What We Bring:

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance
  • Life and Voluntary Supplemental Insurance
  • Short-Term Disability Insurance
  • Flexible Spending Account
  • 401(k)
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Mental Health Benefits through Spring Health

Average salary estimate: $110,000 / year (est.)
Minimum: $90,000
Maximum: $130,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About the ML Framework Engineer Role at TensorWave

At TensorWave in vibrant Las Vegas, NV, we are taking the world of AI computing by storm, and we want you to be part of that journey. We’re looking for a talented ML Framework Engineer to join our innovative team. If you are passionate about optimizing AI workloads and have a solid grasp of PyTorch internals, this is the opportunity for you!

In this role, you will spearhead the integration and maintenance of PyTorch on AMD ROCm GPUs, which is crucial for keeping our AI cloud platform at the forefront of performance and stability. Your day-to-day will involve debugging compatibility issues, managing a registry of validated AI libraries, and automating builds and tests to ensure our infrastructure runs smoothly. You’ll collaborate closely with DevOps, MLOps, and AI researchers, creating seamless development and deployment experiences. We believe in pushing boundaries, so you’ll also be triaging ROCm-related issues and optimizing performance by analyzing AI workloads.

If you love working with third-party contributors to enhance framework compatibility and you're excited to dive into complex challenges, we want to hear from you. At TensorWave, we’re committed to your growth and offer a supportive environment where you can truly make an impact in the world of intelligent computing.

Frequently Asked Questions (FAQs) for ML Framework Engineer Role at TensorWave
What are the main responsibilities of an ML Framework Engineer at TensorWave?

As an ML Framework Engineer at TensorWave, your main responsibilities will include tracking updates for PyTorch and other AI frameworks, maintaining a versioned registry of validated builds, debugging ROCm-related issues affecting AI workloads, and developing automated build pipelines. You will also analyze the performance of AI workloads, work closely with third-party contributors, and serve as a subject matter expert in library compatibility.
What qualifications are essential for an ML Framework Engineer position at TensorWave?

To qualify for the ML Framework Engineer position at TensorWave, candidates should have 3+ years of experience in ML framework development or GPU debugging, strong knowledge of PyTorch internals, and familiarity with ROCm or CUDA development. Proficiency in Python and C++, as well as experience with automated testing and CI/CD for AI frameworks, is also essential.
What skills will help me thrive as an ML Framework Engineer at TensorWave?

Key skills that will help you thrive as an ML Framework Engineer at TensorWave include a strong understanding of AI framework architecture, proficiency in low-level GPU performance profiling tools, hands-on experience with containerization technologies like Docker and Kubernetes, and excellent documentation practices. Additionally, collaboration skills are vital for working with external maintainers and AI library contributors.
How does TensorWave support the growth and development of its ML Framework Engineers?

TensorWave is committed to fostering growth and development among its ML Framework Engineers. We offer competitive salaries, stock options, comprehensive health benefits, and a flexible PTO policy, along with a collaborative work culture that encourages tackling challenging projects and advancing your skills in the rapidly evolving field of AI.
What is the work environment like for an ML Framework Engineer at TensorWave?

The work environment for an ML Framework Engineer at TensorWave is dynamic and collaborative. Located in Las Vegas, NV, we prioritize teamwork and innovative problem-solving. Our culture emphasizes resilience and adaptability, allowing you to dive into complex challenges and contribute meaningfully to AI advancements in a supportive atmosphere.
Common Interview Questions for ML Framework Engineer
Can you explain your experience with PyTorch internals?

In your response, highlight specific projects where you've manipulated PyTorch internals, discussing how you've implemented custom layers or modified existing functionalities. Mention any performance improvements achieved as a result of your work.

How do you approach debugging compatibility issues in AI frameworks?

Describe your systematic approach to debugging, focusing on how you identify and isolate issues. Share any tools and techniques you use to troubleshoot compatibility issues, particularly in relation to ROCm or CUDA.

What strategies do you use for performance optimization on AMD GPUs?

Discuss your experience profiling AI workloads on AMD GPUs, mentioning specific profiling tools you've employed. Explain how you've identified bottlenecks and optimized memory access or kernel execution to enhance overall performance.

Can you describe your experience with CI/CD processes for AI frameworks?

Elaborate on your previous work with CI/CD pipelines, emphasizing how you developed automated testing and benchmarking procedures for ML frameworks. Include any challenges faced and how you overcame them.

What is your experience with collaborating with third-party maintainers?

Provide examples of your collaboration with external contributors or maintainers, detailing how you've approached upstream fixes and contributed to open-source projects. Highlight the importance of communication and teamwork in these scenarios.

How do you stay updated with the latest trends in AI framework development?

Share your approach to professional development, including resources such as conferences, online courses, or communities. Mention any particular areas of innovation in AI framework development that excite you.

What tools do you use for GPU performance profiling?

Be specific about the tools you've used, such as rocprof, Nsight, or VTune. Discuss how you've leveraged these tools to gain insights into GPU performance and optimize AI workloads effectively.

How do you document your work on AI frameworks?

Explain your documentation techniques, focusing on how you maintain records of library compatibility, versioning, and known issues. Stress the importance of clear documentation for team collaboration and future reference.

What role does containerization play in your development process?

Talk about your experience with containerization, specifically Docker and Kubernetes. Explain how they facilitate consistent development environments and streamline deployment processes for AI frameworks.

How do you handle challenging performance issues in ML workloads?

Outline a specific instance where you encountered a significant performance issue, detailing your investigation process. Discuss the steps you took to address it, including collaboration and testing methodologies.


Supercharge your large-scale PyTorch LLM workloads with our cloud powered by AMD MI300X

Employment Type: Full-time, on-site
Date Posted: March 22, 2025
