AryaXAI stands at the forefront of AI innovation, revolutionizing AI for mission-critical businesses by building explainable, safe, and aligned systems that scale responsibly. Our mission is to create AI tools that empower researchers, engineers, and organizations to unlock AI's full potential while maintaining transparency and safety.
Our team thrives on a shared passion for cutting-edge innovation, collaboration, and a relentless drive for excellence. At AryaXAI, everyone contributes hands-on to our mission in a flat organizational structure that values curiosity, initiative, and exceptional performance.
Requirements:
Core Languages: CUDA, Python
Frameworks: CUTLASS, pybind11 (or similar)
Tools: Nsight, JAX/XLA bindings
Focus Areas: GPU Kernel Optimization, Deep Learning Inference & Training
Role Overview
We are looking for a highly skilled AI Researcher - GPU Kernel Developer to join our team and push the boundaries of high-performance AI computation. In this role, you will design, develop, and optimize GPU kernels that power state-of-the-art AI models. Your work will directly influence the performance and scalability of our AI systems.
Key Responsibilities:
Develop and refine low-level CUDA kernel optimizations for deep learning inference and training.
Profile, debug, and optimize single and multi-GPU operations using tools like Nsight.
Deeply understand and exploit GPU memory hierarchies and computational capabilities.
Implement cutting-edge methods from research papers into CUDA kernels.
Collaborate on designing innovative solutions to achieve peak GPU performance.
Ideal Candidate Profile
We are looking for candidates with a proven track record of excellence in GPU programming and AI system optimization. You should bring:
Core Experiences:
Expertise in designing high-performance GeMM CUDA kernels using Tensor cores or CUDA cores, leveraging tools like CuTe or CUTLASS.
Proficiency in extending or writing custom attention and deep learning kernels from scratch.
Confidence in writing both forward and backward kernels while managing floating-point precision errors.
Strong optimization skills for both memory-bound and compute-bound operations.
Advanced knowledge of GPU architecture, including register pressure, shared-memory usage, and GPU utilization.
Preferred Skills:
Familiarity with profiling tools (e.g., Nsight) to identify bottlenecks and improve performance.
Experience integrating custom kernels with frameworks like JAX/XLA through tools like pybind11.
Awareness of the latest advancements in GPU optimization techniques for AI workloads.
Why Join AryaXAI?
Mission-Driven Impact: Work on challenges that shape the future of responsible AI.
Technical Excellence: Collaborate with a team of passionate and experienced professionals.
Growth Opportunities: Contribute across domains and expand your expertise in GPU kernel development and AI research.
Flexible Work Environment: Choose between remote work or relocation support to one of our key offices.
Interview Process
Application Review: We review your CV and a statement of exceptional work.
Initial Interview (15 Minutes): A technical team member will evaluate your basic skills and fit for the role.
Main Process:
Coding Assessment: Solve programming challenges in your preferred language.
Systems Problem-Solving: A live, hands-on session to showcase practical expertise.
Project Deep-Dive: Present your most notable project to our team.
Team Meet & Greet: Engage with the broader AryaXAI team.
Note: Our interviews are designed to conclude within one week to streamline your onboarding process.
At AryaXAI, we’re seeking an experienced AI Researcher specializing in CUDA to join our innovative team in Mumbai. We're not just another tech company; we stand at the forefront of AI innovation, creating systems that are not only powerful but also explainable and aligned with mission-critical business needs. As an AI Researcher - GPU Kernel Developer, you will have the unique opportunity to design, develop, and optimize high-performance GPU kernels that drive state-of-the-art AI models. Your work will directly impact the efficiency and scalability of our systems, empowering engineers, researchers, and organizations alike. We pride ourselves on a flat organizational structure where everyone's insights and talents are valued. In this role, you'll work collaboratively with a team driven by a passion for excellence, diving deep into GPU programming, memory hierarchies, and computational capabilities. If you’re ready to tackle challenges that shape the future of responsible AI while enjoying a flexible work environment, let's explore the possibilities together at AryaXAI!
Subscribe to Rise newsletter