Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff
Who We Are
WaveForms AI is an Audio Large Language Models (LLMs) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive.
Role overview: The Software Engineer, AI Infrastructure (Training + Inference) will be responsible for designing, building, and optimizing the infrastructure that powers our large-scale training and real-time inference pipelines. This role combines expertise in distributed computing, system reliability, and performance optimization. The candidate will collaborate with researchers, focusing on building scalable systems that support novel multimodal training and on maintaining uptime to deliver consistent results for real-time applications.
Key Responsibilities
Infrastructure Development: Design and implement infrastructure to support large-scale AI training and real-time inference with a focus on multimodal inputs.
Distributed Computing: Build and maintain distributed systems to ensure scalability, efficient resource allocation, and high throughput.
Training Stability: Monitor and enhance the stability of training workflows by addressing bottlenecks, failures, and inefficiencies in large-scale AI pipelines.
Real-time Inference Optimization: Develop and optimize real-time inference systems to deliver low-latency, high-throughput results across diverse applications.
Uptime & Reliability: Implement tools and processes to maintain high uptime and ensure infrastructure reliability during both training and inference phases.
Performance Tuning: Identify and resolve performance bottlenecks, improving overall system throughput and response times.
Collaboration: Work closely with research and engineering teams to integrate infrastructure with AI workflows, ensuring seamless deployment and operation.
Required Skills & Qualifications
Distributed Systems Expertise: Proven experience in designing and managing distributed systems for large-scale AI training and inference.
Infrastructure for AI: Strong background in building and optimizing infrastructure for real-time AI systems, with a focus on multimodal data (audio + text).
Performance Optimization: Expertise in optimizing resource utilization, improving system throughput, and reducing latency in both training and inference.
Training Stability: Experience in troubleshooting and stabilizing AI training pipelines for high reliability and efficiency.
Technical Proficiency: Strong programming skills (Python preferred), proficiency with PyTorch, and familiarity with cloud platforms (AWS, GCP, Azure).
WaveForms AI is on the lookout for a talented Software Engineer, AI Infrastructure (Training + Inference) to join our dynamic team and help shape the future of audio intelligence. Imagine being at the forefront of advanced research where your expertise in designing, building, and optimizing infrastructure is essential for powering large-scale training and real-time inference pipelines. In this role, you'll dive into distributed computing and system reliability, working closely with researchers to create scalable systems that enhance human-AI interactions. Your day-to-day responsibilities will include developing infrastructure tailored for multimodal inputs, ensuring that our AI training workflows are stable, and optimizing real-time inference systems for lightning-fast results across a broad range of applications. We value reliability and uptime, so your skills in implementing processes to sustain high performance will be crucial. With your knowledge of performance tuning, you'll identify bottlenecks and elevate throughput while collaborating with our engineering teams for seamless deployments. If you have a strong background in distributed systems and a passion for improving AI capabilities, this position offers you the platform to make impactful contributions in a rapidly evolving field. Come join WaveForms AI and be part of a team that’s transforming the way humans and AI interact through groundbreaking technology.