Senior Software Engineer, Distributed Systems & Infrastructure

About Us

At Vizcom, we empower designers at companies like Nike, General Motors, and Riot Games to turn ideas into reality faster and with more precision. Our tools integrate seamlessly into workflows, providing real-time feedback that bridges creativity and manufacturability.

We’re building a high-performance, reliable job-scheduling system that powers distributed AI/ML workflows and ephemeral jobs. Our platform must handle large-scale concurrency, orchestrate GPU workers, and provide seamless failover and retry. We value engineers who excel at designing robust infrastructure, implementing elegant distributed systems, and writing clean, maintainable code.

The Role

You will be the primary engineer designing and implementing a next-generation Job Scheduling & Distributed Computing platform. The scope spans everything from a fault-tolerant queue system to advanced load balancing, worker orchestration, real-time monitoring, and autoscaling. You’ll collaborate with product teams to ensure the platform can handle diverse workloads, such as ephemeral AI jobs, data processing, and high-priority tasks.

Key Responsibilities

  • Design & Build a Job Scheduling Service:

    • Architect a robust queuing system (e.g., Redis, Postgres, or another datastore) to track, schedule, and distribute jobs across multiple workers/GPUs.

    • Implement advanced features: priority scheduling, concurrency limits, retry logic, and timeouts.

  • Infrastructure & Reliability:

    • Ensure the system is highly available, fault tolerant, and horizontally scalable.

    • Introduce monitoring, alerting, and logging best practices for distributed workloads.

    • Automate provisioning, autoscaling, and failover in cloud environments (AWS, GCP, or similar).

  • Worker Orchestration:

    • Manage worker registration and capacity tracking.

    • Implement a load balancing strategy based on resource usage (GPU, CPU, memory).

    • Support ephemeral job “mailboxes” that stream results to clients in real time.

  • System Integrations:

    • Collaborate with AI/ML teams to integrate inference workloads (e.g., GPU-intensive tasks) into the job scheduler.

    • Hook into existing deployment pipelines and internal tooling.

  • Performance & Observability:

    • Collect and analyze metrics for scheduling latency, queue lengths, job success/failure, and worker health.

    • Optimize throughput, minimize overhead, and detect performance bottlenecks early.
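
As a rough illustration of the scheduling features listed above (priority scheduling, concurrency limits, retry logic, and timeouts), here is a minimal in-memory sketch in TypeScript. It is not a description of Vizcom's system: every name (JobScheduler, lease, maxAttempts, the dead-letter list) is hypothetical, time is passed in explicitly as `now` so the behavior is deterministic, and a production version would keep this state in a durable store such as Redis or Postgres rather than in process memory.

```typescript
// Hypothetical in-memory scheduler sketch: priority + concurrency limit + retries + lease timeouts.

interface Job {
  id: string;
  priority: number;    // higher value is leased first
  attempts: number;    // number of times the job has been handed to a worker
  maxAttempts: number; // after this many failed or timed-out attempts, the job is dead-lettered
  seq: number;         // enqueue order, used as a FIFO tie-breaker within a priority
}

interface Lease {
  job: Job;
  workerId: string;
  deadline: number; // timestamp (ms) at which the attempt counts as timed out
}

class JobScheduler {
  private readonly maxConcurrent: number;
  private readonly timeoutMs: number;
  private pending: Job[] = [];
  private running = new Map<string, Lease>();
  private nextSeq = 0;
  readonly dead: Job[] = [];

  constructor(maxConcurrent: number, timeoutMs: number) {
    this.maxConcurrent = maxConcurrent;
    this.timeoutMs = timeoutMs;
  }

  enqueue(id: string, priority = 0, maxAttempts = 3): void {
    this.pending.push({ id, priority, attempts: 0, maxAttempts, seq: this.nextSeq++ });
  }

  // Hand the highest-priority pending job to a worker, unless the concurrency limit is reached.
  lease(workerId: string, now: number): Job | undefined {
    this.reapTimeouts(now);
    if (this.running.size >= this.maxConcurrent || this.pending.length === 0) return undefined;
    // A linear scan keeps the sketch short; a heap or sorted set scales better.
    let best = 0;
    for (let i = 1; i < this.pending.length; i++) {
      const a = this.pending[i];
      const b = this.pending[best];
      if (a.priority > b.priority || (a.priority === b.priority && a.seq < b.seq)) best = i;
    }
    const [job] = this.pending.splice(best, 1);
    job.attempts++;
    this.running.set(job.id, { job, workerId, deadline: now + this.timeoutMs });
    return job;
  }

  // Worker reports success; returns false if there is no active lease
  // (for example, because a timeout already reclaimed it).
  complete(id: string): boolean {
    return this.running.delete(id);
  }

  // Worker reports failure; the job is retried until maxAttempts is exhausted.
  fail(id: string): void {
    const lease = this.running.get(id);
    if (lease === undefined) return;
    this.running.delete(id);
    this.retryOrBury(lease.job);
  }

  private retryOrBury(job: Job): void {
    // A retried job keeps its original seq, so it goes ahead of newer jobs of equal priority.
    if (job.attempts < job.maxAttempts) this.pending.push(job);
    else this.dead.push(job);
  }

  // Leases past their deadline are treated as failed attempts (e.g. the worker crashed).
  private reapTimeouts(now: number): void {
    this.running.forEach((lease, id) => {
      if (now >= lease.deadline) {
        this.running.delete(id);
        this.retryOrBury(lease.job);
      }
    });
  }
}
```

In this sketch a worker calls `lease`, then either `complete` or `fail`. Because an expired lease is counted as a failed attempt, a job held by a crashed worker is retried rather than lost, and a job that keeps failing ends up in `dead` instead of looping forever.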

About You

  • 5+ years of experience in backend or infrastructure engineering with a focus on distributed systems or HPC (high-performance computing).

  • Deep knowledge of concurrency patterns, job queues, or pub/sub frameworks (e.g., BullMQ, RabbitMQ, Kafka, or custom solutions).

  • Cloud Expertise: Comfortable deploying containerized services (Docker/Kubernetes) on AWS, GCP, or Azure. Knowledge of IaC (Pulumi, Terraform, or CDK) is a plus.

  • Database & Caching: Skilled with SQL and NoSQL databases. Familiarity with in-memory datastores such as Redis for real-time queueing.

  • Programming: Proficient in Node.js/TypeScript (or a similar backend language). Strong coding skills, with comfort writing production-grade code, testable components, and microservices.

  • Scalable Infra: Track record of designing and running highly scalable, resilient backends. Experience with autoscaling GPU or HPC clusters is a huge bonus.

  • Monitoring & DevOps: Good grasp of logging, metrics (Datadog, Prometheus, Grafana), and CI/CD pipelines.

Nice to Have

  • GPU / ML: Experience orchestrating GPU-intensive jobs, integrating with frameworks like PyTorch or TensorFlow.

  • Event-Driven: Familiarity with tRPC, GraphQL, or gRPC for real-time or streaming data flows.

  • Security & Networking: Knowledge of API token management, service-to-service security, TLS termination, etc.

  • Autoscaling: Practical experience building or tuning an autoscaler.

What We Offer

  • Ownership & Impact: You’ll design a critical system used by the entire organization; your code will be the backbone of large-scale AI/ML workflows.

  • Cutting-Edge Stack: Work with GPU clusters, ephemeral job management, real-time scheduling, and advanced cloud infra.

  • Flexible Work Environment: Remote-friendly culture, flexible hours, and support for personal development.

  • Compensation & Benefits: Competitive salary, equity, healthcare, and an allowance for home office or co-working space.

  • Growth Opportunities: Potential for a leadership track, helping define the engineering culture and best practices for years to come.

Average salary estimate

$140,000 / year (est.)
Range: $120,000 to $160,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

Similar Jobs
Posted 7 days ago

Develop and maintain scalable infrastructure software while automating processes and ensuring secure, high-performance system operations.

Navan Hybrid Palo Alto, California, United States
Posted 7 days ago

Contribute to Navan's Expense Platform as a Backend Engineer, building innovative, scalable, and reliable backend services that power modern expense management.

WalkMe Hybrid New York City
Posted 5 days ago

WalkMe is looking for a Full Stack Developer passionate about web technologies and product innovation to help elevate their digital adoption platform.

Posted yesterday

Contribute as a Full-Stack Product Engineer at One Project to develop innovative technology supporting a new, equitable economic system.

Advance your career by developing cutting-edge EDA software and AI applications with Cadence in Burlington, MA.

An exciting opportunity for Senior Software Engineers to drive innovation in computational geometry and manufacturing automation at a well-funded early-stage startup.

Koalafi Hybrid Richmond, Virginia, United States
Posted 6 days ago

Koalafi seeks a Full Stack Engineer skilled in modern web technologies to develop scalable customer-facing applications in an innovative fintech environment.

StubHub Hybrid Los Angeles, California, United States
Posted 4 days ago

Contribute to StubHub’s edge infrastructure as a Staff Software Engineer focused on state-of-the-art CDN and security systems, enabling millions of live event fans worldwide.

Posted 3 days ago

Noetica is hiring a Backend Software Engineer to build robust data pipelines and integrations that support cutting-edge NLP solutions in capital markets.

Experienced .NET and Angular developer wanted for a remote contract role with a leading nearshore technology firm serving top U.S. clients.

Apple Hybrid Cupertino, California, United States
Posted 13 days ago

Contribute to the evolution of Apple's core operating system foundation by developing innovative system software tailored for cloud and server environments within the Darwin Server team.

Posted 10 days ago

Drive global user growth as a Senior Software Engineer on Airwallex's innovative Growth Team based in San Francisco.

Posted 4 days ago

Lead a cross-functional team at Shield AI to build advanced test infrastructures for cutting-edge AI and robotics defense technology.

Employment type: Full-time, remote
Date posted: February 21, 2025
