Job details

Infrastructure Engineer (GPU Cluster)

💧 About Pallon

At Pallon, a spin-off from ETH Zurich, we’re creating AI that automatically detects defects in sewer inspection videos and advises cities on when & how to fix them. By providing more precise, objective data, we aim to fix wastewater leaks, reduce CO2 emissions, and prevent urban flooding. Our mission is to make cities more sustainable and resilient.

The Role

We're looking for a seasoned infrastructure engineer to take full ownership of our infrastructure — from our high-performance GPU cluster to our cloud systems. You’ll be joining a small, deeply technical team building cutting-edge computer vision and deep learning systems.

This is a hands-on, high-impact role. You’ll lead critical decisions around architecture, performance, and scale, while also jumping in to solve real-world issues — whether that’s designing GPU scheduling strategies, tuning networking performance, or swapping out hardware.

You’ll collaborate closely with our platform and computer vision teams to make sure their tools run fast, reliably, and securely — and you'll have the autonomy to shape how that all comes together.

In this role, you might find yourself:

  • Designing and building a custom GPU cluster for deep learning workloads.

  • Deciding how we manage and scale our infrastructure — both on-prem and in the cloud.

  • Keeping systems running smoothly and securely — from data pipelines to distributed training jobs.

  • Troubleshooting weird kernel errors, configuring systemd units, or debugging Kubernetes evictions.

  • Making calls on when to script, when to automate, and when to just fix the thing.

You’ll be great in this role if:

  • You’ve spent 5+ years owning infrastructure end-to-end, ideally in startup environments.

  • You’re comfortable at every layer — from bare-metal servers and NVMe drives to container orchestration and cloud-native tools.

  • You have strong Linux fundamentals, and you know your way around networking, storage, and distributed systems.

  • You can code well enough to automate, debug, and build tooling across a variety of languages.

  • You communicate clearly and collaborate well — especially with engineers who aren’t infra specialists.

  • You thrive with autonomy and can manage your own priorities effectively.

  • You’re curious and fast-learning, especially when tackling new tools or challenges.

  • You have a university degree in Computer Science or a related field.

Bonus points for:

  • Experience with machine learning infrastructure or HPC clusters.

  • Familiarity with data engineering workflows and ETL pipelines.

Our Tech Stack

You don’t need to have experience with all of this — but here’s what we use today:

  • HPC Cluster (our hardware, colocated in a datacenter): Linux, NVIDIA GPUs, Slurm, InfiniBand

  • Cloud: Google Cloud Platform, Kubernetes, Docker, GitLab CI/CD

  • Data Analytics: dbt, BigQuery, Metabase

Read more about our Engineering team here.

😎 Benefits & Team Culture

As a part of Pallon, you will:

  • Contribute to a positive impact on society and the environment.

  • Develop a novel product that changes a whole industry.

  • Be part of a motivated, smart, fun, and supportive team of software engineers and AI researchers.

  • Own a part of Pallon and have a part in our success with our Employee Stock Option Plan (ESOP).

  • Work for the Underworld, not the Devil: exploring sewers virtually and in real life during our Pallon offsites.

  • Work from home or enjoy access to our beautiful office space located in Zürich.

Inclusion statement

At Pallon, we highly value equality of opportunity and inclusivity. We particularly encourage women and candidates from under-represented backgrounds to apply, even if you don’t match 100% of the requirements.

Pallon Glassdoor Company Review: 4.4 / 5
Pallon DE&I Review: 2.0 / 5

Average salary estimate

$100,000 / year (est.)
min: $80,000
max: $120,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Infrastructure Engineer (GPU Cluster), Pallon

Join our innovative team at Pallon as an Infrastructure Engineer (GPU Cluster), with the option to work from home or from our office in Zürich! At Pallon, we're on a mission to revolutionize urban sustainability by using AI to detect sewer system defects efficiently, directly contributing to reducing CO2 emissions and preventing flooding. As an Infrastructure Engineer, you'll be at the heart of what makes this technology possible. Your role encompasses overseeing our high-performance GPU cluster and our essential cloud systems, allowing you to leverage your expertise while working alongside a talented, technical group of individuals. You will be hands-on, leading the architectural decisions around infrastructure while solving real-world challenges, from designing GPU scheduling strategies to ensuring the security and performance of our systems. You'll work closely with our platform and computer vision teams to deliver fast, reliable tools that make a significant impact. If you're passionate about infrastructure and eager to dive into the latest technologies, this could be the perfect fit for you. With autonomy in managing your projects and a collaborative environment, you'll have the chance to develop unique solutions and advance your professional journey while working towards a sustainable future. Explore the possibilities at Pallon!

Frequently Asked Questions (FAQs) for Infrastructure Engineer (GPU Cluster) Role at Pallon
What are the responsibilities of an Infrastructure Engineer (GPU Cluster) at Pallon?

As an Infrastructure Engineer (GPU Cluster) at Pallon, you will have diverse responsibilities that include designing and building custom GPU clusters for deep learning, managing both on-prem and cloud infrastructure, troubleshooting system issues, and ensuring the smooth operation of all infrastructure components. You'll play a pivotal role in optimizing performance and scaling our systems to meet real-world demands.

What qualifications are required for the Infrastructure Engineer (GPU Cluster) position at Pallon?

To qualify for the Infrastructure Engineer (GPU Cluster) role at Pallon, candidates should have over 5 years of experience managing infrastructure in dynamic environments, especially startups. A solid understanding of Linux, networking, and distributed systems is essential. Familiarity with coding to automate and debug tasks is important, alongside a university degree in Computer Science or a relevant field.

What skills will help me succeed as an Infrastructure Engineer (GPU Cluster) at Pallon?

Success as an Infrastructure Engineer (GPU Cluster) at Pallon hinges on a combination of technical skills and personal attributes. Strong Linux fundamentals, coding ability, and knowledge of cloud-native tools are vital. Equally important are excellent communication skills and the ability to collaborate well with cross-disciplinary teams, especially engineers focused on areas outside of infrastructure.

What is the work environment like for an Infrastructure Engineer (GPU Cluster) at Pallon?

The work environment at Pallon is collaborative and innovative, promoting both autonomy and cooperation among team members. You'll be part of a motivated, intelligent, and supportive team, focused on making a real difference in urban sustainability through technology. Flexibility is also valued, with options for remote work and access to our office in Zürich.

What technology stack does Pallon use for the Infrastructure Engineer (GPU Cluster) position?

At Pallon, the technology stack for the Infrastructure Engineer (GPU Cluster) position includes a high-performance compute cluster set up with Linux, NVIDIA GPUs, and Slurm. For cloud services, we utilize Google Cloud Platform, employing Kubernetes and Docker, as well as CI/CD through GitLab. Familiarity with this tech stack is beneficial but not mandatory.

Common Interview Questions for Infrastructure Engineer (GPU Cluster)
Can you describe your experience with managing GPU clusters in a production environment?

To effectively answer this question, reflect on any specific projects you have worked on that involved GPU clusters. Discuss the challenges faced, the solutions you implemented, and any operational metrics that improved as a result. Highlight your understanding of performance optimization and troubleshooting methodologies used in GPU workloads.

What approaches do you use for configuring cloud and on-prem infrastructure?

Discuss the strategies you employ for managing hybrid infrastructure environments, focusing on how you balance scalability, cost, and performance. Make sure to mention any tools or methodologies used in your past roles, including your experience with automation tools like Terraform, Ansible, or custom scripts, and with monitoring solutions.

Describe a time when you resolved a critical issue in a distributed system.

Share a specific incident, outlining how you identified the problem, the steps taken to troubleshoot, and the ultimate solution achieved. Emphasize your analytical thinking and methodical approach, showcasing any teamwork involved in navigating the challenge effectively.

How do you approach performance tuning for deep learning workloads?

Mention your knowledge of performance metrics and what tools or methodologies you've used for optimization, such as effective GPU scheduling, adjusting batch sizes, and network tuning. Talk about how you balance resource allocation between various workloads to ensure continuous performance improvements.

What programming languages are you most familiar with for automating infrastructure tasks?

Identify the programming languages you have experience with, such as Python, Bash, or Go, and explain how you've used them to automate and streamline your infrastructure tasks. Provide specific examples of scripts or tools you've developed to improve efficiency and reliability in your past roles.

How do you maintain security in your infrastructure?

Detail your understanding of best practices for securing infrastructure, including access control, data encryption, and regular system audits. Mention any specific tools or frameworks used to manage security protocols effectively within both on-premises and cloud environments.

Can you explain your experience with orchestration tools like Kubernetes?

Share your hands-on experience with Kubernetes in managing containerized applications and workloads. Highlight specific tasks you have accomplished, such as deploying applications, setting up service meshes, or managing stateful applications, indicating your comfort level with Kubernetes in various scenarios.

How do you stay updated on the latest technologies in infrastructure engineering?

Discuss the resources you leverage for continuous learning, such as tech blogs, online courses, webinars, and participation in relevant forums or communities. Share any recent trends in infrastructure that you've explored and how you’ve applied new knowledge to your work.

What strategies do you use to effectively prioritize tasks in an infrastructure role?

Describe your approach to task prioritization, emphasizing your ability to evaluate urgency and impact on productivity. Discuss any frameworks you follow, such as the Eisenhower Matrix, or tools you use for tracking tasks, and explain how you keep stakeholders regularly informed about the status of critical infrastructure initiatives.

Can you share an example of how you have collaborated with other teams in a tech environment?

Provide an example of a cross-functional project where collaboration was key. Emphasize your communication strategies and how you aligned objectives with other teams, such as developers or data scientists, to achieve a common goal while addressing any technical challenges that arose.

EMPLOYMENT TYPE
Full-time, hybrid
DATE POSTED
April 8, 2025
