Job details

Member of Technical Staff (Cluster Manager)

As a Member of Technical Staff on Cluster Management, you will:

  • Be responsible for the reliability, performance, and scalability of our compute infrastructure.

  • Design, build, and maintain the tools that keep our systems running smoothly.

  • Monitor system performance, troubleshoot issues, and implement solutions to prevent future problems.

  • Collaborate with engineering and research teams to ensure our infrastructure meets their needs.

  • Manage machine and storage resources efficiently, and implement strategies to reduce infrastructure costs.

You may be a good fit if you have:

  • Experience managing and troubleshooting large-scale distributed systems.

  • Strong scripting and automation skills (e.g., Python, Bash).

  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).

  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana).

  • A deep understanding of cloud computing platforms (e.g., AWS, GCP, Azure).

  • Strongly desired: Experience with HPC/GPU cluster management tools (e.g., Slurm, GPU monitoring tools, distributed file systems).

  • The ability to build in a fast-paced environment under some uncertainty.

 

Reka's Mission

Reka's mission is to build useful multimodal artificial intelligence and use it to empower organisations and businesses. We are a globally distributed foundation model startup, headquartered in the San Francisco Bay Area, California. Embracing a remote-first approach, our team brings together top talent from around the world. Our founding team, along with many of our team members, has contributed to many of the breakthroughs in AI over the past decade.


Why Reka?

  • An Elite Team: Collaborate with top-tier engineers, researchers, and operators from renowned organizations like Google DeepMind and Facebook AI Research (FAIR), as well as successful startups, driving innovation in AI technology.

  • Cutting-Edge Infra: Design and manage large-scale clusters with the latest hardware.

  • Massive Market Opportunity: Be part of a rapidly growing industry poised to transform multiple sectors globally, offering the chance to make a significant impact.

  • Inclusive and Open Culture: Thrive in an open and inclusive work environment that values diverse perspectives and fosters creativity.

  • Visa Support: We provide visa assistance, including H1B and OPT transfers, for US employees to ensure a smooth transition and support your career with us.

Reka Glassdoor Company Review: 4.6 out of 5 stars
Reka DE&I Review: 4.0 out of 5 stars
CEO of Reka: Peter Hasler

Average salary estimate

$110,000 / year (est.)
min: $90,000
max: $130,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Member of Technical Staff (Cluster Manager), Reka

Joining Reka as a Member of Technical Staff (Cluster Manager) is not just another job; it’s a chance to immerse yourself in the cutting-edge world of artificial intelligence! We’re seeking someone who can take charge of our compute infrastructure, ensuring reliability, performance, and scalability that meets the needs of our engineering and research teams. Imagine being in a role where you get to design and maintain the tools that keep everything running smoothly, all while monitoring system performance and troubleshooting any issues that arise. If you’re passionate about efficiently managing machine and storage resources and have a knack for implementing cost-reduction strategies, you’ll fit right in. Reka is on a mission to harness multimodal AI to empower organizations globally, and as part of our elite, remote-first team, you’ll collaborate with top talents from renowned companies like Google DeepMind and Facebook AI Research. Experience in managing large-scale distributed systems, strong scripting skills (hello Python and Bash!), and familiarity with containerization technologies like Docker and Kubernetes are pivotal for success in this role. We also highly value expertise in cloud computing platforms and HPC/GPU cluster management tools. Join us at Reka, where your contributions can shape the future of technology in a supportive and inclusive culture that champions diverse perspectives. Together, let’s drive innovation and make a lasting impact on the world of AI!

Frequently Asked Questions (FAQs) for Member of Technical Staff (Cluster Manager) Role at Reka
What are the key responsibilities of a Member of Technical Staff at Reka?

As a Member of Technical Staff (Cluster Manager) at Reka, your primary responsibilities revolve around ensuring the reliability, performance, and scalability of our compute infrastructure. You'll design and build tools to maintain system efficiency, monitor performance, troubleshoot issues, and collaborate with engineering and research teams to align the infrastructure with their needs. Furthermore, managing resources effectively and implementing strategies for cost reduction are essential aspects of this role.

What qualifications are required for the Member of Technical Staff position at Reka?

To be considered for the Member of Technical Staff (Cluster Manager) role at Reka, candidates should have experience managing and troubleshooting large-scale distributed systems. A strong foundation in scripting and automation is crucial, particularly with tools like Python and Bash. Familiarity with containerization and orchestration technologies, as well as monitoring and logging tools, is also important. Experience with cloud platforms like AWS, GCP, or Azure, and knowledge of HPC/GPU cluster management tools will enhance your fit for this position.

How does collaboration work for the Member of Technical Staff role at Reka?

Collaboration is a cornerstone of the Member of Technical Staff (Cluster Manager) role at Reka. You'll work closely with engineering and research teams to ensure our infrastructure supports their diverse needs. This collaborative environment encourages innovative thinking and problem-solving, allowing you to contribute significantly to system performance and operational efficiency.

What is Reka's work culture like for a Member of Technical Staff?

Reka promotes an inclusive and open work culture for its team members, including the Member of Technical Staff (Cluster Manager). Our remote-first approach enables global collaboration and values diverse perspectives, fostering creativity and innovation. You’ll thrive in an environment where everyone’s contributions are valued, and teamwork is celebrated.

What are the growth opportunities for the Member of Technical Staff at Reka?

As a Member of Technical Staff (Cluster Manager) at Reka, you'll find numerous growth opportunities, especially in the rapidly evolving field of artificial intelligence. The chance to work with a highly skilled team and cutting-edge infrastructure provides the ideal setting to advance your technical expertise and leadership skills, paving the way for career progression within the organization.

Common Interview Questions for Member of Technical Staff (Cluster Manager)
How would you ensure the reliability of a large-scale distributed system?

To ensure the reliability of a large-scale distributed system, I would implement proactive monitoring and logging using tools like Prometheus and Grafana. Regular audits of system performance and thorough troubleshooting processes are essential. Building redundancy into the system and practicing chaos engineering can also help identify vulnerabilities.

What strategies would you recommend for cost-effective resource management?

For cost-effective resource management, I suggest using machine and storage resources efficiently, leveraging cloud autoscaling features, and using spot instances when appropriate. Regularly analyzing usage patterns can help identify areas for reduction while ensuring performance remains optimal.

Can you explain your experience with container orchestration tools?

In my previous roles, I gained extensive experience with container orchestration tools like Kubernetes. I’ve designed deployment strategies, scaled applications seamlessly, and managed service updates. Understanding the nuances of orchestration has enabled me to optimize application performance in dynamic environments.

How do you handle troubleshooting in complex systems?

Troubleshooting in complex systems requires a systematic approach. I follow step-by-step diagnostics, employ log analysis to pinpoint issues, and replicate problems in a controlled environment when possible. Collaborating with teams for input can also shed light on underlying causes that may not be immediately obvious.

What scripting languages are you comfortable with for automation?

I am proficient in several scripting languages, primarily Python and Bash. I’ve built automation scripts for system monitoring, resource provisioning, and deployment tasks, significantly improving operational efficiency and reducing manual errors.

Discuss a time when you improved system performance.

In a previous role, I identified performance bottlenecks in our database access patterns. By optimizing queries and implementing caching strategies, I was able to improve overall system response times by 40%, which significantly enhanced user experience and system throughput.

What’s your approach to continuous integration and deployment?

My approach to continuous integration and deployment (CI/CD) includes integrating automated testing at every stage of the pipeline. Using tools such as Jenkins and GitLab CI, I ensure that code changes are validated and deployed smoothly, reducing risks associated with new releases.

How do you keep updated with the latest technologies in your field?

I regularly engage with industry publications, participate in webinars, and attend conferences related to cloud computing and AI technologies. Additionally, I actively contribute to open-source projects, which allows me to stay abreast of evolving practices and technologies.

What role does teamwork play in managing a computing cluster?

Teamwork is crucial in managing a computing cluster, as diverse skill sets and perspectives can lead to better solutions for complex challenges. Regular communication and collaborative problem-solving ensure that everyone is aligned towards the common goal of maintaining optimal performance and availability.

What has been your biggest challenge in cluster management, and how did you overcome it?

One of my biggest challenges in cluster management was addressing unexpected downtimes during peak usage. I implemented a robust monitoring strategy which included automated alerting and a detailed incident response protocol to quickly address and mitigate future occurrences, significantly improving uptime.

FUNDING
SENIORITY LEVEL REQUIREMENT
TEAM SIZE
No info
EMPLOYMENT TYPE: Full-time, remote
DATE POSTED: March 20, 2025
