Job details

Member of Technical Staff (Cluster Manager)

As a Member of Technical Staff on Cluster Management, you will:

  • Be responsible for the reliability, performance, and scalability of our compute infrastructure.

  • Design, build, and maintain the tools that keep our systems running smoothly.

  • Monitor system performance, troubleshoot issues, and implement solutions to prevent future problems.

  • Collaborate with engineering and research teams to ensure our infrastructure meets their needs.

  • Manage machine and storage resources efficiently, and implement strategies to reduce infrastructure costs.

You may be a good fit if you have:

  • Experience managing and troubleshooting large-scale distributed systems.

  • Strong scripting and automation skills (e.g., Python, Bash).

  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).

  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana).

  • A deep understanding of cloud computing platforms (e.g., AWS, GCP, Azure).

  • Strongly desired: Experience with HPC/GPU cluster management tools (e.g., Slurm, GPU monitoring tools, distributed file systems).

  • The ability to build in a fast-paced environment under some uncertainty.
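To make the GPU monitoring item above concrete, here is a minimal sketch of the kind of helper such tooling might contain. It assumes the CSV that `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` prints. The `GpuSample` type, the function names, and the 5% idle threshold are illustrative assumptions, not Reka's tooling.

```python
import csv
import io
from dataclasses import dataclass

# Fields requested from nvidia-smi, in this order (memory values are in MiB):
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
@dataclass
class GpuSample:
    index: int
    util_pct: float       # GPU utilization in percent
    mem_used_mib: float
    mem_total_mib: float


def parse_gpu_csv(text: str) -> list[GpuSample]:
    """Parse nvidia-smi CSV output (no header, no units) into samples.

    Fields reported as "[N/A]" would raise ValueError; handle them as needed.
    """
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        idx, util, used, total = (field.strip() for field in row)
        samples.append(GpuSample(int(idx), float(util), float(used), float(total)))
    return samples


def idle_gpus(samples: list[GpuSample], util_threshold: float = 5.0) -> list[int]:
    """Return indices of GPUs whose utilization is below the (assumed) threshold."""
    return [s.index for s in samples if s.util_pct < util_threshold]


if __name__ == "__main__":
    # Hypothetical output captured from one 4-GPU node.
    sample_output = (
        "0, 97, 70000, 81920\n"
        "1, 0, 0, 81920\n"
        "2, 88, 65000, 81920\n"
        "3, 2, 512, 81920\n"
    )
    print(idle_gpus(parse_gpu_csv(sample_output)))  # [1, 3]
```

On a real node, the text would come from running that command (for example with `subprocess.run`) rather than from a hard-coded string.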

 

Reka's Mission

Reka's mission is to build useful multimodal artificial intelligence and use it to empower organisations and businesses. We are a globally distributed foundation model startup, headquartered in the San Francisco Bay Area, California. Embracing a remote-first approach, our team brings together top talent from around the world. Our founding team, along with many of our team members, has contributed to many of the breakthroughs in AI over the past decade.


Why Reka?

  • An Elite Team: Collaborate with top-tier engineers, researchers, and operators from renowned organizations like Google DeepMind and Facebook AI Research (FAIR), as well as successful startups, driving innovation in AI technology.

  • Cutting-Edge Infra: The opportunity to design and manage large-scale clusters with the latest hardware.

  • Massive Market Opportunity: Be part of a rapidly growing industry poised to transform multiple sectors globally, offering the chance to make a significant impact.

  • Inclusive and Open Culture: Thrive in an open and inclusive work environment that values diverse perspectives and fosters creativity.

  • Visa Support: We provide visa assistance, including H-1B and OPT transfers, for US employees to ensure a smooth transition and support your career with us.

Reka Glassdoor Company Review: 4.6
Reka DE&I Review (Glassdoor): 4.0
CEO of Reka: Peter Hasler

Average salary estimate

$110,000 / YEARLY (est.)
Min: $90,000
Max: $130,000

If an employer mentions a salary or salary range in their job posting, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Member of Technical Staff (Cluster Manager), Reka

Joining Reka as a Member of Technical Staff (Cluster Manager) is not just another job; it’s a chance to immerse yourself in the cutting-edge world of artificial intelligence! We’re seeking someone who can take charge of our compute infrastructure, delivering the reliability, performance, and scalability our engineering and research teams need. Imagine a role where you design and maintain the tools that keep everything running smoothly, all while monitoring system performance and troubleshooting any issues that arise. If you’re passionate about managing machine and storage resources efficiently and have a knack for implementing cost-reduction strategies, you’ll fit right in. Reka is on a mission to harness multimodal AI to empower organizations globally, and as part of our elite, remote-first team, you’ll collaborate with top talent from renowned companies like Google DeepMind and Facebook AI Research. Experience managing large-scale distributed systems, strong scripting skills (hello Python and Bash!), and familiarity with containerization technologies like Docker and Kubernetes are pivotal for success in this role. We also highly value expertise in cloud computing platforms and HPC/GPU cluster management tools. Join us at Reka, where your contributions can shape the future of technology in a supportive and inclusive culture that champions diverse perspectives. Together, let’s drive innovation and make a lasting impact on the world of AI!

Frequently Asked Questions (FAQs) for Member of Technical Staff (Cluster Manager) Role at Reka
What are the key responsibilities of a Member of Technical Staff at Reka?

As a Member of Technical Staff (Cluster Manager) at Reka, your primary responsibilities revolve around ensuring the reliability, performance, and scalability of our compute infrastructure. You'll design and build tools to maintain system efficiency, monitor performance, troubleshoot issues, and collaborate with engineering and research teams to align the infrastructure with their needs. Furthermore, managing resources effectively and implementing strategies for cost reduction are essential aspects of this role.

What qualifications are required for the Member of Technical Staff position at Reka?

To be considered for the Member of Technical Staff (Cluster Manager) role at Reka, candidates should have experience managing and troubleshooting large-scale distributed systems. A strong foundation in scripting and automation is crucial, particularly in languages such as Python and Bash. Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes) and with monitoring and logging tools (e.g., Prometheus, Grafana) is also important. Experience with cloud platforms such as AWS, GCP, or Azure, and knowledge of HPC/GPU cluster management tools such as Slurm, will strengthen your fit for this position.

How does collaboration work for the Member of Technical Staff role at Reka?

Collaboration is a cornerstone of the Member of Technical Staff (Cluster Manager) role at Reka. You'll work closely with engineering and research teams to ensure our infrastructure supports their diverse needs. This collaborative environment encourages innovative thinking and problem-solving, allowing you to contribute significantly to system performance and operational efficiency.

What is Reka's work culture like for a Member of Technical Staff?

Reka promotes an inclusive and open work culture for its team members, including the Member of Technical Staff (Cluster Manager). Our remote-first approach enables global collaboration and values diverse perspectives, fostering creativity and innovation. You’ll thrive in an environment where everyone’s contributions are valued, and teamwork is celebrated.

What are the growth opportunities for the Member of Technical Staff at Reka?

As a Member of Technical Staff (Cluster Manager) at Reka, you'll find numerous growth opportunities, especially in the rapidly evolving field of artificial intelligence. The chance to work with a highly skilled team and cutting-edge infrastructure provides the ideal setting to advance your technical expertise and leadership skills, paving the way for career progression within the organization.

Common Interview Questions for Member of Technical Staff (Cluster Manager)
How would you ensure the reliability of a large-scale distributed system?

To ensure the reliability of a large-scale distributed system, I would implement proactive monitoring and logging using tools like Prometheus and Grafana. Regular audits of system performance and a well-defined troubleshooting process are essential. Building redundancy into the system and running chaos engineering experiments can also expose weaknesses before they cause outages.
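The proactive monitoring in this answer can be illustrated with a short sketch against the Prometheus HTTP API. The built-in `up` metric is 0 for every scrape target Prometheus failed to reach, so the instant query `up == 0` lists unreachable targets. The server address and the helper names are assumptions for illustration; the demo parses a hard-coded sample response so it runs without a live server.

```python
import json
import urllib.parse
import urllib.request

# Assumed Prometheus address; replace with your own server.
PROMETHEUS_URL = "http://localhost:9090"


def query_prometheus(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def down_targets(response: dict) -> list[str]:
    """Return sorted 'job/instance' strings for every series in an `up == 0` result."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response.get('error')}")
    return sorted(
        f"{r['metric'].get('job', '?')}/{r['metric'].get('instance', '?')}"
        for r in response["data"]["result"]
    )


if __name__ == "__main__":
    # Hypothetical response shaped like the API's instant-vector result.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"job": "node", "instance": "gpu-node-07:9100"},
                 "value": [1700000000.0, "0"]},
            ],
        },
    }
    print(down_targets(sample))  # ['node/gpu-node-07:9100']
```

Against a real server, `down_targets(query_prometheus("up == 0"))` would return the same kind of list; in production this check usually lives in an alerting rule rather than a one-off script.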

What strategies would you recommend for cost-effective resource management?

For cost-effective resource management, I would focus on using machine and storage resources efficiently, leveraging cloud autoscaling, and running interruption-tolerant workloads on spot instances. Regularly analyzing usage patterns helps identify idle or underused capacity that can be cut without degrading performance.
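The usage-pattern analysis mentioned here can start as simply as averaging utilization per node over a window and flagging nodes below a threshold. This is a hedged sketch: the input shape (node name mapped to utilization samples in percent), the 20% threshold, and the node names are assumptions, and real data would come from a metrics store such as Prometheus.

```python
from statistics import mean


def underused_nodes(
    utilization: dict[str, list[float]],
    threshold_pct: float = 20.0,
) -> list[tuple[str, float]]:
    """Return (node, mean utilization) for nodes averaging below threshold_pct.

    `utilization` maps a node name to utilization samples (0-100) collected
    over some window, e.g. hourly averages for a week.
    """
    flagged = []
    for node, samples in utilization.items():
        if not samples:
            continue  # no data: investigate separately rather than guess
        avg = mean(samples)
        if avg < threshold_pct:
            flagged.append((node, avg))
    # Least-used nodes first: the most likely candidates for downsizing.
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    usage = {
        "node-a": [85.0, 90.0, 78.0],
        "node-b": [5.0, 0.0, 10.0],
        "node-c": [15.0, 25.0, 11.0],
    }
    print(underused_nodes(usage))  # [('node-b', 5.0), ('node-c', 17.0)]
```

A mean hides bursty workloads, so in practice one would also look at peaks (for example a high percentile) before releasing capacity.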

Can you explain your experience with container orchestration tools?

In my previous roles, I gained extensive experience with container orchestration tools like Kubernetes. I’ve designed deployment strategies, scaled applications to match changing demand, and managed service updates. Understanding the nuances of orchestration has enabled me to optimize application performance in dynamic environments.

How do you handle troubleshooting in complex systems?

Troubleshooting in complex systems requires a systematic approach. I work through diagnostics step by step, use log analysis to pinpoint issues, and reproduce problems in a controlled environment when possible. Asking other teams for input can also shed light on underlying causes that may not be immediately obvious.
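As an illustration of the log-analysis step, the sketch below counts ERROR lines per component so that failures clustering in one subsystem stand out. The log format and regular expression are hypothetical; the pattern would need adapting to whatever the services actually emit.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumed (hypothetical) log line format:
#   2025-03-20T12:00:01Z ERROR [scheduler] message...
LOG_LINE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]")


def errors_by_component(lines: Iterable[str]) -> Counter:
    """Count ERROR lines per component; lines that don't match are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("component")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2025-03-20T12:00:01Z INFO [scheduler] job 41 started",
        "2025-03-20T12:00:05Z ERROR [storage] mount timeout on /data",
        "2025-03-20T12:00:09Z ERROR [storage] mount timeout on /data",
        "2025-03-20T12:00:12Z ERROR [scheduler] node gpu-07 not responding",
    ]
    print(errors_by_component(sample).most_common())
    # [('storage', 2), ('scheduler', 1)]
```

Because the function accepts any iterable of lines, it can read an open log file directly without loading it all into memory.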

What scripting languages are you comfortable with for automation?

I am proficient in several scripting languages, primarily Python and Bash. I’ve built automation scripts for system monitoring, resource provisioning, and deployment tasks, significantly improving operational efficiency and reducing manual errors.
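A minimal example of the kind of monitoring automation described here, assuming the goal is to warn when a filesystem fills up. It uses Python's standard `shutil.disk_usage`; the 85% threshold and the list of paths are assumptions for illustration.

```python
import shutil


def disk_usage_pct(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def check_mounts(paths: list[str], warn_pct: float = 85.0) -> list[str]:
    """Return a warning message for each path at or above warn_pct (assumed threshold)."""
    warnings = []
    for path in paths:
        pct = disk_usage_pct(path)
        if pct >= warn_pct:
            warnings.append(f"{path} is {pct:.1f}% full")
    return warnings


if __name__ == "__main__":
    # Example path only; a cluster node might also check scratch or dataset mounts.
    for message in check_mounts(["/"]):
        print("WARNING:", message)
```

Run from cron or a systemd timer, a script like this is a stopgap; a metrics exporter plus an alerting rule scales better across many nodes.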

Discuss a time when you improved system performance.

In a previous role, I identified performance bottlenecks in our database access patterns. By optimizing queries and implementing caching strategies, I was able to improve overall system response times by 40%, which significantly enhanced user experience and system throughput.
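The caching idea in this answer can be sketched with Python's standard `functools.lru_cache`, which memoizes results in process memory. `slow_lookup` is a hypothetical stand-in for a database query. A production cache (for example, a shared cache service) would also need expiry and invalidation so that stale rows are not served.

```python
import time
from functools import lru_cache


def slow_lookup(user_id: int) -> dict:
    """Hypothetical stand-in for an expensive database query."""
    time.sleep(0.05)  # simulate query latency
    return {"user_id": user_id, "plan": "standard"}


@lru_cache(maxsize=1024)
def cached_lookup(user_id: int) -> dict:
    """Memoize results so repeated lookups skip the slow path.

    The same dict object is returned on every hit, so callers should
    treat it as read-only.
    """
    return slow_lookup(user_id)


if __name__ == "__main__":
    for uid in [1, 2, 1, 1, 3, 2]:
        cached_lookup(uid)
    print(cached_lookup.cache_info())
    # CacheInfo(hits=3, misses=3, maxsize=1024, currsize=3)
```

Only the first lookup per `user_id` pays the simulated latency; `cache_info()` is a cheap way to confirm the hit rate actually justifies the cache.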

What’s your approach to continuous integration and deployment?

My approach to continuous integration and deployment (CI/CD) includes integrating automated testing at every stage of the pipeline. Using tools such as Jenkins and GitLab CI, I ensure that code changes are validated and deployed smoothly, reducing risks associated with new releases.

How do you keep updated with the latest technologies in your field?

I regularly engage with industry publications, participate in webinars, and attend conferences related to cloud computing and AI technologies. Additionally, I actively contribute to open-source projects, which allows me to stay abreast of evolving practices and technologies.

What role does teamwork play in managing a computing cluster?

Teamwork is crucial in managing a computing cluster, as diverse skill sets and perspectives can lead to better solutions for complex challenges. Regular communication and collaborative problem-solving ensure that everyone is aligned towards the common goal of maintaining optimal performance and availability.

What has been your biggest challenge in cluster management, and how did you overcome it?

One of my biggest challenges in cluster management was unexpected downtime during peak usage. I implemented a robust monitoring strategy that included automated alerting and a detailed incident response protocol, which let us respond to incidents quickly and prevent recurrences, significantly improving uptime.
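Automated alerting of the kind described in this answer typically avoids paging on momentary spikes by requiring a condition to persist. This sketch shows that idea with a consecutive-sample counter, similar in spirit to the `for:` duration in Prometheus alerting rules. The class name, threshold, and sample values are assumptions, and this is not how Prometheus itself evaluates rules.

```python
from dataclasses import dataclass, field


@dataclass
class SustainedAlert:
    """Fire only after `threshold` is exceeded for `for_samples` consecutive samples."""

    threshold: float
    for_samples: int
    _streak: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Record one sample; return True while the alert is firing."""
        # Any sample at or below the threshold resets the streak.
        self._streak = self._streak + 1 if value > self.threshold else 0
        return self._streak >= self.for_samples


if __name__ == "__main__":
    # Hypothetical error-rate samples (percent), one per minute.
    alert = SustainedAlert(threshold=5.0, for_samples=3)
    samples = [1.0, 9.0, 2.0, 7.0, 8.0, 6.5, 3.0]
    print([alert.observe(v) for v in samples])
    # [False, False, False, False, False, True, False]
```

The single spike to 9.0 never fires; only the run of three high samples does. Choosing `for_samples` trades detection latency against noise.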

Similar Jobs

  • Reka, Remote (no location specified), posted 11 days ago

  • NFQ, Remote (Vilnius / Kaunas / Šiauliai), posted 3 days ago

  • Visa, Remote (Bangalore, India), posted 5 days ago

  • Guidehouse, Hybrid (Bethesda, Maryland, United States), posted 18 hours ago

Employment type: Full-time, remote
Team size: No info
Date posted: March 20, 2025
