Senior Staff Site Reliability Engineer

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role:

At Crusoe Energy Systems, our Site Reliability Engineering (SRE) team plays a pivotal role in ensuring the reliability and performance of our infrastructure. SRE at Crusoe is dedicated to detecting, analyzing, and preventing issues so that we meet our Service Level Agreements (SLAs), which we track through Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Through automation and proactive remediation, our SREs not only resolve common errors automatically but also advise various engineering teams on building resilient code. We prioritize anticipating and resolving issues before they impact our customers, conducting thorough post-mortems, and driving continuous improvement. Our customer-centric approach ensures that clients always have access to the virtual machines they depend on. Join us to help build and maintain the robust systems that power Crusoe's innovative solutions.

A Day in the Life:

As a Site Reliability Engineer at Crusoe Energy Systems, your day begins with a review of overnight alerts and system performance metrics to ensure everything is running smoothly. You will collaborate with your team in a morning stand-up meeting to discuss ongoing projects, recent incidents, and priorities for the day. Your tasks might include automating routine processes, analyzing system logs, and developing tools to enhance our monitoring capabilities. You'll spend part of your day working closely with software engineers, advising on best practices for resilient code and reviewing changes before deployment. Regularly, you will engage in incident response drills, post-mortems, and root cause analysis sessions to learn from past issues and prevent future ones. Throughout the day, you will stay focused on maintaining high SLIs and SLOs, ensuring that our infrastructure remains robust and reliable for our customers. By day's end, you will document your work, share insights with your team, and plan for the next day's challenges, always with a customer-centric mindset.


You Will Thrive In This Role If:

  • 12+ years of professional SRE experience

  • 12+ years of experience contributing to architecture and design (architecture, design patterns, reliability and scaling) of new and current systems

  • Bachelor's Degree in Computer Science or related field, or 15+ years relevant work experience

  • Solid understanding of infrastructure design, including the operational trade-offs of various designs

  • Experience writing high quality code with at least one programming language (Python, Go, or similar)

  • Experience building with modern infrastructure tools such as Docker, Kubernetes, Ansible, CloudFormation, Terraform

  • Experience building with modern CI/CD practices and build systems, such as GitLab CI/CD, CircleCI, GitHub Actions

  • Experience with logging, monitoring and alerting systems and tools

  • Experience with Unix/Linux environments

  • Experience with TCP/IP and network programming

  • Experience with information security best practices

  • Excellent communication skills

  • Must be able to pass a background check

  • Embody the Company values

Benefits:

  • Hybrid work schedule

  • Industry competitive pay

  • Restricted Stock Units in a fast growing, well-funded technology company

  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents

  • Employer contributions to HSA accounts 

  • Paid Parental Leave 

  • Paid life insurance, short-term and long-term disability 

  • Teladoc 

  • 401(k) with a 100% match up to 4% of salary

  • Generous paid time off and holiday schedule

  • Cell phone reimbursement

  • Tuition reimbursement

  • Subscription to the Calm app

  • MetLife Legal

  • Company-paid commuter benefit: $50 per pay period

Compensation Range:

Compensation will be paid up to $290,000 base salary. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Crusoe Glassdoor Company Review: 3.4 out of 5 stars
CEO of Crusoe: Chase Lochmiller

Average salary estimate

$290,000 / yearly (est.)
min: $290,000
max: $290,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Senior Staff Site Reliability Engineer, Crusoe

At Crusoe Energy Systems, we're not just part of the tech industry; we're shaping the future of AI-powered cloud infrastructure. As a Senior Staff Site Reliability Engineer, you'll play a key role in ensuring that our cutting-edge services are reliable and robust for our clients. You'll join a passionate team dedicated to transforming AI needs into reality while embedding sustainability into our practices. Each day is an opportunity to innovate—review alerts, analyze metrics, and proactively enhance our systems. Collaborate with talented engineers, automate repetitive tasks, and dive deep into troubleshooting root causes. Your insights will help ensure we keep our Service Level Agreements (SLAs) on point, and your creativity will be essential in building resilient code. With 12+ years of SRE experience and a sound technical background, you’ll be deeply involved in various aspects of infrastructure design and operational excellence. Plus, you'll benefit from Crusoe's hybrid work schedule and an attractive compensation package. Here, your work will have a tangible impact on our mission to provide the best AI infrastructure while prioritizing climate responsibility. Come be a part of a company that values continuous improvement and customer satisfaction as much as you do.

Frequently Asked Questions (FAQs) for Senior Staff Site Reliability Engineer Role at Crusoe
What are the main responsibilities of a Senior Staff Site Reliability Engineer at Crusoe Energy Systems?

As a Senior Staff Site Reliability Engineer at Crusoe Energy Systems, your main responsibilities will include monitoring system alerts, enhancing infrastructure reliability, and participating in incident response training. You'll also work on automating processes, analyzing system performance metrics, collaborating with engineering teams, and advising on resilient code best practices, all while aligning with our commitment to sustainability.

What qualifications are required for the Senior Staff Site Reliability Engineer position at Crusoe?

To qualify for the Senior Staff Site Reliability Engineer position at Crusoe Energy Systems, you will need at least 12 years of professional SRE experience, along with expertise in infrastructure design and architectural best practices. A Bachelor’s Degree in Computer Science or equivalent experience is essential. A solid foundation in programming languages like Python or Go is also required, alongside familiarity with modern infrastructure tools.

How does Crusoe Energy Systems ensure system reliability for its clients?

Crusoe Energy Systems prioritizes system reliability through proactive monitoring, maintaining rigorous Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and conducting thorough post-mortems after incidents. Our team of Site Reliability Engineers continuously analyzes performance metrics and utilizes automation to detect and resolve potential issues before they affect our clients, ensuring uninterrupted access to our services.

Can you describe a typical day for a Senior Staff Site Reliability Engineer at Crusoe?

A typical day for a Senior Staff Site Reliability Engineer at Crusoe begins with reviewing system performance metrics, followed by a morning stand-up meeting with your team. Throughout the day, you'll focus on upgrading automation processes, performing log analysis, collaborating on software changes, and conducting incident drills. Your efforts will be geared toward optimizing operations with a customer-first approach, making every day impactful.

What benefits does Crusoe Energy Systems offer to its employees in the Senior Staff Site Reliability Engineer role?

Employees in the Senior Staff Site Reliability Engineer role at Crusoe Energy Systems enjoy a hybrid work schedule, competitive pay, Restricted Stock Units, and a comprehensive health insurance package. Additional perks include generous paid time off, tuition reimbursement, contributions to HSA accounts, and wellness benefits—including access to the Calm app—demonstrating our commitment to work-life balance and employee well-being.

Common Interview Questions for Senior Staff Site Reliability Engineer
What tools and technologies are you most experienced with as a Senior Staff Site Reliability Engineer?

As a Senior Staff Site Reliability Engineer, it's crucial to have extensive experience with tools such as Docker, Kubernetes, Ansible, and Terraform, as well as CI/CD systems like GitLab CI/CD or GitHub Actions. Highlight your familiarity with logging and monitoring tools and provide specific examples of how you've utilized them to improve system reliability and performance.

How do you prioritize uptime and reliability in your current role?

In my current role, I prioritize uptime and reliability by implementing SLIs and SLOs, proactively monitoring system performance, and conducting regular audits. I emphasize automation to minimize human error and routinely analyze incident reports to identify underlying issues. This approach not only helps maintain reliability but also informs engineering teams to adopt best practices in coding and deployments.

Can you explain a challenging incident you managed and how you resolved it?

Certainly! I once faced a critical network outage that impacted our services. I coordinated a rapid response by organizing a dedicated team to analyze the root cause while keeping stakeholders informed. After identifying the issue—network misconfiguration—I led the remediation efforts, documented the process, and implemented safeguards to prevent recurrence. This experience reinforced the importance of effective communication and a structured response approach.

What strategies do you use for continuous improvement in your SRE role?

To drive continuous improvement in my SRE role, I focus on regular post-mortem analyses after incidents, soliciting feedback from team members, and staying updated on industry best practices. I also encourage a culture of knowledge sharing and mentorship within my team to foster an environment of collaboration and innovation. This ongoing learning helps us refine our processes and tools effectively.

How do you ensure teamwork and effective communication among your colleagues?

Effective teamwork and communication are vital for success in an SRE role. I ensure this by fostering an open environment where all team members feel comfortable sharing ideas and concerns. I utilize daily stand-ups and collaborative tools to keep everyone informed of ongoing projects. Also, I promote active listening and clear documentation of our decisions to ensure everyone is on the same page.

What programming languages are you proficient in, and how have you used them in your role?

I am proficient in several programming languages, including Python and Go. In my recent role, I've utilized Python for developing automation scripts to streamline monitoring processes, which reduced response times during incidents. Additionally, I used Go to contribute to the development of microservices that enhance system functionality, demonstrating how programming skills directly impact site reliability.

What role does automation play in your work as a Senior Staff Site Reliability Engineer?

Automation is a cornerstone of my work as a Senior Staff Site Reliability Engineer. It helps reduce manual tasks, minimizes human errors, and accelerates incident response times. I prioritize automating routine system checks, alerting mechanisms, and deployment processes, which not only enhances efficiency but also allows team members to focus on more strategic initiatives that drive our infrastructure’s reliability.

How do you stay current with the latest SRE practices and technologies?

To stay current with the latest SRE practices, I regularly engage with industry literature, attend webinars, and participate in SRE and DevOps meet-ups. I also follow thought leaders in the space on platforms like LinkedIn and Twitter. These activities provide valuable insights into emerging trends and best practices, which I can then apply within my team to ensure we're using cutting-edge techniques.

Describe your experience in participating in incident response and post-mortem reviews.

I have actively participated in numerous incident response drills and post-mortem reviews. During incidents, I ensure effective team coordination and communication to resolve issues quickly. Following incidents, I lead post-mortems that analyze what went wrong, documenting lessons learned and action items to prevent similar occurrences. This iterative review process is crucial for both individual and organizational growth.

What aspects of working at Crusoe Energy Systems excite you the most?

What excites me most about working at Crusoe Energy Systems is the opportunity to be at the forefront of AI-driven cloud infrastructure while actively supporting climate initiatives. The chance to work in a creative, supportive environment with innovative engineers matches my passion for technology and sustainability, making it a compelling role where I can truly make a difference.


We’re on a mission to align the future of computation with the future of the climate.

Employment type: Full-time, hybrid
Date posted: January 8, 2025
