Director, Site Reliability Engineering

This role is remote!


About Us


HeadSpin is a unique developer platform that combines data science insights and global device infrastructure to enable companies to perfect their digital experiences during the engineering cycle. The HeadSpin platform is present in over 50 countries. Its data science platform can assess over 130 performance KPIs, analyze the root causes of poor-experience issues, and recommend solutions to address them. HeadSpin differentiates itself from other testing solutions, which focus only on functional or load testing. By using HeadSpin, companies have enhanced their customer experience, reduced time to market, and optimized the cost of their digital applications.


About the Role


The Site Reliability Engineering team is responsible for running HeadSpin’s production services with a commitment to quality, reliability, and low latency. The team accomplishes this goal by enabling the rest of HeadSpin’s product teams to design, test extensively, and establish repeatable processes. SRE helps up-level the product by evangelizing and enforcing reliability practices across all teams. Finally, the team provides rapid response and resolution to incidents. This team:


  • Owns and drives observability throughout the entire stack.
  • Facilitates accountability through service level objectives (SLOs) across all lines of business.
  • Improves service resilience by enabling performance testing and chaos engineering.
  • Supports core services in production.
  • Provides incident response services for critical issues and facilitates blameless postmortems.

As the Director of the Site Reliability Engineering team, you will build out the HeadSpin SRE team and oversee the continuous improvement of operational metrics for all of HeadSpin’s systems. Your responsibilities will include:


  • Developing HeadSpin’s SRE roadmap to optimize reliability and minimize mean time to repair (MTTR).
  • Growing a global organization through hiring and creating professional growth opportunities.
  • Establishing strong working relationships with peer infrastructure and product teams.
  • Enabling and mentoring managers and engineers on the team to do the best work they can and rewarding their performance.
  • Influencing architecture decisions and patterns to optimize resilience and scalability throughout the entire organization.

What You’ll Do


  • Own, innovate, and create programs, software solutions, process innovations, and analytics that drive improvements to the availability, scalability, latency, and efficiency of HeadSpin’s products.
  • Work cross-functionally in close partnerships with dependent engineering teams to build fast, reliable, and durable production systems.
  • Develop strategic directions, workforce allocations, organizational structures, and tactical execution plans for the reliability teams within each product group.
  • Contribute to the overall strategic direction of HeadSpin’s SRE organization.
  • Participate in defining service level indicators (SLIs), SLOs and service level agreements (SLAs).
  • Own system designs, documentation, platform management, and capacity planning for systems in your area of responsibility.
  • Direct and collaborate with multiple teams to perform site reliability maintenance, troubleshooting and deployment to meet the specified service level objectives (SLOs).
  • Collaborate with software and hardware engineering and other teams as necessary to develop and implement effective mechanisms to monitor SLOs.
  • Perform troubleshooting, deploy systems, or execute maintenance tasks as necessary to meet the specified SLOs.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Build software and systems to manage platform infrastructure and applications.
  • Partner with architecture and development teams to improve services through rigorous testing and release procedures.
  • Create sustainable systems and services through automation and process improvements.

What You Need


  • Knowledge of systems design, performance tuning, DevOps, and site-reliability engineering.
  • Ability to program in one or more high-level languages, such as Python or Go.
  • Knowledge of Linux.
  • Knowledge of databases.
  • Knowledge of automation and monitoring technologies and principles.
  • Experience with networking, firewall configuration, and troubleshooting.
  • Strong sense of ownership and dedication to results.
  • Willingness to do entry-level work if it is necessary for success.
  • Approaches challenges as opportunities and sees every day as an opportunity to become a little bit better.
  • Team player with high emotional intelligence who can work with and influence others without direct authority.
  • A proactive approach to spotting problems, areas of improvement, and bottlenecks.
  • Ability to adapt to working with a wide array of technologies and languages.
  • Excellent verbal and written communication skills and ability to communicate technical subjects to a broad range of stakeholders.
  • Strong insight into maintaining applications (monitoring, reacting to incidents).
  • Strong technical and hands-on experience.
  • Enthusiasm for gaining an understanding of how the application and infrastructure operate together to deliver the product experience.
  • Perform other related duties as assigned.

Preferred Skills


  • Experience with physical hardware-based infrastructure.
  • Infrastructure orchestration.
  • Experience building tooling.
  • Building monitoring for availability, performance, SLA delivery.
  • Thorough understanding of AWS.

Give developers the ability to experience their mobile app or website the way their users around the world do.

Date posted: April 14, 2023
