Job details

Sr. Director, Site Reliability Engineering

Company Details

 

Company URL:  https://www.berkleytechnologyservices.com

 

Berkley Technology Services (BTS) is the dynamic technology solution for W. R. Berkley Corporation, a Fortune 500 Commercial Lines Insurance Company. With key locations in Urbandale, IA and Wilmington, DE, BTS provides innovative and customer-focused IT solutions to the majority of WRBC’s 60+ operating units across the globe. BTS’s wide reach ensures that ideas and opinions are considered at every level of the organization to guarantee we find the best solutions possible.

 

Driven by a commitment to collaboration, BTS acts as consultants to our customers and Operating Units by providing comprehensive solutions that not only address the challenge at hand, but proactively plan for the “What’s Next” in our industry and beyond.

 

With a culture centered on innovation and entrepreneurial spirit, BTS stands as a community of technology leaders with eyes toward the future -- leaders who truly care about growing not only their team members, but themselves, and take pride in their employees who shine. BTS offers endless ways to get involved and the chance to grow your career into a wide range of roles you never knew existed. Come join us as we push forward into the future of industry-leading technological solutions.

 

Berkley Technology Services: Right Team, Right Technology, Simple and Secure.

Responsibilities

The Sr. Director, Site Reliability Engineering (SRE) is responsible for developing and implementing a comprehensive strategy for site reliability, encompassing scalability, performance, and reliability improvements. The role will align SRE objectives with overall business goals and technology roadmaps. It will foster a spirit of continuous improvement within SRE and position the function to benefit organizational objectives across the Berkley Corporation.

 

The person in this role is responsible for overseeing SRE team operations, ensuring the reliability and availability of key applications and supporting infrastructure. This role will work effectively with Service Management to enforce best practices for system reliability, monitoring, capacity planning, incident response, problem management, disaster recovery, change management, and workflow automation. They will also own and administer the tools and technologies necessary to generate a complete view of SRE metrics and improvement areas, including (but not limited to) monitoring, logging, notification, dashboarding, and AIOps.

 

This role will involve overseeing multiple teams, with the possibility of additional teams being assigned as the organization grows and evolves.

 

Team Performance Management:

  • Instantiate and build a robust SRE team over time and integrate SRE into Berkley’s product development and operational process.
  • Recruit, mentor, and develop a high-performing team of SRE professionals.
  • Monitor ongoing staff performance; identify and communicate opportunities for improvement.
  • Provide leadership and support to ensure projects are staffed appropriately and timelines are met.

 

Collaboration and Relationship Building:

  • Collaborate with the BTS IT Leadership Teams and other groups across the IT organization to drive a unified approach to site reliability that reduces downtime and minimizes outage business impact.
  • Foster strong relationships with delivery organization leadership to align SRE efforts with organizational goals. Work collaboratively with other business and IT leaders to ensure cross functional problems are addressed cohesively across the organization.
  • Work cross-functionally in partnership with software development teams to guide product development in creating resilient and durable software systems.
  • Collaborate with EA to institute design patterns for resilient systems and mechanisms for scoring applications against industry-recognized configurations (including active-active, active-passive, recover-from-scratch, and data replication scenarios).

 

Execution, Project, and Work Management:

  • Define and track reliability and observability OKRs for infrastructure and key systems.
  • Implement robust monitoring and alerting systems to proactively identify potential issues, analyze system performance, and facilitate quick response to incidents.
  • Implement AIOps functionality to enable auto-response, self-healing, and anomaly trend analysis.
  • Drive the development and implementation of automation solutions to remove “toil”, streamline processes, reduce manual interventions, and enhance the overall efficiency of the product engineering and SRE teams.
  • Work closely with product, development, infrastructure, and architecture teams to conduct capacity planning, ensuring that systems can handle current and future demand. Anticipate growth and scalability requirements.
  • Establish and oversee effective high-severity incident response processes, ensure timely incident resolution, and conduct post-mortems to identify root causes and implement preventive measures.
  • Improve reliability by identifying and addressing gaps in our architecture, services, and tooling.
  • Oversee the disaster recovery program for both on-premise and cloud-based Berkley solutions.
  • Perform other duties as assigned.

Qualifications

  • A passion for technology and innovation in the end user computing space.
  • 8+ years of experience in building/leading strong and flexible teams, managing large scale systems consumed by tens/hundreds of thousands of users.
  • 8+ years of experience in Site Reliability Engineering and DevOps.
  • 4+ years of experience in Disaster Recovery and/or Business Continuity.
  • Strong understanding of Cloud computing platforms (Azure preferred) including lift-and-shift environments (VMs, etc.) and cloud-native setups (AKS, serverless, etc.).
  • Strong understanding of and experience with automation tools and programming/scripting languages to develop and implement automated system reliability and performance solutions, including infrastructure automation and configuration management tools (Ansible, Chef, Puppet).
  • Strong understanding of observability, monitoring, alerting, and logging tools and ability to design and implement effective monitoring and logging strategies.
  • Experience in designing and implementing on-premise, cloud, and hybrid resiliency solutions, disaster recovery, and business continuity planning.
  • Ability to drive critical issues and system design discussions and moderate between multiple technology teams.
  • Solid understanding of security best practices in on-premise, cloud, and hybrid environments along with Network technologies.
  • Working knowledge of CI/CD - preferably GitHub workflows and Actions.
  • Working knowledge of IaC automation tools (Terraform, Ansible, etc.)
  • Experience with Kubernetes and other auto-scaling tools and technologies.
  • Skilled at assessing and developing IT talent across multiple time zones and multiple business domains.
  • Exceptional written and verbal communication skills.
  • Ability to work independently in a fast-paced environment.
  • Travel Requirement: Up to 25%

Average salary estimate

$175,000 / yearly (est.)
Min: $150,000
Max: $200,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Sr. Director, Site Reliability Engineering, Berkley

Join Berkley Technology Services as a Sr. Director of Site Reliability Engineering in the vibrant city of Manassas! In this pivotal role, you'll be at the helm of developing and implementing strategies that enhance site reliability, performance, and scalability for our innovative technology solutions. We're not just another tech company; we thrive on collaboration and creativity, which means your ideas will be valued from day one. You will lead and inspire our SRE team, ensuring the concrete reliability and availability of our key applications and supporting infrastructure. This is your opportunity to shape the future of the Berkley Corporation while engaging with talented teams across the globe. You'll drive best practices as you oversee operational objectives and enhance our capabilities through automation and problem-solving techniques. As you mentor and build high-performing groups, you'll not only grow the capabilities of your team but also directly contribute to the technological backbone of our expansive organization. Your experience leading large-scale systems and solid understanding of cloud environments will be critical in this position, as you collaborate with IT leadership and development teams to ensure the heart of our technology is resilient and efficient. If you're passionate about fostering growth, enhancing systems, and driving innovation, then this could be your next career adventure with us!

Frequently Asked Questions (FAQs) for Sr. Director, Site Reliability Engineering Role at Berkley
What are the key responsibilities of a Sr. Director, Site Reliability Engineering at Berkley Technology Services?

As a Sr. Director of Site Reliability Engineering at Berkley Technology Services, you will be responsible for overseeing all aspects of site reliability which includes developing strategies for scalability, performance, and continuous improvement. You will lead the SRE team to ensure reliability and availability of key applications and work closely with IT leadership across the organization to minimize downtime. Additionally, you’ll collaborate with software development teams to foster resilient software systems, manage incident response protocols, implement monitoring systems, and drive process automation.

What qualifications are necessary for the Sr. Director, Site Reliability Engineering position at Berkley Technology Services?

Candidates for the Sr. Director of Site Reliability Engineering at Berkley Technology Services should have over 8 years of relevant experience in Site Reliability Engineering, DevOps, and leadership of diverse teams managing large scale systems. A deep understanding of cloud computing platforms, disaster recovery, business continuity, and automation tools is crucial. Additionally, expertise with observability tools and security best practices, along with strong written and verbal communication skills, will help candidates to excel in this role.

How does Berkley Technology Services support team development for the Sr. Director, Site Reliability Engineering role?

At Berkley Technology Services, we emphasize a culture of growth and progression. As a Sr. Director of Site Reliability Engineering, you will have the opportunity to mentor and develop high-performing SRE professionals, guiding them through their career paths. Our commitment to collaboration allows for the sharing of ideas and experiences that not only enrich team members but also enrich the organization as a whole, ensuring that everyone can thrive and develop their skills in a supportive environment.

What technological competencies does a Sr. Director, Site Reliability Engineering need at Berkley Technology Services?

The ideal Sr. Director of Site Reliability Engineering at Berkley Technology Services needs a comprehensive understanding of cloud platforms, strong skills in automation tools, programming languages, and CI/CD processes. Familiarity with Kubernetes, observability and monitoring tools, disaster recovery procedures, and security practices is also essential. This combination of technical competencies ensures the director can lead teams effectively, addressing resilience issues proactively and implementing solutions that enhance overall operational efficiency.

What is the work environment like for a Sr. Director, Site Reliability Engineering at Berkley Technology Services?

Berkley Technology Services offers a dynamic and collaborative work environment for the Sr. Director of Site Reliability Engineering. The team culture values innovation and encourages creative input, making it an ideal place for professionals who are passionate about technology solutions. You'll find open communication channels across various teams, enhancing cross-functional collaboration and allowing for a strong support network. Moreover, flexibility in operations and a commitment to professional development make working here both rewarding and enjoyable.

Common Interview Questions for Sr. Director, Site Reliability Engineering
Can you describe your experience with implementing Site Reliability Engineering practices?

When answering this question, focus on specific examples from your past roles where you successfully established SRE practices. Discuss the strategies you implemented, how you measured success, and any challenges you faced. Highlight your approach to monitoring, incident response, and team collaboration, showcasing your ability to drive reliability improvements.

How do you prioritize tasks and projects in a rapidly changing environment?

Discuss your methods for prioritization, such as using frameworks like the Eisenhower Matrix or MoSCoW. Showcase your ability to remain agile, adapt project timelines, and align priorities with larger business objectives. Offer an example where your prioritization led to successful project outcomes amid changing demands.

Can you give an example of a high-severity incident you managed?

Provide a detailed scenario, emphasizing your role in managing the incident. Discuss your approach to facilitating effective communication, coordinating responses, and ensuring timely resolution. Focus on lessons learned and any process improvements implemented as a result.

What strategies do you employ for team performance management?

Share your experience in mentoring and developing your team. Discuss your metrics for performance evaluation and feedback mechanisms. Highlight the importance of fostering an environment of continuous learning and collaboration, and provide an example of how your approach has led to team improvement.

What tools do you find essential for monitoring and observability?

Discuss specific tools you have experience with, such as Prometheus, Grafana, or various APM tools. Explain why these tools are effective in your work and how they have contributed to improved system performance and reliability in your past roles.

How do you approach capacity planning for scalable systems?

Detail your strategies for capacity planning, including data analysis, monitoring usage trends, and forecasting growth. Provide an example where your planning led to successful scaling of systems in response to business needs, illustrating your proactive approach.

Describe a situation where you had to mediate between differing opinions from multiple teams.

Show your conflict resolution skills by providing an example of such a situation. Discuss how you facilitated discussions, encouraged collaboration, and ultimately reached a consensus that aligned with organizational goals. Emphasize the positive outcomes of your mediation efforts.

How do you ensure compliance with security best practices in SRE?

Emphasize your understanding of security protocols and best practices. Discuss processes you’ve implemented for ensuring compliance and how you keep your team informed and educated on security measures. Share examples of past experiences where you successfully secured systems and minimized risks.

What innovations have you introduced in your previous roles regarding site reliability?

Speak about specific innovations or methodologies you have introduced that enhanced reliability, performance, or efficiency. Highlight how these changes resulted in measurable improvements in system uptime, operational efficiency, or team effectiveness.

What is your experience with automation tools and how have they impacted your work?

Discuss various automation tools you are familiar with and provide examples of how you have successfully implemented them to improve processes within your teams. Highlight the importance of automation in reducing manual interventions and improving the speed and efficiency of operations.

Employment type: Full-time, on-site
Date posted: April 16, 2025
