Job details

Site Reliability Engineer

XperiencOps Inc. is looking for a passionate and skilled Site Reliability Engineer (SRE) to join our team. As a Senior SRE, you will play a critical role in ensuring the reliability, availability, and performance of our customer-facing systems and infrastructure. You will work closely with the customer engineering, support, and DevRel teams to proactively identify and address reliability gaps, implement automation and instrumentation, and improve the scalability of our systems.

The ideal candidate has strong expertise in AWS cloud technologies, a deep understanding of serverless architectures (AWS Lambda), and a passion for building resilient systems to enhance the customer experience.

Responsibilities:

Design, implement, and manage scalable and reliable infrastructure solutions that meet the needs of our applications.
Develop automation scripts to support operations and streamline processes, including deployment and monitoring.
Set up and monitor system alerts, metrics, and dashboards to proactively prevent incidents.
Respond to incidents and outages, performing root cause analysis to implement long-term solutions.
Collaborate with development and operations teams to ensure reliability in all aspects of the system life cycle.
Continuously improve the architecture and deployment processes for efficiency and reliability.
Participate in on-call rotations and be a key resource during incident responses.
Document and communicate system architecture, design concepts, and operational procedures to the team.

Requirements:

Bachelor's degree in Computer Science or related discipline.
5+ years of experience in Site Reliability Engineering, DevOps, or a similar role in production environments.
3+ years of experience in cloud services, in particular AWS.
Experience building observability systems on New Relic, CloudWatch, or similar.
Experience implementing rate-limiting, API gateways, and load balancing for highly available systems.
Exposure to security best practices and compliance frameworks (e.g., SOC 2, ISO 27001).
Proficient in infrastructure as code (IaC) using tools such as Terraform or CloudFormation.
Hands-on experience with scripting and programming languages like Python, Go, or Bash.
Strong troubleshooting and debugging skills.
Excellent communication and collaboration skills.
Experience with incident management and post-mortem practices.

Benefits:

A competitive salary and comprehensive benefits package.
An opportunity to be part of a cutting-edge technology company with a dynamic and innovative team.
Professional growth and development opportunities in a supportive and collaborative work environment.

Average salary estimate

$140,000 / year (est.)

min: $120,000
max: $160,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer, XperiencOps Inc

XperiencOps Inc. is excited to welcome a passionate Site Reliability Engineer (SRE) to our dynamic team. If you thrive in an environment where you can truly make a difference, this role is perfect for you! As a Senior Site Reliability Engineer, you’ll be at the forefront of ensuring that our customer-facing systems and infrastructure are not only reliable but also high-performing. Your expertise in AWS cloud technologies and serverless architectures, particularly AWS Lambda, will be invaluable as you work closely with our customer engineering, support, and DevRel teams. You’ll proactively identify reliability gaps and implement solutions, all while fostering a culture of automation and instrumentation. Your responsibilities will include designing scalable and resilient infrastructure solutions, developing automation scripts, and monitoring system performance to preemptively tackle potential incidents. We value collaboration, so you’ll be teaming up with various departments to enhance system reliability throughout its lifecycle. If you have a knack for troubleshooting, excellent communication skills, and a commitment to continuous improvement, you’ll find our environment both challenging and rewarding. Join us, and not only will you contribute to cutting-edge technology, but you'll also enjoy professional growth and a comprehensive benefits package in a supportive environment that encourages innovation.

Frequently Asked Questions (FAQs) for Site Reliability Engineer Role at XperiencOps Inc

What are the key responsibilities of a Site Reliability Engineer at XperiencOps Inc.?

At XperiencOps Inc., the Site Reliability Engineer (SRE) is responsible for designing and managing scalable infrastructure solutions, developing automation scripts to support operations, setting up alerts and dashboards, and responding to incidents. Additionally, you will collaborate with development teams to ensure reliability throughout the system life cycle.