Observability Site Reliability Engineer (SRE)

Job Overview:
The Observability SRE will be responsible for ensuring the reliability and scalability of our services. They will focus on improving Mean Time To Detect (MTTD) and Mean Time To Recover (MTTR), implementing full-stack observability, and automating non-functional engineering via robust CI/CD pipelines.

Responsibilities and Duties:
• Develop and maintain SMART monitoring solutions to enable quicker problem detection and isolation.
• Strategize and implement deployment models like Canary or Blue-Green to minimize downtime during deployments.
• Utilize increased automation, reusable assets, and self-healing techniques to improve system reliability.
• Build resiliency across application and infrastructure layers through Chaos Engineering.
• Embed performance and scalability into application design and code from the initial stages.

Qualifications:
• Proven experience in SRE or similar roles with a focus on observability.
• Strong understanding of CI/CD pipelines and automation tools.
• Experience with deployment models such as Canary or Blue-Green.
• Knowledge of Chaos Engineering and its application in building resilient systems.
• Ability to work collaboratively in a fast-paced environment.

Education:
• Bachelor’s degree in Computer Science, Engineering, or related field.

Experience:
• Minimum of 3 years in a Site Reliability Engineering role or similar.

Skills:
• Proficiency in monitoring tools and technologies.
• Strong analytical and problem-solving skills.
• Excellent communication and teamwork abilities.

Skills for this specific opportunity:
• Cloud technologies: Support resources operating in GCP, Azure.
• Observability: Prior experience using a commercial observability/APM solution (Dynatrace, New Relic, Datadog, AppDynamics, Honeycomb).
• Monitoring and Logging: Solid familiarity with Splunk, Elastic, OpenSearch, Prometheus, Grafana.
• Prior SRE role.
• Experience supporting and troubleshooting issues with critical business apps.
• Sound knowledge of servers, infrastructure, load balancers, storage, etc.
• Operating Systems competency: Solid understanding of Unix/Linux and Windows.
• Technologies: Kubernetes, containers, serverless.
• Languages/Programming: One or more of the following: Bash or ksh, PowerShell, or any other common programming language.
• IaC: Prior experience writing and utilizing Terraform.

Average salary estimate (provided by employer): $149,023 annual (est.); min $113K, max $185K.