Sr. SRE / DevOps

The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production systems. The role focuses on monitoring, alerting, and dashboard creation with a strong emphasis on SRE tools like Grafana, Prometheus, and Datadog. The ideal candidate should have hands-on experience with Python scripting and be able to collaborate effectively with cross-functional teams to address service issues and improve system reliability.


Requirements
  • 4+ years of experience in similar roles.
  • Fluent English.
  • Experience with creating and modifying Grafana dashboards for system monitoring.
  • Knowledge of Prometheus for setting up and maintaining monitoring systems.
  • Experience with Datadog for user and system monitoring.
  • Hands-on experience with Python scripting for automation and other tasks.
  • Understanding of SRE practices, including monitoring, alerting, and incident response.
  • Ability to create and enhance runbooks for incident response and remediation.
  • Experience with DevOps practices, such as CI/CD and infrastructure automation, is a secondary desired skill set.
  • Strong communication skills to collaborate with cross-functional teams and stakeholders.
  • Ability to proactively identify and address service issues.
  • Familiarity with ITIL processes, including Service Management, Knowledge Management, and Incident Management.
  • Experience with user and system monitoring, remediation, and implementation to maintain service stability.


Responsibilities
  • Create and modify Grafana dashboards to monitor system performance and user experience.
  • Set up and maintain monitoring and alerting systems using Prometheus and Datadog.
  • Collaborate with cross-functional teams to improve service reliability and respond to incidents.
  • Develop and enhance runbooks for incident response and remediation.
  • Proactively manage alerting to ensure timely detection of issues and minimize downtime.
  • Implement monitoring, remediation, and other operational practices to maintain high service levels.


NTD Software Glassdoor company rating: 4.9

We are on an ongoing mission to help makers actualize through the power of:
  • Effective talent acquisition methodologies
  • The power of creating technology

EMPLOYMENT TYPE
Full-time, remote
DATE POSTED
April 30, 2024
