Job details

Principal Site Reliability Engineer

Hiring near our US Centers of Excellence. Hybrid, flexible environment... Irving, TX

Gartner offers a hybrid, flexible environment, with remote work that allows associates great flexibility to work from home, and opportunities to connect with colleagues for moments that matter on-site. Candidates who apply should be located within reasonable proximity of one of Gartner’s Centers of Excellence office locations.

About Gartner IT:

Join a world-class team of skilled engineers who build creative digital solutions to support our colleagues and clients. We make a broad organizational impact by delivering cutting-edge technology solutions that power Gartner. Gartner IT values its culture of nonstop innovation, an outcome-driven approach to success, and the notion that great ideas can come from anyone on the team.

About this role:

This person will primarily be responsible for supporting production- or operations-critical client-facing applications. They will ensure each application's operational readiness by evaluating its performance, reliability, scale, resiliency, and observability. They will be responsible for identifying issues in production, triaging them, and partnering with other engineers on the team to identify the root cause. Other responsibilities include managing applications and infrastructure as code, creating and executing chaos tests, and managing alerts and dashboards.

What you’ll do:

• As part of the SRE scrum team, perform full-stack triaging of alerts to identify the root cause of application performance and stability issues.
• Provide critical technical insight and thought leadership to identify the root cause during production incidents while collaborating with cross-functional members of the SWAT team.
• Work with stakeholders such as product owners to define service level objectives (SLOs) for application features and services.
• Track performance against SLOs in partnership with development teams or other stakeholders, and ensure systems continue to meet SLOs over time.
• Ideate, design, and develop dashboards or reports to effectively communicate key metrics.
• Identify opportunities to improve alerting posture.
• Work closely with the application team to ensure best practices for a performant, resilient, and reliable application are adopted during the architecture or feature design phase.
• Evangelize chaos engineering principles to application teams so that resilient patterns are designed and implemented.
• Mentor engineers on the team in deriving NFR/workload models, and ensure performance and resiliency are considered early in the SDLC.
• Use data-driven analysis to drive continuous improvement in application performance, reliability, and resilience.
• Analyze previous incidents to understand root causes and provide solutions, including architecture and design patterns and automation, to reduce recurrence.
• Be available to work flexible hours as required for operational support and during select events such as releases or conferences, to ensure coordination among the globally distributed team.
• Participate in the on-call schedule, ensuring that issues are addressed promptly and effectively.

What you’ll need:

• 12+ years of information technology experience, with 7+ years on a DevOps, SRE, or performance engineering team, or in a similar position providing comprehensive support of critical multi-tier applications.
• Experience triaging production issues using APM tools such as Dynatrace, AppDynamics, or New Relic, and log aggregation tools such as Splunk, ELK, etc.
• Experience architecting and designing resilient and performant cloud applications.
• Experience with SRE concepts such as SLIs/SLOs and error budgets.
• Experience with AWS cloud, specifically services such as EC2, EKS, API Gateway, Lambda, Route 53, SNS, RDS, ElastiCache, OpenSearch, etc., or similar cloud technologies and services.
• Knowledge of Docker containers and related orchestration technologies such as Kubernetes.
• Experience with CI/CD processes and tools (Jenkins, Argo, Harness, etc.).
• Ability to work independently and partner with team members, with a strong sense of initiative and drive.
• Excellent analytical, verbal, and written communication skills, backed by data-driven analysis.

Who you are:

• Bachelor's degree in Computer Science or a related discipline, or equivalent work experience.
• Motivated, high-potential performer with demonstrated ability to influence and lead.
• Strong communicator with excellent interpersonal skills.
• Able to solve complex problems and successfully manage ambiguity and unexpected change.
• Teachable, embracing best practices and feedback as a means of continuous improvement.
• Consistently high achiever marked by perseverance, humility, and a positive outlook in the face of challenges.

Don’t meet every single requirement? We encourage you to apply anyway. You might just be the right candidate for this, or other roles.

What you will get:

• Competitive compensation.
• Limitless growth and learning opportunities.
• Ongoing mentorship and apprenticeship; Leadership courses, development programs, technical…

Gartner Glassdoor Company Review

4.1

Gartner DE&I Review

No rating

CEO of Gartner

Gene Hall

By Gartner

Gartner delivers actionable, objective insight that drives smarter decisions and stronger performance on an organization’s mission-critical priorities.

52 jobs

FUNDING

Public

DEPARTMENTS

Information Technology

SENIORITY LEVEL REQUIREMENT

Senior

INDUSTRY

Business Consulting

TEAM SIZE