Job details

Tech Lead, Site Reliability Engineering (SRE)

At Edge & Node, we’re focused on building The Graph, a decentralized protocol for accessing and organizing the world’s knowledge and information. Subgraphs, a core technology developed by Edge & Node to access blockchain data, are widely used across web3 to power decentralized applications.

We’re a tight-knit, efficient team with a bias for action and a strong sense of ownership. Our teams have autonomy, low ego, and are trusted to drive projects end to end. We care deeply about building infrastructure for web3 use cases and collaborate across disciplines to make that happen. If you’re passionate about infrastructure that has a real impact on our users, enjoy solving hard problems, and thrive in a fast-paced environment, you’ll feel right at home.

The Engineering Operations team, including Site Reliability, works closely with engineering teams across Edge & Node to ensure the services we operate are reliable, performant, predictable, and secure. We are on a mission to take our service delivery to the next level.

What You'll Do

Lead by example as a hands-on technical contributor, participating in on-call rotations, incident response, and the day-to-day work of the SRE team
Partner with engineering and product leadership to shape roadmaps, define team priorities, and plan work that improves reliability, performance, and scalability across the stack
Work alongside and support other SREs, using your leadership and interpersonal skills to foster a culture of continuous learning, blameless retrospectives, and technical excellence
Own the incident lifecycle, including root cause analysis and follow-up remediation, and work to make our systems increasingly self-healing
Drive SRE team strategy, advocating for industry best practices, standardization, and secure and optimized infrastructure
Architect and improve core infrastructure services, with an eye toward high availability, fault tolerance, performance, and end-to-end observability
Work across teams to challenge assumptions, fundamentally overhaul our systems, and improve documentation
Collaborate with external partners and vendors as needed to ensure the health of critical services

What We’re Looking For

Proven experience as a senior or lead SRE or DevOps engineer, ideally having led large-scale reliability initiatives or infrastructure transformation projects
Strong project or technical leadership skills, with a track record of guiding teammates and setting technical direction while still remaining hands-on
Deep knowledge of the SRE/DevOps domain, including incident response, security awareness, maintaining SLAs and uptime guarantees, observability, supporting internal development teams, project and capacity planning, and/or system architecture
Experience with both cloud and on-prem core infrastructure, ideally with Google Cloud Platform (GCP), bare-metal infrastructure, and Kubernetes (or similar orchestration tools)
Fluency in infrastructure as code, Terraform, automation tooling, CI/CD pipelines, and system monitoring solutions such as Grafana
Excellent interpersonal, leadership, and communication skills, with the ability to align stakeholders and motivate and unblock team members
Experience in web3, crypto, or blockchain is a plus (but not required)
_____
About The Graph
The Graph is the indexing and query layer of the decentralized internet. As the first open data marketplace to introduce and standardize subgraphs, The Graph is a flagship solution for accessing blockchain data across web3.
Since The Graph launched in 2018, tens of thousands of developers have built subgraphs to power dapps across 90+ blockchains. As demand for web3 data grows, The Graph is evolving to support a broader range of data services and query languages, expanding what’s possible with decentralized infrastructure, now and in the future.
Discover more about how The Graph is shaping the future of decentralized physical infrastructure networks (DePIN) by following The Graph on X, LinkedIn, Instagram, Facebook, Reddit, and Medium. Join the community on The Graph’s Telegram, and join technical discussions on The Graph’s Discord.

Average salary estimate

$150,000 / year (est.)

Min: $120,000

Max: $180,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Tech Lead, Site Reliability Engineering (SRE), Edge & Node

As Tech Lead for Site Reliability Engineering (SRE) at Edge & Node, you will join a team focused on building The Graph, a decentralized protocol for accessing and organizing the world's knowledge and information. Your expertise will be central to strengthening the infrastructure that supports web3 applications. In this hands-on role, you'll lead by example in incident response and on-call rotations while driving reliability efforts and technical excellence across the organization. You'll help shape roadmaps alongside engineering and product leadership so that we can deliver reliable, performant services. You'll support other SREs with your technical insight and champion a culture of learning and continuous improvement in which blameless retrospectives help us grow stronger. Our SREs architect for high availability, fault tolerance, and observability, and work to make our systems increasingly self-healing. We're looking for someone who is knowledgeable in SRE domains such as incident response and system architecture, adept at project leadership, and skilled with tools like Terraform and Kubernetes. If you're excited about making a real impact and fostering collaboration across teams, you'll fit right in at Edge & Node.

Frequently Asked Questions (FAQs) for Tech Lead, Site Reliability Engineering (SRE) Role at Edge & Node

What are the primary responsibilities of a Tech Lead, Site Reliability Engineering (SRE) at Edge & Node?

As a Tech Lead, Site Reliability Engineering (SRE) at Edge & Node, your primary responsibilities include leading the SRE team in managing incident response, driving service reliability and performance improvements, and architecting core infrastructure with an emphasis on fault tolerance. You're expected to collaborate closely with engineering teams, influence technical decisions, and support other SREs in their professional development.