Job details

Site Reliability Engineer - Observability

Job Description

This would not be possible without a state-of-the-art core platform as the backbone, managed by the dedicated SME teams in our Core Group. One such team is the Performance and Observability Team, which has two sub-tracks, as the name implies.

We are looking to expand our team in the Observability sub-track. The position is ideally suited to experienced candidates from Software Engineering / SRE / DevOps backgrounds with a deep focus on the observability stack and best practices.

Our team currently has four members, each having a diverse background and perspective. Together we own an observability stack that handles billions of metrics, traces and log entries each month. Our customers are all the engineers at Wolt who use this stack to understand the health of their services / infrastructure at scale. 

As of today, we manage a complex observability stack, covering a wide scope from application instrumentation and telemetry data collection to visualization and alerting, spanning both backend and client-facing applications. Our daily responsibilities ensure that this ecosystem operates seamlessly. In parallel, we're building the next-generation observability platform, re-architecting our stack and pipelines in collaboration with our counterparts at DoorDash. This partnership provides an unparalleled opportunity to drive high-impact initiatives across the observability domain, offering empowerment and involvement in cutting-edge projects.

Qualifications

What you’ll do:

  • Be responsible for building and improving our observability platform and tooling, used by all Wolt engineers.
  • Contribute to initiatives focused on architecting, building, and maintaining our observability stack to efficiently handle increasing telemetry data with greater reliability.
  • Champion observability best practices, guiding and supporting other Woltians in this space.
  • Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
  • Apply your expertise in SRE culture and practices to ensure observability has a meaningful impact on our business.
  • Participate in the on-call rotation to address incidents and outages, resolving reliability issues efficiently.
  • Help standardize observability resources by building tools and documentation that enhance productivity and developer experience.
  • Triage and resolve production issues within the observability scope.
  • Contribute to open-source efforts by sharing some of our internal tools with the broader community.

Qualifications:

  • Proven experience in Software Engineering, SRE, or a similar role with a focus on observability, reliability, and scaling large systems.
  • You have experience with OpenTelemetry, which is a key foundation for much of the infrastructure and tooling the team is converging on as part of our future observability strategy.
  • Strong foundation in computer science principles and engineering fundamentals.
  • Proficient in development, particularly in Go (preferred) or Python, with experience building automation tools and software for large-scale, distributed systems.
  • Hands-on experience with observability tooling such as DataDog, Prometheus, Mimir, Elasticsearch, Grafana, Jaeger, and tracing systems.
  • Expertise in cloud platforms like AWS, GCP, or Azure, with experience managing cloud infrastructure using Kubernetes and containers (Docker).
  • Deep knowledge of building and maintaining reliable, high-performance, and scalable distributed systems.
  • Solid understanding of SRE principles, incident response, and designing fault-tolerant architectures.
  • Experience with infrastructure-as-code tools like Terraform or Ansible for managing cloud environments.
  • Familiarity with CI/CD pipelines, automated testing, and continuous delivery practices.
  • Strong analytical and problem-solving skills, with experience troubleshooting complex distributed systems.
  • Excellent communication and collaboration skills, with the ability to work cross-functionally to enhance platform observability and reliability.
  • Experience working directly with development teams, with a willingness to dive into application code for observability-related topics, even when the codebase is unfamiliar.
  • Solid experience with Docker and Kubernetes, coupled with a strong foundation in Unix systems and networking concepts.
  • Open to feedback, recognizing that no one is perfect—including us. We see feedback as an opportunity to learn and grow together.

Nice to Haves:

  • You have experience handling data and running monitoring infrastructure at scale, such as managing petabyte-scale Elasticsearch clusters or similar databases.
  • You have experience operating distributed event streaming platforms at scale, e.g. Apache Kafka.
  • Open-source contributions in observability, cloud, or platform engineering are a strong plus.

Additional Information

📍This role can be based in one of our tech hubs in Helsinki, Berlin or Stockholm, or you can work remotely anywhere in Finland, Sweden, Germany, Denmark, and Estonia. Read more about our remote setup here. If you live outside of these countries - not to worry! We provide relocation support to help you make your way to Finland, Germany or Sweden.

The position will be filled as soon as we find the right people, so feel free to apply as soon as you feel like hearing more about the position and potentially joining Wolt & DoorDash!

Wolt Glassdoor Company Review: 3.7
Wolt DE&I Review: 3.6
CEO of Wolt: Miki Kuusi

Average salary estimate

$75,000 / year (est.); min $60,000, max $90,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer - Observability, Wolt

If you're an experienced Site Reliability Engineer with a passion for Observability, come join the dynamic team at Wolt in Helsinki! Our Performance and Observability Team is on the lookout for innovative minds to help us manage and enhance our observability platform. In this role, you'll dive into the depths of our observability stack that billions of metrics, traces, and log entries pass through monthly, ensuring our engineers have all the data they need to keep services running smoothly. You'll take ownership of initiatives that drive the architectural direction of our observability system, collaborating closely with talented counterparts at DoorDash. Your expertise will help champion observability best practices across the board, empowering everyone at Wolt to improve their services. As a Site Reliability Engineer focused on Observability, you’ll not only enhance existing tools, but you'll also have the opportunity to contribute to open-source projects and address real-time incidents as they occur. With a collaborative environment that values feedback and personal growth, this role is crafted for individuals eager to apply their deep technical skills in SRE, cloud computing, and distributed systems while being part of transformative projects. Your passion for craftsmanship and reliability will make a significant impact in our ever-evolving landscape. Plus, with the flexibility to work remotely from various countries, you can enjoy a fulfilling career without compromising your work-life balance. We can't wait for you to bring your skills and perspectives to Wolt!

Frequently Asked Questions (FAQs) for Site Reliability Engineer - Observability Role at Wolt
What are the responsibilities of a Site Reliability Engineer - Observability at Wolt?

As a Site Reliability Engineer focusing on Observability at Wolt, your primary responsibilities will include enhancing our observability platform and tools, ensuring they meet the growing needs of our users. You'll work on architecting, building, and maintaining our observability stack, focusing on reliability and efficiency. Additionally, you'll be responsible for championing observability best practices, participating in on-call rotations for incident resolution, and contributing to open-source projects.

What qualifications do I need to become a Site Reliability Engineer - Observability at Wolt?

To succeed as a Site Reliability Engineer - Observability at Wolt, you should have proven experience in Software Engineering, SRE, or similar fields, with a strong focus on observability and scaling large systems. Familiarity with OpenTelemetry, development experience in languages like Go or Python, and hands-on expertise with observability tools like Prometheus and Grafana are crucial. Additionally, knowledge of cloud platforms such as AWS or GCP and experience managing distributed systems will set you up for success.

What tools and technologies will I be working with as a Site Reliability Engineer - Observability at Wolt?

In your role as a Site Reliability Engineer - Observability at Wolt, you'll work with a diverse range of tools and technologies, including OpenTelemetry, DataDog, Prometheus, Mimir, Elasticsearch, Grafana, and Jaeger. You'll also manage cloud infrastructure with Kubernetes and utilize Docker containers. Your experience with infrastructure-as-code tools like Terraform or Ansible will be valuable in streamlining our cloud environments.

How does collaboration work within the Performance and Observability Team at Wolt?

Collaboration is a cornerstone of the Performance and Observability Team at Wolt. You'll work alongside colleagues with diverse backgrounds, bringing different perspectives to the table as you tackle observability challenges. Regular feedback sessions and open communication will allow you to share ideas and improve processes. Additionally, partnering with engineers at DoorDash opens up opportunities to collaborate on impactful projects, ensuring everyone's voice is heard and valued.

What career growth opportunities are available for Site Reliability Engineers - Observability at Wolt?

As a Site Reliability Engineer - Observability at Wolt, you'll have vast opportunities for growth. You'll be encouraged to take ownership of key initiatives, allowing you to develop your leadership skills. Additionally, our commitment to continuous learning means you'll be supported in honing your expertise through various projects, conferences, and open-source contributions, positioning you for advancement in your career.

Common Interview Questions for Site Reliability Engineer - Observability
Can you explain the importance of observability in site reliability engineering?

Observability is crucial in site reliability engineering because it provides insights into the health and performance of systems. By effectively monitoring and analyzing metrics, logs, and traces, you can quickly identify issues, forecast outages, and ensure systems operate reliably. When answering, emphasize how observability allows teams to maintain service continuity, enhance user satisfaction, and ultimately drive business success.

What observability tools have you used and how have they impacted your previous projects?

Discuss tools you've used, such as Prometheus, Grafana, or DataDog, and explain how they improved monitoring and troubleshooting in your past projects. Give examples of how these tools helped you reduce downtime or enhance visibility into system performance. Highlight any metrics or data you can share that demonstrates their impact.

Describe a time you handled a production incident. What was your approach?

During a production incident, I follow a structured approach that includes quickly assessing the situation, collaborating with team members, and diagnosing the underlying issue. In your response, mention the importance of communication and how you employed observability tools to aid in identifying the root cause. Illustrating how you resolved the incident and the lessons learned will provide a well-rounded answer.

How do you ensure the observability stack remains reliable and scalable?

To ensure reliability and scalability, I focus on continual improvements and performance monitoring. Discuss practices such as capacity planning, load testing, and regularly reviewing the architecture of the observability stack. Highlight your commitment to automation, using tools like Terraform to manage infrastructure changes without downtime.

What best practices do you follow when instrumenting applications for observability?

Best practices for application instrumentation include consistent logging, using standardized formats, and ensuring metrics capture meaningful events. Always mention the importance of setting alerts for critical thresholds. Demonstrating an understanding of how to instrument for both performance and troubleshooting will showcase your expertise.

How do you handle feedback from peers when working on observability tools?

I view feedback as an opportunity for growth. During discussions, I actively listen to my peers’ perspectives and assess how their insights could enhance the project. Share an example illustrating how you incorporated feedback that led to a positive change or improvement in observability tools or practices.

Explain a challenge you faced with observability tooling and how you solved it.

Identify a specific challenge, such as integrating new observability tools or dealing with scaling issues. Describe the steps you took to investigate the problem and the solution you implemented, emphasizing resourcefulness and collaboration with other teams. This answer will demonstrate both your problem-solving skills and your resilience in a challenging environment.

What role does automation play in effective observability practices?

Automation is key to effective observability as it reduces manual errors and increases efficiency. Discuss how you automate routine checks, incident responses, and deployment processes, highlighting any specific tools or scripts you’ve developed to enhance observability. Use examples to demonstrate how automation improved your team's overall responsiveness.

How do you prioritize tasks when managing multiple observability initiatives?

When managing multiple initiatives, I prioritize based on business impact and urgency. Discuss how you assess the potential benefits of each task and involve team members in decision-making. This will show your strategic thinking in aligning observability initiatives with business goals and user experience considerations.

What is your experience with open-source contributions related to observability?

I’ve contributed to several open-source projects related to observability by providing fixes, improvements, or documentation. Share specifics about the projects and the impact of your contributions. Emphasize your passion for community involvement and how these experiences fostered your skills in observability practices.


Wolt makes it incredibly easy for you to discover and get what you want. Delivered to you – quickly, reliably and affordably. And by doing so, we make cities better places to live.

Employment type: Full-time, remote
Date posted: January 3, 2025