Sr Site Reliability Engineer (SRE)

Why should you join dLocal?

dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world’s fastest-growing, emerging markets. 


By joining us you will be a part of an amazing global team that makes it all happen, in a flexible, remote-first dynamic culture with travel, health and learning benefits, among others. Being a part of dLocal means working with 1000+ teammates from 30+ different nationalities and developing an international career that impacts millions of people’s daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.


What's the opportunity?


We are looking for a Site Reliability Engineer (SRE) to join our team! As our Site Reliability Engineer (SRE), you will be focused on the design, implementation and continuous maintenance of our centralized observability platform using OpenTelemetry (OTEL) as its backend. You will be part of a talented team that works on mission-critical applications with big customers like Netflix, Amazon, Nike, Facebook & more!


As a Site Reliability Engineer, you are always expected to ask the necessary questions:

What data do we need to understand how our systems are performing?

How do we collect this data?

What patterns are we looking for in the data and what do they mean?

Who should be notified when a certain system is not working properly?

Do we have any systems that we need more data for?


An SRE designs systems and processes to answer the questions above and to provide automated support and response where possible.



What will you do?
  • Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across the three main signals—logs, metrics, and traces—ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability.
  • Empower Engineering Teams: Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry.
  • Support Incident Management: Be the Engineering side of our Incident Management Team, designing the processes, playbooks, checklists, and automations for them and other engineers to follow during an incident.
  • Collaborate Across Teams: Interact with members from almost all teams across the business to understand their monitoring, alerting and SLO / SLA requirements and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
  • Automate Observability Infrastructure: Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL Pipelines.
  • Define Baseline Observability Standards: Design base level requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately at a basic level.
  • Own Technical and Security Health: Take full ownership of dLocal’s infrastructure reliability, ensuring adherence to key availability and security KPIs.
  • Optimize Alerting Systems: Continuously refine alerting signals to minimize noise and ensure they are always actionable, reducing fatigue and improving response efficiency.


Which skills do you need?
  • Over 4 years of experience as an SRE or in a very similar role focused on observability.
  • Expertise in Kubernetes, including its core components, deployment methodologies, and monitoring best practices.
  • Some understanding of OpenTelemetry, including setting up OTEL collectors, instrumentation, and pipeline optimization.
  • Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog.
  • Hands-on experience with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar).
  • Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows.
  • Strong scripting abilities (Python, Go, or similar) for automating observability tasks.
  • A problem-solving mindset, with the ability to collaborate across multi-functional teams to drive reliability improvements.

You will stand out if you have:
  • Cloud experience, especially AWS and ECS-based workloads.
  • Experience managing observability pipelines at scale in high-throughput environments.
  • Familiarity with Configuration-as-Code (Ansible, Chef, or SaltStack) for managing configurations across legacy instances.
  • Database performance monitoring experience, particularly in large-scale distributed environments.


What do we offer?


Besides the tailored benefits we have for each country, dLocal will help you thrive and go that extra mile by offering you:

- Remote work: work from anywhere or one of our offices around the globe!*

- Flexibility: we have flexible schedules and we are driven by performance.

- Fintech industry: work in a dynamic and ever-evolving environment, with plenty to build and boost your creativity.

- Referral bonus program: our internal talents are the best recruiters - refer someone ideal for a role and get rewarded.

- Learning & development: get access to a Premium Coursera subscription.

- Language classes: we provide free English, Spanish, or Portuguese classes.

- Social budget: you'll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections!

- dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We’ve got your back!


*For people based in Montevideo (Uruguay) applying to non-IT roles, 55% monthly attendance at the office is required.



What happens after you apply?

Our Talent Acquisition team is invested in creating the best candidate experience possible, so don’t worry, you will definitely hear from us. We will review your CV and keep you posted by email at every step of the process!


Also, you can check out our webpage, LinkedIn, Instagram, and YouTube for more about dLocal!

Dlocal Glassdoor company review: 3.7 stars
Dlocal DE&I review: 3.9 stars
What You Should Know About Sr Site Reliability Engineer (SRE), Dlocal

Are you ready to take your career to the next level as a Sr Site Reliability Engineer (SRE) at dLocal? We're a dynamic payment platform trusted by global giants like Netflix and Amazon to facilitate payments across 40 countries in emerging markets. Our mission? To empower merchants by streamlining payment processing, and we need passionate individuals to help us maintain and innovate our systems. In this role, you'll dive deep into the intricacies of our observability platform, utilizing your expertise with OpenTelemetry to enhance our data monitoring capabilities. You will lead initiatives to optimize our logging, metrics, and trace ingestion while ensuring our systems are robust and reliable. Collaboration is key; you’ll be working cross-functionally, helping teams adopt best practices so that everyone plays a part in our observability journey. As an SRE, you’ll refine our incident management processes, create playbooks, and automate alerting workflows to ensure any disruptions are swiftly addressed. Plus, in our remote-first culture, you have the freedom to work from anywhere, supplemented by benefits like a social budget for team bonding and access to premium learning resources. Joining dLocal means being part of a diverse team that’s committed to pushing boundaries and making significant impacts in the fintech industry. With us, you won’t just have a job; you will be an integral part of building the future of payments worldwide!

Frequently Asked Questions (FAQs) for Sr Site Reliability Engineer (SRE) Role at Dlocal
What responsibilities does a Sr Site Reliability Engineer (SRE) at dLocal have?

As a Sr Site Reliability Engineer (SRE) at dLocal, you will be primarily responsible for the design and maintenance of our observability platform, focusing on key performance signals—logs, metrics, and traces. You will work on incident management processes, develop automation for self-service tools, and collaborate closely with other engineering teams to ensure high levels of system performance and reliability.

What experience is required for the Sr Site Reliability Engineer (SRE) position at dLocal?

Candidates for the Sr Site Reliability Engineer (SRE) role at dLocal should have over four years of relevant experience in SRE or a closely related field, with a strong focus on observability. Familiarity with Kubernetes and expertise in monitoring tools such as Grafana and Prometheus are essential, along with scripting skills in Python or Go.

How does dLocal support the professional development of its Sr Site Reliability Engineers (SRE)?

At dLocal, we believe in continuous learning and offer comprehensive support for professional development. As a Sr Site Reliability Engineer (SRE), you'll have access to a Premium Coursera subscription, and we provide opportunities to attend workshops and training sessions to enhance your skills in observability and incident management.

What tools and technologies should a Sr Site Reliability Engineer (SRE) at dLocal be familiar with?

A Sr Site Reliability Engineer (SRE) in our team should be proficient in tools like OpenTelemetry, Terraform for Infrastructure-as-Code (IaC), and CI/CD solutions such as GitHub Actions. Experience with incident management platforms, as well as monitoring and logging tools like New Relic and Datadog, is also crucial for success in this position.

What is the work culture like for Sr Site Reliability Engineers (SRE) at dLocal?

The work culture for Sr Site Reliability Engineers (SRE) at dLocal is flexible and dynamic, fostering collaboration across teams in a remote-first environment. We champion innovation, diversity, and a customer-centric approach, ensuring that you are not only working on cutting-edge technology but also part of a team that values input and creativity.

Common Interview Questions for Sr Site Reliability Engineer (SRE)
Can you explain OpenTelemetry and its role in your work as an SRE?

OpenTelemetry is essential for observability; it provides a unified framework for collecting metrics, logs, and traces from your services. As an SRE, explain how you have utilized OpenTelemetry to enhance your system's monitoring and the specific benefits you have observed.

Describe your experience with incident management and the processes you follow during an incident.

Discuss your structured approach to incident management, including the responsiveness of your team, playbook utilization, and how you ensure that proper communication is maintained throughout the process. Highlight a real scenario where your strategies effectively minimized system downtime.

How do you approach optimizing alert systems to reduce noise?

Talk about your experience in refining alert conditions, implementing thresholds based on historical data, and ensuring that alerts are actionable. Include examples of how you have effectively reduced alert fatigue within an engineering team.
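
As an illustration of thresholds based on historical data, here is a minimal Python sketch. It is not dLocal's method; the function names, the 99th-percentile choice, and the three-consecutive-samples rule are all assumptions made for this example.

```python
from statistics import quantiles


def percentile_threshold(history, pct=99):
    """Return the pct-th percentile (1..99) of historical samples.

    Hypothetical helper: derives an alert threshold from past data
    instead of using a hand-picked constant.
    """
    cuts = quantiles(history, n=100, method="inclusive")  # 99 cut points
    return cuts[pct - 1]


def should_alert(recent, threshold, min_consecutive=3):
    """Fire only if the last min_consecutive samples all exceed threshold.

    Requiring a sustained breach is one common way to suppress
    alerts on short spikes (an assumption, not a universal rule).
    """
    if len(recent) < min_consecutive:
        return False
    return all(value > threshold for value in recent[-min_consecutive:])


if __name__ == "__main__":
    latency_history_ms = list(range(1, 101))  # made-up sample data
    limit = percentile_threshold(latency_history_ms)
    print(should_alert([50, 100, 100, 100], limit))
```

In an interview answer, a sketch like this is most useful alongside a description of how the percentile and the breach window were tuned for a real service.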

What scripting languages do you use and how do they aid your role as an SRE?

Detail your experience with scripting in languages like Python or Go, explaining how automation through scripts helps in monitoring tasks, improving deployment processes, or enhancing the observability pipeline.

Can you give an example of how you’ve empowered a development team to own their observability practices?

Provide an instance where you built self-service tooling or created documentation that allowed development teams to implement monitoring solutions independently. Discuss the impact this had on overall system reliability.

How do you manage configuration in a dynamic infrastructure environment?

Discuss how you use Infrastructure-as-Code tools such as Terraform or Ansible to manage configurations effectively. Highlight your approach in ensuring that configurations are version-controlled and consistent across environments.

What methods do you use to collect and analyze metrics in high-throughput environments?

Describe your strategies for selecting metrics that provide critical insights into system performance and how you utilize tools like Prometheus or Grafana to visualize and analyze these metrics, sharing lessons learned from your experiences.

What are your strategies for collaborating cross-functionally with teams?

Explain how you build relationships with different teams, encourage the sharing of requirements for monitoring and alerting, and how your collaboration has led to improved observability and system reliability.

What do you consider to be the biggest challenges facing Site Reliability Engineers today?

Reflect on current trends in SRE, such as scaling issues, integrating new technologies, or maintaining service reliability. Discuss how you’ve addressed these challenges in previous roles and your proactive strategies for overcoming them.

How do you ensure compliance with key availability and security KPIs?

Talk about your methods for monitoring adherence to service level objectives (SLOs) and security frameworks, and how you have used data from observability tools to drive decisions that improve both availability and compliance.
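
One way to make SLO adherence concrete is to track how much of the error budget remains. The Python sketch below assumes a request-based availability SLO; the function name and the sample numbers are hypothetical and only illustrate the arithmetic.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the fraction of the error budget still unspent.

    slo_target is the availability objective as a fraction (e.g. 0.999).
    1.0 means nothing has been spent; a negative value means the budget
    is overspent. Hypothetical helper for illustration only.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic, so no budget consumed
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Made-up numbers: a 99.9% target over one million requests
    # allows about 1,000 failures; 250 failures spends a quarter of that.
    print(round(error_budget_remaining(0.999, 1_000_000, 250), 3))
```

A remaining budget that drops quickly is a common trigger for pausing risky releases, though the policy itself is a team decision rather than something the calculation dictates.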

Benefits & perks: Paid Holidays
Employment type: Full-time, remote
Date posted: April 7, 2025