
Site Reliability Engineer - Observability

About KOMOJU

KOMOJU (by Degica) is the leading cross-border payment gateway for Japan. We power payments for companies like the video game distribution platform Steam and the popular mobile app TikTok. Today we help thousands of merchants by providing the payment infrastructure they need, from developer-friendly APIs to integrations on popular platforms like Shopify and Wix, and we help our merchants grow in every market they expand into.


About the position

As our systems grow in complexity, scale, and traffic, maintaining their reliability and availability becomes increasingly challenging and critical. We're looking for a Site Reliability Engineer (SRE) with a focus on observability to help us meet these demands.

In this role, you'll be at the forefront of ensuring that our infrastructure is not just running, but understandable and measurable. Observability is a core pillar of our reliability strategy: it's how we detect issues before they impact our merchants and users, quickly understand the root causes of incidents, and continuously improve our systems' performance and reliability.

You’ll design and evolve our observability platform, including metrics, logging, tracing, and alerting, and partner with development teams to embed observability into every stage of the software lifecycle. Your work will directly impact our ability to scale confidently and respond to incidents swiftly.

This is a key role for someone who wants to build resilient systems, empower teams with actionable insights, and make a real difference in how we operate at scale.

While we are a remote-first company, this position is based in Tokyo, and we expect candidates to be willing to relocate to Japan.

Responsibilities

  • Design, implement, and maintain our observability stack (metrics, logging, tracing, dashboards).
  • Define and monitor SLIs/SLOs to ensure service health and reliability.
  • Collaborate with engineering teams to instrument applications for better visibility.
  • Build and maintain dashboards and alerts that provide actionable insights and minimize alert fatigue.
  • Troubleshoot system performance and reliability issues using observability data.
  • Educate and guide engineering teams on best practices in monitoring, alerting, and incident response.
  • Contribute to postmortems and continuously improve system transparency and resiliency.

Requirements

  • 3+ years in SRE roles.
  • Hands-on experience with observability tools, preferably Datadog.
  • Proficiency in Terraform.
  • Background in software development.
  • Proficiency in at least one scripting or programming language (Ruby/Rails, Python, Go, Shell Script, etc.).
  • Experience working with AWS.
  • Familiarity with monitoring design principles: RED, USE, SLI/SLO, alert tuning.
  • Ability to analyze logs, metrics, and traces to diagnose issues and identify trends.

Nice to have

  • Knowledge of CI/CD pipelines and integrating observability into build and deploy processes.
  • Familiarity with incident response, on-call rotations, and post-incident reviews.
  • Business-level Japanese.

Benefits

  • At Degica, we embrace remote work while also offering office space for those who prefer in-person collaboration
  • 10 days regular vacation, additional 5 days summer and 5 days winter vacation
  • Paid birthday holiday
  • Budget for self-learning allowance, to ensure our employees’ skills remain current
  • Language training for Japanese
What You Should Know About Site Reliability Engineer - Observability, Degica

Join the innovative team at Komoju as a Site Reliability Engineer - Observability, where you'll play an essential role in enhancing our infrastructure's reliability and performance. As a leading cross-border payment gateway for Japan, we empower thousands of merchants with robust payment solutions. In this role, you'll focus on observability, a critical element for maintaining our services' health at scale. You'll design and refine our observability platform, encompassing metrics, logs, tracing, and alerting. By collaborating with our development teams, you'll ensure that observability is embedded throughout the software lifecycle, allowing us to detect issues swiftly and minimize disruptions for our users. The company is remote-first, but this position is based in Tokyo, with collaborative office space available if desired. With responsibilities ranging from defining service-level objectives to guiding teams on best practices in incident response, you'll have an immediate impact on our operations. If you're passionate about building resilient systems and thrive in an environment that values insights and performance, this position at Komoju could be your next great adventure!

Frequently Asked Questions (FAQs) for Site Reliability Engineer - Observability Role at Degica
What are the primary responsibilities of a Site Reliability Engineer - Observability at Komoju?

As a Site Reliability Engineer - Observability at Komoju, your primary responsibilities will include designing and maintaining our observability stack, defining SLIs and SLOs for service health, collaborating with engineering teams to enhance application visibility, and building actionable dashboards and alerts. You'll also troubleshoot performance issues using observability data and educate engineering teams on monitoring best practices.

What qualifications are required for the Site Reliability Engineer - Observability position at Komoju?

To qualify for the Site Reliability Engineer - Observability position at Komoju, candidates should have at least 3 years of experience in SRE roles, hands-on experience with observability tools like Datadog, proficiency with Terraform, and a background in software development. Knowledge of scripting languages, familiarity with AWS, and understanding monitoring design principles are also essential.

What programming languages should I know for the Site Reliability Engineer - Observability role at Komoju?

For the Site Reliability Engineer - Observability role at Komoju, proficiency in at least one programming or scripting language is necessary. Candidates should be comfortable with languages like Ruby/Rails, Python, Go, or Shell Script, as these will be crucial for developing and maintaining observability solutions effectively.

Is remote work allowed for the Site Reliability Engineer - Observability position at Komoju?

Partly. Komoju is a remote-first company, but the Site Reliability Engineer - Observability position is based in Tokyo, and candidates are expected to be willing to relocate to Japan. Office space is available for those who prefer in-person collaboration.

What kind of training and benefits does Komoju offer for the Site Reliability Engineer - Observability role?

Komoju provides a number of benefits for the Site Reliability Engineer - Observability role, including 10 vacation days, an additional 5 days each for summer and winter vacations, a paid birthday holiday, and a self-learning allowance to help employees keep their skills sharp. Language training for Japanese is also provided, enhancing your professional and personal development.

Common Interview Questions for Site Reliability Engineer - Observability
Can you explain your experience with observability tools?

Talk about the observability tools you've used, such as Datadog, and describe specific metrics you monitored and improved. Sharing examples of how you enhanced performance through these tools can illustrate your expertise.

How do you define and monitor SLIs and SLOs?

Discuss your approach to establishing service-level indicators (SLIs) and service-level objectives (SLOs). Explain how you ensure they align with business goals and are monitored effectively to maintain service health.

What strategies do you use to troubleshoot system performance issues?

Share the systematic methods you adopt for troubleshooting, emphasizing your reliance on observability data. Provide examples where you successfully diagnosed issues and implemented solutions to improve performance.

Can you describe how you would integrate observability into a CI/CD pipeline?

Outline your approach to integrating observability in CI/CD, such as adding monitoring and alerting throughout the build and deploy processes. Mention specific tools or practices that help maintain observability.

What are your preferred practices for documentation in your role?

Discuss the importance of clear documentation in incident response and system observability. Explain how you maintain documentation of processes, configurations, and best practices for easier team collaboration.

How do you handle alert fatigue among engineering teams?

Explain methods you've used to minimize alert fatigue, such as refining alert thresholds and ensuring alerts provide actionable insights. Providing specific examples demonstrates your proactive approach.

What experience do you have with incident response and post-incident reviews?

Detail your experience in incident response, emphasizing your participation in postmortem reviews. Discuss how these experiences contributed to lessons learned and system improvements.

How do you educate teams about observability best practices?

Share strategies for educating teams, including training sessions, workshops, or collaborative retrospectives. Highlight any successful initiatives that fostered a culture of observability within your previous roles.

What scripting languages are you familiar with and how have you applied them?

Discuss any programming or scripting languages you're proficient in and provide examples of how you've utilized them to automate processes or enhance observability.

Why do you want to work as a Site Reliability Engineer - Observability at Komoju?

Express your enthusiasm for Komoju’s role in empowering merchants through technology. Mention your alignment with their innovative culture and how you can leverage your skills in observability to contribute to their success.

Degica's mission is to create a fair, global marketplace for goods and services - breaking down barriers between different markets and making it easier to do business anywhere in the world. With its origins in the video game publishing industry, D...

EMPLOYMENT TYPE
Full-time, remote
DATE POSTED
April 8, 2025
