Job details

Site Reliability Engineer

RetailNext is looking to expand our SRE team. We need people with the skill set of good backend developers to focus on the operation and reliability of our SaaS retail analytics solution. We pull in and process data from thousands of brick-and-mortar stores to help our customers better understand and serve their customers. We actively develop in Go and use technologies like Cassandra, Redis, Elasticsearch, gRPC, Kafka, PubSub/SQS, and more. We also maintain legacy Ruby, NodeJS, Java, and C++ code.

You will help us operationalize new features, maintain the stability of the application, and improve how we develop and deploy it. This role includes being part of our on-call rotation, along with the backend team. Past SRE projects have included bringing cloud resources under Terraform management, migrating from StatsD to Prometheus, rewriting how our application collects diagnostic telemetry from deployed sensors, and much more.

Who you are:
● Strong in at least one backend programming language (Go, NodeJS, Ruby, etc.)
● Familiar with Linux (you know what the FHS, cgroups, etc. are)
● Able to teach yourself new technologies and programming languages
● Able to debug and fix issues in third-party open-source software
● Meticulously diligent about security and reliability
● Experience in an SRE, DevOps, or Release Engineering role

Bonus points:
● Experience (re-)architecting distributed applications to fix scalability and reliability issues
● Experience building and maintaining CI/CD processes
● Experience operating Cassandra in a production environment
● Experience with any cloud IaaS provider (we use both GCE and EC2)
● Experience with infrastructure-as-code tools such as Terraform

What's it like to work here?

  • Remote-First Hybrid: Work anywhere + office access.

  • 90-Day Work Anywhere: Work from anywhere for 90 days yearly.

  • Autonomy & Growth: Flexible schedules, ownership, career investment.

  • Customer Obsessed: Everything we do is for our clients.

Perks & Benefits

  • Best Self Allowance: Annual stipend for personal growth.

  • Recharge Days: Monthly company-wide day off.

  • Career Growth: We invest in you.

 

What You Should Know About Site Reliability Engineer, RetailNext

RetailNext is looking for a Site Reliability Engineer to join our SRE team. In this role, you'll apply your backend development skills to the operation and reliability of our SaaS retail analytics solution, which pulls in and processes data from thousands of brick-and-mortar stores to help our customers better understand and serve their customers. We actively develop in Go and use technologies like Cassandra, Redis, Elasticsearch, gRPC, Kafka, and more.

Your responsibilities will include operationalizing new features, maintaining application stability, and improving how we develop and deploy the application. You'll take part in our on-call rotation alongside the backend team. Past SRE projects have included bringing cloud resources under Terraform management, migrating from StatsD to Prometheus, and rewriting how our application collects diagnostic telemetry from deployed sensors.

To succeed, you should be strong in at least one backend programming language (Go, NodeJS, Ruby, etc.), be familiar with Linux, and be able to teach yourself new technologies and debug third-party open-source software. We also look for diligence about security and reliability, along with experience in an SRE, DevOps, or Release Engineering role. Our Remote-First Hybrid approach lets you work from anywhere with office access, and our benefits include an annual Best Self Allowance and monthly Recharge Days.

Frequently Asked Questions (FAQs) for Site Reliability Engineer Role at RetailNext
What are the main responsibilities of a Site Reliability Engineer at RetailNext?

As a Site Reliability Engineer at RetailNext, your key responsibilities will include ensuring the operational stability of our SaaS retail analytics platform, operationalizing new features, and improving our development and deployment processes. You will also be part of an on-call rotation with the backend team and work on projects similar to past SRE efforts, such as bringing cloud resources under Terraform management and reworking how the application collects telemetry.

What qualifications are required for the Site Reliability Engineer position at RetailNext?

The ideal candidate for the Site Reliability Engineer role at RetailNext should have strong proficiency in at least one backend programming language like Go, NodeJS, or Ruby, and be familiar with Linux systems. Prior experience in SRE, DevOps, or Release Engineering is beneficial, along with the ability to learn new tools and languages independently. Familiarity with and experience in managing production environments and cloud technologies can be a definite plus.

What technologies will I be working with as a Site Reliability Engineer at RetailNext?

At RetailNext, you will be working primarily with Go for application development, alongside tools and systems like Cassandra, Redis, Elasticsearch, gRPC, Kafka, and Terraform. You will also maintain legacy code written in Ruby, NodeJS, Java, and C++, so the stack offers practical experience with both older and newer systems.

Is prior experience in cloud infrastructure necessary for the Site Reliability Engineer role at RetailNext?

While not strictly necessary, prior experience in cloud infrastructure is highly beneficial for a Site Reliability Engineer at RetailNext. Experience with any cloud IaaS provider counts as a bonus; RetailNext uses both GCE and EC2, and past SRE projects have included bringing cloud resources under Terraform management.

What kind of work environment can I expect as a Site Reliability Engineer at RetailNext?

RetailNext fosters a Remote-First Hybrid workplace, offering you the flexibility to work from anywhere while providing occasional office access if desired. We believe in autonomy and growth, facilitating flexible schedules, and investing in your career development. You'll find a customer-obsessed culture where our focus is on delivering the best possible experience for clients.

Common Interview Questions for Site Reliability Engineer
Can you explain the importance of site reliability in a SaaS environment?

Site reliability is crucial in a SaaS environment because it directly impacts user satisfaction and retention. As a Site Reliability Engineer, you ensure that the SaaS platform operates smoothly, minimizing downtime and providing users with reliable access to services. Highlight your understanding of SRE principles and how they contribute to overall system reliability.

What programming languages are you proficient in, and how have you used them as a Site Reliability Engineer?

I am proficient in Go and NodeJS, and I have utilized these languages to develop backend services that enhance system stability. Discuss specific projects where you've implemented features or debugged issues, emphasizing your coding capabilities and how they've improved the reliability of the system.

Describe a challenging SRE project you've worked on. What was the problem and how did you solve it?

In one challenging project, we faced significant downtime due to inefficient resource management in our cloud setup. By implementing Terraform for cloud resource automation, I was able to streamline processes and reduce manual errors. Emphasize your problem-solving approach and the impact your solution had on system reliability.

How do you prioritize tasks during an on-call rotation?

During an on-call rotation, I prioritize tasks based on urgency and impact on system reliability. Critical incidents affecting user access take precedence, while less pressing issues can be resolved in due time. Explain your experience in handling on-call duties and how effective prioritization has led to improved SLA adherence.

What methods do you use to monitor system health and identify issues proactively?

I utilize monitoring tools such as Prometheus and ELK Stack to track system performance and receive alerts for anomalies. These tools assist in pinpointing potential issues before they escalate. Share experiences of how proactive monitoring has saved time and resources.

Can you describe your experience with CI/CD processes?

I have extensively worked on implementing CI/CD processes to streamline deployment cycles. By utilizing tools such as Jenkins and GitLab CI, I’ve automated testing and deployment, significantly reducing the chances of human errors. Offer insights into how you’ve improved deployment efficiency through CI/CD.

What role does reliability play in software development?

Reliability is a cornerstone of software development as it ensures that the application can withstand usage, respond predictably, and recover from failures seamlessly. Discuss how you advocate for reliability during development life cycles and correlate it with user trust and satisfaction.

How do you handle system failures in production? Could you provide an example?

When a system failure occurs in production, my first step is to assess the impact and resolve immediate customer-facing issues. Afterward, I conduct a postmortem analysis to identify root causes and prevent recurrence. An example is when I led a troubleshooting effort that involved a database failure, ensuring rapid resolution while implementing long-term fixes.

What experience do you have with container orchestration tools?

I have hands-on experience with Kubernetes and Docker Swarm for managing containerized applications. These tools are essential for scalability and reliability in microservices architectures, enabling seamless deployment and management of services. Mention projects where you've successfully implemented container orchestration.

How do you ensure code quality and security in your deployments?

Ensuring code quality and security is vital in deployments. I utilize code reviews, automated testing, and static code analysis tools to maintain high-quality standards. Discuss how you've integrated security practices into your development and deployment workflows, highlighting a specific example.


Headquartered in San Jose, California, RetailNext enables retailers and manufacturers to collect, analyze, and visualize data about in-store customer engagement.

EMPLOYMENT TYPE
Full-time, remote
DATE POSTED
April 10, 2025
