Job details

Senior Site Reliability Engineer

Replicant was founded on the belief that machines are ready to have useful, complex conversations that will transform the way they interact with the world, starting with customer service.

As the leader in Contact Center Automation, Replicant helps companies automate their most common customer service calls while empowering agents to focus on more complex and nuanced customer challenges. Replicant's AI platform allows consumers to engage in natural conversations across voice, messaging and other digital channels to resolve their customer support issues, without the wait, 24/7. We are now leading the way in using Large Language Models (LLMs) to transform customer service, again.

If you're excited by AI, ChatGPT, and LLMs, and want to make an impact with other great technologists and strong go-to-market leaders, then look no further. We've grown our team by 3x, increased revenue by 4x, and were named a top enterprise AI company by The Information. We currently serve Fortune 500 customers, run millions of AI calls per month in production, and are increasing our footprint globally.

We're searching for a skilled Site Reliability Engineer to play a crucial role in scaling the infrastructure and systems that power Replicant. As our company expands, we need your expertise to optimize how Replicant's data is managed and delivered, enhance the connectivity of our software applications, and strike the right balance between engineering autonomy and standardization. Our core technology stack includes TypeScript/NodeJS and Python within a Kubernetes environment on GCP, along with tools like Helm, Terraform, Datadog, and Prometheus.

What You'll Do

Ensure the smooth operation and high availability of Replicant's production systems.
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency.
Develop and maintain tools and automation to prevent and quickly resolve incidents.
Collaborate with engineering teams to improve the reliability and scalability of our applications and infrastructure.
Participate in on-call rotation to address production issues and ensure service uptime.
Contribute to infrastructure design and implementation, focusing on scalability, security, and cost-effectiveness.
Stay up-to-date on industry best practices and emerging technologies in SRE and DevOps.

What You'll Bring

Proven experience in managing and troubleshooting complex, distributed systems in a production environment.
Strong understanding of cloud platforms (GCP preferred) and containerization technologies (Kubernetes).
Proficiency in scripting languages and automation tools (e.g., Python, Bash, Terraform).
Experience with monitoring and observability systems (e.g., Datadog, Prometheus).
Excellent problem-solving skills and a proactive approach to identifying and mitigating potential issues.
Strong communication and collaboration skills, with the ability to work effectively in a team environment.
A passion for ensuring the reliability and performance of critical systems.

Bonus Points

Experience with CI/CD pipelines and infrastructure-as-code practices.
Knowledge of networking concepts and protocols.
Familiarity with security best practices for cloud-based systems.
Familiarity with telephony applications.

For all full-time employees, we offer:

🏠 Remote working environment that respects time zone differences

💸 Highly competitive salaries, equity, and for US Employees, a 401(k) plan

🏥 Top of the line healthcare (medical, vision, and dental)

🏋️ Health and Wellness Perk

🖥️ Equipment Stipend

🌴 Flexible vacation policy

✈️ Amazing team trips & offsites where you can find our CEO baking bread for the team

🌺 Replicants are eligible for a 5-week sabbatical after being at the company for 4.5 years

Our Values

Replicant has three core values. It is critical that everyone who joins the team feels excited and moved by these values as every new team member makes an impact on our culture.

Blade Runners: We take ownership and pride to influence the outcomes of our goals. We are successful, and like a Blade Runner, use the tools at our disposal to reach our objectives. We value open and honest communication and proactively seek feedback along the way. We are a company driven to grow and achieve both individually and as a team.

Bread Makers: We are humble and strive toward an egalitarian culture. No task is too big or too small. We work together to achieve our goals and develop our company mission. We believe that the whole is greater than the sum of its parts in everything that we do.

Självdistans (Self-Distance): Självdistans is Swedish for self-distance. It's the ability to critically reflect on oneself and one's relations from an external perspective. With this in mind, we act with objectivity and always remember that we are not our work. There's no perfect science to growing a team or business, but we trust everyone at Replicant to point out our blind spots and humbly admit their own.

Replicant is proud to be an equal opportunity employer. We are committed to fostering an inclusive, diverse and equitable workplace that is built on trust, support and respect. We welcome all individuals and do not discriminate on the basis of gender identity and expression, race, ethnicity, disability, sexual orientation, colour, religion, creed, gender, national origin, age, marital status, pregnancy, sex, citizenship, education, languages spoken or veteran status. Accommodation is available upon request at any point during our recruitment process. If you require an accommodation, please speak to your talent acquisition partner or email us at hr@replicant.ai and we’ll work to meet your needs.

Replicant Glassdoor Company Review: 4.0

Replicant DE&I Review: 3.9

CEO of Replicant: Gadi Shamia

Average salary estimate: $150,000 per year (est.)

Min: $120,000

Max: $180,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Senior Site Reliability Engineer, Replicant

Replicant is on the lookout for a talented Senior Site Reliability Engineer to join our pioneering team. As a key player in our mission to revolutionize customer service with advanced AI technology, you’ll help scale and optimize the critical infrastructure that supports our leading-edge platform. Your role will involve ensuring our production systems operate smoothly and maintaining high availability while tackling performance bottlenecks and enhancing efficiency. Collaborating closely with engineering teams, you’ll develop and maintain automation tools that prevent incidents and enable rapid resolutions. Your technical expertise in managing complex distributed systems, particularly within a Kubernetes environment on GCP, will be invaluable as we expand our global reach. With technologies like TypeScript, NodeJS, Python, Helm, Terraform, Datadog, and Prometheus at your fingertips, you’ll contribute significantly to our infrastructure's design and implementation. Additionally, you'll participate in on-call rotations to keep our service uptime stellar. As we foster a culture of open communication, humility, and personal growth, we believe your passion for maintaining system reliability will make a noteworthy impact on our dynamic work environment. If you have a proactive mindset and desire to work alongside innovative technologists at Replicant, we can’t wait to hear from you.

Frequently Asked Questions (FAQs) for Senior Site Reliability Engineer Role at Replicant

What are the main responsibilities of a Senior Site Reliability Engineer at Replicant?

As a Senior Site Reliability Engineer at Replicant, your main responsibilities will include ensuring the smooth operation of production systems, monitoring system performance for bottlenecks, implementing optimizations, and developing automation tools to prevent incidents. You will also collaborate with engineering teams to enhance reliability and scalability, participate in on-call rotations, and contribute to infrastructure design with a focus on scalability and security.