Responsibilities
• Implement tooling to monitor AWS EKS-based systems focusing on performance, reliability, and scalability.
• Ensure that architecture and deployment models are sufficient to support SLA commitments and are well prepared for future problems of scale.
• Leverage cloud technology and platform capabilities to provide operationally sustainable solutions that are robust and cost-effective.
• Apply software engineering best practices to comprehensively address and resolve problems.
• Collaborate with product support teams to drive efficiency and enhance customer experience through self-service tools and automation.
• Ensure timely response to incidents and support requests, collaborating effectively on solutions.
• Conduct root cause analysis and implement preventative measures to minimize toil and impact on customers.
• Lead and participate in incident retrospectives to enhance future response efforts.
• Participate in on-call rotations, providing critical support as needed.
Qualifications
• An extensive and successful technical career within reputable technology firms, particularly with large-scale cloud applications.
• Expertise in Site Reliability Engineering concepts and practices, including the use of observability platforms and monitoring tools.
• Experience deploying and supporting containerized applications on cloud platforms, preferably EKS on AWS.
• Proficiency in infrastructure as code technologies, such as Terraform.
• Strong software engineering skills in languages like Python, JavaScript, or Go.
• Familiarity with DevOps and CI/CD methodologies.
• Bachelor’s degree in Computer Science or a related field.