Location: Panthersville
This Site Reliability Engineer I will be part of the Site Reliability Engineering (SRE) team. The SRE team is an innovative team devoted to providing automated solutions and services for Cox Automotive to measure, evaluate, and plan for visible, reliable application delivery and maintenance. As a member of the SRE team, you will work with development teams to help create the automated pipelines and solutions required for continuous delivery in an Agile DevOps environment.
The tools and use cases are diverse, and our challenge is to increase development velocity by optimizing various parts of the pipeline while improving application stability. This is an opportunity to create automation, monitoring, and pipelines that improve deployment and response times across the board. We are looking for engineers who are passionate about infrastructure as code and continuous deployment to build scalable and highly reliable applications.
Position Overview:
We are seeking a highly motivated and talented individual to join our organization as a Site Reliability Engineer (SRE 1). As an SRE 1, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and services.
You will work collaboratively with cross-functional teams to design, implement, and maintain scalable and resilient infrastructure that powers our applications. This entry-level position offers an excellent opportunity for those with a strong software engineering foundation or a degree in computer science to develop their skills in the exciting field of Site Reliability Engineering.
Responsibilities
Monitor the health, performance, and reliability of our systems using monitoring tools and dashboards.
Participate in on-call rotations to respond to incidents and troubleshoot issues to ensure timely resolution.
Contribute to the development and enhancement of automation scripts, tools, and processes to streamline system deployment, configuration, and management.
Collaborate with software engineering teams to implement infrastructure as code (IaC) practices.
Collaborate with software engineers to design, implement, and optimize systems for scalability and high availability.
Identify performance bottlenecks and work to enhance overall system performance.
Analyze resource usage trends and work with teams to forecast capacity requirements.
Assist in planning and implementing strategies for resource optimization and cost efficiency.
Document incident details, root causes, and resolutions for knowledge sharing and continuous improvement.
Participate in post-incident reviews to identify underlying issues and implement preventative measures.
Collaborate with security teams to ensure systems are designed and maintained in compliance with security standards and best practices.
Work closely with software engineering, DevOps, and other cross-functional teams to deliver reliable and efficient systems.
Communicate effectively to share insights, provide updates, and escalate issues when necessary.
Take on leadership and facilitation roles as needed.
Qualifications
Bachelor's degree in Computer Science, Software Engineering, or a related field.
Familiarity with Agile software delivery systems.
Strong technical troubleshooting capabilities and experience.
Strong foundation in software development principles, including proficiency in one or more programming languages (e.g., Python, TypeScript, Java, JavaScript, Go).
Familiarity with Linux/Unix systems and command-line tools.
Experience building and/or maintaining Terraform pipelines.
Demonstrated problem-solving skills and the ability to diagnose and resolve technical issues.
Passion for learning new technologies and adapting to a dynamic environment.
Interest in and ability to share knowledge and facilitate technical discussions.
Highly motivated, self-directed individual with proven initiative.
Comfortable facilitating technical activities and coaching teams toward successful infrastructure decisions.
Strong team-building mindset, seeking continuous improvement and knowledge sharing.
Positive thinker with a "can-do" mindset.
Demonstrated understanding of monitoring techniques, automation practices, and AWS cost-saving solutions.
Continuous improvement mindset, with demonstrated actions toward staying abreast of industry technologies and solutions.
Excellent communication skills and the ability to collaborate effectively in a team-oriented environment.
Knowledge of user story or ticketing systems to track and update work statuses.
Understanding of fundamental networking concepts and protocols and of automation tools.
Excellent oral and written communication skills.
Preferred Qualifications
Experience with the AWS cloud platform and Docker containerization technology.
Knowledge of configuration management tools (e.g., Ansible, Puppet, Chef).
Exposure to continuous integration and continuous deployment (CI/CD) pipelines (e.g., Jenkins, Terraform, GitHub Actions).
Familiarity with monitoring and logging tools (e.g., Splunk, New Relic, ELK…