Job details

Site Reliability Engineer | Cloud Team

We are a global company with offices in the US, Europe and Asia. In these centers, we carry out the various stages of product development, from initial concept to mass production of ready-to-sell units. We embrace a vertically integrated business model with strategic design, manufacturing, distribution, sales and support centers around the world to maximize our value to customers.

Garmin Private Cloud (GPC) will be our internal cloud, developed entirely using open-source technologies such as OpenStack and Kubernetes. GPC will enable Garmin to fully manage the technology, staffing, and costs associated with our evolving product platform.

The GPC team will be responsible for building and maintaining the platform that supports well-known Garmin services like Garmin Connect, ConnectIQ Appstore, Garmin Golf, and many other services.

We believe that collaboration leads to the best ideas, and we rely heavily on team interaction. As a hybrid role based in Cluj-Napoca, this position will require at least 3 days in the office each week.

Responsibilities

Ensure the integrity of Garmin's production environment and that all releases into it are well organized, communicated, and managed.
Author and lead process improvements to the whole project lifecycle and release process.
Establish and provide training to development teams on operational processes and automations that promote software integrity and stability.
Lead design/definition activities for moderate- and high-complexity systems, features, and/or processes.
Champion the shift-left culture of reliability and delivery performance within software development teams.
Monitor and support moderate- and high-complexity software releases.
Design and implement improvements to the software lifecycle and production pipeline through automated tools/systems that align with industry best practices.
Coordinate and improve monitoring practices across software applications and infrastructure.
Build and/or maintain tools to generate reports.
Maintain accurate data to facilitate reporting on key reliability SLOs for multiple products/systems.
Improve the team’s incident response by nurturing incident playbooks.
Through post-incident activities, proactively identify and/or implement reliability improvements and automated mitigations of recurrence.
Cultivate engagement in the SRE community to nurture standards, best practices, and training across product owners, software engineers, and other SREs.
Participate in capacity planning to ensure software can scale sufficiently at peak times.
Work collaboratively and professionally in a team environment with other Garmin associates to achieve goals.

Requirements

Experience with public cloud infrastructures, tools, and processes (Azure, AWS, GCP).
Experience with designing, developing, and deploying containerized applications (Kubernetes).
Experience with moderately complex build and deployment automation.
Experience with scaling cloud native applications in large, high-availability environments.
Experience with DevOps-style tools such as Jenkins, Maven, GitLab, Nexus, and Rundeck.
Experience with scripting languages such as Python and Groovy.
Experience with Infrastructure as Code tools such as Ansible, Terraform, Salt, Chef, and Puppet.
Good understanding of Linux system administration.
Configuration of complex multi-tiered server applications.
Effective judgment, discretion, and decision-making abilities.
Strong and effective verbal, written, and interpersonal communication skills.
Team-oriented, possessing a positive attitude and working well with others.
Minimum 4 years of relevant work experience.

Would be a plus:

Proficiency in application languages/frameworks such as Java, Spring Boot, C#, JavaScript, React, and Angular.
Some knowledge of RabbitMQ and Kafka.
Familiarity with data storage technologies such as RDBMS and NoSQL.
Experience with OpenStack cloud computing infrastructure and related technologies.
Experience with APM tools such as Zabbix, AppDynamics, New Relic, and Dynatrace.
Experience with CDN Providers such as Akamai/Cloudflare.
Experience with observability tools such as Uptrends, Splunk, Kibana.

Benefits to enhance your experience:

24 days off each year plus extra vacation days based on years at Garmin and compensation for legal holidays.
Health package subscription and yearly budget for glasses.
Monthly budget for sports and wellbeing activities.
Local and global career development programs (training, mentorship, technical and leadership development, and more).
Access to e-learning platforms and support for attending technical conferences.
Loyalty bonus within the company, plus other special bonuses (for holidays and personal life events).
Meal tickets.

Yours exclusively when part of our team:

Significant discount for Garmin products.
Employee stock purchase plan.
Contribution to the retirement plan (Pillar 3).
Garmin products available for testing and borrowing.
A comprehensive event series championing wellbeing, sports, and community tailored to foster holistic health (featuring sports events, classes, hackathons, parties, and more).
Other benefits that we invite you to discover during the recruitment process.

Garmin Cluj is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, religion, national origin, sex, age, or disability.

Average salary estimate

$70,000 / year (est.)

min: $60,000

max: $80,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer | Cloud Team, Garmin Cluj

Join Garmin as a Site Reliability Engineer on our innovative Cloud Team! In this exciting role, you'll help shape and maintain our Garmin Private Cloud (GPC), a vital internal cloud built entirely on open-source technologies like OpenStack and Kubernetes. With offices located around the globe, we foster a culture where collaboration and teamwork are celebrated, and great ideas flourish. You'll play a crucial role in ensuring our various Garmin services, such as Garmin Connect and ConnectIQ Appstore, run smoothly and efficiently. Your responsibilities will range from managing the production environment to optimizing our software lifecycle through automation and operational training. We value initiative, and you’ll be encouraged to lead design activities and implement best practices to elevate delivery performance and reliability within our development teams. Plus, if you're passionate about improving incident response and capacity planning, this role is tailored for you! You will need at least 4 years of relevant experience, with a strong background in DevOps tools, cloud environments, and scripting languages. As part of the Garmin family, you’ll enjoy a supportive work environment with numerous benefits including health packages, generous vacation days, and development programs to help you grow in your career. Embrace the opportunity to work on cutting-edge technology while enjoying a vibrant team spirit! We can't wait to welcome you aboard.

Frequently Asked Questions (FAQs) for Site Reliability Engineer | Cloud Team Role at Garmin Cluj

What are the responsibilities of a Site Reliability Engineer at Garmin?

As a Site Reliability Engineer at Garmin, you will maintain the integrity of Garmin's production environment. Your responsibilities include authoring process improvements throughout the project lifecycle, establishing training on operational processes, and monitoring software releases. You will also lead design activities, implement lifecycle improvements through automation, coordinate monitoring practices, and improve the team's incident response.