Site Reliability Engineer | Cloud Team

We are a global company with offices in the US, Europe and Asia. In these centers, we carry out the various stages of product development, from initial concept to mass production of ready-to-sell units. We embrace a vertically integrated business model with strategic design, manufacturing, distribution, sales and support centers around the world to maximize our value to customers.

Garmin Private Cloud (GPC) will be our internal cloud, developed entirely using open-source technologies such as OpenStack and Kubernetes. GPC will enable Garmin to fully manage the technology, staffing, and costs associated with our evolving product platform. 

The GPC team will be responsible for building and maintaining the platform that supports well-known Garmin services like Garmin Connect, ConnectIQ Appstore, Garmin Golf, and many other services. 

We believe that collaboration leads to the best ideas, and we rely heavily on team interaction. As a hybrid role based in Cluj-Napoca, this position will require at least 3 days in the office each week. 

Responsibilities

  • Ensure the integrity of Garmin's production environment is maintained and that all releases into the environment are well-organized, communicated, and managed.
  • Author and lead process improvements to the whole project lifecycle and release process.
  • Establish and provide training to development teams on operational processes and automations that promote software integrity and stability.
  • Lead design/definition activities for moderate- and high-complexity systems, features, and/or processes.
  • Champion the shift-left culture of reliability and delivery performance within software development teams.
  • Monitor and support moderate- and high-complexity software releases.
  • Design and implement improvements to the software lifecycle and production pipeline through automated tools/systems that align with industry best practices.
  • Coordinate and improve monitoring practices across software applications and infrastructure.
  • Build and/or maintain tools to generate reports.
  • Maintain accurate data to facilitate reporting on key reliability SLOs for multiple products/systems.
  • Improve the team’s incident response by nurturing incident playbooks.
  • Through post-incident activities, proactively identify and/or implement reliability improvements and automated mitigations of recurrence.
  • Cultivate engagement in the SRE community to nurture standards, best practices, and training across product owners, software engineers, and other SREs.
  • Participate in capacity planning to ensure software can scale sufficiently at peak times.
  • Work collaboratively and professionally in a team environment with other Garmin associates to achieve goals. 

Requirements

  • Experience with public cloud infrastructures, tools, and processes (Azure, AWS, GCP).
  • Experience with designing, developing, and deploying containerized applications (Kubernetes).
  • Experience with moderately complex build and deployment automation.
  • Experience with scaling cloud native applications in large, high-availability environments.
  • Experience with DevOps-style tools such as Jenkins, Maven, GitLab, Nexus, RunDeck.
  • Experience with scripting languages such as Python, Groovy.
  • Experience with Infrastructure as Code such as Ansible, Terraform, Salt, Chef, Puppet.
  • Good understanding of Linux system administration.
  • Configuration of complex multi-tiered server applications.
  • Effective judgment, discretion, and decision-making abilities.
  • Strong and effective verbal, written, and interpersonal communication skills.
  • Team-oriented, possessing a positive attitude and working well with others.
  • Minimum 4 years of relevant work experience. 

Would be a plus: 

  • Proficiency in application languages/frameworks such as Java, SpringBoot, C#, JavaScript, React, Angular.
  • Some knowledge of RabbitMQ and Kafka.
  • Familiarity with data storage technologies such as RDBMS and NoSQL.
  • Experience with OpenStack cloud computing infrastructure and related technologies.
  • Experience with APM tools such as Zabbix, AppDynamics, New Relic, Dynatrace.
  • Experience with CDN providers such as Akamai or Cloudflare.
  • Experience with observability tools such as Uptrends, Splunk, Kibana.

Benefits to enhance your experience: 

  • 24 days off each year plus extra vacation days based on years at Garmin and compensation for legal holidays.
  • Health package subscription and yearly budget for glasses.
  • Monthly budget for sports and wellbeing activities.
  • Local and global career development programs (training, mentorship, technical and leadership development, and more).
  • Access to e-learning platforms and support for attending technical conferences.
  • Loyalty bonus within the company, plus other special bonuses (for holidays and personal life events).
  • Meal tickets. 

Yours exclusively when part of our team: 

  • Significant discount for Garmin products.
  • Employee stock purchase plan.
  • Contribution to the retirement plan (Pillar 3).
  • Garmin products available for testing and borrowing.
  • A comprehensive event series championing wellbeing, sports, and community tailored to foster holistic health (featuring sports events, classes, hackathons, parties, and more).
  • Other benefits which we invite you to discover during the recruitment process. 

Garmin Cluj is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, religion, national origin, sex, age, or disability. 

Average salary estimate

$70,000 / year (est.)
min: $60,000
max: $80,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer | Cloud Team, Garmin Cluj

Join Garmin as a Site Reliability Engineer on our innovative Cloud Team! In this exciting role, you'll help shape and maintain our Garmin Private Cloud (GPC), a vital internal cloud built entirely on open-source technologies like OpenStack and Kubernetes. With offices located around the globe, we foster a culture where collaboration and teamwork are celebrated, and great ideas flourish. You'll play a crucial role in ensuring our various Garmin services, such as Garmin Connect and ConnectIQ Appstore, run smoothly and efficiently. Your responsibilities will range from managing the production environment to optimizing our software lifecycle through automation and operational training. We value initiative, and you’ll be encouraged to lead design activities and implement best practices to elevate delivery performance and reliability within our development teams. Plus, if you're passionate about improving incident response and capacity planning, this role is tailored for you! You will need at least 4 years of relevant experience, with a strong background in DevOps tools, cloud environments, and scripting languages. As part of the Garmin family, you’ll enjoy a supportive work environment with numerous benefits including health packages, generous vacation days, and development programs to help you grow in your career. Embrace the opportunity to work on cutting-edge technology while enjoying a vibrant team spirit! We can't wait to welcome you aboard.

Frequently Asked Questions (FAQs) for Site Reliability Engineer | Cloud Team Role at Garmin Cluj
What are the responsibilities of a Site Reliability Engineer at Garmin?

As a Site Reliability Engineer at Garmin, you will ensure the integrity of Garmin's production environment is maintained. Your responsibilities include authoring process improvements throughout the project lifecycle, establishing training on operational processes, and monitoring software releases. You will also lead design activities, implement lifecycle improvements through automation, coordinate monitoring practices, and improve the team's incident response efforts.

What qualifications do I need for the Site Reliability Engineer position at Garmin?

To qualify for the Site Reliability Engineer role at Garmin, you should have a minimum of 4 years of relevant experience in site reliability, DevOps, or a similar field. Experience with cloud infrastructures (like AWS or Azure), containerized applications (Kubernetes), and scripting languages (such as Python or Groovy) is essential. Proficiency in infrastructure as code tools, Linux administration, and build and deployment automation is also highly valued.

What tools and technologies will I work with as a Site Reliability Engineer at Garmin?

In your role as a Site Reliability Engineer at Garmin, you will work with a variety of tools and technologies including but not limited to OpenStack, Kubernetes, Jenkins, GitLab, and Terraform. Familiarity with monitoring tools like AppDynamics and New Relic, as well as data storage technologies such as RDBMS and NoSQL, is considered a plus.

What benefits does Garmin offer to Site Reliability Engineers?

Garmin offers an attractive benefits package for Site Reliability Engineers that includes 24 vacation days each year, a health package subscription, and a monthly budget for sports and well-being activities. Additionally, you'll enjoy access to local and global career development programs, meal tickets, and employee stock purchase plans. Among other perks, Garmin also promotes a vibrant community-focused atmosphere with events tailored to enhance your holistic health.

What is the work culture like for Site Reliability Engineers at Garmin?

The work culture at Garmin for Site Reliability Engineers is collaborative, innovative, and focused on teamwork. With offices around the world, you'll be part of a diverse team that values open communication and shared ideas. The emphasis on professional development means you'll have opportunities to grow your skills and career while actively contributing to a supportive and energizing work environment.

Common Interview Questions for Site Reliability Engineer | Cloud Team
Can you explain your experience with Kubernetes and its components?

When answering this question, focus on your hands-on experience with Kubernetes, detailing specific projects you've worked on. Mention how you've utilized its components such as Pods, Services, and Deployments to manage containerized applications. Discuss your understanding of Kubernetes architecture and how you've addressed challenges in scaling applications or troubleshooting issues.

How do you approach incident response and post-incident analysis?

In your response, outline your systematic approach to incident response. Detail steps you take from immediate troubleshooting to documenting the incident, conducting a post-mortem review, and implementing changes to prevent future occurrences. Highlight the importance of communication with your team and stakeholders throughout this process.

What practices do you follow to ensure reliable software delivery?

Discuss your familiarity with CI/CD pipelines and how they contribute to reliable software delivery. Explain the practices you employ such as automation, thorough testing, code reviews, and continuous monitoring. Mention any specific tools you’ve used to streamline this process and how they enhance team collaboration and outcomes.

Describe a challenging project you worked on in a cloud environment.

Relate a specific project that posed complex challenges in a cloud environment. Describe the objectives, the hurdles you faced (like scaling issues or security concerns), and how you overcame them. Don't forget to emphasize teamwork and how collaboration contributed to achieving success.

How do you stay current with developments in site reliability engineering?

In your answer, convey your commitment to continuous learning. Mention the resources you use, such as attending conferences, following industry publications, participating in online communities, or taking courses. Highlight any personal projects that keep your skills sharp and innovative.

What tools do you find essential for monitoring and maintaining system reliability?

Talk about essential tools like Prometheus, Grafana, or the ELK stack that you have used for monitoring system health and performance. Discuss how these tools help in identifying issues and ensuring service level objectives (SLOs) are met. Tailor your response to highlight tools you've had extensive experience with.

Can you elaborate on your experience with automation tools?

When addressing this question, recount specific automation tools you've worked with, such as Ansible, Terraform, or Jenkins. Discuss how you’ve implemented these tools in your projects to automate deployments, configurations, or monitoring processes and what impact they had on your team’s efficiency.

What is your strategy for capacity planning in cloud applications?

Explain how you analyze current and future system requirements to plan for capacity in your cloud applications. Discuss methods you've employed to assess usage trends, performance metrics, and resource allocation for peak demand periods. Highlight your experience in balancing cost efficiency with reliability and performance needs.

How do you foster collaboration among your team members?

In your response, discuss the strategies you use to foster collaboration, such as facilitating regular team meetings, promoting open communication platforms, and encouraging knowledge sharing. Emphasize your belief in the importance of a positive team culture and how you contribute to creating it.

What are some of the critical metrics you track for system reliability?

Highlight key metrics such as uptime, latency, error rates, and throughput that you consider critical for tracking system reliability. Discuss how you implement monitoring systems to capture these metrics and use them to inform decision-making and continuous improvement efforts.
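
To make the metrics above concrete, here is a minimal sketch in Python of how an error rate and the remaining error budget for an availability SLO might be derived from request counts. The `WindowStats` shape, the function names, and the sample numbers are all assumptions made for illustration; they do not describe Garmin's tooling or any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts for one reporting window (hypothetical data shape)."""
    total_requests: int
    failed_requests: int


def error_rate(stats: WindowStats) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if stats.total_requests == 0:
        return 0.0
    return stats.failed_requests / stats.total_requests


def error_budget_remaining(stats: WindowStats, slo_target: float) -> float:
    """Share of the error budget still unspent for an availability SLO.

    slo_target is the required success ratio, for example 0.999.
    A negative result means the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_error_rate = 1.0 - slo_target
    return 1.0 - error_rate(stats) / allowed_error_rate


if __name__ == "__main__":
    # Made-up numbers: one million requests, 400 failures, 99.9% SLO.
    week = WindowStats(total_requests=1_000_000, failed_requests=400)
    print(f"error rate: {error_rate(week):.4%}")
    print(f"budget remaining: {error_budget_remaining(week, 0.999):.1%}")
```

In an interview answer, a calculation like this is a convenient way to show how raw error counts connect to a decision, such as pausing risky releases when the remaining budget approaches zero.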

Headquartered in Olathe, Kansas, Garmin manufactures marine, aviation, and consumer technologies designed to work with global positioning systems.

Employment type: Full-time, hybrid
Date posted: December 24, 2024
