Job details

Expert Site Reliability Engineer - GCP


We are seeking an experienced Google Cloud Platform (GCP) Site Reliability Engineer (SRE) to manage daily operational workloads, ensuring the reliability, scalability, and cost efficiency of cloud infrastructure. The ideal candidate will have deep expertise in capacity planning, performance optimization, infrastructure design, and FinOps best practices to maintain an efficient and cost-effective GCP environment.

Key Responsibilities:

• Operations & Reliability: Manage and maintain GCP infrastructure, ensuring high availability, scalability, and system reliability.

• Capacity Planning & Optimization: Monitor and forecast resource utilization, performance trends, and infrastructure scaling needs to optimize cloud costs and efficiency.

• Infrastructure Design & Automation: Design and implement highly available, fault-tolerant, and resilient cloud architectures, leveraging Infrastructure as Code (IaC) tools such as Terraform and Ansible.

• Performance Monitoring & Incident Response: Utilize Google Cloud Monitoring, Cloud Logging, and third-party tools to proactively detect and resolve performance issues.

• FinOps & Cost Management: Analyze and optimize cloud spending, implement cost controls, recommend rightsizing strategies, and ensure efficient resource allocation.

• Security & Compliance: Implement best practices for IAM, network security, encryption, and compliance frameworks (SOC2, ISO 27001, NIST).

• CI/CD & DevOps Integration: Collaborate with DevOps teams to streamline deployment processes, automate workflows, and optimize application performance.

• Disaster Recovery & High Availability: Design and implement disaster recovery (DR) plans, backup strategies, and failover mechanisms to ensure business continuity.

• Documentation & Collaboration: Maintain comprehensive documentation of infrastructure, best practices, and optimization strategies while working closely with cross-functional teams.

Qualifications:

• Education: Bachelor’s degree in Computer Science or Information Technology, or equivalent experience.

• Experience: 8+ years of experience in cloud operations, reliability engineering, or infrastructure management.

• Certifications: GCP Professional Cloud Architect, GCP Professional DevOps Engineer, or an equivalent certification is preferred.

• Technical Proficiency:

  • Expertise in Google Cloud networking, Compute Engine, Kubernetes (GKE), Cloud Functions, and Cloud Storage.

  • Strong knowledge of Terraform, Ansible, or other Infrastructure as Code (IaC) tools.

  • Experience with Google Kubernetes Engine (GKE), microservices, and container orchestration.

  • Hands-on experience with FinOps tools and cost optimization strategies in cloud environments.

  • Familiarity with monitoring and logging solutions such as Google Cloud's operations suite (formerly Stackdriver), Prometheus, and Grafana.

  • Experience with CI/CD pipelines, automation, and GitOps best practices.

  • Strong understanding of SRE principles, SLAs, SLOs, and error budgets.

Preferred Qualifications:

• Experience with multi-cloud or hybrid cloud environments.

• Knowledge of serverless computing and cloud-native application design.

• Understanding of ITIL frameworks for incident, problem, and change management.

Average salary estimate

$135,000 per year (est.)

Min: $120,000

Max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Expert Site Reliability Engineer - GCP, DeepSource Technologies

We are excited to announce a fantastic opportunity at our company for an Expert Site Reliability Engineer - GCP! If you're passionate about managing cloud infrastructure on Google Cloud Platform and have a knack for ensuring high availability and reliability, this position might just be for you. As an SRE, you'll take on the challenge of overseeing daily operational workloads while optimizing the cost efficiency of our GCP environment. You'll dive deep into capacity planning and performance optimization, helping us design resilient cloud architectures using tools like Terraform and Ansible. Your expertise will enable us to proactively monitor performance and respond to incidents with a hands-on approach. Moreover, you'll play a critical role in FinOps, analyzing cloud spending and making recommendations for optimizing resources. Collaborating closely with cross-functional teams, you'll ensure that our operational practices meet security and compliance standards. If you have over 8 years of experience in cloud operations, a relevant degree, and hold certifications like GCP Professional Cloud Architect, we would love to hear from you. Join us in building a highly reliable and scalable GCP setup and take your career to the next level!

Frequently Asked Questions (FAQs) for Expert Site Reliability Engineer - GCP Role at DeepSource Technologies

What responsibilities does an Expert Site Reliability Engineer - GCP have?

As an Expert Site Reliability Engineer - GCP, you will be responsible for managing and maintaining GCP infrastructure, ensuring its reliability and high availability. This includes capacity planning, performance optimization, disaster recovery planning, and implementing cost management strategies. You'll also work on automation and CI/CD practices to streamline deployments.