Job details

Expert Site Reliability Engineer - GCP

We are seeking an experienced Google Cloud Platform (GCP) Site Reliability Engineer (SRE) to manage daily operational workloads, ensuring the reliability, scalability, and cost efficiency of cloud infrastructure. The ideal candidate will have deep expertise in capacity planning, performance optimization, infrastructure design, and FinOps best practices to maintain an efficient and cost-effective GCP environment.

Key Responsibilities:

• Operations & Reliability: Manage and maintain GCP infrastructure, ensuring high availability, scalability, and system reliability.

• Capacity Planning & Optimization: Monitor and forecast resource utilization, performance trends, and infrastructure scaling needs to optimize cloud costs and efficiency.

• Infrastructure Design & Automation: Design and implement highly available, fault-tolerant, and resilient cloud architectures, leveraging Infrastructure as Code (IaC) tools such as Terraform and Ansible.

• Performance Monitoring & Incident Response: Utilize Google Cloud Monitoring, Cloud Logging, and third-party tools to proactively detect and resolve performance issues.

• FinOps & Cost Management: Analyze and optimize cloud spending, implement cost controls, recommend rightsizing strategies, and ensure efficient resource allocation.

• Security & Compliance: Implement best practices for IAM, network security, encryption, and compliance frameworks (SOC 2, ISO 27001, NIST).

• CI/CD & DevOps Integration: Collaborate with DevOps teams to streamline deployment processes, automate workflows, and optimize application performance.

• Disaster Recovery & High Availability: Design and implement disaster recovery (DR) plans, backup strategies, and failover mechanisms to ensure business continuity.

• Documentation & Collaboration: Maintain comprehensive documentation of infrastructure, best practices, and optimization strategies while working closely with cross-functional teams.

Qualifications:

• Education: Bachelor’s degree in Computer Science, Information Technology, or equivalent experience.

• Experience: 8+ years of experience in cloud operations, reliability engineering, or infrastructure management.

• Certifications: GCP Professional Cloud Architect, GCP Professional DevOps Engineer, or equivalent is preferred.

• Technical Proficiency:

  • Expertise in Google Cloud networking, Compute Engine, Kubernetes (GKE), Cloud Functions, and Cloud Storage.

  • Strong knowledge of Terraform, Ansible, or other Infrastructure as Code (IaC) tools.

  • Experience with Google Kubernetes Engine (GKE), microservices, and container orchestration.

  • Hands-on experience with FinOps tools and cost optimization strategies in cloud environments.

  • Familiarity with monitoring and logging solutions such as Google Cloud Operations Suite (formerly Stackdriver), Prometheus, and Grafana.

  • Experience with CI/CD pipelines, automation, and GitOps best practices.

  • Strong understanding of SRE principles, SLAs, SLOs, and error budgets.

Preferred Qualifications:

• Experience with multi-cloud or hybrid cloud environments.

• Knowledge of serverless computing and cloud-native application design.

• Understanding of ITIL frameworks for incident, problem, and change management.

Average salary estimate

$135,000 / year (est.)
Min: $120,000
Max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Expert Site Reliability Engineer - GCP, DeepSource Technologies

We are excited to announce a fantastic opportunity at our company for an Expert Site Reliability Engineer - GCP! If you're passionate about managing cloud infrastructure on Google Cloud Platform and have a knack for ensuring high availability and reliability, this position might just be for you. As an SRE, you'll take on the challenge of overseeing daily operational workloads while optimizing the cost efficiency of our GCP environment. You'll dive deep into capacity planning and performance optimization, helping us design resilient cloud architectures using tools like Terraform and Ansible. Your expertise will enable us to proactively monitor performance and respond to incidents with a hands-on approach. Moreover, you'll play a critical role in FinOps, analyzing cloud spending and making recommendations for optimizing resources. Collaborating closely with cross-functional teams, you'll ensure that our operational practices meet security and compliance standards. If you have over 8 years of experience in cloud operations, a relevant degree, and hold certifications like GCP Professional Cloud Architect, we would love to hear from you. Join us in building a highly reliable and scalable GCP setup and take your career to the next level!

Frequently Asked Questions (FAQs) for Expert Site Reliability Engineer - GCP Role at DeepSource Technologies
What responsibilities does an Expert Site Reliability Engineer - GCP have?

As an Expert Site Reliability Engineer - GCP, you will be responsible for managing and maintaining GCP infrastructure, ensuring its reliability and high availability. This includes capacity planning, performance optimization, disaster recovery planning, and implementing cost management strategies. You'll also work on automation and CI/CD practices to streamline deployments.

What qualifications are needed for the Expert Site Reliability Engineer - GCP role?

To qualify for the Expert Site Reliability Engineer - GCP position, candidates should have a Bachelor's degree in Computer Science or related fields, along with 8+ years of experience in cloud operations or reliability engineering. Preferred certifications include GCP Professional Cloud Architect or Professional DevOps Engineer.

What technical skills are important for an Expert Site Reliability Engineer - GCP?

Technical proficiency is key for an Expert Site Reliability Engineer - GCP. Important skills include expertise in Google Cloud services like Compute Engine and Kubernetes, proficiency in Infrastructure as Code tools (Terraform and Ansible), and experience with monitoring solutions such as Google Operations Suite and Prometheus.

What role does FinOps play in the Expert Site Reliability Engineer - GCP position?

In the Expert Site Reliability Engineer - GCP role, FinOps is essential for analyzing and optimizing cloud spending. You'll need to implement cost controls and recommend strategies for rightsizing resources to ensure cost efficiency in our GCP environment.

How does the Expert Site Reliability Engineer - GCP ensure security and compliance?

The Expert Site Reliability Engineer - GCP is responsible for implementing best practices for IAM, network security, and encryption, while ensuring compliance with frameworks like SOC2 and ISO 27001. This is crucial to maintaining a secure and compliant cloud environment.

Common Interview Questions for Expert Site Reliability Engineer - GCP
Can you explain your experience with Google Cloud Infrastructure?

Be prepared to discuss your hands-on experience with Google Cloud services, focusing on specific projects where you utilized tools like Compute Engine, GKE, or Cloud Functions. Highlight any challenges faced and how you overcame them.

How do you approach capacity planning in a cloud environment?

When answering this question, describe your methodology for monitoring resource utilization and performance trends. Discuss how you would forecast future needs and optimize costs based on expected growth.

What strategies do you use for monitoring and incident response?

Explain your use of tools like Google Cloud Monitoring and Cloud Logging. Provide examples of proactive measures you take to detect issues early and the processes followed to respond effectively to incidents.

How do you ensure high availability and reliability in your designs?

Discuss your approach to designing fault-tolerant architectures. Share your experience with implementing failover mechanisms and disaster recovery plans, including specific methodologies you’ve used.

What has been your experience with Infrastructure as Code tools?

Be specific about your experience with IaC tools like Terraform and Ansible. Discuss projects where you created or managed cloud resources using these tools and the benefits you observed in terms of efficiency and speed.

How do you handle cost management and optimization in the cloud?

Share your strategies for implementing FinOps practices. Discuss specific tools or methods you’ve used to analyze spend, optimize resource allocation, and recognize cost-saving opportunities.

What security measures do you take as part of your role?

Outline your knowledge of IAM policies, network security strategies, and encryption practices. Provide examples of compliance frameworks you’ve adhered to and any audits you've participated in.

Can you describe your experience with CI/CD processes?

Talk about your involvement in CI/CD pipelines, emphasizing automation and how you’ve collaborated with development teams to enhance application performance and streamline deployments.

What challenges have you faced in a multi-cloud environment?

Discuss specific challenges you've encountered while managing services across multiple cloud environments. Highlight how you addressed these issues and the importance of a consistent strategy.

How do you stay updated with the latest trends in cloud technology?

Explain your approach to continuous learning, such as following relevant blogs, subscribing to industry newsletters, attending webinars, or participating in professional forums to remain current with cloud technological advancements.

Mission Driven
Social Impact Driven
Passion for Exploration
Reward & Recognition

DeepSource is a code review tool that allows developers to check for bug risks, anti-patterns, performance issues and security flaws. The company is headquartered in California.

29 jobs
Employment type: Full-time, remote
Date posted: March 28, 2025
