Platform Support Engineer

We're seeking a versatile Cloud Platform Engineer who is passionate about building and maintaining highly reliable, scalable, cloud-native infrastructure. You'll play a vital role in bridging the gap between development, operations, and SRE, ensuring our applications run smoothly on Kubernetes across multiple cloud platforms. Your deep understanding of Kubernetes, cloud technologies, and automation will be instrumental in empowering our teams to deliver high-quality software quickly and reliably.

What will you do?

  • Design, deploy, and operate Kubernetes clusters across AWS, Azure, and GCP. Optimize cluster performance, ensure high availability, and implement robust security practices.

  • Build and maintain cloud-native infrastructure components (load balancers, networking, storage, etc.) to support applications running on Kubernetes. Leverage Infrastructure as Code (IaC) with Terraform to automate and manage infrastructure provisioning and configuration.

  • Embrace GitOps principles using ArgoCD to automate deployments and configuration changes and ensure consistency between the desired and actual system state.

  • Establish comprehensive monitoring, logging, and alerting systems to gain insights into platform health and performance. Troubleshoot incidents swiftly and apply SRE principles to improve reliability and resilience.

  • Develop automation scripts and tools (Python, Go, or other languages) to streamline workflows, eliminate manual tasks, and reduce operational overhead.

  • Partner closely with development teams to understand their needs, provide guidance on platform best practices, and enable smooth integration and deployment of their applications.

  • Implement and maintain stringent security measures for Kubernetes and cloud environments, ensuring compliance with industry standards and data protection regulations.

  • Analyze resource usage and implement optimization strategies to maximize performance while controlling cloud costs.

  • Participate in an on-call rotation, troubleshooting and resolving production issues promptly.

What makes you a match?

  • 3+ years of experience working with Kubernetes in production environments. Deep understanding of cluster operations, networking, storage, and security within Kubernetes.

  • Strong knowledge of AWS, Azure, and GCP, including core services, networking concepts, and security best practices.

  • Proven experience implementing GitOps workflows with ArgoCD and managing infrastructure using Terraform.

  • Fluency in at least one programming language (Python, Go, Java) for automation, scripting, and tool development.

  • Familiarity with SRE practices like SLOs (Service Level Objectives), error budgeting, and blameless postmortems.

  • Excellent analytical and troubleshooting skills to identify and resolve issues in complex cloud environments.

  • Ability to communicate effectively with development, operations, and security teams to drive cross-functional initiatives.

  • Ability to work from 8:30 PM to 5:30 AM IST to provide coverage for US time zones.

Atlan Glassdoor company review: 4.6
Atlan DE&I review: 4.7
CEO of Atlan: Prukalpa Sankar and Varun Banka
What You Should Know About the Platform Support Engineer Role at Atlan

If you're looking for an exciting opportunity as a Platform Support Engineer, you're in the right place! We're on the hunt for a talented engineer who thrives on building and maintaining reliable, scalable cloud-native infrastructure. Your role will be crucial: you'll bridge development, operations, and SRE, keeping our applications performing well on Kubernetes across multiple cloud platforms. You'll design, deploy, and tune Kubernetes clusters on AWS, Azure, and GCP, optimizing performance and ensuring high availability. Terraform-based Infrastructure as Code (IaC) will be a daily part of your toolkit for provisioning and configuring infrastructure. You'll apply GitOps principles with ArgoCD to automate deployments and keep the actual system state consistent with the desired state. You'll own monitoring and logging, which provide critical insight into platform health and performance, and you'll troubleshoot incidents swiftly using SRE practices. If you have a knack for writing automation scripts in Python, Go, or other languages, you'll streamline workflows and reduce manual effort. Collaboration is central here: you'll partner with development teams to ensure seamless integration and deployment of their applications while upholding stringent security measures across Kubernetes and cloud environments. If you have 3+ years of experience running Kubernetes in production, this role might be your next big adventure!

Frequently Asked Questions (FAQs) for Platform Support Engineer Role at Atlan
What are the primary responsibilities of a Platform Support Engineer at our company?

As a Platform Support Engineer, you'll be responsible for designing, deploying, and maintaining Kubernetes clusters on major cloud platforms like AWS, Azure, and GCP. You'll also manage cloud-native infrastructure components, automate processes using Terraform, and ensure consistent deployments with GitOps principles. Establishing monitoring systems and troubleshooting incidents will be essential parts of your role, all while collaborating with development teams and implementing security best practices.

What qualifications are needed to apply for the Platform Support Engineer position?

To qualify as a Platform Support Engineer at our company, you should have at least 3 years of hands-on experience with Kubernetes in production environments. Strong knowledge of AWS, Azure, and GCP, including their core services, networking concepts, and security practices, is crucial. You should also have proven experience implementing GitOps workflows with ArgoCD and managing infrastructure with Terraform, fluency in at least one programming language such as Python, Go, or Java, and familiarity with SRE practices such as SLOs, error budgeting, and blameless postmortems.

What programming skills are required for the Platform Support Engineer role?

A Platform Support Engineer should be proficient in at least one programming language, such as Python, Go, or Java, for automation and scripting tasks. These skills will be vital for developing scripts and tools to streamline workflows and reduce operational overhead, making you a key player in our engineering efforts.

Can you explain the significance of IaC and GitOps in the Platform Support Engineer role?

Infrastructure as Code (IaC) and GitOps are critical methodologies in the role of a Platform Support Engineer. IaC, particularly through tools like Terraform, allows for automated provisioning and management of infrastructure, promoting consistency and reducing manual effort. GitOps uses a Git repository as the single source of truth for the desired state of infrastructure and applications; a controller such as ArgoCD continuously compares that desired state with what is actually running and reconciles any drift. Because every change goes through version control, deployments are reviewable, auditable, and easy to roll back, which improves overall stability and traceability.
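The reconcile idea behind GitOps can be sketched in a few lines of Python. This is a simplified illustration, not how ArgoCD is implemented: `plan_sync`, `Action`, and the resource names are hypothetical, and the `prune` flag only loosely mirrors ArgoCD's option to delete resources that are no longer declared in Git.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "delete"
    name: str


def plan_sync(desired: dict, actual: dict, prune: bool = True) -> list:
    """Compute the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec (a plain dict). In a real
    GitOps setup, `desired` would be rendered from manifests in Git and
    `actual` would be read from the cluster; here they are simplified inputs.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(Action("create", name))  # declared in Git, missing live
        elif actual[name] != spec:
            actions.append(Action("update", name))  # drift: live spec differs
    if prune:
        # Resources running live but no longer declared in Git.
        for name in sorted(actual.keys() - desired.keys()):
            actions.append(Action("delete", name))
    return actions


desired = {
    "web": {"image": "web:1.2", "replicas": 3},
    "worker": {"image": "worker:2.0", "replicas": 2},
}
actual = {
    "web": {"image": "web:1.1", "replicas": 3},
    "cron": {"image": "cron:0.9", "replicas": 1},
}
for action in plan_sync(desired, actual):
    print(action.kind, action.name)
```

Running it prints `update web`, `create worker`, and `delete cron`: the image tag drifted for `web`, `worker` exists only in Git, and `cron` exists only in the cluster.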

What is the typical work schedule for the Platform Support Engineer position?

As a Platform Support Engineer, you will typically work from 8:30 PM to 5:30 AM IST. This schedule is designed to provide coverage for US time zones, allowing you to collaborate effectively with teams across the globe while keeping our cloud infrastructure operating seamlessly.

Common Interview Questions for Platform Support Engineer
How do you manage the deployment of applications on Kubernetes?

In managing deployments on Kubernetes, I follow a structured approach: defining deployment configurations as code, automating builds and releases with CI/CD pipelines, and using ArgoCD to apply GitOps principles. This makes deployments consistent and reliable, as well as auditable and easy to revert if necessary.

Can you explain how you optimize Kubernetes cluster performance?

To optimize Kubernetes cluster performance, I regularly monitor resource usage and conduct performance assessments to identify bottlenecks. Key parts of my strategy are setting CPU and memory requests and limits on pods, applying ResourceQuotas at the namespace level, tuning pod configurations, and using the Horizontal Pod Autoscaler (HPA) to scale replicas with demand. Additionally, I review network policies and storage configurations to ensure efficiency.
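The core Horizontal Pod Autoscaler calculation documented by Kubernetes is `desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`, with scaling skipped when the ratio is within a tolerance (0.1 by default) of 1.0. The Python sketch below models only that formula plus min/max clamping; the real controller also accounts for unready pods, missing metrics, and stabilization windows, and the function name and default bounds are illustrative assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA calculation.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, current_metric=90, target_metric=60))
```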

What tools do you use for monitoring and alerting in cloud environments?

I use Prometheus for metrics collection and Grafana for visualization to monitor the health of Kubernetes clusters. For alerting, I define threshold-based alerting rules in Prometheus and configure Alertmanager to group, deduplicate, and route the resulting notifications to the right on-call channels, ensuring a proactive approach to managing cloud environments.
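In Prometheus, the threshold lives in an alerting rule, and an optional `for` duration keeps the alert pending until the condition has held continuously for that long; Alertmanager is only notified once the alert fires. The Python sketch below is a simplified model of that pending/firing behavior, not Prometheus code: `alert_states` and the sample values are hypothetical, and evaluations are assumed to be evenly spaced.

```python
def alert_states(samples: list, threshold: float, for_intervals: int) -> list:
    """Classify each evaluation as "inactive", "pending", or "firing".

    Simplified model: the condition is `value > threshold`, evaluations are
    evenly spaced, and `for_intervals` is the `for` duration expressed as a
    number of evaluation intervals.
    """
    states = []
    consecutive = 0  # evaluations in a row where the condition held
    for value in samples:
        if value > threshold:
            consecutive += 1
            # Active for (consecutive - 1) intervals since it first became true.
            held_long_enough = consecutive - 1 >= for_intervals
            states.append("firing" if held_long_enough else "pending")
        else:
            consecutive = 0  # any false evaluation resets the alert
            states.append("inactive")
    return states


# CPU utilization sampled once a minute, threshold 80%, "for: 2m".
print(alert_states([0.5, 0.9, 0.95, 0.97, 0.4], threshold=0.8, for_intervals=2))
```

The spike only fires on the fourth evaluation, after the condition has held for two full intervals, which is how a `for` clause suppresses alerts on short blips.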

How do you ensure security within your Kubernetes environments?

Ensuring security in Kubernetes environments involves implementing Role-Based Access Control (RBAC) to restrict access, using network policies to control traffic, and maintaining up-to-date container images to mitigate vulnerabilities. Regular security audits and adhering to compliance standards are also essential practices I follow to maintain security hygiene.
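Kubernetes RBAC rules list API groups, resources, and verbs, and permissions are purely additive: there are no deny rules, so a request is allowed only if some bound rule grants it. The Python sketch below models that matching logic to help reason about least privilege; it is not the API server's authorizer, `PolicyRule`, `is_allowed`, and the `pod_reader` role are hypothetical names, and it ignores details such as `resourceNames` and role bindings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    api_groups: tuple  # "" is the core API group; "*" matches any group
    resources: tuple   # e.g. "pods", "pods/log"; "*" matches any resource
    verbs: tuple       # e.g. "get", "list", "watch"; "*" matches any verb


def _matches(allowed: tuple, value: str) -> bool:
    return "*" in allowed or value in allowed


def is_allowed(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """RBAC is additive: allow if any rule grants the request, deny otherwise."""
    return any(
        _matches(rule.api_groups, api_group)
        and _matches(rule.resources, resource)
        and _matches(rule.verbs, verb)
        for rule in rules
    )


# A read-only role for pods in the core API group.
pod_reader = [PolicyRule(("",), ("pods", "pods/log"), ("get", "list", "watch"))]
print(is_allowed(pod_reader, "", "pods", "list"))    # True
print(is_allowed(pod_reader, "", "pods", "delete"))  # False
```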

What experience do you have with automation and scripting?

I have extensive experience developing automation scripts using Python and Go to streamline various processes, such as managing Kubernetes configurations, automating backups, and implementing deployment workflows. This not only reduces manual effort but also increases reliability and standardization across the platforms I manage.

Describe a challenging incident you resolved in a cloud environment.

One significant incident involved a sudden performance degradation in a critical application running on Kubernetes. I followed a step-by-step troubleshooting process, analyzing logs and checking resource usage, and eventually identified a misconfigured resource limit. Correcting that setting restored performance, and I documented the incident for future learning.

How do you work with development teams to ensure platform alignment?

I prioritize communication and collaboration with development teams to understand their requirements and challenges. Regular meetings, sharing best practices, and providing guidance on platform features are part of my approach. This collaboration ensures smooth integration of applications and fosters a culture of shared responsibility.

What are Service Level Objectives (SLOs), and why are they important?

Service Level Objectives (SLOs) are target values for service level indicators (SLIs), such as availability or request latency, that define the expected reliability of a service over a given time window. They are important because they set clear expectations for service delivery, help teams prioritize incidents based on their impact, and, through the error budget they imply, provide measurable goals for balancing feature work against reliability improvements.
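An SLO becomes actionable through its error budget: the amount of unreliability the target still allows. For a time-based availability SLO, the budget is (1 - SLO) × window length, so a 99.9% target over 30 days allows about 43.2 minutes of downtime. The Python sketch below assumes that time-based framing (request-based budgets are also common), and the function names are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed 'bad' minutes in the window for a time-based availability SLO."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget left; a negative value means it is overspent."""
    return 1.0 - downtime_minutes / error_budget_minutes(slo, window_days)


print(round(error_budget_minutes(0.999), 1))                     # 43.2 minutes per 30 days
print(round(budget_remaining(0.999, downtime_minutes=10.8), 2))  # 0.75 of the budget left
```

A team that has already burned most of its budget early in the window might pause risky releases; a team with budget to spare can ship faster.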

How do you handle on-call responsibilities and incident management?

I approach on-call responsibilities with a proactive mindset, ensuring I am well-prepared for potential incidents. I maintain detailed documentation and runbooks that outline troubleshooting steps for common issues. During incidents, I prioritize clear communication and collaboration with the team to resolve issues quickly while minimizing service disruption.

What strategies do you use to manage costs in cloud environments?

To manage costs in cloud environments, I regularly analyze resource usage to identify underutilized or over-provisioned resources. Autoscaling adjusts capacity dynamically based on demand. I also consider spot instances for non-critical, interruption-tolerant workloads and use cost management tools to gain visibility into spending patterns.
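Finding over-provisioned resources usually starts with comparing what workloads request against what they actually use. The Python sketch below flags workloads whose 95th-percentile CPU usage falls below half of their request and suggests a smaller request with 25% headroom; the thresholds, field names, and `rightsizing_candidates` function are illustrative assumptions, and in practice the usage figures would come from a monitoring system rather than being hard-coded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    cpu_request_m: int  # requested CPU in millicores
    cpu_p95_m: int      # observed 95th-percentile CPU usage in millicores


def rightsizing_candidates(workloads: list, max_utilization: float = 0.5, headroom: float = 1.25) -> list:
    """Flag workloads whose p95 usage is below `max_utilization` of their request.

    Returns (name, utilization, suggested_request_m) tuples, lowest utilization first.
    The suggestion is p95 usage times `headroom`, rounded up to a whole millicore.
    """
    candidates = []
    for w in workloads:
        if w.cpu_request_m <= 0:
            continue  # no request set: nothing to compare against
        utilization = w.cpu_p95_m / w.cpu_request_m
        if utilization < max_utilization:
            suggested = math.ceil(w.cpu_p95_m * headroom)
            candidates.append((w.name, round(utilization, 2), suggested))
    return sorted(candidates, key=lambda c: c[1])


fleet = [
    Workload("api", cpu_request_m=1000, cpu_p95_m=200),
    Workload("worker", cpu_request_m=500, cpu_p95_m=400),
    Workload("batch", cpu_request_m=2000, cpu_p95_m=600),
]
print(rightsizing_candidates(fleet))
```

Here `api` and `batch` are flagged with suggested requests of 250m and 750m, while `worker` already uses 80% of its request and is left alone.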


To help data teams do more, together! 💪

Employment type: Full-time, remote
Date posted: December 28, 2024
