
Staff Site Reliability Engineer - Cloud Engineering

Visa’s Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world’s most sophisticated processing networks, capable of handling more than 65k secure transactions a second across 80M merchants, 15k financial institutions, and billions of everyday people. While working with us you’ll get to work on complex distributed systems and solve massive-scale problems centered on new payment flows, business and data solutions, cybersecurity, and B2C platforms.

 

The Opportunity:

As a Staff Site Reliability Engineer in Product Reliability Engineering, you will be part of a team that maintains and supports Visa's Data Platform and provides support for key cloud-based Big Data and Kafka platforms. You will be responsible for driving innovation for our partners and clients, within Visa and globally. You will work on open-source Big Data and Kafka clusters with a focus on cloud, ensuring their availability, performance, and reliability, and improving operational efficiency.

 

The Work itself:

Essential Functions:

· Design, build and manage Big Data and Kafka infrastructure on AWS, GCP and Azure.

· Manage and optimize Apache Big Data and Kafka clusters for high performance, reliability, and scalability.

· Develop tools and processes to monitor and analyze system performance and to identify potential issues.

· Collaborate with other teams to design and implement solutions to improve reliability and efficiency of the Big Data cloud platforms.

· Ensure security and compliance of the platforms within organizational guidelines.

· Other responsibilities include effective root cause analysis of major production incidents and the development of learning documentation. The person will identify and implement high-availability solutions for services with a single point of failure.

· The role involves planning and performing capacity expansions and upgrades in a timely manner to avoid any scaling issues and bugs. This includes automating repetitive tasks to reduce manual effort and prevent human errors.

· The successful candidate will tune alerting and set up observability to proactively identify issues and performance problems. They will also work closely with Level 3 teams in reviewing new use cases and cluster hardening techniques to build robust and reliable platforms.

· The role involves creating standard operating procedure documents and guidelines on effectively managing and utilizing the platforms. The person will leverage DevOps tools, disciplines (incident, problem, and change management), and standards in day-to-day operations.

· The individual will ensure that the platforms can effectively meet performance and service level agreement requirements. They will also perform security remediation, automation, and self-healing as required.

· The individual will concentrate on developing automations and reports to minimize manual effort. This can be achieved with automation tools such as shell scripting, Ansible, or Python scripting, or with any other programming language.

 

The Skills You Bring:

· Energy and Experience: A growth mindset that is curious and passionate about technologies and enjoys challenging projects on a global scale.

· Challenge the Status Quo: Comfort in pushing the boundaries, “hacking” beyond traditional solutions.

· Language Expertise: Expertise in one or more general development languages (e.g., Java, Python).

· Builder: Experience building and deploying distributed systems.

· Learner: Constant drive to learn new technologies such as cloud technologies, Kubernetes, and MLOps.

· Partnership: Experience collaborating with Engineering, Application, and other functional teams.

 

We do not expect that any single candidate would fulfill all these characteristics. For instance, we have awesome team members who are really focused on building scalable systems but had not worked with payments technology or web applications before joining Visa.

This is a hybrid position. Hybrid employees can alternate between working remotely and in the office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Average salary estimate

$145,000 / yearly (est.)
Min: $130,000
Max: $160,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Staff Site Reliability Engineer - Cloud Engineering, Visa

Are you ready to take your career to the next level? Join Visa as a Staff Site Reliability Engineer - Cloud Engineering in Austin, where innovation meets excellence. At Visa’s Technology Organization, we are a passionate community of problem solvers dedicated to transforming the future of commerce. You’ll dive into the world’s most advanced processing networks, tackling complex distributed systems and massive-scale challenges, from evolving payment flows to robust data solutions. In this role, you will help maintain and support Visa's Data Platform, focusing on cloud-based Big Data and Kafka platforms. Imagine designing, building, and optimizing infrastructure across AWS, GCP, and Azure while ensuring high performance, compliance, and reliability. You’ll collaborate with talented professionals to develop tools that monitor system performance, and you’ll take part in root cause analysis and capacity planning to keep operations running smoothly. With an emphasis on automation and efficiency, your technical skills will shine as you write Python or shell scripts to minimize manual effort. At Visa, we believe in a growth mindset, so you’ll be encouraged to learn new technologies like Kubernetes and MLOps while driving innovations that have global impact. If you're ready to embrace challenges and contribute to a supportive team that values your unique expertise, this is your opportunity to shine!

Frequently Asked Questions (FAQs) for Staff Site Reliability Engineer - Cloud Engineering Role at Visa
What responsibilities does a Staff Site Reliability Engineer - Cloud Engineering at Visa have?

As a Staff Site Reliability Engineer - Cloud Engineering at Visa, your key responsibilities will include designing, building, and managing Big Data and Kafka infrastructure across major cloud platforms like AWS, GCP, and Azure. You will focus on optimizing Apache Big Data and Kafka clusters, collaborating with various teams to enhance reliability and efficiency. Additionally, you will develop monitoring tools for system performance, address production incidents, and ensure compliance with security guidelines.

What qualifications are required for the Staff Site Reliability Engineer - Cloud Engineering position at Visa?

Candidates for the Staff Site Reliability Engineer - Cloud Engineering role at Visa should have a growth mindset along with experience in building and deploying distributed systems. Proficiency in programming languages, particularly Java or Python, is essential. Experience with cloud technologies and familiarity with DevOps disciplines will also be beneficial. The ideal candidate is someone eager to learn and collaborate with various functional teams.

What skills are important for the Staff Site Reliability Engineer - Cloud Engineering role at Visa?

Key skills for the Staff Site Reliability Engineer - Cloud Engineering position at Visa include expertise in one or more development languages like Java or Python, a deep understanding of cloud technologies, and experience with automation tools like Ansible and shell scripting. Additionally, strong analytical and problem-solving abilities, along with a passion for continuous learning, will help you thrive in this hybrid role.

How does the hybrid work model operate for the Staff Site Reliability Engineer - Cloud Engineering role at Visa?

Visa’s hybrid work model for the Staff Site Reliability Engineer - Cloud Engineering role offers flexibility, allowing you to alternate between remote and in-office work. Employees in hybrid roles are generally expected to be in the office 2-3 days a week, providing a collaborative environment while maintaining the option to enjoy the benefits of remote work based on individual and business needs.

What kind of projects will a Staff Site Reliability Engineer - Cloud Engineering work on at Visa?

In the Staff Site Reliability Engineer - Cloud Engineering role at Visa, you'll tackle exciting projects involving complex distributed systems, cybersecurity, and innovative payment solutions. Your work will revolve around maintaining and improving large-scale, open-source Big Data and Kafka platforms, ultimately driving impactful solutions for Visa and its global partners.

Common Interview Questions for Staff Site Reliability Engineer - Cloud Engineering
Can you describe your experience with cloud technologies as a Staff Site Reliability Engineer?

When answering this question, highlight specific cloud technologies you have worked with, such as AWS, GCP, or Azure. Discuss how you utilized these platforms for deploying applications, managing infrastructure, or optimizing performance, and include examples that showcase your problem-solving skills in cloud environments.

How do you approach root cause analysis in production incidents?

Discuss your systematic approach to root cause analysis by mentioning steps like gathering information, analyzing logs, and collaborating with team members. Provide an example of a past incident where your investigation led to a significant improvement in system reliability.

What strategies do you use to optimize Big Data and Kafka clusters?

Explain the techniques you implement for optimizing performance, like tuning configurations, leveraging monitoring tools, and automating repetitive tasks. Share specific metrics you monitor and how you have achieved high availability in your projects.

How do you ensure security compliance in cloud platforms?

Provide insights into your practices for security compliance, detailing how you conduct regular audits, implement security patches, and follow best practices for managing sensitive data. Discuss a specific project where you successfully enhanced security measures.

Describe a challenge you faced while collaborating with other teams.

Be honest about a specific challenge you encountered while working with cross-functional teams. Discuss how you navigated different perspectives and ultimately reached a resolution, highlighting skills like communication and negotiation.

What is your experience with automation tools like Ansible or Shell scripting?

Discuss your familiarity with automation tools and how you have utilized them to streamline processes or reduce manual efforts in your previous roles. Highlight specific examples where your scripting skills directly contributed to increased efficiency.

How would you monitor system performance effectively?

Discuss the tools and metrics you would use to monitor the performance of systems, emphasizing the role of health checks, logging systems, and alerting mechanisms. Provide an example of how you proactively identified and resolved an issue before it escalated.

What do you think are the key components of a reliable distributed system?

Talk about scalability, fault tolerance, and availability as critical components. Make sure to back your points with examples of systems you've worked on and how your interventions improved these aspects of reliability.

Can you explain your approach to capacity planning in cloud environments?

Frame your answer by discussing methods like analyzing usage patterns, forecasting growth, and leveraging automated scaling features in cloud environments. Share past experiences where your planning led to successful scaling without impacting performance.

How do you keep yourself updated with the latest technologies in cloud engineering?

Speak about various resources you utilize, such as online courses, tech blogs, and cloud platform documentation. Share specific technologies you've recently learned about and how you plan to incorporate them into your work.

Visa Inc. operates as a payments technology company worldwide. The company facilitates commerce through the transfer of value and information among consumers, merchants, financial institutions, businesses, strategic partners, and government entiti...

Employment type: Full-time, hybrid
Date posted: April 3, 2025
