
Staff Site Reliability Engineer - Cloud Engineering

Visa’s Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world’s most sophisticated processing networks, capable of handling more than 65k secure transactions a second across 80M merchants, 15k financial institutions, and billions of everyday people. Working with us, you’ll get to work on complex distributed systems and solve massive-scale problems centered on new payment flows, business and data solutions, cyber security, and B2C platforms.

 

The Opportunity:

As a Staff Site Reliability Engineer in Product Reliability Engineering, you will be part of a team that maintains and supports Visa's Data Platform and key cloud-based Big Data and Kafka platforms. You will be responsible for driving innovation for our partners and clients, within Visa and globally. You will work on open-source Big Data and Kafka clusters in the cloud, ensuring their availability, performance, and reliability, and improving operational efficiency.

 

The Work Itself:

Essential Functions:

· Design, build and manage Big Data and Kafka infrastructure on AWS, GCP and Azure.

· Manage and optimize Apache Big Data and Kafka clusters for high performance, reliability, and scalability.

· Develop tools and processes to monitor and analyze system performance and to identify potential issues.

· Collaborate with other teams to design and implement solutions that improve the reliability and efficiency of the Big Data cloud platforms.

· Ensure the security and compliance of the platforms in line with organizational guidelines.

· Other responsibilities include performing effective root cause analysis of major production incidents and developing learning documentation. The person will also identify and implement high-availability solutions for services that have a single point of failure.

· The role involves planning and performing capacity expansions and upgrades in a timely manner to avoid scaling issues and bugs. This includes automating repetitive tasks to reduce manual effort and prevent human error.

· The successful candidate will tune alerting and set up observability to proactively identify issues and performance problems. They will also work closely with Level 3 teams to review new use cases and cluster-hardening techniques in order to build robust and reliable platforms.

· The role involves creating standard operating procedures and guidelines for managing and using the platforms effectively. The person will apply DevOps tools, disciplines (incident, problem, and change management), and standards in day-to-day operations.

· The individual will ensure that the platforms meet performance and service level agreement (SLA) requirements. They will also perform security remediation and implement automation and self-healing as required.

· The individual will concentrate on developing automation and reports to minimize manual effort, using tools such as shell scripting, Ansible, Python, or any other programming language.

 

The Skills You Bring:

· Energy and Experience: A growth mindset that is curious and passionate about technologies and enjoys challenging projects on a global scale.

· Challenge the Status Quo: Comfort pushing boundaries and “hacking” beyond traditional solutions.

· Language Expertise: Expertise in one or more general-purpose programming languages (e.g., Java, Python).

· Builder: Experience building and deploying distributed systems.

· Learner: A constant drive to learn new technologies such as cloud technologies, Kubernetes, and MLOps.

· Partnership: Experience collaborating with Engineering, Application, and other functional teams.

 

We do not expect any single candidate to fulfill all of these characteristics. For instance, we have awesome team members who are focused on building scalable systems but had not worked with payments technology or web applications before joining Visa.

This is a hybrid position. Hybrid employees can alternate between working remotely and working in the office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Average salary estimate: $120,000 / year (est.)
Min: $100,000
Max: $140,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Staff Site Reliability Engineer - Cloud Engineering, Visa

Are you ready to take on a pivotal role in shaping the future of commerce? Visa is looking for a Staff Site Reliability Engineer - Cloud Engineering to join our dynamic team in Austin. Here at Visa, we pride ourselves on operating the world’s most advanced processing networks, handling over 65,000 secure transactions per second. In this role, you'll be at the forefront of innovation, supporting our Data Platform and key cloud-based Big Data and Kafka platforms while collaborating with partners globally. You will design, build, and optimize Big Data and Kafka infrastructure on major cloud platforms such as AWS, GCP, and Azure. Your expertise will help us enhance the reliability and performance of our systems, ensuring they effectively serve millions of users daily. You'll also engage with teams across the organization to create robust solutions and automate processes, reducing manual effort and preventing errors. With a growth mindset, your passion for technology will be crucial as you drive improvements in performance and security, all while enjoying a flexible hybrid work environment, splitting your time between remote work and our vibrant Austin office. This is not just a job; it's an opportunity to challenge the status quo and be part of something groundbreaking at Visa.

Frequently Asked Questions (FAQs) for Staff Site Reliability Engineer - Cloud Engineering Role at Visa
What are the key responsibilities of a Staff Site Reliability Engineer - Cloud Engineering at Visa?

As a Staff Site Reliability Engineer - Cloud Engineering at Visa, you will design and manage Big Data and Kafka infrastructure across major cloud platforms like AWS, GCP, and Azure. You'll be responsible for optimizing system performance and reliability, developing monitoring tools, and collaborating with various teams to enhance platform efficiency. Ensuring compliance with security standards, performing root cause analysis of incidents, and automating tasks to reduce manual effort are also vital parts of your role.

What skills are essential for a Staff Site Reliability Engineer - Cloud Engineering at Visa?

To thrive as a Staff Site Reliability Engineer - Cloud Engineering at Visa, you will need a strong understanding of cloud technologies, distributed systems, and general development languages such as Java or Python. Additionally, experience with automation tools like Ansible, shell scripting, or Python scripting is crucial. A growth mindset that embraces new technologies and the ability to collaborate effectively with engineering teams is also essential.

Is the Staff Site Reliability Engineer - Cloud Engineering position at Visa hybrid or remote?

The Staff Site Reliability Engineer - Cloud Engineering position at Visa is hybrid. Employees are expected to work from the office 2-3 set days a week, as determined by leadership and the site, with a general guidepost of being in the office at least 50% of the time based on business needs. This allows for a blend of remote work flexibility with the benefits of in-person collaboration in our Austin office.

What does a typical day look like for a Staff Site Reliability Engineer - Cloud Engineering at Visa?

A typical day for a Staff Site Reliability Engineer - Cloud Engineering at Visa involves collaborating with cross-functional teams to ensure platform reliability, performance, and security. You’ll be monitoring Big Data and Kafka clusters, troubleshooting issues, automating processes to improve efficiency, and planning for capacity expansions, all while engaging in proactive problem-solving and innovation.

What opportunities for growth exist for a Staff Site Reliability Engineer - Cloud Engineering at Visa?

At Visa, as a Staff Site Reliability Engineer - Cloud Engineering, there are immense opportunities for professional growth and learning. You'll have the chance to work on cutting-edge technology projects, collaborate with industry leaders, and develop your expertise in cloud technologies, Kubernetes, and MLOps. Visa encourages continuous learning and career advancement through various programs and mentorship opportunities.

Common Interview Questions for Staff Site Reliability Engineer - Cloud Engineering
Can you explain your experience with distributed systems?

When answering this question, highlight specific projects where you've worked on distributed systems. Discuss the technologies involved, the challenges faced, and how you contributed to enhancing system performance and reliability.

What automation tools have you used, and how have they improved your workflow?

Discuss the automation tools you're familiar with, such as Ansible or shell scripting. Share specific examples of how you've implemented these tools to reduce manual tasks, streamline operations, and enhance efficiency.

How do you approach problem-solving during incidents?

Your response should demonstrate a structured approach to incident management. Talk about the steps you follow, such as identifying the issue, conducting root cause analysis, communicating with stakeholders, and developing documentation for future reference.

What measures do you take to ensure the security of the platforms you manage?

Mention specific security practices you've implemented, such as regular security audits, compliance checks, and the use of automated remediation tools. Discuss how you stay updated on security protocols and best practices in the industry.

How do you ensure your systems meet performance and SLA requirements?

Explain your strategies for monitoring and tuning system performance, including the tools or metrics you rely on. Discuss how you set SLAs and the processes you use to meet or exceed them.

Can you give an example of a challenging project you've worked on?

Share a detailed example of a challenging project, emphasizing the objectives, your role, the obstacles faced, and the successful outcomes. This illustrates your problem-solving skills and resilience.

How do you stay current with emerging technologies?

Discuss your strategies for continuous learning, such as participating in workshops, online courses, or tech community involvement. Mention any specific technologies that intrigue you and how they relate to the role.

Describe your experience working in a hybrid team environment.

Talk about how you effectively communicate and collaborate in a hybrid setting. Share specific tools you use and your thoughts on the challenges and benefits of remote teamwork.

What do you believe are the key qualities of a good Site Reliability Engineer?

Discuss qualities such as problem-solving abilities, communication skills, a proactive attitude towards automation, and a deep understanding of system architecture. Relate these qualities to how they contribute to team success.

How would you prioritize tasks in a fast-paced environment?

Explain your prioritization framework, such as assessing task impact and urgency. Emphasize your experience in juggling multiple responsibilities and ensuring nothing falls through the cracks.

Similar Jobs
CGI Hybrid US, Virginia, Newport News, VA
Posted 7 hours ago

Join CGI Federal as a Cyber Security Lead and drive innovation in a fast-paced government project.

Mesa County Public Library District Hybrid Grand Junction, Colorado, United States
Posted 9 hours ago

Join Mesa County Public Library as the Head of Public Services to manage a dynamic team dedicated to serving the community.

SIXT Remote Bengaluru, Karnataka, India
Posted 9 days ago

Visa Inc. operates as a payments technology company worldwide. The company facilitates commerce through the transfer of value and information among consumers, merchants, financial institutions, businesses, strategic partners, and government entiti...

Employment type: Full-time, hybrid
Date posted: April 3, 2025
