
Staff Site Reliability Engineer - Cloud Engineering

Visa’s Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world’s most sophisticated processing networks capable of handling more than 65k secure transactions a second across 80M merchants, 15k Financial Institutions, and billions of everyday people. While working with us you’ll get to work on complex distributed systems and solve massive scale problems centered on new payment flows, business and data solutions, cyber security, and B2C platforms.

 

The Opportunity:

As a Staff Site Reliability Engineer in Product Reliability Engineering, you will be part of a team that maintains and supports Visa's Data Platform and provides support for key cloud-based Big Data and Kafka platforms. You will be responsible for driving innovation for our partners and clients, within Visa and globally. You will work on open-source Big Data and Kafka clusters with a focus on cloud, ensuring their availability, performance, and reliability and improving operational efficiency.

 

The Work itself:

Essential Functions:

· Design, build and manage Big Data and Kafka infrastructure on AWS, GCP and Azure.

· Manage and optimize Apache Big Data and Kafka clusters for high performance, reliability, and scalability.

· Develop tools and processes to monitor and analyze system performance and to identify potential issues.

· Collaborate with other teams to design and implement solutions to improve reliability and efficiency of the Big Data cloud platforms.

· Ensure security and compliance of the platforms within organizational guidelines.

· Other responsibilities include effective root cause analysis of major production incidents and the development of learning documentation. The person will identify and implement high-availability solutions for services with a single point of failure.

· The role involves planning and performing capacity expansions and upgrades in a timely manner to avoid any scaling issues and bugs. This includes automating repetitive tasks to reduce manual effort and prevent human errors.

· The successful candidate will tune alerting and set up observability to proactively identify issues and performance problems. They will also work closely with Level 3 teams in reviewing new use cases and cluster hardening techniques to build robust and reliable platforms.

· The role involves creating standard operating procedure documents and guidelines on effectively managing and utilizing the platforms. The person will leverage DevOps tools, disciplines (incident, problem, and change management), and standards in day-to-day operations.

· The individual will ensure that the platforms can effectively meet performance and service level agreement requirements. They will also perform security remediation, automation, and self-healing as required.

· The individual will concentrate on developing automations and reports to minimize manual effort. This can be achieved with automation tools such as shell scripting, Ansible, or Python, or with any other programming language.

 

The Skills You Bring:

· Energy and Experience: A growth mindset that is curious and passionate about technologies and enjoys challenging projects on a global scale.

· Challenge the Status Quo: Comfort in pushing the boundaries, “hacking” beyond traditional solutions.

· Language Expertise: Expertise in one or more general development languages (e.g., Java, Python).

· Builder: Experience building and deploying distributed systems.

· Learner: Constant drive to learn new technologies such as cloud technologies, Kubernetes, and MLOps.

· Partnership: Experience collaborating with Engineering, Application, and other functional teams.

 

**We do not expect that any single candidate would fulfill all these characteristics. For instance, we have awesome team members who are really focused on building scalable systems but didn’t work with payments technology or web applications before joining Visa.

This is a hybrid position. Hybrid employees can alternate between working remotely and in the office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Average salary estimate

$140,000 / year (est.)
Min: $120,000
Max: $160,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Staff Site Reliability Engineer - Cloud Engineering, Visa

At Visa, we’re on the lookout for a dynamic Staff Site Reliability Engineer specializing in Cloud Engineering to join our Technology Organization based in Austin. This isn't just another job; it's an opportunity to become part of a community that's reshaping the future of commerce. You will work with one of the world’s most advanced processing networks, capable of handling over 65,000 secure transactions every second. In this role, you'll dive deep into complex distributed systems, tackling massive scale challenges that span new payment flows, cyber security, and B2C platforms. Your main focus will be on maintaining and supporting our Big Data and Kafka platforms in the cloud. You’ll design and manage infrastructure on AWS, GCP, and Azure while optimizing performance and reliability. Collaborating with engineers from various teams, you will develop solutions that boost operational efficiency and keep the platforms secure and compliant. You’ll also play a pivotal role in capacity planning, automation, and incident root cause analysis. We're looking for someone who's curious, passionate about technology, and not afraid to challenge the status quo. This hybrid position offers the flexibility to work both remotely and from our Austin office. If you’re excited about driving innovation, come join us at Visa!

Frequently Asked Questions (FAQs) for Staff Site Reliability Engineer - Cloud Engineering Role at Visa
What are the responsibilities of a Staff Site Reliability Engineer at Visa?

As a Staff Site Reliability Engineer at Visa, you'll be tasked with designing, building, and managing Big Data and Kafka infrastructure across major cloud platforms like AWS, GCP, and Azure. Your responsibilities will include optimizing performance and reliability of clusters, collaborating with various teams to enhance operational efficiency, and ensuring security compliance. You'll also be involved in root cause analysis of incidents, planning for capacity expansions, and automating processes to minimize manual work.

What qualifications do I need to become a Staff Site Reliability Engineer at Visa?

To qualify for the Staff Site Reliability Engineer position at Visa, candidates should have a solid foundation in general development languages such as Java or Python, experience with distributed systems, and a growth mindset towards new technologies like cloud computing and Kubernetes. Collaboration skills are essential, as the role involves working closely across multiple engineering teams. You don't need to have prior experience in payments technology; passion and enthusiasm are key!

What is the work environment like for a Staff Site Reliability Engineer at Visa?

The work environment for a Staff Site Reliability Engineer at Visa is vibrant and collaborative. Positioned in Austin, this hybrid role allows for both in-office and remote work, giving employees the flexibility to balance their needs. You'll be immersed in a team of innovators who tackle global challenges, emphasizing continuous learning and pushing boundaries to enhance Visa's technology.

How does Visa support the growth of its Staff Site Reliability Engineers?

Visa is committed to the growth of its Staff Site Reliability Engineers by fostering a culture of constant learning and innovation. Employees are encouraged to explore new technologies, participate in challenging projects, and collaborate with cross-functional teams. We understand that no single candidate will possess all desired traits, which is why we support professional development and celebrate diverse experiences.

What tools are utilized by a Staff Site Reliability Engineer at Visa?

In the role of Staff Site Reliability Engineer at Visa, you'll be using an array of tools and technologies. Key tools include automation platforms like Ansible and various scripting languages such as Shell and Python for task automation. Monitoring and performance analysis tools will also be part of your toolkit, enabling you to ensure high availability and efficiency across Visa's cloud platforms.

Common Interview Questions for Staff Site Reliability Engineer - Cloud Engineering
Can you explain your experience with cloud platforms like AWS, GCP, or Azure?

In answering this question, highlight specific projects where you designed or managed infrastructure in any cloud setting. Discuss the tools you utilized, how you ensured scalability and reliability, and if applicable, any unique challenges you overcame during migration or management.

How do you handle performance monitoring and incident response?

Describe your approach to performance monitoring, including tools you’ve used (like Prometheus or Grafana) for observability. Detail your incident response strategy, focusing on root cause analysis and effective communication with stakeholders to minimize downtime and service disruption.

What strategies do you use for automation in site reliability engineering?

Discuss your experience with automation tools such as Ansible or personal scripts in Python/Shell. Explain how you've streamlined processes, reduced manual efforts, and enhanced service reliability through automation, giving specific examples where possible.

Can you give an example of a major production incident and how you dealt with it?

Choose a specific incident and describe your role in responding to it. Include the steps you took for incident management, the stakeholders you communicated with, and how you ensured lessons were learned to prevent future occurrences.

How do you prioritize tasks when managing multiple projects?

Talk about your methods for task prioritization based on urgency, impact, and available resources. Mention any frameworks you utilize (like Agile or Kanban) and give an example of how you successfully juggled multiple priorities in a past role.

Describe your experience with distributed systems.

In your response, outline the projects where you built or maintained distributed systems. Highlight the challenges encountered, such as latencies or bottlenecks, and describe how you addressed and optimized them.

What practices do you follow to ensure system security and compliance?

Discuss the security measures you've put in place in previous roles, including regular audits, compliance checks, and how you stay informed about security best practices. Be sure to mention any tools or frameworks you employ in your security strategy.

How do you keep up with the latest advancements in technology?

Explain your methods for continuous learning, such as attending workshops, taking online courses, or being an active part of tech communities. Share specific examples of how learning something new benefited your work or projects.

Can you describe the importance of collaboration in your role?

Emphasize the significance of cross-functional collaboration in site reliability engineering. Share examples of how teamwork has led to successful project outcomes and the tools or practices that have enhanced collaboration in your settings.

What is your approach to capacity planning?

Detail your capacity planning process, including how you analyze usage trends and project future growth. Discuss tools or metrics you rely on to ensure that you're prepared for scaling while avoiding performance bottlenecks.

Visa Inc. operates as a payments technology company worldwide. The company facilitates commerce through the transfer of value and information among consumers, merchants, financial institutions, businesses, strategic partners, and government entiti...

Employment type: Full-time, hybrid
Date posted: April 3, 2025
