
Staff Site Reliability Engineer - Cloud Engineering

Visa’s Technology Organization is a community of problem solvers and innovators reshaping the future of commerce. We operate the world’s most sophisticated processing networks, capable of handling more than 65k secure transactions a second across 80M merchants, 15k financial institutions, and billions of everyday people. Working with us, you’ll tackle complex distributed systems and solve massive-scale problems centered on new payment flows, business and data solutions, cyber security, and B2C platforms.

 

The Opportunity:

As a Staff Site Reliability Engineer in Product Reliability Engineering, you will be part of a team that maintains and supports Visa's Data Platform, including its key cloud-based Big Data and Kafka platforms. You will be responsible for driving innovation for our partners and clients, within Visa and globally. You will work on open-source Big Data and Kafka clusters in the cloud, ensuring their availability, performance, and reliability while improving operational efficiency.

 

The Work Itself:

Essential Functions:

· Design, build and manage Big Data and Kafka infrastructure on AWS, GCP and Azure.

· Manage and optimize Apache Big Data and Kafka clusters for high performance, reliability, and scalability.

· Develop tools and processes to monitor and analyze system performance and to identify potential issues.

· Collaborate with other teams to design and implement solutions that improve the reliability and efficiency of the Big Data cloud platforms.

· Ensure security and compliance of the platforms within organizational guidelines.

· Other responsibilities include effective root cause analysis of major production incidents and the development of lessons-learned documentation. The person will identify and implement high-availability solutions for services that have a single point of failure.

· The role involves planning and performing capacity expansions and upgrades in a timely manner to avoid scaling issues and bugs. This includes automating repetitive tasks to reduce manual effort and prevent human error.

· The successful candidate will tune alerting and set up observability to proactively identify issues and performance problems. They will also work closely with Level 3 teams to review new use cases and cluster-hardening techniques in order to build robust, reliable platforms.

· The role involves creating standard operating procedure documents and guidelines for effectively managing and using the platforms. The person will leverage DevOps tools, disciplines (incident, problem, and change management), and standards in day-to-day operations.

· The individual will ensure that the platforms meet performance and service level agreement (SLA) requirements. They will also perform security remediation and implement automation and self-healing as required.

· The individual will concentrate on developing automations and reports to minimize manual effort, using tools such as shell scripting, Ansible, Python, or any other programming language.

 

The Skills You Bring:

· Energy and Experience: A growth mindset that is curious and passionate about technologies and enjoys challenging projects on a global scale.

· Challenge the Status Quo: Comfort in pushing boundaries and “hacking” beyond traditional solutions.

· Language Expertise: Expertise in one or more general-purpose programming languages (e.g., Java, Python).

· Builder: Experience building and deploying distributed systems.

· Learner: A constant drive to learn new technologies such as cloud technologies, Kubernetes, and MLOps.

· Partnership: Experience collaborating with engineering, application, and other functional teams.

 

We do not expect any single candidate to fulfill all of these characteristics. For instance, we have awesome team members who are really focused on building scalable systems but hadn’t worked with payments technology or web applications before joining Visa.

This is a hybrid position. Hybrid employees can alternate time between both remote and office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Average salary estimate: $140,000/year (est.)
Min: $120,000
Max: $160,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Staff Site Reliability Engineer - Cloud Engineering, Visa

If you're a talented Staff Site Reliability Engineer with a passion for Cloud Engineering, Visa in Austin is looking for you! At Visa, we're reshaping the future of commerce and running some of the world's most complex processing networks. As a vital member of our Product Reliability Engineering team, you'll dive into exciting projects centered on our Big Data and Kafka platforms. You’ll enjoy a collaborative environment where innovation thrives as you design and manage robust infrastructure on AWS, GCP, and Azure. Your role will involve optimizing Apache Big Data and Kafka clusters to ensure top-notch performance, reliability, and scalability. You’ll use your experience to develop monitoring tools, automate manual tasks, and implement security measures that meet our high standards. You’ll also collaborate across teams to improve the efficiency of our cloud platforms and contribute to incident analysis and resolution. With your growth mindset, a knack for pushing boundaries, and a thirst for learning new technologies, you’ll play a pivotal role in keeping our systems performing reliably. Join us at Visa, where every day presents an exciting challenge and you’ll have the opportunity to work on a global scale for an industry leader in payment technology.

Frequently Asked Questions (FAQs) for Staff Site Reliability Engineer - Cloud Engineering Role at Visa
What are the responsibilities of a Staff Site Reliability Engineer at Visa?

As a Staff Site Reliability Engineer at Visa, your responsibilities include designing and managing Big Data and Kafka infrastructure across various cloud platforms like AWS, GCP, and Azure. You will monitor system performance, optimize Apache clusters for scalability, and automate repetitive tasks to enhance operational efficiency. Additionally, you'll implement and uphold security protocols aligned with organizational guidelines.

What qualifications are required for the Staff Site Reliability Engineer position at Visa?

To be successful as a Staff Site Reliability Engineer at Visa, you should have strong expertise in distributed systems, proficiency in programming languages like Java or Python, and a solid understanding of cloud technologies. A history of collaborating with engineering and application teams, along with a growth mindset, is also essential. Familiarity with DevOps practices will be highly beneficial in this role.

What skills are essential for a Staff Site Reliability Engineer at Visa?

Essential skills for the Staff Site Reliability Engineer role at Visa include expertise in cloud platforms, proficiency in automation tools like Shell scripting and Ansible, and the ability to analyze and resolve complex system issues. A passion for continuous learning and experience in building scalable systems will also contribute to your success in this position.

How does Visa support the professional growth of a Staff Site Reliability Engineer?

Visa encourages the professional growth of its Staff Site Reliability Engineers through a collaborative environment that promotes continuous learning. With access to challenging projects and the latest technologies, team members are empowered to innovate and develop their skills further, ensuring they remain at the forefront of the industry.

What does a typical day look like for a Staff Site Reliability Engineer at Visa?

A typical day for a Staff Site Reliability Engineer at Visa involves monitoring system performance, collaborating with cross-functional teams, optimizing cloud platforms, and working on incident resolutions. You'll spend time designing and automating processes while ensuring high availability and compliance with security protocols, all in a dynamic and engaging work environment.

Common Interview Questions for Staff Site Reliability Engineer - Cloud Engineering
What experience do you have with managing Apache Big Data and Kafka clusters?

When answering this question, share specific examples of your experience in managing and optimizing such clusters. Highlight the technologies you've used, any challenges you faced, and how you overcame them. Demonstrating your understanding of the complexity and scale of these systems will be impressive.

How do you ensure the reliability and performance of cloud infrastructures?

To answer this, discuss your approach to monitoring performance, implementing redundancy measures, and conducting regular capacity reviews. Mention any tools you use for observability and how you proactively identify issues to maintain high availability in cloud infrastructures.

Describe a challenging incident you managed in production. What steps did you take to resolve it?

Illustrate your problem-solving skills by describing the incident clearly. Explain the investigation process you followed, any tools you used for root cause analysis, and how you communicated with your team during the resolution process. These details will showcase your technical and interpersonal skills.

What automation tools and practices have you used to improve operational efficiency?

Share specific examples of automation tools you've used, such as Ansible or Python scripting. Discuss how you implemented these tools to reduce manual tasks or errors, providing metrics or outcomes that demonstrate the impact of your automation efforts on operational efficiency.

How do you stay updated on the latest cloud technologies?

Mention specific strategies you employ to remain current with new cloud technologies, such as following relevant blogs, attending webinars, participating in online communities, or enrolling in courses. Expressing your proactive approach to learning can reflect your commitment to the role.

Can you explain the principles of DevOps and how you apply them in your work?

Discuss your understanding of DevOps principles like collaboration, automation, and continuous improvement. Provide examples of how you've implemented these in your previous roles, focusing on team dynamics, integration of development and operations, and how it led to faster deployment cycles.

What strategies do you use for performing capacity planning?

Talk about the methods you use for capacity planning, such as analyzing usage trends, working closely with stakeholders, and forecasting future growth. Mention how you adapt your strategies based on data and ensure systems can handle scaling effectively.

How would you handle a disagreement with a developer about a technical issue?

A good answer would focus on listening to the developer's perspective, collaborating to find a solution, and then gathering data or input from other team members if necessary. Communicating constructively is key, so express your commitment to a team-oriented approach.

What metrics do you consider important for monitoring site reliability?

Discuss key performance indicators such as uptime, latency, error rates, and system resource utilization. Explain why each of these metrics is crucial for maintaining reliability and how monitoring them informs your decision-making about system improvements.

What projects have you worked on that demonstrate your leadership skills?

Share a specific project where you took a lead role, detailing the objectives, challenges faced, and outcomes. Highlight how you coordinated with team members, facilitated discussions, and drove the project to completion to demonstrate your leadership ability.

Similar Jobs
Posted 11 days ago

Yelp is looking for an Engineering Manager to lead their Reporting and Data Infrastructure team in a fully remote role within Canada.

Posted 3 days ago

Join Legrand's Lighting Sector as a Mechanical Engineer I to innovate and support next-generation lighting solutions in Union City, CA.

Posted 23 hours ago

Join Mortenson's Energy Storage Group as a Safety Engineer II, where you'll ensure safety and compliance on innovative projects.

Posted 12 days ago

Join Applied Materials as an Engineering Technician III, where you will utilize your skills in troubleshooting and maintenance of electro-mechanical systems.

Jitterbit Remote São Paulo, State of São Paulo, Brazil
Posted 3 days ago

Join Jitterbit as a DevOps Engineer and play a key role in enhancing our infrastructure and deployment processes in a transformative tech environment.


Bring your inspector expertise to AECOM and impact large transportation projects in Arizona.

Visa Inc. operates as a payments technology company worldwide. The company facilitates commerce through the transfer of value and information among consumers, merchants, financial institutions, businesses, strategic partners, and government entiti...

9261 jobs
Employment type: Full-time, hybrid
Date posted: April 2, 2025
