
Lead Systems Engineer, High-Performance Computing

The IaaS Systems and Storage & Engineering (ISSE) team is part of the Operations & Infrastructure technology organization. Distributed Compute Engineering (DCE) is part of ISSE, and High-Performance Compute Platform Engineering is part of DCE. Our vision, mission, and purpose are summarized as follows:

Vision: To become a leading technical engineering professional, pioneering in the design and automation of server infrastructure. We envision creating highly secure and efficient operations environments that drive business success and technological advancement.

Mission: Our mission is to deliver high-quality server infrastructure design and automated implementation. We are committed to operating in complex, highly secure, and highly available environments, while maintaining rigorous operations, security, and procedural models.

Purpose: The purpose of this role is to utilize strong hands-on technical engineering skills to design and automate the implementation of server infrastructure based on business requirements. This role will interact with technology domain experts to maintain high security and availability in complex operational environments, thereby driving business efficiency and security.

Essential Functions:

  • GPU as a Service and High-Performance Compute Platform Support: Expertise in deploying, managing, and optimizing GPU as a Service (GaaS) and high-performance compute platforms to support advanced computational workloads.
  • Extensive Datacenter Experience: Proficient in managing complex, geographically distributed IT infrastructures to ensure high availability and performance.
  • Advanced Technical Knowledge: Profound understanding of high-performance, highly available, and secure computing systems utilizing x86 technologies and protocols (NVMe, GPU, PCIe).
  • Enterprise Server and Component Expertise: In-depth knowledge of server components (storage/network controllers, HBA, SSDs) and their functionalities, essential for maintaining high-performance compute environments.
  • Processor and GPU Systems Proficiency: Strong grasp of Intel/AMD architectures, GPU systems, memory hierarchy, and hardware-level security to enhance system performance and reliability.
  • Out-of-Band, UEFI, and BIOS Expertise: Comprehensive understanding of out-of-band management, UEFI, BIOS settings, and their impact on system performance and security in high-performance computing environments.
  • Hardware Lifecycle Management: Experienced in hardware lifecycle management, including firmware and OS driver certifications, to ensure the longevity and reliability of compute resources.
  • Infrastructure Management and Automation: Proficient in installing, configuring, supporting, and maintaining compute infrastructure management tools, with skills in Ansible for automation to streamline deployment and operational tasks.
  • Performance Benchmarking and Tech Evaluation: Capable of running performance benchmarks and evaluating new technologies for various platforms (Linux, Windows, containerized, and virtualized) to ensure optimal performance.
  • Scripting Proficiency: Advanced skills in scripting languages such as PowerShell and Python to automate and optimize infrastructure tasks.
  • Team and Independent Work: Highly motivated, excellent team player, capable of working independently, with strong analytical and troubleshooting abilities to resolve complex issues and mentor junior staff.

This is a hybrid position. Hybrid employees can alternate between remote and in-office work. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Average salary estimate

$135,000 per year (est.)
Min: $120,000
Max: $150,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Lead Systems Engineer, High-Performance Computing, Visa

Are you ready to take your career to the next level? Join our team as a Lead Systems Engineer for High-Performance Computing at our Ashburn location! In this vital role within our IaaS Systems and Storage & Engineering (ISSE) team, you will be at the forefront of designing and automating server infrastructure tailored to business needs. You'll work closely with technology experts to ensure our operations are not only efficient but also highly secure. Imagine using your in-depth knowledge of GPU as a Service and advanced computing platforms to tackle complex workloads, while managing geographically distributed IT infrastructures with high availability. Here, you'll blend your extensive datacenter experience with your expertise in Intel and AMD architectures, keeping our systems performing at their best. You will have the chance to enhance your skills in hardware lifecycle management, automation through tools like Ansible, and advanced scripting in PowerShell and Python. We foster a collaborative environment, where you can work both independently and as part of a strong team. This hybrid position allows you to enjoy a flexible work-life balance while being part of a community dedicated to technological advancement. If you have a passion for high-performance computing and a desire to lead projects that make a real difference in the industry, we’d love to hear from you!

Frequently Asked Questions (FAQs) for Lead Systems Engineer, High-Performance Computing Role at Visa
What responsibilities does a Lead Systems Engineer at the High-Performance Computing team have?

As a Lead Systems Engineer in High-Performance Computing, you will oversee the design and automation of server infrastructure. Your main responsibilities will include deploying and optimizing GPU as a Service, managing complex IT infrastructures, and ensuring high security and availability. You’ll also be involved in hardware lifecycle management and performance benchmarking, all while collaborating with a skilled team.

What qualifications are needed to be a successful Lead Systems Engineer in Ashburn?

To excel as a Lead Systems Engineer at our Ashburn location, candidates should possess advanced technical knowledge in high-performance computing systems, particularly with x86 technologies. A strong grasp of GPU systems, datacenter management experience, and proficiency in automation tools like Ansible are essential. Candidates should also have scripting skills in PowerShell and Python, alongside a passion for optimizing server infrastructure.

How does the Lead Systems Engineer role contribute to the company's mission at ISSE?

The Lead Systems Engineer plays a crucial role in fulfilling the ISSE mission by designing high-quality server infrastructures that drive efficiency and security. Your work ensures that our technology environments run smoothly, which directly impacts business success. By automating processes and maintaining robust systems, you help propel the company towards its overarching goal of technological advancement.

What kind of work environment can a Lead Systems Engineer expect at this company?

The work environment for a Lead Systems Engineer is dynamic and collaborative. You will have the benefit of a hybrid role, which means you'll have the flexibility to work remotely and in-office. The company culture emphasizes teamwork and knowledge sharing, providing an ideal atmosphere for professional growth and innovation in high-performance computing.

What skills are essential for a Lead Systems Engineer focusing on high-performance computing?

Essential skills for a Lead Systems Engineer in high-performance computing include a deep understanding of GPU as a Service, expertise in server components, and the ability to manage complex IT infrastructures. Proficiency in performance testing, automation with tools like Ansible, and scripting in languages such as PowerShell and Python is also vital. Additionally, a problem-solving mindset and strong analytical abilities are crucial for success in this role.

Common Interview Questions for Lead Systems Engineer, High-Performance Computing
Can you describe your experience with GPU as a Service in high-performance computing environments?

In answering this question, you should highlight specific projects or experiences where you successfully implemented or managed GPU as a Service. Discuss the performance challenges you faced, how you optimized workloads, and any tools or frameworks you used.

How do you ensure high availability in geographically distributed IT infrastructures?

Demonstrate your understanding of best practices for managing distributed systems, including redundancy, failover strategies, and monitoring. Mention any specific technologies or methodologies you've used to maintain high uptime and performance.

What measures do you take to secure high-performance computing environments?

Talk about specific security protocols or practices you follow, such as the use of firewalls, encryption, and access controls. You might also discuss your experience with out-of-band management and securing BIOS settings to enhance overall system security.

How do you approach performance benchmarking and evaluation of new technologies?

Highlight the steps you take to conduct performance benchmarks, including selecting appropriate metrics, tools used, and your process for evaluating new technologies. Share any insights you've gained from previous evaluations and how they impacted decision-making.

Can you provide an example of a complex issue you resolved in a high-performance environment?

When answering, give a concise overview of the problem, then describe how you identified the root cause and the steps you took to resolve it. Highlight how you collaborated with team members and what you learned from the experience.

What experience do you have with hardware lifecycle management?

Discuss your experience managing hardware through its lifecycle, covering aspects such as procurement, maintenance, firmware updates, and eventual decommissioning. Mention any specific projects or technologies related to hardware lifecycle management that you have dealt with.

Describe a project where you automated infrastructure tasks. What tools did you use?

Focus on a specific example of automation you've implemented. Include the tools (such as Ansible) you used, the tasks you automated, and the improvements seen in operational efficiency or error reduction as a result of your work.

How do you stay current with advancements in high-performance computing technology?

Talk about the resources you use to keep up-to-date with the industry, such as following key publications, attending conferences, or participating in online forums. Mention how you apply this knowledge practically in your projects.

What should you consider when designing server infrastructure based on business requirements?

Explain the importance of understanding business goals, expected workloads, scalability needs, and budget constraints. Discuss how you gather requirements from stakeholders and translate them into technical specifications.

How do you handle collaboration with domain experts in technical environments?

Emphasize the importance of clear communication and teamwork when collaborating with domain experts. Share any specific strategies you use to facilitate discussions and ensure that everyone’s input is valued in the decision-making process.

Visa Inc. operates as a payments technology company worldwide. The company facilitates commerce through the transfer of value and information among consumers, merchants, financial institutions, businesses, strategic partners, and government entiti...

Employment type: Full-time, hybrid
Date posted: April 3, 2025
