Site Reliability Engineer

Company Description

Alter Solutions Portugal is an IT consultancy and promoter of digital transformation. It is part of the Alter Solutions Group, founded in Paris in 2006.

In 2022, Alter Solutions joined the act digital group, forming a global community of technology talent with a presence in thirteen countries: Germany, Belgium, Brazil, Canada, United States of America, Mexico, Morocco, Spain, France, Luxembourg, Poland, Portugal and Serbia. In 2023, we were also certified as a Great Place to Work©.

In Portugal, we partner with over 120 clients and have a team of over 500 people working on projects for industries as diverse as banking, insurance, transportation, aviation, energy, and telecom.

As the headquarters of the Nearshore IT center, Alter Solutions Portugal has a dedicated team of around 30 specialized professionals, integrated into projects with several internationally renowned clients.

Job Description

We are looking for a Site Reliability Engineer responsible for improving high availability and resilience, improving load management with L4 and L7 load balancers, building a dynamic and scalable infrastructure to accommodate high-volume business transactions, and setting up a monitoring system that logs performance and capacity levels to ensure high availability of applications with minimal downtime.

Main Responsibilities:

  • Design, develop, and implement systems software and scripts that improve the stability, scalability, availability, and latency of the Risk system applications.
  • Solve problems occurring in our highly available production systems, and build solutions and automation using a combination of scripting and tooling to prevent them from happening again.
  • Define and drive adoption of a best-in-class monitoring framework to accomplish end-to-end flow monitoring and effective alerting.
  • Monitor system performance and capacity levels to ensure high availability of applications with minimal downtime.
  • Build and run capacity tests to manage the growth of systems.
  • Investigate service disruptions and other service issues to identify their causes.
  • Perform regular audits of servers to check for signs of degradation or malfunction, including infrastructure hygiene and end-of-life checks.
  • Conduct post-mortem examinations of failed systems to identify and address root causes.
  • Be accountable for the maintenance and improvement of IT continuity strategies.
  • Advocate release engineering best practices such as zero-downtime deployments, canary releases, and incremental rollouts.
  • Work with Development, DevOps, and IT operations teams throughout the Software Development Life Cycle to ensure sustainable software releases.

Qualifications

  • 4-6 years of experience in an IT Operations, DevOps, Application Support, or SRE team.
  • Proven foundation in Linux administration and troubleshooting.
  • Solid knowledge of APM tools such as Dynatrace or AppDynamics.
  • Good understanding of log aggregators such as Splunk or ELK.
  • Solid work experience with load balancers (L4 & L7), preferably Apache HTTP Server (httpd).
  • Good understanding of TCP/IP and HTTP protocols, networking, DNS, firewalls, and F5 load balancing.
  • Experience with Apache Tomcat servers and JVM performance troubleshooting.
  • Knowledge of Jenkins, Ansible, Docker, Kubernetes, and Terraform.
  • Knowledge of OpenStack, networking, security, or storage is desirable.
  • Solid experience in at least one scripting language, with Python preferred.
  • Experience building, operating, and maintaining scalable distributed systems, and with operations automation.

Soft skills:

  • English (Fluent) – Mandatory

Additional Information

Hybrid working model in Lisbon.

Alter Solutions Glassdoor Company Review: 3.7 out of 5 stars
Alter Solutions DE&I Review: 3.7 out of 5 stars
CEO of Alter Solutions: Louis Vachette
What You Should Know About Site Reliability Engineer, Alter Solutions

Join our team at Alter Solutions Portugal as a Site Reliability Engineer! Based in Lisbon, this role is all about ensuring high availability and resilience within our systems. Alter Solutions has been a key player in IT consultancy since 2006, and after joining the act digital group, we expanded our expertise across multiple countries.

As a Site Reliability Engineer, you'll take charge of improving our infrastructure to handle high-volume business transactions, implementing load management strategies, and setting up robust monitoring systems to track performance and capacity levels. Collaborating across teams, you will design and develop systems software to improve stability, scalability, and latency in our Risk system applications. Your expertise will be pivotal in solving issues and creating automation that prevents them from recurring. You will conduct audits and post-mortem examinations to keep our systems running smoothly, and you'll also champion release engineering best practices. We value a proactive approach to infrastructure hygiene and continuity. If you're passionate about building scalable distributed systems and working in a highly collaborative environment, we would love to hear from you!

Frequently Asked Questions (FAQs) for Site Reliability Engineer Role at Alter Solutions
What does a Site Reliability Engineer do at Alter Solutions Portugal?

A Site Reliability Engineer at Alter Solutions Portugal focuses on enhancing the stability, availability, and performance of our systems. This includes designing automation solutions, implementing monitoring systems, ensuring minimal downtime, and collaborating on software development throughout the lifecycle.

What qualifications are necessary for a Site Reliability Engineer position at Alter Solutions?

Candidates should have 4-6 years of experience in IT Operations, DevOps, or support roles, and possess solid Linux administration skills. Familiarity with APM tools like Dynatrace and experience with load balancers and scripting languages, preferably Python, are crucial for success in this role.

What tools and technologies does a Site Reliability Engineer use at Alter Solutions Portugal?

At Alter Solutions, Site Reliability Engineers work with a range of tools, including L4 and L7 load balancers, APM tools such as Dynatrace or AppDynamics, log aggregators such as Splunk or ELK, and automation and infrastructure tools such as Jenkins, Ansible, Docker, Kubernetes, and Terraform, to maintain and enhance our infrastructure.

How does Alter Solutions Portugal support professional growth for Site Reliability Engineers?

Alter Solutions fosters a collaborative work environment that encourages continuous learning and development. Our team benefits from mentorship, various training opportunities, and participation in innovative projects that enhance skills applicable to the Site Reliability Engineer role.

What is the work culture like for a Site Reliability Engineer at Alter Solutions Portugal?

The work culture at Alter Solutions Portugal is dynamic and inclusive, promoting teamwork and innovation. We emphasize the importance of a work-life balance and offer a hybrid working model, empowering our Site Reliability Engineers to thrive in collaborative and flexible settings.

Can you explain the importance of monitoring systems for a Site Reliability Engineer at Alter Solutions?

Monitoring systems are vital for a Site Reliability Engineer at Alter Solutions because they ensure high availability and performance of applications. By implementing effective monitoring solutions, our engineers can proactively identify and resolve potential issues, thus maintaining service excellence.

What continuous improvement practices does a Site Reliability Engineer implement at Alter Solutions Portugal?

Continuous improvement is key for Site Reliability Engineers at Alter Solutions. They regularly conduct audits, analyze performance data, and implement infrastructure hygiene practices. Additionally, they advocate for best practices in release engineering to maintain service reliability.

Common Interview Questions for Site Reliability Engineer
Can you describe your experience with monitoring systems in your Site Reliability Engineer role?

When responding, detail the specific monitoring tools you've used, such as Dynatrace or AppDynamics. Provide examples of how you've set up monitoring frameworks to enhance system reliability and how your efforts impacted overall service performance.

What steps would you take to troubleshoot a service disruption?

Discuss a systematic approach to identifying and analyzing the disruption. Emphasize the importance of collecting logs, utilizing monitoring tools, conducting a root cause analysis, and how you would implement solutions to prevent future occurrences.

How would you ensure high availability in a distributed system?

Explain your strategies for ensuring high availability, such as load balancing, redundancy, and proactive monitoring. Discuss any relevant experience with implementing these strategies in past roles to demonstrate your expertise.

What is your approach to performance tuning in applications?

Share your methodology for identifying performance bottlenecks. Discuss tools you've used for performance analysis and how you implemented changes—whether through code optimization, infrastructure enhancements, or load management strategies.

How do you prioritize tasks when managing multiple production issues?

Describe your prioritization strategy, perhaps using techniques like impact assessment and urgency evaluation. Share examples of how you've managed tasks effectively in high-pressure situations to demonstrate your ability to handle multiple issues.

What experience do you have with automation in your previous roles?

Detail your experience in creating automated solutions, mentioning specific tools and scripting languages you've used—such as Python or Ansible. Discuss how automation improved efficiencies in your earlier projects.

Can you explain the concept of a Canary Release and its benefits?

Clarify how a Canary Release allows new features to be tested on a small subset of users before full deployment. Discuss its benefits, including minimizing risk and enabling real-world feedback that can enhance overall product quality.
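
To make the idea concrete in an interview, it can help to sketch how the small subset of users might be chosen. The following is a minimal Python sketch under stated assumptions, not a description of how Alter Solutions performs releases: the function names `in_canary` and `pick_version` are hypothetical, and hashing the user ID into 100 buckets is just one common way to get a stable split.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the canary cohort.

    Hypothetical helper: hashes the user ID into one of 100 buckets so the
    same user always gets the same answer for a given percentage.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return bucket < percent


def pick_version(user_id: str, percent: int) -> str:
    """Route a user to the 'canary' or 'stable' release (illustrative labels)."""
    return "canary" if in_canary(user_id, percent) else "stable"


if __name__ == "__main__":
    # Illustrative rollout: send roughly 5% of made-up user IDs to the canary.
    users = [f"user-{i}" for i in range(1000)]
    canary_users = [u for u in users if pick_version(u, 5) == "canary"]
    print(f"{len(canary_users)} of {len(users)} users routed to the canary")
```

Because the bucket depends only on the user ID, raising the percentage keeps every existing canary user in the cohort and only adds new ones, which is what makes an incremental rollout predictable.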

What methods do you use for conducting post-mortem evaluations?

Outline your process for post-mortem evaluations, focusing on gathering data, identifying root causes, and implementing lessons learned to prevent the recurrence of issues. Share examples to highlight the effectiveness of your approach.

How do you stay up to date in your field?

Mention your commitment to ongoing learning through online courses, certifications, and industry conferences. Share specific resources or communities you engage with to keep your skills fresh and relevant as a Site Reliability Engineer.

What role does collaboration play in your approach to SRE?

Discuss how collaboration is essential in Site Reliability Engineering, as it ensures smooth communication between development and operations teams. Highlight your experience in cross-functional teamwork and the positive impact it had on project outcomes.

Employment type: Full-time, hybrid
Date posted: November 28, 2024