Job details

Site Reliability Engineer (SRE)

About Air Apps

At Air Apps, we believe in thinking bigger—and moving faster. We’re a family-founded company on a mission to create the world’s first AI-powered Personal & Entrepreneurial Resource Planner (PRP), and we need your passion and ambition to help us change how people plan, work, and live. Born in Lisbon, Portugal, in 2018—and now with offices in both Lisbon and San Francisco—we’ve remained self-funded while reaching over 100 million downloads worldwide.

Our long-term focus drives us to challenge the status quo every day, pushing the boundaries of AI-driven solutions that truly make a difference. Here, you’ll be a creative force, shaping products that empower people across the globe.

Join us on this journey to redefine resource management—and change lives along the way.

The Role

As a Site Reliability Engineer (SRE) at Air Apps, you will be responsible for ensuring the reliability, availability, and scalability of our systems. You will work at the intersection of software development and operations, implementing automation, monitoring, and performance optimization strategies to minimize downtime and improve system resilience.

Responsibilities

  • Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.

  • Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).

  • Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.

  • Optimize system performance, scalability, and incident response workflows to improve uptime.

  • Work closely with development and DevOps teams to improve system design for reliability.

  • Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.

  • Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.

  • Improve CI/CD pipelines to enhance deployment speed while maintaining stability.

  • Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).

  • Participate in on-call rotations to quickly address system failures and minimize downtime.

Requirements

  • Roughly four or more years of experience in Site Reliability Engineering (SRE), DevOps, or System Engineering.

  • Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.

  • Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).

  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.

  • Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).

  • Strong Linux system administration and networking fundamentals.

  • Experience with incident management, debugging, and root cause analysis.

  • Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.

  • Knowledge of load balancing, failover strategies, and distributed systems.

  • Understanding of security best practices, access control, and compliance requirements.

  • Strong communication skills and the ability to collaborate with cross-functional teams.

What benefits are we offering?

  • Remote-first approach with flexible working hours.

  • Apple hardware ecosystem for work.

  • Flexible Paid Time Off (PTO) to support work-life balance.

  • Annual Bonus.

  • Top-tier Health Insurance for peace of mind.

  • Public Transportation Pass to support your commute needs.

  • Coverflex benefits package for meal allowances, well-being, and more.

  • Air Conference 2025 in Las Vegas - an opportunity to meet the team, collaborate, and grow together!

Diversity & Inclusion

At Air Apps, we are committed to fostering a diverse, inclusive, and equitable workplace. We enthusiastically welcome applicants from all backgrounds, experiences, and perspectives. We celebrate diversity in all its forms and believe that varied voices and experiences make us stronger.

Application Disclaimer

At Air Apps, we value transparency and integrity in our hiring process. Applicants must submit their own work without any AI-generated assistance. Any use of AI in application materials, assessments, or interviews will result in disqualification.

Average salary estimate

$75,000 / year (est.)
Min: $60,000
Max: $90,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer (SRE), Air Apps

At Air Apps, we’re looking for a talented Site Reliability Engineer (SRE) to join our team in Lisbon. We’re not just chasing numbers; we’re on a mission to revolutionize how people plan, work, and live with our AI-powered Personal & Entrepreneurial Resource Planner. As part of our family-founded company, you’ll play a crucial role in ensuring that our systems are reliable, available, and scalable, working at the intersection of software development and operations. You’ll design fault-tolerant systems across cloud environments, implement automation strategies, and use observability tools like Prometheus and Grafana. Your expertise in Infrastructure as Code (like Terraform) will be instrumental as you optimize our CI/CD pipelines and help improve system resilience while minimizing downtime. You’ll collaborate closely with our development and DevOps teams, conduct root cause analyses, and implement preventive measures to keep our systems running smoothly. At Air Apps, we value creativity and innovation, and your contributions will directly impact how millions of users worldwide interact with our products. With offices in Lisbon and San Francisco and over 100 million downloads to our name, this is an opportunity to take your SRE career to the next level while enjoying benefits like flexible working hours, top-tier health insurance, and the chance to attend Air Conference 2025 in Las Vegas. Join us and be part of making a difference in people’s lives, one system at a time!

Frequently Asked Questions (FAQs) for Site Reliability Engineer (SRE) Role at Air Apps
What are the key responsibilities of a Site Reliability Engineer at Air Apps?

As a Site Reliability Engineer (SRE) at Air Apps, your primary responsibilities include designing scalable and fault-tolerant systems across cloud environments, developing and maintaining observability tools for monitoring and logging, automating infrastructure provisioning, and optimizing system performance. You'll also collaborate with development teams to enhance system reliability, conduct root cause analyses for incidents, and manage cloud resource utilization effectively.

What qualifications are required for the Site Reliability Engineer position at Air Apps?

To be a successful Site Reliability Engineer at Air Apps, candidates should have roughly four or more years of relevant experience in SRE, DevOps, or System Engineering. Strong knowledge of cloud platforms such as AWS, Azure, or GCP is essential, along with proficiency in monitoring tools like Prometheus and Grafana. Candidates should also have experience with Infrastructure as Code tools like Terraform and a solid understanding of containerization and networking.

What technologies do Site Reliability Engineers at Air Apps work with?

Site Reliability Engineers at Air Apps work with a range of advanced technologies. They utilize cloud platforms like AWS, Azure, or GCP, along with observability tools such as Datadog and ELK. Expertise in Infrastructure as Code tools like Terraform and containerization technologies like Docker and Kubernetes is also crucial, as these tools help automate and manage the complex infrastructure supporting our AI-driven products.

How does Air Apps support the professional growth of its Site Reliability Engineers?

Air Apps fosters an environment that supports continuous learning and professional growth for Site Reliability Engineers. Employees are encouraged to participate in workshops, training, and conferences, like the upcoming Air Conference 2025 in Las Vegas, to network and collaborate with industry experts. The company also provides access to resources that help enhance technical skills, particularly in cutting-edge technologies relevant to system reliability and cloud optimization.

What is the work culture like for Site Reliability Engineers at Air Apps?

The work culture at Air Apps is vibrant and inclusive, emphasizing collaboration around shared goals. Site Reliability Engineers are empowered to contribute their ideas and solutions, fostering creativity in problem-solving. With a remote-first approach and flexible working hours, team members balance their professional and personal lives effectively, all while being part of a mission-driven company that values innovation and diversity.

Common Interview Questions for Site Reliability Engineer (SRE)
Can you describe your experience with cloud platforms in relation to the Site Reliability Engineer role?

When answering this question, focus on specific cloud platforms you've worked with, like AWS or Azure. Detail your experiences in deploying applications or managing cloud environments, and highlight projects where you optimized performance or costs, demonstrating your clear understanding of cloud-native architectures.

What monitoring tools have you used, and how do they help in maintaining system reliability?

Discuss specific monitoring tools you've used, such as Prometheus or Grafana. Explain how these tools provide real-time visibility into system performance and reliability, and share examples of how you've implemented monitoring solutions to preemptively address issues before they affect users.

How do you automate infrastructure provisioning and management?

Elaborate on your experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation. Provide examples of how you've automated processes, facilitated faster deployments, and ensured consistency across environments, emphasizing the benefits of automation in enhancing system reliability.

Describe a challenging incident you managed. How did you resolve it?

For this question, narrate a specific incident and the steps you took to troubleshoot and resolve it. Highlight your problem-solving skills, your approach to root cause analysis, and the measures you implemented to prevent similar incidents in the future, showcasing your ability to handle pressure.

What strategies do you employ for system performance optimization?

Discuss various optimization strategies you've implemented, such as load balancing, caching mechanisms, or fine-tuning database queries. Back up your strategies with concrete examples of improved system performance or reduced cost after implementing these optimizations.

How do you collaborate with development teams to improve reliability?

Emphasize the importance of communication and teamwork. Share examples of cross-functional projects where you've successfully collaborated with development teams, focusing on how collective input led to more reliable system designs and effective incident response strategies.

What is your approach to incident management?

Outline your structured approach to incident management, including preparation, detection, response, and postmortem analysis. Discuss the tools you’ve utilized and emphasize the importance of continuous improvement based on learnings from each incident.

Have you implemented any disaster recovery strategies?

Share any disaster recovery plans you’ve designed or executed. Discuss specific strategies like failover mechanisms or backups, and the effectiveness of these strategies in minimizing downtime during incidents.

How do you stay updated with industry trends and technologies related to SRE?

Explain your methods for staying current, such as following relevant blogs, attending webinars, or participating in online forums and communities. Highlight any certifications or educational pursuits that showcase your dedication to continuous learning and growth in the SRE space.

What tools do you use for CI/CD, and how do they support system reliability?

Describe your experience with CI/CD tools like Jenkins or GitLab CI. Discuss how CI/CD practices support faster deployments while maintaining system reliability, and provide examples of how you’ve improved CI/CD pipelines in previous roles.


Air Apps is a leading mobile development company creating essential apps for your daily tasks.

Employment type: Full-time, remote
Date posted: March 28, 2025
