Site Reliability Engineer

Graylog: Empowering Threat Detection, Investigation, & Response Solutions with Cutting-Edge Technology

 

Graylog specializes in delivering top-notch Threat Detection, Investigation, & Response (TDIR) solutions, backed by our latest addition, the Graylog API security platform. As a renowned centralized log management (CLM) and Security Information and Event Management (SIEM) provider, we offer unparalleled, fast, and efficient log analysis capabilities in critical areas such as security, compliance, operations, and DevOps.

 

Our enterprise solution enables organizations globally to capture, store, and analyze terabytes of machine data in near-real time while our open-source product has been deployed in more than 50,000 installations worldwide, empowering individuals and small teams to perform basic log consolidation, analysis, and search functions at no cost.

 

We're a remote-friendly company with locations in Hamburg, Munich, London, Boulder, and headquarters in Houston, TX. If you live near an office and want to be part of it, great. Nearish to an office and want the ability to hot desk? No problem. And if you're not near an office and wish to work remotely, all good!

 

Recent achievements for Graylog include a place in the 2021 Deloitte Technology Fast 500™ and two of the most prestigious cybersecurity awards, in SIEM and DevSecOps, from Cyber Defence Magazine at RSA in 2023. In 2024 we took home gold, becoming the Globee Winner for Security Information & Event Management and the Globee Winner for Threat Hunting, Detection, Intelligence, and Response.


Graylog has recently been named a “Leader” and “Fast Mover” in GigaOM’s 2024 Radar Report for SIEM.  


Who we’re looking for:

 

We’re currently recruiting for a Site Reliability Engineer to join our multinational cloud services team.

 

As a Site Reliability Engineer here at Graylog, you will provide architectural guidance and technical solutions for adapting our product to a cloud offering with 24x7 support, with a focus on delivering a product that is highly available, resilient, secure, scalable, and cost-efficient, and that consistently delivers valuable product outcomes to consumers.

 

Our Site Reliability Engineers work with state-of-the-art technologies as we ensure you have the right tools to make a significant impact in managing our systems and to drive their continuous improvement while shaping the future of our cloud strategy.

 

We believe that the best ideas can come from anywhere, and we value your input and initiative. Here, you will not just be a guardian of our infrastructure; you’ll be an innovator, a problem-solver, and a leader.

 

This role is a full-time permanent position based in North America and will report to our Engineering Manager, Site Reliability.


Additional responsibilities will include, but are not limited to:
  • Cloud Infrastructure Management: Writing pull requests (PRs) to make changes that improve and optimize our AWS+Terraform+Kubernetes setup, centring around ensuring its high availability, scalability, and resilience.
  • Security & Compliance: Implementing security measures, auditing the cloud environment, and ensuring adherence to compliance standards.
  • Tool Development: Expanding our internal tool base, focusing on Infrastructure as Code and configuration management improvements.
  • Issue Resolution: Collaborating with teams to identify and resolve infrastructure-related issues swiftly, minimizing any impact on product performance.
  • Cloud Strategy Advocacy: Championing cloud strategies that align with and advance our business objectives, especially during pitch cycles and other planning meetings.
  • Knowledge Sharing: Connecting with Cloud Engineers, Site Reliability Engineers, and application engineers, documenting key decisions where possible and making sure critical knowledge isn't siloed in a single spot in the organization.


What you can expect your first 12 months to look like:
  • Infrastructure Knowledge: Within six months, acquire an expert understanding of, and submit an approved peer-reviewed pull request (APRPR) for, each of the following technologies: Terraform, Flux, Kustomize, and Argo.
  • Stability Improvements: In the first six to nine months, deliver a proof of concept (POC) for a technology improvement centred around improving or maintaining uptime, reducing the reliance on single points of failure, or reducing the Time to Recovery after an incident.
  • Signal and Metrics Improvement: Within six months, contribute to at least one cycle of signal and metrics improvement and show that the overall number of alerts decreased in the following cycle and/or that a requested metric or set of metrics has been made available for use.
  • Security and Compliance: In the first 12 months, contribute to at least one of the following: AWS Product and Architecture Review, SOC 2 compliance review, Disaster Recovery (DR) plan review and drill, Security Penetration Test (Pen Test) review and remediation.


A little bit about you:
  • Cloud Infrastructure Management: Proficiency in managing cloud infrastructures, especially AWS, along with associated tools like Terraform and Kubernetes, ensuring high availability, scalability, and resilience.
  • Experience with Infrastructure as Code (IaC): Hands-on experience with IaC tools and techniques, including configuration management and cloud provisioning.
  • Software Development: Basic programming skills in at least one language, such as Python, for tool development and automation tasks.
  • Security Best Practices: Knowledge of security protocols and compliance requirements specific to cloud environments, with experience in implementing security measures.
  • Troubleshooting & Issue Resolution: Experience in diagnosing and resolving infrastructure-related issues, working closely with development and support teams.
  • Monitoring and Metrics: Familiarity with cloud monitoring tools and performance metrics to continuously evaluate and improve the infrastructure.
  • CI/CD Practices: Understanding of continuous integration and continuous deployment practices for efficient and reliable product releases.
  • Documentation & Communication: Ability to document technical processes clearly and effectively communicate architectural decisions and changes to various stakeholders.


Just some of the reasons to join Graylog:
  • Management team with deep programming, technical, and product experience.
  • Opportunity to work with a globally distributed and diverse team.
  • Grow and develop professionally and personally in a fast-growing environment.
  • Choice of the latest equipment to help you succeed.
  • Monthly allowance to support your commute costs and the outfitting of your work-from-home environment.


Here at Graylog, you'll find a diverse group of experienced professionals who love to have fun while meeting the needs of our customers with the best solution and customer service available.


Our values:


Openness: As a global company, we encourage our people to bring their backgrounds, ideas, and perspectives to our collective work. We lead with integrity and are committed to doing what is best for the Graylog community.


Collaboration: Through mutual respect, trust, and candid communication across all teams, we deliver the best ideas and results.


Useful Innovation: We take calculated risks to find new ways to innovate. By continuously improving ourselves, processes, and technologies, we deliver the best solution for our customers.


Ownership: As owners, we take the initiative to solve internal and external problems while supporting peer success and holding ourselves accountable for delivering the best work. We do this from a place of high trust.


Do the Right Thing! Comfort and safety come from knowing that everyone will do the right thing, even when nobody's looking.


For further information, please submit an application and a member of the Graylog People Team will be in touch.


Average salary estimate

$100,000 / year (est.)
min: $80,000
max: $120,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Site Reliability Engineer, Graylog, Inc

At Graylog, we're on a mission to provide top-tier Threat Detection, Investigation, and Response solutions, and we're looking for an innovative Site Reliability Engineer to join our fantastic cloud services team! This full-time position invites you to dive into architectural guidance while ensuring our products are scalable, secure, and resilient. Your role will be pivotal in managing our AWS, Terraform, and Kubernetes setups, enhancing our infrastructure's performance, security, and compliance. Whether you’re implementing security protocols or collaborating with cross-functional teams to swiftly resolve issues, your input will have a tangible impact. Enjoy the freedom of working remotely or hot-desking at any of our offices in North America, as we embrace a flexible work culture. With several accolades under our belt, including recognition in Deloitte’s Technology Fast 500™, we pride ourselves on fostering a collaborative environment that celebrates your ideas and initiatives. You'll have the tools and support to not just be an overseer of our infrastructure but an active player in shaping our cloud strategy and ensuring an outstanding experience for our users. Here at Graylog, your creativity and problem-solving skills will be celebrated, and your journey with us will be about making meaningful contributions while enjoying a friendly and empowering team atmosphere. Come see what it's like to be part of an award-winning company that values integrity and innovation!

Frequently Asked Questions (FAQs) for Site Reliability Engineer Role at Graylog, Inc
What are the main responsibilities of a Site Reliability Engineer at Graylog?

As a Site Reliability Engineer at Graylog, you'll be responsible for managing and optimizing our cloud infrastructure, especially focusing on AWS, Terraform, and Kubernetes. Your role includes implementing security measures, enhancing system reliability, collaborating with various teams for swift issue resolution, and advocating for cloud strategies that align with our business goals. You'll also have the opportunity to expand our internal tools and contribute to compliance reviews and security audits.

What qualifications are required for the Site Reliability Engineer position at Graylog?

To be considered for the Site Reliability Engineer role at Graylog, you should have proficiency in managing cloud infrastructures, hands-on experience with Infrastructure as Code (IaC), and basic programming skills in at least one language, such as Python. A solid understanding of security best practices and experience with tools like Terraform and Kubernetes are essential, alongside strong troubleshooting and documentation skills.

What tools and technologies should a Site Reliability Engineer at Graylog be familiar with?

A Site Reliability Engineer at Graylog should be well-versed in cloud infrastructure management tools, particularly AWS, Terraform, and Kubernetes. Familiarity with monitoring and performance metrics, CI/CD practices, and cloud security protocols will also play a crucial role in ensuring the integrity and performance of our systems.

What does the career development path look like for a Site Reliability Engineer at Graylog?

At Graylog, the career development path for a Site Reliability Engineer is collaborative and growth-focused. You will have the chance to work alongside experienced professionals, receive mentorship, and contribute to impactful projects. As you gain more expertise in technologies such as Terraform and Kubernetes, opportunities for advancement into leadership roles and specialized positions will be available.

How does Graylog ensure a supportive and collaborative work environment for Site Reliability Engineers?

Graylog fosters a supportive and collaborative work environment for Site Reliability Engineers through open communication, trust, and mutual respect. We encourage team members to share their backgrounds and ideas openly, and we emphasize the importance of innovation, ownership, and ethical practices, ensuring everyone feels empowered to contribute to the company's success.

Common Interview Questions for Site Reliability Engineer
Can you explain your experience with cloud infrastructure management?

In responding to this question, provide specific examples of your work with cloud infrastructure, highlighting your familiarity with platforms like AWS. Discuss your experience with tools like Terraform and Kubernetes, and mention any successful projects where you managed or optimized cloud resources.

How do you approach security compliance in cloud environments?

When answering this question, mention the key security protocols you have implemented in past roles. Describe your understanding of compliance standards such as SOC 2 and the steps you take, like conducting regular audits and vulnerability assessments, to ensure compliance is maintained.

What strategies do you use for issue resolution in cloud systems?

In your response, discuss your approach to troubleshooting, including methods you've used to identify and resolve outages or performance issues swiftly. Highlight your collaboration with development teams and how you leverage monitoring tools to diagnose problems.

Describe your experience with Infrastructure as Code (IaC).

For this question, focus on your hands-on experience with IaC tools like Terraform. Share examples of how you've used IaC to automate infrastructure provisioning and configuration management, emphasizing any specific projects and their outcomes.

Can you provide an example of a successful technology improvement project you have led?

When addressing this question, share a specific instance where you led a project that improved system uptime, reduced single points of failure, or streamlined recovery times. Discuss the methodologies you used and the positive outcomes for the organization.

What metrics do you monitor to ensure the health of cloud systems?

In your answer, list specific metrics that you consider critical for monitoring cloud systems, such as uptime, latency, and error rates. Explain how you collect and interpret this data to drive continuous improvement.

How do you manage communication between cross-functional teams?

Discuss your strategies for ensuring clear and effective communication with cross-functional teams, including regular stand-ups, documentation practices, and collaborative tools. Convey how this communication helps streamline processes and enhances teamwork.

What are your views on continuous integration and deployment?

In this answer, express the importance of CI/CD in modern development. Highlight any experience you have implementing CI/CD pipelines, focusing on the benefits such as faster release cycles and increased quality of deployments.

How do you stay updated with the latest cloud technologies and trends?

When discussing this topic, mention resources you utilize, such as online courses, webinars, tech blogs, or industry conferences. Emphasize your proactive approach to continuous learning and professional development.

Why do you want to work at Graylog as a Site Reliability Engineer?

In your response, share personal motivations for wanting to join Graylog. Highlight your alignment with the company’s values such as collaboration, innovation, and ownership, and discuss how you can contribute positively to their mission and future growth.


Graylog is a leading centralized log management solution built to open standards for capturing, storing, and enabling real-time analysis of terabytes of machine data. Graylog delivers a better user experience by making analysis ridiculously fast a...

Employment type: Full-time, remote
Date posted: December 17, 2024