Job details

Senior Staff Site Reliability Engineer

United States


Be Part of Building the Future


Dremio is The Easy and Open Data Lakehouse, providing self-service analytics with data warehouse functionality and data lake flexibility across all of your data. Dremio increases agility with a revolutionary data-as-code approach that adopts Git concepts to enable data experimentation, version control, and governance. In addition, Dremio breaks down data silos by simplifying ingestion into the lakehouse, and also allowing queries directly on databases and data warehouses. All of this is available through a fully managed service that not only eliminates the need to maintain infrastructure and software, but also automatically optimizes the data in the lakehouse to maximize performance for every workload.


Founded in 2015, Dremio is headquartered in Santa Clara, CA. Investors include Cisco Investments, Insight Partners, Lightspeed Venture Partners, Norwest Venture Partners, Redpoint Ventures, and Sapphire Ventures. For more information, visit www.dremio.com. Connect with Dremio on GitHub, LinkedIn, Twitter, and Facebook.


If you, like us, say “bring it on” to exciting challenges that really do change the world, we have endless opportunities where you can make your mark.


About the role


Dremio’s SREs ensure that internal and externally visible services have reliability and uptime appropriate to users' needs, along with a fast rate of improvement. You will be joining a small team of experienced SREs helping to deliver a world-class experience to Dremio Cloud customers. Our systems, like many, are joint-cognitive, made up of both people and software: complex and therefore intrinsically hazardous. We understand and expect that catastrophe is always just around the corner.


What you’ll be doing


  • Drive continuous improvements to our usage of Kubernetes, our Operators, and the GitOps deployment paradigm.
  • Extend our networking, service mesh and Kubernetes systems to support connectivity between GCP, AWS and Azure.
  • Collaborate with Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning, production readiness and service reviews.
  • Help define and instrument Service Level indicators and objectives (SLIs/SLOs) with service owners in the Engineering teams. Develop SLO-based on-call strategies for service owners and their teams.
  • Collaborate within our virtual Observability team: develop and improve observability (tracing, events, metrics, profiling, logging and exceptions) of the Dremio Cloud product.
  • Debug and optimize code written by others and automate routine tasks. You recognize complexity and are familiar with multiple techniques to manage it, but also recognize the folly in complete rewrites.
  • Evangelize and advocate for resilience engineering and reliability practices across our organization.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Join an on-call rotation for systems and services that the SRE team owns.
  • Practice sustainable incident response and post-incident investigation analysis. Use techniques developed in and around the Learning from Incidents community.
  • Drive the cultural, technical, and process changes to move towards a true continuous delivery model within the company. 


What we’re looking for


  • 10+ years of relevant experience in the following areas: SRE, DevOps, Distributed Systems, Cloud Operations, Software Engineering.
  • Expertise in Kubernetes, Istio, Terraform, ArgoCD/Flux.
  • Expertise with software defined networking infrastructure: dedicated and partner interconnects, VPNs, BGP.
  • Excellent command of cloud services on GCP/AWS/Azure, CI/CD pipelines.
  • Moderate to advanced experience in Python/Go, and at least reading knowledge of Java.
  • Interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • Have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Great ability to debug and optimize code and automate routine tasks.
  • Solid background in software development and architecting resilient and reliable applications.


Bonus points if you have


  • Hands-on experience with large-scale production Kubernetes clusters (>=1000 nodes).
  • Developed SLIs/SLOs for production systems.
  • Hands-on experience using Honeycomb for OpenTelemetry trace analysis.
  • Engagement with the Learning from Incidents community. Familiarity with seminal work of Allspaw, Woods, Cook et al.


What we offer


  • Medical, dental and vision insurance 
  • 401(k) Plan
  • Short term / long term disability and life insurance
  • Pre-IPO stock options
  • Flexible PTO
  • 16 hours of volunteer time off
  • 12 company paid holidays, including Juneteenth
  • Remote work options
  • Monthly “Get Stuff Done” (GSD) Days
  • Paid parental leave
  • Employee Assistance Program (EAP)
  • Quarterly swag surprise


**Certain benefits are only available to full-time Dremio employees and may not be the same across all locations.

#LI-JW1 #LI-Remote

The base salary range for this position is $166,304 to $225,000 per year. The base salary actually offered to a successful candidate will take into account various relevant and non-discriminatory business factors including, without limitation, the candidate’s geographic location, job-related experience, knowledge, and skills, and education, as well as internal equity considerations. A successful candidate may also be eligible to earn additional compensation including commissions and/or bonuses.


What we value 


At Dremio, we hold ourselves to high standards when it comes to People, Thinking, and Action. Our Gnarlies (that's what we call our employees) communicate with clarity, drive accountability, and are respectful towards each other. We confront brutal facts and focus on results while operating with a sense of urgency and building a "flywheel". People who like to jump in and drive momentum will thrive in our #GnarlyLife.


Dremio is an equal opportunity employer supporting workforce diversity. We do not discriminate on the basis of race, religion, color, national origin, gender identity, sexual orientation, age, marital status, protected veteran status, disability status, or any other unlawful factor.


Dremio is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request accommodation due to a disability, please inform your recruiter.


Dremio has policies in place to protect the personal information that employees and applicants disclose to us. Please click here to review the privacy notice. 

DATE POSTED
July 12, 2023
