Job details

Staff Software Engineer - Alerting Platform

We are looking for a Staff Engineer to help us scale Datadog's Alerting Platform, which is responsible for the core systems that define and schedule monitors, create alerts, and ensure the accuracy and timeliness of the end-to-end alerting process across the platform.

This is a unique opportunity to contribute to one of the most critical platforms at Datadog. Customers can configure monitors and generate alerts for virtually every product in our unified platform. It’s imperative that we maintain our customers’ trust by delivering these notifications reliably. In practice, this means the alerting platform has to be the most reliable platform. 

As we grow, we have to design systems that can degrade gracefully while still ensuring the best customer experience without breaking. This staff engineer will focus on two critical components: the alerting scheduler, responsible for scheduling the timely evaluation of millions of monitors each minute, and the state processor, which makes the critical decision about when a transition in monitor state has occurred. These distributed systems are tied together, with one (the state processor) consuming the output of the other (the scheduler). The reliability and fault tolerance of these systems together, and across the entire alerting platform, is critical to Datadog's customer trust and business success. Upcoming initiatives to achieve an order of magnitude increase in reliability will require deep changes to these complex systems.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

 

What You’ll Do: 

  • Design and drive high priority, high visibility projects that increase the platform's resilience and scalability across multiple teams 
  • Lead and guide others through architectural decisions for new and existing distributed, high-throughput, real-time systems
  • Identify potential system risks and trends in reliability, and design solutions to address them
  • Provide input on prioritization of engineering-led initiatives in short- and long-term planning and roadmaps
  • Collaborate closely with partner platforms that integrate and depend on the alerting platform to provide critical capabilities to their customers

 

Who You Are: 

  • You have led cross-team initiatives in a platform or infrastructure-focused environment for 2+ years
  • You have led impactful technical initiatives in an environment where performance, reliability, and accuracy are first-order concerns
  • You have a reliability-oriented mindset and care deeply about designing and building resilient architectures
  • You have significant backend programming experience and have architected, built, and operated distributed systems to solve problems at high scale

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.

 

Benefits and Growth: 

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.


About Datadog: 

Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another. Learn more about #DatadogLife on Instagram, LinkedIn, and Datadog Learning Center.


Equal Opportunity at Datadog:

Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Here are our Candidate Legal Notices for your reference.

Your Privacy:

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.

Average salary estimate

$105,000 / year (est.), ranging from $80,000 (min) to $130,000 (max)

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

What You Should Know About Staff Software Engineer - Alerting Platform, Datadog

Are you ready to take on an exhilarating challenge as a Staff Software Engineer at Datadog, where you’ll be instrumental in scaling the Alerting Platform? This is not just any engineering role; it’s a unique opportunity to contribute to one of our most critical systems! You’ll dive into the architecture that defines and schedules monitors, ensuring alerts are accurate and timely for our customers all over the globe. We’re looking for someone who can enhance our ability to deliver trustworthy notifications, because a seamless alerting experience is non-negotiable for us and our users. As the Staff Software Engineer focusing on the alerting scheduler and the state processor, you will shape systems that reliably evaluate millions of monitors each minute while planning for greater resilience and scalability. Here at Datadog, your input will be valued as you lead high-priority projects and guide cross-team technical initiatives. We believe in fostering a collaborative culture, empowering employees to innovate and grow. If you’re passionate about building reliable distributed systems and crave a workplace that embraces flexibility and community, we encourage you to apply and join our vibrant Datadog family. Together, we’ll tackle the challenges of cloud complexity with creativity and passion!

Frequently Asked Questions (FAQs) for Staff Software Engineer - Alerting Platform Role at Datadog
What responsibilities should a Staff Software Engineer at Datadog's Alerting Platform expect?

As a Staff Software Engineer at Datadog's Alerting Platform, you would be responsible for designing and executing high-priority projects aimed at increasing the platform's resilience and scalability. A crucial part of your role will involve leading architectural decisions for both new and existing distributed systems. You will also keep an eye on system reliability, proactively identifying risks and trends while collaborating closely with teams that rely on the alerting platform to better serve their customers.

What qualifications and experience are needed for the Staff Software Engineer position at Datadog?

To qualify for the Staff Software Engineer position at Datadog, candidates are expected to have at least 2 years of experience leading cross-team initiatives in a platform or infrastructure-focused environment. Significant backend programming experience and a solid track record in architecting, building, and operating distributed systems at high scale are essential. A reliability-oriented mindset and a passion for crafting resilient architectures are also crucial for success in this role.

How does Datadog foster professional growth for Staff Software Engineers?

Datadog values continuous learning and professional development. For Staff Software Engineers, opportunities include ongoing product training, access to mentorship programs, and clear career pathing. The culture at Datadog prioritizes collaboration and provides resources through community guilds and inclusion talks, ensuring that every engineer can thrive and grow within the organization.

What is the team culture like for Datadog's Alerting Platform engineers?

The team culture for Datadog's Alerting Platform engineers is vibrant and collaborative. At Datadog, we emphasize creativity and the importance of relationships within our hybrid work environment—encouraging a work-life balance that suits individual preferences. We actively celebrate diversity of thought and professional excellence, creating an inclusive atmosphere where every engineer feels valued and empowered to contribute meaningfully.

What type of projects will a Staff Software Engineer work on at Datadog?

In the position of Staff Software Engineer at Datadog, you will be engaged in high-visibility projects that enhance the platform's reliability and scalability. You can expect to lead initiatives that require you to architect complex distributed systems focusing on real-time performance, reliability, and accuracy. Your projects will push the boundaries of technology to ensure the alerting platform can handle substantial loads efficiently, ultimately ensuring customer trust and satisfaction.

Common Interview Questions for Staff Software Engineer - Alerting Platform
Can you describe your experience leading cross-team initiatives in a platform or infrastructure environment?

When answering this question, focus on a specific project or initiative where you successfully led a team across different departments. Highlight the challenges you faced, how you coordinated between different teams, and the impact this had on the project's success. Use metrics where possible to quantify your achievements and demonstrate your leadership capabilities.

How do you ensure the reliability and fault tolerance of distributed systems?

Emphasize your systematic approach: Discuss strategies you use for redundancy, monitoring, and alerting on performance issues, and how you conduct failover tests and disaster recovery drills. Establish that you understand key principles of software reliability, such as circuit breaking, and elaborate on a past experience where you applied these principles effectively.

What techniques do you implement to improve the performance of systems handling high-throughput requirements?

Your answer should reflect a deep understanding of performance optimization techniques, such as caching strategies, efficient database indexing, and the use of asynchronous processing. Provide a past experience where implementing these techniques resulted in significant improvements in system performance and response times.

How do you identify and mitigate risks in a software project?

Share your structured approach to risk assessment, such as conducting thorough code reviews and applying rigorous testing methodologies. Discuss tools or templates you might use for risk mitigation and how they proactively protect the stability of the platform. Illustrate your point with an example from your experience.

Explain how you would design a system that can reliably handle millions of alerts per minute.

When approaching this question, outline a scalable architecture that incorporates load balancing, distributed databases, and real-time processing capabilities. Discuss the importance of monitoring and alerting to assist with redundancy and system resilience. Highlight your understanding of the challenges involved and how you would address them through iterative design and testing.

What is your approach to architectural decision-making?

Convey your thought process: How you evaluate system needs, consider trade-offs, and involve stakeholders in the decision-making process. Discuss frameworks that guide your architectural decisions and share an example where your decision led to a positive outcome for the project.

List key metrics you consider when evaluating the success of a system you've designed.

Key metrics may include uptime, response times, system throughput, and user satisfaction indicators. Highlight your experience with these metrics and how you use them to make informed decisions that drive future enhancements and ensure continued reliability.

What role do you see for DevOps in software development and how does it apply to this role?

Discuss the importance of DevOps principles in today’s software development lifecycle, particularly in terms of continuous integration and deployment, collaboration and communication between teams, and automating processes. Link these principles back to how they can help improve reliability and performance in the context of the alerting platform at Datadog.

Can you provide an example of a project where you ensured a high level of accuracy in the output?

Select a project that showcases your attention to detail and understanding of accuracy in system outputs. Discuss the specific steps you took to validate your outputs, the challenges you faced along the way, and the end result that demonstrated your commitment to quality.

How do you foster a culture of continuous improvement in your team?

Emphasize how you mentor team members, encourage feedback loops, and promote learning opportunities. Share specific initiatives you’ve introduced that led to increased efficiency and morale, showcasing your leadership style and commitment to fostering a growth-oriented environment.


Datadog (NASDAQ: DDOG) is a prominent global SaaS provider that uniquely balances growth and profitability. It offers cloud-scale monitoring and security by combining metrics, traces, and logs within one platform.

BADGES
Diversity Champion
Future Maker
Office Vibes
Future Unicorn
Rapid Growth
CULTURE VALUES
Customer-Centric
Rapid Growth
Diversity of Opinions
Reward & Recognition
Friends Outside of Work
Inclusive & Diverse
Empathetic
Feedback Forward
Work/Life Harmony
Casual Dress Code
Startup Mindset
Collaboration over Competition
Fast-Paced
Growth & Learning
Open Door Policy
Rise from Within
BENEFITS & PERKS
Maternity Leave
Paternity Leave
Flex-Friendly
Family Coverage (Insurance)
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
401K Matching
Paid Holidays
Paid Sick Days
Paid Time-Off
EMPLOYMENT TYPE
Full-time, remote
DATE POSTED
March 28, 2025
