Site Reliability Engineering at Ford Motor Company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. You will work closely with our development teams to build and maintain large-scale, distributed systems and ensure our products meet our high standards for availability and user experience.

- Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality.
- Provide helpful and actionable feedback and review for code or production changes.
- Drive repair and optimization of complex systems, with consideration for a wide range of contributing factors.
- Lead debugging, troubleshooting, and analysis of service architecture and design.
- Participate in the on-call rotation.
- Write documentation: design, system analysis, runbooks, playbooks.
- Provide design feedback and uplevel the design skills of others.
- Implement and manage monitoring solutions using Dynatrace, Splunk, and OpenTelemetry to ensure visibility and proactive issue detection across our platforms.
- Work within GCP infrastructure, optimizing performance and cost and scaling resources to meet demand.
- Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
- Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
- Troubleshoot and resolve issues in our dev, test, and production environments.
- Participate in postmortem analysis and create preventative measures for future incidents.

- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
- Strong experience with monitoring and observability tools, particularly Dynatrace and OpenTelemetry.
- Proficiency with cloud services, with a strong preference for Google Cloud Platform (GCP) experience.
- Solid programming skills in Java, with a good understanding of software development best practices.
- Experience managing and optimizing PostgreSQL databases.
- Familiarity with front-end development frameworks, particularly React.
- Ability to debug and optimize code and to automate routine tasks.
- Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
- Excellent verbal and written communication skills.

Requisition ID: 34046