GRAIL is focused on improving lives by developing pioneering technologies to detect cancer early. As a member of our team, you will help manage the end-to-end data lifecycle, ensuring data integrity, reliability, and compliance in a regulated environment. You will work closely with cross-functional teams including lab scientists, data scientists, biostatisticians, medical directors, and software engineers to create critical datasets and data solutions that drive our product pipeline.
We are seeking a Staff Data Engineer to develop, optimize, and manage GRAIL’s data lifecycle from sample ingestion to analysis, ensuring compliance with regulatory and clinical standards. You will partner with cross-functional teams to ensure that data solutions are high-quality, scalable, and aligned with our regulatory requirements, including FDA and other global health authorities.
This is a hybrid role and requires you to be onsite at least 2 days a week in Menlo Park, CA.
Responsibilities:
- Lead the design, development, and optimization of scalable ETL pipelines and data configurations to support the ingestion, transformation, and analysis of clinical and research datasets, ensuring alignment with regulatory and product requirements.
- Collaborate with data scientists, biostatisticians, and clinical teams to understand and address the data needs of various programs, including clinical trials, research studies, and regulatory submissions.
- Ensure data integrity, traceability, and quality through robust validation procedures, in compliance with FDA guidelines and other regulatory requirements.
- Proactively identify new technologies, methodologies, and processes to address evolving data management challenges within a regulated biotechnology environment.
- Manage the generation and maintenance of metadata, data navigation tools, and documentation to support operational objectives and streamline study processes.
- Support study operations by ensuring that datasets are structured to meet clinical, scientific, and regulatory milestones, including data locks, submissions, and monitoring.
Preferred Qualifications:
- BS/MS in a quantitative scientific field (Computer Science, Engineering, Mathematics, Statistics, Bioinformatics, etc.) with 8+ years of experience in data engineering, ideally within a regulated environment such as biotechnology, pharmaceuticals, medical devices, or healthcare.
- Strong understanding of ETL processes, data pipeline development, and database management, with proven experience delivering data solutions in support of clinical or regulatory requirements.
- Expertise in SQL and either Python or R.
- Experience working with cloud-based data platforms (AWS, Azure, Google Cloud) with a strong understanding of compliance frameworks (e.g., HIPAA, 21 CFR Part 11, GDPR).
- Excellent problem-solving skills with a track record of ensuring data quality and integrity across complex datasets.
- Demonstrated success working in cross-functional, collaborative teams, with the ability to translate user requirements into scalable, high-quality data solutions.
Highly Desired Qualifications:
- 3+ years of experience working in a regulated industry (biotechnology, medical devices, healthcare) with knowledge of compliance and regulatory requirements for data management.
- Proven experience in data lifecycle management for clinical trials, including understanding of regulatory submissions (e.g., FDA PMA, IDMC reports).
- Familiarity with tools like Apache Airflow and dbt for data pipeline orchestration in regulated environments.
The expected, full-time, annual base pay scale for this position is $178K-$223K. Actual base pay will consider skills, experience, and location.