UHC Stars Data Engineer
Day to Day Responsibilities & Expectations
Typical development involves a heavily Scala code base; new projects (where appropriate) may be built in PySpark and integrated with other components via Airflow. You would be expected to support and evolve the current code base (Spark (Scala) and PySpark) regardless of whether Scala or Python development is your primary or secondary skill set.
Ingestion and organization of metadata that supports the de-identification and masking of data (subject to HIPAA and internal privacy regulations) for offshore data teams.
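For illustration, here is a minimal sketch of metadata-driven masking in PySpark. Everything in it is a hypothetical example, not the team's actual implementation: the rule table, column names, paths, and the choice of SHA-256 hashing are all assumptions.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("deid-masking-sketch").getOrCreate()

# Hypothetical masking metadata: column -> action. In practice this would be
# ingested and versioned as its own dataset rather than hard-coded.
rules = {"member_id": "hash", "dob": "drop", "zip_code": "truncate"}

df = spark.read.parquet("/landing/claims/")  # hypothetical path

for col, action in rules.items():
    if action == "hash":
        # One-way hash keeps the column joinable without exposing the raw value
        df = df.withColumn(col, F.sha2(F.col(col).cast("string"), 256))
    elif action == "drop":
        df = df.drop(col)
    elif action == "truncate":
        # Keep only the 3-digit ZIP prefix, in the spirit of HIPAA Safe Harbor
        df = df.withColumn(col, F.substring(F.col(col), 1, 3))

df.write.mode("overwrite").parquet("/deidentified/claims/")
```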
Development and automation of data pipelines (tabular files, database tables, files via OCR) utilizing Apache Spark for data ingestion and transformation and Apache Airflow for orchestration, in a DevOps-style work pattern (build, deploy, maintain).
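The Spark-plus-Airflow pattern typically looks like the sketch below: an Airflow DAG submits the Spark job on a schedule. The DAG id, schedule, connection id, and application path are illustrative assumptions, not the team's actual configuration.

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Hypothetical DAG: names, schedule, and paths are illustrative only.
with DAG(
    dag_id="claims_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = SparkSubmitOperator(
        task_id="ingest_claims",
        application="/opt/jobs/claims_ingest.py",  # a PySpark job; a Scala jar is submitted the same way
        conn_id="spark_default",
    )
```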
Integration of data ingestion pipelines with larger analytical pipelines, including implementing data monitoring and validation processes that prevent pollution of downstream data.
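One common shape for such a gate is to fail the pipeline before anything is written downstream. The sketch below is a hypothetical PySpark example, with made-up column names and thresholds.

```python
import pyspark.sql.functions as F
from pyspark.sql import DataFrame

def validate(df: DataFrame) -> DataFrame:
    """Fail fast so a bad batch never reaches downstream tables.
    Columns and thresholds are hypothetical examples."""
    total = df.count()
    if total == 0:
        raise ValueError("Empty batch: refusing to publish downstream")
    null_keys = df.filter(F.col("member_id").isNull()).count()
    if null_keys / total > 0.01:  # >1% null keys usually means an upstream layout change
        raise ValueError(f"{null_keys}/{total} rows missing member_id")
    return df
```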
Work with cross-functional business strategy teams to define requirements relating to data access and data acquisition, and communicate delays or impediments to successful implementation along with prioritization and timelines.
Work with cross-functional BI & DS teams to document and educate on the processes, frequency, and factors impacting automated and/or manual data ingestion and integration.
Remediate pipelines that fail due to changes in drivers, database availability, data availability, malformed data, or changes to the structure/layout of files delivered to defined data landing zones.
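Many of these failures can be surfaced cleanly by reading landing-zone files against a pinned schema, so a layout change aborts loudly instead of corrupting downstream tables. A hypothetical sketch; the schema, columns, and path are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType

spark = SparkSession.builder.appName("landing-zone-read-sketch").getOrCreate()

# Hypothetical expected layout; a real pipeline would version this with the code.
expected = StructType([
    StructField("member_id", StringType()),
    StructField("claim_date", DateType()),
    StructField("amount", StringType()),
])

df = (spark.read
      .option("header", True)
      .option("mode", "FAILFAST")  # abort on malformed rows instead of silently nulling them
      .schema(expected)
      .csv("/landing/claims/daily/"))  # hypothetical landing zone
```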
Serve as a data subject matter expert on data sources, their utilization, constraints, and limitations, both for other members of the Data Engineering team and for end consumers of the team's data products. (Be able to interpret, query, and modify code bases and projects you have not personally worked on, and explain them to stakeholders and other members of the engineering team.)
Maintain asynchronous communication and status updates to geographically distributed teams and team members regarding the completion of projects, expectations, exceptions and/or delays to existing timelines, and/or changes in priorities.
Required Qualifications:
- Bachelor’s degree in engineering or equivalent work experience
- 5+ years of experience working as a Data Engineer (data ingestion, pipeline development)
- 3+ years of experience with Scala, PySpark, and Python
- 3+ years of experience in development and automation of data pipelines (tabular files, database tables, files via OCR)
- 3+ years of experience with Apache Spark for data ingestion and transformation and Apache Airflow for orchestration in a DevOps-style work pattern (build, deploy, maintain)
- 4 years of experience in a data quality review process (e.g., data QA, UAT)
- Ability to think strategically (outside-the-box) and provide creative solutions.
- Ability to frame problems and opportunities to present solutions.