Job details

Data Engineer

We are a dedicated team developing Large Language Models, work that ranks among the most prestigious and advanced ongoing Natural Language Processing projects in the world. Our team is responsible for the entire data engineering pipeline, from the collection of raw text data, through preprocessing and storage, to serving the data for model training and deployment. Additionally, our Data Engineering team is actively involved in Optical Character Recognition and image processing.

Our focus is on the rapid development and deployment of state-of-the-art information retrieval systems to meet complex information needs. As a Data Engineer, you will play a critical role in our team, owning the core data engineering tasks in our product pipeline. You will collaborate closely with cross-functional teams to provide innovative solutions to real-world problems.

To succeed in this role, you'll need a results-driven mindset, a passion for excellence, and a continuous desire to learn and improve. Your key responsibilities in this project will include:

Using programming languages such as Python, R, and Scala to analyze data and build statistical models.
Providing insights, metrics, and explanations for data variance through your technical expertise.

Building knowledge graphs and services to support the information retrieval process.
Implementing best-practice data quality assurance mechanisms.

Requirements:

Bachelor’s degree in Computer Engineering, Software Engineering, or an equivalent field.
3+ years of experience with data cleaning, preprocessing, and data architecture, especially with big data
3+ years of coding experience in at least one modern programming language (Python is preferred; R, Ruby, Scala, Java, etc. are also acceptable)
Extensive knowledge and practical experience in several of the following areas: machine learning, statistics, deep learning, recommendation systems, information retrieval, data preparation, and web crawling
Basic NLP skills (e.g., word embeddings, language models) to facilitate communication between end users and data. Knowledge of Large Language Models and their data preparation steps is a plus
Basic knowledge of NoSQL databases, with a preference for Elasticsearch and MongoDB. Experience with RDBMS such as PostgreSQL or MySQL is also valuable
Experience with data visualization tools. Experience with Grafana and Airflow is a strong plus
Basic knowledge of Apache Spark and Hadoop is a big advantage
Proficiency in Linux-based OS operations
A solid understanding of search-related business scenarios and core technologies
A passion for sharing knowledge and the confidence to seek help when needed
Fluency in both written and spoken English
Experience mentoring or leading teams of 5+ members, and providing technical or professional guidance, is a plus
An eagerness to learn new technologies is highly valued

By Huawei Telekomünikasyon Dış Ticaret Ltd

Huawei is a global provider of information and communications technology (ICT) infrastructure and smart devices. Huawei is headquartered in Shenzhen, China.

FUNDING

Public

DEPARTMENTS

Data

SENIORITY LEVEL REQUIREMENT

Mid-Level

INDUSTRY

Information Technology

TEAM SIZE