At Graphcore, we're optimistic about a future where people live healthier, more informed, more creative lives. Our team is at the forefront of the artificial intelligence revolution, enabling innovators from all industries and sectors to expand human potential with technology. We believe our IPU technology will become the worldwide standard for artificial intelligence, transforming whole industries and sectors, whether you are a medical researcher, a roboticist, or an engineer building autonomous cars.
What we do really makes a difference.
Responsibilities:
- Discuss requirements with clients, and conduct feasibility analysis and solution design for specific fields, usage scenarios, and constraints.
- Develop and optimize a distributed training system based on the Graphcore IPU, including but not limited to training process management, automatic hyperparameter and network architecture search, computing resource scheduling, and multi-model fusion.
- Develop and optimize high-performance inference solutions that provide end-to-end inference with high throughput and low latency.
- Develop general-purpose performance testing tools for deep learning training and inference.
- Build training/inference images and deploy them to computing clusters.
Requirements:
- Proficient in Python and familiar with C++; understand high-performance computing programming and concepts such as MPI, Python coroutines, C++ multithreading, and message queues. Project implementation experience is preferred.
- Able to develop HTTP/gRPC services independently; familiar with Python libraries such as Flask, Tornado, and FastAPI, and with Python threading, multiprocessing, and coroutine programming.
- More than 2 years of experience in distributed deep learning training/deployment and end-to-end AI project implementation; familiar with the PyTorch and TensorFlow frameworks; understand common algorithms and models in the CV, NLP, and recommendation fields, as well as the ONNX standard.
- Familiar with K8s cluster deployment and with tools such as Kubeflow and MLflow; able to write Dockerfiles; understand the concept of load balancing.
- Familiar with performance testing tools such as Locust, AB, and K6; understand the meaning of performance metrics such as QPS and latency.
We welcome people of different backgrounds and experiences and are committed to building an inclusive work environment that makes Graphcore a great home for everyone. We are an equal opportunity employer and want to build a work environment where everyone is happy, productive and respectful, so they can do their best work. If you have a disability or additional need that requires accommodation, just let us know.