About H: H focuses on pushing the boundaries of superintelligence, starting with agents that take actions. These AI agents automate complex, multi-step tasks typically performed by humans, improving efficiency and decision-making across a range of environments. We approach safety and disruptive agentic capabilities in tandem. H is hiring the world’s best engineers and researchers dedicated to changing the status quo in AI.
Holistic, Humanist, Humble. At H, we value an approach centered around actions, initiatives, and execution. We encourage our employees to take ownership of their tasks and deliver results. We also promote a mindset of openness, learning, and collaboration, where everyone has something to contribute. As a dynamic start-up, we believe it's essential to have fun while working hard to succeed.
About the Team: The VLM team drives the development of vision-language models at the core of our agentic AI systems. Built on our foundational LLMs, these models enable agents to perceive, understand, and act within complex environments by integrating visual and linguistic information. By combining state-of-the-art research with cutting-edge engineering, we create AI systems capable of automating multi-step, human-like tasks such as analyzing screens, understanding documents, or navigating virtual spaces. This is your chance to work on transformative technologies and shape the frontier of superintelligent AI, with a focus on actionable, multimodal understanding.
Key Responsibilities:
Design advanced vision-language models for agentic systems.
Optimize pipelines for training multimodal models.
Integrate VLMs into agents for automating human-like tasks.
Prototype systems for screen-based interactions and multimodal reasoning.
Present findings and stay ahead of the latest research.
Requirements:
Technical Skills:
Strong programming skills in Python, with proficiency in version control systems like Git.
Expertise in at least one deep learning framework (e.g., PyTorch, TensorFlow, JAX).
Hands-on experience with large-scale distributed training of vision-language models.
Knowledge of multimodal architectures such as transformers, autoregressive models, and their applications in agents.
Familiarity with applications like document understanding, image captioning, and cross-modal reasoning.
Research Skills:
Publications in top-tier AI conferences (e.g., NeurIPS, CVPR, ICML, ACL, ICCV) demonstrating expertise in VLMs or multimodal research.
MSc or PhD in machine learning, computer vision, natural language processing, or a related field.
Deep understanding of the intersection between multimodal learning and action-oriented AI.
Soft Skills:
Strong communication and presentation skills to articulate complex ideas clearly.
Collaborative mindset, thriving in dynamic, multidisciplinary teams.
Passion for solving complex problems at the intersection of AI perception and action.
Location: Paris, London, or remote in Europe.
What We Offer:
A welcoming and inclusive work environment.
Opportunities to grow professionally and contribute to transformative projects.
Competitive benefits and fun company events.
Remote work flexibility within Europe.
Join us and be part of shaping the future of superintelligent AI!
At H, we are on a mission to redefine the landscape of AI, and we're looking for a Member of Technical Staff (VLM) to join our innovative team. Whether you are based in Paris or London or work remotely within Europe, you will play a pivotal role in developing vision-language models that empower our AI agents to understand and interact with the world around them. Imagine creating systems that automate human-like tasks by integrating visual and linguistic information! Your key responsibilities will include designing advanced VLMs, optimizing training pipelines, and integrating these models into our AI systems. We are all about collaboration and learning, and we value each team member's initiative and creativity. Safety and efficiency drive our work, but we also believe that enjoyment fuels our success. If you hold an MSc or PhD in machine learning or a related field and have experience with deep learning frameworks and multimodal architectures, we want to hear from you! You'll be part of a team that pushes boundaries and builds transformative technologies while having fun along the way. Join us at H, and together, let's shape the future of superintelligent AI!