PhD Position F/M Foundation Models and Natural Language Interaction for Human-Robot Collaboration
Contract type: Fixed-term contract (CDD)
Required degree: Master's or equivalent (Bac+5)
Role: PhD student (Doctorant)
Context and assets of the position
The HUCEBOT team is dedicated to advancing algorithms for human-centered robots: robots that are not working autonomously in isolation, but that instead react, interact, collaborate, and assist humans. To do so, these robots need to intertwine a multi-contact whole-body controller, a digital simulation of the interacting humans, and machine learning models to predict and respond to human movements and intentions. In a crescendo of complexity, the team tackles scenarios that involve collaboration with cobots, assistance with exoskeletons, and collaboration with humanoid robots. The application domains span from industrial robotics to space teleoperation.
The main robots of the team are the Tiago++ bimanual mobile manipulator, the Unitree G1 humanoid, and the Talos humanoid robot. The team also works with Franka cobots and exoskeletons.
The team currently consists of about 25 members, including permanent researchers, PhD students and post-doctoral researchers.
Serena Ivaldi, head of HUCEBOT, holds the chair in Robotics and AI of the Cluster IA ENACT project (https://cluster-ia-enact.ai/), which is funding this PhD thesis. Within the chair, she aims to advance research on natural language interaction to assist humans in different scenarios of collaboration with robots, where safety is paramount. The ambition is to create a foundation model that bridges natural language commands into interpretable commands for the robot, leading to robot actions that are contextualized and intrinsically safe.
Assigned mission
Most work on VLMs/LLMs for robotics has focused on generating sequences of actions and plans from high-level goals, offline, targeting only autonomous robots isolated from humans. A critical limitation to deploying VLMs/LLMs on robots that collaborate with humans is the difficulty of using them online, in a human-in-the-loop scenario, to generate suitable motions and "safe" robot policies.
Here, we use VLMs/LLMs to generate a robot's motions online in collaborative scenarios where safety is critical: active exoskeletons and mobile manipulators assisting humans in object manipulation. The human vocally commands the robot interactively, online, to control the generation of its motion at the low level: start, stop, direct, and change its low-level parametrization (e.g., compliant behavior, velocity, maximal torque assistance).
Extending these paradigms, as well as comparing with and fine-tuning existing Vision-Language-Action models (VLAs), is also considered, as this is part of the team's ongoing research.
The first objective is to design the robot's controller with the natural language interaction feature in mind: the human's commands, corrections and Approximate Numerical Expressions must be translated into meaningful quantities, coherent with the physics of the problem. What do "faster", "a bit higher", "little to the right", and "more assistance" mean?
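To make the first objective concrete, here is a minimal sketch of how relative commands and Approximate Numerical Expressions could be grounded into controller parameters. The phrase-to-scale mapping, parameter names, and safety bounds are illustrative assumptions for this posting, not the team's actual controller design.

```python
# Hypothetical sketch: grounding Approximate Numerical Expressions (ANEs)
# into controller parameters. All mappings and bounds are assumed values.
from dataclasses import dataclass

@dataclass
class ControllerParams:
    velocity: float       # end-effector speed, m/s
    stiffness: float      # impedance stiffness, N/m
    assist_torque: float  # maximal torque assistance, Nm

# Each relative phrase scales one parameter by an assumed factor.
ANE_MODIFIERS = {
    "faster":          ("velocity", 1.25),
    "slower":          ("velocity", 0.80),
    "more assistance": ("assist_torque", 1.20),
    "less assistance": ("assist_torque", 0.80),
    "softer":          ("stiffness", 0.70),
}

# Hard safety bounds: scaling never leaves these ranges.
SAFE_BOUNDS = {
    "velocity": (0.0, 0.5),
    "stiffness": (50.0, 500.0),
    "assist_torque": (0.0, 10.0),
}

def apply_command(params: ControllerParams, phrase: str) -> ControllerParams:
    """Scale the parameter targeted by the phrase, clamped to its safe range."""
    if phrase not in ANE_MODIFIERS:
        raise ValueError(f"unknown command: {phrase}")
    name, scale = ANE_MODIFIERS[phrase]
    lo, hi = SAFE_BOUNDS[name]
    setattr(params, name, min(max(getattr(params, name) * scale, lo), hi))
    return params
```

The point of the sketch is that every vocal correction maps to a bounded, physically meaningful update, so the command stream can never drive the controller outside its safe envelope.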
The second objective is to design new multimodal models fusing VLM/LLMs and multimodal pipelines to predict the human's intent and minimize the need for corrections. Natural language instructions may be incomplete or unclear, but cameras and microphones (or other sensors) could provide sufficient contextual information to generate an appropriate motion. For example, "take that" could be easily translated into "grasp the bottle", if it is the only item in front of the robot. "Move a bit to the right" needs clarifications, but also estimation of physical quantities that are context dependent.
The third objective is to detect emergency commands, leveraging both LLMs and audio processing models for nonverbal communication, and to generate suitable reactive robot behaviors. Humans are often unable to speak clearly when they interact with a robot: sometimes fear takes over and they do not speak at all, or they mumble or scream when they could simply say a clear "stop". Detecting emergency commands is critical for deploying robots in the real world. For example, "Watch out!" and "Attention!" are difficult to translate into precise motions, and require one-shot evaluations because of the urgent nature of the command.
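A minimal sketch of the fusion idea behind this objective: trigger a safe stop either on an emergency keyword or on a scream-like nonverbal cue, so that a mumbled or wordless reaction still stops the robot. The keyword list and the loudness threshold are placeholder assumptions, standing in for the LLM and audio models mentioned above.

```python
# Hypothetical sketch: one-shot emergency detection fusing a keyword check
# (verbal channel) with a loudness cue (nonverbal channel).
EMERGENCY_WORDS = {"stop", "watch out", "attention"}

def is_emergency(transcript: str, audio_rms: float,
                 rms_threshold: float = 0.8) -> bool:
    """True if the utterance contains an emergency word OR the audio energy
    spikes above the assumed scream threshold (no clear word needed)."""
    text = transcript.lower()
    if any(word in text for word in EMERGENCY_WORDS):
        return True
    return audio_rms >= rms_threshold
```

Because the decision is a single OR over two channels, it can run in one shot at every audio frame, which matches the urgency requirement stated above.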
The PhD student will carry out research on the aforementioned objectives, and will benefit from our collaboration with E. Zibetti (Paris 8, SHS), an expert in Approximate Numerical Expressions in psychology, and D. Sadigh (Stanford University), a leading researcher in LLMs for robot actions.
Real-world demonstrations with real robots and real humans interacting with the robots are mandatory in this PhD.
Main activities
Implement, test and develop novel algorithms for real robots that use language models and foundation models. Write papers and present them at conferences. Write, test, validate and document the associated software. Experiments with real robots are mandatory.
The PhD student will also be involved in the activities organized by the Cluster-AI project ENACT, which may include dissemination actions, meetings and presentations to relevant stakeholders (Europe, France, industry, etc.).
Skills
Good skills in Python (PyTorch). Ideally, prior experience with LLMs, VLMs and foundation models.
Good knowledge of robotics.
Languages: English (English is the official language of the team and many members do not speak French).
Proactivity, curiosity, daily communication and the ability to work in a team are fundamental.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
€2300 gross/month
General information
- Theme/Domain: Robotics and intelligent environments; Software engineering (BAP E)
- City: Villers-lès-Nancy
- Inria centre: Centre Inria de l'Université de Lorraine
- Desired start date: 2026-09-01
- Contract duration: 3 years
- Application deadline: 2026-04-18
Note: applications must be submitted online on the Inria website. Processing of applications sent through any other channel is not guaranteed.
Instructions for applying
Defence and security:
This position may be assigned to a restricted-access area (ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorization to access such an area is granted by the head of the establishment, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: HUCEBOT
- Thesis supervisor: Serena Ivaldi / [email protected]
The essentials for success
The ideal candidate is fascinated by the recent developments in artificial intelligence and robotics, especially foundation models, LLMs, VLMs and OpenVLA. They want to experiment with these new techniques, develop their skills, and work with state-of-the-art robots.
IMPORTANT: candidates must upload their CV, motivation letter and all documents listed in this page: https://team.inria.fr/hucebot/job-offers/
Applications that do not contain these documents will not be considered.
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talent across more than 40 different professions. 900 research and innovation support staff contribute to the emergence and growth of scientific and entrepreneurial projects with worldwide impact. Inria works with many companies and has supported the creation of more than 200 start-ups. In this way, the institute strives to meet the challenges of the digital transformation of science, society and the economy.