Research
2020 - 2018
At DeepMind I have become increasingly interested in reinforcement learning on real robots. I have gained experience with the problems underlying the application of deep neural networks at scale in robotics. This line of research has produced publications in the areas of:
- learning from demonstrations, i.e. learning using the prior knowledge embedded in a successful execution of the task;
- sim-to-real transfer, i.e. learning in augmented simulation and transferring to the real robot;
- learning from scratch, i.e. learning primarily with real-robot data and minimising prior knowledge.
Learning real-world tasks from demonstrations
with R. Jeong, N. Heess and J. T. Springenberg
Learning Dexterous Manipulation from Suboptimal Experts
In this work, we introduce Relative Entropy Q-Learning (REQ), a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms. It represents the optimal policy via importance sampling from a learned prior and is well suited to take advantage of mixed data distributions. We demonstrate experimentally that REQ outperforms several strong baselines on robotic manipulation tasks for which suboptimal experts are available. We show how suboptimal experts can be constructed effectively by composing simple waypoint-tracking controllers, and how learned primitives can be combined with waypoint controllers to obtain reference behaviours that bootstrap a complex manipulation task on a simulated bimanual robot with human-like hands.
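Representing a policy "via importance sampling from a learned prior" can be illustrated with self-normalised importance sampling: draw candidate actions from the prior, weight each by exp(Q/η), and resample. The sketch below assumes the KL-regularised form π(a|s) ∝ π_prior(a|s) exp(Q(s,a)/η); `prior_sample`, `q_fn` and `temperature` are hypothetical placeholders rather than the paper's learned networks or hyperparameters, and the policy-evaluation and prior-learning steps of REQ are not shown.

```python
import numpy as np


def sample_improved_action(state, prior_sample, q_fn, num_samples=32,
                           temperature=1.0, rng=None):
    """Sample from pi(a|s) proportional to prior(a|s) * exp(Q(s, a) / temperature).

    Self-normalised importance sampling: candidate actions are proposed by the
    prior and re-weighted by their exponentiated Q-values.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = np.stack([prior_sample(state, rng) for _ in range(num_samples)])
    logits = np.array([q_fn(state, a) for a in actions]) / temperature
    weights = np.exp(logits - logits.max())  # shift by the max for numerical stability
    weights /= weights.sum()
    index = rng.choice(num_samples, p=weights)
    return actions[index], weights


# Toy usage: a unit-Gaussian prior over a 1-D action and a Q-function peaked at a = 1.
# Both are hypothetical stand-ins for the learned prior and critic.
rng = np.random.default_rng(0)
prior = lambda s, r: r.normal(size=1)
q = lambda s, a: -10.0 * float((a[0] - 1.0) ** 2)
action, weights = sample_improved_action(None, prior, q, rng=rng)
```

Because the proposals come from the prior, the resampled actions stay within its support while being shifted towards high-Q regions; the temperature controls how far they move.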
Learning real-world tasks from simulations
with R. Jeong, F. Romano, J. Kay, D. Khosid, and K. Bousmalis
Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation
In this work, we learn a latent state representation implicitly with deep reinforcement learning in simulation, and then adapt it to the real domain using unlabelled real robot data. We propose to do so by optimising sequence-based self-supervised objectives. These exploit the temporal nature of robot experience and can be applied in both the simulated and the real domain, without assuming any alignment of underlying states between simulated and unlabelled real images. We propose the Contrastive Forward Dynamics loss, which combines dynamics model learning with time-contrastive techniques.
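A contrastive forward-dynamics objective can be sketched in the InfoNCE style: a forward model predicts the next latent from the current latent and action, and each prediction is scored against every next latent in the batch, with its own true successor as the positive. This is a minimal NumPy sketch under assumptions: the linear forward model (`W_z`, `W_a`) and the dot-product score are illustrative choices, the published loss and architecture may differ, and no encoder or training loop is shown.

```python
import numpy as np


def contrastive_forward_dynamics_loss(z_t, a_t, z_next, W_z, W_a):
    """InfoNCE-style loss over a batch of transitions.

    z_t: (B, D) latents at time t, a_t: (B, A) actions, z_next: (B, D) latents at t+1.
    Row i of the logits compares prediction i with every target in the batch;
    the positive pair is on the diagonal.
    """
    pred = z_t @ W_z + a_t @ W_a                      # hypothetical linear forward model
    logits = pred @ z_next.T                          # (B, B) dot-product scores
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


rng = np.random.default_rng(0)
B, D, A = 8, 16, 4
z_t, a_t = rng.normal(size=(B, D)), rng.normal(size=(B, A))
W_z, W_a = rng.normal(size=(D, D)), rng.normal(size=(A, D))
z_next = z_t @ W_z + a_t @ W_a                        # targets the model predicts exactly
aligned = contrastive_forward_dynamics_loss(z_t, a_t, z_next, W_z, W_a)
shuffled = contrastive_forward_dynamics_loss(z_t, a_t, np.roll(z_next, 1, axis=0), W_z, W_a)
```

When predictions match their own successors the loss is much lower than when the targets are shifted by one position in the batch, which is the signal that pulls the representation towards temporally consistent latents.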
Modelling Generalized Forces with Reinforcement Learning for Sim-to-Real Transfer
In this work, we use reinforcement learning to efficiently optimise the mapping from states to generalised forces over a discounted infinite horizon. We show that using only minutes of real-world data improves sim-to-real transfer of the control policy. We demonstrate the feasibility of our approach by validating it on a non-prehensile manipulation task on the Sawyer robot.
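The objective can be illustrated on a toy problem: a 1-D point mass whose "real" counterpart has viscous friction that the simulator ignores. A state-dependent generalised force is added to the simulator, and its parameter is chosen to maximise the discounted return with reward r_t = -||s_sim,t - s_real,t||^2. This is a sketch under assumptions: the dynamics, the linear force model `theta * vel`, the friction coefficient and the reward are hypothetical, and the grid search is a plain stand-in for the reinforcement-learning optimiser used in the paper.

```python
import numpy as np

DT = 0.01    # integration step [s] (assumed)
MASS = 1.0   # point mass [kg] (assumed)


def simulate(controls, force_fn, pos=0.0, vel=0.0):
    """Roll out a 1-D point mass; force_fn(pos, vel) is an extra generalised force."""
    states = []
    for u in controls:
        acc = (u + force_fn(pos, vel)) / MASS
        vel = vel + DT * acc              # semi-implicit Euler
        pos = pos + DT * vel
        states.append((pos, vel))
    return np.array(states)


def discounted_return(real_states, sim_states, gamma=0.99):
    """Sum over t of gamma^t * r_t, with r_t = -||s_sim,t - s_real,t||^2."""
    rewards = -np.sum((sim_states - real_states) ** 2, axis=1)
    return np.sum(gamma ** np.arange(len(rewards)) * rewards)


controls = np.sin(2 * np.pi * DT * np.arange(300))
# Stand-in for real data: the "real" system has viscous friction (coefficient 0.5)
# that the nominal simulator does not model.
real = simulate(controls, lambda p, v: -0.5 * v)

# Hypothetical linear force model f(s) = theta * vel, fitted by grid search.
thetas = np.linspace(-1.0, 1.0, 41)
scores = [discounted_return(real, simulate(controls, lambda p, v, th=th: th * v))
          for th in thetas]
best_theta = thetas[int(np.argmax(scores))]
```

Because the force acts inside the rollout, an error at one step propagates to later states, which is why the mismatch is scored over a discounted horizon rather than one step at a time.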
Learning real-world tasks with minimal prior
with M. Martins, G. Vezzani, T. Lampe, M. Neunert, M. Riedmiller
Ball-in-a-cup
This work presents a method for fast training of vision-based control policies on real robots. The key idea is to perform multi-task reinforcement learning with auxiliary tasks that differ not only in the reward to be optimised but also in the state space in which they operate. In particular, we allow auxiliary task policies to use task features that are available only at training time. This allows fast learning of auxiliary policies, which subsequently generate good data for training the main, vision-based control policies.
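One way to picture this setup is a shared set of tasks, each declaring which part of the observation its policy may see, with every stored transition relabelled with every task's reward so that data collected by any policy can train all of them. The task names, observation keys and reward thresholds below are hypothetical; this is a data-structure sketch, not the published training setup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np

Observation = Dict[str, np.ndarray]


@dataclass
class Task:
    name: str
    obs_key: str  # state space the task's policy operates in
    reward_fn: Callable[[Observation], float]


def policy_input(task: Task, obs: Observation) -> np.ndarray:
    """Select the part of the observation that this task's policy is allowed to see."""
    return obs[task.obs_key]


def relabel(obs: Observation, tasks: List[Task]) -> Dict[str, float]:
    """Evaluate every task's reward on one stored transition, so experience
    collected by any policy can be reused to train all tasks off-policy."""
    return {task.name: task.reward_fn(obs) for task in tasks}


# Hypothetical observation: a camera image plus training-time-only features
# (e.g. ball position relative to the cup from an external tracker).
obs = {
    "image": np.zeros((84, 84, 3), dtype=np.uint8),
    "features": np.array([0.0, 0.0, 0.12]),  # ball (x, y, z) relative to cup [m] (assumed)
}

tasks = [
    # Auxiliary task: feature-based policy that learns quickly and generates useful data.
    Task("ball_above_cup", "features", lambda o: float(o["features"][2] > 0.0)),
    # Main task: vision-based policy, the only one needed at deployment time.
    Task("catch", "image", lambda o: float(np.linalg.norm(o["features"]) < 0.05)),
]

rewards = relabel(obs, tasks)
```

The design choice is that privileged features only ever enter auxiliary policies and reward computation, so the main policy remains deployable with the camera alone.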
Peg-in-hole
We propose a challenging, highly under-actuated peg-in-hole task with a free, rotationally asymmetric peg, requiring a broad range of manipulation skills. While correct peg (re-)orientation is a prerequisite for successful insertion, no reward is associated with it. Hence, an agent needs to understand this precondition and learn the skill to fulfil it. The final insertion reward is sparse, which leaves freedom in the solution and leads to complex emergent behaviour not envisioned during task design.
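A sparse insertion reward of this kind can be written as a single indicator on the peg position. Nothing in it refers to the peg orientation, so re-orienting the peg has to be discovered by the agent. The geometry (a vertical hole), the depth and tolerance values and the function name are assumptions for illustration only.

```python
import numpy as np


def insertion_reward(peg_tip, hole_entry, min_depth=0.02, radial_tol=0.005):
    """Sparse reward: 1.0 once the peg tip is inside the hole, 0.0 otherwise.

    Assumes a vertical hole whose entry point is hole_entry (x, y, z) in metres.
    No term rewards the peg orientation.
    """
    peg_tip, hole_entry = np.asarray(peg_tip), np.asarray(hole_entry)
    radial_error = np.linalg.norm(peg_tip[:2] - hole_entry[:2])
    depth = hole_entry[2] - peg_tip[2]
    return float(radial_error < radial_tol and depth >= min_depth)
```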
2018
Before joining DeepMind, I was involved in several European projects. Here is a list of the most relevant projects in which I served as scientific lead (i.e. principal investigator). Below is a summary of the research projects I have been involved in. For a complete list of papers, see my publications list.
Whole-body dynamics with contact switching
with D. Pucci, F. Romano, G. Nava, S. Traversaro, S. Dafarra and F. Andrade
This research activity concerns the control of the iCub whole-body posture in situations that involve breaking and establishing contacts. At present, this research benefits from recent results in force/torque sensor calibration and in stability proofs for inverse-dynamics control approaches.
Open-source software repositories: (1) WBI-Toolbox-controllers for step recovery, (2) codyco-modules, (3) WBI-Toolbox.
Whole-Body Dynamics Control
with D. Pucci, F. Romano and G. Nava
This research activity concerns the control of the iCub whole-body posture exploiting whole-body distributed contacts. It relies on the iCub's ability to control joint torques and to detect external contacts through whole-body distributed tactile sensors (i.e. artificial skin).
Open-source software repositories: (1) WBI-Toolbox-controllers, (2) codyco-modules, (3) WBI-Toolbox.
Whole-Body Dynamics Modeling and Identification
with S. Traversaro and R. Camoriano
This research activity concerns the problem of modeling the whole-body dynamics of an articulated rigid-body structure. The focus is on computational efficiency, obtained by implementing state-of-the-art inverse and forward dynamics algorithms (Featherstone 2007). The activity also covers simulation of the iCub, using both existing (e.g. Gazebo) and custom (e.g. mex-wholebodymodel) open-source software.
Open-source software repositories: (1) mex-wholebodymodel, (2) gazebo-yarp-plugins.
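To make the inverse-dynamics part concrete, here is a recursive Newton-Euler sketch for a planar chain of revolute joints: a forward pass propagates link velocities and accelerations from the base to the tip, and a backward pass accumulates forces and moments into joint torques. It is a simplified 2-D illustration under stated assumptions (each link described by its length, mass, centre-of-mass distance and inertia about the centre of mass; gravity modelled as an upward base acceleration), not the spatial-vector formulation of Featherstone (2007) nor the iCub model code.

```python
import numpy as np


def cross2(r, f):
    """z-component of the cross product of two planar vectors."""
    return r[0] * f[1] - r[1] * f[0]


def planar_rnea(q, qd, qdd, lengths, masses, com, inertias, g=9.81):
    """Joint torques for a planar serial chain of revolute joints (recursive Newton-Euler).

    com[i]: distance of link i's centre of mass from joint i, along the link.
    inertias[i]: rotational inertia of link i about its centre of mass.
    Gravity is included by giving the base an upward acceleration g.
    """
    n = len(q)
    theta = omega = alpha = 0.0
    acc = np.array([0.0, g])  # linear acceleration of the current joint origin
    acc_com, r_com, r_link, alphas = [], [], [], []
    # Forward pass: absolute angle, angular velocity/acceleration and linear accelerations.
    for i in range(n):
        theta += q[i]
        omega += qd[i]
        alpha += qdd[i]
        u = np.array([np.cos(theta), np.sin(theta)])  # unit vector along link i
        perp = np.array([-u[1], u[0]])                # u rotated by +90 degrees
        rc, rl = com[i] * u, lengths[i] * u
        acc_com.append(acc + alpha * com[i] * perp - omega ** 2 * rc)
        acc = acc + alpha * lengths[i] * perp - omega ** 2 * rl
        r_com.append(rc)
        r_link.append(rl)
        alphas.append(alpha)
    # Backward pass: force and moment exerted on link i by link i-1, from tip to base.
    tau = np.zeros(n)
    f_next, n_next = np.zeros(2), 0.0
    for i in reversed(range(n)):
        m_acc = masses[i] * acc_com[i]
        f = m_acc + f_next
        moment = (inertias[i] * alphas[i] + n_next
                  + cross2(r_com[i], m_acc) + cross2(r_link[i], f_next))
        tau[i] = moment
        f_next, n_next = f, moment
    return tau


# Usage: gravity torques of a two-link arm held horizontally at rest.
tau_static = planar_rnea([0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
                         [1.0, 1.0], [2.0, 1.0], [0.5, 0.5], [0.1, 0.05])
```

Each pass visits every link once, so the cost grows linearly with the number of joints; this linear complexity is what makes recursive algorithms attractive for whole-body models.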
Whole-Body Dynamics Estimation
with S. Traversaro, M. Fumagalli and S. Ivaldi
This research activity concerns the estimation of dynamic quantities (e.g. joint torques, joint accelerations, joint velocities, external forces) from whole-body distributed heterogeneous sensors (e.g. artificial skin, accelerometers, gyroscopes, force/torque sensors).
Open-source software repositories: (1) bnt_time_varying, (2) idyntree, (3) idyn.
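A common way to fuse heterogeneous, noisy measurements is a Gaussian maximum a posteriori (MAP) estimate: stack the unknown dynamic quantities in a vector d, model the sensors as y = Y d + e with e ~ N(0, Σ_y), and place a prior d ~ N(μ_d, Σ_d). The posterior mean is then (Σ_d^-1 + Y^T Σ_y^-1 Y)^-1 (Y^T Σ_y^-1 y + Σ_d^-1 μ_d). The sketch below assumes exactly this linear-Gaussian model; the measurement matrix, noise levels and names are illustrative, and it is not the estimator implemented in the repositories listed above.

```python
import numpy as np


def map_estimate(Y, y, Sigma_y, mu_d, Sigma_d):
    """Gaussian MAP estimate of d given y = Y d + noise.

    noise ~ N(0, Sigma_y), prior d ~ N(mu_d, Sigma_d).
    Returns the posterior mean and covariance.
    """
    Wy = np.linalg.inv(Sigma_y)                 # measurement precision
    Wd = np.linalg.inv(Sigma_d)                 # prior precision
    precision = Wd + Y.T @ Wy @ Y
    rhs = Y.T @ Wy @ y + Wd @ mu_d
    mean = np.linalg.solve(precision, rhs)
    return mean, np.linalg.inv(precision)


# Hypothetical example: two unknown quantities, each observed directly by one sensor
# and jointly (their sum) by a third, noisier sensor.
d_true = np.array([1.0, -2.0])
Y = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
Sigma_y = np.diag([0.01, 0.1, 0.05])
y = Y @ d_true                                  # noise-free for illustration
d_hat, cov = map_estimate(Y, y, Sigma_y, np.zeros(2), 1e6 * np.eye(2))
```

Each sensor contributes in proportion to its precision, and the redundant third measurement makes the posterior variance of each unknown smaller than that of its direct sensor alone.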