Our muscle-based robotic arm is a testbed for Learning for Control. While it offers unique possibilities in terms of high accelerations, extreme speeds, and variable stiffness actuation, classical control methods are known to struggle to exploit these capabilities.
Control of complex plants, especially robots actuated by pneumatic artificial muscles, is challenging due to nonlinearities, hysteresis effects, large actuator delays, and unobservable dependencies such as temperature. Such plants and robots demand more from the controller than classical methods can deliver. Therefore, we aim to develop novel methods for learning control that can deal with high-speed dynamics and muscular actuation.
Highly dynamic tasks that require large accelerations and precise tracking usually rely on accurate models and/or high-gain feedback. Learning to track such dynamic movements with inaccurate models remains an open problem. To achieve accurate tracking for such tasks in a stable and efficient way, we have proposed a series of novel adaptive Iterative Learning Control (ILC) algorithms that are implemented efficiently and remain cautious during learning [ ].
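To make the underlying mechanism concrete, the following is a minimal sketch of a plain (non-adaptive) first-order ILC update, assuming a simple scalar discrete-time plant and a hand-tuned learning-gain matrix; both are illustrative placeholders rather than the models or algorithms from our work.

```python
import numpy as np

def ilc_update(u_prev, e_prev, L):
    """First-order ILC trial update: u_{k+1} = u_k + L @ e_k."""
    return u_prev + L @ e_prev

def run_trial(u, a=0.9, b=0.5):
    """Illustrative scalar discrete-time plant (placeholder, not our robot)."""
    y = np.zeros_like(u)
    x = 0.0
    for t in range(len(u)):
        x = a * x + b * u[t]
        y[t] = x
    return y

T = 50
y_ref = np.sin(np.linspace(0.0, np.pi, T))   # desired trajectory
u = np.zeros(T)                              # feedforward input, refined trial by trial
L = 0.5 * np.eye(T)                          # learning gain (tuning choice)

for k in range(20):                          # iterate over trials
    e = y_ref - run_trial(u)                 # tracking error of the current trial
    u = ilc_update(u, e, L)

print("RMS tracking error after learning:",
      np.sqrt(np.mean((y_ref - run_trial(u)) ** 2)))
```

The adaptive algorithms referenced above go beyond such a fixed update law by adjusting the learning gains from data while remaining cautious during learning.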
Muscular systems offer beneficial properties for achieving human-comparable performance in uncertain and fast-changing tasks [ ]. Muscles are backdrivable and provide variable stiffness while offering high forces to reach high accelerations. Nevertheless, these advantages come at a high price, as such robots defy classical control approaches. We have built a muscular robot system to study how to control musculoskeletal robots using learning. We have shown how probabilistic forward dynamics models can be employed to control complex musculoskeletal robots while accounting for the uncertainty of the forward model [ ]. With this setup, we were able to perform the accuracy-demanding task of robot table tennis using model-free RL [ ]. This work introduced hybrid simulated and real training (HYSR), which enables long-term training without resetting the real environment and incorporates prerecorded data for better transfer to the real task.
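As a hedged illustration of how model uncertainty can enter the control decision, the sketch below uses a bootstrapped ensemble as the probabilistic forward dynamics model and penalizes actions whose predicted outcomes the ensemble members disagree on; the ensemble, the linear regressors, and the one-step cost are assumptions for this example, not the models or controller used in the cited work.

```python
import numpy as np

class EnsembleDynamics:
    """Probabilistic forward model as a bootstrapped ensemble of linear
    regressors; member disagreement serves as a crude uncertainty estimate
    (a placeholder for the more expressive models used in practice)."""

    def __init__(self, n_members=5, reg=1e-3):
        self.n_members = n_members
        self.reg = reg
        self.weights = []

    def fit(self, inputs, next_states):
        n, d = inputs.shape
        self.weights = []
        for _ in range(self.n_members):
            idx = np.random.randint(0, n, size=n)            # bootstrap resample
            X, Y = inputs[idx], next_states[idx]
            W = np.linalg.solve(X.T @ X + self.reg * np.eye(d), X.T @ Y)
            self.weights.append(W)

    def predict(self, state_action):
        preds = np.stack([state_action @ W for W in self.weights])
        return preds.mean(axis=0), preds.std(axis=0)          # mean, uncertainty

def choose_action(model, state, candidate_actions, target, risk_weight=1.0):
    """Pick the action whose predicted next state is closest to the target,
    penalizing predictions the ensemble disagrees on (an illustrative
    uncertainty-aware selection rule, not the controller from the paper)."""
    best_action, best_cost = None, np.inf
    for a in candidate_actions:
        mean, std = model.predict(np.concatenate([state, a]))
        cost = np.linalg.norm(mean - target) + risk_weight * std.sum()
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action
```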
In addition, we have continued to work on reinforcement learning problems at the intersection of control and machine learning. We have extended several approaches to reinforcement learning for continuous control (NAF, Q-Prop, IPG, TDM) to work with expressive function approximators while significantly improving sample efficiency [ ]. We were able to show that our approach scales to learning a door-opening task [ ].
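For context, here is a minimal sketch of the quadratic advantage parameterization at the core of NAF, one of the listed methods: because the advantage is a negative quadratic in the action, the greedy action is available in closed form, which is what makes Q-learning tractable for continuous control. The toy linear maps below stand in for the deep networks used in practice and are purely illustrative.

```python
import numpy as np

def naf_q(state, action, params):
    """Normalized Advantage Function decomposition:
    Q(s,a) = V(s) - 0.5 (a - mu(s))^T P(s) (a - mu(s)),  P(s) = L(s) L(s)^T.
    Since P(s) is positive semi-definite, argmax_a Q(s,a) = mu(s) in closed form."""
    d = len(action)
    V = float(params["w_v"] @ state)              # state value V(s)
    mu = params["W_mu"] @ state                   # greedy action mu(s)
    l_entries = params["W_l"] @ state             # d*(d+1)/2 lower-triangular entries
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = l_entries             # lower-triangular factor L(s)
    P = L @ L.T                                   # positive semi-definite matrix
    diff = action - mu
    return V - 0.5 * diff @ P @ diff

# Tiny usage example with random toy parameters (purely illustrative)
rng = np.random.default_rng(0)
s_dim, a_dim = 4, 2
params = {
    "w_v": rng.normal(size=s_dim),
    "W_mu": rng.normal(size=(a_dim, s_dim)),
    "W_l": rng.normal(size=(a_dim * (a_dim + 1) // 2, s_dim)),
}
s = rng.normal(size=s_dim)
mu = params["W_mu"] @ s
# The greedy action mu(s) attains the maximal Q-value for this state:
assert naf_q(s, mu, params) >= naf_q(s, mu + 0.1 * rng.normal(size=a_dim), params)
```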
Aside from fundamental algorithmic problems such as sample efficiency and stability, we have also proposed algorithms that enable learning on real-world robots with fewer human interventions. In [ ], we proposed the Leave No Trace (LNT) algorithm, which significantly reduces the number of hard resets required during learning. Lastly, we contributed to the field of hierarchical reinforcement learning with HIRO [ ], a scalable off-policy HRL algorithm with substantially improved sample efficiency, and HiTS [ ], which improves temporal abstraction in non-stationary environments.
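To illustrate the core idea behind LNT, the sketch below shows the early-abort rule: forward actions are executed only while a learned reset Q-value stays above a confidence threshold, and a hard (manual) reset is requested only if the learned reset policy subsequently fails to return to the initial state. The environment interface, policies, and threshold here are hypothetical placeholders; the full algorithm additionally trains both policies jointly from the collected experience.

```python
def lnt_episode(env, forward_policy, reset_policy, reset_q_value,
                q_min=0.5, max_steps=200):
    """Leave-No-Trace-style episode sketch: execute forward actions only while
    the learned reset Q-value says the action can still be undone; otherwise
    abort early and hand control to the reset policy. All interfaces are
    illustrative placeholders, not the actual implementation."""
    state = env.current_state()

    # Forward phase with early aborts.
    for _ in range(max_steps):
        action = forward_policy(state)
        if reset_q_value(state, action) < q_min:   # early abort check
            break
        state = env.step(action)

    # Reset phase with the learned reset policy.
    for _ in range(max_steps):
        if env.is_initial_state(state):
            return False                           # no hard reset needed
        state = env.step(reset_policy(state))
    return True                                    # request a hard (manual) reset
```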