Co-Optimization of Mechanical and Computational Intelligence for Marine System Manipulators via Deep Reinforcement Learning
Abstract
This project aims to co-design, via Deep Reinforcement Learning, both the hardware (actuation and kinematic parameters) and the software (control policy) for compliant robot hands in marine systems. Our starting premise is that the "intelligence" (i.e. the ability to react appropriately to novel conditions) of a compliant robot hand can reside equally in the computational and mechanical components. We thus derive our main hypothesis: that end-to-end co-optimization of these components will greatly outperform the traditional, sequential and separate approach of first designing the hardware and then developing controllers. Our proposed method is to apply Deep Reinforcement Learning by combining hardware and computational components into a single "policy" which can be optimized via policy gradient methods. The key insight enabling this approach is that the hardware can be modeled as a computational graph, and thus optimized similarly to traditional, computational-only policies. This process will take place using simulated multi-contact physics (in order to quickly evaluate many combinations of hardware parameters), and rely on domain randomization to transfer results to real hardware. We will test our results on real prototypes performing dexterous manipulation tasks.

[...] compliant robot hand. Motor torques are processed by a transmission mechanism (e.g. tendons, linkages) to produce joint torques, which in turn act on finger links to produce contact forces on the world. This chain of effects is external to all traditional RL algorithms, and is considered part of the environment instead of part of the agent. We argue, however, that the robot hardware (transmission, kinematics) is in fact an intrinsic part of the policy, albeit a mechanical part. A comprehensive view of a policy is one that takes in the state of the world as well as sensor observations and responds by modulating the forces applied by the fingers, through a combination of control and hardware.
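As a rough illustration of treating hardware as part of the policy, the sketch below trains one computational parameter and one hardware parameter together with a plain REINFORCE update. Everything in it is an assumption made for illustration and is not taken from the project: the names (`mean_joint_torque`, `train`, `tracking_error`), the scalar transmission ratio `r`, the tanh-bounded motor command, the toy target of twice the observation, and the use of hand-derived gradients in place of an automatic differentiation framework.

```python
import math
import random

TARGET_GAIN = 2.0  # hypothetical task: joint torque should equal 2 * observation


def mean_joint_torque(obs, w, r):
    """Computational layer (weight w) followed by a hardware layer (ratio r)."""
    motor = math.tanh(w * obs)  # bounded motor torque command
    return r * motor            # transmission scales motor torque to joint torque


def grad_log_prob(obs, action, w, r, sigma):
    """Gradient of log N(action; mean, sigma^2) with respect to (w, r)."""
    motor = math.tanh(w * obs)
    common = (action - r * motor) / sigma ** 2
    d_mean_dw = r * (1.0 - motor ** 2) * obs  # chain rule through the hardware layer
    d_mean_dr = motor
    return common * d_mean_dw, common * d_mean_dr


def tracking_error(w, r, n=21):
    """Mean squared error of the noise-free policy on a grid of observations."""
    grid = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return sum((mean_joint_torque(o, w, r) - TARGET_GAIN * o) ** 2 for o in grid) / n


def train(steps=2000, batch=32, lr=0.01, sigma=0.3, seed=0, w=0.5, r=0.5):
    """REINFORCE with a batch-mean baseline, updating w and r together."""
    rng = random.Random(seed)
    for _ in range(steps):
        samples = []
        for _ in range(batch):
            obs = rng.uniform(-1.0, 1.0)
            action = rng.gauss(mean_joint_torque(obs, w, r), sigma)
            reward = -(action - TARGET_GAIN * obs) ** 2
            samples.append((obs, action, reward))
        baseline = sum(s[2] for s in samples) / batch
        grad_w = grad_r = 0.0
        for obs, action, reward in samples:
            dw, dr = grad_log_prob(obs, action, w, r, sigma)
            grad_w += (reward - baseline) * dw
            grad_r += (reward - baseline) * dr
        # Same gradient-ascent step for the software and the hardware parameter.
        w += lr * grad_w / batch
        r += lr * grad_r / batch
    return w, r


if __name__ == "__main__":
    w, r = train()
    print("error before:", tracking_error(0.5, 0.5))
    print("error after: ", tracking_error(w, r))
```

The point of the sketch is only that the update rule does not distinguish between `w` and `r`: once the transmission is written as a differentiable function, its parameter receives the same policy-gradient update as the controller weight. A realistic version would replace the scalar ratio with a full transmission and kinematics model and would run inside a physics simulator.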
Our main approach is thus to consider hardware design parameters as learnable variables in a Deep RL policy, which can be optimized together with the traditional parameters of the computational policy (e.g. weights and biases of a neural network). In this way, the optimization of hardware parameters can be directly incorporated into the existing Deep Reinforcement Learning framework, and can use existing Deep Reinforcement Learning algorithms with changes only in the design of the computational graph.

This work is directly relevant to the Navy's stated goal of developing above-water and underwater robotic platforms that continue to improve in agility and dexterity given their size, weight, and power envelope. It can help expand the operational envelope of Navy underwater and amphibious vehicles, and enable enhanced underwater manipulation. Passively compliant, underactuated hands are particularly relevant to marine systems. Such devices can leverage passive compliance for effective reaction to external forces, [...] packaging, stowing, and waterproofing: all the motors can reside in the forearm component of the hand, with a purely mechanical palm. Underwater deployment imposes strict packaging and stowing constraints on robotic hardware. We believe compliant underactuated hands are much more likely to meet such constraints compared to their fully actuated, anthropomorphic counterparts. In order to realize this vision, we propose to complement the mechanical intelligence built into such hands using jointly optimized computational policies. By leveraging the mechanics in this fashion, we can reduce the computational requirements of the system, further increasing autonomy and helping deployment.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Dec 04, 2020
- Source ID: N000142114010
Entities
People
- Matei Ciocarlie
Organizations
- Office of Naval Research
- Trustees of Columbia University in the City of New York
- United States Navy