Most mobile manipulators control their base and arm as separate systems, leaving coordination and reachability on the table. This research uses reinforcement learning (RL) and behavior cloning (BC) to teach a mobile manipulator to coordinate its full body for contact-rich door opening in MuJoCo.
A mobile manipulator that doesn't think with its whole body is just a robot arm carrying around a confused car.
In most industrial deployments, mobile manipulators run two largely independent stacks: a navigation planner drives the base to a pose, then a manipulation planner takes over once the base has stopped. It works — barely. It's also brittle.
Decoupled control means the arm can't compensate for a base that stopped 5 cm short. The base can't shift its weight to extend the arm's reach. Contact-rich tasks like door opening — where the robot must simultaneously pull a handle, navigate around a swinging door, and reposition mid-task — are exactly where this seam falls apart.
The question driving this research: can we do better by treating the entire mobile manipulator as a single coupled control problem, and learning the coordination from data rather than hand-engineering it?
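To make "single coupled problem" concrete at the interface level, here is a minimal sketch. Everything in it is illustrative rather than the project's actual code: the degree-of-freedom split, names, and actuator ordering are assumptions. The point is that one policy emits one action vector, and base, arm, and gripper commands are written in the same control step.

```python
import numpy as np

# Illustrative DoF split (hypothetical, not the project's actual robot):
# 2-DoF base (linear + angular velocity), 7-DoF arm, 1-DoF gripper.
BASE_DIM, ARM_DIM, GRIP_DIM = 2, 7, 1
ACTION_DIM = BASE_DIM + ARM_DIM + GRIP_DIM  # one action vector, one policy

def split_whole_body_action(action: np.ndarray):
    """Split a single policy output into per-subsystem commands.

    A decoupled stack would produce base_cmd and arm_cmd from two
    different planners on two different clocks; here both come from
    the same action vector at every control step, so the policy can
    trade base motion against arm motion continuously.
    """
    base_cmd = action[:BASE_DIM]                   # (v, omega) for the base
    arm_cmd = action[BASE_DIM:BASE_DIM + ARM_DIM]  # joint velocity targets
    grip_cmd = action[BASE_DIM + ARM_DIM:]         # open/close command
    return base_cmd, arm_cmd, grip_cmd

# In a MuJoCo control loop this would reduce to roughly:
#   data.ctrl[:] = action   # assuming actuators ordered [base, arm, gripper]
#   mujoco.mj_step(model, data)
```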
The approach combines two classes of robot learning that have historically been used separately. Each compensates for the other's weakness.
Reinforcement learning (RL): the agent explores in simulation, improving through trial and error against a reward signal. Strong at discovering motion; weak on sample efficiency, especially for sparse-reward, contact-rich tasks like door opening.
Behavior cloning (BC): the robot learns by imitating demonstrations. Strong at bootstrapping reasonable behavior quickly; weak at recovering once the robot drifts off the demonstrated distribution.
The combination — BC for a strong prior, RL to refine and recover — is the bet at the heart of this work.
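As an illustration of that recipe, and not the project's actual implementation (the architecture, optimizer, loss weights, and the DAPG-style BC anchor below are all placeholder assumptions), the two phases might look like this in PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical observation/action sizes for illustration only.
OBS_DIM, ACTION_DIM = 64, 10

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_pretrain(demo_obs: torch.Tensor, demo_actions: torch.Tensor, epochs: int = 100):
    """Phase 1: behavior cloning. Regress demonstrated actions so RL
    starts from a workable prior instead of random exploration."""
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(demo_obs), demo_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def combined_loss(rl_loss: torch.Tensor, demo_obs: torch.Tensor,
                  demo_actions: torch.Tensor, bc_weight: float = 0.1):
    """Phase 2: RL fine-tuning with a BC anchor (DAPG-style). The RL
    term (e.g., a PPO surrogate) discovers refinement and recovery
    behavior; the BC term keeps the policy near the demonstrations."""
    bc_loss = nn.functional.mse_loss(policy(demo_obs), demo_actions)
    return rl_loss + bc_weight * bc_loss
```

The small BC term in phase 2 is one common way to keep RL fine-tuning from destroying the demonstrated prior while it explores; whether this project used a regularizer like this or a plain two-stage handoff is exactly the kind of design choice worth discussing.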
// recordings from MuJoCo · learned whole-body policy in action
[TODO: did pure RL fail to learn the task at all? Did BC give the policy a workable starting point that RL could refine?]
[TODO: describe the qualitative gap — more reliable grasp? smoother motion? recovery from contact disturbances?]
[TODO: your finding on reward design — sparse vs dense, what the algorithm cared about most. Or replace with a more accurate finding.]
[TODO: one or two honest failures — a hyperparameter regime that diverged, a task variation the policy couldn't generalize to, an architectural choice you'd revisit.]
Happy to discuss methodology, design choices, or any part of the work in detail. Open to robotics, mobile manipulation, and applied research roles.