Whole-body manipulation,
learned from data and demonstration.

Most mobile manipulators control their base and arm as separate systems — leaving coordination and reachability on the table. This research uses reinforcement learning (RL) and behavior cloning (BC) to teach a mobile manipulator to coordinate its full body for contact-rich door opening in MuJoCo.

Platform
mobile_manipulator
Sim
MuJoCo
Methods
RL + BC
Task
door_opening

Why the base and the arm shouldn't be strangers.

A mobile manipulator that doesn't think with its whole body is just a robot arm carrying around a confused car.

In most industrial deployments, mobile manipulators run two largely independent stacks: a navigation planner drives the base to a pose, then a manipulation planner takes over once the base has stopped. It works — barely. It's also brittle.

Decoupled control means the arm can't compensate for a base that stopped 5 cm short. The base can't shift its weight to extend the arm's reach. Contact-rich tasks like door opening — where the robot must simultaneously pull a handle, navigate around a swinging door, and reposition mid-task — are exactly where this seam falls apart.
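To make that seam concrete, here is a minimal sketch (not this project's controller) of what coupling buys you: a planar toy with a 2-DOF base and a 2-link arm sharing one stacked Jacobian. Because base and arm columns sit in the same matrix, a damped-least-squares solver recruits the base automatically when a target lies beyond the arm's reach. All link lengths, gains, and names here are illustrative.

```python
import numpy as np

# Hypothetical planar model: a 2-DOF base (x, y) plus a 2-link arm.
L1, L2 = 0.4, 0.3  # arm link lengths (m)

def end_effector(q):
    """Forward kinematics: q = [base_x, base_y, joint1, joint2]."""
    bx, by, t1, t2 = q
    x = bx + L1 * np.cos(t1) + L2 * np.cos(t1 + t2)
    y = by + L1 * np.sin(t1) + L2 * np.sin(t1 + t2)
    return np.array([x, y])

def jacobian(q):
    """Stacked whole-body Jacobian: base columns next to arm columns."""
    _, _, t1, t2 = q
    j13 = -L1 * np.sin(t1) - L2 * np.sin(t1 + t2)
    j14 = -L2 * np.sin(t1 + t2)
    j23 = L1 * np.cos(t1) + L2 * np.cos(t1 + t2)
    j24 = L2 * np.cos(t1 + t2)
    return np.array([[1.0, 0.0, j13, j14],
                     [0.0, 1.0, j23, j24]])

def whole_body_step(q, target, gain=0.2):
    """One damped-least-squares step toward the target.
    Since base and arm share one Jacobian, the solver is free to
    move the base whenever the arm alone cannot reach."""
    err = target - end_effector(q)
    J = jacobian(q)
    dq = J.T @ np.linalg.solve(J @ J.T + 1e-6 * np.eye(2), gain * err)
    return q + dq

# Handle placed beyond the arm's 0.7 m reach from the current base pose:
q = np.array([0.0, 0.0, 0.1, 0.2])
target = np.array([1.5, 0.2])
for _ in range(200):
    q = whole_body_step(q, target)
print(np.linalg.norm(target - end_effector(q)))  # small residual: base moved
```

A decoupled stack would have declared the target unreachable from the initial base pose; the coupled solver never had to ask.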

The question driving this research: can we do better by treating the entire mobile manipulator as a single coupled system, and learning the coordination from data rather than hand-engineering it?

Two paradigms, one policy.

The approach combines two classes of robot learning that have historically been used separately. Each compensates for the other's weakness.

METHOD · RL

Reinforcement Learning

The agent explores in simulation, improving through trial and error against a reward signal. Strong at discovering motion; weak on sample efficiency, especially for sparse-reward, contact-rich tasks like door opening.

METHOD · BC

Behavior Cloning

Robot learns by imitating demonstrations. Strong at bootstrapping reasonable behavior quickly; weak at recovering when the robot drifts off the demonstrated distribution.

The combination — BC for a strong prior, RL to refine and recover — is the bet at the heart of this work.
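That bet can be shown on a toy problem. The sketch below is a hypothetical stand-in, not the project's training code: a linear-Gaussian policy on a 1-D reaching task, where behavior cloning fits a deliberately suboptimal scripted expert exactly, and a REINFORCE-style update with a small BC regularizer then refines the gain toward the true optimum. Every quantity (dynamics, expert, loss weights) is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def demo_action(obs):
    """Scripted 'expert': move halfway toward the goal at 0 (suboptimal)."""
    return -0.5 * obs

def rollout(w, n=64):
    """One-step episodes; reward is negative distance to goal after acting."""
    obs = rng.normal(size=n)
    noise = rng.normal(scale=0.1, size=n)   # Gaussian policy, sigma = 0.1
    act = w * obs + noise
    reward = -np.abs(obs + act)
    return obs, noise, reward

# --- Stage 1: behavior cloning gives a strong prior ------------------
obs_d = rng.normal(size=256)
act_d = demo_action(obs_d)
w = float(np.sum(obs_d * act_d) / np.sum(obs_d * obs_d))  # least squares
print("after BC:", round(w, 3))   # exactly the expert gain -0.5

# --- Stage 2: RL fine-tunes, with a BC loss as regularizer ----------
lam, lr = 0.1, 0.05
for _ in range(300):
    obs, noise, reward = rollout(w)
    # REINFORCE: for a Gaussian policy, d log pi / dw = noise * obs / sigma^2
    g_rl = np.mean((reward - reward.mean()) * noise * obs / 0.01)
    g_bc = np.mean(2 * (w * obs - demo_action(obs)) * obs)  # pull toward demos
    w += lr * (g_rl - lam * g_bc)
print("after RL:", round(w, 3))   # nudged from -0.5 toward the optimum near -1
```

BC alone stops at the expert's -0.5; RL alone would have to find the sign and scale of the gain from scratch. Together, RL starts from a working policy and only has to refine it, which is the whole argument in miniature.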

The setup, in plain language.

platform
Wheeled mobile base + serial-link arm, modeled as a unified kinematic system in simulation.
simulator
MuJoCo — chosen for its accurate contact dynamics, essential for door opening where the entire task is defined by what happens at the contact patch between gripper and handle.
task
Approach door → grasp handle → coordinate base + arm motion to open → navigate through. A task that decoupled controllers struggle with by design.
rl_algorithm
[TODO: PPO / SAC / custom — e.g. "PPO with shaped dense reward combining handle-grasp distance and door-angle progress"]
demos_for_bc
[TODO: how demos collected — scripted trajectories / teleoperation / motion capture]
bc_rl_strategy
[TODO: BC pretraining + RL fine-tuning / joint training with auxiliary BC loss / residual policy]
evaluation
[TODO: success rate / sample efficiency / generalization across N door variations]
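The shaped dense reward mentioned under rl_algorithm, combining handle-grasp distance with door-angle progress, can be sketched concretely. The term weights, target angle, and bonus values below are hypothetical placeholders, not tuned values from this project.

```python
import numpy as np

def shaped_reward(ee_pos, handle_pos, door_angle, prev_door_angle,
                  grasped, target_angle=np.pi / 2):
    """Dense reward with three hedged terms (all weights illustrative):
    1. approach: shrink the gripper-to-handle distance,
    2. progress: reward each radian of new door opening while grasping,
    3. bonus: sparse terms for securing the grasp and fully opening."""
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(handle_pos))
    r_approach = -0.5 * dist                                       # always active
    r_progress = 5.0 * (door_angle - prev_door_angle) if grasped else 0.0
    r_bonus = 1.0 * grasped + 10.0 * (door_angle >= target_angle)  # sparse spikes
    return r_approach + r_progress + r_bonus

# Closing the handle gap pays even before any grasp:
print(shaped_reward([0.1, 0.0, 0.0], [0.0, 0.0, 0.0], 0.0, 0.0, False))
# Opening the door while grasping pays more than holding it still:
print(shaped_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.3, 0.1, True))
```

The gradient this creates is the point: the policy gets signal on every step of the approach, not just at the rare moment the door swings fully open.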

What actually happened when policy met door.

[XX]%
success_rate
door_opening_task
[N]×
sample_efficiency
vs_rl_only_baseline
[N]
door_variations
generalized_to

Visual results

// recordings from MuJoCo · learned whole-body policy in action

whole_body_policy.mp4
approach → grasp → open → navigate-through
decoupled_baseline.mp4
classical separate base + arm control
📈
learning_curves.png
RL-only vs BC-only vs combined approach
ablation_study.png
contribution of BC pretraining vs RL fine-tuning

Four things this research shows.

01

BC pretraining = the difference between learning to open the door and learning nothing.

[TODO: did pure RL fail to learn the task at all? Did BC give the policy a workable starting point that RL could refine?]

02

Whole-body coordination beats decoupled control on contact-rich tasks.

[TODO: describe the qualitative gap — more reliable grasp? smoother motion? recovery from contact disturbances?]

03

Reward shaping matters more than algorithm choice.

[TODO: your finding on reward design — sparse vs dense, what the algorithm cared about most. Or replace with a more accurate finding.]

04

What didn't work — and why that's the most useful page in any research log.

[TODO: one or two honest failures — a hyperparameter regime that diverged, a task variation the policy couldn't generalize to, an architectural choice you'd revisit.]

Built with.

Simulation

MuJoCo · custom mobile manipulator model · door environment

Learning

PyTorch · [RL library] · [BC framework]

Lang & Tools

Python · NumPy · Matplotlib · Weights & Biases · Linux · Git

Want to dig deeper?

Happy to discuss methodology, design choices, or any part of the work in detail. Open to robotics, mobile manipulation, and applied research roles.