Most mobile manipulators control their base and arm as separate systems, leaving coordination and reachability on the table. This research uses reinforcement learning (RL) and behavior cloning (BC) to teach a mobile manipulator to coordinate its full body for contact-rich door opening in MuJoCo.
A mobile manipulator that doesn't think with its whole body is just a robot arm carrying around a confused car.
In most industrial deployments, mobile manipulators run two largely independent stacks: a navigation planner drives the base to a pose, then a manipulation planner takes over once the base has stopped. It works — barely. It's also brittle.
Decoupled control means the arm can't compensate for a base that stopped 5 cm short. The base can't shift its weight to extend the arm's reach. Contact-rich tasks like door opening — where the robot must simultaneously pull a handle, navigate around a swinging door, and reposition mid-task — are exactly where this seam falls apart.
The question driving this research: can we do better by treating the entire mobile manipulator as a single coupled control problem, and learning the coordination from data rather than hand-engineering it?
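To make "single coupled problem" concrete at the interface level, here is a minimal sketch. Everything in it is illustrative rather than the project's actual code: the degree-of-freedom split, names, and actuator ordering are assumptions. The point is that one policy emits one action vector, and base, arm, and gripper commands are written in the same control step.

```python
import numpy as np

# Illustrative DoF split (hypothetical, not the project's actual robot):
# 2-DoF base (linear + angular velocity), 7-DoF arm, 1-DoF gripper.
BASE_DIM, ARM_DIM, GRIP_DIM = 2, 7, 1
ACTION_DIM = BASE_DIM + ARM_DIM + GRIP_DIM  # one action vector, one policy

def split_whole_body_action(action: np.ndarray):
    """Split a single policy output into per-subsystem commands.

    A decoupled stack would produce base_cmd and arm_cmd from two
    different planners on two different clocks; here both come from
    the same action vector at every control step, so the policy can
    trade base motion against arm motion continuously.
    """
    base_cmd = action[:BASE_DIM]                   # (v, omega) for the base
    arm_cmd = action[BASE_DIM:BASE_DIM + ARM_DIM]  # joint velocity targets
    grip_cmd = action[BASE_DIM + ARM_DIM:]         # open/close command
    return base_cmd, arm_cmd, grip_cmd

# In a MuJoCo control loop this would reduce to roughly:
#   data.ctrl[:] = action   # assuming actuators ordered [base, arm, gripper]
#   mujoco.mj_step(model, data)
```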
The approach combines two classes of robot learning that have historically been used separately. Each compensates for the other's weakness.
Reinforcement learning (RL): the agent explores in simulation, improving through trial and error against a reward signal. Strong at discovering motion; weak on sample efficiency, especially for sparse-reward, contact-rich tasks like door opening.
Behavior cloning (BC): the robot learns by imitating demonstrations. Strong at bootstrapping reasonable behavior quickly; weak at recovering once the robot drifts off the demonstrated distribution.
The combination — BC for a strong prior, RL to refine and recover — is the bet at the heart of this work.
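As an illustration of that recipe, and not the project's actual implementation (the architecture, optimizer, loss weights, and the DAPG-style BC anchor below are all placeholder assumptions), the two phases might look like this in PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical observation/action sizes for illustration only.
OBS_DIM, ACTION_DIM = 64, 10

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def bc_pretrain(demo_obs: torch.Tensor, demo_actions: torch.Tensor, epochs: int = 100):
    """Phase 1: behavior cloning. Regress demonstrated actions so RL
    starts from a workable prior instead of random exploration."""
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(demo_obs), demo_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def combined_loss(rl_loss: torch.Tensor, demo_obs: torch.Tensor,
                  demo_actions: torch.Tensor, bc_weight: float = 0.1):
    """Phase 2: RL fine-tuning with a BC anchor (DAPG-style). The RL
    term (e.g., a PPO surrogate) discovers refinement and recovery
    behavior; the BC term keeps the policy near the demonstrations."""
    bc_loss = nn.functional.mse_loss(policy(demo_obs), demo_actions)
    return rl_loss + bc_weight * bc_loss
```

The small BC term in phase 2 is one common way to keep RL fine-tuning from destroying the demonstrated prior while it explores; whether this project used a regularizer like this or a plain two-stage handoff is exactly the kind of design choice worth discussing.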
// recordings from MuJoCo · learned whole-body policy in action
[TODO: did pure RL fail to learn the task at all? Did BC give the policy a workable starting point that RL could refine?]
[TODO: describe the qualitative gap — more reliable grasp? smoother motion? recovery from contact disturbances?]
[TODO: your finding on reward design — sparse vs dense, what the algorithm cared about most. Or replace with a more accurate finding.]
[TODO: one or two honest failures — a hyperparameter regime that diverged, a task variation the policy couldn't generalize to, an architectural choice you'd revisit.]
Happy to discuss methodology, design choices, or any part of the work in detail. Open to robotics, mobile manipulation, and applied research roles.