ManiSkill Defaults

These examples run standard ManiSkill benchmark tasks with PPO policies and optional CAGE enhancement. They demonstrate how ManiDreams wraps an existing RL policy with uncertainty-aware action selection.

Common Architecture

All three tasks share the same component structure:

| Component | Implementation | Details |
| --- | --- | --- |
| TSIP Backend | `DRISBackend` | ManiSkill vectorized env with DRIS state conversion |
| TSIP | `SimulationBasedTSIP` | Parallel rollout across DRIS copies |
| Cage | `DRISCage` | Goal-conditioned cost with variance penalty (`lambda_var=0.1`) |
| Solver | `PolicySampler` | Samples from PPO policy distribution |
| Sampler | `PPOPolicyAdapter` | Wraps trained PPO `ActorCritic` for stochastic sampling |
| Executor | ManiSkill `VectorEnv` | Standard gym vectorized environment |
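
The `DRISCage` row describes a goal-conditioned cost with a variance penalty. The following is a minimal sketch of what such a cost could look like, assuming it is the mean goal distance across DRIS copies plus `lambda_var` times the variance of those distances. The function name `cage_cost` and the exact formula are illustrative assumptions, not the ManiDreams implementation.

```python
import numpy as np

def cage_cost(final_positions, goal, lambda_var=0.1):
    """Hypothetical cost: mean goal distance plus a variance penalty.

    final_positions: (n_copies, dim) object positions after rolling out one
        candidate action in every DRIS copy.
    goal: (dim,) target position.
    """
    final_positions = np.asarray(final_positions, dtype=float)
    goal = np.asarray(goal, dtype=float)
    # Distance of each copy's outcome to the goal.
    dists = np.linalg.norm(final_positions - goal, axis=1)
    # Penalize candidates whose outcomes disagree across copies.
    return float(dists.mean() + lambda_var * dists.var())

goal = [0.0, 0.0]
tight = [[0.1, 0.0], [0.1, 0.0]]   # same outcome in both copies
spread = [[0.0, 0.0], [0.2, 0.0]]  # same mean distance, more spread
print(cage_cost(tight, goal) < cage_cost(spread, goal))
```

Under this assumed form, two candidates with the same average distance to the goal are separated by how consistently they behave across the perturbed copies, which is the role the variance penalty plays in the table above.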

Each task runs in one of two modes:

Baseline (`--num_samples 0`): executes the policy's deterministic output, with no CAGE evaluation.
CAGE (`--num_samples N`): samples N actions from the policy distribution, evaluates each with TSIP and `DRISCage`, and selects the best one.
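
The two modes above can be sketched as a single selection function. This is an illustration only: `select_action` and `evaluate_cost` are hypothetical names, `evaluate_cost` stands in for the TSIP rollout plus `DRISCage` scoring, "best" is taken to mean lowest cost, and the policy distribution is assumed to be a diagonal Gaussian.

```python
import numpy as np

def select_action(policy_mean, policy_std, evaluate_cost, num_samples, rng):
    """Return the action to execute for one step.

    num_samples == 0 -> baseline: return the deterministic policy output.
    num_samples  > 0 -> CAGE: sample candidates, score each, keep the cheapest.
    """
    policy_mean = np.asarray(policy_mean, dtype=float)
    if num_samples == 0:
        return policy_mean
    # Assumption: candidates are drawn from a diagonal Gaussian policy.
    candidates = rng.normal(
        policy_mean, policy_std, size=(num_samples, policy_mean.size)
    )
    # evaluate_cost is a placeholder for TSIP rollout + cage evaluation.
    costs = np.array([evaluate_cost(a) for a in candidates])
    return candidates[int(np.argmin(costs))]

rng = np.random.default_rng(0)
mean = np.array([0.0, 0.0])
target = np.array([1.0, 1.0])
cost = lambda a: float(np.sum((a - target) ** 2))

baseline_action = select_action(mean, 0.5, cost, num_samples=0, rng=rng)
cage_action = select_action(mean, 0.5, cost, num_samples=16, rng=rng)
```

With `num_samples=0` the function never calls `evaluate_cost`, which mirrors the baseline mode skipping CAGE evaluation entirely.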

PushCube

Push a cube to a target location on a tabletop.

# Baseline
python examples/tasks/maniskill_defaults/main.py --task PushCube-v1

# CAGE mode
python examples/tasks/maniskill_defaults/main.py --task PushCube-v1 \
    --num_samples 16 --n_dris_copies 16

PickCube

Pick up a cube and move it to a goal position.

# Baseline
python examples/tasks/maniskill_defaults/main.py --task PickCube-v1

# CAGE mode with perturbation
python examples/tasks/maniskill_defaults/main.py --task PickCube-v1 \
    --num_samples 16 --n_dris_copies 8 \
    --lambda_var 0.2 \
    --pose_noise 0.05 0.05 0.0 0.0 0.0 0.1
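
The `--pose_noise` flag in the command above takes six values, one per pose component. As a rough sketch of how such values could drive per-copy perturbations, the snippet below assumes each value is a symmetric bound on a uniformly sampled offset. The sampling distribution and the helper name `sample_pose_offsets` are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def sample_pose_offsets(pose_noise, n_dris_copies, rng):
    """Draw one pose offset per DRIS copy.

    Assumption: each entry of pose_noise bounds a uniform offset in
    [-bound, +bound]; a bound of 0 leaves that component unperturbed.
    """
    bounds = np.asarray(pose_noise, dtype=float)
    # Sample in [-1, 1) and scale each column by its bound.
    unit = rng.uniform(-1.0, 1.0, size=(n_dris_copies, bounds.size))
    return unit * bounds

rng = np.random.default_rng(0)
pose_noise = [0.05, 0.05, 0.0, 0.0, 0.0, 0.1]
offsets = sample_pose_offsets(pose_noise, n_dris_copies=8, rng=rng)
print(offsets.shape)  # (8, 6)
```

Components with a zero bound stay fixed across all copies, so only the first two and the last pose components vary in this example.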

PushT

Push a T-shaped block to match a target pose.

# Baseline
python examples/tasks/maniskill_defaults/main.py --task PushT-v0

# CAGE mode
python examples/tasks/maniskill_defaults/main.py --task PushT-v0 \
    --num_samples 8 --n_dris_copies 4

Key Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--task` | `PushCube-v1` | ManiSkill task ID |
| `--num_samples` | `0` | Number of candidate actions (0 = baseline) |
| `--n_dris_copies` | `16` | DRIS copies per evaluation environment |
| `--action_chunk` | `2` | Multi-step action chunking |
| `--lambda_var` | `0.1` | Variance penalty weight in DRISCage |
| `--pose_noise` | `[0.02,0.02,0,0,0,0.15]` | Pose perturbation for domain randomization |
| `--physics_noise` | `[0.2,0.3]` | Mass and friction perturbation |
| `--checkpoint` | auto-detected | Path to PPO model checkpoint |
| `--num_episodes` | `20` | Number of evaluation episodes |
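
For readers who want to reproduce this interface in their own script, the table above maps naturally onto `argparse`. The sketch below is an assumption about how such a parser might be declared, not the actual contents of `main.py`; in particular, the use of `nargs` for the list-valued flags is inferred from the space-separated `--pose_noise` values in the PickCube command.

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(description="ManiSkill defaults (sketch)")
    p.add_argument("--task", default="PushCube-v1", help="ManiSkill task ID")
    p.add_argument("--num_samples", type=int, default=0,
                   help="candidate actions per step (0 = baseline)")
    p.add_argument("--n_dris_copies", type=int, default=16)
    p.add_argument("--action_chunk", type=int, default=2)
    p.add_argument("--lambda_var", type=float, default=0.1)
    # Six space-separated floats, as in the PickCube example command.
    p.add_argument("--pose_noise", type=float, nargs=6,
                   default=[0.02, 0.02, 0.0, 0.0, 0.0, 0.15])
    # Two floats: mass and friction perturbation.
    p.add_argument("--physics_noise", type=float, nargs=2, default=[0.2, 0.3])
    # None stands in for "auto-detect" in this sketch.
    p.add_argument("--checkpoint", default=None)
    p.add_argument("--num_episodes", type=int, default=20)
    return p

args = build_parser().parse_args([
    "--task", "PickCube-v1", "--num_samples", "16", "--n_dris_copies", "8",
    "--lambda_var", "0.2",
    "--pose_noise", "0.05", "0.05", "0.0", "0.0", "0.0", "0.1",
])
mode = "CAGE" if args.num_samples > 0 else "baseline"
print(mode, args.pose_noise)
```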

When `--checkpoint` is not given, checkpoints are auto-detected from `examples/samplers/maniskill_defaults/ckpts/`.
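
The auto-detection rule itself is not described here. As one possible approach, the sketch below looks for the most recently modified file in the checkpoint directory whose name contains the task ID. The `.pt` extension, the filename pattern, and the helper name `find_checkpoint` are all hypothetical.

```python
from pathlib import Path

DEFAULT_CKPT_DIR = "examples/samplers/maniskill_defaults/ckpts"

def find_checkpoint(task, ckpt_dir=DEFAULT_CKPT_DIR, explicit=None):
    """Return an explicit checkpoint path, or guess one for the task.

    Assumption: checkpoint files contain the task ID in their name and
    end in ".pt"; the newest matching file wins.
    """
    if explicit is not None:
        return Path(explicit)
    matches = sorted(
        Path(ckpt_dir).glob(f"*{task}*.pt"),
        key=lambda p: p.stat().st_mtime,
    )
    if not matches:
        raise FileNotFoundError(f"no checkpoint for {task} in {ckpt_dir}")
    return matches[-1]
```

Raising an error when nothing matches, rather than silently falling back to an untrained policy, makes a missing checkpoint visible before evaluation starts.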