ManiDreams: Robust Object Manipulation via Uncertainty-aware Intuitive Physics

Abstract

Dynamics models, whether simulators or learned world models, have long been central to robotic manipulation, but most of these models focus on minimizing prediction error rather than confronting a more fundamental challenge: real-world manipulation is inherently uncertain. We argue that robust manipulation under uncertainty is fundamentally an integration problem: uncertainties must be represented, propagated, and constrained within the planning loop, not merely suppressed during training.

We present and open-source ManiDreams, a modular framework for uncertainty-aware manipulation planning over intuitive physics models. It realizes this integration through composable abstractions for distributional state representation, backend-agnostic dynamics prediction, and declarative constraint specification for action optimization. The framework explicitly addresses three sources of uncertainty: perceptual, parametric, and structural. It wraps any base policy with a sample-predict-constrain loop that evaluates candidate actions against distributional outcomes, adding robustness without retraining. Experiments on default ManiSkill tasks show that ManiDreams maintains robust performance under various perturbations, where the RL baseline degrades significantly. Runnable examples on pushing, picking, catching, and real-world deployment demonstrate flexibility across different policies, optimizers, physics backends, and executors.

Real2Sim with DRIS

Requires only a stereo stream and generalizes to unknown objects without any pre-modeling. Runs in real time (15 FPS) on an RTX 3070.
Physics backend: Newton v1.0.0.
Bounding box estimation: Fast-FoundationStereo + SAM2.

Pipeline

The cage-constrained planning pipeline: generate candidate actions → parallel forward prediction via TSIP → cage evaluation & validation → execute the best valid action.
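
The four pipeline stages can be sketched as a single planning step. This is a minimal illustration, not the ManiDreams API: the function names (plan_step, predict, cage_cost, cage_valid) and the 1D toy dynamics are hypothetical, the rule that a candidate must keep every predicted instance inside the cage is one possible validity criterion assumed here, and the predictions run sequentially rather than in parallel.

```python
def plan_step(instances, candidates, predict, cage_cost, cage_valid):
    """One sample-predict-constrain step (illustrative sketch only)."""
    best_action, best_cost = None, float("inf")
    for action in candidates:
        # Forward-predict every randomized instance under this candidate.
        outcomes = [predict(inst, action) for inst in instances]
        # Assumed validity rule: all predicted instances must stay caged.
        if not all(cage_valid(o) for o in outcomes):
            continue
        # Score the candidate by its mean cage cost over the instance set.
        cost = sum(cage_cost(o) for o in outcomes) / len(outcomes)
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action, best_cost


# Toy setup: a 1D object whose response to a push depends on an
# uncertain gain. Each instance is a (position, gain) pair.
CAGE_CENTER, CAGE_RADIUS = 1.0, 0.5

def predict(instance, action):
    position, gain = instance
    return (position + gain * action, gain)

def cage_cost(instance):
    return abs(instance[0] - CAGE_CENTER)

def cage_valid(instance):
    return cage_cost(instance) <= CAGE_RADIUS

instances = [(0.0, gain) for gain in (0.8, 1.0, 1.2)]
candidates = [0.0, 0.5, 1.0, 1.5]
best_action, best_cost = plan_step(
    instances, candidates, predict, cage_cost, cage_valid
)
print(best_action, round(best_cost, 3))
```

In this toy case only the push of 1.0 keeps all three instances inside the cage, so it is the action that would be executed. If no candidate is valid, the sketch returns None and a caller would need a fallback.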

Architecture & Core Concepts

Three-layer modular architecture: abstract interfaces → concrete implementations → task-specific integrations.

DRIS

Domain-Randomized Instance Set

Universal state representation carrying observation data plus domain-randomization context. Supports state vectors, images, point clouds, or any combination.
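
As a rough picture of what such a representation might hold, the sketch below pairs an observation with a sampled randomization context. The class name Instance, the helper make_instance_set, and the parameter names friction and mass are all hypothetical and chosen for illustration; the actual DRIS class in the library may be organized differently.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Observation payload; keys such as "state", "image", or "points"
    # are illustrative and any combination may be present.
    observation: dict
    # Randomized parameters this instance was generated under.
    context: dict = field(default_factory=dict)


def make_instance_set(observation, param_ranges, m, seed=0):
    """Build m instances sharing one observation, each with its own
    uniformly sampled parameter values (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        Instance(
            observation=dict(observation),
            context={
                name: rng.uniform(low, high)
                for name, (low, high) in param_ranges.items()
            },
        )
        for _ in range(m)
    ]


instance_set = make_instance_set(
    observation={"state": [0.1, 0.2, 0.0]},
    param_ranges={"friction": (0.3, 0.9), "mass": (0.05, 0.2)},
    m=8,
)
print(len(instance_set), sorted(instance_set[0].context))
```

Keeping the randomization context next to the observation lets a forward model apply each instance's own parameters during prediction.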

TSIP

Task-Specific Intuitive Physics

Forward model predicting next state given current state and action. Supports simulation-based (ManiSkill, Newton, etc.) and learning-based (diffusion model) backends.
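
A backend-agnostic forward model can be expressed as a small abstract interface with interchangeable implementations. The sketch below is an assumption about how such an interface could look, not the library's actual class hierarchy: ForwardModel, PointPushModel, and rollout are hypothetical names, and the analytic point model stands in for a simulator or learned backend.

```python
from abc import ABC, abstractmethod


class ForwardModel(ABC):
    """Hypothetical interface: predict the next state from state and action."""

    @abstractmethod
    def predict(self, state, action):
        ...


class PointPushModel(ForwardModel):
    """Toy analytic backend: the object moves by gain * action per step."""

    def __init__(self, gain=1.0):
        self.gain = gain

    def predict(self, state, action):
        return tuple(s + self.gain * a for s, a in zip(state, action))


def rollout(model, state, actions):
    """Apply a sequence of actions and return the visited states."""
    trajectory = [state]
    for action in actions:
        state = model.predict(state, action)
        trajectory.append(state)
    return trajectory


trajectory = rollout(PointPushModel(gain=0.5), (0.0, 0.0), [(1.0, 0.0), (1.0, 2.0)])
print(trajectory[-1])
```

Because rollout depends only on the predict method, a simulation-based or learning-based backend could be swapped in without changing the planning code.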

Cage

Spatial Constraint Evaluator

Virtual boundary that provides a continuous cost and a validity check for constraint satisfaction. Supports time-varying parameters, including deformation and custom trajectories.
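
To make the idea of a time-varying boundary concrete, the sketch below defines a circular cage whose center moves linearly from a start to a goal while its radius shrinks. The circular shape, the linear schedule, the normalized cost, and the class name CircularCage are illustrative assumptions rather than the library's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class CircularCage:
    start: tuple
    goal: tuple
    radius_start: float
    radius_end: float
    horizon: int

    def params(self, t):
        """Center and radius at step t, interpolated linearly over the horizon."""
        alpha = min(max(t / self.horizon, 0.0), 1.0)
        center = tuple(s + alpha * (g - s) for s, g in zip(self.start, self.goal))
        radius = self.radius_start + alpha * (self.radius_end - self.radius_start)
        return center, radius

    def cost(self, position, t):
        """Distance to the cage center, normalized by the current radius."""
        center, radius = self.params(t)
        return math.dist(position, center) / radius

    def is_valid(self, position, t):
        """The object counts as caged when its normalized cost is at most 1."""
        return self.cost(position, t) <= 1.0


cage = CircularCage(
    start=(0.0, 0.0), goal=(1.0, 0.0),
    radius_start=0.4, radius_end=0.2, horizon=10,
)
print(cage.is_valid((0.5, 0.2), t=5), cage.is_valid((0.5, 0.4), t=5))
```

The continuous cost ranks candidates that all satisfy the constraint, while the boolean check filters out candidates that violate it.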

Solver

Action Selection via Sampling & Optimization

Generates and evaluates candidate actions. Combines samplers (PolicySampler, GaussianSampler) with optimizers (MPPI, Geometric, MPC) to propose and refine actions under cage constraints.
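
For the MPPI optimizer, the core refinement step is a cost-weighted average of sampled actions with exponential weights. The function below is a generic single-step sketch of that weighting, assuming the costs have already been computed (for example, from cage costs); the name mppi_update and the temperature parameter are illustrative, and the library's optimizer may differ in detail.

```python
import math


def mppi_update(actions, costs, temperature=1.0):
    """Return the exponentially cost-weighted mean of candidate actions."""
    # Subtract the minimum cost for numerical stability; the normalized
    # weights are unchanged by this shift.
    min_cost = min(costs)
    weights = [math.exp(-(c - min_cost) / temperature) for c in costs]
    total = sum(weights)
    dim = len(actions[0])
    return [
        sum(w * a[i] for w, a in zip(weights, actions)) / total
        for i in range(dim)
    ]


# Equal costs give the plain average of the candidates.
print(mppi_update([[0.0], [1.0]], [0.0, 0.0]))
# A low temperature concentrates the weight on the cheapest candidate.
print(mppi_update([[0.0], [1.0]], [0.0, 10.0], temperature=0.1))
```

A lower temperature makes the update behave like picking the single best sample, while a higher temperature blends candidates more evenly.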

Runnable Examples

Cage-constrained manipulation across four tasks with different solver and TSIP configurations.

Object Pushing

Simulation-based TSIP

Object Pushing

Learning-based TSIP

Ball Catching

RL policy sampler

Card Picking

MPPI optimizer

Real Robot Experiments with Diffusion-based TSIP

Random Object Picking from Clutter

Push-then-pick strategy with real-time DRIS via SAM2

Flat Object Scooping from Clutter

Push-to-corner-then-scoop strategy with real-time DRIS via SAM2

Default ManiSkill Tasks

Evaluation on standard ManiSkill benchmarks (top: simulation-based TSIP, bottom: executor).

PushCube

Push a cube to a target location

PickCube

Pick up a cube and move to a goal position

PushT

Push a T-shaped block to a target pose

Benchmark Results

Comparison of cage-constrained methods against baselines across manipulation tasks.

Ablation Study

DRIS instances m	1	4	8	16	32
Success rate (%)	58	72	82	85	86

Solver samples N	1	4	8	16	32
Success rate (%)	52	71	82	87	88

Distribution width	Narrow	Medium	Wide
Success rate (%)	74	82	76

Computation Overhead

Wall-clock time and GPU memory overhead of cage-constrained planning across configurations.

ManiDreams: An Open-Source Library for Robust Object Manipulation via Uncertainty-aware Task-specific Intuitive Physics

ManiDreams maintains a time-varying constraint (cage) around target objects, sampling and evaluating candidate actions via parallel forward simulation for robust execution.