RoboVerse Architecture Review & Improvement Roadmap#

Document Version: 1.1
Last Updated: 2026-05-26
Status: Active Development

This document provides a comprehensive architecture review of the RoboVerse codebase and outlines a structured improvement roadmap. It is intended for core maintainers and contributors who want to understand the current state of the codebase and contribute to its improvement.

2026-05-26 update: Four of the originally listed P0/P1 items have landed as code + regression tests. They are kept in the document below for historical context, but each is now annotated STATUS: FIXED with a pointer to the test file that pins the fix in place. Forward / backward compatibility statements are included so existing users can adopt the changes without surprise: every fix is either a pure warning, a stale-cache eviction, or an information upgrade to a previously silent failure mode.


Table of Contents#

  1. Executive Summary

  2. Architecture Overview

  3. Identified Issues

  4. Improvement Roadmap

  5. Detailed TODO List

  6. Implementation Guidelines

  7. Testing Strategy


Executive Summary#

Strengths#

RoboVerse is a well-architected multi-simulator robotics framework with several notable strengths:

| Aspect | Assessment | Details |
| --- | --- | --- |
| Modularity | Excellent | Clear separation between metasim, roboverse_pack, and roboverse_learn |
| Simulator Abstraction | Good | Unified interface across MuJoCo, IsaacSim, SAPIEN, PyBullet, Genesis |
| Configuration System | Good | Type-safe @configclass decorator with validation |
| Domain Randomization | Excellent | Comprehensive DR system with hybrid simulation support |
| Documentation | Good | Extensive tutorials and API documentation |

Critical Issues Requiring Attention#

| Priority | Issue | Impact | Effort | Status |
| --- | --- | --- | --- | --- |
| P0 | State cache consistency bug | Data corruption | Low | ✅ FIXED (see Issue 1) |
| P0 | set_states silent-drop of control-input keys | Silent no-op | Low | ✅ FIXED (see Issue 8) |
| P0 | set_dof_targets cache staleness | Silent stale state | Low | ✅ FIXED (see Issue 9) |
| P0 | Test coverage severely lacking | Reliability | High | 🟡 PARTIAL (see Issue 4) |
| P1 | Parallel-sim error handling | Silent worker death | Medium | ✅ FIXED (see Issue 7) |
| P1 | Backend interface drift (@abstractmethod gaps) | Late-bind failures | Low | ✅ FIXED via contract test (see Issue 2) |
| P1 | Configuration system fragmentation | Usability | Medium | ⏳ Open |
| P1 | Environment creation interface inconsistency | Usability | Medium | ⏳ Open |
| P2 | Code quality issues | Maintainability | Low | ⏳ Open |


Architecture Overview#

Module Structure#

RoboVerse/
├── metasim/                 # Core simulation framework
│   ├── sim/                 # Simulator handlers (MuJoCo, Isaac, SAPIEN, etc.)
│   ├── scenario/            # Scene configuration (robots, objects, cameras)
│   ├── task/                # Task environment abstraction
│   ├── randomization/       # Domain randomization system
│   ├── queries/             # Extended query system (contacts, sensors)
│   └── utils/               # Utilities (configclass, math, state conversion)
│
├── roboverse_pack/          # Assets and task definitions
│   ├── robots/              # 50+ robot configurations
│   ├── tasks/               # 200+ task environments
│   ├── scenes/              # Scene configurations
│   └── queries/             # Custom query implementations
│
├── roboverse_learn/         # Learning algorithms
│   ├── il/                  # Imitation learning (ACT, Diffusion Policy, etc.)
│   ├── rl/                  # Reinforcement learning (PPO, TD3, SAC)
│   └── vla/                 # Vision-Language-Action models
│
└── generation/              # Asset generation and conversion tools

Core Abstractions#

┌──────────────────────────────────────────────────────────────────┐
│                          BaseSimHandler                          │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐         │
│  │ MujocoHandler │  │ IsaacHandler  │  │ SAPIENHandler │  ...    │
│  └───────────────┘  └───────────────┘  └───────────────┘         │
└──────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                           BaseTaskEnv                            │
│  ┌───────────────┐  ┌────────────────┐  ┌──────────────────┐     │
│  │ PickPlaceTask │  │ LocomotionTask │  │ ManipulationTask │  ...│
│  └───────────────┘  └────────────────┘  └──────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                           ScenarioCfg                            │
│  robots[] + objects[] + cameras[] + lights[] + sim_params        │
└──────────────────────────────────────────────────────────────────┘

Identified Issues#

Issue 1: State Cache Consistency Bug (P0 - FIXED 2026-05-26)#

Location: metasim/sim/base.py

Problem (historical): The state cache was modified in-place when switching between tensor and dict modes, causing data corruption.

Status: ✅ FIXED. BaseSimHandler now keeps two independent caches (_tensor_state_cache / _dict_state_cache). On a miss for the requested mode it lazily converts from the other. set_states, set_dof_targets and simulate all call _invalidate_state_caches.

Tests pinning the fix:

  • metasim/test/sim/test_state_modes.py::test_state_cache_mode_independence: integration (mujoco / sapien3 / isaacsim / isaacgym / newton)

  • metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache: integration; specifically catches the staleness regression described in Issue 9

  • metasim/test/test_backend_contract_general.py::test_set_states_invalidates_cache_on_all_backends: static AST check on base.py; runs in -k general

  • metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit: unit-level stub guard

Forward / backward compat: Pure bugfix. Callers that previously received corrupted data now receive consistent data; there is no API surface change. The lazy-conversion path was already the documented intent; only the in-place mutation was wrong.


Issue 2: Incomplete Abstract Method Declarations (P1 - FIXED via contract test 2026-05-26)#

Location: metasim/sim/base.py

Status: ✅ FIXED in spirit. _set_dof_targets is now actually decorated with @abstractmethod. Two other contract methods (_get_joint_names / _get_body_names) remain commented out because flipping the decorator would break PyrepHandler / partial SinglePybulletHandler / partial GenesisHandler at import time; those backends genuinely do not implement them yet.

Instead of breaking imports, metasim/test/test_backend_contract_general.py statically asserts that every concrete BaseSimHandler subclass overrides every documented contract method, with xfail markers for known gaps. When a backend catches up, its xfail flips to xpass, which is the signal to enable the decorator for real.
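The contract check described above can be sketched as follows. This is a minimal illustration, not the contents of test_backend_contract_general.py: it uses runtime introspection rather than the static AST walk the real test performs, and BaseHandler, CompleteHandler, PartialHandler, CONTRACT_METHODS and KNOWN_GAPS are all hypothetical stand-ins.

```python
# Hypothetical contract list; the real test may track a different set.
CONTRACT_METHODS = ("_set_states", "_set_dof_targets", "_get_joint_names")


class BaseHandler:
    """Stand-in for BaseSimHandler: default bodies raise instead of being abstract."""

    def _set_states(self, states):
        raise NotImplementedError

    def _set_dof_targets(self, actions):
        raise NotImplementedError

    def _get_joint_names(self, obj_name):
        raise NotImplementedError


class CompleteHandler(BaseHandler):
    def _set_states(self, states):
        pass

    def _set_dof_targets(self, actions):
        pass

    def _get_joint_names(self, obj_name):
        return []


class PartialHandler(BaseHandler):
    # Deliberately leaves _get_joint_names unimplemented.
    def _set_states(self, states):
        pass

    def _set_dof_targets(self, actions):
        pass


def missing_overrides(cls, base, methods=CONTRACT_METHODS):
    """Return the contract methods that cls still inherits unchanged from base."""
    return [m for m in methods if getattr(cls, m) is getattr(base, m)]


# Allowlist of known gaps, playing the role of the xfail markers.
KNOWN_GAPS = {"PartialHandler": {"_get_joint_names"}}


def check_contract(cls):
    """Return (unexpected gaps, stale allowlist entries) for one backend class."""
    missing = set(missing_overrides(cls, BaseHandler))
    allowed = KNOWN_GAPS.get(cls.__name__, set())
    return missing - allowed, allowed - missing
```

A non-empty first element corresponds to a failing test for a new or regressed backend; a non-empty second element corresponds to an xpass, meaning the allowlist entry can be removed.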

Forward / backward compat: Pure additive contract enforcement. No existing backend was broken. New backends added later will get a clear failing test on day one if they forget a method.


Issue 3: Configuration System Fragmentation (P1)#

Problem: Three different configuration systems are used across modules:

| Module | System | Tools |
| --- | --- | --- |
| metasim | @configclass | Python dataclass + custom wrapper |
| roboverse_learn/il | Hydra | YAML + OmegaConf |
| roboverse_learn/rl | Mixed | YAML + tyro + argparse |

Impact:

  • Steep learning curve for contributors

  • Configuration cannot be easily shared between modules

  • Inconsistent user experience


Issue 4: Test Coverage Severely Lacking (P0)#

Current Coverage Estimate:

| Module | Test Files | Estimated Coverage |
| --- | --- | --- |
| metasim | 29 | ~60-70% |
| roboverse_pack | 0 | 0% |
| roboverse_learn | 1 | <5% |
| generation | 1 | ~20% |
| Overall | 31 | ~15-20% |

Critical Gaps:

  • 50+ robot configurations have no validation tests

  • 200+ task environments have no tests

  • All learning algorithms (ACT, Diffusion Policy, PPO, TD3) have no tests

  • No code coverage reporting in CI


Issue 5: Environment Creation Interface Inconsistency (P1)#

Problem: Two different ways to create environments:

# Method 1: Gymnasium style (used in clean_rl, vla)
from gymnasium import make_vec
env = make_vec("RoboVerse/task", robots=[...], simulator=sim)

# Method 2: Direct task class (used in fast_td3, rsl_rl, il)
from metasim.task.registry import get_task_class
env = get_task_class(task)(scenario)

Impact: Confusion for new users; inconsistent integration patterns.


Issue 6: Code Quality Issues (P2)#

| Issue | Location | Count |
| --- | --- | --- |
| Commented-out code | Multiple files | ~15 instances |
| Magic numbers | sim/isaacsim/, sim/mujoco/ | ~20 instances |
| Missing type annotations | Various | ~100+ functions |
| Inconsistent error handling | All simulator handlers | Varies |


Issue 7: Parallel Simulation Error Handling (P1 - FIXED 2026-05-26)#

Location: metasim/sim/parallel.py

Problem (historical): _check_error() was only called from launch(). A worker that died on any other operation (OOM, GPU failure, asset load error) left the parent process either hanging on remote.recv() or eventually raising a cryptic EOFError; the real traceback in error_queue was never surfaced.

Status: ✅ FIXED. Three changes:

  1. _check_error now also detects workers that died without reporting (e.g. OOM-killed / SIGKILL) via process.is_alive(). Queue messages are drained with full traceback formatting.

  2. Every handler entry point on ParallelHandler (_set_states, _set_dof_targets, _simulate, get_joint_names, get_body_names, device, _get_states) calls _check_error after wire I/O.

  3. A new _recv_or_surface wraps remote.recv() and translates EOFError / BrokenPipeError / ConnectionResetError into a RuntimeError carrying the real worker traceback, so the user no longer has to chase a cryptic IPC exception.
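The third change can be illustrated with a small sketch. It is not the code in metasim/sim/parallel.py: recv_or_surface, DeadPipe and the message wording are hypothetical, and a synchronous queue.Queue stands in for the real error_queue, in the same spirit as the sync queue stub used by the unit tests.

```python
import queue


class DeadPipe:
    """Stub connection whose recv() behaves like a pipe closed by a dead worker."""

    def recv(self):
        raise EOFError


def recv_or_surface(remote, error_queue, worker_id):
    """Receive from a worker, turning IPC failures into a descriptive RuntimeError."""
    try:
        return remote.recv()
    except (EOFError, BrokenPipeError, ConnectionResetError) as ipc_err:
        # Prefer the traceback the worker reported before it went away.
        try:
            worker_tb = error_queue.get_nowait()
        except queue.Empty:
            worker_tb = "<worker reported no traceback>"
        # Chain the original IPC error so it stays available as __cause__.
        raise RuntimeError(
            f"worker {worker_id} died; worker traceback:\n{worker_tb}"
        ) from ipc_err
```

With this shape, a caller sees the worker's own failure in the message, while the low-level IPC exception remains reachable through __cause__ for debugging.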

Tests pinning the fix:

  • metasim/test/test_parallel_error_handling_general.py: 6 unit tests using a sync queue stub (mp.Queue’s async put races with the immediate empty-check, so a sync stand-in keeps the tests deterministic without requiring a real worker process)

Forward / backward compat: Pure information upgrade. A call that previously hung or raised EOFError now raises RuntimeError with the actual worker traceback. Calls that previously succeeded continue to succeed unchanged.


Issue 8: set_states silently dropped control-input keys (P0 - FIXED 2026-05-26)#

Location: metasim/sim/base.py (boundary), every backend _set_states

Problem (historical): DictRobotState advertises dof_pos_target / dof_vel_target / dof_torque as valid keys, but every backend’s _set_states only honours pos / rot / dof_pos. Callers mistakenly passing dof_pos_target to set_states got a silent no-op: the joints never moved. This cost ~15 downstream BC experiments before the cause was found.

Status: ✅ FIXED. BaseSimHandler.set_states now runs _warn_set_states_keys before dispatching to the backend. Unknown keys log a one-shot warning per (role, key) with the list of valid keys; the three control-input keys get a specific "this is a control input, use set_dof_targets(...) instead" hint. Warnings are deduplicated per handler instance so the hot path stays quiet.
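The one-shot, per-(role, key) behaviour can be sketched as below. This is an illustrative stand-in rather than the real _warn_set_states_keys: the KeyWarner class, the logger name and the exact message wording are assumptions; only the key names come from this document.

```python
import logging

logger = logging.getLogger("set_states_sketch")  # hypothetical logger name

VALID_KEYS = {"pos", "rot", "dof_pos"}
CONTROL_KEYS = {"dof_pos_target", "dof_vel_target", "dof_torque"}


class KeyWarner:
    """Tracks which (role, key) pairs have already produced a warning."""

    def __init__(self):
        self._warned = set()

    def warn_keys(self, role, state):
        """Warn once per (role, key) for keys set_states would drop; return new messages."""
        emitted = []
        for key in state:
            # Valid keys and already-reported pairs cost one set lookup each.
            if key in VALID_KEYS or (role, key) in self._warned:
                continue
            self._warned.add((role, key))
            if key in CONTROL_KEYS:
                msg = f"{role}: '{key}' is a control input; use set_dof_targets(...) instead"
            else:
                msg = f"{role}: unknown key '{key}'; valid keys: {sorted(VALID_KEYS)}"
            logger.warning(msg)
            emitted.append(msg)
        return emitted
```

Keeping the dedupe set on the instance means a replay loop that passes the same bad key thousands of times logs a single line, while a second handler (or a different role) still gets its own warning.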

Tests pinning the fix:

  • metasim/test/test_set_states_key_validation.py: 9 unit tests (known-keys-quiet, control-input-hint, unknown-key-message, deduplication, per-role labelling, TensorState fast-path, plus the set_dof_targets cache-invalidation guard from Issue 9)

Forward / backward compat: Pure warning addition. Runtime behaviour of set_states is unchanged for code that passed valid keys. Code that previously silently no-op’d now produces a clear log line; the hot path overhead is one set membership check per (role, key) seen.


Issue 9: set_dof_targets left state cache stale (P0 - FIXED 2026-05-26)#

Location: metasim/sim/base.py

Problem (historical): MuJoCo writes actuator ctrl in _set_dof_targets, and _get_states reads ctrl back as joint_pos_target. The base class’s public set_dof_targets forgot to invalidate the state cache. Any get_states between set_dof_targets and the next simulate returned the previous joint_pos_target.

Status: ✅ FIXED. BaseSimHandler.set_dof_targets now calls _invalidate_state_caches() before dispatching to the backend. The fix is universal because no concrete handler overrides the public set_dof_targets, so every backend benefits.
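The staleness and its fix can be reproduced with a stub in a few lines. This is a sketch in the spirit of the unit-level guard, not the real handler: StubHandler, its single cache slot and the invalidate flag (present only to replay the old behaviour) are hypothetical.

```python
class StubHandler:
    """Minimal stand-in whose 'simulator' is a single actuator ctrl value."""

    def __init__(self):
        self._ctrl = 0.0
        self._cache = None

    def _get_states(self):
        # Reads ctrl back as the reported target, as described for MuJoCo above.
        return {"joint_pos_target": self._ctrl}

    def get_states(self):
        if self._cache is None:
            self._cache = self._get_states()
        return self._cache

    def _set_dof_targets(self, target):
        self._ctrl = target

    def set_dof_targets(self, target, invalidate=True):
        # invalidate=False replays the historical bug for comparison.
        if invalidate:
            self._cache = None
        self._set_dof_targets(target)


handler = StubHandler()
handler.get_states()                                # warm the cache
handler.set_dof_targets(1.5, invalidate=False)      # old behaviour
stale = handler.get_states()["joint_pos_target"]    # still the previous target
handler.set_dof_targets(2.5)                        # fixed behaviour
fresh = handler.get_states()["joint_pos_target"]    # reflects the new target
```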

Tests pinning the fix:

  • metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache: integration; verified on mujoco / sapien3 in roboverse env, marker also covers isaacsim / isaacgym / newton

  • metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit: fast unit-level guard with stub handler

Forward / backward compat: Pure bugfix. The first get_states after a set_dof_targets now reflects the fresh action; previously it returned stale data. No API surface change.


Issue 10: Robot config drift (P0 - FIXED via test 2026-05-26)#

Location: roboverse_pack/robots/*.py

Problem (historical): 50+ RobotCfg subclasses with zero validation tests. Typos in default_joint_positions (joints that aren’t in joint_limits) and out-of-limit defaults were silently clamped or ignored at sim launch, observable as wrong reset states rather than as a config error.

Status: ✅ FIXED via test gate. metasim/test/test_robot_cfg_validation_general.py and RoboVerse/tests/test_roboverse_robot_cfg_validation.py walk every discoverable RobotCfg subclass and assert: instantiation, non-empty name, default_joint_positions keys ⊆ joint_limits keys, defaults ∈ limit intervals.
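The four assertions can be sketched as a single validation helper. This is an illustration under assumptions, not the code of either test file: ToyRobotCfg and validate_robot_cfg are hypothetical, and real RobotCfg classes may store limits and defaults in a different shape.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRobotCfg:
    """Hypothetical stand-in for a RobotCfg subclass."""

    name: str = "toy"
    joint_limits: dict = field(default_factory=dict)
    default_joint_positions: dict = field(default_factory=dict)


def validate_robot_cfg(cfg):
    """Return a list of human-readable problems; an empty list means the cfg passes."""
    problems = []
    if not cfg.name:
        problems.append("empty name")
    for joint, value in cfg.default_joint_positions.items():
        # Key check: every default must refer to a joint that has limits.
        if joint not in cfg.joint_limits:
            problems.append(f"{joint}: default set but no joint_limits entry")
            continue
        # Interval check: the default must lie inside [lower, upper].
        lower, upper = cfg.joint_limits[joint]
        if not lower <= value <= upper:
            problems.append(f"{joint}: default {value} outside [{lower}, {upper}]")
    return problems


# Values taken from the Go2Cfg entry in the list of surfaced bugs.
go2_like = ToyRobotCfg(
    name="go2_like",
    joint_limits={"RL_thigh_joint": (-4.54, 0.52)},
    default_joint_positions={"RL_thigh_joint": 1.0},
)
problems = validate_robot_cfg(go2_like)
```

Returning a list rather than asserting on the first failure lets a parametrized test report every gap for a robot in one run.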

Bugs surfaced (xfail-documented, not fixed in this pass to preserve backward compat for trained policies):

  1. AlohaAgilexCfg: 16 fl_/fr_joint{1..8} keys in default_joint_positions but only single-arm names in joint_limits (bimanual override gap)

  2. G1TrackingCfg: regex keys like .*_ankle_pitch_joint in default_joint_positions not expanded; joint_limits has the concrete left_/right_ names

  3. YamCfg: joint2 / joint4 defaults copy-pasted from Franka home pose but joint_limits are Yam’s narrower ranges

  4. ArxL5Cfg: same Franka copy-paste, different limit ranges

  5. VegaCfg: torso_j1 joint_limits is the single-point [0.2, 0.2]; looks like a fixed offset, not a range

  6. SoArm100Cfg: Wrist_Pitch default -2.356 outside [-0.192, 3.927]

  7. KochCfg: same shape as SoArm100

  8. Go2Cfg: RL/RR_thigh_joint default 1.0 outside [-4.54, 0.52] (sign error)

  9. AllegroHandCfg: thumb_joint_0 default 0 below lower limit 0.263

When a robot is fixed, its xfail flips to xpass; the test_known_gap_dicts_match_actual_failures self-check then tells the maintainer to remove the entry so the contract tightens.

Forward / backward compat: Pure additive contract enforcement. No RobotCfg was modified; the test only documents the gaps, so production behaviour (silent clamp / silent ignore) is preserved for anyone training against the current defaults.


Warning Catalog (added 2026-05-30)#

As part of the 2026-05 hardening pass, BaseSimHandler and several concrete backends started emitting one-shot warnings for the silent-drop / cross-backend asymmetry patterns previously hidden inside _set_states / _set_dof_targets / actuator wiring / scenario setup. Each warning fires at most once per (handler, gap-identity) so hot-path replay does not spam the log; each tells you exactly what was dropped and how to make it stop.

| Source | Trigger | Meaning | Fix |
| --- | --- | --- | --- |
| set_states | dict has dof_pos_target / dof_vel_target / dof_torque | These are control inputs, not state; every backend silently drops them. | Call set_dof_targets(...) instead. |
| set_states | dict has unknown key | Typo or a field no backend honours. | Use one of pos / rot / dof_pos; the warning lists the valid set. |
| set_states | dict has vel / ang_vel / dof_vel | Velocity fields are populated by get_states for inspection, but no backend writes them in _set_states today (MuJoCo zeros qvel, Sapien3 ignores). | Initialise momentum via simulate() with a non-zero initial action, or open a feature request for backend-side velocity write. |
| set_states | dict has only one of pos / rot for an entity | Cross-backend divergence: MuJoCo silently fills the missing component with a default, Sapien3 raises KeyError. | Pass both pos and rot, or neither. |
| set_dof_targets | action targets unknown robot name | Robot not in scenario.robots; every backend silently drops the action. | Use a robot name listed in the warning’s “Known robots” suffix. |
| set_dof_targets | dict has unknown joint name under a robot | Joint not in handler.get_joint_names(robot.name); silently dropped. | Use one of the joint names from get_joint_names; the warning lists them. |
| set_dof_targets | unknown top-level key under a robot | Typo like dof_pos_targt; nothing reaches the actuator. | Use dof_pos_target / dof_vel_target / dof_torque / dof_effort_target. |
| MujocoHandler._apply_actuator_settings | stiffness / damping set but effort_limit_sim unset | MJCF-authored forcerange stays active and silently dominates the new PD gain. Same RoboVerse config → different effective torque per backend. | Set effort_limit_sim on the actuator cfg to the value you want, or accept the asset-file clamp deliberately. |
| NewtonHandler._apply_actuator_settings | same as MuJoCo above | Newton inherits joint_effort_limit from the asset file. | Same: set effort_limit_sim. |
| NewtonHandler._set_dof_targets | self._control has no joint_target_pos / joint_target_vel / joint_f buffer | Newton model has no position / velocity / effort actuator for the targeted joint. Every action write is dropped. | Verify the MJCF / URDF has actuators on the joints you control, or use control_type="effort" deliberately. |
| ScenarioCfg.__post_init__ | duplicate name across robots + objects | object_dict = {obj.name: obj for ...} in every handler collapses the duplicate. Set / get / step all silently reference the wrong entity. | Give each robot and object a unique name. |
| TaskBase.reset / RLTaskEnv.reset | seed=N passed but handler has no set_seed | Reproducibility contract: rollouts are not bit-reproducible on that backend’s simulator side. (Won’t fire on backends that inherit BaseSimHandler.set_seed; included for forward-compat against future backends that override.) | Implement set_seed on the handler, or call random.seed / np.random.seed / torch.manual_seed directly. |
| ParallelHandler._check_error | worker process not alive | A multiprocessing worker died (OOM-kill, SIGKILL, segfault) without writing to error_queue. | The exit code is reported in the exception; check the worker log or run with a smaller num_envs to bisect. |
| ParallelHandler._recv_or_surface | EOFError / BrokenPipeError on remote.recv() | Worker closed its pipe; typically the worker process is dead and error_queue carries the real Python traceback. | The wrapped exception’s __cause__ is the original IPC error; the message describes which worker died. |

When you fix a warning’s root cause, the warning stops firing; it does NOT need a separate “silence” flag. The dedupe lives on the handler instance, so each handler instance emits each warning at most once.
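The duplicate-name row of the catalog is worth a concrete illustration, since the failure comes from ordinary dict-comprehension semantics. The sketch below is hypothetical: Entity and duplicate_names are stand-ins, and the real check in ScenarioCfg.__post_init__ may be written differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a robot or object cfg."""

    name: str
    kind: str


def duplicate_names(robots, objects):
    """Return the sorted names that appear more than once across robots and objects."""
    counts = Counter(e.name for e in [*robots, *objects])
    return sorted(name for name, n in counts.items() if n > 1)


robots = [Entity("arm", "robot")]
objects = [Entity("cube", "object"), Entity("arm", "object")]

# The pattern quoted in the catalog: later entries silently overwrite
# earlier ones that share a name, so the robot "arm" is lost here.
object_dict = {e.name: e for e in [*robots, *objects]}

clashes = duplicate_names(robots, objects)
```

Three entities go in but only two keys come out, and a lookup of "arm" returns the object rather than the robot, which is why the warning asks for unique names instead of trying to disambiguate.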


Improvement Roadmap#

Phase 1: Critical Fixes (Weeks 1-2)#

┌────────────────────────────────────────────────────────┐
│ PHASE 1: Critical Fixes                                │
│                                                        │
│ ░ TODO-001: Fix state cache consistency                │
│ ░ TODO-002: Add @abstractmethod decorators             │
│ ░ TODO-003: Add robot configuration validation tests   │
│ ░ TODO-004: Integrate pytest-cov in CI                 │
└────────────────────────────────────────────────────────┘

Phase 2: Test Coverage (Weeks 3-6)#

┌────────────────────────────────────────────────────────┐
│ PHASE 2: Test Coverage Improvement                     │
│                                                        │
│ ░ TODO-005: Add task environment integration tests     │
│ ░ TODO-006: Add learning algorithm unit tests          │
│ ░ TODO-007: Add state conversion tests                 │
│ ░ TODO-008: Add domain randomization tests             │
└────────────────────────────────────────────────────────┘

Phase 3: Interface Unification (Weeks 7-10)#

┌────────────────────────────────────────────────────────┐
│ PHASE 3: Interface Unification                         │
│                                                        │
│ ░ TODO-009: Create unified environment factory         │
│ ░ TODO-010: Standardize configuration loading          │
│ ░ TODO-011: Unify error handling across simulators     │
│ ░ TODO-012: Add deprecation warnings for old APIs      │
└────────────────────────────────────────────────────────┘

Phase 4: Code Quality (Weeks 11-14)#

┌────────────────────────────────────────────────────────┐
│ PHASE 4: Code Quality                                  │
│                                                        │
│ ░ TODO-013: Remove commented-out code                  │
│ ░ TODO-014: Extract magic numbers to constants         │
│ ░ TODO-015: Add comprehensive type annotations         │
│ ░ TODO-016: Add performance benchmarks                 │
└────────────────────────────────────────────────────────┘

Phase 5: Architecture Evolution (Long-term)#

┌────────────────────────────────────────────────────────┐
│ PHASE 5: Architecture Evolution                        │
│                                                        │
│ ░ TODO-017: Implement plugin architecture for sims     │
│ ░ TODO-018: Create unified configuration system        │
│ ░ TODO-019: Add async simulation support               │
│ ░ TODO-020: Performance profiling integration          │
└────────────────────────────────────────────────────────┘

Detailed TODO List#

TODO-001: Fix State Cache Consistency#

Priority: P0 (Critical)
Effort: Low (1-2 days)
Risk: Low

Description: Fix the state cache to maintain separate caches for tensor and dict formats.

Implementation:

# metasim/sim/base.py
from __future__ import annotations

from abc import ABC
from typing import Literal

# TensorState, DictEnvState and state_tensor_to_nested are existing metasim
# names; their imports are omitted from this excerpt.


class BaseSimHandler(ABC):
    def __init__(self, scenario, optional_queries=None):
        # ... existing code ...
        self._state_cache_expire = True
        self._tensor_state_cache: TensorState | None = None
        self._dict_state_cache: list[DictEnvState] | None = None

    def _invalidate_cache(self) -> None:
        """Invalidate all state caches."""
        self._state_cache_expire = True
        self._tensor_state_cache = None
        self._dict_state_cache = None

    def set_states(self, states, env_ids=None) -> None:
        """Set states and invalidate cache."""
        self._invalidate_cache()
        self._set_states(states, env_ids)

    def get_states(
        self,
        env_ids: list[int] | None = None,
        mode: Literal["tensor", "dict"] = "tensor",
    ) -> TensorState | list[DictEnvState]:
        """Get states with independent caching for each mode.

        Note: the cache is not keyed on env_ids, so a caller that changes
        env_ids between calls gets the entry cached by the earlier call.
        """
        if self._state_cache_expire:
            self._tensor_state_cache = self._get_states(env_ids=env_ids)
            self._dict_state_cache = None  # Lazy conversion
            self._state_cache_expire = False

        if mode == "tensor":
            return self._tensor_state_cache

        # Dict mode: convert from the tensor cache on first request only,
        # leaving the tensor cache untouched.
        if self._dict_state_cache is None:
            self._dict_state_cache = state_tensor_to_nested(
                self, self._tensor_state_cache
            )
        return self._dict_state_cache

Test Case:

# metasim/test/sim/test_state_cache.py

import pytest
import torch

# TensorState comes from metasim; create_test_handler is a helper that is
# not defined in this excerpt.


@pytest.mark.general
def test_state_cache_mode_independence():
    """Verify that switching modes doesn't corrupt cache."""
    handler = create_test_handler()
    handler.launch()

    # Get tensor state
    states_t1 = handler.get_states(mode="tensor")
    assert isinstance(states_t1, TensorState)

    # Get dict state (should not affect tensor cache)
    states_d = handler.get_states(mode="dict")
    assert isinstance(states_d, list)

    # Get tensor state again (should return same type)
    states_t2 = handler.get_states(mode="tensor")
    assert isinstance(states_t2, TensorState)

    # Values should match
    assert torch.allclose(states_t1.pos, states_t2.pos)

Acceptance Criteria:

  • Test case passes

  • Existing tests still pass

  • No breaking changes to public API


TODO-002: Add @abstractmethod Decorators#

Priority: P1
Effort: Very Low (1 hour)
Risk: Very Low

Files to modify:

  • metasim/sim/base.py

Changes:

# Before
# @abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    raise NotImplementedError

# After
@abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    """Set DOF targets. Subclasses must implement this method."""
    raise NotImplementedError

Verification:

# mypy reports instantiation of classes that leave an abstract method
# unimplemented, so check the handler package rather than base.py alone
mypy metasim/sim/

TODO-003: Add Robot Configuration Validation Tests#

Priority: P0
Effort: Medium (2-3 days)
Risk: Low

Implementation:

# metasim/test/test_robot_configs.py

import importlib
import pkgutil

import pytest

def get_all_robot_configs():
    """Dynamically discover all robot configuration classes."""
    import roboverse_pack.robots as robots_module

    configs = []
    for _, modname, _ in pkgutil.iter_modules(robots_module.__path__):
        if modname.endswith('_cfg'):
            module = importlib.import_module(f'roboverse_pack.robots.{modname}')
            for name in dir(module):
                obj = getattr(module, name)
                # Keep only classes defined in this module, so base classes and
                # configs re-imported from other modules are not collected twice.
                if (isinstance(obj, type) and
                    obj.__module__ == module.__name__ and
                    hasattr(obj, 'name') and
                    name.endswith('Cfg')):
                    configs.append(obj)
    return configs

ALL_ROBOT_CONFIGS = get_all_robot_configs()

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS, 
                         ids=lambda x: x.__name__)
def test_robot_config_instantiation(robot_cfg_cls):
    """Verify robot config can be instantiated."""
    cfg = robot_cfg_cls()
    assert cfg.name is not None
    assert isinstance(cfg.name, str)
    assert len(cfg.name) > 0

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_asset_path(robot_cfg_cls):
    """Verify robot config has at least one asset path."""
    cfg = robot_cfg_cls()
    
    asset_paths = [
        getattr(cfg, 'usd_path', None),
        getattr(cfg, 'urdf_path', None),
        getattr(cfg, 'mjcf_path', None),
    ]
    
    valid_paths = [p for p in asset_paths if p is not None and len(p) > 0]
    assert len(valid_paths) > 0, f"{cfg.name} has no valid asset path"

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_actuators(robot_cfg_cls):
    """Verify robot config has actuator definitions."""
    cfg = robot_cfg_cls()
    
    if hasattr(cfg, 'actuators'):
        assert len(cfg.actuators) > 0, f"{cfg.name} has no actuators defined"

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_joint_limits_valid(robot_cfg_cls):
    """Verify joint limits are valid (lower < upper)."""
    cfg = robot_cfg_cls()
    
    if hasattr(cfg, 'joint_limits'):
        for joint_name, (lower, upper) in cfg.joint_limits.items():
            assert lower < upper, \
                f"{cfg.name}.{joint_name}: lower ({lower}) >= upper ({upper})"

TODO-004: Integrate pytest-cov in CI#

Priority: P0
Effort: Low (1 day)
Risk: Very Low

Changes to .github/workflows/premerge-ci.yml:

# Add coverage reporting step
- name: Run tests with coverage
  run: |
    pytest metasim/test \
      --cov=metasim \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=30 \
      -k ${{ matrix.test_type }}

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    file: ./coverage.xml
    flags: ${{ matrix.test_type }}
    fail_ci_if_error: false

Add pyproject.toml configuration:

[tool.coverage.run]
source = ["metasim"]
omit = ["metasim/test/*", "*/__pycache__/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]

TODO-005: Add Task Environment Integration Tests#

Priority: P1
Effort: High (1 week)
Risk: Medium

Implementation:

# metasim/test/test_task_environments.py

import pytest
import torch  # needed for the isinstance checks below
from metasim.task.registry import get_task_class

# Select representative tasks for testing
CORE_TASKS = [
    "pick_cube",
    "place_cube", 
    "open_drawer",
    "close_drawer",
    "push_button",
]

@pytest.mark.mujoco
@pytest.mark.parametrize("task_name", CORE_TASKS)
def test_task_reset_step_mujoco(task_name):
    """Test basic reset/step cycle for core tasks on MuJoCo."""
    task_cls = get_task_class(task_name)
    scenario = task_cls.scenario.copy()
    scenario.update(
        simulator="mujoco",
        num_envs=1,
        headless=True,
    )
    
    env = task_cls(scenario, device="cpu")
    env.launch()
    
    try:
        # Test reset
        obs, info = env.reset()
        assert obs is not None
        
        # Test step
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        
        assert obs is not None
        assert isinstance(reward, (int, float, torch.Tensor))
        assert isinstance(terminated, (bool, torch.Tensor))
        assert isinstance(truncated, (bool, torch.Tensor))
    finally:
        env.close()

TODO-006: Add Learning Algorithm Unit Tests#

Priority: P1
Effort: High (1-2 weeks)
Risk: Medium

Example for Diffusion Policy:

# roboverse_learn/il/tests/test_diffusion_policy.py

import pytest
import torch
from roboverse_learn.il.policies.dp.ddpm_dit_image_policy import DDPMDiTImagePolicy

@pytest.fixture
def sample_obs():
    """Create sample observation for testing."""
    return {
        "image": torch.randn(1, 3, 224, 224),
        "agent_pos": torch.randn(1, 7),
    }

def test_diffusion_policy_forward(sample_obs):
    """Test forward pass of diffusion policy."""
    policy = DDPMDiTImagePolicy(
        obs_dim=7,
        action_dim=7,
        horizon=16,
        # ... minimal config
    )
    
    # pytest injects the fixture value as an argument; fixtures cannot be called directly
    action = policy.predict_action(sample_obs)
    
    assert action.shape == (1, 16, 7)  # (batch, horizon, action_dim)

def test_diffusion_policy_training_step():
    """Test single training step."""
    policy = DDPMDiTImagePolicy(...)
    optimizer = torch.optim.Adam(policy.parameters())
    
    batch = create_training_batch()
    loss = policy.compute_loss(batch)
    
    assert loss.requires_grad
    loss.backward()
    optimizer.step()

TODO-009: Create Unified Environment Factory#

Priority: P1
Effort: Medium (3-5 days)
Risk: Medium

Implementation:

# metasim/env_factory.py

from typing import Union, List, Optional
from metasim.scenario.robot import RobotCfg

def make_env(
    task: str,
    robots: Optional[List[Union[str, RobotCfg]]] = None,
    simulator: str = "mujoco",
    num_envs: int = 1,
    headless: bool = True,
    device: str = "cuda",
    cameras: Optional[List] = None,
    **kwargs
):
    """
    Unified environment factory for RoboVerse.
    
    This is the recommended way to create environments.
    
    Args:
        task: Task name (e.g., "pick_cube", "locomotion_walk")
        robots: List of robot names or RobotCfg instances
        simulator: Simulator backend ("mujoco", "isaacsim", "sapien3", etc.)
        num_envs: Number of parallel environments
        headless: Whether to run in headless mode
        device: Device for tensor computations
        cameras: Camera configurations for observations
        **kwargs: Additional task-specific arguments
    
    Returns:
        BaseTaskEnv: Configured environment instance
    
    Example:
        >>> env = make_env(
        ...     task="pick_cube",
        ...     robots=["franka"],
        ...     simulator="mujoco",
        ...     num_envs=16,
        ... )
        >>> obs, info = env.reset()
        >>> obs, reward, term, trunc, info = env.step(action)
    """
    from metasim.task.registry import get_task_class
    from metasim.utils.setup_util import get_robot
    
    # Resolve task class
    task_cls = get_task_class(task)
    
    # Build scenario from task default
    scenario = task_cls.scenario.copy()
    
    # Resolve robots
    if robots is not None:
        resolved_robots = []
        for robot in robots:
            if isinstance(robot, str):
                resolved_robots.append(get_robot(robot))
            else:
                resolved_robots.append(robot)
        scenario.robots = resolved_robots
    
    # Apply overrides
    scenario.update(
        simulator=simulator,
        num_envs=num_envs,
        headless=headless,
        cameras=cameras or [],
    )
    
    # Create and return environment
    env = task_cls(scenario, device=device, **kwargs)
    env.launch()
    
    return env

# Also register with gymnasium for compatibility
def register_gymnasium_envs():
    """Register all RoboVerse tasks with Gymnasium."""
    import gymnasium
    from metasim.task.registry import TASK_REGISTRY
    
    for task_name in TASK_REGISTRY:
        gymnasium.register(
            id=f"RoboVerse/{task_name}",
            entry_point="metasim.env_factory:make_env",
            kwargs={"task": task_name},
        )

TODO-013: Remove Commented-Out Code#

Priority: P2
Effort: Very Low (2 hours)
Risk: Very Low

Files to clean:

| File | Line | Content |
| --- | --- | --- |
| metasim/sim/base.py | 18 | # from metasim.utils.hf_util import FileDownloader |
| metasim/sim/base.py | 36 | # FileDownloader(scenario).do_it() |
| metasim/sim/base.py | 86 | # @abstractmethod |
| metasim/scenario/scenario.py | 72 | Various commented code |

Approach:

  1. Search for # patterns followed by code

  2. Review each instance

  3. Either remove or convert to proper TODO comment
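The search step can be partly automated. The sketch below is one possible heuristic and is not part of RoboVerse: it flags whole-line comments whose text parses as a Python statement. The names `looks_like_code` and `find_commented_code` are hypothetical, and the skip list is an assumption. False positives are expected (for example, a prose comment such as "x is y" parses as an expression), so every hit still needs the manual review in step 2.

```python
import ast
import re

# Comment prefixes that mark prose or tool directives and are never flagged.
_SKIP_PREFIXES = ("TODO", "FIXME", "NOTE", "type:", "noqa", "pragma", "!")


def looks_like_code(comment_body: str) -> bool:
    """Return True if the comment text parses as a non-trivial Python statement."""
    text = comment_body.strip()
    if not text or text.startswith(_SKIP_PREFIXES):
        return False
    # A decorator does not parse on its own, so attach a dummy function to it.
    if text.startswith("@"):
        text += "\ndef _f(): pass"
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    # A bare name or literal (e.g. "Usage", "18") is almost always prose.
    for node in tree.body:
        if isinstance(node, ast.Expr) and isinstance(node.value, (ast.Name, ast.Constant)):
            return False
    return bool(tree.body)


def find_commented_code(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for whole-line comments that look like code."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = re.match(r"\s*#\s?(.*)", line)
        if match and looks_like_code(match.group(1)):
            hits.append((lineno, line.strip()))
    return hits
```

Running `find_commented_code(path.read_text())` over each file in the table above gives a candidate list. Inline comments and `#` characters inside multi-line strings are not handled by this sketch.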


TODO-014: Extract Magic Numbers to Constants#

Priority: P2
Effort: Low (1-2 days)
Risk: Low

Create constants file:

# metasim/constants.py

"""Framework-wide constants."""

# Simulation defaults
DEFAULT_DT = 0.015  # 15ms physics timestep
DEFAULT_DECIMATION = 2
DEFAULT_GRAVITY = (0.0, 0.0, -9.81)

# Cache settings
STATE_CACHE_SIZE = 1000
MAX_PARALLEL_ENVS = 4096

# Timeouts
DEFAULT_LAUNCH_TIMEOUT = 30.0
DEFAULT_STEP_TIMEOUT = 5.0

# Numerical tolerances
POSITION_TOLERANCE = 1e-5
ROTATION_TOLERANCE = 1e-4
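As an illustration of how a call site might consume one of these constants instead of an inline literal, here is a minimal sketch. The helper `positions_match` is hypothetical, and the constant is redefined locally only so the example runs standalone. In the codebase it would be imported from the proposed `metasim/constants.py`.

```python
import math

# Mirrors the proposed value in metasim/constants.py (redefined so the sketch is standalone).
POSITION_TOLERANCE = 1e-5


def positions_match(a, b, tol=POSITION_TOLERANCE):
    """Return True if two 3D positions agree within an absolute per-axis tolerance."""
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b))
```

With the tolerance named in one place, tightening or loosening it becomes a single-line change rather than a search for scattered `1e-5` literals.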

Implementation Guidelines#

Safe Modification Protocol#

Before making any changes, follow this checklist:

[ ] Pre-modification
  ├── [ ] Read existing tests for the module
  ├── [ ] Run existing tests (ensure they pass)
  ├── [ ] Understand the change's impact scope
  └── [ ] Check for downstream dependencies

[ ] Implementation
  ├── [ ] Write tests first (TDD preferred)
  ├── [ ] Make small, focused commits
  ├── [ ] Maintain backward compatibility
  └── [ ] Add deprecation warnings if needed

[ ] Post-modification
  ├── [ ] Run full test suite locally
  ├── [ ] Check for linter errors
  ├── [ ] Update documentation if needed
  └── [ ] Create detailed PR description

Feature Flag Pattern#

For high-risk changes, use feature flags:

# metasim/config.py

class FeatureFlags:
    """Feature flags for gradual rollout of changes."""
    
    # State cache v2 with independent tensor/dict caches
    USE_INDEPENDENT_STATE_CACHE = False
    
    # New unified environment factory
    USE_UNIFIED_ENV_FACTORY = False
    
    # Strict type checking in configs
    STRICT_CONFIG_VALIDATION = False
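A flag is only useful if call sites consult it. The sketch below shows one way that could look, assuming an environment-variable override so a flag can be flipped without a code change. The `flag_enabled` helper, the `ROBOVERSE_FLAG_` prefix, and `build_state_cache` are all hypothetical names introduced for illustration. The flag class is repeated so the example runs standalone.

```python
import os


class FeatureFlags:
    """Copy of the proposed defaults so the sketch runs standalone."""

    USE_INDEPENDENT_STATE_CACHE = False
    USE_UNIFIED_ENV_FACTORY = False
    STRICT_CONFIG_VALIDATION = False


def flag_enabled(name: str, environ=None) -> bool:
    """Resolve a flag: ROBOVERSE_FLAG_<NAME> in the environment wins, else the class default."""
    environ = os.environ if environ is None else environ
    default = getattr(FeatureFlags, name)  # raises AttributeError for unknown flag names
    raw = environ.get(f"ROBOVERSE_FLAG_{name}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def build_state_cache(environ=None) -> str:
    """Choose the code path behind the flag; strings stand in for real cache objects."""
    if flag_enabled("USE_INDEPENDENT_STATE_CACHE", environ):
        return "independent-cache"
    return "legacy-cache"
```

Keeping the old path as the default means the new behavior can be exercised in CI (by setting the variable for one job) before it is switched on for everyone.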

Deprecation Pattern#

import warnings
from functools import wraps

def deprecated(message: str, removal_version: str = "0.3.0"):
    """Decorator to mark functions as deprecated."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}. "
                f"Will be removed in version {removal_version}.",
                DeprecationWarning,
                stacklevel=2
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Usage
@deprecated("Use make_env() instead", removal_version="0.4.0")
def create_environment_legacy(*args, **kwargs):
    ...
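To confirm that a deprecated entry point still works and actually emits the warning, the behavior can be checked with the standard `warnings.catch_warnings` context manager. The sketch below repeats the decorator so it runs standalone. The body of `create_environment_legacy` is a placeholder, not the real implementation.

```python
import warnings
from functools import wraps


def deprecated(message: str, removal_version: str = "0.3.0"):
    """Decorator to mark functions as deprecated (same pattern as above)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}. "
                f"Will be removed in version {removal_version}.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("Use make_env() instead", removal_version="0.4.0")
def create_environment_legacy(task):
    # Placeholder body for illustration only.
    return f"env:{task}"


# Record warnings instead of printing them, and disable the default filters,
# which would otherwise hide most DeprecationWarnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = create_environment_legacy("pick_cube")
```

After this runs, `caught` holds one `DeprecationWarning` and `result` is the unchanged return value, so the wrapper is transparent to callers. In a pytest suite the same check is usually written with `pytest.warns(DeprecationWarning)`.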

Testing Strategy#

Test Hierarchy#

┌─────────────────────────────────────────────────────────────┐
│                    End-to-End Tests                         │
│              (Full training pipeline tests)                 │
│                      ~5% of tests                           │
├─────────────────────────────────────────────────────────────┤
│                   Integration Tests                         │
│         (Task reset/step, multi-env, rendering)             │
│                     ~25% of tests                           │
├─────────────────────────────────────────────────────────────┤
│                      Unit Tests                             │
│    (Individual functions, state conversion, configs)        │
│                     ~70% of tests                           │
└─────────────────────────────────────────────────────────────┘

Running Tests Locally#

# Run all general tests (no simulator required)
pytest metasim/test -k general -v

# Run MuJoCo tests
pytest metasim/test -k mujoco -v

# Run with coverage
pytest metasim/test --cov=metasim --cov-report=html

# Run specific test file
pytest metasim/test/sim/test_state_cache.py -v

Coverage Targets#

| Phase | Target Coverage | Timeline |
| --- | --- | --- |
| Current | ~15-20% | - |
| Phase 1 | 30% | Week 2 |
| Phase 2 | 50% | Week 6 |
| Phase 3 | 60% | Week 10 |
| Long-term | 70%+ | Week 14+ |


Appendix: Quick Reference#

Priority Definitions#

| Priority | Definition | Response Time |
| --- | --- | --- |
| P0 | Critical - blocks users or causes data corruption | Immediate |
| P1 | High - significant usability or reliability issue | 1-2 weeks |
| P2 | Medium - code quality or maintainability | 2-4 weeks |
| P3 | Low - nice to have improvements | Backlog |

Effort Definitions#

| Effort | Time Estimate | Description |
| --- | --- | --- |
| Very Low | < 4 hours | Single file, obvious change |
| Low | 1-2 days | Few files, clear scope |
| Medium | 3-5 days | Multiple files, some complexity |
| High | 1-2 weeks | Significant refactoring |
| Very High | 2+ weeks | Architecture-level changes |


Contributing#

If you want to contribute to any of these improvements:

  1. Comment on the relevant GitHub issue (or create one)

  2. Follow the Safe Modification Protocol

  3. Start with lower-risk items to familiarize yourself with the codebase

  4. Ask questions in discussions if anything is unclear

Recommended starting points for new contributors:

  • TODO-002: Add @abstractmethod decorators (Very Low effort)

  • TODO-013: Remove commented-out code (Very Low effort)

  • TODO-003: Add robot configuration validation tests (Medium effort, high impact)