RoboVerse Architecture Review & Improvement Roadmap#

Document Version: 1.1
Last Updated: 2026-05-26
Status: Active Development

This document provides a comprehensive architecture review of the RoboVerse codebase and outlines a structured improvement roadmap. It is intended for core maintainers and contributors who want to understand the current state of the codebase and contribute to its improvement.

2026-05-26 update: 4 of the originally listed P0/P1 items have landed as code + regression tests. They are kept in the document below for historical context, but each is now annotated STATUS: FIXED with a pointer to the test file that pins the fix in place. Forward / backward compatibility statements are included so existing users can adopt the changes without surprise: every fix is either a pure warning, a stale-cache eviction, or an information upgrade to a previously silent failure mode.

Executive Summary#

Strengths#

RoboVerse is a well-architected multi-simulator robotics framework with several notable strengths:

Aspect	Assessment	Details
Modularity	Excellent	Clear separation between `metasim`, `roboverse_pack`, and `roboverse_learn`
Simulator Abstraction	Good	Unified interface across MuJoCo, IsaacSim, SAPIEN, PyBullet, Genesis
Configuration System	Good	Type-safe `@configclass` decorator with validation
Domain Randomization	Excellent	Comprehensive DR system with hybrid simulation support
Documentation	Good	Extensive tutorials and API documentation

Critical Issues Requiring Attention#

Priority	Issue	Impact	Effort	Status
P0	State cache consistency bug	Data corruption	Low	✅ FIXED — see Issue 1
P0	`set_states` silent-drop of control-input keys	Silent no-op	Low	✅ FIXED — see Issue 8
P0	`set_dof_targets` cache staleness	Silent stale state	Low	✅ FIXED — see Issue 9
P0	Test coverage severely lacking	Reliability	High	🟡 PARTIAL — see Issue 4
P1	Parallel-sim error handling	Silent worker death	Medium	✅ FIXED — see Issue 7
P1	Backend interface drift (`@abstractmethod` gaps)	Late-bind failures	Low	✅ FIXED via contract test — see Issue 2
P1	Configuration system fragmentation	Usability	Medium	⏳ Open
P1	Environment creation interface inconsistency	Usability	Medium	⏳ Open
P2	Code quality issues	Maintainability	Low	⏳ Open

Architecture Overview#

Module Structure#

RoboVerse/
├── metasim/                 # Core simulation framework
│   ├── sim/                 # Simulator handlers (MuJoCo, Isaac, SAPIEN, etc.)
│   ├── scenario/            # Scene configuration (robots, objects, cameras)
│   ├── task/                # Task environment abstraction
│   ├── randomization/       # Domain randomization system
│   ├── queries/             # Extended query system (contacts, sensors)
│   └── utils/               # Utilities (configclass, math, state conversion)
│
├── roboverse_pack/          # Assets and task definitions
│   ├── robots/              # 50+ robot configurations
│   ├── tasks/               # 200+ task environments
│   ├── scenes/              # Scene configurations
│   └── queries/             # Custom query implementations
│
├── roboverse_learn/         # Learning algorithms
│   ├── il/                  # Imitation learning (ACT, Diffusion Policy, etc.)
│   ├── rl/                  # Reinforcement learning (PPO, TD3, SAC)
│   └── vla/                 # Vision-Language-Action models
│
└── generation/              # Asset generation and conversion tools

Core Abstractions#

┌─────────────────────────────────────────────────────────────────┐
│                        BaseSimHandler                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │ MujocoHandler│  │IsaacHandler │  │SAPIENHandler│  ...        │
│  └─────────────┘  └─────────────┘  └─────────────┘             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        BaseTaskEnv                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │PickPlaceTask│  │LocomotionTask│  │ManipulationTask│  ...     │
│  └─────────────┘  └─────────────┘  └─────────────┘             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        ScenarioCfg                               │
│  robots[] + objects[] + cameras[] + lights[] + sim_params       │
└─────────────────────────────────────────────────────────────────┘

Identified Issues#

Issue 1: State Cache Consistency Bug (P0 - FIXED 2026-05-26)#

Location: metasim/sim/base.py

Problem (historical): The state cache was modified in-place when switching between tensor and dict modes, causing data corruption.

Status: ✅ FIXED. BaseSimHandler now keeps two independent caches (_tensor_state_cache / _dict_state_cache). On a miss for the requested mode it lazily converts from the other. set_states, set_dof_targets and simulate all call _invalidate_state_caches.

Tests pinning the fix:

metasim/test/sim/test_state_modes.py::test_state_cache_mode_independence — integration (mujoco / sapien3 / isaacsim / isaacgym / newton)
metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache — integration; specifically catches the staleness regression described in Issue 9
metasim/test/test_backend_contract_general.py::test_set_states_invalidates_cache_on_all_backends — static AST check on base.py; runs in -k general
metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit — unit-level stub guard

Forward / backward compat: Pure bugfix. Callers that previously received corrupted data now receive consistent data — there is no API surface change. The lazy-conversion path was already the documented intent; only the in-place mutation was wrong.

Issue 2: Incomplete Abstract Method Declarations (P1 — FIXED via contract test 2026-05-26)#

Location: metasim/sim/base.py

Status: ✅ FIXED in spirit. _set_dof_targets is now decorated with @abstractmethod. Two other contract methods (_get_joint_names / _get_body_names) remain commented out because flipping the decorator would make PyrepHandler / partial SinglePybulletHandler / partial GenesisHandler impossible to instantiate: ABC raises TypeError on construction when an abstract method is unimplemented, and those backends genuinely do not implement these methods yet.

Instead of breaking those backends, metasim/test/test_backend_contract_general.py statically asserts that every concrete BaseSimHandler subclass overrides every documented contract method, with xfail markers for known gaps. When a backend catches up, its xfail flips to xpass; that is the signal to enable the decorator for real.
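The contract-test idea can be sketched with stand-in classes. This is a minimal illustration, not the code in test_backend_contract_general.py: the class names, CONTRACT_METHODS, KNOWN_GAPS and check_contract are all hypothetical, and it uses runtime introspection where the real test is described as a static check.

```python
from abc import ABC


class BaseHandler(ABC):
    """Stand-in for BaseSimHandler (hypothetical, for illustration only)."""

    def _set_dof_targets(self, actions):
        raise NotImplementedError

    def _get_joint_names(self, obj_name):
        raise NotImplementedError


class CompleteHandler(BaseHandler):
    def _set_dof_targets(self, actions):
        return None

    def _get_joint_names(self, obj_name):
        return ["joint1"]


class PartialHandler(BaseHandler):
    # Implements only part of the contract, like the backends with known gaps.
    def _set_dof_targets(self, actions):
        return None


# Hypothetical contract list and registry of documented gaps.
CONTRACT_METHODS = ("_set_dof_targets", "_get_joint_names")
KNOWN_GAPS = {"PartialHandler": {"_get_joint_names"}}


def missing_contract_methods(cls, base=BaseHandler):
    """Return contract methods that cls still inherits unchanged from base."""
    return {
        name
        for name in CONTRACT_METHODS
        if getattr(cls, name) is getattr(base, name)
    }


def check_contract(cls):
    """Classify a handler as 'pass', 'xfail' (documented gap) or 'fail'."""
    missing = missing_contract_methods(cls)
    if not missing:
        return "pass"
    if missing <= KNOWN_GAPS.get(cls.__name__, set()):
        return "xfail"
    return "fail"
```

Under these assumptions, a backend that later implements its missing method moves from "xfail" to "pass", which is the point at which its KNOWN_GAPS entry can be deleted.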

Forward / backward compat: Pure additive contract enforcement. No existing backend was broken. New backends added later will get a clear failing test on day one if they forget a method.

Issue 3: Configuration System Fragmentation (P1)#

Problem: Three different configuration systems are used across modules:

Module	System	Tools
`metasim`	`@configclass`	Python dataclass + custom wrapper
`roboverse_learn/il`	Hydra	YAML + OmegaConf
`roboverse_learn/rl`	Mixed	YAML + tyro + argparse

Impact:

Steep learning curve for contributors
Configuration cannot be easily shared between modules
Inconsistent user experience

Issue 4: Test Coverage Severely Lacking (P0)#

Current Coverage Estimate:

Module	Test Files	Estimated Coverage
`metasim`	29	~60-70%
`roboverse_pack`	0	0%
`roboverse_learn`	1	<5%
`generation`	1	~20%
Overall	31	~15-20%

Critical Gaps:

50+ robot configurations have no validation tests
200+ task environments have no tests
All learning algorithms (ACT, Diffusion Policy, PPO, TD3) have no tests
No code coverage reporting in CI

Issue 5: Environment Creation Interface Inconsistency (P1)#

Problem: Two different ways to create environments:

# Method 1: Gymnasium style (used in clean_rl, vla)
from gymnasium import make_vec
env = make_vec("RoboVerse/task", robots=[...], simulator=sim)

# Method 2: Direct task class (used in fast_td3, rsl_rl, il)
from metasim.task.registry import get_task_class
env = get_task_class(task)(scenario)

Impact: Confusion for new users; inconsistent integration patterns.

Issue 6: Code Quality Issues (P2)#

Issue	Location	Count
Commented-out code	Multiple files	~15 instances
Magic numbers	`sim/isaacsim/`, `sim/mujoco/`	~20 instances
Missing type annotations	Various	~100+ functions
Inconsistent error handling	All simulator handlers	Varies

Issue 7: Parallel Simulation Error Handling (P1 — FIXED 2026-05-26)#

Location: metasim/sim/parallel.py

Problem (historical): _check_error() was only called from launch(). A worker that died on any other operation (OOM, GPU failure, asset load error) left the parent process either hanging on remote.recv() or eventually raising a cryptic EOFError — the real traceback in error_queue was never surfaced.

Status: ✅ FIXED. Three changes:

_check_error now also detects workers that died without reporting (e.g. OOM-killed / SIGKILL) via process.is_alive(). Queue messages are drained with full traceback formatting.
Every public method on ParallelHandler (_set_states, _set_dof_targets, _simulate, get_joint_names, get_body_names, device, _get_states) calls _check_error after wire I/O.
A new _recv_or_surface wraps remote.recv() and translates EOFError / BrokenPipeError / ConnectionResetError into a RuntimeError carrying the real worker traceback — instead of the user chasing a cryptic IPC exception.
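The third change can be illustrated with a small stand-alone sketch. It is not the code in metasim/sim/parallel.py: DeadRemote, the function signature and the placeholder traceback string are assumptions made for the example, and a synchronous queue.Queue stands in for the real error queue.

```python
import queue


class DeadRemote:
    """Stand-in for a pipe connection whose worker has already exited."""

    def recv(self):
        raise EOFError


def recv_or_surface(remote, error_queue, worker_id):
    """Receive from a worker; on IPC failure raise RuntimeError with its traceback."""
    try:
        return remote.recv()
    except (EOFError, BrokenPipeError, ConnectionResetError) as ipc_error:
        try:
            worker_traceback = error_queue.get_nowait()
        except queue.Empty:
            worker_traceback = "<worker reported no traceback>"
        # "from ipc_error" keeps the original IPC exception as __cause__.
        raise RuntimeError(
            f"worker {worker_id} died during recv():\n{worker_traceback}"
        ) from ipc_error


# Usage: the worker left a (placeholder) traceback before dying.
errors = queue.Queue()
errors.put("Traceback (most recent call last): ... MemoryError")
try:
    recv_or_surface(DeadRemote(), errors, worker_id=3)
except RuntimeError as exc:
    surfaced = exc
```

Chaining with `from` is what lets a caller still reach the original IPC error through `__cause__` while reading the worker's traceback in the message.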

Tests pinning the fix:

metasim/test/test_parallel_error_handling_general.py — 6 unit tests using a sync queue stub (mp.Queue’s async put races with the immediate empty-check, so a sync stand-in keeps the tests deterministic without requiring a real worker process)

Forward / backward compat: Pure information upgrade. A call that previously hung or raised EOFError now raises RuntimeError with the actual worker traceback. Calls that previously succeeded continue to succeed unchanged.

Issue 8: `set_states` silently dropped control-input keys (P0 — FIXED 2026-05-26)#

Location: metasim/sim/base.py (boundary), every backend _set_states

Problem (historical): DictRobotState advertises dof_pos_target / dof_vel_target / dof_torque as valid keys, but every backend’s _set_states only honours pos / rot / dof_pos. Callers mistakenly passing dof_pos_target to set_states got a silent no-op — the joints never moved. This cost ~15 downstream BC experiments before the cause was found.

Status: ✅ FIXED. BaseSimHandler.set_states now runs _warn_set_states_keys before dispatching to the backend. Unknown keys log a one-shot warning per (role, key) with the list of valid keys; the three control-input keys get a specific "this is a control input — use set_dof_targets(...) instead" hint. Deduplicated per handler instance so the hot path stays quiet.
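The one-shot, per-(role, key) behaviour described above could look roughly like the following. This is a sketch under assumptions: KeyWarner, warn_keys and the message wording are hypothetical, and only the key names are taken from this document.

```python
import logging

logger = logging.getLogger("set_states_sketch")

VALID_KEYS = frozenset({"pos", "rot", "dof_pos"})
CONTROL_INPUT_KEYS = frozenset({"dof_pos_target", "dof_vel_target", "dof_torque"})


class KeyWarner:
    """Hypothetical helper holding the dedupe set for one handler instance."""

    def __init__(self):
        self._warned = set()  # (role, key) pairs already reported

    def warn_keys(self, role, state):
        """Log one warning per unseen (role, key); return the messages emitted."""
        emitted = []
        for key in state:
            if key in VALID_KEYS or (role, key) in self._warned:
                continue
            self._warned.add((role, key))
            if key in CONTROL_INPUT_KEYS:
                msg = f"{role}: '{key}' is a control input; use set_dof_targets(...) instead"
            else:
                msg = f"{role}: unknown key '{key}'; valid keys: {sorted(VALID_KEYS)}"
            logger.warning(msg)
            emitted.append(msg)
        return emitted
```

Because the dedupe set lives on the instance, a second handler in the same process would report the same key again, which matches the per-handler scoping described above.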

Tests pinning the fix:

metasim/test/test_set_states_key_validation.py — 9 unit tests (known-keys-quiet, control-input-hint, unknown-key-message, deduplication, per-role labelling, TensorState fast-path, plus the set_dof_targets cache-invalidation guard from Issue 9)

Forward / backward compat: Pure warning addition — runtime behaviour of set_states is unchanged for code that passed valid keys. Code that previously silently no-op’d now produces a clear log line; the hot path overhead is one set membership check per (role, key) seen.

Issue 9: `set_dof_targets` left state cache stale (P0 — FIXED 2026-05-26)#

Location: metasim/sim/base.py

Problem (historical): MuJoCo writes actuator ctrl in _set_dof_targets, and _get_states reads ctrl back as joint_pos_target. The base class’s public set_dof_targets forgot to invalidate the state cache. Any get_states between set_dof_targets and the next simulate returned the previous joint_pos_target.

Status: ✅ FIXED. BaseSimHandler.set_dof_targets now calls _invalidate_state_caches() before dispatching to the backend. Universal because no concrete handler overrides the public set_dof_targets — every backend benefits.
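A stub makes the staleness easy to reproduce without a simulator. The sketch below is illustrative only: StubHandler and its flag are hypothetical, and a single float stands in for the actuator target that a real backend would read back.

```python
class StubHandler:
    """Minimal stand-in handler: one cached state dict and one target value."""

    def __init__(self, invalidate_on_target):
        self._invalidate_on_target = invalidate_on_target
        self._target = 0.0
        self._cache = None

    def get_states(self):
        if self._cache is None:  # cache miss: read back from the "backend"
            self._cache = {"joint_pos_target": self._target}
        return self._cache

    def set_dof_targets(self, target):
        if self._invalidate_on_target:
            self._cache = None  # the fix: drop the cache before dispatching
        self._target = target


def target_seen_after_update(handler):
    """Warm the cache, change the target, then read the state again."""
    handler.get_states()
    handler.set_dof_targets(1.5)
    return handler.get_states()["joint_pos_target"]
```

With invalidation disabled the second read still returns the old target; with it enabled the new target is visible before any simulate call.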

Tests pinning the fix:

metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache — integration; verified on mujoco / sapien3 in roboverse env, marker also covers isaacsim / isaacgym / newton
metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit — fast unit-level guard with stub handler

Forward / backward compat: Pure bugfix. The first get_states after a set_dof_targets now reflects the fresh action; previously it returned stale data. No API surface change.

Issue 10: Robot config drift (P0 — FIXED via test 2026-05-26)#

Location: roboverse_pack/robots/*.py

Problem (historical): 50+ RobotCfg subclasses with zero validation tests. Typos in default_joint_positions (joints that aren’t in joint_limits) and out-of-limit defaults were silently clamped or ignored at sim launch — observable as wrong reset states rather than as a config error.

Status: ✅ FIXED via test gate. metasim/test/test_robot_cfg_validation_general.py and RoboVerse/tests/test_roboverse_robot_cfg_validation.py walk every discoverable RobotCfg subclass and assert: instantiation, non-empty name, default_joint_positions keys ⊆ joint_limits keys, defaults ∈ limit intervals.
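The assertions listed above amount to a small validation routine. The following sketch assumes a simplified stand-in config (RobotCfgStub) rather than the real RobotCfg, and validate_robot_cfg is a hypothetical name; the out-of-range numbers in the usage mirror the Go2Cfg entry in the list of surfaced bugs.

```python
from dataclasses import dataclass, field


@dataclass
class RobotCfgStub:
    """Simplified stand-in for a RobotCfg subclass (illustration only)."""

    name: str
    joint_limits: dict = field(default_factory=dict)
    default_joint_positions: dict = field(default_factory=dict)


def validate_robot_cfg(cfg):
    """Return a list of human-readable problems; empty means the config passes."""
    problems = []
    if not cfg.name:
        problems.append("empty name")
    for joint, value in cfg.default_joint_positions.items():
        if joint not in cfg.joint_limits:
            problems.append(f"{joint}: default set but no joint limit")
            continue
        lower, upper = cfg.joint_limits[joint]
        if not lower <= value <= upper:
            problems.append(f"{joint}: default {value} outside [{lower}, {upper}]")
    return problems


# Usage: one out-of-limit default and one key missing from joint_limits.
bad = RobotCfgStub(
    name="go2_stub",
    joint_limits={"RL_thigh_joint": (-4.54, 0.52)},
    default_joint_positions={"RL_thigh_joint": 1.0, "typo_joint": 0.0},
)
problems = validate_robot_cfg(bad)
```

Returning a list instead of raising on the first problem lets a parametrized test report every defect of a config in one run.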

Bugs surfaced (xfail-documented, not fixed in this pass to preserve backward compat for trained policies):

AlohaAgilexCfg — 16 fl_/fr_joint{1..8} keys in default_joint_positions but only single-arm names in joint_limits (bimanual override gap)
G1TrackingCfg — regex keys like .*_ankle_pitch_joint in default_joint_positions not expanded; joint_limits has the concrete left_/right_ names
YamCfg — joint2 / joint4 defaults copy-pasted from Franka home pose but joint_limits are Yam’s narrower ranges
ArxL5Cfg — same Franka copy-paste, different limit ranges
VegaCfg — torso_j1 joint_limits is the single-point [0.2, 0.2] — looks like a fixed offset, not a range
SoArm100Cfg — Wrist_Pitch default -2.356 outside [-0.192, 3.927]
KochCfg — same shape as SoArm100
Go2Cfg — RL/RR_thigh_joint default 1.0 outside [-4.54, 0.52] (sign error)
AllegroHandCfg — thumb_joint_0 default 0 below lower limit 0.263

When a robot is fixed, its xfail flips to xpass — the test_known_gap_dicts_match_actual_failures self-check then tells the maintainer to remove the entry so the contract tightens.
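The self-check reduces to comparing the documented gaps with the failures actually observed. A minimal sketch, assuming both are available as collections of class names; the two function names are hypothetical and the scenario in the usage is invented for illustration.

```python
def stale_known_gaps(known_gaps, actual_failures):
    """Entries documented as gaps that no longer fail and should be removed."""
    return sorted(set(known_gaps) - set(actual_failures))


def undocumented_failures(known_gaps, actual_failures):
    """Failures that nobody has documented yet; these should fail the build."""
    return sorted(set(actual_failures) - set(known_gaps))


# Hypothetical scenario: YamCfg has been repaired but is still listed.
known = {"Go2Cfg", "YamCfg"}
failing = {"Go2Cfg"}
to_remove = stale_known_gaps(known, failing)
```

Keeping both directions strict means the gap list can only shrink as robots are fixed.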

Forward / backward compat: Pure additive contract enforcement. No RobotCfg was modified — the test only documents the gaps so production behaviour (silent clamp / silent ignore) is preserved for anyone training against the current defaults.

Warning Catalog (added 2026-05-30)#

As part of the 2026-05 hardening pass, BaseSimHandler and several concrete backends started emitting one-shot warnings for the silent-drop / cross-backend asymmetry patterns previously hidden inside _set_states / _set_dof_targets / actuator wiring / scenario setup. Each warning fires at most once per (handler, gap-identity) so hot-path replay does not spam the log; each tells you exactly what was dropped and how to make it stop.

Source	Trigger	Meaning	Fix
`set_states`	dict has `dof_pos_target` / `dof_vel_target` / `dof_torque`	These are control inputs, not state — every backend silently drops them.	Call `set_dof_targets(...)` instead.
`set_states`	dict has unknown key	Typo or a field no backend honours.	Use one of `pos` / `rot` / `dof_pos`; the warning lists the valid set.
`set_states`	dict has `vel` / `ang_vel` / `dof_vel`	Velocity fields are populated by `get_states` for inspection, but no backend writes them in `_set_states` today (MuJoCo zeros qvel, Sapien3 ignores).	Initialise momentum via `simulate()` with a non-zero initial action, or open a feature request for backend-side velocity write.
`set_states`	dict has only one of `pos` / `rot` for an entity	Cross-backend divergence: MuJoCo silently fills the missing component with a default, Sapien3 raises `KeyError`.	Pass both `pos` and `rot`, or neither.
`set_dof_targets`	action targets unknown robot name	Robot not in `scenario.robots` — every backend silently drops the action.	Use a robot name listed in the warning’s “Known robots” suffix.
`set_dof_targets`	dict has unknown joint name under a robot	Joint not in `handler.get_joint_names(robot.name)` — silently dropped.	Use one of the joint names from `get_joint_names`; the warning lists them.
`set_dof_targets`	unknown top-level key under a robot	Typo like `dof_pos_targt`; nothing reaches the actuator.	Use `dof_pos_target` / `dof_vel_target` / `dof_torque` / `dof_effort_target`.
`MujocoHandler._apply_actuator_settings`	`stiffness` / `damping` set but `effort_limit_sim` unset	MJCF-authored `forcerange` stays active and silently dominates the new PD gain. Same RoboVerse config → different effective torque per backend.	Set `effort_limit_sim` on the actuator cfg to the value you want, or accept the asset-file clamp deliberately.
`NewtonHandler._apply_actuator_settings`	same as MuJoCo above	Newton inherits `joint_effort_limit` from the asset file.	Same — set `effort_limit_sim`.
`NewtonHandler._set_dof_targets`	`self._control` has no `joint_target_pos` / `joint_target_vel` / `joint_f` buffer	Newton model has no position / velocity / effort actuator for the targeted joint. Every action write is dropped.	Verify the MJCF / URDF has actuators on the joints you control, or use `control_type="effort"` deliberately.
`ScenarioCfg.__post_init__`	duplicate name across robots + objects	`object_dict = {obj.name: obj for ...}` in every handler collapses the duplicate. Set / get / step all silently reference the wrong entity.	Give each robot and object a unique `name`.
`TaskBase.reset` / `RLTaskEnv.reset`	`seed=N` passed but handler has no `set_seed`	Reproducibility contract: rollouts are not bit-reproducible on that backend’s simulator side. (Won’t fire on backends that inherit `BaseSimHandler.set_seed` — included for forward-compat against future backends that override.)	Implement `set_seed` on the handler, or call `random.seed` / `np.random.seed` / `torch.manual_seed` directly.
`ParallelHandler._check_error`	worker process not alive	A multiprocessing worker died (OOM-kill, SIGKILL, segfault) without writing to `error_queue`.	The exit code is reported in the exception; check the worker log or run with a smaller `num_envs` to bisect.
`ParallelHandler._recv_or_surface`	`EOFError` / `BrokenPipeError` on `remote.recv()`	Worker closed its pipe — typically the worker process is dead and `error_queue` carries the real Python traceback.	The wrapped exception’s `__cause__` is the original IPC error; the message describes which worker died.

When you fix a warning’s root cause, the warning stops firing — it does NOT need a separate “silence” flag. The dedupe lives on the handler instance, so each Python process emits each warning at most once.
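The duplicate-name row in the catalog is worth a concrete illustration, because the collapse happens in an ordinary dict comprehension. The sketch below uses plain strings instead of real robot and object configs, and duplicate_names is a hypothetical helper, not the check inside ScenarioCfg.__post_init__.

```python
from collections import Counter


def duplicate_names(robot_names, object_names):
    """Return the names that appear more than once across robots and objects."""
    counts = Counter([*robot_names, *object_names])
    return sorted(name for name, n in counts.items() if n > 1)


# The collapse the warning guards against: a dict comprehension keeps only
# the last entity registered under a repeated name.
entities = [("cube", "robot"), ("cube", "object"), ("table", "object")]
by_name = {name: kind for name, kind in entities}
```

Here by_name ends up with two entries for three entities, so every lookup of "cube" resolves to the object and the robot becomes unreachable by name.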

Improvement Roadmap#

Phase 1: Critical Fixes (Weeks 1-2)#

┌────────────────────────────────────────────────────────┐
│ PHASE 1: Critical Fixes                                │
│                                                        │
│ □ TODO-001: Fix state cache consistency                │
│ □ TODO-002: Add @abstractmethod decorators             │
│ □ TODO-003: Add robot configuration validation tests   │
│ □ TODO-004: Integrate pytest-cov in CI                 │
└────────────────────────────────────────────────────────┘

Phase 2: Test Coverage (Weeks 3-6)#

┌────────────────────────────────────────────────────────┐
│ PHASE 2: Test Coverage Improvement                     │
│                                                        │
│ □ TODO-005: Add task environment integration tests     │
│ □ TODO-006: Add learning algorithm unit tests          │
│ □ TODO-007: Add state conversion tests                 │
│ □ TODO-008: Add domain randomization tests             │
└────────────────────────────────────────────────────────┘

Phase 3: Interface Unification (Weeks 7-10)#

┌────────────────────────────────────────────────────────┐
│ PHASE 3: Interface Unification                         │
│                                                        │
│ □ TODO-009: Create unified environment factory         │
│ □ TODO-010: Standardize configuration loading          │
│ □ TODO-011: Unify error handling across simulators     │
│ □ TODO-012: Add deprecation warnings for old APIs      │
└────────────────────────────────────────────────────────┘

Phase 4: Code Quality (Weeks 11-14)#

┌────────────────────────────────────────────────────────┐
│ PHASE 4: Code Quality                                  │
│                                                        │
│ □ TODO-013: Remove commented-out code                  │
│ □ TODO-014: Extract magic numbers to constants         │
│ □ TODO-015: Add comprehensive type annotations         │
│ □ TODO-016: Add performance benchmarks                 │
└────────────────────────────────────────────────────────┘

Phase 5: Architecture Evolution (Long-term)#

┌────────────────────────────────────────────────────────┐
│ PHASE 5: Architecture Evolution                        │
│                                                        │
│ □ TODO-017: Implement plugin architecture for sims     │
│ □ TODO-018: Create unified configuration system        │
│ □ TODO-019: Add async simulation support               │
│ □ TODO-020: Performance profiling integration          │
└────────────────────────────────────────────────────────┘

Detailed TODO List#

TODO-001: Fix State Cache Consistency#

Priority: P0 (Critical)
Effort: Low (1-2 days)
Risk: Low

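Description: Fix the state cache to maintain separate caches for tensor and dict formats. This item landed on 2026-05-26 (see Issue 1); the sketch below is the original proposal, and the landed helper is named _invalidate_state_caches rather than _invalidate_cache.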

Implementation:

# metasim/sim/base.py

from __future__ import annotations

from abc import ABC
from typing import Literal

# TensorState, DictEnvState and state_tensor_to_nested are assumed to be
# imported as in the existing file.


class BaseSimHandler(ABC):
    def __init__(self, scenario, optional_queries=None):
        # ... existing code ...
        self._state_cache_expire = True
        self._tensor_state_cache: TensorState | None = None
        self._dict_state_cache: list[DictEnvState] | None = None

    def _invalidate_cache(self) -> None:
        """Invalidate all state caches."""
        self._state_cache_expire = True
        self._tensor_state_cache = None
        self._dict_state_cache = None

    def set_states(self, states, env_ids=None) -> None:
        """Set states and invalidate cache."""
        self._invalidate_cache()
        self._set_states(states, env_ids)

    def get_states(
        self,
        env_ids: list[int] | None = None,
        mode: Literal["tensor", "dict"] = "tensor",
    ) -> TensorState | list[DictEnvState]:
        """Get states with independent caching for each mode."""
        # Caveat: the cache is not keyed on env_ids, so callers should not mix
        # different env_ids between invalidations.
        if self._state_cache_expire:
            self._tensor_state_cache = self._get_states(env_ids=env_ids)
            self._dict_state_cache = None  # Lazy conversion
            self._state_cache_expire = False

        if mode == "tensor":
            return self._tensor_state_cache

        # Dict mode: convert from the tensor cache once, then reuse the result
        if self._dict_state_cache is None:
            self._dict_state_cache = state_tensor_to_nested(
                self, self._tensor_state_cache
            )
        return self._dict_state_cache

Test Case:

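# metasim/test/sim/test_state_cache.py

import pytest
import torch

# TensorState and create_test_handler are assumed to come from the project's
# own modules and test helpers.

@pytest.mark.general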
def test_state_cache_mode_independence():
    """Verify that switching modes doesn't corrupt cache."""
    handler = create_test_handler()
    handler.launch()
    
    # Get tensor state
    states_t1 = handler.get_states(mode="tensor")
    assert isinstance(states_t1, TensorState)
    
    # Get dict state (should not affect tensor cache)
    states_d = handler.get_states(mode="dict")
    assert isinstance(states_d, list)
    
    # Get tensor state again (should return same type)
    states_t2 = handler.get_states(mode="tensor")
    assert isinstance(states_t2, TensorState)
    
    # Values should match
    assert torch.allclose(states_t1.pos, states_t2.pos)

Acceptance Criteria:

Test case passes
Existing tests still pass
No breaking changes to public API

TODO-002: Add @abstractmethod Decorators#

Priority: P1
Effort: Very Low (1 hour)
Risk: Very Low

Files to modify:

metasim/sim/base.py

Changes:

# Before
# @abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    raise NotImplementedError

# After
@abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    """Set DOF targets. Subclasses must implement this method."""
    raise NotImplementedError

Verification:

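# Run mypy over the handler package rather than base.py alone: mypy reports
# abstract-method gaps where a class is instantiated, not where it is defined
mypy metasim/sim/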
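For reference, this is how @abstractmethod behaves at runtime in standard Python: a subclass that omits the method can still be defined and imported, and the TypeError appears only when it is instantiated. The class names below are stand-ins, not RoboVerse handlers.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def _set_dof_targets(self, actions):
        """Subclasses must implement this method."""


class Incomplete(Base):
    # Defining (and importing) this class succeeds; only construction fails.
    pass


class Complete(Base):
    def _set_dof_targets(self, actions):
        return len(actions)


def can_instantiate(cls):
    """Return True if cls() succeeds, False if ABC rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

The set of still-abstract names is available as `Incomplete.__abstractmethods__`, which is one way for a test to list the gaps of a backend without constructing it.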

TODO-003: Add Robot Configuration Validation Tests#

Priority: P0
Effort: Medium (2-3 days)
Risk: Low

Implementation:

# metasim/test/test_robot_configs.py

import importlib
import pkgutil

import pytest

def get_all_robot_configs():
    """Dynamically discover all robot configuration classes."""
    import roboverse_pack.robots as robots_module

    configs = []
    for _importer, modname, _ispkg in pkgutil.iter_modules(robots_module.__path__):
        if not modname.endswith('_cfg'):
            continue
        module = importlib.import_module(f'roboverse_pack.robots.{modname}')
        for name in dir(module):
            obj = getattr(module, name)
            # Keep only classes defined in this module, so a base class that is
            # merely imported there is not collected once per file.
            if (isinstance(obj, type) and
                obj.__module__ == module.__name__ and
                hasattr(obj, 'name') and
                name.endswith('Cfg')):
                configs.append(obj)
    return configs

ALL_ROBOT_CONFIGS = get_all_robot_configs()

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS, 
                         ids=lambda x: x.__name__)
def test_robot_config_instantiation(robot_cfg_cls):
    """Verify robot config can be instantiated."""
    cfg = robot_cfg_cls()
    assert cfg.name is not None
    assert isinstance(cfg.name, str)
    assert len(cfg.name) > 0

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_asset_path(robot_cfg_cls):
    """Verify robot config has at least one asset path."""
    cfg = robot_cfg_cls()
    
    asset_paths = [
        getattr(cfg, 'usd_path', None),
        getattr(cfg, 'urdf_path', None),
        getattr(cfg, 'mjcf_path', None),
    ]
    
    valid_paths = [p for p in asset_paths if p is not None and len(p) > 0]
    assert len(valid_paths) > 0, f"{cfg.name} has no valid asset path"

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_actuators(robot_cfg_cls):
    """Verify robot config has actuator definitions."""
    cfg = robot_cfg_cls()
    
    if hasattr(cfg, 'actuators'):
        assert len(cfg.actuators) > 0, f"{cfg.name} has no actuators defined"

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_joint_limits_valid(robot_cfg_cls):
    """Verify joint limits are valid (lower < upper)."""
    cfg = robot_cfg_cls()
    
    if hasattr(cfg, 'joint_limits'):
        for joint_name, (lower, upper) in cfg.joint_limits.items():
            assert lower < upper, \
                f"{cfg.name}.{joint_name}: lower ({lower}) >= upper ({upper})"

TODO-004: Integrate pytest-cov in CI#

Priority: P0
Effort: Low (1 day)
Risk: Very Low

Changes to .github/workflows/premerge-ci.yml:

# Add coverage reporting step
- name: Run tests with coverage
  run: |
    pytest metasim/test \
      --cov=metasim \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=30 \
      -k ${{ matrix.test_type }}

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    file: ./coverage.xml
    flags: ${{ matrix.test_type }}
    fail_ci_if_error: false

Add pyproject.toml configuration:

[tool.coverage.run]
source = ["metasim"]
omit = ["metasim/test/*", "*/__pycache__/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]

TODO-005: Add Task Environment Integration Tests#

Priority: P1
Effort: High (1 week)
Risk: Medium

Implementation:

# metasim/test/test_task_environments.py

import pytest
import torch
from metasim.task.registry import get_task_class

# Select representative tasks for testing
CORE_TASKS = [
    "pick_cube",
    "place_cube", 
    "open_drawer",
    "close_drawer",
    "push_button",
]

@pytest.mark.mujoco
@pytest.mark.parametrize("task_name", CORE_TASKS)
def test_task_reset_step_mujoco(task_name):
    """Test basic reset/step cycle for core tasks on MuJoCo."""
    task_cls = get_task_class(task_name)
    scenario = task_cls.scenario.copy()
    scenario.update(
        simulator="mujoco",
        num_envs=1,
        headless=True,
    )
    
    env = task_cls(scenario, device="cpu")
    env.launch()
    
    try:
        # Test reset
        obs, info = env.reset()
        assert obs is not None
        
        # Test step
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        
        assert obs is not None
        assert isinstance(reward, (int, float, torch.Tensor))
        assert isinstance(terminated, (bool, torch.Tensor))
        assert isinstance(truncated, (bool, torch.Tensor))
    finally:
        env.close()

TODO-006: Add Learning Algorithm Unit Tests#

Priority: P1
Effort: High (1-2 weeks)
Risk: Medium

Example for Diffusion Policy:

# roboverse_learn/il/tests/test_diffusion_policy.py

import pytest
import torch
from roboverse_learn.il.policies.dp.ddpm_dit_image_policy import DDPMDiTImagePolicy

@pytest.fixture
def sample_obs():
    """Create sample observation for testing."""
    return {
        "image": torch.randn(1, 3, 224, 224),
        "agent_pos": torch.randn(1, 7),
    }

def test_diffusion_policy_forward(sample_obs):
    """Test forward pass of diffusion policy."""
    policy = DDPMDiTImagePolicy(
        obs_dim=7,
        action_dim=7,
        horizon=16,
        # ... minimal config
    )

    # pytest injects the fixture value; calling sample_obs() directly is an error
    action = policy.predict_action(sample_obs)
    
    assert action.shape == (1, 16, 7)  # (batch, horizon, action_dim)

def test_diffusion_policy_training_step():
    """Test single training step."""
    policy = DDPMDiTImagePolicy(...)
    optimizer = torch.optim.Adam(policy.parameters())
    
    batch = create_training_batch()
    loss = policy.compute_loss(batch)
    
    assert loss.requires_grad
    loss.backward()
    optimizer.step()

TODO-009: Create Unified Environment Factory#

Priority: P1
Effort: Medium (3-5 days)
Risk: Medium

Implementation:

# metasim/env_factory.py

from typing import Union, List, Optional
from metasim.scenario.robot import RobotCfg

def make_env(
    task: str,
    robots: Optional[List[Union[str, RobotCfg]]] = None,
    simulator: str = "mujoco",
    num_envs: int = 1,
    headless: bool = True,
    device: str = "cuda",
    cameras: Optional[List] = None,
    **kwargs
):
    """
    Unified environment factory for RoboVerse.
    
    This is the recommended way to create environments.
    
    Args:
        task: Task name (e.g., "pick_cube", "locomotion_walk")
        robots: List of robot names or RobotCfg instances
        simulator: Simulator backend ("mujoco", "isaacsim", "sapien3", etc.)
        num_envs: Number of parallel environments
        headless: Whether to run in headless mode
        device: Device for tensor computations
        cameras: Camera configurations for observations
        **kwargs: Additional task-specific arguments
    
    Returns:
        BaseTaskEnv: Configured environment instance
    
    Example:
        >>> env = make_env(
        ...     task="pick_cube",
        ...     robots=["franka"],
        ...     simulator="mujoco",
        ...     num_envs=16,
        ... )
        >>> obs, info = env.reset()
        >>> obs, reward, term, trunc, info = env.step(action)
    """
    from metasim.task.registry import get_task_class
    from metasim.utils.setup_util import get_robot
    
    # Resolve task class
    task_cls = get_task_class(task)
    
    # Build scenario from task default
    scenario = task_cls.scenario.copy()
    
    # Resolve robots
    if robots is not None:
        resolved_robots = []
        for robot in robots:
            if isinstance(robot, str):
                resolved_robots.append(get_robot(robot))
            else:
                resolved_robots.append(robot)
        scenario.robots = resolved_robots
    
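    # Apply overrides. Cameras are only replaced when the caller passes
    # them explicitly, so the task's default cameras survive otherwise
    # (same rule as the robots handling above).
    overrides = dict(simulator=simulator, num_envs=num_envs, headless=headless)
    if cameras is not None:
        overrides["cameras"] = cameras
    scenario.update(**overrides)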
    
    # Create and return environment
    env = task_cls(scenario, device=device, **kwargs)
    env.launch()
    
    return env

# Also register with gymnasium for compatibility
def register_gymnasium_envs():
    """Register all RoboVerse tasks with Gymnasium."""
    import gymnasium
    from metasim.task.registry import TASK_REGISTRY
    
    for task_name in TASK_REGISTRY:
        gymnasium.register(
            id=f"RoboVerse/{task_name}",
            entry_point="metasim.env_factory:make_env",
            kwargs={"task": task_name},
        )
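
The name-or-config resolution step inside `make_env` is easy to test in isolation. The following is a minimal sketch of that step only, not the actual `metasim` implementation: the `RobotCfg` dataclass is a stand-in, and `ROBOT_REGISTRY` and `resolve_robots` are hypothetical names introduced for illustration. It assumes that an unknown robot name should fail early with a message listing the known names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class RobotCfg:
    """Stand-in for metasim's RobotCfg; only the fields used below."""

    name: str
    num_joints: int = 7


# Hypothetical registry mapping robot names to default configs.
ROBOT_REGISTRY: Dict[str, RobotCfg] = {
    "franka": RobotCfg(name="franka", num_joints=9),
}


def resolve_robots(robots: Sequence[Union[str, RobotCfg]]) -> List[RobotCfg]:
    """Turn a mixed list of names and configs into a list of configs."""
    resolved = []
    for robot in robots:
        if isinstance(robot, RobotCfg):
            # Already a config: pass it through untouched.
            resolved.append(robot)
        elif robot in ROBOT_REGISTRY:
            resolved.append(ROBOT_REGISTRY[robot])
        else:
            known = ", ".join(sorted(ROBOT_REGISTRY))
            raise ValueError(f"Unknown robot {robot!r}; known robots: {known}")
    return resolved
```

Failing at resolution time, before any simulator is launched, keeps a typo in a robot name from surfacing as a late backend error.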

TODO-013: Remove Commented-Out Code#

Priority: P2
Effort: Very Low (2 hours)
Risk: Very Low

Files to clean:

File	Line	Content
`metasim/sim/base.py`	18	`# from metasim.utils.hf_util import FileDownloader`
`metasim/sim/base.py`	36	`# FileDownloader(scenario).do_it()`
`metasim/sim/base.py`	86	`# @abstractmethod`
`metasim/scenario/scenario.py`	72	Various commented code

Approach:

Search for full-line `#` comments whose body looks like code (imports, calls, assignments, decorators)
Review each instance
Either remove or convert to proper TODO comment
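
The search step can be partly automated. Below is a heuristic sketch, assuming that a full-line comment counts as commented-out code when its body parses as an import, an assignment, or a call, or when it is a bare decorator. The function names are hypothetical and the heuristic will miss some cases (for example decorators with arguments), so every hit still needs the manual review described above.

```python
import ast
import io
import tokenize
from typing import List, Tuple


def looks_like_code(comment_body: str) -> bool:
    """Heuristically decide whether a comment body is disabled code."""
    body = comment_body.strip()
    # A bare decorator such as "@abstractmethod" does not parse on its own.
    if body.startswith("@") and body[1:].replace(".", "").isidentifier():
        return True
    try:
        tree = ast.parse(body)
    except SyntaxError:
        return False  # ordinary prose almost never parses
    if len(tree.body) != 1:
        return False
    node = tree.body[0]
    if isinstance(node, (ast.Import, ast.ImportFrom, ast.Assign, ast.AugAssign)):
        return True
    # Accept call expressions; reject bare names such as "noqa".
    return isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)


def find_commented_code(source: str) -> List[Tuple[int, str]]:
    """Return (line number, comment text) for suspicious full-line comments."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        if not tok.line.lstrip().startswith("#"):
            continue  # skip trailing comments on code lines
        if looks_like_code(tok.string.lstrip("#")):
            hits.append((tok.start[0], tok.string))
    return hits
```

Using `tokenize` rather than a plain regex means `#` characters inside string literals are not mistaken for comments.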

TODO-014: Extract Magic Numbers to Constants#

Priority: P2
Effort: Low (1-2 days)
Risk: Low

Create constants file:

# metasim/constants.py

"""Framework-wide constants."""

# Simulation defaults
DEFAULT_DT = 0.015  # 15ms physics timestep
DEFAULT_DECIMATION = 2
DEFAULT_GRAVITY = (0.0, 0.0, -9.81)

# Cache settings
STATE_CACHE_SIZE = 1000
MAX_PARALLEL_ENVS = 4096

# Timeouts
DEFAULT_LAUNCH_TIMEOUT = 30.0
DEFAULT_STEP_TIMEOUT = 5.0

# Numerical tolerances
POSITION_TOLERANCE = 1e-5
ROTATION_TOLERANCE = 1e-4
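
Once the constants exist, call sites can take them as default arguments instead of repeating literals. This is a small illustration under the assumption that `POSITION_TOLERANCE` is an absolute tolerance; the helper `positions_match` is a hypothetical name, and the constant is redefined here only so the block runs on its own.

```python
import math
from typing import Final, Sequence

# Same value as in the constants file; Final lets type checkers flag reassignment.
POSITION_TOLERANCE: Final[float] = 1e-5


def positions_match(
    a: Sequence[float],
    b: Sequence[float],
    tol: float = POSITION_TOLERANCE,
) -> bool:
    """True when every coordinate pair differs by at most `tol`."""
    if len(a) != len(b):
        raise ValueError("positions must have the same number of coordinates")
    # rel_tol=0.0 makes this a purely absolute comparison.
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b))
```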

Implementation Guidelines#

Safe Modification Protocol#

Before making any changes, follow this checklist:

□ Pre-modification
  ├── □ Read existing tests for the module
  ├── □ Run existing tests (ensure they pass)
  ├── □ Understand the change's impact scope
  └── □ Check for downstream dependencies

□ Implementation
  ├── □ Write tests first (TDD preferred)
  ├── □ Make small, focused commits
  ├── □ Maintain backward compatibility
  └── □ Add deprecation warnings if needed

□ Post-modification
  ├── □ Run full test suite locally
  ├── □ Check for linter errors
  ├── □ Update documentation if needed
  └── □ Create detailed PR description

Feature Flag Pattern#

For high-risk changes, use feature flags:

# metasim/config.py

class FeatureFlags:
    """Feature flags for gradual rollout of changes."""
    
    # State cache v2 with independent tensor/dict caches
    USE_INDEPENDENT_STATE_CACHE = False
    
    # New unified environment factory
    USE_UNIFIED_ENV_FACTORY = False
    
    # Strict type checking in configs
    STRICT_CONFIG_VALIDATION = False
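
For a gradual rollout, it helps if a flag can be flipped without editing source. One possible way to do that, sketched here as an assumption rather than an existing RoboVerse feature, is an environment-variable override. The `ROBOVERSE_` prefix and the `flag_enabled` helper are hypothetical.

```python
import os
from typing import Mapping, Optional


class FeatureFlags:
    """Defaults copied from the flag class above."""

    USE_INDEPENDENT_STATE_CACHE = False
    USE_UNIFIED_ENV_FACTORY = False
    STRICT_CONFIG_VALIDATION = False


_TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return a flag's value, letting ROBOVERSE_<NAME> override the default."""
    if not name.isupper() or not hasattr(FeatureFlags, name):
        raise KeyError(f"Unknown feature flag: {name}")
    env = os.environ if environ is None else environ
    raw = env.get(f"ROBOVERSE_{name}")
    if raw is None:
        # No override set: fall back to the class default.
        return bool(getattr(FeatureFlags, name))
    return raw.strip().lower() in _TRUTHY
```

A call site would then branch on `flag_enabled("USE_UNIFIED_ENV_FACTORY")`, and a CI job could exercise the new path by exporting `ROBOVERSE_USE_UNIFIED_ENV_FACTORY=1`.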

Deprecation Pattern#

import warnings
from functools import wraps

def deprecated(message: str, removal_version: str = "0.3.0"):
    """Decorator to mark functions as deprecated."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}. "
                f"Will be removed in version {removal_version}.",
                DeprecationWarning,
                stacklevel=2
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Usage
@deprecated("Use make_env() instead", removal_version="0.4.0")
def create_environment_legacy(*args, **kwargs):
    ...
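
A deprecation is only useful if the warning actually fires, so it is worth pinning with a test. The sketch below repeats the decorator so that it runs standalone and checks it with the standard library only; `call_and_capture` is a hypothetical helper, and the body of `create_environment_legacy` is a placeholder.

```python
import warnings
from functools import wraps


def deprecated(message: str, removal_version: str = "0.3.0"):
    """Same decorator as above, repeated so this block is self-contained."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}. "
                f"Will be removed in version {removal_version}.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("Use make_env() instead", removal_version="0.4.0")
def create_environment_legacy(task):
    return f"legacy:{task}"  # placeholder body for the example


def call_and_capture(func, *args, **kwargs):
    """Call func and return (result, warnings recorded during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # The default filters can hide DeprecationWarning, so record everything.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, caught
```

In a pytest suite, `pytest.warns(DeprecationWarning)` would express the same check more compactly.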

Testing Strategy#

Test Hierarchy#

┌─────────────────────────────────────────────────────────────┐
│                    End-to-End Tests                         │
│              (Full training pipeline tests)                 │
│                      ~5% of tests                           │
├─────────────────────────────────────────────────────────────┤
│                   Integration Tests                         │
│         (Task reset/step, multi-env, rendering)             │
│                     ~25% of tests                           │
├─────────────────────────────────────────────────────────────┤
│                      Unit Tests                             │
│    (Individual functions, state conversion, configs)        │
│                     ~70% of tests                           │
└─────────────────────────────────────────────────────────────┘

Running Tests Locally#

# Run all general tests (no simulator required)
pytest metasim/test -k general -v

# Run MuJoCo tests
pytest metasim/test -k mujoco -v

# Run with coverage
pytest metasim/test --cov=metasim --cov-report=html

# Run specific test file
pytest metasim/test/sim/test_state_cache.py -v

Coverage Targets#

Phase	Target Coverage	Timeline
Current	~15-20%	-
Phase 1	30%	Week 2
Phase 2	50%	Week 6
Phase 3	60%	Week 10
Long-term	70%+	Week 14+
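
These targets could be enforced as a ratcheting gate in CI rather than tracked by hand. The sketch below assumes each target becomes the required minimum from its listed week onward; that reading of the table, and the names `COVERAGE_MILESTONES` and `coverage_floor`, are assumptions made for illustration.

```python
from typing import List, Tuple

# (week from which the target applies, minimum coverage in percent),
# taken from the table above.
COVERAGE_MILESTONES: List[Tuple[int, int]] = [(2, 30), (6, 50), (10, 60), (14, 70)]


def coverage_floor(week: int) -> int:
    """Return the minimum coverage percentage required in a given week."""
    floor = 0  # no gate before the first milestone
    for start_week, target in COVERAGE_MILESTONES:
        if week >= start_week:
            floor = target
    return floor
```

The returned value can be passed to pytest-cov's `--cov-fail-under` option so the test run fails when coverage drops below the current floor.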

Appendix: Quick Reference#

Priority Definitions#

Priority	Definition	Response Time
P0	Critical - blocks users or causes data corruption	Immediate
P1	High - significant usability or reliability issue	1-2 weeks
P2	Medium - code quality or maintainability	2-4 weeks
P3	Low - nice to have improvements	Backlog

Effort Definitions#

Effort	Time Estimate	Description
Very Low	< 4 hours	Single file, obvious change
Low	1-2 days	Few files, clear scope
Medium	3-5 days	Multiple files, some complexity
High	1-2 weeks	Significant refactoring
Very High	2+ weeks	Architecture-level changes

Contributing#

If you want to contribute to any of these improvements:

Comment on the relevant GitHub issue (or create one)
Follow the Safe Modification Protocol
Start with lower-risk items to familiarize yourself with the codebase
Ask questions in discussions if anything is unclear

Recommended starting points for new contributors:

TODO-002: Add @abstractmethod decorators (Very Low effort)
TODO-013: Remove commented-out code (Very Low effort)
TODO-003: Add robot configuration validation tests (Medium effort, high impact)
