# RoboVerse Architecture Review & Improvement Roadmap

> **Document Version**: 1.1
> **Last Updated**: 2026-05-26
> **Status**: Active Development

This document provides a comprehensive architecture review of the RoboVerse codebase and outlines a structured improvement roadmap. It is intended for core maintainers and contributors who want to understand the current state of the codebase and contribute to its improvement.

> **2026-05-26 update**: 4 of the originally-listed P0/P1 items have landed
> as code + regression tests. They are kept in the document below for
> historical context, but each is now annotated **STATUS: FIXED** with a
> pointer to the test file that pins the fix in place. Forward / backward
> compatibility statements are included so existing users can adopt the
> changes without surprise — every fix is either a pure warning, a stale-
> cache eviction, or an information upgrade to a previously-silent failure
> mode.

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Architecture Overview](#architecture-overview)
3. [Identified Issues](#identified-issues)
4. [Improvement Roadmap](#improvement-roadmap)
5. [Detailed TODO List](#detailed-todo-list)
6. [Implementation Guidelines](#implementation-guidelines)
7. [Testing Strategy](#testing-strategy)

---

## Executive Summary

### Strengths

RoboVerse is a well-architected multi-simulator robotics framework with several notable strengths:

| Aspect | Assessment | Details |
|--------|------------|---------|
| **Modularity** | Excellent | Clear separation between `metasim`, `roboverse_pack`, and `roboverse_learn` |
| **Simulator Abstraction** | Good | Unified interface across MuJoCo, IsaacSim, SAPIEN, PyBullet, Genesis |
| **Configuration System** | Good | Type-safe `@configclass` decorator with validation |
| **Domain Randomization** | Excellent | Comprehensive DR system with hybrid simulation support |
| **Documentation** | Good | Extensive tutorials and API documentation |

### Critical Issues Requiring Attention

| Priority | Issue | Impact | Effort | Status |
|----------|-------|--------|--------|--------|
| P0 | State cache consistency bug | Data corruption | Low | ✅ FIXED — see Issue 1 |
| P0 | `set_states` silent-drop of control-input keys | Silent no-op | Low | ✅ FIXED — see Issue 8 |
| P0 | `set_dof_targets` cache staleness | Silent stale state | Low | ✅ FIXED — see Issue 9 |
| P0 | Test coverage severely lacking | Reliability | High | 🟡 PARTIAL — see Issue 4 |
| P1 | Parallel-sim error handling | Silent worker death | Medium | ✅ FIXED — see Issue 7 |
| P1 | Backend interface drift (`@abstractmethod` gaps) | Late-bind failures | Low | ✅ FIXED via contract test — see Issue 2 |
| P1 | Configuration system fragmentation | Usability | Medium | ⏳ Open |
| P1 | Environment creation interface inconsistency | Usability | Medium | ⏳ Open |
| P2 | Code quality issues | Maintainability | Low | ⏳ Open |

---

## Architecture Overview

### Module Structure

```
RoboVerse/
├── metasim/                 # Core simulation framework
│   ├── sim/                 # Simulator handlers (MuJoCo, Isaac, SAPIEN, etc.)
│   ├── scenario/            # Scene configuration (robots, objects, cameras)
│   ├── task/                # Task environment abstraction
│   ├── randomization/       # Domain randomization system
│   ├── queries/             # Extended query system (contacts, sensors)
│   └── utils/               # Utilities (configclass, math, state conversion)
│
├── roboverse_pack/          # Assets and task definitions
│   ├── robots/              # 50+ robot configurations
│   ├── tasks/               # 200+ task environments
│   ├── scenes/              # Scene configurations
│   └── queries/             # Custom query implementations
│
├── roboverse_learn/         # Learning algorithms
│   ├── il/                  # Imitation learning (ACT, Diffusion Policy, etc.)
│   ├── rl/                  # Reinforcement learning (PPO, TD3, SAC)
│   └── vla/                 # Vision-Language-Action models
│
└── generation/              # Asset generation and conversion tools
```

### Core Abstractions

```
┌─────────────────────────────────────────────────────────────────┐
│                        BaseSimHandler                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │ MujocoHandler│  │IsaacHandler │  │SAPIENHandler│  ...        │
│  └─────────────┘  └─────────────┘  └─────────────┘             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        BaseTaskEnv                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │PickPlaceTask│  │LocomotionTask│  │ManipulationTask│  ...     │
│  └─────────────┘  └─────────────┘  └─────────────┘             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        ScenarioCfg                               │
│  robots[] + objects[] + cameras[] + lights[] + sim_params       │
└─────────────────────────────────────────────────────────────────┘
```

---

## Identified Issues

### Issue 1: State Cache Consistency Bug (P0 - FIXED 2026-05-26)

**Location**: `metasim/sim/base.py`

**Problem (historical)**: The state cache was modified in-place when switching between `tensor` and `dict` modes, causing data corruption.

**Status**: ✅ **FIXED**. ``BaseSimHandler`` now keeps two independent
caches (``_tensor_state_cache`` / ``_dict_state_cache``). On a miss for
the requested mode it lazily converts from the other. ``set_states``,
``set_dof_targets`` and ``simulate`` all call ``_invalidate_state_caches``.

**Tests pinning the fix**:
- ``metasim/test/sim/test_state_modes.py::test_state_cache_mode_independence`` —
  integration (mujoco / sapien3 / isaacsim / isaacgym / newton)
- ``metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache`` —
  integration; specifically catches the staleness regressed in Issue 9
- ``metasim/test/test_backend_contract_general.py::test_set_states_invalidates_cache_on_all_backends`` —
  static AST check on ``base.py``; runs in ``-k general``
- ``metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit`` —
  unit-level stub guard

**Forward / backward compat**: Pure bugfix. Callers that previously
received corrupted data now receive consistent data — there is no API
surface change. The lazy-conversion path was already the documented
intent; only the in-place mutation was wrong.

---

### Issue 2: Incomplete Abstract Method Declarations (P1 — FIXED via contract test 2026-05-26)

**Location**: `metasim/sim/base.py`

**Status**: ✅ **FIXED in spirit**. ``_set_dof_targets`` is now actually
decorated ``@abstractmethod``. Two other contract methods
(``_get_joint_names`` / ``_get_body_names``) remain commented out
because flipping the decorator would break ``PyrepHandler`` /
partial ``SinglePybulletHandler`` / partial ``GenesisHandler`` at
import time — those backends genuinely do not implement them yet.

Instead of breaking imports, ``metasim/test/test_backend_contract_general.py``
statically asserts every concrete ``BaseSimHandler`` subclass overrides
every documented contract method, with ``xfail`` markers for known
gaps. When a backend catches up, its ``xfail`` flips to ``xpass`` —
that's the signal to enable the decorator for real.

**Forward / backward compat**: Pure additive contract enforcement. No
existing backend was broken. New backends added later will get a
clear failing test on day one if they forget a method.

---

### Issue 3: Configuration System Fragmentation (P1)

**Problem**: Three different configuration systems are used across modules:

| Module | System | Tools |
|--------|--------|-------|
| `metasim` | `@configclass` | Python dataclass + custom wrapper |
| `roboverse_learn/il` | Hydra | YAML + OmegaConf |
| `roboverse_learn/rl` | Mixed | YAML + tyro + argparse |

**Impact**:
- Steep learning curve for contributors
- Configuration cannot be easily shared between modules
- Inconsistent user experience

---

### Issue 4: Test Coverage Severely Lacking (P0)

**Current Coverage Estimate**:

| Module | Test Files | Estimated Coverage |
|--------|------------|-------------------|
| `metasim` | 29 | ~60-70% |
| `roboverse_pack` | 0 | 0% |
| `roboverse_learn` | 1 | <5% |
| `generation` | 1 | ~20% |
| **Overall** | 31 | **~15-20%** |

**Critical Gaps**:
- 50+ robot configurations have no validation tests
- 200+ task environments have no tests
- All learning algorithms (ACT, Diffusion Policy, PPO, TD3) have no tests
- No code coverage reporting in CI

---

### Issue 5: Environment Creation Interface Inconsistency (P1)

**Problem**: Two different ways to create environments:

```python
# Method 1: Gymnasium style (used in clean_rl, vla)
from gymnasium import make_vec
env = make_vec("RoboVerse/task", robots=[...], simulator=sim)

# Method 2: Direct task class (used in fast_td3, rsl_rl, il)
from metasim.task.registry import get_task_class
env = get_task_class(task)(scenario)
```

**Impact**: Confusion for new users; inconsistent integration patterns.

---

### Issue 6: Code Quality Issues (P2)

| Issue | Location | Count |
|-------|----------|-------|
| Commented-out code | Multiple files | ~15 instances |
| Magic numbers | `sim/isaacsim/`, `sim/mujoco/` | ~20 instances |
| Missing type annotations | Various | ~100+ functions |
| Inconsistent error handling | All simulator handlers | Varies |

---

### Issue 7: Parallel Simulation Error Handling (P1 — FIXED 2026-05-26)

**Location**: `metasim/sim/parallel.py`

**Problem (historical)**: ``_check_error()`` was only called from
``launch()``. A worker that died on any other operation (OOM, GPU
failure, asset load error) left the parent process either hanging on
``remote.recv()`` or eventually raising a cryptic ``EOFError`` — the
real traceback in ``error_queue`` was never surfaced.

**Status**: ✅ **FIXED**. Three changes:

1. ``_check_error`` now also detects workers that died without
   reporting (e.g. OOM-killed / SIGKILL) via ``process.is_alive()``.
   Queue messages are drained with full traceback formatting.
2. Every public method on ``ParallelHandler`` (``_set_states``,
   ``_set_dof_targets``, ``_simulate``, ``get_joint_names``,
   ``get_body_names``, ``device``, ``_get_states``) calls
   ``_check_error`` after wire I/O.
3. A new ``_recv_or_surface`` wraps ``remote.recv()`` and translates
   ``EOFError`` / ``BrokenPipeError`` / ``ConnectionResetError`` into
   a ``RuntimeError`` carrying the real worker traceback — instead of
   the user chasing a cryptic IPC exception.

**Tests pinning the fix**:
- ``metasim/test/test_parallel_error_handling_general.py`` — 6 unit
  tests using a sync queue stub (mp.Queue's async put races with the
  immediate empty-check, so a sync stand-in keeps the tests
  deterministic without requiring a real worker process)

**Forward / backward compat**: Pure information upgrade. A call that
previously hung or raised ``EOFError`` now raises ``RuntimeError`` with
the actual worker traceback. Calls that previously succeeded continue
to succeed unchanged.

---

### Issue 8: `set_states` silently dropped control-input keys (P0 — FIXED 2026-05-26)

**Location**: `metasim/sim/base.py` (boundary), every backend `_set_states`

**Problem (historical)**: ``DictRobotState`` advertises
``dof_pos_target`` / ``dof_vel_target`` / ``dof_torque`` as valid keys,
but every backend's ``_set_states`` only honours ``pos`` / ``rot`` /
``dof_pos``. Callers mistakenly passing ``dof_pos_target`` to
``set_states`` got a silent no-op — the joints never moved. This cost
~15 downstream BC experiments before the cause was found.

**Status**: ✅ **FIXED**. ``BaseSimHandler.set_states`` now runs
``_warn_set_states_keys`` before dispatching to the backend. Unknown
keys log a one-shot warning per ``(role, key)`` with the list of valid
keys; the three control-input keys get a specific
``"this is a control input — use set_dof_targets(...) instead"`` hint.
Deduplicated per handler instance so the hot path stays quiet.

**Tests pinning the fix**:
- ``metasim/test/test_set_states_key_validation.py`` — 9 unit tests
  (known-keys-quiet, control-input-hint, unknown-key-message,
  deduplication, per-role labelling, TensorState fast-path, plus the
  ``set_dof_targets`` cache-invalidation guard from Issue 9)

**Forward / backward compat**: Pure warning addition — runtime behaviour
of ``set_states`` is unchanged for code that passed valid keys. Code
that previously silently no-op'd now produces a clear log line; the
hot path overhead is one set membership check per (role, key) seen.

---

### Issue 9: `set_dof_targets` left state cache stale (P0 — FIXED 2026-05-26)

**Location**: `metasim/sim/base.py`

**Problem (historical)**: MuJoCo writes actuator ctrl in
``_set_dof_targets``, and ``_get_states`` reads ctrl back as
``joint_pos_target``. The base class's public ``set_dof_targets``
forgot to invalidate the state cache. Any ``get_states`` between
``set_dof_targets`` and the next ``simulate`` returned the previous
``joint_pos_target``.

**Status**: ✅ **FIXED**. ``BaseSimHandler.set_dof_targets`` now calls
``_invalidate_state_caches()`` before dispatching to the backend.
Universal because no concrete handler overrides the public
``set_dof_targets`` — every backend benefits.

**Tests pinning the fix**:
- ``metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache`` —
  integration; verified on mujoco / sapien3 in ``roboverse`` env, marker
  also covers isaacsim / isaacgym / newton
- ``metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit`` —
  fast unit-level guard with stub handler

**Forward / backward compat**: Pure bugfix. The first ``get_states``
after a ``set_dof_targets`` now reflects the fresh action; previously
it returned stale data. No API surface change.

---

### Issue 10: Robot config drift (P0 — FIXED via test 2026-05-26)

**Location**: `roboverse_pack/robots/*.py`

**Problem (historical)**: 50+ `RobotCfg` subclasses with zero
validation tests. Typos in ``default_joint_positions`` (joints that
aren't in ``joint_limits``) and out-of-limit defaults were silently
clamped or ignored at sim launch — observable as wrong reset states
rather than as a config error.

**Status**: ✅ **FIXED via test gate**.
``metasim/test/test_robot_cfg_validation_general.py`` and
``RoboVerse/tests/test_roboverse_robot_cfg_validation.py`` walk every
discoverable ``RobotCfg`` subclass and assert: instantiation, non-empty
``name``, ``default_joint_positions`` keys ⊆ ``joint_limits`` keys,
defaults ∈ limit intervals.

**Bugs surfaced** (xfail-documented, not fixed in this pass to preserve
backward compat for trained policies):

1. ``AlohaAgilexCfg`` — 16 ``fl_/fr_joint{1..8}`` keys in
   ``default_joint_positions`` but only single-arm names in
   ``joint_limits`` (bimanual override gap)
2. ``G1TrackingCfg`` — regex keys like ``.*_ankle_pitch_joint`` in
   ``default_joint_positions`` not expanded; ``joint_limits`` has the
   concrete ``left_/right_`` names
3. ``YamCfg`` — joint2 / joint4 defaults copy-pasted from Franka home
   pose but ``joint_limits`` are Yam's narrower ranges
4. ``ArxL5Cfg`` — same Franka copy-paste, different limit ranges
5. ``VegaCfg`` — ``torso_j1`` ``joint_limits`` is the single-point
   ``[0.2, 0.2]`` — looks like a fixed offset, not a range
6. ``SoArm100Cfg`` — ``Wrist_Pitch`` default ``-2.356`` outside
   ``[-0.192, 3.927]``
7. ``KochCfg`` — same shape as SoArm100
8. ``Go2Cfg`` — ``RL/RR_thigh_joint`` default ``1.0`` outside
   ``[-4.54, 0.52]`` (sign error)
9. ``AllegroHandCfg`` — ``thumb_joint_0`` default ``0`` below lower
   limit ``0.263``

When a robot is fixed, its ``xfail`` flips to ``xpass`` — the
``test_known_gap_dicts_match_actual_failures`` self-check then tells
the maintainer to remove the entry so the contract tightens.

**Forward / backward compat**: Pure additive contract enforcement. No
``RobotCfg`` was modified — the test only documents the gaps so
production behaviour (silent clamp / silent ignore) is preserved for
anyone training against the current defaults.

---

## Warning Catalog (added 2026-05-30)

As part of the 2026-05 hardening pass, ``BaseSimHandler`` and several
concrete backends started emitting one-shot warnings for the silent-
drop / cross-backend asymmetry patterns previously hidden inside
``_set_states`` / ``_set_dof_targets`` / actuator wiring / scenario
setup. Each warning fires at most once per (handler, gap-identity)
so hot-path replay does not spam the log; each tells you exactly
what was dropped and how to make it stop.

| Source | Trigger | Meaning | Fix |
|---|---|---|---|
| ``set_states`` | dict has ``dof_pos_target`` / ``dof_vel_target`` / ``dof_torque`` | These are control inputs, not state — every backend silently drops them. | Call ``set_dof_targets(...)`` instead. |
| ``set_states`` | dict has unknown key | Typo or a field no backend honours. | Use one of ``pos`` / ``rot`` / ``dof_pos``; the warning lists the valid set. |
| ``set_states`` | dict has ``vel`` / ``ang_vel`` / ``dof_vel`` | Velocity fields are populated by ``get_states`` for inspection, but no backend writes them in ``_set_states`` today (MuJoCo zeros qvel, Sapien3 ignores). | Initialise momentum via ``simulate()`` with a non-zero initial action, or open a feature request for backend-side velocity write. |
| ``set_states`` | dict has only one of ``pos`` / ``rot`` for an entity | Cross-backend divergence: MuJoCo silently fills the missing component with a default, Sapien3 raises ``KeyError``. | Pass both ``pos`` and ``rot``, or neither. |
| ``set_dof_targets`` | action targets unknown robot name | Robot not in ``scenario.robots`` — every backend silently drops the action. | Use a robot name listed in the warning's "Known robots" suffix. |
| ``set_dof_targets`` | dict has unknown joint name under a robot | Joint not in ``handler.get_joint_names(robot.name)`` — silently dropped. | Use one of the joint names from ``get_joint_names``; the warning lists them. |
| ``set_dof_targets`` | unknown top-level key under a robot | Typo like ``dof_pos_targt``; nothing reaches the actuator. | Use ``dof_pos_target`` / ``dof_vel_target`` / ``dof_torque`` / ``dof_effort_target``. |
| ``MujocoHandler._apply_actuator_settings`` | ``stiffness`` / ``damping`` set but ``effort_limit_sim`` unset | MJCF-authored ``forcerange`` stays active and silently dominates the new PD gain. Same RoboVerse config → different effective torque per backend. | Set ``effort_limit_sim`` on the actuator cfg to the value you want, or accept the asset-file clamp deliberately. |
| ``NewtonHandler._apply_actuator_settings`` | same as MuJoCo above | Newton inherits ``joint_effort_limit`` from the asset file. | Same — set ``effort_limit_sim``. |
| ``NewtonHandler._set_dof_targets`` | ``self._control`` has no ``joint_target_pos`` / ``joint_target_vel`` / ``joint_f`` buffer | Newton model has no position / velocity / effort actuator for the targeted joint. Every action write is dropped. | Verify the MJCF / URDF has actuators on the joints you control, or use ``control_type="effort"`` deliberately. |
| ``ScenarioCfg.__post_init__`` | duplicate name across robots + objects | ``object_dict = {obj.name: obj for ...}`` in every handler collapses the duplicate. Set / get / step all silently reference the wrong entity. | Give each robot and object a unique ``name``. |
| ``TaskBase.reset`` / ``RLTaskEnv.reset`` | ``seed=N`` passed but handler has no ``set_seed`` | Reproducibility contract: rollouts are not bit-reproducible on that backend's simulator side. (Won't fire on backends that inherit ``BaseSimHandler.set_seed`` — included for forward-compat against future backends that override.) | Implement ``set_seed`` on the handler, or call ``random.seed`` / ``np.random.seed`` / ``torch.manual_seed`` directly. |
| ``ParallelHandler._check_error`` | worker process not alive | A multiprocessing worker died (OOM-kill, SIGKILL, segfault) without writing to ``error_queue``. | The exit code is reported in the exception; check the worker log or run with a smaller ``num_envs`` to bisect. |
| ``ParallelHandler._recv_or_surface`` | ``EOFError`` / ``BrokenPipeError`` on ``remote.recv()`` | Worker closed its pipe — typically the worker process is dead and ``error_queue`` carries the real Python traceback. | The wrapped exception's ``__cause__`` is the original IPC error; the message describes which worker died. |

When you fix a warning's root cause, the warning stops firing — it
does NOT need a separate "silence" flag. The dedupe lives on the
handler instance, so each Python process emits each warning at most
once.

---

## Improvement Roadmap

### Phase 1: Critical Fixes (Weeks 1-2)

```
┌────────────────────────────────────────────────────────┐
│ PHASE 1: Critical Fixes                                │
│                                                        │
│ □ TODO-001: Fix state cache consistency                │
│ □ TODO-002: Add @abstractmethod decorators             │
│ □ TODO-003: Add robot configuration validation tests   │
│ □ TODO-004: Integrate pytest-cov in CI                 │
└────────────────────────────────────────────────────────┘
```

### Phase 2: Test Coverage (Weeks 3-6)

```
┌────────────────────────────────────────────────────────┐
│ PHASE 2: Test Coverage Improvement                     │
│                                                        │
│ □ TODO-005: Add task environment integration tests     │
│ □ TODO-006: Add learning algorithm unit tests          │
│ □ TODO-007: Add state conversion tests                 │
│ □ TODO-008: Add domain randomization tests             │
└────────────────────────────────────────────────────────┘
```

### Phase 3: Interface Unification (Weeks 7-10)

```
┌────────────────────────────────────────────────────────┐
│ PHASE 3: Interface Unification                         │
│                                                        │
│ □ TODO-009: Create unified environment factory         │
│ □ TODO-010: Standardize configuration loading          │
│ □ TODO-011: Unify error handling across simulators     │
│ □ TODO-012: Add deprecation warnings for old APIs      │
└────────────────────────────────────────────────────────┘
```

### Phase 4: Code Quality (Weeks 11-14)

```
┌────────────────────────────────────────────────────────┐
│ PHASE 4: Code Quality                                  │
│                                                        │
│ □ TODO-013: Remove commented-out code                  │
│ □ TODO-014: Extract magic numbers to constants         │
│ □ TODO-015: Add comprehensive type annotations         │
│ □ TODO-016: Add performance benchmarks                 │
└────────────────────────────────────────────────────────┘
```

### Phase 5: Architecture Evolution (Long-term)

```
┌────────────────────────────────────────────────────────┐
│ PHASE 5: Architecture Evolution                        │
│                                                        │
│ □ TODO-017: Implement plugin architecture for sims     │
│ □ TODO-018: Create unified configuration system        │
│ □ TODO-019: Add async simulation support               │
│ □ TODO-020: Performance profiling integration          │
└────────────────────────────────────────────────────────┘
```

---

## Detailed TODO List

### TODO-001: Fix State Cache Consistency

**Priority**: P0 (Critical)  
**Effort**: Low (1-2 days)  
**Risk**: Low  

**Description**: Fix the state cache to maintain separate caches for tensor and dict formats.

**Implementation**:

```python
# metasim/sim/base.py

class BaseSimHandler(ABC):
    def __init__(self, scenario, optional_queries=None):
        # ... existing code ...
        self._state_cache_expire = True
        self._tensor_state_cache: TensorState | None = None
        self._dict_state_cache: list[DictEnvState] | None = None

    def _invalidate_cache(self) -> None:
        """Invalidate all state caches."""
        self._state_cache_expire = True
        self._tensor_state_cache = None
        self._dict_state_cache = None

    def set_states(self, states, env_ids=None) -> None:
        """Set states and invalidate cache."""
        self._invalidate_cache()
        self._set_states(states, env_ids)

    def get_states(
        self, 
        env_ids: list[int] | None = None, 
        mode: Literal["tensor", "dict"] = "tensor"
    ) -> TensorState | list[DictEnvState]:
        """Get states with independent caching for each mode."""
        if self._state_cache_expire:
            self._tensor_state_cache = self._get_states(env_ids=env_ids)
            self._dict_state_cache = None  # Lazy conversion
            self._state_cache_expire = False

        if mode == "tensor":
            return self._tensor_state_cache
        else:
            if self._dict_state_cache is None:
                self._dict_state_cache = state_tensor_to_nested(
                    self, self._tensor_state_cache
                )
            return self._dict_state_cache
```

**Test Case**:

```python
# metasim/test/sim/test_state_cache.py

@pytest.mark.general
def test_state_cache_mode_independence():
    """Verify that switching modes doesn't corrupt cache."""
    handler = create_test_handler()
    handler.launch()
    
    # Get tensor state
    states_t1 = handler.get_states(mode="tensor")
    assert isinstance(states_t1, TensorState)
    
    # Get dict state (should not affect tensor cache)
    states_d = handler.get_states(mode="dict")
    assert isinstance(states_d, list)
    
    # Get tensor state again (should return same type)
    states_t2 = handler.get_states(mode="tensor")
    assert isinstance(states_t2, TensorState)
    
    # Values should match
    assert torch.allclose(states_t1.pos, states_t2.pos)
```

**Acceptance Criteria**:
- [ ] Test case passes
- [ ] Existing tests still pass
- [ ] No breaking changes to public API

---

### TODO-002: Add @abstractmethod Decorators

**Priority**: P1  
**Effort**: Very Low (1 hour)  
**Risk**: Very Low  

**Files to modify**:
- `metasim/sim/base.py`

**Changes**:

```python
# Before
# @abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    raise NotImplementedError

# After
@abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    """Set DOF targets. Subclasses must implement this method."""
    raise NotImplementedError
```

**Verification**:
```bash
# Run mypy to verify abstract method detection
mypy metasim/sim/base.py
```

---

### TODO-003: Add Robot Configuration Validation Tests

**Priority**: P0  
**Effort**: Medium (2-3 days)  
**Risk**: Low  

**Implementation**:

```python
# metasim/test/test_robot_configs.py

import pytest
from pathlib import Path
import importlib
import pkgutil

def get_all_robot_configs():
    """Dynamically discover all robot configuration classes."""
    import roboverse_pack.robots as robots_module
    
    configs = []
    for importer, modname, ispkg in pkgutil.iter_modules(robots_module.__path__):
        if modname.endswith('_cfg'):
            module = importlib.import_module(f'roboverse_pack.robots.{modname}')
            for name in dir(module):
                obj = getattr(module, name)
                if (isinstance(obj, type) and 
                    hasattr(obj, 'name') and 
                    name.endswith('Cfg')):
                    configs.append(obj)
    return configs

ALL_ROBOT_CONFIGS = get_all_robot_configs()

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS, 
                         ids=lambda x: x.__name__)
def test_robot_config_instantiation(robot_cfg_cls):
    """Verify robot config can be instantiated."""
    cfg = robot_cfg_cls()
    assert cfg.name is not None
    assert isinstance(cfg.name, str)
    assert len(cfg.name) > 0

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_asset_path(robot_cfg_cls):
    """Verify robot config has at least one asset path."""
    cfg = robot_cfg_cls()
    
    asset_paths = [
        getattr(cfg, 'usd_path', None),
        getattr(cfg, 'urdf_path', None),
        getattr(cfg, 'mjcf_path', None),
    ]
    
    valid_paths = [p for p in asset_paths if p is not None and len(p) > 0]
    assert len(valid_paths) > 0, f"{cfg.name} has no valid asset path"

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_actuators(robot_cfg_cls):
    """Verify robot config has actuator definitions."""
    cfg = robot_cfg_cls()
    
    if hasattr(cfg, 'actuators'):
        assert len(cfg.actuators) > 0, f"{cfg.name} has no actuators defined"

@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_joint_limits_valid(robot_cfg_cls):
    """Verify joint limits are valid (lower < upper)."""
    cfg = robot_cfg_cls()
    
    if hasattr(cfg, 'joint_limits'):
        for joint_name, (lower, upper) in cfg.joint_limits.items():
            assert lower < upper, \
                f"{cfg.name}.{joint_name}: lower ({lower}) >= upper ({upper})"
```

---

### TODO-004: Integrate pytest-cov in CI

**Priority**: P0  
**Effort**: Low (1 day)  
**Risk**: Very Low  

**Changes to `.github/workflows/premerge-ci.yml`**:

```yaml
# Add coverage reporting step
- name: Run tests with coverage
  run: |
    pytest metasim/test \
      --cov=metasim \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=30 \
      -k ${{ matrix.test_type }}

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    file: ./coverage.xml
    flags: ${{ matrix.test_type }}
    fail_ci_if_error: false
```

**Add `pyproject.toml` configuration**:

```toml
[tool.coverage.run]
source = ["metasim"]
omit = ["metasim/test/*", "*/__pycache__/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]
```

---

### TODO-005: Add Task Environment Integration Tests

**Priority**: P1  
**Effort**: High (1 week)  
**Risk**: Medium  

**Implementation**:

```python
# metasim/test/test_task_environments.py

import pytest
from metasim.task.registry import get_task_class, TASK_REGISTRY

# Select representative tasks for testing
CORE_TASKS = [
    "pick_cube",
    "place_cube", 
    "open_drawer",
    "close_drawer",
    "push_button",
]

@pytest.mark.mujoco
@pytest.mark.parametrize("task_name", CORE_TASKS)
def test_task_reset_step_mujoco(task_name):
    """Test basic reset/step cycle for core tasks on MuJoCo."""
    task_cls = get_task_class(task_name)
    scenario = task_cls.scenario.copy()
    scenario.update(
        simulator="mujoco",
        num_envs=1,
        headless=True,
    )
    
    env = task_cls(scenario, device="cpu")
    env.launch()
    
    try:
        # Test reset
        obs, info = env.reset()
        assert obs is not None
        
        # Test step
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        
        assert obs is not None
        assert isinstance(reward, (int, float, torch.Tensor))
        assert isinstance(terminated, (bool, torch.Tensor))
        assert isinstance(truncated, (bool, torch.Tensor))
    finally:
        env.close()
```

---

### TODO-006: Add Learning Algorithm Unit Tests

**Priority**: P1  
**Effort**: High (1-2 weeks)  
**Risk**: Medium  

**Example for Diffusion Policy**:

```python
# roboverse_learn/il/tests/test_diffusion_policy.py

import pytest
import torch
from roboverse_learn.il.policies.dp.ddpm_dit_image_policy import DDPMDiTImagePolicy

@pytest.fixture
def sample_obs():
    """Create sample observation for testing."""
    return {
        "image": torch.randn(1, 3, 224, 224),
        "agent_pos": torch.randn(1, 7),
    }

def test_diffusion_policy_forward():
    """Test forward pass of diffusion policy."""
    policy = DDPMDiTImagePolicy(
        obs_dim=7,
        action_dim=7,
        horizon=16,
        # ... minimal config
    )
    
    obs = sample_obs()
    action = policy.predict_action(obs)
    
    assert action.shape == (1, 16, 7)  # (batch, horizon, action_dim)

def test_diffusion_policy_training_step():
    """Test single training step."""
    policy = DDPMDiTImagePolicy(...)
    optimizer = torch.optim.Adam(policy.parameters())
    
    batch = create_training_batch()
    loss = policy.compute_loss(batch)
    
    assert loss.requires_grad
    loss.backward()
    optimizer.step()
```

---

### TODO-009: Create Unified Environment Factory

**Priority**: P1  
**Effort**: Medium (3-5 days)  
**Risk**: Medium  

**Implementation**:

```python
# metasim/env_factory.py

from typing import Union, List, Optional
from metasim.scenario.robot import RobotCfg

def make_env(
    task: str,
    robots: Optional[List[Union[str, RobotCfg]]] = None,
    simulator: str = "mujoco",
    num_envs: int = 1,
    headless: bool = True,
    device: str = "cuda",
    cameras: Optional[List] = None,
    **kwargs
):
    """
    Unified environment factory for RoboVerse.
    
    This is the recommended way to create environments.
    
    Args:
        task: Task name (e.g., "pick_cube", "locomotion_walk")
        robots: List of robot names or RobotCfg instances
        simulator: Simulator backend ("mujoco", "isaacsim", "sapien3", etc.)
        num_envs: Number of parallel environments
        headless: Whether to run in headless mode
        device: Device for tensor computations
        cameras: Camera configurations for observations
        **kwargs: Additional task-specific arguments
    
    Returns:
        BaseTaskEnv: Configured environment instance
    
    Example:
        >>> env = make_env(
        ...     task="pick_cube",
        ...     robots=["franka"],
        ...     simulator="mujoco",
        ...     num_envs=16,
        ... )
        >>> obs, info = env.reset()
        >>> obs, reward, term, trunc, info = env.step(action)
    """
    from metasim.task.registry import get_task_class
    from metasim.utils.setup_util import get_robot
    
    # Resolve task class
    task_cls = get_task_class(task)
    
    # Build scenario from task default
    scenario = task_cls.scenario.copy()
    
    # Resolve robots
    if robots is not None:
        resolved_robots = []
        for robot in robots:
            if isinstance(robot, str):
                resolved_robots.append(get_robot(robot))
            else:
                resolved_robots.append(robot)
        scenario.robots = resolved_robots
    
    # Apply overrides
    scenario.update(
        simulator=simulator,
        num_envs=num_envs,
        headless=headless,
        cameras=cameras or [],
    )
    
    # Create and return environment
    env = task_cls(scenario, device=device, **kwargs)
    env.launch()
    
    return env

# Also register with gymnasium for compatibility
def register_gymnasium_envs():
    """Register all RoboVerse tasks with Gymnasium."""
    import gymnasium
    from metasim.task.registry import TASK_REGISTRY
    
    for task_name in TASK_REGISTRY:
        gymnasium.register(
            id=f"RoboVerse/{task_name}",
            entry_point="metasim.env_factory:make_env",
            kwargs={"task": task_name},
        )
```

---

### TODO-013: Remove Commented-Out Code

**Priority**: P2  
**Effort**: Very Low (2 hours)  
**Risk**: Very Low  

**Files to clean**:

| File | Line | Content |
|------|------|---------|
| `metasim/sim/base.py` | 18 | `# from metasim.utils.hf_util import FileDownloader` |
| `metasim/sim/base.py` | 36 | `# FileDownloader(scenario).do_it()` |
| `metasim/sim/base.py` | 86 | `# @abstractmethod` |
| `metasim/scenario/scenario.py` | 72 | Various commented code |

**Approach**: 
1. Search for `# ` patterns followed by code
2. Review each instance
3. Either remove or convert to proper TODO comment

---

### TODO-014: Extract Magic Numbers to Constants

**Priority**: P2  
**Effort**: Low (1-2 days)  
**Risk**: Low  

**Create constants file**:

```python
# metasim/constants.py

"""Framework-wide constants."""

# Simulation defaults
DEFAULT_DT = 0.015  # 15ms physics timestep
DEFAULT_DECIMATION = 2
DEFAULT_GRAVITY = (0.0, 0.0, -9.81)

# Cache settings
STATE_CACHE_SIZE = 1000
MAX_PARALLEL_ENVS = 4096

# Timeouts
DEFAULT_LAUNCH_TIMEOUT = 30.0
DEFAULT_STEP_TIMEOUT = 5.0

# Numerical tolerances
POSITION_TOLERANCE = 1e-5
ROTATION_TOLERANCE = 1e-4
```

---

## Implementation Guidelines

### Safe Modification Protocol

Before making any changes, follow this checklist:

```
□ Pre-modification
  ├── □ Read existing tests for the module
  ├── □ Run existing tests (ensure they pass)
  ├── □ Understand the change's impact scope
  └── □ Check for downstream dependencies

□ Implementation
  ├── □ Write tests first (TDD preferred)
  ├── □ Make small, focused commits
  ├── □ Maintain backward compatibility
  └── □ Add deprecation warnings if needed

□ Post-modification
  ├── □ Run full test suite locally
  ├── □ Check for linter errors
  ├── □ Update documentation if needed
  └── □ Create detailed PR description
```

### Feature Flag Pattern

For high-risk changes, use feature flags:

```python
# metasim/config.py

class FeatureFlags:
    """Feature flags for gradual rollout of changes."""
    
    # State cache v2 with independent tensor/dict caches
    USE_INDEPENDENT_STATE_CACHE = False
    
    # New unified environment factory
    USE_UNIFIED_ENV_FACTORY = False
    
    # Strict type checking in configs
    STRICT_CONFIG_VALIDATION = False
```

### Deprecation Pattern

```python
import warnings
from functools import wraps

def deprecated(message: str, removal_version: str = "0.3.0"):
    """Decorator to mark functions as deprecated."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}. "
                f"Will be removed in version {removal_version}.",
                DeprecationWarning,
                stacklevel=2
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Usage
@deprecated("Use make_env() instead", removal_version="0.4.0")
def create_environment_legacy(...):
    ...
```

---

## Testing Strategy

### Test Hierarchy

```
┌─────────────────────────────────────────────────────────────┐
│                    End-to-End Tests                         │
│              (Full training pipeline tests)                 │
│                      ~5% of tests                           │
├─────────────────────────────────────────────────────────────┤
│                   Integration Tests                         │
│         (Task reset/step, multi-env, rendering)             │
│                     ~25% of tests                           │
├─────────────────────────────────────────────────────────────┤
│                      Unit Tests                             │
│    (Individual functions, state conversion, configs)        │
│                     ~70% of tests                           │
└─────────────────────────────────────────────────────────────┘
```

### Running Tests Locally

```bash
# Run all general tests (no simulator required)
pytest metasim/test -k general -v

# Run MuJoCo tests
pytest metasim/test -k mujoco -v

# Run with coverage
pytest metasim/test --cov=metasim --cov-report=html

# Run specific test file
pytest metasim/test/sim/test_state_cache.py -v
```

### Coverage Targets

| Phase | Target Coverage | Timeline |
|-------|-----------------|----------|
| Current | ~15-20% | - |
| Phase 1 | 30% | Week 2 |
| Phase 2 | 50% | Week 6 |
| Phase 3 | 60% | Week 10 |
| Long-term | 70%+ | Week 14+ |

---

## Appendix: Quick Reference

### Priority Definitions

| Priority | Definition | Response Time |
|----------|------------|---------------|
| P0 | Critical - blocks users or causes data corruption | Immediate |
| P1 | High - significant usability or reliability issue | 1-2 weeks |
| P2 | Medium - code quality or maintainability | 2-4 weeks |
| P3 | Low - nice to have improvements | Backlog |

### Effort Definitions

| Effort | Time Estimate | Description |
|--------|---------------|-------------|
| Very Low | < 4 hours | Single file, obvious change |
| Low | 1-2 days | Few files, clear scope |
| Medium | 3-5 days | Multiple files, some complexity |
| High | 1-2 weeks | Significant refactoring |
| Very High | 2+ weeks | Architecture-level changes |

---

## Contributing

If you want to contribute to any of these improvements:

1. Comment on the relevant GitHub issue (or create one)
2. Follow the [Safe Modification Protocol](#safe-modification-protocol)
3. Start with lower-risk items to familiarize yourself with the codebase
4. Ask questions in discussions if anything is unclear

Recommended starting points for new contributors:
- TODO-002: Add @abstractmethod decorators (Very Low effort)
- TODO-013: Remove commented-out code (Very Low effort)
- TODO-003: Add robot configuration validation tests (Medium effort, high impact)