RoboVerse Architecture Review & Improvement Roadmap#
Document Version: 1.1
Last Updated: 2026-05-26
Status: Active Development
This document provides a comprehensive architecture review of the RoboVerse codebase and outlines a structured improvement roadmap. It is intended for core maintainers and contributors who want to understand the current state of the codebase and contribute to its improvement.
2026-05-26 update: Four of the originally listed P0/P1 items have landed as code plus regression tests. They are kept in the document below for historical context, but each is now annotated STATUS: FIXED with a pointer to the test file that pins the fix in place. Forward / backward compatibility statements are included so existing users can adopt the changes without surprise: every fix is either a pure warning, a stale-cache eviction, or an information upgrade to a previously silent failure mode.
Executive Summary#
Strengths#
RoboVerse is a well-architected multi-simulator robotics framework with several notable strengths:
| Aspect | Assessment | Details |
|---|---|---|
| Modularity | Excellent | Clear separation between |
| Simulator Abstraction | Good | Unified interface across MuJoCo, IsaacSim, SAPIEN, PyBullet, Genesis |
| Configuration System | Good | Type-safe |
| Domain Randomization | Excellent | Comprehensive DR system with hybrid simulation support |
| Documentation | Good | Extensive tutorials and API documentation |
Critical Issues Requiring Attention#
| Priority | Issue | Impact | Effort | Status |
|---|---|---|---|---|
| P0 | State cache consistency bug | Data corruption | Low | ✅ FIXED, see Issue 1 |
| P0 |  | Silent no-op | Low | ✅ FIXED, see Issue 8 |
| P0 |  | Silent stale state | Low | ✅ FIXED, see Issue 9 |
| P0 | Test coverage severely lacking | Reliability | High | 🟡 PARTIAL, see Issue 4 |
| P1 | Parallel-sim error handling | Silent worker death | Medium | ✅ FIXED, see Issue 7 |
| P1 | Backend interface drift ( | Late-bind failures | Low | ✅ FIXED via contract test, see Issue 2 |
| P1 | Configuration system fragmentation | Usability | Medium | ⏳ Open |
| P1 | Environment creation interface inconsistency | Usability | Medium | ⏳ Open |
| P2 | Code quality issues | Maintainability | Low | ⏳ Open |
Architecture Overview#
Module Structure#
RoboVerse/
├── metasim/                  # Core simulation framework
│   ├── sim/                  # Simulator handlers (MuJoCo, Isaac, SAPIEN, etc.)
│   ├── scenario/             # Scene configuration (robots, objects, cameras)
│   ├── task/                 # Task environment abstraction
│   ├── randomization/        # Domain randomization system
│   ├── queries/              # Extended query system (contacts, sensors)
│   └── utils/                # Utilities (configclass, math, state conversion)
│
├── roboverse_pack/           # Assets and task definitions
│   ├── robots/               # 50+ robot configurations
│   ├── tasks/                # 200+ task environments
│   ├── scenes/               # Scene configurations
│   └── queries/              # Custom query implementations
│
├── roboverse_learn/          # Learning algorithms
│   ├── il/                   # Imitation learning (ACT, Diffusion Policy, etc.)
│   ├── rl/                   # Reinforcement learning (PPO, TD3, SAC)
│   └── vla/                  # Vision-Language-Action models
│
└── generation/               # Asset generation and conversion tools
Core Abstractions#
┌─────────────────────────────────────────────────────────────────┐
│                         BaseSimHandler                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │MujocoHandler│  │IsaacHandler │  │SAPIENHandler│  ...         │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                           BaseTaskEnv                           │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐          │
│  │PickPlaceTask│  │LocomotionTask│  │ManipulationTask│  ...     │
│  └─────────────┘  └──────────────┘  └────────────────┘          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                           ScenarioCfg                           │
│    robots[] + objects[] + cameras[] + lights[] + sim_params     │
└─────────────────────────────────────────────────────────────────┘
Identified Issues#
Issue 1: State Cache Consistency Bug (P0 - FIXED 2026-05-26)#
Location: metasim/sim/base.py
Problem (historical): The state cache was modified in-place when switching between tensor and dict modes, causing data corruption.
Status: ✅ FIXED. BaseSimHandler now keeps two independent caches (_tensor_state_cache / _dict_state_cache). On a miss for the requested mode it lazily converts from the other. set_states, set_dof_targets and simulate all call _invalidate_state_caches.
Tests pinning the fix:
metasim/test/sim/test_state_modes.py::test_state_cache_mode_independence - integration (mujoco / sapien3 / isaacsim / isaacgym / newton)
metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache - integration; specifically catches the staleness regression described in Issue 9
metasim/test/test_backend_contract_general.py::test_set_states_invalidates_cache_on_all_backends - static AST check on base.py; runs in -k general
metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit - unit-level stub guard
Forward / backward compat: Pure bugfix. Callers that previously received corrupted data now receive consistent data; there is no API surface change. The lazy-conversion path was already the documented intent; only the in-place mutation was wrong.
Issue 2: Incomplete Abstract Method Declarations (P1 - FIXED via contract test 2026-05-26)#
Location: metasim/sim/base.py
Status: ✅ FIXED in spirit. _set_dof_targets is now actually decorated @abstractmethod. Two other contract methods (_get_joint_names / _get_body_names) still have their decorators commented out, because enabling them would break PyrepHandler, parts of SinglePybulletHandler and parts of GenesisHandler as soon as those handlers are instantiated (Python refuses to instantiate a class with unimplemented abstract methods). Those backends genuinely do not implement the methods yet.
Instead of breaking those backends, metasim/test/test_backend_contract_general.py statically asserts that every concrete BaseSimHandler subclass overrides every documented contract method, with xfail markers for known gaps. When a backend catches up, its xfail flips to xpass, which is the signal to enable the decorator for real.
Forward / backward compat: Pure additive contract enforcement. No existing backend was broken. New backends added later will get a clear failing test on day one if they forget a method.
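The contract check described above can be illustrated without importing any simulator. The following is a minimal sketch under stated assumptions: the `Toy*` classes, `CONTRACT_METHODS` and `KNOWN_GAPS` are hypothetical stand-ins, and the comparison is done at runtime for brevity, whereas the real test is described as a static check.

```python
from abc import ABC, abstractmethod

# Hypothetical contract: methods every concrete handler is expected to override.
CONTRACT_METHODS = ("_set_dof_targets", "_get_joint_names", "_get_body_names")


class ToyBaseHandler(ABC):
    @abstractmethod
    def _set_dof_targets(self, actions):
        raise NotImplementedError

    # Deliberately not abstract yet: incomplete backends must stay usable.
    def _get_joint_names(self, obj_name):
        raise NotImplementedError

    def _get_body_names(self, obj_name):
        raise NotImplementedError


class ToyCompleteHandler(ToyBaseHandler):
    def _set_dof_targets(self, actions):
        return None

    def _get_joint_names(self, obj_name):
        return []

    def _get_body_names(self, obj_name):
        return []


class ToyPartialHandler(ToyBaseHandler):
    def _set_dof_targets(self, actions):
        return None


def missing_overrides(cls, base=ToyBaseHandler):
    """Return the contract methods that cls still inherits unchanged from base."""
    return [m for m in CONTRACT_METHODS if getattr(cls, m) is getattr(base, m)]


# Hypothetical record of known gaps; in pytest these would become xfail markers.
KNOWN_GAPS = {"ToyPartialHandler": {"_get_joint_names", "_get_body_names"}}


def contract_matches_known_gaps(cls):
    """True when the actual gaps are exactly the documented ones."""
    return set(missing_overrides(cls)) == KNOWN_GAPS.get(cls.__name__, set())
```

Comparing against the documented gaps, rather than only asserting "no gaps", is what lets a fixed backend show up as an unexpected pass instead of being silently tolerated.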
Issue 3: Configuration System Fragmentation (P1)#
Problem: Three different configuration systems are used across modules:
| Module | System | Tools |
|---|---|---|
|  |  | Python dataclass + custom wrapper |
|  | Hydra | YAML + OmegaConf |
|  | Mixed | YAML + tyro + argparse |
Impact:
Steep learning curve for contributors
Configuration cannot be easily shared between modules
Inconsistent user experience
Issue 4: Test Coverage Severely Lacking (P0)#
Current Coverage Estimate:
| Module | Test Files | Estimated Coverage |
|---|---|---|
|  | 29 | ~60-70% |
|  | 0 | 0% |
|  | 1 | <5% |
|  | 1 | ~20% |
| Overall | 31 | ~15-20% |
Critical Gaps:
50+ robot configurations have no validation tests
200+ task environments have no tests
All learning algorithms (ACT, Diffusion Policy, PPO, TD3) have no tests
No code coverage reporting in CI
Issue 5: Environment Creation Interface Inconsistency (P1)#
Problem: Two different ways to create environments:
# Method 1: Gymnasium style (used in clean_rl, vla)
from gymnasium import make_vec
env = make_vec("RoboVerse/task", robots=[...], simulator=sim)
# Method 2: Direct task class (used in fast_td3, rsl_rl, il)
from metasim.task.registry import get_task_class
env = get_task_class(task)(scenario)
Impact: Confusion for new users; inconsistent integration patterns.
Issue 6: Code Quality Issues (P2)#
| Issue | Location | Count |
|---|---|---|
| Commented-out code | Multiple files | ~15 instances |
| Magic numbers |  | ~20 instances |
| Missing type annotations | Various | ~100+ functions |
| Inconsistent error handling | All simulator handlers | Varies |
Issue 7: Parallel Simulation Error Handling (P1 - FIXED 2026-05-26)#
Location: metasim/sim/parallel.py
Problem (historical): _check_error() was only called from launch(). A worker that died on any other operation (OOM, GPU failure, asset load error) left the parent process either hanging on remote.recv() or eventually raising a cryptic EOFError; the real traceback in error_queue was never surfaced.
Status: ✅ FIXED. Three changes:
_check_error now also detects workers that died without reporting (e.g. OOM-killed / SIGKILL) via process.is_alive(). Queue messages are drained with full traceback formatting.
Every public method on ParallelHandler (_set_states, _set_dof_targets, _simulate, get_joint_names, get_body_names, device, _get_states) calls _check_error after wire I/O.
A new _recv_or_surface wraps remote.recv() and translates EOFError / BrokenPipeError / ConnectionResetError into a RuntimeError carrying the real worker traceback, so the user no longer has to chase a cryptic IPC exception.
Tests pinning the fix:
metasim/test/test_parallel_error_handling_general.py - 6 unit tests using a sync queue stub (mp.Queue's async put races with the immediate empty-check, so a sync stand-in keeps the tests deterministic without requiring a real worker process)
Forward / backward compat: Pure information upgrade. A call that previously hung or raised EOFError now raises RuntimeError with the actual worker traceback. Calls that previously succeeded continue to succeed unchanged.
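The translation step can be sketched in isolation. This is an illustrative approximation, not the code in metasim/sim/parallel.py: `recv_or_surface` and `DeadPipe` are hypothetical names, and a synchronous `queue.Queue` stands in for the worker's error queue, in the same spirit as the sync stub the unit tests are described as using.

```python
import queue


def recv_or_surface(remote, error_queue):
    """Receive from a pipe-like object; on IPC failure, raise with the worker's traceback."""
    try:
        return remote.recv()
    except (EOFError, BrokenPipeError, ConnectionResetError) as exc:
        # Drain everything the worker managed to report before it died.
        reports = []
        while True:
            try:
                reports.append(error_queue.get_nowait())
            except queue.Empty:
                break
        detail = "\n".join(reports) or "worker exited without reporting an error"
        raise RuntimeError(f"Parallel worker failed:\n{detail}") from exc


class DeadPipe:
    """Stand-in for a connection whose peer process has died."""

    def recv(self):
        raise EOFError
```

Chaining with `from exc` keeps the original IPC exception available as `__cause__` for anyone who needs it, while the message the user sees first is the worker's own report.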
Issue 8: set_states silently dropped control-input keys (P0 - FIXED 2026-05-26)#
Location: metasim/sim/base.py (boundary), every backend _set_states
Problem (historical): DictRobotState advertises dof_pos_target / dof_vel_target / dof_torque as valid keys, but every backend's _set_states only honours pos / rot / dof_pos. Callers mistakenly passing dof_pos_target to set_states got a silent no-op: the joints never moved. This cost ~15 downstream BC experiments before the cause was found.
Status: ✅ FIXED. BaseSimHandler.set_states now runs _warn_set_states_keys before dispatching to the backend. Unknown keys log a one-shot warning per (role, key) with the list of valid keys; the three control-input keys get a specific hint that the key is a control input and that set_dof_targets(...) should be used instead. Warnings are deduplicated per handler instance so the hot path stays quiet.
Tests pinning the fix:
metasim/test/test_set_states_key_validation.py - 9 unit tests (known-keys-quiet, control-input-hint, unknown-key-message, deduplication, per-role labelling, TensorState fast-path, plus the set_dof_targets cache-invalidation guard from Issue 9)
Forward / backward compat: Pure warning addition. Runtime behaviour of set_states is unchanged for code that passed valid keys. Code that previously silently no-op'd now produces a clear log line; the hot path overhead is one set membership check per (role, key) seen.
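A one-shot, per-(role, key) warning of this kind can be sketched as follows. This is a hedged illustration only: `KeyWarner`, `VALID_KEYS`, `CONTROL_KEYS`, the logger name and the message wording are assumptions, with the key sets taken from the problem description above rather than from the actual _warn_set_states_keys implementation.

```python
import logging

log = logging.getLogger("set_states_sketch")

# Key sets as described in the text; the real lists may differ.
VALID_KEYS = {"pos", "rot", "dof_pos"}
CONTROL_KEYS = {"dof_pos_target", "dof_vel_target", "dof_torque"}


class KeyWarner:
    """Warns at most once per (role, key) for the lifetime of the instance."""

    def __init__(self):
        self._warned = set()

    def check(self, role, state):
        """Log new warnings and return the messages emitted by this call."""
        emitted = []
        for key in state:
            if key in VALID_KEYS or (role, key) in self._warned:
                continue
            self._warned.add((role, key))
            if key in CONTROL_KEYS:
                msg = f"{role}.{key} is a control input; use set_dof_targets(...) instead"
            else:
                msg = f"{role}.{key} is not honoured; valid keys: {sorted(VALID_KEYS)}"
            log.warning(msg)
            emitted.append(msg)
        return emitted
```

Keeping the dedupe set on the instance means a second handler in the same process warns again, which matches the "per handler instance" behaviour described above.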
Issue 9: set_dof_targets left state cache stale (P0 - FIXED 2026-05-26)#
Location: metasim/sim/base.py
Problem (historical): MuJoCo writes actuator ctrl in _set_dof_targets, and _get_states reads ctrl back as joint_pos_target. The base class's public set_dof_targets forgot to invalidate the state cache. Any get_states between set_dof_targets and the next simulate returned the previous joint_pos_target.
Status: ✅ FIXED. BaseSimHandler.set_dof_targets now calls _invalidate_state_caches() before dispatching to the backend. The fix is universal because no concrete handler overrides the public set_dof_targets, so every backend benefits.
Tests pinning the fix:
metasim/test/sim/test_state_modes.py::test_set_dof_targets_invalidates_state_cache - integration; verified on mujoco / sapien3 in the roboverse env, marker also covers isaacsim / isaacgym / newton
metasim/test/test_set_states_key_validation.py::test_set_dof_targets_invalidates_state_cache_unit - fast unit-level guard with stub handler
Forward / backward compat: Pure bugfix. The first get_states after a set_dof_targets now reflects the fresh action; previously it returned stale data. No API surface change.
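The staleness can be reproduced with a stub that has no simulator behind it. The sketch below is a toy model under stated assumptions: `StubHandler` and its `invalidate` switch are hypothetical and exist only to show the before/after behaviour side by side.

```python
class StubHandler:
    """Toy handler: state reads echo the last target, with a cache in front."""

    def __init__(self):
        self._ctrl = 0.0
        self._cache = None

    def _get_states(self):
        # Mirrors the described MuJoCo behaviour: the target is read back as state.
        return {"joint_pos_target": self._ctrl}

    def get_states(self):
        if self._cache is None:
            self._cache = self._get_states()
        return self._cache

    def set_dof_targets(self, target, invalidate=True):
        if invalidate:
            self._cache = None  # the fix: drop cached state before dispatching
        self._ctrl = target
```

With `invalidate=False` the stub returns the old target until something else clears the cache, which is the silent failure mode the regression tests guard against.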
Issue 10: Robot config drift (P0 - FIXED via test 2026-05-26)#
Location: roboverse_pack/robots/*.py
Problem (historical): 50+ RobotCfg subclasses with zero validation tests. Typos in default_joint_positions (joints that aren't in joint_limits) and out-of-limit defaults were silently clamped or ignored at sim launch, observable as wrong reset states rather than as a config error.
Status: ✅ FIXED via test gate. metasim/test/test_robot_cfg_validation_general.py and RoboVerse/tests/test_roboverse_robot_cfg_validation.py walk every discoverable RobotCfg subclass and assert: instantiation, non-empty name, default_joint_positions keys are a subset of joint_limits keys, and every default lies within its limit interval.
Bugs surfaced (xfail-documented, not fixed in this pass to preserve backward compat for trained policies):
AlohaAgilexCfg - 16 fl_/fr_joint{1..8} keys in default_joint_positions but only single-arm names in joint_limits (bimanual override gap)
G1TrackingCfg - regex keys like .*_ankle_pitch_joint in default_joint_positions not expanded; joint_limits has the concrete left_/right_ names
YamCfg - joint2 / joint4 defaults copy-pasted from Franka home pose but joint_limits are Yam's narrower ranges
ArxL5Cfg - same Franka copy-paste, different limit ranges
VegaCfg - torso_j1 joint_limits is the single-point [0.2, 0.2]; looks like a fixed offset, not a range
SoArm100Cfg - Wrist_Pitch default -2.356 outside [-0.192, 3.927]
KochCfg - same shape as SoArm100
Go2Cfg - RL/RR_thigh_joint default 1.0 outside [-4.54, 0.52] (sign error)
AllegroHandCfg - thumb_joint_0 default 0 below lower limit 0.263
When a robot is fixed, its xfail flips to xpass; the test_known_gap_dicts_match_actual_failures self-check then tells the maintainer to remove the entry so the contract tightens.
Forward / backward compat: Pure additive contract enforcement. No RobotCfg was modified. The test only documents the gaps, so production behaviour (silent clamp / silent ignore) is preserved for anyone training against the current defaults.
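The two structural assertions (default keys covered by the limits, defaults inside the limits) reduce to a small pure function. This is a sketch assuming both attributes are plain dicts, with limits stored as (lower, upper) pairs; `validate_joint_defaults` is a hypothetical helper, not the name used in the test files.

```python
def validate_joint_defaults(default_joint_positions, joint_limits):
    """Return human-readable problems; an empty list means the config is consistent."""
    problems = []
    for joint, value in default_joint_positions.items():
        if joint not in joint_limits:
            problems.append(f"{joint}: no entry in joint_limits")
            continue
        lower, upper = joint_limits[joint]
        if not lower <= value <= upper:
            problems.append(f"{joint}: default {value} outside [{lower}, {upper}]")
    return problems


# Values quoted from the SoArm100Cfg and Go2Cfg findings listed above.
so_arm = validate_joint_defaults({"Wrist_Pitch": -2.356}, {"Wrist_Pitch": (-0.192, 3.927)})
go2 = validate_joint_defaults({"RL_thigh_joint": 1.0}, {"RL_thigh_joint": (-4.54, 0.52)})
```

Returning a list instead of raising on the first problem makes it straightforward to compare the full set of failures against a documented known-gap list.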
Warning Catalog (added 2026-05-30)#
As part of the 2026-05 hardening pass, BaseSimHandler and several concrete backends started emitting one-shot warnings for the silent-drop / cross-backend asymmetry patterns previously hidden inside _set_states / _set_dof_targets / actuator wiring / scenario setup. Each warning fires at most once per (handler, gap-identity) so hot-path replay does not spam the log; each tells you exactly what was dropped and how to make it stop.
| Source | Trigger | Meaning | Fix |
|---|---|---|---|
|  | dict has | These are control inputs, not state; every backend silently drops them. | Call |
|  | dict has unknown key | Typo or a field no backend honours. | Use one of |
|  | dict has | Velocity fields are populated by | Initialise momentum via |
|  | dict has only one of | Cross-backend divergence: MuJoCo silently fills the missing component with a default, Sapien3 raises | Pass both |
|  | action targets unknown robot name | Robot not in | Use a robot name listed in the warning's "Known robots" suffix. |
|  | dict has unknown joint name under a robot | Joint not in | Use one of the joint names from |
|  | unknown top-level key under a robot | Typo like | Use |
|  |  | MJCF-authored | Set |
|  | same as MuJoCo above | Newton inherits | Same: set |
|  |  | Newton model has no position / velocity / effort actuator for the targeted joint. Every action write is dropped. | Verify the MJCF / URDF has actuators on the joints you control, or use |
|  | duplicate name across robots + objects |  | Give each robot and object a unique |
|  |  | Reproducibility contract: rollouts are not bit-reproducible on that backend's simulator side. (Won't fire on backends that inherit | Implement |
|  | worker process not alive | A multiprocessing worker died (OOM-kill, SIGKILL, segfault) without writing to | The exit code is reported in the exception; check the worker log or run with a smaller |
|  |  | Worker closed its pipe; typically the worker process is dead and | The wrapped exception's |
When you fix a warning's root cause, the warning stops firing; it does NOT need a separate "silence" flag. The dedupe lives on the handler instance, so each handler instance emits each warning at most once.
Improvement Roadmap#
Phase 1: Critical Fixes (Weeks 1-2)#
┌────────────────────────────────────────────────────────┐
│                PHASE 1: Critical Fixes                 │
│                                                        │
│  □ TODO-001: Fix state cache consistency               │
│  □ TODO-002: Add @abstractmethod decorators            │
│  □ TODO-003: Add robot configuration validation tests  │
│  □ TODO-004: Integrate pytest-cov in CI                │
└────────────────────────────────────────────────────────┘
Phase 2: Test Coverage (Weeks 3-6)#
┌────────────────────────────────────────────────────────┐
│           PHASE 2: Test Coverage Improvement           │
│                                                        │
│  □ TODO-005: Add task environment integration tests    │
│  □ TODO-006: Add learning algorithm unit tests         │
│  □ TODO-007: Add state conversion tests                │
│  □ TODO-008: Add domain randomization tests            │
└────────────────────────────────────────────────────────┘
Phase 3: Interface Unification (Weeks 7-10)#
┌────────────────────────────────────────────────────────┐
│             PHASE 3: Interface Unification             │
│                                                        │
│  □ TODO-009: Create unified environment factory        │
│  □ TODO-010: Standardize configuration loading         │
│  □ TODO-011: Unify error handling across simulators    │
│  □ TODO-012: Add deprecation warnings for old APIs     │
└────────────────────────────────────────────────────────┘
Phase 4: Code Quality (Weeks 11-14)#
┌────────────────────────────────────────────────────────┐
│                 PHASE 4: Code Quality                  │
│                                                        │
│  □ TODO-013: Remove commented-out code                 │
│  □ TODO-014: Extract magic numbers to constants        │
│  □ TODO-015: Add comprehensive type annotations        │
│  □ TODO-016: Add performance benchmarks                │
└────────────────────────────────────────────────────────┘
Phase 5: Architecture Evolution (Long-term)#
┌────────────────────────────────────────────────────────┐
│            PHASE 5: Architecture Evolution             │
│                                                        │
│  □ TODO-017: Implement plugin architecture for sims    │
│  □ TODO-018: Create unified configuration system       │
│  □ TODO-019: Add async simulation support              │
│  □ TODO-020: Performance profiling integration         │
└────────────────────────────────────────────────────────┘
Detailed TODO List#
TODO-001: Fix State Cache Consistency#
Priority: P0 (Critical)
Effort: Low (1-2 days)
Risk: Low
Description: Fix the state cache to maintain separate caches for tensor and dict formats.
Implementation:
# metasim/sim/base.py
from __future__ import annotations

from abc import ABC
from typing import Literal

# TensorState, DictEnvState and state_tensor_to_nested are assumed to be
# imported already by the existing module.


class BaseSimHandler(ABC):
    def __init__(self, scenario, optional_queries=None):
        # ... existing code ...
        self._state_cache_expire = True
        self._tensor_state_cache: TensorState | None = None
        self._dict_state_cache: list[DictEnvState] | None = None

    def _invalidate_cache(self) -> None:
        """Invalidate all state caches."""
        self._state_cache_expire = True
        self._tensor_state_cache = None
        self._dict_state_cache = None

    def set_states(self, states, env_ids=None) -> None:
        """Set states and invalidate cache."""
        self._invalidate_cache()
        self._set_states(states, env_ids)

    def get_states(
        self,
        env_ids: list[int] | None = None,
        mode: Literal["tensor", "dict"] = "tensor",
    ) -> TensorState | list[DictEnvState]:
        """Get states with independent caching for each mode."""
        # Note: the cache is not keyed on env_ids, so a cached result is
        # reused even if a later call passes different env_ids.
        if self._state_cache_expire:
            self._tensor_state_cache = self._get_states(env_ids=env_ids)
            self._dict_state_cache = None  # Lazy conversion
            self._state_cache_expire = False

        if mode == "tensor":
            return self._tensor_state_cache

        if self._dict_state_cache is None:
            self._dict_state_cache = state_tensor_to_nested(
                self, self._tensor_state_cache
            )
        return self._dict_state_cache
Test Case:
# metasim/test/sim/test_state_cache.py
import pytest
import torch

# create_test_handler and TensorState are assumed to come from the
# project's test utilities and state module.


@pytest.mark.general
def test_state_cache_mode_independence():
    """Verify that switching modes doesn't corrupt cache."""
    handler = create_test_handler()
    handler.launch()

    # Get tensor state
    states_t1 = handler.get_states(mode="tensor")
    assert isinstance(states_t1, TensorState)

    # Get dict state (should not affect tensor cache)
    states_d = handler.get_states(mode="dict")
    assert isinstance(states_d, list)

    # Get tensor state again (should return same type)
    states_t2 = handler.get_states(mode="tensor")
    assert isinstance(states_t2, TensorState)

    # Values should match
    assert torch.allclose(states_t1.pos, states_t2.pos)
Acceptance Criteria:
Test case passes
Existing tests still pass
No breaking changes to public API
TODO-002: Add @abstractmethod Decorators#
Priority: P1
Effort: Very Low (1 hour)
Risk: Very Low
Files to modify:
metasim/sim/base.py
Changes:
# Before
# @abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    raise NotImplementedError

# After
@abstractmethod
def _set_dof_targets(self, actions: list[Action]) -> None:
    """Set DOF targets. Subclasses must implement this method."""
    raise NotImplementedError
Verification:
# Run mypy to verify abstract method detection
mypy metasim/sim/base.py
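Alongside static checking, the effect of the decorator can be confirmed at runtime, because Python's abc machinery rejects instantiation of a class that leaves an abstract method unimplemented. A minimal sketch with hypothetical toy classes (`Base`, `Incomplete`, `Complete`), not the real handlers:

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def _set_dof_targets(self, actions):
        """Subclasses must implement this method."""


class Incomplete(Base):
    """Forgets to implement _set_dof_targets."""


class Complete(Base):
    def _set_dof_targets(self, actions):
        return None


def can_instantiate(cls):
    """Return False if abc refuses to build an instance of cls."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Defining `Incomplete` succeeds; only creating an instance fails. A regression test of this shape therefore needs to instantiate the class, not just import it.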
TODO-003: Add Robot Configuration Validation Tests#
Priority: P0
Effort: Medium (2-3 days)
Risk: Low
Implementation:
# metasim/test/test_robot_configs.py
import importlib
import pkgutil

import pytest


def get_all_robot_configs():
    """Dynamically discover all robot configuration classes."""
    import roboverse_pack.robots as robots_module

    configs = []
    for _importer, modname, _ispkg in pkgutil.iter_modules(robots_module.__path__):
        if modname.endswith('_cfg'):
            module = importlib.import_module(f'roboverse_pack.robots.{modname}')
            for name in dir(module):
                obj = getattr(module, name)
                if (isinstance(obj, type) and
                        hasattr(obj, 'name') and
                        name.endswith('Cfg')):
                    configs.append(obj)
    return configs


ALL_ROBOT_CONFIGS = get_all_robot_configs()


@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_instantiation(robot_cfg_cls):
    """Verify robot config can be instantiated."""
    cfg = robot_cfg_cls()
    assert cfg.name is not None
    assert isinstance(cfg.name, str)
    assert len(cfg.name) > 0


@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_asset_path(robot_cfg_cls):
    """Verify robot config has at least one asset path."""
    cfg = robot_cfg_cls()
    asset_paths = [
        getattr(cfg, 'usd_path', None),
        getattr(cfg, 'urdf_path', None),
        getattr(cfg, 'mjcf_path', None),
    ]
    valid_paths = [p for p in asset_paths if p is not None and len(p) > 0]
    assert len(valid_paths) > 0, f"{cfg.name} has no valid asset path"


@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_has_actuators(robot_cfg_cls):
    """Verify robot config has actuator definitions."""
    cfg = robot_cfg_cls()
    if hasattr(cfg, 'actuators'):
        assert len(cfg.actuators) > 0, f"{cfg.name} has no actuators defined"


@pytest.mark.general
@pytest.mark.parametrize("robot_cfg_cls", ALL_ROBOT_CONFIGS,
                         ids=lambda x: x.__name__)
def test_robot_config_joint_limits_valid(robot_cfg_cls):
    """Verify joint limits are valid (lower < upper)."""
    cfg = robot_cfg_cls()
    if hasattr(cfg, 'joint_limits'):
        for joint_name, (lower, upper) in cfg.joint_limits.items():
            assert lower < upper, \
                f"{cfg.name}.{joint_name}: lower ({lower}) >= upper ({upper})"
TODO-004: Integrate pytest-cov in CI#
Priority: P0
Effort: Low (1 day)
Risk: Very Low
Changes to .github/workflows/premerge-ci.yml:
# Add coverage reporting step
- name: Run tests with coverage
  run: |
    pytest metasim/test \
      --cov=metasim \
      --cov-report=xml \
      --cov-report=html \
      --cov-fail-under=30 \
      -k ${{ matrix.test_type }}

- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    file: ./coverage.xml
    flags: ${{ matrix.test_type }}
    fail_ci_if_error: false
Add pyproject.toml configuration:
[tool.coverage.run]
source = ["metasim"]
omit = ["metasim/test/*", "*/__pycache__/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]
TODO-005: Add Task Environment Integration Tests#
Priority: P1
Effort: High (1 week)
Risk: Medium
Implementation:
# metasim/test/test_task_environments.py
import pytest
import torch

from metasim.task.registry import get_task_class, TASK_REGISTRY

# Select representative tasks for testing
CORE_TASKS = [
    "pick_cube",
    "place_cube",
    "open_drawer",
    "close_drawer",
    "push_button",
]


@pytest.mark.mujoco
@pytest.mark.parametrize("task_name", CORE_TASKS)
def test_task_reset_step_mujoco(task_name):
    """Test basic reset/step cycle for core tasks on MuJoCo."""
    task_cls = get_task_class(task_name)
    scenario = task_cls.scenario.copy()
    scenario.update(
        simulator="mujoco",
        num_envs=1,
        headless=True,
    )
    env = task_cls(scenario, device="cpu")
    env.launch()
    try:
        # Test reset
        obs, info = env.reset()
        assert obs is not None

        # Test step
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        assert obs is not None
        assert isinstance(reward, (int, float, torch.Tensor))
        assert isinstance(terminated, (bool, torch.Tensor))
        assert isinstance(truncated, (bool, torch.Tensor))
    finally:
        env.close()
TODO-006: Add Learning Algorithm Unit Tests#
Priority: P1
Effort: High (1-2 weeks)
Risk: Medium
Example for Diffusion Policy:
# roboverse_learn/il/tests/test_diffusion_policy.py
import pytest
import torch

from roboverse_learn.il.policies.dp.ddpm_dit_image_policy import DDPMDiTImagePolicy


@pytest.fixture
def sample_obs():
    """Create sample observation for testing."""
    return {
        "image": torch.randn(1, 3, 224, 224),
        "agent_pos": torch.randn(1, 7),
    }


def test_diffusion_policy_forward(sample_obs):
    """Test forward pass of diffusion policy."""
    policy = DDPMDiTImagePolicy(
        obs_dim=7,
        action_dim=7,
        horizon=16,
        # ... minimal config
    )
    # pytest injects the fixture value; fixtures must not be called directly
    action = policy.predict_action(sample_obs)
    assert action.shape == (1, 16, 7)  # (batch, horizon, action_dim)


def test_diffusion_policy_training_step():
    """Test single training step."""
    policy = DDPMDiTImagePolicy(...)
    optimizer = torch.optim.Adam(policy.parameters())
    batch = create_training_batch()  # helper not shown in this document
    loss = policy.compute_loss(batch)
    assert loss.requires_grad
    loss.backward()
    optimizer.step()
TODO-009: Create Unified Environment Factory#
Priority: P1
Effort: Medium (3-5 days)
Risk: Medium
Implementation:
# metasim/env_factory.py
from typing import Union, List, Optional

from metasim.scenario.robot import RobotCfg


def make_env(
    task: str,
    robots: Optional[List[Union[str, RobotCfg]]] = None,
    simulator: str = "mujoco",
    num_envs: int = 1,
    headless: bool = True,
    device: str = "cuda",
    cameras: Optional[List] = None,
    **kwargs
):
    """
    Unified environment factory for RoboVerse.

    This is the recommended way to create environments.

    Args:
        task: Task name (e.g., "pick_cube", "locomotion_walk")
        robots: List of robot names or RobotCfg instances
        simulator: Simulator backend ("mujoco", "isaacsim", "sapien3", etc.)
        num_envs: Number of parallel environments
        headless: Whether to run in headless mode
        device: Device for tensor computations
        cameras: Camera configurations for observations
        **kwargs: Additional task-specific arguments

    Returns:
        BaseTaskEnv: Configured environment instance

    Example:
        >>> env = make_env(
        ...     task="pick_cube",
        ...     robots=["franka"],
        ...     simulator="mujoco",
        ...     num_envs=16,
        ... )
        >>> obs, info = env.reset()
        >>> obs, reward, term, trunc, info = env.step(action)
    """
    from metasim.task.registry import get_task_class
    from metasim.utils.setup_util import get_robot

    # Resolve task class
    task_cls = get_task_class(task)

    # Build scenario from task default
    scenario = task_cls.scenario.copy()

    # Resolve robots
    if robots is not None:
        resolved_robots = []
        for robot in robots:
            if isinstance(robot, str):
                resolved_robots.append(get_robot(robot))
            else:
                resolved_robots.append(robot)
        scenario.robots = resolved_robots

    # Apply overrides
    scenario.update(
        simulator=simulator,
        num_envs=num_envs,
        headless=headless,
        cameras=cameras or [],
    )

    # Create and return environment
    env = task_cls(scenario, device=device, **kwargs)
    env.launch()
    return env


# Also register with gymnasium for compatibility
def register_gymnasium_envs():
    """Register all RoboVerse tasks with Gymnasium."""
    import gymnasium
    from metasim.task.registry import TASK_REGISTRY

    for task_name in TASK_REGISTRY:
        gymnasium.register(
            id=f"RoboVerse/{task_name}",
            entry_point="metasim.env_factory:make_env",
            kwargs={"task": task_name},
        )
TODO-013: Remove Commented-Out Code#
Priority: P2
Effort: Very Low (2 hours)
Risk: Very Low
Files to clean:
| File | Line | Content |
|---|---|---|
|  | 18 |  |
|  | 36 |  |
|  | 86 |  |
|  | 72 | Various commented code |
Approach:
Search for # patterns followed by code
Review each instance
Either remove or convert to proper TODO comment
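The search step can be partly automated. The following is a rough heuristic sketch, not an existing project tool: `looks_like_code` and `find_commented_code` are hypothetical helpers that flag a full-line comment when its body parses as a Python statement. It will miss fragments that are not valid on their own (for example a lone decorator line), so the manual review step still applies.

```python
import ast


def looks_like_code(comment_line):
    """Heuristic: True if the comment body parses as a non-trivial Python statement."""
    body = comment_line.lstrip("#").strip()
    if not body:
        return False
    try:
        tree = ast.parse(body)
    except SyntaxError:
        return False
    # A single bare word or literal parses fine but is almost always prose.
    return any(
        not (isinstance(node, ast.Expr) and isinstance(node.value, (ast.Name, ast.Constant)))
        for node in tree.body
    )


def find_commented_code(source):
    """Return 1-based line numbers of full-line comments that look like code."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#") and looks_like_code(stripped):
            hits.append(lineno)
    return hits
```

Trailing comments on code lines are ignored on purpose; only lines that consist entirely of a comment are considered.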
TODO-014: Extract Magic Numbers to Constants#
Priority: P2
Effort: Low (1-2 days)
Risk: Low
Create constants file:
# metasim/constants.py
"""Framework-wide constants."""
# Simulation defaults
DEFAULT_DT = 0.015 # 15ms physics timestep
DEFAULT_DECIMATION = 2
DEFAULT_GRAVITY = (0.0, 0.0, -9.81)
# Cache settings
STATE_CACHE_SIZE = 1000
MAX_PARALLEL_ENVS = 4096
# Timeouts
DEFAULT_LAUNCH_TIMEOUT = 30.0
DEFAULT_STEP_TIMEOUT = 5.0
# Numerical tolerances
POSITION_TOLERANCE = 1e-5
ROTATION_TOLERANCE = 1e-4
Implementation Guidelines#
Safe Modification Protocol#
Before making any changes, follow this checklist:
□ Pre-modification
├── □ Read existing tests for the module
├── □ Run existing tests (ensure they pass)
├── □ Understand the change's impact scope
└── □ Check for downstream dependencies

□ Implementation
├── □ Write tests first (TDD preferred)
├── □ Make small, focused commits
├── □ Maintain backward compatibility
└── □ Add deprecation warnings if needed

□ Post-modification
├── □ Run full test suite locally
├── □ Check for linter errors
├── □ Update documentation if needed
└── □ Create detailed PR description
Feature Flag Pattern#
For high-risk changes, use feature flags:
# metasim/config.py
class FeatureFlags:
    """Feature flags for gradual rollout of changes."""

    # State cache v2 with independent tensor/dict caches
    USE_INDEPENDENT_STATE_CACHE = False

    # New unified environment factory
    USE_UNIFIED_ENV_FACTORY = False

    # Strict type checking in configs
    STRICT_CONFIG_VALIDATION = False
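How a call site might consult such a flag is sketched below. Everything here is illustrative: the local `FeatureFlags` copy, `legacy_factory`, `unified_factory` and `create_env` are hypothetical placeholders that return strings so the dispatch is easy to observe.

```python
class FeatureFlags:
    """Local stand-in for the flags class, reduced to the one flag used here."""

    USE_UNIFIED_ENV_FACTORY = False


def legacy_factory(task):
    return f"legacy:{task}"


def unified_factory(task):
    return f"unified:{task}"


def create_env(task):
    """Read the flag at call time so it can be toggled per test or per run."""
    if FeatureFlags.USE_UNIFIED_ENV_FACTORY:
        return unified_factory(task)
    return legacy_factory(task)
```

Reading the flag inside the function, rather than binding one of the factories at import time, is what allows a test to flip the flag and exercise both paths in one process.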
Deprecation Pattern#
import warnings
from functools import wraps


def deprecated(message: str, removal_version: str = "0.3.0"):
    """Decorator to mark functions as deprecated."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}. "
                f"Will be removed in version {removal_version}.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage
@deprecated("Use make_env() instead", removal_version="0.4.0")
def create_environment_legacy(*args, **kwargs):
    ...
Testing Strategy#
Test Hierarchy#
┌─────────────────────────────────────────────────────────────┐
│                      End-to-End Tests                       │
│               (Full training pipeline tests)                │
│                        ~5% of tests                         │
├─────────────────────────────────────────────────────────────┤
│                      Integration Tests                      │
│           (Task reset/step, multi-env, rendering)           │
│                        ~25% of tests                        │
├─────────────────────────────────────────────────────────────┤
│                         Unit Tests                          │
│      (Individual functions, state conversion, configs)      │
│                        ~70% of tests                        │
└─────────────────────────────────────────────────────────────┘
Running Tests Locally#
# Run all general tests (no simulator required)
pytest metasim/test -k general -v
# Run MuJoCo tests
pytest metasim/test -k mujoco -v
# Run with coverage
pytest metasim/test --cov=metasim --cov-report=html
# Run specific test file
pytest metasim/test/sim/test_state_cache.py -v
Coverage Targets#
| Phase | Target Coverage | Timeline |
|---|---|---|
| Current | ~15-20% | - |
| Phase 1 | 30% | Week 2 |
| Phase 2 | 50% | Week 6 |
| Phase 3 | 60% | Week 10 |
| Long-term | 70%+ | Week 14+ |
Appendix: Quick Reference#
Priority Definitions#
| Priority | Definition | Response Time |
|---|---|---|
| P0 | Critical - blocks users or causes data corruption | Immediate |
| P1 | High - significant usability or reliability issue | 1-2 weeks |
| P2 | Medium - code quality or maintainability | 2-4 weeks |
| P3 | Low - nice to have improvements | Backlog |
Effort Definitions#
| Effort | Time Estimate | Description |
|---|---|---|
| Very Low | < 4 hours | Single file, obvious change |
| Low | 1-2 days | Few files, clear scope |
| Medium | 3-5 days | Multiple files, some complexity |
| High | 1-2 weeks | Significant refactoring |
| Very High | 2+ weeks | Architecture-level changes |
Contributing#
If you want to contribute to any of these improvements:
Comment on the relevant GitHub issue (or create one)
Follow the Safe Modification Protocol
Start with lower-risk items to familiarize yourself with the codebase
Ask questions in discussions if anything is unclear
Recommended starting points for new contributors:
TODO-002: Add @abstractmethod decorators (Very Low effort)
TODO-013: Remove commented-out code (Very Low effort)
TODO-003: Add robot configuration validation tests (Medium effort, high impact)