Basic Usage

Using the benchmark

Meta-World ships with 6 major pre-packaged benchmarks and also supports making your own custom benchmarks. The benchmarks are divided into Multi-Task and Meta reinforcement learning benchmarks.

Multi-Task Benchmarks

The MT1, MT10, and MT50 benchmarks are the Multi-Task Benchmarks. These benchmarks are used to train a single multi-task policy on 1, 10, or 50 training tasks simultaneously. MT1 benchmarks can be created with any of the 50 tasks available in Meta-World. In the MT10 and MT50 benchmarks, the observations returned by the benchmark come with one-hot task IDs appended to the state.

Meta-Learning Benchmarks

The ML1, ML10, and ML45 benchmarks are the 3 meta-reinforcement learning benchmarks available in Meta-World. The ML1 benchmark can be used with any of the 50 tasks available in Meta-World. The ML1 benchmark tests for few-shot adaptation to goal variations within a single task. The ML10 and ML45 benchmarks both test few-shot adaptation to new tasks. ML10 comprises 10 training tasks and 5 test tasks, while ML45 comprises 45 training tasks and 5 test tasks.

MT1

import gymnasium as gym
import metaworld

seed = 42  # some seed number here
env = gym.make('Meta-World/MT1', env_name='reach-v3', seed=seed)
obs, info = env.reset()

a = env.action_space.sample() # randomly sample an action
obs, reward, terminated, truncated, info = env.step(a) # apply the randomly sampled action
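
Putting the pieces together, a full random-policy rollout might look like the following. This is a minimal sketch: it assumes the standard Gymnasium episode loop and that Meta-World reports a success flag in the step info dictionary.

import gymnasium as gym
import metaworld

seed = 42
env = gym.make('Meta-World/MT1', env_name='reach-v3', seed=seed)

obs, info = env.reset()
for _ in range(500):
    a = env.action_space.sample()  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(a)
    if info.get('success', 0.0) > 0.0:  # assumed success flag in the info dict
        print('task solved')
    if terminated or truncated:
        obs, info = env.reset()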

MT10

MT10 has two different versions that can be returned by gym.make_vec. The first is the synchronous version of the benchmark, where all environments are contained within the same process. This version requires the fewest resources and is the best option for users with limited compute.

import gymnasium as gym
import metaworld

seed = 42

envs = gym.make_vec('Meta-World/MT10', vector_strategy='sync', seed=seed) # this returns a Synchronous Vector Environment with 10 environments

obs, info = envs.reset() # reset all 10 environments

a = envs.action_space.sample() # sample an action for each environment

obs, reward, terminated, truncated, info = envs.step(a) # step all 10 environments

Alternatively, for users with more compute, we also provide the asynchronous version of the MT10 benchmark, where each environment is isolated in its own process and communicates with the main process via inter-process messaging pipes.

envs = gym.make_vec('Meta-World/MT10', vector_strategy='async', seed=seed) # this returns an Asynchronous Vector Environment with 10 environments
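
Interaction with the vectorized benchmark is the same in either mode: actions and observations are batched across the 10 environments, and sub-environments that finish an episode are reset automatically by the Gymnasium vector environment. A minimal sketch of a batched random rollout:

import gymnasium as gym
import metaworld

seed = 42
envs = gym.make_vec('Meta-World/MT10', vector_strategy='sync', seed=seed)

print(envs.single_observation_space)  # per-environment observation space; per the description above, the one-hot task ID is appended to the state
print(envs.single_action_space)       # per-environment action space

obs, info = envs.reset()
for _ in range(200):
    a = envs.action_space.sample()                           # one action per environment
    obs, reward, terminated, truncated, info = envs.step(a)  # batched results, one entry per environment

envs.close()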

MT50

MT50 also comes in two versions of the environments, a synchronous and an asynchronous one.

import gymnasium as gym
import metaworld

seed = 42

envs = gym.make_vec('Meta-World/MT50', vector_strategy='sync', seed=seed) # this returns a Synchronous Vector Environment with 50 environments

obs, info = envs.reset() # reset all 50 environments

a = envs.action_space.sample() # sample an action for each environment

obs, reward, terminated, truncated, info = envs.step(a) # step all 50 environments

The asynchronous version of MT50 is created in the same way:

envs = gym.make_vec('Meta-World/MT50', vector_strategy='async', seed=seed) # this returns an Asynchronous Vector Environment with 50 environments

Meta-Learning Benchmarks

Each Meta-reinforcement learning benchmark has training and testing environments. These environments must be created separately as follows.

ML1

import gymnasium as gym
import metaworld

seed = 42

train_envs = gym.make('Meta-World/ML1-train', env_name='reach-v3', seed=seed)
test_envs = gym.make('Meta-World/ML1-test', env_name='reach-v3', seed=seed)

# use train_envs for the training procedure
# use test_envs for the testing procedure
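
The intended pattern is to adapt the policy using episodes from train_envs and then measure few-shot performance on test_envs, which hold out unseen goal variations. A minimal sketch of that structure with a random policy (the success flag in info is an assumption):

import gymnasium as gym
import metaworld

seed = 42
train_envs = gym.make('Meta-World/ML1-train', env_name='reach-v3', seed=seed)
test_envs = gym.make('Meta-World/ML1-test', env_name='reach-v3', seed=seed)

def run_episode(env, max_steps=200):
    """Roll out one episode with a random policy and report success."""
    obs, info = env.reset()
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            break
    return info.get('success', 0.0) > 0.0  # assumed success flag in the info dict

# Adaptation phase: collect experience on the training goal variations.
for _ in range(10):
    run_episode(train_envs)

# Evaluation phase: measure few-shot success on held-out goal variations.
successes = [run_episode(test_envs) for _ in range(10)]
print('test success rate:', sum(successes) / len(successes))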

ML10

Similar to the Multi-Task benchmarks, the ML10 and ML45 environments can be run in synchronous or asynchronous modes.

import gymnasium as gym
import metaworld

seed = 42

train_envs = gym.make_vec('Meta-World/ML10-train', vector_strategy='sync', seed=seed)
test_envs = gym.make_vec('Meta-World/ML10-test', vector_strategy='sync', seed=seed)

ML45

import gymnasium as gym
import metaworld

seed = 42

train_envs = gym.make_vec('Meta-World/ML45-train', vector_strategy='sync', seed=seed)
test_envs = gym.make_vec('Meta-World/ML45-test', vector_strategy='sync', seed=seed)

Custom Benchmarks

Finally, we also provide support for creating custom benchmarks by combining any number of Meta-World environments.

The prefix 'mt' will return environments that are goal observable for Multi-Task reinforcement learning, while the prefix 'ml' will return environments that are partially observable for Meta-reinforcement learning. Like the included MT and ML benchmarks, these environments can also be run in synchronous or asynchronous mode. To create a custom benchmark, the user must provide a list of environment names with the suffix '-v3'.

import gymnasium as gym
import metaworld

seed = 42

envs = gym.make_vec('Meta-World/custom-mt-envs', vector_strategy='sync', envs_list=['env_name_1-v3', 'env_name_2-v3', 'env_name_3-v3'], seed=seed) # goal-observable environments for Multi-Task RL
envs = gym.make_vec('Meta-World/custom-ml-envs', vector_strategy='sync', envs_list=['env_name_1-v3', 'env_name_2-v3', 'env_name_3-v3'], seed=seed) # partially observable environments for Meta-RL
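
Once created, a custom benchmark behaves like the other vectorized benchmarks, with one sub-environment per name in envs_list. A sketch of that usage follows; 'reach-v3' appears earlier in this page, while 'push-v3' and 'pick-place-v3' are assumed task names that you can substitute with any Meta-World '-v3' environments.

import gymnasium as gym
import metaworld

seed = 42
# Assumed selection of three tasks; substitute any Meta-World '-v3' task names.
envs = gym.make_vec('Meta-World/custom-mt-envs', vector_strategy='sync',
                    envs_list=['reach-v3', 'push-v3', 'pick-place-v3'], seed=seed)

obs, info = envs.reset()  # one observation per custom environment
obs, reward, terminated, truncated, info = envs.step(envs.action_space.sample())
envs.close()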

Arguments

The gym.make and gym.make_vec commands support multiple arguments:

seed (None or int): The number to seed the random number generator with.

max_episode_steps (None or int): The maximum number of steps per episode.

use_one_hot (True or False): Whether the one-hot wrapper should be used to add the task ID to the observation.

num_tasks (int): The number of parametric variations to sample (default: 50).

terminate_on_success (True or False): Whether to terminate the episode during training when the success signal is seen.

vector_strategy ('sync' or 'async'): The kind of vector environment the environments should be wrapped in.

task_select ('random' or 'pseudorandom'): How parametric variations should be selected.

reward_function_version ('v1' or 'v2'): Whether to use the original reward functions from Meta-World or the updated ones.

reward_normalization_method (None, 'gymnasium', or 'exponential'): Which reward normalization wrapper to apply, if any.

render_mode (None, 'human', 'rgb_array', or 'depth_array'): The render mode of each environment.

camera_name ('corner', 'topview', 'behindGripper', 'gripperPOV', 'corner2', or 'corner3'): The MuJoCo name of the camera that should be used to render.

camera_id (int): The MuJoCo ID of the camera that should be used to render.
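
As an illustration, several of these arguments can be combined in a single call. This is a sketch rather than an exhaustive configuration, and it assumes these keyword arguments can be passed together to gym.make_vec as described in the list above.

import gymnasium as gym
import metaworld

envs = gym.make_vec(
    'Meta-World/MT10',
    vector_strategy='async',       # run each environment in its own process
    seed=42,                       # seed the random number generator
    max_episode_steps=200,         # cap episode length
    use_one_hot=True,              # append the one-hot task ID to observations
    terminate_on_success=False,    # keep episodes running after the success signal
    reward_function_version='v2',  # use the updated reward functions
    render_mode=None,              # no rendering
)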