Concepts
OpenEnv follows a client-server model inspired by Gymnasium's simple API. Agents send structured actions to isolated environments and receive observations, rewards, and episode status in return.
+-----------------+     HTTP/WebSocket     +-----------------+
|   Your Agent    | <--------------------> |   Environment   |
|    (Client)     |    step/reset/state    |    (Server)     |
+-----------------+                        +-----------------+

Key Abstractions
Environment
An Environment is an isolated execution context where your agent can take actions and receive observations. Environments usually run as servers and expose a standard API.
Action
An Action is a structured command that your agent sends to the environment. Each environment defines its own action schema.
from coding_env import CodeAction

action = CodeAction(code="print('Hello!')")

Observation
An Observation is the response from the environment after taking an action. It contains the current state visible to your agent.
result = client.step(action)
print(result.observation.stdout)  # "Hello!"

StepResult
A StepResult bundles together everything returned from a step:
- observation: what the agent can see
- reward: numeric reward signal for training
- terminated: whether the episode has ended
- truncated: whether the episode was cut short
- info: additional metadata
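As a rough illustration of how these fields fit together, the sketch below uses a stand-in dataclass rather than OpenEnv's actual StepResult class. The names `StepResultSketch` and `episode_over`, the field types, and the default values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepResultSketch:
    """Stand-in for illustration only; not OpenEnv's actual StepResult."""

    observation: Any                      # what the agent can see
    reward: Optional[float] = None        # numeric reward signal for training
    terminated: bool = False              # the episode has ended
    truncated: bool = False               # the episode was cut short
    info: dict[str, Any] = field(default_factory=dict)  # additional metadata


def episode_over(result: StepResultSketch) -> bool:
    """Treat an episode as finished if it terminated or was truncated."""
    return result.terminated or result.truncated


running = StepResultSketch(observation={"stdout": "Hello!"}, reward=0.0)
cut_short = StepResultSketch(observation={"stdout": ""}, truncated=True)
```

Checking both flags in one helper is a design choice for this sketch: a loop that only watches `terminated` would keep stepping after an episode has been cut short.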
Rubric
A Rubric is a composable unit of reward computation that lives inside the
environment. Rubrics can be combined with WeightedSum, Gate, and
Sequential; use LLM judges for subjective criteria; and handle delayed rewards
with TrajectoryRubric. See the Rubrics tutorial
for the full API.
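The idea of composing reward computations can be sketched in plain Python. The snippet below does not use OpenEnv's Rubric classes: `weighted_sum`, `gate`, and the three scoring functions are hypothetical stand-ins that only mirror the concept, and the gating behaviour shown here (return 0.0 unless a condition passes) is an assumption that may differ from the real Gate.

```python
from typing import Callable

# A scorer maps (action, observation) to one float reward component.
Scorer = Callable[[str, str], float]


def weighted_sum(parts: list[tuple[float, Scorer]]) -> Scorer:
    """Combine scorers into one by summing weight * score."""
    def combined(action: str, observation: str) -> float:
        return sum(weight * scorer(action, observation) for weight, scorer in parts)
    return combined


def gate(condition: Scorer, inner: Scorer, threshold: float = 1.0) -> Scorer:
    """Return the inner score only if the condition reaches the threshold."""
    def gated(action: str, observation: str) -> float:
        if condition(action, observation) < threshold:
            return 0.0
        return inner(action, observation)
    return gated


def ran_cleanly(action: str, observation: str) -> float:
    return 0.0 if "Traceback" in observation else 1.0


def mentions_hello(action: str, observation: str) -> float:
    return 1.0 if "Hello" in observation else 0.0


def is_short(action: str, observation: str) -> float:
    return 1.0 if len(action) <= 40 else 0.0


# Hypothetical composition: quality only counts if the code ran cleanly.
reward_fn = gate(ran_cleanly, weighted_sum([(0.75, mentions_hello), (0.25, is_short)]))
```

Calling `reward_fn(action, observation)` then yields a single number, while each component stays small enough to test on its own.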
Client
A Client is how you connect to and interact with an environment. OpenEnv provides both async and sync clients.
from openenv import AutoEnv

env = AutoEnv.from_env("coding")

# Async client (use inside a coroutine)
async with env as client:
    result = await client.reset()
    result = await client.step(action)

# Sync client
with env.sync() as client:
    result = client.reset()
    result = client.step(action)

The Step Loop
# decide_action and learn stand in for your own policy and training code.
with env.sync() as client:
    result = client.reset()
    while not result.terminated:
        obs = result.observation
        action = decide_action(obs)
        result = client.step(action)
        learn(result.reward)

Connection Methods
| Method | Use Case | Example |
|---|---|---|
| HTTP URL | Remote servers, Hugging Face Spaces | EnvClient(base_url="https://...") |
| Docker | Local development | EnvClient.from_docker_image("env:latest") |
| Auto-discovery | Installed packages or known environments | AutoEnv.from_env("echo") |
Environment Anatomy
Every OpenEnv environment consists of:
my_env/
├── openenv.yaml          # Manifest file
├── my_env/
│   ├── __init__.py
│   ├── client.py         # Client classes
│   ├── server.py         # Server/Environment
│   └── models.py         # Pydantic models
├── Dockerfile            # Container definition
├── pyproject.toml        # Package metadata
└── README.md             # Documentation

The Manifest (openenv.yaml)
name: my_env
version: 0.1.0
description: My custom environment
client:
  class_name: MyEnvClient
  module: my_env.client
action:
  class_name: MyAction
  module: my_env.models
observation:
  class_name: MyObservation
  module: my_env.models
default_image: my-env:latest
spec_version: 1

Models (Pydantic)
Custom Action, Observation, and State types subclass the base classes from openenv.core.env_server.types, not pydantic.BaseModel directly. The base Observation already carries done and reward fields, which step() populates; Action and State add metadata plumbing used by the server.
from openenv.core.env_server.types import Action, Observation, State


class MyAction(Action):
    command: str
    args: list[str] = []


class MyObservation(Observation):
    output: str
    success: bool


class MyState(State):
    history: list[str] = []

Environment Class
Environments subclass the abstract Environment[ActT, ObsT, StateT] base and implement reset, step, and the state property. Reward and termination are carried on the returned observation; they are not returned as a separate tuple.
from openenv.core.env_server.interfaces import Environment


class MyEnvironment(Environment[MyAction, MyObservation, MyState]):
    def reset(self, seed=None, episode_id=None, **kwargs) -> MyObservation:
        ...

    def step(self, action: MyAction, timeout_s=None, **kwargs) -> MyObservation:
        ...

    @property
    def state(self) -> MyState:
        ...

Server (FastAPI)
Use create_app from openenv.core.env_server to wrap the environment as a FastAPI application. Pass the environment class (used as a factory so each WebSocket session gets its own instance) along with the action and observation types:
from openenv.core.env_server import create_app

app = create_app(
    MyEnvironment,
    MyAction,
    MyObservation,
    env_name="my_env",
)

This is what the environment's server/app.py entry point typically does; see envs/echo_env/server/app.py for a minimal real example.
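Passing the class rather than an instance is what allows per-session isolation. The following sketch is not how create_app is implemented; `SessionRegistry` and `CounterEnv` are hypothetical names used only to show why a factory gives each session its own state.

```python
from typing import Callable, Generic, TypeVar

EnvT = TypeVar("EnvT")


class SessionRegistry(Generic[EnvT]):
    """Hypothetical helper: builds one environment per session ID on demand."""

    def __init__(self, env_factory: Callable[[], EnvT]) -> None:
        self._env_factory = env_factory
        self._sessions: dict[str, EnvT] = {}

    def get(self, session_id: str) -> EnvT:
        # Call the factory the first time a session is seen, then reuse it.
        if session_id not in self._sessions:
            self._sessions[session_id] = self._env_factory()
        return self._sessions[session_id]


class CounterEnv:
    """Toy environment whose only state is a step counter."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> int:
        self.steps += 1
        return self.steps


registry = SessionRegistry(CounterEnv)  # the class itself acts as the factory
registry.get("session-a").step()
registry.get("session-a").step()
registry.get("session-b").step()
```

Had a single shared instance been passed instead, both sessions would have incremented the same counter.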
Rewards via the Rubric
Rewards are computed inside the environment, not by external code. The base Environment accepts an optional rubric in __init__: pass it to super().__init__(rubric=...), call self._reset_rubric() from reset, and call self._apply_rubric(action, observation) from step (or _apply_rubric_async from step_async). The Rubrics tutorial covers the composable API end-to-end.
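The wiring described above can be sketched without installing OpenEnv. In the snippet below, `BaseEnvSketch` is a stand-in that only mimics the three hooks named in the text; its method bodies, the return type of `_apply_rubric`, and the `Obs` and `EchoEnvSketch` classes are assumptions for illustration, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Obs:
    """Toy observation carrying reward and done fields."""
    output: str
    reward: Optional[float] = None
    done: bool = False


class BaseEnvSketch:
    """Stand-in for the real base Environment; only the rubric hooks are mimicked."""

    def __init__(self, rubric: Optional[Callable[[str, Obs], float]] = None) -> None:
        self.rubric = rubric

    def _reset_rubric(self) -> None:
        # Assumption: a real rubric may hold per-episode state to clear here.
        pass

    def _apply_rubric(self, action: str, observation: Obs) -> float:
        # Assumption: returns the rubric's score, or 0.0 when no rubric is set.
        return self.rubric(action, observation) if self.rubric else 0.0


class EchoEnvSketch(BaseEnvSketch):
    def __init__(self) -> None:
        # Hypothetical rubric: reward 1.0 when the echoed output is non-empty.
        super().__init__(rubric=lambda action, obs: 1.0 if obs.output else 0.0)

    def reset(self) -> Obs:
        self._reset_rubric()
        return Obs(output="")

    def step(self, action: str) -> Obs:
        obs = Obs(output=action)
        # The reward travels on the observation, not as a separate return value.
        obs.reward = self._apply_rubric(action, obs)
        return obs


env = EchoEnvSketch()
first = env.reset()
second = env.step("hello")
```

In this sketch `first.reward` stays unset after reset, while `second.reward` is filled in by the rubric during step.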