OpenEnv documentation
BrowserGym Environment
BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.
Why BrowserGym?
BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.
What are these benchmarks?
MiniWoB++ (Training): 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. No external setup needed - tasks run in isolated browser sessions.
WebArena (Evaluation): 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multi-step, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. Requires running 7 backend services (shopping site, GitLab instance, etc.).
VisualWebArena: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.
WorkArena: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.
The training → evaluation pipeline:
- Train on MiniWoB (simple, controlled, fast iterations)
- Evaluate on WebArena (complex, realistic, measures real-world capability)
Key advantage: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.
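Before starting any container, the reset/step loop that this page relies on can be dry-run against a stand-in. The sketch below is only an illustration: `StubEnv`, `StubResult`, `StubObservation`, and `run_episode` are hypothetical names, not part of OpenEnv or BrowserGym, and the stub takes a plain action string where the real client takes a `BrowserGymAction`. It only mirrors the `result.observation` / `result.reward` / `result.done` shape used in the examples on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StubObservation:
    goal: str
    text: str


@dataclass
class StubResult:
    observation: StubObservation
    reward: float
    done: bool


class StubEnv:
    """Hypothetical stand-in that mimics reset()/step() without a browser."""

    def __init__(self, target: str = "click('ok')", max_steps: int = 5):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    def _observe(self) -> StubObservation:
        return StubObservation(goal="Click the OK button", text="button 'ok'")

    def reset(self) -> StubResult:
        self.steps = 0
        return StubResult(observation=self._observe(), reward=0.0, done=False)

    def step(self, action_str: str) -> StubResult:
        self.steps += 1
        solved = action_str == self.target
        # The episode ends on success or when the step budget is used up.
        done = solved or self.steps >= self.max_steps
        return StubResult(self._observe(), 1.0 if solved else 0.0, done)


def run_episode(env, policy: Callable[[str], str]) -> float:
    """Run one episode and return the summed reward."""
    result = env.reset()
    total = 0.0
    while not result.done:
        result = env.step(policy(result.observation.text))
        total += result.reward
    return total
```

Once the loop behaves as expected against the stub, the same control flow can be pointed at a real MiniWoB environment.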
Quick Start - Training (MiniWoB)
No Setup Required!
from browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for MiniWoB training task
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-test",  # or "click-button", "click-dialog", etc.
    },
)

# Train your agent!
for episode in range(1000):
    result = env.reset()
    print(f"Goal: {result.observation.goal}")
    done = False
    while not done:
        # Your agent decides what to do
        action_str = agent.get_action(result.observation.text)
        action = BrowserGymAction(action_str=action_str)
        result = env.step(action)
        done = result.done
        print(f"Reward: {result.reward}")

env.close()

Harness Sessions for TRL
If you want BrowserGym to participate in a tool-driven harness instead of a
hand-written env.reset() / env.step() loop, use the BrowserGym session
factory:
from browsergym_env import BrowserGymEnv
from browsergym_env.harness import BrowserGymSessionFactory
from openenv.core.harness import (
    HarnessRunLimits,
    MCPHarnessAdapter,
    build_harness_rollout_func,
)

session_factory = BrowserGymSessionFactory(
    client_factory=lambda: BrowserGymEnv(base_url="https://openenv-browsergym-env.hf.space"),
)
rollout_func = build_harness_rollout_func(
    session_factory=session_factory,
    harness_adapter=MCPHarnessAdapter(),
    model_step_builder=...,  # trainer-owned model sampling
    limits=HarnessRunLimits(max_turns=10),
)

BrowserGym exposes click, fill, send_keys, scroll, and noop as MCP-style
tools while still translating them back into the underlying BrowserGymAction
strings. See examples/browsergym_harness.py
for a full TRL-oriented example.
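To make the tool-to-string translation concrete, here is a minimal sketch of how a tool call could be rendered as a function-style action string. `render_action` is a hypothetical helper, and the argument lists passed to it are assumptions for illustration; the real tool schemas and translation logic live in `browsergym_env.harness` and may differ.

```python
# Tool names taken from the list above; everything else is illustrative.
ALLOWED_TOOLS = {"click", "fill", "send_keys", "scroll", "noop"}


def render_action(tool_name: str, *args) -> str:
    """Render a tool call as a function-style action string."""
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    # repr() quotes string arguments and leaves numbers bare.
    rendered_args = ", ".join(repr(arg) for arg in args)
    return f"{tool_name}({rendered_args})"


print(render_action("fill", "username", "john@example.com"))
```

The resulting string is what would be wrapped in `BrowserGymAction(action_str=...)` before being sent to the environment.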
Available Tasks by Benchmark
MiniWoB++ Tasks (Training - 100+ tasks)
MiniWoB tasks are organized by difficulty and type. Here are the main categories:
Click Tasks (Basic interaction)
| Task Name | Description | Difficulty |
|---|---|---|
| click-test | Click a single button | ⭐ Easy |
| click-button | Click button with specific text | ⭐ Easy |
| click-button-sequence | Click buttons in order | ⭐⭐ Medium |
| click-checkboxes | Select specific checkboxes | ⭐⭐ Medium |
| click-checkboxes-soft | Select checkboxes (multiple valid) | ⭐⭐ Medium |
| click-checkboxes-large | Many checkboxes to select from | ⭐⭐ Medium |
| click-checkboxes-transfer | Transfer learning variation | ⭐⭐ Medium |
| click-dialog | Click correct button in dialog | ⭐ Easy |
| click-dialog-2 | More complex dialog | ⭐⭐ Medium |
| click-link | Click on a link | ⭐ Easy |
| click-option | Select from dropdown | ⭐⭐ Medium |
| click-pie | Click on pie chart slice | ⭐⭐ Medium |
| click-scroll-list | Click item in scrollable list | ⭐⭐⭐ Hard |
| click-shades | Click on specific color shade | ⭐⭐ Medium |
| click-shape | Click on specific shape | ⭐⭐ Medium |
| click-tab | Switch between tabs | ⭐⭐ Medium |
| click-tab-2 | More complex tab switching | ⭐⭐⭐ Hard |
| click-widget | Click on UI widget | ⭐⭐ Medium |
Text Entry Tasks (Typing and forms)
| Task Name | Description | Difficulty |
|---|---|---|
| enter-text | Type text into input field | ⭐ Easy |
| enter-text-dynamic | Dynamic text entry | ⭐⭐ Medium |
| enter-text-2 | Multiple text fields | ⭐⭐ Medium |
| enter-password | Fill password field | ⭐ Easy |
| enter-date | Enter a date | ⭐⭐ Medium |
| enter-time | Enter a time | ⭐⭐ Medium |
| login-user | Complete login form | ⭐⭐ Medium |
| login-user-popup | Login via popup | ⭐⭐⭐ Hard |
Navigation Tasks (Multi-step interaction)
| Task Name | Description | Difficulty |
|---|---|---|
| navigate-tree | Navigate through tree structure | ⭐⭐⭐ Hard |
| search-engine | Use search interface | ⭐⭐ Medium |
| use-autocomplete | Interact with autocomplete | ⭐⭐⭐ Hard |
| book-flight | Book a flight (complex form) | ⭐⭐⭐⭐ Very Hard |
| choose-date | Pick date from calendar | ⭐⭐⭐ Hard |
| choose-date-easy | Simplified date picker | ⭐⭐ Medium |
| choose-date-medium | Medium difficulty date picker | ⭐⭐⭐ Hard |
| choose-list | Select from long list | ⭐⭐ Medium |
Visual/Spatial Tasks (Requires visual understanding)
| Task Name | Description | Difficulty |
|---|---|---|
| count-sides | Count sides of shape | ⭐⭐ Medium |
| count-shape | Count specific shapes | ⭐⭐ Medium |
| find-word | Find word in text | ⭐⭐ Medium |
| focus-text | Focus on text element | ⭐ Easy |
| focus-text-2 | More complex focus task | ⭐⭐ Medium |
| grid-coordinate | Click grid coordinate | ⭐⭐ Medium |
| guess-number | Guess a number game | ⭐⭐⭐ Hard |
| identify-shape | Identify shape type | ⭐⭐ Medium |
| read-table | Extract info from table | ⭐⭐⭐ Hard |
| read-table-2 | More complex table reading | ⭐⭐⭐ Hard |
Email/Social Tasks (Realistic scenarios)
| Task Name | Description | Difficulty |
|---|---|---|
| email-inbox | Manage email inbox | ⭐⭐⭐⭐ Very Hard |
| email-inbox-forward | Forward emails | ⭐⭐⭐⭐ Very Hard |
| email-inbox-nl | Natural language email task | ⭐⭐⭐⭐ Very Hard |
| email-inbox-star-reply | Star and reply to emails | ⭐⭐⭐⭐ Very Hard |
| social-media | Social media interaction | ⭐⭐⭐⭐ Very Hard |
| social-media-some | Partial social media task | ⭐⭐⭐ Hard |
Total: 100+ tasks across all categories
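The difficulty ratings in the tables above lend themselves to a simple curriculum: start with the easy tasks and widen the pool as the agent improves. The sketch below is one possible way to organize that; `TASK_DIFFICULTY` and `curriculum` are hypothetical names, the mapping covers only a handful of rows copied from the tables, and the numeric levels (1 = Easy through 4 = Very Hard) are an encoding chosen here for illustration.

```python
# A few rows copied from the tables above (1 = Easy ... 4 = Very Hard).
TASK_DIFFICULTY = {
    "click-test": 1,
    "click-button": 1,
    "enter-text": 1,
    "click-checkboxes": 2,
    "login-user": 2,
    "click-tab-2": 3,
    "navigate-tree": 3,
    "book-flight": 4,
    "email-inbox": 4,
}


def curriculum(max_difficulty: int) -> list[dict[str, str]]:
    """Return env-var dicts for tasks up to max_difficulty, easiest first."""
    selected = sorted(
        (name for name, level in TASK_DIFFICULTY.items() if level <= max_difficulty),
        key=lambda name: (TASK_DIFFICULTY[name], name),
    )
    return [
        {"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": name}
        for name in selected
    ]


for config in curriculum(max_difficulty=2):
    print(config["BROWSERGYM_TASK_NAME"])
```

Each returned dict has the same shape as the `environment=` argument used in the examples on this page.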
Usage:
# Easy task for quick testing
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-test"})
# Medium difficulty for training
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "click-checkboxes"})
# Hard task for evaluation
env = BrowserGymEnv(environment={"BROWSERGYM_TASK_NAME": "email-inbox"})

WebArena Tasks (Evaluation - 812 tasks)
WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811.
By Website:
| Website | Task Count | Description | Example Tasks |
|---|---|---|---|
| Shopping | ~200 | E-commerce site | Search products, add to cart, checkout |
| Shopping Admin | ~150 | Admin panel | Manage products, orders, customers |
| Reddit | ~150 | Forum/social | Post, comment, search discussions |
| GitLab | ~200 | Code repository | Create issues, merge requests, review code |
| Wikipedia | ~100 | Knowledge base | Search, read, extract information |
| Map | ~12 | Location service | Find places, get directions |
By Difficulty:
| Difficulty | Task Count | Steps Required | Example |
|---|---|---|---|
| Easy | ~200 | 1-5 steps | "Find the price of product X" |
| Medium | ~400 | 5-15 steps | "Add cheapest laptop to cart" |
| Hard | ~212 | 15+ steps | "Create merge request for bug fix" |
Usage:
# Task 0 (usually easy)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "0",
    "SHOPPING": "http://your-server:7770",
    # ... other URLs
})
# Task 156 (GitLab merge request)
env = BrowserGymEnv(environment={
    "BROWSERGYM_BENCHMARK": "webarena",
    "BROWSERGYM_TASK_NAME": "156",
    # ... URLs
})

Note: WebArena tasks require the full backend infrastructure. See WebArena setup guide.
VisualWebArena Tasks (910 tasks)
Similar to WebArena but requires visual understanding. Tasks involve:
- Image-based reasoning
- Visual element identification
- Multimodal interaction (text + images)
WorkArena Tasks
Enterprise software automation tasks:
- CRM operations
- Project management
- Business workflows
Full task lists:
Evaluation (WebArena)
Prerequisites
WebArena requires setting up backend infrastructure. See the WebArena documentation.
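Because a missing backend URL tends to surface only after the container has started, it can help to check the configuration up front. The following is a small pre-flight sketch; `missing_webarena_vars` is a hypothetical helper, and the variable names are the WebArena-specific ones listed on this page.

```python
# WebArena URL variables named in this document.
REQUIRED_WEBARENA_VARS = (
    "SHOPPING",
    "SHOPPING_ADMIN",
    "REDDIT",
    "GITLAB",
    "MAP",
    "WIKIPEDIA",
    "HOMEPAGE",
)


def missing_webarena_vars(environment: dict[str, str]) -> list[str]:
    """Return the WebArena URL variables that are absent or empty."""
    return [name for name in REQUIRED_WEBARENA_VARS if not environment.get(name)]


config = {"BROWSERGYM_BENCHMARK": "webarena", "SHOPPING": "http://your-server:7770"}
missing = missing_webarena_vars(config)
if missing:
    print("Set these before launching:", ", ".join(missing))
```

This checks only that the variables are present; it does not verify that the services behind them are reachable.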
Usage
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for WebArena evaluation
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",  # Task ID
        # WebArena backend URLs (required)
        "SHOPPING": "http://your-server:7770",
        "SHOPPING_ADMIN": "http://your-server:7780/admin",
        "REDDIT": "http://your-server:9999",
        "GITLAB": "http://your-server:8023",
        "MAP": "http://your-server:3000",
        "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "HOMEPAGE": "http://your-server:4399",
    },
)

# Evaluate your trained agent
result = env.reset()
while not result.done:
    action_str = agent.get_action(result.observation)
    action = BrowserGymAction(action_str=action_str)
    result = env.step(action)
print(f"Success: {result.reward}")

env.close()

Building the Docker Image
Prerequisites
- Base Image: Build the OpenEnv base image first:
# From the OpenEnv repository root
docker build -t openenv-base:latest -f src/openenv/core/containers/images/Dockerfile .

Build the BrowserGym Environment
# From the browsergym_env directory
cd envs/browsergym_env
docker build -t browsergym-env:latest -f server/Dockerfile .

Run the Server
For MiniWoB (Training):
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="miniwob" \
  -e BROWSERGYM_TASK_NAME="click-test" \
  browsergym-env:latest

For WebArena (Evaluation):
docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="webarena" \
  -e BROWSERGYM_TASK_NAME="0" \
  -e SHOPPING="http://your-server:7770" \
  -e SHOPPING_ADMIN="http://your-server:7780/admin" \
  -e REDDIT="http://your-server:9999" \
  -e GITLAB="http://your-server:8023" \
  -e MAP="http://your-server:3000" \
  -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
  -e HOMEPAGE="http://your-server:4399" \
  browsergym-env:latest

Environment Details
Action
Actions in BrowserGym are strings that describe browser operations as function-style calls:
from envs.browsergym_env import BrowserGymAction

# Click actions
action = BrowserGymAction(action_str="click('Submit button')")
action = BrowserGymAction(action_str="click('element_id_123')")
# Type actions
action = BrowserGymAction(action_str="fill('username', 'john@example.com')")
action = BrowserGymAction(action_str="fill('password', 'secret123')")
# Navigate actions
action = BrowserGymAction(action_str="goto('https://example.com')")
# Keyboard actions
action = BrowserGymAction(action_str="press('Enter')")
action = BrowserGymAction(action_str="press('Tab')")
# Scroll actions
action = BrowserGymAction(action_str="scroll('down')")

Observation
Observations contain multiple modalities:
result = env.step(action)
obs = result.observation

# Text observations
print(obs.text)         # Primary text representation (AXTree or DOM)
print(obs.axtree_txt)   # Accessibility tree
print(obs.pruned_html)  # Pruned HTML (interactive elements only)
# Page metadata
print(obs.url)   # Current URL
print(obs.goal)  # Task goal/instruction
# Visual (if enabled)
if obs.screenshot is not None:
    print(obs.screenshot.shape)  # [height, width, channels]
# Error handling
if obs.last_action_error:
    print(f"Action failed: {obs.error}")
# Episode status
print(obs.done)    # True if episode ended
print(obs.reward)  # Reward for the step
# Access full BrowserGym data (includes timestamps, etc.)
print(obs.metadata["browsergym_obs"])   # Full observation dict from BrowserGym
print(obs.metadata["browsergym_info"])  # Full info dict (timestamps, page state, etc.)

Advanced: Accessing Raw BrowserGym Data
For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in metadata:
result = env.step(action)

# Access timestamps (if available)
info = result.observation.metadata["browsergym_info"]
if "timestamp" in info:
    print(f"Action timestamp: {info['timestamp']}")
# Access additional observation fields
obs_dict = result.observation.metadata["browsergym_obs"]
if "dom_object" in obs_dict:
    dom = obs_dict["dom_object"]
    # Work with raw DOM object
# Access page performance data
if "performance" in info:
    print(f"Page load time: {info['performance']}")

State
The environment state tracks progress:
state = env.state()
print(f"Benchmark: {state.benchmark}")      # 'miniwob', 'webarena', etc.
print(f"Task: {state.task_name}")           # Task name/ID
print(f"Episode: {state.episode_id}")       # Unique episode ID
print(f"Steps: {state.step_count}")         # Number of steps taken
print(f"Total Reward: {state.cum_reward}")  # Cumulative reward
print(f"Goal: {state.goal}")                # Task instruction
print(f"URL: {state.current_url}")          # Current page URL

Configuration
Environment variables:
Common Settings
- BROWSERGYM_BENCHMARK: Benchmark to use (miniwob, webarena, visualwebarena, workarena)
- BROWSERGYM_TASK_NAME: Specific task name (optional, will use first available if not set)
- BROWSERGYM_HEADLESS: Run browser in headless mode (default: true)
- BROWSERGYM_VIEWPORT_WIDTH: Browser viewport width (default: 1280)
- BROWSERGYM_VIEWPORT_HEIGHT: Browser viewport height (default: 720)
- BROWSERGYM_TIMEOUT: Action timeout in milliseconds (default: 10000)
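As an illustration of how these settings and their defaults fit together, the sketch below reads them from a mapping such as `os.environ`. `BrowserSettings` is a hypothetical class, not the server's actual configuration code, and the set of strings accepted as "true" for `BROWSERGYM_HEADLESS` is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BrowserSettings:
    headless: bool = True
    viewport_width: int = 1280
    viewport_height: int = 720
    timeout_ms: int = 10000

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "BrowserSettings":
        """Read BROWSERGYM_* variables, falling back to the documented defaults."""
        headless_raw = env.get("BROWSERGYM_HEADLESS", "true")
        return cls(
            # Assumption: "1", "true" and "yes" (any case) mean headless.
            headless=headless_raw.strip().lower() in {"1", "true", "yes"},
            viewport_width=int(env.get("BROWSERGYM_VIEWPORT_WIDTH", "1280")),
            viewport_height=int(env.get("BROWSERGYM_VIEWPORT_HEIGHT", "720")),
            timeout_ms=int(env.get("BROWSERGYM_TIMEOUT", "10000")),
        )


settings = BrowserSettings.from_env({"BROWSERGYM_HEADLESS": "false"})
print(settings)
```

In practice you would pass `os.environ` instead of a literal dict; a plain mapping keeps the example easy to test.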
WebArena-Specific (only needed for WebArena benchmark)
- SHOPPING: Shopping website URL
- SHOPPING_ADMIN: Shopping admin panel URL
- REDDIT: Reddit-like forum URL
- GITLAB: GitLab instance URL
- MAP: Map service URL
- WIKIPEDIA: Wikipedia instance URL
- HOMEPAGE: Homepage URL
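When all WebArena services run on one host, the seven URLs can be derived from the hostname instead of being typed out each time. The sketch below assumes the ports and paths used in this page's examples (7770, 7780, 9999, 8023, 3000, 8888, 4399); `webarena_urls` and `webarena_environment` are hypothetical helpers, and your deployment may use different ports.

```python
def webarena_urls(host: str) -> dict[str, str]:
    """Build the WebArena URL variables for services hosted on one machine."""
    base = f"http://{host}"
    wiki_path = "wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
    return {
        "SHOPPING": f"{base}:7770",
        "SHOPPING_ADMIN": f"{base}:7780/admin",
        "REDDIT": f"{base}:9999",
        "GITLAB": f"{base}:8023",
        "MAP": f"{base}:3000",
        "WIKIPEDIA": f"{base}:8888/{wiki_path}",
        "HOMEPAGE": f"{base}:4399",
    }


def webarena_environment(host: str, task_id: int) -> dict[str, str]:
    """Combine benchmark selection, task ID and backend URLs into one dict."""
    if not 0 <= task_id <= 811:  # tasks are numbered 0-811
        raise ValueError(f"task_id out of range: {task_id}")
    return {
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": str(task_id),
        **webarena_urls(host),
    }


print(webarena_environment("your-server", 0)["GITLAB"])
```

The returned dict can be passed as the `environment=` argument shown in the WebArena usage example.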
Supported Benchmarks
1. MiniWoB++ (Training): Recommended for Training
- 100+ tasks ranging from simple (click buttons) to complex (form filling, navigation)
- Fast: Instant resets, quick episodes
- Randomized: Task variations for generalization
- No setup: Works out-of-the-box
- Dense rewards: Immediate feedback for learning
Use Case: Train agents on fundamental web navigation skills
2. WebArena (Evaluation): Benchmark
- 812 realistic tasks across 6 websites
- Complex: Multi-step reasoning, real web interfaces
- Requires setup: Need to run 7 backend services
- Sparse rewards: Binary success/failure
- Evaluation-focused: Test real-world performance
Use Case: Evaluate agents on realistic web tasks
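With binary success/failure rewards, the headline metric is simply the fraction of tasks solved. A minimal sketch follows, assuming that a positive final reward marks a successful episode; `success_rate` is a hypothetical helper, not part of OpenEnv.

```python
def success_rate(final_rewards: list[float]) -> float:
    """Fraction of episodes whose final reward is positive."""
    if not final_rewards:
        raise ValueError("no episodes to score")
    successes = sum(1 for reward in final_rewards if reward > 0)
    return successes / len(final_rewards)


# One final reward per evaluated task.
print(f"Success rate: {success_rate([1.0, 0.0, 0.0, 1.0]):.2%}")
```

Collect one final reward per task (for example `result.reward` once `result.done` is true) and pass the list to this function.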
3. VisualWebArena (Evaluation): Visual Benchmark
- 910 tasks requiring visual understanding
- Multimodal: Both text and visual observations
- Requires setup: Similar to WebArena
- Challenging: Requires visual reasoning
Use Case: Test visual web navigation capabilities
4. WorkArena (Evaluation): Enterprise Benchmark
- Enterprise tasks: CRM, project management, etc.
- Realistic workflows: Real enterprise software
- Requires setup: Enterprise software instances
Use Case: Evaluate on business automation tasks
Typical Training Pipeline
from envs.browsergym_env import BrowserGymEnv, BrowserGymAction

# Stage 1: Train on MiniWoB (simple tasks, fast)
train_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-button",
    },
)
# Train your agent (RL, imitation learning, etc.)
agent.train(train_env, num_episodes=10000)
train_env.close()

# Stage 2: Evaluate on WebArena (complex tasks, realistic)
eval_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",
        # ... WebArena URLs
    },
)
# Test performance
success_rate = agent.evaluate(eval_env, num_tasks=812)
print(f"WebArena Success Rate: {success_rate:.2%}")
eval_env.close()

Development & Testing
Running Tests
# From the OpenEnv repository root
pytest tests/envs/test_browsergym_env.py

Local Development
# Install in development mode
cd /path/to/OpenEnv
pip install -e .
# Install BrowserGym
pip install browsergym browsergym-miniwob browsergym-webarena
# Run the server locally
cd envs/browsergym_env/server
export BROWSERGYM_BENCHMARK=miniwob
export BROWSERGYM_TASK_NAME=click-test
python app.py

Project Structure
browsergym_env/
├── __init__.py                    # Module exports
├── models.py                      # Action, Observation, State dataclasses
├── client.py                      # HTTPEnvClient implementation
├── README.md                      # This file
└── server/
    ├── __init__.py
    ├── app.py                     # FastAPI application
    ├── browsergym_environment.py  # Environment implementation
    ├── Dockerfile                 # Container specification
    └── requirements.txt           # Python dependencies

References
- BrowserGym GitHub
- MiniWoB++ Paper
- WebArena Paper
- WebArena Website
- VisualWebArena Paper
- OpenEnv Documentation