sandbox-bench | AI Agent Sandbox Provider Benchmarks

Scoring Methodology

Each provider receives a score from 0 to 100, computed as a weighted combination of metrics. The benchmark measures the full lifecycle of an AI agent interacting with a sandbox: authenticate, create, execute code, read and write files, and destroy. When extended suites are run, a Capabilities weight is added and the other weights are adjusted to accommodate it.
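As a rough illustration of the lifecycle described above, the sketch below times each step against a stand-in provider. `InMemorySandbox`, its method names, and `time_lifecycle` are hypothetical and exist only for this example; they are not sandbox-bench's harness or any provider's SDK.

```python
import time
from typing import Dict


class InMemorySandbox:
    """Hypothetical stand-in for a provider SDK (not a real API)."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.alive = False

    def authenticate(self) -> None:
        pass  # a real provider would exchange credentials here

    def create(self) -> None:
        self.alive = True

    def execute(self, expression: str) -> str:
        # Evaluates a simple expression with builtins disabled.
        return str(eval(expression, {"__builtins__": {}}))

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def read_file(self, path: str) -> str:
        return self.files[path]

    def destroy(self) -> None:
        self.files.clear()
        self.alive = False


def time_lifecycle(sandbox: InMemorySandbox) -> Dict[str, float]:
    """Run each lifecycle step once and record its wall-clock duration in seconds."""
    steps = [
        ("authenticate", lambda: sandbox.authenticate()),
        ("create", lambda: sandbox.create()),
        ("execute", lambda: sandbox.execute("1 + 1")),
        ("write_file", lambda: sandbox.write_file("/tmp/a.txt", "hello")),
        ("read_file", lambda: sandbox.read_file("/tmp/a.txt")),
        ("destroy", lambda: sandbox.destroy()),
    ]
    timings: Dict[str, float] = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    for step_name, seconds in time_lifecycle(InMemorySandbox()).items():
        print(f"{step_name}: {seconds * 1000:.3f} ms")
```

Timing each step separately, rather than the run as a whole, keeps the per-step durations available as individual inputs to a weighted score.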

Grades: A (85-100) · B (70-84) · C (55-69) · D (40-54) · F (0-39)
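A minimal sketch of how a weighted score and its letter grade could be computed. The grade bands come from the list above; the metric names, the weight values, and the proportional rescaling used when a Capabilities weight is added are assumptions made for illustration, since the actual weights are not stated here.

```python
from typing import Dict

# Hypothetical metric names and weights; the real values are not given in this section.
BASE_WEIGHTS: Dict[str, float] = {"latency": 0.5, "reliability": 0.3, "file_io": 0.2}


def with_capabilities(weights: Dict[str, float], cap_weight: float) -> Dict[str, float]:
    """Add a capabilities weight and shrink the others proportionally.

    Proportional rescaling is an assumption; the source only says the
    other weights are adjusted.
    """
    scale = 1.0 - cap_weight
    adjusted = {name: w * scale for name, w in weights.items()}
    adjusted["capabilities"] = cap_weight
    return adjusted


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-metric scores (each 0-100) into one 0-100 score."""
    total = sum(weights.values())
    score = sum(metrics[name] * w for name, w in weights.items()) / total
    return max(0.0, min(100.0, score))


def grade(score: float) -> str:
    """Map a 0-100 score to a letter using the published bands."""
    for floor, letter in ((85, "A"), (70, "B"), (55, "C"), (40, "D")):
        if score >= floor:
            return letter
    return "F"


if __name__ == "__main__":
    metrics = {"latency": 80.0, "reliability": 90.0, "file_io": 70.0}
    score = weighted_score(metrics, BASE_WEIGHTS)
    print(f"{score:.1f} -> {grade(score)}")
```

Because the published bands are stated as integer ranges, this sketch treats each lower bound as an inclusive threshold, so a fractional score such as 84.5 falls in B.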