Pattern 22: Sandbox Escalation with Automatic Retry
"Multi-stage command execution with intelligent fallback strategies"
Pattern Overview
Sandbox Escalation goes beyond simply running a command and handling errors. It implements a decision tree for safe command execution: classify the command, run it in a sandbox, and, if the sandbox blocks it, retry without the sandbox once the user approves.
🎯 Key Concepts
- Safety Assessment - Classify commands before execution
- Sandbox Selection - Choose appropriate isolation level
- Automatic Escalation - Retry without sandbox on failure
- Approval Workflows - Smart user consent management
- Session Caching - Remember user decisions
- Telemetry Tracking - Log all decisions and outcomes
How Codex Implements This
Location in Codebase
- Primary: codex-rs/core/src/executor/runner.rs (lines 76-218)
- Support: codex-rs/core/src/executor/sandbox.rs (lines 87-160)
Implementation Flow
    // From codex-rs/core/src/executor/runner.rs:77-157 (abridged)
    pub(crate) async fn run(&self, request: ExecutionRequest) -> Result<ExecToolCallOutput> {
        // Step 1: Assess command safety and pick a sandbox.
        // `approval_policy` and `config` are not defined in this excerpt;
        // they are assumed to be in scope.
        let sandbox_decision = select_sandbox(
            &request,
            approval_policy,
            self.approval_cache.snapshot(),
            &config,
        ).await?;

        // Step 2: Execute in the chosen sandbox
        let first_attempt = self.spawn(
            request.params.clone(),
            sandbox_decision.initial_sandbox,
            &config,
        ).await;

        // Step 3: Handle sandbox failures with escalation
        match first_attempt {
            Ok(output) => Ok(output),
            Err(CodexErr::Sandbox(error)) => {
                if sandbox_decision.escalate_on_failure {
                    self.retry_without_sandbox(&request, error).await
                } else {
                    // `message` is not defined in this excerpt.
                    Err(ExecError::rejection(message))
                }
            }
            // Assumed catch-all so the match is exhaustive:
            // non-sandbox errors propagate unchanged.
            Err(other) => Err(other),
        }
    }
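The same control flow can be sketched in Python. This is a minimal illustration under stated assumptions, not the Codex implementation: `SandboxError`, `SandboxDecision`, `run_with_escalation`, and the `spawn` / `ask_user` callbacks are hypothetical names, and the sandbox is simulated by a fake `spawn` function.

```python
from dataclasses import dataclass
from typing import Callable, List


class SandboxError(Exception):
    """Hypothetical stand-in for a sandbox denial (CodexErr::Sandbox in the excerpt)."""


@dataclass
class SandboxDecision:
    initial_sandbox: str       # e.g. "restricted" or "none" (labels are assumptions)
    escalate_on_failure: bool  # whether an unsandboxed retry may be offered


def run_with_escalation(
    command: List[str],
    decision: SandboxDecision,
    spawn: Callable[[List[str], str], str],
    ask_user: Callable[[List[str], str], bool],
) -> str:
    """Run once in the chosen sandbox; on a sandbox denial, optionally retry unsandboxed."""
    try:
        return spawn(command, decision.initial_sandbox)
    except SandboxError as err:
        if not decision.escalate_on_failure:
            raise PermissionError(f"rejected: {err}") from err
        if not ask_user(command, str(err)):
            raise PermissionError("user declined unsandboxed retry") from err
        # Only sandbox denials reach this point; other errors propagate unchanged.
        return spawn(command, "none")


def fake_spawn(command: List[str], sandbox: str) -> str:
    """Simulated executor: any sandbox blocks the command, no sandbox lets it run."""
    if sandbox != "none":
        raise SandboxError("network access denied")
    return "ok"


result = run_with_escalation(
    ["pip", "install", "numpy"],
    SandboxDecision(initial_sandbox="restricted", escalate_on_failure=True),
    fake_spawn,
    lambda cmd, reason: True,  # the user approves the retry
)
```

Passing `spawn` and `ask_user` as callbacks keeps the escalation logic testable without a real sandbox or an interactive prompt.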
Key Features
- Three-Tier Safety Classification:
  - Auto-approve (safe commands)
  - Ask user (potentially dangerous)
  - Reject (definitely dangerous)
- Sandbox Types:
  - None (no isolation)
  - Restricted shell (limited environment)
  - Full container (Docker/etc.)
- Escalation Logic:
  - Run in sandbox first
  - If sandbox denies → ask user
  - If approved → retry without sandbox
  - Cache approval for session
- Approval Scoping:
  - Once (this command only)
  - Session (remember for this session)
  - Never (always deny)
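As a rough illustration of the three-tier classification, the sketch below sorts a command line into auto-approve, ask-user, or reject by its program name. The `Safety` enum, the `classify` function, and both program lists are hypothetical and deliberately tiny; a real policy would inspect arguments, paths, and shell constructs as well.

```python
import shlex
from enum import Enum


class Safety(Enum):
    AUTO_APPROVE = "auto-approve"
    ASK_USER = "ask-user"
    REJECT = "reject"


# Illustrative lists only (assumptions, not Codex's actual policy).
SAFE_PROGRAMS = {"ls", "cat", "echo", "pwd"}
FORBIDDEN_PROGRAMS = {"mkfs", "shutdown"}


def classify(command_line: str) -> Safety:
    """Classify a command by its program name; anything unknown needs user consent."""
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess what the command means.
        return Safety.REJECT
    if not argv:
        return Safety.REJECT
    program = argv[0]
    if program in FORBIDDEN_PROGRAMS:
        return Safety.REJECT
    if program in SAFE_PROGRAMS:
        return Safety.AUTO_APPROVE
    return Safety.ASK_USER
```

Defaulting unknown programs to `ASK_USER` rather than `AUTO_APPROVE` keeps the classifier conservative: a gap in the lists costs a prompt, not an unreviewed execution.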
💡 Real-World Example from Codex
User: "Install numpy with pip"
1. Safety Assessment: "pip install" → potentially dangerous (network access)
2. Sandbox Decision: Run in restricted sandbox first
3. Execute: sandbox blocks network access → fails
4. Escalation: Ask user "Command failed; retry without sandbox?"
5. User Choice: "Approve for session"
6. Retry: Run without sandbox → succeeds
7. Cache: Remember approval for future pip commands
8. Telemetry: Log decision chain for debugging
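Steps 5 and 7 depend on a session-scoped approval cache. A minimal sketch follows, assuming approvals are keyed by program name (such as "pip") and that a "never" answer is also remembered for the rest of the session; the `Scope` and `ApprovalCache` names here are hypothetical and not taken from Codex.

```python
from enum import Enum
from typing import Dict, Optional


class Scope(Enum):
    ONCE = "once"
    SESSION = "session"
    NEVER = "never"


class ApprovalCache:
    """Remembers session-wide answers, keyed by program name (an assumption)."""

    def __init__(self) -> None:
        self._decisions: Dict[str, bool] = {}

    def record(self, key: str, scope: Scope) -> bool:
        """Store the user's answer and return whether this run may proceed."""
        if scope is Scope.SESSION:
            self._decisions[key] = True
        elif scope is Scope.NEVER:
            self._decisions[key] = False
        # ONCE is deliberately not stored, so the next run asks again.
        return scope is not Scope.NEVER

    def lookup(self, key: str) -> Optional[bool]:
        """Return True/False for a cached answer, or None if the user must be asked."""
        return self._decisions.get(key)


cache = ApprovalCache()
first_run_allowed = cache.record("pip", Scope.SESSION)  # user picks "Approve for session"
second_run_cached = cache.lookup("pip")                  # later pip command: no prompt needed
```

Returning `None` from `lookup` separates "no decision yet" from an explicit denial, so the caller can tell when to prompt.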
Architecture Diagram
Command Request
          │
          ▼
┌──────────────────┐
│ Safety Assessment│
│ - Check dangerous│
│ - Check approved │
│ - Apply policy   │
└─────────┬────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌────────┐ ┌─────────┐
│Auto    │ │Ask User │
│Approve │ │         │
└───┬────┘ └────┬────┘
    │           │
    │      ┌────┴─────┐
    │      │          │
    │      ▼          ▼
    │ ┌─────────┐ ┌────────┐
    │ │Approved │ │Denied  │
    │ └────┬────┘ └────────┘
    │      │
    └──────┼────────────────┐
           │                │
           ▼                ▼
    ┌──────────────┐ ┌─────────────┐
    │Execute in    │ │Execute      │
    │Sandbox       │ │Unsandboxed  │
    └──────┬───────┘ └─────────────┘
           │
     ┌─────┴───────┐
     │             │
     ▼             ▼
┌─────────┐ ┌──────────────┐
│Success  │ │Sandbox Error │
└─────────┘ └──────┬───────┘
                   │
              ┌────┴────┐
              │Ask User │
              │Escalate?│
              └────┬────┘
                   │
              ┌────┴─────┐
              │          │
              ▼          ▼
         ┌─────────┐ ┌────────┐
         │Retry    │ │Fail    │
         │No       │ │        │
         │Sandbox  │ │        │
         └─────────┘ └────────┘
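One way to make the decision tree auditable is to record the path a command takes through it, which also covers the telemetry goal of logging every decision. The sketch below is a hypothetical `trace_decision` function, not taken from Codex, that returns the sequence of stages visited. It follows only the sandboxed branch of the diagram, on the assumption that approved commands run in the sandbox first.

```python
from typing import List


def trace_decision(
    auto_approved: bool,
    user_approves: bool,
    sandbox_ok: bool,
    user_escalates: bool,
) -> List[str]:
    """Return the stages visited for one command, in order (stage names are illustrative)."""
    trail = ["safety_assessment"]
    if auto_approved:
        trail.append("auto_approve")
    else:
        trail.append("ask_user")
        if not user_approves:
            trail.append("denied")
            return trail
        trail.append("approved")
    trail.append("execute_in_sandbox")
    if sandbox_ok:
        trail.append("success")
        return trail
    # The sandbox blocked the command: offer an unsandboxed retry.
    trail.append("sandbox_error")
    trail.append("ask_user_escalate")
    trail.append("retry_no_sandbox" if user_escalates else "fail")
    return trail


# The pip walkthrough: needs approval, sandbox blocks it, user escalates.
pip_trail = trace_decision(
    auto_approved=False, user_approves=True, sandbox_ok=False, user_escalates=True
)
```

Because the function is pure, the same trail can be written to a log, attached to an error message, or asserted on in tests.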
Python Implementation
See the example file:
- pattern_advanced.py: Complete 400-line implementation with all features
Key classes:
- CommandExecutor: Main orchestrator
- SandboxDecision: Execution strategy
- SafetyCheck: Risk assessment
- ApprovalCache: Session-scoped consent
Key Takeaways
- ✅ Multi-Stage Execution: Don't just try once and fail
- ✅ Safety First: Assess risk before execution
- ✅ Smart Escalation: Automatic retry with user approval
- ✅ Session Memory: Cache user decisions
- ✅ Comprehensive Logging: Track all decision points
- ⚠️ Complex State: Much more than simple try/catch
When to Use
- ✅ Production agent systems
- ✅ Commands that might need special permissions
- ✅ Systems with security requirements
- ✅ Multi-user environments
- ❌ Simple scripts or demos
- ❌ Fully trusted environments
⚠️ Common Pitfalls
1. Over-Engineering Simple Cases
2. Ignoring User Experience
3. Poor Error Messages
Further Reading
- Codex Source: codex-rs/core/src/executor/runner.rs
- Sandbox Implementation: codex-rs/core/src/executor/sandbox.rs
- Safety Assessment: codex-rs/core/src/safety.rs
- Seatbelt (macOS): Apple's sandboxing system
- Seccomp (Linux): Linux's system call filtering
Related Patterns
- Pattern 5: Tool Use - Basic function calling
- Pattern 12: Exception Handling - Error recovery strategies
- Pattern 13: Human-in-the-Loop - Approval workflows
- Pattern 18: Guardrails/Safety - Security constraints