
Pattern 22: Sandbox Escalation with Automatic Retry

"Multi-stage command execution with intelligent fallback strategies"

📖 Pattern Overview

Sandbox Escalation is an execution pattern that goes beyond "run the command and handle errors." It walks a decision tree for safe command execution: classify the command, choose an isolation level, run it, and, if the sandbox blocks it, retry without isolation once the user approves.

🎯 Key Concepts

  1. Safety Assessment - Classify commands before execution
  2. Sandbox Selection - Choose appropriate isolation level
  3. Automatic Escalation - Retry without the sandbox when the sandbox blocks a command
  4. Approval Workflows - Smart user consent management
  5. Session Caching - Remember user decisions
  6. Telemetry Tracking - Log all decisions and outcomes

πŸ” How Codex Implements This

Location in Codebase

  • Primary: codex-rs/core/src/executor/runner.rs (lines 76-218)
  • Support: codex-rs/core/src/executor/sandbox.rs (lines 87-160)

Implementation Flow

// Simplified from codex-rs/core/src/executor/runner.rs:77-157
// `approval_policy`, `config`, and `message` are set up in code elided from this excerpt.
pub(crate) async fn run(&self, request: ExecutionRequest) -> Result<ExecToolCallOutput> {
    // Step 1: Assess command safety and pick the initial sandbox
    let sandbox_decision = select_sandbox(
        &request,
        approval_policy,
        self.approval_cache.snapshot(),
        &config,
    ).await?;

    // Step 2: Execute in chosen sandbox
    let first_attempt = self.spawn(
        request.params.clone(),
        sandbox_decision.initial_sandbox,
        &config,
    ).await;

    // Step 3: Handle sandbox failures with escalation
    match first_attempt {
        Ok(output) => Ok(output),
        Err(CodexErr::Sandbox(error)) => {
            if sandbox_decision.escalate_on_failure {
                // Escalate: rerun the same request without the sandbox
                self.retry_without_sandbox(&request, error).await
            } else {
                Err(ExecError::rejection(message))
            }
        }
        // Assumed: non-sandbox errors propagate to the caller. This arm is not
        // in the excerpt; it is added so the match is exhaustive.
        Err(other) => Err(other.into()),
    }
}
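The same three steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the contents of pattern_advanced.py or Codex's API: `SandboxError`, `run_with_escalation`, `fake_spawn`, and the sandbox names "restricted" and "none" are all hypothetical, and the process launcher is a stand-in that only pretends to block network access.

```python
from dataclasses import dataclass
from typing import Callable


class SandboxError(Exception):
    """Raised when the sandbox itself blocks the command (hypothetical)."""


@dataclass
class SandboxDecision:
    initial_sandbox: str       # e.g. "restricted" or "none"
    escalate_on_failure: bool  # may we ask to retry unsandboxed?


def run_with_escalation(
    command: str,
    decision: SandboxDecision,
    spawn: Callable[[str, str], str],
    ask_user: Callable[[str], bool],
) -> str:
    """Run `command`, retrying without a sandbox only after user approval."""
    try:
        # First attempt: the isolation level chosen up front.
        return spawn(command, decision.initial_sandbox)
    except SandboxError as error:
        if not decision.escalate_on_failure:
            raise PermissionError(f"rejected: {error}") from error
        if not ask_user(f"{error}; retry without sandbox?"):
            raise PermissionError("user declined escalation") from error
        # Second attempt: same command, no isolation.
        return spawn(command, "none")


def fake_spawn(command: str, sandbox: str) -> str:
    # Stand-in for a real process launcher: any sandbox blocks pip's network use.
    if sandbox != "none" and command.startswith("pip install"):
        raise SandboxError("network access blocked by sandbox")
    return f"ran {command!r} in sandbox={sandbox}"


decision = SandboxDecision(initial_sandbox="restricted", escalate_on_failure=True)
result = run_with_escalation("pip install numpy", decision, fake_spawn, lambda _: True)
print(result)  # ran 'pip install numpy' in sandbox=none
```

Only `SandboxError` triggers escalation; any other exception propagates unchanged, so an ordinary command failure is never retried with fewer protections.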

Key Features

  1. Three-Tier Safety Classification:
       • Auto-approve (safe commands)
       • Ask user (potentially dangerous)
       • Reject (definitely dangerous)

  2. Sandbox Types:
       • None (no isolation)
       • Restricted shell (limited environment)
       • Full container (Docker/etc.)

  3. Escalation Logic:
       • Run in sandbox first
       • If sandbox denies → ask user
       • If approved → retry without sandbox
       • Cache approval for session

  4. Approval Scoping:
       • Once (this command only)
       • Session (remember for this session)
       • Never (always deny)
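The three-tier classification can be sketched as a small function. This is an illustrative sketch, not Codex's safety logic: the `Safety` enum, the `classify` and `first_sandbox` helpers, and the program lists are hypothetical, and a real classifier would need far more than a program-name lookup.

```python
import shlex
from enum import Enum
from typing import Optional


class Safety(Enum):
    AUTO_APPROVE = "auto-approve"
    ASK_USER = "ask-user"
    REJECT = "reject"


# Illustrative lists only (assumptions, not taken from Codex).
SAFE_PROGRAMS = {"ls", "cat", "echo", "pwd"}
FORBIDDEN_PROGRAMS = {"mkfs", "shutdown"}
SHELL_METACHARACTERS = set(";|&`$<>")


def classify(command: str) -> Safety:
    """Map a command line to one of the three safety tiers."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: we cannot tell what would run, so refuse.
        return Safety.REJECT
    if not argv or argv[0] in FORBIDDEN_PROGRAMS:
        return Safety.REJECT
    if SHELL_METACHARACTERS & set(command):
        # Pipes, chaining, or substitution could hide a second program.
        return Safety.ASK_USER
    if argv[0] in SAFE_PROGRAMS:
        return Safety.AUTO_APPROVE
    return Safety.ASK_USER


def first_sandbox(safety: Safety) -> Optional[str]:
    """Anything that runs at all starts in the sandbox; rejected commands never run."""
    return None if safety is Safety.REJECT else "restricted"


for cmd in ["echo hello", "pip install numpy", "mkfs /dev/sda1"]:
    tier = classify(cmd)
    print(f"{cmd!r}: {tier.value}, first sandbox: {first_sandbox(tier)}")
```

Unknown programs default to the middle tier, so a gap in the lists costs the user a prompt instead of silently running something risky.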

💡 Real-World Example from Codex

User: "Install numpy with pip"

1. Safety Assessment: "pip install" → potentially dangerous (network access)
2. Sandbox Decision: Run in restricted sandbox first
3. Execute: sandbox blocks network access → fails
4. Escalation: Ask user "Command failed; retry without sandbox?"
5. User Choice: "Approve for session"
6. Retry: Run without sandbox → succeeds
7. Cache: Remember approval for future pip commands
8. Telemetry: Log decision chain for debugging
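The telemetry step in this walkthrough amounts to keeping an ordered record of each decision. A minimal sketch follows, assuming a hypothetical `DecisionLog` class and step names chosen for illustration; neither is taken from Codex or from pattern_advanced.py.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DecisionEvent:
    step: str
    detail: str


@dataclass
class DecisionLog:
    """Ordered record of every decision taken for one command."""
    command: str
    events: List[DecisionEvent] = field(default_factory=list)

    def record(self, step: str, detail: str) -> None:
        self.events.append(DecisionEvent(step, detail))

    def to_json(self) -> str:
        # asdict() also converts the nested DecisionEvent objects.
        return json.dumps(asdict(self), indent=2)


# Replay the pip walkthrough, one event per step.
log = DecisionLog("pip install numpy")
log.record("safety", "potentially dangerous (network access)")
log.record("sandbox", "restricted")
log.record("execute", "failed: sandbox blocked network access")
log.record("escalation", "asked user to retry without sandbox")
log.record("approval", "approved for session")
log.record("retry", "succeeded without sandbox")
log.record("cache", "approval stored for future pip commands")
print(log.to_json())
```

Recording the reason alongside each step is what makes the chain useful for debugging: a later reader can see why the command ended up running unsandboxed.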

📊 Architecture Diagram

Command Request
    ↓
┌──────────────────┐
│ Safety Assessment│
│ - Check dangerous│
│ - Check approved │
│ - Apply policy   │
└─────────┬────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌────────┐  ┌─────────┐
│Auto    │  │Ask User │
│Approve │  │         │
└───┬────┘  └────┬────┘
    │            │
    │       ┌────┴────┐
    │       │         │
    │       ▼         ▼
    │   ┌─────────┐ ┌────────┐
    │   │Approved │ │Denied  │
    │   └────┬────┘ └────────┘
    │        │
    └────────┼────────────────┐
             │                │
             ▼                ▼
        ┌──────────────┐ ┌─────────────┐
        │Execute in    │ │Execute      │
        │Sandbox       │ │Unsandboxed  │
        └──────┬───────┘ └─────────────┘
               │
          ┌────┴────┐
          │         │
          ▼         ▼
     ┌─────────┐ ┌──────────────┐
     │Success  │ │Sandbox Error │
     └─────────┘ └──────┬───────┘
                        │
                   ┌────┴────┐
                   │Ask User │
                   │Escalate?│
                   └────┬────┘
                        │
                   ┌────┴────┐
                   │         │
                   ▼         ▼
              ┌─────────┐ ┌────────┐
              │Retry    │ │Fail    │
              │No       │ │        │
              │Sandbox  │ │        │
              └─────────┘ └────────┘

🐍 Python Implementation

See the example file:

  • pattern_advanced.py: Complete 400-line implementation with all features

Key classes:

  • CommandExecutor: Main orchestrator
  • SandboxDecision: Execution strategy
  • SafetyCheck: Risk assessment
  • ApprovalCache: Session-scoped consent

🔑 Key Takeaways

  1. ✅ Multi-Stage Execution: Don't just try once and fail
  2. ✅ Safety First: Assess risk before execution
  3. ✅ Smart Escalation: Automatic retry with user approval
  4. ✅ Session Memory: Cache user decisions
  5. ✅ Comprehensive Logging: Track all decision points
  6. ⚠️ Complex State: Much more than simple try/catch

🚀 When to Use

  • ✅ Production agent systems
  • ✅ Commands that might need special permissions
  • ✅ Systems with security requirements
  • ✅ Multi-user environments
  • ❌ Simple scripts or demos
  • ❌ Fully trusted environments

⚠️ Common Pitfalls

1. Over-Engineering Simple Cases

❌ BAD: Use for "echo hello"
✅ GOOD: Use for "curl external-api.com"

2. Ignoring User Experience

❌ BAD: Ask for approval on every command
✅ GOOD: Smart caching with session scope
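Session-scoped caching can be sketched with a small dictionary-backed class. This is a sketch under assumptions: it shares a name with the ApprovalCache class listed above but is not that implementation, and keying approvals by program name (via the hypothetical `cache_key` helper) is a deliberately coarse choice that treats every "pip" command alike.

```python
import shlex
from enum import Enum
from typing import Dict, Optional


class Scope(Enum):
    ONCE = "once"        # this command only
    SESSION = "session"  # remember for this session
    NEVER = "never"      # always deny


def cache_key(command: str) -> str:
    # Assumption: approvals are keyed by program name; expects a non-empty command.
    return shlex.split(command)[0]


class ApprovalCache:
    def __init__(self) -> None:
        self._decisions: Dict[str, bool] = {}

    def remember(self, command: str, scope: Scope) -> None:
        if scope is Scope.SESSION:
            self._decisions[cache_key(command)] = True
        elif scope is Scope.NEVER:
            self._decisions[cache_key(command)] = False
        # Scope.ONCE stores nothing, so the next run prompts again.

    def lookup(self, command: str) -> Optional[bool]:
        """True or False if already decided this session, None if we must ask."""
        return self._decisions.get(cache_key(command))


cache = ApprovalCache()
cache.remember("pip install numpy", Scope.SESSION)
cache.remember("curl example.com", Scope.ONCE)
cache.remember("sudo reboot", Scope.NEVER)
print(cache.lookup("pip install pandas"))  # True: no second prompt for pip
print(cache.lookup("curl example.com"))    # None: ask again
print(cache.lookup("sudo reboot"))         # False: denied without asking
```

A stricter design could key on the full argument vector, which prompts more often but prevents one approval from covering unrelated invocations of the same program.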

3. Poor Error Messages

❌ BAD: "Command failed"
✅ GOOD: "Network access blocked by sandbox; retry without isolation?"
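One way to get messages like the good example is to build the prompt from a structured denial reason instead of a raw failure. The sketch below assumes a hypothetical `Denial` enum and `escalation_prompt` helper; the categories are illustrative and do not come from Codex.

```python
from enum import Enum


class Denial(Enum):
    """Why the sandbox blocked a command (hypothetical categories)."""
    NETWORK = "Network access"
    FILE_WRITE = "Write outside the workspace"
    UNKNOWN = "The operation"


def escalation_prompt(command: str, denial: Denial) -> str:
    """Say what was blocked, for which command, and what the user can do."""
    return (
        f"{denial.value} blocked by sandbox while running {command!r}; "
        "retry without isolation?"
    )


print(escalation_prompt("pip install numpy", Denial.NETWORK))
# Network access blocked by sandbox while running 'pip install numpy'; retry without isolation?
```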

📚 Further Reading

  • Codex Source: codex-rs/core/src/executor/runner.rs
  • Sandbox Implementation: codex-rs/core/src/executor/sandbox.rs
  • Safety Assessment: codex-rs/core/src/safety.rs
  • Seatbelt (macOS): Apple's sandboxing system
  • Seccomp (Linux): Linux's system call filtering
  • Pattern 5: Tool Use - Basic function calling
  • Pattern 12: Exception Handling - Error recovery strategies
  • Pattern 13: Human-in-the-Loop - Approval workflows
  • Pattern 18: Guardrails/Safety - Security constraints
