# VM Execution System Guide
## Overview
The Terraphim Multi-Agent System integrates secure code execution capabilities using Firecracker MicroVMs. This guide covers the complete architecture for executing code from LLM agents in isolated VM environments with comprehensive safety, history tracking, and session management.
## Architecture Components
### 1. Core Models (`vm_execution/models.rs`)
#### VmExecutionConfig
Configuration for VM-based code execution:
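As a rough sketch of what this configuration might look like, the struct below takes its field names from the `VmExecutionClient` example in section 3. The field types, the derives, the empty `HistoryConfig` placeholder, and the `allows` helper are assumptions made for illustration, not the crate's actual definitions.

```rust
// Hypothetical reconstruction: field names come from the client example in
// section 3; types, derives, and the helper method are assumptions.
#[derive(Debug, Clone, Default)]
pub struct HistoryConfig {
    // The guide does not list these fields, so none are invented here.
}

#[derive(Debug, Clone)]
pub struct VmExecutionConfig {
    pub enabled: bool,
    pub api_base_url: String,
    pub vm_pool_size: u32,
    pub default_vm_type: String,
    pub execution_timeout_ms: u64,
    pub allowed_languages: Vec<String>,
    pub auto_provision: bool,
    pub code_validation: bool,
    pub max_code_length: usize,
    pub history: HistoryConfig,
}

impl VmExecutionConfig {
    /// Returns true if `language` appears in `allowed_languages`.
    /// (Illustrative helper, not part of the documented API.)
    pub fn allows(&self, language: &str) -> bool {
        self.allowed_languages.iter().any(|l| l == language)
    }
}
```

With the values from section 3, `allows("python")` would be true and `allows("bash")` false, because only `python` and `rust` are listed in `allowed_languages`.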
#### HistoryConfig
VM session history and snapshot configuration:
#### Language Support
Built-in language configurations with security restrictions:

Python:
- Extension: `.py`
- Execute: `python3`
- Restrictions: `subprocess`, `os.system`, `eval`, `exec`, `__import__`
- Timeout multiplier: 1.0x

JavaScript/Node.js:
- Extension: `.js`
- Execute: `node`
- Restrictions: `child_process`, `eval`, `Function(`, `require('fs')`
- Timeout multiplier: 1.0x

Bash:
- Extension: `.sh`
- Execute: `bash`
- Restrictions: `rm -rf`, `dd`, `mkfs`, `:(){ :|:& };:`, `chmod 777`
- Timeout multiplier: 1.0x

Rust:
- Extension: `.rs`
- Execute: `rustc` (compile then run)
- Restrictions: `unsafe`, `std::process`, `std::fs::remove`
- Timeout multiplier: 3.0x (accounts for compilation time)
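The per-language settings above can be pictured as a small lookup table. The following is a minimal sketch, assuming plain substring matching for restrictions and a simple multiplication for timeouts; `LanguageConfig`, `LANGUAGES`, `find_language`, `first_violation`, and `effective_timeout_ms` are hypothetical names, and the real implementation may match patterns differently.

```rust
/// Per-language settings, mirroring the list above (hypothetical type).
pub struct LanguageConfig {
    pub name: &'static str,
    pub extension: &'static str,
    pub restrictions: &'static [&'static str],
    pub timeout_multiplier: f64,
}

pub static LANGUAGES: &[LanguageConfig] = &[
    LanguageConfig {
        name: "python",
        extension: "py",
        restrictions: &["subprocess", "os.system", "eval", "exec", "__import__"],
        timeout_multiplier: 1.0,
    },
    LanguageConfig {
        name: "javascript",
        extension: "js",
        restrictions: &["child_process", "eval", "Function(", "require('fs')"],
        timeout_multiplier: 1.0,
    },
    LanguageConfig {
        name: "bash",
        extension: "sh",
        restrictions: &["rm -rf", "dd", "mkfs", ":(){ :|:& };:", "chmod 777"],
        timeout_multiplier: 1.0,
    },
    LanguageConfig {
        name: "rust",
        extension: "rs",
        restrictions: &["unsafe", "std::process", "std::fs::remove"],
        timeout_multiplier: 3.0,
    },
];

/// Looks up a language by name.
pub fn find_language(name: &str) -> Option<&'static LanguageConfig> {
    LANGUAGES.iter().find(|l| l.name == name)
}

/// Returns the first restricted pattern contained in `code`, if any.
/// Assumption: naive substring matching, which can produce false positives.
pub fn first_violation(lang: &LanguageConfig, code: &str) -> Option<&'static str> {
    lang.restrictions.iter().copied().find(|p| code.contains(*p))
}

/// Scales a base timeout by the language's multiplier.
pub fn effective_timeout_ms(lang: &LanguageConfig, base_ms: u64) -> u64 {
    (base_ms as f64 * lang.timeout_multiplier).round() as u64
}
```

Under these assumptions a 30000 ms base timeout becomes 90000 ms for Rust. Substring matching would also flag harmless text such as `dd` inside the word `add`, which is one way a security check can end up blocking legitimate code.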
### 2. Code Extraction (`vm_execution/code_extractor.rs`)
#### CodeBlockExtractor
Extracts executable code blocks from LLM responses:
```rust
let extractor = new;
// Extract all code blocks with confidence scores
let blocks = extractor.extract_code_blocks;
for block in blocks
```
#### Pattern Detection
Identifies code blocks in markdown format:
```python
def factorial(n):
    return 1 if n <= 1 else n * factorial(n-1)

print(factorial(5))
```
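To illustrate the pattern detection described above, here is a minimal sketch of a line-based scanner for fenced blocks. It assumes only language-tagged fences count as executable and it omits confidence scoring entirely; `CodeBlock` and this free-standing `extract_code_blocks` function are hypothetical and are not the `CodeBlockExtractor` API.

```rust
/// One fenced block found in an LLM response (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct CodeBlock {
    pub language: String,
    pub code: String,
}

/// Collects fenced blocks that carry a language tag.
/// Untagged fences and unterminated blocks are ignored.
pub fn extract_code_blocks(text: &str) -> Vec<CodeBlock> {
    let mut blocks = Vec::new();
    // Language tag and collected lines of the block currently being read.
    let mut current: Option<(String, Vec<&str>)> = None;
    for line in text.lines() {
        let trimmed = line.trim_start();
        match current.take() {
            // Inside a block: a bare fence closes it, anything else is code.
            Some((language, mut lines)) => {
                if trimmed.trim_end() == "```" {
                    blocks.push(CodeBlock { language, code: lines.join("\n") });
                } else {
                    lines.push(line);
                    current = Some((language, lines));
                }
            }
            // Outside a block: a fence followed by a tag opens one.
            None => {
                if let Some(tag) = trimmed.strip_prefix("```") {
                    let tag = tag.trim();
                    if !tag.is_empty() {
                        current = Some((tag.to_string(), Vec::new()));
                    }
                }
            }
        }
    }
    blocks
}
```

Applied to a response containing the Python block above, this would yield a single `CodeBlock` whose `language` is `python`. A production extractor would also need to handle nested or longer fences, which this sketch does not attempt.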
### 3. VM Execution Client (`vm_execution/client.rs`)
#### VmExecutionClient
HTTP client for fcctl-web API integration:
```rust
let config = VmExecutionConfig {
    enabled: true,
    api_base_url: "http://localhost:8080".to_string(),
    vm_pool_size: 2,
    default_vm_type: "ubuntu".to_string(),
    execution_timeout_ms: 30000,
    allowed_languages: vec!["python".into(), "rust".into()],
    auto_provision: true,
    code_validation: true,
    max_code_length: 10000,
    history: HistoryConfig::default(),
};
let client = VmExecutionClient::new(&config);

// Execute Python code
let response = client.execute_python(
    "agent-001",
    "print('Hello from VM!')",
    None
).await?;
println!("Exit Code: {}", response.exit_code);
println!("Output: {}", response.stdout);
```
### 4. DirectSessionAdapter (`vm_execution/session_adapter.rs`)
Low-overhead session management using HTTP API (avoids fcctl-repl dependency conflicts):
```rust
let adapter = new;
// Create or reuse session
let session_id = adapter.get_or_create_session.await?;
// Execute command
let = adapter.execute_command_direct.await?;
// Create snapshot
let snapshot_id = adapter.create_snapshot_direct.await?;
// Rollback if needed
adapter.rollback_direct.await?;
// Close when done
adapter.close_session.await?;
```
### 5. Hook System (`vm_execution/hooks.rs`)
Pre/post processing hooks for tool and LLM interactions, inspired by the Claude Agent SDK:
#### Hook Trait
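A minimal sketch of how a hook trait and a manager of this kind might fit together. It is synchronous for brevity, whereas the real hooks appear to be awaited. The names `PreToolContext`, `DangerousPatternHook`, `add_hook`, and `run_pre_tool` appear elsewhere in this guide, but their fields and signatures here, along with `Hook`, `HookDecision`, and `HookManager`, are assumptions.

```rust
/// What a hook decides about a pending execution (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum HookDecision {
    Allow,
    Block(String),
}

/// Data handed to hooks before code runs (fields are assumptions).
pub struct PreToolContext<'a> {
    pub language: &'a str,
    pub code: &'a str,
}

/// Simplified, synchronous stand-in for the hook trait.
pub trait Hook {
    fn name(&self) -> &str;
    fn pre_tool(&self, ctx: &PreToolContext) -> HookDecision;
}

/// Blocks code containing any of a fixed set of substrings.
pub struct DangerousPatternHook {
    patterns: Vec<&'static str>,
}

impl DangerousPatternHook {
    pub fn new() -> Self {
        Self { patterns: vec!["rm -rf", "eval("] }
    }
}

impl Hook for DangerousPatternHook {
    fn name(&self) -> &str {
        "dangerous_pattern"
    }

    fn pre_tool(&self, ctx: &PreToolContext) -> HookDecision {
        match self.patterns.iter().find(|p| ctx.code.contains(**p)) {
            Some(p) => HookDecision::Block(format!("matched {p:?}")),
            None => HookDecision::Allow,
        }
    }
}

#[derive(Default)]
pub struct HookManager {
    hooks: Vec<Box<dyn Hook>>,
}

impl HookManager {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add_hook(&mut self, hook: Box<dyn Hook>) {
        self.hooks.push(hook);
    }

    /// Runs hooks in insertion order; the first `Block` short-circuits.
    pub fn run_pre_tool(&self, ctx: &PreToolContext) -> HookDecision {
        for hook in &self.hooks {
            if let HookDecision::Block(reason) = hook.pre_tool(ctx) {
                return HookDecision::Block(format!("{}: {}", hook.name(), reason));
            }
        }
        HookDecision::Allow
    }
}
```

Stopping at the first `Block` keeps the decision easy to audit, since the returned reason names the hook that rejected the code.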
#### Built-in Hooks
DangerousPatternHook - Security validation:
```rust
let hook = new;
let manager = new;
manager.add_hook;
// Blocks dangerous patterns like "rm -rf /", "eval(...)", etc.
```
SyntaxValidationHook - Code validation:
```rust
let hook = new;
// Validates language support, code length limits, basic syntax
```
ExecutionLoggerHook - Observability:
```rust
let hook = new;
// Logs all executions for debugging and audit trails
```
DependencyInjectorHook - Auto-import injection:
```rust
let hook = new;
// Automatically adds required imports for common patterns
```
OutputSanitizerHook - Sensitive data filtering:
```rust
let hook = new;
// Filters API keys, passwords, secrets from output
```
#### Hook Manager
Orchestrates multiple hooks with decision handling:
```rust
let mut manager = new;
manager.add_hook;
manager.add_hook;
let context = PreToolContext ;
match manager.run_pre_tool.await?
```
### 6. FcctlBridge (`vm_execution/fcctl_bridge.rs`)
Integration layer between LLM agents and fcctl infrastructure:
```rust
let config = HistoryConfig ;
let bridge = new;
// Track execution with automatic snapshots
bridge.track_execution.await?;
// Query history
let history = bridge.query_history.await?;
// Auto-rollback on failure
bridge.auto_rollback_on_failure.await?;
```
## Integration with TerraphimAgent
### Configuration
Add VM execution to agent role configuration:
### Agent Usage
```rust
use ;
let role = from_file?;
let agent = new.await?;
let input = CommandInput "#.to_string(),
    metadata: None,
};
let result = agent.process_command(input).await?;
println!("Execution result: {}", result.response);
```
## Testing
### Test Organization
#### Unit Tests
No external dependencies required:
```bash
./scripts/test-vm-features.sh unit
```
Tests:
- Hook system functionality
- Session adapter logic
- Code extraction and validation
- Configuration parsing
- Basic Rust execution tests
#### Integration Tests
Requires fcctl-web running at localhost:8080:
```bash
# Start fcctl-web
# Run integration tests
```
Tests:
- DirectSessionAdapter with real HTTP API
- FcctlBridge integration modes (direct vs HTTP)
- Hook integration with VM client
- Rust compilation and execution
- Session lifecycle and snapshots
#### End-to-End Tests
Requires full stack (fcctl-web + agent system):
Tests:
- Complete workflows from user input to VM execution
- Multi-language execution (Python, JavaScript, Bash, Rust)
- Security blocking dangerous code
- Multi-turn conversations with VM state persistence
- Error recovery with history
- Performance tests (rapid execution, concurrent sessions)
#### Language-Specific Tests
Rust compilation and execution suite:
### Test Automation Script
```bash
# Unit tests only (fast, no server required)
# Integration tests (requires fcctl-web)
# E2E tests (requires full stack)
# Rust-specific suite
# All tests
# Help
```
## Security Considerations
### Code Validation
- Automatic pattern detection for dangerous operations
- Language-specific security restrictions
- Code length limits to prevent resource exhaustion
- Syntax validation before execution
### Execution Isolation
- Each agent gets dedicated VM instances
- Network isolation between VMs
- Resource limits (CPU, memory, disk)
- Timeout enforcement for runaway code
### History and Rollback
- Snapshot before dangerous operations
- Automatic rollback on failures (optional)
- Command history for audit trails
- State recovery mechanisms
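The snapshot-then-rollback flow listed above can be modelled in memory. This is a minimal sketch, assuming a snapshot is taken before every command and that rollback targets that same snapshot; `SessionHistory`, `HistoryEntry`, and this synchronous `track_execution` signature are hypothetical, and no real VM or fcctl call is made.

```rust
/// One audited command (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct HistoryEntry {
    pub command: String,
    pub exit_code: i32,
    /// Snapshot taken immediately before the command ran.
    pub snapshot_id: u64,
}

#[derive(Default)]
pub struct SessionHistory {
    entries: Vec<HistoryEntry>,
    next_snapshot_id: u64,
}

impl SessionHistory {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes a snapshot id, runs `run` to obtain an exit code, and records
    /// the result. Returns `Some(snapshot_id)` when the command failed and
    /// `auto_rollback` is enabled, meaning the caller should roll back to it.
    pub fn track_execution(
        &mut self,
        command: &str,
        auto_rollback: bool,
        run: impl FnOnce() -> i32,
    ) -> Option<u64> {
        self.next_snapshot_id += 1;
        let snapshot_id = self.next_snapshot_id;
        let exit_code = run();
        self.entries.push(HistoryEntry {
            command: command.to_string(),
            exit_code,
            snapshot_id,
        });
        if exit_code != 0 && auto_rollback {
            Some(snapshot_id)
        } else {
            None
        }
    }

    /// Full command history, oldest first, for audit trails.
    pub fn entries(&self) -> &[HistoryEntry] {
        &self.entries
    }

    /// Entries whose exit code was non-zero.
    pub fn failures(&self) -> Vec<&HistoryEntry> {
        self.entries.iter().filter(|e| e.exit_code != 0).collect()
    }
}
```

Every command is recorded whether or not it triggers a rollback, so the audit trail stays complete even when auto-rollback is disabled.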
### Hook System Security
- Pre-execution validation hooks
- Output sanitization hooks
- User approval for sensitive operations
- Custom security policies per agent
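As an illustration of output sanitization, the sketch below redacts the value of any `key=value` token whose key looks sensitive. The key list and the `sanitize_output` function are assumptions, not the behaviour of `OutputSanitizerHook`, and this approach would miss secrets written in other forms such as JSON fields or bare tokens.

```rust
/// Substrings that mark a key as sensitive (illustrative list).
const SENSITIVE_KEYS: &[&str] = &["api_key", "password", "secret", "token"];

/// Redacts the value of a single `key=value` token when the key looks sensitive.
fn sanitize_token(token: &str) -> String {
    if let Some((key, value)) = token.split_once('=') {
        let key_lower = key.to_lowercase();
        if !value.is_empty() && SENSITIVE_KEYS.iter().any(|k| key_lower.contains(*k)) {
            return format!("{key}=[REDACTED]");
        }
    }
    token.to_string()
}

/// Sanitizes every space-separated token, preserving line breaks and spacing.
pub fn sanitize_output(output: &str) -> String {
    output
        .split('\n')
        .map(|line| {
            line.split(' ')
                .map(sanitize_token)
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Splitting on single spaces and newlines and then rejoining with the same separators leaves the layout of the output untouched apart from the redacted values.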
## Performance Optimization
### VM Pool Management
- Pre-warmed VM instances for fast execution
- Pool size per agent for concurrent operations
- Auto-scaling based on demand
- Health checks and automatic recovery
### Session Reuse
- DirectSessionAdapter maintains persistent sessions
- Avoids VM creation overhead for sequential commands
- State preservation across command executions
- Efficient snapshot and rollback operations
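The get-or-create behaviour behind session reuse can be sketched with a simple map. `SessionRegistry` and its methods are hypothetical stand-ins; the real `DirectSessionAdapter` talks to fcctl-web over HTTP and is asynchronous, which this in-memory model deliberately leaves out.

```rust
use std::collections::HashMap;

/// Maps agent ids to live session ids (hypothetical type).
#[derive(Default)]
pub struct SessionRegistry {
    sessions: HashMap<String, String>,
    created: u32,
}

impl SessionRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the existing session for `agent_id`, creating one on first use.
    pub fn get_or_create_session(&mut self, agent_id: &str) -> String {
        if let Some(id) = self.sessions.get(agent_id) {
            return id.clone();
        }
        // Stand-in for the expensive step: provisioning a VM-backed session.
        self.created += 1;
        let id = format!("session-{}", self.created);
        self.sessions.insert(agent_id.to_string(), id.clone());
        id
    }

    /// Forgets the session; returns false if none was open.
    pub fn close_session(&mut self, agent_id: &str) -> bool {
        self.sessions.remove(agent_id).is_some()
    }

    /// Number of sessions actually created, useful for checking reuse.
    pub fn created_count(&self) -> u32 {
        self.created
    }
}
```

Two consecutive calls for the same agent return the same id and leave `created_count()` at 1, which is the overhead saving that sequential commands benefit from.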
### Language-Specific Optimizations
- Rust: 3x timeout multiplier for compilation
- Python/JavaScript: 1x standard timeouts
- Bash: Fast execution with minimal overhead
- Caching for compiled languages
## Advanced Features
### Custom Hook Implementation
```rust
use *;
```
### WebSocket Real-Time Updates
The fcctl-web API provides WebSocket support for streaming execution output:
```
// Connect to WebSocket for real-time updates
ws://localhost:8080/ws
// Send execution command
// Receive streaming output
```
### Multi-Language Workflows
Execute multiple languages in sequence within the same session:
```rust
let session_id = adapter.get_or_create_session.await?;
// Python data processing
let = adapter.execute_command_direct.await?;
// Bash file manipulation
let = adapter.execute_command_direct.await?;
// Rust compilation and execution
let = adapter.execute_command_direct.await?;
```
## Troubleshooting
### Common Issues
"SessionNotFound" errors:
- Ensure fcctl-web is running on correct port
- Check session hasn't timed out
- Verify session_id is correct
Compilation failures for Rust:
- Increase execution timeout (3x standard)
- Check Rust toolchain installed in VM
- Verify code syntax before execution
WebSocket disconnections:
- Use correct protocol (`ws://` not `http://`)
- Implement reconnection logic
- Check firewall/proxy settings
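For the reconnection logic mentioned above, one common approach is capped exponential backoff. The sketch below only computes the delay schedule; `reconnect_delay` is a hypothetical helper, and the base and cap values in the usage note are illustrative rather than recommended settings for fcctl-web.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based):
/// `base * 2^attempt`, never exceeding `max`.
pub fn reconnect_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 500 ms base and a 10 s cap, the first retry waits 500 ms, the fourth waits 4 s, and later attempts stay at 10 s. A client would sleep for this duration between connection attempts and reset the attempt counter once a connection succeeds.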
Security hook blocking legitimate code:
- Review hook patterns
- Add exceptions for known-safe patterns
- Use custom hooks for specific requirements
### Debug Logging
Enable debug output by setting `RUST_LOG=debug` in the environment.
### Health Checks
Verify fcctl-web availability:
## Production Deployment
### Infrastructure Requirements
- fcctl-web service running and accessible
- Persistent storage for VM sessions and snapshots
- Network isolation for security
- Resource monitoring and alerts
### Configuration Best Practices
- Enable history and snapshots for critical operations
- Set appropriate timeout values per language
- Configure auto-rollback for production safety
- Use direct integration mode for lower overhead
- Enable all built-in security hooks
- Set reasonable pool sizes based on workload
### Monitoring
- Track execution success/failure rates
- Monitor VM resource usage
- Alert on timeout violations
- Audit history entries for compliance
- Track hook block/allow decisions
## Examples
### Complete Example: Fibonacci with Error Handling
```rust
use ;
async "#.to_string(),
    metadata: None,
};
match agent.process_command(input).await {
    Ok(result) if result.success => {
        println!("✓ Execution successful!");
        println!("Output: {}", result.response);
    }
    Ok(result) => {
        eprintln!("✗ Execution failed: {}", result.response);
    }
    Err(e) => {
        eprintln!("✗ Error: {}", e);
    }
}
Ok(())
}
```
## Further Reading
- [fcctl-web API Documentation](../scratchpad/firecracker-rust/README.md)
- [Firecracker MicroVM Documentation](https://firecracker-microvm.github.io/)
- [Claude Agent SDK Python](https://github.com/anthropics/claude-agent-sdk-python)
- [Test Coverage Report](./VM_EXECUTION_TEST_PLAN.md)