Implementation Plan: terraphim_orchestrator -- AI Dark Factory
Status: Draft
Research Doc: .docs/research-dark-factory-orchestration.md
Author: Terraphim AI / Phase 2 Disciplined Design
Date: 2026-03-06
Estimated Effort: 3-4 days
Overview
Summary
New library crate terraphim_orchestrator that wires existing spawner, router, supervisor, messaging, and pool crates into a reconciliation loop. Adds time-based scheduling, Nightwatch drift detection, nightly compound review, and shallow context handoff.
Approach
Kubernetes-style reconciliation loop: the desired agent fleet state is declared in config, and the orchestrator continuously reconciles actual state to match. Uses tokio::select! to multiplex schedule triggers, drift alerts, agent messages, and compound review events.
Scope
In Scope (Top 5):
- AgentOrchestrator reconciliation loop
- TimeScheduler with cron expressions
- NightwatchMonitor with drift metrics and correction levels
- CompoundReviewWorkflow for nightly autonomous improvement
- OrchestratorConfig with TOML-based agent fleet definition
Out of Scope:
- Meta-Learning Agent (Phase 2)
- Deep context handoff with full session state (Phase 2)
- A/B test framework (Phase 2)
- UI dashboard (Phase 2)
- Multi-project coordination (Phase 3)
Avoid At All Cost (5/25 rule):
- Custom process IPC protocol (use existing stdout/stderr capture)
- Custom serialization format (use serde_json)
- Agent-to-agent direct communication channels (use existing MessageRouter)
- Plugin/extension system for custom agents (config-driven is enough)
- Distributed consensus / leader election (single server)
- Custom logging framework (use existing tracing)
- WebSocket protocol for agent communication (stdout capture works)
- Custom cron parser (use the cron crate)
- Agent sandboxing beyond existing rlimits (use Firecracker for that)
- Real-time metrics aggregation service (tracing spans are sufficient)
Architecture
Component Diagram
OrchestratorConfig (TOML)
|
v
AgentOrchestrator
|-- TimeScheduler -------> cron triggers
|-- NightwatchMonitor ----> drift alerts
|-- CompoundReview -------> nightly events
|
|-- AgentSpawner (existing) --> OS processes
|-- RoutingEngine (existing) -> keyword dispatch
|-- AgentSupervisor (existing) -> fault tolerance
|-- PoolManager (existing) ---> warm agents
|-- OutputCapture (existing) -> stdout/stderr
Data Flow
[Cron tick / Event / Message]
-> AgentOrchestrator::run()
-> tokio::select! {
scheduler.next() -> spawn_or_shutdown(agent_def)
nightwatch.next() -> apply_correction(agent_id, level)
message_rx.recv() -> route_or_handle(msg)
compound_trigger() -> run_compound_review()
}
-> AgentSpawner::spawn(provider, task)
-> HealthChecker (30s)
-> OutputCapture (lines -> NightwatchMonitor)
-> AgentSupervisor::handle_agent_exit(id, reason)
Key Design Decisions
| Decision | Rationale | Alternatives Rejected |
|----------|-----------|----------------------|
| Library crate, not binary | Composable, testable, can embed in server or standalone | Standalone binary adds IPC overhead |
| TOML config (not JSON) | Consistent with workspace settings.toml pattern | JSON lacks comments, YAML too complex |
| cron crate for scheduling | Battle-tested, standard cron syntax | Manual parsing is error-prone |
| Reconciliation loop pattern | Declarative (desired vs actual), self-healing | Imperative step sequences are fragile |
| Stdout-based drift detection | Already captured by OutputCapture, zero new I/O | LLM-based analysis too expensive for continuous monitoring |
Eliminated Options (Essentialism)
| Option Rejected | Why Rejected | Risk of Including |
|-----------------|--------------|-------------------|
| Agent-to-agent gRPC | Agents are CLI processes, not gRPC servers | Would require modifying all CLI tools |
| Database for agent state | In-memory + tracing is sufficient for single server | Adds dep, schema maintenance, migration |
| Custom health protocol | Process alive + stdout patterns covers 95% of cases | Over-engineering for Phase 1 |
| ContextHandoff as separate crate | Too small; 1 struct + serialize/deserialize | Crate proliferation |
| Hot config reload | Start/stop orchestrator is fast enough | Adds complexity to reconciliation loop |
Simplicity Check
What if this could be easy?
The orchestrator is a loop { tokio::select! { ... } } that reacts to four event sources. Each handler calls one or two existing crate methods. The entire crate is ~950 lines including tests, matching the file-change estimates below. No new protocols, no new serialization, no new I/O -- just glue between existing production-ready crates.
Senior Engineer Test: A senior engineer would say "this is just a controller loop with cron, health checks, and a review script. That's the right level of complexity."
Nothing Speculative Checklist:
- [x] No features the user didn't request
- [x] No abstractions "in case we need them later"
- [x] No flexibility "just in case"
- [x] No error handling for scenarios that cannot occur
- [x] No premature optimization
File Changes
New Files
| File | Purpose | Est. Lines |
|------|---------|------------|
| crates/terraphim_orchestrator/Cargo.toml | Crate manifest with workspace deps | 30 |
| crates/terraphim_orchestrator/src/lib.rs | Public API: AgentOrchestrator, re-exports | 80 |
| crates/terraphim_orchestrator/src/config.rs | OrchestratorConfig, AgentDefinition, TOML parsing | 120 |
| crates/terraphim_orchestrator/src/scheduler.rs | TimeScheduler, cron evaluation, event channel | 100 |
| crates/terraphim_orchestrator/src/nightwatch.rs | NightwatchMonitor, DriftMetrics, CorrectionLevel | 150 |
| crates/terraphim_orchestrator/src/compound.rs | CompoundReviewWorkflow, git scan, PR creation | 120 |
| crates/terraphim_orchestrator/src/handoff.rs | ContextHandoff, shallow serialize/deserialize | 60 |
| crates/terraphim_orchestrator/src/error.rs | OrchestratorError enum | 30 |
| crates/terraphim_orchestrator/tests/orchestrator_tests.rs | Integration tests for reconciliation loop | 120 |
| crates/terraphim_orchestrator/tests/nightwatch_tests.rs | Drift calculation and correction tests | 80 |
| crates/terraphim_orchestrator/tests/scheduler_tests.rs | Cron scheduling tests | 60 |
Total new code: ~950 lines (including tests)
Modified Files
| File | Changes |
|------|---------|
| Cargo.toml (workspace) | Add "crates/terraphim_orchestrator" to members |
Deleted Files
None.
API Design
Public Types
// config.rs
/// Top-level orchestrator configuration (parsed from TOML)
/// Definition of a single agent in the fleet
/// Agent layer in the dark factory hierarchy
/// Nightwatch thresholds
/// Compound review settings
// nightwatch.rs
/// Behavioral drift metrics for a single agent
/// Drift score combining all metrics into a single 0.0-1.0 value
/// Correction level based on drift severity
/// Alert emitted by NightwatchMonitor when drift exceeds threshold
/// Action the orchestrator should take in response to drift
// scheduler.rs
/// Schedule event indicating an agent should be spawned or stopped
// handoff.rs
/// Tracks API rate limits per agent per provider
/// Sliding window for rate limit tracking
/// Shallow context transferred between agents
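The doc comments above describe the public types without their signatures. A hedged sketch of the config shapes follows; field names are inferred from the example configuration later in this plan and are assumptions, not confirmed by the source. The real crate would derive serde::Deserialize and parse with the toml crate.

```rust
// Shapes only -- derives and attributes omitted. Field names are
// inferred from the example orchestrator.toml, not confirmed.
pub struct OrchestratorConfig {
    pub working_dir: String,
    pub nightwatch: NightwatchConfig,
    pub compound_review: CompoundReviewConfig,
    pub agents: Vec<AgentDefinition>,
}

pub struct NightwatchConfig {
    pub eval_interval_secs: u64,
    pub minor_threshold: f64,
    pub moderate_threshold: f64,
    pub severe_threshold: f64,
    pub critical_threshold: f64,
}

pub struct CompoundReviewConfig {
    pub schedule: String, // cron expression, e.g. "0 2 * * *"
    pub max_duration_secs: u64,
    pub repo_path: String,
    pub create_prs: bool,
}

pub struct AgentDefinition {
    pub name: String,
    pub layer: String, // "Safety" | "Core" | "Growth"
    pub cli_tool: String,
    pub task: String,
    pub schedule: Option<String>, // None => always-on (Safety layer)
    pub capabilities: Vec<String>,
    pub max_memory_bytes: Option<u64>,
}
```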
Public Functions
// lib.rs
/// The main orchestrator that runs the dark factory
/// Status of a single agent in the fleet
// nightwatch.rs
/// Monitors agent behavior and detects drift
// scheduler.rs
/// Cron-based scheduler for agent lifecycle events
// compound.rs
/// Result of a compound review cycle
/// Nightly compound review workflow
Error Types
// error.rs
Test Strategy
Unit Tests
| Test | Location | Purpose |
|------|----------|---------|
| test_config_parse_minimal | config.rs | Parse minimal valid TOML config |
| test_config_parse_full | config.rs | Parse config with all agents and options |
| test_config_defaults | config.rs | Default values for optional fields |
| test_drift_metrics_zero | nightwatch.rs | Zero events = Normal drift |
| test_drift_metrics_minor | nightwatch.rs | 10-20% error rate = Minor |
| test_drift_metrics_moderate | nightwatch.rs | 20-40% error rate = Moderate |
| test_drift_metrics_severe | nightwatch.rs | 40-70% error rate = Severe |
| test_drift_metrics_critical | nightwatch.rs | >70% error rate = Critical |
| test_drift_reset | nightwatch.rs | Reset clears accumulated metrics |
| test_correction_level_ordering | nightwatch.rs | Normal < Minor < Moderate < Severe < Critical |
| test_schedule_parse_cron | scheduler.rs | Valid cron expression parses |
| test_schedule_invalid_cron | scheduler.rs | Invalid cron returns error |
| test_schedule_safety_always | scheduler.rs | Safety agents have no schedule (always on) |
| test_handoff_roundtrip | handoff.rs | Serialize -> deserialize preserves context |
| test_compound_review_dry_run | compound.rs | Dry run produces findings but no PR |
Integration Tests
| Test | Location | Purpose |
|------|----------|---------|
| test_orchestrator_spawns_safety_agents | orchestrator_tests.rs | Safety agents start on run() |
| test_orchestrator_shutdown_cleans_up | orchestrator_tests.rs | All agents stopped on shutdown |
| test_orchestrator_handles_drift_alert | orchestrator_tests.rs | Drift -> correction action applied |
| test_nightwatch_accumulates_from_output | nightwatch_tests.rs | OutputEvents feed into metrics |
| test_scheduler_fires_at_cron_time | scheduler_tests.rs | Cron trigger emits ScheduleEvent |
Tests NOT Needed (Essentialism)
- End-to-end tests requiring actual CLI tools (covered by existing spawner tests)
- Performance benchmarks (not needed for Phase 1 controller loop)
- Property/fuzzing tests (input space is small and well-defined)
Implementation Steps
Step 1: Crate Scaffold + Config
Files: Cargo.toml, src/lib.rs, src/config.rs, src/error.rs
Description: Create crate, define OrchestratorConfig with TOML parsing, define error types
Tests: test_config_parse_minimal, test_config_parse_full, test_config_defaults
Dependencies: None
Estimated: 3 hours
Key code:
// Cargo.toml deps
terraphim_spawner =
terraphim_router =
terraphim_types =
tokio =
serde =
toml = "0.8"
chrono =
thiserror =
tracing = "0.1"
cron = "0.13"
Step 2: NightwatchMonitor
Files: src/nightwatch.rs
Description: Drift metrics accumulation from OutputEvents, drift score calculation, alert emission via mpsc channel
Tests: All test_drift_* tests, test_correction_level_ordering
Dependencies: Step 1 (error types)
Estimated: 4 hours
Key algorithm:
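A minimal std-only sketch of the drift scoring the test table implies, using the thresholds from the example configuration. The real DriftMetrics may fold in more signals than error rate (latency, silence, repeated output); this shows only the error-rate path.

```rust
// Simplified: error rate only. The plan's DriftMetrics may combine
// additional signals.
pub struct DriftMetrics {
    pub errors: u32,
    pub total: u32,
}

#[derive(Debug, PartialEq, PartialOrd)]
pub enum CorrectionLevel {
    Normal,
    Minor,
    Moderate,
    Severe,
    Critical,
}

/// Score in 0.0..=1.0, as the API section requires.
pub fn drift_score(m: &DriftMetrics) -> f64 {
    if m.total == 0 {
        0.0
    } else {
        f64::from(m.errors) / f64::from(m.total)
    }
}

/// Thresholds mirror the example config: 0.10 / 0.20 / 0.40 / 0.70.
pub fn correction_level(score: f64) -> CorrectionLevel {
    match score {
        s if s >= 0.70 => CorrectionLevel::Critical,
        s if s >= 0.40 => CorrectionLevel::Severe,
        s if s >= 0.20 => CorrectionLevel::Moderate,
        s if s >= 0.10 => CorrectionLevel::Minor,
        _ => CorrectionLevel::Normal,
    }
}
```

Deriving PartialOrd on the enum yields declaration order, which is exactly the Normal < Minor < Moderate < Severe < Critical property that test_correction_level_ordering asserts.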
Step 3: TimeScheduler
Files: src/scheduler.rs
Description: Parse cron expressions from AgentDefinitions, background task that evaluates schedules and emits ScheduleEvents
Tests: test_schedule_parse_cron, test_schedule_invalid_cron, test_schedule_safety_always
Dependencies: Step 1 (config types)
Estimated: 3 hours
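The schedule evaluation can be reduced to a pure selection step: given each agent's next fire instant (which the real code would get from the cron crate's Schedule::upcoming), pick what is due and how long to sleep. The sketch below is std-only with epoch seconds; the function name is illustrative. One caveat worth verifying in Step 3: the cron crate parses seconds-first (6/7-field) expressions, so five-field strings like "0 2 * * *" may need a leading seconds field or a translation layer.

```rust
use std::time::Duration;

// Next-fire instants as plain epoch seconds so the selection logic
// stands alone; the real scheduler derives them from the cron crate.
pub fn due_and_sleep(
    next_fires: &[(String, u64)], // (agent name, next fire time)
    now: u64,
) -> (Vec<String>, Option<Duration>) {
    let due: Vec<String> = next_fires
        .iter()
        .filter(|(_, t)| *t <= now)
        .map(|(name, _)| name.clone())
        .collect();
    // Sleep until the earliest future fire; None when nothing is pending.
    let sleep = next_fires
        .iter()
        .filter(|(_, t)| *t > now)
        .map(|(_, t)| Duration::from_secs(t - now))
        .min();
    (due, sleep)
}
```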
Step 4: CompoundReviewWorkflow
Files: src/compound.rs
Description: Git log scan, finding prioritization, task routing to agent, PR creation via gh CLI
Tests: test_compound_review_dry_run
Dependencies: Step 1 (config, error types)
Estimated: 3 hours
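Finding prioritization is left open in this step. One plausible std-only ranking over git-log findings is sketched below; the keywords and their ordering are assumptions, not part of the plan.

```rust
/// Rank findings from a git-log scan: security issues first, then
/// FIXMEs, then TODOs. Keywords are illustrative assumptions.
pub fn prioritize(mut findings: Vec<String>) -> Vec<String> {
    findings.sort_by_key(|f| {
        let f = f.to_lowercase();
        if f.contains("cve") || f.contains("security") {
            0
        } else if f.contains("fixme") {
            1
        } else if f.contains("todo") {
            2
        } else {
            3
        }
    });
    findings
}
```

Because sort_by_key is stable, findings of equal severity keep their git-log order, so newer-first scans stay newer-first within each tier.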
Step 5: ContextHandoff
Files: src/handoff.rs
Description: HandoffContext struct, JSON serialization to file, deserialization
Tests: test_handoff_roundtrip
Dependencies: Step 1 (types)
Estimated: 1 hour
Step 6: AgentOrchestrator (Core Loop)
Files: src/lib.rs (expand)
Description: Wire spawner + router + nightwatch + scheduler into reconciliation loop. Implement run(), shutdown(), agent_statuses(), handoff(), trigger_compound_review()
Tests: test_orchestrator_spawns_safety_agents, test_orchestrator_shutdown_cleans_up, test_orchestrator_handles_drift_alert
Dependencies: Steps 1-5
Estimated: 4 hours
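The reconciliation core of run() reduces to a set diff between the desired fleet (from config) and the actually-running agents. A std-only sketch, with the function name reconcile as a hypothetical:

```rust
use std::collections::HashSet;

/// One reconciliation pass: whatever is desired but not running gets
/// spawned; whatever is running but no longer desired gets stopped.
pub fn reconcile(
    desired: &HashSet<String>,
    actual: &HashSet<String>,
) -> (Vec<String>, Vec<String>) {
    let to_spawn: Vec<String> = desired.difference(actual).cloned().collect();
    let to_stop: Vec<String> = actual.difference(desired).cloned().collect();
    (to_spawn, to_stop)
}
```

Each tokio::select! arm would update the desired or actual set and then call this diff, which is what makes the loop self-healing: a crashed agent simply reappears in the spawn list on the next pass.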
Step 7: Workspace Integration + Example Config
Files: Cargo.toml (workspace), example TOML config
Description: Add to workspace members, create example config with 3 agents (one per layer)
Tests: cargo test -p terraphim_orchestrator
Dependencies: Step 6
Estimated: 1 hour
Rollback Plan
If issues discovered:
- Remove crates/terraphim_orchestrator from workspace members
- No other crates are modified, so zero rollback risk to existing code
- Git revert the single commit adding the crate
No feature flags needed -- the crate is purely additive and opt-in.
Dependencies
New Dependencies
| Crate | Version | Justification |
|-------|---------|---------------|
| cron | 0.13 | Parse standard cron expressions for scheduling |
| toml | 0.8 | Parse TOML config files (consistent with existing settings.toml pattern) |
Existing Workspace Dependencies Used
tokio (full), serde/serde_json, chrono, thiserror, tracing, anyhow
Performance Considerations
Expected Performance
| Metric | Target | Measurement |
|--------|--------|-------------|
| Reconciliation loop latency | < 10ms per iteration | tracing spans |
| Drift evaluation | < 1ms per agent | Unit test timing |
| Cron evaluation | < 1ms per tick | Unit test timing |
| Memory per agent metrics | < 10KB (sliding window) | Struct size calculation |
No benchmarks needed for Phase 1. The reconciliation loop is I/O bound (waiting on channels), not CPU bound.
Example Configuration
# orchestrator.toml -- Dark Factory Agent Fleet
working_dir = "/Users/alex/projects/terraphim/terraphim-ai"
[nightwatch]
eval_interval_secs = 300 # 5 minutes
minor_threshold = 0.10
moderate_threshold = 0.20
severe_threshold = 0.40
critical_threshold = 0.70
[compound_review]
schedule = "0 2 * * *" # 2 AM daily
max_duration_secs = 1800 # 30 minutes
repo_path = "/Users/alex/projects/terraphim/terraphim-ai"
create_prs = false # Dry run for first 2 weeks
# --- Safety Layer (always running) ---
[[agents]]
name = "security-sentinel"
layer = "Safety"
cli_tool = "codex"
task = "Continuously scan for CVEs and security vulnerabilities in dependencies. Run cargo audit and report findings."
capabilities = ["security", "vulnerability-scanning"]
max_memory_bytes = 2_147_483_648 # 2GB
# --- Core Layer (scheduled) ---
[[agents]]
name = "upstream-synchronizer"
layer = "Core"
cli_tool = "codex"
task = "Sync with upstream repositories. Check for new releases of key dependencies."
schedule = "0 3 * * *" # 3 AM daily
capabilities = ["sync", "dependency-management"]
# --- Growth Layer (on-demand) ---
[[agents]]
name = "code-reviewer"
layer = "Growth"
cli_tool = "claude"
task = "Review the latest PR for code quality, security issues, and adherence to project conventions."
capabilities = ["code-review", "architecture"]
Open Items (Resolved)
| Item | Decision | Date |
|------|----------|------|
| CLI headless flags on BigBox | Yes -- all CLIs run non-interactively | 2026-03-06 |
| API budget for nightly compound review | Track session rate limits per provider; no fixed dollar ceiling | 2026-03-06 |
| Shared vs separate git worktrees | Shared worktree -- all agents work in same repo checkout | 2026-03-06 |
Design Implications of Decisions
Shared worktree: Agents must coordinate file access. Use MCP Agent Mail file_reservation_paths() for exclusive file locks. Compound review creates branches, not worktrees.
Rate limit tracking: NightwatchMonitor gains a RateLimitTracker that counts API calls per agent per provider per hour. CompoundReviewWorkflow checks remaining budget before spawning tasks. Exposed via AgentStatus::api_calls_remaining.
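The sliding-window tracking described above can be sketched with std only. The real tracker would keep one of these per (agent, provider) pair; that keying, and the struct name, are taken from this section, while the method names are assumptions.

```rust
use std::collections::VecDeque;

/// Sliding-window API call counter. Timestamps are epoch seconds;
/// the real tracker keys one instance per (agent, provider) pair.
pub struct RateLimitTracker {
    window_secs: u64,
    calls: VecDeque<u64>,
}

impl RateLimitTracker {
    pub fn new(window_secs: u64) -> Self {
        Self { window_secs, calls: VecDeque::new() }
    }

    /// Record a call and drop timestamps that fell out of the window.
    pub fn record(&mut self, now: u64) {
        self.calls.push_back(now);
        while self.calls.front().is_some_and(|t| now - *t >= self.window_secs) {
            self.calls.pop_front();
        }
    }

    /// Calls still inside the window as of `now` -- the basis for
    /// AgentStatus::api_calls_remaining.
    pub fn calls_in_window(&self, now: u64) -> usize {
        self.calls.iter().filter(|t| now - **t < self.window_secs).count()
    }
}
```

CompoundReviewWorkflow would compare calls_in_window against the provider's session limit before spawning tasks, which keeps the "no fixed dollar ceiling" decision intact.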
Approval
- [ ] Technical review complete
- [ ] Test strategy approved
- [ ] Performance targets agreed
- [ ] Human approval received