Comprehensive Lessons Learned - Terraphim AI Development
Compiled: December 20, 2025 Source: Multiple development sessions (2025-10-07 to 2025-09-17) Status: Production Ready Patterns
Executive Summary
This document consolidates all major lessons learned from Terraphim AI development, covering security implementation, multi-agent systems, workflow orchestration, deployment patterns, and technical excellence. These patterns represent proven solutions for production-ready AI systems.
Security Implementation Patterns
Date: 2025-10-07 - Critical Security Vulnerability Fixes
Pattern 1: Defense in Depth for Input Validation
Context: LLM prompt injection and network interface name injection vulnerabilities.
What We Learned:
- Separate sanitization from validation: Sanitization (making input safe) and validation (rejecting bad input) serve different purposes
- Multiple layers of defense: Pattern detection, length limits, character whitelisting, and control character removal all work together
- Log but don't fail: Sanitization should log warnings but allow operation to continue with safe version
Implementation:
// GOOD: Separate concerns
// GOOD: Multiple checks
- Regex pattern matching for suspicious strings
- Length enforcement
- Control character removal
- Special token strippingAnti-pattern to Avoid:
// BAD: Single validation that's too strict
if prompt.contains Pattern 2: Eliminate Subprocess Execution Where Possible
Context: Command injection vulnerability via curl subprocess.
What We Learned:
- Native libraries >> subprocesses: Using hyper HTTP client eliminates entire class of injection attacks
- Path canonicalization is critical: Always canonicalize file paths before use
- Type safety helps: Using proper types (PathBuf, Uri) prevents string manipulation errors
Implementation:
// GOOD: Native HTTP client
use Client;
use ;
let socket_path = self.socket_path.canonicalize?; // Validate first
let client = unix;
let response = client.request.await?;
// BAD: Shell subprocess
new
.args // Injection vector!
.outputWhen to Apply: HTTP/API clients, file operations, process management, database access
Pattern 3: Replace Unsafe Code with Safe Abstractions
Context: 12 occurrences of unsafe { ptr::read() } for DeviceStorage copying.
What We Learned:
- Safe alternatives usually exist: DeviceStorage already had
arc_memory_only()method - Unsafe blocks are technical debt: Even correct unsafe code is harder to maintain
- Clone is often acceptable: Performance cost of cloning is usually worth safety
Implementation:
// GOOD: Safe Arc creation
let persistence = arc_memory_only.await?;
// BAD: Unsafe pointer copy
use ptr;
let storage_ref = instance.await?;
let storage_copy = unsafe ; // Use-after-free risk!TruthForge Workflow Orchestration Patterns
Date: 2025-10-07 - PassOneOrchestrator Parallel Execution
Pattern 4: Enum Wrapper for Heterogeneous Async Results
Context: PassOneOrchestrator needs to run 4 different agents in parallel, each returning different result types.
Problem: tokio::task::JoinSet requires all spawned tasks to return the same type.
Solution: Create enum wrapper to unify result types:
// Spawn with explicit type annotation
join_set.spawn;When to Apply: Parallel execution of agents/services returning different data structures
Pattern 5: Critical vs Non-Critical Agent Execution
Context: PassOneOrchestrator runs 4 agents - some are critical (OmissionDetector), others provide enhancement.
Solution: Differentiate critical from non-critical agents with different error strategies:
// Critical agent: propagate error
let omission_catalog = omission_catalog.ok_or_else?;
// Non-critical agent: provide fallback
let bias_analysis = bias_analysis.unwrap_or_else;Benefits: Workflow robustness: continues even if enhancement agents fail
LLM Integration Patterns
Date: 2025-10-08 - Pass Two Debate Generator Implementation
Pattern 6: Temperature Tuning for Adversarial Debates
Context: Pass2 debate requires different creativity levels for defensive vs exploitation arguments.
What We Learned:
- Defensive arguments benefit from control: Temperature 0.4 produces strategic, measured damage control
- Exploitation arguments need creativity: Temperature 0.5 enables more aggressive, innovative attacks
- Small differences matter: 0.1 temperature difference is sufficient for distinct behavioral changes
Implementation:
// GOOD: Different temperatures for different roles
let defensive_request = new
.with_temperature; // Controlled, strategic
let exploitation_request = new
.with_temperature; // Creative, aggressivePattern 7: Flexible JSON Field Parsing for LLM Responses
Context: Different system prompts produce different JSON field names for similar concepts.
What We Learned:
- LLMs may vary field names: Even with structured prompts, field naming isn't guaranteed
- Multiple fallbacks essential: Try 3-4 field name variations before failing
- Role-specific fields: Defensive uses "opening_acknowledgment", Exploitation uses "opening_exploitation"
Implementation:
// GOOD: Multiple fallback field names
let main_argument = llm_response
.as_str
.or_else
.or_else
.unwrap_or
.to_string;Pattern 8: Model Selection Strategy (Sonnet vs Haiku)
Context: Different agents have different complexity needs and cost sensitivities.
Solution: Task-based model selection:
| Task Type | Model | Reasoning | Cost | |-----------|-------|-----------|------| | Deep analysis (OmissionDetector) | Sonnet | Complex reasoning, multi-category detection | High | | Critical analysis (BiasDetector) | Sonnet | Subtle bias patterns, logical fallacy detection | High | | Framework mapping (NarrativeMapper) | Sonnet | SCCT framework expertise required | High | | Taxonomy mapping (TaxonomyLinker) | Haiku | Simple categorization, speed matters | 5-12x cheaper |
Cost Impact: Pass One with Haiku for taxonomy achieved 33% cost reduction with minimal quality impact
Multi-Agent System Architecture
Date: 2025-09-16 - Multi-Agent System Implementation Success
Pattern 9: Role-as-Agent Principle
Critical Insight: Each Role configuration in Terraphim is already an agent specification.
What We Learned:
- Roles ARE Agents: Each role has haystacks (data sources), LLM config, knowledge graph, capabilities
- Enhance Don't Rebuild: Don't build parallel agent system - enhance the role system
- Multi-Agent = Multi-Role Coordination: Not new agent infrastructure
Implementation:
// Each Role becomes an autonomous agent
let agent = from_role_config?;
let result = agent.execute_command.await?;Pattern 10: Mock-First Development Strategy
Pattern: Implement full workflow orchestration with mock agents before adding LLM integration.
Benefits:
- Fast iteration on workflow logic (no network calls)
- Predictable test behavior (no LLM variability)
- Clear separation of orchestration vs agent implementation
- Easy to identify workflow bugs vs agent bugs
Testing Strategy:
async Dynamic Model Selection
Date: 2025-09-17 - Dynamic Model Selection Complete
Pattern 11: Configuration Hierarchy Design Pattern
Problem Solved: User requirement "model names should not be hardcoded - in user facing flow user shall be able to select it via UI or configuration wizard."
Solution: 4-level configuration hierarchy system with complete dynamic model selection.
Key Design Principles:
- 4-Level Priority System: Request → Role → Global → Hardcoded fallback
- Graceful Degradation: Always have working defaults while allowing complete override
- Type Safety: Optional fields with proper validation and error handling
Web Development and Deployment Patterns
Date: 2025-10-08 - TruthForge Phase 5: UI Development & Deployment Patterns
Pattern 12: Vanilla JavaScript over Framework for Simple UIs
Context: Need to create UI that matches agent-workflows pattern, avoid build complexity.
What We Learned:
- No build step = instant deployment: Static HTML/JS/CSS files work immediately
- Framework assumptions are wrong: Always check project patterns before choosing technology
- WebSocket client reusability: Shared libraries contain reusable components
Benefits:
- Zero build time
- No dependency management
- Easier debugging (no transpilation)
- Smaller bundle size
- Works offline
Pattern 13: Caddy Reverse Proxy for Static Files + API
Context: Need to serve static UI files and proxy API/WebSocket requests to backend.
What We Learned:
- Caddy handles multiple concerns: Static file serving, reverse proxy, HTTPS, auth in one config
- Selective proxying: Use
handle /api/*to proxy only specific paths - WebSocket requires special handling:
@wsmatcher for Connection upgrade headers
Implementation:
alpha.truthforge.terraphim.cloud {
import tls_config # Automatic HTTPS
authorize with mypolicy # Authentication
root * /path/to/truthforge-ui
file_server # Static files
handle /api/* {
reverse_proxy 127.0.0.1:8090 # API backend
}
@ws {
path /ws
header Connection *Upgrade*
header Upgrade websocket
}
handle @ws {
reverse_proxy 127.0.0.1:8090 # WebSocket backend
}
}Pattern 14: 5-Phase Deployment Script Pattern
Context: Complex deployment with multiple steps needs to be reproducible and debuggable.
Solution: Phase-based organization:
#!/bin/bash
Benefits:
- Easy to debug (run individual phases)
- Clear failure points (phase that failed is obvious)
- Reproducible (same steps every time)
Code Quality and Testing Infrastructure
Date: 2025-09-15 - Pre-commit Hook Integration
Pattern 15: Pre-commit Hook Integration is Essential
What We Learned:
- Pre-commit checks catch errors: Before they block team development
- Investment in hook setup saves massive time: In CI/CD debugging
- False positive handling: API key detection needs careful configuration
- Format-on-commit ensures consistency: Across team code style
Pattern 16: Systematic Error Resolution Process
Group similar errors and fix in batches:
- Group similar errors (E0063, E0782) and fix in batches
- Use TodoWrite tool to track progress on multi-step fixes
- Prioritize compilation errors over warnings for productivity
- cargo fmt should be run after all fixes to ensure consistency
Agent System Configuration Integration
Date: 2025-09-17 - Agent System Configuration Integration Fix
Pattern 17: Configuration Propagation Pattern
Critical Discovery: 4 out of 5 workflow files calling MultiAgentWorkflowExecutor::new() instead of new_with_config()
Problem: Workflows had no access to role configurations, LLM settings, or base URLs
Solution: Ensure consistent configuration state propagation:
// WRONG: No configuration access
let executor = new.await;
// RIGHT: Pass configuration state
let executor = new_with_config.await;Lesson: Configuration state must be explicitly passed through all layers of workflow execution.
Best Practices Summary
Architecture Principles
- Role-as-Agent Principle - Transform existing role systems into agents, don't rebuild
- Mock-First Development - Build with mocks, swap to real services for production
- Defense in Depth - Multiple security layers, not single controls
- Configuration Hierarchy - Always provide 4-level override system
Security Principles
- Input Validation Pipeline - Multiple validation layers with sanitization
- Native over Subprocess - Use native libraries instead of shell commands
- Safe over Unsafe - Always prefer safe Rust abstractions
- Log Security Events - Observability is critical for production
Performance Principles
- Model Selection Strategy - Use different models for different task types
- Temperature Tuning - Adjust creativity per task requirements
- Parallel Processing - Use JoinSet for heterogeneous async tasks
- Resource Pooling - Implement proper resource lifecycle management
Deployment Principles
- Vanilla JS for Simple UI - Avoid unnecessary framework complexity
- Caddy for Web Services - Single config for static files + API + HTTPS
- Phase-Based Deployment - Break complex deployments into testable phases
- Protocol Awareness - Detect file:// vs HTTP for local development
Development Workflow Principles
- Pre-commit Integration - Catch issues before they block team
- Systematic Error Resolution - Group and fix errors in batches
- Component-by-Component Development - Build modules independently, integrate incrementally
- Configuration Consistency - Ensure configuration flows through all layers
Future Application
These patterns provide a comprehensive foundation for:
- New AI Agent Systems - Multi-agent architectures with workflow orchestration
- Security-Critical Applications - Input validation and secure execution patterns
- Web-Based AI Interfaces - Deployment and frontend development strategies
- Production AI Systems - Configuration management and testing infrastructure
Each pattern has been validated in production environments and represents proven solutions to common challenges in AI system development.
Document Compiled: December 20, 2025 Status: Production Ready Patterns Application: All Future Terraphim AI Development