Terraphim AI Testing Infrastructure Improvement Plan

Created: Tue Nov 11 2025
Status: ACTIVE
Priority: HIGH

Executive Summary

This plan addresses critical testing infrastructure issues identified in the test and benchmark report. The focus is on creating a robust, modular testing framework that can execute all tests and benchmarks reliably within reasonable timeframes.

Phase 1: Immediate Fixes (Critical Issues)

1.1 Fix Multi-Agent Benchmark Configuration

Issue: test-utils feature flag missing from benchmark execution
Root Cause: run-benchmarks.sh doesn't include required feature flags
Status: ⏳ PENDING
Solution:

  • Update run-benchmarks.sh line 46 to include --features test-utils
  • Modify cargo bench command: cargo bench --features test-utils --bench agent_operations

1.2 Create Modular Test Execution Strategy

Issue: Full test suite compilation exceeds timeout limits
Root Cause: MCP server and heavy dependencies slow down compilation
Status: ⏳ PENDING
Solution:

  • Create separate test categories: core-tests, integration-tests, mcp-tests
  • Implement a --category flag in run_all_tests.sh (example invocation below)
  • Exclude MCP server from core test runs
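
As a rough illustration of the intended interface (the flag does not exist yet, so these invocations are hypothetical), category selection would look like this:

# Hypothetical invocations once --category lands in run_all_tests.sh
./scripts/run_all_tests.sh --category core          # fast unit tests, no MCP server
./scripts/run_all_tests.sh --category integration   # service-dependent tests
./scripts/run_all_tests.sh --category mcp           # MCP-specific tests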

Phase 2: Enhanced Test Infrastructure

2.1 Create Tiered Test Scripts

New Scripts to Create:

  • scripts/run_core_tests.sh - Fast unit tests only
  • scripts/run_integration_tests.sh - Service-dependent tests
  • scripts/run_mcp_tests.sh - MCP-specific tests
  • scripts/run_performance_tests.sh - All benchmarks with proper flags

2.2 Optimize Test Execution

Improvements:

  • Tune parallel test execution via the --test-threads option (see the sketch after this list)
  • Implement test result caching
  • Add timeout management per test category
  • Create test dependency graph for optimal execution order
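
A minimal sketch of how per-category timeouts and thread tuning could be wrapped, assuming GNU timeout is available; the time budgets are illustrative and the excluded MCP crate name is a placeholder:

# Sketch: run a test category with a hard timeout and an explicit thread count.
# Time budgets are illustrative; the excluded MCP crate name is a placeholder.
run_category() {
  local timeout_secs="$1"; shift
  timeout "${timeout_secs}" cargo test "$@" -- --test-threads="$(nproc)"
}

run_category 300 -p terraphim_types -p terraphim_automata -p terraphim_persistence
run_category 600 --workspace --exclude terraphim_mcp_server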

Phase 3: Benchmark Infrastructure Fixes

3.1 Fix All Benchmark Configurations

Actions:

  • Update run-benchmarks.sh to handle feature flags per crate (sketched below)
  • Add benchmark timeout management
  • Implement fallback for missing dependencies
  • Create benchmark-specific Cargo.toml profiles
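
One possible shape for the per-crate feature handling is a lookup table inside the script; only the test-utils entry comes from this plan, and the crate name shown is a placeholder:

# Sketch: map each benchmarked crate to the feature flags it needs (bash 4+).
# The crate name below is a placeholder; only the test-utils feature is from this plan.
declare -A BENCH_FEATURES=(
  ["terraphim_multi_agent"]="test-utils"
)

for crate in "${!BENCH_FEATURES[@]}"; do
  features="${BENCH_FEATURES[$crate]}"
  if [ -n "${features}" ]; then
    cargo bench -p "${crate}" --features "${features}"
  else
    cargo bench -p "${crate}"
  fi
done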

3.2 Add Missing Benchmark Features

Enhancements:

  • Add memory usage benchmarks for persistence
  • Create performance regression detection
  • Implement automated baseline comparison
  • Add benchmark result archiving

Phase 4: CI/CD Integration

4.1 Implement Parallel CI Pipeline

Strategy:

  • Split CI into stages: unit, integration, performance
  • Use matrix builds for different test categories
  • Add test result aggregation
  • Implement performance regression alerts

4.2 Add Test Coverage Reporting

Implementation:

  • Install cargo-tarpaulin for coverage
  • Generate coverage reports per crate
  • Create coverage thresholds
  • Add coverage badges to README

Phase 5: Advanced Testing Features

5.1 Performance Baselines

Create:

  • Define performance thresholds for all benchmarks
  • Implement automated performance regression detection
  • Add performance trend analysis
  • Create performance dashboard
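
Capturing a baseline could be as simple as archiving Criterion's output directory after a known-good run; the baselines/ location is an assumption shared with the regression-check sketch in Step 4 below:

# Sketch: snapshot the current Criterion output as the performance baseline.
# Assumes benchmarks have just completed and baselines/ is archived or tracked in CI.
rm -rf baselines
cp -r target/criterion baselines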

5.2 Integration Test Environment

Setup:

  • Create dedicated test environment configuration
  • Implement service health checks (see the sketch after this list)
  • Add test data fixtures
  • Create test isolation mechanisms
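
A minimal health-check helper might look like the following; the endpoint URL and retry budget are assumptions about the test environment:

# Sketch: wait for a test service to report healthy before integration tests start.
# The URL and retry count are placeholders for whatever the environment exposes.
wait_for_service() {
  local url="$1" retries="${2:-30}"
  for _ in $(seq "${retries}"); do
    if curl -fsS "${url}" > /dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "Service at ${url} did not become healthy" >&2
  return 1
}

wait_for_service "http://localhost:8000/health"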

Detailed Implementation Steps

Step 1: Fix Benchmark Script (Immediate)

File: scripts/run-benchmarks.sh
Line: 46
Change:

FROM: if cargo bench --bench agent_operations > "${CARGO_BENCH_OUTPUT}" 2>&1; then
TO:   if cargo bench --features test-utils --bench agent_operations > "${CARGO_BENCH_OUTPUT}" 2>&1; then

Step 2: Create Core Test Script

New File: scripts/run_core_tests.sh
Focus: terraphim_types, terraphim_automata, terraphim_persistence
Exclude: MCP server, integration tests
Timeout: 5 minutes
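
A first cut of the script could be as small as the sketch below; applying the 5-minute budget per crate rather than in total, and restricting to library tests, are assumptions:

#!/usr/bin/env bash
# Sketch of scripts/run_core_tests.sh: fast unit tests only, MCP server excluded.
set -euo pipefail

CORE_CRATES=(terraphim_types terraphim_automata terraphim_persistence)

for crate in "${CORE_CRATES[@]}"; do
  # Apply the 5-minute budget per crate for simplicity.
  timeout 300 cargo test -p "${crate}" --lib
done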

Step 3: Update Main Test Script

File: scripts/run_all_tests.sh
Changes:

  • Add a --category flag (dispatch sketched after this list)
  • Implement conditional test execution
  • Add better timeout management
  • Separate MCP testing
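
The dispatch could take roughly the following shape; category names mirror Phase 1.2 and the tiered scripts from Phase 2.1, while everything else is a sketch:

# Sketch of --category handling for run_all_tests.sh.
case "${1:-}" in
  --category)
    case "${2:-all}" in
      core)        ./scripts/run_core_tests.sh ;;
      integration) ./scripts/run_integration_tests.sh ;;
      mcp)         ./scripts/run_mcp_tests.sh ;;
      all)         ./scripts/run_core_tests.sh && ./scripts/run_integration_tests.sh && ./scripts/run_mcp_tests.sh ;;
      *)           echo "Unknown category: ${2:-}" >&2; exit 1 ;;
    esac
    ;;
  *)
    # No category given: keep the existing full-suite behaviour.
    ;;
esac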

Step 4: Add Performance Regression Detection

New File: scripts/check_performance_regression.sh
Functionality:

  • Compare current benchmark results with stored baselines (see the sketch after this list)
  • Alert on performance degradation > 10%
  • Generate performance trend reports
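
One way to implement the check, assuming Criterion's default estimates.json layout under target/criterion/ and the baselines/ snapshot from Phase 5.1, is sketched here:

# Sketch: flag any benchmark whose mean regressed more than 10% against its baseline.
# Assumes Criterion's estimates.json layout and a baselines/ mirror of target/criterion/.
shopt -s globstar nullglob
THRESHOLD=1.10
status=0
for new in target/criterion/**/new/estimates.json; do
  base="baselines/${new#target/criterion/}"
  [ -f "${base}" ] || continue
  new_mean=$(jq '.mean.point_estimate' "${new}")
  base_mean=$(jq '.mean.point_estimate' "${base}")
  if awk -v n="${new_mean}" -v b="${base_mean}" -v t="${THRESHOLD}" 'BEGIN { exit !(n > b * t) }'; then
    echo "Regression: ${new} is more than 10% slower than its baseline" >&2
    status=1
  fi
done
exit "${status}"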

Step 5: Implement Test Coverage

Add to CI pipeline:

cargo tarpaulin --workspace --out Html --output-dir coverage/

  • Generate coverage badges
  • Set minimum coverage thresholds (enforced as sketched below)
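
If the installed cargo-tarpaulin version supports it, the minimum can be enforced directly with --fail-under; the 80% figure mirrors the coverage target later in this plan:

# Fail the CI job when workspace coverage drops below the agreed threshold.
cargo tarpaulin --workspace --out Html --output-dir coverage/ --fail-under 80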

Expected Outcomes

Test Execution Time Improvements

  • Core Tests: < 2 minutes (versus the current timeout failure)
  • Integration Tests: < 10 minutes
  • Full Suite: < 15 minutes (versus the current failure to complete)
  • Benchmarks: Complete execution with all features

Quality Improvements

  • Test Coverage: > 80% across all crates
  • Performance Regression: Automated detection
  • CI Pipeline: Parallel execution with clear stages
  • Benchmark Reliability: Consistent execution with proper feature flags

Monitoring and Reporting

  • Real-time Test Status: Dashboard with test progress
  • Performance Trends: Historical benchmark data
  • Coverage Reports: Per-crate coverage metrics
  • Regression Alerts: Automated notifications

Implementation Priority

High Priority (Day 1)

  • [ ] Fix benchmark feature flags
  • [ ] Create core test script
  • [ ] Update main test script with categories

Medium Priority (Week 1)

  • [ ] Implement modular test execution
  • [ ] Add coverage reporting
  • [ ] Fix all benchmark configurations

Low Priority (Month 1)

  • [ ] Advanced performance monitoring
  • [ ] CI optimization
  • [ ] Performance dashboard

Progress Tracking

Completed Tasks

  • [x] Initial test and benchmark analysis
  • [x] Issue identification and root cause analysis
  • [x] Comprehensive plan creation
  • [x] Fix benchmark script to include test-utils feature flag
  • [x] Create core test script for fast unit tests
  • [x] Update main test script with modular execution
  • [x] Test core script execution (successful: 53 tests passed)
  • [x] Verify benchmark script fix (compilation successful, execution in progress)
  • [x] Complete benchmark execution verification (benchmarks compile and run properly)
  • [x] Create comprehensive MCP test script for isolated MCP testing
  • [x] Test MCP script execution (successful: 5 middleware tests passed, server compiles)

In Progress

  • [ ] Fix remaining compilation issues in workspace (minor TUI feature flags)

Next Immediate Actions

  • [ ] Create integration test script for service-dependent tests
  • [ ] Implement performance regression detection script
  • [ ] Add test coverage reporting with cargo-tarpaulin

Blocked

  • [ ] Integration test environment setup
  • [ ] CI/CD pipeline modifications

Risk Assessment

High Risk

  • Timeout Issues: Compilation-heavy runs may still exceed limits and require further timeout tuning
  • Dependency Conflicts: MCP server dependencies may cause compilation issues

Medium Risk

  • Performance Regression: New test infrastructure may affect performance
  • Coverage Tooling: cargo-tarpaulin may have compatibility issues

Low Risk

  • Script Modifications: Low risk, easily reversible
  • Feature Flag Changes: Isolated impact

Success Metrics

Quantitative

  • Test execution time reduced by 80%
  • Benchmark success rate increased to 100%
  • Test coverage > 80%
  • Zero timeout failures

Qualitative

  • Improved developer experience
  • Reliable CI/CD pipeline
  • Better performance monitoring
  • Clear test categorization

Last Updated: Tue Nov 11 2025
Next Review: Daily during implementation phase