Code Assistant Implementation (PR #277)
Overview
The Code Assistant Implementation (PR #277) is a comprehensive framework that enables Claude and other AI models to autonomously implement code changes, validate modifications, and recover from errors. The implementation spans six phases and includes advanced file editing strategies, multi-layer security validation, REPL integration, knowledge graph extensions, and automated recovery systems.
Status: Completed - 167/167 automated tests passing with 8 successful live demonstrations
Architecture Overview
The Code Assistant system operates across three key layers:
- File Editing Layer - Multi-strategy text manipulation with validation
- Security & Validation Layer - Four-layer verification pipeline
- Recovery & State Management - Automatic undo and snapshot capabilities
Phase 1: Multi-Strategy File Editing
Overview
Implements four file editing strategies using terraphim_automata for text-based SEARCH/REPLACE operations, so edits do not depend on LLM tool-calling support.
Editing Strategies
1. Exact Matching Strategy
- Performs precise string matching
- No tolerance for variations
- Fastest execution (typically <10ms)
- Best for well-formatted, stable code
Use Case: Editing code with consistent formatting and known exact strings
2. Whitespace-Flexible Matching Strategy
- Normalizes whitespace before matching
- Handles different indentation styles
- Slightly slower than exact matching (~10-20ms)
- Accounts for formatting variations
Use Case: Code with inconsistent indentation or whitespace differences
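As a rough illustration of whitespace-normalized matching, the sketch below compares lines after collapsing runs of whitespace, so tabs and spaces are treated alike. The function names are hypothetical and the code uses only the Rust standard library; it is not the terraphim_automata implementation.

```rust
/// Collapse each run of whitespace into one space and trim both ends.
fn normalize_ws(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Replace the first run of lines in `source` that equals `search` once
/// whitespace is normalized. Returns None when nothing matches.
/// (Hypothetical helper for illustration only.)
fn replace_whitespace_flexible(source: &str, search: &str, replacement: &str) -> Option<String> {
    let src: Vec<&str> = source.lines().collect();
    let pat: Vec<String> = search.lines().map(normalize_ws).collect();
    if pat.is_empty() || pat.len() > src.len() {
        return None;
    }
    for start in 0..=(src.len() - pat.len()) {
        let matches = pat
            .iter()
            .enumerate()
            .all(|(i, p)| normalize_ws(src[start + i]) == *p);
        if matches {
            let mut out: Vec<&str> = Vec::new();
            out.extend_from_slice(&src[..start]);
            out.extend(replacement.lines());
            out.extend_from_slice(&src[start + pat.len()..]);
            // Lines are rejoined with '\n'; a trailing newline is not preserved.
            return Some(out.join("\n"));
        }
    }
    None
}
```

With this sketch, a search string indented with four spaces matches a source line indented with a tab, and the replacement text is inserted exactly as given.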
3. Block Anchoring Strategy
- Uses surrounding code context as anchors
- Reduces false positives in large files
- Handles partial matches with surrounding context
- Optimal for complex edits in large files
Use Case: Editing specific functions or blocks within large files
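One way to picture block anchoring is to treat the first and last lines of the search block as anchors and accept a match only when exactly one region of the file is bracketed by them. The sketch below makes that assumption; the function names are hypothetical and the actual strategy in the crate may weigh interior lines differently.

```rust
/// Find the line range (start, end) whose first and last lines equal the
/// first and last lines of `search`, ignoring surrounding whitespace.
/// Returns None unless exactly one candidate exists.
fn find_anchored_block(source: &str, search: &str) -> Option<(usize, usize)> {
    let src: Vec<&str> = source.lines().collect();
    let pat: Vec<&str> = search.lines().collect();
    // Two anchors plus at least one interior line are required.
    if pat.len() < 3 || pat.len() > src.len() {
        return None;
    }
    let first = pat[0].trim();
    let last = pat[pat.len() - 1].trim();
    let span = pat.len() - 1;
    let mut candidates = (0..src.len() - span)
        .filter(|&s| src[s].trim() == first && src[s + span].trim() == last)
        .map(|s| (s, s + span));
    let hit = candidates.next()?;
    // Ambiguous anchors are rejected rather than guessed.
    if candidates.next().is_some() {
        None
    } else {
        Some(hit)
    }
}

/// Replace the uniquely anchored block with `replacement`.
fn replace_anchored_block(source: &str, search: &str, replacement: &str) -> Option<String> {
    let (start, end) = find_anchored_block(source, search)?;
    let src: Vec<&str> = source.lines().collect();
    let mut out: Vec<&str> = src[..start].to_vec();
    out.extend(replacement.lines());
    out.extend_from_slice(&src[end + 1..]);
    Some(out.join("\n"))
}
```

Because only the anchors must match, the edit still applies when the interior of the block has drifted slightly since the LLM saw it. Rejecting ambiguous anchors is a design choice in this sketch to avoid editing the wrong region.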
4. Fuzzy Matching Strategy
- Employs similarity metrics (e.g., Levenshtein distance)
- Tolerates minor differences in content
- More computational overhead (~50-100ms)
- Fallback for challenging edits
Use Case: Code with minor variations, refactored sections, or when exact matching fails
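The similarity metric mentioned above can be sketched with a standard Levenshtein distance and a normalized score. The threshold parameter and the helper names are assumptions for illustration; the source does not state which threshold or granularity the fuzzy strategy uses.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let v = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(v);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: 1.0 means identical strings.
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Index and score of the line most similar to `needle`, provided the
/// score reaches `threshold` (a hypothetical tuning parameter).
fn best_fuzzy_line(source: &str, needle: &str, threshold: f64) -> Option<(usize, f64)> {
    source
        .lines()
        .enumerate()
        .map(|(i, line)| (i, similarity(line.trim(), needle.trim())))
        .filter(|&(_, score)| score >= threshold)
        .max_by(|x, y| x.1.total_cmp(&y.1))
}
```

Comparing every line against the needle is quadratic in line length, which is consistent with fuzzy matching being the slowest strategy and the last fallback.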
Performance Characteristics
- 50x faster than Aider for typical file operations
- Sub-100ms execution for all strategies
- Automata-based acceleration eliminates repeated parsing
- Memory-efficient with streaming text processing
API Interface
// File editing operations exposed through REPL commands
/file edit <path> <old_text> <new_text>
/file validate-edit <path> <old_text> <new_text>
/file diff <path>
/file undo <path>
Phase 2: Validation & Security
Four-Layer Validation Pipeline
graph TD
A["LLM Input"] --> B["[Layer 1]<br/>Pre-LLM Context Validation"]
B --> C["[Layer 2]<br/>Post-LLM Output Parsing"]
C --> D["[Layer 3]<br/>Pre-Tool File Verification"]
D --> E["[Layer 4]<br/>Post-Tool Integrity Checks"]
E --> F["File System"]
style A fill:#e1f5ff
style B fill:#fff3e0
style C fill:#f3e5f5
style D fill:#e8f5e9
style E fill:#fce4ec
style F fill:#e0f2f1
Layer 1: Pre-LLM Context Validation
- Validates input context before sending to LLM
- Checks file existence and accessibility
- Validates requested edit ranges
- Prevents invalid requests from reaching LLM
Layer 2: Post-LLM Output Parsing
- Parses LLM responses for valid edit instructions
- Validates syntax and format
- Extracts file paths, old text, and new text
- Rejects malformed responses
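To make the parsing step concrete, the sketch below extracts a path, old text, and new text from a single edit block. The block format (a path line followed by `<<<<<<< SEARCH`, `=======`, and `>>>>>>> REPLACE` markers) is an assumption borrowed from common SEARCH/REPLACE conventions; the source does not specify the exact format the parser accepts, and `EditInstruction` is a hypothetical type.

```rust
/// Hypothetical result of parsing one edit block from an LLM response.
#[derive(Debug, PartialEq)]
struct EditInstruction {
    path: String,
    old_text: String,
    new_text: String,
}

/// Parse one block of the assumed form:
///   <path>
///   <<<<<<< SEARCH
///   ...old lines...
///   =======
///   ...new lines...
///   >>>>>>> REPLACE
/// Malformed input is rejected with an error message.
fn parse_edit_block(response: &str) -> Result<EditInstruction, String> {
    let lines: Vec<&str> = response.lines().collect();
    // Position of the first line equal to `marker`, searching from `from`.
    let find = |marker: &str, from: usize| {
        lines
            .iter()
            .skip(from)
            .position(|l| l.trim() == marker)
            .map(|p| p + from)
    };
    let search = find("<<<<<<< SEARCH", 0).ok_or("missing SEARCH marker")?;
    let divider = find("=======", search + 1).ok_or("missing divider")?;
    let replace = find(">>>>>>> REPLACE", divider + 1).ok_or("missing REPLACE marker")?;
    if search == 0 || lines[search - 1].trim().is_empty() {
        return Err("missing file path before SEARCH marker".to_string());
    }
    Ok(EditInstruction {
        path: lines[search - 1].trim().to_string(),
        old_text: lines[search + 1..divider].join("\n"),
        new_text: lines[divider + 1..replace].join("\n"),
    })
}
```

Returning an error instead of a partial instruction mirrors the "rejects malformed responses" behavior listed above.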
Layer 3: Pre-Tool File Verification
- Verifies files exist before attempting edits
- Checks file permissions
- Validates file size constraints
- Ensures sufficient disk space
Layer 4: Post-Tool Integrity Checks
- Verifies edits were applied correctly
- Validates file checksums post-edit
- Ensures no unintended modifications
- Rollback capability on failure
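Layers 3 and 4 can be sketched together as a guarded write: check the file before editing, then re-read it afterwards and restore the original if the on-disk content is not what was intended. The size limit, the function names, and the use of the standard library's non-cryptographic `DefaultHasher` as a checksum are all assumptions for illustration; the source does not say which checksum the system uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical size limit for editable files.
const MAX_FILE_BYTES: u64 = 1_000_000;

/// Non-cryptographic checksum, used here only to compare contents.
fn checksum(data: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Apply a single-occurrence replacement with pre- and post-checks.
fn apply_edit_checked(path: &Path, old: &str, new: &str) -> Result<(), String> {
    // Layer 3: the file must exist, be a regular file, and be small enough.
    let meta = fs::metadata(path).map_err(|e| format!("cannot stat file: {e}"))?;
    if !meta.is_file() || meta.len() > MAX_FILE_BYTES {
        return Err("not a regular file or too large".to_string());
    }
    let original = fs::read_to_string(path).map_err(|e| e.to_string())?;
    if old.is_empty() || original.matches(old).count() != 1 {
        return Err("old text must occur exactly once".to_string());
    }
    let expected = original.replacen(old, new, 1);
    fs::write(path, &expected).map_err(|e| e.to_string())?;

    // Layer 4: re-read the file and compare checksums; roll back on mismatch.
    let on_disk = fs::read_to_string(path).map_err(|e| e.to_string())?;
    if checksum(&on_disk) != checksum(&expected) {
        fs::write(path, &original).map_err(|e| e.to_string())?;
        return Err("integrity check failed; original restored".to_string());
    }
    Ok(())
}
```

Requiring exactly one occurrence of the old text is a conservative choice in this sketch: it guards against the "unintended modifications" case where a replacement lands in the wrong place.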
Security Configuration
Repository-specific security rules via .terraphim/security.json:
Command Matching Strategies
- Exact Matching: Command must match precisely
- Synonym-Based: Matches commands with equivalent meaning (via thesaurus)
- Fuzzy Matching: Tolerates minor variations using Levenshtein distance
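The three matching strategies can be read as a cascade: try an exact match, then a thesaurus lookup, then a bounded edit distance. The sketch below assumes an allow-list of commands and a simple synonym map; `CommandMatch`, `match_command`, and the `max_distance` parameter are hypothetical names, and the real schema of .terraphim/security.json is not shown in the source.

```rust
use std::collections::HashMap;

/// Outcome of checking a command against the allow-list (hypothetical type).
#[derive(Debug, PartialEq)]
enum CommandMatch {
    Exact,
    Synonym,
    Fuzzy(usize), // edit distance to the closest allowed command
    NoMatch,
}

/// Levenshtein distance over Unicode scalars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Try exact, then synonym, then fuzzy matching, in that order.
fn match_command(
    input: &str,
    allowed: &[&str],
    synonyms: &HashMap<&str, &str>,
    max_distance: usize,
) -> CommandMatch {
    let input = input.trim();
    if allowed.contains(&input) {
        return CommandMatch::Exact;
    }
    if let Some(canonical) = synonyms.get(input) {
        if allowed.contains(canonical) {
            return CommandMatch::Synonym;
        }
    }
    match allowed.iter().map(|c| levenshtein(input, c)).min() {
        Some(d) if d <= max_distance => CommandMatch::Fuzzy(d),
        _ => CommandMatch::NoMatch,
    }
}
```

Returning which strategy matched, rather than a plain boolean, lets a caller apply different policy per strategy, for example asking for confirmation on fuzzy matches.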
Validated LLM Client
Phase 3: REPL Integration
File Editing Commands
File editing commands integrated into the terminal interface:
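# Edit a file with automatic strategy selection
/file edit <path> <old_text> <new_text>
# Validate an edit without applying it
/file validate-edit <path> <old_text> <new_text>
# View pending changes
/file diff <path>
# Undo the last edit
/file undo <path>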
ChatHandler Integration
Interactive Workflow
- User provides instruction to code assistant
- ChatHandler sends request through ValidatedLlmClient
- LLM receives context with available files
- LLM generates file edit commands
- System validates all edits through four-layer pipeline
- User reviews changes via /file diff
- User confirms or rejects changes
- System applies confirmed edits
Phase 4: Knowledge Graph Extension
CodeSymbol Types
New knowledge graph node types for code entities:
Dual-Purpose Graph Functionality
The knowledge graph stores both:
- Conceptual Information: Domain knowledge, relationships between concepts
- Code-Level Information: Functions, classes, dependencies, call chains
graph TD
A["Knowledge Graph"] --> B["Domain Terms"]
A --> C["Code Symbols"]
B --> D["'Search'<br/>[concept]"]
B --> E["'Performance'<br/>[connection]"]
C --> F["Function<br/>[code]"]
C --> G["Struct<br/>[code]"]
D -.->|semantic connection| E
F -.->|call dependency| G
style A fill:#2196f3,color:#fff
style B fill:#4caf50,color:#fff
style C fill:#ff9800,color:#fff
style D fill:#c8e6c9
style E fill:#ffe0b2
style F fill:#ffe0b2
style G fill:#ffe0b2
PageRank-Style Relevance Ranking
Graph nodes are ranked by:
- In-degree - Number of incoming connections
- Out-degree - Number of outgoing connections
- Centrality - Position relative to frequently accessed nodes
- Recency - Last access timestamp
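A simple way to combine these signals is a weighted sum. The sketch below derives in-degree and out-degree from a directed edge list and adds a recency term that decays with hours since last access. The weights, the decay formula, and the function name are assumptions, centrality is left out for brevity, and this is a plain weighted score rather than an iterative PageRank computation.

```rust
use std::collections::HashMap;

// Hypothetical weights; the source does not state how the factors are combined.
const W_IN: f64 = 1.0;
const W_OUT: f64 = 0.5;
const W_RECENCY: f64 = 2.0;

/// Rank nodes of a directed graph by a weighted sum of in-degree,
/// out-degree, and recency. `last_access` and `now` are in seconds.
fn rank_nodes<'a>(
    edges: &[(&'a str, &'a str)],
    last_access: &HashMap<&'a str, u64>,
    now: u64,
) -> Vec<(&'a str, f64)> {
    // node -> (in-degree, out-degree)
    let mut degrees: HashMap<&'a str, (usize, usize)> = HashMap::new();
    for &(from, to) in edges {
        degrees.entry(from).or_insert((0, 0)).1 += 1;
        degrees.entry(to).or_insert((0, 0)).0 += 1;
    }
    let mut ranked: Vec<(&'a str, f64)> = degrees
        .into_iter()
        .map(|(node, (indeg, outdeg))| {
            // Nodes never accessed get an effectively zero recency bonus.
            let age = last_access
                .get(node)
                .map_or(u64::MAX, |&t| now.saturating_sub(t));
            let recency = 1.0 / (1.0 + age as f64 / 3600.0);
            let score = W_IN * indeg as f64 + W_OUT * outdeg as f64 + W_RECENCY * recency;
            (node, score)
        })
        .collect();
    // Highest score first; ties broken by name for deterministic output.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    ranked
}
```

For a call graph where `parse` and `eval` both call `lex`, this sketch ranks `lex` first because incoming edges carry the largest weight.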
Semantic Search Across Code
Query examples:
- "Show all functions that handle authentication"
- "Find tests for storage operations"
- "List all deprecated code"
- "Identify circular dependencies"
Phase 5: Recovery Systems
GitRecovery
Automatic git-based undo functionality:
Features:
- Automatic commits before major edits
- Easy rollback to previous states
- Checkpoint history with messages
- Diff view of changes between checkpoints
SnapshotManager
State preservation across sessions:
Features:
- Full system state snapshots
- Timestamp-based recovery
- Session continuity across restarts
- Snapshot diffing and analysis
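The snapshot-then-validate pattern can be sketched with an in-memory store: capture state before an edit, run the edit, and restore the capture if validation fails. The `Snapshots` type and `edit_with_recovery` function are hypothetical stand-ins; the real SnapshotManager API and its storage format are not shown in the source.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a snapshot store (hypothetical).
#[derive(Default)]
struct Snapshots {
    next_id: u64,
    // snapshot id -> (path -> file content)
    saved: HashMap<u64, HashMap<String, String>>,
}

impl Snapshots {
    /// Store a full copy of the current state and return its id.
    fn create(&mut self, files: &HashMap<String, String>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.saved.insert(id, files.clone());
        id
    }

    /// Return a copy of the state captured under `id`, if any.
    fn restore(&self, id: u64) -> Option<HashMap<String, String>> {
        self.saved.get(&id).cloned()
    }
}

/// Snapshot, apply `edit`, then validate; restore the snapshot on failure.
fn edit_with_recovery<E, V>(
    files: &mut HashMap<String, String>,
    snapshots: &mut Snapshots,
    edit: E,
    validate: V,
) -> Result<u64, String>
where
    E: FnOnce(&mut HashMap<String, String>),
    V: FnOnce(&HashMap<String, String>) -> bool,
{
    let id = snapshots.create(&*files);
    edit(&mut *files);
    if validate(&*files) {
        Ok(id)
    } else {
        *files = snapshots.restore(id).ok_or("snapshot missing")?;
        Err(format!("validation failed; restored snapshot {id}"))
    }
}
```

Copying the full state on every edit is the simplest approach and matches the "full system state snapshots" feature, at the cost of memory; a real store would likely persist snapshots so they survive restarts.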
Recovery Workflow
graph TD
A["Edit Attempt"] --> B["Pre-edit Snapshot Created"]
B --> C["Edit Executed"]
C --> D{{"Validation<br/>Success?"}}
D -->|Yes| E["Checkpoint Created"]
D -->|No| F["Restore from Snapshot"]
F --> G["Report Error"]
E --> H["Ready for Next Operation"]
G --> H
style A fill:#e3f2fd
style B fill:#c8e6c9
style C fill:#fff9c4
style D fill:#ffccbc
style E fill:#a5d6a7
style F fill:#ef9a9a
style G fill:#ffccbc
style H fill:#b3e5fc
Phase 6: Integration & Testing
MCP Tools
23 total MCP tools available (17 existing + 6 new):
New Code Assistant Tools:
- validate_file_edit - Pre-validate edits
- apply_file_edit - Apply validated edits
- create_checkpoint - Create recovery point
- restore_checkpoint - Restore previous state
- get_code_symbols - Query knowledge graph code nodes
- analyze_dependencies - Analyze code dependencies
Existing Tools (17):
- Autocomplete functions
- Text processing functions
- Thesaurus management
- Graph connectivity queries
- Fuzzy search operations
Test Suite
Automated Tests: 167/167 passing
Test categories:
- File Editing Tests (40 tests)
  - Exact matching strategy validation
  - Whitespace flexibility
  - Block anchoring accuracy
  - Fuzzy matching edge cases
- Security Validation Tests (35 tests)
  - Four-layer pipeline validation
  - Malicious command detection
  - File permission verification
  - Integrity checks
- REPL Integration Tests (25 tests)
  - Command parsing and execution
  - ChatHandler workflow
  - Error handling and recovery
- Knowledge Graph Tests (20 tests)
  - Code symbol extraction
  - Graph construction
  - Relevance ranking
  - Semantic search queries
- Recovery System Tests (30 tests)
  - Git checkpoint creation/restore
  - Snapshot management
  - State preservation
  - Recovery accuracy
- Integration Tests (17 tests)
  - End-to-end workflows
  - Multi-phase operations
  - Cross-component communication
Live Demonstrations
8 successful live demonstrations:
- Auto-fix Compilation Errors
  - LLM identifies compilation errors
  - Generates file edits
  - System applies and validates
  - Compilation succeeds
- Implement New Feature
  - Feature specification provided
  - LLM generates complete implementation
  - Tests written automatically
  - All tests pass
- Refactor Code Section
  - Identify refactoring opportunities
  - Generate refactored code
  - Validate against original behavior
  - Tests confirm refactoring
- Security Patch Application
  - Security issue identified
  - Patch code generated
  - Security validation applied
  - Impact analysis performed
- Code Review with Automated Fixes
  - Review comments provided
  - LLM generates fixes
  - Fixes applied automatically
  - Reviewer confirmation workflow
- Cross-Cutting Concern Implementation
  - Logging added to multiple functions
  - Metrics collection integrated
  - Error handling standardized
  - Coordinated changes applied
- Dependency Update with Migration
  - Dependency version updated
  - API changes detected
  - Migration code generated
  - Tests updated automatically
- AI-Generated Working Rust Code
  - Ollama LLM model used
  - Async Rust code generated
  - Tokio patterns correctly applied
  - Code compiles and runs successfully
Usage Examples
Basic File Editing
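# Edit a file with automatic strategy selection
/file edit <path> <old_text> <new_text>
# Validate before applying
/file validate-edit <path> <old_text> <new_text>
# View pending changes
/file diff <path>
# Undo if needed
/file undo <path>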
Chat-Based Code Modification
User: "Add error handling to the parse_config function"
System: Sends context with parse_config function to LLM
LLM: Generates file edit commands with error handling
System: Validates edits through security pipeline
System: Applies edits and runs tests
System: Reports success with test results
Recovery Workflow
# Create a checkpoint before major refactoring
# Perform refactoring through code assistant
# If issues arise, restore checkpoint
# View changes between checkpoints
Performance Metrics
| Operation | Time | Notes |
|-----------|------|-------|
| Exact matching edit | <10ms | Best case, consistent formatting |
| Whitespace-flexible edit | 10-20ms | Handles indentation variations |
| Block anchoring edit | 20-50ms | Large files with context |
| Fuzzy matching edit | 50-100ms | Complex or varied code |
| Full validation pipeline | 30-150ms | Including all 4 layers |
| Knowledge graph query | 5-50ms | Depends on graph size |
| Snapshot creation | 100-500ms | Full state serialization |
| Git checkpoint | 50-200ms | Depends on repo size |
Limitations and Considerations
Known Limitations
- Large File Performance: Edits in very large files (>10,000 lines) may require block anchoring strategy
- Concurrent Edits: System handles sequential edits; concurrent edits require locking
- Binary Files: File editing limited to text files
- Context Window: LLM context limited to available token budget
Best Practices
- Use Block Anchoring for Large Files: Improves reliability and reduces false positives
- Create Checkpoints Before Major Edits: Enables quick recovery if needed
- Validate Complex Edits: Use /file validate-edit before applying risky changes
- Review Changes: Always review /file diff output before confirming
- Appropriate Strategy Selection: Let system auto-select or explicitly choose based on file characteristics
Configuration
Enable Code Assistant Features
In role configuration (terraphim_*_config.json):
Security Configuration
Create .terraphim/security.json in repository:
Future Enhancements
Planned Features
- Parallel Edits: Support concurrent file modifications with conflict resolution
- Semantic Merge: Intelligent merging of overlapping edits
- Patch Generation: Generate importable patch files from edits
- Format Preservation: Detect and preserve original code formatting
- Test Coverage Analysis: Track test coverage changes from edits
- Performance Profiling: Identify performance regressions from edits
Integration Opportunities
- Pre-commit Hooks: Validate edits against project standards
- CI/CD Integration: Automated testing of LLM-generated changes
- Code Review Workflows: Integration with GitHub/GitLab PR workflows
- IDE Plugins: Real-time code assistant in development environments
- Custom LLM Models: Fine-tuned models for specific codebases
Troubleshooting
Edit Fails with "Text Not Found"
Causes:
- Exact text doesn't match file content
- Whitespace differences (tabs vs spaces)
- Code has been modified since LLM generated edit
Solutions:
- Use whitespace-flexible strategy: /file edit ... --strategy whitespace
- Use block anchoring with surrounding context
- Use fuzzy matching for approximate matches
- Refresh context with /file diff and regenerate edit
Validation Fails
Causes:
- File permissions issue
- Insufficient disk space
- Security policy violation
- File doesn't exist
Solutions:
- Check file permissions: ls -la path/to/file
- Check disk space: df -h
- Review security configuration: .terraphim/security.json
- Verify file exists: test -f path/to/file
Recovery Not Working
Causes:
- Git repository not initialized
- No checkpoints created
- Snapshot storage not writable
Solutions:
- Initialize git: git init
- Create checkpoint first: /checkpoint create "message"
- Check snapshot directory permissions
- Review SnapshotManager logs
References
- PR #277: Code Assistant Implementation - Beat Aider & Claude Code
- Related Crates:
  - terraphim_automata - File editing and text matching
  - terraphim_tui - REPL interface
  - terraphim_rolegraph - Knowledge graph with code symbols
  - terraphim_mcp_server - MCP tool exposure
Contributors
- Original implementation: Terraphim AI team
- Testing: Full test suite validation
- Documentation: Community contributions welcome