Implementation Plan: terraphim-grep — Intelligent Hybrid Grep
Status: Draft
Research Doc: docs/research-terraphim-grep.md
Gitea Issue: #1743
Author: Agent
Date: 2026-05-23
Estimated Effort: 12 days (as per research document)
Overview
Summary
Implement terraphim-grep: an intelligent grep tool that uses hybrid search (FFF + ripgrep + KG) for fast deterministic results, falling back to RLM only when needed, and learning by writing new concepts back to the knowledge graph.
Approach
Build a new crate terraphim_grep that orchestrates existing components (RoleGraph, KgPathScorer, HaystackProvider, LlmClient) with new logic for sufficiency judgment and KG curation.
Scope
In Scope:
- Create terraphim_grep crate with hybrid search orchestration
- Implement SufficiencyJudge with heuristic and LLM tiers
- Implement RlmSignature trait for structured outputs
- Implement KG curation loop (RLM → RoleGraph)
- Extend terraphim_cli with grep subcommand
- Maintain rlmgrep-compatible interface
Out of Scope:
- Firecracker VM integration (Docker sufficient for CLI)
- MCP server integration (future phase)
- Multi-modal ingestion (PDF/images)
- Streaming output
Avoid At All Cost (from 5/25 analysis):
- Loading ALL files into LLM context (rlmgrep's mistake)
- Premature optimisation of sufficiency thresholds
- Building our own LLM client instead of reusing terraphim_service
Architecture
Component Diagram
terraphim_grep
├── HybridSearcher (parallel search across 3 haystacks)
│ ├── CodeSearch (fff + ripgrep + KgPathScorer)
│ ├── DocSearch (HaystackProvider)
│ └── KgSearch (RoleGraph Aho-Corasick)
│
├── SufficiencyJudge (tiered evaluation)
│ ├── HeuristicJudge (coverage, confidence, diversity)
│ └── LlmJudge (fallback for uncertain cases)
│
├── RlmExecutor (RLM with pre-retrieved context)
│ ├── RlmContext (retrieved chunks + KG concepts)
│ └── RlmSignature implementations
│
└── KgCuration (RLM → KG feedback loop)
  └── ConceptExtractionSignature
Data Flow
User Query
│
▼
┌─────────────────┐
│ HybridSearcher │ ← parallel tokio::join!
└────────┬────────┘
│
▼
┌─────────────────┐
│ SufficiencyJudge│
└────────┬────────┘
│
┌────┴────┐
▼ ▼
┌─────────┐ ┌──────────┐
│Suffic.  │ │Insuffic. │
│Return   │ │RLM ctx   │
└─────────┘ └────┬─────┘
│
▼
┌───────────────┐
│ RlmExecutor │
└───────┬───────┘
│
▼
┌───────────────┐
│ KgCuration │ (async, non-blocking)
└───────────────┘
Key Design Decisions
| Decision | Rationale | Alternatives Rejected |
|----------|-----------|----------------------|
| New terraphim_grep crate | Separation of concerns; doesn't pollute existing crates | Extend terraphim_cli directly (too coupled) |
| Reuse terraphim_service::llm::LlmClient | Already implemented; has chat_completion | Building a custom HTTP client (reinventing the wheel) |
| Heuristic first, LLM second | Cost optimisation: 80-90% of queries expected to need no LLM call | LLM judge for all (expensive) |
| Async KG updates | Non-blocking; doesn't slow response | Synchronous updates (blocks user) |
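The "Async KG updates" decision can be sketched as a queue that the query path writes to and a background worker drains. This is a minimal, dependency-free sketch: the crate would presumably run the worker as a tokio task, whereas here a std::thread and an mpsc channel stand in for it, and a shared Vec<String> stands in for RoleGraph. CurationQueue and NewConcept are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical concept proposed by the RLM for insertion into the KG.
#[derive(Debug, Clone, PartialEq)]
pub struct NewConcept {
    pub term: String,
    pub synonyms: Vec<String>,
}

/// Background curation: the query path only sends on a channel,
/// so it never waits for the graph update to finish.
pub struct CurationQueue {
    tx: Option<Sender<NewConcept>>,
    worker: Option<JoinHandle<()>>,
}

impl CurationQueue {
    /// `graph` stands in for RoleGraph; here it is just a shared list of terms.
    pub fn spawn(graph: Arc<Mutex<Vec<String>>>) -> Self {
        let (tx, rx) = mpsc::channel::<NewConcept>();
        let worker = thread::spawn(move || {
            // The loop ends once every Sender has been dropped.
            for concept in rx {
                let mut g = graph.lock().unwrap();
                if !g.contains(&concept.term) {
                    g.push(concept.term);
                }
            }
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Returns immediately; the update is applied later by the worker.
    pub fn submit(&self, concept: NewConcept) -> bool {
        self.tx.as_ref().map(|tx| tx.send(concept).is_ok()).unwrap_or(false)
    }

    /// Close the channel and wait for pending updates (e.g. at shutdown).
    pub fn shutdown(mut self) {
        self.tx.take(); // dropping the sender ends the worker loop
        if let Some(w) = self.worker.take() {
            let _ = w.join();
        }
    }
}
```

Because a single worker applies every write, graph mutations are serialised instead of contending from many query tasks at once.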
Eliminated Options (Essentialism)
| Option Rejected | Why Rejected | Risk of Including |
|-----------------|--------------|-------------------|
| Firecracker VMs | Overkill for CLI tool; Docker sufficient | Complexity, maintenance burden |
| Real-time KG updates | Could cause lock contention | Performance degradation |
| Custom LLM client | terraphim_service already has one | Duplicated code, diverging interfaces |
Simplicity Check
What if this could be easy?
- Start with just hybrid search (no RLM) to prove architecture
- Add RLM fallback only when search returns empty
- KG curation as a background task
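A first cut following the list above might look like this. It is a sketch under stated assumptions: the search and RLM steps are passed in as closures (stand-ins for HybridSearcher and RlmExecutor), and grep_first_cut and Outcome are hypothetical names. The rlm_enabled parameter mirrors the rollback switch in the Rollback Plan.

```rust
/// Where the result came from.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Search(Vec<String>),
    RlmFallback(String),
}

/// Minimal first-cut orchestration: run hybrid search and invoke the
/// (hypothetical) RLM fallback only when search returns nothing.
pub fn grep_first_cut<S, F>(query: &str, search: S, rlm_fallback: F, rlm_enabled: bool) -> Outcome
where
    S: Fn(&str) -> Vec<String>,
    F: Fn(&str) -> String,
{
    let hits = search(query);
    if hits.is_empty() && rlm_enabled {
        Outcome::RlmFallback(rlm_fallback(query))
    } else {
        // Either search found something, or the RLM is switched off.
        Outcome::Search(hits)
    }
}
```

Later steps replace the "search returned empty" test with the SufficiencyJudge verdict; the control flow stays the same.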
File Changes
New Crate Structure
crates/terraphim_grep/
├── Cargo.toml
└── src/
├── lib.rs # Module root + exports
├── error.rs # TerraphimGrepError
├── hybrid_searcher.rs # HybridSearcher + ResultFusion
├── sufficiency_judge.rs # SufficiencyJudge + HeuristicJudge
├── rlm_context.rs # RlmContext building
├── signatures.rs # RlmSignature trait + impls
├── kg_curation.rs # KgCurationRlm
└── cli.rs # CLI argument parsing (optional)
New Files
| File | Purpose |
|------|---------|
| crates/terraphim_grep/Cargo.toml | Crate manifest |
| crates/terraphim_grep/src/lib.rs | Module root, public exports |
| crates/terraphim_grep/src/error.rs | Error types |
| crates/terraphim_grep/src/hybrid_searcher.rs | Parallel search orchestration |
| crates/terraphim_grep/src/sufficiency_judge.rs | Tiered sufficiency evaluation |
| crates/terraphim_grep/src/rlm_context.rs | RLM context construction |
| crates/terraphim_grep/src/signatures.rs | RlmSignature trait + implementations |
| crates/terraphim_grep/src/kg_curation.rs | KG curation feedback loop |
Modified Files
| File | Changes |
|------|---------|
| Cargo.toml | Add crates/terraphim_grep to workspace members |
| crates/terraphim_cli/src/main.rs | Add Grep subcommand |
| crates/terraphim_cli/Cargo.toml | Add terraphim_grep dependency |
Deleted Files
(None)
API Design
Public Types
Public types are spread across these modules:
- crates/terraphim_grep/src/lib.rs
- crates/terraphim_grep/src/error.rs
- crates/terraphim_grep/src/hybrid_searcher.rs
- crates/terraphim_grep/src/sufficiency_judge.rs
- crates/terraphim_grep/src/signatures.rs
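A minimal sketch of what the result types could look like. Haystack, SearchHit and GrepOutput are hypothetical names, not taken from the plan; only the three haystacks (code, docs, KG) and the answer-with-citations mode come from the text above.

```rust
use std::fmt;

/// Which haystack produced a hit (mirrors CodeSearch / DocSearch / KgSearch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Haystack {
    Code,
    Docs,
    Kg,
}

/// One result from the hybrid search.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub haystack: Haystack,
    pub path: String,
    pub line: Option<u32>, // None for KG concepts, which have no line number
    pub snippet: String,
    pub score: f32,
}

/// What a query returns to the caller.
#[derive(Debug, Clone, PartialEq)]
pub enum GrepOutput {
    /// Heuristics judged the search results sufficient.
    Hits(Vec<SearchHit>),
    /// The RLM was invoked: an answer plus the hits it cites.
    Answer { text: String, citations: Vec<SearchHit> },
}

impl fmt::Display for SearchHit {
    /// grep-style rendering: `path:line: snippet`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.line {
            Some(n) => write!(f, "{}:{}: {}", self.path, n, self.snippet),
            None => write!(f, "{}: {}", self.path, self.snippet),
        }
    }
}
```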
Public Functions
Defined in crates/terraphim_grep/src/lib.rs:
- async constructor: create a new TerraphimGrep instance
- async query method: execute a grep query
- statistics accessor: get search result statistics
Error Types
Defined in crates/terraphim_grep/src/error.rs as TerraphimGrepError.
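A sketch of what TerraphimGrepError could look like, with hand-written Display and Error impls so it needs no new dependency. The variants are hypothetical, chosen to match the failure points in the architecture: haystack search, LLM calls, signature parsing and file I/O.

```rust
use std::fmt;

/// Hypothetical variants; the real set is not fixed by this plan.
#[derive(Debug)]
pub enum TerraphimGrepError {
    /// A haystack (code, docs or KG) failed to search.
    Search { haystack: String, message: String },
    /// The LLM call behind LlmJudge or RlmExecutor failed.
    Llm(String),
    /// RLM output did not match the expected signature.
    SignatureParse(String),
    /// Filesystem errors while reading files.
    Io(std::io::Error),
}

impl fmt::Display for TerraphimGrepError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Search { haystack, message } => {
                write!(f, "search failed in {haystack} haystack: {message}")
            }
            Self::Llm(msg) => write!(f, "LLM call failed: {msg}"),
            Self::SignatureParse(msg) => write!(f, "could not parse RLM output: {msg}"),
            Self::Io(e) => write!(f, "I/O error: {e}"),
        }
    }
}

impl std::error::Error for TerraphimGrepError {
    // Expose the underlying I/O error so callers can inspect the cause.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io(e) => Some(e),
            _ => None,
        }
    }
}

impl From<std::io::Error> for TerraphimGrepError {
    fn from(e: std::io::Error) -> Self {
        Self::Io(e)
    }
}
```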
Test Strategy
Unit Tests
| Test | Location | Purpose |
|------|----------|---------|
| test_hybrid_search_parallel | hybrid_searcher.rs | Verify parallel execution |
| test_fusion_deduplication | hybrid_searcher.rs | Verify result merging |
| test_heuristic_coverage | sufficiency_judge.rs | Coverage calculation |
| test_heuristic_diversity | sufficiency_judge.rs | Diversity calculation |
| test_signature_parse_search | signatures.rs | SearchResult parsing |
| test_signature_parse_answer | signatures.rs | AnswerWithCitations parsing |
| test_concept_extraction | signatures.rs | NewConcept parsing |
| test_kg_curation_new_concepts | kg_curation.rs | Concept added to graph |
Integration Tests
| Test | Location | Purpose |
|------|----------|---------|
| test_grep_search_only | tests/terraphim_grep.rs | Full search-only flow |
| test_grep_with_rlm_fallback | tests/terraphim_grep.rs | RLM fallback triggered |
| test_grep_answer_mode | tests/terraphim_grep.rs | --answer flag |
| test_grep_context_lines | tests/terraphim_grep.rs | -C flag |
| test_grep_haystack_filter | tests/terraphim_grep.rs | --haystack flag |
Property Tests
Property-based tests are written with the proptest! macro.
Implementation Steps
Step 1: Create crate structure + error types
Files: crates/terraphim_grep/Cargo.toml, src/lib.rs, src/error.rs
Description: Create new crate with module structure and error types
Tests: Unit tests for error display
Estimated: 2 hours
Key code: the module declarations in lib.rs and the TerraphimGrepError type in error.rs.
Step 2: Implement HybridSearcher with parallel search
Files: src/hybrid_searcher.rs
Description: Implement parallel search across code, docs, KG haystacks
Tests: Unit tests for parallel execution, fusion, deduplication
Dependencies: Step 1
Estimated: 4 hours
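Step 2 could be prototyped as follows. The plan calls for tokio::join!; this sketch uses std::thread::scope only so it runs without dependencies. It assumes fusion keys on (path, line) and keeps the highest score, which is one reasonable choice rather than the settled ResultFusion design. Hit, parallel_search and fuse are hypothetical names.

```rust
use std::collections::HashMap;
use std::thread;

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub source: &'static str, // "code", "docs" or "kg"
    pub path: String,
    pub line: u32,
    pub score: f32,
}

/// Run the three haystack searches concurrently and concatenate the results.
pub fn parallel_search<C, D, K>(query: &str, code: C, docs: D, kg: K) -> Vec<Hit>
where
    C: Fn(&str) -> Vec<Hit> + Sync,
    D: Fn(&str) -> Vec<Hit> + Sync,
    K: Fn(&str) -> Vec<Hit> + Sync,
{
    thread::scope(|s| {
        let c = s.spawn(|| code(query));
        let d = s.spawn(|| docs(query));
        let k = s.spawn(|| kg(query));
        // Join in a fixed order so the concatenation is deterministic.
        let mut all = c.join().unwrap();
        all.extend(d.join().unwrap());
        all.extend(k.join().unwrap());
        all
    })
}

/// Deduplicate hits pointing at the same (path, line), keeping the highest
/// score, then sort by descending score.
pub fn fuse(hits: Vec<Hit>) -> Vec<Hit> {
    let mut best: HashMap<(String, u32), Hit> = HashMap::new();
    for h in hits {
        let key = (h.path.clone(), h.line);
        let keep_new = best.get(&key).map_or(true, |old| h.score > old.score);
        if keep_new {
            best.insert(key, h);
        }
    }
    let mut out: Vec<Hit> = best.into_values().collect();
    // Ties are broken by path, then line, for stable output.
    out.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.path.cmp(&b.path))
            .then_with(|| a.line.cmp(&b.line))
    });
    out
}
```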
Step 3: Implement SufficiencyJudge with heuristic tiers
Files: src/sufficiency_judge.rs
Description: Implement heuristic-based sufficiency evaluation
Tests: Unit tests for coverage, confidence, diversity calculations
Dependencies: Step 2
Estimated: 3 hours
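One way to compute the three heuristics from Step 3. The formulas are assumptions of this sketch: coverage is the fraction of query terms found in any snippet, confidence is the best hit score, and diversity is the share of the three haystacks that returned anything. The thresholds are placeholders, in line with the plan's warning against premature threshold tuning. Uncertain is the verdict that would escalate to LlmJudge.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Sufficient,
    Insufficient,
    /// Heuristics cannot decide; escalate to LlmJudge.
    Uncertain,
}

pub struct Hit<'a> {
    pub haystack: &'a str,
    pub snippet: &'a str,
    pub score: f32,
}

/// Placeholder thresholds, to be tuned empirically.
pub struct HeuristicJudge {
    pub min_coverage: f32,
    pub min_confidence: f32,
    pub min_diversity: f32,
}

impl HeuristicJudge {
    /// Fraction of distinct (lowercased) query terms found in at least one snippet.
    pub fn coverage(query: &str, hits: &[Hit]) -> f32 {
        let terms: HashSet<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
        if terms.is_empty() {
            return 0.0;
        }
        let snippets: Vec<String> = hits.iter().map(|h| h.snippet.to_lowercase()).collect();
        let found = terms.iter().filter(|t| snippets.iter().any(|s| s.contains(t.as_str()))).count();
        found as f32 / terms.len() as f32
    }

    /// Highest score among hits (0.0 when there are none).
    pub fn confidence(hits: &[Hit]) -> f32 {
        hits.iter().map(|h| h.score).fold(0.0, f32::max)
    }

    /// Distinct haystacks that contributed, out of the three searched.
    pub fn diversity(hits: &[Hit]) -> f32 {
        let sources: HashSet<&str> = hits.iter().map(|h| h.haystack).collect();
        sources.len() as f32 / 3.0
    }

    pub fn judge(&self, query: &str, hits: &[Hit]) -> Verdict {
        if hits.is_empty() {
            return Verdict::Insufficient;
        }
        let cov = Self::coverage(query, hits);
        if cov == 0.0 {
            return Verdict::Insufficient;
        }
        if cov >= self.min_coverage
            && Self::confidence(hits) >= self.min_confidence
            && Self::diversity(hits) >= self.min_diversity
        {
            Verdict::Sufficient
        } else {
            Verdict::Uncertain
        }
    }
}
```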
Step 4: Implement RlmSignature trait and implementations
Files: src/signatures.rs
Description: Define trait and implement SearchResult, Answer, ConceptExtraction signatures
Tests: Unit tests for parsing
Dependencies: Step 1
Estimated: 2 hours
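A minimal shape for the RlmSignature trait: each signature supplies the output instructions to append to the prompt and a parser for the reply. The line format ("term | synonym, synonym") is an assumption made to keep the sketch dependency-free; structured JSON output is a common alternative. ConceptExtraction and NewConcept reuse names from the plan, but their fields are hypothetical.

```rust
/// A typed contract between caller and RLM: what to ask for and how to parse it.
pub trait RlmSignature {
    type Output;
    /// Instructions appended to the RLM prompt describing the output format.
    fn instructions(&self) -> &'static str;
    /// Parse raw model output into the typed result.
    fn parse(&self, raw: &str) -> Result<Self::Output, String>;
}

#[derive(Debug, PartialEq)]
pub struct NewConcept {
    pub term: String,
    pub synonyms: Vec<String>,
}

pub struct ConceptExtraction;

impl RlmSignature for ConceptExtraction {
    type Output = Vec<NewConcept>;

    fn instructions(&self) -> &'static str {
        "List one concept per line as: term | synonym, synonym"
    }

    fn parse(&self, raw: &str) -> Result<Self::Output, String> {
        raw.lines()
            .map(str::trim)
            .filter(|l| !l.is_empty()) // tolerate blank lines
            .map(|line| {
                let (term, syns) = line.split_once('|').unwrap_or((line, ""));
                let term = term.trim();
                if term.is_empty() {
                    return Err(format!("missing term in line: {line:?}"));
                }
                let synonyms = syns
                    .split(',')
                    .map(str::trim)
                    .filter(|s| !s.is_empty())
                    .map(String::from)
                    .collect();
                Ok(NewConcept { term: term.to_string(), synonyms })
            })
            .collect() // the first Err aborts the whole parse
    }
}
```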
Step 5: Implement RLM context building
Files: src/rlm_context.rs
Description: Build RLM context from retrieved chunks + KG concepts
Tests: Unit tests for context construction
Dependencies: Step 2, Step 4
Estimated: 2 hours
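Step 5 is where the "avoid loading all files into LLM context" rule gets enforced. A sketch, assuming chunks already carry a relevance score: KG concepts go first, then chunks in descending score order until a byte budget is reached. Chunk, build_rlm_context and the prompt layout are hypothetical.

```rust
/// A retrieved chunk (illustrative fields).
pub struct Chunk {
    pub path: String,
    pub text: String,
    pub score: f32,
}

/// Build a bounded prompt context: query, then KG concepts, then the
/// highest-scoring chunks until adding the next one would exceed `max_bytes`.
pub fn build_rlm_context(query: &str, concepts: &[&str], mut chunks: Vec<Chunk>, max_bytes: usize) -> String {
    let mut ctx = format!("Query: {query}\n");
    if !concepts.is_empty() {
        ctx.push_str(&format!("KG concepts: {}\n", concepts.join(", ")));
    }
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score));
    for c in chunks {
        let block = format!("--- {}\n{}\n", c.path, c.text);
        if ctx.len() + block.len() > max_bytes {
            break; // budget exhausted; lower-scored chunks are dropped
        }
        ctx.push_str(&block);
    }
    ctx
}
```

Stopping at the first chunk that does not fit keeps the context in strict score order; a variant could skip it and keep trying smaller chunks.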
Step 6: Implement KG curation feedback loop
Files: src/kg_curation.rs
Description: RLM extracts concepts → writes to RoleGraph → rebuilds automata
Tests: Unit tests for concept extraction and graph updates
Dependencies: Step 4, Step 5
Estimated: 3 hours
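A toy model of the Step 6 write path. ConceptGraph is a hypothetical stand-in for RoleGraph, and its naive substring matcher stands in for the Aho-Corasick automata that RoleGraph rebuilds. What it illustrates is batching: new terms and synonyms are inserted, and the matcher is rebuilt once per batch and only when something actually changed.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default)]
pub struct ConceptGraph {
    terms: BTreeMap<String, u64>, // normalised term or synonym -> concept id
    next_id: u64,
    matcher: Vec<String>,
    pub rebuilds: u32,
}

impl ConceptGraph {
    /// Insert a batch of (term, synonyms); returns how many terms were new.
    pub fn add_concepts(&mut self, concepts: &[(&str, &[&str])]) -> usize {
        let mut added = 0;
        for (term, synonyms) in concepts {
            // Reuse the id if the primary term is already known.
            let id = match self.terms.get(&term.to_lowercase()) {
                Some(&id) => id,
                None => {
                    self.next_id += 1;
                    self.next_id
                }
            };
            for t in std::iter::once(term).chain(synonyms.iter()) {
                let key = t.to_lowercase();
                if !self.terms.contains_key(&key) {
                    self.terms.insert(key, id);
                    added += 1;
                }
            }
        }
        if added > 0 {
            self.rebuild_matcher(); // once per batch, not once per term
        }
        added
    }

    /// Stand-in for rebuilding the automata: snapshot all known terms.
    fn rebuild_matcher(&mut self) {
        self.matcher = self.terms.keys().cloned().collect();
        self.rebuilds += 1;
    }

    /// Sorted, deduplicated ids of concepts whose terms occur in `text`.
    pub fn find(&self, text: &str) -> Vec<u64> {
        let lower = text.to_lowercase();
        let mut ids: Vec<u64> = self
            .matcher
            .iter()
            .filter(|t| lower.contains(t.as_str()))
            .map(|t| self.terms[t])
            .collect();
        ids.sort_unstable();
        ids.dedup();
        ids
    }
}
```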
Step 7: Wire TerraphimGrep together in lib.rs
Files: src/lib.rs
Description: Integrate all components; add search() method
Tests: Integration tests
Dependencies: Steps 1-6
Estimated: 2 hours
Step 8: Add Grep subcommand to terraphim_cli
Files: crates/terraphim_cli/src/main.rs
Description: Add CLI interface compatible with rlmgrep
Tests: CLI integration tests
Dependencies: Step 7
Estimated: 3 hours
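A dependency-free sketch of the grep subcommand's argument handling, covering the flags that appear in the integration tests (--answer, -C, --haystack) and the --rlm=false rollback switch. GrepArgs and parse_grep_args are hypothetical; the real subcommand would be declared with whatever argument parser terraphim_cli already uses, and exact rlmgrep flag compatibility still has to be checked against rlmgrep itself.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct GrepArgs {
    pub pattern: String,
    pub paths: Vec<String>,
    pub answer: bool,             // --answer: ask for a synthesised answer
    pub context: usize,           // -C N: lines of context around each hit
    pub haystack: Option<String>, // --haystack NAME: restrict the search
    pub rlm: bool,                // --rlm=false disables the RLM fallback
}

pub fn parse_grep_args<I: IntoIterator<Item = String>>(args: I) -> Result<GrepArgs, String> {
    let mut out = GrepArgs { rlm: true, ..Default::default() };
    let mut it = args.into_iter();
    let mut positional = Vec::new();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--answer" => out.answer = true,
            "-C" => {
                let n = it.next().ok_or("-C needs a number")?;
                out.context = n.parse().map_err(|_| format!("invalid -C value: {n}"))?;
            }
            "--haystack" => out.haystack = Some(it.next().ok_or("--haystack needs a value")?),
            "--rlm=false" => out.rlm = false,
            "--rlm=true" => out.rlm = true,
            s if s.starts_with('-') => return Err(format!("unknown flag: {s}")),
            _ => positional.push(arg),
        }
    }
    // First positional argument is the pattern; the rest are paths.
    if positional.is_empty() {
        return Err("missing PATTERN".to_string());
    }
    out.pattern = positional.remove(0);
    out.paths = positional;
    Ok(out)
}
```

In main.rs it would be called with something like parse_grep_args(std::env::args().skip(2)), skipping the binary name and the grep subcommand.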
Step 9: Integration tests + documentation
Files: tests/terraphim_grep.rs, README
Description: Full integration tests + user documentation
Tests: Integration tests pass
Dependencies: Step 8
Estimated: 2 hours
Rollback Plan
If issues discovered:
- Disable RLM fallback via the --rlm=false flag
- Revert to pure search-only mode
- KG curation can be disabled via config flag
Feature flag: TERRAPHIM_GREP_RLM_ENABLED=false
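A sketch of how the feature flag and the CLI switch might combine. The accepted spellings ("false", "0", "no", "off") and the rule that an explicit --rlm flag overrides the environment variable are assumptions, not decisions recorded in this plan.

```rust
/// Interpret a TERRAPHIM_GREP_RLM_ENABLED value; unset means enabled.
pub fn rlm_enabled_from(value: Option<&str>) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if matches!(v.as_str(), "false" | "0" | "no" | "off") => false,
        _ => true,
    }
}

/// An explicit CLI flag (Some) wins; otherwise fall back to the environment.
pub fn rlm_enabled(cli_flag: Option<bool>) -> bool {
    let env = std::env::var("TERRAPHIM_GREP_RLM_ENABLED").ok();
    cli_flag.unwrap_or_else(|| rlm_enabled_from(env.as_deref()))
}
```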
Migration (if applicable)
No database migrations needed - this is a new tool.
Dependencies
New Dependencies
| Crate | Version | Justification |
|-------|---------|---------------|
| (none) | - | All deps come from existing terraphim crates |
Dependency Updates
(None - reusing existing crates)
Software Release Definition (SRD)
Not applicable for internal tool.
Performance Considerations
Expected Performance
| Metric | Target | Measurement |
|--------|--------|-------------|
| Search-only latency | < 500ms | Benchmark |
| Search + RLM latency | < 5s | Benchmark |
| Memory (search-only) | < 10MB | Profiling |
| Memory (with RLM) | < 100MB | Profiling |
Benchmarks to Add
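A minimal timing harness for the latency targets, using only std::time. median_latency and SEARCH_ONLY_TARGET are hypothetical names; a dedicated benchmarking crate would give better statistics, but the plan lists no new dependencies.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the median wall-clock latency
/// (the upper median when `runs` is even).
pub fn median_latency<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[runs / 2]
}

/// Search-only latency target from the performance table.
pub const SEARCH_ONLY_TARGET: Duration = Duration::from_millis(500);
```

The closure would wrap a full search-only query against a fixed corpus; the same helper applies to the search + RLM target with a 5 s threshold.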
Open Items
| Item | Status | Notes |
|------|--------|-------|
| Sufficiency threshold tuning | Pending | Needs empirical data |
| LLM client configuration UX | Pending | How should the API key be configured? |
| KG curation rate limiting | Pending | Curate on every query, or batch? |
Approval
- [ ] Technical review complete
- [ ] Test strategy approved
- [ ] Performance targets agreed
- [ ] Human approval received