Research: rlmgrep vs Terraphim — Proposal for Intelligent Hybrid Grep
Date: 2026-05-18
Goal: Design a Terraphim-powered intelligent grep that surpasses rlmgrep by leveraging hybrid search (KG + documentation + code + external haystacks) with RLM fallback and KG curation.
1. Executive Summary
rlmgrep demonstrates that natural-language code search is valuable but implements it inefficiently: it loads all files into LLM context and relies on brute-force reasoning. This is O(corpus_size) in tokens and expensive.
Terraphim already possesses superior infrastructure:
- Fast search: fff_search (frecency-ranked), ripgrep, RoleGraph (Aho-Corasick O(n))
- Relevance scoring: KgPathScorer boosts by KG concept matches
- Multi-haystack: HaystackProvider trait for code, docs, JMAP, external sources
- Structured execution: terraphim_rlm with Firecracker/Docker sandboxing
- Knowledge graph: RoleGraph with hot-reload thesaurus and learning indexer
The opportunity is to create terraphim-grep — an intelligent grep that:
- Searches first using hybrid KG + file + haystack search (deterministic, fast, cheap)
- Falls back to RLM only when search is insufficient (complex reasoning, synthesis)
- Uses RLM to curate the KG — extracting new concepts from interactions to improve future searches
- Outputs structured results with citations (like rlmgrep's --signature-json, but better)
This is the inverse of rlmgrep: search-first, RLM-last, continuously learning.
2. rlmgrep Analysis
2.1 Architecture
User Query: "Where is retry/backoff configured?"
|
v
[File Discovery] → collect_candidates() → Walk filesystem
|
v
[Ingestion] → load_files() → Convert PDF/office/image/audio to text
|
v
[Context Build] → directory: {path: full_text} + ASCII file_map
|
v
[DSPy RLM] → LLM reasons over ALL files, outputs Match{path, line}
|
v
[Verify] → Drop hallucinated line numbers
|
v
[Output] → rg-style headings with line numbers
2.2 Critical Flaw: Brute-Force Context Loading
# rlmgrep loads ENTIRE file contents into the LLM context: ALL files, ALL text
# Limits: aborts at 5,000 files
# Cost: $0.01-0.05 per query
# No learning between queries
Why this fails:
- Large repos (5K+ files) exceed context windows
- Every query re-reads the filesystem (no index)
- No relevance pre-filtering
- Expensive: LLM processes full corpus for every query
2.3 What rlmgrep Gets Right
| Feature | Value |
|---------|-------|
| Natural language queries | No regex crafting needed |
| Structured signatures | --signature-json for agent-consumable output |
| Multi-modal ingestion | PDFs, images, audio via MarkItDown |
| Sidecar caching | .filename.pdf.md avoids re-conversion |
| Grep-compatible output | rg-style headings with line numbers |
| Context lines | -C, -A, -B for surrounding context |
3. Terraphim Infrastructure Assessment
3.1 Existing Search Capabilities
| Component | Capability | Relevance |
|-----------|-----------|-----------|
| RoleGraph | Aho-Corasick O(n) concept detection; TF-IDF fallback; graph ranking | Core — fast concept matching |
| KgPathScorer | Boosts file search by KG concept matches in paths | Core — relevance ranking |
| fff_search | Fast frecency-ranked file finder; ripgrep integration | Core — file discovery |
| HaystackProvider | Uniform async search over heterogeneous backends | Core — multi-source search |
| terraphim_automata | FST autocomplete; Jaro-Winkler fuzzy search | UX — query suggestions |
| terraphim_mcp_server | Exposes search as MCP tools | Integration |
| learning_indexer | Indexes SharedLearnings into RoleGraph | Core — RLM writes back |
| terraphim_rlm | Sandboxed execution, query loop, budgets | Core — RLM fallback |
3.2 Critical Gap: LLM Bridge is a Stub
crates/terraphim_rlm/src/llm_bridge.rs lines 192-201:
// TODO: Actually call the LLM service
// For now, return a stub response
let response_text = format!(...);
Without real LLM integration, terraphim_rlm cannot participate in intelligent grep.
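One possible shape for a real bridge, sketched under assumptions: hide the provider behind a small trait so the query loop does not care whether a hosted or a local model answers, and charge every call against a token budget. The names `LlmClient`, `LlmError`, `CannedClient`, and `Bridge` are hypothetical and not taken from llm_bridge.rs, and the four-bytes-per-token estimate is a placeholder for a real tokenizer.

```rust
/// Hypothetical error type; not taken from `llm_bridge.rs`.
#[derive(Debug, PartialEq)]
pub enum LlmError {
    /// The prompt would exceed the remaining token budget.
    BudgetExceeded { requested: usize, remaining: usize },
}

/// Hypothetical provider-agnostic interface for a completion call.
pub trait LlmClient {
    fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

/// Stand-in client that always returns the same reply.
/// A real client would call a hosted or local model here.
pub struct CannedClient {
    pub reply: String,
}

impl LlmClient for CannedClient {
    fn complete(&self, _prompt: &str) -> Result<String, LlmError> {
        Ok(self.reply.clone())
    }
}

/// Crude token estimate (assumption: roughly four bytes per token).
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Wraps any client and enforces a token budget across calls.
pub struct Bridge<C: LlmClient> {
    client: C,
    remaining_tokens: usize,
}

impl<C: LlmClient> Bridge<C> {
    pub fn new(client: C, token_budget: usize) -> Self {
        Self { client, remaining_tokens: token_budget }
    }

    pub fn remaining(&self) -> usize {
        self.remaining_tokens
    }

    /// Refuse the call up front if the prompt alone exceeds the budget,
    /// otherwise charge prompt and reply tokens against it.
    pub fn query(&mut self, prompt: &str) -> Result<String, LlmError> {
        let requested = estimate_tokens(prompt);
        if requested > self.remaining_tokens {
            return Err(LlmError::BudgetExceeded {
                requested,
                remaining: self.remaining_tokens,
            });
        }
        let reply = self.client.complete(prompt)?;
        let spent = requested + estimate_tokens(&reply);
        self.remaining_tokens = self.remaining_tokens.saturating_sub(spent);
        Ok(reply)
    }
}
```

Keeping the budget check in the wrapper rather than in each client means a local client and a hosted client are metered the same way.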
3.3 What Terraphim Has That rlmgrep Lacks
| Feature | Terraphim | rlmgrep |
|---------|----------|---------|
| Pre-indexed search | RoleGraph (Aho-Corasick) + file frecency | None (re-reads every query) |
| Multi-haystack | Code + docs + external via HaystackProvider | Local files only |
| Relevance scoring | KG path scoring + graph edge weights | None (LLM sees all files) |
| Sandboxing | Firecracker VMs | Deno interpreter |
| Budget control | Token + time + recursion | max_iterations only |
| Snapshots | VM state capture/rollback | None |
| Learning | KG curation from RLM interactions | None per-query |
| MCP integration | Claude/Cursor tools | None |
4. Proposed Architecture: terraphim-grep
4.1 Philosophy
Search first, RLM last. Fast deterministic search handles 80-90% of queries. RLM is invoked only for synthesis, evaluation, or concept discovery — and writes back to the KG to improve future searches.
This is the inverse of rlmgrep's "RLM first, brute force" approach.
4.2 System Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ terraphim-grep │
│ Intelligent Hybrid Search with RLM Fallback │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ User Query: "Where is retry/backoff configured and what are the defaults?" │
│ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1: HYBRID SEARCH │ │
│ │ (Deterministic, fast, zero LLM tokens) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────┼─────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Code │ │ Docs │ │ KG Query │ │
│ │ Search │ │ Search │ │ │ │
│ │ │ │ │ │ │ │
│ │fff_search │ │haystack_ │ │RoleGraph │ │
│ │+ ripgrep │ │jmap │ │Aho- │ │
│ │+ KgPath │ │haystack_ │ │Corasick │ │
│ │Scorer │ │grepapp │ │+ TF-IDF │ │
│ │ │ │ │ │fallback │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
│ └─────────────────────┼─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2: RESULT FUSION & RANKING │ │
│ │ (Merge + deduplicate + re-rank by KG edge weights) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3: SUFFICIENCY JUDGE │ │
│ │ "Do these results answer the query?" │ │
│ │ │ │
│ │ Tier 1: Heuristic (free) │ │
│ │ - Coverage: all query terms found? │ │
│ │ - Confidence: KG matches exceed threshold? │ │
│ │ - Diversity: results from multiple haystacks? │ │
│ │ │ │
│ │ Tier 2: LLM Judge ($0.001, 10% of queries) │ │
│ │ - "Given these results, can we answer the query?" │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ │
│ │Sufficient│ │NeedsSynth│ │NeedsMore │ │
│ │ │ │ │ │ │ │
│ │Return │ │RLM w/ │ │Expand │ │
│ │results │ │context │ │search │ │
│ └──────────┘ └────┬─────┘ └──────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 4: RLM SYNTHESIS (if needed) │ │
│ │ │ │
│ │ Input (NOT full corpus!): │ │
│ │ - Top-K retrieved chunks (relevant passages only) │ │
│ │ - KG concept map (relevant nodes/edges) │ │
│ │ - Source metadata (for citations) │ │
│ │ - Original query │ │
│ │ │ │
│ │ Output: │ │
│ │ - Synthesised answer │ │
│ │ - Citations with file:line references │ │
│ │ - Confidence score │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 5: KG CURATION │ │
│ │ │ │
│ │ RLM extracts new concepts from query + answer: │ │
│ │ - New concept: "retry configuration" │ │
│ │ - Synonyms: "backoff", "retry policy", "exponential backoff" │ │
│ │ - Relationships: "retry" → "tokio::time" → "Duration" │ │
│ │ │ │
│ │ Writes to RoleGraph → rebuilds automata │ │
│ │ Future queries for "retry" are now faster │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.3 Component Specifications
4.3.1 Hybrid Search Orchestrator
Key difference from rlmgrep:
- rlmgrep: O(corpus_size) tokens — loads ALL files into LLM
- HybridSearcher: O(K) tokens — retrieves only top-K relevant chunks
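A minimal sketch of the fusion step, assuming a synchronous stand-in for the search backends: every haystack is queried, duplicates are collapsed to the best-scoring hit per (path, line), and only the top K survive. The `Haystack` trait, `Hit` fields, and `StaticHaystack` are hypothetical simplifications; the real HaystackProvider trait is async and the proposal calls for running the searches in parallel, which this sequential version does not attempt.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// One retrieved passage. Field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub path: String,
    pub line: usize,
    pub score: f64,
    pub source: &'static str,
}

/// Hypothetical synchronous stand-in for a search backend.
pub trait Haystack {
    fn search(&self, query: &str) -> Vec<Hit>;
}

/// Test double that returns a fixed list of hits for any query.
pub struct StaticHaystack {
    pub hits: Vec<Hit>,
}

impl Haystack for StaticHaystack {
    fn search(&self, _query: &str) -> Vec<Hit> {
        self.hits.clone()
    }
}

pub struct HybridSearcher {
    haystacks: Vec<Box<dyn Haystack>>,
}

impl HybridSearcher {
    pub fn new(haystacks: Vec<Box<dyn Haystack>>) -> Self {
        Self { haystacks }
    }

    /// Query every haystack, keep the best hit per (path, line),
    /// and return at most `k` hits ordered by descending score.
    pub fn search(&self, query: &str, k: usize) -> Vec<Hit> {
        let mut best: HashMap<(String, usize), Hit> = HashMap::new();
        for haystack in &self.haystacks {
            for hit in haystack.search(query) {
                match best.entry((hit.path.clone(), hit.line)) {
                    Entry::Occupied(mut slot) => {
                        if hit.score > slot.get().score {
                            slot.insert(hit);
                        }
                    }
                    Entry::Vacant(slot) => {
                        slot.insert(hit);
                    }
                }
            }
        }
        let mut merged: Vec<Hit> = best.into_values().collect();
        // Path and line break ties so the order is deterministic.
        merged.sort_by(|a, b| {
            b.score
                .total_cmp(&a.score)
                .then_with(|| a.path.cmp(&b.path))
                .then_with(|| a.line.cmp(&b.line))
        });
        merged.truncate(k);
        merged
    }
}
```

Because only `k` hits leave this function, the amount of text that can later reach the LLM is bounded by K regardless of corpus size.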
4.3.2 Sufficiency Judge
Cost optimisation: Tier 1 is free. Tier 2 costs ~$0.001 but only runs on 10-20% of queries.
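The free Tier 1 check could look roughly like the sketch below, which scores coverage, confidence, and diversity and only reports `Uncertain` when the heuristics disagree, leaving that case for the Tier 2 LLM judge. The type names, the threshold fields, and the rule that query words of two characters or fewer are ignored are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Outcome of the heuristic tier. `Uncertain` is the only case
/// that would be escalated to the (paid) LLM judge.
#[derive(Debug, PartialEq)]
pub enum Tier1Verdict {
    Sufficient,
    NeedsMore,
    Uncertain,
}

/// One search result as seen by the judge. Field names are illustrative.
pub struct Evidence<'a> {
    pub text: &'a str,
    pub kg_score: f64,
    pub haystack: &'a str,
}

/// Hypothetical tuning knobs.
pub struct Thresholds {
    pub min_coverage: f64,
    pub min_kg_score: f64,
    pub min_haystacks: usize,
}

pub fn judge(query: &str, results: &[Evidence<'_>], t: &Thresholds) -> Tier1Verdict {
    let lowered = query.to_lowercase();
    // Split into words and drop very short ones such as "is" or "a".
    let terms: Vec<&str> = lowered
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| w.len() > 2)
        .collect();
    if terms.is_empty() || results.is_empty() {
        return Tier1Verdict::NeedsMore;
    }

    // Coverage: fraction of query terms found in at least one result.
    let texts: Vec<String> = results.iter().map(|r| r.text.to_lowercase()).collect();
    let covered = terms
        .iter()
        .filter(|term| texts.iter().any(|text| text.contains(**term)))
        .count();
    let coverage = covered as f64 / terms.len() as f64;

    // Confidence: best KG score among the results.
    let confidence = results.iter().map(|r| r.kg_score).fold(0.0, f64::max);

    // Diversity: number of distinct haystacks that contributed.
    let diversity = results
        .iter()
        .map(|r| r.haystack)
        .collect::<HashSet<_>>()
        .len();

    if coverage < t.min_coverage {
        Tier1Verdict::NeedsMore
    } else if confidence >= t.min_kg_score && diversity >= t.min_haystacks {
        Tier1Verdict::Sufficient
    } else {
        Tier1Verdict::Uncertain
    }
}
```

Only the `Uncertain` branch would cost tokens, which is what keeps the LLM judge limited to a minority of queries.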
4.3.3 Contextual RLM
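When search is insufficient, RLM receives pre-filtered context rather than the full corpus: the top-K retrieved chunks, the relevant KG concepts, source metadata for citations, and the original query.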
Opposite of rlmgrep: Instead of "here are all files, find the answer", it's "here are the most relevant passages, synthesise the answer".
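As a rough illustration of context window management, the sketch below packs the highest-scoring chunks into a prompt until a token budget is reached, tagging each chunk with its path and line so the answer can cite it. `Chunk`, `build_context`, and the four-bytes-per-token estimate are hypothetical; a real implementation would use the model's tokenizer and would also include the KG concept map.

```rust
/// A retrieved passage with its source location. Illustrative fields.
pub struct Chunk {
    pub path: String,
    pub line: usize,
    pub text: String,
    pub score: f64,
}

/// Crude token estimate (assumption: roughly four bytes per token).
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Build the RLM prompt from the query and the best chunks that fit
/// into `token_budget`. Chunks are added in descending score order and
/// packing stops at the first chunk that would overflow the budget.
pub fn build_context(query: &str, mut chunks: Vec<Chunk>, token_budget: usize) -> String {
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut context = format!("Query: {query}\n");
    let mut used = estimate_tokens(&context);

    for chunk in &chunks {
        // The [path:line] tag is what a citation would point back to.
        let entry = format!("[{}:{}]\n{}\n", chunk.path, chunk.line, chunk.text);
        let cost = estimate_tokens(&entry);
        if used + cost > token_budget {
            break;
        }
        context.push_str(&entry);
        used += cost;
    }
    context
}
```

Stopping at the first overflow, instead of skipping to smaller chunks, keeps the prompt strictly ordered by relevance; either policy would respect the budget.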
4.3.4 Structured Output (rlmgrep Parity + Enhancement)
Planned signatures:
- Search result signature (rlmgrep parity)
- Answer with citations
- Concept extraction (KG curation)
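The three signatures could share a small trait, sketched here with hand-rolled JSON so the example has no dependencies. The proposal names an RlmSignature trait but does not specify its methods, so `name`, `to_json`, and every struct field below are assumptions; a real implementation would more likely derive serialisation with a library.

```rust
/// Hypothetical shape of the trait; the methods are assumptions.
pub trait RlmSignature {
    /// Short identifier for the signature.
    fn name(&self) -> &'static str;
    /// Render the value as one JSON object.
    fn to_json(&self) -> String;
}

/// Quote and escape a string for JSON output.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Search result signature.
pub struct Match {
    pub path: String,
    pub line: usize,
    pub text: String,
}

/// Answer with citations. `confidence` is assumed to be finite.
pub struct Answer {
    pub text: String,
    pub citations: Vec<Match>,
    pub confidence: f64,
}

/// Concept extraction for KG curation.
pub struct Concept {
    pub term: String,
    pub synonyms: Vec<String>,
}

impl RlmSignature for Match {
    fn name(&self) -> &'static str {
        "match"
    }
    fn to_json(&self) -> String {
        format!(
            "{{\"path\":{},\"line\":{},\"text\":{}}}",
            json_string(&self.path),
            self.line,
            json_string(&self.text)
        )
    }
}

impl RlmSignature for Answer {
    fn name(&self) -> &'static str {
        "answer"
    }
    fn to_json(&self) -> String {
        let citations: Vec<String> = self.citations.iter().map(|m| m.to_json()).collect();
        format!(
            "{{\"answer\":{},\"citations\":[{}],\"confidence\":{}}}",
            json_string(&self.text),
            citations.join(","),
            self.confidence
        )
    }
}

impl RlmSignature for Concept {
    fn name(&self) -> &'static str {
        "concept"
    }
    fn to_json(&self) -> String {
        let synonyms: Vec<String> = self.synonyms.iter().map(|s| json_string(s)).collect();
        format!(
            "{{\"term\":{},\"synonyms\":[{}]}}",
            json_string(&self.term),
            synonyms.join(",")
        )
    }
}
```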
4.3.5 KG Curation Agent
Virtuous cycle:
- Query → search finds results via existing KG
- Insufficient → RLM synthesises answer
- RLM extracts new concepts → writes to KG
- Future queries benefit from enriched KG
- Over time, RLM is needed less and less
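The write-back half of this cycle can be illustrated with a toy thesaurus: concepts extracted by the RLM are stored with their synonyms, and a later query that mentions any synonym resolves to the concept without an LLM call. `Thesaurus`, `learn`, and `concepts_in` are hypothetical names, and the substring scan stands in for the Aho-Corasick automata that RoleGraph rebuilds.

```rust
use std::collections::hash_map::Entry;
use std::collections::{BTreeSet, HashMap};

/// Toy thesaurus: maps a lowercased term to its canonical concept.
#[derive(Default)]
pub struct Thesaurus {
    synonyms: HashMap<String, String>,
}

impl Thesaurus {
    /// Record a concept together with its synonyms.
    /// Returns how many previously unknown terms were added.
    pub fn learn(&mut self, concept: &str, synonyms: &[&str]) -> usize {
        let canonical = concept.to_lowercase();
        let mut added = 0;
        for term in std::iter::once(concept).chain(synonyms.iter().copied()) {
            // Existing terms keep their first mapping.
            if let Entry::Vacant(slot) = self.synonyms.entry(term.to_lowercase()) {
                slot.insert(canonical.clone());
                added += 1;
            }
        }
        added
    }

    /// Concepts whose terms occur in `text`.
    /// Naive substring scan; a stand-in for automaton matching.
    pub fn concepts_in(&self, text: &str) -> BTreeSet<String> {
        let haystack = text.to_lowercase();
        self.synonyms
            .iter()
            .filter(|(term, _)| haystack.contains(term.as_str()))
            .map(|(_, concept)| concept.clone())
            .collect()
    }
}
```

Before `learn` runs, a query about "exponential backoff" matches nothing; afterwards it resolves to the "retry configuration" concept, which is the effect the cycle relies on.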
5. CLI Design
5.1 Interface (rlmgrep-compatible + Terraphim extensions)
- Basic search (hybrid search only, no LLM)
- With narrative answer (RLM synthesis if needed)
- Custom structured output (rlmgrep parity)
- Context lines (rlmgrep parity)
- Restrict to code only
- Include documentation
- Force RLM (skip search)
- Verbose (show search + RLM iterations)
5.2 Output Format
Search-only mode (sufficient results):
./crates/terraphim_rlm/src/retry.rs
42: pub fn with_backoff(self, backoff: ExponentialBackoff) -> Self {
43: self.backoff = Some(backoff);
44: self
./crates/terraphim_rlm/src/config.rs
89: pub retry_policy: RetryPolicy,
90: pub max_retries: u32,
RLM synthesis mode (insufficient search):
===== Answer =====
Retry/backoff is configured in two places:
1. crates/terraphim_rlm/src/retry.rs:42 — ExponentialBackoff is set via
`with_backoff()` builder method. Defaults to 3 retries with 100ms base delay.
2. crates/terraphim_rlm/src/config.rs:89 — RetryPolicy enum defines
Fixed, Exponential, and Custom strategies.
===== Matches =====
./crates/terraphim_rlm/src/retry.rs
42: pub fn with_backoff(self, backoff: ExponentialBackoff) -> Self {
...
./crates/terraphim_rlm/src/config.rs
89: pub retry_policy: RetryPolicy,
...
6. Implementation Roadmap
Phase 1: LLM Bridge (2 days)
Critical blocker. Replace the stub with real LLM clients, including a local, zero-cost option.
Phase 2: Hybrid Search (2 days)
Create HybridSearcher that parallelises code + docs + KG search.
Phase 3: Sufficiency Judge (1 day)
Two-tier judge: heuristic (free) + LLM (uncertain cases only).
Phase 4: Contextual RLM (2 days)
Build RLM context from retrieved chunks + KG concepts. Context window management.
Phase 5: Structured Signatures (1 day)
RlmSignature trait + implementations for Match, Answer, Concept extraction.
Phase 6: KG Curation (2 days)
RLM extracts concepts from interactions, writes to RoleGraph, rebuilds automata.
Phase 7: CLI + Integration (2 days)
terraphim-grep binary with rlmgrep-compatible interface.
Total: 12 days
7. Cost Analysis
| Phase | KG Lookups | LLM Calls | Cost | Latency | Frequency |
|-------|-----------|-----------|------|---------|-----------|
| Search-only | 2-4 | 0 | $0.0001 | 0.1s | 80% |
| Search expansion | 4-8 | 0 | $0.0002 | 0.5s | 15% |
| RLM synthesis | 4-8 | 1 | $0.005 | 5s | 4% |
| KG curation | 4-8 | 2 | $0.01 | 10s | 1% |
| Average | - | - | $0.001 | 0.5s | - |
vs rlmgrep:
| Metric | rlmgrep | terraphim-grep |
|--------|---------|----------------|
| Cost/query | $0.01-0.05 | $0.001 (10-50x cheaper) |
| Latency | 15-30s | 0.1-5s (3-300x faster) |
| Corpus size | Limited by context window | Unlimited (search index) |
| Learning | None per-query | KG enriches over time |
| Multi-haystack | Local files only | Code + docs + external |
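The Average row of the per-phase table can be checked by weighting each phase's cost and latency by its frequency. The sketch below does that arithmetic with the figures from the table; the `Phase` struct and function names are illustrative only.

```rust
/// One row of the per-phase cost table.
pub struct Phase {
    pub name: &'static str,
    pub cost_usd: f64,
    pub latency_s: f64,
    /// Share of queries that end in this phase (0.0 to 1.0).
    pub frequency: f64,
}

/// The four phases with the values stated in the cost table.
pub fn table_phases() -> Vec<Phase> {
    vec![
        Phase { name: "Search-only", cost_usd: 0.0001, latency_s: 0.1, frequency: 0.80 },
        Phase { name: "Search expansion", cost_usd: 0.0002, latency_s: 0.5, frequency: 0.15 },
        Phase { name: "RLM synthesis", cost_usd: 0.005, latency_s: 5.0, frequency: 0.04 },
        Phase { name: "KG curation", cost_usd: 0.01, latency_s: 10.0, frequency: 0.01 },
    ]
}

/// Frequency-weighted (cost, latency) per query.
pub fn expected(phases: &[Phase]) -> (f64, f64) {
    let cost: f64 = phases.iter().map(|p| p.cost_usd * p.frequency).sum();
    let latency: f64 = phases.iter().map(|p| p.latency_s * p.frequency).sum();
    (cost, latency)
}
```

With these inputs the weighted cost is 0.00008 + 0.00003 + 0.0002 + 0.0001 = $0.00041 and the weighted latency is 0.08 + 0.075 + 0.2 + 0.1 = 0.455 s, so the table's $0.001 and 0.5s read as rounded-up figures. The two LLM-backed phases account for about three quarters of the expected cost despite covering only 5% of queries.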
8. Comparison Matrix
| Dimension | rlmgrep | terraphim-grep (proposed) |
|-----------|---------|---------------------------|
| Search strategy | RLM brute-force | Hybrid search first, RLM fallback |
| Token complexity | O(corpus_size) | O(retrieved_chunks) |
| Indexing | None | RoleGraph + file frecency |
| Multi-modal | MarkItDown PDF/office/image/audio | Port MarkItDown + sidecar cache |
| Sandbox | Deno interpreter | Firecracker VM / Docker |
| Structured output | DSPy Signatures | RlmSignature trait |
| KG integration | None | Bidirectional: read + write |
| Learning | None | Concepts extracted and indexed |
| Budget control | max_iterations | Token + time + recursion |
| Citation | Manual verification | Automatic source attribution |
| MCP tools | None | Full MCP server integration |
9. Conclusion
rlmgrep proves that LLM-powered code search is valuable but implements it inefficiently. Terraphim has all the pieces for a superior architecture:
- Fast search: fff_search, ripgrep, RoleGraph Aho-Corasick
- Relevance ranking: KgPathScorer, graph edge weights
- Sandboxed execution: terraphim_rlm with Firecracker/Docker
- Structured APIs: MCP server, typed search results
- Extensible KG: RoleGraph with hot-reload thesaurus
The missing pieces are:
- Real LLM bridge (replace stub)
- Hybrid search orchestrator
- Sufficiency judge
- Structured RLM signatures
- KG curation feedback loop
These can be built incrementally on top of existing crates, creating terraphim-grep that is faster, cheaper, and more accurate than rlmgrep while continuously improving its knowledge graph.
The virtuous cycle:
- Day 1: 80% search-only, 20% RLM fallback
- Day 30: 95% search-only, 5% RLM fallback
- Day 90: 99% search-only, 1% RLM (novel queries only)
Over time, the system becomes cheaper and faster, the opposite of rlmgrep, which costs the same for every query.
References
- https://github.com/halfprice06/rlmgrep (rlmgrep repository)
- crates/terraphim_rlm/src/llm_bridge.rs (stubbed LLM bridge, line 192)
- crates/terraphim_rolegraph/src/lib.rs (RoleGraph implementation)
- crates/haystack_core/src/lib.rs (HaystackProvider trait)
- crates/terraphim_file_search/src/kg_scorer.rs (KgPathScorer)
- crates/terraphim_mcp_server/src/lib.rs (MCP server with search tools)
- .docs/ARCHITECTURE-build-runner-llm.md (existing KG-first architecture)