Grep That Knows When to Ask an LLM
You search a million-line codebase for "where is retry configured." Ripgrep prints thirty hits. None of them are obviously the right one. You read all thirty. Five minutes gone.
Terraphim Grep does something different. It runs an fff-search code scan and a knowledge-graph concept lookup in parallel. A sufficiency judge looks at coverage, diversity, and KG confidence, and decides whether the local retrievers found enough signal. If they did, you get the chunks. If they did not, an LLM synthesises a cited answer.
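To make the sufficiency decision concrete, here is a minimal sketch of a judge that combines the three signals named above. The struct, the enum, and all three thresholds are hypothetical illustrations, not Terraphim's actual types or tuning; the real judge may weight the signals differently.

```rust
/// Hypothetical summary of what the local retrievers found.
#[derive(Debug, Clone, Copy)]
struct RetrievalSignals {
    /// Fraction of query terms matched by at least one chunk (0.0..=1.0).
    coverage: f64,
    /// Number of distinct files among the returned chunks (a simple diversity proxy).
    distinct_files: usize,
    /// Best knowledge-graph concept match confidence (0.0..=1.0).
    kg_confidence: f64,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Local results are good enough: return the chunks as-is.
    Sufficient,
    /// Local results are weak: hand off to LLM synthesis.
    NeedsSynthesis,
}

// Illustrative thresholds only; these values are assumptions.
const MIN_COVERAGE: f64 = 0.6;
const MIN_DISTINCT_FILES: usize = 2;
const MIN_KG_CONFIDENCE: f64 = 0.5;

/// Returns `Sufficient` only when every signal clears its threshold.
fn judge(s: RetrievalSignals) -> Verdict {
    let enough = s.coverage >= MIN_COVERAGE
        && s.distinct_files >= MIN_DISTINCT_FILES
        && s.kg_confidence >= MIN_KG_CONFIDENCE;
    if enough {
        Verdict::Sufficient
    } else {
        Verdict::NeedsSynthesis
    }
}
```

In this sketch a single weak signal is enough to trigger synthesis. A weighted score would be an equally plausible design; the all-must-pass rule is simply the easiest one to reason about.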
The CLI works in three modes, picked entirely from environment variables and role config -- no code changes needed:
# Search-only -- no LLM, no API key required
# OpenRouter free model -- export key, run with --answer
# Local Ollama
The LLM wiring goes through terraphim_service::llm::build_llm_from_role -- the same entry point the server, TUI, and RLM use. Whether routing through capability extraction kicks in is a role config decision (llm_router_enabled = true), not something grep itself knows about. Every consumer of the LlmClient trait now goes through one place.
What is new in this release
- Your knowledge tops the results. fff-search returns a uniform relevance_score = 1.0 per match, so on its own it cannot order them. The new boost_chunks_with_kg re-ranks chunks: files whose source path or content matches your thesaurus concepts move to the top. The boost is reflected in the chunk's score in the JSON output, so you can see why something ranked where it did.
- End-to-end hybrid pipeline: fff-search code chunks + KG concepts + KG boost + sufficiency judging + LLM synthesis with citations.
- Graceful degradation: no LLM configured? You still get chunks. force_rlm = true still fails fast.
- Four-layer test pyramid, zero mocks: inline unit tests, router capability assertions, live OpenRouter free-model smoke (#[ignore]), and a manual quality gate.
- Criterion benchmarks for code_only, hybrid_with_kg, fuse_and_rank, and kg_boost_overhead. First numbers: hybrid latency flat at ~3.2 ms across thesaurus sizes 10..10,000; KG boost adds under 25 us at typical scale.
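
The KG boost described in the first item can be sketched roughly as follows. This is not the actual boost_chunks_with_kg implementation: the Chunk struct, the boost_chunks function, the case-insensitive substring matching, and the per-concept boost value are all assumptions made for illustration.

```rust
/// Hypothetical chunk shape; the real Terraphim types may differ.
#[derive(Debug, Clone)]
struct Chunk {
    path: String,
    content: String,
    score: f64,
}

/// Assumed boost added per matching thesaurus concept (illustrative value).
const BOOST_PER_CONCEPT: f64 = 0.5;

/// Raises the score of chunks whose path or content mentions a thesaurus
/// concept, then sorts by descending score. The sort is stable, so chunks
/// with equal scores keep their original relative order.
fn boost_chunks(mut chunks: Vec<Chunk>, concepts: &[&str]) -> Vec<Chunk> {
    for chunk in &mut chunks {
        let path = chunk.path.to_lowercase();
        let content = chunk.content.to_lowercase();
        // Count how many distinct concepts appear in the path or the content.
        let hits = concepts
            .iter()
            .map(|c| c.to_lowercase())
            .filter(|c| path.contains(c.as_str()) || content.contains(c.as_str()))
            .count();
        chunk.score += BOOST_PER_CONCEPT * hits as f64;
    }
    chunks.sort_by(|a, b| b.score.total_cmp(&a.score));
    chunks
}
```

Because every chunk starts at the same score, the concept hits alone decide the order, and writing the boosted value back into score is what makes the ranking inspectable in the output.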
Full design rationale and benchmark walkthrough in the long-form post.