FFF Search Integration -- Capability Statement
Summary
Knowledge-graph-augmented file search exposed via the Model Context Protocol (MCP). Combines fuzzy matching, frecency scoring, and Aho-Corasick multi-pattern content search with domain-aware result ranking from the Terraphim knowledge graph.
Capability Details
| Attribute | Detail |
|-----------|--------|
| Domain | File search, content search |
| Interface | MCP (Model Context Protocol) via rmcp 0.9 |
| Language | Rust |
| Dependencies | fff-search (Aho-Corasick, frecency), terraphim_file_search (KG scoring), terraphim_automata (thesaurus) |
| Persistence | LMDB via heed crate (frecency), filesystem JSON (thesaurus) |
| Thread safety | Arc<RwLock<>> for shared state, parking_lot::RwLock for hot-reload |
Exposed MCP Tools
terraphim_find_files
Fuzzy file path search with knowledge-graph boost.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| query | string | Yes | - | Fuzzy search string |
| path | string | No | "." | Root directory |
| limit | integer | No | 20 | Maximum results |
Scoring: Over-fetches 4x, adds KG path score, re-sorts by combined score.
terraphim_grep
Single-pattern content search with KG-ordered files and cursor pagination.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| query | string | Yes | - | Search pattern |
| path | string | No | "." | Root directory |
| limit | integer | No | 50 | Maximum matches |
| output_mode | "content" or "files" | No | "content" | Output format |
| cursor | string | No | - | Base64 pagination token |
Scoring: Files pre-sorted by KG path score before searching.
terraphim_multi_grep
Multi-pattern OR content search using SIMD-accelerated Aho-Corasick matching.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| patterns | array of strings | Yes | - | OR-patterns |
| path | string | No | "." | Root directory |
| constraints | string | No | "" | File constraints (*.rs !test/) |
| limit | integer | No | 50 | Maximum matches |
| output_mode | "content" or "files" | No | "content" | Output format |
| cursor | string | No | - | Base64 pagination token |
Scoring: Same KG pre-sort as terraphim_grep. Aho-Corasick matches all patterns in a single SIMD pass per file.
Knowledge Graph Integration
KgPathScorer
Implements ExternalScorer trait from fff-search. Scores file paths by running them through Aho-Corasick automata built from the domain thesaurus.
Scoring formula: min(unique_concepts_matched * weight_per_term, max_boost)
Default configuration:
weight_per_term: 5max_boost: 30
KgWatcher
Filesystem watcher for hot-reload of thesaurus JSON files.
- Debounce: 500ms
- Atomic swap via
parking_lot::RwLock - No restart required
Frecency System
Scoring Algorithm
Exponential decay combining frequency and recency:
frecency(file) = sum(exp(-lambda * days_ago)) for each accessWith diminishing returns: min(f, 10 + sqrt(max(0, f - 10))) for f > 10.
Decay Profiles
| Profile | Lambda | Half-life | Retention | |---------|--------|-----------|-----------| | Normal | ln(2)/10 | 10 days | 30 days | | AI | ln(2)/3 | 3 days | 7 days |
Storage
- LMDB environment: 24 MiB
- Keys: BLAKE3 hash of file path (32 bytes)
- Values:
VecDeque<u64>(timestamped access log) - GC: Background thread purges expired entries and compacts
Configuration
Environment variable FFF_FRECENCY_PATH sets the LMDB database location.
Pagination
Stateless cursor-based pagination using Base64-encoded file offsets.
- Cursor format: URL-safe Base64 (no padding) encoding of file index integer
next_cursorreturned in content when more results exist- File-based granularity (not match-based)
- Ephemeral: invalidated by filesystem changes between pages
Performance Characteristics
| Operation | Characteristic | |-----------|---------------| | KG path scoring | O(n) in path length per file, regardless of thesaurus size | | Aho-Corasick multi-pattern | SIMD-accelerated single pass per file | | Frecency lookup | O(1) LMDB read via BLAKE3 key | | File scan | Synchronous per request, no caching between calls | | Max file size | 10 MiB | | Max matches per file | 200 |
Implementation Completeness
| Component | Status | |-----------|--------| | MCP tool interface (3 tools) | Complete | | KG path scoring | Complete | | KG hot-reload | Complete | | Aho-Corasick multi-pattern grep | Complete | | Cursor pagination | Complete | | Frecency persistence (LMDB) | Initialised | | Frecency wired to FilePicker scoring | Pending | | AI-mode frecency profile activation | Pending |
Integration Points
- MCP server:
crates/terraphim_mcp_server/src/lib.rs - KG scorer:
crates/terraphim_file_search/src/kg_scorer.rs - KG watcher:
crates/terraphim_file_search/src/watcher.rs - Scorer config:
crates/terraphim_file_search/src/config.rs - Tests:
crates/terraphim_mcp_server/tests/test_find_files.rs - Benchmarks:
crates/terraphim_file_search/benches/kg_scoring.rs