MCP File Context Tools

Enhanced Model Context Protocol (MCP) tools for file-based context management, providing line-numbered references and semantic code search.

Overview

Terraphim's MCP server exposes tools for working with file context. They resemble Conare AI's "@" file referencing and add semantic capabilities through knowledge graph integration.

Available Tools

1. `extract_paragraphs_from_automata`

Extract paragraphs from text starting at matched terms, with line numbers.

Purpose: Get context around specific concepts or code elements with precise line references.

Parameters:

text (string): The text content to search
term (string): The term to find and extract context around
max_paragraphs (number, optional): Maximum paragraphs to return (default: 3)

Returns:

{
  "paragraphs": [
    {
      "content": "paragraph text...",
      "start_line": 42,
      "end_line": 48,
      "term_position": 42
    }
  ]
}

Example Usage:

// Via MCP in Claude Desktop
const result = await mcp.callTool("extract_paragraphs_from_automata", {
  text: fileContents,
  term: "async fn",
  max_paragraphs: 2
});

// Returns paragraphs containing "async fn" with line numbers

Use Cases:

Extract function definitions with line references
Get context around specific patterns or concepts
Reference code snippets in documentation with accurate line numbers
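
A returned paragraph can be turned into a `path:start-end` reference of the kind used in the workflow examples. The following is a minimal sketch, not part of the Terraphim API: the `Paragraph` struct mirrors the JSON fields shown above, and `format_reference` is a hypothetical helper name.

```rust
/// Hypothetical mirror of one entry in the `paragraphs` array shown above.
pub struct Paragraph {
    pub content: String,
    pub start_line: usize,
    pub end_line: usize,
    pub term_position: usize,
}

/// Builds a `path:start-end` reference.
/// Collapses to `path:line` when the paragraph spans a single line.
pub fn format_reference(path: &str, p: &Paragraph) -> String {
    if p.start_line == p.end_line {
        format!("{}:{}", path, p.start_line)
    } else {
        format!("{}:{}-{}", path, p.start_line, p.end_line)
    }
}
```

With the sample values above (`start_line` 42, `end_line` 48), this yields a reference ending in `:42-48`.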

2. `search` (Enhanced with File Context)

Search the knowledge graph; results include document content, file paths, and line counts.

Purpose: Semantic search that returns full document context with file paths.

Parameters:

query (string): Search term or phrase
role (string, optional): Role context for search
limit (number, optional): Maximum results (default: 10)
skip (number, optional): Results to skip for pagination

Returns:

{
  "documents": [
    {
      "id": "doc-123",
      "url": "file:///path/to/file.rs",
      "body": "full file contents...",
      "description": "Brief summary",
      "rank": 0.95,
      "line_count": 150
    }
  ]
}

Enhanced Features:

Returns url field with file paths
Includes full body content for local files
Provides rank for relevance sorting
Adds line_count for context size estimation
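
The `limit` and `skip` parameters together describe one page of ranked results. The sketch below shows one plausible reading of that behavior, assuming results are ordered by descending `rank` before paging; the source does not confirm the server's ordering, and `Hit` and `page` are hypothetical names.

```rust
/// Hypothetical, trimmed-down search hit: only the fields needed for paging.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: String,
    pub rank: f64,
}

/// Sorts hits by descending rank, then applies `skip` and `limit`.
/// Assumption: the server ranks before it paginates.
pub fn page(mut hits: Vec<Hit>, skip: usize, limit: usize) -> Vec<Hit> {
    hits.sort_by(|a, b| b.rank.total_cmp(&a.rank));
    hits.into_iter().skip(skip).take(limit).collect()
}
```

Under this reading, `skip: 10, limit: 10` would return the second page of ten results.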

Example Usage:

const results = await mcp.callTool("search", {
  query: "async cancellation pattern",
  role: "Context Engineer",
  limit: 5
});

// Returns documents about async cancellation with file paths
// Claude can then extract specific lines or functions

3. `autocomplete_terms`

Autocomplete search terms from knowledge graph.

Purpose: Discover related concepts and get term suggestions.

Parameters:

query (string): Prefix or term to autocomplete
limit (number, optional): Maximum suggestions (default: 10)
role (string, optional): Role context

Returns:

{
  "suggestions": [
    {
      "term": "async-patterns",
      "normalized_term": "async patterns",
      "id": 12345,
      "url": "file:///docs/vibe-rules/rust/async-patterns.md",
      "score": 0.98
    }
  ]
}

Use Cases:

Discover related concepts while typing
Find synonyms and related terms
Navigate knowledge graph interactively
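
To make the prefix lookup concrete, here is a simplified sketch that uses a sorted slice and binary search. It is a stand-in for illustration only and is not the index structure the server uses; `complete` is a hypothetical function name, and scoring and fuzzy matching are left out.

```rust
/// Returns up to `limit` terms that start with `prefix`.
/// `sorted_terms` must be sorted ascending (byte-wise), as `sort()` produces.
pub fn complete<'a>(sorted_terms: &[&'a str], prefix: &str, limit: usize) -> Vec<&'a str> {
    // First index whose term is >= prefix. In a sorted slice, all terms
    // sharing the prefix form one contiguous run starting here.
    let start = sorted_terms.partition_point(|t| *t < prefix);
    sorted_terms[start..]
        .iter()
        .take_while(|t| t.starts_with(prefix))
        .take(limit)
        .copied()
        .collect()
}
```

The binary search keeps the lookup logarithmic in the number of terms, plus the cost of walking the matching run.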

4. `autocomplete_with_snippets`

Autocomplete with code/documentation snippets.

Purpose: Get term suggestions with preview snippets.

Parameters:

query (string): Search prefix
limit (number, optional): Maximum results
role (string, optional): Role context

Returns:

{
  "suggestions": [
    {
      "term": "tokio::spawn",
      "snippet": "async fn example() {\n    tokio::spawn(async { ... });\n}",
      "url": "file:///docs/vibe-rules/rust/async-patterns.md",
      "start_line": 42
    }
  ]
}

Use Cases:

Preview code patterns before inserting
See usage examples during autocomplete
Learn API signatures interactively
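
A client that already holds the file contents could cut a preview out of them using the returned `start_line`. This is a rough sketch under that assumption; `snippet_at` and its `max_lines` parameter are hypothetical and do not describe how the server builds its `snippet` field.

```rust
/// Returns up to `max_lines` lines of `text`, starting at the
/// 1-indexed `start_line`. A `start_line` of 0 is treated as 1.
pub fn snippet_at(text: &str, start_line: usize, max_lines: usize) -> String {
    text.lines()
        .skip(start_line.saturating_sub(1))
        .take(max_lines)
        .collect::<Vec<_>>()
        .join("\n")
}
```

A `start_line` past the end of the text produces an empty string rather than an error.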

5. `find_matches`

Find all concept matches in text with positions.

Purpose: Identify concepts/terms in code or documentation.

Parameters:

text (string): Text to analyze
role (string, optional): Role for knowledge graph context

Returns:

{
  "matches": [
    {
      "term": "tokio",
      "normalized_term": "tokio",
      "start_position": 150,
      "end_position": 155,
      "line_number": 12,
      "concept_id": 42
    }
  ]
}

Use Cases:

Analyze code for known patterns
Tag documentation with concepts
Build code-to-concept mappings
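
The `line_number` field can be related to `start_position` by counting newlines. The sketch below assumes positions are byte offsets into the input text, which the source does not state; `line_number_at` is a hypothetical helper.

```rust
/// Converts a byte offset into a 1-indexed line number by counting
/// the newline bytes that precede it.
/// Returns `None` when `offset` is past the end of `text`.
pub fn line_number_at(text: &str, offset: usize) -> Option<usize> {
    if offset > text.len() {
        return None;
    }
    let newlines = text.as_bytes()[..offset]
        .iter()
        .filter(|&&b| b == b'\n')
        .count();
    Some(newlines + 1)
}
```

Working on bytes avoids slicing the string in the middle of a multi-byte UTF-8 character.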

6. `is_all_terms_connected_by_path`

Check if terms are related in the knowledge graph.

Purpose: Verify semantic relationships between concepts.

Parameters:

terms (array of strings): Terms to check connectivity

Returns:

{
  "connected": true,
  "path": ["tokio", "async", "spawn"],
  "path_length": 2
}

Use Cases:

Verify that code uses related concepts
Find semantic gaps in documentation
Validate tag consistency
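
One way to think about the connectivity check is as graph reachability. The sketch below tests whether all terms fall in the same connected component of an undirected graph using breadth-first search. This is a simplification: the actual tool may require a single path that visits every term, and `all_connected` and its edge-list input are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns true when every term is reachable from the first one
/// over the undirected edges in `edges`.
pub fn all_connected(edges: &[(&str, &str)], terms: &[&str]) -> bool {
    let first = match terms.first() {
        Some(&f) => f,
        None => return true, // An empty set of terms is trivially connected.
    };
    // Build an adjacency list with each edge stored in both directions.
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in edges {
        adjacency.entry(a).or_default().push(b);
        adjacency.entry(b).or_default().push(a);
    }
    // Breadth-first search from the first term.
    let mut seen: HashSet<&str> = HashSet::from([first]);
    let mut queue: VecDeque<&str> = VecDeque::from([first]);
    while let Some(node) = queue.pop_front() {
        for &next in adjacency.get(node).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    terms.iter().all(|t| seen.contains(t))
}
```

With edges `tokio - async` and `async - spawn`, the terms `tokio`, `async`, and `spawn` are reported as connected, matching the sample response above.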

Workflow Examples

Example 1: Find and Reference Code Pattern

User: "Show me how to handle async cancellation in Rust"

Claude's Workflow:

// 1. Search for relevant documents
const searchResults = await mcp.callTool("search", {
  query: "async cancellation",
  role: "Context Engineer",
  limit: 3
});

// 2. Extract specific pattern with line numbers
const doc = searchResults.documents[0];
const paragraphs = await mcp.callTool("extract_paragraphs_from_automata", {
  text: doc.body,
  term: "tokio::select",
  max_paragraphs: 1
});

// 3. Return with precise reference
console.log(`Found pattern at ${doc.url}:${paragraphs.paragraphs[0].start_line}`);

Claude's Response:

Here's the recommended async cancellation pattern from docs/vibe-rules/rust/async-patterns.md:42-48:

tokio::select! {
    result = long_task() => {
        handle_result(result);
    }
    _ = shutdown.recv() => {
        cleanup().await;
    }
}

Example 2: Interactive Code Completion

User: Typing "async" in editor

Claude's Workflow:

// 1. Get autocomplete suggestions with snippets
const suggestions = await mcp.callTool("autocomplete_with_snippets", {
  query: "async",
  limit: 5
});

// 2. Show suggestions to user
// User selects "async-cancellation"

// 3. Get full context
const context = await mcp.callTool("search", {
  query: "async cancellation pattern",
  limit: 1
});

Result: Full pattern with explanation inserted into editor.

Example 3: Code Review with Concept Analysis

User: "Review this code for async best practices"

Claude's Workflow:

// 1. Find all async-related concepts in code
const matches = await mcp.callTool("find_matches", {
  text: userCode,
  role: "Context Engineer"
});

// 2. For each match, check if it follows patterns
for (const match of matches.matches) {
  // 3. Search for related best practices
  const practices = await mcp.callTool("search", {
    query: match.term,
    role: "Context Engineer"
  });

  // 4. Compare code to best practice
  // Report violations or confirm compliance
}

Claude's Response:

I found 3 async patterns in your code:

Line 42: tokio::spawn - ✅ Follows best practice (see async-patterns.md:15)

Line 67: Unbounded channel - ⚠️ Consider using bounded channel (see async-patterns.md:45)

Line 89: No cancellation handling - ❌ Missing cancellation (see async-patterns.md:78)

Implementation Details

Line Number Tracking

Line numbers are calculated during paragraph extraction:

pub struct ParagraphResult {
    pub content: String,
    pub start_line: usize,
    pub end_line: usize,
    pub term_position: usize,
}

pub fn extract_paragraphs_from_automata(
    text: &str,
    term: &str,
    max_paragraphs: usize,
) -> Vec<ParagraphResult> {
    let lines: Vec<&str> = text.lines().collect();
    let mut results = Vec::new();

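    for (line_num, line) in lines.iter().enumerate() {
        // Stop once the caller's limit is reached
        if results.len() >= max_paragraphs {
            break;
        }
        if line.contains(term) {
            // Take up to two lines before and two lines after the match
            let start = line_num.saturating_sub(2);
            let end = (line_num + 3).min(lines.len());

            results.push(ParagraphResult {
                content: lines[start..end].join("\n"),
                start_line: start + 1,  // 1-indexed
                end_line: end,          // `end` is exclusive, so it equals the 1-indexed last line
                term_position: line_num + 1,
            });
        }
    }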

    results
}

File Path Resolution

Documents include url field with file:// URLs:

pub struct IndexedDocument {
    pub id: String,
    pub url: String,  // "file:///path/to/file.rs"
    pub body: String,
    pub description: String,
    pub rank: f64,
}

The url field takes one of the following forms, depending on the source:

Local files: file:///absolute/path/to/file.rs
Remote URLs: https://example.com/doc.html
Relative paths: Resolved relative to workspace root
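
The three cases above can be sketched as a single mapping function. This is an illustration only, assuming Unix-style paths and no percent-encoding; `to_url` and the `workspace_root` parameter are hypothetical names, not part of the Terraphim codebase.

```rust
use std::path::Path;

/// Maps a document location to the URL form used in the `url` field.
/// Assumes Unix-style paths; Windows drive letters are not handled.
pub fn to_url(location: &str, workspace_root: &Path) -> String {
    // Remote URLs and existing file:// URLs pass through unchanged.
    if location.starts_with("http://")
        || location.starts_with("https://")
        || location.starts_with("file://")
    {
        return location.to_string();
    }
    let path = Path::new(location);
    let absolute = if path.is_absolute() {
        path.to_path_buf()
    } else {
        // Relative paths are resolved against the workspace root.
        workspace_root.join(path)
    };
    // No percent-encoding: acceptable for plain ASCII paths only.
    format!("file://{}", absolute.display())
}
```

For example, `to_url("src/main.rs", Path::new("/workspace"))` produces `file:///workspace/src/main.rs` on a Unix-like system.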

Configuration

Enable file context tools in MCP server:

{
  "mcpServers": {
    "terraphim": {
      "command": "/path/to/terraphim_mcp_server",
      "args": ["--config", "context_engineer_config.json"],
      "env": {
        "RUST_LOG": "info",
        "ENABLE_FILE_CONTEXT": "true"
      }
    }
  }
}

Role Configuration

Configure Context Engineer role with appropriate haystacks:

{
  "haystacks": [
    {
      "location": "docs/context-library",
      "service": "Ripgrep",
      "read_only": false
    },
    {
      "location": "docs/vibe-rules",
      "service": "Ripgrep",
      "read_only": false
    },
    {
      "location": "src",
      "service": "Ripgrep",
      "read_only": true
    }
  ]
}

Performance Considerations

Caching

The MCP server caches:

Autocomplete indices (rebuilt when role changes)
Knowledge graph automata (loaded once per role)

Document content is not cached by default; it is read from disk on demand.

Latency

Typical latency:

autocomplete_terms: 5-20ms (in-memory FST lookup)
search: 50-200ms (knowledge graph traversal + file I/O)
extract_paragraphs_from_automata: 10-50ms (linear scan + extraction)
find_matches: 20-100ms (Aho-Corasick matching)

Memory Usage

Memory per role:

Autocomplete index: ~5-10MB (depends on thesaurus size)
Knowledge graph: ~20-50MB (nodes + edges + documents)
Document cache: ~0MB (not cached by default)

Total: ~25-60MB per active role.

Comparison with Conare AI

| Feature | Conare AI | Terraphim MCP |
|---------|-----------|---------------|
| File References | "@" instant referencing | search + extract_paragraphs tools |
| Line Numbers | Automatic | Returned with paragraph extraction |
| Context Window | Full file | Configurable paragraphs or full file |
| Semantic Search | No | Yes (knowledge graph expansion) |
| Concept Matching | No | Yes (find_matches) |
| Autocomplete | Basic | Fuzzy + semantic expansion |
| Token Tracking | Built-in UI | Via document metadata |
| Cross-References | Manual | Automatic via knowledge graph |

Advantages of Terraphim:

Semantic Understanding: Finds related concepts, not just keyword matches
Knowledge Graph: Understands relationships between concepts
Flexible Extraction: Get exactly the context you need (paragraph, function, etc.)
Multi-Source: Search across local files, URLs, APIs simultaneously
Extensible: Add custom tools via MCP protocol

Best Practices

1. Use Specific Search Terms

Good:

search({ query: "tokio::select cancellation", role: "Context Engineer" })

Bad:

search({ query: "async", role: "Context Engineer" })  // Too broad

2. Extract Minimal Context

Good:

extract_paragraphs_from_automata({
  text: doc.body,
  term: "tokio::spawn",
  max_paragraphs: 1  // Only the immediate context
})

Bad:

// Returning entire file when only one function needed
search({ query: "tokio", limit: 1 })

3. Combine Tools for Rich Context

// 1. Find relevant documents
const docs = await search({ query: "async patterns" });

// 2. Extract specific examples
const examples = await extract_paragraphs_from_automata({
  text: docs.documents[0].body,
  term: "tokio::select"
});

// 3. Find related concepts
const related = await autocomplete_terms({
  query: "tokio::select"
});

// Result: Full context with examples and related concepts

4. Cache Autocomplete Index

// Build once per role
await mcp.callTool("build_autocomplete_index", {
  role: "Context Engineer"
});

// Then use autocomplete freely
const suggestions = await mcp.callTool("autocomplete_terms", {
  query: "async"
});

Troubleshooting

MCP Tool Not Found

# Verify MCP server is running
ps aux | grep terraphim_mcp_server

# Check Claude Desktop logs
tail -f ~/Library/Logs/Claude/mcp*.log

# Test MCP server directly
cd crates/terraphim_mcp_server
./start_local_dev.sh

Empty Results

# Verify knowledge graph is built
curl http://localhost:PORT/config | jq '.roles["Context Engineer"].kg'

# Check haystack configuration
curl http://localhost:PORT/config | jq '.roles["Context Engineer"].haystacks'

# Rebuild if needed
cargo run -- --config context_engineer_config.json

Line Numbers Incorrect

Line numbers are 1-indexed (first line is line 1, not line 0).

If line numbers seem off:

Check file encoding (must be UTF-8)
Verify line endings (LF vs CRLF)
Ensure no binary content in text files
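
Line endings matter because the extraction code shown earlier splits text with Rust's `str::lines`, which treats both `\n` and `\r\n` as line breaks but not a bare `\r`. The sketch below demonstrates that behavior and one possible normalization step; `count_lines` and `normalize_newlines` are hypothetical helpers.

```rust
/// Counts lines the way `str::lines` does: "\n" and "\r\n" both end a line.
pub fn count_lines(text: &str) -> usize {
    text.lines().count()
}

/// Rewrites "\r\n" and bare "\r" (classic Mac) to "\n" so that every
/// line break is visible to `str::lines`.
pub fn normalize_newlines(text: &str) -> String {
    text.replace("\r\n", "\n").replace('\r', "\n")
}
```

A file with bare `\r` line endings is seen as a single line until it is normalized, which would make every reported line number 1.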

Future Enhancements

Planned improvements to file context tools:

Streaming Results: Stream large file contents to avoid memory issues
Syntax-Aware Extraction: Extract complete functions/classes using AST
Diff-Based Context: Show changes between versions with line references
Multi-File Context: Extract related code across multiple files
Token Budget Management: Automatic context truncation based on LLM limits
IDE Integration: Direct jump-to-definition from MCP responses
