Ripgrep Tag Filtering Documentation

This document explains how to use tag filtering with Ripgrep haystacks in Terraphim AI, including configuration through the wizard UI and expected behavior.

Overview

Tag filtering allows you to restrict search results to documents that contain specific hashtags (e.g., #rust, #docs, #test) in addition to your search terms. This feature is particularly useful for organizing and filtering content in knowledge bases.

How It Works

When you configure a tag filter like #rust, the system generates a ripgrep command that requires both your search term and the specified tag to be present in the same file:

rg --json --trim -C3 --ignore-case -tmarkdown --all-match -e "your_search" -e "#rust" /path/to/haystack

The --all-match flag ensures that all specified patterns must be found for a document to be included in results.

Configuration via Wizard UI

Step 1: Create or Edit a Role

- Open the Configuration Wizard at /config/wizard
- Navigate to Step 2 (Roles)
- Add a new role or edit an existing one
- Add a haystack to the role

Step 2: Configure Ripgrep Haystack

- Set the Service Type to "Ripgrep (File Search)"
- Set the Directory Path to your document directory
- In the Extra Parameters section, you'll see tag filtering options

Step 3: Set Up Tag Filtering

Option A: Use Preset Tags

Use the "Presets" dropdown to select common tags:
- #rust - Rust-related content
- #docs - Documentation
- #test - Testing-related content
- #todo - TODO items

Option B: Manual Tag Entry

- Enter a custom tag in the "Hashtag" field (e.g., #custom, #project-name)
- Multiple tags can be separated by commas or spaces
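
As a rough illustration of the "commas or spaces" rule, the sketch below splits a Hashtag field value into individual tags. The function name `parse_tags` is hypothetical and this is not the wizard's actual parsing code. Dropping duplicates while keeping the first occurrence is an assumption of the sketch.

```python
import re


def parse_tags(raw: str) -> list[str]:
    """Split a Hashtag field value on commas and/or whitespace."""
    tags: list[str] = []
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue  # empty input yields one empty token; skip it
        if token not in tags:  # assumption: drop duplicates, keep order
            tags.append(token)
    return tags


print(parse_tags("#rust, #docs #project-name"))
```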

Step 4: Additional Parameters (Optional)

You can also configure other filtering parameters:

- Max Results (max_count): Limit the number of matching lines returned per file
- Custom Parameters: Add other ripgrep options, such as glob patterns

Configuration JSON Structure

When saved, the configuration includes the tag filter in the extra_parameters field:

{
  "roles": {
    "Your Role Name": {
      "haystacks": [
        {
          "location": "/path/to/your/documents",
          "service": "Ripgrep",
          "read_only": false,
          "extra_parameters": {
            "tag": "#rust",
            "max_count": "10"
          }
        }
      ]
    }
  }
}
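
To check a saved configuration, a short script can list the tag configured for each haystack. This is a minimal sketch that assumes the configuration has exactly the shape shown above. `haystack_tags` is a hypothetical helper, not part of Terraphim.

```python
import json
from typing import Optional

# The same structure as the configuration shown above.
CONFIG_JSON = """
{
  "roles": {
    "Your Role Name": {
      "haystacks": [
        {
          "location": "/path/to/your/documents",
          "service": "Ripgrep",
          "read_only": false,
          "extra_parameters": {"tag": "#rust", "max_count": "10"}
        }
      ]
    }
  }
}
"""


def haystack_tags(config: dict) -> list[tuple[str, str, Optional[str]]]:
    """Return (role name, haystack location, tag or None) for every haystack."""
    rows = []
    for role_name, role in config.get("roles", {}).items():
        for haystack in role.get("haystacks", []):
            params = haystack.get("extra_parameters") or {}
            rows.append((role_name, haystack["location"], params.get("tag")))
    return rows


config = json.loads(CONFIG_JSON)
for role_name, location, tag in haystack_tags(config):
    print(f"{role_name}: {location} -> {tag}")
```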

Example Use Cases

1. Rust Development Team

Filter search results to only show Rust-related documentation:

{
  "extra_parameters": {
    "tag": "#rust",
    "type": "md"
  }
}

Generated command:

rg --json --trim -C3 --ignore-case -tmarkdown --all-match -t md -e "async" -e "#rust" ./docs/

2. Documentation Search

Find only documentation files with specific tags:

{
  "extra_parameters": {
    "tag": "#docs",
    "context": "5",
    "max_count": "15"
  }
}

Generated command:

rg --json --trim -C5 --ignore-case -tmarkdown --all-match --max-count 15 -e "api" -e "#docs" ./docs/

3. Testing Focus

Search only test-related documentation:

{
  "extra_parameters": {
    "tag": "#test",
    "case_sensitive": "true"
  }
}

Generated command:

rg --json --trim -C3 --case-sensitive -tmarkdown --all-match -e "integration" -e "#test" ./docs/

Supported Extra Parameters

| Parameter | Description | Example | Ripgrep Flag |
|-----------|-------------|---------|--------------|
| tag | Filter by hashtags | "#rust" | -e "#rust" --all-match |
| glob | File pattern filter | "*.md" | --glob "*.md" |
| type | File type filter | "md" | -t md |
| max_count | Max results per file | "10" | --max-count 10 |
| context | Context lines | "5" | -C 5 |
| case_sensitive | Case-sensitive search | "true" | --case-sensitive |
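
The mapping in this table can be sketched as a small function that turns an extra_parameters dictionary into a list of command-line arguments. The flags are copied from the table as written and have not been checked against ripgrep's own manual. `extra_params_to_args` is a hypothetical name, and emitting one `-e` per tag when several tags are given is an assumption rather than confirmed Terraphim behavior.

```python
import re


def extra_params_to_args(params: dict[str, str]) -> list[str]:
    """Translate extra_parameters into arguments, following the table above."""
    args: list[str] = []

    # Assumption: several tags may be separated by commas or spaces.
    tags = [t for t in re.split(r"[,\s]+", params.get("tag", "").strip()) if t]
    if tags:
        args.append("--all-match")
        for tag in tags:
            args += ["-e", tag]

    if "glob" in params:
        args += ["--glob", params["glob"]]
    if "type" in params:
        args += ["-t", params["type"]]
    if "max_count" in params:
        args += ["--max-count", params["max_count"]]
    if "context" in params:
        args += ["-C", params["context"]]
    # Values are strings in the configuration, so compare against "true".
    if params.get("case_sensitive", "").lower() == "true":
        args.append("--case-sensitive")
    return args


print(extra_params_to_args({"tag": "#docs", "context": "5", "max_count": "15"}))
```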

Document Preparation

To use tag filtering effectively, your documents should include hashtags:

# Rust Memory Management #rust

This document explains Rust's ownership system and memory safety features.

## Borrowing #rust #advanced

Understanding how borrowing works...

## Testing Your Code #rust #test

Here's how to write tests in Rust...
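
To see which tags a document like the one above actually carries, the hashtags can be pulled out with a regular expression. This is a sketch under stated assumptions: the pattern treats a hashtag as `#` followed by a letter and then letters, digits, underscores, or hyphens, which is a guess at a reasonable tag grammar and not a rule taken from Terraphim. Markdown heading markers (`#` or `##` followed by a space) are not matched.

```python
import re

# '#' must not follow a word character or another '#',
# and must be followed directly by a letter.
HASHTAG = re.compile(r"(?<![\w#])#[A-Za-z][\w-]*")

SAMPLE = """# Rust Memory Management #rust

This document explains Rust's ownership system and memory safety features.

## Borrowing #rust #advanced

## Testing Your Code #rust #test
"""


def extract_tags(markdown: str) -> set[str]:
    """Return the set of lowercase hashtags found in a document."""
    return {match.group(0).lower() for match in HASHTAG.finditer(markdown)}


print(sorted(extract_tags(SAMPLE)))
```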

Expected Behavior

With Tag Filter `#rust`:

- Search: "memory"
- Results: Only files containing BOTH "memory" AND "#rust"
- Excluded: Files with "memory" but no "#rust" tag

Without Tag Filter:

- Search: "memory"
- Results: All files containing "memory" regardless of tags
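
The two cases above can be reproduced with a small in-memory sketch. It is not how Terraphim or ripgrep performs the search. `file_matches`, `search`, and the sample file names are hypothetical, and matching is simplified to case-insensitive substring tests over whole files.

```python
from typing import Optional

# Hypothetical sample documents, keyed by file name.
DOCS = {
    "ownership.md": "# Rust Memory Management #rust\nOwnership governs memory.",
    "gc.md": "# Garbage Collection\nHow a collector reclaims memory.",
    "borrowing.md": "## Borrowing #rust #advanced\nReferences and lifetimes.",
}


def file_matches(text: str, term: str, tag: Optional[str] = None) -> bool:
    """True if the file contains the term and, when a tag is given, the tag too."""
    haystack = text.lower()
    if term.lower() not in haystack:
        return False
    # Simplification: a substring test, so "#rust" also matches "#rust-lang".
    return tag is None or tag.lower() in haystack


def search(term: str, tag: Optional[str] = None) -> list[str]:
    return sorted(name for name, text in DOCS.items() if file_matches(text, term, tag))


print(search("memory", "#rust"))  # with tag filter
print(search("memory"))           # without tag filter
```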

Troubleshooting

No Results Found

- Check tag syntax: Ensure tags include the # symbol
- Verify document tags: Confirm your documents actually contain the specified tags
- Case sensitivity: Searches are case-insensitive by default; if case_sensitive is set to "true", the search term and the tag must match the casing used in your documents
- File types: Make sure you're searching the right file types (default is markdown)

Too Many/Few Results

- Adjust max_count: Limit results per file
- Add more specific tags: Use multiple tags for better filtering
- Use glob patterns: Filter by file paths or names

Debug Information

Set LOG_LEVEL=debug to see detailed logging:

LOG_LEVEL=debug cargo run

Look for log messages like:

[INFO] 🏷️ Processing tag filter: '#rust'
[INFO] Added tag pattern: #rust
[INFO] 🚀 Executing: rg --json --trim -C3 --ignore-case -tmarkdown --all-match -e memory -e #rust /path/to/docs
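
When checking logs, it can help to pull the `-e` patterns out of an "Executing:" line and confirm that the tag reached the command. The sketch below assumes the log format shown above. `patterns_from_log` is a hypothetical helper.

```python
import shlex

LINE = (
    "[INFO] Executing: rg --json --trim -C3 --ignore-case -tmarkdown "
    "--all-match -e memory -e #rust /path/to/docs"
)


def patterns_from_log(line: str) -> list[str]:
    """Return the values that follow each -e in an 'Executing:' log line."""
    marker = "Executing:"
    if marker not in line:
        return []
    # shlex.split does not treat '#' as a comment character by default.
    argv = shlex.split(line.split(marker, 1)[1])
    return [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "-e"]


print(patterns_from_log(LINE))
```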

Testing

Manual Testing

- Create test files with and without tags
- Configure a role with tag filtering
- Perform searches and verify only tagged content appears

Automated Testing

Run the validation script:

./scripts/validate_ripgrep_tag_filtering.sh

Or run the E2E tests:

cd desktop
npm test -- tests/e2e/ripgrep-tag-filtering.spec.ts

Direct Command Testing

Test ripgrep commands directly:

# With tag filtering
rg --json --trim -C3 --ignore-case -tmarkdown --all-match -e "search_term" -e "#rust" ./docs/

# Without tag filtering (for comparison)
rg --json --trim -C3 --ignore-case -tmarkdown "search_term" ./docs/

Best Practices

Tag Naming Conventions

- Use descriptive, consistent tags: #rust, #api, #tutorial
- Avoid spaces in tags: #rust-lang not #rust lang
- Use lowercase for consistency: #rust not #Rust
- Group related content: #rust-async, #rust-testing
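
These conventions can be applied mechanically before tags are written into documents. The sketch below is one possible normalizer. `normalize_tag` is a hypothetical helper, and the exact character set it keeps (lowercase letters, digits, underscores, hyphens) is an assumption.

```python
import re


def normalize_tag(raw: str) -> str:
    """Lowercase a tag, replace spaces with hyphens, and ensure one leading '#'."""
    body = raw.strip().lstrip("#").lower()
    body = re.sub(r"\s+", "-", body)          # "rust lang" -> "rust-lang"
    body = re.sub(r"[^a-z0-9_-]", "", body)   # drop punctuation and symbols
    body = re.sub(r"-{2,}", "-", body).strip("-")
    if not body:
        raise ValueError(f"no usable tag text in {raw!r}")
    return "#" + body


for raw in ["#Rust", "#rust lang", "  API "]:
    print(raw, "->", normalize_tag(raw))
```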

Document Organization

- Add tags at the document level and section level
- Use multiple tags for cross-cutting concerns
- Keep tag lists updated as content evolves
- Document your tagging strategy for team members

Performance Considerations

- Use specific tags to reduce search scope
- Set appropriate max_count limits
- Consider using glob patterns for path-based filtering
- Monitor search performance with complex tag combinations

Integration with Other Features

Knowledge Graphs

Tag filtering works alongside knowledge graph processing. Tagged documents will still contribute to graph relationships while being filtered during search.

Role-Based Configuration

Different roles can have different tag filtering strategies:

- Developer Role: Filter by #code, #api, #architecture
- Documentation Role: Filter by #docs, #guide, #tutorial
- QA Role: Filter by #test, #bug, #validation

Multiple Haystacks

Each haystack in a role can have different tag filtering configuration, allowing for granular control over different document sources.

Future Enhancements

Potential improvements to tag filtering:

- Tag Suggestions: Auto-complete based on existing tags in documents
- Tag Analytics: Show tag usage statistics
- Exclude Tags: Support for excluding certain tags (NOT operator)
- Tag Hierarchies: Support parent/child tag relationships
- Visual Tag Management: UI for browsing and managing tags

Support and Feedback

For issues related to tag filtering:

- Check the server logs for detailed error messages
- Verify your ripgrep installation: rg --version
- Test with direct ripgrep commands to isolate issues
- Report bugs with configuration details and log output
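
The installation check in the list above can also be scripted, for example to attach the version to a bug report. This is a minimal sketch. `ripgrep_version` is a hypothetical helper that returns None when no `rg` executable is found on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def ripgrep_version() -> Optional[str]:
    """Return the first line of `rg --version`, or None if rg is unavailable."""
    path = shutil.which("rg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(ripgrep_version() or "ripgrep (rg) not found on PATH")
```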
