Ripgrep Tag Filtering Documentation
This document explains how to use tag filtering with Ripgrep haystacks in Terraphim AI, including configuration through the wizard UI and expected behavior.
Overview
Tag filtering allows you to restrict search results to documents that contain specific hashtags (e.g., #rust, #docs, #test) in addition to your search terms. This feature is particularly useful for organizing and filtering content in knowledge bases.
How It Works
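When you configure a tag filter like #rust, the system generates a ripgrep command that requires both your search term and the specified tag to be present in the same file. For a search for "memory" filtered by #rust, the debug log shows a command like:
rg --json --trim -C3 --ignore-case -tmarkdown --all-match -e memory -e #rust /path/to/docs
The --all-match flag requires every specified pattern to be found in a document for it to be included in the results.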
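The file-level rule can be modeled in a few lines of Python. This is a minimal sketch for illustration, not Terraphim's code. The function name file_matches_all is hypothetical, and the sketch assumes patterns are literal strings compared case-insensitively, matching the --ignore-case default.

```python
def file_matches_all(text: str, patterns: list[str], ignore_case: bool = True) -> bool:
    """Return True only if every pattern occurs somewhere in the file text.

    Hypothetical model of the documented behavior: the search term and the
    tag may sit on different lines, but both must appear in the same file.
    Patterns are treated as literal substrings, not regular expressions.
    """
    if ignore_case:
        text = text.lower()
        patterns = [p.lower() for p in patterns]
    return all(p in text for p in patterns)


doc = "# Ownership notes\n\nRust memory safety without a GC.\n\nTags: #rust\n"
print(file_matches_all(doc, ["memory", "#rust"]))  # True: both present
print(file_matches_all(doc, ["memory", "#docs"]))  # False: tag missing
```

Treating patterns as literal substrings keeps the sketch simple. Ripgrep treats -e patterns as regular expressions, but that makes no difference for plain tags like #rust.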
Configuration via Wizard UI
Step 1: Create or Edit a Role
- Open the Configuration Wizard at /config/wizard
- Navigate to Step 2 (Roles)
- Add a new role or edit an existing one
- Add a haystack to the role
Step 2: Configure Ripgrep Haystack
- Set the Service Type to "Ripgrep (File Search)"
- Set the Directory Path to your document directory
- In the Extra Parameters section, you'll see tag filtering options
Step 3: Set Up Tag Filtering
Option A: Use Preset Tags
- Use the "Presets" dropdown to select common tags:
  - #rust - Rust-related content
  - #docs - Documentation
  - #test - Testing-related content
  - #todo - TODO items
Option B: Manual Tag Entry
- Enter a custom tag in the "Hashtag" field (e.g., #custom, #project-name)
- Multiple tags can be separated by commas or spaces
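The separator rule can be sketched as follows. This is an assumption about how the field might be parsed, not the wizard's actual code, and parse_hashtag_field is a hypothetical name. Commas and whitespace both act as separators, and repeated tags are kept only once.

```python
import re


def parse_hashtag_field(value: str) -> list[str]:
    """Split a "Hashtag" field such as "#rust, #docs #test" into tags.

    Hypothetical helper: commas and any whitespace both act as separators,
    empty pieces are dropped, and duplicates are removed in first-seen order.
    """
    tags: list[str] = []
    for piece in re.split(r"[,\s]+", value.strip()):
        if piece and piece not in tags:
            tags.append(piece)
    return tags


print(parse_hashtag_field("#custom,#project-name"))           # ['#custom', '#project-name']
print(parse_hashtag_field("  #rust, #docs   #test,#rust "))   # ['#rust', '#docs', '#test']
```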
Step 4: Additional Parameters (Optional)
You can also configure other filtering parameters:
- Max Results (max_count): Limit the number of results per file
- Custom Parameters: Add other ripgrep options such as glob patterns
Configuration JSON Structure
When saved, the configuration stores the tag filter in the haystack's extra_parameters field.
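Below is a sketch of what such a haystack entry might look like, built and serialized from Python. Only extra_parameters and its keys come from this document. The "location" and "service" field names, and the directory, are assumptions standing in for the wizard's Directory Path and Service Type, so check a saved configuration for the exact schema.

```python
import json

# Hypothetical haystack entry: only "extra_parameters" is named in this
# document; "location" and "service" are assumed field names for the
# Directory Path and Service Type set in the wizard.
haystack = {
    "location": "/path/to/docs",
    "service": "Ripgrep",
    "extra_parameters": {
        "tag": "#rust",      # tag filter chosen in the wizard
        "max_count": "10",   # values are strings, as in the parameter table
    },
}

print(json.dumps(haystack, indent=2))
```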
Example Use Cases
1. Rust Development Team
Filter search results to only show Rust-related documentation:
Generated command:
2. Documentation Search
Find only documentation files with specific tags:
Generated command:
3. Testing Focus
Search only test-related documentation:
Generated command:
Supported Extra Parameters
| Parameter | Description | Example | Ripgrep Flag |
|-----------|-------------|---------|--------------|
| tag | Filter by hashtags | "#rust" | -e "#rust" --all-match |
| glob | File pattern filter | "*.md" | --glob "*.md" |
| type | File type filter | "md" | -t md |
| max_count | Max results per file | "10" | --max-count 10 |
| context | Context lines | "5" | -C 5 |
| case_sensitive | Case-sensitive search | "true" | --case-sensitive |
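The table can be read as a mapping from extra_parameters to ripgrep arguments. The sketch below is hypothetical and is not Terraphim's implementation:

- Its defaults reproduce the Executing line from the debug log.
- Treating context, type, and case_sensitive as replacements for those defaults is an assumption.
- It does not check that your installed ripgrep accepts every flag, so compare against rg --help if a command is rejected.

```python
def build_rg_args(needle: str, path: str, extra: dict[str, str]) -> list[str]:
    """Assemble a ripgrep argument list from a haystack's extra_parameters.

    Hypothetical sketch: default flags mirror the documented debug log line,
    and each extra parameter follows the Supported Extra Parameters table.
    """
    args = ["rg", "--json", "--trim"]
    args.append(f"-C{extra.get('context', '3')}")          # context lines
    if extra.get("case_sensitive", "").lower() == "true":
        args.append("--case-sensitive")
    else:
        args.append("--ignore-case")                       # documented default
    args.append(f"-t{extra.get('type', 'markdown')}")      # file type filter
    if "glob" in extra:
        args += ["--glob", extra["glob"]]
    if "max_count" in extra:
        args += ["--max-count", extra["max_count"]]
    tags = extra.get("tag", "").replace(",", " ").split()
    if tags:
        args.append("--all-match")                         # as documented
    args += ["-e", needle]
    for tag in tags:
        args += ["-e", tag]
    args.append(path)
    return args


print(" ".join(build_rg_args("memory", "/path/to/docs", {"tag": "#rust"})))
```

The printed command matches the Executing line in the debug log. Short flags with attached values (-C3, -tmarkdown) are equivalent to the spaced forms shown in the table (-C 5, -t md).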
Document Preparation
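To use tag filtering effectively, your documents must contain the hashtags you filter on, written literally in the text (for example, #rust or #docs), because the tag is matched as a search pattern just like the search term.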
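Before filtering, it helps to check which tags a file actually carries. The sketch below is a hypothetical helper, not part of Terraphim. Its definition of a hashtag (the regular expression) is an assumption, chosen so that Markdown headings such as "# Title" are not mistaken for tags.

```python
import re

# A hashtag here is "#" at the start of the text or after whitespace,
# followed by a letter or digit and then letters, digits, "_" or "-".
# This excludes Markdown headings such as "# Title" and "## Notes".
HASHTAG = re.compile(r"(?<!\S)#[A-Za-z0-9][\w-]*")


def extract_hashtags(text: str) -> set[str]:
    """Return the set of hashtags found in a document (hypothetical helper)."""
    return set(HASHTAG.findall(text))


doc = """# Async Rust Guide

Covers futures and executors.

Tags: #rust #rust-async #docs
"""
print(sorted(extract_hashtags(doc)))  # ['#docs', '#rust', '#rust-async']
```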
Expected Behavior
With Tag Filter #rust:
- Search: "memory"
- Results: Only files containing BOTH "memory" AND "#rust"
- Excluded: Files with "memory" but no "#rust" tag
Without Tag Filter:
- Search: "memory"
- Results: All files containing "memory" regardless of tags
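The two cases above can be reproduced on a small in-memory corpus. This is a sketch that assumes case-insensitive literal matching. The search function and the sample documents are hypothetical.

```python
from typing import Optional


def search(docs: dict[str, str], needle: str, tag: Optional[str] = None) -> list[str]:
    """Return names of matching documents, modeling the behavior above.

    Hypothetical model: case-insensitive literal matching; with a tag
    filter, a document must contain both the needle and the tag.
    """
    patterns = [needle] + ([tag] if tag else [])
    return sorted(
        name
        for name, text in docs.items()
        if all(p.lower() in text.lower() for p in patterns)
    )


docs = {
    "ownership.md": "Stack and heap memory in Rust. #rust",
    "gc.md": "Garbage-collected memory in Java. #java",
    "intro.md": "Getting started. #rust",
}
print(search(docs, "memory", "#rust"))  # ['ownership.md']
print(search(docs, "memory"))           # ['gc.md', 'ownership.md']
```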
Troubleshooting
No Results Found
- Check tag syntax: Ensure tags include the # symbol
- Verify document tags: Confirm your documents actually contain the specified tags
- Case sensitivity: By default, searches are case-insensitive
- File types: Make sure you're searching the right file types (default is markdown)
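The first two checks can be automated over a set of documents you already have in hand. The sketch below is hypothetical and not part of Terraphim. It flags tags without a leading # and tags that no document contains, comparing case-insensitively as the default search does.

```python
def diagnose(tags: list[str], docs: dict[str, str]) -> list[str]:
    """Report common reasons a tag filter returns nothing (hypothetical helper).

    Tags must start with "#", and at least one document must contain each
    tag (compared case-insensitively, like the default search).
    """
    problems = []
    for tag in tags:
        if not tag.startswith("#"):
            problems.append(f"{tag!r} is missing the leading '#'")
        elif not any(tag.lower() in text.lower() for text in docs.values()):
            problems.append(f"no document contains {tag!r}")
    return problems


docs = {"a.md": "Borrow checker notes #Rust", "b.md": "Style guide #docs"}
for line in diagnose(["rust", "#rust", "#test"], docs):
    print(line)
```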
Too Many/Few Results
- Adjust max_count: Limit results per file
- Add more specific tags: Use multiple tags for better filtering
- Use glob patterns: Filter by file paths or names
Debug Information
Set LOG_LEVEL=debug to see detailed logging:
LOG_LEVEL=debug
Look for log messages like:
[INFO] 🏷️ Processing tag filter: '#rust'
[INFO] Added tag pattern: #rust
[INFO] 🚀 Executing: rg --json --trim -C3 --ignore-case -tmarkdown --all-match -e memory -e #rust /path/to/docs
Testing
Manual Testing
- Create test files with and without tags
- Configure a role with tag filtering
- Perform searches and verify only tagged content appears
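The first step can be scripted. The sketch below writes three small Markdown files into a temporary directory and records which one a "memory" search filtered by #rust should return. The file names and contents are hypothetical. The directory is deliberately left in place so you can point a Ripgrep haystack at it and compare the results.

```python
import tempfile
from pathlib import Path

# Hypothetical fixture set: each file states whether it should appear when
# searching for "memory" with the tag filter "#rust".
FIXTURES = {
    "tagged_match.md": ("Notes on memory layout.\n\n#rust\n", True),
    "untagged_match.md": ("Notes on memory layout.\n", False),
    "tagged_other.md": ("Notes on traits.\n\n#rust\n", False),
}


def make_fixtures(root: Path) -> set[str]:
    """Write the fixture files under root and return the expected hits."""
    for name, (body, _) in FIXTURES.items():
        (root / name).write_text(body, encoding="utf-8")
    return {name for name, (_, expected) in FIXTURES.items() if expected}


root = Path(tempfile.mkdtemp(prefix="tag-filter-test-"))
expected = make_fixtures(root)
print(f"Point a Ripgrep haystack at {root}")
print(f"Searching 'memory' with #rust should return only: {sorted(expected)}")
```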
Automated Testing
Run the validation script:
Or run the E2E tests:
Direct Command Testing
Test ripgrep commands directly:
# With tag filtering
# Without tag filtering (for comparison)
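One way to produce both commands is to start from the Executing line in the debug log. The sketch below is a hypothetical helper built on Python's shlex module:

- It splits the logged command.
- It drops --all-match and the tag patterns to build the comparison command.
- It prints both commands with shell quoting.

Quoting matters because in a POSIX shell an unquoted word beginning with # starts a comment, so pasting -e #rust unquoted would silently drop the tag and everything after it. The helper does not check that your installed ripgrep accepts every logged flag; compare against rg --help if a command is rejected.

```python
import shlex

LOGGED = ("rg --json --trim -C3 --ignore-case -tmarkdown --all-match "
          "-e memory -e #rust /path/to/docs")


def without_tag_filter(command: str, tags: list[str]) -> list[str]:
    """Derive the comparison command by dropping --all-match and tag patterns.

    Hypothetical helper: assumes tags are passed as "-e TAG" pairs, as in
    the debug log line, and leaves every other argument untouched.
    """
    args = shlex.split(command)  # comments are off by default, so "#rust" survives
    out: list[str] = []
    i = 0
    while i < len(args):
        if args[i] == "--all-match":
            i += 1
        elif args[i] == "-e" and i + 1 < len(args) and args[i + 1] in tags:
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out


print(shlex.join(shlex.split(LOGGED)))                    # with tag filtering
print(shlex.join(without_tag_filter(LOGGED, ["#rust"])))  # for comparison
```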
Best Practices
Tag Naming Conventions
- Use descriptive, consistent tags: #rust, #api, #tutorial
- Avoid spaces in tags: #rust-lang, not #rust lang
- Use lowercase for consistency: #rust, not #Rust
- Group related content: #rust-async, #rust-testing
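The conventions above can be checked mechanically. The regular expression below is a hypothetical encoding of them (a leading #, lowercase letters and digits, single hyphens between words), not a rule Terraphim enforces.

One consequence of grouping is also worth knowing. With the tag passed as a plain pattern, as in the generated command, a #rust filter matches as a substring, so it also accepts files tagged only #rust-async.

```python
import re

# Hypothetical encoding of the naming conventions: "#", then lowercase
# letters and digits, with single hyphens between words.
TAG_STYLE = re.compile(r"#[a-z0-9]+(?:-[a-z0-9]+)*")


def check_tag(tag: str) -> bool:
    """Return True if the tag follows the naming conventions."""
    return TAG_STYLE.fullmatch(tag) is not None


for tag in ["#rust", "#rust-async", "#Rust", "#rust lang", "rust"]:
    print(f"{tag!r}: {'ok' if check_tag(tag) else 'rename'}")

# A plain pattern is matched as a substring, so a "#rust" filter also
# accepts a file tagged only "#rust-async".
print("#rust" in "Tags: #rust-async")  # True
```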
Document Organization
- Add tags at the document level and section level
- Use multiple tags for cross-cutting concerns
- Keep tag lists updated as content evolves
- Document your tagging strategy for team members
Performance Considerations
- Use specific tags to narrow the result set
- Set appropriate max_count limits
- Consider using glob patterns for path-based filtering, which limits the files ripgrep has to search
- Monitor search performance with complex tag combinations
Integration with Other Features
Knowledge Graphs
Tag filtering works alongside knowledge graph processing. Tagged documents will still contribute to graph relationships while being filtered during search.
Role-Based Configuration
Different roles can have different tag filtering strategies:
- Developer Role: Filter by #code, #api, #architecture
- Documentation Role: Filter by #docs, #guide, #tutorial
- QA Role: Filter by #test, #bug, #validation
Multiple Haystacks
Each haystack in a role can have different tag filtering configuration, allowing for granular control over different document sources.
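Per-role strategies and per-haystack configuration might combine as in the sketch below. The nesting, the "location" field, and the directories are assumptions for illustration. Only extra_parameters, its keys, and the tag values come from this document.

```python
# Hypothetical layout: each role holds a list of haystacks, and each haystack
# carries its own extra_parameters. The "location" field and directories are
# assumptions; the tags come from the role examples above.
roles = {
    "Developer": [
        {"location": "/repos/service/docs", "extra_parameters": {"tag": "#api"}},
        {"location": "/repos/design", "extra_parameters": {"tag": "#architecture"}},
    ],
    "QA": [
        {"location": "/repos/service/tests", "extra_parameters": {"tag": "#test"}},
        {"location": "/repos/reports",
         "extra_parameters": {"tag": "#bug", "max_count": "5"}},
    ],
}


def tags_by_role(config: dict) -> dict[str, list[str]]:
    """List the tag filter of every haystack, grouped by role."""
    return {
        role: [hs["extra_parameters"].get("tag", "") for hs in haystacks]
        for role, haystacks in config.items()
    }


for role, tags in tags_by_role(roles).items():
    print(role, tags)
```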
Future Enhancements
Potential improvements to tag filtering:
- Tag Suggestions: Auto-complete based on existing tags in documents
- Tag Analytics: Show tag usage statistics
- Exclude Tags: Support for excluding certain tags (NOT operator)
- Tag Hierarchies: Support parent/child tag relationships
- Visual Tag Management: UI for browsing and managing tags
Support and Feedback
For issues related to tag filtering:
- Check the server logs for detailed error messages
- Verify your ripgrep installation: rg --version
- Test with direct ripgrep commands to isolate issues
- Report bugs with configuration details and log output