Terraphim Automata
The terraphim_automata crate provides high-performance text processing capabilities using finite state automata (FST) for term matching, autocomplete, and fuzzy search.
Architecture
Core Components
Finite State Automata (FST)
The crate implements a finite state transducer for efficient term lookup and matching:
Key Features:
- O(p+k) prefix search complexity
- Memory-efficient representation
- Fast term lookup and matching
- WASM-compatible design
Autocomplete System
Advanced autocomplete with fuzzy matching capabilities:
API Reference
Basic Usage
Creating an Automata
use Automata;
let terms = vec!;
let automata = new?;Prefix Search
let results = automata.prefix_search?;
// Returns: ["rust"]Fuzzy Autocomplete
use Autocomplete;
let autocomplete = new?;
// Jaro-Winkler similarity (default)
let results = autocomplete.fuzzy_autocomplete_search?;
// Levenshtein distance
let results = autocomplete.fuzzy_autocomplete_search_levenshtein?;Advanced Features
Custom Similarity Thresholds
// High similarity threshold for exact matches
let exact_matches = autocomplete.fuzzy_autocomplete_search?;
// Lower threshold for fuzzy matches
let fuzzy_matches = autocomplete.fuzzy_autocomplete_search?;Term Replacement
use matcher;
let text = "Rust is a systems programming language";
let thesaurus = vec!;
let matches = find_matches?;
let replaced = replace_matches?;Performance Characteristics
Algorithm Comparison
Jaro-Winkler (Default)
- Speed: 2.3x faster than Levenshtein
- Quality: Superior for autocomplete scenarios
- Use Case: General-purpose fuzzy matching
Levenshtein Distance
- Speed: Baseline implementation
- Quality: Good for exact edit distance
- Use Case: When precise edit distance matters
Benchmarks
10K terms processed in ~78ms
Throughput: 120+ MiB/s
Memory usage: Efficient FST representationWASM Compatibility
WebAssembly Support
The crate is designed for WASM deployment:
Feature Flags
[dependencies.terraphim_automata]
version = "0.1.0"
features = ["wasm-bindgen"]Text Processing
Term Matching
use matcher;
// Find all term occurrences in text
let matches = find_matches?;
// Replace terms with formatted links
let wiki_links = replace_matches?;
let html_links = replace_matches?;
let markdown_links = replace_matches?;Pattern Validation (Important!)
Both find_matches and replace_matches automatically filter invalid patterns to prevent issues:
/// Minimum pattern length to prevent spurious matches.
/// Patterns shorter than this are filtered out to avoid:
/// - Empty patterns matching at every character position
/// - Single-character patterns causing excessive matches
const MIN_PATTERN_LENGTH: usize = 2;Why this matters: Empty patterns in Aho-Corasick automata match at every position (index 0, 1, 2, ...), causing spurious text insertions between every character.
Example of the bug (now fixed):
- Input:
npm install express - With empty pattern:
bun install exmatching...pmatching...(broken!) - With filtering:
bun install express(correct!)
Automatic filtering includes:
- Empty strings (
"") - Single characters (
"a","x") - Whitespace-only strings (
" ")
Invalid patterns are logged as warnings for debugging:
WARN: Skipping invalid pattern: "" (length 0 < 2)Format Options
Builder System
Automata Construction
use builder;
let builder = new;
let automata = builder
.add_term
.add_term
.add_term
.build?;Batch Processing
let terms = vec!;
let automata = from_terms?.build?;Error Handling
Custom Error Types
Result Types
pub type Result<T> = Result;
// Usage
let automata = new?;
let results = automata.prefix_search?;Testing
Unit Tests
Integration Tests
Configuration
Performance Tuning
// Adjust similarity threshold for different use cases
let strict_matching = autocomplete.fuzzy_autocomplete_search?;
let loose_matching = autocomplete.fuzzy_autocomplete_search?;
// Adjust edit distance for Levenshtein
let exact_matching = autocomplete.fuzzy_autocomplete_search_levenshtein?;
let fuzzy_matching = autocomplete.fuzzy_autocomplete_search_levenshtein?;Memory Management
// Efficient FST construction
let automata = new?;
// Memory-efficient term storage
let autocomplete = new?;Best Practices
Term Selection
// Use meaningful, normalized terms
let good_terms = vec!;
// Avoid overly generic terms
let bad_terms = vec!;Performance Optimization
// Reuse automata instances when possible
let automata = new?;
// Batch operations for better performance
let queries = vec!;
for query in queries Error Handling
match autocomplete.fuzzy_autocomplete_search Integration Examples
Desktop Application
// In desktop app
let autocomplete = new?;
let suggestions = autocomplete.fuzzy_autocomplete_search?;
// Update UI with suggestions
update_suggestions;Web Application (WASM)
Knowledge Graph Integration
// Replace terms with KG links
let text = "Rust is a systems programming language";
let thesaurus = load_knowledge_graph_terms?;
let enhanced_text = replace_matches?;Dependencies
Internal Dependencies
- Minimal dependencies for WASM compatibility
- Focus on performance and memory efficiency
External Dependencies
fst: Finite state transducer implementationserde: Serialization supportwasm-bindgen: WASM bindings (optional)
Migration Guide
From Simple String Matching
- Replace string-based search with FST-based search
- Use
Automata::new()for prefix search - Use
Autocomplete::new()for fuzzy search - Update error handling for new error types
Performance Optimization
- Use Jaro-Winkler for general autocomplete
- Use Levenshtein for exact edit distance needs
- Batch operations when possible
- Reuse automata instances