Comprehensive Scoring Function x Haystack Test Matrix
Overview
This document describes the comprehensive test matrix framework that validates every combination of scoring functions and haystacks in the Terraphim system.
What Was Created
π― Test Matrix Framework
Location: crates/terraphim_tui/tests/scoring_haystack_matrix_tests.rs
A comprehensive testing framework that systematically tests every scoring function with every haystack type to ensure compatibility, performance, and correctness.
π Test Coverage
Scoring Functions (5 types)
- TerraphimGraph (
terraphim-graph) - Knowledge graph-based ranking - TitleScorer (
title-scorer) - Document title-based ranking (default) - BM25 (
bm25) - Standard Okapi BM25 ranking - BM25F (
bm25f) - Fielded BM25 with weighted document fields - BM25Plus (
bm25plus) - Enhanced BM25 with additional parameters
Haystack Types (6 types)
- Ripgrep - Local filesystem search
- Atomic - Atomic server integration
- QueryRs - Rust documentation search
- ClickUp - ClickUp API integration
- MCP - Model Context Protocol servers
- Perplexity - AI-powered web search
Query Scorers (10 algorithms)
For advanced TitleScorer combinations:
- Levenshtein - Edit distance similarity
- Jaro - Character transposition similarity
- JaroWinkler - Enhanced Jaro with prefix bonus
- BM25/BM25F/BM25Plus - BM25 variants within TitleScorer
- TFIDF - Term frequency-inverse document frequency
- Jaccard - Set-based similarity
- QueryRatio - Query term coverage ratio
- OkapiBM25 - Original Okapi BM25 implementation
π§ͺ Test Categories
1. Basic Matrix Test (test_complete_scoring_haystack_matrix)
- Tests all 30 basic combinations (5 scoring functions Γ 6 haystacks)
- Validates configuration generation and basic search functionality
- Expected success rate: 40%+ (remote services may fail)
2. Priority Combinations Test (test_priority_combinations)
- Tests critical combinations that should always work
- Focuses on local/reliable combinations
- Expected success rate: 80%+
3. Performance Comparison Test (test_scoring_function_performance_comparison)
- Benchmarks performance across different scoring functions
- Measures response time and results per second
- Uses Ripgrep (most reliable) for consistent testing
4. Extended Matrix Test (test_extended_matrix_with_query_scorers)
- Tests 90+ combinations including query scorer variations
- Comprehensive validation of TitleScorer with all query algorithms
- Expected success rate: 30%+ (more combinations = more potential failures)
5. Title Scorer Combinations Test (test_title_scorer_query_combinations)
- Focused testing of TitleScorer with specific query scoring algorithms
- Validates algorithm compatibility and performance
- Expected success rate: 50%+
π Test Execution
Manual Execution
# Run individual test categories
Automated Execution Script
Location: run_test_matrix.sh
# Run all tests
# Run specific categories
π Test Results Structure
Each test combination generates a MatrixTestResult with:
- Configuration Success: Whether test config was created successfully
- Search Success: Whether search operation completed successfully
- Response Time: How long the search took
- Result Count: Number of results returned
- Performance Score: Results per second metric
- Error Details: Specific error messages for failures
π Comprehensive Reporting
The test matrix generates detailed reports including:
- Overall success rate across all combinations
- Per-scoring-function success rates
- Per-haystack success rates
- Performance rankings (fastest combinations first)
- Detailed failure analysis with specific error messages
Current Findings
β Successful Framework Creation
- Test matrix framework compiles and runs successfully
- Configuration generation works for all combinations
- Comprehensive reporting provides detailed analysis
- Performance tracking captures timing and throughput metrics
β οΈ Identified CLI Limitation
Issue Discovered: The TUI command line interface doesn't currently support a --config parameter.
Error: error: unexpected argument '--config' found
Impact:
- All test combinations currently fail due to CLI limitation
- Test framework correctly identifies this as a systematic issue
- Need to either:
- Add
--configparameter support to TUI CLI, or - Modify test approach to use environment variables or default config loading
- Add
π― Next Steps
- Add CLI Config Support: Implement
--configparameter in TUI CLI - Alternative Test Approach: Create tests that use environment-based configuration
- Integration Testing: Once CLI is fixed, run full matrix validation
- Performance Baselines: Establish performance benchmarks for each combination
- CI Integration: Add matrix tests to continuous integration pipeline
Architecture Benefits
π§ Modular Design
- Each scoring function and haystack type is modeled as an enum
- Configuration generation is parameterized and extensible
- Test execution engine supports different test patterns
π Comprehensive Coverage
- Tests 30 basic combinations (5 scoring Γ 6 haystacks)
- Tests 90+ extended combinations (including query scorers)
- Validates configuration generation, search execution, and performance
π― Quality Assurance
- Systematic validation ensures no combination is missed
- Performance tracking identifies optimization opportunities
- Error analysis provides actionable debugging information
- Success rate metrics track system reliability
π Extensible Framework
- Easy to add new scoring functions
- Easy to add new haystack types
- Easy to add new query scorer algorithms
- Easy to add new test patterns
Usage Examples
Testing a New Scoring Function
// Add to ScoringFunction enum
// Add configuration mapping
Testing a New Haystack Type
// Add to HaystackType enum
// Add configuration details
Conclusion
The comprehensive test matrix provides:
- Complete validation of all scoring function Γ haystack combinations
- Performance benchmarking across the entire system
- Quality assurance for configuration compatibility
- Systematic approach to testing that scales with new features
- Detailed reporting for debugging and optimization
This framework ensures that any changes to scoring functions or haystack implementations are thoroughly validated across the entire system, maintaining reliability and performance standards.