Terraphim Knowledge Graph System
Overview
The Terraphim Knowledge Graph (KG) system provides semantic search capabilities by building thesauri from markdown files and using graph-based ranking algorithms. The system converts synonym relationships into graph structures that dramatically improve search relevance and discoverability.
Architecture Components
Core Components
- Logseq Builder - Extracts synonyms from markdown files using synonyms:: syntax
- Thesaurus - Maps synonyms to normalized concept terms with unique IDs
- RoleGraph - Graph structure with nodes, edges, and documents for ranking
- TerraphimGraph Relevance Function - Graph-based scoring algorithm
- Knowledge Graph Local - Local markdown file processing for KG construction
Knowledge Graph Construction
Source Files
Knowledge graphs are built from markdown files in docs/src/kg/:
docs/src/kg/
├── terraphim-graph.md # Graph architecture concepts
├── service.md # Service definitions
├── haystack.md # Haystack integration
├── bug-reporting.md # Bug reporting terminology and structured analysis
├── issue-tracking.md # Domain-specific issue tracking terminology
└── [additional KG files]
Synonym Syntax
Markdown files use the synonyms:: syntax to define concept relationships:
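As a rough illustration of what such a line carries, the sketch below parses a `synonyms::` line and maps each synonym back to its concept. The helper names (`parse_synonyms_line`, `synonyms_for_concept`) are hypothetical and are not the Logseq builder's API; lowercasing terms is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns the comma-separated terms of a
/// `synonyms::` line, or `None` if the line is not a synonym declaration.
fn parse_synonyms_line(line: &str) -> Option<Vec<String>> {
    let rest = line.trim().strip_prefix("synonyms::")?;
    let terms = rest
        .split(',')
        .map(|t| t.trim().to_lowercase())
        .filter(|t| !t.is_empty())
        .collect();
    Some(terms)
}

/// Maps the concept itself and every synonym found in `markdown`
/// to the normalized concept name.
fn synonyms_for_concept(concept: &str, markdown: &str) -> BTreeMap<String, String> {
    let concept = concept.to_lowercase();
    let mut map = BTreeMap::new();
    map.insert(concept.clone(), concept.clone());
    for line in markdown.lines() {
        if let Some(terms) = parse_synonyms_line(line) {
            for term in terms {
                map.insert(term, concept.clone());
            }
        }
    }
    map
}
```

Given a file for the concept terraphim-graph containing the line `synonyms:: graph embeddings, graph, knowledge graph based embeddings`, `synonyms_for_concept` would return four entries, all pointing at `terraphim-graph`.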
Thesaurus Construction
The Logseq builder processes markdown files to create thesaurus mappings:
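// Assumed: the builder type is `Logseq`, and `build` takes a thesaurus name and the KG path.
let logseq_builder = Logseq::default();
let thesaurus = logseq_builder
    .build("Terraphim Engineer".to_string(), PathBuf::from("docs/src/kg"))
    .await?;
Example thesaurus output: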
'terraphim-graph' -> 'terraphim-graph' (ID: 3)
'graph embeddings' -> 'terraphim-graph' (ID: 3)
'graph' -> 'terraphim-graph' (ID: 3)
'knowledge graph based embeddings' -> 'terraphim-graph' (ID: 3)
'haystack' -> 'haystack' (ID: 1)
'service' -> 'service' (ID: 2)
Graph Structure
RoleGraph Components
The RoleGraph converts thesaurus data into searchable graph structures:
Node Structure
Each node represents a concept with connections:
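A minimal sketch of such a node, assuming it stores a concept ID, a rank, and the IDs of the edges it touches. The field and method names are illustrative and may not match the crate's actual definitions.

```rust
/// Illustrative concept node. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    /// Concept ID taken from the thesaurus.
    id: u64,
    /// How often the concept has been seen; higher means more central.
    rank: u64,
    /// IDs of the edges that connect this concept to others.
    connected_with: Vec<u64>,
}

impl Node {
    fn new(id: u64) -> Self {
        Node { id, rank: 0, connected_with: Vec::new() }
    }

    /// Records one more sighting of the concept on `edge_id`.
    /// The edge ID is stored once; the rank grows on every call.
    fn connect(&mut self, edge_id: u64) {
        if !self.connected_with.contains(&edge_id) {
            self.connected_with.push(edge_id);
        }
        self.rank += 1;
    }
}
```

In this sketch the rank is a simple counter, so a concept that appears in many documents or alongside many other concepts ends up with a higher rank.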
Edge Structure
Edges connect concepts and track document associations:
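One way to picture this is an edge that stores its two endpoint concepts, a rank, and a per-document counter. The sketch below is an assumption about the shape of the data, not the crate's actual `Edge` type.

```rust
use std::collections::HashMap;

/// Illustrative edge between two concept nodes. Field names are assumptions.
#[derive(Debug)]
struct Edge {
    from: u64,
    to: u64,
    /// Total number of times the two concepts were seen together.
    rank: u64,
    /// Document ID -> number of co-occurrences in that document.
    documents: HashMap<String, u64>,
}

impl Edge {
    fn new(from: u64, to: u64) -> Self {
        Edge { from, to, rank: 0, documents: HashMap::new() }
    }

    /// Notes that `doc_id` mentions both concepts; each sighting
    /// strengthens the edge and the document's association with it.
    fn record_document(&mut self, doc_id: &str) {
        *self.documents.entry(doc_id.to_string()).or_insert(0) += 1;
        self.rank += 1;
    }
}
```

Keeping the per-document counts on the edge lets a query walk from a matched concept to its edges and read off candidate documents directly.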
Search and Ranking Algorithm
TerraphimGraph Relevance Function
The TerraphimGraph relevance function uses graph structure for ranking:
- Pattern Matching - Find synonym matches in query text using Aho-Corasick
- Node Discovery - Map matched terms to concept nodes via thesaurus
- Edge Traversal - Follow connections between related concepts
- Rank Calculation - Combine node rank + edge rank + document rank
- Result Aggregation - Sort by total rank and return top results
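The five steps above can be sketched end to end with a toy index. Everything here is illustrative: `ToyGraph` and its fields are hypothetical, and a plain substring scan stands in for Aho-Corasick to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Toy index. All names are illustrative assumptions.
struct ToyGraph {
    /// Synonym -> concept ID.
    thesaurus: HashMap<String, u64>,
    /// Concept ID -> node rank.
    node_rank: HashMap<u64, u64>,
    /// Concept ID -> edges, each as (edge rank, [(document ID, document rank)]).
    edges: HashMap<u64, Vec<(u64, Vec<(String, u64)>)>>,
}

impl ToyGraph {
    /// Steps 1-2: find synonyms in the query and map them to concept IDs.
    /// A substring scan is used here in place of Aho-Corasick.
    fn matched_concepts(&self, query: &str) -> Vec<u64> {
        let query = query.to_lowercase();
        let mut ids: Vec<u64> = self
            .thesaurus
            .iter()
            .filter(|(synonym, _)| query.contains(synonym.as_str()))
            .map(|(_, id)| *id)
            .collect();
        ids.sort_unstable();
        ids.dedup();
        ids
    }

    /// Steps 3-5: follow each concept's edges, add node, edge, and document
    /// ranks per document, then sort descending and keep the top `limit`.
    fn query(&self, query: &str, limit: usize) -> Vec<(String, u64)> {
        let mut totals: HashMap<String, u64> = HashMap::new();
        for id in self.matched_concepts(query) {
            let node_rank = self.node_rank.get(&id).copied().unwrap_or(0);
            for (edge_rank, docs) in self.edges.get(&id).into_iter().flatten() {
                for (doc, doc_rank) in docs {
                    *totals.entry(doc.clone()).or_insert(0) +=
                        node_rank + edge_rank + doc_rank;
                }
            }
        }
        let mut results: Vec<(String, u64)> = totals.into_iter().collect();
        // Highest total first; ties are broken by document ID for stable output.
        results.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        results.truncate(limit);
        results
    }
}
```

A document reachable from several matched concepts accumulates rank from each of them, which is why adding synonyms that link concepts together tends to lift related documents.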
Ranking Formula
let total_rank = node.rank + edge.rank + document_rank;
The ranking rewards:
- Concept Importance (node.rank) - How central the concept is
- Connection Strength (edge.rank) - How strongly concepts are related
- Document Relevance (document_rank) - How relevant the document is
Query Processing
Performance Characteristics
Search Performance
Based on comprehensive testing:
- Initial KG State: 10 terms, 3 nodes, 5 edges
- Query Response: Consistent rank 34 for "terraphim-graph"
- Search Speed: Fast pattern matching with Aho-Corasick
- Memory Efficiency: Compact graph representation
Ranking Improvement
In testing, adding synonyms grew the thesaurus and the graph, and more than quadrupled the rank of "terraphim-graph":
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Thesaurus Terms | 10 | 16 | +60% |
| Graph Nodes | 3 | 4 | +33% |
| Graph Edges | 5 | 8 | +60% |
| "terraphim-graph" Rank | 28 | 117 | +318% |
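The Improvement column is the relative change from Before to After. The small helper below (a hypothetical name, not part of the codebase) shows the arithmetic so the figures can be re-derived when the metrics are measured again.

```rust
/// Percentage change from `before` to `after`, rounded to the nearest integer.
/// `before` must be non-zero.
fn percent_increase(before: f64, after: f64) -> i64 {
    ((after - before) / before * 100.0).round() as i64
}
```

For the rank row, (117 - 28) / 28 is about 3.18, which is reported as +318%.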
Role Configuration
Terraphim Engineer Role
The Terraphim Engineer role uses local KG with TerraphimGraph relevance:
Local vs Remote Thesaurus
Local KG (Recommended):
- Built from docs/src/kg markdown files
- 10-16 terms from local content
- Domain-specific, highly relevant
- Fast building (~10 seconds)
Remote Thesaurus:
- Downloaded from external URL
- 1,725+ terms from general content
- May miss local domain terms
- Network dependency
Implementation Examples
Building Knowledge Graph
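// Crate paths and call arguments below are assumptions; check them against the crate docs.
use std::path::PathBuf;
use terraphim_automata::builder::{Logseq, ThesaurusBuilder};
use terraphim_rolegraph::RoleGraph;
use terraphim_types::RoleName;
// 1. Build thesaurus from local KG files
let logseq_builder = Logseq::default();
let thesaurus = logseq_builder
    .build("Terraphim Engineer".to_string(), PathBuf::from("docs/src/kg"))
    .await?;
// 2. Create rolegraph with thesaurus
let role_name = RoleName::new("Terraphim Engineer");
let mut rolegraph = RoleGraph::new(role_name, thesaurus).await?;
// 3. Index documents into rolegraph (`document` is a previously loaded document)
rolegraph.insert_document(&document.id, document);
// 4. Search with graph-based ranking (assumed arguments: query, offset, limit)
let results = rolegraph.query_graph("terraphim-graph", Some(0), Some(10))?;
Adding New Knowledge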
// Create new KG file with synonyms
let new_kg_content = r#"
# Graph Analysis
## Advanced Graph Processing
Graph Analysis provides deep insights into data relationships.
[example] synonyms:: data analysis, network analysis, graph processing,
relationship mapping, connectivity analysis,
terraphim-graph, graph embeddings
This enhances graph-based system capabilities.
"#;
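// Write to KG directory (file name is illustrative; assumes `tokio::fs`)
tokio::fs::write("docs/src/kg/graph-analysis.md", new_kg_content).await?;
// Rebuild thesaurus to include new terms
// (`build` arguments are assumed: a thesaurus name and the KG path)
let expanded_thesaurus = logseq_builder
    .build("Terraphim Engineer".to_string(), PathBuf::from("docs/src/kg"))
    .await?;
Measuring Graph Growth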
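// Measure initial state (`nodes_map` and `edges_map` are assumed to be accessor methods)
let initial_nodes = rolegraph.nodes_map().len();
let initial_edges = rolegraph.edges_map().len();
let initial_terms = thesaurus.len();
// ... add new content and rebuild ...
// Measure the expanded state the same way, on the rebuilt graph and thesaurus
let expanded_nodes = rolegraph.nodes_map().len();
let expanded_edges = rolegraph.edges_map().len();
let expanded_terms = expanded_thesaurus.len();
// Measure growth
let node_growth = expanded_nodes - initial_nodes;
let edge_growth = expanded_edges - initial_edges;
let term_growth = expanded_terms - initial_terms;
println!("Growth: +{node_growth} nodes, +{edge_growth} edges, +{term_growth} terms");
Best Practices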
Content Strategy
- Domain-Specific Terms - Use terminology relevant to your domain
- Synonym Research - Include terms users actually search for
- Concept Mapping - Group related terms under common concepts
- Strategic Placement - Add important synonyms to boost key terms
Performance Optimization
- Local KG Preferred - Use local markdown files for domain relevance
- Measured Growth - Track thesaurus and graph expansion metrics
- Test-Driven - Validate ranking improvements with tests
- Incremental Building - Add synonyms gradually and measure impact
Testing and Validation
- Isolated Testing - Use temporary directories for safe testing
- Baseline Measurement - Record initial state before changes
- Impact Validation - Verify ranking improvements after additions
- Regression Testing - Ensure changes don't break existing functionality
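The isolated-testing and baseline-measurement practices above could look roughly like the following, using only the standard library. Both helpers are hypothetical, and counting terms on `synonyms::` lines is a simplified stand-in for rebuilding the real thesaurus.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Creates a scratch KG directory under the system temp dir so that
/// tests never touch docs/src/kg. Hypothetical helper.
fn scratch_kg_dir(name: &str) -> io::Result<PathBuf> {
    let dir = std::env::temp_dir()
        .join(format!("kg-test-{}-{}", name, std::process::id()));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Counts the terms declared on `synonyms::` lines across all `.md`
/// files directly inside `dir`.
fn count_synonym_terms(dir: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("md") {
            continue;
        }
        for line in fs::read_to_string(&path)?.lines() {
            if let Some(rest) = line.trim().strip_prefix("synonyms::") {
                count += rest.split(',').filter(|t| !t.trim().is_empty()).count();
            }
        }
    }
    Ok(count)
}
```

A test would record `count_synonym_terms` as the baseline, write a new KG file into the scratch directory, measure again, assert on the difference, and finally remove the directory with `fs::remove_dir_all`.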
Troubleshooting
Common Issues
No Search Results:
- Check if thesaurus contains expected terms
- Verify role uses TerraphimGraph relevance function
- Ensure KG path points to correct directory
Low Search Rankings:
- Add more relevant synonyms to target concepts
- Check synonym syntax in markdown files
- Verify graph structure has sufficient connections
Build Failures:
- Validate markdown file syntax
- Check file permissions in KG directory
- Ensure Logseq builder has access to files
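When checking synonym syntax by hand is tedious, a small lint can flag likely mistakes. The sketch below is a heuristic, not the builder's validation logic: it assumes that only a lowercase `synonyms::` marker followed by at least one term is valid, and the function name is hypothetical.

```rust
/// Returns 1-based line numbers that look like malformed synonym
/// declarations: a near-miss marker such as `synonym::` or `Synonyms:`,
/// or a correct marker with no terms after it.
fn suspicious_synonym_lines(markdown: &str) -> Vec<usize> {
    markdown
        .lines()
        .enumerate()
        .filter_map(|(index, line)| {
            let trimmed = line.trim();
            // Only lines that appear to attempt a synonym declaration are checked.
            if !trimmed.to_lowercase().starts_with("synonym") {
                return None;
            }
            match trimmed.strip_prefix("synonyms::") {
                // Well-formed marker with at least one non-empty term.
                Some(rest) if rest.split(',').any(|t| !t.trim().is_empty()) => None,
                // Wrong marker, or a marker with an empty term list.
                _ => Some(index + 1),
            }
        })
        .collect()
}
```

Running this over each file in the KG directory before a rebuild gives a quick list of lines to inspect when expected terms are missing from the thesaurus.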
Debug Information
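// Print thesaurus contents (field names `value` and `id` are assumed)
for (term, concept) in &thesaurus {
    println!("'{}' -> '{}' (ID: {})", term, concept.value, concept.id);
}
// Check graph structure (`nodes_map` and `edges_map` are assumed to be accessor methods)
println!("Nodes: {}, Edges: {}", rolegraph.nodes_map().len(), rolegraph.edges_map().len());
// Test search functionality (assumed arguments: query, offset, limit)
let results = rolegraph.query_graph("terraphim-graph", Some(0), Some(10))?;
println!("Found {} results", results.len());
Bug Reporting and Issue Tracking Enhancement (2025-01-31)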
Domain-Specific Knowledge Graph Files
The Terraphim KG system has been enhanced with comprehensive bug reporting and issue tracking terminology:
bug-reporting.md - Core bug reporting concepts:
- Steps to Reproduce - Comprehensive synonyms for reproduction procedures
- Expected Behaviour - Terminology for intended system behavior
- Actual Behaviour - Variations for describing observed problems
- Impact Analysis - Business and operational impact terminology
- Bug Classification - Issue categorization and severity terms
- Quality Assurance - QA processes and testing terminology
issue-tracking.md - Domain-specific terminology:
- Payroll System Issues - Salary calculation and compensation problems
- Data Consistency Problems - Synchronization and integrity issues
- HR System Integration - Human resources system connectivity
- System Integration Failures - Cross-system communication problems
- Performance Degradation - System slowdown and bottleneck terminology
- User Experience Issues - UI/UX problem descriptions
MCP Integration Testing
Comprehensive test suite validates bug reporting functionality:
test_bug_report_extraction.rs - Core functionality testing:
- Extracts 2,615 paragraphs from comprehensive bug reports
- Extracts 165 paragraphs from short content scenarios
- Tests all four bug report sections systematically
- Validates connectivity analysis across related terms
test_kg_term_verification.rs - Knowledge graph validation:
- Payroll terms: 3 suggestions (provider, service, middleware)
- Data consistency terms: 9 suggestions (data analysis, network analysis, etc.)
- Quality assurance terms: 9 suggestions (connectivity analysis, graph processing, etc.)
Performance Improvements
The enhanced knowledge graph demonstrates significant improvements in structured document analysis:
- Semantic Understanding: Enhanced ability to process structured bug reports using semantic understanding rather than keyword matching
- Domain Coverage: Comprehensive terminology coverage for technical documentation and issue tracking
- Extraction Performance: Robust paragraph extraction across different content types and sizes
- Term Recognition: Effective autocomplete functionality with expanded terminology
Future Enhancements
Planned Features
- Dynamic KG Updates - Hot-reload KG changes without restart
- Graph Visualization - Visual representation of concept relationships
- Advanced Ranking - Machine learning-enhanced relevance scoring
- Multi-Language Support - Synonym support for multiple languages
- Performance Optimization - Caching and incremental updates
- Domain Expansion - Additional specialized terminology for specific industries and use cases
Integration Opportunities
- External Ontologies - Import from RDF/OWL knowledge bases
- Collaborative Editing - Multi-user KG development workflows
- Analytics Dashboard - Search analytics and KG health monitoring
- API Extensions - RESTful APIs for KG management
Conclusion
The Terraphim Knowledge Graph system provides powerful semantic search capabilities through graph-based ranking. By converting synonym relationships into graph structures, the system dramatically improves search relevance and provides a framework for continuous improvement through strategic content additions.
Key Benefits:
- 🔍 Semantic Search - Find content by meaning, not just keywords
- 📈 Ranking Improvement - Up to 318% ranking boost from synonyms
- 🎯 Domain Relevance - Local KG ensures domain-specific accuracy
- 🔧 Easy Expansion - Simple markdown syntax for adding knowledge
- 📊 Measurable Impact - Comprehensive testing framework for validation
The knowledge graph system forms the foundation for intelligent, context-aware search in the Terraphim AI platform.