# Terraphim Graph Embeddings: Learning Agent Guide

Model: MiniMax-M2.5 | Version: 1.6.0

## How Knowledge Graph Terms Improve Document Retrieval
Terraphim uses a unique approach to semantic search that combines knowledge graphs with graph-based ranking. Unlike traditional vector embeddings that represent documents as points in a high-dimensional space, Terraphim builds a graph structure where concepts are connected through their co-occurrence in documents.
## Understanding Graph Embeddings

### What Are Graph Embeddings?
Graph embeddings in Terraphim are not numerical vectors. Instead, they represent the topological structure of a knowledge graph:
```
Document: "The CAP theorem and Raft consensus are related"
Terms extracted: "cap theorem", "raft consensus", "related"

Graph structure created:

[cap theorem] ----(edge)---- [raft consensus]
      |                            |
      +---------(edge)-------------+
                    |
            [related concept]
```

### How They're Built
When you index documents into Terraphim:
- Term Matching: The Aho-Corasick automaton finds all known terms from the thesaurus in each document
- Node Creation: Each unique term becomes a node in the graph
- Edge Creation: When two terms appear in the same document, an edge is created between them
- Rank Calculation: Nodes and edges accumulate ranks based on frequency
```rust
// Simplified view of indexing (a sketch; helper names are illustrative)
for document in documents {
    // find every thesaurus term in the document body (Aho-Corasick)
    let matched_terms = automaton.find_terms(&document.body);
    // each unique term becomes (or updates) a node
    for term in &matched_terms {
        graph.bump_node_rank(term);
    }
    // any two terms in the same document are linked (or strengthened) by an edge
    for (i, a) in matched_terms.iter().enumerate() {
        for b in &matched_terms[i + 1..] {
            graph.bump_edge_rank(a, b);
        }
    }
}
```

## The Ranking Formula
When searching, Terraphim calculates relevance using:
```
total_rank = node.rank + edge.rank + document_rank
```

### Breaking It Down
| Component | Description | How It's Calculated |
|-----------|-------------|---------------------|
| node.rank | Term importance | Number of documents containing the term |
| edge.rank | Term relationship strength | Number of documents where connected terms co-occur |
| document_rank | Term frequency | How often the term appears in a specific document |
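To make the formula concrete, here is a toy, self-contained sketch of the three components. The function names (`node_rank`, `edge_rank`, `document_rank`) are invented for this demo and are not the crate's API; each document is reduced to the list of thesaurus terms matched in it.

```rust
// node.rank: number of documents containing the term
fn node_rank(docs: &[&[&str]], term: &str) -> usize {
    docs.iter().filter(|d| d.contains(&term)).count()
}

// edge.rank: number of documents where both terms co-occur
fn edge_rank(docs: &[&[&str]], a: &str, b: &str) -> usize {
    docs.iter().filter(|d| d.contains(&a) && d.contains(&b)).count()
}

// document_rank: how often the term appears in one specific document
fn document_rank(doc: &[&str], term: &str) -> usize {
    doc.iter().filter(|&&t| t == term).count()
}

fn main() {
    // Each document is the list of thesaurus terms matched in it.
    let d1 = ["cap theorem", "raft consensus", "raft consensus"];
    let d2 = ["cap theorem"];
    let docs: [&[&str]; 2] = [&d1, &d2];

    let total = node_rank(&docs, "raft consensus")
        + edge_rank(&docs, "raft consensus", "cap theorem")
        + document_rank(&d1, "raft consensus");
    println!("total_rank for doc 1 = {total}"); // 1 + 1 + 2 = 4
}
```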
## Creating a Learning Agent with Knowledge Graph

### Step 1: Define Your Agent's Knowledge Domain
```rust
// Create a thesaurus with your domain-specific terms
// (argument shapes are illustrative; check the crate docs for exact signatures)
let mut thesaurus = Thesaurus::new("learning-agent".to_string());

// Add core concepts
thesaurus.insert("cap theorem".into(), "cap theorem".into());

// Add synonyms that resolve to the core concept
thesaurus.insert("leader election".into(), "raft consensus".into());
```

### Step 2: Build the RoleGraph
```rust
// (names and signatures are illustrative; see the crate docs)
let role_name = RoleName::new("Learning Agent");
let mut rolegraph = RoleGraph::new(role_name, thesaurus).await?;

// Index your learning documents
for doc in learning_documents {
    rolegraph.insert_document(&doc.id, doc);
}
```

### Step 3: Query with Graph Ranking
```rust
// Search for relevant learnings
// (arguments illustrative: a query string plus optional offset/limit)
let results = rolegraph.query_graph("raft consensus", None, None)?;
```

## How Adding Knowledge Graph Terms Improves Retrieval
### Before Enhancement

With only generic terms:

| Query | Results |
|-------|---------|
| "raft consensus" | No results (term not in thesaurus) |
| "cap theorem" | No results (term not in thesaurus) |
| "database sharding" | No results (term not in thesaurus) |
### After Enhancement

Adding domain-specific terms with synonyms:

| New Term | Synonyms | Result |
|----------|----------|--------|
| "cap theorem" | "consistency", "availability", "partition tolerance" | Found! |
| "raft consensus" | "leader election", "log replication", "raft" | Found! |
| "database sharding" | "horizontal partitioning", "sharding" | Found! |
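A minimal sketch of why synonyms unlock these matches: every synonym resolves to the same canonical concept, so any of them hits the same graph node. The `build_thesaurus` helper and flat string map are invented for illustration; the crate's `Thesaurus` stores richer entries.

```rust
use std::collections::HashMap;

/// Synonym -> canonical concept (illustrative flat map).
fn build_thesaurus() -> HashMap<&'static str, &'static str> {
    let mut t = HashMap::new();
    // Every synonym points at the same canonical term, so any of these
    // queries resolves to the same knowledge-graph node.
    for syn in ["raft consensus", "leader election", "log replication", "raft"] {
        t.insert(syn, "raft consensus");
    }
    for syn in ["cap theorem", "consistency", "availability", "partition tolerance"] {
        t.insert(syn, "cap theorem");
    }
    t
}

fn main() {
    let thesaurus = build_thesaurus();
    println!("'leader election' -> {}", thesaurus["leader election"]);
    // prints: 'leader election' -> raft consensus
}
```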
### Live Demo Output

```
Query: 'raft leader election'
Initial thesaurus:  No results (terms not in thesaurus)
Enhanced thesaurus: 1. doc_raft (rank: 124)

-> Found 1 MORE documents with enhanced thesaurus!
```

## Practical Example: Learning Assistant
### Complete Code
See crates/terraphim_rolegraph/examples/terraphim_graph_embeddings_learnings.rs for the full working example.
## Key Takeaways

- **Graph embeddings are built from co-occurrence** - Every time terms appear together in a document, they're connected in the graph.
- **Adding domain-specific terms unlocks retrieval** - Without "raft" or "cap theorem" in the thesaurus, searches for those terms return nothing.
- **The ranking formula surfaces relevant docs** - Documents connecting more high-ranking terms get higher scores.
- **Graph connectivity indicates semantic coherence** - When query terms are connected in the graph, the query has high semantic meaning.
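The connectivity point can be sketched with a small breadth-first search, a toy analogue of `is_all_terms_connected_by_path`. The edge-list representation here is invented for the demo; the crate's internal graph differs.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// True when every query term can reach every other one through graph edges.
fn all_connected(edges: &[(&str, &str)], terms: &[&str]) -> bool {
    // Build an undirected adjacency list.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }
    let Some(start) = terms.first() else { return true };
    // BFS from the first query term.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue = VecDeque::from([*start]);
    seen.insert(*start);
    while let Some(node) = queue.pop_front() {
        for &next in adj.get(node).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    // Connected query terms suggest a semantically coherent query.
    terms.iter().all(|t| seen.contains(t))
}

fn main() {
    let edges = [
        ("cap theorem", "raft consensus"),
        ("raft consensus", "related concept"),
    ];
    println!("{}", all_connected(&edges, &["cap theorem", "related concept"]));
    // prints: true
}
```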
## Configuration: Defining Terms in Markdown
You can define knowledge graph terms in Markdown files:
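The original example file for this section did not survive. A hypothetical file might look like the following, where the filename and the `synonyms::` marker are assumptions modeled on logseq-style conventions, not confirmed by this document:

```markdown
<!-- raft-consensus.md (path and marker syntax illustrative) -->
# Raft Consensus

synonyms:: leader election, log replication, raft
```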
Terraphim automatically builds the thesaurus from these files.
## Comparison with Vector Embeddings

| Aspect | Vector Embeddings | Terraphim Graph Embeddings |
|--------|-------------------|----------------------------|
| Representation | Numerical vectors | Graph topology |
| Similarity | Cosine/distance | Graph connectivity |
| Interpretability | Low (dense vectors) | High (explicit relationships) |
| Updates | Retrain required | Incremental updates |
| Privacy | May leak to server | Fully local |
| Domain adaptation | Fine-tuning | Add terms to thesaurus |
## Learning via Negativa

*An advanced technique for learning from failed commands.*
One of the most powerful features of Terraphim is the ability to learn from mistakes. When a command fails or produces unexpected results, you can capture this as negative learning and use it to improve future retrieval.
### The Problem

Imagine you're a developer who frequently makes the same mistakes:

- Running `git push -f` when you meant `git push`
- Using `rm -rf *` in the wrong directory
- Typing `cargo run` instead of `cargo build` for certain workflows
### The Solution: Learning via Negativa
- Capture Failed Commands: Store failed commands with their error context
- Create Correction Knowledge: Build a knowledge graph mapping wrong → right
- Use Replace Tool: Automatically suggest corrections on similar patterns
### Example: Command Correction Knowledge Graph
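The correction graph itself was omitted above. A minimal sketch, assuming a flat wrong-command → (replacement, warning) map; this structure is invented for illustration, and the actual demo encodes corrections as knowledge-graph terms instead.

```rust
use std::collections::HashMap;

/// Wrong command -> (suggested replacement, warning).
fn corrections() -> HashMap<&'static str, (&'static str, &'static str)> {
    HashMap::from([(
        "git push -f",
        ("git push", "force push rewrites remote history"),
    )])
}

/// Look up a safer alternative for a failed or risky command.
fn suggest(cmd: &str) -> Option<(&'static str, &'static str)> {
    corrections().get(cmd).copied()
}

fn main() {
    if let Some((fix, warning)) = suggest("git push -f") {
        println!("suggested: {fix} (warning: {warning})");
    }
    // prints: suggested: git push (warning: force push rewrites remote history)
}
```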
### Live Demo Output

```
Query: 'git push force'

Without correction knowledge:
  Results: git push force (incorrect usage)

With correction knowledge:
  Results: git push (suggested), with warning about force push risks
  Rank boost for: safe alternatives
```

## Next Steps
- **Try the example**: Run `cargo run -p terraphim_rolegraph --example learnings_demo`
- **Create your own agent**: Define a thesaurus for your domain
- **Add more terms**: The more precise your terms, the better the retrieval
- **Use synonyms**: They create additional edges, improving ranking
## API Reference

### RoleGraph Methods

```rust
// Create a new RoleGraph
RoleGraph::new(..).await

// Index a document
rolegraph.insert_document(..)

// Query with graph ranking
rolegraph.query_graph(..)

// Check term connectivity
rolegraph.is_all_terms_connected_by_path(..)

// Get graph statistics
rolegraph.get_graph_stats(..)
```

### Thesaurus Methods
```rust
// Create thesaurus
Thesaurus::new(..)

// Insert terms
thesaurus.insert(..)

// Lookup
thesaurus.get(..)
```

## Conclusion
Terraphim's graph-based approach provides a powerful alternative to vector embeddings. By explicitly modeling term relationships through co-occurrence, it offers:
- Better interpretability - You can see exactly why a document was retrieved
- Easy domain adaptation - Just add terms to the knowledge graph
- Privacy-first - All processing happens locally
- Incremental updates - Add new knowledge without retraining
- Learning via negativa - Learn from mistakes and improve over time
The key insight is that adding domain-specific terms directly improves retrieval - there's no need for fine-tuning or training. This makes Terraphim particularly well-suited for personal knowledge management and specialized applications.