terraphim_config - Configuration Management
Overview
terraphim_config provides configuration management for the Terraphim AI system. It handles role definitions, haystack configurations, knowledge graph settings, and LLM integration. The crate supports environment variable expansion and multi-source configuration loading.
Domain Model
Core Concepts
TerraphimConfig
Main configuration container with roles and global settings.
pub struct TerraphimConfig {
    pub roles: AHashMap<RoleName, Role>,
}
Key Responsibilities:
- Store role configurations
- Provide role lookup
- Support configuration updates
- Enable configuration persistence
Role
User profile with specific knowledge domains, search preferences, and LLM settings.
pub struct Role {
    pub shortname: Option<String>,
    pub name: RoleName,
    pub relevance_function: RelevanceFunction,
    pub terraphim_it: bool,
    pub theme: String,
    pub kg: Option<KnowledgeGraph>,
    pub haystacks: Vec<Haystack>,
    pub llm_enabled: bool,
    pub llm_api_key: Option<String>,
    pub llm_model: Option<String>,
    pub llm_auto_summarize: bool,
    pub llm_chat_enabled: bool,
    pub llm_chat_system_prompt: Option<String>,
    pub llm_chat_model: Option<String>,
    pub llm_context_window: Option<u64>,
    pub extra: AHashMap<String, Value>,
    pub llm_router_enabled: bool,
    pub llm_router_config: Option<LlmRouterConfig>,
}
Key Responsibilities:
- Define user knowledge domains
- Configure search relevance
- Manage LLM integration
- Specify data sources (haystacks)
Haystack
Data source containing searchable documents.
pub struct Haystack {
    pub location: String,
    pub service: ServiceType,
    pub read_only: bool,
    pub fetch_content: bool,
    pub atomic_server_secret: Option<String>,
    pub extra_parameters: std::collections::HashMap<String, String>,
}
Key Responsibilities:
- Define data source location
- Specify indexing service
- Control read/write behaviour
- Support service-specific parameters
Data Models
Role Configuration
RelevanceFunction
Algorithm for ranking search results.
pub enum RelevanceFunction {
    TitleScorer,
    BM25,
    BM25F,
    BM25Plus,
    TerraphimGraph,
}
Use Cases:
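- TitleScorer: Simple title matching
- BM25: Okapi BM25 algorithm
- BM25F: Field-weighted BM25 with per-field length normalisation
- BM25Plus: BM25+ variant (adds a lower bound to the term-frequency score so long documents are not over-penalised)
- TerraphimGraph: Knowledge graph-based ranking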
KnowledgeGraph
Knowledge graph configuration for a role.
pub struct KnowledgeGraph {
    pub automata_path: Option<String>,
    pub knowledge_graph_local: Option<LocalKnowledgeGraph>,
    pub graph_type: Option<String>,
}
Use Cases:
- Specify remote automata URL
- Configure local knowledge graph path
- Define graph type
LocalKnowledgeGraph
Local knowledge graph source configuration.
pub struct LocalKnowledgeGraph {
    pub path: String,
    pub format: Option<String>,
}
Use Cases:
- Specify local file path
- Define graph format (optional)
- Enable local graph loading
Service Configuration
ServiceType
Supported indexing services.
pub enum ServiceType {
    Ripgrep,
    Atomic,
    QueryRs,
    ClickUp,
    Mcp,
    Perplexity,
    GrepApp,
    AiAssistant,
    Quickwit,
    Jmap,
}
Use Cases:
- Ripgrep: Local filesystem search
- Atomic: Atomic Data server
- QueryRs: Reddit + Rust docs search
- ClickUp: Task management
- Mcp: Model Context Protocol
- Perplexity: AI-powered web search
- GrepApp: GitHub code search
- AiAssistant: AI coding assistant logs
- Quickwit: Log and observability data
- Jmap: Email protocol
LLM Configuration
LlmRouterConfig
Intelligent LLM routing configuration.
pub struct LlmRouterConfig {
    pub default_provider: Option<String>,
    pub default_model: Option<String>,
    pub routing_rules: Vec<RoutingRule>,
}
Use Cases:
- Specify default provider
- Configure default model
- Define routing rules
RoutingRule
Rule-based LLM provider selection.
pub struct RoutingRule {
    pub capability: String,
    pub provider: String,
    pub model: String,
    pub priority: Priority,
}
Use Cases:
- Define capability-based routing
- Specify provider and model
- Set routing priority
Priority
Priority levels for routing decisions.
pub enum Priority {
    High,
    Medium,
    Low,
}
Use Cases:
- Rule ordering
- Fallback prioritisation
- Resource allocation
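The source does not show how routing rules are evaluated, so the following is only a sketch under stated assumptions: the highest-priority rule whose capability matches wins, and the default provider and model are the fallback. The types are trimmed local copies of those above, select_route is a hypothetical helper, and the provider and model names used with it are placeholders.

```rust
// Trimmed local copies of the types above so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    High, // declared first, so it compares lowest and wins `min_by_key`
    Medium,
    Low,
}

#[derive(Debug, Clone)]
pub struct RoutingRule {
    pub capability: String,
    pub provider: String,
    pub model: String,
    pub priority: Priority,
}

#[derive(Debug, Clone, Default)]
pub struct LlmRouterConfig {
    pub default_provider: Option<String>,
    pub default_model: Option<String>,
    pub routing_rules: Vec<RoutingRule>,
}

/// Hypothetical helper: returns `(provider, model)` for a capability.
/// Assumption: the highest-priority matching rule wins; otherwise the
/// defaults are used, and `None` is returned if either default is missing.
pub fn select_route(config: &LlmRouterConfig, capability: &str) -> Option<(String, String)> {
    let best = config
        .routing_rules
        .iter()
        .filter(|rule| rule.capability == capability)
        .min_by_key(|rule| rule.priority);
    match best {
        Some(rule) => Some((rule.provider.clone(), rule.model.clone())),
        None => Some((
            config.default_provider.clone()?,
            config.default_model.clone()?,
        )),
    }
}
```

Deriving Ord on Priority keeps the ordering in one place: reordering the variants changes which rule wins, without touching the selection code.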
Implementation Patterns
Configuration Loading
Path Expansion
pub fn expand_path(path: &str) -> PathBuf {
    let mut result = path.to_string();

    // Resolve the home directory, trying several sources in turn.
    fn get_home_dir() -> Option<PathBuf> {
        if let Some(home) = dirs::home_dir() {
            return Some(home);
        }
        if let Ok(home) = std::env::var("HOME") {
            return Some(PathBuf::from(home));
        }
        if let Ok(profile) = std::env::var("USERPROFILE") {
            return Some(PathBuf::from(profile));
        }
        None
    }
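
    // Pass 1: expand every ${VAR:-default}. Plain ${VAR} is skipped here and
    // handled by the regex pass that follows.
    let mut search_from = 0;
    while let Some(offset) = result[search_from..].find("${") {
        let start = search_from + offset;
        let name_start = start + 2;
        // The name ends at the first ':' or '}', so a ":-" found here is
        // guaranteed to belong to this "${" and not to a later expression.
        let name_end = match result[name_start..].find(|c: char| c == ':' || c == '}') {
            Some(i) => name_start + i,
            None => break,
        };
        if !result[name_end..].starts_with(":-") {
            // Not a default-value expression; move past this "${".
            search_from = name_start;
            continue;
        }
        let var_name = &result[name_start..name_end];
        let after_colon = name_end + 2;
        // Find the matching '}', allowing nested braces in the default.
        let mut depth = 1;
        let mut end_pos = after_colon;
        for (i, c) in result[after_colon..].char_indices() {
            match c {
                '{' => depth += 1,
                '}' => {
                    depth -= 1;
                    if depth == 0 {
                        end_pos = after_colon + i;
                        break;
                    }
                }
                _ => {}
            }
        }
        if depth != 0 {
            break; // unterminated expression: leave the rest untouched
        }
        let default_value = &result[after_colon..end_pos];
        let replacement = std::env::var(var_name)
            .unwrap_or_else(|_| default_value.to_string());
        result = format!("{}{}{}", &result[..start], replacement, &result[end_pos + 1..]);
        // Rescan from the replacement so nested defaults still expand.
        search_from = start;
    }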

    // Pass 2: expand plain ${VAR}; unknown variables are left as written.
    let re_braces = regex::Regex::new(r"\$\{([^}]+)\}").unwrap();
    result = re_braces.replace_all(&result, |caps: &regex::Captures| {
        let var_name = &caps[1];
        if var_name == "HOME" {
            get_home_dir()
                .map(|p| p.to_string_lossy().to_string())
                .unwrap_or_else(|| format!("${{{}}}", var_name))
        } else {
            std::env::var(var_name).unwrap_or_else(|_| format!("${{{}}}", var_name))
        }
    }).to_string();

    // Pass 3: expand bare $VAR; unknown variables are left as written.
    let re_dollar = regex::Regex::new(r"\$([A-Za-z_][A-Za-z0-9_]*)").unwrap();
    result = re_dollar.replace_all(&result, |caps: &regex::Captures| {
        let var_name = &caps[1];
        if var_name == "HOME" {
            get_home_dir()
                .map(|p| p.to_string_lossy().to_string())
                .unwrap_or_else(|| format!("${}", var_name))
        } else {
            std::env::var(var_name).unwrap_or_else(|_| format!("${}", var_name))
        }
    }).to_string();

    // Pass 4: expand a leading ~ to the home directory.
    if result.starts_with('~') {
        if let Some(home) = get_home_dir() {
            result = result.replacen('~', &home.to_string_lossy(), 1);
        }
    }

    PathBuf::from(result)
}
Pattern:
- Support shell-like variable expansion
- Handle ${VAR:-default} syntax
- Handle ${VAR} and $VAR syntax
- Expand ~ to home directory
- Use multiple fallback strategies
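To make the ${VAR:-default} rule concrete without touching the process environment, here is a minimal std-only sketch. It is not the crate's implementation: expand_defaults is a hypothetical name, the variable lookup is passed in as a closure, defaults may not contain braces, and ${VAR}, $VAR and ~ are deliberately left untouched.

```rust
/// Expands every `${NAME:-default}` in `input`, using `lookup` instead of
/// the process environment so the behaviour is easy to test.
/// Simplification: defaults may not contain braces; `${NAME}`, `$NAME`
/// and `~` are copied through unchanged.
pub fn expand_defaults(input: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        let body_start = start + 2;
        // The expression ends at the first closing brace after "${".
        let Some(len) = rest[body_start..].find('}') else { break };
        let body = &rest[body_start..body_start + len];
        out.push_str(&rest[..start]);
        match body.split_once(":-") {
            // `${NAME:-default}`: use the variable if set, else the default.
            Some((name, default)) => {
                out.push_str(&lookup(name).unwrap_or_else(|| default.to_string()))
            }
            // Anything else is copied through unchanged.
            None => out.push_str(&rest[start..body_start + len + 1]),
        }
        rest = &rest[body_start + len + 1..];
    }
    // Append whatever follows the last expression (or an unterminated one).
    out.push_str(rest);
    out
}
```

Because the separator is only looked for inside a single pair of braces, a plain ${HOME} earlier in the string cannot absorb the default of a later expression.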
Default Context Window
fn default_context_window() -> Option<u64> {
    Some(32768)
}
Default: 32,768 tokens (~262,144 characters)
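The character figure corresponds to 8 characters per token (32,768 x 8 = 262,144). That ratio is inferred from the two numbers, not taken from the crate; a commonly quoted rule of thumb for English text is closer to 4 characters per token, which would give about 131,072. The hypothetical helper below makes the ratio an explicit parameter.

```rust
/// Rough character budget for a context window of `tokens` tokens.
/// `chars_per_token` is an assumed average, not a value defined by the crate.
pub fn estimated_chars(tokens: u64, chars_per_token: u64) -> u64 {
    // Saturate instead of overflowing on absurdly large inputs.
    tokens.saturating_mul(chars_per_token)
}
```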
Role Management
Role Creation
impl Role {
    pub fn new(name: impl Into<RoleName>) -> Self {
        Self {
            shortname: None,
            name: name.into(),
            relevance_function: RelevanceFunction::TitleScorer,
            terraphim_it: false,
            theme: "default".to_string(),
            kg: None,
            haystacks: vec![],
            llm_enabled: false,
            llm_api_key: None,
            llm_model: None,
            llm_auto_summarize: false,
            llm_chat_enabled: false,
            llm_chat_system_prompt: None,
            llm_chat_model: None,
            llm_context_window: default_context_window(),
            extra: AHashMap::new(),
            llm_router_enabled: false,
            llm_router_config: None,
        }
    }
}
Pattern:
- Provide sensible defaults
- Construct roles through new(), then adjust the public fields as needed
- Support all optional fields
- Default to safe values
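One way to combine the defaults with a few overrides is Rust's struct update syntax. The sketch below uses a trimmed stand-in for Role (String replaces RoleName, and most fields are omitted); engineer_role and the "Engineer" and "dark" values are illustrative assumptions.

```rust
/// Trimmed stand-in for `Role` with a handful of the fields listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct Role {
    pub name: String, // the crate uses `RoleName`; `String` keeps the sketch self-contained
    pub theme: String,
    pub llm_enabled: bool,
    pub llm_model: Option<String>,
    pub llm_context_window: Option<u64>,
}

impl Role {
    /// Mirrors the defaults of `Role::new` shown above for the kept fields.
    pub fn new(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            theme: "default".to_string(),
            llm_enabled: false,
            llm_model: None,
            llm_context_window: Some(32768),
        }
    }
}

/// Overrides two fields and inherits every other value from `Role::new`.
pub fn engineer_role() -> Role {
    Role {
        theme: "dark".to_string(),
        llm_enabled: true,
        ..Role::new("Engineer")
    }
}
```

Fields added to the struct later pick up their default automatically at every call site written this way.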
LLM Validation
impl Role {
    pub fn has_llm_config(&self) -> bool {
        self.llm_enabled && self.llm_api_key.is_some() && self.llm_model.is_some()
    }

    pub fn get_llm_model(&self) -> Option<&str> {
        self.llm_model.as_deref()
    }
}
Pattern:
- Check all required fields present
- Provide convenience accessors
- Return None instead of a placeholder when no model is configured
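A caller usually wants to know which setting is missing, not just that the check failed. The sketch below repeats the has_llm_config check on a trimmed stand-in type (RoleLlm is not part of the crate) and adds a hypothetical missing_llm_setting helper.

```rust
/// Trimmed stand-in for `Role`, keeping only the fields that
/// `has_llm_config` reads. Not the crate's full definition.
#[derive(Debug, Clone, Default)]
pub struct RoleLlm {
    pub llm_enabled: bool,
    pub llm_api_key: Option<String>,
    pub llm_model: Option<String>,
}

impl RoleLlm {
    /// Same check as `Role::has_llm_config` above.
    pub fn has_llm_config(&self) -> bool {
        self.llm_enabled && self.llm_api_key.is_some() && self.llm_model.is_some()
    }

    /// Hypothetical helper: names the first missing setting, in the same
    /// order the check above evaluates them.
    pub fn missing_llm_setting(&self) -> Option<&'static str> {
        if !self.llm_enabled {
            Some("llm_enabled")
        } else if self.llm_api_key.is_none() {
            Some("llm_api_key")
        } else if self.llm_model.is_none() {
            Some("llm_model")
        } else {
            None
        }
    }
}
```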
Haystack Management
Haystack Creation
impl Haystack {
    pub fn new(location: String, service: ServiceType, read_only: bool) -> Self {
        Self {
            location,
            service,
            read_only,
            fetch_content: false,
            atomic_server_secret: None,
            extra_parameters: std::collections::HashMap::new(),
        }
    }

    pub fn new_with_atomic_secret(
        location: String,
        service: ServiceType,
        read_only: bool,
        atomic_server_secret: String,
    ) -> Self {
        Self {
            location,
            service,
            read_only,
            fetch_content: false,
            atomic_server_secret: Some(atomic_server_secret),
            extra_parameters: std::collections::HashMap::new(),
        }
    }
}
Pattern:
- Basic constructor for common cases
- Atomic secret constructor for Atomic service
- Support extra parameters via HashMap
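Neither constructor takes extra parameters, so they are set after construction. As an illustration only, the sketch below adds a chaining helper to a trimmed stand-in for Haystack; with_extra_parameter is hypothetical and not part of the API shown above, and the ~/notes location is a placeholder.

```rust
use std::collections::HashMap;

/// Two of the service variants listed earlier, enough for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceType {
    Ripgrep,
    Atomic,
}

/// Trimmed stand-in for `Haystack`.
#[derive(Debug, Clone)]
pub struct Haystack {
    pub location: String,
    pub service: ServiceType,
    pub read_only: bool,
    pub atomic_server_secret: Option<String>,
    pub extra_parameters: HashMap<String, String>,
}

impl Haystack {
    pub fn new(location: String, service: ServiceType, read_only: bool) -> Self {
        Self {
            location,
            service,
            read_only,
            atomic_server_secret: None,
            extra_parameters: HashMap::new(),
        }
    }

    /// Hypothetical chaining helper for service-specific parameters.
    pub fn with_extra_parameter(mut self, key: &str, value: &str) -> Self {
        self.extra_parameters.insert(key.to_string(), value.to_string());
        self
    }
}

/// A read-only Ripgrep haystack restricted to Markdown files.
pub fn notes_haystack() -> Haystack {
    Haystack::new("~/notes".to_string(), ServiceType::Ripgrep, true)
        .with_extra_parameter("filter", "*.md")
}
```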
Haystack Serialisation
impl Serialize for Haystack {
    fn serialize<S>(&self, serializer: S) -> std::result::Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        use serde::ser::SerializeStruct;

        let mut field_count = 3;
        let include_atomic_secret =
            self.service == ServiceType::Atomic && self.atomic_server_secret.is_some();
        if include_atomic_secret {
            field_count += 1;
        }
        if !self.extra_parameters.is_empty() {
            field_count += 1;
        }

        let mut state = serializer.serialize_struct("Haystack", field_count)?;
        state.serialize_field("location", &self.location)?;
        state.serialize_field("service", &self.service)?;
        state.serialize_field("read_only", &self.read_only)?;
        if include_atomic_secret {
            state.serialize_field("atomic_server_secret", &self.atomic_server_secret)?;
        }
        if !self.extra_parameters.is_empty() {
            state.serialize_field("extra_parameters", &self.extra_parameters)?;
        }
        state.end()
    }
}
Pattern:
- Conditionally include optional fields
- Only include atomic secret for Atomic service
- Only include extra_parameters if not empty
- Dynamic field count calculation
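The field-selection rules can be checked without serde. The std-only sketch below lists the field names the impl above would emit for a given haystack; serialized_fields is a hypothetical helper and the types are trimmed stand-ins. Note that, as written above, fetch_content is never emitted.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum ServiceType {
    Ripgrep,
    Atomic,
}

/// Stand-in with the same fields as `Haystack` above.
#[derive(Debug, Clone)]
pub struct Haystack {
    pub location: String,
    pub service: ServiceType,
    pub read_only: bool,
    pub fetch_content: bool,
    pub atomic_server_secret: Option<String>,
    pub extra_parameters: HashMap<String, String>,
}

/// Names of the fields the `Serialize` impl above would emit, in order.
pub fn serialized_fields(h: &Haystack) -> Vec<&'static str> {
    // Always present; `fetch_content` is not part of the output.
    let mut fields = vec!["location", "service", "read_only"];
    // The secret is written only for Atomic haystacks that have one.
    if h.service == ServiceType::Atomic && h.atomic_server_secret.is_some() {
        fields.push("atomic_server_secret");
    }
    // Empty parameter maps are skipped entirely.
    if !h.extra_parameters.is_empty() {
        fields.push("extra_parameters");
    }
    fields
}
```

A secret attached to a non-Atomic haystack is dropped on serialisation, which keeps it out of saved configuration files.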
Error Handling
Error Types
#[derive(Error, Debug)]
pub enum TerraphimConfigError {
    #[error("Unable to load config")]
    NotFound,

    #[error("At least one role is required")]
    NoRoles,

    #[error("Profile error")]
    Profile(String),

    #[error("Persistence error")]
    Persistence(Box<terraphim_persistence::Error>),

    #[error("Serde JSON error")]
    Json(#[from] serde_json::Error),

    #[error("Cannot initialize tracing subscriber")]
    TracingSubscriber(Box<dyn std::error::Error + Send + Sync>),

    #[error("Pipe error")]
    Pipe(#[from] terraphim_rolegraph::Error),

    #[error("Automata error")]
    Automata(#[from] terraphim_automata::TerraphimAutomataError),

    #[error("Url error")]
    Url(#[from] url::ParseError),

    #[error("IO error")]
    Io(#[from] std::io::Error),

    #[error("Config error")]
    Config(String),
}
Categories:
- Configuration: Config loading errors
- Validation: Role/profile validation
- Integration: Dependency errors
- I/O: File system errors
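The source lists the categories but not which variant belongs where. The sketch below shows one plausible grouping on a trimmed stand-in enum; the mapping, the Category type, and the retry policy are all assumptions, and the dependency errors are collapsed into a single Persistence(String) variant for brevity.

```rust
use std::io;

/// Trimmed stand-in for `TerraphimConfigError`.
#[derive(Debug)]
pub enum ConfigError {
    NotFound,
    NoRoles,
    Profile(String),
    Persistence(String),
    Io(io::Error),
    Config(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Configuration,
    Validation,
    Integration,
    Io,
}

/// Assumed grouping of variants into the four categories listed above.
pub fn categorize(err: &ConfigError) -> Category {
    match err {
        ConfigError::NotFound | ConfigError::Config(_) => Category::Configuration,
        ConfigError::NoRoles | ConfigError::Profile(_) => Category::Validation,
        ConfigError::Persistence(_) => Category::Integration,
        ConfigError::Io(_) => Category::Io,
    }
}

/// Hypothetical policy: only integration and I/O failures are worth retrying.
pub fn is_retryable(err: &ConfigError) -> bool {
    matches!(categorize(err), Category::Integration | Category::Io)
}
```

Matching exhaustively, without a wildcard arm, means a newly added variant fails to compile until it has been assigned a category.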
Performance Optimisations
Lazy Evaluation
Configuration Access
impl TerraphimConfig {
    pub fn get_role(&self, role_name: &RoleName) -> Option<&Role> {
        self.roles.get(role_name)
    }
}
Pattern:
- Use AHashMap for fast lookups
- Return references to avoid cloning
- Use Option<T> for safe access
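The borrow-and-Option pattern can be sketched with std types alone. Here std's HashMap stands in for AHashMap, String for RoleName, and Config and Role are trimmed stand-ins; theme_or is a hypothetical helper showing how a caller can handle a missing role without cloning anything.

```rust
use std::collections::HashMap;

/// Trimmed stand-in for `Role`.
pub struct Role {
    pub name: String,
    pub theme: String,
}

/// Trimmed stand-in for `TerraphimConfig`, keyed by plain strings.
pub struct Config {
    pub roles: HashMap<String, Role>,
}

impl Config {
    /// Borrowing lookup, as in `get_role` above: no `Role` is cloned.
    pub fn get_role(&self, role_name: &str) -> Option<&Role> {
        self.roles.get(role_name)
    }

    /// Hypothetical helper: the role's theme, or `fallback` if the role is unknown.
    pub fn theme_or<'a>(&'a self, role_name: &str, fallback: &'a str) -> &'a str {
        self.get_role(role_name)
            .map(|role| role.theme.as_str())
            .unwrap_or(fallback)
    }
}
```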
Default Values
impl Role {
    pub fn get_llm_model(&self) -> Option<&str> {
        self.llm_model.as_deref()
    }
}
Pattern:
- Provide convenience accessors
- Return references to strings
- Handle None gracefully
Testing Patterns
Unit Tests
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_role_creation() {
        let role = Role::new("DataScientist");
        assert_eq!(role.name.as_str(), "DataScientist");
        assert_eq!(role.relevance_function, RelevanceFunction::TitleScorer);
    }

    #[test]
    fn test_llm_validation() {
        let mut role = Role::new("Test");
        role.llm_enabled = true;
        role.llm_api_key = Some("test-key".to_string());
        role.llm_model = Some("gpt-3.5".to_string());
        assert!(role.has_llm_config());
        assert_eq!(role.get_llm_model(), Some("gpt-3.5"));
    }

    #[test]
    fn test_path_expansion() {
        std::env::set_var("HOME", "/home/user");
        std::env::set_var("TEST_VAR", "test-value");

        let expanded = expand_path("${HOME}/test/${TEST_VAR:-default}");
        assert_eq!(expanded, PathBuf::from("/home/user/test/test-value"));

        let expanded = expand_path("~/test");
        assert_eq!(expanded, PathBuf::from("/home/user/test"));
    }

    #[test]
    fn test_haystack_serialisation() {
        let mut haystack = Haystack::new(
            "/path/to/data".to_string(),
            ServiceType::Ripgrep,
            false,
        );
        haystack.extra_parameters.insert("filter".to_string(), "*.md".to_string());

        let json = serde_json::to_string(&haystack).unwrap();
        let deserialised: Haystack = serde_json::from_str(&json).unwrap();

        assert_eq!(deserialised.location, haystack.location);
        assert_eq!(deserialised.service, haystack.service);
        assert_eq!(
            deserialised.extra_parameters.get("filter"),
            haystack.extra_parameters.get("filter")
        );
    }
}
Best Practices
Configuration Design
- Provide sensible defaults
- Validate at load time
- Support environment variables
- Document all options
Path Handling
- Support shell-like expansion
- Handle cross-platform differences
- Use absolute paths internally
- Preserve user-friendly paths in config
Role Management
- Use unique identifiers
- Support role switching
- Validate role consistency
- Provide role templates
LLM Integration
- Secure API key handling
- Model versioning support
- Fallback provider configuration
- Context window management
Future Enhancements
Planned Features
Configuration Validation
pub fn validate_config(&self) -> Result<Vec<ValidationError>> {
    // Planned; not yet implemented.
    todo!()
}
Configuration Migration
pub fn migrate_config(&mut self, from_version: &str) -> Result<()> {
    // Planned; not yet implemented.
    todo!()
}
Configuration Profiles
pub struct ConfigProfile {
    pub name: String,
    pub roles: AHashMap<RoleName, Role>,
    pub settings: HashMap<String, Value>,
}

pub fn switch_profile(&mut self, profile: &str) -> Result<()> {
    // Planned; not yet implemented.
    todo!()
}
References