Disciplined Design Plan: OpenDAL WARN Message Reduction

Executive Summary

Problem: OpenDAL generates WARN-level log messages when attempting to read configuration and thesaurus files that don't exist yet. While the system correctly falls back to defaults, the warnings create noise that confuses users and undermines confidence.

Root Cause: OpenDAL's LoggingLayer::default() logs NotFound errors at WARN level, which is appropriate for unexpected failures but inappropriate for the expected "file doesn't exist yet" case.

Research Finding: The issue is a user experience problem, not a functional bug. The fallback mechanism works correctly.

Recommended Solution: Centralized warning suppression in the persistence layer (Option B from research).


Phase 1: Research and Problem Understanding

Problem Statement

The Terraphim AI system generates excessive WARN-level log messages during startup and operations when attempting to read configuration files (embedded_config.json, server_config.json, desktop_config.json) and thesaurus files (thesaurus_*.json) that don't yet exist. The system correctly handles these failures by falling back to defaults, but the WARN-level logging creates noise that:

  1. Confuses users into thinking something is broken
  2. Masks real issues in log output
  3. Undermines confidence in system stability

Root Cause Analysis

Source: OpenDAL's LoggingLayer::default() in create_memory_operator() (crates/terraphim_persistence/src/settings.rs:135)

Code Flow:

User Startup β†’ ConfigState::new() β†’ Config::load() β†’ load_from_operator()
β†’ fastest_op.read(key) β†’ OpenDAL LoggingLayer β†’ WARN: NotFound
β†’ Fallback operators β†’ All fail β†’ Return error β†’ Use default config

Key Files:

  • crates/terraphim_persistence/src/settings.rs - Creates operators with LoggingLayer
  • crates/terraphim_persistence/src/lib.rs - Core fallback mechanism
  • crates/terraphim_config/src/lib.rs - Config loading
  • crates/terraphim_persistence/src/thesaurus.rs - Thesaurus loading

User & Business Outcomes

User Impact:

  • Current: 4-7 WARN messages per operation, creating log noise
  • Desired: Clean logs with meaningful messages only

Business Impact:

  • User trust undermined by scary-looking warnings
  • Support cost from users reporting "errors"
  • Difficulty debugging real issues due to log noise
  • Professionalism concerns for production deployments

Phase 2: Planning and Design Thinking

Solution Options Evaluated

| Option | Approach | Complexity | Impact | Risk | |--------|----------|------------|--------|------| | A. Pre-seed defaults | Save default configs on first startup | Low | High | May conflict with user configs | | B. Custom LoggingLayer | Filter NotFound warnings at wrapper level | Medium | High | Adds code complexity | | C. Environment config | Add LOG_LEVEL env variable | Low | Medium | Users won't know to use it | | D. Skip memory backend | Skip memory if no data exists | Medium | Medium | May break resilience |

Recommended Solution: Option B (Custom LoggingLayer Wrapper)

Rationale:

  • High Impact: Eliminates all NotFound warnings across Config, Thesaurus, Documents
  • Low Risk: Doesn't affect fallback logic or functionality
  • Maintainable: Centralized solution in one location
  • User-Controlled: Can extend later for debug mode

Implementation Design

Architecture

Settings.rs
    ↓
create_operator()
    ↓
Custom LoggingLayer (filters NotFound from read operations)
    ↓
OpenDAL Operator

Key Changes

File 1: crates/terraphim_persistence/src/settings.rs

Create a custom logging layer that filters NotFound warnings:

use opendal::layers::LoggingLayer;
use opendal::Operator;
use opendal::Result;
use opendal::Error;

struct NotFoundFilteringLogger;

impl opendal::layers::LoggingLayer for NotFoundFilteringLogger {
    // Override log_read to filter NotFound errors
    async fn on_read(
        &self,
        ctx: opendal::layers::LoggingContext,
        outcome: Result<opendal::Metadata>,
    ) {
        if let Err(e) = &outcome {
            if e.kind() == opendal::ErrorKind::NotFound {
                // Silently skip - this is expected for missing files
                return;
            }
        }
        // Forward all other events to default logger
        LoggingLayer::default().on_read(ctx, outcome).await
    }
}

Alternative Simpler Approach (preferred):

Use OpenDAL's built-in ability to customize log levels:

// Create operator with custom logging configuration
let op = Operator::new(mem)?
    .layer(LoggingLayer::new()
        .with_filter(|err| {
            // Only log if NOT a NotFound error
            err.kind() != opendal::ErrorKind::NotFound
        }))
    .finish();

File 2: crates/terraphim_persistence/src/lib.rs

Add centralized logging configuration function:

/// Create an operator with reduced NotFound warning noise
pub fn create_operator_with_filtered_logging(service: &str) -> Result<Operator> {
    let builder = match service {
        "memory" => Operator::new(opendal::services::Memory::default())?,
        "rocksdb" => Operator::new(opendal::services::Rocksdb::default())?,
        // ... other backends
    };

    // Apply logging layer that filters NotFound
    let op = builder
        .layer(LoggingLayer::new()
            .with_max_level(log::LevelFilter::Warn)
            .with_filter(|err| {
                // Filter out NotFound errors - these are expected
                err.kind() != opendal::ErrorKind::NotFound
            }))
        .finish();

    Ok(op)
}

File 3: Update all operator creation calls

Replace existing create_memory_operator(), create_rocksdb_operator(), etc. to use the new filtered logging.

Changes Required

| File | Change | Complexity | |------|--------|------------| | crates/terraphim_persistence/src/settings.rs | Add custom logging layer function | Medium | | crates/terraphim_persistence/src/lib.rs | Update operator creation calls | Low | | crates/terraphim_config/src/lib.rs | Update to use new persistence API | Low | | crates/terraphim_persistence/src/thesaurus.rs | Update to use new persistence API | Low |

Testing Strategy

  1. Unit Tests:

    • Verify NotFound errors are not logged at WARN level
    • Verify other errors ARE logged correctly
    • Verify fallback still works
  2. Integration Tests:

    • Run full workflow with logs enabled
    • Count WARN messages (should be reduced)
    • Verify no functionality regression
  3. Manual Testing:

    • Fresh startup (no cached data)
    • After data is cached
    • With different storage backends

Phase 3: Implementation Plan

Step 1: Research OpenDAL LoggingLayer API

Task: Verify OpenDAL's LoggingLayer supports filtering

Actions:

# Check OpenDAL version in Cargo.toml
grep opendal Cargo.toml

# Check OpenDAL documentation for LoggingLayer customization
# https://docs.rs/opendal/latest/opendal/layers/struct.LoggingLayer.html

Expected Outcome: Confirm with_filter method exists or find alternative approach.

Step 2: Implement Logging Filter

File: crates/terraphim_persistence/src/settings.rs

Code Changes:

  1. Add create_filtered_operator() function
  2. Apply to all operator creation paths
  3. Test with memory backend first

Step 3: Update Config Persistence

File: crates/terraphim_config/src/lib.rs

Code Changes:

  1. Update load_from_operator() to use filtered operators
  2. Verify fallback still works
  3. Test with fresh startup

Step 4: Update Thesaurus Persistence

File: crates/terraphim_persistence/src/thesaurus.rs

Code Changes:

  1. Update load() to use filtered operators
  2. Verify fallback still works
  3. Test thesaurus operations

Step 5: Test and Validate

Testing:

# Run with RUST_LOG to capture warnings
RUST_LOG=warn cargo run -p terraphim_agent 2>&1 | grep -c "NotFound"

# Expected: 0 warnings about NotFound

# Verify other logging still works
RUST_LOG=debug cargo run -p terraphim_agent 2>&1 | grep -i "error\|warn"
# Expected: Only real errors/warnings, not NotFound

Files to Modify

| File | Change Type | Priority | |------|-------------|----------| | crates/terraphim_persistence/src/settings.rs | Add logging filter | CRITICAL | | crates/terraphim_persistence/src/lib.rs | Update operator creation | HIGH | | crates/terraphim_config/src/lib.rs | Update Config loading | MEDIUM | | crates/terraphim_persistence/src/thesaurus.rs | Update Thesaurus loading | MEDIUM |


Risk Assessment

Risk 1: Filtering Too Much

Probability: Low Impact: High Mitigation: Test with intentional errors to verify they're still logged

Risk 2: OpenDAL API Change

Probability: Low Impact: Medium Mitigation: Check version compatibility, use stable API

Risk 3: Test Compatibility

Probability: Medium Impact: Low Mitigation: Update tests to expect cleaner logs


Completion Criteria

Must Have (Blocking)

  • [ ] NotFound errors no longer logged at WARN level
  • [ ] Other errors still logged correctly
  • [ ] Fallback mechanism still works
  • [ ] No functional regression

Should Have (Non-blocking)

  • [ ] All storage backends (memory, rocksdb, sqlite) use filtered logging
  • [ ] Documentation updated to explain logging behavior
  • [ ] Tests updated to verify correct logging

Could Have (Nice to Have)

  • [ ] Environment variable to control log filtering
  • [ ] Different log levels for first startup vs. cached
  • [ ] Metrics on fallback frequency

Questions for Stakeholder Review

  1. Acceptable warning threshold: Should we aim for zero warnings, or just reduce noise?
  2. Environment variable control: Do we need user-configurable logging, or just fix it automatically?
  3. Test compatibility: Are there automated tests that check for specific log messages?
  4. Scope: Should this fix all backends (memory, rocksdb, sqlite) or just memory?
  5. Timeline: Quick fix now vs. comprehensive solution later?

Summary

This plan addresses the OpenDAL WARN message noise by implementing a centralized logging filter in the persistence layer. The solution:

  • Eliminates NotFound warnings while preserving other logging
  • Maintains all fallback functionality
  • Low risk - doesn't change core behavior
  • High impact - clean, professional logs

Expected Outcome: Users see clean startup logs with meaningful warnings only, improving confidence and reducing support burden.


Plan Version: 1.0 Created: 2026-01-09 Status: Ready for Review