Verification and Validation Report: Issue #538

Issue: feat: Add GGUF/llama-cpp backend to terraphim LLM proxy layer
Repository: terraphim/terraphim-ai
Date: 2026-03-11
Status: NOT IMPLEMENTED

Executive Summary

Issue #538 proposes adding local, CPU-only GGUF inference via llama-cpp-rs. The current codebase supports remote (OpenRouter) and local (Ollama) providers but has no direct GGUF/llama.cpp integration. The feature remains unimplemented.

| Requirement | Status | Evidence |
|-------------|--------|----------|
| terraphim_llm_local crate exists | NOT MET | No crate found |
| llama-cpp-rs dependency | NOT MET | Not in any Cargo.toml |
| hf-hub dependency | NOT MET | Not in any Cargo.toml |
| CPU-only inference support | NOT MET | No implementation |
| Same trait interface as remote | MET | LlmClient trait exists |

Detailed Verification

1. terraphim_llm_local Crate Exists

Status: NOT MET

Evidence:

$ ls -la crates/ | grep -E "(llm|local|gguf)"
No matching crates found

All existing crates in the workspace:

terraphim_agent_evolution
terraphim_agent_messaging
terraphim_agent_registry
terraphim_agent_supervisor
terraphim_atomic_client
terraphim_automata
terraphim_config
terraphim_goal_alignment
terraphim_kg_agents
terraphim_kg_orchestration
terraphim_markdown_parser
terraphim_mcp_server
terraphim_middleware
terraphim_multi_agent
terraphim_onepassword_cli
terraphim_persistence
terraphim_rolegraph
terraphim_router
terraphim_server
terraphim_service
terraphim_settings
terraphim_spawner
terraphim_task_decomposition
terraphim_tinyclaw
terraphim_types
terraphim_tui
haystack_atlassian
haystack_core
haystack_discourse
haystack_jmap

No terraphim_llm_local crate exists.

2. llama-cpp-rs Dependency Present

Status: NOT MET

Evidence:

$ grep -r "llama-cpp" crates/*/Cargo.toml
No matches found

Alternative local inference found: The codebase uses Ollama for local inference via HTTP API (feature = "ollama").

3. hf-hub Dependency Present

Status: NOT MET

Evidence:

$ grep -r "hf-hub" crates/*/Cargo.toml
No matches found

No automatic GGUF model download capability exists.

4. CPU-only Inference Support

Status: NOT MET (for GGUF)

Evidence:

No llama.cpp integration found
No GGUF model loading code found
No quantization variant selection code found

Existing local inference: Ollama integration exists at crates/terraphim_service/src/llm.rs (lines 299-564) which provides local inference through Ollama's HTTP API.

5. Same Trait Interface as Remote

Status: MET

Evidence:

File: crates/terraphim_service/src/llm.rs (lines 31-56)

#[async_trait::async_trait]
pub trait LlmClient: Send + Sync {
    fn name(&self) -> &'static str;

    async fn summarize(&self, content: &str, opts: SummarizeOptions) -> ServiceResult<String>;

    async fn list_models(&self) -> ServiceResult<Vec<String>> {
        // Default implementation
    }

    async fn chat_completion(
        &self,
        _messages: Vec<serde_json::Value>,
        _opts: ChatOptions,
    ) -> ServiceResult<String> {
        // Default implementation
    }
}

This trait is already implemented by:

OpenRouterClient (lines 243-292) - remote API
OllamaClient (lines 301-564) - local via HTTP
RouterBridgeLlmClient (feature = "llm_router") - routing layer

A GGUF implementation would use the same trait.
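As a rough illustration of what "using the same trait" could look like, the sketch below implements a simplified, synchronous stand-in for LlmClient on a hypothetical GgufClient. The stand-in trait, the GgufClient struct, its model_path field, and the describe helper are all assumptions made for this example; the real trait is async and returns ServiceResult, and no llama.cpp call is made here.

```rust
use std::path::PathBuf;

// Simplified, synchronous stand-in for the LlmClient trait quoted above.
// The real trait is async and returns ServiceResult; both are replaced here
// so the sketch compiles with the standard library alone.
pub trait LlmClient: Send + Sync {
    fn name(&self) -> &'static str;
    fn summarize(&self, content: &str, max_length: usize) -> Result<String, String>;
}

// Hypothetical GGUF-backed client; not part of the codebase.
pub struct GgufClient {
    pub model_path: PathBuf,
}

impl LlmClient for GgufClient {
    fn name(&self) -> &'static str {
        "gguf"
    }

    fn summarize(&self, content: &str, max_length: usize) -> Result<String, String> {
        // Placeholder: a real implementation would run llama.cpp inference
        // against self.model_path. This stub only truncates the input on a
        // character boundary.
        Ok(content.chars().take(max_length).collect())
    }
}

// Callers depend only on the trait object, so adding a backend would not
// require changes on the calling side.
pub fn describe(client: &dyn LlmClient) -> String {
    format!("provider={}", client.name())
}
```

Passing a GgufClient to describe works exactly as it would for any other implementor, which is the property that would let a GGUF backend sit beside the existing OpenRouter and Ollama clients.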

Current LLM Architecture

                    +------------------+
                    |   build_llm_     |
                    |   from_role()    |
                    +--------+---------+
                             |
           +-----------------+-----------------+
           |                                   |
    +------v------+                     +------v-------+
    |  OpenRouter |                     |    Ollama    |
    |   (remote)  |                     |   (local)    |
    +-------------+                     +--------------+

Proposed addition:

                    +------------------+
                    |   build_llm_     |
                    |   from_role()    |
                    +--------+---------+
                             |
           +-----------------+-----------------+
           |                 |                 |
    +------v------+   +------v-------+   +-----v---------+
    |  OpenRouter |   |    Ollama    |   |  GGUF/Local   |
    |   (remote)  |   |   (local)    |   |  (llama.cpp)  |
    +-------------+   +--------------+   +---------------+
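The three-way dispatch in the proposed diagram could be sketched as follows. This is not the actual build_llm_from_role() code: the Provider enum, the select_provider function, the flat string map standing in for Role.extra, the "openrouter" and "gguf" provider values, the "gguf_model_path" key, and the fallback to the default Ollama URL are all assumptions made for illustration. Only the "ollama" value and the ollama_* keys come from the configuration shown in this report.

```rust
use std::collections::HashMap;

// Provider choices after the proposed addition. `Gguf` is hypothetical.
#[derive(Debug, PartialEq)]
pub enum Provider {
    OpenRouter,
    Ollama { model: String, base_url: String },
    Gguf { model_path: String },
}

// Stand-in for build_llm_from_role(): `extra` models Role.extra as a flat
// string map. Returns None when the provider is unknown or a required key
// is missing.
pub fn select_provider(extra: &HashMap<String, String>) -> Option<Provider> {
    match extra.get("llm_provider")?.as_str() {
        // Assumed provider value; not confirmed by this report.
        "openrouter" => Some(Provider::OpenRouter),
        "ollama" => Some(Provider::Ollama {
            model: extra.get("ollama_model")?.clone(),
            // Assumed fallback to the local Ollama address used in the
            // configuration example.
            base_url: extra
                .get("ollama_base_url")
                .cloned()
                .unwrap_or_else(|| "http://127.0.0.1:11434".to_string()),
        }),
        // Hypothetical new branch for the GGUF/llama.cpp backend.
        "gguf" => Some(Provider::Gguf {
            model_path: extra.get("gguf_model_path")?.clone(),
        }),
        _ => None,
    }
}
```

Under these assumptions, the new backend is one extra match arm, and existing roles configured for OpenRouter or Ollama resolve exactly as before.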

Traceability Matrix

| Requirement | Design Element | Code Location | Test Coverage | Status |
|-------------|----------------|---------------|---------------|--------|
| Common LLM trait | LlmClient trait | llm.rs:31-56 | Unit tests | MET |
| Local inference | OllamaClient | llm.rs:301-564 | Integration tests | PARTIAL |
| GGUF support | MISSING | - | NONE | NOT MET |
| Model download | MISSING | - | NONE | NOT MET |
| Quantization selection | MISSING | - | NONE | NOT MET |

Defect Register

| ID | Description | Severity | Resolution | Status |
|----|-------------|----------|------------|--------|
| D001 | No terraphim_llm_local crate | High | Create crate with llama-cpp-rs | OPEN |
| D002 | No hf-hub integration | Medium | Add hf-hub dependency for downloads | OPEN |
| D003 | No GGUF quantization selection | Medium | Implement quantization variant selector | OPEN |

Existing Local Inference Alternative

The codebase currently supports local inference via Ollama:

Configuration:

{
  "extra": {
    "llm_provider": "ollama",
    "ollama_model": "llama3.1",
    "ollama_base_url": "http://127.0.0.1:11434"
  }
}

Limitations vs GGUF proposal:

Requires Ollama service running
No direct GGUF file loading
No quantization selection at runtime
Additional dependency (Ollama binary)

Recommendations

Option 1: Implement GGUF Support (Full Issue Resolution)

Effort: 2-3 days

Steps:

Create crates/terraphim_llm_local/ crate
Add dependencies: llama-cpp-rs, hf-hub
Implement LlmClient trait for GGUF models
Add model download/caching via hf-hub
Implement quantization variant selection
Add configuration options to Role.extra
Write integration tests
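The quantization variant selection step could, for instance, pick the highest-quality file that fits a memory budget. The sketch below is one possible policy, not a design taken from the issue: the Variant struct, the select_variant function, the preference order, and the use of file size as a rough stand-in for required RAM are all assumptions. The tags themselves are common llama.cpp quantization names.

```rust
// Preference order, highest quality first. The ordering policy is an
// assumption made for this sketch.
const PREFERRED: [&str; 4] = ["Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M"];

// A GGUF file offered by a model repository (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Variant {
    pub file_name: String,
    pub size_bytes: u64,
}

// Return the most-preferred variant whose file fits within the budget.
// File size is only a rough lower bound on the memory a model needs.
pub fn select_variant(variants: &[Variant], budget_bytes: u64) -> Option<&Variant> {
    PREFERRED.iter().find_map(|tag| {
        variants.iter().find(|v| {
            // Match the quantization tag anywhere in the file name,
            // ignoring ASCII case.
            v.file_name.to_ascii_uppercase().contains(*tag)
                && v.size_bytes <= budget_bytes
        })
    })
}
```

With a Q4_K_M and a Q8_0 file on offer, a generous budget selects the Q8_0 file, a tighter one falls back to Q4_K_M, and a budget below both sizes yields None so the caller can report a clear error.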

Benefits:

No Ollama dependency required
Direct GGUF file support
Full control over quantization
Single binary deployment
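Direct GGUF file support would likely involve a cheap sanity check on user-supplied model files before they reach the inference backend. A GGUF file starts with the 4-byte magic "GGUF" followed by a little-endian u32 format version, so a minimal check, assuming a hypothetical read_gguf_version helper that is not part of the codebase, might look like this:

```rust
use std::io::{self, Read};

// Read the first 8 bytes of a GGUF stream: the 4-byte magic "GGUF"
// followed by a little-endian u32 format version. Returns the version,
// or an error if the stream is too short or the magic does not match.
pub fn read_gguf_version<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut header = [0u8; 8];
    reader.read_exact(&mut header)?;
    if &header[..4] != b"GGUF" {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing GGUF magic",
        ));
    }
    Ok(u32::from_le_bytes([header[4], header[5], header[6], header[7]]))
}
```

The function accepts any reader, so it can be pointed at a std::fs::File in production and at an in-memory byte slice in tests. It validates only the header, not the tensor or metadata sections.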

Option 2: Document Ollama as Alternative (Partial Resolution)

If GGUF implementation is not prioritized, document that Ollama provides local inference capability with GGUF support (Ollama can import and serve GGUF models).

Option 3: Close as Not Planned

If the Ollama integration is sufficient for the use case, close this issue with an explanation.

Conclusion

Issue #538 represents a valid feature request that has not been implemented. The codebase has the architectural foundation (LlmClient trait) to support this feature, but no GGUF/llama.cpp integration exists.

GO/NO-GO Decision: NO-GO

Reasoning:

Feature request is not implemented
No code exists to validate
Implementation (Phase 3) work is required before validation can proceed

Next Steps:

If implementing: Create design document for terraphim_llm_local crate
If not implementing: Close issue with explanation about Ollama alternative
If deferring: Add to backlog with priority label

Appendix: Files Referenced

| File | Path | Purpose |
|------|------|---------|
| LLM trait and providers | crates/terraphim_service/src/llm.rs | LlmClient trait, OpenRouter, Ollama impls |
| LLM adapter for evolution | crates/terraphim_agent_evolution/src/llm_adapter.rs | Simplified adapter trait |
| Role configuration | crates/terraphim_config/src/lib.rs | Role.extra configuration |