Research Document: Make Offline Mode Default for terraphim-agent
Date: 2026-03-30 Author: AI Agent (Phase 1 Research) Status: Draft
1. Problem Restatement and Scope
Problem
When a user runs terraphim-agent with no arguments, the default behaviour is to launch the fullscreen ratatui TUI, which unconditionally requires a running terraphim_server at http://localhost:8000. If the server is not running, the user gets an opaque error:
Fullscreen TUI requires a running Terraphim server at http://localhost:8000.
Start terraphim_server or use offline mode with `terraphim-agent repl`.
Connection error: health check failedThis is the first experience most users have with the tool. The error message is technically accurate but user-hostile: it forces the user to either (a) start a separate long-running server process first, or (b) know to use terraphim-agent repl instead.
The fundamental problem: the default invocation of terraphim-agent does not work without infrastructure the user may not have started.
Scope
IN scope:
- Making
terraphim-agent(no args) work without a running server - Determining the appropriate default mode when no server is available
- Refactoring the fullscreen TUI to use the existing
TuiService(offline path) instead of requiringApiClient(online path) - Providing graceful fallback or auto-detection when the server IS available
OUT of scope:
- Removing server mode entirely (server-backed TUI still useful for remote/shared scenarios)
- Changing
terraphim_serveritself - Modifying the REPL or subcommand paths (they already support offline mode)
- Adding new features beyond making existing offline capabilities the default
- Changing the
terraphim-clidevice settings fallback
Distinguishing Problem from Solution Guesses
Problem statements:
- "Running
terraphim-agentwithout arguments fails when no server is running" - "The fullscreen TUI depends on a server that most users don't have running"
- "TuiService provides all the same capabilities locally but is not wired into the TUI"
Solution guesses (flagged, not pursued yet):
- "Make the TUI use TuiService by default" -- likely direction but needs design validation
- "Auto-detect server and fall back to offline" -- option to evaluate
- "Change default to REPL" -- simpler but loses the fullscreen experience
2. User and Business Outcomes
User-Visible Changes
| Current Behaviour | Desired Behaviour |
|---|---|
| terraphim-agent fails with server error | terraphim-agent works immediately, no server needed |
| User must start terraphim_server first | User can use the tool immediately after installation |
| Fullscreen TUI requires server | Fullscreen TUI works standalone |
| Error message says "use repl" as workaround | No workaround needed; default mode just works |
Business Value
- First-run experience: New users succeed immediately, no setup friction
- Single-binary simplicity:
terraphim-agentbecomes self-contained for local use - Power-user option:
--serverflag preserves the server-backed mode for those who need it - Competitive positioning: Most search/KG tools work standalone; requiring a server is an unnecessary barrier
3. System Elements and Dependencies
3.1 Key Components
| Component | Location | Role | Dependencies |
|---|---|---|---|
| Cli struct | crates/terraphim_agent/src/main.rs:537-564 | Clap CLI argument parsing | clap |
| Command enum | crates/terraphim_agent/src/main.rs:566-758 | Subcommand dispatch | clap |
| run_tui() | crates/terraphim_agent/src/main.rs:2881-2940 | Fullscreen TUI terminal setup | ratatui, crossterm |
| ui_loop() | crates/terraphim_agent/src/main.rs:2942-3238 | TUI event loop -- entirely ApiClient-based | ApiClient |
| ApiClient | crates/terraphim_agent/src/client.rs | HTTP client for server API | reqwest |
| TuiService | crates/terraphim_agent/src/service.rs:11-15 | Local service (no network) | TerraphimService, ConfigState |
| TerraphimService | crates/terraphim_service/src/ | Core search/KG/chat engine | All domain crates |
| ReplHandler | crates/terraphim_agent/src/repl/handler.rs:25-58 | REPL with offline/server modes | TuiService or ApiClient |
| terraphim_server | terraphim_server/src/ | Axum HTTP API server | TerraphimService |
| ensure_tui_server_reachable() | crates/terraphim_agent/src/main.rs:96-104 | Health check gate | ApiClient |
| tui_server_requirement_error() | crates/terraphim_agent/src/main.rs:86-94 | Error message builder | -- |
3.2 Data Flow: Current (Server-Required TUI)
terraphim-agent (no args)
-> run_tui_offline_mode() [misleading name!]
-> run_tui(None, transparent)
-> ui_loop(terminal, None, transparent)
-> resolve_tui_server_url(None) -> "http://localhost:8000"
-> ApiClient::new("http://localhost:8000")
-> ensure_tui_server_reachable() <-- FAILS HERE if no server
-> [TUI event loop using api.search(), api.rolegraph(), etc.]3.3 Data Flow: REPL Offline (Already Works)
terraphim-agent repl
-> run_repl_offline_mode()
-> TuiService::new(None) [loads config, builds rolegraphs locally]
-> ReplHandler::new_offline(service)
-> [REPL event loop using service.search_with_role(), etc.]3.4 TuiService Capability Surface
The TuiService at service.rs already provides ALL the capabilities the TUI needs:
| Capability | TuiService Method | ApiClient Method (current TUI) |
|---|---|---|
| Search | search_with_role() | api.search() |
| Get config | get_config() | api.get_config() |
| List roles | list_roles_with_info() | (from config) |
| Get rolegraph | get_role_graph_top_k() | api.rolegraph() |
| Autocomplete | autocomplete() | api.get_autocomplete() |
| Summarize | summarize() | api.summarize_document() |
| Chat | chat() | (via REPL only) |
The TuiService is already used by:
run_offline_command()for all CLI subcommandsReplHandler::new_offline()for the REPL
It is NOT used by the fullscreen TUI (ui_loop). This is the core gap.
3.5 Shared State and Cross-Cutting Concerns
- Config loading: Same priority chain (CLI flag -> settings.toml -> persistence -> embedded defaults)
- Tokio runtime: TUI creates its own runtime; TuiService is async-capable
- Thread safety:
TuiServiceusesArc<Mutex<TerraphimService>>-- safe to use from TUI event loop viart.block_on() - Logging: Both paths use
terraphim_service::logging::init_logging()
4. Constraints and Their Implications
4.1 UX Constraints
| Constraint | Why It Matters | Good vs Bad Solution |
|---|---|---|
| First-run must work | User retention depends on immediate value | Good: works without server. Bad: requires setup guide |
| No breaking change for server users | Existing workflows depend on --server | Good: --server still works. Bad: removing server mode |
| TUI must remain responsive | Terminal UI should not block on disk I/O | Good: async service calls. Bad: synchronous disk reads in render loop |
| Backward-compatible CLI flags | Scripts may use current flags | Good: keep --server semantics. Bad: repurpose flags |
4.2 Technical Constraints
| Constraint | Why It Matters | Good vs Bad Solution |
|---|---|---|
| TuiService is async | TUI event loop is synchronous (block_on bridge needed) | Good: reuse existing rt.block_on() pattern. Bad: rewrite TUI as async |
| ratatui is synchronous | Cannot use .await in render closures | Good: pre-fetch data before render. Bad: async in render |
| Search may be slow on first load | Building rolegraphs and indexing takes time | Good: show loading state. Bad: blank screen |
| Memory usage | Loading all rolegraphs into memory is heavier than HTTP client | Good: acceptable for local use. Bad: unnecessary for server users |
4.3 Performance Constraints
| Constraint | Why It Matters | Implication | |---|---|---| | TUI startup time | Users expect instant feedback | May need splash/loading screen during TuiService init | | Search latency | TUI renders results synchronously | Local search should be fast (in-memory), but first query builds automata | | Memory footprint | Server mode shares indexed data; offline loads per-process | Acceptable for single-user local tool |
4.4 Unclear/Conflicting Constraints
- "offline" naming: The function
run_tui_offline_modedoes NOT actually run offline. This naming must be fixed to avoid confusion. - Default behaviour change: Changing the default from server-required to offline-first is a behaviour change, but since the current default FAILS for most users, this is arguably a fix rather than a break.
5. Risks, Unknowns, and Assumptions
5.1 Unknowns
| Unknown | Impact | How to Resolve |
|---|---|---|
| How long does TuiService::new() take on a typical config? | If slow (>2s), needs loading UI | Benchmark with real config |
| Does the TUI autocomplete path work identically via TuiService vs ApiClient? | UX consistency | Compare REPL autocomplete (TuiService) with TUI autocomplete (ApiClient) |
| Are there server-only features the TUI relies on that TuiService lacks? | Feature parity gaps | Audit all api.* calls in ui_loop against TuiService methods |
| What happens with WebSocket/workflow features in offline TUI? | Some features may be server-only | Check if TUI uses any WebSocket routes |
5.2 Assumptions
- ASSUMPTION: TuiService provides all capabilities needed by the fullscreen TUI. The capability table in section 3.4 supports this, but the exact return types need verification.
- ASSUMPTION: The
rt.block_on()pattern used for ApiClient calls inui_loopwill work identically for TuiService async methods. Both returnResulttypes. - ASSUMPTION: Users who currently run both
terraphim_serverandterraphim-agentwill accept using--serverto maintain their workflow. - ASSUMPTION: The
summarize()method in TuiService (which uses chat/LLM) will work in the TUI context the same as the server's summarize endpoint.
5.3 Risks
| Risk | Severity | Likelihood | Mitigation | |---|---|---|---| | TuiService init is slow, causing blank TUI on startup | Medium | Medium | Add loading screen or init before entering raw mode | | Return types differ between ApiClient and TuiService methods | Medium | Low | Create adapter/wrapper with unified interface | | Memory usage significantly higher than server mode | Low | Low | Profile and compare; acceptable for local tool | | Users rely on current "no args = server" behaviour | Low | Very Low | Current behaviour fails without server; improvement is net positive | | Autocomplete UX differs between local and server | Medium | Medium | TuiService.autocomplete() returns different type than ApiClient.get_autocomplete() -- needs adapter |
6. Context Complexity vs. Simplicity Opportunities
6.1 Sources of Complexity
-
Dual service abstraction: The codebase has two parallel service paths (
ApiClientandTuiService) that do the same things but with different interfaces. The TUI only uses one; the REPL uses both. -
Misleading naming:
run_tui_offline_modedoes not run offline.run_tui_server_modeis the only one that actually uses the server URL explicitly. Both callrun_tui()which always usesApiClient. -
Copy-paste patterns: The TUI event loop in
ui_loop()has ~280 lines ofrt.block_on(api.method())calls that would need to be changed tort.block_on(service.method()). This is a mechanical but large change. -
Feature-gated code: The REPL is behind
#[cfg(feature = "repl")]. Offline-first might need similar feature gating or should be unconditional.
6.2 Simplification Opportunities
-
Unified TUI service trait: Define a trait
TuiServiceProviderwith methods for search, autocomplete, rolegraph, config, summarize. Implement for bothTuiServiceandApiClient. The TUI event loop works against the trait. This eliminates theif offline { ... } else { ... }branching. -
Reuse ReplHandler's pattern: The
ReplHandleralready solved this exact problem withnew_offline(service)andnew_server(api_client). Apply the same pattern to the TUI. -
Simplest possible change: Replace
ApiClientwithTuiServiceinui_loop(), changing method call signatures. No trait needed if we accept that offline is the default and--serveris the exception. -
Eliminate the middle function:
run_tui_offline_mode->run_tui(None, transparent)->ui_loop(). If offline is default, the dispatch can be simplified to: always create TuiService; if--server, create ApiClient instead.
7. Questions for Human Reviewer
-
Default mode preference: Should the default (no-arg) invocation be (a) always offline TuiService, (b) auto-detect server and fall back to offline, or (c) offline with
--serveropt-in?- Why it matters: Determines whether we need server detection logic or a clean cut.
-
Startup loading experience: TuiService::new() builds rolegraphs and indexes. If this takes >1s, should we show a loading screen in the TUI, or initialise before entering raw mode?
- Why it matters: Affects TUI startup flow and user experience.
-
--serverflag semantics: Currently--servermeans "use ApiClient for subcommands". Should it also control the fullscreen TUI mode, or should there be a separate mechanism?- Why it matters: Flag reuse vs. new flag.
-
Feature parity tolerance: The
summarizefeature in TuiService uses the chat/LLM path (requires API key). The server's summarize may differ. Is partial feature parity acceptable for offline TUI?- Why it matters: Defines "done" for this feature.
-
Existing design plan: The previous design plan at
docs/plans/terraphim-agent-tui-offline-server-design-2026-02-13.mdexplicitly chose to keep the TUI server-backed and only improve error messaging. Should this new plan supersede that decision?- Why it matters: Alignment with prior architectural decisions.
-
Scope of change: Should this be a minimal change (just make TUI use TuiService) or should it include refactoring the dual-path pattern (trait abstraction)?
- Why it matters: Risk vs. long-term maintainability.
-
Migration path for server users: Any documentation, scripts, or workflows that assume
terraphim-agent(no args) uses the server will break. What communication/changelog approach is needed?- Why it matters: User communication planning.
Quality Checklist
- [x] Problem clearly distinguished from solutions
- [x] All affected system elements identified with file:line references
- [x] Constraints have clear implications (good vs bad solutions)
- [x] Every assumption marked as such (4 assumptions)
- [x] Risks have de-risking suggestions (5 risks with mitigations)
- [x] Questions are specific and actionable (7 questions with rationale)