ADF Direct Dispatch Remediation -- Design Document

Gitea issue: terraphim/terraphim-ai#1890
Pull request: terraphim/terraphim-ai#1885
Phase: 2 of disciplined development (Design)
Author: OpenCode
Date: 2026-05-29

This document records the research summary and implementation plan for fixing the structured PR review findings against the ADF direct-dispatch remediation branch. No implementation is included in this document.

1. Research Summary

1.1 Problem

PR #1885 has three review findings:

1. adf-ctl --local trigger project/agent --direct --wait dispatches successfully, then fails during the wait because wait_for_agent_exit() validates the unsplit project/agent value and rejects the / character.
2. Direct-dispatch UDS validates only bare agent names, so {"project":"bad","agent":"build-runner"} can return ok even though the orchestrator later drops the request.
3. cmd_status --since still interpolates unvalidated user input into a shell command.

1.2 Current Data Flow

adf-ctl trigger project/agent --direct
-> split_project_agent()
-> UDS payload { project, agent, context, synthetic_event }
-> direct_dispatch::handle_connection()
-> validates only agent
-> WebhookDispatch::SpawnAgent
-> handle_direct_dispatch()
-> mention::resolve_mention()
-> spawn_agent_with_event()

1.3 Key Code Locations

| File | Relevant Area |
|------|---------------|
| crates/terraphim_orchestrator/src/bin/adf-ctl.rs | cmd_trigger, wait_for_agent_exit, cmd_status, validate_agent_name_for_shell |
| crates/terraphim_orchestrator/src/direct_dispatch.rs | DispatchCommand, start_direct_dispatch_listener, handle_connection |
| crates/terraphim_orchestrator/src/lib.rs | handle_direct_dispatch, direct listener startup |
| crates/terraphim_orchestrator/src/mention.rs | resolve_mention project-aware resolution |

1.4 Essential Constraints

| Constraint | Why It Matters |
|------------|----------------|
| UDS must return truthful success/failure | CLI automation depends on ok meaning spawn was accepted. |
| project/agent must work with --wait | This is the new documented direct-dispatch shape. |
| Shell interpolation must validate or avoid user input | Local and SSH modes run sh -c commands. |

2. Design Plan

2.1 Step 1: Fix Direct `--wait` Name Handling

Modify only the direct branch in cmd_trigger.

Current issue:

wait_for_agent_exit(local, name, host, timeout)?;

Planned change:

wait_for_agent_exit(local, &agent_name, host, timeout)?;

Acceptance tests:

Add or extend adf-ctl unit coverage for split_project_agent("project/agent").
Add a test for the validation expectation: a bare agent name is accepted by wait, and the project-qualified value is never passed to wait.
If direct function testing is awkward, add a small helper to compute wait target from name and test that helper.
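
The helper mentioned above could look roughly like the following. This is a minimal sketch, not the branch's code: the name wait_target is hypothetical, and the real split_project_agent() signature is not shown in this document, so the sketch splits the string itself.

```rust
/// Hypothetical helper (name is an assumption, not from the branch):
/// returns the bare agent name that the wait step should poll for.
/// Accepts either "agent" or "project/agent".
fn wait_target(name: &str) -> &str {
    match name.split_once('/') {
        // Project-qualified form: wait on the agent part only.
        Some((_project, agent)) => agent,
        // Bare form: use the value unchanged.
        None => name,
    }
}
```

With this shape, a value such as a/b/c still yields b/c, which the existing shell-name validation would reject because it contains /, so malformed input fails closed rather than being silently accepted.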

2.2 Step 2: Make UDS Validation Project-Aware

Change start_direct_dispatch_listener to receive enough information to validate project-qualified requests synchronously.

Preferred minimal design:

pub struct DirectDispatchAgentIndex {
    bare_names: HashSet<String>,
    qualified_names: HashSet<(String, String)>,
}

Build it in lib.rs from self.config.agents:

let agent_index = DirectDispatchAgentIndex::from_agents(&self.config.agents);

Validation logic in direct_dispatch.rs:

match cmd.project.as_deref() {
    Some(project) => validate (project, cmd.agent) in qualified_names,
    None => validate cmd.agent in bare_names,
}

Acceptance tests:

{"agent":"meta-learning"} still returns ok.
{"project":"valid-project","agent":"build-runner"} returns ok.
{"project":"bad-project","agent":"build-runner"} returns error and emits no dispatch.
Existing unknown-agent test still passes.
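
One way the index and its lookup might fit together is sketched below, mirroring the acceptance cases above. Several details are assumptions: AgentDef is a hypothetical stand-in for the entries of self.config.agents, is_known is a hypothetical method name, and the choice to record every agent name in bare_names (so unqualified requests keep their current behaviour) is a guess at the intended semantics.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one entry of the orchestrator's agent config.
pub struct AgentDef {
    pub name: String,
    pub project: Option<String>,
}

#[derive(Default)]
pub struct DirectDispatchAgentIndex {
    bare_names: HashSet<String>,
    qualified_names: HashSet<(String, String)>,
}

impl DirectDispatchAgentIndex {
    /// Builds the index once at listener startup.
    pub fn from_agents(agents: &[AgentDef]) -> Self {
        let mut index = Self::default();
        for agent in agents {
            // Assumption: every agent stays reachable by its bare name.
            index.bare_names.insert(agent.name.clone());
            if let Some(project) = &agent.project {
                index
                    .qualified_names
                    .insert((project.clone(), agent.name.clone()));
            }
        }
        index
    }

    /// Hypothetical lookup: true only if the request names a configured agent.
    pub fn is_known(&self, project: Option<&str>, agent: &str) -> bool {
        match project {
            Some(project) => self
                .qualified_names
                .contains(&(project.to_string(), agent.to_string())),
            None => self.bare_names.contains(agent),
        }
    }
}
```

Because the lookup is a pure in-memory check, handle_connection can answer with an error before any WebhookDispatch::SpawnAgent is emitted, which is what the bad-project acceptance case requires.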

2.3 Step 3: Harden `cmd_status --since`

Add a narrow validator for status durations before interpolating into shell.

Function:

fn validate_since_for_shell(since: &str) -> Result<String>

Allowed grammar:

^[0-9]+[smhdw]$

Examples accepted:

30m
1h
2d
1w

Examples rejected:

1h'; rm -rf /
now
1 hour
(the empty string)

Apply before command construction:

let since = validate_since_for_shell(since)?;

Acceptance tests:

Valid values pass unchanged.
Shell metacharacters fail.
cmd_status uses validated value.
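
A minimal sketch of the validator follows, matching the grammar ^[0-9]+[smhdw]$ without pulling in a regex dependency. The error type is an assumption: the signature above uses a Result alias whose error type is not shown in this document, so the sketch uses String to stay self-contained.

```rust
/// Sketch of the --since validator. Accepts one or more ASCII digits
/// followed by exactly one unit character from s, m, h, d, w.
/// Assumption: String is used as the error type for illustration only.
fn validate_since_for_shell(since: &str) -> Result<String, String> {
    let mut chars = since.chars();
    // The last character must be the unit; the remainder must be digits.
    let unit = chars.next_back();
    let digits = chars.as_str();

    let unit_ok = matches!(unit, Some('s' | 'm' | 'h' | 'd' | 'w'));
    let digits_ok = !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit());

    if unit_ok && digits_ok {
        // Valid values pass through unchanged.
        Ok(since.to_string())
    } else {
        Err(format!(
            "invalid --since value {since:?}: expected digits followed by one of s, m, h, d, w"
        ))
    }
}
```

An allowlist of this kind means quotes, spaces, semicolons, and every other shell metacharacter are rejected by construction, so the value can be interpolated into the sh -c string without separate escaping.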

2.4 Step 4: Proof of Implementation -- Fully Functional Local ADF Flow

The implementation is not considered complete until it is proven by a fully functional local ADF flow, not only unit tests.

The initial proof must start with k=1 to keep the verification small, observable, and deterministic. Here k is the number of matrix slots (local flow work items) in a proof run, so k=1 means a single slot. Larger k values are out of scope until k=1 passes.

Proof target:

Use the local flow pattern from branch task/1875-adf-ctl-local-direct-dispatch, specifically the .terraphim/flows/adf-useful-work-proof.toml style of useful-work proof.
Reduce the matrix to a single slot for the first run (k=1).
Run the flow locally with adf-ctl flow against the working tree.
The flow must produce an artefact under .docs/adf/<issue>/ proving that the local flow executed useful work end-to-end.

Proof acceptance criteria:

A local ADF flow can be loaded from .terraphim/flows/<name>.toml.
With k=1, exactly one work slot executes and records its output.
The flow finishes successfully and reports completed steps.
The generated proof artefact contains the issue id, flow name, slot id, and successful exit status.
The proof is captured in the PR summary before merge.

Recommended first proof command, adjusted to the final local flow name:

cargo run -p terraphim_orchestrator --bin adf-ctl -- flow adf-useful-work-proof --context "issue=1890 k=1"

If the flow engine does not yet parse k from context, implement the proof by committing a one-slot flow fixture or by reducing the matrix in a temporary local test fixture. Do not expand to k=3 until the k=1 proof succeeds.

2.5 Step 5: Verification

Run:

cargo fmt
cargo test -p terraphim_orchestrator --bin adf-ctl
cargo test -p terraphim_orchestrator --lib direct_dispatch
cargo test -p terraphim_orchestrator --lib

Then run the local ADF flow proof from section 2.4 with k=1.

3. Out of Scope

Replacing all sh -c usage in adf-ctl.
Adding authoritative cancel/status admin socket support.
Changing synthetic event env-var names.
Refactoring direct dispatch into a separate service layer.
Proving higher fan-out values before k=1 is fully functional.

4. Implementation Order

1. Add tests for --wait target and --since validation.
2. Fix direct wait target.
3. Add DirectDispatchAgentIndex and project-aware UDS validation tests.
4. Harden cmd_status --since.
5. Add or adapt the local ADF useful-work proof so the first proof run uses k=1.
6. Run verification commands and the local ADF flow proof.
7. Update PR #1885 with the proof artefact path and command output summary.

5. Approval Gate

If this plan is approved, the next step is Phase 3 implementation against PR branch task/1890-adf-direct-dispatch-remediation.
