Handover: ADF Fixes & odilo-developer Investigation
Date: 2026-05-16
Branch: main (bc31ecb40)
Remotes: origin (GitHub) and gitea synced
1. Progress Summary
Completed
| Task | Detail |
|------|--------|
| Fix: ADF running as root | Restarted via systemd as User=alex. Added PATH override (/home/alex/.local/bin) so claude binary is resolvable for KG routing. Removed duplicate user-level systemd service. |
| Fix: Claude OAuth auth | ADF was running as root (manual restart), causing Claude "Not logged in". Now running as alex, all 3 Claude probes passing (sonnet 20s, opus 12s, haiku 14s). |
| Fix: project_by_id WARN log | Added WARN log when project_by_id() returns None during worktree creation. Logs agent name, requested project_id, and fallback path. Prevents silent repo misrouting. |
| Fix: build-runner compilation | Fixed 3 pre-existing clippy warnings promoted to errors: unused variables (_exit_desc, _exit_code), dead code (#[cfg_attr] on config field), duplicate unused import in terraphim_workspace. |
| Fix: evolution field in tests | Added evolution: Default::default() to all 5 test files constructing OrchestratorConfig. Added evolution_snapshot_key: None to HandoffContext test. Restored impl Default for EvolutionConfig with proper defaults. |
| Fix: deprecated TempDir::into_path() (tempfile crate) | Replaced 29 occurrences across 9 test files with .keep(). Removed unused PathBuf import in terraphim_server. |
| Fix: BUILD.md --all-targets | Restored --all-targets for clippy after fixing root cause (test compilation). Updated cargo fallback in build-runner-llm.sh to match. |
| Fix: OOM killer | ADF hit 16GB MemoryMax. Increased to 100G (80% of 128GB). Changed KillMode from mixed to control-group. Added ExecStartPre cleanup script to kill orphaned opencode/gtr/sentrux/cached-context processes. Cleaned 237 orphaned .opencode processes (83GB leaked). |
| Install: yq | v4.53.2 at /usr/local/bin/yq. Enables GitHub Actions workflow extraction in build-runner-llm. |
| Install: rch | Verified v1.0.16 at /home/alex/.local/bin/rch, on ADF PATH. |
Commits (7 new)
bc31ecb40 Merge remote-tracking branch 'gitea/main'
dc7ce955d fix(clippy): replace deprecated tempfile into_path() with keep() across workspace
42043aec6 fix(build): restore --all-targets in BUILD.md, update cargo fallback, fix unused import
8cff66164 fix(build): fix build-runner compilation + pre-existing clippy warnings
cb68b411f Merge remote-tracking branch 'gitea/main'
c326e0c71 fix(orchestrator): add WARN log when project_by_id returns None
Releases
- v2026.05.16.1: odilo-developer fixes (GitHub + Gitea)
2. Current ADF State
| Component | Status |
|-----------|--------|
| ADF process | Running as alex (PID 446199), systemd-managed |
| Memory | 21MB current (was 16GB peak before cleanup) |
| Orphaned opencode | 0 (was 237) |
| Claude probes | sonnet, opus, haiku all passing |
| Provider health | All healthy (minimax, zai, kimi, anthropic) |
| Agent count | 39 definitions loaded across 7 projects |
| Ticks | Completing every 30s (~100-600ms each) |
Systemd Configuration (/etc/systemd/system/adf-orchestrator.service)
- `User=alex`, `Group=alex`
- `MemoryMax=100G` (80% of 128GB RAM)
- `CPUQuota=400%`
- `KillMode=control-group` (kills all children on stop)
- `ExecStartPre=/opt/ai-dark-factory/adf-cleanup.sh` (kills orphaned .opencode, gtr, sentrux, claude, cargo processes)
- PATH override: `/etc/systemd/system/adf-orchestrator.service.d/path.conf`
- Gitea env: `/etc/systemd/system/adf-orchestrator.service.d/env-gitea.conf`
Infrastructure on bigbox
| Tool | Version | Path |
|------|---------|------|
| rch | 1.0.16 | /home/alex/.local/bin/rch |
| yq | 4.53.2 | /usr/local/bin/yq |
| terraphim-agent | 1.16.34 | /usr/local/bin/terraphim-agent |
| claude | 2.1.143 | /home/alex/.local/bin/claude |
3. odilo-developer Investigation
Root Cause of Failures
- Claude OAuth failure: ADF was running as root (manual restart May 16 11:06 CEST). Claude OAuth tokens are in `/home/alex/.claude/`, inaccessible to root. ADF fell back to kimi-for-coding/k2p5, which hung (May 15 zombie -- 3 log entries then silent for days). Fixed by restarting ADF via systemd as alex.
- Wrong repo worktree: The May 16 worktree (`odilo-developer-77abfd9d`) contained terraphim-ai code, not odilo. This was caused by `project_by_id("odilo")` returning None after the ADF restart, so worktree creation fell back to the orchestrator `working_dir`. Fixed by adding a WARN log and addressing the root cause (ADF now runs as alex with proper config loading).
- "rate_limit" exits were Claude OAuth failures: 5 exits on May 13 classified as `rate_limit` (pattern "you've hit your limit") were actually Claude OAuth failures from running as root. With ADF running as alex, Claude can authenticate.
- build-runner-llm degradation: The script was operating in degraded mode -- `yq` was missing (workflow extraction skipped) and `rch` was missing (remote compilation skipped). Fixed by installing yq and verifying rch.
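The wrong-repo fallback and the WARN log that now exposes it can be sketched as below. This is a minimal illustration, not the orchestrator's actual code: the `Projects` struct, `resolve_repo_root`, the exact message format, and the example paths are all hypothetical, and `eprintln!` stands in for whatever logging macro the orchestrator uses. Only the `project_by_id` name and the logged fields (agent name, requested project_id, fallback path) come from this handover.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the orchestrator's project registry.
struct Projects {
    by_id: HashMap<String, PathBuf>,
}

impl Projects {
    /// Mirrors the lookup named in this handover; the real signature may differ.
    fn project_by_id(&self, id: &str) -> Option<&Path> {
        self.by_id.get(id).map(PathBuf::as_path)
    }
}

/// Resolve the repo root for a new worktree. On a failed lookup, return the
/// fallback path together with the warning text, so the misrouting is visible
/// instead of silent.
fn resolve_repo_root(
    projects: &Projects,
    agent: &str,
    project_id: &str,
    working_dir: &Path,
) -> (PathBuf, Option<String>) {
    match projects.project_by_id(project_id) {
        Some(root) => (root.to_path_buf(), None),
        None => {
            // Assumed message format: agent name, requested id, fallback path.
            let warning = format!(
                "WARN project_by_id returned None: agent={agent} project_id={project_id} fallback={}",
                working_dir.display()
            );
            eprintln!("{warning}");
            (working_dir.to_path_buf(), Some(warning))
        }
    }
}
```

With a registry that has no "odilo" entry, `resolve_repo_root(&projects, "odilo-developer", "odilo", Path::new("/opt/adf"))` returns the fallback directory plus a warning. That is the situation that produced a terraphim-ai worktree for an odilo agent.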
Verified: odilo-developer Produces Useful Output
Ran odilo-developer via Claude CLI on bigbox. Successfully:
- Listed ready issues (115 open, top PageRank 0.15)
- Retrieved past learnings
- Found open PRs
- Analysed all 8 Rust crates with accurate descriptions
- Produced structured output tables
Schedule
- Next fire: 01:00 UTC (03:00 CEST), about 4.5 hours after this handover was written
- Window: `0 1-9 * * *` UTC (fires hourly from 01:00 through 09:00, 9x daily)
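The schedule arithmetic can be checked with a small sketch. It hard-codes the cron expression `0 1-9 * * *` rather than parsing it, and the function names are illustrative only. The 20:30 UTC writing time used in the note below is inferred from "4.5 hours" before 01:00 UTC and is not stated anywhere in this handover.

```rust
/// Hours (UTC) at which `0 1-9 * * *` fires: minute 0 of each hour 01 through 09.
fn fire_hours() -> Vec<u32> {
    (1..=9).collect()
}

/// Minutes from `hour:minute` (UTC) until the next fire of `0 1-9 * * *`.
/// A time exactly on a fire slot counts as already fired, so the following
/// slot is returned.
fn minutes_until_next_fire(hour: u32, minute: u32) -> u32 {
    let now = hour * 60 + minute;
    fire_hours()
        .into_iter()
        .map(|h| h * 60)
        .find(|&slot| slot > now)
        .map(|slot| slot - now)
        // Past the 09:00 slot: wrap around to 01:00 on the following day.
        .unwrap_or(24 * 60 - now + 60)
}
```

Under that assumption, `minutes_until_next_fire(20, 30)` gives 270 minutes, which matches the 4.5 hours quoted above, and `fire_hours().len()` confirms the 9 daily runs.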
Open PRs on odilo
- PR #235: Contract conformance CI
- PR #234: Teacher Service Phase 3
- PR #222: Slack Socket Mode ADR-049
4. What's Working
- ADF running stably as alex via systemd
- All provider probes passing
- build-runner-llm: clippy, build, and fmt steps pass (`cargo clippy --workspace --all-targets -- -D warnings` clean)
- Worktree project validation logs WARN on resolution failure
- Orphaned process cleanup on ADF restart
- yq workflow extraction available for build-runner
5. What's Blocked / Needs Follow-up
| Item | Priority | Notes |
|------|----------|-------|
| odilo-developer test failures | Medium | test_tui_service_search failed in manual run -- likely pre-existing test instability, not related to our changes |
| Gitea branch protection 404s | Low | Recurring warnings for atomic-server, better-auth-rust, digital-twins, gitea-robot, gitea (no PR gate configured -- harmless) |
| Gitea 403 for odilo | Low | Token lacks admin write on zestic-ai/odilo -- branch protection gate skipped |
| CLAUDE.md too large (30KB) | Low | Consumes many turns during odilo-developer onboarding -- consider moving to AGENTS.md or trimming |
| build-runner-llm still uses BUILD.md priority 2 | Low | With yq installed, workflow extraction (priority 1) will extract ALL run commands including sudo/setup commands -- might need filtering |