Design & Implementation Plan: Revert NormalizedTerm.id and Concept.id from String to u64
1. Summary of Target Behavior
After this revert:
- NormalizedTerm.id: u64 instead of String (generated via atomic counter)
- Concept.id: u64 instead of String (generated via atomic counter)
- Edge.id: u64 instead of String (generated via Szudzik pairing)
- Node.id: u64 instead of String
- Node.connected_with: HashSet<u64> instead of HashSet<String>
- IndexedDocument.nodes: Vec<u64> instead of Vec<String>
- magic_pair(x, y): returns u64 (Szudzik pairing) instead of String
- magic_unpair(z): takes u64, returns (u64, u64) instead of (String, String)
- JSON fixtures: contain integer IDs that deserialize correctly
- Python bindings: align with the restored u64 signatures
2. Key Invariants and Acceptance Criteria
Invariants
| Invariant | Description |
|-----------|-------------|
| ID uniqueness | INT_SEQ atomic counter ensures unique IDs per process |
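| magic_pair ordering | magic_pair(a, b) != magic_pair(b, a) for a != b. A pairing that round-trips ordered pairs cannot also be symmetric, so callers that need one ID per unordered pair must normalize the argument order before pairing |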
| magic_unpair correctness | magic_unpair(magic_pair(a, b)) == (a, b) |
| No duplicate IDs | Inserting the same term twice generates two different IDs (each insert draws a new INT_SEQ value) |
| Serialization round-trip | JSON fixtures deserialize correctly with integer IDs |
Acceptance Criteria
| # | Criterion | Verification |
|---|-----------|--------------|
| 1 | cargo build --workspace succeeds | Build verification |
| 2 | cargo test --workspace passes (369+ tests) | Test suite |
| 3 | cargo clippy --workspace --all-targets -- -D warnings passes | Lint check |
| 4 | cargo fmt --all -- --check passes | Style check |
| 5 | NormalizedTerm::new(1, ...) accepts u64 id | Unit test |
| 6 | Concept::with_id(1, ...) accepts u64 id | Unit test |
| 7 | magic_pair(a, b) != magic_pair(b, a) for a != b (e.g. 28 vs 33 for (3, 5)) | Property test |
| 8 | magic_unpair(magic_pair(3, 5)) == (3, 5) | Property test |
| 9 | JSON fixtures with integer IDs deserialize | Integration test |
| 10 | terraphim_rolegraph_py bindings compile | Build verification |
3. High-Level Design and Boundaries
Strategy: Restore Original u64 Design
The simplest approach is to revert commit e0f98ee6 rather than cherry-pick changes. This restores:
- Global INT_SEQ atomic counter for sequential IDs
- Szudzik pairing for magic_pair/magic_unpair
- Direct u64 field types in structs
Scope Boundaries
Inside scope (will change):
- crates/terraphim_types/src/lib.rs - Type definitions
- crates/terraphim_rolegraph/src/lib.rs - Graph implementation with magic_pair/magic_unpair
- crates/terraphim_automata/src/autocomplete.rs - Autocomplete index
- crates/terraphim_server/src/api.rs - API handlers
- crates/terraphim_rolegraph_py/src/lib.rs - Python bindings
- JSON fixture files
Outside scope (unchanged):
- UUID dependency (still used elsewhere)
- Serialization format for persisted data (handled separately)
- External API consumers (audit if needed)
Component Diagram
terraphim_types (CORE - types)
│
├── NormalizedTerm { id: u64, ... }
├── Concept { id: u64, ... }
├── Edge { id: u64, ... }
├── Node { id: u64, connected_with: HashSet<u64>, ... }
└── IndexedDocument { nodes: Vec<u64>, ... }
│
▼
terraphim_rolegraph (uses types)
│
├── RoleGraph { nodes: AHashMap<u64, Node>, edges: AHashMap<u64, Edge>, ... }
├── magic_pair(x: u64, y: u64) -> u64 (Szudzik pairing)
└── magic_unpair(z: u64) -> (u64, u64)
│
▼
terraphim_automata (uses types)
│
├── Thesaurus { ... }
└── AutocompleteIndex { ... }
│
▼
terraphim_server (uses automata + rolegraph)
│
└── API handlers
4. File/Module-Level Change Plan
| File/Module | Action | Before | After | Dependencies |
|-------------|--------|--------|-------|--------------|
| crates/terraphim_types/src/lib.rs | MODIFY | id: String, INT_SEQ removed | Restore id: u64, INT_SEQ atomic counter | No external deps |
| crates/terraphim_rolegraph/src/lib.rs | MODIFY | String IDs, String-based magic_pair | Restore u64 IDs, Szudzik pairing | terraphim_types |
| crates/terraphim_rolegraph/src/medical.rs | MODIFY | String edge IDs | Restore u64 edge IDs | terraphim_types |
| crates/terraphim_rolegraph/examples/*.rs | MODIFY | String IDs in examples | Restore u64 IDs | terraphim_types, terraphim_rolegraph |
| crates/terraphim_automata/src/autocomplete.rs | MODIFY | String term IDs | Restore u64 term IDs | terraphim_types |
| crates/terraphim_automata/benches/*.rs | MODIFY | String IDs in benchmarks | Restore u64 IDs | terraphim_types |
| crates/terraphim_automata/tests/*.rs | MODIFY | String ID assertions | Restore u64 assertions | terraphim_types |
| crates/terraphim_automata/src/*.rs | REVIEW | Various ID usages | Check and update | terraphim_types |
| terraphim_server/src/api.rs | MODIFY | String-based unpairing | Restore u64 unpairing | terraphim_types, terraphim_rolegraph |
| crates/terraphim_rolegraph_py/src/lib.rs | MODIFY | String magic_pair signatures | Restore u64 signatures | terraphim_types |
| terraphim_server/fixtures/*.json | MODIFY | String IDs in fixtures | Restore integer IDs | Deserialize via serde |
| test-fixtures/*.json | MODIFY | String IDs | Restore integer IDs | Deserialize via serde |
| crates/terraphim_agent/data/guard_*.json | MODIFY | Already updated to strings | May need to be kept as-is OR converted back | Deserialize via serde |
Detailed Type Changes
terraphim_types/src/lib.rs
// RESTORE: Global atomic counter (was removed)
use std::sync::atomic::{AtomicU64, Ordering};
static INT_SEQ: AtomicU64 = AtomicU64::new(1); // initial value assumed
// MODIFY: NormalizedTerm
// MODIFY: Concept
// MODIFY: Edge
// MODIFY: Node
// MODIFY: IndexedDocument
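The struct bodies under these markers did not survive extraction. What follows is a minimal sketch of the restored shapes, built on simplified field sets: only the ID fields this plan names, plus a hypothetical value: String wherever a constructor needs a second argument. Serde derives and the crates' other fields are left out, and the INT_SEQ starting value of 1 is an assumption.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide ID counter. Starting at 1 is an assumption of this sketch.
static INT_SEQ: AtomicU64 = AtomicU64::new(1);

/// Returns the next sequential ID. fetch_add is an atomic read-modify-write,
/// so two callers never get the same value; Relaxed is enough for uniqueness.
pub fn get_int_id() -> u64 {
    INT_SEQ.fetch_add(1, Ordering::Relaxed)
}

// `value: String` is a hypothetical stand-in for the real payload fields.
#[derive(Debug, Clone, PartialEq)]
pub struct NormalizedTerm {
    pub id: u64,
    pub value: String,
}

impl NormalizedTerm {
    /// Takes a plain u64 instead of `impl Into<String>`.
    pub fn new(id: u64, value: impl Into<String>) -> Self {
        Self { id, value: value.into() }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Concept {
    pub id: u64,
    pub value: String,
}

impl Concept {
    /// Draws a fresh ID from INT_SEQ instead of generating a UUID.
    pub fn new(value: impl Into<String>) -> Self {
        Self::with_id(get_int_id(), value)
    }

    pub fn with_id(id: u64, value: impl Into<String>) -> Self {
        Self { id, value: value.into() }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub id: u64, // produced by magic_pair in terraphim_rolegraph
}

impl Edge {
    pub fn new(id: u64) -> Self {
        Self { id }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: u64,
    pub connected_with: HashSet<u64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct IndexedDocument {
    pub nodes: Vec<u64>, // other fields omitted in this sketch
}
```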
terraphim_rolegraph/src/lib.rs
// RESTORE: Szudzik pairing (was replaced with String version)
/// Magic pair - Szudzik pairing function for edge IDs
/// Magic unpair - inverse of Szudzik pairing
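Here is a sketch of the two functions. It uses the formula given under Implementation Notes (Szudzik pairing). The isqrt helper is hypothetical. A float square root on its own can be off by one for large z, so the helper corrects the estimate before using it. On toolchains that provide u64::isqrt (stabilized in Rust 1.84), that method could replace the helper.

```rust
/// Szudzik pairing: maps an ordered pair (x, y) to one u64.
/// No overflow while max(x, y) < 2^32; the largest result,
/// magic_pair(2^32 - 1, 2^32 - 1), is exactly u64::MAX.
pub fn magic_pair(x: u64, y: u64) -> u64 {
    debug_assert!(x.max(y) < (1u64 << 32), "magic_pair input exceeds 2^32 - 1");
    if x >= y {
        x * x + x + y
    } else {
        y * y + x
    }
}

/// floor(sqrt(z)). f64 has a 53-bit mantissa, so the float estimate can be
/// off by one for large z; the two loops correct it in either direction.
fn isqrt(z: u64) -> u64 {
    let mut q = (z as f64).sqrt() as u64;
    while q.checked_mul(q).map_or(true, |sq| sq > z) {
        q -= 1;
    }
    while (q + 1).checked_mul(q + 1).map_or(false, |sq| sq <= z) {
        q += 1;
    }
    q
}

/// Inverse of magic_pair: recovers the ordered pair (x, y).
pub fn magic_unpair(z: u64) -> (u64, u64) {
    let q = isqrt(z);
    let l = z - q * q;
    if l < q {
        (l, q)
    } else {
        (q, l - q)
    }
}
```

Both functions operate on ordered pairs: magic_pair(3, 5) is 28, while magic_pair(5, 3) is 33.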
5. Step-by-Step Implementation Sequence
Step 1: Update terraphim_types crate (CORE)
Purpose: Restore fundamental type definitions
Deployable state: Yes - types only; dependent crates are not yet updated
Risk: Low - isolated change
- Restore the INT_SEQ atomic counter and the get_int_id() function
- Change NormalizedTerm.id from String to u64
- REMOVE NormalizedTerm::new_with_uuid() - delete entirely
- Change NormalizedTerm::new(id: impl Into<String>) to new(id: u64)
- Change Concept.id from String to u64
- Change the Concept::with_id() signature to accept u64
- Change Concept::new() to use get_int_id() instead of UUID
- Change Edge.id from String to u64
- Change Edge::new(id: impl Into<String>) to new(id: u64)
- Change Node.id from String to u64
- Change Node.connected_with from HashSet<String> to HashSet<u64>
- Change IndexedDocument.nodes from Vec<String> to Vec<u64>
- Update tests in the same file
- Verify: cargo test -p terraphim_types
Step 2: Update terraphim_rolegraph crate
Purpose: Restore graph implementation with u64 IDs
Deployable state: After Step 1
Risk: Medium - many internal changes
- Restore magic_pair(x: u64, y: u64) -> u64 (Szudzik pairing)
- Restore magic_unpair(z: u64) -> (u64, u64)
- Update RoleGraph.nodes from AHashMap<String, Node> to AHashMap<u64, Node>
- Update RoleGraph.edges from AHashMap<String, Edge> to AHashMap<u64, Edge>
- Update all methods that construct or query nodes/edges
- Update the init_or_update_node() signature
- Update init_or_update_edge() calls to use u64
- Update medical.rs if it has edge handling
- Update all examples in examples/
- Verify: cargo test -p terraphim_rolegraph
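The following sketch shows how u64 keys simplify edge bookkeeping. The RoleGraph here is hypothetical and simplified in several ways. std::collections::HashMap stands in for AHashMap. Edge carries a made-up count field. Node.connected_with is assumed to hold edge IDs. The real init_or_update_node()/init_or_update_edge() signatures may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Szudzik pairing, repeated here so the sketch compiles on its own.
fn magic_pair(x: u64, y: u64) -> u64 {
    if x >= y { x * x + x + y } else { y * y + x }
}

#[derive(Debug)]
struct Node {
    id: u64,
    connected_with: HashSet<u64>, // assumed: IDs of edges touching this node
}

#[derive(Debug)]
struct Edge {
    id: u64,
    count: u64, // hypothetical co-occurrence counter
}

#[derive(Debug, Default)]
struct RoleGraph {
    nodes: HashMap<u64, Node>, // AHashMap<u64, Node> in the real crate
    edges: HashMap<u64, Edge>, // AHashMap<u64, Edge> in the real crate
}

impl RoleGraph {
    /// Hypothetical simplification of init_or_update_node: create the node
    /// if needed and record that `edge_id` touches it.
    fn init_or_update_node(&mut self, node_id: u64, edge_id: u64) {
        self.nodes
            .entry(node_id)
            .or_insert_with(|| Node { id: node_id, connected_with: HashSet::new() })
            .connected_with
            .insert(edge_id);
    }

    /// Hypothetical simplification of init_or_update_edge: the edge key is
    /// magic_pair(x, y), with no string formatting or allocation.
    fn init_or_update_edge(&mut self, x: u64, y: u64) -> u64 {
        let edge_id = magic_pair(x, y);
        self.edges
            .entry(edge_id)
            .and_modify(|e| e.count += 1)
            .or_insert(Edge { id: edge_id, count: 1 });
        self.init_or_update_node(x, edge_id);
        self.init_or_update_node(y, edge_id);
        edge_id
    }
}
```

The pairing is order-sensitive, so init_or_update_edge(5, 3) creates a separate edge (ID 33) in this sketch. If edges are meant to be undirected, normalizing the argument order before pairing (for example, smaller ID first) would yield one ID per node pair.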
Step 3: Update terraphim_automata crate
Purpose: Restore autocomplete with u64 IDs
Deployable state: After Step 1
Risk: Medium - thesaurus and autocomplete changes
- Update AutocompleteIndex to use u64 keys
- Update NormalizedTerm construction in benchmarks
- Update NormalizedTerm construction in tests
- Verify: cargo test -p terraphim_automata
Step 4: Update terraphim_server crate
Purpose: Fix API handlers that use magic_unpair
Deployable state: After Steps 1-2
Risk: Medium - API signature changes
- Update the code in api.rs that calls magic_unpair so it passes u64 input
- Update any JSON fixture loading that expects string IDs
- Verify: cargo test -p terraphim_server
Step 5: Update terraphim_rolegraph_py Python bindings
Purpose: Align Python bindings with restored signatures
Deployable state: After Steps 1-2
Risk: Medium - FFI changes
- Update the magic_pair wrapper to accept two u64 values and return u64
- Update the magic_unpair wrapper to accept u64 and return (u64, u64)
- Verify: cargo build -p terraphim_rolegraph_py
Step 6: Update JSON fixture files
Purpose: Ensure fixtures deserialize correctly
Deployable state: After Steps 1-5
Risk: Medium - many files
- Check thesaurus_Default.json: IDs are already integers; verify serde accepts them
- Check haystack/*.json: IDs are already integers; verify serde accepts them
- Check test-fixtures/term_to_id*.json: IDs are already integers
- Convert crates/terraphim_agent/data/guard_*.json string IDs to integers OR keep a flexible deserializer
- Verify: run cargo test on fixture-related tests
Step 7: Update Documentation
Purpose: Restore consistency between code and docs
Deployable state: After all code changes
Risk: Low - documentation only
- Update docs/src/kg/knowledge-graph-system.md:
  - nodes: AHashMap<String, Node> → nodes: AHashMap<u64, Node>
  - edges: AHashMap<String, Edge> → edges: AHashMap<u64, Edge>
  - Node.id: u64 (already correct in docs)
  - Edge.id: u64 (already correct in docs)
  - connected_with: HashSet<u64> (already correct in docs)
- Verify: build docs if applicable
Step 8: Remove Benchmark File
Purpose: Clean up investigation artifacts
Deployable state: After docs update
Risk: Low - file removal
- Remove crates/terraphim_types/benches/id_performance.rs
- Remove the [[bench]] section from crates/terraphim_types/Cargo.toml
- Remove the criterion dev-dependency from crates/terraphim_types/Cargo.toml
Step 9: Workspace verification
Purpose: Ensure all crates work together
Deployable state: After all steps
Risk: Low - final verification
- cargo build --workspace
- cargo test --workspace
- cargo clippy --workspace --all-targets -- -D warnings
- cargo fmt --all -- --check
6. Testing & Verification Strategy
| Acceptance Criteria | Test Type | Test Location |
|--------------------|-----------|--------------|
| NormalizedTerm::new(1, ...) works with u64 | Unit | terraphim_types/src/lib.rs tests |
| Concept::with_id(1, ...) works with u64 | Unit | terraphim_types/src/lib.rs tests |
| magic_pair(a, b) != magic_pair(b, a) for a != b | Property | terraphim_rolegraph/src/lib.rs tests |
| magic_unpair(magic_pair(a, b)) == (a, b) | Property | terraphim_rolegraph/src/lib.rs tests |
| JSON with integer IDs deserializes | Integration | terraphim_server tests |
| test_load_thesaurus_from_json passes | Integration | terraphim_automata tests |
| Full workspace builds | Build | CI |
| All workspace tests pass | E2E | CI |
Property-Based Tests for magic_pair/magic_unpair
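The properties listed above can be approximated without extra dependencies. The block below is a sketch, not the crate's actual test suite. It checks round-trip and injectivity exhaustively on a small square of inputs, then checks a few values at the top of the valid range. It asserts order-sensitivity rather than symmetry, because a pairing that round-trips ordered pairs cannot also satisfy magic_pair(a, b) == magic_pair(b, a). If a property-testing crate such as proptest is available as a dev-dependency (not assumed here), generated inputs could drive the same assertions.

```rust
use std::collections::HashSet;

// Minimal copies of the pairing functions so this block compiles on its own.
fn magic_pair(x: u64, y: u64) -> u64 {
    if x >= y { x * x + x + y } else { y * y + x }
}

fn isqrt(z: u64) -> u64 {
    // Float estimate, then correct a possible off-by-one for large z.
    let mut q = (z as f64).sqrt() as u64;
    while q.checked_mul(q).map_or(true, |sq| sq > z) {
        q -= 1;
    }
    while (q + 1).checked_mul(q + 1).map_or(false, |sq| sq <= z) {
        q += 1;
    }
    q
}

fn magic_unpair(z: u64) -> (u64, u64) {
    let q = isqrt(z);
    let l = z - q * q;
    if l < q { (l, q) } else { (q, l - q) }
}

/// Round-trip, injectivity and order-sensitivity for every a, b < limit.
fn check_small_range(limit: u64) {
    let mut seen = HashSet::new();
    for a in 0..limit {
        for b in 0..limit {
            let z = magic_pair(a, b);
            assert_eq!(magic_unpair(z), (a, b), "round-trip failed for ({a}, {b})");
            assert!(seen.insert(z), "collision for ({a}, {b})");
            if a != b {
                assert_ne!(z, magic_pair(b, a), "unexpected symmetry for ({a}, {b})");
            }
        }
    }
}

/// Round-trip at the top of the valid input range (max(a, b) < 2^32),
/// where a float-only square root is most likely to go wrong.
fn check_upper_range() {
    let m = (1u64 << 32) - 1;
    let values = [0, 1, 2, m - 2, m - 1, m];
    for &a in &values {
        for &b in &values {
            assert_eq!(magic_unpair(magic_pair(a, b)), (a, b));
        }
    }
    // The largest valid input pair maps exactly onto u64::MAX.
    assert_eq!(magic_pair(m, m), u64::MAX);
}
```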
7. Risk & Complexity Review
| Risk | Mitigation | Residual Risk |
|------|------------|---------------|
| Another crate depends on String IDs externally | Audit external API consumers before the change | LOW |
| JSON fixtures already converted to String IDs break after the revert | Convert fixtures as part of this change | LOW |
| Python bindings remain out of sync | Update them in this same PR | NONE (fixed in Step 5) |
| Test assertions hardcoded to the String format | Review test assertions before reverting | MEDIUM |
| Benchmarks measure the wrong thing after the revert | Update benchmarks to measure u64 performance | LOW |
| INT_SEQ collision under heavy parallelism | fetch_add on AtomicU64 is an atomic read-modify-write, so concurrent callers always receive distinct values; contention costs throughput, not uniqueness. A repeat is only possible after the counter wraps at 2^64 | VERY LOW |
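The INT_SEQ row can be checked directly. The std-only sketch below hammers the counter from several threads and confirms that no ID repeats. The helper names draw_ids_concurrently and all_unique are hypothetical, and the counter start of 1 is assumed.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Stand-in for the restored counter; starting at 1 is an assumption.
static INT_SEQ: AtomicU64 = AtomicU64::new(1);

fn get_int_id() -> u64 {
    // Atomic read-modify-write: each caller observes a different prior value.
    INT_SEQ.fetch_add(1, Ordering::Relaxed)
}

/// Spawns `threads` workers that each draw `per_thread` IDs; returns them all.
fn draw_ids_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || (0..per_thread).map(|_| get_int_id()).collect::<Vec<u64>>()))
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

/// True if no ID appears twice.
fn all_unique(ids: &[u64]) -> bool {
    let set: HashSet<u64> = ids.iter().copied().collect();
    set.len() == ids.len()
}
```

Heavier contention slows the fetch_add calls down but cannot make them return duplicates.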
8. Open Questions / Decisions for Human Review
| # | Question | Options |
|---|----------|---------|
| 1 | Flexible deserializer (accept both int AND string)? | CLEAN BREAK - No flexible deserializer, update all persisted data |
| 2 | NormalizedTerm::new_with_uuid()? | REMOVE - Delete the function entirely |
| 3 | Update docs/src/kg/knowledge-graph-system.md? | YES - Update as part of this PR to restore consistency |
| 4 | JSON fixtures: pure integers or hybrid? | PURE INTEGERS - Clean format, no hybrid |
| 5 | Benchmark id_performance.rs? | REMOVE - Created for investigation, no longer needed after revert |
Implementation Notes
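Why Szudzik Pairing?
The original magic_pair used Szudzik's "elegant" pairing function rather than the classic Cantor pairing ((a + b)(a + b + 1)/2 + b):
a >= b ? a*a + a + b : b*b + a
- Produces a unique u64 for every ordered pair (a, b) as long as max(a, b) < 2^32; larger inputs overflow u64
- Bijective: unpair(pair(a, b)) == (a, b) and pair(unpair(z)) == z
- Order-sensitive: pair(a, b) != pair(b, a) whenever a != b
- More efficient than String concatenation + parsing: a few integer operations and no allocation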