LLM Markdown Linter Implementation Plan
Project: terraphim_kg_linter
Goal: Implement a comprehensive markdown linter for Terraphim KG schemas that validates commands, knowledge graphs, and thesaurus definitions for LLM agents.
Related PR: #277 - Code Assistant Implementation
Implementation Phases
Phase 1: Foundation (Week 1)
Deliverables: Basic crate structure, parser infrastructure, core types
Tasks
- Create Crate Structure
- Set up Cargo.toml with dependencies
- Create module structure (parser/, validator/, reporter/)
- Add to workspace Cargo.toml
-
Define Core Types (
src/types.rs)LinterConfigstructSchemaTypeenumLintResultstructDiagnosticstruct with severity levelsLocationfor error reportingLintMetadatafor statistics
-
Implement Basic Parser (
src/parser/)frontmatter.rs: YAML frontmatter extraction (reuse TUI parser logic)command.rs: Command definition parsingmod.rs: Parser orchestration- Error handling with thiserror
-
Write Unit Tests
- Test YAML parsing
- Test command definition extraction
- Test error cases (invalid YAML, missing fields)
Dependencies:
[dependencies]
terraphim_types = { path = "../terraphim_types" }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
regex = "1.10"
thiserror = "1.0"Success Criteria:
- ✓ Crate builds successfully
- ✓ Can parse valid command definitions
- ✓ Returns appropriate errors for invalid YAML
- ✓ All unit tests pass
Phase 2: Command Validation (Week 2)
Deliverables: Complete command validation with all rules
Tasks
-
Implement Validators (
src/validator/)command.rs: All command validation rules- Frontmatter validation
- Parameter validation
- Name/version format checking
- Type validation
security.rs: Permission and risk level validation- Permission format checking
- Risk level validation
- Execution mode validation
types.rs: Type system validation- Valid Rust type mapping
- Custom type references
-
Create Reporter (
src/reporter/)text.rs: Human-readable outputjson.rs: Machine-readable JSON outputllm.rs: LLM-friendly format with hints and suggestions
-
Write Comprehensive Tests
- Test all validation rules individually
- Test with valid-command.md example
- Test with invalid-command.md (should catch all 12 errors)
- Edge cases and boundary conditions
Code Example:
// src/validator/command.rs
Success Criteria:
- ✓ All validation rules implemented
- ✓ Catches all errors in invalid-command.md
- ✓ Validates valid-command.md successfully
- ✓ Produces LLM-friendly error messages
- ✓ JSON output is well-structured
Phase 3: Knowledge Graph Integration (Week 3)
Deliverables: Integration with terraphim_automata and terraphim_rolegraph
Tasks
- Add Dependencies
terraphim_automata = { path = "../terraphim_automata", features = ["remote-loading"] }
terraphim_rolegraph = { path = "../terraphim_rolegraph" }
tokio = { version = "1.0", features = ["full"] }-
Implement Automata Integration (
src/analyzer/automata.rs)- Load autocomplete index
- Validate terms against automata
- Fuzzy suggestion generation
- Term coverage analysis
-
Implement Graph Integration (
src/analyzer/graph.rs)- Load RoleGraph
- Validate KG requirements
- Path connectivity validation using
is_all_terms_connected_by_path - Node/edge validation
-
Extend KgLinter API
- Write Integration Tests
- Test with real thesaurus (use AutomataPath::local_example())
- Test graph connectivity validation
- Test fuzzy suggestion generation
- Test with valid-command.md requiring KG concepts
Success Criteria:
- ✓ Can load and use automata index
- ✓ Can load and use RoleGraph
- ✓ Validates KG requirements correctly
- ✓ Detects disconnected concepts
- ✓ Provides fuzzy suggestions for typos
Phase 4: KG Schema & Thesaurus Validation (Week 4)
Deliverables: Support for KG schema and thesaurus markdown formats
Tasks
-
Extend Parsers
parser/schema.rs: Knowledge graph schema parsingparser/thesaurus.rs: Thesaurus schema parsing- Auto-detection of schema type from frontmatter
-
Implement Schema Validators
validator/schema.rs: KG schema validation- Node type validation
- Edge type validation
- Relationship validation
- Type consistency checking
- Validate with valid-kg-schema.md example
-
Implement Thesaurus Validators
- Thesaurus structure validation
- Term normalization checking
- URL validation
- Synonym group validation
- Validate with valid-thesaurus-schema.md example
-
Graph Integrity Checks
- Node uniqueness
- Edge references
- Orphan node detection
- Symmetric edge validation
Success Criteria:
- ✓ Can parse all three schema types
- ✓ Validates KG schemas correctly
- ✓ Validates thesaurus schemas correctly
- ✓ Detects graph integrity issues
- ✓ All example schemas validate correctly
Phase 5: CLI Tool & Polish (Week 5)
Deliverables: Command-line tool, documentation, comprehensive tests
Tasks
-
Create CLI Binary (
src/bin/terraphim-kg-lint.rs)- Argument parsing with clap
- File/directory traversal
- Output formatting
- Exit codes
-
Add CLI Dependencies
[dependencies]
clap = { version = "4.0", features = ["derive"] }
walkdir = "2.0"
colored = "2.0"-
Documentation
- Complete README.md for crate
- API documentation (rustdoc)
- Usage examples
- Integration guide
-
Comprehensive Testing
- Integration tests with all example schemas
- CLI tests
- Performance benchmarks
- Test coverage analysis
-
CI/CD Integration
- Add to GitHub Actions workflows
- Add to pre-commit hooks
- Set up automated testing
CLI Example:
# Install
# Use
Success Criteria:
- ✓ CLI tool works correctly
- ✓ All documentation complete
- ✓ All tests passing
- ✓ CI/CD integration working
- ✓ Ready for production use
Testing Strategy
Unit Tests (Per Module)
// validator/command.rs
Integration Tests
// tests/command_validation_tests.rs
async
async
async End-to-End Tests
#!/bin/bash
# tests/e2e/test_cli.sh
# Test valid command
[ ||
# Test invalid command (should fail)
[ ||
# Test directory
Performance Targets
| Operation | Target | Notes | |-----------|--------|-------| | Parse command | < 1ms | Simple YAML parsing | | Validate command | < 10ms | Without graph integration | | Validate with automata | < 50ms | Includes index lookup | | Validate with graph | < 100ms | Includes connectivity check | | Lint directory (100 files) | < 5s | Parallel processing |
Code Quality Checklist
- [ ] All public APIs documented with rustdoc
- [ ] Error messages are clear and actionable
- [ ] LLM hints provide context and suggestions
- [ ] Code follows Rust naming conventions
- [ ] No unwrap() in library code (use Result)
- [ ] Async/await used correctly with tokio
- [ ] Tests cover success and error cases
- [ ] Examples compile and run correctly
- [ ] Pre-commit hooks pass
- [ ] clippy warnings resolved
- [ ] Code formatted with rustfmt
Integration Points
1. TUI Command System
// crates/terraphim_tui/src/commands/validator.rs
use ;
pub async 2. Pre-commit Hook
#!/bin/bash
# .git/hooks/pre-commit
# Find all markdown command files
for; do
# Skip if not a command file
|| continue
# Run linter
if ! ; then
fi
3. GitHub Actions
# .github/workflows/lint-schemas.yml
name: Lint KG Schemas
on:
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions-rs/toolchain@v1
- name: Build linter
run: cargo build -p terraphim_kg_linter --release
- name: Lint schemas
run: |
./target/release/terraphim-kg-lint \
examples/kg-linter-schemas/ \
--format json \
--strictFuture Enhancements (Post v1.0)
-
LSP Server
- Real-time validation in editors
- Autocomplete for YAML fields
- Hover documentation
-
Auto-fix
- Automatic formatting
- Add missing required fields
- Fix common typos
-
Graph Visualization
- Visual representation of disconnected concepts
- Interactive graph explorer
- Mermaid diagram generation
-
Plugin System
- Custom validation rules
- Domain-specific validators
- Third-party integrations
-
Schema Evolution
- Version migration tools
- Breaking change detection
- Compatibility checking
-
Web UI
- Browser-based linter
- Visual schema editor
- Collaborative validation
Dependencies on Other Work
- PR #277: Leverages security model and validation pipeline concepts
- terraphim_automata: Uses autocomplete and fuzzy search APIs
- terraphim_rolegraph: Uses graph connectivity validation
- terraphim_tui: Reuses markdown parser patterns
Success Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Code coverage | > 80% | cargo tarpaulin |
| Documentation coverage | 100% | cargo doc |
| Example schemas validated | 100% | Integration tests |
| Performance targets | 100% met | Benchmarks |
| CI/CD passing | All tests | GitHub Actions |
Development Environment Setup
# Clone repository
# Create feature branch
# Set up development dependencies
# Run existing tests to ensure setup is correct
# Create linter crate
# Start development
Milestone Tracking
Milestone 1: Foundation (Week 1)
- [ ] Crate structure created
- [ ] Core types defined
- [ ] Basic parser implemented
- [ ] Unit tests passing
Milestone 2: Command Validation (Week 2)
- [ ] All validation rules implemented
- [ ] Reporter formats working
- [ ] Integration tests passing
- [ ] Example schemas validated
Milestone 3: KG Integration (Week 3)
- [ ] Automata integration complete
- [ ] Graph validation working
- [ ] Path connectivity validated
- [ ] Integration tests with real data
Milestone 4: Schema Validation (Week 4)
- [ ] KG schema parsing
- [ ] Thesaurus parsing
- [ ] Graph integrity checks
- [ ] All schema types supported
Milestone 5: Production Ready (Week 5)
- [ ] CLI tool complete
- [ ] Documentation complete
- [ ] CI/CD integration
- [ ] Performance targets met
- [ ] Ready for v1.0 release
Questions & Decisions
Open Questions
- Should we support custom validation plugins?
- What's the priority for LSP integration?
- Should we support TOML frontmatter in addition to YAML?
- How to handle schema versioning and migrations?
Decisions Made
- ✓ Use YAML for frontmatter (consistent with existing TUI)
- ✓ Support async/await for graph operations
- ✓ LLM-friendly output is a first-class feature
- ✓ Leverage existing terraphim_automata and terraphim_rolegraph
- ✓ Start with three schema types: command, kg, thesaurus