# LLM Markdown Linter Design for Terraphim KG Schemas
## Overview
This document specifies the design for an LLM-focused markdown linter that validates markdown-based Terraphim Knowledge Graph (KG) schemas. The linter gives AI agents predefined commands and data type definitions expressed in markdown-like KG structures, and it enforces security permissions.
## Goals
- **Command Validation**: Validate markdown files containing AI agent command definitions
- **Schema Validation**: Ensure KG schema definitions (nodes, edges, concepts) are well-formed
- **Security Enforcement**: Validate permissions, risk levels, and execution modes
- **Type Safety**: Check data type definitions and relationships
- **Graph Integrity**: Validate graph connectivity and term relationships
- **LLM-Friendly**: Provide clear, actionable error messages for AI agents
## Architecture
### Core Components
┌─────────────────────────────────────────────────────────────┐
│                     LLM Markdown Linter                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │    Parser    │    │  Validator   │    │   Reporter   │   │
│  │    Layer     │ →  │    Layer     │ →  │    Layer     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│         │                   │                   │           │
│         ↓                   ↓                   ↓           │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │  YAML Front  │    │  KG Schema   │    │  JSON/Text   │   │
│  │    matter    │    │   Checker    │    │    Output    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│         │                   │                               │
│         ↓                   ↓                               │
│  ┌──────────────┐    ┌──────────────┐                       │
│  │ Command Def  │    │   Automata   │                       │
│  │  Validator   │    │ Integration  │                       │
│  └──────────────┘    └──────────────┘                       │
│                             │                               │
│                             ↓                               │
│                      ┌──────────────┐                       │
│                      │  RoleGraph   │                       │
│                      │  Validation  │                       │
│                      └──────────────┘                       │
└─────────────────────────────────────────────────────────────┘
## Markdown Schema Format
### 1. Command Definition Schema
Commands are defined in markdown files with YAML frontmatter.
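As a rough illustration of the first parsing step, the sketch below separates the YAML frontmatter from the markdown body using only the standard library. The function name `split_frontmatter` is hypothetical and not part of any Terraphim crate; a real parser would hand the frontmatter string to a YAML library afterwards.

```rust
/// Split a markdown document into (frontmatter, body).
/// Hypothetical helper: returns `None` when the document does not start
/// with a `---` line or when the closing `---` line is missing.
pub fn split_frontmatter(source: &str) -> Option<(String, String)> {
    let mut lines = source.lines();
    // The very first line must be the opening delimiter.
    if lines.next()?.trim_end() != "---" {
        return None;
    }
    let mut frontmatter = Vec::new();
    let mut closed = false;
    // Collect lines until the closing delimiter is reached.
    for line in lines.by_ref() {
        if line.trim_end() == "---" {
            closed = true;
            break;
        }
        frontmatter.push(line);
    }
    if !closed {
        return None;
    }
    // Everything after the closing delimiter is the markdown body.
    let body: Vec<&str> = lines.collect();
    Some((frontmatter.join("\n"), body.join("\n")))
}
```

Returning `None` for a missing closing delimiter lets the caller report a single "invalid frontmatter" diagnostic instead of treating the whole file as YAML.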
Implementation Details
The command leverages:
- Aho-Corasick automata for fast term matching
- Graph embeddings for semantic similarity
- BM25 ranking for relevance scoring
### 2. Knowledge Graph Schema
KG schemas define nodes, edges, and relationships:
```markdown
---
schema_type: "knowledge_graph"
schema_name: "engineering_concepts"
version: "1.0.0"
namespace: "terraphim.kg.engineering"
node_types:
  - name: "Concept"
    properties:
      - name: "id"
        type: "u64"
        required: true
        unique: true
      - name: "value"
        type: "NormalizedTermValue"
        required: true
      - name: "rank"
        type: "u64"
        default: 0
      - name: "metadata"
        type: "object"
        required: false
  - name: "Document"
    properties:
      - name: "id"
        type: "string"
        required: true
        unique: true
      - name: "url"
        type: "string"
        required: true
        validation:
          pattern: "^https?://"
      - name: "body"
        type: "string"
        required: true
edge_types:
  - name: "RelatedTo"
    from: "Concept"
    to: "Concept"
    properties:
      - name: "rank"
        type: "u64"
        default: 1
      - name: "doc_hash"
        type: "HashMap<String, u64>"
relationships:
  - type: "path_connectivity"
    description: "All matched terms must be connected by a single path"
    validator: "is_all_terms_connected_by_path"
  - type: "bidirectional"
    description: "Edges are bidirectional in the graph"
    constraint: "symmetric"
security:
  read_permissions:
    - "kg:read"
    - "concept:view"
  write_permissions:
    - "kg:write"
    - "concept:modify"
  delete_permissions:
    - "kg:admin"
    - "concept:delete"
---
# Engineering Concepts Knowledge Graph
This knowledge graph represents engineering concepts and their relationships,
optimized for semantic search and autocomplete functionality.
## Graph Structure
- **Nodes**: Represent individual concepts with normalized terms
- **Edges**: Represent relationships between concepts with co-occurrence counts
- **Documents**: Source documents that contain the concepts
## Validation Rules
1. **Node Uniqueness**: Each concept must have a unique ID
2. **Edge Connectivity**: Edges must reference valid node IDs
3. **Path Connectivity**: Terms in queries should form connected paths
4. **Normalized Terms**: All terms must be lowercase and trimmed
```
### 3. Thesaurus Schema
Thesaurus definitions supply the mappings used for term normalization.
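To make the normalization idea concrete, here is a minimal sketch of a synonym-to-term lookup. The `Thesaurus` struct and its methods are illustrative assumptions, not the types defined in `terraphim_types`; the only behavior taken from this document is that normalized terms are lowercase and trimmed.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory thesaurus mapping synonyms to normalized terms.
pub struct Thesaurus {
    synonyms: HashMap<String, String>,
}

impl Thesaurus {
    pub fn new() -> Self {
        Thesaurus { synonyms: HashMap::new() }
    }

    /// Lowercase and trim a raw term, as the "Normalized Terms" rule requires.
    pub fn normalize(raw: &str) -> String {
        raw.trim().to_lowercase()
    }

    /// Register `synonym` as an alias of `normalized_term`.
    pub fn insert(&mut self, synonym: &str, normalized_term: &str) {
        self.synonyms
            .insert(Self::normalize(synonym), Self::normalize(normalized_term));
    }

    /// Resolve a raw term; unknown terms fall back to their normalized form.
    pub fn resolve(&self, raw: &str) -> String {
        let key = Self::normalize(raw);
        self.synonyms.get(&key).cloned().unwrap_or(key)
    }
}
```

Normalizing both keys and values on insert means a lookup never has to care how the thesaurus file was capitalized.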
## Validation Rules
### 1. Frontmatter Validation
| Rule | Severity | Description |
|------|----------|-------------|
| Valid YAML | Error | Frontmatter must be valid YAML |
| Required Fields | Error | `name`, `description` must be present |
| Valid Execution Mode | Error | Must be `local`, `firecracker`, or `hybrid` |
| Valid Risk Level | Error | Must be `low`, `medium`, `high`, or `critical` |
| Parameter Types | Error | Must be `string`, `number`, `boolean`, `array`, `object` |
| Command Name Format | Error | Must start with letter, alphanumeric + `-_` only |
| Version Format | Warning | Should follow semver (e.g., "1.0.0") |
| Unique Parameters | Error | Parameter names must be unique |
| Required Without Default | Error | Required parameters cannot have default values |
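A few of the rules in this table can be sketched with plain string checks. The `CommandMeta` struct and function names below are hypothetical and assume the frontmatter has already been parsed; the allowed values come from the table above.

```rust
/// Hypothetical, already-parsed view of the frontmatter fields checked below.
pub struct CommandMeta<'a> {
    pub name: &'a str,
    pub execution_mode: &'a str,
    pub risk_level: &'a str,
}

const EXECUTION_MODES: [&str; 3] = ["local", "firecracker", "hybrid"];
const RISK_LEVELS: [&str; 4] = ["low", "medium", "high", "critical"];

/// Command names must start with a letter and use only alphanumerics, `-` or `_`.
pub fn is_valid_command_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
        }
        _ => false,
    }
}

/// Return one error message per violated rule (all three rules are Error severity).
pub fn validate_command(meta: &CommandMeta) -> Vec<String> {
    let mut errors = Vec::new();
    if !is_valid_command_name(meta.name) {
        errors.push(format!("Command Name Format: `{}` is not a valid name", meta.name));
    }
    if !EXECUTION_MODES.contains(&meta.execution_mode) {
        errors.push(format!(
            "Valid Execution Mode: `{}` is not one of local, firecracker, hybrid",
            meta.execution_mode
        ));
    }
    if !RISK_LEVELS.contains(&meta.risk_level) {
        errors.push(format!(
            "Valid Risk Level: `{}` is not one of low, medium, high, critical",
            meta.risk_level
        ));
    }
    errors
}
```

Each message starts with the rule name from the table so that an agent can map a failure back to the rule it violated.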
### 2. Knowledge Graph Validation
| Rule | Severity | Description |
|------|----------|-------------|
| Node ID Uniqueness | Error | All node IDs must be unique within graph |
| Edge References | Error | Edge IDs must reference valid nodes |
| Type Consistency | Error | Property types must match schema |
| Path Connectivity | Warning | Recommended: matched terms should form paths |
| Normalized Terms | Error | All terms must be lowercase, trimmed |
| Symmetric Edges | Error | Bidirectional edges must have reverse edges |
| Orphan Nodes | Warning | Nodes should have at least one edge |
| Document References | Error | Document IDs in edges must exist |
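The structural rules in this table (unique node IDs, valid edge references, symmetric edges, orphan nodes) can be checked in one pass over a graph. The sketch below uses a deliberately tiny, hypothetical `Graph` type with `u64` node IDs; it does not reflect the actual `terraphim_rolegraph` data structures.

```rust
use std::collections::HashSet;

/// Hypothetical minimal graph: node IDs plus directed (from, to) edges.
pub struct Graph {
    pub nodes: Vec<u64>,
    pub edges: Vec<(u64, u64)>,
}

#[derive(Debug, PartialEq)]
pub enum Issue {
    DuplicateNode(u64),           // Node ID Uniqueness (Error)
    DanglingEdge(u64, u64),       // Edge References (Error)
    MissingReverseEdge(u64, u64), // Symmetric Edges (Error)
    OrphanNode(u64),              // Orphan Nodes (Warning)
}

pub fn check_graph(g: &Graph) -> Vec<Issue> {
    let mut issues = Vec::new();
    // Node IDs must be unique: a failed insert means the ID was already seen.
    let mut known = HashSet::new();
    for &id in &g.nodes {
        if !known.insert(id) {
            issues.push(Issue::DuplicateNode(id));
        }
    }
    let edge_set: HashSet<(u64, u64)> = g.edges.iter().copied().collect();
    let mut connected = HashSet::new();
    for &(from, to) in &g.edges {
        // Both endpoints must reference existing nodes.
        if !known.contains(&from) || !known.contains(&to) {
            issues.push(Issue::DanglingEdge(from, to));
            continue;
        }
        connected.insert(from);
        connected.insert(to);
        // Bidirectional edges need a matching reverse edge.
        if !edge_set.contains(&(to, from)) {
            issues.push(Issue::MissingReverseEdge(from, to));
        }
    }
    // Nodes that never appear in a valid edge are orphans (reported once each).
    let mut reported = HashSet::new();
    for &id in &g.nodes {
        if !connected.contains(&id) && reported.insert(id) {
            issues.push(Issue::OrphanNode(id));
        }
    }
    issues
}
```

Keeping the issue kinds in an enum leaves the mapping to Error or Warning severity to the caller, which matches the split between validator and reporter layers.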
### 3. Security Validation
| Rule | Severity | Description |
|------|----------|-------------|
| Permission Format | Error | Permissions must follow `resource:action` format |
| Valid Permissions | Error | Permissions must be in allowed list |
| Risk/Mode Match | Warning | High-risk commands should use Firecracker mode |
| Resource Limits | Error | Limits must be positive integers |
| Network Access | Warning | Network access requires justification |
| KG Concepts Exist | Error | Required KG concepts must exist in graph |
| Permission Hierarchy | Error | Write requires read, delete requires write |
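The permission format and hierarchy rules might look like the following sketch. Two points are assumptions rather than facts from this document: the allowed character set (lowercase ASCII letters, digits, `_`) and the reading of "write requires read" as "a schema that lists write permissions must also list read permissions". The `Permissions` struct is hypothetical.

```rust
/// Check the `resource:action` shape: exactly one colon with non-empty sides.
/// The character set (lowercase ASCII, digits, `_`) is an assumption.
pub fn is_valid_permission(p: &str) -> bool {
    let mut parts = p.split(':');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(resource), Some(action), None) => {
            let ok = |s: &str| {
                !s.is_empty()
                    && s.chars()
                        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
            };
            ok(resource) && ok(action)
        }
        _ => false,
    }
}

/// Hypothetical view of the `security` block of a schema.
pub struct Permissions<'a> {
    pub read: Vec<&'a str>,
    pub write: Vec<&'a str>,
    pub delete: Vec<&'a str>,
}

/// Return one error message per violated security rule.
pub fn check_permissions(p: &Permissions) -> Vec<String> {
    let mut errors = Vec::new();
    for perm in p.read.iter().chain(&p.write).chain(&p.delete) {
        if !is_valid_permission(perm) {
            errors.push(format!("Permission Format: `{}` is not `resource:action`", perm));
        }
    }
    // Hierarchy (assumed reading): write requires read, delete requires write.
    if !p.write.is_empty() && p.read.is_empty() {
        errors.push("Permission Hierarchy: write requires read".to_string());
    }
    if !p.delete.is_empty() && p.write.is_empty() {
        errors.push("Permission Hierarchy: delete requires write".to_string());
    }
    errors
}
```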
### 4. Type System Validation
| Rule | Severity | Description |
|------|----------|-------------|
| Rust Type Mapping | Error | Types must map to valid Rust types |
| Generic Constraints | Error | Generic types must specify constraints |
| Option/Result Usage | Warning | Nullable fields should use `Option<T>` |
| Collection Types | Error | Collections must specify element types |
| Custom Type Refs | Error | Custom types must be defined in schema |
| Enum Values | Error | Enum values must be explicitly listed |
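As one example from this table, the "Collection Types" rule can be approximated by checking that known collection names carry the expected number of type arguments. The function names and the list of recognized collections below are assumptions for illustration; a production checker would more likely parse the type with a proper grammar.

```rust
/// Split generic arguments at top-level commas, so that
/// `String, Vec<u64>` yields ["String", "Vec<u64>"].
fn split_top_level(args: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0usize;
    for (i, c) in args.char_indices() {
        match c {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(args[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(args[start..].trim());
    parts
}

/// Returns true when every known collection in `ty` names its element types.
/// Types this sketch does not recognize as collections are accepted as-is.
pub fn collection_is_fully_specified(ty: &str) -> bool {
    let ty = ty.trim();
    let (name, args) = match ty.find('<') {
        Some(open) if ty.ends_with('>') => (&ty[..open], Some(&ty[open + 1..ty.len() - 1])),
        Some(_) => return false, // unbalanced angle brackets
        None => (ty, None),
    };
    // Expected number of type arguments per collection (assumed list).
    let expected = match name {
        "Vec" | "VecDeque" | "HashSet" | "BTreeSet" => 1,
        "HashMap" | "BTreeMap" => 2,
        _ => return true,
    };
    match args {
        Some(a) => {
            let parts = split_top_level(a);
            parts.len() == expected
                && parts
                    .iter()
                    .all(|p| !p.is_empty() && collection_is_fully_specified(p))
        }
        None => false,
    }
}
```

With this check, the `doc_hash` property type `HashMap<String, u64>` from the schema example passes, while a bare `HashMap` or `Vec` would be reported.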
## Implementation Plan
### Crate Structure
crates/terraphim_kg_linter/
├── Cargo.toml
├── src/
│   ├── lib.rs                 # Public API
│   ├── parser/
│   │   ├── mod.rs             # Parser orchestration
│   │   ├── frontmatter.rs     # YAML frontmatter parser
│   │   ├── command.rs         # Command definition parser
│   │   ├── schema.rs          # KG schema parser
│   │   └── thesaurus.rs       # Thesaurus schema parser
│   ├── validator/
│   │   ├── mod.rs             # Validation orchestration
│   │   ├── command.rs         # Command validation rules
│   │   ├── schema.rs          # Schema validation rules
│   │   ├── security.rs        # Security validation rules
│   │   ├── graph.rs           # Graph integrity validation
│   │   └── types.rs           # Type system validation
│   ├── analyzer/
│   │   ├── mod.rs             # Analysis engine
│   │   ├── automata.rs        # Automata integration
│   │   ├── graph.rs           # RoleGraph integration
│   │   └── embeddings.rs      # Graph embeddings analysis
│   ├── reporter/
│   │   ├── mod.rs             # Report generation
│   │   ├── json.rs            # JSON output
│   │   ├── text.rs            # Human-readable output
│   │   └── llm.rs             # LLM-friendly output
│   ├── types.rs               # Type definitions
│   └── error.rs               # Error types
├── tests/
│   ├── command_validation_tests.rs
│   ├── schema_validation_tests.rs
│   ├── security_tests.rs
│   └── integration_tests.rs
└── examples/
    ├── valid_command.md
    ├── valid_schema.md
    ├── invalid_examples.md
    └── linter_usage.rs
### Key Dependencies
```toml
[dependencies]
# Existing Terraphim crates
terraphim_types = { path = "../terraphim_types" }
terraphim_automata = { path = "../terraphim_automata" }
terraphim_rolegraph = { path = "../terraphim_rolegraph" }
# Parsing and validation
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
regex = "1.10"
# Error handling
thiserror = "1.0"
# Async support
tokio = { version = "1.0", features = ["full"] }
# Logging
tracing = "0.1"
```
### Public API
`lib.rs` exposes the following items:
- Main linter interface
- Linter configuration
- Schema type discriminator
- Lint result
- Individual diagnostic
- Diagnostic severity
- Location in file
- Validation report with graph analysis
- Graph analysis results
- Automata analysis results
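The diagnostic-related items in this API could take a shape like the sketch below. All type names, field names, and methods here are assumptions made for illustration; the actual definitions in `lib.rs` are not given in this document.

```rust
/// Diagnostic severity (the rule tables use Error and Warning).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Location in file (hypothetical fields, 1-based line and column).
#[derive(Debug, Clone, PartialEq)]
pub struct Location {
    pub file: String,
    pub line: usize,
    pub column: usize,
}

/// Individual diagnostic produced by one rule.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub severity: Severity,
    pub rule: String,
    pub message: String,
    pub location: Location,
}

/// Lint result: every diagnostic collected for one input.
#[derive(Debug, Default)]
pub struct LintResult {
    pub diagnostics: Vec<Diagnostic>,
}

impl LintResult {
    /// True when at least one diagnostic has Error severity.
    pub fn has_errors(&self) -> bool {
        self.diagnostics.iter().any(|d| d.severity == Severity::Error)
    }

    /// Number of diagnostics with the given severity.
    pub fn count(&self, severity: Severity) -> usize {
        self.diagnostics.iter().filter(|d| d.severity == severity).count()
    }
}
```

A helper such as `has_errors` gives a CLI a simple way to choose its exit code without inspecting individual diagnostics.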
## Validation Algorithms
### 1. Command Validation Pipeline
### 2. Knowledge Graph Schema Validation
### 3. Graph Connectivity Validation
This step leverages the existing `is_all_terms_connected_by_path` function from `terraphim_rolegraph`.
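To show the kind of check involved, the sketch below uses a breadth-first search to test whether all matched term IDs fall within one connected component of an undirected graph. This is a simplified stand-in, not the implementation of `is_all_terms_connected_by_path`; being in the same component is a weaker condition than lying on a single path, and the function name and edge representation are assumptions.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Simplified connectivity check: are all `terms` reachable from the first
/// term when `edges` are treated as undirected? (Hypothetical helper.)
pub fn terms_are_connected(edges: &[(u64, u64)], terms: &[u64]) -> bool {
    // Zero terms are trivially connected.
    let start = match terms.first() {
        Some(&t) => t,
        None => return true,
    };
    // Build an undirected adjacency list.
    let mut adjacency: HashMap<u64, Vec<u64>> = HashMap::new();
    for &(a, b) in edges {
        adjacency.entry(a).or_default().push(b);
        adjacency.entry(b).or_default().push(a);
    }
    // Breadth-first search from the first matched term.
    let mut visited = HashSet::new();
    let mut queue = VecDeque::new();
    visited.insert(start);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        if let Some(neighbours) = adjacency.get(&node) {
            for &next in neighbours {
                if visited.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    // Every matched term must have been reached.
    terms.iter().all(|t| visited.contains(t))
}
```

Since the "Path Connectivity" rule is a Warning, a linter would typically report a `false` result as advice rather than a hard failure.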
### 4. Automata-Based Term Validation
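The idea behind term validation is to find which known thesaurus terms occur in a piece of text. The sketch below swaps in a naive substring search in place of an Aho-Corasick automaton so that it stays dependency-free; it shows the input and output shape only and is not how `terraphim_automata` works internally. The function name is hypothetical, and the terms are assumed to be already normalized (lowercase).

```rust
/// Find every known term occurring in `text`.
/// Returns (byte offset in the lowercased text, term), sorted by offset.
/// Naive O(terms × text) search, used here instead of an Aho-Corasick automaton.
pub fn find_terms<'a>(text: &str, terms: &[&'a str]) -> Vec<(usize, &'a str)> {
    // Terms are assumed lowercase, so lowercase the haystack to match them.
    let haystack = text.to_lowercase();
    let mut hits = Vec::new();
    for &term in terms {
        if term.is_empty() {
            continue;
        }
        for (offset, _) in haystack.match_indices(term) {
            hits.push((offset, term));
        }
    }
    hits.sort();
    hits
}
```

An automaton-based matcher would produce the same kind of result in a single pass over the text, which matters once the thesaurus holds many terms.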
## LLM-Friendly Output Format
The linter provides specialized output for LLM consumption.
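One plausible shape for such output is a flat JSON object per diagnostic that pairs the failure with a concrete fix. The sketch below builds that JSON by hand to stay dependency-free (the crate itself lists `serde_json`, which would be the natural choice in practice). The `LlmDiagnostic` struct and its `suggestion` field are assumptions, not the documented format.

```rust
/// Escape the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical diagnostic shape aimed at LLM consumers.
pub struct LlmDiagnostic<'a> {
    pub rule: &'a str,
    pub severity: &'a str,
    pub message: &'a str,
    pub suggestion: &'a str,
}

/// Render one diagnostic as a single-line JSON object.
pub fn render_llm_json(d: &LlmDiagnostic) -> String {
    format!(
        "{{\"rule\":\"{}\",\"severity\":\"{}\",\"message\":\"{}\",\"suggestion\":\"{}\"}}",
        json_escape(d.rule),
        json_escape(d.severity),
        json_escape(d.message),
        json_escape(d.suggestion)
    )
}
```

Carrying an explicit suggestion alongside the message is one way to meet the goal of clear, actionable errors for AI agents.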
## Integration with Existing Components
### 1. terraphim_automata Integration
### 2. terraphim_rolegraph Integration
## CLI Tool
The CLI entry point lives in `bin/terraphim-kg-lint.rs`.
## Usage Examples
### 1. Lint a Single Command File
### 2. Lint All KG Schemas in Directory
### 3. Programmatic Usage
## Testing Strategy
### 1. Unit Tests
### 2. Integration Tests
## Extension Points
### 1. Custom Validation Rules
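A common way to support custom rules is a small trait that each rule implements, with the linter running every registered rule over the input. The trait, the example rule, and the runner below are hypothetical; the document does not specify the actual extension interface.

```rust
/// Hypothetical extension trait for user-defined validation rules.
pub trait ValidationRule {
    /// Short identifier used in reports.
    fn name(&self) -> &str;
    /// Return one message per violation found in `source`.
    fn check(&self, source: &str) -> Vec<String>;
}

/// Example custom rule: flag lines that end in a space or tab.
pub struct NoTrailingWhitespace;

impl ValidationRule for NoTrailingWhitespace {
    fn name(&self) -> &str {
        "no-trailing-whitespace"
    }

    fn check(&self, source: &str) -> Vec<String> {
        source
            .lines()
            .enumerate()
            .filter(|(_, line)| line.ends_with(' ') || line.ends_with('\t'))
            .map(|(i, _)| format!("line {}: trailing whitespace", i + 1))
            .collect()
    }
}

/// Run every registered rule and prefix each message with the rule name.
pub fn run_rules(rules: &[Box<dyn ValidationRule>], source: &str) -> Vec<String> {
    rules
        .iter()
        .flat_map(|rule| {
            let name = rule.name().to_string();
            rule.check(source)
                .into_iter()
                .map(move |msg| format!("[{}] {}", name, msg))
        })
        .collect()
}
```

Using trait objects keeps the rule list open-ended, which also fits the later "Plugin System" idea of loading validators dynamically.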
### 2. Custom Reporters
## Future Enhancements
- **LSP Integration**: Language Server Protocol for IDE integration
- **Auto-fix**: Automatic fixes for common issues
- **Graph Visualization**: Visual representation of connectivity issues
- **Benchmark Suite**: Performance testing with large schemas
- **Plugin System**: Dynamic loading of custom validators
- **CI/CD Integration**: GitHub Actions, GitLab CI pipelines
- **Schema Evolution**: Validate schema migrations
- **Embedding Analysis**: Semantic similarity checks using graph embeddings
## Related Work
This linter builds upon and integrates with:
- **PR #277**: Code Assistant with validation pipeline and security model
- **terraphim_automata**: Fast term matching and autocomplete
- **terraphim_rolegraph**: Knowledge graph structure and path connectivity
- **terraphim_tui**: Markdown command parser and execution system
- **Graph Embeddings**: Semantic relationship validation