QueryRs Haystack Integration
Overview
The QueryRs haystack provides comprehensive Rust documentation search capabilities by integrating with query.rs, a powerful search engine for Rust. This haystack supports searching across multiple Rust documentation sources including Reddit posts, standard library documentation, crates, attributes, lints, books, caniuse, and error codes.
Features
✅ Currently Working
- Reddit Posts: Full JSON API integration with
/posts/search?q=keyword- Returns Reddit posts from r/rust with metadata (author, score, URL)
- Proper tagging and formatting with
[Reddit]prefix - Real-time search results from the Rust community
🔄 Planned Implementation
- Standard Library Docs: Stable and nightly documentation search
- Crates.io: Search for Rust crates and their documentation
- Attributes: Rust attribute search (e.g.,
#[derive],#[cfg]) - Clippy Lints: Search for Clippy lint documentation
- Rust Books: Search official Rust book content
- Caniuse: Rust feature compatibility search
- Error Codes: Rust compiler error code documentation
Architecture
Search Types Supported
The QueryRs haystack implements concurrent search across all query.rs endpoints:
// Concurrent search across all endpoints
let = join!;API Endpoints
| Search Type | Endpoint | Status | Response Format |
|-------------|----------|--------|-----------------|
| Reddit Posts | /posts/search?q=keyword | ✅ Working | JSON |
| Std Docs (Stable) | /stable?q=keyword | 🔄 Planned | HTML |
| Std Docs (Nightly) | /nightly?q=keyword | 🔄 Planned | HTML |
| Crates | /crates?q=keyword | 🔄 Planned | HTML |
| Attributes | /attributes?q=keyword | 🔄 Planned | HTML |
| Lints | /lints?q=keyword | 🔄 Planned | HTML |
| Books | /books?q=keyword | 🔄 Planned | HTML |
| Caniuse | /caniuse?q=keyword | 🔄 Planned | HTML |
| Error Codes | /errors?q=keyword | 🔄 Planned | HTML |
Configuration
Rust Engineer Role
The QueryRs haystack is configured through the "Rust Engineer" role:
Usage
Server Startup
Search Examples
# Search for async programming content
# Search for standard library items
# Search for crates
# Search for attributes
# Search for Clippy lints
# Search for error codes
Document Format
Reddit Posts (Currently Working)
Planned Document Formats
Standard Library Docs
Crates
Implementation Details
QueryRsHaystackIndexer
The main implementation is in crates/terraphim_middleware/src/haystack/query_rs.rs:
Error Handling
The implementation uses graceful degradation:
- Network failures return empty results instead of errors
- API errors are logged as warnings and continue with other sources
- Parse errors are handled gracefully for malformed responses
Performance
- Concurrent Search: All endpoints are searched concurrently using
tokio::join! - Caching: Results can be cached to improve performance
- Rate Limiting: Consider implementing rate limiting for production use
Testing
End-to-End Testing
Run the comprehensive test script:
This script validates:
- Server startup and configuration
- Role configuration updates
- Search functionality across multiple query types
- Result formatting and metadata
Test Queries
The test script covers various search types:
# Reddit posts
# Standard library docs
# Attributes
# Clippy lints
# Books
# Caniuse
# Error codes
Dependencies
[dependencies]
reqwest = { version = "0.11", features = ["json"] }
scraper = "0.19.0"
serde_json = "1.0"
async-trait = "0.1"Future Enhancements
HTML Parsing Implementation
The next phase involves implementing HTML parsing for the remaining endpoints:
- Standard Library Docs: Parse search results from
/stableand/nightly - Crates: Parse crate search results from
/crates - Attributes: Parse attribute documentation from
/attributes - Lints: Parse Clippy lint documentation from
/lints - Books: Parse Rust book content from
/books - Caniuse: Parse feature compatibility from
/caniuse - Error Codes: Parse error code documentation from
/errors
Advanced Features
- Type Signature Search: Support for searching by function signatures
- Advanced Filtering: Filter results by type, category, or source
- Result Ranking: Improved ranking based on relevance and popularity
- Caching Strategy: Implement intelligent caching for frequently searched terms
- Rate Limiting: Add rate limiting to respect query.rs API limits
Troubleshooting
Common Issues
- No Results from HTML Endpoints: HTML parsing is not yet implemented for most endpoints
- Network Timeouts: Increase timeout values for slow network connections
- Rate Limiting: Implement delays between requests if hitting rate limits
Debug Commands
# Test Reddit API directly
|
# Test server health
# Check server logs for parsing errors
Contributing
To implement HTML parsing for additional endpoints:
- Analyze the HTML structure of the target endpoint
- Implement the corresponding
parse_*_htmlfunction - Add appropriate CSS selectors for data extraction
- Test with various search queries
- Update documentation
License
This implementation follows the same license as the Terraphim project.