Caching System
Overview
The RE-cue caching system speeds up repeated analysis runs by storing per-file results from previous runs. When you analyze the same codebase again, cached results are reused for files that haven't changed, dramatically reducing analysis time.
Features
File-Level Caching
- Each file’s analysis results are cached individually
- Based on SHA-256 hash of file contents
- Automatic cache invalidation when files change
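The content hash behind these bullets can be sketched as follows. This is a minimal illustration, not the tool's actual implementation; the helper name `file_content_hash` is hypothetical:

```python
import hashlib
from pathlib import Path

def file_content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents.

    Hashing contents (rather than modification time) means a file is
    only re-analyzed when its bytes actually change.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Because the digest depends only on content, touching a file without editing it does not invalidate its cache entry.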
Multiple Analysis Types
- Support for caching different types of analysis per file
- Examples: endpoints, models, services, use cases
- Each analysis type is cached separately
Cache Persistence
- Cache is stored in JSON format on disk
- Survives across runs and system restarts
- Located in `<project>/specs/001-reverse/.cache/`
Cache Statistics
- Track cache hits and misses
- Monitor cache size and entry count
- Calculate hit rate percentage
- View oldest and newest entries
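The hit-rate percentage follows directly from the hit and miss counters; a minimal sketch (the function name is illustrative, not part of the tool's API):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of cache lookups served from cache.

    Returns 0.0 when there have been no lookups at all, avoiding
    a division by zero on a fresh cache.
    """
    total = hits + misses
    return 100.0 * hits / total if total else 0.0
```

With the example statistics shown later (120 hits, 30 misses), this yields 80.0%.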
Automatic Cleanup
- Remove expired entries (if TTL is set)
- Remove invalid entries (deleted or changed files)
- Configurable maximum cache size
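The TTL-based expiry check described above can be sketched like this, assuming an entry stores a Unix timestamp as shown in the cache JSON later in this page (the helper name is hypothetical):

```python
import time
from typing import Optional

def is_expired(entry_timestamp: float, ttl_seconds: Optional[float]) -> bool:
    """An entry expires only when a TTL is configured and has elapsed.

    With no TTL (None), entries never expire by age; they are still
    invalidated when their file changes or is deleted.
    """
    if ttl_seconds is None:
        return False
    return (time.time() - entry_timestamp) > ttl_seconds
```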
Usage
Basic Usage
By default, caching is enabled when you run analysis:
```shell
# Caching is enabled by default
recue --spec --description "my project"
```
Disable Caching
To disable caching for a specific run:
```shell
recue --no-cache --spec --description "my project"
```
Clear Cache
To clear all cached results before running analysis:
```shell
recue --clear-cache --spec --description "my project"
```
View Cache Statistics
To view current cache statistics:
```shell
recue --cache-stats
```
Output example:
```text
============================================================
CACHE STATISTICS
============================================================
Cache File: /path/to/project/specs/001-reverse/.cache/analysis_cache.json
Total Entries: 150
Cache Size: 2,456,789 bytes
Cache Hits: 120
Cache Misses: 30
Hit Rate: 80.0%
Oldest Entry: 2025-11-23 10:15:30
Newest Entry: 2025-11-24 00:30:45
============================================================
```
Clean Up Cache
To remove expired and invalid cache entries:
```shell
recue --cleanup-cache
```
How It Works
1. File Analysis
When a file is analyzed:
- Compute SHA-256 hash of file contents
- Check if hash exists in cache for this analysis type
- If found and valid, return cached result
- If not found, analyze file and cache result
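The four steps above amount to a hash-keyed lookup. A minimal sketch using a plain dict in place of the real cache manager (function and parameter names here are illustrative assumptions, not the tool's API):

```python
import hashlib
from pathlib import Path
from typing import Any, Callable, Dict

def analyze_with_cache(
    path: Path,
    analysis_type: str,
    analyze: Callable[[Path], Any],
    cache: Dict[str, Dict[str, Any]],
) -> Any:
    """Return a cached result when the file is unchanged, else analyze and cache."""
    key = f"{path.resolve()}:{analysis_type}"
    current_hash = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = cache.get(key)
    if entry is not None and entry["file_hash"] == current_hash:
        return entry["result"]      # cache hit: content hash unchanged
    result = analyze(path)          # cache miss: run the actual analysis
    cache[key] = {"file_hash": current_hash, "result": result}
    return result
```

On the second call for an unchanged file, the expensive `analyze` callable is never invoked.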
2. Cache Validation
For each cached entry, the system:
- Checks if the file still exists
- Computes current file hash
- Compares with cached hash
- Invalidates entry if hash differs
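These validation steps can be sketched against the entry structure shown in the cache JSON below (the helper name is hypothetical):

```python
import hashlib
from pathlib import Path
from typing import Any, Dict

def is_entry_valid(entry: Dict[str, Any]) -> bool:
    """A cached entry is valid only if its file still exists with the same hash."""
    path = Path(entry["file_path"])
    if not path.exists():
        return False  # file was deleted: entry must be invalidated
    current = hashlib.sha256(path.read_bytes()).hexdigest()
    return current == entry["file_hash"]
```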
3. Cache Storage
Cache is stored as JSON with structure:
```json
{
  "version": "1.0",
  "timestamp": "2025-11-24T00:30:45.123456",
  "entries": {
    "file_path:analysis_type": {
      "file_path": "/absolute/path/to/file.py",
      "file_hash": "abc123...",
      "timestamp": 1732412445.123,
      "result": { ... },
      "metadata": { ... }
    }
  },
  "statistics": {
    "hits": 120,
    "misses": 30,
    "total_entries": 150
  }
}
```
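The persistence round trip implied by this structure can be sketched with the standard `json` module. The function names and the exact payload shape are assumptions for illustration, not the tool's internals:

```python
import json
from pathlib import Path

def save_cache(cache_file: Path, entries: dict, stats: dict) -> None:
    """Persist cache entries as JSON so they survive restarts."""
    payload = {"version": "1.0", "entries": entries, "statistics": stats}
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(payload, indent=2))

def load_cache(cache_file: Path) -> dict:
    """Return the parsed cache, or an empty one if no file exists yet."""
    if not cache_file.exists():
        return {"version": "1.0", "entries": {}, "statistics": {}}
    return json.loads(cache_file.read_text())
```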
Configuration
In Code
When creating an analyzer programmatically:
```python
from reverse_engineer.optimized_analyzer import OptimizedAnalyzer

analyzer = OptimizedAnalyzer(
    repo_root=repo_path,
    enable_caching=True,      # Enable/disable caching
    enable_incremental=True,  # Works alongside caching
    enable_parallel=True,     # Independent optimization
    verbose=True,
)
```
Cache Manager Options
For advanced usage, you can configure the cache manager:
```python
from pathlib import Path

from reverse_engineer.cache_manager import CacheManager

cache = CacheManager(
    cache_dir=Path("my_cache"),
    cache_name="custom_cache",
    ttl_seconds=3600,  # 1 hour TTL
    max_entries=1000,  # Limit to 1000 entries
)
```
Performance Benefits
Typical Speedup
- First run: No cache, full analysis
- Second run: ~5-10x faster for unchanged files
- After small changes: Only changed files re-analyzed
Example Scenario
Project with 500 Python files:
- First run: 120 seconds
- Modify 10 files
- Second run: ~15 seconds (10 files analyzed, 490 from cache)
- Speedup: 8x faster
Integration with Other Optimizations
The caching system works alongside other optimizations:
With Incremental Analysis
- Incremental: Skips files based on modification time
- Caching: Retrieves results from previous runs
- Together: Maximum performance, minimal re-work
With Parallel Processing
- Cache lookups are fast (no parallel needed)
- Only files needing analysis use parallel processing
- Reduces CPU usage when many files are cached
Cache Maintenance
Automatic Maintenance
The cache automatically:
- Validates entries on access
- Removes entries for deleted files
- Updates entries for changed files
Manual Maintenance
Periodically run cleanup:
```shell
# Remove invalid entries
recue --cleanup-cache

# Or clear everything and start fresh
recue --clear-cache
```
Cache Location
Cache files are stored in:
```text
<project-root>/
└── specs/
    └── 001-reverse/
        └── .cache/
            └── analysis_cache.json
```
To manually delete cache:
```shell
rm -rf specs/001-reverse/.cache/
```
Troubleshooting
Cache Not Working
If cache doesn’t seem to work:
- Check if caching is enabled (default: yes)
- Verify cache directory exists and is writable
- Check for file permission issues
- View cache stats to see hit/miss rate
Cache Taking Too Much Space
If cache grows too large:
- Use `--cleanup-cache` to remove invalid entries
- Use `--clear-cache` to start fresh
- Consider setting `max_entries` in code
Incorrect Results from Cache
If you suspect cache has stale data:
- Use `--clear-cache` to invalidate all entries
- Run analysis again to rebuild the cache
- File hash validation should prevent this
Best Practices
1. Enable by Default
Leave caching enabled unless you have specific reasons to disable it.
2. Periodic Cleanup
Run cleanup monthly or after major refactoring:
```shell
recue --cleanup-cache
```
3. Fresh Start After Major Changes
After major codebase restructuring:
```shell
recue --clear-cache --spec --description "after refactor"
```
4. Monitor Statistics
Check cache effectiveness periodically:
```shell
recue --cache-stats
```
5. Combine with Other Optimizations
Use caching with incremental and parallel processing:
```shell
# All optimizations enabled (default)
recue --spec --description "optimized run"
```
Advanced Topics
Multiple Analysis Types
Different analysis types can be cached for the same file:
```python
# Cache endpoints analysis
analyzer.process_files_optimized(
    files,
    endpoint_processor,
    analysis_type="endpoints",
)

# Cache models analysis (separate cache)
analyzer.process_files_optimized(
    files,
    model_processor,
    analysis_type="models",
)
```
Custom Cache Keys
The cache key format is:
```text
<absolute_file_path>:<analysis_type>
```
This ensures different analysis types don’t conflict.
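A sketch of how such a key could be built (the helper name is hypothetical; only the key format itself comes from the documentation above):

```python
from pathlib import Path

def cache_key(path: Path, analysis_type: str) -> str:
    """Build the '<absolute_file_path>:<analysis_type>' cache key.

    Resolving the path makes the key stable regardless of the
    working directory the tool was invoked from.
    """
    return f"{path.resolve()}:{analysis_type}"
```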
Cache Statistics API
Access statistics programmatically:
```python
stats = analyzer.cache_manager.get_statistics()
print(f"Hit rate: {stats.hit_rate:.1f}%")
print(f"Total entries: {stats.total_entries}")
```