Caching System

Overview

The RE-cue caching system speeds up repeated analysis runs by storing the results of previous ones. When you analyze the same codebase again, results for files that haven’t changed are read from the cache instead of being recomputed, so only new or modified files need to be analyzed.

Features

File-Level Caching

Each file’s analysis results are cached individually
Based on SHA-256 hash of file contents
Automatic cache invalidation when files change

Multiple Analysis Types

Support for caching different types of analysis per file
Examples: endpoints, models, services, use cases
Each analysis type is cached separately

Cache Persistence

Cache is stored in JSON format on disk
Survives across runs and system restarts
Located in <project>/specs/001-reverse/.cache/

Cache Statistics

Track cache hits and misses
Monitor cache size and entry count
Calculate hit rate percentage
View oldest and newest entries

Automatic Cleanup

Remove expired entries (if TTL is set)
Remove invalid entries (deleted or changed files)
Configurable maximum cache size

Usage

Basic Usage

By default, caching is enabled when you run analysis:

# Caching is enabled by default
recue --spec --description "my project"

Disable Caching

To disable caching for a specific run:

recue --no-cache --spec --description "my project"

Clear Cache

To clear all cached results before running analysis:

recue --clear-cache --spec --description "my project"

View Cache Statistics

To view current cache statistics:

recue --cache-stats

Output example:

============================================================
CACHE STATISTICS
============================================================
Cache File:        /path/to/project/specs/001-reverse/.cache/analysis_cache.json
Total Entries:     150
Cache Size:        2,456,789 bytes
Cache Hits:        120
Cache Misses:      30
Hit Rate:          80.0%
Oldest Entry:      2025-11-23 10:15:30
Newest Entry:      2025-11-24 00:30:45
============================================================

Clean Up Cache

To remove expired and invalid cache entries:

recue --cleanup-cache

How It Works

1. File Analysis

When a file is analyzed:

Compute SHA-256 hash of file contents
Check if hash exists in cache for this analysis type
If found and valid, return cached result
If not found or no longer valid, analyze the file and cache the result
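The lookup steps above can be sketched as follows. This is a minimal illustration, not the RE-cue implementation: the function names (file_sha256, analyze_with_cache), the in-memory dict used as the cache, and the analyze callback are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Any, Callable


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def analyze_with_cache(
    path: Path,
    analysis_type: str,
    analyze: Callable[[Path], Any],   # hypothetical analysis callback
    cache: dict[str, dict[str, Any]],  # hypothetical in-memory cache
) -> Any:
    """Return a cached result when the file hash matches, else re-analyze."""
    key = f"{path.resolve()}:{analysis_type}"
    current_hash = file_sha256(path)
    entry = cache.get(key)
    if entry is not None and entry["file_hash"] == current_hash:
        return entry["result"]         # cache hit: file is unchanged
    result = analyze(path)             # cache miss: do the real work
    cache[key] = {"file_hash": current_hash, "result": result}
    return result
```

Because the content hash is part of the check, editing a file forces a fresh analysis on the next call, while each analysis type keeps its own entry for the same file.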

2. Cache Validation

For each cached entry, the system:

Checks if the file still exists
Computes current file hash
Compares with cached hash
Invalidates entry if hash differs
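A validation pass along these lines might look like the sketch below. The helper names (is_entry_valid, drop_invalid_entries) are hypothetical. The entry fields file_path and file_hash follow the storage format documented on this page, and the sketch assumes the stored hash is a SHA-256 hex digest.

```python
import hashlib
from pathlib import Path


def is_entry_valid(entry: dict) -> bool:
    """Check one cached entry against the file currently on disk."""
    path = Path(entry["file_path"])
    if not path.is_file():
        return False  # the file was deleted or moved
    current_hash = hashlib.sha256(path.read_bytes()).hexdigest()
    return current_hash == entry["file_hash"]


def drop_invalid_entries(entries: dict[str, dict]) -> int:
    """Remove invalid entries in place and return how many were removed."""
    # Collect keys first so the dict is not modified while iterating over it.
    stale_keys = [key for key, entry in entries.items() if not is_entry_valid(entry)]
    for key in stale_keys:
        del entries[key]
    return len(stale_keys)
```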

3. Cache Storage

The cache is stored as JSON with the following structure:

{
  "version": "1.0",
  "timestamp": "2025-11-24T00:30:45.123456",
  "entries": {
    "file_path:analysis_type": {
      "file_path": "/absolute/path/to/file.py",
      "file_hash": "abc123...",
      "timestamp": 1732412445.123,
      "result": { ... },
      "metadata": { ... }
    }
  },
  "statistics": {
    "hits": 120,
    "misses": 30,
    "total_entries": 150
  }
}
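As an illustration of how this file can be inspected outside of RE-cue, the sketch below reads it with the standard json module and derives a hit rate. The function name load_cache_summary is hypothetical, and the hit rate is assumed to be hits divided by total lookups, which matches the sample statistics output (120 hits and 30 misses giving 80.0%).

```python
import json
from pathlib import Path


def load_cache_summary(cache_file: Path) -> dict:
    """Summarize a cache file that follows the structure shown above."""
    data = json.loads(cache_file.read_text(encoding="utf-8"))
    stats = data.get("statistics", {})
    hits = stats.get("hits", 0)
    misses = stats.get("misses", 0)
    lookups = hits + misses
    # Assumption: hit rate = hits / (hits + misses), as a percentage.
    hit_rate = 100.0 * hits / lookups if lookups else 0.0
    return {
        "version": data.get("version"),
        "entry_count": len(data.get("entries", {})),
        "hits": hits,
        "misses": misses,
        "hit_rate": hit_rate,
    }
```

For day-to-day use, recue --cache-stats reports the same information without touching the file directly.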

Configuration

In Code

When creating an analyzer programmatically:

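from pathlib import Path

from reverse_engineer.optimized_analyzer import OptimizedAnalyzer

# Placeholder path to the repository to analyze
# (assumes repo_root accepts a pathlib.Path)
repo_path = Path("/path/to/project")

analyzer = OptimizedAnalyzer(
    repo_root=repo_path,
    enable_caching=True,        # Enable/disable caching
    enable_incremental=True,    # Works alongside caching
    enable_parallel=True,       # Independent optimization
    verbose=True
)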

Cache Manager Options

For advanced usage, you can configure the cache manager:

from pathlib import Path

from reverse_engineer.cache_manager import CacheManager

cache = CacheManager(
    cache_dir=Path("my_cache"),     # Directory that holds the cache file
    cache_name="custom_cache",
    ttl_seconds=3600,               # 1 hour TTL
    max_entries=1000                # Limit to 1000 entries
)
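To make the effect of ttl_seconds and max_entries more concrete, here is a rough sketch of what expiry and size limiting could look like. It is not taken from CacheManager: the function prune_entries is hypothetical, and the oldest-first eviction policy is an assumption, since this page does not say which entries are dropped when the limit is exceeded.

```python
import time
from typing import Optional


def prune_entries(
    entries: dict[str, dict],
    ttl_seconds: Optional[float],
    max_entries: Optional[int],
    now: Optional[float] = None,
) -> None:
    """Drop expired entries, then evict the oldest ones over the size limit."""
    now = time.time() if now is None else now
    if ttl_seconds is not None:
        # An entry expires once it is older than the TTL.
        expired = [k for k, e in entries.items() if now - e["timestamp"] > ttl_seconds]
        for key in expired:
            del entries[key]
    if max_entries is not None and len(entries) > max_entries:
        # Assumed policy: evict the entries with the smallest timestamps first.
        oldest_first = sorted(entries, key=lambda k: entries[k]["timestamp"])
        for key in oldest_first[: len(entries) - max_entries]:
            del entries[key]
```

The timestamp field used here is the per-entry Unix timestamp shown in the cache storage format.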

Performance Benefits

Typical Speedup

First run: No cache, full analysis
Second run: ~5-10x faster for unchanged files
After small changes: Only changed files re-analyzed

Example Scenario

Project with 500 Python files:

First run: 120 seconds
Modify 10 files
Second run: ~15 seconds (10 files analyzed, 490 from cache)
Speedup: 8x faster
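The arithmetic behind this scenario is easy to reproduce for your own project. The helper names below are illustrative only, and the numbers are the ones from the scenario above.

```python
def speedup(first_run_seconds: float, second_run_seconds: float) -> float:
    """How many times faster the cached run is than the uncached one."""
    return first_run_seconds / second_run_seconds


def cached_fraction(total_files: int, changed_files: int) -> float:
    """Share of files whose results can be served from the cache."""
    return (total_files - changed_files) / total_files


print(f"Speedup: {speedup(120, 15):.0f}x")                    # Speedup: 8x
print(f"Served from cache: {cached_fraction(500, 10):.0%}")   # Served from cache: 98%
```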

Integration with Other Optimizations

The caching system works alongside other optimizations:

With Incremental Analysis

Incremental: Skips files based on modification time
Caching: Retrieves results from previous runs
Together: Maximum performance, minimal re-work

With Parallel Processing

Cache lookups are fast and do not need parallel processing
Only files needing analysis use parallel processing
Reduces CPU usage when many files are cached

Cache Maintenance

Automatic Maintenance

The cache automatically:

Validates entries on access
Removes entries for deleted files
Updates entries for changed files

Manual Maintenance

Periodically run cleanup:

# Remove invalid entries
recue --cleanup-cache

# Or clear everything and start fresh
recue --clear-cache

Cache Location

Cache files are stored in:

<project-root>/
└── specs/
    └── 001-reverse/
        └── .cache/
            └── analysis_cache.json

To delete the cache manually:

rm -rf specs/001-reverse/.cache/

Troubleshooting

Cache Not Working

If the cache doesn’t seem to work:

Check if caching is enabled (default: yes)
Verify cache directory exists and is writable
Check for file permission issues
View cache stats to see hit/miss rate

Cache Taking Too Much Space

If the cache grows too large:

Use --cleanup-cache to remove invalid entries
Use --clear-cache to start fresh
Consider setting max_entries in code

Incorrect Results from Cache

If you suspect the cache has stale data:

Use --clear-cache to invalidate all entries
Run analysis again to rebuild the cache
Note that file hash validation should normally prevent stale results in the first place

Best Practices

1. Enable by Default

Leave caching enabled unless you have specific reasons to disable it.

2. Periodic Cleanup

Run cleanup monthly or after major refactoring:

recue --cleanup-cache

3. Fresh Start After Major Changes

After major codebase restructuring:

recue --clear-cache --spec --description "after refactor"

4. Monitor Statistics

Check cache effectiveness periodically:

recue --cache-stats

5. Combine with Other Optimizations

Use caching with incremental and parallel processing:

# All optimizations enabled (default)
recue --spec --description "optimized run"

Advanced Topics

Multiple Analysis Types

Different analysis types can be cached for the same file:

# Cache endpoints analysis
analyzer.process_files_optimized(
    files,
    endpoint_processor,
    analysis_type="endpoints"
)

# Cache models analysis (stored under a separate cache key)
analyzer.process_files_optimized(
    files,
    model_processor,
    analysis_type="models"
)

Custom Cache Keys

The cache key format is:

<absolute_file_path>:<analysis_type>

This ensures that results for different analysis types of the same file don’t overwrite each other.
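A small sketch of building and splitting keys in this format is shown below. The helper names make_cache_key and split_cache_key are hypothetical, and the sketch assumes that analysis type names never contain a colon.

```python
from pathlib import Path


def make_cache_key(file_path: Path, analysis_type: str) -> str:
    """Build a key of the form <absolute_file_path>:<analysis_type>."""
    return f"{file_path.resolve()}:{analysis_type}"


def split_cache_key(key: str) -> tuple[str, str]:
    """Split a key back into (file path, analysis type)."""
    # Split on the last colon so Windows drive letters (C:\...) stay intact.
    path, _, analysis_type = key.rpartition(":")
    return path, analysis_type
```

Splitting on the last colon rather than the first matters on Windows, where the absolute path itself contains a colon after the drive letter.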

Cache Statistics API

Access statistics programmatically:

stats = analyzer.cache_manager.get_statistics()
print(f"Hit rate: {stats.hit_rate:.1f}%")
print(f"Total entries: {stats.total_entries}")

