Parallel Processing for Faster Analysis

Overview

RE-cue uses parallel processing to analyze large codebases faster by spreading the work across multiple CPU cores. The feature is enabled by default and needs no configuration for most users.

Quick Start

Simply run your analysis as usual:

reverse-engineer --use-cases

RE-cue will automatically:

  • Detect the number of CPU cores
  • Use parallel processing for projects with 10+ files
  • Fall back to sequential processing for smaller projects
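
The selection logic in the list above can be sketched in a few lines of Python. This is a minimal illustration, not RE-cue's actual code: the function name `choose_worker_count` and the constant `PARALLEL_THRESHOLD` are hypothetical, and the threshold simply mirrors the 10-file figure quoted above.

```python
import os

# Hypothetical constant mirroring the 10-file threshold described above.
PARALLEL_THRESHOLD = 10


def choose_worker_count(file_count, max_workers=None, parallel=True):
    """Return how many worker processes to use; 1 means sequential processing."""
    # Small projects (or an explicit opt-out) fall back to sequential processing.
    if not parallel or file_count < PARALLEL_THRESHOLD:
        return 1
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    limit = cores if max_workers is None else max(1, max_workers)
    # Never start more workers than there are files to analyze.
    return min(limit, file_count)


if __name__ == "__main__":
    for count in (5, 10, 500):
        print(count, "files ->", choose_worker_count(count), "worker(s)")
```

Returning 1 for the sequential case keeps the caller simple: it can treat "one worker" and "no parallelism" as the same code path.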

Performance Tips

For large projects (100+ files):

# Let RE-cue use all CPU cores (default)
reverse-engineer --use-cases --verbose

For resource-constrained systems:

# Limit to 2 worker processes
reverse-engineer --use-cases --max-workers 2

For debugging or troubleshooting:

# Disable parallel processing for clearer error messages
reverse-engineer --use-cases --no-parallel --verbose

Command Line Options

| Option | Description | Default |
| --- | --- | --- |
| --parallel | Enable parallel processing | Enabled |
| --no-parallel | Disable parallel processing | - |
| --max-workers N | Set maximum worker processes | CPU count |
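
For readers curious how a paired on/off flag like this is typically declared, here is a sketch using Python's standard `argparse` module. It is an assumption about the general pattern, not RE-cue's actual parser: `build_parser` is a hypothetical name, and only the options listed in the table (plus `--use-cases` and an optional project path, both seen in the examples) are modeled.

```python
import argparse
import os


def build_parser():
    """Build a parser modeling only the flags documented in the table above."""
    parser = argparse.ArgumentParser(prog="reverse-engineer")
    parser.add_argument("--use-cases", action="store_true")
    # BooleanOptionalAction (Python 3.9+) generates both --parallel and
    # --no-parallel from a single declaration.
    parser.add_argument(
        "--parallel", action=argparse.BooleanOptionalAction, default=True
    )
    # Default to the detected CPU count, matching the "Default" column.
    parser.add_argument(
        "--max-workers", type=int, default=os.cpu_count(), metavar="N"
    )
    parser.add_argument("path", nargs="?", default=".")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this declaration, omitting both flags leaves `parallel` set to `True`, which matches the "Enabled" default in the table.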

When Does It Help?

Most Beneficial For

  • Large codebases: 50+ files benefit significantly
  • Complex projects: Spring Boot, Django, Rails applications
  • Multi-module projects: Multiple services or microservices
  • Modern hardware: Systems with 4+ CPU cores

Minimal Benefit For

  • Small projects: Fewer than 10 files (auto-disabled)
  • Simple structures: Single-file applications
  • Limited hardware: Single-core or memory-constrained systems

Expected Speedup

Based on benchmarks:

| Project Size | Sequential | Parallel (4 cores) | Speedup |
| --- | --- | --- | --- |
| 10 files | 0.5s | 0.6s | 0.8x |
| 50 files | 2.5s | 1.2s | 2.1x |
| 100 files | 5.0s | 1.8s | 2.8x |
| 500 files | 25.0s | 7.5s | 3.3x |

Note: Actual performance varies with hardware and project complexity.
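
The speedup column is the sequential time divided by the parallel time. Dividing that speedup by the core count gives parallel efficiency, which shows how much of the 4 cores is actually put to use. The short script below recomputes both from the benchmark figures above; the function names are illustrative only.

```python
# (files, sequential seconds, parallel seconds) taken from the benchmark table.
BENCHMARKS = [(10, 0.5, 0.6), (50, 2.5, 1.2), (100, 5.0, 1.8), (500, 25.0, 7.5)]
CORES = 4


def speedup(sequential_s, parallel_s):
    """Speedup = sequential time / parallel time."""
    return sequential_s / parallel_s


def efficiency(sequential_s, parallel_s, cores=CORES):
    """Fraction of ideal linear scaling achieved on the given core count."""
    return speedup(sequential_s, parallel_s) / cores


if __name__ == "__main__":
    for files, seq, par in BENCHMARKS:
        print(
            f"{files:>4} files: {speedup(seq, par):.1f}x speedup, "
            f"{efficiency(seq, par):.0%} efficiency"
        )
```

On these figures, efficiency climbs from roughly 21% at 10 files to roughly 83% at 500 files, which is why larger projects benefit most.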

Troubleshooting

Slower Performance with Parallel Processing

If parallel processing seems slower:

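  1. Check the project size: on small projects, the overhead of starting worker processes can outweigh the gains
  2. For projects under about 20 files, force sequential processing with --no-parallel (parallel mode is enabled automatically from 10 files, but can still be slower at this size)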
  3. Monitor system resources during analysis

Out of Memory Errors

If you encounter memory issues:

  1. Reduce worker count: --max-workers 2
  2. Close other applications during analysis
  3. Use sequential processing: --no-parallel

Inconsistent Results

Parallel processing should produce the same results as sequential processing. If you notice differences:

  1. Run the analysis again in sequential mode: --no-parallel
  2. Compare the two sets of output carefully
  3. Report the issue, including project details
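
One way to make the comparison in step 2 systematic is to hash every generated file from each run and list the ones that differ. The sketch below assumes you have saved the output of the sequential run and the parallel run into two separate directories; the helper names and the directory names in the usage line are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def diff_outputs(sequential_dir, parallel_dir):
    """Return sorted relative paths that are missing from one tree or differ."""
    seq = digest_tree(sequential_dir)
    par = digest_tree(parallel_dir)
    return sorted(
        name for name in seq.keys() | par.keys() if seq.get(name) != par.get(name)
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (directory names are placeholders):
    #   python compare_outputs.py out-sequential out-parallel
    for name in diff_outputs(sys.argv[1], sys.argv[2]):
        print("differs:", name)
```

An empty result means the two runs produced byte-identical files. Any listed paths are worth attaching to the bug report.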

Examples

Standard Analysis

# Analyze project with default settings
reverse-engineer --use-cases /path/to/project

Large Project Optimization

# Analyze large codebase with maximum parallelism
reverse-engineer --use-cases \
  --parallel \
  --cache \
  --incremental \
  --verbose \
  /path/to/large/project

Resource-Constrained System

# Analyze with limited resources
reverse-engineer --use-cases \
  --max-workers 2 \
  --no-cache \
  /path/to/project

Debugging Mode

# Analyze with detailed error messages
reverse-engineer --use-cases \
  --no-parallel \
  --verbose \
  /path/to/project

Combining with Other Optimizations

Parallel processing works well with other performance features:

With Caching

# First run: analyze and cache results
reverse-engineer --use-cases --parallel --cache

# Subsequent runs: use cached results for unchanged files
reverse-engineer --use-cases --parallel --cache

With Incremental Analysis

# Only analyze changed files
reverse-engineer --use-cases --parallel --incremental
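
How RE-cue decides that a file is unchanged is not described here. As a general illustration of the idea behind caching and incremental analysis, the sketch below uses one common approach: store a content hash per file in a manifest and re-analyze only the files whose hash has changed. The function names and the manifest format are assumptions, not RE-cue's implementation.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, manifest_path):
    """Return paths whose contents differ from the saved manifest, then update it."""
    manifest_file = Path(manifest_path)
    # A missing manifest means this is the first run: everything counts as changed.
    previous = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    current = {str(path): file_digest(path) for path in paths}
    manifest_file.write_text(json.dumps(current, indent=2))
    return [path for path, digest in current.items() if previous.get(path) != digest]
```

On the first call every file is reported as changed; later calls return only files that were edited in between, and only those would need to be handed to the worker pool.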

Complete Optimization Stack

# Use all performance features together
reverse-engineer --use-cases \
  --parallel \
  --cache \
  --incremental \
  --max-workers 4 \
  --verbose

Technical Details

For developers and advanced users:

  • Technology: Python concurrent.futures.ProcessPoolExecutor (built on the multiprocessing module)
  • Isolation: Each worker runs in a separate process (no shared state issues)
  • Pickling: Module-level functions ensure compatibility
  • Error Handling: Per-file errors don’t stop overall processing
  • Progress: Real-time progress bar shows completion status
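
The points above can be illustrated with a small, self-contained sketch. It is not RE-cue's source: `analyze_file` is a stand-in that only counts lines, `analyze_all` and its `executor_cls` parameter are hypothetical, and the progress display is a plain counter rather than a progress bar. The worker is a module-level function so that it can be pickled and sent to worker processes, and each file's exception is captured so that one failure does not stop the rest.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed


def analyze_file(path):
    """Stand-in for real per-file analysis: count the lines in the file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def analyze_all(paths, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Analyze files in a worker pool; return (results, errors) keyed by path."""
    results, errors = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_file, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # per-file errors do not stop the run
                errors[path] = exc
            print(f"\r{done}/{len(futures)} files", end="", file=sys.stderr)
    return results, errors


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing
    # this module (spawn or forkserver start methods).
    targets = sys.argv[1:]
    if targets:
        ok, failed = analyze_all(targets)
        print(f"\n{len(ok)} analyzed, {len(failed)} failed")
```

Passing the executor class as a parameter is a design choice made for this sketch: it lets the same function run with `ThreadPoolExecutor` in tests or when debugging, where process startup would only add noise.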

Feedback

If you experience issues or have suggestions:

  1. Check the Troubleshooting section above
  2. Review the existing GitHub Issues
  3. Submit a detailed bug report with:
    • Project size (number of files)
    • Command used
    • System specifications (CPU cores, RAM)
    • Error messages or unexpected behavior