Historical Newspaper Image Cleaning Pipeline
This pipeline automatically cleans and enhances scanned historical newspaper images by reducing noise, improving contrast, and sharpening text for better readability.
Features
- Noise Reduction: Bilateral filtering and non-local means denoising
- Contrast Enhancement: CLAHE and gamma correction
- Background Cleaning: Morphological operations to remove artifacts
- Text Sharpening: Unsharp masking for improved readability
- Batch Processing: Process entire directories efficiently
- Interactive Tuning: Find optimal parameters for your specific images
- Before/After Comparisons: Visual validation of improvements
Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Process Single Image
python image_cleaner.py input_image.jpg -o cleaned_image.jpg --comparison
3. Batch Process Directory
python batch_process.py -i newspaper_scans -o cleaned_images
4. Interactive Parameter Tuning
python config_tuner.py sample_image.jpg
Usage Examples
Basic Image Cleaning
# Clean single image with default settings
python image_cleaner.py 1771-09b-02.jpg
# Clean with specific processing steps
python image_cleaner.py 1771-09b-02.jpg --steps denoise contrast sharpen
# Create before/after comparison
python image_cleaner.py 1771-09b-02.jpg -c
Batch Processing
# Process all JPG files in current directory
python batch_process.py
# Process specific directory with custom output
python batch_process.py -i scans/ -o cleaned/
# Use custom configuration
python batch_process.py --config custom_config.json
# Skip comparison images for faster processing
python batch_process.py --no-comparisons
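When scans are organized in several subfolders, the batch script can be driven from a small wrapper. The following is a sketch only: it uses just the `-i`, `-o`, and `--no-comparisons` flags shown above, while `build_command`, `run_all`, and the folder layout are hypothetical. It assumes `batch_process.py` is in the current working directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_dir, output_dir, comparisons=True):
    """Build the batch_process.py command line for one folder."""
    command = [sys.executable, "batch_process.py",
               "-i", str(input_dir), "-o", str(output_dir)]
    if not comparisons:
        command.append("--no-comparisons")
    return command


def run_all(scan_root, output_root, comparisons=False):
    """Run the batch script once per subfolder of scan_root (hypothetical layout)."""
    for folder in sorted(Path(scan_root).iterdir()):
        if folder.is_dir():
            target = Path(output_root) / folder.name
            # check=True stops the loop if one folder fails.
            subprocess.run(build_command(folder, target, comparisons), check=True)
```

Calling `run_all("scans", "cleaned")` would process each subfolder of `scans/` into a matching subfolder of `cleaned/`. Whether the script creates missing output folders is not stated here, so creating them beforehand may be necessary.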
Parameter Tuning
# Start interactive tuning session
python config_tuner.py sample_image.jpg
# Load existing config for fine-tuning
python config_tuner.py sample_image.jpg -c existing_config.json
Configuration
Default Parameters
The pipeline uses these default parameters optimized for newspaper scans:
{
"bilateral_d": 9,
"bilateral_sigma_color": 75,
"bilateral_sigma_space": 75,
"clahe_clip_limit": 2.0,
"clahe_grid_size": [8, 8],
"gamma": 1.2,
"denoise_h": 10,
"morph_kernel_size": 2,
"unsharp_amount": 1.5,
"unsharp_radius": 1.0,
"unsharp_threshold": 0
}
Parameter Descriptions
- bilateral_d: Neighborhood diameter for bilateral filtering (5-15)
- bilateral_sigma_color: Color space filter strength (50-150)
- bilateral_sigma_space: Coordinate space filter strength (50-150)
- clahe_clip_limit: Contrast limiting for CLAHE (1.0-4.0)
- clahe_grid_size: CLAHE tile grid size [width, height] (4-16)
- gamma: Gamma correction value (0.8-2.0)
- denoise_h: Denoising filter strength (5-20)
- morph_kernel_size: Morphological operation kernel size (1-5)
- unsharp_amount: Unsharp masking strength (0.5-3.0)
- unsharp_radius: Unsharp masking radius (0.5-2.0)
- unsharp_threshold: Unsharp masking threshold (0-10)
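The ranges listed above can be checked before starting a long batch run. This is a minimal sketch, not part of the repository: `RANGES` copies the documented bounds, and `check_config` is a hypothetical helper that reports values outside them. It treats the ranges as recommendations and does not assume the scripts enforce them.

```python
# Suggested ranges copied from the parameter descriptions above.
RANGES = {
    "bilateral_d": (5, 15),
    "bilateral_sigma_color": (50, 150),
    "bilateral_sigma_space": (50, 150),
    "clahe_clip_limit": (1.0, 4.0),
    "clahe_grid_size": (4, 16),  # applies to each of [width, height]
    "gamma": (0.8, 2.0),
    "denoise_h": (5, 20),
    "morph_kernel_size": (1, 5),
    "unsharp_amount": (0.5, 3.0),
    "unsharp_radius": (0.5, 2.0),
    "unsharp_threshold": (0, 10),
}


def check_config(config):
    """Return a list of warnings for unknown or out-of-range parameters."""
    warnings = []
    for name, value in config.items():
        if name not in RANGES:
            warnings.append(f"{name}: unknown parameter")
            continue
        low, high = RANGES[name]
        # clahe_grid_size is a two-element list; scalars are wrapped
        # so both cases go through the same loop.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if not low <= v <= high:
                warnings.append(f"{name}: {v} is outside {low}-{high}")
    return warnings
```

For example, `check_config({"gamma": 2.5})` returns one warning, while the default parameters produce an empty list.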
Creating Custom Configurations
- Generate default config template:
python batch_process.py --create-config
- Edit config.json with your preferred values
- Use custom config:
python batch_process.py --config config.json
Processing Pipeline
The image cleaning pipeline applies these steps in sequence:
- Noise Reduction
  - Bilateral filtering preserves edges while reducing noise
  - Non-local means denoising suppresses remaining noise by averaging similar patches across the image
- Contrast Enhancement
  - CLAHE improves local contrast adaptively
  - Gamma correction adjusts overall brightness
- Background Cleaning
  - Morphological operations remove small artifacts
  - Background normalization reduces paper texture
- Sharpening
  - Unsharp masking enhances text edges
  - Preserves fine details while reducing blur
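Two of the steps above reduce to per-pixel arithmetic, sketched here in plain Python on 8-bit grayscale values. This is an illustration under assumptions, not the code in image_cleaner.py. The gamma mapping assumes the convention out = 255 * (in / 255) ** (1 / gamma), under which values above 1 brighten midtones; the project may use the inverse convention. In `unsharp_pixel`, `blurred` stands for the value of the same pixel in a blurred copy of the image, which a real implementation would typically obtain from a Gaussian filter. Both function names are hypothetical.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for gamma correction of 8-bit values.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma).
    """
    return [round(255 * (i / 255) ** (1 / gamma)) for i in range(256)]


def unsharp_pixel(original, blurred, amount=1.5, threshold=0):
    """Apply the unsharp-mask formula to a single 8-bit pixel value.

    sharpened = original + amount * (original - blurred)
    Differences smaller than threshold are left untouched so that
    flat areas (and their noise) are not amplified.
    """
    detail = original - blurred
    if abs(detail) < threshold:
        return original
    sharpened = original + amount * detail
    # Clamp to the valid 8-bit range.
    return max(0, min(255, round(sharpened)))
```

With the default `unsharp_amount` of 1.5, a pixel of value 100 whose blurred counterpart is 90 becomes 100 + 1.5 * 10 = 115. Larger amounts exaggerate edges further, and results are clamped to 0-255.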
Interactive Tuning Commands
When using config_tuner.py, these commands are available:
- set <param> <value>: Adjust parameter value
- show: Display current parameters
- test [steps]: Process with current settings
- compare [filename]: Save before/after comparison
- save <filename>: Save configuration to file
- load <filename>: Load configuration from file
- presets: Show preset configurations
- help: Show detailed help
- quit: Exit tuning session
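To illustrate how commands of this shape can act on a parameter dictionary, here is a small sketch covering only `set` and `show`. It is not the implementation in config_tuner.py: `apply_command` and its messages are hypothetical, and the sketch handles only scalar parameters (list values such as `clahe_grid_size` are rejected).

```python
def apply_command(config, line):
    """Apply one tuner-style command to config and return a status message."""
    parts = line.split()
    if not parts:
        return "empty command"
    command, args = parts[0], parts[1:]

    if command == "show":
        return "\n".join(f"{name} = {value}" for name, value in sorted(config.items()))

    if command == "set":
        if len(args) != 2:
            return "usage: set <param> <value>"
        name, raw = args
        if name not in config:
            return f"unknown parameter: {name}"
        current = config[name]
        if not isinstance(current, (int, float)):
            return f"{name}: only scalar parameters are handled in this sketch"
        # Keep the existing type: an int parameter rejects "7.5".
        try:
            config[name] = type(current)(raw)
        except ValueError:
            return f"invalid value for {name}: {raw}"
        return f"{name} set to {config[name]}"

    return f"unsupported command in this sketch: {command}"
```

A session could then be a simple loop that reads a line, passes it to `apply_command`, and prints the returned message until the user enters `quit`.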
Tips for Best Results
For Light Damage/Noise:
- Reduce bilateral_d to 5-7
- Lower denoise_h to 5-8
- Use clahe_clip_limit around 1.5
For Heavy Damage/Artifacts:
- Increase bilateral_d to 12-15
- Raise denoise_h to 15-20
- Use higher clahe_clip_limit (3.0-4.0)
For Faded/Low Contrast Images:
- Increase gamma to 1.3-1.5
- Raise clahe_clip_limit to 3.0+
- Boost unsharp_amount to 2.0+
For Sharp/High Quality Scans:
- Focus mainly on denoise and sharpen steps
- Skip background cleaning if unnecessary
- Use lighter settings to preserve quality
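One way to turn these tips into a reusable configuration file is to override only the affected defaults. The sketch below assumes that the file passed to `--config` uses the same flat JSON layout as the default parameters shown earlier. The values in `HEAVY_DAMAGE` are illustrative picks from inside the suggested ranges, not tested presets, and `build_config` and `save_config` are hypothetical helpers.

```python
import json

# Default parameters as listed in the Configuration section.
DEFAULTS = {
    "bilateral_d": 9,
    "bilateral_sigma_color": 75,
    "bilateral_sigma_space": 75,
    "clahe_clip_limit": 2.0,
    "clahe_grid_size": [8, 8],
    "gamma": 1.2,
    "denoise_h": 10,
    "morph_kernel_size": 2,
    "unsharp_amount": 1.5,
    "unsharp_radius": 1.0,
    "unsharp_threshold": 0,
}

# Illustrative values for heavily damaged scans (assumed, not tested).
HEAVY_DAMAGE = {"bilateral_d": 13, "denoise_h": 18, "clahe_clip_limit": 3.5}


def build_config(overrides):
    """Return a copy of DEFAULTS with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def save_config(config, path):
    """Write the configuration as JSON for use with --config."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

After `save_config(build_config(HEAVY_DAMAGE), "heavy_damage.json")`, the file could be passed on with `python batch_process.py --config heavy_damage.json`. Checking the result on a sample image first, for instance in the interactive tuner, is advisable.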
File Structure
newspaper_image_cleaner/
├── image_cleaner.py # Core processing module
├── batch_process.py # Batch processing script
├── config_tuner.py # Interactive parameter tuning
├── requirements.txt # Python dependencies
└── README.md # This documentation
Troubleshooting
ImportError: No module named 'cv2'
Install OpenCV: pip install opencv-python
Memory Issues with Large Images
The tuner automatically resizes large images. For batch processing of very large images, consider resizing first.
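If images are resized ahead of a batch run, the target size should keep the aspect ratio. The following sketch only computes the new dimensions; `fit_within` is a hypothetical helper, and the maximum side length is a value to choose for your own scans. The result could then be handed to an image library's resize function, for example `cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_AREA)` in OpenCV.

```python
def fit_within(width, height, max_side):
    """Return (width, height) scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # Guard against a zero dimension for extremely narrow images.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 6000x4000 scan and a limit of 3000, the helper returns 3000x2000. Note that downscaling too far can blur small print, so the limit should stay well above the size at which text becomes hard to read.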
Poor Results
Use the interactive tuner to find optimal parameters for your specific image characteristics.
Performance
- Single 3000x2000 image: ~3-5 seconds
- Batch processing depends on image size and quantity
- Interactive tuning uses smaller images for faster feedback
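For planning purposes, the per-image figure above gives a rough bound on batch duration. The sketch below simply multiplies it out. It assumes sequential processing of images around 3000x2000 pixels and ignores the extra time for comparison images; `estimate_batch_seconds` is a hypothetical helper.

```python
def estimate_batch_seconds(image_count, per_image=(3.0, 5.0)):
    """Return a (low, high) estimate in seconds for processing image_count images."""
    low, high = per_image
    return image_count * low, image_count * high


low, high = estimate_batch_seconds(200)
print(f"200 images: roughly {low / 60:.0f}-{high / 60:.0f} minutes")
```

Under these assumptions, 200 images would take between 600 and 1000 seconds. Larger scans will take longer, so timing a handful of real images gives a better per-image figure.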