mirror of
https://github.com/Theodor-Springmann-Stiftung/kgpz_web.git
synced 2025-10-30 17:45:30 +00:00
# Historical Newspaper Image Cleaning Pipeline

This pipeline automatically cleans and enhances scanned historical newspaper images by reducing noise, improving contrast, and sharpening text for better readability.

## Features

- **Noise Reduction**: Bilateral filtering and non-local means denoising
- **Contrast Enhancement**: CLAHE and gamma correction
- **Background Cleaning**: Morphological operations to remove artifacts
- **Text Sharpening**: Unsharp masking for improved readability
- **Batch Processing**: Process entire directories efficiently
- **Interactive Tuning**: Find optimal parameters for your specific images
- **Before/After Comparisons**: Visual validation of improvements

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Process Single Image

```bash
python image_cleaner.py input_image.jpg -o cleaned_image.jpg --comparison
```

### 3. Batch Process Directory

```bash
python batch_process.py -i newspaper_scans -o cleaned_images
```

### 4. Interactive Parameter Tuning

```bash
python config_tuner.py sample_image.jpg
```

## Usage Examples

### Basic Image Cleaning
```bash
# Clean a single image with default settings
python image_cleaner.py 1771-09b-02.jpg

# Clean with specific processing steps
python image_cleaner.py 1771-09b-02.jpg --steps denoise contrast sharpen

# Create before/after comparison
python image_cleaner.py 1771-09b-02.jpg -c
```

### Batch Processing
```bash
# Process all JPG files in current directory
python batch_process.py

# Process specific directory with custom output
python batch_process.py -i scans/ -o cleaned/

# Use custom configuration
python batch_process.py --config custom_config.json

# Skip comparison images for faster processing
python batch_process.py --no-comparisons
```

### Parameter Tuning
```bash
# Start interactive tuning session
python config_tuner.py sample_image.jpg

# Load existing config for fine-tuning
python config_tuner.py sample_image.jpg -c existing_config.json
```

## Configuration

### Default Parameters

The pipeline uses these default parameters optimized for newspaper scans:

```json
{
  "bilateral_d": 9,
  "bilateral_sigma_color": 75,
  "bilateral_sigma_space": 75,
  "clahe_clip_limit": 2.0,
  "clahe_grid_size": [8, 8],
  "gamma": 1.2,
  "denoise_h": 10,
  "morph_kernel_size": 2,
  "unsharp_amount": 1.5,
  "unsharp_radius": 1.0,
  "unsharp_threshold": 0
}
```

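As an illustration of how a custom file could be layered over these defaults, here is a minimal sketch. The names `DEFAULT_PARAMS`, `merge_params`, and `load_params` are hypothetical and not taken from the scripts. The sketch also assumes that a config file may list only the keys it wants to change, which the actual tools may or may not support.

```python
import json
from pathlib import Path

# Defaults copied from the JSON block above.
DEFAULT_PARAMS = {
    "bilateral_d": 9,
    "bilateral_sigma_color": 75,
    "bilateral_sigma_space": 75,
    "clahe_clip_limit": 2.0,
    "clahe_grid_size": [8, 8],
    "gamma": 1.2,
    "denoise_h": 10,
    "morph_kernel_size": 2,
    "unsharp_amount": 1.5,
    "unsharp_radius": 1.0,
    "unsharp_threshold": 0,
}


def merge_params(overrides):
    """Return a copy of the defaults with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        # Catch typos such as "gama" early instead of silently ignoring them.
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    params = dict(DEFAULT_PARAMS)
    params.update(overrides)
    return params


def load_params(config_path):
    """Read a JSON config file and merge it over the defaults."""
    overrides = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return merge_params(overrides)
```

Rejecting unknown keys is a design choice of this sketch: a misspelled parameter would otherwise leave the default in place without any warning.
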
### Parameter Descriptions

- **bilateral_d**: Neighborhood diameter for bilateral filtering (5-15)
- **bilateral_sigma_color**: Filter sigma in color space; larger values mix more dissimilar colors (50-150)
- **bilateral_sigma_space**: Filter sigma in coordinate space; larger values let more distant pixels influence each other (50-150)
- **clahe_clip_limit**: Contrast limiting for CLAHE (1.0-4.0)
- **clahe_grid_size**: CLAHE tile grid size [width, height] (4-16 per side)
- **gamma**: Gamma correction value (0.8-2.0)
- **denoise_h**: Non-local means denoising filter strength (5-20)
- **morph_kernel_size**: Morphological operation kernel size (1-5)
- **unsharp_amount**: Unsharp masking strength (0.5-3.0)
- **unsharp_radius**: Unsharp masking radius (0.5-2.0)
- **unsharp_threshold**: Unsharp masking threshold (0-10)

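To make the gamma and unsharp parameters more concrete, the following sketch shows the textbook formulas on single 8-bit values in plain Python. It is not taken from `image_cleaner.py`. The function names are hypothetical, and the gamma convention (exponent `1 / gamma`, so values above 1 brighten midtones) is an assumption; the script may use the inverse. Here `blurred` stands for the pixel after a Gaussian blur whose width would be controlled by `unsharp_radius`.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for 8-bit gamma correction.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma).
    """
    return [round(255 * (i / 255) ** (1.0 / gamma)) for i in range(256)]


def unsharp_pixel(original, blurred, amount=1.5, threshold=0):
    """Sharpen one 8-bit pixel given its blurred counterpart.

    The difference between the pixel and its blurred version is scaled by
    `amount` and added back. Differences smaller than `threshold` are left
    alone so that flat, noisy areas are not amplified.
    """
    diff = original - blurred
    if abs(diff) < threshold:
        return original
    sharpened = original + amount * diff
    # Clamp to the valid 8-bit range.
    return max(0, min(255, round(sharpened)))
```

With the defaults above (`unsharp_amount` 1.5, `unsharp_threshold` 0), a pixel of 100 whose blurred value is 90 becomes 115, because the 10-level difference is scaled by 1.5 and added back.
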
### Creating Custom Configurations

1. Generate default config template:
   ```bash
   python batch_process.py --create-config
   ```

2. Edit `config.json` with your preferred values

3. Use custom config:
   ```bash
   python batch_process.py --config config.json
   ```

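Before running a batch with an edited `config.json`, it can help to check the values against the ranges listed under Parameter Descriptions. The sketch below is one way to do that. `RECOMMENDED_RANGES` and `out_of_range` are hypothetical names, and the scripts themselves may or may not perform such a check.

```python
# Ranges copied from the "Parameter Descriptions" list in this README.
RECOMMENDED_RANGES = {
    "bilateral_d": (5, 15),
    "bilateral_sigma_color": (50, 150),
    "bilateral_sigma_space": (50, 150),
    "clahe_clip_limit": (1.0, 4.0),
    "clahe_grid_size": (4, 16),
    "gamma": (0.8, 2.0),
    "denoise_h": (5, 20),
    "morph_kernel_size": (1, 5),
    "unsharp_amount": (0.5, 3.0),
    "unsharp_radius": (0.5, 2.0),
    "unsharp_threshold": (0, 10),
}


def out_of_range(params):
    """Return {name: value} for every parameter outside its listed range."""
    problems = {}
    for name, value in params.items():
        if name not in RECOMMENDED_RANGES:
            continue  # Unknown keys are not this function's concern.
        low, high = RECOMMENDED_RANGES[name]
        # clahe_grid_size is a [width, height] pair; check each entry.
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(not (low <= v <= high) for v in values):
            problems[name] = value
    return problems
```

A typical use would be to load the file with `json.load` and print whatever `out_of_range` returns. Values outside the listed ranges are not necessarily invalid, so a warning is probably more appropriate than an error.
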
## Processing Pipeline

The image cleaning pipeline applies these steps in sequence:

1. **Noise Reduction**
   - Bilateral filtering reduces noise while preserving edges
   - Non-local means denoising averages similar patches across the image to suppress remaining noise

2. **Contrast Enhancement**
   - CLAHE improves local contrast adaptively
   - Gamma correction adjusts overall brightness

3. **Background Cleaning**
   - Morphological operations remove small artifacts
   - Background normalization reduces paper texture

4. **Sharpening**
   - Unsharp masking enhances text edges
   - Preserves fine details while reducing blur

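The sequencing described above can be sketched as a small step runner. This is an illustration rather than the code in `image_cleaner.py`. The step names `denoise`, `contrast`, `background`, and `sharpen` are the ones this README uses for `--steps` and in the tips, while `PIPELINE_ORDER`, `run_pipeline`, and the stand-in step functions are hypothetical. It also assumes that a subset of steps still runs in the pipeline's own order, not the order given on the command line. In an OpenCV-based implementation, each step function would typically wrap calls such as `cv2.bilateralFilter`, `cv2.fastNlMeansDenoising`, `cv2.createCLAHE`, or `cv2.morphologyEx`.

```python
# Order taken from the numbered list above.
PIPELINE_ORDER = ("denoise", "contrast", "background", "sharpen")


def run_pipeline(image, step_functions, steps=None):
    """Apply the selected steps to `image` in pipeline order.

    `step_functions` maps a step name to a callable that takes an image and
    returns the processed image. `steps` is an optional iterable of step
    names; None selects every step.
    """
    selected = set(PIPELINE_ORDER if steps is None else steps)
    unknown = selected - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"Unknown step(s): {sorted(unknown)}")
    for name in PIPELINE_ORDER:
        if name in selected:
            image = step_functions[name](image)
    return image


# Stand-in steps that only record their name, to make the order visible.
trace_steps = {
    name: (lambda img, n=name: img + [n]) for name in PIPELINE_ORDER
}
print(run_pipeline([], trace_steps, steps=["sharpen", "denoise"]))
# prints ['denoise', 'sharpen']
```

Keeping the order fixed means sharpening always runs after denoising, so noise is not amplified before it is removed.
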
## Interactive Tuning Commands

When using `config_tuner.py`, these commands are available:

- `set <param> <value>` - Adjust parameter value
- `show` - Display current parameters
- `test [steps]` - Process with current settings
- `compare [filename]` - Save before/after comparison
- `save <filename>` - Save configuration to file
- `load <filename>` - Load configuration from file
- `presets` - Show preset configurations
- `help` - Show detailed help
- `quit` - Exit tuning session

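For readers curious how a command such as `set <param> <value>` might be handled, here is a minimal sketch of the parsing side. It is not the code in `config_tuner.py`. `parse_command` and `apply_set` are hypothetical names, and the type handling (integers stay integers when possible, list-valued parameters are rejected) is an assumption of this sketch.

```python
import shlex


def parse_command(line):
    """Split a tuner input line into (command, args)."""
    parts = shlex.split(line)  # Honors quotes, e.g. save "my config.json"
    if not parts:
        return None, []
    return parts[0].lower(), parts[1:]


def apply_set(params, args):
    """Handle `set <param> <value>` by updating `params` in place."""
    if len(args) != 2:
        raise ValueError("usage: set <param> <value>")
    name, raw = args
    if name not in params:
        raise KeyError(f"unknown parameter: {name}")
    current = params[name]
    if isinstance(current, (list, tuple)):
        raise ValueError(f"{name} is a list; not handled in this sketch")
    value = float(raw)
    # Keep integer parameters (e.g. bilateral_d) as ints when possible.
    if isinstance(current, int) and value.is_integer():
        value = int(value)
    params[name] = value
    return params
```

A session loop would read a line, call `parse_command`, and dispatch on the returned command name, for example calling `apply_set` when the command is `set`.
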
## Tips for Best Results

### For Light Damage/Noise:
- Reduce `bilateral_d` to 5-7
- Lower `denoise_h` to 5-8
- Use `clahe_clip_limit` around 1.5

### For Heavy Damage/Artifacts:
- Increase `bilateral_d` to 12-15
- Raise `denoise_h` to 15-20
- Use higher `clahe_clip_limit` (3.0-4.0)

### For Faded/Low Contrast Images:
- Increase `gamma` to 1.3-1.5
- Raise `clahe_clip_limit` to 3.0+
- Boost `unsharp_amount` to 2.0+

### For Sharp/High Quality Scans:
- Focus mainly on `denoise` and `sharpen` steps
- Skip `background` cleaning if unnecessary
- Use lighter settings to preserve quality

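The tips above can be captured as reusable override sets and written out as config files. The sketch below is only a starting point. The preset names and helper functions are hypothetical and unrelated to whatever the tuner's `presets` command shows. Where the text gives a range or an open-ended value such as "3.0+", the specific number is this sketch's own choice.

```python
import json

# Overrides derived from the tips above; exact numbers are illustrative.
TIP_PRESETS = {
    "light_damage": {"bilateral_d": 5, "denoise_h": 6, "clahe_clip_limit": 1.5},
    "heavy_damage": {"bilateral_d": 13, "denoise_h": 18, "clahe_clip_limit": 3.5},
    "faded": {"gamma": 1.4, "clahe_clip_limit": 3.0, "unsharp_amount": 2.0},
}


def preset_config(defaults, preset_name):
    """Return a full config: a copy of `defaults` with the preset applied."""
    config = dict(defaults)
    config.update(TIP_PRESETS[preset_name])
    return config


def save_config(config, path):
    """Write a config dict as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
```

For example, saving `preset_config(defaults, "faded")` to `faded.json` gives a file that can be passed to `python batch_process.py --config faded.json`, where `defaults` is a dict holding the default parameters listed under Configuration.
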
## File Structure

```
newspaper_image_cleaner/
├── image_cleaner.py      # Core processing module
├── batch_process.py      # Batch processing script
├── config_tuner.py       # Interactive parameter tuning
├── requirements.txt      # Python dependencies
└── README.md             # This documentation
```

## Troubleshooting

### ImportError: No module named 'cv2'
Install OpenCV: `pip install opencv-python`

### Memory Issues with Large Images
The tuner automatically resizes large images. For batch processing of very large images, consider resizing first.

### Poor Results
Use the interactive tuner to find optimal parameters for your specific image characteristics.

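For the resizing suggested under Memory Issues, the target dimensions can be computed so that the aspect ratio is preserved. This is a minimal sketch. `fit_within` is a hypothetical helper, and the 2000-pixel limit is an arbitrary example, not the threshold the tuner uses.

```python
def fit_within(width, height, max_side=2000):
    """Return (new_width, new_height) with the longer side <= max_side.

    Images that already fit are returned unchanged; larger ones are scaled
    down uniformly so the aspect ratio is preserved.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With OpenCV, the resulting size could then be passed to `cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_AREA)` before the image goes through the batch script.
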
## Performance

- Single 3000x2000 image: ~3-5 seconds
- Batch processing time depends on image size and quantity
- Interactive tuning uses smaller images for faster feedback