mirror of
https://github.com/Theodor-Springmann-Stiftung/kgpz_web.git
synced 2025-10-30 01:25:30 +00:00
image cleaner
211
scripts/ex/README.md
Normal file
# Historical Newspaper Image Cleaning Pipeline

This pipeline automatically cleans and enhances scanned historical newspaper images by reducing noise, improving contrast, and sharpening text for better readability.

## Features

- **Noise Reduction**: Bilateral filtering and non-local means denoising
- **Contrast Enhancement**: CLAHE and gamma correction
- **Background Cleaning**: Morphological operations to remove artifacts
- **Text Sharpening**: Unsharp masking for improved readability
- **Batch Processing**: Process entire directories efficiently
- **Interactive Tuning**: Find optimal parameters for your specific images
- **Before/After Comparisons**: Visual validation of improvements

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Process a Single Image

```bash
python image_cleaner.py input_image.jpg -o cleaned_image.jpg --comparison
```

### 3. Batch Process a Directory

```bash
python batch_process.py -i newspaper_scans -o cleaned_images
```

### 4. Interactive Parameter Tuning

```bash
python config_tuner.py sample_image.jpg
```

## Usage Examples

### Basic Image Cleaning
```bash
# Clean a single image with default settings
python image_cleaner.py 1771-09b-02.jpg

# Clean with specific processing steps
python image_cleaner.py 1771-09b-02.jpg --steps denoise contrast sharpen

# Create a before/after comparison
python image_cleaner.py 1771-09b-02.jpg -c
```

### Batch Processing
```bash
# Process all JPG files in the current directory
python batch_process.py

# Process a specific directory with a custom output directory
python batch_process.py -i scans/ -o cleaned/

# Use a custom configuration
python batch_process.py --config custom_config.json

# Skip comparison images for faster processing
python batch_process.py --no-comparisons
```

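As a rough illustration of what batch processing amounts to, the sketch below collects the JPG files in a directory and builds one `image_cleaner.py` command per file, using only the `-o` flag shown earlier. The helper name `build_commands` and the `cleaned_` output prefix are assumptions made for illustration; `batch_process.py` itself may work differently.

```python
from pathlib import Path


def build_commands(input_dir, output_dir):
    """Return one image_cleaner.py command (as an argument list) per JPG file.

    Hypothetical helper: it mirrors the documented single-image CLI and does
    not reflect how batch_process.py is actually implemented.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    # Sort for a stable, reproducible processing order.
    for image in sorted(input_dir.iterdir()):
        if not image.is_file() or image.suffix.lower() not in {".jpg", ".jpeg"}:
            continue  # skip subdirectories and non-JPG files
        target = output_dir / f"cleaned_{image.name}"  # assumed naming scheme
        commands.append(
            ["python", "image_cleaner.py", str(image), "-o", str(target)]
        )
    return commands
```

Each returned list can be handed to `subprocess.run(command, check=True)`; building the commands first makes it easy to print them for a dry run before any image is touched.
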
### Parameter Tuning
```bash
# Start interactive tuning session
python config_tuner.py sample_image.jpg

# Load existing config for fine-tuning
python config_tuner.py sample_image.jpg -c existing_config.json
```

## Configuration

### Default Parameters

By default, the pipeline uses the following parameters, which are tuned for newspaper scans:

```json
{
  "bilateral_d": 9,
  "bilateral_sigma_color": 75,
  "bilateral_sigma_space": 75,
  "clahe_clip_limit": 2.0,
  "clahe_grid_size": [8, 8],
  "gamma": 1.2,
  "denoise_h": 10,
  "morph_kernel_size": 2,
  "unsharp_amount": 1.5,
  "unsharp_radius": 1.0,
  "unsharp_threshold": 0
}
```

### Parameter Descriptions

Suggested ranges are given in parentheses.

- **bilateral_d**: Diameter of the pixel neighborhood used for bilateral filtering (5-15)
- **bilateral_sigma_color**: Filter sigma in color space; larger values blend more dissimilar tones (50-150)
- **bilateral_sigma_space**: Filter sigma in coordinate space; larger values let more distant pixels influence each other (50-150)
- **clahe_clip_limit**: Contrast clip limit for CLAHE (1.0-4.0)
- **clahe_grid_size**: CLAHE tile grid size as [width, height] (4-16 per dimension)
- **gamma**: Gamma correction value (0.8-2.0)
- **denoise_h**: Denoising filter strength; higher values remove more noise but also more detail (5-20)
- **morph_kernel_size**: Kernel size for morphological operations (1-5)
- **unsharp_amount**: Unsharp masking strength (0.5-3.0)
- **unsharp_radius**: Unsharp masking radius (0.5-2.0)
- **unsharp_threshold**: Unsharp masking threshold (0-10)

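The suggested ranges lend themselves to a simple sanity check before a long batch run. The following is a minimal sketch and not part of the pipeline: `DEFAULTS` and `RANGES` are copied from this README, while the function names `load_config` and `check_ranges` are hypothetical.

```python
import json

# Default values and suggested ranges as listed in this README.
DEFAULTS = {
    "bilateral_d": 9, "bilateral_sigma_color": 75, "bilateral_sigma_space": 75,
    "clahe_clip_limit": 2.0, "clahe_grid_size": [8, 8], "gamma": 1.2,
    "denoise_h": 10, "morph_kernel_size": 2,
    "unsharp_amount": 1.5, "unsharp_radius": 1.0, "unsharp_threshold": 0,
}
RANGES = {
    "bilateral_d": (5, 15), "bilateral_sigma_color": (50, 150),
    "bilateral_sigma_space": (50, 150), "clahe_clip_limit": (1.0, 4.0),
    "clahe_grid_size": (4, 16), "gamma": (0.8, 2.0), "denoise_h": (5, 20),
    "morph_kernel_size": (1, 5), "unsharp_amount": (0.5, 3.0),
    "unsharp_radius": (0.5, 2.0), "unsharp_threshold": (0, 10),
}


def load_config(path=None):
    """Return the defaults, overridden by any keys found in the JSON file."""
    config = dict(DEFAULTS)
    if path is not None:
        with open(path, encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


def check_ranges(config):
    """Return one warning string per parameter outside its suggested range."""
    warnings = []
    for name, (low, high) in RANGES.items():
        value = config[name]
        # clahe_grid_size is a [width, height] pair; check both entries.
        items = value if isinstance(value, (list, tuple)) else [value]
        if any(not low <= item <= high for item in items):
            warnings.append(f"{name}={value} is outside {low}-{high}")
    return warnings
```

Values outside a range are reported rather than rejected, since the ranges are guidance and an unusual scan may genuinely need an unusual setting.
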
### Creating Custom Configurations

1. Generate a default config template:
   ```bash
   python batch_process.py --create-config
   ```

2. Edit `config.json` with your preferred values.

3. Run the batch with the custom config:
   ```bash
   python batch_process.py --config config.json
   ```

## Processing Pipeline

The image cleaning pipeline applies these steps in sequence:

1. **Noise Reduction**
   - Bilateral filtering reduces noise while preserving edges
   - Non-local means denoising averages similar patches across the image to suppress remaining grain

2. **Contrast Enhancement**
   - CLAHE improves local contrast adaptively
   - Gamma correction adjusts overall brightness

3. **Background Cleaning**
   - Morphological operations remove small artifacts
   - Background normalization reduces paper texture

4. **Sharpening**
   - Unsharp masking enhances text edges
   - Preserves fine details while reducing blur

## Interactive Tuning Commands

When using `config_tuner.py`, these commands are available:

- `set <param> <value>` - Adjust parameter value
- `show` - Display current parameters
- `test [steps]` - Process with current settings
- `compare [filename]` - Save before/after comparison
- `save <filename>` - Save configuration to file
- `load <filename>` - Load configuration from file
- `presets` - Show preset configurations
- `help` - Show detailed help
- `quit` - Exit tuning session

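To make the command syntax concrete, the sketch below shows one way a `set`/`show` command could be parsed. It is an assumption about behavior, not the code in `config_tuner.py`: `handle_command` is a hypothetical function, its messages are invented, and it covers only two of the commands listed above.

```python
import json
import shlex


def handle_command(line, params):
    """Apply one tuner-style command to `params` and return a status message."""
    parts = shlex.split(line)
    if not parts:
        return ""
    command, args = parts[0], parts[1:]
    if command == "show":
        return json.dumps(params, indent=2)
    if command == "set" and len(args) == 2:
        name, raw = args
        if name not in params:
            return f"unknown parameter: {name}"
        try:
            # json.loads turns "12" into 12 and "1.5" into 1.5.
            params[name] = json.loads(raw)
        except json.JSONDecodeError:
            return f"invalid value: {raw}"
        return f"{name} = {params[name]}"
    return f"unsupported command: {command}"
```

Rejecting unknown parameter names up front means a typo such as `set gama 1.4` is reported instead of silently adding an unused key.
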
## Tips for Best Results

### For Light Damage/Noise:
- Reduce `bilateral_d` to 5-7
- Lower `denoise_h` to 5-8
- Use `clahe_clip_limit` around 1.5

### For Heavy Damage/Artifacts:
- Increase `bilateral_d` to 12-15
- Raise `denoise_h` to 15-20
- Use higher `clahe_clip_limit` (3.0-4.0)

### For Faded/Low Contrast Images:
- Increase `gamma` to 1.3-1.5
- Raise `clahe_clip_limit` to 3.0+
- Boost `unsharp_amount` to 2.0+

### For Sharp/High Quality Scans:
- Focus mainly on `denoise` and `sharpen` steps
- Skip `background` cleaning if unnecessary
- Use lighter settings to preserve quality

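The first three tips can be captured as sets of overrides on top of a base configuration. The sketch below is illustrative: the preset names and the exact values are picks from within the ranges given above, not the presets shipped with `config_tuner.py`, and `apply_preset` is a hypothetical helper.

```python
# Illustrative overrides; each value is chosen from the ranges in the tips above.
PRESETS = {
    "light": {"bilateral_d": 6, "denoise_h": 6, "clahe_clip_limit": 1.5},
    "heavy": {"bilateral_d": 13, "denoise_h": 18, "clahe_clip_limit": 3.5},
    "faded": {"gamma": 1.4, "clahe_clip_limit": 3.0, "unsharp_amount": 2.0},
}


def apply_preset(base, name):
    """Return a copy of `base` with the named preset's overrides applied."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset: {name}")
    # Later keys win, so preset values replace the base values.
    return {**base, **PRESETS[name]}
```

Writing the merged dict out with `json.dump` produces a file that can be passed to `batch_process.py --config`, which keeps a tuned setup reproducible across runs.
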
## File Structure

```
newspaper_image_cleaner/
├── image_cleaner.py      # Core processing module
├── batch_process.py      # Batch processing script
├── config_tuner.py       # Interactive parameter tuning
├── requirements.txt      # Python dependencies
└── README.md             # This documentation
```

## Troubleshooting

### ImportError: No module named 'cv2'
Install OpenCV: `pip install opencv-python`

### Memory Issues with Large Images
The tuner automatically resizes large images. Before batch processing very large images, consider resizing them first.

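For the pre-resize step, the target size can be computed so that the aspect ratio is kept and small images are never enlarged. This is a minimal sketch: `fit_within` is a hypothetical helper, and the 2000-pixel default is an arbitrary assumption rather than a limit used by the pipeline.

```python
def fit_within(width, height, max_side=2000):
    """Return (width, height) scaled down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The result can be passed to `cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_AREA)`; area interpolation is generally a good fit for shrinking scans.
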
### Poor Results
Use the interactive tuner to find optimal parameters for your specific image characteristics.

## Performance

- A single 3000x2000 image takes roughly 3-5 seconds
- Batch processing time depends on image size and quantity
- Interactive tuning works on downscaled images for faster feedback