mirror of
				https://github.com/Theodor-Springmann-Stiftung/kgpz_web.git
				synced 2025-11-04 03:35:30 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			211 lines
		
	
	
		
			5.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			211 lines
		
	
	
		
			5.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
# Historical Newspaper Image Cleaning Pipeline
 | 
						|
 | 
						|
This pipeline automatically cleans and enhances scanned historical newspaper images by reducing noise, improving contrast, and sharpening text for better readability.
 | 
						|
 | 
						|
## Features
 | 
						|
 | 
						|
- **Noise Reduction**: Bilateral filtering and non-local means denoising
 | 
						|
- **Contrast Enhancement**: CLAHE and gamma correction
 | 
						|
- **Background Cleaning**: Morphological operations to remove artifacts
 | 
						|
- **Text Sharpening**: Unsharp masking for improved readability
 | 
						|
- **Batch Processing**: Process entire directories efficiently
 | 
						|
- **Interactive Tuning**: Find optimal parameters for your specific images
 | 
						|
- **Before/After Comparisons**: Visual validation of improvements
 | 
						|
 | 
						|
## Quick Start
 | 
						|
 | 
						|
### 1. Install Dependencies
 | 
						|
 | 
						|
```bash
 | 
						|
pip install -r requirements.txt
 | 
						|
```
 | 
						|
 | 
						|
### 2. Process Single Image
 | 
						|
 | 
						|
```bash
 | 
						|
python image_cleaner.py input_image.jpg -o cleaned_image.jpg --comparison
 | 
						|
```
 | 
						|
 | 
						|
### 3. Batch Process Directory
 | 
						|
 | 
						|
```bash
 | 
						|
python batch_process.py -i newspaper_scans -o cleaned_images
 | 
						|
```
 | 
						|
 | 
						|
### 4. Interactive Parameter Tuning
 | 
						|
 | 
						|
```bash
 | 
						|
python config_tuner.py sample_image.jpg
 | 
						|
```
 | 
						|
 | 
						|
## Usage Examples
 | 
						|
 | 
						|
### Basic Image Cleaning
 | 
						|
```bash
 | 
						|
# Clean single image with default settings
 | 
						|
python image_cleaner.py 1771-09b-02.jpg
 | 
						|
 | 
						|
# Clean with specific processing steps
 | 
						|
python image_cleaner.py 1771-09b-02.jpg --steps denoise contrast sharpen
 | 
						|
 | 
						|
# Create before/after comparison
 | 
						|
python image_cleaner.py 1771-09b-02.jpg -c
 | 
						|
```
 | 
						|
 | 
						|
### Batch Processing
 | 
						|
```bash
 | 
						|
# Process all JPG files in current directory
 | 
						|
python batch_process.py
 | 
						|
 | 
						|
# Process specific directory with custom output
 | 
						|
python batch_process.py -i scans/ -o cleaned/
 | 
						|
 | 
						|
# Use custom configuration
 | 
						|
python batch_process.py --config custom_config.json
 | 
						|
 | 
						|
# Skip comparison images for faster processing
 | 
						|
python batch_process.py --no-comparisons
 | 
						|
```
 | 
						|
 | 
						|
### Parameter Tuning
 | 
						|
```bash
 | 
						|
# Start interactive tuning session
 | 
						|
python config_tuner.py sample_image.jpg
 | 
						|
 | 
						|
# Load existing config for fine-tuning
 | 
						|
python config_tuner.py sample_image.jpg -c existing_config.json
 | 
						|
```
 | 
						|
 | 
						|
## Configuration
 | 
						|
 | 
						|
### Default Parameters
 | 
						|
 | 
						|
The pipeline uses these default parameters optimized for newspaper scans:
 | 
						|
 | 
						|
```json
 | 
						|
{
 | 
						|
    "bilateral_d": 9,
 | 
						|
    "bilateral_sigma_color": 75,
 | 
						|
    "bilateral_sigma_space": 75,
 | 
						|
    "clahe_clip_limit": 2.0,
 | 
						|
    "clahe_grid_size": [8, 8],
 | 
						|
    "gamma": 1.2,
 | 
						|
    "denoise_h": 10,
 | 
						|
    "morph_kernel_size": 2,
 | 
						|
    "unsharp_amount": 1.5,
 | 
						|
    "unsharp_radius": 1.0,
 | 
						|
    "unsharp_threshold": 0
 | 
						|
}
 | 
						|
```
 | 
						|
 | 
						|
### Parameter Descriptions
 | 
						|
 | 
						|
- **bilateral_d**: Neighborhood diameter for bilateral filtering (5-15)
 | 
						|
- **bilateral_sigma_color**: Color space filter strength (50-150)
 | 
						|
- **bilateral_sigma_space**: Coordinate space filter strength (50-150)
 | 
						|
- **clahe_clip_limit**: Contrast limiting for CLAHE (1.0-4.0)
 | 
						|
- **clahe_grid_size**: CLAHE tile grid size [width, height] (4-16)
 | 
						|
- **gamma**: Gamma correction value (0.8-2.0)
 | 
						|
- **denoise_h**: Denoising filter strength (5-20)
 | 
						|
- **morph_kernel_size**: Morphological operation kernel size (1-5)
 | 
						|
- **unsharp_amount**: Unsharp masking strength (0.5-3.0)
 | 
						|
- **unsharp_radius**: Unsharp masking radius (0.5-2.0)
 | 
						|
- **unsharp_threshold**: Unsharp masking threshold (0-10)
 | 
						|
 | 
						|
### Creating Custom Configurations
 | 
						|
 | 
						|
1. Generate default config template:
 | 
						|
```bash
 | 
						|
python batch_process.py --create-config
 | 
						|
```
 | 
						|
 | 
						|
2. Edit `config.json` with your preferred values
 | 
						|
 | 
						|
3. Use custom config:
 | 
						|
```bash
 | 
						|
python batch_process.py --config config.json
 | 
						|
```
 | 
						|
 | 
						|
## Processing Pipeline
 | 
						|
 | 
						|
The image cleaning pipeline applies these steps in sequence:
 | 
						|
 | 
						|
1. **Noise Reduction**
 | 
						|
   - Bilateral filtering preserves edges while reducing noise
 | 
						|
   - Non-local means denoising removes repetitive patterns
 | 
						|
 | 
						|
2. **Contrast Enhancement**
 | 
						|
   - CLAHE improves local contrast adaptively
 | 
						|
   - Gamma correction adjusts overall brightness
 | 
						|
 | 
						|
3. **Background Cleaning**
 | 
						|
   - Morphological operations remove small artifacts
 | 
						|
   - Background normalization reduces paper texture
 | 
						|
 | 
						|
4. **Sharpening**
 | 
						|
   - Unsharp masking enhances text edges
 | 
						|
   - Preserves fine details while reducing blur
 | 
						|
 | 
						|
## Interactive Tuning Commands
 | 
						|
 | 
						|
When using `config_tuner.py`, these commands are available:
 | 
						|
 | 
						|
- `set <param> <value>` - Adjust parameter value
 | 
						|
- `show` - Display current parameters
 | 
						|
- `test [steps]` - Process with current settings
 | 
						|
- `compare [filename]` - Save before/after comparison
 | 
						|
- `save <filename>` - Save configuration to file
 | 
						|
- `load <filename>` - Load configuration from file
 | 
						|
- `presets` - Show preset configurations
 | 
						|
- `help` - Show detailed help
 | 
						|
- `quit` - Exit tuning session
 | 
						|
 | 
						|
## Tips for Best Results
 | 
						|
 | 
						|
### For Light Damage/Noise:
 | 
						|
- Reduce `bilateral_d` to 5-7
 | 
						|
- Lower `denoise_h` to 5-8
 | 
						|
- Use `clahe_clip_limit` around 1.5
 | 
						|
 | 
						|
### For Heavy Damage/Artifacts:
 | 
						|
- Increase `bilateral_d` to 12-15
 | 
						|
- Raise `denoise_h` to 15-20
 | 
						|
- Use higher `clahe_clip_limit` (3.0-4.0)
 | 
						|
 | 
						|
### For Faded/Low Contrast Images:
 | 
						|
- Increase `gamma` to 1.3-1.5
 | 
						|
- Raise `clahe_clip_limit` to 3.0+
 | 
						|
- Boost `unsharp_amount` to 2.0+
 | 
						|
 | 
						|
### For Sharp/High Quality Scans:
 | 
						|
- Focus mainly on `denoise` and `sharpen` steps
 | 
						|
- Skip `background` cleaning if unnecessary
 | 
						|
- Use lighter settings to preserve quality
 | 
						|
 | 
						|
## File Structure
 | 
						|
 | 
						|
```
 | 
						|
newspaper_image_cleaner/
 | 
						|
├── image_cleaner.py      # Core processing module
 | 
						|
├── batch_process.py      # Batch processing script
 | 
						|
├── config_tuner.py       # Interactive parameter tuning
 | 
						|
├── requirements.txt      # Python dependencies
 | 
						|
└── README.md            # This documentation
 | 
						|
```
 | 
						|
 | 
						|
## Troubleshooting
 | 
						|
 | 
						|
### ImportError: No module named 'cv2'
 | 
						|
Install OpenCV: `pip install opencv-python`
 | 
						|
 | 
						|
### Memory Issues with Large Images
 | 
						|
The tuner automatically resizes large images. For batch processing of very large images, consider resizing first.
 | 
						|
 | 
						|
### Poor Results
 | 
						|
Use the interactive tuner to find optimal parameters for your specific image characteristics.
 | 
						|
 | 
						|
## Performance
 | 
						|
 | 
						|
- Single 3000x2000 image: ~3-5 seconds
 | 
						|
- Batch processing depends on image size and quantity
 | 
						|
- Interactive tuning uses smaller images for faster feedback |