mirror of
https://github.com/Theodor-Springmann-Stiftung/kgpz_web.git
synced 2025-10-29 09:05:30 +00:00
image cleaner
This commit is contained in:
56
scripts/ex/.gitignore
vendored
Normal file
56
scripts/ex/.gitignore
vendored
Normal file
@@ -0,0 +1,56 @@
|
|||||||
|
# Python Virtual Environment
|
||||||
|
venv/
|
||||||
|
env/
|
||||||
|
.env
|
||||||
|
|
||||||
|
# Python cache files
|
||||||
|
__pycache__/
|
||||||
|
*.pyc
|
||||||
|
*.pyo
|
||||||
|
*.pyd
|
||||||
|
.Python
|
||||||
|
|
||||||
|
# Generated/processed images
|
||||||
|
demo_*
|
||||||
|
cleaned_*
|
||||||
|
comparison_*
|
||||||
|
*_cleaned_*
|
||||||
|
*_comparison_*
|
||||||
|
|
||||||
|
# Processing outputs
|
||||||
|
cleaned/
|
||||||
|
output/
|
||||||
|
results/
|
||||||
|
|
||||||
|
# Configuration files (may contain sensitive settings)
|
||||||
|
config.json
|
||||||
|
*.config.json
|
||||||
|
custom_*.json
|
||||||
|
|
||||||
|
# Temporary files
|
||||||
|
*.tmp
|
||||||
|
*.temp
|
||||||
|
.DS_Store
|
||||||
|
Thumbs.db
|
||||||
|
|
||||||
|
# IDE files
|
||||||
|
.vscode/
|
||||||
|
.idea/
|
||||||
|
*.swp
|
||||||
|
*.swo
|
||||||
|
*~
|
||||||
|
|
||||||
|
# Logs
|
||||||
|
*.log
|
||||||
|
logs/
|
||||||
|
|
||||||
|
# Test outputs
|
||||||
|
test_*
|
||||||
|
sample_output/
|
||||||
|
|
||||||
|
# Large source images (uncomment if you don't want to track originals)
|
||||||
|
# *.jpg
|
||||||
|
# *.jpeg
|
||||||
|
# *.png
|
||||||
|
# *.tif
|
||||||
|
# *.tiff
|
||||||
BIN
scripts/ex/1771-09b-02.jpg
Normal file
BIN
scripts/ex/1771-09b-02.jpg
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 5.2 MiB |
BIN
scripts/ex/1772-07b-02.jpg
Normal file
BIN
scripts/ex/1772-07b-02.jpg
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 5.4 MiB |
BIN
scripts/ex/1772-34-136.jpg
Normal file
BIN
scripts/ex/1772-34-136.jpg
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 5.1 MiB |
211
scripts/ex/README.md
Normal file
211
scripts/ex/README.md
Normal file
@@ -0,0 +1,211 @@
|
|||||||
|
# Historical Newspaper Image Cleaning Pipeline
|
||||||
|
|
||||||
|
This pipeline automatically cleans and enhances scanned historical newspaper images by reducing noise, improving contrast, and sharpening text for better readability.
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
- **Noise Reduction**: Bilateral filtering and non-local means denoising
|
||||||
|
- **Contrast Enhancement**: CLAHE and gamma correction
|
||||||
|
- **Background Cleaning**: Morphological operations to remove artifacts
|
||||||
|
- **Text Sharpening**: Unsharp masking for improved readability
|
||||||
|
- **Batch Processing**: Process entire directories efficiently
|
||||||
|
- **Interactive Tuning**: Find optimal parameters for your specific images
|
||||||
|
- **Before/After Comparisons**: Visual validation of improvements
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### 1. Install Dependencies
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Process Single Image
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python image_cleaner.py input_image.jpg -o cleaned_image.jpg --comparison
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Batch Process Directory
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python batch_process.py -i newspaper_scans -o cleaned_images
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. Interactive Parameter Tuning
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python config_tuner.py sample_image.jpg
|
||||||
|
```
|
||||||
|
|
||||||
|
## Usage Examples
|
||||||
|
|
||||||
|
### Basic Image Cleaning
|
||||||
|
```bash
|
||||||
|
# Clean single image with default settings
|
||||||
|
python image_cleaner.py 1771-09b-02.jpg
|
||||||
|
|
||||||
|
# Clean with specific processing steps
|
||||||
|
python image_cleaner.py 1771-09b-02.jpg --steps denoise contrast sharpen
|
||||||
|
|
||||||
|
# Create before/after comparison
|
||||||
|
python image_cleaner.py 1771-09b-02.jpg -c
|
||||||
|
```
|
||||||
|
|
||||||
|
### Batch Processing
|
||||||
|
```bash
|
||||||
|
# Process all JPG files in current directory
|
||||||
|
python batch_process.py
|
||||||
|
|
||||||
|
# Process specific directory with custom output
|
||||||
|
python batch_process.py -i scans/ -o cleaned/
|
||||||
|
|
||||||
|
# Use custom configuration
|
||||||
|
python batch_process.py --config custom_config.json
|
||||||
|
|
||||||
|
# Skip comparison images for faster processing
|
||||||
|
python batch_process.py --no-comparisons
|
||||||
|
```
|
||||||
|
|
||||||
|
### Parameter Tuning
|
||||||
|
```bash
|
||||||
|
# Start interactive tuning session
|
||||||
|
python config_tuner.py sample_image.jpg
|
||||||
|
|
||||||
|
# Load existing config for fine-tuning
|
||||||
|
python config_tuner.py sample_image.jpg -c existing_config.json
|
||||||
|
```
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### Default Parameters
|
||||||
|
|
||||||
|
The pipeline uses these default parameters optimized for newspaper scans:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"bilateral_d": 9,
|
||||||
|
"bilateral_sigma_color": 75,
|
||||||
|
"bilateral_sigma_space": 75,
|
||||||
|
"clahe_clip_limit": 2.0,
|
||||||
|
"clahe_grid_size": [8, 8],
|
||||||
|
"gamma": 1.2,
|
||||||
|
"denoise_h": 10,
|
||||||
|
"morph_kernel_size": 2,
|
||||||
|
"unsharp_amount": 1.5,
|
||||||
|
"unsharp_radius": 1.0,
|
||||||
|
"unsharp_threshold": 0
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Parameter Descriptions
|
||||||
|
|
||||||
|
- **bilateral_d**: Neighborhood diameter for bilateral filtering (5-15)
|
||||||
|
- **bilateral_sigma_color**: Color space filter strength (50-150)
|
||||||
|
- **bilateral_sigma_space**: Coordinate space filter strength (50-150)
|
||||||
|
- **clahe_clip_limit**: Contrast limiting for CLAHE (1.0-4.0)
|
||||||
|
- **clahe_grid_size**: CLAHE tile grid size [width, height] (4-16)
|
||||||
|
- **gamma**: Gamma correction value (0.8-2.0)
|
||||||
|
- **denoise_h**: Denoising filter strength (5-20)
|
||||||
|
- **morph_kernel_size**: Morphological operation kernel size (1-5)
|
||||||
|
- **unsharp_amount**: Unsharp masking strength (0.5-3.0)
|
||||||
|
- **unsharp_radius**: Unsharp masking radius (0.5-2.0)
|
||||||
|
- **unsharp_threshold**: Unsharp masking threshold (0-10)
|
||||||
|
|
||||||
|
### Creating Custom Configurations
|
||||||
|
|
||||||
|
1. Generate default config template:
|
||||||
|
```bash
|
||||||
|
python batch_process.py --create-config
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Edit `config.json` with your preferred values
|
||||||
|
|
||||||
|
3. Use custom config:
|
||||||
|
```bash
|
||||||
|
python batch_process.py --config config.json
|
||||||
|
```
|
||||||
|
|
||||||
|
## Processing Pipeline
|
||||||
|
|
||||||
|
The image cleaning pipeline applies these steps in sequence:
|
||||||
|
|
||||||
|
1. **Noise Reduction**
|
||||||
|
- Bilateral filtering preserves edges while reducing noise
|
||||||
|
- Non-local means denoising removes repetitive patterns
|
||||||
|
|
||||||
|
2. **Contrast Enhancement**
|
||||||
|
- CLAHE improves local contrast adaptively
|
||||||
|
- Gamma correction adjusts overall brightness
|
||||||
|
|
||||||
|
3. **Background Cleaning**
|
||||||
|
- Morphological operations remove small artifacts
|
||||||
|
- Background normalization reduces paper texture
|
||||||
|
|
||||||
|
4. **Sharpening**
|
||||||
|
- Unsharp masking enhances text edges
|
||||||
|
- Preserves fine details while reducing blur
|
||||||
|
|
||||||
|
## Interactive Tuning Commands
|
||||||
|
|
||||||
|
When using `config_tuner.py`, these commands are available:
|
||||||
|
|
||||||
|
- `set <param> <value>` - Adjust parameter value
|
||||||
|
- `show` - Display current parameters
|
||||||
|
- `test [steps]` - Process with current settings
|
||||||
|
- `compare [filename]` - Save before/after comparison
|
||||||
|
- `save <filename>` - Save configuration to file
|
||||||
|
- `load <filename>` - Load configuration from file
|
||||||
|
- `presets` - Show preset configurations
|
||||||
|
- `help` - Show detailed help
|
||||||
|
- `quit` - Exit tuning session
|
||||||
|
|
||||||
|
## Tips for Best Results
|
||||||
|
|
||||||
|
### For Light Damage/Noise:
|
||||||
|
- Reduce `bilateral_d` to 5-7
|
||||||
|
- Lower `denoise_h` to 5-8
|
||||||
|
- Use `clahe_clip_limit` around 1.5
|
||||||
|
|
||||||
|
### For Heavy Damage/Artifacts:
|
||||||
|
- Increase `bilateral_d` to 12-15
|
||||||
|
- Raise `denoise_h` to 15-20
|
||||||
|
- Use higher `clahe_clip_limit` (3.0-4.0)
|
||||||
|
|
||||||
|
### For Faded/Low Contrast Images:
|
||||||
|
- Increase `gamma` to 1.3-1.5
|
||||||
|
- Raise `clahe_clip_limit` to 3.0+
|
||||||
|
- Boost `unsharp_amount` to 2.0+
|
||||||
|
|
||||||
|
### For Sharp/High Quality Scans:
|
||||||
|
- Focus mainly on `denoise` and `sharpen` steps
|
||||||
|
- Skip `background` cleaning if unnecessary
|
||||||
|
- Use lighter settings to preserve quality
|
||||||
|
|
||||||
|
## File Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
newspaper_image_cleaner/
|
||||||
|
├── image_cleaner.py # Core processing module
|
||||||
|
├── batch_process.py # Batch processing script
|
||||||
|
├── config_tuner.py # Interactive parameter tuning
|
||||||
|
├── requirements.txt # Python dependencies
|
||||||
|
└── README.md # This documentation
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### ImportError: No module named 'cv2'
|
||||||
|
Install OpenCV: `pip install opencv-python`
|
||||||
|
|
||||||
|
### Memory Issues with Large Images
|
||||||
|
The tuner automatically resizes large images. For batch processing of very large images, consider resizing first.
|
||||||
|
|
||||||
|
### Poor Results
|
||||||
|
Use the interactive tuner to find optimal parameters for your specific image characteristics.
|
||||||
|
|
||||||
|
## Performance
|
||||||
|
|
||||||
|
- Single 3000x2000 image: ~3-5 seconds
|
||||||
|
- Batch processing depends on image size and quantity
|
||||||
|
- Interactive tuning uses smaller images for faster feedback
|
||||||
162
scripts/ex/batch_process.py
Executable file
162
scripts/ex/batch_process.py
Executable file
@@ -0,0 +1,162 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Batch Processing Script for Historical Newspaper Images
|
||||||
|
|
||||||
|
Simple script to process multiple images with the newspaper cleaning pipeline.
|
||||||
|
Includes progress tracking and error handling.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
from image_cleaner import NewspaperImageCleaner, create_comparison_image
|
||||||
|
|
||||||
|
|
||||||
|
def process_batch(input_dir=".", output_dir="cleaned", config_file=None,
|
||||||
|
create_comparisons=True, file_pattern="*.jpg"):
|
||||||
|
"""
|
||||||
|
Process all newspaper images in a directory.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
input_dir: Directory containing input images
|
||||||
|
output_dir: Directory for cleaned images
|
||||||
|
config_file: JSON file with custom parameters
|
||||||
|
create_comparisons: Whether to create before/after comparisons
|
||||||
|
file_pattern: Glob pattern for files to process
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Load custom config if provided
|
||||||
|
config = None
|
||||||
|
if config_file and os.path.exists(config_file):
|
||||||
|
with open(config_file, 'r') as f:
|
||||||
|
config = json.load(f)
|
||||||
|
print(f"Loaded custom config from {config_file}")
|
||||||
|
|
||||||
|
# Initialize cleaner
|
||||||
|
cleaner = NewspaperImageCleaner(config)
|
||||||
|
|
||||||
|
# Setup paths
|
||||||
|
input_path = Path(input_dir)
|
||||||
|
output_path = Path(output_dir)
|
||||||
|
output_path.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
if create_comparisons:
|
||||||
|
comparison_path = output_path / "comparisons"
|
||||||
|
comparison_path.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
# Find all image files
|
||||||
|
image_files = list(input_path.glob(file_pattern))
|
||||||
|
image_files.extend(input_path.glob("*.jpeg"))
|
||||||
|
image_files.extend(input_path.glob("*.JPG"))
|
||||||
|
image_files.extend(input_path.glob("*.JPEG"))
|
||||||
|
|
||||||
|
if not image_files:
|
||||||
|
print(f"No image files found in {input_dir}")
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"Found {len(image_files)} images to process")
|
||||||
|
print(f"Output directory: {output_path.absolute()}")
|
||||||
|
|
||||||
|
# Process each image
|
||||||
|
success_count = 0
|
||||||
|
error_count = 0
|
||||||
|
start_time = time.time()
|
||||||
|
|
||||||
|
for i, img_file in enumerate(image_files, 1):
|
||||||
|
print(f"\n[{i}/{len(image_files)}] Processing: {img_file.name}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Process image
|
||||||
|
output_file = output_path / f"cleaned_{img_file.name}"
|
||||||
|
processed, original = cleaner.process_image(img_file, output_file)
|
||||||
|
|
||||||
|
# Create comparison if requested
|
||||||
|
if create_comparisons:
|
||||||
|
comp_file = comparison_path / f"comparison_{img_file.name}"
|
||||||
|
create_comparison_image(original, processed, comp_file)
|
||||||
|
|
||||||
|
success_count += 1
|
||||||
|
print(f"✓ Completed: {img_file.name}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
error_count += 1
|
||||||
|
print(f"✗ Error processing {img_file.name}: {str(e)}")
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
elapsed_time = time.time() - start_time
|
||||||
|
print(f"\n" + "="*50)
|
||||||
|
print(f"Batch Processing Complete")
|
||||||
|
print(f"{"="*50}")
|
||||||
|
print(f"Successfully processed: {success_count}")
|
||||||
|
print(f"Errors: {error_count}")
|
||||||
|
print(f"Total time: {elapsed_time:.1f} seconds")
|
||||||
|
print(f"Average time per image: {elapsed_time/len(image_files):.1f} seconds")
|
||||||
|
print(f"Output directory: {output_path.absolute()}")
|
||||||
|
|
||||||
|
|
||||||
|
def create_sample_config():
|
||||||
|
"""Create a sample configuration file for customization."""
|
||||||
|
config = {
|
||||||
|
"bilateral_d": 9,
|
||||||
|
"bilateral_sigma_color": 75,
|
||||||
|
"bilateral_sigma_space": 75,
|
||||||
|
"clahe_clip_limit": 2.0,
|
||||||
|
"clahe_grid_size": [8, 8],
|
||||||
|
"gamma": 1.2,
|
||||||
|
"denoise_h": 10,
|
||||||
|
"morph_kernel_size": 2,
|
||||||
|
"unsharp_amount": 1.5,
|
||||||
|
"unsharp_radius": 1.0,
|
||||||
|
"unsharp_threshold": 0
|
||||||
|
}
|
||||||
|
|
||||||
|
with open("config.json", "w") as f:
|
||||||
|
json.dump(config, f, indent=4)
|
||||||
|
|
||||||
|
print("Created config.json with default parameters.")
|
||||||
|
print("Edit this file to customize processing settings.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Batch process historical newspaper images",
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog="""
|
||||||
|
Examples:
|
||||||
|
python batch_process.py # Process current directory
|
||||||
|
python batch_process.py -i scans -o clean # Process 'scans' folder
|
||||||
|
python batch_process.py --no-comparisons # Skip comparison images
|
||||||
|
python batch_process.py --config custom.json # Use custom settings
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
|
||||||
|
parser.add_argument("-i", "--input", default=".",
|
||||||
|
help="Input directory (default: current directory)")
|
||||||
|
parser.add_argument("-o", "--output", default="cleaned",
|
||||||
|
help="Output directory (default: cleaned)")
|
||||||
|
parser.add_argument("-c", "--config",
|
||||||
|
help="JSON config file with custom parameters")
|
||||||
|
parser.add_argument("--no-comparisons", action="store_true",
|
||||||
|
help="Skip creating before/after comparison images")
|
||||||
|
parser.add_argument("--pattern", default="*.jpg",
|
||||||
|
help="File pattern to match (default: *.jpg)")
|
||||||
|
parser.add_argument("--create-config", action="store_true",
|
||||||
|
help="Create sample config file and exit")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.create_config:
|
||||||
|
create_sample_config()
|
||||||
|
sys.exit(0)
|
||||||
|
|
||||||
|
process_batch(
|
||||||
|
input_dir=args.input,
|
||||||
|
output_dir=args.output,
|
||||||
|
config_file=args.config,
|
||||||
|
create_comparisons=not args.no_comparisons,
|
||||||
|
file_pattern=args.pattern
|
||||||
|
)
|
||||||
291
scripts/ex/config_tuner.py
Executable file
291
scripts/ex/config_tuner.py
Executable file
@@ -0,0 +1,291 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Interactive Parameter Tuning Tool for Newspaper Image Cleaning
|
||||||
|
|
||||||
|
This tool helps you find optimal parameters for your specific images
|
||||||
|
by providing an interactive tuning interface.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import cv2
|
||||||
|
import json
|
||||||
|
import numpy as np
|
||||||
|
from pathlib import Path
|
||||||
|
from image_cleaner import NewspaperImageCleaner
|
||||||
|
|
||||||
|
|
||||||
|
class ParameterTuner:
|
||||||
|
"""Interactive parameter tuning for image cleaning pipeline."""
|
||||||
|
|
||||||
|
def __init__(self, sample_image_path):
|
||||||
|
"""Initialize with a sample image for tuning."""
|
||||||
|
self.original = cv2.imread(str(sample_image_path))
|
||||||
|
if self.original is None:
|
||||||
|
raise ValueError(f"Could not load image: {sample_image_path}")
|
||||||
|
|
||||||
|
# Resize large images for faster processing during tuning
|
||||||
|
height, width = self.original.shape[:2]
|
||||||
|
if height > 1500 or width > 1500:
|
||||||
|
scale = min(1500/height, 1500/width)
|
||||||
|
new_width = int(width * scale)
|
||||||
|
new_height = int(height * scale)
|
||||||
|
self.original = cv2.resize(self.original, (new_width, new_height))
|
||||||
|
print(f"Resized image to {new_width}x{new_height} for faster tuning")
|
||||||
|
|
||||||
|
self.current_params = self._get_default_params()
|
||||||
|
self.cleaner = NewspaperImageCleaner(self.current_params)
|
||||||
|
|
||||||
|
def _get_default_params(self):
|
||||||
|
"""Get default parameters as starting point."""
|
||||||
|
return {
|
||||||
|
'bilateral_d': 9,
|
||||||
|
'bilateral_sigma_color': 75,
|
||||||
|
'bilateral_sigma_space': 75,
|
||||||
|
'clahe_clip_limit': 2.0,
|
||||||
|
'clahe_grid_size': (8, 8),
|
||||||
|
'gamma': 1.2,
|
||||||
|
'denoise_h': 10,
|
||||||
|
'morph_kernel_size': 2,
|
||||||
|
'unsharp_amount': 1.5,
|
||||||
|
'unsharp_radius': 1.0,
|
||||||
|
'unsharp_threshold': 0,
|
||||||
|
}
|
||||||
|
|
||||||
|
def update_parameter(self, param_name, value):
|
||||||
|
"""Update a single parameter and refresh the cleaner."""
|
||||||
|
if param_name in self.current_params:
|
||||||
|
# Handle special cases
|
||||||
|
if param_name == 'clahe_grid_size':
|
||||||
|
self.current_params[param_name] = (int(value), int(value))
|
||||||
|
else:
|
||||||
|
self.current_params[param_name] = value
|
||||||
|
|
||||||
|
# Update cleaner with new parameters
|
||||||
|
self.cleaner = NewspaperImageCleaner(self.current_params)
|
||||||
|
print(f"Updated {param_name} = {value}")
|
||||||
|
|
||||||
|
def process_with_current_params(self, steps=None):
|
||||||
|
"""Process the sample image with current parameters."""
|
||||||
|
if steps is None:
|
||||||
|
steps = ['denoise', 'contrast', 'background', 'sharpen']
|
||||||
|
|
||||||
|
image = self.original.copy()
|
||||||
|
|
||||||
|
# Apply processing steps
|
||||||
|
if 'denoise' in steps:
|
||||||
|
image = self.cleaner.reduce_noise(image)
|
||||||
|
|
||||||
|
if 'contrast' in steps:
|
||||||
|
image = self.cleaner.enhance_contrast(image)
|
||||||
|
|
||||||
|
if 'background' in steps:
|
||||||
|
image = self.cleaner.clean_background(image)
|
||||||
|
|
||||||
|
if 'sharpen' in steps:
|
||||||
|
image = self.cleaner.sharpen_image(image)
|
||||||
|
|
||||||
|
return image
|
||||||
|
|
||||||
|
def create_comparison(self, steps=None):
|
||||||
|
"""Create side-by-side comparison with current parameters."""
|
||||||
|
processed = self.process_with_current_params(steps)
|
||||||
|
|
||||||
|
# Create side-by-side comparison
|
||||||
|
height = max(self.original.shape[0], processed.shape[0])
|
||||||
|
comparison = np.hstack([
|
||||||
|
cv2.resize(self.original, (self.original.shape[1], height)),
|
||||||
|
cv2.resize(processed, (processed.shape[1], height))
|
||||||
|
])
|
||||||
|
|
||||||
|
return comparison
|
||||||
|
|
||||||
|
def save_comparison(self, output_path, steps=None):
|
||||||
|
"""Save comparison image to file."""
|
||||||
|
comparison = self.create_comparison(steps)
|
||||||
|
cv2.imwrite(str(output_path), comparison)
|
||||||
|
print(f"Comparison saved to: {output_path}")
|
||||||
|
|
||||||
|
def save_config(self, config_path):
|
||||||
|
"""Save current parameters to JSON config file."""
|
||||||
|
# Convert tuple to list for JSON serialization
|
||||||
|
config_to_save = self.current_params.copy()
|
||||||
|
if 'clahe_grid_size' in config_to_save:
|
||||||
|
config_to_save['clahe_grid_size'] = list(config_to_save['clahe_grid_size'])
|
||||||
|
|
||||||
|
with open(config_path, 'w') as f:
|
||||||
|
json.dump(config_to_save, f, indent=4)
|
||||||
|
print(f"Configuration saved to: {config_path}")
|
||||||
|
|
||||||
|
def load_config(self, config_path):
|
||||||
|
"""Load parameters from JSON config file."""
|
||||||
|
with open(config_path, 'r') as f:
|
||||||
|
loaded_params = json.load(f)
|
||||||
|
|
||||||
|
# Convert list back to tuple if needed
|
||||||
|
if 'clahe_grid_size' in loaded_params:
|
||||||
|
loaded_params['clahe_grid_size'] = tuple(loaded_params['clahe_grid_size'])
|
||||||
|
|
||||||
|
self.current_params.update(loaded_params)
|
||||||
|
self.cleaner = NewspaperImageCleaner(self.current_params)
|
||||||
|
print(f"Configuration loaded from: {config_path}")
|
||||||
|
|
||||||
|
def interactive_tune(self):
|
||||||
|
"""Start interactive tuning session."""
|
||||||
|
print("\n" + "="*60)
|
||||||
|
print("INTERACTIVE PARAMETER TUNING")
|
||||||
|
print("="*60)
|
||||||
|
print("Commands:")
|
||||||
|
print(" set <param> <value> - Set parameter value")
|
||||||
|
print(" show - Show current parameters")
|
||||||
|
print(" test [steps] - Test current parameters")
|
||||||
|
print(" save <file> - Save configuration to file")
|
||||||
|
print(" load <file> - Load configuration from file")
|
||||||
|
print(" compare [file] - Save comparison image")
|
||||||
|
print(" presets - Show parameter presets")
|
||||||
|
print(" help - Show this help")
|
||||||
|
print(" quit - Exit tuning")
|
||||||
|
print("\nParameters you can adjust:")
|
||||||
|
for param in self.current_params:
|
||||||
|
print(f" {param}")
|
||||||
|
|
||||||
|
while True:
|
||||||
|
try:
|
||||||
|
command = input("\ntuner> ").strip().split()
|
||||||
|
if not command:
|
||||||
|
continue
|
||||||
|
|
||||||
|
cmd = command[0].lower()
|
||||||
|
|
||||||
|
if cmd == 'quit' or cmd == 'exit':
|
||||||
|
break
|
||||||
|
|
||||||
|
elif cmd == 'show':
|
||||||
|
self._show_parameters()
|
||||||
|
|
||||||
|
elif cmd == 'set' and len(command) >= 3:
|
||||||
|
param = command[1]
|
||||||
|
try:
|
||||||
|
value = float(command[2]) if '.' in command[2] else int(command[2])
|
||||||
|
except ValueError:
|
||||||
|
value = command[2]
|
||||||
|
self.update_parameter(param, value)
|
||||||
|
|
||||||
|
elif cmd == 'test':
|
||||||
|
steps = command[1:] if len(command) > 1 else None
|
||||||
|
print("Processing with current parameters...")
|
||||||
|
processed = self.process_with_current_params(steps)
|
||||||
|
print(f"Processed image shape: {processed.shape}")
|
||||||
|
|
||||||
|
elif cmd == 'save' and len(command) > 1:
|
||||||
|
self.save_config(command[1])
|
||||||
|
|
||||||
|
elif cmd == 'load' and len(command) > 1:
|
||||||
|
self.load_config(command[1])
|
||||||
|
|
||||||
|
elif cmd == 'compare':
|
||||||
|
output = command[1] if len(command) > 1 else "tuning_comparison.jpg"
|
||||||
|
self.save_comparison(output)
|
||||||
|
|
||||||
|
elif cmd == 'presets':
|
||||||
|
self._show_presets()
|
||||||
|
|
||||||
|
elif cmd == 'help':
|
||||||
|
self._show_help()
|
||||||
|
|
||||||
|
else:
|
||||||
|
print("Unknown command. Type 'help' for available commands.")
|
||||||
|
|
||||||
|
except KeyboardInterrupt:
|
||||||
|
print("\nExiting tuner...")
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error: {str(e)}")
|
||||||
|
|
||||||
|
def _show_parameters(self):
|
||||||
|
"""Display current parameter values."""
|
||||||
|
print("\nCurrent Parameters:")
|
||||||
|
print("-" * 30)
|
||||||
|
for param, value in self.current_params.items():
|
||||||
|
print(f" {param:<20} = {value}")
|
||||||
|
|
||||||
|
def _show_presets(self):
|
||||||
|
"""Show preset configurations for different image types."""
|
||||||
|
presets = {
|
||||||
|
"light_cleaning": {
|
||||||
|
"bilateral_d": 5,
|
||||||
|
"denoise_h": 5,
|
||||||
|
"clahe_clip_limit": 1.5,
|
||||||
|
"gamma": 1.1,
|
||||||
|
"unsharp_amount": 1.2
|
||||||
|
},
|
||||||
|
"heavy_cleaning": {
|
||||||
|
"bilateral_d": 15,
|
||||||
|
"denoise_h": 15,
|
||||||
|
"clahe_clip_limit": 3.0,
|
||||||
|
"gamma": 1.3,
|
||||||
|
"unsharp_amount": 2.0
|
||||||
|
},
|
||||||
|
"high_contrast": {
|
||||||
|
"clahe_clip_limit": 4.0,
|
||||||
|
"gamma": 1.4,
|
||||||
|
"unsharp_amount": 2.5
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
print("\nAvailable Presets:")
|
||||||
|
print("-" * 30)
|
||||||
|
for name, params in presets.items():
|
||||||
|
print(f"{name}:")
|
||||||
|
for param, value in params.items():
|
||||||
|
print(f" {param} = {value}")
|
||||||
|
print()
|
||||||
|
|
||||||
|
def _show_help(self):
|
||||||
|
"""Show detailed help information."""
|
||||||
|
help_text = """
|
||||||
|
Parameter Descriptions:
|
||||||
|
-----------------------
|
||||||
|
bilateral_d : Neighborhood diameter for bilateral filtering (5-15)
|
||||||
|
bilateral_sigma_color: Filter sigma in color space (50-150)
|
||||||
|
bilateral_sigma_space: Filter sigma in coordinate space (50-150)
|
||||||
|
clahe_clip_limit : Contrast limit for CLAHE (1.0-4.0)
|
||||||
|
clahe_grid_size : CLAHE tile grid size (4-16)
|
||||||
|
gamma : Gamma correction value (0.8-2.0)
|
||||||
|
denoise_h : Denoising filter strength (5-20)
|
||||||
|
morph_kernel_size : Morphological operation kernel size (1-5)
|
||||||
|
unsharp_amount : Unsharp masking amount (0.5-3.0)
|
||||||
|
unsharp_radius : Unsharp masking radius (0.5-2.0)
|
||||||
|
unsharp_threshold : Unsharp masking threshold (0-10)
|
||||||
|
|
||||||
|
Tips:
|
||||||
|
- Start with small adjustments (±20% of current value)
|
||||||
|
- Test frequently with 'compare' command
|
||||||
|
- Save working configurations before major changes
|
||||||
|
- Use 'test denoise' to test individual steps
|
||||||
|
"""
|
||||||
|
print(help_text)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Main function for command line usage."""
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser(description="Interactive parameter tuning for newspaper image cleaning")
|
||||||
|
parser.add_argument("image", help="Sample image path for tuning")
|
||||||
|
parser.add_argument("-c", "--config", help="Load initial config from file")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
try:
|
||||||
|
tuner = ParameterTuner(args.image)
|
||||||
|
|
||||||
|
if args.config:
|
||||||
|
tuner.load_config(args.config)
|
||||||
|
|
||||||
|
tuner.interactive_tune()
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error: {str(e)}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
170
scripts/ex/demo.py
Executable file
170
scripts/ex/demo.py
Executable file
@@ -0,0 +1,170 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Demo Script for Newspaper Image Cleaning Pipeline
|
||||||
|
|
||||||
|
This script demonstrates the cleaning pipeline on the sample images
|
||||||
|
and shows the available functionality.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
# Add current directory to Python path
|
||||||
|
sys.path.append(str(Path(__file__).parent))
|
||||||
|
|
||||||
|
try:
|
||||||
|
from image_cleaner import NewspaperImageCleaner, create_comparison_image
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
print("✓ All required libraries imported successfully")
|
||||||
|
except ImportError as e:
|
||||||
|
print(f"✗ Import error: {e}")
|
||||||
|
print("Please install required packages: pip install -r requirements.txt")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
|
||||||
|
def demo_single_image(image_path):
|
||||||
|
"""Demonstrate processing a single image."""
|
||||||
|
print(f"\n=== Processing Single Image: {image_path} ===")
|
||||||
|
|
||||||
|
if not os.path.exists(image_path):
|
||||||
|
print(f"Image not found: {image_path}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Initialize cleaner
|
||||||
|
cleaner = NewspaperImageCleaner()
|
||||||
|
|
||||||
|
# Process image
|
||||||
|
output_path = f"demo_cleaned_{Path(image_path).name}"
|
||||||
|
processed, original = cleaner.process_image(image_path, output_path)
|
||||||
|
|
||||||
|
# Create comparison
|
||||||
|
comparison_path = f"demo_comparison_{Path(image_path).name}"
|
||||||
|
create_comparison_image(original, processed, comparison_path)
|
||||||
|
|
||||||
|
print(f"✓ Processed image saved: {output_path}")
|
||||||
|
print(f"✓ Comparison saved: {comparison_path}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"✗ Error processing {image_path}: {str(e)}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def demo_step_by_step(image_path):
|
||||||
|
"""Demonstrate individual processing steps."""
|
||||||
|
print(f"\n=== Step-by-Step Processing: {image_path} ===")
|
||||||
|
|
||||||
|
if not os.path.exists(image_path):
|
||||||
|
print(f"Image not found: {image_path}")
|
||||||
|
return
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Load image
|
||||||
|
original = cv2.imread(image_path)
|
||||||
|
if original is None:
|
||||||
|
print(f"Could not load image: {image_path}")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Resize if too large for demo
|
||||||
|
height, width = original.shape[:2]
|
||||||
|
if height > 1000 or width > 1000:
|
||||||
|
scale = min(1000/height, 1000/width)
|
||||||
|
new_width = int(width * scale)
|
||||||
|
new_height = int(height * scale)
|
||||||
|
original = cv2.resize(original, (new_width, new_height))
|
||||||
|
print(f"Resized to {new_width}x{new_height} for demo")
|
||||||
|
|
||||||
|
cleaner = NewspaperImageCleaner()
|
||||||
|
|
||||||
|
# Process step by step
|
||||||
|
steps = [
|
||||||
|
('original', original),
|
||||||
|
('denoised', cleaner.reduce_noise(original.copy())),
|
||||||
|
('contrast_enhanced', cleaner.enhance_contrast(original.copy())),
|
||||||
|
('background_cleaned', cleaner.clean_background(original.copy())),
|
||||||
|
('sharpened', cleaner.sharpen_image(original.copy()))
|
||||||
|
]
|
||||||
|
|
||||||
|
# Save each step
|
||||||
|
for step_name, image in steps:
|
||||||
|
output_path = f"demo_step_{step_name}_{Path(image_path).name}"
|
||||||
|
cv2.imwrite(output_path, image)
|
||||||
|
print(f"✓ Saved {step_name}: {output_path}")
|
||||||
|
|
||||||
|
print("✓ Individual processing steps completed")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"✗ Error in step-by-step processing: {str(e)}")
|
||||||
|
|
||||||
|
|
||||||
|
def show_image_info():
|
||||||
|
"""Show information about available images."""
|
||||||
|
print("\n=== Available Sample Images ===")
|
||||||
|
|
||||||
|
image_files = []
|
||||||
|
for ext in ['*.jpg', '*.jpeg', '*.JPG', '*.JPEG']:
|
||||||
|
image_files.extend(Path('.').glob(ext))
|
||||||
|
|
||||||
|
if not image_files:
|
||||||
|
print("No image files found in current directory")
|
||||||
|
return []
|
||||||
|
|
||||||
|
for img_file in image_files:
|
||||||
|
try:
|
||||||
|
# Load image to get dimensions
|
||||||
|
img = cv2.imread(str(img_file))
|
||||||
|
if img is not None:
|
||||||
|
height, width = img.shape[:2]
|
||||||
|
file_size = img_file.stat().st_size / (1024*1024) # MB
|
||||||
|
print(f" {img_file.name}: {width}x{height} pixels, {file_size:.1f}MB")
|
||||||
|
else:
|
||||||
|
print(f" {img_file.name}: Could not load")
|
||||||
|
except Exception as e:
|
||||||
|
print(f" {img_file.name}: Error - {str(e)}")
|
||||||
|
|
||||||
|
return image_files
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Main demo function."""
|
||||||
|
print("Historical Newspaper Image Cleaning Pipeline - Demo")
|
||||||
|
print("=" * 55)
|
||||||
|
|
||||||
|
# Show available images
|
||||||
|
image_files = show_image_info()
|
||||||
|
|
||||||
|
if not image_files:
|
||||||
|
print("\nNo images found. Please add some image files to test.")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Select first image for demo
|
||||||
|
sample_image = image_files[0]
|
||||||
|
print(f"\nUsing sample image: {sample_image.name}")
|
||||||
|
|
||||||
|
# Demo single image processing
|
||||||
|
success = demo_single_image(str(sample_image))
|
||||||
|
|
||||||
|
if success:
|
||||||
|
# Demo step-by-step processing
|
||||||
|
demo_step_by_step(str(sample_image))
|
||||||
|
|
||||||
|
print(f"\n=== Demo Complete ===")
|
||||||
|
print("Generated files:")
|
||||||
|
print(" - demo_cleaned_*.jpg (cleaned image)")
|
||||||
|
print(" - demo_comparison_*.jpg (before/after comparison)")
|
||||||
|
print(" - demo_step_*.jpg (individual processing steps)")
|
||||||
|
|
||||||
|
print(f"\nNext steps:")
|
||||||
|
print(f" - Try: python config_tuner.py {sample_image.name}")
|
||||||
|
print(f" - Try: python batch_process.py")
|
||||||
|
print(f" - Adjust parameters in config.json for better results")
|
||||||
|
|
||||||
|
else:
|
||||||
|
print("\nDemo failed. Please check your Python environment and dependencies.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
310
scripts/ex/image_cleaner.py
Normal file
310
scripts/ex/image_cleaner.py
Normal file
@@ -0,0 +1,310 @@
|
|||||||
|
"""
|
||||||
|
Historical Newspaper Image Cleaning Pipeline
|
||||||
|
|
||||||
|
This module provides functions to clean and enhance scanned historical newspaper images
|
||||||
|
by reducing noise, improving contrast, and sharpening text for better readability.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
from PIL import Image, ImageEnhance
|
||||||
|
import os
|
||||||
|
import argparse
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
class NewspaperImageCleaner:
|
||||||
|
"""
|
||||||
|
Image processing pipeline specifically designed for historical newspaper scans.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, config=None):
|
||||||
|
"""Initialize with default or custom configuration."""
|
||||||
|
self.config = config or self._default_config()
|
||||||
|
|
||||||
|
def _default_config(self):
|
||||||
|
"""Default processing parameters optimized for newspaper scans."""
|
||||||
|
return {
|
||||||
|
'bilateral_d': 9, # Neighborhood diameter for bilateral filter
|
||||||
|
'bilateral_sigma_color': 75, # Filter sigma in color space
|
||||||
|
'bilateral_sigma_space': 75, # Filter sigma in coordinate space
|
||||||
|
'clahe_clip_limit': 2.0, # Contrast limiting for CLAHE
|
||||||
|
'clahe_grid_size': (8, 8), # CLAHE grid size
|
||||||
|
'gamma': 1.2, # Gamma correction value
|
||||||
|
'denoise_h': 10, # Denoising filter strength
|
||||||
|
'morph_kernel_size': 2, # Morphological operation kernel size
|
||||||
|
'unsharp_amount': 1.5, # Unsharp masking amount
|
||||||
|
'unsharp_radius': 1.0, # Unsharp masking radius
|
||||||
|
'unsharp_threshold': 0, # Unsharp masking threshold
|
||||||
|
}
|
||||||
|
|
||||||
|
def reduce_noise(self, image):
|
||||||
|
"""
|
||||||
|
Apply noise reduction techniques to remove speckles and JPEG artifacts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
image: Input BGR image
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Denoised image
|
||||||
|
"""
|
||||||
|
# Bilateral filter - preserves edges while reducing noise
|
||||||
|
bilateral = cv2.bilateralFilter(
|
||||||
|
image,
|
||||||
|
self.config['bilateral_d'],
|
||||||
|
self.config['bilateral_sigma_color'],
|
||||||
|
self.config['bilateral_sigma_space']
|
||||||
|
)
|
||||||
|
|
||||||
|
# Non-local means denoising for better noise reduction
|
||||||
|
if len(image.shape) == 3:
|
||||||
|
# Color image
|
||||||
|
denoised = cv2.fastNlMeansDenoisingColored(
|
||||||
|
bilateral, None,
|
||||||
|
self.config['denoise_h'],
|
||||||
|
self.config['denoise_h'],
|
||||||
|
7, 21
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Grayscale image
|
||||||
|
denoised = cv2.fastNlMeansDenoising(
|
||||||
|
bilateral, None,
|
||||||
|
self.config['denoise_h'],
|
||||||
|
7, 21
|
||||||
|
)
|
||||||
|
|
||||||
|
return denoised
|
||||||
|
|
||||||
|
def enhance_contrast(self, image):
|
||||||
|
"""
|
||||||
|
Improve image contrast using CLAHE and gamma correction.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
image: Input BGR image
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Contrast-enhanced image
|
||||||
|
"""
|
||||||
|
# Convert to LAB color space for better contrast processing
|
||||||
|
if len(image.shape) == 3:
|
||||||
|
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
|
||||||
|
l_channel, a_channel, b_channel = cv2.split(lab)
|
||||||
|
else:
|
||||||
|
l_channel = image
|
||||||
|
|
||||||
|
# Apply CLAHE (Contrast Limited Adaptive Histogram Equalization)
|
||||||
|
clahe = cv2.createCLAHE(
|
||||||
|
clipLimit=self.config['clahe_clip_limit'],
|
||||||
|
tileGridSize=self.config['clahe_grid_size']
|
||||||
|
)
|
||||||
|
l_channel = clahe.apply(l_channel)
|
||||||
|
|
||||||
|
# Reconstruct image
|
||||||
|
if len(image.shape) == 3:
|
||||||
|
enhanced = cv2.merge([l_channel, a_channel, b_channel])
|
||||||
|
enhanced = cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)
|
||||||
|
else:
|
||||||
|
enhanced = l_channel
|
||||||
|
|
||||||
|
# Apply gamma correction
|
||||||
|
gamma = self.config['gamma']
|
||||||
|
inv_gamma = 1.0 / gamma
|
||||||
|
table = np.array([((i / 255.0) ** inv_gamma) * 255
|
||||||
|
for i in np.arange(0, 256)]).astype("uint8")
|
||||||
|
enhanced = cv2.LUT(enhanced, table)
|
||||||
|
|
||||||
|
return enhanced
|
||||||
|
|
||||||
|
def clean_background(self, image):
|
||||||
|
"""
|
||||||
|
Remove small artifacts and clean background noise.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
image: Input image
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Background-cleaned image
|
||||||
|
"""
|
||||||
|
# Convert to grayscale for morphological operations
|
||||||
|
if len(image.shape) == 3:
|
||||||
|
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
|
||||||
|
else:
|
||||||
|
gray = image
|
||||||
|
|
||||||
|
# Morphological opening to remove small noise
|
||||||
|
kernel = np.ones((self.config['morph_kernel_size'],
|
||||||
|
self.config['morph_kernel_size']), np.uint8)
|
||||||
|
|
||||||
|
# Opening (erosion followed by dilation)
|
||||||
|
opened = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
|
||||||
|
|
||||||
|
# If original was color, apply the mask
|
||||||
|
if len(image.shape) == 3:
|
||||||
|
# Create a mask and apply it to the original color image
|
||||||
|
mask = opened > 0
|
||||||
|
result = image.copy()
|
||||||
|
result[~mask] = [255, 255, 255] # Set background to white
|
||||||
|
return result
|
||||||
|
else:
|
||||||
|
return opened
|
||||||
|
|
||||||
|
def sharpen_image(self, image):
|
||||||
|
"""
|
||||||
|
Apply unsharp masking to enhance text clarity.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
image: Input image
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Sharpened image
|
||||||
|
"""
|
||||||
|
# Convert to float for processing
|
||||||
|
float_img = image.astype(np.float32) / 255.0
|
||||||
|
|
||||||
|
# Create Gaussian blur
|
||||||
|
radius = self.config['unsharp_radius']
|
||||||
|
sigma = radius / 3.0
|
||||||
|
blurred = cv2.GaussianBlur(float_img, (0, 0), sigma)
|
||||||
|
|
||||||
|
# Unsharp masking
|
||||||
|
amount = self.config['unsharp_amount']
|
||||||
|
sharpened = float_img + amount * (float_img - blurred)
|
||||||
|
|
||||||
|
# Threshold and clamp
|
||||||
|
threshold = self.config['unsharp_threshold'] / 255.0
|
||||||
|
sharpened = np.where(np.abs(float_img - blurred) < threshold,
|
||||||
|
float_img, sharpened)
|
||||||
|
sharpened = np.clip(sharpened, 0.0, 1.0)
|
||||||
|
|
||||||
|
return (sharpened * 255).astype(np.uint8)
|
||||||
|
|
||||||
|
def process_image(self, image_path, output_path=None, steps=None):
|
||||||
|
"""
|
||||||
|
Process a single image through the complete pipeline.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
image_path: Path to input image
|
||||||
|
output_path: Path for output image (optional)
|
||||||
|
steps: List of processing steps to apply (optional)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Processed image array
|
||||||
|
"""
|
||||||
|
if steps is None:
|
||||||
|
steps = ['denoise', 'contrast', 'background', 'sharpen']
|
||||||
|
|
||||||
|
# Load image
|
||||||
|
image = cv2.imread(str(image_path))
|
||||||
|
if image is None:
|
||||||
|
raise ValueError(f"Could not load image: {image_path}")
|
||||||
|
|
||||||
|
original = image.copy()
|
||||||
|
|
||||||
|
# Apply processing steps
|
||||||
|
if 'denoise' in steps:
|
||||||
|
print(f"Applying noise reduction...")
|
||||||
|
image = self.reduce_noise(image)
|
||||||
|
|
||||||
|
if 'contrast' in steps:
|
||||||
|
print(f"Enhancing contrast...")
|
||||||
|
image = self.enhance_contrast(image)
|
||||||
|
|
||||||
|
if 'background' in steps:
|
||||||
|
print(f"Cleaning background...")
|
||||||
|
image = self.clean_background(image)
|
||||||
|
|
||||||
|
if 'sharpen' in steps:
|
||||||
|
print(f"Sharpening image...")
|
||||||
|
image = self.sharpen_image(image)
|
||||||
|
|
||||||
|
# Save output if path provided
|
||||||
|
if output_path:
|
||||||
|
cv2.imwrite(str(output_path), image)
|
||||||
|
print(f"Processed image saved to: {output_path}")
|
||||||
|
|
||||||
|
return image, original
|
||||||
|
|
||||||
|
def process_directory(self, input_dir, output_dir, extensions=None):
|
||||||
|
"""
|
||||||
|
Process all images in a directory.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
input_dir: Input directory path
|
||||||
|
output_dir: Output directory path
|
||||||
|
extensions: List of file extensions to process
|
||||||
|
"""
|
||||||
|
if extensions is None:
|
||||||
|
extensions = ['.jpg', '.jpeg', '.png', '.tif', '.tiff']
|
||||||
|
|
||||||
|
input_path = Path(input_dir)
|
||||||
|
output_path = Path(output_dir)
|
||||||
|
output_path.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
for file_path in input_path.iterdir():
|
||||||
|
if file_path.suffix.lower() in extensions:
|
||||||
|
print(f"\nProcessing: {file_path.name}")
|
||||||
|
output_file = output_path / f"cleaned_{file_path.name}"
|
||||||
|
|
||||||
|
try:
|
||||||
|
self.process_image(file_path, output_file)
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error processing {file_path.name}: {str(e)}")
|
||||||
|
|
||||||
|
print(f"\nBatch processing completed. Results in: {output_dir}")
|
||||||
|
|
||||||
|
|
||||||
|
def create_comparison_image(original, processed, output_path):
|
||||||
|
"""
|
||||||
|
Create a side-by-side comparison image.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
original: Original image array
|
||||||
|
processed: Processed image array
|
||||||
|
output_path: Path to save comparison
|
||||||
|
"""
|
||||||
|
# Resize images to same height if needed
|
||||||
|
h1, w1 = original.shape[:2]
|
||||||
|
h2, w2 = processed.shape[:2]
|
||||||
|
|
||||||
|
if h1 != h2:
|
||||||
|
height = min(h1, h2)
|
||||||
|
original = cv2.resize(original, (int(w1 * height / h1), height))
|
||||||
|
processed = cv2.resize(processed, (int(w2 * height / h2), height))
|
||||||
|
|
||||||
|
# Create side-by-side comparison
|
||||||
|
comparison = np.hstack([original, processed])
|
||||||
|
cv2.imwrite(str(output_path), comparison)
|
||||||
|
print(f"Comparison saved to: {output_path}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser(description="Clean historical newspaper images")
|
||||||
|
parser.add_argument("input", help="Input image or directory path")
|
||||||
|
parser.add_argument("-o", "--output", help="Output path")
|
||||||
|
parser.add_argument("-d", "--directory", action="store_true",
|
||||||
|
help="Process entire directory")
|
||||||
|
parser.add_argument("-c", "--comparison", action="store_true",
|
||||||
|
help="Create before/after comparison")
|
||||||
|
parser.add_argument("--steps", nargs="+",
|
||||||
|
choices=['denoise', 'contrast', 'background', 'sharpen'],
|
||||||
|
default=['denoise', 'contrast', 'background', 'sharpen'],
|
||||||
|
help="Processing steps to apply")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
cleaner = NewspaperImageCleaner()
|
||||||
|
|
||||||
|
if args.directory:
|
||||||
|
output_dir = args.output or "cleaned_images"
|
||||||
|
cleaner.process_directory(args.input, output_dir)
|
||||||
|
else:
|
||||||
|
output_path = args.output
|
||||||
|
if not output_path:
|
||||||
|
input_path = Path(args.input)
|
||||||
|
output_path = input_path.parent / f"cleaned_{input_path.name}"
|
||||||
|
|
||||||
|
processed, original = cleaner.process_image(args.input, output_path, args.steps)
|
||||||
|
|
||||||
|
if args.comparison:
|
||||||
|
comparison_path = Path(output_path).parent / f"comparison_{Path(args.input).name}"
|
||||||
|
create_comparison_image(original, processed, comparison_path)
|
||||||
5
scripts/ex/requirements.txt
Normal file
5
scripts/ex/requirements.txt
Normal file
@@ -0,0 +1,5 @@
|
|||||||
|
opencv-python==4.10.0.84
|
||||||
|
scikit-image==0.24.0
|
||||||
|
Pillow==10.4.0
|
||||||
|
numpy==2.1.1
|
||||||
|
matplotlib==3.9.2
|
||||||
42
scripts/ex/run.sh
Executable file
42
scripts/ex/run.sh
Executable file
@@ -0,0 +1,42 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
# Convenience script to run the image cleaning pipeline with virtual environment
|
||||||
|
|
||||||
|
# Activate virtual environment
|
||||||
|
source venv/bin/activate
|
||||||
|
|
||||||
|
# Check if any arguments provided
|
||||||
|
if [ $# -eq 0 ]; then
|
||||||
|
echo "Historical Newspaper Image Cleaning Pipeline"
|
||||||
|
echo "Usage examples:"
|
||||||
|
echo " $0 demo # Run demo"
|
||||||
|
echo " $0 clean image.jpg # Clean single image"
|
||||||
|
echo " $0 batch # Process all images in directory"
|
||||||
|
echo " $0 tune image.jpg # Interactive parameter tuning"
|
||||||
|
echo " $0 python script.py [args] # Run custom Python script"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
case "$1" in
|
||||||
|
"demo")
|
||||||
|
python demo.py
|
||||||
|
;;
|
||||||
|
"clean")
|
||||||
|
shift
|
||||||
|
python image_cleaner.py "$@"
|
||||||
|
;;
|
||||||
|
"batch")
|
||||||
|
shift
|
||||||
|
python batch_process.py "$@"
|
||||||
|
;;
|
||||||
|
"tune")
|
||||||
|
shift
|
||||||
|
python config_tuner.py "$@"
|
||||||
|
;;
|
||||||
|
"python")
|
||||||
|
shift
|
||||||
|
python "$@"
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
python "$@"
|
||||||
|
;;
|
||||||
|
esac
|
||||||
Reference in New Issue
Block a user