image cleaner

2025-12-15 19:45:30 +00:00 · 2025-09-15 18:32:13 +02:00
parent bcf11e4e11
commit 9960dc5e38
11 changed files with 1247 additions and 0 deletions
--- a/scripts/ex/.gitignore
+++ b/scripts/ex/.gitignore
@@ -0,0 +1,56 @@
 # Python Virtual Environment
 venv/
 env/
 .env
 # Python cache files
 __pycache__/
 *.pyc
 *.pyo
 *.pyd
 .Python
 # Generated/processed images
 demo_*
 cleaned_*
 comparison_*
 *_cleaned_*
 *_comparison_*
 # Processing outputs
 cleaned/
 output/
 results/
 # Configuration files (may contain sensitive settings)
 config.json
 *.config.json
 custom_*.json
 # Temporary files
 *.tmp
 *.temp
 .DS_Store
 Thumbs.db
 # IDE files
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 # Logs
 *.log
 logs/
 # Test outputs
 test_*
 sample_output/
 # Large source images (uncomment if you don't want to track originals)
 # *.jpg
 # *.jpeg
 # *.png
 # *.tif
 # *.tiff
--- a/scripts/ex/1771-09b-02.jpg
+++ b/scripts/ex/1771-09b-02.jpg
--- a/scripts/ex/1772-07b-02.jpg
+++ b/scripts/ex/1772-07b-02.jpg
--- a/scripts/ex/1772-34-136.jpg
+++ b/scripts/ex/1772-34-136.jpg
--- a/scripts/ex/README.md
+++ b/scripts/ex/README.md
@@ -0,0 +1,211 @@
 # Historical Newspaper Image Cleaning Pipeline
 This pipeline automatically cleans and enhances scanned historical newspaper images by reducing noise, improving contrast, and sharpening text for better readability.
 ## Features
 - **Noise Reduction**: Bilateral filtering and non-local means denoising
 - **Contrast Enhancement**: CLAHE and gamma correction
 - **Background Cleaning**: Morphological operations to remove artifacts
 - **Text Sharpening**: Unsharp masking for improved readability
 - **Batch Processing**: Process entire directories efficiently
 - **Interactive Tuning**: Find optimal parameters for your specific images
 - **Before/After Comparisons**: Visual validation of improvements
 ## Quick Start
 ### 1. Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```
 ### 2. Process Single Image
 ```bash
 python image_cleaner.py input_image.jpg -o cleaned_image.jpg --comparison
 ```
 ### 3. Batch Process Directory
 ```bash
 python batch_process.py -i newspaper_scans -o cleaned_images
 ```
 ### 4. Interactive Parameter Tuning
 ```bash
 python config_tuner.py sample_image.jpg
 ```
 ## Usage Examples
 ### Basic Image Cleaning
 ```bash
 # Clean single image with default settings
 python image_cleaner.py 1771-09b-02.jpg
 # Clean with specific processing steps
 python image_cleaner.py 1771-09b-02.jpg --steps denoise contrast sharpen
 # Create before/after comparison
 python image_cleaner.py 1771-09b-02.jpg -c
 ```
 ### Batch Processing
 ```bash
 # Process all JPG files in current directory
 python batch_process.py
 # Process specific directory with custom output
 python batch_process.py -i scans/ -o cleaned/
 # Use custom configuration
 python batch_process.py --config custom_config.json
 # Skip comparison images for faster processing
 python batch_process.py --no-comparisons
 ```
 ### Parameter Tuning
 ```bash
 # Start interactive tuning session
 python config_tuner.py sample_image.jpg
 # Load existing config for fine-tuning
 python config_tuner.py sample_image.jpg -c existing_config.json
 ```
 ## Configuration
 ### Default Parameters
 The pipeline uses these default parameters optimized for newspaper scans:
 ```json
 {
    "bilateral_d": 9,
    "bilateral_sigma_color": 75,
    "bilateral_sigma_space": 75,
    "clahe_clip_limit": 2.0,
    "clahe_grid_size": [8, 8],
    "gamma": 1.2,
    "denoise_h": 10,
    "morph_kernel_size": 2,
    "unsharp_amount": 1.5,
    "unsharp_radius": 1.0,
    "unsharp_threshold": 0
 }
 ```
 ### Parameter Descriptions
 - **bilateral_d**: Neighborhood diameter for bilateral filtering (5-15)
 - **bilateral_sigma_color**: Color space filter strength (50-150)
 - **bilateral_sigma_space**: Coordinate space filter strength (50-150)
 - **clahe_clip_limit**: Contrast limiting for CLAHE (1.0-4.0)
 - **clahe_grid_size**: CLAHE tile grid size [width, height] (4-16)
 - **gamma**: Gamma correction value (0.8-2.0)
 - **denoise_h**: Denoising filter strength (5-20)
 - **morph_kernel_size**: Morphological operation kernel size (1-5)
 - **unsharp_amount**: Unsharp masking strength (0.5-3.0)
 - **unsharp_radius**: Unsharp masking radius (0.5-2.0)
 - **unsharp_threshold**: Unsharp masking threshold (0-10)
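 The ranges above can also be checked mechanically before a long batch run. The following is a minimal sketch, not part of the pipeline: `RECOMMENDED_RANGES` and `out_of_range` are hypothetical names, and the bounds are copied from the list above.

```python
# Recommended ranges copied from the parameter list above.
# RECOMMENDED_RANGES and out_of_range are hypothetical names for this sketch.
RECOMMENDED_RANGES = {
    "bilateral_d": (5, 15),
    "bilateral_sigma_color": (50, 150),
    "bilateral_sigma_space": (50, 150),
    "clahe_clip_limit": (1.0, 4.0),
    "clahe_grid_size": (4, 16),
    "gamma": (0.8, 2.0),
    "denoise_h": (5, 20),
    "morph_kernel_size": (1, 5),
    "unsharp_amount": (0.5, 3.0),
    "unsharp_radius": (0.5, 2.0),
    "unsharp_threshold": (0, 10),
}


def out_of_range(config):
    """Return {param: value} for every setting outside its recommended range."""
    problems = {}
    for name, value in config.items():
        if name not in RECOMMENDED_RANGES:
            continue  # unknown keys are ignored in this sketch
        low, high = RECOMMENDED_RANGES[name]
        # clahe_grid_size is a [width, height] pair, so check each side.
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(not (low <= v <= high) for v in values):
            problems[name] = value
    return problems


if __name__ == "__main__":
    print(out_of_range({"gamma": 2.5, "bilateral_d": 9, "clahe_grid_size": [8, 32]}))
```

 Values outside these ranges are not necessarily wrong, so a check like this is better suited to printing a warning than to rejecting a config.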
 ### Creating Custom Configurations
 1. Generate default config template:
 ```bash
 python batch_process.py --create-config
 ```
 2. Edit `config.json` with your preferred values
 3. Use custom config:
 ```bash
 python batch_process.py --config config.json
 ```
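 A config file does not need to list every parameter if the loader overlays it on the defaults. The sketch below illustrates that idea under stated assumptions: `DEFAULTS` repeats the values from "Default Parameters", and `load_config` is a hypothetical helper, not a function exported by the scripts.

```python
import json
import os
import tempfile

# Defaults copied from the "Default Parameters" section.
DEFAULTS = {
    "bilateral_d": 9,
    "bilateral_sigma_color": 75,
    "bilateral_sigma_space": 75,
    "clahe_clip_limit": 2.0,
    "clahe_grid_size": [8, 8],
    "gamma": 1.2,
    "denoise_h": 10,
    "morph_kernel_size": 2,
    "unsharp_amount": 1.5,
    "unsharp_radius": 1.0,
    "unsharp_threshold": 0,
}


def load_config(path):
    """Overlay the values in a JSON file on the defaults (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        overrides = json.load(f)
    merged = {**DEFAULTS, **overrides}
    # JSON has no tuple type, so the grid size arrives as a list.
    merged["clahe_grid_size"] = tuple(merged["clahe_grid_size"])
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"gamma": 1.4, "clahe_grid_size": [4, 4]}, f)
        config = load_config(path)
        print(config["gamma"], config["denoise_h"], config["clahe_grid_size"])
```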
 ## Processing Pipeline
 The image cleaning pipeline applies these steps in sequence:
 1. **Noise Reduction**
   - Bilateral filtering preserves edges while reducing noise
  - Non-local means denoising averages similar patches across the image to suppress the noise that remains
 2. **Contrast Enhancement**
   - CLAHE improves local contrast adaptively
   - Gamma correction adjusts overall brightness
 3. **Background Cleaning**
   - Morphological operations remove small artifacts
   - Background normalization reduces paper texture
 4. **Sharpening**
   - Unsharp masking enhances text edges
   - Preserves fine details while reducing blur
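 Of the steps above, gamma correction is the easiest to show in isolation. `enhance_contrast` builds a 256-entry lookup table with exponent `1 / gamma`, so values above 1 brighten midtones. The sketch below reproduces that table in plain Python without OpenCV or NumPy; `gamma_table` and `apply_table` are hypothetical names used only here.

```python
def gamma_table(gamma):
    """Build a 256-entry lookup table mapping input level to corrected level."""
    inv_gamma = 1.0 / gamma
    # int() truncates, like a cast to an unsigned 8-bit integer.
    return [int(((level / 255.0) ** inv_gamma) * 255) for level in range(256)]


def apply_table(pixels, table):
    """Map each 8-bit pixel value through the lookup table."""
    return [table[p] for p in pixels]


if __name__ == "__main__":
    table = gamma_table(1.2)  # 1.2 is the pipeline default
    print(apply_table([0, 64, 128, 255], table))
```

 Black and white stay fixed at 0 and 255; only the levels in between move.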
 ## Interactive Tuning Commands
 When using `config_tuner.py`, these commands are available:
 - `set <param> <value>` - Adjust parameter value
 - `show` - Display current parameters
 - `test [steps]` - Process with current settings
 - `compare [filename]` - Save before/after comparison
 - `save <filename>` - Save configuration to file
 - `load <filename>` - Load configuration from file
 - `presets` - Show preset configurations
 - `help` - Show detailed help
 - `quit` - Exit tuning session
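 The `set` command has to turn its text argument into a number before it reaches the cleaner. A minimal sketch of that parsing step follows, assuming an int-then-float fallback; `parse_set_command` is a hypothetical name and not a function in `config_tuner.py`.

```python
def parse_set_command(line):
    """Parse 'set <param> <value>' into (param, value); return None otherwise."""
    parts = line.strip().split()
    if len(parts) < 3 or parts[0].lower() != "set":
        return None
    param, raw = parts[1], parts[2]
    # Try the narrowest numeric type first so "12" stays an int.
    for cast in (int, float):
        try:
            return param, cast(raw)
        except ValueError:
            continue
    return param, raw  # not numeric: hand back the raw string


if __name__ == "__main__":
    print(parse_set_command("set gamma 1.4"))
    print(parse_set_command("set bilateral_d 12"))
```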
 ## Tips for Best Results
 ### For Light Damage/Noise:
 - Reduce `bilateral_d` to 5-7
 - Lower `denoise_h` to 5-8
 - Use `clahe_clip_limit` around 1.5
 ### For Heavy Damage/Artifacts:
 - Increase `bilateral_d` to 12-15
 - Raise `denoise_h` to 15-20
 - Use higher `clahe_clip_limit` (3.0-4.0)
 ### For Faded/Low Contrast Images:
 - Increase `gamma` to 1.3-1.5
 - Raise `clahe_clip_limit` to 3.0+
 - Boost `unsharp_amount` to 2.0+
 ### For Sharp/High Quality Scans:
 - Focus mainly on `denoise` and `sharpen` steps
 - Skip `background` cleaning if unnecessary
 - Use lighter settings to preserve quality
 ## File Structure
 ```
 newspaper_image_cleaner/
 ├── image_cleaner.py      # Core processing module
 ├── batch_process.py      # Batch processing script
 ├── config_tuner.py       # Interactive parameter tuning
 ├── demo.py               # Demo run on the sample images
 ├── requirements.txt      # Python dependencies
 └── README.md            # This documentation
 ```
 ## Troubleshooting
 ### ImportError: No module named 'cv2'
 Install OpenCV: `pip install opencv-python`
 ### Memory Issues with Large Images
 The tuner automatically resizes large images. For batch processing of very large images, consider resizing first.
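 The resize rule the tuner applies (longest side capped at 1500 pixels, aspect ratio preserved) can be computed ahead of time for batch inputs. This is a sketch of the arithmetic only; `fit_within` is a hypothetical helper, and the resize itself would still be done with an image library.

```python
def fit_within(width, height, max_side=1500):
    """Return (width, height) scaled down so neither side exceeds max_side."""
    if width <= max_side and height <= max_side:
        return width, height  # already small enough, leave untouched
    scale = min(max_side / width, max_side / height)
    return int(width * scale), int(height * scale)


if __name__ == "__main__":
    print(fit_within(3000, 2000))  # (1500, 1000)
    print(fit_within(800, 600))    # (800, 600)
```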
 ### Poor Results
 Use the interactive tuner to find optimal parameters for your specific image characteristics.
 ## Performance
 - Single 3000x2000 image: ~3-5 seconds
 - Batch processing depends on image size and quantity
 - Interactive tuning uses smaller images for faster feedback
--- a/scripts/ex/batch_process.py
+++ b/scripts/ex/batch_process.py
@@ -0,0 +1,162 @@
 #!/usr/bin/env python3
 """
 Batch Processing Script for Historical Newspaper Images
 Simple script to process multiple images with the newspaper cleaning pipeline.
 Includes progress tracking and error handling.
 """
 import os
 import sys
 import time
 import json
 from pathlib import Path
 from image_cleaner import NewspaperImageCleaner, create_comparison_image
 def process_batch(input_dir=".", output_dir="cleaned", config_file=None,
                 create_comparisons=True, file_pattern="*.jpg"):
    """
    Process all newspaper images in a directory.
    Args:
        input_dir: Directory containing input images
        output_dir: Directory for cleaned images
        config_file: JSON file with custom parameters
        create_comparisons: Whether to create before/after comparisons
        file_pattern: Glob pattern for files to process
    """
    # Load custom config if provided
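    config = None
    if config_file:
        if os.path.exists(config_file):
            with open(config_file, 'r') as f:
                config = json.load(f)
            print(f"Loaded custom config from {config_file}")
        else:
            # Do not fall back to defaults silently when the path is wrong
            print(f"Config file not found: {config_file}; using default parameters")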
    # Initialize cleaner
    cleaner = NewspaperImageCleaner(config)
    # Setup paths
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    if create_comparisons:
        comparison_path = output_path / "comparisons"
        comparison_path.mkdir(exist_ok=True)
    # Find all image files; the set removes duplicates that appear when
    # several patterns match the same file (e.g. on case-insensitive filesystems)
    patterns = [file_pattern, "*.jpeg", "*.JPG", "*.JPEG"]
    image_files = sorted({p for pattern in patterns for p in input_path.glob(pattern)})
    if not image_files:
        print(f"No image files found in {input_dir}")
        return
    print(f"Found {len(image_files)} images to process")
    print(f"Output directory: {output_path.absolute()}")
    # Process each image
    success_count = 0
    error_count = 0
    start_time = time.time()
    for i, img_file in enumerate(image_files, 1):
        print(f"\n[{i}/{len(image_files)}] Processing: {img_file.name}")
        try:
            # Process image
            output_file = output_path / f"cleaned_{img_file.name}"
            processed, original = cleaner.process_image(img_file, output_file)
            # Create comparison if requested
            if create_comparisons:
                comp_file = comparison_path / f"comparison_{img_file.name}"
                create_comparison_image(original, processed, comp_file)
            success_count += 1
            print(f"✓ Completed: {img_file.name}")
        except Exception as e:
            error_count += 1
            print(f"✗ Error processing {img_file.name}: {str(e)}")
    # Summary
    elapsed_time = time.time() - start_time
    print("\n" + "=" * 50)
    print("Batch Processing Complete")
    print("=" * 50)
    print(f"Successfully processed: {success_count}")
    print(f"Errors: {error_count}")
    print(f"Total time: {elapsed_time:.1f} seconds")
    print(f"Average time per image: {elapsed_time/len(image_files):.1f} seconds")
    print(f"Output directory: {output_path.absolute()}")
 def create_sample_config():
    """Create a sample configuration file for customization."""
    config = {
        "bilateral_d": 9,
        "bilateral_sigma_color": 75,
        "bilateral_sigma_space": 75,
        "clahe_clip_limit": 2.0,
        "clahe_grid_size": [8, 8],
        "gamma": 1.2,
        "denoise_h": 10,
        "morph_kernel_size": 2,
        "unsharp_amount": 1.5,
        "unsharp_radius": 1.0,
        "unsharp_threshold": 0
    }
    with open("config.json", "w") as f:
        json.dump(config, f, indent=4)
    print("Created config.json with default parameters.")
    print("Edit this file to customize processing settings.")
 if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(
        description="Batch process historical newspaper images",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
 Examples:
  python batch_process.py                    # Process current directory
  python batch_process.py -i scans -o clean # Process 'scans' folder
  python batch_process.py --no-comparisons  # Skip comparison images
  python batch_process.py --config custom.json  # Use custom settings
        """
    )
    parser.add_argument("-i", "--input", default=".",
                       help="Input directory (default: current directory)")
    parser.add_argument("-o", "--output", default="cleaned",
                       help="Output directory (default: cleaned)")
    parser.add_argument("-c", "--config",
                       help="JSON config file with custom parameters")
    parser.add_argument("--no-comparisons", action="store_true",
                       help="Skip creating before/after comparison images")
    parser.add_argument("--pattern", default="*.jpg",
                       help="File pattern to match (default: *.jpg)")
    parser.add_argument("--create-config", action="store_true",
                       help="Create sample config file and exit")
    args = parser.parse_args()
    if args.create_config:
        create_sample_config()
        sys.exit(0)
    process_batch(
        input_dir=args.input,
        output_dir=args.output,
        config_file=args.config,
        create_comparisons=not args.no_comparisons,
        file_pattern=args.pattern
    )
--- a/scripts/ex/config_tuner.py
+++ b/scripts/ex/config_tuner.py
@@ -0,0 +1,291 @@
 #!/usr/bin/env python3
 """
 Interactive Parameter Tuning Tool for Newspaper Image Cleaning
 This tool helps you find optimal parameters for your specific images
 by providing an interactive tuning interface.
 """
 import cv2
 import json
 import numpy as np
 from pathlib import Path
 from image_cleaner import NewspaperImageCleaner
 class ParameterTuner:
    """Interactive parameter tuning for image cleaning pipeline."""
    def __init__(self, sample_image_path):
        """Initialize with a sample image for tuning."""
        self.original = cv2.imread(str(sample_image_path))
        if self.original is None:
            raise ValueError(f"Could not load image: {sample_image_path}")
        # Resize large images for faster processing during tuning
        height, width = self.original.shape[:2]
        if height > 1500 or width > 1500:
            scale = min(1500/height, 1500/width)
            new_width = int(width * scale)
            new_height = int(height * scale)
            self.original = cv2.resize(self.original, (new_width, new_height))
            print(f"Resized image to {new_width}x{new_height} for faster tuning")
        self.current_params = self._get_default_params()
        self.cleaner = NewspaperImageCleaner(self.current_params)
    def _get_default_params(self):
        """Get default parameters as starting point."""
        return {
            'bilateral_d': 9,
            'bilateral_sigma_color': 75,
            'bilateral_sigma_space': 75,
            'clahe_clip_limit': 2.0,
            'clahe_grid_size': (8, 8),
            'gamma': 1.2,
            'denoise_h': 10,
            'morph_kernel_size': 2,
            'unsharp_amount': 1.5,
            'unsharp_radius': 1.0,
            'unsharp_threshold': 0,
        }
    def update_parameter(self, param_name, value):
        """Update a single parameter and refresh the cleaner."""
        if param_name not in self.current_params:
            print(f"Unknown parameter: {param_name}")
            return
        # clahe_grid_size takes a single number and expands it to a square grid
        if param_name == 'clahe_grid_size':
            self.current_params[param_name] = (int(value), int(value))
        else:
            self.current_params[param_name] = value
        # Update cleaner with new parameters
        self.cleaner = NewspaperImageCleaner(self.current_params)
        print(f"Updated {param_name} = {value}")
    def process_with_current_params(self, steps=None):
        """Process the sample image with current parameters."""
        if steps is None:
            steps = ['denoise', 'contrast', 'background', 'sharpen']
        image = self.original.copy()
        # Apply processing steps
        if 'denoise' in steps:
            image = self.cleaner.reduce_noise(image)
        if 'contrast' in steps:
            image = self.cleaner.enhance_contrast(image)
        if 'background' in steps:
            image = self.cleaner.clean_background(image)
        if 'sharpen' in steps:
            image = self.cleaner.sharpen_image(image)
        return image
    def create_comparison(self, steps=None):
        """Create side-by-side comparison with current parameters."""
        processed = self.process_with_current_params(steps)
        # Create side-by-side comparison
        height = max(self.original.shape[0], processed.shape[0])
        comparison = np.hstack([
            cv2.resize(self.original, (self.original.shape[1], height)),
            cv2.resize(processed, (processed.shape[1], height))
        ])
        return comparison
    def save_comparison(self, output_path, steps=None):
        """Save comparison image to file."""
        comparison = self.create_comparison(steps)
        cv2.imwrite(str(output_path), comparison)
        print(f"Comparison saved to: {output_path}")
    def save_config(self, config_path):
        """Save current parameters to JSON config file."""
        # Convert tuple to list for JSON serialization
        config_to_save = self.current_params.copy()
        if 'clahe_grid_size' in config_to_save:
            config_to_save['clahe_grid_size'] = list(config_to_save['clahe_grid_size'])
        with open(config_path, 'w') as f:
            json.dump(config_to_save, f, indent=4)
        print(f"Configuration saved to: {config_path}")
    def load_config(self, config_path):
        """Load parameters from JSON config file."""
        with open(config_path, 'r') as f:
            loaded_params = json.load(f)
        # Convert list back to tuple if needed
        if 'clahe_grid_size' in loaded_params:
            loaded_params['clahe_grid_size'] = tuple(loaded_params['clahe_grid_size'])
        self.current_params.update(loaded_params)
        self.cleaner = NewspaperImageCleaner(self.current_params)
        print(f"Configuration loaded from: {config_path}")
    def interactive_tune(self):
        """Start interactive tuning session."""
        print("\n" + "="*60)
        print("INTERACTIVE PARAMETER TUNING")
        print("="*60)
        print("Commands:")
        print("  set <param> <value>  - Set parameter value")
        print("  show                 - Show current parameters")
        print("  test [steps]         - Test current parameters")
        print("  save <file>          - Save configuration to file")
        print("  load <file>          - Load configuration from file")
        print("  compare [file]       - Save comparison image")
        print("  presets              - Show parameter presets")
        print("  help                 - Show this help")
        print("  quit                 - Exit tuning")
        print("\nParameters you can adjust:")
        for param in self.current_params:
            print(f"  {param}")
        while True:
            try:
                command = input("\ntuner> ").strip().split()
                if not command:
                    continue
                cmd = command[0].lower()
                if cmd == 'quit' or cmd == 'exit':
                    break
                elif cmd == 'show':
                    self._show_parameters()
                elif cmd == 'set' and len(command) >= 3:
                    param = command[1]
                    # Try int first, then float, so inputs like "1e-1" parse as numbers
                    try:
                        value = int(command[2])
                    except ValueError:
                        try:
                            value = float(command[2])
                        except ValueError:
                            value = command[2]
                    self.update_parameter(param, value)
                elif cmd == 'test':
                    steps = command[1:] if len(command) > 1 else None
                    print("Processing with current parameters...")
                    processed = self.process_with_current_params(steps)
                    print(f"Processed image shape: {processed.shape}")
                elif cmd == 'save' and len(command) > 1:
                    self.save_config(command[1])
                elif cmd == 'load' and len(command) > 1:
                    self.load_config(command[1])
                elif cmd == 'compare':
                    output = command[1] if len(command) > 1 else "tuning_comparison.jpg"
                    self.save_comparison(output)
                elif cmd == 'presets':
                    self._show_presets()
                elif cmd == 'help':
                    self._show_help()
                else:
                    print("Unknown command. Type 'help' for available commands.")
            except KeyboardInterrupt:
                print("\nExiting tuner...")
                break
            except Exception as e:
                print(f"Error: {str(e)}")
    def _show_parameters(self):
        """Display current parameter values."""
        print("\nCurrent Parameters:")
        print("-" * 30)
        for param, value in self.current_params.items():
            print(f"  {param:<20} = {value}")
    def _show_presets(self):
        """Show preset configurations for different image types."""
        presets = {
            "light_cleaning": {
                "bilateral_d": 5,
                "denoise_h": 5,
                "clahe_clip_limit": 1.5,
                "gamma": 1.1,
                "unsharp_amount": 1.2
            },
            "heavy_cleaning": {
                "bilateral_d": 15,
                "denoise_h": 15,
                "clahe_clip_limit": 3.0,
                "gamma": 1.3,
                "unsharp_amount": 2.0
            },
            "high_contrast": {
                "clahe_clip_limit": 4.0,
                "gamma": 1.4,
                "unsharp_amount": 2.5
            }
        }
        print("\nAvailable Presets:")
        print("-" * 30)
        for name, params in presets.items():
            print(f"{name}:")
            for param, value in params.items():
                print(f"  {param} = {value}")
            print()
    def _show_help(self):
        """Show detailed help information."""
        help_text = """
 Parameter Descriptions:
 -----------------------
 bilateral_d          : Neighborhood diameter for bilateral filtering (5-15)
 bilateral_sigma_color: Filter sigma in color space (50-150)
 bilateral_sigma_space: Filter sigma in coordinate space (50-150)
 clahe_clip_limit     : Contrast limit for CLAHE (1.0-4.0)
 clahe_grid_size      : CLAHE tile grid size (4-16)
 gamma                : Gamma correction value (0.8-2.0)
 denoise_h            : Denoising filter strength (5-20)
 morph_kernel_size    : Morphological operation kernel size (1-5)
 unsharp_amount       : Unsharp masking amount (0.5-3.0)
 unsharp_radius       : Unsharp masking radius (0.5-2.0)
 unsharp_threshold    : Unsharp masking threshold (0-10)
 Tips:
 - Start with small adjustments (±20% of current value)
 - Test frequently with 'compare' command
 - Save working configurations before major changes
 - Use 'test denoise' to test individual steps
        """
        print(help_text)
 def main():
    """Main function for command line usage."""
    import argparse
    parser = argparse.ArgumentParser(description="Interactive parameter tuning for newspaper image cleaning")
    parser.add_argument("image", help="Sample image path for tuning")
    parser.add_argument("-c", "--config", help="Load initial config from file")
    args = parser.parse_args()
    try:
        tuner = ParameterTuner(args.image)
        if args.config:
            tuner.load_config(args.config)
        tuner.interactive_tune()
    except Exception as e:
        print(f"Error: {str(e)}")
 if __name__ == "__main__":
    main()
--- a/scripts/ex/demo.py
+++ b/scripts/ex/demo.py
@@ -0,0 +1,170 @@
 #!/usr/bin/env python3
 """
 Demo Script for Newspaper Image Cleaning Pipeline
 This script demonstrates the cleaning pipeline on the sample images
 and shows the available functionality.
 """
 import sys
 import os
 from pathlib import Path
 # Add current directory to Python path
 sys.path.append(str(Path(__file__).parent))
 try:
    from image_cleaner import NewspaperImageCleaner, create_comparison_image
    import cv2
    import numpy as np
    print("✓ All required libraries imported successfully")
 except ImportError as e:
    print(f"✗ Import error: {e}")
    print("Please install required packages: pip install -r requirements.txt")
    sys.exit(1)
 def demo_single_image(image_path):
    """Demonstrate processing a single image."""
    print(f"\n=== Processing Single Image: {image_path} ===")
    if not os.path.exists(image_path):
        print(f"Image not found: {image_path}")
        return False
    try:
        # Initialize cleaner
        cleaner = NewspaperImageCleaner()
        # Process image
        output_path = f"demo_cleaned_{Path(image_path).name}"
        processed, original = cleaner.process_image(image_path, output_path)
        # Create comparison
        comparison_path = f"demo_comparison_{Path(image_path).name}"
        create_comparison_image(original, processed, comparison_path)
        print(f"✓ Processed image saved: {output_path}")
        print(f"✓ Comparison saved: {comparison_path}")
        return True
    except Exception as e:
        print(f"✗ Error processing {image_path}: {str(e)}")
        return False
 def demo_step_by_step(image_path):
    """Demonstrate individual processing steps."""
    print(f"\n=== Step-by-Step Processing: {image_path} ===")
    if not os.path.exists(image_path):
        print(f"Image not found: {image_path}")
        return
    try:
        # Load image
        original = cv2.imread(image_path)
        if original is None:
            print(f"Could not load image: {image_path}")
            return
        # Resize if too large for demo
        height, width = original.shape[:2]
        if height > 1000 or width > 1000:
            scale = min(1000/height, 1000/width)
            new_width = int(width * scale)
            new_height = int(height * scale)
            original = cv2.resize(original, (new_width, new_height))
            print(f"Resized to {new_width}x{new_height} for demo")
        cleaner = NewspaperImageCleaner()
        # Process step by step
        steps = [
            ('original', original),
            ('denoised', cleaner.reduce_noise(original.copy())),
            ('contrast_enhanced', cleaner.enhance_contrast(original.copy())),
            ('background_cleaned', cleaner.clean_background(original.copy())),
            ('sharpened', cleaner.sharpen_image(original.copy()))
        ]
        # Save each step
        for step_name, image in steps:
            output_path = f"demo_step_{step_name}_{Path(image_path).name}"
            cv2.imwrite(output_path, image)
            print(f"✓ Saved {step_name}: {output_path}")
        print("✓ Individual processing steps completed")
    except Exception as e:
        print(f"✗ Error in step-by-step processing: {str(e)}")
 def show_image_info():
    """Show information about available images."""
    print("\n=== Available Sample Images ===")
    image_files = []
    for ext in ['*.jpg', '*.jpeg', '*.JPG', '*.JPEG']:
        image_files.extend(Path('.').glob(ext))
    if not image_files:
        print("No image files found in current directory")
        return []
    for img_file in image_files:
        try:
            # Load image to get dimensions
            img = cv2.imread(str(img_file))
            if img is not None:
                height, width = img.shape[:2]
                file_size = img_file.stat().st_size / (1024*1024)  # MB
                print(f"  {img_file.name}: {width}x{height} pixels, {file_size:.1f}MB")
            else:
                print(f"  {img_file.name}: Could not load")
        except Exception as e:
            print(f"  {img_file.name}: Error - {str(e)}")
    return image_files
 def main():
    """Main demo function."""
    print("Historical Newspaper Image Cleaning Pipeline - Demo")
    print("=" * 55)
    # Show available images
    image_files = show_image_info()
    if not image_files:
        print("\nNo images found. Please add some image files to test.")
        return
    # Select first image for demo
    sample_image = image_files[0]
    print(f"\nUsing sample image: {sample_image.name}")
    # Demo single image processing
    success = demo_single_image(str(sample_image))
    if success:
        # Demo step-by-step processing
        demo_step_by_step(str(sample_image))
        print("\n=== Demo Complete ===")
        print("Generated files:")
        print("  - demo_cleaned_*.jpg (cleaned image)")
        print("  - demo_comparison_*.jpg (before/after comparison)")
        print("  - demo_step_*.jpg (individual processing steps)")
        print("\nNext steps:")
        print(f"  - Try: python config_tuner.py {sample_image.name}")
        print("  - Try: python batch_process.py")
        print("  - Adjust parameters in config.json for better results")
    else:
        print("\nDemo failed. Please check your Python environment and dependencies.")
 if __name__ == "__main__":
    main()
--- a/scripts/ex/image_cleaner.py
+++ b/scripts/ex/image_cleaner.py
@@ -0,0 +1,310 @@
 """
 Historical Newspaper Image Cleaning Pipeline
 This module provides functions to clean and enhance scanned historical newspaper images
 by reducing noise, improving contrast, and sharpening text for better readability.
 """
 import cv2
 import numpy as np
 from PIL import Image, ImageEnhance
 import os
 import argparse
 from pathlib import Path
 class NewspaperImageCleaner:
    """
    Image processing pipeline specifically designed for historical newspaper scans.
    """
    def __init__(self, config=None):
        """Initialize with default or custom configuration."""
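        # Overlay custom values on the defaults so a partial config
        # (for example a preset with only a few keys) cannot cause a KeyError
        self.config = {**self._default_config(), **(config or {})}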
    def _default_config(self):
        """Default processing parameters optimized for newspaper scans."""
        return {
            'bilateral_d': 9,           # Neighborhood diameter for bilateral filter
            'bilateral_sigma_color': 75,  # Filter sigma in color space
            'bilateral_sigma_space': 75,  # Filter sigma in coordinate space
            'clahe_clip_limit': 2.0,    # Contrast limiting for CLAHE
            'clahe_grid_size': (8, 8),  # CLAHE grid size
            'gamma': 1.2,               # Gamma correction value
            'denoise_h': 10,            # Denoising filter strength
            'morph_kernel_size': 2,     # Morphological operation kernel size
            'unsharp_amount': 1.5,      # Unsharp masking amount
            'unsharp_radius': 1.0,      # Unsharp masking radius
            'unsharp_threshold': 0,     # Unsharp masking threshold
        }
    def reduce_noise(self, image):
        """
        Apply noise reduction techniques to remove speckles and JPEG artifacts.
        Args:
            image: Input BGR image
        Returns:
            Denoised image
        """
        # Bilateral filter - preserves edges while reducing noise
        bilateral = cv2.bilateralFilter(
            image,
            self.config['bilateral_d'],
            self.config['bilateral_sigma_color'],
            self.config['bilateral_sigma_space']
        )
        # Non-local means denoising for better noise reduction
        if len(image.shape) == 3:
            # Color image
            denoised = cv2.fastNlMeansDenoisingColored(
                bilateral, None,
                self.config['denoise_h'],
                self.config['denoise_h'],
                7, 21
            )
        else:
            # Grayscale image
            denoised = cv2.fastNlMeansDenoising(
                bilateral, None,
                self.config['denoise_h'],
                7, 21
            )
        return denoised
    def enhance_contrast(self, image):
        """
        Improve image contrast using CLAHE and gamma correction.
        Args:
            image: Input BGR image
        Returns:
            Contrast-enhanced image
        """
        # Convert to LAB color space for better contrast processing
        if len(image.shape) == 3:
            lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
            l_channel, a_channel, b_channel = cv2.split(lab)
        else:
            l_channel = image
        # Apply CLAHE (Contrast Limited Adaptive Histogram Equalization)
        clahe = cv2.createCLAHE(
            clipLimit=self.config['clahe_clip_limit'],
            # Configs loaded from JSON hold the grid size as a list; normalize to a tuple
            tileGridSize=tuple(self.config['clahe_grid_size'])
        )
        l_channel = clahe.apply(l_channel)
        # Reconstruct image
        if len(image.shape) == 3:
            enhanced = cv2.merge([l_channel, a_channel, b_channel])
            enhanced = cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)
        else:
            enhanced = l_channel
        # Apply gamma correction via a 256-entry lookup table
        gamma = self.config['gamma']
        inv_gamma = 1.0 / gamma
        table = (((np.arange(256) / 255.0) ** inv_gamma) * 255).astype(np.uint8)
        enhanced = cv2.LUT(enhanced, table)
        return enhanced
    def clean_background(self, image):
        """
        Remove small artifacts and clean background noise.
        Args:
            image: Input image
        Returns:
            Background-cleaned image
        """
        # Convert to grayscale for morphological operations
        if len(image.shape) == 3:
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        else:
            gray = image
        # Morphological opening to remove small noise
        kernel = np.ones((self.config['morph_kernel_size'],
                         self.config['morph_kernel_size']), np.uint8)
        # Opening (erosion followed by dilation)
        opened = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
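        # If original was color, apply the mask
        if len(image.shape) == 3:
            # Pixels that are exactly 0 after opening are painted white in the
            # color image; every other pixel is left unchanged.
            # NOTE: this also whitens pure-black ink pixels, and it differs from
            # the grayscale branch, which returns the opened image itself.
            mask = opened > 0
            result = image.copy()
            result[~mask] = [255, 255, 255]
            return result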
        else:
            return opened
    def sharpen_image(self, image):
        """
        Apply unsharp masking to enhance text clarity.
        Args:
            image: Input image
        Returns:
            Sharpened image
        """
        # Convert to float for processing
        float_img = image.astype(np.float32) / 255.0
        # Create Gaussian blur
        radius = self.config['unsharp_radius']
        sigma = radius / 3.0
        blurred = cv2.GaussianBlur(float_img, (0, 0), sigma)
        # Unsharp masking
        amount = self.config['unsharp_amount']
        sharpened = float_img + amount * (float_img - blurred)
        # Threshold and clamp
        threshold = self.config['unsharp_threshold'] / 255.0
        sharpened = np.where(np.abs(float_img - blurred) < threshold,
                           float_img, sharpened)
        sharpened = np.clip(sharpened, 0.0, 1.0)
        # Round before casting; plain truncation can darken pixels by one level
        return np.round(sharpened * 255).astype(np.uint8)
    def process_image(self, image_path, output_path=None, steps=None):
        """
        Process a single image through the complete pipeline.
        Args:
            image_path: Path to input image
            output_path: Path for output image (optional)
            steps: List of processing steps to apply (optional)
        Returns:
            Tuple of (processed image array, original image array)
        """
        if steps is None:
            steps = ['denoise', 'contrast', 'background', 'sharpen']
        # Load image
        image = cv2.imread(str(image_path))
        if image is None:
            raise ValueError(f"Could not load image: {image_path}")
        original = image.copy()
        # Apply processing steps
        if 'denoise' in steps:
            print("Applying noise reduction...")
            image = self.reduce_noise(image)
        if 'contrast' in steps:
            print("Enhancing contrast...")
            image = self.enhance_contrast(image)
        if 'background' in steps:
            print("Cleaning background...")
            image = self.clean_background(image)
        if 'sharpen' in steps:
            print("Sharpening image...")
            image = self.sharpen_image(image)
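        # Save output if path provided
        if output_path:
            # cv2.imwrite returns False instead of raising when the write fails
            if not cv2.imwrite(str(output_path), image):
                raise OSError(f"Could not write image: {output_path}")
            print(f"Processed image saved to: {output_path}")
        return image, original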
    def process_directory(self, input_dir, output_dir, extensions=None):
        """
        Process all images in a directory.
        Args:
            input_dir: Input directory path
            output_dir: Output directory path
            extensions: List of file extensions to process
        """
        if extensions is None:
            extensions = ['.jpg', '.jpeg', '.png', '.tif', '.tiff']
        input_path = Path(input_dir)
        output_path = Path(output_dir)
        output_path.mkdir(parents=True, exist_ok=True)
        # Sort for a deterministic processing order and skip subdirectories
        for file_path in sorted(input_path.iterdir()):
            if file_path.is_file() and file_path.suffix.lower() in extensions:
                print(f"\nProcessing: {file_path.name}")
                output_file = output_path / f"cleaned_{file_path.name}"
                try:
                    self.process_image(file_path, output_file)
                except Exception as e:
                    print(f"Error processing {file_path.name}: {e}")
        print(f"\nBatch processing completed. Results in: {output_dir}")
 def create_comparison_image(original, processed, output_path):
    """
    Create a side-by-side comparison image.
    Args:
        original: Original image array
        processed: Processed image array
        output_path: Path to save comparison
    """
    # Resize images to same height if needed
    h1, w1 = original.shape[:2]
    h2, w2 = processed.shape[:2]
    if h1 != h2:
        height = min(h1, h2)
        original = cv2.resize(original, (int(w1 * height / h1), height))
        processed = cv2.resize(processed, (int(w2 * height / h2), height))
    # Create side-by-side comparison
    comparison = np.hstack([original, processed])
    cv2.imwrite(str(output_path), comparison)
    print(f"Comparison saved to: {output_path}")
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Clean historical newspaper images")
    parser.add_argument("input", help="Input image or directory path")
    parser.add_argument("-o", "--output", help="Output path")
    parser.add_argument("-d", "--directory", action="store_true",
                       help="Process entire directory (always runs all steps; "
                            "--steps and --comparison are ignored)")
    parser.add_argument("-c", "--comparison", action="store_true",
                       help="Create before/after comparison")
    parser.add_argument("--steps", nargs="+",
                       choices=['denoise', 'contrast', 'background', 'sharpen'],
                       default=['denoise', 'contrast', 'background', 'sharpen'],
                       help="Processing steps to apply")
    args = parser.parse_args()
    cleaner = NewspaperImageCleaner()
    if args.directory:
        output_dir = args.output or "cleaned_images"
        cleaner.process_directory(args.input, output_dir)
    else:
        output_path = args.output
        if not output_path:
            input_path = Path(args.input)
            output_path = input_path.parent / f"cleaned_{input_path.name}"
        processed, original = cleaner.process_image(args.input, output_path, args.steps)
        if args.comparison:
            comparison_path = Path(output_path).parent / f"comparison_{Path(args.input).name}"
            create_comparison_image(original, processed, comparison_path)
--- a/scripts/ex/requirements.txt
+++ b/scripts/ex/requirements.txt
@@ -0,0 +1,5 @@
 opencv-python==4.10.0.84
 scikit-image==0.24.0
 Pillow==10.4.0
 numpy==2.1.1
 matplotlib==3.9.2
--- a/scripts/ex/run.sh
+++ b/scripts/ex/run.sh
@@ -0,0 +1,42 @@
 #!/bin/bash
 # Convenience script to run the image cleaning pipeline with virtual environment
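 # Activate virtual environment (abort if it has not been created yet)
 if [ ! -f venv/bin/activate ]; then
    echo "Error: venv/bin/activate not found" >&2
    exit 1
 fi
 source venv/bin/activate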
 # Check if any arguments provided
 if [ $# -eq 0 ]; then
    echo "Historical Newspaper Image Cleaning Pipeline"
    echo "Usage examples:"
    echo "  $0 demo                              # Run demo"
    echo "  $0 clean image.jpg                   # Clean single image"
    echo "  $0 batch                             # Process all images in directory"
    echo "  $0 tune image.jpg                    # Interactive parameter tuning"
    echo "  $0 python script.py [args]          # Run custom Python script"
    exit 1
 fi
 case "$1" in
    "demo")
        python demo.py
        ;;
    "clean")
        shift
        python image_cleaner.py "$@"
        ;;
    "batch")
        shift
        python batch_process.py "$@"
        ;;
    "tune")
        shift
        python config_tuner.py "$@"
        ;;
    "python")
        shift
        python "$@"
        ;;
    *)
        python "$@"
        ;;
 esac