Ki

Create a searchable image tag website using Python and Elasticlunr.js.

This project uses a local BLIP-2 captioning model to automatically tag your images, generates thumbnails, and provides a web interface to search them.

Requirements

Python 3.8 or newer
Pillow for image processing
Transformers and PyTorch for the BLIP‑2 model
spaCy with the en_core_web_sm language model
scikit-image for JPEG recompression metrics
tqdm for progress bars
jpeglib Python package (optional) for JPEG compression with -J/--jpegli

Thumbnails are generated at 256×256 pixels by default, so ensure you have enough disk space for the resized copies.

How to Use

Install Dependencies:
Ensure you have Python installed. Then, install the necessary libraries. While specific versions may vary, you’ll typically need:
```
pip install Pillow scikit-image transformers torch torchvision torchaudio spacy tqdm jpeglib
python -m spacy download en_core_web_sm
```
(Note: torch installation can vary based on your system and CUDA availability. Refer to the official PyTorch website for specific instructions if needed.)
Process Your Images:
Navigate to the repository directory and run the main pipeline script, passing the path to your image folder (typically %USERPROFILE%\Pictures on Windows or ~/Pictures on Linux/macOS):
```
python run_pipeline.py [PATH_TO_YOUR_IMAGES] [-I PATH_TO_YOUR_IMAGES] [-O OUTPUT_DIR] [-R | --recurse] [-C | --clear] [-Z | --compress] [-J | --jpegli] [-A | --add] [-D | --delete] [-V | --verbose] [-S [PORT]]
```
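Windows users: Do not wrap a path that ends with a single backslash in quotes. The backslash escapes the closing quote, so the flags that follow are absorbed into the path. Either remove the trailing backslash or double it (\\) so additional flags are parsed correctly.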
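
For readers who want to see how the usage line above maps onto an argument parser, here is a minimal sketch using the standard argparse module. It is not the actual run_pipeline.py code: the long name --output, the attribute names, and the use of 0 as a "use serve.py's default port" marker are assumptions made for illustration.

```python
import argparse

# Hypothetical marker meaning "-S was given without a PORT".
USE_SERVER_DEFAULT = 0


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    # The image folder may be given positionally or via -I/--input.
    parser.add_argument("path", nargs="?", help="image folder (positional form)")
    parser.add_argument("-I", "--input", help="image folder (flag form)")
    # "--output" is an assumed long name; the README only documents -O.
    parser.add_argument("-O", "--output", default="img/thumbs",
                        help="thumbnail output directory")
    parser.add_argument("-R", "--recurse", action="store_true")
    parser.add_argument("-C", "--clear", action="store_true")
    # -Z and -J are documented as mutually exclusive.
    jpeg = parser.add_mutually_exclusive_group()
    jpeg.add_argument("-Z", "--compress", action="store_true")
    jpeg.add_argument("-J", "--jpegli", action="store_true")
    parser.add_argument("-A", "--add", action="store_true")
    parser.add_argument("-D", "--delete", action="store_true")
    parser.add_argument("-V", "--verbose", action="store_true")
    # -S takes an optional PORT: absent -> None, bare -S -> marker, -S 9000 -> 9000.
    parser.add_argument("-S", dest="serve", nargs="?", type=int,
                        const=USE_SERVER_DEFAULT, default=None)
    return parser
```

With a parser shaped like this, combining -Z and -J makes argparse exit with a usage error, which matches the documented restriction.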

This script will:
- Scan the PATH_TO_YOUR_IMAGES directory (positional or via -I/--input) for JPG, JPEG, and PNG files. Use -R/--recurse to include subfolders.
- Generate descriptive tags for each image using a local BLIP-2 model.
- Create 256×256 thumbnails for each image and store them in the output directory (default img/thumbs/). An optional watermark from img/overlay/watermark.png may be applied if make_thumbs.py (called by the pipeline) is configured for it. Thumbnail file names include a short hash of the original path, so images with the same name in different folders, or with different extensions, do not overwrite each other.
- Optionally clear the contents of the output folder first when using -C/--clear.
- Enable additional JPEG compression with -Z/--compress or use the jpeglib library with -J/--jpegli. These options are mutually exclusive.
- Compile all tag information into data.json, which is used by the search interface.
- Show per-image progress bars so you know exactly how many files remain.
- Print per-image details instead of progress bars when run with -V/--verbose.
- Append new images without rebuilding existing entries when run with -A/--add, or remove the records and thumbnails for images in the folder when run with -D/--delete.
- Launch the local server automatically after processing when run with -S [PORT]. Omit PORT to use serve.py's default.
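
The thumbnail naming described in the list above (a short hash of the original path in the file name) can be sketched as follows. This is only an illustration, not the code in make_thumbs.py: the function name thumb_name, the choice of SHA-1, the 8-character digest length, the underscore separator, and the .jpg extension are all assumptions.

```python
import hashlib
from pathlib import Path


def thumb_name(original: Path, digest_len: int = 8) -> str:
    """Return a thumbnail file name derived from the original image path.

    The full path (not just the file name) is hashed, so a/cat.jpg and
    b/cat.jpg, or cat.jpg and cat.png, map to different thumbnail names.
    """
    # as_posix() gives the same string on every platform for the same path.
    key = original.as_posix().encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()[:digest_len]
    return f"{original.stem}_{digest}.jpg"
```

A truncated hash keeps names short; a longer digest_len lowers the already small chance that two different paths produce the same name.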
Run the Web Server:
If you didn’t use -S during the pipeline step, start the local web server manually:
```
python serve.py
```
(On Linux/macOS, you might need to use python3 serve.py)

Then, open your web browser and go to http://localhost:8000 (or the port specified by serve.py) to view and search your images.
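
For context, a local static server of this kind can be built from the standard library alone. The sketch below is an assumption about what such a script might look like, not the contents of serve.py; the function name make_server and its parameters are hypothetical, and port 8000 is taken from the URL above.

```python
import functools
import http.server


def make_server(port: int = 8000, directory: str = ".") -> http.server.ThreadingHTTPServer:
    """Create (but do not start) an HTTP server that serves files from `directory`."""
    # SimpleHTTPRequestHandler serves static files; `directory` sets its root.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the site is not exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_server().serve_forever() from the repository root would then serve index.html, app.js, data.json, and the thumbnails until interrupted with Ctrl+C.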

Project Structure Highlights

index.html: The main page for the image search.
app.js: Handles the client-side logic, including Elasticlunr.js setup and search functionality.
data.json: Contains the image tags and metadata for the search index (generated by run_pipeline.py).
img/thumbs/: Default directory where thumbnails are stored.
run_pipeline.py: The main script to process your images (tagging and thumbnail generation).
make_thumbs.py: Script for generating thumbnails, typically called by run_pipeline.py.
serve.py: A simple Python HTTP server to run the website locally.

TODO/MAYBES:

Make the partial rendering loop stop when you click a result before it is finished.
Add transactional folders (e.g., IN, PROCESSED, ERROR) for more efficient content addition.
Check EXIF/File attributes for “Time Created” to compare against a “last sync” date for incremental updates.
Add a script to search network drives for images.
Conduct thorough testing, including corner cases.