thombanal / clip-finetune-recipes
Practical CLIP fine-tuning recipes: DDP training, LoRA, hard-negative mining, leakage checks.
This is a practical toolkit for fine-tuning image-text matching models (like CLIP) on custom datasets. It provides ready-made recipes that handle the training machinery (data loading, loss functions, evaluation metrics, and sanity checks) so researchers and developers can focus on their specific use case rather than reinventing the wheel. The project supports lightweight training (LoRA) for limited hardware and full fine-tuning for maximum quality, includes multilingual support for Chinese experiments, and is well-documented with a permissive open-source license.
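To make the "loss functions" part concrete, here is a minimal sketch of the symmetric contrastive loss that CLIP-style models are commonly trained with. It is written in plain Python for readability and assumes non-zero embeddings. The function names and the default temperature of 0.07 are illustrative assumptions, not taken from this repository, whose actual loss code may differ.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_row(logits, target):
    # Negative log-softmax of the target entry, computed in a numerically stable way.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: image_embs[i] and text_embs[i] are assumed to be a matching pair.
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text direction: each image should pick out its own caption.
    image_to_text = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each caption should pick out its own image.
    text_to_image = sum(
        cross_entropy_row([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)
```

A batch whose pairs line up gives a loss near zero, while a batch with swapped captions gives a large loss, which is the signal that pulls matching images and texts together during training.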
How It Works
You collected photos and their descriptions and want a model that understands how your pictures and words relate to each other.
You download the ready-made recipes package and everything you need comes along automatically, like getting a cooking kit with all ingredients included.
You test the whole pipeline with a tiny dataset to make sure nothing is broken before committing to the real training. This takes just a few minutes.
Option 1 (LoRA): you train only small add-on layers, so it runs on a regular computer and finishes in hours instead of days.
Option 2 (full fine-tuning): you retrain everything for maximum quality, but you'll need access to powerful machines with multiple GPUs.
The training loop runs smoothly across multiple GPUs if you have them, saving checkpoints automatically so you never lose progress.
You run built-in tests that ask your model to match photos to captions it's never seen before, like a pop quiz for AI models.
You have a trained model that connects your specific images with your specific vocabulary, ready to power image search, captioning, or any other feature you built it for.
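The "pop quiz" step above is usually scored with a retrieval metric such as recall@K: for each held-out photo, check whether its true caption appears among the K highest-scoring captions. The sketch below is a minimal illustration in plain Python. The function name and the convention that query i matches candidate i are assumptions for the example, not this repository's actual evaluation code.

```python
def recall_at_k(similarity, k):
    # similarity[i][j] is the model's score for query i against candidate j.
    # Assumption for this sketch: the correct match for query i is candidate i.
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy example: 3 held-out images scored against 3 held-out captions.
scores = [
    [0.9, 0.1, 0.2],  # image 0 ranks its own caption first
    [0.8, 0.3, 0.1],  # image 1 ranks its own caption second
    [0.2, 0.1, 0.7],  # image 2 ranks its own caption first
]
print(recall_at_k(scores, 1))
print(recall_at_k(scores, 2))
```

The number is only meaningful if the held-out pairs do not also appear in the training data, which is why the leakage checks are worth running before the evaluation.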