Consistency in Diffusion-Based Visual Generation: A Survey
This repository is an academic survey project that collects and organizes research papers about making AI image and video generators produce more consistent results. It covers three types of consistency: External (matching user instructions), Internal (keeping characters and scenes stable), and Normative (following safety rules and physics). The collection includes hundreds of research methods, evaluation benchmarks, and datasets, along with machine-readable files for researchers. The project is associated with researchers from several universities (including Tsinghua and Cambridge) and companies (Li Auto, ByteDance), and is openly available under the MIT license.
How It Works
You find a curated list of academic papers about making AI image and video generators produce more consistent, reliable results.
You understand that AI generators often make mistakes like missing objects, changing characters between frames, or producing physically impossible scenes.
The collection is organized into clear sections: making images match your instructions, keeping characters and scenes consistent over time, and ensuring outputs follow safety and physics rules.
The collection has three parts:
Methods: hundreds of research papers with links to implementations, organized by the problem each one solves.
Benchmarks: standard ways researchers measure whether AI systems are consistent.
Datasets: collections of images and videos used to train and test these systems.
For deeper research, you download structured tables mapping papers to their diagnostic uses and coverage areas.
The survey comes with ready-to-use citation information for your own academic papers.
Whether you're building AI systems, evaluating them, or writing about them, you now have a comprehensive map of the consistency field.
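As a rough illustration of how the structured tables mentioned above might be used, here is a minimal Python sketch that filters a paper table by consistency type. The sample rows, the column names (paper, consistency_type, diagnostic_use), and the helper papers_by_type are all hypothetical; the repository's actual file names and schema are not described here and may differ.

```python
import csv
import io

# Hypothetical sample standing in for one of the downloadable tables.
# The column names and rows are assumptions made for illustration only.
SAMPLE_TABLE = """paper,consistency_type,diagnostic_use
Example Paper A,External,prompt alignment
Example Paper B,Internal,character identity
Example Paper C,Normative,physical plausibility
Example Paper D,Internal,scene stability
"""


def papers_by_type(table_text, consistency_type):
    """Return paper names whose consistency_type matches, ignoring case."""
    reader = csv.DictReader(io.StringIO(table_text))
    wanted = consistency_type.strip().lower()
    return [
        row["paper"]
        for row in reader
        if row["consistency_type"].strip().lower() == wanted
    ]


# List the (made-up) papers tagged with Internal consistency.
internal = papers_by_type(SAMPLE_TABLE, "internal")
print(internal)
```

To try this against a real download, replace SAMPLE_TABLE with the contents of the file and adjust the column names to match its header row.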