FudanCVL

[ICML2026] OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

100% credibility
Found May 25, 2026 at 18 stars.
Python
AI Summary

OcclusionFormer is an AI image generation system from Fudan University that creates pictures from layout descriptions. Unlike standard tools, it correctly handles the tricky problem of objects blocking each other: if you say 'a person standing in front of a tree,' the person appears in front of the tree, not tangled with it. You can either draw your scene on a visual canvas or describe it in a layout file, and the system generates realistic images with proper depth ordering. It's designed for scenes with many overlapping objects, where traditional tools get confused.

How It Works

1. Discovering the Project

You find OcclusionFormer while researching how to generate images from layout descriptions with realistic object layering.

2. Getting Everything Ready

You install the required tools and download the trained model weights so the system can understand your layouts.

3. Choosing Your Approach

Use the Visual Canvas

Open the web demo where you can draw boxes directly on a canvas, type in what each object should look like, and specify which objects block others.

Use the Command Line

Prepare a layout file describing your scene with boxes, captions, and occlusion relationships, then run the generator.
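
To make the three ingredients of a layout file concrete, here is a minimal sketch in Python. The schema is hypothetical: the field names (`instances`, `box`, `caption`, `occlusions`), the normalized box coordinates, and the `validate_layout` helper are illustrative assumptions, not the project's actual file format.

```python
import json

# Hypothetical schema: these field names are illustrative assumptions,
# not the layout format that OcclusionFormer actually reads.
layout = {
    "width": 1024,
    "height": 1024,
    "instances": [
        # Boxes are [x0, y0, x1, y1], normalized to the range 0..1.
        {"id": "tree", "caption": "a tall oak tree", "box": [0.30, 0.10, 0.80, 0.95]},
        {"id": "person", "caption": "a person in a red coat", "box": [0.40, 0.35, 0.65, 0.98]},
    ],
    # Each pair reads (front, behind): the first instance occludes the second.
    "occlusions": [["person", "tree"]],
}


def validate_layout(layout):
    """Return a list of problems found in the layout (empty if none)."""
    problems = []
    ids = [inst["id"] for inst in layout["instances"]]
    if len(ids) != len(set(ids)):
        problems.append("duplicate instance ids")
    for inst in layout["instances"]:
        x0, y0, x1, y1 = inst["box"]
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            problems.append(f"bad box for {inst['id']}")
    for front, behind in layout["occlusions"]:
        if front not in ids or behind not in ids:
            problems.append(f"unknown id in occlusion ({front}, {behind})")
        elif front == behind:
            problems.append(f"{front} cannot occlude itself")
    return problems


problems = validate_layout(layout)
layout_json = json.dumps(layout, indent=2)  # text you could save to a file
```

Checking the layout before generation catches typos in instance names early, which is cheaper than discovering them after a long GPU run.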

4. Generating Your Image

The system creates your image, carefully placing objects in front of or behind each other exactly as you specified.

Your Scene Comes to Life

You receive a beautiful image where overlapping objects look natural and realistic, with proper depth ordering throughout.

AI-Generated Review

What is OcclusionFormer?

OcclusionFormer is a Python framework for layout-to-image generation that handles overlapping objects correctly. When you specify a scene with multiple bounding boxes, standard image generators often produce tangled textures and wrong front/back ordering. This project fixes that by explicitly modeling Z-order relationships between objects, so a person standing in front of a tree actually looks like they are in front of the tree.
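
One way to picture a Z-order is as a back-to-front ordering derived from pairwise "A is in front of B" statements. The sketch below uses a topological sort from Python's standard library to do this. It is a generic illustration, not the project's own method; the `z_order` function and the (front, behind) pair convention are assumptions made for the example.

```python
from graphlib import TopologicalSorter, CycleError


def z_order(instance_ids, occlusions):
    """Order instances back to front from (front, behind) pairs.

    Raises ValueError if the pairs contradict each other (form a cycle).
    """
    sorter = TopologicalSorter()
    for inst in instance_ids:
        sorter.add(inst)
    for front, behind in occlusions:
        # `behind` is registered as a predecessor of `front`,
        # so it is emitted earlier in the ordering.
        sorter.add(front, behind)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError(f"cyclic occlusion: {err.args[1]}") from err


order = z_order(
    ["person", "tree", "bench"],
    [("person", "tree"), ("person", "bench"), ("bench", "tree")],
)
# order is ["tree", "bench", "person"]: farthest instance first
```

A strict ordering like this cannot express mutual occlusion (for example, two people with arms around each other), so a real system may need a richer representation than a single sorted list.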

The system builds on top of FLUX.1-dev and adds specialized occlusion-aware attention blocks that use a transmittance mechanism inspired by volume rendering. You provide bounding boxes with captions, specify which objects occlude which others, and the model composes instances in the correct depth order. It ships with a Streamlit demo for interactive layout editing and a CLI tool for batch processing multiple scenes.
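
For readers unfamiliar with transmittance, the sketch below shows the classic front-to-back compositing rule from volume rendering for a single pixel: each instance contributes its opacity times whatever fraction of the pixel is still unblocked. This is the textbook rule only, assuming per-instance opacities are already known; it is not the project's attention block, and `composite_weights` is a hypothetical name.

```python
def composite_weights(alphas):
    """Front-to-back compositing weights for one pixel.

    `alphas` lists per-instance opacities in [0, 1], nearest instance first.
    Returns (weights, remaining), where `remaining` is the transmittance
    left over for the background.
    """
    weights = []
    transmittance = 1.0  # fraction of the pixel not yet blocked
    for alpha in alphas:
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights, transmittance


# An opaque person in front of an opaque tree: the tree gets no weight here.
weights, background = composite_weights([1.0, 1.0])
# weights == [1.0, 0.0], background == 0.0

# A half-transparent front instance lets half of the one behind show through.
soft, rest = composite_weights([0.5, 1.0])
# soft == [0.5, 0.5], rest == 0.0
```

Because the weights and the leftover transmittance always sum to one, the nearest instance can never be overwritten by one behind it, which is the property that makes this rule attractive for depth layering.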

Why is it gaining traction?

The hook is the occlusion problem. Layout-conditioned image generation has been popular for a while, but existing tools tend to handle overlapping objects poorly. This work treats occlusion ordering as a first-class citizen rather than an afterthought. The transmittance-based composition means you get proper depth layering without post-processing hacks.

The included SA-Z dataset with explicit occlusion annotations gives researchers a solid starting point for training or evaluation. Having both a web UI and a scriptable CLI covers different workflows.

Who should use this?

Three groups stand to benefit: game artists and concept designers who need to place multiple characters or props with correct depth relationships; researchers working on layout-grounded generation who want a baseline with explicit occlusion handling; and developers building tools that let users arrange scenes visually before generating images.

Verdict

This is a legitimate research contribution from a credible institution, but the 1.0% credibility score and 18 stars reflect a very early-stage project. The code is functional and the ICML 2026 acceptance validates the approach. However, documentation is minimal, test coverage is unclear, and the dependency on FLUX.1-dev means you need significant GPU resources to run anything. Worth watching, but wait for the ecosystem to mature before betting on it for production work.
