alibaba

Official repository for paper "LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing"

59
6
100% credibility
Found Feb 05, 2026 at 54 stars.
AI Analysis
Python
AI Summary

LaTo is an open-source AI system for fine-grained human face editing that uses tokenized facial landmarks to enable precise changes like expressions or poses while strongly preserving identity.

How It Works

1
🔍 Discover LaTo

You find a cool new tool for precisely editing faces in photos while keeping the person's unique look intact.

2
📥 Get the tool

Download everything you need with simple clicks, including ready-made face editing parts.

3
📸 Pick your photo

Choose a clear face photo and type a simple wish, like 'make them smile bigger' or 'turn head slightly'.

4
AI plans the edit

The tool smartly maps out key face points from your description to guide a perfect, natural change.

5
🎨 Create the edit

Hit go, and watch it blend your idea seamlessly onto the photo.

😊 Perfect face edit

Enjoy your edited photo that looks real, preserves identity, and matches exactly what you wanted.

AI-Generated Review

What is landmark-tokenized-dit?

This official GitHub repository from Alibaba implements LaTo, a diffusion transformer for precise human face editing from text instructions. It tokenizes facial landmarks to guide edits, preserving identity while changing expressions, poses, or attributes on input photos. Developers get Python scripts to extract landmarks, predict targets via vision-language models, and run diffusion inference, outputting edited faces in seconds on a GPU.
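To make "tokenizes facial landmarks" concrete, here is a minimal sketch of one way to turn landmark coordinates into discrete tokens. It assumes plain uniform binning of normalized (x, y) positions; the function names, the 256-bin default, and the flat token layout are illustrative assumptions, not the repository's actual tokenizer.

```python
from typing import List, Tuple

Landmark = Tuple[float, float]  # (x, y) in pixel coordinates


def tokenize_landmarks(points: List[Landmark], width: int, height: int,
                       bins: int = 256) -> List[int]:
    """Map each landmark to two integer tokens in [0, bins - 1].

    Hypothetical scheme: uniform quantization, not LaTo's real tokenizer.
    """
    tokens: List[int] = []
    for x, y in points:
        # Normalize to [0, 1] and clamp points that fall outside the image.
        nx = min(max(x / width, 0.0), 1.0)
        ny = min(max(y / height, 0.0), 1.0)
        # Scale to a bin index; a value of exactly 1.0 goes into the last bin.
        tokens.append(min(int(nx * bins), bins - 1))
        tokens.append(min(int(ny * bins), bins - 1))
    return tokens


def detokenize_landmarks(tokens: List[int], width: int, height: int,
                         bins: int = 256) -> List[Landmark]:
    """Recover approximate pixel coordinates from the bin centers."""
    points: List[Landmark] = []
    for i in range(0, len(tokens), 2):
        x = (tokens[i] + 0.5) / bins * width
        y = (tokens[i + 1] + 0.5) / bins * height
        points.append((x, y))
    return points


landmarks = [(0.0, 0.0), (512.0, 512.0), (256.0, 128.0)]
tokens = tokenize_landmarks(landmarks, width=512, height=512)
restored = detokenize_landmarks(tokens, width=512, height=512)
```

With 256 bins on a 512-pixel image each bin spans two pixels, so the round trip is off by at most one pixel per coordinate. A finer or learned codebook would trade vocabulary size against that error.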

Why is it gaining traction?

Unlike generic diffusion models that tend to lose identity under large edits, LaTo targets fine-grained control, with a reported 7.8% gain in identity preservation and 4.6% in semantic alignment over prior state-of-the-art methods. The pipeline handles real-world inputs via simple JSON configs and bash runners, so diffusion-based editing is usable without retraining. Early adopters praise its chain-of-thought landmark prediction for interactive tweaks.
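As a rough illustration of a JSON-driven workflow, the sketch below parses and validates a list of edit jobs before anything is handed to a model. The `EditJob` class and the keys `image_path`, `instruction`, and `seed` are hypothetical; the repository's real config schema may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class EditJob:
    # All field names are assumptions made for this sketch.
    image_path: str
    instruction: str
    seed: int = 0


def load_jobs(text: str) -> List[EditJob]:
    """Parse a JSON array of edit jobs and fail early on missing keys."""
    raw = json.loads(text)
    jobs: List[EditJob] = []
    for i, entry in enumerate(raw):
        missing = [k for k in ("image_path", "instruction") if k not in entry]
        if missing:
            raise ValueError(f"job {i} is missing keys: {missing}")
        jobs.append(EditJob(entry["image_path"], entry["instruction"],
                            int(entry.get("seed", 0))))
    return jobs


example = '[{"image_path": "face.png", "instruction": "make them smile bigger"}]'
jobs = load_jobs(example)
```

Validating the config up front means a typo in a job file surfaces immediately, not partway through a GPU run.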

Who should use this?

Computer vision engineers prototyping face swap or animation apps. ML researchers benchmarking instruction-tuned diffusion on faces. App devs integrating precise edits into photo editors or AR filters, especially those frustrated by identity loss in open models.

Verdict

Promising research code from an official repository, but at 59 stars it's early-stage: expect solid docs and scripts but sparse tests. Grab it for diffusion experiments if you have CUDA and PyTorch 2.5; otherwise, wait for community polish.
