apple/ml-videoflextok

VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization

Found Apr 19, 2026 at 20 stars · 100% credibility · Jupyter Notebook
AI Summary

VideoFlexTok provides official inference code for flexible-length video tokenization models that convert videos to discrete tokens and back.

How It Works

1
🔍 Discover VideoFlexTok

You learn about VideoFlexTok, a tool that compresses videos into compact sequences of discrete tokens, useful for AI projects or efficient sharing.

2
📥 Get it ready

Clone the repository and install its dependencies in a few simple steps.

3
🧠 Load a pretrained model

Pick a pretrained checkpoint from the Hugging Face collection to handle your videos.

4
🎥 Load your video

Choose a video clip from your files and prepare it as a normalized tensor for the model.

5
🗜️ Shrink to tokens

The model encodes your video into a compact sequence of discrete tokens, ready for downstream AI models.

6
🔄 Grow back the video

The flow-matching decoder reconstructs a smooth, lifelike video from the tokens.

Results

Your video, reconstructed from a much smaller token representation, stays visually faithful and is ready for compression, editing, or generation tasks.
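Taken together, the steps above amount to a short inference loop. The sketch below is illustrative pseudocode: `VideoFlexTok.from_pretrained`, `tokenize`, `detokenize`, and the `load_video` helper are assumed names, not the repo's confirmed interface; consult the official notebook for exact usage.

```python
# Illustrative pseudocode for the steps above; API names are assumptions.
import torch

model = VideoFlexTok.from_pretrained("videoflextok_d18_d28")  # 3. hypothetical loader (checkpoint name from the review)
frames = load_video("clip.mp4")               # 4. hypothetical helper -> uint8 frames (T, H, W, C)
video = frames.float() / 127.5 - 1.0          # normalize to [-1, 1], the range the models expect
tokens = model.tokenize(video)                # 5. video -> discrete token sequence
recon = model.detokenize(tokens,              # 6. flow-matching decoder: tokens -> video
                         guidance_scale=2.0,  # classifier-free guidance strength (illustrative value)
                         timesteps=25)        # sampling steps (illustrative value)
```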

AI-Generated Review

What is ml-videoflextok?

VideoFlexTok lets you tokenize videos into discrete tokens using flexible-length coarse-to-fine video tokenization, then reconstruct them via a flow-matching decoder. Load pretrained models straight from Hugging Face, feed in video tensors normalized to [-1,1], and get token sequences—or reverse it to generate videos with guidance scale and timesteps. Built in Python with PyTorch and Diffusers, it includes a Jupyter notebook for instant inference on videos at 256x256 or 128x128 resolution.
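The [-1, 1] input normalization mentioned above is easy to show concretely. A minimal sketch, assuming uint8 frames; the `normalize_video` helper name is my own, not from the repo:

```python
import numpy as np

def normalize_video(frames: np.ndarray) -> np.ndarray:
    """Map uint8 frames in [0, 255] to float32 in [-1, 1],
    the input range the tokenizer expects."""
    return frames.astype(np.float32) / 127.5 - 1.0

frames = np.array([0, 128, 255], dtype=np.uint8)
print(normalize_video(frames))  # [-1.0, ~0.0039, 1.0]
```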

Why is it gaining traction?

Unlike rigid video tokenizers capped at fixed frames, VideoFlexTok chunks arbitrary-length videos with smart overlap for seamless reconstruction, avoiding artifacts in long clips. Developers dig the dead-simple API: one line to load models, automatic sliding-window tokenization, and tunable detokenization with classifier-free guidance. Backed by an arXiv paper from EPFL and Apple researchers, it slots into multimodal pipelines without custom hacks.
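The sliding-window chunking described above comes down to index arithmetic. A minimal sketch; the window and overlap sizes are illustrative, since the repo's actual chunk parameters aren't stated here:

```python
def chunk_indices(num_frames: int, window: int, overlap: int):
    """Split a frame range into overlapping windows of size `window`.
    The final window is shifted back so it ends exactly at the last
    frame, guaranteeing full coverage without padding."""
    if num_frames <= window:
        return [(0, num_frames)]
    stride = window - overlap
    chunks = []
    start = 0
    while start + window < num_frames:
        chunks.append((start, start + window))
        start += stride
    chunks.append((num_frames - window, num_frames))
    return chunks

print(chunk_indices(100, 32, 8))  # [(0, 32), (24, 56), (48, 80), (68, 100)]
```

The overlapping regions give the decoder shared context across chunk boundaries, which is what avoids seams in long reconstructions.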

Who should use this?

Video generation engineers building autoregressive models or VLLMs that need efficient tokenization for training/inference on diverse clips. Researchers extending transformers to video data, especially those hitting memory walls with long sequences. Anyone prototyping with Hugging Face models like videoflextok_d18_d28 for compression or editing tasks.

Verdict

Grab it if video tokenization fits your stack: the solid notebook and Hugging Face integration make experimentation fast. With just 20 stars it's early but promising; expect refinements as adoption grows.


