FeiElysia / Tempo

Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding

100% credibility
Found Apr 13, 2026 at 45 stars.
Python
AI Summary

Tempo is a research tool that intelligently compresses long videos based on the user's question, enabling efficient question answering and analysis with multimodal AI models.

How It Works

1
🔍 Discover Tempo

You stumble upon Tempo, a clever tool that lets you ask questions about hour-long videos and get smart, precise answers without watching everything.

2
📥 Get Tempo Ready

You clone the free repository and download the model checkpoints so it's all set up on your computer.

3
🚀 Start the Web App

With one simple command, you launch a Gradio web app right in your browser to play with your videos.

4
📹 Upload Your Video

You drag in a long video clip, like a game recording or event footage, and type your burning question about it.

5
🧠 Get Magic Insights

Tempo smartly focuses on the key moments, shows a cool chart of what it paid attention to, and delivers spot-on answers with exact timestamps.

🎉 Unlock Video Secrets

You now understand your video deeply—summaries, details, and answers at your fingertips—saving hours of manual review.
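The workflow above boils down to query-aware frame selection: score every frame against the question and keep only the best-matching moments. Here is a toy sketch of that idea — none of it is Tempo's actual code; the random vectors stand in for CLIP-style frame/question embeddings, and `select_key_frames` is a hypothetical helper:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_key_frames(frame_feats, query_feat, keep=4):
    """Toy query-aware selection: score each frame against the question
    embedding and keep the `keep` best-matching timestamps, in order."""
    scores = [cosine(f, query_feat) for f in frame_feats]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]
    return sorted(top), scores

rng = random.Random(0)
# One synthetic 32-dim "frame embedding" per second of a 60 s clip.
frames = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(60)]
# Fake a question whose embedding is closest to second 27 of the video.
query = [x + 0.1 * rng.gauss(0, 1) for x in frames[27]]

kept, scores = select_key_frames(frames, query)
print(kept)  # second 27 ranks among the kept timestamps
```

The kept indices double as timestamps (one frame per second here), which is how answers can come back with exact time references.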

AI-Generated Review

What is Tempo?

Tempo compresses hour-long videos into query-aware token streams for multimodal LLMs, skipping redundant frames while focusing on key moments. Upload a video via Gradio UI or CLI scripts like `infer.py`, ask questions like "What happens at 1:27?", and get precise answers with timestamped reasoning—all in Python with PyTorch and Hugging Face models. It solves the "lost-in-the-middle" problem for long video QA without exploding context windows.
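As an illustration of what "timestamped reasoning" could look like at the prompt level, here is a toy assembly step that pairs each kept frame with its timestamp before the question. The format and helper are hypothetical, not Tempo's documented interface:

```python
def build_prompt(question, kept_seconds):
    """Pair each selected frame with an mm:ss timestamp so the model can
    cite explicit times in its answer, then append the question."""
    lines = [f"[{s // 60:02d}:{s % 60:02d}] <frame_tokens>" for s in kept_seconds]
    return "\n".join(lines) + f"\nQuestion: {question}"

prompt = build_prompt("What happens at 1:27?", [80, 85, 87, 90])
print(prompt)
```

Interleaving timestamps with frame tokens is one straightforward way to let the model ground its answer at "1:27" instead of a vague "later in the video".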

Why is it gaining traction?

Tempo-6B crushes benchmarks like LVBench (52.3 at 8K tokens) using just 0.5-16 tokens per frame, beating GPT-4o and Gemini 1.5 Pro on extreme-length videos. Devs love the interactive UI visualizing dynamic token allocation, plus batch scripts for demos, with no more uniform frame sampling wasting compute. In effect it's a smart compressor that prioritizes user intent over brute force.
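The "0.5-16 tokens per frame" figure points at dynamic token allocation: spending more of a fixed token budget on question-relevant frames and less on filler. A minimal sketch of that idea — not Tempo's actual allocator; the relevance scores and the clamp range below are made up for illustration:

```python
def allocate_tokens(scores, budget, lo=1, hi=16):
    """Split a fixed token budget across frames in proportion to each
    frame's relevance score, clamped to [lo, hi] tokens per frame."""
    total = sum(scores)
    return [max(lo, min(hi, round(budget * s / total))) for s in scores]

# Six frames; the third and fourth are most relevant to the question.
scores = [0.05, 0.05, 0.9, 0.6, 0.05, 0.05]
alloc = allocate_tokens(scores, budget=32)
print(alloc)  # relevant frames get many tokens, filler frames get the floor
```

Sub-one budgets like the paper's 0.5 tokens per frame would need an extra trick (e.g. merging adjacent filler frames before encoding), which this sketch deliberately skips.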

Who should use this?

Video researchers benchmarking long-context VLMs, or devs building apps for surveillance footage QA, tutorial analysis, or sports highlights (e.g., "Count performers during fireworks"). Ideal for teams that need efficient long-video inference on consumer GPUs.

Verdict

Early but solid for inference: at 45 stars with training/eval code still pending, it's clearly fresh. Star it now if long video understanding is your jam; skip it for production until the full release.


