MAC-AutoML/SocialOmni

Benchmarking Audio-Visual Social Interactivity in Omni Models

AI Summary

SocialOmni is a benchmark for testing AI models on identifying speakers, timing interruptions, and generating responses in multi-person video conversations.
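To make that concrete, a single benchmark item presumably pairs a video clip with labels for each of those three skills. The sketch below is purely illustrative; every field name is an assumption, not the repo's actual annotation schema.

```python
# Hypothetical benchmark item -- field names are illustrative assumptions,
# not SocialOmni's actual annotation schema.
example_item = {
    "video": "clips/dinner_party_03.mp4",     # a multi-party conversation clip
    "speaker_at": {"t": 12.4, "label": "B"},  # who is speaking at t = 12.4 s
    "interrupt_window": [31.0, 31.4],         # acceptable moment to jump in
    "reference_response": "Sorry to cut in, but did she actually agree?",
}
```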

How It Works

1
🔍 Discover SocialOmni

You hear about this fun way to test AI assistants on understanding real conversations in videos, like who is talking and when to join in.

2
📥 Download everything

Clone the repo and pull in everything it needs with one easy uv sync command, just like downloading a game.

3
🔧 Set up your AI friends

Tell the benchmark which models to use, like GPT or Gemini, by filling in a quick API settings file (a hypothetical sketch follows after this walkthrough).

4
📹 Add video clips

Put conversation videos in a folder so the tests can watch real people chatting.

5
🚀 Run your first test

Launch run_benchmark.py to see how well your AI spots speakers and times its responses.

6
📊 See the scores

Get clear reports on what your AI nailed and where it can improve in social chats (a sketch of reading such a report also follows below).

🎉 Master social AI testing

Now you know which AI is best at joining conversations naturally, ready for more fun tests!
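For step 3, the model config might look roughly like the following. This is a minimal sketch in Python form; the endpoint URLs and key names are assumptions, and the repo's docs define the real format.

```python
# Hypothetical API config for step 3 -- the real keys and file format are
# defined by the repo; everything here is an illustrative assumption.
MODELS = {
    "gpt-4o": {
        "api_base": "https://api.openai.com/v1",  # assumed endpoint
        "api_key": "sk-...",                       # fill in your own key
    },
    "gemini-3-pro": {
        "api_base": "https://generativelanguage.googleapis.com",  # assumed
        "api_key": "...",
    },
}
```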
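For step 6, a few lines of Python are enough to skim a report once a run finishes. The results path and JSON layout below are assumptions for illustration, not the benchmark's documented output format.

```python
# Hypothetical report reader for step 6 -- the path and JSON layout are
# assumptions; the benchmark's actual output format may differ.
import json

with open("results/report.json") as f:
    report = json.load(f)

# e.g. {"speaker_id": 0.91, "interruption_timing": 0.42, "response_quality": 0.67}
for task, score in sorted(report.items()):
    print(f"{task}: {score:.1%}")
```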

AI-Generated Review

What is SocialOmni?

SocialOmni is a Python benchmarking suite for testing audio-visual social interactivity in omni models like GPT-4o and Qwen3-Omni. It evaluates who is speaking at a given timestamp, when to interrupt in multi-party videos, and how to generate natural responses, going beyond static QA to real dialogue behavior. Developers get reproducible CLI pipelines (run_benchmark.py for perception, run_benchmark_level2.py for full interaction) plus leaderboards that expose gaps generic AI and LLM benchmarking tools miss.
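A minimal way to drive both pipelines from Python, assuming a standard CLI: the two script names come from the repo, but every flag below is a guess, so check the README for the real arguments.

```python
# Sketch of running both pipelines in sequence. The script names are real;
# the flags are assumptions -- consult the repo's README for the actual CLI.
import subprocess

for script in ("run_benchmark.py", "run_benchmark_level2.py"):
    subprocess.run(
        ["python", script, "--model", "gpt-4o", "--videos", "data/videos"],
        check=True,  # stop if a pipeline fails
    )
```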

Why is it gaining traction?

Unlike generic LLM benchmarks focused on text, SocialOmni handles untrimmed videos with audio-visual cues, measuring perceptual robustness, timing precision (e.g., 0.2 s windows), and LLM-judged response quality. It uncovers "rank inversions": models that ace speaker ID can flop on interruptions, which makes it useful for probing the test-time robustness of audio-visual recognition models and related cross-domain tasks. Quick setup with uv sync and API configs hooks developers chasing omni-model limits.
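The timing criterion is easy to picture. A minimal sketch, assuming the 0.2 s window mentioned above acts as a symmetric tolerance around each ground-truth interruption point; this illustrates the idea, not the repo's actual metric code.

```python
# Illustrative interruption-timing score: a prediction counts as a hit if it
# lands within +/-0.2 s of a ground-truth timestamp. Sketch only, assuming a
# symmetric window; not SocialOmni's actual implementation.

def timing_accuracy(predicted: list[float], truth: list[float], window: float = 0.2) -> float:
    """Fraction of ground-truth interruption points matched by a prediction."""
    hits = sum(
        any(abs(p - t) <= window for p in predicted)
        for t in truth
    )
    return hits / len(truth) if truth else 0.0

# Example: two of three ground-truth moments are matched within the window.
print(timing_accuracy([4.1, 12.55, 30.0], [4.0, 12.5, 21.0]))  # ~0.67
```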

Who should use this?

AI researchers fine-tuning omni models for social agents or video chatbots. ML engineers at startups building audio-visual interactivity features, like real-time speaker diarization in meetings. Teams auditing proprietary models against baselines like Gemini 3 Pro on SocialOmni tasks.

Verdict

Grab it for targeted audio-visual benchmarking if you're in omni research: results are paper-backed (arXiv:2603.16859) and extensible via model servers. At 44 stars, it's early-stage with solid docs but light on tests; fork and contribute for production polish.


