kingofspace0wzz

AgentSocialBench is the first benchmark for evaluating privacy preservation in human-centered agentic social networks — settings where teams of AI agents serve individual users across multiple domains, coordinate on shared tasks, and must protect sensitive personal information throughout.

18 stars · Found Apr 08, 2026
Language: Python

AI Summary

AgentSocialBench is a benchmark for testing privacy protection in AI agent teams that coordinate social tasks on behalf of users across various interaction types.

How It Works

1
🔍 Discover AgentSocialBench

You discover the benchmark through its repository or the accompanying research paper while looking for ways to test AI privacy in multi-agent group chats.

2
📖 Explore examples

You read example scenarios of AI assistants coordinating plans like hikes or dinners, and see exactly where privacy slips occur.

3
⚙️ Set it up easily

You connect an LLM backend on your machine so the agents can converse naturally.

4
🎯 Create test chats

You pick social situations like family planning or job discussions, then watch the agents coordinate while guarding sensitive details.

5
📊 Review privacy scores

You get clear reports on what information was over-shared and how well tasks completed without leaks.

6
🏆 Compare AI helpers

Charts show which models best balance effective communication with keeping information safe.

Build safer agent teams

You now know how to evaluate AI teams that coordinate freely while protecting personal details.
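The privacy-scoring idea behind these steps can be sketched in a few lines of Python. This is an illustrative sketch only, not AgentSocialBench's actual API: it flags any secret string that appears verbatim in the messages an agent sends to other parties and reports the fraction that leaked.

```python
# Hypothetical sketch of leakage scoring: flag secrets that appear
# verbatim in messages an agent sent to third parties.
# Function and data shapes are illustrative, not the benchmark's API.

def leakage_rate(secrets, transcript):
    """Fraction of secrets that leaked into the shared transcript.

    secrets: sensitive strings the agent must not reveal
    transcript: messages the agent sent to other participants
    """
    if not secrets:
        return 0.0
    joined = " ".join(transcript).lower()
    leaked = [s for s in secrets if s.lower() in joined]
    return len(leaked) / len(secrets)

secrets = ["salary is 120k", "diagnosed with asthma"]
transcript = [
    "Alice is free on Saturday for the hike.",
    "She mentioned her salary is 120k, so the trip budget is fine.",
]
print(leakage_rate(secrets, transcript))  # 0.5: one of two secrets leaked
```

A real evaluator would go beyond substring matching (e.g., detecting paraphrased or abstracted disclosures), which is exactly the harder problem the benchmark's abstraction-quality metrics target.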


AI-Generated Review

What is agentsocialbench?

AgentSocialBench benchmarks privacy preservation in human-centered agentic social networks: settings where teams of AI agents serve individual users across multiple domains, coordinate on shared tasks, and must protect sensitive personal information throughout. Developers get 352 scenarios spanning 7 categories, such as cross-domain coordination and group chats, plus a full pipeline to generate data, simulate interactions with various LLMs, evaluate leakage and task completion, and produce leaderboards and plots. Built in Python, it supports quick starts via CLI commands for simulation and analysis on backbones like Claude and GPT.

Why is it gaining traction?

It stands out as the only benchmark tackling privacy risks in multi-agent coordination, with built-in defenses like zero-disclosure prompting and metrics for abstraction quality that reveal counterintuitive leaks like the "Abstraction Paradox." The end-to-end workflow—generate scenarios, run simulations across privacy levels, auto-evaluate, and plot heatmaps or radars—saves weeks of setup, while the leaderboard ranks models on real-world leakage rates.
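The end-to-end workflow described above (run scenarios per model, record leakage and task completion, rank) has roughly this shape. All names here are hypothetical placeholders for illustration; the repo's actual CLI and API may differ.

```python
# Hypothetical shape of the simulate -> evaluate -> rank workflow.
# run_scenario stands in for a real simulation call; here it just
# reads canned (leaked, completed) outcomes from the scenario dict.

from statistics import mean

def run_scenario(model, scenario):
    # Returns (leaked: 0/1, completed: 0/1) for this model on this scenario.
    return scenario["expected"][model]

def leaderboard(models, scenarios):
    rows = []
    for m in models:
        results = [run_scenario(m, s) for s in scenarios]
        rows.append({
            "model": m,
            "leakage_rate": mean(r[0] for r in results),
            "completion_rate": mean(r[1] for r in results),
        })
    # Rank by lowest leakage first, then highest task completion.
    return sorted(rows, key=lambda r: (r["leakage_rate"], -r["completion_rate"]))

scenarios = [
    {"expected": {"model-a": (0, 1), "model-b": (1, 1)}},
    {"expected": {"model-a": (0, 0), "model-b": (0, 1)}},
]
board = leaderboard(["model-a", "model-b"], scenarios)
print(board[0]["model"])  # model-a leaks less, so it ranks first
```

The interesting design question, which the benchmark's "Abstraction Paradox" finding speaks to, is that leakage and completion trade off: ranking purely on leakage rewards agents that say nothing useful, so both axes must be reported together.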

Who should use this?

AI researchers benchmarking LLMs for agentic apps, especially those building social networks or multi-domain coordinators worried about user data leaks. Teams evaluating models like Claude Sonnet before deploying in group chats or cross-user protocols. Privacy engineers testing defenses in dyadic or multi-party agent teams.

Verdict

Grab it if you're in agent privacy research—solid pipeline and fresh insights despite low maturity (18 stars, 1.0% credibility). Early stage means sparse tests, but arXiv paper and docs make it usable now; watch for community expansions.


