VibeBench / VibeSearchBench
The hardest search benchmark in the wild: vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.
VibeSearchBench is a research benchmark that tests AI assistants on complex, multi-turn research tasks where users gradually reveal their information needs. It evaluates how well AI can search the web, ask follow-up questions, and produce accurate structured knowledge graphs.
How It Works
You learn about a benchmark that tests how well AI assistants handle real research tasks where users don't say everything upfront.
You set up your AI model and give it tools to search the web, visit pages, and run calculations.
Your AI receives a vague question and begins searching, while a simulated user gradually reveals more details about what they really need.
The AI talks back and forth with a simulated user, asking follow-up questions to clarify needs.
The AI does all its research on its own before presenting final results.
The AI searches the web, visits relevant pages, and runs code to find answers, asking clarifying questions along the way.
The system compares what your AI found against the correct answers and shows you detailed scores for accuracy and completeness.
Your AI has gathered comprehensive information and you can see exactly how accurate and complete its findings were.
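The scoring step can be illustrated with a minimal sketch of triplet F1. This is not the benchmark's actual scorer: the function names `normalize` and `triplet_f1`, the example triplets, and the matching rule (exact match after lowercasing and whitespace cleanup) are all assumptions made for illustration. A schema-free evaluation would likely need a more lenient way of matching triplets than string equality.

```python
from typing import Dict, Iterable, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def normalize(triplet: Triplet) -> Triplet:
    """Lowercase each part and collapse whitespace (an assumed matching rule)."""
    subject, relation, obj = (" ".join(part.lower().split()) for part in triplet)
    return (subject, relation, obj)


def triplet_f1(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> Dict[str, float]:
    """Compute precision, recall, and F1 over sets of normalized triplets."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    matched = len(pred & ref)
    # Precision: share of predicted triplets that are correct (accuracy).
    precision = matched / len(pred) if pred else 0.0
    # Recall: share of gold triplets that were found (completeness).
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example data, not taken from the benchmark.
gold = [
    ("Marie Curie", "born in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "won", "Nobel Prize in Chemistry"),
    ("Marie Curie", "spouse", "Pierre Curie"),
]
predicted = [
    ("Marie Curie", "born in", "Warsaw"),
    ("marie curie", "won", "Nobel  Prize in Physics"),
    ("Marie Curie", "born in", "Paris"),
]

scores = triplet_f1(predicted, gold)
print(scores)
```

In this example two of the three predicted triplets match the gold set, so precision is 2/3, recall is 2/4, and F1 is 4/7. Precision corresponds to the accuracy score described above and recall to completeness.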