mem-eval-suite / LoCoMo_refined
LoCoMo Refined: Recalibrating LoCoMo with stricter LLM judging and a cleaned dataset.
LoCoMo Refined is an enhanced benchmark for rigorously testing AI agents' ability to retain and recall details like times, events, relationships, and preferences from extended conversations.
How It Works
You learn about LoCoMo Refined, a benchmark for checking whether your AI assistant remembers details from very long conversations.
You download the prepared set of realistic conversation scenarios and memory questions to use as your test set.
You replay the long conversations with your AI and collect its answers to the memory questions about times, events, and preferences.
You feed your AI's answers into the LLM judge, which compares them against the reference answers under strict rules.
You get a report with accuracy scores, plus a breakdown of what went right or wrong in time recall, facts, and other categories.
With these results, you adjust your chat system so it remembers better over long conversations.
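The judging and reporting steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the record fields (`category`, `question`, `gold`, `prediction`), the function names, and the exact-match judge are all assumptions made for the example. In the real benchmark the judge is an LLM, so `strict_match_judge` here is only a stand-in with the same pass/fail shape.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical record layout; the real dataset schema may differ.
Record = Dict[str, str]  # keys: "category", "question", "gold", "prediction"
Judge = Callable[[str, str, str], bool]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting does not matter."""
    return " ".join(text.lower().split())


def strict_match_judge(question: str, gold: str, prediction: str) -> bool:
    """Stand-in for the LLM judge: passes only on a normalized exact match."""
    return normalize(gold) == normalize(prediction)


def score(records: List[Record], judge: Judge = strict_match_judge) -> Dict[str, float]:
    """Return accuracy per question category plus an "overall" entry."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        if judge(rec["question"], rec["gold"], rec["prediction"]):
            correct[rec["category"]] += 1

    report = {cat: correct[cat] / totals[cat] for cat in totals}
    n = sum(totals.values())
    report["overall"] = sum(correct.values()) / n if n else 0.0
    return report


if __name__ == "__main__":
    # Toy answers, invented for illustration only.
    sample = [
        {"category": "temporal", "question": "When did they meet?",
         "gold": "May 2023", "prediction": "may 2023"},
        {"category": "temporal", "question": "When was the trip?",
         "gold": "June 2023", "prediction": "July 2023"},
        {"category": "preference", "question": "Favorite drink?",
         "gold": "green tea", "prediction": "Green  tea"},
    ]
    for name, acc in score(sample).items():
        print(f"{name}: {acc:.2f}")
```

Passing the judge in as a callable keeps the scoring loop unchanged if the exact-match placeholder is swapped for a function that calls an LLM and returns a pass/fail verdict.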