yaojingang

A dataset and analysis pipeline for studying how AI search engines select and use citations.

45
12
100% credibility
Found Apr 21, 2026 at 45 stars.
AI Analysis
Python
AI Summary

A data-driven research collection examining AI search platforms' citation patterns, source preferences, and content absorption from real queries across ChatGPT, Google AI Overview, and Perplexity.

How It Works

1
🔍 Discover the research

You stumble upon this project on GitHub or its live demo site while searching for tips on getting your content noticed by AI search tools.

2
📖 Read the quick summary

Start with the 3-minute overview to quickly grasp the key findings about what makes websites get cited by AI like ChatGPT or Google.

3
📊 Explore the full report

Dive into the beautiful charts and detailed analysis in the HTML or PDF report, seeing real examples of AI search behaviors.

4
📈 Check the raw numbers

Open the data spreadsheets to filter and explore thousands of real AI responses and page features yourself.

5
💡 Learn actionable tips

Pick up practical advice on page length, structure, and content types that boost your chances of deep AI absorption.

🚀 Optimize your content

Apply the insights to make your website more AI-friendly, watching your pages get truly used instead of just listed.

AI-Generated Review

What is geo-citation-lab?

This repo packages a dataset and Python pipeline to probe how AI search engines like ChatGPT, Gemini, and Perplexity select citations. It runs 602 prompts across prompt styles, languages, and scenarios, capturing 21k cleaned citation records with 72 features such as page structure, semantic alignment, and influence scores. Users can download the CSVs for their own analysis, plus reports in HTML, Markdown, and PDF, to explore generative engine optimization (GEO) patterns without rerunning everything.
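As a minimal sketch of what exploring the CSVs might look like, the snippet below counts citation records per platform using only the standard library. The column names (`platform`, `domain`, `influence_score`) and the sample rows are assumptions made up for illustration; the repo's actual schema may differ.

```python
import csv
import io
from collections import Counter

# Toy rows invented for illustration only; they are NOT taken from the dataset,
# and the real CSV column names may differ.
SAMPLE_CSV = """platform,domain,influence_score
chatgpt,example.com,0.8
perplexity,example.org,0.4
chatgpt,example.org,0.6
"""


def citations_per_platform(csv_file):
    """Count citation records per platform (hypothetical 'platform' column)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["platform"] for row in reader)


counts = citations_per_platform(io.StringIO(SAMPLE_CSV))
print(counts.most_common())
```

To run it against a downloaded file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the `io.StringIO` object.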

Why is it gaining traction?

It delivers a reproducible snapshot of real AI behaviors: trigger rates, source preferences (e.g., US/English dominance), and what boosts deep absorption (longer pages with numbers and definitions). Python scripts handle extraction, fetching, and analysis, making it a ready-made CSV dataset for quick experiments, unlike vague GEO blog posts. The 3-minute summary hooks developers who want to analyze LLM citations in Python.
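The page traits mentioned above (length, numbers, definitions) can be approximated with a few simple text features. This is a rough sketch under stated assumptions, not the repo's own feature extraction: the function name `page_features` and the definition phrases it looks for are hypothetical.

```python
import re


def page_features(text):
    """Compute crude page features: word count, numeric tokens, definition phrases."""
    words = re.findall(r"[A-Za-z']+", text)
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    # Assumed heuristic: a small set of phrases that often introduce a definition.
    definitions = re.findall(
        r"\b(?:is defined as|refers to|means)\b", text, flags=re.IGNORECASE
    )
    return {
        "word_count": len(words),
        "number_count": len(numbers),
        "definition_count": len(definitions),
    }


sample = "GEO refers to generative engine optimization. The study ran 602 prompts."
features = page_features(sample)
print(features)
```

Features like these could then be compared between pages that are deeply absorbed and pages that are merely listed, though the thresholds that matter would have to come from the dataset itself.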

Who should use this?

SEO specialists tuning content for AI visibility; AI researchers dissecting citation biases; analysts who need LLM citation baselines for custom pipelines.

Verdict

Worth forking for the CSVs and analysis scripts if GEO matters. The polished docs and reports punch above 45 stars, but the 1.0% credibility score flags low maintenance. A good starter dataset repo, not a production tool.
