digitalcortex / 72m-domains-dataset
Dataset with unique registered domains extracted from Common Crawl's columnar index (cc-index).
A dataset of 72 million unique registered domains extracted from Common Crawl indexes across 14 crawls, intended as a starting point for web crawlers and research.
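The repository's own extraction pipeline is not shown here, so the following is only a minimal sketch of the general idea: merging registered domains from several crawls into one de-duplicated stream. It assumes each crawl is available as an iterable of registered-domain strings (for instance, values read from the cc-index `url_host_registered_domain` column, if that is the field the dataset used). The names `normalize` and `unique_domains` and the normalization rules (lowercasing, dropping a trailing dot) are illustrative assumptions, not the author's method.

```python
from typing import Iterable, Iterator


def normalize(domain: str) -> str:
    """Assumed normalization: trim whitespace, drop a trailing dot, lowercase."""
    return domain.strip().rstrip(".").lower()


def unique_domains(crawls: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield each registered domain once, in first-seen order, across all crawls."""
    seen: set[str] = set()
    for crawl in crawls:
        for raw in crawl:
            domain = normalize(raw)
            # Skip empty entries and anything already emitted by an earlier crawl.
            if domain and domain not in seen:
                seen.add(domain)
                yield domain
```

Holding tens of millions of strings in an in-memory set needs several gigabytes of RAM; at that scale a SQL `SELECT DISTINCT` over the columnar index, or an external sort, may be the more practical choice.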
How It Works
You hear about a massive collection of real website domains pulled from large-scale web crawls, well suited to seeding web research or crawlers.
You visit the simple project page to see what it's all about.
You discover it gathers unique domains from 14 web scans spanning over a year.
You grab the handy file packed with 72 million unique website domains.
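You load the list into your own tooling to start browsing; at 72 million rows it is far beyond what a typical spreadsheet can open, so a script or database is the safer choice.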
With this giant seed list, your web research or crawler project springs to life.
You now explore millions of real sites, fueling your discoveries with ease.
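The steps above can be sketched in code. A list of 72 million domains is easier to stream than to load whole, so this example draws a fixed-size random sample of seed domains in one pass using reservoir sampling. It assumes the downloaded file is plain text with one domain per line, which the page does not confirm; the function name `sample_seeds` and the file name in the usage note are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_seeds(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k domains chosen uniformly at random from a stream of lines."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    seen = 0  # number of non-empty domains read so far
    for line in lines:
        domain = line.strip()
        if not domain:
            continue
        seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k domains.
            reservoir.append(domain)
        else:
            # Replace a random slot with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = domain
    return reservoir
```

A possible call, assuming the file is saved locally as `domains.txt`: `with open("domains.txt", encoding="utf-8") as f: seeds = sample_seeds(f, 1000)`. Memory use stays proportional to `k` rather than to the size of the file.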