mdowis / anansi

A self-healing web scraper built for hostile sites: selectors repair themselves, browser rendering kicks in when needed, and Chrome TLS fingerprinting evades bot detection. Ships with an MCP server so any LLM can drive a full crawl through conversation.

49 stars
13
69% credibility
Found May 17, 2026 at 69 stars.
AI Analysis
Python
AI Summary

Anansi is an adaptive web scraping framework that automatically extracts data from websites and learns to handle site changes over time. When websites block automated access or require JavaScript rendering, Anansi intelligently switches to browser-based fetching and mimics real browser fingerprints to continue collecting data. The tool includes a self-healing parser that remembers successful extraction methods and adapts when sites change their layout. It also ships with an MCP server that allows AI assistants to control web scraping through conversational commands, enabling research agents to autonomously gather data from the web.

How It Works

1
🔍 You need data from a website

You discover Anansi when you need to regularly extract product prices, articles, or listings from websites that keep changing their layout.

2
🧠 Your scraper learns as it goes

Instead of breaking when a site redesigns, Anansi automatically finds your data using multiple strategies and remembers what worked for next time.

3
🛡️ It handles the tough sites automatically

When a site blocks automated access, Anansi switches to a real browser, waits out security checks, and mimics real browser fingerprints—all without you lifting a finger.

4
Choose how to use it
📝 Quick script

Write a simple Python script that extracts exactly the fields you need from any page.

🤖 AI assistant

Connect to an AI through a chat interface and ask it to research topics, gather data, and report back.

5
⏸️ Pause and resume anytime

Running a large crawl? You can pause it mid-way and pick up exactly where you left off—even days later after a restart.

6
📊 Get clean, validated data

Your extracted data is automatically validated, deduplicated, and exported in formats ready for spreadsheets or databases.

Your data is ready

You have structured, reliable data from websites that would normally block automated access—all while Anansi got smarter handling them.
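
The pause-and-resume step above can be illustrated with a small checkpointing loop. This is a minimal sketch using only the standard library, not Anansi's actual implementation: the function names (`save_checkpoint`, `load_checkpoint`, `crawl`), the JSON file layout, and the `visit` callback are all hypothetical and chosen for illustration.

```python
import json
import os
import tempfile
from collections import deque


def save_checkpoint(path, pending, seen):
    """Atomically write the crawl frontier and the visited set to disk."""
    state = {"pending": list(pending), "seen": sorted(seen)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # os.replace swaps the file in one step, so a crash never leaves it half-written.
    os.replace(tmp, path)


def load_checkpoint(path, seeds):
    """Resume from a checkpoint if one exists, otherwise start from the seeds."""
    if os.path.exists(path):
        with open(path) as handle:
            state = json.load(handle)
        return deque(state["pending"]), set(state["seen"])
    return deque(seeds), set(seeds)


def crawl(path, seeds, visit, max_pages):
    """Process up to max_pages URLs, checkpointing after each one.

    `visit` is a hypothetical callback that fetches a URL and returns its links.
    """
    pending, seen = load_checkpoint(path, seeds)
    done = []
    while pending and len(done) < max_pages:
        url = pending.popleft()
        for link in visit(url):
            if link not in seen:
                seen.add(link)
                pending.append(link)
        done.append(url)
        save_checkpoint(path, pending, seen)
    return done
```

Because the checkpoint is written only after a page has been fully processed, a crash in the middle of a page means that page is retried on the next run rather than lost.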

AI-Generated Review

What is anansi?

Anansi is a Python web scraping framework that survives hostile sites. While most scrapers break the moment a site redesigns its HTML, Anansi repairs broken CSS selectors automatically and remembers the fix for next time. It ships with a self-healing parser that scores selectors by confidence, runs four healing strategies when one breaks, and persists the winner to a local database. The framework also silently upgrades to a headless browser when a page needs JavaScript rendering, and mimics Chrome's TLS fingerprint to slip past bot detection systems like Cloudflare and Akamai. For AI integration, it includes an MCP server that exposes twelve scraping tools, so any LLM can drive a full crawl through conversation.
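
To make the selector-memory idea concrete, here is a simplified sketch of "try the remembered selector first, fall back to alternatives, persist the winner". It is not Anansi's API and it does not reproduce the confidence scoring or the four healing strategies mentioned above. The candidate list, the `winners` table, and the function names are assumptions, and named regular expressions stand in for CSS selectors so the example needs only the standard library.

```python
import re
import sqlite3

# Hypothetical stand-ins for selectors: each candidate is a named regex.
# A real scraper would use CSS or XPath selectors instead.
CANDIDATES = {
    "price": [
        ("class-price", r'<span class="price">([^<]+)</span>'),
        ("data-attr", r'<span data-price="[^"]*">([^<]+)</span>'),
        ("itemprop", r'<[^>]+itemprop="price"[^>]*>([^<]+)<'),
    ]
}


def open_memory(path=":memory:"):
    """Create the table that stores the last working selector per site and field."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS winners ("
        "site TEXT, field TEXT, name TEXT, PRIMARY KEY (site, field))"
    )
    return db


def extract(db, site, field, html):
    """Try the remembered selector first, then fall back and persist the winner."""
    row = db.execute(
        "SELECT name FROM winners WHERE site = ? AND field = ?", (site, field)
    ).fetchone()
    remembered = row[0] if row else None
    # Stable sort: the remembered candidate moves to the front, others keep order.
    ordered = sorted(CANDIDATES[field], key=lambda c: c[0] != remembered)
    for name, pattern in ordered:
        match = re.search(pattern, html)
        if match:
            db.execute(
                "INSERT OR REPLACE INTO winners VALUES (?, ?, ?)",
                (site, field, name),
            )
            db.commit()
            return match.group(1).strip(), name
    return None, None
```

Passing a file path to `open_memory` instead of the default `":memory:"` would keep the remembered selectors across runs, which is the point of persisting them.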

Why is it gaining traction?

The self-healing selector memory is the hook. Developers who maintain scrapers over months know the pain: a site changes its class names, the scraper returns nulls, and someone has to manually find the new selector. Anansi treats this as solved by default. The MCP server is a timely addition too -- as agents become first-class citizens in developer workflows, having a scraping tool that plays nicely with Claude, ChatGPT, or any tool-calling agent via stdio or SSE transport is genuinely useful. The TLS fingerprint impersonation via curl-cffi and the graduated Akamai escalation ladder address real-world obstacles that basic HTTP fetchers hit immediately.
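
A graduated escalation ladder of the kind described here can be sketched as an ordered list of fetch tiers, each tried only when the previous one looks blocked. The sketch below is an assumption about the general shape, not Anansi's code: `Response`, `looks_blocked`, and `fetch_with_escalation` are hypothetical names, and the block-detection heuristic is deliberately crude. In practice the tiers might wrap a plain HTTP client, a TLS-impersonating client such as curl-cffi, and a headless browser.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Response:
    status: int
    body: str


# A fetcher takes a URL and returns a Response. Each tier supplies its own.
Fetcher = Callable[[str], Response]


def looks_blocked(resp: Response) -> bool:
    """Crude heuristic (an assumption): block status codes or a challenge page."""
    return resp.status in (403, 429, 503) or "challenge" in resp.body.lower()


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[Optional[str], Optional[Response]]:
    """Try each tier in order; return the first tier whose response passes."""
    for name, fetch in tiers:
        resp = fetch(url)
        if not looks_blocked(resp):
            return name, resp
    # Every tier was blocked.
    return None, None
```

Ordering the tiers from cheapest to most expensive means the headless browser is only started for pages that actually need it.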

Who should use this?

Backend engineers building data pipelines on sites that actively resist scraping will get the most value. It also suits teams running long crawls that must survive site redesigns without manual intervention, AI developers building agents that need reliable web access, and data engineers who want incremental re-crawls that skip unchanged pages using ETag and content hashing. It is not meant for simple one-off page fetches -- use httpx directly for that.
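
Incremental re-crawling with ETag and content hashing generally works in two layers: send the stored ETag in an `If-None-Match` header so the server can answer 304 Not Modified, and compare a hash of the body for servers whose ETags are missing or unreliable. The following is a minimal sketch of that logic under stated assumptions: the in-memory `cache` dict and both function names are hypothetical and do not come from Anansi.

```python
import hashlib


def conditional_headers(cache, url):
    """Build an If-None-Match header from the ETag stored for this URL, if any."""
    entry = cache.get(url)
    if entry and entry.get("etag"):
        return {"If-None-Match": entry["etag"]}
    return {}


def has_changed(cache, url, status, etag, body):
    """Return True when the page needs re-processing, and update the cache."""
    if status == 304:
        # The server confirmed that the cached copy is still current.
        return False
    digest = hashlib.sha256(body).hexdigest()
    previous = cache.get(url)
    cache[url] = {"etag": etag, "sha256": digest}
    # Fall back to the content hash for servers with missing or unstable ETags.
    return previous is None or previous["sha256"] != digest
```

A caller would pass the result of `conditional_headers` to its HTTP client, then hand the response status, ETag header, and body bytes to `has_changed` to decide whether to re-parse the page.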

Verdict

Anansi solves real problems that production scrapers hit, and the MCP integration positions it well for the agent era. At 49 stars and v0.1.0, the credibility score of 69% reflects an early-stage project with limited community validation -- test coverage and documentation are present but thin. Install from git, not PyPI. Worth evaluating for hostile-site scraping work, but treat it as you would any v0.x dependency: pin your version and watch for breaking changes.
