MCP servers that scrape & crawl web pages

Extract clean content from any URL or crawl whole sites.

13 servers · Last updated August 1, 2026

TL;DR: Beyond search, these servers fetch and clean page content — turning messy HTML into markdown the model can actually use, or crawling entire sites. The differentiators are JavaScript rendering, anti-bot handling, and whether you get structured extraction or raw text.

Bottom line: if you only try one, start with Firecrawl. It is the most-starred verified server on this list (6,562★). The other 12 are compared below.


Compare 13 servers

Server	Transport	Auth	Verified	Stars	Tools for this
Firecrawl	Local (stdio)	API key		6,562	firecrawl_scrape, firecrawl_batch_scrape, firecrawl_check_batch_status +5
Bright Data MCP	Local (stdio)	API key		5,000	scrape_as_markdown, scrape_batch, extract
Browserbase MCP (Stagehand)	Local (stdio)	API key		3,000	extract
Tavily	Local (stdio)	API key		2,100	tavily-extract, tavily-crawl
DuckDuckGo Search	Local (stdio)	No auth		1,236	fetch_content
Jina AI Reader & Search	Remote (HTTP)	API key		730	extract_pdf
Kagi Search	Local (stdio)	API key		417	kagi_extract
AgentQL MCP	Local (stdio)	API key		400	extract-web-data
Wikipedia MCP	Local (stdio)	No auth	—	270	extract_key_facts
Hyperbrowser MCP	Local (stdio)	API key		250	scrape_webpage, crawl_webpages, extract_structured_data
Safari MCP	Local (stdio)	No auth	—	116	safari_extract_tables, safari_extract_meta, safari_extract_images +1
MCP AOAI Web Browsing (browser-navigator)	Local (stdio)	API key	—	33	extract_selector_by_page_content
Decodo MCP Server	Remote (HTTP)	API key	—	30	scrape_as_markdown
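
The Transport and Auth columns above roughly determine what a client config entry looks like. The sketch below is illustrative only. It assumes the common `mcpServers` JSON layout used by several MCP clients, and every package name, environment variable, and URL in it is a hypothetical placeholder, not the real value for any server in the table. Check each server's own setup page for the actual command and key name.

```python
import json


def stdio_entry(command, args, env=None):
    """Entry for a Local (stdio) server: the client launches it as a subprocess."""
    entry = {"command": command, "args": list(args)}
    if env:
        # API-key servers typically read the key from an environment variable.
        entry["env"] = dict(env)
    return entry


def http_entry(url, headers=None):
    """Entry for a Remote (HTTP) server: the client connects to a URL instead."""
    entry = {"url": url}
    if headers:
        entry["headers"] = dict(headers)
    return entry


# All names below are hypothetical placeholders (assumptions, not real packages).
config = {
    "mcpServers": {
        "scraper-with-key": stdio_entry(
            "npx", ["-y", "example-scraper-mcp"], {"EXAMPLE_API_KEY": "YOUR_API_KEY"}
        ),
        "scraper-no-auth": stdio_entry("uvx", ["example-fetch-mcp"]),
        "remote-reader": http_entry(
            "https://mcp.example.com/mcp", {"Authorization": "Bearer YOUR_API_KEY"}
        ),
    }
}

print(json.dumps(config, indent=2))
```

The exact key names for remote servers vary between clients, so treat `url` and `headers` as one plausible shape rather than a fixed schema.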

The servers

Firecrawl

Official Firecrawl MCP server — scrape, crawl, map, search, and structured extraction for any LLM client.

firecrawl_scrape, firecrawl_batch_scrape, firecrawl_check_batch_status, firecrawl_map, firecrawl_search, firecrawl_crawl


Bright Data MCP

All-in-one web access MCP — Web Unlocker, SERP, Scraper API, and a cloud Scraping Browser.

scrape_as_markdown, scrape_batch, extract


Browserbase MCP (Stagehand)

Official Browserbase cloud-browser MCP built on Stagehand — natural-language act/extract/observe.

extract


Tavily

Production-ready MCP server for real-time web search, content extraction, site mapping, and crawling.

tavily-extract, tavily-crawl


DuckDuckGo Search

Popular no-API-key MCP server for DuckDuckGo web search plus page fetching and parsing.

fetch_content


Jina AI Reader & Search

Official Jina AI remote MCP server — read web pages as Markdown and run grounded web search over HTTP.

extract_pdf


Kagi Search

Official Kagi MCP server (Python/uvx) — privacy-first web search and URL/video summarization.

kagi_extract


AgentQL MCP

Turn any web page into structured data — AgentQL's prompt-driven extraction as a single MCP tool.

extract-web-data


Wikipedia MCP

Give your LLM live Wikipedia access: search, full articles, summaries, sections, links and key facts in any language.

extract_key_facts


Hyperbrowser MCP

Hyperbrowser's cloud browser MCP — scrape, crawl, extract structured data, run CUA/Claude/Browser-Use agents.

scrape_webpage, crawl_webpages, extract_structured_data


Safari MCP

Native Safari browser automation for AI agents on macOS — 80 tools, your real logged-in browser, no Chrome.

safari_extract_tables, safari_extract_meta, safari_extract_images, safari_extract_links


MCP AOAI Web Browsing (browser-navigator)

Minimal MCP server that drives a web browser via Playwright, with an Azure OpenAI / OpenAI client bridge.

extract_selector_by_page_content


Decodo MCP Server

Scrape websites, search engines, eCommerce, and social media for AI agents via Decodo's Web Scraping API.

scrape_as_markdown


Use these in a stack

RAG agent, Web research agent, Content studio agent, Browser testing agent

FAQ

Search vs scraping MCP — what's the difference?

Search returns ranked result links/snippets; scraping fetches and cleans the actual page content (often as markdown). RAG pipelines usually need both.
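
To make the "fetches and cleans" half concrete, here is a minimal sketch of the cleaning step using only Python's standard library. It is not how any server listed here is implemented. The class and function names are made up for illustration, and real scrapers do far more (main-content detection, markdown conversion, link handling).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style content (illustrative helper)."""

    SKIP = {"script", "style", "noscript", "template"}
    BLOCK = {"p", "div", "br", "li", "tr", "section", "article",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = ("<html><head><style>p{}</style><script>var x=1;</script></head>"
          "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>")
print(html_to_text(sample))  # prints "Title" then "Hello world" on separate lines
```

A search tool would hand back the URL and a snippet. A scraping tool runs something like the step above (usually much more thoroughly) on the page behind that URL.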

Which handles JavaScript-heavy sites?

Servers backed by a real browser or a rendering API (Firecrawl, Bright Data, Hyperbrowser) handle JS; plain HTTP fetchers don't.
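
One rough way to tell whether a page falls into the "plain HTTP fetch is not enough" bucket is to look at the raw HTML for an almost-empty body that ships scripts. The sketch below is a heuristic invented for illustration, not a method any listed server documents. The function name and the 200-character threshold are assumptions, and it can misclassify pages in both directions.

```python
from html.parser import HTMLParser


class _PageStats(HTMLParser):
    """Counts <script> tags and visible text characters (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._skipped = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in ("script", "style"):
            self._skipped += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skipped:
            self._skipped -= 1

    def handle_data(self, data):
        if not self._skipped:
            self.text_chars += len(data.strip())


def probably_needs_rendering(html, min_text_chars=200):
    """Guess whether raw HTML is a JS shell: scripts present, little visible text."""
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    return stats.scripts > 0 and stats.text_chars < min_text_chars


# A typical single-page-app shell: an empty mount point plus a script bundle.
shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(probably_needs_rendering(shell))
```

If a check like this comes back true, a browser-backed or rendering-API server is the safer choice for that URL.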
