WEB SCRAPING MCP
MCP Server leveraging crawl4ai for web scraping and LLM-based content extraction (Markdown, text snippets).
Price: Free (API key required)
About
Crawl4AI Web Scraper MCP Server
[MIT License](https://opensource.org/licenses/MIT)
This project provides an MCP (Model Context Protocol) server that uses the [crawl4ai](https://github.com/unclecode/crawl4ai) library to perform web scraping and intelligent content extraction tasks. It allows AI agents (like Claude, or agents built with LangChain/LangGraph) to interact with web pages, retrieve content, search for specific text, and perform LLM-based extraction based on natural language instructions.
This server uses:
- [FastMCP](https://github.com/model-context-protocol/mcp-py/blob/main/docs/fastmcp.md): For creating the MCP server endpoint.
- [crawl4ai](https://github.com/unclecode/crawl4ai): For the core web crawling and extraction logic.
- [dotenv](https://github.com/theskumar/python-dotenv): For managing API keys via a `.env` file.
- (Optional) Docker: For containerized deployment, bundling Python and dependencies.
Features
- Exposes MCP tools for web interaction:
  - `scrape_url`: Get the full content of a webpage in Markdown format.
  - `extract_text_by_query`: Find specific text snippets on a page based on a query.
  - `smart_extract`: Use an LLM (currently Google Gemini) to extract structured information based on instructions.
- Configurable via environment variables (API keys).
- Includes Docker configuration (`Dockerfile`) for easy, self-contained deployment.
- Communicates over Server-Sent Events (SSE) on port 8002 by default.
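Once the server is running, an MCP client that supports the SSE transport can be pointed at it. A hypothetical client configuration is sketched below; the server name and the `/sse` endpoint path are assumptions, so check your client's documentation for the exact shape:

```json
{
  "mcpServers": {
    "crawl4ai-scraper": {
      "url": "http://localhost:8002/sse"
    }
  }
}
```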
Exposed MCP Tools
scrape_url
Scrape a webpage and return its content in Markdown format.
Arguments:
- `url` (str, required): The URL of the webpage to scrape.
Returns:
- (str): The webpage content in Markdown format, or an error message.
extract_text_by_query
Extract relevant text snippets from a webpage that contain a specific search query. Returns up to the first 5 matches found.
Arguments:
- `url` (str, required): The URL of the webpage to search within.
- `query` (str, required): The text query to search for (case-insensitive).
- `context_size` (int, optional): The number of characters to include before and after the matched query text in each snippet. Defaults to 300.
Returns:
- (str): A formatted string containing the found text snippets or a message indicating no matches were found, or an error message.
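The behavior described above — a case-insensitive search that returns up to five matches, each padded with surrounding context — can be sketched in plain Python. This is an illustrative reimplementation, not the server's actual code; the function name `find_snippets` is made up for the example:

```python
def find_snippets(text: str, query: str,
                  context_size: int = 300, max_matches: int = 5) -> list[str]:
    """Collect up to max_matches snippets of text containing query (case-insensitive)."""
    lowered, needle = text.lower(), query.lower()
    snippets: list[str] = []
    start = 0
    while len(snippets) < max_matches:
        idx = lowered.find(needle, start)
        if idx == -1:
            break  # no more occurrences
        # Pad the match with context_size characters on each side,
        # clamped to the bounds of the document.
        begin = max(0, idx - context_size)
        end = min(len(text), idx + len(needle) + context_size)
        snippets.append(text[begin:end])
        start = idx + len(needle)  # resume the search after this match
    return snippets
```

Overlapping context windows are returned as separate snippets here; the real tool may merge or format them differently.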
smart_extract
Intelligently extract specific information from a webpage using the configured LLM (currently requires Google Gemini API key) based on a natural language instruction.
Arguments:
- `url` (str, required): The URL of the webpage to analyze and extract from.
- `instruction` (str, required): Natural language instruction specifying what information to extract (e.g., "List all the speakers mentioned on this page", "Extract the main contact email address", "Summarize the key findings").
Returns:
- (str): The extracted information, or an error message.
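The LLM call itself goes through the Gemini API, but the general pattern — combining the user's instruction with the scraped page content into a single extraction prompt — can be sketched without it. The template below is hypothetical; the server's actual prompt wording may differ:

```python
def build_extraction_prompt(instruction: str, page_markdown: str) -> str:
    """Combine a natural language instruction with scraped page content
    into one prompt for an extraction LLM. Illustrative template only."""
    return (
        "Extract the requested information from the web page below.\n"
        f"Instruction: {instruction}\n\n"
        "--- PAGE CONTENT (Markdown) ---\n"
        f"{page_markdown}\n"
        "--- END PAGE CONTENT ---\n"
        "Respond with only the extracted information."
    )
```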
What's New
Imported from GitHub