
WEB SCRAPING MCP

MCP Server leveraging crawl4ai for web scraping and LLM-based content extraction (Markdown, text sni

Price

Free

API key required

Works With

Claude Code, Cursor, Windsurf, VS Code

Developer tool

About

Crawl4AI Web Scraper MCP Server

[License: MIT](https://opensource.org/licenses/MIT)

This project provides an MCP (Model Context Protocol) server that uses the [crawl4ai](https://github.com/unclecode/crawl4ai) library to perform web scraping and intelligent content extraction tasks. It allows AI agents (like Claude, or agents built with LangChain/LangGraph) to interact with web pages, retrieve content, search for specific text, and perform LLM-based extraction based on natural language instructions.

This server uses:

  • [FastMCP](https://github.com/model-context-protocol/mcp-py/blob/main/docs/fastmcp.md): For creating the MCP server endpoint.
  • [crawl4ai](https://github.com/unclecode/crawl4ai): For the core web crawling and extraction logic.
  • [dotenv](https://github.com/theskumar/python-dotenv): For managing API keys via a .env file.
  • (Optional) Docker: For containerized deployment, bundling Python and dependencies.

Features

  • Exposes MCP tools for web interaction:
      • scrape_url: Get the full content of a webpage in Markdown format.
      • extract_text_by_query: Find specific text snippets on a page based on a query.
      • smart_extract: Use an LLM (currently Google Gemini) to extract structured information based on instructions.
  • Configurable via environment variables (API keys).
  • Includes Docker configuration (Dockerfile) for easy, self-contained deployment.
  • Communicates over Server-Sent Events (SSE) on port 8002 by default.
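The configuration described above (an API key from the environment, SSE on port 8002 by default) could be read roughly as follows. This is only a sketch: the variable names `GOOGLE_API_KEY` and `MCP_PORT` and the `load_settings` helper are hypothetical and are not taken from the project, so check the project's own .env template for the real names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_PORT = 8002  # the documented default SSE port


@dataclass(frozen=True)
class ServerSettings:
    api_key: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read settings from an environment-like mapping (os.environ by default)."""
    env = os.environ if env is None else env

    # Hypothetical variable name; the real project may use a different one.
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; smart_extract needs an LLM API key")

    # Hypothetical override; fall back to the documented default port.
    port = int(env.get("MCP_PORT", DEFAULT_PORT))
    return ServerSettings(api_key=api_key, port=port)
```

Failing fast on a missing key at startup tends to be easier to debug than an error surfacing on the first smart_extract call.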

Exposed MCP Tools

scrape_url

Scrape a webpage and return its content in Markdown format.

Arguments:

  • url (str, required): The URL of the webpage to scrape.

Returns:

  • (str): The webpage content in Markdown format, or an error message.
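For orientation, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only builds the message a client would send for scrape_url; `build_tool_call` is a hypothetical helper, and in practice an MCP client library handles the connection, the initialization handshake, and request IDs for you.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # simple incrementing JSON-RPC request IDs


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Example: ask the server to scrape a page (example.com is a placeholder URL).
request = build_tool_call("scrape_url", {"url": "https://example.com"})
print(json.dumps(request, indent=2))
```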

extract_text_by_query

Extract relevant text snippets from a webpage that contain a specific search query. Returns up to the first 5 matches found.

Arguments:

  • url (str, required): The URL of the webpage to search within.
  • query (str, required): The text query to search for (case-insensitive).
  • context_size (int, optional): The number of characters to include before and after the matched query text in each snippet. Defaults to 300.

Returns:

  • (str): A formatted string containing the matching text snippets, a message indicating that no matches were found, or an error message.
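The matching behavior described above (case-insensitive search, `context_size` characters on each side, at most 5 matches) can be illustrated with a few lines of standard-library Python. This is a sketch of the idea, not the server's actual implementation; `find_snippets` is a hypothetical name, and the real tool additionally formats the results into a single string.

```python
import re
from itertools import islice
from typing import List


def find_snippets(text: str, query: str, context_size: int = 300,
                  max_matches: int = 5) -> List[str]:
    """Return up to max_matches snippets of text surrounding each query match."""
    if not query:
        return []

    # Escape the query so it is matched literally, ignoring case.
    pattern = re.compile(re.escape(query), re.IGNORECASE)

    snippets = []
    for match in islice(pattern.finditer(text), max_matches):
        start = max(0, match.start() - context_size)
        end = min(len(text), match.end() + context_size)
        snippets.append(text[start:end])
    return snippets


print(find_snippets("Alpha beta GAMMA delta gamma", "gamma", context_size=3))
# → ['ta GAMMA de', 'ta gamma']
```

Clamping the window to the bounds of the text keeps matches near the start or end of the page from raising or wrapping around.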

smart_extract

Extract specific information from a webpage based on a natural language instruction, using the configured LLM (currently Google Gemini, which requires an API key).

Arguments:

  • url (str, required): The URL of the webpage to analyze and extract from.
  • instruction (str, required): Natural language instruction specifying what information to extract (e.g., "List all the speakers mentioned on this page", "Extract the main contact email address", "Summarize the key findings").
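A client can check arguments against the documented signatures before sending a call. The sketch below encodes the argument lists of the three tools as described in this document; the `TOOL_ARGUMENTS` table and the `validate_arguments` helper are hypothetical and not part of the server.

```python
from typing import Any, Dict, List

# Argument name -> (expected type, required?), taken from the tool descriptions.
TOOL_ARGUMENTS = {
    "scrape_url": {"url": (str, True)},
    "extract_text_by_query": {
        "url": (str, True),
        "query": (str, True),
        "context_size": (int, False),
    },
    "smart_extract": {"url": (str, True), "instruction": (str, True)},
}


def validate_arguments(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    spec = TOOL_ARGUMENTS.get(tool)
    if spec is None:
        return [f"unknown tool: {tool}"]

    problems = []
    for name, (expected_type, required) in spec.items():
        if name not in arguments:
            if required:
                problems.append(f"missing required argument: {name}")
            continue
        if not isinstance(arguments[name], expected_type):
            problems.append(f"{name} must be {expected_type.__name__}")

    # Flag anything the tool does not document.
    for name in arguments:
        if name not in spec:
            problems.append(f"unexpected argument: {name}")
    return problems


print(validate_arguments("smart_extract", {"url": "https://example.com"}))
# → ['missing required argument: instruction']
```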

Returns:

a-gnt's Take

Our honest review

This plugs directly into your AI and gives it new abilities it didn't have before. Once connected, just ask your AI to use it. It's free and works across most major AI apps, though it needs an API key. This one just landed in the catalog, so it's worth trying while it's fresh.

Tips for getting started

1. Tap "Get" above, pick your AI app, and follow the steps. Most installs take under 30 seconds.

2. Heads up: this needs an API key to work. You'll get one from the service's website (usually free). The setup guide tells you exactly where.

What's New

Version 1.0.0 (6 days ago)

Imported from GitHub
