vLLM

Name: vLLM
Rating: 3.9 (9 reviews)
Author: vllm-project

High-throughput LLM serving engine

by vllm-project

Rating

3.9

Votes

score

Downloads

2.5K

total

Price

Free

No login needed

Works With

Claude CodeCursorWindsurfVS CodeDeveloper tool

About

vLLM is a fast and easy-to-use library for LLM inference and serving. Features PagedAttention for efficient memory management and high throughput.

The go-to solution for production LLM serving. Supports continuous batching, tensor parallelism, and the OpenAI-compatible API.

Install via pip. Open-source with extensive model support.

Don't lose this

Three weeks from now, you'll want vLLM again. Will you remember where to find it?

Save it to your library and the next time you need vLLM, it’s one tap away — from any AI app you use. Group it into a bench with the rest of the team for that kind of task and you can pull the whole stack at once.

⚡ Pro tip for geeks: add a-gnt 🤵🏻‍♂️ as a custom connector in Claude or a custom GPT in ChatGPT — one click and your library is right there in the chat. Or, if you’re in an editor, install the a-gnt MCP server and say “use my [bench name]” in Claude Code, Cursor, VS Code, or Windsurf.

🤵🏻‍♂️

a-gnt's Take

Our honest review

High-throughput LLM serving engine. Best for anyone looking to make their AI assistant more capable in ai models. It's backed by an active open-source community and verified by the creator. With 2,483+ installs, this one's proven.

Tips for getting started

Tap "Get" above, pick your AI app, and follow the steps. Most installs take under 30 seconds.

Ai Models