
vLLM CLI

A command-line interface tool for serving Large Language Models using vLLM.

Rating: 0.0 · Votes: 0 · Downloads: 0 total

Price: Free (no login needed)

Works With: Claude Code, Cursor, Windsurf, VS Code

Developer tool

About

vLLM CLI

[CI](https://github.com/Chen-zexi/vllm-cli/actions/workflows/ci.yml) · [PyPI](https://badge.fury.io/py/vllm-cli) · [Python](https://www.python.org/downloads/)

A command-line interface tool for serving Large Language Models using vLLM. It provides both interactive and command-line modes, with configuration profiles, model management, and server monitoring.

*Interactive terminal interface with GPU status and system overview. Tip: you can customize the GPU stats bar in Settings.*

Features

  • 🎯 Interactive Mode - Rich terminal interface with menu-driven navigation
  • ⚡ Command-Line Mode - Direct CLI commands for automation and scripting
  • 🤖 Model Management - Automatic discovery of local models with HuggingFace and Ollama support
  • 🔧 Configuration Profiles - Pre-configured and custom server profiles for different use cases
  • 📊 Server Monitoring - Real-time monitoring of active vLLM servers (see the sketch after this list)
  • 🖥️ System Information - GPU, memory, and CUDA compatibility checking
  • 📝 Advanced Configuration - Full control over vLLM parameters with validation
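
The Server Monitoring feature pairs naturally with the HTTP endpoints vLLM's OpenAI-compatible server already exposes. As a minimal sketch (assuming a vLLM server is running on vLLM's default `localhost:8000`; the `/health` and `/v1/models` routes are standard vLLM endpoints, not vllm-cli-specific), you can poll a managed server like this:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed: vLLM's default serve address

def check_server(base_url: str = BASE_URL) -> None:
    # vLLM's OpenAI-compatible server exposes /health, which returns
    # HTTP 200 once the engine is up and serving.
    health = requests.get(f"{base_url}/health", timeout=5)
    print("health:", health.status_code)

    # /v1/models lists the model(s) the server is currently serving.
    models = requests.get(f"{base_url}/v1/models", timeout=5).json()
    for m in models.get("data", []):
        print("serving:", m["id"])

if __name__ == "__main__":
    check_server()
```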

Quick Links: 📖 Docs | 🚀 Quick Start | 📸 Screenshots | 📘 Usage Guide | ❓ Troubleshooting | 🗺️ Roadmap

What's New in v0.2.5

Multi-Model Proxy Server (Experimental)

The Multi-Model Proxy is a new experimental feature that enables serving multiple LLMs through a single unified API endpoint. This feature is currently under active development and available for testing.

What It Does:

  • Single Endpoint - All your models accessible through one API
  • Live Management - Add or remove models without stopping the service
  • Dynamic GPU Management - Efficient GPU resource distribution through vLLM's sleep/wake functionality
  • Interactive Setup - User-friendly wizard guides you through configuration

Note: This is an experimental feature under active development. Your feedback helps us improve! Please share your experience through GitHub Issues.

For complete documentation, see the 🌐 Multi-Model Proxy Guide.
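
Because the proxy exposes a single OpenAI-compatible endpoint for every model, requests are routed by the `model` field. Here is a minimal sketch using the `openai` Python client; the proxy address, port, and model name below are assumptions for illustration, so check the Multi-Model Proxy Guide for the actual defaults:

```python
from openai import OpenAI

# Assumed proxy address; the real host/port comes from your proxy setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# One endpoint, many models: list whatever the proxy currently serves.
for model in client.models.list():
    print("available:", model.id)

# Pick which backend handles the request via the `model` field.
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # example; use a model your proxy serves
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
print(resp.choices[0].message.content)
```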

What's New in v0.2.4

🚀 Hardware-Optimized Profiles for GPT-OSS Models

New built-in profiles specifically optimized for serving GPT-OSS models on different GPU architectures:

  • `gpt_oss_ampere` - Optimized for NVIDIA A100 GPUs
  • `gpt_oss_hopper` - Optimized for NVIDIA H100/H200 GPUs
  • `gpt_oss_blackwell` - Optimized for NVIDIA Blackwell GPUs

Based on the official vLLM GPT-OSS recipes for maximum performance.
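
As a rough illustration of applying one of these profiles (the `serve` subcommand and `--profile` flag here are assumptions inferred from the profile names above, not confirmed syntax; verify with `vllm-cli --help` before relying on it), launching a GPT-OSS model with the Hopper profile might look like:

```python
import subprocess

# Assumed invocation: a `serve` subcommand taking a model ID and a
# `--profile` flag naming one of the built-in profiles listed above.
# Check `vllm-cli --help` for the real flags.
subprocess.run(
    [
        "vllm-cli", "serve",
        "--model", "openai/gpt-oss-120b",  # example GPT-OSS model ID
        "--profile", "gpt_oss_hopper",     # built-in profile for H100/H200
    ],
    check=True,
)
```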

Don't lose this

Three weeks from now, you'll want vLLM CLI again. Will you remember where to find it?

Save it to your library, and the next time you need vLLM CLI it's one tap away from any AI app you use. Group it into a bench with the other tools you use for that kind of task, and you can pull the whole stack at once.

⚡ Pro tip for geeks: add a-gnt 🤵🏻‍♂️ as a custom connector in Claude or a custom GPT in ChatGPT, and your library is right there in the chat with one click. Or, if you're in an editor, install the a-gnt MCP server and say "use my [bench name]" in Claude Code, Cursor, VS Code, or Windsurf.

🤵🏻‍♂️ a-gnt's Take

Our honest review

This plugs directly into your AI and gives it new abilities it didn't have before. vLLM CLI is a command-line tool for serving Large Language Models with vLLM. Once connected, just ask your AI to use it. It's completely free and works across most major AI apps. This one just landed in the catalog, so it's worth trying while it's fresh.

Tips for getting started

1. Tap "Get" above, pick your AI app, and follow the steps. Most installs take under 30 seconds.

What's New

Version 1.0.0 · 6 days ago

Imported from GitHub

Ratings & Reviews

0.0 out of 5 (0 ratings)

No reviews yet. Be the first to share your experience.