Name: vLLM
Availability: InStock
Author: ai-supply

vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving. It achieves state-of-the-art serving throughput through PagedAttention, an attention algorithm that manages attention key and value (KV) memory efficiently, combined with continuous batching of incoming requests and optimized CUDA kernels.

Key Features

PagedAttention: near-zero KV cache waste, enabling up to 24× higher throughput than Hugging Face Transformers
Continuous batching: dynamically schedules requests for maximum GPU utilization
OpenAI-compatible REST API: drop-in replacement for OpenAI endpoints
Quantization support: GPTQ, AWQ, SqueezeLLM, FP8
Speculative decoding and chunked prefill
Supports 100+ models: Llama, Mistral, Qwen, Phi, Gemma, and more
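
To make the "near-zero KV cache waste" point concrete, here is a toy Python sketch of the memory accounting behind paging. It is not vLLM's implementation: the block size of 16 tokens, the 2048-token maximum length, the request lengths, and all function names are assumptions chosen for illustration. It compares reserving a full-length contiguous buffer per request with reserving only the fixed-size blocks each request has actually touched.

```python
import math

BLOCK_SIZE = 16  # tokens per KV block (assumed for illustration)


def contiguous_slots(seq_lens, max_len):
    """Slots reserved when every request preallocates max_len tokens."""
    return len(seq_lens) * max_len


def paged_slots(seq_lens, block_size=BLOCK_SIZE):
    """Slots reserved when each request holds only the blocks it has touched."""
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)


def waste_fraction(reserved, seq_lens):
    """Share of reserved slots that hold no token."""
    used = sum(seq_lens)
    return (reserved - used) / reserved


seq_lens = [37, 512, 90, 1200]  # hypothetical current lengths of four requests
contig = contiguous_slots(seq_lens, max_len=2048)
paged = paged_slots(seq_lens)

print(f"contiguous: {contig} slots, {waste_fraction(contig, seq_lens):.1%} wasted")
print(f"paged:      {paged} slots, {waste_fraction(paged, seq_lens):.1%} wasted")
```

With paging, the only unused slots are in the last, partially filled block of each request, so waste is bounded by one block per sequence regardless of how long the sequence could eventually grow.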

Quick Start

pip install vllm

# Start OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct

# Query it
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
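
The same query can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server is listening on the default `localhost:8000` address and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # address used in the curl example
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_chat_request(prompt, model=MODEL, base_url=BASE_URL):
    """Build the POST request for the /chat/completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=60):
    """Send the prompt and return the first choice's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumes an OpenAI-style response body.
    return payload["choices"][0]["message"]["content"]
```

With the server from the previous step running, `print(chat("Hello!"))` should print the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a live server.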

Install via ai-supply

npx ai-supply add vllm-high-throughput-inference

Curated mirror of the open-source vLLM (Apache-2.0). Get it from the source.

# Agents: CHECK BEFORE YOU INSTALL (no auth): score, grade, level, capability manifest
curl https://ai-supply.store/api/v1/trust/vllm-high-throughput-inference

# Gate against your org policy (returns { pass, violations })
curl -X POST https://ai-supply.store/api/v1/trust/vllm-high-throughput-inference/check \
  -H "Content-Type: application/json" \
  -d '{"minGrade":"B","denyPermissions":["shell"],"denyUnknownEgress":true}'

# CLI
npx ai-supply add vllm-high-throughput-inference

# REST (install → download)
curl -X POST https://ai-supply.store/api/v1/listings/vllm-high-throughput-inference/install \
  -H "Authorization: Bearer $AIM_KEY"

# MCP tool
install_listing({ "slug": "vllm-high-throughput-inference" })
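
An agent that scripts the policy gate might wrap it as in the Python sketch below. The endpoint URL and policy fields are copied from the curl example in this listing; everything else is an assumption. The listing only names the response fields `pass` and `violations`, so the sketch assumes `pass` is a boolean and `violations` is a list, and the helper names are hypothetical.

```python
import json
import urllib.request

TRUST_API = "https://ai-supply.store/api/v1/trust"
SLUG = "vllm-high-throughput-inference"

# Policy fields copied from the curl example in this listing.
POLICY = {"minGrade": "B", "denyPermissions": ["shell"], "denyUnknownEgress": True}


def build_check_request(slug=SLUG, policy=POLICY):
    """Build the POST request for the policy check endpoint."""
    return urllib.request.Request(
        f"{TRUST_API}/{slug}/check",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(result):
    """Interpret a check response.

    Assumes `pass` is a boolean and `violations` is a list; the listing
    names the two fields but does not document their types.
    """
    return bool(result.get("pass")) and not result.get("violations")


def check_listing(timeout=30):
    """Call the endpoint and return (allowed, violations)."""
    with urllib.request.urlopen(build_check_request(), timeout=timeout) as resp:
        result = json.load(resp)
    return is_allowed(result), result.get("violations", [])
```

Treating a missing or false `pass` as a failure keeps the gate conservative: the install step only runs when the response explicitly approves it.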
