Connector | DevOps & Infra | Free
OpenLLM
Production LLM serving platform by BentoML — deploy any open-source LLM with one command.
Installs: 165k
Rating: ★ 4.6
Reviews: 55
OpenLLM is an open-source platform for deploying and operating large language models in production. Built by the BentoML team, it provides a unified interface to run, fine-tune, and deploy dozens of open-source LLMs — locally or on any cloud — with built-in quantization, streaming, and OpenAI-compatible APIs.
Key Features
- One-command deployment: openllm start llama3 (downloads, quantizes, and serves)
- OpenAI-compatible API: drop-in for existing integrations
- Quantization: bitsandbytes 4-bit/8-bit, GPTQ, AWQ
- Streaming support: SSE and gRPC streaming
- Cloud-native: integrates with BentoCloud, AWS SageMaker, GCP, Azure
- Supports 50+ models: Llama, Mistral, Phi, Gemma, Falcon, StarCoder, Baichuan
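The streaming and OpenAI-compatible features listed above can be illustrated together. The following is a minimal sketch, assuming the server emits SSE events in the OpenAI chat-completion chunk shape (`data: {json}` lines ending with `data: [DONE]`); the listing does not confirm the exact wire format, and `iter_stream_text` and the sample events are hypothetical names and data used only for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE chat-completion lines.

    Assumption: each event is a 'data: <json>' line and the stream ends
    with 'data: [DONE]', as in the OpenAI streaming format.
    """
    for raw in lines:
        line = raw.strip()
        # Blank lines separate SSE events; lines starting with ':' are comments.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events (not captured from a real server).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello, world"
```

In practice the OpenAI Python client performs this parsing itself when called with `stream=True`; the sketch only shows what travels over the wire under the stated assumption.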
Quick Start
pip install openllm
# Start a Llama 3 server
openllm start meta-llama/Meta-Llama-3-8B-Instruct
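# Query via the OpenAI Python client (pip install openai)
from openai import OpenAI

# The client requires an api_key value; "na" is a placeholder
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is vLLM?"}],
)
print(response.choices[0].message.content)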
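Because the API is described as OpenAI-compatible, the same query can in principle be sent without the `openai` package. The sketch below uses only the Python standard library and assumes the server exposes the usual `/v1/chat/completions` route on port 3000, as the Quick Start's `base_url` suggests. The helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions shape."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of a non-streaming completion response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def ask(base_url: str, model: str, prompt: str) -> str:
    """Send the request; needs a running server, so it is not called here."""
    with urllib.request.urlopen(build_chat_request(base_url, model, prompt)) as resp:
        return extract_reply(resp.read())


req = build_chat_request(
    "http://localhost:3000/v1",
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "What is vLLM?",
)
print(req.full_url)  # prints "http://localhost:3000/v1/chat/completions"
```

With the server from the Quick Start running, `ask("http://localhost:3000/v1", "meta-llama/Meta-Llama-3-8B-Instruct", "What is vLLM?")` would return the reply text, assuming the response follows the OpenAI schema.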
Install via ai-supply
npx ai-supply add openllm-model-serving-platform
Curated mirror of the open-source OpenLLM (Apache-2.0). Get it from the source.