Name: OpenLLM
Availability: InStock
Author: ai-supply

OpenLLM

OpenLLM is an open-source platform for deploying and operating large language models in production. Built by the BentoML team, it provides a unified interface to run, fine-tune, and deploy dozens of open-source LLMs — locally or on any cloud — with built-in quantization, streaming, and OpenAI-compatible APIs.

Key Features

One-command deployment: openllm start llama3 downloads the model, applies any requested quantization, and starts serving it
OpenAI-compatible API: drop-in for existing integrations
Quantization: bitsandbytes 4-bit/8-bit, GPTQ, AWQ
Streaming: token-by-token output over SSE and gRPC
Cloud-native: integrates with BentoCloud, AWS SageMaker, GCP, Azure
Supports 50+ models: Llama, Mistral, Phi, Gemma, Falcon, StarCoder, Baichuan
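As a rough illustration of why the 4-bit and 8-bit options matter, the sketch below estimates the memory needed just to hold a model's weights at different precisions. It assumes weight memory is parameter count times bits per weight divided by 8. It ignores the KV cache, activations, and the per-group scale overhead that GPTQ and AWQ add, so real usage will be higher. The 8B figure is a round approximation of Llama 3 8B, and the helper name is hypothetical.

```python
# Rough estimate of weight memory at different precisions.
# Assumption: memory = parameters * bits_per_weight / 8 bytes; this ignores
# KV cache, activations, and quantization scale/zero-point overhead.


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 8e9  # roughly the size of Llama 3 8B
    for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>9}: ~{weight_memory_gb(params, bits):.0f} GB")
```

Each halving of the bit width roughly halves the weight footprint (about 16 GB, 8 GB, and 4 GB here), which is the main reason to use the quantized options on smaller GPUs.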

Quick Start

pip install openllm openai

# Start a Llama 3 server (the client below expects it at localhost:3000)
openllm start meta-llama/Meta-Llama-3-8B-Instruct

# In Python, query the server via the OpenAI client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # placeholder key for a local server
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is vLLM?"}],
)
print(response.choices[0].message.content)
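Because the API is OpenAI-compatible, the same query can also be sketched without the openai package. The standard-library example below assumes the server follows OpenAI's chat-completions wire format: a POST to /v1/chat/completions and, with "stream": true, Server-Sent Events lines of the form `data: {...}` that end with `data: [DONE]`. The helper names (build_chat_request, iter_sse_text, stream_chat) are hypothetical and are not part of OpenLLM.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def build_chat_request(base_url: str, model: str, prompt: str,
                       stream: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key, mirroring api_key="na" in the Quick Start.
            "Authorization": "Bearer na",
        },
        method="POST",
    )


def iter_sse_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines until 'data: [DONE]'."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # the first chunk often carries only the role
                yield text


def stream_chat(req: urllib.request.Request) -> None:
    """Send the request and print tokens as they arrive (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        for piece in iter_sse_text(resp):  # the response iterates line by line
            print(piece, end="", flush=True)
    print()
```

With the Quick Start server running, calling stream_chat(build_chat_request("http://localhost:3000/v1", "meta-llama/Meta-Llama-3-8B-Instruct", "What is vLLM?")) should print the reply incrementally. Keeping the parser separate from the network call lets it be checked against canned SSE lines.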

Install via ai-supply

npx ai-supply add openllm-model-serving-platform

This is a curated mirror of the open-source OpenLLM project (Apache-2.0); the upstream repository remains the canonical source.
