Connector · DevOps & Infra · Free
BentoML
Build, ship, and scale AI services — unified framework from local development to production Kubernetes.
Installs: 230k
Rating: ★ 4.7
Reviews: 77
BentoML is an open-source unified model serving framework that lets you build AI services from any ML framework and deploy them on any infrastructure. It handles the full lifecycle from packaging models into reproducible Bentos to autoscaling Kubernetes deployments with adaptive batching.
Key Features
- Framework agnostic: PyTorch, TensorFlow, Keras, XGBoost, scikit-learn, LLMs, diffusion models
- Adaptive micro-batching: automatically batch requests for optimal GPU throughput
- Runners API: modular service composition with independent scaling
- Bento packaging: reproducible bundles with model, code, dependencies, Dockerfile
- BentoCloud integration: one-command deployment to managed inference infrastructure
- Built-in OpenTelemetry, Prometheus metrics, and gRPC support
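To make the micro-batching idea concrete, here is a minimal, framework-free sketch. It is not BentoML's implementation: it uses a fixed batch size and wait budget, whereas an adaptive batcher tunes those limits at runtime. The names `collect_batch` and `fake_model` are hypothetical and exist only for illustration.

```python
import queue
import time
from typing import List


def collect_batch(requests: "queue.Queue[str]", max_batch_size: int, max_wait_s: float) -> List[str]:
    """Block for the first request, then gather more until the batch is
    full or the wait budget is spent."""
    batch = [requests.get()]  # wait until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget spent: ship what we have
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no further requests arrived in time
    return batch


def fake_model(texts: List[str]) -> List[str]:
    """Hypothetical stand-in for a model that is cheaper to call per batch."""
    return [t.upper() for t in texts]


if __name__ == "__main__":
    pending: "queue.Queue[str]" = queue.Queue()
    for text in ["a", "b", "c", "d", "e"]:
        pending.put(text)
    # Five queued requests with a batch limit of four yield two model calls.
    while not pending.empty():
        batch = collect_batch(pending, max_batch_size=4, max_wait_s=0.05)
        print(len(batch), fake_model(batch))
```

The trade-off is visible in the two parameters: a larger `max_batch_size` improves throughput per model call, while a smaller `max_wait_s` bounds the extra latency any single request can pay while waiting for companions.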
Quick Start
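import bentoml

@bentoml.service
class SentimentAnalyzer:
    # models.get() returns a stored-model reference, not a predictor.
    model_ref = bentoml.models.get("sentiment:latest")

    def __init__(self) -> None:
        # Assumes a scikit-learn model; use the loader for your framework.
        self.model = bentoml.sklearn.load_model(self.model_ref)

    @bentoml.api
    def classify(self, text: str) -> str:
        return str(self.model.predict([text])[0])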
# Serve locally
bentoml serve sentiment_service:SentimentAnalyzer
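# Build the Bento, then containerize it. containerize takes a Bento tag
# (not a model tag); this assumes the built Bento is named "sentiment".
bentoml build && bentoml containerize sentiment:latest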
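Once the service is running locally via `bentoml serve`, it can be called over HTTP. The sketch below uses only the Python standard library and rests on assumptions not stated in this listing: that the server listens on `http://localhost:3000`, that the `classify` method is exposed at `/classify`, and that it accepts a JSON body keyed by parameter name. The helper names `build_classify_request` and `classify` are hypothetical.

```python
import json
import urllib.request


def build_classify_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the assumed /classify route with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/classify",  # assumed route: the API method name
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(base_url: str, text: str) -> str:
    """Send the request to a running server and return the raw response text."""
    request = build_classify_request(base_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With the server up, `classify("http://localhost:3000", "I love this product")` would return whatever label the service produces. Adjust the base URL and route if your deployment differs.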
Install via ai-supply
npx ai-supply add bentoml-model-serving-framework
Curated mirror of the open-source BentoML (Apache-2.0). Get it from the source.