Name: BentoML
Availability: InStock
Author: ai-supply

BentoML

BentoML is an open-source model serving framework for building AI services around models from any ML framework and deploying them on any infrastructure. It covers the full lifecycle: packaging models into reproducible Bentos, serving them with adaptive batching, and running them as autoscaling Kubernetes deployments.

Key Features

Framework agnostic: PyTorch, TensorFlow, Keras, XGBoost, scikit-learn, LLMs, diffusion models
Adaptive micro-batching: automatically batch requests for optimal GPU throughput
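Service composition: modular multi-service pipelines with independent scaling (the Runners API in BentoML 1.1 and earlier, bentoml.depends() between services from 1.2 on)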
Bento packaging: reproducible bundles with model, code, dependencies, Dockerfile
BentoCloud integration: one-command deployment to managed inference infrastructure
Built-in OpenTelemetry, Prometheus metrics, and gRPC support
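
To make the micro-batching feature concrete, here is a minimal standard-library sketch of the underlying idea: wait for one request, then keep collecting until either a batch-size cap or a latency budget is reached. It is a simplified fixed-window version, not BentoML's dispatcher, which also adjusts its batching behaviour to the traffic it observes. The names collect_batch, max_batch_size and max_latency_s are hypothetical and chosen for illustration only.

```python
import queue
import time
from typing import List


def collect_batch(
    requests: "queue.Queue[str]",
    max_batch_size: int,
    max_latency_s: float,
) -> List[str]:
    """Gather up to max_batch_size items, waiting at most max_latency_s
    after the first item arrives."""
    # Block until at least one request is available.
    batch = [requests.get()]
    deadline = time.monotonic() + max_latency_s

    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # latency budget spent: ship what we have
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived within the window
    return batch


if __name__ == "__main__":
    pending: "queue.Queue[str]" = queue.Queue()
    for text in ["a", "b", "c", "d", "e"]:
        pending.put(text)
    print(collect_batch(pending, max_batch_size=4, max_latency_s=0.05))
    print(collect_batch(pending, max_batch_size=4, max_latency_s=0.05))
```

A larger size cap favours GPU throughput, while a tighter latency budget favours response time. A serving layer would then run one model call per returned batch.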

Quick Start

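# sentiment_service.py
import bentoml

@bentoml.service
class SentimentAnalyzer:
    # Reference to the stored model, so it is packaged with the Bento
    model_ref = bentoml.models.get("sentiment:latest")

    def __init__(self) -> None:
        # bentoml.models.get returns a model reference, not a predictor.
        # Assumes a scikit-learn model; use the matching loader otherwise.
        self.model = bentoml.sklearn.load_model(self.model_ref)

    @bentoml.api
    def classify(self, text: str) -> str:
        return str(self.model.predict([text])[0])

# Serve locally (shell)
bentoml serve sentiment_service:SentimentAnalyzer

# Build + containerize (shell); containerize takes the Bento tag reported by bentoml build
bentoml build && bentoml containerize sentiment:latest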
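
Once the service is running, it can be called over HTTP. The sketch below uses only the standard library and assumes the usual conventions of recent BentoML releases: the server listens on http://localhost:3000, each API method is exposed as POST /<method name>, and the JSON body is keyed by parameter name. The helpers build_classify_request and classify are hypothetical names, and the base URL should be adjusted if your setup differs.

```python
import json
import urllib.request


def build_classify_request(
    text: str, base_url: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build the POST request for the classify endpoint without sending it."""
    # Assumption: the JSON key matches the API method's parameter name ("text").
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, base_url: str = "http://localhost:3000") -> str:
    """Send the request and return the raw response body as text."""
    request = build_classify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the service from the Quick Start to be running locally.
    print(classify("The battery life is fantastic."))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a live server.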

Install via ai-supply

npx ai-supply add bentoml-model-serving-framework

Curated mirror of the open-source BentoML (Apache-2.0). Get it from the source.
