Pipeline · Vision & Image · Free
Grounded SAM: Open-Vocabulary Detection + Segmentation
Combines Grounding DINO and Segment Anything for text-prompt-driven object detection and precise segmentation in one pipeline.
Grounded Segment Anything
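Grounded SAM combines Grounding DINO (open-vocabulary detection) with the Segment Anything Model (SAM) in a single pipeline. Grounding DINO turns a free-form text prompt into bounding boxes, and SAM turns each box into a segmentation mask, so objects can be detected and segmented from a text description alone, with no fixed class list and no task-specific training.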
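The two-stage control flow can be sketched in plain Python. This is a minimal sketch, not the repository's code: `detect` and `segment` are hypothetical callables standing in for Grounding DINO and SAM, and boxes are assumed to already be in pixel (x0, y0, x1, y1) form.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumption: pixel (x0, y0, x1, y1)


@dataclass
class Detection:
    box: Box
    score: float
    phrase: str
    mask: Optional[object] = None  # filled in by the segmentation stage


def grounded_segment(
    image: object,
    prompt: str,
    detect: Callable[[object, str], Sequence[Tuple[Box, float, str]]],  # hypothetical detector
    segment: Callable[[object, Box], object],  # hypothetical box-prompted segmenter
) -> List[Detection]:
    """Stage 1: text prompt -> boxes. Stage 2: one mask per box."""
    detections = [Detection(box, score, phrase) for box, score, phrase in detect(image, prompt)]
    for det in detections:
        # The box is the only prompt the segmenter receives; no class labels are involved.
        det.mask = segment(image, det.box)
    return detections


if __name__ == "__main__":
    # Stub stages for illustration only (hypothetical stand-ins for Grounding DINO and SAM).
    fake_detect = lambda img, text: [((10.0, 20.0, 50.0, 80.0), 0.42, text)]
    fake_segment = lambda img, box: f"mask@{box}"
    results = grounded_segment("img", "a cat", fake_detect, fake_segment)
    print(results[0].phrase, results[0].mask)
```

Keeping the two stages as separate callables mirrors the design of the pipeline: either model can be swapped (for example SAM for SAM 2) without touching the detection step.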
Key Features
- Text-prompt detection via Grounding DINO: "a cat", "all vehicles", "the red cup"
- Pixel-level segmentation masks from SAM, prompted by each detected box
- Extensions: Stable Diffusion inpainting, RAM++ auto-tagging, Recognize Anything
- Grounded SAM 2 variant, which uses SAM 2 for video object tracking and segmentation
- REST API and Gradio demo included
- Batch processing support for large image datasets
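For large image datasets, one way to organize a batch run is to filter and chunk the file list before handing each chunk to the models. This is a minimal sketch under stated assumptions: `process_batch(paths, prompt)` is a hypothetical function that returns one result per path, and the accepted file suffixes are an assumption; the repository's own batch tooling may be organized differently.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust for your dataset


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_batches(
    paths: Iterable[Path],
    prompt: str,
    process_batch: Callable[[List[Path], str], Sequence[object]],  # hypothetical model call
    batch_size: int = 8,
) -> Dict[str, object]:
    """Keep image files, sort for a reproducible order, and process them chunk by chunk."""
    images = sorted(p for p in paths if p.suffix.lower() in IMAGE_SUFFIXES)
    results: Dict[str, object] = {}
    for batch in chunked(images, batch_size):
        outputs = process_batch(batch, prompt)
        if len(outputs) != len(batch):
            raise RuntimeError("process_batch must return one result per input path")
        results.update({p.name: out for p, out in zip(batch, outputs)})
    return results


if __name__ == "__main__":
    demo = [Path(n) for n in ["b.png", "a.jpg", "notes.txt", "c.JPG"]]
    print(run_batches(demo, "all vehicles", lambda batch, text: [f"{text}:{p.stem}" for p in batch], batch_size=2))
```

Chunking keeps memory bounded and gives a natural point to checkpoint results between chunks.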
Quick Start
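import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor
device = "cuda" if torch.cuda.is_available() else "cpu"
# 1) Detect with text prompt ("assets/cat.jpg" is a placeholder path); load_image returns (RGB numpy array, normalized tensor)
image_np, image = load_image("assets/cat.jpg")
model = load_model("groundingdino/config/GroundingDINO_SwinT.py", "weights/groundingdino_swint.pth", device=device)
boxes, logits, phrases = predict(model, image, caption="a cat", box_threshold=0.3, text_threshold=0.25, device=device)
# 2) Grounding DINO boxes are normalized (cx, cy, w, h); convert to pixel (x0, y0, x1, y1) for SAM
h, w = image_np.shape[:2]
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]), in_fmt="cxcywh", out_fmt="xyxy")
sam = sam_model_registry["vit_h"](checkpoint="weights/sam_vit_h.pth").to(device)
sampredictor = SamPredictor(sam)
sampredictor.set_image(image_np)
# Rescale boxes to SAM's internal input resolution, then predict one mask per box
boxes_sam = sampredictor.transform.apply_boxes_torch(boxes_xyxy, (h, w)).to(device)
masks, _, _ = sampredictor.predict_torch(point_coords=None, point_labels=None, boxes=boxes_sam, multimask_output=False)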
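The glue between the two models is a box-format conversion: Grounding DINO's `predict` returns boxes as normalized (cx, cy, w, h), while SAM's box prompt expects pixel (x0, y0, x1, y1). A dependency-free sketch of that conversion follows; the helper name `cxcywh_norm_to_xyxy_px` is hypothetical, and clipping to the image bounds is an added safety choice rather than something the original pipeline is documented to do.

```python
from typing import List, Sequence, Tuple


def cxcywh_norm_to_xyxy_px(
    boxes: Sequence[Sequence[float]], width: int, height: int
) -> List[Tuple[float, float, float, float]]:
    """Convert normalized (cx, cy, w, h) boxes to pixel (x0, y0, x1, y1), clipped to the image."""
    out = []
    for cx, cy, bw, bh in boxes:
        # Scale the center/size representation to corner coordinates in pixels.
        x0 = (cx - bw / 2) * width
        y0 = (cy - bh / 2) * height
        x1 = (cx + bw / 2) * width
        y1 = (cy + bh / 2) * height
        # Clip so the segmenter never receives coordinates outside the image.
        out.append((
            min(max(x0, 0.0), width),
            min(max(y0, 0.0), height),
            min(max(x1, 0.0), width),
            min(max(y1, 0.0), height),
        ))
    return out


print(cxcywh_norm_to_xyxy_px([[0.5, 0.5, 0.2, 0.4]], width=640, height=480))
```

This is the same arithmetic that `box_convert(boxes * [w, h, w, h], "cxcywh", "xyxy")` performs on tensors, written out per box.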
Install via ai-supply
npx ai-supply add grounded-segment-anything-pipeline
Curated mirror of the open-source Grounded-Segment-Anything (Apache-2.0). Get it from the source.