ai-supply.store
Connector · Data & ETL · Free

MarkItDown

Microsoft's universal document-to-Markdown converter: PDF, DOCX, PPTX, XLSX, HTML, images, audio, and ZIP — all to clean Markdown.

@ai-supply
Installs: 145k
Rating: ★ 4.7
Reviews: 48

MarkItDown is Microsoft's open-source utility that converts a wide range of file formats to clean Markdown text so that documents can be ingested by LLMs and RAG pipelines. It handles PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, HTML pages, images (with OCR or LLM-generated descriptions), audio files (via Whisper), and ZIP archives.

Key Features

  • Universal input — PDF, DOCX, PPTX, XLSX, XLS, HTML, EPUB, MSG, CSV, JSON, XML, WAV, MP3, PNG, JPEG, ZIP
  • LLM-enhanced — optionally use a vision model to describe images embedded in documents
  • Audio transcription — integrates with Whisper for audio-to-text within document pipelines
  • MCP server — official markitdown-mcp lets agents convert files via tool calls
  • CLI + Python API — use from the command line or as a library in pipelines
  • Structure preservation — tables, headings, lists, and code blocks are faithfully converted
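
As a rough illustration of the "Python API ... as a library in pipelines" point, the sketch below walks a directory and writes one Markdown file per supported input. The helper names `convert_tree` and `markitdown_to_text` are hypothetical and not part of MarkItDown; the extension set is copied from the "Universal input" bullet, with `.jpg` added as an assumption. Only `MarkItDown().convert(path).text_content`, shown in the Quick Start, is taken from the listing itself.

```python
from pathlib import Path
from typing import Callable

# Extensions from the "Universal input" bullet, lowercased.
# ".jpg" is an assumption: the listing only names JPEG.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".html", ".epub", ".msg",
    ".csv", ".json", ".xml", ".wav", ".mp3", ".png", ".jpeg", ".jpg", ".zip",
}


def markitdown_to_text(path: str) -> str:
    """Convert one file with MarkItDown.

    The import is done lazily so that convert_tree can also be run
    with a stub converter when markitdown is not installed.
    """
    from markitdown import MarkItDown

    return MarkItDown().convert(path).text_content


def convert_tree(src_dir, out_dir,
                 to_text: Callable[[str], str] = markitdown_to_text):
    """Hypothetical helper: mirror src_dir into out_dir as Markdown files.

    Each output keeps the full input name plus ".md" (report.pdf becomes
    report.pdf.md) so report.pdf and report.docx do not overwrite each other.
    """
    src, out = Path(src_dir), Path(out_dir)
    written = []
    for path in sorted(src.rglob("*")):
        # Skip directories and anything outside the supported list.
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        target = out / path.relative_to(src).parent / (path.name + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(to_text(str(path)), encoding="utf-8")
        written.append(target)
    return written
```

With markitdown installed, `convert_tree("docs", "docs_md")` would be the typical call. Passing a different `to_text` callable makes it possible to reuse one configured converter instance or to try the directory logic without any conversion at all.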

Quick Start

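Install from PyPI (the quotes keep shells such as zsh from expanding the brackets):

pip install 'markitdown[all]'

Python API:

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")
print(result.text_content[:500])  # first 500 characters of the Markdown

CLI usage:

markitdown presentation.pptx > output.md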
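
Since the converted text is meant to feed LLM and RAG pipelines, one possible next step is to split it into sections at Markdown headings before indexing. The sketch below is a plain-Python illustration and not a MarkItDown feature: `split_by_heading` is a hypothetical helper, and it assumes the output uses `#`-style headings and triple-backtick code fences.

```python
import re

# One to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def _flush(sections, current):
    """Store the section collected so far, unless it is completely empty."""
    text = "\n".join(current["lines"]).strip()
    if current["title"] or text:
        sections.append(
            {"title": current["title"], "level": current["level"], "text": text}
        )


def split_by_heading(markdown: str) -> list:
    """Hypothetical helper: split Markdown into one dict per heading.

    Text before the first heading is returned with an empty title and
    level 0. Lines inside fenced code blocks are never treated as headings.
    """
    sections = []
    current = {"title": "", "level": 0, "lines": []}
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            _flush(sections, current)
            current = {
                "title": match.group(2),
                "level": len(match.group(1)),
                "lines": [],
            }
        else:
            current["lines"].append(line)
    _flush(sections, current)
    return sections
```

Applied to the Quick Start result, this would be `split_by_heading(result.text_content)`. Each returned dict carries `title`, `level`, and `text`, which can be embedded or size-limited further as the pipeline requires.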

Install via ai-supply

npx ai-supply add markitdown-document-converter

Curated mirror of the open-source MarkItDown project (MIT). Install upstream from the repository.
