Connector · Data & ETL · Free

MarkItDown

Microsoft's universal document-to-Markdown converter: PDF, DOCX, PPTX, XLSX, HTML, images, audio, and ZIP — all to clean Markdown.

@ai-supply
Installs: 145k · Rating: ★ 4.7 · Reviews: 48

MarkItDown is Microsoft's open-source utility that converts a wide range of file formats to Markdown text, making documents easier to ingest in LLM and RAG pipelines. It handles PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, HTML pages, images (with OCR or an LLM-generated description), audio files (via speech transcription), and ZIP archives.

Key Features

  • Universal input — PDF, DOCX, PPTX, XLSX, XLS, HTML, EPUB, MSG, CSV, JSON, XML, WAV, MP3, PNG, JPEG, ZIP
  • LLM-enhanced — optionally use a vision model to describe images embedded in documents
  • Audio transcription — speech-to-text for audio files such as WAV and MP3 within document pipelines
  • MCP server — official markitdown-mcp lets agents convert files via tool calls
  • CLI + Python API — use from the command line or as a library in pipelines
  • Structure preservation — tables, headings, lists, and code blocks are carried over as Markdown where the source format exposes them
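
Because headings are kept in the converted output, one common way to prepare that output for a RAG index is to split it on heading lines. The sketch below is not part of MarkItDown; `split_by_headings` is a hypothetical helper that uses only the standard library and assumes the Markdown uses ATX-style (`#`) headings and triple-backtick fences.

```python
import re
from typing import List, Tuple

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) sections.

    Text before the first heading is returned with an empty heading.
    Lines inside triple-backtick fences are never treated as headings.
    """
    sections: List[Tuple[str, str]] = []
    title, lines = "", []
    in_fence = False

    def flush() -> None:
        # Emit the current section unless it is completely empty.
        if title or any(item.strip() for item in lines):
            sections.append((title, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections
```

Each returned pair can then be embedded or indexed as its own chunk, with the heading stored as metadata. Splitting on headings rather than on a fixed character count keeps tables and code blocks inside a single chunk in most cases, at the cost of uneven chunk sizes.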

Quick Start

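Install with all optional dependencies (the quotes keep shells such as zsh from expanding the brackets):

    pip install 'markitdown[all]'

Python usage:

    from markitdown import MarkItDown

    md = MarkItDown()
    result = md.convert("report.pdf")
    print(result.text_content[:500])  # first 500 characters of the Markdown

CLI usage:

    markitdown presentation.pptx > output.md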
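
For pipelines that process many files, the single-file call can be wrapped in a small batch helper. The following is a minimal sketch, not an official MarkItDown feature: `convert_tree` and `markitdown_converter` are hypothetical names, the suffix list is an assumption, and the converter is passed in as a plain callable so the directory walk can be tested without MarkItDown installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_tree(
    src: Path,
    dst: Path,
    convert: Callable[[Path], str],
    suffixes: Iterable[str] = (".pdf", ".docx", ".pptx", ".xlsx", ".html"),
) -> List[Path]:
    """Convert every matching file under src and mirror the tree under dst.

    `convert` takes a file path and returns Markdown text. Files that share
    a stem but differ in suffix (a.pdf, a.docx) map to the same output file,
    so the later one overwrites the earlier one.
    """
    wanted = {s.lower() for s in suffixes}
    written: List[Path] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        out = dst / path.relative_to(src).with_suffix(".md")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(convert(path), encoding="utf-8")
        written.append(out)
    return written


def markitdown_converter() -> Callable[[Path], str]:
    """Build a converter backed by MarkItDown (imported lazily)."""
    from markitdown import MarkItDown  # requires: pip install 'markitdown[all]'

    md = MarkItDown()
    return lambda path: md.convert(str(path)).text_content
```

A call such as `convert_tree(Path("docs"), Path("markdown"), markitdown_converter())` would write one `.md` file per input document; the directory names here are placeholders.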

Install via ai-supply

npx ai-supply add markitdown-document-converter

Curated mirror of the open-source MarkItDown project (MIT). Install upstream from the repository.
