ai-supply.store
Connector · Data & ETL · Free

MarkItDown

Microsoft's universal document-to-Markdown converter: PDF, DOCX, PPTX, XLSX, HTML, images, audio, and ZIP — all to clean Markdown.

@ai-supply
Installs: 145k
Rating: ★ 4.7
Reviews: 48
↗ Source repository

MarkItDown is Microsoft's open-source utility for converting a wide range of file formats to Markdown text, so that documents can be ingested by LLMs and RAG pipelines. It handles PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, HTML pages, images (with OCR/LLM description), audio files (via Whisper), and ZIP archives.
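
Because the output is plain Markdown, a downstream RAG pipeline can split it on headings before embedding. The sketch below is one possible way to do that using only the standard library. `split_by_heading` is a hypothetical helper, not part of MarkItDown, and it assumes the converted text uses ATX-style (`#`) headings.

```python
import re
from typing import List, Tuple

# Matches ATX headings such as "# Title" or "### Section" (1 to 6 hashes).
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at ATX headings.

    Text before the first heading is returned under an empty heading.
    Headings inside fenced code blocks are not treated specially here.
    """
    chunks = []
    title, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Flush the chunk collected so far, unless it is entirely empty.
            if title or any(s.strip() for s in body):
                chunks.append((title, "\n".join(body).strip()))
            title, body = match.group(2), []
        else:
            body.append(line)
    if title or any(s.strip() for s in body):
        chunks.append((title, "\n".join(body).strip()))
    return chunks
```

Each returned pair can then be embedded or indexed separately, with the heading kept as lightweight metadata for the chunk.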

Key Features

  • Universal input — PDF, DOCX, PPTX, XLSX, XLS, HTML, EPUB, MSG, CSV, JSON, XML, WAV, MP3, PNG, JPEG, ZIP
  • LLM-enhanced — optionally use a vision model to describe images embedded in documents
  • Audio transcription — integrates with Whisper for audio-to-text within document pipelines
  • MCP server — official markitdown-mcp lets agents convert files via tool calls
  • CLI + Python API — use from the command line or as a library in pipelines
  • Structure preservation — tables, headings, lists, and code blocks are faithfully converted
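
As an illustration of the library use case in the list above, here is a minimal sketch of a batch converter that walks a directory and writes one Markdown file per supported input. The `convert` argument is any callable that maps a path to Markdown text, an assumption made so the helper can be exercised without MarkItDown installed. `convert_tree` and `SUPPORTED` are hypothetical names, and the extension set is simply copied from the "Universal input" bullet.

```python
from pathlib import Path
from typing import Callable, List

# Extensions copied from the "Universal input" bullet; adjust this set to
# whatever your installed MarkItDown extras actually support.
SUPPORTED = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".html", ".epub", ".msg",
    ".csv", ".json", ".xml", ".wav", ".mp3", ".png", ".jpeg", ".zip",
}


def convert_tree(src: Path, out: Path, convert: Callable[[Path], str]) -> List[Path]:
    """Convert every supported file under `src`, mirroring the tree under `out`.

    Returns the Markdown files written, in sorted source order.
    """
    written = []
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue  # skip directories and unsupported formats
        # Keep the original suffix so report.pdf and report.docx do not collide.
        target = out / path.relative_to(src).with_name(path.name + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(convert(path), encoding="utf-8")
        written.append(target)
    return written
```

With MarkItDown installed, the callable could be `lambda p: md.convert(str(p)).text_content` for a single shared `md = MarkItDown()` instance.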

Quick Start

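# Install (the quotes keep shells such as zsh from expanding the brackets)
pip install 'markitdown[all]'

# Python usage
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")
print(result.text_content[:500])  # first 500 characters of the Markdown

# CLI usage
markitdown presentation.pptx > output.md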
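
The CLI line above redirects the Markdown to a file; the same step can be sketched in Python. This is a minimal example, assuming the `markitdown` package is installed by the time `convert_file` is called. `convert_file` and `default_output_path` are hypothetical helper names; only `MarkItDown().convert(...)` and `.text_content` come from the Quick Start.

```python
from pathlib import Path
from typing import Optional


def default_output_path(src: Path) -> Path:
    """Place the Markdown next to the source, e.g. slides.pptx -> slides.md."""
    return src.with_suffix(".md")


def convert_file(src: str, dst: Optional[str] = None) -> Path:
    """Convert one document and write the resulting Markdown to disk."""
    # Imported lazily so this module still loads where markitdown is absent.
    from markitdown import MarkItDown

    source = Path(src)
    target = Path(dst) if dst else default_output_path(source)
    result = MarkItDown().convert(str(source))
    target.write_text(result.text_content, encoding="utf-8")
    return target
```

Calling `convert_file("presentation.pptx", "output.md")` would mirror the CLI example; omitting the second argument writes `presentation.md` beside the source file.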

Install via ai-supply

npx ai-supply add markitdown-document-converter

Curated mirror of the open-source MarkItDown project (MIT). Install upstream from the repository.
