MarkItDown 0.0.1

dotnet add package MarkItDown --version 0.0.1
                    
NuGet\Install-Package MarkItDown -Version 0.0.1
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="MarkItDown" Version="0.0.1" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
Directory.Packages.props:
<PackageVersion Include="MarkItDown" Version="0.0.1" />

Project file:
<PackageReference Include="MarkItDown" />

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution's Directory.Packages.props file to version the package, and reference the package without a version in the project file.
paket add MarkItDown --version 0.0.1
                    
#r "nuget: MarkItDown, 0.0.1"
                    
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy it into the interactive tool or the script's source code to reference the package.
#addin nuget:?package=MarkItDown&version=0.0.1
                    
Install MarkItDown as a Cake Addin
#tool nuget:?package=MarkItDown&version=0.0.1
                    
Install MarkItDown as a Cake Tool

MarkItDown.Net

A .NET library for converting various file formats to Markdown, making it ideal for indexing, text analysis, and other applications that benefit from structured text. This project builds upon the early work of MarkItDownSharp.

Supported Formats

  • PDF
  • Word (.docx)
  • Excel (.xlsx)
  • Images (EXIF metadata extraction and optional LLM-based description)
  • Audio (EXIF metadata extraction only)
  • HTML
  • Text-based formats (plain text, .csv, .xml, .rss, .atom)
  • Jupyter Notebooks (.ipynb)
  • Bing Search Result Pages (SERP)
  • ZIP files (recursively iterates over contents)
  • PowerPoint (.pptx)
  • Confluence (spaces and single pages)
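
To illustrate what "recursively iterates over contents" can mean for ZIP input, here is a minimal sketch that lists every file in an archive and descends into nested .zip entries. It uses only System.IO.Compression from the .NET base class library. The ZipWalker class and its ListEntries method are hypothetical names chosen for this example; they are not part of MarkItDown's API, and the library's own ZIP converter may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only; not part of the MarkItDown package.
public static class ZipWalker
{
    // Yields the path of every file entry, descending into nested .zip entries.
    public static IEnumerable<string> ListEntries(Stream zipStream, string prefix = "")
    {
        using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            // Directory entries have an empty Name; skip them.
            if (string.IsNullOrEmpty(entry.Name))
            {
                continue;
            }

            string path = prefix + entry.FullName;
            if (entry.Name.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                // Copy the nested archive into memory so it can be read as its own ZIP.
                using var buffer = new MemoryStream();
                using (Stream entryStream = entry.Open())
                {
                    entryStream.CopyTo(buffer);
                }
                buffer.Position = 0;

                foreach (string nested in ListEntries(buffer, path + "/"))
                {
                    yield return nested;
                }
            }
            else
            {
                yield return path;
            }
        }
    }
}
```

A converter built along these lines would hand each yielded entry to the converter that matches its file extension instead of only listing its path. Buffering nested archives in memory keeps the sketch short, but large archives would likely call for a temporary file instead.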

Features

  • Modern .NET implementation
  • Enhanced performance and reliability
  • Expanded format support
  • Improved error handling
  • Comprehensive documentation
  • Extensible third-party service integration

Third-Party Services Support

The library is designed with extensibility in mind, allowing integration with various third-party services for enhanced functionality.

Currently Supported Services

  • Aliyun OCR Service
    • Document text recognition
    • Table structure recognition
    • Handwriting recognition
    • Other OCR capabilities offered by Aliyun

Adding New Services

The library provides a flexible plugin architecture for adding new service integrations. Documentation for implementing custom service providers will be available soon.

Getting Started

[Coming soon]

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

This project is based on the early work of MarkItDownSharp by kelter-antunes. We are grateful for their initial implementation, which provided a solid foundation for this enhanced version.

License

[License information coming soon]

Note

Speech recognition for the audio converter is planned for a future release. Contributions in this area are especially welcome.

Compatible and additional computed target framework versions

Product: .NET
Compatible: net9.0
Computed: net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows

NuGet packages (1)

One NuGet package depends on MarkItDown: MarkItDown.Extensions.AliyunOCR.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version  Downloads  Last updated
0.0.1    169        4/22/2025