Teros.DocxToTextConverter 1.0.2

.NET CLI:
dotnet add package Teros.DocxToTextConverter --version 1.0.2

Package Manager (run this in the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package):
NuGet\Install-Package Teros.DocxToTextConverter -Version 1.0.2

PackageReference (for projects that support PackageReference, copy this XML node into the project file):
<PackageReference Include="Teros.DocxToTextConverter" Version="1.0.2" />

Paket CLI:
paket add Teros.DocxToTextConverter --version 1.0.2

Script and Interactive (the #r directive works in F# Interactive and Polyglot Notebooks; copy it into the interactive tool or the script's source code):
#r "nuget: Teros.DocxToTextConverter, 1.0.2"

Cake:
// Install Teros.DocxToTextConverter as a Cake Addin
#addin nuget:?package=Teros.DocxToTextConverter&version=1.0.2

// Install Teros.DocxToTextConverter as a Cake Tool
#tool nuget:?package=Teros.DocxToTextConverter&version=1.0.2

Teros.DocxToTextConverter

DocxToTextConverter converts a Word "Docx" document to a text string.

Usage

Pass a file stream, file path or byte array, with optional settings, to the converter. A newline-delimited text string is returned.

var converter = new Teros.DocxToTextConverter();

var docText = converter.ConvertDocxToText(filePathStreamOrBytes);
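The three input kinds mentioned above might be used as in the following sketch. It assumes that ConvertDocxToText has overloads accepting a string path, a stream, and a byte array, as the description suggests. The file name "report.docx" is hypothetical, and the exact newline sequence in the returned string is not documented here, so the split handles both common forms.

```csharp
using System;
using System.IO;

var converter = new Teros.DocxToTextConverter();

// Hypothetical input file; substitute your own document.
string path = "report.docx";

// 1. From a file path.
var fromPath = converter.ConvertDocxToText(path);

// 2. From a file stream (assumed overload; the stream is disposed afterwards).
using (FileStream stream = File.OpenRead(path))
{
    var fromStream = converter.ConvertDocxToText(stream);
}

// 3. From a byte array (assumed overload), e.g. a document loaded from a database.
byte[] bytes = File.ReadAllBytes(path);
var fromBytes = converter.ConvertDocxToText(bytes);

// The result is newline-delimited, so it can be split into lines.
// Both "\r\n" and "\n" are handled because the delimiter is not specified.
string[] lines = fromPath.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
Console.WriteLine($"Extracted {lines.Length} lines.");
```

The top-level statements require C# 9 or later; in older projects, place the same code inside a Main method.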

This class uses the Open XML SDK and is not thread-safe. Create a separate converter instance for each process or thread that uses it.
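One way to follow this advice when converting several documents in parallel is to create the converter inside each unit of work, so no instance is ever shared between threads. This is a minimal sketch, not the package's prescribed pattern. The file names are hypothetical, and it assumes the string-path overload of ConvertDocxToText.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical input files.
string[] paths = { "a.docx", "b.docx", "c.docx" };

// Thread-safe container for the results, keyed by file path.
var results = new ConcurrentDictionary<string, string>();

Parallel.ForEach(paths, path =>
{
    // A fresh converter per iteration: instances are never shared across threads.
    var converter = new Teros.DocxToTextConverter();
    results[path] = converter.ConvertDocxToText(path);
});

foreach (var pair in results)
{
    Console.WriteLine($"{pair.Key}: {pair.Value.Length} characters");
}
```

Creating one instance per file is the simplest safe choice. If construction turns out to be expensive, a per-thread instance (for example via ThreadLocal<T>) would be an alternative.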

Options

People format Word documents in endless ways, especially bulleted lists, white space, and special characters. DocxToTextConverter tries to make smart choices about how to interpret each document. Pass a settings object as the second argument to adjust the appearance of the converted text:

var settings = new DocxToTextSettings()
{
    // When true, this overrides most of the other settings:
    PreserveDocumentFormatting = false,

    // These are the default settings:
    ReplaceNonStandardCharsWithAsciiChars = true,
    BulletCharsForLists = BulletListChars.UseSolidBulletChar,
    ConvertAsciiBulletChars = true,
    IndentNestedListsWithTabChar = true,
    FollowBulletCharBySpaceChar = true,
    HyperlinkFormatting = HyperlinkFormat.TextAndUrl,
    ReplaceLongBlankStringsWithTab = true
};

var docText = converter.ConvertDocxToText(filePathStreamOrBytes, settings);
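Because PreserveDocumentFormatting is described as overriding most of the other settings, a caller who wants the text to stay close to the original layout may only need to set that one property. The sketch below assumes DocxToTextSettings is in scope exactly as in the snippet above and that unset properties keep their defaults. The file name is hypothetical.

```csharp
using System;

var converter = new Teros.DocxToTextConverter();

// Only the override flag is set; the remaining properties are assumed
// to keep the defaults listed in the settings example.
var settings = new DocxToTextSettings
{
    PreserveDocumentFormatting = true
};

// Hypothetical input file.
var docText = converter.ConvertDocxToText("report.docx", settings);
Console.WriteLine(docText);
```

Which settings are still honored when this flag is true is not specified here; the IntelliSense descriptions are the place to check.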

Descriptions of these options are available in Visual Studio through IntelliSense, for example by hovering the mouse over a settings property. The options available in your installed version may differ from those shown here.

Limitations

  • DocxToTextConverter extracts only text from a Word document. It does not recognize tables, images, etc.

  • Numbered lists are currently converted to bullets, and the original bullet characters used in Word are not retrieved.

Technical Support

Please send problems, questions, and comments to nuget@terosresearch.com. If you are reporting a problem, please describe how to replicate it. Please do not send attachments up front; we will ask for one if the problem can't be replicated.

License

This software can be used without restrictions under the terms of the MIT License.

Donations

DocxToTextConverter is free to download and use. If you find this project helpful and would like to help support its development, please Buy Me a Coffee.

Compatible and additional computed target framework versions, by product:

.NET (computed): net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows
.NET Core (computed): netcoreapp3.0, netcoreapp3.1
.NET Standard (compatible): netstandard2.1
MonoAndroid (computed): monoandroid
MonoMac (computed): monomac
MonoTouch (computed): monotouch
Tizen (computed): tizen60
Xamarin.iOS (computed): xamarinios
Xamarin.Mac (computed): xamarinmac
Xamarin.TVOS (computed): xamarintvos
Xamarin.WatchOS (computed): xamarinwatchos

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.0.2 202 10/27/2023

Initial release; .NET Standard