Teros.DocxToTextConverter
1.0.2
dotnet add package Teros.DocxToTextConverter --version 1.0.2
NuGet\Install-Package Teros.DocxToTextConverter -Version 1.0.2
<PackageReference Include="Teros.DocxToTextConverter" Version="1.0.2" />
paket add Teros.DocxToTextConverter --version 1.0.2
#r "nuget: Teros.DocxToTextConverter, 1.0.2"
// Install Teros.DocxToTextConverter as a Cake Addin
#addin nuget:?package=Teros.DocxToTextConverter&version=1.0.2
// Install Teros.DocxToTextConverter as a Cake Tool
#tool nuget:?package=Teros.DocxToTextConverter&version=1.0.2
DocxToTextConverter converts a Word "Docx" document to a text string.
Usage
Pass a file stream, file path or byte array, with optional settings, to the converter. A newline-delimited text string is returned.
var converter = new Teros.DocxToTextConverter();
var docText = converter.ConvertDocxToText(filePathStreamOrBytes);
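A minimal sketch of the three input kinds mentioned above. It assumes the overloads accept a string path, a FileStream, and a byte[], and that a string is returned; the file name sample.docx is hypothetical, and the Teros.DocxToTextConverter package must be referenced for this to compile.

```csharp
using System;
using System.IO;

var converter = new Teros.DocxToTextConverter();

// 1. From a file path (hypothetical file name).
string fromPath = converter.ConvertDocxToText("sample.docx");

// 2. From a file stream; the using block disposes the stream afterwards.
string fromStream;
using (FileStream stream = File.OpenRead("sample.docx"))
{
    fromStream = converter.ConvertDocxToText(stream);
}

// 3. From a byte array already held in memory.
byte[] bytes = File.ReadAllBytes("sample.docx");
string fromBytes = converter.ConvertDocxToText(bytes);

Console.WriteLine(fromPath);
```

The byte-array form may be convenient when the document arrives from an upload or a database rather than from disk.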
This class uses the Open XML SDK and is not thread-safe. Create a new instance for each process or thread that uses it.
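As a rough sketch of the one-instance-per-thread advice, the snippet below converts several files in parallel and constructs a converter inside each iteration instead of sharing one. The file names are hypothetical, and it assumes the file-path overload of ConvertDocxToText returns a string.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical input files; replace with real paths.
var paths = new[] { "report.docx", "notes.docx" };

// Thread-safe map from file path to converted text.
var results = new ConcurrentDictionary<string, string>();

Parallel.ForEach(paths, path =>
{
    // A fresh converter per iteration, so no instance is shared across threads.
    var converter = new Teros.DocxToTextConverter();
    results[path] = converter.ConvertDocxToText(path);
});

foreach (var pair in results)
{
    Console.WriteLine($"{pair.Key}: {pair.Value.Length} characters");
}
```

Creating the converter inside the loop body keeps the example simple; a per-thread instance would serve the same purpose if construction turned out to be expensive.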
Options
Word documents are formatted in endlessly varied ways, especially in their bulleted lists, white space, and special characters. DocxToTextConverter tries to make smart choices about how to interpret each document. Pass a settings object as the second argument to adjust how the converted text looks:
var settings = new DocxToTextSettings()
{
    // When true, this overrides most of the other settings:
    PreserveDocumentFormatting = false,
    // These are the default settings:
    ReplaceNonStandardCharsWithAsciiChars = true,
    BulletCharsForLists = BulletListChars.UseSolidBulletChar,
    ConvertAsciiBulletChars = true,
    IndentNestedListsWithTabChar = true,
    FollowBulletCharBySpaceChar = true,
    HyperlinkFormatting = HyperlinkFormat.TextAndUrl,
    ReplaceLongBlankStringsWithTab = true
};
var docText = converter.ConvertDocxToText(filePathStreamOrBytes, settings);
Descriptions of these options are available in Visual Studio through IntelliSense or by hovering the mouse over the settings properties. Available options may differ from those shown here.
Limitations
DocxToTextConverter extracts only text from a Word document. It does not recognize tables, images, etc.
Numbered lists are currently converted to bullets, and the original bullet characters used in Word are not retrieved.
Technical Support
Please send problems, questions, and comments to nuget@terosresearch.com. If you are reporting a problem, please describe how to replicate it. Attachments are discouraged; we will ask for one if the problem can't be replicated.
License
This software can be used without restrictions under the terms of the MIT License.
Donations
DocxToTextConverter is free to download and use. If you find this project helpful and would like to support its development, please Buy Me a Coffee.
Product | Versions (compatible and additional computed target frameworks) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
.NETStandard 2.1
- DocumentFormat.OpenXml (>= 2.20.0)
Version | Downloads | Last updated |
---|---|---|
1.0.2 | 202 | 10/27/2023 |
Initial release; .net standard