GroupDocs.Parser
20.6.0
A newer version of this package is available. See the version list below for details.
dotnet add package GroupDocs.Parser --version 20.6.0
NuGet\Install-Package GroupDocs.Parser -Version 20.6.0
<PackageReference Include="GroupDocs.Parser" Version="20.6.0" />
paket add GroupDocs.Parser --version 20.6.0
#r "nuget: GroupDocs.Parser, 20.6.0"
// Install GroupDocs.Parser as a Cake Addin
#addin nuget:?package=GroupDocs.Parser&version=20.6.0

// Install GroupDocs.Parser as a Cake Tool
#tool nuget:?package=GroupDocs.Parser&version=20.6.0
Document Parser .NET API
This on-premise text parser API searches and extracts formatted text, as well as raw text, from documents in a variety of supported file formats.
Document Parser Processing Features
- Parse documents by user-defined templates.
- Extract plain and structured text.
- Extract text areas with coordinates, text styles and other information.
- Search text by a keyword or regular expression; extract text around that word.
- Extract HTML or Markdown (MD) formatted text for a fast preview.
- Increase performance by extracting raw text.
- Extract formatted text, metadata, images, containers, and attachments.
- Extract table of contents for some supported document formats.
- Parse form data from PDF documents.
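As a rough illustration of the keyword search feature listed above, the following sketch assumes the `Parser.Search` method and the `SearchResult` type (`Position`, `Text`) from the GroupDocs.Parser for .NET API. The file name `sample.pdf` and the keyword `invoice` are hypothetical placeholders, not values from the package documentation.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class SearchSketch
{
    static void Main()
    {
        // "sample.pdf" is a hypothetical path; replace it with a real document
        using (Parser parser = new Parser("sample.pdf"))
        {
            // Assumption: Search returns null when search isn't supported for the format
            IEnumerable<SearchResult> results = parser.Search("invoice");
            if (results == null)
            {
                Console.WriteLine("Search isn't supported.");
                return;
            }

            // Print the position and the matched text of every occurrence
            foreach (SearchResult result in results)
            {
                Console.WriteLine("At {0}: {1}", result.Position, result.Text);
            }
        }
    }
}
```

Checking the result for null before iterating mirrors the feature checks used in the package's own samples further down.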
New Features & Enhancements in Version 20.6
- Implemented the API to extract data from documents.
- Ability to detect media types for the Zip container.
For the detailed notes, please visit GroupDocs.Parser for .NET 20.6 Release Notes.
Parse Document by Template
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Portable: PDF
Extract Text (Accurate)
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Email: EML, EMLX, MSG
- Markup: HTML, XHTML, MHTML, MD, XML
- eBooks: CHM, EPUB, FB2
- Portable: PDF
- Notes: ONE
- Databases: Supported via ADO.NET. To work with a particular database format, install its database provider.
Extract Text (Raw)
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, XLA, XLAM
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
- Portable: PDF
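A minimal sketch of raw-mode text extraction, assuming the `TextOptions` constructor takes a flag that requests raw mode where the format allows it, and that `GetText` returns null when text extraction isn't supported. The file name `sample.xlsx` is a hypothetical placeholder.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

class RawTextSketch
{
    static void Main()
    {
        // "sample.xlsx" is a hypothetical path; replace it with a real document
        using (Parser parser = new Parser("sample.xlsx"))
        {
            // Assumption: TextOptions(true) asks for raw mode if the format supports it
            using (TextReader reader = parser.GetText(new TextOptions(true)))
            {
                // A null reader is taken to mean that text extraction isn't supported
                Console.WriteLine(reader == null
                    ? "Text extraction isn't supported."
                    : reader.ReadToEnd());
            }
        }
    }
}
```

Raw mode trades formatting accuracy for speed, so it suits indexing or bulk processing more than display.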
Extract Structured Text and Formatted Text
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Email: EML, EMLX, MSG
- Markup: MD (formatted text is not supported)
- eBooks: CHM, EPUB, FB2
Extract Text Areas
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Portable: PDF
Extract Metadata
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Email: EML, EMLX, MSG
- eBooks: EPUB, FB2
- Portable: PDF
Extract Images
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
- Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
- Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Email: EML, EMLX, MSG
- Portable: PDF
- Archive: ZIP
Extract Containers and Attachments
- Email: PST, OST, EML, EMLX, MSG
- Portable: PDF
- Archive: ZIP
Parse Form Data
- Portable: PDF
Extract Table of Contents
- Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
- eBooks: CHM, EPUB
- Portable: PDF
- Databases: Supported via ADO.NET. To work with a particular database format, install its database provider.
Platform Independence
GroupDocs.Parser for .NET does not require any external software or third-party tool to be installed. It supports any 32-bit or 64-bit operating system where the .NET or Mono framework is installed. The details are as follows:
- Microsoft Windows: Microsoft Windows Desktop (x86, x64) (XP & up), Microsoft Windows Server (x86, x64) (2000 & up), Windows Azure
- Mac OS: Mac OS X
- Linux: Linux (Ubuntu, OpenSUSE, CentOS and others)
- Development Environments: Microsoft Visual Studio (2010 & up), Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop 2.4 and later
- Supported Frameworks: GroupDocs.Parser for .NET supports the .NET and Mono frameworks.
Getting Started with GroupDocs.Parser for .NET
Ready to give GroupDocs.Parser for .NET a try? Run Install-Package GroupDocs.Parser from the Package Manager Console in Visual Studio to fetch and reference the GroupDocs.Parser assembly in your project. If you already have GroupDocs.Parser for .NET and want to upgrade it, run Update-Package GroupDocs.Parser to get the latest version.
Please check the GitHub Repository for other common usage scenarios.
Use C# Code to Extract Data from Database
// Namespaces used by this snippet
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

// Build the connection string; "database.db" is the SQLite database file
string connectionString = string.Format("Provider=System.Data.Sqlite;Data Source={0};Version=3;", "database.db");

// Create an instance of the Parser class to extract tables from the database.
// The connection string is passed in place of a file path, and LoadOptions
// sets the file format to Database.
using (Parser parser = new Parser(connectionString, new LoadOptions(FileFormat.Database)))
{
    // Check if text extraction is supported
    if (!parser.Features.Text)
    {
        Console.WriteLine("Text extraction isn't supported.");
        return;
    }

    // Check if table of contents extraction is supported
    if (!parser.Features.Toc)
    {
        Console.WriteLine("Toc extraction isn't supported.");
        return;
    }

    // Get the list of tables (each table is exposed as a table of contents item)
    IEnumerable<TocItem> toc = parser.GetToc();

    // Iterate over the tables
    foreach (TocItem i in toc)
    {
        // Print the table name
        Console.WriteLine(i.Text);

        // Extract the table content as text
        using (TextReader reader = parser.GetText(i.PageIndex.Value))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
Extract all Images and Save them in PNG Format via C# Code
// Namespaces used by this snippet
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class.
// Constants.SampleZip is a path to a sample ZIP archive defined elsewhere.
using (Parser parser = new Parser(Constants.SampleZip))
{
    // Extract images from the document
    IEnumerable<PageImageArea> images = parser.GetImages();

    // A null result means that image extraction isn't supported
    if (images == null)
    {
        Console.WriteLine("Page images extraction isn't supported");
        return;
    }

    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);

    int imageNumber = 0;

    // Iterate over the images
    foreach (PageImageArea image in images)
    {
        // Save the image to a PNG file named after its index
        image.Save(imageNumber.ToString() + ".png", options);
        imageNumber++;
    }
}
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net20 is compatible. net35 was computed. net40 was computed. net403 was computed. net45 was computed. net451 was computed. net452 was computed. net46 was computed. net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
- .NETFramework 2.0
  - No dependencies.
- .NETStandard 2.0
  - SkiaSharp (>= 1.68.1)
  - System.Drawing.Common (>= 4.5.1)
  - System.Reflection.Emit (>= 4.3.0)
  - System.Reflection.Emit.ILGeneration (>= 4.3.0)
  - System.Security.Permissions (>= 4.5.0)
  - System.Text.Encoding.CodePages (>= 4.5.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
24.10.0 | 705 | 11/1/2024 |
24.9.0 | 2,187 | 9/30/2024 |
24.8.0 | 23,447 | 8/30/2024 |
24.7.0 | 1,505 | 7/24/2024 |
24.6.0 | 2,580 | 6/29/2024 |
24.5.0 | 5,235 | 5/31/2024 |
24.4.0 | 5,358 | 4/23/2024 |
24.2.1 | 7,033 | 3/13/2024 |
24.2.0 | 1,295 | 2/29/2024 |
23.12.0 | 133,551 | 12/23/2023 |
23.11.0 | 36,269 | 11/24/2023 |
23.10.0 | 13,456 | 10/21/2023 |
23.8.0 | 65,476 | 8/18/2023 |
23.5.0 | 84,331 | 5/31/2023 |
23.3.0 | 16,032 | 3/31/2023 |
23.2.0 | 22,844 | 3/1/2023 |
22.11.1 | 24,598 | 1/17/2023 |
22.11.0 | 38,848 | 11/29/2022 |
22.8.0 | 74,089 | 8/12/2022 |
22.6.0 | 31,412 | 6/7/2022 |
22.2.0 | 37,152 | 2/25/2022 |
21.5.0 | 63,073 | 5/31/2021 |
21.2.0 | 50,838 | 2/22/2021 |
20.12.0 | 24,372 | 12/30/2020 |
20.10.0 | 168,159 | 10/27/2020 |
20.8.0 | 48,774 | 8/19/2020 |
20.6.1 | 47,372 | 6/30/2020 |
20.6.0 | 20,025 | 6/19/2020 |
20.5.0 | 35,106 | 5/8/2020 |
20.3.0 | 48,312 | 3/19/2020 |
20.1.0 | 35,646 | 1/31/2020 |
19.12.0 | 33,524 | 12/27/2019 |
19.11.0 | 28,444 | 11/22/2019 |
19.9.0 | 2,801 | 9/27/2019 |
19.5.0 | 3,031 | 5/29/2019 |
18.12.0 | 3,207 | 12/11/2018 |
18.11.0 | 2,694 | 11/8/2018 |
18.10.0 | 2,777 | 10/10/2018 |
18.9.0 | 2,765 | 9/5/2018 |
18.8.0 | 2,834 | 8/7/2018 |
18.7.0 | 2,784 | 7/3/2018 |
18.5.0 | 3,006 | 5/23/2018 |