TikaOnDotnet.TextExtractor
1.17.1
- .NET CLI: `dotnet add package TikaOnDotnet.TextExtractor --version 1.17.1`
- Package Manager Console: `NuGet\Install-Package TikaOnDotnet.TextExtractor -Version 1.17.1`
- PackageReference: `<PackageReference Include="TikaOnDotnet.TextExtractor" Version="1.17.1" />`
- Paket CLI: `paket add TikaOnDotnet.TextExtractor --version 1.17.1`
- Script & Interactive: `#r "nuget: TikaOnDotnet.TextExtractor, 1.17.1"`
- Cake addin: `#addin nuget:?package=TikaOnDotnet.TextExtractor&version=1.17.1`
- Cake tool: `#tool nuget:?package=TikaOnDotnet.TextExtractor&version=1.17.1`
Classes for running Apache Tika through **TikaOnDotNet**. Call `TextExtractor.Extract()` to pull text out of a document and you'll be on your way.
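A minimal usage sketch, assuming the `TikaOnDotNet.TextExtraction` namespace, the `Extract(string filePath)` overload, and a default `TextExtractionResult` that exposes `Text`, `ContentType` and `Metadata` (names taken from the TikaOnDotNet sources; verify them against the version you install). The file name `tika_sample.txt` is hypothetical and only exists so the example does not depend on a document you already have.

```cs
using System;
using System.IO;
using TikaOnDotNet.TextExtraction;

public static class Program
{
    // Writes a small, hypothetical sample file and runs it through TextExtractor.
    public static TextExtractionResult ExtractSample()
    {
        var path = Path.Combine(Path.GetTempPath(), "tika_sample.txt");
        File.WriteAllText(path, "Hello from Tika");

        // Extract(string filePath) lets Tika detect the format and parse the file.
        return new TextExtractor().Extract(path);
    }

    public static void Main()
    {
        var result = ExtractSample();

        Console.WriteLine(result.ContentType); // MIME type detected by Tika
        Console.WriteLine(result.Text.Trim()); // extracted plain text

        // Assumed: the default result maps each metadata name to a single value.
        foreach (var entry in result.Metadata)
        {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```

The same call works for other formats such as `.docx` or `.pdf`; Tika picks the parser from the detected content type, so the calling code does not change.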
Product | Compatible and additional computed target framework versions |
---|---|
.NET Framework | net is compatible. |
Dependencies:

- TikaOnDotnet (= 1.17.1)
NuGet packages (6)
Showing the top 5 NuGet packages that depend on TikaOnDotnet.TextExtractor:
Package | Description |
---|---|
Contrib.Sitecore.ContentSearch.TikaOnDotnet | Contribution project for Sitecore ContentSearch |
DevelopmentHelpers.FileContentReader | This package combines many open-source packages and provides one interface to read many types of content files, for example using Open XML to read .docx files |
Cogworks.ExamineFileIndexer | An Examine indexer that uses Apache Tika |
Skybrud.Umbraco.Search.DocumentIndexer | This package makes it possible to index and search a wide variety of file types in Umbraco, including .pdf and .docx |
Jetsons.JetPack.Text | A wrapper library that provides smart extension methods to convert document formats to high-quality text |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
1.17.1 | 538,914 | 4/3/2018 |
1.17.0 | 30,764 | 2/15/2018 |
1.16.0 | 164,387 | 7/30/2017 |
1.15.0 | 8,681 | 7/30/2017 |
1.14.2 | 116,341 | 4/22/2017 |
1.14.2-pre | 3,386 | 4/15/2017 |
1.14.1 | 18,710 | 1/13/2017 |
1.14.0 | 10,038 | 12/8/2016 |
1.13.1 | 10,636 | 8/16/2016 |
1.13.0 | 15,604 | 6/30/2016 |
1.12.2 | 18,248 | 4/12/2016 |
1.12.1 | 1,628 | 4/12/2016 |
1.12.0 | 1,775 | 4/11/2016 |
- Add new overloads to `TextExtractor.Extract` that let users supply their own extraction result assemblers. Example:
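```cs
using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using NUnit.Framework;
using org.apache.tika.metadata;
using TikaOnDotNet.TextExtraction;

// Result type built by a user-supplied assembler instead of the default TextExtractionResult.
public class CustomResult
{
    public string Text { get; set; }
    // Keeps every value for a metadata name (e.g. several authors), not just one string.
    public IDictionary<string, string[]> Metadata { get; set; }
}

[TestFixture]
public class CustomResultTests
{
    // Assembler passed to Extract: receives the extracted text and Tika's Metadata object.
    public static CustomResult CreateCustomResult(string text, Metadata metadata)
    {
        var metaDataDictionary = metadata.names().ToDictionary(name => name, name => metadata.getValues(name));
        return new CustomResult
        {
            Metadata = metaDataDictionary,
            Text = text,
        };
    }

    [Test]
    public void should_extract_author_list_from_pdf()
    {
        var textExtractionResult = new TextExtractor().Extract("file_with_authors.pdf", CreateCustomResult);
        textExtractionResult.Metadata["meta:author"].Should().ContainInOrder("Fred Jones, M. D.", "Donald Evans D. M.");
    }
}
```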