Hyland.DocumentFilters.Linux
24.4.0
dotnet add package Hyland.DocumentFilters.Linux --version 24.4.0
NuGet\Install-Package Hyland.DocumentFilters.Linux -Version 24.4.0
<PackageReference Include="Hyland.DocumentFilters.Linux" Version="24.4.0" />
paket add Hyland.DocumentFilters.Linux --version 24.4.0
#r "nuget: Hyland.DocumentFilters.Linux, 24.4.0"
// Install Hyland.DocumentFilters.Linux as a Cake Addin
#addin nuget:?package=Hyland.DocumentFilters.Linux&version=24.4.0

// Install Hyland.DocumentFilters.Linux as a Cake Tool
#tool nuget:?package=Hyland.DocumentFilters.Linux&version=24.4.0
Document Filters is a toolkit that allows application developers to identify and extract metadata, as well as convert and render almost any file type. It is a core component of many products, but can also be leveraged as a stand-alone service for organizations or application developers.
Dependencies
- Hyland.DocumentFilters.Linux-Musl-x64 (>= 24.4.0)
- Hyland.DocumentFilters.Linux-x64 (>= 24.4.0)
# Enhancements
- Introduced the option "GRAPHIC_BMP_DPI" to control whether the DPI stored in the BITMAPINFOHEADER is read when processing BMP image files. The option defaults to OFF, in which case a default of 96 DPI is used. (DF-2066)
- Introduced support for C++17 in the Document Filters C++ bindings. (DF-2140)
- Introduced support for Markdown output in text mode through the new "IGR_FORMAT_MARKDOWN" text mode flag, allowing documents to be converted into Markdown format. Because HD Mode is not required, Markdown output can be produced with better performance and efficiency. (DF-2200)
- Introduced support for a new Simplified JSON output canvas type, which represents document structure using a simplified JSON schema. It avoids the complexity of processing full DOM structures, which is particularly useful for AI and ML applications that need structured data, such as headings, tables, paragraphs, and lists, in a flattened hierarchy similar to Markdown output. (DF-2170)
- Introduced support for including image metadata in Markdown output when "MARKDOWN_INCLUDE_METADATA=ON". (DF-2227)
- Introduced support for simplified content cleaning in Markdown and JSON outputs. Multiple configurable cleaning options are available, including removing non-ASCII characters and normalizing quotes; these make the generated content more readable and more machine-friendly for downstream processing in AI/ML systems. (DF-2175)
- Introduced the Document Filters Python library as a GitHub package that can now be installed using a Python package manager. (DF-1975)
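The GRAPHIC_BMP_DPI option (DF-2066) chooses between the resolution recorded in the BMP header and a fixed 96 DPI. As a rough illustration of what reading the DPI from the BITMAPINFOHEADER involves, the Python sketch below pulls `biXPelsPerMeter` and `biYPelsPerMeter` out of the header and converts them to dots per inch. It is not the Document Filters implementation: `bmp_dpi`, `build_bmp_header`, and the `read_header_dpi` flag are hypothetical stand-ins for the option, and the sketch assumes an info header of at least 40 bytes (falling back to 96 DPI otherwise).

```python
import struct

DEFAULT_DPI = 96  # fallback used when the header DPI is not read
METERS_PER_INCH = 0.0254


def bmp_dpi(data: bytes, read_header_dpi: bool = False) -> tuple[int, int]:
    """Return (x_dpi, y_dpi) for BMP bytes.

    `read_header_dpi` is a hypothetical stand-in for GRAPHIC_BMP_DPI=ON.
    """
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    if not read_header_dpi:
        return DEFAULT_DPI, DEFAULT_DPI
    # The info header follows the 14-byte BITMAPFILEHEADER; its first field is its own size.
    (header_size,) = struct.unpack_from("<I", data, 14)
    if header_size < 40:
        # BITMAPCOREHEADER (12 bytes) has no resolution fields.
        return DEFAULT_DPI, DEFAULT_DPI
    # biXPelsPerMeter / biYPelsPerMeter: signed 32-bit little-endian at file offsets 38 and 42.
    x_ppm, y_ppm = struct.unpack_from("<ii", data, 38)
    if x_ppm <= 0 or y_ppm <= 0:
        # Some encoders write 0 here; treat it as unspecified.
        return DEFAULT_DPI, DEFAULT_DPI
    return round(x_ppm * METERS_PER_INCH), round(y_ppm * METERS_PER_INCH)


def build_bmp_header(x_ppm: int, y_ppm: int) -> bytes:
    """Build a 54-byte BMP header (no pixel data), for demonstration only."""
    file_header = struct.pack("<2sIHHI", b"BM", 54, 0, 0, 54)
    info_header = struct.pack("<IiiHHIIiiII", 40, 1, 1, 1, 24, 0, 0, x_ppm, y_ppm, 0, 0)
    return file_header + info_header


if __name__ == "__main__":
    header = build_bmp_header(11811, 11811)  # roughly 300 pixels per inch
    print(bmp_dpi(header))                        # (96, 96): option off
    print(bmp_dpi(header, read_header_dpi=True))  # (300, 300)
```

Falling back to the default for zero or missing values keeps the "OFF" behavior and the "header has nothing useful" case identical, which is one reasonable reading of the release note.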
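The Simplified JSON canvas (DF-2170) is described as a flattened hierarchy similar to Markdown output. The Python sketch below shows why a flat, reading-order list of blocks is convenient: each block maps to one Markdown construct without walking a nested DOM. The block schema used here (`type`, `level`, `text`, `items`, `rows`) and the `blocks_to_markdown` function are entirely hypothetical and are not taken from the actual Document Filters Simplified JSON schema.

```python
import json

# Hypothetical flattened document: one JSON object per block, in reading order.
SAMPLE = json.loads("""
[
  {"type": "heading", "level": 2, "text": "Quarterly summary"},
  {"type": "paragraph", "text": "Revenue rose in both regions."},
  {"type": "list", "items": ["EMEA", "APAC"]},
  {"type": "table", "rows": [["Region", "Sales"], ["EMEA", "10"], ["APAC", "12"]]}
]
""")


def blocks_to_markdown(blocks: list[dict]) -> str:
    """Render a flat list of hypothetical blocks as Markdown."""
    parts = []
    for block in blocks:
        kind = block["type"]
        if kind == "heading":
            parts.append("#" * block["level"] + " " + block["text"])
        elif kind == "paragraph":
            parts.append(block["text"])
        elif kind == "list":
            parts.append("\n".join(f"- {item}" for item in block["items"]))
        elif kind == "table":
            # The first row is treated as the header row (an assumption of this sketch).
            header, *body = block["rows"]
            lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
            lines += ["| " + " | ".join(row) + " |" for row in body]
            parts.append("\n".join(lines))
        else:
            raise ValueError(f"unknown block type: {kind!r}")
    # Separate blocks with a blank line, as Markdown expects.
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(blocks_to_markdown(SAMPLE))
```

Because the list is already flat and ordered, the renderer is a single loop with no recursion; a nested DOM would need a tree walk to recover the same sequence.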
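DF-2175 names two of the content cleaning options: removing non-ASCII characters and normalizing quotes. The Python sketch below shows one common way to implement each; the `clean_text` function, its parameters, and the exact character mappings are assumptions for illustration, not the Document Filters options or their actual behavior.

```python
import unicodedata

# Typographic single and double quotes mapped to ASCII equivalents (assumed mapping).
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u201f": '"',
})


def clean_text(text: str, normalize_quotes: bool = True, ascii_only: bool = False) -> str:
    """Apply hypothetical cleaning options to extracted text."""
    if normalize_quotes:
        text = text.translate(QUOTE_MAP)
    if ascii_only:
        # NFKD splits accented letters into base letter + combining mark,
        # so "é" survives as "e" once the non-ASCII mark is dropped.
        text = unicodedata.normalize("NFKD", text)
        text = text.encode("ascii", "ignore").decode("ascii")
    return text


if __name__ == "__main__":
    print(clean_text("\u201cCaf\u00e9\u201d isn\u2019t na\u00efve", ascii_only=True))
    # "Cafe" isn't naive
```

Quote normalization runs first so that curly quotes become ASCII quotes instead of being discarded by the non-ASCII filter.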
# Updates
- HD Mode: Resolved a condition for Apple Numbers files where cell data may be missing when converting to Hi-Def. (DF-2187)
- HD Mode: Resolved a condition for Apple Numbers files where highlighted text may be shaded with a black background. (DF-2188)
- HD Mode: Resolved a condition for MS Office files where incorrect table headers were repeated if the table did not start at the top of the page. (DF-1645)
- HD Mode: Resolved a condition for MS Office files where table header rows are repeated if they don't fit on one page. (DF-1646)
- HD Mode: Resolved a condition for MS Office files where tables split across pages may cause the file to not convert. (DF-2158)
- HD Mode: Resolved a condition where internal links in MS Office files may display as plain text. (DF-2153)
- HD Mode: Resolved a condition where some tables would have missing cells when converting Word95 (or older) files. (DF-2107)
- HD Mode: Resolved a condition where tables in MS PowerPoint 97-2007 files would have their background color displayed incorrectly. (DF-2150)
- Resolved a condition in MS Office files where sub-directories could be reported as subfiles, causing an exception to be thrown. (DF-2223)
- Resolved a condition where WordArt text was missing from text-mode output for MS Office binary format files (e.g. .doc, .ppt, .xls). (DF-2094)
- Resolved a condition where attempting to open an RC4-encrypted file with a very long password would cause a segmentation fault. (DF-2179)
- Resolved a condition where non-Latin characters would not be converted properly when converting EML files with UTF-8 byte order marks (BOMs). (DF-2056)
- Resolved a condition where processing a Microsoft Office Excel 97-2003 Binary File Format (.xls) file with a self-reference to a range of cells that includes the 65536th row could hang or run indefinitely. (DF-2166)
- Resolved a memory leak when converting a corrupted RTF file to JSON. (DF-2114)
- Security: PDFium: CVE-2024-7973: not-exploitable: unused code block; does not impact Document Filters. (DF-2152)
- Security: xpdf: CVE-2024-7866: patched: applied security patch to address this issue. (DF-2143)
- Security: xpdf: CVE-2024-7867: not-exploitable: unused code block; does not impact Document Filters. (DF-2176)
- Security: xpdf: CVE-2024-7868: not-exploitable: issue is already mitigated; does not impact Document Filters. (DF-2177)
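DF-2056 concerns EML files that begin with a UTF-8 byte order mark (the bytes EF BB BF). As a general illustration of that pitfall, not of how Document Filters fixed it, the Python sketch below decodes the message with the `utf-8-sig` codec, which drops a leading BOM, before handing it to the standard `email` parser. The `parse_eml` name is hypothetical, and the sketch assumes the whole message, headers included, is UTF-8 encoded; real messages may instead use RFC 2047 encoded words or other charsets.

```python
import codecs
import email
from email.message import Message


def parse_eml(raw: bytes) -> Message:
    """Parse .eml bytes, tolerating a leading UTF-8 byte order mark."""
    # "utf-8-sig" decodes UTF-8 and strips a leading BOM if present. Left in place,
    # the BOM becomes U+FEFF at the start of the first line, which can keep the
    # parser from recognising that line as a header.
    text = raw.decode("utf-8-sig")
    return email.message_from_string(text)


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "From: a@example.com\nSubject: Привет\n\nТекст\n".encode("utf-8")
    msg = parse_eml(sample)
    print(msg["Subject"])  # Привет
```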