Hyland.DocumentFilters.OCR
24.3.0
dotnet add package Hyland.DocumentFilters.OCR --version 24.3.0
NuGet\Install-Package Hyland.DocumentFilters.OCR -Version 24.3.0
<PackageReference Include="Hyland.DocumentFilters.OCR" Version="24.3.0" />
paket add Hyland.DocumentFilters.OCR --version 24.3.0
#r "nuget: Hyland.DocumentFilters.OCR, 24.3.0"
// Install Hyland.DocumentFilters.OCR as a Cake Addin
#addin nuget:?package=Hyland.DocumentFilters.OCR&version=24.3.0
// Install Hyland.DocumentFilters.OCR as a Cake Tool
#tool nuget:?package=Hyland.DocumentFilters.OCR&version=24.3.0
Document Filters is a toolkit that allows application developers to identify files and extract their metadata, as well as convert and render almost any file type. It is a core component of many products and can also be used as a stand-alone service by organizations or application developers.
This package has no dependencies.
# Enhancements
- Introduced the "HTML_IMAGES_SCALE_TO_FIT" option to control whether images attached to email messages are scaled to fit the page. It defaults to OFF, meaning no scaling is applied. (DF-1907)
- Introduced HD rendering support for Hancom Hangul HWPX files, improving rendering fidelity and preserving the documents' original formatting and layout. (DF-1932)
- Introduced a new JSON output canvas type that represents document data in a detailed, hierarchical structure, making it straightforward to parse and consume from AI systems and other JSON-compatible applications. (DF-2043)
- Introduced a new Markdown output canvas type, allowing documents to be converted to Markdown that preserves their content along with basic formatting. Markdown's lightweight syntax makes the output well suited to wiki systems and to interactions with AI systems. (DF-2015)
- Introduced support for accessing "IGR_Get_Page_Elements" in the .NET, Java, and Python APIs. (DF-2044)
- Introduced support for accessing the "MIMEType" property through the "Extractor" class in the .NET, Java, and Python APIs. (DF-2042)
- Introduced support for enumerating supported configuration options in the Java APIs using the new "GetAvailableOptions" method. (DF-2041)
- Introduced support for enumerating supported file types in the Java APIs using the new "GetSupportedFormats" method. (DF-2041)
- Introduced support for extracting links to external workbooks within XLSX files using the "SHOWHIDDEN_EXCEL_REFS" option, defaulting to OFF. (DF-1902)
- Introduced identification and text extraction support for Hancom Hangul HWPX files. (DF-1258)
- Introduced support for subfile extraction from MSI Installer files. (DF-2038)
- Introduced the ability to determine whether a subfile is password protected before extracting it. In the C API, the IGR_SUBFILE_INFO_FLAG_PASSWORD_PROTECTED flag is set in the flags of the IGR_Subfile_Info structure. In the object APIs, the new IsEncrypted property on the SubFile object reports this. (DF-1968)
- Introduced detection and extraction of tables from untagged PDF files, preserving the logical structure of tables, rows, and cells for downstream AI and data analysis. Table detection is designed for vector-based PDFs. Enable it with the "PDF_TABLE_DETECTION" option, which defaults to OFF. (DF-2045)
# Updates
- HD Mode: Resolved a condition for DOCX files where illustrations (pictures, shapes, etc.) vertically positioned with "Margin" could be placed incorrectly. (DF-1839)
- HD Mode: Resolved a condition for DOCX files where a table containing a single text box or graphic could give the row containing that object an incorrect size. (DF-1874)
- HD Mode: Resolved a condition for HTML files where styles on form elements may not be rendered. (DF-1840)
- HD Mode: Resolved a condition for RTF files where content with nested tables may not render as expected. (DF-2095)
- HD Mode: Resolved a condition for RTF files where line spacing inherited from styles may render differently than MS Word. (DF-1702)
- HD Mode: Resolved a condition where converting DOCX files with images and setting the GRAPHIC_DPI option resulted in the images being shifted. (DF-1983)
- HD Mode: Resolved a condition where converting a text document to HDHTML may result in text placed in incorrect locations. (DF-1708)
- HD Mode: Resolved a condition where converting an RTF file could result in a crash when using an object API. (DF-2085)
- HD Mode: Resolved a condition where converting to HTML5 may cause some images to have an empty 'idf-graphic-data' tag. (DF-1999)
- HD Mode: Resolved a condition where having multiple threads convert text files to PDF could cause a segmentation fault. (DF-2075)
- HD Mode: Resolved a condition where style inheritance could be incorrectly applied for paragraph and character styles in Office Open XML documents (e.g. MS Word .docx, MS Excel .xlsx, MS PowerPoint .pptx). (DF-2009)
- HD Mode: Resolved a condition where text with an automatic or unset font color on a dark fill or background would render in black instead of white. (DF-1986)
- HD Mode: Resolved a potential memory leak for PDFs when converting to TIFF. (DF-2010)
- HD Mode: Resolved an issue where, during conversion, text in a table cell could overflow the cell and overlap content in adjacent cells. (DF-2051)
- HD Mode: Resolved a condition for PDF files where excessively large input pages caused memory allocation to fail. For the xpdf engine, output page size is limited to 200 inches in any dimension. (DF-2099)
- HD Mode: Resolved a condition for XLSX files where converting large files with many duplicated rows resulted in long processing times and high memory usage. Added the "SPREADSHEET_COLLAPSE_ROWS" option to address this; it defaults to 100. (DF-1996)
- Security: PDFium: CVE-2024-5846: patched: applied security patch to address this issue. (DF-2089)
- Security: PDFium: CVE-2024-5847: patched: applied security patch to address this issue. (DF-2088)
- Security: libtiff: CVE-2024-7006: patched: applied security patch to address this issue. (DF-2137)
- Security: xpdf: CVE-2024-4141: not-exploitable: issue is already mitigated; does not impact Document Filters. (DF-1993)
- Security: xpdf: CVE-2024-4568: not-exploitable: issue is already mitigated; does not impact Document Filters. (DF-2108)
- Security: xpdf: CVE-2024-4976: not-exploitable: issue is already mitigated; does not impact Document Filters. (DF-2109)