Lofcz.Forks.HtmlToOpenXml
3.2.3
dotnet add package Lofcz.Forks.HtmlToOpenXml --version 3.2.3
NuGet\Install-Package Lofcz.Forks.HtmlToOpenXml -Version 3.2.3
<PackageReference Include="Lofcz.Forks.HtmlToOpenXml" Version="3.2.3" />
paket add Lofcz.Forks.HtmlToOpenXml --version 3.2.3
#r "nuget: Lofcz.Forks.HtmlToOpenXml, 3.2.3"
// Install Lofcz.Forks.HtmlToOpenXml as a Cake Addin
#addin nuget:?package=Lofcz.Forks.HtmlToOpenXml&version=3.2.3
// Install Lofcz.Forks.HtmlToOpenXml as a Cake Tool
#tool nuget:?package=Lofcz.Forks.HtmlToOpenXml&version=3.2.3
What is HtmlToOpenXml?
HtmlToOpenXml is a small .NET library that converts simple or advanced HTML to plain OpenXml components. The project was started in 2009, initially to convert users' comments into Word.
This library supports .NET Framework 4.6.2, .NET Standard 2.0 and .NET 8, which are all LTS.
Depends on DocumentFormat.OpenXml and AngleSharp.
See Also
- Documentation
- How to deliver a generated DOCX from server Asp.Net/SharePoint?
- Prevent Document Edition
- Convert dotx to docx
Supported Html tags
Refer to w3schools’ tag list to see their meaning
- a
- h1-h6
- abbr and acronym
- b, i, u, s, del, ins, em, strike, strong
- br and hr
- img, figcaption and svg
- table, td, tr, th, tbody, thead, tfoot, caption and col
- cite
- div, span, time, font and p
- pre
- sub and sup
- ul, ol and li
- dd and dt
- q, blockquote, dfn
- article, aside, section are considered like div
Javascript (script), CSS style, meta, comments, buttons and input controls are ignored.
Other tags are treated like div.
In v1 and v2, Javascript (script), CSS style, meta, comments and other unsupported tags do not generate an error but are ignored.
Html Parser
In v3, parsing of the Html relies on the AngleSharp package, which follows the W3C specifications and actively supports Html5.
In v1 and v2, parsing of the Html was done with a custom Regex-based enumerator. It was more flexible, but it left complex code that was hard to maintain.
How to implement or debug features
My reference bibles cover both OpenXml and HTML:
Open MS Word or Apple Pages and design your expected output. Save as a DOCX file, then rename as a ZIP. Extract the content and inspect those files: document.xml, numbering.xml (for list) and styles.xml.
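The inspection step can also be scripted instead of renaming and extracting by hand. The following is a minimal sketch, assuming Python is available; it relies only on the fact that a DOCX file is a ZIP archive whose main parts are stored under the word/ folder. The function name read_docx_parts and the file name expected.docx are illustrative and are not part of HtmlToOpenXml.

```python
import zipfile

# The three files named above; inside a DOCX archive they sit under "word/".
PARTS = ("word/document.xml", "word/numbering.xml", "word/styles.xml")


def read_docx_parts(path, parts=PARTS):
    """Return {part name: XML text} for each requested part found in the DOCX.

    A part can be absent (numbering.xml, for example, may be missing when the
    document has no list), so missing parts are skipped instead of raising.
    """
    found = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part in parts:
            if part in names:
                found[part] = archive.read(part).decode("utf-8")
    return found
```

Calling read_docx_parts("expected.docx") on the file saved from Word returns a dictionary keyed by part name. Running it on a converter-generated file as well gives two sets of XML that can be compared side by side.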
Acknowledgements
Thank you to all contributors who shared their bug fixes (in no particular order): scwebgroup, ddforge, daviderapicavoli, worstenbrood, jodybullen, BenBurns, OleK, scarhand, imagremlin, antgraf, mdeclercq, pauldbentley, xjpmauricio, jairoXXX, giorand, bostjanKlemenc, AaronLS, taishmanov. And thanks to David Podhola for the NuGet package.
Logo provided with the permission of Enhanced Labs Design Studio.
Support
This project is open source and I do my best to support it in my spare time. I'm always happy to receive pull requests and am grateful for the time you have taken. Please target the dev branch only.
If you have questions, don't hesitate to get in touch with me!
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 is compatible. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETFramework 4.6.2
  - AngleSharp (>= 1.1.0)
  - DocumentFormat.OpenXml (>= 3.1.1)
  - Microsoft.Extensions.Logging.Abstractions (>= 6.0.0)
  - System.ValueTuple (>= 4.5.0)
- .NETStandard 2.0
  - AngleSharp (>= 1.1.0)
  - DocumentFormat.OpenXml (>= 3.1.1)
  - Microsoft.Extensions.Logging.Abstractions (>= 6.0.0)
- net8.0
  - AngleSharp (>= 1.1.0)
  - DocumentFormat.OpenXml (>= 3.1.1)
  - Microsoft.Extensions.Logging.Abstractions (>= 6.0.0)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on Lofcz.Forks.HtmlToOpenXml:
- Html2DocxCore
# Changelog
## 3.2.2
- Support a feature to disable heading numbering #175
- Support center image with margin auto #171
- Support deprecated align attribute for block #171
- Fix parsing of style attribute with a key with no value
- Improve parsing of style attribute to avoid an extra call to HtmlDecode
- Extend support of nested list for non-W3C compliant html #173
## 3.2.1
- Fix indentation of numbering list #166
- Bordered container must render its content with one bordered frame #168
- Fix serialisation of the "Harvard" style for lower-roman list
- Fix ParseHeader/Footer where input with multiple paragraphs output only the last one
- Ensure the default style is applied to paragraphs, so that a paragraph between 2 lists is not mis-guessed
## 3.2.0
- Add new public API to allow parsing into Header and Footer #162. Some API methods have been flagged as obsolete with a clear message of what to use instead.
  This is not a breaking change as it keeps existing behaviour.
- Add support for `SVG` format (either from img src or the SVG node tag)
- Automatically create the `_top` bookmark if needed
- Fix a crash when a hyperlink contains both `img` and `figcaption`
- Fix a crash when `li` is empty #161
## 3.1.1
- Fix respecting layout with `div`/`p` ending with line break #158
- Prevent crash when header/footer is incomplete and parsing image #159
- Fix combining 2 runs separated by a break, 2nd line should not be prefixed by a space
## 3.1.0
- Fix table cell borders being wrongly applied on the run #156
- Correctly handle RTL layout for text, list, table and document scope #86 #66
- Support property line-height #52
- Fallback to `background` style attribute as many users use this simplified attribute version
- In `HtmlDomExpression.CreateFromHtmlNode`, use the correct casting to `IElement` rather than `IHtmlElement`, to prevent crash if `svg` node is encountered
## 3.0.1
- Ensure to count existing images from header and footer too #113
- Preserve line breaks in `pre` for OSX/Windows
- Prevent a crash when the provided style is missing its type
- Defensive code to avoid 2 rowSpan+colSpan with a cell in between to crash #59
## 3.0.0
- AngleSharp is now the backend parser for Html
- Refactoring to use the Interpreter/Composite design pattern, which eases code maintenance
- Lots of new unit test cases (190+)
- Rewriting of `list` (correct handling of nested style, restarting numbers and consecutive)
- Rewriting of `table` (row span, col span, col tags driving styles)
- Parallel download of images at early stage of the parsing.
## 2.4.2
- Fix signing the assembly
- Enable Nullable reference types
- support latest version of OpenXML SDK (3.1.0) which introduces breaking changes, but also supports embedding SVG and JPEG2000 files.
- fix caching the provisioned images
- drop support for .Net Standard 1.3
## 2.4.0 and 2.4.1
Do not use, as signing the assembly failed #138
## 2.3.0
- better table border style
- keep processing html even if downloading image generates an error
- support for styling OL, UL and LI elements
## 2.2.0
- support latest version of OpenXML SDK (2.12.0) which introduces an API to add an OpenXmlElement to the correct XSD order
- restore support for .NET 4.6+, Net Standard 1.3+
- use cleaner name for base-64 images description
## 2.1.0
- support latest version of OpenXML SDK (2.11.0+) which fixes a fatal issue
- drop support for .NET 4.0, .Net Standard 1.4
## 2.0.3
- optimize number of nested list numbering (thanks to BenGraf)
- fix an issue where some styles weren't being applied
- fix reading JPEG images with SOF2 progressive DCT encoding
## 2.0.2
- fix nested list numbering
## 2.0.1
- fix manual provisioning of images
- img respect both border attribute and border style attribute
## 2.0.0
This brings .Net Core support:
- better inline styling
- numbering list with nested list is more stable
- allow parsing unit with decimals
- color can be either rgb(a), hsl(a), hex or named color.
- parser is more stable
## Pre 1.6.0
- imported from codeplex.com