HtmlKit 1.2.0

Requires NuGet 2.12 or higher.

.NET CLI:
dotnet add package HtmlKit --version 1.2.0

Package Manager:
NuGet\Install-Package HtmlKit -Version 1.2.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="HtmlKit" Version="1.2.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add HtmlKit --version 1.2.0

Script & Interactive:
#r "nuget: HtmlKit, 1.2.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
// Install HtmlKit as a Cake Addin
#addin nuget:?package=HtmlKit&version=1.2.0

// Install HtmlKit as a Cake Tool
#tool nuget:?package=HtmlKit&version=1.2.0

HtmlKit


What is HtmlKit?

HtmlKit is a cross-platform .NET framework for parsing HTML.

HtmlKit implements the HTML5 tokenizing state machine described in W3C's HTML5 Tokenization Specification.

Goals

I haven't fully figured that out yet.

So far the goal is tokenizing HTML with the intention of using it for MimeKit's HtmlToHtml text converter, replacing the quick & dirty HTML tokenizer I originally wrote.

Maybe someday I'll implement a DOM. Who knows.

License Information

HtmlKit is Copyright (C) 2015-2024 Jeffrey Stedfast and is licensed under the MIT license:

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Installing via NuGet

The easiest way to install HtmlKit is via NuGet.

In Visual Studio's Package Manager Console, simply enter the following command:

Install-Package HtmlKit

Getting the Source Code

First, you'll need to clone HtmlKit from my GitHub repository. To do this using the command-line version of Git, you'll need to issue the following command in your terminal:

git clone https://github.com/jstedfast/HtmlKit.git

If you are using TortoiseGit on Windows, you'll need to right-click in the directory where you'd like to clone HtmlKit and select Git Clone... in the menu. Once you do that, you'll get a dialog asking you to specify the repository you'd like to clone. In the textbox labeled URL:, enter https://github.com/jstedfast/HtmlKit.git and then click OK. This will clone HtmlKit onto your local machine.

Updating the Source Code

Occasionally you might want to update your local copy of the source code if I have made changes to HtmlKit since you downloaded the source code in the step above. To do this using the command-line version of Git, you'll need to issue the following command in your terminal within the HtmlKit directory:

git pull

If you are using TortoiseGit on Windows, you'll need to right-click on the HtmlKit directory and select Git Sync... in the menu. Once you do that, you'll need to click the Pull button.

Building

Once you've opened the HtmlKit.sln solution file in Visual Studio, you can choose the Debug or Release build configuration and then build.

Both Visual Studio 2022 and Visual Studio 2019 should be able to build HtmlKit without any issues, but older versions such as Visual Studio 2015 and 2017 will likely require modifications to the projects in order to build correctly.

Note: The Release build will generate the XML API documentation, but the Debug build will not.

Using HtmlKit

Parsing HTML

The primary purpose of HtmlKit is parsing HTML.

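using System;
using System.IO;
using HtmlKit;

// `stream` is assumed to be an open, readable Stream containing the HTML.
using (var reader = new StreamReader (stream)) {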
    var tokenizer = new HtmlTokenizer (reader);
    HtmlToken token;

    // ReadNextToken() returns `false` when the end of the stream is reached.
    while (tokenizer.ReadNextToken (out token)) {
        switch (token.Kind) {
        case HtmlTokenKind.ScriptData:
        case HtmlTokenKind.CData:
        case HtmlTokenKind.Data:
            // ScriptData, CData, and Data tokens contain text data.
            var text = (HtmlDataToken) token;

            Console.WriteLine ("{0}: {1}", token.Kind, text.Data);
            break;
        case HtmlTokenKind.Tag:
            // Tag tokens represent tags and their attributes.
            var tag = (HtmlTagToken) token;

            Console.Write ("<{0}{1}", tag.IsEndTag ? "/" : "", tag.Name);

            foreach (var attribute in tag.Attributes) {
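                // Quote() is not defined in this snippet; it stands for a helper
                // that returns the attribute value wrapped in quotes.
                if (attribute.Value != null)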
                    Console.Write (" {0}={1}", attribute.Name, Quote (attribute.Value));
                else
                    Console.Write (" {0}", attribute.Name);
            }

            Console.WriteLine (tag.IsEmptyElement ? "/>" : ">");
            break;
        case HtmlTokenKind.Comment:
            // Comment tokens carry the text of an HTML comment.
            var comment = (HtmlCommentToken) token;

            Console.WriteLine ("Comment: {0}", comment.Comment);
            break;
        case HtmlTokenKind.DocType:
            // DocType tokens carry the name and the public/system identifiers.
            var doctype = (HtmlDocTypeToken) token;

            if (doctype.ForceQuirksMode)
                Console.Write ("<!-- force quirks mode -->");

            Console.Write ("<!DOCTYPE");

            if (doctype.Name != null)
                Console.Write (" {0}", doctype.Name.ToUpperInvariant ());

            if (doctype.PublicIdentifier != null) {
                Console.Write (" PUBLIC \"{0}\"", doctype.PublicIdentifier);
                if (doctype.SystemIdentifier != null)
                    Console.Write (" \"{0}\"", doctype.SystemIdentifier);
            } else if (doctype.SystemIdentifier != null) {
                Console.Write (" SYSTEM \"{0}\"", doctype.SystemIdentifier);
            }

            Console.WriteLine (">");
            break;
        }
    }
}
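
The example above calls Quote() without defining it. The following is a minimal sketch of what such a helper might look like, assuming it should wrap the value in double quotes and escape the characters that are unsafe inside a quoted attribute value. The HtmlQuoting class name and the exact set of escaped characters are assumptions made for illustration; this is not code from HtmlKit.

```csharp
using System;
using System.Text;

// Hypothetical helper: the name Quote comes from the example above, but this
// implementation is an assumption and is not part of HtmlKit.
static class HtmlQuoting
{
    // Wraps an attribute value in double quotes, escaping the characters
    // that could end the value early or start a character reference.
    public static string Quote (string value)
    {
        if (value == null)
            throw new ArgumentNullException (nameof (value));

        var quoted = new StringBuilder (value.Length + 2);

        quoted.Append ('"');
        foreach (var c in value) {
            switch (c) {
            case '&': quoted.Append ("&amp;"); break;
            case '"': quoted.Append ("&quot;"); break;
            case '<': quoted.Append ("&lt;"); break;
            case '>': quoted.Append ("&gt;"); break;
            default: quoted.Append (c); break;
            }
        }
        quoted.Append ('"');

        return quoted.ToString ();
    }
}
```

To use it with the loop above, either call it as HtmlQuoting.Quote (attribute.Value) or move the method into the class that contains the loop.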

Contributing

The first thing you'll need to do is fork HtmlKit to your own GitHub repository. For instructions on how to do that, see the section titled Getting the Source Code.

If you use Visual Studio for Mac or MonoDevelop, all of the solution files are configured with the coding style used by HtmlKit. If you use Visual Studio on Windows or some other editor, please try to maintain the existing coding style as best as you can.

Once you've got some changes that you'd like to submit upstream to the official HtmlKit repository, send me a Pull Request and I will try to review your changes in a timely manner.

If you'd like to contribute but don't have any particular features in mind to work on, check out the issue tracker and look for something that might pique your interest!

Reporting Bugs

Have a bug or a feature request? Please open a new bug report or feature request.

Before opening a new issue, please search through any existing issues to avoid submitting duplicates.

If you are getting an exception from somewhere within HtmlKit, don't just provide the Exception.Message string. Please include the Exception.StackTrace as well. The Message, by itself, is often useless.

Documentation

API documentation can be found in the source code in the form of XML doc comments.

Product: compatible and additional computed target framework versions

.NET: net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.
.NET Core: netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed.
.NET Standard: netstandard2.0 is compatible. netstandard2.1 is compatible.
.NET Framework: net461 was computed. net462 is compatible. net463 was computed. net47 is compatible. net471 was computed. net472 was computed. net48 is compatible. net481 was computed.
MonoAndroid: monoandroid was computed.
MonoMac: monomac was computed.
MonoTouch: monotouch was computed.
Tizen: tizen40 was computed. tizen60 was computed.
Xamarin.iOS: xamarinios was computed.
Xamarin.Mac: xamarinmac was computed.
Xamarin.TVOS: xamarintvos was computed.
Xamarin.WatchOS: xamarinwatchos was computed.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories (1)

Showing the top 1 popular GitHub repositories that depend on HtmlKit:

Repository: bkaankose/Wino-Mail
Built-in Mail & Calendars app clone for Windows.

Version  Downloads  Last updated
1.2.0    96         11/29/2024
1.1.0    12,499     2/25/2023
1.0.3    8,228      10/19/2019
1.0.2    3,218      1/6/2018
1.0.1    1,024      12/2/2017
1.0.0    2,446      10/6/2016

Release Notes

* Lazy-load the Id properties for HtmlTagToken and HtmlAttribute for performance improvements.
* Removed the need for reflection in HtmlTagId and HtmlAttributeId logic for converting between these enums and their string equivalents.
* Optimized HtmlTokenizer using a char[] buffer instead of reading 1 char at a time.
* Added HtmlTokenizer constructors that take a Stream instead of a TextReader.
* Bumped System.Buffers dependency to 4.6.0.
* Bumped System.Memory dependency to 4.6.0.
* Dropped support for net7.0.
* Added support for net8.0.