SmartReader 0.1.3
<img src="https://raw.github.com/strumenta/SmartReader/master/logo.png" width="64">
SmartReader is a .NET Standard 1.3 library to extract the main content of a web page, based on a port of the Readability library by Mozilla, which in turn is based on the famous original Readability library.
Installation
You can install it the standard way, through the NuGet package.
Install-Package SmartReader
Why You May Want To Use It
There are already other good, similar projects, but they don't support .NET Core and they are based on an old version of Readability. The original library is already quite stable, but there are always improvements to be made. By relying on an original library maintained by such a competent organization, we can piggyback on their hard work and user base.
There are also some minor improvements: it returns an author and publication date, together with the default byline, the language of the article and an indication of the time needed to read it. The time is considered accurate for all languages that use an alphabet, so, for instance, it isn't valid for Chinese.
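To see why a reading-time estimate can hold for alphabetic languages but not for Chinese, here is a minimal sketch of a word-count-based estimate. It is not SmartReader's actual algorithm: the `ReadingTimeSketch` class, the whitespace-based word split and the 200 words-per-minute default are all assumptions made for illustration.

```csharp
using System;

public static class ReadingTimeSketch
{
    // Hypothetical helper, not part of SmartReader: estimates reading time
    // from a whitespace-delimited word count and an assumed reading speed.
    public static TimeSpan Estimate(string text, int wordsPerMinute = 200)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return TimeSpan.Zero;
        }

        // Passing a null separator array splits on any whitespace character.
        int words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;

        return TimeSpan.FromMinutes((double)words / wordsPerMinute);
    }
}
```

A Chinese sentence written without spaces counts as a single "word" under this approach, so any estimate built on whitespace-separated words badly undercounts such text.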
I plan to add some features, like returning a list of the images in the article or, optionally, transforming them into data URIs. But at the moment the Smart in SmartReader is more of an aspiration than a statement. Feel free to suggest new features. Also, since it's an alpha release, expect bugs.
Usage
There are two main ways to use the library. The first is to create a new Reader object, with the URI as the argument, and then call the Parse method to obtain the extracted Article. The second is to call the static method ParseArticle of Reader directly, which returns an Article. The advantage of using an object is that it gives you the chance to set some options.
You can also give the library the text, or a stream, directly, but you still need to provide the original URI. The library will not download the text again, but it needs the URI to make some checks and modifications on the links present on the page.
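As a rough sketch of passing already-downloaded text, the snippet below assumes a `Reader` constructor overload that takes the original URI first and the HTML text second. That argument order is an assumption, as are the `ParseFromTextSketch` class and the example.com address, so check the overloads available in your version of the package.

```csharp
using System;

public static class ParseFromTextSketch
{
    // Assumed overload: Reader(string uri, string text). The URI is not
    // downloaded again; it is only used to check and fix links in the page.
    public static SmartReader.Article ParseHtml(string originalUri, string html)
    {
        var reader = new SmartReader.Reader(originalUri, html);
        return reader.Parse();
    }

    public static void Main()
    {
        // Placeholder page; a real article would be much longer.
        string html = "<html><head><title>Sample</title></head>"
                    + "<body><article><p>Some text.</p></article></body></html>";

        SmartReader.Article article = ParseHtml("https://example.com/sample", html);

        Console.WriteLine(article.IsReadable ? article.Title : "No article found");
    }
}
```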
If the extraction fails, the returned Article object will have the field IsReadable set to false.
The content of the article is unstyled, but it is wrapped in a div with the id readability-content that you can style yourself.
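One way to style the output is to embed the extracted HTML in a page whose stylesheet targets the `readability-content` id. The helper below is a hypothetical sketch, not part of the library; the `StyledPageSketch` class and the CSS rules are illustrative choices.

```csharp
using System;
using System.Net;

public static class StyledPageSketch
{
    // Illustrative CSS rule for the wrapper div produced by the library.
    private const string Css =
        "#readability-content { max-width: 40em; margin: 0 auto; " +
        "font-family: Georgia, serif; line-height: 1.6; }";

    // Hypothetical helper: wraps the extracted article HTML in a minimal page.
    // The title is HTML-encoded; the content is inserted as-is because it is
    // already HTML.
    public static string Wrap(string title, string contentHtml)
    {
        return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
             + "<title>" + WebUtility.HtmlEncode(title) + "</title>\n"
             + "<style>" + Css + "</style>\n</head>\n<body>\n"
             + contentHtml
             + "\n</body>\n</html>";
    }
}
```

You would typically call it as `StyledPageSketch.Wrap(article.Title, article.Content)` after a successful parse.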
The library tries to detect the correct encoding of the text, provided that the relevant tags are present in it.
Examples
Using the Parse method.
using System.IO; // needed for StringWriter

SmartReader.Reader sr = new SmartReader.Reader("https://arstechnica.co.uk/information-technology/2017/02/humans-must-become-cyborgs-to-survive-says-elon-musk/");
// write debug information to an in-memory log
sr.Debug = true;
sr.Logger = new StringWriter();
SmartReader.Article article = sr.Parse();
if (article.IsReadable)
{
    // do something with it
}
Using the ParseArticle method.
SmartReader.Article article = SmartReader.Reader.ParseArticle("https://arstechnica.co.uk/information-technology/2017/02/humans-must-become-cyborgs-to-survive-says-elon-musk/");
if (article.IsReadable)
{
    // do something with it
}
Options
| Type | Option | Description | Default |
|---|---|---|---|
| int | MaxElemsToParse | Maximum number of nodes supported by this parser. | 0 (no limit) |
| int | NTopCandidates | The number of top candidates to consider when analysing how tight the competition is among candidates. | 5 |
| bool | Debug | Set the Debug option. If set to true, the library writes the debug data to Logger. | false |
| TextWriter | Logger | Where the debug data is written. | null |
| bool | ContinueIfNotReadable | The library tries to determine whether it will find an article before actually trying to extract one. This option decides whether to continue if that heuristic fails. This value is ignored if Debug is set to true. | true |
| int | WordThreshold | The minimum number of words an article must have in order to return a result. | 500 |
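A short sketch of configuring a Reader before parsing follows. It assumes that the options listed here are settable properties on the Reader object, in the same way `Debug` and `Logger` are set in the Parse example; the `ReaderOptionsSketch` class and the chosen values are illustrative only.

```csharp
using System;

public static class ReaderOptionsSketch
{
    // Hypothetical factory: builds a Reader with stricter limits than the defaults.
    public static SmartReader.Reader CreateStrictReader(string uri)
    {
        var reader = new SmartReader.Reader(uri);

        reader.MaxElemsToParse = 10000;        // give up on very large pages (default: 0, no limit)
        reader.NTopCandidates = 3;             // consider fewer top candidates (default: 5)
        reader.ContinueIfNotReadable = false;  // stop early if the heuristic finds no article
        reader.WordThreshold = 200;            // accept shorter articles (default: 500)

        // Debug is left at its default (false); per the option description,
        // setting it to true would make ContinueIfNotReadable be ignored.
        return reader;
    }
}
```

Calling `Parse()` on the returned object then applies these settings, which is the advantage of the object-based approach over the static `ParseArticle` method.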
Article Model
| Type | Field | Description |
|---|---|---|
| Uri | Uri | Original Uri |
| String | Title | Title |
| String | Byline | Byline of the article, usually containing author and publication date |
| String | Dir | Direction of the text |
| String | Content | HTML content of the article |
| String | TextContent | The pure text of the article |
| String | Excerpt | A summary of the article, based on metadata or the first paragraph |
| String | Language | Language string (e.g. 'en-US') |
| int | Length | Length of the text of the article |
| TimeSpan | TimeToRead | Average time needed to read the article |
| DateTime? | PublicationDate | Date of publication of the article |
| bool | IsReadable | Indicates whether an article was successfully found |
It's important to be aware that the fields Byline, Author and PublicationDate are found independently of each other. So there might be some inconsistencies and unexpected data. For instance, Byline may be a string in the form "@Date by @Author" or "@Author, @Date" or any other combination used by the publication.
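Because these fields are extracted independently, calling code is safer treating each one as optional rather than trying to derive one from another. The helper below is a hypothetical sketch of that approach for displaying a date line; the `ArticleMetadataSketch` class and its fallback order are assumptions, not library behaviour.

```csharp
using System;
using System.Globalization;

public static class ArticleMetadataSketch
{
    // Hypothetical helper: prefers the structured PublicationDate and falls
    // back to the raw Byline text, without trying to parse the byline.
    public static string DescribeDate(DateTime? publicationDate, string byline)
    {
        if (publicationDate.HasValue)
        {
            return publicationDate.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        }

        return string.IsNullOrWhiteSpace(byline) ? "unknown" : byline.Trim();
    }
}
```

With a parsed article this could be called as `ArticleMetadataSketch.DescribeDate(article.PublicationDate, article.Byline)`.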
Demo & Console Projects
The demo project is just a simple ASP.NET Core webpage that allows you to input an address and see the results of the library.
The console project is a simple console program that allows you to see the results of the library on a random test page.
Creating The Nuget Package
If you want to build the NuGet package yourself, you cannot use the standard nuget pack command because of a bug related to .NET Core. Instead, use the dotnet pack command.
dotnet pack --configuration Release --output "..\nupkgs"
The command must be issued inside the src/SmartReader folder, otherwise it will generate nuget packages for all projects.
License
The project uses the Apache License.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp1.0 was computed. netcoreapp1.1 was computed. netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard1.3 is compatible. netstandard1.4 was computed. netstandard1.5 was computed. netstandard1.6 was computed. netstandard2.0 was computed. netstandard2.1 was computed. |
| .NET Framework | net46 was computed. net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen30 was computed. tizen40 was computed. tizen60 was computed. |
| Universal Windows Platform | uap was computed. uap10.0 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 1.3
  - AngleSharp (>= 0.9.9)
  - NETStandard.Library (>= 1.6.0)
  - System.Text.Encoding.CodePages (>= 4.3.0)
NuGet packages (7)
Showing the top 5 NuGet packages that depend on SmartReader:
| Package | Description |
|---|---|
| Mostlylucid.LucidRAG.DocSummarizer | Local-first document summarization library using BERT embeddings, RAG, and optional LLM synthesis. Supports markdown, PDF, DOCX, and URLs. Every claim is grounded with citations. Runs entirely offline with ONNX models, or optionally uses Ollama/Docling for enhanced features. |
| SuperMemoAssistant.Plugins.Import | Package Description |
| Mostlylucid.LucidRAG.DoomSummarizer.Core | Core signal extraction pipeline: embeddings, entities, knowledge graphs, ranking, LLM routing with budget enforcement and circuit-breaking. Builds on Mostlylucid.DocSummarizer for shared NER, content extraction, and RRF scoring. |
| Umbraco.AI.Core | Contains core logic for Umbraco AI |
| Drastic.Feed.Parser.SmartReader | Drastic.Feed.Parser.SmartReader is an implementation of IArticleParserService for Drastic.Feed, using SmartReader. |
GitHub repositories (1)
Showing the top 1 popular GitHub repositories that depend on SmartReader:
| Repository | Description |
|---|---|
| Richasy/FantasyCopilot | A new-age AI desktop tool |
| Version | Downloads | Last Updated |
|---|---|---|
| 0.11.0 | 21,940 | 12/12/2025 |
| 0.10.2 | 17,126 | 9/13/2025 |
| 0.10.1 | 2,068 | 8/24/2025 |
| 0.10.0 | 23,544 | 2/2/2025 |
| 0.9.6 | 12,881 | 10/9/2024 |
| 0.9.5 | 16,383 | 6/2/2024 |
| 0.9.4 | 25,682 | 8/27/2023 |
| 0.9.3 | 10,041 | 4/15/2023 |
| 0.9.2 | 10,542 | 2/7/2023 |
| 0.9.1 | 17,936 | 10/23/2022 |
| 0.9.0 | 46,046 | 8/28/2022 |
| 0.8.1 | 3,370 | 6/29/2022 |
| 0.8.0 | 12,843 | 10/19/2021 |
| 0.7.5 | 12,807 | 10/31/2020 |
| 0.7.4 | 14,551 | 9/7/2020 |
| 0.7.3 | 752 | 9/5/2020 |
| 0.7.2 | 1,604 | 5/10/2020 |
| 0.7.1 | 1,925 | 3/8/2020 |
| 0.7.0 | 5,841 | 10/29/2019 |
| 0.1.3 | 1,679 | 11/27/2017 |
- Added improvements from November updates of Readability
- Added reading of itemprop properties for metadata extraction
- Integrated tests from Readability