WebReaper 3.0.3

dotnet add package WebReaper --version 3.0.3

NuGet\Install-Package WebReaper -Version 3.0.3

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="WebReaper" Version="3.0.3" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add WebReaper --version 3.0.3

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: WebReaper, 3.0.3"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

// Install WebReaper as a Cake Addin
#addin nuget:?package=WebReaper&version=3.0.3

// Install WebReaper as a Cake Tool
#tool nuget:?package=WebReaper&version=3.0.3

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Interface	Description
IScheduler	Reading and writing from the job queue. By default, the in-memory queue is used, but you can provider your implementation
ICrawledLinkTracker	Tracker of visited links. A default implementation is an in-memory tracker. You can provide your own for Redis, MongoDB, etc.
IPageLoader	Loader that takes URL and returns HTML of the page as a string
IContentParser	Takes HTML and schema and returns JSON representation (JObject).
ILinkParser	Takes HTML as a string and returns page links
IScraperSink	Represents a data store for writing the results of web scraping. Takes the JObject as parameter
ISpider	A spider that does the crawling, parsing, and saving of the data

Project	Description
WebReaper	Library for web scraping
WebReaper.ScraperWorkerService	Example of using WebReaper library in a Worker Service .NET project.
WebReaper.DistributedScraperWorkerService	Example of using WebReaper library in a distributed way wih Azure Service Bus
WebReaper.AzureFuncs	Example of using WebReaper library with serverless approach using Azure Functions
WebReaper.ConsoleApplication	Example of using WebReaper library with in a console application

Version	Downloads	Last updated
3.5.2	218	10/19/2024
3.5.1	2,827	8/15/2023
3.5.0	148	8/9/2023
3.4.0	248	4/17/2023
3.3.0	220	4/3/2023
3.2.0	198	4/2/2023
3.1.0	320	2/28/2023
3.0.8	435	11/12/2022
3.0.7	357	11/4/2022
3.0.6	320	11/3/2022
3.0.5	347	10/31/2022
3.0.4	368	10/29/2022
3.0.3	357	10/29/2022
3.0.2	372	10/24/2022
3.0.1	379	10/21/2022
3.0.0	382	10/7/2022

WebReaper 3.0.3

WebReaper

Overview

Install

Requirements

Tech stack:

📋 Example:

Features:

Usage examples

API overview

SPA parsing example

Authorization

Distributed web scraping with Serverless approach

StartScrapting

WebReaperSpider

Extensibility

Adding a new sink to persist your data

Intrefaces

Main entities

Repository structure

Coming soon:

Features under consideration

net6.0

NuGet packages

GitHub repositories