Spidey 6.0.3
<img src="https://jacraig.github.io/Spidey/images/icon.png" style="height:25px" alt="Spidey Icon" /> Spidey
Spidey is a flexible and extensible .NET library for crawling web content. It is designed for .NET Core applications and provides a modular architecture, allowing you to customize or extend any part of the crawling pipeline.
Features
- Simple API for crawling websites
- Highly configurable via the Options class
- Dependency injection support (IoC/DI)
- Easily replaceable subsystems (engine, parser, scheduler, etc.)
- Callback-based result handling
- NuGet package available
Quick Start
Install the NuGet package:
dotnet add package Spidey
Setting up the Library
Register Spidey in your app's service collection using the RegisterSpidey extension method:
using System;
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;
using Spidey;

var services = new ServiceCollection();
services.RegisterSpidey();

// Optionally, register your Options configuration
services.AddSingleton(new Options
{
    // Called for each item the crawler finds
    ItemFound = result => Console.WriteLine($"Found: {result.Url}"),
    Allow = new List<string> { "http://mywebsite", "http://mywebsite2" },
    FollowOnly = new List<string> { /* regex patterns */ },
    Ignore = new List<string> { /* regex patterns */ },
    StartLocations = new List<string> { "http://mywebsite", "http://mywebsite2" },
    UrlReplacements = new Dictionary<string, string> { /* { "old", "new" } */ },
    // Other options as needed
});

// Build the provider and resolve the crawler
var provider = services.BuildServiceProvider();
var crawler = provider.GetRequiredService<Crawler>();
Alternatively, you can instantiate Crawler and Options directly without DI:
var options = new Options
{
    ItemFound = result => Console.WriteLine($"Found: {result.Url}"),
    // ...other options
};
var crawler = new Crawler(options);
Options Configuration
The Options class configures the crawler's behavior. Key properties include:
- ItemFound (Action<ResultFile>): Callback invoked when a new page is discovered.
- Allow (List<string>): Regex patterns for URLs allowed to be crawled.
- FollowOnly (List<string>): Regex patterns for pages whose links should be followed.
- Ignore (List<string>): Regex patterns for URLs to ignore.
- StartLocations (List<string>): Initial URLs to start crawling from.
- UrlReplacements (Dictionary<string, string>): URL replacements during crawling.
- NetworkCredentials (NetworkCredential): Optional credentials for authentication.
- UseDefaultCredentials (bool): Use default system credentials.
- Proxy (IWebProxy): Optional proxy settings.
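Because Allow, FollowOnly, and Ignore take regex patterns rather than plain prefixes, an unanchored pattern such as http://mywebsite also matches http://mywebsite2. The sketch below uses only System.Text.RegularExpressions to show how anchoring changes what matches. It does not call Spidey: the ShouldCrawl helper, the specific patterns, and the rule that a URL must match an Allow pattern and no Ignore pattern are assumptions made for illustration, and Spidey's own matching (for example, its case sensitivity) may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative patterns (not taken from the library).
// The Allow pattern is anchored so it matches mywebsite but not mywebsite2.
var allow = new List<string> { @"^https?://mywebsite(/|$)" };
// The Ignore pattern skips common static assets, with or without a query string.
var ignore = new List<string> { @"\.(png|jpe?g|gif|css|js)(\?.*)?$" };

// Hypothetical helper. Assumption: a URL is crawled when it matches
// at least one Allow pattern and no Ignore pattern.
bool ShouldCrawl(string url) =>
    allow.Any(p => Regex.IsMatch(url, p, RegexOptions.IgnoreCase)) &&
    !ignore.Any(p => Regex.IsMatch(url, p, RegexOptions.IgnoreCase));

var samples = new[]
{
    "http://mywebsite/about",    // allowed, not ignored
    "http://mywebsite2/index",   // rejected by the anchored Allow pattern
    "http://mywebsite/logo.png", // allowed, but caught by the Ignore pattern
};

foreach (var url in samples)
{
    Console.WriteLine($"{url} -> {(ShouldCrawl(url) ? "crawl" : "skip")}");
}
```

Testing patterns this way before a crawl is a cheap check that the lists include and exclude what you expect.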
Example callback method:
void OnItemFound(ResultFile result)
{
    Console.WriteLine($"Discovered: {result.Url} (Status: {result.StatusCode})");
    // Additional processing...
}
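A callback often needs to accumulate state, such as the set of URLs seen so far. This README does not say whether ItemFound can be invoked from more than one thread, so the sketch below assumes it might be and uses a concurrent collection. It stands alone and does not reference Spidey: RecordUrl is a hypothetical helper, and the parallel loop only simulates callback invocations.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Thread-safe set of URLs seen so far; the byte value is unused.
var seen = new ConcurrentDictionary<string, byte>();

// Hypothetical helper: records a URL and returns true only the first time it is seen.
bool RecordUrl(string url) => seen.TryAdd(url, 0);

// In a real crawl this could be wired up through the option shown earlier, e.g.
//   ItemFound = result => RecordUrl(result.Url)
// Here the callback is simulated from several threads instead.
var urls = new[] { "http://mywebsite/a", "http://mywebsite/b", "http://mywebsite/a" };
Parallel.ForEach(Enumerable.Range(0, 300), i => { RecordUrl(urls[i % urls.Length]); });

Console.WriteLine($"Unique URLs: {seen.Count}"); // prints "Unique URLs: 2"
```

If the callback turns out to be invoked from one thread only, a plain HashSet<string> would do the same job with less overhead.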
Basic Usage
Once configured, start the crawl process:
crawler.StartCrawl();
The library will handle link discovery, content downloading, and result parsing. Your callback will be invoked for each discovered item.
Customization
Spidey is built with extensibility in mind. The system is divided into the following subsystems, each replaceable via DI:
- Content Parser (IContentParser) – Parses downloaded data into ResultFile objects.
- Engine (IEngine) – Handles HTTP requests and content downloading.
- Link Discoverer (ILinkDiscoverer) – Extracts links from content.
- Processor (IProcessor) – Processes parsed content (default: invokes your callback).
- Scheduler (IScheduler) – Manages work distribution.
- Pipeline (IPipeline) – Orchestrates the crawling process.
To customize, implement the relevant interface from Spidey.Engines.Interfaces and register your implementation in the service provider. Note that if you call RegisterSpidey(), the registration is handled for you automatically. If you instantiate Crawler directly, you must compose the pipeline manually.
FAQ
Q: Can I run the crawler on multiple nodes?
A: The default scheduler is single-node only. For distributed crawling, implement a custom scheduler (e.g., using a database or message queue) to coordinate work between instances.
Build Process
Requirements:
- Visual Studio 2022
Clone the project and open the solution (Spidey.sln) in Visual Studio to build.
License
See LICENSE for details.
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0
  - FileCurator (>= 5.0.6)
  - Microsoft.IO.RecyclableMemoryStream (>= 3.0.1)
- net9.0
  - FileCurator (>= 5.0.6)
  - Microsoft.IO.RecyclableMemoryStream (>= 3.0.1)
Version | Downloads | Last Updated |
---|---|---|
6.0.3 | 71 | 6/27/2025 |
6.0.2 | 103 | 6/25/2025 |
6.0.1 | 153 | 12/9/2024 |
6.0.0 | 139 | 11/25/2024 |
5.0.131 | 132 | 11/12/2024 |
5.0.130 | 115 | 11/11/2024 |
5.0.129 | 115 | 11/6/2024 |
5.0.128 | 115 | 11/5/2024 |
5.0.127 | 110 | 11/4/2024 |
5.0.126 | 113 | 10/31/2024 |
5.0.125 | 105 | 10/30/2024 |
5.0.124 | 118 | 10/29/2024 |
5.0.123 | 128 | 10/11/2024 |
5.0.122 | 118 | 10/10/2024 |
5.0.121 | 110 | 10/9/2024 |
5.0.120 | 125 | 10/2/2024 |
5.0.119 | 123 | 10/1/2024 |
5.0.118 | 133 | 9/24/2024 |
5.0.117 | 129 | 9/17/2024 |
5.0.116 | 166 | 9/10/2024 |
5.0.115 | 128 | 9/3/2024 |
5.0.114 | 125 | 8/30/2024 |
5.0.113 | 135 | 8/27/2024 |
5.0.112 | 139 | 8/26/2024 |
5.0.111 | 154 | 8/23/2024 |
5.0.110 | 147 | 8/21/2024 |
5.0.109 | 140 | 8/20/2024 |
5.0.108 | 149 | 8/16/2024 |
5.0.107 | 142 | 8/15/2024 |
5.0.106 | 123 | 8/5/2024 |
5.0.105 | 106 | 8/2/2024 |
5.0.104 | 120 | 8/1/2024 |
5.0.103 | 123 | 7/26/2024 |
5.0.102 | 141 | 7/11/2024 |
5.0.101 | 137 | 7/2/2024 |
5.0.100 | 138 | 6/27/2024 |
5.0.99 | 125 | 6/26/2024 |
5.0.98 | 150 | 6/19/2024 |
5.0.97 | 136 | 6/18/2024 |
5.0.96 | 145 | 6/17/2024 |
5.0.95 | 142 | 6/14/2024 |
5.0.94 | 127 | 6/13/2024 |
5.0.93 | 124 | 6/12/2024 |
5.0.92 | 134 | 5/31/2024 |
5.0.91 | 129 | 5/30/2024 |
5.0.90 | 133 | 5/17/2024 |
5.0.89 | 140 | 5/16/2024 |
5.0.88 | 154 | 5/8/2024 |
5.0.87 | 153 | 5/7/2024 |
5.0.86 | 168 | 5/6/2024 |
5.0.85 | 113 | 5/3/2024 |
5.0.84 | 127 | 5/2/2024 |
5.0.83 | 137 | 5/1/2024 |
5.0.82 | 152 | 4/30/2024 |
5.0.81 | 140 | 4/16/2024 |
5.0.80 | 140 | 4/12/2024 |
5.0.79 | 144 | 4/11/2024 |
5.0.78 | 158 | 4/1/2024 |
5.0.77 | 145 | 3/29/2024 |
5.0.76 | 149 | 3/18/2024 |
5.0.75 | 140 | 3/15/2024 |
5.0.74 | 144 | 3/14/2024 |
5.0.73 | 143 | 3/11/2024 |
5.0.72 | 137 | 3/8/2024 |
5.0.71 | 142 | 3/7/2024 |
5.0.70 | 161 | 3/6/2024 |
5.0.69 | 151 | 3/5/2024 |
5.0.68 | 152 | 3/4/2024 |
5.0.67 | 157 | 2/29/2024 |
5.0.66 | 138 | 2/28/2024 |
5.0.65 | 151 | 2/26/2024 |
5.0.64 | 148 | 2/23/2024 |
5.0.63 | 154 | 2/22/2024 |
5.0.62 | 157 | 2/21/2024 |
5.0.61 | 145 | 2/16/2024 |
5.0.60 | 146 | 2/15/2024 |
5.0.59 | 150 | 2/12/2024 |
5.0.58 | 142 | 2/8/2024 |
5.0.57 | 137 | 2/7/2024 |
5.0.56 | 138 | 2/6/2024 |
5.0.55 | 127 | 2/1/2024 |
5.0.54 | 148 | 1/31/2024 |
5.0.53 | 138 | 1/30/2024 |
5.0.52 | 133 | 1/24/2024 |
5.0.51 | 143 | 1/23/2024 |
5.0.50 | 148 | 1/12/2024 |
5.0.49 | 147 | 1/11/2024 |
5.0.48 | 154 | 12/26/2023 |
5.0.47 | 140 | 12/22/2023 |
5.0.46 | 135 | 12/18/2023 |
5.0.45 | 124 | 12/15/2023 |
5.0.44 | 124 | 12/14/2023 |
5.0.43 | 135 | 12/13/2023 |
5.0.42 | 129 | 12/12/2023 |
5.0.41 | 167 | 11/24/2023 |
5.0.40 | 154 | 11/21/2023 |
5.0.39 | 129 | 11/20/2023 |
5.0.38 | 136 | 11/17/2023 |
5.0.37 | 133 | 11/16/2023 |
5.0.36 | 133 | 11/14/2023 |
5.0.35 | 135 | 11/8/2023 |
5.0.34 | 121 | 11/7/2023 |
5.0.33 | 140 | 11/6/2023 |
5.0.32 | 136 | 11/1/2023 |
5.0.31 | 129 | 10/31/2023 |
5.0.30 | 153 | 10/30/2023 |
5.0.29 | 140 | 10/26/2023 |
5.0.28 | 171 | 10/12/2023 |
5.0.27 | 168 | 10/5/2023 |
5.0.26 | 154 | 9/26/2023 |
5.0.25 | 132 | 9/20/2023 |
5.0.24 | 126 | 9/19/2023 |
5.0.23 | 166 | 9/18/2023 |
5.0.22 | 145 | 9/14/2023 |
5.0.21 | 134 | 9/13/2023 |
5.0.20 | 156 | 9/11/2023 |
5.0.19 | 154 | 9/7/2023 |
5.0.18 | 148 | 9/6/2023 |
5.0.17 | 144 | 9/5/2023 |
5.0.16 | 161 | 9/4/2023 |
5.0.15 | 148 | 9/1/2023 |
5.0.14 | 159 | 8/31/2023 |
5.0.13 | 167 | 8/30/2023 |
5.0.12 | 170 | 8/29/2023 |
5.0.11 | 154 | 8/25/2023 |
5.0.10 | 166 | 8/23/2023 |
5.0.9 | 153 | 8/18/2023 |
5.0.8 | 179 | 8/10/2023 |
5.0.7 | 180 | 8/8/2023 |
5.0.6 | 187 | 8/8/2023 |
5.0.5 | 201 | 8/7/2023 |
5.0.4 | 184 | 8/3/2023 |
5.0.3 | 196 | 7/26/2023 |
5.0.2 | 178 | 7/20/2023 |
5.0.1 | 187 | 7/14/2023 |
5.0.0 | 346 | 12/12/2022 |
4.0.5 | 588 | 6/10/2022 |
4.0.2 | 551 | 1/11/2022 |
4.0.1 | 512 | 7/19/2021 |
3.0.9 | 574 | 1/7/2021 |
3.0.7 | 665 | 9/13/2020 |
3.0.6 | 634 | 6/26/2020 |
3.0.5 | 611 | 6/26/2020 |
3.0.3 | 623 | 3/25/2020 |
3.0.2 | 674 | 3/1/2020 |
3.0.1 | 695 | 1/1/2020 |
3.0.0 | 667 | 12/23/2019 |
2.0.15 | 700 | 11/22/2019 |
2.0.14 | 656 | 11/22/2019 |
2.0.13 | 637 | 11/22/2019 |
2.0.12 | 635 | 11/21/2019 |
2.0.11 | 630 | 11/21/2019 |
2.0.10 | 644 | 11/21/2019 |
2.0.9 | 633 | 11/21/2019 |
2.0.8 | 813 | 3/3/2019 |
2.0.7 | 741 | 3/3/2019 |
2.0.6 | 758 | 3/3/2019 |
2.0.5 | 741 | 3/3/2019 |
2.0.4 | 794 | 2/7/2019 |
2.0.3 | 1,252 | 6/1/2018 |
2.0.2 | 1,243 | 5/22/2018 |
2.0.1 | 1,375 | 1/2/2018 |
1.0.12 | 1,165 | 11/2/2017 |
1.0.11 | 1,160 | 10/30/2017 |
1.0.10 | 1,165 | 10/26/2017 |
1.0.9 | 1,164 | 10/26/2017 |
1.0.8 | 1,193 | 10/26/2017 |
1.0.7 | 1,147 | 10/25/2017 |
1.0.6 | 1,132 | 10/25/2017 |
1.0.5 | 1,173 | 10/24/2017 |
1.0.4 | 1,114 | 10/24/2017 |
1.0.3 | 1,104 | 10/19/2017 |
1.0.2 | 1,239 | 10/18/2017 |
1.0.1 | 1,158 | 9/29/2017 |