WebReaper.Sqlite
11.1.1
SQLite embedded-store adapters for WebReaper:
a local durable scheduler (and, from the next slice, a visited-link
tracker) backed by SQLite via `Microsoft.Data.Sqlite`. "Resume" is a
`SELECT … WHERE consumed = 0` over an indexed table, not a hand-rolled
append-only job file plus a sidecar position file.
This is the opt-in robust-local durability tier, between the zero-dependency core file adapters and the distributed Redis / Azure Service Bus satellites:
| Tier | Package | Shape |
|---|---|---|
| File | WebReaper (core) | append + 300 ms poll + position file; the zero-dep default |
| SQLite | WebReaper.Sqlite | embedded store, "resume" is a query, no position file |
| Redis / Azure Service Bus | WebReaper.Redis / .AzureServiceBus | distributed |
Satellite package (ADR-0009 / ADR-0012): the SQLite adapters are kept out of
the WebReaper core so the core stays dependency-light and Native-AOT-clean.
`Microsoft.Data.Sqlite` is a managed wrapper over a native `e_sqlite3`
(SQLitePCLRaw), and that native-interop graph is quarantined here. The core
file scheduler is unchanged and remains the zero-dependency local default.
Install
```shell
dotnet add package WebReaper.Sqlite
```
This pulls in WebReaper (the core) as a dependency.
Usage
The package adds `WithSqliteScheduler` to `ScraperEngineBuilder`, built on the
core's public `WithScheduler` registration seam:
```csharp
using WebReaper.Builders;
using WebReaper.Sqlite;

var engine = await ScraperEngineBuilder
    .Crawl("https://example.com")
    .Extract(new Schema { /* … */ })
    // Durable job queue stored in a local SQLite database file.
    .WithSqliteScheduler("crawl/state.db")
    .WriteToJsonFile("output.jsonl")
    .BuildAsync();

await engine.RunAsync();
```
Kill the process mid-crawl and run it again with the same `databasePath`:
every job that was queued but not yet claimed is still there, found by the
same query, with no position file to keep in sync.
`dataCleanupOnStart: true` clears the job table at start (a fresh crawl):
```csharp
.WithSqliteScheduler("crawl/state.db", dataCleanupOnStart: true)
```
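To make "resume is a query" concrete, here is a minimal sketch of the pattern written against `Microsoft.Data.Sqlite` directly. It is not the package's implementation: the `jobs` table, its `id` and `payload` columns, the index name, and the `ResumeSketch` class are all assumptions made for illustration. Only the `consumed = 0` predicate comes from the description above.

```csharp
using Microsoft.Data.Sqlite;

// Illustrative only: table, column, and index names are assumptions for this
// sketch, not the schema WebReaper.Sqlite actually creates.
public static class ResumeSketch
{
    // Opens (or creates) the database and makes sure the job table exists.
    public static SqliteConnection Open(string dataSource)
    {
        var connection = new SqliteConnection($"Data Source={dataSource}");
        connection.Open();

        using var create = connection.CreateCommand();
        create.CommandText =
            "CREATE TABLE IF NOT EXISTS jobs (" +
            "  id       INTEGER PRIMARY KEY AUTOINCREMENT," +
            "  payload  TEXT    NOT NULL," +
            "  consumed INTEGER NOT NULL DEFAULT 0);" +
            "CREATE INDEX IF NOT EXISTS ix_jobs_consumed ON jobs (consumed, id);";
        create.ExecuteNonQuery();
        return connection;
    }

    // Appends a job; new rows start with consumed = 0.
    public static void Enqueue(SqliteConnection connection, string payload)
    {
        using var insert = connection.CreateCommand();
        insert.CommandText = "INSERT INTO jobs (payload) VALUES ($payload);";
        insert.Parameters.AddWithValue("$payload", payload);
        insert.ExecuteNonQuery();
    }

    // Claims the oldest unconsumed job, or returns null when none is pending.
    public static string? ClaimNext(SqliteConnection connection)
    {
        // Select and mark in one transaction so the claim is all-or-nothing.
        using var transaction = connection.BeginTransaction();

        using var select = connection.CreateCommand();
        select.Transaction = transaction;
        select.CommandText =
            "SELECT id, payload FROM jobs WHERE consumed = 0 ORDER BY id LIMIT 1;";

        long id;
        string payload;
        using (var reader = select.ExecuteReader())
        {
            if (!reader.Read())
            {
                return null; // disposing the transaction rolls it back
            }
            id = reader.GetInt64(0);
            payload = reader.GetString(1);
        }

        using var update = connection.CreateCommand();
        update.Transaction = transaction;
        update.CommandText = "UPDATE jobs SET consumed = 1 WHERE id = $id;";
        update.Parameters.AddWithValue("$id", id);
        update.ExecuteNonQuery();

        transaction.Commit();
        return payload;
    }
}
```

In this sketch the pending work is simply the set of rows where `consumed = 0`, so a process that restarts and calls `Open` on the same file can continue with `ClaimNext`; there is no separate cursor to restore.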
Scope
This package currently ships the scheduler. The SQLite visited-link
tracker (TrackVisitedLinksInSqlite) is the next slice. The core's role
interfaces (IScheduler, IVisitedLinkTracker) are unchanged; SQLite is an
additional adapter, not a core change.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - Microsoft.Data.Sqlite (>= 10.0.8)
  - Microsoft.Extensions.Logging.Abstractions (>= 10.0.8)
  - WebReaper (>= 11.1.1)
- 10.0.0 (breaking): SqliteScheduler drops Complete() (ADR-0037; termination is consumer-cancel of GetAllAsync, so the durable scheduler no longer hangs on StopWhenAllLinksProcessed) and implements IAsyncInitializable (ADR-0033; the Crawl driver opens the database before the loop). MIT relicense (ADR-0017). Requires WebReaper 10.0.0.
- 9.0.0: lockstep republish against the core 9.0.0 (ADR-0023: the core public surface is now the documented contract). No functional change.
- 8.0.0: lockstep republish against the core 8.0.0 major (ADR-0022); no functional change.
- 7.1.0: initial release. SQLite embedded-store local durable scheduler (ScraperEngineBuilder.WithSqliteScheduler); "resume" is a SELECT WHERE consumed = 0 over an indexed table, replacing FileScheduler's append-only job file + sidecar position file + line-skip cursor (the cursor/job-file desync failure mode is unrepresentable). Same WebReaperJson Job grammar as the core file scheduler and the Redis scheduler (ADR-0008), so the Job round-trips with full type fidelity. New satellite package per ADR-0009 / ADR-0012: the WebReaper core does not reference Microsoft.Data.Sqlite, so the native e_sqlite3 (SQLitePCLRaw) graph stays off the dependency-light, Native-AOT-zero-warning core. The visited-link tracker is the next slice (issue #67).