ScrapeX 0.0.1
There is a newer version of this package available.
See the version list below for details.
dotnet add package ScrapeX --version 0.0.1
NuGet\Install-Package ScrapeX -Version 0.0.1
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="ScrapeX" Version="0.0.1" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add ScrapeX --version 0.0.1
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: ScrapeX, 0.0.1"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or into the script's source code to reference the package.
// Install ScrapeX as a Cake Addin
#addin nuget:?package=ScrapeX&version=0.0.1

// Install ScrapeX as a Cake Tool
#tool nuget:?package=ScrapeX&version=0.0.1
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
scrape-x
A simple .NET library that provides generic web scraping using XPath expressions.
Basic features:
- Fluent/builder pattern interface
- Pagination
- Throttling
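Pagination and throttling can be pictured with a small, library-independent loop. This is a hypothetical sketch that uses only the .NET base class library; it is not ScrapeX's implementation or API, and `PaginationSketch`, `CrawlPages`, `fetchNextLink`, `minDelay`, and `maxPages` are illustrative names. It follows "next" links until none remain and enforces a minimum delay between consecutive requests.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch, NOT the ScrapeX API: pagination (follow "next" links
// until there are none) combined with throttling (a minimum delay between requests).
public static class PaginationSketch
{
    // fetchNextLink: hypothetical callback that "loads" a page and returns the
    // next page's link, or null when there is no further page.
    public static List<string> CrawlPages(
        string startPage,
        Func<string, string> fetchNextLink,
        TimeSpan minDelay,
        int maxPages = 100)
    {
        var visited = new List<string>();
        var sinceLastRequest = new Stopwatch();
        string current = startPage;

        // maxPages guards against "next" links that loop forever.
        while (current != null && visited.Count < maxPages)
        {
            // Throttle: wait out whatever remains of minDelay since the previous request.
            TimeSpan remaining = minDelay - sinceLastRequest.Elapsed;
            if (sinceLastRequest.IsRunning && remaining > TimeSpan.Zero)
            {
                Thread.Sleep(remaining);
            }
            sinceLastRequest.Restart();

            visited.Add(current);
            current = fetchNextLink(current);
        }
        return visited;
    }
}
```

The sketch blocks with `Thread.Sleep` for simplicity; asynchronous code would more naturally await `Task.Delay` instead.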
Example Usage
using System;
using System.Collections.Generic;
// Also add a using directive for the ScrapeX namespace (its name is not shown on this page).

public static class Program
{
    public static void Main(string[] args)
    {
        // Set up a new scraper to scrape Austin's Craigslist
        Scraper scraper = new Scraper("https://austin.craigslist.org");

        // Set the URL for the results page. In this case, "apts/housing for rent".
        scraper.SetResultsStartPage("/search/apa")
            // Set the XPath for search result nodes
            .SetIndividualResultNodeXPath("//*[@id=\"sortable-results\"]/ul/li")
            // Set the XPath for search result links, relative to the result node
            .SetIndividualResultLinkXPath("a/@href")
            // Set a predicate that decides whether an individual result should be visited.
            // In this case, results are visited only if their "housing" span (selected by
            // the second argument, relative to the result node) contains "1br".
            // This saves considerable bandwidth.
            .SetResultVisitPredicate(housing => housing.Contains("1br"), "p/span[2]/span[2]")
            // Set the XPath of the "Next" button link
            .SetNextLink("//*[@id=\"searchform\"]/div[3]/div[3]/span[2]/a[3]/@href")
            // Set the XPaths used to retrieve data from each target page.
            // The keys identify each value in the callback passed to Go.
            .SetTargetPageXPaths(new Dictionary<string, string>
            {
                { "latitude", "//*[@id=\"map\"]/@data-latitude" },
                { "longitude", "//*[@id=\"map\"]/@data-longitude" },
                { "price", "/html/body/section/section/h2/span[2]/span[1]" },
                { "br", "/html/body/section/section/section/div[1]/p[1]/span[1]/b[1]" },
                { "sqft", "/html/body/section/section/section/div[1]/p[1]/span[2]/b" }
            })
            // Go! This callback is called every time a target page is scraped.
            .Go(OnResultRetrieved);
    }

    private static void OnResultRetrieved(string link, IDictionary<string, string> results)
    {
        // Do something with the results...
        Console.WriteLine(results["br"]);
    }
}
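The callback receives every scraped value as a raw string. A hypothetical helper like the one below could turn those strings into typed values; `Listing` and `ListingParser` are illustrative names, not part of ScrapeX, and the value formats it assumes (prices such as "$1,250", bedroom counts such as "1BR") are guesses about the target page that may not hold.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical typed view of one scraped listing (not part of ScrapeX).
public sealed class Listing
{
    public string Link { get; set; }
    public double? Latitude { get; set; }
    public double? Longitude { get; set; }
    public decimal? Price { get; set; }
    public int? Bedrooms { get; set; }
    public int? SquareFeet { get; set; }
}

// Hypothetical helper: key names match the SetTargetPageXPaths example;
// the value formats are assumptions about the target page.
public static class ListingParser
{
    public static Listing Parse(string link, IDictionary<string, string> results)
    {
        return new Listing
        {
            Link = link,
            Latitude = ParseDouble(Get(results, "latitude")),
            Longitude = ParseDouble(Get(results, "longitude")),
            Price = ParsePrice(Get(results, "price")),
            Bedrooms = ParseLeadingInt(Get(results, "br")),
            SquareFeet = ParseLeadingInt(Get(results, "sqft"))
        };
    }

    // Trimmed value for the key, or null if it is missing or blank.
    private static string Get(IDictionary<string, string> results, string key)
    {
        return results.TryGetValue(key, out var value) && !string.IsNullOrWhiteSpace(value)
            ? value.Trim()
            : null;
    }

    // Coordinates are parsed with the invariant culture ("30.2672", not "30,2672").
    private static double? ParseDouble(string s)
    {
        return double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out var d)
            ? d
            : (double?)null;
    }

    // Assumes prices like "$1,250": strip the dollar sign, allow thousands separators.
    private static decimal? ParsePrice(string s)
    {
        if (s == null) return null;
        return decimal.TryParse(s.Replace("$", ""), NumberStyles.Number, CultureInfo.InvariantCulture, out var p)
            ? p
            : (decimal?)null;
    }

    // Reads the leading ASCII digits, so "1BR" gives 1 and "650" gives 650.
    private static int? ParseLeadingInt(string s)
    {
        if (s == null) return null;
        int i = 0;
        while (i < s.Length && s[i] >= '0' && s[i] <= '9') i++;
        return i > 0 && int.TryParse(s.Substring(0, i), NumberStyles.None, CultureInfo.InvariantCulture, out var n)
            ? n
            : (int?)null;
    }
}
```

Inside `OnResultRetrieved`, `Listing listing = ListingParser.Parse(link, results);` could replace the raw dictionary lookup; missing or unparseable values come back as `null` instead of throwing.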
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies
.NETStandard 2.0
- HtmlAgilityPack (>= 1.8.7)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.