48 projects
itemloaders
Base library for Scrapy's ItemLoader
scrapy-poet
Page Object pattern for Scrapy
scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner
extruct
Extract embedded metadata from HTML markup
spidermon
Spidermon is a framework to build monitors for Scrapy spiders.
itemadapter
Common interface for data container classes
web-poet
Zyte's Page Object pattern for web scraping
scrapyrt
Put Scrapy spiders behind an HTTP API
shub
Scrapinghub Command Line Client
andi
Library for annotation-based dependency injection
dateparser
Date parsing library designed to parse dates from HTML pages
number-parser
Parse numbers written in natural language
js2xml
Convert JavaScript code to an XML document
scrapinghub
Client interface for Scrapinghub API
autoextract-poet
web-poet definitions for AutoExtract API
scrapy-deltafetch
Scrapy middleware to ignore previously crawled pages
scrapy-autoextract
Zyte Automatic Extraction API integration for Scrapy
scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
scrapinghub-autoextract
Python interface to Scrapinghub Automatic Extraction API
scrapy-crawlera
Crawlera middleware for Scrapy
price-parser
Extract price and currency from a raw string
splash
A JavaScript renderer with an HTTP API
scrapy-headless
Download Handler for using Scrapy with headless browsers
scrapy-po
Page Object pattern for Scrapy
arche
Analyze Scrapy Cloud data
slybot
Slybot crawler
frontera
A scalable frontier for web crawlers
portia2code
Convert Portia spider definitions to Python Scrapy spiders
webstruct
A library for creating statistical NER systems that work on HTML data
PyPyDispatcher
Multi-producer-multi-consumer signal dispatching mechanism
exporters
Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations.
page_finder
hubstorage
Client interface for Scrapinghub HubStorage
scrapylib
Scrapy helper functions and processors
shub-image
Scrapinghub release tool
adblockparser
Parser for Adblock Plus rules
scrapy-splitvariants
Scrapy spider middleware to split an item into multiple items on a multi-valued key
scrapy-hcf
Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs
scrapy-querycleaner
Scrapy spider middleware to clean up query parameters in request URLs
scrapy-magicfields
Scrapy middleware to add extra "magic" fields to items
page_clustering
Online k-means clustering of web pages
scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
skinfer
Simple tool to merge JSON schemas
flatson
Tool to flatten a stream of JSON-like objects, configured via schema
crawl-frontier
A flexible frontier for web crawlers
aduana
Bindings for Aduana library
wappalyzer-python
Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)
scrapy-streamitem
Scrapy support for working with streamcorpus Stream Items