A spider middleware that forwards meta params through subsequent requests.
scrapy-sticky-meta-params
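A Scrapy spider middleware that forwards meta params through subsequent requests.
What does it do?
This middleware simplifies passing information from one callback to the next through a spider's requests and responses. Keys listed as sticky are copied into each follow-up request's meta automatically.
Without the middleware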
    from scrapy import Request, Spider


    class SampleSpider(Spider):
        name = "without_middleware"
        start_urls = ["https://www.example.com"]

        def parse(self, response):
            for param in range(5):
                yield Request(
                    "https://www.example.com/next",
                    meta={"param": param},
                    callback=self.parse_2,
                )

        def parse_2(self, response):
            # Get important information from response
            info = response.xpath("//info/text()").get("info")
            # We need to get the param from meta and forward it again in this request
            param = response.meta["param"]
            yield Request(
                "https://www.example.com/next",
                meta={"info": info, "param": param},
                callback=self.parse_3,
            )

        def parse_3(self, response):
            yield {
                "param": response.meta["param"],  # The value that we've extracted in the first callback
                "info": response.meta["info"],
            }
With the middleware
    from scrapy import Request, Spider


    class SampleSpider(Spider):
        name = "with_middleware"
        start_urls = ["https://www.example.com"]
        sticky_meta_keys = ["param"]  # Will always forward the meta param "param"

        def parse(self, response):
            for param in range(5):
                yield Request(
                    "https://www.example.com/next",
                    meta={"param": param},
                    callback=self.parse_2,
                )

        def parse_2(self, response):
            # Get important information from response
            info = response.xpath("//info/text()").get("info")
            # We don't need to get the "param" value from meta and resend it.
            yield Request(
                "https://www.example.com/next",
                meta={"info": info},
                callback=self.parse_3,
            )

        def parse_3(self, response):
            yield {
                "param": response.meta["param"],  # The value that we've extracted in the first callback
                "info": response.meta["info"],
            }
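The forwarding step can be pictured as a small merge between the meta of the incoming response and the meta of each outgoing request. The sketch below is not the package's actual code: `forward_sticky_meta` is a hypothetical helper written in plain Python (no Scrapy import), and it assumes that a value set explicitly on the new request takes precedence over the sticky value, which the README does not confirm.

```python
from typing import Any, Dict, Iterable


def forward_sticky_meta(
    response_meta: Dict[str, Any],
    request_meta: Dict[str, Any],
    sticky_keys: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of request_meta with sticky keys carried over.

    Hypothetical helper for illustration only; not part of the package.
    """
    merged = dict(request_meta)
    for key in sticky_keys:
        # Assumption: a value set explicitly on the new request wins.
        if key in response_meta and key not in merged:
            merged[key] = response_meta[key]
    return merged


# Mirrors parse_2 in the example above: the response carries "param",
# while the new request only sets "info".
new_meta = forward_sticky_meta(
    response_meta={"param": 3, "depth": 1},
    request_meta={"info": "some info"},
    sticky_keys=["param"],
)
print(new_meta)  # {'info': 'some info', 'param': 3}
```

Only the keys named in `sticky_keys` are copied, so unrelated entries such as `depth` stay out of the new request's meta in this sketch.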
Awesome, how do I use it?
To enable the middleware, add it to the SPIDER_MIDDLEWARES setting in your project's settings.py file.
    SPIDER_MIDDLEWARES = {
        'scrapy_sticky_meta_params.middleware.StickyMetaParamsMiddleware': 550,
    }
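The middleware also has to be enabled for each spider. To do so, add the following attribute to your spider:
    sticky_meta_keys = []
Fill this list with every meta key that you want forwarded to subsequent requests.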