
scrapy-sticky-meta-params

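A Scrapy spider middleware that forwards meta params through subsequent requests.

What does it do?

This middleware simplifies carrying information through the chain of requests and responses in a spider. Without it, every callback has to read the values it needs from response.meta and pass them on again by hand.

Without the middleware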

from scrapy import Request, Spider


class SampleSpider(Spider):
    name = "without_middleware"
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        for param in range(5):
            yield Request(
                "https://www.example.com/next",
                meta={"param": param},
                callback=self.parse_2
            )

    def parse_2(self, response):
        # Get important information from response
        info = response.xpath("//info/text()").get("info")
        # We need to get the param from meta and forward it again in this request
        param = response.meta["param"]
        yield Request(
            "https://www.example.com/next",
            meta={"info": info, "param": param},
            callback=self.parse_3
        )

    def parse_3(self, response):
        yield {
            "param": response.meta["param"],  # The value that we've extracted in the first callback
            "info": response.meta["info"]
        }

With the middleware

from scrapy import Request, Spider


class SampleSpider(Spider):
    name = "with_middleware"
    start_urls = ["https://www.example.com"]
    sticky_meta_keys = ["param"]  # Will always forward the meta param "param"

    def parse(self, response):
        for param in range(5):
            yield Request(
                "https://www.example.com/next",
                meta={"param": param},
                callback=self.parse_2
            )

    def parse_2(self, response):
        # Get important information from response
        info = response.xpath("//info/text()").get("info")
        # We don't need to get the "param" value from meta and resend it.
        yield Request(
            "https://www.example.com/next",
            meta={"info": info},
            callback=self.parse_3
        )

    def parse_3(self, response):
        yield {
            "param": response.meta["param"],  # The value that we've extracted in the first callback
            "info": response.meta["info"]
        }
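
To make it clearer why "param" is still available in parse_3 even though parse_2 never forwards it, here is a minimal sketch of the kind of propagation a spider middleware could perform on a callback's output. It is not the package's actual source. FakeRequest, FakeResponse and forward_sticky_meta are hypothetical stand-ins so the example runs without Scrapy, and the rule that a value set explicitly on a request is not overwritten is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (only url and meta)."""
    url: str
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (only meta)."""
    meta: dict = field(default_factory=dict)


def forward_sticky_meta(response, results, sticky_keys):
    """Copy sticky keys from the response meta into each outgoing request.

    Assumption: a key that the request already sets is left untouched.
    """
    for result in results:
        if isinstance(result, FakeRequest):
            for key in sticky_keys:
                if key in response.meta:
                    result.meta.setdefault(key, response.meta[key])
        yield result  # anything that is not a request passes through unchanged


# The response carries "param"; the callback only sets "info" on the new request.
response = FakeResponse(meta={"param": 3})
outgoing = [
    FakeRequest("https://www.example.com/next", meta={"info": "x"}),
    {"item": 1},
]
processed = list(forward_sticky_meta(response, outgoing, ["param"]))
print(processed[0].meta)  # {'info': 'x', 'param': 3}
```

In a real project this logic would live in a spider middleware hook that sees both the response and the callback's output, which is why the callbacks themselves stay free of forwarding code.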

Nice, how do I use it?

To enable the middleware, add it to the SPIDER_MIDDLEWARES setting in your project's settings.py file.

SPIDER_MIDDLEWARES = {
    'scrapy_sticky_meta_params.middleware.StickyMetaParamsMiddleware': 550,
}

The middleware also has to be enabled for each spider. To do so, add the following attribute to your spider:

sticky_meta_keys = []

Fill this list with every meta key that you want forwarded to subsequent requests.
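
As an illustration of the per-spider opt-in, the sketch below selects which meta entries would be carried forward for a given spider. PlainSpider, StickySpider, sticky_keys_for and sticky_values are hypothetical names used only for this example, and treating a missing sticky_meta_keys attribute as "forward nothing" is an assumption about the behavior, not something the package documents.

```python
class PlainSpider:
    """Hypothetical stand-in spider that does not define sticky_meta_keys."""
    name = "plain"


class StickySpider:
    """Hypothetical stand-in spider that opts in for two keys."""
    name = "sticky"
    sticky_meta_keys = ["category", "page"]


def sticky_keys_for(spider):
    # Assumption: a missing attribute means no key is forwarded.
    return list(getattr(spider, "sticky_meta_keys", []))


def sticky_values(spider, response_meta):
    """Return the entries of response_meta that the spider marked as sticky."""
    return {
        key: response_meta[key]
        for key in sticky_keys_for(spider)
        if key in response_meta
    }


meta = {"category": "books", "page": 2, "depth": 1}
print(sticky_values(StickySpider(), meta))  # {'category': 'books', 'page': 2}
print(sticky_values(PlainSpider(), meta))   # {}
```

Only the listed keys are selected, so unrelated meta entries such as "depth" are not affected by the list.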
