crawler · PyPI · Python Package Index

Web scraping framework based on Python 3 asyncio

These details have not been verified by PyPI

Project description

https://travis-ci.org/lorien/crawler.png?branch=master

https://coveralls.io/repos/lorien/crawler/badge.svg?branch=master

https://pypip.in/download/crawler/badge.svg?period=month

https://pypip.in/version/crawler/badge.svg

https://landscape.io/github/lorien/crawler/master/landscape.png

Web scraping framework based on Python 3 asyncio and the aiohttp library.
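To illustrate what bounded concurrency looks like in asyncio, here is a minimal sketch. It is not the library's implementation: `fake_fetch`, `crawl`, and `run_demo` are hypothetical names, the network call is replaced by a short sleep, and `asyncio.run` assumes Python 3.7 or later.

```python
import asyncio


async def fake_fetch(url, delay=0.01):
    # Hypothetical stand-in for an HTTP request; no network access happens here.
    await asyncio.sleep(delay)
    return "<title>{}</title>".format(url)


async def crawl(urls, concurrency=10):
    # A semaphore caps how many fetches are in flight at the same time.
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as the input URLs.
    results = await asyncio.gather(*(worker(u) for u in urls))
    return results, peak


def run_demo():
    urls = ["http://example{}.com/".format(i) for i in range(25)]
    return asyncio.run(crawl(urls, concurrency=10))
```

Calling `run_demo()` returns the 25 fake page bodies together with the peak number of simultaneous fetches, which never exceeds the `concurrency` limit.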

Usage example

import re
from itertools import islice

from crawler import Crawler, Request

RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)

class TestCrawler(Crawler):
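    def task_generator(self):
        # Yield one request per non-empty host among the first 100 lines.
        # The with-block closes the file once the generator is exhausted.
        with open('var/domains.txt') as domains:
            for host in islice(domains, 100):
                host = host.strip()
                if host:
                    yield Request('http://%s/' % host, tag='page')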

    def handler_page(self, req, res):
        print('Result of request to {}'.format(req.url))
        try:
            title = RE_TITLE.search(res.body).group(1)
        except AttributeError:
            title = 'N/A'
        print('Title: {}'.format(title))

bot = TestCrawler(concurrency=10)
bot.run()
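The title lookup in `handler_page` can be tried on its own, without the crawler. The sketch below reuses the `RE_TITLE` pattern from the example; `extract_title` is a hypothetical helper, and accepting `bytes` as well as `str` is an assumption, since the page does not say which type `res.body` has.

```python
import re

RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)


def extract_title(body, encoding='utf-8'):
    # Assumption: the body may arrive as bytes; decode it before matching.
    if isinstance(body, bytes):
        body = body.decode(encoding, errors='replace')
    match = RE_TITLE.search(body)
    # Fall back to 'N/A' when the page has no non-empty <title> element.
    return match.group(1).strip() if match else 'N/A'
```

Checking the match object explicitly avoids relying on `AttributeError` for control flow, and the behaviour for pages without a title stays the same.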

Installation

pip install crawler

Dependencies

Python>=3.4
aiohttp

Project details



This version

0.0.2

June 15, 2016

Download files

Download the file for your platform. If you are not sure which to choose, learn more about installing packages.

Source distribution

crawler-0.0.2.tar.gz (6.0 kB)

Uploaded June 15, 2016 (source)

Hashes for crawler-0.0.2.tar.gz

Algorithm	Hash digest
SHA256	`b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2`
MD5	`272f2a88e1376ac09f2d310405ff2bb8`
BLAKE2b-256	`8d422b042beebf63f6d490d38b698f06ee4fdd16a1d32fa2373a6b662a37a33d`
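A downloaded archive can be compared against the digests listed above using only the standard library. This is a minimal sketch: `file_digests` and `matches_published_sha256` are hypothetical helper names, and the file path is whatever location the archive was saved to. BLAKE2b-256 is computed here as BLAKE2b with a 32-byte digest.

```python
import hashlib

# SHA256 digest published above for crawler-0.0.2.tar.gz.
EXPECTED_SHA256 = 'b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2'


def file_digests(path, chunk_size=65536):
    # Read the file once in chunks and feed all three hash objects.
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake2b_256 = hashlib.blake2b(digest_size=32)
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            sha256.update(chunk)
            md5.update(chunk)
            blake2b_256.update(chunk)
    return {
        'sha256': sha256.hexdigest(),
        'md5': md5.hexdigest(),
        'blake2b_256': blake2b_256.hexdigest(),
    }


def matches_published_sha256(path, expected=EXPECTED_SHA256):
    # True only when the file's SHA256 equals the expected digest.
    return file_digests(path)['sha256'] == expected
```

For example, `matches_published_sha256('crawler-0.0.2.tar.gz')` returns `True` only if the local file's SHA256 digest equals the published one.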