A web scraping framework based on py3 asyncio

Project description

A web scraping framework based on the py3 asyncio & aiohttp libraries.
Usage example

```python
import re
from itertools import islice

from crawler import Crawler, Request

RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)


class TestCrawler(Crawler):
    def task_generator(self):
        # Yield a Request for each of the first 100 non-empty hosts
        for host in islice(open('var/domains.txt'), 100):
            host = host.strip()
            if host:
                yield Request('http://%s/' % host, tag='page')

    def handler_page(self, req, res):
        # Called for responses to requests tagged 'page'
        print('Result of request to {}'.format(req.url))
        try:
            title = RE_TITLE.search(res.body).group(1)
        except AttributeError:
            # No <title> found in the response body
            title = 'N/A'
        print('Title: {}'.format(title))


bot = TestCrawler(concurrency=10)
bot.run()
```

Installation

```
pip install crawler
```

Requirements
- Python>=3.4 
- aiohttp 
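The example above relies on a naming convention: a Request created with tag='page' is routed to the handler_page method. A minimal sketch of how such tag-to-handler dispatch can work, using getattr (the Request and MiniCrawler classes here are simplified stand-ins for illustration, not the crawler library's actual internals):

```python
# Simplified stand-ins, NOT the crawler library's internal implementation.

class Request:
    def __init__(self, url, tag=None):
        self.url = url
        self.tag = tag


class MiniCrawler:
    def dispatch(self, req, res):
        # Route a response to handler_<tag>, e.g. tag='page' -> handler_page
        handler = getattr(self, 'handler_%s' % req.tag)
        return handler(req, res)


class MyCrawler(MiniCrawler):
    def handler_page(self, req, res):
        return 'page handler got %s' % req.url


bot = MyCrawler()
req = Request('http://example.com/', tag='page')
print(bot.dispatch(req, 'fake-body'))  # page handler got http://example.com/
```

A request with an unknown tag would raise AttributeError in this sketch; the real framework may handle that case differently.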
Project details
Hashes for crawler-0.0.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6b5bcc2f2a64ac60251bee1494bd7ea98605ef1a8bf87db5194bea4bdd420d2 |
| MD5 | 272f2a88e1376ac09f2d310405ff2bb8 |
| BLAKE2b-256 | 8d422b042beebf63f6d490d38b698f06ee4fdd16a1d32fa2373a6b662a37a33d |