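Easier processing of web documents
Project description
Soupy is a wrapper around BeautifulSoup that makes it easier to build complex queries when working with web data.
Here is an example of a Soupy query: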
from soupy import Soupy, Q
html = """
<div id="main">
<div>The web is messy</div>
and full of traps
<div>but Soupy loves you</div>
</div>"""
print(Soupy(html).find(id='main').children
      .each(Q.text.strip())  # extract text from each node, trim whitespace
      .filter(len)           # remove empty strings
      .val())                # dump out of Soupy
# ['The web is messy', 'and full of traps', 'but Soupy loves you']
The same query written with BeautifulSoup directly:
from bs4 import BeautifulSoup, NavigableString
html = """
<div id="main">
<div>The web is messy</div>
and full of traps
<div>but Soupy loves you</div>
</div>"""
result = []
for node in BeautifulSoup(html).find(id='main').children:
    # Bare text nodes are strings; element nodes expose their text via .text
    if isinstance(node, NavigableString):
        text = node.strip()
    else:
        text = node.text.strip()
    # Skip nodes that are empty after trimming whitespace
    if len(text):
        result.append(text)
print(result)
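The difference between the two versions is mostly one of style: the Soupy query chains each/filter/val calls, while the BeautifulSoup version spells out the loop. The chaining pattern itself can be sketched in plain Python. This is a toy illustration only, not Soupy's actual implementation; the `Chain` class and its internals are hypothetical names chosen for the example.

```python
class Chain(object):
    """Toy chainable wrapper around a list (hypothetical; not Soupy's real classes)."""

    def __init__(self, items):
        self._items = list(items)

    def each(self, func):
        # Apply func to every item and wrap the results in a new Chain.
        return Chain(func(item) for item in self._items)

    def filter(self, predicate):
        # Keep only the items for which predicate returns a truthy value.
        return Chain(item for item in self._items if predicate(item))

    def val(self):
        # Leave the wrapper and return a plain list.
        return list(self._items)


# Strings standing in for the text of each child node.
raw = ['The web is messy\n', '\n  ', ' and full of traps ', 'but Soupy loves you']
cleaned = Chain(raw).each(str.strip).filter(len).val()
print(cleaned)
```

Because every step returns a new wrapper, steps can be added or removed without restructuring a loop body, which is the convenience the Soupy example above relies on.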
For more information, see the Soupy documentation.
Installation
pip install soupy
Dependencies
six and BeautifulSoup4
Soupy supports Python 2.6+ and 3.3+
Project details
Hashes for soupy-0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 4b6f5d0d5357c7f4ff9b8e2b8d1b6b0a7a472d6a3b2a9c0e3247b3b0b411c32a345ebeb
MD5 | 3826ff46df881f75ee823b161f504513
BLAKE2b-256 | 76b0badfa91b5789a8af211e32e9836498cdab749b9ec2dd5346b2349f049d06
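A downloaded archive can be checked against a published digest with Python's standard `hashlib` module. The sketch below is one possible way to do it, assuming Python 3.6+ (for `hashlib.blake2b`); the helper names `file_digest` and `matches` and the `"blake2b-256"` label are hypothetical and not part of Soupy or pip.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at path (hypothetical helper)."""
    if algorithm == "blake2b-256":
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare the file's digest with a published hex digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches("soupy-0.3.tar.gz", digest, "blake2b-256")` would return True when `digest` is the published BLAKE2b-256 value and the local file is intact, assuming the archive has been downloaded to the current directory.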