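A client for communicating with the Google Search Appliance.
Project description
A client library for the Google Search Appliance (GSA) that simplifies retrieving search results in Python.
Installation
This module is available on PyPI as ubuntudesign.gsa. You should be able to install it with:
pip install ubuntudesign.gsa
GSAClient
This is a basic client for querying the Google Search Appliance.
Making queries
You can query the GSA with the search method.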
search_client = GSAClient(base_url="http://gsa.example.com/search")
first_ten_results = search_client.search("hello world")
first_thirty_results = search_client.search("hello world", num=30)
results_twenty_to_forty = search_client.search(
    "hello world", start=20, num=20
)
This sets the q, start (default: 0), num (default: 10) and lr (default: '') parameters. No other search parameters are sent, so they all fall back to their defaults.
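As a rough illustration of what these four parameters look like in a request URL, the sketch below builds one with the standard library. The helper name build_search_url is hypothetical and is not part of ubuntudesign.gsa; the real client may assemble its request differently.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, start=0, num=10, lr=""):
    """Hypothetical helper, not part of ubuntudesign.gsa."""
    # Only the four parameters named above are sent; everything else
    # is left to the GSA's own defaults.
    params = {"q": query, "start": start, "num": num, "lr": lr}
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://gsa.example.com/search", "hello world", start=20, num=20
)
print(url)
```

An empty lr value is still sent as a bare `lr=` pair here; whether the real client omits it instead is not stated in this document.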
The returned results object attempts to map each of the GSA's standard result XML tags to a more readable format:
{
    'estimated_total_results': int,  # "M": GSA's estimate, see below
    'document_filtering': bool,      # "FI": Is filtering enabled?
    'next_url': str,                 # "NU": GSA URL for querying the next set of results, if available
    'previous_url': str,             # "PU": Ditto for previous set of results
    'items': [
        {
            'index': int,            # "R[N]": The number of this result in the index of all results
            'url': str,              # "U": The URL of the resulting page
            'encoded_url': str,      # "UE": The above URL, encoded
            'title': str,            # "T": The page title
            'relevancy': int,        # "RK": How relevant is this result to the query? From 0 to 10
            'appliance_id': str,     # "ENT_SOURCE": The serial number of the GSA
            'summary': str,          # "S": Summary text for this result
            'language': str,         # "LANG": The language of the page
            'details': {},           # "FS": Name:value pairs of any extra info
            'link_supported': bool,  # "L": "link:" special query term is supported
            'cache': {               # "C": Dictionary, or "None" if cache is not available
                'size': str,         # "C[SZ]": Human readable size of cached page
                'cache_id': str,     # "C[CID]": ID of document in GSA's cache
                'encoding': str      # "C[ENC]": The text encoding of the cached page
            }
        },
        ...
    ]
}
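To make the tag-to-key mapping concrete, here is a minimal sketch that parses a hand-written XML fragment into a dictionary shaped like the one above. The sample XML and the parse_results function are assumptions for illustration only; they cover a handful of tags and are not the library's actual parser.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration; a real GSA response has more tags.
SAMPLE_XML = """
<GSP>
  <RES>
    <M>120</M>
    <R N="1">
      <U>http://site1.example.com/</U>
      <T>Hello world</T>
      <RK>7</RK>
      <LANG>en</LANG>
    </R>
  </RES>
</GSP>
"""


def parse_results(xml_text):
    """Hypothetical parser covering only a few of the tags listed above."""
    res = ET.fromstring(xml_text).find("RES")
    items = []
    for r in res.findall("R"):
        items.append({
            "index": int(r.get("N")),           # "R[N]" attribute
            "url": r.findtext("U"),             # "U"
            "title": r.findtext("T"),           # "T"
            "relevancy": int(r.findtext("RK")),  # "RK"
            "language": r.findtext("LANG"),     # "LANG"
        })
    return {
        "estimated_total_results": int(res.findtext("M")),  # "M"
        "items": items,
    }


parsed = parse_results(SAMPLE_XML)
print(parsed["items"][0]["title"])
```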
Filtering by domain or language
You can filter search results by specifying particular domains or a particular language.
english_results = search_client.search("hello world", language="lang_en")
non_english_results = search_client.search("hello world", language="-lang_en")
domain_specific_results = search_client.search(
    "hello world",
    domains=["site1.example.com", "site2.example.com"]
)
Note: if no results are found for the specified language, the GSA will fall back to returning any results found in all languages.
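Because of this fallback, a caller may want to check whether the results really are in the requested language. The sketch below inspects the 'language' key of each item. The helper name is hypothetical, and it assumes the 'language' field holds a bare code such as "en" (while the search parameter uses "lang_en"), which this document does not confirm.

```python
def fell_back_to_other_languages(results, wanted):
    """Hypothetical helper: True if any item is not in the wanted language."""
    # Assumption: item["language"] is a bare code like "en".
    return any(item.get("language") != wanted for item in results["items"])


# Hand-made results in the shape documented above.
sample = {
    "items": [
        {"url": "http://site1.example.com/a", "language": "en"},
        {"url": "http://site1.example.com/b", "language": "fr"},
    ]
}
mixed = fell_back_to_other_languages(sample, "en")
print(mixed)
```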
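Getting an accurate total
At the time of writing, the Google Search Appliance returns an "estimated" total number of results with each query, but this estimate is often wildly inaccurate, sometimes off by more than a factor of 10! This is true even with rc enabled.
With the total_results method, the client attempts to request results 990 - 1000. This usually causes the GSA to return the last page of results, which lets us find the actual total number of results.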
total = search_client.total_results("hello world", domains=[], language='')
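The idea can be sketched without a real appliance. In the example below, fake_search is a stand-in that clamps an out-of-range window to the last page, as the GSA is described as usually doing, and exact_total reads the total from the last item returned. Both names are hypothetical, and the sketch assumes 'index' is a 1-based position; the library's own total_results may work differently.

```python
def fake_search(query, start=0, num=10):
    """Stand-in for GSAClient.search: 37 results, clamped to the last page."""
    total = 37
    first = min(start, max(total - num, 0))
    last = min(first + num, total)
    # Assumption: 'index' is the 1-based position among all results.
    return {"items": [{"index": i + 1} for i in range(first, last)]}


def exact_total(search, query):
    """Hypothetical sketch of the technique described above."""
    # Ask for the window 990-1000; the last page is expected to come back.
    results = search(query, start=990, num=10)
    items = results["items"]
    if not items:
        return 0
    # The position of the final item on the last page is the real total.
    return items[-1]["index"]


total = exact_total(fake_search, "hello world")
print(total)
```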
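Django view
To make it easier to use the GSA client with Django, this module includes a Django view.
Usage
At a minimum, you need to provide the SEARCH_SERVER_URL setting to tell the view where to find the GSA: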
# settings.py
SEARCH_SERVER_URL = 'http://gsa.example.com/search' # Required: GSA location
SEARCH_DOMAINS = ['site1.example.com'] # Optional: By default, limit results to this set of domains
SEARCH_LANGUAGE = 'lang_zh-CN' # Optional: By default, limit results to this language
# urls.py
from ubuntudesign.gsa.views import SearchView
urlpatterns += [url(r'^search/?$', SearchView.as_view(template_name="search.html"))]
The view will then be available to query:
example.com/search?q=my+search+term
example.com/search?q=my+search+term&domain=example.com&domain=something.example.com (overrides SEARCH_DOMAINS)
example.com/search?q=my+search+term&language=-lang_zh-CN (excludes Chinese results, overrides SEARCH_LANGUAGE)
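Note that the domain parameter may be repeated. The sketch below uses the standard library to show how such a query string decodes into a single query and a list of domains. It only illustrates the URL format; how the bundled view reads these parameters internally is not documented here (in Django, request.GET.getlist("domain") gives the equivalent list).

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://example.com/search"
    "?q=my+search+term&domain=example.com&domain=something.example.com"
)

# parse_qs collects repeated keys into lists and decodes "+" as a space.
params = parse_qs(urlsplit(url).query)

query = params["q"][0]
domains = params.get("domain", [])
language = params.get("language", [None])[0]  # absent in this URL

print(query, domains, language)
```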
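After retrieving search results, the view passes a context object to the specified template_name (in this case search.html).
The context object is structured as follows: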
{
    'query': str,                     # The value of the `q` parameter passed to the view
    'limit': int,                     # The value of the `limit` parameter, or the default of 10
    'offset': int,                    # The value of the `offset` parameter, or the default of 0
    'error': None|str,                # None, or a description of the error if one occurred
    'results': {
        'items': [],                  # The list of items as returned from the GSAClient (see above)
        'total': int,                 # The exact total number of results available
        'start': int,                 # The index of the first result in the set
        'end': int,                   # The index of the last result in the set
        'next_offset': int|None,      # The offset for the next page of results, if available
        'previous_offset': int|None,  # The offset for the previous page of results, if available
        'last_page_offset': int,      # The offset for the last page of results
        'last_page': int,             # The final page number (calculated from "limit" and "total")
        'current_page': int,          # The current page number (calculated from "limit" and "end")
        'penultimate_page': int       # The second-to-last page
    }
}
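The pagination fields can be derived from limit, offset and total alone. The following is a sketch of one plausible way to compute them, not the view's actual code: the function name paginate and the exact formulas are assumptions, and 'start' is left out because its base (0 or 1) is not specified above.

```python
import math


def paginate(total, limit=10, offset=0):
    """Hypothetical calculation of the pagination fields listed above."""
    end = min(offset + limit, total)
    last_page = math.ceil(total / limit)
    has_next = offset + limit < total
    return {
        "end": end,
        "last_page": last_page,
        "current_page": math.ceil(end / limit),
        "penultimate_page": max(last_page - 1, 0),
        "last_page_offset": max(last_page - 1, 0) * limit,
        "next_offset": offset + limit if has_next else None,
        "previous_offset": max(offset - limit, 0) if offset > 0 else None,
    }


page = paginate(total=37, limit=10, offset=20)
print(page)
```

With 37 results and 10 per page, an offset of 20 lands on page 3 of 4, so both a next and a previous offset exist; on the first page previous_offset is None and on the last page next_offset is None.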
Project details
Hashes for ubuntudesign.gsa-1.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | be8b2f9efbfbb8c1a4d59378b830194ca3a3af00245d386aa1c9c9fa79a5c1eb
MD5 | a2b82ca5e9323b7ebaa4b31034599fb9
BLAKE2b-256 | 729d413ed829fa3a17c6e24133abb6c4d4e49abc1c6bef37fd01aa27fc77fa11