A client for communicating with a Google Search Appliance.

A client library for the Google Search Appliance (GSA) that simplifies retrieving search results in Python.

Installation

This module is available on PyPI as ubuntudesign.gsa. You should be able to install it with:

pip install ubuntudesign.gsa

GSAClient

This is a basic client for querying a Google Search Appliance.

Making queries

You can query the GSA using the search method.

search_client = GSAClient(base_url="http://gsa.example.com/search")

first_ten_results = search_client.search("hello world")

first_thirty_results = search_client.search("hello world", num=30)

results_twenty_to_forty = search_client.search(
    "hello world", start=20, num=20
)

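This sets the q, start (default: 0), num (default: 10) and lr (default: '') parameters. No other search parameters are sent, so they all fall back to their GSA defaults.

The returned results object attempts to map each of the GSA's standard result XML tags to a more readable format: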
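To make the parameter mapping concrete, here is a minimal sketch of how those four parameters could end up in a request URL. The build_search_url helper is hypothetical and is not part of ubuntudesign.gsa; it only illustrates the query string a call like search("hello world", start=20, num=20) would plausibly produce.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, start=0, num=10, lr=""):
    """Build a GSA search URL from the q, start, num and lr parameters.

    Hypothetical helper for illustration only; not the library's own code.
    """
    params = {"q": query, "start": start, "num": num, "lr": lr}
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://gsa.example.com/search", "hello world", start=20, num=20
)
print(url)
# http://gsa.example.com/search?q=hello+world&start=20&num=20&lr=
```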

{
    'estimated_total_results': int,  # "M": GSA's estimate, see below
    'document_filtering': bool,      # "FI": Is filtering enabled?
    'next_url': str,                 # "NU": GSA URL for querying the next set of results, if available
    'previous_url': str,             # "PU": Ditto for previous set of results
    'items': [
        {
            'index': int,            # "R[N]": The number of this result in the index of all results
            'url': str,              # "U": The URL of the resulting page
            'encoded_url': str,      # "UE": The above URL, encoded
            'title': str,            # "T": The page title
            'relevancy': int,        # "RK": How relevant is this result to the query? From 0 to 10
            'appliance_id': str,     # "ENT_SOURCE": The serial number of the GSA
            'summary': str,          # "S": Summary text for this result
            'language': str,         # "LANG": The language of the page
            'details': {},           # "FS": Name:value pairs of any extra info
            'link_supported': bool,  # "L": "link:" special query term is supported
            'cache': {               # "C": Dictionary, or "None" if cache is not available
                'size': str,         # "C[SZ]": Human readable size of cached page
                'cache_id': str,     # "C[CID]": ID of document in GSA's cache
                'encoding': str      # "C[ENC]": The text encoding of the cached page
            }
        },
        ...
    ]
}
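As an illustration of the tag mapping above, the following sketch parses a small hand-written XML sample with the standard library. The sample document, the nesting of the L and C tags under a HAS element, and the parse_item and parse_results helpers are all assumptions made for this example; the library's real parser may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble a GSA XML response.
SAMPLE_XML = """
<GSP>
  <RES SN="1" EN="1">
    <M>1</M>
    <FI/>
    <R N="1">
      <U>http://example.com/hello</U>
      <T>Hello world</T>
      <RK>7</RK>
      <S>A short summary.</S>
      <LANG>en</LANG>
      <HAS><L/><C SZ="4k" CID="abc123" ENC="UTF-8"/></HAS>
    </R>
  </RES>
</GSP>
"""


def parse_item(r):
    """Map one <R> element to a dictionary with readable keys."""
    cache = r.find("HAS/C")
    return {
        "index": int(r.get("N")),
        "url": r.findtext("U"),
        "title": r.findtext("T"),
        "relevancy": int(r.findtext("RK")),
        "summary": r.findtext("S"),
        "language": r.findtext("LANG"),
        "link_supported": r.find("HAS/L") is not None,
        "cache": None if cache is None else {
            "size": cache.get("SZ"),
            "cache_id": cache.get("CID"),
            "encoding": cache.get("ENC"),
        },
    }


def parse_results(xml_text):
    """Map the <RES> element to the top-level results dictionary."""
    res = ET.fromstring(xml_text.strip()).find("RES")
    return {
        "estimated_total_results": int(res.findtext("M")),
        "document_filtering": res.find("FI") is not None,
        "items": [parse_item(r) for r in res.findall("R")],
    }


results = parse_results(SAMPLE_XML)
print(results["items"][0]["title"])
```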

Filtering by domain or language

You can filter search results to specific domains or a specific language.

english_results = search_client.search("hello world", language="lang_en")
non_english_results = search_client.search("hello world", language="-lang_en")
domain_specific_results = search_client.search(
    "hello world",
    domains=["site1.example.com", "site2.example.com"]
)

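Note: if no results are found in the specified language, the GSA falls back to returning whatever results it finds in any language.

Getting accurate totals

At the time of writing, the Google Search Appliance returns an "estimated" total number of results with each query, but this estimate is usually wildly inaccurate, sometimes off by more than a factor of 10, even with rc enabled.

With the total_results method, the client attempts to request results 990 to 1000. This usually causes the GSA to return the last page of results, which lets us find the actual total number of results.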
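Because of that fallback, callers who need a strict language match may want to check the returned items themselves. The sketch below is one way to do that on the client side. Both helpers are hypothetical, and the sketch assumes that each item's language field holds a bare code such as "en" and that the filter is a single value like lang_en or -lang_en.

```python
def matches_language(item, language):
    """Return True if an item satisfies a filter like 'lang_en' or '-lang_en'.

    Assumes item['language'] holds a bare code such as 'en'.
    """
    exclude = language.startswith("-")
    code = language.lstrip("-")
    if code.startswith("lang_"):
        code = code[len("lang_"):]
    is_match = item.get("language") == code
    return not is_match if exclude else is_match


def strict_language_filter(results, language):
    """Drop items the GSA may have returned as a cross-language fallback."""
    if not language:
        return results["items"]
    return [i for i in results["items"] if matches_language(i, language)]


sample = {"items": [{"title": "Hello", "language": "en"},
                    {"title": "Bonjour", "language": "fr"}]}
english_only = strict_language_filter(sample, "lang_en")
not_english = strict_language_filter(sample, "-lang_en")
```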

total = search_client.total_results("hello world", domains=[], language='')
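The idea behind total_results can be sketched as follows. This is not the library's implementation: exact_total is a hypothetical helper, fake_search is a stand-in for a GSA holding 37 documents, and the sketch assumes that item indexes are 1-based so that the index of the last item on the last page equals the true total.

```python
def exact_total(search, query, **filters):
    """Request results 990-1000 and read the total from the last item.

    `search` is any callable that accepts the same arguments as
    GSAClient.search and returns a dictionary with an 'items' list.
    """
    results = search(query, start=990, num=10, **filters)
    items = results["items"]
    return items[-1]["index"] if items else 0


def fake_search(query, start=0, num=10, **filters):
    """Stand-in for a GSA with 37 matching documents.

    Mimics the behaviour described above: a start value beyond the
    end of the results yields the last page instead of an empty one.
    """
    total = 37
    if start >= total:
        start = max(total - num, 0)
    end = min(start + num, total)
    return {"items": [{"index": i} for i in range(start + 1, end + 1)]}


print(exact_total(fake_search, "hello world"))  # prints 37
```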

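Django view

To make the GSA client easier to use with Django, this module includes a Django view.

Usage

At a minimum, you need to provide the SEARCH_SERVER_URL setting to tell the view where to find the GSA: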

# settings.py
SEARCH_SERVER_URL = 'http://gsa.example.com/search'  # Required: GSA location
SEARCH_DOMAINS = ['site1.example.com']               # Optional: By default, limit results to this set of domains
SEARCH_LANGUAGE = 'lang_zh-CN'                       # Optional: By default, limit results to this language

# urls.py
from django.conf.urls import url  # available in Django < 4.0
from ubuntudesign.gsa.views import SearchView
urlpatterns += [url(r'^search/?$', SearchView.as_view(template_name="search.html"))]

The view will then be available to query:

  • example.com/search?q=my+search+term

  • example.com/search?q=my+search+term&domain=example.com&domain=something.example.com (overrides SEARCH_DOMAINS)

  • example.com/search?q=my+search+term&language=-lang_zh-CN (excludes Chinese results, overriding SEARCH_LANGUAGE)
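The repeated domain parameter in the second URL carries a list of values. As a small standard-library sketch (not the view's actual code), this is how such a query string splits into a query, a domain list and a language value; inside a Django view, request.GET.getlist("domain") gives the same list.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://example.com/search"
    "?q=my+search+term&domain=example.com&domain=something.example.com"
)
params = parse_qs(urlsplit(url).query)

query = params["q"][0]                      # 'my search term'
domains = params.get("domain", [])          # both domain values, in order
language = params.get("language", [""])[0]  # '' when not supplied

print(query, domains, repr(language))
```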

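After retrieving the search results, the view passes a context object to the specified template_name (in this case search.html).

The context object is structured as follows: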

{
    'query': str,       # The value of the `q` parameter passed to the view
    'limit': int,       # The value of the `limit` parameter, or the default of 10
    'offset': int,      # The value of the `offset` parameter, or the default of 0
    'error': None|str,  # None, or a description of the error if one occurred
    'results': {
        'items': [],    # The list of items as returned from the GSAClient (see above)
        'total': int,   # The exact total number of results available
        'start': int,   # The index of the first result in the set
        'end': int,     # The index of the last result in the set
        'next_offset': int|None,      # The offset for the next page of results, if available
        'previous_offset': int|None,  # The offset for the previous page of results, if available
        'last_page_offset': int,      # The offset for the last page of results
        'last_page': int,             # The final page number (calculated from "limit" and "total")
        'current_page': int,          # The current page number (calculated from "limit" and "end")
        'penultimate_page': int       # The second-to-last page
    }
}
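The paging fields in the context can be derived from total, limit and offset. The sketch below shows one plausible set of formulas; they are assumptions for illustration (including the 1-based start index), and the view's own calculations may differ in edge cases.

```python
import math


def pagination(total, limit=10, offset=0):
    """Derive paging fields from total, limit and offset (assumed formulas)."""
    start = offset + 1 if total else 0
    end = min(offset + limit, total)
    last_page = math.ceil(total / limit)
    return {
        "start": start,
        "end": end,
        "next_offset": offset + limit if end < total else None,
        "previous_offset": max(offset - limit, 0) if offset > 0 else None,
        "last_page_offset": max((last_page - 1) * limit, 0),
        "last_page": last_page,
        "current_page": math.ceil(end / limit),
        "penultimate_page": last_page - 1,
    }


# Third page of 37 results, ten per page: items 21 to 30.
page = pagination(total=37, limit=10, offset=20)
print(page["start"], page["end"], page["current_page"], page["last_page"])
```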

