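ubuntudesign.gsa · PyPI · Python Package Index

A client for communicating with a Google Search Appliance.

Project description

A client library for the Google Search Appliance (GSA) that simplifies retrieving search results in Python.

Installation

This module is available on PyPI as ubuntudesign.gsa. You should be able to install it with: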

pip install ubuntudesign.gsa

GSAClient

This is a basic client for querying a Google Search Appliance.

Making queries

You can query the GSA using the search method.

search_client = GSAClient(base_url="http://gsa.example.com/search")

first_ten_results = search_client.search("hello world")

first_thirty_results = search_client.search("hello world", num=30)

results_twenty_to_forty = search_client.search(
  "hello world", start=20, num=20
)

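This sets the q, start (default: 0), num (default: 10) and lr (default: '') parameters. No other search parameters are supplied, so they all fall back to their defaults.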
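To make the parameter mapping concrete, here is a minimal sketch of how those four parameters could be combined into a query URL. The helper name build_search_url is hypothetical and is not part of ubuntudesign.gsa; the library's own request logic may differ.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, start=0, num=10, lr=""):
    """Hypothetical helper: compose a GSA query URL from the four parameters."""
    # Only q, start, num and lr are sent; everything else is left to GSA defaults.
    params = {"q": query, "start": start, "num": num, "lr": lr}
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://gsa.example.com/search", "hello world", start=20, num=20
)
print(url)
```

An empty lr value is sent as-is here, which matches the documented default of ''.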

The returned results object attempts to map each of the GSA's standard result XML tags to a more readable format:

{
    'estimated_total_results': int,  # "M": GSA's estimate, see below
    'document_filtering': bool,      # "FI": Is filtering enabled?
    'next_url': str,                 # "NU": GSA URL for querying the next set of results, if available
    'previous_url': str,             # "PU": Ditto for previous set of results
    'items': [
        {
            'index': int,            # "R[N]": The number of this result in the index of all results
            'url': str,              # "U": The URL of the resulting page
            'encoded_url': str,      # "UE": The above URL, encoded
            'title': str,            # "T": The page title
            'relevancy': int,        # "RK": How relevant is this result to the query? From 0 to 10
            'appliance_id': str,     # "ENT_SOURCE": The serial number of the GSA
            'summary': str,          # "S": Summary text for this result
            'language': str,         # "LANG": The language of the page
            'details': {},           # "FS": Name:value pairs of any extra info
            'link_supported': bool,  # "L": The "link:" special query term is supported
            'cache': {               # "C": Dictionary, or "None" if cache is not available
                'size': str,         # "C[SZ]": Human readable size of cached page
                'cache_id': str,     # "C[CID]": ID of document in GSA's cache
                'encoding': str      # "C[ENC]": The text encoding of the cached page
            }
        },
        ...
    ]
}
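As an illustration of working with this structure, the sketch below walks the items list and handles the case where cache is None. The function summarise_results and the sample data are made up for this example; only the key names come from the structure above.

```python
def summarise_results(results):
    """Return one short text line per item in a results dictionary."""
    lines = []
    for item in results["items"]:
        cache = item.get("cache")  # a dict, or None when no cached copy exists
        cache_note = "cached, {}".format(cache["size"]) if cache else "not cached"
        lines.append(
            "{}. {} <{}> ({})".format(
                item["index"], item["title"], item["url"], cache_note
            )
        )
    return lines


# Made-up sample data in the documented shape, for illustration only.
sample = {
    "estimated_total_results": 2,
    "items": [
        {"index": 1, "title": "Hello", "url": "http://site1.example.com/a",
         "cache": {"size": "4k", "cache_id": "abc", "encoding": "UTF-8"}},
        {"index": 2, "title": "World", "url": "http://site1.example.com/b",
         "cache": None},
    ],
}

summary = summarise_results(sample)
for line in summary:
    print(line)
```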

Filtering by domain or language

You can filter search results by specifying particular domains or a particular language.

english_results = search_client.search("hello world", language="lang_en")
non_english_results = search_client.search("hello world", language="-lang_en")
domain_specific_results = search_client.search(
    "hello world",
    domains=["site1.example.com", "site2.example.com"]
)

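Note: if no results are found for the specified language, the GSA falls back to returning any results found in all languages.

Getting accurate totals

At the time of writing, the Google Search Appliance returns an "estimated" total number of results for each query, but this estimate is often wildly inaccurate, sometimes by more than a factor of 10! This is true even with rc enabled.

With the total_results method, the client attempts to request results 990 - 1000. This usually causes the GSA to return the last page of results, which lets us find the actual total number of results.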

total = search_client.total_results("hello world", domains=[], language='')
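The idea behind this can be sketched as follows. This is not the library's actual implementation: exact_total and fake_search are hypothetical names, and the sketch assumes that each item's index is its 1-based position among all results, so that the last item on the last page carries the real total.

```python
def exact_total(search, query, probe_start=990, probe_size=10):
    """Find the real total by asking for the last reachable page.

    `search` is any callable that accepts (query, start=, num=) and returns
    a results dictionary in the documented shape.
    """
    results = search(query, start=probe_start, num=probe_size)
    items = results["items"]
    if not items:
        return 0
    # Assumption: 'index' is the 1-based position among all results.
    return items[-1]["index"]


def fake_search(query, start=0, num=10):
    """Stand-in for a GSA holding 37 results that clamps to its last page."""
    total = 37
    first = min(start, max(total - num, 0))
    indexes = range(first + 1, min(first + num, total) + 1)
    return {
        "estimated_total_results": 400,  # deliberately inaccurate estimate
        "items": [{"index": i} for i in indexes],
    }


total_found = exact_total(fake_search, "hello world")
print(total_found)
```

Under these assumptions, a probe at 990 - 1000 cannot observe totals beyond the probed range.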

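Django view

To simplify using the GSA client with Django, this module includes a Django view.

Usage

At a minimum, you need to provide the SEARCH_SERVER_URL setting to tell the view where to find the GSA: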

# settings.py
SEARCH_SERVER_URL = 'http://gsa.example.com/search'  # Required: GSA location
SEARCH_DOMAINS = ['site1.example.com']               # Optional: By default, limit results to this set of domains
SEARCH_LANGUAGE = 'lang_zh-CN'                       # Optional: By default, limit results to this language

# urls.py
from ubuntudesign.gsa.views import SearchView
urlpatterns += [url(r'^search/?$', SearchView.as_view(template_name="search.html"))]

The view will then be available to query:

example.com/search?q=my+search+term
example.com/search?q=my+search+term&domain=example.com&domain=something.example.com (overrides SEARCH_DOMAINS)
example.com/search?q=my+search+term&language=-lang_zh-CN (excludes Chinese results, overrides SEARCH_LANGUAGE)
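To show how repeated domain parameters and the language override could be read from such URLs, here is a rough sketch using only the standard library. The helper resolve_search_options and the two DEFAULT_ constants are hypothetical stand-ins for the view's behaviour and the Django settings; the view's real code may work differently.

```python
from urllib.parse import parse_qs

DEFAULT_DOMAINS = ["site1.example.com"]  # stands in for SEARCH_DOMAINS
DEFAULT_LANGUAGE = "lang_zh-CN"          # stands in for SEARCH_LANGUAGE


def resolve_search_options(query_string):
    """Hypothetical helper: read q, domain and language from a query string."""
    params = parse_qs(query_string)
    return {
        "query": params.get("q", [""])[0],
        # Repeated "domain" parameters arrive as a list and replace the default.
        "domains": params.get("domain", DEFAULT_DOMAINS),
        "language": params.get("language", [DEFAULT_LANGUAGE])[0],
    }


plain = resolve_search_options("q=my+search+term")
by_domain = resolve_search_options(
    "q=my+search+term&domain=example.com&domain=something.example.com"
)
by_language = resolve_search_options("q=my+search+term&language=-lang_zh-CN")
print(plain, by_domain, by_language, sep="\n")
```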

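After retrieving search results, the view passes a context object to the specified template_name (in this case search.html).

The context object has the following structure: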

{
    'query': str,       # The value of the `q` parameter passed to the view
    'limit': int,       # The value of the `limit` parameter, or the default of 10
    'offset': int,      # The value of the `offset` parameter, or the default of 0
    'error': None|str,  # None, or a description of the error if one occurred
    'results': {
        'items': [],    # The list of items as returned from the GSAClient (see above)
        'total': int,   # The exact total number of results available
        'start': int,   # The index of the first result in the set
        'end': int,     # The index of the last result in the set
        'next_offset': int|None,      # The offset for the next page of results, if available
        'previous_offset': int|None,  # The offset for the previous page of results, if available
        'last_page_offset': int,      # The offset for the last page of results
        'last_page': int,             # The final page number (calculated from "limit" and "total")
        'current_page': int,          # The current page number (calculated from "limit" and "end")
        'penultimate_page': int       # The second-to-last page
    }
}
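The pagination values in this context can be derived from total, limit and offset. The sketch below is one plausible way to compute them, not the view's actual code; in particular, treating start as a 1-based index and clamping page numbers to a minimum of 1 are assumptions.

```python
import math


def pagination(total, limit=10, offset=0):
    """Derive page-navigation values from a total, a page size and an offset."""
    start = offset + 1 if total else 0  # assumed 1-based index of first result
    end = min(offset + limit, total)
    last_page = max(math.ceil(total / limit), 1)
    current_page = max(math.ceil(end / limit), 1)
    return {
        "start": start,
        "end": end,
        "next_offset": offset + limit if end < total else None,
        "previous_offset": max(offset - limit, 0) if offset > 0 else None,
        "last_page_offset": (last_page - 1) * limit,
        "last_page": last_page,
        "current_page": current_page,
        "penultimate_page": max(last_page - 1, 1),
    }


page = pagination(total=37, limit=10, offset=20)
print(page)
```

With 37 results and a limit of 10, an offset of 20 corresponds to page 3 of 4, showing results 21 to 30.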

Project details

Release history

1.1.0 (this version): May 22, 2017
1.0.5: Feb 21, 2017
1.0.4: Feb 6, 2017
1.0.3: Jan 31, 2017
1.0.2: Jan 27, 2017
1.0.1: Jan 27, 2017
1.0.0: Jan 26, 2017
0.1.1: Jan 26, 2017
0.1.0: Jan 26, 2017

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source distribution

ubuntudesign.gsa-1.1.0.tar.gz (6.5 kB), uploaded May 22, 2017

Hashes for ubuntudesign.gsa-1.1.0.tar.gz

Algorithm	Hash digest
SHA256	`be8b2f9efbfbb8c1a4d59378b830194ca3a3af00245d386aa1c9c9fa79a5c1eb`
MD5	`a2b82ca5e9323b7ebaa4b31034599fb9`
BLAKE2b-256	`729d413ed829fa3a17c6e24133abb6c4d4e49abc1c6bef37fd01aa27fc77fa11`
