zc.catalog

Extensions to the Zope 3 Catalog

Project description

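zc.catalog is an extension to the Zope 3 catalog, Zope 3's indexing and search facility. It contains a number of extensions to the Zope 3 catalog, such as some new indexes, improved globbing and stemming support, and an alternative catalog implementation.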

CHANGES

3.0 (2019-03-21)

Drop support for Python 3.4, as it has reached its end of life.
Add support for Python 3.7 and 3.8a2.

2.0.1 (2017-06-15)

Add Python 3 compatibility for the zopyx.txng3.ext stemmer. See #4.

2.0.0 (2017-05-09)

Add support for Python 3.4, 3.5, 3.6 and PyPy. Note that the zopyx.txng3.ext stemmer is not available on Python 3.
Remove test dependencies on zope.app.zcmlfiles and zope.app.testing, among others.

1.6 (2013-07-04)

Use Python's doctest module instead of the deprecated zope.testing.doctest.
Move zope.intid to the dependencies.

1.5.1 (2012-01-20)

Fix the extent catalog's searchResults method so that it works when a local uid source is used.
Replace the testing dependency on zope.app.authentication with zope.password.
Remove the zope.app.server test dependency.

1.5 (2010-10-19)

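The package's configure.zcml no longer includes the browser subpackage's configure.zcml.

This, together with the browser and test_browser extras_require, decouples the browser view registrations from the main code. As a result, projects that do not need the ZMI views to be registered no longer pull in the zope.app.* dependencies.

To enable the ZMI views for your project, you have to do two things:
- List zc.catalog [browser] in install_requires.
- Have your project's configure.zcml include the zc.catalog.browser subpackage.
Only include the browser tests if the dependencies for the browser tests are available.
Python 2.7 test fix.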

1.4.5 (2010-10-05)

Remove the implicit test dependency on zope.app.dublincore, which was not needed in the first place.

1.4.4 (2010-07-06)

Fix a test failure that occurred with more recent versions of mechanize (>=2.0).

1.4.3 (2010-03-09)

Try to import the stemmer from the zopyx.txng3.ext package first, which as of version 3.3.2 contains stability and memory leak fixes.

1.4.2 (2010-01-20)

Fix missing testing dependencies when using the ZTK by adding zope.login.

1.4.1 (2009-02-27)

Add FieldIndex-like sorting support to ValueIndex.
Add support for sorting indexes to NormalizationWrapper.

1.4.0 (2009-02-07)

Fix a typo in the ValueIndex addform and addMenuItem.
Use zope.container instead of zope.app.container.
Use zope.keyreference instead of zope.app.keyreference.
Use zope.intid instead of zope.app.intid.
Use zope.catalog instead of zope.app.catalog.

1.3.0 (2008-09-10)

Add a hook point that allows the extent catalog to be used with local UID sources.

1.2.0 (2007-11-03)

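Updated package metadata.
zc.catalog can now use the 64-bit BTrees ("L") provided by ZODB 3.8.
Albertas Agejavas (alga@pov.lt) contributed the new CallableWrapper, for cases where the typical Zope 3 index-by-adapter story (zope.app.catalog.attribute) is unnecessary trouble and you just want to use a callable. See callablewrapper.txt. It can also be used with other indexes based on the zope.index interfaces.
Extents now have a __len__. The current implementation defers to the standard BTree len implementation and shares its performance characteristics: it needs to wake up all of the buckets, but if all of the buckets are already awake it is a fairly quick operation.
A simple ISelfPopulatingExtent was added to the extentcatalog module, for which populating is a no-op. This is directly useful for catalogs that are used as implementation details of a component, in which objects are indexed explicitly by your own calls rather than by the usual subscribers. It may also be slightly useful as a base for other self-populating extents.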

1.1.1 (2007-03-17)

'all_of' would return all results when one of the values had no results. Reported, with a test and a fix, by Nando Quintana.

1.1 (2007-01-06)

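Features removed

The queueing of events in the extent catalog has been removed entirely. Subtransactions caused significant problems for the code introduced in 1.0. Other solutions also have significant problems, and the benefit of this kind of queueing is questionable. Here is a summary of some of the approaches that were rejected for getting the queueing to work:

_p_invalidate (used in 1.0). Not really designed for use within a transaction, and it reverts to the last savepoint rather than to the beginning of the transaction. Savepoints could be monkeypatched to iterate over precommit transaction hooks, but that smells too bad.
_p_resolveConflict. Requires application software to exist in ZEO and even ZRS installations, which runs counter to our software deployment goals. It also causes useless repeated writes of an empty queue to the database, but that is not the deal-breaker.
Vague, hand-wavy ideas about separate storages or transaction managers for the queue. These never panned out in discussion.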

1.0 (2007-01-05)

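Bugs fixed

Adjusted the extentcatalog tests to trigger (and discuss and test) the queueing behavior.
Fixed a problem with excessive conflict errors caused by the queueing code.
Updated stemming to work with the newest version of TextIndexNG's extensions.
Omitted the stemming test when TextIndexNG's extensions are unavailable, so that the tests pass without them. Since TextIndexNG's extensions are optional, this seems reasonable.
Removed the use of zapi in extentcatalog.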

0.2 (2006-11-22)

Features added

First release on the Cheeseshop.

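Value Index

The valueindex is an index similar to, but more flexible than, a standard Zope field index. The index allows searches for documents that contain any of a set of values; that fall between a pair of values; that have any (non-None) value; and that have no value.

Additionally, the index supports an interface that allows examination of the indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful on its own.

On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty.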

>>> from zc.catalog.index import ValueIndex
>>> index = ValueIndex()
>>> index.documentCount()
0
>>> index.wordCount()
0
>>> index.maxValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> index.minValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> list(index.values())
[]
>>> len(index.apply({'any_of': (5,)}))
0

The index supports indexing any value. All values within a given index must sort consistently across Python versions.

>>> data = {1: 'a',
...         2: 'b',
...         3: 'a',
...         4: 'c',
...         5: 'd',
...         6: 'c',
...         7: 'c',
...         8: 'b',
...         9: 'c',
... }
>>> for k, v in data.items():
...     index.index_doc(k, v)
...

After indexing, the statistics and values match the newly entered content.

>>> list(index.values())
['a', 'b', 'c', 'd']
>>> index.documentCount()
9
>>> index.wordCount()
4
>>> index.maxValue()
'd'
>>> index.minValue()
'a'
>>> list(index.ids())
[1, 2, 3, 4, 5, 6, 7, 8, 9]

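The index supports four types of query. The first is 'any_of'. It takes an iterable of values and returns an iterable of the document ids that contain any of those values. The results are not weighted.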

>>> list(index.apply({'any_of': ('b', 'c')}))
[2, 4, 6, 7, 8, 9]
>>> list(index.apply({'any_of': ('b',)}))
[2, 8]
>>> list(index.apply({'any_of': ('d',)}))
[5]
>>> bool(index.apply({'any_of': (42,)}))
False

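Another query is 'any'. If the key is None, all indexed document ids that have any value are returned. If the key is an extent, the intersection of the extent and all document ids that have any value is returned.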

>>> list(index.apply({'any': None}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(15):
...     extent.add(i, i)
...
>>> list(index.apply({'any': extent}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(5):
...     limited_extent.add(i, i)
...
>>> list(index.apply({'any': limited_extent}))
[1, 2, 3, 4]

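The 'between' argument takes from one to four values. The first is the minimum, and defaults to None, indicating no minimum; the second is the maximum, and defaults to None, indicating no maximum; the third is a boolean for whether the minimum value should be excluded, and defaults to False; the last is a boolean for whether the maximum value should be excluded, and also defaults to False. The results are not weighted.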

>>> list(index.apply({'between': ('b', 'd')}))
[2, 4, 5, 6, 7, 8, 9]
>>> list(index.apply({'between': ('c', None)}))
[4, 5, 6, 7, 9]
>>> list(index.apply({'between': ('c',)}))
[4, 5, 6, 7, 9]
>>> list(index.apply({'between': ('b', 'd', True, True)}))
[4, 6, 7, 9]

Using invalid (non-comparable on Python 3) arguments to 'between' produces no results.

>>> list(index.apply({'between': (1, 5)}))
[]
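
The 'between' semantics described above can be sketched in plain Python. This is only an illustration under stated assumptions: the between function and the value_to_ids mapping are hypothetical names, and zc.catalog itself works on BTrees rather than on a sorted list.

```python
from bisect import bisect_left, bisect_right


def between(value_to_ids, minimum=None, maximum=None,
            exclude_min=False, exclude_max=False):
    """Return the sorted doc ids whose value lies in the requested range."""
    values = sorted(value_to_ids)
    # Lower bound: skip values equal to the minimum when it is excluded.
    if minimum is None:
        lo = 0
    elif exclude_min:
        lo = bisect_right(values, minimum)
    else:
        lo = bisect_left(values, minimum)
    # Upper bound: stop before values equal to the maximum when it is excluded.
    if maximum is None:
        hi = len(values)
    elif exclude_max:
        hi = bisect_left(values, maximum)
    else:
        hi = bisect_right(values, maximum)
    ids = set()
    for value in values[lo:hi]:
        ids.update(value_to_ids[value])
    return sorted(ids)


# The same data as in the examples above, inverted to value -> doc ids.
value_to_ids = {'a': {1, 3}, 'b': {2, 8}, 'c': {4, 6, 7, 9}, 'd': {5}}
print(between(value_to_ids, 'b', 'd'))              # [2, 4, 5, 6, 7, 8, 9]
print(between(value_to_ids, 'c'))                   # [4, 5, 6, 7, 9]
print(between(value_to_ids, 'b', 'd', True, True))  # [4, 6, 7, 9]
```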

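The 'none' argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values.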

>>> list(index.apply({'none': extent}))
[0, 10, 11, 12, 13, 14]
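
Conceptually, 'any' and 'none' are set operations between the indexed ids and an extent. A minimal sketch with built-in sets, assuming hypothetical helper names any_query and none_query (the real index operates on BTrees sets and extent objects):

```python
indexed_ids = {1, 2, 3, 4, 5, 6, 7, 8, 9}   # doc ids that have a value
extent = set(range(15))                      # every uid known to the extent


def any_query(indexed, extent=None):
    """'any': ids that have a value, optionally limited to an extent."""
    return sorted(indexed if extent is None else indexed & extent)


def none_query(indexed, extent):
    """'none': ids in the extent that have no indexed value."""
    return sorted(extent - indexed)


print(any_query(indexed_ids))                 # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(any_query(indexed_ids, set(range(5))))  # [1, 2, 3, 4]
print(none_query(indexed_ids, extent))        # [0, 10, 11, 12, 13, 14]
```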

Trying to use more than one of these at a time generates an error.

>>> index.apply({'between': (5,), 'any_of': (3,)})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...

Using none of them simply returns None.

>>> index.apply({}) # returns None

Invalid query names cause ValueErrors.

>>> index.apply({'foo': ()})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...

When you unindex a document, the searches and statistics should be updated.

>>> index.unindex_doc(5)
>>> len(index.apply({'any_of': ('d',)}))
0
>>> index.documentCount()
8
>>> index.wordCount()
3
>>> list(index.values())
['a', 'b', 'c']
>>> list(index.ids())
[1, 2, 3, 4, 6, 7, 8, 9]

Reindexing a document whose value has changed is also reflected in subsequent searches and statistics checks.

>>> list(index.apply({'any_of': ('b',)}))
[2, 8]
>>> data[8] = 'e'
>>> index.index_doc(8, data[8])
>>> index.documentCount()
8
>>> index.wordCount()
4
>>> list(index.apply({'any_of': ('e',)}))
[8]
>>> list(index.apply({'any_of': ('b',)}))
[2]
>>> data[2] = 'e'
>>> index.index_doc(2, data[2])
>>> index.documentCount()
8
>>> index.wordCount()
3
>>> list(index.apply({'any_of': ('e',)}))
[2, 8]
>>> list(index.apply({'any_of': ('b',)}))
[]

Reindexing a document whose value is now None causes it to be removed from the statistics.

>>> data[3] = None
>>> index.index_doc(3, data[3])
>>> index.documentCount()
7
>>> index.wordCount()
3
>>> list(index.ids())
[1, 2, 4, 6, 7, 8, 9]
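
The bookkeeping behind these statistics can be modelled with two dictionaries. The class below is a hypothetical, simplified stand-in (TinyValueIndex is not part of zc.catalog); it only aims to reproduce the counts shown above, including the rule that indexing None removes the document.

```python
class TinyValueIndex:
    """Simplified model: one value per document, None means 'no value'."""

    def __init__(self):
        self.doc_value = {}    # doc id -> indexed value
        self.value_docs = {}   # indexed value -> set of doc ids

    def index_doc(self, doc_id, value):
        self.unindex_doc(doc_id)          # forget any previous value first
        if value is None:
            return                        # None is treated as "not indexed"
        self.doc_value[doc_id] = value
        self.value_docs.setdefault(value, set()).add(doc_id)

    def unindex_doc(self, doc_id):
        if doc_id not in self.doc_value:
            return
        value = self.doc_value.pop(doc_id)
        docs = self.value_docs[value]
        docs.discard(doc_id)
        if not docs:
            del self.value_docs[value]    # keeps wordCount accurate

    def documentCount(self):
        return len(self.doc_value)

    def wordCount(self):
        return len(self.value_docs)


index = TinyValueIndex()
data = {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'd',
        6: 'c', 7: 'c', 8: 'b', 9: 'c'}
for doc_id, value in data.items():
    index.index_doc(doc_id, value)
print(index.documentCount(), index.wordCount())   # 9 4
index.index_doc(3, None)                          # drops doc 3 from the index
print(index.documentCount(), index.wordCount())   # 8 4
```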

This affects both ways of determining which ids are and are not in the index (that is, which do and do not have values).

>>> list(index.apply({'any': None}))
[1, 2, 4, 6, 7, 8, 9]
>>> list(index.apply({'any': extent}))
[1, 2, 4, 6, 7, 8, 9]
>>> list(index.apply({'none': extent}))
[0, 3, 5, 10, 11, 12, 13, 14]

The values method can be used to examine the indexed value for a given document id. For a valueindex, the "values" for a given doc_id always have a length of 0 or 1.

>>> index.values(doc_id=8)
('e',)

The containsValue method provides a way to test whether a value is in the index.

>>> index.containsValue('a')
True
>>> index.containsValue('q')
False

Sorting Value Index

Value indexes support sorting, just like zope.index.field.FieldIndex.

>>> index.clear()

>>> index.index_doc(1, 9)
>>> index.index_doc(2, 8)
>>> index.index_doc(3, 7)
>>> index.index_doc(4, 6)
>>> index.index_doc(5, 5)
>>> index.index_doc(6, 4)
>>> index.index_doc(7, 3)
>>> index.index_doc(8, 2)
>>> index.index_doc(9, 1)

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
[9, 7, 5, 4, 3, 2, 1]

We can also specify the reverse argument to reverse the results.

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
[1, 2, 3, 4, 5, 7, 9]

As per IIndexSort, we can limit the results by specifying the limit argument.

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
[9, 7, 5]

If we pass an id that is not indexed by this index, it is not included in the result.

>>> list(index.sort([2, 10]))
[2]
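
The sorting behavior shown above amounts to ordering doc ids by their indexed value and dropping ids the index does not know. A minimal sketch, assuming a hypothetical sort_docids helper over a plain dictionary (the real sort method may use a different strategy internally):

```python
def sort_docids(doc_values, docids, reverse=False, limit=None):
    """Order docids by indexed value; ids without a value are skipped."""
    known = [doc_id for doc_id in docids if doc_id in doc_values]
    ordered = sorted(known, key=doc_values.__getitem__, reverse=reverse)
    return ordered if limit is None else ordered[:limit]


# Same data as above: doc 1 -> 9, doc 2 -> 8, ..., doc 9 -> 1.
doc_values = {doc_id: 10 - doc_id for doc_id in range(1, 10)}
print(sort_docids(doc_values, [4, 2, 9, 7, 3, 1, 5]))           # [9, 7, 5, 4, 3, 2, 1]
print(sort_docids(doc_values, [4, 2, 9, 7, 3, 1, 5], limit=3))  # [9, 7, 5]
print(sort_docids(doc_values, [2, 10]))                         # [2]
```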

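Set Index

The setindex is an index similar to, but more general than, a traditional keyword index. The indexed values are expected to be iterables; the index allows searches for documents that contain any of a set of values, all of a set of values, or values between a pair of values.

Additionally, the index supports an interface that allows examination of the indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful on its own.

On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty.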

>>> from zc.catalog.index import SetIndex
>>> index = SetIndex()
>>> index.documentCount()
0
>>> index.wordCount()
0
>>> index.maxValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> index.minValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> list(index.values())
[]
>>> len(index.apply({'any_of': (5,)}))
0

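The index supports indexing any value. All values within a given index must sort consistently across Python versions. In practice, on Python 3 this means that the values need to be homogeneous.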

>>> data = {1: ['a', '1'],
...         2: ['b', 'a', '3', '4', '7'],
...         3: ['1'],
...         4: ['1', '4', 'c'],
...         5: ['7'],
...         6: ['5', '6', '7'],
...         7: ['c'],
...         8: ['1', '6'],
...         9: ['a', 'c', '2', '3', '4', '6',],
... }
>>> for k, v in data.items():
...     index.index_doc(k, v)
...

After indexing, the statistics and values match the newly entered content.

>>> list(index.values())
['1', '2', '3', '4', '5', '6', '7', 'a', 'b', 'c']
>>> index.documentCount()
9
>>> index.wordCount()
10
>>> index.maxValue()
'c'
>>> index.minValue()
'1'
>>> list(index.ids())
[1, 2, 3, 4, 5, 6, 7, 8, 9]

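The index supports five types of query. The first is 'any_of'. It takes an iterable of values and returns an iterable of the document ids that contain any of those values. The results are weighted.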

>>> list(index.apply({'any_of': ('b', '1', '5')}))
[1, 2, 3, 4, 6, 8]
>>> list(index.apply({'any_of': ('b', '1', '5')}))
[1, 2, 3, 4, 6, 8]
>>> list(index.apply({'any_of': ('42',)}))
[]
>>> index.apply({'any_of': ('a', '3', '7')})              # doctest: +ELLIPSIS
BTrees...FBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)])
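
In the output above, the weight of each document equals the number of query values it contains. The sketch below reproduces those weights with a plain dictionary; any_of_weighted is a hypothetical helper, and the real index returns a BTrees bucket rather than a dict.

```python
def any_of_weighted(doc_values, query):
    """Map each matching doc id to the number of query values it contains."""
    wanted = set(query)
    weights = {}
    for doc_id, values in sorted(doc_values.items()):
        hits = len(wanted & set(values))
        if hits:
            weights[doc_id] = float(hits)
    return weights


data = {1: ['a', '1'],
        2: ['b', 'a', '3', '4', '7'],
        3: ['1'],
        4: ['1', '4', 'c'],
        5: ['7'],
        6: ['5', '6', '7'],
        7: ['c'],
        8: ['1', '6'],
        9: ['a', 'c', '2', '3', '4', '6']}
print(any_of_weighted(data, ('a', '3', '7')))
# {1: 1.0, 2: 3.0, 5: 1.0, 6: 1.0, 9: 2.0}
```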

Invalid (non-comparable on Python 3) values are ignored.

>>> list(index.apply({'any_of': (1,)}))
[]
>>> list(index.apply({'any_of': (1, '1')}))
[1, 3, 4, 8]

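Another query is 'any'. If the key is None, all indexed document ids that have any value are returned. If the key is an extent, the intersection of the extent and all document ids that have any value is returned.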

>>> list(index.apply({'any': None}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(15):
...     extent.add(i, i)
...
>>> list(index.apply({'any': extent}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(5):
...     limited_extent.add(i, i)
...
>>> list(index.apply({'any': limited_extent}))
[1, 2, 3, 4]

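The 'all_of' argument also takes an iterable of values, but returns an iterable of the document ids that contain all of those values. The results are not weighted.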

>>> list(index.apply({'all_of': ('a',)}))
[1, 2, 9]
>>> list(index.apply({'all_of': ('3', '4')}))
[2, 9]
>>> list(index.apply({'all_of': (3, '4')}))
[]
>>> list(index.apply({'all_of': ('3', 4)}))
[]

These tests illustrate two related reported errors that have been fixed.

>>> list(index.apply({'all_of': ('z', '3', '4')}))
[]
>>> list(index.apply({'all_of': ('3', '4', 'z')}))
[]
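
The fixed behavior can be pictured as an intersection that short-circuits to an empty result as soon as one query value is unknown. A minimal sketch, assuming a hypothetical all_of helper over a value -> doc ids mapping built from the same data:

```python
def all_of(value_to_ids, query):
    """Return the sorted doc ids that contain every value in the query."""
    result = None
    for value in query:
        ids = value_to_ids.get(value)
        if not ids:
            return []          # one unknown value rules out every document
        result = set(ids) if result is None else result & ids
    return sorted(result or ())


data = {1: ['a', '1'],
        2: ['b', 'a', '3', '4', '7'],
        3: ['1'],
        4: ['1', '4', 'c'],
        5: ['7'],
        6: ['5', '6', '7'],
        7: ['c'],
        8: ['1', '6'],
        9: ['a', 'c', '2', '3', '4', '6']}
value_to_ids = {}
for doc_id, values in data.items():
    for value in values:
        value_to_ids.setdefault(value, set()).add(doc_id)

print(all_of(value_to_ids, ('3', '4')))        # [2, 9]
print(all_of(value_to_ids, ('z', '3', '4')))   # []
```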

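The 'between' argument takes from one to four values. The first is the minimum, and defaults to None, indicating no minimum; the second is the maximum, and defaults to None, indicating no maximum; the third is a boolean for whether the minimum value should be excluded, and defaults to False; the last is a boolean for whether the maximum value should be excluded, and also defaults to False. The results are weighted.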

>>> list(index.apply({'between': ('1', '7')}))
[1, 2, 3, 4, 5, 6, 8, 9]
>>> list(index.apply({'between': ('b', None)}))
[2, 4, 7, 9]
>>> list(index.apply({'between': ('b',)}))
[2, 4, 7, 9]
>>> list(index.apply({'between': ('1', '7', True, True)}))
[2, 4, 6, 8, 9]
>>> index.apply({'between': ('2', '6')})               # doctest: +ELLIPSIS
BTrees...FBucket([(2, 2.0), (4, 1.0), (6, 2.0), (8, 1.0), (9, 4.0)])

Using invalid (non-comparable on Python 3) arguments produces no results.

>>> list(index.apply({'between': (1, 7)}))
[]

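The 'none' argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values.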

>>> list(index.apply({'none': extent}))
[0, 10, 11, 12, 13, 14]

Trying to use more than one of these at a time generates an error.

>>> index.apply({'all_of': ('5',), 'any_of': ('3',)})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...

Using none of them simply returns None.

>>> index.apply({}) # returns None

Invalid query names cause ValueErrors.

>>> index.apply({'foo': ()})
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...

When you unindex a document, the searches and statistics should be updated.

>>> index.unindex_doc(6)
>>> len(index.apply({'any_of': ('5',)}))
0
>>> index.documentCount()
8
>>> index.wordCount()
9
>>> list(index.values())
['1', '2', '3', '4', '6', '7', 'a', 'b', 'c']
>>> list(index.ids())
[1, 2, 3, 4, 5, 7, 8, 9]

Reindexing a document that has new additional values is also reflected in subsequent searches and statistics checks.

>>> data[8].extend(['5', 'c'])
>>> index.index_doc(8, data[8])
>>> index.documentCount()
8
>>> index.wordCount()
10
>>> list(index.apply({'any_of': ('5',)}))
[8]
>>> list(index.apply({'any_of': ('c',)}))
[4, 7, 8, 9]

The same is true for reindexing a document with both additions and removals.

>>> 2 in set(index.apply({'any_of': ('7',)}))
True
>>> 2 in set(index.apply({'any_of': ('2',)}))
False
>>> data[2].pop()
'7'
>>> data[2].append('2')
>>> index.index_doc(2, data[2])
>>> 2 in set(index.apply({'any_of': ('7',)}))
False
>>> 2 in set(index.apply({'any_of': ('2',)}))
True

Reindexing a document that no longer has any values causes it to be removed from the statistics.

>>> del data[2][:]
>>> index.index_doc(2, data[2])
>>> index.documentCount()
7
>>> index.wordCount()
9
>>> list(index.ids())
[1, 3, 4, 5, 7, 8, 9]

This affects both ways of determining which ids are and are not in the index (that is, which do and do not have values).

>>> list(index.apply({'any': None}))
[1, 3, 4, 5, 7, 8, 9]
>>> list(index.apply({'none': extent}))
[0, 2, 6, 10, 11, 12, 13, 14]

The values method can be used to examine the indexed values for a given document id.

>>> set(index.values(doc_id=8)) == set(['1', '5', '6', 'c'])
True

The containsValue method provides a way to test whether a value is in the index.

>>> index.containsValue('5')
True
>>> index.containsValue(5)
False
>>> index.containsValue('20')
False

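Normalized Index

The index module provides a normalizing wrapper, a DateTime normalizer, and a set index and a value index normalized with the DateTime normalizer.

The normalizing wrapper implements a full complement of index interfaces (zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, and zc.catalog.interfaces.IIndexValues) and delegates all of the behavior to the wrapped index, normalizing values with the normalizer before the index sees them.

The normalizing wrapper currently only supports the queries offered by zc.catalog.interfaces.ISetIndex and zc.catalog.interfaces.IValueIndex.

The normalizer interface requires the following methods, as defined in the interface: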

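def value(value):
    """Normalize or check constraints for an input value; raise an error or return the value to be indexed."""

def any(value, index):
    """Normalize a query value for an "any_of" search; return a sequence of values."""

def all(value, index):
    """Normalize a query value for an "all_of" search; return the value for the query."""

def minimum(value, index):
    """Normalize a query value for the minimum of a range; return the value for the query."""

def maximum(value, index):
    """Normalize a query value for the maximum of a range; return the value for the query."""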
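
To make the shape of this interface concrete, here is a minimal sketch of a normalizer that folds strings to lower case. LowercaseNormalizer is a hypothetical example and is not shipped with zc.catalog; the optional exclude argument on minimum and maximum mirrors the DateTime normalizer shown later, and the index argument is simply ignored here.

```python
class LowercaseNormalizer:
    """Hypothetical normalizer: indexes and queries strings case-insensitively."""

    def value(self, value):
        # Validate the input and return what should actually be indexed.
        if not isinstance(value, str):
            raise ValueError('This sketch only indexes strings.')
        return value.lower()

    def any(self, value, index):
        # 'any_of' normalizers return a sequence of candidate values.
        return (value.lower(),)

    def all(self, value, index):
        return value.lower()

    def minimum(self, value, index, exclude=False):
        return value.lower()

    def maximum(self, value, index, exclude=False):
        return value.lower()


n = LowercaseNormalizer()
print(n.value('Zope'))       # zope
print(n.any('ZODB', None))   # ('zodb',)
```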

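The DateTime normalizer performs the following normalizations and validations. Whenever a timezone is needed, it tries to get a request from the current interaction and adapt it to zope.interface.common.idatetime.ITZInfo; failing that (no request or no adapter), it uses the system local timezone.

Input values must be datetimes with a timezone. They are normalized to the resolution specified when the normalizer is created: a resolution of 0 normalizes values to days; 1 to hours; 2 to minutes; 3 to seconds; and 4 to microseconds.
'any' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. Dates are converted to every indexed value from the start to the end of the given date in the found timezone, as described above. Timezone-naive datetimes get the found timezone.
'all' values may be timezone-aware datetimes or timezone-naive datetimes. Timezone-naive datetimes get the found timezone.
'minimum' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. Dates are converted to the start of the given date in the found timezone, as described above. Timezone-naive datetimes get the found timezone.
'maximum' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. Dates are converted to the end of the given date in the found timezone, as described above. Timezone-naive datetimes get the found timezone.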
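
The resolution rule for input values can be sketched with the standard library alone. The normalize function below is a hypothetical illustration, not the library's code; in particular, representing day resolution as a datetime truncated to midnight is an assumption of this sketch.

```python
import datetime

# Fields ordered from coarse to fine; resolution N keeps the first N of them.
_FIELDS = ('hour', 'minute', 'second', 'microsecond')


def normalize(value, resolution=2):
    """Truncate a timezone-aware datetime to the given resolution (0-4)."""
    if not isinstance(value, datetime.datetime) or value.tzinfo is None:
        raise ValueError('only timezone-aware datetimes are accepted')
    # Zero every field that is finer than the requested resolution.
    zeroed = {name: 0 for name in _FIELDS[resolution:]}
    return value.replace(**zeroed)


utc = datetime.timezone.utc
aware = datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=utc)
print(normalize(aware))      # 2005-07-15 11:21:00+00:00 (minutes, the default)
print(normalize(aware, 1))   # 2005-07-15 11:00:00+00:00 (hours)
```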

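Let's look at the DateTime normalizer first, and then at its integration with the normalizing wrapper and the value and set indexes.

The indexed values are parsed with 'value'.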

>>> from zc.catalog.index import DateTimeNormalizer
>>> n = DateTimeNormalizer() # defaults to minutes
>>> import datetime
>>> import pytz
>>> naive_datetime = datetime.datetime(2005, 7, 15, 11, 21, 32, 104)
>>> date = naive_datetime.date()
>>> aware_datetime = naive_datetime.replace(
...     tzinfo=pytz.timezone('US/Eastern'))
>>> n.value(naive_datetime)
Traceback (most recent call last):
...
ValueError: This index only indexes timezone-aware datetimes.
>>> n.value(date)
Traceback (most recent call last):
...
ValueError: This index only indexes timezone-aware datetimes.
>>> n.value(aware_datetime) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, tzinfo=<DstTzInfo 'US/Eastern'...>)

If we specify a different resolution, the results are different.

>>> another = DateTimeNormalizer(1) # hours
>>> another.value(aware_datetime) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 0, tzinfo=<DstTzInfo 'US/Eastern'...>)

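Note that changing the resolution of an indexed value may produce surprising results, because queries do not change their resolution. Therefore, if you index something with a datetime of a finer resolution than the normalizer's, searching for that datetime will not find the doc_id.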

Values in an 'any_of' query are parsed with 'any'. 'any' should return a sequence of values. It requires an index, which we mock up here.

>>> class DummyIndex(object):
...     def values(self, start, stop, exclude_start, exclude_stop):
...         assert not exclude_start and exclude_stop
...         six_hours = datetime.timedelta(hours=6)
...         res = []
...         dt = start
...         while dt < stop:
...             res.append(dt)
...             dt += six_hours
...         return res
...
>>> index = DummyIndex()
>>> tuple(n.any(naive_datetime, index)) # doctest: +ELLIPSIS
(datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>),)
>>> tuple(n.any(aware_datetime, index)) # doctest: +ELLIPSIS
(datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>),)
>>> tuple(n.any(date, index)) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
(datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>),
 datetime.datetime(2005, 7, 15, 6, 0, tzinfo=<...Local...>),
 datetime.datetime(2005, 7, 15, 12, 0, tzinfo=<...Local...>),
 datetime.datetime(2005, 7, 15, 18, 0, tzinfo=<...Local...>))

Values in an 'all_of' query are parsed with 'all'.

>>> n.all(naive_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.all(aware_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.all(date, index) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: ...

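Minimum values in a 'between' query, as well as those in other methods, are parsed with 'minimum'. They also take an optional exclude boolean, which indicates whether the minimum is to be excluded. For datetimes, it only makes a difference if you pass in a date.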

>>> n.minimum(naive_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.minimum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)

>>> n.minimum(aware_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.minimum(aware_datetime, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)

>>> n.minimum(date, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>)
>>> n.minimum(date, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>)

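Maximum values in a 'between' query, as well as those in other methods, are parsed with 'maximum'. They also take an optional exclude boolean, which indicates whether the maximum is to be excluded. For datetimes, it only makes a difference if you pass in a date.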

>>> n.maximum(naive_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
>>> n.maximum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)

>>> n.maximum(aware_datetime, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
>>> n.maximum(aware_datetime, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)

>>> n.maximum(date, index) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>)
>>> n.maximum(date, index, True) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>)
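
The way a date expands to a range boundary, and how exclude flips that boundary, can be sketched with the standard library. The function names below are hypothetical, the timezone is passed in explicitly instead of being looked up from a request, and the sketch requires Python 3.6 or later for the tzinfo argument of datetime.combine.

```python
import datetime


def day_bounds(day, tz):
    """Return the first and last representable instants of a date."""
    start = datetime.datetime.combine(day, datetime.time.min, tzinfo=tz)
    end = datetime.datetime.combine(day, datetime.time.max, tzinfo=tz)
    return start, end


def minimum_for_date(day, tz, exclude=False):
    # Excluding the day as a minimum means starting from its last instant.
    start, end = day_bounds(day, tz)
    return end if exclude else start


def maximum_for_date(day, tz, exclude=False):
    # Excluding the day as a maximum means stopping at its first instant.
    start, end = day_bounds(day, tz)
    return start if exclude else end


utc = datetime.timezone.utc
day = datetime.date(2005, 7, 15)
print(minimum_for_date(day, utc))        # 2005-07-15 00:00:00+00:00
print(minimum_for_date(day, utc, True))  # 2005-07-15 23:59:59.999999+00:00
print(maximum_for_date(day, utc))        # 2005-07-15 23:59:59.999999+00:00
```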

Now let's examine these normalizers in the context of a real index.

>>> from zc.catalog.index import DateTimeValueIndex, DateTimeSetIndex
>>> setindex = DateTimeSetIndex() # minutes resolution
>>> data = [] # generate some data
>>> def date_gen(
...     start=aware_datetime,
...     count=12,
...     period=datetime.timedelta(hours=10)):
...     dt = start
...     ix = 0
...     while ix < count:
...         yield dt
...         dt += period
...         ix += 1
...
>>> gen = date_gen()
>>> count = 0
>>> while True:
...     try:
...         next_ = [next(gen) for i in range(6)]
...     except StopIteration:
...         break
...     data.append((count, next_[0:1]))
...     count += 1
...     data.append((count, next_[1:3]))
...     count += 1
...     data.append((count, next_[3:6]))
...     count += 1
...
>>> print(data) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[(0,
  [datetime.datetime(2005, 7, 15, 11, 21, 32, 104, ...<...Eastern...>)]),
 (1,
  [datetime.datetime(2005, 7, 15, 21, 21, 32, 104, ...<...Eastern...>),
   datetime.datetime(2005, 7, 16, 7, 21, 32, 104, ...<...Eastern...>)]),
 (2,
  [datetime.datetime(2005, 7, 16, 17, 21, 32, 104, ...<...Eastern...>),
   datetime.datetime(2005, 7, 17, 3, 21, 32, 104, ...<...Eastern...>),
   datetime.datetime(2005, 7, 17, 13, 21, 32, 104, ...<...Eastern...>)]),
 (3,
  [datetime.datetime(2005, 7, 17, 23, 21, 32, 104, ...<...Eastern...>)]),
 (4,
  [datetime.datetime(2005, 7, 18, 9, 21, 32, 104, ...<...Eastern...>),
   datetime.datetime(2005, 7, 18, 19, 21, 32, 104, ...<...Eastern...>)]),
 (5,
  [datetime.datetime(2005, 7, 19, 5, 21, 32, 104, ...<...Eastern...>),
   datetime.datetime(2005, 7, 19, 15, 21, 32, 104, ...<...Eastern...>),
   datetime.datetime(2005, 7, 20, 1, 21, 32, 104, ...<...Eastern...>)])]
>>> data_dict = dict(data)
>>> for doc_id, value in data:
...     setindex.index_doc(doc_id, value)
...
>>> list(setindex.ids())
[0, 1, 2, 3, 4, 5]
>>> set(setindex.values()) == set(
...     setindex.normalizer.value(v) for v in date_gen())
True

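For the searches, we will actually use a request and an interaction, with an adapter that returns Eastern time as the timezone. This makes the examples less dependent on the machine on which they run.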

>>> import zope.security.management
>>> import zope.publisher.browser
>>> import zope.interface.common.idatetime
>>> import zope.publisher.interfaces
>>> request = zope.publisher.browser.TestRequest()
>>> zope.security.management.newInteraction(request)
>>> from zope import interface, component
>>> @interface.implementer(zope.interface.common.idatetime.ITZInfo)
... @component.adapter(zope.publisher.interfaces.IRequest)
... def tzinfo(req):
...     return pytz.timezone('US/Eastern')
...
>>> component.provideAdapter(tzinfo)
>>> n.all(naive_datetime, index).tzinfo is pytz.timezone('US/Eastern')
True

>>> set(setindex.apply({'any_of': (datetime.date(2005, 7, 17),
...                                datetime.date(2005, 7, 20),
...                                datetime.date(2005, 12, 31))})) == set(
...     (2, 3, 5))
True

Note that this search is using the normalized values.

>>> set(setindex.apply({'all_of': (
...     datetime.datetime(
...         2005, 7, 16, 7, 21, tzinfo=pytz.timezone('US/Eastern')),
...     datetime.datetime(
...         2005, 7, 15, 21, 21, tzinfo=pytz.timezone('US/Eastern')),)})
...     ) == set((1,))
True
>>> list(setindex.apply({'any': None}))
[0, 1, 2, 3, 4, 5]
>>> set(setindex.apply({'between': (
...     datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1))})
...     ) == set((0, 1, 2, 3, 4, 5))
True
>>> set(setindex.apply({'between': (
...     datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1),
...     True, True)})
...     ) == set((0, 1, 2, 3, 4, 5))
True

'between' searches should deal with dates well.

>>> set(setindex.apply({'between': (
...     datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))})
...     ) == set((1, 2, 3))
True
>>> len(setindex.apply({'between': (
...     datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))})
...     ) == len(setindex.apply({'between': (
...     datetime.date(2005, 7, 15), datetime.date(2005, 7, 18),
...     True, True)})
...     )
True

Removing documents works as usual.

>>> setindex.unindex_doc(1)
>>> list(setindex.ids())
[0, 2, 3, 4, 5]

The values, minValue, and maxValue methods accept timezone-naive datetimes and dates.

>>> setindex.minValue() # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 15, 11, 21, ...<...Eastern...>)
>>> setindex.minValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>)

>>> setindex.maxValue() # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 20, 1, 21, ...<...Eastern...>)
>>> setindex.maxValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)

>>> list(setindex.values(
... datetime.date(2005, 7, 17), datetime.date(2005, 7, 17)))
... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>),
 datetime.datetime(2005, 7, 17, 13, 21, ...<...Eastern...>),
 datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)]

>>> zope.security.management.endInteraction() # TODO put in tests tearDown

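Sorting

The normalization wrapper provides the zope.index.interfaces.IIndexSort interface if its upstream index provides it. For example, DateTimeValueIndex provides IIndexSort because ValueIndex provides sorting. It also delegates the sort method to the value index.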

>>> from zc.catalog.index import DateTimeValueIndex
>>> from zope.index.interfaces import IIndexSort

>>> ix = DateTimeValueIndex()
>>> IIndexSort.providedBy(ix.index)
True
>>> IIndexSort.providedBy(ix)
True
>>> ix.sort.__self__ is ix.index
True

But this does not work for indexes that do not support sorting, such as DateTimeSetIndex.

>>> ix = DateTimeSetIndex()
>>> IIndexSort.providedBy(ix.index)
False
>>> IIndexSort.providedBy(ix)
False
>>> ix.sort
Traceback (most recent call last):
...
AttributeError: 'SetIndex' object has no attribute 'sort'

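Extent Catalog

An extent catalog is very similar to a normal catalog, except that it only indexes items that can be added to its extent. The extent is both a filter and a set that may be merged with other result sets. The filtering is an additional feature that we discuss below; we begin with a simple "do nothing" extent that only supports the second use.

We create the state that the text needs here.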

>>> import zope.keyreference.persistent
>>> import zope.component
>>> import zope.intid
>>> import zope.component
>>> import zope.interface.interfaces
>>> import zope.component.persistentregistry
>>> from ZODB.MappingStorage import DB
>>> import transaction

>>> zope.component.provideAdapter(
...     zope.keyreference.persistent.KeyReferenceToPersistent,
...     adapts=(zope.interface.Interface,))
>>> zope.component.provideAdapter(
...     zope.keyreference.persistent.connectionOfPersistent,
...     adapts=(zope.interface.Interface,))

>>> site_manager = None
>>> def getSiteManager(context=None):
...     if context is None:
...         if site_manager is None:
...             return zope.component.getGlobalSiteManager()
...         else:
...             return site_manager
...     else:
...         try:
...             return zope.interface.interfaces.IComponentLookup(context)
...         except TypeError as error:
...             raise zope.component.ComponentLookupError(*error.args)
...
>>> def setSiteManager(sm):
...     global site_manager
...     site_manager = sm
...     if sm is None:
...         zope.component.getSiteManager.reset()
...     else:
...         zope.component.getSiteManager.sethook(getSiteManager)
...
>>> def makeRoot():
...     db = DB()
...     conn = db.open()
...     root = conn.root()
...     site_manager = root['components'] = (
...         zope.component.persistentregistry.PersistentComponents())
...     site_manager.__bases__ = (zope.component.getGlobalSiteManager(),)
...     site_manager.registerUtility(
...         zope.intid.IntIds(family=btrees_family),
...         provided=zope.intid.interfaces.IIntIds)
...     setSiteManager(site_manager)
...     transaction.commit()
...     return root
...

>>> @zope.component.adapter(zope.interface.Interface)
... @zope.interface.implementer(zope.interface.interfaces.IComponentLookup)
... def getComponentLookup(obj):
...     return obj._p_jar.root()['components']
...
>>> zope.component.provideAdapter(getComponentLookup)

To show the extent catalog at work, we need an intid utility, an index, and some items to index. We do this with a real ZODB and a real intid utility.

>>> import zc.catalog
>>> import zc.catalog.interfaces
>>> from zc.catalog import interfaces, extentcatalog
>>> from zope import interface, component
>>> from zope.interface import verify
>>> import persistent
>>> import BTrees.IFBTree

>>> root = makeRoot()
>>> intid = zope.component.getUtility(
...     zope.intid.interfaces.IIntIds, context=root)
>>> TreeSet = btrees_family.IF.TreeSet

>>> from zope.container.interfaces import IContained
>>> @interface.implementer(IContained)
... class DummyIndex(persistent.Persistent):
...     __parent__ = __name__ = None
...     def __init__(self):
...         self.uids = TreeSet()
...     def unindex_doc(self, uid):
...         if uid in self.uids:
...             self.uids.remove(uid)
...     def index_doc(self, uid, obj):
...         self.uids.insert(uid)
...     def clear(self):
...         self.uids.clear()
...     def apply(self, query):
...         return [uid for uid in self.uids if uid <= query]
...
>>> class DummyContent(persistent.Persistent):
...     def __init__(self, name, parent):
...         self.id = name
...         self.__parent__ = parent
...

>>> extent = extentcatalog.Extent(family=btrees_family)
>>> verify.verifyObject(interfaces.IExtent, extent)
True
>>> root['catalog'] = catalog = extentcatalog.Catalog(extent)
>>> verify.verifyObject(interfaces.IExtentCatalog, catalog)
True
>>> index = DummyIndex()
>>> catalog['index'] = index
>>> transaction.commit()

Now we have a catalog set up with an index and an extent. We can add some data to the extent.

>>> matches = []
>>> for i in range(100):
...     c = DummyContent(i, root)
...     root[i] = c
...     doc_id = intid.register(c)
...     catalog.index_doc(doc_id, c)
...     matches.append(doc_id)
>>> matches.sort()
>>> sorted(extent) == sorted(index.uids) == matches
True

We can get the size of the extent.

>>> len(extent)
100

Unindexing an object that is in the catalog should simply remove it from the catalog and from the indexes, as usual.

>>> matches[0] in catalog.extent
True
>>> matches[0] in catalog['index'].uids
True
>>> catalog.unindex_doc(matches[0])
>>> matches[0] in catalog.extent
False
>>> matches[0] in catalog['index'].uids
False
>>> doc_id = matches.pop(0)
>>> sorted(extent) == sorted(index.uids) == matches
True

Clearing the catalog clears both the extent and the contained indexes.

>>> catalog.clear()
>>> list(catalog.extent) == list(catalog['index'].uids) == []
True

Updating all indexes and updating an individual index both also update the extent.

>>> catalog.updateIndexes()
>>> matches.insert(0, doc_id)
>>> sorted(extent) == sorted(index.uids) == matches
True

>>> index2 = DummyIndex()
>>> catalog['index2'] = index2
>>> index2.__parent__ == catalog
True
>>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched
>>> catalog.updateIndex(index2)
>>> sorted(extent) == sorted(index2.uids) == matches
True
>>> matches[0] in index.uids
False
>>> matches[0] in index2.uids
True
>>> res = index.uids.insert(matches[0])

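But why have an extent in the first place? It allows indexes to operate against a reliable collection of the full indexed data; therefore, it allows the indexes in zc.catalog to perform NOT operations.

The extent itself provides a number of merging features that allow its values to be merged with other BTrees data structures. These include intersection, union, difference, and reverse difference. Given an extent named 'extent' and another IFBTree data structure named 'data', intersections can be spelled "extent & data" or "data & extent"; unions can be spelled "extent | data" or "data | extent"; differences can be spelled "extent - data"; and reverse differences can be spelled "data - extent". Unions and intersections are weighted.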

>>> extent = extentcatalog.Extent(family=btrees_family)
>>> for i in range(1, 100, 2):
...     extent.add(i, None)
...
>>> alt_set = TreeSet()
>>> _ = alt_set.update(range(0, 166, 33)) # return value is unimportant here
>>> sorted(alt_set)
[0, 33, 66, 99, 132, 165]
>>> sorted(extent & alt_set)
[33, 99]
>>> sorted(alt_set & extent)
[33, 99]
>>> sorted(extent.intersection(alt_set))
[33, 99]
>>> original = set(extent)
>>> union_matches = original.copy()
>>> union_matches.update(alt_set)
>>> union_matches = sorted(union_matches)
>>> sorted(alt_set | extent) == union_matches
True
>>> sorted(extent | alt_set) == union_matches
True
>>> sorted(extent.union(alt_set)) == union_matches
True
>>> sorted(alt_set - extent)
[0, 66, 132, 165]
>>> sorted(extent.rdifference(alt_set))
[0, 66, 132, 165]
>>> original.remove(33)
>>> original.remove(99)
>>> set(extent - alt_set) == original
True
>>> set(extent.difference(alt_set)) == original
True
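
For readers without a ZODB at hand, the same four merges can be tried with built-in sets, which use the same operator spellings. This is only an analogy: plain sets are an assumption of this sketch and carry no weights, unlike the BTrees structures used above.

```python
extent = set(range(1, 100, 2))   # odd uids 1..99, as in the example above
data = set(range(0, 166, 33))    # 0, 33, 66, 99, 132, 165

print(sorted(extent & data))     # intersection: [33, 99]
print(sorted(data - extent))     # reverse difference: [0, 66, 132, 165]
print(len(extent | data))        # union: 54 distinct ids
print(len(extent - data))        # difference: 48 ids remain
```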

We can pass our own instantiated UID utility to extentcatalog.Catalog.

>>> extent = extentcatalog.Extent(family=btrees_family)
>>> uidutil = zope.intid.IntIds()
>>> cat = extentcatalog.Catalog(extent, uidutil)
>>> cat["index"] = DummyIndex()
>>> cat.UIDSource is uidutil
True

>>> cat._getUIDSource() is uidutil
True

The ResultSet instance returned by the catalog's searchResults method uses our UID utility.

>>> obj = DummyContent(43, root)
>>> uid = uidutil.register(obj)
>>> cat.index_doc(uid, obj)
>>> res = cat.searchResults(index=uid)
>>> res.uidutil is uidutil
True

>>> list(res) == [obj]
True

searchResults may also return None.

>>> cat.searchResults() is None
True

Calling updateIndex and updateIndexes when the catalog has its uid source set works as well.

>>> cat.clear()
>>> uid in cat.extent
False

All objects in the uid utility are indexed.

>>> cat.updateIndexes()
>>> uid in cat.extent
True

>>> len(cat.extent)
1

>>> obj2 = DummyContent(44, root)
>>> uid2 = uidutil.register(obj2)
>>> cat.updateIndexes()
>>> len(cat.extent)
2

>>> uid2 in cat.extent
True

>>> uidutil.unregister(obj2)

>>> cat.clear()
>>> uid in cat.extent
False
>>> cat.updateIndex(cat["index"])
>>> uid in cat.extent
True

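With a self-populating extent, calling updateIndex or updateIndexes only reindexes the objects whose ids are in the extent; if present, the catalog uses its uid source to look up the objects by id.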

>>> extent = extentcatalog.NonPopulatingExtent(family=btrees_family)
>>> cat = extentcatalog.Catalog(extent, uidutil)
>>> cat["index"] = DummyIndex()

>>> extent.add(uid, obj)
>>> uid in cat["index"].uids
False

>>> cat.updateIndexes()
>>> uid in cat["index"].uids
True

>>> cat.clear()
>>> uid in cat["index"].uids
False

>>> uid in cat.extent
False

>>> cat.extent.add(uid, obj)
>>> cat.updateIndex(cat["index"])
>>> uid in cat["index"].uids
True

Unregister the objects of the previous tests from the intid utility:

>>> intid = zope.component.getUtility(
...     zope.intid.interfaces.IIntIds, context=root)
>>> for doc_id in matches:
...     intid.unregister(intid.queryObject(doc_id))

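Catalog with a filter extent

As discussed at the beginning of this document, extents can not only help with index operations but also act as a filter, so that a given catalog can answer questions about a subset of the objects contained in the intids.

The filter extent only stores objects that match a given filter.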

>>> def filter(extent, uid, ob):
...     assert interfaces.IFilterExtent.providedBy(extent)
...     # This is an extent of objects with odd-numbered uids without a
...     # True ignore attribute
...     return uid % 2 and not getattr(ob, 'ignore', False)
...
>>> extent = extentcatalog.FilterExtent(filter, family=btrees_family)
>>> verify.verifyObject(interfaces.IFilterExtent, extent)
True
>>> root['catalog1'] = catalog = extentcatalog.Catalog(extent)
>>> verify.verifyObject(interfaces.IExtentCatalog, catalog)
True
>>> index = DummyIndex()
>>> catalog['index'] = index
>>> transaction.commit()

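We have now set up a catalog with an index and an extent. If we create some content and ask the catalog to index it, only the content that matches the filter ends up in the extent and in the index.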

>>> matches = []
>>> fails = []
>>> i = 0
>>> while True:
...     c = DummyContent(i, root)
...     root[i] = c
...     doc_id = intid.register(c)
...     catalog.index_doc(doc_id, c)
...     if filter(extent, doc_id, c):
...         matches.append(doc_id)
...     else:
...         fails.append(doc_id)
...     i += 1
...     if i > 99 and len(matches) > 4:
...         break
...
>>> matches.sort()
>>> sorted(extent) == sorted(index.uids) == matches
True

If a content object is indexed that used to match the filter but no longer does, it should be removed from the extent and the indexes.

>>> matches[0] in catalog.extent
True
>>> obj = intid.getObject(matches[0])
>>> obj.ignore = True
>>> filter(extent, matches[0], obj)
False
>>> catalog.index_doc(matches[0], obj)
>>> doc_id = matches.pop(0)
>>> doc_id in catalog.extent
False
>>> sorted(extent) == sorted(index.uids) == matches
True
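
The add-or-remove rule described above can be sketched without Zope at all. The following is a minimal illustration, not zc.catalog's implementation: TinyFilterCatalog, Doc and odd_and_not_ignored are hypothetical names, and a plain set and dict stand in for the BTree-based extent and index.

```python
# Illustrative sketch only; not the library's code.

class TinyFilterCatalog:
    def __init__(self, accept):
        self.accept = accept      # callable(uid, ob) -> bool
        self.extent = set()       # uids currently accepted by the filter
        self.index = {}           # uid -> indexed object

    def index_doc(self, uid, ob):
        if self.accept(uid, ob):
            self.extent.add(uid)
            self.index[uid] = ob
        else:
            # An object that stopped matching is dropped everywhere.
            self.unindex_doc(uid)

    def unindex_doc(self, uid):
        # Unindexing an unknown uid is a no-op.
        self.extent.discard(uid)
        self.index.pop(uid, None)


class Doc:
    ignore = False


def odd_and_not_ignored(uid, ob):
    return bool(uid % 2) and not getattr(ob, "ignore", False)


catalog = TinyFilterCatalog(odd_and_not_ignored)
docs = {uid: Doc() for uid in range(6)}
for uid, ob in docs.items():
    catalog.index_doc(uid, ob)
extent_before = sorted(catalog.extent)   # [1, 3, 5]

docs[1].ignore = True
catalog.index_doc(1, docs[1])            # re-indexing removes uid 1
extent_after = sorted(catalog.extent)    # [3, 5]
```

Routing the rejection path through unindex_doc keeps the extent and the index in step, which is the invariant the doctest checks with sorted(extent) == sorted(index.uids).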

Unindexing an object that is not in the catalog should be a no-op.

>>> fails[0] in catalog.extent
False
>>> catalog.unindex_doc(fails[0])
>>> fails[0] in catalog.extent
False
>>> sorted(extent) == sorted(index.uids) == matches
True

Updating all indexes and updating an individual index both also update the extent.

>>> index2 = DummyIndex()
>>> catalog['index2'] = index2
>>> index2.__parent__ == catalog
True
>>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched
>>> catalog.updateIndex(index2)
>>> sorted(extent) == sorted(index2.uids)
True
>>> matches[0] in index.uids
False
>>> matches[0] in index2.uids
True
>>> res = index.uids.insert(matches[0])

If you update a single index and an object is no longer a member of the extent, it is removed from all indexes.

>>> matches[0] in catalog.extent
True
>>> matches[0] in index.uids
True
>>> matches[0] in index2.uids
True
>>> obj = intid.getObject(matches[0])
>>> obj.ignore = True
>>> catalog.updateIndex(index2)
>>> matches[0] in catalog.extent
False
>>> matches[0] in index.uids
False
>>> matches[0] in index2.uids
False
>>> doc_id = matches.pop(0)
>>> (matches == sorted(catalog.extent) == sorted(index.uids)
...  == sorted(index2.uids))
True

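Self-populating extents

An extent may know how to populate itself; this is especially useful if the catalog can be initialized with fewer items than those available in the IIntIds utility that are also within the nearest Zope 3 site (the policy coded in the basic Zope 3 catalog).

Such an extent must implement the ISelfPopulatingExtent interface, which requires two attributes. Let's use the FilterExtent class as a base for implementing such an extent, with a method that selects content item 0 (created and registered above):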

>>> class PopulatingExtent(
...     extentcatalog.FilterExtent,
...     extentcatalog.NonPopulatingExtent):
...
...     def populate(self):
...         if self.populated:
...             return
...         self.add(intid.getId(root[0]), root[0])
...         super(PopulatingExtent, self).populate()

Creating a catalog based on this extent ignores the objects already in the database:

>>> def accept_any(extent, uid, ob):
...     return True

>>> extent = PopulatingExtent(accept_any, family=btrees_family)
>>> catalog = extentcatalog.Catalog(extent)
>>> index = DummyIndex()
>>> catalog['index'] = index
>>> root['catalog2'] = catalog
>>> transaction.commit()

At this point, our extent remains unpopulated:

>>> extent.populated
False

Iterating over the extent does not cause it to be automatically populated:

>>> list(extent)
[]

Causing our new index to be filled calls the populate() method, which sets the populated flag as a side effect:

>>> catalog.updateIndex(index)
>>> extent.populated
True

>>> list(extent) == [intid.getId(root[0])]
True

The index has been updated with the documents identified by the extent:

>>> list(index.uids) == [intid.getId(root[0])]
True

Updating the same index repeatedly continues to use the extent as the source of documents to include:

>>> catalog.updateIndex(index)

>>> list(extent) == [intid.getId(root[0])]
True
>>> list(index.uids) == [intid.getId(root[0])]
True

The updateIndexes() method behaves similarly. If we add an additional index to the catalog, we see that it indexes only those objects from the extent:

>>> index2 = DummyIndex()
>>> catalog['index2'] = index2

>>> catalog.updateIndexes()

>>> list(extent) == [intid.getId(root[0])]
True
>>> list(index.uids) == [intid.getId(root[0])]
True
>>> list(index2.uids) == [intid.getId(root[0])]
True

When we have a fresh catalog and extent (not yet populated), we see that updateIndexes() causes the extent to be populated:

>>> extent = PopulatingExtent(accept_any, family=btrees_family)
>>> root['catalog3'] = catalog = extentcatalog.Catalog(extent)
>>> index1 = DummyIndex()
>>> index2 = DummyIndex()
>>> catalog['index1'] = index1
>>> catalog['index2'] = index2
>>> transaction.commit()

>>> extent.populated
False

>>> catalog.updateIndexes()

>>> extent.populated
True

>>> list(extent) == [intid.getId(root[0])]
True
>>> list(index1.uids) == [intid.getId(root[0])]
True
>>> list(index2.uids) == [intid.getId(root[0])]
True
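
The populate-on-first-update behaviour shown above can be summarized in a short pure-Python sketch. It makes no claim about how zc.catalog implements it: LazyExtent and LazyCatalog are hypothetical names, and a dict of objects plays the part of the intid utility.

```python
# Illustrative sketch only; not the library's code.

class LazyExtent:
    """Hypothetical stand-in for a self-populating extent."""

    def __init__(self, seed):
        self.seed = seed          # callable returning the initial uids
        self.uids = set()
        self.populated = False

    def populate(self):
        if self.populated:
            return
        self.uids.update(self.seed())
        self.populated = True

    def __iter__(self):
        # Iterating never triggers population.
        return iter(sorted(self.uids))


class LazyCatalog:
    def __init__(self, extent, objects):
        self.extent = extent
        self.objects = objects    # uid -> object
        self.indexes = {}         # name -> {uid: object}

    def update_indexes(self):
        # The extent, not the full object store, decides what gets indexed.
        self.extent.populate()
        for index in self.indexes.values():
            for uid in self.extent:
                index[uid] = self.objects[uid]


objects = {0: "zero", 1: "one", 2: "two"}
extent = LazyExtent(lambda: [0])
catalog = LazyCatalog(extent, objects)
catalog.indexes["index1"] = {}
catalog.indexes["index2"] = {}

before = (extent.populated, list(extent))
catalog.update_indexes()
after = (extent.populated, list(extent))
```

Only uid 0 reaches the indexes even though three objects exist, mirroring how the catalog above ignores objects that the extent does not select.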

We'll make sure everything can be safely committed.

>>> transaction.commit()
>>> setSiteManager(None)

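Stemmer

The stemmer uses Andreas Jung's stemmer code, which is a Python wrapper of M. F. Porter's Snowball project (http://snowball.tartarus.org/index.php). It is designed to be used as part of a pipeline in a zope/index/text/ lexicon, after a splitter. This makes it possible to get the relevance ranking of the zope/index/text code together with the stemming functionality of TextIndexNG 3.x.

It requires that the TextIndexNG extensions, specifically txngstemmer, have been compiled and installed in your Python installation. Including the textindexng package itself is not necessary.

As of this writing (January 3, 2007), the necessary extensions can be installed with the following steps: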

svn co https://svn.sourceforge.net/svnroot/textindexng/extension_modules/trunk ext_mod
cd ext_mod
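(using the python you use for Zope) python setup.py install

Another approach is to simply install TextIndexNG (see http://opensource.zopyx.com/software/textindexng3).

The stemmer must be instantiated with the language for which stemming is desired. It defaults to 'english'. As of this writing, the following languages are supported, using these strings that the stemmer expects: 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'italian', 'norwegian', 'portuguese', 'russian', 'spanish', and 'swedish'.

For instance, let's build an index with an English stemmer.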

>>> from zope.index.text import textindex, lexicon
>>> import zc.catalog.stemmer
>>> lex = lexicon.Lexicon(
...     lexicon.Splitter(), lexicon.CaseNormalizer(),
...     lexicon.StopWordRemover(), zc.catalog.stemmer.Stemmer('english'))
>>> ix = textindex.TextIndex(lex)
>>> data = [
...     (0, 'consigned consistency consoles the constables'),
...     (1, 'knaves kneeled and knocked knees, knowing no knights')]
>>> for doc_id, text in data:
...     ix.index_doc(doc_id, text)
...
>>> list(ix.apply('consoling a constable'))
[0]
>>> list(ix.apply('knightly kneel'))
[1]

Note that query terms with globbing characters are not stemmed.

>>> list(ix.apply('constables*'))
[]
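
Why the glob query comes back empty can be seen in a toy model of the pipeline. The sketch below is an illustration under stated assumptions, not the Snowball algorithm or zope.index code: toy_stem is a deliberately crude one-rule stemmer, and run_pipeline and lookup are hypothetical helpers.

```python
# Illustrative sketch only; real stemming rules are far richer.
import fnmatch


def toy_stem(word):
    # Toy rule: strip one trailing "s" from words longer than three letters.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def run_pipeline(text):
    # Split, normalize case, then stem, in that order.
    return [toy_stem(word) for word in text.lower().split()]


# The index stores stems, not the original words.
stored = set(run_pipeline("consigned consistency consoles the constables"))


def lookup(term):
    if "*" in term or "?" in term:
        # Globs are matched against the stored stems without being stemmed.
        return sorted(w for w in stored if fnmatch.fnmatchcase(w, term))
    return sorted(stored & {toy_stem(term)})


plain = lookup("constables")      # stemmed to "constable", which is stored
globbed = lookup("constables*")   # no stored stem starts with "constables"
shorter = lookup("constab*")      # a shorter prefix does match the stem
```

The practical consequence is that glob patterns should be written against the stem, since the unstemmed surface form may no longer exist in the lexicon.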

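Support for legacy data

Prior to the introduction of btree "families" and the BTrees.Interfaces.IBTreeFamily interface, the indexes defined by the zc.catalog.index module used the instance attributes btreemodule and IOBTree, initialized in the constructor, and the BTreeAPI property. These are replaced by the family attribute in the current implementation.

This is a white-box test that verifies that the supported values in existing data structures (loaded from pickles) can be used effectively with the current implementation.

There are two supported sets of values; one for 32-bit btrees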

>>> import BTrees.IOBTree

>>> legacy32 = {
...     "btreemodule": "BTrees.IFBTree",
...     "IOBTree": BTrees.IOBTree.IOBTree,
...     }

and another for 64-bit btrees

>>> import BTrees.LOBTree

>>> legacy64 = {
...     "btreemodule": "BTrees.LFBTree",
...     "IOBTree": BTrees.LOBTree.LOBTree,
...     }

In each case, actual legacy structures will also include index structures that match the right integer size

>>> import BTrees.OOBTree
>>> import BTrees.Length

>>> legacy32["values_to_documents"] = BTrees.OOBTree.OOBTree()
>>> legacy32["documents_to_values"] = BTrees.IOBTree.IOBTree()
>>> legacy32["documentCount"] = BTrees.Length.Length(0)
>>> legacy32["wordCount"] = BTrees.Length.Length(0)

>>> legacy64["values_to_documents"] = BTrees.OOBTree.OOBTree()
>>> legacy64["documents_to_values"] = BTrees.LOBTree.LOBTree()
>>> legacy64["documentCount"] = BTrees.Length.Length(0)
>>> legacy64["wordCount"] = BTrees.Length.Length(0)

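What we want to do is verify that the family attribute is properly computed for instances loaded from legacy data, and ensure that the structure is updated cleanly without causing a read-only transaction to become a write transaction. We need to create instances that conform to the old data structures, pickle them, and show that unpickling them produces instances that use the correct families.

Let's create new instances, and force the internal data to match the old structures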

>>> import pickle
>>> import zc.catalog.index

>>> vi32 = zc.catalog.index.ValueIndex()
>>> vi32.__dict__ = legacy32.copy()
>>> legacy32_pickle = pickle.dumps(vi32)

>>> vi64 = zc.catalog.index.ValueIndex()
>>> vi64.__dict__ = legacy64.copy()
>>> legacy64_pickle = pickle.dumps(vi64)

Now, let's unpickle these structures and verify them. We'll start with the 32-bit variety

>>> vi32 = pickle.loads(legacy32_pickle)

>>> vi32.__dict__["btreemodule"]
'BTrees.IFBTree'
>>> vi32.__dict__["IOBTree"]
<type 'BTrees.IOBTree.IOBTree'>

>>> "family" in vi32.__dict__
False

>>> vi32._p_changed
False

The family property returns the BTrees.family32 singleton

>>> vi32.family is BTrees.family32
True

Once accessed, the legacy values have been cleaned out of the instance dictionary

>>> "btreemodule" in vi32.__dict__
False
>>> "IOBTree" in vi32.__dict__
False
>>> "BTreeAPI" in vi32.__dict__
False

Accessing these names as attributes still provides the proper values

>>> vi32.btreemodule
'BTrees.IFBTree'
>>> vi32.IOBTree
<type 'BTrees.IOBTree.IOBTree'>
>>> vi32.BTreeAPI
<module 'BTrees.IFBTree' from ...>

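Even though the instance dictionary has been cleaned up, the change flag has not been set. This is done to avoid turning a read-only transaction into a write transaction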

>>> vi32._p_changed
False

The 64-bit variety provides equivalent behavior

>>> vi64 = pickle.loads(legacy64_pickle)

>>> vi64.__dict__["btreemodule"]
'BTrees.LFBTree'
>>> vi64.__dict__["IOBTree"]
<type 'BTrees.LOBTree.LOBTree'>

>>> "family" in vi64.__dict__
False

>>> vi64._p_changed
False

>>> vi64.family is BTrees.family64
True

>>> "btreemodule" in vi64.__dict__
False
>>> "IOBTree" in vi64.__dict__
False
>>> "BTreeAPI" in vi64.__dict__
False

>>> vi64.btreemodule
'BTrees.LFBTree'
>>> vi64.IOBTree
<type 'BTrees.LOBTree.LOBTree'>
>>> vi64.BTreeAPI
<module 'BTrees.LFBTree' from ...>

>>> vi64._p_changed
False

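Now, if we have a legacy structure and explicitly set the family attribute, the old data structures are cleaned out and replaced with the new structure. If the object is associated with a data manager, the changed flag is set as well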

>>> class DataManager(object):
...     def register(self, ob):
...         pass

>>> vi64 = pickle.loads(legacy64_pickle)
>>> vi64._p_jar = DataManager()
>>> vi64.family = BTrees.family64

>>> vi64._p_changed
True

>>> "btreemodule" in vi64.__dict__
False
>>> "IOBTree" in vi64.__dict__
False
>>> "BTreeAPI" in vi64.__dict__
False

>>> "family" in vi64.__dict__
True
>>> vi64.family is BTrees.family64
True

>>> vi64.btreemodule
'BTrees.LFBTree'
>>> vi64.IOBTree
<type 'BTrees.LOBTree.LOBTree'>
>>> vi64.BTreeAPI
<module 'BTrees.LFBTree' from ...>
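
The read-without-dirtying migration verified above follows a general pattern that can be sketched in plain Python. This is an assumption-laden illustration, not the code in zc.catalog.index: LegacyAwareIndex is a hypothetical class, the strings "family32" and "family64" stand in for the real BTrees family objects, and the changed attribute stands in for the persistence layer's _p_changed flag. Caching the derived family in the instance dictionary on read is a choice made for this sketch only.

```python
# Illustrative sketch only; not the library's code.

class LegacyAwareIndex:
    changed = False   # stand-in for the _p_changed flag

    @property
    def family(self):
        state = self.__dict__
        if "family" not in state:
            # Infer the family from a legacy name such as "BTrees.LFBTree".
            module = state.pop("btreemodule", "BTrees.IFBTree")
            state.pop("IOBTree", None)
            prefix = module.rsplit(".", 1)[-1][0]
            state["family"] = "family64" if prefix == "L" else "family32"
            # `changed` is deliberately left alone: reading must not
            # turn a read-only transaction into a write.
        return state["family"]

    @family.setter
    def family(self, value):
        state = self.__dict__
        state.pop("btreemodule", None)
        state.pop("IOBTree", None)
        state["family"] = value
        state["changed"] = True   # an explicit write does mark the object


# Simulate an instance whose state came from an old pickle.
old = LegacyAwareIndex()
old.__dict__.update({"btreemodule": "BTrees.LFBTree", "IOBTree": object})
read_family = old.family
changed_after_read = old.changed
```

Splitting the cleanup between the getter and the setter is what lets readers upgrade the in-memory state for free while only deliberate assignments cost a database write.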

globber

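The globber takes a query and turns any term that isn't already a glob into one that ends in a star. It was originally envisioned as a very low-rent stemming hack. The author now questions its value, and hopes that the new stemming pipeline option can be used instead. Nonetheless, here is an example of it at work.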

>>> from zope.index.text import textindex
>>> index = textindex.TextIndex()
>>> lex = index.lexicon
>>> from zc.catalog import globber
>>> globber.glob('foo bar and baz or (b?ng not boo)', lex)
'(((foo* and bar*) and baz*) or (b?ng and not boo*))'
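
The core rewrite rule can be sketched in a few lines. The version below is a simplified illustration, not zc.catalog's globber: toy_glob is a hypothetical function that works on a flat token list, so unlike the real output above it neither inserts the implicit "and" operators nor produces the fully parenthesized query tree.

```python
# Illustrative sketch only; the real globber parses the query into a tree.
import re

OPERATORS = {"and", "or", "not"}


def toy_glob(query):
    """Append '*' to plain terms; leave operators, parentheses and globs alone."""
    tokens = re.findall(r"[()]|[^\s()]+", query)
    out = []
    for tok in tokens:
        is_paren = tok in ("(", ")")
        is_operator = tok.lower() in OPERATORS
        is_glob = "*" in tok or "?" in tok
        if is_paren or is_operator or is_glob:
            out.append(tok)
        else:
            out.append(tok + "*")
    return " ".join(out)


result = toy_glob("foo bar and baz or (b?ng not boo)")
```

Here the term b?ng is left untouched because it already contains a glob character, matching the behaviour in the doctest.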

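Callable Wrapper

If we want to index some value that is easily derivable from a document, we have to define an interface with this value as an attribute, and create an adapter that calculates this value and implements this interface. All this is too much hassle if we only want to store a single easily derivable value. CallableWrapper solves this problem by converting the document to the indexed value with a callable converter.

Here's a contrived example. Suppose we have cars that know their fuel economy expressed in miles per gallon, but we want to index it in litres per 100 km.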

>>> class Car(object):
...     def __init__(self, mpg):
...         self.mpg = mpg

>>> def mpg2lp100(car):
...     return 100.0/(1.609344/3.7854118 * car.mpg)

Let's create an index that indexes a car's l/100 km rating.

>>> from zc.catalog import index, catalogindex
>>> idx = catalogindex.CallableWrapper(index.ValueIndex(), mpg2lp100)

Let's add a few cars to the index!

>>> hummer = Car(10.0)
>>> beamer = Car(22.0)
>>> civic = Car(45.0)

>>> idx.index_doc(1, hummer)
>>> idx.index_doc(2, beamer)
>>> idx.index_doc(3, civic)

The indexed values should be the converted l/100 km ratings.

>>> list(idx.values()) # doctest: +ELLIPSIS
[5.22699076283393..., 10.691572014887601, 23.521458432752723]

We can query for cars whose fuel consumption falls within a given range

>>> list(idx.apply({'between': (5.0, 7.0)}))
[3]
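
The wrapping idea itself is small enough to sketch without the library. In the following illustration, CallableWrapperSketch and DictIndex are hypothetical classes (not zc.catalog's CallableWrapper or ValueIndex), and the conversion constants are the same ones used in mpg2lp100 above.

```python
# Illustrative sketch only; not the library's code.

KM_PER_MILE = 1.609344
LITRES_PER_GALLON = 3.7854118


class Car:
    def __init__(self, mpg):
        self.mpg = mpg


def mpg2lp100(car):
    # km per litre = mpg * km-per-mile / litres-per-gallon; invert for l/100 km.
    return 100.0 / (KM_PER_MILE / LITRES_PER_GALLON * car.mpg)


class DictIndex:
    """Hypothetical minimal index: docid -> value, with a range query."""

    def __init__(self):
        self.values = {}

    def index_doc(self, docid, value):
        self.values[docid] = value

    def between(self, low, high):
        return sorted(d for d, v in self.values.items() if low <= v <= high)


class CallableWrapperSketch:
    """Converts each document with a callable before indexing it."""

    def __init__(self, index, converter):
        self.index = index
        self.converter = converter

    def index_doc(self, docid, doc):
        return self.index.index_doc(docid, self.converter(doc))


idx = CallableWrapperSketch(DictIndex(), mpg2lp100)
for docid, mpg in [(1, 10.0), (2, 22.0), (3, 45.0)]:
    idx.index_doc(docid, Car(mpg))

economical = idx.index.between(5.0, 7.0)
```

Since the wrapper only intercepts index_doc, the wrapped index needs no knowledge of cars, which is the decoupling the adapter-based approach achieves with much more ceremony.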

zc.catalog Browser Support

The zc.catalog.browser package adds simple TTW (through-the-web) addition and inspection for SetIndex and ValueIndex.

First, we need a browser so we can test the web UI.

>>> from zope.testbrowser.wsgi import Browser
>>> browser = Browser()
>>> browser.handleErrors = False
>>> browser.addHeader('Authorization', 'Basic mgr:mgrpw')
>>> browser.addHeader('Accept-Language', 'en-US')
>>> browser.open('https:///')

Now we need to add the catalog that these indexes are going to reside within.

>>> browser.open('https:///++etc++site/default/@@+/')
>>> browser.getControl('Catalog').click()
>>> browser.getControl(name='id').value = 'catalog'
>>> browser.getControl('Add').click()

SetIndex

Add the SetIndex to the catalog.

>>> browser.open(browser.getLink('Add').url + '/')
>>> browser.getControl('Set Index').click()
>>> browser.getControl(name='id').value = 'set_index'
>>> browser.getControl('Add').click()

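The add form needs values for the interface to adapt candidate objects to, the field name to use, and whether that field is a callable. (We'll use a simple interface for demonstration purposes; which one is not really significant.)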

>>> browser.getControl('Interface', index=0).displayValue = [
...     'zope.size.interfaces.ISized']
>>> browser.getControl('Field Name').value = 'sizeForDisplay'
>>> browser.getControl('Field Callable').click()
>>> browser.getControl(name='add_input_name').value = 'set_index'
>>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

>>> browser.getLink('set_index').click()
>>> print(browser.contents)
<...
...Interface...zope.size.interfaces.ISized...
...Field Name...sizeForDisplay...
...Field Callable...True...

We need to go back to the catalog so we can add a different index.

>>> browser.open('/++etc++site/default/catalog/@@contents.html')

ValueIndex

Add the ValueIndex to the catalog.

>>> browser.open(browser.getLink('Add').url + '/')
>>> browser.getControl('Value Index').click()
>>> browser.getControl(name='id').value = 'value_index'
>>> browser.getControl('Add').click()

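The add form needs values for the interface to adapt candidate objects to, the field name to use, and whether that field is a callable. (We'll use a simple interface for demonstration purposes; which one is not really significant.)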

>>> browser.getControl('Interface', index=0).displayValue = [
...     'zope.size.interfaces.ISized']
>>> browser.getControl('Field Name').value = 'sizeForDisplay'
>>> browser.getControl('Field Callable').click()
>>> browser.getControl(name='add_input_name').value = 'value_index'
>>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

>>> browser.getLink('value_index').click()
>>> print(browser.contents)
<...
...Interface...zope.size.interfaces.ISized...
...Field Name...sizeForDisplay...
...Field Callable...True...

Release history

3.0 (2019-03-21), this version
2.0.1 (2017-06-15)
2.0.0 (2017-05-09)
1.6 (2013-07-04)
1.5.1 (2012-01-20)
1.5 (2010-10-19)
1.4.5 (2010-10-05)
1.4.4 (2010-07-06)
1.4.3 (2010-03-09)
1.4.2 (2010-01-20)
1.4.1 (2009-02-27)
1.4.0 (2009-02-07)
1.3.1 (2010-03-11)
1.3.0 (2008-09-10)
1.2.0 (2007-11-03)
1.2b (2007-07-03), pre-release
1.1.1 (2007-03-18)
1.1 (2007-01-06)
1.0 (2007-01-05)
0.2 (2006-09-22)
1.2dev-r74688 (2007-04-23), pre-release

Download files

Source distribution

zc.catalog-3.0.tar.gz (71.1 kB), uploaded 2019-03-21, source

Built distribution

zc.catalog-3.0-py2.py3-none-any.whl (74.0 kB), uploaded 2019-03-21, Python 2, Python 3

Hashes for zc.catalog-3.0.tar.gz
Algorithm	Hash digest
SHA256	`7c0125657dc0d1341b61af04e98e17ddc720e9025d907cb98c4db7581753b13f`
MD5	`6db798c34ed19925288d24f79ece5328`
BLAKE2b-256	`756cb69f19f72114f8777e4821ef955aaeddc12db0f6dbac521dbc2dcd0eeca2`

Hashes for zc.catalog-3.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`4d63916af2eff52c75502daa113feeb3878e0826e7eb0625a33b82da1a34357c`
MD5	`8dce77a734c3f02df60cd4981b55bbf2`
BLAKE2b-256	`695a86e233a3a8e397b2844da74cafc46fc510fa6c8ae87ee87bfdc7c21173c3`
