mutate · PyPI · The Python Package Index

A data-processing tool that produces CDM-compliant JSON documents that are easy to index

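Project description

A Python library for generating documents that conform to the Common Data Model.

scribe is a Python library that processes collected data and transforms it to conform to the Common Data Model (CDM), which makes it easy to index the data into Elasticsearch. Currently scribe only supports data collected through stockpile, but it can easily be integrated with other data sources.

Installing and running scribe inside a venv is recommended.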

python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
pip install mutate
# For testing purposes, you can clone the repo first
# and then install it as a local project
git clone https://github.com/redhat-performance/scribe.git
pip install -e scribe/

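Note: we create a python3 venv here because scribe is written in python3 and is currently incompatible with python2.

Using scribe as a Python library

Scribe is provided by the Python library mutate, which helps transcribe an input document into a series of CDM-compliant documents. This can be done as follows.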

from mutate.transcribe import scribe
for scribe_object in scribe('/tmp/stockpile.json','stockpile'):
    print(scribe_object)

Contributing (creating a patchset)

Visit http://gerrithub.io/ and sign in with your GitHub account. Make sure to import your SSH keys.

Now, clone the GitHub repository:

$ git clone https://github.com/redhat-performance/scribe.git

Make sure you have git-review installed; the following command should work.

$ sudo pip install git-review

Set up the cloned repository to work with Gerrit:

$ git review -s

It is recommended to create a branch for your work and to name it after the change you wish to introduce.

$ git branch my_special_enhancement
$ git checkout !$

Make your changes and commit them as follows.

$ git add /path/to/files/changed
$ git commit -m "your commit title"

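Use a descriptive commit title followed by a blank line. Then type a brief explanation of what you changed and why.

Now you can submit your change for review: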

$ git review

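If you want to create another patchset from the same commit, you can use the amend feature after making and saving further modifications. Make sure you are on the same branch; if you no longer have that branch, follow the instructions that come after the commands below.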

$ git add /path/to/files/changed
$ git commit --amend
$ git review

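If you want to submit a new patchset from a different location (for example, a different machine), clone the repository again (if it is not already there) and then use git review against your unique Change-Id: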

$ git review -d Change-Id

Change-Id is the change ID number as seen in Gerrit; it is generated after your first successful submission. So, in the case of https://review.gerrithub.io/#/c/redhat-performance/scribe/+/425014/

you can either run git review -d 425014 (the number), or run git review -d If0b7b4f30615e46f009759b32a3fc533e811ebdc, where If0b7b4f30615e46f009759b32a3fc533e811ebdc is the Change-Id.

Make your changes on the branch that git review -d sets up (the branch name will look roughly like review/username/branch_name/patchsetnumber).

Add the files to git and commit your changes using:

$ git commit --amend

You can also edit your commit message in the prompt shown when you run the command above.

Finally, push the amended commit for review using:

$ git review

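Adding Depends-On to the commit message

Often, especially when adding a new module, a change to scribe cannot pass CI until the corresponding change to stockpile has merged. In that case, to make CI run against the patchset you submitted to stockpile rather than against stockpile's master branch, you can use the Depends-On feature.

To add Depends-On, copy the Change-Id of the patchset you submitted to stockpile and add it to the end of the commit message, as shown below.

Note: add it after the Change-Id in the commit message.

The commit message should look like this: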

Your commit message

Change-Id: I9bc121f076b8625da88705c9d96bd00117f94c22

Depends-On: {Change-Id of the review submitted to stockpile}

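For example, if you are adding a module that handles satellite data, CI cannot test it because stockpile does not yet collect satellite data. However, suppose you have a stockpile change that has not merged yet, such as https://review.gerrithub.io/#/c/redhat-performance/stockpile/+/425015/

You can still exercise and verify the stockpile-scribe workflow by adding Depends-On to scribe's commit message, which would then look like this: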

Adding satellite Module to work with stockpile.

Change-Id: some_random_change_id_generated_after_git_review

Depends-On: I66329511b38a558ce61efb7edb4c3be18625b252

Note that the change ID in Depends-On is the same as the one at https://review.gerrithub.io/#/c/redhat-performance/stockpile/+/425015/

For another example, see: https://review.gerrithub.io/#/c/redhat-performance/scribe/+/425969/
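The commit-message layout described above can be sketched in Python. This is only an illustration: the helper names `add_depends_on` and `read_trailers` are hypothetical and are not part of scribe, git-review, or Gerrit. The sketch shows the Depends-On line being placed after the existing Change-Id line.

```python
def add_depends_on(message: str, stockpile_change_id: str) -> str:
    """Append a Depends-On line after the existing end of a commit message."""
    return message.rstrip("\n") + "\n\nDepends-On: " + stockpile_change_id + "\n"


def read_trailers(message: str) -> dict:
    """Collect the Change-Id and Depends-On lines of a commit message."""
    trailers = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in ("Change-Id", "Depends-On"):
            trailers[key] = value.strip()
    return trailers


# Commit message taken from the example above; the Change-Id is a placeholder.
message = (
    "Adding satellite Module to work with stockpile.\n"
    "\n"
    "Change-Id: some_random_change_id_generated_after_git_review\n"
)
updated = add_depends_on(message, "I66329511b38a558ce61efb7edb4c3be18625b252")
print(updated)
print(read_trailers(updated))
```

In practice the same result is reached by editing the message by hand in the editor that git commit --amend opens.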

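Contributing (making changes)

The scribe package essentially consists of two modules:

scribes
scribe_modules

The two serve different purposes: scribes read the input data and pre-process it into a structure that can be used to create scribe_modules.

The pre-processed dictionary structure may look like this: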

{
    "scribe_module_1": [
        {
            "host": "localhost",
            "value": "sample_value_1"
        },
        {
            "host": "host1",
            "value": "sample_value_2"
        },
        {
            "host": "host2",
            "value": "sample_value_3"
        }
    ],
    "scribe_module_2": [
        {
            "host": "host2",
            "value": {
                "field1": "sample_field1_value_3",
                "field2": "sample_field2_value_3"
            }
        },
        {
            "host": "host1",
            "value": [
                {
                    "field1": "sample_field1_value_1",
                    "field2": "sample_field2_value_1"
                },
                {
                    "field1": "sample_field1_value_2",
                    "field2": "sample_field2_value_2"
                }
            ]
        }
    ]
}

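Essentially, the first-level keys of the dictionary must match the 'scribe_modules' you have written in scribe_modules/. Each child entry of a module in the dictionary should have 2 keys: 'host' and 'value'. The value of the 'value' key can be a dictionary or a list of dictionaries (or a plain value, as in scribe_module_1 above).

Note that the value of the 'value' key is what gets passed to the scribe_modules class when an object is created.

Take the simple case of scribe_module_2 for host2: only one object is created, and the value passed is

{
    "field1": "sample_field1_value_3",
    "field2": "sample_field2_value_3"
}

For host1, 2 objects are created.

For object 1, the following value is passed:

{
    "field1": "sample_field1_value_1",
    "field2": "sample_field2_value_1"
}

For object 2, the following value is passed:

{
    "field1": "sample_field1_value_2",
    "field2": "sample_field2_value_2"
}

For scribe_module_1 on host1, the value passed is "sample_value_2".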
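The expansion rule described above (one object per dictionary or plain value, one object per element of a list) can be sketched as follows. The function name `expand_scribe_dict` is hypothetical and is not taken from scribe; the sketch only counts and lists the values that would be passed, without creating any scribe_modules objects.

```python
def expand_scribe_dict(scribe_dict):
    """Yield one (module_name, host, value) tuple per object that would be created."""
    for module_name, entries in scribe_dict.items():
        for entry in entries:
            value = entry["value"]
            # A list produces one object per element; a single dictionary
            # or a plain value produces exactly one object.
            values = value if isinstance(value, list) else [value]
            for item in values:
                yield module_name, entry["host"], item


# A reduced version of the pre-processed dictionary shown above.
sample = {
    "scribe_module_1": [
        {"host": "host1", "value": "sample_value_2"},
    ],
    "scribe_module_2": [
        {"host": "host2", "value": {"field1": "sample_field1_value_3",
                                    "field2": "sample_field2_value_3"}},
        {"host": "host1", "value": [
            {"field1": "sample_field1_value_1", "field2": "sample_field2_value_1"},
            {"field1": "sample_field1_value_2", "field2": "sample_field2_value_2"},
        ]},
    ],
}

for module_name, host, value in expand_scribe_dict(sample):
    print(module_name, host, value)
```

With this sample, scribe_module_2 yields one tuple for host2 and two for host1, matching the object counts described above.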

Adding new scribes

The steps to extend scribe to support a new input type, 'example1', are:

Create 'example1.py' in the 'mutate/scribes/' directory. Sample code is shown below.

from . import ScribeBaseClass
# Assumption: load_file is a helper in the package's lib/util.py;
# adjust this import if it lives elsewhere.
from ..lib.util import load_file


class Example1(ScribeBaseClass):

    def example1_build_initial_dict(self):
        output_dict = {}
        example1_data = load_file(self._path)
        # .... some sort of data manipulation
        # .... to build the output_dict
        return output_dict

    def __init__(self, path=None, source_type=None):
        ScribeBaseClass.__init__(self, source_type=source_type, path=path)
        self._dict = self.example1_build_initial_dict()

    def emit_scribe_dict(self):
        return self._dict

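Note the following:

from . import ScribeBaseClass must be present, because the class inherits from ScribeBaseClass.
class Example1(ScribeBaseClass) is where the inheritance happens; make sure to include '(ScribeBaseClass)' when writing the class definition.
The first letter of the class name must be capitalized, because that is how the factory method is defined.
The __init__ function first calls the parent class's __init__ function with the default arguments, path and source_type. More arguments can be added; they do not need to be passed to the parent's __init__ function.
emit_scribe_dict is an abstract method, so every such class must define it. Its implementation can vary, but it should return a dictionary object as described above.

Add the module to the choices list on line 14 of scribe.py, which currently looks like this: choices=['stockpile'], because at the time this document was written scribe could only transcribe stockpile data.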
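The notes above mention a factory method that relies on a capitalized class name and an abstract emit_scribe_dict. A minimal, self-contained sketch of how such a pairing could work is shown here. The ScribeBaseClass below is a stand-in written for this example, and the factory function `grab` is hypothetical; neither is copied from scribe, whose actual factory may be implemented differently.

```python
from abc import ABC, abstractmethod


class ScribeBaseClass(ABC):
    """Stand-in for the real base class; it holds only what this sketch needs."""

    def __init__(self, source_type=None, path=None):
        self._source_type = source_type
        self._path = path

    @abstractmethod
    def emit_scribe_dict(self):
        """Return the pre-processed dictionary."""


class Example1(ScribeBaseClass):

    def __init__(self, path=None, source_type=None):
        ScribeBaseClass.__init__(self, source_type=source_type, path=path)
        # Placeholder data; a real scribe would build this from the input file.
        self._dict = {"scribe_module_1": [{"host": "localhost", "value": path}]}

    def emit_scribe_dict(self):
        return self._dict


def grab(source_type, path):
    """Hypothetical factory: map 'example1' to the subclass named 'Example1'."""
    class_name = source_type.capitalize()
    for subclass in ScribeBaseClass.__subclasses__():
        if subclass.__name__ == class_name:
            return subclass(path=path, source_type=source_type)
    raise ValueError("unsupported source type: " + source_type)


scribe = grab("example1", "/tmp/example1.json")
print(type(scribe).__name__, scribe.emit_scribe_dict())
```

Under this assumed design, a class named example1 (lowercase) would never be found by the lookup, which is one way the capitalization requirement could arise.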

Adding new scribe_modules

The steps to extend scribe_modules to support a new module, 'scribe_module_1', are:

Add a new class file, 'scribe_module_1.py', to the 'mutate/scribe_modules' directory. It may look like this:

```python
from . import ScribeModuleBaseClass


class Scribe_module_1(ScribeModuleBaseClass):

    def __init__(self, input_dict=None, module_name=None, host_name=None,
                 input_type=None, scribe_uuid=None):
        ScribeModuleBaseClass.__init__(self, module_name=module_name,
                                       input_dict=input_dict,
                                       host_name=host_name,
                                       input_type=input_type,
                                       scribe_uuid=scribe_uuid)
        if input_dict:
            new_dict = {}
            # ... this is where transformation occurs
            # ... can call other member functions of class
            # ... can set the entities of the class object like
            self.entity_1 = input_dict

    # Overriding __iter__ isn't needed here: the parent class already
    # defines it and it's not an abstractmethod. Only if you'd like to
    # change how __iter__ works for your class should you add it.
    # Not recommended, unless you know what you're doing.
    # def __iter__(self):
    #     ... your definition of how to make it iterable
```

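Note the following important points:

from . import ScribeModuleBaseClass must be present, because the class inherits from ScribeModuleBaseClass.
class Scribe_module_1(ScribeModuleBaseClass) is where the inheritance happens; make sure '(ScribeModuleBaseClass)' is present when writing the class.
The first letter of the class name must be capitalized, because that is how the factory method is defined.
The __init__ function first calls the parent class's __init__ function with the default arguments: module_name, input_dict, host_name, input_type and scribe_uuid. Note that no other arguments can be passed.
New entities should be set only inside the __init__ function, but for flexibility you can call other methods from the same class or from lib/util.py to perform the transformation.

Add the schema for the new class, 'example1.yml', to the 'mutate/schema' directory. Scribe currently uses cerberus to validate the iterable produced by subclasses of scribe_modules. See http://docs.python-cerberus.org/en/stable/validation-rules.html for more on how to write a schema for your class's output.

Note: the yml file name should match the name of the file created for the scribe_modules class. So for the 'example1' class, the file should be named 'example1.yml'.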

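Data model and ES templates

The 'mutate/schema' directory essentially contains the data model. Work is still needed so that these yml files can be used to create Elasticsearch templates. This is similar to the work on ViaQ's Elasticsearch templates.

For more on how to create templates, see https://github.com/ViaQ/elasticsearch-templates.

Note that ViaQ/elasticsearch-templates currently does not support creating templates from the schema files present in 'mutate/schema'.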
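To illustrate what deriving a template from a schema could involve, here is a rough sketch that turns a cerberus-style schema, already loaded into a Python dictionary, into an Elasticsearch template body. Everything in it is an assumption made for illustration: the type mapping in `TYPE_MAP`, the function names, the sample schema, and the index pattern are hypothetical, and the template layout follows the legacy template body of Elasticsearch 7, which differs in other versions. It is not how scribe or ViaQ generate templates.

```python
import json

# Assumed mapping from cerberus type names to Elasticsearch field types.
TYPE_MAP = {
    "string": "keyword",
    "integer": "long",
    "float": "double",
    "boolean": "boolean",
    "datetime": "date",
}


def rule_to_mapping(rule):
    """Translate one cerberus-style field rule into an Elasticsearch field mapping."""
    field_type = rule.get("type", "string")
    if field_type == "dict":
        # Nested rules of a dict live under its 'schema' key.
        return {"properties": schema_to_properties(rule.get("schema", {}))}
    if field_type == "list":
        # Elasticsearch has no array type; a list is mapped as its element type.
        return rule_to_mapping(rule.get("schema", {"type": "string"}))
    return {"type": TYPE_MAP.get(field_type, "keyword")}


def schema_to_properties(schema):
    """Translate every field rule of a schema."""
    return {name: rule_to_mapping(rule) for name, rule in schema.items()}


def build_template(index_pattern, schema):
    """Wrap the translated fields in a template body."""
    return {
        "index_patterns": [index_pattern],
        "mappings": {"properties": schema_to_properties(schema)},
    }


# Hypothetical schema, shaped like what loading a yml schema file might give.
example_schema = {
    "host": {"type": "string"},
    "cpu_count": {"type": "integer"},
    "interfaces": {
        "type": "list",
        "schema": {"type": "dict", "schema": {"name": {"type": "string"}}},
    },
}
template = build_template("scribe-example1-*", example_schema)
print(json.dumps(template, indent=2))
```

Mapping every string to keyword is a simplification; free-text fields would usually need the text type instead.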

Hashes for mutate-0.0.1.dev13.tar.gz
Algorithm	Hash digest
SHA256	`c3768905871f6576872972cc7c626db231d67528b8c28b16c9b671cdbe0a91cb`
MD5	`0ce382b58a6920fcce0ff1fcdf93de79`
BLAKE2b-256	`71edabec74bfa477746cfaeaaa7a159a211190a56fbaa3599ebd82087131e986`

Hashes for mutate-0.0.1.dev13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f3792ee6677153d74cad5a515cd6636326fdda2a5472ed539305a74b026f63b1`
MD5	`8db6b20150f949a727b983fd3fc47402`
BLAKE2b-256	`f8449eac04b5271530f8e80b52044cdc122dceaca54d0ee65dd917451543edaa`

mutate 0.0.1.dev13
