
JSON schema and validation code for HEPData submissions



JSON schema and validation code (in Python 3) for HEPData submissions

Installation

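If possible, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows the use of CSafeLoader (instead of the pure-Python SafeLoader) for faster loading of YAML files. The difference is minor for small files but noticeable for large documents.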

Install from PyPI using pip:

$ pip install --user hepdata-validator
$ hepdata-validate --help

To use LibYAML on an M1 Mac, you may need an extra step to ensure that PyYAML is built with the LibYAML bindings. After installing LibYAML via Homebrew, run:

$ LDFLAGS="-L$(brew --prefix)/lib" CFLAGS="-I$(brew --prefix)/include" pip install --global-option="--with-libyaml" --force pyyaml
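The steps above make CSafeLoader available. If you also load YAML yourself (for example before passing data= to a validator, as described under Usage), you may want the same preference. A minimal sketch, assuming PyYAML is installed; get_safe_loader is a hypothetical helper name, not part of hepdata-validator:

```python
try:
    import yaml  # PyYAML
except ImportError:  # keep the sketch importable even without PyYAML
    yaml = None


def get_safe_loader():
    """Return the fastest available PyYAML safe loader class.

    yaml.CSafeLoader only exists when PyYAML was built with LibYAML
    bindings; otherwise fall back to the pure-Python yaml.SafeLoader.
    """
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    return getattr(yaml, "CSafeLoader", yaml.SafeLoader)
```

Use it as `yaml.load(stream, Loader=get_safe_loader())`. To confirm that the M1 Mac step worked, `python -c "import yaml; print(yaml.__with_libyaml__)"` should print `True`.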

Developers

Developers should install from GitHub in a virtual environment:

$ git clone https://github.com/HEPData/hepdata-validator
$ cd hepdata-validator
$ python3.9 -m venv venv
$ source venv/bin/activate
(venv)$ pip install --upgrade pip
(venv)$ pip install --upgrade -e ".[all]"

Tests should be run both with and without LibYAML, since error messages differ between the YAML parsers:

(venv) $ USE_LIBYAML=True pytest testsuite
(venv) $ USE_LIBYAML=False pytest testsuite
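To run both test configurations from a single script (for example in CI), something like the following minimal sketch could be used. The helper names are hypothetical; it assumes you run it from the repository root with the virtual environment active, and it invokes pytest through the current interpreter (`python -m pytest`).

```python
import os
import subprocess
import sys


def libyaml_test_envs(base_env=None):
    """Build one environment per USE_LIBYAML setting, copying the base environment."""
    base = dict(os.environ if base_env is None else base_env)
    return [dict(base, USE_LIBYAML=flag) for flag in ("True", "False")]


def run_testsuite_both_ways(testsuite="testsuite"):
    """Run pytest once with and once without LibYAML; return both exit codes."""
    codes = []
    for env in libyaml_test_envs():
        # sys.executable is the venv's interpreter, so its pytest is used
        result = subprocess.run([sys.executable, "-m", "pytest", testsuite], env=env)
        codes.append(result.returncode)
    return codes
```

A non-zero entry in the list returned by `run_testsuite_both_ways()` tells you which configuration failed (first with LibYAML, then without).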

Usage

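The hepdata-validator package allows you to validate (via the command line or Python):

  • A full directory of submission and data files

  • An archive file (.zip, .tar, .tar.gz, .tgz) containing all of the files (full details)

  • A single .yaml or .yaml.gz file (but not submission.yaml or a YAML data file)

  • A submission.yaml file or an individual YAML data file (via Python only, not via the command line)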

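The same package is used to validate uploads to hepdata.net, so validating offline first is an efficient way to check that your submission is valid before you upload it.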
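When scripting around these modes it can help to decide up front which one a path needs. The sketch below is a hypothetical helper that guesses from the path alone, using the suffixes listed above. It cannot distinguish a single-YAML-format file from a submission.yaml or a YAML data file, since all of them end in .yaml, so treat its answer as a hint only.

```python
from pathlib import Path

# Suffixes taken from the list of supported inputs above
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")
SINGLE_YAML_SUFFIXES = (".yaml", ".yaml.gz")


def classify_input(path):
    """Guess the validation mode for a path: "directory", "archive" or "single_yaml".

    This is a naming heuristic: any .yaml file is reported as "single_yaml",
    even if it is actually a submission.yaml or a YAML data file.
    """
    p = Path(path)
    if p.is_dir():
        return "directory"
    name = p.name.lower()
    if name.endswith(ARCHIVE_SUFFIXES):
        return "archive"
    if name.endswith(SINGLE_YAML_SUFFIXES):
        return "single_yaml"
    raise ValueError(f"Unsupported input: {path}")
```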

Command line

Installing the hepdata-validator package adds the command hepdata-validate to your path, which allows you to validate a HEPData submission offline.

Examples

To validate a submission comprising multiple files in the current directory:

$ hepdata-validate

To validate a submission comprising multiple files in another directory:

$ hepdata-validate -d ../TestHEPSubmission

To validate an archive file (.zip, .tar, .tar.gz, .tgz) in the current directory:

$ hepdata-validate -a TestHEPSubmission.zip

To validate a single YAML file in the current directory:

$ hepdata-validate -f single_yaml_file.yaml

Usage options:

$ hepdata-validate --help
Usage: hepdata-validate [OPTIONS]

  Offline validation of submission.yaml and YAML data files. Can check either
  a directory, an archive file, or the single YAML file format.

Options:
  -d, --directory TEXT  Directory to check (defaults to current working
                        directory)
  -f, --file TEXT       Single .yaml or .yaml.gz file (but not submission.yaml
                        or a YAML data file) to check - see https://hepdata-
                        submission.readthedocs.io/en/latest/single_yaml.html.
                        (Overrides directory)
  -a, --archive TEXT    Archive file (.zip, .tar, .tar.gz, .tgz) to check.
                        (Overrides directory and file)
  --help                Show this message and exit.
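From a Python script you may prefer to call the same command. This minimal sketch builds the argument list while mirroring the precedence stated in the help text (--archive overrides --file, which overrides --directory). The function names are hypothetical, and the sketch makes no assumption about the command's exit status; it simply captures the output.

```python
import subprocess


def build_validate_command(directory=None, file=None, archive=None):
    """Build an argv list for hepdata-validate, passing only the winning option."""
    cmd = ["hepdata-validate"]
    if archive:
        cmd += ["-a", archive]
    elif file:
        cmd += ["-f", file]
    elif directory:
        cmd += ["-d", directory]
    # With no option, hepdata-validate checks the current working directory
    return cmd


def run_validate(**kwargs):
    """Run hepdata-validate and return the CompletedProcess with captured text output."""
    return subprocess.run(build_validate_command(**kwargs), capture_output=True, text=True)
```

For example, `print(run_validate(archive='TestHEPSubmission.zip').stdout)` shows the same report as the shell command above.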

Python

Validating a full submission

To validate a full submission, instantiate a FullSubmissionValidator object:

from hepdata_validator.full_submission_validator import FullSubmissionValidator, SchemaType
full_submission_validator = FullSubmissionValidator()

# validate a directory
is_dir_valid = full_submission_validator.validate(directory='TestHEPSubmission')

# or uncomment to validate an archive file
# is_archive_valid = full_submission_validator.validate(archive='TestHEPSubmission.zip')

# or uncomment to validate a single file
# is_file_valid = full_submission_validator.validate(file='single_yaml_file.yaml')

# if there are any error messages, they are retrievable through this call
full_submission_validator.get_messages()

# the error messages can be printed for each file
full_submission_validator.print_errors('submission.yaml')

# the list of valid files can be retrieved via the valid_files property, which is a
# dict mapping SchemaType (e.g. SUBMISSION, DATA, SINGLE_YAML, REMOTE) to lists of
# valid files
full_submission_validator.valid_files[SchemaType.SUBMISSION]
full_submission_validator.valid_files[SchemaType.DATA]
# full_submission_validator.valid_files[SchemaType.SINGLE_YAML]

# if a remote schema is used, valid_files is a list of tuples (schema, file)
# full_submission_validator.valid_files[SchemaType.REMOTE]

# the list of valid files can be printed
full_submission_validator.print_valid_files()
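To check several archives in one go, you could loop over them. This is a sketch under stated assumptions: validate_archives is a hypothetical helper, and it creates a new validator for each archive rather than relying on a single instance resetting its state between validate calls. Passing the class as a factory also makes the helper easy to exercise with a stand-in object.

```python
def validate_archives(validator_factory, archive_paths):
    """Validate several archives with a fresh validator each.

    validator_factory is any zero-argument callable returning an object with
    validate(archive=...) and get_messages(), e.g. FullSubmissionValidator.
    Returns {archive_path: (is_valid, messages)}.
    """
    results = {}
    for path in archive_paths:
        validator = validator_factory()  # separate instance per submission
        is_valid = validator.validate(archive=path)
        results[path] = (is_valid, validator.get_messages())
    return results
```

For example, `validate_archives(FullSubmissionValidator, ['TestHEPSubmission.zip'])`, with FullSubmissionValidator imported as shown above.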

Validating individual files

To validate a submission file, instantiate a SubmissionFileValidator object:

from hepdata_validator.submission_file_validator import SubmissionFileValidator

submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'

# the validate method takes a string representing the file path
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)

# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()

# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)

To validate data files, instantiate a DataFileValidator object:

from hepdata_validator.data_file_validator import DataFileValidator

data_file_validator = DataFileValidator()

# the validate method takes a string representing the file path
data_file_validator.validate(file_path='data.yaml')

# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()

# the error messages can be printed
data_file_validator.print_errors('data.yaml')
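Because error messages are keyed by file path, a single DataFileValidator can plausibly be reused across all data files of a submission. The sketch assumes a flat directory in which every *.yaml file other than submission.yaml is a data file, that validate returns a boolean (as in the SubmissionFileValidator example), and that messages from earlier calls are retained; find_data_files and validate_data_files are hypothetical names.

```python
from pathlib import Path


def find_data_files(directory):
    """Return the YAML data files in a submission directory, sorted by name.

    Assumes a flat layout where every *.yaml except submission.yaml is a data file.
    """
    return sorted(
        p for p in Path(directory).glob("*.yaml") if p.name != "submission.yaml"
    )


def validate_data_files(validator, directory):
    """Validate each data file with one validator; return {path: is_valid}.

    Messages are keyed by file_path, so failures can be printed afterwards
    with validator.print_errors(path).
    """
    return {
        str(path): validator.validate(file_path=str(path))
        for path in find_data_files(directory)
    }
```

For instance, create `validator = DataFileValidator()`, call `validate_data_files(validator, 'TestHEPSubmission')`, then call `validator.print_errors(path)` for each path whose value is False.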

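Optionally, if you have already loaded the YAML object, you can pass it in as the data argument. You must also pass file_path, since it is used as the key for the error message lookup map.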

from hepdata_validator.data_file_validator import DataFileValidator
import yaml

with open('data.yaml', 'r') as f:
    file_contents = yaml.safe_load(f)
data_file_validator = DataFileValidator()

data_file_validator.validate(file_path='data.yaml', data=file_contents)

data_file_validator.get_messages('data.yaml')

data_file_validator.print_errors('data.yaml')
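Since file_path then presumably serves only as the key for error messages, you can obtain the data however you like, for example from a gzip-compressed copy. A minimal sketch; open_yaml_text is a hypothetical helper, and UTF-8 encoding is assumed:

```python
import gzip


def open_yaml_text(path):
    """Open a .yaml or .yaml.gz file as a UTF-8 text stream."""
    if str(path).endswith(".gz"):
        # "rt" decompresses transparently and yields str, like a plain open()
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Usage: `with open_yaml_text('data.yaml.gz') as stream: file_contents = yaml.safe_load(stream)`, then `data_file_validator.validate(file_path='data.yaml.gz', data=file_contents)`.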

The same approach works for the SubmissionFileValidator:

from hepdata_validator.submission_file_validator import SubmissionFileValidator
import yaml
submission_file_path = 'submission.yaml'

# convert the generator returned by yaml.safe_load_all into a list
# (inside the with block, since the generator reads the file lazily)
with open(submission_file_path, 'r') as f:
    docs = list(yaml.safe_load_all(f))

submission_file_validator = SubmissionFileValidator()
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path, data=docs)
submission_file_validator.print_errors(submission_file_path)
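yaml.safe_load_all returns a lazy generator that can be consumed only once; converting it to a list lets you inspect the same documents after validation. For example, assuming the usual HEPData layout in which each table document carries a data_file key, a hypothetical helper can collect the referenced data files:

```python
def table_data_files(docs):
    """Return the data_file entries from parsed submission.yaml documents.

    Skips documents without a data_file key (e.g. submission-level metadata)
    and empty documents, which PyYAML returns as None.
    """
    return [doc["data_file"] for doc in docs if isinstance(doc, dict) and "data_file" in doc]
```

With `docs` from the example above, `table_data_files(docs)` lists the data files that the submission refers to.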

Schema versions

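There are multiple versions of the native HEPData JSON schemas. In most cases you should use the latest version (the default). If you need a different version, pass the keyword argument schema_version when initialising the validator: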

submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')

Remote schemas

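When using remotely defined schemas, versioning depends on the organisation providing those schemas, and it is their responsibility to offer a way of keeping track of different schema versions.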

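The JsonSchemaResolver object resolves $ref in the JSON schema. The HTTPSchemaDownloader object retrieves schemas from a remote location and optionally saves them to the local file system, following the structure schemas_remote/<org>/<project>/<version>/<schema_name>. For example: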

from hepdata_validator.data_file_validator import DataFileValidator
data_validator = DataFileValidator()

# Split remote schema path and schema name
schema_path = 'https://scikit-hep.org/pyhf/schemas/1.0.0/'
schema_name = 'workspace.json'

# Create JsonSchemaResolver object to resolve $ref in JSON schema
from hepdata_validator.schema_resolver import JsonSchemaResolver
pyhf_resolver = JsonSchemaResolver(schema_path)

# Create HTTPSchemaDownloader object to validate against remote schema
from hepdata_validator.schema_downloader import HTTPSchemaDownloader
pyhf_downloader = HTTPSchemaDownloader(pyhf_resolver, schema_path)

# Retrieve and save the remote schema in the local path
pyhf_type = pyhf_downloader.get_schema_type(schema_name)
pyhf_spec = pyhf_downloader.get_schema_spec(schema_name)
pyhf_downloader.save_locally(schema_name, pyhf_spec)

# Load the custom schema as a custom type
import os
pyhf_path = os.path.join(pyhf_downloader.schemas_path, schema_name)
data_validator.load_custom_schema(pyhf_type, pyhf_path)

# Validate a specific schema instance
data_validator.validate(file_path='pyhf_workspace.json', file_type=pyhf_type)
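To see which remote schemas have been cached, you can walk the documented schemas_remote/<org>/<project>/<version>/<schema_name> layout. Where the schemas_remote directory lives is not stated here; if pyhf_downloader.schemas_path ends at the <version> directory, as the os.path.join call in the example suggests, the root is `Path(pyhf_downloader.schemas_path).parents[2]`. list_cached_schemas is a hypothetical helper:

```python
from pathlib import Path


def list_cached_schemas(root="schemas_remote"):
    """Return sorted (org, project, version, schema_name) tuples for cached schema files.

    Relies only on the layout schemas_remote/<org>/<project>/<version>/<schema_name>.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).parts
        for p in root.glob("*/*/*/*")
        if p.is_file()
    )
```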

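Native HEPData JSON schemas are provided as part of the hepdata-validator package and do not need to be downloaded. However, for testing purposes, the same mechanism shown in the example above could in principle be used with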

schema_path = 'https://hepdata.net/submission/schemas/1.1.1/'
schema_name = 'data_schema.json'

and by passing a HEPData YAML data file as the file_path argument of the validate method.
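Both examples split the schema URL into a directory (ending in /) and a file name. If you combine the two yourself, for example to open the schema in a browser, note that the trailing slash matters with the standard library's urljoin. This sketch only illustrates urljoin's behaviour; it makes no claim about how HTTPSchemaDownloader joins them internally.

```python
from urllib.parse import urljoin

schema_path = 'https://hepdata.net/submission/schemas/1.1.1/'
schema_name = 'data_schema.json'

# With the trailing slash, the name is resolved inside the version directory
full_url = urljoin(schema_path, schema_name)
# -> 'https://hepdata.net/submission/schemas/1.1.1/data_schema.json'

# Without it, urljoin replaces the last path segment ('1.1.1') instead
wrong_url = urljoin(schema_path.rstrip('/'), schema_name)
# -> 'https://hepdata.net/submission/schemas/data_schema.json'
```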

Download files

Source distribution

hepdata_validator-0.3.5.tar.gz (33.8 kB)

Built distribution

hepdata_validator-0.3.5-py2.py3-none-any.whl (44.7 kB, Python 2 and Python 3)
