JSON schema and validation code for HEPData submissions
JSON schema and validation code (in Python 3) for HEPData submissions
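Installation
If possible, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows the use of CSafeLoader (instead of Python's SafeLoader) for faster loading of YAML files. The difference is negligible for small files, but performance is noticeably better for large documents.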
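Whether the faster loader is actually available depends on how PyYAML was built. The sketch below is illustrative only and is not part of hepdata-validator: pick_safe_loader is a hypothetical helper that returns CSafeLoader when the given module exposes it and falls back to SafeLoader otherwise. It takes the module as an argument so that the logic can be exercised without PyYAML installed.

```python
def pick_safe_loader(yaml_module):
    """Return the fastest safe loader class that `yaml_module` exposes.

    PyYAML only defines CSafeLoader when it was built against LibYAML,
    so fall back to the pure-Python SafeLoader when it is missing.
    """
    return getattr(yaml_module, "CSafeLoader", yaml_module.SafeLoader)
```

After import yaml, calling pick_safe_loader(yaml) gives a class that can be passed as the Loader argument of yaml.load.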
Install from PyPI using pip:
$ pip install --user hepdata-validator
$ hepdata-validate --help
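If you would like to use LibYAML on an M1 Mac, you may need an additional step to make sure pyyaml is built with the LibYAML bindings. After installing LibYAML via Homebrew, run the following command: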
$ LDFLAGS="-L$(brew --prefix)/lib" CFLAGS="-I$(brew --prefix)/include" pip install --global-option="--with-libyaml" --force pyyaml
Developers
Developers should install from GitHub in a virtual environment:
$ git clone https://github.com/HEPData/hepdata-validator
$ cd hepdata-validator
$ python3.9 -m venv venv
$ source venv/bin/activate
(venv)$ pip install --upgrade pip
(venv)$ pip install --upgrade -e ".[all]"
Tests should be run both with and without LibYAML, as the error messages from the two YAML parsers differ:
(venv) $ USE_LIBYAML=True pytest testsuite
(venv) $ USE_LIBYAML=False pytest testsuite
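Usage
The hepdata-validator package allows you to validate the following, via the command line or Python:
- A full directory of submission and data files
- An archive file (.zip, .tar, .tar.gz, .tgz) containing all of the files
- A single .yaml or .yaml.gz file (but not submission.yaml or a YAML data file)
- A submission.yaml file or an individual YAML data file (via Python only, not via the command line)
The same package is used to validate uploads made to hepdata.net, so validating offline first is an efficient way to check that your submission is valid before uploading.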
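Command line
Installing the hepdata-validator package adds the command hepdata-validate to your path, which allows you to validate a HEPData submission offline.
Examples
To validate a submission comprising multiple files in the current directory: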
$ hepdata-validate
To validate a submission comprising multiple files in another directory:
$ hepdata-validate -d ../TestHEPSubmission
To validate an archive file (.zip, .tar, .tar.gz, .tgz) in the current directory:
$ hepdata-validate -a TestHEPSubmission.zip
To validate a single YAML file in the current directory:
$ hepdata-validate -f single_yaml_file.yaml
Usage options:
$ hepdata-validate --help
Usage: hepdata-validate [OPTIONS]

  Offline validation of submission.yaml and YAML data files. Can check either
  a directory, an archive file, or the single YAML file format.

Options:
  -d, --directory TEXT  Directory to check (defaults to current working
                        directory)
  -f, --file TEXT       Single .yaml or .yaml.gz file (but not submission.yaml
                        or a YAML data file) to check - see
                        https://hepdata-submission.readthedocs.io/en/latest/single_yaml.html.
                        (Overrides directory)
  -a, --archive TEXT    Archive file (.zip, .tar, .tar.gz, .tgz) to check.
                        (Overrides directory and file)
  --help                Show this message and exit.
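When the command is driven from a script, the choice between -d, -a and -f can be made from the path itself. The following is a minimal sketch, not part of the package: build_validate_command and run_validation are hypothetical helper names, and the suffix lists simply mirror the help text above. Treating a non-zero exit status as a failed validation is an assumption.

```python
import subprocess
from pathlib import Path

# Suffixes taken from the hepdata-validate help text.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")
SINGLE_YAML_SUFFIXES = (".yaml", ".yaml.gz")


def build_validate_command(target="."):
    """Map a path to the matching hepdata-validate option (-a, -f or -d)."""
    name = Path(target).name.lower()
    if name.endswith(ARCHIVE_SUFFIXES):
        return ["hepdata-validate", "-a", str(target)]
    if name.endswith(SINGLE_YAML_SUFFIXES):
        # -f expects the single YAML file format; this helper cannot tell
        # that apart from submission.yaml or a YAML data file by name alone.
        return ["hepdata-validate", "-f", str(target)]
    # Anything else is treated as a submission directory.
    return ["hepdata-validate", "-d", str(target)]


def run_validation(target="."):
    """Run hepdata-validate on `target` and return its exit status as-is."""
    return subprocess.run(build_validate_command(target)).returncode
```

For example, build_validate_command("TestHEPSubmission.zip") produces the same arguments as the archive example above.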
Python
Validating a full submission
To validate a full submission, instantiate a FullSubmissionValidator object:
from hepdata_validator.full_submission_validator import FullSubmissionValidator, SchemaType
full_submission_validator = FullSubmissionValidator()
# validate a directory
is_dir_valid = full_submission_validator.validate(directory='TestHEPSubmission')
# or uncomment to validate an archive file
# is_archive_valid = full_submission_validator.validate(archive='TestHEPSubmission.zip')
# or uncomment to validate a single file
# is_file_valid = full_submission_validator.validate(file='single_yaml_file.yaml')
# if there are any error messages, they are retrievable through this call
full_submission_validator.get_messages()
# the error messages can be printed for each file
full_submission_validator.print_errors('submission.yaml')
# the list of valid files can be retrieved via the valid_files property, which is a
# dict mapping SchemaType (e.g. SUBMISSION, DATA, SINGLE_YAML, REMOTE) to lists of
# valid files
full_submission_validator.valid_files[SchemaType.SUBMISSION]
full_submission_validator.valid_files[SchemaType.DATA]
# full_submission_validator.valid_files[SchemaType.SINGLE_YAML]
# if a remote schema is used, valid_files is a list of tuples (schema, file)
# full_submission_validator.valid_files[SchemaType.REMOTE]
# the list of valid files can be printed
full_submission_validator.print_valid_files()
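The example above prints errors for submission.yaml only. To report errors for every file that has them, one option is a small wrapper such as the sketch below. report_all_errors is a hypothetical helper, not part of the package, and it assumes that get_messages() called without arguments returns a mapping keyed by file name. It accepts any object with validate, get_messages and print_errors methods.

```python
def report_all_errors(validator, **target):
    """Validate `target` and print the errors of every file that has any.

    `target` is forwarded unchanged to validator.validate(), for example
    directory='TestHEPSubmission' or archive='TestHEPSubmission.zip'.
    Returns the boolean result of validate().
    """
    is_valid = validator.validate(**target)
    if not is_valid:
        # Assumption: get_messages() yields file names when iterated,
        # i.e. it is a dict mapping file name -> list of messages.
        for file_name in validator.get_messages():
            validator.print_errors(file_name)
    return is_valid
```

With the objects defined above, this would be called as report_all_errors(full_submission_validator, directory='TestHEPSubmission').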
Validating individual files
To validate submission files, instantiate a SubmissionFileValidator object:
from hepdata_validator.submission_file_validator import SubmissionFileValidator
submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'
# the validate method takes a string representing the file path
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)
# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()
# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)
To validate data files, instantiate a DataFileValidator object:
from hepdata_validator.data_file_validator import DataFileValidator
data_file_validator = DataFileValidator()
# the validate method takes a string representing the file path
data_file_validator.validate(file_path='data.yaml')
# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()
# the error messages can be printed
data_file_validator.print_errors('data.yaml')
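Optionally, if you have already loaded the YAML object, you can pass it through as the data argument. You must still pass file_path, since it is used as the key for the error message lookup map.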
from hepdata_validator.data_file_validator import DataFileValidator
import yaml
# open the file in a with block so that it is closed after loading
with open('data.yaml', 'r') as f:
    file_contents = yaml.safe_load(f)
data_file_validator = DataFileValidator()
data_file_validator.validate(file_path='data.yaml', data=file_contents)
data_file_validator.get_messages('data.yaml')
data_file_validator.print_errors('data.yaml')
An analogous example for the SubmissionFileValidator:
from hepdata_validator.submission_file_validator import SubmissionFileValidator
import yaml
submission_file_path = 'submission.yaml'
# convert a generator returned by yaml.safe_load_all into a list
# (inside the with block, because the generator reads the file lazily)
with open(submission_file_path, 'r') as f:
    docs = list(yaml.safe_load_all(f))
submission_file_validator = SubmissionFileValidator()
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path, data=docs)
submission_file_validator.print_errors(submission_file_path)
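Once the documents of submission.yaml are loaded as a list, they can also be used to find the data files to pass to a DataFileValidator. The sketch below assumes the usual HEPData layout, in which each table document names its data file under a data_file key; that key is not shown in the text above, and list_data_files is a hypothetical helper rather than part of the package.

```python
def list_data_files(docs):
    """Return the data file names referenced by loaded submission documents.

    `docs` is the list built from yaml.safe_load_all(). Documents that are
    not mappings (for example an empty document, loaded as None) or that
    have no 'data_file' key are skipped.
    """
    names = []
    for doc in docs:
        if isinstance(doc, dict) and "data_file" in doc:
            names.append(doc["data_file"])
    return names
```

Each returned name is relative to the submission directory and can be passed as file_path to DataFileValidator.validate after joining it with that directory.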
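Schema Versions
The native HEPData JSON schemas exist in multiple versions. In most cases you should use the latest version (the default). If you need a different version, pass the keyword argument schema_version when initialising the validator: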
submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')
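Remote Schemas
When using remotely defined schemas, versions depend on the organization providing those schemas, and it is their responsibility to offer a way of keeping track of different schema versions.
The JsonSchemaResolver object resolves $ref in the JSON schema. The HTTPSchemaDownloader object retrieves schemas from a remote location and optionally saves them in the local file system, following the structure schemas_remote/<org>/<project>/<version>/<schema_name>. An example may be: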
from hepdata_validator.data_file_validator import DataFileValidator
data_validator = DataFileValidator()
# Split remote schema path and schema name
schema_path = 'https://scikit-hep.org/pyhf/schemas/1.0.0/'
schema_name = 'workspace.json'
# Create JsonSchemaResolver object to resolve $ref in JSON schema
from hepdata_validator.schema_resolver import JsonSchemaResolver
pyhf_resolver = JsonSchemaResolver(schema_path)
# Create HTTPSchemaDownloader object to validate against remote schema
from hepdata_validator.schema_downloader import HTTPSchemaDownloader
pyhf_downloader = HTTPSchemaDownloader(pyhf_resolver, schema_path)
# Retrieve and save the remote schema in the local path
pyhf_type = pyhf_downloader.get_schema_type(schema_name)
pyhf_spec = pyhf_downloader.get_schema_spec(schema_name)
pyhf_downloader.save_locally(schema_name, pyhf_spec)
# Load the custom schema as a custom type
import os
pyhf_path = os.path.join(pyhf_downloader.schemas_path, schema_name)
data_validator.load_custom_schema(pyhf_type, pyhf_path)
# Validate a specific schema instance
data_validator.validate(file_path='pyhf_workspace.json', file_type=pyhf_type)
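The native HEPData JSON schemas are provided as part of the hepdata-validator package, so it is not necessary to download them. However, in principle, for testing purposes, the same mechanism as above could be used with: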
schema_path = 'https://hepdata.net/submission/schemas/1.1.1/'
schema_name = 'data_schema.json'
and by passing a HEPData YAML data file as the file_path argument of the validate method.
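Both remote examples start from a schema_path that ends in a slash and a separate schema_name. If only the full schema URL is known, the two parts can be obtained by splitting at the last slash. The helper below is a sketch with a hypothetical name (split_schema_url); it is not part of the package.

```python
def split_schema_url(url):
    """Split a full schema URL into (schema_path, schema_name).

    schema_path keeps its trailing slash, matching the examples above.
    """
    base, sep, name = url.rpartition("/")
    if not sep or not name:
        raise ValueError(f"not a URL pointing to a schema file: {url!r}")
    return base + "/", name
```

For the pyhf example, split_schema_url('https://scikit-hep.org/pyhf/schemas/1.0.0/workspace.json') gives the schema_path and schema_name values used above.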