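Extending the python json package functionality
Project description
JSON Extended
A module to extend the python json package functionality:
- Treat a directory structure like a nested dictionary
- Lightweight plugin system: define bespoke classes for parsing different file extensions (built in: .json, .csv, .hdf5) and for encoding/decoding objects
- Lazy loading: read files only when they are indexed into
- Tab completion: index by attribute, with tab completion, for quick exploration of data
- Manipulation of nested dictionaries:
  - enhanced pretty printer
  - JavaScript-rendered, expandable tree in the Jupyter Notebook
  - functions including filter, merge, flatten, unflatten and diff
  - output to a directory structure (of n folder levels)
- On-disk indexing option for large json files (using the ijson package)
- Units schema concept to apply and convert physical units (using the pint package)
Documentation: https://jsonextended.readthedocs.io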
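Installation
From Conda (recommended):
conda install -c conda-forge jsonextended
From PyPi:
pip install jsonextended
jsonextended has no import dependencies on Python 3.x and requires only pathlib2 on Python 2.7. For full functionality, however, installing the following packages is advised: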
conda install -c conda-forge ijson numpy pint h5py pandas
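To see which of the optional packages are already available in an environment, a small check along these lines may help. This is only a sketch for Python 3 using the standard library; the helper name find_missing is made up for illustration and is not part of jsonextended.

```python
import importlib.util

# Optional packages named in the install command above.
OPTIONAL_PACKAGES = ["ijson", "numpy", "pint", "h5py", "pandas"]


def find_missing(packages):
    """Return the names in `packages` that cannot be imported here."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(OPTIONAL_PACKAGES)
print("missing optional packages:", missing or "none")
```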
Basic Example
from jsonextended import edict, plugins, example_mockpaths
Take a directory structure, potentially containing multiple file types:
datadir = example_mockpaths.directory1
print(datadir.to_string(indentlvl=3,file_content=True))
Folder("dir1")
   File("file1.json") Contents:
      {"key2": {"key3": 4, "key4": 5}, "key1": [1, 2, 3]}
   Folder("subdir1")
      File("file1.csv") Contents:
         # a csv file
         header1,header2,header3
         val1,val2,val3
         val4,val5,val6
         val7,val8,val9
      File("file1.literal.csv") Contents:
         # a csv file with numbers
         header1,header2,header3
         1,1.1,string1
         2,2.2,string2
         3,3.3,string3
   Folder("subdir2")
      Folder("subsubdir21")
         File("file1.keypair") Contents:
            # a key-pair file
            key1 val1
            key2 val2
            key3 val3
            key4 val4
Plugins can be defined for parsing each file type (see the Creating and Loading Plugins section):
plugins.load_builtin_plugins('parsers')
plugins.view_plugins('parsers')
{'csv.basic': 'read *.csv delimited file with headers to {header:[column_values]}',
'csv.literal': 'read *.literal.csv delimited files with headers to {header:column_values}, with number strings converted to int/float',
'hdf5.read': 'read *.hdf5 (in read mode) files using h5py',
'json.basic': 'read *.json files using json.load',
'keypair': "read *.keypair, where each line should be; '<key> <pair>'"}
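Note that file1.literal.csv matches both the *.csv and the *.literal.csv patterns. The Interface specifications section states that a file is parsed by the longest pattern it matches. The sketch below illustrates that rule with the standard library only. The PATTERNS registry and the pick_parser helper are hypothetical, and glob-style matching via fnmatch is an assumption about how the patterns behave, not a description of the library's internals.

```python
from fnmatch import fnmatch

# Hypothetical registry: plugin name -> glob-style file pattern,
# taken from the plugin descriptions listed above.
PATTERNS = {
    "csv.basic": "*.csv",
    "csv.literal": "*.literal.csv",
    "json.basic": "*.json",
}


def pick_parser(filename, patterns=PATTERNS):
    """Return the plugin whose matching pattern is longest, or None."""
    matches = [
        (len(pattern), name)
        for name, pattern in patterns.items()
        if fnmatch(filename, pattern)
    ]
    return max(matches)[1] if matches else None


print(pick_parser("file1.literal.csv"))  # csv.literal
print(pick_parser("file1.csv"))          # csv.basic
```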
LazyLoad then takes a path name, a path-like object or a dict-like object, and lazily loads each file with a compatible plugin.
lazy = edict.LazyLoad(datadir)
lazy
{file1.json:..,subdir1:..,subdir2:..}
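The idea behind lazy loading can be sketched with the standard library alone. The LazyDir class below is a hypothetical, much reduced stand-in for LazyLoad: it only understands JSON files and it is not how jsonextended is implemented. It does show the key property, namely that listing keys touches no file and a file is read on first access only.

```python
import json
import tempfile
from pathlib import Path


class LazyDir:
    """Dict-like view of a directory; JSON files are read only when indexed."""

    def __init__(self, path):
        self._path = Path(path)
        self._cache = {}
        self.reads = 0  # counts file reads, to make the laziness visible

    def keys(self):
        return sorted(p.name for p in self._path.iterdir())

    def __getitem__(self, key):
        if key not in self._cache:
            child = self._path / key
            if child.is_dir():
                self._cache[key] = LazyDir(child)
            else:
                self.reads += 1
                with child.open() as handle:
                    self._cache[key] = json.load(handle)
        return self._cache[key]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "subdir1").mkdir()
    (root / "file1.json").write_text('{"key1": [1, 2, 3]}')
    lazy_dir = LazyDir(root)
    listed = lazy_dir.keys()                 # listing reads no file
    reads_before = lazy_dir.reads
    value = lazy_dir["file1.json"]["key1"]   # first access reads the file
    reads_after = lazy_dir.reads
```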
LazyLoad can then be treated like a dictionary, or indexed by attribute with tab completion:
list(lazy.keys())
['subdir1', 'subdir2', 'file1.json']
lazy[['file1.json','key1']]
[1, 2, 3]
lazy.subdir1.file1_literal_csv.header2
[1.1, 2.2, 3.3]
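Attribute access such as file1_literal_csv implies that keys are mapped to Python-safe names. One way this could work is sketched below. The AttrDict class is hypothetical and the rule of replacing every non-word character with an underscore is an assumption. Keys that start with a digit would still not be valid attribute names.

```python
import re


class AttrDict:
    """Wrap a dict so keys are reachable as attributes and listed by dir()."""

    def __init__(self, data):
        self._data = data
        # Map a Python-safe attribute name to the real key,
        # e.g. 'file1_literal_csv' -> 'file1.literal.csv'.
        self._names = {re.sub(r"\W", "_", str(k)): k for k in data}

    def __dir__(self):
        # Interactive shells call dir() to collect tab-completion candidates.
        return sorted(self._names)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        names = self.__dict__.get("_names", {})
        if name not in names:
            raise AttributeError(name)
        value = self._data[names[name]]
        return AttrDict(value) if isinstance(value, dict) else value


tree = AttrDict({"subdir1": {"file1.literal.csv": {"header2": [1.1, 2.2, 3.3]}}})
result = tree.subdir1.file1_literal_csv.header2
```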
For pretty printing of the dictionary:
edict.pprint(lazy,depth=2)
file1.json:
  key1: [1, 2, 3]
  key2: {...}
subdir1:
  file1.csv: {...}
  file1.literal.csv: {...}
subdir2:
  subsubdir21: {...}
Numerous functions exist to manipulate nested dictionaries:
edict.flatten(lazy.subdir1)
{('file1.csv', 'header1'): ['val1', 'val4', 'val7'],
('file1.csv', 'header2'): ['val2', 'val5', 'val8'],
('file1.csv', 'header3'): ['val3', 'val6', 'val9'],
('file1.literal.csv', 'header1'): [1, 2, 3],
('file1.literal.csv', 'header2'): [1.1, 2.2, 3.3],
('file1.literal.csv', 'header3'): ['string1', 'string2', 'string3']}
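The flattening shown above maps each leaf value to the tuple of keys leading to it. A minimal standalone sketch of that transformation and its inverse follows. These two functions are illustrative only and are not the edict implementations, which accept further options.

```python
def flatten(nested, parent=()):
    """Flatten a nested dict into {tuple_of_keys: leaf_value}."""
    flat = {}
    for key, value in nested.items():
        path = parent + (key,)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            # Non-dict values and empty dicts are kept as leaves.
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild the nested dict from tuple keys (inverse of flatten)."""
    nested = {}
    for path, value in flat.items():
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


sample = {"file1.csv": {"header1": ["val1", "val4"]}, "count": 2}
flat = flatten(sample)
```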
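LazyLoad passes the plugins.decode function to the parser plugin's read_file method (as the keyword argument 'object_hook'). Bespoke decoder plugins can therefore be set up for specific dictionary key signatures: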
print(example_mockpaths.jsonfile2.to_string())
File("file2.json") Contents:
{"key1":{"_python_set_": [1, 2, 3]},"key2":{"_numpy_ndarray_": {"dtype": "int64", "value": [1, 2, 3]}}}
edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
{u'key1': {u'_python_set_': [1, 2, 3]},
u'key2': {u'_numpy_ndarray_': {u'dtype': u'int64', u'value': [1, 2, 3]}}}
plugins.load_builtin_plugins('decoders')
plugins.view_plugins('decoders')
{'decimal.Decimal': 'encode/decode Decimal type',
'numpy.ndarray': 'encode/decode numpy.ndarray',
'pint.Quantity': 'encode/decode pint.Quantity object',
'python.set': 'decode/encode python set'}
dct = edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
dct
{u'key1': {1, 2, 3}, u'key2': array([1, 2, 3])}
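The object_hook mechanism is part of the standard json module: the hook is called with every decoded JSON object and its return value replaces that object. The sketch below uses a hand-written hook, decode_set, in place of plugins.decode. The function name and the exact-key matching rule are assumptions made for illustration.

```python
import json


def decode_set(dct):
    """object_hook: turn {'_python_set_': [...]} into a set, else pass through."""
    if set(dct) == {"_python_set_"}:
        return set(dct["_python_set_"])
    return dct


text = '{"key1": {"_python_set_": [1, 2, 3]}, "key2": {"other": [4]}}'
decoded = json.loads(text, object_hook=decode_set)
print(decoded)
```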
This process can be reversed using encoder plugins:
plugins.load_builtin_plugins('encoders')
plugins.view_plugins('encoders')
{'decimal.Decimal': 'encode/decode Decimal type',
'numpy.ndarray': 'encode/decode numpy.ndarray',
'pint.Quantity': 'encode/decode pint.Quantity object',
'python.set': 'decode/encode python set'}
import json
json.dumps(dct,default=plugins.encode)
'{"key2": {"_numpy_ndarray_": {"dtype": "int64", "value": [1, 2, 3]}}, "key1": {"_python_set_": [1, 2, 3]}}'
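The default argument of json.dumps is the standard hook for objects the json module cannot serialise itself. It is called with the object and must return something serialisable or raise TypeError. The sketch below replaces plugins.encode with a hand-written function, encode_set, whose name is hypothetical.

```python
import json


def encode_set(obj):
    """default hook: serialise a set as {'_python_set_': [...]}."""
    if isinstance(obj, set):
        # Sorting gives a stable output; it assumes the items are orderable.
        return {"_python_set_": sorted(obj)}
    raise TypeError("cannot serialise %s" % type(obj).__name__)


encoded = json.dumps({"key1": {3, 1, 2}}, default=encode_set)
print(encoded)  # {"key1": {"_python_set_": [1, 2, 3]}}
```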
Creating and Loading Plugins
from jsonextended import plugins, utils
Plugins are recognised as classes with a minimal set of attributes matching the plugin category interface:
plugins.view_interfaces()
{'decoders': ['plugin_name', 'plugin_descript', 'dict_signature'],
'encoders': ['plugin_name', 'plugin_descript', 'objclass'],
'parsers': ['plugin_name', 'plugin_descript', 'file_regex', 'read_file']}
plugins.unload_all_plugins()
plugins.view_plugins()
{'decoders': {}, 'encoders': {}, 'parsers': {}}
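Recognising a plugin by its attributes amounts to a duck-typing check. The sketch below reuses the attribute lists printed by view_interfaces. The matching_categories helper and the SetCodec class are hypothetical, and the real loader may apply further checks.

```python
# Required attributes per category, copied from the view_interfaces output.
INTERFACES = {
    "decoders": ["plugin_name", "plugin_descript", "dict_signature"],
    "encoders": ["plugin_name", "plugin_descript", "objclass"],
    "parsers": ["plugin_name", "plugin_descript", "file_regex", "read_file"],
}


def matching_categories(cls, interfaces=INTERFACES):
    """Return the categories whose required attributes `cls` all defines."""
    return sorted(
        category
        for category, required in interfaces.items()
        if all(hasattr(cls, attr) for attr in required)
    )


class SetCodec(object):
    """Hypothetical class that satisfies two interfaces at once."""
    plugin_name = "example.set"
    plugin_descript = "hypothetical plugin used only for this sketch"
    dict_signature = ("_python_set_",)
    objclass = set


print(matching_categories(SetCodec))  # ['decoders', 'encoders']
```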
For example, a simple parser plugin would be:
class ParserPlugin(object):
    plugin_name = 'example'
    plugin_descript = 'a parser for *.example files, that outputs (line_number:line)'
    file_regex = '*.example'

    def read_file(self, file_obj, **kwargs):
        # map each zero-based line number to the stripped line
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict
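Because read_file only needs an iterable file object, a parser of this shape can be tried out on an in-memory file before it is registered. The sketch below repeats the class so that it runs on its own and uses io.StringIO as the stand-in file.

```python
import io


class ParserPlugin(object):
    plugin_name = 'example'
    plugin_descript = 'a parser for *.example files, that outputs (line_number:line)'
    file_regex = '*.example'

    def read_file(self, file_obj, **kwargs):
        # map each zero-based line number to the stripped line
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict


parsed = ParserPlugin().read_file(io.StringIO("first line\n  second line  \n"))
print(parsed)  # {0: 'first line', 1: 'second line'}
```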
Plugins can be loaded as a class:
plugins.load_plugin_classes([ParserPlugin],'parsers')
plugins.view_plugins()
{'decoders': {},
'encoders': {},
'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)'}}
Or by directory (loading all .py files):
fobj = utils.MockPath('example.py',is_file=True,content="""
class ParserPlugin(object):
    plugin_name = 'example.other'
    plugin_descript = 'a parser for *.example.other files, that outputs (line_number:line)'
    file_regex = '*.example.other'

    def read_file(self, file_obj, **kwargs):
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict
""")
dobj = utils.MockPath(structure=[fobj])
plugins.load_plugins_dir(dobj,'parsers')
plugins.view_plugins()
{'decoders': {},
'encoders': {},
'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)',
'example.other': 'a parser for *.example.other files, that outputs (line_number:line)'}}
For a more complex example of a parser, see jsonextended.complex_parsers
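Interface specifications
- Parsers:
  - file_regex attribute, a str denoting which files to apply the parser to. A file is parsed by the plugin with the longest regex it matches.
  - read_file method, which takes a file object and kwargs as parameters
- Decoders:
  - dict_signature attribute, a tuple denoting the keys which the dictionary must have, e.g. dict_signature=('a','b') decodes {'a':1,'b':2}
  - from_... method(s), which take a dict object as parameter. The plugins.decode function will use the method denoted by the intype parameter, e.g. if intype='json', then from_json will be called.
- Encoders:
  - objclass attribute, the object class to apply the encoding to, e.g. objclass=decimal.Decimal encodes objects of that type
  - to_... method(s), which take the object to encode as parameter. The plugins.encode function will use the method denoted by the outtype parameter, e.g. if outtype='json', then to_json will be called.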
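The decoder interface can be illustrated with a small dispatcher. In the sketch below, SetDecoder and decode are hypothetical stand-ins written for this example. In particular, requiring the dictionary keys to equal dict_signature exactly is an assumption, and plugins.decode may behave differently.

```python
class SetDecoder(object):
    plugin_name = "example.set"
    plugin_descript = "hypothetical decoder for {'_python_set_': [...]}"
    dict_signature = ("_python_set_",)

    def from_json(self, dct):
        return set(dct["_python_set_"])


def decode(dct, decoders, intype="json"):
    """Apply the first decoder whose signature and from_<intype> method fit."""
    for decoder in decoders:
        # Assumption: the keys must equal the signature exactly.
        if set(dct) == set(decoder.dict_signature):
            method = getattr(decoder, "from_" + intype, None)
            if method is not None:
                return method(dct)
    return dct  # no decoder applies, so the dict is returned unchanged


decoders = [SetDecoder()]
print(decode({"_python_set_": [1, 2]}, decoders))  # {1, 2}
```

A decoder without a from_yaml method is skipped when intype='yaml', which is why the method name is looked up with getattr rather than called directly.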
Extended Examples
For more information, all functions contain docstrings with tested examples.
Data Folders JSONisation
from jsonextended import ejson, edict, utils
path = utils.get_test_path()
ejson.jkeys(path)
['dir1', 'dir2', 'dir3']
jdict1 = ejson.to_dict(path)
edict.pprint(jdict1,depth=2)
dir1:
  dir1_1: {...}
  file1: {...}
  file2: {...}
dir2:
  file1: {...}
dir3:
edict.to_html(jdict1,depth=2)
To try the rendered JSON tree, as output in the Jupyter Notebook, go to: https://chrisjsewell.github.io/
Nested Dictionary Manipulation
jdict2 = ejson.to_dict(path,['dir1','file1'])
edict.pprint(jdict2,depth=1)
initial: {...}
meta: {...}
optimised: {...}
units: {...}
filtered = edict.filter_keys(jdict2,['vol*'],use_wildcards=True)
edict.pprint(filtered)
initial:
  crystallographic:
    volume: 924.62752781
  primitive:
    volume: 462.313764
optimised:
  crystallographic:
    volume: 1063.98960509
  primitive:
    volume: 531.994803
edict.pprint(edict.flatten(filtered))
(initial, crystallographic, volume): 924.62752781
(initial, primitive, volume): 462.313764
(optimised, crystallographic, volume): 1063.98960509
(optimised, primitive, volume): 531.994803
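Wildcard filtering of this kind keeps every branch that leads to a matching key and drops the rest. A minimal sketch using fnmatch from the standard library follows. The function filter_by_pattern is hypothetical and only approximates what edict.filter_keys does with use_wildcards=True.

```python
from fnmatch import fnmatch


def filter_by_pattern(nested, patterns):
    """Keep only branches that lead to a key matching one of `patterns`."""
    kept = {}
    for key, value in nested.items():
        if any(fnmatch(str(key), pattern) for pattern in patterns):
            kept[key] = value
        elif isinstance(value, dict):
            sub = filter_by_pattern(value, patterns)
            if sub:  # drop branches with no matching key
                kept[key] = sub
    return kept


data = {
    "initial": {"primitive": {"volume": 462.313764, "a": 1}},
    "meta": {"name": "x"},
}
print(filter_by_pattern(data, ["vol*"]))
# {'initial': {'primitive': {'volume': 462.313764}}}
```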
Units Schema
from jsonextended.units import apply_unitschema, split_quantities
withunits = apply_unitschema(filtered,{'volume':'angstrom^3'})
edict.pprint(withunits)
initial:
  crystallographic:
    volume: 924.62752781 angstrom ** 3
  primitive:
    volume: 462.313764 angstrom ** 3
optimised:
  crystallographic:
    volume: 1063.98960509 angstrom ** 3
  primitive:
    volume: 531.994803 angstrom ** 3
newunits = apply_unitschema(withunits,{'volume':'nm^3'})
edict.pprint(newunits)
initial:
  crystallographic:
    volume: 0.92462752781 nanometer ** 3
  primitive:
    volume: 0.462313764 nanometer ** 3
optimised:
  crystallographic:
    volume: 1.06398960509 nanometer ** 3
  primitive:
    volume: 0.531994803 nanometer ** 3
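The converted values can be checked by hand: 1 angstrom is 0.1 nm, so 1 angstrom^3 is 0.1^3 = 0.001 nm^3. The sketch below does that arithmetic in plain Python without pint. The helper name is made up for this example.

```python
ANGSTROM_IN_NM = 0.1  # 1 angstrom = 0.1 nanometre


def cubic_angstrom_to_cubic_nm(volume):
    """Convert a volume from angstrom^3 to nm^3."""
    return volume * ANGSTROM_IN_NM ** 3


converted = cubic_angstrom_to_cubic_nm(924.62752781)
print(round(converted, 11))  # 0.92462752781
```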
edict.pprint(split_quantities(newunits),depth=4)
initial:
  crystallographic:
    volume:
      magnitude: 0.92462752781
      units: nanometer ** 3
  primitive:
    volume:
      magnitude: 0.462313764
      units: nanometer ** 3
optimised:
  crystallographic:
    volume:
      magnitude: 1.06398960509
      units: nanometer ** 3
  primitive:
    volume:
      magnitude: 0.531994803
      units: nanometer ** 3