Python library for converting nested dicts or JSON objects to tables and back
json-flattener
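Python library for denormalizing/flattening lists of complex objects to tables/data frames, with roundtripping
Notebook example
Description
Given YAML/JSON/JSON-Lines such as: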
- id: S001
  name: Lord of the Rings
  genres:
    - fantasy
  creator:
    name: JRR Tolkein
    from_country: England
  books:
    - id: S001.1
      name: Fellowship of the Ring
      price: 5.99
      summary: Hobbits
    - id: S001.2
      name: The Two Towers
      price: 5.99
      summary: More hobbits
    - id: S001.3
      name: Return of the King
      price: 6.99
      summary: Yet more hobbits
- id: S002
  name: The Culture Series
  genres:
    - scifi
  creator:
    name: Ian M Banks
    from_country: Scotland
  books:
    - id: S002.1
      name: Consider Phlebas
      price: 5.99
    - id: S002.2
      name: Player of Games
      price: 5.99
Use the jfl command to denormalize:
jfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -o examples/books1-flattened.tsv
id | name | genres | creator_name | creator_from_country | books_name | books_summary | books_price | books_id | creator_genres |
---|---|---|---|---|---|---|---|---|---|
S001 | Lord of the Rings | [fantasy] | JRR Tolkein | England | [Fellowship of the Ring\|The Two Towers\|Return of the King] | [Hobbits\|More hobbits\|Yet more hobbits] | [5.99\|5.99\|6.99] | [S001.1\|S001.2\|S001.3] | |
S002 | The Culture Series | [scifi] | Ian M Banks | Scotland | [Consider Phlebas\|Player of Games] | | [5.99\|5.99] | [S002.1\|S002.2] | |
Convert back to JSON/YAML:
jfl unflatten -C creator=flat -C books=multivalued -i examples/books1.tsv -o examples/books1.yaml
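The reverse direction can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `unflatten_row`, `flat_keys`, and `multivalued_keys` are hypothetical names standing in for the `-C creator=flat -C books=multivalued` configuration, and the sketch assumes all list-valued columns of one parent key have the same length.

```python
def unflatten_row(row, flat_keys=(), multivalued_keys=(), sep="_"):
    """Rebuild a nested object from one flattened row (hypothetical helper)."""
    obj = {}
    for column, value in row.items():
        # Find the configured parent key this column was derived from, if any.
        parent = next(
            (k for k in (*flat_keys, *multivalued_keys) if column.startswith(k + sep)),
            None,
        )
        if parent is None:
            obj[column] = value  # atom or list of atoms: keep as is
            continue
        inner_key = column[len(parent) + len(sep):]
        if parent in flat_keys:
            # creator_name -> creator: {name: ...}
            obj.setdefault(parent, {})[inner_key] = value
        else:
            # books_name -> books: [{name: ...}, {name: ...}], matched by position
            items = obj.setdefault(parent, [{} for _ in value])
            for item, v in zip(items, value):
                if v is not None:
                    item[inner_key] = v
    return obj


row = {
    "id": "S002",
    "name": "The Culture Series",
    "genres": ["scifi"],
    "creator_name": "Ian M Banks",
    "creator_from_country": "Scotland",
    "books_id": ["S002.1", "S002.2"],
    "books_name": ["Consider Phlebas", "Player of Games"],
    "books_price": [5.99, 5.99],
}
series = unflatten_row(row, flat_keys=["creator"], multivalued_keys=["books"])
```

The key configuration matters on the way back: a column name such as creator_from_country does not by itself say whether the parent key is creator or creator_from, so the sketch needs to be told which keys were flattened.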
The library also allows complex data to be serialized directly as JSON or YAML (by default, _json is appended to the key). For example:
jfl flatten -C creator=json -C books=json -i examples/books1.yaml -o examples/books1-jsonified.tsv
id | name | genres | creator_json | books_json |
---|---|---|---|---|
S001 | Lord of the Rings | [fantasy] | {"name": "JRR Tolkein", "from_country": "England"} | [{"id": "S001.1", "name": "Fellowship of the Ring", "summary": "Hobbits", "price": 5.99}, {"id": "S001.2", "name": "The Two Towers", "summary": "More hobbits", "price": 5.99}, {"id": "S001.3", "name": "Return of the King", "summary": "Yet more hobbits", "price": 6.99}] |
S002 | The Culture Series | [scifi] | {"name": "Ian M Banks", "from_country": "Scotland"} | [{"id": "S002.1", "name": "Consider Phlebas", "price": 5.99}, {"id": "S002.2", "name": "Player of Games", "price": 5.99}] |
S003 | Book of the New Sun | [scifi, fantasy] | {"name": "Gene Wolfe", "genres": ["scifi", "fantasy"], "from_country": "USA"} | [{"id": "S003.1", "name": "Shadow of the Torturer"}, {"id": "S003.2", "name": "Claw of the Conciliator", "price": 6.99}] |
S004 | Example with a single book | | {"name": "Ms Writer", "genres": ["romance"], "from_country": "USA"} | [{"id": "S004.1", "name": "Blah"}] |
S005 | Example with no books | | {"name": "Mr Unproductive", "genres": ["romance", "scifi", "fantasy"], "from_country": "USA"} | |
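The JSON-serialized variant is simple to sketch with the standard `json` module. The helper names `jsonify_keys` and `unjsonify_keys` are hypothetical and do not come from the library; only the `_json` suffix is taken from the description above.

```python
import json


def jsonify_keys(obj, json_keys, suffix="_json"):
    """Serialize the chosen keys to JSON strings, renaming key -> key_json."""
    row = {}
    for key, value in obj.items():
        if key in json_keys:
            row[key + suffix] = json.dumps(value)
        else:
            row[key] = value
    return row


def unjsonify_keys(row, suffix="_json"):
    """Reverse of jsonify_keys: parse every column that ends with the suffix."""
    obj = {}
    for column, value in row.items():
        if column.endswith(suffix):
            obj[column[: -len(suffix)]] = json.loads(value)
        else:
            obj[column] = value
    return obj


series = {
    "id": "S002",
    "name": "The Culture Series",
    "genres": ["scifi"],
    "creator": {"name": "Ian M Banks", "from_country": "Scotland"},
    "books": [
        {"id": "S002.1", "name": "Consider Phlebas", "price": 5.99},
        {"id": "S002.2", "name": "Player of Games", "price": 5.99},
    ],
}
row = jsonify_keys(series, json_keys={"creator", "books"})
restored = unjsonify_keys(row)
```

Because each complex value is kept whole inside one cell, this form round-trips without any per-key flattening rules, at the cost of cells that are harder to filter with column-oriented tools.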
See also
Slides: https://docs.google.com/presentation/d/e/2PACX-1vRyM06peU9BkrZbXJazuMlajw5s4Vbj5f0t0TE4hj_X9Ex_EASLSUZuaWUxYIhWbOC6CtPRtxrTGWQD/embed?start=false&loop=false&delayms=60000
The primary use case is to go from a rich normalized data model (as Python objects, JSON, or YAML) to a representation that is more amenable to processing with:
- Solr/Lucene
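- Pandas/R data frames
- Excel/Google Sheets
- Unix cut/grep/cat etc.
- Simple denormalized SQL database representations
The target denormalized format is a list of rows, or a data matrix, in which each cell is either an atom or a list of atoms.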
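As a rough illustration of consuming this format, the sketch below reads a flattened TSV with the standard `csv` module and splits list cells. The `[a|b|c]` cell encoding is inferred from the example table above, `parse_cell` is a hypothetical helper, and no type inference is attempted, so numbers stay as strings.

```python
import csv
import io


def parse_cell(cell, delimiter="|"):
    """Turn '[a|b|c]' into ['a', 'b', 'c']; leave atomic cells unchanged."""
    if cell.startswith("[") and cell.endswith("]"):
        inner = cell[1:-1]
        return inner.split(delimiter) if inner else []
    return cell


# Inline stand-in for a flattened file such as examples/books1-flattened.tsv
tsv_text = (
    "id\tname\tgenres\tbooks_name\tbooks_price\n"
    "S002\tThe Culture Series\t[scifi]\t[Consider Phlebas|Player of Games]\t[5.99|5.99]\n"
)

rows = [
    {column: parse_cell(cell) for column, cell in record.items()}
    for record in csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
]
```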
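Method
- Each top-level key becomes a column
- If the key value is a dict/object, it is flattened
  - By default, '_' separates the parent key from the inner key
  - E.g., creator combined with from_country becomes creator_from_country
  - Currently only one level of flattening is supported
- If the key value is a list of atomic entities, it is left as is
- If the key value is a list of dicts/objects, each key of the inner dict is flattened into a list
  - E.g., if books is a list of book objects and name is a key on book, then books_name is the list of names of each book
  - Order is significant: the first element of books_name matches the first element of books_price, and so on
- If configured, any key can be serialized as yaml/json/pickle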
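The rules above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `flatten_row` and its `sep` parameter are hypothetical names, and padding missing inner keys with `None` is an assumption made here to keep the lists aligned.

```python
def flatten_row(obj, sep="_"):
    """Flatten one object into a row whose cells are atoms or lists of atoms."""
    row = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            # dict/object: one column per inner key (one level only)
            for inner_key, inner_value in value.items():
                row[f"{key}{sep}{inner_key}"] = inner_value
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # list of dicts: collect inner keys in first-seen order
            inner_keys = []
            for item in value:
                for k in item:
                    if k not in inner_keys:
                        inner_keys.append(k)
            # one list-valued column per inner key; position i always refers to item i
            for k in inner_keys:
                row[f"{key}{sep}{k}"] = [item.get(k) for item in value]
        else:
            # atom or list of atoms: left as is
            row[key] = value
    return row


series = {
    "id": "S001",
    "name": "Lord of the Rings",
    "genres": ["fantasy"],
    "creator": {"name": "JRR Tolkein", "from_country": "England"},
    "books": [
        {"id": "S001.1", "name": "Fellowship of the Ring", "price": 5.99},
        {"id": "S001.2", "name": "The Two Towers", "price": 5.99},
    ],
}
row = flatten_row(series)
```

Here `row["creator_from_country"]` holds a single atom, while `row["books_name"]` and `row["books_price"]` are parallel lists in which the same index refers to the same book.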
Command-line usage (TODO)
Usage from Python
Documentation coming soon: for now, see the tests folder
Usage in LinkML
Comparison
Pandas json_normalize
Java json-flattener
https://github.com/wnameless/json-flattener
Python
csvjson