
Python library for denormalizing nested dicts or JSON objects to tables and back

Project description

json-flattener

Python library for denormalizing/flattening lists of complex objects into tables/data frames, with support for round-tripping

Notebook example

EXAMPLE.ipynb

Description

Given YAML/JSON/JSON-Lines such as:

- id: S001
  name: Lord of the Rings
  genres:
    - fantasy
  creator:
    name: JRR Tolkein
    from_country: England
  books:
    - id: S001.1
      name: Fellowship of the Ring
      price: 5.99
      summary: Hobbits
    - id: S001.2
      name: The Two Towers
      price: 5.99
      summary: More hobbits
    - id: S001.3
      name: Return of the King
      price: 6.99
      summary: Yet more hobbits
- id: S002
  name: The Culture Series
  genres:
    - scifi
  creator:
    name: Ian M Banks
    from_country: Scotland
  books:
    - id: S002.1
      name: Consider Phlebas
      price: 5.99
    - id: S002.2
      name: Player of Games
      price: 5.99

Denormalize it using the jfl command:

jfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -o examples/books1-flattened.tsv
id	name	genres	creator_name	creator_from_country	books_name	books_summary	books_price	books_id	creator_genres
S001	Lord of the Rings	[fantasy]	JRR Tolkein	England	[Fellowship of the Ring|The Two Towers|Return of the King]	[Hobbits|More hobbits|Yet more hobbits]	[5.99|5.99|6.99]	[S001.1|S001.2|S001.3]
S002	The Culture Series	[scifi]	Ian M Banks	Scotland	[Consider Phlebas|Player of Games]		[5.99|5.99]	[S002.1|S002.2]

Convert back to JSON/YAML:

jfl unflatten -C creator=flat -C books=multivalued -i examples/books1.tsv -o examples/books1.yaml
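
The reverse direction can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `unflatten_row` and its `flat` / `multivalued` parameters are hypothetical names that mirror the `-C creator=flat -C books=multivalued` options above, and the `_` separator is assumed.

```python
from typing import Any, Dict, Iterable


def unflatten_row(row: Dict[str, Any], flat: Iterable[str],
                  multivalued: Iterable[str], sep: str = "_") -> Dict[str, Any]:
    """Rebuild a nested object from one flattened row (illustrative only)."""
    flat = list(flat)
    multivalued = list(multivalued)
    obj: Dict[str, Any] = {}
    for column, value in row.items():
        # Find the configured parent key this column belongs to, if any.
        parent = next((p for p in flat + multivalued
                       if column.startswith(p + sep)), None)
        if parent is None:
            obj[column] = value  # plain top-level column
            continue
        inner_key = column[len(parent) + len(sep):]
        if parent in flat:
            # creator_name -> obj["creator"]["name"]
            obj.setdefault(parent, {})[inner_key] = value
        else:
            # books_name[i] -> obj["books"][i]["name"], matched by position
            items = obj.setdefault(parent, [])
            for i, v in enumerate(value):
                while len(items) <= i:
                    items.append({})
                if v is not None:
                    items[i][inner_key] = v
    return obj


row = {
    "id": "S002",
    "name": "The Culture Series",
    "genres": ["scifi"],
    "creator_name": "Ian M Banks",
    "creator_from_country": "Scotland",
    "books_id": ["S002.1", "S002.2"],
    "books_name": ["Consider Phlebas", "Player of Games"],
    "books_price": [5.99, 5.99],
}
obj = unflatten_row(row, flat=["creator"], multivalued=["books"])
print(obj["creator"])
print(obj["books"])
```

The key configuration has to be supplied again when unflattening because column names alone are ambiguous: `creator_from_country` could otherwise be split at either underscore.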

The library also allows complex values to be serialized directly as JSON or YAML (by default, _json is appended to the key). For example:

jfl flatten -C creator=json -C books=json -i examples/books1.yaml -o examples/books1-jsonified.tsv
id	name	genres	creator_json	books_json
S001	Lord of the Rings	[fantasy]	{"name": "JRR Tolkein", "from_country": "England"}	[{"id": "S001.1", "name": "Fellowship of the Ring", "summary": "Hobbits", "price": 5.99}, {"id": "S001.2", "name": "The Two Towers", "summary": "More hobbits", "price": 5.99}, {"id": "S001.3", "name": "Return of the King", "summary": "Yet more hobbits", "price": 6.99}]
S002	The Culture Series	[scifi]	{"name": "Ian M Banks", "from_country": "Scotland"}	[{"id": "S002.1", "name": "Consider Phlebas", "price": 5.99}, {"id": "S002.2", "name": "Player of Games", "price": 5.99}]
S003	Book of the New Sun	[scifi|fantasy]	{"name": "Gene Wolfe", "genres": ["scifi", "fantasy"], "from_country": "USA"}	[{"id": "S003.1", "name": "Shadow of the Torturer"}, {"id": "S003.2", "name": "Claw of the Conciliator", "price": 6.99}]
S004	Example with single book		{"name": "Ms Writer", "genres": ["romance"], "from_country": "USA"}	[{"id": "S004.1", "name": "Blah"}]
S005	Example with no books		{"name": "Mr Unproductive", "genres": ["romance", "scifi", "fantasy"], "from_country": "USA"}
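
The JSON-serialization mode can be sketched with the standard `json` module. This is an illustration under assumptions, not the library's code: `jsonify_keys` and `unjsonify_keys` are hypothetical helpers, and only the `_json` suffix is taken from the text above.

```python
import json
from typing import Any, Dict, Iterable


def jsonify_keys(obj: Dict[str, Any], keys: Iterable[str],
                 suffix: str = "_json") -> Dict[str, Any]:
    """Serialize the chosen keys to JSON strings, renaming key -> key + suffix."""
    keys = set(keys)
    row: Dict[str, Any] = {}
    for key, value in obj.items():
        if key in keys:
            row[key + suffix] = json.dumps(value)
        else:
            row[key] = value
    return row


def unjsonify_keys(row: Dict[str, Any], suffix: str = "_json") -> Dict[str, Any]:
    """Inverse of jsonify_keys: parse every column that carries the suffix."""
    obj: Dict[str, Any] = {}
    for column, value in row.items():
        if column.endswith(suffix):
            obj[column[: -len(suffix)]] = json.loads(value)
        else:
            obj[column] = value
    return obj


series = {
    "id": "S002",
    "name": "The Culture Series",
    "genres": ["scifi"],
    "creator": {"name": "Ian M Banks", "from_country": "Scotland"},
    "books": [
        {"id": "S002.1", "name": "Consider Phlebas", "price": 5.99},
        {"id": "S002.2", "name": "Player of Games", "price": 5.99},
    ],
}
row = jsonify_keys(series, ["creator", "books"])
print(row["creator_json"])  # {"name": "Ian M Banks", "from_country": "Scotland"}
```

Because each serialized cell is self-describing, this mode round-trips without needing per-key structure hints, at the cost of cells that are harder to filter with tools such as cut or grep.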

See the embedded presentation:

https://docs.google.com/presentation/d/e/2PACX-1vRyM06peU9BkrZbXJazuMlajw5s4Vbj5f0t0TE4hj_X9Ex_EASLSUZuaWUxYIhWbOC6CtPRtxrTGWQD/embed?start=false&loop=false&delayms=60000

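The primary use case is to go from a rich normalized data model (as Python objects, JSON, or YAML) to a representation that is more amenable to processing with:

  • Solr/Lucene
  • Pandas/R data frames
  • Excel/Google Sheets
  • Unix cut/grep/cat etc.
  • Simple denormalized SQL database representations

The target denormalized format is a list of rows (a data matrix) in which each cell is either an atom or a list of atoms.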
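
As a rough illustration of that target format, the sketch below renders rows whose cells are atoms or lists of atoms as tab-separated text. The `[a|b|c]` rendering of list cells is inferred from the example output earlier in this description; `render_cell` and `render_tsv` are hypothetical helpers, and escaping of `|` or tabs inside values is deliberately not handled.

```python
from typing import Any, Dict, List


def render_cell(value: Any) -> str:
    """Render one cell: empty for None, [a|b|c] for lists, str() otherwise."""
    if value is None:
        return ""
    if isinstance(value, list):
        return "[" + "|".join(str(v) for v in value) + "]"
    return str(value)


def render_tsv(rows: List[Dict[str, Any]], columns: List[str]) -> str:
    """Render a header line plus one tab-separated line per row."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(render_cell(row.get(c)) for c in columns))
    return "\n".join(lines)


rows = [
    {"id": "S002", "genres": ["scifi"], "books_price": [5.99, 5.99]},
    {"id": "S005", "genres": ["romance", "scifi"]},  # no books: empty cell
]
table = render_tsv(rows, ["id", "genres", "books_price"])
print(table)
```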

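Method

  • Each top-level key becomes a column
  • If the key value is a dict/object, it is flattened
    • By default, '_' is used to separate the parent key from the inner key
    • E.g. creator combined with from_country becomes creator_from_country
    • Currently only one level of flattening is supported
  • If the key value is a list of atomic entities, it is left as is
  • If the key value is a list of dicts/objects, each key of the inner dicts is flattened into a list
    • E.g. if books is a list of book objects and name is a key on book, then books_name is the list of names of each book
    • Ordering is significant: the first element of books_name matches the first element of books_price, and so on
  • If configured, any key can instead be serialized as yaml/json/pickle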
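
The rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: `flatten_object` is a hypothetical name, and filling missing inner keys with `None` (so that the per-key lists stay aligned by position) is an assumption of this sketch.

```python
from typing import Any, Dict, List

SEP = "_"  # separator between parent key and inner key


def flatten_object(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one object by a single level (illustrative sketch only)."""
    row: Dict[str, Any] = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            # dict/object: one column per inner key, e.g. creator_from_country
            for inner_key, inner_value in value.items():
                row[f"{key}{SEP}{inner_key}"] = inner_value
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # list of dicts: one list-valued column per inner key, order preserved
            inner_keys: List[str] = []
            for item in value:
                for k in item:
                    if k not in inner_keys:
                        inner_keys.append(k)
            for k in inner_keys:
                # None marks a missing value (assumption of this sketch)
                row[f"{key}{SEP}{k}"] = [item.get(k) for item in value]
        else:
            # atoms and lists of atoms are kept as they are
            row[key] = value
    return row


series = {
    "id": "S001",
    "name": "Lord of the Rings",
    "genres": ["fantasy"],
    "creator": {"name": "JRR Tolkein", "from_country": "England"},
    "books": [
        {"id": "S001.1", "name": "Fellowship of the Ring", "price": 5.99},
        {"id": "S001.2", "name": "The Two Towers", "price": 5.99},
    ],
}
row = flatten_object(series)
print(row["creator_from_country"])  # England
print(row["books_name"])            # ['Fellowship of the Ring', 'The Two Towers']
```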

Command line usage (TODO)

Usage from Python

Documentation coming soon: for now, see the tests folder

Use in LinkML

Comparisons

Pandas json_normalize

Java json-flattener

https://github.com/wnameless/json-flattener

Python

csvjson

https://csvjson.com/json2csv



Download files

Source distribution

json_flattener-0.1.9.tar.gz (11.5 kB)

Built distribution

json_flattener-0.1.9-py3-none-any.whl (10.8 kB, Python 3)
