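Export UNIHAN to Python, Data Package, CSV, JSON and YAML
Project description
unihan-tabular - a tool to build UNIHAN into tabular-friendly formats such as Python, JSON, CSV and YAML. Part of the cihai project.
UNIHAN's data is dispersed across multiple files in the following format: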
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
$ unihan-tabular will download Unihan.zip and build all of its files into a single tabular-friendly format.
CSV (default), $ unihan-tabular
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
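The step from the raw lines to the CSV rows can be sketched in a few lines of Python. This is only an illustration, not the code in unihan_tabular/process.py: the function names `parse_unihan` and `to_csv` are hypothetical, the sketch targets Python 3, and it relies on the UNIHAN source files being tab-separated (codepoint, field, value).

```python
import csv
import io
from collections import OrderedDict

# Sample lines in the raw UNIHAN layout: codepoint, field, value (tab-separated).
RAW = (
    "U+3400\tkCantonese\tjau1\n"
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound\n"
    "U+3400\tkMandarin\tqiū\n"
    "U+3401\tkCantonese\ttim2\n"
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark\n"
    "U+3401\tkHanyuPinyin\t10019.020:tiàn\n"
    "U+3401\tkMandarin\ttiàn\n"
)


def parse_unihan(lines):
    """Group raw lines into one dict per character (hypothetical helper)."""
    rows = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ucn, field, value = line.split("\t", 2)
        # Derive the character itself from the "U+XXXX" codepoint notation.
        row = rows.setdefault(ucn, {"char": chr(int(ucn[2:], 16)), "ucn": ucn})
        row[field] = value
    return list(rows.values())


def to_csv(rows, fields):
    """Render rows as CSV text; fields missing from a row become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["char", "ucn"] + fields, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]
rows = parse_unihan(RAW.splitlines())
csv_text = to_csv(rows, FIELDS)
print(csv_text)
```

The csv module quotes the second definition automatically because it contains commas, which is why that cell appears in double quotes in the CSV sample.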
JSON, $ unihan-tabular -F json
[
{
"char": "㐀",
"ucn": "U+3400",
"kCantonese": "jau1",
"kDefinition": "(same as U+4E18 丘) hillock or mound",
"kHanyuPinyin": null,
"kMandarin": "qiū"
},
{
"char": "㐁",
"ucn": "U+3401",
"kCantonese": "tim2",
"kDefinition": "to lick; to taste, a mat, bamboo bark",
"kHanyuPinyin": "10019.020:tiàn",
"kMandarin": "tiàn"
}
]
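Once exported, the JSON is an ordinary list of records and can be consumed with the standard library. The following is a minimal sketch, assuming the record layout shown above; the `lookup` helper is hypothetical and the data is inlined here rather than read from an exported file.

```python
import json

# Inlined copy of the sample export; in practice this would be read from the
# exported unihan.json file.
EXPORT = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kDefinition": "(same as U+4E18 丘) hillock or mound",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kDefinition": "to lick; to taste, a mat, bamboo bark",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""

records = json.loads(EXPORT)

# Index the records by character for constant-time lookups.
by_char = {record["char"]: record for record in records}


def lookup(char, field):
    """Return a field for a character, or None if either is absent (hypothetical helper)."""
    return by_char.get(char, {}).get(field)


print(lookup("㐁", "kDefinition"))
```

Fields that UNIHAN does not define for a character come through as JSON null, so callers should expect None for them in Python.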
YAML, $ unihan-tabular -F yaml
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
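Features
Automatically downloads UNIHAN from the internet
Exports to JSON, CSV and YAML (requires pyyaml) via -F
Configurable to export specific fields via -f
Accounts for encoding conflicts caused by the Unicode-heavy content
Designed as a technical proof for future CJK (Chinese, Japanese, Korean) datasets
Core component and dependency of cihai, a CJK library
Data Package support
Supports Python 2.7, >= 3.5 and pypy
If you encounter a problem or have a question, please create an issue.
Usage
unihan-tabular supports command line arguments. See unihan-tabular CLI arguments for information on how to specify custom columns, files, download URLs, and output destinations.
Download and build your own UNIHAN export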
$ pip install unihan-tabular
Output CSV, the default format
$ unihan-tabular
Output JSON
$ unihan-tabular -F json
Output YAML
$ pip install pyyaml
$ unihan-tabular -F yaml
Output only the kDefinition field, as CSV
$ unihan-tabular -f kDefinition
Output multiple fields, separated by spaces
$ unihan-tabular -f kCantonese kDefinition
Output to a custom file
$ unihan-tabular --destination ./exported.csv
Output to a custom file (templated file extension)
$ unihan-tabular --destination ./exported.{ext}
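One plausible way to read the {ext} placeholder is as a slot filled with the chosen export format. The sketch below illustrates that idea only; `resolve_destination` is a hypothetical name, and the substitution rule is an assumption rather than the tool's confirmed behavior.

```python
def resolve_destination(destination, fmt):
    """Fill an '{ext}' placeholder with the export format name, if present.

    Hypothetical helper: assumes the extension equals the format name
    (csv, json or yaml).
    """
    if "{ext}" in destination:
        return destination.replace("{ext}", fmt)
    return destination  # explicit file names are left untouched


print(resolve_destination("./exported.{ext}", "json"))
print(resolve_destination("./exported.csv", "yaml"))
```

Using a plain string replacement rather than str.format keeps any other braces in the path from being interpreted as template fields.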
For advanced usage examples, see the unihan-tabular CLI arguments.
Structure
# output w/ JSON
{XDG data dir}/unihan_tabular/unihan.json
# output w/ CSV
{XDG data dir}/unihan_tabular/unihan.csv
# output w/ yaml (requires pyyaml)
{XDG data dir}/unihan_tabular/unihan.yaml
# script to download + build a SDF csv of unihan.
unihan_tabular/process.py
# unit tests to verify behavior / consistency of builder
tests/*
# python 2/3 compatibility module
unihan_tabular/_compat.py
# utility / helper functions
unihan_tabular/util.py
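The {XDG data dir} prefix in the paths above refers to the per-user data directory. As a rough sketch, assuming a Linux-style setup where the XDG Base Directory convention applies (XDG_DATA_HOME, falling back to ~/.local/share), the default output path could be computed as follows. `default_export_path` is a hypothetical helper, and the tool itself may resolve the directory differently, especially on other platforms.

```python
import os


def default_export_path(fmt, env=None):
    """Build '<XDG data dir>/unihan_tabular/unihan.<fmt>' (hypothetical helper)."""
    env = os.environ if env is None else env
    # XDG convention: use XDG_DATA_HOME if set and non-empty,
    # otherwise fall back to ~/.local/share.
    base = env.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return os.path.join(base, "unihan_tabular", "unihan." + fmt)


print(default_export_path("csv"))
```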
Project details
Hashes for unihan-tabular-0.8.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 4d026c1afd9dda99678d0259fa8676acaf741facb78b39d18c3ad84d6dfcdd21
MD5 | c739b77028fa7ea5420681b558339fd6
BLAKE2b-256 | e17258447d411c1de7798aa8275511a979ce6b0affd15cfeab902b49db837fa6