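Converts CSV files to IMS VDEX XML (Vocabulary Definition Exchange format)
Project description
Converts CSV files into multilingual IMS VDEX vocabulary XML files.
VDEX is a very good standard format for multilingual vocabularies, ontologies, and the like. Writing its XML by hand, however, is tedious, and editor support is poor. Not everyone has Excel, but almost everyone knows how to create a table. So let users create a table with one key column identifying each term and one column of translated values per language.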
Flat Vocabulary
key | english | german | italian |
---|---|---|---|
k01 | ant | Ameise | formica |
k02 | bee | Biene | ape |
k03 | wasp | Wespe | vespa |
k04 | hornet | Hornisse | calabrone |
As a CSV file, it looks like this:
"key";"english";"german";"italian"
"k01";"ant";"Ameise";"formica"
"k02";"bee";"Biene";"ape"
"k03";"wasp";"Wespe";"vespa"
"k04";"hornet";"Hornisse";"calabrone"
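If the table lives in a script rather than a spreadsheet, a file in this layout can be produced with Python's standard `csv` module. The following is a minimal sketch, not part of vdexcsv; the `rows` list and the `table_to_csv` helper are hypothetical names chosen for illustration.

```python
import csv
import io

# Hypothetical in-memory table: one key column plus one column per language.
rows = [
    ["key", "english", "german", "italian"],
    ["k01", "ant", "Ameise", "formica"],
    ["k02", "bee", "Biene", "ape"],
    ["k03", "wasp", "Wespe", "vespa"],
    ["k04", "hornet", "Hornisse", "calabrone"],
]


def table_to_csv(rows):
    """Serialize rows as semicolon-delimited CSV with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(rows)
print(csv_text, end="")
```

The semicolon delimiter matches the default that the help text documents for `--csvdelimiter`.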
After processing it with csv2vdex, invoked like this:
csv2vdex insects 'insects,Insekten,insetto' \
    insects.csv insects.xml --languages en,de,it --startrow 1
This yields VDEX XML like this:
<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0" orderSignificant="true">
  <vocabIdentifier>insects</vocabIdentifier>
  <vocabName>
    <langstring language="en">insects</langstring>
    <langstring language="de">Insekten</langstring>
    <langstring language="it">insetto</langstring>
  </vocabName>
  <term>
    <termIdentifier>k01</termIdentifier>
    <caption>
      <langstring language="en">ant</langstring>
      <langstring language="de">Ameise</langstring>
      <langstring language="it">formica</langstring>
    </caption>
  </term>
  <term>
    <termIdentifier>k02</termIdentifier>
    <caption>
      <langstring language="en">bee</langstring>
      <langstring language="de">Biene</langstring>
      <langstring language="it">ape</langstring>
    </caption>
  </term>
  <term>
    <termIdentifier>k03</termIdentifier>
    <caption>
      <langstring language="en">wasp</langstring>
      <langstring language="de">Wespe</langstring>
      <langstring language="it">vespa</langstring>
    </caption>
  </term>
  <term>
    <termIdentifier>k04</termIdentifier>
    <caption>
      <langstring language="en">hornet</langstring>
      <langstring language="de">Hornisse</langstring>
      <langstring language="it">calabrone</langstring>
    </caption>
  </term>
</vdex>
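To check a generated file, the captions can be read back with Python's standard `xml.etree.ElementTree`. This is a sketch under the assumption that the file uses the namespace shown above; `read_captions` is a hypothetical helper, not a function provided by vdexcsv, and the embedded sample is a shortened version of the output.

```python
import xml.etree.ElementTree as ET

VDEX_NS = "http://www.imsglobal.org/xsd/imsvdex_v1p0"
NS = {"v": VDEX_NS}


def read_captions(xml_text):
    """Return {termIdentifier: {language: caption}} for every term, at any depth."""
    root = ET.fromstring(xml_text)
    result = {}
    for term in root.iter("{%s}term" % VDEX_NS):
        key = term.findtext("v:termIdentifier", namespaces=NS)
        # Only the caption that is a direct child of this term is considered,
        # so captions of nested terms are not mixed in.
        result[key] = {
            ls.get("language"): (ls.text or "")
            for ls in term.findall("v:caption/v:langstring", NS)
        }
    return result


# Shortened sample of the output shown above.
sample = """<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0" orderSignificant="true">
  <vocabIdentifier>insects</vocabIdentifier>
  <term>
    <termIdentifier>k01</termIdentifier>
    <caption>
      <langstring language="en">ant</langstring>
      <langstring language="de">Ameise</langstring>
    </caption>
  </term>
</vdex>"""

captions = read_captions(sample)
print(captions)
```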
A Tree Vocabulary
For a tree-like vocabulary, the key itself defines each term's level. A dot is used as the delimiter here.
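key | term value |
---|---|
nwe | North-west of Europe |
nwe.1 | A. m. iberica |
nwe.2 | A. m. intermissa |
nwe.3 | A. m. lihzeni |
nwe.4 | A. m. mellifera |
nwe.5 | A. m. sahariensis |
swe | South-west of Europe |
swe.1 | A. m. carnica |
swe.2 | A. m. cecropia |
swe.3 | A. m. ligustica |
swe.4 | A. m. macedonica |
swe.5 | A. m. ruttneri |
swe.6 | A. m. sicula |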
As a CSV file, it looks like this:
"key";"term value"
"nwe";"North-west of Europe"
"nwe.1";"A. m. iberica"
"nwe.2";"A. m. intermissa"
"nwe.3";"A. m. lihzeni"
"nwe.4";"A. m. mellifera"
"nwe.5";"A. m. sahariensis"
"swe";"South-west of Europe"
"swe.1";"A. m. carnica"
"swe.2";"A. m. cecropia"
"swe.3";"A. m. ligustica"
"swe.4";"A. m. macedonica"
"swe.5";"A. m. ruttneri"
"swe.6";"A. m. sicula"
After processing it with csv2vdex, invoked like this:
csv2vdex beeeurope 'European Honey Bees' bees.csv bees.xml --startrow 1
The result is:
<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0" orderSignificant="true">
  <vocabIdentifier>beeeurope</vocabIdentifier>
  <vocabName>
    <langstring language="en">European Honey Bees</langstring>
  </vocabName>
  <term>
    <termIdentifier>nwe</termIdentifier>
    <caption>
      <langstring language="en">North-west of Europe</langstring>
    </caption>
    <term>
      <termIdentifier>nwe.1</termIdentifier>
      <caption>
        <langstring language="en">A. m. iberica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.2</termIdentifier>
      <caption>
        <langstring language="en">A. m. intermissa</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.3</termIdentifier>
      <caption>
        <langstring language="en">A. m. lihzeni</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.4</termIdentifier>
      <caption>
        <langstring language="en">A. m. mellifera</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.5</termIdentifier>
      <caption>
        <langstring language="en">A. m. sahariensis</langstring>
      </caption>
    </term>
  </term>
  <term>
    <termIdentifier>swe</termIdentifier>
    <caption>
      <langstring language="en">South-west of Europe</langstring>
    </caption>
    <term>
      <termIdentifier>swe.1</termIdentifier>
      <caption>
        <langstring language="en">A. m. carnica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.2</termIdentifier>
      <caption>
        <langstring language="en">A. m. cecropia</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.3</termIdentifier>
      <caption>
        <langstring language="en">A. m. ligustica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.4</termIdentifier>
      <caption>
        <langstring language="en">A. m. macedonica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.5</termIdentifier>
      <caption>
        <langstring language="en">A. m. ruttneri</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.6</termIdentifier>
      <caption>
        <langstring language="en">A. m. sicula</langstring>
      </caption>
    </term>
  </term>
</vdex>
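The way dotted keys turn into nesting can be illustrated with a few lines of Python. This is only a sketch of the idea, not the code csv2vdex actually runs: `build_tree` is a hypothetical helper, and it assumes that a parent row appears before its children and that a key whose parent is missing becomes a top-level term.

```python
def build_tree(rows, delimiter="."):
    """Nest (key, caption) rows by splitting each key at the last delimiter."""
    roots = []
    nodes = {}
    for key, caption in rows:
        node = {"key": key, "caption": caption, "children": []}
        nodes[key] = node
        # "nwe.1" -> parent "nwe"; "nwe" has no delimiter, so it is a root.
        parent_key, found, _ = key.rpartition(delimiter)
        if found and parent_key in nodes:
            nodes[parent_key]["children"].append(node)
        else:
            roots.append(node)
    return roots


rows = [
    ("nwe", "North-west of Europe"),
    ("nwe.1", "A. m. iberica"),
    ("nwe.2", "A. m. intermissa"),
    ("swe", "South-west of Europe"),
    ("swe.1", "A. m. carnica"),
]

tree = build_tree(rows)
for root in tree:
    print(root["key"], [child["key"] for child in root["children"]])
```

Splitting at the last delimiter means deeper keys such as a hypothetical `nwe.1.a` would attach to `nwe.1` in the same way.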
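A Tree Vocabulary with Descriptions
key | english | description |
---|---|---|
field_work_terms | Field work terms | |
field_work_terms.1 | Acidification | Acidification is a process. It happens naturally ... |
field_work_terms.2 | Aquifer | If you get a shovel and dig at the ground below your ... |
field_work_terms.3 | Biodiversity | This has many contentious meanings but for our ... |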
As a CSV file, it looks like this:
field_work_terms,Field work terms,
field_work_terms.1,Acidification,"Acidification is a process. It happens naturally ..."
field_work_terms.2,Aquifer,"If you get a shovel and dig at the ground below your ..."
field_work_terms.3,Biodiversity,"This has many contentious meanings but for our ..."
After processing it with csv2vdex, invoked like this:
csv2vdex --description True --csvdelimiter "," terms "Terminology" terms.csv terms.xml
This yields VDEX XML like this:
<vdex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0"
      xsi:schemaLocation="http://www.imsglobal.org/imsvdex_v1p0 imsvdex_v1p0.xsd"
      profileType="lax" orderSignificant="true">
  <vocabIdentifier>terms</vocabIdentifier>
  <vocabName>
    <langstring language="en">Terminology</langstring>
  </vocabName>
  <term>
    <termIdentifier>field_work_terms</termIdentifier>
    <caption>
      <langstring language="en">Field work terms</langstring>
    </caption>
    <description>
      <langstring language="en"></langstring>
    </description>
    <term>
      <termIdentifier>field_work_terms.1</termIdentifier>
      <caption>
        <langstring language="en">Acidification</langstring>
      </caption>
      <description>
        <langstring language="en">Acidification is a process. It happens naturally ...</langstring>
      </description>
    </term>
    <term>
      <termIdentifier>field_work_terms.2</termIdentifier>
      <caption>
        <langstring language="en">Aquifer</langstring>
      </caption>
      <description>
        <langstring language="en">If you get a shovel and dig at the ground below your ...</langstring>
      </description>
    </term>
    <term>
      <termIdentifier>field_work_terms.3</termIdentifier>
      <caption>
        <langstring language="en">Biodiversity</langstring>
      </caption>
      <description>
        <langstring language="en">This has many contentious meanings but for our ...</langstring>
      </description>
    </term>
  </term>
</vdex>
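With descriptions enabled, each language occupies two adjacent columns: the caption first, then the description. The sketch below shows one way to read such rows with the standard `csv` module. It is an illustration of the column layout only, not the tool's implementation; `parse_terms` and its parameter names are hypothetical, chosen to mirror the command-line options.

```python
import csv
import io


def parse_terms(csv_text, languages=("en",), keycolumn=0, startcolumn=1, delimiter=","):
    """Read rows where each language takes a caption column and a description column."""
    terms = []
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        term = {"key": row[keycolumn], "caption": {}, "description": {}}
        for index, lang in enumerate(languages):
            col = startcolumn + 2 * index
            # Missing trailing cells are treated as empty strings (an assumption).
            term["caption"][lang] = row[col] if col < len(row) else ""
            term["description"][lang] = row[col + 1] if col + 1 < len(row) else ""
        terms.append(term)
    return terms


csv_text = (
    "field_work_terms,Field work terms,\n"
    'field_work_terms.1,Acidification,"Acidification is a process. It happens naturally ..."\n'
)

terms = parse_terms(csv_text)
for term in terms:
    print(term["key"], "|", term["caption"]["en"], "|", term["description"]["en"])
```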
Help Text
usage: csv2vdex [-h] [--languages [LANGUAGES]] [--startrow [STARTROW]]
                [--description [DESCRIPTION]] [--keycolumn [KEYCOLUMN]]
                [--startcolumn [STARTCOLUMN]] [--ordered [ORDERED]]
                [--dialect [DIALECT]] [--csvdelimiter [CSVDELIMITER]]
                [--treedelimiter [TREEDELIMITER]] [--encoding [ENCODING]]
                id name source target

Converts CSV files to VDEX XML

positional arguments:
  id                    unique identifier of vocabulary
  name                  Human readable name of vocabulary. If more than one
                        language is given separate each langstring by a comma
                        and provide same order as argument --languages
  source                CSV file to read from
  target                XML target file

optional arguments:
  -h, --help            show this help message and exit
  --languages [LANGUAGES], -l [LANGUAGES]
                        Comma separated list of ISO-language codes. Default: en
  --description         Whether the terms have descriptions. If so, each term
                        takes up two columns per language: one for the caption
                        and one for the description.
  --startrow [STARTROW], -r [STARTROW]
                        number of row in CSV file where to begin reading,
                        starts with 0, default 0.
  --keycolumn [KEYCOLUMN], -k [KEYCOLUMN]
                        number of column with the keys of the vocabulary,
                        start with 0, default 0.
  --startcolumn [STARTCOLUMN], -s [STARTCOLUMN]
                        number of column with the first langstring of the
                        vocabulary. It assumes n + number languages of columns
                        after this, starts counting with 0, default 1. If
                        terms include description, it assumes two columns per
                        language.
  --ordered [ORDERED], -o [ORDERED]
                        Whether vocabulary is ordered or not, Default: True
  --dialect [DIALECT]   CSV dialect, default excel.
  --csvdelimiter [CSVDELIMITER]
                        CSV delimiter of the source file, default semicolon.
  --treedelimiter [TREEDELIMITER]
                        Delimiter used to split the key the vocabulary into a
                        path to determine the position in the tree, default
                        dot.
  --encoding [ENCODING], -e [ENCODING]
                        Encoding of input file. Default: utf-8
Source Code
The sources are kept in a Git DVCS, with the main branches on GitHub.
We would be happy to see many forks and pull requests that make vdexcsv even better.
Contributors
Jens W. Klein <jens@bluedynamics.com>
Peter Holzer <hpeter@agitator.com>
Jean Jordaan <jean.jordaan@gmail.com>
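History
1.4 (2014-10-12)
Teach csv2vdex about term descriptions [jean, 2014-10-09]
1.3
Fix tests and add the GitHub project to Travis CI. Maintenance and coding errors fixed [jensens, 2014-02-01]
1.2
Add encoding option, defaulting to utf-8 [hpeteragitator, 2012-02-13]
1.1
According to the IMS Global specification, the root tag must be vdex. [jensens, 2011-08-17]
1.0.1
Now an egg that includes the .rst files [jensens, 2011-06-21]
1.0
Make it work [jensens, 2011-06-06]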
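License
Copyright (c) 2010-2014, BlueDynamics Alliance, Austria, Germany, Switzerland. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the BlueDynamics Alliance nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY BlueDynamics Alliance "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL BlueDynamics Alliance BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.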
Project details
Hashes for vdexcsv-1.4.zip
Algorithm | Hash digest |
---|---|
SHA256 | e829a790e70a04f53e9eaf907579dcdb98788639a2cae0f797fb7e0174afb839 |
MD5 | 28a38df719929f7f01938dbc9c3a43a5 |
BLAKE2b-256 | 6c2f64d0c2b096776705b70fb2594d52b2a1c372c9f289a5918442482ba9d9fd |