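Convert CSV files to IMS VDEX XML (Vocabulary Definition Exchange format)

Project description

Converts CSV files to multilingual IMS VDEX vocabulary XML files.

VDEX is a well-suited standard format for multilingual vocabularies, ontologies and the like. Writing its XML by hand, however, is tedious, and editor support is poor. Nearly everyone, on the other hand, has Excel or at least knows how to create a table. So let users create a table with one key column identifying each term and one column of translated values per language.

Flat vocabulary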

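===  =======  ========  =========
key  english  german    italian
===  =======  ========  =========
k01  ant      Ameise    formica
k02  bee      Biene     ape
k03  wasp     Wespe     vespa
k04  hornet   Hornisse  calabrone
===  =======  ========  =========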

As a CSV file it looks like this:

"key";"english";"german";"italian"
"k01";"ant";"Ameise";"formica"
"k02";"bee";"Biene";"ape"
"k03";"wasp";"Wespe";"vespa"
"k04";"hornet";"Hornisse";"calabrone"

Processing it with csv2vdex, called like this:

csv2vdex insects 'insects,Insekten,insetto' \
         insects.csv insects.xml --languages en,de,it --startrow 1

results in the following VDEX XML:

<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0" orderSignificant="true">
  <vocabIdentifier>insects</vocabIdentifier>
  <vocabName>
    <langstring language="en">insects</langstring>
    <langstring language="de">Insekten</langstring>
    <langstring language="it">insetto</langstring>
  </vocabName>
  <term>
    <termIdentifier>k01</termIdentifier>
    <caption>
      <langstring language="en">ant</langstring>
      <langstring language="de">Ameise</langstring>
      <langstring language="it">formica</langstring>
    </caption>
  </term>
  <term>
    <termIdentifier>k02</termIdentifier>
    <caption>
      <langstring language="en">bee</langstring>
      <langstring language="de">Biene</langstring>
      <langstring language="it">ape</langstring>
    </caption>
  </term>
  <term>
    <termIdentifier>k03</termIdentifier>
    <caption>
      <langstring language="en">wasp</langstring>
      <langstring language="de">Wespe</langstring>
      <langstring language="it">vespa</langstring>
    </caption>
  </term>
  <term>
    <termIdentifier>k04</termIdentifier>
    <caption>
      <langstring language="en">hornet</langstring>
      <langstring language="de">Hornisse</langstring>
      <langstring language="it">calabrone</langstring>
    </caption>
  </term>
</vdex>
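
The row-to-term mapping shown above can be sketched in a few lines of Python. This is only an illustration built on the standard library, not the code csv2vdex itself runs: the function name `flat_vdex` and its parameters are hypothetical, the sample data is shortened to two terms, and the XML namespace declaration is left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = '''"key";"english";"german";"italian"
"k01";"ant";"Ameise";"formica"
"k02";"bee";"Biene";"ape"
'''


def flat_vdex(csv_text, vocab_id, names, languages, startrow=0):
    """Build a flat VDEX-like element tree from semicolon-separated CSV text."""
    root = ET.Element("vdex", orderSignificant="true")
    ET.SubElement(root, "vocabIdentifier").text = vocab_id
    vocab_name = ET.SubElement(root, "vocabName")
    for lang, name in zip(languages, names):
        ET.SubElement(vocab_name, "langstring", language=lang).text = name
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=";"))
    for row in rows[startrow:]:
        term = ET.SubElement(root, "term")
        ET.SubElement(term, "termIdentifier").text = row[0]
        caption = ET.SubElement(term, "caption")
        # Column 0 holds the key; one caption column per language follows.
        for lang, value in zip(languages, row[1:]):
            ET.SubElement(caption, "langstring", language=lang).text = value
    return root


root = flat_vdex(CSV_TEXT, "insects", ["insects", "Insekten", "insetto"],
                 ["en", "de", "it"], startrow=1)
print(ET.tostring(root, encoding="unicode"))
```

Passing `startrow=1` skips the header row, which mirrors the `--startrow 1` option in the call above.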

A tree vocabulary

For a tree vocabulary, the key defines the level of each term. Here a dot is used as the delimiter.

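=====  ====================
key    term value
=====  ====================
nwe    North-west of Europe
nwe.1  A. m. iberica
nwe.2  A. m. intermissa
nwe.3  A. m. lihzeni
nwe.4  A. m. mellifera
nwe.5  A. m. sahariensis
swe    South-west of Europe
swe.1  A. m. carnica
swe.2  A. m. cecropia
swe.3  A. m. ligustica
swe.4  A. m. macedonica
swe.5  A. m. ruttneri
swe.6  A. m. sicula
=====  ====================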

As a CSV file it looks like this:

"key";"term value"
"nwe";"North-west of Europe"
"nwe.1";"A. m. iberica"
"nwe.2";"A. m. intermissa"
"nwe.3";"A. m. lihzeni"
"nwe.4";"A. m. mellifera"
"nwe.5";"A. m. sahariensis"
"swe";"South-west of Europe"
"swe.1";"A. m. carnica"
"swe.2";"A. m. cecropia"
"swe.3";"A. m. ligustica"
"swe.4";"A. m. macedonica"
"swe.5";"A. m. ruttneri"
"swe.6";"A. m. sicula"

Processing it with csv2vdex, called like this:

csv2vdex beeeurope 'European Honey Bees' bees.csv bees.xml --startrow 1

results in:

<vdex xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0" orderSignificant="true">
  <vocabIdentifier>beeeurope</vocabIdentifier>
  <vocabName>
    <langstring language="en">European Honey Bees</langstring>
  </vocabName>
  <term>
    <termIdentifier>nwe</termIdentifier>
    <caption>
      <langstring language="en">North-west of Europe</langstring>
    </caption>
    <term>
      <termIdentifier>nwe.1</termIdentifier>
      <caption>
        <langstring language="en">A. m. iberica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.2</termIdentifier>
      <caption>
        <langstring language="en">A. m. intermissa</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.3</termIdentifier>
      <caption>
        <langstring language="en">A. m. lihzeni</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.4</termIdentifier>
      <caption>
        <langstring language="en">A. m. mellifera</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>nwe.5</termIdentifier>
      <caption>
        <langstring language="en">A. m. sahariensis</langstring>
      </caption>
    </term>
  </term>
  <term>
    <termIdentifier>swe</termIdentifier>
    <caption>
      <langstring language="en">South-west of Europe</langstring>
    </caption>
    <term>
      <termIdentifier>swe.1</termIdentifier>
      <caption>
        <langstring language="en">A. m. carnica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.2</termIdentifier>
      <caption>
        <langstring language="en">A. m. cecropia</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.3</termIdentifier>
      <caption>
        <langstring language="en">A. m. ligustica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.4</termIdentifier>
      <caption>
        <langstring language="en">A. m. macedonica</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.5</termIdentifier>
      <caption>
        <langstring language="en">A. m. ruttneri</langstring>
      </caption>
    </term>
    <term>
      <termIdentifier>swe.6</termIdentifier>
      <caption>
        <langstring language="en">A. m. sicula</langstring>
      </caption>
    </term>
  </term>
</vdex>
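
The nesting by dotted key can be illustrated with a short Python sketch. It is not taken from the csv2vdex sources; `build_tree` is a hypothetical name, and the sketch assumes that a parent row always appears before its children and that a key with an unknown prefix is placed at the top level.

```python
import xml.etree.ElementTree as ET

ROWS = [
    ("nwe", "North-west of Europe"),
    ("nwe.1", "A. m. iberica"),
    ("swe", "South-west of Europe"),
    ("swe.1", "A. m. carnica"),
]


def build_tree(rows, treedelimiter=".", language="en"):
    """Nest each term under the term whose key is its dotted prefix."""
    root = ET.Element("vdex")
    by_key = {}  # key -> <term> element created so far
    for key, value in rows:
        # Everything before the last delimiter is the parent's key.
        parent_key = key.rpartition(treedelimiter)[0]
        # Keys without a delimiter (or with an unknown prefix) go to the top.
        parent = by_key.get(parent_key, root)
        term = ET.SubElement(parent, "term")
        ET.SubElement(term, "termIdentifier").text = key
        caption = ET.SubElement(term, "caption")
        ET.SubElement(caption, "langstring", language=language).text = value
        by_key[key] = term
    return root


tree = build_tree(ROWS)
for top in tree.findall("term"):
    children = [t.findtext("termIdentifier") for t in top.findall("term")]
    print(top.findtext("termIdentifier"), children)
```

Keeping a dictionary from key to element means every row is placed with a single lookup, at the cost of requiring parents to come first in the file.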

A tree vocabulary with descriptions

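==================  ================  ========================================================
key                 english           description
==================  ================  ========================================================
field_work_terms    Field work terms
field_work_terms.1  Acidification     Acidification is a process. It happens naturally ...
field_work_terms.2  Aquifer           If you get a shovel and dig at the ground below your ...
field_work_terms.3  Biodiversity      This has many contentious meanings but for our ...
==================  ================  ========================================================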

As a CSV file it looks like this:

field_work_terms,Field work terms,
field_work_terms.1,Acidification,"Acidification is a process. It happens naturally ..."
field_work_terms.2,Aquifer,"If you get a shovel and dig at the ground below your ..."
field_work_terms.3,Biodiversity,"This has many contentious meanings but for our ..."

Processing it with csv2vdex, called like this:

csv2vdex --description True --csvdelimiter "," terms "Terminology" terms.csv terms.xml

results in the following VDEX XML:

<vdex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.imsglobal.org/xsd/imsvdex_v1p0" xsi:schemaLocation="http://www.imsglobal.org/imsvdex_v1p0 imsvdex_v1p0.xsd" profileType="lax" orderSignificant="true">
  <vocabIdentifier>terms</vocabIdentifier>
  <vocabName>
    <langstring language="en">Terminology</langstring>
  </vocabName>
  <term>
    <termIdentifier>field_work_terms</termIdentifier>
    <caption>
      <langstring language="en">Field work terms</langstring>
    </caption>
    <description>
      <langstring language="en"></langstring>
    </description>
    <term>
      <termIdentifier>field_work_terms.1</termIdentifier>
      <caption>
        <langstring language="en">Acidification</langstring>
      </caption>
      <description>
        <langstring language="en">Acidification is a process. It happens naturally ...</langstring>
      </description>
    </term>
    <term>
      <termIdentifier>field_work_terms.2</termIdentifier>
      <caption>
        <langstring language="en">Aquifer</langstring>
      </caption>
      <description>
        <langstring language="en">If you get a shovel and dig at the ground below your ...</langstring>
      </description>
    </term>
    <term>
      <termIdentifier>field_work_terms.3</termIdentifier>
      <caption>
        <langstring language="en">Biodiversity</langstring>
      </caption>
      <description>
        <langstring language="en">This has many contentious meanings but for our ...</langstring>
      </description>
    </term>
  </term>
</vdex>
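
How a row splits into caption and description can be sketched as follows. This is an illustration only: `split_row` is a hypothetical helper, and the assumption that several languages would be laid out as adjacent caption/description pairs (caption first) is inferred from the single-language example above, not confirmed by the csv2vdex sources.

```python
import csv
import io

CSV_TEXT = '''field_work_terms,Field work terms,
field_work_terms.1,Acidification,"Acidification is a process. It happens naturally ..."
'''


def split_row(row, languages, startcolumn=1, keycolumn=0):
    """Return (key, {lang: (caption, description)}) for one CSV row."""
    values = {}
    for index, lang in enumerate(languages):
        # With descriptions, each language occupies two adjacent columns.
        caption_col = startcolumn + 2 * index
        values[lang] = (row[caption_col], row[caption_col + 1])
    return row[keycolumn], values


rows = list(csv.reader(io.StringIO(CSV_TEXT), delimiter=","))
parsed = [split_row(row, ["en"]) for row in rows]
for key, values in parsed:
    print(key, values["en"])
```

The trailing comma in the first row yields an empty description, which corresponds to the empty `<description>` langstring of `field_work_terms` in the XML above.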

Help text

  usage: csv2vdex [-h] [--languages [LANGUAGES]] [--startrow [STARTROW]]
                  [--description [DESCRIPTION]] [--keycolumn [KEYCOLUMN]]
                  [--startcolumn [STARTCOLUMN]]
                  [--ordered [ORDERED]] [--dialect [DIALECT]]
                  [--csvdelimiter [CSVDELIMITER]]
                  [--treedelimiter [TREEDELIMITER]] [--encoding [ENCODING]]
                  id name source target

  Converts CSV files to VDEX XML

  positional arguments:
    id                    unique identifier of vocabulary
    name                  Human readable name of vocabulary. If more than one
                          language is given separate each langstring by a comma
                          and provide same order as argument --languages
    source                CSV file to read from
    target                XML target file

  optional arguments:
    -h, --help            show this help message and exit
    --languages [LANGUAGES], -l [LANGUAGES]
                          Comma separated list of ISO-language codes. Default:
                          en
    --description [DESCRIPTION]
                          Whether the terms have descriptions. If so, each term
                          takes up two columns per language: one for the caption
                          and one for the description.
    --startrow [STARTROW], -r [STARTROW]
                          number of row in CSV file where to begin reading,
                          starts with 0, default 0.
    --keycolumn [KEYCOLUMN], -k [KEYCOLUMN]
                          number of column with the keys of the vocabulary,
                          start with 0, default 0.
    --startcolumn [STARTCOLUMN], -s [STARTCOLUMN]
                          number of column with the first langstring of the
                          vocabulary. It assumes n + number languages of columns
                          after this, starts counting with 0, default 1.
                          If terms include description, it assumes two columns
                          per language.
    --ordered [ORDERED], -o [ORDERED]
                          Whether vocabulary is ordered or not, Default: True
    --dialect [DIALECT]   CSV dialect, default excel.
    --csvdelimiter [CSVDELIMITER]
                          CSV delimiter of the source file, default semicolon.
    --treedelimiter [TREEDELIMITER]
                          Delimiter used to split the key the vocabulary into a
                          path to determine the position in the tree, default
                          dot.
    --encoding [ENCODING], -e [ENCODING]
                          Encoding of input file. Default: utf-8
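
The help text asks for the comma-separated `name` values in the same order as the `--languages` codes. A minimal sketch of that pairing follows; `pair_names` is a hypothetical helper, and raising an error on a length mismatch is a choice of this sketch, not documented csv2vdex behaviour.

```python
def pair_names(name, languages="en"):
    """Pair comma-separated vocabulary names with language codes, in order."""
    codes = [code.strip() for code in languages.split(",")]
    names = [part.strip() for part in name.split(",")]
    if len(codes) != len(names):
        # Raising here is a choice of this sketch, not documented behaviour.
        raise ValueError("need exactly one name per language")
    return list(zip(codes, names))


pairs = pair_names("insects,Insekten,insetto", "en,de,it")
print(pairs)
```

Each resulting pair corresponds to one `<langstring>` inside `<vocabName>` in the flat vocabulary example.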

Source code

Build status (Travis CI): https://travis-ci.org/bluedynamics/vdexcsv

The sources are in a Git DVCS with its main branches at GitHub.

We'd be happy to see many forks and pull requests to make vdexcsv even better.

Contributors

History

1.4 (2014-10-12)

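  • Teach csv2vdex about term descriptions [jean, 2014-10-09]

1.3

  • Fixed the tests and added the GitHub project to Travis CI. Housekeeping done and encoding errors fixed [jensens, 2014-02-01]

1.2

  • Added the encoding option, defaulting to utf-8 [hpeteragitator, 2012-02-13]

1.1

  • According to the IMS Global specification, the root tag must be vdex. [jensens, 2011-08-17]

1.0.1

  • It is now an egg that includes the .rst files [jensens, 2011-06-21]

1.0

  • Make it work [jensens, 2011-06-06]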

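License

Copyright (c) 2010-2014, BlueDynamics Alliance, Austria, Germany, Switzerland. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the BlueDynamics Alliance nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY BlueDynamics Alliance "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL BlueDynamics Alliance BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.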
