No project description provided

These details have not been verified by PyPI

Project links

Homepage

Project description

Rhasspy ASR Kaldi

Automatic speech recognition for the Rhasspy voice assistant using Kaldi.

Requirements

Python 3.7
Kaldi
- Expects $KALDI_DIR in the environment
Opengrm
- Expects ngram* in $PATH
Phonetisaurus
- Expects phonetisaurus-apply in $PATH

See pre-built apps for pre-compiled binaries.
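Before installing, it can help to confirm that the three prerequisites above are visible. The following is a minimal sketch, not part of rhasspy-asr-kaldi; the function name is hypothetical, and checking ngramcount and ngrammake as stand-ins for "ngram*" is an assumption.

```python
import os
import shutil

# The README only says "ngram*"; checking these two OpenGrm NGram
# programs is an assumption about which ones matter.
NGRAM_TOOLS = ("ngramcount", "ngrammake")


def missing_requirements(env=None, path=None):
    """Return the prerequisites that could not be found.

    env defaults to os.environ and path defaults to env["PATH"], so both
    can be overridden for testing.
    """
    env = os.environ if env is None else env
    path = env.get("PATH", "") if path is None else path
    missing = []

    # Kaldi is located through the KALDI_DIR environment variable.
    kaldi_dir = env.get("KALDI_DIR")
    if not kaldi_dir or not os.path.isdir(kaldi_dir):
        missing.append("$KALDI_DIR")

    # OpenGrm and Phonetisaurus programs must be on the search path.
    for program in NGRAM_TOOLS + ("phonetisaurus-apply",):
        if shutil.which(program, path=path) is None:
            missing.append(program)

    return missing
```

Calling print(missing_requirements()) lists anything absent from the current environment; an empty list means everything was found.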

Installation

$ git clone https://github.com/rhasspy/rhasspy-asr-kaldi
$ cd rhasspy-asr-kaldi
$ ./configure
$ make
$ make install

Transcribing

Use python3 -m rhasspyasr_kaldi transcribe <ARGS>

usage: rhasspy-asr-kaldi transcribe [-h] --model-dir MODEL_DIR
                                    [--graph-dir GRAPH_DIR]
                                    [--model-type MODEL_TYPE]
                                    [--frames-in-chunk FRAMES_IN_CHUNK]
                                    [wav_file [wav_file ...]]

positional arguments:
  wav_file              WAV file(s) to transcribe

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --model-type MODEL_TYPE
                        Either nnet3 or gmm (default: nnet3)
  --frames-in-chunk FRAMES_IN_CHUNK
                        Number of frames to process at a time
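When calling the transcriber from another Python program, the command line can be assembled from the flags in the usage text above. This is a hedged sketch: transcribe_command is a hypothetical helper, and sys.executable stands in for python3 so the current interpreter is used.

```python
import sys


def transcribe_command(model_dir, wav_files, graph_dir=None,
                       model_type="nnet3", frames_in_chunk=None):
    """Build an argv list for `python3 -m rhasspyasr_kaldi transcribe`.

    Only flags shown in the usage text are used.
    """
    if model_type not in ("nnet3", "gmm"):
        raise ValueError("model_type must be 'nnet3' or 'gmm'")

    cmd = [sys.executable, "-m", "rhasspyasr_kaldi", "transcribe",
           "--model-dir", str(model_dir), "--model-type", model_type]
    if graph_dir is not None:
        cmd += ["--graph-dir", str(graph_dir)]
    if frames_in_chunk is not None:
        cmd += ["--frames-in-chunk", str(int(frames_in_chunk))]
    # Positional WAV files come last.
    cmd += [str(f) for f in wav_files]
    return cmd
```

The result can be passed to subprocess.run(..., check=True); the output format of the transcriber is not described here, so parsing it is left out.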

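For nnet3 models, the online2-tcp-nnet3-decode-faster program is used to process streaming audio. For gmm models, audio is buffered and packed into a WAV file before transcription.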
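The gmm path above (buffer the audio, then pack it as WAV) can be illustrated with the standard wave module. This is a sketch, not the package's code; pack_wav is a hypothetical name, and the 16 kHz, 16-bit, mono defaults are assumptions rather than values taken from rhasspy-asr-kaldi.

```python
import io
import wave


def pack_wav(chunks, sample_rate=16000, sample_width=2, channels=1):
    """Concatenate raw PCM chunks and wrap them in a WAV container.

    The 16 kHz / 16-bit / mono defaults are assumptions.
    """
    audio = b"".join(chunks)
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setframerate(sample_rate)
            wav_file.setsampwidth(sample_width)
            wav_file.setnchannels(channels)
            wav_file.writeframes(audio)
        # wave does not close a file object it did not open itself,
        # so the buffer is still readable here.
        return buffer.getvalue()
```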

Training

Use python3 -m rhasspyasr_kaldi train <ARGS>

usage: rhasspy-asr-kaldi train [-h] --model-dir MODEL_DIR
                               [--graph-dir GRAPH_DIR]
                               [--intent-graph INTENT_GRAPH]
                               [--dictionary DICTIONARY]
                               [--dictionary-casing {upper,lower,ignore}]
                               [--language-model LANGUAGE_MODEL]
                               --base-dictionary BASE_DICTIONARY
                               [--g2p-model G2P_MODEL]
                               [--g2p-casing {upper,lower,ignore}]

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --intent-graph INTENT_GRAPH
                        Path to intent graph JSON file (default: stdin)
  --dictionary DICTIONARY
                        Path to write custom pronunciation dictionary
  --dictionary-casing {upper,lower,ignore}
                        Case transformation for dictionary words (training,
                        default: ignore)
  --language-model LANGUAGE_MODEL
                        Path to write custom language model
  --base-dictionary BASE_DICTIONARY
                        Paths to pronunciation dictionaries
  --g2p-model G2P_MODEL
                        Path to Phonetisaurus grapheme-to-phoneme FST model
  --g2p-casing {upper,lower,ignore}
                        Case transformation for g2p words (training, default:
                        ignore)
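The --base-dictionary, --g2p-model, and casing options fit together as follows: words from the intent graph that have no entry in the base dictionaries are the ones a grapheme-to-phoneme model must guess. The sketch below illustrates that idea under stated assumptions; it is not the package's implementation, the helper names are hypothetical, and it assumes Kaldi-style lexicon lines of the form "word phone1 phone2 ...".

```python
def apply_casing(word, casing="ignore"):
    """Mirror the {upper,lower,ignore} choices of the casing options."""
    if casing == "upper":
        return word.upper()
    if casing == "lower":
        return word.lower()
    if casing == "ignore":
        return word
    raise ValueError(f"unknown casing: {casing}")


def load_lexicon(lines, casing="ignore"):
    """Parse 'word phone1 phone2 ...' lines into {word: [pronunciations]}."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = apply_casing(parts[0], casing)
        lexicon.setdefault(word, []).append(parts[1:])
    return lexicon


def unknown_words(words, lexicon, casing="ignore"):
    """Words with no pronunciation; these would need a G2P model."""
    return sorted({apply_casing(w, casing) for w in words} - set(lexicon))
```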

This generates a custom HCLG.fst from an intent graph created with rhasspy-nlu. Your Kaldi model directory should be laid out like this:

my_model/ (--model-dir)
- conf/
  - mfcc_hires.conf
- data/
  - local/
    - dict/
      - lexicon.txt (copied from --dictionary)
    - lang/
      - lm.arpa.gz (copied from --language-model)
- graph/ (--graph-dir)
  - HCLG.fst (generated)
- model/
  - final.mdl
- phones/
  - extra_questions.txt
  - nonsilence_phones.txt
  - optional_silence.txt
  - silence_phones.txt
- online/ (nnet3 only)
- extractor/ (nnet3 only)
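A quick way to catch a misplaced file before training is to compare a model directory against the tree above. This hypothetical helper checks only the entries that must already exist; data/ and graph/ are skipped because training copies or generates their contents. Treating conf/mfcc_hires.conf as required for gmm models simply follows the tree and may not hold for every model.

```python
from pathlib import Path

# Paths from the directory tree above, relative to --model-dir.
COMMON_PATHS = [
    "conf/mfcc_hires.conf",
    "model/final.mdl",
    "phones/extra_questions.txt",
    "phones/nonsilence_phones.txt",
    "phones/optional_silence.txt",
    "phones/silence_phones.txt",
]
NNET3_ONLY_PATHS = ["online", "extractor"]


def missing_model_files(model_dir, model_type="nnet3"):
    """List expected paths under model_dir that do not exist."""
    model_dir = Path(model_dir)
    expected = list(COMMON_PATHS)
    if model_type == "nnet3":
        expected += NNET3_ONLY_PATHS
    return [p for p in expected if not (model_dir / p).exists()]
```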

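When using the train command, you will need to specify the following arguments:

--intent-graph - path to the graph JSON file generated with rhasspy-nlu
--model-type - either nnet3 or gmm
--model-dir - path to the top-level model directory (my_model in the example above)
--graph-dir - path to the directory where HCLG.fst should be written (my_model/graph in the example above)
--base-dictionary - pronunciation dictionary containing all words from the intent graph (can be used multiple times)
--dictionary - path to write the custom pronunciation dictionary (optional)
--language-model - path to write the custom ARPA language model (optional)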
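The repeatable --base-dictionary flag is easy to get wrong when scripting training, so here is a sketch of assembling the argv list from the train usage text. train_command is a hypothetical helper, and sys.executable stands in for python3.

```python
import sys


def train_command(model_dir, base_dictionaries, graph_dir=None,
                  intent_graph=None, dictionary=None, language_model=None,
                  g2p_model=None):
    """Build an argv list for `python3 -m rhasspyasr_kaldi train`."""
    if not base_dictionaries:
        raise ValueError("at least one --base-dictionary is required")

    cmd = [sys.executable, "-m", "rhasspyasr_kaldi", "train",
           "--model-dir", str(model_dir)]
    # --base-dictionary is repeated once per dictionary file.
    for path in base_dictionaries:
        cmd += ["--base-dictionary", str(path)]

    optional = {
        "--graph-dir": graph_dir,
        "--intent-graph": intent_graph,  # if omitted, the JSON is read from stdin
        "--dictionary": dictionary,
        "--language-model": language_model,
        "--g2p-model": g2p_model,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

If intent_graph is left out, the graph JSON can be supplied through subprocess.run(cmd, input=graph_json_bytes, check=True), matching the "default: stdin" note in the usage text.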

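Building From Source

rhasspy-asr-kaldi depends on the following programs, which must be compiled:

Kaldi
- Speech-to-text engine
Opengrm
- Creates ARPA language models
Phonetisaurus
- Guesses pronunciations for unknown words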
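To show what an ARPA language model file looks like, here is a toy unigram writer. It is only an illustration of the text format: Opengrm builds smoothed higher-order models, while this sketch uses unsmoothed maximum-likelihood unigram probabilities, and unigram_arpa is a hypothetical name.

```python
import math
from collections import Counter


def unigram_arpa(sentences):
    """Write a maximum-likelihood unigram model in ARPA text format.

    Toy illustration only: no smoothing, no higher-order n-grams.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
        counts["</s>"] += 1  # one end-of-sentence token per sentence
    total = sum(counts.values())

    lines = ["\\data\\", f"ngram 1={len(counts) + 1}", "", "\\1-grams:"]
    # <s> is only ever a context, so it gets the conventional -99 log10 prob.
    lines.append("-99\t<s>")
    for word, count in sorted(counts.items()):
        lines.append(f"{math.log10(count / total):.4f}\t{word}")
    lines += ["", "\\end\\", ""]
    return "\n".join(lines)
```

For the two sentences "turn on the light" and "turn off the light", there are ten tokens including two </s>, so "turn" gets log10(2/10) ≈ -0.6990.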

Kaldi

Make sure you have the necessary dependencies installed:

sudo apt-get install \
    build-essential \
    libatlas-base-dev libatlas3-base gfortran \
    automake autoconf unzip sox libtool subversion \
    python3 python \
    git zlib1g-dev

Download Kaldi and extract it:

wget -O kaldi-master.tar.gz \
    'https://github.com/kaldi-asr/kaldi/archive/master.tar.gz'
tar -xvf kaldi-master.tar.gz

First, build Kaldi's tools:

cd kaldi-master/tools
make

If you have multiple CPU cores, use make -j 4 instead. This will take a long time.

Next, build Kaldi itself. Its configure script lives in the src directory, so move there from tools:

cd ../src
./configure --shared --mathlib=ATLAS
make depend
make

If you have multiple CPU cores, use make depend -j 4 and make -j 4 instead. This will take a long time.
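Rather than hard-coding 4 jobs, the job count can follow the number of available cores. A minimal sketch, assuming it is run from the directory that contains Kaldi's configure script; the helper names are hypothetical, and dry_run only prints the commands.

```python
import os
import subprocess


def kaldi_build_steps(jobs=None):
    """The configure and make commands above, with a parallel job count."""
    jobs = jobs or os.cpu_count() or 1
    return [
        ["./configure", "--shared", "--mathlib=ATLAS"],
        ["make", "depend", "-j", str(jobs)],
        ["make", "-j", str(jobs)],
    ]


def run_steps(steps, cwd, dry_run=True):
    """Print each step and, unless dry_run, run it in cwd (stops on failure)."""
    for step in steps:
        print("$", " ".join(step))
        if not dry_run:
            subprocess.run(step, cwd=cwd, check=True)
```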

There is no install step. The kaldi-master directory contains all the libraries and programs that Rhasspy needs to access.

See docker-kaldi for a Docker build script.

Phonetisaurus

Make sure you have the necessary dependencies installed:

sudo apt-get install build-essential

First, download and build OpenFST 1.6.2:

wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.2.tar.gz
tar -xvf openfst-1.6.2.tar.gz
cd openfst-1.6.2
./configure \
    "--prefix=$(pwd)/build" \
    --enable-static --enable-shared \
    --enable-far --enable-ngram-fsts
make
make install

If you have multiple CPU cores, use make -j 4 instead. This will take a long time.

Next, download and extract Phonetisaurus:

wget -O phonetisaurus-master.tar.gz \
    'https://github.com/AdolfVonKleist/Phonetisaurus/archive/master.tar.gz'
tar -xvf phonetisaurus-master.tar.gz

Finally, build Phonetisaurus (where /path/to/openfst is the openfst-1.6.2 directory from above):

cd Phonetisaurus-master
./configure \
    --with-openfst-includes=/path/to/openfst/build/include \
    --with-openfst-libs=/path/to/openfst/build/lib
make
make install

If you have multiple CPU cores, use make -j 4 instead. This will take a long time.
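The Phonetisaurus flags point at build/include and build/lib because OpenFST was configured from inside openfst-1.6.2 with --prefix=$(pwd)/build, and make install places headers and libraries under that prefix. The sketch below derives both sets of configure flags from one OpenFST directory; the helper names are hypothetical.

```python
import os
from pathlib import Path


def openfst_prefix(openfst_dir):
    """The install prefix used above: <openfst-1.6.2>/build."""
    return Path(os.path.abspath(openfst_dir)) / "build"


def openfst_configure_args(openfst_dir):
    """Flags passed to OpenFST's ./configure in the step above."""
    return [f"--prefix={openfst_prefix(openfst_dir)}",
            "--enable-static", "--enable-shared",
            "--enable-far", "--enable-ngram-fsts"]


def phonetisaurus_configure_args(openfst_dir):
    """Flags for Phonetisaurus' ./configure, pointing at the same prefix."""
    prefix = openfst_prefix(openfst_dir)
    return [f"--with-openfst-includes={prefix / 'include'}",
            f"--with-openfst-libs={prefix / 'lib'}"]
```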

You should now be able to run the phonetisaurus-align program.

See docker-phonetisaurus for a Docker build script.

Release history

0.6.1 (this version): May 25, 2021
0.6.0: April 1, 2021
0.5.0: October 16, 2020
0.4.1: October 10, 2020
0.3.0: July 17, 2020
0.2.0: June 24, 2020
0.1.6: June 3, 2020
0.1.5: May 26, 2020
0.1.4: April 24, 2020

Download files

Download the file for your platform.

Source Distribution

rhasspy-asr-kaldi-0.6.1.tar.gz (13.5 kB)

Uploaded May 25, 2021 (source)