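Rhasspy Raven Wakeword System
Wakeword detector based on the Snips Personal Wake Word Detector.
The underlying implementation of Raven borrows heavily from node-personal-wakeword by mathquis.
Dependencies
- Python 3.7
- python-speech-features for MFCC computation
- rhasspy-silence for silence detection
- Scientific libraries
  sudo apt-get install liblapack3 libatlas-base-dev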
Installation
$ git clone https://github.com/rhasspy/rhasspy-wake-raven.git
$ cd rhasspy-wake-raven
$ ./configure
$ make
$ make install
Recording Templates
Record at least 3 WAV templates of your wake word:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --record keyword-dir/
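Follow the prompts and speak your wake word. After recording at least 3 examples, press CTRL+C to exit. Your WAV templates will have silence trimmed automatically and will be saved in the keyword-dir/ directory. Add a format string after the directory name to control the file names: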
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --record keyword-dir/ 'keyword-{n:02d}.wav'
The format string receives the 0-based index n of each example.
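The pattern appears to follow Python's str.format syntax, with n as the only field. As an illustration (not Raven's own code), this is how the example pattern expands for the first three recordings:

```python
# Expand a file-name pattern for 0-based example indices, using the
# standard str.format rules ({n:02d} = zero-padded to two digits).
pattern = "keyword-{n:02d}.wav"

names = [pattern.format(n=n) for n in range(3)]
print(names)  # ['keyword-00.wav', 'keyword-01.wav', 'keyword-02.wav']
```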
If you record WAV templates manually, trim the silence from the beginning and end, and make sure to export them as 16-bit, 16 kHz, mono WAV files.
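Before pointing Raven at hand-made templates, it can help to confirm their format. The sketch below uses Python's standard wave module; is_valid_template is a hypothetical helper name, and the check covers only sample width, rate, and channel count (it does not verify that silence was trimmed).

```python
import wave


def is_valid_template(path: str) -> bool:
    """Return True if the WAV file is 16-bit, 16 kHz, mono PCM.

    wave.open() only understands uncompressed PCM, so other encodings
    (e.g. 32-bit float) raise wave.Error instead of returning False.
    """
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getsampwidth() == 2          # 16-bit samples (2 bytes)
            and wav_file.getframerate() == 16000  # 16 kHz sample rate
            and wav_file.getnchannels() == 1      # mono
        )
```

For example, is_valid_template("keyword-dir/keyword-00.wav") should be True for a correctly exported file.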
Running
After recording your WAV templates into a directory, run:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --keyword <WAV_DIR> ...
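where <WAV_DIR> contains the WAV templates. You can add as many keywords as you like, but each one uses additional CPU. Using --average-templates is recommended to reduce CPU usage.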
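The arecord command above streams headerless (-t raw), signed 16-bit little-endian (-f S16_LE), single-channel (-c 1) samples at 16,000 Hz (-r 16000) to Raven's standard input. With the default --chunk-size of 1920 bytes, each read therefore holds 960 samples, or 60 ms of audio. The sketch below shows that arithmetic and a plain chunked reader; the helper names are hypothetical and this is not Raven's own input loop.

```python
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16000  # arecord -r 16000
SAMPLE_WIDTH = 2     # arecord -f S16_LE: 2 bytes per sample
CHANNELS = 1         # arecord -c 1
CHUNK_SIZE = 1920    # default --chunk-size, in bytes


def chunk_seconds(chunk_size: int = CHUNK_SIZE) -> float:
    """Duration of audio contained in one raw chunk of the given size."""
    samples = chunk_size // (SAMPLE_WIDTH * CHANNELS)
    return samples / SAMPLE_RATE


def read_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield raw audio chunks until the stream is exhausted.

    The last chunk may be shorter than chunk_size.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

In a real pipeline the stream would be sys.stdin.buffer, e.g. for chunk in read_chunks(sys.stdin.buffer): ...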
Some settings can be specified per keyword:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven \
--keyword keyword1/ name=my-keyword1 probability-threshold=0.45 minimum-matches=2 \
--keyword keyword2/ name=my-keyword2 probability-threshold=0.55 average-templates=true
If a per-keyword setting such as probability-threshold= is not given, it falls back to the value of the corresponding global option, such as --probability-threshold.
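The fallback rule can be pictured as layering each keyword's name=value pairs over a set of global defaults. The sketch below is hypothetical: resolve_keyword_settings and GLOBAL_DEFAULTS are illustrative names, values are kept as strings, and only the option names and default values come from the help text.

```python
from typing import Dict, List

# Hypothetical global defaults mirroring the command-line help:
# --probability-threshold 0.5, --minimum-matches 1, --average-templates off.
GLOBAL_DEFAULTS: Dict[str, str] = {
    "probability-threshold": "0.5",
    "minimum-matches": "1",
    "average-templates": "false",
}


def resolve_keyword_settings(args: List[str]) -> Dict[str, str]:
    """Turn ['keyword1/', 'name=x', 'minimum-matches=2'] into a settings dict.

    The first element is the template directory; every later element is a
    setting-name=value pair. Settings not given fall back to GLOBAL_DEFAULTS.
    """
    directory, *pairs = args
    settings = dict(GLOBAL_DEFAULTS)  # start from the global values
    settings["directory"] = directory
    for pair in pairs:
        name, _, value = pair.partition("=")
        settings[name] = value  # per-keyword value overrides the default
    return settings
```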
Add --debug to the command line for more information about the underlying computation on each audio frame.
Example
Using the example files for "okay rhasspy":
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --keyword etc/okay-rhasspy/
At least 1 of the 3 WAV templates must match before output like this is printed:
{"keyword": "okay-rhasspy", "template": "etc/okay-rhasspy/okay-rhasspy-00.wav", "detect_seconds": 2.7488508224487305, "detect_timestamp": 1594996988.638912, "raven": {"probability": 0.45637207995699963, "distance": 0.25849045215799454, "probability_threshold": 0.5, "distance_threshold": 0.22, "tick": 1, "matches": 2, "match_seconds": 0.005367016012314707}}
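Use --minimum-matches to change how many templates must match for a detection to occur, or --average-templates to combine all of the WAV templates into a single template (reducing CPU usage). Adjust sensitivity with --probability-threshold, which sets the lower bound on the detection probability (default: 0.5).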
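The distance and probability fields in the sample output are related: plugging distance 0.2585 and distance_threshold 0.22 into the logistic function below reproduces the reported probability of about 0.456. The sketch assumes Raven converts distance to probability this way; under that assumption, the default probability threshold of 0.5 corresponds to a distance exactly equal to the distance threshold.

```python
import math


def detection_probability(distance: float, distance_threshold: float = 0.22) -> float:
    """Map a normalized DTW distance to a probability with a logistic curve.

    Assumed mapping: at distance == distance_threshold the result is 0.5;
    smaller distances (closer matches) give probabilities above 0.5.
    """
    return 1.0 / (1.0 + math.exp((distance - distance_threshold) / distance_threshold))


print(detection_probability(0.25849045215799454))  # ≈ 0.456
print(detection_probability(0.22))  # 0.5
```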
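Output
When a wake word is detected, Raven outputs a single line of JSON with the following fields:
- keyword - name of the keyword or directory
- template - path to the matching WAV template
- detect_seconds - seconds since the program started when the detection occurred
- detect_timestamp - timestamp of the detection (from time.time())
- raven
  - probability - detection probability
  - probability_threshold - detection probability threshold used for comparison
  - distance - normalized dynamic time warping distance
  - distance_threshold - distance threshold used for comparison
  - matches - number of WAV templates that matched
  - match_seconds - seconds taken by the dynamic time warping calculations
  - tick - monotonic counter incremented on each detection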
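Because each detection is a single JSON object on its own line, a downstream program can read Raven's standard output line by line. A minimal sketch, assuming the field layout listed above (parse_detections is a hypothetical helper):

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_detections(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (keyword, probability) for each JSON detection line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        yield event["keyword"], event["raven"]["probability"]
```

Passing sys.stdin as lines lets a script react to detections as Raven prints them.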
Testing
You can test how well Raven works on a set of sample WAV files:
$ PATH=$PWD/bin:$PATH test-raven.py --test-directory /path/to/samples/ --keyword /path/to/templates/
This runs up to 10 instances of Raven in parallel (change with --test-workers) and outputs a JSON report with detection information and summary statistics, for example:
{
  "positive": [...],
  "negative": [...],
  "summary": {
    "true_positives": 14,
    "false_positives": 0,
    "true_negatives": 40,
    "false_negatives": 7,
    "precision": 1.0,
    "recall": 0.6666666666666666,
    "f1_score": 0.8
  }
}
Any additional command-line arguments are passed through to Raven (e.g., --minimum-matches).
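The summary figures follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below (not test-raven.py's own code) recomputes them from the counts in the example report; 14 / 21 gives the reported recall of about 0.667.

```python
def summarize(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall and F1 from detection counts.

    True negatives do not enter any of these three metrics.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}


print(summarize(tp=14, fp=0, fn=7))
```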
Command-Line Interface
usage: rhasspy-wake-raven [-h] [--keyword KEYWORD [KEYWORD ...]]
[--chunk-size CHUNK_SIZE]
[--record RECORD [RECORD ...]]
[--probability-threshold PROBABILITY_THRESHOLD]
[--distance-threshold DISTANCE_THRESHOLD]
[--minimum-matches MINIMUM_MATCHES]
[--refractory-seconds REFRACTORY_SECONDS]
[--print-all-matches]
[--window-shift-seconds WINDOW_SHIFT_SECONDS]
[--dtw-window-size DTW_WINDOW_SIZE]
[--vad-sensitivity {1,2,3}]
[--current-threshold CURRENT_THRESHOLD]
[--max-energy MAX_ENERGY]
[--max-current-ratio-threshold MAX_CURRENT_RATIO_THRESHOLD]
[--silence-method {vad_only,ratio_only,current_only,vad_and_ratio,vad_and_current,all}]
[--average-templates] [--exit-count EXIT_COUNT]
[--read-entire-input]
[--max-chunks-in-queue MAX_CHUNKS_IN_QUEUE]
[--skip-probability-threshold SKIP_PROBABILITY_THRESHOLD]
[--failed-matches-to-refractory FAILED_MATCHES_TO_REFRACTORY]
[--debug]
optional arguments:
-h, --help show this help message and exit
--keyword KEYWORD [KEYWORD ...]
Directory with WAV templates and settings (setting-
name=value)
--chunk-size CHUNK_SIZE
Number of bytes to read at a time from standard in
(default: 1920)
--record RECORD [RECORD ...]
Record example templates to a directory, optionally
with given name format (e.g., 'my-
keyword-{n:02d}.wav')
--probability-threshold PROBABILITY_THRESHOLD
Probability above which detection occurs (default:
0.5)
--distance-threshold DISTANCE_THRESHOLD
Normalized dynamic time warping distance threshold for
template matching (default: 0.22)
--minimum-matches MINIMUM_MATCHES
Number of templates that must match to produce output
(default: 1)
--refractory-seconds REFRACTORY_SECONDS
Seconds before wake word can be activated again
(default: 2)
--print-all-matches Print JSON for all matching templates instead of just
the first one
--window-shift-seconds WINDOW_SHIFT_SECONDS
Seconds to shift sliding time window on audio buffer
(default: 0.02)
--dtw-window-size DTW_WINDOW_SIZE
Size of band around slanted diagonal during dynamic
time warping calculation (default: 5)
--vad-sensitivity {1,2,3}
Webrtcvad VAD sensitivity (1-3)
--current-threshold CURRENT_THRESHOLD
Debiased energy threshold of current audio frame
--max-energy MAX_ENERGY
Fixed maximum energy for ratio calculation (default:
observed)
--max-current-ratio-threshold MAX_CURRENT_RATIO_THRESHOLD
Threshold of ratio between max energy and current
audio frame
--silence-method {vad_only,ratio_only,current_only,vad_and_ratio,vad_and_current,all}
Method for detecting silence
--average-templates Average wakeword templates together to reduce number
of calculations
--exit-count EXIT_COUNT
Exit after some number of detections (default: never)
--read-entire-input Read entire audio input at start and exit after
processing
--max-chunks-in-queue MAX_CHUNKS_IN_QUEUE
Maximum number of audio chunks waiting for processing
before being dropped
--skip-probability-threshold SKIP_PROBABILITY_THRESHOLD
Skip additional template calculations if probability
is below this threshold
--failed-matches-to-refractory FAILED_MATCHES_TO_REFRACTORY
Number of failed template matches before entering
refractory period (default: disabled)
--debug Print DEBUG messages to the console
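The help text describes template matching in terms of a normalized dynamic time warping distance (--distance-threshold) computed within a band around a slanted diagonal (--dtw-window-size). A minimal, generic sketch of such a computation follows. The frame cost (cosine distance), the band rule, and normalizing by the sum of both sequence lengths are assumptions chosen for illustration, not a transcription of Raven's implementation; the default window of 5 mirrors --dtw-window-size.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. an MFCC frame


def cosine_distance(a: Frame, b: Frame) -> float:
    """Return 1 - cosine similarity; 0.0 means the frames point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def banded_dtw_distance(x: List[Frame], y: List[Frame], window: int = 5) -> float:
    """Normalized DTW distance restricted to a band around the slanted diagonal.

    Row i only visits columns within `window` of the straight line joining
    the first and last cells, so the diagonal is slanted whenever the two
    sequences differ in length. Returns inf if the band is too narrow to
    connect the two corners.
    """
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        center = round(i * m / n)  # column on the slanted diagonal
        for j in range(max(1, center - window), min(m, center + window) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = cosine_distance(x[i - 1], y[j - 1]) + step
    # Dividing by the total length keeps long and short templates comparable.
    return cost[n][m] / (n + m)
```

Under these assumptions, the returned value is what would be compared against --distance-threshold (default 0.22) for each window position.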