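An insanely fast Whisper CLI

Project description

Insanely Fast Whisper

A CLI tool to transcribe audio files using Whisper on-device! Powered by 🤗 Transformers, Optimum and flash-attn.

TL;DR: transcribe 150 minutes (2.5 hours) of audio in less than 98 seconds with OpenAI's Whisper Large v3. Blazingly fast transcription is now a reality! ⚡️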

pipx install insanely-fast-whisper==0.0.14 --force

Not convinced? Here are some benchmarks we ran on an Nvidia A100-80GB 👇

Optimisation type	Time to transcribe in minutes (150 minutes of audio)
large-v3 (Transformers) (`fp32`)	~31 (31 min 1 sec)
large-v3 (Transformers) (`fp16` + `batching [24]` + `bettertransformer`)	~5 (5 min 2 sec)
large-v3 (Transformers) (`fp16` + `batching [24]` + `Flash Attention 2`)	*~2 (1 min 38 sec)*
distil-large-v2 (Transformers) (`fp16` + `batching [24]` + `bettertransformer`)	~3 (3 min 16 sec)
distil-large-v2 (Transformers) (`fp16` + `batching [24]` + `Flash Attention 2`)	*~1 (1 min 18 sec)*
large-v2 (Faster Whisper) (`fp16` + `beam_size [1]`)	~9.23 (9 min 23 sec)
large-v2 (Faster Whisper) (`8-bit` + `beam_size [1]`)	~8 (8 min 15 sec)
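To put the table in perspective, the short sketch below converts two of the reported timings into a real-time factor and a relative speedup. It only uses figures from the table above; the helper names `to_seconds` and `real_time_factor` are illustrative and not part of the project.

```python
# Figures taken from the benchmark table: 150 minutes of audio on an A100-80GB.
AUDIO_SECONDS = 150 * 60


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'X min Y sec' table entry to seconds."""
    return minutes * 60 + seconds


def real_time_factor(transcription_seconds: int) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return AUDIO_SECONDS / transcription_seconds


fp32_time = to_seconds(31, 1)   # large-v3, fp32
flash_time = to_seconds(1, 38)  # large-v3, fp16 + batching [24] + Flash Attention 2

flash_rtf = real_time_factor(flash_time)
speedup_vs_fp32 = fp32_time / flash_time

print(f"{flash_rtf:.1f}x real time, {speedup_vs_fp32:.1f}x faster than fp32")
# prints "91.8x real time, 19.0x faster than fp32"
```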

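P.S. We also ran the benchmarks on a Google Colab T4 GPU instance!

P.P.S. This project originally started as a way to showcase benchmarks for Transformers, but has since evolved into a lightweight CLI for people to use. It is purely community driven. We add whatever the community shows strong demand for!

🆕 Blazingly fast transcriptions via your terminal! ⚡️

We've added a CLI to enable fast transcriptions. Here's how you can use it:

Install insanely-fast-whisper with pipx (pip install pipx or brew install pipx):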

pipx install insanely-fast-whisper

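⚠️ If you have Python 3.11.XX installed, pipx may parse the version incorrectly and silently install a very old version of insanely-fast-whisper (version 0.0.8, which no longer works with the current BetterTransformers). In that case, you can install the latest version by passing --ignore-requires-python to pip: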

pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"

If you're installing with pip, you can pass the argument directly: pip install insanely-fast-whisper --ignore-requires-python.

Run inference from any path on your computer:

insanely-fast-whisper --file-name <filename or URL>

Note: if you are running on macOS, you also need to add the --device-id mps flag.

🔥 You can run Whisper-large-v3 with Flash Attention 2 from this CLI too:

insanely-fast-whisper --file-name <filename or URL> --flash True

🌟 You can run distil-whisper directly from this CLI too:

insanely-fast-whisper --model-name distil-whisper/large-v2 --file-name <filename or URL>

Don't want to install insanely-fast-whisper? Just use pipx run:

pipx run insanely-fast-whisper --file-name <filename or URL>

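[!NOTE] The CLI is highly opinionated and only works on NVIDIA GPUs and Macs. Make sure to check the defaults and the list of options you can play around with to maximise your transcription throughput. Run insanely-fast-whisper --help or pipx run insanely-fast-whisper --help to get all the CLI arguments along with their defaults.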
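If you want to drive the CLI from a script, a minimal sketch might look like the following. It assumes `insanely-fast-whisper` is already on your PATH and only uses flags shown on this page; `build_command` and `transcribe` are hypothetical helper names, not something the package provides.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(
    file_name: str,
    model_name: Optional[str] = None,
    device_id: Optional[str] = None,
    batch_size: Optional[int] = None,
    flash: bool = False,
) -> List[str]:
    """Assemble an argument list; omitted options fall back to the CLI defaults."""
    cmd = ["insanely-fast-whisper", "--file-name", file_name]
    if model_name is not None:
        cmd += ["--model-name", model_name]
    if device_id is not None:
        cmd += ["--device-id", device_id]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if flash:
        cmd += ["--flash", "True"]
    return cmd


def transcribe(file_name: str, **options) -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(file_name, **options), check=True)


# Show the command a Mac user might run, without executing it.
print(shlex.join(build_command("audio.mp3", device_id="mps", batch_size=4)))
# prints "insanely-fast-whisper --file-name audio.mp3 --device-id mps --batch-size 4"
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.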

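CLI Options

The insanely-fast-whisper repo provides all-round support for running Whisper in various settings. Note that as of today (26 November), insanely-fast-whisper works on both CUDA and mps (Mac) enabled devices.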

  -h, --help            show this help message and exit
  --file-name FILE_NAME
                        Path or URL to the audio file to be transcribed.
  --device-id DEVICE_ID
                        Device ID for your GPU. Just pass the device number when using CUDA, or "mps" for Macs with Apple Silicon. (default: "0")
  --transcript-path TRANSCRIPT_PATH
                        Path to save the transcription output. (default: output.json)
  --model-name MODEL_NAME
                        Name of the pretrained model/ checkpoint to perform ASR. (default: openai/whisper-large-v3)
  --task {transcribe,translate}
                        Task to perform: transcribe or translate to another language. (default: transcribe)
  --language LANGUAGE   
                        Language of the input audio. (default: "None" (Whisper auto-detects the language))
  --batch-size BATCH_SIZE
                        Number of parallel batches you want to compute. Reduce if you face OOMs. (default: 24)
  --flash FLASH         
                        Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly. (default: False)
  --timestamp {chunk,word}
                        Whisper supports both chunked as well as word level timestamps. (default: chunk)
  --hf-token HF_TOKEN
                        Provide a hf.co/settings/token for Pyannote.audio to diarise the audio clips
  --diarization_model DIARIZATION_MODEL
                        Name of the pretrained model/ checkpoint to perform diarization. (default: pyannote/speaker-diarization)
  --num-speakers NUM_SPEAKERS
                        Specifies the exact number of speakers present in the audio file. Useful when the exact number of participants in the conversation is known. Must be at least 1. Cannot be used together with --min-speakers or --max-speakers. (default: None)
  --min-speakers MIN_SPEAKERS
                        Sets the minimum number of speakers that the system should consider during diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be less than or equal to --max-speakers if both are specified. (default: None)
  --max-speakers MAX_SPEAKERS
                        Defines the maximum number of speakers that the system should consider in diarization. Must be at least 1. Cannot be used together with --num-speakers. Must be greater than or equal to --min-speakers if both are specified. (default: None)
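The constraints on the three speaker options can be easy to mix up. The sketch below restates them as a small check, based only on the help text above; it is not the CLI's own validation code, and `validate_speaker_args` is a hypothetical name.

```python
from typing import Optional


def validate_speaker_args(
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> None:
    """Raise ValueError if the combination breaks the rules in the help text."""
    named = (
        ("--num-speakers", num_speakers),
        ("--min-speakers", min_speakers),
        ("--max-speakers", max_speakers),
    )
    # Each value, when given, must be at least 1.
    for name, value in named:
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")
    # An exact count excludes a range.
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        raise ValueError(
            "--num-speakers cannot be used together with --min-speakers or --max-speakers"
        )
    # A range must be ordered.
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("--min-speakers must be less than or equal to --max-speakers")


validate_speaker_args(num_speakers=2)                  # fine: exact count
validate_speaker_args(min_speakers=2, max_speakers=4)  # fine: ordered range
```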

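Frequently Asked Questions

How to correctly install flash-attn to make it work with insanely-fast-whisper?

Make sure to install it via pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation. Kudos to @li-yifei for helping with this.

How to solve an AssertionError: Torch not compiled with CUDA enabled error on Windows?

The root cause of this problem is still unknown, but you can resolve it by manually installing torch in the virtual environment: python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121. Thanks to @pto2k for debugging this.

How to avoid Out-Of-Memory (OOM) exceptions on Mac?

The mps backend isn't as optimised as CUDA, so it is far more memory hungry. Typically you can run with --batch-size 4 without any issues (this should use roughly 12GB of GPU VRAM). Don't forget to set --device-id mps.

How to use Whisper without a CLI?

All you need is the snippet below: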

pip install --upgrade transformers optimum accelerate

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3", # select checkpoint from https://hugging-face.cn/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cuda:0", # or mps for Mac devices
    model_kwargs={"attn_implementation": "flash_attention_2"} if is_flash_attn_2_available() else {"attn_implementation": "sdpa"},
)

outputs = pipe(
    "<FILE_NAME>",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

outputs
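With return_timestamps=True, the Transformers pipeline returns a dictionary with a "text" string and a "chunks" list, where each chunk has a "timestamp" (start, end) pair in seconds and its own "text". As a rough sketch, the chunks could be turned into readable lines as below. The `sample` dictionary is hand-written stand-in data rather than real model output, and the helper names are illustrative.

```python
from typing import Any, Dict, List, Optional


def format_timestamp(seconds: Optional[float]) -> str:
    """Render seconds as HH:MM:SS.mmm; the pipeline may leave a final end time as None."""
    if seconds is None:
        return "??:??:??.???"
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chunks_to_lines(outputs: Dict[str, Any]) -> List[str]:
    """Build one '[start -> end] text' line per chunk."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
    return lines


# Hand-written stand-in for the `outputs` variable from the snippet above.
sample = {
    "text": " Hello there. How are you?",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " Hello there."},
        {"timestamp": (2.5, None), "text": " How are you?"},
    ],
}

for line in chunks_to_lines(sample):
    print(line)
```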

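Acknowledgements

The OpenAI Whisper team for open-sourcing such a brilliant checkpoint.
The Hugging Face Transformers team, specifically Arthur, Patrick, Sanchit and Yoach (in alphabetical order), for continuing to maintain Whisper in Transformers.
The Hugging Face Optimum team for making the BetterTransformer API so easily accessible.
Patrick Arminio for helping me tremendously in putting this CLI together.

Community showcase

@ochen1 created a brilliant MVP for a CLI here: [https://github.com/ochen1/insanely-fast-whisper-cli](https://github.com/ochen1/insanely-fast-whisper-cli) (Try it out now!)
@arihanv created an app (Shush) using NextJS (frontend) and Modal (backend): [https://github.com/arihanv/Shush](https://github.com/arihanv/Shush) (Check it out!)
@kadirnar created a Python package on top of transformers with optimisations: [https://github.com/kadirnar/whisper-plus](https://github.com/kadirnar/whisper-plus) (Go try it!)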

Project details

Release history

Version	Release date
0.0.15 (this version)	May 27, 2024
0.0.14	May 25, 2024
0.0.13	Dec 14, 2023
0.0.12	Dec 10, 2023
0.0.11	Dec 10, 2023
0.0.10	Nov 27, 2023
0.0.9	Nov 27, 2023
0.0.8	Nov 22, 2023
0.0.7	Nov 17, 2023
0.0.6	Nov 17, 2023
0.0.5	Nov 17, 2023
0.0.5b3 (pre-release)	Nov 14, 2023
0.0.5b2 (pre-release)	Nov 14, 2023
0.0.5b1 (pre-release)	Nov 14, 2023
0.0.5b0 (pre-release)	Nov 14, 2023
0.0.4	Nov 12, 2023
0.0.3	Nov 5, 2023
0.0.2	Nov 4, 2023
0.0.1	Nov 2, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source distribution

insanely_fast_whisper-0.0.15.tar.gz (16.5 kB)

Uploaded May 27, 2024, Source

Built distribution

insanely_fast_whisper-0.0.15-py3-none-any.whl (16.0 kB)

Uploaded May 27, 2024, Python 3

Hashes for insanely_fast_whisper-0.0.15.tar.gz
Algorithm	Hash digest
SHA256	`58596ec51056d6cd7e400068a87972dc77aa30afa244791194ede4f2a6d0a330`
MD5	`19a03607818f60827ce01d3516b701d8`
BLAKE2b-256	`ffddc2680ebdc945482793c6fe00813720d4e3238e62f612ad480194cd5692d3`

Hashes for insanely_fast_whisper-0.0.15-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7228c0a34020e40ef8ec5742e0da986d5aa07b72d3294b81fdca08cac9f9e594`
MD5	`f7fe9929481263930c6eec3820f28d82`
BLAKE2b-256	`fd6436d433ed015e4bd74a597e572005b370ca8675658e61db736328057ab063`
