spaCy wrapper for Hugging Face Transformers pipelines
Project description
spacy-huggingface-pipelines: Use pretrained transformer models for text and token classification
This package provides spaCy components for inference-only Hugging Face Transformers pipelines.
Features
- Apply pretrained transformer models such as dslim/bert-base-NER and distilbert-base-uncased-finetuned-sst-2-english.
🚀 Installation
Installing the package from pip will automatically install all dependencies, including PyTorch and spaCy.
pip install -U pip setuptools wheel
pip install spacy-huggingface-pipelines
For GPU installation, follow the spaCy quickstart with GPU, e.g.:
pip install -U spacy[cuda-autodetect]
If you're having trouble installing PyTorch, follow the instructions on the official website for your specific operating system and requirements.
📖 Documentation
This module provides spaCy wrappers for the inference-only transformers TokenClassificationPipeline and TextClassificationPipeline pipelines.
The models are downloaded on initialization from the Hugging Face Hub if they're not already in your local cache, or alternatively they can be loaded from a local path.
Note that the transformer model data is not saved with the pipeline when you call nlp.to_disk, so if you load pipelines in an environment with limited internet access, make sure the model is available in your transformers cache directory and enable offline mode if needed.
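For example, a minimal sketch of loading such a pipeline offline, assuming the model is already present in the local cache (TRANSFORMERS_OFFLINE is the standard transformers environment variable for offline mode, not something specific to this package):

import os
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # set before transformers is first imported

import spacy

nlp = spacy.blank("en")
# resolves dslim/bert-base-NER from the local transformers cache only;
# this raises an error if the model was never downloaded
nlp.add_pipe("hf_token_pipe", config={"model": "dslim/bert-base-NER"})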
Token classification
Config settings for hf_token_pipe:
[components.hf_token_pipe]
factory = "hf_token_pipe"
model = "dslim/bert-base-NER" # Model name or path
revision = "main" # Model revision
aggregation_strategy = "average" # "simple", "first", "average", "max"
stride = 16 # If stride >= 0, process long texts in
# overlapping windows of the model max
# length. The value is the length of the
# window overlap in transformer tokenizer
# tokens, NOT the length of the stride.
kwargs = {} # Any additional arguments for
# TokenClassificationPipeline
alignment_mode = "strict" # "strict", "contract", "expand"
annotate = "ents" # "ents", "pos", "spans", "tag"
annotate_spans_key = null # Doc.spans key for annotate = "spans"
scorer = null # Optional scorer
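The same settings can also be passed to nlp.add_pipe as a config dict; a minimal sketch mirroring the defaults above:

import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "hf_token_pipe",
    config={
        "model": "dslim/bert-base-NER",
        "revision": "main",
        "aggregation_strategy": "average",
        "stride": 16,  # overlap between windows in transformer tokens
        "alignment_mode": "strict",
        "annotate": "ents",
    },
)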
Settings for TokenClassificationPipeline:
- model: The model name or path.
- revision: The model revision. For production use, a specific git commit is recommended instead of the default main.
- stride: For stride >= 0, the text is processed in overlapping windows of the model max length, where the stride setting specifies the number of transformer tokenizer tokens that overlap between windows (NOT the length of the stride). If stride is None, the text may be truncated. stride is only supported for fast tokenizers.
- aggregation_strategy: The aggregation strategy determines the word-level tags for cases where the subwords within one word do not receive the same predicted tag. See: https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline.aggregation_strategy
- kwargs: Any additional arguments passed to TokenClassificationPipeline.
spaCy settings:
- alignment_mode determines how transformer predictions are aligned to spaCy token boundaries, as described for Doc.char_span (see the short illustration after this list).
- annotate and annotate_spans_key configure how the annotation is saved to the spaCy doc. You can save the output as token.tag_, token.pos_ (only for UPOS tags), doc.ents, or doc.spans.
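For background, a small illustration of the three alignment modes using Doc.char_span directly (plain spaCy, with hypothetical character offsets that cut through a token):

import spacy

nlp = spacy.blank("en")
doc = nlp("autonomous cars")
# the character span [0, 5) ends inside the token "autonomous"
print(doc.char_span(0, 5, alignment_mode="strict"))    # None: offsets must match token boundaries
print(doc.char_span(0, 5, alignment_mode="contract"))  # None: no token lies fully inside the span
print(doc.char_span(0, 5, alignment_mode="expand"))    # autonomous: expanded to the cut token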
Examples
- Save named entity annotation as Doc.ents:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe("hf_token_pipe", config={"model": "dslim/bert-base-NER"})
doc = nlp("My name is Sarah and I live in London")
print(doc.ents)
# (Sarah, London)
- Save named entity annotation as Doc.spans[spans_key] and scores as Doc.spans[spans_key].attrs["scores"]:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe(
    "hf_token_pipe",
    config={
        "model": "dslim/bert-base-NER",
        "annotate": "spans",
        "annotate_spans_key": "bert-base-ner",
    },
)
doc = nlp("My name is Sarah and I live in London")
print(doc.spans["bert-base-ner"])
# [Sarah, London]
print(doc.spans["bert-base-ner"].attrs["scores"])
# [0.99854773, 0.9996215]
- Save fine-grained tags as Token.tag:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe(
    "hf_token_pipe",
    config={
        "model": "QCRI/bert-base-multilingual-cased-pos-english",
        "annotate": "tag",
    },
)
doc = nlp("My name is Sarah and I live in London")
print([t.tag_ for t in doc])
# ['PRP$', 'NN', 'VBZ', 'NNP', 'CC', 'PRP', 'VBP', 'IN', 'NNP']
- Save coarse-grained tags as Token.pos:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe(
    "hf_token_pipe",
    config={"model": "vblagoje/bert-english-uncased-finetuned-pos", "annotate": "pos"},
)
doc = nlp("My name is Sarah and I live in London")
print([t.pos_ for t in doc])
# ['PRON', 'NOUN', 'AUX', 'PROPN', 'CCONJ', 'PRON', 'VERB', 'ADP', 'PROPN']
Text classification
Config settings for hf_text_pipe:
[components.hf_text_pipe]
factory = "hf_text_pipe"
model = "distilbert-base-uncased-finetuned-sst-2-english" # Model name or path
revision = "main" # Model revision
kwargs = {} # Any additional arguments for
# TextClassificationPipeline
scorer = null # Optional scorer
The input texts are truncated according to the transformers model max length.
Settings for TextClassificationPipeline:
- model: The model name or path (a sketch of loading from a local path follows this list).
- revision: The model revision. For production use, a specific git commit is recommended instead of the default main.
- kwargs: Any additional arguments passed to TextClassificationPipeline.
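Since model accepts a path as well as a name, a minimal sketch of loading from a local directory (the path below is hypothetical; it should contain a model saved with transformers' save_pretrained):

import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "hf_text_pipe",
    config={"model": "/path/to/local/model"},  # hypothetical local directory
)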
Example:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe(
    "hf_text_pipe",
    config={"model": "distilbert-base-uncased-finetuned-sst-2-english"},
)
doc = nlp("This is great!")
print(doc.cats)
# {'POSITIVE': 0.9998694658279419, 'NEGATIVE': 0.00013048505934420973}
Batching and GPU
Both token and text classification support batching with nlp.pipe:
for doc in nlp.pipe(texts, batch_size=256):
    do_something(doc)
If the component runs into an error while processing a batch (e.g. on an empty text), nlp.pipe will back off to processing each text individually. If an error is encountered on an individual text, a warning is shown and the doc is returned without additional annotation.
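A short sketch of this fallback behavior, reusing the hf_text_pipe example above (the empty string is assumed to trigger the per-text fallback, per the note about empty texts):

texts = ["This is great!", "", "Not my favorite."]
for doc in nlp.pipe(texts, batch_size=2):
    # the empty text comes back without doc.cats populated, after a warning
    print(repr(doc.text), doc.cats)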
Switch to GPU:
import spacy
spacy.require_gpu()
for doc in nlp.pipe(texts):
    do_something(doc)
Bug reports and issues
Please report bugs in the spaCy issue tracker or open a new thread on the discussion board for other issues.