Classy Classification

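Have you ever needed a Spacy TextCategorizer but had no time to train one from scratch? Classy Classification is the way to go! For few-shot classification with sentence-transformers or spaCy models, provide a dictionary of labels and examples. For zero-shot classification with Huggingface zero-shot classifiers, provide just a list of labels.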
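The two input shapes described above (a label-to-examples dictionary for few-shot, a plain label list for zero-shot) are easy to mix up. Below is a minimal sketch of a sanity check you could run before handing data to the pipeline. The helper name `describe_training_data` is hypothetical and not part of the package; it only uses the standard library.

```python
from typing import Dict, List, Union

TrainingData = Union[Dict[str, List[str]], List[str]]


def describe_training_data(data: TrainingData) -> str:
    """Return "few-shot" for a label -> examples dict, "zero-shot" for a label list.

    Hypothetical helper for illustration; not part of classy-classification.
    """
    if isinstance(data, dict):
        if not data:
            raise ValueError("few-shot data needs at least one label")
        for label, examples in data.items():
            if not isinstance(label, str):
                raise TypeError(f"label {label!r} is not a string")
            if not isinstance(examples, list) or not examples:
                raise ValueError(f"label {label!r} needs a non-empty list of examples")
            if not all(isinstance(text, str) for text in examples):
                raise TypeError(f"examples for {label!r} must all be strings")
        return "few-shot"
    if isinstance(data, list):
        if not data or not all(isinstance(label, str) for label in data):
            raise ValueError("zero-shot data must be a non-empty list of label strings")
        return "zero-shot"
    raise TypeError("data must be a dict (few-shot) or a list (zero-shot)")


print(describe_training_data({"furniture": ["This text is about chairs."]}))
print(describe_training_data(["furniture", "kitchen"]))
```

A check like this fails early with a readable message instead of surfacing as an error deep inside the embedding or training step.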

Installation

pip install classy-classification

SetFit support

I got a lot of requests for SetFit support, but I decided to create a separate package for it. Feel free to check it out. ❤️

Quickstart

SpaCy embeddings

import spacy
# or import standalone
# from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture" : 0.21}, {"kitchen": 0.79}]
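The printed outputs in this README show label scores either as a list of single-key dictionaries or as one dictionary holding all labels. As a hedged sketch, the helpers below accept both shapes and pick the best-scoring label. The names `flatten_scores` and `top_label` are hypothetical and not part of the package.

```python
from collections.abc import Mapping
from typing import Dict, List, Tuple, Union

Scores = Union[Dict[str, float], List[Dict[str, float]]]


def flatten_scores(cats: Scores) -> Dict[str, float]:
    """Merge a dict, or a list of dicts, of label scores into one flat dict."""
    if isinstance(cats, Mapping):
        return dict(cats)
    merged: Dict[str, float] = {}
    for entry in cats:
        merged.update(entry)
    return merged


def top_label(cats: Scores) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    flat = flatten_scores(cats)
    if not flat:
        raise ValueError("no scores to choose from")
    label = max(flat, key=flat.get)
    return label, flat[label]


print(top_label([{"furniture": 0.21}, {"kitchen": 0.79}]))
# ('kitchen', 0.79)
```

With the pipeline above, this would be called as `top_label(doc._.cats)`.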

Sentence level classification

import spacy

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
        "include_sent": True
    }
)

doc = nlp("I am looking for kitchen appliances. And I love doing so.")
# doc.sents is a generator, so materialize it before indexing
print(list(doc.sents)[0]._.cats)

# Output:
#
# [{"furniture" : 0.21}, {"kitchen": 0.79}]

Define random seed and verbosity

nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "verbose": True,
        "config": {"seed": 42}
    }
)

Multi-label classification

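Sometimes multiple labels are needed to fully describe the contents of a text. In that case, use the multi-label implementation, where the label scores are not constrained to sum to 1. Just pass the same training example under every key it belongs to.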

import spacy

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa.",
               "We have a new dinner table.",
               "There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table.",
                "There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
        "multi_label": True,
    }
)

print(nlp("I am looking for furniture and kitchen equipment.")._.cats)

# Output:
#
# [{"furniture": 0.92}, {"kitchen": 0.91}]
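Writing shared examples by hand under several keys gets tedious. If your annotations are stored as (text, labels) pairs, a small helper can build the dictionary for you. This is a sketch only; `build_multi_label_data` is a hypothetical name and not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_multi_label_data(
    examples: Iterable[Tuple[str, Iterable[str]]]
) -> Dict[str, List[str]]:
    """Group (text, labels) pairs into a label -> texts dict.

    A text tagged with several labels is repeated under each of them.
    """
    data: Dict[str, List[str]] = defaultdict(list)
    for text, labels in examples:
        for label in labels:
            data[label].append(text)
    return dict(data)


examples = [
    ("I really need to get a new sofa.", ["furniture"]),
    ("Do you also have some ovens.", ["kitchen"]),
    ("We have a new dinner table.", ["furniture", "kitchen"]),
]
data = build_multi_label_data(examples)
print(data["furniture"])
print(data["kitchen"])
```

The resulting `data` has the same shape as the dictionary above and can be passed to the pipe config together with `"multi_label": True`.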

Outlier detection

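Sometimes it is worth being able to do outlier detection or binary classification. This can be approached with a binary training dataset, but I have also implemented support for a OneClassSVM that does outlier detection from a single label. Note: this method does not return probabilities, but the data is formatted as label-score pairs to keep the output uniform.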

Approach 1

import spacy

data_binary = {
    "inlier": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "outlier": ["Text about kitchen equipment",
                "This text is about politics",
                "Comments about AI and stuff."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data_binary,
    }
)

print(nlp("This text is a random text")._.cats)

# Output:
#
# [{'inlier': 0.2926672385488411, 'outlier': 0.707332761451159}]

Approach 2

import spacy

data_singular = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa.",
               "We have a new dinner table."]
}
nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data_singular,
    }
)

print(nlp("This text is a random text")._.cats)

# Output:
#
# [{'furniture': 0, 'not_furniture': 1}]
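To make the "label-score pairs instead of probabilities" note concrete: scikit-learn's OneClassSVM predicts +1 for inliers and -1 for outliers, and the output above looks like that decision mapped onto 0/1 scores. The sketch below shows one way such a mapping could be done. It is an illustration under that assumption, not the package's actual code, and `format_one_class_prediction` is a hypothetical name.

```python
from typing import Dict


def format_one_class_prediction(prediction: int, label: str) -> Dict[str, int]:
    """Turn a +1 (inlier) / -1 (outlier) prediction into label-score pairs."""
    if prediction not in (1, -1):
        raise ValueError("expected +1 (inlier) or -1 (outlier)")
    is_inlier = int(prediction == 1)
    # The "not_" prefix mirrors the output shown above.
    return {label: is_inlier, f"not_{label}": 1 - is_inlier}


print(format_one_class_prediction(-1, "furniture"))
# {'furniture': 0, 'not_furniture': 1}
```

Because the scores are hard 0/1 decisions, thresholding them like probabilities adds nothing; treat them as a yes/no answer.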

Sentence Transformer embeddings

import spacy

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]

Huggingface zero-shot classifiers

import spacy

data = ["furniture", "kitchen"]

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "typeform/distilbert-base-uncased-mnli",
        "cat_type": "zero",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]

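Credits

Inspiration drawn from

Huggingface does offer some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it is too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Huggingface zero-shot rather than rely on default word embeddings. Finally, I decided to integrate with Spacy, since training a custom Spacy TextCategorizer seems like a lot of hassle if you want something quick and dirty.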

Standalone usage without spaCy

from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classifier = ClassyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])

# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")

# overwrite [embedding model](https://sbert.net.cn/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")

# overwrite SVC config
classifier.set_classification_model(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernel": ["linear"],
        "max_cross_validation_folds": 5
    }
)
classifier("I am looking for kitchen appliances.")
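The SVC config above lists several values for `C` and `kernel`, which reads like a hyperparameter grid. Assuming the list-valued entries are searched as a grid (as scikit-learn's GridSearchCV would do) and the scalar `max_cross_validation_folds` only controls cross-validation, the candidate settings can be enumerated as follows. This is a sketch under that assumption; `iter_candidates` is a hypothetical helper and not part of the package.

```python
from itertools import product
from typing import Any, Dict, Iterator

config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "max_cross_validation_folds": 5,
}


def iter_candidates(config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one parameter dict per combination of the list-valued entries."""
    # Assumption: only list-valued keys belong to the search grid.
    grid = {key: values for key, values in config.items() if isinstance(values, list)}
    keys = list(grid)
    for combo in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, combo))


candidates = list(iter_candidates(config))
print(len(candidates))
# 6
print(candidates[0])
# {'C': 1, 'kernel': 'linear'}
```

Each extra value in any list multiplies the number of candidates, so small grids keep retraining quick on few-shot data.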

Save and load models

import pickle

from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}
classifier = ClassyClassifier(data=data)

# serialize the trained classifier to disk
with open("./classifier.pkl", "wb") as f:
    pickle.dump(classifier, f)

# load it back; the context manager closes the file handle
with open("./classifier.pkl", "rb") as f:
    classifier = pickle.load(f)
classifier("I am looking for kitchen appliances.")

Current release: 1.0.0 (May 31, 2024)
