Classy Classification
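Have you ever needed a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification with sentence-transformers or spaCy models, provide a dictionary of labels and examples. For zero-shot classification with Huggingface zero-shot classifiers, provide just a list of labels.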
Installation
pip install classy-classification
SetFit support
I got a lot of requests for SetFit support, but I decided to create a separate package for this feature. Feel free to check it out. ❤️
Quickstart
spaCy embeddings
import spacy
# or import standalone
# from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture" : 0.21}, {"kitchen": 0.79}]
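The `cats` output above is a collection of label-score pairs. Below is a minimal sketch, in plain Python, of picking the highest-scoring label from it. The helper name `top_label` is hypothetical and not part of classy-classification; it assumes the output is either a single dict or a list of dicts, since both shapes appear in the printed outputs in this README.

```python
def top_label(cats):
    """Return (label, score) for the highest-scoring label.

    `cats` may be a dict of label -> score or a list of such dicts.
    """
    if isinstance(cats, dict):
        cats = [cats]
    merged = {}
    for entry in cats:
        merged.update(entry)
    if not merged:
        raise ValueError("no label scores found")
    label = max(merged, key=merged.get)
    return label, merged[label]


print(top_label([{"furniture": 0.21}, {"kitchen": 0.79}]))
# prints ('kitchen', 0.79)
```

In a pipeline this could be called as `top_label(doc._.cats)` after processing a text.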
Sentence-level classification
import spacy

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
        "include_sent": True
    }
)

# doc.sents is a generator, so materialise it before indexing
doc = nlp("I am looking for kitchen appliances. And I love doing so.")
print(list(doc.sents)[0]._.cats)

# Output:
#
# [{"furniture" : 0.21}, {"kitchen": 0.79}]
Define random seed and verbosity
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "verbose": True,
        "config": {"seed": 42}
    }
)
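Multi-label classification
Sometimes multiple labels are needed to fully describe the contents of a text. In that case, use the multi-label implementation, where the sum of label scores is not limited to 1. Just pass the same training data to multiple keys.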
import spacy

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa.",
                  "We have a new dinner table.",
                  "There also exist things like fridges.",
                  "I hope to be getting a new stove today.",
                  "Do you also have some ovens.",
                  "We have a new dinner table."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table.",
                "There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
        "multi_label": True,
    }
)

print(nlp("I am looking for furniture and kitchen equipment.")._.cats)

# Output:
#
# [{"furniture": 0.92}, {"kitchen": 0.91}]
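To illustrate why multi-label scores need not sum to 1, here is a small sketch in plain Python. It is not the library's implementation: it only contrasts a softmax, which normalises raw scores across labels, with an independent sigmoid per label. The raw scores and the function names are made up for illustration.

```python
import math


def softmax_scores(raw):
    """Single-label style: scores compete with each other and sum to 1."""
    peak = max(raw.values())
    exps = {label: math.exp(value - peak) for label, value in raw.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def sigmoid_scores(raw):
    """Multi-label style: each label is scored independently."""
    return {label: 1.0 / (1.0 + math.exp(-value)) for label, value in raw.items()}


raw = {"furniture": 2.4, "kitchen": 2.3}  # hypothetical raw decision scores
single = softmax_scores(raw)
multi = sigmoid_scores(raw)

print({label: round(score, 2) for label, score in multi.items()})
print(sum(multi.values()) > 1)  # independent scores can add up to more than 1
```

With independent scoring, a text about both furniture and kitchen equipment can score high on both labels at once, which a normalised score cannot express.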
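Outlier detection
Sometimes it is worthwhile to do outlier detection or binary classification. This can be done with a binary training dataset, but I have also implemented support for a OneClassSVM for outlier detection using a single label. Note that this method does not return probabilities; the data is formatted as label-score pairs to ensure uniformity.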
Approach 1
import spacy

data_binary = {
    "inlier": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "outlier": ["Text about kitchen equipment",
                "This text is about politics",
                "Comments about AI and stuff."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data_binary,
    }
)

print(nlp("This text is a random text")._.cats)

# Output:
#
# [{'inlier': 0.2926672385488411, 'outlier': 0.707332761451159}]
Approach 2
import spacy

data_singular = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa.",
                  "We have a new dinner table."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data_singular,
    }
)

print(nlp("This text is a random text")._.cats)

# Output:
#
# [{'furniture': 0, 'not_furniture': 1}]
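The single-label approach yields a decision rather than a probability, yet the result is still shaped like label-score pairs. Below is a minimal sketch of that formatting, assuming the underlying one-class model returns +1 for inliers and -1 for outliers (the convention scikit-learn's OneClassSVM uses). The helper `format_one_class` is hypothetical and only mirrors the output shown above; it is not the library's code.

```python
def format_one_class(label, prediction):
    """Map a +1/-1 one-class prediction to label-score pairs."""
    if prediction not in (1, -1):
        raise ValueError("prediction must be +1 (inlier) or -1 (outlier)")
    is_inlier = prediction == 1
    return {label: int(is_inlier), f"not_{label}": int(not is_inlier)}


print(format_one_class("furniture", -1))
# prints {'furniture': 0, 'not_furniture': 1}
```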
Sentence Transformer embeddings
import spacy

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]
Huggingface zero-shot classifiers
import spacy

data = ["furniture", "kitchen"]

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "typeform/distilbert-base-uncased-mnli",
        "cat_type": "zero",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]
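Credits
Inspiration drawn from
Huggingface offers some nice models for few-shot and zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it is too deeply embedded in their codebase for easy use outside of Rasa and chatbots. Additionally, it made sense to integrate sentence-transformers and Huggingface zero-shot classification rather than relying only on default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.
Or buy me a coffee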
Standalone usage without spaCy
from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classifier = ClassyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])

# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")

# overwrite [embedding model](https://sbert.net.cn/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")

# overwrite SVC config
classifier.set_classification_model(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernel": ["linear"],
        "max_cross_validation_folds": 5
    }
)
classifier("I am looking for kitchen appliances.")
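The SVC config above lists several candidate values per hyperparameter. Below is a minimal sketch of how such lists could expand into a search grid. It assumes the lists are treated like a standard grid search and that `max_cross_validation_folds` acts as an upper bound on the number of folds; both are assumptions about the behaviour, not confirmed details. `expand_grid` and `choose_folds` are hypothetical helpers, not library functions.

```python
from itertools import product


def expand_grid(config):
    """Yield one dict per combination of the list-valued hyperparameters."""
    keys = [key for key, value in config.items() if isinstance(value, list)]
    for combo in product(*(config[key] for key in keys)):
        yield dict(zip(keys, combo))


def choose_folds(max_folds, samples_per_label):
    """Cap the fold count by the smallest class size (assumed behaviour)."""
    return max(2, min(max_folds, min(samples_per_label)))


config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "max_cross_validation_folds": 5,
}

candidates = list(expand_grid(config))
print(len(candidates))  # six values of C times one kernel gives 6 candidates
print(choose_folds(config["max_cross_validation_folds"], [3, 3]))
```

With only three examples per label, as in the data above, a cap of 5 folds could not be reached, which is why the sketch bounds the fold count by the smallest class.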
Save and load models
import pickle

from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classifier = ClassyClassifier(data=data)

# save the fitted classifier to disk
with open("./classifier.pkl", "wb") as f:
    pickle.dump(classifier, f)

# load it back and classify
with open("./classifier.pkl", "rb") as f:
    classifier = pickle.load(f)

classifier("I am looking for kitchen appliances.")