A simple tool for predicting text categories with a variety of models.
# TextClassify
## Models
* fastText char
* fastText word
* CNN char embedding
* CNN word embedding
* CNN char & word embedding
* CNN + BiGRU + char & word embedding
## Word segmentation models
* pyltp
* jieba
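Both segmenters above are standard third-party libraries. As a rough, self-contained sketch (not code from this package), the snippet below shows how each one might tokenize a sentence; the pyltp model path reuses the default `pyltp_model` value from the Parameters section, and the loading call assumes the classic pyltp API (`Segmentor().load(...)`), which differs in newer releases.

```python
import jieba
from pyltp import Segmentor

sentence = '使用各种模型预测文本类别'

# jieba: dictionary-based segmentation, no external model file needed
jieba_words = jieba.lcut(sentence)

# pyltp: load the LTP CWS model, segment, then release resources
# (path is the default pyltp_model value from the Parameters section)
segmentor = Segmentor()
segmentor.load('/data_hdd/ltp_data/cws.model')
pyltp_words = list(segmentor.segment(sentence))
segmentor.release()

print(jieba_words)
print(pyltp_words)
```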
## Embeddings
* fastText (CBOW / skip-gram)
* gensim
Either library can be used to build character-level or word-level embeddings.
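For reference, a 256-dimensional character or word embedding (comparable to the `wiki_char_256.model` / `wiki_word_256.model` files listed under Parameters) could be trained with gensim roughly as follows. This is only an illustrative sketch: the tiny corpus is made up, and the `vector_size` keyword assumes gensim 4.x (older releases call it `size`).

```python
from gensim.models import Word2Vec

# illustrative corpus of pre-segmented sentences:
# lists of characters for a char model, lists of words for a word model
sentences = [
    ['文', '本', '分', '类', '工', '具'],
    ['字', '符', '嵌', '入', '模', '型'],
]

# 256 dimensions to match embedding_dim; sg=1 selects skip-gram, sg=0 CBOW
model = Word2Vec(sentences, vector_size=256, sg=1, min_count=1)
model.save('wiki_char_256.model')
```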
## Usage
```python
from text_classify import TextClassify
# default parameters
t = TextClassify()
text = ''  # the text to classify
logits = t.predict(text, precision='16')
# get the index-to-label mapping
t.index2label
# get the top-k labels
t.get_top_label(text, k=5, precision='16')
```
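The exact return types of `predict` and `get_top_label` are not documented above. The sketch below is one plausible way to consume them, assuming `predict` returns one score per class aligned with `index2label` and `get_top_label` yields (label, score) pairs; treat both assumptions as illustrative rather than the library's guaranteed behaviour.

```python
from text_classify import TextClassify

t = TextClassify()
text = '如何评价最新的图像分类模型？'

# assumption: predict returns one score per class, aligned with index2label
logits = t.predict(text, precision='16')
best_index = max(range(len(logits)), key=lambda i: logits[i])
print('top label:', t.index2label[best_index])

# assumption: get_top_label yields (label, score) pairs
for label, score in t.get_top_label(text, k=5, precision='16'):
    print(label, score)
```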
## Parameters
### `TextClassify`
* model: 'fasttext' (default), 'cnn', 'mcnn', 'mgcnn'
* cut: True, False (default)
* cut_model: 'pyltp' (default), 'jieba'
* pyltp_model: '/data_hdd/ltp_data/cws.model'
* fasttext_char_model: '/data_hdd/embedding/fasttext/zhihu_char_model.bin'
* fasttext_word_model: '/data_hdd/embedding/fasttext/zhihu_word_model.bin'
* cnn_char_model: '/home/keming/GitHub/custom_recom/cnn_char_fulltext_best.pth'
* cnn_word_model: '/home/keming/GitHub/custom_recom/cnn_word_fulltext_best.pth'
* mcnn_model: '/home/keming/GitHub/custom_recom/mcnn_fulltext_best.pth'
* mgcnn_model: '/home/keming/GitHub/custom_recom/mgcnn_fulltext_best.pth'
* char_embedding_model: '/data_hdd/embedding/wiki_char_256.model'
* word_embedding_model: '/data_hdd/embedding/wiki_word_256.model'
* words_index: '/data_hdd/zhihu/topic/words.csv'
* chars_index: '/data_hdd/zhihu/topic/chars.csv'
* labels_index: '/data_hdd/zhihu/topic/topics.csv'
* delete_char: '/data_hdd/zhihu/del_chars.txt'
* num_class: 384
* embedding_dim: 256
* num_filter: 128
* char_sentence_length: 256
* word_sentence_length: 128
* char_vocab_size: 12592
* word_vocab_size: 727811
* filter_size_1: [2, 3, 4, 5]
* filter_size_2: [2, 3, 4]
* rnn_num_unit: 128
* rnn_num_layer: 2
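As a minimal sketch of overriding some of the defaults above, the constructor might be called like this when switching to the CNN model with jieba segmentation. The keyword names simply mirror the parameter list, the paths are the defaults shown there, and nothing here is verified against the package's actual signature.

```python
from text_classify import TextClassify

# keyword names taken from the parameter list above; values are illustrative overrides
t = TextClassify(
    model='cnn',
    cut=True,
    cut_model='jieba',
    cnn_char_model='/home/keming/GitHub/custom_recom/cnn_char_fulltext_best.pth',
    char_embedding_model='/data_hdd/embedding/wiki_char_256.model',
    num_class=384,
    embedding_dim=256,
)
logits = t.predict('待分类的文本', precision='32')
```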
### `TextClassify.predict`
* text
* precision: '16' (default), '32', '64'
### `TextClassify.get_top_label`
* text
* k: 5 (default), the number of labels to return
* precision: '16' (default), '32', '64'
## Hashes for text_classify-0.0.8-py2.py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 317d292c27e1eb1aaae0879b5aee8b0030ebb2431135b7c38773d2c883f4c767 |
| MD5 | 832d9297cbc00384a1632c19e9fc2122 |
| BLAKE2b-256 | 4b14eb2f2ce36770ef53730eb0cc1abfe8932babeeff293b067be9d5469a8ead |