A library for calculating a variety of features from text using spaCy

TextDescriptives

TextDescriptives is a Python library for calculating a large variety of metrics from text using spaCy v.3 pipeline components and extensions.

🔧 Installation

pip install textdescriptives

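📰 News

We now have a TextDescriptives-powered web app, so you can extract and download metrics without writing any code.
Version 2.0 is out, with a new API, new components, updated documentation, and tutorials. Components are now called "textdescriptives/{metric_name}". The new coherence component calculates the semantic coherence between sentences. See the documentation for tutorials and more information.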

⚡ Quick Start

Use extract_metrics to quickly extract the metrics you want. To see the available metrics, run:

import textdescriptives as td
td.get_valid_metrics()
# {'quality', 'readability', 'all', 'descriptive_stats', 'dependency_distance', 'pos_proportions', 'information_theory', 'coherence'}
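Because metric names are plain strings, a typo only surfaces once extraction runs. The sketch below checks a requested list against the valid names up front. `check_metrics` is a hypothetical helper, not part of TextDescriptives, and the set is hard-coded from the output above so the snippet runs without the library installed.

```python
# Metric names copied from the get_valid_metrics() output shown above;
# in real use, call td.get_valid_metrics() instead of hard-coding them.
VALID_METRICS = {
    "quality", "readability", "all", "descriptive_stats",
    "dependency_distance", "pos_proportions", "information_theory", "coherence",
}


def check_metrics(requested, valid=VALID_METRICS):
    """Hypothetical helper: return the requested names, or raise on unknown ones."""
    unknown = sorted(set(requested) - set(valid))
    if unknown:
        raise ValueError(f"Unknown metrics {unknown}; choose from {sorted(valid)}")
    return list(requested)


print(check_metrics(["readability", "coherence"]))
```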

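Set the spacy_model parameter to specify which spaCy model to use; otherwise, TextDescriptives will automatically download an appropriate model based on lang. If lang is set, spacy_model is not necessary, and vice versa.

Specify which metrics to extract in the metrics argument. None extracts all metrics.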

import textdescriptives as td

text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
# will automatically download the relevant model (`en_core_web_lg`) and extract all metrics
df = td.extract_metrics(text=text, lang="en", metrics=None)

# specify spaCy model and which metrics to extract
df = td.extract_metrics(text=text, spacy_model="en_core_web_lg", metrics=["readability", "coherence"])
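The either-or rule for spacy_model and lang can be written as a small argument builder. This is a minimal sketch: `build_extract_kwargs` is a hypothetical helper, and preferring spacy_model when both are given is a choice made here, not documented library behaviour.

```python
from typing import Iterable, Optional


def build_extract_kwargs(
    spacy_model: Optional[str] = None,
    lang: Optional[str] = None,
    metrics: Optional[Iterable[str]] = None,
) -> dict:
    """Hypothetical helper: assemble keyword arguments for td.extract_metrics."""
    if spacy_model is None and lang is None:
        raise ValueError("Set either spacy_model or lang")
    # metrics=None is passed through unchanged and means "extract everything".
    kwargs = {"metrics": None if metrics is None else list(metrics)}
    if spacy_model is not None:
        # Assumption of this sketch: an explicit model wins over lang.
        kwargs["spacy_model"] = spacy_model
    else:
        kwargs["lang"] = lang
    return kwargs


print(build_extract_kwargs(lang="en"))
print(build_extract_kwargs(spacy_model="en_core_web_lg", metrics=["readability"]))
```

The result can then be unpacked into the call, for example `td.extract_metrics(text=text, **build_extract_kwargs(lang="en"))`.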

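Usage with spaCy

To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. The available components are descriptive_stats, readability, dependency_distance, pos_proportions, coherence, and quality, each prefixed with textdescriptives/.

If you want to add all components, you can use the shorthand textdescriptives/all.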

import spacy
import textdescriptives as td
# load your favourite spacy model (remember to install it first using e.g. `python -m spacy download en_core_web_sm`)
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/all") 
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")

# access some of the values
doc._.readability
doc._.token_length

TextDescriptives includes convenience functions for extracting metrics from a Doc to a Pandas DataFrame or a dictionary.

td.extract_dict(doc)
td.extract_df(doc)

	text	first_order_coherence	second_order_coherence	pos_prop_DET	pos_prop_NOUN	pos_prop_AUX	pos_prop_VERB	pos_prop_PUNCT	pos_prop_PRON	pos_prop_ADP	pos_prop_ADV	pos_prop_SCONJ	flesch_reading_ease	flesch_kincaid_grade	smog	gunning_fog	automated_readability_index	coleman_liau_index	lix	rix	n_stop_words	alpha_ratio	mean_word_length	doc_length	proportion_ellipsis	proportion_bullet_points	duplicate_line_chr_fraction	duplicate_paragraph_chr_fraction	duplicate_5-gram_chr_fraction	duplicate_6-gram_chr_fraction	duplicate_7-gram_chr_fraction	duplicate_8-gram_chr_fraction	duplicate_9-gram_chr_fraction	duplicate_10-gram_chr_fraction	top_2-gram_chr_fraction	top_3-gram_chr_fraction	top_4-gram_chr_fraction	symbol_#_to_word_ratio	contains_lorem ipsum	passed_quality_check	dependency_distance_mean	dependency_distance_std	prop_adjacent_dependency_relation_mean	prop_adjacent_dependency_relation_std	token_length_mean	token_length_median	token_length_std	sentence_length_mean	sentence_length_median	sentence_length_std	syllables_per_token_mean	syllables_per_token_median	syllables_per_token_std	n_tokens	n_unique_tokens	proportion_unique_tokens	n_characters	n_sentences
0	The world is changed(...)	0.633002	0.573323	0.097561	0.121951	0.0731707	0.170732	0.146341	0.195122	0.0731707	0.0731707	0.0487805	107.879	-0.0485714	5.68392	3.94286	-2.45429	-0.708571	12.7143	0.4	24	0.853659	2.95122	41	0	0	0	0	0.232258	0.232258	0	0	0	0	0.0580645	0.174194	0	0	False	False	1.77524	0.553188	0.457143	0.0722806	3.28571	3	1.54127	7	6	3.09839	1.08571	1	0.368117	35	23	0.657143	121	5
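Several readability columns in this row can be checked by hand from the descriptive statistics beside them. The sketch below assumes the standard textbook formulas for Flesch Reading Ease, Flesch-Kincaid Grade, LIX and RIX. These are not taken from the library's source, but they reproduce the table values for this text. Word and sentence counts come from a naive regular expression, and the syllable average is copied from the syllables_per_token_mean column.

```python
import re

text = ("The world is changed. I feel it in the water. I feel it in the earth. "
        "I smell it in the air. Much that once was is lost, for none now live "
        "who remember it.")

# Naive counts that agree with the table for this simple text
# (n_tokens = 35, n_sentences = 5); spaCy's tokenizer is more careful.
words = re.findall(r"[A-Za-z]+", text)
n_words = len(words)
n_sentences = len(re.findall(r"[.!?]", text))
n_long = sum(len(w) > 6 for w in words)  # words longer than six letters

words_per_sentence = n_words / n_sentences
syllables_per_word = 1.08571  # syllables_per_token_mean, copied from the table

flesch_reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
flesch_kincaid_grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
lix = words_per_sentence + 100 * n_long / n_words
rix = n_long / n_sentences

print(round(flesch_reading_ease, 3))   # table value: 107.879
print(round(flesch_kincaid_grade, 4))  # table value: -0.0485714
print(round(lix, 4))                   # table value: 12.7143
print(rix)                             # table value: 0.4
```

A Flesch Reading Ease score above 100 and a slightly negative grade level are expected here: the sentences average seven words and almost every word has one syllable.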

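📖 Documentation

TextDescriptives has detailed documentation as well as a series of Jupyter notebook tutorials. All the tutorials are located in the docs/tutorials folder and can also be found on the documentation website.

Documentation
📚 Getting started	Guides and instructions on how to use TextDescriptives and its features.
👩‍💻 Demo	A live demo of TextDescriptives.
😎 Tutorials	Detailed tutorials on how to make the most of TextDescriptives.
📰 News and changelog	New additions, changes, and version history.
🎛 API reference	The detailed reference for the TextDescriptives API, including function documentation.
📄 Paper	The preprint of the TextDescriptives paper.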

Release history

2.8.2	May 31, 2024 (this version)
2.8.1	May 7, 2024
2.8.0	Apr 9, 2024
2.7.3	Feb 6, 2024
2.7.2	Feb 6, 2024
2.7.1	Oct 31, 2023
2.7.0	Oct 12, 2023
2.6.2	Jul 31, 2023
2.6.1	May 3, 2023
2.6.0	Apr 28, 2023
2.5.1	Apr 26, 2023
2.5.0	Apr 26, 2023
2.4.6	Apr 24, 2023
2.4.5	Apr 19, 2023
2.4.4	Mar 28, 2023
2.4.3	Mar 1, 2023
2.4.2	Mar 1, 2023
2.4.1	Feb 8, 2023
2.4.0	Jan 31, 2023
2.3.0	Jan 23, 2023
2.2.0	Jan 16, 2023
2.1.0	Jan 6, 2023
2.0.10	Jan 3, 2023
2.0.4	Jan 3, 2023
1.1.1	Dec 5, 2022
1.1.0	Sep 26, 2022
1.0.7	May 4, 2022
1.0.6	Oct 28, 2021
1.0.5	Oct 4, 2021
1.0.4	Aug 31, 2021
1.0.3	Aug 17, 2021
1.0.2	Aug 16, 2021
1.0.1	Aug 9, 2021
1.0.0	Aug 9, 2021
0.2.0	Aug 9, 2021
0.1.1	Mar 6, 2020

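Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textdescriptives-2.8.2.tar.gz (1.4 MB)

Uploaded May 31, 2024 Source

Built Distribution

textdescriptives-2.8.2-py3-none-any.whl (254.3 kB)

Uploaded May 31, 2024 Python 3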