Fuzzy string matching in Python
Project description
fuzzywuzzymit
Fuzzy string matching like a boss. It uses Levenshtein distance to calculate the differences between sequences, in a simple-to-use package.
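For readers unfamiliar with the metric: Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is an illustrative pure-Python implementation only; the function name `levenshtein` is hypothetical and is not part of the fuzzywuzzymit API.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b (illustrative sketch)."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b; start with the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" costs i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```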
Requirements
Python 2.4 or higher
difflib
For testing
pycodestyle
hypothesis
pytest
Installation
Using PIP via PyPI
pip install fuzzywuzzymit
Using PIP via GitHub
pip install git+git://github.com/graingert/fuzzywuzzymit.git@0.16.0#egg=fuzzywuzzymit
Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)
git+ssh://git@github.com/graingert/fuzzywuzzymit.git@0.16.0#egg=fuzzywuzzymit
Manually via GIT
git clone git://github.com/graingert/fuzzywuzzymit.git fuzzywuzzymit
cd fuzzywuzzymit
python setup.py install
Usage
>>> from fuzzywuzzymit import fuzz
>>> from fuzzywuzzymit import process
Simple Ratio
>>> fuzz.ratio("this is a test", "this is a test!")
97
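The score of 97 can be reproduced with the standard library alone. The sketch below assumes the difflib backend listed under Requirements; `simple_ratio` is a hypothetical helper name, not the package's own code, and a different backend could give slightly different scores on other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of a and b as an integer from 0 to 100 (illustrative sketch)."""
    # SequenceMatcher.ratio() returns 2 * M / T, where M is the number of
    # matched characters and T is the combined length of both strings.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# 14 matched characters out of 14 + 15 = 29 total: 2 * 14 / 29 is about 0.966.
print(simple_ratio("this is a test", "this is a test!"))  # 97
```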
Partial Ratio
>>> fuzz.partial_ratio("this is a test", "this is a test!")
100
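One way to think about the partial ratio is as the best simple ratio between the shorter string and any equally long window of the longer one. The brute-force sketch below illustrates that idea under this assumption; the package may select candidate windows differently, and `partial_ratio_sketch` is a hypothetical name.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of a and b as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def partial_ratio_sketch(a, b):
    """Best simple ratio of the shorter string against windows of the longer."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # assumption for this sketch: an empty string matches nothing
    width = len(shorter)
    # Slide a window of len(shorter) across the longer string and keep the best.
    return max(
        simple_ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```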
Token Sort Ratio
>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100
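The jump from 91 to 100 follows from ignoring word order: if both strings are split into tokens and the tokens are sorted before comparing, the two inputs become identical. The sketch below shows that idea with a hypothetical `token_sort_ratio_sketch`; it only lowercases and splits on whitespace, whereas the package may normalise its input further.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of a and b as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def sort_tokens(text):
    """Lowercase the text, split on whitespace, and rejoin the sorted tokens."""
    return " ".join(sorted(text.lower().split()))


def token_sort_ratio_sketch(a, b):
    """Simple ratio of the two strings after sorting their tokens."""
    return simple_ratio(sort_tokens(a), sort_tokens(b))


# Both inputs normalise to "a bear fuzzy was wuzzy".
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```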
Token Set Ratio
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100
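The token set ratio additionally ignores repeated words. A common way to describe it is: split both strings into token sets, then compare the shared tokens against each string's shared-plus-unique tokens and keep the best score. The sketch below follows that description as an assumption; `token_set_ratio_sketch` is a hypothetical name and the package's normalisation may differ.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of a and b as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_ratio_sketch(a, b):
    """Best simple ratio among shared tokens and shared-plus-unique tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    shared = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = (shared + " " + only_a).strip()
    combined_b = (shared + " " + only_b).strip()
    return max(
        simple_ratio(shared, combined_a),
        simple_ratio(shared, combined_b),
        simple_ratio(combined_a, combined_b),
    )


# The duplicate "fuzzy" disappears once the tokens are treated as a set.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```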
Process
>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
[('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
("Dallas Cowboys", 90)
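Conceptually, extracting the best matches amounts to scoring every choice against the query, sorting by score, and keeping the top `limit` results. The sketch below illustrates this with a plain case-insensitive simple ratio; `extract_sketch` is a hypothetical name, and because the package's default scorer is not reproduced here, the scores of partial matches will not necessarily equal those shown above.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of a and b as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def extract_sketch(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [
        (choice, simple_ratio(query.lower(), choice.lower()))
        for choice in choices
    ]
    # Sort by score, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
top_two = extract_sketch("new york jets", choices, limit=2)
print(top_two)
```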
You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths:
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
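The effect of swapping the scorer can be illustrated without the package installed. In the sketch below, `extract_one_sketch` accepts any function that maps two strings to a score; all names and the sample `titles` list are hypothetical and chosen only to show that a word-order-insensitive scorer can rate a candidate higher than a plain character comparison does.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of a and b as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_scorer(a, b):
    """Simple ratio after lowercasing and sorting whitespace-separated tokens."""
    def normalise(text):
        return " ".join(sorted(text.lower().split()))
    return simple_ratio(normalise(a), normalise(b))


def extract_one_sketch(query, choices, scorer=simple_ratio):
    """Return the single (choice, score) pair with the highest score."""
    return max(
        ((choice, scorer(query, choice)) for choice in choices),
        key=lambda pair: pair[1],
    )


titles = ["fuzzy was a bear", "a bear was not fuzzy"]  # hypothetical sample data
by_characters = extract_one_sketch("bear a was fuzzy", titles)
by_sorted_tokens = extract_one_sketch("bear a was fuzzy", titles, scorer=token_sort_scorer)
print(by_characters)
print(by_sorted_tokens)  # ('fuzzy was a bear', 100)
```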
Known Ports
fuzzywuzzymit is being ported to other languages too! Here are a few ports we know about:
Java: xpresso's fuzzywuzzymit implementation
Java: fuzzywuzzymit (Java port)
Rust: fuzzyrusty (Rust port)
JavaScript: fuzzball.js (JavaScript port)
C++: Tmplt/fuzzywuzzymit
Project details
Hashes for fuzzywuzzymit-0.0.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 463314625a307362dde9a43ccca44057d3446a6daa156974c9fabf11644446ee
MD5 | 2a598d3d565e5cc850556aa2b1274bde
BLAKE2b-256 | 3509f73fef4ec31c3c4a0ce0b7097514c378165d2e8337325f5ffbe4b867398c
Hashes for fuzzywuzzymit-0.0.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e30bd65c0171c2c5ef0a55a559bca7dd699b1a7b65735b1b111d6f57ddba2100
MD5 | e3f95f7ae8398634cc6f449a8e9301bc
BLAKE2b-256 | 12f86004a43f8bb86e9a87b2fa5c01a35c7e15cb0a4c243ee557c96207f96bd6