Homoglyphs
项目描述
Homoglyphs
Homoglyphs -- 获取 同形异义词 并转换为ASCII的python库。
功能
它是 confusable_homoglyphs 的更智能版本
- 自动或手动选择类别(来自 ISO 15924的别名)。
- 自动或手动仅加载内存中所需的字母。
- 转换为ASCII。
- 更可配置。
- 更稳定。
安装
sudo pip install homoglyphs
使用
最好的解释方式是展示其工作原理。所以,让我们看看实际的使用方法。
导入
import homoglyphs as hg
语言
#detect
hg.Languages.detect('w')
# {'pl', 'da', 'nl', 'fi', 'cz', 'sr', 'pt', 'it', 'en', 'es', 'sk', 'de', 'fr', 'ro'}
hg.Languages.detect('т')
# {'mk', 'ru', 'be', 'bg', 'sr'}
hg.Languages.detect('.')
# set()
# get alphabet for languages
hg.Languages.get_alphabet(['ru'])
# {'в', 'Ё', 'К', 'Т', ..., 'Р', 'З', 'Э'}
# get all languages
hg.Languages.get_all()
# {'nl', 'lt', ..., 'de', 'mk'}
类别
类别 -- (来自 ISO 15924的别名)。
#detect
hg.Categories.detect('w')
# 'LATIN'
hg.Categories.detect('т')
# 'CYRILLIC'
hg.Categories.detect('.')
# 'COMMON'
# get alphabet for categories
hg.Categories.get_alphabet(['CYRILLIC'])
# {'ӗ', 'Ԍ', 'Ґ', 'Я', ..., 'Э', 'ԕ', 'ӻ'}
# get all categories
hg.Categories.get_all()
# {'RUNIC', 'DESERET', ..., 'SOGDIAN', 'TAI_LE'}
Homoglyphs
获取同形异义词
# get homoglyphs (latin alphabet initialized by default)
hg.Homoglyphs().get_combinations('q')
# ['q', '𝐪', '𝑞', '𝒒', '𝓆', '𝓺', '𝔮', '𝕢', '𝖖', '𝗊', '𝗾', '𝘲', '𝙦', '𝚚']
字母加载
# load alphabet on init by categories
homoglyphs = hg.Homoglyphs(categories=('LATIN', 'COMMON', 'CYRILLIC')) # alphabet loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы', 'ꭇы', 'ꭈы', '𝐫ы', '𝑟ы', '𝒓ы', '𝓇ы', '𝓻ы', '𝔯ы', '𝕣ы', '𝖗ы', '𝗋ы', '𝗿ы', '𝘳ы', '𝙧ы', '𝚛ы']
# load alphabet on init by languages
homoglyphs = hg.Homoglyphs(languages={'ru', 'en'}) # alphabet will be loaded here
homoglyphs.get_combinations('гы')
# ['rы', 'гы']
# manual set alphabet on init # eng rus
homoglyphs = hg.Homoglyphs(alphabet='abc абс')
homoglyphs.get_combinations('с')
# ['c', 'с']
# load alphabet on demand
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# ^ alphabet will be loaded here for "en" language
homoglyphs.get_combinations('гы')
# ^ alphabet will be loaded here for "ru" language
# ['rы', 'гы']
您可以按需组合 categories
、languages
、alphabet
和任何策略。策略指定如何处理尚未加载的任何字符
STRATEGY_LOAD
:为此字符加载类别STRATEGY_IGNORE
:将字符添加到结果STRATEGY_REMOVE
:从结果中删除字符
将符号转换为ASCII字符
homoglyphs = hg.Homoglyphs(languages={'en'}, strategy=hg.STRATEGY_LOAD)
# convert
homoglyphs.to_ascii('ТЕСТ')
# ['TECT']
homoglyphs.to_ascii('ХР123.') # this is cyrillic "х" and "р"
# ['XP123.', 'XPI23.', 'XPl23.']
# string with chars which can't be converted by default will be ignored
homoglyphs.to_ascii('лол')
# []
# you can set strategy for removing not converted non-ASCII chars from result
homoglyphs = hg.Homoglyphs(
languages={'en'},
strategy=hg.STRATEGY_LOAD,
ascii_strategy=hg.STRATEGY_REMOVE,
)
homoglyphs.to_ascii('лол')
# ['o']
# also you can set up range of allowed char codes for ascii (0-128 by default):
homoglyphs = hg.Homoglyphs(
languages={'en'},
strategy=hg.STRATEGY_LOAD,
ascii_strategy=hg.STRATEGY_REMOVE,
ascii_range=range(ord('a'), ord('z')),
)
homoglyphs.to_ascii('ХР123.')
# ['l']
homoglyphs.to_ascii('хр123.')
# ['xpl']
项目详情
下载文件
下载适用于您平台的应用程序。如果您不确定选择哪个,请了解更多关于安装包的信息。
源代码分发
homoglyphs-2.0.4.tar.gz (88.3 kB 查看散列值)
构建分发
homoglyphs-2.0.4-py3-none-any.whl (88.4 kB 查看散列值)