Fast semantic search and comparison
PumpkinPy - Semantic similarity implemented in Python
About
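PumpkinPy uses IC-ordered bitmaps to rank genes and diseases quickly (phenotypes are sorted by descending frequency and one-hot encoded). This is useful for larger ontologies such as Upheno, and for large datasets, for example ranking all mouse genes against a given set of input HPO terms. The approach was originally used in OWLTools and OwlSim-v3.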
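To make the bitmap idea concrete, here is a minimal sketch that uses plain Python integers as bitmaps instead of pyroaring. The term IDs, the frequencies, and the helper names (`build_index`, `encode`, `jaccard`) are illustrative assumptions and not PumpkinPy's API; the library's actual ordering and encoding may differ.

```python
from typing import Dict, Iterable

# Hypothetical annotation frequencies for a toy ontology (not real HPO counts).
TERM_FREQUENCY: Dict[str, int] = {
    "T:root": 100,
    "T:a": 60,
    "T:b": 30,
    "T:c": 5,
}


def build_index(freq: Dict[str, int]) -> Dict[str, int]:
    """Assign bit positions by descending frequency, i.e. ascending IC."""
    ordered = sorted(freq, key=lambda term: (-freq[term], term))
    return {term: position for position, term in enumerate(ordered)}


def encode(terms: Iterable[str], index: Dict[str, int]) -> int:
    """One-hot encode a set of terms as an integer bitmap."""
    bitmap = 0
    for term in terms:
        bitmap |= 1 << index[term]
    return bitmap


def jaccard(left: int, right: int) -> float:
    """Jaccard similarity from the popcounts of bitwise AND and OR."""
    union = bin(left | right).count("1")
    if union == 0:
        return 0.0
    return bin(left & right).count("1") / union


index = build_index(TERM_FREQUENCY)
profile_x = encode(["T:root", "T:a", "T:c"], index)
profile_y = encode(["T:root", "T:a", "T:b"], index)
similarity = jaccard(profile_x, profile_y)
```

With this encoding, set intersection and union reduce to single bitwise operations, which is the property that keeps ranking many candidates cheap.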
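The goal of this project is to build a Python implementation of the PhenoDigm algorithm. It also includes implementations of common distance and similarity measures (Euclidean, cosine, Jiang-Conrath, Resnik, Jaccard).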
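As a rough illustration of two of these measures, the sketch below computes information content (IC), Resnik similarity, and Jaccard similarity on a toy ontology. The closures, annotations, and function names are hypothetical and are not taken from PumpkinPy. The `pairwise_score` function combines the two as a geometric mean, which is our assumption about how a PhenoDigm-style term score is formed; consult the PhenoDigm reference for the exact definition.

```python
import math
from typing import Dict, Set

# Toy closures: each term maps to itself plus all of its ancestors (made-up IDs).
CLOSURES: Dict[str, Set[str]] = {
    "T:root": {"T:root"},
    "T:a": {"T:a", "T:root"},
    "T:b": {"T:b", "T:a", "T:root"},
    "T:c": {"T:c", "T:a", "T:root"},
}

# Toy annotations: entity -> directly annotated terms (made-up IDs).
ANNOTATIONS: Dict[str, Set[str]] = {
    "D:1": {"T:b"},
    "D:2": {"T:c"},
    "D:3": {"T:a"},
    "D:4": {"T:root"},
}


def information_content(term: str) -> float:
    """IC = -log2(p); p is the share of entities annotated to the term or a descendant.

    Every term in the toy data has at least one annotation, so p is never zero.
    """
    annotated = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in CLOSURES[t] for t in terms)
    )
    return -math.log2(annotated / len(ANNOTATIONS))


def resnik(term_x: str, term_y: str) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = CLOSURES[term_x] & CLOSURES[term_y]
    return max(information_content(t) for t in common)


def jaccard(term_x: str, term_y: str) -> float:
    """Jaccard similarity of the two ancestor sets."""
    x, y = CLOSURES[term_x], CLOSURES[term_y]
    return len(x & y) / len(x | y)


def pairwise_score(term_x: str, term_y: str) -> float:
    """Geometric mean of Resnik and Jaccard (assumed PhenoDigm-style term score)."""
    return math.sqrt(resnik(term_x, term_y) * jaccard(term_x, term_y))


resnik_bc = resnik("T:b", "T:c")
jaccard_bc = jaccard("T:b", "T:c")
score_bc = pairwise_score("T:b", "T:c")
```

Here `T:b` and `T:c` share the ancestors `T:a` and `T:root`, so their Resnik similarity is the IC of `T:a`, the more specific of the two.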
Disclaimer: this is a side project and needs more documentation and testing.
Getting Started
Requires Python 3.8+ and python3-dev to install pyroaring.
Installing from PyPI
pip install pumpkin_py
Building locally
To build locally, first install poetry:
https://python-poetry.org/docs/#installation
Then run make:
make
Usage
Get a list of implemented similarity measures
from pumpkin_py import get_methods
get_methods()
['jaccard', 'cosine', 'phenodigm', 'symmetric_phenodigm', 'resnik', 'symmetric_resnik', 'ic_cosine', 'sim_gic']
Load closures and annotations
import gzip
from pathlib import Path
from pumpkin_py import build_ic_graph_from_closures, flat_to_annotations, search
closures = Path('.') / 'data' / 'hpo' / 'hp-closures.tsv.gz'
annotations = Path('.') / 'data' / 'hpo' / 'phenotype-annotations.tsv.gz'
root = "HP:0000118"
with gzip.open(annotations, 'rt') as annot_file:
    annot_map = flat_to_annotations(annot_file)
with gzip.open(closures, 'rt') as closure_file:
    graph = build_ic_graph_from_closures(closure_file, root, annot_map)
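The layout of the two input files is not described here. Purely as an illustration, the sketch below assumes a gzip-compressed, two-column, tab-separated file (term and ancestor for closures; entity and term for annotations) and parses an in-memory sample with the standard library. `read_pairs` is a hypothetical helper and the `EX:` identifiers are made up; neither comes from pumpkin_py.

```python
import gzip
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def read_pairs(handle: TextIO) -> Dict[str, Set[str]]:
    """Group the second column by the first column of a two-column TSV."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t")[:2]
        grouped[key].add(value)
    return dict(grouped)


# Build a small gzip payload in memory so the example needs no files on disk.
sample = "EX:2\tEX:1\nEX:2\tEX:2\nEX:3\tEX:1\n"
payload = gzip.compress(sample.encode("utf-8"))

# gzip.open accepts a file object as well as a path; 'rt' decodes to text.
with gzip.open(io.BytesIO(payload), "rt") as closure_file:
    closure_map = read_pairs(closure_file)
```

If the real files use a different column order or include a header row, the parsing step would need to change accordingly.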
Search for the best-matching diseases given a phenotype profile
import pprint
from pumpkin_py import search
profile_a = (
    "HP:0000403,HP:0000518,HP:0000565,HP:0000767,"
    "HP:0000872,HP:0001257,HP:0001263,HP:0001290,"
    "HP:0001629,HP:0002019,HP:0002072".split(',')
)
search_results = search(profile_a, annot_map, graph, 'phenodigm')
pprint.pprint(search_results.results[0:5])
[SimMatch(id='ORPHA:94125', rank=1, score=72.67599348696685),
SimMatch(id='ORPHA:79137', rank=2, score=71.57368233248252),
SimMatch(id='OMIM:619352', rank=3, score=70.98305459477629),
SimMatch(id='OMIM:618624', rank=4, score=70.94596234638497),
SimMatch(id='OMIM:617106', rank=5, score=70.83097366257857)]
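The results above are ordered by score and carry a 1-based rank. How `search` assigns ranks internally is not shown in this document; the sketch below is one plausible way to turn raw scores into ranked matches. `Match` and `rank_matches` are hypothetical names, the scores are made up, and the tie handling (equal scores share a rank) is an assumption.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    id: str
    rank: int
    score: float


def rank_matches(scores: Dict[str, float]) -> List[Match]:
    """Sort by descending score; equal scores share a rank (assumed behavior)."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    matches: List[Match] = []
    for position, (entity_id, score) in enumerate(ordered, start=1):
        if matches and matches[-1].score == score:
            rank = matches[-1].rank  # tie: reuse the previous rank
        else:
            rank = position
        matches.append(Match(entity_id, rank, score))
    return matches


# Made-up scores for illustration only.
ranked = rank_matches({"X:1": 40.0, "X:2": 72.5, "X:3": 40.0, "X:4": 10.0})
top_two = ranked[:2]
```

Slicing the ranked list, as in `ranked[:2]`, mirrors the `search_results.results[0:5]` call in the example above.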
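Example scripts for fetching Monarch annotations and closures
Closures and class labels are generated with robot and sparql
Annotation data comes from the latest Monarch release
- Requires Java > 8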
cd data/monarch/ && make
PhenoDigm reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3649640/
Exomiser: https://github.com/exomiser/Exomiser
OWLTools: https://github.com/owlcollab/owltools
OWLSim-v3: https://github.com/monarch-initiative/owlsim-v3