A fast tool to calculate Hamming distances
A small and fast C++ tool to calculate pairwise distances between gene sequences given in fasta format.
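As a point of reference, the quantity being computed can be sketched in plain Python. The Hamming distance between two equal-length sequences is the number of positions at which they differ, and the pairwise distance matrix holds this value for every pair. This naive O(n²·L) sketch only illustrates the definition: it assumes plain character-by-character comparison and does not model hammingdist's own treatment of special characters (for example 'X', see the include_x option). The names `hamming` and `distance_matrix` are illustrative and not part of hammingdist.

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(seqs: Sequence[str]) -> List[List[int]]:
    """Naive symmetric pairwise distance matrix (illustration only)."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d = hamming(seqs[i], seqs[j])
            matrix[i][j] = matrix[j][i] = d  # symmetric, zero diagonal
    return matrix


print(distance_matrix(["ACG", "ACG", "TAG"]))  # [[0, 0, 2], [0, 0, 2], [2, 2, 0]]
```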
Python interface
To use the Python interface, install it from PyPI:
python -m pip install hammingdist
Distance matrix
You can then use it from Python as follows:
import hammingdist
# To see the different optional arguments available:
help(hammingdist.from_fasta)
# To import all sequences from a fasta file
data = hammingdist.from_fasta("example.fasta")
# To import only the first 100 sequences from a fasta file
data = hammingdist.from_fasta("example.fasta", n=100)
# To import all sequences and remove any duplicates
data = hammingdist.from_fasta("example.fasta", remove_duplicates=True)
# To import all sequences from a fasta file, also treating 'X' as a valid character
data = hammingdist.from_fasta("example.fasta", include_x=True)
# The distance data can be accessed point-wise, though looping over all distances might be quite inefficient
print(data[14,42])
Output formats
The constructed distance matrix can be written to disk in several different formats:
# The data can be written to disk in csv format (default `distance` Ripser format) and retrieved:
data.dump("backup.csv")
retrieval = hammingdist.from_csv("backup.csv")
# It can also be written in lower triangular format (comma-delimited row-major, `lower-distance` Ripser format):
data.dump_lower_triangular("lt.txt")
retrieval = hammingdist.from_lower_triangular("lt.txt")
# Or in sparse format (`sparse` Ripser format: space-delimited triplet of `i j d(i,j)`
# with one line for each distance entry i > j which is not above threshold):
data.dump_sparse("sparse.txt", threshold=3)
# If the `remove_duplicates` option was used, the sequence indices can also be written.
# For each input sequence, this prints the corresponding index in the output:
data.dump_sequence_indices("indices.txt")
# The lower-triangular distance elements can also be directly accessed as a 1-d numpy array:
import numpy as np
lt_array = data.lt_array
# The elements in this array correspond to the 2-d indices (row=1,col=0), (row=2,col=0), (row=2,col=1), ...
# The number of sequences follows from len(lt_array) == n_seq * (n_seq - 1) / 2:
n_seq = int(round((1 + np.sqrt(1 + 8 * lt_array.size)) / 2))
# These indices can be generated using the numpy tril_indices function, e.g. to construct the lower-triangular matrix:
lt_matrix = np.zeros((n_seq, n_seq))
lt_matrix[np.tril_indices(n_seq, -1)] = lt_array
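The row-major lower-triangular ordering of lt_array means that the distance for a pair (i, j) with i > j sits at position i·(i−1)/2 + j. The sketch below uses that formula to look up any pair and to emit lines in the `sparse` layout described above (`i j d(i,j)` for each i > j whose distance is not above the threshold). It works on a plain Python list standing in for lt_array; the helper names are hypothetical, and the exact formatting of hammingdist's own dump_sparse output may differ.

```python
from typing import Iterator, Sequence


def lt_index(i: int, j: int) -> int:
    """Position of pair (i, j) in a row-major strictly-lower-triangular array."""
    if i < j:
        i, j = j, i  # distances are symmetric
    if i == j:
        raise ValueError("diagonal entries are not stored (they are always 0)")
    return i * (i - 1) // 2 + j


def lookup(lt_array: Sequence[int], i: int, j: int) -> int:
    """Distance between sequences i and j, including the implicit zero diagonal."""
    return 0 if i == j else lt_array[lt_index(i, j)]


def sparse_lines(lt_array: Sequence[int], n_seq: int, threshold: int) -> Iterator[str]:
    """Yield 'i j d' for every i > j whose distance d is not above threshold."""
    for i in range(1, n_seq):
        for j in range(i):
            d = lt_array[lt_index(i, j)]
            if d <= threshold:
                yield f"{i} {j} {d}"


# Stand-in for data.lt_array with 4 sequences: pairs (1,0),(2,0),(2,1),(3,0),(3,1),(3,2)
example = [1, 5, 4, 2, 3, 6]
print(lookup(example, 0, 2))  # pair (2,0) -> 5
print(list(sparse_lines(example, 4, threshold=3)))  # ['1 0 1', '3 0 2', '3 1 3']
```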
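Duplicates
When from_fasta is called with the option remove_duplicates=True, duplicate sequences are removed before the distance matrix is constructed.
For example, given the following three input sequences: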
Index | Sequence |
---|---|
0 | ACG |
1 | ACG |
2 | TAG |
the distance matrix would be the 2x2 matrix of distances between ACG and TAG:
| | ACG | TAG |
|---|---|---|
| ACG | 0 | 2 |
| TAG | 2 | 0 |
The row of the distance matrix that corresponds to each index of the original sequences is:
Index | Sequence | Row in distance matrix |
---|---|---|
0 | ACG | 0 |
1 | ACG | 0 |
2 | TAG | 1 |
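The last column is what DataSet.dump_sequence_indices writes to disk.
It can also be constructed (as a numpy array) without computing the distance matrix, using hammingdist.fasta_sequence_indices:
import hammingdist
fasta_file = "example.fasta"
sequence_indices = hammingdist.fasta_sequence_indices(fasta_file)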
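The mapping in the last column can be reproduced in a few lines: each distinct sequence gets the next row number the first time it is seen, and later copies reuse that row. This sketches only the bookkeeping, assuming rows are numbered in order of first appearance (consistent with the example above); it is not hammingdist's implementation, and `deduplicate` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def deduplicate(seqs: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return (unique sequences, row index of each input sequence)."""
    row_of: Dict[str, int] = {}
    unique: List[str] = []
    indices: List[int] = []
    for s in seqs:
        if s not in row_of:
            row_of[s] = len(unique)  # first occurrence defines a new row
            unique.append(s)
        indices.append(row_of[s])
    return unique, indices


unique, indices = deduplicate(["ACG", "ACG", "TAG"])
print(unique, indices)  # ['ACG', 'TAG'] [0, 0, 1]
```

Given the distance matrix D of the unique sequences, the distance between original inputs a and b is then D[indices[a]][indices[b]].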
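Maximum distance values
By default, the elements of the distance matrix returned by hammingdist.from_fasta have a maximum value of 255. You can also set a smaller maximum value with the max_distance argument. For larger distances, hammingdist.from_fasta_large supports values up to 65535 (but requires twice as much RAM).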
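One way to picture a maximum distance is a Hamming count that stops as soon as it reaches the cap, so larger distances are reported as the cap itself and the remaining positions need not be examined. This sketch assumes that saturating behaviour; the text above only states the maximum values, so treat `capped_hamming` (a hypothetical helper) as an illustration rather than a description of hammingdist's internals. The default mirrors the 255 limit mentioned above.

```python
def capped_hamming(a: str, b: str, max_distance: int = 255) -> int:
    """Hamming distance saturated at max_distance, with early exit (assumed semantics)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    count = 0
    for x, y in zip(a, b):
        if x != y:
            count += 1
            if count >= max_distance:
                return max_distance  # cap reached: no need to look further
    return count


print(capped_hamming("AAAAAA", "TTTTTT", max_distance=2))  # 2
print(capped_hamming("ACG", "TAG"))  # 2
```

For n sequences the lower triangle has n(n−1)/2 entries. Assuming one byte per entry at the 255 limit and two bytes at the 65535 limit, 100 000 sequences would need roughly 5 GB and 10 GB respectively, which matches the "twice as much RAM" above.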
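Distances from a reference sequence
The distance of each sequence in a fasta file from a given reference sequence can be calculated using:
import hammingdist
fasta_file = "example.fasta"
sequence = "ACGT"  # placeholder: your reference sequence, same length as those in fasta_file
distances = hammingdist.fasta_reference_distances(sequence, fasta_file, include_x=True)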
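This function returns a numpy array containing the distance of each sequence from the reference sequence.
You can also calculate the distance between two individual sequences: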
import hammingdist
distance = hammingdist.distance("ACGTX", "AAGTX", include_x=True)
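Conceptually, fasta_reference_distances compares one reference sequence against every record in the file. The pure-Python sketch below does the same for FASTA text held in memory, using a deliberately minimal parser (header lines start with '>', sequence lines may wrap) and plain mismatch counting. It ignores the include_x option and any special-character rules hammingdist applies, and `read_fasta` / `reference_distances` are hypothetical names.

```python
import io
from typing import Iterable, List, TextIO


def read_fasta(handle: TextIO) -> List[str]:
    """Minimal FASTA reader: returns the sequences, joining wrapped lines."""
    seqs: List[str] = []
    current: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:  # close the previous record
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def reference_distances(reference: str, seqs: Iterable[str]) -> List[int]:
    """Mismatch count between the reference and each sequence."""
    out = []
    for s in seqs:
        if len(s) != len(reference):
            raise ValueError("all sequences must match the reference length")
        out.append(sum(1 for x, y in zip(reference, s) if x != y))
    return out


fasta_text = ">seq0\nACG\n>seq1\nAC\nG\n>seq2\nTAG\n"
seqs = read_fasta(io.StringIO(fasta_text))
print(reference_distances("ACG", seqs))  # [0, 0, 2]
```

To read from a file on disk instead, pass an open file handle (e.g. `open("example.fasta")`) in place of the StringIO object.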
OpenMP on Linux
On Linux, hammingdist is built with OpenMP (multithreading) support and will automatically use all available CPU threads.
CUDA on Linux
On Linux, hammingdist is also built with CUDA (Nvidia GPU) support. To use the GPU instead of the CPU, set use_gpu=True when calling from_fasta. Here we also set the maximum distance to 2:
import hammingdist
data = hammingdist.from_fasta("example.fasta", use_gpu=True, max_distance=2)
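Additionally, a lower triangular matrix file can now be constructed directly from a fasta file on the GPU, using the from_fasta_to_lower_triangular function. This avoids storing the entire distance matrix in memory and interleaves computation on the GPU with disk I/O on the CPU, so it requires less RAM and runs faster.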
import hammingdist
hammingdist.from_fasta_to_lower_triangular('input_fasta.txt', 'output_lower_triangular.txt', use_gpu=True, max_distance=2)
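The memory saving comes from never holding the whole matrix: each row of the lower triangle can be computed and written out before the next one is started. The CPU-only sketch below shows that streaming pattern in plain Python, writing one comma-separated line per row i ≥ 1, in the spirit of the comma-delimited row-major `lower-distance` format listed under Output formats. It is only an illustration under stated assumptions: it does not use the GPU, it does not overlap computation with I/O as from_fasta_to_lower_triangular is described as doing, the saturation at max_distance is assumed, and the exact line layout of hammingdist's files may differ. `write_lower_triangular` is a hypothetical name.

```python
import io
from typing import Sequence, TextIO


def write_lower_triangular(seqs: Sequence[str], out: TextIO, max_distance: int = 255) -> None:
    """Stream a comma-delimited lower-triangular distance matrix, one row at a time.

    Only the current row is held in memory, never the full n x n matrix.
    """
    for i in range(1, len(seqs)):
        row = []
        for j in range(i):
            d = sum(1 for x, y in zip(seqs[i], seqs[j]) if x != y)
            row.append(min(d, max_distance))  # assumed saturation at the cap
        out.write(",".join(map(str, row)) + "\n")


buf = io.StringIO()
write_lower_triangular(["ACG", "ACG", "TAG"], buf, max_distance=2)
print(repr(buf.getvalue()))  # '0\n2,2\n'
```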
Performance history
Figure: a rough measure of the impact of different performance improvements in hammingdist.