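Read PLINK files into Pandas data frames
Project description
pandas-plink
pandas-plink is a Python package for reading the PLINK binary file format and realized relationship matrices (PLINK or GCTA). Files are read lazily: only the genotypes that the user actually accesses are loaded, which saves memory.
Notable changes are listed in CHANGELOG.md.
Install
It can be installed using pip: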
pip install pandas-plink
Alternatively, it can be installed via conda:
conda install -c conda-forge pandas-plink
Usage
It is as simple as this:
>>> from pandas_plink import read_plink1_bin
>>> G = read_plink1_bin("chr11.bed", "chr11.bim", "chr11.fam", verbose=False)
>>> print(G)
<xarray.DataArray 'genotype' (sample: 14, variant: 779)>
dask.array<shape=(14, 779), dtype=float64, chunksize=(14, 779)>
Coordinates:
* sample (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
* variant (variant) object '11_316849996' '11_316874359' ... '11_345698259'
father (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
fid (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
gender (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
i (sample) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13
iid (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
mother (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
trait (sample) <U2 '-9' '-9' '-9' '-9' '-9' ... '-9' '-9' '-9' '-9' '-9'
a0 (variant) <U1 'C' 'G' 'G' 'C' 'C' 'T' ... 'T' 'A' 'C' 'A' 'A' 'T'
a1 (variant) <U1 'T' 'C' 'C' 'T' 'T' 'A' ... 'C' 'G' 'T' 'G' 'C' 'C'
chrom (variant) <U2 '11' '11' '11' '11' '11' ... '11' '11' '11' '11' '11'
cm (variant) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
pos (variant) int64 157439 181802 248969 ... 28937375 28961091 29005702
snp (variant) <U9 '316849996' '316874359' ... '345653648' '345698259'
>>> print(G.sel(sample="B003", variant="11_316874359").values)
0.0
>>> print(G.a0.sel(variant="11_316874359").values)
G
>>> print(G.sel(sample="B003", variant="11_316941526").values)
2.0
>>> print(G.a1.sel(variant="11_316941526").values)
C
Portions of the genotype matrix are read only when the user accesses them.
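The idea of reading only the accessed genotypes can be illustrated without pandas-plink itself. The following is a minimal sketch, not the library's implementation: it assumes a variant-major PLINK 1 `.bed` file (three magic bytes, then two bits per sample, four samples per byte) and seeks straight to a single variant. The names `read_variant`, `write_toy_bed`, and `demo` are hypothetical, the toy file content is invented, and mapping the 2-bit codes to counts of the first allele listed in the `.bim` file is an assumption made for the illustration.

```python
import math
import os
import tempfile

# Variant-major PLINK 1 .bed files start with these three bytes.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit genotype codes mapped to counts of the first .bim allele
# (assumption for this sketch): 00 = homozygous first allele,
# 01 = missing, 10 = heterozygous, 11 = homozygous second allele.
CODE_TO_COUNT = {0b00: 2.0, 0b01: math.nan, 0b10: 1.0, 0b11: 0.0}


def read_variant(bed_path, n_samples, variant_index):
    """Read one variant's genotypes by seeking, leaving the rest on disk."""
    bytes_per_variant = (n_samples + 3) // 4  # four samples per byte
    with open(bed_path, "rb") as f:
        if f.read(3) != BED_MAGIC:
            raise ValueError("not a variant-major PLINK 1 .bed file")
        f.seek(3 + variant_index * bytes_per_variant)
        block = f.read(bytes_per_variant)
    if len(block) != bytes_per_variant:
        raise IndexError("variant index out of range")
    genotypes = []
    for s in range(n_samples):
        # Samples are packed starting from the low-order bits of each byte.
        code = (block[s // 4] >> (2 * (s % 4))) & 0b11
        genotypes.append(CODE_TO_COUNT[code])
    return genotypes


def write_toy_bed(path):
    """Write an invented 5-sample, 2-variant .bed file for the demo."""
    with open(path, "wb") as f:
        f.write(BED_MAGIC + bytes([0x78, 0x00, 0xFF, 0x03]))


def demo():
    """Return the genotypes of variant 0 and variant 1 of the toy file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.bed")
        write_toy_bed(path)
        return read_variant(path, 5, 0), read_variant(path, 5, 1)
```

Each call to `read_variant` touches only the bytes of the requested variant, which is the same access pattern that makes lazy loading economical for large files.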
Covariance matrices can also be read very easily. Example:
>>> from pandas_plink import read_rel
>>> K = read_rel("plink2.rel.bin")
>>> print(K)
<xarray.DataArray (sample_0: 10, sample_1: 10)>
array([[ 0.885782, 0.233846, -0.186339, -0.009789, -0.138897, 0.287779,
0.269977, -0.231279, -0.095472, -0.213979],
[ 0.233846, 1.077493, -0.452858, 0.192877, -0.186027, 0.171027,
0.406056, -0.013149, -0.131477, -0.134314],
[-0.186339, -0.452858, 1.183312, -0.040948, -0.146034, -0.204510,
-0.314808, -0.042503, 0.296828, -0.011661],
[-0.009789, 0.192877, -0.040948, 0.895360, -0.068605, 0.012023,
0.057827, -0.192152, -0.089094, 0.174269],
[-0.138897, -0.186027, -0.146034, -0.068605, 1.183237, 0.085104,
-0.032974, 0.103608, 0.215769, 0.166648],
[ 0.287779, 0.171027, -0.204510, 0.012023, 0.085104, 0.956921,
0.065427, -0.043752, -0.091492, -0.227673],
[ 0.269977, 0.406056, -0.314808, 0.057827, -0.032974, 0.065427,
0.714746, -0.101254, -0.088171, -0.063964],
[-0.231279, -0.013149, -0.042503, -0.192152, 0.103608, -0.043752,
-0.101254, 1.423033, -0.298255, -0.074334],
[-0.095472, -0.131477, 0.296828, -0.089094, 0.215769, -0.091492,
-0.088171, -0.298255, 0.910274, -0.024663],
[-0.213979, -0.134314, -0.011661, 0.174269, 0.166648, -0.227673,
-0.063964, -0.074334, -0.024663, 0.914586]])
Coordinates:
* sample_0 (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
* sample_1 (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
fid (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
iid (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(K.values)
[[ 0.89 0.23 -0.19 -0.01 -0.14 0.29 0.27 -0.23 -0.10 -0.21]
[ 0.23 1.08 -0.45 0.19 -0.19 0.17 0.41 -0.01 -0.13 -0.13]
[-0.19 -0.45 1.18 -0.04 -0.15 -0.20 -0.31 -0.04 0.30 -0.01]
[-0.01 0.19 -0.04 0.90 -0.07 0.01 0.06 -0.19 -0.09 0.17]
[-0.14 -0.19 -0.15 -0.07 1.18 0.09 -0.03 0.10 0.22 0.17]
[ 0.29 0.17 -0.20 0.01 0.09 0.96 0.07 -0.04 -0.09 -0.23]
[ 0.27 0.41 -0.31 0.06 -0.03 0.07 0.71 -0.10 -0.09 -0.06]
[-0.23 -0.01 -0.04 -0.19 0.10 -0.04 -0.10 1.42 -0.30 -0.07]
[-0.10 -0.13 0.30 -0.09 0.22 -0.09 -0.09 -0.30 0.91 -0.02]
[-0.21 -0.13 -0.01 0.17 0.17 -0.23 -0.06 -0.07 -0.02 0.91]]
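For readers unfamiliar with the term, a realized relationship matrix summarizes the genetic similarity between pairs of samples. The sketch below computes one from a tiny, invented genotype matrix using the standardized cross-product commonly associated with GCTA, applied here to every entry (GCTA itself treats the diagonal differently). It only illustrates what such a matrix contains; `read_rel` reads a precomputed file and does not perform this calculation, and the function name `realized_relationship` is hypothetical.

```python
def realized_relationship(genotypes):
    """Compute a relationship matrix from allele counts (0, 1 or 2).

    `genotypes` is a list of samples, each a list of per-variant counts.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each variant, estimated from the samples.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Monomorphic variants carry no information and would divide by zero.
    used = [j for j in range(m) if 0.0 < freqs[j] < 1.0]
    if not used:
        raise ValueError("no polymorphic variants")
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1):
            total = 0.0
            for j in used:
                p = freqs[j]
                centered_a = genotypes[a][j] - 2 * p
                centered_b = genotypes[b][j] - 2 * p
                total += centered_a * centered_b / (2 * p * (1 - p))
            # The matrix is symmetric, so fill both triangles at once.
            K[a][b] = K[b][a] = total / len(used)
    return K


# Invented toy data: three samples, two variants.
toy = [[0, 2], [2, 0], [1, 1]]
K_toy = realized_relationship(toy)
```

As in the matrix printed above, the result is symmetric, with one row and one column per sample.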
Please refer to the pandas-plink documentation for more information.
Authors
License
This project is licensed under the MIT License.