
Read PLINK files into Pandas data frames

Project description

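pandas-plink

pandas-plink is a Python package for reading the PLINK binary file format and realized relationship matrices (PLINK or GCTA). Files are read lazily: only the genotypes the user actually accesses are loaded, which saves memory.

Notable changes can be found in CHANGELOG.md.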

Install

It can be installed using pip:

pip install pandas-plink

Alternatively, it can be installed via conda:

conda install -c conda-forge pandas-plink

Usage

It is as simple as this:

>>> from pandas_plink import read_plink1_bin
>>> G = read_plink1_bin("chr11.bed", "chr11.bim", "chr11.fam", verbose=False)
>>> print(G)
<xarray.DataArray 'genotype' (sample: 14, variant: 779)>
dask.array<shape=(14, 779), dtype=float64, chunksize=(14, 779)>
Coordinates:
  * sample   (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
  * variant  (variant) object '11_316849996' '11_316874359' ... '11_345698259'
    father   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    fid      (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
    gender   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    i        (sample) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13
    iid      (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
    mother   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    trait    (sample) <U2 '-9' '-9' '-9' '-9' '-9' ... '-9' '-9' '-9' '-9' '-9'
    a0       (variant) <U1 'C' 'G' 'G' 'C' 'C' 'T' ... 'T' 'A' 'C' 'A' 'A' 'T'
    a1       (variant) <U1 'T' 'C' 'C' 'T' 'T' 'A' ... 'C' 'G' 'T' 'G' 'C' 'C'
    chrom    (variant) <U2 '11' '11' '11' '11' '11' ... '11' '11' '11' '11' '11'
    cm       (variant) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    pos      (variant) int64 157439 181802 248969 ... 28937375 28961091 29005702
    snp      (variant) <U9 '316849996' '316874359' ... '345653648' '345698259'
>>> print(G.sel(sample="B003", variant="11_316874359").values)
0.0
>>> print(G.a0.sel(variant="11_316874359").values)
G
>>> print(G.sel(sample="B003", variant="11_316941526").values)
2.0
>>> print(G.a1.sel(variant="11_316941526").values)
C

Genotypes are read from disk only when the user accesses them.
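To make the idea of reading on access concrete, here is a minimal standard-library sketch of why the PLINK 1 .bed layout allows it. It is not pandas-plink's implementation. The function name `read_variant` and the demo file are hypothetical, and the sketch assumes the variant-major layout, where each variant occupies a fixed-size block of bytes. It reports A1 allele counts as defined by the file format, which may differ from the allele the package counts.

```python
import math
import os
import tempfile

# PLINK 1 .bed: 3-byte header, then ceil(n_samples / 4) bytes per variant.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # third byte 0x01 = variant-major layout
# 2-bit code -> A1 allele count (None = missing genotype).
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def read_variant(path, variant_index, n_samples):
    """Return the A1 allele counts of one variant, reading only its bytes."""
    bytes_per_variant = math.ceil(n_samples / 4)
    with open(path, "rb") as handle:
        if handle.read(3) != BED_MAGIC:
            raise ValueError("not a variant-major PLINK 1 .bed file")
        # Jump straight to the requested variant; nothing else is read.
        handle.seek(3 + variant_index * bytes_per_variant)
        block = handle.read(bytes_per_variant)
    counts = []
    for sample in range(n_samples):
        # Four samples per byte, lowest-order bit pair first.
        code = (block[sample // 4] >> (2 * (sample % 4))) & 0b11
        counts.append(CODE_TO_A1_COUNT[code])
    return counts


# Demo: a hand-built file with 5 samples and 2 variants (2 bytes per variant).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bed")
with open(demo_path, "wb") as handle:
    handle.write(BED_MAGIC + bytes([0x78, 0x00, 0x2F, 0x02]))

print(read_variant(demo_path, 1, n_samples=5))  # [0, 0, 1, 2, 1]
```

Because every variant has the same byte length, the offset of any variant can be computed directly, so a reader never has to scan the genotypes it does not need.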

Covariance matrices can also be read very easily. Example:

>>> from pandas_plink import read_rel
>>> K = read_rel("plink2.rel.bin")
>>> print(K)
<xarray.DataArray (sample_0: 10, sample_1: 10)>
array([[ 0.885782,  0.233846, -0.186339, -0.009789, -0.138897,  0.287779,
         0.269977, -0.231279, -0.095472, -0.213979],
       [ 0.233846,  1.077493, -0.452858,  0.192877, -0.186027,  0.171027,
         0.406056, -0.013149, -0.131477, -0.134314],
       [-0.186339, -0.452858,  1.183312, -0.040948, -0.146034, -0.204510,
        -0.314808, -0.042503,  0.296828, -0.011661],
       [-0.009789,  0.192877, -0.040948,  0.895360, -0.068605,  0.012023,
         0.057827, -0.192152, -0.089094,  0.174269],
       [-0.138897, -0.186027, -0.146034, -0.068605,  1.183237,  0.085104,
        -0.032974,  0.103608,  0.215769,  0.166648],
       [ 0.287779,  0.171027, -0.204510,  0.012023,  0.085104,  0.956921,
         0.065427, -0.043752, -0.091492, -0.227673],
       [ 0.269977,  0.406056, -0.314808,  0.057827, -0.032974,  0.065427,
         0.714746, -0.101254, -0.088171, -0.063964],
       [-0.231279, -0.013149, -0.042503, -0.192152,  0.103608, -0.043752,
        -0.101254,  1.423033, -0.298255, -0.074334],
       [-0.095472, -0.131477,  0.296828, -0.089094,  0.215769, -0.091492,
        -0.088171, -0.298255,  0.910274, -0.024663],
       [-0.213979, -0.134314, -0.011661,  0.174269,  0.166648, -0.227673,
        -0.063964, -0.074334, -0.024663,  0.914586]])
Coordinates:
  * sample_0  (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
  * sample_1  (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    fid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    iid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(K.values)
[[ 0.89  0.23 -0.19 -0.01 -0.14  0.29  0.27 -0.23 -0.10 -0.21]
 [ 0.23  1.08 -0.45  0.19 -0.19  0.17  0.41 -0.01 -0.13 -0.13]
 [-0.19 -0.45  1.18 -0.04 -0.15 -0.20 -0.31 -0.04  0.30 -0.01]
 [-0.01  0.19 -0.04  0.90 -0.07  0.01  0.06 -0.19 -0.09  0.17]
 [-0.14 -0.19 -0.15 -0.07  1.18  0.09 -0.03  0.10  0.22  0.17]
 [ 0.29  0.17 -0.20  0.01  0.09  0.96  0.07 -0.04 -0.09 -0.23]
 [ 0.27  0.41 -0.31  0.06 -0.03  0.07  0.71 -0.10 -0.09 -0.06]
 [-0.23 -0.01 -0.04 -0.19  0.10 -0.04 -0.10  1.42 -0.30 -0.07]
 [-0.10 -0.13  0.30 -0.09  0.22 -0.09 -0.09 -0.30  0.91 -0.02]
 [-0.21 -0.13 -0.01  0.17  0.17 -0.23 -0.06 -0.07 -0.02  0.91]]
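
For readers unfamiliar with what such a matrix contains, the following is a minimal pure-Python sketch of the commonly used variance-standardized relationship matrix, in which each entry is the average over variants of the product of two samples' standardized genotypes. It is an illustration only and is not taken from pandas-plink, PLINK, or GCTA. The function name `realized_relationship` is hypothetical, and the sketch assumes no missing genotypes and no monomorphic variants.

```python
def realized_relationship(genotypes):
    """Relationship matrix from genotypes[i][j], the allele count (0, 1 or 2)
    of sample i at variant j. Assumes complete data and polymorphic variants."""
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    standardized = [[0.0] * n_variants for _ in range(n_samples)]
    for j in range(n_variants):
        column = [genotypes[i][j] for i in range(n_samples)]
        freq = sum(column) / (2 * n_samples)  # allele frequency p_j
        scale = (2 * freq * (1 - freq)) ** 0.5  # std. dev. under Hardy-Weinberg
        for i in range(n_samples):
            standardized[i][j] = (column[i] - 2 * freq) / scale
    # K[i][k] = mean over variants of the product of standardized genotypes.
    return [
        [
            sum(a * b for a, b in zip(standardized[i], standardized[k])) / n_variants
            for k in range(n_samples)
        ]
        for i in range(n_samples)
    ]


K_demo = realized_relationship([[0, 2], [1, 1], [2, 0]])
for row in K_demo:
    print([round(value, 2) for value in row])
```

The result is symmetric, and diagonal entries are typically close to 1 for real data, which matches the overall shape of the matrix printed above.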

For more information, please refer to the pandas-plink documentation.

Authors

License

This project is licensed under the MIT License.
