vcfnp · PyPI · Python 包索引

从VCF（变异调用文件）加载numpy数组。

这些细节尚未被PyPI验证

项目链接

主页

项目描述

从VCF（变异调用格式）文件加载数据到numpy数组中，并（可选）从那里加载到HDF5文件。

安装

安装需要numpy和cython

$ pip install cython
$ pip install numpy
$ pip install vcfnp

……或者

$ git clone --recursive git://github.com/alimanfoo/vcfnp.git
$ cd vcfnp
$ python setup.py build_ext --inplace

使用方法

对于从Python的使用，请参阅IPython笔记本示例，或尝试

>>> from __future__ import print_function, division
>>> import numpy as np
>>> import matplotlib
>>> matplotlib.use('TkAgg')
>>> import matplotlib.pyplot as plt
>>> import vcfnp
>>> vcfnp.__version__
'2.0.0'
>>> filename = 'fixture/sample.vcf'
>>> # load data from fixed fields (including INFO)
... v = vcfnp.variants(filename, cache=True).view(np.recarray)
[vcfnp] 2015-01-23 11:10:46.670723 :: caching is enabled
[vcfnp] 2015-01-23 11:10:46.670830 :: cache file available
[vcfnp] 2015-01-23 11:10:46.670866 :: loading from cache file fixture/sample.vcf.vcfnp_cache/variants.npy
>>> # print some simple variant metrics
... print('found %s variants (%s SNPs)' % (v.size, np.count_nonzero(v.is_snp)))
found 9 variants (5 SNPs)
>>> print('QUAL mean (std): %s (%s)' % (np.mean(v.QUAL), np.std(v.QUAL)))
QUAL mean (std): 25.0667 (22.816)
>>> # plot a histogram of variant depth
... fig = plt.figure(1)
>>> ax = fig.add_subplot(111)
>>> ax.hist(v.DP)
(array([ 4.,  0.,  0.,  0.,  0.,  0.,  1.,  2.,  0.,  2.]), array([  0. ,   1.4,   2.8,   4.2,   5.6,   7. ,   8.4,   9.8,  11.2,
        12.6,  14. ]), <a list of 10 Patch objects>)
>>> ax.set_title('DP histogram')
<matplotlib.text.Text object at 0x7f28f18f5c50>
>>> ax.set_xlabel('DP')
<matplotlib.text.Text object at 0x7f28f207c3c8>
>>> plt.show()
>>> # load data from sample columns
... c = vcfnp.calldata_2d(filename, cache=True).view(np.recarray)
>>> # print some simple genotype metrics
... count_phased = np.count_nonzero(c.is_phased)
>>> count_variant = np.count_nonzero(np.any(c.genotype > 0, axis=2))
>>> count_missing = np.count_nonzero(~c.is_called)
>>> print('calls (phased, variant, missing): %s (%s, %s, %s)'
...     % (c.flatten().size, count_phased, count_variant, count_missing))
calls (phased, variant, missing): 27 (14, 12, 2)
>>> # plot a histogram of genotype quality
... fig = plt.figure(2)
>>> ax = fig.add_subplot(111)
>>> ax.hist(c.GQ.flatten())
(array([ 15.,   0.,   1.,   1.,   0.,   1.,   2.,   4.,   2.,   1.]), array([  0. ,   6.1,  12.2,  18.3,  24.4,  30.5,  36.6,  42.7,  48.8,
        54.9,  61. ]), <a list of 10 Patch objects>)
>>> ax.set_title('GQ histogram')
<matplotlib.text.Text object at 0x7f28f1eb1cc0>
>>> ax.set_xlabel('GQ')
<matplotlib.text.Text object at 0x7f28f18d4fd0>
>>> plt.show()

还提供了命令行脚本，以便并行转换VCF文件到按基因组区域拆分的NPY数组。例如，以下命令将创建一个包含第二个100kb染色体变异数组的NPY文件

$ vcf2npy \
    --vcf /path/to/my.vcf \
    --fasta /path/to/ref.fa \
    --output-dir /path/to/npy/output \
    --array-type variants \
    --chromosome chr20 \
    --task-size 100000 \
    --task-index 2 \
    --progress 1000

对于那些有权访问运行Sun Grid Engine的集群的用户，提供了一个脚本以并行转换作业数组提交作业，例如

$ qsub_vcf2npy \
    --vcf /path/to/my.vcf \
    --fasta /path/to/ref.fa \
    --output-dir /path/to/npy/output \
    --array-type variants \
    --chromosome chr20 \
    --task-size 100000 \
    --progress 1000 \
    -l h_vmem=1G \
    -N test_vcfnp \
    -j y \
    -o /path/to/sge/logs \
    -q shortrun.q

应该很容易将此脚本修改为在其他并行计算平台上运行，请参阅脚本文件夹中的源代码。

还提供了一个脚本，用于将来自多个NPY文件的数据加载到单个HDF5文件中。例如，在将VCF文件转换为100kb变异和calldata_2d NPY拆分后，运行类似以下命令

$ vcfnpy2hdf5 \
    --vcf /path/to/my.vcf \
    --input-dir /path/to/npy/output \
    --output /path/to/my.h5

如果想要按染色体分组数据，可以为每个染色体单独做如下操作

$ vcfnpy2hdf5 \
    --vcf /path/to/my.vcf \
    --input-dir /path/to/npy/output \
    --input-filename-template {array_type}.chr20*.npy \
    --output /path/to/my.h5 \
    --group chr20

还有一个脚本可以在本地计算机上并行处理VCF文件并将其加载到HDF5文件中，例如

$ vcf2hdf5_parallel \
    --vcf /path/to/my.vcf \
    --fasta /path/to/refseq.fa

最后，还有一个脚本可以将VCF文件的固定字段转换为CSV，例如

$ vcf2csv \
    --vcf /path/to/my.vcf \
    --dialect excel-tab \
    --flatten-filter

致谢

基于Erik Garrison的vcflib。

项目详情

这些细节尚未被PyPI验证

项目链接

主页

版本历史发布通知 | RSS源

此版本

2.3.0

2016年7月20日

2.2.0

2015年11月25日

2.1.5

2015年9月1日

2.1.4

2015年8月26日

2.1.3

2015年8月26日

2.1.2

2015年3月1日

2.1.1

2015年3月1日

2.1.0

2015年2月28日

2.0.1

2015年1月23日

2.0.0

2015年1月23日

1.12

2014年8月26日

1.11.5

2014年7月8日

1.11.4

2014年7月7日

1.11.3

2014年6月26日

1.11.2

2014年6月23日

1.11.1

2014年6月23日

1.11

2014年6月23日

1.10.2

2014年6月2日

1.10.1

2014年3月10日

1.10

2014年3月10日

1.9.1

2014年2月20日

1.9

2014年2月20日

1.8

2014年2月17日

1.7

2014年1月23日

1.6

2014年1月22日

1.5

2014年1月22日

1.4

2013年11月19日

1.3

2013年11月19日

1.2

2013年11月19日

1.1

2013年11月19日

1.0.1

2013年11月8日

1.0

2013年11月7日

0.16

2013年10月30日

0.15

2013年10月22日

0.14

2013年9月12日

0.13

2013年7月16日

0.12

2013年7月16日

0.11.2

2013年7月4日

0.11.1

2013年7月4日

0.11

2013年7月4日

0.10

2013年6月20日

0.9

2013年5月23日

0.8

2013年5月23日

0.7

2013年5月22日

0.6

2013年5月20日

0.5

2013年5月17日

0.4

2013年5月17日

0.3

2013年5月17日

0.4-SNAPSHOT 预发布

2013年5月17日

下载文件

下载适合您平台的文件。如果您不确定选择哪个，请了解更多关于安装包的信息。

源分布

vcfnp-2.3.0.tar.gz (8.1 MB 查看哈希值)

上传时间 2016年7月20日 源

vcfnp-2.3.0.tar.gz 的哈希值

vcfnp-2.3.0.tar.gz 的哈希值
算法	哈希摘要
SHA256	`23db095e82ff7108729760b07de91ef3b2d58a543f9ed068eae0bce88b663def`
MD5	`af8ea21f3b9cdc9ac5e664af72609122`
BLAKE2b-256	`907d89a5124b086f3b5af6838a12f8d0ca92d772a486bef70bacc264df7594a0`

vcfnp 2.3.0

导航

验证细节

维护者

未验证细节

项目链接

元数据

分类器

项目描述

安装

使用方法

致谢

项目详情

验证细节

维护者

未验证细节

项目链接

元数据

分类器

版本历史发布通知 | RSS源

下载文件

源分布

vcfnp 2.3.0

导航

验证细节

维护者

未验证细节

项目链接

元数据

分类器

项目描述

安装

使用方法

致谢

项目详情

验证细节

维护者

未验证细节

项目链接

元数据

分类器

版本历史 发布通知 | RSS源

下载文件

源分布

版本历史发布通知 | RSS源