Chunk and scatter the regions in a bed or sequence dict file

Project description

chunked-scatter and scatter-regions

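The chunked-scatter tool takes a bed file, fasta index, sequence dict or vcf file as input and divides the contigs/chromosomes into overlapping chunks of a given size. These chunks are then written to new bed files, one chromosome per file. To avoid creating thousands of files, small chromosomes are merged into the same file.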

The scatter-regions tool works in a similar way, but its defaults and flags are tuned for creating genome scatters for GATK tools.

Installation

Install with pip: pip install chunked-scatter
Install with conda: conda install chunked-scatter
- This requires conda with the bioconda channel configured.

Usage

chunked-scatter

usage: chunked-scatter [-h] [-p PREFIX] [-S] [-P] [-c SIZE]
                       [-m MINIMUM_BP_PER_FILE] [-o OVERLAP]
                       INPUT

Given a sequence dict, fasta index or a bed file, scatter over the defined
contigs/regions. Each contig/region will be split into multiple overlapping
regions, which will be written to a new bed file. Each contig will be placed
in a new file, unless the length of the contigs/regions doesn't exceed a given
number.

positional arguments:
  INPUT                 The input file. The format is detected by the
                        extension. Supported extensions are: '.bed', '.dict',
                        '.fai', '.vcf', '.vcf.gz', '.bcf'.

optional arguments:
  -h, --help            show this help message and exit
  -p PREFIX, --prefix PREFIX
                        The prefix of the output files. Output will be named
                        like: <PREFIX><N>.bed, in which N is an incrementing
                        number. Default 'scatter-'.
  -S, --split-contigs   If set, contigs are allowed to be split up over
                        multiple files.
  -P, --print-paths     If set prints paths of the output files to STDOUT.
                        This makes the program usable in scripts and
                        workflows.
  -c SIZE, --chunk-size SIZE
                        The size of the chunks. The first chunk in a region or
                        contig will be exactly length SIZE, subsequent chunks
                        will be SIZE + OVERLAP and the final chunk may be
                        anywhere from 0.5 to 1.5 times SIZE plus overlap. If a
                        region (or contig) is smaller than SIZE the original
                        regions will be returned. Defaults to 1e6
  -m MINIMUM_BP_PER_FILE, --minimum-bp-per-file MINIMUM_BP_PER_FILE
                        The minimum number of bases represented within a
                        single output bed file. If an input contig or region
                        is smaller than this MINIMUM_BP_PER_FILE, then the
                        next contigs/regions will be placed in the same file
                        until this minimum is met. Defaults to 45e6.
  -o OVERLAP, --overlap OVERLAP
                        The number of bases which each chunk should overlap
                        with the preceding one. Defaults to 150.
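
The chunking rule described for -c and -o can be sketched in a few lines of Python. This is a minimal illustration written from the help text above, not the tool's actual source; the function name chunk_region and the tuple-based return value are assumptions made for this example.

```python
def chunk_region(contig, start, end, size, overlap):
    """Split one region into overlapping chunks (illustrative sketch only).

    Hypothetical helper, not part of the chunked-scatter API.
    Coordinates are treated as bed-style integers (start, end).
    """
    chunks = []
    position = start
    # Keep cutting full-size chunks while more than 1.5 * size remains,
    # so the final chunk ends up between 0.5 and 1.5 times size.
    while 2 * (end - position) > 3 * size:
        # Every chunk except the first starts `overlap` bases early.
        chunk_start = max(start, position - overlap)
        chunks.append((contig, chunk_start, position + size))
        position += size
    # The final chunk takes whatever is left, plus the overlap.
    chunks.append((contig, max(start, position - overlap), end))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_region("chr1", 2000, 16000, 5000, 150):
        print(*chunk, sep="\t")
```

A region shorter than the chunk size never enters the loop, so it comes back unchanged, which matches the behaviour described for -c.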

scatter-regions

usage: scatter-regions [-h] [-p PREFIX] [-S] [-P] [-s SCATTER_SIZE] INPUT

Given a sequence dict, fasta index or a bed file, scatter over the defined
contigs/regions. Creates a bed file where the contigs add up approximately to
the given scatter size.

positional arguments:
  INPUT                 The input file. The format is detected by the
                        extension. Supported extensions are: '.bed', '.dict',
                        '.fai', '.vcf', '.vcf.gz', '.bcf'.

optional arguments:
  -h, --help            show this help message and exit
  -p PREFIX, --prefix PREFIX
                        The prefix of the output files. Output will be named
                        like: <PREFIX><N>.bed, in which N is an incrementing
                        number. Default 'scatter-'.
  -S, --split-contigs   If set, contigs are allowed to be split up over
                        multiple files.
  -P, --print-paths     If set prints paths of the output files to STDOUT.
                        This makes the program usable in scripts and
                        workflows.
  -s SCATTER_SIZE, --scatter-size SCATTER_SIZE
                        The maximum size for the regions over which to
                        scatter. If contigs are not split, and a contig is
                        bigger than the maximum size, the contig will be
                        placed in its own file. Default: 1000000000.
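
One way to picture the scatter-size behaviour is a greedy grouping of contigs into files. The sketch below is an assumption about how such a grouping could work when contigs are not split; the real tool may pack contigs differently. The function name group_by_scatter_size is hypothetical.

```python
def group_by_scatter_size(regions, scatter_size):
    """Greedily group (contig, start, end) regions into files.

    Illustrative sketch only, assuming contigs are never split:
    a new file is started when adding the next region would push
    the current file past `scatter_size`.
    """
    files = []
    current = []
    current_bp = 0
    for contig, start, end in regions:
        length = end - start
        # Close the current file if this region would overflow it.
        if current and current_bp + length > scatter_size:
            files.append(current)
            current, current_bp = [], 0
        current.append((contig, start, end))
        current_bp += length
    if current:
        files.append(current)
    return files


if __name__ == "__main__":
    regions = [("chr1", 0, 600), ("chr2", 0, 300), ("chr3", 0, 300),
               ("chr4", 0, 2000), ("chr5", 0, 100)]
    for index, group in enumerate(group_by_scatter_size(regions, 1000)):
        print(index, [contig for contig, _, _ in group])
```

In this sketch a contig larger than the scatter size always ends up alone in its file, because the file before it is closed first and the next region immediately overflows it.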

Examples

bed file

Given a bed file located at /data/regions.bed

chr1	100	1000
chr1	2000	16000
chr2	5000	10000

the following command

chunked-scatter -p /data/scatter_ -m 1000 -c 5000 /data/regions.bed

will produce the following two output files

/data/scatter_0.bed:

chr1	100	1000
chr1	2000	7000
chr1	6850	12000
chr1	11850	16000

/data/scatter_1.bed:

chr2	5000	10000
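
The split into two files follows from the -m option: chunks keep going into the current file until a new contig starts and the file already holds at least the minimum number of bases. A minimal sketch of that rule, written from the help text rather than from the tool's source, could look like this. The function name group_into_files is hypothetical, and the handling of split_contigs is an assumption about what -S does.

```python
def group_into_files(chunks, minimum_bp, split_contigs=False):
    """Distribute (contig, start, end) chunks over output files.

    Illustrative sketch only. A file is closed once it holds at least
    `minimum_bp` bases and the next chunk is on a different contig
    (or on any chunk boundary when `split_contigs` is True, which is
    an assumed reading of the -S flag).
    """
    files = []
    current = []
    current_bp = 0
    current_contig = None
    for contig, start, end in chunks:
        at_boundary = split_contigs or contig != current_contig
        if current and at_boundary and current_bp >= minimum_bp:
            files.append(current)
            current, current_bp = [], 0
        current.append((contig, start, end))
        current_bp += end - start
        current_contig = contig
    if current:
        files.append(current)
    return files


if __name__ == "__main__":
    chunks = [("chr1", 100, 1000), ("chr1", 2000, 7000),
              ("chr1", 6850, 12000), ("chr1", 11850, 16000),
              ("chr2", 5000, 10000)]
    for index, group in enumerate(group_into_files(chunks, 1000)):
        print(f"scatter_{index}.bed", group)
```

With the chunks from this example and a minimum of 1000 bases, all chr1 chunks stay together and chr2 starts a second file.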

dict file

Given a dict file located at /data/ref.dict

@SQ	SN:chr1	LN:3000000
@SQ	SN:chr2	LN:500000

the following command

chunked-scatter -p /data/scatter_ /data/ref.dict

will produce the following output file at /data/scatter_0.bed

chr1	0	1000000
chr1	999850	2000000
chr1	1999850	3000000
chr2	0	500000
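
Because -P prints the paths of the output files to STDOUT, the tool can be driven from a script. The sketch below assumes that chunked-scatter is available on PATH and that the paths are printed one per line; both helper names are hypothetical and only the flags documented above are used.

```python
import subprocess


def build_command(input_path, prefix, chunk_size=None):
    """Build the chunked-scatter argument list (hypothetical helper)."""
    command = ["chunked-scatter", "-P", "-p", prefix]
    if chunk_size is not None:
        command += ["-c", str(chunk_size)]
    command.append(input_path)
    return command


def parse_paths(stdout_text):
    """Assumes one output path per line on STDOUT."""
    return [line.strip() for line in stdout_text.splitlines() if line.strip()]


def run_chunked_scatter(input_path, prefix, chunk_size=None):
    """Run the tool and return the list of bed files it reported."""
    result = subprocess.run(
        build_command(input_path, prefix, chunk_size),
        check=True,           # raise if the tool exits with an error
        capture_output=True,  # collect STDOUT instead of printing it
        text=True,
    )
    return parse_paths(result.stdout)
```

A call such as run_chunked_scatter("/data/ref.dict", "/data/scatter_") would then return the generated bed paths for further processing.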

Release history

1.0.0 (this version): July 16, 2020

0.2.0: June 16, 2020

0.1.0: June 18, 2019

Download files

Source distribution

chunked-scatter-1.0.0.tar.gz (9.4 kB)

Uploaded July 16, 2020, Source

Built distribution

chunked_scatter-1.0.0-py3-none-any.whl (12.5 kB)

Uploaded July 16, 2020, Python 3

Hashes for chunked-scatter-1.0.0.tar.gz

Algorithm	Hash digest
SHA256	`2635b3e4097fe9f22240f9b946eac812a185fefc28cea5cbe03281321675a02b`
MD5	`1a2c062f2bb5bf571473857fa633e4d0`
BLAKE2b-256	`c529f70d069845c1daf6ae4c74b5f19a8a09d0d3927857dbd69fc1dc3a9aeb4f`

Hashes for chunked_scatter-1.0.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`e221fbe878025a012b9e36f7503a9999a4ac192db206fd2949b91b422240f951`
MD5	`3cb602f7f50041aa6efe46f80410c918`
BLAKE2b-256	`852dfd57870bdde4a868204e059ae9a94ece54ca2ef8fc49329e15aac9417742`
