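Alternative polyadenylation detection and analysis from diverse data sources (3'-seq and long-read), and transcription start site detection and analysis from long-read RNA-seq.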
Project description
LAPA
Alternative polyadenylation detection from diverse data sources: 3'-seq, long-read, and short-read RNA-seq.
Installation
pip install lapa
Calling poly(A) sites from long-read RNA-seq or 3'-seq
lapa --alignment {rep1.bam},{rep2.bam},{rep3.bam} \
--fasta {fasta} \
--annotation {gtf} \
--chrom_sizes {chrom_sizes} \
--output_dir {output}
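The `--alignment` value is one comma-separated string, so a BAM path that itself contains a comma would be split in the wrong place. A minimal Python sketch, assuming `lapa` is installed on `PATH`, that assembles the command above from a list of replicate BAMs and rejects such paths; the helper names `lapa_command` and `run` are hypothetical, not part of LAPA.

```python
import shutil
import subprocess
from typing import List, Sequence


def lapa_command(bams: Sequence[str], fasta: str, gtf: str,
                 chrom_sizes: str, output_dir: str) -> List[str]:
    """Build the argument list for `lapa` with comma-joined replicate BAMs."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    for bam in bams:
        # Commas separate replicates in --alignment, so a path must not contain one.
        if "," in bam:
            raise ValueError(f"BAM path contains a comma: {bam!r}")
    return [
        "lapa",
        "--alignment", ",".join(bams),
        "--fasta", fasta,
        "--annotation", gtf,
        "--chrom_sizes", chrom_sizes,
        "--output_dir", output_dir,
    ]


def run(cmd: List[str]) -> None:
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run(lapa_command(["rep1.bam", "rep2.bam", "rep3.bam"], "genome.fa", "annotation.gtf", "chrom_sizes", "lapa_out"))` (placeholder file names). Passing a list to `subprocess.run` instead of a shell string avoids quoting problems with paths that contain spaces.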
Parameter details (docs)
$ lapa --help
Usage: lapa [OPTIONS]
CLI interface for lapa polyA cluster calling.
Options:
--alignment TEXT Single or multiple bam file paths are
separated with a comma. Alternatively, a CSV
file with columns sample, dataset, path,
where the sample column contains the name
of the sample, dataset is the group of
samples that are replicates of each other,
and path is the path of the bam file.
[required]
--fasta TEXT Genome reference (GENCODE or ENSEMBL fasta)
[required]
--annotation TEXT Standard genome annotation (GENCODE or
ENSEMBL gtf). GENCODE gtf files do not
contain annotations for `five_prime_utr` and
`three_prime_utr`, so they need to be
corrected with `gencode_utr_fix` (see
https://github.com/MuhammedHasan/gencode_utr_fix.git).
[required]
--chrom_sizes TEXT Chrom sizes file (can be generated with
`faidx fasta -i chromsizes > chrom_sizes`)
[required]
--output_dir TEXT Output directory of LAPA. See
lapa.readthedocs.io/en/latest/output.html
for the details of the directory structure
and file format. [required]
...
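If the `faidx` command mentioned for `--chrom_sizes` is not available, an equivalent file can be produced with a short Python sketch. It assumes the expected format is one tab-separated `name<TAB>length` line per sequence and that the FASTA is uncompressed; the function names are hypothetical.

```python
from typing import Dict, Optional, TextIO


def chrom_sizes(fasta: TextIO) -> Dict[str, int]:
    """Return {sequence name: length} for every record in an uncompressed FASTA stream."""
    sizes: Dict[str, int] = {}
    name: Optional[str] = None
    for line in fasta:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def write_chrom_sizes(sizes: Dict[str, int], out: TextIO) -> None:
    """Write one 'name<TAB>length' line per sequence, in FASTA order."""
    for name, length in sizes.items():
        out.write(f"{name}\t{length}\n")
```

Usage: `with open("genome.fa") as fa, open("chrom_sizes", "w") as out: write_chrom_sizes(chrom_sizes(fa), out)`. The file is streamed line by line, so memory stays small even for a full genome. If a `samtools faidx` index (`.fai`) already exists, its first two columns hold the same name and length information.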
The recommended setup includes all samples with their biological/experimental replicates (tissue, cell line):
samples.csv
sample,dataset,path
ENCFF772LYG,myoblast,ENCFF772LYG.bam
ENCFF421MIL,myoblast,ENCFF421MIL.bam
ENCFF699KOR,myotube,ENCFF699KOR.bam
ENCFF731HHB,myotube,ENCFF731HHB.bam
Then pass samples.csv to LAPA as input:
lapa --alignment samples.csv \
--fasta {fasta} \
--annotation {gtf} \
--chrom_sizes {chrom_sizes} \
--output_dir {output}
...
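Instead of typing samples.csv by hand, it can be written with Python's standard `csv` module. A minimal sketch; the helper names are hypothetical, and flagging a dataset that has only one sample is our reading of the replicate recommendation above, not a check LAPA is known to perform.

```python
import csv
from collections import Counter
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = ["sample", "dataset", "path"]


def write_samples_csv(samples: List[Dict[str, str]], out: TextIO) -> None:
    """Write the sample,dataset,path table accepted by --alignment."""
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in samples:
        writer.writerow({col: row[col] for col in REQUIRED_COLUMNS})


def datasets_without_replicates(handle: TextIO) -> List[str]:
    """Return datasets that have fewer than two samples, i.e. no replicates."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = Counter(row["dataset"] for row in reader)
    return sorted(name for name, n in counts.items() if n < 2)
```

Writing the file with `open("samples.csv", "w", newline="")` and re-reading it with `datasets_without_replicates` gives a quick sanity check before launching a long run.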
TSS calling from long-read RNA-seq
lapa_tss --alignment samples.csv \
--fasta {fasta} \
--annotation {gtf} \
--chrom_sizes {chrom_sizes} \
--output_dir {output}
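`lapa_tss` takes the same options as `lapa`, so both can be run on one samples.csv. Sharing a single `--output_dir` could let one run overwrite the other's files (an assumption; see the output documentation for the actual layout), so this sketch gives each tool its own subdirectory. The `polya`/`tss` subdirectory names and the helper name are hypothetical.

```python
import os
from typing import Dict, List


def apa_and_tss_commands(samples_csv: str, fasta: str, gtf: str,
                         chrom_sizes: str, out_root: str) -> Dict[str, List[str]]:
    """Build lapa and lapa_tss commands that share inputs but not output dirs."""
    shared = [
        "--alignment", samples_csv,
        "--fasta", fasta,
        "--annotation", gtf,
        "--chrom_sizes", chrom_sizes,
    ]
    commands: Dict[str, List[str]] = {}
    for tool, subdir in (("lapa", "polya"), ("lapa_tss", "tss")):
        # A separate subdirectory per tool keeps the two result sets apart.
        commands[tool] = [tool, *shared, "--output_dir", os.path.join(out_root, subdir)]
    return commands
```

Each command list can then be passed to `subprocess.run(cmd, check=True)`; `check=True` stops the pipeline if either tool exits with an error.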
Parameter details (docs)
$ lapa_tss --help
Usage: lapa_tss [OPTIONS]
CLI interface for lapa tss cluster calling.
Options:
--alignment TEXT Single or multiple bam file paths are
separated with a comma. Alternatively, a CSV
file with columns sample, dataset, path,
where the sample column contains the name
of the sample, dataset is the group of
samples that are replicates of each other,
and path is the path of the bam file.
[required]
--fasta TEXT Genome reference (GENCODE or ENSEMBL fasta)
[required]
--annotation TEXT Standard genome annotation (GENCODE or
ENSEMBL gtf). GENCODE gtf files do not
contain annotations for `five_prime_utr` and
`three_prime_utr`, so they need to be
corrected with `gencode_utr_fix` (see
https://github.com/MuhammedHasan/gencode_utr_fix.git).
[required]
--chrom_sizes TEXT Chrom sizes file (can be generated with
`faidx fasta -i chromsizes > chrom_sizes`)
[required]
--output_dir TEXT Output directory of LAPA. See
lapa.readthedocs.io/en/latest/output.html
for the details of the directory structure
and file format. [required]
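Whether an annotation needs `gencode_utr_fix` can be checked by counting the feature types in column 3 of the GTF. GENCODE files label UTRs with the generic `UTR` feature, whereas ENSEMBL files use `five_prime_utr` and `three_prime_utr`. A minimal sketch; the helper names are hypothetical.

```python
import gzip
from collections import Counter
from typing import IO, Iterable, List

UTR_FEATURES = ("five_prime_utr", "three_prime_utr")


def feature_counts(lines: Iterable[str]) -> Counter:
    """Count GTF feature types (column 3), skipping '#' comment lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def missing_utr_features(lines: Iterable[str]) -> List[str]:
    """List the UTR feature types named in the --annotation help that are absent."""
    counts = feature_counts(lines)
    return [feature for feature in UTR_FEATURES if counts[feature] == 0]


def open_gtf(path: str) -> IO[str]:
    """Open a plain or .gz-compressed GTF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage: `with open_gtf("annotation.gtf.gz") as gtf: print(missing_utr_features(gtf))` (placeholder file name). A non-empty result suggests the file should be corrected before it is passed to `--annotation`.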
Documentation
For additional features, LAPA's parameters, the Python API, and statistical tests, see the documentation links below:
Readthedocs: https://lapa.readthedocs.io/en/latest/index.html
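API reference: https://lapa.readthedocs.io/en/latest/autoapi/index.html
Colab tutorial (analysis of myoblast-to-myotube differentiation): https://colab.research.google.com/drive/1QzMxCRjCk3i5_MuHzjozSRWMaJgdEdSI?usp=sharing
Citation
If you use LAPA in academic research, please cite the following paper: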
@article{celik2022analysis,
title={Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA},
author={Celik, Muhammed Hasan and Mortazavi, Ali},
journal={bioRxiv},
year={2022},
publisher={Cold Spring Harbor Laboratory}
}
Download files
Source distribution
lapa-0.0.5.tar.gz (29.6 kB)
Built distribution
lapa-0.0.5-py3-none-any.whl (36.3 kB)
Hashes for lapa-0.0.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 3240405cbdd0e68c344676bc780e62368359024726f0ef2c04ef487d49793669
MD5 | 39f233bac19500ae1dafeb07fe8e5b8d
BLAKE2b-256 | 64cce9059f9e9108f5a9a22e08164e29be3a937bab6c7636a8c2e51bf5b85ceb
Hashes for lapa-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e2d6378d8318ce1dd442d27072cda849a3cfec991062edb7d221a6e6031586c0
MD5 | 21274dd4b535324b98077ba685f420e3
BLAKE2b-256 | 5a165e29b4ebbadd96dad3de71e12ab64bcf7dc0a5b825a34bc2676ec224ee8a
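A downloaded distribution file can be checked against the digests listed above with Python's standard `hashlib`. A minimal sketch; the helper names are hypothetical, and it assumes that BLAKE2b-256 means BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256",
                    chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # 256-bit BLAKE2b
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256", "md5"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest with an expected hex string (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify("lapa-0.0.5.tar.gz", "3240405cbdd0e68c344676bc780e62368359024726f0ef2c04ef487d49793669")` should return `True` for an intact copy of the source distribution.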