Predict ancestral sequences of fungal repeat elements by correcting for RIP-like mutations in multi-sequence DNA alignments.
deRIP2
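Predict ancestral sequences of fungal repeat families by correcting for RIP-like mutations (CpA --> TpA) and cytosine deamination (C --> T) events.
Optionally, mask RIP or deamination events in the input alignment as ambiguous bases.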
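Algorithm overview
For each column in the input alignment:
- Check whether the proportion of gapped rows exceeds the maximum gap proportion (--maxGaps). If it does, add a gap to the output sequence.
- Set invariant column values in the output sequence.
- If at least a given proportion of bases are C/T or G/A (e.g. with maxSNPnoise = 0.4, at least 0.6 of the positions in the column must be C/T or G/A):
  - If the reaminate option is set, revert T-->C or A-->G.
  - If reaminate is not set, count the positions in a RIP dinucleotide context (C/TpA or TpG/A).
  - If the proportion of positions in the column that are in a RIP-like context is >= the minRIPlike threshold, and at least one substrate and one product motif (i.e. CpA and TpA) are present, perform RIP correction in the output sequence.
- For all remaining positions in the output sequence (not filled by a gap, reamination, or RIP correction), inherit the base from the input sequence with the fewest observed RIP events (or the greatest GC content if no RIP is detected or several sequences share the minimum RIP count).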
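The column-wise steps above can be sketched in Python as follows. This is a simplified illustration, not deRIP2's own implementation. The helper names (neighbour, correct_column, derip) are hypothetical. The sketch also assumes three things: the RIP-like proportion is taken over the non-gap bases in the column; a base's dinucleotide partner is the nearest non-gap base in the same row; and the fill row is passed in directly rather than chosen by RIP count.

```python
from typing import List, Optional

GAP = "-"


def neighbour(row: str, i: int, step: int) -> Optional[str]:
    """Nearest non-gap base to position i in direction step (+1 right, -1 left)."""
    j = i + step
    while 0 <= j < len(row):
        if row[j] != GAP:
            return row[j]
        j += step
    return None


def correct_column(rows: List[str], i: int, max_gaps: float = 0.7,
                   max_snp_noise: float = 0.5, min_rip_like: float = 0.1,
                   reaminate: bool = False) -> Optional[str]:
    """Return the corrected base for column i, or None if it must be filled later."""
    col = [row[i] for row in rows]
    if col.count(GAP) / len(col) > max_gaps:
        return GAP  # too many gaps: force a gap in the output
    bases = [b for b in col if b != GAP]
    if len(set(bases)) == 1:
        return bases[0]  # invariant column
    # Classify as a C/T column (forward strand) or G/A column (reverse strand).
    if sum(b in "CT" for b in bases) / len(bases) >= 1 - max_snp_noise:
        substrate, product, step, context = "C", "T", +1, "A"  # CpA -> TpA
    elif sum(b in "GA" for b in bases) / len(bases) >= 1 - max_snp_noise:
        substrate, product, step, context = "G", "A", -1, "T"  # TpG -> TpA
    else:
        return None  # too many conflicting SNPs to assess
    if product not in bases:
        return None  # no deamination product observed
    if reaminate:
        return substrate  # revert T->C or A->G regardless of context
    # For each substrate/product base, record whether it sits in RIP context.
    in_context = [(row[i], neighbour(row, i, step) == context)
                  for row in rows if row[i] in (substrate, product)]
    # Assumption: proportion measured over all non-gap bases in the column.
    rip_like = sum(ok for _, ok in in_context) / len(bases)
    has_substrate = any(b == substrate and ok for b, ok in in_context)
    has_product = any(b == product and ok for b, ok in in_context)
    if rip_like >= min_rip_like and has_substrate and has_product:
        return substrate  # RIP correction
    return None


def derip(rows: List[str], fill_row: int = 0, **params) -> str:
    """Build the corrected sequence, filling unresolved columns from rows[fill_row]."""
    out = []
    for i in range(len(rows[0])):
        base = correct_column(rows, i, **params)
        out.append(rows[fill_row][i] if base is None else base)
    return "".join(out)
```

For example, derip(["TGA-CAA", "TAA-TAA", "TGA-CAC", "TGAACAG"]) returns "TGA-CAA". Column 4 (C/T in CpA/TpA context) is corrected to C, and column 3 (75% gaps) becomes a gap. Column 6 has no RIP context, so it is inherited from row 0.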
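Outputs
- Corrected sequence as a FASTA file.
- Optionally, the alignment with:
  - the corrected sequence appended.
  - corrected positions masked as ambiguous bases.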
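Masking relies on the standard IUPAC ambiguity codes Y (C or T) and R (A or G), so a CpA/TpA pair collapses to YpA. Below is a minimal sketch of masking corrected columns. It is an illustration rather than deRIP2's code. It assumes a hypothetical corrected mapping from column index to column type ("CT" or "GA"), and it treats the nearest non-gap base in the row as the dinucleotide partner.

```python
from typing import Dict, List, Optional

GAP = "-"
# Column type -> (bases to mask, IUPAC code, neighbour direction, required neighbour)
RULES = {
    "CT": ("CT", "Y", +1, "A"),  # CpA / TpA -> YpA
    "GA": ("GA", "R", -1, "T"),  # TpG / TpA -> TpR
}


def neighbour(row: str, i: int, step: int) -> Optional[str]:
    """Nearest non-gap base to position i in direction step."""
    j = i + step
    while 0 <= j < len(row):
        if row[j] != GAP:
            return row[j]
        j += step
    return None


def mask_alignment(rows: List[str], corrected: Dict[int, str]) -> List[str]:
    """Replace substrate/product bases in RIP context with IUPAC codes."""
    masked = []
    for row in rows:
        chars = list(row)
        for i, kind in corrected.items():
            bases, code, step, context = RULES[kind]
            # Look up context in the unmasked row so earlier masks do not interfere.
            if row[i] in bases and neighbour(row, i, step) == context:
                chars[i] = code
        masked.append("".join(chars))
    return masked
```

Here mask_alignment(["TCAG", "TTAG", "TCGG"], {1: "CT"}) masks the first two rows to "TYAG". The third row is left unchanged because its C is followed by G rather than A.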
Options and usage
Installation
Requires Python >= v3.6
Clone from this repository:
% git clone https://github.com/Adamtaranto/deRIP2.git && cd deRIP2 && pip install -e .
Install from PyPI:
% pip install derip2
Test the installation:
# Print version number and exit.
% derip2 --version
derip2 0.0.3
# Get usage information
% derip2 --help
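Example usage
For aligned sequences in 'myalignment.fa':
- Columns with >= 70% gapped positions are not corrected.
- Bases in a column must be >= 80% C/T or G/A.
- At least 50% of bases must be in a RIP dinucleotide context (C/T as CpA / TpA).
- Inherit all remaining uncorrected positions from the least RIP'd sequence.
- Mask all substrate and product motifs in corrected columns as ambiguous bases (i.e. CpA to TpA --> YpA).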
derip2 --inAln myalignment.fa --format fasta \
--maxGaps 0.7 \
--maxSNPnoise 0.2 \
--minRIPlike 0.5 \
--outDir results \
--outAlnName aligment_with_deRIP.fa \
--label deRIPseqName \
--mask > results/deRIPed_sequence.fa
Outputs
- results/deRIPed_sequence.fa
- results/masked_aligment_with_deRIP.fa
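Both output files are plain FASTA. Below is a minimal sketch of producing equivalent text, ignoring masking. It mirrors the --label and --noappend behaviour described in the standard options but is not deRIP2's own writer. The helper names (to_fasta, build_outputs) and the 60-column line wrapping are assumptions.

```python
from typing import List, Tuple


def to_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA, wrapping sequence lines at width."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def build_outputs(aln: List[Tuple[str, str]], derip_seq: str, label: str,
                  append: bool = True) -> Tuple[str, str]:
    """Return (corrected-sequence FASTA, alignment FASTA) as text.

    append=False mimics --noappend: the corrected sequence is left out of the alignment.
    """
    seq_fa = to_fasta([(label, derip_seq)])
    aln_records = list(aln) + ([(label, derip_seq)] if append else [])
    return seq_fa, to_fasta(aln_records)
```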
Standard options
Usage: derip2 [-h] [--version] -i INALN
[--format {clustal,emboss,fasta,fasta-m10,ig,nexus,phylip,phylip-sequential,phylip-relaxed,stockholm}]
[-g MAXGAPS] [-a] [--maxSNPnoise MAXSNPNOISE]
[--minRIPlike MINRIPLIKE] [--fillmaxgc] [--fillindex FILLINDEX]
[--mask] [--noappend] [-d OUTDIR] [--outAlnName OUTALNNAME]
[--outAlnFormat {fasta,nexus}] [--label LABEL]
Predict ancestral sequence of fungal repeat elements by correcting for RIP-
like mutations or cytosine deamination in multi-sequence DNA alignments.
Optionally, mask corrected positions in alignment.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-i INALN, --inAln INALN
Multiple sequence alignment.
--format {clustal,emboss,fasta,fasta-m10,ig,nexus,phylip,phylip-sequential,phylip-relaxed,stockholm}
Format of input alignment. Default: fasta
-g MAXGAPS, --maxGaps MAXGAPS
Maximum proportion of gapped positions in column to be
tolerated before forcing a gap in final deRIP
sequence. Default: 0.7
-a, --reaminate Correct all deamination events independent of RIP
context. Default: False
--maxSNPnoise MAXSNPNOISE
Maximum proportion of conflicting SNPs permitted
before excluding column from RIP/deamination
assessment. i.e. By default a column with >= 0.5 'C/T'
bases will have 'TpA' positions logged as RIP events.
Default: 0.5
--minRIPlike MINRIPLIKE
Minimum proportion of deamination events in RIP
context (5' CpA 3' --> 5' TpA 3') required for column
to deRIP'd in final sequence. Note: If 'reaminate'
option is set all deamination events will be
corrected. Default 0.1
--fillmaxgc By default uncorrected positions in the output
sequence are filled from the sequence with the lowest
RIP count. If this option is set remaining positions
are filled from the sequence with the highest G/C
content. Default: False
--fillindex FILLINDEX
Force selection of alignment row to fill uncorrected
positions from by row index number (indexed from 0).
Note: Will override '--fillmaxgc' option.
--mask Mask corrected positions in alignment with degenerate
IUPAC codes.
--noappend If set, do not append deRIP'd sequence to output
alignment.
-d OUTDIR, --outDir OUTDIR
Directory for deRIP'd sequence files to be written to.
--outAlnName OUTALNNAME
Optional: If set write alignment including deRIP
corrected sequence to this file.
--outAlnFormat {fasta,nexus}
Optional: Write alignment including deRIP sequence to
file of format X. Default: fasta
--label LABEL Use label as name for deRIP'd sequence in output
files.
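The help text and the algorithm overview define an order of precedence for choosing the fill row. --fillindex overrides --fillmaxgc, and by default the row with the fewest RIP events is used, with ties broken by GC content. A minimal sketch of that rule follows. The per-row RIP counts are taken as given (a hypothetical rip_counts list), because how deRIP2 counts RIP events is not shown here. The function names are also hypothetical.

```python
from typing import List, Optional


def gc_content(seq: str) -> float:
    """Fraction of G/C among non-gap bases."""
    bases = [b for b in seq.upper() if b != "-"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def choose_fill_row(rows: List[str], rip_counts: List[int],
                    fill_max_gc: bool = False,
                    fill_index: Optional[int] = None) -> int:
    """Return the index of the row used to fill uncorrected positions."""
    if fill_index is not None:
        return fill_index  # like --fillindex: overrides --fillmaxgc
    candidates = list(range(len(rows)))
    if not fill_max_gc:
        # Default: keep only the rows with the fewest RIP events.
        fewest = min(rip_counts)
        candidates = [i for i in candidates if rip_counts[i] == fewest]
    # Highest GC content wins; max() keeps the first row on a further tie.
    return max(candidates, key=lambda i: gc_content(rows[i]))
```

With rows ["ATAT", "GCAT", "GCGC"] and RIP counts [0, 0, 3], the default picks row 1 (tied fewest RIP, higher GC), fill_max_gc=True picks row 2, and fill_index=0 picks row 0.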
Issues
Submit feedback to the Issue Tracker.
License
Software provided under the MIT license.