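【Bowtie】Principles of DNA sequence assembly
【Jenny's note】I had always assumed that Bowtie was a short-read assembly tool, but that is wrong. It is not an assembler, only a sequence alignment tool. Its output is the position of each short read relative to the index.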
——————
How does short-read alignment work, and which short-read aligners are in common use today? (ok)
http://blog.sina.com.cn/s/blog_9617895f01011npk.html
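Answer: Sequence alignment arranges two or more sequences according to defined rules in order to establish their similarity and, possibly, their homology. Short-read alignment has different characteristics from long-sequence alignment, so the two use different algorithms. Three families of algorithms are commonly used for short reads:
(1) Spaced-seed indexing, used by MAQ, ELAND and others. The read is first split into segments, and one or more segments are chosen as seeds to build a search index. Reads are then located by index lookup followed by extension, and the seeds are rotated so that every allowed combination of mismatch positions is covered.
(2) The Burrows-Wheeler transform, used by Bowtie, BWA, SOAP2 and others. The genome sequence is rearranged by the B-W transform into a compressible form and indexed. Reads are located by search and backtracking, and permitted mismatches are handled by substituting bases during the search.
(3) Smith-Waterman dynamic programming, used by BFAST, SHRiMP and others. Initial conditions and a recurrence relation are used to compute the scores of all possible alignments of two sequences, the scores are stored in a matrix, and the optimal alignment is recovered by tracing back through the matrix.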
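To make approach (2) more concrete, here is a minimal Python sketch of a Burrows-Wheeler transform and an exact-match backward search over it. It is a toy illustration under simplifying assumptions (naive rotation sorting, no mismatches, counts recomputed on every step), not how Bowtie actually builds or queries its index. The function names `bwt` and `count_occurrences` are made up for this example.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs)."""
    s = text + "$"  # '$' is a sentinel that sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    first_col = sorted(bwt_str)
    # first_row[c] = index of the first sorted rotation that starts with c
    first_row = {c: first_col.index(c) for c in set(bwt_str)}

    def occ(c: str, i: int) -> int:
        # Number of occurrences of c in bwt_str[:i] (slow but simple)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of matching rotations: [lo, hi)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


genome_bwt = bwt("ACGTACGT")
print(genome_bwt, count_occurrences(genome_bwt, "CGT"))
```

Real aligners store sampled occurrence counts and suffix-array positions so that each backward-search step is constant time and matches can be mapped back to genome coordinates.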
BGI sequence assembly (ok)
http://www.ebiotrade.com/newsf/2010-1/2010128171022809.htm
Research on next-generation gene sequence assembly algorithms
http://www.fdurop.fudan.edu.cn/upload/stu/docs/rcYsXb_102804-1303180458.pdf
Genome sequencing and analysis. Good! Recommended reading!
http://ibi.zju.edu.cn/bioinplant/courses/chap4.pdf
Design of gene sequence assembly algorithms
http://www.doc88.com/p-741680604744.html
【Bowtie】Bowtie2 usage and detailed parameter reference
For the impatient:
bowtie2 -q --phred33 --sensitive --end-to-end -I 0 -X 500 --fr \
  --un unpaired --al aligned --un-conc unconc --al-conc alconc \
  -p 6 --reorder -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
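Usage:
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
Required:
-x <bt2-idx>  Prefix (basename) of the index files produced by bowtie2-build. bowtie2 looks first in the current directory, then in the directory named by the BOWTIE2_INDEXES environment variable.
-1 <m1>  File(s) containing mate 1 of the paired-end reads. Several files may be given, separated by commas; they must correspond one-to-one with the files given to -2, e.g. "-1 flyA_1.fq,flyB_1.fq -2 flyA_2.fq,flyB_2.fq". Reads in a file may differ in length.
-2 <m2>  File(s) containing mate 2 of the paired-end reads.
-U <r>  File(s) containing unpaired reads. Several files may be given, separated by commas. Reads in a file may differ in length.
-S <hit>  File to write the SAM output to (a file name, not a prefix). By default, output goes to standard output.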
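As a rough illustration of how the required arguments fit together, the sketch below assembles a bowtie2 command line as an argument list. The helper name `build_bowtie2_cmd` and all file names are made up for this example; only the flags (-x, -1, -2, -U, -p, -S) come from the usage line above. The code only builds the list and does not run bowtie2.

```python
from typing import List, Optional, Sequence


def build_bowtie2_cmd(index: str, sam_out: str,
                      mate1: Optional[Sequence[str]] = None,
                      mate2: Optional[Sequence[str]] = None,
                      unpaired: Optional[Sequence[str]] = None,
                      threads: int = 1) -> List[str]:
    """Build a bowtie2 argument list for paired-end or unpaired input."""
    cmd = ["bowtie2", "-x", index]
    if mate1 or mate2:
        # -1 and -2 must list the same number of files, in matching order
        if not (mate1 and mate2) or len(mate1) != len(mate2):
            raise ValueError("-1 and -2 need the same number of files")
        cmd += ["-1", ",".join(mate1), "-2", ",".join(mate2)]
    elif unpaired:
        cmd += ["-U", ",".join(unpaired)]
    else:
        raise ValueError("give either mate1/mate2 or unpaired reads")
    cmd += ["-p", str(threads), "-S", sam_out]
    return cmd


# Hypothetical index prefix and file names, for illustration only
cmd = build_bowtie2_cmd("fly_index", "fly.sam",
                        mate1=["flyA_1.fq", "flyB_1.fq"],
                        mate2=["flyA_2.fq", "flyB_2.fq"],
                        threads=6)
print(" ".join(cmd))
```

If bowtie2 is installed, a list like this could be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file names.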
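Optional:
Input
-q  Input files are FASTQ. This is the default.
--qseq  Input files are QSEQ.
-f  Input files are FASTA. Selecting this also implies --ignore-quals.
-r  Input files contain one sequence per line, with no read names or qualities. Selecting this also implies --ignore-quals.
-c  The read sequences themselves, separated by commas, are given on the command line instead of file names. Selecting this also implies --ignore-quals.
-s/--skip <int>  Skip the first <int> reads or pairs in the input.
-u/--qupto <int>  Align only the first <int> reads or pairs (counted after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>  Trim <int> bases from the 5' end of each read before alignment. Default: 0.
-3/--trim3 <int>  Trim <int> bases from the 3' end of each read before alignment. Default: 0.
--phred33  Qualities are ASCII characters whose code equals the Phred quality plus 33. Used by recent Illumina pipelines.
--phred64  Qualities are ASCII characters whose code equals the Phred quality plus 64.
--solexa-quals  Convert Solexa qualities to Phred qualities. Used by older GA Pipeline versions. Default: off.
--int-quals  Qualities are given as space-separated integers rather than ASCII characters, e.g. 40 40 30 40... Default: off.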
Presets in --end-to-end mode
--very-fast  Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast  Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive  Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
--very-sensitive  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
Presets in --local mode
--very-fast-local  Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local  Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local  Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
--very-sensitive-local  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
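-N <int>  Number of mismatches allowed in a seed alignment. May be 0 or 1. Default: 0.
-L <int>  Length of the seed substrings.
************************************************************
Function options
Some bowtie2 parameters take a function instead of a fixed value, so that the value scales with the length of the read. A function has three parts: (a) the function type, one of constant (C), linear (L), square root (S) or natural log (G); (b) a constant term; (c) a coefficient. For example:
<func> = L,-0.4,-0.6 gives f(x) = -0.4 + -0.6 * x
<func> = G,1,5.4 gives f(x) = 1.0 + 5.4 * ln(x)
************************************************************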
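The function notation can be checked with a few lines of Python. This is a minimal sketch of the rule as described here, not code taken from bowtie2; the name `eval_func_option` is hypothetical, and treating a constant function as "C,<constant>" with the coefficient ignored is an assumption.

```python
import math


def eval_func_option(spec: str, x: float) -> float:
    """Evaluate a function option such as "S,1,1.15" at read length x."""
    parts = spec.split(",")
    kind = parts[0].upper()
    const = float(parts[1])
    # Assumption: the coefficient may be omitted for constant functions
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":   # constant
        return const
    if kind == "L":   # linear
        return const + coeff * x
    if kind == "S":   # square root
        return const + coeff * math.sqrt(x)
    if kind == "G":   # natural log
        return const + coeff * math.log(x)
    raise ValueError(f"unknown function type: {kind!r}")


for spec in ("L,-0.4,-0.6", "G,1,5.4", "S,1,1.15"):
    print(spec, eval_func_option(spec, 100))
```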
-i <func>  Function giving the interval, in bases, between adjacent seeds.
************************************************************
For example, with a 30-base read, a seed length of 10 and a seed interval of 6, the seeds extracted are:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
************************************************************
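The seed table above can be reproduced with a short Python sketch. It assumes seeds start at offset 0 and advance by the interval until a full-length seed no longer fits, which matches the example; the function names are made up for illustration and this is not bowtie2's own code.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_seeds(read: str, seed_len: int, interval: int):
    """Return (forward, reverse-complement) seed pairs taken every `interval` bases."""
    seeds = []
    for start in range(0, len(read) - seed_len + 1, interval):
        fw = read[start:start + seed_len]
        seeds.append((fw, reverse_complement(fw)))
    return seeds


read = "TAGCTACGCTCTACGCTATCATGCATAAAC"
for n, (fw, rc) in enumerate(extract_seeds(read, 10, 6), start=1):
    print(f"Seed {n} fw: {fw}")
    print(f"Seed {n} rc: {rc}")
```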
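The default in --end-to-end mode is "-i S,1,1.15", i.e. f(x) = 1 + 1.15 * sqrt(x). For a read of length 100 this evaluates to 12.5, so adjacent seeds are about 12 bases apart.
--n-ceil <func>  Maximum number of ambiguous bases (anything other than G, T, A or C; usually N) allowed in a read. Default: L,0,0.15, i.e. f(x) = 0 + 0.15 * x, so a read of length 100 may contain at most 15 ambiguous bases. A read with more than 15 is filtered out.
--dpad <int>  Pad dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.
--gbar <int>  Disallow gaps within <int> bases of either end of the read. Default: 4.
--ignore-quals  Ignore base qualities when computing mismatch penalties. This becomes the default behavior when the input format is -f, -r or -c.
--nofw/--norc  --nofw: do not align reads to the forward reference strand. --norc: do not align reads to the reverse-complement reference strand. Default: both strands enabled.
--end-to-end  Align the entire read to the reference. --ma is 0 in this mode. This is the default mode and is mutually exclusive with --local.
--local  Align reads locally: some bases at either end of the read may be left unaligned so that the alignment score meets the threshold. --ma defaults to 2 in this mode.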
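Scoring
--ma <int>  Match bonus. In --local mode, <int> is added for each read base that matches the reference base. Not used in --end-to-end mode. Default: 2.
--mp MX,MN  Mismatch penalty, where MX is the maximum and MN the minimum penalty. By default the penalty depends on the base quality: MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ), where Q is the base quality. If --ignore-quals is set, the penalty is always MX. Default: MX = 6, MN = 2.
--np <int>  Penalty for positions where the read, the reference, or both contain an ambiguous base such as N. Default: 1.
--rdg <int1>,<int2>  Read gap open and gap extend penalties. Default: 5, 3.
--rfg <int1>,<int2>  Reference gap open and gap extend penalties. Default: 5, 3.
--score-min <func>  Minimum score for an alignment to count as valid. Default: L,-0.6,-0.6 in --end-to-end mode; G,20,8 in --local mode.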
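The quality-aware mismatch penalty and the gap penalty are easy to work through in code. The sketch below implements the formula quoted for --mp and the "open + N * extend" rule the Bowtie 2 manual gives for --rdg/--rfg, using the default values as keyword defaults. The function names are hypothetical and this is an illustration, not bowtie2's implementation.

```python
import math


def mismatch_penalty(q: float, mx: int = 6, mn: int = 2,
                     ignore_quals: bool = False) -> int:
    """Penalty for one mismatched base with Phred quality q (--mp MX,MN)."""
    if ignore_quals:
        return mx  # with --ignore-quals the maximum penalty always applies
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


def gap_penalty(length: int, gap_open: int = 5, gap_extend: int = 3) -> int:
    """Penalty for a gap of the given length (--rdg / --rfg defaults)."""
    return gap_open + length * gap_extend


for q in (0, 10, 20, 30, 40):
    print(f"Q={q}: mismatch penalty {mismatch_penalty(q)}")
print("gap of length 3:", gap_penalty(3))
```

With the defaults, a mismatch at a low-quality base costs as little as 2 and one at a high-quality base costs 6, which is why low-quality read ends hurt the score less.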
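Reporting
-k <int>  By default, bowtie2 searches for distinct alignments of a read and reports the best one (chosen at random if several alignments share the best score). With -k, bowtie2 searches for at most <int> alignments per read and reports them in descending order of score.
-a  Like -k, but with no limit on the number of alignments searched for; all alignments found are reported in descending order of score. Mutually exclusive with -k. Note that on genomes with many repeats this option can make the program extremely slow.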
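Effort
-D <int>  A seed extension "fails" if it yields neither a new best nor a new second-best alignment. After <int> consecutive failures, bowtie2 stops working on that read and moves on. Default: 15. The limit is adjusted automatically when -k or -a is given.
-R <int>  Maximum number of times to re-seed a read whose seeds are repetitive. If the seeds of a read hit more than 300 positions each on average, bowtie2 generates new seeds at a different offset and aligns again. Default: 2.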
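Paired-end
-I/--minins <int>  Minimum fragment (insert) length. Default: 0.
-X/--maxins <int>  Maximum fragment (insert) length. Default: 500.
--fr/--rf/--ff  Orientation of the upstream and downstream mates relative to the forward reference strand. --fr: mate 1 is upstream in forward orientation and mate 2 is downstream, reverse-complemented; or mate 2 is upstream and mate 1 is downstream, reverse-complemented. --rf: the upstream mate 1 is reverse-complemented and the downstream mate 2 is forward-oriented. --ff: both mates are forward-oriented. Default: --fr, which suits Illumina paired-end data; for mate-pair libraries choose --rf.
--no-mixed  By default, when a pair cannot be aligned as a pair, bowtie2 tries to align each mate on its own. This option disables that behavior.
--no-discordant  By default, when a pair has no concordant alignment (one that satisfies -I, -X and --fr/--rf/--ff), bowtie2 looks for a discordant alignment (both mates align uniquely, but the -I, -X, --fr/--rf/--ff constraints are not met). This option disables that behavior.
--dovetail  Count dovetailing mates as concordant. By default, dovetailing is not concordant.
--no-contain  Count a pair in which one mate contains the other as non-concordant. By default, containment is concordant.
--no-overlap  Count overlapping mates as non-concordant. By default, overlapping mates are concordant.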
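Output
-t/--time  Print the wall-clock time taken to load the index and align the reads.
--un <path>  Write unpaired reads that fail to align to <path>.
--un-gz <path>  Same as --un, gzip-compressed.
--un-bz2 <path>  Same as --un, bzip2-compressed.
--al <path>  Write unpaired reads that align at least once to <path>.
--al-gz <path>  Same as --al, gzip-compressed.
--al-bz2 <path>  Same as --al, bzip2-compressed.
--un-conc <path>  Write paired-end reads that fail to align concordantly to <path>.
--un-conc-gz <path>  Same as --un-conc, gzip-compressed.
--un-conc-bz2 <path>  Same as --un-conc, bzip2-compressed.
--al-conc <path>  Write paired-end reads that align concordantly at least once to <path>.
--al-conc-gz <path>  Same as --al-conc, gzip-compressed.
--al-conc-bz2 <path>  Same as --al-conc, bzip2-compressed.
--quiet  Quiet mode: print nothing except alignments and serious errors.
--met-file <path>  Write bowtie2 metrics to the file <path>. Useful for debugging. Default: metrics disabled.
--met-stderr  Write bowtie2 metrics to the standard error filehandle. Not mutually exclusive with --met-file. Default: metrics disabled.
--met <int>  Write a new metrics record every <int> seconds. Default: 1.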
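SAM
--no-unal  Suppress SAM records for reads that failed to align.
--no-hd  Suppress SAM header lines (those starting with @).
--no-sq  Suppress @SQ SAM header lines.
--rg-id <text>  Set the read group ID to <text>.
--rg <text>  Add <text> as a field on the @RG header line.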
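Performance
-o/--offrate <int>  Override the offrate of the index with <int>. The index default is 5. <int> must be greater than the offrate of the index; the larger the value, the longer the run takes and the less memory it uses.
-p/--threads NTHREADS  Number of threads. Default: 1.
--reorder  With multiple threads, alignments may be written in a different order from the reads in the input file. This option keeps the output in input order.
--mm  Load the index with memory-mapped I/O rather than regular file I/O, so that several bowtie processes can share one in-memory copy of the index and save memory.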
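Other:
--qc-filter  Filter out reads whose QSEQ filter field is non-zero. Only has an effect with --qseq. Default: off.
--seed <int>  Use <int> as the seed for the pseudo-random number generator. Default: 0.
--version  Print version information and quit.
-h/--help  Print usage information and quit.
For more details, see:
http://-bio.sourceforge.net/2/manual.shtml
Source: http://www.hzaumycology.com/chenlianfu_blog/?p=178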
【Bowtie】BOWTIE2: Manual (options)
http://-bio.sourceforge.net/2/manual.shtml
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
-x <bt2-idx>
The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.
-1 <m1>
Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the "standard in" or "stdin" filehandle.
-2 <m2>
Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the "standard in" or "stdin" filehandle.
-U <r>
Comma-separated list of files containing unpaired reads to be aligned, e.g. lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the "standard in" or "stdin" filehandle.
-S <hit>
File to write SAM alignments to. By default, alignments are written to the "standard out" or "stdout" filehandle (i.e. the console).
-q
Reads (specified with <m1>, <m2>, <s>) are FASTQ files. FASTQ files usually have extension .fq or .fastq. FASTQ is the default format. See also: --solexa-quals and --int-quals.
--qseq
Reads (specified with <m1>, <m2>, <s>) are QSEQ files. QSEQ files usually end in _qseq.txt. See also: --solexa-quals and --int-quals.
-f
Reads (specified with <m1>, <m2>, <s>) are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.
-r
Reads (specified with <m1>, <m2>, <s>) are files with one input sequence per line, without any other information (no read names, no qualities). When -r is set, the result is as if --ignore-quals is also set.
-c
The read sequences are given on the command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files. There is no way to specify read names or qualities, so -c also implies --ignore-quals.
-s/--skip <int>
Skip (i.e. do not align) the first <int> reads or pairs in the input.
-u/--qupto <int>
Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop. Default: no limit.
-5/--trim5 <int>
Trim <int> bases from the 5' (left) end of each read before alignment (default: 0).
-3/--trim3 <int>
Trim <int> bases from the 3' (right) end of each read before alignment (default: 0).
--phred33
Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.
--phred64
Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding.
--solexa-quals
Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off.
--int-quals
Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters, e.g., II?I.... Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
--very-fast
Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
--fast
Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
--sensitive
Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
--very-sensitive
Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
--very-fast-local
Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local
Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local
Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
--very-sensitive-local
Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
-N <int>
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
-L <int>
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.
-i <func>
Sets a function governing the interval between seed substrings to use during multiseed alignment. For instance, if the read has 30 characters, and seed length is 10, and the seed interval is 6, the seeds extracted will be:
Read:      TAGCTACGCTCTACGCTATCATGCATAAAC
Seed 1 fw: TAGCTACGCT
Seed 1 rc: AGCGTAGCTA
Seed 2 fw: CGCTCTACGC
Seed 2 rc: GCGTAGAGCG
Seed 3 fw: ACGCTATCAT
Seed 3 rc: ATGATAGCGT
Seed 4 fw: TCATGCATAA
Seed 4 rc: TTATGCATGA
Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying -i S,1,2.5 sets the interval function f to f(x) = 1 + 2.5 * sqrt(x), where x is the read length. See also: setting function options. If the function returns a result less than 1, it is rounded up to 1. Default: the --sensitive preset is used by default, which sets -i to S,1,1.15 in --end-to-end mode and to S,1,0.75 in --local mode.
--n-ceil <func>
Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance, specifying L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x, where x is the read length. See also: setting function options. Reads exceeding this ceiling are filtered out. Default: L,0,0.15.
--dpad <int>
"Pads" dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.
--gbar <int>
Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
--ignore-quals
When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).
--nofw/--norc
If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, --nofw and --norc pertain to the fragments; i.e. specifying --nofw causes bowtie2 to explore only those paired-end configurations corresponding to fragments from the reverse-complement (Crick) strand. Default: both strands enabled.
--no-1mm-upfront
By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.
--end-to-end
In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.
--local
In this mode, Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.
--ma <int>
Sets the match bonus to <int>. In --local mode, <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.
--mp MX,MN
Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) ) where Q is the Phred quality value. Default: MX = 6, MN = 2.
--np <int>
Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N. Default: 1.
--rdg <int1>,<int2>
Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
--rfg <int1>,<int2>
Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
--score-min <func>
Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length. See also: setting function options. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
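To get a feel for the default thresholds, the following Python sketch evaluates the two default --score-min functions for a given read length and checks whether a score clears the threshold. The function names are invented for this example, and "valid" is taken here to mean a score greater than or equal to the minimum.

```python
import math


def min_score(read_len: int, local: bool = False) -> float:
    """Default --score-min threshold for a read of the given length."""
    if local:
        return 20 + 8.0 * math.log(read_len)   # G,20,8
    return -0.6 + -0.6 * read_len              # L,-0.6,-0.6


def is_valid(score: float, read_len: int, local: bool = False) -> bool:
    """True if the alignment score reaches the minimum-score threshold."""
    return score >= min_score(read_len, local)


for length in (50, 100, 150):
    print(length, min_score(length), min_score(length, local=True))
```

For a 100-base read in end-to-end mode the threshold is about -60.6, so with the default maximum mismatch penalty of 6 roughly ten high-quality mismatches can be tolerated before the alignment stops being valid.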
-k <int>
By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i.
When -k is specified, however, bowtie2 behaves differently. Instead, it searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than <int> distinct, valid alignments, bowtie2 does not guarantee that the <int> alignments reported are the best possible in terms of alignment score. -k is mutually exclusive with -a.
Note: Bowtie 2 is not designed with large values for -k in mind, and when aligning reads to long, repetitive genomes large -k can be very, very slow.
-a
Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k.
Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.
-D <int>
Up to <int> consecutive seed extension attempts can "fail" before Bowtie 2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified. Default: 15.
-R <int>
<int> is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," Bowtie 2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300. Default: 2.
-I/--minins <int>
The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 0 (essentially imposing no minimum)
-X/--maxins <int>
The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 500.
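The two worked examples for -I and -X reduce to simple arithmetic on the fragment length, which is the two mate lengths plus the gap between them. A minimal Python sketch, with hypothetical function names and ignoring orientation and trimming:

```python
def fragment_length(mate_a_len: int, gap: int, mate_b_len: int) -> int:
    """Fragment length spanned by two mate alignments and the gap between them."""
    return mate_a_len + gap + mate_b_len


def within_insert_limits(frag_len: int, minins: int = 0, maxins: int = 500) -> bool:
    """Check the -I/--minins and -X/--maxins constraints (both inclusive)."""
    return minins <= frag_len <= maxins


# -I 60: two 20-bp mates with a 20-bp gap are valid, a 19-bp gap is not
print(within_insert_limits(fragment_length(20, 20, 20), minins=60))
print(within_insert_limits(fragment_length(20, 19, 20), minins=60))
# -X 100: a 60-bp gap is valid, a 61-bp gap is not
print(within_insert_limits(fragment_length(20, 60, 20), maxins=100))
print(within_insert_limits(fragment_length(20, 61, 20), maxins=100))
```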
--fr/--rf/--ff
The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate 1 appears upstream of the reverse complement of mate 2 and the fragment length constraints (-I and -X) are met, that alignment is valid. Also, if mate 2 appears upstream of the reverse complement of mate 1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate 1 be reverse-complemented and a downstream mate 2 be forward-oriented. --ff requires both an upstream mate 1 and a downstream mate 2 to be forward-oriented. Default: --fr (appropriate for Illumina's Paired-end Sequencing Assay).
--no-mixed
By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.
--no-discordant
By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints (--fr/--rf/--ff, -I, -X). This option disables that behavior.
--dovetail
If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.
--no-contain
If one mate alignment contains the other, consider that to be non-concordant. Default: a mate can contain the other in a concordant alignment.
--no-overlap
If one mate alignment overlaps the other at all, consider that to be non-concordant. Default: mates can overlap in a concordant alignment.
-t/--time
Print the wall-clock time required to load the index files and align the reads. This is printed to the "standard error" ("stderr") filehandle. Default: off.
--un <path> --un-gz <path> --un-bz2 <path>
Write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.
--al <path> --al-gz <path> --al-bz2 <path>
Write unpaired reads that align at least once to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4, 0x40, and 0x80 bits unset. If --al-gz is specified, output will be gzip compressed. If --al-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.
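The FLAGS conditions for --un and --al can be expressed as a couple of bit tests. The sketch below is only an illustration of the rule as stated in the text; the function name and the returned labels are hypothetical.

```python
UNMAPPED = 0x4   # read did not align
MATE1 = 0x40     # read is mate 1 of a pair
MATE2 = 0x80     # read is mate 2 of a pair


def unpaired_output_for(flag: int):
    """Return "un", "al", or None (paired record) for a SAM FLAGS value."""
    if flag & (MATE1 | MATE2):
        return None  # paired reads go to --un-conc / --al-conc instead
    return "un" if flag & UNMAPPED else "al"


for flag in (0, 4, 16, 0x4 | 0x40):
    print(flag, unpaired_output_for(flag))
```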
--un-conc <path> --un-conc-gz <path> --un-conc-bz2 <path>
Write paired-end reads that fail to align concordantly to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
--al-conc <path> --al-conc-gz <path> --al-conc-bz2 <path>
Write paired-end reads that align concordantly at least once to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit unset and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
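The per-mate naming rule is easier to see with an example. The Python sketch below follows the rule as described: replace % if present, otherwise insert .1 / .2 before the final dot. Appending .1 / .2 when the path has no dot at all is an assumption of this sketch, not something the text states, and `per_mate_paths` is a hypothetical helper rather than part of bowtie2.

```python
import os


def per_mate_paths(path: str):
    """Derive the mate 1 and mate 2 file names from a --un-conc/--al-conc path."""
    if "%" in path:
        # The percent symbol is replaced with the mate number
        return path.replace("%", "1"), path.replace("%", "2")
    # Otherwise insert .1 / .2 before the final dot of the file name.
    # Assumption: with no dot present, the suffix is appended at the end.
    root, ext = os.path.splitext(path)
    return f"{root}.1{ext}", f"{root}.2{ext}"


for p in ("unconc_%.fq", "unconc.fq", "unconc"):
    print(p, "->", per_mate_paths(p))
```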
--quiet
Print nothing besides alignments and serious errors.
--met-file <path>
Write bowtie2 metrics to file <path>. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.
--met-stderr
Write bowtie2 metrics to the "standard error" ("stderr") filehandle. This is not mutually exclusive with --met-file. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.
--met <int>
Write a new bowtie2 metrics record every <int> seconds. Only matters if either --met-stderr or --met-file are specified. Default: 1.
--no-unal
Suppress SAM records for reads that failed to align.
--no-hd
Suppress SAM header lines (starting with @).
--no-sq
Suppress @SQ SAM header lines.
--rg-id <text>
Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.
--rg <text>
Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM Spec. Specify --rg multiple times to set multiple fields. See the SAM Spec for details about what fields are legal.
--omit-sec-seq
When printing secondary alignments, Bowtie 2 by default will write out the SEQ and QUAL strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead.
-o/--offrate <int>
Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
-p/--threads NTHREADS
Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
--reorder
Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when -p is set greater than 1. Specifying --reorder and setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally correspond to input order in that case.
--mm
Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible or not preferable.
--qc-filter
Filter out reads for which the QSEQ filter field is non-zero. Only has an effect when read format is --qseq. Default: off.
--seed <int>
Use <int> as the seed for the pseudo-random number generator. Default: 0.
--non-deterministic
Normally, Bowtie 2 re-initializes its pseudo-random generator for each read. It seeds the generator with a number derived from (a) the read name, (b) the nucleotide sequence, (c) the quality sequence, (d) the value of the --seed option. This means that if two reads are identical (same name, same nucleotides, same qualities) Bowtie 2 will find and report the same alignment(s) for both, even if there was ambiguity. When --non-deterministic is specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time. This means that Bowtie 2 will not necessarily report the same alignment for two identical reads. This is counter-intuitive for some users, but might be more appropriate in situations where the input consists of many identical reads.
--version
Print version information and quit.
-h/--help
Print usage information and quit.
Published by: 全栈程序员-用户IM. When reposting, please credit the source: https://javaforall.cn/215484.html Original link: https://javaforall.cn