Adding MN tags with bowtie: Bowtie alignment (worth bookmarking)


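[Bowtie] How DNA sequence assembly works

[Jenny's comment] I had always assumed Bowtie was a short-read assembly tool, but that is wrong. It is not an assembler, only a sequence alignment tool. Its output locates each short read relative to the index.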

——————

How does short-read alignment work, and which short-read aligners are commonly used today?

http://blog.sina.com.cn/s/blog_9617895f01011npk.html

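Answer: Sequence alignment arranges two or more sequences according to a set of rules in order to determine their similarity, and from that their homology. Short-read alignment has different characteristics from long-sequence alignment, so the two use different algorithms. Three families of algorithms are common for short reads:

(1) Spaced-seed indexing, as in MAQ and ELAND. Each read is cut into segments, and one or more of the segments are used as seeds to build a lookup index. Reads are then located by index lookup followed by extension. Rotating which segments serve as seeds covers the possible combinations of mismatch positions.

(2) The Burrows-Wheeler transform, as in Bowtie, BWA and SOAP2. The genome is compressed and indexed with the Burrows-Wheeler transform, and reads are located by search and backtracking. Mismatches are handled during the search by substituting bases.

(3) Smith-Waterman dynamic programming, as in BFAST and SHRiMP. Initial conditions and a recurrence are used to compute the scores of all possible alignments of two sequences into a matrix, and the best alignment is recovered by tracing back through the matrix.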
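To make approach (2) more concrete, here is a minimal sketch of exact-match lookup through a Burrows-Wheeler index. It is a toy: the suffix array is built by naive sorting and the occurrence table is stored uncompressed, whereas a real aligner such as Bowtie uses a far more compact index and also handles mismatches. The names bwt_index and locate are made up for this illustration.

```python
def bwt_index(text):
    """Build a toy FM-index (suffix array, first-column table, occurrence table)."""
    text += "$"  # sentinel; sorts before A, C, G and T
    # Naive suffix array: start positions of all suffixes in sorted order.
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (wraps to "$").
    bwt = "".join(text[i - 1] for i in order)
    alphabet = sorted(set(text))
    # first[c]: how many characters in the text sort strictly before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return order, first, occ


def locate(read, index):
    """Return the sorted start positions of exact matches of read."""
    order, first, occ = index
    lo, hi = 0, len(order)  # half-open interval of matching suffixes
    for c in reversed(read):  # backward search: extend the match leftwards
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(order[i] for i in range(lo, hi))


index = bwt_index("GATTACAGATTACA")
print(locate("TTAC", index))
```

Each step of the loop narrows the interval of suffixes that start with the part of the read processed so far, which is why the lookup cost depends on the read length rather than on the genome length.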

BGI assembly

http://www.ebiotrade.com/newsf/2010-1/2010128171022809.htm

Research on next-generation genome sequence assembly algorithms

http://www.fdurop.fudan.edu.cn/upload/stu/docs/rcYsXb_102804-1303180458.pdf

Genome sequencing and analysis (good, recommended reading)

http://ibi.zju.edu.cn/bioinplant/courses/chap4.pdf

Design of gene sequence assembly algorithms

http://www.doc88.com/p-741680604744.html

[Bowtie] Bowtie2 usage and detailed parameter guide

For the impatient

bowtie2 -q --phred33 --sensitive --end-to-end -I 0 -X 500 --fr --un unpaired --al aligned --un-conc unconc --al-conc alconc -p 6 --reorder -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]

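Usage:

bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]

Required:

-x  Basename of the index files produced by bowtie2-build. bowtie2 looks for the index first in the current directory, then in the directory named by the BOWTIE2_INDEXES environment variable.

-1  File(s) containing mate 1 of the paired-end reads. Several files can be given, separated by commas; they must correspond one-to-one with the files given to -2, for example "-1 flyA_1.fq,flyB_1.fq -2 flyA_2.fq,flyB_2.fq". Reads within a file may differ in length.

-2  File(s) containing mate 2 of the paired-end reads.

-U  File(s) containing unpaired reads. Several files can be given, separated by commas. Reads within a file may differ in length.

-S  File to write the SAM alignments to. By default they are written to standard output.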
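As a rough sketch of how the required options fit together, the helper below assembles a bowtie2 argument list. The function name bowtie2_command and the file names are invented for this example; only the flags -x, -1, -2, -U, -S, -p and --reorder come from the option list in this post. The command is built but not executed here.

```python
import shlex


def bowtie2_command(index, sam_out, mates1=None, mates2=None,
                    unpaired=None, threads=1):
    """Build a bowtie2 argument list for paired-end or unpaired input."""
    cmd = ["bowtie2", "-x", index]
    if mates1 and mates2:
        # -1 and -2 must correspond file-for-file.
        if len(mates1) != len(mates2):
            raise ValueError("-1 and -2 need the same number of files")
        cmd += ["-1", ",".join(mates1), "-2", ",".join(mates2)]
    elif unpaired:
        cmd += ["-U", ",".join(unpaired)]
    else:
        raise ValueError("give either mates1 and mates2, or unpaired")
    if threads > 1:
        # --reorder keeps SAM output in input order when using several threads.
        cmd += ["-p", str(threads), "--reorder"]
    cmd += ["-S", sam_out]
    return cmd


cmd = bowtie2_command(
    "fly_index", "fly.sam",
    mates1=["flyA_1.fq", "flyB_1.fq"],
    mates2=["flyA_2.fq", "flyB_2.fq"],
    threads=6,
)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the alignment, assuming bowtie2 is installed and the index and read files exist.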

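The remaining options are optional.

Input

-q  Input files are FASTQ. This is the default.

--qseq  Input files are QSEQ.

-f  Input files are FASTA. Implies --ignore-quals.

-r  Input files contain one sequence per line, with no read names or qualities. Implies --ignore-quals.

-c  The read sequences themselves are given on the command line, separated by commas, instead of file names. Implies --ignore-quals.

-s/--skip  Skip the given number of reads or pairs at the start of the input.

-u/--qupto  Align only the given number of reads or pairs (counted after any that were skipped), then stop. Default: no limit.

-5/--trim5  Trim the given number of bases from the 5' end of each read before alignment (default: 0).

-3/--trim3  Trim the given number of bases from the 3' end of each read before alignment (default: 0).

--phred33  Qualities are encoded as ASCII characters equal to the Phred quality plus 33. Used by recent Illumina pipelines.

--phred64  Qualities are encoded as ASCII characters equal to the Phred quality plus 64.

--solexa-quals  Convert Solexa-scaled input qualities to Phred. Used by older GA Pipeline versions. Default: off.

--int-quals  Qualities in the input file are space-separated integers, e.g. 40 40 30 40..., rather than ASCII characters. Default: off.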

Presets in --end-to-end mode

--very-fast  Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

--fast  Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

--sensitive  Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)

--very-sensitive  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

Presets in --local mode

--very-fast-local  Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

--fast-local  Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

--sensitive-local  Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)

--very-sensitive-local  Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

-N  Number of mismatches allowed in a seed alignment. Can be 0 or 1. Default: 0.

-L  Length of the seeds.

Function options

Some bowtie2 parameters take a function instead of a fixed value, so that the value scales with the read length. A function is written as three comma-separated parts: (a) the function type, one of constant (C), linear (L), square root (S) or natural logarithm (G); (b) a constant term; (c) a coefficient. For example:

L,-0.4,-0.6 gives f(x) = -0.4 + -0.6 * x

G,1,5.4 gives f(x) = 1.0 + 5.4 * ln(x)
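The function notation can be sketched in a few lines of Python. This is an illustration of the notation as described above, not bowtie2's own parser; the name parse_function is hypothetical, and treating a missing coefficient as 0 for the constant type is an assumption.

```python
import math


def parse_function(spec):
    """Turn a spec such as 'S,1,1.15' into a function of the read length."""
    parts = spec.split(",")
    kind = parts[0]
    const = float(parts[1])
    # Assumption: a missing coefficient counts as 0 (only sensible for type C).
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    shapes = {
        "C": lambda x: 0.0,  # constant: the read length is ignored
        "L": lambda x: x,    # linear
        "S": math.sqrt,      # square root
        "G": math.log,       # natural logarithm
    }
    shape = shapes[kind]
    return lambda x: const + coeff * shape(x)


f = parse_function("L,-0.4,-0.6")
g = parse_function("G,1,5.4")
print(f(10), g(100))
```

For a read of length 10, L,-0.4,-0.6 evaluates to -0.4 + -0.6 * 10, i.e. about -6.4.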

-i  Sets the interval, in bases, between the start positions of neighboring seeds.

For example, if the read is 30 bases long, the seed length is 10 and the seed interval is 6, the extracted seeds are:

Read:      TAGCTACGCTCTACGCTATCATGCATAAAC

Seed 1 fw: TAGCTACGCT

Seed 1 rc: AGCGTAGCTA

Seed 2 fw: CGCTCTACGC

Seed 2 rc: GCGTAGAGCG

Seed 3 fw: ACGCTATCAT

Seed 3 rc: ATGATAGCGT

Seed 4 fw: TCATGCATAA

Seed 4 rc: TTATGCATGA
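The seed example can be reproduced with a short sketch. It assumes that seeds start at offsets 0, interval, 2 * interval and so on, and that a seed is only taken if it fits entirely inside the read, which matches the four seeds listed in the example. The function names are invented for this illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_seeds(read, seed_len, interval):
    """Return (forward, reverse-complement) seed pairs taken every interval bases."""
    seeds = []
    for start in range(0, len(read) - seed_len + 1, interval):
        fw = read[start:start + seed_len]
        seeds.append((fw, reverse_complement(fw)))
    return seeds


read = "TAGCTACGCTCTACGCTATCATGCATAAAC"
for number, (fw, rc) in enumerate(extract_seeds(read, 10, 6), start=1):
    print(f"Seed {number} fw: {fw}")
    print(f"Seed {number} rc: {rc}")
```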

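In --end-to-end mode the default is "-i S,1,1.15", i.e. f(x) = 1 + 1.15 * sqrt(x). For a read of length 100 this gives f(100) = 12.5, so neighboring seeds start about 12 bases apart.

--n-ceil  Maximum number of ambiguous bases (anything other than G, T, A or C, usually N) allowed in a read. Default: L,0,0.15, i.e. f(x) = 0 + 0.15 * x, so a read of length 100 may contain at most 15 ambiguous bases. A read with more than that is filtered out.

--dpad  Pads dynamic programming problems by the given number of columns on either side to allow gaps. Default: 15.

--gbar  Disallow gaps within the given number of bases of either end of the read. Default: 4.

--ignore-quals  Ignore base qualities when computing mismatch penalties. This is automatically in effect for -f, -r and -c input.

--nofw/--norc  --nofw: do not align reads to the forward reference strand. --norc: do not align reads to the reverse-complement reference strand. Default: both strands enabled.

--end-to-end  Align the entire read to the reference. The match bonus --ma is 0 in this mode. This is the default mode and is mutually exclusive with --local.

--local  Local alignment: some bases at either end of the read may be left unaligned (soft clipped) so that the alignment score meets the requirement. --ma defaults to 2 in this mode.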

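Scoring

--ma  Match bonus. In --local mode it is added for each position where the read base matches the reference base. Not used in --end-to-end mode. Default: 2.

--mp MX,MN  Mismatch penalty, where MX is the maximum and MN the minimum penalty. By default the penalty depends on base quality: MN + floor( (MX-MN) * (MIN(Q, 40.0)/40.0) ), where Q is the quality value of the base. With --ignore-quals, a mismatch always costs MX. Default: MX = 6, MN = 2.

--np  Penalty for a position where the read, the reference, or both contain an ambiguous base such as N. Default: 1.

--rdg  Read gap open and gap extend penalties. Default: 5, 3.

--rfg  Reference gap open and gap extend penalties. Default: 5, 3.

--score-min  Minimum score for an alignment to count as valid. Default: L,-0.6,-0.6 in --end-to-end mode and G,20,8 in --local mode.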
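The quality-dependent mismatch penalty is easy to check by hand. The sketch below implements the formula quoted for --mp with the default MX = 6 and MN = 2; the function name is invented for this example.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty for one mismatch at a base with Phred quality q."""
    if ignore_quals:
        return mx  # with --ignore-quals every mismatch costs MX
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


for q in (2, 10, 20, 30, 40):
    print(q, mismatch_penalty(q))
```

A mismatch at a low-quality base (Q = 2) costs the minimum of 2, while one at a base with Q = 40 or higher costs the maximum of 6.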

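Reporting

-k  By default bowtie2 searches for distinct alignments of each read and reports the best one (chosen at random when several share the best score). With -k, bowtie2 instead searches for at most the given number of alignments per read and reports them in descending order of score.

-a  Like -k, but with no limit on the number of alignments searched for; all alignments found are reported in descending order of score. Mutually exclusive with -k. Note that on genomes with many repeats this option can make bowtie2 extremely slow.

Effort

-D  A seed extension "fails" if it does not yield a new best or second-best alignment. After the given number of consecutive failures, bowtie2 stops working on that read and moves on. Default: 15. The limit is adjusted automatically when -k or -a is given.

-R  Maximum number of times to re-seed a read whose seeds hit too many reference positions. When the seeds average more than 300 hits each, new seeds are generated at a different offset and the search is repeated. Default: 2.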

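Paired-end

-I/--minins  Minimum fragment length. Default: 0.

-X/--maxins  Maximum fragment length. Default: 500.

--fr/--rf/--ff  Expected orientation of the two mates relative to the forward reference strand. --fr: mate 1 is upstream and matches the forward strand while mate 2 is downstream and reverse-complemented, or mate 2 is upstream and mate 1 is downstream and reverse-complemented. --rf: the upstream mate 1 is reverse-complemented and the downstream mate 2 matches the forward strand. --ff: both mates match the forward strand. Default: --fr, which suits Illumina paired-end data; for mate-pair data choose --rf.

--no-mixed  By default, when a pair cannot be aligned as a pair, bowtie2 tries to align each mate on its own. This option disables that.

--no-discordant  By default, when a pair has no concordant alignment (one that satisfies -I, -X and --fr/--rf/--ff), bowtie2 looks for a discordant alignment (both mates align uniquely, but the -I, -X or --fr/--rf/--ff constraints are not met). This option disables that.

--dovetail  Treat dovetailing mates as concordant. By default they are not.

--no-contain  Treat a pair in which one mate contains the other as non-concordant. By default containment counts as concordant.

--no-overlap  Treat overlapping mates as non-concordant. By default overlap counts as concordant.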

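Output

-t/--time  Print the wall-clock time needed to load the index and align the reads.

--un  Write unpaired reads that fail to align to the given path. --un-gz: the same, gzip compressed. --un-bz2: the same, bzip2 compressed.

--al  Write unpaired reads that align at least once to the given path. --al-gz: the same, gzip compressed. --al-bz2: the same, bzip2 compressed.

--un-conc  Write paired-end reads that fail to align concordantly to the given path. --un-conc-gz: the same, gzip compressed. --un-conc-bz2: the same, bzip2 compressed.

--al-conc  Write paired-end reads that align concordantly at least once to the given path. --al-conc-gz: the same, gzip compressed. --al-conc-bz2: the same, bzip2 compressed.

--quiet  Quiet mode: print nothing except alignments and serious errors.

--met-file  Write bowtie2 metrics to the given file. Useful for debugging. Default: metrics disabled.

--met-stderr  Write bowtie2 metrics to standard error. Can be combined with --met-file. Default: metrics disabled.

--met  Write a metrics record once every given number of seconds. Default: 1.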

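SAM

--no-unal  Suppress records for reads that failed to align.

--no-hd  Suppress SAM header lines (those starting with @).

--no-sq  Suppress @SQ SAM header lines.

--rg-id  Set the read group ID to the given text.

--rg  Add the given text as a field on the @RG header line.

Performance

-o/--offrate  Override the offrate of the index with the given value. The default index offrate is 5. The value must be greater than the offrate of the index; the larger it is, the more time and the less memory bowtie2 needs.

-p/--threads NTHREADS  Number of threads. Default: 1.

--reorder  With multiple threads, the order of the alignments may differ from the order of the reads in the input file. This option makes them match.

--mm  Load the index with memory-mapped I/O instead of normal file I/O, so that several bowtie processes can share one in-memory copy of the index and save memory.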

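Other:

--qc-filter  Filter out reads whose QSEQ filter field is non-zero. Only effective with --qseq. Default: off.

--seed  Use the given value as the seed for the random number generator. Default: 0.

--version  Print the program version and exit.

-h/--help  Print usage information and exit.

For more detail, see:

http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

Source: http://www.hzaumycology.com/chenlianfu_blog/?p=178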

[Bowtie] BOWTIE2: Manual (parameters)

http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]

-x

The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.

-1

Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified with -2. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the "standard in" or "stdin" filehandle.

-2

Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified with -1. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the "standard in" or "stdin" filehandle.

-U

Comma-separated list of files containing unpaired reads to be aligned, e.g. lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the "standard in" or "stdin" filehandle.

-S

File to write SAM alignments to. By default, alignments are written to the "standard out" or "stdout" filehandle (i.e. the console).

-q

Reads (the files given with -1, -2 or -U) are FASTQ files. FASTQ files usually have extension .fq or .fastq. FASTQ is the default format. See also: --solexa-quals and --int-quals.

--qseq

Reads (the files given with -1, -2 or -U) are QSEQ files. QSEQ files usually end in _qseq.txt. See also: --solexa-quals and --int-quals.

-f

Reads (the files given with -1, -2 or -U) are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.

-r

Reads (the files given with -1, -2 or -U) are files with one input sequence per line, without any other information (no read names, no qualities). When -r is set, the result is as if --ignore-quals is also set.

-c

The read sequences are given on the command line, i.e. the arguments to -1, -2 and -U are comma-separated lists of reads rather than lists of read files. There is no way to specify read names or qualities, so -c also implies --ignore-quals.

-s/--skip

Skip (i.e. do not align) the given number of reads or pairs at the start of the input.

-u/--qupto

Align the given number of reads or read pairs from the input (after any reads or pairs skipped with -s/--skip), then stop. Default: no limit.

-5/--trim5

Trim the given number of bases from the 5' (left) end of each read before alignment (default: 0).

-3/--trim3

Trim the given number of bases from the 3' (right) end of each read before alignment (default: 0).

--phred33

Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.

--phred64

Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding.

--solexa-quals

Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off.

--int-quals

Quality values are represented in the read input file as space-separated ASCII integers, e.g. 40 40 30 40..., rather than ASCII characters, e.g. II?I.... Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
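The two quality notations in the --int-quals description are the same values in different encodings. A small sketch of Phred+33 and Phred+64 decoding shows the correspondence; the function names are made up for this example.

```python
def decode_qualities(quals, offset=33):
    """Convert an ASCII quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quals]


def encode_qualities(scores, offset=33):
    """Convert a list of Phred scores back to an ASCII quality string."""
    return "".join(chr(q + offset) for q in scores)


print(decode_qualities("II?I"))             # Phred+33
print(decode_qualities("hh^h", offset=64))  # Phred+64
```

Under Phred+33, "I" is ASCII 73 and "?" is ASCII 63, so II?I decodes to 40 40 30 40, the integer form shown in the text.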

--very-fast

Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50

--fast

Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50

--sensitive

Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)

--very-sensitive

Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

--very-fast-local

Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00

--fast-local

Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75

--sensitive-local

Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)

--very-sensitive-local

Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

-N

Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

-L

Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default, which sets -L to 22 in --end-to-end mode and to 20 in --local mode.

-i

Sets a function governing the interval between seed substrings to use during multiseed alignment. For instance, if the read has 30 characters, and seed length is 10, and the seed interval is 6, the seeds extracted will be:

Read:      TAGCTACGCTCTACGCTATCATGCATAAAC

Seed 1 fw: TAGCTACGCT

Seed 1 rc: AGCGTAGCTA

Seed 2 fw: CGCTCTACGC

Seed 2 rc: GCGTAGAGCG

Seed 3 fw: ACGCTATCAT

Seed 3 rc: ATGATAGCGT

Seed 4 fw: TCATGCATAA

Seed 4 rc: TTATGCATGA

Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying -i S,1,2.5 sets the interval function f to f(x) = 1 + 2.5 * sqrt(x), where x is the read length. If the function returns a result less than 1, it is rounded up to 1. Default: the --sensitive preset is used by default, which sets -i to S,1,1.15 in --end-to-end mode and to S,1,0.75 in --local mode.

--n-ceil

Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance, specifying L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x, where x is the read length. Reads exceeding this ceiling are filtered out. Default: L,0,0.15.

--dpad

"Pads" dynamic programming problems by the given number of columns on either side to allow gaps. Default: 15.

--gbar

Disallow gaps within the given number of positions of the beginning or end of the read. Default: 4.

--ignore-quals

When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f, -r, or -c modes).

--nofw/--norc

If --nofw is specified, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand. If --norc is specified, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand. In paired-end mode, --nofw and --norc pertain to the fragments; i.e. specifying --nofw causes bowtie2 to explore only those paired-end configurations corresponding to fragments from the reverse-complement (Crick) strand. Default: both strands enabled.

--no-1mm-upfront

By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like -L and -N. For instance, if the user specifies -N 0 and -L equal to the length of the read, the user will be surprised to find 1-mismatch alignments reported. This option prevents Bowtie 2 from searching for 1-mismatch end-to-end alignments before using the multiseed heuristic, which leads to the expected behavior when combined with options such as -L and -N. This comes at the expense of speed.

--end-to-end

In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus --ma always equals 0 in this mode, so all alignment scores are less than or equal to 0, and the greatest possible alignment score is 0. This is mutually exclusive with --local. --end-to-end is the default mode.

--local

In this mode, Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus --ma is used in this mode, and the best possible alignment score is equal to the match bonus (--ma) times the length of the read. Specifying --local and one of the presets (e.g. --local --very-fast) is equivalent to specifying the local version of the preset (--very-fast-local). This is mutually exclusive with --end-to-end. --end-to-end is the default mode.

--ma

Sets the match bonus. In --local mode the bonus is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.

--mp MX,MN

Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor( (MX-MN) * (MIN(Q, 40.0)/40.0) ) where Q is the Phred quality value. Default: MX = 6, MN = 2.

--np

Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N. Default: 1.

--rdg

Sets the read gap open and extend penalties, given as two comma-separated integers. A read gap of length N gets a penalty of (gap open) + N * (gap extend). Default: 5, 3.

--rfg

Sets the reference gap open and extend penalties, given as two comma-separated integers. A reference gap of length N gets a penalty of (gap open) + N * (gap extend). Default: 5, 3.
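A short sketch shows how the gap and mismatch penalties add up to an end-to-end alignment score. It only sums the penalties described above and is not bowtie2's scoring code; the function names and the example alignment are invented for illustration.

```python
def gap_penalty(length, gap_open=5, gap_extend=3):
    """Penalty for one gap of the given length: open + length * extend."""
    return gap_open + length * gap_extend


def end_to_end_score(mismatch_penalties, read_gaps=(), ref_gaps=()):
    """Sum all penalties; in end-to-end mode matches add nothing."""
    score = 0
    score -= sum(mismatch_penalties)                  # one value per mismatch
    score -= sum(gap_penalty(n) for n in read_gaps)   # default --rdg 5,3
    score -= sum(gap_penalty(n) for n in ref_gaps)    # default --rfg 5,3
    return score


# Two high-quality mismatches (penalty 6 each) and one read gap of length 2.
print(end_to_end_score([6, 6], read_gaps=[2]))
```

With the defaults, a gap of length 1 costs 8 and each extra gapped position costs 3 more, so the example alignment scores -12 - 11 = -23.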

--score-min

Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
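To get a feel for the default thresholds, the sketch below evaluates both default --score-min functions for a given read length and compares them with the best score reachable in each mode. The function names are hypothetical; the formulas are the defaults quoted above.

```python
import math


def min_score_end_to_end(read_len):
    """Default L,-0.6,-0.6: f(x) = -0.6 + -0.6 * x."""
    return -0.6 + -0.6 * read_len


def min_score_local(read_len):
    """Default G,20,8: f(x) = 20 + 8 * ln(x)."""
    return 20 + 8 * math.log(read_len)


def best_local_score(read_len, ma=2):
    """Best possible local score: the match bonus times the read length."""
    return ma * read_len


read_len = 100
print(min_score_end_to_end(read_len))  # best possible end-to-end score is 0
print(min_score_local(read_len), best_local_score(read_len))
```

For a 100-base read the end-to-end threshold is about -60.6, which leaves room for roughly ten maximum-penalty mismatches (10 * 6 = 60), while a local alignment needs about 56.8 out of a possible 200.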

-k

By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i.

When -k is specified, however, bowtie2 behaves differently. Instead, it searches for at most the given number of distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds that many, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than the given number of distinct, valid alignments, bowtie2 does not guarantee that the alignments reported are the best possible in terms of alignment score. -k is mutually exclusive with -a.

Note: Bowtie 2 is not designed with large values for -k in mind, and when aligning reads to long, repetitive genomes large -k can be very, very slow.

-a

Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k.

Note: Bowtie 2 is not designed with -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.

-D

Up to the given number of consecutive seed extension attempts can "fail" before Bowtie 2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. This limit is automatically adjusted up when -k or -a are specified. Default: 15.

-R

The given value is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," Bowtie 2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300. Default: 2.
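The re-seeding criterion for -R is a simple ratio, sketched below. The input is a list with the number of reference hits for each seed of one read; the function name and the sample numbers are invented for this illustration.

```python
def has_repetitive_seeds(hits_per_seed, threshold=300):
    """True if total hits / seeds with at least one hit exceeds the threshold."""
    aligned = [hits for hits in hits_per_seed if hits > 0]
    if not aligned:
        return False  # no seed aligned, so there is nothing to average
    return sum(aligned) / len(aligned) > threshold


print(has_repetitive_seeds([0, 1200, 50, 0]))  # 1250 hits over 2 seeds
print(has_repetitive_seeds([10, 20, 0, 5]))    # 35 hits over 3 seeds
```

Seeds with no hits are left out of the denominator, so a single seed that lands in a repeat can push the whole read over the threshold.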

-I/--minins

The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.

The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.

Default: 0 (essentially imposing no minimum)

-X/--maxins

The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.

The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.

Default: 500.
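The worked examples for -I and -X reduce to adding up the two mate lengths and the gap between them. A minimal sketch, with invented function names, that reproduces the four cases from the text:

```python
def fragment_length(mate1_len, gap, mate2_len):
    """Fragment length spanned by two mates with a gap between them."""
    return mate1_len + gap + mate2_len


def within_insert_limits(frag_len, minins=0, maxins=500):
    """Check a fragment length against the -I (min) and -X (max) limits."""
    return minins <= frag_len <= maxins


# -I 60: two 20-bp mates with a 20-bp or a 19-bp gap
print(within_insert_limits(fragment_length(20, 20, 20), minins=60))
print(within_insert_limits(fragment_length(20, 19, 20), minins=60))
# -X 100: two 20-bp mates with a 60-bp or a 61-bp gap
print(within_insert_limits(fragment_length(20, 60, 20), maxins=100))
print(within_insert_limits(fragment_length(20, 61, 20), maxins=100))
```

The fragment length therefore includes both mates, not just the unsequenced stretch between them.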

--fr/--rf/--ff

The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate 1 appears upstream of the reverse complement of mate 2 and the fragment length constraints (-I and -X) are met, that alignment is valid. Also, if mate 2 appears upstream of the reverse complement of mate 1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate 1 be reverse-complemented and a downstream mate 2 be forward-oriented. --ff requires both an upstream mate 1 and a downstream mate 2 to be forward-oriented. Default: --fr (appropriate for Illumina's Paired-end Sequencing Assay).

--no-mixed

By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.

--no-discordant

By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints (--fr/--rf/--ff, -I, -X). This option disables that behavior.

--dovetail

If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. Default: mates cannot dovetail in a concordant alignment.

--no-contain

If one mate alignment contains the other, consider that to be non-concordant. Default: a mate can contain the other in a concordant alignment.

--no-overlap

If one mate alignment overlaps the other at all, consider that to be non-concordant. Default: mates can overlap in a concordant alignment.

-t/--time

Print the wall-clock time required to load the index files and align the reads. This is printed to the "standard error" ("stderr") filehandle. Default: off.

--un --un-gz --un-bz2

Write unpaired reads that fail to align to the file at the given path. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set. If --un-gz is specified, output will be gzip compressed. If --un-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.

--al --al-gz --al-bz2

Write unpaired reads that align at least once to the file at the given path. These reads correspond to the SAM records with the FLAGS 0x4, 0x40, and 0x80 bits unset. If --al-gz is specified, output will be gzip compressed. If --al-bz2 is specified, output will be bzip2 compressed. Reads written in this way will appear exactly as they did in the input file, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the input.

--un-conc --un-conc-gz --un-conc-bz2

Write paired-end reads that fail to align concordantly to file(s) at the given path. These reads correspond to the SAM records with the FLAGS 0x4 bit set and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in the path, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in the path to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
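The per-mate naming rule can be sketched as follows. This mirrors only the description above; how bowtie2 treats a path with several % symbols or with no dot at all is not stated in the text, so replacing every % and appending the suffix at the end are assumptions of this example, as is the function name.

```python
import os


def per_mate_paths(path):
    """Return the (mate 1, mate 2) file names derived from one output path."""
    if "%" in path:
        # Assumption: every % in the path is replaced.
        return path.replace("%", "1"), path.replace("%", "2")
    root, ext = os.path.splitext(path)  # ext holds the final dot, if any
    # Insert .1 / .2 before the final dot; with no dot they land at the end
    # (assumption for the no-dot case).
    return f"{root}.1{ext}", f"{root}.2{ext}"


print(per_mate_paths("unconc_%.fq"))
print(per_mate_paths("unconc.fq"))
```

So an argument of unconc_%.fq yields unconc_1.fq and unconc_2.fq, while unconc.fq yields unconc.1.fq and unconc.2.fq.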

--al-conc --al-conc-gz --al-conc-bz2

Write paired-end reads that align concordantly at least once to file(s) at the given path. These reads correspond to the SAM records with the FLAGS 0x4 bit unset and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in the path, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in the path to make the per-mate filenames. Reads written in this way will appear exactly as they did in the input files, without any modification (same sequence, same name, same quality string, same quality encoding). Reads will not necessarily appear in the same order as they did in the inputs.
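The four option families above are described in terms of three SAM FLAGS bits. The sketch below maps a FLAGS value to the option whose description it matches. It follows only the bit patterns quoted in the text; whether a pair is actually concordant depends on more than these three bits, so treat it as a reading aid rather than a reimplementation. The function name is invented.

```python
UNMAPPED = 0x4   # read failed to align
MATE1 = 0x40     # first mate of a pair
MATE2 = 0x80     # second mate of a pair


def matching_output_option(flags):
    """Name the output option whose FLAGS description fits this record."""
    unmapped = bool(flags & UNMAPPED)
    is_mate = bool(flags & (MATE1 | MATE2))
    if is_mate:
        return "--un-conc" if unmapped else "--al-conc"
    return "--un" if unmapped else "--al"


for flags in (4, 0, 16, 69, 131):
    print(flags, matching_output_option(flags))
```

For example, FLAGS 69 (0x1 + 0x4 + 0x40) is an unaligned mate #1 and falls under --un-conc, while FLAGS 16 is an aligned unpaired read on the reverse strand and falls under --al.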

--quiet

Print nothing besides alignments and serious errors.

--met-file

Write bowtie2 metrics to the file at the given path. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.

--met-stderr

Write bowtie2 metrics to the "standard error" ("stderr") filehandle. This is not mutually exclusive with --met-file. Having alignment metrics can be useful for debugging certain problems, especially performance issues. See also: --met. Default: metrics disabled.

--met

Write a new bowtie2 metrics record once every given number of seconds. Only matters if either --met-stderr or --met-file are specified. Default: 1.

--no-unal

Suppress SAM records for reads that failed to align.

--no-hd

Suppress SAM header lines (starting with @).

--no-sq

Suppress @SQ SAM header lines.

--rg-id

Set the read group ID to the given text. This causes the SAM @RG header line to be printed, with the given text as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to the given text.

--rg

Add the given text (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM specification. Specify --rg multiple times to set multiple fields. See the SAM specification for details about what fields are legal.

--omit-sec-seq

When printing secondary alignments, Bowtie 2 by default will write out the SEQ and QUAL strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead.

-o/--offrate

Override the offrate of the index with the given value. If the value is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. The value must be greater than the value used to build the index.

-p/--threads NTHREADS

Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).

--reorder

Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when -p is set greater than 1. Specifying --reorder and setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally correspond to input order in that case.

--mm

Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible or not preferable.

--qc-filter

Filter out reads for which the QSEQ filter field is non-zero. Only has an effect when read format is --qseq. Default: off.

--seed

Use the given value as the seed for the pseudo-random number generator. Default: 0.

--non-deterministic

Normally, Bowtie 2 re-initializes its pseudo-random generator for each read. It seeds the generator with a number derived from (a) the read name, (b) the nucleotide sequence, (c) the quality sequence, (d) the value of the --seed option. This means that if two reads are identical (same name, same nucleotides, same qualities) Bowtie 2 will find and report the same alignment(s) for both, even if there was ambiguity. When --non-deterministic is specified, Bowtie 2 re-initializes its pseudo-random generator for each read using the current time. This means that Bowtie 2 will not necessarily report the same alignment for two identical reads. This is counter-intuitive for some users, but might be more appropriate in situations where the input consists of many identical reads.
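The idea behind the default, deterministic behavior can be illustrated with a toy model: derive the random seed from the read itself, so that identical reads always break ties the same way. The hashing scheme below (SHA-256 over the four inputs) is purely an assumption made for this sketch and is not how Bowtie 2 derives its seed; the function name and the candidate positions are also invented.

```python
import hashlib
import random


def pick_alignment(name, seq, qual, candidates, seed=0):
    """Choose among equally good alignments, reproducibly for a given read."""
    key = f"{name}\t{seq}\t{qual}\t{seed}".encode()
    digest = hashlib.sha256(key).digest()
    # A fresh generator per read, seeded only from the read and --seed value.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.choice(candidates)


ties = ["chr1:100", "chr1:5100", "chr2:830"]
first = pick_alignment("read1", "ACGTACGT", "IIIIIIII", ties)
second = pick_alignment("read1", "ACGTACGT", "IIIIIIII", ties)
print(first == second)
```

Seeding from the current time instead, as --non-deterministic does, would let two identical reads end up at different tied positions.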

--version

Print version information and quit.

-h/--help

Print usage information and quit.

Published by: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/215484.html Original link: https://javaforall.cn
