QIIME User Guide

QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA, that is, small-subunit ribosomal RNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data; metagenomics is the analysis of the genomes of the microbial populations in an environmental sample). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking (clustering), taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.

This tutorial explains how to use the QIIME (Quantitative Insights Into Microbial Ecology) pipeline to process data from high-throughput 16S rRNA sequencing studies. If you have not already installed QIIME, please see the section Installing Qiime first. The purpose of this pipeline is to provide a start-to-finish workflow, beginning with reads and finishing with taxonomic and phylogenetic profiles and comparisons of the samples in the study. With this information in hand, it is possible to determine biological and environmental factors that alter microbial community ecology in your experiment.

As an example, we will use data from a study of the response of mouse gut microbial communities to fasting (Crawford et al., 2009). To make this tutorial run quickly on a personal computer, we will use a subset of the data generated from 5 animals kept on the control ad libitum fed diet, and 4 animals fasted for 24 hours before sacrifice. At the end of our tutorial, we will be able to compare the community structure of control vs. fasted animals. In particular, we will be able to compare taxonomic profiles for each sample type and differences in diversity metrics within the samples and between the groups, and to perform comparative clustering analysis to look for overall differences in the samples.

In this walkthrough, text like the following: […] denotes the command-line invocation of scripts. You can find full usage information for each script by passing the -h option (help) and/or by reading the full description in the Documentation. Execute all tutorial commands from within the qiime_tutorial directory, which can be downloaded from here: QIIME Tutorial files.

To process our data, we will perform the following analyses, each of which is described in more detail below:

1. Filter the DNA sequence reads for quality and assign multiplexed reads to starting samples by nucleotide barcode.
2. Pick Operational Taxonomic Units (OTUs) based on sequence similarity within the reads, and pick a representative sequence from each OTU.
3. Assign the OTU to a taxonomic identity using reference databases.
4. Align the OTU sequences and create a phylogenetic tree.
5. Calculate diversity metrics for each sample and compare the types of communities, using the taxonomic and phylogenetic assignments.
6. Generate UPGMA (average-linkage clustering) and PCoA (principal coordinates analysis) plots to visually depict the differences between the samples, and dynamically work with these graphs to generate publication-quality figures.

Sequences (.fna)

This is the 454-machine generated FASTA file. Using the Amplicon processing software on the 454 FLX standard, each region of the PTP plate will yield a fasta file of the form […], where "1" is replaced with the appropriate region number. For the purposes of this tutorial, we will use the fasta file […].

Quality Scores (.qual)

This is the 454-machine generated quality score file, which contains a score for each base in each sequence included in the FASTA file. Like the fasta file mentioned above, the Amplicon processing software will generate one of these files for each region of the PTP plate, named […], etc. For the purposes of this tutorial, we will use the quality scores file […].

Mapping File (Tab-delimited […])

The mapping file is generated by the user.

This file contains all of the information about the samples necessary to perform the data analysis. At a minimum, the mapping file should contain the name of each sample, the barcode sequence used for each sample, the linker/primer sequence used to amplify the sample, and a Description column. In general, you should also include in the mapping file any metadata that relates to the samples (for instance, health status or sampling site) and any additional information relating to specific samples that may be useful to have at hand when considering outliers (for example, what medications a patient was taking at the time of sampling). Of note: the sample names may only contain alphanumeric characters and the dot (.). Full format specifications can be found in the Documentation (File Formats).

For the purposes of this tutorial, we will use the mapping file […]. The contents of the mapping file are shown here. As you can see, a nucleotide barcode sequence is provided for each of the 9 samples, as well as metadata related to treatment group and date of birth, and general run descriptions about the project.

[…] file contents:

#SampleID  BarcodeSequence  LinkerPrimerSequence  Treatment  DOB  Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
#exercise and diet on mouse cardiac physiology (Craw


[…]TAGGAGT  Fast  […]
[…]  ACCGCAGAGTCA  YATGCTGCCTCCCGTAGGAGT  Fast  […]
[…]  ACGGTGAGTGTC  YATGCTGCCTCCCGTAGGAGT  Fast  […]

Mapping File

Before beginning with QIIME, you should ensure that your mapping file is formatted correctly with the script […]. Type: […]

This utility will display a message indicating whether or not problems were found in the mapping file. An HTML file showing the location of errors and warnings will be generated in the output directory, and the same information will also be written to a log file. Errors will cause fatal problems with subsequent scripts and must be corrected before moving forward. Warnings will not cause fatal problems, but you are encouraged to fix them, as they are often indicative of typos in your mapping file, invalid characters, or other unintended errors that will impact downstream analysis. A file will also be created in the output directory containing a copy of the mapping file with invalid characters replaced by underscores.

Reverse primers can be specified in the mapping file, for removal during the demultiplexing step. This is not required, but it is STRONGLY recommended, as leaving in sequences following primers, such as sequencing adapters, can interfere with OTU picking and taxonomic assignments with RDP (the Ribosomal Database Project classifier). An example mapping file with faux reverse primers specified, using the ReversePrimer field, is available here: […]

#SampleID  BarcodeSequence  LinkerPrimerSequence  Treatment  ReversePrimer  Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of
#exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS,
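The mapping-file rules described above (a minimum set of columns, sample names limited to alphanumeric characters and the dot, and a corrected copy with invalid characters replaced by underscores) can be illustrated with a small stand-alone sketch. This is not the QIIME checking script and does not reproduce its output. The function names, the sample IDs in `EXAMPLE`, and the choice to report a missing column as an error and an invalid character as a warning are assumptions made for illustration only.

```python
import csv
import io
import re

# Minimum columns named in the tutorial text (column names as in the header line).
REQUIRED_COLUMNS = ["SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]

# Sample names may contain only alphanumeric characters and the dot.
INVALID_SAMPLE_CHARS = re.compile(r"[^A-Za-z0-9.]")


def read_mapping(text):
    """Parse tab-delimited mapping text into (header, list of row dicts)."""
    header, rows = None, []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        if header is None:
            # The first non-blank line is the header; drop its leading '#'.
            header = [fields[0].lstrip("#")] + fields[1:]
        elif fields[0].startswith("#"):
            continue  # later '#' lines are free-text comments
        else:
            rows.append(dict(zip(header, fields)))
    return header, rows


def check_mapping(text):
    """Return (errors, warnings, corrected_rows) for a mapping file.

    Assumption for this sketch: a missing required column is an error,
    an invalid character in a sample name is a warning.
    """
    header, rows = read_mapping(text)
    if header is None:
        return ["no header line found"], [], []
    errors = [f"missing required column: {col}"
              for col in REQUIRED_COLUMNS if col not in header]
    warnings, corrected = [], []
    for row in rows:
        sample = row.get("SampleID", "")
        if INVALID_SAMPLE_CHARS.search(sample):
            warnings.append(f"invalid characters in SampleID {sample!r}")
        # Corrected copy: invalid characters replaced by underscores.
        fixed = INVALID_SAMPLE_CHARS.sub("_", sample)
        corrected.append({**row, "SampleID": fixed})
    return errors, warnings, corrected


# Hypothetical two-sample file; the sample IDs and descriptions are made up.
EXAMPLE = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription\n"
    "#Free-text comment line\n"
    "S.1\tACCGCAGAGTCA\tYATGCTGCCTCCCGTAGGAGT\tFast\tfirst_sample\n"
    "S-2\tACGGTGAGTGTC\tYATGCTGCCTCCCGTAGGAGT\tFast\tsecond_sample\n"
)

if __name__ == "__main__":
    errs, warns, fixed_rows = check_mapping(EXAMPLE)
    print("errors:", errs)
    print("warnings:", warns)
    print("corrected IDs:", [r["SampleID"] for r in fixed_rows])
```

Calling `check_mapping` on the text of a mapping file returns the three lists, so a caller can stop on errors and still inspect the warnings and the corrected sample names.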



[…] these reverse primers are not the true ones used, but rather just a somewhat conserved site in the sequences used for this example.

An example image of the entire primer construct and amplicon is shown below, using QIIME nomenclature:

[Figure: primer construct and amplicon. Labels in the figure: Target Sequence, ReversePrimer, Adapter B, Adapter A, BarcodeSequence, LinkerPrimerSequence, Desired Sequence.]

454 sequencing, in most cases, generates sequences that begin at the BarcodeSequence, which is followed by the LinkerPrimerSequence, both of which are automatically removed during the demultiplexing step described below. However, the ReversePrimer (the primer at the end of the read) is not removed by default, and needs to be specified. The adapter sequence (Adapter B) does not match genomic data, such as 16S sequences, and as such it can disrupt analyses.

Assign Samples to Multiplex Reads

The next task is to assign the multiplexed reads to samples based on their nucleotide barcode. Also, this step performs quality filt

