QIIME User Guide
QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open-source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA, i.e. small-subunit ribosomal RNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data; metagenomics is the analysis of the genomes of the microbial community in an environmental sample).
QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.
This tutorial explains how to use the QIIME (Quantitative Insights Into Microbial Ecology) Pipeline to process data from high-throughput 16S rRNA sequencing studies. If you have not already installed QIIME, please see the section Installing Qiime first. The purpose of this pipeline is to provide a start-to-finish workflow, beginning with reads and finishing with taxonomic and phylogenetic profiles and comparisons of the samples in the study. With this information in hand, it is possible to determine biological and environmental factors that alter microbial community ecology in your experiment.
As an example, we will use data from a study of the response of mouse gut microbial communities to fasting (Crawford et al., 2009). To make this tutorial run quickly on a personal computer, we will use a subset of the data generated from 5 animals kept on the control ad libitum fed diet, and 4 animals fasted for 24 hours before sacrifice. At the end of our tutorial, we will be able to compare the community structure of control vs. fasted animals. In particular, we will be able to compare taxonomic profiles for each sample type, compare differences in diversity metrics within the samples and between the groups, and perform comparative clustering analysis to look for overall differences in the samples.
In this walkthrough, text set apart on its own line denotes the command-line invocation of scripts. You can find full usage information for each script by passing the -h option (help) and/or by reading the full description in the Documentation. Execute all tutorial commands from within the qiime_tutorial directory, which can be downloaded from here: QIIME Tutorial files.
To process our data, we will perform the following analyses, each of which is described in more detail below:
Filter the DNA sequence reads for quality and assign multiplexed reads to starting samples by nucleotide barcode.
Pick Operational Taxonomic Units (OTUs) based on sequence similarity within the reads, and pick a representative sequence from each OTU.
Assign the OTU to a taxonomic identity using reference databases.
Align the OTU sequences and create a phylogenetic tree.
Calculate diversity metrics for each sample and compare the types of communities, using the taxonomic and phylogenetic assignments.
Generate UPGMA and PCoA plots to visually depict the differences between the samples, and dynamically work with these graphs to generate publication-quality figures.
Sequences (.fna)
This is the 454-machine generated FASTA file. Using the Amplicon processing software on the 454 FLX standard, each region of the PTP plate will yield one FASTA file whose name carries the region number (a "1" in the name is replaced with the appropriate region number). For the purposes of this tutorial, we will use the FASTA file included in the tutorial files.
Quality Scores (.qual)
This is the 454-machine generated quality score file, which contains a score for each base in each sequence included in the FASTA file. Like the FASTA file mentioned above, the Amplicon processing software will generate one of these files for each region of the PTP plate, named by region in the same way. For the purposes of this tutorial, we will use the quality scores file included in the tutorial files.
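The relationship between the two files (one quality score per base, keyed by the same read identifier) can be sketched in Python. This is only an illustration, not QIIME code: the function names and the two tiny records are invented, and it assumes the usual layout in which each record starts with a ">" header line, followed by bases in the FASTA file and by whitespace-separated integer scores in the quality file.

```python
def parse_records(lines):
    """Yield (identifier, body_lines) for each '>'-headed record."""
    header, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, body
            # Keep only the first whitespace-delimited token as the identifier.
            header, body = line[1:].split()[0], []
        else:
            body.append(line)
    if header is not None:
        yield header, body


def read_fasta(lines):
    """Map read identifier -> nucleotide sequence."""
    return {rid: "".join(body) for rid, body in parse_records(lines)}


def read_qual(lines):
    """Map read identifier -> list of integer quality scores."""
    return {rid: [int(s) for s in " ".join(body).split()]
            for rid, body in parse_records(lines)}


def pair_reads(fasta_lines, qual_lines):
    """Pair each sequence with its scores; require one score per base."""
    seqs, quals = read_fasta(fasta_lines), read_qual(qual_lines)
    for rid, seq in seqs.items():
        if rid not in quals or len(quals[rid]) != len(seq):
            raise ValueError(f"sequence/quality mismatch for read {rid!r}")
    return {rid: (seq, quals[rid]) for rid, seq in seqs.items()}


# Made-up records for illustration; these are not from the tutorial data.
fasta = [">read_1 length=8", "ACGTACGT", ">read_2 length=4", "GGCC"]
qual = [">read_1 length=8", "37 37 36 35", "30 28 25 20",
        ">read_2 length=4", "40 40 38 12"]
reads = pair_reads(fasta, qual)
print(reads["read_2"])
```

A check like this is a cheap way to confirm that a .fna file and a .qual file really belong together before any quality filtering is attempted.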
Mapping File (Tab-delimited)
The mapping file is generated by the user. This file contains all of the information about the samples necessary to perform the data analysis. At a minimum, the mapping file should contain the name of each sample, the barcode sequence used for each sample, the linker/primer sequence used to amplify the sample, and a Description column. In general, you should also include in the mapping file any metadata that relates to the samples (for instance, health status or sampling site) and any additional information relating to specific samples that may be useful to have at hand when considering outliers (for example, what medications a patient was taking at the time of sampling). Of note: the sample names may only contain alphanumeric characters (A-z) and the dot (.). Full format specifications can be found in the Documentation (File Formats).
For the purposes of this tutorial, we will use the mapping file included in the tutorial files. The contents of the mapping file are shown here. As you can see, a nucleotide barcode sequence is provided for each of the 9 samples, as well as metadata related to treatment group and date of birth, and general run descriptions about the project. File contents:
Note
#SampleID	BarcodeSequence	LinkerPrimerSequence	Treatment	DOB	Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
(Only the BarcodeSequence, LinkerPrimerSequence, and Treatment values of the nine rows survive in this copy.)
AGCACGAGCCTA	YATGCTGCCTCCCGTAGGAGT	Control
AACTCGTCGATG	YATGCTGCCTCCCGTAGGAGT	Control
ACAGACCACTCA	YATGCTGCCTCCCGTAGGAGT	Control
AGCAGCACTTGT	YATGCTGCCTCCCGTAGGAGT	Control
ACAGAGTCGGCT	YATGCTGCCTCCCGTAGGAGT	Fast
ACCAGCGACTAG	YATGCTGCCTCCCGTAGGAGT	Control
AACTGTGCGTAC	YATGCTGCCTCCCGTAGGAGT	Fast
ACCGCAGAGTCA	YATGCTGCCTCCCGTAGGAGT	Fast
ACGGTGAGTGTC	YATGCTGCCTCCCGTAGGAGT	Fast
Check Mapping File
Before beginning with QIIME, you should ensure that your mapping file is formatted correctly by running QIIME's mapping-file checking script on it.
This utility will display a message indicating whether or not problems were found in the mapping file. An HTML file showing the location of errors and warnings will be generated in the output directory, and the same messages will also be written to a log file. Errors will cause fatal problems with subsequent scripts and must be corrected before moving forward. Warnings will not cause fatal problems, but you are encouraged to fix them, as they are often indicative of typos in your mapping file, invalid characters, or other unintended errors that will impact downstream analysis. A corrected copy of the mapping file, with invalid characters replaced by underscores, will also be created in the output directory.
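To make the distinction between errors and warnings concrete, here is a minimal Python sketch of the kind of checks described above. It is not the QIIME script and does not reproduce its rules: the function name check_mapping, the set of characters treated as valid in metadata fields, and the sample IDs in the example are all assumptions made for illustration. It reports structural problems as errors, reports invalid characters as warnings, and returns rows in which those characters have been replaced by underscores.

```python
import re

REQUIRED = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]


def check_mapping(text):
    """Return (errors, warnings, corrected_rows) for tab-delimited mapping text."""
    errors, warnings, corrected = [], [], []
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    header = rows[0]
    # Lines after the header that start with '#' are comments, not samples.
    data = [r for r in rows[1:] if not r[0].startswith("#")]

    if header[:3] != REQUIRED:
        errors.append("header must start with " + ", ".join(REQUIRED))
    if header[-1] != "Description":
        errors.append("last column must be Description")

    seen_barcodes = set()
    for n, row in enumerate(data, start=1):
        if len(row) != len(header):
            errors.append(f"row {n}: expected {len(header)} fields, found {len(row)}")
            corrected.append(row)
            continue
        sample_id, barcode = row[0], row[1]
        # Sample names: alphanumeric characters and the dot only.
        if not re.fullmatch(r"[A-Za-z0-9.]+", sample_id):
            errors.append(f"row {n}: invalid SampleID {sample_id!r}")
        if barcode in seen_barcodes:
            errors.append(f"row {n}: duplicate barcode {barcode}")
        seen_barcodes.add(barcode)
        fixed = [sample_id]
        for field in row[1:]:
            # Assumed whitelist for metadata fields; anything else becomes '_'.
            clean = re.sub(r"[^A-Za-z0-9._\-+%]", "_", field)
            if clean != field:
                warnings.append(f"row {n}: invalid characters in {field!r}")
            fixed.append(clean)
        corrected.append(fixed)
    return errors, warnings, corrected


# Hypothetical two-sample file: the second row reuses a barcode and has a bad ID.
example = "\n".join([
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tTreatment\tDescription",
    "#A comment line",
    "S.1\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tControl\tcontrol mouse",
    "S_2\tAGCACGAGCCTA\tYATGCTGCCTCCCGTAGGAGT\tFast\tfasted_mouse",
])
errors, warnings, corrected = check_mapping(example)
for message in errors + warnings:
    print(message)
```

In this example the duplicate barcode and the underscore in the second sample ID are errors, because they would break demultiplexing, while the space in the first Description is only a warning and is repaired in the corrected rows.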
Reverse primers can be specified in the mapping file, for removal during the demultiplexing step. This is not required, but it is STRONGLY recommended, as leaving in sequences following primers, such as sequencing adapters, can interfere with OTU picking and taxonomic assignments with RDP (the Ribosomal Database Project classifier).
An example mapping file with faux reverse primers specified, using the ReversePrimer field, is shown here:
Note
#SampleID	BarcodeSequence	LinkerPrimerSequence	Treatment	ReversePrimer	Description
#Example mapping file for the QIIME analysis package. These 9 samples are from a study of the effects of exercise and diet on mouse cardiac physiology (Crawford, et al, PNAS, 2009).
(Only the BarcodeSequence, LinkerPrimerSequence, Treatment, and ReversePrimer values of the nine rows survive in this copy.)
AGCACGAGCCTA	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
AACTCGTCGATG	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
ACAGACCACTCA	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
ACCAGCGACTAG	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
AGCAGCACTTGT	YATGCTGCCTCCCGTAGGAGT	Control	GCGCACGGGTGAGTA
AACTGTGCGTAC	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
ACAGAGTCGGCT	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
ACCGCAGAGTCA	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
ACGGTGAGTGTC	YATGCTGCCTCCCGTAGGAGT	Fast	GCGCACGGGTGAGTA
The reverse primers, like the forward primers, are written in the 5'->3' direction. In this case, these are not the true reverse primers used, but rather just a somewhat conserved site in the sequences used for this example.
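An example image of the entire primer construct and amplicon is shown below, using QIIME nomenclature:
[Figure: primer construct and amplicon. Labels, in order along the construct: Adapter A, BarcodeSequence, LinkerPrimerSequence, Target Sequence (Desired Sequence), ReversePrimer, Adapter B.]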
454 sequencing, in most cases, generates sequences that begin at the BarcodeSequence, which is followed by the LinkerPrimerSequence, both of which are automatically removed during the demultiplexing step described below. However, the ReversePrimer (i.e., the primer at the end of the read) is not removed by default, and needs to be specified. The adapter sequence (Adapter B) does not match genomic data, such as 16S sequences, and as such it can disrupt analyses.
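The read layout just described (barcode, then linker/primer, then target, then reverse primer and adapter) can be illustrated with a short Python sketch. This is not QIIME's implementation: the function names, the fixed 12-base barcode length, the exact-match policy, the sample ID "S.1", and the made-up target and adapter bases are assumptions for illustration. It also assumes that, because the ReversePrimer is written 5'->3', it appears in the read as its reverse complement. Degenerate IUPAC codes such as the Y in the linker/primer are matched against the bases they stand for.

```python
# IUPAC nucleotide codes -> the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Complement every base (IUPAC-aware), then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(primer, fragment):
    """True if fragment is one concrete spelling of the (degenerate) primer."""
    return len(primer) == len(fragment) and all(
        base in IUPAC[code] for code, base in zip(primer, fragment))


def find_primer(primer, seq):
    """Index of the first exact (IUPAC-aware) primer match in seq, or -1."""
    for i in range(len(seq) - len(primer) + 1):
        if matches(primer, seq[i:i + len(primer)]):
            return i
    return -1


def trim_read(read, barcodes, linker_primer, reverse_primer=None, barcode_len=12):
    """Return (sample, target sequence), or None if the read cannot be assigned."""
    sample = barcodes.get(read[:barcode_len])
    if sample is None:
        return None
    rest = read[barcode_len:]
    if not matches(linker_primer, rest[:len(linker_primer)]):
        return None
    target = rest[len(linker_primer):]
    if reverse_primer is not None:
        # Cut at the reverse primer so that it and the adapter are dropped.
        hit = find_primer(reverse_complement(reverse_primer), target)
        if hit != -1:
            target = target[:hit]
    return sample, target


barcodes = {"AGCACGAGCCTA": "S.1"}          # hypothetical sample ID
linker = "YATGCTGCCTCCCGTAGGAGT"
reverse = "GCGCACGGGTGAGTA"
read = ("AGCACGAGCCTA" + "CATGCTGCCTCCCGTAGGAGT" + "TTGACCGGAATT"
        + reverse_complement(reverse) + "CTGAGACT")  # target/adapter are made up
print(trim_read(read, barcodes, linker, reverse))
print(trim_read(read, barcodes, linker))
```

The second call shows why specifying the reverse primer matters: without it, the reverse primer and the trailing adapter bases stay attached to the target sequence that is passed on to later steps.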
Assign Samples to Multiplex Reads
The next task is to assign the multiplexed reads to samples based on their nucleotide barcode. Also, this step performs quality filt