Bioinformaticshomework.docx
《Bioinformaticshomework.docx》由会员分享,可在线阅读,更多相关《Bioinformaticshomework.docx(50页珍藏版)》请在冰点文库上搜索。
![Bioinformaticshomework.docx](https://file1.bingdoc.com/fileroot1/2023-5/6/dbdd4535-d976-4d7f-9909-914f1e8e16a4/dbdd4535-d976-4d7f-9909-914f1e8e16a41.gif)
Bioinformaticshomework
“Bioinformatics”homeworkforundergraduate(2016)
#1
Howmanynucleotidesequencesfrommaize(Zeamays)havebeenstoredinthepublicDNAdatabase(suchasGenBank)?
HowmanyWaxy(granule-boundstarchsynthase)genesequencesfrommaizeinthedatabase?
答:
2016年11月6日星期日,访问NCBI(网址为https:
//www.ncbi.nlm.nih.gov/),在Nucleotide数据库中搜索zeamays,Species选择Plants,Moleculetypes选择genomicDNA/RNA,最终结果显示被存储在NCBI中的Zeamays的nucleotidesequences数量为446964。
在搜索框中输入(zeamays[Organism])ANDwaxy[Genename],然后搜索得到玉米中Waxy基因序列在数据库中的数目为175。
具体操作及结果如下图所示:
#2
Asequencewasgeneratedbyasuppressionsubtractivehybridization(SSH)experiment.Pleasefindthebesthit(s)oftheunknownsequenceinthepublicdatabaseandpredictitspotentialfunction.
>anunknownsequence
CCTCGGAGATCTTCATGGGGGGCAAGAGCACCATCGTGCTgCACAACACCTGCGAGGACTCGCTCCTCGCTGCACCCATCATTCTTGATCTGGTGCTCCTGGCGGAGCTCAGCACCAGGATTCAGCTGAAGGCCGAGGGAGAGGTAAGAGTCTGACGAGATATGTTGCTAGTCTACTCTGTAGTCGAGATATACTTTGGGAGCCAAACTGAAGATTTCGCTGCTCCACTTGCATTTGTGCAGGACAAGTTCCATTCCTTCCATCCGGTTGCCACCATCCTGAGCTACCTCACCAAGGCACCCCTGGTAAGAAACAATTCTCGACTGTTTGCTCTAAATAACCTATAGATAAATAAAGACGATTAACTGACGTGCCACTGAATTCCTCTGTTAACAGGTTCCTCCTGGCACGCCGGTGGTGAACGCCCTGGCGAAGCAAAGGGCGATGCTGGAGAACATCATGAGGGCGTGTGTCGGCCTGGCGCCCGAAAACAACATGATCCTGGAGTACAAGTGAGGAGCGTGGCCCAAGCTCGCGGAGCCGAGAGCGACCGTACGTACGTAGCAAGTGGCGAGGGGCGACGGGAGGGCAGGACGAAGAAGAAGGCGAGATCGGCTGTGGAATTATTTGGCGGCTTGTCTTTAGTTTCCTTTGCGAATCTTTCCCTGGTTAAGTTTACCCCAGTGAGTGTGTGTCCTTGCGAGAAAAG
答:
进入NCBI做blast,具体网址为http:
//blast.ncbi.nlm.nih.gov/Blast.cgi,选择Blastx,将上述序列复制到查询框中,参数选择默认参数,直接Blast,得到最佳联配结果为Inositol-3-phosphatesynthase[Dichantheliumoligosanthes]。
进入EMBL做blast,具体网址为http:
//www.ebi.ac.uk/Tools/sss/ncbiblast/,选择Blastx,将上述序列复制到查询框中,参数选择默认参数,直接Blast,得到最佳联配结果为Inositol-3-phosphatesynthase。
根据两处的联配结果可以推测这个未知序列可能的功能与Inositol-3-phosphatesynthase相同。
#3
Usedynamicprogrammingmethod,theNeedleman-Wunschalgorithm,toperformglobalalignmentofthesequences:
P1=HEAGAWGHEP
P2=EPAWHEAEAG
Scoringsystem:
BLOSUM50scoringmatrixwithgappenalty8.
BLOSUM50(partial)
A
E
G
H
P
W
A
5
-1
0
-2
-1
-3
E
6
-3
0
-1
-3
G
8
-2
-2
-3
H
10
-2
-3
P
10
-4
W
15
答:
具体每一步动态规划的计算过程如下图所示,以黄颜色突出的部分表示达到最优联配所需经过的每一步。
P1
H
E
A
G
A
W
G
H
E
P
P2
0
-8
-16
-24
-32
-40
-48
-56
-64
-72
-80
E
-8
0
-2
-10
-18
-26
-34
-42
-50
-58
-66
P
-16
-8
-1
-3
-11
-19
-27
-36
-44
-51
-48
A
-24
-16
-9
4
-3
-6
-14
-22
-30
-38
-46
W
-32
-24
-17
-4
1
-6
9
1
-7
-15
-23
H
-40
-22
-24
-12
-6
-1
1
7
11
3
-5
E
-48
-30
-16
-20
-14
-7
-4
-1
7
17
9
A
-56
-38
-24
-17
-20
-9
-10
-4
-1
9
16
E
-64
-46
-32
-25
-20
-17
-12
-12
-4
5
8
A
-72
-54
-40
-27
-25
-15
-20
-12
-12
-3
4
G
-80
-72
-48
-35
-19
-23
-18
-12
-14
-11
-4
最终可以得到最佳的联配方式如下所示,其中下划线表示空位
P1:
HEAGAWGH_EP_
P2:
_EP_AWHEAEAG
Score:
-8+6-1-8+5+15-2+0-8+6-1-8=-4
#4
Pleasefindgenesinagenomicsegmentofbamboo(Download).
答:
打开如下网址,Organism选择Monocotplants(因为里面没有竹子对应的选项),然后运行在线程序,最后得到结果如下,它给出了可能的基因及它们编码的蛋白质的碱基序列。
FGENESH2.6PredictionofpotentialgenesinMonocotgenomicDNA
Time:
SunNov604:
05:
562016
Seqname:
testsequence
Lengthofsequence:
49600
Numberofpredictedgenes10:
in+chain2,in-chain8.
Numberofpredictedexons22:
in+chain10,in-chain12.
Positionsofpredictedgenesandexons:
Variant1from1,Score:
299.051538
GStrFeatureStartEndScoreORFLen
1-PolA67770.44
1-1CDSo6884-70574.336884-7057174
1-TSS8311-1.78
2+TSS17591-4.18
2+1CDSf17742-1781116.4117742-1781069
2+2CDSl19834-2079287.8619836-20792957
2+PolA216490.44
3+TSS21801-7.58
3+1CDSf22009-2208519.3322009-2208375
3+2CDSi22583-226523.6522584-2265269
3+3CDSi23070-231456.5623070-2314475
3+4CDSi23236-2335318.3723238-23351114
3+5CDSi24144-242338.3724145-2423187
3+6CDSi24306-243816.4724307-2438175
3+7CDSi24523-246505.2624523-24648126
3+8CDSl24731-248008.3624732-2480069
3+PolA250060.44
4-PolA26777-1.06
4-1CDSl27135-2801959.8927135-28019885
4-2CDSf28097-2850440.0828097-28504408
4-TSS28623-6.38
5-PolA309640.44
5-1CDSl30993-311778.8830993-31175183
5-2CDSi31212-31431-7.3931213-31431219
5-3CDSf31504-315488.1531504-3154845
5-TSS31608-1.28
6-PolA333640.44
6-1CDSo33766-339544.4233766-33954189
6-TSS34021-3.18
7-PolA34094-1.06
7-1CDSo34700-3497517.2734700-34975276
7-TSS35444-6.08
8-PolA358480.44
8-1CDSo36075-3645820.7536075-36458384
8-TSS37019-5.38
9-PolA40341-1.06
9-1CDSo40879-410679.5140879-41067189
9-TSS41777-5.68
10-PolA43349-1.06
10-1CDSl44280-445457.1744280-44543264
10-2CDSf46131-4668633.9646132-46686555
10-TSS46908-8.18
Predictedprotein(s):
>FGENESH:
[mRNA]11exon(s)6884-7057174bp,chain-
ATGGGGGTGAATATGAAGGGTAAGCAGCACATGCCGCGGCCATGTGCGTCGGTGGTTCAC
TGGTTCAGTTTCCACGTCCACGAGTGGCCTCGCACTGTCGATAGCGATCGAATGAACGTT
CTTTGCTGCTGCACGGCGGGAGCTTCTCCGGAACAGTCAGGGCTGATTGGTTAG
>FGENESH:
11exon(s)6884-705757aa,chain-
MGVNMKGKQHMPRPCASVVHWFSFHVHEWPRTVDSDRMNVLCCCTAGASPEQSGLIG
>FGENESH:
[mRNA]22exon(s)17742-207921029bp,chain+
ATGCGCCGGGTAGCGCTGTTGCTGCTGCTCGTCTGCGCGGCGGCGCGCGCCGCCGCGGTC
GTCACCGACGGGCTTCTTCCGAACGGCAACTTCGAGGATGGCCCGCCCAAGTCGGCGCTG
GTGAACGGCACTGTGGTGTCGGGCGCCAACGCCATCCCTAGCTGGGAGACCTCCGGCTTC
GTGGAGTACATCGAGTCGGGGCACAAGCAGGGCGACATGCTCCTGGTGGTGCCCCAGGGC
GCCCACGCCGTGCGCCTGGGCAACGAGGCCTCCATCCGGCAGCGCCTCTCCGTCACCCGG
GGCGCCTACTACTCCATCACCTTCAGCGCGGCGCGCACCTGCGCGCAGGCCGAGCGCCTC
AACGTCTCCGTGTCCCCCGAGTGGGGCGTCCTCCCGATGCAGACCATCTACGGCAGCAAC
GGGTGGGACTCGTACGCCTGGGCCTTCAAGGCCAAGCTGGACACGGTGACGCTCGTCCTC
CACAACCCCGGCGTCGAGGAGGACCCGGCCTGCGGCCCGCTCATCGACGGCGTCGCCATC
CGGGCCCTGTACCCGCCCACGCTGGCCCGCGGCGGCAACATGCTCAAGAACGGCGGCTTC
GAGGAGGGGCCCTACTTTTTACCCAACGCGTCGTGGGGCGTGCTCGTGCCGCCCAACATC
GAGGACGACCACTCCCCGCTCCCGGCCTGGATGATCGTGTCCTCCAAGGCCGTCAAGTAC
GTGGACGCCGCGCACTTTAAGGTCCCCAGGGCGCGGCGCGCCGTGGAGCCTGGTGGCCCC
GGGGAGGGAAGCGGCTGGTGCAGGAGGTGGCGCCACCGTGCGGTGGAGCTACCACCCTGG
CCTTCGCCGTGGGGGACGCCGCCGACGGGTGCGAGGGGTCGCATGGTGGGGCCGAGGCGT
ACACCGGCGCGGCCCACCCGTGAAGGTGGGCCGTACGAGTCCCAAGGGGACGGGAACTTC
CTTTTTTCTTCTTCACGGCCATCGCCAGCCGCACCCGGGTCGTGTTCCAGAGCACCTTCT
ACCACATGA
>FGENESH:
22exon(s)17742-20792342aa,chain+
MRRVALLLLLVCAAARAAAVVTDGLLPNGNFEDGPPKSALVNGTVVSGANAIPSWETSGF
VEYIESGHKQGDMLLVVPQGAHAVRLGNEASIRQRLSVTRGAYYSITFSAARTCAQAERL
NVSVSPEWGVLPMQTIYGSNGWDSYAWAFKAKLDTVTLVLHNPGVEEDPACGPLIDGVAI
RALYPPTLARGGNMLKNGGFEEGPYFLPNASWGVLVPPNIEDDHSPLPAWMIVSSKAVKY
VDAAHFKVPRARRAVEPGGPGEGSGWCRRWRHRAVELPPWPSPWGTPPTGARGRMVGPRR
TPARPTREGGPYESQGDGNFLFSSSRPSPAAPGSCSRAPSTT
>FGENESH:
[mRNA]38exon(s)22009-24800705bp,chain+
ATGCGGCTGCTCCTGCTCCTCCTCGCCGGCGCCGCCGCCCGCGCCTCCGACGACCCCTTC
CTCTCCGGCGGACGGCGGCGCTCCCCAATCAGCAGACGGTGGACTACCCCAGCTTCAAGC
TCGTCATCGTCGGCGATGGTGGCACAGTCGTCTCTGCATCTTGTAGGCAAAACCACCTTT
GTGAAGAGGCATCTGACTGGTGAGTTTGAGAAGAAGTATGAACCCACCATTGGTGTTGAG
GTTCATCCCCTGGACTTCTACACCAACCGCGGGAAGATCCGGTTCTACTGCTGGGACACT
GCAGGGCAGGAGAAGTTTGGTGGGCTCAGGGATGGATACTACGTCCATGGACAGTGTGGG
ATCATTATGTTTGATGTAACCTCACGGCTGAGTTACAAGAATGTTCCAACTTGGCACCGT
GATTTATCCAGGGTCTGTGACAACATCCCAATTGTGCTTTGTGGGAACAAGGTCGACGTG
AAGAACAGGCAGGTCAAGGCAAAGCAGCAACCTATTTATTGGACGTGGGTAAACCAACCC
CTTTTTTGTTGTGACAGTGATGCCAATCTCCACTTTGTTGAAAGCCCTGCTCTCGTTCCT
CCAGATGTCACAATTGACATGGTCGCCCAGCAGCAGCATGAAGCTGAGCTGTTAATCGCT
GTAGCCCAACCACTGCCTGATGATGACGATGACCTCATCGAGTAG
>FGENESH:
38exon(s)22009-24800234aa,chain+
MRLLLLLLAGAAARASDDPFLSGGRRRSPISRRWTTPASSSSSSAMVAQSSLHLVGKTTF
VKRHLTGEFEKKYEPTIGVEVHPLDFYTNRGKIRFYCWDTAGQEKFGGLRDGYYVHGQCG
IIMFDVTSRLSYKNVPTWHRDLSRVCDNIPIVLCGNKVDVKNRQVKAKQQPIYWTWVNQP
LFCCDSDANLHFVESPALVPPDVTIDMVAQQQHEAELLIAVAQPLPDDDDDLIE
>FGENESH:
[mRNA]42exon(s)27135-285041293bp,chain-
ATGCGGATCAGGAAAGGGAGTCATGTGGAGGTGTGGACGCAGGACGCGGCGTCGCCGGTG
GGCGCGTGGCGCGTCGGGGAGGTCACCTGGGGCAACGGCCACTCGTACACCATGCGGTGG
CACGACGGCGGCGGCGAGGTCTCCGGCCGCATCTCGAGGAAGTCGGTCCGCCCCCGCCCG
CCGCCCGCCCCCGTGCCGCGGGACCTCGACGCCGGGGACATGGTCGAGGTGTTCGACCAC
GACGACTGCCTCTGGAAGTGCGCCGAGGTCAAGGGCGCCGCCGCCGACGACGACCGCCGC
TTCGTCGTCAAGGTCGTCGGCGCCACCAATGTCCTGACGGTCCCGCCGCAGAGGCTCCGC
ATCCGGCAGGTTCTCAGGGACGACGACGTCTGGGTCGCGCTCCACAAGAGCTCGTTTCCT
GACACCTCGCCGTGGTTCTTTGCTTCTCAGGACAACCAGATCGCCGTCCCTAGCGCGACG
CCGCCGTTCCACGCCTACGGCGGAGGCGCTGGCATGGGCATCGGCAGAACCAAAGGCGGC
CATAAGCCCATGGCGCCAGGCTTCACGCCGCTGCTGCAGAAGAGGAGCCCGCTGCTGCAG
AAGAGAAGCTTCGGTATGCTGGGTTCGAGCACAATAACCCCCAATGGCAAGAGATTCGAC
GACACCGCCAAGAGGATTTGTGCCAAGGAAGAGCCCAGATATGAAGTAGAAGTGGTCGTC
CCAAACGTGCGCCTGAACAAGCAAGACGAGATGAGCGGCGAAGATGTTGACGTGCTTGGG
ACACGCAGTGATTCCGATGATGATCATCATCAGCAGCAGCAGCAGCACGAGGACGAGGAT
GACGATGACGATAGTGATGATTCTGCATCATCATCCTCGGATGATGACAGCAGCAGTGAC
AGCAGTAACAGCGACAGCAGAACCAGGAGCACCGGAGCCGGCAAGAATTGCACGGCAGCT
CTCGCAAGCAGGCCTTGTAACGATCAGAAGGCCGATCAGCTGCAACCCAGCGAGAAAGAA
CATCGTGACGACATATCTGAATCGCATCACGAGACCCTGAACGATGAGAAGGCGGCGGTG
GTGCAGGAACACATCCACCGTCTGGAGCTGGAGGCCTACACTAATCTGATGAAGGCGTTC
CATGCATGTGGCAAAGCGCTGAGCTGGGAGAAGGCCGAACTGCTCACTGACCTCCGCGTG
CATCTCCATATCTCTAACGATGAGCACCTGCGGGTGCTTAACATGATCTTGAACCGCAAG
GGCAGATTTGGAGGATCACATGCAAATTCTTAA
>FGENESH:
42exon(s)27135-28504430aa,chain-
MRIRKGSHVEVWTQDAASPVGAWRVGEVTWGNGHSYTMRWHDGGGEVSGRISRKSVRPRP
PPAPVPRDLDAGDMVEVFDHDDCLWKCAEVKGAAADDDRRFVVKVVGATNVLTVPPQRLR
IRQVLRDDDVWVALHKSSFPDTSPWFFASQDNQIAVPSATPPFHAYGGGAGMGIGRTKGG
HKPMAPGFTPLLQKRSPLLQKRSFGMLGSSTITPNGKRFDDTAKRICAKEEPRYEVEVVV
PNVRLNKQDEMSGEDVDVLGTRSDSDDDHHQQQQQHEDEDDDDDSDDSASSSSDD