Complete Edition: Computer Architecture End-of-Chapter Exercises (Word format, .docx)

Uploaded by: b****1; Document ID: 614873; Upload date: 2023-04-29; Format: DOCX; Pages: 17; Size: 324.21 KB

So,

(3) By

If only one enhancement can be implemented:

So, we must select enhancements 1 and 3 to maximize performance.

1.2 Suppose there is a graphics operation that accounts for 10% of execution time in an application, and by adding special hardware we can speed this up by a factor of 18. Further, we could use twice as much hardware and make the graphics operation run 36 times faster. Explain whether it is worth exploring such a further architectural change.

So, it is not worth exploring such a further architectural change.
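
The conclusion follows from Amdahl's law applied to the 10% graphics fraction. The sketch below is a minimal illustration; the helper name `amdahl_speedup` is an assumption introduced here and is not part of the original solution.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

s18 = amdahl_speedup(0.10, 18)   # special hardware, graphics 18 times faster
s36 = amdahl_speedup(0.10, 36)   # twice as much hardware, graphics 36 times faster
gain = s36 / s18                 # extra overall speedup bought by doubling the hardware

print(f"{s18:.4f} {s36:.4f} {gain:.4f}")
```

The overall speedup moves from about 1.104 to about 1.108, a gain of roughly 0.3% for twice the hardware, which is why the further change is not worth exploring.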

1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors increases in a parallel computer, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially, and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?

So, the fixed-load speedup is 1/(0.2 + 0.8/4) = 2.5.
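
The same arithmetic can be sketched for any node count. This is a minimal illustration; `fixed_load_speedup` is a hypothetical helper name, and the upper bound shown is simply the limit of the formula as the node count grows.

```python
def fixed_load_speedup(sequential_fraction: float, nodes: int) -> float:
    """Speedup for a fixed workload split into a sequential and a parallel part."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / nodes)

speedup_4 = fixed_load_speedup(0.20, 4)   # the case in exercise 1.3
upper_bound = 1.0 / 0.20                  # limit as the number of nodes grows

print(round(speedup_4, 3), upper_bound)
```

With 20 percent of W sequential, no number of nodes can push the fixed-load speedup past 5.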

2.1 There is a model machine with nine instructions, whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2), and STP (0.01), respectively. There are several GPRs in the machine. Memory is byte addressable, with accessed addresses aligned, and the memory word width is 16 bits.

Suppose the nine instructions have the following characteristics:

• Two-operand instructions
• Two instruction lengths
• Extended coding
• Shorter instruction operand format: R (register) - R (register)
• Longer instruction operand format: R (register) - M (memory)
• Displacement memory addressing mode

A. Encode the nine instructions with Huffman coding, and give the average code length.

B. Design practical instruction codes, and give the average code length.

C. Write the two instruction word formats in detail.

D. What is the maximum offset for accessing a memory address?

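(A) Huffman coding by Huffman tree

Instruction  Frequency  Code
ADD          30%        01
SUB          24%        11
CLA          20%        10
JOM          6%         0001
STO          7%         0011
JMP          7%         0010
SHR          2%         000001
CIL          3%         00001
STP          1%         000000

So, the average code length is 2 × (0.30 + 0.24 + 0.20) + 4 × (0.06 + 0.07 + 0.07) + 5 × 0.03 + 6 × (0.02 + 0.01) = 2.61 bits.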
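
The code lengths in the Huffman listing can be checked by building the tree bottom-up. The sketch below is a minimal illustration using Python's `heapq`; the function name `huffman_code_lengths` is an assumption introduced here. It computes lengths only, since the exact 0/1 labels depend on how the branches are labeled.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman code length of each symbol (depth of its leaf)."""
    tie = count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(p, next(tie), [sym]) for sym, p in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)   # two least frequent subtrees
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:            # each merge adds one bit to these symbols
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths

freqs = {"ADD": 0.30, "SUB": 0.24, "CLA": 0.20, "JOM": 0.06, "STO": 0.07,
         "JMP": 0.07, "SHR": 0.02, "CIL": 0.03, "STP": 0.01}
lengths = huffman_code_lengths(freqs)
average = sum(freqs[s] * lengths[s] for s in freqs)
print(lengths, round(average, 2))
```

The lengths come out as 2 bits for ADD, SUB, and CLA, 4 bits for JOM, STO, and JMP, 5 bits for CIL, and 6 bits for SHR and STP, giving an average of about 2.61 bits.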

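(B) Extended coding with two instruction lengths

The three most frequent instructions take 2-bit opcodes. The prefix 11 is reserved to extend the remaining six opcodes to 5 bits, so no 2-bit opcode may be 11.

Instruction  Frequency  Code
ADD          30%        00
SUB          24%        01
CLA          20%        10
JOM          6%         11000
STO          7%         11001
JMP          7%         11010
SHR          2%         11011
CIL          3%         11100
STP          1%         11101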
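
The average code length for a 2-bit/5-bit extended coding can be sketched as follows. The opcode assignment in `codes` is an assumption made for illustration (00, 01, 10 for the three most frequent instructions and 11xxx for the other six); `is_prefix_free` is a hypothetical helper that checks the assignment can be decoded unambiguously.

```python
freqs = {"ADD": 0.30, "SUB": 0.24, "CLA": 0.20, "JOM": 0.06, "STO": 0.07,
         "JMP": 0.07, "SHR": 0.02, "CIL": 0.03, "STP": 0.01}

# ASSUMED assignment for illustration: 2-bit opcodes 00/01/10 for the three
# most frequent instructions, and 11xxx for the remaining six.
codes = {"ADD": "00", "SUB": "01", "CLA": "10",
         "JOM": "11000", "STO": "11001", "JMP": "11010",
         "SHR": "11011", "CIL": "11100", "STP": "11101"}

def is_prefix_free(table: dict[str, str]) -> bool:
    """True if no code word is a prefix of a different code word."""
    words = list(table.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

average = sum(freqs[op] * len(codes[op]) for op in freqs)
print(is_prefix_free(codes), round(average, 2))
```

Under this assumption the average is 2 × 0.74 + 5 × 0.26, about 2.78 bits, compared with 2.61 bits for the Huffman code.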

(C) Shorter instruction format:

Opcode (2 bits) | Register (3 bits)

Longer instruction format:

Opcode (5 bits) | Offset

(D) The maximum offset for accessing a memory address is 32 bytes.

3.1 Identify all of the data dependences in the following code. Which dependences are data hazards that will be resolved via forwarding?

ADD R2,R5,R4
ADD R4,R2,R5
SW  R5,100(R2)
ADD R3,R2,R4
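
No worked answer is given for this exercise. As a starting point, the read-after-write (true) dependences can be enumerated mechanically. The sketch below is a minimal illustration; the tuple encoding of the instructions and the name `raw_dependences` are assumptions introduced here, and it does not look for anti or output dependences.

```python
# Each instruction: (text, destination register or None, source registers)
program = [
    ("ADD R2,R5,R4",   "R2", ("R5", "R4")),
    ("ADD R4,R2,R5",   "R4", ("R2", "R5")),
    ("SW  R5,100(R2)", None, ("R5", "R2")),   # a store writes no register
    ("ADD R3,R2,R4",   "R3", ("R2", "R4")),
]

def raw_dependences(instrs):
    """List (producer, consumer, register) for every read-after-write pair."""
    found = []
    for j, (_, _, sources) in enumerate(instrs):
        for reg in sources:
            # Walk backwards to the most recent earlier writer of this register.
            for i in range(j - 1, -1, -1):
                if instrs[i][1] == reg:
                    found.append((i + 1, j + 1, reg))   # 1-based instruction numbers
                    break
    return found

deps = raw_dependences(program)
for producer, consumer, reg in deps:
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Which of these pairs needs forwarding depends on the pipeline. Assuming the classic five-stage pipeline with a register file that writes in the first half of a cycle and reads in the second half, a consumer one or two instructions after its producer relies on forwarding, while a consumer three instructions later reads the value from the register file.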

3.2 How could we modify the following code to make use of a delayed branch slot?

Loop: LW   R2,100(R3)
      ADDI R3,R3,#4
      BEQ  R3,R4,Loop

Modified code:

      LW   R2,100(R3)
Loop: ADDI R3,R3,#4
      BEQ  R3,R4,Loop
      LW   R2,100(R3)    ; delayed branch slot

3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.

A. What are the forbidden latencies and the initial collision vector?

B. Draw the state transition diagram for scheduling the pipeline.

C. Determine the MAL associated with the shortest greedy cycle.

D. Determine the pipeline maximum throughput corresponding to the MAL and the given t.

[Reservation table: stages s1 to s4 by clock cycles 1 to 6; the × marks are not recoverable]

A. The forbidden latencies are F = {1, 2, 5}.

The initial collision vector is C = (10011).

B. The state transition diagram

C. MAL (Minimal Average Latency) = 3 clock cycles

D. The pipeline maximum throughput Hk = 1/(3 × 20 ns) = 1/(60 ns)
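
The collision vector, the greedy cycle, and the throughput can be reproduced from the forbidden latencies alone. The sketch below is a minimal illustration; the function names are assumptions introduced here, and it follows only the greedy cycle that starts from the initial state rather than enumerating the whole state diagram.

```python
def next_state(state: int, latency: int, initial: int, width: int) -> int:
    """Shift the collision vector right by `latency` and OR in the initial vector."""
    return ((state >> latency) | initial) & ((1 << width) - 1)

def greedy_cycle(forbidden: set[int]) -> list[int]:
    """Take the smallest permissible latency from each state until a state repeats."""
    width = max(forbidden)
    initial = sum(1 << (k - 1) for k in forbidden)   # bit k-1 set means latency k is forbidden
    state, seen, path = initial, {}, []
    while state not in seen:
        seen[state] = len(path)
        # Smallest latency whose bit is clear; width + 1 is always permissible.
        latency = next(k for k in range(1, width + 2) if not (state >> (k - 1)) & 1)
        path.append(latency)
        state = next_state(state, latency, initial, width)
    return path[seen[state]:]          # latencies on the repeating part only

cycle = greedy_cycle({1, 2, 5})        # F = {1, 2, 5}, i.e. C = (10011)
mal = sum(cycle) / len(cycle)
throughput = 1 / (mal * 20e-9)         # tasks per second with t = 20 ns

print(cycle, mal, f"{throughput:.3e}")
```

For F = {1, 2, 5} the greedy cycle is the constant cycle (3), so the MAL is 3 clock cycles and the maximum throughput is one task every 60 ns, about 1.67 × 10^7 tasks per second.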

3.4 Using the following code fragment:

Loop: LW   R1,0(R2)    ; load R1 from address 0+R2
      ADDI R1,R1,#1    ; R1 = R1+1
      SW   0(R2),R1    ; store R1 at address 0+R2
      ADDI R2,R2,#4    ; R2 = R2+4
      SUB  R4,R3,R2    ; R4 = R3-R2
      BNEZ R4,Loop     ; branch to Loop if R4 != 0

Assume that the initial value of R3 is R2+396.

Throughout this exercise use the classic RISC five-stage integer pipeline and assume all memory accesses take 1 clock cycle.

A. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware but assuming a register read and a write in the same clock cycle "forwards" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

B. Show the timing of this instruction sequence for the RISC pipeline with normal forwarding and bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

C. Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop including the branch delay slot. You may reorder instructions and modify the individual instruction operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

A.
• The loop iterates 396/4 = 99 times.
• Go through one complete iteration of the loop and the first instruction in the next iteration.
• Total length = the length of iterations 0 through 97 (the first 98 iterations should be of the same length) + the length of the last iteration.
• We have assumed the version of DLX described in Figure 3.21 (page 97) in the book, which resolves branches in MEM.
• From this figure, the second iteration begins 17 clocks after the first iteration and the last iteration takes 18 cycles to complete.

Total length = 17 × 98 + 18 = 1684 clock cycles

B.
• From this figure, the second iteration begins 10 clocks after the first iteration and the last iteration takes 11 cycles to complete.

Total length = 10 × 98 + 11 = 991 clock cycles

C. Reorder the instructions to:

Loop:
LW R1,0(R2)
SW -4(R2),R1    ; store R1 at address 0+R2

From the figure, the second iteration begins 6 clocks after the first iteration and the last iteration takes 10 cycles to complete.

Total length = 6 × 98 + 10 = 598 clock cycles
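
The three totals share one formula: 98 iterations spaced a fixed number of clocks apart, plus the full length of the last iteration. The sketch below is a minimal illustration; `total_cycles` is a hypothetical helper name, and the per-iteration figures are the ones stated in parts A, B, and C.

```python
iterations = 396 // 4   # R2 advances by 4 until it reaches the initial R2 + 396

def total_cycles(iteration_spacing: int, last_iteration: int, n: int = iterations) -> int:
    """First n-1 iterations start `iteration_spacing` clocks apart; the last runs to completion."""
    return iteration_spacing * (n - 1) + last_iteration

results = {
    "A: no forwarding, flush on branch": total_cycles(17, 18),
    "B: forwarding, predict not taken": total_cycles(10, 11),
    "C: scheduled, delayed branch": total_cycles(6, 10),
}
for label, cycles in results.items():
    print(f"{label}: {cycles} clock cycles")
```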

Pipeline timing diagram entries for the reordered loop:

stall
(stall) ADDI R2,R2,#4
(stall) ADDI R1,R1,#1
(stall) SW -4(R2),R1

3.5 Consider the following reservation table for a four-stage pipeline.

D. Determine the pipeline maximum throughput corresponding to the MAL.

E. According to the shortest greedy cycle, put six tasks into the pipeline and determine the actual pipeline throughput.

[Reservation table: four stages by clock cycles 1 to 7; the marks are not recoverable]

A. The forbidden latencies are {2, 4, 6}.

The initial collision vector is C = (101010).

B. The state transition diagram:

C. The MAL associated with the shortest greedy cycle is 4 cycles.

Scheduling   Average latency
(1,7)        4
(3,5)        4
(5,3)        4
(5)          5
(3,7)        5
(5,7)        6
(7)          7

D. The pipeline maximum throughput corresponding to the MAL:

Hk = 1/(4 clock cycles)
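
Each listed scheduling cycle can be checked against the forbidden latencies {2, 4, 6}: a cycle is usable only if no two initiations, adjacent or not, end up a forbidden distance apart. The sketch below is a minimal illustration; the function names are assumptions introduced here, and it checks only the cycles listed for this exercise rather than searching for others.

```python
from itertools import accumulate

FORBIDDEN = {2, 4, 6}

def is_collision_free(cycle: tuple[int, ...], forbidden: set[int] = FORBIDDEN) -> bool:
    """A repeating latency cycle is valid if no two initiations are a forbidden distance apart."""
    reps = max(forbidden) + 1                       # unroll far enough to cover every short gap
    starts = [0] + list(accumulate(cycle * reps))   # initiation times of successive tasks
    return all(b - a not in forbidden
               for i, a in enumerate(starts) for b in starts[i + 1:])

def average_latency(cycle: tuple[int, ...]) -> float:
    return sum(cycle) / len(cycle)

cycles = [(1, 7), (3, 5), (5, 3), (5,), (3, 7), (5, 7), (7,)]
valid = {c: is_collision_free(c) for c in cycles}
mal = min(average_latency(c) for c in cycles if valid[c])

for c in cycles:
    print(c, valid[c], average_latency(c))
```

All seven cycles pass the check, and the smallest average latency among them is 4 clock cycles, reached by (1,7), (3,5), and (5,3). A cycle such as (3,3) fails because two initiations would be 6 cycles apart.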

E. According to the shortest greedy cycle, put six tasks into the pipeline.

The best scheduling is the greedy cycle (1,7), because:

According to (1,7) scheduling:

actual throughput Hk = 6/(1+7+1+7+1+7) = 6/(24 cycles)

According to (3,5) scheduling:

actual throughput Hk = 6/(3+5+3+5+3+7) = 6/(26 cycles)

According to (5,3) scheduling:

actual throughput Hk = 6/(5+3+5+3+5+
