Computer Architecture End-of-Chapter Exercises (Complete Edition, Word Format).docx


So,

(3) By

If only one enhancement can be implemented:

So, we must select enhancements 1 and 3 to maximize performance.

1.2 Suppose there is a graphics operation that accounts for 10% of execution time in an application, and by adding special hardware we can speed it up by a factor of 18. Furthermore, we could use twice as much hardware and make the graphics operation run 36 times faster. Explain whether it is worth exploring such a further architectural change.

So, it is not worth exploring such a further architectural change.
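The conclusion can be checked with Amdahl's law. The following is a minimal sketch, assuming the 10% fraction refers to the original execution time; the function name `amdahl_speedup` is illustrative and not taken from the source.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the execution time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

base = amdahl_speedup(0.10, 18)     # special graphics hardware
doubled = amdahl_speedup(0.10, 36)  # twice as much hardware

print(f"18x hardware: overall speedup {base:.4f}")
print(f"36x hardware: overall speedup {doubled:.4f}")
print(f"relative gain from doubling: {(doubled / base - 1) * 100:.2f}%")
```

Both overall speedups come out near 1.10, so doubling the hardware improves the whole application by well under 1%.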

1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors in a parallel computer increases, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially, and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?

So, the fixed-load speedup is 2.5.
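The value follows from the fixed-load (Amdahl) speedup formula. In the worked form below, the symbols $\alpha$ (sequential fraction) and $n$ (number of nodes) are notation introduced here, not taken from the source.

```latex
S_n = \frac{W}{\alpha W + \dfrac{(1-\alpha)W}{n}}
    = \frac{1}{0.2 + \dfrac{0.8}{4}}
    = \frac{1}{0.4}
    = 2.5
```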

2.1 Consider a model machine with nine instructions whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2), and STP (0.01). The machine has several GPRs. Memory is byte addressable, accessed addresses are aligned, and the memory word width is 16 bits.

Suppose the nine instructions have the following characteristics:

- Two-operand instructions
- Two instruction lengths
- Extended coding
- Shorter instruction operand format: R (register) - R (register)
- Longer instruction operand format: R (register) - M (memory)
- Displacement memory addressing mode

A. Encode the nine instructions with Huffman coding, and give the average code length.

B. Design practical instruction codes, and give the average code length.

C. Write the two instruction word formats in detail.

D. What is the maximum offset for accessing a memory address?

Huffman coding by Huffman tree:

Instruction  Frequency  Code
ADD          30%        01
SUB          24%        11
CLA          20%        10
JOM          6%         0001
STO          7%         0011
JMP          7%         0010
SHR          2%         000001
CIL          3%         00001
STP          1%         000000

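So, the average code length is 2 × (0.30 + 0.24 + 0.20) + 4 × (0.06 + 0.07 + 0.07) + 5 × 0.03 + 6 × (0.02 + 0.01) = 2.61 bits.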
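The code lengths in the Huffman table can be reproduced programmatically. The sketch below builds only the code lengths (not the bit patterns) with a priority queue. Frequencies are held as integer percentages to avoid floating-point ties, and the helper name `huffman_lengths` is an illustrative choice.

```python
import heapq
from itertools import count

# Instruction frequencies from the exercise, in percent.
freq_percent = {"ADD": 30, "SUB": 24, "CLA": 20, "JOM": 6, "STO": 7,
                "JMP": 7, "SHR": 2, "CIL": 3, "STP": 1}

def huffman_lengths(freqs):
    """Return {symbol: code length} for a Huffman code over `freqs`."""
    tie = count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(weight, next(tie), [name]) for name, weight in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        w1, _, names1 = heapq.heappop(heap)
        w2, _, names2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper in the tree.
        for name in names1 + names2:
            lengths[name] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), names1 + names2))
    return lengths

lengths = huffman_lengths(freq_percent)
weighted_bits = sum(freq_percent[n] * lengths[n] for n in freq_percent)
print(lengths)
print("average code length:", weighted_bits / 100, "bits")
```

Different tie-breaking rules can produce different bit patterns, but the average length of a Huffman code for these frequencies stays the same.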

(B) Extended coding with two instruction lengths:

Instruction  Frequency  Code
ADD          30%        01
SUB          24%        00
CLA          20%        10
JOM          6%         11000
STO          7%         11001
JMP          7%         11010
SHR          2%         11011
CIL          3%         11100
STP          1%         11101
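Part B also asks for the average code length of the extended code, which is not stated at this point. A minimal sketch of the calculation follows, assuming the three most frequent instructions receive 2-bit opcodes and the other six receive 5-bit opcodes that share a reserved 2-bit escape prefix; the variable names are illustrative.

```python
# Instruction frequencies from the exercise, in percent.
freq_percent = {"ADD": 30, "SUB": 24, "CLA": 20, "JOM": 6, "STO": 7,
                "JMP": 7, "SHR": 2, "CIL": 3, "STP": 1}

SHORT_BITS = 2
LONG_BITS = 5

# One 2-bit pattern is reserved as the prefix of all long opcodes.
n_short = 2 ** SHORT_BITS - 1
n_long = 2 ** (LONG_BITS - SHORT_BITS)

ranked = sorted(freq_percent, key=freq_percent.get, reverse=True)
short_ops, long_ops = ranked[:n_short], ranked[n_short:]
assert len(long_ops) <= n_long, "not enough long opcodes"

weighted_bits = (sum(freq_percent[op] * SHORT_BITS for op in short_ops)
                 + sum(freq_percent[op] * LONG_BITS for op in long_ops))
average_bits = weighted_bits / 100
print("short opcodes:", short_ops)
print("average code length:", average_bits, "bits")
```

Under these assumptions the average is 2.78 bits, slightly above the 2.61 bits of the Huffman code, which is the price of having only two opcode lengths.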

(C) Shorter instruction format:

Opcode

2 bits

3 bits

Longer instruction format:

Opcode

5 bits

Offset

(D) The maximum offset for accessing a memory address is 32 bytes.
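The 32-byte figure is consistent with a 5-bit offset field. The sketch below shows one way to arrive at that width. It assumes the longer instruction occupies one 16-bit memory word and holds a 5-bit opcode, a 3-bit register field, and a 3-bit base register field; the second register field and the packing order are assumptions made for illustration, not details given in the source.

```python
WORD_BITS = 16     # memory word width from the exercise
OPCODE_BITS = 5    # long opcode width from part (C)
REG_BITS = 3       # register field width from part (C)
N_REG_FIELDS = 2   # ASSUMPTION: one operand register plus one base register

offset_bits = WORD_BITS - OPCODE_BITS - N_REG_FIELDS * REG_BITS
offset_values = 2 ** offset_bits

def encode_long(opcode, reg, base, offset):
    """Pack the fields into one 16-bit word (hypothetical field order)."""
    assert 0 <= opcode < 2 ** OPCODE_BITS
    assert 0 <= reg < 2 ** REG_BITS and 0 <= base < 2 ** REG_BITS
    assert 0 <= offset < offset_values
    word = opcode
    word = (word << REG_BITS) | reg
    word = (word << REG_BITS) | base
    word = (word << offset_bits) | offset
    return word

def decode_long(word):
    """Unpack a word produced by encode_long."""
    offset = word & (offset_values - 1)
    word >>= offset_bits
    base = word & (2 ** REG_BITS - 1)
    word >>= REG_BITS
    reg = word & (2 ** REG_BITS - 1)
    opcode = word >> REG_BITS
    return opcode, reg, base, offset

print("offset field:", offset_bits, "bits,", offset_values, "distinct offsets")
```

With these widths the offset field has 5 bits, giving 32 distinct byte offsets.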

3.1 Identify all of the data dependences in the following code. Which dependences are data hazards that will be resolved via forwarding?

ADD R2, R5, R4
ADD R4, R2, R5
SW  R5, 100(R2)
ADD R3, R2, R4
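One way to enumerate the true (read-after-write) dependences is to track the most recent writer of each register. The sketch below does this for the four instructions. It assumes the classic five-stage pipeline in which the register file is written in the first half of a cycle and read in the second half, so only producers one or two instructions ahead of the consumer need forwarding. The tuple encoding of the instructions is an illustrative choice.

```python
# (mnemonic, destination register or None, source registers)
program = [
    ("ADD", "R2", ("R5", "R4")),
    ("ADD", "R4", ("R2", "R5")),
    ("SW",  None, ("R5", "R2")),  # stores R5, uses R2 as the base address
    ("ADD", "R3", ("R2", "R4")),
]

def raw_dependences(program):
    """Return (producer, consumer, register) triples, instructions numbered from 1."""
    deps = []
    last_writer = {}
    for idx, (_, dest, sources) in enumerate(program, start=1):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        if dest is not None:
            last_writer[dest] = idx
    return deps

deps = raw_dependences(program)
# ASSUMPTION: a distance of 3 or more is covered by the split-cycle register file.
forwarded = [(p, c, r) for p, c, r in deps if c - p <= 2]

for p, c, r in deps:
    kind = "needs forwarding" if (p, c, r) in forwarded else "handled by register file"
    print(f"I{p} -> I{c} on {r}: {kind}")
```

Under these assumptions there are four true dependences (I1 to I2, I1 to I3, and I1 to I4 on R2, and I2 to I4 on R4), and all but I1 to I4 are hazards resolved by forwarding.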

3.2 How could we modify the following code to make use of a delayed branch slot?

Loop: LW   R2, 100(R3)
      ADDI R3, R3, #4
      BEQ  R3, R4, Loop

      LW   R2, 100(R3)
Loop: ADDI R3, R3, #4
      BEQ  R3, R4, Loop
      LW   R2, 100(R3)    ← delayed branch slot

3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.

A. What are the forbidden latencies and the initial collision vector?

B. Draw the state transition diagram for scheduling the pipeline.

C. Determine the MAL associated with the shortest greedy cycle.

D. Determine the pipeline maximum throughput corresponding to the MAL and the given t.

Reservation table (columns: clock cycles 1 to 6)

A. The forbidden latencies are F = {1, 2, 5}.

The initial collision vector is C = (10011).

B. The state transition diagram

C. MAL (minimal average latency) = 3 clock cycles

D. The pipeline maximum throughput is Hk = 1/(3 × 20 ns) = 1/(60 ns).
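The answers to parts A, C, and D can be reproduced from the forbidden latencies alone. The sketch below builds the collision vector as a bit mask and follows the greedy strategy (always take the smallest permitted latency) from the initial state until a state repeats. It only traces the greedy cycle that starts in the initial state and is not an exhaustive search over all cycles; the function names are illustrative.

```python
def collision_vector(forbidden):
    """Return (vector, m): bit i-1 of vector is 1 when latency i is forbidden."""
    m = max(forbidden)
    vector = sum(1 << (latency - 1) for latency in forbidden)
    return vector, m

def greedy_cycle(forbidden):
    """Latencies of the greedy cycle reached from the initial state."""
    initial, m = collision_vector(forbidden)
    state, first_seen, latencies = initial, {}, []
    while state not in first_seen:
        first_seen[state] = len(latencies)
        latency = 1
        # Skip latencies whose bit is set; m + 1 is always permitted.
        while latency <= m and (state >> (latency - 1)) & 1:
            latency += 1
        latencies.append(latency)
        state = (state >> latency) | initial
    return latencies[first_seen[state]:]

forbidden = {1, 2, 5}
vector, m = collision_vector(forbidden)
cycle = greedy_cycle(forbidden)
mal = sum(cycle) / len(cycle)
clock_ns = 20
throughput_per_ns = 1 / (mal * clock_ns)

print("collision vector:", format(vector, f"0{m}b"))
print("greedy cycle:", cycle, "average latency:", mal)
print(f"throughput: {throughput_per_ns * 1e3:.2f} tasks per microsecond")
```

For F = {1, 2, 5} the greedy cycle is the single latency 3, which returns to the initial state, so the average latency is 3 cycles and the throughput is one task per 60 ns.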

3.4 Use the following code fragment:

Loop: LW   R1, 0(R2)    ; load R1 from address 0+R2
      ADDI R1, R1, #1   ; R1 = R1 + 1
      SW   0(R2), R1    ; store R1 at address 0+R2
      ADDI R2, R2, #4   ; R2 = R2 + 4
      SUB  R4, R3, R2   ; R4 = R3 - R2
      BNEZ R4, Loop     ; branch to Loop if R4 != 0

Assume that the initial value of R3 is R2 + 396.

Throughout this exercise, use the classic RISC five-stage integer pipeline and assume all memory accesses take 1 clock cycle.

A. Show the timing of this instruction sequence for the RISC pipeline without any forwarding or bypassing hardware, but assuming that a register read and a write in the same clock cycle "forwards" through the register file. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

B. Show the timing of this instruction sequence for the RISC pipeline with normal forwarding and bypassing hardware. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

C. Assume the RISC pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop, including the branch delay slot. You may reorder instructions and modify the individual instruction operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

A. The loop iterates 396/4 = 99 times.

Go through one complete iteration of the loop and the first instruction of the next iteration.

Total length = the length of iterations 0 through 97 (the first 98 iterations should all be of the same length) + the length of the last iteration.

We have assumed the version of DLX described in Figure 3.21 (page 97) of the book, which resolves branches in MEM.

From this figure, the second iteration begins 17 clocks after the first iteration, and the last iteration takes 18 cycles to complete.

Total length = 17 × 98 + 18 = 1684 clock cycles

B. From this figure, the second iteration begins 10 clocks after the first iteration, and the last iteration takes 11 cycles to complete.

Total length = 10 × 98 + 11 = 991 clock cycles

C. Reorder the instructions to:

Loop: LW R1, 0(R2)

SW -4(R2), R1 ; store R1 at address R2-4

From the figure, the second iteration begins 6 clocks after the first iteration, and the last iteration takes 10 cycles to complete.

Total length = 6 × 98 + 10 = 598 clock cycles

Pipeline timing diagram (stall cycles marked on ADDI R2, R2, #4; ADDI R1, R1, #1; and SW -4(R2), R1)
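The three totals for exercise 3.4 share one pattern: the first 98 iterations each cost the iteration-to-iteration distance, and the last iteration costs its full length. A small sketch of that arithmetic follows, using the per-iteration figures quoted in the answers; the function name `total_cycles` is illustrative.

```python
iterations = 396 // 4  # R3 starts at R2 + 396 and R2 advances by 4 per iteration

def total_cycles(per_iteration, last_iteration, iterations):
    """Cycles for the whole loop: (iterations - 1) full periods plus the last pass."""
    return per_iteration * (iterations - 1) + last_iteration

# (clocks between iteration starts, length of the last iteration) per part
cases = {
    "A: no forwarding, flush on branch": (17, 18),
    "B: forwarding, predict not taken": (10, 11),
    "C: forwarding, delayed branch": (6, 10),
}

totals = {name: total_cycles(step, last, iterations)
          for name, (step, last) in cases.items()}
for name, cycles in totals.items():
    print(f"{name}: {cycles} clock cycles")
```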

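3.5 Consider the following reservation table for a four-stage pipeline.

A. What are the forbidden latencies and the initial collision vector?

B. Draw the state transition diagram for scheduling the pipeline.

C. Determine the MAL associated with the shortest greedy cycle.

D. Determine the pipeline maximum throughput corresponding to the MAL.

E. According to the shortest greedy cycle, put six tasks into the pipeline and determine the actual pipeline throughput.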

Reservation table (check marks show which stage is used in each clock cycle)

A. The forbidden latencies are {2, 4, 6}.

The initial collision vector is C = (101010).

B. The state transition diagram:

C. The MAL associated with the shortest greedy cycle is 4 cycles.

Scheduling  Average latency
(1,7)       4
(3,5)       4
(5,3)       4
(5)         5
(3,7)       5
(5,7)       6
(7)         7

D. The pipeline maximum throughput corresponding to the MAL:

Hk = 1/(4 clock cycles)

E. According to the shortest greedy cycle, put six tasks into the pipeline.

The best scheduling is the greedy cycle (1,7), because:

According to (1,7) scheduling:

actual throughput Hk = 6/(1+7+1+7+1+7) = 6/(24 cycles)

According to (3,5) scheduling:

actual throughput Hk = 6/(3+5+3+5+3+7) = 6/(26 cycles)

According to (5,3) scheduling:

actual throughput Hk = 6/(5+3+5+3+5+
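The sums in part E follow one pattern: six tasks need five initiation latencies, taken in order from the repeating schedule, plus the time for the last task to drain. The sketch below reproduces that arithmetic. It assumes a task occupies the pipeline for 7 cycles, a value inferred from the final term of the sums above rather than read from the reservation table; the function name `completion_time` is illustrative.

```python
from itertools import cycle, islice

TASK_CYCLES = 7  # ASSUMPTION: inferred from the last term in each sum above
N_TASKS = 6

def completion_time(latency_cycle, n_tasks, task_cycles):
    """Cycles until the last of n_tasks leaves the pipeline."""
    gaps = list(islice(cycle(latency_cycle), n_tasks - 1))
    return sum(gaps) + task_cycles

schedules = [(1, 7), (3, 5), (5, 3)]
times = {s: completion_time(s, N_TASKS, TASK_CYCLES) for s in schedules}
for schedule, total in times.items():
    print(f"{schedule}: {N_TASKS} tasks in {total} cycles, "
          f"throughput {N_TASKS / total:.3f} tasks per cycle")
```

Under this assumption the (1,7) schedule finishes first even though all three cycles have the same average latency of 4, because it front-loads the short latency.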
