projectWord文件下载.doc

Uploaded by: wj   Document ID: 1450095   Upload date: 2023-04-30   Format: DOC   Pages: 12   Size: 475 KB
        .text
        ; initialize registers
        daddi r1,r0,a
        daddi r2,r0,b
        daddi r3,r0,c
        daddi r4,r0,6
Loop:
        lw    r5,0(r1)      ; element of a
        lw    r6,0(r2)      ; element of b
        lw    r7,0(r3)      ; element of c
        dadd  r8,r5,r6      ; a[i]+b[i]
        dadd  r9,r7,r8
        sw    r9,0(r1)      ; store value in a[i]
        daddi r1,r1,8       ; increment memory pointers
        daddi r2,r2,8
        daddi r3,r3,8
        daddi r4,r4,-1      ; i++
        bnez  r4,Loop
end:
        halt

1) Load ex1.s into the memory of MIPS64 and disable the forwarding logic, the delay slot and the Branch target buffer from the Configure menu in the main toolbar. Before running the program, try to predict where stalls occur, how many clock cycles they will take, and for what kind of hazards they occur. Then compare your prediction with the simulation results.

Answer:

Clock cycles = 19 × 6 + 4 + 4 = 122

RAW data hazard stalls = 7 × 6 = 42

Simulator results: [simulator screenshot]
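The figure of 7 stalls per iteration can be cross-checked with a small model. The following Python sketch is not the simulator's algorithm; it assumes a plain in-order five-stage pipeline without forwarding, in which a register written in WB can be read by an instruction in ID during the same cycle. The function name `raw_stalls` and the tuple encoding of the loop body are made up for this illustration.

```python
# Simplified in-order 5-stage pipeline (IF ID EX MEM WB), no forwarding.
# Assumption: a register written in WB can be read in ID in the same cycle.

def raw_stalls(program):
    """program: list of (dest, sources); returns RAW stall cycles per instruction."""
    ready = {}      # register -> first cycle in which ID may read the new value
    prev_id = 0     # cycle in which the previous instruction was in ID
    stalls = []
    for dest, sources in program:
        earliest = prev_id + 1
        id_cycle = max([earliest] + [ready.get(reg, 0) for reg in sources])
        stalls.append(id_cycle - earliest)
        if dest is not None:
            ready[dest] = id_cycle + 3   # ID -> EX -> MEM -> WB
        prev_id = id_cycle
    return stalls

loop_body = [
    ("r5", ["r1"]),         # lw    r5,0(r1)
    ("r6", ["r2"]),         # lw    r6,0(r2)
    ("r7", ["r3"]),         # lw    r7,0(r3)
    ("r8", ["r5", "r6"]),   # dadd  r8,r5,r6
    ("r9", ["r7", "r8"]),   # dadd  r9,r7,r8
    (None, ["r9", "r1"]),   # sw    r9,0(r1)
    ("r1", ["r1"]),         # daddi r1,r1,8
    ("r2", ["r2"]),         # daddi r2,r2,8
    ("r3", ["r3"]),         # daddi r3,r3,8
    ("r4", ["r4"]),         # daddi r4,r4,-1
    (None, ["r4"]),         # bnez  r4,Loop
]

per_instruction = raw_stalls(loop_body)
print(per_instruction)            # [0, 0, 0, 1, 2, 2, 0, 0, 0, 0, 2]
print(sum(per_instruction))       # 7
print(6 * sum(per_instruction))   # 42
```

Under these assumptions the stalls fall on `dadd r8` (1), `dadd r9` (2), `sw` (2) and `bnez` (2). Any penalty for the taken branch is a control hazard and is not modelled here.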

2) The program of ex1.s consists of a loop plus some other instructions outside it. After running the program to completion, estimate the CPI using the Statistics window. In the case of a program containing a "hot spot" (i.e. an internal loop whose instructions are executed much more frequently than all the other instructions) the CPI can be roughly estimated just using the asymptotic CPI, i.e.

[equation]

where N_out and S_out are the number of instructions and the number of stalls outside the "hot spot", respectively, L is the number of loop cycles of the innermost loop, and IC_hot and S_hot are the number of instructions and the number of stalls of the "hot spot". Compare the asymptotic CPI with the value resulting from simulations. Are results compatible?
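One form of the asymptotic CPI that fits these definitions, and that agrees with the (11 + 7) / 11 style figures used in the answers, is sketched below. It is an assumption, since the expression itself is not shown in this text, and it ignores the cycles needed to fill the pipeline.

```latex
\mathrm{CPI}
  = \frac{N_{out} + S_{out} + L\,(IC_{hot} + S_{hot})}{N_{out} + L \cdot IC_{hot}}
  \;\xrightarrow{\;L \to \infty\;}\;
  \mathrm{CPI}_{asymptotic} = \frac{IC_{hot} + S_{hot}}{IC_{hot}}
```

For the original loop without forwarding, IC_hot = 11 and S_hot = 7, which gives 18 / 11 ≈ 1.636.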

Simulator CPI = ((11 + 7) × 6 + 4 + 5 + 5 + 1) / (11 × 6) = 1.848

Asymptotic CPI = (11 + 7) / 11 = 1.636

Execution: [simulator screenshot]
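The arithmetic behind these two values can be replayed in a few lines of Python. The constants are the report's own figures and the variable names are illustrative. Evaluated as written, the simulator expression comes to about 1.864, while the quoted 1.848 equals 122 / 66, i.e. the 122-cycle total from question 1; the text does not settle which of the two matches the simulator.

```python
IC_HOT = 11   # instructions per loop iteration (from the report)
S_HOT = 7     # stall cycles per iteration without forwarding (from the report)
L = 6         # loop iterations

asymptotic = (IC_HOT + S_HOT) / IC_HOT
expression_as_written = ((IC_HOT + S_HOT) * L + 4 + 5 + 5 + 1) / (IC_HOT * L)
from_cycle_count = 122 / (IC_HOT * L)   # 122 cycles is the total from question 1

print(round(asymptotic, 3))              # 1.636
print(round(expression_as_written, 3))   # 1.864
print(round(from_cycle_count, 3))        # 1.848
```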

3) Enable the forwarding logic and execute the code again. Compute the CPI again. Justify the remaining stalls and comment why some of them occur after ID stage rather than after IF.

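Simulator CPI = ((11 + 1) × 6 + 5 + 5 + 4) / (11 × 6) = 1.303

Asymptotic CPI = (11 + 1) / 11 = 1.091

The two values differ.

Even with forwarding, the bnez instruction reads its register in the ID stage, where the register still holds the stale value. bnez needs the value that has been written back to the register, so it does not take the value forwarded from the EX stage of the daddi instruction and instead waits for the value available after WB.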
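The single remaining stall per iteration can be reproduced with a simplified model. The Python sketch below rests on textbook-style assumptions that are not taken from the simulator itself: ALU results are forwarded from the end of EX, load results from the end of MEM, ordinary instructions need their operands at the start of EX, and a branch compares its operand in ID. The function name and the instruction encoding are hypothetical.

```python
# Simplified in-order 5-stage pipeline with forwarding (assumptions, see lead-in).

def stalls_with_forwarding(program):
    """program: list of (kind, dest, sources); kind is 'alu', 'load', 'store' or 'branch'."""
    produced = {}   # register -> (ID cycle of its producer, kind of its producer)
    prev_id = 0
    stalls = []
    for kind, dest, sources in program:
        earliest = prev_id + 1
        id_cycle = earliest
        for reg in sources:
            if reg not in produced:
                continue
            p_id, p_kind = produced[reg]
            # cycle at whose end the value exists: MEM for loads, EX for ALU ops
            available = p_id + (2 if p_kind == "load" else 1)
            # a branch needs the value in ID, everything else in EX (= ID + 1)
            needed_id = available + 1 if kind == "branch" else available
            id_cycle = max(id_cycle, needed_id)
        stalls.append(id_cycle - earliest)
        if dest is not None:
            produced[dest] = (id_cycle, kind)
        prev_id = id_cycle
    return stalls

loop_body = [
    ("load", "r5", ["r1"]),        # lw    r5,0(r1)
    ("load", "r6", ["r2"]),        # lw    r6,0(r2)
    ("load", "r7", ["r3"]),        # lw    r7,0(r3)
    ("alu", "r8", ["r5", "r6"]),   # dadd  r8,r5,r6
    ("alu", "r9", ["r7", "r8"]),   # dadd  r9,r7,r8
    ("store", None, ["r9", "r1"]), # sw    r9,0(r1)
    ("alu", "r1", ["r1"]),         # daddi r1,r1,8
    ("alu", "r2", ["r2"]),         # daddi r2,r2,8
    ("alu", "r3", ["r3"]),         # daddi r3,r3,8
    ("alu", "r4", ["r4"]),         # daddi r4,r4,-1
    ("branch", None, ["r4"]),      # bnez  r4,Loop
]

print(stalls_with_forwarding(loop_body))   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

In this model the only stall is on `bnez`, because its operand is needed one stage earlier than forwarding into EX can deliver it.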

4) Disable the forwarding logic and assume that the MIPS hardware cannot detect hazards. Modify the source code by inserting NOPs where appropriate without reordering the code (NOP stuffing technique). Check with the simulator that no stall occurs, and check whether the CPI has changed. Do we have better performance?

With NOPs inserted:

        lw    r5,0(r1)
        lw    r6,0(r2)
        lw    r7,0(r3)
        NOP
        dadd  r8,r5,r6
        dadd  r9,r7,r8
        sw    r9,0(r1)
        daddi r1,r1,8
        daddi r2,r2,8
        daddi r3,r3,8
        daddi r4,r4,-1
        bnez  r4,Loop
end:

After inserting the NOPs there are no stalls; the CPI changes, and performance gets worse.
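Why a lower CPI does not mean a faster program can be shown with the report's own per-iteration counts. The Python sketch below assumes, purely for illustration, that every stall cycle is replaced by exactly one NOP. The variable names are not from the original.

```python
IC_HOT = 11   # original instructions per iteration (from the report)
S_HOT = 7     # stall cycles per iteration without forwarding (from the report)

# Assumption for illustration: each stall cycle becomes exactly one NOP.
cycles_before = IC_HOT + S_HOT
instructions_after = IC_HOT + S_HOT
cycles_after = instructions_after        # no stalls are left

cpi_before = cycles_before / IC_HOT
cpi_after = cycles_after / instructions_after

print(round(cpi_before, 3), cycles_before)   # 1.636 18
print(round(cpi_after, 3), cycles_after)     # 1.0 18
```

Under that assumption the CPI drops to 1 only because the instruction count grows; the cycles per iteration are unchanged and the code is larger, so nothing is gained.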

5) Reschedule the instructions (code moving technique) in order to avoid stalls without modifying the program semantics (check the final result to see if after moving the instructions the result is the same). Recompute the normal and asymptotic CPI values.

The code is as follows:

        lw    r5,0(r1)
        daddi r4,r4,-1

Execution: [simulator screenshot]

Hence asymptotic CPI = (11 + 3) / 11 = 1.273

The actual value is 1.296.
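To check that a rescheduled loop leaves the result unchanged, the final contents of array a can be compared against a reference computed outside the simulator. A minimal Python sketch follows, with hypothetical input values chosen only for illustration and integer overflow ignored.

```python
def reference_loop(a, b, c):
    """Contents of a after the loop: each a[i] becomes c[i] + (a[i] + b[i])."""
    return [ci + (ai + bi) for ai, bi, ci in zip(a, b, c)]

# Hypothetical input data, not taken from the report.
a = [10, 11, 12, 13, 0, 1]
b = [1, 2, 3, 4, 5, 6]
c = [0, 0, 0, 0, 0, 0]

expected = reference_loop(a, b, c)
print(expected)   # [11, 13, 15, 17, 5, 7]
```

If the memory contents shown by the simulator after the rescheduled run match this list for the same inputs, the code motion preserved the semantics.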

6) Combine rescheduling and forwarding techniques and note the differences with respect to the forwarding-only and rescheduling-only cases. Try to enable the "Branch target buffer", look at the simulation code and determine the CPI. Has performance improved?

Try to modify the original code by adding 6 additional input values in a and b. What do you expect from CPI?

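Execution with forwarding added: [simulator screenshot]

With the "Branch target buffer" enabled on top of this, the results are as follows:

forwarding: [simulator screenshot]

rescheduling: [simulator screenshot]

When the loop count is increased to 12 by adding more input values, the CPI improves further.

        .space 96
        .word 10,11,12,13,0,1,1,0,13,12,11,10
        .word 1,2,3,4,5,6,6,5,4,3,2,1

        daddi r4,r0,12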
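The effect of doubling the loop count can be estimated before running the simulator. The Python sketch below uses the figures from the forwarding case in question 3 (11 instructions, 1 stall, 5 + 5 + 4 overhead cycles) and assumes that this overhead stays the same when the loop runs 12 times. The helper name `report_style_cpi` is hypothetical.

```python
def report_style_cpi(ic_hot, s_hot, iterations, overhead_cycles):
    """CPI computed as in the report: total cycles over loop instructions only."""
    cycles = (ic_hot + s_hot) * iterations + overhead_cycles
    return cycles / (ic_hot * iterations)

# Assumption: the fixed overhead of 14 cycles does not depend on the loop count.
for iterations in (6, 12):
    print(iterations, round(report_style_cpi(11, 1, iterations, 14), 3))
# 6 1.303
# 12 1.197
```

With more iterations the fixed overhead is spread over more loop instructions, so the value moves toward the asymptotic CPI of 12 / 11 ≈ 1.091.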

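Execution of the program:

rescheduling: [simulator screenshot]

After increasing the number of iterations, the code becomes:

        ; element of b
        ; element of c
        dadd  r8,r5,r6      ; a[i]+b[i]
        dadd  r9,r7,r8
        sw    r9,0(r1)
        daddi r1,r1,8
        daddi r4,r4,-1

forwarding: [simulator screenshot]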

7) A well-known compiler optimization is known as "loop unrolling". Basically, loop unrolling is the explicit repetition of the loop code a number of times. In this way we obtain a longer loop body that is executed fewer times. Consider the original code of ex1.s. Unroll the loop twice without any code moving, i.e. just repeat the first four loop instructions and make the necessary changes therein. Calculate the CPI for the case without forwarding. Is there any improvement?

        lw    r10,8(r1)
        lw    r11,8(r2)
        lw    r12,8(r3)
        dadd  r13,r10,r11
        dadd  r14,r12,r13
        sw    r14,8(r1)
        daddi r1,r1,16
        daddi r2,r2,16
        daddi r3,r3,16
        daddi r4,r4,-2

Asymptotic CPI = (17 + 13) / 17 = 1.765

Hence there is no improvement.
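CPI is a per-instruction measure, so it can be useful to put the same counts on a per-element basis as well. The Python sketch below uses only the report's figures (11 instructions and 7 stalls for the original loop, 17 instructions and 13 stalls for the unrolled one). The helper name is illustrative, and the taken-branch penalty is left out, as it is in the asymptotic CPI.

```python
def cycles_per_element(ic_hot, s_hot, elements_per_iteration):
    """Steady-state cycles spent per array element."""
    return (ic_hot + s_hot) / elements_per_iteration

original = cycles_per_element(11, 7, 1)    # original loop, no forwarding
unrolled = cycles_per_element(17, 13, 2)   # loop unrolled twice, no forwarding

print(original, unrolled)   # 18.0 15.0
```

On these numbers the asymptotic CPI rises from 1.636 to 1.765, while the cycles per element fall from 18 to 15, because the pointer updates and the branch are shared by two elements.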

8) Apply code rescheduling to the solution of the previous question and calculate both the CPI and the asymptotic CPI values with and without forwarding. Is there any improvement?

        lw    r10,8(r1)
        lw    r11,8(r2)
        lw    r12,8(r3)
        daddi r4,r4,-2
        dadd  r13,r10,r11
        daddi r1,r1,16
        dadd  r14,r12,r13
        daddi r2,r2,16
        daddi r3,r3,16
        sw    r14,-8(r1)
End:
        halt

Normal CPI = ((17 + 0) × 6 + 5 + 0 + 4) / (17 × 6) = 1.088

Asymptotic CPI = (17 + 0) / 17 = 1.000

9) Suppose that the add operation in the original code is a floating point calculation and the loop is iterated 12 times. Please use floating point registers for a[i], b[i], and c[i], and modify your assembly code. Please answer the following questions:

At least how many times do you need to unroll the loop to minimize stalls without forwarding?

What is the average latency of iterations for the original loop?

What is the code size?

Please show us your code.

The following is the input data of your code:

        .text …………

        l.d   f1,0(r1)
        l.d   f2,0(r2)
        l.d   f3,0(r3)
        add.d f4,f2,f1
        add.d f5,f3,f4
        s.d   f5,0(r1)

Execution: [simulator screenshot]

Unrolling the program above four times, the program is as follows:

        .double 10,11,12,13,0,1,1,0,13,12,11,10
        .double 1,2,3,4,5,6,6,5,4,3,2,1

        l.d   f5,0(r1)
        l.d   f6,0(r2)
        l.d   f10,8(r1)
        l.d   f11,8(r2)
        add.d f8,f5,f6
        l.d   f15,16(r1)
        l.d   f16,16(r2)
        add.d f13,f10,f11
        l.d   f7,0(r3)
        l.d   f12,8(r3)
        add.d f18,f15,f16
        l.d   f17,16(r3)
        add.d f9,f7,f8
        add.d f14,f13,f12
        daddi r1,r1,24
        add.d f19,f17,f18
        daddi r4,r4,-3
        s.d   f9,0(r1)
        s.d   f14,8(r1)
        daddi r3,r3,24
        daddi r2,r2,24
        s.d   f19,-8(r1)

Execution: [simulator screenshot]
