apriori算法的改进.docx

上传人:b****1 文档编号:1582834 上传时间:2023-05-01 格式:DOCX 页数:11 大小:22.70KB
下载 相关 举报
apriori算法的改进.docx_第1页
第1页 / 共11页
apriori算法的改进.docx_第2页
第2页 / 共11页
apriori算法的改进.docx_第3页
第3页 / 共11页
apriori算法的改进.docx_第4页
第4页 / 共11页
apriori算法的改进.docx_第5页
第5页 / 共11页
apriori算法的改进.docx_第6页
第6页 / 共11页
apriori算法的改进.docx_第7页
第7页 / 共11页
apriori算法的改进.docx_第8页
第8页 / 共11页
apriori算法的改进.docx_第9页
第9页 / 共11页
apriori算法的改进.docx_第10页
第10页 / 共11页
apriori算法的改进.docx_第11页
第11页 / 共11页
亲,该文档总共11页,全部预览完了,如果喜欢就下载吧!
下载资源
资源描述

apriori算法的改进.docx

《apriori算法的改进.docx》由会员分享,可在线阅读,更多相关《apriori算法的改进.docx(11页珍藏版)》请在冰点文库上搜索。

apriori算法的改进.docx

apriori算法的改进

ADEVELOPEDALGORITHMOFAPRIORIBASEDONASSOCIATIONANALYSIS

Abstract

Basedonassociationanalysis,animprovedalgorithmofAprioriispresentedinthepaper.Themainideaofthealgorithmare:

(1)Counttheprobabilityofeachattributeitem(A1,A2,…Am)ofaDBbyscanningtheDBfirsttime;

(2)TheprobabilityofanytwoitemsAkandAmappearedsynchronouslyinonerecordisPkm.

min(Pk,Pm)≤Pkm≤Pk*Pm,

ifAkandAmistotalcorrelation,thenthePkmistheminimumofthePkandPm,;ifAkandAmistotalindependent,thenthePkmisPk*Pm;

Sowecanestimate:

Pkm=(a*min(Pk,Pm)+b*Pk*Pm)/(a+b);a+b=1

Parameter“a”istheprobabilitywhileAkandAmaretotalcorrelation,Parameter“b”istheprobabilitywhileAkandAmaretotalindependent,

Parameter“a”and“b”canuseothermethodsuchasassociationanalysistocount.Inthispaperamethodforcalculatetheparameter“a”and”b”withassociationanalysisisprovided.

ifPkmismorethanthethresholdvaluewhichtheuserset,thenAk,Amarethefrequentitemsets.

YoucanusethemethodwhichdescribedabovetofindoutallthefrequentitemsetswithoutscanningDBsomanytimes.

(3)CountthesupportofthefrequentitemsetsbyscanningtheDBanothertime;(4)Outputtheassociationrulesfromthefrequentitemsets.

Thedetailedalgorithmandit'ssamplearedescribedinthepaper.Atlastwecompareditwithalgorithmapriori.ThebestqualityisthatthealgorithminourpaperreducethetimesofscanningDB.

KEYWORDS:

AssociationRuleAlgorithmAprioriFrequentItemsetAssociationAnalysis

 

1Introduction

Withthedevelopmentoftheinformationtechnologyandapplicationofdatabase.Thecollecteddatafarexceedpeople'sabilitytoanalyzeit.Thusnewandefficientmethodsareneededtodiscoverknowledgefromlargedatabases.Anddataminingappeared.

Dataminingisthecoreofknowledgediscoveryindatabases.It’saproceduretofindtheusefulandpotentiallyknowledgeindatabase.Associationrulesareoneofthemostimportantknowledgeofdatamining’sresultwhichcanbedefinedastherelationanddependencybetweentheitemsetsbygivensupportandconfidenceindatabase.

Fromtheaboveitcanbeconcludedthatthekeyproblemoftheaprioriisittaketoomuchtimetominingthefrequentitemsets.Thetimemainlycostsintwoareas,oneisthetimeforscanningthelargedatabaserepeatedlytheotherisforgeneratingfrequentitemsetswithJOIN.TherearealotofimprovedalgorithmforapriorisuchasAprioriTID,AprioriHybri,Multipleoins,ReorderandDirectetc.Themainideaofallthesealgorithmisaccordingthetheorythatthesubsetofafrequentitemsetisafrequentitemsetandthesupersetofainfrequentitemsetisainfrequentitemset.Theyscanthedatabaserepeatdlytominingtheassociationrules.

ThereisanotherfeatureforalgorithmAprioriTID,thesupportofthecandidatefrequentitemsetsarecalculatedonlyatthefirsttimeitscannedthedatabaseDandalsogeneratedcandidatetransactiondatabaseD’whichonlyincludesthecandidatefrequentitemsets.ThenthelatterminingarebasedonthedatabaseD’,ItreducethetimeofI/OoperationbecauseD’issmallerthanD,so,itenhancetheefficiencyofthealgorithm.

ForalgorithmAprioriHybri,itisthecombinationofalgorithmAprioriandAprioriTID.WhenthecandidatetransactiondatabaseD’couldn’tbecontainedintheRAM,itmineswithalgorithmAprioriotherwisewithalgorithmAprioriTID.

Themostimportantstepinminingassociationisgenerationfrequentitemsets.Inalgorithmapriorithemosttimeisconsumedbyscanningthedatabaserepeatedly.Itwouldreducetherunningtimeofthealgorithmbyreducingthetimesitscansthedatabasefarandaway.Inthispaperamethodofminingfrequentitemsetsbyevaluatingtheirprobabilityofsupportsbasedonassociationanalyzingwerementioned.First,itgainedtheprobabilityofevery1-itemsetbyscanningthedatabase,the1-itemsetwiththemorelargersupportthantheprobabilitytheusersetswouldbefrequent1-itemsets.Second,itevaluatestheprobabilityofevery2-itemset,every3-itemset,everyk-itemsetfromthefrequent1-itemsets.Third,itgainsallthecandidatefrequentitemsets.Fourth,itscansthedatabaseforverifyingthesupportofthecandidatefrequentitemsets,Lastthefrequentitemsetsareminedandassociationrulesalsodo.Inthemethoditreducesalotoftimesofscanningdatabaseandshortenedthecalculatetimeofthealgorithm.

2Relatedconcepts

Inthissection,wepresentthedefinitionsoftheconceptsthatareusedtodescribetheimprovedalgorithm.Letus

startfromthefollowingdefinitionsforassociationrules.

Let

beasetofallitems,whereanitemisanobjectwithsomepredefinedattributes(e.g.,

price,weight,etc.).Atransaction

isatuple,wheretid

istheidentifierofthetransactionand

.AtransactiondatabaseTconsistsofasetoftransactions.Anitemsetisasubsetofthesetofitems.A

k-itemsetisanitemsetofsizek.Wewriteitemsetsas

omittingsetbrackets.

Definition1.AnassociationruletakestheformX⇒YwhereX⊂I,Y⊂I,andX∩Y=Φ.thesupportoftheruleX⇒Yintransactiondatabaseis:

support(X⇒Y)=|{T:

X∪Y⊆T,T∈D}|/|D|

markedbysupport(X⇒Y).

Definition2.theconfidenceoftheruleX⇒Yintransactiondatabaseis:

confidence(X⇒Y)=|{T:

X∪Y⊆T,T∈D}|/|{T:

X⊆T,T∈D}|

markedbyconfidence(X⇒Y).

Theminingassociationrulesproblemisgeneratingassociationruleswithitemsetswhichhavemorelargerorequalsupportandconfidencethantheusersettheminsuppandminconf.

2

Definition3.LetA=A1,A2,…,Anbeasetofallevents.If

P(Ai1Ai2…Aik)=P(Ai1)P(Ai2)…P(Aik),

(1)

Foranyk(1

(1)isrignt,thenA1,A2,…,Anareinter-independentevents.

Candidatefrequentitemsetsrefertotheitemsetswhoseprobabilityislargerthantheusersets,theymaybethelastfrequentitemsetornot.

3TheImprovedalgorithmforapriori

3.1themainideaoftheimprovedalgorithm

LetP1,P2…PnaretheindependentprobabilityofeveryitemA1,A2…An,theprobabilityforanytwoitemAk,Am(Pk

IfAkandAmaretotalnon-correlation,fromdefinition3itcanbeconcludedthatPkm=Pk*Pm,ifAkandAmaretotalcorrelation,thenPkmistheminimumofthePkandPmthatisPk,so,Pk*Pm≤Pkm≤Pk

Nowtheproblemis:

GivenPkandPm,alsoPk*Pm≤Pkm≤Pk,PleaseevaluatePkm.Theproblemcouldn’tbesolved

withtheconditionsinmathematics.Butinfact,thereis

alotof

informationwithoutaccuratemathematicformula

whichbeomitted.Inthispaperitofferedamethodbyassociationanalysistoconfirmtheformula.

LetparameterabetheprobabilitywhichAkandAm

aretotalcorrelation,andparameterbfortotalnon-correlation.

a+b=1,0

(2)below:

Pkm=a∗Pk+b∗Pk∗Pm

(2)

Therearealotmethodstoconfirmthevalueofparameteraandb,inthispaperitprovideawaytodefine“a”and“b”byassociationanalysis.

3.2ConfirmtheParameter“a”,“b”byassociationanalysis

Thereareaseriesofcriterionaboutenvironment.Themostingredientsofthepollutioncanbeconfirmedbasedonthesource.Sowecanconsiderthecriterionasareferencedlistandthelistneededtofindthecorrelationasacomparisonlist.Thenwe’llgetthecorrelationcoefficientwhichistheparameter“a”inourformula

(2),andb=1-a.Thedetailsasbelow:

LetS={S1,S2,…Sm}bethevaluelistofitemAm,S1,S2,…SmaresampleextractedfromtheDBandX={X1,X2,…Xm}bethevaluelistforitemAk,X1,X2,…XmaresampleextractedfromtheDB.

akm=(3)

intheformula(3),akmisthecorrelationcoefficientofitemAmandAk,

=|Si--Xi|,рisthedistinguished

coefficientwhichsetbyusers,usually,р∈(0,1).

Wecanusetheformula(3)tocalculatethecorrelationcoefficientofanyitems.

4Comparisonforthealgorithms

Itwascomparedbetweenouralgorithmandaprioriinthissection,thenumberofthecandidatefrequentitemsetsandtimesofscanningthedatabase.Whichalsowerethelinchpinoftheefficiencyinaalgorithm.Thealgorithminthispaperaheadaprioriinreducingthetimesofscanningthedatabase.Itwouldscanthedatabasektimestofindoutfrequentk-itemsetsinaprioriwhileonly2timesinouralgorithmbyputtingforwardaconceptofcandidatefrequentitemset..

Inthetable1below,itlistedthefrequent1-itemsets,2-itemsets,3-itemsetsforbothofthetwoalgorithmrespectively.Table1TheComparisonofthefrequentitemsets

(candidate)frequent

AlgorithmApriori

Algorithminthispaper

itemsets

1-itemsets

{I1,I2,I3,I4,I5}

{I1,I2,I3,I4,I5}

2-itemsets

{I1,I2},{I1,I3},{I1,I5},{I2,I3},{I2,I4},

{I1,I2},{I1,I3},{I1,I5},{I2,I3},{I2,I4},

{I2,I5}

{I2,I5},{I3,I5}

3-itemsets

{I1,I2,I3},{I1,I2,I5}

{I1,I2,I3},{I1,I2,I5},{I2,I3,I5}

Fromthetable1,weknewthatthereweremorecandidatefrequentitemsetsinthealgorithmthispaperprovidedthanthatinthealgorithmapriori.Itmakesurethatitwouldn’tmissanyfrequentitemsets.

5Experimentandtheanalysisoftheresults

Inthissection,wereportourexperimentalresults.Tovalidatethealgorithmpresentedinthepaper,Adatabaseforrecordingtheupdateofthespatialdatabaseareusedintheexperiment.Thereare10itemsinthedatabase,suchasI1formender,I2forDEPTofthemender,I3forlayerofthetower,I4forlayeroftheroad,I5forlayerofthepolluterarea,I6forlayerofline,I7forlayerofwater,I8forlayerofbuilding,I9forlayerofthunderandI10forlayerplant.Thepurposeoftheexperimentwasmin

展开阅读全文
相关资源
猜你喜欢
相关搜索
资源标签

当前位置:首页 > 农林牧渔 > 林学

copyright@ 2008-2023 冰点文库 网站版权所有

经营许可证编号:鄂ICP备19020893号-2