Enhanced VQ-based Algorithms for Speaker Recognition: Foreign Literature Translation (Word format, .doc)
Word count:
English: 1869 words, 9708 characters;
Chinese: 3008 characters
Foreign-language source text:
Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Abstract. Weighted distance measures and discriminative training are two different approaches to enhancing VQ-based solutions for speaker identification. To account for the varying importance of the LPC coefficients in speaker verification (SV), the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called the heuristic weighted distance, which lifts up the higher-order components of MFCC feature vectors using a linear formula. It then proposes two new algorithms that combine the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training, to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using Wavelet features tested in a similar setting.
1. Introduction
Vector quantization (VQ) based classification algorithms play an important role in speech independent speaker identification (SI) systems. Although in its baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers simplicity in computation. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.
1.1 VQ Based Speaker Identification System
Fig. 1 shows the VQ-based speaker identification system. It contains an offline training sub-system that produces VQ codebooks and an online testing sub-system that generates identification decisions. Both sub-systems contain a preprocessing, or feature extraction, module that converts an audio utterance into a set of feature vectors. Features of interest in the recent literature include the Mel-frequency cepstral coefficients (MFCC), the Line spectral pairs (LSP), the Wavelet packet parameters (WPP), and PCA and ICA features. Although the WPP and ICA features have been shown to offer advantages, we used MFCC in this paper to focus our attention on the other modules of the system.
Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying a testing audio utterance, and an offline training sub-system, which uses training audio utterances to generate a codebook for each speaker in the database.
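A VQ codebook normally consists of the centroids of partitions of a speaker's feature vector space. The effects on SI of different partition clustering algorithms, such as the LBG and the RLS, have been studied. The average error, or distortion, of the feature vectors $X = \{x_t,\ t = 1, \dots, T\}$ of length $T$ with the codebook of speaker $k$ is given by
$$e_k(X) = \frac{1}{T} \sum_{t=1}^{T} \min_{1 \le j \le S} d(x_t, c_{k,j}), \qquad k = 1, \dots, L \qquad (1)$$
where $d(\cdot,\cdot)$ is a distance function between two vectors, $c_{k,j}$ is the $j$-th code, of dimension $D$, of speaker $k$, $S$ is the codebook size, and $L$ is the total number of speakers in the database. The baseline VQ algorithm of SI simply uses the LBG to generate codebooks and the square of the Euclidean distance as $d(\cdot,\cdot)$.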
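The baseline just described can be sketched in a few lines of NumPy: LBG-style codebook training by repeated splitting and nearest-neighbour refinement, followed by identification of the speaker whose codebook gives the smallest average distortion of Equation (1). This is an illustration under assumptions, not the authors' code: the split perturbation `eps`, the number of refinement passes `iters`, the requirement that the codebook size be a power of two (true for S = 16, 32, 64), and all function names are choices made here.

```python
import numpy as np

def sq_dist(X, C):
    """Squared Euclidean distances between rows of X (n, D) and codes C (S, D)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def lbg(vectors, size, eps=0.01, iters=20):
    """LBG-style codebook: split every code in two, then refine by nearest-code means.
    Assumes `size` is a power of two."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        codebook = np.vstack([codebook + eps, codebook - eps])  # split step
        for _ in range(iters):
            nearest = sq_dist(vectors, codebook).argmin(axis=1)
            for j in range(codebook.shape[0]):
                members = vectors[nearest == j]
                if len(members) > 0:  # empty cells keep their previous code
                    codebook[j] = members.mean(axis=0)
    return codebook

def avg_distortion(X, codebook):
    """Equation (1): mean over t of the distance to the nearest code."""
    return float(sq_dist(X, codebook).min(axis=1).mean())

def identify(X, codebooks):
    """Index k of the speaker codebook with the smallest average distortion."""
    return int(np.argmin([avg_distortion(X, cb) for cb in codebooks]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [rng.normal(0.0, 1.0, (200, 4)), rng.normal(3.0, 1.0, (200, 4))]
    books = [lbg(v, 4) for v in train]
    print(identify(rng.normal(3.0, 1.0, (50, 4)), books))
```

The toy data stand in for two speakers with well-separated feature distributions; with real MFCC vectors the same functions apply unchanged.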
Many improvements to the baseline VQ algorithm have been published. Among them, there are two independent approaches:
(1) choose a weighted distance function, such as the F-ratio and IHM weights, the Partition Normalized Distance Measure (PNDM), and the Bhattacharyya Distance;
(2) explore the discrimination power of inter-speaker characteristics using the entire set of speakers, such as the Group Vector Quantization (GVQ) discriminative training and the Speaker Discriminative Weighting.
Experimentally, we have found PNDM and GVQ to be very effective methods in the first and second group, respectively.
1.2 Review of Partition Normalized Distance Measure
The Partition Normalized Distance Measure is defined as the square of the weighted Euclidean distance:
$$d_{PNDM}(x, c_{k,j}) = \sum_{i=1}^{D} w_{k,j,i} \, (x_i - c_{k,j,i})^2 \qquad (2)$$
The weighting coefficients are determined by minimizing the average error of the training utterances of all the speakers, subject to the constraint that the geometric mean of the weights of each partition is equal to 1, or equivalently $\prod_{i=1}^{D} w_{k,j,i} = 1$.
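Let $x_{k,j}$ be a random training feature vector of speaker $k$ that is assigned to partition $j$ via the minimization process in Equation (1). It has the mean and variance vectors (the square is taken component by component):
$$\mu_{k,j} = E[x_{k,j}], \qquad \sigma_{k,j}^2 = E\left[(x_{k,j} - \mu_{k,j})^2\right] \qquad (3)$$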
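The constrained optimization criterion to be minimized in order to derive the weights is
$$F = \sum_{k=1}^{L} \sum_{j=1}^{S} \left\{ E\left[ d_{PNDM}(x_{k,j}, c_{k,j}) \right] - \lambda_{k,j} \left( \prod_{i=1}^{D} w_{k,j,i} - 1 \right) \right\} \qquad (4)$$
where $L$ is the number of speakers, $S$ is the codebook size, and $\lambda_{k,j}$ are Lagrange multipliers. Letting
$$\frac{\partial F}{\partial c_{k,j,i}} = 0 \quad \text{and} \quad \frac{\partial F}{\partial w_{k,j,i}} = 0 \qquad (5)$$
we have
$$c_{k,j,i} = \mu_{k,j,i} \quad \text{and} \quad w_{k,j,i} = \frac{\left( \prod_{l=1}^{D} \sigma_{k,j,l}^2 \right)^{1/D}}{\sigma_{k,j,i}^2} \qquad (6)$$
where the subscript $i$ is the feature vector component index, and $k$ and $j$ are the speaker and partition indices, respectively. Because $k$ and $j$ appear on both sides of the equations, the weights depend only on the data from one partition of one speaker.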
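For one partition of one speaker, the weights reduce to a closed form: each component's weight is the geometric mean of the partition's component variances divided by that component's variance, so that the weights multiply to 1 as the constraint requires. A minimal NumPy sketch follows; it assumes population variances (`ddof=0`) and adds a small variance floor to avoid division by zero, neither of which is specified in the text.

```python
import numpy as np

def pndm_weights(partition: np.ndarray, floor: float = 1e-12) -> np.ndarray:
    """Weights of one partition (speaker k, code j), shape (D,).

    w_i = (prod_l sigma_l^2)^(1/D) / sigma_i^2, hence prod_i w_i = 1.
    `partition` holds the training vectors assigned to this code, shape (n, D).
    """
    var = partition.var(axis=0) + floor      # per-component variance, floored (assumption)
    geo_mean = np.exp(np.mean(np.log(var)))  # geometric mean computed in the log domain
    return geo_mean / var

def pndm_distance(x: np.ndarray, code: np.ndarray, weights: np.ndarray) -> float:
    """Weighted squared Euclidean distance of Equation (2)."""
    return float(np.sum(weights * (x - code) ** 2))
```

As a worked example, a partition whose components have variances 1, 4 and 16 has a geometric mean variance of 4, so its weights are 4, 1 and 0.25, and their product is 1: the low-variance component is emphasized and the high-variance one is damped.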
1.3 Review of Group Vector Quantization
Discriminative training uses the data of all the speakers to train the codebooks, so that more accurate identification results can be achieved by exploiting inter-speaker differences. The GVQ training algorithm is described as follows.
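Group Vector Quantization Algorithm:
(1) Randomly choose a speaker $j$.
(2) Select $N$ vectors $\{x_{j,t},\ t = 1, \dots, N\}$ from the training data of speaker $j$.
(3) Calculate the error $e_k$ of these $N$ vectors for all the codebooks $k = 1, \dots, L$, as in Equation (1).
If the following conditions are satisfied, go to (4):
a) $e_i = \min_{1 \le k \le L} e_k$, but $i \ne j$;
b) $(e_j - e_i)/e_j < W$, where $W$ is a window size;
Else go to (5).
(4) For each $x_{j,t}$:
$c_{j,m} \leftarrow c_{j,m} + \alpha\,(x_{j,t} - c_{j,m})$ and $c_{i,n} \leftarrow c_{i,n} - \alpha\,(x_{j,t} - c_{i,n})$,
where $c_{j,m}$ and $c_{i,n}$ are the codes nearest to $x_{j,t}$ in the codebooks of speakers $j$ and $i$, respectively.
(5) For each $x_{j,t}$:
$c_{j,m} \leftarrow c_{j,m} + \epsilon\,\alpha\,(x_{j,t} - c_{j,m})$, where $c_{j,m}$ is the code nearest to $x_{j,t}$ in the codebook of speaker $j$, and $0 < \epsilon < 1$.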
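One pass through steps (3) to (5) of the listing can be sketched as a single NumPy update. This is a sketch under assumptions rather than a reference implementation: the relative window test `(e_j - e_i) / e_j < W`, the reduced rate `epsilon * alpha` in step (5), the default values of `alpha`, `W` and `epsilon`, and the use of squared Euclidean distance for the nearest-code search are all choices made here.

```python
import numpy as np

def nearest_code(x, codebook):
    """Index of the code in `codebook` (shape (S, D)) closest to x, squared Euclidean."""
    return int(np.argmin(((codebook - x) ** 2).sum(axis=1)))

def avg_error(X, codebook):
    """Average distortion of the vectors X with one codebook, as in Equation (1)."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())

def gvq_step(codebooks, X, j, alpha=0.01, W=0.3, epsilon=0.3):
    """One GVQ update with N vectors X (shape (N, D)) taken from speaker j.

    `codebooks` is a list of float arrays, updated in place.
    Returns the index i of the codebook with the smallest error.
    """
    errors = np.array([avg_error(X, cb) for cb in codebooks])  # step (3)
    i = int(np.argmin(errors))
    if i != j and (errors[j] - errors[i]) / errors[j] < W:
        # Step (4): pull speaker j's nearest codes toward X, push speaker i's away.
        for x in X:
            m = nearest_code(x, codebooks[j])
            n = nearest_code(x, codebooks[i])
            codebooks[j][m] += alpha * (x - codebooks[j][m])
            codebooks[i][n] -= alpha * (x - codebooks[i][n])
    else:
        # Step (5): a smaller pull of speaker j's nearest codes toward X.
        for x in X:
            m = nearest_code(x, codebooks[j])
            codebooks[j][m] += epsilon * alpha * (x - codebooks[j][m])
    return i

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [rng.normal(mu, 1.0, (200, 2)) for mu in (0.0, 0.5)]
    books = [t[rng.choice(200, 4, replace=False)].copy() for t in train]
    for _ in range(100):
        j = int(rng.integers(2))                          # step (1)
        X = train[j][rng.choice(200, 20, replace=False)]  # step (2), N = 20
        gvq_step(books, X, j)
```

Only misclassified groups that fall inside the window trigger the repulsive update of step (4), so the competing codebook is pushed away only where the two speakers are genuinely confusable.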
2. Enhancements
We propose the following steps to further enhance the VQ-based solution:
(1) a Heuristic Weighted Distance (HWD),
(2) a combination of HWD and GVQ, and
(3) a combination of PNDM and GVQ.
2.1 Heuristic Weighted Distance
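The PNDM weights are inversely proportional to the partition variances of the feature components, as shown in Equation (6). It has been shown that the variances of cepstral coefficients decrease as the coefficient index $i$ increases, so the PNDM weights tend to increase with $i$. Here $i$ is the vector element index, which reflects the frequency band: the higher the index, the smaller the feature value and its variance.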
We considered a Heuristic Weighted Distance defined as
$$d_{HWD}(x, c_{k,j}) = \sum_{i=1}^{D} w_i \, (x_i - c_{k,j,i})^2 \qquad (7)$$
The weights are calculated by
$$w_i = 1 + c(S,D) \cdot (i - 1), \qquad i = 1, \dots, D \qquad (8)$$
where $c(S,D)$ is a function of both the codebook size $S$ and the feature vector dimension $D$. For a given codebook, $S$ and $D$ are fixed, and thus $c(S,D)$ is a constant. The value of $c(S,D)$ is estimated experimentally by an exhaustive search for the value that achieves the maximum identification rate on a given sample test data set.
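A minimal sketch of the heuristic weights, the weighted distance of Equation (7), and the exhaustive search for $c(S,D)$. The linear form `w_i = 1 + c * (i - 1)` is an assumption about Equation (8) (the abstract states only that a linear formula lifts up the higher-order components), and `identification_rate` is a hypothetical callback that would train and test on the sample data set for one candidate value.

```python
import numpy as np

def hwd_weights(D: int, c: float) -> np.ndarray:
    """Assumed linear lifting: w_i = 1 + c * (i - 1) for i = 1..D."""
    return 1.0 + c * np.arange(D)

def hwd_distance(x, code, weights) -> float:
    """Equation (7): squared Euclidean distance with per-component weights."""
    return float(np.sum(weights * (np.asarray(x) - np.asarray(code)) ** 2))

def search_c(candidates, identification_rate):
    """Exhaustive search for c(S, D): keep the candidate with the highest rate.

    `identification_rate(c)` is a hypothetical caller-supplied function that
    trains codebooks of size S with weights hwd_weights(D, c) and returns the
    identification rate on the sample test data set.
    """
    return max(candidates, key=identification_rate)
```

Unlike the PNDM weights, these weights are shared by all speakers and partitions, so a single vector of length D is computed once per (S, D) setting.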
2.2 Combination of HWD and GVQ
The combination of HWD and GVQ is achieved simply by replacing the original squared Euclidean distance with the HWD of Equation (7), and by adjusting the GVQ updating parameter α when needed.
2.3 Combination of PNDM and GVQ
Combining PNDM with GVQ requires slightly more work, because GVQ alters the partitions and thus their component variances. We used the following algorithm to overcome this problem.
Algorithm to Combine PNDM with the GVQ Discriminative Training:
(1) Use the LBG algorithm to generate the initial LBG codebooks;
(2) Calculate the PNDM weights using the LBG codebooks, and produce PNDM-weighted LBG codebooks, which are the LBG codebooks with the PNDM weights appended;
(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes;
(4) Recalculate the PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.
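Step (4) is what distinguishes this combination from plain PNDM: once GVQ has moved the codes, the training vectors fall into different partitions, so every partition's weights must be recomputed. The NumPy sketch below shows that step for one speaker's codebook. Forming the new partitions with the current (old) weights, the variance floor, and letting partitions with fewer than two vectors keep their old weights are assumptions made here, not details given in the text.

```python
import numpy as np

def assign_partitions(X, codes, weights):
    """Nearest code for each row of X under the PNDM distance of Equation (2).

    `codes` and `weights` both have shape (S, D): one weight vector per partition.
    """
    d = (weights[None, :, :] * (X[:, None, :] - codes[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def recompute_pndm_weights(X, codes, weights, floor=1e-12):
    """Re-partition one speaker's training vectors X (n, D) against the
    GVQ-updated codes, then recompute the weights of every partition."""
    labels = assign_partitions(X, codes, weights)
    new_weights = weights.copy()
    for j in range(codes.shape[0]):
        members = X[labels == j]
        if len(members) < 2:  # too few vectors to estimate variances: keep old weights
            continue
        var = members.var(axis=0) + floor
        new_weights[j] = np.exp(np.mean(np.log(var))) / var  # Equation (6) form
    return new_weights
```

The variances are taken around each new partition's own mean, as in Equation (3), rather than around the GVQ-moved code, which after discriminative training no longer coincides with that mean.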
3. Experimental Comparison of VQ-based Algorithms
3.1 Testing Data and Procedures
The 168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2 and DR3 of the TRAIN section are used for estimating the c(S,D) parameter. Each speaker has 10 good-quality recordings at 16 kHz and 16 bits/sample, stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others can reproduce the environment without the additional complication of voice activity detection (VAD) algorithms and their parameters.
An MFCC program converts all the WAVE files in a directory into one feature vector file, in which every feature vector is indexed by its speaker and recording. For each value of the feature vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64, and to perform the identification test, respectively.
The MFCC feature vectors are calculated as follows:
1) divide the entire utterance into blocks of 512 samples with 256 samples of overlap;
2) perform pre-emphasis filtering with coefficient 0.97;
3) multiply by a Hamming window, and perform a short-time FFT;
4) apply the standard mel-frequency triangular filter bank to the squared magnitude of the FFT;
5) apply the logarithm to the summed output of each individual filter;
6) apply the DCT to the entire set of log filter outputs;
7) drop the zeroth coefficient, to produce the cepstral coefficients;
8) after all the blocks have been processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients;
9) calculate the first-order time derivatives of the cepstral coefficients, and concatenate them after the cepstral coefficients to form a feature vector. For example, a filter bank of size 16 produces 30-dimensional feature vectors (15 cepstral coefficients plus their 15 derivatives).
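The nine steps above can be sketched with NumPy alone. The mel-scale formula, the placement of the filter edges from 0 Hz to half the sampling rate, the unnormalized DCT-II, the small floor inside the logarithm, and the central-difference derivative (`np.gradient`) are common conventions assumed here; the text does not specify them. With 16 filters the output has 15 cepstral and 15 derivative components, matching the 30 dimensions quoted in step 9.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2 (assumed layout)."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising slope
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling slope
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc_features(signal, sr=16000, n_filters=16, block=512, hop=256, pre=0.97):
    # 1) blocks of 512 samples; a hop of 256 gives 256 samples of overlap
    n_blocks = 1 + (len(signal) - block) // hop
    idx = hop * np.arange(n_blocks)[:, None] + np.arange(block)[None, :]
    frames = np.asarray(signal, dtype=float)[idx]
    # 2) pre-emphasis y[n] = x[n] - 0.97 x[n-1], applied inside each block
    frames = np.concatenate([frames[:, :1], frames[:, 1:] - pre * frames[:, :-1]], axis=1)
    # 3) Hamming window, then short-time FFT
    spectrum = np.fft.rfft(frames * np.hamming(block), axis=1)
    # 4) mel filter bank applied to the squared magnitude
    energies = (np.abs(spectrum) ** 2) @ mel_filterbank(n_filters, block, sr).T
    # 5) log of each filter's summed output (floor avoids log(0))
    log_e = np.log(energies + 1e-10)
    # 6) DCT-II across all filter outputs (unnormalized)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * n[:, None] * (n[None, :] + 0.5) / n_filters)
    ceps = log_e @ dct.T
    # 7) drop the zeroth coefficient
    ceps = ceps[:, 1:]
    # 8) subtract the mean over the whole utterance
    ceps = ceps - ceps.mean(axis=0)
    # 9) first-order time derivative, appended after the cepstral coefficients
    delta = np.gradient(ceps, axis=0)
    return np.hstack([ceps, delta])
```

Changing `n_filters` is the only change needed to obtain the other dimensions tested (for example, 21 filters give D = 40).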
Due to project time constraints, the HWD parameter c(S,D) was estimated only at S = 16, 32, 64 and D = 40, 80, choosing the value that achieves the highest identification rate on the 190-speaker data set of the TRAIN section. For other values of S and