Enhanced VQ-based Speaker Recognition: Foreign Literature Translation (Word format .doc)

Word count:

English: 1869 words, 9708 characters;

Chinese: 3008 characters

Original English text:

Enhanced VQ-based Algorithms for Speech Independent Speaker Identification

Abstract

Weighted distance measure and discriminative training are two different approaches to enhance VQ-based solutions for speaker identification. To account for varying importance of the LPC coefficients in SV, the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called heuristic weighted distance, to lift up higher order MFCC feature vector components using a linear formula. Then it proposes two new algorithms combining the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using the Wavelet features tested in a similar setting.

1. Introduction

Vector quantization (VQ) based classification algorithms play an important role in speech independent speaker identification (SI) systems. Although in baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers simplicity in computation. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.

1.1 VQ Based Speaker Identification System

Fig. 1 shows the VQ based speaker identification system. It contains an offline training sub-system to produce VQ codebooks and an online testing sub-system to generate the identification decision. Both sub-systems contain a preprocessing or feature extraction module to convert an audio utterance into a set of feature vectors. Features of interest in the recent literature include the Mel-frequency cepstral coefficients (MFCC), the line spectral pairs (LSP), the Wavelet packet parameter (WPP), and PCA and ICA features. Although the WPP and ICA have been shown to offer advantages, we used MFCC in this paper to focus our attention on other modules of the system.

Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying a testing audio utterance, and an offline training sub-system, which uses training audio utterances to generate a codebook for each speaker in the database.

A VQ codebook normally consists of centroids of partitions over the speaker's feature vector space. The effects on SI of different partition clustering algorithms, such as the LBG and the RLS, have been studied. The average error, or distortion, of a sequence of T feature vectors with the codebook of speaker k is given by

(1)

d(.,.) is a distance function between two vectors. The j-th code of a codebook is a vector of dimension D. S is the codebook size. L is the total number of speakers in the database. The baseline VQ algorithm of SI simply uses the LBG to generate codebooks and the square of the Euclidean distance as d(.,.).
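As a minimal sketch of the baseline decision rule, assuming the codebooks have already been trained: each feature vector is matched to its nearest code under the squared Euclidean distance, the distances are averaged over the T vectors, and the speaker whose codebook gives the smallest average distortion is chosen. The function names and the toy numbers are illustrative and do not come from the paper.

```python
def sq_euclidean(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def average_distortion(features, codebook, dist=sq_euclidean):
    """Mean, over the T feature vectors, of the distance to the nearest code."""
    return sum(min(dist(x, c) for c in codebook) for x in features) / len(features)


def identify(features, codebooks, dist=sq_euclidean):
    """Return the speaker whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda k: average_distortion(features, codebooks[k], dist))


# Toy example (made-up numbers): L = 2 speakers, codebook size S = 2, dimension D = 2.
codebooks = {
    "spk_a": [[0.0, 0.0], [1.0, 1.0]],
    "spk_b": [[5.0, 5.0], [6.0, 6.0]],
}
utterance = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.2]]
best = identify(utterance, codebooks)
```

Passing the distance as a parameter keeps the decision rule unchanged when a weighted distance replaces the squared Euclidean one.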

Many improvements to the baseline VQ algorithm have been published. Among them, there are two independent approaches:

(1) choose a weighted distance function, such as the F-ratio and IHM weights, the Partition Normalized Distance Measure (PNDM), and the Bhattacharyya Distance;

(2) explore discrimination power of inter-speaker characteristics using the entire set of speakers, such as the Group Vector Quantization (GVQ) discriminative training, and the Speaker Discriminative Weighting.

Experimentally we have found that PNDM and GVQ are two very effective methods in each of the groups respectively.

1.2 Review of Partition Normalized Distance Measure

The Partition Normalized Distance Measure is defined as the square of the weighted Euclidean distance.

(2)

The weighting coefficients are determined by minimizing the average error of training utterances of all the speakers, subject to the constraint that the geometric mean of the weights for each partition is equal to 1.

Consider a random training feature vector of speaker k, which is assigned to partition j via the minimization process in Equation (1). It has mean and variance vectors:

（3）

The constrained optimization criterion to be minimized in order to derive the weights is

（4）

where L is the number of speakers, and S is the codebook size. Letting

and (5)

we have

and (6)

where subscript i is the feature vector component index, and k and j are speaker and partition indices respectively. Because k and j are in both sides of the equations, the weights are only dependent on the data from one partition of one speaker.
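One way to obtain weights with these properties can be sketched as follows. The sketch assumes that each weight is a scaled inverse variance of its feature component (Section 2.1 states that the PNDM weights are inversely proportional to the partition variances), with the scale chosen so that the geometric mean of the weights in the partition equals 1. The function names and sample vectors are illustrative only.

```python
import math


def pndm_weights(partition_vectors):
    """Weights for one partition of one speaker: w_i = lam / var_i, geometric mean 1.

    Assumes every component has non-zero variance within the partition.
    """
    n = len(partition_vectors)
    dim = len(partition_vectors[0])
    means = [sum(v[i] for v in partition_vectors) / n for i in range(dim)]
    variances = [
        sum((v[i] - means[i]) ** 2 for v in partition_vectors) / n for i in range(dim)
    ]
    # lam is the geometric mean of the variances, so the product of the weights is 1.
    lam = math.exp(sum(math.log(var) for var in variances) / dim)
    return [lam / var for var in variances]


def pndm_distance(x, code, weights):
    """Square of the weighted Euclidean distance between a vector and a code."""
    return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, code))


# Made-up vectors assigned to one partition (D = 3).
vectors = [[1.0, 10.0, 0.1], [2.0, 14.0, 0.3], [3.0, 12.0, 0.2], [2.0, 12.0, 0.2]]
w = pndm_weights(vectors)
```

Only the vectors of a single partition of a single speaker enter the computation, which matches the observation that the weights depend on the data from one partition of one speaker.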

1.3 Review of Group Vector Quantization

Discriminative training uses the data of all the speakers to train the codebook, so that it can achieve more accurate identification results by exploring the inter-speaker differences. The GVQ training algorithm is described as follows.

Group Vector Quantization Algorithm:

(1) Randomly choose a speaker j.

(2) Select N vectors

(3) Calculate the error for all the codebooks.

If the following conditions are satisfied, go to (4):

a) , but ;

b) , where W is a window size;

Else go to (5).

(4) For each

where

(5) For each ,

, where

2. Enhancements

We propose the following steps to further enhance the VQ based solution:

(1) a Heuristic Weighted Distance (HWD),

(2) combination of HWD and GVQ, and

(3) combination of PNDM and GVQ.

2.1 Heuristic Weighted Distance

The PNDM weights are inversely proportional to partition variances of the feature components, as shown in Equation (6). It has been shown that the variances of cepstral components depend on i, the vector element index, which reflects the frequency band: the higher the index, the smaller the feature value and its variance.

We considered a Heuristic Weighted Distance as

(7)

The weights are calculated by

(8)

where c(S,D) is a function of both the codebook size S and the feature vector dimension D. For a given codebook, S and D are fixed, and thus c(S,D) is a constant. The value of c(S,D) is estimated experimentally by performing an exhaustive search to achieve the maximum identification rate in a given sample test dataset.
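The exhaustive search for the constant can be sketched as below. This is an illustration under stated assumptions, not the paper's formula: the linear weight w_i = 1 + c * i is a hypothetical stand-in chosen only because the abstract describes the weighting as linear and as lifting higher order components, and the helper names, candidate values and toy data are likewise invented for the example.

```python
def hwd_weights(dim, c):
    """HYPOTHETICAL linear weights w_i = 1 + c * i, i = 0..dim-1 (an assumed form)."""
    return [1.0 + c * i for i in range(dim)]


def hwd(x, code, weights):
    """Weighted squared Euclidean distance between a vector and a code."""
    return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, code))


def avg_distortion(feats, codebook, weights):
    """Average over the feature vectors of the weighted distance to the nearest code."""
    return sum(min(hwd(x, code, weights) for code in codebook) for x in feats) / len(feats)


def identification_rate(utterances, codebooks, weights):
    """Fraction of (true speaker, feature vectors) pairs identified correctly."""
    correct = 0
    for true_speaker, feats in utterances:
        guess = min(codebooks, key=lambda k: avg_distortion(feats, codebooks[k], weights))
        if guess == true_speaker:
            correct += 1
    return correct / len(utterances)


def search_c(utterances, codebooks, dim, candidates):
    """Exhaustive search: the first candidate c with the highest identification rate."""
    return max(
        candidates,
        key=lambda c: identification_rate(utterances, codebooks, hwd_weights(dim, c)),
    )


# Toy data (made up): two speakers, one code each, D = 2.
codebooks = {"a": [[0.0, 0.0]], "b": [[1.0, 1.0]]}
utterances = [("a", [[0.8, 0.3]]), ("b", [[0.9, 0.9]])]
best_c = search_c(utterances, codebooks, dim=2, candidates=[0.0, 1.0, 2.0])
```

In the toy data, the unweighted distance (c = 0) misidentifies the first utterance, while giving more weight to the second component corrects it, so the search settles on the first candidate that reaches the best rate.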

2.2 Combination of HWD and GVQ

Combination of the HWD and the GVQ is achieved by simply replacing the original square of the Euclidean distance with the HWD of Equation (7), and adjusting the GVQ updating parameter α whenever needed.

2.3 Combination of PNDM and GVQ

Combining PNDM with the GVQ requires slightly more work, because the GVQ alters the partition and thus its component variances. We have used the following algorithm to overcome this problem.

Algorithm to Combine PNDM with the GVQ Discriminative Training:

(1) Use the LBG algorithm to generate initial LBG codebooks;

(2) Calculate PNDM weights using the LBG codebooks, and produce PNDM weighted LBG codebooks, which are LBG codebooks appended with the PNDM weights;

(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes;

(4) Recalculate PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.

3. Experimental Comparison of VQ-based Algorithms

3.1 Testing Data and Procedures

168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2, DR3 of the TRAIN section are used for estimating the c(S,D) parameter. Each speaker has 10 good quality recordings at 16 kHz, 16 bits/sample, stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others could reproduce the environment with no additional complication of VAD algorithms and their parameters.

An MFCC program converts all the WAVE files in a directory into one feature vector file, in which all the feature vectors are indexed with their speaker and recording. For each value of feature vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64, and to perform the identification test, respectively.

The MFCC feature vectors are calculated as follows:

1) divide the entire utterance into blocks of 512 samples with 256 overlapping;

2) perform pre-emphasis filtering with coefficient 0.97;

3) multiply with a Hamming window, and perform the short-time FFT;

4) apply the standard mel-frequency triangular filter banks to the square of the magnitude of the FFT;

5) apply the logarithm to the sum of all the outputs of each individual filter;

6) apply the DCT to the entire set of data resulting from all filters;

7) drop the zeroth coefficient, to produce the cepstral coefficients;

8) after all the blocks have been processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients;

9) calculate the 1st order time derivatives of the cepstral coefficients, and concatenate them after the cepstral coefficients, to form a feature vector. For example, a filter bank of size 16 will produce 30 dimensional feature vectors.
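The nine steps above can be sketched with NumPy as follows. Several details are assumptions rather than the paper's choices: the mel scale formula 2595 log10(1 + f/700), pre-emphasis applied within each block, an unnormalized DCT-II, a small floor inside the logarithm, and `np.gradient` as the first-order time derivative. The function names are illustrative.

```python
import numpy as np


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        bank[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        bank[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return bank


def mfcc_features(signal, sample_rate=16000, n_filters=16, frame_len=512, hop=256):
    """Return an array of shape (n_frames, 2 * (n_filters - 1))."""
    signal = np.asarray(signal, dtype=float)
    # 1) blocks of 512 samples with 256 overlapping
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]
    # 2) pre-emphasis with coefficient 0.97 (applied per block: an assumption)
    frames = np.concatenate([frames[:, :1], frames[:, 1:] - 0.97 * frames[:, :-1]], axis=1)
    # 3) Hamming window, then FFT of each block
    spectrum = np.fft.rfft(frames * np.hamming(frame_len), axis=1)
    # 4) mel triangular filters on the squared magnitude, summed per filter
    energies = (np.abs(spectrum) ** 2) @ mel_filterbank(n_filters, frame_len, sample_rate).T
    # 5) logarithm of each filter output (the 1e-10 floor is an assumption)
    log_energies = np.log(energies + 1e-10)
    # 6) DCT-II over the filter outputs (unnormalized: an assumption)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(n, n + 0.5) / n_filters)
    cepstra = log_energies @ dct.T
    # 7) drop the zeroth coefficient
    cepstra = cepstra[:, 1:]
    # 8) subtract the mean over the entire time duration
    cepstra = cepstra - cepstra.mean(axis=0)
    # 9) first-order time derivatives, concatenated after the cepstral coefficients
    deltas = np.gradient(cepstra, axis=0)
    return np.hstack([cepstra, deltas])


# One second of synthetic noise at 16 kHz stands in for a TIMIT recording.
rng = np.random.default_rng(0)
features = mfcc_features(rng.standard_normal(16000))
```

With a filter bank of size 16, the 15 cepstral coefficients and their 15 derivatives give the 30 dimensional feature vectors mentioned in step 9.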

Due to the project time constraint, the HWD parameter c(S,D) was estimated at S = 16, 32, 64 and D = 40, 80, so that it achieves the highest identification rate using the 190-speaker dataset of the TRAIN section. For other values of S and
