Enhanced VQ-based speaker identification: translated foreign-language paper (Word format).doc

Uploaded by: 聆听****声音  Document ID: 432566  Uploaded: 2023-04-28  Format: DOC  Pages: 15  Size: 769.21 KB
Word count: 1,869 English words (9,708 characters); 3,008 Chinese characters.

Foreign-language source:

Enhanced VQ-based Algorithms for Speech Independent Speaker Identification

Abstract. Weighted distance measure and discriminative training are two different approaches to enhance VQ-based solutions for speaker identification. To account for the varying importance of the LPC coefficients in SV, the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called heuristic weighted distance, to lift up higher order MFCC feature vector components using a linear formula. Then it proposes two new algorithms combining the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using the Wavelet features tested in a similar setting.

1. Introduction

Vector quantization (VQ) based classification algorithms play an important role in speech independent speaker identification (SI) systems. Although in its baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers simplicity in computation. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.

1.1 VQ Based Speaker Identification System

Fig. 1 shows the VQ based speaker identification system. It contains an offline training sub-system to produce VQ codebooks and an online testing sub-system to generate the identification decision. Both sub-systems contain a preprocessing or feature extraction module to convert an audio utterance into a set of feature vectors. Features of interest in the recent literature include the Mel-frequency cepstral coefficients (MFCC), the Line spectra pairs (LSP), the Wavelet packet parameter (WPP), or PCA and ICA features. Although the WPP and ICA have been shown to offer advantages, we used MFCC in this paper to focus our attention on other modules of the system.

Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying a testing audio utterance, and an offline training sub-system, which uses training audio utterances to generate a codebook for each speaker in the database.

A VQ codebook normally consists of centroids of partitions over a speaker's feature vector space. The effects on SI of different partition clustering algorithms, such as the LBG and the RLS, have been studied. The average error or distortion of the feature vectors $x_t$, $t = 1, \dots, T$, with a speaker k codebook is given by

$$E_k = \frac{1}{T}\sum_{t=1}^{T}\min_{1 \le j \le S} d(x_t, c_{k,j}), \qquad k = 1, \dots, L \tag{1}$$

d(.,.) is a distance function between two vectors. $c_{k,j}$ is the j-th code of dimension D. S is the codebook size. L is the total number of speakers in the database. The baseline VQ algorithm of SI simply uses the LBG to generate codebooks and the square of the Euclidean distance as the d(.,.).
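The baseline decision rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming each codebook is already trained and stored as an (S, D) array; the function names `average_distortion` and `identify` are hypothetical and not from the paper.

```python
import numpy as np

def average_distortion(features, codebook):
    """Mean over all T vectors of the squared Euclidean distance to the nearest code.

    features: array of shape (T, D); codebook: array of shape (S, D).
    """
    diff = features[:, None, :] - codebook[None, :, :]   # (T, S, D)
    sq_dist = np.sum(diff * diff, axis=2)                # (T, S)
    return float(np.mean(np.min(sq_dist, axis=1)))

def identify(features, codebooks):
    """Return the index of the speaker whose codebook gives the lowest distortion."""
    errors = [average_distortion(features, cb) for cb in codebooks]
    return int(np.argmin(errors)), errors
```

The broadcasted difference costs O(T·S·D) memory per speaker, which is fine for the codebook sizes discussed here (S up to 64) but may need chunking for very long utterances.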

Many improvements to the baseline VQ algorithm have been published. Among them, there are two independent approaches:

(1) choose a weighted distance function, such as the F-ratio and IHM weights, the Partition Normalized Distance Measure (PNDM), and the Bhattacharyya Distance;

(2) explore the discrimination power of inter-speaker characteristics using the entire set of speakers, such as the Group Vector Quantization (GVQ) discriminative training, and the Speaker Discriminative Weighting.

Experimentally we have found that PNDM and GVQ are two very effective methods in each of the groups respectively.

1.2 Review of Partition Normalized Distance Measure

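The Partition Normalized Distance Measure is defined as the square of the weighted Euclidean distance.

$$d_p(x, c_{k,j}) = \sum_{i=1}^{D} w_{k,j,i}\,(x_i - c_{k,j,i})^2 \tag{2}$$

The weighting coefficients $w_{k,j,i}$ are determined by minimizing the average error of training utterances of all the speakers, subject to the constraint that the geometric mean of the weights for each partition is equal to 1.

Let $x_{k,j}$ be a random training feature vector of speaker k, which is assigned to partition j via the minimization process in Equation (1). It has mean and variance vectors:

$$c_{k,j} = E[x_{k,j}], \qquad v_{k,j,i} = E\left[(x_{k,j,i} - c_{k,j,i})^2\right] \tag{3}$$

The constrained optimization criterion to be minimized in order to derive the weights is

$$\xi = \sum_{k=1}^{L}\sum_{j=1}^{S}\left[\sum_{i=1}^{D} w_{k,j,i}\,v_{k,j,i} - \lambda_{k,j}\left(\prod_{i=1}^{D} w_{k,j,i} - 1\right)\right] \tag{4}$$

where L is the number of speakers, S is the codebook size, and $\lambda_{k,j}$ is the Lagrange multiplier of the constraint for partition j of speaker k. Letting

$$\frac{\partial \xi}{\partial w_{k,j,i}} = 0 \quad \text{and} \quad \frac{\partial \xi}{\partial \lambda_{k,j}} = 0 \tag{5}$$

we have

$$\lambda_{k,j} = \left(\prod_{i=1}^{D} v_{k,j,i}\right)^{1/D} \quad \text{and} \quad w_{k,j,i} = \frac{\lambda_{k,j}}{v_{k,j,i}} \tag{6}$$

where the subscript i is the feature vector component index, and k and j are the speaker and partition indices respectively. Because k and j appear on both sides of the equations, the weights depend only on the data from one partition of one speaker.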
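Since the weights are inversely proportional to the per-component variances and their geometric mean is constrained to 1, each weight works out to the geometric mean of the partition's variances divided by that component's variance. The Python sketch below illustrates this for a single partition of a single speaker. The function names and the small `eps` variance floor are assumptions added for illustration, not part of the paper.

```python
import numpy as np

def pndm_weights(partition_vectors, eps=1e-12):
    """Weights for one partition: geometric mean of the variances over each variance.

    partition_vectors: array of shape (n, D) holding the training vectors that
    were assigned to this partition. eps is an assumed floor that avoids
    division by zero for a constant component.
    """
    var = np.var(partition_vectors, axis=0) + eps
    geo_mean = np.exp(np.mean(np.log(var)))
    return geo_mean / var

def pndm_distance(x, code, weights):
    """Square of the weighted Euclidean distance between x and one code."""
    diff = x - code
    return float(np.sum(weights * diff * diff))
```

Working in the log domain for the geometric mean avoids overflow or underflow when D is large (the experiments use D up to 90).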

1.3 Review of Group Vector Quantization

Discriminative training uses the data of all the speakers to train the codebook, so that it can achieve more accurate identification results by exploiting the inter-speaker differences. The GVQ training algorithm is described as follows.

Group Vector Quantization Algorithm:

(1) Randomly choose a speaker j.

(2) Select N vectors […]

(3) Calculate the error […] for all the codebooks.

If the following conditions are satisfied, go to (4):

a) […], but […];

b) […], where W is a window size;

Else go to (5).

(4) For each […]

[…] where […]

(5) For each […],

[…], where […]

2. Enhancements

We propose the following steps to further enhance the VQ based solution:

(1) a Heuristic Weighted Distance (HWD),

(2) combination of HWD and GVQ, and

(3) combination of PNDM and GVQ.

2.1 Heuristic Weighted Distance

The PNDM weights are inversely proportional to partition variances of the feature components, as shown in Equation (6). It has been shown that variances of cepstral […]. Clearly […], where i is the vector element index, which reflects the frequency band. The higher the index, the smaller the feature value and its variance.

We considered a Heuristic Weighted Distance as

$$d_h(x, c_{k,j}) = \sum_{i=1}^{D} w_i\,(x_i - c_{k,j,i})^2 \tag{7}$$

The weights are calculated by

[…] (8)

where c(S,D) is a function of both the codebook size S and the feature vector dimension D. For a given codebook, S and D are fixed, and thus c(S,D) is a constant. The value of c(S,D) is estimated experimentally by performing an exhaustive search to achieve the maximum identification rate in a given sample test dataset.
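The idea of lifting higher-order components with a linear formula, and of tuning the constant by exhaustive search, can be sketched in Python as follows. The specific weight form used here, $w_i = 1 + c\,(i - 1)$, is an assumption chosen only because it is linear in the component index and reduces to the plain squared Euclidean distance at c = 0; it is not taken from the paper. The function names, the candidate grid, and the `score_fn` callback are likewise hypothetical.

```python
import numpy as np

def hwd_weights(dim, c):
    """ASSUMED linear weights: 1, 1 + c, 1 + 2c, ... for components 1..dim."""
    return 1.0 + c * np.arange(dim)

def hwd(x, code, weights):
    """Weighted squared Euclidean distance with index-dependent weights."""
    diff = x - code
    return float(np.sum(weights * diff * diff))

def search_c(candidates, score_fn):
    """Exhaustive search: keep the candidate with the highest identification rate.

    score_fn(c) is expected to train/test with the given c and return the
    identification rate on a held-out sample set (supplied by the caller).
    """
    best_c, best_rate = None, float("-inf")
    for c in candidates:
        rate = score_fn(c)
        if rate > best_rate:
            best_c, best_rate = c, rate
    return best_c, best_rate
```

Because the weights depend only on the component index, a single weight vector serves every codebook with the same S and D, in contrast to the per-partition weights of PNDM.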

2.2 Combination of HWD and GVQ

Combination of the HWD and the GVQ is achieved by simply replacing the original square of the Euclidean distance with the HWD of Equation (7), and adjusting the GVQ updating parameter α whenever needed.

2.3 Combination of PNDM and GVQ

Combining PNDM with the GVQ requires slightly more work, because the GVQ alters the partition and thus its component variance. We have used the following algorithm to overcome this problem.

Algorithm to Combine PNDM with the GVQ Discriminative Training:

(1) Use the LBG algorithm to generate initial LBG codebooks;

(2) Calculate PNDM weights using the LBG codebooks, and produce PNDM weighted LBG codebooks, which are LBG codebooks appended with the PNDM weights;

(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes;

(4) Recalculate PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.

3. Experimental Comparison of VQ-based Algorithms

3.1 Testing Data and Procedures

168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2, DR3 of the TRAIN section are used for estimating the c(S,D) parameter. Each speaker has 10 good quality recordings at 16 kHz, 16 bits/sample, stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others could reproduce the environment without the additional complication of VAD algorithms and their parameters.

An MFCC program converts all the WAVE files in a directory into one feature vector file, in which all the feature vectors are indexed with their speaker and recording. For each value of the feature vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64, and to perform the identification test, respectively.

The MFCC feature vectors are calculated as follows:

1) divide the entire utterance into blocks of 512 samples with 256 samples of overlap;

2) perform pre-emphasis filtering with coefficient 0.97;

3) multiply with a Hamming window, and perform the short-time FFT;

4) apply the standard mel-frequency triangular filter banks to the square of the magnitude of the FFT;

5) apply the logarithm to the sum of all the outputs of each individual filter;

6) apply the DCT to the entire set of data resulting from all filters;

7) drop the zeroth coefficient, to produce the cepstral coefficients;

8) after all the blocks have been processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients;

9) calculate the 1st order time derivatives of the cepstral coefficients, and concatenate them after the cepstral coefficients, to form a feature vector. For example, a filter bank of size 16 will produce 30-dimensional feature vectors.
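The nine steps above can be sketched with NumPy as shown below. This is an illustration under stated assumptions rather than the authors' program: the mel-scale formula (2595·log10(1 + f/700)), the unnormalized DCT-II, the use of `np.gradient` for the first-order time derivative, and the 1e-10 floor inside the logarithm are all choices made here, and the function names are hypothetical.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on an assumed mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                      # rising edge
            bank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                     # falling edge
            bank[m - 1, k] = (right - k) / max(right - center, 1)
    return bank

def mfcc_features(signal, sample_rate=16000, n_filters=16,
                  frame=512, hop=256, preemph=0.97):
    # 1) blocks of `frame` samples, advancing by `hop` samples
    n_frames = 1 + (len(signal) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    blocks = np.asarray(signal, dtype=float)[idx]
    # 2) pre-emphasis applied within each block
    blocks[:, 1:] = blocks[:, 1:] - preemph * blocks[:, :-1]
    # 3) Hamming window and short-time FFT
    blocks = blocks * np.hamming(frame)
    power = np.abs(np.fft.rfft(blocks, n=frame, axis=1)) ** 2
    # 4-5) mel filter bank on the power spectrum, then log of each filter output
    bank = mel_filterbank(n_filters, frame, sample_rate)
    log_energy = np.log(power @ bank.T + 1e-10)            # floor is an assumption
    # 6) unnormalized DCT-II across the filter outputs
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(n, n + 0.5) / n_filters)
    cep = log_energy @ dct.T
    # 7) drop the zeroth coefficient
    cep = cep[:, 1:]
    # 8) subtract the mean over the whole utterance
    cep = cep - cep.mean(axis=0)
    # 9) first-order time derivative (central differences), concatenated
    delta = np.gradient(cep, axis=0)
    return np.hstack([cep, delta])
```

With `n_filters=16` the DCT yields 16 coefficients, dropping the zeroth leaves 15, and appending the 15 derivatives gives the 30-dimensional vectors mentioned in step 9. The signal must cover at least two blocks for the derivative to be defined.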

Due to project time constraints, the HWD parameter c(S,D) was estimated at S = 16, 32, 64 and D = 40, 80, so that it achieves the highest identification rate using the 190-speaker dataset of the TRAIN section. For other values of S and
