Gentoo Sed by example Part 3
Disclaimer:
The original version of this article was first published on IBM developerWorks, and is property of Westtech Information Services. This document is an updated version of the original article, and contains various improvements made by the Gentoo Linux Documentation team.
This document is not actively maintained.
Sed by example, Part 3
1. Taking it to the next level: Data crunching, sed style
Muscular sed
In my second sed article, I offered examples that demonstrated how sed works, but very few of these examples actually did anything particularly useful. In this final sed article, it's time to change that pattern and put sed to good use. I'll show you several excellent examples that not only demonstrate the power of sed, but also do some really neat (and handy) things. For example, in the second half of the article, I'll show you how I designed a sed script that converts a .QIF file from Intuit's Quicken financial program into a nicely formatted text file. Before doing that, we'll take a look at some less complicated yet useful sed scripts.
Text translation
Our first practical script converts UNIX-style text to DOS/Windows format. As you probably know, DOS/Windows-based text files have a CR (carriage return) and LF (line feed) at the end of each line, while UNIX text has only a line feed. There may be times when you need to move some UNIX text to a Windows system, and this script will perform the necessary format conversion for you.
Code Listing 1.1: Format conversion between UNIX and Windows
$ sed -e 's/$/\r/' myunix.txt > mydos.txt
In this script, the '$' regular expression will match the end of the line, and the '\r' tells sed to insert a carriage return right before it. Insert a carriage return before a line feed, and presto, a CR/LF ends each line. Please note that the '\r' will be replaced with a CR only when using GNU sed 3.02.80 or later. If you haven't installed GNU sed 3.02.80 yet, see my first sed article for instructions on how to do this.
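If you want to confirm that the carriage returns really made it into the output, looking at the raw bytes is the easiest way. Here is a minimal sketch, assuming GNU sed (for '\r') and the standard od utility; the unix_to_dos function name is made up for this illustration and is not part of the original script.

```shell
# Hypothetical helper: read LF-terminated text on stdin, write CR/LF text on stdout.
# Assumes GNU sed, which expands '\r' in the replacement to a carriage return.
unix_to_dos() {
    sed -e 's/$/\r/'
}

# Convert a two-line UNIX sample and dump the result character by character.
# Every line in the dump should now end in \r followed by \n.
printf 'foo\nbar\n' | unix_to_dos | od -c
```

Wrapping the command in a function is only a convenience here; redirecting files as in the listing above works just as well.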
I can't tell you how many times I've downloaded some example script or C code, only to find that it's in DOS/Windows format. While many programs don't mind DOS/Windows format CR/LF text files, several programs definitely do -- the most notable being bash, which chokes as soon as it encounters a carriage return. The following sed invocation will convert DOS/Windows format text to trusty UNIX format:
Code Listing 1.2: Converting C code from Windows to UNIX format
$ sed -e 's/.$//' mydos.txt > myunix.txt
The way this script works is simple: our substitution regular expression matches the last character on the line, which happens to be a carriage return. We replace it with nothing, causing it to be deleted from the output entirely. If you use this script and notice that the last character of every line of the output has been deleted, you've specified a text file that's already in UNIX format. No need for that!
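If you are not sure which format a file is in, a slightly more cautious variant can help. The following is a sketch rather than the script from the listing: it assumes GNU sed, which understands '\r' inside a regular expression, and it removes the last character only when that character really is a carriage return. The dos_to_unix name is hypothetical.

```shell
# Hypothetical helper: strip a trailing CR from each line, but only if one is there.
# Assumes GNU sed, which accepts '\r' in the regular expression.
dos_to_unix() {
    sed -e 's/\r$//'
}

# DOS input: the carriage returns are removed.
printf 'foo\r\nbar\r\n' | dos_to_unix

# UNIX input: nothing matches, so the text passes through untouched.
printf 'foo\nbar\n' | dos_to_unix
```

The trade-off is portability: 's/.$//' works with any sed, while this variant depends on the GNU '\r' extension.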
Reversing lines
Here's another handy little script. This one will reverse lines in a file, similar to the "tac" command that's included with most Linux distributions. The name "tac" may be a bit misleading, because "tac" doesn't reverse the position of characters on the line (left and right), but rather the position of lines in the file (up and down). Tac-ing the following file:
Code Listing 1.3: Sample file
foo
bar
oni
...produces the following output:
Code Listing 1.4: Output file
oni
bar
foo
We can do the same thing with the following sed script:
Code Listing 1.5: Doing same with script
$ sed -e '1!G;h;$!d' forward.txt > backward.txt
You'll find this sed script useful if you're logged into a FreeBSD system, which doesn't happen to have a "tac" command. While handy, it's also a good idea to know why this script does what it does. Let's dissect it.
Reversal explained
First, this script contains three separate sed commands, separated by semicolons: '1!G', 'h' and '$!d'. Now, it's time to get a good understanding of the addresses used for the first and third commands. If the first command were '1G', the 'G' command would be applied only to the first line. However, there is an additional '!' character -- this '!' character negates the address, meaning that the 'G' command will apply to all but the first line. For the '$!d' command, we have a similar situation. If the command were '$d', it would apply the 'd' command to only the last line in the file (the '$' address is a simple way of specifying the last line). However, with the '!', '$!d' will apply the 'd' command to all but the last line. Now, all we need to do is understand what the commands themselves do.
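Address negation is easy to try out on its own before mixing it with the hold space. Here is a small sketch using the same three-line sample; the sample function is just a convenience invented for this example.

```shell
# Hypothetical helper that prints the three-line sample file.
sample() {
    printf 'foo\nbar\noni\n'
}

sample | sed -e '1d'     # deletes only line 1, leaving "bar" and "oni"
sample | sed -e '1!d'    # deletes everything except line 1, leaving "foo"
sample | sed -e '$!d'    # deletes everything except the last line, leaving "oni"
```

Note the single quotes around each script: they keep the shell from touching the '!' and '$' characters.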
When we execute our line reversal script on the text file above, the first command that gets executed is 'h' (the '1!G' command is skipped for line one). This command tells sed to copy the contents of the pattern space (the buffer that holds the current line being worked on) to the hold space (a temporary buffer). Then, the 'd' command is executed, which deletes "foo" from the pattern space, so it doesn't get printed after all the commands are executed for this line.
Now, line two. After "bar" is read into the pattern space, the 'G' command is executed, which appends the contents of the hold space ("foo\n") to the pattern space ("bar\n"), resulting in "bar\nfoo\n" in our pattern space. The 'h' command puts this back in the hold space for safekeeping, and 'd' deletes the line from the pattern space so that it isn't printed.
For the last "oni" line, the same steps are repeated, except that the contents of the pattern space aren't deleted (due to the '$!' before the 'd'), and the contents of the pattern space (three lines) are printed to stdout.
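The walkthrough above can also be written down as a commented script, one command per line. This is only a sketch of the same one-liner spread out for readability; the reverse_lines function name is hypothetical.

```shell
# Hypothetical helper: reverse the order of lines read from stdin.
reverse_lines() {
    sed -e '
        # All lines but the first: append the hold space to the pattern space.
        1!G
        # Copy the pattern space (the lines reversed so far) to the hold space.
        h
        # All lines but the last: delete the pattern space so nothing is printed.
        $!d
    '
}

# On the last line nothing is deleted, so the accumulated text is printed:
# oni, bar, foo.
printf 'foo\nbar\noni\n' | reverse_lines
```

Keep in mind that the whole file ends up in the hold space, so this approach is best suited to files that fit comfortably in memory.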
Now, it's time to do some powerful data conversion with sed.
sed QIF magic
For the last few weeks, I've been thinking about purchasing a copy of Quicken to balance my bank accounts. Quicken is a very nice financial program, and would certainly perform the job with flying colors. But, after thinking about it, I decided that I could easily write some software that would balance my checkbook. After all, I reasoned, I'm a software developer!
I developed a nice little checkbook balancing program (using awk) that calculates my balance by parsing a text file containing all my transactions. After a bit of tweaking, I improved it so that I could keep track of different credit and debit categories, just like Quicken can. But, there was one more feature I wanted to add. I recently switched my accounts to a bank that has an online Web account interface. One day, I noticed that my bank's Web site allowed me to download my account information in Quicken's .QIF format. In very little time, I decided that it would be really neat if I could convert this information into text format.
A tale of two formats
Before we look at the QIF format, here's what my checkbook.txt format looks like:
Code Listing 1.6: Sample of checkbook.txt format
28 Aug 2000	food	-	-	Y	Supermarket	30.94
25 Aug 2000	watr	-	103	Y	Check 103	52.86
In my file, all fields are separated by one or more tabs, with one transaction per line. After the date, the next field lists the type of expense (or "-" if this is an income item). The third field lists the type of income (or "-" if this is an expense item). Then, there's a check number field (again, "-" if empty), a transaction cleared field ("Y" or "N"), a comment and a dollar amount. Now, we're ready to take a look at the QIF format. When I viewed my downloaded QIF file in a text viewer, this is what I saw:
Code Listing 1.7: Malformed file output
!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/28/2000
T-8.25
N
PCHECKCARD PUNJAB RESTAURANT
^
D08/28/2000
T-17.17
N
PCHECKCARD SUPERMARKET
After scanning the file, it wasn't very hard to figure out the format -- ignoring the first line, the format is as follows:
Code Listing 1.8: File format
D<date>
T<transaction amount>
N<check number>
P<description>
^ (this is the field separator)
Starting the process
When you're tackling a significant sed project like this, don't get discouraged -- sed allows you to gradually massage the data into its final form. As you progress, you can continue to refine your sed script until your output appears exactly as intended. You don't need to get it exactly right on the first try.
To start off, I created a file called qiftrans.sed, and started massaging the data:
Code Listing 1.9: qiftrans.sed
1d
/^^/d
s/[[:cntrl:]]//g
The first '1d' command deletes the first line, and the second command removes those pesky '^' characters from the output. The last line removes any control characters that may exist in the file. Since I'm dealing with a foreign file format, I want to eliminate the risk of encountering any control characters along the way. So far, so good. Now, it's time to add some processing punch to this basic script:
Code Listing 1.10: Improved basic script
1d
/^^/d
s/[[:cntrl:]]//g
/^D/{
s/^D\(.*\)/\1\tOUTY\tINNY\t/
s/^01/Jan/
s/^02/Feb/
s/^03/Mar/
s/^04/Apr/
s/^05/May/
s/^06/Jun/
s/^07/Jul/
s/^08/Aug/
s/^09/Sep/
s/^10/Oct/
s/^11/Nov/
s/^12/Dec/
s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
}
First, I add a '/^D/' address so that sed will only begin processing when it encounters the first character of the QIF date field, 'D'. All of the commands in the curly braces will execute in order as soon as sed reads such a line into its pattern space.
The first line in the curly braces will transform a line that looks like:
Code Listing 1.11: First line before change
D08/28/2000
into one that looks like this:
Code Listing 1.12: First line after change
08/28/2000	OUTY	INNY	
Of course, this format isn't perfect right now, but that's OK. We'll gradually refine the contents of the pattern space as we go. The next 12 lines have the net effect of transforming the month to a three-letter abbreviation, with the last line removing the two slashes from the date and putting the day in front of the month.
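To watch the date handling at work, the script so far can be fed a single QIF record. The following is a sketch under a few assumptions: it needs GNU sed (for '\t' in the replacement), the qif_dates function name is made up, and only the August substitution is included to keep the example short.

```shell
# Hypothetical helper: apply the script so far to QIF text on stdin.
# Only the 08 -> Aug substitution is shown; the other 11 months are omitted here.
qif_dates() {
    sed -e '
        1d
        /^^/d
        s/[[:cntrl:]]//g
        /^D/{
            s/^D\(.*\)/\1\tOUTY\tINNY\t/
            s/^08/Aug/
            s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
        }
    '
}

# One record from the sample file. The header line and the "^" separator are
# dropped, and the date line becomes "28 Aug 2000" followed by the
# tab-separated OUTY and INNY placeholders.
printf '!Type:Bank\nD08/28/2000\nT-8.15\nN\nPCHECKCARD SUPERMARKET\n^\n' | qif_dates
```

The T, N and P lines pass through unchanged at this stage, since nothing in the script addresses them yet.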