I claim to be a `sed` lover: not a guru (generally a bad term), not a geek. Since seders is a closed private group (stupid, huh), I'd like to post something interesting and useful here.
The tutorials I know of, or could find with Google, all have silly examples and suck. Let's do a HOWTO for real-life stuff.
I left some comments buried somewhere here about useful optimizations and tricks; now here's another one, and more.
In this issue:
* changing all but last occurrence (persson)
* stripping slack from all php, htm*, css, js, xml on a production-level webserver (Julius Thyssen)
* per-line transformations of blocks (David Esterkin, alhajaj), or I'm the best (-:
Message-ID: <8499950a0804221341m638f53b5t770cbc594433d437@mail.gmail.com>
Date: Tue, 22 Apr 2008 21:41:44 +0100
From: "Oleg Verych" <olecom@gmail.com>
To: "sed users" <sed-users@yahoogroups.com>
Subject: Re: Changing all but last occurrence

persson @ Tue, Apr 22, 2008 at 6:18 PM:
> > Why do you need this? And why you don't try to code anything yourself?
>
> It's precisely because I tried and can't see a way to accomplish the goal
> using sed alone that I'm asking here. I already know how to do those
> things using other tools or sed + other tools, but not with sed alone,
> so I'm asking whether that is possible at all or I'm just missing
> something.

Anyway a working showcase of what you want is always better (if you
have it), than words and vague descriptions.

> > > - match/change all but last n occurrences of RE *in a file*, RE
> > > occurs at
> > >
> > > most once per input line;
> > [...]
> > > many times per input line;
> >
> > If operating whole file, number of per-line occurrences doesn't
> > matter.
>
> Can you elaborate on this?

Here's what i think about this task. Curiosity can elaborate it.

ftp://flower.upol.cz/dts/sed0000_var/all_but.sed.sh

$ sh all_but.sed.sh
$ sh all_but.sed.sh a 22
$ sh all_but.sed.sh l
$ sh all_but.sed.sh l 22
--
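For the simplest reading of the task (change every occurrence of an RE in a file except the last one), here is a minimal sketch. It is not the contents of all_but.sed.sh: the function name `all_but_last` and the `foo`/`bar` pattern and replacement are made up for illustration. It slurps the whole file into the pattern space, then keeps replacing the leftmost `foo` for as long as another `foo` follows it, which is why the number of occurrences per line doesn't matter. It assumes the replacement never matches the RE itself (otherwise the loop would not end) and that the file fits in memory.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only:
# replace every "foo" in the input with "bar", except the last one.
all_but_last() {
    sed '
# gather the whole file into the pattern space
:slurp
$!{
N
b slurp
}
# replace the leftmost foo while at least one more foo follows it
:again
s/foo\(.*foo\)/bar\1/
t again
'
}

printf 'foo 1\nx foo foo\nfoo end\n' | all_but_last
```

The loop rescans the buffer after every substitution, so this is a readable sketch and not a fast one.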
Stripping slack from all php, htm*, js, xml on a webserver
Don't want to read? Here's the code: ftp://flower.upol.cz/dts/sed0000_var/strip_html.sh
Some rants.
Something that I agree with.
multi-line and not multiline
ftp://flower.upol.cz/dts/sed0000_var/blocks.sed.sh
Message-ID: <8499950a0805211159q30f83efxbcb1e3178956edee@mail.gmail.com>
Date: Wed, 21 May 2008 19:59:11 +0100
From: "Oleg Verych"
Subject: multi-line and not multiline

-- input --
#100
ADD some/file/path
MODIFY diff/file/path
BLANK
#104
MODIFY /another/modified/file/path
DEL /a/deleted/file/path
MODIFY /one/more/file/path
BLANK
...

-- output --
100 ADD some/file/path
100 MODIFY diff/file/path
104 MODIFY /another/modified/file/path
104 DEL /a/deleted/file/path
104 MODIFY /one/more/file/path

== proposition ==

Multi-line processing doesn't mean one cannot change blocks
line by line. This doesn't require `sed`'s multi-line tools (N, P).
You just need to save the number in the hold buffer at block start and
insert it on the other lines, line by line, until the end.

[...]

-- input --
void methodA()
{
doIt
}
void methodB()
{
doSomeThingkElse
}

-- output --
void methodA()
{
throw Exception
doIt
}
void methodB()
{
throw Exception
doSomeThingkElse
}

== proposition ==

This is also work on a block. Note: not on the whole file, with the job
done only on the final line. Just put your condition, with the needed
text processing, between block start and end. Something like

sed '
/^{/,/^}/{
  /condition/s-RE-placement-
}'

However, if the condition is a line number inside the block, then
something else is required. In the example above it is the first line:

sed '
/^{/{
p
i\
placement
d
}'

Or something like that. NOTE: scripts were just typed in gmail.
--
sed 'sed && sh + olecom = love' << ''
-o--=O`C
 #oo'L O
<___=E M

Message-ID: <8499950a0805220316t2a5c3770rbf712e5c57654f1e@mail.gmail.com>
Date: Thu, 22 May 2008 11:16:31 +0100
From: "Oleg Verych"
To: sed-users
Subject: Re: optimization Re: multi-line and not multiline

> And after something works, it's time for optimizations.

OK, thanks to gudermez, and since I'm at a shell now, an actual check and
run can be done. It turns out that the script that is both correct and
optimized actually uses N, but only for speed. My first two scripts were
written with wrong hopes about the hold buffer.

Anyway, now it's correct and even more optimized than I expected. Also,
input to my script can be more human-readable, i.e. there can be
blank lines after blocks. Finally, empty blocks are also handled as
valid input.

== possible input ==
#000
BLANK
#100
ADD some/file/path
MODIFY diff/file/path
BLANK
#104
MODIFY /another/modified/file/path
DEL /a/deleted/file/path
MODIFY /one/more/file/path
BLANK
#177
MODIFY /77another/modified/file/path
DEL /77a/deleted/file/path
MODIFY /77one/more/file/path
BLANK
~

== benchmark results ==
olecom@flower$ du -h blocks.txt
21M     blocks.txt
olecom@flower$ time sh blocks.sed.sh olecom <blocks.txt >/dev/null
olecom

real    0m1.674s
user    0m1.656s
sys     0m0.016s
olecom@flower$ time sh blocks.sed.sh gudermez <blocks.txt >/dev/null
gudermez

real    0m8.453s
user    0m8.441s
sys     0m0.016s

== script ==
olecom(){
sed -n '
/^#/{
s-#--
h
:_append
N
/BLANK$/d
/\n$/d
s`\n` `
p
g
b_append
}'
}

gudermez(){
sed -e '
/^#[1-9][0-9]*$/{
s/.//
h
d
}
/^BLANK$/d
G
s/^\(.*\)\n\(.*\)/\2 \1/
'
}

echo "$1" >&2
$1
exit

== checking output ==
olecom@flower$ sh blocks.sed.sh gudermez <blocks.txt | sed '10q' >g
gudermez
olecom@flower$ sh blocks.sed.sh olecom <blocks.txt | sed '10q' >o
olecom
olecom@flower$ diff g o
olecom@flower$ sed '' <o
100 ADD some/file/path
100 MODIFY diff/file/path
104 MODIFY /another/modified/file/path
104 DEL /a/deleted/file/path
104 MODIFY /one/more/file/path
177 MODIFY /77another/modified/file/path
177 DEL /77a/deleted/file/path
177 MODIFY /77one/more/file/path
100 ADD some/file/path
100 MODIFY diff/file/path
olecom@flower$
--
sed 'sed && sh + olecom = love' << ''
-o--=O`C
 #oo'L O
<___=E M

persson's results
Message-ID: <8499950a0805230940r7075a212o1e4ec1a706e275d5@mail.gmail.com>
Date: Fri, 23 May 2008 17:40:39 +0100
From: "Oleg Verych"
Subject: Re: optimization Re: multi-line and not multiline

Alright. Here we go.

== persson ==
olecom@flower:/tmp$ time <blocks.txt sed '/^BLANK$/d
/^#[0-9]\{1,\}/{s/^#//;h;d}
{G;s/\(.*\)\n\(.*\)/\2 \1/}' >/dev/null

real    0m8.406s
user    0m8.397s
sys     0m0.012s
olecom@flower$

== other benchmark results ==
>
> olecom@flower$ du -h blocks.txt
> 21M     blocks.txt
> olecom@flower$ time sh blocks.sed.sh olecom <blocks.txt >/dev/null
> olecom
>
> real    0m1.674s
> user    0m1.656s
> sys     0m0.016s
>
> olecom@flower$ time sh blocks.sed.sh gudermez <blocks.txt >/dev/null
> gudermez
>
> real    0m8.453s
> user    0m8.441s
> sys     0m0.016s
>
_____
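Going back to the methodA/methodB example from the first message (where the scripts were only typed into gmail and never run), here is a small sketch that can actually be run. It uses the plain `a\` (append) command on the line that opens a block, which is a simpler variant of the `p` / `i\` / `d` idea and not the script from that message. The function name `add_throw` is made up for illustration, and it assumes the opening brace sits alone on its line, as in the sample input.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only:
# append "throw Exception" right after every line that is exactly "{".
add_throw() {
    sed '
/^{$/a\
throw Exception
'
}

printf 'void methodA()\n{\ndoIt\n}\nvoid methodB()\n{\ndoSomeThingkElse\n}\n' | add_throw
```

No hold buffer and no N are needed here, because the inserted text does not depend on anything remembered from earlier lines.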