Matt LaPlante reported that there are currently 151,809 bytes of trailing white space in the Linux kernel, requiring a 15-megabyte patch to remove it all. Andi Kleen argued that the white space didn't much matter, "you don't actually save anything on disk on most file systems (essentially everything except reiserfs on current Linux) because all files are rounded to block size (normally 4K). Same in page cache. And in tar files bzip2/gzip is very good at compacting them."
Andi went on to add that it's an issue that is slowly solving itself, "many kernel maintainers automatically remove trailing white space on any new lines these days. So as the kernel keeps changing it should eventually all disappear; except on essentially dead code." Pádraig Brady confirmed that things are naturally improving over time, as a similar report in 2001 found 224,654 bytes of trailing whitespace in the Linux kernel.
From: Matt LaPlante <kernel1@...>
Subject: A little coding style nugget of joy
Date: Sep 19, 12:34 pm 2007
Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
Bytes saved by removing said whitespace: 151809
Lines in the (unified) diff: 455437
Size of the diff: 15M
People brave enough to submit the patch: ~0
Take care. :)
-
Matt
-
From: Andi Kleen <andi@...>
Subject: Re: A little coding style nugget of joy
Date: Sep 19, 1:13 pm 2007
Matt LaPlante <kernel1@cyberdogtech.com> writes:
> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
>
> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
> Bytes saved by removing said whitespace: 151809
You don't actually save anything on disk on most file systems
(essentially everything except reiserfs on current Linux)
because all files are rounded to block size (normally 4K)
Same in page cache.
And in tar files bzip2/gzip is very good at compacting them.
> Lines in the (unified) diff: 455437
> Size of the diff: 15M
> People brave enough to submit the patch: ~0
Many kernel maintainers automatically remove trailing white space on any new
lines these days. So as the kernel keeps changing it should eventually all
disappear; except on essentially dead code.
-Andi
-
From: Pádraig Brady <P@...>
Subject: Re: A little coding style nugget of joy
Date: Sep 20, 5:20 am 2007
Matt LaPlante wrote:
> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
>
> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
> Bytes saved by removing said whitespace: 151809
> Lines in the (unified) diff: 455437
> Size of the diff: 15M
> People brave enough to submit the patch: ~0
It's gradually getting better so:
http://lwn.net/2001/1129/a/whitespace.php3
-
I don't get the "rounding" argument
If the files are rounded to block size (4k), then the majority of them will not be increased by adding a few white spaces, but some of them will be increased by 4k instead of a few bytes. On average, it should balance out.
Except, of course, if the typical size of a file is well below 4k. Did I get something wrong?
Correct
You're right. Imagine that the sizes are evenly distributed in modulo-4000. That means that every 4000 whitespaces, one is causing an extra 4k block to be used. So the loss of space on disk is close to the number of whitespaces.
Now in reality it is not evenly distributed (even multiple whitespaces on the same line) and the real savings might be more or less than the amount given, but still considerable. So I don't see why a single patch could not be applied.
It won't make a difference for compressed tarballs obviously.
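The averaging argument above can be checked numerically. What follows is only a rough sketch: the 4096-byte block size, the number of files, and the uniformly random file sizes and padding amounts are all assumptions made for illustration, not figures from the thread, and `blocks_used` and `extra_bytes_on_disk` are hypothetical helper names.

```python
import random

BLOCK = 4096  # assumed file system block size in bytes


def blocks_used(size: int, block: int = BLOCK) -> int:
    """Number of whole blocks needed to store `size` bytes."""
    return (size + block - 1) // block


def extra_bytes_on_disk(sizes, padding, block: int = BLOCK) -> int:
    """Extra on-disk bytes caused by adding padding[i] bytes to file i."""
    return sum(
        (blocks_used(s + p, block) - blocks_used(s, block)) * block
        for s, p in zip(sizes, padding)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical tree: 200,000 files with sizes uniform in 1..64 KiB,
    # each carrying 0..20 bytes of trailing whitespace.
    sizes = [rng.randint(1, 64 * 1024) for _ in range(200_000)]
    padding = [rng.randint(0, 20) for _ in sizes]
    print("whitespace bytes:   ", sum(padding))
    print("extra bytes on disk:", extra_bytes_on_disk(sizes, padding))
```

With sizes spread evenly across block boundaries, the two totals should land in the same ballpark, which is the point being made: most files cost nothing extra, a few cost a whole block, and it averages out to roughly the number of whitespace bytes.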
diff breakage
The reason a patch would not (and should not) be applied is that such changes gratuitously break diffs across the change. If a patch is floating around which *has* the extra whitespace (because it was created before the purge), it will not apply after the whitespace is removed.
The whitespace just doesn't matter enough to be worth the pain.
patch -l
You can tell patch to ignore changes in whitespace.
--
Program Intellivision and play Space Patrol!
No, I think you're right
No, I think you're right about that. Shouldn't be that hard to test.
Pretty easy in GNU/Emacs
I've got this in my .emacs:
(add-hook 'write-file-functions 'delete-trailing-whitespace)
No trailing whitespace in my files :)
Bad Idea
So if you edit a file that is version-controlled, you could potentially change all of the lines with a single change?
That's a nightmare to review. It's a bad idea on anything collaborative.
That's true. It is why you
That's true. It is why you could require the people you work with not to leave trailing whitespace :)
I've never needed it, but it's pretty easy to add a per-file or per-directory variable that controls whether the whitespace is (automatically) removed or not.
diff -w
You can tell diff to ignore whitespace. I do this quite often when comparing files that have differing newline conventions.
I think the problem is that
I think the problem is that such a clear-cut patch is too strict...
let's just stick to the current, working way, without the
fear of breaking, and gradually get rid of all these whitespaces
let's see how many are left in one year
Play it half-safe
If you're worried about breaking something with a clear-cut patch that is too strict,
then don't remove whitespace from all lines; only remove it from lines that end with right-parenthesis-semicolon );
Seriously who cares?
Disk space is about 20 cents per gigabyte these days.
So even ignoring the file system issue, that 151 KB of whitespace costs about .003 cents.
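For anyone who wants to check that arithmetic, here is a small sketch. The byte count is the figure from Matt's post and the price is the one quoted in the comment above; treating a gigabyte as 10^9 bytes is an assumption, and `storage_cost_cents` is just an illustrative name.

```python
WHITESPACE_BYTES = 151_809  # trailing whitespace reported for 2.6.22
CENTS_PER_GB = 20.0         # disk price quoted in the comment
BYTES_PER_GB = 10**9        # assumption: decimal gigabytes


def storage_cost_cents(n_bytes: int, cents_per_gb: float = CENTS_PER_GB) -> float:
    """Cost in cents of storing n_bytes at the given price per gigabyte."""
    return n_bytes / BYTES_PER_GB * cents_per_gb


print(f"{storage_cost_cents(WHITESPACE_BYTES):.4f} cents")  # prints "0.0030 cents"
```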
Yeah. Lots of nonsense talk
Yeah. Lots of nonsense talk here lately. I hope we don't get down to the slashdot level some day...
In Soviet Russia, whitespace
In Soviet Russia, whitespace removes YOU
You mean we're still above?
You mean we're still above? ;)
Kernel packages will be
Kernel packages will be smaller when whitespace is removed and this will decrease download time.
Why patch? why not just run some simple commands:
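# Needs bash, GNU find and GNU sed; run from the top of the source tree.
find . -name '*.[hcsS]' -print0 | while IFS= read -r -d '' f; do
    # Strip every trailing space and tab on each line.
    sed -i 's/[ \t]\+$//' "$f"
    # Convert leading runs of spaces to tabs, then replace the original.
    unexpand "$f" > "$f.new" && mv "$f.new" "$f"
done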
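Before rewriting anything in place, it may be worth doing a dry run that only counts what would be stripped, in the spirit of the statistics in the original post. The sketch below is one possible way to do that, not the method Matt used; `trailing_ws_stats` and `scan_tree` are hypothetical names, and the suffix list simply mirrors the `*.[hcsS]` pattern above. Lines ending in CRLF are not counted, since the carriage return hides the spaces before it.

```python
from pathlib import Path


def trailing_ws_stats(data: bytes):
    """Return (lines_with_trailing_ws, bytes_of_trailing_ws) for one file."""
    lines = 0
    saved = 0
    for line in data.split(b"\n"):
        stripped = line.rstrip(b" \t")
        if len(stripped) != len(line):
            lines += 1
            saved += len(line) - len(stripped)
    return lines, saved


def scan_tree(root, suffixes=(".c", ".h", ".s", ".S")):
    """Sum the per-file statistics over every matching file under root."""
    total_lines = 0
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            n, b = trailing_ws_stats(path.read_bytes())
            total_lines += n
            total_bytes += b
    return total_lines, total_bytes


if __name__ == "__main__":
    sample = b"int x;  \n\t\nreturn 0;\n"
    print(trailing_ws_stats(sample))  # (2, 3)
```

Calling `scan_tree` on the top directory of an unpacked kernel tree would give the two totals for the whole tree; nothing on disk is modified.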
I know I'm the only one...
I don't like trailing whitespace either, sure. It's stupid. However, what others consider "trailing" and what
/I/ consider "trailing" seem to be different.
If I have a line containing /only/ whitespace, I personally _LIKE_ that whitespace. I like having every
line in a related section of code indented by the exact same amount of whitespace- even the "blank" lines.
In my mind, not having leading whitespace on blank code lines makes about as much sense as having a leading
asterisk on all comment lines- except the blank ones.
I have yet to meet any who agree.
(I'll admit I first got into the habit due to bugs in VIM- sometimes adding a line and then moving away from it
keeps the indent level, sometimes it doesn't. I found I preferred the lines that kept it.)
Whitespace in source files is vile
Whitespace is particularly vile for me because I have GNU/Emacs set up to highlight it in bright red. Now when I see files with extra whitespace I know loser vi programmers (that means you vim fanboys) have been hacking it. You usually see tabs in the same files. What do you expect from people who use BSD indentation standards? Stallman dictated GNU indentation standards long ago. Why don't you use them and avoid this silliness?
You're doing it wrong.
You need to be a little more subtle if you don't want to be called out as a troll. "Vim fanboys?" "Stallman dictated?" Highlighting white space in bright red? Come on. Far too aggressive and head-on. You're just begging for someone to tell you to shove GNU indent up your backside and get over it.
I use Emacs, and the first
I use Emacs, and the first thing I do is change it to K&R...
Spaces suck, tabs rule.
I used to feel that way...
I used to prefer tabs, but 8 spaces is too much indentation. If you set your tab stops to something other than 8 spaces, other people's editors go "boing." If you set your indentation levels to 4 but leave your tab stops set at 8, you can cause even more confusion for some people.
I've switched to using spaces for my indentation everywhere. If I get a tab-damaged file, I'll detabify it, fix the indentation, and save it. Space characters: The universal lowest common denominator.
Vim fanboyz
Put this in your .vimrc:
You can highlight more complicated things relatively easily by following this as a template:
http://vim.wikia.com/wiki/Highlighting_whitespaces_at_end_of_line