Actually, my patch already had one that you didn't mention:
6) CR never shows up alone.
So the patch I sent out basically had the following rules:
- no more than ~10% of all characters being other than regular printable
ASCII (where any control character except for newline/cr/tab was deemed
nonprintable)
- any "lonely" CR automatically means it's binary, and I would refuse
to convert that to a LF (the test in the code is that CRLF count must
match CR count)
but the "roundtrip" rule is much too strict (it's actually perfectly
possible for an editor to add CRLF characters only to new _lines_, leaving
old lines with just LF - or the other way around. In fact, the editor I
use under Linux does exactly that in reverse - if I add new lines, it will
add those without CR, but will leave old lines with CRLF alone).
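
As a rough illustration of the two rules above (not the actual patch, which is C code inside git), here is a minimal Python sketch. The function name `looks_like_text` is hypothetical, and the exact 10% cut-off and the handling of empty input are assumptions, since the mail only says "~10%".

```python
def looks_like_text(data: bytes) -> bool:
    """Sketch of the heuristic: hypothetical name, assumed thresholds."""
    # Rule: a "lonely" CR means binary. The check is that every CR is
    # part of a CRLF pair, i.e. the CRLF count must match the CR count.
    if data.count(b"\r") != data.count(b"\r\n"):
        return False

    # Rule: count everything that is not regular printable ASCII.
    # Tab, newline and CR are the only control characters allowed.
    allowed_controls = (0x09, 0x0A, 0x0D)
    nonprintable = 0
    for byte in data:
        if byte in allowed_controls or 0x20 <= byte <= 0x7E:
            continue
        nonprintable += 1

    # Assumed cut-off: at most 10% of all bytes may be nonprintable.
    return nonprintable * 10 <= len(data)
```

Under these assumptions, a file with consistent CRLF line endings passes, a file with a bare CR is rejected, and a UTF-8 file made up mostly of non-ASCII characters is also rejected, because every byte with the high bit set counts against the 10% budget.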
I think that to help Asian languages (or strange text-files in utf8 or
Latin1 too, for that matter: text-files with _just_ special characters), I
should probably make the rule be that only the 0-31 range is special.
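
A sketch of what that relaxed rule might look like, again in Python and only as an illustration: the name `looks_like_text_relaxed` is hypothetical, and keeping the same ~10% cut-off and the same lone-CR check is an assumption rather than something stated above.

```python
def looks_like_text_relaxed(data: bytes) -> bool:
    """Sketch: only bytes in the 0-31 range count as special."""
    # Assumed unchanged: a CR that is not part of a CRLF means binary.
    if data.count(b"\r") != data.count(b"\r\n"):
        return False

    # Only control characters 0-31 are special, and tab, newline and CR
    # are still allowed. Bytes >= 0x80 (utf8, Latin1) no longer count.
    allowed_controls = (0x09, 0x0A, 0x0D)
    special = sum(
        1 for byte in data if byte < 0x20 and byte not in allowed_controls
    )

    # Assumed cut-off: at most 10% of all bytes may be special.
    return special * 10 <= len(data)
```

With this variant a text file consisting entirely of multi-byte utf8 characters is treated as text, while content full of NUL bytes is still treated as binary.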
Linus