2007-06-03 20:46:44

by Sam Ravnborg

[permalink] [raw]
Subject: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

Subject: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines
From: H. Peter Anvin <[email protected]>
Date: Fri, 25 May 2007 17:58:26 -0700

Make the "cleanfile" and "cleanpatch" script warn about long lines,
by default lines whose visual width exceeds 79 characters.

Per suggestion from Auke Kok.

Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Sam Ravnborg <[email protected]>
---
scripts/cleanfile | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
scripts/cleanpatch | 58 +++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/scripts/cleanfile b/scripts/cleanfile
index f1ba8aa..cefd29e 100755
--- a/scripts/cleanfile
+++ b/scripts/cleanfile
@@ -7,7 +7,9 @@
use bytes;
use File::Basename;

-#
+# Default options
+$max_width = 79;
+
# Clean up space-tab sequences, either by removing spaces or
# replacing them with tabs.
sub clean_space_tabs($)
@@ -48,9 +50,49 @@ sub clean_space_tabs($)
return $lo;
}

+# Compute the visual width of a string
+sub strwidth($) {
+ no bytes; # Tab alignment depends on characters
+
+ my($li) = @_;
+ my($c, $i);
+ my $pos = 0;
+ my $mlen = 0;
+
+ for ($i = 0; $i < length($li); $i++) {
+ $c = substr($li,$i,1);
+ if ($c eq "\t") {
+ $pos = ($pos+8) & ~7;
+ } elsif ($c eq "\n") {
+ $mlen = $pos if ($pos > $mlen);
+ $pos = 0;
+ } else {
+ $pos++;
+ }
+ }
+
+ $mlen = $pos if ($pos > $mlen);
+ return $mlen;
+}
+
$name = basename($0);

-foreach $f ( @ARGV ) {
+@files = ();
+
+while (defined($a = shift(@ARGV))) {
+ if ($a =~ /^-/) {
+ if ($a eq '-width' || $a eq '-w') {
+ $max_width = shift(@ARGV)+0;
+ } else {
+ print STDERR "Usage: $name [-width #] files...\n";
+ exit 1;
+ }
+ } else {
+ push(@files, $a);
+ }
+}
+
+foreach $f ( @files ) {
print STDERR "$name: $f\n";

if (! -f $f) {
@@ -90,8 +132,10 @@ foreach $f ( @ARGV ) {

@blanks = ();
@lines = ();
+ $lineno = 0;

while ( defined($line = <FILE>) ) {
+ $lineno++;
$in_bytes += length($line);
$line =~ s/[ \t\r]*$//; # Remove trailing spaces
$line = clean_space_tabs($line);
@@ -107,6 +151,12 @@ foreach $f ( @ARGV ) {
@blanks = ();
$blank_bytes = 0;
}
+
+ $l_width = strwidth($line);
+ if ($max_width && $l_width > $max_width) {
+ print STDERR
+ "$f:$lineno: line exceeds $max_width characters ($l_width)\n";
+ }
}

# Any blanks at the end of the file are discarded
diff --git a/scripts/cleanpatch b/scripts/cleanpatch
index a53f987..9680d03 100755
--- a/scripts/cleanpatch
+++ b/scripts/cleanpatch
@@ -7,7 +7,9 @@
use bytes;
use File::Basename;

-#
+# Default options
+$max_width = 79;
+
# Clean up space-tab sequences, either by removing spaces or
# replacing them with tabs.
sub clean_space_tabs($)
@@ -48,9 +50,49 @@ sub clean_space_tabs($)
return $lo;
}

+# Compute the visual width of a string
+sub strwidth($) {
+ no bytes; # Tab alignment depends on characters
+
+ my($li) = @_;
+ my($c, $i);
+ my $pos = 0;
+ my $mlen = 0;
+
+ for ($i = 0; $i < length($li); $i++) {
+ $c = substr($li,$i,1);
+ if ($c eq "\t") {
+ $pos = ($pos+8) & ~7;
+ } elsif ($c eq "\n") {
+ $mlen = $pos if ($pos > $mlen);
+ $pos = 0;
+ } else {
+ $pos++;
+ }
+ }
+
+ $mlen = $pos if ($pos > $mlen);
+ return $mlen;
+}
+
$name = basename($0);

-foreach $f ( @ARGV ) {
+@files = ();
+
+while (defined($a = shift(@ARGV))) {
+ if ($a =~ /^-/) {
+ if ($a eq '-width' || $a eq '-w') {
+ $max_width = shift(@ARGV)+0;
+ } else {
+ print STDERR "Usage: $name [-width #] files...\n";
+ exit 1;
+ }
+ } else {
+ push(@files, $a);
+ }
+}
+
+foreach $f ( @files ) {
print STDERR "$name: $f\n";

if (! -f $f) {
@@ -86,6 +128,7 @@ foreach $f ( @ARGV ) {

$in_bytes = 0;
$out_bytes = 0;
+ $lineno = 0;

@lines = ();

@@ -93,10 +136,12 @@ foreach $f ( @ARGV ) {
$err = 0;

while ( defined($line = <FILE>) ) {
+ $lineno++;
$in_bytes += length($line);

if (!$in_hunk) {
- if ($line =~ /^\@\@\s+\-([0-9]+),([0-9]+)\s+\+([0-9]+),([0-9]+)\s\@\@/) {
+ if ($line =~
+ /^\@\@\s+\-([0-9]+),([0-9]+)\s+\+([0-9]+),([0-9]+)\s\@\@/) {
$minus_lines = $2;
$plus_lines = $4;
if ($minus_lines || $plus_lines) {
@@ -117,6 +162,13 @@ foreach $f ( @ARGV ) {
$text =~ s/[ \t\r]*$//; # Remove trailing spaces
$text = clean_space_tabs($text);

+ $l_width = strwidth($text);
+ if ($max_width && $l_width > $max_width) {
+ print STDERR
+ "$f:$lineno: adds line exceeds $max_width ",
+ "characters ($l_width)\n";
+ }
+
push(@hunk_lines, '+'.$text);
} elsif ($line =~ /^\-/) {
$minus_lines--;
--
1.5.1.rc3.1544.g8a923


----- End forwarded message -----


2007-06-04 16:08:59

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines


On Jun 3 2007 22:47, Sam Ravnborg wrote:
>
>Make the "cleanfile" and "cleanpatch" script warn about long lines,
>by default lines whose visual width exceeds 79 characters.

Nice, nice. But, am I asking too much if tabs and kernel codestyle
could be used? (/me hides..., but see scripts/checkpatch.pl :-)

>+# Compute the visual width of a string
>+sub strwidth($) {
>+ no bytes; # Tab alignment depends on characters
>+
>+ my($li) = @_;
>+ my($c, $i);
>+ my $pos = 0;
>+ my $mlen = 0;
>+
>+ for ($i = 0; $i < length($li); $i++) {
>+ $c = substr($li,$i,1);
>+ if ($c eq "\t") {
>+ $pos = ($pos+8) & ~7;
>+ } elsif ($c eq "\n") {
>+ $mlen = $pos if ($pos > $mlen);
>+ $pos = 0;
>+ } else {
>+ $pos++;
>+ }
>+ }
>+
>+ $mlen = $pos if ($pos > $mlen);
>+ return $mlen;
>+}
>+


Jan
--

2007-06-05 07:23:16

by Oleg Verych

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

Hallo.

On Sun, Jun 03, 2007 at 10:47:00PM +0200, Sam Ravnborg wrote:
> Subject: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines
> From: H. Peter Anvin <[email protected]>
> Date: Fri, 25 May 2007 17:58:26 -0700
>
> Make the "cleanfile" and "cleanpatch" script warn about long lines,
> by default lines whose visual width exceeds 79 characters.
>
> Per suggestion from Auke Kok.
>
> Signed-off-by: H. Peter Anvin <[email protected]>
> Signed-off-by: Sam Ravnborg <[email protected]>

Thank you, Sam, for sending messages back in Kbuild list again.
I have ~30k backlog in LKML, so that thing is good.

So, there are some new scripts. What if my proposition will be better,
so to speak? Any problems i'm willing to fix/enhance.

Note: only one copy of the file required. Sym-linked name *diff* or
*patch* will process patches. I know, that symlinks in sources isn't
good, thus change "$0" -> "$1" will process first parameter.

-*- shell-script: clean-whitespace.sh -*-

#!/bin/sh -e
# stdin/stdout

case $0 in *diff* | *patch*) p='+';; esac
t="`printf '\t'`" ; w79=79 ; IFS=''

while read line
do case "$line" in
++*) echo "$line";;
$p*) line="`echo \"$line\" | expand`"
[ ${#line} -gt $w79 ] && : ${long:=line}
echo "$line" | sed -e "/^$p/{s_ _${t}_g;s_^$p *_${p}_;s_ *\$__}"
;;
*) echo "$line";;
esac
done
[ -n "$long" ] && echo "at least one line, wider than $w79 chars, found" 1>&2

-*-
> ---
> scripts/cleanfile | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
> scripts/cleanpatch | 58 +++++++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 107 insertions(+), 5 deletions(-)
> bla-bla...
____

2007-06-05 08:19:23

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

On Tue, Jun 05, 2007 at 09:33:35AM +0200, Oleg Verych wrote:
> Hallo.
>
> On Sun, Jun 03, 2007 at 10:47:00PM +0200, Sam Ravnborg wrote:
> > Subject: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines
> > From: H. Peter Anvin <[email protected]>
> > Date: Fri, 25 May 2007 17:58:26 -0700
> >
> > Make the "cleanfile" and "cleanpatch" script warn about long lines,
> > by default lines whose visual width exceeds 79 characters.
> >
> > Per suggestion from Auke Kok.
> >
> > Signed-off-by: H. Peter Anvin <[email protected]>
> > Signed-off-by: Sam Ravnborg <[email protected]>
>
> Thank you, Sam, for sending messages back in Kbuild list again.
> I have ~30k backlog in LKML, so that thing is good.
Delete them all - it only hurst a few minutes ;-)

>
> So, there are some new scripts. What if my proposition will be better,
> so to speak? Any problems i'm willing to fix/enhance.
>
> Note: only one copy of the file required. Sym-linked name *diff* or
> *patch* will process patches. I know, that symlinks in sources isn't
> good, thus change "$0" -> "$1" will process first parameter.

Sorry - but I really do not get your point here.
Are you trying to say that current cleanpatch is not good enough
or do you propose a new script to do something similar?

We do not want everyones favorite patch preprocessing script
in the kernel. So the only option is to incorporate changes in
cleanpatch.

If on the other hand you are proposing a script to clean whitespace
damage in the code then git already does this nicely.
I do not recall the actual receipt but searching the git mailing list
should reveal it. So for whitespace cleanup we should use git but maybe
via a small helper script.

Sam

2007-06-05 12:37:14

by Oleg Verych

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

On Tue, Jun 05, 2007 at 10:19:59AM +0200, Sam Ravnborg wrote:
[]
> > So, there are some new scripts. What if my proposition will be better,
> > so to speak? Any problems i'm willing to fix/enhance.
> >
> > Note: only one copy of the file required. Sym-linked name *diff* or
> > *patch* will process patches. I know, that symlinks in sources isn't
> > good, thus change "$0" -> "$1" will process first parameter.
>
> Sorry - but I really do not get your point here.
> Are you trying to say that current cleanpatch is not good enough
> or do you propose a new script to do something similar?

Better means, less bloated scripts in the source tree to make
userspace suck less...

> We do not want everyones favorite patch preprocessing script
> in the kernel. So the only option is to incorporate changes in
> cleanpatch.

I don't see scripts/clean* in .21, so decided, to make a hopefully
better, nicer unix-way and posix re-write, this morning.
____

2007-06-05 13:27:03

by Oleg Verych

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

On Tue, Jun 05, 2007 at 10:19:59AM +0200, Sam Ravnborg wrote:
[]
> If on the other hand you are proposing a script to clean whitespace
> damage in the code then git already does this nicely.

I've read that too quickly, sorry. What then all that clean scripts
are for?

> I do not recall the actual receipt but searching the git mailing list
> should reveal it. So for whitespace cleanup we should use git but maybe
> via a small helper script.

So, just to protect script from itself or similar one, here's update
(only for those, who's interested, of course).

-*- clean-whitespace.sh -*-
#!/bin/sh -e
# stdin/stdout

IFS=''
t="`printf '\t'`" ; s=' ' ; s7=' ' ; w79=79 ;
case $0 in *diff* | *patch*) p='+' ; s='';; esac

while read line
do case "$line" in
++*) echo "$line";;
$p*) line="`echo \"$line\" | expand`"
[ ${#line} -gt $w79 ] && : ${long:=line}
echo "$line" | sed "/^$p/{s_ *\$__;s_^$p$s7${s}_$p${t}_;s_$s7 _${t}_g}"
;;
*) echo "$line";;
esac
done
[ -n "$long" ] && echo "at least one line, wider than $w79 chars, found" 1>&2

-*-

2007-06-05 14:12:00

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

On Tue, Jun 05, 2007 at 03:38:34PM +0200, Oleg Verych wrote:
> On Tue, Jun 05, 2007 at 10:19:59AM +0200, Sam Ravnborg wrote:
> []
> > If on the other hand you are proposing a script to clean whitespace
> > damage in the code then git already does this nicely.
>
> I've read that too quickly, sorry. What then all that clean scripts
> are for?
cleanfile compress spaces to tabs where appropriate. It does more
than just warn about too long lines and remove leading whitespace.

Please look at latest -git tree if you are trying to improve
the clean* scripts or add counterparts.

Thanks,
Sam

2007-06-05 14:47:24

by Oleg Verych

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

On Tue, Jun 05, 2007 at 04:12:54PM +0200, Sam Ravnborg wrote:
> On Tue, Jun 05, 2007 at 03:38:34PM +0200, Oleg Verych wrote:
> > On Tue, Jun 05, 2007 at 10:19:59AM +0200, Sam Ravnborg wrote:
> > []
> > > If on the other hand you are proposing a script to clean whitespace
> > > damage in the code then git already does this nicely.
> >
> > I've read that too quickly, sorry. What then all that clean scripts
> > are for?
> cleanfile compress spaces to tabs where appropriate. It does more
> than just warn about too long lines and remove leading whitespace.

Ah!

It's not an occasion, that this script does job of two, you are referring
to. I should add documentation to this script, that will far longer, that
script itself.

-*- slightly i/o optimized <clean-whitespace.sh> -*-
#!/bin/sh -e
# clean whitespace damage; i/o=stdin/stdout
#
IFS='' ; t="`printf '\t'`" ; s=' ' ; s7="$s$s$s$s$s$s$s" ; w79=79 ;
case $0 in *diff* | *patch*) p='+' ; s='';; esac

expand | while read line
do case "$line" in
++*) echo "$line";;
$p*) [ ${#line} -gt $w79 ] && : ${long:=line}
echo "$line" | sed "/^$p/{s_ *\$__;s_^$p$s7${s}_$p${t}_;s_$s7 _${t}_g}"
;;
*) echo "$line";;
esac
done
[ -n "$long" ] && echo "at least one line, wider than $w79 chars, found" 1>&2

-*-

First line stops word splitting, puts tab symbol in t, space in s,
seven spaces to s7 and line width limit to w79.

Second -- adjust set variables for need of unified diff processing, if
name of the script $0 (can be changed to match command-line
parameters) have "diff" or "patch".

Last -- bark, if there is at least one line longer that w79.

Rest is processing of files/patches to expand(posix tool) tabs to spaces
then:
- patches are processed only by lines, starting from `+', but not `++'
- remove trailing whitespace;
- substitute eight spaces to one tab symbol
- patches have "linestart`+tab'" expanded to seven spaces, due to `+' and
tabstop, thus seven spaces are substituted in this case.

Anything else, if have not noticed in whitespace damage matching can be
added on request ;)

> Please look at latest -git tree if you are trying to improve
> the clean* scripts or add counterparts.
>
> Thanks,
> Sam
____

2007-06-05 15:00:07

by Oleg Verych

[permalink] [raw]
Subject: Re: [kbuild-devel] [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines

On Tue, Jun 05, 2007 at 04:57:59PM +0200, Oleg Verych wrote:
[]
> expand | while read line
> do case "$line" in
> ++*) echo "$line";;
> $p*) [ ${#line} -gt $w79 ] && : ${long:=line}
> echo "$line" | sed "/^$p/{s_ *\$__;s_^$p$s7${s}_$p${t}_;s_$s7 _${t}_g}"
> ;;
> *) echo "$line";;
> esac
> done
> [ -n "$long" ] && echo "at least one line, wider than $w79 chars, found" 1>&2
>
> -*-
[]
> Last -- bark, if there is at least one line longer that w79.

Well, if test will be in the pipe end, i.e. ... | { while; test lingth; }
____

2007-06-06 17:34:59

by Oleg Verych

[permalink] [raw]
Subject: Another version of cleanfile/cleanpatch (Re: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines)

While i'm against whitespace damaged files or patches since my very
first patch, and don't like brain damaged programmer's tools called
text editors, i also want to encourage UNIX-way of using userspace.

Of course, i might be wrong and foolish. Anyway, what i'm trying to do
is not to become new generation of Visual Perl#(R) implemented in
Java(R) using XML with userspace, that suck.

Many things in XXI century still can be done by tools founded 20-30
years ago. Why not try to?

Here is script proposal and test case, just for interested parties.
--
-o--=O`C info emacs : faq
#oo'L O info make : not found
<___=E M man gcc : not found


Attachments:
(No filename) (648.00 B)
clean-whitespace.sh (842.00 B)
clean-whitespace.test (1.16 kB)
Download all attachments

2007-06-06 17:49:48

by Sam Ravnborg

[permalink] [raw]
Subject: Re: Another version of cleanfile/cleanpatch (Re: [PATCH 08/19] scripts: Make cleanfile/cleanpatch warn about long lines)

On Wed, Jun 06, 2007 at 07:45:56PM +0200, Oleg Verych wrote:
> While i'm against whitespace damaged files or patches since my very
> first patch, and don't like brain damaged programmer's tools called
> text editors, i also want to encourage UNIX-way of using userspace.
>
> Of course, i might be wrong and foolish. Anyway, what i'm trying to do
> is not to become new generation of Visual Perl#(R) implemented in
> Java(R) using XML with userspace, that suck.
>
> Many things in XXI century still can be done by tools founded 20-30
> years ago. Why not try to?

Because your shell script is unreadable by normal human beings[*]
while the perl script for people with a bit of perl fu can read it
and fix/modify it.

We want tools that can be maintained and enhanced by most people.

[*] Normal human beings are people with same level of shell
scripting/sed skills that I have just to put that straight.
Sam

2007-06-06 19:02:49

by Oleg Verych

[permalink] [raw]
Subject: Re: Another version of cleanfile/cleanpatch

On Wed, Jun 06, 2007 at 07:50:26PM +0200, Sam Ravnborg wrote:
[]
> > Many things in XXI century still can be done by tools founded 20-30
> > years ago. Why not try to?
>
> Because your shell script is unreadable by normal human beings[*]
> while the perl script for people with a bit of perl fu can read it
> and fix/modify it.
>
> We want tools that can be maintained and enhanced by most people.
>
> [*] Normal human beings are people with same level of shell
> scripting/sed skills that I have just to put that straight.

In many cases i think, it's limiting one's imagination and expanding
laziness[0].

In the school algebra (usually) there are many exercises with
plain-useless equations and formulas you must solve or simplify.
Guess why? Thus my proposition. ;)

---
[0] Now, when most UNIX tools done with good quality (courtesy of the
GNU project), it's time not to convert programmer's laziness[1] to
ordinary one. Why there's one big and slow Bourne again shell, yet
to have fast ([d]ash) and tiny one took more time? As result more
efforts to remove bashizms...

[1] Ironically connected to Perl chapter of UNIX Power Tools
<http://unix.org.ua/orelly/unix/upt/ch37_02.htm>
____

2007-06-07 14:36:56

by Jan Engelhardt

[permalink] [raw]
Subject: Re: Another version of cleanfile/cleanpatch


On Jun 6 2007 21:14, Oleg Verych wrote:
>[]
>> > Many things in XXI century still can be done by tools founded 20-30
>> > years ago. Why not try to?
>>
>> Because your shell script is unreadable by normal human beings[*]
>> while the perl script for people with a bit of perl fu can read it
>> and fix/modify it.

And because at the end of the day, the perl script might be faster
than the shell script. Yes, UNIX was designed to handle fork-exec
efficiently, thank God. But still.

>> We want tools that can be maintained and enhanced by most people.
>>
>> [*] Normal human beings are people with same level of shell
>> scripting/sed skills that I have just to put that straight.
>
>In many cases i think, it's limiting one's imagination and expanding
>laziness[0].
>
>In the school algebra (usually) there are many exercises with
>plain-useless equations and formulas you must solve or simplify.
>Guess why? Thus my proposition. ;)
>
>---
>[0] Now, when most UNIX tools done with good quality (courtesy of the
> GNU project), it's time not to convert programmer's laziness[1] to
> ordinary one. Why there's one big and slow Bourne again shell, yet
> to have fast ([d]ash) and tiny one took more time? As result more
> efforts to remove bashizms...

I prefer bashisms over using a shell [referring to original sh or ksh]
that can't do a sane thing.

>
>[1] Ironically connected to Perl chapter of UNIX Power Tools
> <http://unix.org.ua/orelly/unix/upt/ch37_02.htm>


Jan
--

2007-06-07 22:56:28

by Oleg Verych

[permalink] [raw]
Subject: Re: Another version of cleanfile/cleanpatch

On Thu, Jun 07, 2007 at 04:36:33PM +0200, Jan Engelhardt wrote:
>
> On Jun 6 2007 21:14, Oleg Verych wrote:
> >[]
> >> > Many things in XXI century still can be done by tools founded 20-30
> >> > years ago. Why not try to?
> >>
> >> Because your shell script is unreadable by normal human beings[*]
> >> while the perl script for people with a bit of perl fu can read it
> >> and fix/modify it.

Actually, unreadable and challenging were first messages, where i just
tried to show different point of view. After new subject and attached
working commented script with test case, i hope it's not.

> And because at the end of the day, the perl script might be faster
> than the shell script.
(ran by bash ;)

If fact, when i applied same solution and logic as in those scripts,
performance was very poor even with dash. But lets think for a moment
about, what must be done and then how. Meaningful whitespace damage is:

- trailing whitespace everywhere;
- x*(eight spaces) on line start;
- spaces between tabs near line start, before code;
- empty lines in the end of file (patches can't be handled, or can? :).

Finally what patches, i've replied on, done was, notifying user about
long visual lines.

So, script is interactive, isn't it? And that means:

- you have human as user;
- user knows, what script can possibly do.

Because of that, i think, following is redundant:

- to check for binary files
- scan whole file for long lines, with useless bunch of messages about
ones. Useless, because script doesn't fix that, it can't do that!

Thus, going from the end of my version, you see one-shoot check for
long line with (i hope) user-friendly message.

Then, there's no check for binary file at all.

Body -- is a commented sed script with shell variables for source/patch
handling switch and compatibility with other versions of sed, not only GNU.
If you like more tabs, then i stated in whitespace damage, just use
"unexpand".

As result there's one small (maintaining[1]), hopefully smart and fast
script.

> Yes, UNIX was designed to handle fork-exec efficiently, thank God. But
> still.
[]
> > efforts to remove bashizms...
>
> I prefer bashisms over using a shell [referring to original sh or ksh]
> that can't do a sane thing.

I would like to know cases. Just to try to solve them.

Two from my head are:

- `set pipefail' option -- not problem at all [0]
- arrays.

Arrays. Well, that depends. My option is as follows.

If you really need some kind of indexed data, create array via line by
line reading of file, storing data in variables with numbers, like

while :
do eval read -r $PREFIX_$i || break
i=$(($i + 1))
done
(stdin redirection is shown in [0]) thus you have it.

OK, to not to go offtopic, i would say here, that if that temp file on the
tmpfs, then Linux directly helps you with its efficient memory
management, not libc (good addition to fork/clone-execve, isn't it? ;)

Anything else may require not shell as solution.

>
> Jan
> --

[0] 47.2.1.4 More Elaborate Combinations (UNIX Power Tools)
<http://unix.org.ua/orelly/unix/upt/ch47_02.htm>

[1] Just two maitaining patches: usability and bugfix, as example:

|processing_based_on_patch_file_names.patch, not script name:

--- clean-whitespace.sh~v.00 2007-06-07 23:53:00.099249000 +0200
+++ clean-whitespace.sh 2007-06-07 23:54:43.025681500 +0200
@@ -7,5 +7,5 @@
not_patch_line='/^+[^+]/'

-case $0 in *diff* | *patch*)
+case $1 in *[.]diff | *[.]patch)
file=patch ; sp='+[!+]' ; p='+' ; addr="$not_patch_line" ;;
esac

|correctly_handle_lines_after_append_command.patch
|(and more visible resulting message):

--- clean-whitespace.sh~v.01 2007-06-07 23:54:43.025681500 +0200
+++ clean-whitespace.sh 2007-06-07 23:57:34.120374250 +0200
@@ -14,8 +14,8 @@
s|[$t$s]*$||; # trailing whitespace,
:next; # x*8 spaces on the line start -> x*tabs
-s|^$p\($t*\)$s4$s4|$p\1$t|;t next;
-s|^$p\($t*\)$s*$t|$p\1$t|g; # strip spaces between tabs
-};p" -- "$i" >"$o" &&
-echo "please, see clean ${file:=source} file: $o
+s|^\([\n]*\)$p\($t*\)$s4$s4|\1$p\2$t|;t next; # \n is needed after N command
+s|^\([\n]*\)$p\($t*\)$s*$t|\1$p\2$t|g; # strip spaces between tabs
+};p" -- "$i" >"$o" && echo "
+please, see clean ${file:=source} file: $o
"
exec expand $i | while read -r line # check for long line

____

2007-06-07 23:20:32

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [kbuild-devel] Another version of cleanfile/cleanpatch

Oleg Verych wrote:
>
> Because of that, i think, following is redundant:
>
> - to check for binary files

find . -type f | xargs cleanfile

I do this all the time.

> - scan whole file for long lines, with useless bunch of messages about
> ones. Useless, because script doesn't fix that, it can't do that!

Still useful to let the human know what is going on, and why.

-hpa


2007-06-08 01:23:41

by Oleg Verych

[permalink] [raw]
Subject: Re: Another version of cleanfile/cleanpatch

On Thu, Jun 07, 2007 at 04:19:56PM -0700, H. Peter Anvin wrote:
> Oleg Verych wrote:
> >
> > Because of that, i think, following is redundant:
> >
> > - to check for binary files
>
> find . -type f | xargs cleanfile

What about patches?

Anyway, by agreement (with myself), i've stopped on having per-file-name
division (prev. message first patch, and that was last design remaining
from cleanfile/cleanpatch). So:

for f in $*
do clean-whitespace $f 2>&1 >/dev/null
done

But this doesn't look like interactive usage, which i've concluded.
Plus copy is saved in $f.clean file, so user can `diff -u` to see any
destruction and possibly report a bug.

[]
> > - scan whole file for long lines, with useless bunch of messages about
> > ones. Useless, because script doesn't fix that, it can't do that!
>
> Still useful to let the human know what is going on, and why.

What i've done was `cleanpatch patch-2.6.21-rc4-rc5`
That's where usefulness comes from ;)

> -hpa
____

2007-06-08 05:28:03

by Jan Engelhardt

[permalink] [raw]
Subject: Re: Another version of cleanfile/cleanpatch



On Jun 8 2007 01:06, Oleg Verych wrote:

>- empty lines in the end of file (patches can't be handled, or can? :).

Yes it can.

>Body -- is a commented sed script with shell variables for source/patch
>handling switch and compatibility with other versions of sed, not only GNU.
>If you like more tabs, then i stated in whitespace damage, just use
>"unexpand".

sed just does not cut it anymore. Perl regexes win in the long term.

>> Yes, UNIX was designed to handle fork-exec efficiently, thank God. But
>> still.
>[]
>> > efforts to remove bashizms...
>>
>> I prefer bashisms over using a shell [referring to original sh or ksh]
>> that can't do a sane thing.
>
>I would like to know cases. Just to try to solve them.
>
>Two from my head are:
>
>- `set pipefail' option -- not problem at all [0]
>- arrays.
>
>Arrays. Well, that depends. My option is as follows.

If you need arrays or a lot of substr magic, well, it's perhaps time to
consider a switch to a scripted language (perl, python, php, whatever comes
around). But I really meant these handy bash features:

for i in {1..5}; do ...; done;
if [ ]; then ...; fi;
echo $[ math expr ];
echo `backtick` and $(backtick with dollar-parentheses);

The possibility to say if [ "$any" == "" ] (not having to use
crap like [ "x$any" == "x" ].

The possibility to say if [ -z "$any" ]

On the other hand, if you wanted to extinguish bashisms, then you'd
also need to do so for kshisms like

if [[ ]]

>OK, to not to go offtopic, i would say here, that if that temp file on the
>tmpfs, then Linux directly helps you with its efficient memory
>management, not libc (good addition to fork/clone-execve, isn't it? ;)

And don't assume everything is a UNIX. CreateProcess() is particularly
expensive on Windows, burdening even Cygwin's fork()/exec() emulation.



Jan
--

2007-06-08 06:28:59

by Oleg Verych

[permalink] [raw]
Subject: [patch] scripts: clean-whitespace.sh

After running this script with filename as parameter,
look (with diff) for, what can be corrected.

Only "*.diff" and "*.patch" files are handled as patches.

Signed-off-by: Oleg Verych <[email protected]>
--
Two clean rules added, that can change look of damaged lines.
Yet script still fits one ordinary screen, so read the code!

test cases: include/linux/ipv6.h, arch/x86_64/lib/{copy_user, memcpy}.S

It's like lguest -- just for fun.

clean-whitespace.sh | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

--- /dev/null 2007-04-04 10:55:19.204075250 +0200
+++ linux-just-for-fun/scripts/clean-whitespace.sh 2007-06-08 07:53:21.000000000 +0200
@@ -0,0 +1,28 @@
+#!/bin/sh -e
+# clean whitespace damage; i/o = $1/$1.clean
+
+IFS='' ; t="`printf '\t'`" ; s=' ' ; s4="$s$s$s$s" ; w79=79 ;
+i="$1" ; o="$1.clean"
+strip_file_end='/^$/{N;s_^\n$_&_;T e;:n;N;s_^.*\n\n$_&_;t n;:e;};'
+not_patch_line='/^+[^+]/'
+
+case $1 in *[.]diff | *[.]patch)
+ file=patch ; sp='+[!+]' ; p='+' ; addr="$not_patch_line";;
+esac
+
+sed -n "${addr:-$strip_file_end} {
+s|[$t$s]*$||; # trailing whitespace
+:next; # x*8 spaces on the line start -> x*tabs
+s|^\([\n]*\)$p\($t*\)$s4$s4|\1$p\2$t|;t next; # \n is needed after N command
+s|^\([\n]*\)$p\($t*\)$s*$t|\1$p\2$t|g; # strip spaces between tabs
+s|$s4$s4$s$s*|$t$t|g # more than 8 spaces -> 2 tabs
+s|$s*$t|$t|g # strip spaces before tab; tradeoff: may break some alignment !
+};p" -- "$i" >"$o" && echo "
+please, see clean ${file:=source} file: $o
+"
+exec expand $i | while read -r line # check for long line
+do [ ${#line} -gt $w79 ] && case "$line" in $sp*) echo \
+"at least one line wider than $w79 chars, found
+check your $file, please
+" 1>&2 ; exit ;; esac
+done

2007-06-08 06:45:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] scripts: clean-whitespace.sh

On Fri, 8 Jun 2007 08:40:42 +0200 Oleg Verych <[email protected]> wrote:

> After running this script with filename as parameter,
> look (with diff) for, what can be corrected.

Sorry, but "run it and see what it did" is pretty poor documentation.

> Only "*.diff" and "*.patch" files are handled as patches.
>
> Signed-off-by: Oleg Verych <[email protected]>
> --
> Two clean rules added, that can change look of damaged lines.
> Yet script still fits one ordinary screen, so read the code!
>
> test cases: include/linux/ipv6.h, arch/x86_64/lib/{copy_user, memcpy}.S
>
> It's like lguest -- just for fun.
>
> clean-whitespace.sh | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> --- /dev/null 2007-04-04 10:55:19.204075250 +0200
> +++ linux-just-for-fun/scripts/clean-whitespace.sh 2007-06-08 07:53:21.000000000 +0200
> @@ -0,0 +1,28 @@
> +#!/bin/sh -e
> +# clean whitespace damage; i/o = $1/$1.clean
> +
> +IFS='' ; t="`printf '\t'`" ; s=' ' ; s4="$s$s$s$s" ; w79=79 ;
> +i="$1" ; o="$1.clean"
> +strip_file_end='/^$/{N;s_^\n$_&_;T e;:n;N;s_^.*\n\n$_&_;t n;:e;};'
> +not_patch_line='/^+[^+]/'
> +
> +case $1 in *[.]diff | *[.]patch)
> + file=patch ; sp='+[!+]' ; p='+' ; addr="$not_patch_line";;
> +esac
> +
> +sed -n "${addr:-$strip_file_end} {
> +s|[$t$s]*$||; # trailing whitespace
> +:next; # x*8 spaces on the line start -> x*tabs
> +s|^\([\n]*\)$p\($t*\)$s4$s4|\1$p\2$t|;t next; # \n is needed after N command
> +s|^\([\n]*\)$p\($t*\)$s*$t|\1$p\2$t|g; # strip spaces between tabs
> +s|$s4$s4$s$s*|$t$t|g # more than 8 spaces -> 2 tabs
> +s|$s*$t|$t|g # strip spaces before tab; tradeoff: may break some alignment !
> +};p" -- "$i" >"$o" && echo "
> +please, see clean ${file:=source} file: $o
> +"
> +exec expand $i | while read -r line # check for long line
> +do [ ${#line} -gt $w79 ] && case "$line" in $sp*) echo \
> +"at least one line wider than $w79 chars, found
> +check your $file, please
> +" 1>&2 ; exit ;; esac
> +done

Then again, it's a better strategy than trying to read the code ;)

Please, tell us what it does, so that we can decide whether we want it in
Linux.

2007-06-08 14:28:20

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [patch] scripts: clean-whitespace.sh

On Thu, Jun 07, 2007 at 11:44:59PM -0700, Andrew Morton wrote:
>
> Then again, it's a better strategy than trying to read the code ;)
>
> Please, tell us what it does, so that we can decide whether we want it in
> Linux.

It does the same as cleanfile.pl.
I have seen no reason to replace cleanfile.pl with this version.

Linecount is down but so is maintainability / extendability.

Sam

2007-06-08 14:51:00

by Oleg Verych

[permalink] [raw]
Subject: Re: [patch] scripts: clean-whitespace.sh

On Fri, Jun 08, 2007 at 04:28:49PM +0200, Sam Ravnborg wrote:
> On Thu, Jun 07, 2007 at 11:44:59PM -0700, Andrew Morton wrote:
> >
> > Then again, it's a better strategy than trying to read the code ;)
> >
> > Please, tell us what it does, so that we can decide whether we want it in
> > Linux.
>
> It does the same as cleanfile.pl.
> I have seen no reason to replace cleanfile.pl with this version.

It does better whitespace cleanup, than

scripts/{cleanfile *and* cleanpatch}

togather with more user, operating system kernel-friendly approach.
Please, test it on proposed testcase - ipv6.h, and you will see, what i'm
talking about. Patch description + script name is all one must to know to
start use it *safely*.

> Linecount is down but so is maintainability / extendability.

Really? If you think so...

For those, who have (X)emacs on board, i recommend "develock.el" as
good damage-showing tool.

Sorry, Andrew, for garbage in you patch subject filters :)
____

2007-06-08 19:05:07

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [patch] scripts: clean-whitespace.sh

On Fri, Jun 08, 2007 at 05:02:15PM +0200, Oleg Verych wrote:
> On Fri, Jun 08, 2007 at 04:28:49PM +0200, Sam Ravnborg wrote:
> > On Thu, Jun 07, 2007 at 11:44:59PM -0700, Andrew Morton wrote:
> > >
> > > Then again, it's a better strategy than trying to read the code ;)
> > >
> > > Please, tell us what it does, so that we can decide whether we want it in
> > > Linux.
> >
> > It does the same as cleanfile.pl.
> > I have seen no reason to replace cleanfile.pl with this version.
>
> It does better whitespace cleanup, than
>
> scripts/{cleanfile *and* cleanpatch}

Made a short test here.
Added the following to a file:

static sam = " ";

clean-whitespace replaced spaces within "" with tabs.
cleanfile did not.

Seems that clean-whitespace has the wrong assumption that any sequence
of <tab-stop>8 spaces<tab-stop> must be replaced with a tab.
It should obviously default to beginning of file.


Running latest cleanfile (it is -mm) on your testfile
gave following output:
cleanfile: ipv6.h
ipv6.h:105: line exceeds 79 characters (89)
ipv6.h:111: line exceeds 79 characters (89)
ipv6.h:350: line exceeds 79 characters (85)
ipv6.h:356: line exceeds 79 characters (85)
ipv6.h:362: line exceeds 79 characters (85)
ipv6.h:386: line exceeds 79 characters (82)
ipv6.h:413: line exceeds 79 characters (85)

And spaces were replaced with tabs only where
it is safe.

Sam

2007-06-08 20:55:55

by Oleg Verych

[permalink] [raw]
Subject: Re: [patch] scripts: clean-whitespace.sh

On Fri, Jun 08, 2007 at 09:05:54PM +0200, Sam Ravnborg wrote:
[]
> >
> > It does better whitespace cleanup, than
> >
> > scripts/{cleanfile *and* cleanpatch}
>
> Made a short test here.
> Added the following to a file:
>
> static sam = " ";
>
> clean-whitespace replaced spaces within "" with tabs.
> cleanfile did not.

So, user with `diff` will see that and fix or a way of usage spaces in
strings, or will update BRE, used for that kind of match ;)

Still, better job:
|-*-
olecom@flower:/tmp$ ls -l ipv62.h.clean
-rw-r----- 1 olecom root 10591 Jun 8 22:47 ipv62.h.clean +2 bytes
olecom@flower:/tmp$ ls -l ipv66.h
-rw-r----- 1 olecom root 10704 Jun 8 22:47 ipv66.h
olecom@flower:/tmp$
|-*-

But there will be two shifted words, because it doesn't calculate
tabstops. I've choose to remove spaces before tab, so it will be visible
and easy to fix by hand. Only spaces up-to next tabstop - 1 after
possible tab is OK.
____