2007-06-04 09:28:15

by Andy Whitcroft

[permalink] [raw]
Subject: [PATCH] update checkpatch.pl to version 0.03


This version brings a host of changes to cure false positives and
bugs detected on patches submitted to lkml and -mm. It also brings
a number of new tests in response to reviews, of particular note:

- catch use of volatile
- allow deprecated functions to be listed in feature-removal-schedule.txt
- warn about #ifdef's in c files
- check that spinlock_t and struct mutex use is commented
- report on architecture specific defines being used
- report memory barriers without an associated comment

Signed-off-by: Andy Whitcroft <[email protected]>
---

Full changelog:

Andy Whitcroft (19):
catch use of volatile
convert other quoted string checks to common routine
alloc deprecated functions to be listed in feature-removal-schedule.txt
split out the line length and indent for each line
improve switch block handling
handle GNU diff context lines with no leading space
warn about #ifdef's in c files
tidy up tests for signed-off-by using raw mode
check that spinlock_t and struct mutex use is commented
syntax checks for open brace placement may drop off the bottom of hunk
report memory barriers without an associated comment
when a sign off is present but ugly do not report it missing
do not mistake bitfield definitions for indented labels
report on architecture specific defines being used
major update to the operator checks
prevent switch/if/while etc matching foo_switch
generify assignement in condition error message
introduce an operator context marker
Version: 0.03
---
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
old mode 100644
new mode 100755
index e216d49..9590bbb
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -8,7 +8,7 @@ use strict;

my $P = $0;

-my $V = '0.01';
+my $V = '0.03';

use Getopt::Long qw(:config no_auto_abbrev);

@@ -38,7 +38,8 @@ if ($tree && !top_of_kernel_tree()) {
exit(2);
}

-my @deprecated = ();
+my @dep_includes = ();
+my @dep_functions = ();
my $removal = 'Documentation/feature-removal-schedule.txt';
if ($tree && -f $removal) {
open(REMOVE, "<$removal") || die "$P: $removal: open failed - $!\n";
@@ -46,9 +47,14 @@ if ($tree && -f $removal) {
if (/^Files:\s+(.*\S)/) {
for my $file (split(/[, ]+/, $1)) {
if ($file =~ m@include/(.*)@) {
- push(@deprecated, $1);
+ push(@dep_includes, $1);
}
}
+
+ } elsif (/^Funcs:\s+(.*\S)/) {
+ for my $func (split(/[, ]+/, $1)) {
+ push(@dep_functions, $func);
+ }
}
}
}
@@ -99,6 +105,97 @@ sub expand_tabs {
return $res;
}

+sub line_stats {
+ my ($line) = @_;
+
+ # Drop the diff line leader and expand tabs
+ $line =~ s/^.//;
+ $line = expand_tabs($line);
+
+ # Pick the indent from the front of the line.
+ my ($white) = ($line =~ /^(\s*)/);
+
+ return (length($line), length($white));
+}
+
+sub ctx_block_get {
+ my ($linenr, $remain, $outer) = @_;
+ my $line;
+ my $start = $linenr - 1;
+ my $end = $linenr - 1 + $remain;
+ my $blk = '';
+ my @o;
+ my @c;
+ my @res = ();
+
+ for ($line = $start; $line < $end; $line++) {
+ $blk .= $lines[$line];
+
+ @o = ($blk =~ /\{/g);
+ @c = ($blk =~ /\}/g);
+
+ if (!$outer || (scalar(@o) - scalar(@c)) == 1) {
+ push(@res, $lines[$line]);
+ }
+
+ last if (scalar(@o) == scalar(@c));
+ }
+
+ return @res;
+}
+sub ctx_block_outer {
+ my ($linenr, $remain) = @_;
+
+ return ctx_block_get($linenr, $remain, 1);
+}
+sub ctx_block {
+ my ($linenr, $remain) = @_;
+
+ return ctx_block_get($linenr, $remain, 0);
+}
+
+sub ctx_locate_comment {
+ my ($first_line, $end_line) = @_;
+
+ # Catch a comment on the end of the line itself.
+ my ($current_comment) = ($lines[$end_line - 1] =~ m@.*(/\*.*\*/)\s*$@);
+ return $current_comment if (defined $current_comment);
+
+ # Look through the context and try and figure out if there is a
+ # comment.
+ my $in_comment = 0;
+ $current_comment = '';
+ for (my $linenr = $first_line; $linenr < $end_line; $linenr++) {
+ my $line = $lines[$linenr - 1];
+ ##warn " $line\n";
+ if ($linenr == $first_line and $line =~ m@^.\s*\*@) {
+ $in_comment = 1;
+ }
+ if ($line =~ m@/\*@) {
+ $in_comment = 1;
+ }
+ if (!$in_comment && $current_comment ne '') {
+ $current_comment = '';
+ }
+ $current_comment .= $line . "\n" if ($in_comment);
+ if ($line =~ m@\*/@) {
+ $in_comment = 0;
+ }
+ }
+
+ chomp($current_comment);
+ return($current_comment);
+}
+sub ctx_has_comment {
+ my ($first_line, $end_line) = @_;
+ my $cmt = ctx_locate_comment($first_line, $end_line);
+
+ ##print "LINE: $lines[$end_line - 1 ]\n";
+ ##print "CMMT: $cmt\n";
+
+ return ($cmt ne '');
+}
+
sub cat_vet {
my ($vet) = @_;

@@ -108,6 +205,10 @@ sub cat_vet {
return $vet;
}

+sub has_non_quoted {
+ return ($_[0] =~ m{$_[1]} and $_[0] !~ m{\".*$_[1].*\"});
+}
+
sub process {
my $filename = shift;
my @lines = @_;
@@ -116,7 +217,7 @@ sub process {
my $prevline="";
my $stashline="";

- my $lineforcounting='';
+ my $length;
my $indent;
my $previndent=0;
my $stashindent=0;
@@ -145,7 +246,7 @@ sub process {
#extract the line range in the file after the patch is applied
if ($line=~/^\@\@ -\d+,\d+ \+(\d+)(,(\d+))? \@\@/) {
$is_patch = 1;
- $first_line = 1;
+ $first_line = $linenr + 1;
$in_comment = 0;
$realline=$1-1;
if (defined $2) {
@@ -156,8 +257,10 @@ sub process {
next;
}

-#track the line number as we move through the hunk
- if ($line=~/^[ \+]/) {
+# track the line number as we move through the hunk, note that
+# new versions of GNU diff omit the leading space on completely
+# blank context lines so we need to count that too.
+ if ($line =~ /^( |\+|$)/) {
$realline++;
$realcnt-- if ($realcnt != 0);

@@ -168,7 +271,7 @@ sub process {
# Guestimate if this is a continuing comment. If this
# is the start of a diff block and this line starts
# ' *' then it is very likely a comment.
- if ($first_line and $line =~ m@^.\s*\*@) {
+ if ($linenr == $first_line and $line =~ m@^.\s*\*@) {
$in_comment = 1;
}
if ($line =~ m@/\*@) {
@@ -178,17 +281,12 @@ sub process {
$in_comment = 0;
}

- $lineforcounting = $line;
- $lineforcounting =~ s/^\+//;
- $lineforcounting = expand_tabs($lineforcounting);
-
- my ($white) = ($lineforcounting =~ /^(\s*)/);
- $indent = length($white);
+ # Measure the line length and indent.
+ ($length, $indent) = line_stats($line);

# Track the previous line.
($prevline, $stashline) = ($stashline, $line);
($previndent, $stashindent) = ($stashindent, $indent);
- $first_line = 0;
}

#make up the handle for any error we report on this line
@@ -203,6 +301,8 @@ sub process {
$signoff++;

} elsif ($line =~ /^\s*signed-off-by:/i) {
+ # This is a signoff, if ugly, so do not double report.
+ $signoff++;
if (!($line =~ /^\s*Signed-off-by:/)) {
print "use Signed-off-by:\n";
print "$herecurr";
@@ -229,7 +329,7 @@ sub process {
$clean = 0;
}
#80 column limit
- if (!($prevline=~/\/\*\*/) && length($lineforcounting) > 80) {
+ if (!($prevline=~/\/\*\*/) && $length > 80) {
print "line over 80 characters\n";
print "$herecurr";
$clean = 0;
@@ -254,7 +354,7 @@ sub process {
next if ($in_comment);

# no C99 // comments
- if ($line =~ m@//@ and !($line =~ m@\".*//.*\"@)) {
+ if (has_non_quoted($line, '//')) {
print "do not use C99 // comments\n";
print "$herecurr";
$clean = 0;
@@ -320,44 +420,44 @@ sub process {
print "$herecurr";
$clean = 0;
}
+ # Note we expand the line with the leading + as the real
+ # line will be displayed with the leading + and the tabs
+ # will therefore also expand that way.
my $opline = $line;
+ $opline = expand_tabs($opline);
$opline =~ s/^.//;
if (!($line=~/\#\s*include/)) {
# Check operator spacing.
my @elements = split(/(<<=|>>=|<=|>=|==|!=|\+=|-=|\*=|\/=|%=|\^=|\|=|&=|->|<<|>>|<|>|=|!|~|&&|\|\||,|\^|\+\+|--|;|&|\||\+|-|\*|\/\/|\/)/, $opline);
+ my $off = 1;
for (my $n = 0; $n < $#elements; $n += 2) {
- # $wN says we have white-space before or after
- # $sN says we have a separator before or after
- # $oN says we have another operator before or after
- my $w1 = $elements[$n] =~ /\s$/;
- my $s1 = $elements[$n] =~ /(\[|\(|\s)$/;
- my $o1 = $elements[$n] eq '';
+ $off += length($elements[$n]);
+
+ my $a = '';
+ $a = 'V' if ($elements[$n] ne '');
+ $a = 'W' if ($elements[$n] =~ /\s$/);
+ $a = 'B' if ($elements[$n] =~ /(\[|\()$/);
+ $a = 'O' if ($elements[$n] eq '');
+ $a = 'E' if ($elements[$n] eq '' && $n == 0);
+
my $op = $elements[$n + 1];
- my $w2 = 1;
- my $s2 = 1;
- my $o2 = 0;
- # If we have something after the operator handle it.
+
+ my $c = '';
if (defined $elements[$n + 2]) {
- $w2 = $elements[$n + 2] =~ /^\s/;
- $s2 = $elements[$n + 2] =~ /^(\s|\)|\]|;)/;
- $o2 = $elements[$n + 2] eq '';
+ $c = 'V' if ($elements[$n + 2] ne '');
+ $c = 'W' if ($elements[$n + 2] =~ /^\s/);
+ $c = 'B' if ($elements[$n + 2] =~ /^(\)|\]|;)/);
+ $c = 'O' if ($elements[$n + 2] eq '');
+ } else {
+ $c = 'E';
}

- # Generate the context.
- my $at = "here: ";
- for (my $m = $n; $m >= 0; $m--) {
- if ($elements[$m] ne '') {
- $at .= $elements[$m];
- last;
- }
- }
- $at .= $op;
- for (my $m = $n + 2; defined $elements[$m]; $m++) {
- if ($elements[$m] ne '') {
- $at .= $elements[$m];
- last;
- }
- }
+ my $ctx = "${a}x${c}";
+
+ my $at = "(ctx:$ctx)";
+
+ my $ptr = (" " x $off) . "^";
+ my $hereptr = "$here\n$line\n$ptr\n\n";

##print "<$s1:$op:$s2> <$elements[$n]:$elements[$n + 1]:$elements[$n + 2]>\n";
# Skip things apparently in quotes.
@@ -368,38 +468,38 @@ sub process {

# -> should have no spaces
} elsif ($op eq '->') {
- if ($s1 or $s2) {
+ if ($ctx =~ /Wx.|.xW/) {
print "no spaces around that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}

# , must have a space on the right.
} elsif ($op eq ',') {
- if (!$s2) {
+ if ($ctx !~ /.xW|.xE/) {
print "need space after that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}

# unary ! and unary ~ are allowed no space on the right
} elsif ($op eq '!' or $op eq '~') {
- if (!$s1 && !$o1) {
+ if ($ctx !~ /[WOEB]x./) {
print "need space before that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}
- if ($s2) {
+ if ($ctx =~ /.xW/) {
print "no space after that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}

# unary ++ and unary -- are allowed no space on one side.
} elsif ($op eq '++' or $op eq '--') {
- if (($s1 && $s2) || ((!$s1 && !$o1) && (!$s2 && !$o2))) {
+ if ($ctx !~ /[WOB]x[^W]|[^W]x[WOB]/) {
print "need space one side of that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}

@@ -420,10 +520,17 @@ sub process {
# (foo *)
# (foo **)
#
- } elsif ($op eq '&' or $op eq '-' or $op eq '*') {
- if ($w2 and !$w1) {
+ } elsif ($op eq '&' or $op eq '-') {
+ if ($ctx !~ /VxV|[EWB]x[WE]|[EWB]x[VO]/) {
print "need space before that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
+ $clean = 0;
+ }
+
+ } elsif ($op eq '*') {
+ if ($ctx !~ /VxV|[EWB]x[WE]|[EWB]x[VO]|[EWO]x[OBV]/) {
+ print "need space before that '$op' $at\n";
+ print "$hereptr";
$clean = 0;
}

@@ -431,18 +538,19 @@ sub process {
} elsif ($op eq '<<' or $op eq '>>' or $op eq '+' or $op eq '/' or
$op eq '^' or $op eq '|')
{
- if ($s1 != $s2) {
+ if ($ctx !~ /VxV|WxW|VxE|WxE/) {
print "need consistent spacing around '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}

# All the others need spaces both sides.
- } elsif (!$s1 or !$s2) {
+ } elsif ($ctx !~ /[EW]x[WE]/) {
print "need spaces around that '$op' $at\n";
- print "$herecurr";
+ print "$hereptr";
$clean = 0;
}
+ $off += length($elements[$n + 1]);
}
}

@@ -454,7 +562,7 @@ sub process {
}

#goto labels aren't indented, allow a single space however
- if ($line=~/^.\s+[A-Za-z\d_]+:/ and
+ if ($line=~/^.\s+[A-Za-z\d_]+:(?![0-9]+)/ and
!($line=~/^. [A-Za-z\d_]+:/) and !($line=~/^.\s+default:/)) {
print "labels should not be indented\n";
print "$herecurr";
@@ -462,15 +570,15 @@ sub process {
}

# Need a space before open parenthesis after if, while etc
- if ($line=~/(if|while|for|switch)\(/) {
+ if ($line=~/\b(if|while|for|switch)\(/) {
print "need a space before the open parenthesis\n";
print "$herecurr";
$clean = 0;
}

# Check for illegal assignment in if conditional.
- if ($line=~/(if|while)\s*\(.*[^<>!=]=[^=].*\)/) {
- print "do not use assignment in if condition\n";
+ if ($line=~/\b(if|while)\s*\(.*[^<>!=]=[^=].*\)/) {
+ print "do not use assignment in condition\n";
print "$herecurr";
$clean = 0;
}
@@ -484,15 +592,28 @@ sub process {
$clean = 0;
}

- # Check for switch () {<nl>case, these must be at the
- # same indent. We will only catch the first one, as our
- # context is very small but people tend to be consistent
- # so we will catch them out more often than not.
- if ($prevline=~/\s*switch\s*\(.*\)/ and $line=~/\s*case\s+/
- and $previndent != $indent) {
- print "switch and case should be at the same indent\n";
- print "$hereprev";
- $clean = 0;
+ # Check for switch () and associated case and default
+ # statements should be at the same indent.
+ if ($line=~/\bswitch\s*\(.*\)/) {
+ my $err = '';
+ my $sep = '';
+ my @ctx = ctx_block_outer($linenr, $realcnt);
+ shift(@ctx);
+ for my $ctx (@ctx) {
+ my ($clen, $cindent) = line_stats($ctx);
+ if ($ctx =~ /\s*(case\s+|default:)/ &&
+ $indent != $cindent) {
+ $err .= "$sep$ctx\n";
+ $sep = '';
+ } else {
+ $sep = "[...]\n";
+ }
+ }
+ if ($err ne '') {
+ print "switch and case should be at the same indent\n";
+ print "$here\n$line\n$err\n";
+ $clean = 0;
+ }
}

#studly caps, commented out until figure out how to distinguish between use of existing and adding new
@@ -520,7 +641,7 @@ sub process {
}

#if/while/etc brace do not go on next line, unless #defining a do while loop, or if that brace on the next line is for something else
- if ($prevline=~/(if|while|for|switch)\s*\(/) {
+ if ($prevline=~/\b(if|while|for|switch)\s*\(/) {
my @opened = $prevline=~/\(/g;
my @closed = $prevline=~/\)/g;
my $nr_line = $linenr;
@@ -529,7 +650,7 @@ sub process {
my $extra_lines = 0;
my $display_segment = $prevline;

- while ($remaining > 0 && scalar @opened > scalar @closed) {
+ while ($remaining > 1 && scalar @opened > scalar @closed) {
$prevline .= $next_line;
$display_segment .= "\n" . $next_line;
$next_line = $lines[$nr_line];
@@ -540,10 +661,10 @@ sub process {
@closed = $prevline=~/\)/g;
}

- if (($prevline=~/(if|while|for|switch)\s*\(.*\)\s*$/) and ($next_line=~/{/) and
- !($next_line=~/(if|while|for)/) and !($next_line=~/\#define.*do.*while/)) {
+ if (($prevline=~/\b(if|while|for|switch)\s*\(.*\)\s*$/) and ($next_line=~/{/) and
+ !($next_line=~/\b(if|while|for)/) and !($next_line=~/\#define.*do.*while/)) {
print "That { should be on the previous line\n";
- print "$display_segment\n$next_line\n\n";
+ print "$here\n$display_segment\n$next_line\n\n";
$clean = 0;
}
}
@@ -558,7 +679,7 @@ sub process {
}

# don't include deprecated include files
- for my $inc (@deprecated) {
+ for my $inc (@dep_includes) {
if ($line =~ m@\#\s*include\s*\<$inc>@) {
print "Don't use <$inc>: see Documentation/feature-removal-schedule.txt\n";
print "$herecurr";
@@ -566,9 +687,49 @@ sub process {
}
}

-# don't use kernel_thread()
- if ($line =~ /\bkernel_thread\b/) {
- print "Don't use kernel_thread(), use kthread(): see Documentation/feature-removal-schedule.txt\n";
+# don't use deprecated functions
+ for my $func (@dep_functions) {
+ if (has_non_quoted($line, '\b' . $func . '\b')) {
+ print "Don't use $func(): see Documentation/feature-removal-schedule.txt\n";
+ print "$herecurr";
+ $clean = 0;
+ }
+ }
+
+# no volatiles please
+ if (has_non_quoted($line, '\bvolatile\b')) {
+ print "Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt\n";
+ print "$herecurr";
+ $clean = 0;
+ }
+
+# warn about #ifdefs in C files
+ if ($line =~ /^.#\s*if(|n)def/ && ($realfile =~ /\.c$/)) {
+ print "#ifdef in C files should be avoided\n";
+ print "$herecurr";
+ $clean = 0;
+ }
+
+# check for spinlock_t definitions without a comment.
+ if ($line =~ /^.\s*(struct\s+mutex|spinlock_t)\s+\S+;/) {
+ my $which = $1;
+ if (!ctx_has_comment($first_line, $linenr)) {
+ print "$1 definition without comment\n";
+ print "$herecurr";
+ $clean = 0;
+ }
+ }
+# check for memory barriers without a comment.
+ if ($line =~ /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/) {
+ if (!ctx_has_comment($first_line, $linenr)) {
+ print "memory barrier without comment\n";
+ print "$herecurr";
+ $clean = 0;
+ }
+ }
+# check of hardware specific defines
+ if ($line =~ m@^.#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@) {
+ print "architecture specific defines should be avoided\n";
print "$herecurr";
$clean = 0;
}


2007-06-04 15:50:22

by Joel Schopp

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

> This version brings a host of changes to cure false positives and
> bugs detected on patches submitted to lkml and -mm. It also brings
> a number of new tests in response to reviews, of particular note:
>
> - catch use of volatile
> - allow deprecated functions to be listed in feature-removal-schedule.txt
> - warn about #ifdef's in c files


I think the design philosophy of the style checker should be to err on the side of being
quiet. It shouldn't report things that aren't problems. There are plenty of valid uses
of #ifdefs in c files. #ifdefs may be abused often. If we start bothering every author
that uses #ifdefs with an annoying note it detracts from the usefulness of our tool.

If we really want to complain about #ifdefs we should add a flag to the script so it isn't
a default. -potential or something. We could put all the "this often is an error" type
warnings under it.

The rest of the patch looks fine.

-Joel

2007-06-04 16:28:31

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03


On Jun 4 2007 10:46, Andy Whitcroft wrote:
>
> - catch use of volatile

Speaking of volatile, "register" is probably just as unwanted.
Then, "extern inline" is one thing to catch (does not happen
that often, but it does not cost too much either).

> - warn about #ifdef's in c files

Really? There are a lot of ifdefs in existing code, and it is
probably inevitable to add some in newer code ... overall
leading to more false positives and cluttering the output.
Or am I gone wrong somewhere?

>+
>+ } elsif (/^Funcs:\s+(.*\S)/) {
>+ for my $func (split(/[, ]+/, $1)) {
>+ push(@dep_functions, $func);
>+ }

for -> foreach, although it does not functionally change anything.
Yeah, Perl is funny, for(one arg) is actually foreach().
Quite confusing to for(three args).

>+sub line_stats {
^ = \n ?

>+ last if (scalar(@o) == scalar(@c));

Or last if $#o == $#c. Again, personal taste (=do it your way).
I do not think $#a is any cheaper than scalar(@a) anyway.

>+sub has_non_quoted {
>+ return ($_[0] =~ m{$_[1]} and $_[0] !~ m{\".*$_[1].*\"});
>+}

Safe? Or intended? Did you want to allow regexes?
Otherwise...

return $_[0] =~ /\Q$_[1]\E/ && $_[0] !~ /".*\Q$_[1]\E.*"/;

> if (!($line =~ /^\s*Signed-off-by:/)) {
^ ^^
=> if($line !~ /.../)
Gotta love perl. Perhaps one language where you'll always find a
way to circumvent any CodingStyle ever written :p

> #80 column limit
>- if (!($prevline=~/\/\*\*/) && length($lineforcounting) > 80) {
>+ if (!($prevline=~/\/\*\*/) && $length > 80) {

I say thee 79. But we had that more or less already.

>+ for my $ctx (@ctx) {

>- while ($remaining > 0 && scalar @opened > scalar @closed) {
>+ while ($remaining > 1 && scalar @opened > scalar @closed) {

Although scalar might bind as hard as sizeof in C, I suggest parentheses.
(Or $#)

while ($remaining > 1 && scalar(@opened) > scalar(@closed))

>+# warn about #ifdefs in C files
>+ if ($line =~ /^.#\s*if(|n)def/ && ($realfile =~ /\.c$/)) {

save a capture operation: /^.#\s*ifn?def/



Jan
--

2007-06-04 16:51:28

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

jschopp wrote:
>> This version brings a host of changes to cure false positives and
>> bugs detected on patches submitted to lkml and -mm. It also brings
>> a number of new tests in response to reviews, of particular note:
>>
>> - catch use of volatile
>> - allow deprecated functions to be listed in
>> feature-removal-schedule.txt
>> - warn about #ifdef's in c files
>
>
> I think the design philosophy of the style checker should be to err on
> the side of being quiet. It shouldn't report things that aren't
> problems. There are plenty of valid uses of #ifdefs in c files.
> #ifdefs may be abused often. If we start bothering every author that
> uses #ifdefs with an annoying note it detracts from the usefulness of
> our tool.
>
> If we really want to complain about #ifdefs we should add a flag to the
> script so it isn't a default. -potential or something. We could put
> all the "this often is an error" type warnings under it.

The original suggestion was to count them and only complain if there
were "lots". I had thought though that the general consensus was that
#ifdef in C files was pretty much frowned upon. I must admit to working
to the "you must be able to justify all winges in the output" rather
than expecting the result to be empty necessarily.

I am ambivalent on what gets reported as long as its generally useful.
I as you say don't want to produce so much noise that it hides the
useful output.

We've not talked about how the tool might grow. My thought was that
soon we would enable the robot replies on linux-mm (say) and use the
feedback from that to tune what we do and do not report. I have been
pushing all of the contributions to -mm for sometime through it and
cirtainly the #ifdef one once of the more common ones (other than white
space dammage and >80 character lines).

-apw

2007-06-04 17:22:26

by Joel Schopp

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

> The original suggestion was to count them and only complain if there
> were "lots". I had thought though that the general consensus was that
> #ifdef in C files was pretty much frowned upon. I must admit to working
> to the "you must be able to justify all winges in the output" rather
> than expecting the result to be empty necessarily.

I really think we should work towards, "if you see an error the odds are overwhelming that
we aren't wasting your time and you should fix this." rather than, "every time you send a
patch you will get a couple false positives that you will have to think about and justify
to yourself, slowing down your sending the patch out and making more work for you". I'm
not saying we have to have 100% accuracy, but I want it well in the 90s.

Now back to the ifdef's. I don't think we can really even say a lot or a little is broken
(short patch can do 1 ifdef that is stupid, long patch could do several that are good). I
think we'll either have to let that one go or put it under a non-default flag. We do
still have humans reviewing code who can make judgement calls like if you #ifdefs are good
or not.

What we can check for is #if 0 code. I hate that one.

> We've not talked about how the tool might grow. My thought was that
> soon we would enable the robot replies on linux-mm (say) and use the
> feedback from that to tune what we do and do not report. I have been
> pushing all of the contributions to -mm for sometime through it and
> cirtainly the #ifdef one once of the more common ones (other than white
> space dammage and >80 character lines).

If you have reasonably good SPAM filters on your list and make the robot so it is very
good about only picking up mail that really are patches then this could be a very good
idea. Just be careful. I could send out a bunch of mail with fake headers saying it was
from say Andy Whitcroft, then the robot replies and you got a bunch of junk mail.

User feedback is really useful, both for us and for the user. Based on user feedback from
the very small number of users I had I tweaked a lot of regular expressions, added some
new checks, and removed some I could never get right.

-Joel

2007-06-04 18:47:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Mon, 04 Jun 2007 10:46:24 +0100
Andy Whitcroft <[email protected]> wrote:

> This version brings a host of changes to cure false positives and
> bugs detected on patches submitted to lkml and -mm. It also brings
> a number of new tests in response to reviews, of particular note:
>
> - catch use of volatile
> - allow deprecated functions to be listed in feature-removal-schedule.txt
> - warn about #ifdef's in c files
> - check that spinlock_t and struct mutex use is commented
> - report on architecture specific defines being used
> - report memory barriers without an associated comment

Oy. I ran checkpatch.pl across this patch and it failed to report upon the
new trailing whitespace which you just tried to add.

2007-06-04 19:08:26

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

Andrew Morton wrote:
> On Mon, 04 Jun 2007 10:46:24 +0100
> Andy Whitcroft <[email protected]> wrote:
>
>> This version brings a host of changes to cure false positives and
>> bugs detected on patches submitted to lkml and -mm. It also brings
>> a number of new tests in response to reviews, of particular note:
>>
>> - catch use of volatile
>> - allow deprecated functions to be listed in feature-removal-schedule.txt
>> - warn about #ifdef's in c files
>> - check that spinlock_t and struct mutex use is commented
>> - report on architecture specific defines being used
>> - report memory barriers without an associated comment
>
> Oy. I ran checkpatch.pl across this patch and it failed to report upon the
> new trailing whitespace which you just tried to add.

Herm, I guess thats cause its a .pl and therefore exempt from most of
the checks.

I guess line length and white space checks make sense some degree on
those files. I'll sort that out and I guess we'll have anohter version.
Sounds like the #ifdef checks are too much anyhow.

-apw

2007-06-04 19:11:29

by Rene Herman

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 06/04/2007 09:08 PM, Andy Whitcroft wrote:

> I guess line length and white space checks make sense some degree on
> those files. I'll sort that out and I guess we'll have anohter version.

Could you then also post the thing itself, and not just a patch against the
previous version for us chickens too scared to run -mm?

Rene.

2007-06-04 20:04:16

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Mon, 04 Jun 2007 21:08:07 +0200 Rene Herman wrote:

> On 06/04/2007 09:08 PM, Andy Whitcroft wrote:
>
> > I guess line length and white space checks make sense some degree on
> > those files. I'll sort that out and I guess we'll have anohter version.
>
> Could you then also post the thing itself, and not just a patch against the
> previous version for us chickens too scared to run -mm?

I'd like to see it put on the web in a fixed location.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-06-05 08:14:46

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Mon, Jun 04, 2007 at 10:46:24AM +0100, Andy Whitcroft wrote:
>
> This version brings a host of changes to cure false positives and
> bugs detected on patches submitted to lkml and -mm. It also brings
> a number of new tests in response to reviews, of particular note:
>
> - catch use of volatile

It will warn on "asm volatile (" which it shouldn't.

> - report on architecture specific defines being used

We use architecture specific defines to distinguish between 32 bit and
64 bit code in user space visible header files, since we cannot use
CONFIG_64BIT. So this will give us false positives as well.
Maybe don't warn for header files in include/asm-* ?

2007-06-05 09:56:38

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

jschopp wrote:
>> This version brings a host of changes to cure false positives and
>> bugs detected on patches submitted to lkml and -mm. It also brings
>> a number of new tests in response to reviews, of particular note:
>>
>> - catch use of volatile
>> - allow deprecated functions to be listed in
>> feature-removal-schedule.txt
>> - warn about #ifdef's in c files
>
>
> I think the design philosophy of the style checker should be to err on
> the side of being quiet. It shouldn't report things that aren't
> problems. There are plenty of valid uses of #ifdefs in c files.
> #ifdefs may be abused often. If we start bothering every author that
> uses #ifdefs with an annoying note it detracts from the usefulness of
> our tool.
>
> If we really want to complain about #ifdefs we should add a flag to the
> script so it isn't a default. -potential or something. We could put
> all the "this often is an error" type warnings under it.
>
> The rest of the patch looks fine.

For now then we'll put the ifdef checks on ice until we get a better
idea of the "rules" if we ever do.

-apw

2007-06-05 18:39:39

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

Randy Dunlap wrote:
> On Mon, 04 Jun 2007 21:08:07 +0200 Rene Herman wrote:
>
>> On 06/04/2007 09:08 PM, Andy Whitcroft wrote:
>>
>>> I guess line length and white space checks make sense some degree on
>>> those files. I'll sort that out and I guess we'll have anohter version.
>> Could you then also post the thing itself, and not just a patch against the
>> previous version for us chickens too scared to run -mm?
>
> I'd like to see it put on the web in a fixed location.

Yep. That makes lots of sense. Am trying to arrange a fixed location.
Failing that I'll have to shove it on my server and move it later.
Will let you know the outcome.

-apw

2007-06-05 18:45:26

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

jschopp wrote:
>> The original suggestion was to count them and only complain if there
>> were "lots". I had thought though that the general consensus was that
>> #ifdef in C files was pretty much frowned upon. I must admit to working
>> to the "you must be able to justify all winges in the output" rather
>> than expecting the result to be empty necessarily.
>
> I really think we should work towards, "if you see an error the odds are
> overwhelming that we aren't wasting your time and you should fix this."
> rather than, "every time you send a patch you will get a couple false
> positives that you will have to think about and justify to yourself,
> slowing down your sending the patch out and making more work for you".
> I'm not saying we have to have 100% accuracy, but I want it well in the
> 90s.

Yeah I tend to agree and most of my work to date has been squashing
false positives.

> Now back to the ifdef's. I don't think we can really even say a lot or
> a little is broken (short patch can do 1 ifdef that is stupid, long
> patch could do several that are good). I think we'll either have to let
> that one go or put it under a non-default flag. We do still have
> humans reviewing code who can make judgement calls like if you #ifdefs
> are good or not.
>
> What we can check for is #if 0 code. I hate that one.

For now I've disabled this one. Put it in the freezer with the
StudlyCaps one.

I like the idea of checking for #if 0 as that is very likely bogus in a
patch meant for inclusion.

>> We've not talked about how the tool might grow. My thought was that
>> soon we would enable the robot replies on linux-mm (say) and use the
>> feedback from that to tune what we do and do not report. I have been
>> pushing all of the contributions to -mm for sometime through it and
>> cirtainly the #ifdef one once of the more common ones (other than white
>> space dammage and >80 character lines).
>
> If you have reasonably good SPAM filters on your list and make the robot
> so it is very good about only picking up mail that really are patches
> then this could be a very good idea. Just be careful. I could send out
> a bunch of mail with fake headers saying it was from say Andy Whitcroft,
> then the robot replies and you got a bunch of junk mail.

All very good points. It does only reply to emails which are clearly
unidiff patches and silently drops all else. But the potential for spam
is worrysome ... will think on this some and see if we can come up with
a safety net.

> User feedback is really useful, both for us and for the user. Based on
> user feedback from the very small number of users I had I tweaked a lot
> of regular expressions, added some new checks, and removed some I could
> never get right.

I am getting a fair bit of feedback, most of it copied to Joel and Randy
so at least that bit is working.

-apw

2007-06-06 09:05:27

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 04/06/07, Andy Whitcroft <[email protected]> wrote:
>
> This version brings a host of changes to cure false positives and
> bugs detected on patches submitted to lkml and -mm. It also brings
> a number of new tests in response to reviews, of particular note:
>

I have a few ideas for additional checks you could add to that script:

- Source files should be 7bit ASCII and Documentation/Kbuild
files/etc should be UTF-8, test that the patch honors that and doesn't
put something else in (cleanups that remove 8bit ASCII etc from a
source file is OK though).

- Check that nothing in the patch touches any file from Documentation/dontdiff

- Check for an excessive number of blank lines - some people have a
nasty tendency to put 3 or more blank lines between functions or
between comments and next source line etc.

- Check that all newlines added by the patch are "\n", not "\r",
"\r\n" or "\n\r".

- Check that, if the patch adds a new file, the new file ends with a newline.

- Warn about usage of the "register" keyword.

- Maybe warn about usage of float/double in source files?

- 'return' is not a function, so warn about patches that think it is
and use 'return(expr);' (this one is tricky since 'return (expr);' can
be OK in some cases.


That's all I can come up with at the moment. If more ideas pop up I'll
let you know.
Good work with that script btw.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-06 11:50:18

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 04/06/07, Andy Whitcroft <[email protected]> wrote:
>
> This version brings a host of changes to cure false positives and
> bugs detected on patches submitted to lkml and -mm. It also brings
> a number of new tests in response to reviews, of particular note:
>
A chmod +x scripts/checkpatch.pl would be nice :)

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-07 11:46:19

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

Jesper Juhl wrote:
> On 04/06/07, Andy Whitcroft <[email protected]> wrote:
>>
>> This version brings a host of changes to cure false positives and
>> bugs detected on patches submitted to lkml and -mm. It also brings
>> a number of new tests in response to reviews, of particular note:
>>
> A chmod +x scripts/checkpatch.pl would be nice :)

While git carries the permissions I am am pretty sure a straight patch
doesn't. Where did you pick up your copy from linus' tree or from the
-mm one?

-apw

2007-06-07 11:52:45

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 07/06/07, Andy Whitcroft <[email protected]> wrote:
> Jesper Juhl wrote:
> > On 04/06/07, Andy Whitcroft <[email protected]> wrote:
> >>
> >> This version brings a host of changes to cure false positives and
> >> bugs detected on patches submitted to lkml and -mm. It also brings
> >> a number of new tests in response to reviews, of particular note:
> >>
> > A chmod +x scripts/checkpatch.pl would be nice :)
>
> While git carries the permissions I am am pretty sure a straight patch
> doesn't. Where did you pick up your copy from linus' tree or from the
> -mm one?
>
via 'git pull' from Linus' tree.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-07 14:28:29

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03


On Jun 6 2007 11:05, Jesper Juhl wrote:
>
> - Source files should be 7bit ASCII

Nah. Think of....

MODULE_AUTHOR("J. Ørsted <[email protected]>");

> - Maybe warn about usage of float/double in source files?

Generally yes, maybe, but see arch/i386/kernel/cpu/bugs.c,
arch/i386/math-emu/. Generally there is nothing to it. I think the
feature to allow the kernel to use [i387] FP without manually
saving/restoring the FP stack has been added some time ago.

> - 'return' is not a function, so warn about patches that think it is
> and use 'return(expr);' (this one is tricky since 'return (expr);' can
> be OK in some cases.

Now, if we could detect superfluous parentheses and branches,
that'd be cool ;-) there are too many if ((a < 5) || (b > 6)) around.




Jan
--

2007-06-07 14:39:45

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 07/06/07, Jan Engelhardt <[email protected]> wrote:
>
> On Jun 6 2007 11:05, Jesper Juhl wrote:
> >
> > - Source files should be 7bit ASCII
>
> Nah. Think of....
>
> MODULE_AUTHOR("J. ?rsted <[email protected]>");
>
That's true. I wrote that comment shortly after reading
http://lkml.org/lkml/2007/6/4/448 , but you are right, 7bit ASCII can
be too limiting at times... Hmmm...

> > - Maybe warn about usage of float/double in source files?
>
> Generally yes, maybe, but see arch/i386/kernel/cpu/bugs.c,
> arch/i386/math-emu/. Generally there is nothing to it. I think the
> feature to allow the kernel to use [i387] FP without manually
> saving/restoring the FP stack has been added some time ago.
>
I know there are places where floats and doubles can be used safely,
but for those rare occasions wouldn't it make sense to have the script
warn and require the submitter to justify the use? After all, the
general rule is to not use floating point in the kernel, so such a
patch is suspicious.

> > - 'return' is not a function, so warn about patches that think it is
> > and use 'return(expr);' (this one is tricky since 'return (expr);' can
> > be OK in some cases.
>
> Now, if we could detect superfluous parentheses and branches,
> that'd be cool ;-) there are too many if ((a < 5) || (b > 6)) around.
>
Yeah wouldn't it be cool :-) It might require a bit too much perl
magic to actually implement something sane, but I just threw every
idea that came into my mind into the mail, assuming Andy could sort
out the ones that were a little too crazy ;)

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-07 14:52:23

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03


On Jun 7 2007 12:46, Andy Whitcroft wrote:
>
>Jesper Juhl wrote:
>> On 04/06/07, Andy Whitcroft <[email protected]> wrote:
>>>
>>> This version brings a host of changes to cure false positives and
>>> bugs detected on patches submitted to lkml and -mm. It also brings
>>> a number of new tests in response to reviews, of particular note:
>>>
>> A chmod +x scripts/checkpatch.pl would be nice :)
>
>While git carries the permissions I am am pretty sure a straight patch
>doesn't. Where did you pick up your copy from linus' tree or from the
>-mm one?

Linus's tree lacks the +x bit...


Jan
--

2007-06-07 15:16:43

by Andy Whitcroft

[permalink] [raw]
Subject: checkpatch.pl: should be executable

scripts/checkpatch.pl should be executable, make it so.

Signed-off-by: Andy Whitcroft <[email protected]>
---
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
old mode 100644
new mode 100755

2007-06-07 15:33:30

by Joel Schopp

[permalink] [raw]
Subject: Re: checkpatch.pl: should be executable

Andy Whitcroft wrote:
> scripts/checkpatch.pl should be executable, make it so.
>
> Signed-off-by: Andy Whitcroft <[email protected]>
> ---
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> old mode 100644
> new mode 100755
>

Acked-by: Joel Schopp <[email protected]>

2007-06-07 19:32:32

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Wed, Jun 06, 2007 at 11:05:07AM +0200, Jesper Juhl wrote:
>...
> - Source files should be 7bit ASCII and Documentation/Kbuild
> files/etc should be UTF-8, test that the patch honors that and doesn't
> put something else in (cleanups that remove 8bit ASCII etc from a
> source file is OK though).
>...

That's wrong:
Code must be 7bit ASCII.
Comments in source files can be UTF-8.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-07 19:34:19

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Thu, Jun 07, 2007 at 04:28:20PM +0200, Jan Engelhardt wrote:
>
> On Jun 6 2007 11:05, Jesper Juhl wrote:
> >
> > - Source files should be 7bit ASCII
>
> Nah. Think of....
>
> MODULE_AUTHOR("J. Ørsted <[email protected]>");
>...

NO!

Code must be 7bit ASCII.
This includes everything that gets into the kernel image.

> Jan

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-07 22:16:24

by Alan

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Thu, 7 Jun 2007 21:32:29 +0200
Adrian Bunk <[email protected]> wrote:

> On Wed, Jun 06, 2007 at 11:05:07AM +0200, Jesper Juhl wrote:
> >...
> > - Source files should be 7bit ASCII and Documentation/Kbuild
> > files/etc should be UTF-8, test that the patch honors that and doesn't
> > put something else in (cleanups that remove 8bit ASCII etc from a
> > source file is OK though).
> >...
>
> That's wrong:
> Code must be 7bit ASCII.
> Comments in source files can be UTF-8.

Also there is no such thing as 8bit ASCII.

2007-06-07 22:20:49

by Alan

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Thu, 7 Jun 2007 21:34:13 +0200
Adrian Bunk <[email protected]> wrote:

> On Thu, Jun 07, 2007 at 04:28:20PM +0200, Jan Engelhardt wrote:
> >
> > On Jun 6 2007 11:05, Jesper Juhl wrote:
> > >
> > > - Source files should be 7bit ASCII
> >
> > Nah. Think of....
> >
> > MODULE_AUTHOR("J. Ørsted <[email protected]>");
> >...
>
> NO!
>
> Code must be 7bit ASCII.
> This includes everything that gets into the kernel image.

Disagree Adrian

For quoted strings you want to include Unicode where appropriate, and the
names of people happens to be highly appropriate. Trashing non US names
is just rude, and in many cases extremely problematic because losing
accent marks totally changes the meaning of the word and the
pronunciation of the name.

Now anyone who puts UTF-8 in the driver name or module options should get
a lot of NAKs but putting it in the Author name is precisely where it is
appropriate and correct. I suspect Author names are almost the only case
where this is appropriate and/or neccessary.

Alan

2007-06-07 23:21:57

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Thu, Jun 07, 2007 at 11:22:48PM +0100, Alan Cox wrote:
> On Thu, 7 Jun 2007 21:34:13 +0200
> Adrian Bunk <[email protected]> wrote:
>
> > On Thu, Jun 07, 2007 at 04:28:20PM +0200, Jan Engelhardt wrote:
> > >
> > > On Jun 6 2007 11:05, Jesper Juhl wrote:
> > > >
> > > > - Source files should be 7bit ASCII
> > >
> > > Nah. Think of....
> > >
> > > MODULE_AUTHOR("J. Ørsted <[email protected]>");
> > >...
> >
> > NO!
> >
> > Code must be 7bit ASCII.
> > This includes everything that gets into the kernel image.
>
> Disagree Adrian
>
> For quoted strings you want to include Unicode where appropriate, and the
> names of people happens to be highly appropriate. Trashing non US names
> is just rude, and in many cases extremely problematic because losing
> accent marks totally changes the meaning of the word and the
> pronunciation of the name.
>
> Now anyone who puts UTF-8 in the driver name or module options should get
> a lot of NAKs but putting it in the Author name is precisely where it is
> appropriate and correct. I suspect Author names are almost the only case
> where this is appropriate and/or neccessary.

I added a MODULE_AUTHOR("J. Ørsted <[email protected]>") into the "raw"
module:

# echo $LANG
C
# modinfo --version
module-init-tools version 3.3-pre11
# modinfo raw
filename: /lib/modules/2.6.21.2/kernel/drivers/char/raw.ko
author: J. Ã
^ the cursor hangs here


So for implementing your proposal, we have to:
- get module-init-tools fixed and
- document that 2.6.23 (or whichever will be the first kernel to support
UTF-8 in MODULE_AUTHOR) will require updated module-init-tools.


Oh, and when you are anyway planning to break older userspace, can you
remove the obsolete "raw" driver at the same time? ;-)


> Alan

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-07 23:39:29

by Alan

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

> I added a MODULE_AUTHOR("J. Ørsted <[email protected]>") into the "raw"
> module:
>
> # echo $LANG
> C
> # modinfo --version
> module-init-tools version 3.3-pre11
> # modinfo raw
> filename: /lib/modules/2.6.21.2/kernel/drivers/char/raw.ko
> author: J. Ã
> ^ the cursor hangs here

"Mummy if I deliberately shoot myself in the head it hurts"

Distro's don't ship in C locale and haven't for years. And the worst case
effect you can engineer by trying is to display some slightly odd symbols

(And incidentially since the Linux fs has been defined to be utf-8 for
naming for many years you'll find the same problem using "ls")

> - get module-init-tools fixed and
> - document that 2.6.23 (or whichever will be the first kernel to support
> UTF-8 in MODULE_AUTHOR) will require updated module-init-tools.

"Require" is a rather strong word for a print formatting issue specific
to obscure setups.

Alan

2007-06-07 23:49:17

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, Jun 08, 2007 at 01:21:52AM +0200, Adrian Bunk wrote:
>...
> I added a MODULE_AUTHOR("J. Ørsted <[email protected]>") into the "raw"
> module:
>
> # echo $LANG
> C
> # modinfo --version
> module-init-tools version 3.3-pre11
> # modinfo raw
> filename: /lib/modules/2.6.21.2/kernel/drivers/char/raw.ko
> author: J. Ã
> ^ the cursor hangs here
>...

If anyone's wondering what's happening:

The UTF-8 representation of the character Ø consists of the two bytes
0xC3 0x98

In the ISO/IEC 8859 encodings where every character is represented by
one bytes this corresponds to two characters:

In ISO/IEC 8859-1 the byte 0xC3 represents the character à resulting in
the (harmless) display of this wrong character.

But in all the ISO/IEC 8859 encodings, the byte 0x98 is the
_control code_ "Start of String".

Therefore, if we want start using UTF-8 anywhere into the kernel, we
must ensure that all applications correctly convert all characters
if running in a non-UTF-8 environment.

I'm not sure that's worth the hassle.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-08 00:05:01

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, Jun 08, 2007 at 12:41:17AM +0100, Alan Cox wrote:
> > I added a MODULE_AUTHOR("J. Ørsted <[email protected]>") into the "raw"
> > module:
> >
> > # echo $LANG
> > C
> > # modinfo --version
> > module-init-tools version 3.3-pre11
> > # modinfo raw
> > filename: /lib/modules/2.6.21.2/kernel/drivers/char/raw.ko
> > author: J. Ã
> > ^ the cursor hangs here
>
> "Mummy if I deliberately shoot myself in the head it hurts"
>
> Distro's don't ship in C locale and haven't for years. And the worst case
> effect you can engineer by trying is to display some slightly odd symbols

If it would only display some slightly odd symbols I wouldn't complain.

The problem is that the second byte is interpreted as a control code.

Is there any trick to get the shell working again in this situation?
The cursor hangs, and I've not yet found a trick to do anything in this
xterm again (except for killing it from another xterm).

> (And incidentially since the Linux fs has been defined to be utf-8 for
> naming for many years you'll find the same problem using "ls")

No, "ls" can handle it perfectly:

# echo $LANG
C
# ls
??rsted
#

Or:

$ echo $LANG
en_US
$ ls
Ã?rsted
$

Different from the lsmod example, the cursor doesn't hang and the shell
is usable after this command.

The difference is that "ls" expects and handles such issues while
"lsmod" (and most likely also other userspace working with kernel
output) does not yet handle it resulting in problems if bytes are
wrongly interpreted as control codes.

> > - get module-init-tools fixed and
> > - document that 2.6.23 (or whichever will be the first kernel to support
> > UTF-8 in MODULE_AUTHOR) will require updated module-init-tools.
>
> "Require" is a rather strong word for a print formatting issue specific
> to obscure setups.

See obove, it's not only "print formatting", it's "kills my shell".

> Alan

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-08 05:11:33

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, 2007-06-08 at 02:04 +0200, Adrian Bunk wrote:

> The difference is that "ls" expects and handles such issues while
> "lsmod" (and most likely also other userspace working with kernel
> output) does not yet handle it resulting in problems if bytes are
> wrongly interpreted as control codes.

I'm happy to modify module-init-tools for Unicode support, I just didn't
think this was a huge issue - but now there's a test case so... :-)

Jon.


2007-06-08 08:58:32

by Jan-Benedict Glaw

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, 2007-06-08 02:04:55 +0200, Adrian Bunk <[email protected]> wrote:
> On Fri, Jun 08, 2007 at 12:41:17AM +0100, Alan Cox wrote:
> > (And incidentially since the Linux fs has been defined to be utf-8 for
> > naming for many years you'll find the same problem using "ls")
>
> No, "ls" can handle it perfectly:
>
> # echo $LANG
> C
> # ls
> ??rsted
> #

`ls' may be playing tricks by checking whether its output goes to a
TTY. Does the terminal also hang for

$ ls | cat
or
$ ls -N

MfG, JBG

--
Jan-Benedict Glaw [email protected] +49-172-7608481
Signature of: http://catb.org/~esr/faqs/smart-questions.html
the second :


Attachments:
(No filename) (667.00 B)
signature.asc (189.00 B)
Digital signature
Download all attachments

2007-06-08 09:31:28

by Andy Whitcroft

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

Rene Herman wrote:
> On 06/04/2007 09:08 PM, Andy Whitcroft wrote:
>
>> I guess line length and white space checks make sense some degree on
>> those files. I'll sort that out and I guess we'll have anohter version.
>
> Could you then also post the thing itself, and not just a patch against
> the previous version for us chickens too scared to run -mm?

Ok, the latest version 0.04 is released and I have also put up the
complete script at:

http://www.shadowen.org/~apw/public/checkpatch/checkpatch.pl-0.04

Previous versions are also there.

-apw

2007-06-08 10:12:46

by Rene Herman

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 06/08/2007 11:31 AM, Andy Whitcroft wrote:

> Ok, the latest version 0.04 is released and I have also put up the
> complete script at:
>
> http://www.shadowen.org/~apw/public/checkpatch/checkpatch.pl-0.04
>
> Previous versions are also there.

Thank you. False positive:

do not use assignment in condition
#809: FILE: drivers/cdrom/mitsumi.c:766:
+ while ((req = elv_next_request(q))) {

Doing an assignment in a while (or for) condition like that makes perfect
sense. The check should probably be limited to if conditions -- you can
always split those.

Next:

struct mutex definition without comment
#173: FILE: drivers/cdrom/mitsumi.c:130:
+ struct mutex mutex;

Going overboard. It's the only locking primitive in the driver; the only
comment that could be added is something like "used for mutual exclusion"
which firmly falls into the "i++; /* increase i */" category. A bit further
on up in the driver, a:

/*
* LOCKING: mutex_lock(&mcd->mutex)
*/

is present just before the functions that need to be called with it held
(all, basically). I'd suggest simply dropping this check as its intention is
not something that can be usefully automated I feel. The tree would end up
with tons of useless comments that are there just to shut up the script.

And next a ton of:

labels should not be indented
#249: FILE: drivers/cdrom/mitsumi.c:206:
+ out:

This driver is putting two spaces in front of labels, as explained here:

http://lkml.org/lkml/2007/6/5/61

I do agree that putting them level aligned is a _significantly_ different
style, so perhaps it wants to warn if a label is not within the first indent
level (column 0-7) but if even that's contentious then this check should
perhaps go completely as well. One or two spaces, four for all I care, can
be considered a personal preference I feel.

Rene.

2007-06-08 10:51:29

by Alan

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

> The problem is that the second byte is interpreted as a control code.
>
> Is there any trick to get the shell working again in this situation?
> The cursor hangs, and I've not yet found a trick to do anything in this
> xterm again (except for killing it from another xterm).

For gnome terminal just select 'reset terminal' in the menu or escape-[c;
(from memory) is the VT reset code. If your xterm can be stuck forever
file a security bug against your vendors xterm for a DoS attack problem.

> > "Require" is a rather strong word for a print formatting issue specific
> > to obscure setups.
>
> See obove, it's not only "print formatting", it's "kills my shell".

It printed a symbol, if your shell really got screwed that much by it
then your shell needs work perhaps. I'm not btw arguing that we shouldn't
teach the tools to be more polite, just that its hardly a "requirement"

Alan

2007-06-08 12:39:45

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, Jun 08, 2007 at 11:52:19AM +0100, Alan Cox wrote:
> > The problem is that the second byte is interpreted as a control code.
> >
> > Is there any trick to get the shell working again in this situation?
> > The cursor hangs, and I've not yet found a trick to do anything in this
> > xterm again (except for killing it from another xterm).
>
> For gnome terminal just select 'reset terminal' in the menu or escape-[c;
> (from memory) is the VT reset code. If your xterm can be stuck forever
> file a security bug against your vendors xterm for a DoS attack problem.

Someone else already told me this trick, and the "Full Reset" from the
Control + middle mouse button menu works with my xterm.

Not a problem if you know about it or if you have time.

> > > "Require" is a rather strong word for a print formatting issue specific
> > > to obscure setups.
> >
> > See obove, it's not only "print formatting", it's "kills my shell".
>
> It printed a symbol, if your shell really got screwed that much by it
> then your shell needs work perhaps.

My shell is bash...

> I'm not btw arguing that we shouldn't
> teach the tools to be more polite, just that its hardly a "requirement"

For tools like ls or vim that have to deal with every kind of charset
confusion for ages such issues have already been shaken out.

But tools don't expects the kernel to output non-ASCII strings.

It's not only about MODULE_AUTHOR, if you consider it rude to limit
people's names to ASCII, then don't forget that we have printk's like
Linux agpgart interface v0.102 (c) Dave Jones

What happens if the maintainer changes and it's now
Linux agpgart interface v0.103 © Dave Ønes

Does the console handle it correctly during boot?
Can all tools that process the syslog cope with it?

Perhaps the answer is in both cases "yes", but it's a completely
untested area.

We really must have all bugs shaken out and all users using fixed tools
_before_ we can start outputting UTF-8 - limiting people's names to
ASCII in not ideal, but IMHO causing breakages for users is a much
bigger problem.

> Alan

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-08 14:34:21

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On 08/06/07, Adrian Bunk <[email protected]> wrote:
[snip]
>
> It's not only about MODULE_AUTHOR, if you consider it rude to limit
> people's names to ASCII, then don't forget that we have printk's like
> Linux agpgart interface v0.102 (c) Dave Jones
>
> What happens if the maintainer changes and it's now
> Linux agpgart interface v0.103 (c) Dave ?nes
>
> Does the console handle it correctly during boot?
> Can all tools that process the syslog cope with it?
>
> Perhaps the answer is in both cases "yes", but it's a completely
> untested area.
>
> We really must have all bugs shaken out and all users using fixed tools
> _before_ we can start outputting UTF-8 - limiting people's names to
> ASCII in not ideal, but IMHO causing breakages for users is a much
> bigger problem.
>

I haven't looked at it in depth yet, but it would seem we already have
a few files that need to be looked at with this in mind. Looks like
it's not exactely a new problem (although all the following could be
in comments of course)...

$ find ./ -name "*.[ch]" | xargs file | grep -i utf
./arch/arm/mach-pxa/leds-trizeps4.c: UTF-8
Unicode C program text
./arch/arm/mach-pxa/trizeps4.c: UTF-8
Unicode C program text
./arch/powerpc/platforms/cell/spufs/file.c: UTF-8
Unicode C program text
./drivers/acpi/asus_acpi.c: UTF-8
Unicode C program text
./drivers/char/drm/r128_drv.h: UTF-8
Unicode C program text
./drivers/char/drm/radeon_irq.c: UTF-8
Unicode C program text
./drivers/char/drm/drm_drawable.c: UTF-8
Unicode C program text
./drivers/char/drm/drm_pci.c: UTF-8
Unicode C program text
./drivers/char/drm/drm_core.h: UTF-8
Unicode C program text
./drivers/char/hw_random/omap-rng.c: UTF-8
Unicode C program text
./drivers/char/esp.c: UTF-8
Unicode C program text
./drivers/char/watchdog/iTCO_vendor_support.c: UTF-8
Unicode C program text
./drivers/i2c/busses/i2c-iop3xx.c: UTF-8
Unicode C program text
./drivers/infiniband/core/multicast.c: UTF-8
Unicode C program text
./drivers/infiniband/core/sa.h: UTF-8
Unicode C program text
./drivers/infiniband/core/sa_query.c: UTF-8
Unicode C program text
./drivers/mtd/chips/cfi_cmdset_0001.c: UTF-8
Unicode C program text
./drivers/mtd/chips/cfi_probe.c: UTF-8
Unicode C program text
./drivers/mtd/devices/block2mtd.c: UTF-8
Unicode C program text
./drivers/mtd/devices/phram.c: UTF-8
Unicode English text
./drivers/mtd/maps/cfi_flagadm.c: UTF-8
Unicode C program text
./drivers/mtd/maps/dbox2-flash.c: UTF-8
Unicode C program text
./drivers/mtd/maps/mtx-1_flash.c: UTF-8
Unicode C program text
./drivers/mtd/nand/ts7250.c: UTF-8
Unicode C program text
./drivers/mtd/nand/cafe_nand.c: UTF-8
Unicode C program text
./drivers/mtd/nand/cmx270_nand.c: UTF-8
Unicode C program text
./drivers/mtd/nand/cs553x_nand.c: UTF-8
Unicode C program text
./drivers/mtd/nand/edb7312.c: UTF-8
Unicode C program text
./drivers/mtd/nand/h1910.c: UTF-8
Unicode C program text
./drivers/mtd/mtdsuper.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/build.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/cdev.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/debug.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/debug.h: UTF-8
Unicode C program text
./drivers/mtd/ubi/gluebi.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/io.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/kapi.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/misc.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/scan.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/scan.h: UTF-8
Unicode C program text
./drivers/mtd/ubi/ubi.h: UTF-8
Unicode C program text
./drivers/mtd/ubi/upd.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/vmt.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/vtbl.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/wl.c: UTF-8
Unicode C program text
./drivers/mtd/ubi/eba.c: UTF-8
Unicode C program text
./drivers/net/irda/toim3232-sir.c: UTF-8
Unicode English text
./drivers/net/irda/kingsun-sir.c: UTF-8
Unicode Pascal program text
./drivers/net/atl1/atl1_hw.h: UTF-8
Unicode C program text
./drivers/scsi/atari_NCR5380.c:
UTF-8 Unicode C program text
./drivers/scsi/jazz_esp.c:
UTF-8 Unicode C program text
./drivers/usb/misc/iowarrior.c:
UTF-8 Unicode C program text
./drivers/usb/misc/auerswald.c:
UTF-8 Unicode C program text
./drivers/video/atafb_iplan2p2.c:
UTF-8 Unicode C program text
./drivers/video/atafb_iplan2p4.c:
UTF-8 Unicode C program text
./drivers/video/atafb_iplan2p8.c:
UTF-8 Unicode C program text
./fs/afs/afs_vl.h:
UTF-8 Unicode C program text
./fs/jffs2/acl.c:
UTF-8 Unicode C program text
./fs/jffs2/acl.h:
UTF-8 Unicode C program text
./fs/jffs2/background.c:
UTF-8 Unicode C program text
./fs/jffs2/build.c:
UTF-8 Unicode C program text
./fs/jffs2/compr.c:
UTF-8 Unicode C program text
./fs/jffs2/compr.h:
UTF-8 Unicode C program text
./fs/jffs2/compr_rtime.c:
UTF-8 Unicode C program text
./fs/jffs2/compr_rubin.c:
UTF-8 Unicode C program text
./fs/jffs2/compr_zlib.c:
UTF-8 Unicode C program text
./fs/jffs2/debug.c:
UTF-8 Unicode C program text
./fs/jffs2/debug.h:
UTF-8 Unicode C program text
./fs/jffs2/dir.c:
UTF-8 Unicode C program text
./fs/jffs2/erase.c:
UTF-8 Unicode C program text
./fs/jffs2/file.c:
UTF-8 Unicode C program text
./fs/jffs2/fs.c:
UTF-8 Unicode C program text
./fs/jffs2/gc.c:
UTF-8 Unicode C program text
./fs/jffs2/ioctl.c:
UTF-8 Unicode C program text
./fs/jffs2/jffs2_fs_i.h:
UTF-8 Unicode C program text
./fs/jffs2/malloc.c:
UTF-8 Unicode C program text
./fs/jffs2/nodelist.c:
UTF-8 Unicode C program text
./fs/jffs2/nodelist.h:
UTF-8 Unicode C program text
./fs/jffs2/os-linux.h:
UTF-8 Unicode C program text
./fs/jffs2/read.c:
UTF-8 Unicode C program text
./fs/jffs2/security.c:
UTF-8 Unicode C program text
./fs/jffs2/summary.c:
UTF-8 Unicode C program text
./fs/jffs2/summary.h:
UTF-8 Unicode C program text
./fs/jffs2/symlink.c:
UTF-8 Unicode C program text
./fs/jffs2/wbuf.c:
UTF-8 Unicode C program text
./fs/jffs2/write.c:
UTF-8 Unicode C program text
./fs/jffs2/xattr.h:
UTF-8 Unicode C program text
./fs/jffs2/xattr_trusted.c:
UTF-8 Unicode C program text
./fs/jffs2/xattr_user.c:
UTF-8 Unicode C program text
./fs/jffs2/readinode.c:
UTF-8 Unicode C program text
./fs/jffs2/super.c:
UTF-8 Unicode C program text
./fs/jffs2/jffs2_fs_sb.h:
UTF-8 Unicode C program text
./fs/jffs2/nodemgmt.c:
UTF-8 Unicode C program text
./fs/jffs2/scan.c:
UTF-8 Unicode C program text
./fs/jffs2/writev.c:
UTF-8 Unicode C program text
./fs/jffs2/xattr.c:
UTF-8 Unicode C program text
./fs/nls/nls_utf8.c:
ASCII C program text
./include/asm-arm/arch-aaec2000/aaec2000.h:
UTF-8 Unicode C program text
./include/asm-arm/arch-integrator/platform.h:
UTF-8 Unicode C program text
./include/asm-arm/arch-omap/board.h:
UTF-8 Unicode C program text
./include/asm-arm/arch-omap/dma.h:
UTF-8 Unicode C program text
./include/asm-arm/arch-omap/gpio.h:
UTF-8 Unicode C program text
./include/asm-arm/arch-pxa/trizeps4.h:
UTF-8 Unicode C program text
./include/asm-m68k/atariints.h: UTF-8
Unicode C program text
./include/asm-m68k/atarihw.h: UTF-8
Unicode C program text
./include/asm-ppc/hydra.h: UTF-8
Unicode C program text
./include/linux/i2c-algo-bit.h: UTF-8
Unicode C program text
./include/linux/i2c-algo-pcf.h: UTF-8
Unicode C program text
./include/linux/i2c.h: UTF-8
Unicode C program text
./include/linux/irda.h: UTF-8
Unicode Pascal program text
./include/linux/meye.h: UTF-8
Unicode C program text
./include/linux/mtd/super.h: UTF-8
Unicode C program text
./include/linux/mtd/mtd.h: UTF-8
Unicode C program text
./include/linux/mtd/ubi.h: UTF-8
Unicode C program text
./include/linux/sonypi.h: UTF-8
Unicode C program text
./include/mtd/ubi-header.h: UTF-8
Unicode C program text
./include/mtd/ubi-user.h: UTF-8
Unicode C program text
./include/net/irda/irda.h: UTF-8
Unicode Pascal program text
./include/net/irda/iriap.h: UTF-8
Unicode Pascal program text
./include/net/irda/iriap_event.h: UTF-8
Unicode Pascal program text
./include/net/irda/irias_object.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlan_client.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlan_common.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlan_eth.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlan_event.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlan_filter.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlan_provider.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlap.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlmp.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlmp_event.h: UTF-8
Unicode Pascal program text
./include/net/irda/irlmp_frame.h: UTF-8
Unicode Pascal program text
./include/net/irda/irmod.h: UTF-8
Unicode Pascal program text
./include/net/irda/irqueue.h: UTF-8
Unicode English text
./include/net/irda/irttp.h: UTF-8
Unicode Pascal program text
./include/net/irda/parameters.h: UTF-8
Unicode Pascal program text
./include/net/irda/timer.h: UTF-8
Unicode Pascal program text
./include/net/irda/wrapper.h: UTF-8
Unicode Pascal program text
./include/net/irda/af_irda.h: UTF-8
Unicode Pascal program text
./kernel/sys.c: UTF-8 Unicode
C program text
./sound/drivers/mts64.c: UTF-8 Unicode
C program text
./sound/oss/es1371.c: UTF-8 Unicode
C program text
./sound/oss/pas2_pcm.c: UTF-8 Unicode
C program text
./sound/oss/trident.c: UTF-8 Unicode
C program text
./sound/pci/ice1712/prodigy192.c: UTF-8 Unicode
C program text
./sound/pci/mixart/mixart.c: UTF-8 Unicode
C program text

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2007-06-08 14:42:35

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, Jun 08, 2007 at 04:34:01PM +0200, Jesper Juhl wrote:
> On 08/06/07, Adrian Bunk <[email protected]> wrote:
> [snip]
>>
>> It's not only about MODULE_AUTHOR, if you consider it rude to limit
>> people's names to ASCII, then don't forget that we have printk's like
>> Linux agpgart interface v0.102 (c) Dave Jones
>>
>> What happens if the maintainer changes and it's now
>> Linux agpgart interface v0.103 (c) Dave Ønes
>>
>> Does the console handle it correctly during boot?
>> Can all tools that process the syslog cope with it?
>>
>> Perhaps the answer is in both cases "yes", but it's a completely
>> untested area.
>>
>> We really must have all bugs shaken out and all users using fixed tools
>> _before_ we can start outputting UTF-8 - limiting people's names to
>> ASCII in not ideal, but IMHO causing breakages for users is a much
>> bigger problem.
>
> I haven't looked at it in depth yet, but it would seem we already have
> a few files that need to be looked at with this in mind. Looks like
> it's not exactely a new problem (although all the following could be
> in comments of course)...
>...

They should all be in comments, and all UTF-8 I've ever seen in such
files was only in comments (mostly author names). So yes, adding UTF-8
to program code will result in new problems.

If you find any source file that contains UTF-8 outside of comments
please complain loudly.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-08 15:16:46

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03


On Jun 8 2007 16:42, Adrian Bunk wrote:
>On Fri, Jun 08, 2007 at 04:34:01PM +0200, Jesper Juhl wrote:
>> On 08/06/07, Adrian Bunk <[email protected]> wrote:
>> [snip]
>>>
>>> It's not only about MODULE_AUTHOR, if you consider it rude to limit
>>> people's names to ASCII, then don't forget that we have printk's like
>>> Linux agpgart interface v0.102 (c) Dave Jones
>>>
>>> What happens if the maintainer changes and it's now
>>> Linux agpgart interface v0.103 (c) Dave Ønes
>>>
>>> Does the console handle it correctly during boot?
>>> Can all tools that process the syslog cope with it?
>>>
>>> Perhaps the answer is in both cases "yes", but it's a completely
>>> untested area.
>>>
>>> We really must have all bugs shaken out and all users using fixed tools
>>> _before_ we can start outputting UTF-8 - limiting people's names to
>>> ASCII in not ideal, but IMHO causing breakages for users is a much
>>> bigger problem.
>>
>> I haven't looked at it in depth yet, but it would seem we already have
>> a few files that need to be looked at with this in mind. Looks like
>> it's not exactely a new problem (although all the following could be
>> in comments of course)...
>>...
>
>They should all be in comments, and all UTF-8 I've ever seen in such
>files was only in comments (mostly author names). So yes, adding UTF-8
>to program code will result in new problems.
>
>If you find any source file that contains UTF-8 outside of comments
>please complain loudly.

I present loudly and proudly (I *don't* complain):

17:11 ichi:/ws/linux-2.6.22-rc4 > find . -iname '*.[ch]' -print0 | xargs -0 grep MODULE_AUTHOR | grep -P '[\x80-\xff]' --color=never
./arch/i386/kernel/cpu/cpufreq/e_powersaver.c:MODULE_AUTHOR("Rafa� Bilski <[email protected]>");
./drivers/char/watchdog/i6300esb.c:MODULE_AUTHOR("Ross Biro and David H�deman");
./drivers/char/watchdog/w83627hf_wdt.c:MODULE_AUTHOR("P�raig Brady <[email protected]>");
./drivers/hwmon/via686a.c:MODULE_AUTHOR("Ky�ti M�kki <[email protected]>, "
./drivers/i2c/busses/i2c-via.c:MODULE_AUTHOR("Ky�ti M�kki <[email protected]>");
./drivers/input/keyboard/omap-keypad.c:MODULE_AUTHOR("Timo Ter�");
./drivers/isdn/hisax/isdnhdlc.c:MODULE_AUTHOR("Wolfgang M�s <[email protected]>, "
./drivers/mmc/host/omap.c:MODULE_AUTHOR("Juha Yrj��);
./drivers/mtd/devices/phram.c:MODULE_AUTHOR("Jörn Engel <[email protected]>");
./drivers/mtd/maps/cfi_flagadm.c:MODULE_AUTHOR("Kári Davíðsson <[email protected]>");
./drivers/mtd/maps/dbox2-flash.c:MODULE_AUTHOR("Kári Davíðsson <[email protected]>, Bastian Blank <[email protected]>, Alexander Wild <[email protected]>");
./drivers/net/irda/kingsun-sir.c:MODULE_AUTHOR("Alex Villac�s Lasso <[email protected]>");
./drivers/scsi/aha152x.c:MODULE_AUTHOR("J�gen Fischer");
./drivers/scsi/sni_53c710.c:MODULE_AUTHOR("Thomas Bogend�fer");
./drivers/usb/misc/emi26.c:MODULE_AUTHOR("tapio laxstr�");
./drivers/usb/misc/emi62.c:MODULE_AUTHOR("tapio laxstr�");

So, we had some ISO8859-1 and some UTF-8 in there already. (And as for
MODULE_AUTHOR, it should stay there - 'fix' modinfo instead.)


BTW, there's also still a ton of non-UTF-8 in the kernel; I've already
fixed that weeks ago and sent some patch to trivial@, Adrian -
did you receive it?



Jan
--

2007-06-08 15:39:39

by Alan

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

> >>> Does the console handle it correctly during boot?

Yes

> >>> Can all tools that process the syslog cope with it?

Thats a stupid question. The tools people normally use can just fine.

> >If you find any source file that contains UTF-8 outside of comments
> >please complain loudly.
>
> I present loudly and proudly (I *don't* complain):

Point made - Adrian, if the tool complains about UTF-8 in author texts
then its buggy and should not be merged. The fact you have a personal
issue with it is neither here nor there

> So, we had some ISO8859-1 and some UTF-8 in there already. (And as for
> MODULE_AUTHOR, it should stay there - 'fix' modinfo instead.)

Using UTF-8 not 8859-1 for consistency is sensible, especially as 8859-1
is obsolete and effectively useless now (although I guess much of the
'8859-1' in the kernel is 1:1 with 8859-15, which isn't so obsolete but
is just as useless)

2007-06-08 15:42:44

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, 2007-06-08 at 17:16 +0200, Jan Engelhardt wrote:

> So, we had some ISO8859-1 and some UTF-8 in there already. (And as for
> MODULE_AUTHOR, it should stay there - 'fix' modinfo instead.)

Ok.

Jon.


2007-06-08 16:03:59

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

> ./drivers/infiniband/core/multicast.c: UTF-8 Unicode C program text
> ./drivers/infiniband/core/sa.h: UTF-8 Unicode C program text
> ./drivers/infiniband/core/sa_query.c: UTF-8 Unicode C program text

These three seem to be caused by bogus 0xa0 characters that snuck in
somehow. I'll push the patch below for 2.6.23:

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 1e13ab4..15b4c4d 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2006 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2006 Intel Corporation. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/core/sa.h b/drivers/infiniband/core/sa.h
index 24c93fd..b1d4bbf 100644
--- a/drivers/infiniband/core/sa.h
+++ b/drivers/infiniband/core/sa.h
@@ -1,6 +1,6 @@
/*
* Copyright (c) 2004 Topspin Communications. All rights reserved.
- * Copyright (c) 2005 Voltaire, Inc.  All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
* Copyright (c) 2006 Intel Corporation. All rights reserved.
*
* This software is available to you under a choice of one of two
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 6469406..9d3797f 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -1,6 +1,6 @@
/*
* Copyright (c) 2004 Topspin Communications. All rights reserved.
- * Copyright (c) 2005 Voltaire, Inc.  All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
* Copyright (c) 2006 Intel Corporation. All rights reserved.
*
* This software is available to you under a choice of one of two

2007-06-08 16:39:20

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03

On Fri, Jun 08, 2007 at 04:42:36PM +0100, Alan Cox wrote:
> > >>> Does the console handle it correctly during boot?
>
> Yes
>
> > >>> Can all tools that process the syslog cope with it?
>
> Thats a stupid question. The tools people normally use can just fine.
>
> > >If you find any source file that contains UTF-8 outside of comments
> > >please complain loudly.
> >
> > I present loudly and proudly (I *don't* complain):
>
> Point made - Adrian, if the tool complains about UTF-8 in author texts
> then its buggy and should not be merged. The fact you have a personal
> issue with it is neither here nor there

It's not a personal issue. Generally, I like UTF-8.

I'm simply saying that allowing UTF-8 in MODULE_AUTHOR and printk's as
you want to can have unwanted effects.

And I gave modinfo as an example for a tool that is not yet able to
handle UTF-8 correctly in all cases.

In my opinion, it's not worth the hassle to allow UTF-8 there.
Feel free to disagree.

> > So, we had some ISO8859-1 and some UTF-8 in there already. (And as for
> > MODULE_AUTHOR, it should stay there - 'fix' modinfo instead.)
>
> Using UTF-8 not 8859-1 for consistency is sensible, especially as 8859-1
> is obsolete and effectively useless now (although I guess much of the
> '8859-1' in the kernel is 1:1 with 8859-15, which isn't so obsolete but
> is just as useless)

Agreed, if we allow a non-ASCII charset, it should be UTF-8.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-06-08 18:43:42

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH] update checkpatch.pl to version 0.03


On Jun 8 2007 18:39, Adrian Bunk wrote:
>> > >>> Does the console handle it correctly during boot?
>>
>> Yes

That's most likely because printk() handles neither special chars nor
special fun (like ANSI color and movement codes). Hence, we should be
safe should there be spurious utf8 output on the console (which is
most likely 8-bit before userspace switches it to utf-8).

>I'm simply saying that allowing UTF-8 in MODULE_AUTHOR and printk's as
>you want to can have unwanted effects.
>
>And I gave modinfo as an example for a tool that is not yet able to
>handle UTF-8 correctly in all cases.
>
>In my opinion, it's not worth the hassle to allow UTF-8 there.
>Feel free to disagree.
>
>> > So, we had some ISO8859-1 and some UTF-8 in there already. (And as for
>> > MODULE_AUTHOR, it should stay there - 'fix' modinfo instead.)

(Well, and convert all the MODULE_AUTHORs to utf-8)

>>
>> Using UTF-8 not 8859-1 for consistency is sensible, especially as 8859-1
>> is obsolete and effectively useless now (although I guess much of the
>> '8859-1' in the kernel is 1:1 with 8859-15, which isn't so obsolete but
>> is just as useless)
>
>Agreed, if we allow a non-ASCII charset, it should be UTF-8.




Jan
--