2020-12-08 17:21:05

by Nicolai Fischer

[permalink] [raw]
Subject: [RFC PATCH v2] checkpatch: rewrite Kconfig parsing

Checkpatch currently only warns if the help text is too short.
To determine this the diff gets parsed for keywords starting
a new entry, but several kinds of false positives can occur with
the current implementation, especially when the config
is not well formatted.

This patch makes the parsing more robust and includes
new warnings if:
1) the help attribute is not specified last
2) there is no blank line or endif before the next keyword
3) the help text is not indented 2 spaces more than
the attribute itself.

Signed-off-by: Nicolai Fischer <[email protected]>
Co-developed-by: Johannes Czekay <[email protected]>
Signed-off-by: Johannes Czekay <[email protected]>
---

This patch rewrites most of the Kconfig parsing to address
the issues mentioned in the first RFC:

1) search for 'help' instead of '---help---'
> I believe all the '---help---' lines have been converted to just 'help'
> so the '(?:---)?' bits here could be removed

2) create new warnings:
> Perhaps it'd be better to create a new warning when the help text
> block is not the last block of the config section. Maybe warn when
> a blank line or endif is not the separator to the next keyword.
> Maybe warn when the next line after help is not indented 2 more
> spaces than the help line.

3) fix handling of blank lines and rely on keywords for end of help text
> This doesn't allow blank lines for multi-paragraph help text either.
>
> I think keyword parsing is necessary and some false positives are
> inevitable as the parsing logic in a line-by-line analyzer will
> always be incomplete.


It has occurred to us, that kconfig-language.rst does not explicitly
specify that the help text should be the last attribute, although
this is the defacto convention. Now that checkpatch actively checks
for this, we should probably update the documentation accordingly.


scripts/checkpatch.pl | 95 +++++++++++++++++++++++++++++++------------
1 file changed, 68 insertions(+), 27 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7b086d1cd6c2..52b3fd0c4581 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3299,7 +3299,11 @@ sub process {
"Do not include the paragraph about writing to the Free Software Foundation's mailing address from the sample GPL notice. The FSF has changed addresses in the past, and may do so again. Linux already includes a copy of the GPL.\n" . $herevet)
}

-# check for Kconfig help text having a real description
+# Check if Kconfig is well formatted. Warn if help text:
+# 1) is shorter than $min_conf_desc_length lines
+# 2) is not specified last
+# 3) and next keyword are not separated by a blank line or endif
+# 4) is not indented correctly
# Only applies when adding the entry originally, after that we do not have
# sufficient context to determine whether it is indeed long enough.
if ($realfile =~ /Kconfig/ &&
@@ -3308,46 +3312,83 @@ sub process {
# (\b) rather than a whitespace character (\s)
$line =~ /^\+\s*(?:config|menuconfig|choice)\b/) {
my $length = 0;
- my $cnt = $realcnt;
- my $ln = $linenr + 1;
- my $f;
my $is_start = 0;
my $is_end = 0;
- for (; $cnt > 0 && defined $lines[$ln - 1]; $ln++) {
- $f = $lines[$ln - 1];
- $cnt-- if ($lines[$ln - 1] !~ /^-/);
- $is_end = $lines[$ln - 1] =~ /^\+/;
+ my $valid_end = 0;
+ my $wrong_indent = 0;
+ my $help_last = 1;

- next if ($f =~ /^-/);
- last if (!$file && $f =~ /^\@\@/);
+ my $cnt = $realcnt;
+ my $help_indent;
+ my $last_blank = 0;
+ for (my $ln = $linenr; $cnt > 0 && defined $lines[$ln]; $ln++) {
+ my $line = $lines[$ln];

- if ($lines[$ln - 1] =~ /^\+\s*(?:bool|tristate|prompt)\s*["']/) {
+ next if ($line =~ /^-/);
+ last if (!$file && $line =~ /^\@\@/);
+ $cnt--;
+ $line =~ s/^\+//;
+
+ if ($line =~ /^\s*(?:bool|tristate|string|hex|int|prompt)\s*["']/) {
$is_start = 1;
- } elsif ($lines[$ln - 1] =~ /^\+\s*(?:---)?help(?:---)?$/) {
- $length = -1;
+ } elsif ($line =~ /^(\s*)help$/) {
+ $help_indent = $1;
+ $length = 0;
+ next;
}
+ next if ($line =~ /^#.*$/);
+ next if (!defined $help_indent);

- $f =~ s/^.//;
- $f =~ s/#.*//;
- $f =~ s/^\s+//;
- next if ($f =~ /^$/);
-
- # This only checks context lines in the patch
- # and so hopefully shouldn't trigger false
- # positives, even though some of these are
- # common words in help texts
- if ($f =~ /^\s*(?:config|menuconfig|choice|endchoice|
- if|endif|menu|endmenu|source)\b/x) {
+ if ($line =~ /^\s*$/) {
+ $last_blank = 1;
+ next;
+ }
+ my $l = $line;
+ $l =~ s/^$help_indent//;
+ if ($l =~ /^(?:bool|tristate|string|hex|int|prompt|default|
+ depends\ on|select|imply|visible\ if|range|option)\b/x) {
+ $help_last = 0;
+ } elsif ($line =~/^\s*endif\s*(?:#.*)?$/) {
+ # line contains only endif keyword and optional comment
+ # help text is terminated properly
$is_end = 1;
+ $valid_end = 1;
last;
+ } elsif ($line =~ /^\s*(?:config|menuconfig|choice|endchoice|
+ comment|if|menu|endmenu|source)\b/x) {
+ # match all keywords except endif
+ $is_end = 1;
+ if ($last_blank) {
+ # This generates a false positive if a blank line is followed by an
+ # 'if' on the next line. But that would probably not be a well formatted text.
+ $valid_end = 1;
+ last;
+ }
+ }
+ if ($l !~ /^\ {2}/) {
+ $wrong_indent = 1;
}
+ $last_blank = 0;
$length++;
}
- if ($is_start && $is_end && $length < $min_conf_desc_length) {
+ if ($is_start && $is_end && !$valid_end) {
+ WARN("CONFIG_DESCRIPTION",
+ "help text is not followed by a blank line or endif\n" . $herecurr);
+ }
+ if ($is_start && $valid_end && $length < $min_conf_desc_length) {
+ WARN("CONFIG_DESCRIPTION",
+ "help text is too short ($length/$min_conf_desc_length lines)\n" . $herecurr);
+ }
+ if ($is_start && $valid_end && $help_last && $wrong_indent) {
+ WARN("CONFIG_DESCRIPTION",
+ "help text is not indented 2 spaces more than the help keyword\n" . $herecurr);
+ }
+ if ($is_start && !$help_last) {
WARN("CONFIG_DESCRIPTION",
- "please write a paragraph that describes the config symbol fully\n" . $herecurr);
+ "help text is not the last attribute\n" . $herecurr);
}
- #print "is_start<$is_start> is_end<$is_end> length<$length>\n";
+ #print "is_start<$is_start> is_end<$is_end> valid_end <$valid_end>";
+ #print "help_last <$help_last> length<$length>\n";
}

# check MAINTAINERS entries
--
2.28.0


2020-12-08 20:39:02

by Joe Perches

[permalink] [raw]
Subject: Re: [RFC PATCH v2] checkpatch: rewrite Kconfig parsing

On Tue, 2020-12-08 at 18:18 +0100, Nicolai Fischer wrote:
> Checkpatch currently only warns if the help text is too short.
> To determine this the diff gets parsed for keywords starting
> a new entry, but several kinds of false positives can occur with
> the current implementation, especially when the config
> is not well formatted.
>
> This patch makes the parsing more robust and includes
> new warnings if:
> 1) the help attribute is not specified last
> 2) there is no blank line or endif before the next keyword
> 3) the help text is not indented 2 spaces more than
> ???the attribute itself.
>
> Signed-off-by: Nicolai Fischer <[email protected]>
> Co-developed-by: Johannes Czekay <[email protected]>
> Signed-off-by: Johannes Czekay <[email protected]>
> ---
>
> This patch rewrites most of the Kconfig parsing to address
> the issues mentioned in the first RFC:
>
> 1) search for 'help' instead of '---help---'
> > I believe all the '---help---' lines have been converted to just 'help'
> > so the '(?:---)?' bits here could be removed
>
> 2) create new warnings:
> > Perhaps it'd be better to create a new warning when the help text
> > block is not the last block of the config section. Maybe warn when
> > a blank line or endif is not the separator to the next keyword.
> > Maybe warn when the next line after help is not indented 2 more
> > spaces than the help line.
>
> 3) fix handling of blank lines and rely on keywords for end of help text
> > This doesn't allow blank lines for multi-paragraph help text either.
> >
> > I think keyword parsing is necessary and some false positives are
> > inevitable as the parsing logic in a line-by-line analyzer will
> > always be incomplete.
>
>
> It has occurred to us, that kconfig-language.rst does not explicitly
> specify that the help text should be the last attribute, although
> this is the defacto convention. Now that checkpatch actively checks
> for this, we should probably update the documentation accordingly.

Generally process is either to update documentation along with
with a checkpatch change or to update documentation first.

Also checkpatch isn't necessarily the best tool for this.

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> -# check for Kconfig help text having a real description
> +# Check if Kconfig is well formatted. Warn if help text:
> +# 1) is shorter than $min_conf_desc_length lines
> +# 2) is not specified last
> +# 3) and next keyword are not separated by a blank line or endif
> +# 4) is not indented correctly
> ?# Only applies when adding the entry originally, after that we do not have
> ?# sufficient context to determine whether it is indeed long enough.
> ? if ($realfile =~ /Kconfig/ &&

[]

> + my $l = $line;
> + $l =~ s/^$help_indent//;
> + if ($l =~ /^(?:bool|tristate|string|hex|int|prompt|default|
> + depends\ on|select|imply|visible\ if|range|option)\b/x) {

I think this is overly fragile.
These keywords are not required to be at the same indent as help.

Also as specified in scripts/kconfig/lexer.h, the kconfig specification
has more keywords than the list above.

In general, checkpatch does not have to be the tool of choice for
verifying everything.

For instance, checkpatch has a trivial check for MAINTAINERS entry
ordering, but there is a complete tool called parse-maintainers.pl
that verifies alphabetic section ordering.

I think most of what you seem to be attempting should be in a new
tool that completely understands Kconfig parsing.

I suggest instead of updating checkpatch, the scripts/kconfig/
content be updated to do these things.



2020-12-10 01:42:43

by Nicolai Fischer

[permalink] [raw]
Subject: Re: [RFC PATCH v2] checkpatch: rewrite Kconfig parsing

On 12/8/20 7:58 PM, Joe Perches wrote:
> On Tue, 2020-12-08 at 18:18 +0100, Nicolai Fischer wrote:
>> Checkpatch currently only warns if the help text is too short.
>> To determine this the diff gets parsed for keywords starting
>> a new entry, but several kinds of false positives can occur with
>> the current implementation, especially when the config
>> is not well formatted.
>>
>> This patch makes the parsing more robust and includes
>> new warnings if:
>> 1) the help attribute is not specified last
>> 2) there is no blank line or endif before the next keyword
>> 3) the help text is not indented 2 spaces more than
>>    the attribute itself.
>>
>> Signed-off-by: Nicolai Fischer <[email protected]>
>> Co-developed-by: Johannes Czekay <[email protected]>
>> Signed-off-by: Johannes Czekay <[email protected]>
>> ---
>>
>> This patch rewrites most of the Kconfig parsing to address
>> the issues mentioned in the first RFC:
>>
>> 1) search for 'help' instead of '---help---'
>>> I believe all the '---help---' lines have been converted to just 'help'
>>> so the '(?:---)?' bits here could be removed
>>
>> 2) create new warnings:
>>> Perhaps it'd be better to create a new warning when the help text
>>> block is not the last block of the config section. Maybe warn when
>>> a blank line or endif is not the separator to the next keyword.
>>> Maybe warn when the next line after help is not indented 2 more
>>> spaces than the help line.
>>
>> 3) fix handling of blank lines and rely on keywords for end of help text
>>> This doesn't allow blank lines for multi-paragraph help text either.
>>>
>>> I think keyword parsing is necessary and some false positives are
>>> inevitable as the parsing logic in a line-by-line analyzer will
>>> always be incomplete.
>>
>>
>> It has occurred to us, that kconfig-language.rst does not explicitly
>> specify that the help text should be the last attribute, although
>> this is the defacto convention. Now that checkpatch actively checks
>> for this, we should probably update the documentation accordingly.
>
> Generally process is either to update documentation along with
> with a checkpatch change or to update documentation first.
>
> Also checkpatch isn't necessarily the best tool for this.
>
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
>> -# check for Kconfig help text having a real description
>> +# Check if Kconfig is well formatted. Warn if help text:
>> +# 1) is shorter than $min_conf_desc_length lines
>> +# 2) is not specified last
>> +# 3) and next keyword are not separated by a blank line or endif
>> +# 4) is not indented correctly
>>  # Only applies when adding the entry originally, after that we do not have
>>  # sufficient context to determine whether it is indeed long enough.
>>   if ($realfile =~ /Kconfig/ &&
>
> []
>
>> + my $l = $line;
>> + $l =~ s/^$help_indent//;
>> + if ($l =~ /^(?:bool|tristate|string|hex|int|prompt|default|
>> + depends\ on|select|imply|visible\ if|range|option)\b/x) {
>
> I think this is overly fragile.
> These keywords are not required to be at the same indent as help.
>
> Also as specified in scripts/kconfig/lexer.h, the kconfig specification
> has more keywords than the list above.
>
> In general, checkpatch does not have to be the tool of choice for
> verifying everything.
>
> For instance, checkpatch has a trivial check for MAINTAINERS entry
> ordering, but there is a complete tool called parse-maintainers.pl
> that verifies alphabetic section ordering.
>
> I think most of what you seem to be attempting should be in a new
> tool that completely understands Kconfig parsing.
>
> I suggest instead of updating checkpatch, the scripts/kconfig/
> content be updated to do these things.
>


We understand that checkpatch may not be the ideal place for all of these checks.
However the current implementation has some problems we would like to fix.
Would you be interested in a patch series addressing just the check for the number of lines?
Specifically:
1)
>
>>
>> I believe all the '---help---' lines have been converted to just 'help'
>> so the '(?:---)?' bits here could be removed.
>
> Yes.
>
2) add string, hex and int types to the $is_start regex
3) improve the help message to include the number of present and expected lines
4) Warn if help text is not followed by a blank line or endif







2020-12-11 10:08:53

by Joe Perches

[permalink] [raw]
Subject: Re: [RFC PATCH v2] checkpatch: rewrite Kconfig parsing

On Wed, 2020-12-09 at 15:24 +0100, Nicolai Fischer wrote:
> We understand that checkpatch may not be the ideal place for all of these checks.

So please work on an ideal thing.

> However the current implementation has some problems we would like to fix.
> Would you be interested in a patch series addressing just the check for the number of lines?
> Specifically:
> 1)
> >
> > >
> > > I believe all the '---help---' lines have been converted to just 'help'
> > > so the '(?:---)?' bits here could be removed.
> >
> > Yes.
> >
> 2) add string, hex and int types to the $is_start regex

sure

> 3) improve the help message to include the number of present and expected lines

no. The concept of a required paragraph is still dubious.

> 4) Warn if help text is not followed by a blank line or endif

How often is it now not followed by a blank line or endif?