2021-04-15 00:44:03

by Aditya Srivastava

Subject: [RFC] scripts: kernel-doc: improve parsing for kernel-doc comments syntax

Currently, kernel-doc does not identify some cases of probable kernel-doc
comments, e.g. a pointer used as the declaration type of an identifier,
space-separated identifiers, etc.

Some examples of these cases are:
i) " * journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure"
in fs/jbd2/journal.c

ii) "* dget, dget_dlock - get a reference to a dentry" in
include/linux/dcache.h

iii) " * DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t"
in include/linux/seqlock.h

Also improve the identification of non-kernel-doc comments, e.g.:

i) " * The following functions allow us to read data using a swap map"
in kernel/power/swap.c follows kernel-doc-like syntax, but its content
does not adhere to the expected format.

Improve parsing by adding support for these probable attempts to write
kernel-doc comments.

Suggested-by: Jonathan Corbet <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Aditya Srivastava <[email protected]>
---
scripts/kernel-doc | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 888913528185..37665aa41e6b 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -2110,17 +2110,25 @@ sub process_name($$) {
} elsif (/$doc_decl/o) {
$identifier = $1;
my $is_kernel_comment = 0;
- if (/^\s*\*\s*([\w\s]+?)(\(\))?\s*([-:].*)?$/) {
+ my $decl_start = qr{\s*\*};
+ my $fn_type = qr{\w+\s*\*\s*}; # i.e. pointer declaration type, foo * bar() - desc
+ my $parenthesis = qr{\(\w*\)};
+ my $decl_end = qr{[-:].*};
+ if (/^$decl_start\s*([\w\s]+?)$parenthesis?\s*$decl_end?$/) {
$identifier = $1;
- $decl_type = 'function';
- $identifier =~ s/^define\s+//;
- $is_kernel_comment = 1;
}
if ($identifier =~ m/^(struct|union|enum|typedef)\b\s*(\S*)/) {
$decl_type = $1;
$identifier = $2;
$is_kernel_comment = 1;
}
+ elsif (/^$decl_start\s*$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$/ || # i.e. foo()
+ /^$decl_start\s*$fn_type?(\w+.*)$parenthesis?\s*$decl_end$/) { # i.e. static void foo() - description; or misspelt identifier
+ $identifier = $1;
+ $decl_type = 'function';
+ $identifier =~ s/^define\s+//;
+ $is_kernel_comment = 1;
+ }
$identifier =~ s/\s+$//;

$state = STATE_BODY;
--
2.17.1
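The effect of the new matching on the examples from the commit message can be checked outside kernel-doc with a small standalone Perl script. This is a sketch, not kernel-doc itself: `classify()` is a hypothetical helper, the values of `$doc_com` and `$doc_decl` are assumed to mirror the kernel-doc globals of the same names, and the warning and state handling that follow in the real `process_name()` are left out.

```perl
#!/usr/bin/perl
# Standalone sketch of the name-line matching added by this patch.
# Assumptions: $doc_com and $doc_decl mirror the kernel-doc globals;
# classify() is a hypothetical helper, not part of scripts/kernel-doc.
use strict;
use warnings;

my $doc_com  = '\s*\*\s*';          # assumed value of the kernel-doc global
my $doc_decl = $doc_com . '(\w+)';  # assumed value of the kernel-doc global

# Fragments copied from the patch.
my $decl_start  = qr{\s*\*};
my $fn_type     = qr{\w+\s*\*\s*};  # pointer return type, e.g. "foo * "
my $parenthesis = qr{\(\w*\)};      # "()" or "(arg)"
my $decl_end    = qr{[-:].*};       # "- desc" or ": desc"

# Returns (decl_type, identifier) for a recognised kernel-doc name line,
# or an empty list when the line is not treated as kernel-doc.
sub classify {
    local $_ = shift;
    return () unless /$doc_decl/;
    my $identifier = $1;
    my $decl_type;
    my $is_kernel_comment = 0;

    if (/^$decl_start\s*([\w\s]+?)$parenthesis?\s*$decl_end?$/) {
        $identifier = $1;
    }
    if ($identifier =~ m/^(struct|union|enum|typedef)\b\s*(\S*)/) {
        ($decl_type, $identifier) = ($1, $2);
        $is_kernel_comment = 1;
    } elsif (/^$decl_start\s*$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$/ ||
             /^$decl_start\s*$fn_type?(\w+.*)$parenthesis?\s*$decl_end$/) {
        $identifier = $1;
        $decl_type  = 'function';
        $identifier =~ s/^define\s+//;
        $is_kernel_comment = 1;
    }
    $identifier =~ s/\s+$//;
    return $is_kernel_comment ? ($decl_type, $identifier) : ();
}

# Example lines taken from the commit message.
my @samples = (
    ' * journal_t * jbd2_journal_init_dev() - creates and initialises'
        . ' a journal structure',
    ' * DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t',
    ' * dget, dget_dlock - get a reference to a dentry',
    ' * The following functions allow us to read data using a swap map',
);

for my $line (@samples) {
    my ($type, $name) = classify($line);
    printf "%-10s %s\n", $type // 'none', $name // '(not kernel-doc)';
}
```

Run as-is, it should report `function` for the first three lines (identifiers `jbd2_journal_init_dev`, `DEFINE_SEQLOCK` and `dget, dget_dlock`) and `none` for the kernel/power/swap.c prose line. With the old single pattern, `/^\s*\*\s*([\w\s]+?)(\(\))?\s*([-:].*)?$/`, none of the first three lines is recognised and the prose line is accepted as a function name.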


2021-04-15 11:20:32

by Aditya Srivastava

Subject: Re: [RFC] scripts: kernel-doc: improve parsing for kernel-doc comments syntax

On 15/4/21 12:55 am, Aditya Srivastava wrote:
> [...]

Hi
I have generated a diff of the kernel-doc warnings for all files in the
kernel tree, before and after this patch.
It can be found at:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/kernel-doc/kernel_doc_comment_syntax_improvement_diff.txt

Thanks
Aditya

2021-04-15 22:16:47

by Jonathan Corbet

Subject: Re: [RFC] scripts: kernel-doc: improve parsing for kernel-doc comments syntax

Aditya Srivastava <[email protected]> writes:

> Currently, kernel-doc does not identify some cases of probable kernel-doc
> comments, e.g. a pointer used as the declaration type of an identifier,
> space-separated identifiers, etc.
>
> Some examples of these cases are:
> i) " * journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure"
> in fs/jbd2/journal.c
>
> ii) "* dget, dget_dlock - get a reference to a dentry" in
> include/linux/dcache.h
>
> iii) " * DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t"
> in include/linux/seqlock.h
>
> Also improve the identification of non-kernel-doc comments, e.g.:
>
> i) " * The following functions allow us to read data using a swap map"
> in kernel/power/swap.c follows kernel-doc-like syntax, but its content
> does not adhere to the expected format.
>
> Improve parsing by adding support for these probable attempts to write
> kernel-doc comments.
>
> Suggested-by: Jonathan Corbet <[email protected]>
> Link: https://lore.kernel.org/lkml/[email protected]
> Signed-off-by: Aditya Srivastava <[email protected]>
> ---
> scripts/kernel-doc | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)

OK, I've applied this, but I have a couple of comments...

> diff --git a/scripts/kernel-doc b/scripts/kernel-doc
> index 888913528185..37665aa41e6b 100755
> --- a/scripts/kernel-doc
> +++ b/scripts/kernel-doc
> @@ -2110,17 +2110,25 @@ sub process_name($$) {
> } elsif (/$doc_decl/o) {
> $identifier = $1;
> my $is_kernel_comment = 0;
> - if (/^\s*\*\s*([\w\s]+?)(\(\))?\s*([-:].*)?$/) {
> + my $decl_start = qr{\s*\*};

I appreciate the attempt to make the regexes a bit more comprehensible,
but we can do better yet, methinks. This $decl_start is very much like
$doc_com defined globally.

It would really help a lot if we could at least take the incredible mass
of regexes in this program and boil them down to a smaller, unique set
that is used throughout. kernel-doc might still make brains explode,
but perhaps the blast radius would be a bit smaller.

> + my $fn_type = qr{\w+\s*\*\s*}; # i.e. pointer declaration type, foo * bar() - desc

Some of the lines in this change go waaaaay beyond the 80-character
limit; please try not to do that. I fixed up the offending comments
this time around.

Thanks,

jon
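On the 80-column point: one common Perl way to keep long patterns like the two in this patch's `elsif` under the limit is the `/x` modifier, which ignores unescaped whitespace in the pattern and allows `#` comments. The sketch below reuses the patch's fragments; `$fn_short` and `$fn_long` are hypothetical names, and this is not necessarily how the applied version was fixed up. The one-line forms are kept only to check that the split patterns behave the same.

```perl
#!/usr/bin/perl
# Sketch: the two long patterns from the patch's elsif, split with /x so
# each line stays under 80 columns. $fn_short and $fn_long are hypothetical.
use strict;
use warnings;

# Fragments copied from the patch.
my $decl_start  = qr{\s*\*};
my $fn_type     = qr{\w+\s*\*\s*};
my $parenthesis = qr{\(\w*\)};
my $decl_end    = qr{[-:].*};

# i.e. foo() - description
my $fn_short = qr{
    ^ $decl_start \s*
    $fn_type?           # optional pointer return type
    (\w+) \s*           # the identifier
    $parenthesis? \s*   # optional argument list
    $decl_end? $        # optional description
}x;

# i.e. static void foo() - description, or a misspelt identifier
my $fn_long = qr{
    ^ $decl_start \s*
    $fn_type?
    (\w+ .*)            # identifier and anything up to the description
    $parenthesis? \s*
    $decl_end $         # description is required here
}x;

# One-line forms from the patch, kept only for comparison.
my $short_1line =
    qr{^$decl_start\s*$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$};
my $long_1line =
    qr{^$decl_start\s*$fn_type?(\w+.*)$parenthesis?\s*$decl_end$};

my @lines = (' * foo() - desc', ' * journal_t * bar() - desc',
             ' * dget, dget_dlock - desc', ' * plain prose line');

for my $line (@lines) {
    my ($s1) = $line =~ $fn_short;
    my ($s2) = $line =~ $short_1line;
    my ($l1) = $line =~ $fn_long;
    my ($l2) = $line =~ $long_1line;
    printf "%-28s short:%s long:%s\n", $line,
        ($s1 // '') eq ($s2 // '') ? 'same' : 'DIFFERENT',
        ($l1 // '') eq ($l2 // '') ? 'same' : 'DIFFERENT';
}
```

The main trade-off of `/x` is that whitespace stops being literal, so a space that must be matched has to be written as `\s`, `\ ` or `[ ]`; the patch already spells whitespace as `\s`, so its patterns carry over unchanged. A comment inside the pattern must also not contain the closing delimiter, here `}`.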