2021-02-22 16:08:10

by Aditya Srivastava

[permalink] [raw]
Subject: [RFC] scripts: kernel-doc: fix typedef support for struct parsing

There are files in kernel, which use 'typedef struct' syntax for defining
struct. For eg, include/linux/zstd.h, drivers/scsi/megaraid/mega_common.h,
etc.
However, kernel-doc still does not support it, causing a parsing error.

For eg, running scripts/kernel-doc -none on include/linux/zstd.h emits:
"error: Cannot parse struct or union!"

Add support for parsing it.

Signed-off-by: Aditya Srivastava <[email protected]>
---
scripts/kernel-doc | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 8b5bc7bf4bb8..46e904dc3f87 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1201,12 +1201,20 @@ sub dump_union($$) {
sub dump_struct($$) {
my $x = shift;
my $file = shift;
+ my $decl_type;
+ my $members;

if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
- my $decl_type = $1;
+ $decl_type = $1;
$declaration_name = $2;
- my $members = $3;
+ $members = $3;
+ } elsif ($x =~ /typedef\s+(struct|union)\s*\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*\s*(\w*)\s*;/) {
+ $decl_type = $1;
+ $declaration_name = $3;
+ $members = $2;
+ }

+ if ($members) {
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n";
return;
--
2.17.1


2021-02-22 17:53:39

by Lukas Bulwahn

[permalink] [raw]
Subject: Re: [RFC] scripts: kernel-doc: fix typedef support for struct parsing

On Mon, Feb 22, 2021 at 5:03 PM Aditya Srivastava <[email protected]> wrote:
>
> There are files in kernel, which use 'typedef struct' syntax for defining
stylistic:
s/in kernel/in the kernel tree/
s/defining struct/defining some struct/
> struct. For eg, include/linux/zstd.h, drivers/scsi/megaraid/mega_common.h,
> etc.

stylistic:
s/For eg/E.g./ or
s/For eg/For example/

semantically:
It makes much more sense to name how many occurrences of this pattern
exist in the kernel tree within how many files in order to get a good
impression on the use of that pattern within the kernel tree.


> However, kernel-doc still does not support it, causing a parsing error.
>
> For eg, running scripts/kernel-doc -none on include/linux/zstd.h emits:
> "error: Cannot parse struct or union!"
>

semantically:
Drop the example and turn the two sentences above into one.

> Add support for parsing it.
>
> Signed-off-by: Aditya Srivastava <[email protected]>
> ---
> scripts/kernel-doc | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/kernel-doc b/scripts/kernel-doc
> index 8b5bc7bf4bb8..46e904dc3f87 100755
> --- a/scripts/kernel-doc
> +++ b/scripts/kernel-doc
> @@ -1201,12 +1201,20 @@ sub dump_union($$) {
> sub dump_struct($$) {
> my $x = shift;
> my $file = shift;
> + my $decl_type;
> + my $members;
>
> if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
> - my $decl_type = $1;
> + $decl_type = $1;
> $declaration_name = $2;
> - my $members = $3;
> + $members = $3;
> + } elsif ($x =~ /typedef\s+(struct|union)\s*\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*\s*(\w*)\s*;/) {
> + $decl_type = $1;
> + $declaration_name = $3;
> + $members = $2;
> + }
>

Could you put the common expression parts into meaningful variables to
avoid repeating yourself in the two pattern matching expressions?

Other than that, it looks good to me.

Lukas

> + if ($members) {
> if ($identifier ne $declaration_name) {
> print STDERR "${file}:$.: warning: expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n";
> return;
> --
> 2.17.1
>

2021-02-22 21:42:16

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [RFC] scripts: kernel-doc: fix typedef support for struct parsing

Aditya Srivastava <[email protected]> writes:

> There are files in kernel, which use 'typedef struct' syntax for defining
> struct. For eg, include/linux/zstd.h, drivers/scsi/megaraid/mega_common.h,
> etc.
> However, kernel-doc still does not support it, causing a parsing error.
>
> For eg, running scripts/kernel-doc -none on include/linux/zstd.h emits:
> "error: Cannot parse struct or union!"
>
> Add support for parsing it.
>
> Signed-off-by: Aditya Srivastava <[email protected]>
> ---
> scripts/kernel-doc | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/kernel-doc b/scripts/kernel-doc
> index 8b5bc7bf4bb8..46e904dc3f87 100755
> --- a/scripts/kernel-doc
> +++ b/scripts/kernel-doc
> @@ -1201,12 +1201,20 @@ sub dump_union($$) {
> sub dump_struct($$) {
> my $x = shift;
> my $file = shift;
> + my $decl_type;
> + my $members;
>
> if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
> - my $decl_type = $1;
> + $decl_type = $1;
> $declaration_name = $2;
> - my $members = $3;
> + $members = $3;
> + } elsif ($x =~ /typedef\s+(struct|union)\s*\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*\s*(\w*)\s*;/) {

So this isn't your fault, but these regexes are really getting out of
hand. I would *really* like to see some effort made into making this
code more understandable / maintainable as we tweak this stuff. So:

- Splitting out the common part, as suggested by Lukas, would be really
useful. That would also avoid the problem of only occurrence being
edited the next tine we add a new qualifier.

- Splitting out other subsections of the regex and giving them symbolic
names would also help.

- We really could use some comments before these branches saying what
they are doing; it is *not* obvious from the code.

See what I'm getting at here?

Thanks,

jon

2021-02-23 12:27:51

by Aditya Srivastava

[permalink] [raw]
Subject: Re: [RFC] scripts: kernel-doc: fix typedef support for struct parsing

On 23/2/21 3:10 am, Jonathan Corbet wrote:
> Aditya Srivastava <[email protected]> writes:
>
>> There are files in kernel, which use 'typedef struct' syntax for defining
>> struct. For eg, include/linux/zstd.h, drivers/scsi/megaraid/mega_common.h,
>> etc.
>> However, kernel-doc still does not support it, causing a parsing error.
>>
>> For eg, running scripts/kernel-doc -none on include/linux/zstd.h emits:
>> "error: Cannot parse struct or union!"
>>
>> Add support for parsing it.
>>
>> Signed-off-by: Aditya Srivastava <[email protected]>
>> ---
>> scripts/kernel-doc | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/scripts/kernel-doc b/scripts/kernel-doc
>> index 8b5bc7bf4bb8..46e904dc3f87 100755
>> --- a/scripts/kernel-doc
>> +++ b/scripts/kernel-doc
>> @@ -1201,12 +1201,20 @@ sub dump_union($$) {
>> sub dump_struct($$) {
>> my $x = shift;
>> my $file = shift;
>> + my $decl_type;
>> + my $members;
>>
>> if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
>> - my $decl_type = $1;
>> + $decl_type = $1;
>> $declaration_name = $2;
>> - my $members = $3;
>> + $members = $3;
>> + } elsif ($x =~ /typedef\s+(struct|union)\s*\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*\s*(\w*)\s*;/) {
>
> So this isn't your fault, but these regexes are really getting out of
> hand. I would *really* like to see some effort made into making this
> code more understandable / maintainable as we tweak this stuff. So:
>
> - Splitting out the common part, as suggested by Lukas, would be really
> useful. That would also avoid the problem of only occurrence being
> edited the next tine we add a new qualifier.
>
> - Splitting out other subsections of the regex and giving them symbolic
> names would also help.
>
> - We really could use some comments before these branches saying what
> they are doing; it is *not* obvious from the code.
>
> See what I'm getting at here?
>
Yep.
Thanks for the feedback Lukas and Jonathan. I'll get back with a v2
for the patch.

Thanks
Aditya

2021-02-24 14:47:43

by Aditya Srivastava

[permalink] [raw]
Subject: [RFC v2] scripts: kernel-doc: fix typedef support for struct/union parsing

Currently, there are 447 files in the kernel tree, which use 'typedef
struct/union' syntax for defining some struct/union. In total, there
are ~1290 such occurrences in the kernel tree.

However, kernel-doc does not support it currently, and running
scripts/kernel-doc -none on include/linux/zstd.h emits:
"error: Cannot parse struct or union!"

Add support for parsing struct/union following this syntax.

Signed-off-by: Aditya Srivastava <[email protected]>
---
Changes in v2:
- Split recurring regex into multiple variables
- Modify commit message

scripts/kernel-doc | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 8b5bc7bf4bb8..68df17877384 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1201,12 +1201,23 @@ sub dump_union($$) {
sub dump_struct($$) {
my $x = shift;
my $file = shift;
+ my $decl_type;
+ my $members;
+ my $type = qr{struct|union};
+ # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
+ my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};

- if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
- my $decl_type = $1;
+ if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
+ $decl_type = $1;
$declaration_name = $2;
- my $members = $3;
+ $members = $3;
+ } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
+ $decl_type = $1;
+ $declaration_name = $3;
+ $members = $2;
+ }

+ if ($members) {
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n";
return;
--
2.17.1

2021-02-25 11:54:17

by Lukas Bulwahn

[permalink] [raw]
Subject: Re: [RFC v2] scripts: kernel-doc: fix typedef support for struct/union parsing

On Wed, Feb 24, 2021 at 2:30 PM Aditya Srivastava <[email protected]> wrote:
>
> Currently, there are 447 files in the kernel tree, which use 'typedef
> struct/union' syntax for defining some struct/union. In total, there
> are ~1290 such occurrences in the kernel tree.
>
> However, kernel-doc does not support it currently, and running
> scripts/kernel-doc -none on emits:
> "error: Cannot parse struct or union!"
>

I will help rephrasing it more precisely:

Currently, there are ~1290 occurrences in 447 files in the kernel tree
'typedef struct/union' syntax for defining some struct/union. However,
kernel-doc currently does not support that syntax. Of the ~1290
occurrences, there are four occurrences in ./include/linux/zstd.h with
typedef struct/union syntax and a preceding kernel-doc; all other
occurrences have no preceding kernel-doc.

Add support for parsing struct/union following this syntax.



I also tested it; here is my quick diff of "git ls-files | xargs
./scripts/kernel-doc -none 2>&1" before and after patch application:

< include/linux/zstd.h:154: error: Cannot parse struct or union!
< include/linux/zstd.h:171: error: Cannot parse struct or union!
< include/linux/zstd.h:181: error: Cannot parse struct or union!
18857d18853
< include/linux/zstd.h:936: error: Cannot parse struct or union!


So, again:

Tested-by: Lukas Bulwahn <[email protected]>

Aditya, can you please pick up my rephrasing and send out a v3.


Lukas

2021-02-25 14:54:57

by Aditya Srivastava

[permalink] [raw]
Subject: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

Currently, there are ~1290 occurrences in 447 files in the kernel tree
'typedef struct/union' syntax for defining some struct/union. However,
kernel-doc currently does not support that syntax. Of the ~1290
occurrences, there are four occurrences in ./include/linux/zstd.h with
typedef struct/union syntax and a preceding kernel-doc; all other
occurrences have no preceding kernel-doc.

Add support for parsing struct/union following this syntax.

Signed-off-by: Aditya Srivastava <[email protected]>
---
Changes in v3:
- Modify commit message

Changes in v2:
- Split recurring regex into multiple variables
- Modify commit message

scripts/kernel-doc | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 8b5bc7bf4bb8..68df17877384 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1201,12 +1201,23 @@ sub dump_union($$) {
sub dump_struct($$) {
my $x = shift;
my $file = shift;
+ my $decl_type;
+ my $members;
+ my $type = qr{struct|union};
+ # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
+ my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};

- if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
- my $decl_type = $1;
+ if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
+ $decl_type = $1;
$declaration_name = $2;
- my $members = $3;
+ $members = $3;
+ } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
+ $decl_type = $1;
+ $declaration_name = $3;
+ $members = $2;
+ }

+ if ($members) {
if ($identifier ne $declaration_name) {
print STDERR "${file}:$.: warning: expecting prototype for $decl_type $identifier. Prototype was for $decl_type $declaration_name instead\n";
return;
--
2.17.1

2021-03-04 05:26:52

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

Aditya Srivastava <[email protected]> writes:

> Currently, there are ~1290 occurrences in 447 files in the kernel tree
> 'typedef struct/union' syntax for defining some struct/union. However,
> kernel-doc currently does not support that syntax. Of the ~1290
> occurrences, there are four occurrences in ./include/linux/zstd.h with
> typedef struct/union syntax and a preceding kernel-doc; all other
> occurrences have no preceding kernel-doc.
>
> Add support for parsing struct/union following this syntax.
>
> Signed-off-by: Aditya Srivastava <[email protected]>

Applied, thanks.

jon

2021-03-06 04:38:06

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

On Thu, Feb 25, 2021 at 08:20:33PM +0530, Aditya Srivastava wrote:
> +++ b/scripts/kernel-doc
> @@ -1201,12 +1201,23 @@ sub dump_union($$) {
> sub dump_struct($$) {
> my $x = shift;
> my $file = shift;
> + my $decl_type;
> + my $members;
> + my $type = qr{struct|union};
> + # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
> + my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
> - if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
> - my $decl_type = $1;
> + if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
> + $decl_type = $1;
> $declaration_name = $2;
> - my $members = $3;
> + $members = $3;
> + } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
> + $decl_type = $1;
> + $declaration_name = $3;
> + $members = $2;
> + }

In the same spirit as dump_function, would something like this work?

- if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
+ $x =~ s/__packed +//;
+ $x =~ s/__aligned +//;
+ $x =~ s/____cacheline_aligned_in_smp +//;
+ $x =~ s/____cacheline_aligned +//;
+ $x =~ s/__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)//;
+
+ if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*)*/) {

2021-03-06 06:27:56

by Lukas Bulwahn

[permalink] [raw]
Subject: Re: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

On Sat, Mar 6, 2021 at 5:35 AM Matthew Wilcox <[email protected]> wrote:
>
> On Thu, Feb 25, 2021 at 08:20:33PM +0530, Aditya Srivastava wrote:
> > +++ b/scripts/kernel-doc
> > @@ -1201,12 +1201,23 @@ sub dump_union($$) {
> > sub dump_struct($$) {
> > my $x = shift;
> > my $file = shift;
> > + my $decl_type;
> > + my $members;
> > + my $type = qr{struct|union};
> > + # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
> > + my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
> > - if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
> > - my $decl_type = $1;
> > + if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
> > + $decl_type = $1;
> > $declaration_name = $2;
> > - my $members = $3;
> > + $members = $3;
> > + } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
> > + $decl_type = $1;
> > + $declaration_name = $3;
> > + $members = $2;
> > + }
>
> In the same spirit as dump_function, would something like this work?
>

I agree. That might be a suitable clean-up to keep the code for
functions and struct/union parsing similar in style/spirit.

Aditya, would you like to create a patch for that?

Lukas

2021-03-06 07:52:37

by Aditya Srivastava

[permalink] [raw]
Subject: Re: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

On 6/3/21 11:55 am, Lukas Bulwahn wrote:
> On Sat, Mar 6, 2021 at 5:35 AM Matthew Wilcox <[email protected]> wrote:
>>
>> On Thu, Feb 25, 2021 at 08:20:33PM +0530, Aditya Srivastava wrote:
>>> +++ b/scripts/kernel-doc
>>> @@ -1201,12 +1201,23 @@ sub dump_union($$) {
>>> sub dump_struct($$) {
>>> my $x = shift;
>>> my $file = shift;
>>> + my $decl_type;
>>> + my $members;
>>> + my $type = qr{struct|union};
>>> + # For capturing struct/union definition body, i.e. "{members*}qualifiers*"
>>> + my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*};
>>> - if ($x =~ /(struct|union)\s+(\w+)\s*\{(.*)\}(\s*(__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*/) {
>>> - my $decl_type = $1;
>>> + if ($x =~ /($type)\s+(\w+)\s*$definition_body/) {
>>> + $decl_type = $1;
>>> $declaration_name = $2;
>>> - my $members = $3;
>>> + $members = $3;
>>> + } elsif ($x =~ /typedef\s+($type)\s*$definition_body\s*(\w+)\s*;/) {
>>> + $decl_type = $1;
>>> + $declaration_name = $3;
>>> + $members = $2;
>>> + }
>>
>> In the same spirit as dump_function, would something like this work?
>>
>
> I agree. That might be a suitable clean-up to keep the code for
> functions and struct/union parsing similar in style/spirit.
>
> Aditya, would you like to create a patch for that?
>

Sure Lukas.
I have a doubt though, Can't we use a single expression separated by
"|" here, instead of multiple lines? i.e.,

$x =~
s/__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)\s*//;


Probably we could do something similar for dump_function, i.e.,
- $prototype =~ s/^static +//;
- $prototype =~ s/^extern +//;
- $prototype =~ s/^asmlinkage +//;
- $prototype =~ s/^inline +//;
- $prototype =~ s/^__inline__ +//;
- $prototype =~ s/^__inline +//;
- $prototype =~ s/^__always_inline +//;
- $prototype =~ s/^noinline +//;

+ $prototype =~
s/^(?:static|extern|asmlinkage|__?inline__?|__always_inline|noinline) +//;
And so on for other regexps.

What do you think?

Thanks
Aditya

2021-03-06 15:24:48

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

On Sat, Mar 06, 2021 at 01:18:38PM +0530, Aditya wrote:
> On 6/3/21 11:55 am, Lukas Bulwahn wrote:
> > I agree. That might be a suitable clean-up to keep the code for
> > functions and struct/union parsing similar in style/spirit.
> >
> > Aditya, would you like to create a patch for that?
>
> Sure Lukas.
> I have a doubt though, Can't we use a single expression separated by
> "|" here, instead of multiple lines? i.e.,
>
> $x =~
> s/__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)\s*//;
>
>
> Probably we could do something similar for dump_function, i.e.,
> - $prototype =~ s/^static +//;
> - $prototype =~ s/^extern +//;
> - $prototype =~ s/^asmlinkage +//;
> - $prototype =~ s/^inline +//;
> - $prototype =~ s/^__inline__ +//;
> - $prototype =~ s/^__inline +//;
> - $prototype =~ s/^__always_inline +//;
> - $prototype =~ s/^noinline +//;
>
> + $prototype =~
> s/^(?:static|extern|asmlinkage|__?inline__?|__always_inline|noinline) +//;
> And so on for other regexps.
>
> What do you think?

I think there's a tradeoff between speed / compactness and readability.
As someone who doesn't know perl particularly well, I can look at the
series of lines and say "Oh, it's stripping out these unwanted things".
Your one line, while undoubtedly more efficient, is considerably less
easy to understand.

Maybe there's another way to do it that's more efficient while not
sacrificing the readability?

Also, would your suggestion work for 'static inline void foo(void)'?
I think it needs to remove multiple occurrences of the things in the
regex. But maybe that's what the ?: on the beginning is for?

2021-03-07 07:59:03

by Aditya Srivastava

[permalink] [raw]
Subject: Re: [RFC v3] scripts: kernel-doc: fix typedef support for struct/union parsing

On 6/3/21 8:50 pm, Matthew Wilcox wrote:
> On Sat, Mar 06, 2021 at 01:18:38PM +0530, Aditya wrote:
>> On 6/3/21 11:55 am, Lukas Bulwahn wrote:
>>> I agree. That might be a suitable clean-up to keep the code for
>>> functions and struct/union parsing similar in style/spirit.
>>>
>>> Aditya, would you like to create a patch for that?
>>
>> Sure Lukas.
>> I have a doubt though, Can't we use a single expression separated by
>> "|" here, instead of multiple lines? i.e.,
>>
>> $x =~
>> s/__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)\s*//;
>>
>>
>> Probably we could do something similar for dump_function, i.e.,
>> - $prototype =~ s/^static +//;
>> - $prototype =~ s/^extern +//;
>> - $prototype =~ s/^asmlinkage +//;
>> - $prototype =~ s/^inline +//;
>> - $prototype =~ s/^__inline__ +//;
>> - $prototype =~ s/^__inline +//;
>> - $prototype =~ s/^__always_inline +//;
>> - $prototype =~ s/^noinline +//;
>>
>> + $prototype =~
>> s/^(?:static|extern|asmlinkage|__?inline__?|__always_inline|noinline) +//;
>> And so on for other regexps.
>>
>> What do you think?
>
> I think there's a tradeoff between speed / compactness and readability.
> As someone who doesn't know perl particularly well, I can look at the
> series of lines and say "Oh, it's stripping out these unwanted things".
> Your one line, while undoubtedly more efficient, is considerably less
> easy to understand.
>
> Maybe there's another way to do it that's more efficient while not
> sacrificing the readability?
>
> Also, would your suggestion work for 'static inline void foo(void)'?
> I think it needs to remove multiple occurrences of the things in the
> regex.

Ah, I get it.

But maybe that's what the ?: on the beginning is for?
>

No, "?:" is just to use this regex for matching, without capturing.
So, the regex will just remove any of those 'starting' occurrences,
consequently, "static inline" occurrence will probably not be removed.
I think the reason for using multiple lines for substitution in
dump_function is for the same reason, ie, subsequent substitution.

But for dump_struct, it is probably not desired, i.e., subsequent
substitution/removal. Also, for the same reason, using it with
dump_struct may cause unwelcomed discrepancies, or cause the user to
understand that here multiple substitutions are required, but is not so.
So, I think that we should use a single expression substitution for
dump_struct, at best. But am not sure if that is required, as
currently also we are not capturing this part of the regex, as well
as, matching only at certain position of the definition expression.

But I am curious to what others think about this.
Will be happy to work on this patch :)

Thanks
Aditya