LinuxLists.cc - [RFC] duplicate #include check for build system

2006-02-21 01:48:26

Subject: [RFC] duplicate #include check for build system

Hi Sam! Folks!

recently had the idea to utilize cpp or sparse to
do some automated #include checking, and I came up
with the following proof of concept:

I just replaced the sparse binary by the following
script (basically hijacking the make C=1 system)

it would allow kernel developers to easily identify
duplicate includes, which in turn, might reduce
dependancies and thus build time ...

----------------
#!/bin/bash

while [ $# -gt 1 ]; do
case $1 in
-D*) DEF="$DEF $1";;
-W*) ;;
*) OPT="$OPT $1";;
esac
shift
done

# ( $CPP $DEF -H -dI $OPT $1 1>&2 ) 2>&1 | grep "^[#.]"
$CPP $DEF -H $OPT $1 2>&1 >/dev/null | gawk -vFILE="$1" '

BEGIN { I[0]=FILE; }

/^[.]+/ { D=length($1);

for (i=0; i<D; i++)
C[i,$2]++;

I[D]=$2;

for (i=0; i<D; i++)
M[i,$2]=I[i];

if (C[D-1,$2]>1) {
printf "??? %s in %s ",$2,I[D-1];

for (i=D; M[i,$2]; i++)
printf "%c%s", (i==D)?"[":"?", M[i,$2];

printf (i>D) ? "]\n" :
((X[D,$2]==I[D-1]) ? "(dup)\n" : "\n");
}

X[D,$2]=I[D-1];
}
' 1>&2

true
----------------

of course, most of it would not be required if
there was support in the kernel build system,
and, if there is any preference for perl over
bash/gawk it could be easily rewritten ...

please let me know what you think of this and if
you could imagine adding something similar to the
build system

TIA,
Herbert

2006-02-21 06:10:45

by Sam Ravnborg

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Tue, Feb 21, 2006 at 02:48:24AM +0100, Herbert Poetzl wrote:
>
> Hi Sam! Folks!
>
> recently had the idea to utilize cpp or sparse to
> do some automated #include checking, and I came up
> with the following proof of concept:
Nice way - looks better than randomly parsing a lot of files.

> I just replaced the sparse binary by the following
> script (basically hijacking the make C=1 system)
Thats just fine. It is made general to support more than sparse if
feasible.

> of course, most of it would not be required if
> there was support in the kernel build system,
I do not see what you think could improve things on the kbuild side.
Please elaborate a bit more.

> and, if there is any preference for perl over
> bash/gawk it could be easily rewritten ...
For tools that is run only now and then you are fine to use perl if you
prefer. We have though until now avoided it as a strict dependency.
Also a few more comments would be nice so even I understand the script.

Sam

2006-02-21 07:28:41

by Daniel Barkalow

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Tue, 21 Feb 2006, Herbert Poetzl wrote:

> Hi Sam! Folks!
>
> recently had the idea to utilize cpp or sparse to
> do some automated #include checking, and I came up
> with the following proof of concept:
>
> I just replaced the sparse binary by the following
> script (basically hijacking the make C=1 system)
>
> it would allow kernel developers to easily identify
> duplicate includes, which in turn, might reduce
> dependancies and thus build time ...

I think the kernel style is to encourage duplicate includes, rather than
removing them. Removing duplicate includes won't remove any dependancies
(since the includes that they duplicate will remain). And it makes it
harder to remove unnecessary includes (which does reduce dependancies),
because when header A stops needing header B, various other code could
expect that including header A means they get header B, and these places
have to be found and the formerly-duplicate include put back. So you
actually do best to have lots of duplicate includes and aggressively prune
unnecessary includes.

-Daniel
*This .sig left intentionally blank*

2006-02-21 17:52:50

by Sam Ravnborg

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> On Tue, 21 Feb 2006, Herbert Poetzl wrote:

> I think the kernel style is to encourage duplicate includes, rather than
> removing them. Removing duplicate includes won't remove any dependancies
> (since the includes that they duplicate will remain).
The style as I have understood it is that each .h file in include/linux/
are supposed to be self-contained. So it includes what is needs, and the
'what it needs' are kept small.

Keeping the 'what it needs' part small is a challenge resulting in
smaller .h files. But also a good way to keep related things together.

Sam

2006-02-22 00:11:59

by Herbert Poetzl

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
>
> > I think the kernel style is to encourage duplicate includes, rather than
> > removing them. Removing duplicate includes won't remove any dependancies
> > (since the includes that they duplicate will remain).

> The style as I have understood it is that each .h file in include/linux/
> are supposed to be self-contained. So it includes what is needs, and the
> 'what it needs' are kept small.
>
> Keeping the 'what it needs' part small is a challenge resulting in
> smaller .h files. But also a good way to keep related things together.

glad that I stimulated a philosophical discussion
about the kernel header files and what they should
include or not ...

but the idea was more to give the developers an
instrument to verify that they are not including
stuff several times, and that's actually in .h
and .c files, because it seems that often the same
header file is included twice in the _same_ file

anyway, was this a positive or negative reply?

TIA,
Herbert

> Sam

2006-02-22 00:19:00

by Randy Dunlap

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Wed, 22 Feb 2006, Herbert Poetzl wrote:

> On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> > On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
> >
> > > I think the kernel style is to encourage duplicate includes, rather than
> > > removing them. Removing duplicate includes won't remove any dependancies
> > > (since the includes that they duplicate will remain).
>
> > The style as I have understood it is that each .h file in include/linux/
> > are supposed to be self-contained. So it includes what is needs, and the
> > 'what it needs' are kept small.
> >
> > Keeping the 'what it needs' part small is a challenge resulting in
> > smaller .h files. But also a good way to keep related things together.
>
> glad that I stimulated a philosophical discussion
> about the kernel header files and what they should
> include or not ...
>
> but the idea was more to give the developers an
> instrument to verify that they are not including
> stuff several times, and that's actually in .h
> and .c files, because it seems that often the same
> header file is included twice in the _same_ file
>
> anyway, was this a positive or negative reply?

Hi Herbert,

The goal is not to remove the most possible #includes.

E.g., if sched.h already sucks in kernel.h,
kernel.h still should be #included if the source (.c)
files uses any APIs or extern data from kernel.h.

Does that help?

--
~Randy

2006-02-22 00:26:16

by Herbert Poetzl

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Tue, Feb 21, 2006 at 04:18:57PM -0800, Randy.Dunlap wrote:
> On Wed, 22 Feb 2006, Herbert Poetzl wrote:
>
> > On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> > > On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > > > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
> > >
> > > > I think the kernel style is to encourage duplicate includes,
> > > > rather than removing them. Removing duplicate includes won't
> > > > remove any dependancies (since the includes that they duplicate
> > > > will remain).
> >
> > > The style as I have understood it is that each .h file in
> > > include/linux/ are supposed to be self-contained. So it includes
> > > what is needs, and the 'what it needs' are kept small.
> > >
> > > Keeping the 'what it needs' part small is a challenge resulting
> > > in smaller .h files. But also a good way to keep related things
> > > together.
> >
> > glad that I stimulated a philosophical discussion
> > about the kernel header files and what they should
> > include or not ...
> >
> > but the idea was more to give the developers an
> > instrument to verify that they are not including
> > stuff several times, and that's actually in .h
> > and .c files, because it seems that often the same
> > header file is included twice in the _same_ file
> >
> > anyway, was this a positive or negative reply?
>
> Hi Herbert,
>
> The goal is not to remove the most possible #includes.

which I totally agree with ...

but a) how is that related to _having_ a tool to
check for duplicate includes, and b) how is it
related to removing duplicate includes in general?

let me give a simple example here:

#include <linux/pm.h>
#include <linux/sched.h>
#include <linux/proc_fs.h>
-#include <linux/pm.h>

do you think the second one is really desired?

> E.g., if sched.h already sucks in kernel.h,
> kernel.h still should be #included if the source (.c)
> files uses any APIs or extern data from kernel.h.
>
> Does that help?

no, sorry, doesn't help here ...

nevertheless thanks,
Herbert

> --
> ~Randy

2006-02-22 00:28:33

by Randy Dunlap

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Wed, 22 Feb 2006, Herbert Poetzl wrote:

> On Tue, Feb 21, 2006 at 04:18:57PM -0800, Randy.Dunlap wrote:
> > On Wed, 22 Feb 2006, Herbert Poetzl wrote:
> >
> > > On Tue, Feb 21, 2006 at 06:52:46PM +0100, Sam Ravnborg wrote:
> > > > On Tue, Feb 21, 2006 at 02:29:12AM -0500, Daniel Barkalow wrote:
> > > > > On Tue, 21 Feb 2006, Herbert Poetzl wrote:
> > > >
> > > > > I think the kernel style is to encourage duplicate includes,
> > > > > rather than removing them. Removing duplicate includes won't
> > > > > remove any dependancies (since the includes that they duplicate
> > > > > will remain).
> > >
> > > > The style as I have understood it is that each .h file in
> > > > include/linux/ are supposed to be self-contained. So it includes
> > > > what is needs, and the 'what it needs' are kept small.
> > > >
> > > > Keeping the 'what it needs' part small is a challenge resulting
> > > > in smaller .h files. But also a good way to keep related things
> > > > together.
> > >
> > > glad that I stimulated a philosophical discussion
> > > about the kernel header files and what they should
> > > include or not ...
> > >
> > > but the idea was more to give the developers an
> > > instrument to verify that they are not including
> > > stuff several times, and that's actually in .h
> > > and .c files, because it seems that often the same
> > > header file is included twice in the _same_ file
> > >
> > > anyway, was this a positive or negative reply?
> >
> > Hi Herbert,
> >
> > The goal is not to remove the most possible #includes.
>
> which I totally agree with ...
>
> but a) how is that related to _having_ a tool to
> check for duplicate includes, and b) how is it
> related to removing duplicate includes in general?
>
> let me give a simple example here:
>
> #include <linux/pm.h>
> #include <linux/sched.h>
> #include <linux/proc_fs.h>
> -#include <linux/pm.h>
>
> do you think the second one is really desired?

Of course not.

> > E.g., if sched.h already sucks in kernel.h,
> > kernel.h still should be #included if the source (.c)
> > files uses any APIs or extern data from kernel.h.
> >
> > Does that help?
>
> no, sorry, doesn't help here ...
>
> nevertheless thanks,
> Herbert

--
~Randy

2006-02-22 05:57:28

by Sam Ravnborg

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Wed, Feb 22, 2006 at 01:11:53AM +0100, Herbert Poetzl wrote:
> >
> > Keeping the 'what it needs' part small is a challenge resulting in
> > smaller .h files. But also a good way to keep related things together.
>
> glad that I stimulated a philosophical discussion
> about the kernel header files and what they should
> include or not ...
>
> but the idea was more to give the developers an
> instrument to verify that they are not including
> stuff several times, and that's actually in .h
> and .c files, because it seems that often the same
> header file is included twice in the _same_ file
>
> anyway, was this a positive or negative reply?

If you can restrict the tool to locate the cases where
a header file is included twice in the same file - then
this is a positive reply.
If you limit the check to the top-level file this is fine,
but I would not be suprised is one or a few .h files
has duplicated includes too.

Sam

2006-02-24 07:23:34

by Jan Engelhardt

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

>> I think the kernel style is to encourage duplicate includes, rather than
>> removing them. Removing duplicate includes won't remove any dependancies
>> (since the includes that they duplicate will remain).
>The style as I have understood it is that each .h file in include/linux/
>are supposed to be self-contained. So it includes what is needs, and the
>'what it needs' are kept small.
>
>Keeping the 'what it needs' part small is a challenge resulting in
>smaller .h files. But also a good way to keep related things together.
>
How far does this go? Consider the following hypothetical case:

---dcache.h---
struct dentry {
...
};
---fs.h---
#include "dcache.h"
struct inode {
struct dentry *de;
};

Since only a pointer to struct dentry is involved, I would compress it to:

---fs.h---
struct dentry;
struct inode {
struct dentry *de;
};

The fs.h file still "compiles" (gcc -xc fs.h), and there is one file less
to be read. And since dcache.h in this case here should anyway be included
in the .c file if *DE is dereferenced, I do not see a problem with this.
Objections?

Jan Engelhardt
--

2006-02-25 02:57:53

by Randy Dunlap

[permalink] [raw]

Subject: Re: [RFC] duplicate #include check for build system

On Fri, 24 Feb 2006 08:23:23 +0100 (MET) Jan Engelhardt wrote:

> >> I think the kernel style is to encourage duplicate includes, rather than
> >> removing them. Removing duplicate includes won't remove any dependancies
> >> (since the includes that they duplicate will remain).
> >The style as I have understood it is that each .h file in include/linux/
> >are supposed to be self-contained. So it includes what is needs, and the
> >'what it needs' are kept small.
> >
> >Keeping the 'what it needs' part small is a challenge resulting in
> >smaller .h files. But also a good way to keep related things together.
> >
> How far does this go? Consider the following hypothetical case:
>
> ---dcache.h---
> struct dentry {
> ...
> };
> ---fs.h---
> #include "dcache.h"
> struct inode {
> struct dentry *de;
> };
>
> Since only a pointer to struct dentry is involved, I would compress it to:
>
> ---fs.h---
> struct dentry;
> struct inode {
> struct dentry *de;
> };
>
> The fs.h file still "compiles" (gcc -xc fs.h), and there is one file less
> to be read. And since dcache.h in this case here should anyway be included
> in the .c file if *DE is dereferenced, I do not see a problem with this.
> Objections?

Nope, your method is good & correct AFAIAC.

---
~Randy