2024-05-09 12:11:00

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: thoughts wanted on dead code hunting?

Hi,
I've stumbled into finding various types of dead code in the kernel
and am after peoples thoughts and suggestions.
Apologies if this is a bit long.

It started off after I noticed an unused LIST_HEAD in the parallel
code, so I sent some patches to clean that and related code up
(waiting review!), but then wrote a hacky script to find more
(and then mutex's); and there's a handful of patches removing those
on list (please review!).

But then I noticed associated with those were some unused structs;
so I went struct hunting, and that's where I've got a problem.

That's found me ~200 candidates; where I guess 150ish are probably
real; but my hacky script is, well trivial and hacky, so they each
need eyeballing, then a git lookup to see why they're unused, and a
compile just to make there's not some subtle macro somewhere.

** Questions:
a) Can anyone think of a better tool than my script (see bottom)?
The simplicity is a blessing & a curse - it doesn't know about
#ifdef's so I don't need to try lots of configs, but at the same
time, it can't tell if the struct actually gets used in a macro
and I have to eyeball for a struct which is assigned to as
a variable at declaration time.

b) The dead structs are all over; so they've mostly been individual
patches rather than a big patch series - how do people feel about
another 150ish similar patches ?

** other Thoughts
* The dead code is all over; from incredibly obscure drivers, to
common stuff like ftrace and iscsi.

* Most of them seem to be the remains of previous cleanups or
refactors where someone has removed the function that used
it but forgot to remove the list or struct. Sometimes that happened
prior to the first commit, so it's always been dead in the tree.

* There's a few cases where people have added 'static' to a variable
to cleanup compiler warnings, but actually they just needed to
delete the variable.

* A harder problem is unused structure members; some I've spotted
by accident, some follow from what else I delete; e.g. if you
delete a LIST_HEAD, there's a good chance there's a struct somewhere
with the list entry in it that's no longer used.

* It's not just the kernel; I've just mopped up a few struct's in Mesa
as well; but different coding standards make the script harder in
places; e.g. X uses typedef struct... everywhere so then you have
the problem of hunting the use of the typedef name.

Anyway, that's way too long, all thoughts welcome.
(Reviews, even more welcome, as they get merged I'll work through my list
for a few more).

Scripts below,

Dave
(I've cc'd a few people at a guess for people who might suggest tools;
but please copy in anyone else who might)

* hacky script for finding unused LIST_HEAD

(ie print a count of times the name of the list is used in the same file;
if it's 1 it's worth looking at)

ag 'static LIST_HEAD'| sed -e 's/[():]/ /'g |
while read FNAME LINE STATIC DEF VARNAME TRAIL
do
echo ">>>" $FNAME ' : ' $VARNAME
echo -n "Count: "
grep $VARNAME $FNAME | wc -l
grep $VARNAME $FNAME
done

* hacky script for finding unused struct's

(ie print a count of time the name of the struct is used in the same file;
only bother with .c files; doesn't spot assignments to initialise the struct
on later lines, gets confused by struct's with short names etc).

grep -r '^struct [^(=]* {'| tr ':' ' ' |
while read FNAME STRUCT NAME TAIL
do
echo "$FNAME" | grep -q '[.]c$' || continue
echo ">>>" $FNAME ' : ' $NAME
echo -n "Count: "
grep $NAME $FNAME | wc -l
#grep $VARNAME $FNAME
done

--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/


2024-05-15 00:14:39

by Kees Cook

[permalink] [raw]
Subject: Re: thoughts wanted on dead code hunting?

On Thu, May 09, 2024 at 12:08:56PM +0000, Dr. David Alan Gilbert wrote:
> That's found me ~200 candidates; where I guess 150ish are probably
> real; but my hacky script is, well trivial and hacky, so they each
> need eyeballing, then a git lookup to see why they're unused, and a
> compile just to make there's not some subtle macro somewhere.

Nice finds! People are usually big fans of code removal patches. :)

> ** Questions:
> a) Can anyone think of a better tool than my script (see bottom)?
> The simplicity is a blessing & a curse - it doesn't know about
> #ifdef's so I don't need to try lots of configs, but at the same
> time, it can't tell if the struct actually gets used in a macro
> and I have to eyeball for a struct which is assigned to as
> a variable at declaration time.

I'm not sure I've seen anything better.

I tend to use stuff like Coccinelle (spatch) for finding specific struct
usage, but it can sometimes be slow when trying to process headers
recursively. e.g.:

// Options: --recursive-includes
@find@
struct to_be_removed INSTANCE;
struct to_be_removed *POINTER;

(
* INSTANCE
|
* POINTER
)


(I bet this could be improved, but it should be a usable example.)

So this might very a given struct isn't used.

> b) The dead structs are all over; so they've mostly been individual
> patches rather than a big patch series - how do people feel about
> another 150ish similar patches ?

Generally the smaller patches are preferred. For this kind of thing,
though, I'd probably collect them by individual header files, rather
than one-patch-per-struct.

If you have one giant patch, this tool can help break it up into
per-subsystem patches (it isn't perfect, but does its best):
https://github.com/kees/kernel-tools/blob/trunk/split-on-maintainer

> * There's a few cases where people have added 'static' to a variable
> to cleanup compiler warnings, but actually they just needed to
> delete the variable.

Hah. Yeah, these are nice to find and remove.

> * A harder problem is unused structure members; some I've spotted
> by accident, some follow from what else I delete; e.g. if you
> delete a LIST_HEAD, there's a good chance there's a struct somewhere
> with the list entry in it that's no longer used.

This is especially tricky because a giant amount of structs in the
kernel actually describe over-the-wire or on-hardware structures that
maybe the kernel doesn't care about all the members, but they're still
needed to keep the layout correct.

-Kees

--
Kees Cook

2024-05-16 00:40:57

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: thoughts wanted on dead code hunting?

* Kees Cook ([email protected]) wrote:
> On Thu, May 09, 2024 at 12:08:56PM +0000, Dr. David Alan Gilbert wrote:
> > That's found me ~200 candidates; where I guess 150ish are probably
> > real; but my hacky script is, well trivial and hacky, so they each
> > need eyeballing, then a git lookup to see why they're unused, and a
> > compile just to make there's not some subtle macro somewhere.
>
> Nice finds! People are usually big fans of code removal patches. :)

Thanks; removing the LIST_HEADs actually saves bytes in the binary;
just removing the structs themselves still cleans up the source and
occasionally it's noticing something else left along with it.

> > ** Questions:
> > a) Can anyone think of a better tool than my script (see bottom)?
> > The simplicity is a blessing & a curse - it doesn't know about
> > #ifdef's so I don't need to try lots of configs, but at the same
> > time, it can't tell if the struct actually gets used in a macro
> > and I have to eyeball for a struct which is assigned to as
> > a variable at declaration time.
>
> I'm not sure I've seen anything better.
>
> I tend to use stuff like Coccinelle (spatch) for finding specific struct
> usage, but it can sometimes be slow when trying to process headers
> recursively. e.g.:
>
> // Options: --recursive-includes
> @find@
> struct to_be_removed INSTANCE;
> struct to_be_removed *POINTER;
>
> (
> * INSTANCE
> |
> * POINTER
> )
>
>
> (I bet this could be improved, but it should be a usable example.)

Hmm, now if I could use coccinelle it would be more tolerant of coding
style and slight variations than my script.
However, trying that tiny example, I get:
File "play.cocci", line 10, column 1, charpos = 141
around = '',
whole content = )

so it seems to be objecting to something at the end of the file?
I ran that with:

make coccicheck COCCI=play.cocci M=arch/x86
with Fedora 40's coccinelle-1.1.1-30.20230624git0afff7f.fc40.x86_64

> So this might very a given struct isn't used.
>
> > b) The dead structs are all over; so they've mostly been individual
> > patches rather than a big patch series - how do people feel about
> > another 150ish similar patches ?
>
> Generally the smaller patches are preferred. For this kind of thing,
> though, I'd probably collect them by individual header files, rather
> than one-patch-per-struct.

Yeh; although note so far I've only been looking for unused structs
that are defined in a .c file rather than ones in headers.
Those are relatively easy to find, because I'm only looking in one .c
at a time (although that does hit corner cases like:

header:
struct foo;
struct baa {
struct foo *p;
};

.c file:
struct foo {
stuff
};

func(struct baa *b) {
b->p something
}

so foo is defined in the C file but the symbol 'foo' is never
used again in it.

> If you have one giant patch, this tool can help break it up into
> per-subsystem patches (it isn't perfect, but does its best):
> https://github.com/kees/kernel-tools/blob/trunk/split-on-maintainer

Thanks.

> > * There's a few cases where people have added 'static' to a variable
> > to cleanup compiler warnings, but actually they just needed to
> > delete the variable.
>
> Hah. Yeah, these are nice to find and remove.
>
> > * A harder problem is unused structure members; some I've spotted
> > by accident, some follow from what else I delete; e.g. if you
> > delete a LIST_HEAD, there's a good chance there's a struct somewhere
> > with the list entry in it that's no longer used.
>
> This is especially tricky because a giant amount of structs in the
> kernel actually describe over-the-wire or on-hardware structures that
> maybe the kernel doesn't care about all the members, but they're still
> needed to keep the layout correct.

Oh yeh; and also I'm not deleting unused struct's if they look like they're
describing some firmware or hardware struct, even if the kernel doesn't currently
use it.

Dave

> -Kees
>
> --
> Kees Cook
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2024-05-16 03:25:36

by Kees Cook

[permalink] [raw]
Subject: Re: thoughts wanted on dead code hunting?

On Thu, May 16, 2024 at 12:40:40AM +0000, Dr. David Alan Gilbert wrote:
> * Kees Cook ([email protected]) wrote:
> > // Options: --recursive-includes
> > @find@
> > struct to_be_removed INSTANCE;
> > struct to_be_removed *POINTER;

Oops, I missed this line:

@@

> >
> > (
> > * INSTANCE
> > |
> > * POINTER
> > )
> >
> >
> > (I bet this could be improved, but it should be a usable example.)
>
> Hmm, now if I could use coccinelle it would be more tolerant of coding
> style and slight variations than my script.
> However, trying that tiny example, I get:
> File "play.cocci", line 10, column 1, charpos = 141
> around = '',
> whole content = )
>
> so it seems to be objecting to something at the end of the file?

Sorry, yes, missed the "@@" between the identifiers and the code pattern
to match.

--
Kees Cook

2024-05-16 11:56:24

by Dr. David Alan Gilbert

[permalink] [raw]
Subject: Re: thoughts wanted on dead code hunting?

* Kees Cook ([email protected]) wrote:
> On Thu, May 16, 2024 at 12:40:40AM +0000, Dr. David Alan Gilbert wrote:
> > * Kees Cook ([email protected]) wrote:
> > > // Options: --recursive-includes
> > > @find@
> > > struct to_be_removed INSTANCE;
> > > struct to_be_removed *POINTER;
>
> Oops, I missed this line:
>
> @@
>
> > >
> > > (
> > > * INSTANCE
> > > |
> > > * POINTER
> > > )
> > >
> > >
> > > (I bet this could be improved, but it should be a usable example.)
> >
> > Hmm, now if I could use coccinelle it would be more tolerant of coding
> > style and slight variations than my script.
> > However, trying that tiny example, I get:
> > File "play.cocci", line 10, column 1, charpos = 141
> > around = '',
> > whole content = )
> >
> > so it seems to be objecting to something at the end of the file?
>
> Sorry, yes, missed the "@@" between the identifiers and the code pattern
> to match.

Ah right, that's done it - I'd noticed there wasn't a @@ but couldn't
figure out where to put them.

After a bit of guessing, I also added a

virtual context

at the top and now:

make coccicheck COCCI=play.cocci M=arch/x86 MODE=context

works - thanks! I'll have a play.

Dave

> --
> Kees Cook
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/