2023-12-14 21:07:58

by Tanzir Hasan

[permalink] [raw]
Subject: [PATCH v2 0/2] shrink lib/string.i via IWYU

This patch series changes the include list of string.c to minimize
the preprocessing size. The patch series intends to remove kernel.h
which causes problems in the sh architecture because kernel.h is
not properly declared in asm/word-at-a-time.h

---

---
Changes in v2:
- Transformed into a patch series
- Changed asm inclusions to linux inclusions
- added a patch to sh
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Tanzir Hasan (2):
sh: Added kernel.h to word-at-a-time
lib/string: shrink lib/string.i via IWYU

arch/sh/include/asm/word-at-a-time.h | 1 +
lib/string.c | 14 +++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)
---
base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
change-id: 20231204-libstringheader-e238e2af5eec

Best regards,
--
Tanzir Hasan <[email protected]>



2023-12-14 21:08:09

by Tanzir Hasan

[permalink] [raw]
Subject: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

This patch includes linux/kernel.h in asm/word-at-a-time.h for the
sh architecture. WORD_AT_A_TIME_CONSTANTS depends on kernel.h.
Making this implicit dependancy explicit allows for later improvements
in the lib/string.c inclusion list.

Suggested-by: Al Viro <[email protected]>
---
arch/sh/include/asm/word-at-a-time.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/sh/include/asm/word-at-a-time.h b/arch/sh/include/asm/word-at-a-time.h
index 4aa398455b94..f680f5f1d626 100644
--- a/arch/sh/include/asm/word-at-a-time.h
+++ b/arch/sh/include/asm/word-at-a-time.h
@@ -5,6 +5,7 @@
#ifdef CONFIG_CPU_BIG_ENDIAN
# include <asm-generic/word-at-a-time.h>
#else
+#include <linux/kernel.h>
/*
* Little-endian version cribbed from x86.
*/

--
2.43.0.472.g3155946c3a-goog


2023-12-14 21:08:21

by Tanzir Hasan

[permalink] [raw]
Subject: [PATCH v2 2/2] lib/string: shrink lib/string.i via IWYU

This diff uses an open source tool include-what-you-use (IWYU) to modify
the include list changing indirect includes to direct includes.
IWYU is implemented using the IWYUScripts github repository which is a tool that is
currently undergoing development. These changes seek to improve build times.

This change to lib/string.c resulted in a preprocessed size of
lib/string.i from 26371 lines to 5259 lines (-80%) for the x86
defconfig.

Link: https://github.com/ClangBuiltLinux/IWYUScripts

Signed-off-by: Tanzir Hasan <[email protected]>
---
lib/string.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index be26623953d2..7fe1acefb1a1 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -16,16 +16,16 @@

#define __NO_FORTIFY
#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/ctype.h>
-#include <linux/kernel.h>
-#include <linux/export.h>
+#include <linux/bits.h>
#include <linux/bug.h>
#include <linux/errno.h>
-#include <linux/slab.h>
-
+#include <asm/rwonce.h>
+#include <linux/linkage.h>
+#include <linux/stddef.h>
+#include <vdso/limits.h>
+#include <linux/string.h>
+#include <linux/ctype.h>
#include <asm/unaligned.h>
-#include <asm/byteorder.h>
#include <asm/word-at-a-time.h>
#include <asm/page.h>


--
2.43.0.472.g3155946c3a-goog


2023-12-14 21:38:10

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Thu, Dec 14, 2023 at 09:06:12PM +0000, [email protected] wrote:
> This patch includes linux/kernel.h in asm/word-at-a-time.h for the
> sh architecture. WORD_AT_A_TIME_CONSTANTS depends on kernel.h.
> Making this implicit dependancy explicit allows for later improvements
> in the lib/string.c inclusion list.
>
> Suggested-by: Al Viro <[email protected]>

This patch is missing your Signed-off-by: tag.

Also, please double-check your email configs: your full name is missing
from your emails (it's just "[email protected]"):
https://lore.kernel.org/lkml/[email protected]/

But otherwise, the change itself looks fine. :)

--
Kees Cook

2023-12-14 21:39:20

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] lib/string: shrink lib/string.i via IWYU

On Thu, Dec 14, 2023 at 09:06:13PM +0000, [email protected] wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
>
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5259 lines (-80%) for the x86
> defconfig.
>
> Link: https://github.com/ClangBuiltLinux/IWYUScripts
>

nit: Please drop this blank line to keep your tags together.

> Signed-off-by: Tanzir Hasan <[email protected]>

I'm glad to see such a big difference with just a little header
tweaking. I look forward to more like this!

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2023-12-14 21:52:58

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Thu, Dec 14, 2023 at 1:37 PM Kees Cook <[email protected]> wrote:
>
> Also, please double-check your email configs: your full name is missing
> from your emails (it's just "[email protected]"):
> https://lore.kernel.org/lkml/[email protected]/

This is the issue related to use of our internal mailer with b4.
Konstantin fixed this in b4 a while ago, but I suspect the version of
b4 we're getting from apt still does not contain the fix.

Or perhaps we need to switch to pyenv and pypi to install a newer
version of b4. Tanzir (sits right next to me) and reports his version
of b4 is:

0.12.3

On pypi, looks like there's a 0.12.4
https://pypi.org/project/b4/#history
Looking at those release dates, I'm pretty sure Konstantine fixed this
particular issue in the 0.12.4 release.

(I made this mistake too recently, with my latest commit you picked up)

Debian stable says it has 0.12.0 https://packages.debian.org/stable/b4
Debian unstable says it has 0.12.4 https://packages.debian.org/unstable/b4
IDK where 0.12.3 is coming from. Perhaps our internal mirrors are
lagging behind.
--
Thanks,
~Nick Desaulniers

2023-12-15 16:05:06

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Thu, Dec 14, 2023 at 09:06:12PM +0000, [email protected] wrote:
> This patch includes linux/kernel.h in asm/word-at-a-time.h for the
> sh architecture. WORD_AT_A_TIME_CONSTANTS depends on kernel.h.
> Making this implicit dependancy explicit allows for later improvements
> in the lib/string.c inclusion list.
>
> Suggested-by: Al Viro <[email protected]>

You forgot your SoB, but...

...

> #ifdef CONFIG_CPU_BIG_ENDIAN
> # include <asm-generic/word-at-a-time.h>
> #else
> +#include <linux/kernel.h>

I highly discourage from doing that. Instead, split what is needed to
the separate (new) header and include that one.

> /*
> * Little-endian version cribbed from x86.
> */

--
With Best Regards,
Andy Shevchenko



2023-12-15 16:27:00

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] lib/string: shrink lib/string.i via IWYU

On Thu, Dec 14, 2023 at 09:06:13PM +0000, [email protected] wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
>
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5259 lines (-80%) for the x86
> defconfig.

> Link: https://github.com/ClangBuiltLinux/IWYUScripts
>
> Signed-off-by: Tanzir Hasan <[email protected]>

Tag block should not have blank line(s).

> #include <linux/types.h>
> -#include <linux/string.h>
> -#include <linux/ctype.h>
> -#include <linux/kernel.h>
> -#include <linux/export.h>
> +#include <linux/bits.h>
> #include <linux/bug.h>
> #include <linux/errno.h>
> -#include <linux/slab.h>
> -
> +#include <asm/rwonce.h>
> +#include <linux/linkage.h>
> +#include <linux/stddef.h>
> +#include <vdso/limits.h>
> +#include <linux/string.h>
> +#include <linux/ctype.h>
> #include <asm/unaligned.h>
> -#include <asm/byteorder.h>
> #include <asm/word-at-a-time.h>
> #include <asm/page.h>

My gosh, this becomes unreadable!

Please make sure you go with groups of headers from more generic to more
particular, i.e.

linux/*

asm/*

vdso/*

BUT, why the heck vdso? It should be rather uapi/ or regular linux/.

--
With Best Regards,
Andy Shevchenko



2023-12-15 19:09:52

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Fri, Dec 15, 2023 at 8:31 PM Tanzir Hasan <[email protected]> wrote:
> On Fri, Dec 15, 2023 at 8:04 AM Andy Shevchenko <[email protected]> wrote:
>> On Thu, Dec 14, 2023 at 09:06:12PM +0000, [email protected] wrote:

...

>> > +#include <linux/kernel.h>
>>
>> I highly discourage from doing that. Instead, split what is needed to
>> the separate (new) header and include that one.
>
>
> I think it would make the most sense to do this in a separate patch.
> What word-at-a-time.h needs from kernel.h is REPEAT_BYTE and to my knowledge,
> almost every other version of word-at-a-time.h includes kernel.h gets this by
> including kernel.h. A future change could be removing REPEAT_BYTE
> out of kernel.h

Just create a patch that either moves that macro (along with upper_*()
and lower_*() APIs) to a more distinguishable header
(maybe bytes.h or words.h or wordpart.h, etc) and use it in your case
and fix others.

--
With Best Regards,
Andy Shevchenko

2023-12-18 16:58:26

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Fri, Dec 15, 2023 at 11:09 AM Andy Shevchenko
<[email protected]> wrote:
>
> On Fri, Dec 15, 2023 at 8:31 PM Tanzir Hasan <[email protected]> wrote:
> > On Fri, Dec 15, 2023 at 8:04 AM Andy Shevchenko <[email protected]> wrote:
> >> On Thu, Dec 14, 2023 at 09:06:12PM +0000, [email protected] wrote:
>
> ...
>
> >> > +#include <linux/kernel.h>
> >>
> >> I highly discourage from doing that. Instead, split what is needed to
> >> the separate (new) header and include that one.
> >
> >
> > I think it would make the most sense to do this in a separate patch.
> > What word-at-a-time.h needs from kernel.h is REPEAT_BYTE and to my knowledge,
> > almost every other version of word-at-a-time.h includes kernel.h gets this by
> > including kernel.h. A future change could be removing REPEAT_BYTE
> > out of kernel.h
>
> Just create a patch that either moves that macro (along with upper_*()
> and lower_*() APIs) to a more distinguishable header
> (maybe bytes.h or words.h or wordpart.h, etc) and use it in your case
> and fix others.

Andy,
These are good suggestions and we should do them...

...and Tanzir only has 3 weeks left of his internship. I don't want
him to get bogged down chasing build regressions from modifying the
headers themselves. I think what's best for him from here through the
remainder of his internship is to stay focused on applying suggestions
from IWYU to just modify the #include list of .c files, and not start
splitting .h files. Splitting the .h files will be the next step, and
is made easier by having the codebase not have so many indirect
includes (via IWYU), but we need time to soak header changes, and time
Tanzir does not have. Please can we keep the suggestions focused on
whether the modifications to the header includes (and the tangential
cleanups) are correct?

While REPEAT_BYTE has a manageable number of users, upper_* and
lower_* have significantly more; I worry about moving those causing
regressions. We can move them, but such changes would need
significantly more soak time than this series IMO. Tanzir is also
working on statistical analysis; I suspect if he analyzes
include/linux/kernel.h, he can comment on whether the usage of
REPEAT_BYTE is correlated with the usage of upper_* and lower_* in
order to inform whether they should be grouped together or not.
--
Thanks,
~Nick Desaulniers

2023-12-18 17:09:24

by Tanzir Hasan

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

> While REPEAT_BYTE has a manageable number of users, upper_* and
> lower_* have significantly more; I worry about moving those causing
> regressions. We can move them, but such changes would need
> significantly more soak time than this series IMO. Tanzir is also
> working on statistical analysis; I suspect if he analyzes
> include/linux/kernel.h, he can comment on whether the usage of
> REPEAT_BYTE is correlated with the usage of upper_* and lower_* in
> order to inform whether they should be grouped together or not.

Removing REPEAT_BYTE is manageable and I have already moved it. I will
be pushing a patch that moves just that into another file called wordpart.h.
There are too many instances of the other functions for it to make sense to
remove them all in this patch.

Best,
Tanzir

2023-12-18 17:23:58

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Mon, Dec 18, 2023 at 08:57:59AM -0800, Nick Desaulniers wrote:
> On Fri, Dec 15, 2023 at 11:09 AM Andy Shevchenko
> <[email protected]> wrote:
> > On Fri, Dec 15, 2023 at 8:31 PM Tanzir Hasan <[email protected]> wrote:
> > > On Fri, Dec 15, 2023 at 8:04 AM Andy Shevchenko <[email protected]> wrote:
> > >> On Thu, Dec 14, 2023 at 09:06:12PM +0000, [email protected] wrote:

...

> > >> > +#include <linux/kernel.h>
> > >>
> > >> I highly discourage from doing that. Instead, split what is needed to
> > >> the separate (new) header and include that one.
> > >
> > >
> > > I think it would make the most sense to do this in a separate patch.
> > > What word-at-a-time.h needs from kernel.h is REPEAT_BYTE and to my knowledge,
> > > almost every other version of word-at-a-time.h includes kernel.h gets this by
> > > including kernel.h. A future change could be removing REPEAT_BYTE
> > > out of kernel.h
> >
> > Just create a patch that either moves that macro (along with upper_*()
> > and lower_*() APIs) to a more distinguishable header
> > (maybe bytes.h or words.h or wordpart.h, etc) and use it in your case
> > and fix others.
>
> Andy,
> These are good suggestions and we should do them...
>
> ...and Tanzir only has 3 weeks left of his internship. I don't want
> him to get bogged down chasing build regressions from modifying the
> headers themselves. I think what's best for him from here through the
> remainder of his internship is to stay focused on applying suggestions
> from IWYU to just modify the #include list of .c files, and not start
> splitting .h files. Splitting the .h files will be the next step, and
> is made easier by having the codebase not have so many indirect
> includes (via IWYU), but we need time to soak header changes, and time
> Tanzir does not have. Please can we keep the suggestions focused on
> whether the modifications to the header includes (and the tangential
> cleanups) are correct?

Understood. Can we add a comment like

/* FIXME: replace with a proper header to avoid dependency hell */
#include <linux/kernel.h>

?

> While REPEAT_BYTE has a manageable number of users, upper_* and
> lower_* have significantly more; I worry about moving those causing
> regressions.

If you look at how I did similar in the past, I back included new header
into kernel.h. Not the pretty solution, but allows to split in the new code.

> We can move them, but such changes would need significantly more soak time

s/significantly//

But I got your point, see above.

> than this series IMO. Tanzir is also working on statistical analysis; I
> suspect if he analyzes include/linux/kernel.h, he can comment on whether the
> usage of REPEAT_BYTE is correlated with the usage of upper_* and lower_* in
> order to inform whether they should be grouped together or not.

--
With Best Regards,
Andy Shevchenko



2023-12-18 17:31:06

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] sh: Added kernel.h to word-at-a-time

On Mon, Dec 18, 2023 at 09:05:56AM -0800, Tanzir Hasan wrote:
> > While REPEAT_BYTE has a manageable number of users, upper_* and
> > lower_* have significantly more; I worry about moving those causing
> > regressions. We can move them, but such changes would need
> > significantly more soak time than this series IMO. Tanzir is also
> > working on statistical analysis; I suspect if he analyzes
> > include/linux/kernel.h, he can comment on whether the usage of
> > REPEAT_BYTE is correlated with the usage of upper_* and lower_* in
> > order to inform whether they should be grouped together or not.
>
> Removing REPEAT_BYTE is manageable and I have already moved it.

Removing? You mean switching to something else in all those headers?

> I will
> be pushing a patch that moves just that into another file called wordpart.h.
> There are too many instances of the other functions for it to make sense to
> remove them all in this patch.

Okay, let's see the proposal (patch) code then!

--
With Best Regards,
Andy Shevchenko



2024-01-09 21:49:55

by Justin Stitt

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] lib/string: shrink lib/string.i via IWYU

Hi,

On Thu, Dec 14, 2023 at 09:06:13PM +0000, [email protected] wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
>
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5259 lines (-80%) for the x86
> defconfig.
>

Tanzir, I wonder if you could include some of the symbol names used for
some of these more esoteric headers. Let me describe what I mean:

Andy talks about "why <vdso/...>" and perhaps some comments (in your
patch message, not in the source itself) about which symbols are being
used from these headers would serve useful. I believe IWYU can generate
this information and should clear up some confusion or lead to better
suggestions from reviewers if we understand why a header is being
included.

At any rate, this builds for me doing randconfigs on x86_64 with these
KCONFIG_SEEDs
1: 0x3DD9D136
2: 0xB4440EE4
3: 0x98778270
4: 0x8C237F26
5: 0x244F8A64
6: 0x5A5C5E5C
7: 0xA77896BC
8: 0x9B5FF0D5
9: 0x24F23F6A
10: 0x35C0A107

I applied your patch on top of 5db8752c3b81bd33.

Tested-by: Justin Stitt <[email protected]>
> Link: https://github.com/ClangBuiltLinux/IWYUScripts
>
> Signed-off-by: Tanzir Hasan <[email protected]>
> ---
> lib/string.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/lib/string.c b/lib/string.c
> index be26623953d2..7fe1acefb1a1 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -16,16 +16,16 @@
>
> #define __NO_FORTIFY
> #include <linux/types.h>
> -#include <linux/string.h>
> -#include <linux/ctype.h>
> -#include <linux/kernel.h>
> -#include <linux/export.h>
> +#include <linux/bits.h>
> #include <linux/bug.h>
> #include <linux/errno.h>
> -#include <linux/slab.h>
> -
> +#include <asm/rwonce.h>
> +#include <linux/linkage.h>
> +#include <linux/stddef.h>
> +#include <vdso/limits.h>
> +#include <linux/string.h>
> +#include <linux/ctype.h>
> #include <asm/unaligned.h>
> -#include <asm/byteorder.h>
> #include <asm/word-at-a-time.h>
> #include <asm/page.h>
>
>
> --
> 2.43.0.472.g3155946c3a-goog
>
Thanks
Justin

2024-01-11 00:09:03

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] lib/string: shrink lib/string.i via IWYU

On Tue, Jan 09, 2024 at 09:49:10PM +0000, Justin Stitt wrote:
> On Thu, Dec 14, 2023 at 09:06:13PM +0000, [email protected] wrote:
> > This diff uses an open source tool include-what-you-use (IWYU) to modify
> > the include list changing indirect includes to direct includes.
> > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > currently undergoing development. These changes seek to improve build times.
> >
> > This change to lib/string.c resulted in a preprocessed size of
> > lib/string.i from 26371 lines to 5259 lines (-80%) for the x86
> > defconfig.
> >
>
> Tanzir, I wonder if you could include some of the symbol names used for
> some of these more esoteric headers. Let me describe what I mean:
>
> Andy talks about "why <vdso/...>" and perhaps some comments (in your
> patch message, not in the source itself) about which symbols are being
> used from these headers would serve useful. I believe IWYU can generate
> this information and should clear up some confusion or lead to better
> suggestions from reviewers if we understand why a header is being
> included.

If there aren't any other objections, I'd like to pick this up in my
tree after the merge window closes.

>
> At any rate, this builds for me doing randconfigs on x86_64 with these
> KCONFIG_SEEDs
> 1: 0x3DD9D136
> 2: 0xB4440EE4
> 3: 0x98778270
> 4: 0x8C237F26
> 5: 0x244F8A64
> 6: 0x5A5C5E5C
> 7: 0xA77896BC
> 8: 0x9B5FF0D5
> 9: 0x24F23F6A
> 10: 0x35C0A107
>
> I applied your patch on top of 5db8752c3b81bd33.
>
> Tested-by: Justin Stitt <[email protected]>

Thanks for the testing!

-Kees

> > Link: https://github.com/ClangBuiltLinux/IWYUScripts
> >
> > Signed-off-by: Tanzir Hasan <[email protected]>
> > ---
> > lib/string.c | 14 +++++++-------
> > 1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/string.c b/lib/string.c
> > index be26623953d2..7fe1acefb1a1 100644
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -16,16 +16,16 @@
> >
> > #define __NO_FORTIFY
> > #include <linux/types.h>
> > -#include <linux/string.h>
> > -#include <linux/ctype.h>
> > -#include <linux/kernel.h>
> > -#include <linux/export.h>
> > +#include <linux/bits.h>
> > #include <linux/bug.h>
> > #include <linux/errno.h>
> > -#include <linux/slab.h>
> > -
> > +#include <asm/rwonce.h>
> > +#include <linux/linkage.h>
> > +#include <linux/stddef.h>
> > +#include <vdso/limits.h>
> > +#include <linux/string.h>
> > +#include <linux/ctype.h>
> > #include <asm/unaligned.h>
> > -#include <asm/byteorder.h>
> > #include <asm/word-at-a-time.h>
> > #include <asm/page.h>
> >
> >
> > --
> > 2.43.0.472.g3155946c3a-goog
> >
> Thanks
> Justin

--
Kees Cook