LinuxLists.cc - [PATCH v4 0/2] shrink lib/string.i via IWYU

2023-12-19 18:15:00

Subject: [PATCH v4 0/2] shrink lib/string.i via IWYU

This patch series changes the include list of string.c to minimize
the preprocessing size. The patch series intends to remove REPEAT_BYE
from kernel.h and move it into its own header file because
word-at-a-time.h has an implicit dependancy on it but it is declared
in kernel.h which is bloated.

---

---
Changes in v4:
- Fixed personal email client so name appears instead of just email
- Removed kernel.h where not needed.
- Sorted include list in lib/string.c and used linux/limits.h
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Moved REPEAT_BYTE out of kernel.h and into wordpart.h.
- Included wordpart.h where REPEAT_BYTE was necessary.
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Transformed into a patch series
- Changed asm inclusions to linux inclusions
- added a patch to sh
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Tanzir Hasan (2):
kernel.h: removed REPEAT_BYTE from kernel.h
lib/string: shrink lib/string.i via IWYU

arch/arm/include/asm/word-at-a-time.h | 2 +-
arch/arm64/include/asm/word-at-a-time.h | 2 +-
arch/powerpc/include/asm/word-at-a-time.h | 2 +-
arch/riscv/include/asm/word-at-a-time.h | 2 +-
arch/s390/include/asm/word-at-a-time.h | 2 +-
arch/sh/include/asm/word-at-a-time.h | 1 +
arch/x86/include/asm/word-at-a-time.h | 1 +
fs/namei.c | 2 +-
include/asm-generic/word-at-a-time.h | 2 +-
include/linux/kernel.h | 7 -------
include/linux/wordpart.h | 17 +++++++++++++++++
lib/string.c | 17 +++++++++--------
12 files changed, 35 insertions(+), 22 deletions(-)
---
base-commit: ceb6a6f023fd3e8b07761ed900352ef574010bcb
change-id: 20231204-libstringheader-e238e2af5eec

Best regards,
--
Tanzir Hasan <[email protected]>

2023-12-19 18:15:35

by Tanzir Hasan

[permalink] [raw]

Subject: [PATCH v4 2/2] lib/string: shrink lib/string.i via IWYU

This diff uses an open source tool include-what-you-use (IWYU) to modify
the include list changing indirect includes to direct includes.
IWYU is implemented using the IWYUScripts github repository which is a tool that is
currently undergoing development. These changes seek to improve build times.

This change to lib/string.c resulted in a preprocessed size of
lib/string.i from 26371 lines to 5321 lines (-80%) for the x86
defconfig.

Link: https://github.com/ClangBuiltLinux/IWYUScripts
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Tanzir Hasan <[email protected]>
---
lib/string.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index be26623953d2..06d9b46875ef 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -15,19 +15,20 @@
*/

#define __NO_FORTIFY
-#include <linux/types.h>
-#include <linux/string.h>
-#include <linux/ctype.h>
-#include <linux/kernel.h>
-#include <linux/export.h>
+#include <linux/bits.h>
#include <linux/bug.h>
+#include <linux/ctype.h>
#include <linux/errno.h>
-#include <linux/slab.h>
+#include <linux/limits.h>
+#include <linux/linkage.h>
+#include <linux/stddef.h>
+#include <linux/string.h>
+#include <linux/types.h>

+#include <asm/page.h>
+#include <asm/rwonce.h>
#include <asm/unaligned.h>
-#include <asm/byteorder.h>
#include <asm/word-at-a-time.h>
-#include <asm/page.h>

#ifndef __HAVE_ARCH_STRNCASECMP
/**

--
2.43.0.472.g3155946c3a-goog

2023-12-19 18:15:43

by Tanzir Hasan

[permalink] [raw]

Subject: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

This patch creates wordpart.h and includes it in asm/word-at-a-time.h
for the all architectures. WORD_AT_A_TIME_CONSTANTS depends on kernel.h
because of REPEAT_BYTE. Moving this to another header and including it
where necessary allows us to not include the bloated kernel.h. Making
this implicit dependency on REPEAT_BYTE explicit allows for later
improvements in the lib/string.c inclusion list.

Suggested-by: Al Viro <[email protected]>
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Tanzir Hasan <[email protected]>
---
arch/arm/include/asm/word-at-a-time.h | 2 +-
arch/arm64/include/asm/word-at-a-time.h | 2 +-
arch/powerpc/include/asm/word-at-a-time.h | 2 +-
arch/riscv/include/asm/word-at-a-time.h | 2 +-
arch/s390/include/asm/word-at-a-time.h | 2 +-
arch/sh/include/asm/word-at-a-time.h | 1 +
arch/x86/include/asm/word-at-a-time.h | 1 +
fs/namei.c | 2 +-
include/asm-generic/word-at-a-time.h | 2 +-
include/linux/kernel.h | 7 -------
include/linux/wordpart.h | 17 +++++++++++++++++
11 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/word-at-a-time.h b/arch/arm/include/asm/word-at-a-time.h
index 352ab213520d..e08304996121 100644
--- a/arch/arm/include/asm/word-at-a-time.h
+++ b/arch/arm/include/asm/word-at-a-time.h
@@ -8,7 +8,7 @@
* Little-endian word-at-a-time zero byte handling.
* Heavily based on the x86 algorithm.
*/
-#include <linux/kernel.h>
+#include <linux/wordpart.h>

struct word_at_a_time {
const unsigned long one_bits, high_bits;
diff --git a/arch/arm64/include/asm/word-at-a-time.h b/arch/arm64/include/asm/word-at-a-time.h
index f3b151ed0d7a..bd8cfbc2b9c3 100644
--- a/arch/arm64/include/asm/word-at-a-time.h
+++ b/arch/arm64/include/asm/word-at-a-time.h
@@ -9,7 +9,7 @@

#ifndef __AARCH64EB__

-#include <linux/kernel.h>
+#include <linux/wordpart.h>

struct word_at_a_time {
const unsigned long one_bits, high_bits;
diff --git a/arch/powerpc/include/asm/word-at-a-time.h b/arch/powerpc/include/asm/word-at-a-time.h
index 30a12d208687..26e4f718a674 100644
--- a/arch/powerpc/include/asm/word-at-a-time.h
+++ b/arch/powerpc/include/asm/word-at-a-time.h
@@ -5,7 +5,7 @@
* Word-at-a-time interfaces for PowerPC.
*/

-#include <linux/kernel.h>
+#include <linux/wordpart.h>
#include <asm/asm-compat.h>
#include <asm/extable.h>

diff --git a/arch/riscv/include/asm/word-at-a-time.h b/arch/riscv/include/asm/word-at-a-time.h
index 7c086ac6ecd4..94bec2f7ba53 100644
--- a/arch/riscv/include/asm/word-at-a-time.h
+++ b/arch/riscv/include/asm/word-at-a-time.h
@@ -9,7 +9,7 @@
#define _ASM_RISCV_WORD_AT_A_TIME_H

-#include <linux/kernel.h>
+#include <linux/wordpart.h>

struct word_at_a_time {
const unsigned long one_bits, high_bits;
diff --git a/arch/s390/include/asm/word-at-a-time.h b/arch/s390/include/asm/word-at-a-time.h
index 2579f1694b82..55e66d9371d6 100644
--- a/arch/s390/include/asm/word-at-a-time.h
+++ b/arch/s390/include/asm/word-at-a-time.h
@@ -2,7 +2,7 @@
#ifndef _ASM_WORD_AT_A_TIME_H
#define _ASM_WORD_AT_A_TIME_H

-#include <linux/kernel.h>
+#include <linux/wordpart.h>
#include <asm/asm-extable.h>
#include <asm/bitsperlong.h>

diff --git a/arch/sh/include/asm/word-at-a-time.h b/arch/sh/include/asm/word-at-a-time.h
index 4aa398455b94..663658cea69a 100644
--- a/arch/sh/include/asm/word-at-a-time.h
+++ b/arch/sh/include/asm/word-at-a-time.h
@@ -5,6 +5,7 @@
#ifdef CONFIG_CPU_BIG_ENDIAN
# include <asm-generic/word-at-a-time.h>
#else
+#include <linux/wordpart.h>
/*
* Little-endian version cribbed from x86.
*/
diff --git a/arch/x86/include/asm/word-at-a-time.h b/arch/x86/include/asm/word-at-a-time.h
index 46b4f1f7f354..c002c864a63e 100644
--- a/arch/x86/include/asm/word-at-a-time.h
+++ b/arch/x86/include/asm/word-at-a-time.h
@@ -3,6 +3,7 @@
#define _ASM_WORD_AT_A_TIME_H

#include <linux/kernel.h>
+#include <linux/wordpart.h>

/*
* This is largely generic for little-endian machines, but the
diff --git a/fs/namei.c b/fs/namei.c
index 71c13b2990b4..03db8ca3f394 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -17,7 +17,7 @@

#include <linux/init.h>
#include <linux/export.h>
-#include <linux/kernel.h>
+#include <linux/wordpart.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/filelock.h>
diff --git a/include/asm-generic/word-at-a-time.h b/include/asm-generic/word-at-a-time.h
index 95a1d214108a..6f088b2b0b03 100644
--- a/include/asm-generic/word-at-a-time.h
+++ b/include/asm-generic/word-at-a-time.h
@@ -2,7 +2,7 @@
#ifndef _ASM_WORD_AT_A_TIME_H
#define _ASM_WORD_AT_A_TIME_H

-#include <linux/kernel.h>
+#include <linux/wordpart.h>
#include <asm/byteorder.h>

#ifdef __BIG_ENDIAN
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d9ad21058eed..162660af5b7d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -39,13 +39,6 @@

#define STACK_MAGIC 0xdeadbeef

-/**
- * REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
- * @x: value to repeat
- *
- * NOTE: @x is not checked for > 0xff; larger values produce odd results.
- */
-#define REPEAT_BYTE(x) ((~0ul / 0xff) * (x))

/* generic data direction definitions */
#define READ 0
diff --git a/include/linux/wordpart.h b/include/linux/wordpart.h
new file mode 100644
index 000000000000..6a5ed5d54ba2
--- /dev/null
+++ b/include/linux/wordpart.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Google LLC <[email protected]>
+ */
+
+#ifndef _LINUX_WORDPART_H
+#define _LINUX_WORDPART_H
+/**
+ * REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
+ * @x: value to repeat
+ *
+ * NOTE: @x is not checked for > 0xff; larger values produce odd results.
+ */
+#define REPEAT_BYTE(x) ((~0ul / 0xff) * (x))
+
+#endif // _LINUX_WORDPART_H
+

--
2.43.0.472.g3155946c3a-goog

2023-12-19 18:30:53

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

On Tue, Dec 19, 2023 at 06:09:51PM +0000, Tanzir Hasan wrote:
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -39,13 +39,6 @@
>
> #define STACK_MAGIC 0xdeadbeef
>
> -/**
> - * REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
> - * @x: value to repeat
> - *
> - * NOTE: @x is not checked for > 0xff; larger values produce odd results.
> - */
> -#define REPEAT_BYTE(x) ((~0ul / 0xff) * (x))
>
> /* generic data direction definitions */
> #define READ 0
> diff --git a/include/linux/wordpart.h b/include/linux/wordpart.h
> new file mode 100644
> index 000000000000..6a5ed5d54ba2
> --- /dev/null
> +++ b/include/linux/wordpart.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 Google LLC <[email protected]>
> + */
> +
> +#ifndef _LINUX_WORDPART_H
> +#define _LINUX_WORDPART_H
> +/**
> + * REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
> + * @x: value to repeat
> + *
> + * NOTE: @x is not checked for > 0xff; larger values produce odd results.
> + */
> +#define REPEAT_BYTE(x) ((~0ul / 0xff) * (x))

Legal note, this file is NOT copyright Google as no Google employe
actually wrote the logcal contents of it.

Please be VERY careful when doing stuff like this, it has potentially
big repercussions, and you don't want to have to talk to lots of
lawyers a few years from now and explain how you messed it all up :(

Nick, odds are there's a Google copyright class that Tanzir should take
here, if not, I recommend the free LF one that anyone can take online
that explains the issues here:
https://training.linuxfoundation.org/training/open-source-licensing-basics-for-software-developers/

As-is, this change is STRONGLY Nacked by me.

thanks,

greg k-h

2023-12-19 18:31:13

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH v4 2/2] lib/string: shrink lib/string.i via IWYU

On Tue, Dec 19, 2023 at 06:09:52PM +0000, Tanzir Hasan wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
>
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5321 lines (-80%) for the x86
> defconfig.

Nit, use 72 columns like your editor is trying to force on you when you
write a git commit. As is, these line-ends are all over the place.

It's the stuff around the actual change that is hard to get right...

thanks,

greg k-h

2023-12-19 18:36:18

by Tanzir Hasan

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

> Legal note, this file is NOT copyright Google as no Google employe
> actually wrote the logcal contents of it.
>
> Please be VERY careful when doing stuff like this, it has potentially
> big repercussions, and you don't want to have to talk to lots of
> lawyers a few years from now and explain how you messed it all up :(

Ah, sorry, I didn't realize. Will change right away.

> Nick, odds are there's a Google copyright class that Tanzir should take
> here, if not, I recommend the free LF one that anyone can take online
> that explains the issues here:
> https://training.linuxfoundation.org/training/open-source-licensing-basics-for-software-developers/

I will take a look, thanks!

> As-is, this change is STRONGLY Nacked by me.
>

Best,
Tanzir Hasan

2023-12-19 19:11:02

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

On Tue, Dec 19, 2023 at 10:35:55AM -0800, Tanzir Hasan wrote:
> > Legal note, this file is NOT copyright Google as no Google employe
> > actually wrote the logcal contents of it.
> >
> > Please be VERY careful when doing stuff like this, it has potentially
> > big repercussions, and you don't want to have to talk to lots of
> > lawyers a few years from now and explain how you messed it all up :(
>
> Ah, sorry, I didn't realize. Will change right away.
>
> > Nick, odds are there's a Google copyright class that Tanzir should take
> > here, if not, I recommend the free LF one that anyone can take online
> > that explains the issues here:
> > https://training.linuxfoundation.org/training/open-source-licensing-basics-for-software-developers/
>
> I will take a look, thanks!

Please take the time to either learn what the Google-specific rules are,
or take the above training, before submitting a new version of the
patch.

thanks,

greg k-h

2023-12-19 21:48:40

by Andy Shevchenko

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

On Tue, Dec 19, 2023 at 8:10 PM Tanzir Hasan <[email protected]> wrote:
>
> This patch creates wordpart.h and includes it in asm/word-at-a-time.h
> for the all architectures. WORD_AT_A_TIME_CONSTANTS depends on kernel.h

for all

("all" doesn't go with article)

> because of REPEAT_BYTE. Moving this to another header and including it
> where necessary allows us to not include the bloated kernel.h. Making
> this implicit dependency on REPEAT_BYTE explicit allows for later
> improvements in the lib/string.c inclusion list.

...

> --- a/arch/arm/include/asm/word-at-a-time.h
> +++ b/arch/arm/include/asm/word-at-a-time.h

> -#include <linux/kernel.h>
> +#include <linux/wordpart.h>

No, please, read what I told you carefully.

...

> --- a/arch/x86/include/asm/word-at-a-time.h
> +++ b/arch/x86/include/asm/word-at-a-time.h
> @@ -3,6 +3,7 @@
> #define _ASM_WORD_AT_A_TIME_H
>
> #include <linux/kernel.h>

No, the macros used in this file doesn't require (after your patch)
the kernel.h, these are in particular in asm/asm.h and
asm/extable_fixup_types.h. I haven't looked at this closely to find a
common header that kinda guarantees those two to be included.
Otherwise we can use them directly.

> +#include <linux/wordpart.h>

...

> #include <linux/init.h>
> #include <linux/export.h>
> -#include <linux/kernel.h>
> +#include <linux/wordpart.h>

Try to keep it more or less ordered. At list with given context it
goes either after the slab.h or below (after those starting with 'f').

> #include <linux/slab.h>
> #include <linux/fs.h>
> #include <linux/filelock.h>

...

> --- a/include/asm-generic/word-at-a-time.h
> +++ b/include/asm-generic/word-at-a-time.h
> @@ -2,7 +2,7 @@
> #ifndef _ASM_WORD_AT_A_TIME_H
> #define _ASM_WORD_AT_A_TIME_H
>
> -#include <linux/kernel.h>
> +#include <linux/wordpart.h>
> #include <asm/byteorder.h>

Same as in the above remark, read what I said in the previous round of reviews.

--
With Best Regards,
Andy Shevchenko

2023-12-19 23:01:36

by Nick Desaulniers

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

On Tue, Dec 19, 2023 at 11:10 AM Greg KH <[email protected]> wrote:
>
> > > Legal note, this file is NOT copyright Google as no Google employe
> > > actually wrote the logcal contents of it.
> > >
> > > Please be VERY careful when doing stuff like this, it has potentially
> > > big repercussions, and you don't want to have to talk to lots of
> > > lawyers a few years from now and explain how you messed it all up :(
> > >
> > > Nick, odds are there's a Google copyright class that Tanzir should take
> > > here, if not, I recommend the free LF one that anyone can take online
> > > that explains the issues here:
> > > https://training.linuxfoundation.org/training/open-source-licensing-basics-for-software-developers/
>
> Please take the time to either learn what the Google-specific rules are,
> or take the above training, before submitting a new version of the
> patch.

It was my mistake to suggest to Tanzir to add his copyright to this
newly created header. I'm sorry; we do have such resources available
and I should have reviewed them.

I've:
1. reviewed our internal training materials on copyright assignment
- go/gti-os-self-study
- go/patching#license-headers-and-copyright-notices
2. reviewed kernel docs:
- Documentation/process/1.Intro.rst
- Documentation/process/kernel-enforcement-statement.rst
3. asked Tanzir to do the same
4. discovered who to ask internally for further questions
<[email protected]>

Is there further due diligence you would like to see?

---

For Google specific guidance, I'll quote what they have:

> License Headers and Copyright Notices
> Googlers should add Google's copyright notice (or a "The Project Authors" style copyright notice) to new files being added to the library if permitted by the project maintainers.

Then the relevant section of 1.Intro.rst:

> Copyright assignments are not required (or requested) for code contributed
> to the kernel.

Shall I interpret those together to mean that the "project
maintainers" don't permit copyright assignments for "new files being
added," and thus Tanzir SHOULD NOT be adding a copyright assignment to
the newly created header?

Or shall I leave the interpretation up to an explicit discussion with
[email protected]?

---

While I think we have the answer for Tanzir's patch, I don't think we
do for if we intend to split other header files in the future if those
have explicit copyright assignments. I wonder if this question has
come up in Ingo's header refactoring work, and if so, what the
guidance is there?

For example, consider include/linux/sysfs.h. It's 600+ lines long and
contains 4 copyright assignments explicitly in sources. If we split
that header file in half, which copyright assignments do we transfer
to the new half, if any?
--
Thanks,
~Nick Desaulniers

2023-12-20 06:06:16

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] kernel.h: removed REPEAT_BYTE from kernel.h

On Tue, Dec 19, 2023 at 03:00:22PM -0800, Nick Desaulniers wrote:
> ---
>
> For Google specific guidance, I'll quote what they have:
>
> > License Headers and Copyright Notices
> > Googlers should add Google's copyright notice (or a "The Project Authors" style copyright notice) to new files being added to the library if permitted by the project maintainers.
>
> Then the relevant section of 1.Intro.rst:
>
> > Copyright assignments are not required (or requested) for code contributed
> > to the kernel.
>
> Shall I interpret those together to mean that the "project
> maintainers" don't permit copyright assignments for "new files being
> added," and thus Tanzir SHOULD NOT be adding a copyright assignment to
> the newly created header?

You can add a copyright header, as long as it is the CORRECT copyright
header.

Look at what this patch did, it attempted to claim that Google now owned
the copyright on the whole file, when in fact, that is obviously not the
case as a Google employee did not write the actual code that was added
to that file.

> Or shall I leave the interpretation up to an explicit discussion with
> [email protected]?

I think you should talk to them and get their clarification as to when
copyright headers should be added, AND what they should contain when
moving code around from other copyrighted files.

> For example, consider include/linux/sysfs.h. It's 600+ lines long and
> contains 4 copyright assignments explicitly in sources. If we split
> that header file in half, which copyright assignments do we transfer
> to the new half, if any?

That's up to you to figure out, I'm not the one doing the work :)

Perhaps run it by your corporate lawyers to ensure that you get it
correct with what they think is right first, if you have any questions
about what to do here, as in the end, they are the ones that will care
the most, right?

good luck!

greg k-h