2022-01-08 16:26:53

by Ingo Molnar

Subject: [ANNOUNCE] "Fast Kernel Headers" Tree -v2


I'm pleased to announce -v2 of the "Fast Kernel Headers" tree, which is a
comprehensive rework of the Linux kernel's header hierarchy & header
dependencies, with the dual goals of:

- speeding up the kernel build (both absolute and incremental build times)

- decoupling subsystem type & API definitions from each other

The fast-headers tree consists of over 25 sub-trees internally, spanning
over 2,300 commits, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master

# HEAD: 391ce485ced0 headers/deps: Introduce the CONFIG_FAST_HEADERS=y config option

Changes in -v2:

- Port to v5.16-rc8

- Clang/LLVM support (with the help of Nathan Chancellor):

On my 'reference distro config' the build speedup under Clang is around +88%
in elapsed time and +77% in CPU time used:

#
# v5.16-rc8
#
Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

18,490,451.51 msec cpu-clock # 54.740 CPUs utilized ( +- 0.04% )

337.788 +- 0.834 seconds time elapsed ( +- 0.25% )

#
# -fast-headers-v2
#
Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

10,443,670.86 msec cpu-clock # 58.093 CPUs utilized ( +- 0.00% )

179.773 +- 0.829 seconds time elapsed ( +- 0.46% )

- Unify the duplicated 'struct task_struct_per_task' into a single definition,
which should address the definition ugliness reported by Greg Kroah-Hartman.

- Fix bugs reported by Nathan Chancellor:

- cacheline attribute definition bug
- build bug with GCC plugins
- fix off-tree build

- Header optimizations that speed up the RDMA (infiniband) subsystem build
by about +9% over -v1 and +41% over the vanilla kernel:

$ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "find . -name '*.o' | xargs rm" m-rdma >/dev/null
...

# v5.16-rc8:

643,570.38 msec cpu-clock # 52.253 CPUs utilized ( +- 0.06% )

12.316 +- 0.183 seconds time elapsed ( +- 1.49% )

# -fast-headers-v1:
446,243.49 msec cpu-clock # 47.106 CPUs utilized ( +- 0.06% )

9.4731 +- 0.0666 seconds time elapsed ( +- 0.70% )

# -fast-headers-v2:
400,650.32 msec cpu-clock # 45.888 CPUs utilized ( +- 0.02% )

8.7310 +- 0.0162 seconds time elapsed ( +- 0.19% )

- Another round of <linux/sched.h> header footprint reductions: the
header is now used in only ~36% of .c files, down from 99% in the
mainline kernel and 68% in -v1.

- Various bisectability improvements & other fixes & optimizations.
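
For readers who want to re-derive the percentages quoted above: they follow from the ratio of the old measurement to the new one. A minimal sketch in C, assuming that definition of "speedup"; the helper names are made up for this illustration and the inputs are the perf stat values listed in this mail:

```c
#include <stdio.h>

/*
 * Hypothetical helper: speedup of 'after' relative to 'before', in percent.
 * A result of +88 means the old build took 1.88x as long as the new one.
 */
static double speedup_percent(double before, double after)
{
	return (before / after - 1.0) * 100.0;
}

/* Prints the speedups derived from the measurements quoted in this mail. */
static void print_speedups(void)
{
	printf("Clang elapsed:   %+.1f%%\n", speedup_percent(337.788, 179.773));
	printf("Clang CPU time:  %+.1f%%\n", speedup_percent(18490451.51, 10443670.86));
	printf("RDMA vs vanilla: %+.1f%%\n", speedup_percent(12.316, 8.7310));
	printf("RDMA vs -v1:     %+.1f%%\n", speedup_percent(9.4731, 8.7310));
}
```

The ratio is taken as before/after, so a halved build time shows up as +100%, not as -50%.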

Thanks,

Ingo


2022-01-10 22:04:18

by Arnd Bergmann

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2

On Sat, Jan 8, 2022 at 5:26 PM Ingo Molnar wrote:
>
>
> I'm pleased to announce -v2 of the "Fast Kernel Headers" tree, which is a
> comprehensive rework of the Linux kernel's header hierarchy & header
> dependencies, with the dual goals of:
>
> - speeding up the kernel build (both absolute and incremental build times)
>
> - decoupling subsystem type & API definitions from each other
>
> The fast-headers tree consists of over 25 sub-trees internally, spanning
> over 2,300 commits, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
>
> # HEAD: 391ce485ced0 headers/deps: Introduce the CONFIG_FAST_HEADERS=y config option

I've started reading through it at last. I can't say that I'm reviewing
every patch, but at least (almost all) the things I've looked at so far
seem really nice to me. Mostly this is the same as what I was planning to
do as well; some things I would have done differently, but I'm not
complaining as you did the work, and some things seem unnecessary but
might not be.

I've started building randconfig kernels for arm64 and x86, and fixing
up things that come up. A few things I have noticed so far:

* 2e98ec93d465 ("headers/prep: Rename constants: SOCK_DESTROY =>
SOCK_DIAG_SOCK_DESTROY")

This one looks wrong, as you are changing a uapi header, possibly
breaking applications at compile time. I think the other one should be
renamed instead.

* 04293522a8cb ("headers/deps: ipc/shm: Move the 'struct shmid_ds'
definition to ipc/shm.c") and related patches

Similarly, the IPC structures are uapi headers that I would not change
here for the same reasons. Even if nothing uses those any more with
modern libc implementations, the structures belong in uapi, unless we
can prove that the old-style sysvipc interface is completely unused and
we remove the implementation from the kernel as well (I don't think we
want that, but I have not looked in depth at when it was last used by a
libc).

* changing any include/uapi headers to use "#include <uapi/linux/*.h>"
is broken because that makes the headers unusable from userspace,
including any of tools/*/. I think we can work around this in the
headers_install.sh postprocessing step though, where we already do
unifdef etc.

* For all the header additions to .c files, I assume you are using a
set of scripts, so these could probably be changed without much trouble.
I would suggest applying them in sequence so the headers remain sorted
alphabetically in the end. It would probably make sense to squash those
all together to avoid patching certain files many times over, for the
sake of keeping a slightly saner git history.

* The per-task stuff sounded a bit scary from your descriptions but
looking at the actual implementation I now get it, this looks like a
really nice way of doing it.

* I think it would be good to keep the include/linux/syscalls_api.h
declarations in the same header as the SYSCALL_DEFINE*() macros, to
ensure that the prototypes remain synchronized. Splitting them out will
likely also cause sparse warnings for missing prototypes (or maybe it
should but doesn't at the moment).

* include/linux/time64_types.h is not a good name, as these are now the
default types after we removed the time32 versions. I'd either rename it
to linux/time_types.h or split it up between linux/types.h and
linux/ktime_types.h

* arm64 needs a couple of minor fixups, see https://pastebin.com/eSKhz4CL
for what I have so far, feel free to integrate any things that directly
make sense.

Arnd

2022-01-11 16:23:35

by Arnd Bergmann

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2

On Mon, Jan 10, 2022 at 11:03 PM Arnd Bergmann wrote:
> On Sat, Jan 8, 2022 at 5:26 PM Ingo Molnar wrote:
> I've started building randconfig kernels for arm64 and x86, and fixing
> up things that come up, a few things I have noticed out so far:

I have run into a couple more specific issues:

* net/smc/smc_ib.c:824:26: error: implicit declaration of function
'cache_line_size' [-Werror=implicit-function-declaration]
cache_line_size is generally provided by linux/cache.h, which includes
asm/cache.h. This works on arm64, but not on x86, where asm/cache.h
would have to include asm/cpufeature.h, but it would be good to avoid
that because of the implicit linux/percpu.h and linux/bitops.h
inclusions. Also, if I add the include, I get this build failure
instead: include/linux/smp_types.h:88:33: error: requested alignment
'20' is not a positive power of 2

* arm64 has a couple of issues around asm/memory.h, linux/mm_types.h and
asm/page.h that can cause loops. I think my latest version has it
figured out, but there is probably room for optimization.

* There is no general way to get the get_order() definition, other than
including asm/page.h from .c files. On arm64, this shows up in a couple
of files after the cleanup. Only xtensa and ia64 define their own
version of get_order(), and I think we should just remove those and move
the generic version to linux/getorder.h, where any file using it can
pick it up. For randconfig builds, I had to add asm/page.h to
net/xdp/xsk_queue.c, mm/memtest.c and
drivers/target/iscsi/iscsi_target_nego.c, after I removed the indirect
include from arch/arm64/include/asm/mmu.h in the previous step.
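
To make the get_order() point concrete, here is a userspace sketch of what the helper computes: the smallest order such that (page size << order) covers the requested size. This is an illustration under stated assumptions, not the kernel's implementation; the name get_order_sketch() and the 4 KiB page size are made up for the example.

```c
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. The real PAGE_SHIFT is per-arch. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Userspace illustration of what get_order() computes: the smallest
 * 'order' such that (page size << order) >= size. 'size' must be non-zero.
 */
static unsigned int get_order_sketch(size_t size)
{
	size_t pages = (size - 1) >> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	/*
	 * Count the bits needed to represent 'pages'. This loop stands in
	 * for a find-last-set style bit operation.
	 */
	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}
```

With 4 KiB pages, sizes of up to 4096 bytes give order 0, 4097 to 8192 bytes give order 1, and 8193 bytes already needs order 2 (four pages).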

Arnd

2022-01-11 17:08:43

by David Laight

Subject: RE: [ANNOUNCE] "Fast Kernel Headers" Tree -v2

Related:

If I redefine _IOW I get this splat:
In file included from ../include/asm-generic/ioctl.h:5:0,
from ./arch/x86/include/generated/uapi/asm/ioctl.h:1,
from ../include/uapi/linux/ioctl.h:5,
from ../include/uapi/linux/apm_bios.h:133,
from ../include/linux/apm_bios.h:9,
from ../arch/x86/include/uapi/asm/bootparam.h:44,
from ../arch/x86/include/asm/mem_encrypt.h:18,
from ../include/linux/mem_encrypt.h:17,
from ../arch/x86/include/asm/page_types.h:7,
from ../arch/x86/include/asm/page.h:9,
from ../arch/x86/include/asm/thread_info.h:12,
from ../include/linux/thread_info.h:60,
from ../arch/x86/include/asm/preempt.h:7,
from ../include/linux/preempt.h:78,
from ../include/linux/smp.h:110,
from ../include/linux/lockdep.h:14,
from ../include/linux/mutex.h:17,
from ../include/linux/kernfs.h:12,
from ../include/linux/sysfs.h:16,
from ../include/linux/kobject.h:20,
from ../include/linux/cdev.h:5,

I can't help feeling that include chain is sub-optimal.

David


2022-01-13 08:27:41

by Ingo Molnar

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2


* Arnd Bergmann wrote:

> On Mon, Jan 10, 2022 at 11:03 PM Arnd Bergmann wrote:
> > On Sat, Jan 8, 2022 at 5:26 PM Ingo Molnar wrote:
> > I've started building randconfig kernels for arm64 and x86, and fixing
> > up things that come up, a few things I have noticed out so far:
>
> I have run into a couple more specific issues:
>
> * net/smc/smc_ib.c:824:26: error: implicit declaration of function
> 'cache_line_size' [-Werror=implicit-function-declaration]
> cache_line_size is generally provided by linux/cache.h, which includes
> asm/cache.h.
> This works on arm64, but not on x86, where asm/cache.h would have to include
> asm/cpufeature.h, but it would be good to avoid that because of the implicit
> linux/percpu.h and linux/bitops.h inclusions. Also, if I add the
> include, I get this
> build failure instead: include/linux/smp_types.h:88:33: error:
> requested alignment '20'
> is not a positive power of 2

Note that this particular one should be fixed in the WIP branch, which is
at:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers
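
For context on the second error message quoted above: the compiler only accepts alignment values that are powers of two, so a constant that ends up expanding to 20 is rejected. A small sketch of the rule, using made-up names and the GCC/Clang aligned attribute:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if n is a non-zero power of two (exactly one bit set). */
static bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Valid: 64 is a power of two, so the aligned attribute is accepted. */
struct sketch_aligned {
	int counter;
} __attribute__((aligned(64)));
```

Replacing 64 with 20 in the attribute would trigger the same class of "requested alignment" diagnostic.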

> * arm64 has a couple of issues around asm/memory.h, linux/mm_types.h and
> asm/page.h that can cause loops. I think my latest version has it figured
> out, but there is probably room for optimization.

Yeah, this is like the 5th attempt at finding a robust solution. :-/

> * There is no general way to get the get_order() definition, other than
> including asm/page.h from .c files. On arm64, this shows up in a couple
> of files after the cleanup. Only xtensa and ia64 define their own version
> of get_order(), and I think we should just remove those and move the
> generic version to linux/getorder.h, where any file using it can pick it
> up. For randconfig builds, I had to add asm/page.h to
> net/xdp/xsk_queue.c, mm/memtest.c and
> drivers/target/iscsi/iscsi_target_nego.c, after I removed the indirect
> include from arch/arm64/include/asm/mmu.h in the previous step.

Would including <linux/mm_page_address.h> be sufficient? That already has
an <asm/page.h> inclusion and is vaguely related.

I tried to avoid as many low level headers as possible from the main types
headers - and the get_order() functionality also brings in bitops
definitions, which I'm still hoping to be able to reduce from its current
~95% utilization in a distro kernel ...

We could add <linux/page_api.h> as well, as a standardized header. We
already have page_types.h and get_order() is a page types API.

Thanks,

Ingo

2022-01-13 08:58:29

by Ingo Molnar

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2


* Arnd Bergmann wrote:

> On Sat, Jan 8, 2022 at 5:26 PM Ingo Molnar wrote:
> >
> >
> > I'm pleased to announce -v2 of the "Fast Kernel Headers" tree, which is a
> > comprehensive rework of the Linux kernel's header hierarchy & header
> > dependencies, with the dual goals of:
> >
> > - speeding up the kernel build (both absolute and incremental build times)
> >
> > - decoupling subsystem type & API definitions from each other
> >
> > The fast-headers tree consists of over 25 sub-trees internally, spanning
> > over 2,300 commits, which can be found at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
> >
> > # HEAD: 391ce485ced0 headers/deps: Introduce the CONFIG_FAST_HEADERS=y config option
>
> I've started reading through it at last. I can't say that I'm reviewing
> every patch, [...]

Yeah, not expected at all with 2,300+ commits - but even cursory review or
review of specific areas is appreciated.

> but at least (almost all) the things I've looked at so far all seem
> really nice to me, mostly this is the same that I was planning to do as

Thanks!

> well, some things I would have done differently but I'm not complaining
> as you did the work, and some things seem unnecessary but might not be.
>
> I've started building randconfig kernels for arm64 and x86, and fixing
> up things that come up,
> a few things I have noticed out so far:
>
> * 2e98ec93d465 ("headers/prep: Rename constants: SOCK_DESTROY =>
> SOCK_DIAG_SOCK_DESTROY")
>
> This one looks wrong, as you are changing a uapi header, possibly
> breaking applications at compile time. I think the other one should be
> renamed instead.

This is hard: SOCK_DESTROY is one of the main constants for sockets, is
well named, fits into an existing in-kernel nomenclature and both me and
networking folks would hate to rename it ...

So I'd keep this one and wait for any reported breakage. I don't think we
*guarantee* the specific naming of symbols - we guarantee an ABI, and make
a best-effort for the rest. The constant is netdiag specific and doesn't
seem to be included by any major user-space header in /usr/include.

> * 04293522a8cb ("headers/deps: ipc/shm: Move the 'struct shmid_ds'
> definition to ipc/shm.c")
> and related patches
>
> Similarly, the IPC structures are uapi headers that I would not
> change here for the same reasons.
> Even if nothing uses those any more with modern libc
> implementations, the structures belong into
> uapi, unless we can prove that the old-style sysvipc interface is
> completely unused and we
> remove the implementation from the kernel as well (I don't think we
> want that, but I have not
> looked in depth at when it was last used by a libc)

Ok, we can certainly undo this one - but how does it work in practice, as
the structure is already defined by libc:

/usr/include/x86_64-linux-gnu/bits/types/struct_shmid_ds.h:struct shmid_ds

/* Data structure describing a shared memory segment. */
struct shmid_ds
{
struct ipc_perm shm_perm; /* operation permission struct */
size_t shm_segsz; /* size of segment in bytes */
#if __TIMESIZE == 32
__time_t shm_atime; /* time of last shmat() */
unsigned long int __shm_atime_high;
__time_t shm_dtime; /* time of last shmdt() */
unsigned long int __shm_dtime_high;
__time_t shm_ctime; /* time of last change by shmctl() */
unsigned long int __shm_ctime_high;
#else
__time_t shm_atime; /* time of last shmat() */
__time_t shm_dtime; /* time of last shmdt() */
__time_t shm_ctime; /* time of last change by shmctl() */
#endif
__pid_t shm_cpid; /* pid of creator */
__pid_t shm_lpid; /* pid of last shmop */
shmatt_t shm_nattch; /* number of current attaches */
__syscall_ulong_t __glibc_reserved5;
__syscall_ulong_t __glibc_reserved6;
};


Wouldn't this definition conflict with any header use of linux/shm.h?

> * changing any include/uapi headers to use "#include <uapi/linux/*.h>"
> is broken because
> that makes the headers unusable from userspace, including any of
> tools/*/. I think we
> can work around this in the headers_install.sh postprocessing step
> though, where we already
> do unifdef etc.

Yeah, so the problem here is on the kernel side, the following innocent
looking include in a UAPI header:

#include <linux/foo.h>

will turn into a very large header - and unintentionally so - if there
happens to be an include/linux/foo.h header.

I.e. normally there's only the UAPI header, which is then included, but in
some significant cases that's not so.

IMO it seems so much cleaner to express the intent to only include the UAPI
header - so solving this at header-install time would be preferable.
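
The install-time rewrite discussed here amounts to a simple string transformation on include lines. The sketch below shows only that transformation, in C; the helper name is hypothetical, and the real postprocessing lives in the headers_install.sh shell script, whose actual contents are not reproduced here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn '#include <uapi/linux/foo.h>' into
 * '#include <linux/foo.h>' for the installed copy of a header.
 * Lines that do not start with the uapi prefix are copied unchanged.
 */
static void strip_uapi_prefix(const char *line, char *out, size_t outsz)
{
	static const char from[] = "#include <uapi/";
	static const char to[]   = "#include <";

	if (strncmp(line, from, sizeof(from) - 1) == 0)
		snprintf(out, outsz, "%s%s", to, line + sizeof(from) - 1);
	else
		snprintf(out, outsz, "%s", line);
}
```

In-tree, the header would keep the explicit uapi/ path (so the kernel build never picks up the larger non-UAPI header), while userspace would see the plain path.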

> * For all the header additions to .c files, I assume you are using a set
> of scripts, so these could
> probably be changed without much trouble. I would suggest applying
> them in sequence so
> the headers remain sorted alphabetically in the end. It would
> probably make sense to
> squash those all together to avoid patching certain files many
> times over, for the sake
> of keeping a slightly saner git history.

Are you suggesting to change the current reverse-alphabetical order of
headers:

--- a/drivers/auxdisplay/ht16k33.c
+++ b/drivers/auxdisplay/ht16k33.c
@@ -8,6 +8,15 @@
* Copyright (C) 2021 Glider bv
*/

+#include <linux/workqueue_api.h>
+#include <linux/wait_api.h>
+#include <linux/sched.h>
+#include <linux/pgtable_api.h>
+#include <linux/of_api.h>
+#include <linux/mm_api.h>
+#include <linux/jiffies.h>
+#include <linux/gfp_api.h>
+#include <linux/device_api_lock.h>

... to alphabetical?

Can certainly do that - but this will flip around the commit order: it's
much easier to add at the head of the include files section.

> * The per-task stuff sounded a bit scary from your descriptions but
> looking at the actual
> implementation I now get it, this looks like a really nice way of doing it.

Thank you!

> * I think it would be good to keep the include/linux/syscalls_api.h declarations
> in the same header as the SYSCALL_DEFINE*() macros, to ensure that the
> prototypes remain synchronized. Splitting them out will likely also
> cause sparse
> warnings for missing prototypes (or maybe it should but doesn't at
> the moment).

Yeah, I suppose we could undo the split:

# -fast-headers-v2:

                                   _______________________
                                  | stripped lines of code
                                  |              _____________________________
                                  |             | headers included recursively
                                  |             |               _______________________________
                                  |             |              | usage in a distro kernel build
 ____________                     |             |              |      _________________________________________
| header name                     |             |              |     | million lines of comment-stripped C code
|                                 |             |              |     |
#include <linux/syscalls_types.h> | LOC:  2,397 | headers: 128 | 353 | MLOC:  0.8 |
#include <linux/syscalls_api.h>   | LOC: 12,842 | headers: 361 | 167 | MLOC:  2.1 |

The full header used to be a lot bigger:

# v5.16-rc8

#include <linux/syscalls.h>       | LOC: 40,217 | headers: 604 | 321 | MLOC: 12.9 | ###


> * include/linux/time64_types.h is not a good name, as these are now
> the default types
> after we removed the time32 versions. I'd either rename it to
> linux/time_types.h
> or split it up between linux/types.h and linux/ktime_types.h

I was doing this in the context of v5.16.

> * arm64 needs a couple of minor fixups, see
> https://pastebin.com/eSKhz4CL for what
> I have so far, feel free to integrate any things that directly make sense.

Thanks! Mind sending the dependency removals in a series, so I can keep
attribution? These bits are usually in separate commits, unless they fix
some bug in an existing commit:

arch/arm64/include/asm/mmu.h | 1 -
arch/arm64/include/asm/pgtable-hwdef.h | 1 -
arch/arm64/include/asm/pgtable-prot.h | 1 -

This bit looks suboptimal - but as you mentioned it fixes tooling build
errors:

--- a/include/uapi/linux/netdevice.h
+++ b/include/uapi/linux/netdevice.h
@@ -26,11 +26,17 @@
#ifndef _UAPI_LINUX_NETDEVICE_H
#define _UAPI_LINUX_NETDEVICE_H

+#ifdef __KERNEL__
#include <uapi/linux/if.h>
#include <uapi/linux/if_ether.h>
#include <uapi/linux/if_packet.h>
#include <uapi/linux/if_link.h>
-
+#else
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/if_link.h>
+#endif

I'll apply & backmerge the per_task() fixlets for the bugs you found:

- per_task(&init_task, kcsan_ctx).scoped_addresses.next = LIST_POISON1;
+ per_task(&init_task, kcsan_ctx).scoped_accesses.next = LIST_POISON1;

- return ptr == &current->flags;
+ return ptr == &task_flags(current);
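
As a rough mental model only: an accessor macro of this kind lets callers name a per-task field without spelling out where it is stored, which is why the second fixlet goes through task_flags(current) instead of current->flags. The sketch below uses invented names throughout and is not the tree's actual per_task() machinery.

```c
/* All names below are invented for this sketch. */
struct sketch_per_task_fields {
	unsigned long flags;
	int kcsan_depth;
};

struct sketch_task {
	int pid;
	struct sketch_per_task_fields per_task_fields;
};

/* Accessor macro: callers name the field, not where it is stored. */
#define sketch_per_task(task, field) ((task)->per_task_fields.field)

/* Thin wrapper in the style of task_flags(current) from the diff above. */
#define sketch_task_flags(task) sketch_per_task(task, flags)

static int points_at_task_flags(struct sketch_task *task, unsigned long *ptr)
{
	/* The macro expands to an lvalue, so taking its address works. */
	return ptr == &sketch_task_flags(task);
}
```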

This KASAN bit:

--- a/include/linux/mm_api_kasan.h
+++ b/include/linux/mm_api_kasan.h
@@ -18,7 +18,7 @@ static inline u8 page_kasan_tag(const struct page *page)
{
u8 tag = 0xff;

- if (kasan_enabled()) {
+ if (IS_ENABLED(CONFIG_KASAN)) {
tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;
tag ^= 0xff;
}

Is this a fix for some build failure? Upstream it's using kasan_enabled():

static inline u8 page_kasan_tag(const struct page *page)
{
u8 tag = 0xff;

if (kasan_enabled()) {
tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;
tag ^= 0xff;
}

return tag;
}
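
For readers unfamiliar with the tag math in that function, here is a userspace sketch of the same bit manipulation. The shift and mask values are assumptions chosen for illustration (the real ones depend on the kernel configuration), and the enable check is reduced to a plain parameter:

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones depend on the kernel config. */
#define SKETCH_TAG_PGSHIFT 56
#define SKETCH_TAG_MASK    0xffUL

static uint8_t sketch_page_tag(uint64_t flags, int kasan_on)
{
	uint8_t tag = 0xff;

	if (kasan_on) {
		tag = (flags >> SKETCH_TAG_PGSHIFT) & SKETCH_TAG_MASK;
		tag ^= 0xff;	/* stored inverted: zeroed flags mean tag 0xff */
	}
	return tag;
}
```

Either way the disabled case returns 0xff; the two variants of the check only differ in how "enabled" is decided.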

Ingo

2022-01-13 09:20:53

by Arnd Bergmann

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2

On Thu, Jan 13, 2022 at 9:27 AM Ingo Molnar wrote:
> * Arnd Bergmann wrote:
> > On Mon, Jan 10, 2022 at 11:03 PM Arnd Bergmann wrote:
> > > On Sat, Jan 8, 2022 at 5:26 PM Ingo Molnar wrote:
> > > I've started building randconfig kernels for arm64 and x86, and fixing
> > > up things that come up, a few things I have noticed out so far:
> >
> > I have run into a couple more specific issues:
> >
> > * net/smc/smc_ib.c:824:26: error: implicit declaration of function
> > 'cache_line_size' [-Werror=implicit-function-declaration]
> > cache_line_size is generally provided by linux/cache.h, which includes
> > asm/cache.h.
> > This works on arm64, but not on x86, where asm/cache.h would have to include
> > asm/cpufeature.h, but it would be good to avoid that because of the implicit
> > linux/percpu.h and linux/bitops.h inclusions. Also, if I add the
> > include, I get this
> > build failure instead: include/linux/smp_types.h:88:33: error:
> > requested alignment '20'
> > is not a positive power of 2
>
> Note that this particular one should be fixed in the WIP branch, which is
> at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git sched/headers

Ok.

> > * arm64 has a couple of issues around asm/memory.h, linux/mm_types.h and
> > asm/page.h that can cause loops. I think my latest version has it figured
> > out, but there is probably room for optimization.
>
> Yeah, this is like the 5th attempt at finding a robust solution. :-/

It will likely come back in another form when more architectures get
converted then. I'm currently looking at reviving my own metrics scripts
from 2020 to see if I can improve arch/arm64 further, after that I was
planning to look at arch/arm/

> > * There is no general way to get the get_order() definition, other than
> > including asm/page.h from .c files. On arm64, this shows up in a couple
> > of files after the cleanup. Only xtensa and ia64 define their own version
> > of get_order(), and I think we should just remove those and move the
> > generic version to linux/getorder.h, where any file using it can pick it
> > up. For randconfig builds, I had to add asm/page.h to
> > net/xdp/xsk_queue.c, mm/memtest.c and
> > drivers/target/iscsi/iscsi_target_nego.c, after I removed the indirect
> > include from arch/arm64/include/asm/mmu.h in the previous step.
>
> Would including <linux/mm_page_address.h> be sufficient? That already has
> an <asm/page.h> inclusion and is vaguely related.

Sure, works for me.

> I tried to avoid as many low level headers as possible from the main types
> headers - and the get_order() functionality also brings in bitops
> definitions, which I'm still hoping to be able to reduce from its current
> ~95% utilization in a distro kernel ...

Agreed, I think reducing bitops.h and atomic.h usage is fairly important,
I think these are even bigger on arm64 than on x86.

> We could add <linux/page_api.h> as well, as a standardized header. We
> already have page_types.h and get_order() is a page types API.

More generally speaking, do you have a plan for how to document which
header to include for getting a particular symbol that is provided by
a header we don't want to include directly? I think iwyu has a particular
notation for it, but when I looked at using that in 2020 I decided it wouldn't
scale to the size of the kernel. I did my own shell script with a long
list of regex patterns, but I'm not convinced about that approach either.

Arnd

2022-01-13 10:17:15

by Arnd Bergmann

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2

On Thu, Jan 13, 2022 at 9:57 AM Ingo Molnar wrote:
> * Arnd Bergmann wrote:
> > On Sat, Jan 8, 2022 at 5:26 PM Ingo Molnar wrote:
> > well, some things I would have done differently but I'm not complaining
> > as you did the work, and some things seem unnecessary but might not be.
> >
> > I've started building randconfig kernels for arm64 and x86, and fixing
> > up things that come up,
> > a few things I have noticed out so far:
> >
> > * 2e98ec93d465 ("headers/prep: Rename constants: SOCK_DESTROY =>
> > SOCK_DIAG_SOCK_DESTROY")
> >
> > This one looks wrong, as you are changing a uapi header, possibly
> > breaking applications at compile time. I think the other one should be
> > renamed instead.
>
> This is hard: SOCK_DESTROY is one of the main constants for sockets, is
> well named, fits into an existing in-kernel nomenclature and both me and
> networking folks would hate to rename it ...
>
> So I'd keep this one and wait for any reported breakage. I don't think we
> *guarantee* the specific naming of symbols - we guarantee an ABI, and make
> a best-effort for the rest. The constant is netdiag specific and doesn't
> seem to be included by any major user-space header in /usr/include.

Ok.

https://codesearch.debian.net/search?q=%5CWSOCK_DESTROY%5CW&literal=0
finds only iproute2 and strace referencing this name, and they both provide
their own definitions, but it would be good to split out the rename from the
series and discuss it separately with the relevant maintainers.

> > * 04293522a8cb ("headers/deps: ipc/shm: Move the 'struct shmid_ds'
> > definition to ipc/shm.c")
> > and related patches
> >
> > Similarly, the IPC structures are uapi headers that I would not
> > change here for the same reasons.
> > Even if nothing uses those any more with modern libc
> > implementations, the structures belong into
> > uapi, unless we can prove that the old-style sysvipc interface is
> > completely unused and we
> > remove the implementation from the kernel as well (I don't think we
> > want that, but I have not
> > looked in depth at when it was last used by a libc)
>
> Ok, we can certainly undo this one - but how does it work in practice, as
> the structure is already defined by libc:
>
> /usr/include/x86_64-linux-gnu/bits/types/struct_shmid_ds.h:struct shmid_ds
>
> /* Data structure describing a shared memory segment. */
> struct shmid_ds
> {
> struct ipc_perm shm_perm; /* operation permission struct */
> size_t shm_segsz; /* size of segment in bytes */
> #if __TIMESIZE == 32
> __time_t shm_atime; /* time of last shmat() */
> unsigned long int __shm_atime_high;
> __time_t shm_dtime; /* time of last shmdt() */
> unsigned long int __shm_dtime_high;
> __time_t shm_ctime; /* time of last change by shmctl() */
> unsigned long int __shm_ctime_high;
> #else
> __time_t shm_atime; /* time of last shmat() */
> __time_t shm_dtime; /* time of last shmdt() */
> __time_t shm_ctime; /* time of last change by shmctl() */
> #endif
> __pid_t shm_cpid; /* pid of creator */
> __pid_t shm_lpid; /* pid of last shmop */
> shmatt_t shm_nattch; /* number of current attaches */
> __syscall_ulong_t __glibc_reserved5;
> __syscall_ulong_t __glibc_reserved6;
> };
>
>
> Wouldn't this definition conflict with any header use of linux/shm.h?

This is the glibc version of the modern structure that corresponds to
the kernel's shmid64_ds. Any user of the old interface would necessarily
be something other than glibc, and presumably something that works
with the syscall interface directly. The main examples I can think of would
be stuff like strace, qemu-user, or gdb for the purpose of interpreting
another task's syscalls. There is a good chance that any such tool
already has its own definition, but I have not done specific research here.

Another possibility would be any runtime environment (non-glibc C or
some other language) that wraps the syscall interface to applications.
Again, these /should/ not use the old sysvipc stuff, but that doesn't
mean that they don't just blindly wrap every single interface provided
by the kernel.

I checked uclibc-ng and found that it moved over from the legacy
interface to the modern interface in 2005, but it has always had its
own definition of the structures since ipc support was added in 2000.

> > * changing any include/uapi headers to use "#include <uapi/linux/*.h>"
> > is broken because
> > that makes the headers unusable from userspace, including any of
> > tools/*/. I think we
> > can work around this in the headers_install.sh postprocessing step
> > though, where we already
> > do unifdef etc.
>
> Yeah, so the problem here is on the kernel side, the following innocent
> looking include in a UAPI header:
>
> #include <linux/foo.h>
>
> will turn into a very large header - and unintentionally so - if there
> happens to be an include/linux/foo.h header.
>
> I.e. normally there's only the UAPI header, which is then included, but in
> some significant cases that's not so.
>
> IMO it seems so much cleaner to express the intent to only include the UAPI
> header - so solving this at header-install time would be preferable.

Right, let's do that then. Not sure when I'll get around to this, but it
shouldn't be hard to do, so feel free to do this first if you come across
it again. For the moment, I have my local workaround, but I agree that the
postprocessing would be better to do in your tree.

> > * For all the header additions to .c files, I assume you are using a set
> > of scripts, so these could
> > probably be changed without much trouble. I would suggest applying
> > them in sequence so
> > the headers remain sorted alphabetically in the end. It would
> > probably make sense to
> > squash those all together to avoid patching certain files many
> > times over, for the sake
> > of keeping a slightly saner git history.
>
> Are you suggesting to change the current reverse-alphabetical order of
> headers:
>
> --- a/drivers/auxdisplay/ht16k33.c
> +++ b/drivers/auxdisplay/ht16k33.c
> @@ -8,6 +8,15 @@
> * Copyright (C) 2021 Glider bv
> */
>
> +#include <linux/workqueue_api.h>
> +#include <linux/wait_api.h>
> +#include <linux/sched.h>
> +#include <linux/pgtable_api.h>
> +#include <linux/of_api.h>
> +#include <linux/mm_api.h>
> +#include <linux/jiffies.h>
> +#include <linux/gfp_api.h>
> +#include <linux/device_api_lock.h>
>
> ... to alphabetical?
>
> Can certainly do that - but this will flip around the commit order: it's
> much easier to add at the head of the include files section.

Yes, I think doing the alphabetical sorting is better, as that seems to be
the most common version. In my earlier approach I came up with a
pseudo-alphabetical sorting that would add the new #include statements
before the first existing #include that comes later in the alphabet. This
is slightly more fragile than just adding to the front (especially
when #include statements are inside of an #ifdef), but it's closer to
what a lot of maintainers prefer.
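
The pseudo-alphabetical rule described above can be sketched as a small search: find the first existing include that sorts after the new one and insert in front of it. This is only an illustration of the rule, with a made-up function name, and it ignores the #ifdef complications mentioned above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the index of the first existing include that sorts after
 * 'new_hdr', i.e. where the new #include would be inserted. Returns
 * 'n' (append at the end) if no existing entry sorts later.
 */
static size_t include_insert_index(const char *const *existing, size_t n,
				   const char *new_hdr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(existing[i], new_hdr) > 0)
			return i;
	}
	return n;
}
```

Because it stops at the first later entry, the result stays reasonable even when the existing list is not perfectly sorted, which is the "pseudo" part.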

> > * I think it would be good to keep the include/linux/syscalls_api.h declarations
> >   in the same header as the SYSCALL_DEFINE*() macros, to ensure that the
> >   prototypes remain synchronized. Splitting them out will likely also cause
> >   sparse warnings for missing prototypes (or maybe it should but doesn't at
> >   the moment).
>
> Yeah, I suppose we could undo the split:
>
> # -fast-headers-v2:
>
>                                      _______________________
>                                     | stripped lines of code
>                                     |              _____________________________
>                                     |             | headers included recursively
>                                     |             |               _______________________________
>                                     |             |              | usage in a distro kernel build
>  ____________                       |             |              |      _________________________________________
> | header name                       |             |              |     | million lines of comment-stripped C code
> |                                   |             |              |     |
> #include <linux/syscalls_types.h> | LOC:  2,397 | headers: 128 | 353 | MLOC:  0.8 |
> #include <linux/syscalls_api.h>   | LOC: 12,842 | headers: 361 | 167 | MLOC:  2.1 |
>
> The full header used to be a lot bigger:
>
> # v5.16-rc8
>
> #include <linux/syscalls.h>       | LOC: 40,217 | headers: 604 | 321 | MLOC: 12.9 | ###

I would actually hope to need almost no indirect includes for
linux/syscalls_api.h, aside from linux/types.h for a couple of typedefs and
linux/linkage.h for asmlinkage. Most of the remaining #includes appear to be
there only for structure definitions that can be converted into additional
forward declarations.

> > * arm64 needs a couple of minor fixups, see https://pastebin.com/eSKhz4CL
> >   for what I have so far, feel free to integrate any things that directly
> >   make sense.
>
> Thanks! Mind sending the dependency removals in a series, so I can keep
> attribution?

Ok, I'll have a look at rebasing and will split it up further. I have a
couple more fixes now, so I can build all randconfig kernels without
warnings on arm64 and x86_64.

> These bits are usually in separate commits, unless they fix
> some bug in an existing commit:
>
> arch/arm64/include/asm/mmu.h | 1 -
> arch/arm64/include/asm/pgtable-hwdef.h | 1 -
> arch/arm64/include/asm/pgtable-prot.h | 1 -
>
> This bit looks suboptimal - but as you mentioned it fixes tooling build
> errors:
>
> --- a/include/uapi/linux/netdevice.h
> +++ b/include/uapi/linux/netdevice.h
> @@ -26,11 +26,17 @@
> #ifndef _UAPI_LINUX_NETDEVICE_H
> #define _UAPI_LINUX_NETDEVICE_H
>
> +#ifdef __KERNEL__
> #include <uapi/linux/if.h>
> #include <uapi/linux/if_ether.h>
> #include <uapi/linux/if_packet.h>
> #include <uapi/linux/if_link.h>
> -
> +#else
> +#include <linux/if.h>
> +#include <linux/if_ether.h>
> +#include <linux/if_packet.h>
> +#include <linux/if_link.h>
> +#endif

Right, this was not meant to be added to your series, it was just the
quickest way to get randconfig kernels to build when they enable tools/*/.

> This KASAN bit:
>
> --- a/include/linux/mm_api_kasan.h
> +++ b/include/linux/mm_api_kasan.h
> @@ -18,7 +18,7 @@ static inline u8 page_kasan_tag(const struct page *page)
> {
> u8 tag = 0xff;
>
> - if (kasan_enabled()) {
> + if (IS_ENABLED(CONFIG_KASAN)) {
> tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;
> tag ^= 0xff;
> }
>
> Is this a fix for some build failure? Upstream it's using kasan_enabled():

Yes, I got a circular include chain between mm_api_kasan.h, mm_types.h
and linux/kasan.h in certain configurations. I'm guessing this happens
specifically for KASAN-enabled kernels, but it could be others as well. The
problem here is that include/linux/mm_api_kasan.h needs the 'struct page'
definition.
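
To illustrate what hunting for such a chain can look like, here is a small Python sketch that finds one cycle in an include graph with a depth-first search. The function name and the edge directions in the example graph are made up for illustration; the thread only names the three headers involved, not which one includes which.

```python
def find_include_cycle(includes):
    """Return one circular include chain as a list of headers, or None.

    `includes` maps each header to the headers it includes directly.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}
    stack = []

    def visit(header):
        state[header] = GREY
        stack.append(header)
        for dep in includes.get(header, ()):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: the loop is closed
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[header] = BLACK
        return None

    for header in includes:
        if state.get(header, WHITE) == WHITE:
            cycle = visit(header)
            if cycle:
                return cycle
    return None


# Hypothetical edges, chosen only to form a loop between the three headers.
graph = {
    "linux/mm_api_kasan.h": ["linux/mm_types.h"],
    "linux/mm_types.h": ["linux/kasan.h"],
    "linux/kasan.h": ["linux/mm_api_kasan.h"],
}
print(" -> ".join(find_include_cycle(graph)))
```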

Arnd

2022-01-21 19:13:12

by Ingo Molnar

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2


* Arnd Bergmann <[email protected]> wrote:

> > I tried to avoid as many low level headers as possible from the main
> > types headers - and the get_order() functionality also brings in bitops
> > definitions, which I'm still hoping to be able to reduce from its
> > current ~95% utilization in a distro kernel ...
>
> Agreed, I think reducing bitops.h and atomic.h usage is fairly important,
> I think these are even bigger on arm64 than on x86.

So what I'm using for 'header complexity metrics' is rather simple: passing
-P -H to the preprocessor: stripping comments & not generating
line-markers, and then counting linecount.

Line-markers should *probably* remain, because the real build is generating
them too - but I wanted to gain a crude & easily available metric to
measure 'first-pass parsing complexity'. That's I think where most of the
header bloat is concentrated: later passes don't really get any of the
unused header definitions passed along. (But maybe this is an invalid
assumption, because compiler warnings do get generated by later passes, and
they are generated for mostly-unused header inlines too.)

If we include comments & line-markers then the bloat goes up by another
~2x:

kepler:~/mingo.tip.git> ./st include/linux/sched.h
#include <linux/sched.h> | LOC: 2,186 | headers: 118
kepler:~/mingo.tip.git> ./st include/linux/sched.h
#include <linux/sched.h> | LOC: 4,092 | headers: 0
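
The ./st script itself is not shown in this thread. As a guess at a comparable measurement, the Python sketch below counts non-blank lines in preprocessor output and distinct headers in the -H listing (one line per opened header, prefixed with one dot per nesting level). Both helper names are made up, and whether ./st counts blank lines or repeated headers the same way is an assumption.

```python
import re


def stripped_loc(preprocessed):
    """Count non-blank lines in preprocessor output (for example from -E -P)."""
    return sum(1 for line in preprocessed.splitlines() if line.strip())


def headers_included(h_output):
    """Count distinct headers named in the preprocessor's -H listing."""
    seen = set()
    for line in h_output.splitlines():
        # header lines look like ".. path/to/header.h"
        m = re.match(r"^\.+ (\S+)$", line)
        if m:
            seen.add(m.group(1))
    return len(seen)


sample_h = ". include/linux/sched.h\n.. include/linux/types.h\n"
print(stripped_loc("int a;\n\nint b;\n"), headers_included(sample_h))
```

The inputs would come from running the compiler's preprocessor on a one-line file that includes the header under test, using the same include paths and defines as the real build.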


> > We could add <linux/page_api.h> as well, as a standardized header. We
> > already have page_types.h and get_order() is a page types API.
>
> More generally speaking, do you have a plan for how to document which
> header to include for getting a particular symbol that is provided by a
> header we don't want to include directly? I think iwyu has a particular
> notation for it, but when I looked at using that in 2020 I decided it
> wouldn't scale to the size of the kernel. I did my own shell script with
> a long list of regex patterns, but I'm not convinced about that approach
> either.

Yeah, I don't think we should do much that hurts general usability of
headers: each symbol has a primary "natural" header, and .c code and other
headers are encouraged but not strictly required to include that.

Thanks,

Ingo

2022-01-21 19:52:12

by Arnd Bergmann

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2

On Wed, Jan 19, 2022 at 1:31 PM Ingo Molnar <[email protected]> wrote:
>
> * Arnd Bergmann <[email protected]> wrote:
>
> > > I tried to avoid as many low level headers as possible from the main
> > > types headers - and the get_order() functionality also brings in bitops
> > > definitions, which I'm still hoping to be able to reduce from its
> > > current ~95% utilization in a distro kernel ...
> >
> > Agreed, I think reducing bitops.h and atomic.h usage is fairly important,
> > I think these are even bigger on arm64 than on x86.
>
> So what I'm using for 'header complexity metrics' is rather simple: passing
> -P -H to the preprocessor: stripping comments & not generating
> line-markers, and then counting linecount.
>
> Line-markers should *probably* remain, because the real build is generating
> them too - but I wanted to gain a crude & easily available metric to
> measure 'first-pass parsing complexity'. That's I think where most of the
> header bloat is concentrated: later passes don't really get any of the
> unused header definitions passed along. (But maybe this is an invalid
> assumption, because compiler warnings do get generated by later passes, and
> they are generated for mostly-unused header inlines too.)
>
> If we include comments & line-markers then the bloat goes up by another
> ~2x:
>
> kepler:~/mingo.tip.git> ./st include/linux/sched.h
> #include <linux/sched.h> | LOC: 2,186 | headers: 118
> kepler:~/mingo.tip.git> ./st include/linux/sched.h
> #include <linux/sched.h> | LOC: 4,092 | headers: 0

The metric I've been focusing on is bytes of the preprocessed header, which
is more sensitive to function definitions that get generated from macros,
and I multiply this by the number of inclusions (from scanning the
.file.o.cmd files). It probably helps to have a couple of metrics and look
at all of them occasionally to not miss something important.
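
For reference, a minimal Python sketch of that metric. It assumes the Kbuild .cmd layout in which a line starting with "deps_" opens a backslash-continued list with one entry per line, and it takes the preprocessed sizes as a ready-made dictionary; the function names are made up and this is not the script I actually use.

```python
from collections import Counter


def count_inclusions(cmd_texts):
    """Count, per header, how many .cmd dependency lists mention it.

    Assumed layout: a line starting with "deps_" opens a list with one
    entry per line, each line but the last ending in a backslash.
    """
    counts = Counter()
    for text in cmd_texts:
        in_deps = False
        for raw in text.splitlines():
            line = raw.strip()
            if line.startswith("deps_"):
                in_deps = line.endswith("\\")
                continue
            if not in_deps:
                continue
            entry = line.rstrip("\\").strip()
            # skip $(wildcard ...) config entries, keep plain header paths
            if entry.endswith(".h") and not entry.startswith("$("):
                counts[entry] += 1
            in_deps = line.endswith("\\")
    return counts


def weighted_cost(counts, preprocessed_bytes):
    """Multiply each header's preprocessed size by its inclusion count."""
    return {h: counts[h] * size
            for h, size in preprocessed_bytes.items() if h in counts}
```

Sorting the result of weighted_cost() by value gives a ranking of the headers that cost the most across the whole build under this metric.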

In the meantime, I have made some progress on reducing the headers
for arm64, on top of your tree from Jan 8, but I have not looked at
later changes from your side, and I need to work on this a bit more
to ensure this doesn't break other architectures.

For an arm64 allmodconfig build, my additional improvements on top
of yours are significant but not as good as I had hoped for, this
can still improve I hope:

5.16-rc8-vanilla    32640 seconds user, 3286 seconds sys
5.16-rc8-mingo      22990 seconds user, 2304 seconds sys
5.16-rc8-arnd       19007 seconds user, 1853 seconds sys

As my tree builds any randconfig cleanly, I keep looking at
different configs and find that this has a big impact, some options
end up eliminating most of the benefits until I add further changes
to clean up certain files. This happened with kasan, kprobes,
and lse-atomics for instance. After eliminating all circular includes,
I was also able to revisit my old script to visualize the inclusions,
see [1] for the current arm64 defconfig output. This version uses
my arbitrary metric as font-size, and uses labels for the number
of inclusions.

Arnd

[1] https://drive.google.com/file/d/1wbs252I8LyswscBAeV3SpjBG2AGoBnB8/view?usp=sharing

2022-01-23 14:56:35

by Ingo Molnar

Subject: Re: [ANNOUNCE] "Fast Kernel Headers" Tree -v2


* Arnd Bergmann <[email protected]> wrote:

> > If we include comments & line-markers then the bloat goes up by another
> > ~2x:
> >
> > kepler:~/mingo.tip.git> ./st include/linux/sched.h
> > #include <linux/sched.h> | LOC: 2,186 | headers: 118
> > kepler:~/mingo.tip.git> ./st include/linux/sched.h
> > #include <linux/sched.h> | LOC: 4,092 | headers: 0
>
> The metric I've been focusing on is bytes of the preprocessed header,
> which is more sensitive to function definitions that get generated from
> macros, and I multiply this by the number of inclusions (from scanning
> the .file.o.cmd files). It probably helps to have a couple of metrics and
> look at all of them occasionally to not miss something important.

Actual inclusions don't just depend on .file.o.cmd files though, that won't
catch indirect inclusions, right?

> In the meantime, I have made some progress on reducing the headers
> for arm64, on top of your tree from Jan 8, but I have not looked at
> later changes from your side, and I need to work on this a bit more
> to ensure this doesn't break other architectures.

Sure & great!

> For an arm64 allmodconfig build, my additional improvements on top
> of yours are significant but not as good as I had hoped for, this
> can still improve I hope:
>
> 5.16-rc8-vanilla 32640 seconds user, 3286 seconds sys
> 5.16-rc8-mingo 22990 seconds user, 2304 seconds sys
> 5.16-rc8-arnd 19007 seconds user, 1853 seconds sys

~71% build throughput speedup for allmodconfig is very impressive to me. :-)
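
For anyone checking the arithmetic: one reading that lands near this figure is the ratio of CPU seconds before and after, minus one. The Python snippet below applies it to the numbers quoted above; the helper name is made up, and using user time only versus user plus sys is a choice that shifts the result slightly.

```python
def speedup_pct(before, after):
    """Throughput gain in percent when the same work takes `after` instead of `before`."""
    return (before / after - 1.0) * 100.0


# (user, sys) CPU seconds quoted above for the arm64 allmodconfig build
vanilla = (32640, 3286)
mingo = (22990, 2304)
arnd = (19007, 1853)

print(round(speedup_pct(vanilla[0], arnd[0]), 1))      # user only, vanilla vs. arnd: 71.7
print(round(speedup_pct(sum(vanilla), sum(arnd)), 1))  # user + sys, vanilla vs. arnd: 72.2
print(round(speedup_pct(vanilla[0], mingo[0]), 1))     # user only, vanilla vs. mingo: 42.0
```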

> As my tree builds any randconfig cleanly, [...]

Yeah, same here - having a few thousand randconfig build tests is normal
for each version:

/* This file is auto generated, version 3288 */
#define UTS_MACHINE "x86_64"
#define UTS_VERSION "#3288 Fri Jan 14 18:20:14 CET 2022"

My testing is mostly concentrated on x86 - but I often test ARM64
randconfig as well.

> I keep looking at different configs and find that this has a big impact,
> some options end up eliminating most of the benefits until I add further
> changes to clean up certain files. This happened with kasan, kprobes, and
> lse-atomics for instance. After eliminating all circular includes, I was
> also able to revisit my old script to visualize the inclusions, see[1]
> for the current arm64 defconfig output. This version uses my arbitrary
> metric as font-size, and uses labels for the number of inclusions.

This is really nice!

I was concentrating on optimizing a generic distro config - which doesn't
include the tons of extreme instrumentation measures that allmodconfig
includes but production distro kernels rarely do.

allmodconfig definitely needs more work, but 71% is a pretty good starting
point ...

Feel free to send in patches, I can help with the testing too.

Thanks,

Ingo

2022-03-15 19:39:20

by Ingo Molnar

Subject: [TREE] "Fast Kernel Headers" Tree -v3


This is -v3 of the "Fast Kernel Headers" tree, which is an ongoing rework
of the Linux kernel's header hierarchy & header dependencies, with the dual
goals of:

- speeding up the kernel build (both absolute and incremental build times)

- decoupling subsystem type & API definitions from each other

The fast-headers tree consists of over 25 sub-trees internally, spanning
over 2,300 commits, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master

There's various changes in -v3, and it's now ported to the latest kernel
(v5.17-rc8).

Diffstat difference:

-v2: 25332 files changed, 178498 insertions(+), 74790 deletions(-)
-v3: 25513 files changed, 180947 insertions(+), 74572 deletions(-)

Thanks,

Ingo

2022-03-22 08:09:29

by Kari Argillander

Subject: Re: [TREE] "Fast Kernel Headers" Tree -v3

15.3.2022 12.35 Ingo Molnar ([email protected]) wrote:
>
> This is -v3 of the "Fast Kernel Headers" tree, which is an ongoing rework
> of the Linux kernel's header hierarchy & header dependencies, with the dual
> goals of:
>
> - speeding up the kernel build (both absolute and incremental build times)
>
> - decoupling subsystem type & API definitions from each other
>
> The fast-headers tree consists of over 25 sub-trees internally, spanning
> over 2,300 commits, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master

I have had problems building the master branch (defconfig) with gcc9:
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0

I also tested v2 and the problems were there too. I have no problems with
gcc10 or Clang11. The error I get is:

In file included from ./include/linux/rcuwait_api.h:7,
                 from ./include/linux/rcuwait.h:6,
                 from ./include/linux/irq_work.h:7,
                 from ./include/linux/perf_event_types.h:44,
                 from ./include/linux/perf_event_api.h:17,
                 from arch/x86/kernel/kprobes/opt.c:8:
./include/linux/rcuwait_api.h: In function ‘rcuwait_active’:
./include/linux/rcupdate.h:328:9: error: dereferencing pointer to incomplete type ‘struct task_struct’
  328 |  typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
      |         ^
./include/linux/rcupdate.h:439:31: note: in expansion of macro ‘__rcu_access_pointer’
  439 | #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
      |                               ^~~~~~~~~~~~~~~~~~~~
./include/linux/rcuwait_api.h:15:11: note: in expansion of macro ‘rcu_access_pointer’
   15 |  return !!rcu_access_pointer(w->task);

Argillander

> There's various changes in -v3, and it's now ported to the latest kernel
> (v5.17-rc8).
>
> Diffstat difference:
>
> -v2: 25332 files changed, 178498 insertions(+), 74790 deletions(-)
> -v3: 25513 files changed, 180947 insertions(+), 74572 deletions(-)
>
> Thanks,
>
> Ingo

2022-03-22 18:26:32

by Randy Dunlap

Subject: Re: [TREE] "Fast Kernel Headers" Tree -v3

Hi Kari,

On 3/22/22 00:59, Kari Argillander wrote:
> 15.3.2022 12.35 Ingo Molnar ([email protected]) wrote:
>>
>> This is -v3 of the "Fast Kernel Headers" tree, which is an ongoing rework
>> of the Linux kernel's header hierarchy & header dependencies, with the dual
>> goals of:
>>
>> - speeding up the kernel build (both absolute and incremental build times)
>>
>> - decoupling subsystem type & API definitions from each other
>>
>> The fast-headers tree consists of over 25 sub-trees internally, spanning
>> over 2,300 commits, which can be found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
>
> I have had problems building the master branch (defconfig) with gcc9:
> gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
>
> I also tested v2 and the problems were there too. I have no problems with
> gcc10 or Clang11. The error I get is:
>
> In file included from ./include/linux/rcuwait_api.h:7,
> from ./include/linux/rcuwait.h:6,
> from ./include/linux/irq_work.h:7,
> from ./include/linux/perf_event_types.h:44,
> from ./include/linux/perf_event_api.h:17,
> from arch/x86/kernel/kprobes/opt.c:8:
> ./include/linux/rcuwait_api.h: In function ‘rcuwait_active’:
> ./include/linux/rcupdate.h:328:9: error: dereferencing pointer to
> incomplete type ‘struct task_struct’
> 328 | typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> | ^
> ./include/linux/rcupdate.h:439:31: note: in expansion of macro
> ‘__rcu_access_pointer’
> 439 | #define rcu_access_pointer(p) __rcu_access_pointer((p),
> __UNIQUE_ID(rcu), __rcu)
> | ^~~~~~~~~~~~~~~~~~~~
> ./include/linux/rcuwait_api.h:15:11: note: in expansion of macro
> ‘rcu_access_pointer’
> 15 | return !!rcu_access_pointer(w->task);
>
> Argillander

You could try the patch here:
https://lore.kernel.org/all/[email protected]/

although the build error that it fixes doesn't look exactly the same
as yours.

>> There's various changes in -v3, and it's now ported to the latest kernel
>> (v5.17-rc8).
>>
>> Diffstat difference:
>>
>> -v2: 25332 files changed, 178498 insertions(+), 74790 deletions(-)
>> -v3: 25513 files changed, 180947 insertions(+), 74572 deletions(-)


--
~Randy

2022-03-22 19:43:34

by Kari Argillander

Subject: Re: [TREE] "Fast Kernel Headers" Tree -v3

22.03.2022 17.37 Randy Dunlap ([email protected]) wrote:
>
> Hi Kari,
>
> On 3/22/22 00:59, Kari Argillander wrote:
> > 15.3.2022 12.35 Ingo Molnar ([email protected]) wrote:
> >>
> >> This is -v3 of the "Fast Kernel Headers" tree, which is an ongoing rework
> >> of the Linux kernel's header hierarchy & header dependencies, with the dual
> >> goals of:
> >>
> >> - speeding up the kernel build (both absolute and incremental build times)
> >>
> >> - decoupling subsystem type & API definitions from each other
> >>
> >> The fast-headers tree consists of over 25 sub-trees internally, spanning
> >> over 2,300 commits, which can be found at:
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
> >
> > I have had problems building the master branch (defconfig) with gcc9:
> > gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
> >
> > I also tested v2 and the problems were there too. I have no problems with
> > gcc10 or Clang11. The error I get is:
> >
> > In file included from ./include/linux/rcuwait_api.h:7,
> > from ./include/linux/rcuwait.h:6,
> > from ./include/linux/irq_work.h:7,
> > from ./include/linux/perf_event_types.h:44,
> > from ./include/linux/perf_event_api.h:17,
> > from arch/x86/kernel/kprobes/opt.c:8:
> > ./include/linux/rcuwait_api.h: In function ‘rcuwait_active’:
> > ./include/linux/rcupdate.h:328:9: error: dereferencing pointer to
> > incomplete type ‘struct task_struct’
> > 328 | typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> > | ^
> > ./include/linux/rcupdate.h:439:31: note: in expansion of macro
> > ‘__rcu_access_pointer’
> > 439 | #define rcu_access_pointer(p) __rcu_access_pointer((p),
> > __UNIQUE_ID(rcu), __rcu)
> > | ^~~~~~~~~~~~~~~~~~~~
> > ./include/linux/rcuwait_api.h:15:11: note: in expansion of macro
> > ‘rcu_access_pointer’
> > 15 | return !!rcu_access_pointer(w->task);
> >
> > Argillander
>
> You could try the patch here:
> https://lore.kernel.org/all/[email protected]/

I had to edit it to use <linux/cgroup_types.h>, as there is no
<linux/cgroup-defs.h> with fast headers. I also tried a couple of other
things, but it didn't seem to make a difference.

> although the build error that it fixes doesn't look exactly the same
> as yours.

Quite close still. Maybe I should try to bisect this, and I will also
see how bisectable this branch is.

> >> There's various changes in -v3, and it's now ported to the latest kernel
> >> (v5.17-rc8).
> >>
> >> Diffstat difference:
> >>
> >> -v2: 25332 files changed, 178498 insertions(+), 74790 deletions(-)
> >> -v3: 25513 files changed, 180947 insertions(+), 74572 deletions(-)
>
>
> --
> ~Randy

2022-03-24 20:51:29

by Kari Argillander

Subject: Re: [TREE] "Fast Kernel Headers" Tree -v3

22.03.2022 18.22 Kari Argillander ([email protected]) wrote:
>
> 22.03.2022 17.37 Randy Dunlap ([email protected]) wrote:
> >
> > Hi Kari,
> >
> > On 3/22/22 00:59, Kari Argillander wrote:
> > > 15.3.2022 12.35 Ingo Molnar ([email protected]) wrote:
> > >>
> > >> This is -v3 of the "Fast Kernel Headers" tree, which is an ongoing rework
> > >> of the Linux kernel's header hierarchy & header dependencies, with the dual
> > >> goals of:
> > >>
> > >> - speeding up the kernel build (both absolute and incremental build times)
> > >>
> > >> - decoupling subsystem type & API definitions from each other
> > >>
> > >> The fast-headers tree consists of over 25 sub-trees internally, spanning
> > >> over 2,300 commits, which can be found at:
> > >>
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
> > >
> > > I have had problems building the master branch (defconfig) with gcc9:
> > > gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
> > >
> > > I also tested v2 and the problems were there too. I have no problems with
> > > gcc10 or Clang11. The error I get is:
> > >
> > > In file included from ./include/linux/rcuwait_api.h:7,
> > > from ./include/linux/rcuwait.h:6,
> > > from ./include/linux/irq_work.h:7,
> > > from ./include/linux/perf_event_types.h:44,
> > > from ./include/linux/perf_event_api.h:17,
> > > from arch/x86/kernel/kprobes/opt.c:8:
> > > ./include/linux/rcuwait_api.h: In function ‘rcuwait_active’:
> > > ./include/linux/rcupdate.h:328:9: error: dereferencing pointer to
> > > incomplete type ‘struct task_struct’
> > > 328 | typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
> > > | ^
> > > ./include/linux/rcupdate.h:439:31: note: in expansion of macro
> > > ‘__rcu_access_pointer’
> > > 439 | #define rcu_access_pointer(p) __rcu_access_pointer((p),
> > > __UNIQUE_ID(rcu), __rcu)
> > > | ^~~~~~~~~~~~~~~~~~~~
> > > ./include/linux/rcuwait_api.h:15:11: note: in expansion of macro
> > > ‘rcu_access_pointer’
> > > 15 | return !!rcu_access_pointer(w->task);
> > >
> > > Argillander
> >
> > You could try the patch here:
> > https://lore.kernel.org/all/[email protected]/
>
> I had to edit it to use <linux/cgroup_types.h>, as there is no
> <linux/cgroup-defs.h> with fast headers. I also tried a couple of other
> things, but it didn't seem to make a difference.
>
> > although the build error that it fixes doesn't look exactly the same
> > as yours.
>
> Quite close still. Maybe I should try to bisect this, and I will also
> see how bisectable this branch is.

Ok. I have now bisected the first bad commit to:
c4ad6fcb67c4 ("sched/headers: Reorganize, clean up and optimize
kernel/sched/fair.c dependencies")
Note that this has also been bisected by others.

With this commit I get a slightly different error:

In file included from ./arch/x86/include/generated/asm/rwonce.h:1,
                 from ./include/linux/compiler.h:255,
                 from ./include/linux/export.h:43,
                 from ./include/linux/linkage.h:7,
                 from ./include/linux/kernel.h:17,
                 from ./include/linux/cpumask.h:10,
                 from ./include/linux/energy_model.h:4,
                 from kernel/sched/fair.c:23:
./include/linux/psi.h: In function ‘cgroup_move_task’:
./include/linux/rcupdate.h:414:36: error: dereferencing pointer to incomplete type ‘struct css_set’
  414 | #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
      |                                    ^~~~

Which is actually the same error that is in Randy's message. Randy's patch
works on top of this commit, but I cannot get it to work with HEAD;
probably I'm just missing something obvious. Still, it is nice to see that
a solution is probably near.

> > >> There's various changes in -v3, and it's now ported to the latest kernel
> > >> (v5.17-rc8).
> > >>
> > >> Diffstat difference:
> > >>
> > >> -v2: 25332 files changed, 178498 insertions(+), 74790 deletions(-)
> > >> -v3: 25513 files changed, 180947 insertions(+), 74572 deletions(-)
> >
> >
> > --
> > ~Randy

2023-11-04 09:08:08

by Lucas Tanure

Subject: Re: [TREE] "Fast Kernel Headers" Tree -v3

On 22-03-2022 19:03, Kari Argillander wrote:
> 22.03.2022 18.22 Kari Argillander ([email protected]) wrote:
>>
>> 22.03.2022 17.37 Randy Dunlap ([email protected]) wrote:
>>>
>>> Hi Kari,
>>>
>>> On 3/22/22 00:59, Kari Argillander wrote:
>>>> 15.3.2022 12.35 Ingo Molnar ([email protected]) wrote:
>>>>>
>>>>> This is -v3 of the "Fast Kernel Headers" tree, which is an ongoing rework
>>>>> of the Linux kernel's header hierarchy & header dependencies, with the dual
>>>>> goals of:
>>>>>
>>>>> - speeding up the kernel build (both absolute and incremental build times)
>>>>>
>>>>> - decoupling subsystem type & API definitions from each other
>>>>>
>>>>> The fast-headers tree consists of over 25 sub-trees internally, spanning
>>>>> over 2,300 commits, which can be found at:
>>>>>
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
>>>>
>>>> I have had problems building the master branch (defconfig) with gcc9:
>>>> gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
>>>>
>>>> I also tested v2 and the problems were there too. I have no problems with
>>>> gcc10 or Clang11. The error I get is:
>>>>
>>>> In file included from ./include/linux/rcuwait_api.h:7,
>>>> from ./include/linux/rcuwait.h:6,
>>>> from ./include/linux/irq_work.h:7,
>>>> from ./include/linux/perf_event_types.h:44,
>>>> from ./include/linux/perf_event_api.h:17,
>>>> from arch/x86/kernel/kprobes/opt.c:8:
>>>> ./include/linux/rcuwait_api.h: In function ‘rcuwait_active’:
>>>> ./include/linux/rcupdate.h:328:9: error: dereferencing pointer to
>>>> incomplete type ‘struct task_struct’
>>>> 328 | typeof(*p) *local = (typeof(*p) *__force)READ_ONCE(p); \
>>>> | ^
>>>> ./include/linux/rcupdate.h:439:31: note: in expansion of macro
>>>> ‘__rcu_access_pointer’
>>>> 439 | #define rcu_access_pointer(p) __rcu_access_pointer((p),
>>>> __UNIQUE_ID(rcu), __rcu)
>>>> | ^~~~~~~~~~~~~~~~~~~~
>>>> ./include/linux/rcuwait_api.h:15:11: note: in expansion of macro
>>>> ‘rcu_access_pointer’
>>>> 15 | return !!rcu_access_pointer(w->task);
>>>>
>>>> Argillander
>>>
>>> You could try the patch here:
>>> https://lore.kernel.org/all/[email protected]/
>>
>> I had to edit it to use <linux/cgroup_types.h>, as there is no
>> <linux/cgroup-defs.h> with fast headers. I also tried a couple of other
>> things, but it didn't seem to make a difference.
>>
>>> although the build error that it fixes doesn't look exactly the same
>>> as yours.
>>
>> Quite close still. Maybe I should try to bisect this, and I will also
>> see how bisectable this branch is.
>
> Ok. I have now bisected the first bad commit to:
> c4ad6fcb67c4 ("sched/headers: Reorganize, clean up and optimize
> kernel/sched/fair.c dependencies")
> Note that this has also been bisected by others.
>
> With this commit I get a slightly different error:
>
> In file included from ./arch/x86/include/generated/asm/rwonce.h:1,
> from ./include/linux/compiler.h:255,
> from ./include/linux/export.h:43,
> from ./include/linux/linkage.h:7,
> from ./include/linux/kernel.h:17,
> from ./include/linux/cpumask.h:10,
> from ./include/linux/energy_model.h:4,
> from kernel/sched/fair.c:23:
> ./include/linux/psi.h: In function ‘cgroup_move_task’:
> ./include/linux/rcupdate.h:414:36: error: dereferencing pointer to
> incomplete type ‘struct css_set’
> 414 | #define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
> | ^~~~
>
> Which is actually the same error that is in Randy's message. Randy's patch
> works on top of this commit, but I cannot get it to work with HEAD;
> probably I'm just missing something obvious. Still, it is nice to see that
> a solution is probably near.
>
>>>>> There's various changes in -v3, and it's now ported to the latest kernel
>>>>> (v5.17-rc8).
>>>>>
>>>>> Diffstat difference:
>>>>>
>>>>> -v2: 25332 files changed, 178498 insertions(+), 74790 deletions(-)
>>>>> -v3: 25513 files changed, 180947 insertions(+), 74572 deletions(-)
>>>
>>>
>>> --
>>> ~Randy
>
Hi Ingo,

What is the fate of this patch series? Are you going to push it?
If not, can I? My approach would be to take the patches one by one,
understand, test and push them.
It would take years, but I have the time.

Thanks
Lucas Tanure