2021-06-02 15:52:24

by Ingo Molnar

Subject: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently


There's a sporadic kbuild bug that's been happening for years, maybe you
guys can think of where it comes from.

Sometimes when I Ctrl-C a kernel build job, the .o.cmd file becomes
corrupted and this breaks the build:


kernel/.panic.o.cmd:5: *** unterminated call to function 'wildcard': missing ')'. Stop.

...

drivers/gpu/drm/.drm_blend.o.cmd:5: *** unterminated call to function 'wildcard': missing ')'. Stop.

The file was just partially created and didn't get cleaned up:

kepler:~/tip> ls -l drivers/gpu/drm/.drm_blend.o.cmd.corrupted drivers/gpu/drm/.drm_blend.o.cmd.good
-rw-rw-r-- 1 mingo mingo 28672 Jun 2 17:46 drivers/gpu/drm/.drm_blend.o.cmd.corrupted
-rw-rw-r-- 1 mingo mingo 51331 Jun 2 17:46 drivers/gpu/drm/.drm_blend.o.cmd.good

The file just got cut in half due to the Ctrl-C:

--- drivers/gpu/drm/.drm_blend.o.cmd.corrupted 2021-06-02 17:46:16.951428326 +0200
+++ drivers/gpu/drm/.drm_blend.o.cmd.good 2021-06-02 17:46:34.391111668 +0200
@@ -646,4 +646,578 @@ deps_drivers/gpu/drm/drm_blend.o := \
$(wildcard include/config/OF_OVERLAY) \
include/linux/kobject.h \
$(wildcard include/config/UEVENT_HELPER) \
- $(wildcard include
\ No newline at end of file
+ $(wildcard include/config/DEBUG_KOBJECT_RELEASE) \
+ include/linux/sysfs.h \
+ include/linux/kernfs.h \
+ $(wildcard include/config/KERNFS) \


... but once in this state it can only be fixed by 'make clean' (which
loses all build progress), or by removing the stale file manually.

It happens more frequently on systems with a lot of CPUs.

Thanks,

Ingo


2021-06-02 18:22:47

by Masahiro Yamada

Subject: Re: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently


Hmm, I have not observed this.

My expectation is, it should work like this:

When scripts/basic/fixdep is interrupted (or fails for any reason),
a partially written *.o.cmd file is left over. So incomplete *.o.cmd
files are expected.

When .DELETE_ON_ERROR is specified, GNU Make is supposed to
automatically delete the target on any error.
(If it is interrupted, it should exit with code 130)

On the next invocation of Make, Kbuild will not include .*.o.cmd files
whose corresponding *.o files do not exist.
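
As a rough illustration of that last step, here is a shell analogy of the
filtering (not the actual Kbuild make code; the temporary directory and the
a.o/b.o names are made up for the example). A .cmd file is only picked up
when its object file still exists, so a truncated .b.o.cmd is harmless as
long as make managed to delete b.o:

```shell
# Scratch directory standing in for one kernel subdirectory (hypothetical).
workdir=$(mktemp -d)

# a.o finished building; b.o was deleted by make after an interrupt,
# but its (possibly truncated) .b.o.cmd file was left behind.
touch "$workdir/a.o" "$workdir/.a.o.cmd" "$workdir/.b.o.cmd"

included=""
for cmd in "$workdir"/.*.o.cmd; do
    base=${cmd##*/}        # e.g. ".a.o.cmd"
    obj=${base#.}          # strip the leading dot -> "a.o.cmd"
    obj=${obj%.cmd}        # strip the suffix      -> "a.o"
    # Only include the .cmd file if the object it describes still exists.
    if [ -e "$workdir/$obj" ]; then
        included="$included $base"
    fi
done
included=${included# }

echo "would include: $included"
rm -rf "$workdir"
```

If the object file survives the interrupt, the stale .cmd file passes this
check and gets parsed on the next run.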





When you got the corrupted drivers/gpu/drm/.drm_blend.o.cmd,
didn't you see the log
Deleting file 'drivers/gpu/drm/drm_blend.o' ?



If it works as I expect, the log should look as follows:
(I marked the lines with '<---- Deleting')



CC security/keys/keyctl_pkey.o
CC kernel/sys.o
CC arch/x86/power/hibernate_64.o
^Cmake[5]: *** Deleting file 'drivers/video/fbdev/core/fbcmap.o' <---- Deleting
make[5]: *** [scripts/Makefile.build:272: drivers/video/fbdev/core/fbmon.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: security/selinux/nlmsgtab.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: arch/x86/power/cpu.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: arch/x86/power/hibernate_64.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: arch/x86/pci/legacy.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: arch/x86/mm/srat.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: drivers/pnp/resource.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: drivers/pnp/manager.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: sound/core/ctljack.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: net/core/skbuff.o] Interrupt
make[2]: *** [scripts/Makefile.build:515: arch/x86/mm] Interrupt
make[2]: *** [scripts/Makefile.build:272: kernel/signal.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: drivers/acpi/device_sysfs.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: drivers/pci/pci.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: kernel/sys.o] Interrupt
make[2]: *** [scripts/Makefile.build:515: net/core] Interrupt
make[2]: *** [scripts/Makefile.build:272: block/blk-ioc.o] Interrupt
make[4]: *** [scripts/Makefile.build:272: arch/x86/events/intel/pt.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: crypto/skcipher.o] Interrupt
make[3]: *** [scripts/Makefile.build:272: security/keys/keyctl_pkey.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: fs/namei.o] Interrupt
make[3]: *** [scripts/Makefile.build:515: arch/x86/events/intel] Interrupt
make[5]: *** [scripts/Makefile.build:272: drivers/video/fbdev/core/fbcmap.o] Interrupt
make[2]: *** [scripts/Makefile.build:515: arch/x86/events] Interrupt
make[2]: *** [scripts/Makefile.build:515: security/selinux] Interrupt
make[2]: *** [scripts/Makefile.build:272: ipc/mq_sysctl.o] Interrupt
make[2]: *** [scripts/Makefile.build:272: mm/percpu.o] Interrupt
make[2]: *** [scripts/Makefile.build:515: security/keys] Interrupt
make[2]: *** [scripts/Makefile.build:515: sound/core] Interrupt
make[4]: *** [scripts/Makefile.build:515: drivers/video/fbdev/core] Interrupt
make[3]: *** Deleting file 'arch/x86/kernel/nmi.o' <---- Deleting
make[1]: *** [Makefile:1849: arch/x86/pci] Interrupt
make[2]: *** [scripts/Makefile.build:515: drivers/pnp] Interrupt
make[1]: *** [Makefile:1849: kernel] Interrupt
make[1]: *** [Makefile:1849: fs] Interrupt
make[2]: *** [scripts/Makefile.build:515: drivers/pci] Interrupt
make[1]: *** [Makefile:1849: ipc] Interrupt
make[2]: *** [scripts/Makefile.build:515: drivers/acpi] Interrupt
make[1]: *** [Makefile:1849: security] Interrupt
make[1]: *** [Makefile:1849: crypto] Interrupt
make[1]: *** [Makefile:1849: block] Interrupt
make[1]: *** [Makefile:1849: sound] Interrupt
make[3]: *** [scripts/Makefile.build:272: arch/x86/kernel/ldt.o] Interrupt
make[1]: *** [Makefile:1849: net] Interrupt
make[3]: *** [scripts/Makefile.build:272: arch/x86/kernel/nmi.o] Interrupt
make[1]: *** [Makefile:1849: arch/x86/power] Interrupt
make[3]: *** [scripts/Makefile.build:515: drivers/video/fbdev] Interrupt
make[1]: *** [Makefile:1849: mm] Interrupt
make[2]: *** [scripts/Makefile.build:515: drivers/video] Interrupt
make[1]: *** [Makefile:1849: drivers] Interrupt
make[2]: *** [scripts/Makefile.build:515: arch/x86/kernel] Interrupt
make[1]: *** [Makefile:1849: arch/x86] Interrupt
make: *** [Makefile:351: __build_one_by_one] Interrupt






--
Best Regards


Masahiro Yamada

2021-06-03 12:43:06

by Ingo Molnar

Subject: Re: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently


* Masahiro Yamada wrote:

> Hmm, I have not observed this.
>
> My expectation is, it should work like this:
>
> When scripts/basic/fixdep is interrupted (or fails for any reason),
> a partially written *.o.cmd file is left over. So incomplete *.o.cmd
> files are expected.
>
> When .DELETE_ON_ERROR is specified, GNU Make is supposed to
> automatically delete the target on any error.
> (If it is interrupted, it should exit with code 130)
>
> On the next invocation of Make, Kbuild will not include .*.o.cmd files
> whose corresponding *.o files do not exist.
>
>
>
>
>
> When you got the corrupted drivers/gpu/drm/.drm_blend.o.cmd,
> didn't you see the log
> Deleting file 'drivers/gpu/drm/drm_blend.o' ?
>
>
>
> If it works as I expect, the log should look as follows:
> (I marked the lines with '<---- Deleting')
>
>
>
> CC security/keys/keyctl_pkey.o
> CC kernel/sys.o
> CC arch/x86/power/hibernate_64.o
> ^Cmake[5]: *** Deleting file 'drivers/video/fbdev/core/fbcmap.o' <---- Deleting
> make[5]: *** [scripts/Makefile.build:272: drivers/video/fbdev/core/fbmon.o] Interrupt
> make[3]: *** [scripts/Makefile.build:272: security/selinux/nlmsgtab.o] Interrupt
> make[2]: *** [scripts/Makefile.build:272: arch/x86/power/cpu.o] Interrupt
> make[2]: *** [scripts/Makefile.build:272:

Interestingly I don't get *any* interruption messages at all:

CC drivers/dma/dw/acpi.o
CC sound/pci/ice1712/ice1712.o
CC drivers/char/ipmi/ipmi_watchdog.o
CC fs/overlayfs/export.o
CC fs/nls/nls_cp936.o
CC drivers/char/ipmi/ipmi_poweroff.o
^Ckepler:~/tip>

The '^C' there - it just stops, make never prints anything for me.

Weird ...

Thanks,

Ingo

2021-06-03 12:46:09

by Ingo Molnar

Subject: Re: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently


* Ingo Molnar wrote:

> Interestingly I don't get *any* interruption messages at all:
>
> CC drivers/dma/dw/acpi.o
> CC sound/pci/ice1712/ice1712.o
> CC drivers/char/ipmi/ipmi_watchdog.o
> CC fs/overlayfs/export.o
> CC fs/nls/nls_cp936.o
> CC drivers/char/ipmi/ipmi_poweroff.o
> ^Ckepler:~/tip>
>
> The '^C' there - it just stops, make never prints anything for me.

Found something - it seems to be related to whether the build output is
going into a pipe or not.


I usually build this way (directly or via a script):

make -j96 bzImage ARCH=x86 2>&1 | tee e

Ctrl-C interruption is not handled by kbuild in this case:

CC fs/jffs2/xattr_trusted.o
CC sound/firewire/motu/motu-transaction.o
CC sound/usb/clock.o
^Ckepler:~/tip>

Immediate prompt - no cleanup sequence.

But if I do it without 'tee', I get the expected cleanup sequence by make:

kepler:~/tip> make -j96 bzImage ARCH=x86 2>&1

CC fs/jffs2/acl.o
CC sound/pci/echoaudio/mona.o
CC fs/nls/nls_iso8859-9.o
^Cmake[2]: *** Deleting file 'drivers/reset/core.o'
make[3]: *** Deleting file 'sound/pci/mixart/mixart.o'
make[3]: *** Deleting file 'sound/pci/emu10k1/voice.o'
make[2]: *** Deleting file 'fs/gfs2/aops.o'

Thanks,

Ingo

2021-06-04 03:25:50

by Masahiro Yamada

Subject: Re: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently

On Thu, Jun 3, 2021 at 9:44 PM Ingo Molnar wrote:
> Found something - it seems to be related to whether the build output is
> going into a pipe or not.
>
>
> I usually build this way (directly or via a script):
>
> make -j96 bzImage ARCH=x86 2>&1 | tee e
>
> Ctrl-C interruption is not handled by kbuild in this case:
>
> CC fs/jffs2/xattr_trusted.o
> CC sound/firewire/motu/motu-transaction.o
> CC sound/usb/clock.o
> ^Ckepler:~/tip>
>
> Immediate prompt - no cleanup sequence.
>
> But if I do it without 'tee', I get the expected cleanup sequence by make:
>
> kepler:~/tip> make -j96 bzImage ARCH=x86 2>&1
>
> CC fs/jffs2/acl.o
> CC sound/pci/echoaudio/mona.o
> CC fs/nls/nls_iso8859-9.o
> ^Cmake[2]: *** Deleting file 'drivers/reset/core.o'
> make[3]: *** Deleting file 'sound/pci/mixart/mixart.o'
> make[3]: *** Deleting file 'sound/pci/emu10k1/voice.o'
> make[2]: *** Deleting file 'fs/gfs2/aops.o'
>
> Thanks,
>
> Ingo



Hmm, I do not know why GNU Make behaves like this...

I will ask about this in GNU Make ML.


--
Best Regards
Masahiro Yamada

2021-06-09 15:30:12

by Masahiro Yamada

Subject: Re: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently

On Fri, Jun 4, 2021 at 12:22 PM Masahiro Yamada wrote:
> Hmm, I do not know why GNU Make behaves like this...
>
> I will ask about this in GNU Make ML.


https://lists.gnu.org/archive/html/help-make/2021-06/msg00001.html


In short, 'tee' was also interrupted by the Ctrl-C and exited.
'make' then got SIGPIPE when writing to the closed pipe, and the
default action for SIGPIPE terminated it.
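
The effect can be reproduced without a kernel tree. A minimal sketch,
assuming a POSIX shell: 'yes' stands in for make and 'head' for a tee that
exits early (both are stand-ins chosen for the example, not what kbuild
runs):

```shell
# Temporary file used to carry the writer's exit status out of the pipeline.
status_file=$(mktemp)

# Writer: 'yes' prints forever, like a busy build.
# Reader: 'head' exits after one line, like tee dying on Ctrl-C.
# The writer's next write() then hits a pipe that has no reader.
{ yes "  CC      some/file.o"; echo $? > "$status_file"; } | head -n 1

writer_status=$(cat "$status_file")
rm -f "$status_file"

# With the default SIGPIPE disposition this is normally 141 (128 + 13):
# the writer was killed by the signal and never got to run any cleanup.
echo "writer exit status: $writer_status"
```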

--
Best Regards
Masahiro Yamada

2021-06-12 13:30:04

by Ingo Molnar

Subject: Re: kbuild: Ctrl-C of parallel kernel build sometimes corrupts .o.cmd files permanently


* Masahiro Yamada wrote:

> > Hmm, I do not know why GNU Make behaves like this...
> >
> > I will ask about this in GNU Make ML.
>
>
> https://lists.gnu.org/archive/html/help-make/2021-06/msg00001.html
>
>
> In short, 'tee' was also interrupted by the Ctrl-C and exited.
> 'make' then got SIGPIPE when writing to the closed pipe, and the
> default action for SIGPIPE terminated it.

So, what's the solution? It's rather common to
run a build job while capturing a log via 'tee'.

Ctrl-C only recently started corrupting the
kernel build. (As in the past ~2 years ;-)
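
One workaround that might help here, not confirmed anywhere in this thread:
tell tee to ignore SIGINT ('tee -i', a POSIX option), so that it stays alive
until make has finished its cleanup and closed the pipe. A minimal sketch;
build_with_log is a made-up helper name:

```shell
# build_with_log LOGFILE COMMAND...: hypothetical helper, not part of kbuild.
# Runs COMMAND with stdout and stderr piped through a tee that ignores
# SIGINT, so Ctrl-C reaches COMMAND but does not close the reading end
# of the pipe underneath it.
build_with_log() {
    logfile=$1
    shift
    "$@" 2>&1 | tee -i "$logfile"
}

# Harmless stand-in for the real build command:
demo_log=$(mktemp)
build_with_log "$demo_log" echo "  CC      some/file.o"
```

For the build above this would be 'build_with_log e make -j96 bzImage ARCH=x86'.
Note that the pipeline's exit status is then tee's, not make's.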


Thanks,

Ingo