2020-06-10 07:55:03

by Chen, Rong A

[permalink] [raw]
Subject: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage

Greeting,

FYI, we noticed the following commit (built with gcc-7):

commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+-------------------------------------------------+------------+------------+
| | 9d5f0be0f7 | b751c52bb5 |
+-------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 3 | 4 |
| WARNING:at_kernel/locking/mutex.c:#__mutex_lock | 3 | |
| EIP:__mutex_lock | 3 | |
| BUG:kernel_hang_in_boot_stage | 0 | 4 |
+-------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 0.247058] ------------------------
[ 0.247878] | Locking API testsuite:
[ 0.248695] ----------------------------------------------------------------------------
[ 0.250509] | spin |wlock |rlock |mutex | wsem | rsem |
[ 0.252429] --------------------------------------------------------------------------
[ 0.254369] A-A deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.263731] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.273828] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.284876] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.296642] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.308919] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.320993] A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.333668] double unlock: ok | ok | ok | ok | ok | ok | ok |
[ 0.343237] initialize held: ok | ok | ok | ok | ok | ok | ok |
[ 0.352761] --------------------------------------------------------------------------
[ 0.354703] recursive read-lock: | ok | | ok |
[ 0.358943] recursive read-lock #2: | ok | | ok |
[ 0.363056] mixed read-write-lock: | ok | | ok |
[ 0.367191] mixed write-read-lock: | ok | | ok |
[ 0.371251] mixed read-lock/lock-write ABBA: |FAILED| | ok |
[ 0.373667] mixed read-lock/lock-read ABBA: | ok | | ok |
[ 0.377895] mixed write-lock/lock-write ABBA: | ok | | ok |
[ 0.381888] --------------------------------------------------------------------------
[ 0.385416] hard-irqs-on + irq-safe-A/12: ok | ok | ok |
[ 0.389990] soft-irqs-on + irq-safe-A/12: ok | ok | ok |
[ 0.395489] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.400157] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.408936] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.413666] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.418073] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.422942] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.427616] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.433873] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.442630] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.447955] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.453324] hard-safe-A + unsafe-B #1/132: ok | ok | ok |
[ 0.458298] soft-safe-A + unsafe-B #1/132: ok | ok | ok |
[ 0.463422] hard-safe-A + unsafe-B #1/213: ok | ok | ok |
[ 0.468262] soft-safe-A + unsafe-B #1/213: ok | ok | ok |
[ 0.473090] hard-safe-A + unsafe-B #1/231: ok | ok | ok |
[ 0.478066] soft-safe-A + unsafe-B #1/231: ok | ok | ok |
[ 0.482982] hard-safe-A + unsafe-B #1/312: ok | ok | ok |
[ 0.490885] soft-safe-A + unsafe-B #1/312: ok | ok | ok |
[ 0.496095] hard-safe-A + unsafe-B #1/321: ok | ok | ok |
[ 0.501193] soft-safe-A + unsafe-B #1/321: ok | ok | ok |
[ 0.506457] hard-safe-A + unsafe-B #2/123: ok | ok | ok |
[ 0.516060] soft-safe-A + unsafe-B #2/123: ok | ok | ok |
[ 0.521157] hard-safe-A + unsafe-B #2/132: ok | ok | ok |
[ 0.530477] soft-safe-A + unsafe-B #2/132: ok | ok | ok |
[ 0.535387] hard-safe-A + unsafe-B #2/213: ok | ok | ok |
[ 0.540393] soft-safe-A + unsafe-B #2/213: ok | ok | ok |
[ 0.552196] hard-safe-A + unsafe-B #2/231: ok | ok | ok |
[ 0.557088] soft-safe-A + unsafe-B #2/231: ok | ok | ok |
[ 0.562135] hard-safe-A + unsafe-B #2/312: ok | ok | ok |
[ 0.567072] soft-safe-A + unsafe-B #2/312: ok | ok | ok |
[ 0.572107] hard-safe-A + unsafe-B #2/321: ok | ok | ok |
[ 0.577276] soft-safe-A + unsafe-B #2/321: ok | ok | ok |
[ 0.582592] hard-irq lock-inversion/123: ok | ok | ok |
[ 0.587839] soft-irq lock-inversion/123: ok | ok | ok |
[ 0.593347] hard-irq lock-inversion/132: ok | ok | ok |
[ 0.598574] soft-irq lock-inversion/132: ok | ok | ok |
[ 0.603629] hard-irq lock-inversion/213: ok | ok | ok |
[ 0.608515] soft-irq lock-inversion/213: ok | ok | ok |
[ 0.613650] hard-irq lock-inversion/231: ok | ok | ok |
[ 0.620111] soft-irq lock-inversion/231: ok | ok | ok |
[ 0.625090] hard-irq lock-inversion/312: ok | ok | ok |
[ 0.630141] soft-irq lock-inversion/312: ok | ok | ok |
[ 0.634948] hard-irq lock-inversion/321: ok | ok | ok |
[ 0.639932] soft-irq lock-inversion/321: ok | ok | ok |
[ 0.645786] hard-irq read-recursion/123: ok |
[ 0.648025] soft-irq read-recursion/123: ok |
[ 0.650400] hard-irq read-recursion/132: ok |
[ 0.652820] soft-irq read-recursion/132: ok |
[ 0.655214] hard-irq read-recursion/213: ok |
[ 0.657531] soft-irq read-recursion/213: ok |
[ 0.659993] hard-irq read-recursion/231: ok |
[ 0.662498] soft-irq read-recursion/231: ok |
[ 0.664895] hard-irq read-recursion/312: ok |
[ 0.667422] soft-irq read-recursion/312: ok |
[ 0.669851] hard-irq read-recursion/321: ok |
[ 0.672222] soft-irq read-recursion/321: ok |
[ 0.674713] --------------------------------------------------------------------------
[ 0.676711] | Wound/wait tests |
[ 0.677592] ---------------------
[ 0.678496] ww api failures: ok | ok | ok |
[ 0.684457] ww contexts mixing: ok | ok |
[ 0.688135] finishing ww context: ok | ok | ok | ok |
[ 0.694329] locking mismatches: ok | ok | ok |
[ 0.699328] EDEADLK handling: ok | ok | ok | ok | ok | ok | ok | ok | ok | ok |
[ 0.714881] spinlock nest unlocked: ok |
[ 0.717152] -----------------------------------------------------
[ 0.718723] |block | try |context|
[ 0.720196] -----------------------------------------------------
[ 0.721677] context: ok | ok | ok |
[ 0.727929] try: ok | ok | ok |
[ 0.732790] block: ok | ok | ok |
[ 0.737894] spinlock: ok | ok | ok |
[ 0.743082] -------------------------------------------------------
[ 0.744547] Good, all 261 testcases passed! |
[ 0.745547] ---------------------------------
BUG: kernel hang in boot stage



To reproduce:

# build kernel
cd linux
cp config-5.3.0-11789-gb751c52bb587a .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Rong Chen


Attachments:
(No filename) (8.90 kB)
config-5.3.0-11789-gb751c52bb587a (113.33 kB)
job-script (4.60 kB)
dmesg.xz (5.68 kB)
Download all attachments

2020-06-10 10:29:38

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage

On Wed, Jun 10, 2020 at 12:35 PM Nicolas Boichat <[email protected]> wrote:
>
> On Wed, Jun 10, 2020 at 3:52 PM kernel test robot <[email protected]> wrote:
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-7):
> >
> > commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> Huh, It's not impossible there is an underlying issue, but I really
> don't see how that change could cause any of the issues below...

In Linux Next we have the

commit c5665868183fec689dbab9fb8505188b2c4f0757
Author: Catalin Marinas <[email protected]>
Date: Mon Sep 23 15:34:05 2019 -0700

mm: kmemleak: use the memory pool for early allocations


I'm wondering if it should be backported to fix this.

> > in testcase: boot
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> > +-------------------------------------------------+------------+------------+
> > | | 9d5f0be0f7 | b751c52bb5 |
> > +-------------------------------------------------+------------+------------+
> > | boot_successes | 0 | 0 |
> > | boot_failures | 3 | 4 |
> > | WARNING:at_kernel/locking/mutex.c:#__mutex_lock | 3 | |
> > | EIP:__mutex_lock | 3 | |
> > | BUG:kernel_hang_in_boot_stage | 0 | 4 |
> > +-------------------------------------------------+------------+------------+
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <[email protected]>
> >
> >
> > [ 0.247058] ------------------------
> > [ 0.247878] | Locking API testsuite:
> > [ 0.248695] ----------------------------------------------------------------------------
> > [ 0.250509] | spin |wlock |rlock |mutex | wsem | rsem |
> > [ 0.252429] --------------------------------------------------------------------------
> > [ 0.254369] A-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.263731] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.273828] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.284876] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.296642] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.308919] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.320993] A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.333668] double unlock: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.343237] initialize held: ok | ok | ok | ok | ok | ok | ok |
> > [ 0.352761] --------------------------------------------------------------------------
> > [ 0.354703] recursive read-lock: | ok | | ok |
> > [ 0.358943] recursive read-lock #2: | ok | | ok |
> > [ 0.363056] mixed read-write-lock: | ok | | ok |
> > [ 0.367191] mixed write-read-lock: | ok | | ok |
> > [ 0.371251] mixed read-lock/lock-write ABBA: |FAILED| | ok |
> > [ 0.373667] mixed read-lock/lock-read ABBA: | ok | | ok |
> > [ 0.377895] mixed write-lock/lock-write ABBA: | ok | | ok |
> > [ 0.381888] --------------------------------------------------------------------------
> > [ 0.385416] hard-irqs-on + irq-safe-A/12: ok | ok | ok |
> > [ 0.389990] soft-irqs-on + irq-safe-A/12: ok | ok | ok |
> > [ 0.395489] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
> > [ 0.400157] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
> > [ 0.408936] sirq-safe-A => hirqs-on/12: ok | ok | ok |
> > [ 0.413666] sirq-safe-A => hirqs-on/21: ok | ok | ok |
> > [ 0.418073] hard-safe-A + irqs-on/12: ok | ok | ok |
> > [ 0.422942] soft-safe-A + irqs-on/12: ok | ok | ok |
> > [ 0.427616] hard-safe-A + irqs-on/21: ok | ok | ok |
> > [ 0.433873] soft-safe-A + irqs-on/21: ok | ok | ok |
> > [ 0.442630] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
> > [ 0.447955] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
> > [ 0.453324] hard-safe-A + unsafe-B #1/132: ok | ok | ok |
> > [ 0.458298] soft-safe-A + unsafe-B #1/132: ok | ok | ok |
> > [ 0.463422] hard-safe-A + unsafe-B #1/213: ok | ok | ok |
> > [ 0.468262] soft-safe-A + unsafe-B #1/213: ok | ok | ok |
> > [ 0.473090] hard-safe-A + unsafe-B #1/231: ok | ok | ok |
> > [ 0.478066] soft-safe-A + unsafe-B #1/231: ok | ok | ok |
> > [ 0.482982] hard-safe-A + unsafe-B #1/312: ok | ok | ok |
> > [ 0.490885] soft-safe-A + unsafe-B #1/312: ok | ok | ok |
> > [ 0.496095] hard-safe-A + unsafe-B #1/321: ok | ok | ok |
> > [ 0.501193] soft-safe-A + unsafe-B #1/321: ok | ok | ok |
> > [ 0.506457] hard-safe-A + unsafe-B #2/123: ok | ok | ok |
> > [ 0.516060] soft-safe-A + unsafe-B #2/123: ok | ok | ok |
> > [ 0.521157] hard-safe-A + unsafe-B #2/132: ok | ok | ok |
> > [ 0.530477] soft-safe-A + unsafe-B #2/132: ok | ok | ok |
> > [ 0.535387] hard-safe-A + unsafe-B #2/213: ok | ok | ok |
> > [ 0.540393] soft-safe-A + unsafe-B #2/213: ok | ok | ok |
> > [ 0.552196] hard-safe-A + unsafe-B #2/231: ok | ok | ok |
> > [ 0.557088] soft-safe-A + unsafe-B #2/231: ok | ok | ok |
> > [ 0.562135] hard-safe-A + unsafe-B #2/312: ok | ok | ok |
> > [ 0.567072] soft-safe-A + unsafe-B #2/312: ok | ok | ok |
> > [ 0.572107] hard-safe-A + unsafe-B #2/321: ok | ok | ok |
> > [ 0.577276] soft-safe-A + unsafe-B #2/321: ok | ok | ok |
> > [ 0.582592] hard-irq lock-inversion/123: ok | ok | ok |
> > [ 0.587839] soft-irq lock-inversion/123: ok | ok | ok |
> > [ 0.593347] hard-irq lock-inversion/132: ok | ok | ok |
> > [ 0.598574] soft-irq lock-inversion/132: ok | ok | ok |
> > [ 0.603629] hard-irq lock-inversion/213: ok | ok | ok |
> > [ 0.608515] soft-irq lock-inversion/213: ok | ok | ok |
> > [ 0.613650] hard-irq lock-inversion/231: ok | ok | ok |
> > [ 0.620111] soft-irq lock-inversion/231: ok | ok | ok |
> > [ 0.625090] hard-irq lock-inversion/312: ok | ok | ok |
> > [ 0.630141] soft-irq lock-inversion/312: ok | ok | ok |
> > [ 0.634948] hard-irq lock-inversion/321: ok | ok | ok |
> > [ 0.639932] soft-irq lock-inversion/321: ok | ok | ok |
> > [ 0.645786] hard-irq read-recursion/123: ok |
> > [ 0.648025] soft-irq read-recursion/123: ok |
> > [ 0.650400] hard-irq read-recursion/132: ok |
> > [ 0.652820] soft-irq read-recursion/132: ok |
> > [ 0.655214] hard-irq read-recursion/213: ok |
> > [ 0.657531] soft-irq read-recursion/213: ok |
> > [ 0.659993] hard-irq read-recursion/231: ok |
> > [ 0.662498] soft-irq read-recursion/231: ok |
> > [ 0.664895] hard-irq read-recursion/312: ok |
> > [ 0.667422] soft-irq read-recursion/312: ok |
> > [ 0.669851] hard-irq read-recursion/321: ok |
> > [ 0.672222] soft-irq read-recursion/321: ok |
> > [ 0.674713] --------------------------------------------------------------------------
> > [ 0.676711] | Wound/wait tests |
> > [ 0.677592] ---------------------
> > [ 0.678496] ww api failures: ok | ok | ok |
> > [ 0.684457] ww contexts mixing: ok | ok |
> > [ 0.688135] finishing ww context: ok | ok | ok | ok |
> > [ 0.694329] locking mismatches: ok | ok | ok |
> > [ 0.699328] EDEADLK handling: ok | ok | ok | ok | ok | ok | ok | ok | ok | ok |
> > [ 0.714881] spinlock nest unlocked: ok |
> > [ 0.717152] -----------------------------------------------------
> > [ 0.718723] |block | try |context|
> > [ 0.720196] -----------------------------------------------------
> > [ 0.721677] context: ok | ok | ok |
> > [ 0.727929] try: ok | ok | ok |
> > [ 0.732790] block: ok | ok | ok |
> > [ 0.737894] spinlock: ok | ok | ok |
> > [ 0.743082] -------------------------------------------------------
> > [ 0.744547] Good, all 261 testcases passed! |
> > [ 0.745547] ---------------------------------
> > BUG: kernel hang in boot stage
> >
> >
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-5.3.0-11789-gb751c52bb587a .config
> > make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
> >
> >
> >
> > Thanks,
> > Rong Chen
> >



--
With Best Regards,
Andy Shevchenko

2020-06-10 10:59:40

by Catalin Marinas

[permalink] [raw]
Subject: Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage

On Wed, Jun 10, 2020 at 03:51:56PM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
[...]
> BUG: kernel hang in boot stage
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-5.3.0-11789-gb751c52bb587a .config
> make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage

I've never tried kmemleak on i386.

Anyway, I'm not sure what caused the hang (or whether it's a hang at
all) but I suspect prior to the above commit, kmemleak probably just
disabled itself (early log buffer exceeded). So the bug may have been
there already, only that kmemleak started working and tripped over it
when the log buffer increased.

Is there a chance that the kernel got much slower with kmemleak enabled
and the test scripts timed out?

Does this problem still exist with the latest mainline?

Thanks.

--
Catalin

2020-06-10 16:03:00

by Nicolas Boichat

[permalink] [raw]
Subject: Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage

On Wed, Jun 10, 2020 at 3:52 PM kernel test robot <[email protected]> wrote:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Huh, It's not impossible there is an underlying issue, but I really
don't see how that change could cause any of the issues below...


> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +-------------------------------------------------+------------+------------+
> | | 9d5f0be0f7 | b751c52bb5 |
> +-------------------------------------------------+------------+------------+
> | boot_successes | 0 | 0 |
> | boot_failures | 3 | 4 |
> | WARNING:at_kernel/locking/mutex.c:#__mutex_lock | 3 | |
> | EIP:__mutex_lock | 3 | |
> | BUG:kernel_hang_in_boot_stage | 0 | 4 |
> +-------------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> [ 0.247058] ------------------------
> [ 0.247878] | Locking API testsuite:
> [ 0.248695] ----------------------------------------------------------------------------
> [ 0.250509] | spin |wlock |rlock |mutex | wsem | rsem |
> [ 0.252429] --------------------------------------------------------------------------
> [ 0.254369] A-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.263731] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.273828] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.284876] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.296642] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.308919] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.320993] A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.333668] double unlock: ok | ok | ok | ok | ok | ok | ok |
> [ 0.343237] initialize held: ok | ok | ok | ok | ok | ok | ok |
> [ 0.352761] --------------------------------------------------------------------------
> [ 0.354703] recursive read-lock: | ok | | ok |
> [ 0.358943] recursive read-lock #2: | ok | | ok |
> [ 0.363056] mixed read-write-lock: | ok | | ok |
> [ 0.367191] mixed write-read-lock: | ok | | ok |
> [ 0.371251] mixed read-lock/lock-write ABBA: |FAILED| | ok |
> [ 0.373667] mixed read-lock/lock-read ABBA: | ok | | ok |
> [ 0.377895] mixed write-lock/lock-write ABBA: | ok | | ok |
> [ 0.381888] --------------------------------------------------------------------------
> [ 0.385416] hard-irqs-on + irq-safe-A/12: ok | ok | ok |
> [ 0.389990] soft-irqs-on + irq-safe-A/12: ok | ok | ok |
> [ 0.395489] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
> [ 0.400157] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
> [ 0.408936] sirq-safe-A => hirqs-on/12: ok | ok | ok |
> [ 0.413666] sirq-safe-A => hirqs-on/21: ok | ok | ok |
> [ 0.418073] hard-safe-A + irqs-on/12: ok | ok | ok |
> [ 0.422942] soft-safe-A + irqs-on/12: ok | ok | ok |
> [ 0.427616] hard-safe-A + irqs-on/21: ok | ok | ok |
> [ 0.433873] soft-safe-A + irqs-on/21: ok | ok | ok |
> [ 0.442630] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
> [ 0.447955] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
> [ 0.453324] hard-safe-A + unsafe-B #1/132: ok | ok | ok |
> [ 0.458298] soft-safe-A + unsafe-B #1/132: ok | ok | ok |
> [ 0.463422] hard-safe-A + unsafe-B #1/213: ok | ok | ok |
> [ 0.468262] soft-safe-A + unsafe-B #1/213: ok | ok | ok |
> [ 0.473090] hard-safe-A + unsafe-B #1/231: ok | ok | ok |
> [ 0.478066] soft-safe-A + unsafe-B #1/231: ok | ok | ok |
> [ 0.482982] hard-safe-A + unsafe-B #1/312: ok | ok | ok |
> [ 0.490885] soft-safe-A + unsafe-B #1/312: ok | ok | ok |
> [ 0.496095] hard-safe-A + unsafe-B #1/321: ok | ok | ok |
> [ 0.501193] soft-safe-A + unsafe-B #1/321: ok | ok | ok |
> [ 0.506457] hard-safe-A + unsafe-B #2/123: ok | ok | ok |
> [ 0.516060] soft-safe-A + unsafe-B #2/123: ok | ok | ok |
> [ 0.521157] hard-safe-A + unsafe-B #2/132: ok | ok | ok |
> [ 0.530477] soft-safe-A + unsafe-B #2/132: ok | ok | ok |
> [ 0.535387] hard-safe-A + unsafe-B #2/213: ok | ok | ok |
> [ 0.540393] soft-safe-A + unsafe-B #2/213: ok | ok | ok |
> [ 0.552196] hard-safe-A + unsafe-B #2/231: ok | ok | ok |
> [ 0.557088] soft-safe-A + unsafe-B #2/231: ok | ok | ok |
> [ 0.562135] hard-safe-A + unsafe-B #2/312: ok | ok | ok |
> [ 0.567072] soft-safe-A + unsafe-B #2/312: ok | ok | ok |
> [ 0.572107] hard-safe-A + unsafe-B #2/321: ok | ok | ok |
> [ 0.577276] soft-safe-A + unsafe-B #2/321: ok | ok | ok |
> [ 0.582592] hard-irq lock-inversion/123: ok | ok | ok |
> [ 0.587839] soft-irq lock-inversion/123: ok | ok | ok |
> [ 0.593347] hard-irq lock-inversion/132: ok | ok | ok |
> [ 0.598574] soft-irq lock-inversion/132: ok | ok | ok |
> [ 0.603629] hard-irq lock-inversion/213: ok | ok | ok |
> [ 0.608515] soft-irq lock-inversion/213: ok | ok | ok |
> [ 0.613650] hard-irq lock-inversion/231: ok | ok | ok |
> [ 0.620111] soft-irq lock-inversion/231: ok | ok | ok |
> [ 0.625090] hard-irq lock-inversion/312: ok | ok | ok |
> [ 0.630141] soft-irq lock-inversion/312: ok | ok | ok |
> [ 0.634948] hard-irq lock-inversion/321: ok | ok | ok |
> [ 0.639932] soft-irq lock-inversion/321: ok | ok | ok |
> [ 0.645786] hard-irq read-recursion/123: ok |
> [ 0.648025] soft-irq read-recursion/123: ok |
> [ 0.650400] hard-irq read-recursion/132: ok |
> [ 0.652820] soft-irq read-recursion/132: ok |
> [ 0.655214] hard-irq read-recursion/213: ok |
> [ 0.657531] soft-irq read-recursion/213: ok |
> [ 0.659993] hard-irq read-recursion/231: ok |
> [ 0.662498] soft-irq read-recursion/231: ok |
> [ 0.664895] hard-irq read-recursion/312: ok |
> [ 0.667422] soft-irq read-recursion/312: ok |
> [ 0.669851] hard-irq read-recursion/321: ok |
> [ 0.672222] soft-irq read-recursion/321: ok |
> [ 0.674713] --------------------------------------------------------------------------
> [ 0.676711] | Wound/wait tests |
> [ 0.677592] ---------------------
> [ 0.678496] ww api failures: ok | ok | ok |
> [ 0.684457] ww contexts mixing: ok | ok |
> [ 0.688135] finishing ww context: ok | ok | ok | ok |
> [ 0.694329] locking mismatches: ok | ok | ok |
> [ 0.699328] EDEADLK handling: ok | ok | ok | ok | ok | ok | ok | ok | ok | ok |
> [ 0.714881] spinlock nest unlocked: ok |
> [ 0.717152] -----------------------------------------------------
> [ 0.718723] |block | try |context|
> [ 0.720196] -----------------------------------------------------
> [ 0.721677] context: ok | ok | ok |
> [ 0.727929] try: ok | ok | ok |
> [ 0.732790] block: ok | ok | ok |
> [ 0.737894] spinlock: ok | ok | ok |
> [ 0.743082] -------------------------------------------------------
> [ 0.744547] Good, all 261 testcases passed! |
> [ 0.745547] ---------------------------------
> BUG: kernel hang in boot stage
>
>
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-5.3.0-11789-gb751c52bb587a .config
> make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
>
>
>
> Thanks,
> Rong Chen
>

2020-06-11 02:25:54

by Li Zhijian

[permalink] [raw]
Subject: Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage



On 6/10/20 6:56 PM, Catalin Marinas wrote:
> On Wed, Jun 10, 2020 at 03:51:56PM +0800, kernel test robot wrote:
>> FYI, we noticed the following commit (built with gcc-7):
>>
>> commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: boot
>>
>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> [...]
>> BUG: kernel hang in boot stage
>>
>> To reproduce:
>>
>> # build kernel
>> cd linux
>> cp config-5.3.0-11789-gb751c52bb587a .config
>> make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage
> I've never tried kmemleak on i386.
>
> Anyway, I'm not sure what caused the hang (or whether it's a hang at
> all) but I suspect prior to the above commit, kmemleak probably just
> disabled itself (early log buffer exceeded).

> So the bug may have been
> there already,
I think so

> only that kmemleak started working and tripped over it
> when the log buffer increased.
>
> Is there a chance that the kernel got much slower with kmemleak enabled
> and the test scripts timed out?

> Does this problem still exist with the latest mainline?
Yes, that's true.

Thanks
Zhijian (0Day CI team)
>
> Thanks.
>

2020-06-16 02:57:45

by Chen, Rong A

[permalink] [raw]
Subject: Re: [kmemleak] b751c52bb5: BUG:kernel_hang_in_boot_stage



On 6/10/20 6:56 PM, Catalin Marinas wrote:
> On Wed, Jun 10, 2020 at 03:51:56PM +0800, kernel test robot wrote:
>> FYI, we noticed the following commit (built with gcc-7):
>>
>> commit: b751c52bb587ae66f773b15204ef7a147467f4c7 ("kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: boot
>>
>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> [...]
>> BUG: kernel hang in boot stage
>>
>> To reproduce:
>>
>> # build kernel
>> cd linux
>> cp config-5.3.0-11789-gb751c52bb587a .config
>> make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage
> I've never tried kmemleak on i386.
>
> Anyway, I'm not sure what caused the hang (or whether it's a hang at
> all) but I suspect prior to the above commit, kmemleak probably just
> disabled itself (early log buffer exceeded). So the bug may have been
> there already, only that kmemleak started working and tripped over it
> when the log buffer increased.

Hi,

Sorry for the late, I can reproduce the problem with command "bin/lkp
qemu -k <bzImage> job-script",
and the kernel hangs:

[    0.333897] -----------------------------------------------------
[    0.334561]                                  |block | try |context|
[    0.335170] -----------------------------------------------------
[    0.335760]                           context:  ok  |  ok  |  ok |
[    0.337995]                               try:  ok  |  ok  |  ok |
[    0.340089]                             block:  ok  |  ok  |  ok |
[    0.342175]                          spinlock:  ok  |  ok  |  ok |
[    0.344481] -------------------------------------------------------
[    0.345068] Good, all 261 testcases passed! |
[    0.345514] ---------------------------------
KVM internal error. Suberror: 3
extra data[0]: 80000b0e
extra data[1]: 31
extra data[2]: 182
extra data[3]: bfff0
EAX=00000000 EBX=00200297 ECX=00000000 EDX=ffffffff
ESI=d2e997c0 EDI=d2e997f0 EBP=d2bbb038 ESP=c00bfff4
EIP=f4dccd57 EFL=00210046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =00d8 23331000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =00e0 f6422900 00000018 00409100 DPL=0 DS   [--A]
LDT=0000 00000000 ffffffff 00c00000
TR =0080 ff403000 0000206b 00008b00 DPL=0 TSS32-busy
GDT=     ff401000 000000ff
IDT=     ff400000 000007ff
CR0=80050033 CR2=00000000 CR3=130fc000 CR4=00000690
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00

>
> Is there a chance that the kernel got much slower with kmemleak enabled
> and the test scripts timed out?
no, the parent commit log is:

[    0.313845] -----------------------------------------------------
[    0.314608]                                  |block | try |context|
[    0.315314] -----------------------------------------------------
[    0.315974]                           context:  ok  |  ok  |  ok |
[    0.318261]                               try:  ok  |  ok  |  ok |
[    0.320478]                             block:  ok  |  ok  |  ok |
[    0.322562]                          spinlock:  ok  |  ok  |  ok |
[    0.324825] -------------------------------------------------------
[    0.325403] Good, all 261 testcases passed! |
[    0.325809] ---------------------------------
[    0.326221] kmemleak: Early log buffer exceeded (401), please
increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
[    0.327065] ACPI: Core revision 20190816
[    0.327585] clocksource: hpet: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 19112604467 ns
[    0.328545] APIC: Switch to symmetric I/O mode setup
[    0.329009] Enabling APIC mode:  Flat.  Using 1 I/O APICs
[    0.329572] masked ExtINT on CPU#0
[    0.330686] ENABLING IO-APIC IRQs
[    0.331001] init IO_APIC IRQs
[    0.331274]  apic 0 pin 0 not connected

>
> Does this problem still exist with the latest mainline?
yes, still in v5.7.

Best Regards,
Rong Chen