2022-05-07 01:37:10

by kernel test robot

[permalink] [raw]
Subject: [timekeeping] 3aeaac747d: PANIC:early_exception



Greeting,

FYI, we noticed the following commit (built with clang-15):

commit: 3aeaac747d194aa3cff28351b4a83267faa11b3a ("[PATCH v7] timekeeping: Add raw clock fallback for random_get_entropy()")
url: https://github.com/intel-lab-lkp/linux/commits/Jason-A-Donenfeld/timekeeping-Add-raw-clock-fallback-for-random_get_entropy/20220505-082221
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 107c948d1d3e61d10aee9d0f7c3d81bbee9842af
patch link: https://lore.kernel.org/lkml/[email protected]

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+---------------------------------+------------+------------+
| | 107c948d1d | 3aeaac747d |
+---------------------------------+------------+------------+
| boot_successes | 14 | 0 |
| boot_failures | 0 | 12 |
| PANIC:early_exception | 0 | 12 |
| RIP:random_get_entropy_fallback | 0 | 12 |
| BUG:kernel_hang_in_boot_stage | 0 | 12 |
+---------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 0.000000][ T0] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000][ T0] BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
[ 0.000000][ T0] printk: debug: ignoring loglevel setting.
[ 0.000000][ T0] printk: bootconsole [earlyser0] enabled
[ 0.000000][ T0] NX (Execute Disable) protection: active
PANIC: early exception 0x0d IP 10:ffffffff812cef11 error 0 cr2 0xffff888004a3aff8
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc5-00017-g3aeaac747d19 #4
[ 0.000000][ T0] RIP: 0010:random_get_entropy_fallback (kbuild/src/rand-3/kernel/time/timekeeping.c:194 kbuild/src/rand-3/kernel/time/timekeeping.c:2390)
[ 0.000000][ T0] Code: 41 5d 41 5e 41 5f 5d c3 90 f3 0f 1e fa 55 48 89 e5 53 48 8b 1d 00 4e df 03 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 df e8 d1 2f 3b 00 4c 8b 1b 48 89 df e8 f6
All code
========
0: 41 5d pop %r13
2: 41 5e pop %r14
4: 41 5f pop %r15
6: 5d pop %rbp
7: c3 retq
8: 90 nop
9: f3 0f 1e fa endbr64
d: 55 push %rbp
e: 48 89 e5 mov %rsp,%rbp
11: 53 push %rbx
12: 48 8b 1d 00 4e df 03 mov 0x3df4e00(%rip),%rbx # 0x3df4e19
19: 48 89 d8 mov %rbx,%rax
1c: 48 c1 e8 03 shr $0x3,%rax
20: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
27: fc ff df
2a:* 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1) <-- trapping instruction
2e: 74 08 je 0x38
30: 48 89 df mov %rbx,%rdi
33: e8 d1 2f 3b 00 callq 0x3b3009
38: 4c 8b 1b mov (%rbx),%r11
3b: 48 89 df mov %rbx,%rdi
3e: e8 .byte 0xe8
3f: f6 .byte 0xf6

Code starting with the faulting instruction
===========================================
0: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1)
4: 74 08 je 0xe
6: 48 89 df mov %rbx,%rdi
9: e8 d1 2f 3b 00 callq 0x3b2fdf
e: 4c 8b 1b mov (%rbx),%r11
11: 48 89 df mov %rbx,%rdi
14: e8 .byte 0xe8
15: f6 .byte 0xf6
[ 0.000000][ T0] RSP: 0000:ffffffff84a07d70 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.000000][ T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000
[ 0.000000][ T0] RDX: ffffffffff240a7a RSI: 00000000000001da RDI: ffffffffff2408a0
[ 0.000000][ T0] RBP: ffffffff84a07d78 R08: 0000000000dbf764 R09: 000000000000000b
[ 0.000000][ T0] R10: ffffffff86a00070 R11: ffffffff8583df04 R12: dffffc0000000000
[ 0.000000][ T0] R13: ffffffff84a07e50 R14: ffffffffff2408a0 R15: 00000000000001da
[ 0.000000][ T0] FS: 0000000000000000(0000) GS:ffffffff84a9e000(0000) knlGS:0000000000000000
[ 0.000000][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000][ T0] CR2: ffff888004a3aff8 CR3: 0000000005876000 CR4: 00000000000406a0
[ 0.000000][ T0] Call Trace:
[ 0.000000][ T0] <TASK>
[ 0.000000][ T0] add_device_randomness (kbuild/src/rand-3/drivers/char/random.c:1028)
[ 0.000000][ T0] ? dmi_walk_early (kbuild/src/rand-3/drivers/firmware/dmi_scan.c:470)
[ 0.000000][ T0] dmi_walk_early (kbuild/src/rand-3/drivers/firmware/dmi_scan.c:151)
[ 0.000000][ T0] dmi_present (kbuild/src/rand-3/drivers/firmware/dmi_scan.c:605)


To reproduce:

# build kernel
cd linux
cp config-5.18.0-rc5-00017-g3aeaac747d19 .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (5.70 kB)
config-5.18.0-rc5-00017-g3aeaac747d19 (133.02 kB)
job-script (5.18 kB)
dmesg.xz (2.38 kB)
Download all attachments

2022-05-09 01:31:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [timekeeping] 3aeaac747d: PANIC:early_exception

On Fri, May 06 2022 at 11:20, kernel test robot wrote:
> PANIC: early exception 0x0d IP 10:ffffffff812cef11 error 0 cr2 0xffff888004a3aff8
> [ 0.000000][ T0] add_device_randomness (kbuild/src/rand-3/drivers/char/random.c:1028)
> [ 0.000000][ T0] ? dmi_walk_early (kbuild/src/rand-3/drivers/firmware/dmi_scan.c:470)
> [ 0.000000][ T0] dmi_walk_early (kbuild/src/rand-3/drivers/firmware/dmi_scan.c:151)
> [ 0.000000][ T0] dmi_present (kbuild/src/rand-3/drivers/firmware/dmi_scan.c:605)

Duh.

So this wants to be:

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2380,6 +2381,20 @@ static int timekeeping_validate_timex(co
return 0;
}

+/**
+ * random_get_entropy_fallback - Returns the raw clock source value,
+ * used by random.c for platforms with no valid random_get_entropy().
+ */
+unsigned long random_get_entropy_fallback(void)
+{
+ struct tk_read_base *tkr = &tk_core.timekeeper.tkr_mono;
+ struct clocksource *clock = READ_ONCE(tkr->clock);
+
+ if (!timekeeping_suspended && clock)
+ return clock->read(clock);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(random_get_entropy_fallback);

/**
* do_adjtimex() - Accessor function to NTP __do_adjtimex function

2022-05-09 02:44:31

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [timekeeping] 3aeaac747d: PANIC:early_exception

Hi Thomas,

On Fri, May 06, 2022 at 09:59:13AM +0200, Thomas Gleixner wrote:
> +/**
> + * random_get_entropy_fallback - Returns the raw clock source value,
> + * used by random.c for platforms with no valid random_get_entropy().
> + */
> +unsigned long random_get_entropy_fallback(void)
> +{
> + struct tk_read_base *tkr = &tk_core.timekeeper.tkr_mono;
> + struct clocksource *clock = READ_ONCE(tkr->clock);
> +
> + if (!timekeeping_suspended && clock)
> + return clock->read(clock);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(random_get_entropy_fallback);

I tried to address this already in
<https://lore.kernel.org/lkml/20220505002910.IAcnpEOE2zR2ibERl4Lh3Y_PMmtb0Rf43lVevgztJiM@z/>,
though yours looks better with the READ_ONCE() around clock, and I'll
send a v8 doing it that way. I didn't realize that clock could become
NULL again after becoming non-NULL.

I'm not quite sure I understand the purpose of !timekeeping_suspended
there, though. I'm not seeing the path where reading with it suspended
negatively affects things. I'll take your word for it though.

Jason

2022-05-09 03:09:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [timekeeping] 3aeaac747d: PANIC:early_exception

On Fri, May 06 2022 at 12:12, Jason A. Donenfeld wrote:
> On Fri, May 06, 2022 at 09:59:13AM +0200, Thomas Gleixner wrote:
>> +/**
>> + * random_get_entropy_fallback - Returns the raw clock source value,
>> + * used by random.c for platforms with no valid random_get_entropy().
>> + */
>> +unsigned long random_get_entropy_fallback(void)
>> +{
>> + struct tk_read_base *tkr = &tk_core.timekeeper.tkr_mono;
>> + struct clocksource *clock = READ_ONCE(tkr->clock);
>> +
>> + if (!timekeeping_suspended && clock)
>> + return clock->read(clock);
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(random_get_entropy_fallback);
>
> I tried to address this already in
> <https://lore.kernel.org/lkml/20220505002910.IAcnpEOE2zR2ibERl4Lh3Y_PMmtb0Rf43lVevgztJiM@z/>,
> though yours looks better with the READ_ONCE() around clock, and I'll
> send a v8 doing it that way. I didn't realize that clock could become
> NULL again after becoming non-NULL.

This happens at early boot where clock is NULL.

> I'm not quite sure I understand the purpose of !timekeeping_suspended
> there, though. I'm not seeing the path where reading with it suspended
> negatively affects things. I'll take your word for it though.

Some clocks are not accessible during suspend.

Thanks,

tglx