2024-03-28 07:37:07

by kernel test robot

Subject: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]



Hello,


We reported a performance issue for this commit in
https://lore.kernel.org/all/[email protected]/

Now we have noticed a persistent crash:

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :100         99%         100:100 dmesg.EIP:restore_all_switch_stack
           :100         99%         100:100 dmesg.Kernel_panic-not_syncing:Fatal_exception
           :100         99%         100:100 dmesg.general_protection_fault:#[##]


Details are below, FYI.


kernel test robot noticed "general_protection_fault:#[##]" on:

commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 70293240c5ce675a67bfc48f419b093023b862b3]
[test failed on linux-next/master 13ee4a7161b6fd938aef6688ff43b163f6d83e37]

in testcase: trinity
version:
with the following parameters:

runtime: 600s



compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 25.175767][ T670] VFS: Warning: trinity-c2 using old stat() call. Recompile your binary.
[ 25.245597][ T669] general protection fault: 0000 [#1] PREEMPT SMP
[ 25.246417][ T669] CPU: 1 PID: 669 Comm: trinity-c1 Not tainted 6.8.0-rc5-00004-g6613d82e617d #1 85a4928d2e6b42899c3861e57e26bdc646c4c5f9
[ 25.247743][ T669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957)
[ 25.249510][ T669] Code: 4c 24 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 <0f> 00 2d 00 94 d5 c1 cf 6a 00 68 88 6b d4 c1 eb 00 fc 0f a0 50 b8
All code
========
0: 4c 24 10 rex.WR and $0x10,%al
3: 36 89 48 fc ss mov %ecx,-0x4(%rax)
7: 8b 4c 24 0c mov 0xc(%rsp),%ecx
b: 81 e1 ff ff 00 00 and $0xffff,%ecx
11: 36 89 48 f8 ss mov %ecx,-0x8(%rax)
15: 8b 4c 24 08 mov 0x8(%rsp),%ecx
19: 36 89 48 f4 ss mov %ecx,-0xc(%rax)
1d: 8b 4c 24 04 mov 0x4(%rsp),%ecx
21: 36 89 48 f0 ss mov %ecx,-0x10(%rax)
25: 59 pop %rcx
26: 8d 60 f0 lea -0x10(%rax),%esp
29: 58 pop %rax
2a:* 0f 00 2d 00 94 d5 c1 verw -0x3e2a6c00(%rip) # 0xffffffffc1d59431 <-- trapping instruction
31: cf iret
32: 6a 00 push $0x0
34: 68 88 6b d4 c1 push $0xffffffffc1d46b88
39: eb 00 jmp 0x3b
3b: fc cld
3c: 0f a0 push %fs
3e: 50 push %rax
3f: b8 .byte 0xb8

Code starting with the faulting instruction
===========================================
0: 0f 00 2d 00 94 d5 c1 verw -0x3e2a6c00(%rip) # 0xffffffffc1d59407
7: cf iret
8: 6a 00 push $0x0
a: 68 88 6b d4 c1 push $0xffffffffc1d46b88
f: eb 00 jmp 0x11
11: fc cld
12: 0f a0 push %fs
14: 50 push %rax
15: b8 .byte 0xb8
[ 25.251494][ T669] EAX: 00000000 EBX: 000001a0 ECX: 000001a1 EDX: 00000000
[ 25.252271][ T669] ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ffa2efdc
[ 25.253037][ T669] DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
[ 25.253892][ T669] CR0: 80050033 CR2: b7dabd6e CR3: 2cc341c0 CR4: 000406b0
[ 25.254655][ T669] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 25.255413][ T669] DR6: fffe0ff0 DR7: 00000400
[ 25.255952][ T669] Call Trace:
[ 25.256376][ T669] ? __die_body (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:478 kbuild/src/consumer/arch/x86/kernel/dumpstack.c:420)
[ 25.256907][ T669] ? die_addr (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:?)
[ 25.257411][ T669] ? exc_general_protection (kbuild/src/consumer/arch/x86/kernel/traps.c:698)
[ 25.258067][ T669] ? __entry_text_start (??:?)
[ 25.258691][ T669] ? irqentry_exit_to_user_mode (kbuild/src/consumer/kernel/entry/common.c:228)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240328/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-03-28 21:25:12

by Pawan Gupta

Subject: Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]

On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
> compiler: clang-17
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>
>
> [ 25.175767][ T670] VFS: Warning: trinity-c2 using old stat() call. Recompile your binary.
> [ 25.245597][ T669] general protection fault: 0000 [#1] PREEMPT SMP
> [ 25.246417][ T669] CPU: 1 PID: 669 Comm: trinity-c1 Not tainted 6.8.0-rc5-00004-g6613d82e617d #1 85a4928d2e6b42899c3861e57e26bdc646c4c5f9
> [ 25.247743][ T669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 25.248865][ T669] EIP: restore_all_switch_stack (kbuild/src/consumer/arch/x86/entry/entry_32.S:957)
> [ 25.249510][ T669] Code: 4c 24 10 36 89 48 fc 8b 4c 24 0c 81 e1 ff ff 00 00 36 89 48 f8 8b 4c 24 08 36 89 48 f4 8b 4c 24 04 36 89 48 f0 59 8d 60 f0 58 <0f> 00 2d 00 94 d5 c1 cf 6a 00 68 88 6b d4 c1 eb 00 fc 0f a0 50 b8
> All code
> ========
> 0: 4c 24 10 rex.WR and $0x10,%al
> 3: 36 89 48 fc ss mov %ecx,-0x4(%rax)
> 7: 8b 4c 24 0c mov 0xc(%rsp),%ecx
> b: 81 e1 ff ff 00 00 and $0xffff,%ecx
> 11: 36 89 48 f8 ss mov %ecx,-0x8(%rax)
> 15: 8b 4c 24 08 mov 0x8(%rsp),%ecx
> 19: 36 89 48 f4 ss mov %ecx,-0xc(%rax)
> 1d: 8b 4c 24 04 mov 0x4(%rsp),%ecx
> 21: 36 89 48 f0 ss mov %ecx,-0x10(%rax)
> 25: 59 pop %rcx
> 26: 8d 60 f0 lea -0x10(%rax),%esp
> 29: 58 pop %rax
> 2a:* 0f 00 2d 00 94 d5 c1 verw -0x3e2a6c00(%rip) # 0xffffffffc1d59431 <-- trapping instruction

This is due to 64-bit addressing with CONFIG_X86_32=y on clang.

I haven't tried with clang, but I don't see this happening with gcc-11:

entry_INT80_32:
...
<+446>: mov 0x4(%esp),%ecx
<+450>: mov %ecx,%ss:-0x10(%eax)
<+454>: pop %ecx
<+455>: lea -0x10(%eax),%esp
<+458>: pop %eax
<+459>: verw 0xc1d5c700 <----------
<+466>: iret

> 31: cf iret
> 32: 6a 00 push $0x0
> 34: 68 88 6b d4 c1 push $0xffffffffc1d46b88
> 39: eb 00 jmp 0x3b
...

The config has CONFIG_X86_32=y, but it is possible that in a 32-bit build
with clang, the 64-bit expansion of "VERW (_ASM_RIP(addr))" is being
used, i.e. __ASM_FORM_RAW(b) below:

file: arch/x86/include/asm/asm.h
...
#ifndef __x86_64__
/* 32 bit */
# define __ASM_SEL(a,b) __ASM_FORM(a)
# define __ASM_SEL_RAW(a,b) __ASM_FORM_RAW(a)
#else
/* 64 bit */
# define __ASM_SEL(a,b) __ASM_FORM(b)
# define __ASM_SEL_RAW(a,b) __ASM_FORM_RAW(b) <--------
#endif
...
/* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
#define _ASM_RIP(x) __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))

Possibly __x86_64__ is being defined with clang even when CONFIG_X86_32=y.

I am not sure about the current level of 32-bit mode support in clang. This
thread seems inconclusive:

https://discourse.llvm.org/t/x86-32-bit-testing/65480

Does anyone care about 32-bit mode builds with clang?

Subject: Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]

Hi, Thorsten here, the Linux kernel's regression tracker.

On 28.03.24 22:17, Pawan Gupta wrote:
> On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
>> compiler: clang-17
>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add the following tags
>> | Reported-by: kernel test robot <[email protected]>
>> | Closes: https://lore.kernel.org/oe-lkp/[email protected]

TWIMC, a user reported general protection faults with dosemu that were
bisected to a 6.6.y backport of the commit that causes the problem
discussed in this thread (6613d82e617dd7 ("x86/bugs: Use ALTERNATIVE()
instead of mds_user_clear static key")).

The user compiles with gcc, so it might be a different problem. It
happens with 6.8.y as well.

The problem occurs with x86-32 kernels, but strangely only on some of
the x86-32 systems the reporter has (on others, everything works
fine). This makes me wonder whether the commit exposed an older problem
that only happens on some machines.

For details see https://bugzilla.kernel.org/show_bug.cgi?id=218707
I could not CC the reporter here due to the bugzilla privacy policy; if
you want to get in contact, please use bugzilla.

Ciao, Thorsten


2024-04-17 18:55:29

by Pawan Gupta

Subject: Re: [linus:master] [x86/bugs] 6613d82e61: general_protection_fault:#[##]

On Sun, Apr 14, 2024 at 08:41:52AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> On 28.03.24 22:17, Pawan Gupta wrote:
> > On Thu, Mar 28, 2024 at 03:36:28PM +0800, kernel test robot wrote:
> >> compiler: clang-17
> >> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >>
> >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >> the same patch/commit), kindly add the following tags
> >> | Reported-by: kernel test robot <[email protected]>
> >> | Closes: https://lore.kernel.org/oe-lkp/[email protected]
>
> TWIMC, a user reported general protection faults with dosemu that were
> bisected to a 6.6.y backport of the commit that causes the problem
> discussed in this thread (6613d82e617dd7 ("x86/bugs: Use ALTERNATIVE()
> instead of mds_user_clear static key")).
>
> The user compiles with gcc, so it might be a different problem. It
> happens with 6.8.y as well.
>
> The problem occurs with x86-32 kernels, but strangely only on some of
> the x86-32 systems the reporter has (on others, everything works
> fine). This makes me wonder whether the commit exposed an older problem
> that only happens on some machines.
>
> For details see https://bugzilla.kernel.org/show_bug.cgi?id=218707
> I could not CC the reporter here due to the bugzilla privacy policy; if
> you want to get in contact, please use bugzilla.

Sorry for the late response; I was off work. I will look into this and
get back to you. I might need help reproducing this issue, but let me
first see if I can reproduce it with the info in the bugzilla.