2019-01-04 03:23:23

by Paul Menzel

Subject: General protection fault in `switch_mm_irqs_off()`

Dear Linux folks,


On the server board Asus KGPE-D16 with AMD Opteron 6278 processor,
updating the microcode in the firmware from 0x0600062e to
0x0600063e seems to cause a general protection fault with Linux 4.14.87
and 4.20-rc7.

> 46.859: [ 7.573240] microcode: CPU31: patch_level=0x0600063e
> 46.859: [ 7.578507] microcode: Microcode Update Driver: v2.2.
> 46.860: [ 7.578539] sched_clock: Marking stable (6510054745, 1068444659)->(7999876773, -421377369)
> 46.860: [ 7.593013] registered taskstats version 1
> 46.861: [ 7.598091] rtc_cmos 00:00: setting system clock to 2000-01-01 08:01:51 UTC (946713711)
> 46.862: [ 7.606575] ALSA device list:
> 46.862: [ 7.609802] No soundcards found.
> 46.865: [ 7.615887] Freeing unused kernel image memory: 1564K
> 46.871: [ 7.627073] Write protecting the kernel read-only data: 20480k
> 46.872: [ 7.634366] Freeing unused kernel image memory: 2016K
> 46.873: [ 7.640297] Freeing unused kernel image memory: 584K
> 46.874: [ 7.645521] Run /init as init process
> 46.877: [ 7.652262] general protection fault: 0000 [#1] SMP NOPTI
> 46.877: [ 7.657931] CPU: 18 PID: 0 Comm: swapper/18 Not tainted 4.20.0-rc7.mx64.237 #1
> 46.877: [ 7.665514] Hardware name: ASUS KGPE-D16/KGPE-D16, BIOS 4.9-103-g637bef2037 01/02/2019
> 46.878: [ 7.673804] RIP: 0010:switch_mm_irqs_off+0xb2/0x640
> 46.878: [ 7.678948] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04
> 46.879: [ 7.698394] RSP: 0018:ffffc90006343e20 EFLAGS: 00010046
> 46.879: [ 7.703844] RAX: 0000000000000001 RBX: ffff88981ca0b800 RCX: 0000000000000049
> 46.879: [ 7.711238] RDX: 0000000000000000 RSI: ffff88981b87cf80 RDI: ffff88981ca0b800
> 46.880: [ 7.718665] RBP: ffffc90006343e70 R08: 00000001c81bec00 R09: 0000000000000000
> 46.880: [ 7.726092] R10: ffffc90006343e88 R11: 0000000000000000 R12: ffffffff82479b40
> 46.880: [ 7.733494] R13: 0000000000000000 R14: 0000000000000012 R15: ffff88981dd50080
> 46.881: [ 7.740853] FS: 0000000000000000(0000) GS:ffff88981fa80000(0000) knlGS:0000000000000000
> 46.881: [ 7.749318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 46.881: [ 7.755281] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000406e0
> 46.881: [ 7.762761] Call Trace:
> 46.881: [ 7.765369] ? __schedule+0x1b9/0x7b0
> 46.882: [ 7.769253] __schedule+0x1b9/0x7b0
> 46.882: [ 7.772930] schedule_idle+0x1e/0x40
> 46.882: [ 7.776744] do_idle+0x146/0x200
> 46.882: [ 7.780181] cpu_startup_entry+0x19/0x20
> 46.883: [ 7.784274] start_secondary+0x183/0x1b0
> 46.883: [ 7.788409] secondary_startup_64+0xa4/0xb0
> 46.883: [ 7.792766] Modules linked in:
> 46.883: [ 7.796105] ---[ end trace a423e363fe1ecf67 ]---
> 46.884: [ 7.800939] RIP: 0010:switch_mm_irqs_off+0xb2/0x640
> 46.884: [ 7.806048] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04
> 46.884: [ 7.825440] RSP: 0018:ffffc90006343e20 EFLAGS: 00010046
> 46.885: [ 7.830855] RAX: 0000000000000001 RBX: ffff88981ca0b800 RCX: 0000000000000049
> 46.885: [ 7.838230] RDX: 0000000000000000 RSI: ffff88981b87cf80 RDI: ffff88981ca0b800
> 46.885: [ 7.845614] RBP: ffffc90006343e70 R08: 00000001c81bec00 R09: 0000000000000000
> 46.886: [ 7.853047] R10: ffffc90006343e88 R11: 0000000000000000 R12: ffffffff82479b40
> 46.886: [ 7.860427] R13: 0000000000000000 R14: 0000000000000012 R15: ffff88981dd50080
> 46.886: [ 7.867862] FS: 0000000000000000(0000) GS:ffff88981fa80000(0000) knlGS:0000000000000000
> 46.886: [ 7.876320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 46.887: [ 7.882351] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000406e0
> 46.887: [ 7.889746] Kernel panic - not syncing: Attempted to kill the idle task!
> 46.888: [ 7.896907] Kernel Offset: disabled
> 46.888: [ 7.900558] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Please find the whole log, including the coreboot messages, attached.
The timestamps at the beginning of each line are from the script
`readserial.py` from the SeaBIOS repository.

Do you have an idea what is going on, and how to fix it?


Kind regards,

Paul


Attachments:
seriallog-20190103_175452.log (419.48 kB)

2019-01-04 15:42:41

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Linux folks,


On 01/03/19 22:45, Paul Menzel wrote:

> On the server board Asus KGPE-D16 with AMD Opteron 6278 processor, updating the microcode in the firmware from 0x0600062e to 0x0600063e seems to cause a general protection fault with Linux 4.14.87 and 4.20-rc7.
>
>> [...]
>
> Please find the whole log, including the coreboot messages, attached. The timestamps at the beginning of each line are from the script `readserial.py` from the SeaBIOS repository.
>
> Do you have an idea what is going on, and how to fix it?

Decoding the code gives the output below.

```
$ scripts/decodecode < /dev/shm/test.log
[ 7.806048] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04
All code
========
0: 48 c1 ef 09 shr $0x9,%rdi
4: 83 e7 01 and $0x1,%edi
7: 48 09 c7 or %rax,%rdi
a: 65 48 8b 05 8e 34 fc mov %gs:0x7efc348e(%rip),%rax # 0x7efc34a0
11: 7e
12: 48 39 c7 cmp %rax,%rdi
15: 74 15 je 0x2c
17: 48 09 f8 or %rdi,%rax
1a: a8 01 test $0x1,%al
1c: 74 0e je 0x2c
1e: b9 49 00 00 00 mov $0x49,%ecx
23: b8 01 00 00 00 mov $0x1,%eax
28: 31 d2 xor %edx,%edx
2a:* 0f 30 wrmsr <-- trapping instruction
2c: 65 48 89 3d 6c 34 fc mov %rdi,%gs:0x7efc346c(%rip) # 0x7efc34a0
33: 7e
34: 8b 05 9a ef a7 01 mov 0x1a7ef9a(%rip),%eax # 0x1a7efd4
3a: 85 c0 test %eax,%eax
3c: 0f .byte 0xf
3d: 8f 41 04 popq 0x4(%rcx)

Code starting with the faulting instruction
===========================================
0: 0f 30 wrmsr
2: 65 48 89 3d 6c 34 fc mov %rdi,%gs:0x7efc346c(%rip) # 0x7efc3476
9: 7e
a: 8b 05 9a ef a7 01 mov 0x1a7ef9a(%rip),%eax # 0x1a7efaa
10: 85 c0 test %eax,%eax
12: 0f .byte 0xf
13: 8f 41 04 popq 0x4(%rcx)
```

So the problem is with the instruction *wrmsr* [1].

The content of ECX, which according to [1] is written to, is not
in the logs though, as far as I can see.
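The instructions in front of the trapping `wrmsr` decide whether it runs at all. The Python model below mirrors them. It assumes, without confirming it here, that they are the kernel's conditional-IBPB check in `switch_mm_irqs_off()`, which works as follows:

- A task flag at bit 9 (`shr $0x9` / `and $0x1`) is folded into bit 0 of the next mm pointer.
- The result is compared with the per-CPU value read through `%gs`.
- IBPB is written only if the two differ and either one has bit 0 set.

The names `TIF_SPEC_IB` and `LAST_USER_MM_IBPB` are borrowed for illustration and should not be read as a quote of the kernel sources.

```python
# Model of the branch logic decoded above (offsets 0x0..0x2c); not kernel code.
TIF_SPEC_IB = 9             # `shr $0x9,%rdi; and $0x1,%edi`: task flag bit
LAST_USER_MM_IBPB = 0x1     # low bit folded into the mm pointer
MSR_IA32_PRED_CMD = 0x49    # `mov $0x49,%ecx`: MSR index for wrmsr
PRED_CMD_IBPB = 0x1         # `mov $0x1,%eax; xor %edx,%edx`: EDX:EAX = 1


def mangle_next_mm(mm_ptr: int, tif_flags: int) -> int:
    """Fold the task's IB-speculation bit into bit 0 of its mm pointer."""
    return mm_ptr | ((tif_flags >> TIF_SPEC_IB) & LAST_USER_MM_IBPB)


def switch_step(prev_mm: int, next_mm: int):
    """Return (wrmsr operands or None, new per-CPU value) for one switch."""
    if next_mm != prev_mm and (next_mm | prev_mm) & LAST_USER_MM_IBPB:
        op = (MSR_IA32_PRED_CMD, PRED_CMD_IBPB)   # reaches the trapping wrmsr
    else:
        op = None                                  # one of the `je 0x2c` skips it
    return op, next_mm      # offset 0x2c stores next_mm back via %gs


if __name__ == "__main__":
    # Hypothetical mm pointer; a task with the bit-9 flag set.
    nxt = mangle_next_mm(0xFFFF888000001000, 1 << TIF_SPEC_IB)
    print(hex(nxt), switch_step(0, nxt))
```

In this model the `wrmsr` is reached only when the check has already decided an IBPB is needed. A #GP at that point therefore means the MSR write itself was rejected, not that the branch logic misfired.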


Kind regards,

Paul


[1]: https://www.felixcloutier.com/x86/wrmsr


Attachments:
smime.p7s (5.05 kB)
S/MIME Cryptographic Signature

2019-01-04 17:41:13

by Borislav Petkov

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On Fri, Jan 04, 2019 at 01:41:25PM +0100, Paul Menzel wrote:
> [...]
>
> So the problem is with the instruction *wrmsr* [1].
>
> The content of ECX, which according to [1] is written to, is not
> in the logs though, as far as I can see.

Of course it is:

> 1e: b9 49 00 00 00 mov $0x49,%ecx

which is strange.

Tom, is patch_level=0x0600063e on BD supposed to #GP when writing
MSR_IA32_PRED_CMD...
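WRMSR takes the MSR index from ECX and the 64-bit value from EDX:EAX. The operands can therefore also be read back from the register dump in the original report (RCX, RDX, RAX). Below is a minimal Python sketch, assuming the oops line format shown there. The regular expression is written for this log and is not a general oops parser.

```python
import re

# Matches general-purpose register dumps such as "RCX: 0000000000000049".
# "RSP: 0018:..." and control registers (CR0:, CR3:, ...) do not match.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s*([0-9a-f]{16})\b")

SAMPLE = """\
[ 7.703844] RAX: 0000000000000001 RBX: ffff88981ca0b800 RCX: 0000000000000049
[ 7.711238] RDX: 0000000000000000 RSI: ffff88981b87cf80 RDI: ffff88981ca0b800
"""


def parse_regs(text: str) -> dict:
    """Map register name -> value for every 64-bit register in the text."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def wrmsr_operands(regs: dict):
    """Return (MSR index, 64-bit value) as WRMSR would see them."""
    msr = regs["RCX"] & 0xFFFFFFFF                       # ECX
    value = ((regs["RDX"] & 0xFFFFFFFF) << 32) | (regs["RAX"] & 0xFFFFFFFF)
    return msr, value


if __name__ == "__main__":
    msr, value = wrmsr_operands(parse_regs(SAMPLE))
    print(f"wrmsr: MSR {msr:#x} <- {value:#x}")
```

On the two sample lines, which are copied from the report, this prints `wrmsr: MSR 0x49 <- 0x1`. That is PRED_CMD_IBPB written to MSR_IA32_PRED_CMD.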

Thx.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2019-01-04 17:44:13

by Jiri Kosina

Subject: Re: General protection fault in `switch_mm_irqs_off()`


[ added some CCs ]

On Thu, 3 Jan 2019, Paul Menzel wrote:

> Dear Linux folks,
>
>
> On the server board Asus KGPE-D16 with AMD Opteron 6278 processor, updating the
> microcode in the firmware from 0x0600062e to 0x0600063e seems to cause
> a general protection fault with Linux 4.14.87 and 4.20-rc7.
>
> > [...]
> > 46.884: [ 7.800939] RIP: 0010:switch_mm_irqs_off+0xb2/0x640
> > 46.884: [ 7.806048] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04

So this faults when writing PRED_CMD_IBPB to MSR_IA32_PRED_CMD, but that
should be properly patched out on ucodes that don't support IBPB.

This almost looks like the ucode you updated to would advertise IBPB
availability, but then fault when it's used.
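Whether the kernel considers IBPB available shows up in the `flags` line of /proc/cpuinfo. Below is a small Python sketch that reports the flag per logical CPU. It assumes the kernel lists the capability as `ibpb`. The built-in sample text is hypothetical, heavily abbreviated, and used only when /proc/cpuinfo is absent.

```python
import os

SAMPLE = """\
processor\t: 0
flags\t\t: fpu vme de pse ibpb

processor\t: 1
flags\t\t: fpu vme de pse
"""


def ibpb_by_cpu(cpuinfo_text: str) -> dict:
    """Map logical CPU number -> True if 'ibpb' appears in its flags line."""
    result, cpu = {}, None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # blank separator between CPUs
        key = key.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "flags" and cpu is not None:
            result[cpu] = "ibpb" in value.split()
    return result


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    flags = ibpb_by_cpu(text)
    missing = sorted(c for c, has in flags.items() if not has)
    print(f"{len(flags)} CPUs, without ibpb: {missing or 'none'}")
```

Suppose every CPU lists `ibpb` under the faulting microcode but not under 0x0600063d. That would match the "advertised but faults when used" theory.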

I guess that booting with 'spectre_v2_user=off' makes the issue go away,
right?

What happens then if you manually wrmsr 0x1 to MSR 0x49 from userspace?
Could you please post /proc/cpuinfo from such a boot as well?
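The userspace experiment can go through the msr driver, which exposes each CPU's MSRs as /dev/cpu/N/msr. In that interface the file offset selects the MSR, and each access is 8 bytes. Below is a Python sketch, assuming root privileges and a loaded `msr` module. The driver is expected to turn a #GP on the write into an I/O error instead of a crash, but treat that as an assumption to verify on the machine.

```python
import os
import struct
import sys

MSR_IA32_PRED_CMD = 0x49
PRED_CMD_IBPB = 0x1


def msr_path(cpu: int) -> str:
    return f"/dev/cpu/{cpu}/msr"


def encode_msr_value(value: int) -> bytes:
    """64-bit MSR value, little-endian: low 32 bits (EAX) first, then EDX."""
    return struct.pack("<Q", value)


def write_msr(cpu: int, msr: int, value: int) -> None:
    """Write `value` to MSR `msr` on `cpu`; the file offset is the MSR index."""
    fd = os.open(msr_path(cpu), os.O_WRONLY)
    try:
        written = os.pwrite(fd, encode_msr_value(value), msr)
    finally:
        os.close(fd)
    if written != 8:
        raise OSError(f"short write to {msr_path(cpu)}: {written} bytes")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} CPU  "
              f"(writes {PRED_CMD_IBPB:#x} to MSR {MSR_IA32_PRED_CMD:#x})")
    else:
        try:
            write_msr(int(sys.argv[1]), MSR_IA32_PRED_CMD, PRED_CMD_IBPB)
            print("write succeeded")
        except OSError as err:
            print(f"write failed: {err}")
```

Passing `18` as the CPU argument targets the CPU that faulted in the log. Any other CPU number works the same way.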

--
Jiri Kosina
SUSE Labs

2019-01-04 19:02:24

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/4/19 9:47 AM, Borislav Petkov wrote:
> On Fri, Jan 04, 2019 at 01:41:25PM +0100, Paul Menzel wrote:
>> [...]
>
> Of course it is:
>
>> 1e: b9 49 00 00 00 mov $0x49,%ecx
>
> which is strange.
>
> Tom, is patch_level=0x0600063e on BD supposed to #GP when writing
> MSR_IA32_PRED_CMD...

No, that patch level should be good for writing that MSR as far as I'm
aware.

Just to be clear, was the ucode updated through the BIOS/firmware or
on Linux boot through the firmware loader? And I saw Jiri's request
for more info, so I'll look for that, also.
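However the microcode was applied, the serial log records the revision each core reports, in lines like `microcode: CPU31: patch_level=0x0600063e` in the original report. Below is a Python sketch that collects those lines and counts the distinct revisions, assuming that exact message format. The CPU30 line in the sample is hypothetical.

```python
import re
from collections import Counter

PATCH_RE = re.compile(r"microcode: CPU(\d+): patch_level=0x([0-9a-f]+)")

SAMPLE = """\
[ 7.573000] microcode: CPU30: patch_level=0x0600063e
[ 7.573240] microcode: CPU31: patch_level=0x0600063e
"""


def patch_levels(log_text: str) -> dict:
    """Map CPU number -> reported microcode patch level."""
    return {int(cpu): int(rev, 16) for cpu, rev in PATCH_RE.findall(log_text)}


def revisions(levels: dict) -> Counter:
    """Count how many CPUs report each distinct revision."""
    return Counter(levels.values())


if __name__ == "__main__":
    counts = revisions(patch_levels(SAMPLE))
    for rev, n in sorted(counts.items()):
        print(f"{rev:#010x}: {n} CPU(s)")
```

Running it over the full attached serial log shows whether all 32 cores ended up at the same patch level.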

Thanks,
Tom

>
> Thx.
>

2019-01-09 13:44:03

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Jiri, dear Thomas, dear Borislav,


On 01/09/19 13:06, Paul Menzel wrote:

> On 01/04/19 17:42, Jiri Kosina wrote:
>>
>> [ added some CCs ]
>
> Thank you for your reply and taking care of that. I am sorry for the
> late reply. It took a while to test this.
>
>> On Thu, 3 Jan 2019, Paul Menzel wrote:
>
>>> On the server board Asus KGPE-D16 with AMD Opteron 6278 processor,
>>> updating the microcode in the firmware from 0x0600062e to
>>> 0x0600063e seems to cause a general protection fault with Linux
>>> 4.14.87 and 4.20-rc7.
>
> Just a minor correction: the previous microcode update version was
> 0x0600063d, and it looks like I am getting the same failure with
> that and Linux 4.14.87.

I was mistaken. Everything is fine with 0x0600063d.

> It boots fine when no microcode update is applied (0x00000000).
>
> To answer Thomas’ question: the microcode is updated in the
> firmware (coreboot). (Asus didn’t publish any updates.)
>
>>> [...]
>>>> 46.884: [ 7.806048] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04
>>
>> So this faults when writing PRED_CMD_IBPB to MSR_IA32_PRED_CMD, but that
>> should be properly patched out on ucodes that don't support IBPB.
>>
>> This almost looks like the ucode you updated to would advertise IBPB
>> availability, but then fault when it's used.
>
> As it also happens with the previous firmware version, is it possible that
> the check is incorrect? Maybe not many people run AMD Opteron servers
> with the latest Linux or Linux stable kernels?
>
>> I guess that booting with 'spectre_v2_user=off' makes the issue go away,
>> right?
>
> Indeed. That makes it boot with microcode updates applied.
>
> [ 0.000000] Command line: BOOT_IMAGE=/boot/bzImage-4.14.87.mx64.236 crashkernel=256M root=LABEL=root ro console=ttyS0,115200n8 console=ttyS1,115200n8 console=tty0 init=/bin/systemd audit=0 spectre_v2_user=off
> […]
> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
>
>> What happens then if you manually wrmsr 0x1 to MSR 0x49 from userspace?
>
> With no microcode updates applied, I get:
>
> $ dmesg | grep 'microcode: CPU0: patch_level'
> [ 3.817171] microcode: CPU0: patch_level=0x00000000
> $ sudo modprobe msr
> $ sudo ./wrmsr 0x49 0x1 # https://github.com/01org/msr-tools
> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>
> I get the same with microcode updates applied.
>
> $ dmesg | grep 'microcode: CPU0: patch_level'
> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
> $ sudo modprobe msr
> $ sudo ./wrmsr 0x49 0x1
> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>
>> Could you please post /proc/cpuinfo from such a boot as well?
>
>> Leaving the rest of the original mail for reference.
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 21
> model : 1
> model name : AMD Opteron(tm) Processor 6278
> stepping : 2
> microcode : 0x600063e
> cpu MHz : 1871.198
> cache size : 2048 KB
> physical id : 0
> siblings : 16
> core id : 0
> cpu cores : 8
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
> bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
> bogomips : 4799.84
> TLB size : 1536 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb
>
> Please find the whole output attached.
>
>>>> 46.884: [ 7.825440] RSP: 0018:ffffc90006343e20 EFLAGS: 00010046
>>>> 46.885: [ 7.830855] RAX: 0000000000000001 RBX: ffff88981ca0b800 RCX: 0000000000000049
>>>> 46.885: [ 7.838230] RDX: 0000000000000000 RSI: ffff88981b87cf80 RDI: ffff88981ca0b800
>>>> 46.885: [ 7.845614] RBP: ffffc90006343e70 R08: 00000001c81bec00 R09: 0000000000000000
>>>> 46.886: [ 7.853047] R10: ffffc90006343e88 R11: 0000000000000000 R12: ffffffff82479b40
>>>> 46.886: [ 7.860427] R13: 0000000000000000 R14: 0000000000000012 R15: ffff88981dd50080
>>>> 46.886: [ 7.867862] FS: 0000000000000000(0000) GS:ffff88981fa80000(0000) knlGS:0000000000000000
>>>> 46.886: [ 7.876320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> 46.887: [ 7.882351] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000406e0
>>>> 46.887: [ 7.889746] Kernel panic - not syncing: Attempted to kill the idle task!
>>>> 46.888: [ 7.896907] Kernel Offset: disabled
>>>> 46.888: [ 7.900558] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>>>
>>> Please find the whole log, including the coreboot messages, attached. The time
>>> stamps at the beginning of each line are from the script `readserial.py` from the
>>> SeaBIOS repository.
>
> Please find the logs attached.
>
> I’ll do one more test with the microcode update 0x0600063d, to verify
> that the panic also happens with that microcode version (I am pretty
> certain).

As written above, it looks like I was wrong, and 0x0600063d does not
cause the problem.
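
For reference, the faulting bytes in the oops (`b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30`) decode to `mov $0x49,%ecx; mov $0x1,%eax; xor %edx,%edx; wrmsr`, i.e. the write of PRED_CMD_IBPB (1) to MSR_IA32_PRED_CMD (0x49) mentioned above. The userspace test goes through the `msr` driver: writing 8 bytes to `/dev/cpu/N/msr` at a file offset equal to the MSR index performs the WRMSR on CPU N. Below is a minimal C sketch of that mechanism, assuming root privileges and a loaded `msr` module; `msr_dev_path()`, `write_msr()` and the two macros are hypothetical names, not the msr-tools source. The driver is expected to catch a #GP during the WRMSR and return `EIO`, which appears to be the case msr-tools reports as "cannot set MSR".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pwrite() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Values from the oops: MSR_IA32_PRED_CMD is 0x49, PRED_CMD_IBPB is 1. */
#define MSR_PRED_CMD  0x49u
#define PRED_CMD_IBPB 0x1ull

/* Build the per-CPU device node path created by the msr driver. */
int msr_dev_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/dev/cpu/%u/msr", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write one 64-bit MSR on one CPU; the file offset selects the MSR.
 * Returns 0 on success or a negative errno value. A #GP raised by the
 * WRMSR is expected to come back from the driver as -EIO.
 */
int write_msr(unsigned int cpu, uint32_t msr, uint64_t value)
{
	char path[64];
	int fd, err = 0;
	ssize_t n;

	if (msr_dev_path(path, sizeof(path), cpu) != 0)
		return -EINVAL;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = pwrite(fd, &value, sizeof(value), (off_t)msr);
	if (n < 0)
		err = -errno;          /* saved before close() can clobber errno */
	else if ((size_t)n != sizeof(value))
		err = -EIO;            /* short write: treat as failure */

	close(fd);
	return err;
}
```

As root, `write_msr(0, MSR_PRED_CMD, PRED_CMD_IBPB)` performs the same write as `./wrmsr 0x49 0x1`; given the results above, a negative return value would be expected on this board with either microcode.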


Kind regards,

Paul


Attachments:
smime.p7s (5.05 kB)
S/MIME Cryptographic Signature

2019-01-09 13:44:31

by Paul Menzel

[permalink] [raw]
Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Thomas,


On 01/09/19 14:16, Thomas Gleixner wrote:

> On Wed, 9 Jan 2019, Paul Menzel wrote:
>> I get the same with microcode updates applied.
>>
>> $ dmesg | grep 'microcode: CPU0: patch_level'
>> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
>> $ sudo modprobe msr
>> $ sudo ./wrmsr 0x49 0x1
>> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>>
>>> Could you please post /proc/cpuinfo from such a boot as well?
>
> /proc/cpuinfo unfortunately does not contain the AMD-specific IBPB flag,
> but the dmesg of the original report says:
>
> Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
>
> which means that the CPUID bit is set.
>
> Can you please provide the output of:
>
> cpuid -1 -l 0x80000008 -r
>
> On my AMD Opteron(TM) Processor 6276 with microcode 0x600063d loaded I get:
>
> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>
> EBX is 0, which means that X86_FEATURE_AMD_IBPB is not enabled. So the
> kernel does not try to use the speculation control MSR (0x49).

With CPUID 20180519, I get the output below with both microcode update versions.

0x600063d:

$ sudo ./cpuid -1 -l 0x80000008 -r
CPU:
0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000

0x600063e:

$ sudo ./cpuid -1 -l 0x80000008 -r
CPU:
0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000


Kind regards,

Paul


2019-01-09 13:54:05

by Thomas Gleixner

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Paul,

On Wed, 9 Jan 2019, Paul Menzel wrote:
> I get the same with microcode updates applied.
>
> $ dmesg | grep 'microcode: CPU0: patch_level'
> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
> $ sudo modprobe msr
> $ sudo ./wrmsr 0x49 0x1
> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>
> > Could you please post /proc/cpuinfo from such a boot as well?

/proc/cpuinfo unfortunately does not contain the AMD-specific IBPB flag,
but the dmesg of the original report says:

Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier

which means that the CPUID bit is set.

Can you please provide the output of:

cpuid -1 -l 0x80000008 -r

On my AMD Opteron(TM) Processor 6276 with microcode 0x600063d loaded I get:

0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000

EBX is 0, which means that X86_FEATURE_AMD_IBPB is not enabled. So the
kernel does not try to use the speculation control MSR (0x49).

Thanks,

tglx

2019-01-09 14:33:54

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/9/19 7:35 AM, Paul Menzel wrote:
> Dear Thomas,
>
>
> On 01/09/19 14:16, Thomas Gleixner wrote:
>
>> On Wed, 9 Jan 2019, Paul Menzel wrote:
>>> I get the same with microcode updates applied.
>>>
>>> $ dmesg | grep 'microcode: CPU0: patch_level'
>>> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
>>> $ sudo modprobe msr
>>> $ sudo ./wrmsr 0x49 0x1
>>> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>>>
>>>> Could you please post /proc/cpuinfo from such a boot as well?
>>
>> /proc/cpuinfo unfortunately does not contain the AMD-specific IBPB flag,
>> but the dmesg of the original report says:
>>
>> Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
>>
>> which means that the CPUID bit is set.
>>
>> Can you please provide the output of:
>>
>> cpuid -1 -l 0x80000008 -r
>>
>> On my AMD Opteron(TM) Processor 6276 with microcode 0x600063d loaded I get:
>>
>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>
>> EBX is 0, which means that X86_FEATURE_AMD_IBPB is not enabled. So the
>> kernel does not try to use the speculation control MSR (0x49).
>
> With CPUID 20180519, I get the output below with both microcode update versions.
>
> 0x600063d:
>
> $ sudo ./cpuid -1 -l 0x80000008 -r
> CPU:
> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>
> 0x600063e:
>
> $ sudo ./cpuid -1 -l 0x80000008 -r
> CPU:
> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>

Hmmm... so ebx is 0 for both versions, so I'm not sure how IBPB is being
set. What's the CPUID output for 0x07 (cpuid -1 -l 0x07 -r)?

Thanks,
Tom

>
> Kind regards,
>
> Paul
>

2019-01-09 14:38:12

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Thomas,


On 01/09/19 15:29, Lendacky, Thomas wrote:
> On 1/9/19 7:35 AM, Paul Menzel wrote:

>> On 01/09/19 14:16, Thomas Gleixner wrote:
>>
>>> On Wed, 9 Jan 2019, Paul Menzel wrote:
>>>> I get the same with microcode updates applied.
>>>>
>>>> $ dmesg | grep 'microcode: CPU0: patch_level'
>>>> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
>>>> $ sudo modprobe msr
>>>> $ sudo ./wrmsr 0x49 0x1
>>>> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>>>>
>>>>> Could you please post /proc/cpuinfo from such a boot as well?
>>>
>>> /proc/cpuinfo unfortunately does not contain the AMD-specific IBPB flag,
>>> but the dmesg of the original report says:
>>>
>>> Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
>>>
>>> which means that the CPUID bit is set.
>>>
>>> Can you please provide the output of:
>>>
>>> cpuid -1 -l 0x80000008 -r
>>>
>>> On my AMD Opteron(TM) Processor 6276 with microcode 0x600063d loaded I get:
>>>
>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>>
>>> EBX is 0, which means that X86_FEATURE_AMD_IBPB is not enabled. So the
>>> kernel does not try to use the speculation control MSR (0x49).
>>
>> With CPUID 20180519, I get the output below with both microcode update versions.
>>
>> 0x600063d:
>>
>> $ sudo ./cpuid -1 -l 0x80000008 -r
>> CPU:
>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>
>> 0x600063e:
>>
>> $ sudo ./cpuid -1 -l 0x80000008 -r
>> CPU:
>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>
> Hmmm... so ebx is 0 for both versions, so I'm not sure how IBPB is being
> set. What's the CPUID output for 0x07 (cpuid -1 -l 0x07 -r)?

$ sudo ./cpuid -1 -l 0x07 -r
CPU:
0x00000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000


Kind regards,

Paul


2019-01-09 16:43:35

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/9/19 8:34 AM, Paul Menzel wrote:
> Dear Thomas,
>
>
> On 01/09/19 15:29, Lendacky, Thomas wrote:
>> On 1/9/19 7:35 AM, Paul Menzel wrote:
>
>>> On 01/09/19 14:16, Thomas Gleixner wrote:
>>>
>>>> On Wed, 9 Jan 2019, Paul Menzel wrote:
>>>>> I get the same with microcode updates applied.
>>>>>
>>>>> $ dmesg | grep 'microcode: CPU0: patch_level'
>>>>> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
>>>>> $ sudo modprobe msr
>>>>> $ sudo ./wrmsr 0x49 0x1
>>>>> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>>>>>
>>>>>> Could you please post /proc/cpuinfo from such a boot as well?
>>>>
>>>> /proc/cpuinfo unfortunately does not contain the AMD-specific IBPB flag,
>>>> but the dmesg of the original report says:
>>>>
>>>> Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
>>>>
>>>> which means that the CPUID bit is set.
>>>>
>>>> Can you please provide the output of:
>>>>
>>>> cpuid -1 -l 0x80000008 -r
>>>>
>>>> On my AMD Opteron(TM) Processor 6276 with microcode 0x600063d loaded I get:
>>>>
>>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>>>
>>>> EBX is 0, which means that X86_FEATURE_AMD_IBPB is not enabled. So the
>>>> kernel does not try to use the speculation control MSR (0x49).
>>>
>>> With CPUID 20180519, I get the output below with both microcode update versions.
>>>
>>> 0x600063d:
>>>
>>> $ sudo ./cpuid -1 -l 0x80000008 -r
>>> CPU:
>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>>
>>> 0x600063e:
>>>
>>> $ sudo ./cpuid -1 -l 0x80000008 -r
>>> CPU:
>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>
>> Hmmm... so ebx is 0 for both versions, so I'm not sure how IBPB is being
>> set. What's the CPUID output for 0x07 (cpuid -1 -l 0x07 -r)?
>
> $ sudo ./cpuid -1 -l 0x07 -r
> CPU:
> 0x00000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000

I'm confused then... the only way that I can see that the IBPB feature
can be set is if 0x80000008[EBX] bit 12 is set or if 0x07[EDX] bit 26 is
set - neither of which is the case. I must be missing something...
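
Tom's criterion can be written as a small decoder over raw register values. This is a minimal sketch: the bit positions are the ones quoted above, the function and macro names are hypothetical, and the optional live query relies on the `<cpuid.h>` helpers shipped with GCC and Clang, so it is only compiled on x86. Keeping the decoding separate from the CPUID instruction makes it testable with the values Paul posted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as quoted in the thread. */
#define AMD_IBPB_BIT        12u /* CPUID 0x80000008, EBX */
#define INTEL_SPEC_CTRL_BIT 26u /* CPUID 0x07 (subleaf 0), EDX */

static bool reg_bit(uint32_t reg, unsigned int bit)
{
	return (reg >> bit) & 1u;
}

/*
 * Decide from raw register values whether either CPUID path would
 * advertise IBPB: AMD's dedicated bit, or Intel's SPEC_CTRL bit
 * (which enumerates both IBRS and IBPB).
 */
bool ibpb_advertised(uint32_t ext8_ebx, uint32_t leaf7_edx)
{
	return reg_bit(ext8_ebx, AMD_IBPB_BIT) ||
	       reg_bit(leaf7_edx, INTEL_SPEC_CTRL_BIT);
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Same check on the CPU this runs on; unimplemented leaves count as zero. */
bool ibpb_advertised_on_this_cpu(void)
{
	unsigned int a = 0, b = 0, c = 0, d = 0;
	uint32_t ext8_ebx = 0, leaf7_edx = 0;

	if (__get_cpuid_max(0x80000000u, NULL) >= 0x80000008u) {
		__cpuid(0x80000008u, a, b, c, d);
		ext8_ebx = b;
	}
	if (__get_cpuid_max(0, NULL) >= 7u) {
		__cpuid_count(7, 0, a, b, c, d);
		leaf7_edx = d;
	}
	(void)a; /* EAX and ECX are not needed for this check */
	(void)c;
	return ibpb_advertised(ext8_ebx, leaf7_edx);
}
#endif
```

With the registers posted in this thread (EBX of leaf 0x80000008 and EDX of leaf 0x07 both zero, for either microcode version), `ibpb_advertised(0, 0)` returns false, which is why the `Enabling conditional Indirect Branch Prediction Barrier` message is surprising.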

Thanks,
Tom

>
>
> Kind regards,
>
> Paul
>

2019-01-09 16:44:56

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Thomas,


On 01/09/19 17:15, Lendacky, Thomas wrote:
> On 1/9/19 8:34 AM, Paul Menzel wrote:

>> On 01/09/19 15:29, Lendacky, Thomas wrote:
>>> On 1/9/19 7:35 AM, Paul Menzel wrote:
>>
>>>> On 01/09/19 14:16, Thomas Gleixner wrote:
>>>>
>>>>> On Wed, 9 Jan 2019, Paul Menzel wrote:
>>>>>> I get the same with microcode updates applied.
>>>>>>
>>>>>> $ dmesg | grep 'microcode: CPU0: patch_level'
>>>>>> [ 3.809210] microcode: CPU0: patch_level=0x0600063e
>>>>>> $ sudo modprobe msr
>>>>>> $ sudo ./wrmsr 0x49 0x1
>>>>>> wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
>>>>>>
>>>>>>> Could you please post /proc/cpuinfo from such a boot as well?
>>>>>
>>>>> /proc/cpuinfo unfortunately does not contain the AMD-specific IBPB flag,
>>>>> but the dmesg of the original report says:
>>>>>
>>>>> Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
>>>>>
>>>>> which means that the CPUID bit is set.
>>>>>
>>>>> Can you please provide the output of:
>>>>>
>>>>> cpuid -1 -l 0x80000008 -r
>>>>>
>>>>> On my AMD Opteron(TM) Processor 6276 with microcode 0x600063d loaded I get:
>>>>>
>>>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>>>>
>>>>> EBX is 0, which means that X86_FEATURE_AMD_IBPB is not enabled. So the
>>>>> kernel does not try to use the speculation control MSR (0x49).
>>>>
>>>> With CPUID 20180519, I get the output below with both microcode update versions.
>>>>
>>>> 0x600063d:
>>>>
>>>> $ sudo ./cpuid -1 -l 0x80000008 -r
>>>> CPU:
>>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>>>
>>>> 0x600063e:
>>>>
>>>> $ sudo ./cpuid -1 -l 0x80000008 -r
>>>> CPU:
>>>> 0x80000008 0x00: eax=0x00003030 ebx=0x00000000 ecx=0x0000500f edx=0x00000000
>>>
>>> Hmmm... so ebx is 0 for both versions, so I'm not sure how IBPB is being
>>> set. What's the CPUID output for 0x07 (cpuid -1 -l 0x07 -r)?
>>
>> $ sudo ./cpuid -1 -l 0x07 -r
>> CPU:
>> 0x00000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
>
> I'm confused then... the only way that I can see that the IBPB feature
> can be set is if 0x80000008[EBX] bit 12 is set or if 0x07[EDX] bit 26 is
> set - neither of which is the case. I must be missing something...

Is there a way to trace the value of `boot_cpu_data` from
`arch/x86/include/asm/cpufeature.h` with some Linux Kernel magic?

#define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)

Or is rebuilding with print statements the only solution?


Kind regards,

Paul


2019-01-09 21:15:12

by Borislav Petkov

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On Wed, Jan 09, 2019 at 05:34:11PM +0100, Paul Menzel wrote:
> Is there a way to trace the value of `boot_cpu_data` from
> `arch/x86/include/asm/cpufeature.h` with some Linux Kernel magic?
>
> #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)
>
> Or is rebuilding with print statements the only solution?

Yes. Just apply this and catch output. It is a wild guess anyway as
this whole deal looks really strange but at least it should not #GP the
machine.

---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index dad12b767ba0..ec4688779900 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -284,6 +284,9 @@ static inline void indirect_branch_prediction_barrier(void)
 {
 	u64 val = PRED_CMD_IBPB;
 
+	if (WARN_ON(boot_cpu_has(X86_FEATURE_USE_IBPB)))
+		return;
+
 	alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
 }
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1de0f4170178..4ed4cc99a2c0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -371,6 +371,8 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 
+		pr_err("%s: set X86_FEATURE_USE_IBPB\n", __func__);
+
 		switch (cmd) {
 		case SPECTRE_V2_USER_CMD_FORCE:
 		case SPECTRE_V2_USER_CMD_PRCTL_IBPB:

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-01-10 14:01:34

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Borislav,


On 01/09/19 22:11, Borislav Petkov wrote:
> On Wed, Jan 09, 2019 at 05:34:11PM +0100, Paul Menzel wrote:
>> Is there a way to trace the value of `boot_cpu_data` from
>> `arch/x86/include/asm/cpufeature.h` with some Linux Kernel magic?
>>
>> #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)
>>
>> Or is rebuilding with print statements the only solution?
>
> Yes. Just apply this and catch output. It is a wild guess anyway as
> this whole deal looks really strange but at least it should not #GP the
> machine.
>
> ---
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index dad12b767ba0..ec4688779900 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -284,6 +284,9 @@ static inline void indirect_branch_prediction_barrier(void)
> {
> u64 val = PRED_CMD_IBPB;
>
> + if (WARN_ON(boot_cpu_has(X86_FEATURE_USE_IBPB)))
> + return;
> +
> alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
> }
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 1de0f4170178..4ed4cc99a2c0 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -371,6 +371,8 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
> if (boot_cpu_has(X86_FEATURE_IBPB)) {
> setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
>
> + pr_err("%s: set X86_FEATURE_USE_IBPB\n", __func__);
> +
> switch (cmd) {
> case SPECTRE_V2_USER_CMD_FORCE:
> case SPECTRE_V2_USER_CMD_PRCTL_IBPB:

Thank you very much. Indeed, the machine does not crash. I used Linus’
master branch for testing, and applied your patch on top. Please find
the full log attached.

```
$ git describe --tags origin/master
v5.0-rc1-26-g500cf822f80f
$ dmesg
[…]
[ 7.262018] microcode: CPU0: patch_level=0x0600063e
[…]
[ 3.198107] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[…]
[ 8.786863] Run /init as init process
[ 8.792006] WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/nospec-branch.h:287 switch_mm_irqs_off+0x5ec/0x680
[ 8.802384] Modules linked in:
[ 8.805586] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc1.mx64.236-00027-ged01f563987a #1
[ 8.814529] Hardware name: ASUS KGPE-D16/KGPE-D16, BIOS 4.9-213-gdb866ba200 01/08/2019
[ 8.822677] RIP: 0010:switch_mm_irqs_off+0x5ec/0x680
[ 8.827801] Code: 31 d2 31 f6 e8 95 4a da 00 49 8b 06 48 85 c0 75 e7 e8 88 ee 06 00 44 89 fe 48 c7 c7 c0 a1 46 82 e8 69 88 06 00 e9 57 fc ff ff <0f> 0b e9 d3 fa ff ff 0f 0b e9 6b ff ff ff 0f 0b e9 22 fe ff ff 0f
[ 8.847001] RSP: 0018:ffffc900062bfe20 EFLAGS: 00010003
[ 8.852374] RAX: 052a310401c13fff RBX: ffff88881b748800 RCX: 0000000000000000
[ 8.859655] RDX: 0000000000000001 RSI: ffff88881caed080 RDI: ffff88881b748800
[ 8.866952] RBP: ffffc900062bfe70 R08: 000000020c098c00 R09: 0000000000000000
[ 8.874237] R10: ffffc900062bfe88 R11: 0000000000000000 R12: ffffffff8247e460
[ 8.881529] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88881db28f00
[ 8.888810] FS: 0000000000000000(0000) GS:ffff88881fa40000(0000) knlGS:0000000000000000
[ 8.897146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.903066] CR2: 0000000000000000 CR3: 000000000240e000 CR4: 00000000000406e0
[ 8.910398] Call Trace:
[ 8.912994] ? __schedule+0x1b9/0x7b0
[ 8.916795] __schedule+0x1b9/0x7b0
[ 8.920436] schedule_idle+0x1e/0x40
[ 8.924155] do_idle+0x146/0x200
[ 8.927577] cpu_startup_entry+0x19/0x20
[ 8.931641] start_secondary+0x183/0x1b0
[ 8.935722] secondary_startup_64+0xa4/0xb0
[ 8.940066] ---[ end trace 948cf50690b0f4b1 ]---
```


Kind regards,

Paul


Attachments:
coreboot-ucode-updates-0x0600063e-linux-5.0-rc1+-spectre_v2_user-auto.log (482.85 kB)

2019-01-10 15:59:22

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/10/19 7:57 AM, Paul Menzel wrote:
> Dear Borislav,
>
>
> On 01/09/19 22:11, Borislav Petkov wrote:
>> On Wed, Jan 09, 2019 at 05:34:11PM +0100, Paul Menzel wrote:
>>> Is there a way to trace the value of `boot_cpu_data` from
>>> `arch/x86/include/asm/cpufeature.h` with some Linux Kernel magic?
>>>
>>> #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)
>>>
>>> Or is rebuilding with print statements the only solution?
>>
>> Yes. Just apply this and catch output. It is a wild guess anyway as
>> this whole deal looks really strange but at least it should not #GP the
>> machine.
>>
>> ---
>> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
>> index dad12b767ba0..ec4688779900 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -284,6 +284,9 @@ static inline void indirect_branch_prediction_barrier(void)
>> {
>> u64 val = PRED_CMD_IBPB;
>>
>> + if (WARN_ON(boot_cpu_has(X86_FEATURE_USE_IBPB)))
>> + return;
>> +
>> alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
>> index 1de0f4170178..4ed4cc99a2c0 100644
>> --- a/arch/x86/kernel/cpu/bugs.c
>> +++ b/arch/x86/kernel/cpu/bugs.c
>> @@ -371,6 +371,8 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
>> if (boot_cpu_has(X86_FEATURE_IBPB)) {
>> setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
>>
>> + pr_err("%s: set X86_FEATURE_USE_IBPB\n", __func__);
>> +
>> switch (cmd) {
>> case SPECTRE_V2_USER_CMD_FORCE:
>> case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
>
> Thank you very much. Indeed, the machine does not crash. I used Linus’
> master branch for testing, and applied your patch on top. Please find
> the full log attached.

Checking the original log file again, it showed the mitigation message
for IBPB that is just after the above switch statement, so this print
output is expected. What about applying this patch on top of the patch
from Boris:

---
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..b0ea6886ef15 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -779,6 +779,7 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_IBRS);
 		set_cpu_cap(c, X86_FEATURE_IBPB);
 		set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
+		pr_err("%s: set X86_FEATURE_IBPB via X86_FEATURE_SPEC_CTRL: cpuid 0x07[EDX]=%#x\n", __func__, cpuid_edx(0x07));
 	}
 
 	if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
@@ -793,8 +794,10 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
 	}
 
-	if (cpu_has(c, X86_FEATURE_AMD_IBPB))
+	if (cpu_has(c, X86_FEATURE_AMD_IBPB)) {
 		set_cpu_cap(c, X86_FEATURE_IBPB);
+		pr_err("%s: set X86_FEATURE_IBPB via X86_FEATURE_AMD_IBPB: cpuid 0x80000008[EBX]=%#x\n", __func__, cpuid_ebx(0x80000008));
+	}
 
 	if (cpu_has(c, X86_FEATURE_AMD_STIBP)) {
 		set_cpu_cap(c, X86_FEATURE_STIBP);
--

Thanks,
Tom

>
> ```
> $ git describe --tags origin/master
> v5.0-rc1-26-g500cf822f80f
> $ dmesg
> […]
> [ 7.262018] microcode: CPU0: patch_level=0x0600063e
> […]
> [ 3.198107] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
> […]
> [ 8.786863] Run /init as init process
> [ 8.792006] WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/nospec-branch.h:287 switch_mm_irqs_off+0x5ec/0x680
> [ 8.802384] Modules linked in:
> [ 8.805586] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc1.mx64.236-00027-ged01f563987a #1
> [ 8.814529] Hardware name: ASUS KGPE-D16/KGPE-D16, BIOS 4.9-213-gdb866ba200 01/08/2019
> [ 8.822677] RIP: 0010:switch_mm_irqs_off+0x5ec/0x680
> [ 8.827801] Code: 31 d2 31 f6 e8 95 4a da 00 49 8b 06 48 85 c0 75 e7 e8 88 ee 06 00 44 89 fe 48 c7 c7 c0 a1 46 82 e8 69 88 06 00 e9 57 fc ff ff <0f> 0b e9 d3 fa ff ff 0f 0b e9 6b ff ff ff 0f 0b e9 22 fe ff ff 0f
> [ 8.847001] RSP: 0018:ffffc900062bfe20 EFLAGS: 00010003
> [ 8.852374] RAX: 052a310401c13fff RBX: ffff88881b748800 RCX: 0000000000000000
> [ 8.859655] RDX: 0000000000000001 RSI: ffff88881caed080 RDI: ffff88881b748800
> [ 8.866952] RBP: ffffc900062bfe70 R08: 000000020c098c00 R09: 0000000000000000
> [ 8.874237] R10: ffffc900062bfe88 R11: 0000000000000000 R12: ffffffff8247e460
> [ 8.881529] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88881db28f00
> [ 8.888810] FS: 0000000000000000(0000) GS:ffff88881fa40000(0000) knlGS:0000000000000000
> [ 8.897146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8.903066] CR2: 0000000000000000 CR3: 000000000240e000 CR4: 00000000000406e0
> [ 8.910398] Call Trace:
> [ 8.912994] ? __schedule+0x1b9/0x7b0
> [ 8.916795] __schedule+0x1b9/0x7b0
> [ 8.920436] schedule_idle+0x1e/0x40
> [ 8.924155] do_idle+0x146/0x200
> [ 8.927577] cpu_startup_entry+0x19/0x20
> [ 8.931641] start_secondary+0x183/0x1b0
> [ 8.935722] secondary_startup_64+0xa4/0xb0
> [ 8.940066] ---[ end trace 948cf50690b0f4b1 ]---
> ```
>
>
> Kind regards,
>
> Paul
>

2019-01-10 16:30:06

by Borislav Petkov

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On Thu, Jan 10, 2019 at 03:53:11PM +0000, Lendacky, Thomas wrote:
> Checking the original log file again, it showed the mitigation message
> for IBPB that is just after the above switch statement, so this print
> output is expected. What about applying this patch on top of the patch
> from Boris:

Ha! See the mail I just sent - it has yours + more! :-)

With this funky bug, you never know...

--
Regards/Gruss,
Boris.

2019-01-10 19:58:44

by Borislav Petkov

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
> Thank you very much. Indeed, the machine does not crash. I used Linus’
> master branch for testing, and applied your patch on top. Please find
> the full log attached.

> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB

This is amazing.

Ok, next diff, same exercise. Thx.

---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index dad12b767ba0..528ef8336f5f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -284,6 +284,12 @@ static inline void indirect_branch_prediction_barrier(void)
 {
 	u64 val = PRED_CMD_IBPB;
 
+	if (WARN_ON(boot_cpu_has(X86_FEATURE_USE_IBPB))) {
+		pr_info("%s: c: %px, array: 0x%x\n",
+			__func__, &boot_cpu_data, boot_cpu_data.x86_capability[7]);
+		return;
+	}
+
 	alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
 }
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 8654b8b0c848..e818e5abe611 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -371,6 +371,9 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
 
+		pr_err("%s: set X86_FEATURE_USE_IBPB, c: %px, array: 0x%x\n",
+			__func__, &boot_cpu_data, boot_cpu_data.x86_capability[7]);
+
 		switch (cmd) {
 		case SPECTRE_V2_USER_CMD_FORCE:
 		case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..8566737fa500 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -765,6 +765,9 @@ static void apply_forced_caps(struct cpuinfo_x86 *c)
 		c->x86_capability[i] &= ~cpu_caps_cleared[i];
 		c->x86_capability[i] |= cpu_caps_set[i];
 	}
+
+	if (c == &boot_cpu_data)
+		pr_info("%s: c: %px, array: 0x%x\n", __func__, c, c->x86_capability[7]);
 }
 
 static void init_speculation_control(struct cpuinfo_x86 *c)
@@ -778,6 +781,10 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 	if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
 		set_cpu_cap(c, X86_FEATURE_IBRS);
 		set_cpu_cap(c, X86_FEATURE_IBPB);
+
+		pr_info("%s: X86_FEATURE_SPEC_CTRL: c: %px, array: 0x%x, CPUID: 0x%x\n",
+			__func__, c, c->x86_capability[7], cpuid_edx(7));
+
 		set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
 	}
 
@@ -793,9 +800,13 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
 	}
 
-	if (cpu_has(c, X86_FEATURE_AMD_IBPB))
+	if (cpu_has(c, X86_FEATURE_AMD_IBPB)) {
 		set_cpu_cap(c, X86_FEATURE_IBPB);
 
+		pr_info("%s: X86_FEATURE_AMD_IBPB: c: %px, array: 0x%x, CPUID: 0x%x\n",
+			__func__, c, c->x86_capability[7], cpuid_ebx(0x80000008));
+	}
+
 	if (cpu_has(c, X86_FEATURE_AMD_STIBP)) {
 		set_cpu_cap(c, X86_FEATURE_STIBP);
 		set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);

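Boris's second patch prints `boot_cpu_data.x86_capability[7]` because `boot_cpu_has(bit)` ends up testing one bit in that word array (word 7 presumably holds the IBPB-related flags in this kernel). Feature constants in `arch/x86/include/asm/cpufeatures.h` are written as `(word*32 + bit)`, and `setup_force_cpu_cap()` sets the bit on the boot CPU and also records it in `cpu_caps_set[]`, which `apply_forced_caps()` then ORs into each CPU's array, as the hunk above shows. The following is a simplified userspace model of that bookkeeping, not kernel code: `struct cpuinfo_model`, `MODEL_NCAPINTS` and the `model_*` functions are hypothetical, and the kernel additionally patches hot paths such as `alternative_msr_write()` at boot via alternatives instead of testing the array at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical word count; the kernel's array has NCAPINTS + NBUGINTS words. */
#define MODEL_NCAPINTS 20

struct cpuinfo_model {
	uint32_t x86_capability[MODEL_NCAPINTS];
};

/* A feature number encodes (word * 32 + bit), as in cpufeatures.h. */
bool model_cpu_has(const struct cpuinfo_model *c, unsigned int feature)
{
	assert(feature < MODEL_NCAPINTS * 32);
	return (c->x86_capability[feature / 32] >> (feature % 32)) & 1u;
}

void model_set_cpu_cap(struct cpuinfo_model *c, unsigned int feature)
{
	assert(feature < MODEL_NCAPINTS * 32);
	c->x86_capability[feature / 32] |= 1u << (feature % 32);
}

/* Like setup_force_cpu_cap(): set on the boot CPU and remember it for the others. */
void model_setup_force_cpu_cap(struct cpuinfo_model *boot,
			       uint32_t set[MODEL_NCAPINTS],
			       unsigned int feature)
{
	model_set_cpu_cap(boot, feature);
	set[feature / 32] |= 1u << (feature % 32);
}

/* Like apply_forced_caps(): clear forced-off bits, then OR in forced-on bits. */
void model_apply_forced_caps(struct cpuinfo_model *c,
			     const uint32_t cleared[MODEL_NCAPINTS],
			     const uint32_t set[MODEL_NCAPINTS])
{
	for (int i = 0; i < MODEL_NCAPINTS; i++) {
		c->x86_capability[i] &= ~cleared[i];
		c->x86_capability[i] |= set[i];
	}
}
```

The model uses `uint32_t` words for readability; the kernel's `cpu_has()` goes through `test_bit()` on the same array cast to `unsigned long *`, which on little-endian x86 selects the same bit.
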
--
Regards/Gruss,
Boris.

2019-01-10 20:32:59

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Boris, dear Thomas,


On 01/10/19 17:00, Borislav Petkov wrote:
> On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
>> Thank you very much. Indeed, the machine does not crash. I used Linus’
>> master branch for testing, and applied your patch on top. Please find
>> the full log attached.
>
>> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB
>
> This is amazing.
>
> Ok, next diff, same exercise. Thx.
> ---
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index dad12b767ba0..528ef8336f5f 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -284,6 +284,12 @@ static inline void indirect_branch_prediction_barrier(void)
> {
> u64 val = PRED_CMD_IBPB;
>
> + if (WARN_ON(boot_cpu_has(X86_FEATURE_USE_IBPB))) {
> + pr_info("%s: c: %px, array: 0x%x\n",
> + __func__, &boot_cpu_data, boot_cpu_data.x86_capability[7]);
> + return;
> + }
> +
> alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB);
> }
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 8654b8b0c848..e818e5abe611 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -371,6 +371,9 @@ spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd)
> if (boot_cpu_has(X86_FEATURE_IBPB)) {
> setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
>
> + pr_err("%s: set X86_FEATURE_USE_IBPB, c: %px, array: 0x%x\n",
> + __func__, &boot_cpu_data, boot_cpu_data.x86_capability[7]);
> +
> switch (cmd) {
> case SPECTRE_V2_USER_CMD_FORCE:
> case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index cb28e98a0659..8566737fa500 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -765,6 +765,9 @@ static void apply_forced_caps(struct cpuinfo_x86 *c)
> c->x86_capability[i] &= ~cpu_caps_cleared[i];
> c->x86_capability[i] |= cpu_caps_set[i];
> }
> +
> + if (c == &boot_cpu_data)
> + pr_info("%s: c: %px, array: 0x%x\n", __func__, c, c->x86_capability[7]);
> }
>
> static void init_speculation_control(struct cpuinfo_x86 *c)
> @@ -778,6 +781,10 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
> if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
> set_cpu_cap(c, X86_FEATURE_IBRS);
> set_cpu_cap(c, X86_FEATURE_IBPB);
> +
> + pr_info("%s: X86_FEATURE_SPEC_CTRL: c: %px, array: 0x%x, CPUID: 0x%x\n",
> + __func__, c, c->x86_capability[7], cpuid_edx(7));
> +
> set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
> }
>
> @@ -793,9 +800,13 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);
> }
>
> - if (cpu_has(c, X86_FEATURE_AMD_IBPB))
> + if (cpu_has(c, X86_FEATURE_AMD_IBPB)) {
> set_cpu_cap(c, X86_FEATURE_IBPB);
>
> + pr_info("%s: X86_FEATURE_AMD_IBPB: c: %px, array: 0x%x, CPUID: 0x%x\n",
> + __func__, c, c->x86_capability[7], cpuid_ebx(0x80000008));
> + }
> +
> if (cpu_has(c, X86_FEATURE_AMD_STIBP)) {
> set_cpu_cap(c, X86_FEATURE_STIBP);
> set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL);

Please find the logs attached.


Kind regards,

Paul


Attachments:
coreboot-ucode-updates-0x0600063e-linux-5.0-rc1+-more-debug-from-boris-spectre_v2_user-auto.log (130.43 kB)
smime.p7s (5.05 kB)
S/MIME Cryptographic Signature

2019-01-10 22:20:35

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/10/19 10:49 AM, Paul Menzel wrote:
> Dear Boris, dear Thomas,
>
>
> On 01/10/19 17:00, Borislav Petkov wrote:
>> On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
>>> Thank you very much. Indeed, the machine does not crash. I used Linus’
>>> master branch for testing, and applied your patch on top. Please find
>>> the full log attached.
>>
>>> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB
>>
>> This is amazing.
>>
>> Ok, next diff, same exercise. Thx.
>
> Please find the logs attached.

Ah, so the CPUID value is showing X86_FEATURE_AMD_IBPB (not sure why the
cpuid command was showing a value of zero for EBX in your previous email).
Let me see what I can find out about this processor/firmware relation. I
wouldn't expect to see the #GP given that the firmware says IBPB is
supported.
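
To cross-check the raw CPUID value without the cpuid tool, and on every CPU rather than only the one the tool happened to run on (this board has two sockets), a userspace sketch could read leaf 0x80000008 directly. Assumptions: GCC or Clang on x86 (for <cpuid.h> and __get_cpuid()), Linux sched_setaffinity() for pinning, and bit 12 of EBX being the IBPB bit that the kernel maps to X86_FEATURE_AMD_IBPB. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID Fn8000_0008 EBX bits that Linux maps to X86_FEATURE_AMD_* (word 13). */
#define AMD_EBX_IBPB  (1u << 12)
#define AMD_EBX_IBRS  (1u << 14)
#define AMD_EBX_STIBP (1u << 15)

/* True if the raw EBX value advertises IBPB. */
bool amd_ebx_has_ibpb(uint32_t ebx)
{
	return (ebx & AMD_EBX_IBPB) != 0;
}

/* Read EBX of leaf 0x80000008 on the current CPU; false if unavailable. */
bool read_leaf_80000008_ebx(uint32_t *ebx)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, b, ecx, edx;

	/* __get_cpuid() returns 0 if the leaf exceeds the maximum extended leaf. */
	if (!__get_cpuid(0x80000008, &eax, &b, &ecx, &edx))
		return false;
	*ebx = b;
	return true;
#else
	(void)ebx;
	return false;
#endif
}

/* Pin the calling thread to @cpu, then read EBX there. Returns 0 on success. */
int ebx_on_cpu(int cpu, uint32_t *ebx)
{
	cpu_set_t set;

	assert(ebx != NULL);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return read_leaf_80000008_ebx(ebx) ? 0 : -1;
}
```

Looping cpu from 0 to sysconf(_SC_NPROCESSORS_ONLN) - 1 and printing the EBX value and amd_ebx_has_ibpb() for each would show whether the two sockets disagree. This assumes contiguous online CPU numbers; note that the calling thread stays pinned to the last CPU afterwards.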

Thanks,
Tom

>
>
> Kind regards,
>
> Paul
>

2019-01-14 17:02:53

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/10/19 12:34 PM, Lendacky, Thomas wrote:
> On 1/10/19 10:49 AM, Paul Menzel wrote:
>> Dear Boris, dear Thomas,
>>
>>
>> On 01/10/19 17:00, Borislav Petkov wrote:
>>> On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
>>>> Thank you very much. Indeed, the machine does not crash. I used Linus’
>>>> master branch for testing, and applied your patch on top. Please find
>>>> the full log attached.
>>>
>>>> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB
>>>
>>> This is amazing.
>>>
>>> Ok, next diff, same exercise. Thx.
>>
>> Please find the logs attached.
>
> Ah, so the CPUID value is showing X86_FEATURE_AMD_IBPB (not sure why the
> cpuid command was showing a value of zero for EBX in your previous email).
> Let me see what I can find out about this processor/firmware relation. I
> wouldn't expect to see the #GP given that the firmware says IBPB is
> supported.
>

I'm not able to reproduce this issue on my family 21, model 1, stepping 2
processor (AMD Opteron(TM) Processor 6274) as I am able to successfully
write to the PRED_CMD MSR. Let's check the firmware file that you're
loading. The one I'm using is:

$ sha1sum /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
90896256951d8edf7baf8181ae11e2dc618a5171 /lib/firmware/amd-ucode/microcode_amd_fam15h.bin

Does that match what you have?
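
The faulting instruction in the original oops is a WRMSR with RCX = 0x49 (MSR_IA32_PRED_CMD) and RAX = 1 (PRED_CMD_IBPB). To repeat the PRED_CMD write test mentioned above on each CPU individually, one option is the msr driver. This is a sketch under assumptions: root privileges, the msr module loaded, and no lockdown or MSR write filtering in effect; ibpb_write_cpu() is a hypothetical helper. Because the driver performs the write with wrmsr_safe_on_cpu(), a rejected write should come back as an error (typically EIO) instead of an oops.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_IA32_PRED_CMD 0x49   /* RCX in the oops */
#define PRED_CMD_IBPB     0x1ULL /* RAX in the oops */

/* The msr driver transfers whole 64-bit MSR values. */
static_assert(sizeof(uint64_t) == 8, "MSR values are 8 bytes");

/*
 * Hypothetical helper: issue one IBPB on @cpu via /dev/cpu/<cpu>/msr.
 * Returns 0 on success or a negative errno value (for example -ENOENT
 * if the device node is missing, or typically -EIO if the CPU
 * rejected the write).
 */
int ibpb_write_cpu(int cpu)
{
	char path[64];
	uint64_t val = PRED_CMD_IBPB;
	ssize_t n;
	int fd, ret = 0;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* The file offset selects the MSR number. */
	n = pwrite(fd, &val, sizeof(val), MSR_IA32_PRED_CMD);
	if (n < 0)
		ret = -errno;
	else if (n != (ssize_t)sizeof(val))
		ret = -EIO;

	close(fd);
	return ret;
}
```

Calling this for every online CPU and printing strerror(-ret) for failures would show whether only some CPUs (for example those of one socket) refuse the write.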

Thanks,
Tom

> Thanks,
> Tom
>
>>
>>
>> Kind regards,
>>
>> Paul
>>

2019-01-14 17:12:09

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

Dear Thomas,


Thank you for checking this, and coming back with the results so quickly.

On 01/14/19 18:00, Lendacky, Thomas wrote:
> On 1/10/19 12:34 PM, Lendacky, Thomas wrote:
>> On 1/10/19 10:49 AM, Paul Menzel wrote:
>>> Dear Boris, dear Thomas,
>>>
>>>
>>> On 01/10/19 17:00, Borislav Petkov wrote:
>>>> On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
>>>>> Thank you very much. Indeed, the machine does not crash. I used Linus’
>>>>> master branch for testing, and applied your patch on top. Please find
>>>>> the full log attached.
>>>>
>>>>> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB
>>>>
>>>> This is amazing.
>>>>
>>>> Ok, next diff, same exercise. Thx.
>>>
>>> Please find the logs attached.
>>
>> Ah, so the CPUID value is showing X86_FEATURE_AMD_IBPB (not sure why the
>> cpuid command was showing a value of zero for EBX in your previous email).
>> Let me see what I can find out about this processor/firmware relation. I
>> wouldn't expect to see the #GP given that the firmware says IBPB is
>> supported.
>
> I'm not able to reproduce this issue on my family 21, model 1, stepping 2
> processor (AMD Opteron(TM) Processor 6274) as I am able to successfully
> write to the PRED_CMD MSR.

It’s not exactly the same processor, but I guess the same family should be
good enough. What board do you have? Do you have two sockets, and both
populated?

This is an Asus KGPE-D16 with two AMD Opterons installed.

Lastly, my microcode updates are applied in firmware, and not by GNU/Linux.

> Let's check the firmware file that you're loading. The one I'm using is:
>
> $ sha1sum /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
> 90896256951d8edf7baf8181ae11e2dc618a5171 /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
>
> Does that match what you have?

Yes, that matches exactly.

90896256951d8edf7baf8181ae11e2dc618a5171 3rdparty/blobs/cpu/amd/family_15h/microcode_amd_fam15h.bin


Kind regards,

Paul


Attachments:
smime.p7s (5.05 kB)
S/MIME Cryptographic Signature

2019-01-14 17:39:36

by Tom Lendacky

Subject: Re: General protection fault in `switch_mm_irqs_off()`

On 1/14/19 11:09 AM, Paul Menzel wrote:
> Dear Thomas,
>
>
> Thank you for checking this, and coming back with the results so quickly.
>
> On 01/14/19 18:00, Lendacky, Thomas wrote:
>> On 1/10/19 12:34 PM, Lendacky, Thomas wrote:
>>> On 1/10/19 10:49 AM, Paul Menzel wrote:
>>>> Dear Boris, dear Thomas,
>>>>
>>>>
>>>> On 01/10/19 17:00, Borislav Petkov wrote:
>>>>> On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
>>>>>> Thank you very much. Indeed, the machine does not crash. I used Linus’
>>>>>> master branch for testing, and applied your patch on top. Please find
>>>>>> the full log attached.
>>>>>
>>>>>> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB
>>>>>
>>>>> This is amazing.
>>>>>
>>>>> Ok, next diff, same exercise. Thx.
>>>>
>>>> Please find the logs attached.
>>>
>>> Ah, so the CPUID value is showing X86_FEATURE_AMD_IBPB (not sure why the
>>> cpuid command was showing a value of zero for EBX in your previous email).
>>> Let me see what I can find out about this processor/firmware relation. I
>>> wouldn't expect to see the #GP given that the firmware says IBPB is
>>> supported.
>>
>> I'm not able to reproduce this issue on my family 21, model 1, stepping 2
>> processor (AMD Opteron(TM) Processor 6274) as I am able to successfully
>> write to the PRED_CMD MSR.
>
> It’s not exactly the same processor, but I guess the same family should be
> good enough. What board do you have? Do you have two sockets, and both
> populated?

Yes, it is a two-socket system with two processors installed.

>
> This is an Asus KGPE-D16 with two AMD Opterons installed.
>
> Lastly, my microcode updates are applied in firmware, and not by GNU/Linux.

Ok, I was confused on how you had reported that, sorry.

Can we try an experiment where you use the older version of the Asus
firmware but build an initramfs that will perform early microcode loading?
I'm curious if things will work when loaded via Linux.
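
Whether the patch level ends up identical on all CPUs of both sockets after early loading can be checked without grepping dmesg. A sketch, assuming the kernel's microcode driver is enabled so that /sys/devices/system/cpu/cpuN/microcode/version exists and holds the revision formatted as "0x%x"; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the "0x%x\n" text found in .../cpuN/microcode/version. */
bool parse_ucode_version(const char *text, unsigned long *rev)
{
	char *end;

	assert(text != NULL && rev != NULL);
	*rev = strtoul(text, &end, 16);	/* base 16 accepts the 0x prefix */
	return end != text && (*end == '\0' || *end == '\n');
}

/* Read the patch level Linux reports for @cpu; false if unavailable. */
bool cpu_ucode_version(int cpu, unsigned long *rev)
{
	char path[96], buf[32];
	bool ok;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/microcode/version", cpu);
	f = fopen(path, "r");
	if (!f)
		return false;
	ok = fgets(buf, sizeof(buf), f) != NULL && parse_ucode_version(buf, rev);
	fclose(f);
	return ok;
}
```

Comparing the value for every CPU against 0x600063e (the patch level in the original report) would show whether any CPU was left on the old revision.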

Thanks,
Tom

>
>> Let's check the firmware file that you're loading. The one I'm using is:
>>
>> $ sha1sum /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
>> 90896256951d8edf7baf8181ae11e2dc618a5171 /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
>>
>> Does that match what you have?
>
> Yes, that matches exactly.
>
> 90896256951d8edf7baf8181ae11e2dc618a5171 3rdparty/blobs/cpu/amd/family_15h/microcode_amd_fam15h.bin
>
>
> Kind regards,
>
> Paul
>

2019-10-02 16:17:57

by Paul Menzel

Subject: Re: General protection fault in `switch_mm_irqs_off()`

[CC: +affected coreboot folks, +coreboot mailing list]

Dear Thomas,


Other affected users have discussed this issue on the coreboot mailing list [1].

On 2019-01-14 18:37, Lendacky, Thomas wrote:
> On 1/14/19 11:09 AM, Paul Menzel wrote:

>> On 01/14/19 18:00, Lendacky, Thomas wrote:
>>> On 1/10/19 12:34 PM, Lendacky, Thomas wrote:
>>>> On 1/10/19 10:49 AM, Paul Menzel wrote:
>>>>> Dear Boris, dear Thomas,
>>>>>
>>>>>
>>>>> On 01/10/19 17:00, Borislav Petkov wrote:
>>>>>> On Thu, Jan 10, 2019 at 02:57:40PM +0100, Paul Menzel wrote:
>>>>>>> Thank you very much. Indeed, the machine does not crash. I used Linus’
>>>>>>> master branch for testing, and applied your patch on top. Please find
>>>>>>> the full log attached.
>>>>>>
>>>>>>> 80.649: [ 3.197107] Spectre V2 : spectre_v2_user_select_mitigation: set X86_FEATURE_USE_IBPB
>>>>>>
>>>>>> This is amazing.
>>>>>>
>>>>>> Ok, next diff, same exercise. Thx.
>>>>>
>>>>> Please find the logs attached.
>>>>
>>>> Ah, so the CPUID value is showing X86_FEATURE_AMD_IBPB (not sure why the
>>>> cpuid command was showing a value of zero for EBX in your previous email).
>>>> Let me see what I can find out about this processor/firmware relation. I
>>>> wouldn't expect to see the #GP given that the firmware says IBPB is
>>>> supported.
>>>
>>> I'm not able to reproduce this issue on my family 21, model 1, stepping 2
>>> processor (AMD Opteron(TM) Processor 6274) as I am able to successfully
>>> write to the PRED_CMD MSR.
>>
>> It’s not exactly the same processor, but I guess the same family should be
>> good enough. What board do you have? Do you have two sockets, and both
>> populated?
>
> Yes, it is a two-socket system with two processors installed.
>
>> This is an Asus KGPE-D16 with two AMD Opterons installed.
>>
>> Lastly, my microcode updates are applied in firmware, and not by GNU/Linux.
>
> Ok, I was confused on how you had reported that, sorry.

Kinky reports that populating the memory slots of both NUMA nodes fixes this.
Kinky, what slots do you have exactly populated?

I haven’t been able to verify that yet, but please find attached the output
of `sudo dmidecode -t memory` from an affected system with 8 × 16 GB of
memory.
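
If that observation holds, the relevant variable is which NUMA nodes Linux sees memory on. A sketch for recording this alongside the dmidecode output, assuming the standard /sys/devices/system/node/nodeN/meminfo layout with lines such as "Node 0 MemTotal: ... kB"; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse a "Node N MemTotal: X kB" line from .../nodeN/meminfo. */
bool parse_node_memtotal(const char *line, int *node, unsigned long *kb)
{
	assert(line != NULL && node != NULL && kb != NULL);
	return sscanf(line, "Node %d MemTotal: %lu kB", node, kb) == 2;
}

/* MemTotal of NUMA node @node in kB, or -1 if the node cannot be read. */
long node_memtotal_kb(int node)
{
	char path[64], line[128];
	unsigned long kb;
	long ret = -1;
	int n;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/node/node%d/meminfo", node);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (parse_node_memtotal(line, &n, &kb) && n == node) {
			ret = (long)kb;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

Walking node0, node1, ... until node_memtotal_kb() returns -1 gives a per-node picture of the memory population as Linux sees it, which could be compared between crashing and working configurations.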

> Can we try an experiment where you use the older version of the Asus
> firmware but build an initramfs that will perform early microcode loading?
> I'm curious if things will work when loaded via Linux.

I believe the users reported that this works.

>>> Let's check the firmware file that you're loading. The one I'm using is:
>>>
>>> $ sha1sum /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
>>> 90896256951d8edf7baf8181ae11e2dc618a5171 /lib/firmware/amd-ucode/microcode_amd_fam15h.bin
>>>
>>> Does that match what you have?
>>
>> Yes, that matches exactly.
>>
>> 90896256951d8edf7baf8181ae11e2dc618a5171 3rdparty/blobs/cpu/amd/family_15h/microcode_amd_fam15h.bin


Kind regards,

Paul


[1]: https://mail.coreboot.org/hyperkitty/list/[email protected]/thread/QZIVOD4UADLLPZEE7MFUUTQQM343GKOC/


Attachments:
asus-kgpe-d16-128-gb-dmidecode-t-memory.txt (4.67 kB)
smime.p7s (5.05 kB)
S/MIME Cryptographic Signature