2022-08-19 13:40:39

by Paul Menzel

Subject: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Dear Linux folks,


On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:

```
[    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
[…]
[    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
[…]
[    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
[    0.235853] ------------[ cut here ]------------
[    0.235855] WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
[    0.235861] Modules linked in:
[    0.235862] CPU: 1 PID: 83 Comm: ksgxd Not tainted 5.18.0-4-amd64 #1 Debian 5.18.16-1
[    0.235865] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
[    0.235866] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.235869] Code: ff e9 f2 fe ff ff 48 89 df e8 55 56 0d 00 84 c0 0f 84 c3 fe ff ff 31 ff e8 c6 56 0d 00 84 c0 0f 85 94 fe ff ff e9 af fe ff ff <0f> 0b e9 7f fe ff ff e8 3d dd 93 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.235870] RSP: 0000:ffffaaed0097bed8 EFLAGS: 00010287
[    0.235872] RAX: ffffaaed00431890 RBX: ffff9a323ccc8000 RCX: 0000000000000000
[    0.235873] RDX: 0000000080000000 RSI: ffffaaed00431850 RDI: 00000000ffffffff
[    0.235875] RBP: ffff9a31416ca080 R08: ffff9a31416cae40 R09: ffff9a31416cae40
[    0.235876] R10: 0000000000000000 R11: 0000000000000001 R12: ffffaaed0006bce0
[    0.235877] R13: ffff9a3140e9c480 R14: ffffffff9825ee60 R15: 0000000000000000
[    0.235878] FS:  0000000000000000(0000) GS:ffff9a32e6640000(0000) knlGS:0000000000000000
[    0.235880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.235881] CR2: 0000000000000000 CR3: 00000001fbe10001 CR4: 00000000003706e0
[    0.235882] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.235883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.235884] Call Trace:
[    0.235893]  <TASK>
[    0.235895]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.235900]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.235902]  ? __kthread_parkme+0x36/0x90
[    0.235905]  kthread+0xe5/0x110
[    0.235907]  ? kthread_complete_and_exit+0x20/0x20
[    0.235909]  ret_from_fork+0x1f/0x30
[    0.235914]  </TASK>
[    0.235915] ---[ end trace 0000000000000000 ]---
```


Kind regards,

Paul


Attachments:
20220819--linux-5.18.16--dmesg-2.txt (73.55 kB)
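
For readers of the trace above: the `Code:` line marks the byte at RIP with angle brackets, and the location `ksgxd+0x1b7/0x1d0` reads as symbol, offset into the symbol, and symbol size. The following is a minimal sketch (the helper names `bytes_at_rip` and `parse_location` are made up for illustration) that extracts both. The two bytes at RIP are `0f 0b`, which is the x86 `ud2` opcode that the kernel uses to implement WARN traps.

```python
import re

# "Code:" bytes copied from the trace in this message.
CODE_LINE = (
    "ff e9 f2 fe ff ff 48 89 df e8 55 56 0d 00 84 c0 0f 84 c3 fe ff ff 31 ff "
    "e8 c6 56 0d 00 84 c0 0f 85 94 fe ff ff e9 af fe ff ff <0f> 0b e9 7f fe "
    "ff ff e8 3d dd 93 00 66 66 2e 0f 1f 84 00 00 00 00"
)


def bytes_at_rip(code_line, count=2):
    """Return `count` opcode bytes starting at the <xx> marker (the byte at RIP)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            picked = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in picked)
    raise ValueError("no <xx> marker found")


def parse_location(loc):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", loc)
    if m is None:
        raise ValueError(f"unexpected location format: {loc!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


opcode = bytes_at_rip(CODE_LINE)
symbol, offset, size = parse_location("ksgxd+0x1b7/0x1d0")
print(opcode.hex(" "), symbol, hex(offset), hex(size))
```

So the trap sits 0x1b7 bytes into `ksgxd`, close to the end of the 0x1d0-byte function, which fits a sanity check near the end of the function's setup phase.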

2022-08-19 17:28:33

by Paul Menzel

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

[Cc: [email protected]]

On 19.08.22 at 15:19, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>
> ```
> [    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> […]
> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> […]
> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> [    0.235853] ------------[ cut here ]------------
> [    0.235855] WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
> [    0.235861] Modules linked in:
> [    0.235862] CPU: 1 PID: 83 Comm: ksgxd Not tainted 5.18.0-4-amd64 #1 Debian 5.18.16-1
> [    0.235865] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> [    0.235866] RIP: 0010:ksgxd+0x1b7/0x1d0
> [    0.235869] Code: ff e9 f2 fe ff ff 48 89 df e8 55 56 0d 00 84 c0 0f 84 c3 fe ff ff 31 ff e8 c6 56 0d 00 84 c0 0f 85 94 fe ff ff e9 af fe ff ff <0f> 0b e9 7f fe ff ff e8 3d dd 93 00 66 66 2e 0f 1f 84 00 00 00 00
> [    0.235870] RSP: 0000:ffffaaed0097bed8 EFLAGS: 00010287
> [    0.235872] RAX: ffffaaed00431890 RBX: ffff9a323ccc8000 RCX: 0000000000000000
> [    0.235873] RDX: 0000000080000000 RSI: ffffaaed00431850 RDI: 00000000ffffffff
> [    0.235875] RBP: ffff9a31416ca080 R08: ffff9a31416cae40 R09: ffff9a31416cae40
> [    0.235876] R10: 0000000000000000 R11: 0000000000000001 R12: ffffaaed0006bce0
> [    0.235877] R13: ffff9a3140e9c480 R14: ffffffff9825ee60 R15: 0000000000000000
> [    0.235878] FS:  0000000000000000(0000) GS:ffff9a32e6640000(0000) knlGS:0000000000000000
> [    0.235880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.235881] CR2: 0000000000000000 CR3: 00000001fbe10001 CR4: 00000000003706e0
> [    0.235882] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.235883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    0.235884] Call Trace:
> [    0.235893]  <TASK>
> [    0.235895]  ? _raw_spin_lock_irqsave+0x24/0x60
> [    0.235900]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    0.235902]  ? __kthread_parkme+0x36/0x90
> [    0.235905]  kthread+0xe5/0x110
> [    0.235907]  ? kthread_complete_and_exit+0x20/0x20
> [    0.235909]  ret_from_fork+0x1f/0x30
> [    0.235914]  </TASK>
> [    0.235915] ---[ end trace 0000000000000000 ]---
> ```
>
>
> Kind regards,
>
> Paul

2022-08-19 19:02:53

by Dave Hansen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On 8/19/22 09:02, Paul Menzel wrote:
> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>
> ```
> [    0.000000] Linux version 5.18.0-4-amd64
> ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU
> ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC
> Debian 5.18.16-1 (2022-08-10)
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> […]
> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> […]
> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff

Hi Paul,

Would you be able to send the entire dmesg, along with:

    cat /proc/iomem # (as root)
and
    cpuid -1 --raw

I'm suspecting a BIOS problem. Reinette (cc'd) also thought this
might be a case of the SGX initialization getting a bit too far along
when it should have been disabled.

We had some bugs where we didn't stop fast enough after spitting out the
"SGX Launch Control is locked..." errors.

2022-08-20 06:18:08

by Paul Menzel

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Dear Dave,


Thank you for your quick reply.


On 19.08.22 at 20:28, Dave Hansen wrote:
> On 8/19/22 09:02, Paul Menzel wrote:
>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>
>> ```
>> [    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>> […]
>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
>> […]
>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff

> Would you be able to send the entire dmesg, along with:

The log messages are attached to the first message, where I forgot to
carbon-copy linux-sgx@ [1].

> cat /proc/iomem # (as root)
> and
> cpuid -1 --raw

I am going to provide that next week. (Side note, Intel might have some
Dell XPS 9370 test machines in some QA lab.)

> I'm suspecting a BIOS problem. Reinette (cc'd) also thought this
> might be a case of the SGX initialization getting a bit too far along
> when it should have been disabled.
>
> We had some bugs where we didn't stop fast enough after spitting out the
> "SGX Launch Control is locked..." errors.


Kind regards,

Paul


[1]:
https://lore.kernel.org/lkml/[email protected]/

2022-08-23 17:25:17

by Paul Menzel

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Dear Dave,


On 20.08.22 at 08:13, Paul Menzel wrote:

> On 19.08.22 at 20:28, Dave Hansen wrote:
>> On 8/19/22 09:02, Paul Menzel wrote:
>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>
>>> ```
>>> [    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>>> […]
>>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
>>> […]
>>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
>
>> Would you be able to send the entire dmesg, along with:
>
> The log messages are attached to the first message, where I forgot to
> carbon-copy linux-sgx@ [1].
>
>>     cat /proc/iomem # (as root)
>> and
>>     cpuid -1 --raw
>
> I am going to provide that next week. (Side note, Intel might have some
> Dell XPS 9370 test machines in some QA lab.)

Please find both outputs at the end of the file.

>> I'm suspecting a BIOS problem.  Reinette (cc'd) also thought this
>> might be a case of the SGX initialization getting a bit too far along
>> when it should have been disabled.
>>
>> We had some bugs where we didn't stop fast enough after spitting out the
>> "SGX Launch Control is locked..." errors.

Let’s hope it’s something known to you.


Kind regards,

Paul


> [1]: https://lore.kernel.org/lkml/[email protected]/


PS:

$ sudo cat /proc/iomem
[sudo] password for molgenit:
00000000-00000fff : Reserved
00001000-00057fff : System RAM
00058000-00058fff : Reserved
00059000-0009dfff : System RAM
0009e000-000fffff : Reserved
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
000a0000-000dffff : PCI Bus 0000:00
000c0000-000dffff : 0000:00:02.0
000f0000-000fffff : System ROM
00100000-2d6c4fff : System RAM
2d6c5000-2d6c5fff : ACPI Non-volatile Storage
2d6c6000-2d6c6fff : Reserved
2d6c7000-3b6acfff : System RAM
3b6ad000-3b720fff : Reserved
3b721000-3ecf1fff : System RAM
3ecf2000-3f0b1fff : Reserved
3f0b2000-3f0fefff : ACPI Tables
3f0ff000-3f7b6fff : ACPI Non-volatile Storage
3f798000-3f798fff : USBC000:00
3f7b7000-3ff25fff : Reserved
3ff26000-3fffefff : Unknown E820 type
3ffff000-3fffffff : System RAM
40000000-47ffffff : Reserved
40200000-45f7ffff : INT0E0C:00
48000000-48dfffff : System RAM
48e00000-4f7fffff : Reserved
4b800000-4f7fffff : Graphics Stolen Memory
4f800000-dfffffff : PCI Bus 0000:00
50000000-5fffffff : 0000:00:02.0
60000000-a9ffffff : PCI Bus 0000:03
ac000000-da0fffff : PCI Bus 0000:03
db000000-dbffffff : 0000:00:02.0
dc000000-dc0fffff : PCI Bus 0000:6e
dc000000-dc003fff : 0000:6e:00.0
dc000000-dc003fff : nvme
dc100000-dc1fffff : PCI Bus 0000:02
dc100000-dc101fff : 0000:02:00.0
dc100000-dc101fff : iwlwifi
dc200000-dc2fffff : PCI Bus 0000:01
dc200000-dc200fff : 0000:01:00.0
dc200000-dc200fff : rtsx_pci
dc300000-dc30ffff : 0000:00:1f.3
dc310000-dc31ffff : 0000:00:14.0
dc310000-dc31ffff : xhci-hcd
dc318070-dc31846f : intel_xhci_usb_sw
dc320000-dc327fff : 0000:00:04.0
dc320000-dc327fff : proc_thermal
dc328000-dc32bfff : 0000:00:1f.3
dc328000-dc32bfff : ICH HD audio
dc32c000-dc32ffff : 0000:00:1f.2
dc330000-dc3300ff : 0000:00:1f.4
dc331000-dc331fff : 0000:00:16.3
dc332000-dc332fff : 0000:00:16.0
dc332000-dc332fff : mei_me
dc333000-dc333fff : 0000:00:15.1
dc333000-dc3331ff : lpss_dev
dc333000-dc3331ff : i2c_designware.1 lpss_dev
dc333200-dc3332ff : lpss_priv
dc333800-dc333fff : idma64.1
dc333800-dc333fff : idma64.1 idma64.1
dc334000-dc334fff : 0000:00:15.0
dc334000-dc3341ff : lpss_dev
dc334000-dc3341ff : i2c_designware.0 lpss_dev
dc334200-dc3342ff : lpss_priv
dc334800-dc334fff : idma64.0
dc334800-dc334fff : idma64.0 idma64.0
dc335000-dc335fff : 0000:00:14.2
dc335000-dc335fff : Intel PCH thermal driver
dffe0000-dfffffff : pnp 00:05
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
e0000000-efffffff : Reserved
e0000000-efffffff : pnp 00:05
fd000000-fe7fffff : PCI Bus 0000:00
fd000000-fdabffff : pnp 00:06
fdac0000-fdacffff : INT344B:00
fdac0000-fdacffff : INT344B:00 INT344B:00
fdad0000-fdadffff : pnp 00:06
fdae0000-fdaeffff : INT344B:00
fdae0000-fdaeffff : INT344B:00 INT344B:00
fdaf0000-fdafffff : INT344B:00
fdaf0000-fdafffff : INT344B:00 INT344B:00
fdb00000-fdffffff : pnp 00:06
fdc6000c-fdc6000f : iTCO_wdt
fdc6000c-fdc6000f : iTCO_wdt iTCO_wdt
fe000000-fe010fff : Reserved
fe028000-fe028fff : pnp 00:08
fe029000-fe029fff : pnp 00:08
fe036000-fe03bfff : pnp 00:06
fe03d000-fe3fffff : pnp 00:06
fe410000-fe7fffff : pnp 00:06
fec00000-fec00fff : Reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed10000-fed17fff : pnp 00:05
fed18000-fed18fff : pnp 00:05
fed19000-fed19fff : pnp 00:05
fed20000-fed3ffff : pnp 00:05
fed40000-fed44fff : MSFT0101:00
fed40000-fed44fff : MSFT0101:00 MSFT0101:00
fed45000-fed8ffff : pnp 00:05
fed90000-fed90fff : dmar0
fed91000-fed91fff : dmar1
fee00000-fee00fff : Local APIC
fee00000-fee00fff : Reserved
ff000000-ffffffff : Reserved
ff000000-ffffffff : INT0800:00
ff000000-ffffffff : pnp 00:05
100000000-2ae7fffff : System RAM
190c00000-191801987 : Kernel code
191a00000-19225ffff : Kernel rodata
192400000-1926b57bf : Kernel data
192d2b000-1931fffff : Kernel bss
2ae800000-2afffffff : RAM buffer

$ sudo cpuid -1 --raw
CPU:
0x00000000 0x00: eax=0x00000016 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
0x00000001 0x00: eax=0x000806ea ebx=0x00100800 ecx=0x7ffafbff edx=0xbfebfbff
0x00000002 0x00: eax=0x76036301 ebx=0x00f0b5ff ecx=0x00000000 edx=0x00c30000
0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x01: eax=0x1c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x02: eax=0x1c004143 ebx=0x00c0003f ecx=0x000003ff edx=0x00000000
0x00000004 0x03: eax=0x1c03c163 ebx=0x02c0003f ecx=0x00001fff edx=0x00000006
0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
0x00000006 0x00: eax=0x000027f7 ebx=0x00000002 ecx=0x00000009 edx=0x00000000
0x00000007 0x00: eax=0x00000000 ebx=0x029c67af ecx=0x00000000 edx=0xbc002e00
0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x0000000a 0x00: eax=0x07300404 ebx=0x00000000 ecx=0x00000000 edx=0x00000603
0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
0x0000000b 0x01: eax=0x00000004 ebx=0x00000008 ecx=0x00000201 edx=0x00000000
0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000000
0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x0000000d 0x00: eax=0x0000001f ebx=0x00000440 ecx=0x00000440 edx=0x00000000
0x0000000d 0x01: eax=0x0000000f ebx=0x000003c0 ecx=0x00000100 edx=0x00000000
0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
0x0000000d 0x03: eax=0x00000040 ebx=0x000003c0 ecx=0x00000000 edx=0x00000000
0x0000000d 0x04: eax=0x00000040 ebx=0x00000400 ecx=0x00000000 edx=0x00000000
0x0000000d 0x08: eax=0x00000080 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
0x0000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x0000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000010 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000011 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000012 0x00: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x0000241f
0x00000012 0x01: eax=0x00000036 ebx=0x00000000 ecx=0x0000001f edx=0x00000000
0x00000012 0x02: eax=0x40200001 ebx=0x00000000 ecx=0x05d80001 edx=0x00000000
0x00000012 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000013 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000014 0x00: eax=0x00000001 ebx=0x0000000f ecx=0x00000007 edx=0x00000000
0x00000014 0x01: eax=0x02490002 ebx=0x003f3fff ecx=0x00000000 edx=0x00000000
0x00000015 0x00: eax=0x00000002 ebx=0x0000009e ecx=0x00000000 edx=0x00000000
0x00000016 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
0x20000000 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000121 edx=0x2c100800
0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
0x80000003 0x00: eax=0x35692029 ebx=0x3533382d ecx=0x43205530 edx=0x40205550
0x80000004 0x00: eax=0x372e3120 ebx=0x7a484730 ecx=0x00000000 edx=0x00000000
0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x01006040 edx=0x00000000
0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
0x80000008 0x00: eax=0x00003027 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80860000 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
0xc0000000 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
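
As a cross-check of the dump above, the `0x00000012 0x02` entry is the SGX sub-leaf that describes the first EPC section. The sketch below decodes it using the CPUID leaf 0x12 layout (bits 3:0 of EAX hold the section type; bits 31:12 of EAX/ECX plus bits 19:0 of EBX/EDX hold base and size). The function name `decode_epc_section` is made up for illustration.

```python
def decode_epc_section(eax, ebx, ecx, edx):
    """Decode one CPUID.(EAX=0x12, ECX>=2) sub-leaf into (type, base, size)."""
    section_type = eax & 0xF  # 0 = invalid entry, 1 = EPC section
    base = (eax & 0xFFFFF000) | ((ebx & 0xFFFFF) << 32)
    size = (ecx & 0xFFFFF000) | ((edx & 0xFFFFF) << 32)
    return section_type, base, size


# Register values from the "0x00000012 0x02" entry of the cpuid dump.
section_type, base, size = decode_epc_section(0x40200001, 0x0, 0x05D80001, 0x0)
end = base + size - 1
pages = size // 4096  # EPC pages are 4 KiB each
print(f"type={section_type} range={base:#x}-{end:#x} pages={pages}")
```

The decoded range, 0x40200000-0x45f7ffff, agrees with both the `sgx: EPC section` line in dmesg and the `INT0E0C:00` entry in /proc/iomem, so the firmware-reported EPC layout at least looks self-consistent.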

2022-08-23 19:06:15

by Dave Hansen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On 8/23/22 06:48, Paul Menzel wrote:
>>> I'm suspecting a BIOS problem.  Reinette (cc'd) also thought this
>>> might be a case of the SGX initialization getting a bit too far along
>>> when it should have been disabled.
>>>
>>> We had some bugs where we didn't stop fast enough after spitting out the
>>> "SGX Launch Control is locked..." errors.
>
> Let’s hope it’s something known to you.

Thanks for the extra debug info. Unfortunately, nothing is really
sticking out as an obvious problem.

The EREMOVE return codes would be interesting to know, as well as an
idea what the physical addresses are that fail and the _counts_ of how
many pages get sanitized versus fail.

But, I don't really have a theory about what could be going on yet.

2022-08-23 22:43:18

by Paul Menzel

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Dear Dave,


Thank you for your reply.

On 23.08.22 at 18:32, Dave Hansen wrote:
> On 8/23/22 06:48, Paul Menzel wrote:
>>>> I'm suspecting a BIOS problem.  Reinette (cc'd) also thought this
>>>> might be a case of the SGX initialization getting a bit too far along
>>>> when it should have been disabled.
>>>>
>>>> We had some bugs where we didn't stop fast enough after spitting out the
>>>> "SGX Launch Control is locked..." errors.
>>
>> Let’s hope it’s something known to you.
>
> Thanks for the extra debug info. Unfortunately, nothing is really
> sticking out as an obvious problem.
>
> The EREMOVE return codes would be interesting to know, as well as an
> idea what the physical addresses are that fail and the _counts_ of how
> many pages get sanitized versus fail.

Is there a knob to print out this information? Or way to get this
information using ftrace? I’d like to avoid rebuilding the Linux kernel.

> But, I don't really have a theory about what could be going on yet.

Kind regards,

Paul

2022-08-24 19:04:16

by Dave Hansen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On 8/23/22 15:33, Paul Menzel wrote:
>> Thanks for the extra debug info.  Unfortunately, nothing is really
>> sticking out as an obvious problem.
>>
>> The EREMOVE return codes would be interesting to know, as well as an
>> idea what the physical addresses are that fail and the _counts_ of how
>> many pages get sanitized versus fail.
>
> Is there a knob to print out this information? Or way to get this
> information using ftrace? I’d like to avoid rebuilding the Linux kernel.

You can probably do it with a kprobe and ftrace, but it's a little bit
of a pain since the ENCL* instructions are all inlined and don't get
wrapped in actual function calls.

I'd just rebuild the kernel if it were me.

Maybe we should just uninline all of the ENCL* instructions so that we
*can* more easily trace them. It's not like they are performance sensitive.

2022-08-25 02:30:53

by Haitao Huang

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Hi Paul

On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel <[email protected]>
wrote:

> Dear Dave,
>
>
> On 20.08.22 at 08:13, Paul Menzel wrote:
>
>> On 19.08.22 at 20:28, Dave Hansen wrote:
>>> On 8/19/22 09:02, Paul Menzel wrote:
>>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>>
>>>> ```
>>>> [ 0.000000] Linux version 5.18.0-4-amd64
>>>> ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0,
>>>> GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP
>>>> PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>>>> [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
>>>> root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>>>> […]
>>>> [ 0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
>>>> 07/06/2022
>>>> […]
>>>> [ 0.235418] sgx: EPC section 0x40200000-0x45f7ffff
>>
>>> Would you be able to send the entire dmesg, along with:
>> The log messages are attached to the first message, where I forgot to
>> carbon-copy linux-sgx@ [1].
>>
>>> cat /proc/iomem # (as root)
>>> and
>>> cpuid -1 --raw
>> I am going to provide that next week. (Side note, Intel might have
>> some Dell XPS 9370 test machines in some QA lab.)
>
> Please find both outputs at the end of the file.
>

Could you also check the output of "sudo rdmsr -x 0x3a"?
Also, was CONFIG_X86_SGX_KVM set?

If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3A is not set,
then I think the following sequence during sgx_init is possible:

sgx_page_cache_init -> sgx_setup_epc_section
    -> puts all physical EPC pages in sgx_dirty_page_list.
Kick off ksgxd.
Later, sgx_drv_init returns non-zero due to this check:
    if (!cpu_feature_enabled(X86_FEATURE_SGX_LC))
        return -ENODEV;
sgx_vepc_init also returns non-zero if CONFIG_X86_SGX_KVM was not set.

And sgx_init will call kthread_stop(ksgxd_tsk):
    ret = sgx_drv_init();

    if (sgx_vepc_init() && ret)
        goto err_provision;
    ...
err_provision:
    misc_deregister(&sgx_dev_provision);

err_kthread:
    kthread_stop(ksgxd_tsk);


That makes __sgx_sanitize_pages return early due to these lines:
    /* dirty_page_list is thread-local, no need for a lock: */
    while (!list_empty(dirty_page_list)) {
        if (kthread_should_stop())
            return;

And that would trigger (depending on timing?) the warning in ksgxd due to
a non-empty sgx_dirty_page_list at that moment.

Thanks
Haitao
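
The timing dependence in the sequence above can be illustrated with a small userspace model. This is only a toy sketch, not kernel code: `sanitize_pages` and `warning_would_fire` are hypothetical names, popping from a deque stands in for a successful EREMOVE, and `stop_after` models how many times the loop polls for a stop request before `kthread_stop()` lands.

```python
from collections import deque


def sanitize_pages(dirty_pages, should_stop):
    """Toy model of the quoted loop: drain the list unless asked to stop."""
    while dirty_pages:
        if should_stop():
            return  # early exit leaves the remaining pages on the list
        dirty_pages.popleft()  # stands in for a successful EREMOVE


def warning_would_fire(num_pages, stop_after):
    """Return True if the list is non-empty once sanitization returns.

    stop_after: number of stop polls that pass before a stop is requested.
    """
    dirty_pages = deque(range(num_pages))
    polls = 0

    def should_stop():
        nonlocal polls
        polls += 1
        return polls > stop_after

    sanitize_pages(dirty_pages, should_stop)
    return len(dirty_pages) > 0


print(warning_would_fire(num_pages=8, stop_after=100))  # stop arrives too late to matter
print(warning_would_fire(num_pages=8, stop_after=3))    # stop arrives mid-sanitization
```

In this model the sanity check only trips when the stop request arrives before the list has been drained, which matches the "depending on timing" caveat: on a machine where sgx_init fails quickly, the stop can easily win the race.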

2022-08-25 05:28:50

by Jarkko Sakkinen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On Thu, Aug 25, 2022 at 07:57:30AM +0300, Jarkko Sakkinen wrote:
> On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
> > On 8/19/22 09:02, Paul Menzel wrote:
> > > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > >
> > > ```
> > > [    0.000000] Linux version 5.18.0-4-amd64
> > > ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU
> > > ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC
> > > Debian 5.18.16-1 (2022-08-10)
> > > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> > > root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > > […]
> > > [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> > > […]
> > > [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> >
> > Hi Paul,
> >
> > Would you be able to send the entire dmesg, along with:
> >
> > cat /proc/iomem # (as root)
> > and
> > cpuid -1 --raw
> >
> > > I'm suspecting a BIOS problem. Reinette (cc'd) also thought this
> > might be a case of the SGX initialization getting a bit too far along
> > when it should have been disabled.
> >
> > We had some bugs where we didn't stop fast enough after spitting out the
> > "SGX Launch Control is locked..." errors.
>
> For some reason the pages do not get properly sanitized:
>
> /* sanity check: */
> WARN_ON(!list_empty(&sgx_dirty_page_list));
>
> EPC should be good, given that EREMOVE does not fail.
> If SGX were disabled, EREMOVE should also fail.

Sorry, I forgot that we never print the error code inside
__sgx_sanitize_pages(). I wrote a quick patch to address
this (attached) [*].

Paul,

Any chance you could try the patch out? It's pretty hard to attach
e.g. a kprobe to grab this info. Does it reproduce every single
time?

Alternatively: what kind of workload is triggering this?
I own a 2020 model XPS 13, which might be able to
reproduce the same issue.

[*] Also: https://lore.kernel.org/linux-sgx/[email protected]/T/#u

BR, Jarkko


Attachments:
(No filename) (2.06 kB)
0001-x86-sgx-Print-EREMOVE-return-value-in-__sgx_sanitize.patch (2.08 kB)

2022-08-25 05:38:34

by Jarkko Sakkinen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
> On 8/19/22 09:02, Paul Menzel wrote:
> > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> >
> > ```
> > [    0.000000] Linux version 5.18.0-4-amd64
> > ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU
> > ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC
> > Debian 5.18.16-1 (2022-08-10)
> > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> > root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > […]
> > [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> > […]
> > [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
>
> Hi Paul,
>
> Would you be able to send the entire dmesg, along with:
>
> cat /proc/iomem # (as root)
> and
> cpuid -1 --raw
>
> I'm suspecting a BIOS problem. Reinette (cc'd) also thought this
> might be a case of the SGX initialization getting a bit too far along
> when it should have been disabled.
>
> We had some bugs where we didn't stop fast enough after spitting out the
> "SGX Launch Control is locked..." errors.

For some reason the pages do not get properly sanitized:

/* sanity check: */
WARN_ON(!list_empty(&sgx_dirty_page_list));

EPC should be good, given that EREMOVE does not fail.
If SGX were disabled, EREMOVE should also fail.

BR, Jarkko

2022-08-25 05:58:08

by Jarkko Sakkinen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On Wed, Aug 24, 2022 at 12:33:07AM +0200, Paul Menzel wrote:
> Dear Dave,
>
>
> Thank you for your reply.
>
> On 23.08.22 at 18:32, Dave Hansen wrote:
> > On 8/23/22 06:48, Paul Menzel wrote:
> > > > > I'm suspecting a BIOS problem.  Reinette (cc'd) also thought this
> > > > > might be a case of the SGX initialization getting a bit too far along
> > > > > when it should have been disabled.
> > > > >
> > > > > We had some bugs where we didn't stop fast enough after spitting out the
> > > > > "SGX Launch Control is locked..." errors.
> > >
> > > Let’s hope it’s something known to you.
> >
> > Thanks for the extra debug info. Unfortunately, nothing is really
> > sticking out as an obvious problem.
> >
> > The EREMOVE return codes would be interesting to know, as well as an
> > idea what the physical addresses are that fail and the _counts_ of how
> > many pages get sanitized versus fail.
>
> Is there a knob to print out this information? Or way to get this
> information using ftrace? I’d like to avoid rebuilding the Linux kernel.

Since __sgx_sanitize_pages() is a local symbol, it's not possible
to attach a kprobe to it, so we actually do require a code change
to see inside.

BR, Jarkko

2022-08-25 06:30:28

by Jarkko Sakkinen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On Wed, Aug 24, 2022 at 09:12:06PM -0500, Haitao Huang wrote:
> Hi Paul
>
> On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel <[email protected]>
> wrote:
>
> > Dear Dave,
> >
> >
> > On 20.08.22 at 08:13, Paul Menzel wrote:
> >
> > > On 19.08.22 at 20:28, Dave Hansen wrote:
> > > > On 8/19/22 09:02, Paul Menzel wrote:
> > > > > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > > > >
> > > > > ```
> > > > > [ 0.000000] Linux version 5.18.0-4-amd64
> > > > > ([email protected]) (gcc-11 (Debian 11.3.0-5)
> > > > > 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713)
> > > > > #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
> > > > > [ 0.000000] Command line:
> > > > > BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> > > > > root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > > > > […]
> > > > > [ 0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS
> > > > > 1.21.0 07/06/2022
> > > > > […]
> > > > > [ 0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> > >
> > > > Would you be able to send the entire dmesg, along with:
> > > The log messages are attached to the first message, where I forgot
> > > to carbon-copy linux-sgx@ [1].
> > >
> > > > cat /proc/iomem # (as root)
> > > > and
> > > > cpuid -1 --raw
> > > I am going to provide that next week. (Side note, Intel might have
> > > some Dell XPS 9370 test machines in some QA lab.)
> >
> > Please find both outputs at the end of the file.
> >
>
> Could you also check the output of "sudo rdmsr -x 0x3a"?
> Also, was CONFIG_X86_SGX_KVM set?
>
> If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3A is not set,
> then I think the following sequence during sgx_init is possible:
>
> sgx_page_cache_init -> sgx_setup_epc_section
>     -> puts all physical EPC pages in sgx_dirty_page_list.
> Kick off ksgxd.
> Later, sgx_drv_init returns non-zero due to this check:
>     if (!cpu_feature_enabled(X86_FEATURE_SGX_LC))
>         return -ENODEV;
> sgx_vepc_init also returns non-zero if CONFIG_X86_SGX_KVM was not set.
>
> And sgx_init will call kthread_stop(ksgxd_tsk):
>     ret = sgx_drv_init();
>
>     if (sgx_vepc_init() && ret)
>         goto err_provision;
>     ...
> err_provision:
>     misc_deregister(&sgx_dev_provision);
>
> err_kthread:
>     kthread_stop(ksgxd_tsk);
>
>
> That makes __sgx_sanitize_pages return early due to these lines:
>     /* dirty_page_list is thread-local, no need for a lock: */
>     while (!list_empty(dirty_page_list)) {
>         if (kthread_should_stop())
>             return;
>
> And that would trigger (depending on timing?) the warning in ksgxd due to
> a non-empty sgx_dirty_page_list at that moment.

You're correct, and it's not a bug but completely legit behaviour.

And given that a non-empty dirty page list is legit behaviour, WARN_ON()
is not what should be used here.

Fix coming in a bit.

BR, Jarkko
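
The two feature checks mentioned in the quoted analysis can be decoded by hand. The sketch below assumes the usual bit layout: in IA32_FEATURE_CONTROL (MSR 0x3A) bit 0 is the lock bit, bit 17 enables SGX Launch Control and bit 18 enables SGX; in CPUID leaf 7, EBX bit 2 reports SGX and ECX bit 30 reports SGX_LC. The MSR value `40005` is a made-up example, since no rdmsr output has been posted in this thread; the CPUID values are taken from the raw dump posted earlier. Function names are hypothetical.

```python
FEAT_CTL_LOCKED = 1 << 0           # MSR 0x3A lock bit
FEAT_CTL_SGX_LC_ENABLED = 1 << 17  # SGX Launch Control enable
FEAT_CTL_SGX_ENABLED = 1 << 18     # SGX global enable


def decode_feature_control(rdmsr_hex):
    """Decode the hex string printed by `rdmsr -x 0x3a`."""
    value = int(rdmsr_hex, 16)
    return {
        "locked": bool(value & FEAT_CTL_LOCKED),
        "sgx_lc_enabled": bool(value & FEAT_CTL_SGX_LC_ENABLED),
        "sgx_enabled": bool(value & FEAT_CTL_SGX_ENABLED),
    }


def cpuid_leaf7_sgx_bits(ebx, ecx):
    """Extract the SGX and SGX_LC feature bits from CPUID leaf 7, sub-leaf 0."""
    return {"sgx": bool(ebx & (1 << 2)), "sgx_lc": bool(ecx & (1 << 30))}


# Hypothetical MSR value for illustration only (lock + SGX set, SGX_LC clear).
msr_bits = decode_feature_control("40005")
# Leaf 7 values from the raw cpuid dump: ebx=0x029c67af, ecx=0x00000000.
cpuid_bits = cpuid_leaf7_sgx_bits(0x029C67AF, 0x00000000)
print(msr_bits)
print(cpuid_bits)
```

With the posted leaf 7 values, SGX is reported but SGX_LC is not, which is consistent with the scenario where sgx_drv_init bails out with -ENODEV. The actual MSR contents on the affected machine still need to be confirmed with rdmsr.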

2022-08-25 07:43:52

by Paul Menzel

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Dear Jarkko,


On 25.08.22 at 07:25, Jarkko Sakkinen wrote:
> On Thu, Aug 25, 2022 at 07:57:30AM +0300, Jarkko Sakkinen wrote:
>> On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
>>> On 8/19/22 09:02, Paul Menzel wrote:
>>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>>
>>>> ```
>>>> [    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>>>> […]
>>>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
>>>> […]
>>>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff

>>> Would you be able to send the entire dmesg, along with:
>>>
>>> cat /proc/iomem # (as root)
>>> and
>>> cpuid -1 --raw
>>>
>>> I suspect a BIOS problem. Reinette (cc'd) also thought this
>>> might be a case of the SGX initialization getting a bit too far along
>>> when it should have been disabled.
>>>
>>> We had some bugs where we didn't stop fast enough after spitting out the
>>> "SGX Launch Control is locked..." errors.
>>
>> For some reason the pages do not get properly sanitized:
>>
>> /* sanity check: */
>> WARN_ON(!list_empty(&sgx_dirty_page_list));
>>
>> EPC should be good, given that EREMOVE does not fail.
>> If SGX were disabled, EREMOVE should fail as well.
>
> Sorry, I forgot that we never print the error code inside
> __sgx_sanitize_pages(). I wrote a quick patch to address
> this (attached) [*].
>
> Paul,
>
> Any chance to try the patch out?

Yes, I am going to try it in the next few days.

> It's pretty hard to attach e.g. kprobe to grab this info. Does it
> reproduce every single time?
Yes, on each boot up.

> Alternatively: what kind of workload is triggering this?
> I do own 2020 model XPS13, which might be able to
> reproduce the same issue.

The Dell XPS 13 9370 is from 2018 (Intel i5-8350U), so no idea if it
happens with later processors.


Kind regards,

Paul


> [*] Also: https://lore.kernel.org/linux-sgx/[email protected]/T/#u

2022-08-25 08:46:49

by Jarkko Sakkinen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On Thu, Aug 25, 2022 at 08:46:19AM +0200, Paul Menzel wrote:
> Dear Jarkko,
>
>
> On 25.08.22 at 07:25, Jarkko Sakkinen wrote:
> > On Thu, Aug 25, 2022 at 07:57:30AM +0300, Jarkko Sakkinen wrote:
> > > On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
> > > > On 8/19/22 09:02, Paul Menzel wrote:
> > > > > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > > > >
> > > > > ```
> > > > > [    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
> > > > > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > > > > […]
> > > > > [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> > > > > […]
> > > > > [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
>
> > > > Would you be able to send the entire dmesg, along with:
> > > >
> > > > cat /proc/iomem # (as root)
> > > > and
> > > > cpuid -1 --raw
> > > >
> > > > I suspect a BIOS problem. Reinette (cc'd) also thought this
> > > > might be a case of the SGX initialization getting a bit too far along
> > > > when it should have been disabled.
> > > >
> > > > We had some bugs where we didn't stop fast enough after spitting out the
> > > > "SGX Launch Control is locked..." errors.
> > >
> > > For some reason the pages do not get properly sanitized:
> > >
> > > /* sanity check: */
> > > WARN_ON(!list_empty(&sgx_dirty_page_list));
> > >
> > > EPC should be good, given that EREMOVE does not fail.
> > > If SGX were disabled, EREMOVE should fail as well.
> >
> > Sorry, I forgot that we never print the error code inside
> > __sgx_sanitize_pages(). I wrote a quick patch to address
> > this (attached) [*].
> >
> > Paul,
> >
> > Any chance to try the patch out?
>
> Yes, I am going to try it in the next few days.
>
> > It's pretty hard to attach e.g. kprobe to grab this info. Does it
> > reproduce every single time?
> Yes, on each boot up.
>
> > Alternatively: what kind of workload is triggering this?
> > I do own 2020 model XPS13, which might be able to
> > reproduce the same issue.
>
> The Dell XPS 13 9370 is from 2018 (Intel i5-8350U), so no idea if it happens
> with later processors.

I think this should work out, and actually fix the issue:

https://lore.kernel.org/linux-sgx/[email protected]/T/#u

Just to add, in case it helps with some future issue: I think my laptop and
yours are comparable because the SGX side is pretty much the same. Ice Lake
is less comparable because it uses a different type of encryption engine in
the hardware.

BR, Jarkko

2022-08-25 08:51:37

by Jarkko Sakkinen

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

On Thu, Aug 25, 2022 at 08:49:53AM +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 24, 2022 at 09:12:06PM -0500, Haitao Huang wrote:
> > Hi Paul
> >
> > On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel <[email protected]>
> > wrote:
> >
> > > Dear Dave,
> > >
> > >
> > > On 20.08.22 at 08:13, Paul Menzel wrote:
> > >
> > > > On 19.08.22 at 20:28, Dave Hansen wrote:
> > > > > On 8/19/22 09:02, Paul Menzel wrote:
> > > > > > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > > > > >
> > > > > > ```
> > > > > > [ 0.000000] Linux version 5.18.0-4-amd64
> > > > > > ([email protected]) (gcc-11 (Debian 11.3.0-5)
> > > > > > 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713)
> > > > > > #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
> > > > > > [ 0.000000] Command line:
> > > > > > BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> > > > > > root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > > > > > […]
> > > > > > [ 0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS
> > > > > > 1.21.0 07/06/2022
> > > > > > […]
> > > > > > [ 0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> > > >
> > > > > Would you be able to send the entire dmesg, along with:
> > > > The log messages are attached to the first message, where I forgot
> > > > to carbon-copy linux-sgx@ [1].
> > > >
> > > > > cat /proc/iomem # (as root)
> > > > > and
> > > > > cpuid -1 --raw
> > > > I am going to provide that next week. (Side note, Intel might have
> > > > some Dell XPS 9370 test machines in some QA lab.)
> > >
> > > Please find both outputs at the end of the file.
> > >
> >
> > Could you also check output of "sudo rdmsr -x 0x3a"?
> > Also was CONFIG_X86_SGX_KVM set?
> >
> > If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3a is not set,
> > then I think the following sequence during sgx_init is possible:
> >
> > sgx_page_cache_init -> sgx_setup_epc_section
> >   -> puts all physical EPC pages in sgx_dirty_page_list.
> > Kick off ksgxd.
> > Later, sgx_drv_init returns non-zero due to this check:
> >     if (!cpu_feature_enabled(X86_FEATURE_SGX_LC))
> >         return -ENODEV;
> > sgx_vepc_init also returns non-zero if CONFIG_X86_SGX_KVM was not set.
> >
> > And sgx_init will call kthread_stop(ksgxd_tsk):
> >     ret = sgx_drv_init();
> >
> >     if (sgx_vepc_init() && ret)
> >         goto err_provision;
> >     ...
> > err_provision:
> >     misc_deregister(&sgx_dev_provision);
> >
> > err_kthread:
> >     kthread_stop(ksgxd_tsk);
> >
> >
> > That makes __sgx_sanitize_pages return early due to these lines:
> >     /* dirty_page_list is thread-local, no need for a lock: */
> >     while (!list_empty(dirty_page_list)) {
> >         if (kthread_should_stop())
> >             return;
> >
> > And that would trigger (depending on timing?) the warning in ksgxd due to
> > a non-empty sgx_dirty_page_list at that moment.
>
> You're correct, and it's not a bug but completely legit behaviour.
>
> And given that a non-empty dirty page list is legitimate behaviour, WARN_ON()
> is not what should be used here.
>
> Fix coming in a bit.

https://lore.kernel.org/linux-sgx/[email protected]/T/#u

BR, Jarkko

2022-08-26 09:56:25

by Paul Menzel

Subject: Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0

Dear Haitao,


Thank you for your reply. Just for the record:

On 25.08.22 at 04:12, Haitao Huang wrote:

> On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel wrote:

>> On 20.08.22 at 08:13, Paul Menzel wrote:
>>
>>> On 19.08.22 at 20:28, Dave Hansen wrote:
>>>> On 8/19/22 09:02, Paul Menzel wrote:
>>>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>>>
>>>>> ```
>>>>> [    0.000000] Linux version 5.18.0-4-amd64 ([email protected]) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>>>>> […]
>>>>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
>>>>> […]
>>>>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
>>>
>>>> Would you be able to send the entire dmesg, along with:
>>>  The log messages are attached to the first message, where I forgot to
>>> carbon-copy linux-sgx@ [1].
>>>
>>>>     cat /proc/iomem # (as root)
>>>> and
>>>>     cpuid -1 --raw
>>>  I am going to provide that next week. (Side note, Intel might have
>>> some Dell XPS 9370 test machines in some QA lab.)
>>
>> Please find both outputs at the end of the file.
>
> Could you also check output of "sudo rdmsr -x 0x3a"?

40005

> Also was CONFIG_X86_SGX_KVM set?

No, it’s not set in Debian’s Linux kernel configuration.

> If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3a is not set,
> then I think the following sequence during sgx_init is possible:

As "rdmsr -x" prints hexadecimal, the value is 0x40005 (bits 0, 2, and 18 are set), so bit 17 (counting from 0) is 0.
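
For reference, the bit test can be written out in a few lines of C. This is a minimal sketch under stated assumptions: the output of "rdmsr -x" is hexadecimal, so the reading is taken as 0x40005, and the bit positions follow the Intel SDM layout of IA32_FEATURE_CONTROL (MSR 0x3a), with bit 0 as the lock bit, bit 17 as SGX launch control enable, and bit 18 as SGX global enable. The macro and function names are made up for illustration and are not the kernel's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in IA32_FEATURE_CONTROL (MSR 0x3a); names are illustrative. */
#define FEAT_CTL_LOCKED_BIT        0
#define FEAT_CTL_SGX_LC_BIT       17
#define FEAT_CTL_SGX_ENABLED_BIT  18

/* Hypothetical helper: test a single bit of a 64-bit MSR value. */
static bool msr_bit_set(uint64_t msr, unsigned int bit)
{
	assert(bit < 64);
	return (msr >> bit) & 1;
}

/* Hypothetical helper: SGX is enabled, but launch control is not. */
static bool sgx_enabled_without_lc(uint64_t msr)
{
	return msr_bit_set(msr, FEAT_CTL_SGX_ENABLED_BIT) &&
	       !msr_bit_set(msr, FEAT_CTL_SGX_LC_BIT);
}
```

Calling `sgx_enabled_without_lc(0x40005)` returns true, which matches the case described in the quoted analysis: SGX is enabled and the MSR is locked, while bit 17 is clear.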

[…]


Kind regards,

Paul