2024-04-13 13:06:26

by Michael Schierl

Subject: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

[please cc: me as I am not subscribed to either mailing list]

Hello,


I am writing to you as Jean is listed as the maintainer of DMI, and the
rest are listed as maintainers of the Hyper-V drivers. If I should have
written elsewhere, please kindly point me to the correct location.

I am having issues running 32-bit Debian (kernel 6.1.0) on Hyper-V on
Windows 11 (10.0.22631.3447) when the virtual machine has more than one
vCPU assigned. The kernel does not boot and no output is shown on screen.

I was able to redirect early printk to serial port and capture this panic:

> early console in setup code
> Probing EDD (edd=off to disable)... ok
> [ 0.000000] Linux version 6.1.0-18-686-pae ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01)
> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffeffff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000007fff0000-0x000000007fffefff] ACPI data
> [ 0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] ACPI NVS
> [ 0.000000] printk: bootconsole [earlyser0] enabled
> [ 0.000000] NX (Execute Disable) protection: active
> [ 0.000000] BUG: unable to handle page fault for address: ffa45000
> [ 0.000000] #PF: supervisor read access in kernel mode
> [ 0.000000] #PF: error_code(0x0000) - not-present page
> [ 0.000000] *pdpt = 000000000fe74001
> [ 0.000000] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-18-686-pae #1 Debian 6.1.76-1
> [ 0.000000] EIP: dmi_decode+0x2e3/0x40e
> [ 0.000000] Code: 10 53 e8 b8 f9 ff ff 83 c4 0c e9 3e 01 00 00 0f b6 7e 01 31 db 83 ef 04 d1 ef 39 df 0f 8e 2b 01 00 00 8a 4c 5e 04 84 c9 79 1e <0f> b6 54 5e 05 89 f0 88 4d f0 e8 c0 f7 ff ff 8a 4d f0 89 c2 89 c8
> [ 0.000000] EAX: cff6d220 EBX: 000024bd ECX: cfd2caff EDX: cf9e942c
> [ 0.000000] ESI: ffa40681 EDI: 7ffffffe EBP: cfc37e90 ESP: cfc37e80
> [ 0.000000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210086
> [ 0.000000] CR0: 80050033 CR2: ffa45000 CR3: 0fe78000 CR4: 00000020
> [ 0.000000] Call Trace:
> [ 0.000000] ? __die_body.cold+0x14/0x1a
> [ 0.000000] ? __die+0x21/0x26
> [ 0.000000] ? page_fault_oops+0x69/0x120
> [ 0.000000] ? uuid_string+0x157/0x1a0
> [ 0.000000] ? kernelmode_fixup_or_oops.constprop.0+0x80/0xe0
> [ 0.000000] ? __bad_area_nosemaphore.constprop.0+0xfc/0x130
> [ 0.000000] ? bad_area_nosemaphore+0xf/0x20
> [ 0.000000] ? do_kern_addr_fault+0x79/0x90
> [ 0.000000] ? exc_page_fault+0xbc/0x160
> [ 0.000000] ? paravirt_BUG+0x10/0x10
> [ 0.000000] ? handle_exception+0x133/0x133
> [ 0.000000] ? dmi_disable_osi_vista+0x1/0x37
> [ 0.000000] ? paravirt_BUG+0x10/0x10
> [ 0.000000] ? dmi_decode+0x2e3/0x40e
> [ 0.000000] ? dmi_disable_osi_vista+0x1/0x37
> [ 0.000000] ? paravirt_BUG+0x10/0x10
> [ 0.000000] ? dmi_decode+0x2e3/0x40e
> [ 0.000000] ? dmi_smbios3_present+0xd8/0xd8
> [ 0.000000] dmi_decode_table+0xa9/0xe0
> [ 0.000000] ? dmi_smbios3_present+0xd8/0xd8
> [ 0.000000] ? dmi_smbios3_present+0xd8/0xd8
> [ 0.000000] dmi_walk_early+0x34/0x58
> [ 0.000000] dmi_present+0x149/0x1b6
> [ 0.000000] dmi_setup+0x18d/0x22e
> [ 0.000000] setup_arch+0x676/0xd3f
> [ 0.000000] ? lockdown_lsm_init+0x1c/0x20
> [ 0.000000] ? initialize_lsm+0x33/0x4e
> [ 0.000000] start_kernel+0x65/0x644
> [ 0.000000] ? set_intr_gate+0x45/0x58
> [ 0.000000] ? early_idt_handler_common+0x44/0x44
> [ 0.000000] i386_start_kernel+0x48/0x4a
> [ 0.000000] startup_32_smp+0x161/0x164
> [ 0.000000] Modules linked in:
> [ 0.000000] CR2: 00000000ffa45000
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] EIP: dmi_decode+0x2e3/0x40e
> [ 0.000000] Code: 10 53 e8 b8 f9 ff ff 83 c4 0c e9 3e 01 00 00 0f b6 7e 01 31 db 83 ef 04 d1 ef 39 df 0f 8e 2b 01 00 00 8a 4c 5e 04 84 c9 79 1e <0f> b6 54 5e 05 89 f0 88 4d f0 e8 c0 f7 ff ff 8a 4d f0 89 c2 89 c8
> [ 0.000000] EAX: cff6d220 EBX: 000024bd ECX: cfd2caff EDX: cf9e942c
> [ 0.000000] ESI: ffa40681 EDI: 7ffffffe EBP: cfc37e90 ESP: cfc37e80
> [ 0.000000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210086
> [ 0.000000] CR0: 80050033 CR2: ffa45000 CR3: 0fe78000 CR4: 00000020
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

The same panic can be reproduced with vanilla 6.8.4 kernel.

After adding some (or rather a lot of) printk calls to dmi_scan.c, I
believe the issue is caused by this line:

<https://github.com/torvalds/linux/blob/13a0ac816d22aa47d6c393f14a99f39e49b960df/drivers/firmware/dmi_scan.c#L295>

Or rather by a dmi_header with dm->type == 10 and dm->length == 0.

As the length is (unsigned) zero, subtracting the (unsigned) header
length and dividing by two leaves count slightly below the signed
integer maximum (and it stays there after being cast to signed), so the
loop runs "forever" until it reaches non-mapped memory, producing the
panic above.
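
For illustration, a minimal standalone sketch of that arithmetic (the
struct layout mirrors struct dmi_header from include/linux/dmi.h, and
the count expression is the one from dmi_save_devices() in dmi_scan.c;
compile with -m32 vs. -m64 to see the difference):

#include <stdio.h>
#include <stdint.h>

/* mirrors the kernel's struct dmi_header: 4 bytes in total */
struct dmi_header {
    uint8_t  type;
    uint8_t  length;
    uint16_t handle;
};

int main(void)
{
    struct dmi_header hdr = { .type = 10, .length = 0, .handle = 5 };
    struct dmi_header *dm = &hdr;

    /* same expression as in dmi_save_devices() */
    int count = (dm->length - sizeof(struct dmi_header)) / 2;

    /* -m32: count == 0x7ffffffe, a huge positive number
       -m64: count == -2, so the decode loop never runs   */
    printf("count = %d\n", count);
    return 0;
}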


I am unsure who is the culprit here: whether a DMI header must never
have length zero, or whether Linux is supposed to parse such a header
more gracefully.

In any case, when I add an extra if clause to this function to return
early when dm->length is zero, the system boots fine and appears to
work fine at first glance. As I unfortunately have no idea what DMI is
used for by the kernel, I do not know if there are any other things I
should test, since the "Onboard device information" is obviously missing.
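
For reference, a sketch of the kind of guard just described, placed at
the top of the affected function (the exact patch is not shown in this
thread, so the placement and form here are purely illustrative):

static void __init dmi_save_devices(const struct dmi_header *dm)
{
    int i, count;

    /* bogus zero-length entry: nothing to parse, bail out early */
    if (!dm->length)
        return;

    count = (dm->length - sizeof(struct dmi_header)) / 2;

    for (i = 0; i < count; i++) {
        ...
    }
}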


If I should perform other tests, please tell me. Otherwise I hope that
either an update of Hyper-V or the Linux kernel (or maybe some kernel
parameter I missed) can make 32-bit Linux bootable on Hyper-V again in
the future.

[Slightly off-topic: As 64-bit kernels work fine, if there are ways to
run a 32-bit userland containerized or chrooted under a 64-bit kernel so
that the userland (especially uname and autoconf) cannot distinguish it
from a 32-bit kernel, that might be another option for my use case.
Nested virtualization would of course also work, but the performance
loss due to nested virtualization negates the benefit of being able to
pass more than one of the (2 physical, 4 hyperthreaded) cores of my
laptop to the VM.]



Thanks for help and best regards,


Michael


2024-04-15 03:18:02

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Schierl <[email protected]> Sent: Saturday, April 13, 2024 6:06 AM
>
> I am writing to you as Jean is listed as maintainer of dmi, and the rest
> are listed as maintainer for Hyper-V drivers. If I should have written
> elsewhere, please kindly point me to the correct location.
>
> I am having issues running 32-bit Debian (kernel 6.1.0) on Hyper-V on
> Windows 11 (10.0.22631.3447) when the virtual machine has assigned more
> than one vCPU. The kernel does not boot and no output is shown on screen.
>
> I was able to redirect early printk to serial port and capture this panic:
>
> > early console in setup code
> > Probing EDD (edd=off to disable)... ok
> > [ 0.000000] Linux version 6.1.0-18-686-pae ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01)
> > [ 0.000000] BIOS-provided physical RAM map:
> > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffeffff] usable
> > [ 0.000000] BIOS-e820: [mem 0x000000007fff0000-0x000000007fffefff] ACPI data
> > [ 0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] ACPI NVS
> > [ 0.000000] printk: bootconsole [earlyser0] enabled
> > [ 0.000000] NX (Execute Disable) protection: active
> > [ 0.000000] BUG: unable to handle page fault for address: ffa45000
> > [ 0.000000] #PF: supervisor read access in kernel mode
> > [ 0.000000] #PF: error_code(0x0000) - not-present page
> > [ 0.000000] *pdpt = 000000000fe74001
> > [ 0.000000] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-18-686-pae #1 Debian 6.1.76-1
> > [ 0.000000] EIP: dmi_decode+0x2e3/0x40e
> > [ 0.000000] Code: 10 53 e8 b8 f9 ff ff 83 c4 0c e9 3e 01 00 00 0f b6 7e 01 31 db 83 ef 04 d1 ef 39 df 0f 8e 2b 01 00 00 8a 4c 5e 04 84 c9 79 1e <0f> b6 54 5e 05 89 f0 88 4d f0 e8 c0 f7 ff ff 8a 4d f0 89 c2 89 c8
> > [ 0.000000] EAX: cff6d220 EBX: 000024bd ECX: cfd2caff EDX: cf9e942c
> > [ 0.000000] ESI: ffa40681 EDI: 7ffffffe EBP: cfc37e90 ESP: cfc37e80
> > [ 0.000000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210086
> > [ 0.000000] CR0: 80050033 CR2: ffa45000 CR3: 0fe78000 CR4: 00000020
> > [ 0.000000] Call Trace:
> > [ 0.000000] ? __die_body.cold+0x14/0x1a
> > [ 0.000000] ? __die+0x21/0x26
> > [ 0.000000] ? page_fault_oops+0x69/0x120
> > [ 0.000000] ? uuid_string+0x157/0x1a0
> > [ 0.000000] ? kernelmode_fixup_or_oops.constprop.0+0x80/0xe0
> > [ 0.000000] ? __bad_area_nosemaphore.constprop.0+0xfc/0x130
> > [ 0.000000] ? bad_area_nosemaphore+0xf/0x20
> > [ 0.000000] ? do_kern_addr_fault+0x79/0x90
> > [ 0.000000] ? exc_page_fault+0xbc/0x160
> > [ 0.000000] ? paravirt_BUG+0x10/0x10
> > [ 0.000000] ? handle_exception+0x133/0x133
> > [ 0.000000] ? dmi_disable_osi_vista+0x1/0x37
> > [ 0.000000] ? paravirt_BUG+0x10/0x10
> > [ 0.000000] ? dmi_decode+0x2e3/0x40e
> > [ 0.000000] ? dmi_disable_osi_vista+0x1/0x37
> > [ 0.000000] ? paravirt_BUG+0x10/0x10
> > [ 0.000000] ? dmi_decode+0x2e3/0x40e
> > [ 0.000000] ? dmi_smbios3_present+0xd8/0xd8
> > [ 0.000000] dmi_decode_table+0xa9/0xe0
> > [ 0.000000] ? dmi_smbios3_present+0xd8/0xd8
> > [ 0.000000] ? dmi_smbios3_present+0xd8/0xd8
> > [ 0.000000] dmi_walk_early+0x34/0x58
> > [ 0.000000] dmi_present+0x149/0x1b6
> > [ 0.000000] dmi_setup+0x18d/0x22e
> > [ 0.000000] setup_arch+0x676/0xd3f
> > [ 0.000000] ? lockdown_lsm_init+0x1c/0x20
> > [ 0.000000] ? initialize_lsm+0x33/0x4e
> > [ 0.000000] start_kernel+0x65/0x644
> > [ 0.000000] ? set_intr_gate+0x45/0x58
> > [ 0.000000] ? early_idt_handler_common+0x44/0x44
> > [ 0.000000] i386_start_kernel+0x48/0x4a
> > [ 0.000000] startup_32_smp+0x161/0x164
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CR2: 00000000ffa45000
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] EIP: dmi_decode+0x2e3/0x40e
> > [ 0.000000] Code: 10 53 e8 b8 f9 ff ff 83 c4 0c e9 3e 01 00 00 0f b6 7e 01 31 db 83 ef 04 d1 ef 39 df 0f 8e 2b 01 00 00 8a 4c 5e 04 84 c9 79 1e <0f> b6 54 5e 05 89 f0 88 4d f0 e8 c0 f7 ff ff 8a 4d f0 89 c2 89 c8
> > [ 0.000000] EAX: cff6d220 EBX: 000024bd ECX: cfd2caff EDX: cf9e942c
> > [ 0.000000] ESI: ffa40681 EDI: 7ffffffe EBP: cfc37e90 ESP: cfc37e80
> > [ 0.000000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210086
> > [ 0.000000] CR0: 80050033 CR2: ffa45000 CR3: 0fe78000 CR4: 00000020
> > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> The same panic can be reproduced with vanilla 6.8.4 kernel.
>
> By adding some (or rather a lot of) printk into dmi_scan.c, I believe
> that the issue is caused by this line:
>
> <https://github.com/torvalds/linux/blob/13a0ac816d22aa47d6c393f14a99f39e49b960df/drivers/firmware/dmi_scan.c#L295>
>
> Or rather by a dmi_header with dm->type == 10 and dm->length == 0.
>
> As the length is (unsigned) zero, after subtracting the (unsigned)
> header length and dividing by two, count is slightly below signed
> integer max value (and stays there after being casted to signed),
> resulting in the loop running "forever" until it reaches non-mapped
> memory, resulting in the panic above.
>
> I am unsure who is the culprit, whether DMI header is supposed to not
> have length zero or whether Linux is supposed to parse it more gracefully.

Good debugging to narrow down the problem!

I would think the DMI header should not have a zero length, but the Linux
parsing should be more robust if it encounters such a bogus header.
Computing a bogus iteration limit is definitely not robust.

But the problem might not be a bad DMI header. It could be some kind of
bug that causes Linux to get misaligned, such that what it is looking at
isn't really a DMI header. See below for suggestions on how to narrow
it down further.

>
> In any case, when adding an extra if clause to this function to return
> early in case dm->length is zero, the system boots fine and appears to
> work fine at first glance. As I unfortunately have no idea what DMI is
> used for by the kernels, I do not know if there are any other things I
> should test, since the "Onboard device information" is obviously missing.
>
>
> If I should perform other tests, please tell me. Otherwise I hope that
> either an update of Hyper-V or the Linux kernel (or maybe some kernel
> parameter I missed) can make 32-bit Linux bootable on Hyper-V again in
> the future.

Let me suggest some additional diagnostics. The DMI information is
provided by the virtual firmware, which is provided by the Hyper-V
host. The raw DMI bytes are available in Linux at

/sys/firmware/dmi/tables/DMI

If you do "hexdump /sys/firmware/dmi/tables/DMI" on your
patched 32-bit kernel and on a working 64-bit kernel, do you see the
same hex data? (send the output to a file in each case, and then
compare the two files) If the DMI data is exactly the same, and a
64-bit kernel works, then perhaps there's a bug in the
DMI parsing code when the kernel is compiled in 32-bit mode.

Also, what is the output of "dmidecode | grep type", both on your
patched 32-bit kernel and a working 64-bit kernel? Here's
what I get with a 64-bit Linux kernel guest on exactly the same
version of Windows 11 that you have:

root@mhkubun:~# dmidecode | grep type
Handle 0x0000, DMI type 0, 26 bytes
Handle 0x0001, DMI type 1, 27 bytes
Handle 0x0002, DMI type 3, 24 bytes
Handle 0x0003, DMI type 2, 17 bytes
Handle 0x0004, DMI type 4, 48 bytes
Handle 0x0005, DMI type 11, 5 bytes
Handle 0x0006, DMI type 16, 23 bytes
Handle 0x0007, DMI type 17, 92 bytes
Handle 0x0008, DMI type 19, 31 bytes
Handle 0x0009, DMI type 20, 35 bytes
Handle 0x000A, DMI type 17, 92 bytes
Handle 0x000B, DMI type 19, 31 bytes
Handle 0x000C, DMI type 20, 35 bytes
Handle 0x000D, DMI type 32, 11 bytes
Handle 0xFEFF, DMI type 127, 4 bytes

Interestingly, there's no entry of type "10", though perhaps your
VM is configured differently from mine. Try also

dmidecode -u

What details are provided for "type 10" (On Board Devices)? That
may help identify which device(s) are causing the problem. Then I
might be able to repro the problem and do some debugging myself.

One final question: Is there an earlier version of the Linux kernel
where 32-bit builds worked for you on this same Windows 11
version?

Michael Kelley

>
> [Slightly off-topic: As 64-bit kernels work fine, if there are ways to
> run a 32-bit userland containerized or chrooted in a 64-bit kernel so
> that the userland (espeically uname and autoconf) cannot distinguish
> from a 32-bit kernel, that might be another option for my use case.
> Nested virtualization would of course also work, but the performance
> loss due to nested virtualization negates the effect of being able to
> pass more than one of the (2 physical, 4 hyperthreaded) cores of my
> laptop to the VM].
>
>
>
> Thanks for help and best regards,
>
>
> Michael


2024-04-15 20:15:38

by Wei Liu

Subject: Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

On Sat, Apr 13, 2024 at 03:06:05PM +0200, Michael Schierl wrote:
> [please cc: me as I am not subscribed to either mailing list]
>
[...]
> [Slightly off-topic: As 64-bit kernels work fine, if there are ways to
> run a 32-bit userland containerized or chrooted in a 64-bit kernel so
> that the userland (espeically uname and autoconf) cannot distinguish
> from a 32-bit kernel, that might be another option for my use case.
> Nested virtualization would of course also work, but the performance
> loss due to nested virtualization negates the effect of being able to
> pass more than one of the (2 physical, 4 hyperthreaded) cores of my
> laptop to the VM].
>

Have you tried `linux32`?

See https://linux.die.net/man/8/linux32

Thanks,
Wei.

>
>
> Thanks for help and best regards,
>
>
> Michael

2024-04-15 21:03:36

by Michael Schierl

Subject: Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

Hello Michael,

Am 15.04.2024 um 05:17 schrieb Michael Kelley:

> Let me suggest some additional diagnostics. The DMI information is
> provided by the virtual firmware, which is provided by the Hyper-V
> host. The raw DMI bytes are available in Linux at
>
> /sys/firmware/dmi/tables/DMI
>
> If you do "hexdump /sys/firmware/dmi/tables/DMI" on your
> patched 32-bit kernel and on a working 64-bit kernel, do you see the
> same hex data? (send the output to a file in each case, and then
> compare the two files)

For convenience, as I currently have no installed system with a 64-bit
kernel on this Hyper-V instance, I tested with the 32-bit and 64-bit
kernel 6.0.8 from live media (grml96 2022.11 from http://www.grml.org),
as well as with my own 32-bit kernel (only for the 2-core case).

In any case, I see the same content for /sys/firmware/dmi/tables/DMI as
well as /sys/firmware/dmi/tables/smbios_entry_point on 32-bit vs. 64-bit
kernels. But I see different content when booted with 1 vs. 2 vCPUs.

So it is understandable to me why 1 vCPU behaves differently from 2
vCPUs, but not clear why 32-bit behaves differently from 64-bit
(assuming in both cases the same parts of the dmi "blob" are parsed).


> If the DMI data is exactly the same, and a
> 64-bit kernel works, then perhaps there's a bug in the
> DMI parsing code when the kernel is compiled in 32-bit mode.
>
> Also, what is the output of "dmidecode | grep type", both on your
> patched 32-bit kernel and a working 64-bit kernel?


On 64-bit I see output on stderr as well as stdout.


Invalid entry length (0). DMI table is broken! Stop.

The output before that error is the same when grepping for type:

Handle 0x0000, DMI type 0, 20 bytes
Handle 0x0001, DMI type 1, 25 bytes
Handle 0x0002, DMI type 2, 8 bytes
Handle 0x0003, DMI type 3, 17 bytes
Handle 0x0004, DMI type 11, 5 bytes


When not grepping for type, the only difference is the number of structures

1core: 339 structures occupying 17307 bytes.
2core: 356 structures occupying 17307 bytes.

I put everything (raw and hex) up at
<https://gist.github.com/schierlm/4a1f38565856c49e4e4b534cf51961be>

> root@mhkubun:~# dmidecode | grep type
> Handle 0x0000, DMI type 0, 26 bytes
> Handle 0x0001, DMI type 1, 27 bytes
> Handle 0x0002, DMI type 3, 24 bytes
> Handle 0x0003, DMI type 2, 17 bytes
> Handle 0x0004, DMI type 4, 48 bytes
> Handle 0x0005, DMI type 11, 5 bytes
> Handle 0x0006, DMI type 16, 23 bytes
> Handle 0x0007, DMI type 17, 92 bytes
> Handle 0x0008, DMI type 19, 31 bytes
> Handle 0x0009, DMI type 20, 35 bytes
> Handle 0x000A, DMI type 17, 92 bytes
> Handle 0x000B, DMI type 19, 31 bytes
> Handle 0x000C, DMI type 20, 35 bytes
> Handle 0x000D, DMI type 32, 11 bytes
> Handle 0xFEFF, DMI type 127, 4 bytes

That looks healthier than mine... Maybe it also depends on the host...?

> Interestingly, there's no entry of type "10", though perhaps your
> VM is configured differently from mine. Try also
>
> dmidecode -u
>
> What details are provided for "type 10" (On Board Devices)? That
> may help identify which device(s) are causing the problem. Then I
> might be able to repro the problem and do some debugging myself.

No type 10, but again the error on stderr (even with only 1 vCPU).


> One final question: Is there an earlier version of the Linux kernel
> where 32-bit builds worked for you on this same Windows 11
> version?

I am not aware of any (I came from Windows 10 with VirtualBox and wanted
to move my setup to Windows 11 with Hyper-V).

I just tested a 10-year-old Linux live medium with kernel 3.16.7, and it
behaves the same (2-vCPU 32-bit does not boot, the other configurations
do). I may have some more really old live media on physical CDROMs
around, but I doubt it is useful to test these to find out whether some
really old kernel would work better.


Thanks again,


Michael


2024-04-15 23:31:55

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Schierl <[email protected]> Sent: Monday, April 15, 2024 2:03 PM
>
>
> In any case, I see the same content for /sys/firmware/rmi/tables/DMI as
> well as /sys/firmware/dmi/tables/smbios_entry_point on 32-bit vs. 64-bit
> kernels. But I see different content when booted with 1 vs. 2 vCPU.
>
> So it is understandable to me why 1 vCPU behaves different from 2vCPU,
> but not clear why 32-bit behaves different from 64-bit (assuming in both
> cases the same parts of the dmi "blob" are parsed).
>
>
> > If the DMI data is exactly the same, and a
> > 64-bit kernel works, then perhaps there's a bug in the
> > DMI parsing code when the kernel is compiled in 32-bit mode.
> >
> > Also, what is the output of "dmidecode | grep type", both on your
> > patched 32-bit kernel and a working 64-bit kernel?
>
>
> On 64-bit I see output on stderr as well as stdout.
>
>
> Invalid entry length (0). DMI table is broken! Stop.
>
> The output before is the same when grepping for type
>
> Handle 0x0000, DMI type 0, 20 bytes
> Handle 0x0001, DMI type 1, 25 bytes
> Handle 0x0002, DMI type 2, 8 bytes
> Handle 0x0003, DMI type 3, 17 bytes
> Handle 0x0004, DMI type 11, 5 bytes
>
>
> When not grepping for type, the only difference is the number of structures
>
> 1core: 339 structures occupying 17307 bytes.
> 2core: 356 structures occupying 17307 bytes.
>
> I put everything (raw and hex) up at
> <https://gist.github.com/schierlm/4a1f38565856c49e4e4b534cf51961be>
>
> > root@mhkubun:~# dmidecode | grep type
> > Handle 0x0000, DMI type 0, 26 bytes
> > Handle 0x0001, DMI type 1, 27 bytes
> > Handle 0x0002, DMI type 3, 24 bytes
> > Handle 0x0003, DMI type 2, 17 bytes
> > Handle 0x0004, DMI type 4, 48 bytes
> > Handle 0x0005, DMI type 11, 5 bytes
> > Handle 0x0006, DMI type 16, 23 bytes
> > Handle 0x0007, DMI type 17, 92 bytes
> > Handle 0x0008, DMI type 19, 31 bytes
> > Handle 0x0009, DMI type 20, 35 bytes
> > Handle 0x000A, DMI type 17, 92 bytes
> > Handle 0x000B, DMI type 19, 31 bytes
> > Handle 0x000C, DMI type 20, 35 bytes
> > Handle 0x000D, DMI type 32, 11 bytes
> > Handle 0xFEFF, DMI type 127, 4 bytes
>
> That looks healthier than mine... Maybe it also depends on the host...?
>
> > Interestingly, there's no entry of type "10", though perhaps your
> > VM is configured differently from mine. Try also
> >
> > dmidecode -u
> >
> > What details are provided for "type 10" (On Board Devices)? That
> > may help identify which device(s) are causing the problem. Then I
> > might be able to repro the problem and do some debugging myself.
>
> No type 10, but again the error on stderr (even with only 1 vCPU).
>

OK, good info. If the "dmidecode" program in user space is also
complaining about a bad entry, then Hyper-V probably really has
created a bad entry.

Can you give me details of the Hyper-V VM configuration? Maybe
a screenshot of the Hyper-V Manager "Settings" for the VM would
be a good starting point, though some of the details are on
sub-panels in the UI. I'm guessing your 32-bit Linux VM is
a Generation 1 VM. FWIW, my example was a Generation 2 VM.
When you ran a 64-bit Linux and did not have the problem, was
that with exactly the same Hyper-V VM configuration, or a different
config? Perhaps something about the VM configuration tickles a
bug in Hyper-V and it builds a faulty DMI entry, so I'm focusing on
that aspect. If we can figure out what aspect of the VM config
causes the bad DMI entry to be generated, there might be an
easy work-around for you in tweaking the VM config.

Michael Kelley

2024-04-16 21:37:44

by Michael Schierl

Subject: Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

Hello,


Am 16.04.2024 um 01:31 schrieb Michael Kelley:

> Can you give me details of the Hyper-V VM configuration? Maybe
> a screenshot of the Hyper-V Manager "Settings" for the VM would
> be a good starting point, though some of the details are on
> sub-panels in the UI.

It used to be possible to export Hyper-V VM settings as XML, but
apparently that option has been removed in Win2016/Win10, in favor of
their own proprietary binary .vmcx format...

Also, maybe it matters what else Hyper-V is doing. I've installed both
WSL and WSA, and Windows Defender is using Core Isolation Memory
Integrity. I have also enabled support for nested virtualisation in the
Host/Network Switch, but not in that VM.

Anyway, I just created two new VMs (one of each generation) with no hard
disk and everything else default, added a DVD drive to the SCSI
controller of Gen2 (which Gen1 already had on its IDE controller),
disabled Secure Boot on Gen2 and added a second vCPU to Gen1 (which Gen2
already had).

Afterwards, Gen2's dmidecode looks like the summary you posted, and Gen1
reproduces the issue.

> I'm guessing your 32-bit Linux VM is
> a Generation 1 VM. FWIW, my example was a Generation 2 VM.

Very interesting that Gen2 boots 32-bit Linux better than Gen1 (there is
a delay of about 30 seconds during hardware autoconfiguration
(systemd-udevd) when booting Gen2 which I have not investigated yet),
despite the documentation saying not to use Gen2 for any 32-bit guest OSes.

So I assume this only applies to crappy OSes that directly couple their
bitness to the bitness of the UEFI firmware.

To be fair, the live media I'm using uses Grub's "non-compliant" Linux
loader that bypasses the kernel's EFI stub. When trying with Grub's
"linuxefi" loader, Linux does not boot either, as expected. (On the Gen1
VM, the panic happens regardless of whether I use grub's linux16 or linux
loader, and also with the SYSLINUX/ISOLINUX loader.)

> When you ran a 64-bit Linux and did not have the problem, was
> that with exactly the same Hyper-V VM configuration, or a different
> config?

All my tests were performed with a single (Gen1) VM, and the only
setting I changed was the number of vCPUs.


Regards,


Michael


2024-04-16 23:20:47

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Schierl <[email protected]> Sent: Tuesday, April 16, 2024 2:24 PM
>
> Am 16.04.2024 um 01:31 schrieb Michael Kelley:
>
> > Can you give me details of the Hyper-V VM configuration? Maybe
> > a screenshot of the Hyper-V Manager "Settings" for the VM would
> > be a good starting point, though some of the details are on
> > sub-panels in the UI.
>
> It used to be possible to export Hyper-V VM settings as XML, but
> apparently that option has been removed in Win2016/Win10, in favor of
> their own proprietary binary .vmcx format...
>
> Also, maybe it matters what else Hyper-V is doing. I've installed both
> WSL and WSA, and Windows Defender is using Core Isolation Memory
> Integrity. I have also enabled support for nested virtualisation in the
> Host/Network Switch, but not in that VM.
>
> Anyway, I just created two new VMs (one of each generation) with no hard
> disk and everything else default, added a DVD drive to the SCSI
> controller of Gen2 (which Gen1 already had on its IDE controller),
> disabled Secure Boot on Gen2 and added a second vCPU to Gen1 (which Gen2
> already had).
>
> Afterwards, Gen2's dmidecode looks like the summary you posted, and Gen1
> reproduces the issue.
>
> > I'm guessing your 32-bit Linux VM is
> > a Generation 1 VM. FWIW, my example was a Generation 2 VM.
>
> Very interesting that Gen2 boots 32-bit Linux better than Gen1 (there is
> a delay during hardware autoconfigruation (systemd-udevd) for about 30
> seconds when booting Gen2 which I did not investigate yet), despite the
> documentation claiming not to use Gen2 for any 32-bit Host OSes.
>
> So I assume this only applies to crappy OSes that directly couple their
> bitness to the bitness of the UEFI firmware.
>
> To be fair, the live media I'm using uses Grub's "non-compliant" Linux
> loader that bypasses the kernel's EFI stub. When trying with Grub's
> "linuxefi" loader, Linux does not boot either, as expected. (On the Gen1
> VM, the panic happens regardless whether I use grub's linux16 or linux
> loader, and also with SYSLINUX/ISOLINUX loader).
>
> > When you ran a 64-bit Linux and did not have the problem, was
> > that with exactly the same Hyper-V VM configuration, or a different
> > config?
>
> All my tests were performed with a single (Gen1) VM, and the only
> setting I changed was the number of vCPUs.
>

Thanks for the information. I now have a repro of "dmidecode"
in user space complaining about a zero length entry, when running
in a Gen 1 VM with a 64-bit Linux guest. Looking at
/sys/firmware/dmi/tables/DMI, that section of the DMI blob definitely
seems messed up. The handle is 0x0005, which is the next handle in
sequence, but the length and type of the entry are zero. This is a bit
different from the type 10 entry that you saw the 32-bit kernel
choking on, and I don't have an explanation for that. After this
bogus entry, there are a few bytes I don't recognize, then about
100 bytes of zeros, which also seems weird.
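
As background on why a bogus length is so disruptive: each SMBIOS/DMI
entry is a 4-byte header (type, length, handle) followed by the rest of
the formatted area (length covers the header too) and then a string-set
terminated by a double NUL. A parser can only find the next entry by
trusting length, roughly like this (illustrative sketch, not the
kernel's or dmidecode's actual code):

#include <stdint.h>

struct dmi_header {
    uint8_t  type;
    uint8_t  length;     /* formatted area, including this header */
    uint16_t handle;
};

/* return a pointer to the next entry, or NULL if it cannot be located */
static const uint8_t *dmi_next(const uint8_t *p, const uint8_t *end)
{
    const struct dmi_header *h = (const struct dmi_header *)p;

    if (p + sizeof(*h) > end || h->length < sizeof(*h))
        return NULL;                 /* short/bogus entry: cannot continue */

    p += h->length;                  /* skip the formatted area */
    while (p + 1 < end && (p[0] || p[1]))
        p++;                         /* skip the trailing string-set */
    return (p + 2 <= end) ? p + 2 : NULL;  /* skip the double NUL */
}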

But at this point, it's good that I have a repro. It has been a while since
I've built and run a 32-bit kernel, but I think I can get that set up with
the ability to get output during early boot. I'll do some further
debugging with dmidecode and with the 32-bit kernel to figure out
what's going on. There are several mysteries here: 1) Is Hyper-V
really building a bad DMI blob, or is something else trashing it?
2) Why does a 64-bit kernel succeed on the putative bad DMI blob,
while a 32-bit kernel fails? 3) Is dmidecode seeing something
different from the Linux kernel?

Give me a few days to sort all this out. And if Linux can be made
more robust in the face of a bad DMI table entry, I'll submit a
Linux kernel patch for that.

Michael Kelley

2024-04-17 09:45:56

by Jean DELVARE

Subject: Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

Hi Michael and Michael,

Thanks to both of you for all the data and early analysis.

On Tue, 2024-04-16 at 23:20 +0000, Michael Kelley wrote:
> Thanks for the information.  I now have a repro of "dmidecode"
> in user space complaining about a zero length entry, when running
> in a Gen 1 VM with a 64-bit Linux guest.  Looking at
> /sys/firmware/dmi/tables/DMI, that section of the DMI blob definitely
> seems messed up.  The handle is 0x0005, which is the next handle in
> sequence, but the length and type of the entry are zero.  This is a bit
> different from the type 10 entry that you saw the 32-bit kernel
> choking on, and I don't have an explanation for that.  After this
> bogus entry, there are a few bytes I don't recognize, then about
> 100 bytes of zeros, which also seems weird.

Don't let the type 10 distract you. It is entirely possible that the
byte corresponding to type == 10 is already part of the corrupted
memory area. Can you check if the DMI table generated by Hyper-V is
supposed to contain type 10 records at all?

This smells like the DMI table has been overwritten by "something".
Either it happened even before boot, that is, the DMI table generated
by the VM itself is corrupted in the first place, or the DMI table was
originally good but other kernel code wrote some data at the same
memory location (I've seen this once in the past, although that was on
bare metal). That would possibly still be the result of bad information
provided by the VM (for example 2 "hardware" features being told to use
overlapping memory ranges).

You should also check the memory map (as displayed early at boot, so
near the top of dmesg) and verify that the DMI table is located in a
"reserved" memory area, so that area can't be used for memory
allocation. Example on my laptop:

# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
Table at 0xBA135000.

So the table starts at physical address 0xba135000, which is in the
following memory map segment:

reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved

This memory area is marked as "reserved" so all is well. In my case,
the table is 2256 bytes in size (not always displayed by dmidecode by
default, but you can check the size of file
/sys/firmware/dmi/tables/DMI), so the last byte of the table is at
0xba135000 + 0x8d0 - 1 = 0xba1358cf, which is still within the reserved
range.

If the whole DMI table is NOT located in a "reserved" memory area then
it can get corrupted by any memory allocation.

If the whole DMI table IS located in a "reserved" memory area, it can
still get corrupted, but only by code which itself operates on data
located in a reserved memory area.
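
A tiny self-contained example of that containment check, using the
numbers from the laptop example above (the helper name is made up for
illustration):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* is [table, table + len) entirely inside the range [start, end]? */
static bool table_in_reserved(uint64_t table, uint64_t len,
                              uint64_t start, uint64_t end)
{
    return table >= start && table + len - 1 <= end;
}

int main(void)
{
    /* table at 0xba135000, 0x8d0 (2256) bytes,
       reserved e820 range 0xb87b0000-0xbb77dfff */
    if (table_in_reserved(0xba135000, 0x8d0, 0xb87b0000, 0xbb77dfff))
        printf("DMI table lies inside the reserved range\n");
    else
        printf("DMI table is NOT fully inside the reserved range\n");
    return 0;
}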

> But at this point, it's good that I have a repro. It has been a while since
> I've built and run a 32-bit kernel, but I think I can get that set up with
> the ability to get output during early boot. I'll do some further
> debugging with dmidecode and with the 32-bit kernel to figure out
> what's going on.  There are several mysteries here:  1) Is Hyper-V
> really building a bad DMI blob, or is something else trashing it?

This is a good question, my guess is that the table gets corrupted
afterwards, but better not assume and actually check what the table
looks like at generation time, from the host's perspective.

> 2) Why does a 64-bit kernel succeed on the putative bad DMI blob,
> while a 32-bit kernel fails?

Both DMI tables are corrupted, but are they corrupted in the exact same
way?

>   3) Is dmidecode seeing something different from the Linux kernel?

The DMI table is remapped early at boot time and the result is then
read from dmidecode through /sys/firmware/dmi/tables/DMI. To be honest,
I'm not sure if this "remapping" is a one-time copy or if future
corruption would be reflected to the file. In any case, dmidecode can't
possibly see a less corrupted version of the table. The different
outcome is because dmidecode is more robust to invalid input than the
in-kernel parser.

Note that you can force dmidecode to read the table directly from memory
by using the --no-sysfs option.


> Give me a few days to sort all this out.  And if Linux can be made
> more robust in the face of a bad DMI table entry, I'll submit a
> Linux kernel patch for that.

I agree that the in-kernel DMI table parser should not choke on bad
data. dmidecode has an explicit check on "short entries":

/*
 * If a short entry is found (less than 4 bytes), not only it
 * is invalid, but we cannot reliably locate the next entry.
 * Better stop at this point, and let the user know his/her
 * table is broken.
 */
if (h.length < 4)
{
    if (!(opt.flags & FLAG_QUIET))
    {
        fprintf(stderr,
                "Invalid entry length (%u). DMI table "
                "is broken! Stop.\n\n",
                (unsigned int)h.length);
        opt.flags |= FLAG_QUIET;
    }
    break;
}

We need to add something similar to the kernel DMI table parser,
presumably in dmi_scan.c:dmi_decode_table().
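
A rough sketch of what such a check might look like inside the
per-entry loop of dmi_decode_table() (the exact placement, message and
use of pr_warn()/FW_BUG here are assumptions, not an actual patch):

    const struct dmi_header *dm = (const struct dmi_header *)data;

    /*
     * As in dmidecode above: an entry shorter than the 4-byte header
     * is invalid and the start of the next entry cannot be located
     * reliably, so stop decoding the table here rather than walking
     * off into unrelated (or unmapped) memory.
     */
    if (dm->length < sizeof(struct dmi_header)) {
        pr_warn(FW_BUG "Corrupted DMI table, stopping decode\n");
        break;
    }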

--
Jean Delvare
SUSE L3 Support

2024-04-17 16:15:03

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Jean DELVARE <[email protected]> Sent: Wednesday, April 17, 2024 2:44 AM
>
> Hi Michael and Michael,
>
> Thanks to both of you for all the data and early analysis.
>
> On Tue, 2024-04-16 at 23:20 +0000, Michael Kelley wrote:
> > Thanks for the information.  I now have a repro of "dmidecode"
> > in user space complaining about a zero length entry, when running
> > in a Gen 1 VM with a 64-bit Linux guest.  Looking at
> > /sys/firmware/dmi/tables/DMI, that section of the DMI blob definitely
> > seems messed up.  The handle is 0x0005, which is the next handle in
> > sequence, but the length and type of the entry are zero.  This is a bit
> > different from the type 10 entry that you saw the 32-bit kernel
> > choking on, and I don't have an explanation for that.  After this
> > bogus entry, there are a few bytes I don't recognize, then about
> > 100 bytes of zeros, which also seems weird.
>
> Don't let the type 10 distract you. It is entirely possible that the
> byte corresponding to type == 10 is already part of the corrupted
> memory area. Can you check if the DMI table generated by Hyper-V is
> supposed to contain type 10 records at all?
>
> This smells like the DMI table has been overwritten by "something".
> Either it happened even before boot, that is, the DMI table generated
> by the VM itself is corrupted in the first place, or the DMI table was
> originally good but other kernel code wrote some data at the same
> memory location (I've seen this once in the past, although that was on
> bare metal). That would possibly still be the result of bad information
> provided by the VM (for example 2 "hardware" features being told to use
> overlapping memory ranges).
>
> You should also check the memory map (as displayed early at boot, so
> near the top of dmesg) and verify that the DMI table is located in a
> "reserved" memory area, so that area can't be used for memory
> allocation. Example on my laptop :
>
> # dmidecode 3.4
> Getting SMBIOS data from sysfs.
> SMBIOS 3.1.1 present.
> Table at 0xBA135000.
>
> So the table starts at physical address 0xba135000, which is in the
> following memory map segment:
>
> reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved
>
> This memory area is marked as "reserved" so all is well. In my case,
> the table is 2256 bytes in size (not always displayed by dmidecode by
> default, but you can check the size of file
> /sys/firmware/dmi/tables/DMI), so the last byte of the table is at
> 0xba135000 + 0x8d0 - 1 = 0xba1358cf, which is still within the reserved
> range.
>
> If the whole DMI table is NOT located in a "reserved" memory area then
> it can get corrupted by any memory allocation.
>
> If the whole DMI table IS located in a "reserved" memory area, it can
> still get corrupted, but only by code which itself operates on data
> located in a reserved memory area.
>
> > But at this point, it's good that I have a repro. It has been a while since
> > I've built and run a 32-bit kernel, but I think I can get that set up with
> > the ability to get output during early boot. I'll do some further
> > debugging with dmidecode and with the 32-bit kernel to figure out
> > what's going on.  There are several mysteries here:  1) Is Hyper-V
> > really building a bad DMI blob, or is something else trashing it?
>
> This is a good question, my guess is that the table gets corrupted
> afterwards, but better not assume and actually check what the table
> looks like at generation time, from the host's perspective.
>
> > 2) Why does a 64-bit kernel succeed on the putative bad DMI blob,
> > while a 32-bit kernel fails?
>
> Both DMI tables are corrupted, but are they corrupted in the exact same
> way?
>
> >   3) Is dmidecode seeing something different from the Linux kernel?
>
> The DMI table is remapped early at boot time and the result is then
> read from dmidecode through /sys/firmware/dmi/tables/DMI. To be honest,
> I'm not sure if this "remapping" is a one-time copy or if future
> corruption would be reflected to the file. In any case, dmidecode can't
> possibly see a less corrupted version of the table. The different
> outcome is because dmidecode is more robust to invalid input than the
> in-kernel parser.
>
> Note that you can force dmidcode to read the table directly from memory
> by using the --no-sysfs option.

Thanks for all the good input! I'll follow up on these ideas. FYI, I'm
a former Microsoft employee who spent several years doing Linux
kernel work to enable running as a Hyper-V guest. So working with
Hyper-V and Linux is very familiar to me. But I retired about 6 months
ago, so I don't have the internal Microsoft connections that I once
had if help is needed from the Hyper-V side. Other Microsoft folks on
the thread may need to jump in if such help is needed. At this point,
I'm contributing to Linux kernel work as an individual.

In any case, I'll debug things from the Linux guest side and then see if
anything is needed from the Hyper-V side.

Michael

>
>
> > Give me a few days to sort all this out.  And if Linux can be made
> > more robust in the face of a bad DMI table entry, I'll submit a
> > Linux kernel patch for that.
>
> I agree that the in-kernel DMI table parser should not choke on bad
> data. dmidecode has an explicit check on "short entries":
>
> /*
> * If a short entry is found (less than 4 bytes), not only it
> * is invalid, but we cannot reliably locate the next entry.
> * Better stop at this point, and let the user know his/her
> * table is broken.
> */
> if (h.length < 4)
> {
> if (!(opt.flags & FLAG_QUIET))
> {
> fprintf(stderr,
> "Invalid entry length (%u). DMI table "
> "is broken! Stop.\n\n",
> (unsigned int)h.length);
> opt.flags |= FLAG_QUIET;
> }
> break;
> }
>
> We need to add something similar to the kernel DMI table parser,
> presumably in dmi_scan.c:dmi_decode_table().
>
> --
> Jean Delvare
> SUSE L3 Support

2024-04-17 21:08:36

by Michael Schierl

Subject: Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

Hello Jean,


Thanks for your reply.


Am 17.04.2024 um 11:43 schrieb Jean DELVARE:

> Don't let the type 10 distract you. It is entirely possible that the
> byte corresponding to type == 10 is already part of the corrupted
> memory area. Can you check if the DMI table generated by Hyper-V is
> supposed to contain type 10 records at all?

How? Hyper-V is not open source :-)

My best guess to get Linux out of the equation would be to boot my
trusted MS-DOS 6.2 floppy and use debug.com to dump the DMI:

> | A:\>debug
> | -df000:93d0 [to inspect]
> | -nfromdos.dmi
> | -rcx
> | CX 0000
> | :439B
> | -w f000:93d0
> | -q


The result is byte-for-byte identical to the DMI dump I created from
sysfs and pasted earlier in this thread. Of course, it does not have to
be identical to the memory situation while it was parsed.

> You should also check the memory map (as displayed early at boot, so
> near the top of dmesg) and verify that the DMI table is located in a
> "reserved" memory area, so that area can't be used for memory
> allocation.

The e820 memory map was included in the early printk output I posted
earlier:

> [ 0.000000] BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffeffff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000007fff0000-0x000000007fffefff] ACPI data
> [ 0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] ACPI NVS

And from the dmidecode I pasted earlier:

> Table at 0x000F93D0.

The size is 0x0000439B, so the last byte should be at 0x000FD76A, well
inside the third e820 entry (the second reserved one) - and accessible
even from DOS without requiring any extra effort.

> So the table starts at physical address 0xba135000, which is in the
> following memory map segment:
>
> reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved

Looks like UEFI, and well outside the 1MB range :-)

> If the whole DMI table IS located in a "reserved" memory area, it can
> still get corrupted, but only by code which itself operates on data
> located in a reserved memory area.


> Both DMI tables are corrupted, but are they corrupted in the exact same
> way?

At least the dumped tables are byte-for-byte identical on both OS
flavors. And (as I tested above) byte-for-byte identical to a version
dumped from MS-DOS.


Regards,


Michael


2024-04-17 22:34:53

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Schierl <[email protected]> Sent: Wednesday, April 17, 2024 2:08 PM
>
> > Don't let the type 10 distract you. It is entirely possible that the
> > byte corresponding to type == 10 is already part of the corrupted
> > memory area. Can you check if the DMI table generated by Hyper-V is
> > supposed to contain type 10 records at all?
>
> How? Hyper-V is not open source :-)

I think that request from Jean is targeted to me or the Microsoft
people on the thread. :-)

>
> My best guess to get Linux out of the equation would be to boot my
> trusted MS-DOS 6.2 floppy and use debug.com to dump the DMI:
>
> > | A:\>debug
> > | -df000:93d0 [to inspect]
> > | -nfromdos.dmi
> > | -rcx
> > | CX 0000
> > | :439B
> > | -w f000:93d0
> > | -q
>
>
> The result is byte-for-byte identical to the DMI dump I created from
> sysfs and pasted earlier in this thread. Of course, it does not have to
> be identical to the memory situation while it was parsed.

I've been looking at the details of the DMI blob in a Linux VM on my
local Windows 11 laptop, as well as in a Generation 1 VM in the Azure
public cloud, which uses Hyper-V. The overall size and layout
of the DMI blob appears to be the same in both cases. The blob is
corrupted in the VM on the local laptop, but good in the Azure VM.

I was wondering how to check if the Linux bootloaders and grub
were somehow corrupting the DMI blob, but now you've
answered the question by running MS-DOS and dumping the
contents. Excellent experiment!

I still want to understand why 32-bit Linux is taking an oops during
boot while 64-bit Linux does not. During boot, I can see that 64-bit
Linux wanders through the corrupted part of the DMI blob and
looks at a lot of bogus entries before it gets back on track again.
But the bogus entries don't cause an oops. Once I figure out
those details, we still have the corrupted DMI blob, and based on
your MS-DOS experiment, it's looking like Hyper-V created the
corrupted form. I want to think more about how to debug that.

FWIW, in comparing the Azure VM with my local VM, it looks like
the corrupted entry is the first type 4 entry describing a CPU.

Michael Kelley

>
> > You should also check the memory map (as displayed early at boot, so
> > near the top of dmesg) and verify that the DMI table is located in a
> > "reserved" memory area, so that area can't be used for memory
> > allocation.
>
> The e820 memory map was included in the early printk output I posted
> earlier:
>
> > [ 0.000000] BIOS-provided physical RAM map:
> > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffeffff] usable
> > [ 0.000000] BIOS-e820: [mem 0x000000007fff0000-0x000000007fffefff] ACPI data
> > [ 0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] ACPI NVS
>
> And from the dmidecode I pasted earlier:
>
> > Table at 0x000F93D0.
>
> The size is 0x0000439B, so the last byte should be at 0x000FD76A, well
> inside the third i820 entry (the second reserved one) - and accessible
> even from DOS without requiring any extra effort.
>
> > So the table starts at physical address 0xba135000, which is in the
> > following memory map segment:
> >
> > reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved
>
> Looks like UEFI, and well outside the 1MB range :-)
>
> > If the whole DMI table IS located in a "reserved" memory area, it can
> > still get corrupted, but only by code which itself operates on data
> > located in a reserved memory area.
>
>
> > Both DMI tables are corrupted, but are they corrupted in the exact same
> > way?
>
> At least the dumped tables are byte-for-byte identical on both OS
> flavors. And (as I tested above) byte-for-byte identical to a version
> dumped from MS-DOS.
>
>
> Regards,
>
>
> Michael


2024-04-19 16:36:21

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Kelley <[email protected]> Sent: Wednesday, April 17, 2024 3:35 PM
>
> From: Michael Schierl <[email protected]> Sent: Wednesday, April 17, 2024 2:08 PM
> >
> > > Don't let the type 10 distract you. It is entirely possible that the
> > > byte corresponding to type == 10 is already part of the corrupted
> > > memory area. Can you check if the DMI table generated by Hyper-V is
> > > supposed to contain type 10 records at all?
> >
> > How? Hyper-V is not open source :-)
>
> I think that request from Jean is targeted to me or the Microsoft
> people on the thread. :-)
>
> >
> > My best guess to get Linux out of the equation would be to boot my
> > trusted MS-DOS 6.2 floppy and use debug.com to dump the DMI:
> >
> > > | A:\>debug
> > > | -df000:93d0 [to inspect]
> > > | -nfromdos.dmi
> > > | -rcx
> > > | CX 0000
> > > | :439B
> > > | -w f000:93d0
> > > | -q
> >
> >
> > The result is byte-for-byte identical to the DMI dump I created from
> > sysfs and pasted earlier in this thread. Of course, it does not have to
> > be identical to the memory situation while it was parsed.
>
> I've been looking at the details of the DMI blob in a Linux VM on my
> local Windows 11 laptop, as well as in a Generation 1 VM in the Azure
> public cloud, which uses Hyper-V. The overall size and layout
> of the DMI blob appears to be the same in both cases. The blob is
> corrupted in the VM on the local laptop, but good in the Azure VM.
>
> I was wondering how to check if the Linux bootloaders and grub
> were somehow corrupting the DMI blob, but now you've
> answered the question by running MS-DOS and dumping the
> contents. Excellent experiment!
>
> I still want to understand why 32-bit Linux is taking an oops during
> boot while 64-bit Linux does not.

The difference is in this statement in dmi_save_devices():

count = (dm->length - sizeof(struct dmi_header)) / 2;

On a 64-bit system, count is 0xFFFFFFFE. That's seen as a
negative value, and the "for" loop does not do any iterations. So
nothing bad happens.

But on a 32-bit system, count is 0x7FFFFFFE. That's a big
positive number, and the "for" loop iterates to non-existent
memory as Michael Schierl originally described.

I don't know the "C" rules for mixed signed and unsigned
expressions, and how they differ on 32-bit and 64-bit systems.
But that's the cause of the different behavior.
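
For reference, a short worked trace of the conversions involved, with
dm->length == 0 (a u8) and sizeof(struct dmi_header) == 4 (a size_t);
the final conversion of an out-of-range value to int is
implementation-defined, and gcc on x86 keeps the low 32 bits:

/* dm->length                -> promoted to int, value 0
 * 0 - (size_t)4             -> subtraction done as unsigned size_t
 *                              32-bit: 0xFFFFFFFC   64-bit: 0xFFFFFFFFFFFFFFFC
 * ... / 2                   -> unsigned division (logical shift right)
 *                              32-bit: 0x7FFFFFFE   64-bit: 0x7FFFFFFFFFFFFFFE
 * assignment to int count   -> low 32 bits kept
 *                              32-bit: 0x7FFFFFFE (large positive)
 *                              64-bit: 0xFFFFFFFE (i.e. -2)
 */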

Regardless of the 32-bit vs. 64-bit behavior, the DMI blob is malformed,
almost certainly as created by Hyper-V. I'll see if I can bring this to
the attention of one of my previous contacts on the Hyper-V team.

Michael

> During boot, I can see that 64-bit
> Linux wanders through the corrupted part of the DMI blob and
> looks at a lot of bogus entries before it gets back on track again.
> But the bogus entries don't cause an oops. Once I figure out
> those details, we still have the corrupted DMI blob, and based on
> your MS-DOS experiment, it's looking like Hyper-V created the
> corrupted form. I want to think more about how to debug that.
>
> FWIW, in comparing the Azure VM with my local VM, it looks like
> the corrupted entry is the first type 4 entry describing a CPU.
>
> Michael Kelley
>
> >
> > > You should also check the memory map (as displayed early at boot, so
> > > near the top of dmesg) and verify that the DMI table is located in a
> > > "reserved" memory area, so that area can't be used for memory
> > > allocation.
> >
> > The e820 memory map was included in the early printk output I posted
> > earlier:
> >
> > > [ 0.000000] BIOS-provided physical RAM map:
> > > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > > [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> > > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffeffff] usable
> > > [ 0.000000] BIOS-e820: [mem 0x000000007fff0000-0x000000007fffefff] ACPI data
> > > [ 0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] ACPI NVS
> >
> > And from the dmidecode I pasted earlier:
> >
> > > Table at 0x000F93D0.
> >
> > The size is 0x0000439B, so the last byte should be at 0x000FD76A, well
> > inside the third i820 entry (the second reserved one) - and accessible
> > even from DOS without requiring any extra effort.
> >
> > > So the table starts at physical address 0xba135000, which is in the
> > > following memory map segment:
> > >
> > > reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved
> >
> > Looks like UEFI, and well outside the 1MB range :-)
> >
> > > If the whole DMI table IS located in a "reserved" memory area, it can
> > > still get corrupted, but only by code which itself operates on data
> > > located in a reserved memory area.
> >
> >
> > > Both DMI tables are corrupted, but are they corrupted in the exact same
> > > way?
> >
> > At least the dumped tables are byte-for-byte identical on both OS
> > flavors. And (as I tested above) byte-for-byte identical to a version
> > dumped from MS-DOS.
> >
> >
> > Regards,
> >
> >
> > Michael
>


2024-04-19 20:47:28

by Michael Schierl

Subject: Re: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

Hello,


Am 19.04.2024 um 18:36 schrieb Michael Kelley:

>> I still want to understand why 32-bit Linux is taking an oops during
>> boot while 64-bit Linux does not.
>
> The difference is in this statement in dmi_save_devices():
>
> count = (dm->length - sizeof(struct dmi_header)) / 2;
>
> On a 64-bit system, count is 0xFFFFFFFE. That's seen as a
> negative value, and the "for" loop does not do any iterations. So
> nothing bad happens.
>
> But on a 32-bit system, count is 0x7FFFFFFE. That's a big
> positive number, and the "for" loop iterates to non-existent
> memory as Michael Schierl originally described.
>
> I don't know the "C" rules for mixed signed and unsigned
> expressions, and how they differ on 32-bit and 64-bit systems.
> But that's the cause of the different behavior.

Probably lots of implementation-defined behaviour here. But when looking
at gcc 12.2 for the x86/amd64 architecture (which is the version in
Debian), it is at least apparent from the assembly listing:

https://godbolt.org/z/he7MfcWfE

First of all (this gets me every time): sizeof(int) is 4 on both 32- and
64-bit, unlike sizeof(uintptr_t), which is 8 on 64-bit.

Both the 32-bit and 64-bit versions zero-extend the value of dm->length
from 8 bits to 32 bits (or actually to the native bit length, as the
upper 32 bits of rax get set to zero whenever eax is assigned), and then
the subtraction and shifting (division) happen as the native unsigned
type, with only the lowest 32 bits of the result taken as the value for
count. In the 64-bit case one of the extra leading 1 bits from the
subtraction gets shifted into the MSB of that 32-bit result, while in
the 32-bit case that bit stays zero.

When using long instead of int (64-bit signed integer, as I assumed when
looking at the code for the first time), the result would be
0x7FFF_FFFF_FFFF_FFFE on 64-bit, as no truncation happens, and the
behavior would be the same. This clearly shows that I am mentally still
in the 32-bit era, perhaps that explains why I like 32-bit kernels over
64-bit ones so much :D

> Regardless of the 32-bit vs. 64-bit behavior, the DMI blob is malformed,
> almost certainly as created by Hyper-V. I'll see if I can bring this to
> the attention of one of my previous contacts on the Hyper-V team.


Thanks,


Michael


2024-04-19 22:32:31

by Michael Kelley

Subject: RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Schierl <[email protected]> Sent: Friday, April 19, 2024 1:47 PM
> Am 19.04.2024 um 18:36 schrieb Michael Kelley:
>
> >> I still want to understand why 32-bit Linux is taking an oops during
> >> boot while 64-bit Linux does not.
> >
> > The difference is in this statement in dmi_save_devices():
> >
> > count = (dm->length - sizeof(struct dmi_header)) / 2;
> >
> > On a 64-bit system, count is 0xFFFFFFFE. That's seen as a
> > negative value, and the "for" loop does not do any iterations. So
> > nothing bad happens.
> >
> > But on a 32-bit system, count is 0x7FFFFFFE. That's a big
> > positive number, and the "for" loop iterates to non-existent
> > memory as Michael Schierl originally described.
> >
> > I don't know the "C" rules for mixed signed and unsigned
> > expressions, and how they differ on 32-bit and 64-bit systems.
> > But that's the cause of the different behavior.
>
> Probably lots of implementation defined behaviour here. But when looking
> at gcc 12.2 for x86/amd64 architecture (which is the version in Debian),
> it is at least apparent from the assembly listing:
>
> https://godbolt.org/z/he7MfcWfE
>
> First of all (this gets me every time): sizeof(int) is 4 on both 32-and
> 64-bit, unlike sizeof(uintptr_t), which is 8 on 64-bit.
>
> Both 32-bit and 64-bit versions zero-extend the value of dm->length from
> 8 bits to 32 bits (or actually native bitlength as the upper 32 bits of
> rax get set to zero whenever eax is assigned), and then the subtraction
> and shifting (division) happen as native unsigend type, taking only the
> lowest 32 bits of the result as value for count. In the 64-bit case one
> of the extra leading 1 bits from the subtraction gets shifted into the
> MSB of the result, while in the 32-bit case it remains empty.

Yep -- makes sense. As you said, the sub-expression
(dm->length - sizeof(struct dmi_header)) is unsigned with a size that
is the size we're compiling for. When compiling for 32-bit, the right shift
puts a zero in the upper bit (bit 31) because the value is treated as
unsigned. But when compiling for 64-bit, bits [63:32] exist and they
are all ones. The right shift puts the zero in bit 63, and bit 32 (a "1")
gets shifted into bit 31.

Michael