2023-03-29 10:42:03

by Borislav Petkov

Subject: Re: Panic starting 6.2.x and later 6.1.x kernels

On Tue, Mar 28, 2023 at 09:26:16PM -0400, Gabriel David wrote:
>
> On 3/28/23 1:10 PM, Borislav Petkov wrote:
> > On Tue, Mar 28, 2023 at 04:06:41PM +0100, David R wrote:
> > > Yes, that patch fixes it also. By all means add my tested by:
> > Ok, thanks for checking. That issue is still weird, tho, and we don't have
> > an idea why that happens.
> >
> > If you could test your original, failing kernel with "nointremap" on the
> > command line, that would be cool.
> >
> > Thx.
> >
> I have the same problem, and while I haven't tested the commit you mentioned
> earlier, `nointremap` on the failing kernels (6.1.x and 6.2.3) worked.
>
> So far, apart from this mail thread, I've found this reddit thread with the
> issue: https://reddit.com/r/archlinux/comments/11ux6uh/stuck_at_loading_initial_ramdisk/
> For them, updating the BIOS worked. However, for me it didn't. Another
> thing is that David, that person, and I all use 1st gen Ryzen processors (in
> my case, a Ryzen 3 1200).

Yeah, this looks like something's borked with interrupt remapping and
timer interrupt when the code looks at that online capable bit. I guess
interrupt remapping doesn't consider that bit and still remaps to cores
which are now *not* onlined, leading to the panic.

But this is all conjecture of me trying to connect the IO-APIC
observation to this online capable bit.

And, of course, I cannot trigger it:

[ 0.000000] Linux version 6.1.21 (root@epic) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC Wed Mar 29 12:00:57 CEST 2023

...

[ 0.200425] smpboot: CPU0: AMD EPYC 7251 8-Core Processor (family: 0x17, model: 0x1, stepping: 0x2)

...

[ 4.019751] AMD-Vi: Interrupt remapping enabled

So it looks like only some Zen1 client BIOSes are b0rked. Which is
swell, again. ;-\

But let's wait for tglx to look at this first.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2023-03-29 16:31:08

by Borislav Petkov

Subject: Re: Panic starting 6.2.x and later 6.1.x kernels

Gabriel and David,

can you both pls do:

# acpidump -n MADT

as root and dump the output here?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-29 16:32:34

by David R

Subject: Re: Panic starting 6.2.x and later 6.1.x kernels

On 29/03/2023 17:14, Borislav Petkov wrote:
> Gabriel and David,
>
> can you both pls do:
>
> # acpidump -n MADT
>
> as root and dump the output here?
>
> Thx.
>
APIC @ 0x0000000000000000
    0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
    0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20  A M I ... ..AMI
    0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
    0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
    0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
    0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
    0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
    0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
    0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
    0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
    00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
    00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
    00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
    00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............


Cheers
David
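
For anyone following along, the dump above can be decoded with a few lines of Python. This sketch is not from the thread. It assumes the standard MADT layout from the ACPI specification (36-byte table header, 4-byte local APIC address, 4-byte flags, then type/length records) and decodes only type 0 (Processor Local APIC) records. The name parse_local_apics and the dictionary keys are made up for illustration. Bit 1 of the flags (Online Capable) is only defined by newer revisions of the ACPI specification, so on a revision 3 table it is shown purely for comparison.

```python
import struct

# Hex bytes copied from the acpidump output above
# (offset and ASCII columns dropped).
MADT_HEX = """
41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41
41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20
13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00
01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02
01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08
01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A
01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00
00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00
00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00
00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00
00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05
00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C
0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00
00 00 00 00 02 0A 00 09 09 00 00 00 0F 00
"""

MADT_ENTRIES_OFFSET = 44   # 36-byte header + local APIC address + flags
TYPE_LOCAL_APIC = 0        # Processor Local APIC record, 8 bytes long
FLAG_ENABLED = 1 << 0
FLAG_ONLINE_CAPABLE = 1 << 1


def parse_local_apics(table: bytes):
    """Return (table revision, list of Processor Local APIC records)."""
    length, revision = struct.unpack_from("<IB", table, 4)
    end = min(length, len(table))
    cpus = []
    off = MADT_ENTRIES_OFFSET
    while off + 2 <= end:
        etype, elen = table[off], table[off + 1]
        if elen < 2 or off + elen > end:
            break  # malformed record, stop instead of looping forever
        if etype == TYPE_LOCAL_APIC and elen == 8:
            uid, apic_id, flags = struct.unpack_from("<BBI", table, off + 2)
            cpus.append({
                "uid": uid,
                "apic_id": apic_id,
                "enabled": bool(flags & FLAG_ENABLED),
                "online_capable": bool(flags & FLAG_ONLINE_CAPABLE),
            })
        off += elen
    return revision, cpus


table = bytes.fromhex("".join(MADT_HEX.split()))
revision, cpus = parse_local_apics(table)
print(f"MADT revision {revision}, {len(cpus)} Local APIC records")
for cpu in cpus:
    print(cpu)
```

On this table the sketch reports a revision 3 MADT with 16 Local APIC records: 8 enabled ones (APIC IDs 0-3 and 8-11) and 8 placeholder records with flags 0, i.e. neither enabled nor marked online capable.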

2023-03-29 18:00:33

by Mario Limonciello

Subject: Re: Panic starting 6.2.x and later 6.1.x kernels

On 3/29/2023 11:20, David R wrote:
> On 29/03/2023 17:14, Borislav Petkov wrote:
>> Gabriel and David,
>>
>> can you both pls do:
>>
>> # acpidump -n MADT
>>
>> as root and dump the output here?
>>
>> Thx.
>>
> APIC @ 0x0000000000000000
>     0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
>     0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20  A M I ... ..AMI
>     0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
>     0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
>     0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
>     0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
>     0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
>     0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
>     0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
>     0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
>     00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
>     00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
>     00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
>     00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
>
>
> Cheers
> David
>

Can you guys have a try with this patch to see if it helps the situation?

https://lore.kernel.org/linux-pm/[email protected]/T/#u

Thanks,

2023-03-29 19:08:26

by David R

Subject: Re: Panic starting 6.2.x and later 6.1.x kernels

On 29/03/2023 18:51, Limonciello, Mario wrote:
>> APIC @ 0x0000000000000000
>>     0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
>>     0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20  A M I ... ..AMI
>>     0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
>>     0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
>>     0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
>>     0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
>>     0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
>>     0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
>>     0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
>>     0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
>>     00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
>>     00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
>>     00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
>>     00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
>>
>>
>> Cheers
>> David
>>
>
> Can you guys have a try with this patch to see if it helps the situation?
>
> https://lore.kernel.org/linux-pm/[email protected]/T/#u
>
>
> Thanks,

Your patch on top of 6.2.8 brought the crash back I'm afraid.

Cheers
David

2023-03-29 19:10:53

by Mario Limonciello

Subject: Re: Panic starting 6.2.x and later 6.1.x kernels

On 3/29/2023 14:03, David R wrote:
> On 29/03/2023 18:51, Limonciello, Mario wrote:
>>> APIC @ 0x0000000000000000
>>>     0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
>>>     0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20  A M I ... ..AMI
>>>     0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
>>>     0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
>>>     0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
>>>     0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
>>>     0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
>>>     0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
>>>     0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
>>>     0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
>>>     00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
>>>     00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
>>>     00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
>>>     00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
>>>
>>>
>>> Cheers
>>> David
>>>
>>
>> Can you guys have a try with this patch to see if it helps the situation?
>>
>> https://lore.kernel.org/linux-pm/[email protected]/T/#u
>>
>> Thanks,
>
> Your patch on top of 6.2.8 brought the crash back I'm afraid.
>
> Cheers
> David

Humm. In that case I'm a bit worried that there are some conflicting patches
that caused this result. Could you try with both e2869bd7af60 and
aa06e20f1be6 reverted? If that also fails, I think a more complicated bisect
removing those commits is needed.