On Tue, Mar 28, 2023 at 09:26:16PM -0400, Gabriel David wrote:
>
> On 3/28/23 1:10 PM, Borislav Petkov wrote:
> > On Tue, Mar 28, 2023 at 04:06:41PM +0100, David R wrote:
> > > Yes, that patch fixes it also. By all means add my tested by:
> > Ok, thanks for checking. That issue is still weird, tho, and we don't have
> > an idea why that happens.
> >
> > If you could test your original, failing kernel with "nointremap" on the
> > command line, that would be cool.
> >
> > Thx.
> >
> I have the same problem, and while I haven't tested the commit you mentioned
> earlier, `nointremap` on the failing kernels (6.1.x and 6.2.3) worked.
>
> So far, apart from this mail thread, I've found this reddit thread with the
> issue: https://reddit.com/r/archlinux/comments/11ux6uh/stuck_at_loading_initial_ramdisk/
> Updating the BIOS worked for them, but it didn't for me. Another
> thing is that David, that person, and I all use 1st gen Ryzen processors (in
> my case, a Ryzen 3 1200).

Yeah, this looks like something's borked with interrupt remapping and
the timer interrupt when the code looks at that online capable bit. I guess
interrupt remapping doesn't consider that bit and still remaps to cores
which are now *not* onlined, leading to the panic.

But this is all conjecture on my part, trying to connect the IO-APIC
observation to this online capable bit.

And, of course, I cannot trigger it:

[ 0.000000] Linux version 6.1.21 (root@epic) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC Wed Mar 29 12:00:57 CEST 2023
...
[ 0.200425] smpboot: CPU0: AMD EPYC 7251 8-Core Processor (family: 0x17, model: 0x1, stepping: 0x2)
...
[ 4.019751] AMD-Vi: Interrupt remapping enabled

So it looks like only some Zen1 client BIOSes are b0rked. Which is
swell, again. ;-\

But let's wait for tglx to look at this first.

Thx.

--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Gabriel and David,
can you both pls do:
# acpidump -n MADT
as root and dump the output here?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 29/03/2023 17:14, Borislav Petkov wrote:
> Gabriel and David,
>
> can you both pls do:
>
> # acpidump -n MADT
>
> as root and dump the output here?
>
> Thx.
>
APIC @ 0x0000000000000000
0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20 A M I ... ..AMI
0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
Cheers
David
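
For anyone who wants to read the per-CPU flags out of the raw dump above
without running it through a disassembler, here is a minimal Python sketch.
It assumes the MADT layout from the ACPI specification: a 36-byte table
header, a 4-byte local APIC address, 4 bytes of table flags, then
variable-length entries, where type 0 is a Processor Local APIC entry with
flags bit 0 = Enabled and bit 1 = Online Capable. The names dump_to_bytes()
and local_apic_entries() are made up for this example, and the sketch only
decodes the table; it does not try to mirror what the kernel does with it.

```python
import re
import struct

# MADT ("APIC" table) bytes, copied from the acpidump output in this thread.
MADT_DUMP = """\
0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20 A M I ... ..AMI
0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
"""

# "OFFS: " followed by up to 16 two-digit hex bytes; the ASCII column is ignored.
HEX_LINE = re.compile(r"\s*[0-9A-Fa-f]{4}:\s+((?:[0-9A-Fa-f]{2}(?:\s|$)){1,16})")

MADT_ENTRIES_OFFSET = 44       # 36-byte header + 4-byte LAPIC address + 4-byte flags
LAPIC_ENABLED = 1 << 0         # Local APIC flags, bit 0
LAPIC_ONLINE_CAPABLE = 1 << 1  # Local APIC flags, bit 1 (per the ACPI spec)


def dump_to_bytes(text):
    """Turn acpidump-style hex lines back into the raw table bytes."""
    data = bytearray()
    for line in text.splitlines():
        m = HEX_LINE.match(line)
        if m:
            data += bytes.fromhex(m.group(1))
    return bytes(data)


def local_apic_entries(table):
    """Return (uid, apic_id, enabled, online_capable) for each type-0 entry."""
    length = struct.unpack_from("<I", table, 4)[0]  # table length from the header
    end = min(length, len(table))
    entries = []
    off = MADT_ENTRIES_OFFSET
    while off + 2 <= end:
        etype, elen = table[off], table[off + 1]
        if elen < 2 or off + elen > end:
            break  # malformed entry, stop rather than loop forever
        if etype == 0 and elen == 8:
            uid, apic_id, flags = struct.unpack_from("<BBI", table, off + 2)
            entries.append((uid, apic_id,
                            bool(flags & LAPIC_ENABLED),
                            bool(flags & LAPIC_ONLINE_CAPABLE)))
        off += elen
    return entries


if __name__ == "__main__":
    for uid, apic_id, enabled, capable in local_apic_entries(dump_to_bytes(MADT_DUMP)):
        print(f"UID {uid:2d}  APIC ID {apic_id:#04x}  "
              f"enabled={int(enabled)}  online_capable={int(capable)}")
```

On this dump the sketch lists 16 Local APIC entries: eight enabled ones with
APIC IDs 0-3 and 8-11, and eight more with all flags clear, each reporting
APIC ID 0. Whether those trailing entries are what trips up interrupt
remapping is still only the conjecture discussed earlier in the thread.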
On 3/29/2023 11:20, David R wrote:
> On 29/03/2023 17:14, Borislav Petkov wrote:
>> Gabriel and David,
>>
>> can you both pls do:
>>
>> # acpidump -n MADT
>>
>> as root and dump the output here?
>>
>> Thx.
>>
> APIC @ 0x0000000000000000
> 0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
> 0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20 A M I ... ..AMI
> 0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
> 0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
> 0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
> 0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
> 0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
> 0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
> 0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
> 0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
> 00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
> 00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
> 00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
> 00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
>
>
> Cheers
> David
>
Can you guys have a try with this patch to see if it helps the situation?
https://lore.kernel.org/linux-pm/[email protected]/T/#u
Thanks,
On 29/03/2023 18:51, Limonciello, Mario wrote:
>> APIC @ 0x0000000000000000
>> 0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
>> 0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20 A M I ... ..AMI
>> 0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
>> 0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
>> 0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
>> 0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
>> 0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
>> 0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
>> 0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
>> 0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
>> 00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
>> 00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
>> 00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
>> 00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
>>
>>
>> Cheers
>> David
>>
>
> Can you guys have a try with this patch to see if it helps the situation?
>
> https://lore.kernel.org/linux-pm/[email protected]/T/#u
>
>
> Thanks,
Your patch on top of 6.2.8 brought the crash back, I'm afraid.
Cheers
David
On 3/29/2023 14:03, David R wrote:
> On 29/03/2023 18:51, Limonciello, Mario wrote:
>>> APIC @ 0x0000000000000000
>>> 0000: 41 50 49 43 DE 00 00 00 03 AC 41 4C 41 53 4B 41 APIC......ALASKA
>>> 0010: 41 20 4D 20 49 20 00 00 09 20 07 01 41 4D 49 20 A M I ... ..AMI
>>> 0020: 13 00 01 00 00 00 E0 FE 01 00 00 00 00 08 01 00 ................
>>> 0030: 01 00 00 00 00 08 02 01 01 00 00 00 00 08 03 02 ................
>>> 0040: 01 00 00 00 00 08 04 03 01 00 00 00 00 08 05 08 ................
>>> 0050: 01 00 00 00 00 08 06 09 01 00 00 00 00 08 07 0A ................
>>> 0060: 01 00 00 00 00 08 08 0B 01 00 00 00 00 08 09 00 ................
>>> 0070: 00 00 00 00 00 08 0A 00 00 00 00 00 00 08 0B 00 ................
>>> 0080: 00 00 00 00 00 08 0C 00 00 00 00 00 00 08 0D 00 ................
>>> 0090: 00 00 00 00 00 08 0E 00 00 00 00 00 00 08 0F 00 ................
>>> 00A0: 00 00 00 00 00 08 10 00 00 00 00 00 04 06 FF 05 ................
>>> 00B0: 00 01 01 0C 09 00 00 00 C0 FE 00 00 00 00 01 0C ................
>>> 00C0: 0A 00 00 10 C0 FE 18 00 00 00 02 0A 00 00 02 00 ................
>>> 00D0: 00 00 00 00 02 0A 00 09 09 00 00 00 0F 00 ..............
>>>
>>>
>>> Cheers
>>> David
>>>
>>
>> Can you guys have a try with this patch to see if it helps the situation?
>>
>> https://lore.kernel.org/linux-pm/[email protected]/T/#u
>>
>> Thanks,
>
> Your patch on top of 6.2.8 brought the crash back, I'm afraid.
>
> Cheers
> David
Humm. In that case I'm a bit worried there are some conflicting patches
that caused this result. Could you try with both
e2869bd7af60 and aa06e20f1be6 reverted? If that also fails, I think a
more complicated bisect with those commits removed is needed.