Testing at Collabora unearthed two issues in the new AMD topology parser
code:
1) The CPUID 0x80000008 parser initializes the wrong topology domain
level.
2) The NODEID_MSR parser uses bitfields in a union wrongly which results
in reading out the wrong value and finally in a division by zero.
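
For readers unfamiliar with the failure mode in 2), here is a minimal
sketch of it. This is not the kernel code: the type names, field names
and field widths are assumptions chosen for illustration, and the
bitfield layout assumes GCC or Clang on a little-endian target. When
bitfields are declared directly as union members, every one of them
starts at bit 0, so they all alias the same low bits. Wrapping them in
an anonymous struct lays them out consecutively.

```c
#include <stdint.h>

/*
 * Hypothetical register layout, for illustration only:
 * bits 0-2 = node_id, bits 3-5 = nodes_per_pkg.
 * uint64_t bitfields are a GCC/Clang extension to ISO C.
 */

/* Broken: each bitfield is its own union member, so all start at bit 0. */
union nodeid_broken {
	uint64_t node_id       : 3,
		 nodes_per_pkg : 3,
		 unused        : 26;
	uint64_t raw;
};

/* Fixed: the anonymous struct packs the bitfields one after another. */
union nodeid_fixed {
	struct {
		uint64_t node_id       : 3,
			 nodes_per_pkg : 3,
			 unused        : 26;
	};
	uint64_t raw;
};

/* Reads bits 0-2 instead of bits 3-5 because of the aliasing. */
static unsigned int broken_nodes_per_pkg(uint64_t raw)
{
	union nodeid_broken n;

	n.raw = raw;
	return (unsigned int)n.nodes_per_pkg;
}

/* Reads bits 3-5 as intended. */
static unsigned int fixed_nodes_per_pkg(uint64_t raw)
{
	union nodeid_fixed n;

	n.raw = raw;
	return (unsigned int)n.nodes_per_pkg;
}

/* Reads bits 0-2 as intended. */
static unsigned int fixed_node_id(uint64_t raw)
{
	union nodeid_fixed n;

	n.raw = raw;
	return (unsigned int)n.node_id;
}
```

With a raw value of 0x08 the fixed variant yields nodes_per_pkg = 1,
while the broken variant yields 0. A wrong zero of that kind becomes a
division by zero as soon as it is used as a divisor, which is consistent
with the symptom described above, although the exact kernel path is not
reproduced here.
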
Many thanks to Laura for helping to debug this issue.
tglx
Hi Thomas,
On 4/10/24 21:45, Thomas Gleixner wrote:
> Testing at Collabora unearthed two issues in the new AMD topology parser
> code:
>
> 1) The CPUID 0x80000008 parser initializes the wrong topology domain
> level.
>
> 2) The NODEID_MSR parser uses bitfields in a union wrongly which results
> in reading out the wrong value and finally in a division by zero.
>
> Many thanks to Laura for helping to debug this issue.
>
> tglx
>
>
Thanks a lot for investigating and solving the issue!
I confirm that with this series applied the kernel boots correctly on
all three AMD Stoney Ridge Chromebooks that were affected by the
regression.
I tested the patches on top of c749ce39 (culprit commit identified by
the bisection) - reference test job:
https://lava.collabora.dev/scheduler/job/13339645
The series doesn't apply directly to next, but I manually applied the
changes on top of next-20240411 and can confirm the kernel boots
correctly with this revision too - reference test job:
https://lava.collabora.dev/scheduler/job/13340321
The regression was originally reported by KernelCI, so:
Reported-by: "kernelci.org bot" <[email protected]>
Tested-by: Laura Nao <[email protected]>
I'll make sure to update the Regzbot tag when the series is merged.
Best,
Laura
On 11.04.24 13:27, Laura Nao wrote:
>
> On 4/10/24 21:45, Thomas Gleixner wrote:
>> Testing at Collabora unearthed two issues in the new AMD topology parser
>> code:
>>
>> 1) The CPUID 0x80000008 parser initializes the wrong topology domain
>> level.
>>
>> 2) The NODEID_MSR parser uses bitfields in a union wrongly which results
>> in reading out the wrong value and finally in a division by zero.
>>
>> Many thanks to Laura for helping to debug this issue.
>>
>> tglx
>>
>>
>
> Thanks a lot for investigating and solving the issue!
> [...]
>
> The regression was originally reported by KernelCI, so:
>
> Reported-by: "kernelci.org bot" <[email protected]>
> Tested-by: Laura Nao <[email protected]>
>
> I'll make sure to update the Regzbot tag when the series is merged.
No need to wait, we can do that now:
#regzbot fix: x86/cpu/amd: Make the NODEID_MSR union actually work
But ideally Thomas would add Link: or Closes: tag to the patch
description (e.g.
Closes:
https://lore.kernel.org/all/[email protected]/
) just like Linus asked him to do a while ago already[1], as then this
would not be necessary at all. ;) (SCNR)
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
[1]
https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
On Thu, Apr 11 2024 at 13:37, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 11.04.24 13:27, Laura Nao wrote:
> No need to wait, we can do that now:
>
> #regzbot fix: x86/cpu/amd: Make the NODEID_MSR union actually work
>
> But ideally Thomas would add Link: or Closes: tag to the patch
> description (e.g.
>
> Closes:
> https://lore.kernel.org/all/[email protected]/
>
> ) just like Linus asked him to do a while ago already[1], as then this
> would not be necessary at all. ;) (SCNR)
Will do when applying them, and I'll try to remember that Closes thing,
but you know, at my age ....
Thanks,
tglx