2024-04-10 19:45:36

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 0/2] x86/cpu/amd: Fixup the topology rework fallout

Testing at Collabora unearthed two issues in the new AMD topology parser
code:

1) The CPUID 0x80000008 parser initializes the wrong topology domain
level.

2) The NODEID_MSR parser uses bitfields in a union wrongly which results
in reading out the wrong value and finally in a division by zero.

Many thanks to Laura for helping to debug this issue.

tglx




2024-04-11 11:28:22

by Laura Nao

[permalink] [raw]
Subject: Re: [patch 0/2] x86/cpu/amd: Fixup the topology rework fallout

Hi Thomas,

On 4/10/24 21:45, Thomas Gleixner wrote:
> Testing at Collabora unearthed two issues in the new AMD topology parser
> code:
>
> 1) The CPUID 0x80000008 parser initializes the wrong topology domain
> level.
>
> 2) The NODEID_MSR parser uses bitfields in a union wrongly which results
> in reading out the wrong value and finally in a division by zero.
>
> Many thanks to Laura for helping to debug this issue.
>
> tglx
>
>

Thanks a lot for investigating and solving the issue!

I confirm that with this series applied the kernel boots correctly on
all three AMD Stoney Ridge Chromebooks that were affected by the
regression.

I tested the patches on top of c749ce39 (culprit commit identified by
the bisection) - reference test job:
https://lava.collabora.dev/scheduler/job/13339645

The series doesn't apply directly to next, but I manually applied the
changes on top of next-20240411 and can confirm the kernel boots
correctly with this revision too - reference test job:
https://lava.collabora.dev/scheduler/job/13340321

The regression was originally reported by KernelCI, so:

Reported-by: "kernelci.org bot" <[email protected]>
Tested-by: Laura Nao <[email protected]>

I'll make sure to update the Regzbot tag when the series is merged.

Best,

Laura


Subject: Re: [patch 0/2] x86/cpu/amd: Fixup the topology rework fallout

On 11.04.24 13:27, Laura Nao wrote:
>
> On 4/10/24 21:45, Thomas Gleixner wrote:
>> Testing at Collabora unearthed two issues in the new AMD topology parser
>> code:
>>
>> 1) The CPUID 0x80000008 parser initializes the wrong topology domain
>> level.
>>
>> 2) The NODEID_MSR parser uses bitfields in a union wrongly which results
>> in reading out the wrong value and finally in a division by zero.
>>
>> Many thanks to Laura for helping to debug this issue.
>>
>> tglx
>>
>>
>
> Thanks a lot for investigating and solving the issue!> [...]
>
> The regression was originally reported by KernelCI, so:
>
> Reported-by: "kernelci.org bot" <[email protected]>
> Tested-by: Laura Nao <[email protected]>
>
> I'll make sure to update the Regzbot tag when the series is merged.

No need to wait, we can do that now:

#regzbot fix: x86/cpu/amd: Make the NODEID_MSR union actually work

But ideally Thomas would add Link: or Closes: tag to the patch
description (e.g.

Closes:
https://lore.kernel.org/all/[email protected]/

) just like Linus asked him to do a while ago already[1], as then this
would not be necessary at all. ;) (SCNR)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

[1]
https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/

2024-04-11 12:14:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 0/2] x86/cpu/amd: Fixup the topology rework fallout

On Thu, Apr 11 2024 at 13:37, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 11.04.24 13:27, Laura Nao wrote:
> No need to wait, we can do that now:
>
> #regzbot fix: x86/cpu/amd: Make the NODEID_MSR union actually work
>
> But ideally Thomas would add Link: or Closes: tag to the patch
> description (e.g.
>
> Closes:
> https://lore.kernel.org/all/[email protected]/
>
> ) just like Linus asked him to do a while ago already[1], as then this
> would not be necessary at all. ;) (SCNR)

Will do when applying them and I try to remember that Closes thing, but
you know at my age ....

Thanks,

tglx