2023-02-09 01:16:49

by Zev Weiss

Subject: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

While porting OpenBMC to a new platform with a Xeon Gold 6314U CPU
(Ice Lake, 32 cores), I discovered that the core numbering used by the
PECI interface appears to correspond to the cores that are present in
the physical silicon, rather than those that are actually enabled and
usable by the host OS (i.e. it includes cores that the chip was
manufactured with but later had fused off).

Thus far the cputemp driver has transparently exposed that numbering
to userspace in its 'tempX_label' sysfs files, making the core numbers
it reported not align with the core numbering used by the host system,
which seems like an unfortunate source of confusion.

We can instead use a separate counter to label the cores in a
contiguous fashion (0 through numcores-1) so that the core numbering
reported by the PECI cputemp driver matches the numbering seen by the
host.

Signed-off-by: Zev Weiss <[email protected]>
---

Offhand I can't think of any other examples of side effects of that
manufacturing detail (fused-off cores) leaking out in
externally-visible ways, so I'd think it's probably not something we
really want to propagate further.

I've verified that at least on the system I'm working on the numbering
provided by this patch aligns with the host's CPU numbering (loaded
each core individually one by one and saw a corresponding temperature
increase visible via PECI), but I'm not sure if that relationship is
guaranteed to hold on all parts -- Iwona, do you know if that's
something we can rely on?

This patch also leaves the driver's internal core tracking with the
"physical" numbering the PECI interface uses, and hence it's still
sort of visible to userspace in the form of the hwmon channel numbers
used in the names of the sysfs attribute files. If desired we could
also change that to keep the tempX_* file numbers contiguous as well,
though it would necessitate a bit of additional remapping in the
driver to translate between the two.

drivers/hwmon/peci/cputemp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
index 30850a479f61..6b4010cbbfdf 100644
--- a/drivers/hwmon/peci/cputemp.c
+++ b/drivers/hwmon/peci/cputemp.c
@@ -400,14 +400,15 @@ static int init_core_mask(struct peci_cputemp *priv)
 static int create_temp_label(struct peci_cputemp *priv)
 {
 	unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
-	int i;
+	int i, corenum = 0;

 	priv->coretemp_label = devm_kzalloc(priv->dev, (core_max + 1) * sizeof(char *), GFP_KERNEL);
 	if (!priv->coretemp_label)
 		return -ENOMEM;

 	for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
-		priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
+		priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL,
+							 "Core %d", corenum++);
 		if (!priv->coretemp_label[i])
 			return -ENOMEM;
 	}
--
2.39.1.236.ga8a28b9eace8
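
The effect of the patch can be sketched in user space. This is only an illustration under stated assumptions, not driver code: a 64-bit integer stands in for priv->core_mask, and make_core_labels() and LABEL_LEN are hypothetical names. As in the patch, the array index stays the PECI ("physical") core number and only the number printed in the label comes from the separate counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LABEL_LEN 16	/* hypothetical; large enough for "Core 63" */

/*
 * Fill labels[] for every set bit in core_mask (a stand-in for the
 * driver's bitmap). Entries for clear bits are left as empty strings.
 * Returns the number of labels written.
 */
static int make_core_labels(uint64_t core_mask, char labels[64][LABEL_LEN])
{
	int corenum = 0;	/* contiguous counter, as in the patch */

	assert(labels != NULL);

	for (int i = 0; i < 64; i++) {
		labels[i][0] = '\0';
		if (!(core_mask & (UINT64_C(1) << i)))
			continue;
		/* index i is the PECI number; corenum is what gets shown */
		snprintf(labels[i], LABEL_LEN, "Core %d", corenum++);
	}
	return corenum;
}
```

With bits 0, 1, 3 and 6 set, the slots at indices 3 and 6 carry the labels "Core 2" and "Core 3", while the slots themselves (and hence the hwmon channel numbers) keep the sparse PECI numbering.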



2023-02-09 17:50:11

by Guenter Roeck

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On Wed, Feb 08, 2023 at 05:16:32PM -0800, Zev Weiss wrote:
> While porting OpenBMC to a new platform with a Xeon Gold 6314U CPU
> (Ice Lake, 32 cores), I discovered that the core numbering used by the
> PECI interface appears to correspond to the cores that are present in
> the physical silicon, rather than those that are actually enabled and
> usable by the host OS (i.e. it includes cores that the chip was
> manufactured with but later had fused off).
>
> Thus far the cputemp driver has transparently exposed that numbering
> to userspace in its 'tempX_label' sysfs files, making the core numbers
> it reported not align with the core numbering used by the host system,
> which seems like an unfortunate source of confusion.
>
> We can instead use a separate counter to label the cores in a
> contiguous fashion (0 through numcores-1) so that the core numbering
> reported by the PECI cputemp driver matches the numbering seen by the
> host.
>

I don't really have an opinion if this change is desirable or not.
I suspect one could argue either way. I'll definitely want to see
feedback from others. Any comments or thoughts, anyone?

> Signed-off-by: Zev Weiss <[email protected]>
> ---
>
> Offhand I can't think of any other examples of side effects of that
> manufacturing detail (fused-off cores) leaking out in
> externally-visible ways, so I'd think it's probably not something we
> really want to propagate further.
>
> I've verified that at least on the system I'm working on the numbering
> provided by this patch aligns with the host's CPU numbering (loaded
> each core individually one by one and saw a corresponding temperature
> increase visible via PECI), but I'm not sure if that relationship is
> guaranteed to hold on all parts -- Iwona, do you know if that's
> something we can rely on?
>
> This patch also leaves the driver's internal core tracking with the
> "physical" numbering the PECI interface uses, and hence it's still
> sort of visible to userspace in the form of the hwmon channel numbers
> used in the names of the sysfs attribute files. If desired we could
> also change that to keep the tempX_* file numbers contiguous as well,
> though it would necessitate a bit of additional remapping in the
> driver to translate between the two.

I don't really see the point or benefit of doing that.

Thanks,
Guenter


2023-02-10 00:14:48

by Zev Weiss

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On Thu, Feb 09, 2023 at 09:50:01AM PST, Guenter Roeck wrote:
>I don't really have an opinion if this change is desirable or not.
>I suspect one could argue either way. I'll definitely want to see
>feedback from others. Any comments or thoughts, anyone ?
>

Agreed, I'd definitely like to get some input from Intel folks on this.

Though since I realize my initial email didn't quite explain this
explicitly, I should probably clarify with an example how weird the
numbering can get with the existing code -- on the 32-core CPU I'm
working with at the moment, the tempX_label files produce the following
core numbers:

Core 0
Core 1
Core 2
Core 3
Core 4
Core 5
Core 6
Core 7
Core 8
Core 9
Core 11
Core 12
Core 13
Core 14
Core 15
Core 18
Core 20
Core 22
Core 23
Core 24
Core 26
Core 27
Core 28
Core 29
Core 30
Core 31
Core 33
Core 34
Core 35
Core 36
Core 38
Core 39

i.e. it's not just a different permutation of the expected core numbers,
we end up with gaps (e.g. the nonexistence of core 10), and core numbers
well in excess of the number of cores the processor really "has" (e.g.
number 39) -- all of which seems like a rather confusing thing to see in
your BMC's sensor readings.


Thanks,
Zev


2023-02-10 00:27:09

by Guenter Roeck

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On 2/9/23 16:14, Zev Weiss wrote:
> Agreed, I'd definitely like to get some input from Intel folks on this.
>
> Though since I realize my initial email didn't quite explain this explicitly, I should probably clarify with an example how weird the numbering can get with the existing code -- on the 32-core CPU I'm working with at the moment, the tempX_label files produce the following core numbers:
>
>     Core 0
>     Core 1
>     Core 2
>     Core 3
>     Core 4
>     Core 5
>     Core 6
>     Core 7
>     Core 8
>     Core 9
>     Core 11
>     Core 12
>     Core 13
>     Core 14
>     Core 15
>     Core 18
>     Core 20
>     Core 22
>     Core 23
>     Core 24
>     Core 26
>     Core 27
>     Core 28
>     Core 29
>     Core 30
>     Core 31
>     Core 33
>     Core 34
>     Core 35
>     Core 36
>     Core 38
>     Core 39
>
> i.e. it's not just a different permutation of the expected core numbers, we end up with gaps (e.g. the nonexistence of core 10), and core numbers well in excess of the number of cores the processor really "has" (e.g. number 39) -- all of which seems like a rather confusing thing to see in your BMC's sensor readings.
>

Sure, but what do you see with /proc/cpuinfo and with coretemp
on the host ? It might be even more confusing if the core numbers
reported by the peci driver don't match the core numbers provided
by other tools.

Guenter


2023-02-10 01:48:52

by Zev Weiss

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On Thu, Feb 09, 2023 at 04:26:47PM PST, Guenter Roeck wrote:
>Sure, but what do you see with /proc/cpuinfo and with coretemp
>on the host ? It might be even more confusing if the core numbers
>reported by the peci driver don't match the core numbers provided
>by other tools.
>

The host sees them numbered as the usual 0-31 you'd generally expect,
and assigned to those cores in the same increasing order -- hence the
patch bringing the two into alignment with each other. Currently only
cores 0 through 9 match up between the two, and the rest are off by
somewhere between one and eight.


Zev
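
The "off by somewhere between one and eight" figure can be checked against the tempX_label listing quoted earlier in the thread. The sketch below assumes, as the message describes for this one system, that host core n corresponds to the n-th enabled PECI core in increasing order; offset_range() and the array name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* PECI core numbers taken from the tempX_label listing in the thread. */
static const int peci_cores[] = {
	 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 11, 12, 13, 14, 15, 18,
	20, 22, 23, 24, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 38, 39,
};
#define NUM_CORES (sizeof(peci_cores) / sizeof(peci_cores[0]))

/*
 * Count how many cores have identical PECI and host numbers, and find
 * the smallest and largest nonzero difference among the rest.
 * Assumption: host core n is the n-th entry of peci_cores[].
 */
static void offset_range(size_t *matching, int *min_off, int *max_off)
{
	assert(matching && min_off && max_off);

	*matching = 0;
	*min_off = 0;
	*max_off = 0;

	for (size_t n = 0; n < NUM_CORES; n++) {
		int off = peci_cores[n] - (int)n;

		if (off == 0) {
			(*matching)++;
			continue;
		}
		if (*min_off == 0 || off < *min_off)
			*min_off = off;
		if (off > *max_off)
			*max_off = off;
	}
}
```

For this listing the function reports 10 matching cores (0 through 9), a minimum offset of 1 (PECI core 11 is host core 10) and a maximum of 8 (PECI core 39 is host core 31).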


2023-02-10 18:45:22

by Guenter Roeck

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On Thu, Feb 09, 2023 at 05:48:41PM -0800, Zev Weiss wrote:
> The host sees them numbered as the usual 0-31 you'd generally expect, and
> assigned to those cores in the same increasing order -- hence the patch
> bringing the two into alignment with each other. Currently only cores 0
> through 9 match up between the two, and the rest are off by somewhere
> between one and eight.
>

Hmm, interesting. It is not sequential on my large system (Intel(R) Xeon(R)
Gold 6154). I also know for sure that core IDs on Intel server CPUs are
typically not sequential. The processor number is sequential, but the core
ID isn't. On my system, the output from the "sensors" command (that is,
from the coretemp driver) matches the non-sequential core IDs from
/proc/cpuinfo, which is exactly how I would expect it to be.

Guenter

2023-02-18 21:26:36

by Winiarska, Iwona

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On Fri, 2023-02-10 at 10:45 -0800, Guenter Roeck wrote:
> Hmm, interesting. It is not sequential on my large system (Intel(R) Xeon(R)
> Gold 6154). I also know for sure that core IDs on Intel server CPUs are
> typically not sequential. The processor number is sequential, but the core
> ID isn't. On my system, the output from the "sensors" command (that is,
> from the coretemp driver) matches the non-sequential core IDs from
> /proc/cpuinfo, which is exactly how I would expect it to be.
>
> Guenter

On Linux, on the host side, the core ID is obtained from EDX of CPUID(EAX=0xb).
Unfortunately, the value exposed to the host (and whether it's in sequential or
non-sequential form) can vary from platform to platform (which, BTW, is why on
Linux the core ID shouldn't really be used for any logic related to task
placement - topology info should be used instead).
From the BMC perspective, we'll always get the non-sequential form.

If we just apply the patch proposed by Zev, we'll end up being consistent on one
set of platforms and inconsistent on another.
If we want to make things consistent, we need a different approach: either
obtaining additional information over PECI or limiting the scope of the
proposed change to specific platforms.

Thanks
-Iwona
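
For readers unfamiliar with CPUID leaf 0xb: EDX returns the x2APIC ID of the executing logical processor, and EAX[4:0] of each subleaf gives the number of low bits to shift away to reach the next topology level. A core ID can then be extracted as a bit field of the x2APIC ID. The sketch below shows only that bit-field arithmetic; core_id_from_x2apic() is a hypothetical helper, and the shift widths are assumed to have been read from subleaf 0 (SMT level) and subleaf 1 (core level) beforehand. It is not a copy of the kernel's topology code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * x2apic_id:  EDX of CPUID(EAX=0xb)
 * smt_shift:  EAX[4:0] of CPUID(EAX=0xb, ECX=0), bits used by SMT threads
 * core_shift: EAX[4:0] of CPUID(EAX=0xb, ECX=1), bits used by threads+cores
 *
 * The core ID is the field between the two shift positions.
 */
static uint32_t core_id_from_x2apic(uint32_t x2apic_id, unsigned int smt_shift,
				    unsigned int core_shift)
{
	uint32_t core_bits, mask;

	assert(smt_shift <= core_shift && core_shift < 32);

	core_bits = core_shift - smt_shift;
	mask = (UINT32_C(1) << core_bits) - 1;

	return (x2apic_id >> smt_shift) & mask;
}
```

Because the field is carved out of an ID assigned by the platform, nothing in this arithmetic forces the resulting values to be contiguous, which is consistent with the platform-to-platform variation described above.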

2023-02-21 23:56:01

by Zev Weiss

Subject: Re: [RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

On Sat, Feb 18, 2023 at 01:20:14PM PST, Winiarska, Iwona wrote:
>On Linux, on the host side, the core ID is obtained from EDX of CPUID(EAX=0xb).
>Unfortunately, the value exposed to the host (and whether it's in sequential or
>non-sequential form) can vary from platform to platform (which, BTW, is why on
>Linux the core ID shouldn't really be used for any logic related to task
>placement - topology info should be used instead).
>From the BMC perspective, we'll always get the non-sequential form.
>
>If we just apply the patch proposed by Zev, we'll end up being consistent on one
>set of platforms and inconsistent on another.
>If we want to make things consistent, we need a different approach: either
>obtaining additional information over PECI or limiting the scope of the
>proposed change to specific platforms.
>
>Thanks
>-Iwona
>

Okay, I was sort of afraid of something like that.

Does PECI provide the necessary information to reliably map its
(physical silicon I presume) core numbers to the logical numbers seen by
the host OS? The PECI specs I have don't seem to mention anything along
those lines as far as I can see, though perhaps there are newer or more
detailed ones I don't have access to.

If not, how difficult would it be to classify known CPU models by
distinct core-numbering schemes to handle it "manually" in the driver?
If the necessary information is available I could try to develop a patch
for it.


Thanks,
Zev
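
One possible shape for the "manual" per-model handling is a lookup table that selects a labeling scheme, falling back to the current behaviour for anything unlisted. Everything here is hypothetical: the enum, the struct, the function names, and especially the model IDs, which are placeholders rather than real CPU identifiers, since the thread does not establish which models use which scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum core_numbering {
	CORE_NUM_PHYSICAL,	/* keep PECI numbers in the labels */
	CORE_NUM_CONTIGUOUS,	/* label enabled cores 0..numcores-1 */
};

struct model_scheme {
	uint32_t model_id;	/* placeholder value, not a real CPU ID */
	enum core_numbering scheme;
};

static const struct model_scheme model_schemes[] = {
	{ 0xAAAA, CORE_NUM_CONTIGUOUS },	/* placeholder entry */
	{ 0xBBBB, CORE_NUM_PHYSICAL },		/* placeholder entry */
};

static enum core_numbering lookup_scheme(uint32_t model_id)
{
	size_t n = sizeof(model_schemes) / sizeof(model_schemes[0]);

	for (size_t i = 0; i < n; i++)
		if (model_schemes[i].model_id == model_id)
			return model_schemes[i].scheme;

	return CORE_NUM_PHYSICAL;	/* unknown model: current behaviour */
}

/* Number to print in the label for the enabled core at position 'bit'. */
static int core_label_number(enum core_numbering scheme, uint64_t core_mask,
			     unsigned int bit)
{
	int rank = 0;

	assert(bit < 64 && (core_mask & (UINT64_C(1) << bit)));

	if (scheme == CORE_NUM_PHYSICAL)
		return (int)bit;

	/* contiguous label = number of enabled cores below this one */
	for (unsigned int i = 0; i < bit; i++)
		if (core_mask & (UINT64_C(1) << i))
			rank++;

	return rank;
}
```

Defaulting unknown models to the PECI numbering keeps existing labels unchanged unless a model has been explicitly classified.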