2024-03-15 11:43:25

by Feng Tang

[permalink] [raw]
Subject: [PATCH] x86/tsc: Use topology_max_packages() to get package number

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve problem that
sometimes TSC clocksource is wrongly judged as unstable by watchdog
like 'jiffies', HPET, etc.

In it, the hardware package number is a key factor for judging whether
to disable the watchdog for TSC, and 'nr_online_nodes' was chosen as
an estimation due to it is needed in early boot phase before
registering 'tsc-early' clocksource, where all none-boot CPUs are not
brought up yet.

Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
is not accurate, like:

* numa emulation (numa=fake=8 etc.)
* numa=off
* platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
* SNC (sub-numa cluster) mode enabled
* 'nr_cpus=', 'possible_cpus=' 'maxcpus=' kernel cmdline parameter setup

Thomas' recent patchset of refactoring x86 topology code introduces
topology_max_package(), which works well in most of the above cases.
The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
cause topology_max_package() less than the real package number, but
it's fine as it is rarely used debug option, and logical package
number really matters in this check. So use the more accurate
topology_max_package() to replace nr_online_nodes().

Reported-by: Dave Hansen <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Feng Tang <[email protected]>
---
arch/x86/kernel/tsc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 5a69a49acc96..87e7c0e89db1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1252,15 +1252,12 @@ static void __init check_system_tsc_reliable(void)
* - TSC which does not stop in C-States
* - the TSC_ADJUST register which allows to detect even minimal
* modifications
- * - not more than two sockets. As the number of sockets cannot be
- * evaluated at the early boot stage where this has to be
- * invoked, check the number of online memory nodes as a
- * fallback solution which is an reasonable estimate.
+ * - not more than four sockets.
*/
if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
- nr_online_nodes <= 4)
+ topology_max_packages() <= 4)
tsc_disable_clocksource_watchdog();
}

--
2.34.1



2024-03-15 17:58:51

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On 3/15/24 04:26, Feng Tang wrote:
> Thomas' recent patchset of refactoring x86 topology code introduces
> topology_max_package(), which works well in most of the above cases.
> The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> cause topology_max_package() less than the real package number, but
> it's fine as it is rarely used debug option, and logical package
> number really matters in this check. So use the more accurate
> topology_max_package() to replace nr_online_nodes().

In the end, we have a bunch of hardware enumeration and then a bunch of
processing on top of it taking CPU hotplug support and kernel command
lines into account.

The hardware enumeration is relatively simple. The processing the
kernel does on top of it is complicated.

> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 5a69a49acc96..87e7c0e89db1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1252,15 +1252,12 @@ static void __init check_system_tsc_reliable(void)
> * - TSC which does not stop in C-States
> * - the TSC_ADJUST register which allows to detect even minimal
> * modifications
> - * - not more than two sockets. As the number of sockets cannot be
> - * evaluated at the early boot stage where this has to be
> - * invoked, check the number of online memory nodes as a
> - * fallback solution which is an reasonable estimate.
> + * - not more than four sockets.
> */
> if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> - nr_online_nodes <= 4)
> + topology_max_packages() <= 4)
> tsc_disable_clocksource_watchdog();
> }

I know there's some history here, but the changelog itself is not clear
about what the problem is or how the patch solves it.

I also kinda dislike the comment talking about "sockets" and the code
talking about "packages". I also did a big *gulp* when I saw this:

#define topology_max_packages() (__max_logical_packages)

and:

> /*
> * Today neither Intel nor AMD support heterogeneous systems so
> * extrapolate the boot cpu's data to all packages.
> */
> ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
> __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);

Because Intel obviously has heterogeneous systems today.

So I'll buy that removing 'nr_online_nodes' takes NUMA out of the
picture (which is good), but I want to hear more about why
topology_max_packages() and '4' are the right things to be checking.

I suspect the real reason '4' was picked was to give the calculation
some wiggle room because it's not actually all that precise.



2024-03-15 20:58:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Fri, Mar 15 2024 at 10:58, Dave Hansen wrote:
> On 3/15/24 04:26, Feng Tang wrote:
>> /*
>> * Today neither Intel nor AMD support heterogeneous systems so
>> * extrapolate the boot cpu's data to all packages.
>> */
>> ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
>> __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>
> Because Intel obviously has heterogeneous systems today.

Hybrid is a per package property.

But neither Intel nor AMD support populating multi socket systems with
random packages, where socket 0 has less cores than socket 1 or socket 0
is hybrid and socket 1 is not.

> So I'll buy that removing 'nr_online_nodes' takes NUMA out of the
> picture (which is good), but I want to hear more about why
> topology_max_packages() and '4' are the right things to be checking.
>
> I suspect the real reason '4' was picked was to give the calculation
> some wiggle room because it's not actually all that precise.

IIRC the TSC is only guaranteed to be synchronized up to 4 sockets, but
my memory might be wrong as usual.

Thanks,

tglx

2024-03-17 12:00:42

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Fri, 2024-03-15 at 10:58 -0700, Dave Hansen wrote:
> On 3/15/24 04:26, Feng Tang wrote:
> > Thomas' recent patchset of refactoring x86 topology code introduces
> > topology_max_package(), 

s/topology_max_package/topology_max_packages

And topology_max_packages() is not new. The patch set actually
improves/fixes it.


>
> I also did a big *gulp* when I saw this:
>
>         #define topology_max_packages() (__max_logical_packages)
>
> and:
>
> >         /*
> >          * Today neither Intel nor AMD support heterogeneous
> > systems so
> >          * extrapolate the boot cpu's data to all packages.
> >          */
> >         ncpus = cpu_data(0).booted_cores *
> > topology_max_smt_threads();
> >         __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>
> Because Intel obviously has heterogeneous systems today.
>
Dave, I think you were checking the old code.
Please refer to commit 090610ba704a ("x86/cpu/topology: Use topology
bitmaps for sizing"), which is just merged in this merge window.

thanks,
rui

2024-03-18 01:17:31

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Fri, Mar 15, 2024 at 09:51:32PM +0100, Thomas Gleixner wrote:
> On Fri, Mar 15 2024 at 10:58, Dave Hansen wrote:
> > On 3/15/24 04:26, Feng Tang wrote:
> >> /*
> >> * Today neither Intel nor AMD support heterogeneous systems so
> >> * extrapolate the boot cpu's data to all packages.
> >> */
> >> ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
> >> __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
> >
> > Because Intel obviously has heterogeneous systems today.
>
> Hybrid is a per package property.
>
> But neither Intel nor AMD support populating multi socket systems with
> random packages, where socket 0 has less cores than socket 1 or socket 0
> is hybrid and socket 1 is not.


Before posting the patch, I run the latest upstream kernel with your
topology refactoring patchset on one AlderLake and one MetorLake box,
and they both show the number of package is 1.

>
> > So I'll buy that removing 'nr_online_nodes' takes NUMA out of the
> > picture (which is good), but I want to hear more about why
> > topology_max_packages() and '4' are the right things to be checking.
> >
> > I suspect the real reason '4' was picked was to give the calculation
> > some wiggle room because it's not actually all that precise.
>
> IIRC the TSC is only guaranteed to be synchronized up to 4 sockets, but
> my memory might be wrong as usual.

Yes. I did try to increase the bar to '8' with a patch, and at that
time Peter Zijlstra mentioned there was real issue with TSC found on
some old 8 sockets machine.

Thanks,
Feng

>
> Thanks,
>
> tglx

2024-03-18 01:31:11

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Fri, Mar 15, 2024 at 10:58:38AM -0700, Dave Hansen wrote:
> On 3/15/24 04:26, Feng Tang wrote:
> > Thomas' recent patchset of refactoring x86 topology code introduces
> > topology_max_package(), which works well in most of the above cases.
> > The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> > sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> > cause topology_max_package() less than the real package number, but
> > it's fine as it is rarely used debug option, and logical package
> > number really matters in this check. So use the more accurate
> > topology_max_package() to replace nr_online_nodes().
>
> In the end, we have a bunch of hardware enumeration and then a bunch of
> processing on top of it taking CPU hotplug support and kernel command
> lines into account.
>
> The hardware enumeration is relatively simple. The processing the
> kernel does on top of it is complicated.

Yes!

>
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 5a69a49acc96..87e7c0e89db1 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1252,15 +1252,12 @@ static void __init check_system_tsc_reliable(void)
> > * - TSC which does not stop in C-States
> > * - the TSC_ADJUST register which allows to detect even minimal
> > * modifications
> > - * - not more than two sockets. As the number of sockets cannot be
> > - * evaluated at the early boot stage where this has to be
> > - * invoked, check the number of online memory nodes as a
> > - * fallback solution which is an reasonable estimate.
> > + * - not more than four sockets.
> > */
> > if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > - nr_online_nodes <= 4)
> > + topology_max_packages() <= 4)
> > tsc_disable_clocksource_watchdog();
> > }
>
> I know there's some history here, but the changelog itself is not clear
> about what the problem is or how the patch solves it.

OK, will improve the changelog. The problem is nr_online_nodes() is
not a good option for get package number, it is mostly a memory node
concept, and easy be cheated by different kernel cmdline setup like
NUMA emulation and hotplug tricks, while it had an advantage of being
availab early before TSC get initialized. Thomas' patchset improves
the topology code, and provide a much better topology_max_packages().

>
> I also kinda dislike the comment talking about "sockets" and the code
> talking about "packages".

Will unifiy to use 'package' term.

> I also did a big *gulp* when I saw this:
>
> #define topology_max_packages() (__max_logical_packages)
>
> and:

Latest code has dropped this.

Thanks,
Feng

> > /*
> > * Today neither Intel nor AMD support heterogeneous systems so
> > * extrapolate the boot cpu's data to all packages.
> > */
> > ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
> > __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>
> Because Intel obviously has heterogeneous systems today.
>
> So I'll buy that removing 'nr_online_nodes' takes NUMA out of the
> picture (which is good), but I want to hear more about why
> topology_max_packages() and '4' are the right things to be checking.
>
> I suspect the real reason '4' was picked was to give the calculation
> some wiggle room because it's not actually all that precise.
>
>

2024-03-18 01:33:32

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Sun, Mar 17, 2024 at 08:00:26PM +0800, Zhang, Rui wrote:
> On Fri, 2024-03-15 at 10:58 -0700, Dave Hansen wrote:
> > On 3/15/24 04:26, Feng Tang wrote:
> > > Thomas' recent patchset of refactoring x86 topology code introduces
> > > topology_max_package(), 
>
> s/topology_max_package/topology_max_packages
>
> And topology_max_packages() is not new. The patch set actually
> improves/fixes it.

Aha, you are right. Will fix the typo.

Thanks,
Feng

>
> >
> > I also did a big *gulp* when I saw this:
> >
> >         #define topology_max_packages() (__max_logical_packages)
> >
> > and:
> >
> > >         /*
> > >          * Today neither Intel nor AMD support heterogeneous
> > > systems so
> > >          * extrapolate the boot cpu's data to all packages.
> > >          */
> > >         ncpus = cpu_data(0).booted_cores *
> > > topology_max_smt_threads();
> > >         __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
> >
> > Because Intel obviously has heterogeneous systems today.
> >
> Dave, I think you were checking the old code.
> Please refer to commit 090610ba704a ("x86/cpu/topology: Use topology
> bitmaps for sizing"), which is just merged in this merge window.
>
> thanks,
> rui

2024-03-18 02:04:45

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number


On 3/15/24 07:26, Feng Tang wrote:
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduced to solve problem that
> sometimes TSC clocksource is wrongly judged as unstable by watchdog
> like 'jiffies', HPET, etc.
>
> In it, the hardware package number is a key factor for judging whether
> to disable the watchdog for TSC, and 'nr_online_nodes' was chosen as
> an estimation due to it is needed in early boot phase before
> registering 'tsc-early' clocksource, where all none-boot CPUs are not

"none-boot"? You mean "non-boot". Right?

Other than that, the patch looks reasonable to me.

Thanks,
Longman



2024-03-18 02:12:22

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Sun, Mar 17, 2024 at 10:03:38PM -0400, Waiman Long wrote:
>
> On 3/15/24 07:26, Feng Tang wrote:
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduced to solve problem that
> > sometimes TSC clocksource is wrongly judged as unstable by watchdog
> > like 'jiffies', HPET, etc.
> >
> > In it, the hardware package number is a key factor for judging whether
> > to disable the watchdog for TSC, and 'nr_online_nodes' was chosen as
> > an estimation due to it is needed in early boot phase before
> > registering 'tsc-early' clocksource, where all none-boot CPUs are not
>
> "none-boot"? You mean "non-boot". Right?

Yes, you are right. will fix it. non-boot CPU means AP (application
processor) here.

>
> Other than that, the patch looks reasonable to me.

Thanks for the review!

- Feng

> Thanks,
> Longman
>
>

2024-03-18 15:09:08

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On 3/17/24 05:00, Zhang, Rui wrote:
>> I also did a big *gulp* when I saw this:
>>
>>         #define topology_max_packages() (__max_logical_packages)
>>
>> and:
>>
>>>         /*
>>>          * Today neither Intel nor AMD support heterogeneous
>>> systems so
>>>          * extrapolate the boot cpu's data to all packages.
>>>          */
>>>         ncpus = cpu_data(0).booted_cores *
>>> topology_max_smt_threads();
>>>         __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>> Because Intel obviously has heterogeneous systems today.
>>
> Dave, I think you were checking the old code.
> Please refer to commit 090610ba704a ("x86/cpu/topology: Use topology
> bitmaps for sizing"), which is just merged in this merge window.

You're 100% right. I was looking at a weeks-old tree. Sorry for the noise.

2024-03-18 16:36:07

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On 3/15/24 04:26, Feng Tang wrote:
> Thomas' recent patchset of refactoring x86 topology code introduces
> topology_max_package(), which works well in most of the above cases.
> The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> cause topology_max_package() less than the real package number, but
> it's fine as it is rarely used debug option, and logical package
> number really matters in this check. So use the more accurate
> topology_max_package() to replace nr_online_nodes().

This is a lot longer than what you have in this changelog, but I hope it
makes sense. This is retreading some of what Thomas's set does, but I
think it's worth re-explaining:

The boot CPU iterates through all enumerated APIC ids and either accepts
or rejects the APIC id. For accepted ids, it figures out which bits of
the id map to the package number. It tracks which package numbers have
been seen in a bitmap. topology_max_package() just returns the number
of bits set in that bitmap.

'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
and can artificially lower the number of bits in the package bitmap and
thus topology_max_package().

This means that, for example, a system with 8 physical packages might
reject all the CPUs on 6 of those packages and be left with only 2
packages and 2 bits set in the package bitmap. It needs the TSC
watchdog, but would disable it anyway. This isn't ideal, but this only
happens for debug-oriented options.

This is fixable by tracking the package numbers for rejected CPUs. But
it's not worth the trouble for debugging.

--

The only thing that comes to mind is if we need something simple like:

if (topo_info.nr_rejected_cpus)
pr_info("TSC might be buggered\n");

.. somewhere.

2024-03-19 02:25:26

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

On Mon, Mar 18, 2024 at 09:35:57AM -0700, Dave Hansen wrote:
> On 3/15/24 04:26, Feng Tang wrote:
> > Thomas' recent patchset of refactoring x86 topology code introduces
> > topology_max_package(), which works well in most of the above cases.
> > The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> > sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> > cause topology_max_package() less than the real package number, but
> > it's fine as it is rarely used debug option, and logical package
> > number really matters in this check. So use the more accurate
> > topology_max_package() to replace nr_online_nodes().
>
> This is a lot longer than what you have in this changelog, but I hope it
> makes sense. This is retreading some of what Thomas's set does, but I
> think it's worth re-explaining:
>
> The boot CPU iterates through all enumerated APIC ids and either accepts
> or rejects the APIC id. For accepted ids, it figures out which bits of
> the id map to the package number. It tracks which package numbers have
> been seen in a bitmap. topology_max_package() just returns the number
> of bits set in that bitmap.
>
> 'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
> and can artificially lower the number of bits in the package bitmap and
> thus topology_max_package().
>
> This means that, for example, a system with 8 physical packages might
> reject all the CPUs on 6 of those packages and be left with only 2
> packages and 2 bits set in the package bitmap. It needs the TSC
> watchdog, but would disable it anyway. This isn't ideal, but this only
> happens for debug-oriented options.
>
> This is fixable by tracking the package numbers for rejected CPUs. But
> it's not worth the trouble for debugging.

Will steal it for the v2 :). Many thanks!

>
> --
>
> The only thing that comes to mind is if we need something simple like:
>
> if (topo_info.nr_rejected_cpus)
> pr_info("TSC might be buggered\n");
>
> ... somewhere.

I think logically this should be together with the package checking
code in tsc.c, and we need to export the info from static data
'topo_info'. The other thought is adding in topology code:

--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
pr_info("Num. threads per package: %3u\n", __num_threads_per_package);

pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
- if (topo_info.nr_rejected_cpus)
+ if (topo_info.nr_rejected_cpus) {
pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
+ if (__max_logical_packages <= 4)
+ pr_warning("TSC might be bugged due to these rejected CPUs\n");
+ }

Thanks,
Feng