2024-03-25 12:24:40

by Feng Tang

[permalink] [raw]
Subject: [PATCH v2] x86/tsc: Use topology_max_packages() to get package number

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve problem that
sometimes TSC clocksource is wrongly judged as unstable by watchdog
like 'jiffies', HPET, etc.

In it, the hardware package number is a key factor for judging whether
to disable the watchdog for TSC, and 'nr_online_nodes' was chosen due
to it is available in early boot phase before registering 'tsc-early'
clocksource, where all non-boot CPUs are not brought up yet.

Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
is cheated and not accurate, like:

* numa emulation (numa=fake=8 etc.)
* numa=off
* platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
* SNC (sub-numa cluster) mode enabled
* 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
* 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
not be onlined after boot

Thomas' recent patchset of refactoring x86 topology code improves
topology_max_packages(), by making it more accurate and available in
early boot phase, which works well in most of the above cases.

The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup. And
the reason is, during topology setup, the boot CPU iterates through
all enumerated APIC ids and either accepts or rejects the APIC id.
For accepted ids, it figures out which bits of the id map to the
package number. It tracks which package numbers have been seen in a
bitmap. topology_max_packages() just returns the number of bits set
in that bitmap.

'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
and can artificially lower the number of bits in the package bitmap
and thus topology_max_packages(). This means that, for example, a
system with 8 physical packages might reject all the CPUs on 6 of those
packages and be left with only 2 packages and 2 bits set in the package
bitmap. It needs the TSC watchdog, but would disable it anyway. This
isn't ideal, but this only happens for debug-oriented options. This is
fixable by tracking the package numbers for rejected CPUs. But it's
not worth the trouble for debugging.

So use topology_max_packages() to replace nr_online_nodes().

Reported-by: Dave Hansen <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Feng Tang <[email protected]>
---

Hi all,

For warning about possible compromise due to 'nr_cpus=' and 'possible_cpus=',
another alternative could be checking whether these has been setup in cmdline
inside tsc.c and warn there.

Changelog:

Since v1:

* Use Dave's detailed elaboration about 'nr_cpus=', 'possible_cpus='
possibly compromising '__max_logical_packages' in commit log
* Fix typos and inaccuracy pointed out by Rui and Longman

arch/x86/kernel/cpu/topology.c | 5 ++++-
arch/x86/kernel/tsc.c | 7 ++-----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 3259b1d4fefe..2db03b00e29b 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
pr_info("Num. threads per package: %3u\n", __num_threads_per_package);

pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
- if (topo_info.nr_rejected_cpus)
+ if (topo_info.nr_rejected_cpus) {
pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
+ if (__max_logical_packages <= 4)
+ pr_warn("TSC might be buggered due to the rejected CPUs\n");
+ }

init_cpu_present(cpumask_of(0));
init_cpu_possible(cpumask_of(0));
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 5a69a49acc96..d00f88f16eb1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1252,15 +1252,12 @@ static void __init check_system_tsc_reliable(void)
* - TSC which does not stop in C-States
* - the TSC_ADJUST register which allows to detect even minimal
* modifications
- * - not more than two sockets. As the number of sockets cannot be
- * evaluated at the early boot stage where this has to be
- * invoked, check the number of online memory nodes as a
- * fallback solution which is an reasonable estimate.
+ * - not more than four packages
*/
if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
- nr_online_nodes <= 4)
+ topology_max_packages() <= 4)
tsc_disable_clocksource_watchdog();
}

--
2.34.1



2024-03-26 01:18:09

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v2] x86/tsc: Use topology_max_packages() to get package number

On 3/24/24 23:09, Feng Tang wrote:
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduced to solve problem that
> sometimes TSC clocksource is wrongly judged as unstable by watchdog
> like 'jiffies', HPET, etc.
>
> In it, the hardware package number is a key factor for judging whether
> to disable the watchdog for TSC, and 'nr_online_nodes' was chosen due
> to it is available in early boot phase before registering 'tsc-early'
> clocksource, where all non-boot CPUs are not brought up yet.
>
> Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
> is cheated and not accurate, like:
>
> * numa emulation (numa=fake=8 etc.)
> * numa=off
> * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
> * SNC (sub-numa cluster) mode enabled
> * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
> * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
> not be onlined after boot
>
> Thomas' recent patchset of refactoring x86 topology code improves
> topology_max_packages(), by making it more accurate and available in
> early boot phase, which works well in most of the above cases.
>
> The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup. And
> the reason is, during topology setup, the boot CPU iterates through
> all enumerated APIC ids and either accepts or rejects the APIC id.
> For accepted ids, it figures out which bits of the id map to the
> package number. It tracks which package numbers have been seen in a
> bitmap. topology_max_packages() just returns the number of bits set
> in that bitmap.
>
> 'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
> and can artificially lower the number of bits in the package bitmap
> and thus topology_max_packages(). This means that, for example, a
> system with 8 physical packages might reject all the CPUs on 6 of those
> packages and be left with only 2 packages and 2 bits set in the package
> bitmap. It needs the TSC watchdog, but would disable it anyway. This
> isn't ideal, but this only happens for debug-oriented options. This is
> fixable by tracking the package numbers for rejected CPUs. But it's
> not worth the trouble for debugging.
>
> So use topology_max_packages() to replace nr_online_nodes().
>
> Reported-by: Dave Hansen <[email protected]>
> Closes: https://lore.kernel.org/lkml/[email protected]/
> Signed-off-by: Feng Tang <[email protected]>
> ---
>
> Hi all,
>
> For warning about possible compromise due to 'nr_cpus=' and 'possible_cpus=',
> another alternative could be checking whether these has been setup in cmdline
> inside tsc.c and warn there.
>
> Changelog:
>
> Since v1:
>
> * Use Dave's detailed elaboration about 'nr_cpus=', 'possible_cpus='
> possibly compromising '__max_logical_packages' in commit log
> * Fix typos and inaccuracy pointed out by Rui and Longman
>
> arch/x86/kernel/cpu/topology.c | 5 ++++-
> arch/x86/kernel/tsc.c | 7 ++-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 3259b1d4fefe..2db03b00e29b 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
> pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
>
> pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
> - if (topo_info.nr_rejected_cpus)
> + if (topo_info.nr_rejected_cpus) {
> pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
> + if (__max_logical_packages <= 4)
> + pr_warn("TSC might be buggered due to the rejected CPUs\n");
> + }

People may sometimes use kernel option like "panic_on_warn=1" to cause a
crash dump on warning. Maybe we should just use pr_info() as the
presence of rejected CPUs are unlikely to be a real problem from the TSC
stability point of view. To emphasize it a bit more, we could add a
"WARNING: " prefix, for example.

Cheers,
Longman


>
> init_cpu_present(cpumask_of(0));
> init_cpu_possible(cpumask_of(0));
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 5a69a49acc96..d00f88f16eb1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1252,15 +1252,12 @@ static void __init check_system_tsc_reliable(void)
> * - TSC which does not stop in C-States
> * - the TSC_ADJUST register which allows to detect even minimal
> * modifications
> - * - not more than two sockets. As the number of sockets cannot be
> - * evaluated at the early boot stage where this has to be
> - * invoked, check the number of online memory nodes as a
> - * fallback solution which is an reasonable estimate.
> + * - not more than four packages
> */
> if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> - nr_online_nodes <= 4)
> + topology_max_packages() <= 4)
> tsc_disable_clocksource_watchdog();
> }
>


2024-03-26 01:28:24

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v2] x86/tsc: Use topology_max_packages() to get package number

On Mon, Mar 25, 2024 at 09:17:38PM -0400, Waiman Long wrote:
> On 3/24/24 23:09, Feng Tang wrote:
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduced to solve problem that
> > sometimes TSC clocksource is wrongly judged as unstable by watchdog
> > like 'jiffies', HPET, etc.
> >
> > In it, the hardware package number is a key factor for judging whether
> > to disable the watchdog for TSC, and 'nr_online_nodes' was chosen due
> > to it is available in early boot phase before registering 'tsc-early'
> > clocksource, where all non-boot CPUs are not brought up yet.
> >
> > Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
> > is cheated and not accurate, like:
> >
> > * numa emulation (numa=fake=8 etc.)
> > * numa=off
> > * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
> > * SNC (sub-numa cluster) mode enabled
> > * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
> > * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
> > not be onlined after boot
> >
> > Thomas' recent patchset of refactoring x86 topology code improves
> > topology_max_packages(), by making it more accurate and available in
> > early boot phase, which works well in most of the above cases.
> >
> > The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup. And
> > the reason is, during topology setup, the boot CPU iterates through
> > all enumerated APIC ids and either accepts or rejects the APIC id.
> > For accepted ids, it figures out which bits of the id map to the
> > package number. It tracks which package numbers have been seen in a
> > bitmap. topology_max_packages() just returns the number of bits set
> > in that bitmap.
> >
> > 'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
> > and can artificially lower the number of bits in the package bitmap
> > and thus topology_max_packages(). This means that, for example, a
> > system with 8 physical packages might reject all the CPUs on 6 of those
> > packages and be left with only 2 packages and 2 bits set in the package
> > bitmap. It needs the TSC watchdog, but would disable it anyway. This
> > isn't ideal, but this only happens for debug-oriented options. This is
> > fixable by tracking the package numbers for rejected CPUs. But it's
> > not worth the trouble for debugging.
> >
> > So use topology_max_packages() to replace nr_online_nodes().
> >
> > Reported-by: Dave Hansen <[email protected]>
> > Closes: https://lore.kernel.org/lkml/[email protected]/
> > Signed-off-by: Feng Tang <[email protected]>
> > ---
> >
> > Hi all,
> >
> > For warning about possible compromise due to 'nr_cpus=' and 'possible_cpus=',
> > another alternative could be checking whether these has been setup in cmdline
> > inside tsc.c and warn there.
> >
> > Changelog:
> >
> > Since v1:
> >
> > * Use Dave's detailed elaboration about 'nr_cpus=', 'possible_cpus='
> > possibly compromising '__max_logical_packages' in commit log
> > * Fix typos and inaccuracy pointed out by Rui and Longman
> >
> > arch/x86/kernel/cpu/topology.c | 5 ++++-
> > arch/x86/kernel/tsc.c | 7 ++-----
> > 2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> > index 3259b1d4fefe..2db03b00e29b 100644
> > --- a/arch/x86/kernel/cpu/topology.c
> > +++ b/arch/x86/kernel/cpu/topology.c
> > @@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
> > pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
> > pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
> > - if (topo_info.nr_rejected_cpus)
> > + if (topo_info.nr_rejected_cpus) {
> > pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
> > + if (__max_logical_packages <= 4)
> > + pr_warn("TSC might be buggered due to the rejected CPUs\n");
> > + }
>
> People may sometimes use kernel option like "panic_on_warn=1" to cause a
> crash dump on warning. Maybe we should just use pr_info() as the presence of
> rejected CPUs are unlikely to be a real problem from the TSC stability point
> of view. To emphasize it a bit more, we could add a "WARNING: " prefix, for
> example.

Good catch! will chagne it to pr_info. The possible hurt of these
debug options are relatively small. In the past several years, I
haven't got a real case that TSC is really broken.

Thanks,
Feng

>
> Cheers,
> Longman
>