Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve the problem that the
TSC clocksource is sometimes wrongly judged as unstable by watchdogs
like 'jiffies', HPET, etc.

In it, the hardware socket number is a key factor in deciding
whether to disable the watchdog for TSC, and 'nr_online_nodes' was
chosen as an estimate because the check is needed in the early boot
phase, before the 'tsc-early' clocksource is registered, when the
non-boot CPUs have not been brought up yet.

In recent patch review, Dave Hansen pointed out many cases where
'nr_online_nodes' can be wrong, like:

* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes
* SNC (sub-numa cluster) mode is enabled

Peter Zijlstra suggested using logical package ids, but they are
only usable after smp_init(), when all CPUs have been initialized.

One solution is to skip the watchdog for the 'tsc-early' clocksource,
and move the check to after smp_init() but before the 'tsc'
clocksource is registered, where 'logical_packages' can be used as a
much more accurate socket count.
Signed-off-by: Feng Tang <[email protected]>
---
Hi reviewers,
I separated the code into 2 patches, as I think they cover 2
different problems and this makes bisecting easier. Feel free to
combine them into one, as 2/2 is a trivial change.
Thanks,
Feng
Changelog:
Since RFC:
* use 'logical_packages' instead of topology_max_packages(), whose
  implementation is not accurate, e.g. for heterogeneous systems
  which have a combination of Core/Atom CPUs like Alderlake (Dave Hansen)
arch/x86/include/asm/topology.h | 3 +++
arch/x86/kernel/smpboot.c | 2 +-
arch/x86/kernel/tsc.c | 42 +++++++++++++--------------------
3 files changed, 20 insertions(+), 27 deletions(-)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..f9002549770c 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -122,8 +122,10 @@ extern unsigned int __max_die_per_package;
#define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
extern unsigned int __max_logical_packages;
#define topology_max_packages() (__max_logical_packages)
+extern unsigned int logical_packages;
+
static inline int topology_max_die_per_package(void)
{
return __max_die_per_package;
@@ -144,6 +147,7 @@ bool topology_is_primary_thread(unsigned int cpu);
bool topology_smt_supported(void);
#else
#define topology_max_packages() (1)
+#define logical_packages (1)
static inline int
topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
static inline int
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3f3ea0287f69..d81156beb7e7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -102,7 +102,7 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
/* Logical package management. We might want to allocate that dynamically */
unsigned int __max_logical_packages __read_mostly;
EXPORT_SYMBOL(__max_logical_packages);
-static unsigned int logical_packages __read_mostly;
+unsigned int logical_packages __read_mostly;
static unsigned int logical_die __read_mostly;
/* Maximum number of SMT threads on any online core */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..178448ef00c7 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1131,8 +1131,7 @@ static struct clocksource clocksource_tsc_early = {
.uncertainty_margin = 32 * NSEC_PER_MSEC,
.read = read_tsc,
.mask = CLOCKSOURCE_MASK(64),
- .flags = CLOCK_SOURCE_IS_CONTINUOUS |
- CLOCK_SOURCE_MUST_VERIFY,
+ .flags = CLOCK_SOURCE_IS_CONTINUOUS,
.vdso_clock_mode = VDSO_CLOCKMODE_TSC,
.enable = tsc_cs_enable,
.resume = tsc_resume,
@@ -1180,12 +1179,6 @@ void mark_tsc_unstable(char *reason)
EXPORT_SYMBOL_GPL(mark_tsc_unstable);
-static void __init tsc_disable_clocksource_watchdog(void)
-{
- clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
- clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-}
-
static void __init check_system_tsc_reliable(void)
{
#if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1202,23 +1195,6 @@ static void __init check_system_tsc_reliable(void)
#endif
if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
tsc_clocksource_reliable = 1;
-
- /*
- * Disable the clocksource watchdog when the system has:
- * - TSC running at constant frequency
- * - TSC which does not stop in C-States
- * - the TSC_ADJUST register which allows to detect even minimal
- * modifications
- * - not more than two sockets. As the number of sockets cannot be
- * evaluated at the early boot stage where this has to be
- * invoked, check the number of online memory nodes as a
- * fallback solution which is an reasonable estimate.
- */
- if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
- boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
- boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
- nr_online_nodes <= 2)
- tsc_disable_clocksource_watchdog();
}
/*
@@ -1413,6 +1389,20 @@ static int __init init_tsc_clocksource(void)
if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
+ /*
+ * Disable the clocksource watchdog when the system has:
+ * - TSC running at constant frequency
+ * - TSC which does not stop in C-States
+ * - the TSC_ADJUST register which allows to detect even minimal
+ * modifications
+ * - not more than two sockets.
+ */
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+ boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+ boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+ logical_packages <= 2)
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+
/*
* When TSC frequency is known (retrieved via MSR or CPUID), we skip
* the refined calibration and directly register it as a clocksource.
@@ -1547,7 +1537,7 @@ void __init tsc_init(void)
}
if (tsc_clocksource_reliable || no_tsc_watchdog)
- tsc_disable_clocksource_watchdog();
+ clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
detect_art();
--
2.34.1
On Fri, 2022-10-21 at 14:21 +0800, Feng Tang wrote:
> [...]
> Changelog:
>
> Since RFC:
> * use 'logical_packages' instead of topology_max_packages(), whose
>   implementation is not accurate, e.g. for heterogeneous systems
>   which have a combination of Core/Atom CPUs like Alderlake (Dave Hansen)
I checked the history of '__max_logical_packages', and realized that

1. for topology_max_packages()/'__max_logical_packages', the divisor
   'ncpus' uses cpu_data(0).booted_cores, which is based on the
   *online* CPUs. So when using kernel cmdlines like maxcpus=/nr_cpus=,
   '__max_logical_packages' can get over-estimated.

2. for 'logical_packages', it equals the number of distinct physical
   Package IDs over all *online* CPUs. So with kernel cmdlines like
   nr_cpus=/maxcpus=, it can get under-estimated.
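To make the over/under-estimation concrete, here is a standalone
userspace toy model (NOT kernel code; it assumes a hypothetical
2-socket, 4-cores-per-socket, non-SMT box with CPUs onlined in id
order):

  /* Toy model of the two socket-count estimates (NOT kernel code).
   * Assumes 2 sockets x 4 cores, no SMT, CPUs onlined in id order. */
  #include <stdio.h>

  #define TOTAL_CPUS 8                /* hardware total, as from ACPI */
  #define PKG_OF(cpu) ((cpu) / 4)     /* physical package id of a CPU */

  int main(void)
  {
          for (int online = 1; online <= TOTAL_CPUS; online++) {
                  /* '__max_logical_packages' style: hardware total
                   * divided by cores counted on the *online* CPUs of
                   * package 0 -> over-estimates with small maxcpus= */
                  int booted_cores = online < 4 ? online : 4;
                  int max_logical = (TOTAL_CPUS + booted_cores - 1) / booted_cores;

                  /* 'logical_packages' style: distinct package ids
                   * among online CPUs -> under-estimates when a whole
                   * package stays offline */
                  int seen[2] = { 0, 0 }, logical = 0;

                  for (int cpu = 0; cpu < online; cpu++)
                          if (!seen[PKG_OF(cpu)]++)
                                  logical++;

                  printf("online=%d: max_logical=%d logical=%d (real: 2)\n",
                         online, max_logical, logical);
          }
          return 0;
  }

With online=4 (e.g. 'maxcpus=4' on this box), the second estimate
misses package 1 entirely, which matches the observation later in
this thread that 'logical_packages' only breaks once maxcpus= drops
to half of the CPUs or fewer.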
BTW, I also checked CPUID.B/1F, which can tell a fixed number of CPUs
within a package. But we don't have a fixed number of total CPUs from
the hardware.

On my Dell laptop, the BIOS allows me to disable/enable one or several
cores. When this happens, 'total_cpus' changes, but CPUID.B/1F does
not change. So I don't think CPUID.B/1F can be used to optimize the
'__max_logical_packages' calculation.
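For reference, this is roughly how the per-package logical CPU count
can be read from CPUID leaf 0xB in userspace (a sketch assuming
GCC/Clang on x86; the kernel uses its own detect_extended_topology()
path instead):

  /* Sketch: logical-CPUs-per-package from CPUID leaf 0xB. The value
   * reflects the hardware as shipped and does not change when the
   * BIOS disables cores. Assumes GCC/Clang on x86. */
  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;
          unsigned int level = 0, cpus_per_pkg = 0;

          /* Walk the topology levels until the level type (ECX[15:8])
           * reads 0; EBX of the last valid level is the number of
           * logical CPUs per package. */
          for (;;) {
                  if (!__get_cpuid_count(0x0b, level, &eax, &ebx, &ecx, &edx))
                          break;
                  if (!((ecx >> 8) & 0xff))
                          break;
                  cpus_per_pkg = ebx & 0xffff;
                  level++;
          }
          printf("logical CPUs per package: %u\n", cpus_per_pkg);
          return 0;
  }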
I'm not sure if we have a perfect solution here.
thanks,
rui
On 10/21/22 08:00, Zhang Rui wrote:
> I checked the history of '__max_logical_packages', and realized that
> [...]
>
> I'm not sure if we have a perfect solution here.
Are the implementations fixable? Or, at least tolerable?
For instance, I can live with the implementation being a bit goofy when
kernel commandlines are in play. We can pr_info() about those cases.
On Fri, 2022-10-21 at 09:21 -0700, Dave Hansen wrote:
> On 10/21/22 08:00, Zhang Rui wrote:
> > [...]
> > I'm not sure if we have a perfect solution here.
>
> Are the implementations fixable?
currently, I don't have any idea.
> Or, at least tolerable?
>
> For instance, I can live with the implementation being a bit goofy
> when
> kernel commandlines are in play. We can pr_info() about those cases.
My understanding is that the CPUs in the last package may still have
small CPU id values. This means that 'logical_packages' is hard to
break unless we boot with a very small CPU count and happen to disable
all CPUs in one/more packages. Feng is experimenting with this and may
have some update later.
If this is the case, is this a valid case that we need to take care of?
thanks,
rui
On Fri, Oct 21, 2022 at 09:21:02AM -0700, Dave Hansen wrote:
> On 10/21/22 08:00, Zhang Rui wrote:
> > I checked the history of '__max_logical_packages', and realized that
> >
> > 1. for topology_max_packages()/'__max_logical_packages', the divisor
> >    'ncpus' uses cpu_data(0).booted_cores, which is based on the
> >    *online* CPUs. So when using kernel cmdlines like maxcpus=/nr_cpus=,
> >    '__max_logical_packages' can get over-estimated.
> >
> > 2. for 'logical_packages', it equals the number of distinct physical
> >    Package IDs over all *online* CPUs. So with kernel cmdlines like
> >    nr_cpus=/maxcpus=, it can get under-estimated.
Thanks for raising this new case! For the 'maxcpus=' cmdline parameter,
I ran tests on a 4-socket and a 2-socket platform, and found that
'logical_packages' is still correct if 'maxcpus >= (total_cpus/2 + 1)',
but gets under-estimated for smaller values.

The SRAT parsing solution [1] can solve the 'maxcpus=' case, but it
fails for other cases like sub-NUMA cluster mode or 'numa=off'.

IIUC, 'maxcpus=' is mostly used for debug purposes, so I think
'logical_packages' is the best option for socket number estimation
among the several existing solutions.
> For instance, I can live with the implementation being a bit goofy when
> kernel commandlines are in play. We can pr_info() about those cases.
Something like adding

  pr_info("Watchdog for TSC is disabled for this platform; the estimated
  socket number is %d. If the real socket number is bigger than
  4 (maybe due to some tricks like the 'maxcpus=' cmdline parameter), please
  add 'tsc=watchdog' to the cmdline as well\n", logical_packages);

and adding a new 'tsc=watchdog' option to force the watchdog on (might
be over-complex?)
[1] https://lore.kernel.org/lkml/Y0UgeUIJSFNR4mQB@feng-clx/
Thanks,
Feng
On 10/22/22 09:12, Zhang Rui wrote:
>>> I'm not sure if we have a perfect solution here.
>> Are the implementations fixable?
> currently, I don't have any idea.
>
>> Or, at least tolerable?
That would be great to figure out before we start throwing more patches
around.
>> For instance, I can live with the implementation being a bit goofy
>> when kernel commandlines are in play. We can pr_info() about those
>> cases.
> My understanding is that the CPUs in the last package may still have
> small CPU id values. This means that 'logical_packages' is hard to
> break unless we boot with a very small CPU count and happen to disable
> all CPUs in one/more packages. Feng is experimenting with this and may
> have some update later.
>
> If this is the case, is this a valid case that we need to take care of?
Well, let's talk through it a bit.

What is the triggering event and what's the fallout?

Is the user on a truly TSC stable system or not?

What kind of maxcpus= argument do they need to specify? Is it something
that's likely to get used in production or is it most likely just for
debugging?

What is the maxcpus= fallout? Does it over-estimate or under-estimate
the number of logical packages?

How many cases outside of maxcpus= do we know of that lead to an
imprecise "logical packages" calculation?

Does this lead to the TSC being mistakenly marked stable when it is not,
or *not* being marked stable when it is?

Let's get all of that info in one place and make sure we are all agreed
on the *problem* before we get to the solution space.
On 10/24/22 00:37, Feng Tang wrote:
>> For instance, I can live with the implementation being a bit goofy when
>> kernel commandlines are in play. We can pr_info() about those cases.
> Something like adding
>
>   pr_info("Watchdog for TSC is disabled for this platform; the estimated
>   socket number is %d. If the real socket number is bigger than
>   4 (maybe due to some tricks like the 'maxcpus=' cmdline parameter), please
>   add 'tsc=watchdog' to the cmdline as well\n", logical_packages);
That's too wishy-washy. Also, I *KNOW* Intel has built systems with
wonky, opaque numbers of "sockets". Cascade Lake was a single physical
"socket", but in all other respects (including enumeration to software)
it acted like two logical sockets.
So, what was the "real" socket number for Cascade Lake? If you looked
in a chassis, you'd see one socket. But, there were two dies in that
socket talking to each other over UPI, so it had a system topology which
was indistinguishable from a 2-socket system.
Let's just state the facts:
pr_info("Disabling TSC watchdog on %d-package system.", ...)
Then, we can have a flag elsewhere to say how reliable that number is.
A taint flag or CPU bug is probably going too far, but something like this:
  bool logical_package_count_unreliable = false;

  void mark_bad_package_count(char *reason)
  {
          if (logical_package_count_unreliable)
                  return;

          logical_package_count_unreliable = true;
          pr_warn("processor package count is unreliable: %s\n", reason);
  }
Might be OK. Then you can call mark_bad_package_count() from multiple
sites, like the maxcpus= code.
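For instance, a call site in the maxcpus= parsing could look roughly
like this (a hypothetical sketch; the maxcpus_mark_setup() name and the
exact hook point are made up for illustration, only the idea follows
the flag above):

  /* Hypothetical: flag the package count as unreliable when maxcpus=
   * keeps some of the possible CPUs offline. */
  static int __init maxcpus_mark_setup(char *str)
  {
          unsigned int limit;

          if (kstrtouint(str, 10, &limit))
                  return -EINVAL;
          if (limit < num_possible_cpus())
                  mark_bad_package_count("maxcpus= limits onlined CPUs");
          return 0;
  }
  early_param("maxcpus", maxcpus_mark_setup);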
But, like I said in the other thread, let's make sure we're agreed on
the precise problem that we're solving before we go down this road.
> and adding a new 'tsc=watchdog' option to force the watchdog on (might
> be over-complex?)
Agreed, I don't think that's quite warranted yet.
On Mon, Oct 24, 2022 at 08:42:30AM -0700, Dave Hansen wrote:
> On 10/22/22 09:12, Zhang Rui wrote:
> >>> I'm not sure if we have a perfect solution here.
> >> Are the implementations fixable?
> > currently, I don't have any idea.
> >
> >> Or, at least tolerable?
>
> That would be great to figure out before we start throwing more patches
> around.
Yes, agreed!
> > [...]
> >
> > If this is the case, is this a valid case that we need to take care of?
>
> Well, let's talk through it a bit.
>
> What is the triggering event and what's the fallout?
In the worst case (2 sockets), if maxcpus= falls to '<= total_cpus/2',
'logical_packages' will be less than the real number.
> Is the user on a truly TSC stable system or not?
>
> What kind of maxcpus= argument do they need to specify? Is it something
> that's likely to get used in production or is it most likely just for
> debugging?
IIUC, on the server side it's most likely for debug use. And for
clients, the socket number is not an issue.
> What is the maxcpus= fallout? Does it over-estimate or under-estimate
> the number of logical packages?
It only under-estimates.
> How many cases outside of maxcpus= do we know of that lead to an
> imprecise "logical packages" calculation?
Thanks to your, Peter's and Rui's info, we have listed a bunch of
use cases other than 'maxcpus=', and they won't lead to an imprecise
'logical_packages'. And I'm not sure if there are other cases which
haven't popped up yet.
> Does this lead to the TSC being mistakenly marked stable when it is not,
> or *not* being marked stable when it is?
Only the former case ('mistakenly marked stable') is possible, say
when 'maxcpus=8' is used on a 192-core 8-socket machine.
> Let's get all of that info in one place and make sure we are all agreed
> on the *problem* before we get to the solution space.
OK.
Thanks,
Feng
On Mon, Oct 24, 2022 at 08:43:33AM -0700, Dave Hansen wrote:
> On 10/24/22 00:37, Feng Tang wrote:
> >> For instance, I can live with the implementation being a bit goofy when
> >> kernel commandlines are in play. We can pr_info() about those cases.
> > Something like adding
> >
> >   pr_info("Watchdog for TSC is disabled for this platform; the estimated
> >   socket number is %d. If the real socket number is bigger than
> >   4 (maybe due to some tricks like the 'maxcpus=' cmdline parameter), please
> >   add 'tsc=watchdog' to the cmdline as well\n", logical_packages);
>
> That's too wishy-washy. Also, I *KNOW* Intel has built systems with
> wonky, opaque numbers of "sockets". Cascade Lake was a single physical
> "socket", but in all other respects (including enumeration to software)
> it acted like two logical sockets.
>
> So, what was the "real" socket number for Cascade Lake? If you looked
> in a chassis, you'd see one socket. But, there were two dies in that
> socket talking to each other over UPI, so it had a system topology which
> was indistinguishable from a 2-socket system.
Good to know, and thanks for the info.

I have to admit I hadn't checked a server's internals before, and
thanks to Oliver for helping check some Cascade Lake boxes of 0Day.

One box where 'lscpu' shows 4 sockets (96 cores in total) only has
2 physical processors in the chassis; just like you mentioned, each
processor has 2 dies. Another box, where 'lscpu' shows 2 sockets
(44 cores in total), also has 2 physical processors, but of a much
smaller size.

And fortunately 'logical_packages' for these 2 boxes is in both
cases the correct value: 2.
> Let's just state the facts:
>
> pr_info("Disabling TSC watchdog on %d-package system.", ...)
>
> Then, we can have a flag elsewhere to say how reliable that number is.
> A taint flag or CPU bug is probably going too far, but something like this:
>
>   bool logical_package_count_unreliable = false;
>
>   void mark_bad_package_count(char *reason)
>   {
>           if (logical_package_count_unreliable)
>                   return;
>
>           logical_package_count_unreliable = true;
>           pr_warn("processor package count is unreliable: %s\n", reason);
>   }
>
> Might be OK. Then you can call mark_bad_package_count() from multiple
> sites, like the maxcpus= code.
This should work! We can just add one more check:

    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
    !logical_package_count_unreliable &&
    logical_packages <= 2

And when some new case leading to an imprecise 'logical_packages' is
found in the future, we can just apply this to it.
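Put together with the init_tsc_clocksource() hunk from the patch, the
full condition would read roughly like this (a sketch combining the
patch above with your proposed flag, which doesn't exist in the tree
yet):

  /*
   * Sketch: the watchdog-disable check from init_tsc_clocksource(),
   * extended with the proposed logical_package_count_unreliable flag.
   */
  if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
      boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
      boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
      !logical_package_count_unreliable &&
      logical_packages <= 2)
          clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;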
> But, like I said in the other thread, let's make sure we're agreed on
> the precise problem that we're solving before we go down this road.
Sure.
Thanks,
Feng
> > and adding a new 'tsc=watchdog' option to force the watchdog on (might
> > be over-complex?)
>
> Agreed, I don't think that's quite warranted yet.
On Tue, Oct 25, 2022 at 03:57:12PM +0800, Feng Tang wrote:
> [...]
>
> This should work! We can just add one more check:
>
>     boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
>     !logical_package_count_unreliable &&
>     logical_packages <= 2
>
> And when some new case leading to an imprecise 'logical_packages' is
> found in the future, we can just apply this to it.
Hi Thomas, Peter and reviewers,
For estimating the socket number, there are quite a few BIOS/kernel
settings that can affect the accuracy, like:

* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes
* SNC (sub-numa cluster) mode is enabled
* 'maxcpus=' cmdline parameter limiting the number of onlined CPUs

Ideally, the BIOS could provide that info in some MSR or through
CPUID, as it knows most of the information before transferring
control to the OS, but that's just my wishful thinking for now.
And we checked several ways of estimating the socket number:

* nr_online_nodes
* SRAT table init (parsing nodes with CPUs)
* 'logical_packages'

'logical_packages' is the better option after comparison, as it
works for most of the cases above except the 'maxcpus=' one, where
'logical_packages' could be smaller than the real number. Dave
suggested explicitly skipping the check and warning in that case.

We plan to use 'logical_packages' to replace the current
'nr_online_nodes' for the socket number estimation. Any thoughts
on this, or suggestions? Thanks!
- Feng