2014-02-15 14:02:49

by Petr Tesařík

[permalink] [raw]
Subject: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

Note that the check against i (value passed as maxcpus, but at least 1)
is repeated further down, including the warning, but since possible is
already clamped to max_cpus at that time, it is never printed. In fact,
for the non-hotplug case, the warning about exceeding maxcpus is only
ever printed if "possible_cpus" was also specified on the command line.

I strongly believe that such limitation was unintentional.

I also changed the message slightly -- the kernel parameter name is
maxcpus, not max_cpus.

Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/kernel/smpboot.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a32da80..376b6c6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1226,9 +1226,6 @@ __init void prefill_possible_map(void)
#ifdef CONFIG_HOTPLUG_CPU
if (setup_max_cpus)
possible += disabled_cpus;
-#else
- if (possible > i)
- possible = i;
#endif
} else
possible = setup_possible_cpus;
@@ -1246,7 +1243,7 @@ __init void prefill_possible_map(void)
if (!setup_max_cpus)
#endif
if (possible > i) {
- pr_warn("%d Processors exceeds max_cpus limit of %u\n",
+ pr_warn("%d Processors exceeds maxcpus limit of %u\n",
possible, setup_max_cpus);
possible = i;
}
--
1.8.4.5


2014-02-16 17:12:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

On Sat, Feb 15, 2014 at 03:02:23PM +0100, Petr Tesarik wrote:
> Note that the check against i (value passed as maxcpus, but at least 1)
> is repeated further down, including the warning, but since possible is
> already clamped to max_cpus at that time, it is never printed. In fact,
> for the non-hotplug case, the warning about exceeding maxcpus is only
> ever printed if "possible_cpus" was also specified on the command line.
>
> I strongly believe that such limitation was unintentional.
>
> I also changed the message slightly -- the kernel parameter name is
> maxcpus, not max_cpus.
>
> Signed-off-by: Petr Tesarik <[email protected]>
> ---
> arch/x86/kernel/smpboot.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index a32da80..376b6c6 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1226,9 +1226,6 @@ __init void prefill_possible_map(void)
> #ifdef CONFIG_HOTPLUG_CPU
> if (setup_max_cpus)
> possible += disabled_cpus;
> -#else
> - if (possible > i)
> - possible = i;

Hmm, ok, this function is not the easiest to parse and provided I'm not
missing some corner case, the cleanup makes sense to me.

Acked-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-17 08:34:47

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

>>> On 15.02.14 at 15:02, Petr Tesarik <[email protected]> wrote:
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1226,9 +1226,6 @@ __init void prefill_possible_map(void)
> #ifdef CONFIG_HOTPLUG_CPU
> if (setup_max_cpus)
> possible += disabled_cpus;
> -#else
> - if (possible > i)
> - possible = i;
> #endif
> } else
> possible = setup_possible_cpus;

In between here total_cpus is being set, which now will get a
larger value if !HOTPLUG_CPU. Did you check that this has no
unintended side effect? And even if you did, it would still feel
more safe if you moved that line down after the capping point
below.

Similarly (but perhaps less important, albeit possibly slightly
confusing) the NR_CPUS related warning could now get issued
along with the warning below (when possible > nr_cpu_ids > i).
Hence that may better be moved down too (or then in effect
the if() block you modify below would get moved up). I realize
that two warning instead of just one would also be possible
without any change, so you're not really introducing some
entirely new inconsistency here...

Jan

> @@ -1246,7 +1243,7 @@ __init void prefill_possible_map(void)
> if (!setup_max_cpus)
> #endif
> if (possible > i) {
> - pr_warn("%d Processors exceeds max_cpus limit of %u\n",
> + pr_warn("%d Processors exceeds maxcpus limit of %u\n",
> possible, setup_max_cpus);
> possible = i;
> }
> --
> 1.8.4.5


2014-02-17 10:04:10

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

Hi Jan,

On Mon, 17 Feb 2014 08:34:34 +0000
"Jan Beulich" <[email protected]> wrote:

> >>> On 15.02.14 at 15:02, Petr Tesarik <[email protected]> wrote:
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -1226,9 +1226,6 @@ __init void prefill_possible_map(void)
> > #ifdef CONFIG_HOTPLUG_CPU
> > if (setup_max_cpus)
> > possible += disabled_cpus;
> > -#else
> > - if (possible > i)
> > - possible = i;
> > #endif
> > } else
> > possible = setup_possible_cpus;
>
> In between here total_cpus is being set, which now will get a
> larger value if !HOTPLUG_CPU. Did you check that this has no
> unintended side effect? And even if you did, it would still feel
> more safe if you moved that line down after the capping point
> below.

This is a non-issue. total_cpus is initialized to
max(possible, num_processors + disabled_cpus).
possible is initialized to num_processors just before the conditional
block, and in !HOTPLUG case, it is not modified afterwards. So:

BEFORE THE CHANGE:
The value of possible was modified only if it was larger than
min(setup_max_cpus, 1). In turn, it could never be bigger than
num_processors + disabled_cpus.

Result: total_cpus = num_processors + disabled_cpus.

AFTER THE CHANGE:
The value of possible is not modified, i.e. it remains equal to
num_processors. disabled_cpus cannot be negative.

Result: total_cpus = num_processors + disabled_cpus

In fact, you can only increase the value of total_cpus by passing
a nr_possible parameter to the kernel which is greater than the
total number of CPUs detected through system tables (MPTABLES, ACPI,
SFI etc.).

> Similarly (but perhaps less important, albeit possibly slightly
> confusing) the NR_CPUS related warning could now get issued
> along with the warning below (when possible > nr_cpu_ids > i).
> Hence that may better be moved down too (or then in effect
> the if() block you modify below would get moved up). I realize
> that two warning instead of just one would also be possible
> without any change, so you're not really introducing some
> entirely new inconsistency here...

Well, if the user passes both nr_cpus and maxcpus parameters to the
kernel, I think it's fair to issue two warnings. But if everyone agrees
that only the maxcpus warning should be printed in that case, I can
send a version 2 of my patch.

Petr Tesarik

> > @@ -1246,7 +1243,7 @@ __init void prefill_possible_map(void)
> > if (!setup_max_cpus)
> > #endif
> > if (possible > i) {
> > - pr_warn("%d Processors exceeds max_cpus limit of %u\n",
> > + pr_warn("%d Processors exceeds maxcpus limit of %u\n",
> > possible, setup_max_cpus);
> > possible = i;
> > }
> > --
> > 1.8.4.5
>
>
>

Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

On Mon, 17 Feb 2014, Petr Tesarik wrote:
> Well, if the user passes both nr_cpus and maxcpus parameters to the
> kernel, I think it's fair to issue two warnings. But if everyone agrees
> that only the maxcpus warning should be printed in that case, I can
> send a version 2 of my patch.

Please remember that the market is full of motherboards with the extremely
annoying behaviour of declaring ACPI objects for CPU cores that will never
be available. This includes a large number of workstation and server boards
at the very least, from at least one rather large vendor.

As far as I know, we still don't have a way to realiably detect this and get
rid of the ghost processors which will *NEVER* become online. Setting
maxcpus or nr_cpus manually is the current way to avoid wasting runtime
resources because of phantom cores that will never become reality.

So, when you fix the bug that always supress the warnings, you will at the
same time cause a regression on those boxes, which will now print undesired
warnings. If the user has manually set nr_cpus or maxcpus, maybe it would
be best to not print any warnings or alternatively to downgrade them to
debug level?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2014-02-17 14:16:33

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

On Mon, 17 Feb 2014 10:40:07 -0300
Henrique de Moraes Holschuh <[email protected]> wrote:

> On Mon, 17 Feb 2014, Petr Tesarik wrote:
> > Well, if the user passes both nr_cpus and maxcpus parameters to the
> > kernel, I think it's fair to issue two warnings. But if everyone agrees
> > that only the maxcpus warning should be printed in that case, I can
> > send a version 2 of my patch.
>
> Please remember that the market is full of motherboards with the extremely
> annoying behaviour of declaring ACPI objects for CPU cores that will never
> be available. This includes a large number of workstation and server boards
> at the very least, from at least one rather large vendor.
>
> As far as I know, we still don't have a way to realiably detect this and get
> rid of the ghost processors which will *NEVER* become online. Setting
> maxcpus or nr_cpus manually is the current way to avoid wasting runtime
> resources because of phantom cores that will never become reality.
>
> So, when you fix the bug that always supress the warnings, you will at the
> same time cause a regression on those boxes, which will now print undesired
> warnings. If the user has manually set nr_cpus or maxcpus, maybe it would
> be best to not print any warnings or alternatively to downgrade them to
> debug level?

While I appreciate your concerns, I fail to see how they are related.

First, please keep in mind that my patch does not alter the (more
common) case with CONFIG_HOTPLUG_CPU=y in any way. For the (presumably
less common) !HOTPLUG_CPU case, it adds a warning that is always issued
in the HOTPLUG_CPU case.

Second, regarding your use case, I don't think it changes anything. So,
let's say you have a board with 16 CPUs and a MADT that describes 1024
CPUs. You know that there can (physically) be at most 48 CPUs, so you
boot with nr_cpus=48:

num_processors = 16 /* online CPUs at boot */
disabled_cpus = 992 /* 1008 - 16 */

This results in:

total_cpus = 1008 /* this is purely informative, it is *NOT* used
to size anything */
possible = 48 /* clamped to nr_cpu_ids */

A warning message (with or without my patch):
1024 Processors exceeds NR_CPUS limit of 48

Informative message:
Allowing 16 CPUs, 32 hotplug CPUs

No other warning (with or without my patch).

Petr Tesarik
SUSE L3 Team 1

Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

On Mon, 17 Feb 2014, Petr Tesarik wrote:
> This results in:
>
> total_cpus = 1008 /* this is purely informative, it is *NOT* used
> to size anything */
> possible = 48 /* clamped to nr_cpu_ids */
>
> A warning message (with or without my patch):
> 1024 Processors exceeds NR_CPUS limit of 48
>
> Informative message:
> Allowing 16 CPUs, 32 hotplug CPUs
>
> No other warning (with or without my patch).

I'd rather no warnings were printed at all (user asked for that nr_cpus,
there is no reason to warn him about it), but your patch didn't cause any
_new_ warnings to be printed anyway.

So, please ignore my comment. There's no need to change anything in your
patch on my account.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2014-02-17 19:31:09

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

On Mon, 17 Feb 2014 16:07:04 -0300
Henrique de Moraes Holschuh <[email protected]> wrote:

> On Mon, 17 Feb 2014, Petr Tesarik wrote:
> > This results in:
> >
> > total_cpus = 1008 /* this is purely informative, it is *NOT* used
> > to size anything */
> > possible = 48 /* clamped to nr_cpu_ids */
> >
> > A warning message (with or without my patch):
> > 1024 Processors exceeds NR_CPUS limit of 48
> >
> > Informative message:
> > Allowing 16 CPUs, 32 hotplug CPUs
> >
> > No other warning (with or without my patch).
>
> I'd rather no warnings were printed at all (user asked for that nr_cpus,
> there is no reason to warn him about it),
>[...]

Agreed. This needs some cleanup.

This code used to check against NR_CPUS, which is a compile-time
constant, so the warning was printed when the user booted a kernel
incapable of using all available CPUs in the system. And this was a
good thing, because there was no (easy) way to find out this constant
from a given kernel binary.

I can post a clean-up patch that doesn't issue any warnings for user
overrides (but does issue warnings for exceeding hard-coded kernel
limits).

But I'll do it separately and only after I know if the current patch
gets accepted or not. ;-)

Petr Tesarik

Subject: Re: [PATCH] x86: Issue a warning if number of present CPUs > maxcpus and CONFIG_HOTPLUG=n

On Mon, 17 Feb 2014, Petr Tesarik wrote:
> > I'd rather no warnings were printed at all (user asked for that nr_cpus,
> > there is no reason to warn him about it),
> >[...]
>
> Agreed. This needs some cleanup.
>
> This code used to check against NR_CPUS, which is a compile-time
> constant, so the warning was printed when the user booted a kernel
> incapable of using all available CPUs in the system. And this was a
> good thing, because there was no (easy) way to find out this constant
> from a given kernel binary.
>
> I can post a clean-up patch that doesn't issue any warnings for user
> overrides (but does issue warnings for exceeding hard-coded kernel
> limits).

That'd be really nice. Thank you!

> But I'll do it separately and only after I know if the current patch
> gets accepted or not. ;-)

:-)

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh