2009-10-23 23:37:49

by Mike Travis

[permalink] [raw]
Subject: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

With a large number of processors in a system there is an excessive amount
of messages sent to the system console. It's estimated that with 4096
processors in a system, and the console baudrate set to 56K, the startup
messages will take about 84 minutes to clear the serial port.

This patch adds (for SGI UV only) a kernel start option "limit_console_
output" (or 'lco' for short), which when set provides the ability to
temporarily reduce the console loglevel during system startup. This allows
informative messages to still be seen on the console without producing
excessive amounts of repetious messages.

Note that all the messages are still available in the kernel log buffer.

Cc: Randy Dunlap <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Robin Getz <[email protected]>
Cc: Dave Young <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Mike Travis <[email protected]>
---
Documentation/kernel-parameters.txt | 6 ++++
include/linux/kernel.h | 23 +++++++++++++++++
kernel/printk.c | 47 ++++++++++++++++++++++++++++++++++++
3 files changed, 76 insertions(+)

--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -103,6 +103,7 @@
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
+ UV SGI Ultraviolet
V4L Video For Linux support is enabled.
VGA The VGA console has been enabled.
VT Virtual terminal support is enabled.
@@ -1218,6 +1219,11 @@
If there are multiple matching configurations changing
the same attribute, the last one is used.

+ limit_console_output [UV]
+ lco [UV]
+ Limits repetitous messages when the number of cpus in
+ a system is large.
+
lmb=debug [KNL] Enable lmb debug messages.

load_ramdisk= [RAM] List of ramdisks to load from floppy
--- linux.orig/include/linux/kernel.h
+++ linux/include/linux/kernel.h
@@ -372,6 +372,29 @@
}
#endif

+#ifdef CONFIG_X86_UV
+bool _limit_console_output(bool suspend);
+#define limit_console_output(suspend) \
+({ \
+ bool limit = _limit_console_output(suspend); \
+ if (limit) \
+ printk_once(KERN_NOTICE \
+ "printk: further related messages suppressed\n");\
+ limit; \
+})
+
+void end_limit_console_output(void);
+#else
+static inline bool limit_console_output(bool suspend)
+{
+ return false;
+}
+
+static inline void end_limit_console_output(void)
+{
+}
+#endif
+
extern int printk_needs_cpu(int cpu);
extern void printk_tick(void);

--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -1417,3 +1417,50 @@
}
EXPORT_SYMBOL(printk_timed_ratelimit);
#endif
+
+#ifdef CONFIG_X86_UV
+/*
+ * Support to suppress the zillions of extra messages being sent to
+ * the console when a server has a large number of cpus.
+ */
+static bool __read_mostly console_output_limited;
+static int __init limit_console_output_setup(char *str)
+{
+ console_output_limited = true;
+ printk(KERN_NOTICE "printk: console messages will be limited.\n");
+ return 0;
+}
+early_param("limit_console_output", limit_console_output_setup);
+
+static int __init lco(char *str)
+{
+ return limit_console_output_setup(str);
+}
+early_param("lco", lco);
+
+#define SUSPENDED_CONSOLE_LOGLEVEL (DEFAULT_CONSOLE_LOGLEVEL-1)
+
+/* check if "limiting" console output and suspend further msgs if requested. */
+bool _limit_console_output(bool suspend)
+{
+ if (console_output_limited) {
+ if (suspend && saved_console_loglevel == -1) {
+ saved_console_loglevel = console_loglevel;
+ console_loglevel = SUSPENDED_CONSOLE_LOGLEVEL;
+ }
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL(_limit_console_output);
+
+/* remove suspension of console msgs. */
+void end_limit_console_output(void)
+{
+ if (console_output_limited && saved_console_loglevel != -1) {
+ console_loglevel = saved_console_loglevel;
+ saved_console_loglevel = -1;
+ }
+}
+EXPORT_SYMBOL(end_limit_console_output);
+#endif

--


2009-10-24 01:09:15

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

On Fri, Oct 23, 2009 at 06:37:44PM -0500, Mike Travis wrote:
> With a large number of processors in a system there is an excessive amount
> of messages sent to the system console. It's estimated that with 4096
> processors in a system, and the console baudrate set to 56K, the startup
> messages will take about 84 minutes to clear the serial port.
>
> This patch adds (for SGI UV only) a kernel start option "limit_console_
> output" (or 'lco' for short), which when set provides the ability to
> temporarily reduce the console loglevel during system startup. This allows
> informative messages to still be seen on the console without producing
> excessive amounts of repetious messages.
>
> Note that all the messages are still available in the kernel log buffer.



Well, this problem does not only concerns SGI UV but all boxes with a large
number of cpus.

Also, instead of adding the same conditionals in multiple places to solve
the same problem (and that may even expand if we go further the SGI UV case,
for example with other archs cpu up/down events), may be can you centralize,
institutionalize this issue by using the existing printk mechanisms.

I mean, may be that could be addressed by adding a new printk
level flag, and then associate the desired filters against it.

KERN_CPU could be a name, since this is targetting cpu events.

2009-10-26 07:02:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

Mike Travis <[email protected]> writes:

> With a large number of processors in a system there is an excessive amount
> of messages sent to the system console. It's estimated that with 4096
> processors in a system, and the console baudrate set to 56K, the startup
> messages will take about 84 minutes to clear the serial port.
>
> This patch adds (for SGI UV only) a kernel start option "limit_console_
> output" (or 'lco' for short), which when set provides the ability to
> temporarily reduce the console loglevel during system startup. This allows
> informative messages to still be seen on the console without producing
> excessive amounts of repetious messages.
>
> Note that all the messages are still available in the kernel log buffer.

I've run into the same problem (kernel log being flooded on large number of CPU thread
systems). It's definitely not a UV only problem. Making such a option UV only
is definitely not the right approach, if anything it needs to be for everyone.

Frankly a lot of these messages made sense for debugging at some point,
but really don't anymore and should just be removed.

Also I don't like the defaults of on. It would be better to evaluate if
these various messages are really useful and if they are not just remove them.

For example do we really need the scheduler debug messages by default?

Or do we really need to print the caches for each CPU at boot? The information
is in sysfs anyways and rarely changes (I added this originally on 64bit,
but in hindsight it was a bad idea)

I don't think it makes much sense to print more than 2-3 lines for each CPU boot
for example.

Also more work could be done to make CPU boot up less verbose without
sacrifying debuggability if something goes wrong.

So please:
- Simply remove messages that don't make sense, no flag.
- Make the default non verbose.
- Minimize output in general, with just a few standard checkpoints so
that if there is a hang the developer still has some clue what went wrong.

-Andi

--
[email protected] -- Speaking for myself only.

2009-10-26 16:10:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

On Mon, 2009-10-26 at 08:02 +0100, Andi Kleen wrote:

> Also more work could be done to make CPU boot up less verbose without
> sacrifying debuggability if something goes wrong.

What about moving printks over to trace_printk or something. And that
way we can have a "boot up" ring buffer that can later be retrieved if
something goes wrong. It already dumps to the console on panic/oops.

The trace_printk will be hidden and is very fast.

-- Steve

2009-10-26 17:55:29

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function



Frederic Weisbecker wrote:
> On Fri, Oct 23, 2009 at 06:37:44PM -0500, Mike Travis wrote:
>> With a large number of processors in a system there is an excessive amount
>> of messages sent to the system console. It's estimated that with 4096
>> processors in a system, and the console baudrate set to 56K, the startup
>> messages will take about 84 minutes to clear the serial port.
>>
>> This patch adds (for SGI UV only) a kernel start option "limit_console_
>> output" (or 'lco' for short), which when set provides the ability to
>> temporarily reduce the console loglevel during system startup. This allows
>> informative messages to still be seen on the console without producing
>> excessive amounts of repetious messages.
>>
>> Note that all the messages are still available in the kernel log buffer.
>
>
>
> Well, this problem does not only concerns SGI UV but all boxes with a large
> number of cpus.
>
> Also, instead of adding the same conditionals in multiple places to solve
> the same problem (and that may even expand if we go further the SGI UV case,
> for example with other archs cpu up/down events), may be can you centralize,
> institutionalize this issue by using the existing printk mechanisms.
>
> I mean, may be that could be addressed by adding a new printk
> level flag, and then associate the desired filters against it.
>
> KERN_CPU could be a name, since this is targetting cpu events.
>

I did try out something like this but the changes quickly became very intrusive,
and I was hoping for a "lighter" touch. The other potential fallout of adding
another printk level might affect user programs that sift through the dmesg
log for "interesting" info.

Also, I could use some other config option to enable this, it's just that the
existing X86_UV was too convenient. ;-) I believe most systems would want this
turned off so the code size shrinks. And until you get the number of cpus into
the hundreds and thousands, the messages usually just fly by - particularly if
you're on a desktop system which has almost an infinite baud rate to the screen,
and usually hides the messages behind a splash screen anyways.

2009-10-26 18:03:59

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function



Andi Kleen wrote:
> Mike Travis <[email protected]> writes:
>
>> With a large number of processors in a system there is an excessive amount
>> of messages sent to the system console. It's estimated that with 4096
>> processors in a system, and the console baudrate set to 56K, the startup
>> messages will take about 84 minutes to clear the serial port.
>>
>> This patch adds (for SGI UV only) a kernel start option "limit_console_
>> output" (or 'lco' for short), which when set provides the ability to
>> temporarily reduce the console loglevel during system startup. This allows
>> informative messages to still be seen on the console without producing
>> excessive amounts of repetious messages.
>>
>> Note that all the messages are still available in the kernel log buffer.
>
> I've run into the same problem (kernel log being flooded on large number of CPU thread
> systems). It's definitely not a UV only problem. Making such a option UV only
> is definitely not the right approach, if anything it needs to be for everyone.

I could use something like the MAXSMP config option to enable it...?
>
> Frankly a lot of these messages made sense for debugging at some point,
> but really don't anymore and should just be removed.

That they still go to the kernel log buffer means the messages are still
available for debugging system problems. KDB has a kernel print option if
you end up there before being able to use 'dmesg'.

>
> Also I don't like the defaults of on. It would be better to evaluate if
> these various messages are really useful and if they are not just remove them.

I believe most distros already do that by setting the loglevel argument
(but I could be wrong since I haven't looked at too many of them.)

>
> For example do we really need the scheduler debug messages by default?

This was the most painful message at Nasa (which has a 2k cpu system). It took
well over an hour for these scheduler messages to print, just because we wanted
to get some other DEBUG prints.
>
> Or do we really need to print the caches for each CPU at boot? The information
> is in sysfs anyways and rarely changes (I added this originally on 64bit,
> but in hindsight it was a bad idea)

I was attempting not to decide whether each message was pertinent, only if it
was redundant.

>
> I don't think it makes much sense to print more than 2-3 lines for each CPU boot
> for example.

That would still be 4 to 12 thousand lines of information which, as you say is
available by other means.
>
> Also more work could be done to make CPU boot up less verbose without
> sacrifying debuggability if something goes wrong.
>
> So please:
> - Simply remove messages that don't make sense, no flag.
> - Make the default non verbose.
> - Minimize output in general, with just a few standard checkpoints so
> that if there is a hang the developer still has some clue what went wrong.

loglevel=4 does this quite nicely. ;-)

Thanks,
Mike

2009-10-26 18:05:03

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function



Steven Rostedt wrote:
> On Mon, 2009-10-26 at 08:02 +0100, Andi Kleen wrote:
>
>> Also more work could be done to make CPU boot up less verbose without
>> sacrifying debuggability if something goes wrong.
>
> What about moving printks over to trace_printk or something. And that
> way we can have a "boot up" ring buffer that can later be retrieved if
> something goes wrong. It already dumps to the console on panic/oops.
>
> The trace_printk will be hidden and is very fast.
>
> -- Steve
>

I haven't heard of "trace_printk" but how does this differ from the
existing kernel log buffer you get with 'dmesg'?

Thanks,
Mike

2009-10-26 18:51:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

On Mon, 2009-10-26 at 11:05 -0700, Mike Travis wrote:

> I haven't heard of "trace_printk" but how does this differ from the
> existing kernel log buffer you get with 'dmesg'?
>

It goes into the ftrace ring buffer and does not print to console. Nor
does it print to dmesg. Currently you need to read ftrace (via the
debugfs) to get to it. Or a sysrq-z will print it out to the console.

-- Steve

2009-10-26 21:55:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

On Mon, Oct 26, 2009 at 11:03:59AM -0700, Mike Travis wrote:
>
>
> Andi Kleen wrote:
>> Mike Travis <[email protected]> writes:
>>
>>> With a large number of processors in a system there is an excessive amount
>>> of messages sent to the system console. It's estimated that with 4096
>>> processors in a system, and the console baudrate set to 56K, the startup
>>> messages will take about 84 minutes to clear the serial port.
>>>
>>> This patch adds (for SGI UV only) a kernel start option "limit_console_
>>> output" (or 'lco' for short), which when set provides the ability to
>>> temporarily reduce the console loglevel during system startup. This allows
>>> informative messages to still be seen on the console without producing
>>> excessive amounts of repetious messages.
>>>
>>> Note that all the messages are still available in the kernel log buffer.
>>
>> I've run into the same problem (kernel log being flooded on large number of CPU thread
>> systems). It's definitely not a UV only problem. Making such a option UV only
>> is definitely not the right approach, if anything it needs to be for everyone.
>
> I could use something like the MAXSMP config option to enable it...?

No, it's a problem long before MAXSMP sizes.

>>
>> Frankly a lot of these messages made sense for debugging at some point,
>> but really don't anymore and should just be removed.
>
> That they still go to the kernel log buffer means the messages are still
> available for debugging system problems. KDB has a kernel print option if
> you end up there before being able to use 'dmesg'.

Again they should be just reevaluated and pr_debug()ed or completely
removed.

>
>>
>> Also I don't like the defaults of on. It would be better to evaluate if
>> these various messages are really useful and if they are not just remove them.
>
> I believe most distros already do that by setting the loglevel argument
> (but I could be wrong since I haven't looked at too many of them.)

Even spamming dmesg is a problem. loglevel doesn't fix that.

>
>>
>> For example do we really need the scheduler debug messages by default?
>
> This was the most painful message at Nasa (which has a 2k cpu system). It took
> well over an hour for these scheduler messages to print, just because we wanted
> to get some other DEBUG prints.

They should be just removed.

>>
>> Or do we really need to print the caches for each CPU at boot? The information
>> is in sysfs anyways and rarely changes (I added this originally on 64bit,
>> but in hindsight it was a bad idea)
>
> I was attempting not to decide whether each message was pertinent, only if it
> was redundant.

You should decide or at least ask whoever added it

("How many bugs did you fix with that message last year?" If the answer
is < 10 or so, remove it)
>
>>
>> I don't think it makes much sense to print more than 2-3 lines for each CPU boot
>> for example.
>
> That would still be 4 to 12 thousand lines of information which, as you say is
> available by other means.

A simple checkpoint for debugging is not available by other means.

The cache, mce etc. information is.

For the checkpoint problem on CPU boot it might be reasonable
to write them into a special buffer and only print it when the other
CPU does not come up (BP detects a time out)

With that a single line of per CPU output should be feasible without
losing any debuggability.

In fact debuggability could be improved by putting the output
at better strategic points instead of the ad-hoc way it is currently.

-Andi

--
[email protected] -- Speaking for myself only.

2009-10-26 22:06:56

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function



Andi Kleen wrote:
> On Mon, Oct 26, 2009 at 11:03:59AM -0700, Mike Travis wrote:
>>
>> Andi Kleen wrote:
>>> Mike Travis <[email protected]> writes:
>>>
>>>> With a large number of processors in a system there is an excessive amount
>>>> of messages sent to the system console. It's estimated that with 4096
>>>> processors in a system, and the console baudrate set to 56K, the startup
>>>> messages will take about 84 minutes to clear the serial port.
>>>>
>>>> This patch adds (for SGI UV only) a kernel start option "limit_console_
>>>> output" (or 'lco' for short), which when set provides the ability to
>>>> temporarily reduce the console loglevel during system startup. This allows
>>>> informative messages to still be seen on the console without producing
>>>> excessive amounts of repetious messages.
>>>>
>>>> Note that all the messages are still available in the kernel log buffer.
>>> I've run into the same problem (kernel log being flooded on large number of CPU thread
>>> systems). It's definitely not a UV only problem. Making such a option UV only
>>> is definitely not the right approach, if anything it needs to be for everyone.
>> I could use something like the MAXSMP config option to enable it...?
>
> No, it's a problem long before MAXSMP sizes.
>
>>> Frankly a lot of these messages made sense for debugging at some point,
>>> but really don't anymore and should just be removed.
>> That they still go to the kernel log buffer means the messages are still
>> available for debugging system problems. KDB has a kernel print option if
>> you end up there before being able to use 'dmesg'.
>
> Again they should be just reevaluated and pr_debug()ed or completely
> removed.
>
>>> Also I don't like the defaults of on. It would be better to evaluate if
>>> these various messages are really useful and if they are not just remove them.
>> I believe most distros already do that by setting the loglevel argument
>> (but I could be wrong since I haven't looked at too many of them.)
>
> Even spamming dmesg is a problem. loglevel doesn't fix that.
>
>>> For example do we really need the scheduler debug messages by default?
>> This was the most painful message at Nasa (which has a 2k cpu system). It took
>> well over an hour for these scheduler messages to print, just because we wanted
>> to get some other DEBUG prints.
>
> They should be just removed.

I had changed this to CONFIG_DEBUG_SCHED at one time. Perhaps this would be
acceptible?

>
>>> Or do we really need to print the caches for each CPU at boot? The information
>>> is in sysfs anyways and rarely changes (I added this originally on 64bit,
>>> but in hindsight it was a bad idea)
>> I was attempting not to decide whether each message was pertinent, only if it
>> was redundant.
>
> You should decide or at least ask whoever added it
>
> ("How many bugs did you fix with that message last year?" If the answer
> is < 10 or so, remove it)

Ok.

>>> I don't think it makes much sense to print more than 2-3 lines for each CPU boot
>>> for example.
>> That would still be 4 to 12 thousand lines of information which, as you say is
>> available by other means.
>
> A simple checkpoint for debugging is not available by other means.
>
> The cache, mce etc. information is.
>
> For the checkpoint problem on CPU boot it might be reasonable
> to write them into a special buffer and only print it when the other
> CPU does not come up (BP detects a time out)
>
> With that a single line of per CPU output should be feasible without
> losing any debuggability.
>
> In fact debuggability could be improved by putting the output
> at better strategic points instead of the ad-hoc way it is currently.
>
> -Andi
>

Ok, thanks for the feedback. I'll see about reducing the output more
intelligently for CPU's (as per Ingo's suggestions as well.)

Mike

2009-10-30 19:25:15

by Mike Travis

[permalink] [raw]
Subject: [PATCH] x86_64: Limit the number of processor bootup messages

x86_64: Limit the number of processor bootup messages

With a large number of processors in a system there is an excessive amount
of messages sent to the system console. It's estimated that with 4096
processors in a system, and the console baudrate set to 56K, the startup
messages will take about 84 minutes to clear the serial port.

This set of patches limits the number of repetitious messages which contain
no additional information. Much of this information is obtainable from the
/proc and /sysfs. Most of the messages are also sent to the kernel log
buffer as KERN_DEBUG messages so it can be used to examine more closely any
details specific to a processor.

The list of message transformations....

For system_state == SYSTEM_BOOTING:

[ 25.388280] Booting Processors 1-7,320-327, Node 0
[ 26.064742] Booting Processors 8-15,328-335, Node 1
[ 26.837006] Booting Processors 16-31,336-351, Nodes 2-3
[ 28.440427] Booting Processors 32-63,352-383, Nodes 4-7
[ 31.640450] Booting Processors 64-127,384-447, Nodes 8-15
[ 38.041430] Booting Processors 128-255,448-575, Nodes 16-31
[ 50.917504] Booting Processors 256-319,576-639, Nodes 32-39
[ 90.964169] Brought up 640 CPUs

The range of processors increases as a power of 2, so 4096 CPU's should
only take 12 lines.

(QUESTION: print_summary_bootmsg() is in the __init section and is called
from a __cpuinit function, but only when system is booting. Is there a
special flag to handle this case?)

For Processor Information printout:

[ 90.968381] Summary Processor Information for CPUS: 0-639
[ 90.972033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
[ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
[ 90.985888] CPU: L2 cache: 256K
[ 90.988032] CPU: L3 cache: 24576K
[ 90.992032] MIN 4266.68 BogoMIPS (lpj=8533371)
[ 91.000033] MAX 4267.89 BogoMIPS (lpj=8535789)

These lines have been moved to loglevel KERN_DEBUG:

CPU: Physical Processor ID:
CPU: Processor Core ID:
CPU %d/0x%x -> Node %d
<cache line sizes per cpu>
CPUx is down

This message has been changed to loglevel KERN_DEBUG if system is booting
and KERN_INFO otherwise:

CPU %d is now offline

Signed-off-by: Mike Travis <[email protected]>
---
arch/x86/include/asm/processor.h | 4
arch/x86/kernel/cpu/addon_cpuid_features.c | 4
arch/x86/kernel/cpu/amd.c | 2
arch/x86/kernel/cpu/common.c | 23 +++-
arch/x86/kernel/cpu/intel.c | 2
arch/x86/kernel/cpu/intel_cacheinfo.c | 22 +---
arch/x86/kernel/smpboot.c | 154 ++++++++++++++++++++++++++++-
kernel/cpu.c | 2
8 files changed, 187 insertions(+), 26 deletions(-)

--- linux.orig/arch/x86/include/asm/processor.h
+++ linux/arch/x86/include/asm/processor.h
@@ -111,6 +111,9 @@
u16 cpu_core_id;
/* Index into per_cpu list: */
u16 cpu_index;
+ /* Interior Cache Sizes: */
+ u16 l1i, l1d, l2;
+ u32 l3;
#endif
unsigned int x86_hyper_vendor;
} __attribute__((__aligned__(SMP_CACHE_BYTES)));
@@ -169,6 +172,7 @@
extern void identify_boot_cpu(void);
extern void identify_secondary_cpu(struct cpuinfo_x86 *);
extern void print_cpu_info(struct cpuinfo_x86 *);
+extern void print_cache_info(struct cpuinfo_x86 *, char *msglvl);
extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
extern unsigned short num_cache_leaves;
--- linux.orig/arch/x86/kernel/cpu/addon_cpuid_features.c
+++ linux/arch/x86/kernel/cpu/addon_cpuid_features.c
@@ -128,10 +128,10 @@
c->x86_max_cores = (core_level_siblings / smp_num_siblings);


- printk(KERN_INFO "CPU: Physical Processor ID: %d\n",
+ printk(KERN_DEBUG "CPU: Physical Processor ID: %d\n",
c->phys_proc_id);
if (c->x86_max_cores > 1)
- printk(KERN_INFO "CPU: Processor Core ID: %d\n",
+ printk(KERN_DEBUG "CPU: Processor Core ID: %d\n",
c->cpu_core_id);
return;
#endif
--- linux.orig/arch/x86/kernel/cpu/amd.c
+++ linux/arch/x86/kernel/cpu/amd.c
@@ -376,7 +376,7 @@
}
numa_set_node(cpu, node);

- printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
+ printk(KERN_DEBUG "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
#endif
}

--- linux.orig/arch/x86/kernel/cpu/common.c
+++ linux/arch/x86/kernel/cpu/common.c
@@ -475,9 +475,9 @@

out:
if ((c->x86_max_cores * smp_num_siblings) > 1) {
- printk(KERN_INFO "CPU: Physical Processor ID: %d\n",
+ printk(KERN_DEBUG "CPU: Physical Processor ID: %d\n",
c->phys_proc_id);
- printk(KERN_INFO "CPU: Processor Core ID: %d\n",
+ printk(KERN_DEBUG "CPU: Processor Core ID: %d\n",
c->cpu_core_id);
}
#endif
@@ -967,6 +967,23 @@
#endif
}

+void __cpuinit print_cache_info(struct cpuinfo_x86 *c, char *lvl)
+{
+ if (c->l1i)
+ printk("%sCPU: L1 I cache: %dK", lvl, c->l1i);
+
+ if (c->l1d)
+ printk(KERN_CONT ", L1 D cache: %dK\n", c->l1d);
+ else
+ printk(KERN_CONT "\n");
+
+ if (c->l2)
+ printk("%sCPU: L2 cache: %dK\n", lvl, c->l2);
+
+ if (c->l3)
+ printk("%sCPU: L3 cache: %dK\n", lvl, c->l3);
+}
+
static __init int setup_disablecpuid(char *arg)
{
int bit;
@@ -1115,7 +1132,7 @@
if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
panic("CPU#%d already initialized!\n", cpu);

- printk(KERN_INFO "Initializing CPU#%d\n", cpu);
+ printk(KERN_DEBUG "Initializing CPU#%d\n", cpu);

clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);

--- linux.orig/arch/x86/kernel/cpu/intel.c
+++ linux/arch/x86/kernel/cpu/intel.c
@@ -267,7 +267,7 @@
node = first_node(node_online_map);
numa_set_node(cpu, node);

- printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
+ printk(KERN_DEBUG "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
#endif
}

--- linux.orig/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ linux/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -489,23 +489,17 @@
}

if (trace)
- printk(KERN_INFO "CPU: Trace cache: %dK uops", trace);
- else if (l1i)
- printk(KERN_INFO "CPU: L1 I cache: %dK", l1i);
-
- if (l1d)
- printk(KERN_CONT ", L1 D cache: %dK\n", l1d);
- else
- printk(KERN_CONT "\n");
-
- if (l2)
- printk(KERN_INFO "CPU: L2 cache: %dK\n", l2);
-
- if (l3)
- printk(KERN_INFO "CPU: L3 cache: %dK\n", l3);
+ printk(KERN_DEBUG "CPU: Trace cache: %dK uops", trace);

+ c->l1i = l1i;
+ c->l1d = l1d;
+ c->l2 = l2;
+ c->l3 = l3;
c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));

+ print_cache_info(c,
+ system_state == SYSTEM_BOOTING? KERN_DEBUG : KERN_INFO);
+
return l2;
}

--- linux.orig/arch/x86/kernel/smpboot.c
+++ linux/arch/x86/kernel/smpboot.c
@@ -442,6 +442,94 @@
return c->llc_shared_map;
}

+/* Summarize Processor Information */
+static void __init summarize_cpu_info(void)
+{
+ cpumask_var_t cpulist, cpusdone;
+ int cpu;
+ int err = 0;
+
+ if (!alloc_cpumask_var(&cpulist, GFP_KERNEL))
+ err = 1;
+
+ else if (!alloc_cpumask_var(&cpusdone, GFP_KERNEL)) {
+ free_cpumask_var(cpulist);
+ err = 1;
+ }
+
+ if (err) {
+ printk(KERN_INFO "Can't print processor summaries\n");
+ return;
+ }
+
+ cpumask_clear(cpusdone);
+ for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+ struct cpuinfo_x86 *c;
+ int l1i, l1d, l2, l3;
+ int x86, x86_vendor, x86_model, x86_mask;
+ char buf[64];
+ int ncpu;
+ unsigned long minlpj, maxlpj;
+
+ /* skip if cpu has already been displayed */
+ if (cpumask_test_cpu(cpu, cpusdone))
+ continue;
+
+ c = &cpu_data(cpu);
+ l1i = c->l1i;
+ l1d = c->l1d;
+ l2 = c->l2;
+ l3 = c->l3;
+ x86 = c->x86;
+ x86_vendor = c->x86_vendor;
+ x86_model = c->x86_model;
+ x86_mask = c->x86_mask;
+ minlpj = ULONG_MAX;
+ maxlpj = 0;
+
+ cpumask_clear(cpulist);
+
+ /* collate all cpus with same specifics */
+ for (ncpu = cpu; ncpu < nr_cpu_ids; ncpu++) {
+ if (l1i != cpu_data(ncpu).l1i ||
+ l1d != cpu_data(ncpu).l1d ||
+ l2 != cpu_data(ncpu).l2 ||
+ l3 != cpu_data(ncpu).l3 ||
+ x86 != cpu_data(ncpu).x86 ||
+ x86_vendor != cpu_data(ncpu).x86_vendor ||
+ x86_model != cpu_data(ncpu).x86_model ||
+ x86_mask != cpu_data(ncpu).x86_mask)
+ continue;
+
+ cpumask_set_cpu(ncpu, cpulist);
+ cpumask_set_cpu(ncpu, cpusdone);
+
+ if (cpu_data(ncpu).loops_per_jiffy < minlpj)
+ minlpj = cpu_data(ncpu).loops_per_jiffy;
+
+ if (cpu_data(ncpu).loops_per_jiffy > maxlpj)
+ maxlpj = cpu_data(ncpu).loops_per_jiffy;
+ }
+
+ cpulist_scnprintf(buf, sizeof(buf), cpulist);
+ printk(KERN_INFO
+ "Summary Processor Information for CPUS: %s\n", buf);
+
+ printk(KERN_INFO);
+ print_cpu_info(c);
+ print_cache_info(c, KERN_INFO);
+
+ printk(KERN_INFO "MIN %lu.%02lu BogoMIPS (lpj=%lu)\n",
+ minlpj/(500000/HZ), (minlpj/(5000/HZ)) % 100, minlpj);
+
+ printk(KERN_INFO "MAX %lu.%02lu BogoMIPS (lpj=%lu)\n",
+ maxlpj/(500000/HZ), (maxlpj/(5000/HZ)) % 100, maxlpj);
+ }
+
+ free_cpumask_var(cpusdone);
+ free_cpumask_var(cpulist);
+}
+
static void impress_friends(void)
{
int cpu;
@@ -671,6 +759,50 @@
complete(&c_idle->done);
}

+/* Summarize the "Booting processor ..." startup messages */
+static void __init print_summary_bootmsg(int cpu)
+{
+ static int next_node, node_shift;
+ int node = cpu_to_node(cpu);
+
+ if (node >= next_node) {
+ cpumask_var_t cpulist;
+
+ node = next_node;
+ next_node = 1 << node_shift;
+ node_shift++;
+
+ if (alloc_cpumask_var(&cpulist, GFP_KERNEL)) {
+ int i, tmp, last_node = node;
+ char buf[32];
+
+ cpumask_clear(cpulist);
+ for_each_present_cpu(i) {
+ if (i == 0) /* boot cpu */
+ continue;
+
+ tmp = cpu_to_node(i);
+ if (node <= tmp && tmp < next_node) {
+ cpumask_set_cpu(i, cpulist);
+ if (last_node < tmp)
+ last_node = tmp;
+ }
+ }
+ if (cpumask_weight(cpulist)) {
+ cpulist_scnprintf(buf, sizeof(buf), cpulist);
+ printk(KERN_INFO "Booting Processors %s,", buf);
+
+ if (node == last_node)
+ printk(KERN_CONT " Node %d\n", node);
+ else
+ printk(KERN_CONT " Nodes %d-%d\n",
+ node, last_node);
+ }
+ free_cpumask_var(cpulist);
+ }
+ }
+}
+
/*
* NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
* (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -737,8 +869,14 @@
start_ip = setup_trampoline();

/* So we see what's up */
- printk(KERN_INFO "Booting processor %d APIC 0x%x ip 0x%lx\n",
- cpu, apicid, start_ip);
+ if (system_state == SYSTEM_BOOTING) {
+ print_summary_bootmsg(cpu);
+ printk(KERN_DEBUG);
+ } else
+ printk(KERN_INFO);
+
+ printk(KERN_CONT "Booting processor %d APIC 0x%x ip 0x%lx\n",
+ cpu, apicid, start_ip);

/*
* This grunge runs the startup process for
@@ -790,7 +928,7 @@
if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
/* number CPUs logically, starting from 1 (BSP is 0) */
pr_debug("OK.\n");
- printk(KERN_INFO "CPU%d: ", cpu);
+ printk(KERN_DEBUG "CPU%d: ", cpu);
print_cpu_info(&cpu_data(cpu));
pr_debug("CPU has booted.\n");
} else {
@@ -1147,6 +1285,9 @@
{
pr_debug("Boot done.\n");

+ /* print processor data summaries */
+ summarize_cpu_info();
+
impress_friends();
#ifdef CONFIG_X86_IO_APIC
setup_ioapic_dest();
@@ -1300,7 +1441,12 @@
for (i = 0; i < 10; i++) {
/* They ack this in play_dead by setting CPU_DEAD */
if (per_cpu(cpu_state, cpu) == CPU_DEAD) {
- printk(KERN_INFO "CPU %d is now offline\n", cpu);
+ if (system_state == SYSTEM_RUNNING)
+ printk(KERN_INFO);
+ else
+ printk(KERN_DEBUG);
+
+ printk(KERN_CONT "CPU %d is now offline\n", cpu);
if (1 == num_online_cpus())
alternatives_smp_switch(0);
return;
--- linux.orig/kernel/cpu.c
+++ linux/kernel/cpu.c
@@ -394,7 +394,7 @@
error = _cpu_down(cpu, 1);
if (!error) {
cpumask_set_cpu(cpu, frozen_cpus);
- printk("CPU%d is down\n", cpu);
+ printk(KERN_DEBUG "CPU%d is down\n", cpu);
} else {
printk(KERN_ERR "Error taking CPU%d down: %d\n",
cpu, error);

2009-10-30 19:54:09

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

On Fri, 30 Oct 2009, Mike Travis wrote:

> x86_64: Limit the number of processor bootup messages
>
> With a large number of processors in a system there is an excessive amount
> of messages sent to the system console. It's estimated that with 4096
> processors in a system, and the console baudrate set to 56K, the startup
> messages will take about 84 minutes to clear the serial port.
>
> This set of patches limits the number of repetitious messages which contain
> no additional information. Much of this information is obtainable from the
> /proc and /sysfs. Most of the messages are also sent to the kernel log
> buffer as KERN_DEBUG messages so it can be used to examine more closely any
> details specific to a processor.
>
> The list of message transformations....
>
> For system_state == SYSTEM_BOOTING:
>
> [ 25.388280] Booting Processors 1-7,320-327, Node 0
> [ 26.064742] Booting Processors 8-15,328-335, Node 1
> [ 26.837006] Booting Processors 16-31,336-351, Nodes 2-3
> [ 28.440427] Booting Processors 32-63,352-383, Nodes 4-7
> [ 31.640450] Booting Processors 64-127,384-447, Nodes 8-15
> [ 38.041430] Booting Processors 128-255,448-575, Nodes 16-31
> [ 50.917504] Booting Processors 256-319,576-639, Nodes 32-39
> [ 90.964169] Brought up 640 CPUs
>
> The range of processors increases as a power of 2, so 4096 CPU's should
> only take 12 lines.
>
> (QUESTION: print_summary_bootmsg() is in the __init section and is called
> from a __cpuinit function, but only when system is booting. Is there a
> special flag to handle this case?)
>

That's fine, init.text will still be valid as long as cpuinit.text is and
there will be no hotplug considerations.

> For Processor Information printout:
>
> [ 90.968381] Summary Processor Information for CPUS: 0-639
> [ 90.972033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
> [ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
> [ 90.985888] CPU: L2 cache: 256K
> [ 90.988032] CPU: L3 cache: 24576K
> [ 90.992032] MIN 4266.68 BogoMIPS (lpj=8533371)
> [ 91.000033] MAX 4267.89 BogoMIPS (lpj=8535789)
>
> These lines have been moved to loglevel KERN_DEBUG:
>
> CPU: Physical Processor ID:
> CPU: Processor Core ID:
> CPU %d/0x%x -> Node %d
> <cache line sizes per cpu>
> CPUx is down
>
> This message has been changed to loglevel KERN_DEBUG if system is booting
> and KERN_INFO otherwise:
>
> CPU %d is now offline
>
> Signed-off-by: Mike Travis <[email protected]>
> ---
> arch/x86/include/asm/processor.h | 4
> arch/x86/kernel/cpu/addon_cpuid_features.c | 4 arch/x86/kernel/cpu/amd.c
> | 2 arch/x86/kernel/cpu/common.c | 23 +++-
> arch/x86/kernel/cpu/intel.c | 2
> arch/x86/kernel/cpu/intel_cacheinfo.c | 22 +---
> arch/x86/kernel/smpboot.c | 154
> ++++++++++++++++++++++++++++-
> kernel/cpu.c | 2 8 files changed, 187
> insertions(+), 26 deletions(-)
>
> --- linux.orig/arch/x86/include/asm/processor.h
> +++ linux/arch/x86/include/asm/processor.h
> @@ -111,6 +111,9 @@
> u16 cpu_core_id;
> /* Index into per_cpu list: */
> u16 cpu_index;
> + /* Interior Cache Sizes: */
> + u16 l1i, l1d, l2;
> + u32 l3;
> #endif
> unsigned int x86_hyper_vendor;
> } __attribute__((__aligned__(SMP_CACHE_BYTES)));
> @@ -169,6 +172,7 @@
> extern void identify_boot_cpu(void);
> extern void identify_secondary_cpu(struct cpuinfo_x86 *);
> extern void print_cpu_info(struct cpuinfo_x86 *);
> +extern void print_cache_info(struct cpuinfo_x86 *, char *msglvl);
> extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
> extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
> extern unsigned short num_cache_leaves;
> --- linux.orig/arch/x86/kernel/cpu/addon_cpuid_features.c
> +++ linux/arch/x86/kernel/cpu/addon_cpuid_features.c
> @@ -128,10 +128,10 @@
> c->x86_max_cores = (core_level_siblings / smp_num_siblings);
>
>
> - printk(KERN_INFO "CPU: Physical Processor ID: %d\n",
> + printk(KERN_DEBUG "CPU: Physical Processor ID: %d\n",
> c->phys_proc_id);
> if (c->x86_max_cores > 1)
> - printk(KERN_INFO "CPU: Processor Core ID: %d\n",
> + printk(KERN_DEBUG "CPU: Processor Core ID: %d\n",
> c->cpu_core_id);
> return;
> #endif

Perhaps an opporunity to move these to pr_debug() instead?

> --- linux.orig/arch/x86/kernel/cpu/amd.c
> +++ linux/arch/x86/kernel/cpu/amd.c
> @@ -376,7 +376,7 @@
> }
> numa_set_node(cpu, node);
>
> - printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
> + printk(KERN_DEBUG "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
> #endif
> }
>
> --- linux.orig/arch/x86/kernel/cpu/common.c
> +++ linux/arch/x86/kernel/cpu/common.c
> @@ -475,9 +475,9 @@
>
> out:
> if ((c->x86_max_cores * smp_num_siblings) > 1) {
> - printk(KERN_INFO "CPU: Physical Processor ID: %d\n",
> + printk(KERN_DEBUG "CPU: Physical Processor ID: %d\n",
> c->phys_proc_id);
> - printk(KERN_INFO "CPU: Processor Core ID: %d\n",
> + printk(KERN_DEBUG "CPU: Processor Core ID: %d\n",
> c->cpu_core_id);
> }
> #endif
> @@ -967,6 +967,23 @@
> #endif
> }
>
> +void __cpuinit print_cache_info(struct cpuinfo_x86 *c, char *lvl)
> +{
> + if (c->l1i)
> + printk("%sCPU: L1 I cache: %dK", lvl, c->l1i);
> +
> + if (c->l1d)
> + printk(KERN_CONT ", L1 D cache: %dK\n", c->l1d);
> + else
> + printk(KERN_CONT "\n");
> +
> + if (c->l2)
> + printk("%sCPU: L2 cache: %dK\n", lvl, c->l2);
> +
> + if (c->l3)
> + printk("%sCPU: L3 cache: %dK\n", lvl, c->l3);
> +}
> +
> static __init int setup_disablecpuid(char *arg)
> {
> int bit;
> @@ -1115,7 +1132,7 @@
> if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
> panic("CPU#%d already initialized!\n", cpu);
>
> - printk(KERN_INFO "Initializing CPU#%d\n", cpu);
> + printk(KERN_DEBUG "Initializing CPU#%d\n", cpu);
>
> clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
>
> --- linux.orig/arch/x86/kernel/cpu/intel.c
> +++ linux/arch/x86/kernel/cpu/intel.c
> @@ -267,7 +267,7 @@
> node = first_node(node_online_map);
> numa_set_node(cpu, node);
>
> - printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
> + printk(KERN_DEBUG "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
> #endif
> }
>
> --- linux.orig/arch/x86/kernel/cpu/intel_cacheinfo.c
> +++ linux/arch/x86/kernel/cpu/intel_cacheinfo.c
> @@ -489,23 +489,17 @@
> }
>
> if (trace)
> - printk(KERN_INFO "CPU: Trace cache: %dK uops", trace);
> - else if (l1i)
> - printk(KERN_INFO "CPU: L1 I cache: %dK", l1i);
> -
> - if (l1d)
> - printk(KERN_CONT ", L1 D cache: %dK\n", l1d);
> - else
> - printk(KERN_CONT "\n");
> -
> - if (l2)
> - printk(KERN_INFO "CPU: L2 cache: %dK\n", l2);
> -
> - if (l3)
> - printk(KERN_INFO "CPU: L3 cache: %dK\n", l3);
> + printk(KERN_DEBUG "CPU: Trace cache: %dK uops", trace);
>
> + c->l1i = l1i;
> + c->l1d = l1d;
> + c->l2 = l2;
> + c->l3 = l3;
> c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
>
> + print_cache_info(c,
> + system_state == SYSTEM_BOOTING? KERN_DEBUG : KERN_INFO);
> +
> return l2;
> }
>
> --- linux.orig/arch/x86/kernel/smpboot.c
> +++ linux/arch/x86/kernel/smpboot.c
> @@ -442,6 +442,94 @@
> return c->llc_shared_map;
> }
>
> +/* Summarize Processor Information */
> +static void __init summarize_cpu_info(void)
> +{
> + cpumask_var_t cpulist, cpusdone;
> + int cpu;
> + int err = 0;
> +
> + if (!alloc_cpumask_var(&cpulist, GFP_KERNEL))
> + err = 1;
> +
> + else if (!alloc_cpumask_var(&cpusdone, GFP_KERNEL)) {
> + free_cpumask_var(cpulist);
> + err = 1;
> + }
> +
> + if (err) {
> + printk(KERN_INFO "Can't print processor summaries\n");
> + return;
> + }
> +
> + cpumask_clear(cpusdone);
> + for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
> + struct cpuinfo_x86 *c;
> + int l1i, l1d, l2, l3;
> + int x86, x86_vendor, x86_model, x86_mask;
> + char buf[64];
> + int ncpu;
> + unsigned long minlpj, maxlpj;
> +
> + /* skip if cpu has already been displayed */
> + if (cpumask_test_cpu(cpu, cpusdone))
> + continue;
> +
> + c = &cpu_data(cpu);
> + l1i = c->l1i;
> + l1d = c->l1d;
> + l2 = c->l2;
> + l3 = c->l3;
> + x86 = c->x86;
> + x86_vendor = c->x86_vendor;
> + x86_model = c->x86_model;
> + x86_mask = c->x86_mask;
> + minlpj = ULONG_MAX;
> + maxlpj = 0;
> +
> + cpumask_clear(cpulist);
> +
> + /* collate all cpus with same specifics */
> + for (ncpu = cpu; ncpu < nr_cpu_ids; ncpu++) {
> + if (l1i != cpu_data(ncpu).l1i ||
> + l1d != cpu_data(ncpu).l1d ||
> + l2 != cpu_data(ncpu).l2 ||
> + l3 != cpu_data(ncpu).l3 ||
> + x86 != cpu_data(ncpu).x86 ||
> + x86_vendor != cpu_data(ncpu).x86_vendor ||
> + x86_model != cpu_data(ncpu).x86_model ||
> + x86_mask != cpu_data(ncpu).x86_mask)
> + continue;
> +
> + cpumask_set_cpu(ncpu, cpulist);
> + cpumask_set_cpu(ncpu, cpusdone);
> +
> + if (cpu_data(ncpu).loops_per_jiffy < minlpj)
> + minlpj = cpu_data(ncpu).loops_per_jiffy;
> +
> + if (cpu_data(ncpu).loops_per_jiffy > maxlpj)
> + maxlpj = cpu_data(ncpu).loops_per_jiffy;
> + }
> +
> + cpulist_scnprintf(buf, sizeof(buf), cpulist);
> + printk(KERN_INFO
> + "Summary Processor Information for CPUS: %s\n", buf);
> +
> + printk(KERN_INFO);
> + print_cpu_info(c);
> + print_cache_info(c, KERN_INFO);
> +
> + printk(KERN_INFO "MIN %lu.%02lu BogoMIPS (lpj=%lu)\n",
> + minlpj/(500000/HZ), (minlpj/(5000/HZ)) % 100, minlpj);
> +
> + printk(KERN_INFO "MAX %lu.%02lu BogoMIPS (lpj=%lu)\n",
> + maxlpj/(500000/HZ), (maxlpj/(5000/HZ)) % 100, maxlpj);
> + }
> +
> + free_cpumask_var(cpusdone);
> + free_cpumask_var(cpulist);
> +}
> +
> static void impress_friends(void)
> {
> int cpu;
> @@ -671,6 +759,50 @@
> complete(&c_idle->done);
> }
>
> +/* Summarize the "Booting processor ..." startup messages */
> +static void __init print_summary_bootmsg(int cpu)
> +{
> + static int next_node, node_shift;
> + int node = cpu_to_node(cpu);
> +
> + if (node >= next_node) {
> + cpumask_var_t cpulist;
> +
> + node = next_node;
> + next_node = 1 << node_shift;
> + node_shift++;
> +
> + if (alloc_cpumask_var(&cpulist, GFP_KERNEL)) {
> + int i, tmp, last_node = node;
> + char buf[32];
> +
> + cpumask_clear(cpulist);
> + for_each_present_cpu(i) {
> + if (i == 0) /* boot cpu */
> + continue;
> +
> + tmp = cpu_to_node(i);
> + if (node <= tmp && tmp < next_node) {
> + cpumask_set_cpu(i, cpulist);
> + if (last_node < tmp)
> + last_node = tmp;
> + }
> + }
> + if (cpumask_weight(cpulist)) {
> + cpulist_scnprintf(buf, sizeof(buf), cpulist);
> + printk(KERN_INFO "Booting Processors %s,",
> buf);
> +
> + if (node == last_node)
> + printk(KERN_CONT " Node %d\n", node);
> + else
> + printk(KERN_CONT " Nodes %d-%d\n",
> + node, last_node);
> + }
> + free_cpumask_var(cpulist);
> + }
> + }
> +}
> +
> /*
> * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
> * (ie clustered apic addressing mode), this is a LOGICAL apic ID.

Why isn't cpumask_of_node() available yet?

2009-10-30 20:39:39

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages



David Rientjes wrote:
> On Fri, 30 Oct 2009, Mike Travis wrote:
>
>> x86_64: Limit the number of processor bootup messages
>>
>> With a large number of processors in a system there is an excessive amount
>> of messages sent to the system console. It's estimated that with 4096
>> processors in a system, and the console baudrate set to 56K, the startup
>> messages will take about 84 minutes to clear the serial port.
>>
>> This set of patches limits the number of repetitious messages which contain
>> no additional information. Much of this information is obtainable from the
>> /proc and /sysfs. Most of the messages are also sent to the kernel log
>> buffer as KERN_DEBUG messages so it can be used to examine more closely any
>> details specific to a processor.
>>
>> The list of message transformations....
>>
>> For system_state == SYSTEM_BOOTING:
>>
>> [ 25.388280] Booting Processors 1-7,320-327, Node 0
>> [ 26.064742] Booting Processors 8-15,328-335, Node 1
>> [ 26.837006] Booting Processors 16-31,336-351, Nodes 2-3
>> [ 28.440427] Booting Processors 32-63,352-383, Nodes 4-7
>> [ 31.640450] Booting Processors 64-127,384-447, Nodes 8-15
>> [ 38.041430] Booting Processors 128-255,448-575, Nodes 16-31
>> [ 50.917504] Booting Processors 256-319,576-639, Nodes 32-39
>> [ 90.964169] Brought up 640 CPUs
>>
>> The range of processors increases as a power of 2, so 4096 CPU's should
>> only take 12 lines.
>>
>> (QUESTION: print_summary_bootmsg() is in the __init section and is called
>> from a __cpuinit function, but only when system is booting. Is there a
>> special flag to handle this case?)
>>
>
> That's fine, init.text will still be valid as long as cpuinit.text is and
> there will be no hotplug considerations.

Ok, thanks.
>
>> For Processor Information printout:
>>
>> [ 90.968381] Summary Processor Information for CPUS: 0-639
>> [ 90.972033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
>> [ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
>> [ 90.985888] CPU: L2 cache: 256K
>> [ 90.988032] CPU: L3 cache: 24576K
>> [ 90.992032] MIN 4266.68 BogoMIPS (lpj=8533371)
>> [ 91.000033] MAX 4267.89 BogoMIPS (lpj=8535789)
>>
>> These lines have been moved to loglevel KERN_DEBUG:
>>
>> CPU: Physical Processor ID:
>> CPU: Processor Core ID:
>> CPU %d/0x%x -> Node %d
>> <cache line sizes per cpu>
>> CPUx is down
>>
>> This message has been changed to loglevel KERN_DEBUG if system is booting
>> and KERN_INFO otherwise:
>>
>> CPU %d is now offline
>>
>> Signed-off-by: Mike Travis <[email protected]>
>> ---
>> arch/x86/include/asm/processor.h | 4
>> arch/x86/kernel/cpu/addon_cpuid_features.c | 4 arch/x86/kernel/cpu/amd.c
>> | 2 arch/x86/kernel/cpu/common.c | 23 +++-
>> arch/x86/kernel/cpu/intel.c | 2
>> arch/x86/kernel/cpu/intel_cacheinfo.c | 22 +---
>> arch/x86/kernel/smpboot.c | 154
>> ++++++++++++++++++++++++++++-
>> kernel/cpu.c | 2 8 files changed, 187
>> insertions(+), 26 deletions(-)
>>
>> --- linux.orig/arch/x86/include/asm/processor.h
>> +++ linux/arch/x86/include/asm/processor.h
>> @@ -111,6 +111,9 @@
>> u16 cpu_core_id;
>> /* Index into per_cpu list: */
>> u16 cpu_index;
>> + /* Interior Cache Sizes: */
>> + u16 l1i, l1d, l2;
>> + u32 l3;
>> #endif
>> unsigned int x86_hyper_vendor;
>> } __attribute__((__aligned__(SMP_CACHE_BYTES)));
>> @@ -169,6 +172,7 @@
>> extern void identify_boot_cpu(void);
>> extern void identify_secondary_cpu(struct cpuinfo_x86 *);
>> extern void print_cpu_info(struct cpuinfo_x86 *);
>> +extern void print_cache_info(struct cpuinfo_x86 *, char *msglvl);
>> extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
>> extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
>> extern unsigned short num_cache_leaves;
>> --- linux.orig/arch/x86/kernel/cpu/addon_cpuid_features.c
>> +++ linux/arch/x86/kernel/cpu/addon_cpuid_features.c
>> @@ -128,10 +128,10 @@
>> c->x86_max_cores = (core_level_siblings / smp_num_siblings);
>>
>>
>> - printk(KERN_INFO "CPU: Physical Processor ID: %d\n",
>> + printk(KERN_DEBUG "CPU: Physical Processor ID: %d\n",
>> c->phys_proc_id);
>> if (c->x86_max_cores > 1)
>> - printk(KERN_INFO "CPU: Processor Core ID: %d\n",
>> + printk(KERN_DEBUG "CPU: Processor Core ID: %d\n",
>> c->cpu_core_id);
>> return;
>> #endif
>
> Perhaps an opporunity to move these to pr_debug() instead?
>

Hmm, good idea.

>> --- linux.orig/arch/x86/kernel/cpu/amd.c
>> +++ linux/arch/x86/kernel/cpu/amd.c
>> @@ -376,7 +376,7 @@
>> }
>> numa_set_node(cpu, node);
>>
>> - printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
>> + printk(KERN_DEBUG "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
>> #endif
>> }
>>
>> --- linux.orig/arch/x86/kernel/cpu/common.c
>> +++ linux/arch/x86/kernel/cpu/common.c
>> @@ -475,9 +475,9 @@
>>
>> out:
>> if ((c->x86_max_cores * smp_num_siblings) > 1) {
>> - printk(KERN_INFO "CPU: Physical Processor ID: %d\n",
>> + printk(KERN_DEBUG "CPU: Physical Processor ID: %d\n",
>> c->phys_proc_id);
>> - printk(KERN_INFO "CPU: Processor Core ID: %d\n",
>> + printk(KERN_DEBUG "CPU: Processor Core ID: %d\n",
>> c->cpu_core_id);
>> }
>> #endif
>> @@ -967,6 +967,23 @@
>> #endif
>> }
>>
>> +void __cpuinit print_cache_info(struct cpuinfo_x86 *c, char *lvl)
>> +{
>> + if (c->l1i)
>> + printk("%sCPU: L1 I cache: %dK", lvl, c->l1i);
>> +
>> + if (c->l1d)
>> + printk(KERN_CONT ", L1 D cache: %dK\n", c->l1d);
>> + else
>> + printk(KERN_CONT "\n");
>> +
>> + if (c->l2)
>> + printk("%sCPU: L2 cache: %dK\n", lvl, c->l2);
>> +
>> + if (c->l3)
>> + printk("%sCPU: L3 cache: %dK\n", lvl, c->l3);
>> +}
>> +
>> static __init int setup_disablecpuid(char *arg)
>> {
>> int bit;
>> @@ -1115,7 +1132,7 @@
>> if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
>> panic("CPU#%d already initialized!\n", cpu);
>>
>> - printk(KERN_INFO "Initializing CPU#%d\n", cpu);
>> + printk(KERN_DEBUG "Initializing CPU#%d\n", cpu);
>>
>> clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
>>
>> --- linux.orig/arch/x86/kernel/cpu/intel.c
>> +++ linux/arch/x86/kernel/cpu/intel.c
>> @@ -267,7 +267,7 @@
>> node = first_node(node_online_map);
>> numa_set_node(cpu, node);
>>
>> - printk(KERN_INFO "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
>> + printk(KERN_DEBUG "CPU %d/0x%x -> Node %d\n", cpu, apicid, node);
>> #endif
>> }
>>
>> --- linux.orig/arch/x86/kernel/cpu/intel_cacheinfo.c
>> +++ linux/arch/x86/kernel/cpu/intel_cacheinfo.c
>> @@ -489,23 +489,17 @@
>> }
>>
>> if (trace)
>> - printk(KERN_INFO "CPU: Trace cache: %dK uops", trace);
>> - else if (l1i)
>> - printk(KERN_INFO "CPU: L1 I cache: %dK", l1i);
>> -
>> - if (l1d)
>> - printk(KERN_CONT ", L1 D cache: %dK\n", l1d);
>> - else
>> - printk(KERN_CONT "\n");
>> -
>> - if (l2)
>> - printk(KERN_INFO "CPU: L2 cache: %dK\n", l2);
>> -
>> - if (l3)
>> - printk(KERN_INFO "CPU: L3 cache: %dK\n", l3);
>> + printk(KERN_DEBUG "CPU: Trace cache: %dK uops", trace);
>>
>> + c->l1i = l1i;
>> + c->l1d = l1d;
>> + c->l2 = l2;
>> + c->l3 = l3;
>> c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
>>
>> + print_cache_info(c,
>> + system_state == SYSTEM_BOOTING? KERN_DEBUG : KERN_INFO);
>> +
>> return l2;
>> }
>>
>> --- linux.orig/arch/x86/kernel/smpboot.c
>> +++ linux/arch/x86/kernel/smpboot.c
>> @@ -442,6 +442,94 @@
>> return c->llc_shared_map;
>> }
>>
>> +/* Summarize Processor Information */
>> +static void __init summarize_cpu_info(void)
>> +{
>> + cpumask_var_t cpulist, cpusdone;
>> + int cpu;
>> + int err = 0;
>> +
>> + if (!alloc_cpumask_var(&cpulist, GFP_KERNEL))
>> + err = 1;
>> +
>> + else if (!alloc_cpumask_var(&cpusdone, GFP_KERNEL)) {
>> + free_cpumask_var(cpulist);
>> + err = 1;
>> + }
>> +
>> + if (err) {
>> + printk(KERN_INFO "Can't print processor summaries\n");
>> + return;
>> + }
>> +
>> + cpumask_clear(cpusdone);
>> + for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
>> + struct cpuinfo_x86 *c;
>> + int l1i, l1d, l2, l3;
>> + int x86, x86_vendor, x86_model, x86_mask;
>> + char buf[64];
>> + int ncpu;
>> + unsigned long minlpj, maxlpj;
>> +
>> + /* skip if cpu has already been displayed */
>> + if (cpumask_test_cpu(cpu, cpusdone))
>> + continue;
>> +
>> + c = &cpu_data(cpu);
>> + l1i = c->l1i;
>> + l1d = c->l1d;
>> + l2 = c->l2;
>> + l3 = c->l3;
>> + x86 = c->x86;
>> + x86_vendor = c->x86_vendor;
>> + x86_model = c->x86_model;
>> + x86_mask = c->x86_mask;
>> + minlpj = ULONG_MAX;
>> + maxlpj = 0;
>> +
>> + cpumask_clear(cpulist);
>> +
>> + /* collate all cpus with same specifics */
>> + for (ncpu = cpu; ncpu < nr_cpu_ids; ncpu++) {
>> + if (l1i != cpu_data(ncpu).l1i ||
>> + l1d != cpu_data(ncpu).l1d ||
>> + l2 != cpu_data(ncpu).l2 ||
>> + l3 != cpu_data(ncpu).l3 ||
>> + x86 != cpu_data(ncpu).x86 ||
>> + x86_vendor != cpu_data(ncpu).x86_vendor ||
>> + x86_model != cpu_data(ncpu).x86_model ||
>> + x86_mask != cpu_data(ncpu).x86_mask)
>> + continue;
>> +
>> + cpumask_set_cpu(ncpu, cpulist);
>> + cpumask_set_cpu(ncpu, cpusdone);
>> +
>> + if (cpu_data(ncpu).loops_per_jiffy < minlpj)
>> + minlpj = cpu_data(ncpu).loops_per_jiffy;
>> +
>> + if (cpu_data(ncpu).loops_per_jiffy > maxlpj)
>> + maxlpj = cpu_data(ncpu).loops_per_jiffy;
>> + }
>> +
>> + cpulist_scnprintf(buf, sizeof(buf), cpulist);
>> + printk(KERN_INFO
>> + "Summary Processor Information for CPUS: %s\n", buf);
>> +
>> + printk(KERN_INFO);
>> + print_cpu_info(c);
>> + print_cache_info(c, KERN_INFO);
>> +
>> + printk(KERN_INFO "MIN %lu.%02lu BogoMIPS (lpj=%lu)\n",
>> + minlpj/(500000/HZ), (minlpj/(5000/HZ)) % 100, minlpj);
>> +
>> + printk(KERN_INFO "MAX %lu.%02lu BogoMIPS (lpj=%lu)\n",
>> + maxlpj/(500000/HZ), (maxlpj/(5000/HZ)) % 100, maxlpj);
>> + }
>> +
>> + free_cpumask_var(cpusdone);
>> + free_cpumask_var(cpulist);
>> +}
>> +
>> static void impress_friends(void)
>> {
>> int cpu;
>> @@ -671,6 +759,50 @@
>> complete(&c_idle->done);
>> }
>>
>> +/* Summarize the "Booting processor ..." startup messages */
>> +static void __init print_summary_bootmsg(int cpu)
>> +{
>> + static int next_node, node_shift;
>> + int node = cpu_to_node(cpu);
>> +
>> + if (node >= next_node) {
>> + cpumask_var_t cpulist;
>> +
>> + node = next_node;
>> + next_node = 1 << node_shift;
>> + node_shift++;
>> +
>> + if (alloc_cpumask_var(&cpulist, GFP_KERNEL)) {
>> + int i, tmp, last_node = node;
>> + char buf[32];
>> +
>> + cpumask_clear(cpulist);
>> + for_each_present_cpu(i) {
>> + if (i == 0) /* boot cpu */
>> + continue;
>> +
>> + tmp = cpu_to_node(i);
>> + if (node <= tmp && tmp < next_node) {
>> + cpumask_set_cpu(i, cpulist);
>> + if (last_node < tmp)
>> + last_node = tmp;
>> + }
>> + }
>> + if (cpumask_weight(cpulist)) {
>> + cpulist_scnprintf(buf, sizeof(buf), cpulist);
>> + printk(KERN_INFO "Booting Processors %s,",
>> buf);
>> +
>> + if (node == last_node)
>> + printk(KERN_CONT " Node %d\n", node);
>> + else
>> + printk(KERN_CONT " Nodes %d-%d\n",
>> + node, last_node);
>> + }
>> + free_cpumask_var(cpulist);
>> + }
>> + }
>> +}
>> +
>> /*
>> * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
>> * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
>
> Why isn't cpumask_of_node() available yet?

I'll try that. It gets a bit tricky in specifying the actual last node that
is being booted.

Thanks,
Mike

2009-10-30 23:30:18

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

On Fri, 30 Oct 2009, Mike Travis wrote:

> > > x86_64: Limit the number of processor bootup messages
> > >

Is this really only limited to 64 bit?

> > > With a large number of processors in a system there is an excessive amount
> > > of messages sent to the system console. It's estimated that with 4096
> > > processors in a system, and the console baudrate set to 56K, the startup
> > > messages will take about 84 minutes to clear the serial port.
> > >
> > > This set of patches limits the number of repetitious messages which
> > > contain
> > > no additional information. Much of this information is obtainable from
> > > the
> > > /proc and /sysfs. Most of the messages are also sent to the kernel log
> > > buffer as KERN_DEBUG messages so it can be used to examine more closely
> > > any
> > > details specific to a processor.
> > >
> > > The list of message transformations....
> > >
> > > For system_state == SYSTEM_BOOTING:
> > >
> > > [ 25.388280] Booting Processors 1-7,320-327, Node 0
> > > [ 26.064742] Booting Processors 8-15,328-335, Node 1
> > > [ 26.837006] Booting Processors 16-31,336-351, Nodes 2-3
> > > [ 28.440427] Booting Processors 32-63,352-383, Nodes 4-7
> > > [ 31.640450] Booting Processors 64-127,384-447, Nodes 8-15
> > > [ 38.041430] Booting Processors 128-255,448-575, Nodes 16-31
> > > [ 50.917504] Booting Processors 256-319,576-639, Nodes 32-39
> > > [ 90.964169] Brought up 640 CPUs
> > >
> > > The range of processors increases as a power of 2, so 4096 CPU's should
> > > only take 12 lines.
> > >

On your particular machine, yes, but there's no x86 restriction on the
number of cpus per node.

> > > @@ -671,6 +759,50 @@
> > > complete(&c_idle->done);
> > > }
> > >
> > > +/* Summarize the "Booting processor ..." startup messages */
> > > +static void __init print_summary_bootmsg(int cpu)
> > > +{
> > > + static int next_node, node_shift;
> > > + int node = cpu_to_node(cpu);
> > > +
> > > + if (node >= next_node) {
> > > + cpumask_var_t cpulist;
> > > +
> > > + node = next_node;
> > > + next_node = 1 << node_shift;
> > > + node_shift++;
> > > +
> > > + if (alloc_cpumask_var(&cpulist, GFP_KERNEL)) {
> > > + int i, tmp, last_node = node;
> > > + char buf[32];
> > > +
> > > + cpumask_clear(cpulist);
> > > + for_each_present_cpu(i) {
> > > + if (i == 0) /* boot cpu */
> > > + continue;
> > > +
> > > + tmp = cpu_to_node(i);
> > > + if (node <= tmp && tmp < next_node) {
> > > + cpumask_set_cpu(i, cpulist);
> > > + if (last_node < tmp)
> > > + last_node = tmp;
> > > + }
> > > + }
> > > + if (cpumask_weight(cpulist)) {
> > > + cpulist_scnprintf(buf, sizeof(buf), cpulist);
> > > + printk(KERN_INFO "Booting Processors %s,",
> > > buf);
> > > +
> > > + if (node == last_node)
> > > + printk(KERN_CONT " Node %d\n", node);
> > > + else
> > > + printk(KERN_CONT " Nodes %d-%d\n",
> > > + node, last_node);
> > > + }
> > > + free_cpumask_var(cpulist);
> > > + }
> > > + }
> > > +}
> > > +
> > > /*
> > > * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
> > > * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
> >
> > Why isn't cpumask_of_node() available yet?
>
> I'll try that. It gets a bit tricky in specifying the actual last node that
> is being booted.
>

Why do you need to call print_summary_bootmsg() for each cpu? It seems
like you'd be able to move this out to a single call to a new function:

void __init print_summary_bootmsg(void)
{
char buf[128];
int nid;

for_each_online_node(nid) {
const struct cpumask *mask = cpumask_of_node(nid);

if (cpumask_empty(mask))
continue;
cpulist_scnprintf(buf, sizeof(buf), cpumask_of_node(nid));
pr_info("Booting Processors %s, Node %d\n", buf, nid);
}
}

2009-10-31 00:27:45

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages



David Rientjes wrote:
> On Fri, 30 Oct 2009, Mike Travis wrote:
>
>>>> x86_64: Limit the number of processor bootup messages
>>>>
>
> Is this really only limited to 64 bit?

[That was a quick edit to change it from SGI X86_64 UV and it didn't
occur to me to remove the _64. :-)]

>
>>>> With a large number of processors in a system there is an excessive amount
>>>> of messages sent to the system console. It's estimated that with 4096
>>>> processors in a system, and the console baudrate set to 56K, the startup
>>>> messages will take about 84 minutes to clear the serial port.
>>>>
>>>> This set of patches limits the number of repetitious messages which
>>>> contain
>>>> no additional information. Much of this information is obtainable from
>>>> the
>>>> /proc and /sysfs. Most of the messages are also sent to the kernel log
>>>> buffer as KERN_DEBUG messages so it can be used to examine more closely
>>>> any
>>>> details specific to a processor.
>>>>
>>>> The list of message transformations....
>>>>
>>>> For system_state == SYSTEM_BOOTING:
>>>>
>>>> [ 25.388280] Booting Processors 1-7,320-327, Node 0
>>>> [ 26.064742] Booting Processors 8-15,328-335, Node 1
>>>> [ 26.837006] Booting Processors 16-31,336-351, Nodes 2-3
>>>> [ 28.440427] Booting Processors 32-63,352-383, Nodes 4-7
>>>> [ 31.640450] Booting Processors 64-127,384-447, Nodes 8-15
>>>> [ 38.041430] Booting Processors 128-255,448-575, Nodes 16-31
>>>> [ 50.917504] Booting Processors 256-319,576-639, Nodes 32-39
>>>> [ 90.964169] Brought up 640 CPUs
>>>>
>>>> The range of processors increases as a power of 2, so 4096 CPU's should
>>>> only take 12 lines.
>>>>
>
> On your particular machine, yes, but there's no x86 restriction on the
> number of cpus per node.

Yes, my comment is wrong. The limit would be 10 lines for the current kernel
limit of 512 nodes.

>
>>>> @@ -671,6 +759,50 @@
>>>> complete(&c_idle->done);
>>>> }
>>>>
>>>> +/* Summarize the "Booting processor ..." startup messages */
>>>> +static void __init print_summary_bootmsg(int cpu)
>>>> +{
>>>> + static int next_node, node_shift;
>>>> + int node = cpu_to_node(cpu);
>>>> +
>>>> + if (node >= next_node) {
>>>> + cpumask_var_t cpulist;
>>>> +
>>>> + node = next_node;
>>>> + next_node = 1 << node_shift;
>>>> + node_shift++;
>>>> +
>>>> + if (alloc_cpumask_var(&cpulist, GFP_KERNEL)) {
>>>> + int i, tmp, last_node = node;
>>>> + char buf[32];
>>>> +
>>>> + cpumask_clear(cpulist);
>>>> + for_each_present_cpu(i) {
>>>> + if (i == 0) /* boot cpu */
>>>> + continue;
>>>> +
>>>> + tmp = cpu_to_node(i);
>>>> + if (node <= tmp && tmp < next_node) {
>>>> + cpumask_set_cpu(i, cpulist);
>>>> + if (last_node < tmp)
>>>> + last_node = tmp;
>>>> + }
>>>> + }
>>>> + if (cpumask_weight(cpulist)) {
>>>> + cpulist_scnprintf(buf, sizeof(buf), cpulist);
>>>> + printk(KERN_INFO "Booting Processors %s,",
>>>> buf);
>>>> +
>>>> + if (node == last_node)
>>>> + printk(KERN_CONT " Node %d\n", node);
>>>> + else
>>>> + printk(KERN_CONT " Nodes %d-%d\n",
>>>> + node, last_node);
>>>> + }
>>>> + free_cpumask_var(cpulist);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> /*
>>>> * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
>>>> * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
>>> Why isn't cpumask_of_node() available yet?
>> I'll try that. It gets a bit tricky in specifying the actual last node that
>> is being booted.
>>
>
> Why do you need to call print_summary_bootmsg() for each cpu? It seems
> like you'd be able to move this out to a single call to a new function:
>
> void __init print_summary_bootmsg(void)
> {
> char buf[128];
> int nid;
>
> for_each_online_node(nid) {
> const struct cpumask *mask = cpumask_of_node(nid);
>
> if (cpumask_empty(mask))
> continue;
> cpulist_scnprintf(buf, sizeof(buf), cpumask_of_node(nid));
> pr_info("Booting Processors %s, Node %d\n", buf, nid);
> }
> }

Well one thing I did find out, cpumask_of_node (or more specifically
node_to_cpumask_map[] is filled in while the CPU's are booting, not
before.

Also, the above could potentially print 512 lines of boot messages before
booting cpu 1. The printk times also would not be accurate for each group
of cpus. And there's something to be said about actually doing what it
is you say you are doing. ;-)

Booting Processors 0-15 Node 0
Booting Processors 16-31 Node 1
<Here you expect cpus 0-15 to have already been booted.>

Why not just say:

cpulist_scnprintf(buf, sizeof(buf), cpu_present_mask);
pr_info("Booting Processors %s\n", buf);

Since the node -> cpu map can be printed much more efficiently some other way?

For example:

Nodes 0-7: 0-7,512-519 8-15,520-527 ...

would shrink it to 64 lines max.

(Note, it's important to include the "cpu_present_mask" because cpus can
be powered on disabled, and be booted later on, to decrease the initial
system startup time.)

A request was made (by AK?) that getting a general sense of progress is
a "good thing". I wanted to avoid something more mundane like dots or
sequential numbers. The one thing that Andi mentioned that I haven't
figured out is how to "delay print" specific cpu info in the case of a
boot error. I suppose one way would be to save the current position in
the kernel log buffer at the start of each cpu boot, and print that to
the console in case of an error?

Thanks,
Mike

2009-11-02 11:11:36

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

Mike Travis wrote:
>
> This set of patches limits the number of repetitious messages which contain
> no additional information. Much of this information is obtainable from the
> /proc and /sysfs. Most of the messages are also sent to the kernel log
> buffer as KERN_DEBUG messages so it can be used to examine more closely any
> details specific to a processor.

What would be good is to put the information from the booting CPUs
into some buffer and print it visibly if there's a timeout detected on the BP.

Also power of two summaries at a bit odd, but ok.

> For Processor Information printout:
>
> [ 90.968381] Summary Processor Information for CPUS: 0-639
> [ 90.972033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04

It would be good to print family/model in this line

> [ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
> [ 90.985888] CPU: L2 cache: 256K
> [ 90.988032] CPU: L3 cache: 24576K

I would recommend to drop the cache information; this can be easily
gotten at runtime and is often implied in the CPU name anyways
(and especially L1 and increasingly L2 too change only very rarely)

> [ 90.992032] MIN 4266.68 BogoMIPS (lpj=8533371)
> [ 91.000033] MAX 4267.89 BogoMIPS (lpj=8535789)

Perhaps an average too? You could put all that on one line.


> These lines have been moved to loglevel KERN_DEBUG:
>
> CPU: Physical Processor ID:
> CPU: Processor Core ID:
> CPU %d/0x%x -> Node %d
> <cache line sizes per cpu>

I think you can just remove them.

> CPUx is down

This should be still printed if there's a timeout, or rather print
a "CPUx is not down" message. Right now there's no timeout detection on shutdown, but
I guess that wouldn't be too hard to add.

-Andi

2009-11-02 14:15:14

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

On Mon, Oct 26, 2009 at 10:55:31AM -0700, Mike Travis wrote:
>
>
> Frederic Weisbecker wrote:
>> On Fri, Oct 23, 2009 at 06:37:44PM -0500, Mike Travis wrote:
>>> With a large number of processors in a system there is an excessive amount
>>> of messages sent to the system console. It's estimated that with 4096
>>> processors in a system, and the console baudrate set to 56K, the startup
>>> messages will take about 84 minutes to clear the serial port.
>>>
>>> This patch adds (for SGI UV only) a kernel start option "limit_console_
>>> output" (or 'lco' for short), which when set provides the ability to
>>> temporarily reduce the console loglevel during system startup. This allows
>>> informative messages to still be seen on the console without producing
>>> excessive amounts of repetious messages.
>>>
>>> Note that all the messages are still available in the kernel log buffer.
>>
>>
>>
>> Well, this problem does not only concerns SGI UV but all boxes with a large
>> number of cpus.
>>
>> Also, instead of adding the same conditionals in multiple places to solve
>> the same problem (and that may even expand if we go further the SGI UV case,
>> for example with other archs cpu up/down events), may be can you centralize,
>> institutionalize this issue by using the existing printk mechanisms.
>>
>> I mean, may be that could be addressed by adding a new printk
>> level flag, and then associate the desired filters against it.
>>
>> KERN_CPU could be a name, since this is targetting cpu events.
>>
>
> I did try out something like this but the changes quickly became very intrusive,
> and I was hoping for a "lighter" touch. The other potential fallout of adding
> another printk level might affect user programs that sift through the dmesg
> log for "interesting" info.
>
> Also, I could use some other config option to enable this, it's just that the
> existing X86_UV was too convenient. ;-) I believe most systems would want this
> turned off so the code size shrinks. And until you get the number of cpus into
> the hundreds and thousands, the messages usually just fly by - particularly if
> you're on a desktop system which has almost an infinite baud rate to the screen,
> and usually hides the messages behind a splash screen anyways.


Ok :)

2009-11-02 19:21:48

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages



Andi Kleen wrote:
> Mike Travis wrote:
>>
>> This set of patches limits the number of repetitious messages which
>> contain
>> no additional information. Much of this information is obtainable
>> from the
>> /proc and /sysfs. Most of the messages are also sent to the kernel log
>> buffer as KERN_DEBUG messages so it can be used to examine more
>> closely any
>> details specific to a processor.
>
> What would be good is to put the information from the booting CPUs
> into some buffer and print it visibly if there's a timeout detected on
> the BP.

What do you think of this idea.... Add a "mark kernel log buffer" function,
and then if any KERN_NOTE or above happens, it sends the marked info from
the kernel log buffer to the console before the current message. Set the
marker to '0' to clear.

And I was thinking that you might want to print the history of the previous
cpu that booted ok, before printing the info for the cpu that didn't. That
way you'd have some data to compare it with?

>
> Also power of two summaries at a bit odd, but ok.
>
>> For Processor Information printout:
>>
>> [ 90.968381] Summary Processor Information for CPUS: 0-639
>> [ 90.972033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
>
> It would be good to print family/model in this line

There is more info that should be printed? I'm just calling the current
print_cpu_info using the cpuinfo_x86 for the first cpu in the list. And
it appears that it is printing the x86_model_id. Is there some other info
in that struct that should be printed?

>
>> [ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
>> [ 90.985888] CPU: L2 cache: 256K
>> [ 90.988032] CPU: L3 cache: 24576K
>
> I would recommend to drop the cache information; this can be easily
> gotten at runtime and is often implied in the CPU name anyways
> (and especially L1 and increasingly L2 too change only very rarely)

Ok, though because of future system upgrades to a UV system, you can
end up with slightly different processors (of the same family). The
only differences I've detected so far in testing is the stepping has
changed.

>
>> [ 90.992032] MIN 4266.68 BogoMIPS (lpj=8533371)
>> [ 91.000033] MAX 4267.89 BogoMIPS (lpj=8535789)
>
> Perhaps an average too? You could put all that on one line.

Sure thing.
>
>
>> These lines have been moved to loglevel KERN_DEBUG:
>>
>> CPU: Physical Processor ID:
>> CPU: Processor Core ID:
>> CPU %d/0x%x -> Node %d
>> <cache line sizes per cpu>
>
> I think you can just remove them.

I left them in in case we get to the point of printing KERN_DEBUG
messages in case of a failure. But you think they will not be
necessary in that case? (I also left them KERN_DEBUG instead of
pr_debug as the latter optimizes out the print if kernel DEBUG
is not defined... which it won't be in 99% of the kernels our
customers run with. And generally, it's better it get as much
good information as early as possible after a failure, instead
of attempting to recreate the failure with a "debug" kernel
[scheduling time on the system can sometimes be a real pain.]

>
>> CPUx is down
>
> This should be still printed if there's a timeout, or rather print
> a "CPUx is not down" message. Right now there's no timeout detection on
> shutdown, but
> I guess that wouldn't be too hard to add.

That seems a bit outside the scope of this patch...?

>
> -Andi

Thanks!
Mike

2009-11-02 19:36:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages


* Mike Travis <[email protected]> wrote:

>
>
> Andi Kleen wrote:
>> Mike Travis wrote:
>>>
>>> This set of patches limits the number of repetitious messages which
>>> contain
>>> no additional information. Much of this information is obtainable
>>> from the
>>> /proc and /sysfs. Most of the messages are also sent to the kernel log
>>> buffer as KERN_DEBUG messages so it can be used to examine more
>>> closely any
>>> details specific to a processor.
>>
>> What would be good is to put the information from the booting CPUs
>> into some buffer and print it visibly if there's a timeout detected on
>> the BP.
>
> What do you think of this idea.... Add a "mark kernel log buffer"
> function, and then if any KERN_NOTE or above happens, it sends the
> marked info from the kernel log buffer to the console before the
> current message. Set the marker to '0' to clear.

That's _way_ too complex really, for little benefit. (If there's a boot
hang people will re-try anyway (and this time with a serial console
attached or so), and they can add various boot options to increase
verbosity - depending in which phase the bootup hung.)

So please go with the simple solution i suggested days ago: print stuff
on the boot CPU but after that only a single line per AP CPU.

Ingo

2009-11-02 20:32:08

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages



Ingo Molnar wrote:
> * Mike Travis <[email protected]> wrote:
>
>>
>> Andi Kleen wrote:
>>> Mike Travis wrote:
>>>> This set of patches limits the number of repetitious messages which
>>>> contain
>>>> no additional information. Much of this information is obtainable
>>>> from the
>>>> /proc and /sysfs. Most of the messages are also sent to the kernel log
>>>> buffer as KERN_DEBUG messages so it can be used to examine more
>>>> closely any
>>>> details specific to a processor.
>>> What would be good is to put the information from the booting CPUs
>>> into some buffer and print it visibly if there's a timeout detected on
>>> the BP.
>> What do you think of this idea.... Add a "mark kernel log buffer"
>> function, and then if any KERN_NOTE or above happens, it sends the
>> marked info from the kernel log buffer to the console before the
>> current message. Set the marker to '0' to clear.
>
> That's _way_ too complex really, for little benefit. (If there's a boot
> hang people will re-try anyway (and this time with a serial console
> attached or so), and they can add various boot options to increase
> verbosity - depending in which phase the bootup hung.)

I'm ok with this, though generally speaking large server systems have
serial consoles attached, and save the output into admin logs. One
problem with just setting the loglevel high enough to output debug
messages, is you get literally 100's of thousands of lines of meaningless
information. We waited over 8 hours for a system with 2k cpus to boot
in debug mode, and it never made it all the way up.

My intention for the above was to attempt to print debug information
that pertains to the failure, and not everything else.

>
> So please go with the simple solution i suggested days ago: print stuff
> on the boot CPU but after that only a single line per AP CPU.
>
> Ingo

So you think printing 4096 lines provides meaningful additional
information? I would think at least compress it so you only print
each new processor socket boots and not the 16 threads each of
them have?

I should have timing information soon for 512 cores/1024 threads and
printing a single line for each of those will significantly increase
the time it takes to boot.

Thanks,
Mike

2009-11-04 00:22:48

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages


...
>> So please go with the simple solution i suggested days ago: print
>> stuff on the boot CPU but after that only a single line per AP CPU.
>>
>> Ingo
>

Hi Ingo,

Here is some timing info I collected... Would you accept the first
format (line per node) as a compromise?

(Note there will be 256 node systems w/4096 threads as well as
512 node systems w/HT disabled thus still 4096 threads.)

Btw, I think we should keep the processor summary as it does show some
useful information. Agreed?

(... I'll clean up the BogoMIPS line a bit:

BogoMIPS(lpj): MIN xxx (yyy) AVG xxx (yyy) MAX xxx (yyy)
...)

Thanks,
Mike

64 Nodes/512 cores/1024 threads...

By Node:
1 [ 27.998414] Booting Node 0, Processors 1-7,512-519
2 [ 28.645066] Booting Node 1, Processors 8-15,520-527
3 [ 29.389359] Booting Node 2, Processors 16-23,528-535
4 [ 30.160646] Booting Node 3, Processors 24-31,536-543
...
62 [ 75.013459] Booting Node 61, Processors 488-495,1000-1007
63 [ 75.789663] Booting Node 62, Processors 496-503,1008-1015
64 [ 76.565430] Booting Node 63, Processors 504-511,1016-1023
65 [ 126.860204] Brought up 1024 CPUs
66 [ 126.865392] Summary Processor Information for CPUS: 0-143,256-383,448-655,768-895,960-1023
67 [ 126.876033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
68 [ 126.881404] BogoMIPS: MIN 3980.53 MAX 4268.28 AVG 4265.85
69 [ 126.888032] Loops/Jiffy: MIN 7961074 MAX 8536570 AVG 8531701
70 [ 126.896875] Summary Processor Information for CPUS: 144-239,384-447,656-751,896-959
71 [ 126.904032] Intel(R) Xeon(R) CPU X7560 @ 2.27GHz stepping 05
72 [ 126.913404] BogoMIPS: MIN 4528.51 MAX 4762.75 AVG 4535.89
73 [ 126.920032] Loops/Jiffy: MIN 9057030 MAX 9525505 AVG 9071785
74 [ 126.924217] Summary Processor Information for CPUS: 240-255,752-767
75 [ 126.932032] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 03
76 [ 126.937404] BogoMIPS: MIN 4267.31 MAX 4268.24 AVG 4267.66
77 [ 126.944032] Loops/Jiffy: MIN 8534632 MAX 8536490 AVG 8535338
78 [ 126.952425] Total of 1024 processors activated (4454702.72 BogoMIPS).

By CPU:
1 [ 28.010255] Booting Processor 1 APIC 0x2 ip 0x6000
2 [ 28.106191] Booting Processor 2 APIC 0x4 ip 0x6000
3 [ 28.204705] Booting Processor 3 APIC 0x6 ip 0x6000
4 [ 28.300709] Booting Processor 4 APIC 0x10 ip 0x6000
5 [ 28.400707] Booting Processor 5 APIC 0x12 ip 0x6000
...
1020 [ 131.440341] Booting Processor 1020 APIC 0x7f1 ip 0x6000
1021 [ 131.544366] Booting Processor 1021 APIC 0x7f3 ip 0x6000
1022 [ 131.648354] Booting Processor 1022 APIC 0x7f5 ip 0x6000
1023 [ 131.752350] Booting Processor 1023 APIC 0x7f7 ip 0x6000
1024 [ 131.852202] Brought up 1024 CPUs
1025 [ 131.857394] Summary Processor Information for CPUS: 0-143,256-383,448-655,768-895,960-1023
1026 [ 131.868033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
1027 [ 131.873404] BogoMIPS: MIN 3996.11 MAX 4555.27 AVG 4266.18
1028 [ 131.880033] Loops/Jiffy: MIN 7992230 MAX 9110559 AVG 8532371
1029 [ 131.888871] Summary Processor Information for CPUS: 144-239,384-447,656-751,896-959
1030 [ 131.896032] Intel(R) Xeon(R) CPU X7560 @ 2.27GHz stepping 05
1031 [ 131.905405] BogoMIPS: MIN 4252.28 MAX 4819.44 AVG 4535.46
1032 [ 131.912032] Loops/Jiffy: MIN 8504574 MAX 9638886 AVG 9070920
1033 [ 131.916218] Summary Processor Information for CPUS: 240-255,752-767
1034 [ 131.924032] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 03
1035 [ 131.929404] BogoMIPS: MIN 4267.15 MAX 4268.35 AVG 4267.67
1036 [ 131.936032] Loops/Jiffy: MIN 8534307 MAX 8536711 AVG 8535357
1037 [ 131.944424] Total of 1024 processors activated (4454789.68 BogoMIPS).

2009-11-04 10:25:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages


* Mike Travis <[email protected]> wrote:

>
> ...
>>> So please go with the simple solution i suggested days ago: print
>>> stuff on the boot CPU but after that only a single line per AP CPU.
>>>
>>> Ingo
>>
>
> Hi Ingo,
>
> Here is some timing info I collected... Would you accept the first
> format (line per node) as a compromise?
>
> (Note there will be 256 node systems w/4096 threads as well as
> 512 node systems w/HT disabled thus still 4096 threads.)
>
> Btw, I think we should keep the processor summary as it does show some
> useful information. Agreed?
>
> (... I'll clean up the BogoMIPS line a bit:
>
> BogoMIPS(lpj): MIN xxx (yyy) AVG xxx (yyy) MAX xxx (yyy)
> ...)
>
> Thanks,
> Mike
>
> 64 Nodes/512 cores/1024 threads...
>
> By Node:
> 1 [ 27.998414] Booting Node 0, Processors 1-7,512-519
> 2 [ 28.645066] Booting Node 1, Processors 8-15,520-527
> 3 [ 29.389359] Booting Node 2, Processors 16-23,528-535
> 4 [ 30.160646] Booting Node 3, Processors 24-31,536-543
> ...
> 62 [ 75.013459] Booting Node 61, Processors 488-495,1000-1007
> 63 [ 75.789663] Booting Node 62, Processors 496-503,1008-1015
> 64 [ 76.565430] Booting Node 63, Processors 504-511,1016-1023
> 65 [ 126.860204] Brought up 1024 CPUs

Yeah, this portion certainly looks good. The important thing is to make
this the default - i.e. we dont want some weird switch (or other hw
dependent flaggery) to turn on two styles of bootup output.

We have trouble keeping a single variant sane already, we definitely
dont want to have multiple variants.

> 66 [ 126.865392] Summary Processor Information for CPUS: 0-143,256-383,448-655,768-895,960-1023
> 67 [ 126.876033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04
> 68 [ 126.881404] BogoMIPS: MIN 3980.53 MAX 4268.28 AVG 4265.85
> 69 [ 126.888032] Loops/Jiffy: MIN 7961074 MAX 8536570 AVG 8531701
> 70 [ 126.896875] Summary Processor Information for CPUS: 144-239,384-447,656-751,896-959
> 71 [ 126.904032] Intel(R) Xeon(R) CPU X7560 @ 2.27GHz stepping 05
> 72 [ 126.913404] BogoMIPS: MIN 4528.51 MAX 4762.75 AVG 4535.89
> 73 [ 126.920032] Loops/Jiffy: MIN 9057030 MAX 9525505 AVG 9071785
> 74 [ 126.924217] Summary Processor Information for CPUS: 240-255,752-767
> 75 [ 126.932032] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 03
> 76 [ 126.937404] BogoMIPS: MIN 4267.31 MAX 4268.24 AVG 4267.66
> 77 [ 126.944032] Loops/Jiffy: MIN 8534632 MAX 8536490 AVG 8535338
> 78 [ 126.952425] Total of 1024 processors activated (4454702.72 BogoMIPS).

4.4 million bogomips. Nice ;-)

>
> By CPU:
> 1 [ 28.010255] Booting Processor 1 APIC 0x2 ip 0x6000
> 2 [ 28.106191] Booting Processor 2 APIC 0x4 ip 0x6000
> 3 [ 28.204705] Booting Processor 3 APIC 0x6 ip 0x6000
> 4 [ 28.300709] Booting Processor 4 APIC 0x10 ip 0x6000
> 5 [ 28.400707] Booting Processor 5 APIC 0x12 ip 0x6000
> ...
> 1020 [ 131.440341] Booting Processor 1020 APIC 0x7f1 ip 0x6000
> 1021 [ 131.544366] Booting Processor 1021 APIC 0x7f3 ip 0x6000
> 1022 [ 131.648354] Booting Processor 1022 APIC 0x7f5 ip 0x6000
> 1023 [ 131.752350] Booting Processor 1023 APIC 0x7f7 ip 0x6000
> 1024 [ 131.852202] Brought up 1024 CPUs

i'd suggest some cleanups to this single line output.

The 'ip' printout is very lame an comes from ancient times, from our
first attempts to boot Linux in SMP mode on dual 100 MHz pentia ;-)

Something like:

Booting CPU #1020 ... ok.

Would be more than enough. Any other info can be made a debug boot flag,
dependent on apic=verbose.

Please send a new iteration of the arch/x86 (and scheduler/timer)
patches in a separate series instead of mixed into this thread.

Thanks,

Ingo

2009-11-04 10:31:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages


* Mike Travis <[email protected]> wrote:

>
>
> Ingo Molnar wrote:
>> * Mike Travis <[email protected]> wrote:
>>
>>>
>>> Andi Kleen wrote:
>>>> Mike Travis wrote:
>>>>> This set of patches limits the number of repetitious messages
>>>>> which contain
>>>>> no additional information. Much of this information is
>>>>> obtainable from the
>>>>> /proc and /sysfs. Most of the messages are also sent to the kernel log
>>>>> buffer as KERN_DEBUG messages so it can be used to examine more
>>>>> closely any
>>>>> details specific to a processor.
>>>> What would be good is to put the information from the booting CPUs
>>>> into some buffer and print it visibly if there's a timeout detected
>>>> on the BP.
>>> What do you think of this idea.... Add a "mark kernel log buffer"
>>> function, and then if any KERN_NOTE or above happens, it sends the
>>> marked info from the kernel log buffer to the console before the
>>> current message. Set the marker to '0' to clear.
>>
>> That's _way_ too complex really, for little benefit. (If there's a boot
>> hang people will re-try anyway (and this time with a serial console
>> attached or so), and they can add various boot options to increase
>> verbosity - depending in which phase the bootup hung.)
>
> I'm ok with this, though generally speaking large server systems have
> serial consoles attached, and save the output into admin logs. [...]

Typically yes, but not necessarily during basic system bringup, which is
when most of the hangs/problems are found.

> [...] One problem with just setting the loglevel high enough to
> output debug messages, is you get literally 100's of thousands of
> lines of meaningless information. We waited over 8 hours for a system
> with 2k cpus to boot in debug mode, and it never made it all the way
> up.
>
> My intention for the above was to attempt to print debug information
> that pertains to the failure, and not everything else.

We want a noise-free default bootup, and printks (on the boot cpu) in
case of failures.

_that_ abnormal-event printout can then be sufficiently verbose.

>> So please go with the simple solution i suggested days ago: print
>> stuff on the boot CPU but after that only a single line per AP CPU.
>
> So you think printing 4096 lines provides meaningful additional
> information? I would think at least compress it so you only print
> each new processor socket boots and not the 16 threads each of them
> have?
>
> I should have timing information soon for 512 cores/1024 threads and
> printing a single line for each of those will significantly increase
> the time it takes to boot.

Feel free to compress it further. What i was objecting to was the
increased complexity of 'buffering' messages somehow and printing them
conditionally.

Ingo

2009-11-12 22:54:44

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

On Mon, Nov 02, 2009 at 11:21:39AM -0800, Mike Travis wrote:

> >> [ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
> >> [ 90.985888] CPU: L2 cache: 256K
> >> [ 90.988032] CPU: L3 cache: 24576K
> >
> > I would recommend to drop the cache information; this can be easily
> > gotten at runtime and is often implied in the CPU name anyways
> > (and especially L1 and increasingly L2 too change only very rarely)
>
> Ok, though because of future system upgrades to a UV system, you can
> end up with slightly different processors (of the same family). The
> only differences I've detected so far in testing is the stepping has
> changed.

I happened to be annoyed by dozens of these three printk's earlier,
and hacked up the following (currently untested) patch.

But I don't disagree with Andi either, that it's not particularly useful,
and we can get all this from userspace in /proc/cpuinfo, or x86info.

If someone still finds it valuable to have the kernel keep printing it
though, perhaps something like the following ?

Dave



On processors with a large number of cores, we print dozens of lines of information about
the CPU cache topology, most of which is unnecessary.
This patch reduces spew a lot (down to a single line unless someone uses a mix of processors
with different cache sizes)

- Check if the total cache size on APs is equal to the boot processors cache size.
Print nothing if equal.
- The three printk's will fit on one line.

Signed-off-by: Dave Jones <[email protected]>

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index 804c40e..cc4f44d 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -358,9 +362,9 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_HT
unsigned int cpu = c->cpu_index;
#endif
+ static int is_initialized;

if (c->cpuid_level > 3) {
- static int is_initialized;

if (is_initialized == 0) {
/* Init num_cache_leaves from boot CPU */
@@ -488,6 +492,21 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
#endif
}

+ c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
+
+ /*
+ * cache topology on all AP's is likely equal to that of the BP
+ * if this is the case, don't bother printing anything out for the AP's.
+ */
+ if (is_initialized != 0) {
+ if (c->x86_cache_size == boot_cpu_data.x86_cache_size)
+ return l2;
+ else
+ printk(KERN_INFO "CPU: AP has different cache size (%d) to BP (%d)\n",
+ c->x86_cache_size,
+ boot_cpu_data.x86_cache_size);
+ }
+
if (trace)
printk(KERN_INFO "CPU: Trace cache: %dK uops", trace);
else if (l1i)
@@ -495,16 +512,12 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)

if (l1d)
printk(KERN_CONT ", L1 D cache: %dK\n", l1d);
- else
- printk(KERN_CONT "\n");

if (l2)
- printk(KERN_INFO "CPU: L2 cache: %dK\n", l2);
+ printk(KERN_CONT ", L2 cache: %dK\n", l2);

if (l3)
- printk(KERN_INFO "CPU: L3 cache: %dK\n", l3);
-
- c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
+ printk(KERN_CONT ", L3 cache: %dK\n", l3);

return l2;
}

2009-11-12 22:59:56

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

On 11/12/2009 02:22 PM, Dave Jones wrote:
>
> But I don't disagree with Andi either, that it's not particularly useful,
> and we can get all this from userspace in /proc/cpuinfo, or x86info.
>

I personally don't think it's useful at all. It gives information about
the processor which can be obtained from other sources. What we want is
enough information that the CPU can be unambiguously identified, so that
when someone posts dmesg we can tell what machine they came from.

-hpa

2009-11-12 23:18:51

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

On Thu, Nov 12, 2009 at 02:57:33PM -0800, H. Peter Anvin wrote:
> On 11/12/2009 02:22 PM, Dave Jones wrote:
> >
> > But I don't disagree with Andi either, that it's not particularly useful,
> > and we can get all this from userspace in /proc/cpuinfo, or x86info.
> >
>
> I personally don't think it's useful at all. It gives information about
> the processor which can be obtained from other sources. What we want is
> enough information that the CPU can be unambiguously identified, so that
> when someone posts dmesg we can tell what machine they came from.

In which case..

Dave

---


Remove the CPU cache size printk's.

They aren't useful, and pollute the dmesg output a lot (especially on machines with many cores).
Also the same information can be trivially found out from userspace.

Signed-off-by: Dave Jones <[email protected]>

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index 804c40e..868fcdd 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -488,22 +493,6 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
#endif
}

- if (trace)
- printk(KERN_INFO "CPU: Trace cache: %dK uops", trace);
- else if (l1i)
- printk(KERN_INFO "CPU: L1 I cache: %dK", l1i);
-
- if (l1d)
- printk(KERN_CONT ", L1 D cache: %dK\n", l1d);
- else
- printk(KERN_CONT "\n");
-
- if (l2)
- printk(KERN_INFO "CPU: L2 cache: %dK\n", l2);
-
- if (l3)
- printk(KERN_INFO "CPU: L3 cache: %dK\n", l3);
-
c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));

return l2;

2009-11-13 08:03:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages


* Dave Jones <[email protected]> wrote:

> In which case..
>
> Dave
>
> ---
>
> Remove the CPU cache size printk's.
>
> They aren't useful, and pollute the dmesg output a lot (especially on
> machines with many cores). Also the same information can be trivially
> found out from userspace.
>
> Signed-off-by: Dave Jones <[email protected]>

Precisely - these are the kind of simple patches i'd like to see.

Ingo

2009-11-13 08:12:15

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages

On 11/13/2009 12:03 AM, Ingo Molnar wrote:
>
> * Dave Jones <[email protected]> wrote:
>
>> In which case..
>>
>> Dave
>>
>> ---
>>
>> Remove the CPU cache size printk's.
>>
>> They aren't useful, and pollute the dmesg output a lot (especially on
>> machines with many cores). Also the same information can be trivially
>> found out from userspace.
>>
>> Signed-off-by: Dave Jones <[email protected]>
>
> Precisely - these are the kind of simple patches i'd like to see.
>

Indeed.

Acked-by: H. Peter Anvin <[email protected]>

I'll apply it tomorrow if you don't beat me to it.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-11-13 08:19:37

by Dave Jones

[permalink] [raw]
Subject: [tip:x86/debug] x86: Remove the CPU cache size printk's

Commit-ID: 15cd8812ab2ce62a2f779e93a8398bdad752291a
Gitweb: http://git.kernel.org/tip/15cd8812ab2ce62a2f779e93a8398bdad752291a
Author: Dave Jones <[email protected]>
AuthorDate: Thu, 12 Nov 2009 18:15:43 -0500
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 13 Nov 2009 09:14:55 +0100

x86: Remove the CPU cache size printk's

They aren't really useful, and they pollute the dmesg output a lot
(especially on machines with many cores).

Also the same information can be trivially found out from
userspace.

Reported-by: Mike Travis <[email protected]>
Signed-off-by: Dave Jones <[email protected]>
Acked-by: H. Peter Anvin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Roland Dreier <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Hidetoshi Seto <[email protected]>
Cc: Jack Steiner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/intel_cacheinfo.c | 16 ----------------
1 files changed, 0 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index 804c40e..0df4c2b 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -488,22 +488,6 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
#endif
}

- if (trace)
- printk(KERN_INFO "CPU: Trace cache: %dK uops", trace);
- else if (l1i)
- printk(KERN_INFO "CPU: L1 I cache: %dK", l1i);
-
- if (l1d)
- printk(KERN_CONT ", L1 D cache: %dK\n", l1d);
- else
- printk(KERN_CONT "\n");
-
- if (l2)
- printk(KERN_INFO "CPU: L2 cache: %dK\n", l2);
-
- if (l3)
- printk(KERN_INFO "CPU: L3 cache: %dK\n", l3);
-
c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));

return l2;

2009-11-13 16:10:20

by Mike Travis

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages



H. Peter Anvin wrote:
> On 11/12/2009 02:22 PM, Dave Jones wrote:
>> But I don't disagree with Andi either, that it's not particularly useful,
>> and we can get all this from userspace in /proc/cpuinfo, or x86info.
>>
>
> I personally don't think it's useful at all. It gives information about
> the processor which can be obtained from other sources. What we want is
> enough information that the CPU can be unambiguously identified, so that
> when someone posts dmesg we can tell what machine they came from.
>
> -hpa

Can we say the same thing about the sched debug messages? It's even more
painful because the number of lines output is exponential to the number
of cpus.

Thanks,
Mike

2009-11-13 22:38:36

by Roland Dreier

[permalink] [raw]
Subject: [PATCH] x86: Remove CPU cache size output for non-Intel too

As Dave Jones said about the output in intel_cacheinfo.c: "They aren't
useful, and pollute the dmesg output a lot (especially on machines with
many cores). Also the same information can be trivially found out from
userspace." Give the generic display_cacheinfo() function the same
treatment.

Signed-off-by: Roland Dreier <[email protected]>
---
arch/x86/kernel/cpu/common.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cc25c2b..5f8f420 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -391,8 +391,6 @@ void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)

if (n >= 0x80000005) {
cpuid(0x80000005, &dummy, &ebx, &ecx, &edx);
- printk(KERN_INFO "CPU: L1 I Cache: %dK (%d bytes/line), D cache %dK (%d bytes/line)\n",
- edx>>24, edx&0xFF, ecx>>24, ecx&0xFF);
c->x86_cache_size = (ecx>>24) + (edx>>24);
#ifdef CONFIG_X86_64
/* On K8 L1 TLB is inclusive, so don't count it */
@@ -422,9 +420,6 @@ void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)
#endif

c->x86_cache_size = l2size;
-
- printk(KERN_INFO "CPU: L2 Cache: %dK (%d bytes/line)\n",
- l2size, ecx & 0xFF);
}

void __cpuinit detect_ht(struct cpuinfo_x86 *c)

2009-11-13 22:54:11

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove CPU cache size output for non-Intel too

On Fri, Nov 13, 2009 at 02:38:26PM -0800, Roland Dreier wrote:
> As Dave Jones said about the output in intel_cacheinfo.c: "They aren't
> useful, and pollute the dmesg output a lot (especially on machines with
> many cores). Also the same information can be trivially found out from
> userspace." Give the generic display_cacheinfo() function the same
> treatment.
>
> Signed-off-by: Roland Dreier <[email protected]>

Acked-by: Dave Jones <[email protected]>

2009-11-14 00:54:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Limit the number of processor bootup messages


* Mike Travis <[email protected]> wrote:

> H. Peter Anvin wrote:
> >On 11/12/2009 02:22 PM, Dave Jones wrote:
> >>But I don't disagree with Andi either, that it's not particularly useful,
> >>and we can get all this from userspace in /proc/cpuinfo, or x86info.
> >>
> >
> > I personally don't think it's useful at all. It gives information
> > about the processor which can be obtained from other sources. What
> > we want is enough information that the CPU can be unambiguously
> > identified, so that when someone posts dmesg we can tell what
> > machine they came from.
>
> Can we say the same thing about the sched debug messages? It's even
> more painful because the number of lines output is exponential to the
> number of cpus.

Yeah - but instead of getting rid of them please tie them to a (default
off) debug switch. There's been the occasional crash where they were
useful.

Ingo

2009-11-14 00:56:53

by Roland Dreier

[permalink] [raw]
Subject: [tip:x86/debug] x86: Remove CPU cache size output for non-Intel too

Commit-ID: b01c845f0f2e3f9e54e6a78d5d56895f5b95e818
Gitweb: http://git.kernel.org/tip/b01c845f0f2e3f9e54e6a78d5d56895f5b95e818
Author: Roland Dreier <[email protected]>
AuthorDate: Fri, 13 Nov 2009 14:38:26 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 14 Nov 2009 01:51:18 +0100

x86: Remove CPU cache size output for non-Intel too

As Dave Jones said about the output in intel_cacheinfo.c: "They
aren't useful, and pollute the dmesg output a lot (especially on
machines with many cores). Also the same information can be
trivially found out from userspace."

Give the generic display_cacheinfo() function the same treatment.

Signed-off-by: Roland Dreier <[email protected]>
Acked-by: Dave Jones <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Hidetoshi Seto <[email protected]>
Cc: Jack Steiner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/common.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 617a29f..9db1e24 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -391,8 +391,6 @@ void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)

if (n >= 0x80000005) {
cpuid(0x80000005, &dummy, &ebx, &ecx, &edx);
- printk(KERN_INFO "CPU: L1 I Cache: %dK (%d bytes/line), D cache %dK (%d bytes/line)\n",
- edx>>24, edx&0xFF, ecx>>24, ecx&0xFF);
c->x86_cache_size = (ecx>>24) + (edx>>24);
#ifdef CONFIG_X86_64
/* On K8 L1 TLB is inclusive, so don't count it */
@@ -422,9 +420,6 @@ void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)
#endif

c->x86_cache_size = l2size;
-
- printk(KERN_INFO "CPU: L2 Cache: %dK (%d bytes/line)\n",
- l2size, ecx & 0xFF);
}

void __cpuinit detect_ht(struct cpuinfo_x86 *c)