2009-11-16 21:07:33

by Mike Travis

Subject: [PATCH 2/6] x86: Limit the number of per cpu MCE bootup messages

Limit the number of per cpu MCE messages by using pr_debug.
This prevents filling up the console output with repetitious
messages when the number of cpus is large.

Remove the need for KERN_CONT so it does not add an extraneous
newline in the booting cpu sequence.

Signed-off-by: Mike Travis <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 ++--
arch/x86/kernel/cpu/mcheck/mce_intel.c | 20 ++++++++++++--------
2 files changed, 14 insertions(+), 10 deletions(-)

--- linux.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1215,10 +1215,10 @@

b = cap & MCG_BANKCNT_MASK;
if (!banks)
- printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+ pr_debug("mce: CPU supports %d MCE banks\n", b);

if (b > MAX_NR_BANKS) {
- printk(KERN_WARNING
+ pr_warning(
"MCE: Using only %u machine check banks out of %u\n",
MAX_NR_BANKS, b);
b = MAX_NR_BANKS;
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -64,12 +64,15 @@
mce_notify_irq();
}

-static void print_update(char *type, int *hdr, int num)
+static void print_update(char *type, int *hdr, int num, char *buf, int len)
{
- if (*hdr == 0)
- printk(KERN_INFO "CPU %d MCA banks", smp_processor_id());
- *hdr = 1;
- printk(KERN_CONT " %s:%d", type, num);
+ int n = *hdr;
+
+ if (n == 0)
+ n = snprintf(buf, len, "CPU %d MCA banks", smp_processor_id());
+
+ n += snprintf(&buf[n], len - n, " %s:%d", type, num);
+ *hdr = n;
}

/*
@@ -83,6 +86,7 @@
unsigned long flags;
int hdr = 0;
int i;
+ char buf[120];

spin_lock_irqsave(&cmci_discover_lock, flags);
for (i = 0; i < banks; i++) {
@@ -96,7 +100,7 @@
/* Already owned by someone else? */
if (val & CMCI_EN) {
if (test_and_clear_bit(i, owned) || boot)
- print_update("SHD", &hdr, i);
+ print_update("SHD", &hdr, i, buf, sizeof(buf));
__clear_bit(i, __get_cpu_var(mce_poll_banks));
continue;
}
@@ -108,7 +112,7 @@
/* Did the enable bit stick? -- the bank supports CMCI */
if (val & CMCI_EN) {
if (!test_and_set_bit(i, owned) || boot)
- print_update("CMCI", &hdr, i);
+ print_update("CMCI", &hdr, i, buf, sizeof(buf));
__clear_bit(i, __get_cpu_var(mce_poll_banks));
} else {
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
@@ -116,7 +120,7 @@
}
spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (hdr)
- printk(KERN_CONT "\n");
+ pr_debug("%s\n", buf);
}

/*

--


2009-11-16 21:23:43

by Ingo Molnar

Subject: Re: [PATCH 2/6] x86: Limit the number of per cpu MCE bootup messages


* Mike Travis <[email protected]> wrote:

> @@ -83,6 +86,7 @@
> unsigned long flags;
> int hdr = 0;
> int i;
> + char buf[120];

that constant is not particularly nice, is it?

Ingo

2009-11-16 21:35:26

by Mike Travis

Subject: Re: [PATCH 2/6] x86: Limit the number of per cpu MCE bootup messages



Ingo Molnar wrote:
> * Mike Travis <[email protected]> wrote:
>
>> @@ -83,6 +86,7 @@
>> unsigned long flags;
>> int hdr = 0;
>> int i;
>> + char buf[120];
>
> that constant is not particularly nice, is it?
>
> Ingo

I'm up for suggestions. I just noticed that during testing, the
MCE Banks messages overflowed 80 chars but I didn't actually
check to see what the longest might be.

Should I trim it to 80? Or use a different constant?

Thanks,
Mike

2009-11-17 07:11:16

by Hidetoshi Seto

Subject: Re: [PATCH 2/6] x86: Limit the number of per cpu MCE bootup messages

Mike Travis wrote:
>
>
> Ingo Molnar wrote:
>> * Mike Travis <[email protected]> wrote:
>>
>>> @@ -83,6 +86,7 @@
>>> unsigned long flags;
>>> int hdr = 0;
>>> int i;
>>> + char buf[120];
>>
>> that constant is not particularly nice, is it?
>>
>> Ingo
>
> I'm up for suggestions. I just noticed that during testing, the
> MCE Banks messages overflowed 80 chars but I didn't actually
> check to see what the longest might be.
>
> Should I trim it to 80? Or use a different constant?

I think you could calculate the size using MAX_NR_BANKS.

But I'd like to change the format to a shorter one at the same time,
so how about the following?


Thanks,
H.Seto

===

[PATCH] x86, mce: rework output of MCE banks ownership information

The output of MCE banks ownership information on boot tends
to be long on new processors that have many banks:

CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

This message can fill up the console output when the number
of cpus is large.

This patch suppresses this info message on boot and introduces
a debug message in a shorter format instead, like:

CPU 1 MCE banks map: ssCC PCss ssPP ssss ssss ss

where: s: shared, C: checked by cmci, P: checked by poll.

This patch still keeps the info message when ownership is updated.
E.g. if a cpu takes over the ownership from a hot-removed cpu,
both messages will be shown:

CPU 1 MCE banks map updated: CMCI:6 CMCI:7 CMCI:10 CMCI:11
CPU 1 MCE banks map: ssCC PCCC ssPP ssCC ssss ss

v2:
- stop changing the level of message on update
- change the number of banks message on boot to debug level

Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 29 +++++++++++++++++++++++------
2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5f277ca..8627976 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1229,11 +1229,11 @@ static int __cpuinit __mcheck_cpu_cap_init(void)

b = cap & MCG_BANKCNT_MASK;
if (!banks)
- printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+ pr_debug("mce: CPU supports %d MCE banks\n", b);

if (b > MAX_NR_BANKS) {
- printk(KERN_WARNING
- "MCE: Using only %u machine check banks out of %u\n",
+ pr_warning(
+ "MCE: Using only %u machine check banks out of %u\n",
MAX_NR_BANKS, b);
b = MAX_NR_BANKS;
}
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 7c78563..448a38b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -64,12 +64,25 @@ static void intel_threshold_interrupt(void)
mce_notify_irq();
}

+static void print_banks_map(int banks)
+{
+ int i;
+
+ pr_debug("CPU %d MCE banks map:", smp_processor_id());
+ for (i = 0; i < banks; i++) {
+ pr_cont("%s%s", (i % 4) ? "" : " ",
+ test_bit(i, __get_cpu_var(mce_banks_owned)) ? "C" :
+ test_bit(i, __get_cpu_var(mce_poll_banks)) ? "P" : "s");
+ }
+ pr_cont("\n");
+}
+
static void print_update(char *type, int *hdr, int num)
{
if (*hdr == 0)
- printk(KERN_INFO "CPU %d MCA banks", smp_processor_id());
+ pr_info("CPU %d MCE banks map updated:", smp_processor_id());
*hdr = 1;
- printk(KERN_CONT " %s:%d", type, num);
+ pr_cont(" %s:%d", type, num);
}

/*
@@ -85,6 +98,7 @@ static void cmci_discover(int banks, int boot)
int i;

spin_lock_irqsave(&cmci_discover_lock, flags);
+
for (i = 0; i < banks; i++) {
u64 val;

@@ -95,7 +109,7 @@ static void cmci_discover(int banks, int boot)

/* Already owned by someone else? */
if (val & CMCI_EN) {
- if (test_and_clear_bit(i, owned) || boot)
+ if (test_and_clear_bit(i, owned) && !boot)
print_update("SHD", &hdr, i);
__clear_bit(i, __get_cpu_var(mce_poll_banks));
continue;
@@ -107,16 +121,19 @@ static void cmci_discover(int banks, int boot)

/* Did the enable bit stick? -- the bank supports CMCI */
if (val & CMCI_EN) {
- if (!test_and_set_bit(i, owned) || boot)
+ if (!test_and_set_bit(i, owned) && !boot)
print_update("CMCI", &hdr, i);
__clear_bit(i, __get_cpu_var(mce_poll_banks));
} else {
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (hdr)
- printk(KERN_CONT "\n");
+ pr_cont("\n");
+ if (hdr || boot)
+ print_banks_map(banks);
+
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}

/*
--
1.6.5.2
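
The compact map format described in the patch above can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: format_banks_map(), owned and polled are hypothetical names, and plain integer masks stand in for the per-CPU bitmaps that the patch reads with test_bit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "CPU <n> MCE banks map: ssCC PCss ..." in buf.
 * 'C' = bank owned via CMCI, 'P' = bank polled, 's' = shared.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_banks_map(char *buf, size_t len, int cpu, int banks,
			    unsigned long long owned,
			    unsigned long long polled)
{
	size_t n;
	int i, ret;

	assert(buf != NULL && len > 0);
	assert(banks >= 0 && banks <= 64);

	ret = snprintf(buf, len, "CPU %d MCE banks map:", cpu);
	if (ret < 0 || (size_t)ret >= len)
		return -1;	/* the header alone does not fit */
	n = (size_t)ret;

	for (i = 0; i < banks; i++) {
		/* One space before every group of four banks. */
		if (i % 4 == 0) {
			if (n + 1 >= len)
				return -1;
			buf[n++] = ' ';
		}
		if (n + 1 >= len)
			return -1;
		buf[n++] = ((owned >> i) & 1) ? 'C' :
			   ((polled >> i) & 1) ? 'P' : 's';
	}
	buf[n] = '\0';
	return (int)n;
}
```

With 22 banks, owned bits 2, 3 and 5 set, and polled bits 4, 10 and 11 set, this reproduces the first example line from the patch description.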

2009-11-17 17:16:04

by Mike Travis

Subject: Re: [PATCH 2/6] x86: Limit the number of per cpu MCE bootup messages



Hidetoshi Seto wrote:
> Mike Travis wrote:
>>
>> Ingo Molnar wrote:
>>> * Mike Travis <[email protected]> wrote:
>>>
>>>> @@ -83,6 +86,7 @@
>>>> unsigned long flags;
>>>> int hdr = 0;
>>>> int i;
>>>> + char buf[120];
>>> that constant is not particularly nice, is it?
>>>
>>> Ingo
>> I'm up for suggestions. I just noticed that during testing, the
>> MCE Banks messages overflowed 80 chars but I didn't actually
>> check to see what the longest might be.
>>
>> Should I trim it to 80? Or use a different constant?
>
> I think you could calculate the size using MAX_NR_BANKS.
>
> But I'd like to change the format to a shorter one at the same time,
> so how about the following?
>
>
> Thanks,
> H.Seto
>
> ===
>
> [PATCH] x86, mce: rework output of MCE banks ownership information
>
> The output of MCE banks ownership information on boot tends
> to be long on new processors that have many banks:
>
> CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
>
> This message can fill up the console output when the number
> of cpus is large.
>
> This patch suppresses this info message on boot and introduces
> a debug message in a shorter format instead, like:
>
> CPU 1 MCE banks map: ssCC PCss ssPP ssss ssss ss
>
> where: s: shared, C: checked by cmci, P: checked by poll.
>
> This patch still keeps the info message when ownership is updated.
> E.g. if a cpu takes over the ownership from a hot-removed cpu,
> both messages will be shown:
>
> CPU 1 MCE banks map updated: CMCI:6 CMCI:7 CMCI:10 CMCI:11
> CPU 1 MCE banks map: ssCC PCCC ssPP ssCC ssss ss
>
> v2:
> - stop changing the level of message on update
> - change the number of banks message on boot to debug level
>
> Signed-off-by: Hidetoshi Seto <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
> arch/x86/kernel/cpu/mcheck/mce_intel.c | 29 +++++++++++++++++++++++------
> 2 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 5f277ca..8627976 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1229,11 +1229,11 @@ static int __cpuinit __mcheck_cpu_cap_init(void)
>
> b = cap & MCG_BANKCNT_MASK;
> if (!banks)
> - printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
> + pr_debug("mce: CPU supports %d MCE banks\n", b);
>
> if (b > MAX_NR_BANKS) {
> - printk(KERN_WARNING
> - "MCE: Using only %u machine check banks out of %u\n",
> + pr_warning(
> + "MCE: Using only %u machine check banks out of %u\n",
> MAX_NR_BANKS, b);
> b = MAX_NR_BANKS;
> }
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index 7c78563..448a38b 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -64,12 +64,25 @@ static void intel_threshold_interrupt(void)
> mce_notify_irq();
> }
>
> +static void print_banks_map(int banks)
> +{
> + int i;
> +
> + pr_debug("CPU %d MCE banks map:", smp_processor_id());
> + for (i = 0; i < banks; i++) {
> + pr_cont("%s%s", (i % 4) ? "" : " ",
> + test_bit(i, __get_cpu_var(mce_banks_owned)) ? "C" :
> + test_bit(i, __get_cpu_var(mce_poll_banks)) ? "P" : "s");
> + }
> + pr_cont("\n");

The problem here is that if pr_debug is not in effect, then the pr_cont("\n")
outputs a newline in the middle of other messages. I had introduced a new
macro (pr_debug_cont) but it was just as easy to buffer the message and then
print the entire thing with pr_debug().

I think pr_cont() should be reserved for when you actually want a partial
line printed before the end of the line, and not a means to stitch together
a complete line, but I may be in the minority.

This might also be another candidate for printing the needed information via
/sys or /proc ... in which case as Ingo has pointed out, the debug bootup
messages should be simple even to the point of voluminous, and the reports
compacted.

> +}
> +
> static void print_update(char *type, int *hdr, int num)
> {
> if (*hdr == 0)
> - printk(KERN_INFO "CPU %d MCA banks", smp_processor_id());
> + pr_info("CPU %d MCE banks map updated:", smp_processor_id());
> *hdr = 1;
> - printk(KERN_CONT " %s:%d", type, num);
> + pr_cont(" %s:%d", type, num);
> }
>
> /*
> @@ -85,6 +98,7 @@ static void cmci_discover(int banks, int boot)
> int i;
>
> spin_lock_irqsave(&cmci_discover_lock, flags);
> +
> for (i = 0; i < banks; i++) {
> u64 val;
>
> @@ -95,7 +109,7 @@ static void cmci_discover(int banks, int boot)
>
> /* Already owned by someone else? */
> if (val & CMCI_EN) {
> - if (test_and_clear_bit(i, owned) || boot)
> + if (test_and_clear_bit(i, owned) && !boot)
> print_update("SHD", &hdr, i);
> __clear_bit(i, __get_cpu_var(mce_poll_banks));
> continue;
> @@ -107,16 +121,19 @@ static void cmci_discover(int banks, int boot)
>
> /* Did the enable bit stick? -- the bank supports CMCI */
> if (val & CMCI_EN) {
> - if (!test_and_set_bit(i, owned) || boot)
> + if (!test_and_set_bit(i, owned) && !boot)
> print_update("CMCI", &hdr, i);
> __clear_bit(i, __get_cpu_var(mce_poll_banks));
> } else {
> WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
> }
> }
> - spin_unlock_irqrestore(&cmci_discover_lock, flags);
> if (hdr)
> - printk(KERN_CONT "\n");
> + pr_cont("\n");

Here again an extraneous newline will be printed in the middle of other messages.

> + if (hdr || boot)
> + print_banks_map(banks);
> +
> + spin_unlock_irqrestore(&cmci_discover_lock, flags);
> }
>
> /*
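
The buffering approach argued for in this reply (assemble the complete line first, then print it with a single call) can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: buf_append() and format_update() are hypothetical names. The append helper clamps the write position so it never moves past the terminating NUL, similar in spirit to the kernel's scnprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at offset *pos.
 * Returns 0 on success, -1 if the text had to be truncated.
 */
static int buf_append(char *buf, size_t len, size_t *pos,
		      const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(len > 0 && *pos < len);

	va_start(ap, fmt);
	ret = vsnprintf(buf + *pos, len - *pos, fmt, ap);
	va_end(ap);

	if (ret < 0)
		return -1;
	if ((size_t)ret >= len - *pos) {
		*pos = len - 1;		/* output was cut short */
		return -1;
	}
	*pos += (size_t)ret;
	return 0;
}

/*
 * Build the whole "updated" line in buf so that the caller can emit it
 * with one print call. Returns the length, or -1 on truncation.
 */
static int format_update(char *buf, size_t len, int cpu,
			 const char *type, const int *banks, int count)
{
	size_t pos = 0;
	int i, truncated;

	truncated = buf_append(buf, len, &pos,
			       "CPU %d MCE banks map updated:", cpu);
	for (i = 0; i < count; i++)
		truncated |= buf_append(buf, len, &pos, " %s:%d",
					type, banks[i]);
	return truncated ? -1 : (int)pos;
}
```

Because the line is complete before anything is printed, there is no separate trailing newline that could appear on its own when the debug output is compiled out or disabled.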

2009-11-17 18:40:32

by Mike Travis

Subject: [PATCH] x86, mce: rework output of MCE banks ownership information

Author: Hidetoshi Seto <[email protected]>

The output of MCE banks ownership information on boot tends
to be long on new processors that have many banks:

CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

This message can fill up the console output when the number
of cpus is large.

This patch suppresses this info message on boot and introduces
a debug message in a shorter format instead, like:

CPU 1 MCE banks map: ssCC PCss ssPP ssss ssss ss

where: s: shared, C: checked by cmci, P: checked by poll.

This patch still keeps the info message when ownership is updated.
E.g. if a cpu takes over the ownership from a hot-removed cpu,
both messages will be shown:

CPU 1 MCE banks map updated: CMCI:6 CMCI:7 CMCI:10 CMCI:11
CPU 1 MCE banks map: ssCC PCCC ssPP ssCC ssss ss

v2:
- stop changing the level of message on update
- change the number of banks message on boot to debug level

Signed-off-by: Hidetoshi Seto <[email protected]>

- Modified to not use pr_cont().

Signed-off-by: Mike Travis <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +--
arch/x86/kernel/cpu/mcheck/mce_intel.c | 63 +++++++++++++++++++++++++++------
2 files changed, 55 insertions(+), 14 deletions(-)

--- linux.orig/arch/x86/kernel/cpu/mcheck/mce.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1215,11 +1215,11 @@

b = cap & MCG_BANKCNT_MASK;
if (!banks)
- printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+ pr_debug("mce: CPU supports %d MCE banks\n", b);

if (b > MAX_NR_BANKS) {
- printk(KERN_WARNING
- "MCE: Using only %u machine check banks out of %u\n",
+ pr_warning(
+ "MCE: Using only %u machine check banks out of %u\n",
MAX_NR_BANKS, b);
b = MAX_NR_BANKS;
}
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -64,14 +64,50 @@
mce_notify_irq();
}

-static void print_update(char *type, int *hdr, int num)
+#define MCE_MSG_LEN 120
+
+#ifdef DEBUG_KERNEL
+static void print_banks_map(int banks, char *buf)
+{
+ int i, n;
+
+ n = snprintf(buf, MCE_MSG_LEN, "CPU %d MCE banks map:",
+ smp_processor_id());
+ for (i = 0; i < banks; i++) {
+ n += snprintf(&buf[n], MCE_MSG_LEN - n,
+ "%s%s", (i % 4) ? "" : " ",
+ test_bit(i, __get_cpu_var(mce_banks_owned)) ? "C" :
+ test_bit(i, __get_cpu_var(mce_poll_banks)) ? "P" : "s");
+ }
+
+ /* (indicate if message buffer overflowed) */
+ pr_debug("%s%s\n", buf, n < MCE_MSG_LEN ? "" : "..." );
+}
+
+static void print_update(char *type, int *hdr, int num, char *buf)
+{
+ int n = *hdr;
+
+ if (n == 0)
+ n = snprintf(buf, MCE_MSG_LEN,
+ "CPU %d MCE banks map updated:", smp_processor_id());
+
+ n += snprintf(&buf[n], MCE_MSG_LEN - n, " %s:%d", type, num);
+ *hdr = n;
+}
+
+#else /* !DEBUG_KERNEL */
+
+static inline void print_banks_map(int banks, char *buf)
+{
+}
+
+static inline void print_update(char *type, int *hdr, int num, char *buf)
{
- if (*hdr == 0)
- printk(KERN_INFO "CPU %d MCA banks", smp_processor_id());
- *hdr = 1;
- printk(KERN_CONT " %s:%d", type, num);
}

+#endif
+
/*
* Enable CMCI (Corrected Machine Check Interrupt) for available MCE banks
* on this CPU. Use the algorithm recommended in the SDM to discover shared
@@ -83,8 +119,10 @@
unsigned long flags;
int hdr = 0;
int i;
+ char buf[MCE_MSG_LEN];

spin_lock_irqsave(&cmci_discover_lock, flags);
+
for (i = 0; i < banks; i++) {
u64 val;

@@ -95,8 +133,8 @@

/* Already owned by someone else? */
if (val & CMCI_EN) {
- if (test_and_clear_bit(i, owned) || boot)
- print_update("SHD", &hdr, i);
+ if (test_and_clear_bit(i, owned) && !boot)
+ print_update("SHD", &hdr, i, buf);
__clear_bit(i, __get_cpu_var(mce_poll_banks));
continue;
}
@@ -107,16 +145,19 @@

/* Did the enable bit stick? -- the bank supports CMCI */
if (val & CMCI_EN) {
- if (!test_and_set_bit(i, owned) || boot)
- print_update("CMCI", &hdr, i);
+ if (!test_and_set_bit(i, owned) && !boot)
+ print_update("CMCI", &hdr, i, buf);
__clear_bit(i, __get_cpu_var(mce_poll_banks));
} else {
WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
}
}
- spin_unlock_irqrestore(&cmci_discover_lock, flags);
if (hdr)
- printk(KERN_CONT "\n");
+ pr_debug("%s%s\n", buf, hdr < MCE_MSG_LEN ? "" : "...");
+ if (hdr || boot)
+ print_banks_map(banks, buf);
+
+ spin_unlock_irqrestore(&cmci_discover_lock, flags);
}

/*
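
The fixed MCE_MSG_LEN of 120 in this patch is an estimate. As suggested earlier in the thread, the size of the map line could instead be derived from MAX_NR_BANKS. The sketch below shows one way to compute such a bound. The value 32 for MAX_NR_BANKS and the CPU_DIGITS_MAX limit are assumptions made for illustration, and map_line_len() and MAP_MSG_LEN are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration; the real constant is defined by the kernel. */
#define MAX_NR_BANKS	32

/* Hypothetical upper bound on the decimal digits of a CPU number. */
#define CPU_DIGITS_MAX	10

/*
 * Length (without NUL) of "CPU <n> MCE banks map:" followed by one
 * character per bank and one space per started group of four banks.
 */
static size_t map_line_len(size_t cpu_digits, size_t banks)
{
	assert(banks <= MAX_NR_BANKS);

	return (sizeof("CPU ") - 1) + cpu_digits +
	       (sizeof(" MCE banks map:") - 1) +
	       banks + (banks + 3) / 4;
}

/* Worst-case buffer size as a compile-time constant, including the NUL. */
#define MAP_MSG_LEN						\
	((sizeof("CPU ") - 1) + CPU_DIGITS_MAX +		\
	 (sizeof(" MCE banks map:") - 1) +			\
	 MAX_NR_BANKS + (MAX_NR_BANKS + 3) / 4 + 1)
```

A caller could then declare `char buf[MAP_MSG_LEN];`. This bound covers only the compact map line; the longer "updated" line with " SHD:n" or " CMCI:n" entries would need its own worst-case calculation.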

2009-12-14 21:46:49

by Mike Travis

Subject: Re: [PATCH] x86, mce: rework output of MCE banks ownership information

Hi Ingo,

When running the latest kernel, I still find these in the output:


[ 0.722553] Booting Node 0, Processors #1
[ 0.811625] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
[ 0.812071] #2
[ 0.907468] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
[ 0.907918] #3
[ 1.003311] CPU 3 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
[ 1.003750] #4

Was there anything else needed for this patch to be accepted?
If it's not acceptable, would simply printing the above as DEBUG
messages be acceptable? (I'm aware you don't like printing
summaries during init.)

Thanks,
Mike

2009-12-15 01:51:30

by Hidetoshi Seto

Subject: Re: [PATCH] x86, mce: rework output of MCE banks ownership information

(2009/12/15 6:46), Mike Travis wrote:
> Hi Ingo,
>
> When running the latest kernel, I still find these in the output:
>
>
> [ 0.722553] Booting Node 0, Processors #1
> [ 0.811625] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6
> SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18
> SHD:19 SHD:20 SHD:21
> [ 0.812071] #2
> [ 0.907468] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6
> SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18
> SHD:19 SHD:20 SHD:21
> [ 0.907918] #3
> [ 1.003311] CPU 3 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6
> SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18
> SHD:19 SHD:20 SHD:21
> [ 1.003750] #4
>
> Was there anything else needed for this patch to be accepted? If it's
> not acceptable, would simply printing the above as DEBUG
> messages be acceptable? (I'm aware you don't like printing
> summaries during init.)
>
> Thanks,
> Mike

I have an updated patch, so let's continue on the new thread if needed.


Thanks,
H.Seto