2009-10-15 21:21:55

by Roland Dreier

Subject: [PATCH] x86: Don't print number of MCE banks for every CPU

The MCE initialization code explicitly says it doesn't handle asymmetric
configurations where different CPUs support different numbers of MCE
banks, and it prints a big warning in that case. Therefore, printing
the "mce: CPU supports <x> MCE banks" message into the kernel log for
every CPU is pure redundancy that clutters the log significantly for
systems with lots of CPUs.

Signed-off-by: Roland Dreier <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1598a9..721a77c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
-	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+	if (!banks)
+		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
 
 	if (b > MAX_NR_BANKS) {
 		printk(KERN_WARNING
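The effect of the "if (!banks)" test can be checked with a small userspace model. This is a sketch, not the kernel code: model_cap_init() and count_lines() are hypothetical names, the MAX_NR_BANKS clamp and the asymmetry warning are left out, and it assumes that, as in the surrounding mce.c, the global banks is assigned from b after this check. Under that assumption a CPU that reports zero banks never moves banks off 0, so the message can still appear once per CPU.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the global "banks" in mce.c.  Assumed to be set from b
 * after the check, so it stays 0 until a CPU reports a nonzero count. */
static unsigned int banks;
static int lines_printed;	/* how many "CPU supports" lines were logged */

/* Hypothetical model of the patched check; b plays the role of
 * MCG_CAP & MCG_BANKCNT_MASK for the CPU being initialized.
 * The MAX_NR_BANKS clamp and the asymmetry WARN_ON are omitted. */
static void model_cap_init(unsigned int b)
{
	if (!banks) {
		printf("mce: CPU supports %u MCE banks\n", b);
		lines_printed++;
	}
	banks = b;
}

/* Boot ncpus identical CPUs that each report b banks and return the
 * number of log lines the patched check produces. */
static int count_lines(int ncpus, unsigned int b)
{
	assert(ncpus >= 0);
	banks = 0;
	lines_printed = 0;
	for (int cpu = 0; cpu < ncpus; cpu++)
		model_cap_init(b);
	return lines_printed;
}
```

With 384 identical CPUs reporting 22 banks, count_lines(384, 22) returns 1; with 0 banks, count_lines(384, 0) returns 384, one line per CPU.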


2009-10-16 07:21:40

by Ingo Molnar

Subject: Re: [PATCH] x86: Don't print number of MCE banks for every CPU


* Roland Dreier <[email protected]> wrote:

> The MCE initialization code explicitly says it doesn't handle asymmetric
> configurations where different CPUs support different numbers of MCE
> banks, and it prints a big warning in that case. Therefore, printing
> the "mce: CPU supports <x> MCE banks" message into the kernel log for
> every CPU is pure redundancy that clutters the log significantly for
> systems with lots of CPUs.
>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)

Applied, thanks Roland!

Ingo

2009-10-16 07:24:24

by Roland Dreier

Subject: [tip:x86/urgent] x86: Don't print number of MCE banks for every CPU

Commit-ID: 93ae5012a79b11e7fc855b52c7ce1e16fe1540b0
Gitweb: http://git.kernel.org/tip/93ae5012a79b11e7fc855b52c7ce1e16fe1540b0
Author: Roland Dreier <[email protected]>
AuthorDate: Thu, 15 Oct 2009 14:21:14 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 16 Oct 2009 09:20:03 +0200

x86: Don't print number of MCE banks for every CPU

The MCE initialization code explicitly says it doesn't handle
asymmetric configurations where different CPUs support different
numbers of MCE banks, and it prints a big warning in that case.

Therefore, printing the "mce: CPU supports <x> MCE banks"
message into the kernel log for every CPU is pure redundancy
that clutters the log significantly for systems with lots of
CPUs.

Signed-off-by: Roland Dreier <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1598a9..721a77c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
-	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+	if (!banks)
+		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
 
 	if (b > MAX_NR_BANKS) {
 		printk(KERN_WARNING

2009-10-27 19:42:10

by Mike Travis

Subject: Re: [PATCH] x86: Don't print number of MCE banks for every CPU

Hi Roland,

I've found that I'm getting one of these lines for every cpu:

mce: CPU supports 0 MCE banks

Regards,
Mike

Roland Dreier wrote:
> The MCE initialization code explicitly says it doesn't handle asymmetric
> configurations where different CPUs support different numbers of MCE
> banks, and it prints a big warning in that case. Therefore, printing
> the "mce: CPU supports <x> MCE banks" message into the kernel log for
> every CPU is pure redundancy that clutters the log significantly for
> systems with lots of CPUs.
>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index b1598a9..721a77c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
> rdmsrl(MSR_IA32_MCG_CAP, cap);
>
> b = cap & MCG_BANKCNT_MASK;
> - printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
> + if (!banks)
> + printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>
> if (b > MAX_NR_BANKS) {
> printk(KERN_WARNING
>

2009-10-27 20:52:50

by Mike Travis

Subject: Re: [PATCH] x86: Don't print number of MCE banks for every CPU


Mike Travis wrote:
> Hi Roland,
>
> I've found that I'm getting one of these lines for every cpu:
>
> mce: CPU supports 0 MCE banks
>

A bit more info. The data above was from our simulator, which
apparently does not simulate MCE very well. On a live system
I get 383 lines (one for each of the 383 additional CPUs), and
they appear to be redundant:

[ 4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
[ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
...
[ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21


> Regards,
> Mike
>
> Roland Dreier wrote:
>> The MCE initialization code explicitly says it doesn't handle asymmetric
>> configurations where different CPUs support different numbers of MCE
>> banks, and it prints a big warning in that case. Therefore, printing
>> the "mce: CPU supports <x> MCE banks" message into the kernel log for
>> every CPU is pure redundancy that clutters the log significantly for
>> systems with lots of CPUs.
>>
>> Signed-off-by: Roland Dreier <[email protected]>
>> ---
>> arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index b1598a9..721a77c 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
>> rdmsrl(MSR_IA32_MCG_CAP, cap);
>>
>> b = cap & MCG_BANKCNT_MASK;
>> - printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>> + if (!banks)
>> + printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>>
>> if (b > MAX_NR_BANKS) {
>> printk(KERN_WARNING
>>

2009-10-28 04:08:34

by Hidetoshi Seto

Subject: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

Mike Travis wrote:
>
> Mike Travis wrote:
>> Hi Roland,
>>
>> I've found that I'm getting one of these lines for every cpu:
>>
>> mce: CPU supports 0 MCE banks

I believe the patch at the end of this mail will solve this issue.

> A bit more info. The data above was from our simulator, which
> apparently does not simulate MCE very well. On a live system
> I get 383 lines (one for each of the 383 additional CPUs), and
> they appear to be redundant:
>
> [ 4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
> [ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
> ...
> [ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

Hum, I suppose the line for CPU 0 was slightly different from the others,
because SHD means "this bank is a shared bank controlled by another CPU".
Maybe:
CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21

But I agree that we could do some work on these messages...
Would it be better to change the message level from info to debug?
How about changing the format to something like:
CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
:

If there are no complaints, I'll make another patch to do so.
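The bank-map format proposed above could be rendered with a helper like the following sketch. The enum, format_bank_map() and the letter mapping (C = CMCI, P = polled, s = shared bank owned by another CPU) are assumptions read off the two example lines, not existing mce.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-bank states; letters follow the proposal above:
 * C = CMCI-driven, P = polled, s = shared bank owned by another CPU. */
enum bank_state { BANK_CMCI, BANK_POLL, BANK_SHARED };

/* Render one letter per bank, in groups of four separated by spaces.
 * out must hold at least nbanks + nbanks / 4 + 1 bytes. */
static void format_bank_map(const enum bank_state *states, size_t nbanks,
			    char *out)
{
	static const char letter[] = { 'C', 'P', 's' };
	size_t pos = 0;

	for (size_t i = 0; i < nbanks; i++) {
		assert(states[i] <= BANK_SHARED);
		if (i > 0 && i % 4 == 0)
			out[pos++] = ' ';	/* space between groups of four */
		out[pos++] = letter[states[i]];
	}
	out[pos] = '\0';
}
```

Feeding it CPU 0's layout from the example (22 banks, banks 4, 10 and 11 polled, the rest CMCI) yields "CCCC PCCC CCPP CCCC CCCC CC".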


Thanks,
H.Seto

===

Subject: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

If the CPU has no MCE banks (e.g. a simulated processor on a VM), it is
better to disable MCE support on the system, since we cannot handle MCEs well.

Reported-by: Mike Travis <[email protected]>
Signed-off-by: Hidetoshi Seto <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8080170..29055ab 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1228,6 +1228,10 @@ static int __cpuinit __mcheck_cpu_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
+	if (!b) {
+		pr_info("MCE: no MCE banks - not enabling MCE support.\n");
+		return -ENODEV;
+	}
 	if (!banks)
 		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);

--
1.6.5.2
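The effect of this patch on the boot log can be modelled in userspace. model_cap_init(), model_mcheck_init() and boot_all() are hypothetical, and the sketch assumes that a negative return from the cap-init step makes the caller disable MCE for all later CPUs, as mcheck_init() appears to do through its mce_disabled flag. If that assumption did not hold, the new "no MCE banks" line would itself repeat on every CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

static unsigned int banks;	/* stand-in for the global in mce.c */
static int mce_disabled;	/* stand-in for the global "MCE is off" flag */
static int lines;		/* log lines produced, for checking */

/* Hypothetical model of the patched __mcheck_cpu_cap_init(). */
static int model_cap_init(unsigned int b)
{
	if (!b) {
		printf("MCE: no MCE banks - not enabling MCE support.\n");
		lines++;
		return -ENODEV;
	}
	if (!banks) {
		printf("mce: CPU supports %u MCE banks\n", b);
		lines++;
	}
	banks = b;
	return 0;
}

/* Hypothetical model of the per-CPU caller: assumed to skip all further
 * MCE setup once a negative return has disabled MCE. */
static void model_mcheck_init(unsigned int b)
{
	if (mce_disabled)
		return;
	if (model_cap_init(b) < 0)
		mce_disabled = 1;
}

/* Boot ncpus identical CPUs reporting b banks; return the line count. */
static int boot_all(int ncpus, unsigned int b)
{
	assert(ncpus >= 0);
	banks = 0;
	mce_disabled = 0;
	lines = 0;
	for (int cpu = 0; cpu < ncpus; cpu++)
		model_mcheck_init(b);
	return lines;
}
```

boot_all(384, 0) logs a single line and leaves mce_disabled set; boot_all(384, 22) also logs a single line with MCE left enabled, so under these assumptions a real box with banks keeps the one-line behavior.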

2009-10-28 04:26:16

by Roland Dreier

Subject: Re: [PATCH] x86: Don't print number of MCE banks for every CPU


> [ 4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

Yes, we should probably kill that debug output as well, that was on my
list of things to do.

- R.

2009-10-28 05:24:48

by Andi Kleen

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

Hidetoshi Seto wrote:
> Mike Travis wrote:
>> Mike Travis wrote:
>>> Hi Roland,
>>>
>>> I've found that I'm getting one of these lines for every cpu:
>>>
>>> mce: CPU supports 0 MCE banks

I think that message can just be removed. I don't see much value in it,
because the value is in sysfs, and when you see the CPU type you can easily
determine it anyway.

I don't think the patch below really solves the problem, because they
would get the same noise problem back once they switch from the simulator
to a real box that has banks.

> Hum, I suppose the line for CPU 0 was slightly different from the others,
> because SHD means "this bank is a shared bank controlled by another CPU".
> Maybe:
> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>
> But I agree that we could do some work on these messages...
> Would it be better to change the message level from info to debug?

They can be made INFO, yes, but I would prefer not to remove them
from dmesg for now.

Perhaps they could also be compressed a bit, like SRAT.

-Andi

2009-10-28 06:26:54

by Hidetoshi Seto

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Mike Travis wrote:
>>> Mike Travis wrote:
>>>> Hi Roland,
>>>>
>>>> I've found that I'm getting one of these lines for every cpu:
>>>>
>>>> mce: CPU supports 0 MCE banks
>
> I think that message can just be removed. I don't see much value in it,
> because the value is in sysfs, and when you see the CPU type you can easily
> determine it anyway.
>
> I don't think the patch below really solves the problem, because they
> would get the same noise problem back once they switch from the simulator
> to a real box that has banks.

If the box has more than 0 banks, then the line above will appear only
once, for CPU 0. Only on the simulator, with an MCE-capable processor that has
no banks, does this message become unacceptable noise, because it appears for
every CPU.

Anyway, I think my patch is nice to have, to avoid unexpected behavior in
uncertain environments.

Without disabling it, what can we do on an MCE with no banks?
I found that do_machine_check() does nothing if banks==0 ... would it be better
to let the system panic with "Machine check from unknown source"?
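Why a zero-bank handler "does nothing" follows from the shape of the per-bank scan. The sketch below is a simplified, hypothetical model (scan_banks() and unknown_source() are not kernel functions); the only hardware fact it relies on is that bit 63 of IA32_MCi_STATUS is the VAL flag. With nbanks == 0 the loop body never runs, so no error is ever found, and the policy asked about here would turn that into a "Machine check from unknown source" panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 63 of IA32_MCi_STATUS is the VAL ("bank holds a valid error") flag. */
#define MCI_STATUS_VAL (1ULL << 63)

/* Hypothetical model of a handler's per-bank scan: count banks whose
 * status register holds a valid error.  With nbanks == 0 the loop never
 * runs, so nothing is ever found. */
static int scan_banks(const uint64_t *status, int nbanks)
{
	int found = 0;

	assert(nbanks == 0 || status != NULL);
	for (int i = 0; i < nbanks; i++)
		if (status[i] & MCI_STATUS_VAL)
			found++;
	return found;
}

/* Hypothetical policy helper: true when a machine check arrived but no
 * bank accounts for it, i.e. the "unknown source" panic case. */
static bool unknown_source(const uint64_t *status, int nbanks)
{
	return scan_banks(status, nbanks) == 0;
}
```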


>> Hum, I suppose the line for CPU 0 was slightly different from the others,
>> because SHD means "this bank is a shared bank controlled by another CPU".
>> Maybe:
>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>
>> But I agree that we could do some work on these messages...
>> Would it be better to change the message level from info to debug?
>
> They can be made INFO, yes, but I would prefer not to remove them
> from dmesg for now.
>
> Perhaps they could also be compressed a bit, like SRAT.

Like SRAT? I don't quite follow ... Could you give an example?


Thanks,
H.Seto

2009-10-28 06:48:18

by Andi Kleen

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

Hidetoshi Seto wrote:

>
> Without disabling it, what can we do on an MCE with no banks?

Nothing, but is it really worth adding a special case?

> I found that do_machine_check() does nothing if banks==0 ... would it be better
> to let the system panic with "Machine check from unknown source"?

IMHO yes. In this case the system must be very confused, and panic is the
best you can do. Otherwise it won't do anything interesting anyway.

>
>>> Hum, I suppose the line for CPU 0 was slightly different from the others,
>>> because SHD means "this bank is a shared bank controlled by another CPU".
>>> Maybe:
>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>
>>> But I agree that we could do some work on these messages...
>>> Would it be better to change the message level from info to debug?
>> They can be made INFO, yes, but I would prefer not to remove them
>> from dmesg for now.
>>
>> Perhaps they could also be compressed a bit, like SRAT.
>
> Like SRAT? I don't quite follow ... Could you give an example?

See the recent patches from David Rientjes in the same original thread.

-Andi

2009-10-28 08:19:09

by Hidetoshi Seto

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Without disabling it, what can we do on an MCE with no banks?
>
> Nothing, but is it really worth adding a special case?

If the question were:
- is it really worth supporting this special environment,
  "MCE-capable but no MCE banks"?
then I'd say no.

So I suggested disabling MCE in this uncertain environment.
Otherwise we will end up adding more code for special cases...

>> I found that do_machine_check() does nothing if banks==0 ... would it be better
>> to let the system panic with "Machine check from unknown source"?
>
> IMHO yes. In this case the system must be very confused, and panic is the
> best you can do. Otherwise it won't do anything interesting anyway.

Agreed, but this is also a special case.
Regardless of the real number of banks, a confused system could fail to
read the value from memory... Humm, in theory the MCE handler must be
implemented carefully, but I bet the confused value will not always be 0,
... is it worth doing?

>>>> Hum, I suppose the line for CPU 0 was slightly different from the others,
>>>> because SHD means "this bank is a shared bank controlled by another CPU".
>>>> Maybe:
>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>
>>>> But I agree that we could do some work on these messages...
>>>> Would it be better to change the message level from info to debug?
>>> They can be made INFO, yes, but I would prefer not to remove them
>>> from dmesg for now.
>>>
>>> Perhaps they could also be compressed a bit, like SRAT.
>>
>> Like SRAT? I don't quite follow ... Could you give an example?
>
> See the recent patches from David Rientjes in the same original thread.

I found it, thanks.

So I suppose your idea is something like:
CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
right?
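The SRAT-style range notation can be produced from a per-type bank bitmap with a formatter like this sketch. format_bank_list() is hypothetical and limited to 64 banks; the rule that only runs of three or more collapse into a-b is inferred from the two example lines, where {0,1} and {10,11} stay as separate entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format the set bits of mask (bank numbers 0..63) as a compact list.
 * Runs of three or more consecutive banks become "a-b"; a run of two is
 * written "a,b", matching the example lines.  If len is too small the
 * output stops at the last whole entry that fits. */
static void format_bank_list(uint64_t mask, char *out, size_t len)
{
	size_t pos = 0;
	int bit = 0;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	while (bit < 64) {
		if (!((mask >> bit) & 1)) {
			bit++;
			continue;
		}
		int start = bit;
		while (bit < 64 && ((mask >> bit) & 1))
			bit++;
		int end = bit - 1;	/* last bank in this run */

		char piece[16];
		const char *sep = pos ? "," : "";
		if (end - start >= 2)
			snprintf(piece, sizeof(piece), "%s%d-%d", sep, start, end);
		else if (end > start)
			snprintf(piece, sizeof(piece), "%s%d,%d", sep, start, end);
		else
			snprintf(piece, sizeof(piece), "%s%d", sep, start);

		size_t n = strlen(piece);
		if (pos + n >= len)
			break;		/* no room: stop at the last whole entry */
		memcpy(out + pos, piece, n + 1);
		pos += n;
	}
}
```

For CPU 0's CMCI banks (0-21 except 4, 10 and 11) it produces "0-3,5-9,12-21", and for the polled banks "4,10,11".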

IMHO the format I suggested is easier to read, as long as the number
of banks is not too large.
CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss


Thanks,
H.Seto

2009-10-28 12:03:57

by Valdis Klētnieks

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:

> >>> mce: CPU supports 0 MCE banks
>
> I think that message can just be removed. I don't see much value in it,
> because the value is in sysfs, and when you see the CPU type you can easily
> determine it anyway.

Maybe it should only print a message if it finds an unexpected number of banks?
"Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
hardware says there's only 4. What's up with that?"



2009-10-28 13:44:23

by Andi Kleen

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

[email protected] wrote:
> On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:
>
>>>>> mce: CPU supports 0 MCE banks
>> I think that message can just be removed. I don't see much value in it,
>> because the value is in sysfs, and when you see the CPU type you can easily
>> determine it anyway.
>
> Maybe it should only print a message if it finds an unexpected number of banks?
> "Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
> hardware says there's only 4. What's up with that?"

The kernel doesn't know how many banks to expect; only humans do.

-Andi

2009-10-28 17:09:06

by Mike Travis

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks



Hidetoshi Seto wrote:
> Andi Kleen wrote:
>> Hidetoshi Seto wrote:
>>> Without disabling it, what can we do on an MCE with no banks?
>> Nothing, but is it really worth adding a special case?
>
> If the question were:
> - is it really worth supporting this special environment,
>   "MCE-capable but no MCE banks"?
> then I'd say no.
>
> So I suggested disabling MCE in this uncertain environment.
> Otherwise we will end up adding more code for special cases...
>
>>> I found that do_machine_check() does nothing if banks==0 ... would it be better
>>> to let the system panic with "Machine check from unknown source"?
>> IMHO yes. In this case the system must be very confused, and panic is the
>> best you can do. Otherwise it won't do anything interesting anyway.
>
> Agreed, but this is also a special case.
> Regardless of the real number of banks, a confused system could fail to
> read the value from memory... Humm, in theory the MCE handler must be
> implemented carefully, but I bet the confused value will not always be 0,
> ... is it worth doing?
>
>>>>> Hum, I suppose the line for CPU 0 was slightly different from the others,
>>>>> because SHD means "this bank is a shared bank controlled by another CPU".
>>>>> Maybe:
>>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>>
>>>>> But I agree that we could do some work on these messages...
>>>>> Would it be better to change the message level from info to debug?
>>>> They can be made INFO, yes, but I would prefer not to remove them
>>>> from dmesg for now.
>>>>
>>>> Perhaps they could also be compressed a bit, like SRAT.
>>> Like SRAT? I don't quite follow ... Could you give an example?
>> See the recent patches from David Rientjes in the same original thread.
>
> I found it, thanks.
>
> So I suppose your idea is something like:
> CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
> CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
> right?
>
> IMHO the format I suggested is easier to read, as long as the number
> of banks is not too large.
> CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
> CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
>
>
> Thanks,
> H.Seto

The problem comes up when you have a whole bunch of CPUs and the lines
become redundant. Can you compress the lines so that CPUs with the
same mappings are printed on one line?

Thanks,
Mike

2009-10-28 17:12:44

by Roland Dreier

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks


> Perhaps they could be also compressed a bit like SRAT.

Seems like a good idea... but I wonder what the best way to represent
things is. For example I have a 2-socket Nehalem system that shows:

2 times: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
6 times: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
8 times: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

presumably the first line is once per package, the next line is for the
first sibling in all the other cores in a package, and the last line is
for the SMT siblings of all the cores.

But would we want to accumulate all the different combinations of banks
along with a CPU mask and then print something like:

CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

Of course, output like that is going to lead to super-long lines on a
64-thread system.

Also I'm not sure of a clean way to implement this; unlike the SRAT
stuff, we need to deal with CPU hotplug so all this at best could be
__cpuinitdata, i.e. we can't discard it in most configs.

However the "MCA banks" output definitely is annoying on a 64-thread
system -- the amount of output is far greater than the utility of said
output. So ideas on the best way to reduce this would be appreciated.
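Accumulating each distinct bank layout together with a CPU mask, as described above, could look roughly like this userspace sketch. struct bank_group, group_cpus(), format_group() and MAX_GROUPS are hypothetical; layouts are compared as strings for simplicity (a kernel version would more likely compare per-bank bitmaps), CPU sets are limited to 64 bits, and CPU hotplug, the hard part noted above, is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 16	/* hypothetical cap on distinct bank layouts */

struct bank_group {
	const char *sig;	/* e.g. "CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8" */
	uint64_t cpus;		/* bit n set => CPU n has this layout */
};

/* Bucket CPUs by identical layout string, keeping first-seen order.
 * Returns the number of groups, or -1 if there are too many layouts. */
static int group_cpus(const char *const *sig, int ncpus, struct bank_group *g)
{
	int ngroups = 0;

	assert(ncpus >= 0 && ncpus <= 64);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		int i;
		for (i = 0; i < ngroups; i++)
			if (strcmp(g[i].sig, sig[cpu]) == 0)
				break;
		if (i == ngroups) {	/* first CPU with this layout */
			if (ngroups == MAX_GROUPS)
				return -1;
			g[ngroups].sig = sig[cpu];
			g[ngroups].cpus = 0;
			ngroups++;
		}
		g[i].cpus |= 1ULL << cpu;
	}
	return ngroups;
}

/* Render one group in the "CPUs 0 4: MCA banks ..." style. */
static void format_group(const struct bank_group *g, char *out, size_t len)
{
	assert(len > 0);
	int pos = snprintf(out, len, "CPUs");

	for (int cpu = 0; cpu < 64; cpu++) {
		if (pos < 0 || (size_t)pos >= len)
			return;		/* buffer full; snprintf left it terminated */
		if ((g->cpus >> cpu) & 1)
			pos += snprintf(out + pos, len - (size_t)pos, " %d", cpu);
	}
	if (pos >= 0 && (size_t)pos < len)
		snprintf(out + pos, len - (size_t)pos, ": MCA banks %s", g->sig);
}
```

With the 16-CPU Nehalem example, the first group formats as "CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8". Because the groups are only complete once every CPU is up, the printing would have to be deferred until then.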

Thanks,
Roland

2009-10-28 17:36:55

by Mike Travis

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks



Roland Dreier wrote:
> > Perhaps they could be also compressed a bit like SRAT.
>
> Seems like a good idea... but I wonder what the best way to represent
> things is. For example I have a 2-socket Nehalem system that shows:
>
> 2 times: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> 6 times: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> 8 times: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
>
> presumably the first line is once per package, the next line is for the
> first sibling in all the other cores in a package, and the last line is
> for the SMT siblings of all the cores.
>
> But would we want to accumulate all the different combinations of banks
> along with a CPU mask and then print something like:
>
> CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

Or use a cpumask and cpulist_scnprintf, which condenses the CPU list nicely.

>
> Of course, output like that is going to lead to super-long lines on a
> 64-thread system.
>
> Also I'm not sure of a clean way to implement this; unlike the SRAT
> stuff, we need to deal with CPU hotplug so all this at best could be
> __cpuinitdata, i.e. we can't discard it in most configs.
>
> However the "MCA banks" output definitely is annoying on a 64-thread
> system -- the amount of output is far greater than the utility of said
> output. So ideas on the best way to reduce this would be appreciated.
>
> Thanks,
> Roland

2009-10-28 18:03:07

by Roland Dreier

Subject: Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks


> > But would we want to accumulate all the different combinations of banks
> > along with a CPU mask and then print something like:
> >
> > CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> > CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> > CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
>
> Or use a cpumask and cpulist_scnprintf, which condenses the CPU list nicely.

Thanks! I didn't know about that API.

However with that said I think the real issue is whether that style of
output is a good idea, no matter how nicely the CPU list is formatted :)

- R.