LinuxLists.cc - [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

2011-04-18 14:01:26

Subject: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

From: Borislav Petkov <[email protected]>

The default notifier doesn't make a lot of sense to call in the
correctable errors case. Drop it and emit the mcelog decoding hint only
in the uncorrectable errors case and when no notifier is registered.
Also, limit issuing the "mcelog --ascii" message in the rare case when
we dump unreported CEs before panicking.

While at it, remove unused old x86_mce_decode_callback from the header.

Signed-off-by: Prarit Bhargava <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 23 ++++++-----------------
2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index eb16e94..021979a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -142,8 +142,6 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
static inline void enable_p5_mce(void) {}
#endif

-extern void (*x86_mce_decode_callback)(struct mce *m);
-
void mce_setup(struct mce *m);
void mce_log(struct mce *m);
DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68e2303..679bd9e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -105,20 +105,6 @@ static int cpu_missing;
ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
EXPORT_SYMBOL_GPL(x86_mce_decoder_chain);

-static int default_decode_mce(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- pr_emerg(HW_ERR "No human readable MCE decoding support on this CPU type.\n");
- pr_emerg(HW_ERR "Run the message through 'mcelog --ascii' to decode.\n");
-
- return NOTIFY_STOP;
-}
-
-static struct notifier_block mce_dec_nb = {
- .notifier_call = default_decode_mce,
- .priority = -1,
-};
-
/* MCA banks polled by the period polling timer for corrected events */
DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
@@ -212,6 +198,8 @@ void mce_log(struct mce *mce)

static void print_mce(struct mce *m)
{
+ int ret = 0;
+
pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
m->extcpu, m->mcgstatus, m->bank, m->status);

@@ -239,7 +227,10 @@ static void print_mce(struct mce *m)
* Print out human-readable details about the MCE error,
* (if the CPU has an implementation for that)
*/
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ if (ret != NOTIFY_STOP)
+ printk_ratelimited(KERN_EMERG HW_ERR
+ "Run the above through 'mcelog --ascii'\n");
}

#define PANIC_TIMEOUT 5 /* 5 seconds */
@@ -1721,8 +1712,6 @@ __setup("mce", mcheck_enable);

int __init mcheck_init(void)
{
- atomic_notifier_chain_register(&x86_mce_decoder_chain, &mce_dec_nb);
-
mcheck_intel_therm_init();

return 0;
--
1.7.4.rc2

2011-04-19 17:14:02

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

* Borislav Petkov <[email protected]> wrote:

> From: Borislav Petkov <[email protected]>
> Signed-off-by: Prarit Bhargava <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>

Hm, that From/SOB combination does not look right.

Thanks,

Ingo

2011-04-19 17:35:49

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

On Tue, Apr 19, 2011 at 01:13:40PM -0400, Ingo Molnar wrote:
>
> * Borislav Petkov <[email protected]> wrote:
>
> > From: Borislav Petkov <[email protected]>
> > Signed-off-by: Prarit Bhargava <[email protected]>
> > Signed-off-by: Borislav Petkov <[email protected]>
>
> Hm, that From/SOB combination does not look right.

Sorry, how's that instead:

--
From: Borislav Petkov <[email protected]>
Date: Wed, 13 Apr 2011 14:32:06 +0200
Subject: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

The default notifier doesn't make a lot of sense to call in the
correctable errors case. Drop it and emit the mcelog decoding hint only
in the uncorrectable errors case and when no notifier is registered.
Also, limit issuing the "mcelog --ascii" message in the rare case when
we dump unreported CEs before panicking.

While at it, remove unused old x86_mce_decode_callback from the header.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 23 ++++++-----------------
2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index eb16e94..021979a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -142,8 +142,6 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
static inline void enable_p5_mce(void) {}
#endif

-extern void (*x86_mce_decode_callback)(struct mce *m);
-
void mce_setup(struct mce *m);
void mce_log(struct mce *m);
DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68e2303..679bd9e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -105,20 +105,6 @@ static int cpu_missing;
ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
EXPORT_SYMBOL_GPL(x86_mce_decoder_chain);

-static int default_decode_mce(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- pr_emerg(HW_ERR "No human readable MCE decoding support on this CPU type.\n");
- pr_emerg(HW_ERR "Run the message through 'mcelog --ascii' to decode.\n");
-
- return NOTIFY_STOP;
-}
-
-static struct notifier_block mce_dec_nb = {
- .notifier_call = default_decode_mce,
- .priority = -1,
-};
-
/* MCA banks polled by the period polling timer for corrected events */
DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
@@ -212,6 +198,8 @@ void mce_log(struct mce *mce)

static void print_mce(struct mce *m)
{
+ int ret = 0;
+
pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
m->extcpu, m->mcgstatus, m->bank, m->status);

@@ -239,7 +227,10 @@ static void print_mce(struct mce *m)
* Print out human-readable details about the MCE error,
* (if the CPU has an implementation for that)
*/
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ if (ret != NOTIFY_STOP)
+ printk_ratelimited(KERN_EMERG HW_ERR
+ "Run the above through 'mcelog --ascii'\n");
}

#define PANIC_TIMEOUT 5 /* 5 seconds */
@@ -1721,8 +1712,6 @@ __setup("mce", mcheck_enable);

int __init mcheck_init(void)
{
- atomic_notifier_chain_register(&x86_mce_decoder_chain, &mce_dec_nb);
-
mcheck_intel_therm_init();

return 0;
--
1.7.4.rc2

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2011-04-19 17:45:05

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

* Borislav Petkov <[email protected]> wrote:

> + if (ret != NOTIFY_STOP)
> + printk_ratelimited(KERN_EMERG HW_ERR
> + "Run the above through 'mcelog --ascii'\n");

pr_emerg_ratelimited() would allow you to keep that broken line happy?

Thanks,

Ingo

2011-04-20 10:24:19

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

On Tue, Apr 19, 2011 at 01:44:46PM -0400, Ingo Molnar wrote:
>
> * Borislav Petkov <[email protected]> wrote:
>
> > + if (ret != NOTIFY_STOP)
> > + printk_ratelimited(KERN_EMERG HW_ERR
> > + "Run the above through 'mcelog --ascii'\n");
>
> pr_emerg_ratelimited() would allow you to keep that broken line happy?

Not entirely. I had to do the exit-early-to-save-an-indentation-level
trick also :)

--
From: Borislav Petkov <[email protected]>
Date: Wed, 13 Apr 2011 14:32:06 +0200
Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier

The default notifier doesn't make a lot of sense to call in the
correctable errors case. Drop it and emit the mcelog decoding hint only
in the uncorrectable errors case and when no notifier is registered.
Also, limit issuing the "mcelog --ascii" message in the rare case when
we dump unreported CEs before panicking.

While at it, remove unused old x86_mce_decode_callback from the header.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 24 +++++++-----------------
2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index eb16e94..021979a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -142,8 +142,6 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
static inline void enable_p5_mce(void) {}
#endif

-extern void (*x86_mce_decode_callback)(struct mce *m);
-
void mce_setup(struct mce *m);
void mce_log(struct mce *m);
DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68e2303..ff1ae9b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -105,20 +105,6 @@ static int cpu_missing;
ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
EXPORT_SYMBOL_GPL(x86_mce_decoder_chain);

-static int default_decode_mce(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- pr_emerg(HW_ERR "No human readable MCE decoding support on this CPU type.\n");
- pr_emerg(HW_ERR "Run the message through 'mcelog --ascii' to decode.\n");
-
- return NOTIFY_STOP;
-}
-
-static struct notifier_block mce_dec_nb = {
- .notifier_call = default_decode_mce,
- .priority = -1,
-};
-
/* MCA banks polled by the period polling timer for corrected events */
DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
@@ -212,6 +198,8 @@ void mce_log(struct mce *mce)

static void print_mce(struct mce *m)
{
+ int ret = 0;
+
pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
m->extcpu, m->mcgstatus, m->bank, m->status);

@@ -239,7 +227,11 @@ static void print_mce(struct mce *m)
* Print out human-readable details about the MCE error,
* (if the CPU has an implementation for that)
*/
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ if (ret == NOTIFY_STOP)
+ return;
+
+ pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
}

#define PANIC_TIMEOUT 5 /* 5 seconds */
@@ -1721,8 +1713,6 @@ __setup("mce", mcheck_enable);

int __init mcheck_init(void)
{
- atomic_notifier_chain_register(&x86_mce_decoder_chain, &mce_dec_nb);
-
mcheck_intel_therm_init();

return 0;
--
1.7.4.rc2

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 4363

2011-04-21 12:11:07

by tip-bot for Borislav Petkov

[permalink] [raw]

Subject: [tip:x86/mce] x86, mce: Drop the default decoding notifier

Commit-ID: dffa4b2f62ff28c982144c7033001b1ece4d3532
Gitweb: http://git.kernel.org/tip/dffa4b2f62ff28c982144c7033001b1ece4d3532
Author: Borislav Petkov <[email protected]>
AuthorDate: Wed, 20 Apr 2011 12:23:49 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 21 Apr 2011 11:35:10 +0200

x86, mce: Drop the default decoding notifier

The default notifier doesn't make a lot of sense to call in the
correctable errors case. Drop it and emit the mcelog decoding
hint only in the uncorrectable errors case and when no notifier
is registered. Also, limit issuing the "mcelog --ascii" message
in the rare case when we dump unreported CEs before panicking.

While at it, remove unused old x86_mce_decode_callback from the
header.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Nagananda Chumbalkar <[email protected]>
Cc: Russ Anderson <[email protected]>
Link: http://lkml.kernel.org/r/20110420102349.GB1361@aftab
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 24 +++++++-----------------
2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index eb16e94..021979a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -142,8 +142,6 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
static inline void enable_p5_mce(void) {}
#endif

-extern void (*x86_mce_decode_callback)(struct mce *m);
-
void mce_setup(struct mce *m);
void mce_log(struct mce *m);
DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68e2303..ff1ae9b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -105,20 +105,6 @@ static int cpu_missing;
ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
EXPORT_SYMBOL_GPL(x86_mce_decoder_chain);

-static int default_decode_mce(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- pr_emerg(HW_ERR "No human readable MCE decoding support on this CPU type.\n");
- pr_emerg(HW_ERR "Run the message through 'mcelog --ascii' to decode.\n");
-
- return NOTIFY_STOP;
-}
-
-static struct notifier_block mce_dec_nb = {
- .notifier_call = default_decode_mce,
- .priority = -1,
-};
-
/* MCA banks polled by the period polling timer for corrected events */
DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
@@ -212,6 +198,8 @@ void mce_log(struct mce *mce)

static void print_mce(struct mce *m)
{
+ int ret = 0;
+
pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
m->extcpu, m->mcgstatus, m->bank, m->status);

@@ -239,7 +227,11 @@ static void print_mce(struct mce *m)
* Print out human-readable details about the MCE error,
* (if the CPU has an implementation for that)
*/
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
+ if (ret == NOTIFY_STOP)
+ return;
+
+ pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
}

#define PANIC_TIMEOUT 5 /* 5 seconds */
@@ -1721,8 +1713,6 @@ __setup("mce", mcheck_enable);

int __init mcheck_init(void)
{
- atomic_notifier_chain_register(&x86_mce_decoder_chain, &mce_dec_nb);
-
mcheck_intel_therm_init();

return 0;

2011-04-25 19:40:21

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

Borislav Petkov <[email protected]> writes:

> On Tue, Apr 19, 2011 at 01:44:46PM -0400, Ingo Molnar wrote:
>>
>> * Borislav Petkov <[email protected]> wrote:
>>
>> > + if (ret != NOTIFY_STOP)
>> > + printk_ratelimited(KERN_EMERG HW_ERR
>> > + "Run the above through 'mcelog --ascii'\n");
>>
>> pr_emerg_ratelimited() would allow you to keep that broken line happy?
>
> Not entirely. I had to do the exit-early-to-save-an-indentation-level
> trick also :)
>
> --
> From: Borislav Petkov <[email protected]>
> Date: Wed, 13 Apr 2011 14:32:06 +0200
> Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier
>
> The default notifier doesn't make a lot of sense to call in the
> correctable errors case. Drop it and emit the mcelog decoding hint only
> in the uncorrectable errors case and when no notifier is registered.
> Also, limit issuing the "mcelog --ascii" message in the rare case when
> we dump unreported CEs before panicking.
>
> While at it, remove unused old x86_mce_decode_callback from the
> header.

Can we please print something if we please log something in the
case of a correctable error, when we only report it via mcelog?

I have a stupid recent intel cpu here that hits that case and without
the default x86_mce_decode_callback I wouldn't have even known that I am
getting something like 50 correctable errors an hour on one of my
machines. In particular I am it hits so often I am seeing:
"mce_notify_irq: 2 callbacks suppressed". I need to get those dimms
replaced soon because in a new product I simply can't imagine that many
correctable errors.

Eric

2011-04-26 07:43:12

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

On Mon, Apr 25, 2011 at 03:40:11PM -0400, Eric W. Biederman wrote:
> > From: Borislav Petkov <[email protected]>
> > Date: Wed, 13 Apr 2011 14:32:06 +0200
> > Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier
> >
> > The default notifier doesn't make a lot of sense to call in the
> > correctable errors case. Drop it and emit the mcelog decoding hint only
> > in the uncorrectable errors case and when no notifier is registered.
> > Also, limit issuing the "mcelog --ascii" message in the rare case when
> > we dump unreported CEs before panicking.
> >
> > While at it, remove unused old x86_mce_decode_callback from the
> > header.
>
> Can we please print something if we please log something in the
> case of a correctable error, when we only report it via mcelog?
>
> I have a stupid recent intel cpu here that hits that case and without
> the default x86_mce_decode_callback I wouldn't have even known that I am
> getting something like 50 correctable errors an hour on one of my
> machines. In particular I am it hits so often I am seeing:
> "mce_notify_irq: 2 callbacks suppressed". I need to get those dimms
> replaced soon because in a new product I simply can't imagine that many
> correctable errors.

Isn't there a mcelog daemon or something that polls /dev/mcelog and
tells you about those DRAM ECCs in some log file where you're supposed
to look? :)

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2011-04-26 21:06:50

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

Borislav Petkov <[email protected]> writes:

> On Mon, Apr 25, 2011 at 03:40:11PM -0400, Eric W. Biederman wrote:
>> > From: Borislav Petkov <[email protected]>
>> > Date: Wed, 13 Apr 2011 14:32:06 +0200
>> > Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier
>> >
>> > The default notifier doesn't make a lot of sense to call in the
>> > correctable errors case. Drop it and emit the mcelog decoding hint only
>> > in the uncorrectable errors case and when no notifier is registered.
>> > Also, limit issuing the "mcelog --ascii" message in the rare case when
>> > we dump unreported CEs before panicking.
>> >
>> > While at it, remove unused old x86_mce_decode_callback from the
>> > header.
>>
>> Can we please print something if we please log something in the
>> case of a correctable error, when we only report it via mcelog?
>>
>> I have a stupid recent intel cpu here that hits that case and without
>> the default x86_mce_decode_callback I wouldn't have even known that I am
>> getting something like 50 correctable errors an hour on one of my
>> machines. In particular I am it hits so often I am seeing:
>> "mce_notify_irq: 2 callbacks suppressed". I need to get those dimms
>> replaced soon because in a new product I simply can't imagine that many
>> correctable errors.
>
> Isn't there a mcelog daemon or something that polls /dev/mcelog and
> tells you about those DRAM ECCs in some log file where you're supposed
> to look? :)

On fedora 14 there is a cron job that writes to /var/log/mcelog, and
does not go through syslog. But you have to be proactive and look
there. If the people who work on this code can't even remember
where to look I can't imagine how anyone else can remember.
Which is why I object to the removal of the one printk that told
me something was broken on my machine.

So far from what I have seen /dev/mcelog and the userspace mcelog is
over complicated and near useless. It seems to focused around the
notion that "This is not a software problem, please do not bug
Andi Kleen about it"

Well it is a hardware problem so I do need to RMA that hardware.
Sigh.

Eric

2011-04-26 21:48:29

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

On Tue, Apr 26, 2011 at 05:06:39PM -0400, Eric W. Biederman wrote:
> Borislav Petkov <[email protected]> writes:
>
> > On Mon, Apr 25, 2011 at 03:40:11PM -0400, Eric W. Biederman wrote:
> >> > From: Borislav Petkov <[email protected]>
> >> > Date: Wed, 13 Apr 2011 14:32:06 +0200
> >> > Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier
> >> >
> >> > The default notifier doesn't make a lot of sense to call in the
> >> > correctable errors case. Drop it and emit the mcelog decoding hint only
> >> > in the uncorrectable errors case and when no notifier is registered.
> >> > Also, limit issuing the "mcelog --ascii" message in the rare case when
> >> > we dump unreported CEs before panicking.
> >> >
> >> > While at it, remove unused old x86_mce_decode_callback from the
> >> > header.
> >>
> >> Can we please print something if we please log something in the
> >> case of a correctable error, when we only report it via mcelog?
> >>
> >> I have a stupid recent intel cpu here that hits that case and without
> >> the default x86_mce_decode_callback I wouldn't have even known that I am
> >> getting something like 50 correctable errors an hour on one of my
> >> machines. In particular I am it hits so often I am seeing:
> >> "mce_notify_irq: 2 callbacks suppressed". I need to get those dimms
> >> replaced soon because in a new product I simply can't imagine that many
> >> correctable errors.
> >
> > Isn't there a mcelog daemon or something that polls /dev/mcelog and
> > tells you about those DRAM ECCs in some log file where you're supposed
> > to look? :)
>
> On fedora 14 there is a cron job that writes to /var/log/mcelog, and
> does not go through syslog. But you have to be proactive and look
> there. If the people who work on this code can't even remember
> where to look I can't imagine how anyone else can remember.

Ha!

I'm working exactly in the opposite direction actually - drop mcelog and
make RAS much more user friendly. As a first step, this is why we have
all that MCE decoding code for AMD hw and when you get an error, you
can't miss it:

Apr 20 21:08:24 kepek kernel: [ 300.816122] [Hardware Error]: MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc00c000c6080a13
Apr 20 21:08:24 kepek kernel: [ 300.825156] [Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
Apr 20 21:08:24 kepek kernel: [ 300.825167] EDAC amd64 MC0: CE ERROR_ADDRESS= 0x4171fe380
Apr 20 21:08:24 kepek kernel: [ 300.825257] EDAC MC0: CE page 0x4171fe, offset 0x380, grain 0, syndrome 0xc601, row 3, channel 0, label "": amd64_
edac

or this:

Apr 15 16:54:17 kepek kernel: [72187.027059] [Hardware Error]: MC0_STATUS[-|UE|-|-|AddrV|UECC]: 0xb400210000010016
Apr 15 16:54:17 kepek kernel: [72187.027059] [Hardware Error]: Data Cache Error: L2 TLB multimatch.
Apr 15 16:54:17 kepek kernel: [72187.027059] [Hardware Error]: cache level: L2, tx: DATA

There's also this RAS daemon I'm hacking on which uses perf to carry
error information to userspace and do more than reporting it. For
example, server farm guys don't want to scan syslog for every CECC error
but rather have it collected somewhere on one machine, maybe over the
network, etc, etc.

So now is the time to speak up and let me know how you would like to get
the error reported? In general, what should be done differently in Linux
wrt to RAS.

> Which is why I object to the removal of the one printk that told
> me something was broken on my machine.

I dunno, maybe it's time we moved the MCE decoding functionality which
is shared by most of x86 into core code. Ingo, Peter, Thomas, what do
you guys think?

This'll at least put something in the logs that is sensible instead
of useless strings which tell the users what to do next. Also, we can
ratelimit it so that DIMMs generating too many CECCs don't flood them
too much. Hmm...

> So far from what I have seen /dev/mcelog and the userspace mcelog is
> over complicated and near useless. It seems to focused around the
> notion that "This is not a software problem, please do not bug
> Andi Kleen about it"

;-)

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

2011-04-26 22:26:41

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

Borislav Petkov <[email protected]> writes:

> On Tue, Apr 26, 2011 at 05:06:39PM -0400, Eric W. Biederman wrote:
>
> Ha!
>
> I'm working exactly in the opposite direction actually - drop mcelog and
> make RAS much more user friendly. As a first step, this is why we have
> all that MCE decoding code for AMD hw and when you get an error, you
> can't miss it:

I have no problem with having mcelog go away.

When we are on a system where we can't decode the mce, (aka the hardware
is newer than the kernel) we need some kind of sensible fallback that
prints something into syslog even if we don't have the full decode.

> Apr 20 21:08:24 kepek kernel: [ 300.816122] [Hardware Error]: MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc00c000c6080a13
> Apr 20 21:08:24 kepek kernel: [ 300.825156] [Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
> Apr 20 21:08:24 kepek kernel: [ 300.825167] EDAC amd64 MC0: CE ERROR_ADDRESS= 0x4171fe380
> Apr 20 21:08:24 kepek kernel: [ 300.825257] EDAC MC0: CE page 0x4171fe, offset 0x380, grain 0, syndrome 0xc601, row 3, channel 0, label "": amd64_
> edac
>
> or this:
>
> Apr 15 16:54:17 kepek kernel: [72187.027059] [Hardware Error]: MC0_STATUS[-|UE|-|-|AddrV|UECC]: 0xb400210000010016
> Apr 15 16:54:17 kepek kernel: [72187.027059] [Hardware Error]: Data Cache Error: L2 TLB multimatch.
> Apr 15 16:54:17 kepek kernel: [72187.027059] [Hardware Error]: cache level: L2, tx: DATA
>
>
> There's also this RAS daemon I'm hacking on which uses perf to carry
> error information to userspace and do more than reporting it. For
> example, server farm guys don't want to scan syslog for every CECC error
> but rather have it collected somewhere on one machine, maybe over the
> network, etc, etc.

Collected somewhere on one machine sounds remarkably like syslog.

I expect what the big server farm guys object to most is errors that
are hard to parse and hard to deal with in automation. And I can't
say that I blame them.

> So now is the time to speak up and let me know how you would like to get
> the error reported? In general, what should be done differently in Linux
> wrt to RAS.

>> Which is why I object to the removal of the one printk that told
>> me something was broken on my machine.
>
> I dunno, maybe it's time we moved the MCE decoding functionality which
> is shared by most of x86 into core code. Ingo, Peter, Thomas, what do
> you guys think?
>
> This'll at least put something in the logs that is sensible instead
> of useless strings which tell the users what to do next. Also, we can
> ratelimit it so that DIMMs generating too many CECCs don't flood them
> too much. Hmm...

Sure. Although any DIMM that is generating so many correctable errors
that you need to rate limit it in the kernel, won't likely to confine
itself to correctable errors.

Still it can happen that things are so bad that you do need to rate
limit it in the kernel. Still with those you start wondering "How did
this machine boot?" So printk_ratelimit sounds like a fine idea.

In the casual use situation where I have not yet bothered on my small
number of machines to set up sophisticated logging infrastructure
I just want something that adds an annoying message to syslog so
people who are logged in can say what was that, and downtime can be
scheduled.

Eric

2011-04-26 22:33:06

by Russ Anderson

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

On Tue, Apr 26, 2011 at 02:06:39PM -0700, Eric W. Biederman wrote:
> Borislav Petkov <[email protected]> writes:
> > On Mon, Apr 25, 2011 at 03:40:11PM -0400, Eric W. Biederman wrote:
> >> > From: Borislav Petkov <[email protected]>
> >> > Date: Wed, 13 Apr 2011 14:32:06 +0200
> >> > Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier
> >> >
> >> > The default notifier doesn't make a lot of sense to call in the
> >> > correctable errors case. Drop it and emit the mcelog decoding hint only
> >> > in the uncorrectable errors case and when no notifier is registered.
> >> > Also, limit issuing the "mcelog --ascii" message in the rare case when
> >> > we dump unreported CEs before panicking.
> >> >
> >> > While at it, remove unused old x86_mce_decode_callback from the
> >> > header.
> >>
> >> Can we please print something if we please log something in the
> >> case of a correctable error, when we only report it via mcelog?
> >>
> >> I have a stupid recent intel cpu here that hits that case and without
> >> the default x86_mce_decode_callback I wouldn't have even known that I am
> >> getting something like 50 correctable errors an hour on one of my
> >> machines. In particular I am it hits so often I am seeing:
> >> "mce_notify_irq: 2 callbacks suppressed". I need to get those dimms
> >> replaced soon because in a new product I simply can't imagine that many
> >> correctable errors.
> >
> > Isn't there a mcelog daemon or something that polls /dev/mcelog and
> > tells you about those DRAM ECCs in some log file where you're supposed
> > to look? :)
>
> On fedora 14 there is a cron job that writes to /var/log/mcelog, and
> does not go through syslog.

Interesting. I'm running fedora 14 and don't have a /var/log/mcelog
file or see an mcelog package (not that I'd looked until just now).

> But you have to be proactive and look
> there. If the people who work on this code can't even remember
> where to look I can't imagine how anyone else can remember.
> Which is why I object to the removal of the one printk that told
> me something was broken on my machine.

Historically hardware error reporting has been very platform
dependent. Those differences made it difficult to come up with
agreement on standard ways to report errors. You raise a good
point that it needs to work better.

> So far from what I have seen /dev/mcelog and the userspace mcelog is
> over complicated and near useless.

/dev/mcelog is extremely useful to SGI. As you said, "you have to
be proactive and look there" which we are and do. :-)

> It seems to focused around the
> notion that "This is not a software problem, please do not bug
> Andi Kleen about it"
>
> Well it is a hardware problem so I do need to RMA that hardware.
> Sigh.

You raise a good issue that users do need to know when their
hardware is having issues.

> Eric

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc [email protected]

2011-04-26 23:44:53

by Luck, Tony

[permalink] [raw]

Subject: RE: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

> Sure. Although any DIMM that is generating so many correctable errors
> that you need to rate limit it in the kernel, won't likely to confine
> itself to correctable errors.
>
> Still it can happen that things are so bad that you do need to rate
> limit it in the kernel. Still with those you start wondering "How did
> this machine boot?" So printk_ratelimit sounds like a fine idea.

Perhaps we really want thresholds rather than rate limits (for corrected
errors). One corrected error shouldn't cause any but the most paranoid
to worry. A couple of errors from the same DIMM close together might be
some cause for concern, but could just be happenstance. Enough errors that
rate limiting looks useful, and you are into "something needs to be done"
territory.

-Tony

2011-04-27 14:03:43

by Borislav Petkov

[permalink] [raw]

Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier

On Tue, Apr 26, 2011 at 07:44:35PM -0400, Luck, Tony wrote:
> > Sure. Although any DIMM that is generating so many correctable errors
> > that you need to rate limit it in the kernel, won't likely to confine
> > itself to correctable errors.
> >
> > Still it can happen that things are so bad that you do need to rate
> > limit it in the kernel. Still with those you start wondering "How did
> > this machine boot?" So printk_ratelimit sounds like a fine idea.
>
>
> Perhaps we really want thresholds rather than rate limits (for corrected
> errors). One corrected error shouldn't cause any but the most paranoid
> to worry. A couple of errors from the same DIMM close together might be
> some cause for concern, but could just be happenstance. Enough errors that
> rate limiting looks useful, and you are into "something needs to be done"
> territory.

Right, but for thresholding you need to know to which DIMM the error
address belongs. And this is not trivial in all cases. It looks like we
need this error reporting thing dynamic:

As a sane default, we want to dump some _sensible_ info to syslog about
some errors happening. Then, all in-kernel decoding modules can enrich
that error info with more specific details. Those two reporting modes
should be ratelimited since doing thresholding in kernel could be tricky
and taking up precious resources.

Finally, you have the userspace daemon which datacenter people or
Google with small root partitions could use, which sends that info to a
centralized location instead of someone collecting it from each node. In
that case, no info goes out to syslog but is eaten up by the RAS daemon
which runs on every machine. And in userspace you can do all the cool
thresholding, rate limiting, policy applying, etc, your heart desires.

Makes sense?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632