2010-02-11 14:34:25

by Borislav Petkov

Subject: Re: Fw: [Bug 15238] Oops on startup: Kernel failure: EDAC amd64: WARNING: ECC is disabled by BIOS

Hi,

On Wed, Feb 10, 2010 at 09:53:03AM -0800, Doug Thompson wrote:
> boris, can you look at this as well.
>
> I think changing WARNING to a NOTICE in the log output would stop the oops parsing code from thinking this an OOPS event, when it is not
>
> doug t
>
>
> --- On Mon, 2/8/10, [email protected] <[email protected]> wrote:
>
> > From: [email protected] <[email protected]>
> > Subject: [Bug 15238] Oops on startup: Kernel failure: EDAC amd64: WARNING: ECC is disabled by BIOS
> > To: [email protected]
> > Date: Monday, February 8, 2010, 2:42 PM
> > http://bugzilla.kernel.org/show_bug.cgi?id=15238
> >
> >
> > Andrew Morton <[email protected]> changed:
> >
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >          AssignedTo|[email protected]           |[email protected]
> >
> > --- Comment #1 from Andrew Morton <[email protected]>  2010-02-08 21:42:40 ---
> > Doug, could you please take a look at this one?

How about something like the following? If everyone is ok with that
I'll send it to Linus and stable later since it is trivial enough:

dmesg:

EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)

patch:

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 000dc67..3391e67 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2658,10 +2658,11 @@ static void amd64_restore_ecc_error_reporting(struct amd64_pvt *pvt)
  * the memory system completely. A command line option allows to force-enable
  * hardware ECC later in amd64_enable_ecc_error_reporting().
  */
-static const char *ecc_warning =
-	"WARNING: ECC is disabled by BIOS. Module will NOT be loaded.\n"
-	" Either Enable ECC in the BIOS, or set 'ecc_enable_override'.\n"
-	" Also, use of the override can cause unknown side effects.\n";
+static const char *ecc_msg =
+	"ECC disabled in the BIOS or no ECC capability, module will not load.\n"
+	" Either enable ECC checking or force module loading by setting "
+	"'ecc_enable_override'.\n"
+	" (Note that use of the override may cause unknown side effects.)\n";
 
 static int amd64_check_ecc_enabled(struct amd64_pvt *pvt)
 {
@@ -2673,7 +2674,7 @@ static int amd64_check_ecc_enabled(struct amd64_pvt *pvt)
 
 	ecc_enabled = !!(value & K8_NBCFG_ECC_ENABLE);
 	if (!ecc_enabled)
-		amd64_printk(KERN_WARNING, "This node reports that Memory ECC "
+		amd64_printk(KERN_NOTICE, "This node reports that Memory ECC "
 			     "is currently disabled, set F3x%x[22] (%s).\n",
 			     K8_NBCFG, pci_name(pvt->misc_f3_ctl));
 	else
@@ -2681,13 +2682,13 @@ static int amd64_check_ecc_enabled(struct amd64_pvt *pvt)
 
 	nb_mce_en = amd64_nb_mce_bank_enabled_on_node(pvt->mc_node_id);
 	if (!nb_mce_en)
-		amd64_printk(KERN_WARNING, "NB MCE bank disabled, set MSR "
+		amd64_printk(KERN_NOTICE, "NB MCE bank disabled, set MSR "
 			     "0x%08x[4] on node %d to enable.\n",
 			     MSR_IA32_MCG_CTL, pvt->mc_node_id);
 
 	if (!ecc_enabled || !nb_mce_en) {
 		if (!ecc_enable_override) {
-			amd64_printk(KERN_WARNING, "%s", ecc_warning);
+			amd64_printk(KERN_NOTICE, "%s", ecc_msg);
 			return -ENODEV;
 		}
 		ecc_enable_override = 0;
--

Thanks.

--
Regards/Gruss,
Boris.

--
Advanced Micro Devices, Inc.
Operating Systems Research Center


2010-02-11 18:06:10

by Doug Thompson

Subject: Re: Fw: [Bug 15238] Oops on startup: Kernel failure: EDAC amd64: WARNING: ECC is disabled by BIOS



--- On Thu, 2/11/10, Borislav Petkov <[email protected]> wrote:

> From: Borislav Petkov <[email protected]>
> Subject: Re: Fw: [Bug 15238] Oops on startup: Kernel failure: EDAC amd64: WARNING: ECC is disabled by BIOS
> To: "Doug Thompson" <[email protected]>, "Andrew Morton" <[email protected]>
> Cc: [email protected], "edac-devel" <[email protected]>
> Date: Thursday, February 11, 2010, 7:34 AM
> Hi,
>
> On Wed, Feb 10, 2010 at 09:53:03AM -0800, Doug Thompson wrote:
> > boris, can you look at this as well.
> >
> > I think changing WARNING to a NOTICE in the log output would stop the oops parsing code from thinking this an OOPS event, when it is not
> >
> > doug t
> >
> >
> > --- On Mon, 2/8/10, [email protected] <[email protected]> wrote:
> >
> > > From: [email protected] <[email protected]>
> > > Subject: [Bug 15238] Oops on startup: Kernel failure: EDAC amd64: WARNING: ECC is disabled by BIOS
> > > To: [email protected]
> > > Date: Monday, February 8, 2010, 2:42 PM
> > > http://bugzilla.kernel.org/show_bug.cgi?id=15238
> > >
> > >
> > > Andrew Morton <[email protected]> changed:
> > >
> > >            What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >          AssignedTo|[email protected]           |[email protected]
> > >
> > > --- Comment #1 from Andrew Morton <[email protected]>  2010-02-08 21:42:40 ---
> > > Doug, could you please take a look at this one?
>
> How about something like the following? If everyone is ok with that
> I'll send it to Linus and stable later since it is trivial enough:
>
> dmesg:
>
> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
> (Note that use of the override may cause unknown side effects.)
>
> patch:

Signed-off-by: Doug Thompson <[email protected]>
