2009-04-29 01:23:55

by Jeff Haran

Subject: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()

Hi,

Recent versions of this function contain the following snippets:

    if (err_detect & DDR_EDE_SBE)
        edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
                          syndrome, row_index, 0, mci->ctl_name);

    if (err_detect & DDR_EDE_MBE)
        edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
                          row_index, mci->ctl_name);

I am pretty sure the references to PAGE_MASK should be preceded by a
tilde, as in:

    if (err_detect & DDR_EDE_SBE)
        edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
                          syndrome, row_index, 0, mci->ctl_name);

    if (err_detect & DDR_EDE_MBE)
        edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
                          row_index, mci->ctl_name);


Much as I would like to submit a tested patch like the rest of the
world, I find myself in the situation where the only Freescale target
system I have to test on is running a 3-year-old kernel (2.6.14), which
precedes the introduction of EDAC driver support, at least for
Freescale. So the best I can do is borrow from the new EDAC driver and
backport it to the old kernel.

But I have learned a few things in this process and can thus share what
I've learned as it may be of help to the EDAC driver developers:

1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
significant bit in the register) is set, or else the data in
CAPTURE_ADDRESS may not yet be valid.

2) When you are done scrubbing the memory with the single bit error, you
want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus set up
the ECC capture logic to capture the next single bit error.
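
The two steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the struct, the macro
CAPTURE_ATTR_VLD and both helper functions are hypothetical names, the
registers are modelled as plain memory rather than accessed through the
driver's real I/O accessors, and the VLD bit is taken to be bit value 0x1
because it is described above as the least significant bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two capture registers discussed above. */
struct ddr_capture_regs {
    volatile uint32_t capture_attributes;
    volatile uint32_t capture_address;
};

/* Assumed encoding: VLD is the least significant bit of CAPTURE_ATTRIBUTES. */
#define CAPTURE_ATTR_VLD 0x1u

/*
 * Step 1: read CAPTURE_ATTRIBUTES first and only trust CAPTURE_ADDRESS
 * when VLD is set. Returns true and stores the address on success.
 */
static inline bool read_captured_address(struct ddr_capture_regs *regs,
                                         uint32_t *addr)
{
    if (!(regs->capture_attributes & CAPTURE_ATTR_VLD))
        return false;   /* nothing valid has been captured yet */

    *addr = regs->capture_address;
    return true;
}

/*
 * Step 2: once the error has been handled (scrubbed), write 0 to
 * CAPTURE_ATTRIBUTES to clear VLD so the next error can be captured.
 */
static inline void rearm_capture(struct ddr_capture_regs *regs)
{
    regs->capture_attributes = 0;
}
```

A caller would check the return value of read_captured_address(), report
and scrub the error, and then call rearm_capture().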

Please include this email address in responses as I do not subscribe.

Thanks,

Jeff Haran
Brocade


2009-04-29 04:13:14

by Doug Thompson

Subject: Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()


Dave, can you look at this?

doug thompson

--- On Tue, 4/28/09, Jeff Haran <[email protected]> wrote:

> From: Jeff Haran <[email protected]>
> Subject: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()
> To: [email protected]
> Date: Tuesday, April 28, 2009, 7:23 PM
> Hi,
>
> Recent versions of this function contain the following snippets:
>
>     if (err_detect & DDR_EDE_SBE)
>         edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
>                           syndrome, row_index, 0, mci->ctl_name);
>
>     if (err_detect & DDR_EDE_MBE)
>         edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
>                           row_index, mci->ctl_name);
>
> I am pretty sure the references to PAGE_MASK should be preceded by a
> tilde, as in:
>
>     if (err_detect & DDR_EDE_SBE)
>         edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
>                           syndrome, row_index, 0, mci->ctl_name);
>
>     if (err_detect & DDR_EDE_MBE)
>         edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
>                           row_index, mci->ctl_name);
>
>
> Much as I would like to submit a tested patch like the rest of the
> world, I find myself in the situation where the only Freescale target
> system I have to test on is running a 3-year-old kernel (2.6.14), which
> precedes the introduction of EDAC driver support, at least for
> Freescale. So the best I can do is borrow from the new EDAC driver and
> backport it to the old kernel.
>
> But I have learned a few things in this process and can thus share what
> I've learned as it may be of help to the EDAC driver developers:
>
> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
> significant bit in the register) is set, or else the data in
> CAPTURE_ADDRESS may not yet be valid.
>
> 2) When you are done scrubbing the memory with the single bit error, you
> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus set up
> the ECC capture logic to capture the next single bit error.
>
> Please include this email address in responses as I do not subscribe.
>
> Thanks,
>
> Jeff Haran
> Brocade

2009-04-29 07:42:00

by Andrew Morton

Subject: Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()

Let's cc the suitable people.

On Tue, 28 Apr 2009 18:23:42 -0700 "Jeff Haran" <[email protected]> wrote:

> Hi,
>
> Recent versions of this function contain the following snippets:
>
>     if (err_detect & DDR_EDE_SBE)
>         edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
>                           syndrome, row_index, 0, mci->ctl_name);
>
>     if (err_detect & DDR_EDE_MBE)
>         edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
>                           row_index, mci->ctl_name);
>
> I am pretty sure the references to PAGE_MASK should be preceded by a
> tilde, as in:
>
>     if (err_detect & DDR_EDE_SBE)
>         edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
>                           syndrome, row_index, 0, mci->ctl_name);
>
>     if (err_detect & DDR_EDE_MBE)
>         edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
>                           row_index, mci->ctl_name);
>

Could well be. PAGE_MASK is very easy to get wrong. I've _never_
trusted my own memory of it and I always have to go back to the
definition when reviewing code :(

> Much as I would like to submit a tested patch like the rest of the
> world, I find myself in the situation where the only Freescale target
> system I have to test on is running a 3-year-old kernel (2.6.14), which
> precedes the introduction of EDAC driver support, at least for
> Freescale. So the best I can do is borrow from the new EDAC driver and
> backport it to the old kernel.
>
> But I have learned a few things in this process and can thus share what
> I've learned as it may be of help to the EDAC driver developers:
>
> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you want
> to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
> significant bit in the register) is set, or else the data in
> CAPTURE_ADDRESS may not yet be valid.
>
> 2) When you are done scrubbing the memory with the single bit error, you
> want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD and thus set up
> the ECC capture logic to capture the next single bit error.
>

2009-04-29 12:48:09

by Kumar Gala

Subject: Re: bug in drivers/edac/mpc85xx_edac.c:mpc85xx_mc_check()


On Apr 29, 2009, at 2:37 AM, Andrew Morton wrote:

> Let's cc the suitable people.
>
> On Tue, 28 Apr 2009 18:23:42 -0700 "Jeff Haran" <[email protected]>
> wrote:
>
>> Hi,
>>
>> Recent versions of this function contain the following snippets:
>>
>>     if (err_detect & DDR_EDE_SBE)
>>         edac_mc_handle_ce(mci, pfn, err_addr & PAGE_MASK,
>>                           syndrome, row_index, 0, mci->ctl_name);
>>
>>     if (err_detect & DDR_EDE_MBE)
>>         edac_mc_handle_ue(mci, pfn, err_addr & PAGE_MASK,
>>                           row_index, mci->ctl_name);
>>
>> I am pretty sure the references to PAGE_MASK should be preceded by a
>> tilde, as in:
>>
>>     if (err_detect & DDR_EDE_SBE)
>>         edac_mc_handle_ce(mci, pfn, err_addr & ~PAGE_MASK,
>>                           syndrome, row_index, 0, mci->ctl_name);
>>
>>     if (err_detect & DDR_EDE_MBE)
>>         edac_mc_handle_ue(mci, pfn, err_addr & ~PAGE_MASK,
>>                           row_index, mci->ctl_name);
>>
>
> Could well be. PAGE_MASK is very easy to get wrong. I've _never_
> trusted my own memory of it and I always have to go back to the
> definition when reviewing code :(

This should be ~PAGE_MASK to get the offset into the page.
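
To illustrate the difference, here is a minimal user-space sketch. It
assumes 4 KiB pages, and the DEMO_-prefixed macros are stand-ins that
mirror the usual kernel definitions of PAGE_SHIFT, PAGE_SIZE and
PAGE_MASK; they and the three helper functions are illustrative names,
not code from the driver.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
/* Same shape as the kernel's PAGE_MASK: low bits clear, high bits set. */
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* addr & PAGE_MASK keeps the page base and throws away the offset. */
static inline unsigned long page_base(unsigned long addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* addr & ~PAGE_MASK keeps only the offset within the page. */
static inline unsigned long page_offset(unsigned long addr)
{
    return addr & ~DEMO_PAGE_MASK;
}

/* The page frame number is the address shifted down by PAGE_SHIFT. */
static inline unsigned long page_frame(unsigned long addr)
{
    return addr >> DEMO_PAGE_SHIFT;
}
```

For an error address of 0x12345678, page_base() gives 0x12345000,
page_frame() gives 0x12345 and page_offset() gives 0x678. Passing the
page base where an offset is expected is the mistake being discussed.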

>> Much as I would like to submit a tested patch like the rest of the
>> world, I find myself in the situation where the only Freescale target
>> system I have to test on is running a 3-year-old kernel (2.6.14),
>> which precedes the introduction of EDAC driver support, at least for
>> Freescale. So the best I can do is borrow from the new EDAC driver
>> and backport it to the old kernel.
>>
>> But I have learned a few things in this process and can thus share
>> what
>> I've learned as it may be of help to the EDAC driver developers:
>>
>> 1) Before you read the Freescale 8548 CAPTURE_ADDRESS register, you
>> want to read CAPTURE_ATTRIBUTES first and make sure the VLD bit (least
>> significant bit in the register) is set, or else the data in
>> CAPTURE_ADDRESS may not yet be valid.
>>
>> 2) When you are done scrubbing the memory with the single bit
>> error, you want to write 0 to CAPTURE_ATTRIBUTES so as to clear VLD
>> and thus set up the ECC capture logic to capture the next single bit
>> error.

This is a correct description based on how FSL error HW works.

- k