2014-04-09 11:32:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On Sun, Mar 02, 2014 at 08:02:40PM +0530, Punnaiah Choudary Kalluri wrote:
> Add support for ARM Pl310 L2 cache controller parity error
>
> Signed-off-by: Punnaiah Choudary Kalluri <[email protected]>
> ---
> .../devicetree/bindings/edac/pl310_edac_l2.txt | 19 ++
> drivers/edac/Kconfig | 7 +
> drivers/edac/Makefile | 1 +
> drivers/edac/pl310_edac_l2.c | 236 ++++++++++++++++++++
> 4 files changed, 263 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
> create mode 100644 drivers/edac/pl310_edac_l2.c
>
> diff --git a/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
> new file mode 100644
> index 0000000..94fbb8d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
> @@ -0,0 +1,19 @@
> +Pl310 L2 Cache EDAC driver, it does reports the data and tag ram parity errors.
> +
> +Required properties:
> +- compatible: Should be "arm,pl310-cache".
> +- intterupts: Interrupt number to the cpu.
> +- reg: Physical base address and size of cache controller's memory mapped
> + registers
> +
> +Example:
> +++++++++
> +
> + L2: cache-controller {
> + compatible = "arm,pl310-cache";
> + interrupts = <0 2 4>;
> + reg = <0xf8f02000 0x1000>;
> + };
> +
> +PL310 L2 Cache EDAC driver detects the Parity enable state by reading the
> +appropriate control register.
> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
> index 878f090..059ac31 100644
> --- a/drivers/edac/Kconfig
> +++ b/drivers/edac/Kconfig
> @@ -326,6 +326,13 @@ config EDAC_TILE
> Support for error detection and correction on the
> Tilera memory controller.
>
> +config EDAC_PL310_L2
> + tristate "Pl310 L2 Cache Controller"
> + depends on EDAC_MM_EDAC && ARM
> + help
> + Support for parity error detection on L2 cache controller
> + data and tag ram memory
> +


Ok, so I'm looking at this after having looked at the synopsys thing
and it looks very similar in functionality - it does the basic dance of
registering and setting up stuff, only using different devicetree nodes,
regs, etc.

However, it adds a new file under drivers/edac/ and I'm wondering if it
wouldn't be better to simply create a xilinx_edac.c and put all your
stuff in there, maybe even share code by abstracting it nicely. Having
a separate driver only for a single L2 cache controller seems kinda too
granulary for me.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


2014-04-09 13:18:32

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On Wed, Apr 9, 2014 at 6:32 AM, Borislav Petkov <[email protected]> wrote:
> On Sun, Mar 02, 2014 at 08:02:40PM +0530, Punnaiah Choudary Kalluri wrote:
>> Add support for ARM Pl310 L2 cache controller parity error
>>
>> Signed-off-by: Punnaiah Choudary Kalluri <[email protected]>
>> ---
>> .../devicetree/bindings/edac/pl310_edac_l2.txt | 19 ++
>> drivers/edac/Kconfig | 7 +
>> drivers/edac/Makefile | 1 +
>> drivers/edac/pl310_edac_l2.c | 236 ++++++++++++++++++++
>> 4 files changed, 263 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>> create mode 100644 drivers/edac/pl310_edac_l2.c
>>
>> diff --git a/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>> new file mode 100644
>> index 0000000..94fbb8d
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/edac/pl310_edac_l2.txt
>> @@ -0,0 +1,19 @@
>> +Pl310 L2 Cache EDAC driver, it does reports the data and tag ram parity errors.
>> +
>> +Required properties:
>> +- compatible: Should be "arm,pl310-cache".
>> +- intterupts: Interrupt number to the cpu.
>> +- reg: Physical base address and size of cache controller's memory mapped
>> + registers
>> +
>> +Example:
>> +++++++++
>> +
>> + L2: cache-controller {
>> + compatible = "arm,pl310-cache";
>> + interrupts = <0 2 4>;
>> + reg = <0xf8f02000 0x1000>;
>> + };
>> +
>> +PL310 L2 Cache EDAC driver detects the Parity enable state by reading the
>> +appropriate control register.
>> diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
>> index 878f090..059ac31 100644
>> --- a/drivers/edac/Kconfig
>> +++ b/drivers/edac/Kconfig
>> @@ -326,6 +326,13 @@ config EDAC_TILE
>> Support for error detection and correction on the
>> Tilera memory controller.
>>
>> +config EDAC_PL310_L2
>> + tristate "Pl310 L2 Cache Controller"
>> + depends on EDAC_MM_EDAC && ARM
>> + help
>> + Support for parity error detection on L2 cache controller
>> + data and tag ram memory
>> +
>
>
> Ok, so I'm looking at this after having looked at the synopsys thing
> and it looks very similar in functionality - it does the basic dance of
> registering and setting up stuff, only using different devicetree nodes,
> regs, etc.
>
> However, it adds a new file under drivers/edac/ and I'm wondering if it
> wouldn't be better to simply create a xilinx_edac.c and put all your
> stuff in there, maybe even share code by abstracting it nicely. Having
> a separate driver only for a single L2 cache controller seems kinda too
> granulary for me.

I don't think so, the PL310 is present on lots of ARM chips besides
Xilinx. I don't know how many support parity as that is optional. In
fact the highbank_l2_edac.c is for the PL310 as well, but the
registers it uses is all custom logic added for ECC and there is no
part of the PL310 h/w used by the driver.

If there is lots duplication, then that's a sign the framework needs
to handle more of the boilerplate pieces. There could be a "simple"
driver/library for devices which are no more than some registers, an
interrupt handler and static information about the type of EDAC
device.

Rob

2014-04-09 15:19:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On Wed, Apr 09, 2014 at 08:18:28AM -0500, Rob Herring wrote:
> I don't think so, the PL310 is present on lots of ARM chips besides
> Xilinx. I don't know how many support parity as that is optional. In
> fact the highbank_l2_edac.c is for the PL310 as well, but the
> registers it uses is all custom logic added for ECC and there is no
> part of the PL310 h/w used by the driver.

Oh ok, so highbank_l2 and PL310 could theoretically be merged together
in one compilation unit, even if they don't really share code at all...

> If there is lots duplication, then that's a sign the framework needs
> to handle more of the boilerplate pieces. There could be a "simple"
> driver/library for devices which are no more than some registers, an
> interrupt handler and static information about the type of EDAC
> device.

Yeah, it's not that - I'm just getting worried that I'm receiving an
EDAC driver for each piece of silicon out there and would like to still
keep drivers/edac/ sane and be able to control that wild growth.

I'm just thinking out loud here, bear with me pls:

Frankly, having a single compilation unit contain similar silicon
functionality could be a good way to put a hold on the growth but the
disadvantage of this is fatter drivers. Which wouldn't matter all too
much but after a certain level of fat, they might need splitting.

And the highbank version is nothing but the big probe routine and a
small irq handler.

And the PL310 is similar but also with a poller.

I guess, if they don't share functionality at all, putting them together
might not be worth it. Hohummm.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-09 17:29:54

by Punnaiah Choudary

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

There is a driver file cache-l2x0.c under arch/arm/mm for pl310 cache
configuration and management. Russel king had suggested to use
single driver file for both pl310 edac implementation and cache-l2x0.c
Here is the thread
http://www.spinics.net/lists/arm-kernel/msg320407.html

please provide your inputs

Thanks,
Punnaiah

On Wed, Apr 9, 2014 at 8:49 PM, Borislav Petkov <[email protected]> wrote:
> On Wed, Apr 09, 2014 at 08:18:28AM -0500, Rob Herring wrote:
>> I don't think so, the PL310 is present on lots of ARM chips besides
>> Xilinx. I don't know how many support parity as that is optional. In
>> fact the highbank_l2_edac.c is for the PL310 as well, but the
>> registers it uses is all custom logic added for ECC and there is no
>> part of the PL310 h/w used by the driver.
>
> Oh ok, so highbank_l2 and PL310 could theoretically be merged together
> in one compilation unit, even if they don't really share code at all...
>
>> If there is lots duplication, then that's a sign the framework needs
>> to handle more of the boilerplate pieces. There could be a "simple"
>> driver/library for devices which are no more than some registers, an
>> interrupt handler and static information about the type of EDAC
>> device.
>
> Yeah, it's not that - I'm just getting worried that I'm receiving an
> EDAC driver for each piece of silicon out there and would like to still
> keep drivers/edac/ sane and be able to control that wild growth.
>
> I'm just thinking out loud here, bear with me pls:
>
> Frankly, having a single compilation unit contain similar silicon
> functionality could be a good way to put a hold on the growth but the
> disadvantage of this is fatter drivers. Which wouldn't matter all too
> much but after a certain level of fat, they might need splitting.
>
> And the highbank version is nothing but the big probe routine and a
> small irq handler.
>
> And the PL310 is similar but also with a poller.
>
> I guess, if they don't share functionality at all, putting them together
> might not be worth it. Hohummm.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-04-09 17:47:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On Wed, Apr 09, 2014 at 10:59:49PM +0530, Punnaiah Choudary wrote:
> There is a driver file cache-l2x0.c under arch/arm/mm for pl310 cache
> configuration and management. Russel king had suggested to use
> single driver file for both pl310 edac implementation and cache-l2x0.c
> Here is the thread
> http://www.spinics.net/lists/arm-kernel/msg320407.html
>
> please provide your inputs

Well, it is very simple - you don't really need an EDAC compilation unit
just so that you can report errors - you can very well do that in your
cache-l2x0.c and boom, problem solved.

Simply move the irq handler and the parity error checker to that file.

:-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-10 06:12:25

by Michal Simek

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On 04/09/2014 07:47 PM, Borislav Petkov wrote:
> On Wed, Apr 09, 2014 at 10:59:49PM +0530, Punnaiah Choudary wrote:
>> There is a driver file cache-l2x0.c under arch/arm/mm for pl310 cache
>> configuration and management. Russel king had suggested to use
>> single driver file for both pl310 edac implementation and cache-l2x0.c
>> Here is the thread
>> http://www.spinics.net/lists/arm-kernel/msg320407.html
>>
>> please provide your inputs
>
> Well, it is very simple - you don't really need an EDAC compilation unit
> just so that you can report errors - you can very well do that in your
> cache-l2x0.c and boom, problem solved.
>
> Simply move the irq handler and the parity error checker to that file.

I am just curious about this recommendation. Does it mean that we shouldn't
use edac interface just for reporting problems?

I didn't play with it but I guess that there is a record about edac driver
via sysfs, etc. You could read some useful information that you know
what it is protected and I hope that edac interface bring any value
to reporting errors comparing to just printk message from IRQ handler.

My question is if using printk in IRQ handler and report problem is equal to
reporting this via edac interface. Or it is just easy way to do but
using edac interface is right solution and how to do it properly
is different question.

Thanks,
Michal

--
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: http://www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform



Attachments:
signature.asc (263.00 B)
OpenPGP digital signature

2014-04-10 09:03:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On Thu, Apr 10, 2014 at 08:12:17AM +0200, Michal Simek wrote:
> I am just curious about this recommendation. Does it mean that we
> shouldn't use edac interface just for reporting problems?

Yes, we should. But, if you want to have two different drivers accessing
the hardware and have to build synchronization around it, I'm simply
putting the simplest solution on the table in case it is viable. If all
you want to do is report errors to dmesg, then you don't need the edac
stuff.

> I didn't play with it but I guess that there is a record about edac
> driver via sysfs, etc. You could read some useful information that you
> know what it is protected and I hope that edac interface bring any
> value to reporting errors comparing to just printk message from IRQ
> handler.

Yes, you can read error counts and DIMM layout. I'm adding a full dump
of sysfs on my machine at the end. Take a look at it and think whether
this could be something you could use/need/like.

But, if you have only one DIMM slot available on the hw (I'm not even
going to pretend to know your possible DIMM layouts) then maybe EDAC is
not needed at all - people would know which DIMM is at fault :-)

And so on and so on.

> My question is if using printk in IRQ handler and report problem is
> equal to reporting this via edac interface. Or it is just easy way
> to do but using edac interface is right solution and how to do it
> properly is different question.

Yeah, it would probably be easier if you would point out first what you
want to use the edac interface for and we can tell you whether it's
already there/doable/makes sense, etc.

Thanks.

/sys/devices/system/edac/mc/mc0/dbam:0x0000000000000077
/sys/devices/system/edac/mc/mc0/dhar:0x00000000c0004003
/sys/devices/system/edac/mc/mc0/inject_ecc_vector:0x0
/sys/devices/system/edac/mc/mc0/seconds_since_reset:169051
/sys/devices/system/edac/mc/mc0/ce_noinfo_count:0
/sys/devices/system/edac/mc/mc0/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank0/size:2048
/sys/devices/system/edac/mc/mc0/rank0/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank0/dimm_location:csrow 0 channel 0
/sys/devices/system/edac/mc/mc0/rank0/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank0/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank0/dimm_label:mc#0csrow#0channel#0
/sys/devices/system/edac/mc/mc0/rank0/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank1/size:2048
/sys/devices/system/edac/mc/mc0/rank1/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank1/dimm_location:csrow 0 channel 1
/sys/devices/system/edac/mc/mc0/rank1/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank1/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank1/dimm_label:mc#0csrow#0channel#1
/sys/devices/system/edac/mc/mc0/rank1/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank2/size:2048
/sys/devices/system/edac/mc/mc0/rank2/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank2/dimm_location:csrow 1 channel 0
/sys/devices/system/edac/mc/mc0/rank2/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank2/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank2/dimm_label:mc#0csrow#1channel#0
/sys/devices/system/edac/mc/mc0/rank2/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank3/size:2048
/sys/devices/system/edac/mc/mc0/rank3/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank3/dimm_location:csrow 1 channel 1
/sys/devices/system/edac/mc/mc0/rank3/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank3/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank3/dimm_label:mc#0csrow#1channel#1
/sys/devices/system/edac/mc/mc0/rank3/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank4/size:2048
/sys/devices/system/edac/mc/mc0/rank4/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank4/dimm_location:csrow 2 channel 0
/sys/devices/system/edac/mc/mc0/rank4/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank4/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank4/dimm_label:mc#0csrow#2channel#0
/sys/devices/system/edac/mc/mc0/rank4/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank5/size:2048
/sys/devices/system/edac/mc/mc0/rank5/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank5/dimm_location:csrow 2 channel 1
/sys/devices/system/edac/mc/mc0/rank5/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank5/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank5/dimm_label:mc#0csrow#2channel#1
/sys/devices/system/edac/mc/mc0/rank5/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank6/size:2048
/sys/devices/system/edac/mc/mc0/rank6/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank6/dimm_location:csrow 3 channel 0
/sys/devices/system/edac/mc/mc0/rank6/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank6/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank6/dimm_label:mc#0csrow#3channel#0
/sys/devices/system/edac/mc/mc0/rank6/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/rank7/size:2048
/sys/devices/system/edac/mc/mc0/rank7/power/async:disabled
/sys/devices/system/edac/mc/mc0/rank7/dimm_location:csrow 3 channel 1
/sys/devices/system/edac/mc/mc0/rank7/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank7/dimm_mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/rank7/dimm_label:mc#0csrow#3channel#1
/sys/devices/system/edac/mc/mc0/rank7/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/sdram_scrub_rate:195300
/sys/devices/system/edac/mc/mc0/inject_word:0x0
/sys/devices/system/edac/mc/mc0/size_mb:16384
/sys/devices/system/edac/mc/mc0/max_location:csrow 7 channel 1
/sys/devices/system/edac/mc/mc0/mc_name:F15h
/sys/devices/system/edac/mc/mc0/csrow0/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:mc#0csrow#0channel#0
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:mc#0csrow#0channel#1
/sys/devices/system/edac/mc/mc0/csrow0/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow0/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow0/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow0/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow1/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow1/ch0_dimm_label:mc#0csrow#1channel#0
/sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch1_dimm_label:mc#0csrow#1channel#1
/sys/devices/system/edac/mc/mc0/csrow1/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow1/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow1/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow1/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow1/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow2/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow2/ch0_dimm_label:mc#0csrow#2channel#0
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch1_dimm_label:mc#0csrow#2channel#1
/sys/devices/system/edac/mc/mc0/csrow2/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow2/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow2/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow2/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/csrow3/power/async:disabled
/sys/devices/system/edac/mc/mc0/csrow3/ch0_dimm_label:mc#0csrow#3channel#0
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch1_dimm_label:mc#0csrow#3channel#1
/sys/devices/system/edac/mc/mc0/csrow3/size_mb:4096
/sys/devices/system/edac/mc/mc0/csrow3/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ue_count:0
/sys/devices/system/edac/mc/mc0/csrow3/dev_type:Unknown
/sys/devices/system/edac/mc/mc0/csrow3/ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/csrow3/mem_type:Unbuffered-DDR3
/sys/devices/system/edac/mc/mc0/topmem2:0x000000043f000000
/sys/devices/system/edac/mc/mc0/ue_count:0
/sys/devices/system/edac/mc/mc0/topmem:0x00000000c0000000
/sys/devices/system/edac/mc/mc0/ue_noinfo_count:0
/sys/devices/system/edac/mc/mc0/ce_count:0
/sys/devices/system/edac/mc/mc0/dram_hole:c0000000 40000000 40000000
/sys/devices/system/edac/mc/mc0/inject_section:0x0
/sys/devices/system/edac/mc/power/async:disabled
/sys/devices/system/edac/pci/pci0/pe_count:0
/sys/devices/system/edac/pci/pci0/npe_count:0
/sys/devices/system/edac/pci/edac_pci_log_npe:1
/sys/devices/system/edac/pci/edac_pci_panic_on_pe:0
/sys/devices/system/edac/pci/edac_pci_log_pe:1
/sys/devices/system/edac/pci/pci_nonparity_count:0
/sys/devices/system/edac/pci/check_pci_errors:0
/sys/devices/system/edac/pci/pci_parity_count:0
/sys/devices/system/edac/power/async:disabled

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-10 10:09:18

by Michal Simek

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

Hi Borislav,

>> My question is if using printk in IRQ handler and report problem is
>> equal to reporting this via edac interface. Or it is just easy way
>> to do but using edac interface is right solution and how to do it
>> properly is different question.
>
> Yeah, it would probably be easier if you would point out first what you
> want to use the edac interface for and we can tell you whether it's
> already there/doable/makes sense, etc.

OK. This is the v1 that's why we are having this discussion now
and I really appreciate your recommendation.
It wasn't too complicated to create this driver that's why not big deal
and it is always better to talk about the code and how to handle it properly.

I had a call with Punnaiah regarding this L2 edac driver and please
correct me if my understanding of pl310 parity error
is wrong (I didn't read pl310 specification).
There are 2 memories in pl310 - data and tag,
Through controller we are able to detect 2 errors:
data parity error and tag parity error.
Both of them are uncorrectable errors and could be just reported.
L2 contains data and instructions together that's why error can
happen in data or instruction part.

The question here is.
This driver is just reporting problem through edac interface
which is counting that errors and provide an unified way
how to report problems.

Maybe as you said we don't need to use edac interface at all
but by design because every error means that there is the problem and
error should be reported and system should be reset because we just
don't know where the problem is. We know that we have a problem.

The question also is if we should execute any code because the problem
can be with instructions and system should just reset.

Isn't there any security issue that even executing any code is a problem?

From the code I see that IRQ can be raised also for different things
because only errors are handled here (BTW: IRQ_HANDLED should be return when
IRQ is actually handled).

I just want to make sure that edac is right interface, we have right reactions
and this is how to handle this problem.

Thanks,
Michal

--
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: http://www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform



Attachments:
signature.asc (263.00 B)
OpenPGP digital signature

2014-04-11 13:14:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] edac: add support for ARM PL310 L2 cache parity

On Thu, Apr 10, 2014 at 12:09:03PM +0200, Michal Simek wrote:
> The question here is. This driver is just reporting problem through
> edac interface which is counting that errors and provide an unified
> way how to report problems.

Yes, normally you can use edac for reporting and error counting. But,
if, as I said earlier, it is easier to solve your issue of having two
entities touch one hardware and synchronizing around it is too much,
just for this one case, you can simply report the errors with simple
printk, without the edac interface.

This is why I was asking the practical question of why do you even need
the edac interface? If it is only for reporting, use printk and solve
the problem of having two drivers.

> Maybe as you said we don't need to use edac interface at all but by
> design because every error means that there is the problem and error
> should be reported and system should be reset because we just don't
> know where the problem is. We know that we have a problem.
>
> The question also is if we should execute any code because the problem
> can be with instructions and system should just reset.
>
> Isn't there any security issue that even executing any code is a
> problem?

Well, this is up to you to answer. If an UE (Uncorrectable Error) causes
data to get corrupted on your system, which, as a result, corrupts
visible state which lands on storage, you definitely want to stop
executing any code. x86 deals very rigorously with errors like those
by running an exception handler, on AMD there's also this thing called
syncflood which stops any execution and a warm reset happens.

So you have to think hard what those UEs cause on your systems and only
then act accordingly. If something bad like the above happens, the last
thing you want to do is report them to dmesg.

HTH.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--