2019-05-30 10:17:23

by Hanna Hawa

Subject: [PATCH 0/2] Add support for Amazon's Annapurna Labs EDAC for L1/L2

Add support for error detection and correction for the L1/L2 caches of Amazon's
Annapurna Labs SoCs.
Alpine SoCs support L1 and L2 error correction checking based on the ARM implementation.

Hanna Hawa (2):
dt-bindings: EDAC: add Amazon Annapurna Labs EDAC binding
edac: add support for Amazon's Annapurna Labs EDAC

.../devicetree/bindings/edac/amazon-al-edac.txt | 16 ++
MAINTAINERS | 7 +
drivers/edac/Kconfig | 9 +
drivers/edac/Makefile | 1 +
drivers/edac/amazon_al_ca57_edac.c | 283 +++++++++++++++++++++
5 files changed, 316 insertions(+)
create mode 100644 Documentation/devicetree/bindings/edac/amazon-al-edac.txt
create mode 100644 drivers/edac/amazon_al_ca57_edac.c

--
2.7.4


2019-05-30 10:17:45

by Hanna Hawa

Subject: [PATCH 1/2] dt-bindings: EDAC: add Amazon Annapurna Labs EDAC binding

Signed-off-by: Hanna Hawa <[email protected]>
---
.../devicetree/bindings/edac/amazon-al-edac.txt | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
create mode 100644 Documentation/devicetree/bindings/edac/amazon-al-edac.txt

diff --git a/Documentation/devicetree/bindings/edac/amazon-al-edac.txt b/Documentation/devicetree/bindings/edac/amazon-al-edac.txt
new file mode 100644
index 0000000..76165a4
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/amazon-al-edac.txt
@@ -0,0 +1,16 @@
+* Amazon Annapurna Labs EDAC
+
+Amazon Annapurna Labs Alpine V2 SoC is based on ARM Cortex-A57, and Alpine V3 SoC is
+based on ARM Cortex-A72.
+Alpine SoCs support L1 and L2 error correction checking based on the ARM implementation.
+
+Required properties:
+- compatible:
+ should be "amazon,al-cortex-a57-edac" for Alpine V2.
+ should be "amazon,al-cortex-a72-edac" for Alpine V3.
+
+Example:
+
+ edac_l1_l2 {
+ compatible = "amazon,al-cortex-a57-edac";
+ };
--
2.7.4

2019-05-30 10:19:07

by Hanna Hawa

Subject: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Add support for error detection and correction for Amazon's Annapurna
Labs SoCs for L1/L2 caches.

Amazon's Annapurna Labs SoCs are based on ARM Cortex-A57 and Cortex-A72; the
driver supports both cores, selected via the compatible string.

Signed-off-by: Hanna Hawa <[email protected]>
---
MAINTAINERS | 7 +
drivers/edac/Kconfig | 9 ++
drivers/edac/Makefile | 1 +
drivers/edac/amazon_al_ca57_edac.c | 283 +++++++++++++++++++++++++++++++++++++
4 files changed, 300 insertions(+)
create mode 100644 drivers/edac/amazon_al_ca57_edac.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5cfbea4..87fab6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5645,6 +5645,13 @@ S: Supported
F: Documentation/filesystems/ecryptfs.txt
F: fs/ecryptfs/

+EDAC-AMAZON-AL
+M: Hanna Hawa <[email protected]>
+L: [email protected]
+S: Supported
+F: drivers/edac/amazon_al_ca57_edac.c
+F: Documentation/devicetree/bindings/edac/amazon-al-edac.txt
+
EDAC-AMD64
M: Borislav Petkov <[email protected]>
L: [email protected]
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 5e2e034..1a982f8 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -504,4 +504,13 @@ config EDAC_ASPEED
First, ECC must be configured in the bootloader. Then, this driver
will expose error counters via the EDAC kernel framework.

+config EDAC_AMAZON_AL
+ tristate "Amazon AL EDAC"
+ depends on ARCH_ALPINE
+ help
+ Support for error detection and correction for
+ Amazon's Annapurna Labs SoCs.
+
+ This driver detects errors on L1/L2 caches.
+
endif # EDAC
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 89ad4a84..7e08974 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -84,3 +84,4 @@ obj-$(CONFIG_EDAC_XGENE) += xgene_edac.o
obj-$(CONFIG_EDAC_TI) += ti_edac.o
obj-$(CONFIG_EDAC_QCOM) += qcom_edac.o
obj-$(CONFIG_EDAC_ASPEED) += aspeed_edac.o
+obj-$(CONFIG_EDAC_AMAZON_AL) += amazon_al_ca57_edac.o
diff --git a/drivers/edac/amazon_al_ca57_edac.c b/drivers/edac/amazon_al_ca57_edac.c
new file mode 100644
index 0000000..08237c0
--- /dev/null
+++ b/drivers/edac/amazon_al_ca57_edac.c
@@ -0,0 +1,283 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+
+#include <linux/atomic.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+
+#include "edac_device.h"
+#include "edac_module.h"
+
+#define DRV_NAME "al_cache_edac"
+
+/* Same bit assignments of CPUMERRSR_EL1 and L2MERRSR_EL1 in ARM CA57/CA72 */
+#define ARM_CA57_CPUMERRSR_INDEX_OFF (0)
+#define ARM_CA57_CPUMERRSR_INDEX_MASK (0x3FFFF)
+#define ARM_CA57_CPUMERRSR_BANK_WAY_OFF (18)
+#define ARM_CA57_CPUMERRSR_BANK_WAY_MASK (0x1F)
+#define ARM_CA57_CPUMERRSR_RAM_ID_OFF (24)
+#define ARM_CA57_CPUMERRSR_RAM_ID_MASK (0x7F)
+#define ARM_CA57_L1_I_TAG_RAM 0x00
+#define ARM_CA57_L1_I_DATA_RAM 0x01
+#define ARM_CA57_L1_D_TAG_RAM 0x08
+#define ARM_CA57_L1_D_DATA_RAM 0x09
+#define ARM_CA57_TLB_RAM 0x18
+#define ARM_CA57_CPUMERRSR_VALID_OFF (31)
+#define ARM_CA57_CPUMERRSR_VALID_MASK (0x1)
+#define ARM_CA57_CPUMERRSR_REPEAT_OFF (32)
+#define ARM_CA57_CPUMERRSR_REPEAT_MASK (0xFF)
+#define ARM_CA57_CPUMERRSR_OTHER_OFF (40)
+#define ARM_CA57_CPUMERRSR_OTHER_MASK (0xFF)
+#define ARM_CA57_CPUMERRSR_FATAL_OFF (63)
+#define ARM_CA57_CPUMERRSR_FATAL_MASK (0x1)
+
+#define ARM_CA57_L2MERRSR_INDEX_OFF (0)
+#define ARM_CA57_L2MERRSR_INDEX_MASK (0x3FFFF)
+#define ARM_CA57_L2MERRSR_CPUID_WAY_OFF (18)
+#define ARM_CA57_L2MERRSR_CPUID_WAY_MASK (0xF)
+#define ARM_CA57_L2MERRSR_RAMID_OFF (24)
+#define ARM_CA57_L2MERRSR_RAMID_MASK (0x7F)
+#define ARM_CA57_L2_TAG_RAM 0x10
+#define ARM_CA57_L2_DATA_RAM 0x11
+#define ARM_CA57_L2_SNOOP_RAM 0x12
+#define ARM_CA57_L2_DIRTY_RAM 0x14
+#define ARM_CA57_L2_INC_PLRU_RAM 0x18
+#define ARM_CA57_L2MERRSR_VALID_OFF (31)
+#define ARM_CA57_L2MERRSR_VALID_MASK (0x1)
+#define ARM_CA57_L2MERRSR_REPEAT_OFF (32)
+#define ARM_CA57_L2MERRSR_REPEAT_MASK (0xFF)
+#define ARM_CA57_L2MERRSR_OTHER_OFF (40)
+#define ARM_CA57_L2MERRSR_OTHER_MASK (0xFF)
+#define ARM_CA57_L2MERRSR_FATAL_OFF (63)
+#define ARM_CA57_L2MERRSR_FATAL_MASK (0x1)
+
+static inline u64 read_cpumerrsr_el1(void)
+{
+ u64 val;
+
+ asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (val));
+
+ return val;
+}
+
+static inline void write_cpumerrsr_el1(u64 val)
+{
+ asm volatile("msr s3_1_c15_c2_2, %0" :: "r" (val));
+}
+
+static inline u64 read_l2merrsr_el1(void)
+{
+ u64 val;
+
+ asm volatile("mrs %0, s3_1_c15_c2_3" : "=r" (val));
+
+ return val;
+}
+
+static inline void write_l2merrsr_el1(u64 val)
+{
+ asm volatile("msr s3_1_c15_c2_3, %0" :: "r" (val));
+}
+
+static void al_a57_edac_cpumerrsr(void *arg)
+{
+ struct edac_device_ctl_info *edac_dev =
+ (struct edac_device_ctl_info *)arg;
+ int cpu;
+ u32 index, way, ramid, repeat, other, fatal;
+ u64 val = read_cpumerrsr_el1();
+
+ /* Return if no valid error */
+ if (!((val >> ARM_CA57_CPUMERRSR_VALID_OFF) &
+ ARM_CA57_CPUMERRSR_VALID_MASK))
+ return;
+
+ cpu = smp_processor_id();
+ index = (val >> ARM_CA57_CPUMERRSR_INDEX_OFF) &
+ ARM_CA57_CPUMERRSR_INDEX_MASK;
+ way = (val >> ARM_CA57_CPUMERRSR_BANK_WAY_OFF) &
+ ARM_CA57_CPUMERRSR_BANK_WAY_MASK;
+ ramid = (val >> ARM_CA57_CPUMERRSR_RAM_ID_OFF) &
+ ARM_CA57_CPUMERRSR_RAM_ID_MASK;
+ repeat = (val >> ARM_CA57_CPUMERRSR_REPEAT_OFF) &
+ ARM_CA57_CPUMERRSR_REPEAT_MASK;
+ other = (val >> ARM_CA57_CPUMERRSR_OTHER_OFF) &
+ ARM_CA57_CPUMERRSR_OTHER_MASK;
+ fatal = (val >> ARM_CA57_CPUMERRSR_FATAL_OFF) &
+ ARM_CA57_CPUMERRSR_FATAL_MASK;
+
+ edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
+ edac_printk(KERN_CRIT, DRV_NAME, "CPU%d L1 %serror detected\n",
+ cpu, (fatal) ? "Fatal " : "");
+ edac_printk(KERN_CRIT, DRV_NAME, "RAMID=");
+
+ switch (ramid) {
+ case ARM_CA57_L1_I_TAG_RAM:
+ pr_cont("'L1-I Tag RAM' index=%d way=%d", index, way);
+ break;
+ case ARM_CA57_L1_I_DATA_RAM:
+ pr_cont("'L1-I Data RAM' index=%d bank= %d", index, way);
+ break;
+ case ARM_CA57_L1_D_TAG_RAM:
+ pr_cont("'L1-D Tag RAM' index=%d way=%d", index, way);
+ break;
+ case ARM_CA57_L1_D_DATA_RAM:
+ pr_cont("'L1-D Data RAM' index=%d bank=%d", index, way);
+ break;
+ case ARM_CA57_TLB_RAM:
+ pr_cont("'TLB RAM' index=%d", index);
+ break;
+ default:
+ pr_cont("'unknown'");
+ break;
+ }
+
+ pr_cont(", repeat=%d, other=%d (CPUMERRSR_EL1=0x%llx)\n", repeat, other,
+ val);
+
+ write_cpumerrsr_el1(0);
+}
+
+static void al_a57_edac_l2merrsr(void *arg)
+{
+ struct edac_device_ctl_info *edac_dev =
+ (struct edac_device_ctl_info *)arg;
+ int cpu;
+ u32 index, way, ramid, repeat, other, fatal;
+ u64 val = read_l2merrsr_el1();
+
+ /* Return if no valid error */
+ if (!((val >> ARM_CA57_L2MERRSR_VALID_OFF) &
+ ARM_CA57_L2MERRSR_VALID_MASK))
+ return;
+
+ cpu = smp_processor_id();
+ index = (val >> ARM_CA57_L2MERRSR_INDEX_OFF) &
+ ARM_CA57_L2MERRSR_INDEX_MASK;
+ way = (val >> ARM_CA57_L2MERRSR_CPUID_WAY_OFF) &
+ ARM_CA57_L2MERRSR_CPUID_WAY_MASK;
+ ramid = (val >> ARM_CA57_L2MERRSR_RAMID_OFF) &
+ ARM_CA57_L2MERRSR_RAMID_MASK;
+ repeat = (val >> ARM_CA57_L2MERRSR_REPEAT_OFF) &
+ ARM_CA57_L2MERRSR_REPEAT_MASK;
+ other = (val >> ARM_CA57_L2MERRSR_OTHER_OFF) &
+ ARM_CA57_L2MERRSR_OTHER_MASK;
+ fatal = (val >> ARM_CA57_L2MERRSR_FATAL_OFF) &
+ ARM_CA57_L2MERRSR_FATAL_MASK;
+
+ edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
+ edac_printk(KERN_CRIT, DRV_NAME, "CPU%d L2 %serror detected\n",
+ cpu, (fatal) ? "Fatal " : "");
+ edac_printk(KERN_CRIT, DRV_NAME, "RAMID=");
+
+ switch (ramid) {
+ case ARM_CA57_L2_TAG_RAM:
+ pr_cont("'L2 Tag RAM'");
+ break;
+ case ARM_CA57_L2_DATA_RAM:
+ pr_cont("'L2 Data RAM'");
+ break;
+ case ARM_CA57_L2_SNOOP_RAM:
+ pr_cont("'L2 Snoop RAM'");
+ break;
+ case ARM_CA57_L2_DIRTY_RAM:
+ pr_cont("'L2 Dirty RAM'");
+ break;
+ case ARM_CA57_L2_INC_PLRU_RAM:
+ pr_cont("'L2 Inclusion PLRU RAM'");
+ break;
+ default:
+ pr_cont("'unknown'");
+ break;
+ }
+
+ pr_cont(", cpuid/way=%d, repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)\n",
+ way, repeat, other, val);
+
+ write_l2merrsr_el1(0);
+}
+
+static void al_a57_edac_check(struct edac_device_ctl_info *edac_dev)
+{
+ int cpu, cluster, last_cluster = -1;
+
+ /*
+ * Use get_online_cpus/put_online_cpus to prevent the online CPU map
+ * changing while reads the L1/L2 error status
+ */
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ /* Check L1 errors */
+ smp_call_function_single(cpu, al_a57_edac_cpumerrsr, edac_dev,
+ 0);
+ cluster = topology_physical_package_id(cpu);
+ /* Only single CPU will read the L2 error status */
+ if (cluster != last_cluster) {
+ smp_call_function_single(cpu, al_a57_edac_l2merrsr,
+ edac_dev, 0);
+ last_cluster = cluster;
+ }
+ }
+ put_online_cpus();
+}
+
+static int al_a57_edac_probe(struct platform_device *pdev)
+{
+ struct edac_device_ctl_info *edac_dev;
+ struct device *dev = &pdev->dev;
+ int ret;
+
+ /* Polling mode is supported */
+ edac_op_state = EDAC_OPSTATE_POLL;
+
+ edac_dev = edac_device_alloc_ctl_info(0, DRV_NAME, 1, "L", 2, 1, NULL,
+ 0, edac_device_alloc_index());
+ if (IS_ERR(edac_dev))
+ return -ENOMEM;
+
+ edac_dev->edac_check = al_a57_edac_check;
+ edac_dev->dev = dev;
+ edac_dev->mod_name = dev_name(dev);
+ edac_dev->dev_name = dev_name(dev);
+ edac_dev->ctl_name = dev_name(dev);
+ platform_set_drvdata(pdev, edac_dev);
+
+ ret = edac_device_add_device(edac_dev);
+ if (ret)
+ edac_device_free_ctl_info(edac_dev);
+
+ return ret;
+}
+
+static int al_a57_edac_remove(struct platform_device *pdev)
+{
+ struct edac_device_ctl_info *edac_dev = platform_get_drvdata(pdev);
+
+ edac_device_del_device(edac_dev->dev);
+ edac_device_free_ctl_info(edac_dev);
+
+ return 0;
+}
+
+static const struct of_device_id al_a57_edac_of_match[] = {
+ { .compatible = "amazon,al-cortex-a57-edac" },
+ { .compatible = "amazon,al-cortex-a72-edac" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, al_a57_edac_of_match);
+
+static struct platform_driver al_a57_edac_driver = {
+ .probe = al_a57_edac_probe,
+ .remove = al_a57_edac_remove,
+ .driver = {
+ .name = DRV_NAME,
+ .of_match_table = al_a57_edac_of_match,
+ },
+};
+module_platform_driver(al_a57_edac_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hanna Hawa <[email protected]>");
--
2.7.4

2019-05-30 11:56:15

by Greg Kroah-Hartman

Subject: Re: [PATCH 1/2] dt-bindings: EDAC: add Amazon Annapurna Labs EDAC binding

On Thu, May 30, 2019 at 01:15:28PM +0300, Hanna Hawa wrote:
> Signed-off-by: Hanna Hawa <[email protected]>
> ---
> .../devicetree/bindings/edac/amazon-al-edac.txt | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/edac/amazon-al-edac.txt

I know I can't take patches without any changelog text, but perhaps
other maintainers are more lenient... :)

Please fix up.

thanks,

greg k-h

2019-05-30 11:59:00

by Greg Kroah-Hartman

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, May 30, 2019 at 01:15:29PM +0300, Hanna Hawa wrote:
> +static void al_a57_edac_cpumerrsr(void *arg)
> +{
> + struct edac_device_ctl_info *edac_dev =
> + (struct edac_device_ctl_info *)arg;

No need for casting anything here, just assign it. Doesn't checkpatch
catch this type of thing these days? You did run it, right?

Please fix that up everywhere you do this in the driver.


thanks,

greg k-h

2019-05-30 12:53:36

by Hanna Hawa

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On 5/30/19 2:57 PM, Greg KH wrote:
> On Thu, May 30, 2019 at 01:15:29PM +0300, Hanna Hawa wrote:
>> +static void al_a57_edac_cpumerrsr(void *arg)
>> +{
>> + struct edac_device_ctl_info *edac_dev =
>> + (struct edac_device_ctl_info *)arg;
> No need for casting anything here, just assign it. Doesn't checkpatch
> catch this type of thing these days? You did run it, right?

I did, but checkpatch didn't catch this. I'll fix in next patch-set.

Thanks for your review.


Hanna

>
> Please fix that up everywhere you do this in the driver.
>
>
> thanks,
>
> greg k-h


2019-05-30 13:05:58

by Joe Perches

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, 2019-05-30 at 15:52 +0300, [email protected] wrote:
> On 5/30/19 2:57 PM, Greg KH wrote:
> > On Thu, May 30, 2019 at 01:15:29PM +0300, Hanna Hawa wrote:
> > > +static void al_a57_edac_cpumerrsr(void *arg)
> > > +{
> > > + struct edac_device_ctl_info *edac_dev =
> > > + (struct edac_device_ctl_info *)arg;
> > No need for casting anything here, just assign it. Doesn't checkpatch
> > catch this type of thing these days? You did run it, right?
>
> I did, but checkpatch didn't catch this. I'll fix in next patch-set.
>
> Thanks for your review.

checkpatch is brainless about the types of variables/arguments.

coccinelle is another very useful tool so you could also run
scripts/coccicheck on your sources.

see: Documentation/dev-tools/coccinelle.rst


2019-05-30 18:21:01

by Borislav Petkov

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On May 30, 2019 3:15:29 AM PDT, Hanna Hawa <[email protected]> wrote:
>Add support for error detection and correction for Amazon's Annapurna
>Labs SoCs for L1/L2 caches.


So this should be a driver for the whole annapurna platform and not only about the RAS functionality in an IP like the caches. See other ARM EDAC drivers in drivers/edac/ for an example.

Thx.

--
Sent from a small device: formatting sux and brevity is inevitable.

2019-05-31 00:37:27

by Borislav Petkov

Subject: Re: [PATCH 1/2] dt-bindings: EDAC: add Amazon Annapurna Labs EDAC binding

On Thu, May 30, 2019 at 04:54:46AM -0700, Greg KH wrote:
> I know I can't take patches without any changelog text,

I thought I was the only one?! :-)

Good.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

2019-05-31 01:18:21

by Herrenschmidt, Benjamin

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, 2019-05-30 at 11:19 -0700, Boris Petkov wrote:
> On May 30, 2019 3:15:29 AM PDT, Hanna Hawa <[email protected]> wrote:
> > Add support for error detection and correction for Amazon's
> > Annapurna
> > Labs SoCs for L1/L2 caches.
>
>
> So this should be a driver for the whole annapurna platform and not
> only about the RAS functionality in an IP like the caches. See other
> ARM EDAC drivers in drivers/edac/ for an example.

This isn't terribly helpful, there's nothing telling anybody which of
those files corresponds to an ARM SoC :-)

That said ...

You really want a single EDAC driver that contains all the stuff for
the caches, the memory controller, etc... ?

The idea here was to separate the core L1/L2 EDAC from the memory
controller EDAC I think ... Roben, Hanna, can you describe the long run
strategy here ?

Cheers,
Ben.

2019-05-31 05:15:29

by Borislav Petkov

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Fri, May 31, 2019 at 01:15:33AM +0000, Herrenschmidt, Benjamin wrote:
> This isn't terribly helpful, there's nothing telling anybody which of
> those files corresponds to an ARM SoC :-)

drivers/edac/altera_edac.c is one example.

Also, James and I have a small writeup on how an arm driver should look
like, we just need to polish it up and post it.

James?

> That said ...
>
> You really want a single EDAC driver that contains all the stuff for
> the caches, the memory controller, etc... ?

Yap.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply. Srsly.

2019-06-03 06:58:24

by Hanna Hawa

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC



On 5/31/2019 4:15 AM, Herrenschmidt, Benjamin wrote:
> On Thu, 2019-05-30 at 11:19 -0700, Boris Petkov wrote:
>> On May 30, 2019 3:15:29 AM PDT, Hanna Hawa <[email protected]> wrote:
>>> Add support for error detection and correction for Amazon's
>>> Annapurna
>>> Labs SoCs for L1/L2 caches.
>>
>>
>> So this should be a driver for the whole annapurna platform and not
>> only about the RAS functionality in an IP like the caches. See other
>> ARM EDAC drivers in drivers/edac/ for an example.
>
> This isn't terribly helpful, there's nothing telling anybody which of
> those files corresponds to an ARM SoC :-)
>
> That said ...
>
> You really want a single EDAC driver that contains all the stuff for
> the caches, the memory controller, etc... ?
>
> The idea here was to separate the core L1/L2 EDAC from the memory
> controller EDAC I think ... Roben, Hanna, can you describe the long run
> strategy here ?
Correct, our target is to separate the L1/L2 EDAC from the MC, and to maintain
both in separate drivers.

Thanks,
Hanna
>
> Cheers,
> Ben.
>

2019-06-05 15:16:21

by James Morse

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi guys,

On 31/05/2019 06:14, Borislav Petkov wrote:
> On Fri, May 31, 2019 at 01:15:33AM +0000, Herrenschmidt, Benjamin wrote:
>> This isn't terribly helpful, there's nothing telling anybody which of
>> those files corresponds to an ARM SoC :-)
>
> drivers/edac/altera_edac.c is one example.
>
> Also, James and I have a small writeup on how an arm driver should look
> like, we just need to polish it up and post it.
>
> James?

Yes, I should get on with that. It's mostly for platforms which end up with multiple
piecemeal drivers, where some co-ordination is needed. It doesn't look like that will be a
problem here.


>> That said ...
>>
>> You really want a single EDAC driver that contains all the stuff for
>> the caches, the memory controller, etc... ?

This has to be platform specific as it has integration-time dependencies and firmware
dependencies. Doing it as a platform driver matched from the machine-compatible may be
more straightforward today.

The DT will already say "compatible = arm,cortex-a57" for the Alpine-v2, what that
'edac_l1_l2' node is telling us is the integration/firmware stuff has been done, and the
imp-def instructions can be used.


Thanks,

James

2019-06-05 15:17:55

by James Morse

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Hanna,

On 30/05/2019 11:15, Hanna Hawa wrote:
> Add support for error detection and correction for Amazon's Annapurna
> Labs SoCs for L1/L2 caches.
>
> Amazon's Annapurna Labs SoCs based on ARM CA57 and CA72, the driver
> support both cortex based on compatible string.

> diff --git a/drivers/edac/amazon_al_ca57_edac.c b/drivers/edac/amazon_al_ca57_edac.c
> new file mode 100644
> index 0000000..08237c0
> --- /dev/null
> +++ b/drivers/edac/amazon_al_ca57_edac.c
> @@ -0,0 +1,283 @@
> +// SPDX-License-Identifier: GPL-2.0

> +/* Same bit assignments of CPUMERRSR_EL1 and L2MERRSR_EL1 in ARM CA57/CA72 */

Allowing linux to access these implementation-defined registers has come up before:
https://www.spinics.net/lists/kernel/msg2750349.html

It looks like you've navigated most of the issues. Accessing implementation-defined
registers is frowned on, but this stuff can't be done generically until v8.2.

This can't be done on 'all A57/A72' because some platforms may not have been integrated to
have error-checking at L1/L2, (1.3 'Features' of [0] says the ECC protection for L1 data
cache etc is optional). Even if they did, this stuff needs turning on in L2CTLR_EL1.
These implementation-defined registers may be trapped by the hypervisor, I assume for your
platform this is linux booted at EL2, so no hypervisor.


> +#define ARM_CA57_CPUMERRSR_INDEX_OFF (0)
> +#define ARM_CA57_CPUMERRSR_INDEX_MASK (0x3FFFF)

(GENMASK() would make it quicker and easier to compare this with the datasheet)


> +#define ARM_CA57_L2_TAG_RAM 0x10
> +#define ARM_CA57_L2_DATA_RAM 0x11
> +#define ARM_CA57_L2_SNOOP_RAM 0x12
> +#define ARM_CA57_L2_DIRTY_RAM 0x14

> +#define ARM_CA57_L2_INC_PLRU_RAM 0x18

A57 describes this one as 'PF RAM'...


> +static inline u64 read_cpumerrsr_el1(void)
> +{
> + u64 val;
> +
> + asm volatile("mrs %0, s3_1_c15_c2_2" : "=r" (val));
> +
> + return val;
> +}

Linux supports versions of binutils that choke on this syntax.
See the sys_reg() definitions in arm64's asm/sysreg.h that define something you can feed
to read_sysreg_s(). It would save having these wrapper functions.

commit 72c583951526 ("arm64: gicv3: Allow GICv3 compilation with older binutils") for the
story.


> +static void al_a57_edac_cpumerrsr(void *arg)
> +{
> + struct edac_device_ctl_info *edac_dev =
> + (struct edac_device_ctl_info *)arg;
> + int cpu;
> + u32 index, way, ramid, repeat, other, fatal;
> + u64 val = read_cpumerrsr_el1();
> +
> + /* Return if no valid error */
> + if (!((val >> ARM_CA57_CPUMERRSR_VALID_OFF) &
> + ARM_CA57_CPUMERRSR_VALID_MASK))
> + return;

| #define ARM_CA57_CPUMERRSR_VALID BIT(31)
| if (!(val & ARM_CA57_CPUMERRSR_VALID))

would be easier to read; the same goes for 'fatal', as it's a single bit.


> + edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");

How do we know this was corrected?

6.4.8 "Error Correction Code" has "Double-bit ECC errors set the fatal bit." in a
paragraph talking about the L1 memory system.

"L2 Error" ? Copy and paste?


> + edac_printk(KERN_CRIT, DRV_NAME, "CPU%d L1 %serror detected\n",
> + cpu, (fatal) ? "Fatal " : "");
> + edac_printk(KERN_CRIT, DRV_NAME, "RAMID=");
> +
> + switch (ramid) {
> + case ARM_CA57_L1_I_TAG_RAM:
> + pr_cont("'L1-I Tag RAM' index=%d way=%d", index, way);
> + break;
> + case ARM_CA57_L1_I_DATA_RAM:
> + pr_cont("'L1-I Data RAM' index=%d bank= %d", index, way);
> + break;

Is index/way information really useful? I can't replace way-3 on the system, nor can I
stop it being used. If it's useless, I'd rather we don't bother parsing and printing it out.


> + pr_cont(", repeat=%d, other=%d (CPUMERRSR_EL1=0x%llx)\n", repeat, other,
> + val);

'other' here is another error, but we don't know the ramid.
'repeat' is another error for the same ramid.

Could we still feed this stuff into edac? This would make the counters accurate if the
polling frequency isn't quite fast enough.


> + write_cpumerrsr_el1(0);
> +}


> +static void al_a57_edac_l2merrsr(void *arg)
> +{

> + edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");

How do we know this is corrected?

It looks like L2CTLR_EL1[20] might force fatal 1/0 to map to uncorrected/corrected. Is
this what you are depending on here?

(it would be good to have a list of integration-time and firmware dependencies this driver
has, for the next person who tries to enable it on their system and complains it doesn't
work for them)


> + edac_printk(KERN_CRIT, DRV_NAME, "CPU%d L2 %serror detected\n",
> + cpu, (fatal) ? "Fatal " : "");
> + edac_printk(KERN_CRIT, DRV_NAME, "RAMID=");
> +
> + switch (ramid) {
> + case ARM_CA57_L2_TAG_RAM:
> + pr_cont("'L2 Tag RAM'");
> + break;
> + case ARM_CA57_L2_DATA_RAM:
> + pr_cont("'L2 Data RAM'");
> + break;
> + case ARM_CA57_L2_SNOOP_RAM:
> + pr_cont("'L2 Snoop RAM'");
> + break;
> + case ARM_CA57_L2_DIRTY_RAM:
> + pr_cont("'L2 Dirty RAM'");
> + break;
> + case ARM_CA57_L2_INC_PLRU_RAM:
> + pr_cont("'L2 Inclusion PLRU RAM'");

The A57 TRM describes this as "Inclusion PF RAM", and notes its only in r1p0 or later,
(but doesn't say what it is). The A72 TRM describes the same encoding as "Inclusion PLRU
RAM", which is something to do with its replacement policy. It has control bits that A57's
version doesn't, so these are not the same thing.

Disambiguating A57/A72 here is a load of faff, 'L2 internal metadata' probably covers both
cases, but unless these RAMs are replaceable or can be disabled, there isn't much point
working out which one it was.


> + break;
> + default:
> + pr_cont("'unknown'");
> + break;
> + }
> +
> + pr_cont(", cpuid/way=%d, repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)\n",
> + way, repeat, other, val);

cpuid could be useful if you can map it back to the cpu number linux has.
If you can spot that cpu-7 is experiencing more errors than it should, you can leave it
offline.

To do this you'd need to map each L2MERRSR_EL1's '0-3' range back to the CPUs they
actually are. The gic's 'ppi-partitions' does this with phandles, e.g.
Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml. You could add a
similar shaped thing to the l2-cacheX node in the DT, (or in your edac node, but it is a
property of the cache integration).
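A hypothetical DT shape for that mapping (the property name and layout are invented here purely for illustration, mirroring the ppi-partitions idea — a real binding would need review):

```dts
l2_cache0: l2-cache {
	compatible = "cache";
	cache-level = <2>;
	/* hypothetical: CPUs in L2MERRSR_EL1 'cpuid' order 0..3 */
	amazon,merrsr-cpus = <&cpu0 &cpu1 &cpu2 &cpu3>;
};
```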


> + write_l2merrsr_el1(0);
> +}
> +
> +static void al_a57_edac_check(struct edac_device_ctl_info *edac_dev)
> +{
> + int cpu, cluster, last_cluster = -1;
> +
> + /*
> + * Use get_online_cpus/put_online_cpus to prevent the online CPU map
> + * changing while reads the L1/L2 error status

For walking the list of online cpus, this makes sense. But you schedule work without
waiting, so it may get run after you drop the cpus_read_lock()...


> + */

> + get_online_cpus();

The comment above these helpers is:
| /* Wrappers which go away once all code is converted */

cpus_read_lock()?


> + for_each_online_cpu(cpu) {
> + /* Check L1 errors */
> + smp_call_function_single(cpu, al_a57_edac_cpumerrsr, edac_dev,
> + 0);

As you aren't testing for big/little, wouldn't on_each_cpu() here be simpler?

As you don't wait, what stops al_a57_edac_cpumerrsr() feeding two errors into
edac_device_handle_ce() at the same time? Do you need a spinlock in al_a57_edac_cpumerrsr()?


> + cluster = topology_physical_package_id(cpu);

Hmm, I'm not sure cluster==package is guaranteed to be true forever.

If you describe the L2MERRSR_EL1 cpu mapping in your DT you could use that. Otherwise
pulling out the DT using something like the arch code's parse_cluster().


> + /* Only single CPU will read the L2 error status */

> + if (cluster != last_cluster) {
> + smp_call_function_single(cpu, al_a57_edac_l2merrsr,
> + edac_dev, 0);
> + last_cluster = cluster;
> + }

Here you depend on the CPUs being listed in cluster-order in the DT. I'm fairly sure the
numbering is arbitrary: On my Juno 0,3,4,5 are the A53 cluster, and 1,2 are the A57 cluster.

If 1,3,5 were cluster-a and 2,4,6 were cluster-b, you would end up calling
al_a57_edac_l2merrsr() for each cpu. As you don't wait, they could race.

If you can get a cpu-mask for each cluster, smp_call_function_any() would do the
pick-one-online-cpu work for you.


> + }
> + put_online_cpus();
> +}

> +static int al_a57_edac_remove(struct platform_device *pdev)
> +{
> + struct edac_device_ctl_info *edac_dev = platform_get_drvdata(pdev);
> +
> + edac_device_del_device(edac_dev->dev);
> + edac_device_free_ctl_info(edac_dev);

Your poll function schedule work on other CPUs and didn't wait, is it possible
al_a57_edac_l2merrsr() is still using this memory when you free it?


> + return 0;
> +}


> +MODULE_LICENSE("GPL");

| MODULE_LICENSE("GPL v2");

To match the SPDX header?



Thanks,

James


[0]
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf

2019-06-06 07:56:11

by Hanna Hawa

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC



On 5/31/2019 8:14 AM, Borislav Petkov wrote:
> On Fri, May 31, 2019 at 01:15:33AM +0000, Herrenschmidt, Benjamin wrote:
>> This isn't terribly helpful, there's nothing telling anybody which of
>> those files corresponds to an ARM SoC :-)
>
> drivers/edac/altera_edac.c is one example.
>
> Also, James and I have a small writeup on how an arm driver should look
> like, we just need to polish it up and post it.
>
> James?
>
>> That said ...
>>
>> You really want a single EDAC driver that contains all the stuff for
>> the caches, the memory controller, etc... ?
>
> Yap.

Disagree. The various drivers don't depend on each other.
I think we should keep the drivers separated as they are distinct and
independent IP blocks.

>

2019-06-06 10:35:11

by James Morse

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Hawa,

On 06/06/2019 08:53, Hawa, Hanna wrote:
> On 5/31/2019 8:14 AM, Borislav Petkov wrote:
>> On Fri, May 31, 2019 at 01:15:33AM +0000, Herrenschmidt, Benjamin wrote:
>>> This isn't terribly helpful, there's nothing telling anybody which of
>>> those files corresponds to an ARM SoC :-)
>>
>> drivers/edac/altera_edac.c is one example.
>>
>> Also, James and I have a small writeup on how an arm driver should look
>> like, we just need to polish it up and post it.
>>
>> James?
>>
>>> That said ...
>>>
>>> You really want a single EDAC driver that contains all the stuff for
>>> the caches, the memory controller, etc... ?
>>
>> Yap.
>
> Disagree. The various drivers don't depend on each other.
> I think we should keep the drivers separated as they are distinct and independent IP blocks.

But they don't exist in isolation, they both depend on the integration-choices/firmware
that makes up your platform.

Other platforms may have exactly the same IP blocks, configured differently, or with
different features enabled in firmware. This means we can't just probe the driver based on
the presence of the IP block, we need to know the integration choices and firmware
settings match what the driver requires.

(Case in point, that A57 ECC support is optional, another A57 may not have it)

Descriptions of what firmware did don't really belong in the DT. It's not a hardware property.

This is why it's better to probe this stuff based on the machine-compatible/platform-name,
not the presence of the IP block in the DT.


Will either of your separate drivers ever run alone? If they're probed from the same
machine-compatible this won't happen.


How does your memory controller report errors? Does it send back some data with an invalid
checksum, or a specific poison/invalid flag? Will the cache report this as a cache error
too? If it's an extra signal, does the cache know what it is?

All these are integration choices between the two IP blocks, done as separate drivers we
don't have anywhere to store that information. Even if you don't care about this, making
them separate drivers should only be done to make them usable on other platforms, where
these choices may have been different.


Thanks,

James

2019-06-06 12:21:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, Jun 06, 2019 at 11:33:30AM +0100, James Morse wrote:
> All these are integration choices between the two IP blocks, done as separate drivers we
> don't have anywhere to store that information. Even if you don't care about this, making
> them separate drivers should only be done to make them usable on other platforms,

... that we can do with a separately integratable object like
fsl_ddr_edac.c does it, for example, where it is shared between
two different platform drivers.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-06 12:36:46

by Shenhar, Talel

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC


>> Disagree. The various drivers don't depend on each other.
>> I think we should keep the drivers separated as they are distinct and independent IP blocks.
> But they don't exist in isolation, they both depend on the integration-choices/firmware
> that makes up your platform.
>
> Other platforms may have exactly the same IP blocks, configured differently, or with
> different features enabled in firmware. This means we can't just probe the driver based on
> the presence of the IP block, we need to know the integration choices and firmware
> settings match what the driver requires.
>
> (Case in point, that A57 ECC support is optional, another A57 may not have it)
>
> Descriptions of what firmware did don't really belong in the DT. It's not a hardware property.
>
> This is why it's better to probe this stuff based on the machine-compatible/platform-name,
> not the presence of the IP block in the DT.
>
>
> Will either of your separate drivers ever run alone? If they're probed from the same
> machine-compatible this won't happen.
>
>
> How does your memory controller report errors? Does it send back some data with an invalid
> checksum, or a specific poison/invalid flag? Will the cache report this as a cache error
> too? If it's an extra signal, does the cache know what it is?
>
> All these are integration choices between the two IP blocks, done as separate drivers we
> don't have anywhere to store that information. Even if you don't care about this, making
> them separate drivers should only be done to make them usable on other platforms, where
> these choices may have been different.

James,

Thanks for the prompt responses.

From our perspective, L1/L2 has nothing to do with the DDR memory
controller.

It's right that they both use the same EDAC subsystem, but they use
totally different APIs of it.

We also want separate control for enabling/disabling L1/L2
EDAC vs. memory-controller EDAC.

Even from a technical point of view, the L1/L2 UE collection method is
totally different from collecting memory-controller UEs (CPU exceptions
vs. actual interrupts).

So there is little reason to combine them rather than giving each one its
own file, e.g. al_mc_edac, al_l1_l2_edac (I don't even see why Hanna
combined L1 and L2...)

As we don't have any technical relation between the two we would rather
avoid this combination.

Also, let's assume we have different setups with different memory
controllers; having a DT binding to control the difference is super easy
and flexible.

Would having a dedicated folder for amazon ease the move to separate files?

Thanks,

Talel.

>
> Thanks,
>
> James

2019-06-06 16:25:41

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, Jun 06, 2019 at 10:53:42AM +0300, Hawa, Hanna wrote:
> Disagree. The various drivers don't depend on each other.
> I think we should keep the drivers separated as they are distinct and
> independent IP blocks.

This topic comes up each time someone submits a new ARM EDAC driver:
EDAC can't support multiple drivers on one platform because of the
limitations it has. A single platform driver works - see altera_edac.

Again, James has a good write-up with an example of how this can work, so
be patient.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-07 15:14:48

by James Morse

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi guys,

On 06/06/2019 12:37, Shenhar, Talel wrote:
>>> Disagree. The various drivers don't depend on each other.
>>> I think we should keep the drivers separated as they are distinct and independent IP
>>> blocks.
>> But they don't exist in isolation, they both depend on the integration-choices/firmware
>> that makes up your platform.
>>
>> Other platforms may have exactly the same IP blocks, configured differently, or with
>> different features enabled in firmware. This means we can't just probe the driver based on
>> the presence of the IP block, we need to know the integration choices and firmware
>> settings match what the driver requires.
>>
>> (Case in point, that A57 ECC support is optional, another A57 may not have it)
>>
>> Descriptions of what firmware did don't really belong in the DT. It's not a hardware
>> property.
>>
>> This is why it's better to probe this stuff based on the machine-compatible/platform-name,
>> not the presence of the IP block in the DT.
>>
>>
>> Will either of your separate drivers ever run alone? If they're probed from the same
>> machine-compatible this won't happen.
>>
>>
>> How does your memory controller report errors? Does it send back some data with an invalid
>> checksum, or a specific poison/invalid flag? Will the cache report this as a cache error
>> too? If it's an extra signal, does the cache know what it is?
>>
>> All these are integration choices between the two IP blocks, done as separate drivers we
>> don't have anywhere to store that information. Even if you don't care about this, making
>> them separate drivers should only be done to make them usable on other platforms, where
>> these choices may have been different.

> From our perspective, L1/L2 has nothing to do with the DDR memory controller.

I understand you're coming from the position that these things have counters, and you want
something to read and export them.

I'm coming at this from somewhere else. This stuff has to be considered all the way
through the system. Just because each component supports error detection, doesn't mean you
aren't going to get silent corruption. Likewise if another platform picks up two piecemeal
edac drivers for hardware it happens to have in common with yours, it doesn't mean we're
counting all the errors. This stuff has to be viewed for the whole platform.


> It's right that they both use the same EDAC subsystem, but they use totally different APIs
> of it.
>
> We also want separate control for enabling/disabling L1/L2 EDAC vs. memory-controller
> EDAC.

Curious, what for? Surely you either care about counting errors, or you don't.


> Even from a technical point of view, the L1/L2 UE collection method is totally different
> from collecting memory-controller UEs (CPU exceptions vs. actual interrupts).
>
> So there is little reason to combine them rather than giving each one its own file, e.g.
> al_mc_edac, al_l1_l2_edac (I don't even see why Hanna combined L1 and L2...)

> As we don't have any technical relation between the two we would rather avoid this
> combination.
>
> Also, let's assume we have different setups with different memory controllers; having a DT
> binding to control the difference is super easy and flexible.

If the hardware is different you should describe this in the DT. I'm not suggesting you
don't describe it.

The discussion here is whether we should probe the driver based on a dummy-node
compatible, (which this 'edac_l1_l2' is) or based on the machine compatible.

At the extreme end: you should paint the CPU and cache nodes with a compatible describing
your integration. (I've mangled Juno's DT here:)
| A57_0: cpu@0 {
| compatible = "amazon-al,cortex-a57", "arm,cortex-a57";
| reg = <0x0 0x0>;
| device_type = "cpu";
| next-level-cache = <&A57_L2>;
| };
|
[...]
|
| A57_L2: l2-cache0 {
| compatible = "amazon-al,cache", "cache";
| cpu_map = <&A57_0 &A57_1>;
| };


This is the most accurate way to describe what you have here. The driver can use this to
know that this integration of CPU and Cache support the edac registers. (This doesn't tell
us anything about whether firmware enabled this stuff, or made/left it all secure-only)

But this doesn't give you a device you can bind a driver to, to kick this stuff off.
This (I assume) is why you added a dummy 'edac_l1_l2' node that just probes the driver.
The hardware is to do with the CPU and caches; 'edac_l1_l2' doesn't correspond to any
distinct part of the SoC.

The request is to use the machine compatible, not a dummy node. This wraps up the firmware
properties too, and any other platform property we don't know about today.

Once you have this, you don't really need the cpu/cache integration annotations, and your
future memory-controller support can be picked up as part of the platform driver.
If you have otherwise identical platforms with different memory controllers, OF gives you
the API to match the node in the DT.


> Would having a dedicated folder for amazon ease the move to separate files?

I don't think anyone cares about the number of files. Code duplication and extra
boiler-plate, maybe.


Thanks,

James

2019-06-08 01:03:07

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, 2019-06-06 at 11:33 +0100, James Morse wrote:

> > Disagree. The various drivers don't depend on each other.
> > I think we should keep the drivers separated as they are distinct and independent IP blocks.
>
> But they don't exist in isolation, they both depend on the integration-choices/firmware
> that makes up your platform.

What do you mean ? They are exposing counters from independent IP
blocks. They are independent drivers. Your argument could be used to
claim the entire SoC depends on "integration choices / firmware" ... I
don't get it.

> Other platforms may have exactly the same IP blocks, configured differently, or with
> different features enabled in firmware.

Sure, like every other IP block on the planet. That has never been a
good reason to bring back the ugly spectre of the myboard.c file...

> This means we can't just probe the driver based on
> the presence of the IP block, we need to know the integration choices and firmware
> settings match what the driver requires.

Such as ? I mean none of that differs between these EDAC drivers and
any other IP block, and we still probe them individually.

> (Case in point, that A57 ECC support is optional, another A57 may not have it)

So what ? That belongs in the DT.

> Descriptions of what firmware did don't really belong in the DT. Its not a hardware property.

Since when ? I'm tired of people bringing up over and over that
complete fallacy that the DT should for some obscure religious reason
be strictly limited to "HW properties". ACPI isn't. The old Open
Firmware which I used as a basis for creating the FDT wasn't.

It is perfectly legitimate for the DT to contain configuration
information and firmware choices.

What's not OK is to stick there things that are essentially specific to
the Linux driver implementation but that isn't what we are talking
about here.

> This is why its better to probe this stuff based on the machine-compatible/platform-name,
> not the presence of the IP block in the DT.

No. No no no no. This is bringing back the days of having board files
etc... this is wrong.

Those IP blocks don't need any SW coordination at runtime. The drivers
don't share data nor communicate with each other. There is absolutely
no reason to go down that path.

> Will either of your separate drivers ever run alone? If they're probed from the same
> machine-compatible this won't happen.

They should be probed independently from independent DT nodes, what's
the problem you are trying to fix here ?

> How does your memory controller report errors? Does it send back some data with an invalid
> checksum, or a specific poison/invalid flag? Will the cache report this as a cache error
> too, if its an extra signal, does the cache know what it is?

That's OK, you get the error from both sides; Power has done it that
way forever. It's not always possible to correlate anyway, and it's
certainly not the job of the EDAC drivers to try.

> All these are integration choices between the two IP blocks, done as separate drivers we
> don't have anywhere to store that information.

We do, it's called the DT.

> Even if you don't care about this, making
> them separate drivers should only be done to make them usable on other platforms, where
> these choices may have been different.

That wouldn't make the drivers unusable on other platforms at all.

Cheers,
Ben.


2019-06-08 01:05:24

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Fri, 2019-06-07 at 16:11 +0100, James Morse wrote:
> I'm coming at this from somewhere else. This stuff has to be considered all the way
> through the system. Just because each component supports error detection, doesn't mean you
> aren't going to get silent corruption. Likewise if another platform picks up two piecemeal
> edac drivers for hardware it happens to have in common with yours, it doesn't mean we're
> counting all the errors. This stuff has to be viewed for the whole platform.

Sure but you don't solve that problem by having a magic myplatform.c
overseer. And even if you do, it can perfectly access the individual IP
block drivers, finding them via phandles in the DT for example etc...
without having to make those individual drivers dependent on some
overarching machine-wide probing mechanism.

> But this doesn't give you a device you can bind a driver to, to kick this stuff off.
> This (I assume) is why you added a dummy 'edac_l1_l2' node, that just probes the driver.
> The hardware is to do with the CPU and caches, 'edac_l1'_l2' doesn't correspond to any
> distinct part of the soc.
>
> The request is to use the machine compatible, not a dummy node. This wraps up the firmware
> properties too, and any other platform property we don't know about today.
>
> Once you have this, you don't really need the cpu/cache integration annotations, and your
> future memory-controller support can be picked up as part of the platform driver.
> If you have otherwise identical platforms with different memory controllers, OF gives you
> the API to match the node in the DT.

Dummy nodes are perfectly fine, and have been from the early days of Open
Firmware. That said, these aren't so much dummy as a way to expose the
control path to the caches. The DT isn't perfect in its structure and
the way caches and CPUs are represented makes it difficult to represent
arbitrary control path to them without extra nodes, which is thus what
people do.

Cheers,
Ben.

2019-06-08 09:07:26

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Sat, Jun 08, 2019 at 10:16:11AM +1000, Benjamin Herrenschmidt wrote:
> Those IP blocks don't need any SW coordination at runtime. The drivers
> don't share data nor communicate with each other. There is absolutely
> no reason to go down that path.

Let me set one thing straight: the EDAC "subsystem" if you will - or
that pile of code which does error counting and reporting - is limited
to supporting one EDAC driver per platform. And whenever we
have two drivers loadable on a platform, we have to do dirty hacks like

301375e76432 ("EDAC: Add owner check to the x86 platform drivers")

What that means is, that if you need to call EDAC logging routines or
whatnot from two different drivers, there's no locking, no nothing. So
it might work or it might set your cat on fire.

IOW, having multiple separate "drivers" or representations of RAS
functionality using EDAC facilities is something that hasn't been
done. Well, almost. highbank_mc_edac.c and highbank_l2_edac.c is one
example but they make sure they don't step on each other's toes by using
different EDAC pieces - a device vs a memory controller abstraction.

And now the moment all of a sudden you decide you want for those
separate "drivers" to synchronize on something, you need to do something
hacky like the amd_register_ecc_decoder() thing, for example, because we
need to call into the EDAC memory controller driver to decode a DRAM ECC
error properly, while the rest of the error types get decoded somewhere
else...

Then there comes the issue of code reuse - wouldn't it be great if a
memory controller driver could be shared between platform drivers instead
of copying it into both?

We already do that - see fsl_ddr_edac.c which gets shared between PPC
*and* ARM. drivers/edac/skx_common.c is another example for Intel chips.

Now, if you have a platform with 10 IP blocks which each have RAS
functionality, are you saying you'll do 10 different pieces called

<platform_name>_<ip_block#>_edac.c

?

And if <next_platform> has an old IP block with the old RAS
functionality, you load <platform_name>_<ip_block>_edac.c on the new
platform too?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-11 05:51:53

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Sat, 2019-06-08 at 11:05 +0200, Borislav Petkov wrote:
> On Sat, Jun 08, 2019 at 10:16:11AM +1000, Benjamin Herrenschmidt wrote:
> > Those IP blocks don't need any SW coordination at runtime. The drivers
> > don't share data nor communicate with each other. There is absolutely
> > no reason to go down that path.
>
> Let me set one thing straight: the EDAC "subsystem" if you will - or
> that pile of code which does error counting and reporting - is limited
> to supporting one EDAC driver per platform. And whenever we
> have two drivers loadable on a platform, we have to do dirty hacks like
>
> 301375e76432 ("EDAC: Add owner check to the x86 platform drivers")
>
> What that means is, that if you need to call EDAC logging routines or
> whatnot from two different drivers, there's no locking, no nothing. So
> it might work or it might set your cat on fire.

Should we fix that then instead ? What are the big issues with adding
some basic locking ? being called from NMIs ?

If the separate drivers operate on distinct counters I don't see a big
problem there.

> IOW, having multiple separate "drivers" or representations of RAS
> functionality using EDAC facilities is something that hasn't been
> done. Well, almost. highbank_mc_edac.c and highbank_l2_edac.c is one
> example but they make sure they don't step on each other's toes by using
> different EDAC pieces - a device vs a memory controller abstraction.

That sounds like a reasonable requirement.

> And now the moment all of a sudden you decide you want for those
> separate "drivers" to synchronize on something, you need to do something
> hacky like the amd_register_ecc_decoder() thing, for example, because we
> need to call into the EDAC memory controller driver to decode a DRAM ECC
> error properly, while the rest of the error types get decoded somewhere
> else...
>
> Then there comes the issue with code reuse - wouldn't it be great if a
> memory controller driver can be shared between platform drivers instead of
> copying it in both?
>
> We already do that - see fsl_ddr_edac.c which gets shared between PPC
> *and* ARM. drivers/edac/skx_common.c is another example for Intel chips.
>
> Now, if you have a platform with 10 IP blocks which each have RAS
> functionality, are you saying you'll do 10 different pieces called
>
> <platform_name>_<ip_block#>_edac.c
>
> ?
>
> And if <next_platform> has an old IP block with the old RAS
> functionality, you load <platform_name>_<ip_block>_edac.c on the new
> platform too?

I'm not sure why <platform_name> ...

Anyway, let's get back to the specific case of our Amazon platform here
since it's a concrete example.

Hanna, can you give us a reasonably exhaustive list of how many such
"drivers" we'll want in the EDAC subsystem and whether you envision any
coordination requirement between them or not ?

Cheers,
Ben.



2019-06-11 07:23:04

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Tue, 2019-06-11 at 15:50 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2019-06-08 at 11:05 +0200, Borislav Petkov wrote:
> > On Sat, Jun 08, 2019 at 10:16:11AM +1000, Benjamin Herrenschmidt wrote:
> > > Those IP blocks don't need any SW coordination at runtime. The drivers
> > > don't share data nor communicate with each other. There is absolutely
> > > no reason to go down that path.
> >
> > Let me set one thing straight: the EDAC "subsystem" if you will - or
> > that pile of code which does error counting and reporting - is limited
> > to supporting one EDAC driver per platform. And whenever we
> > have two drivers loadable on a platform, we have to do dirty hacks like
> >
> > 301375e76432 ("EDAC: Add owner check to the x86 platform drivers")
> >
> > What that means is, that if you need to call EDAC logging routines or
> > whatnot from two different drivers, there's no locking, no nothing. So
> > it might work or it might set your cat on fire.
>
> Should we fix that then instead ? What are the big issues with adding
> some basic locking ? being called from NMIs ?
>
> If the separate drivers operate on distinct counters I don't see a big
> problem there.

So looking again ... all the registration/removal of edac devices seem
to already be protected by mutexes, so that's not a problem.

Tell me more about what specific races you think we might have here,
I'm not sure I follow...

Cheers,
Ben.


2019-06-11 07:33:08

by Hanna Hawa

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Ben, Boris

On 6/11/2019 8:50 AM, Benjamin Herrenschmidt wrote:
>
> Anyway, let's get back to the specific case of our Amazon platform here
> since it's a concrete example.
>
> Hanna, can you give us a reasonably exhaustive list of how many such
> "drivers" we'll want in the EDAC subsystem and whether you envision any
> coordination requirement between them or not ?
In the near future we plan to push EDAC drivers for the L1/L2 and the
memory controller.
There are no common resources/shared data between them.

Thanks,
Hanna
>
> Cheers,
> Ben.
>
>
>

2019-06-11 12:24:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Tue, Jun 11, 2019 at 03:50:40PM +1000, Benjamin Herrenschmidt wrote:
> Should we fix that then instead ?

Sure.

> What are the big issues with adding some basic locking ? being called
> from NMIs ?

That is one possible issue. I know we don't call the error decoding
routines in NMI context on x86, but I have no clue about ARM.

> If the separate drivers operate on distinct counters I don't see a big
> problem there.

Yes, and they do.

> I'm not sure why <platform_name> ...

Well, whatever you'd call it. It will be some distinct name, I hope :)

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-11 12:26:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Tue, Jun 11, 2019 at 05:21:39PM +1000, Benjamin Herrenschmidt wrote:
> So looking again ... all the registration/removal of edac devices seem
> to already be protected by mutexes, so that's not a problem.
>
> Tell me more about what specific races you think we might have here,
> I'm not sure I follow...

Well, as I said, "it might work or it might set your cat on fire." For
example, one of the error logging paths is edac_mc_handle_error(), and
that thing mostly operates using the *mci pointer, which should be OK,
but then it calls the trace_mc_event tracepoint, and I'd suppose that
tracepoints can be called locklessly, but I'm not sure.

So what needs to happen is for the paths which weren't called by multiple
EDAC agents in parallel, but now need to get called in parallel because
ARM drivers want to do that, to get audited to ensure they're safe.

Situation is easy if you have one platform driver where you can
synchronize things in the driver but since you guys need to do separate
drivers for whatever reason, then that would need to be done prior.

Does that make more sense?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-11 12:26:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Tue, Jun 11, 2019 at 10:29:55AM +0300, Hawa, Hanna wrote:
> In the near future we plan to push EDAC drivers for the L1/L2 and the memory
> controller. There are no common resources/shared data between them.

Ok, you should be safe then. If you need to do more involved interaction
in the future, you know what the issues are.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-12 00:41:45

by Hanna Hawa

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi James,

>
> Allowing linux to access these implementation-defined registers has come up before:
> https://www.spinics.net/lists/kernel/msg2750349.html
>
> It looks like you've navigated most of the issues. Accessing implementation-defined
> registers is frowned on, but this stuff can't be done generically until v8.2.
Sure, I'm not planning to do this generically for all ARM A57/A72. I'm
doing this specifically for Alpine SoCs.

>
> This can't be done on 'all A57/A72' because some platforms may not have been integrated to
> have error-checking at L1/L2, (1.3 'Features' of [0] says the ECC protection for L1 data
> cache etc is optional). Even if they did, this stuff needs turning on in L2CTLR_EL1.
> These implementation-defined registers may be trapped by the hypervisor, I assume for your
> platform this is linux booted at EL2, so no hypervisor.
In Alpine v2/Alpine v3, bit[21] ("ECC and parity enable") in L2CTLR_EL1 is
enabled in the firmware.

>
>
>> +#define ARM_CA57_CPUMERRSR_INDEX_OFF (0)
>> +#define ARM_CA57_CPUMERRSR_INDEX_MASK (0x3FFFF)
>
> (GENMASK() would make it quicker and easier to compare this with the datasheet)
Will be used in next patchset.

>
>
>> +#define ARM_CA57_L2_INC_PLRU_RAM 0x18
>
> A57 describes this one as 'PF RAM'...
Will be updated.

>
>
>
> Linux supports versions of binutils that choke on this syntax.
> See the sys_reg() definitions in arm64's asm/sysreg.h that define something you can feed
> to read_sysreg_s(). It would save having these wrapper functions.
>
> commit 72c583951526 ("arm64: gicv3: Allow GICv3 compilation with older binutils") for the
> story.
Great, I'll use sys_reg(), read_sysreg_s(), and remove the wrapper
functions.
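
To sketch what that buys us (userspace model of arm64's sys_reg() bit
packing; the register encodings S3_1_C15_C2_2/S3_1_C15_C2_3 are the
implementation-defined CPUMERRSR_EL1/L2MERRSR_EL1 encodings from the A57
TRM, worth re-checking for A72):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of arm64's sys_reg() macro (asm/sysreg.h): pack the
 * op0/op1/CRn/CRm/op2 fields into the immediate used by MRS/MSR. */
#define SYS_REG(op0, op1, crn, crm, op2) \
	((uint32_t)(((op0) << 19) | ((op1) << 16) | ((crn) << 12) | \
		    ((crm) << 8) | ((op2) << 5)))

/* CPUMERRSR_EL1 is S3_1_C15_C2_2, L2MERRSR_EL1 is S3_1_C15_C2_3. */
#define SYS_CPUMERRSR_EL1	SYS_REG(3, 1, 15, 2, 2)
#define SYS_L2MERRSR_EL1	SYS_REG(3, 1, 15, 2, 3)
```

In the driver this would collapse the asm wrappers into something like
`val = read_sysreg_s(SYS_CPUMERRSR_EL1);`, which is what works with the
older binutils mentioned in that commit.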

>
>
>
> | #define ARM_CA57_CPUMERRSR_VALID BIT(31)
> | if (!(val & ARM_CA57_CPUMERRSR_VALID))
>
> would be easier to read, the same goes for 'fatal' as its a single bit.
Will be fixed, here and in al_a57_edac_l2merrsr.

>
>
>> + edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>
> How do we know this was corrected?
>
> 6.4.8 "Error Correction Code" has "Double-bit ECC errors set the fatal bit." in a
> paragraph talking about the L1 memory system.
I'll check the fatal field to determine whether the error is corrected or uncorrected.

>
> "L2 Error" ? Copy and paste?
Copy/paste mistake, will be fixed.

>
>
>> + edac_printk(KERN_CRIT, DRV_NAME, "CPU%d L1 %serror detected\n",
>> + cpu, (fatal) ? "Fatal " : "");
>> + edac_printk(KERN_CRIT, DRV_NAME, "RAMID=");
>> +
>> + switch (ramid) {
>> + case ARM_CA57_L1_I_TAG_RAM:
>> + pr_cont("'L1-I Tag RAM' index=%d way=%d", index, way);
>> + break;
>> + case ARM_CA57_L1_I_DATA_RAM:
>> + pr_cont("'L1-I Data RAM' index=%d bank= %d", index, way);
>> + break;
>
> Is index/way information really useful? I can't replace way-3 on the system, nor can I
> stop it being used. If its useless, I'd rather we don't bother parsing and printing it out.
I'll remove the index/way information and keep the CPUMERRSR_EL1 value
print (whoever wants this information can parse the value and get the
index/bank status).

>
>
>> + pr_cont(", repeat=%d, other=%d (CPUMERRSR_EL1=0x%llx)\n", repeat, other,
>> + val);
>
> 'other' here is another error, but we don't know the ramid.
> 'repeat' is another error for the same ramid.
>
> Could we still feed this stuff into edac? This would make the counters accurate if the
> polling frequency isn't quite fast enough.
There is no API in EDAC to increase the counters; I can increase them by
accessing the ce_count/ue_count fields of the
edac_device_ctl_info/edac_device_instance structures, if that's okay.
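
As a side note, the accounting itself could be sketched like this (pure
userspace model, field positions per the A57 TRM; how the total then reaches
the EDAC counters is exactly the open question above):

```c
#include <assert.h>
#include <stdint.h>

/* One error for the latched record itself, plus 'repeat' errors for the
 * same RAMID and 'other' errors with unknown RAMID that occurred while
 * this record was held.  Valid is bit [31], Repeat bits [39:32], Other
 * bits [47:40] per the A57 TRM. */
static unsigned int errors_in_record(uint64_t cpumerrsr)
{
	unsigned int repeat, other;

	if (!(cpumerrsr & (1ULL << 31)))	/* Valid bit clear */
		return 0;

	repeat = (cpumerrsr >> 32) & 0xff;
	other  = (cpumerrsr >> 40) & 0xff;
	return 1 + repeat + other;
}
```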

>
>
>> +static void al_a57_edac_l2merrsr(void *arg)
>> +{
>
>> + edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>
> How do we know this is corrected?
>
> If looks like L2CTLR_EL1[20] might force fatal 1/0 to map to uncorrected/corrected. Is
> this what you are depending on here?
No - not on this. Reporting all the errors as corrected seems bad.

Can I depend on the fatal field?

if (fatal)
edac_device_handle_ue(edac_dev, 0, 0, "L2 Error");
else
edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");

How can L2CTLR_EL1[20] force fatal?

>
> (it would be good to have a list of integration-time and firmware dependencies this driver
> has, for the next person who tries to enable it on their system and complains it doesn't
> work for them)
Where can I add such information?

>
>
>> + case ARM_CA57_L2_INC_PLRU_RAM:
>> + pr_cont("'L2 Inclusion PLRU RAM'");
>
> The A57 TRM describes this as "Inclusion PF RAM", and notes its only in r1p0 or later,
> (but doesn't say what it is). The A72 TRM describes the same encoding as "Inclusion PLRU
> RAM", which is something to do with its replacement policy. It has control bits that A57's
> version doesn't, so these are not the same thing.
>
> Disambiguating A57/A72 here is a load of faff, 'L2 internal metadata' probably covers both
> cases, but unless these RAMs are replaceable or can be disabled, there isn't much point
> working out which one it was.
Will be fixed to 'L2 internal metadata'.

>
>> + pr_cont(", cpuid/way=%d, repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)\n",
>> + way, repeat, other, val);
>
> cpuid could be useful if you can map it back to the cpu number linux has.
> If you can spot that cpu-7 is experiencing more errors than it should, you can leave it
> offline.
>
> To do this you'd need to map each L2MERRSR_EL1's '0-3' range back to the CPUs they
> actually are. The gic's 'ppi-partitions' does this with phandles, e.g.
> Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml. You could add a
> similar shaped thing to the l2-cacheX node in the DT, (or in your edac node, but it is a
> property of the cache integration).
As with the L1 prints, I'll remove the non-relevant prints.

>
>
>> + /*
>> + * Use get_online_cpus/put_online_cpus to prevent the online CPU map
>> + * changing while reading the L1/L2 error status
>
> For walking the list of offline cpus, this makes sense. But you schedule work without
> waiting, it may get run after you drop the cpus_read_lock()...,
Will update the smp_call_function_single() call to wait.

>
>
>> + get_online_cpus();
>
> The comment above these helpers is:
> | /* Wrappers which go away once all code is converted */
>
> cpus_read_lock()?
Will be updated.

>
>
>> + for_each_online_cpu(cpu) {
>> + /* Check L1 errors */
>> + smp_call_function_single(cpu, al_a57_edac_cpumerrsr, edac_dev,
>> + 0);
>
> As you aren't testing for big/little, wouldn't on_each_cpu() here be simpler?
It could be simpler for L1, but how can it be implemented for L2?

>
> As you don't wait, what stops al_a57_edac_cpumerrsr() feeding two errors into
> edac_device_handle_ce() at the same time? Do you need a spinlock in al_a57_edac_cpumerrsr()?
Will call smp_call_function_single() with wait.
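For concreteness, a minimal sketch of what the revised poll routine could look like (a hedged assumption about the next spin, not the actual patch; kernel-only APIs, won't build outside a kernel tree). Passing wait=1 makes the cross-call synchronous, so al_a57_edac_cpumerrsr() has finished, and is no longer using edac_dev, before the loop advances and before the lock is dropped:

```c
/* Sketch only: poll the L1 error syndrome on every online CPU. */
static void al_a57_edac_check(struct edac_device_ctl_info *edac_dev)
{
	int cpu;

	cpus_read_lock();	/* keep the online CPU map stable */
	for_each_online_cpu(cpu)
		smp_call_function_single(cpu, al_a57_edac_cpumerrsr,
					 edac_dev, 1 /* wait */);
	cpus_read_unlock();
}
```

The synchronous wait also closes the use-after-free window in the remove() path discussed later in the thread.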

>
>
>> + cluster = topology_physical_package_id(cpu);
>
> Hmm, I'm not sure cluster==package is guaranteed to be true forever.
>
> If you describe the L2MERRSR_EL1 cpu mapping in your DT you could use that. Otherwise
> pulling out the DT using something like the arch code's parse_cluster().
I rely on the fact that this is an Alpine-SoC-specific driver.

>
>
>> + if (cluster != last_cluster) {
>> + smp_call_function_single(cpu, al_a57_edac_l2merrsr,
>> + edac_dev, 0);
>> + last_cluster = cluster;
>> + }
>
> Here you depend on the CPUs being listed in cluster-order in the DT. I'm fairly sure the
> numbering is arbitrary: On my Juno 0,3,4,5 are the A53 cluster, and 1,2 are the A57 cluster.
>
> If 1,3,5 were cluster-a and 2,4,6 were cluster-b, you would end up calling
> al_a57_edac_l2merrsr() for each cpu. As you don't wait, they could race.
>
> If you can get a cpu-mask for each cluster, smp_call_function_any() would do the
> pick-one-online-cpu work for you.
Again, I rely on the fact that this is an Alpine-SoC-specific driver.
How can I get a cpu-mask for each cluster? From the DT?
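A sketch of the per-cluster L2 read James suggests, assuming a per-cluster cpumask can be built (e.g. from a ppi-partitions-style DT property). The cluster_masks array is hypothetical and this is not the actual driver code; smp_call_function_any() picks one online CPU from each mask and, with wait=1, runs the function there synchronously, so calls cannot race:

```c
/* Sketch only: read each cluster's L2MERRSR_EL1 exactly once. */
static void al_a57_edac_check_l2(struct edac_device_ctl_info *edac_dev,
				 const struct cpumask *cluster_masks,
				 int nr_clusters)
{
	int i;

	cpus_read_lock();
	for (i = 0; i < nr_clusters; i++)
		smp_call_function_any(&cluster_masks[i],
				      al_a57_edac_l2merrsr, edac_dev, 1);
	cpus_read_unlock();
}
```

This removes the dependency on CPUs being listed in cluster order in the DT.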

>
>
>> +static int al_a57_edac_remove(struct platform_device *pdev)
>> +{
>> + struct edac_device_ctl_info *edac_dev = platform_get_drvdata(pdev);
>> +
>> + edac_device_del_device(edac_dev->dev);
>> + edac_device_free_ctl_info(edac_dev);
>
> Your poll function schedule work on other CPUs and didn't wait, is it possible
> al_a57_edac_l2merrsr() is still using this memory when you free it?
This will be okay after using wait in smp_call_function_single().
>
>
>> +MODULE_LICENSE("GPL");
>
> | MODULE_LICENSE("GPL v2");
>
> To match the SPDX header?
Will be fixed.
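For reference, the mismatch James points out is that, per the kernel's license-rules documentation, a GPL-2.0-only SPDX tag pairs with the "GPL v2" module license string, while plain "GPL" means GPLv2 or later:

```c
// SPDX-License-Identifier: GPL-2.0

/* ... driver code ... */

MODULE_LICENSE("GPL v2");	/* matches the GPL-2.0-only SPDX tag */
```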

Thanks for your detailed review.

Thanks,
Hanna
>
>
>
> Thanks,
>
> James
>
>
> [0]
> http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
>

2019-06-12 05:08:48

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Tue, 2019-06-11 at 13:56 +0200, Borislav Petkov wrote:
> On Tue, Jun 11, 2019 at 05:21:39PM +1000, Benjamin Herrenschmidt wrote:
> > So looking again ... all the registration/removal of edac devices seem
> > to already be protected by mutexes, so that's not a problem.
> >
> > Tell me more about what specific races you think we might have here,
> > I'm not sure I follow...
>
> Well, as I said "it might work or it might set your cat on fire." For
> example, one of the error logging paths is edac_mc_handle_error() and
> that thing mostly operates using the *mci pointer which should be ok
> but then it calls the "trace_mc_event" tracepoint and I'd suppose that
> tracepoints can do lockless but I'm not sure.

Yes, we would be in a world of pain already if tracepoints couldn't
handle concurrency :-)

> So what needs to happen is for paths which weren't previously called by multiple
> EDAC agents in parallel, but now need to be called in parallel because ARM
> drivers want to do that, to be audited as safe.

That's the thing, I don't think we have such a path. We are talking about
having separate L1/L2 vs. MC drivers; they don't overlap.

> Situation is easy if you have one platform driver where you can
> synchronize things in the driver but since you guys need to do separate
> drivers for whatever reason, then that would need to be done prior.
>
> Makes more sense?

Sort-of... I still don't see a race in what we propose but I might be
missing something subtle. We are talking about two drivers for two
different IP blocks updating different counters etc...

Cheers,
Ben.


2019-06-12 07:48:40

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, Jun 12, 2019 at 08:25:52AM +1000, Benjamin Herrenschmidt wrote:
> Yes, we would be in a world of pain already if tracepoints couldn't
> handle concurrency :-)

Right, lockless buffer and the whole shebang :)

> Sort-of... I still don't see a race in what we propose but I might be
> missing something subtle. We are talking about two drivers for two
> different IP blocks updating different counters etc...

If you do only *that* you should be fine. That should technically be ok.

I still think, though, that the sensible thing to do is have one
platform driver which concentrates all RAS functionality. It is the
more sensible design and takes care of potential EDAC shortcomings and
the need to communicate between the different logging functionality,
as in, for example, "I had so many errors, lemme go and increase DRAM
scrubber frequency." For example. And all the other advantages of having
everything in a single driver.

And x86 already does that - we even have a single driver for all AMD
platforms - amd64_edac. Intel has a couple but there's still a lot of
sharing.

But apparently ARM folks want to have one driver per IP block. And we
have this discussion each time a new vendor decides to upstream its
driver. And there's no shortage of vendors in ARM-land trying to do
that.

James and I have tried to come up with a nice scheme to make that work
on ARM and he has an example prototype here:

http://www.linux-arm.org/git?p=linux-jm.git;a=shortlog;h=refs/heads/edac_dummy/v1

to show how it could look like.

But I'm slowly growing a serious aversion against having this very same
discussion each time an ARM vendor sends a driver. And that happens
pretty often nowadays.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-12 08:53:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, 2019-06-12 at 05:48 +0200, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 08:25:52AM +1000, Benjamin Herrenschmidt wrote:
> > Yes, we would be in a world of pain already if tracepoints couldn't
> > handle concurrency :-)
>
> Right, lockless buffer and the whole shebang :)

Yup.

> > Sort-of... I still don't see a race in what we propose but I might be
> > missing something subtle. We are talking about two drivers for two
> > different IP blocks updating different counters etc...
>
> If you do only *that* you should be fine. That should technically be ok.

Yes, that's the point.

> I still think, though, that the sensible thing to do is have one
> platform driver which concentrates all RAS functionality.

I tend to disagree here. We've been down that rabbit hole in the past
and we (Linux in general) are trying to move away from that sort of
"platform" overarching driver as much as possible.

> It is the
> more sensible design and takes care of potential EDAC shortcomings and
> the need to communicate between the different logging functionality,
> as in, for example, "I had so many errors, lemme go and increase DRAM
> scrubber frequency." For example. And all the other advantages of having
> everything in a single driver.

This is a policy. It should either belong to userspace, or be in some
generic RAS code in the kernel; there's no reason why these can't be
abstracted. Also, in your specific example, it could be entirely local
to the MC EDAC / DRAM controller path. We could have a generic way for
EDAC to advertise that a given memory channel is giving lots of errors
and have memory controller drivers listen to it, but usually the EDAC MC
driver *is* the only thing that looks like an MC driver to begin with.
So again, pretty much no overlap with L1/L2 cache RAS or PCIe RAS
etc...

> And x86 already does that - we even have a single driver for all AMD
> platforms - amd64_edac. Intel has a couple but there's still a lot of
> sharing.

Unless I'm mistaken, that amd64 EDAC is just an MC one... but I only
had a cursory glance at the code.

> But apparently ARM folks want to have one driver per IP block. And we
> have this discussion each time a new vendor decides to upstream its
> driver. And there's no shortage of vendors in ARM-land trying to do
> that.

For good reasons :-)

> James and I have tried to come up with a nice scheme to make that work
> on ARM and he has an example prototype here:
>
> http://www.linux-arm.org/git?p=linux-jm.git;a=shortlog;h=refs/heads/edac_dummy/v1
>
> to show how it could look like.
>
> But I'm slowly growing a serious aversion against having this very same
> discussion each time an ARM vendor sends a driver. And that happens
> pretty often nowadays.

Maybe because what you are promoting might not be the right path
here... seriously, there's a reason why all vendors want to go down
that path and in this case I don't think they are wrong.

This isn't about just another ARM vendor, in fact I'm rather new to the
whole ARM thing, I used to maintain arch/powerpc :-) The point is what
you are trying to push for goes against everything we've been trying to
do in Linux when it comes to splitting drivers to individual IP blocks.

Yes, in *some* cases coordination will be needed in which case there
are ways to do that that don't necessarily involve matching a driver to
the root of the DT, and a pseudo-device is in fact a very reasonable
way to do it, it was a common practice in IEEE1275 before I invented
the FDT, and we do that for a number of other things already.

Cheers,
Ben.


2019-06-12 14:48:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, Jun 12, 2019 at 06:29:26PM +1000, Benjamin Herrenschmidt wrote:
> I tend to disagree here. We've been down that rabbit hole in the past
> and we (Linux in general) are trying to move away from that sort of
> "platform" overarching driver as much as possible.

Why is a "platform" driver like that so bad?

> This is a policy. It should either belong to userspace,

For some errors you can't do userspace as it is too late for it - you
wanna address that before you return to it.

> or be in some generic RAS code in the kernel, there's no reason why
> these can't be abstracted.

Yes, we have this drivers/ras/cec.c thing which collects correctable
DRAM errors on x86. :-)

> Also in your specific example, it could be entirely local to the MC
> EDAC / DRAM controller path, we could have a generic way for EDAC to
> advertise that a given memory channel is giving lots of errors and
> have memory controller drivers listen to it but usually the EDAC MC
> driver *is* the only thing that looks like a MC driver to begin with,
>
> so again, pretty much no overlap with L1/L2 caches RAS or PCIe RAS
> etc...
>
> Unless I'm mistaken, that amd64 EDAC is just an MC one... but I only
> had a cursory glance at the code.

EDAC has historically been concentrating on DRAM errors as that is
what people have been paying attention to. But it isn't limited to
DRAM errors - there is some basic PCI errors functionality behind
edac_pci_create_generic_ctl() which polls for PCI parity errors.

So it still makes sense to me to have a single driver which takes care
of all things RAS for a platform. You just load one driver and it does
it all, including recovery actions.

> Maybe because what you are promoting might not be the right path
> here... seriously, there's a reason why all vendors want to go down
> that path and in this case I don't think they are wrong.
>
> This isn't about just another ARM vendor, in fact I'm rather new to the
> whole ARM thing, I used to maintain arch/powerpc :-)

What happened? :-P

> The point is what you are trying to push for goes against everything
> we've been trying to do in Linux when it comes to splitting drivers to
> individual IP blocks.

Please do explain why is a driver-per-IP-block better and please don't
give me the DT argument - I've heard that one more than enough now.

Like, for example, how do you deal with the case where you have a
platform which has a bunch of IP blocks with RAS functionality and they
all have a separate driver. Now, you want to do recovery and failover
actions depending on certain error count from a certain group of IP
blocks.

You export those counts through sysfs from the separate drivers and you
have a userspace agent doing that policy?

That cannot always fly because recovery actions for some errors need to
happen before we return to userspace - i.e., memory-failure.c types of
scenarios.

You add another "counting" layer which is yet another driver which
collects those errors and applies policy actions?

But then that layer needs to be made generic enough to be shared by the
other EDAC IP block drivers, otherwise every platform would need its own
counter layer driver. Which basically puts the problem somewhere else
but doesn't make it go away.

Another way I'm not thinking of at the moment?

A single driver solves that problem as it has all the required
information in one place and deals with it then and there.

I hear you that platform drivers are frowned upon but connecting it all
in one place for such purposes makes sense to me in this particular
case.

Btw, what is your final goal with these drivers? Dump decoded error
information in dmesg? Or something more sophisticated?

Thx.

--
Regards/Gruss,
Boris.


2019-06-12 14:48:58

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Em Wed, 12 Jun 2019 18:29:26 +1000
Benjamin Herrenschmidt <[email protected]> escreveu:

> On Wed, 2019-06-12 at 05:48 +0200, Borislav Petkov wrote:
> > On Wed, Jun 12, 2019 at 08:25:52AM +1000, Benjamin Herrenschmidt wrote:
> > > Yes, we would be in a world of pain already if tracepoints couldn't
> > > handle concurrency :-)
> >
> > Right, lockless buffer and the whole shebang :)
>
> Yup.
>
> > > Sort-of... I still don't see a race in what we propose but I might be
> > > missing something subtle. We are talking about two drivers for two
> > > different IP blocks updating different counters etc...
> >
> > If you do only *that* you should be fine. That should technically be ok.
>
> Yes, that's the point.

I don't think the EDAC core has any troubles with concurrency.

As far as I remember, the hacks we had to do at x86 world are due to
concurrency at the hardware side: having two RAS codes (either between
a driver and BIOS APEI/GHES or between two drivers) accessing the same
set of MCA registers doesn't work at the Intel chipsets I'm aware of.

>
> > I still think, though, that the sensible thing to do is have one
> > platform driver which concentrates all RAS functionality.
>

> I tend to disagree here. We've been down that rabbit hole in the past
> and we (Linux in general) are trying to move away from that sort of
> "platform" overarching driver as much as possible.

Without entering on ARM-specific architecture design, I tend to agree
with the principle that different hardware components would use different
RAS drivers, provided that they use the already existing RAS cores.

Up to some extent, we already have multiple RAS drivers right now.

Besides EDAC/MCE, there are for example, PCIe errors that can come via
AER. Network drivers also report errors using different mechanisms.
I know a person who is interested in working on an implementation
to collect disk errors using a trace-based error report mechanism.

So, at the end of the day, different errors may come from different
drivers using different non-overlapping error mechanisms.

That said, from the admin PoV, it makes sense to have a single
daemon that collect errors from all error sources and take the
needed actions.

>
> > It is the
> > more sensible design and takes care of potential EDAC shortcomings and
> > the need to communicate between the different logging functionality,
> > as in, for example, "I had so many errors, lemme go and increase DRAM
> > scrubber frequency." For example. And all the other advantages of having
> > everything in a single driver.
>
> This is a policy. It should either belong to userspace, or be in some
> generic RAS code in the kernel, there's no reason why these can't be
> abstracted. Also in your specific example, it could be entirely local
> to the MC EDAC / DRAM controller path, we could have a generic way for
> EDAC to advertise that a given memory channel is giving lots of errors
> and have memory controller drivers listen to it but usually the EDAC MC
> driver *is* the only thing that looks like a MC driver to begin with,
> so again, pretty much no overlap with L1/L2 caches RAS or PCIe RAS
> etc...

On userspace, I guess there's just the rasdaemon and the old legacy
edac-utils.

Right now, the rasdaemon doesn't have anything like that, but we keep
integrating new RAS report mechanisms to it (we just added support
this week for network devlink error reports).

If I had to put a policy like that in userspace, rasdaemon would probably
be the right place to add it.

Thanks,
Mauro

2019-06-12 14:56:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, Jun 12, 2019 at 07:42:42AM -0300, Mauro Carvalho Chehab wrote:
> That said, from the admin PoV, it makes sense to have a single
> daemon that collect errors from all error sources and take the
> needed actions.

Doing recovery actions in userspace is too flaky. Daemon can get killed
at any point in time and there are error types where you want to do
recovery *before* you return to userspace.

Yes, we do have different error reporting facilities but I still think
that concentrating all the error information needed in order to do
proper recovery action is the better approach here. And make that part
of the kernel so that it is robust. Userspace can still configure it and
so on.

--
Regards/Gruss,
Boris.


2019-06-12 16:58:15

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Em Wed, 12 Jun 2019 13:00:39 +0200
Borislav Petkov <[email protected]> escreveu:

> On Wed, Jun 12, 2019 at 07:42:42AM -0300, Mauro Carvalho Chehab wrote:
> > That said, from the admin PoV, it makes sense to have a single
> > daemon that collect errors from all error sources and take the
> > needed actions.
>
> Doing recovery actions in userspace is too flaky. Daemon can get killed
> at any point in time and there are error types where you want to do
> recovery *before* you return to userspace.

Yeah, some actions would work a lot better in kernelspace. Yet, some
actions would work a lot better if implemented in userspace.

For example, a server with multiple network interfaces may re-route
the traffic to a backup interface if the main one has too many errors.

This can easily be done on userspace.

> Yes, we do have different error reporting facilities but I still think
> that concentrating all the error information needed in order to do
> proper recovery action is the better approach here. And make that part
> of the kernel so that it is robust. Userspace can still configure it and
> so on.

If the error reporting facilities are for the same hardware "group"
(like the machine's memory controllers), I agree with you: it makes
sense to have a single driver.

If they are for completely independent hardware, then implementing them
as separate drivers would work equally well, with the advantage of
making them easier to maintain and generic enough to support
different vendors using the same IP block.

Thanks,
Mauro

2019-06-12 17:10:41

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, 2019-06-12 at 08:42 -0300, Mauro Carvalho Chehab wrote:
> > Yes, we do have different error reporting facilities but I still
> > think
> > that concentrating all the error information needed in order to do
> > proper recovery action is the better approach here. And make that
> > part
> > of the kernel so that it is robust. Userspace can still configure
> > it and
> > so on.
>
> If the error reporting facilities are for the same hardware "group"
> (like the machine's memory controllers), I agree with you: it makes
> sense to have a single driver.
>
> If they are for completely independent hardware, then implementing them
> as separate drivers would work equally well, with the advantage of
> making them easier to maintain and generic enough to support
> different vendors using the same IP block.

Right. And if you really want a platform orchestrator for recovery in
the kernel, it should be a separate one that consumes data from the
individual IP block drivers that report the raw errors anyway.

But for the main case that really needs to be in the kernel, which is
DRAM, the recovery can usually be contained to the MC driver anyway.

Cheers,
Ben.


2019-06-12 17:15:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, Jun 12, 2019 at 09:57:40PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2019-06-12 at 08:42 -0300, Mauro Carvalho Chehab wrote:
> > > Yes, we do have different error reporting facilities but I still
> > > think
> > > that concentrating all the error information needed in order to do
> > > proper recovery action is the better approach here. And make that
> > > part
> > > of the kernel so that it is robust. Userspace can still configure
> > > it and
> > > so on.
> >
> > If the error reporting facilities are for the same hardware "group"
> > (like the machine's memory controllers), I agree with you: it makes
> > sense to have a single driver.
> >
> > If they are for completely independent hardware, then implementing them
> > as separate drivers would work equally well, with the advantage of
> > making them easier to maintain and generic enough to support
> > different vendors using the same IP block.
>
> Right. And if you really want a platform orchestrator for recovery in
> the kernel, it should be a separate one that consumes data from the
> individual IP block drivers that report the raw errors anyway.

Yap, I think we're in agreement here. I believe the important question
is whether you need to get error information from multiple sources
together in order to do proper recovery or doing it per error source
suffices.

And I think the actual use cases could/should dictate our
drivers/orchestrators design.

Thus my question: how are you guys planning on tying all the error info
the drivers report into the whole system design?

> But for the main case that really needs to be in the kernel, which is
> DRAM, the recovery can usually be contained to the MC driver anyway.

Right, if that is enough to handle the error properly.

The memory-failure.c example I gave before is the error reporting
mechanism (x86 MCA) calling into the mm subsystem to poison and isolate
page frames which are known to contain errors. So you have two things
talking to each other.

--
Regards/Gruss,
Boris.


2019-06-12 17:24:27

by Hanna Hawa

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Boris,

>
> Yap, I think we're in agreement here. I believe the important question
> is whether you need to get error information from multiple sources
> together in order to do proper recovery or doing it per error source
> suffices.
>
> And I think the actual use cases could/should dictate our
> drivers/orchestrators design.
>
> Thus my question: how are you guys planning on tying all the error info
> the drivers report into the whole system design?
We have a daemon script that collects correctable/uncorrectable errors
from EDAC sysfs and reports them to an Amazon service that allows us to
take action at specific error thresholds.

Thanks,
Hanna
>

2019-06-12 18:06:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, Jun 12, 2019 at 03:35:31PM +0300, Hawa, Hanna wrote:
> We have a daemon script that collects correctable/uncorrectable errors from
> EDAC sysfs and reports them to an Amazon service that allows us to take
> action at specific error thresholds.

What does "take action" mean, more specifically? Is the script taking
action to poison/isolate memory or a guy goes down to the basement and
replaces parts? :-)

IOW, I'm interested in whether you guys need an additional recovery
action agent and if so, whether it would make sense to have something
generic in the tree, which can be shared with others...

Thx.

--
Regards/Gruss,
Boris.


2019-06-13 16:35:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Thu, Jun 13, 2019 at 09:54:18AM +1000, Benjamin Herrenschmidt wrote:
> It tends to be a slippery slope. Also, in the ARM world, most SoCs tend
> to re-use IP blocks, so you get a lot of code duplication, bugs fixed in
> one and not the other, etc...

Yes, I'd like to be able to reuse EDAC drivers if they're for single IP
blocks and those IP blocks get integrated by multiple vendors.

> I don't necessarily mind having a "platform" component that handles
> policies in case where userspace is really not an option, but it
> shouldn't be doing it by containing the actual drivers for the
> individual IP block error collection. It could however "use" them via
> in-kernel APIs.

Ok, sounds good.

> Those are rare. At the end of the day, if you have a UE on memory, it's
> a matter of luck. It could have hit your kernel as well. You get lucky
> it only hit userspace but you can't make a general statement you "can't
> trust userspace".

I'm not saying that - I'm saying that if we're going to do a
comprehensive solution we better address all possible error severities
with adequate handling.

> Cache errors tend to be the kind that tend to have to be addressed
> immediately, but even then, that's often local to some architecture
> machine check handling, not even in EDAC.

That's true.

> Do you have a concrete example of a type of error that
>
> - Must be addressed in the kernel
>
> - Relies on coordinating drivers for more than one IP block
>
> ?

My usual example at the end of the #MC handler on x86, do_machine_check():

	/* Fault was in user mode and we need to take some action */
	if ((m.cs & 3) == 3) {
		ist_begin_non_atomic(regs);
		local_irq_enable();

		if (kill_it || do_memory_failure(&m))
			force_sig(SIGBUS, current);

we try to poison and if we fail or have to kill the process anyway, off
it goes.

Yes, this is not talking to EDAC drivers yet but is exemplary for a more
involved recovery action.

> Even then though, my argument would be that the right way to do that,
> assuming that's even platform-specific, would be to have the
> "platform RAS driver" just layer on top of the individual EDAC drivers
> and consume their output. Not contain the drivers themselves.

Ok, that's a fair point and I like the design of that.

> Using machine checks, not EDAC. It's completely orthogonal at this
> point at least.

No, it is using errors reported through the Machine Check Architecture.
EDAC uses the same DRAM error reports. They all come from MCA on x86. It
is a whole notifier chain which gets to see those errors but they all
come from MCA.

PCI errors get reported differently, of course.

EDAC is just a semi-dumb layer around some of those error reporting
mechanisms.

> That said, it would make sense to have an EDAC API to match that
> address back into a DIMM location and give user an informational
> message about failures happening on that DIMM. But that could be done
> via core EDAC MC APIs.

That's what the EDAC drivers on x86 do. All of them :-)

> Here too, no need for having an over-arching platform driver.

Yes, the EDAC drivers which implement all the memory controller
functionality, already do that mapping back. Or at least try to. There's
firmware doing that on x86 too but that's a different story.

<will reply to the rest later in another mail as this one is becoming
too big anyway>.

Thx.

--
Regards/Gruss,
Boris.


2019-06-13 17:01:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, 2019-06-12 at 12:42 +0200, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 06:29:26PM +1000, Benjamin Herrenschmidt wrote:
> > I tend to disagree here. We've been down that rabbit hole in the past
> > and we (Linux in general) are trying to move away from that sort of
> > "platform" overarching driver as much as possible.
>
> Why is a "platform" driver like that so bad?

It tends to be a slippery slope. Also, in the ARM world, most SoCs tend
to re-use IP blocks, so you get a lot of code duplication, bugs fixed in
one and not the other, etc...

I don't necessarily mind having a "platform" component that handles
policies in case where userspace is really not an option, but it
shouldn't be doing it by containing the actual drivers for the
individual IP block error collection. It could however "use" them via
in-kernel APIs.

> > This is a policy. It should either belong to userspace,
>
> For some errors you can't do userspace as it is too late for it - you
> wanna address that before you return to it.

Those are rare. At the end of the day, if you have a UE on memory, it's
a matter of luck. It could have hit your kernel as well. You get lucky
it only hit userspace but you can't make a general statement you "can't
trust userspace".

Cache errors tend to be the kind that tend to have to be addressed
immediately, but even then, that's often local to some architecture
machine check handling, not even in EDAC.

Do you have a concrete example of a type of error that

- Must be addressed in the kernel

- Relies on coordinating drivers for more than one IP block

?

Even then though, my argument would be that the right way to do that,
assuming that's even platform-specific, would be to have the
"platform RAS driver" just layer on top of the individual EDAC drivers
and consume their output. Not contain the drivers themselves.

> > or be in some generic RAS code in the kernel, there's no reason why
> > these can't be abstracted.
>
> Yes, we have this drivers/ras/cec.c thing which collects correctable
> DRAM errors on x86. :-)

Using machine checks, not EDAC. It's completely orthogonal at this
point at least.

That said, it would make sense to have an EDAC API to match that
address back into a DIMM location and give user an informational
message about failures happening on that DIMM. But that could be done
via core EDAC MC APIs.

Here too, no need for having an over-arching platform driver.

> > Also in your specific example, it could be entirely local to the MC
> > EDAC / DRAM controller path, we could have a generic way for EDAC to
> > advertise that a given memory channel is giving lots of errors and
> > have memory controller drivers listen to it but usually the EDAC MC
> > driver *is* the only thing that looks like a MC driver to begin with,
> >
> > so again, pretty much no overlap with L1/L2 caches RAS or PCIe RAS
> > etc...
> >
> > Unless I'm mistaken, that amd64 EDAC is just an MC one... but I only
> > had a cursory glance at the code.
>
> EDAC has historically been concentrating on DRAM errors as that is
> what people have been paying attention to. But it isn't limited to
> DRAM errors - there is some basic PCI errors functionality behind
> edac_pci_create_generic_ctl() which polls for PCI parity errors.

Right, somebody whacked the PCI stuff into the same driver. So what?
There's no coordination here, nor any particular reason it has to be so.
Those PCI bits could have moved to a separate driver easily. Maybe they
didn't because they didn't have a good way to probe the two separately via
ACPI? I don't know. But it doesn't matter, nor does it affect the
situation with ARM.

That said, x86 platforms tend to be less diverse in their selection of
IP blocks, and tend to have more integrated chipsets where the
distinction between the memory controller and PCI may be a bit less
obvious. This isn't the case on ARM.

I still think that doesn't prove or disprove anything.

> So it still makes sense to me to have a single driver which takes care
> of all things RAS for a platform. You just load one driver and it does
> it all, including recovery actions.

Why ? Because one or two historical drivers mix MC and PCI then "it
makes sense" to do that for everybody ?

And then you have 20 platforms and 20 drivers, with 50% or more code
duplication, bugs fixed in one and not the other, gratuitous behaviour
differences to confuse users, etc... No, that doesn't make sense.

> > Maybe because what you are promoting might not be the right path
> > here... seriously, there's a reason why all vendors want to go down
> > that path and in this case I don't think they are wrong.
> >
> > This isn't about just another ARM vendor, in fact I'm rather new to the
> > whole ARM thing, I used to maintain arch/powerpc :-)
>
> What happened? :-P

Long story :-) I handed it over to mpe a while ago; I left IBM earlier
this year.

> > The point is what you are trying to push for goes against everything
> > we've been trying to do in Linux when it comes to splitting drivers to
> > individual IP blocks.
>
> Please do explain why is a driver-per-IP-block better and please don't
> give me the DT argument - I've heard that one more than enough now.

I have no idea what "the DT argument" is, and that's from the guy who
created the FDT....

I have difficulties understanding how you cannot see that having
reusable single drivers for a single piece of HW makes sense, if
anything in terms of avoiding duplication, bitrot, bugs being fixed in
some and not others, etc... It also means more eyes on a given piece of
code, which is a good thing.

Also, your "have heard more than enough" is again a sign that a whole
lot of people are trying to tell you something that you seem to refuse
to hear. Whatever that "DT argument" is, did you just ignore it, or did
you have some good and solid arguments of your own to refute it ?

> Like, for example, how do you deal with the case where you have a
> platform which has a bunch of IP blocks with RAS functionality and they
> all have a separate driver. Now, you want to do recovery and failover
> actions depending on certain error count from a certain group of IP
> blocks.
>
> You export those counts through sysfs from the separate drivers and you
> have a userspace agent doing that policy?

So mostly yes. Works fine for POWER9 :-) That said, you'll have to be
more precise, because so far this is very hypothetical.

> That cannot always fly because recovery actions for some errors need to
> happen before we return to userspace - i.e., memory-failure.c types of
> scenarios.

How come ? Most memory-failure-type scenarios tend to be handled via
MCEs anyway and don't go through EDAC. Additionally, if they do, that
can generally be constrained to the MC driver. But even then, what kinds
of errors must be handled "before we return to userspace" ?

However, if we want to create some overall policy, then we should create
some in-kernel APIs so that your magic "platform driver" can talk to
the individual EDAC drivers (or get notified by them).

Mangling everything together is definitely NOT the way to go here.

However, what I see above is a lot of hand waving and hypothetical
examples, nothing really concrete.

> You add another "counting" layer which is yet another driver which
> collects those errors and applies policy actions?

No. Individual drivers count and report. Not sure what you mean here.

> But then that layer needs to be made generic enough to be shared by the
> other EDAC IP block drivers, otherwise every platform would need its own
> counter layer driver. Which basically puts the problem somewhere else
> but doesn't make it go away.

Not sure what you mean by a "counter layer driver"...

> Another way I'm not thinking of at the moment?
>
> A single driver solves that problem as it has all the required
> information in one place and deals with it then and there.

We could also have the entire BSP of a platform as one giant driver
that does all the RAS, networking, serial, and video... why not ? That
would make your policies a lot easier :-) No, seriously, I really fail
to see your point, and it looks like I'm not the only one.

> I hear you that platform drivers are frowned upon but connecting it all
> in one place for such purposes makes sense to me in this particular
> case.

What makes some amount of sense *if necessary* (and I have yet to be
convinced it is) is to have the platform policy driver use internal
kernel APIs to communicate with the individual IP block drivers via
something reasonably standard.

But even then, I have yet to see an actual need for this.

> Btw, what is your final goal with these drivers? Dump decoded error
> information in dmesg? Or something more sophisticated?

I'll let Hanna respond to that one.

Cheers,
Ben.


2019-06-13 17:02:31

by Benjamin Herrenschmidt

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, 2019-06-12 at 14:25 +0200, Borislav Petkov wrote:
> > But for the main case that really needs to be in the kernel, which is
> > DRAM, the recovery can usually be contained to the MC driver anyway.
>
> Right, if that is enough to handle the error properly.
>
> The memory-failure.c example I gave before is the error reporting
> mechanism (x86 MCA) calling into the mm subsystem to poison and isolate
> page frames which are known to contain errors. So you have two things
> talking to each other.

And none of them is an EDAC driver...

I mean yes, the network drivers talk to the network stack, or even the
memory allocator :-)

I still don't see how that requires a big platform coordinator...

Ben.


2019-06-13 17:02:31

by Benjamin Herrenschmidt

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

On Wed, 2019-06-12 at 13:00 +0200, Borislav Petkov wrote:
> On Wed, Jun 12, 2019 at 07:42:42AM -0300, Mauro Carvalho Chehab wrote:
> > That's said, from the admin PoV, it makes sense to have a single
> > daemon that collect errors from all error sources and take the
> > needed actions.
>
> Doing recovery actions in userspace is too flaky. Daemon can get killed
> at any point in time

So what ? If root kills your RAS daemon, then so be it. That has never
been a problem on POWER8/POWER9 server platforms and those have some of
the nastiest RAS in town.

You can kill PID 1 too you know ...

> and there are error types where you want to do recovery *before* you return to userspace.

Very few (precise examples, please), and I have yet to see why those
need some kind of magic coordinator.

> Yes, we do have different error reporting facilities but I still think
> that concentrating all the error information needed in order to do
> proper recovery action is the better approach here. And make that part
> of the kernel so that it is robust. Userspace can still configure it and
> so on.

Ben.


2019-06-13 17:05:49

by James Morse

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Hawa,

On 11/06/2019 20:56, Hawa, Hanna wrote:
> James Morse wrote:
>> Hawa, Hanna wrote:
>>> +    edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>>
>> How do we know this was corrected?
>>
>> 6.4.8 "Error Correction Code" has "Double-bit ECC errors set the fatal bit." in a
>> paragraph talking about the L1 memory system.

> I'll check the fatal field to determine whether the error is uncorrected or corrected.


>>> +    edac_printk(KERN_CRIT, DRV_NAME, "CPU%d L1 %serror detected\n",
>>> +            cpu, (fatal) ? "Fatal " : "");
>>> +    edac_printk(KERN_CRIT, DRV_NAME, "RAMID=");
>>> +
>>> +    switch (ramid) {
>>> +    case ARM_CA57_L1_I_TAG_RAM:
>>> +        pr_cont("'L1-I Tag RAM' index=%d way=%d", index, way);
>>> +        break;
>>> +    case ARM_CA57_L1_I_DATA_RAM:
>>> +        pr_cont("'L1-I Data RAM' index=%d bank= %d", index, way);
>>> +        break;
>>
>> Is index/way information really useful? I can't replace way-3 on the system, nor can I
>> stop it being used. If its useless, I'd rather we don't bother parsing and printing it out.

> I'll remove the index/way information and keep the CPUMERRSR_EL1 value print (whoever wants
> this information can parse the value and get the index/bank status)

Good idea, just print it raw.


>>> +    pr_cont(", repeat=%d, other=%d (CPUMERRSR_EL1=0x%llx)\n", repeat, other,
>>> +        val);
>>
>> 'other' here is another error, but we don't know the ramid.
>> 'repeat' is another error for the same ramid.
>>
>> Could we still feed this stuff into edac? This would make the counters accurate if the
>> polling frequency isn't quite fast enough.

> There is no API in EDAC to increase the counters. I can increase them by accessing the
> ce_count/ue_count from the edac_device_ctl_info/edac_device_instance structures, if that's okay.

Ah, sorry, I was thinking of edac_mc_handle_error(), which has an error_count parameter.

(I wouldn't go poking in the structures directly).


>>> +static void al_a57_edac_l2merrsr(void *arg)
>>> +{
>>
>>> +    edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>>
>> How do we know this is corrected?

>> It looks like L2CTLR_EL1[20] might force fatal 1/0 to map to uncorrected/corrected. Is
>> this what you are depending on here?

> No - not on this. Reporting all the errors as corrected seems to be bad.
>
> Can I depend on the fatal field?

That is described as "set to 1 on the first memory error that caused a Data Abort". I
assume this is one of the parity-error external-aborts.

If the repeat counter shows, say, 2, and fatal is set, you only know that at least one of
these errors caused an abort. But it could have been all three. The repeat counter only
matches against the RAMID and friends, otherwise the error is counted in 'other'.

I don't think there is a right thing to do here (other than increasing the scrubbing
frequency). As you can only feed one error into EDAC at a time:

> if (fatal)
>     edac_device_handle_ue(edac_dev, 0, 0, "L2 Error");
> else
>     edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");

seems reasonable. You're reporting the most severe, and 'other/repeat' counter values just
go missing.


> How can L2CTLR_EL1[20] force fatal?

I don't think it can, on a second reading, it looks to be even more complicated than I
thought! That bit is described as disabling forwarding of uncorrected data, but it looks
like the uncorrected data never actually reaches the other end. (I'm unsure what 'flush'
means in this context.)
I was looking for reasons you could 'know' that any reported error was corrected. This was
just a bad suggestion!


>> (it would be good to have a list of integration-time and firmware dependencies this driver
>> has, for the next person who tries to enable it on their system and complains it doesn't
>> work for them)

> Where can I add such information?

The mailing list archive is good enough. I'll ask about any obvious dependency I spot from
the TRM, (e.g. that list at the top of my first reply). If you know of anything weird
please call it out.


>>> +    pr_cont(", cpuid/way=%d, repeat=%d, other=%d (L2MERRSR_EL1=0x%llx)\n",
>>> +        way, repeat, other, val);
>>
>> cpuid could be useful if you can map it back to the cpu number linux has.
>> If you can spot that cpu-7 is experiencing more errors than it should, you can leave it
>> offline.
>>
>> To do this you'd need to map each L2MERRSR_EL1's '0-3' range back to the CPUs they
>> actually are. The gic's 'ppi-partitions' does this with phandles, e.g.
>> Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml. You could add a
>> similar shaped thing to the l2-cacheX node in the DT, (or in your edac node, but it is a
>> property of the cache integration).

> As in L1 prints, I'll remove non-relevant prints.

Fair enough.


>>> +    /*
>>> +     * Use get_online_cpus/put_online_cpus to prevent the online CPU map
>>> +     * changing while reads the L1/L2 error status
>>
>> For walking the list of offline cpus, this makes sense. But you schedule work without
>> waiting, it may get run after you drop the cpus_read_lock()...,

> I will update the smp_call_function_single() call to wait.


>>> +    for_each_online_cpu(cpu) {
>>> +        /* Check L1 errors */
>>> +        smp_call_function_single(cpu, al_a57_edac_cpumerrsr, edac_dev,
>>> +                     0);
>>
>> As you aren't testing for big/little, wouldn't on_each_cpu() here be simpler?

> Could be simpler for L1; how can it be implemented for L2?

You'd need bitmasks for each cluster to feed to smp_call_function_any().


>>> +        cluster = topology_physical_package_id(cpu);
>>
>> Hmm, I'm not sure cluster==package is guaranteed to be true forever.
>>
>> If you describe the L2MERRSR_EL1 cpu mapping in your DT you could use that. Otherwise
>> pulling out the DT using something like the arch code's parse_cluster().

> I rely on this being an Alpine-SoC-specific driver.

... and that the topology code hasn't changed to really know what a package is:
https://lore.kernel.org/lkml/[email protected]/T/#u

As what you really want to know is 'same L2?', and you're holding the cpu_read_lock(),
would struct cacheinfo's shared_cpu_map be a better fit?

This would be done by something like a cpu-mask of cache:shared_cpu_map's for the L2's
you've visited. It removes the dependency on package==L2, and insulates you from the
cpu-numbering not being exactly as you expect.
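The shared_cpu_map idea can be sketched with plain bitmasks. This is a userspace model; `shared_map` stands in for struct cacheinfo's shared_cpu_map, and the function name is made up for illustration:

```c
#include <stdint.h>

/* Pick one representative CPU per L2, given each online CPU's
 * cache-sharing mask (bit N set means CPU N shares this cache).
 * A CPU is skipped if its L2 was already covered by an earlier
 * CPU's mask, so the result is independent of how the clusters
 * happen to be interleaved in the numbering. */
static uint64_t pick_one_cpu_per_l2(const uint64_t *shared_map, int nr_cpus)
{
	uint64_t visited = 0, chosen = 0;
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (visited & (1ULL << cpu))
			continue;		/* this CPU's L2 already handled */
		chosen |= 1ULL << cpu;		/* representative for its L2 */
		visited |= shared_map[cpu];	/* mark all sharers handled */
	}
	return chosen;
}
```

Each chosen CPU would then be the target of one smp_call_function_single() (or the whole sharer mask fed to smp_call_function_any()), so each L2MERRSR_EL1 is read exactly once.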


>>> +        if (cluster != last_cluster) {
>>> +            smp_call_function_single(cpu, al_a57_edac_l2merrsr,
>>> +                         edac_dev, 0);
>>> +            last_cluster = cluster;
>>> +        }
>>
>> Here you depend on the CPUs being listed in cluster-order in the DT. I'm fairly sure the
>> numbering is arbitrary: On my Juno 0,3,4,5 are the A53 cluster, and 1,2 are the A57
>> cluster.
>>
>> If 1,3,5 were cluster-a and 2,4,6 were cluster-b, you would end up calling
>> al_a57_edac_l2merrsr() for each cpu. As you don't wait, they could race.
>>
>> If you can get a cpu-mask for each cluster, smp_call_function_any() would do the
>> pick-one-online-cpu work for you.

> Again, I rely on this being an Alpine-SoC-specific driver.
> How can I get a cpu-mask for each cluster? From the DT?

It's not the cluster you want, it's the L2. Cacheinfo has this for online CPUs, and you're
already holding the cpus_read_lock().


>>> +static int al_a57_edac_remove(struct platform_device *pdev)
>>> +{
>>> +    struct edac_device_ctl_info *edac_dev = platform_get_drvdata(pdev);
>>> +
>>> +    edac_device_del_device(edac_dev->dev);
>>> +    edac_device_free_ctl_info(edac_dev);
>>
>> Your poll function schedule work on other CPUs and didn't wait, is it possible
>> al_a57_edac_l2merrsr() is still using this memory when you free it?

> This will be okay after using wait in smp_call_function_single().

Yup.


Thanks,

James

2019-06-14 10:50:29

by James Morse

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Hawa,

On 13/06/2019 18:05, James Morse wrote:
> On 11/06/2019 20:56, Hawa, Hanna wrote:
>> James Morse wrote:
>>> Hawa, Hanna wrote:
>>>> +        if (cluster != last_cluster) {
>>>> +            smp_call_function_single(cpu, al_a57_edac_l2merrsr,
>>>> +                         edac_dev, 0);
>>>> +            last_cluster = cluster;
>>>> +        }
>>> Here you depend on the CPUs being listed in cluster-order in the DT. I'm fairly sure the
>>> numbering is arbitrary: On my Juno 0,3,4,5 are the A53 cluster, and 1,2 are the A57
>>> cluster.
>>>
>>> If 1,3,5 were cluster-a and 2,4,6 were cluster-b, you would end up calling
>>> al_a57_edac_l2merrsr() for each cpu. As you don't wait, they could race.
>>>
>>> If you can get a cpu-mask for each cluster, smp_call_function_any() would do the
>>> pick-one-online-cpu work for you.

>> Again, I rely on this being an Alpine-SoC-specific driver.

An example of where this goes wrong is kexec:
If you offline CPU0, then kexec, the new kernel will start up on the lowest numbered
online CPU, which won't be zero. But the new kernel will call it CPU0.

Kdump is even better, as it starts up on whichever CPU called panic(), and calls it CPU0.


Thanks,

James


>> How can I get cpu-mask for each cluster? from DT?

> It's not the cluster you want, it's the L2. Cacheinfo has this for online CPUs, and you're
> already holding the cpus_read_lock().

2019-06-14 10:53:33

by Borislav Petkov

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Reply part 2.

On Thu, Jun 13, 2019 at 09:54:18AM +1000, Benjamin Herrenschmidt wrote:
> Why ? Because one or two historical drivers mix MC and PCI then "it
> makes sense" to do that for everybody ?

Because it was like that. And now all of a sudden ARM wants something
different. So we must at least talk about it before we do it, right?

Also, I don't know if you've noticed but RAS "architecture" on Linux is
still a big WIP, to put it mildly. So before we do anything, we should
have at least some rough idea of where it is all going to.

> And then you have 20 platforms and 20 drivers, with 50% or more code
> duplication, bugs fixed in one and not the other, gratuitous behaviour
> differences to confuse users, etc... No, that doesn't make sense.

It's no different on ARM: you have a memory controller IP which is
roughly the same, but different vendors integrate it and each tweak it
a bit in their own way (registers, ECC support, etc.), so you get an
EDAC MC driver from every vendor and they all don't share the basic
functionality.

> I have no idea what "the DT argument" is, and that's from the guy who
> created the FDT....
>
> I have difficulties understanding how you cannot see that having
> reusable single drivers for a single piece of HW makes sense, if
> anything in terms of avoiding duplication, bitrot, bugs being fixed in
> some and not others, etc... It also means more eyes on a given piece
> of code, which is a good thing.
>
> Also, your "have heard more than enough" is again a sign that a whole
> lot of people are trying to tell you something that you seem to refuse
> to hear.

Hmm, I think I'm hearing it - but not without good arguments for why
we're going to do it. I believe that has become clear by now.

> Whatever that "DT argument" is, did you just ignore it or had
> some good and solid arguments of your own to refute it ?

I don't care about refuting it or not - all I care about is getting good
arguments for why we should do this driver-per-IP-block thing. EDAC was
OK so far - I wasn't going to change it just because someone was sending
me drivers per IP block without selling me the idea properly.

And AFAIR I haven't heard a single good argument trying to convince me
why it should be done this way. Only after this thread started and we
started poking at it did I get some good arguments.

So enough wasting time, I think we can try the per-IP things and see
where it would get us.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-06-17 13:02:05

by Hanna Hawa

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC


>>>> +static void al_a57_edac_l2merrsr(void *arg)
>>>> +{
>>>
>>>> +    edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>>>
>>> How do we know this is corrected?
>
>>> It looks like L2CTLR_EL1[20] might force fatal 1/0 to map to uncorrected/corrected. Is
>>> this what you are depending on here?
>
>> No - not on this. Reporting all the errors as corrected seems to be bad.
>>
>> Can I depend on the fatal field?
>
> That is described as "set to 1 on the first memory error that caused a Data Abort". I
> assume this is one of the parity-error external-aborts.
>
> If the repeat counter shows, say, 2, and fatal is set, you only know that at least one of
> these errors caused an abort. But it could have been all three. The repeat counter only
> matches against the RAMID and friends, otherwise the error is counted in 'other'.
>
> I don't think there is a right thing to do here (other than increasing the scrubbing
> frequency). As you can only feed one error into EDAC at a time:
>
>> if (fatal)
>>     edac_device_handle_ue(edac_dev, 0, 0, "L2 Error");
>> else
>>     edac_device_handle_ce(edac_dev, 0, 0, "L2 Error");
>
> seems reasonable. You're reporting the most severe, and 'other/repeat' counter values just
> go missing.
I had printed the 'other/repeat' values so they would be noticed.

>
>
>> How can L2CTLR_EL1[20] force fatal?
>
> I don't think it can, on a second reading, it looks to be even more complicated than I
> thought! That bit is described as disabling forwarding of uncorrected data, but it looks
> like the uncorrected data never actually reaches the other end. (I'm unsure what 'flush'
> means in this context.)
> I was looking for reasons you could 'know' that any reported error was corrected. This was
> just a bad suggestion!
Is there an interrupt for uncorrectable errors?
Are 'asynchronous errors' in the L2 used to report UEs?

In case there is no interrupt, can we use the die-notifier subsystem to
check whether any error occurred while the system shuts down?

>>>> +        cluster = topology_physical_package_id(cpu);
>>>
>>> Hmm, I'm not sure cluster==package is guaranteed to be true forever.
>>>
>>> If you describe the L2MERRSR_EL1 cpu mapping in your DT you could use that. Otherwise
>>> pulling out the DT using something like the arch code's parse_cluster().
>
>> I rely on this being an Alpine-SoC-specific driver.
>
> ... and that the topology code hasn't changed to really know what a package is:
> https://lore.kernel.org/lkml/[email protected]/T/#u
>
> As what you really want to know is 'same L2?', and you're holding the cpu_read_lock(),
> would struct cacheinfo's shared_cpu_map be a better fit?
>
> This would be done by something like a cpu-mask of cache:shared_cpu_map's for the L2's
> you've visited. It removes the dependency on package==L2, and insulates you from the
> cpu-numbering not being exactly as you expect.
I'll add a DT property that points to the L2-cache node (a phandle); then it'll
be easy to create a cpu-mask of all cores that point to the same L2 cache.

Thanks,
Hanna


2019-06-19 17:23:49

by James Morse

Subject: Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

Hi Hawa,

On 17/06/2019 14:00, Hawa, Hanna wrote:
>> I don't think it can, on a second reading, it looks to be even more complicated than I
>> thought! That bit is described as disabling forwarding of uncorrected data, but it looks
>> like the uncorrected data never actually reaches the other end. (I'm unsure what 'flush'
>> means in this context.)
>> I was looking for reasons you could 'know' that any reported error was corrected. This was
>> just a bad suggestion!

> Is there interrupt for un-correctable error?

The answer here is somewhere between 'not really' and 'maybe'.
There is a signal you may have wired up as an interrupt, but it's not usable from Linux.

A.8.2 "Asynchronous error signals" of the A57 TRM [0] has:
| nINTERRIRQ output Error indicator for an L2 RAM double-bit ECC error.
("7.6 Asynchronous errors" has more on this).

An error causes L2ECTLR[30] to be set, and this value is output as a signal, which you may
have wired up as an interrupt.

If you did, beware that it is level-sensitive and can only be cleared by writing to
L2ECTLR_EL1. You shouldn't allow Linux to access this register, as it could mess with the
L2 configuration, which could also affect your EL3 and any secure-world software.

The arrival of this interrupt doesn't tell you which L2 tripped the error, and you can
only clear it if you write to L2ECTLR_EL1 on a CPU attached to the right L2. So this isn't
actually a shared (peripheral) interrupt.

This stuff is expected to be used by firmware, which can know the affinity constraints of
signals coming in as interrupts.


> Does 'asynchronous errors' in L2 used to report UE?

From "7.2.4 Error correction code" single-bit errors are always corrected.
A.8.2 quoted above gives the behaviour for double-bit errors.


> In case there is no interrupt, can we use the die-notifier subsystem to check whether any
> error occurred while the system shuts down?

notify_die() would imply a synchronous exception that killed a thread. SErrors are a whole
lot worse. Before v8.2 these are all treated as 'uncontained': unknown memory corruption.
Which in your L2 case is exactly what happened. The arch code will panic().

If your driver can print something useful to help debug the panic(), then a panic_notifier
sounds appropriate. But you can't rely on these notifiers being called, as kdump has some
hooks that affect if/when they run.

(KVM will 'contain' SErrors that come from a guest to the guest, as we know a distinct set
of memory was in use. You may see fatal error counters increasing without the system
panic()ing.)

contained/uncontained is part of the terminology from the v8.2 RAS spec [1].
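In the kernel, the panic-notifier approach means hanging a notifier_block on panic_notifier_list via atomic_notifier_chain_register(). The callback-chain pattern itself can be modeled in userspace; every name below is a stand-in for illustration, not the kernel API:

```c
#include <stdio.h>

/* Userspace model of a panic-notifier chain: drivers register a
 * callback, and the panic path walks the whole chain. */
struct fake_notifier {
	int (*call)(void *data);
	struct fake_notifier *next;
};

static struct fake_notifier *panic_chain;

/* Stand-in for atomic_notifier_chain_register(). */
static void fake_register(struct fake_notifier *nb)
{
	nb->next = panic_chain;
	panic_chain = nb;
}

/* Stand-in for the chain walk done on panic; returns how many
 * callbacks ran, for the sake of this model. */
static int fake_panic(void *data)
{
	int called = 0;
	struct fake_notifier *nb;

	for (nb = panic_chain; nb; nb = nb->next) {
		nb->call(data);
		called++;
	}
	return called;
}

/* What an EDAC driver's hook might do: dump its last-seen error
 * record so it lands in the panic output. */
static int edac_dump_on_panic(void *data)
{
	printf("al_edac: last L2MERRSR=0x%llx\n",
	       *(unsigned long long *)data);
	return 0;
}
```

As James notes, such a hook is best-effort only: kdump can keep the real notifiers from running, so nothing correctness-critical should live there.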


Thanks,

James


[0]
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0488c/DDI0488C_cortex_a57_mpcore_r1p0_trm.pdf
[1]
https://static.docs.arm.com/ddi0587/ca/ARM_DDI_0587C_a_RAS.pdf?_ga=2.148234679.1686960568.1560964184-897392434.1556719556