2019-09-10 19:37:15

by Shenhar, Talel

[permalink] [raw]
Subject: [PATCH v2 0/3] Amazon's Annapurna Labs POS Driver

The Amazon's Annapurna Labs SoCs includes Point Of Serialization error
logging unit that reports an error in case of write error (e.g. attempt to
write to a read only register).

This patch series introduces the support for this unit.

Changes since v1: =================
- move MODULE_ to the end of the file
- simplified resource remapping devm_platform_ioremap_resource()
- use platform_get_irq() instead of irq_of_parse_and_map()
- removed the use of _relaxed accessor in favor to the regular ones
- removed driver selected based on arch
- added casting to u64 before left shifting (reported by kbuild test robot)


Talel Shenhar (3):
dt-bindings: soc: al-pos: Amazon's Annapurna Labs POS
soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver
soc: amazon: al-pos: cast to u64 before left shifting

.../bindings/soc/amazon/amazon,al-pos.txt | 18 +++
MAINTAINERS | 7 ++
drivers/soc/Kconfig | 1 +
drivers/soc/Makefile | 1 +
drivers/soc/amazon/Kconfig | 5 +
drivers/soc/amazon/Makefile | 1 +
drivers/soc/amazon/al_pos.c | 127 +++++++++++++++++++++
7 files changed, 160 insertions(+)
create mode 100644 Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
create mode 100644 drivers/soc/amazon/Kconfig
create mode 100644 drivers/soc/amazon/Makefile
create mode 100644 drivers/soc/amazon/al_pos.c

--
2.7.4


2019-09-10 19:37:24

by Shenhar, Talel

[permalink] [raw]
Subject: [PATCH v2 1/3] dt-bindings: soc: al-pos: Amazon's Annapurna Labs POS

Document Amazon's Annapurna Labs POS SoC binding.

Signed-off-by: Talel Shenhar <[email protected]>
---
.../devicetree/bindings/soc/amazon/amazon,al-pos.txt | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
create mode 100644 Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt

diff --git a/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt b/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
new file mode 100644
index 00000000..035cc571
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
@@ -0,0 +1,18 @@
+Amazon's Annapurna Labs POS
+
+POS node is defined to describe the Point Of Serialization (POS) logger
+unit.
+
+Required properties:
+- compatible: Shall be "amazon,al-pos".
+- reg: POS logger resources.
+- interrupts: should contain the interrupt for pos error event.
+
+Example:
+
+al_pos {
+ compatible = "amazon,al-pos";
+ reg = <0x0 0xf0070084 0x0 0x00000008>;
+ interrupt-parent = <&amazon_system_fabric>;
+ interrupts = <24 IRQ_TYPE_LEVEL_HIGH>;
+};
--
2.7.4

2019-09-10 19:37:33

by Shenhar, Talel

[permalink] [raw]
Subject: [PATCH v2 3/3] soc: amazon: al-pos: cast to u64 before left shifting

Fix wrap around for pos errors on addresses above 32 bit.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Talel Shenhar <[email protected]>
---
drivers/soc/amazon/al_pos.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/amazon/al_pos.c b/drivers/soc/amazon/al_pos.c
index a865111..e95e1fc 100644
--- a/drivers/soc/amazon/al_pos.c
+++ b/drivers/soc/amazon/al_pos.c
@@ -49,7 +49,7 @@ static irqreturn_t al_pos_irq_handler(int irq, void *info)
writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);

addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
- addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
+ addr |= (((u64)FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1)) << 32);
request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);

--
2.7.4

2019-09-10 19:37:37

by Shenhar, Talel

[permalink] [raw]
Subject: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver

The Amazon's Annapurna Labs SoCs includes Point Of Serialization error
logging unit that reports an error in case write error (e.g. attempt to
write to a read only register).
This patch introduces the support for this unit.

Signed-off-by: Talel Shenhar <[email protected]>
---
MAINTAINERS | 7 +++
drivers/soc/Kconfig | 1 +
drivers/soc/Makefile | 1 +
drivers/soc/amazon/Kconfig | 5 ++
drivers/soc/amazon/Makefile | 1 +
drivers/soc/amazon/al_pos.c | 127 ++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 142 insertions(+)
create mode 100644 drivers/soc/amazon/Kconfig
create mode 100644 drivers/soc/amazon/Makefile
create mode 100644 drivers/soc/amazon/al_pos.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e7a47b5..8c3a070 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -751,6 +751,13 @@ F: drivers/tty/serial/altera_jtaguart.c
F: include/linux/altera_uart.h
F: include/linux/altera_jtaguart.h

+AMAZON ANNAPURNA LABS POS
+M: Talel Shenhar <[email protected]>
+M: Talel Shenhar <[email protected]>
+S: Maintained
+F: Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
+F: drivers/soc/amazon/al_pos.c
+
AMAZON ANNAPURNA LABS THERMAL MMIO DRIVER
M: Talel Shenhar <[email protected]>
S: Maintained
diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index 833e04a..913a6b1 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -2,6 +2,7 @@
menu "SOC (System On Chip) specific Drivers"

source "drivers/soc/actions/Kconfig"
+source "drivers/soc/amazon/Kconfig"
source "drivers/soc/amlogic/Kconfig"
source "drivers/soc/aspeed/Kconfig"
source "drivers/soc/atmel/Kconfig"
diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
index 2ec3550..c1c5c64 100644
--- a/drivers/soc/Makefile
+++ b/drivers/soc/Makefile
@@ -6,6 +6,7 @@
obj-$(CONFIG_ARCH_ACTIONS) += actions/
obj-$(CONFIG_SOC_ASPEED) += aspeed/
obj-$(CONFIG_ARCH_AT91) += atmel/
+obj-y += amazon/
obj-y += bcm/
obj-$(CONFIG_ARCH_DOVE) += dove/
obj-$(CONFIG_MACH_DOVE) += dove/
diff --git a/drivers/soc/amazon/Kconfig b/drivers/soc/amazon/Kconfig
new file mode 100644
index 00000000..fdd4cdd
--- /dev/null
+++ b/drivers/soc/amazon/Kconfig
@@ -0,0 +1,5 @@
+config AL_POS
+ bool "Amazon's Annapurna Labs POS driver"
+ depends on ARCH_ALPINE || COMPILE_TEST
+ help
+ Include support for the SoC POS error capability.
diff --git a/drivers/soc/amazon/Makefile b/drivers/soc/amazon/Makefile
new file mode 100644
index 00000000..a31441a
--- /dev/null
+++ b/drivers/soc/amazon/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_AL_POS) += al_pos.o
diff --git a/drivers/soc/amazon/al_pos.c b/drivers/soc/amazon/al_pos.c
new file mode 100644
index 00000000..a865111
--- /dev/null
+++ b/drivers/soc/amazon/al_pos.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ */
+#include <linux/bitfield.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_platform.h>
+
+/* Registers Offset */
+#define AL_POS_ERROR_LOG_1 0x0
+#define AL_POS_ERROR_LOG_0 0x4
+
+/* Registers Fields */
+#define AL_POS_ERROR_LOG_1_VALID GENMASK(31, 31)
+#define AL_POS_ERROR_LOG_1_BRESP GENMASK(18, 17)
+#define AL_POS_ERROR_LOG_1_REQUEST_ID GENMASK(16, 8)
+#define AL_POS_ERROR_LOG_1_ADDR_HIGH GENMASK(7, 0)
+
+#define AL_POS_ERROR_LOG_0_ADDR_LOW GENMASK(31, 0)
+
+static int al_pos_panic;
+module_param(al_pos_panic, int, 0);
+MODULE_PARM_DESC(al_pos_panic, "Defines if POS error is causing panic()");
+
+struct al_pos {
+ struct platform_device *pdev;
+ void __iomem *mmio_base;
+ int irq;
+};
+
+static irqreturn_t al_pos_irq_handler(int irq, void *info)
+{
+ struct platform_device *pdev = info;
+ struct al_pos *pos = platform_get_drvdata(pdev);
+ u32 log1;
+ u32 log0;
+ u64 addr;
+ u16 request_id;
+ u8 bresp;
+
+ log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);
+ if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
+ return IRQ_NONE;
+
+ log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
+ writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
+
+ addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
+ addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
+ request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
+ bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
+
+ dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
+ addr, request_id, bresp);
+
+ if (al_pos_panic)
+ panic("POS");
+
+ return IRQ_HANDLED;
+}
+
+static int al_pos_probe(struct platform_device *pdev)
+{
+ struct al_pos *pos;
+ int ret;
+
+ pos = devm_kzalloc(&pdev->dev, sizeof(*pos), GFP_KERNEL);
+ if (!pos)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, pos);
+ pos->pdev = pdev;
+
+ pos->mmio_base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(pos->mmio_base)) {
+ dev_err(&pdev->dev, "failed to ioremap memory (%ld)\n",
+ PTR_ERR(pos->mmio_base));
+ return PTR_ERR(pos->mmio_base);
+ }
+
+ pos->irq = platform_get_irq(pdev, 0);
+ if (pos->irq <= 0) {
+ dev_err(&pdev->dev, "fail to parse and map irq\n");
+ return -EINVAL;
+ }
+
+ ret = devm_request_irq(&pdev->dev,
+ pos->irq,
+ al_pos_irq_handler,
+ 0,
+ pdev->name,
+ pdev);
+ if (ret != 0) {
+ dev_err(&pdev->dev,
+ "failed to register to irq %d (%d)\n",
+ pos->irq, ret);
+ return ret;
+ }
+
+ dev_info(&pdev->dev, "successfully loaded\n");
+
+ return 0;
+}
+
+static const struct of_device_id al_pos_of_match[] = {
+ { .compatible = "amazon,al-pos", },
+ {},
+};
+
+MODULE_DEVICE_TABLE(of, al_pos_of_match);
+
+static struct platform_driver al_pos_driver = {
+ .probe = al_pos_probe,
+ .driver = {
+ .name = "al-pos",
+ .of_match_table = al_pos_of_match,
+ },
+};
+
+module_platform_driver(al_pos_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Talel Shenhar");
+MODULE_DESCRIPTION("Amazon's Annapurna Labs POS driver");
--
2.7.4

2019-09-11 14:17:53

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver

[+James]

Hi Talel,

On Tue, 10 Sep 2019 20:05:09 +0100,
Talel Shenhar <[email protected]> wrote:
>
> The Amazon's Annapurna Labs SoCs includes Point Of Serialization error
> logging unit that reports an error in case write error (e.g. attempt to
> write to a read only register).
> This patch introduces the support for this unit.
>
> Signed-off-by: Talel Shenhar <[email protected]>
> ---
> MAINTAINERS | 7 +++
> drivers/soc/Kconfig | 1 +
> drivers/soc/Makefile | 1 +
> drivers/soc/amazon/Kconfig | 5 ++
> drivers/soc/amazon/Makefile | 1 +
> drivers/soc/amazon/al_pos.c | 127 ++++++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 142 insertions(+)
> create mode 100644 drivers/soc/amazon/Kconfig
> create mode 100644 drivers/soc/amazon/Makefile
> create mode 100644 drivers/soc/amazon/al_pos.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e7a47b5..8c3a070 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -751,6 +751,13 @@ F: drivers/tty/serial/altera_jtaguart.c
> F: include/linux/altera_uart.h
> F: include/linux/altera_jtaguart.h
>
> +AMAZON ANNAPURNA LABS POS
> +M: Talel Shenhar <[email protected]>
> +M: Talel Shenhar <[email protected]>
> +S: Maintained
> +F: Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
> +F: drivers/soc/amazon/al_pos.c
> +
> AMAZON ANNAPURNA LABS THERMAL MMIO DRIVER
> M: Talel Shenhar <[email protected]>
> S: Maintained
> diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
> index 833e04a..913a6b1 100644
> --- a/drivers/soc/Kconfig
> +++ b/drivers/soc/Kconfig
> @@ -2,6 +2,7 @@
> menu "SOC (System On Chip) specific Drivers"
>
> source "drivers/soc/actions/Kconfig"
> +source "drivers/soc/amazon/Kconfig"
> source "drivers/soc/amlogic/Kconfig"
> source "drivers/soc/aspeed/Kconfig"
> source "drivers/soc/atmel/Kconfig"
> diff --git a/drivers/soc/Makefile b/drivers/soc/Makefile
> index 2ec3550..c1c5c64 100644
> --- a/drivers/soc/Makefile
> +++ b/drivers/soc/Makefile
> @@ -6,6 +6,7 @@
> obj-$(CONFIG_ARCH_ACTIONS) += actions/
> obj-$(CONFIG_SOC_ASPEED) += aspeed/
> obj-$(CONFIG_ARCH_AT91) += atmel/
> +obj-y += amazon/
> obj-y += bcm/
> obj-$(CONFIG_ARCH_DOVE) += dove/
> obj-$(CONFIG_MACH_DOVE) += dove/
> diff --git a/drivers/soc/amazon/Kconfig b/drivers/soc/amazon/Kconfig
> new file mode 100644
> index 00000000..fdd4cdd
> --- /dev/null
> +++ b/drivers/soc/amazon/Kconfig
> @@ -0,0 +1,5 @@
> +config AL_POS
> + bool "Amazon's Annapurna Labs POS driver"

Some would say that the name is ever slightly unfortunate...

> + depends on ARCH_ALPINE || COMPILE_TEST
> + help
> + Include support for the SoC POS error capability.
> diff --git a/drivers/soc/amazon/Makefile b/drivers/soc/amazon/Makefile
> new file mode 100644
> index 00000000..a31441a
> --- /dev/null
> +++ b/drivers/soc/amazon/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_AL_POS) += al_pos.o
> diff --git a/drivers/soc/amazon/al_pos.c b/drivers/soc/amazon/al_pos.c
> new file mode 100644
> index 00000000..a865111
> --- /dev/null
> +++ b/drivers/soc/amazon/al_pos.c
> @@ -0,0 +1,127 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + */
> +#include <linux/bitfield.h>
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_platform.h>
> +
> +/* Registers Offset */
> +#define AL_POS_ERROR_LOG_1 0x0
> +#define AL_POS_ERROR_LOG_0 0x4
> +
> +/* Registers Fields */
> +#define AL_POS_ERROR_LOG_1_VALID GENMASK(31, 31)
> +#define AL_POS_ERROR_LOG_1_BRESP GENMASK(18, 17)
> +#define AL_POS_ERROR_LOG_1_REQUEST_ID GENMASK(16, 8)
> +#define AL_POS_ERROR_LOG_1_ADDR_HIGH GENMASK(7, 0)
> +
> +#define AL_POS_ERROR_LOG_0_ADDR_LOW GENMASK(31, 0)
> +
> +static int al_pos_panic;
> +module_param(al_pos_panic, int, 0);
> +MODULE_PARM_DESC(al_pos_panic, "Defines if POS error is causing panic()");
> +
> +struct al_pos {
> + struct platform_device *pdev;
> + void __iomem *mmio_base;
> + int irq;
> +};
> +
> +static irqreturn_t al_pos_irq_handler(int irq, void *info)
> +{
> + struct platform_device *pdev = info;
> + struct al_pos *pos = platform_get_drvdata(pdev);
> + u32 log1;
> + u32 log0;
> + u64 addr;
> + u16 request_id;
> + u8 bresp;
> +
> + log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);

Do you actually need the implied barriers? I'd expect that relaxed
accesses should be enough.

> + if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
> + return IRQ_NONE;
> +
> + log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
> + writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
> +
> + addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
> + addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
> + request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
> + bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
> +
> + dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
> + addr, request_id, bresp);

What is this information? How do we make use of it? Given that this is
asynchronous, how do we correlate it to the offending software?

> +
> + if (al_pos_panic)
> + panic("POS");
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int al_pos_probe(struct platform_device *pdev)
> +{
> + struct al_pos *pos;
> + int ret;
> +
> + pos = devm_kzalloc(&pdev->dev, sizeof(*pos), GFP_KERNEL);
> + if (!pos)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, pos);
> + pos->pdev = pdev;
> +
> + pos->mmio_base = devm_platform_ioremap_resource(pdev, 0);
> + if (IS_ERR(pos->mmio_base)) {
> + dev_err(&pdev->dev, "failed to ioremap memory (%ld)\n",
> + PTR_ERR(pos->mmio_base));
> + return PTR_ERR(pos->mmio_base);
> + }
> +
> + pos->irq = platform_get_irq(pdev, 0);
> + if (pos->irq <= 0) {
> + dev_err(&pdev->dev, "fail to parse and map irq\n");
> + return -EINVAL;
> + }
> +
> + ret = devm_request_irq(&pdev->dev,
> + pos->irq,
> + al_pos_irq_handler,
> + 0,
> + pdev->name,
> + pdev);
> + if (ret != 0) {
> + dev_err(&pdev->dev,
> + "failed to register to irq %d (%d)\n",
> + pos->irq, ret);
> + return ret;
> + }
> +
> + dev_info(&pdev->dev, "successfully loaded\n");
> +
> + return 0;
> +}
> +
> +static const struct of_device_id al_pos_of_match[] = {
> + { .compatible = "amazon,al-pos", },
> + {},
> +};
> +
> +MODULE_DEVICE_TABLE(of, al_pos_of_match);
> +
> +static struct platform_driver al_pos_driver = {
> + .probe = al_pos_probe,
> + .driver = {
> + .name = "al-pos",
> + .of_match_table = al_pos_of_match,
> + },
> +};
> +
> +module_platform_driver(al_pos_driver);
> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Talel Shenhar");
> +MODULE_DESCRIPTION("Amazon's Annapurna Labs POS driver");
> --
> 2.7.4
>

The whole think looks to me like a poor man's EDAC handling, and I'd
expect to be plugged in that subsystem instead. Any reason why this
isn't the case? It would certainly make the handling uniform for the
user.

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-09-11 14:23:08

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] soc: amazon: al-pos: cast to u64 before left shifting

On Tue, 10 Sep 2019 20:05:10 +0100,
Talel Shenhar <[email protected]> wrote:
>
> Fix wrap around for pos errors on addresses above 32 bit.
>
> Reported-by: kbuild test robot <[email protected]>
> Signed-off-by: Talel Shenhar <[email protected]>
> ---
> drivers/soc/amazon/al_pos.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/soc/amazon/al_pos.c b/drivers/soc/amazon/al_pos.c
> index a865111..e95e1fc 100644
> --- a/drivers/soc/amazon/al_pos.c
> +++ b/drivers/soc/amazon/al_pos.c
> @@ -49,7 +49,7 @@ static irqreturn_t al_pos_irq_handler(int irq, void *info)
> writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
>
> addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
> - addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
> + addr |= (((u64)FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1)) << 32);
> request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
> bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
>
> --
> 2.7.4
>

This fix should be squashed into the previous patch.

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-09-12 06:58:15

by Shenhar, Talel

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver

Hi Marc,


On 9/11/2019 5:15 PM, Marc Zyngier wrote:
> [+James]
>
> Hi Talel,
>
> On Tue, 10 Sep 2019 20:05:09 +0100,
> Talel Shenhar <[email protected]> wrote:
>
>> + log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);
> Do you actually need the implied barriers? I'd expect that relaxed
> accesses should be enough.

You are correct. Barriers are not needed, In v1 this driver indeed used
_relaxed versions.

Due to request coming from Arnd in v1 patch series I removed it. As this
is not data path I had no strong objection for removing it.

>
>> + if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
>> + return IRQ_NONE;
>> +
>> + log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
>> + writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
>> +
>> + addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
>> + addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
>> + request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
>> + bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
>> +
>> + dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
>> + addr, request_id, bresp);
> What is this information? How do we make use of it? Given that this is
> asynchronous, how do we correlate it to the offending software?

Indeed this information arriving from the HW is asynchronous.

There is no direct method to get the offending software.

There are all kinds of hacks we do to find the offending software once
we find this error. most of the time its a new patch introduced but some
of the time is just digging.

> The whole think looks to me like a poor man's EDAC handling, and I'd
> expect to be plugged in that subsystem instead. Any reason why this
> isn't the case? It would certainly make the handling uniform for the
> user.

This logic was not plugged into EDAC as there is no "Correctable" error
here. its just error event. Not all errors are EDAC in the sense of
Error Detection And *Correction*. There are no correctable errors for
this driver.

So plugging it  under EDAC seems like abusing the EDAC system.

Now that I've emphasize the reason for not putting this under EDAC, what
do you think? should this "only uncorrectable event" driver should be
part of EDAC?


Thanks,

Talel

2019-09-12 07:21:39

by Shenhar, Talel

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH v2 3/3] soc: amazon: al-pos: cast to u64 before left shifting


On 9/11/2019 5:18 PM, Marc Zyngier wrote:
> On Tue, 10 Sep 2019 20:05:10 +0100,
> Talel Shenhar <[email protected]> wrote:
>> Fix wrap around for pos errors on addresses above 32 bit.
>>
>> Reported-by: kbuild test robot <[email protected]>
>> Signed-off-by: Talel Shenhar <[email protected]>
>> ---
>> drivers/soc/amazon/al_pos.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/soc/amazon/al_pos.c b/drivers/soc/amazon/al_pos.c
>> index a865111..e95e1fc 100644
>> --- a/drivers/soc/amazon/al_pos.c
>> +++ b/drivers/soc/amazon/al_pos.c
>> @@ -49,7 +49,7 @@ static irqreturn_t al_pos_irq_handler(int irq, void *info)
>> writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
>>
>> addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
>> - addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
>> + addr |= (((u64)FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1)) << 32);
>> request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
>> bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
>>
>> --
>> 2.7.4
>>
> This fix should be squashed into the previous patch.
Sure, Shall be part of v3.
>
> Thanks,
>
> M.
>

2019-09-12 08:53:23

by Marc Zyngier

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver

On Thu, 12 Sep 2019 07:50:03 +0100,
"Shenhar, Talel" <[email protected]> wrote:
>
> Hi Marc,
>
>
> On 9/11/2019 5:15 PM, Marc Zyngier wrote:
> > [+James]
> >
> > Hi Talel,
> >
> > On Tue, 10 Sep 2019 20:05:09 +0100,
> > Talel Shenhar <[email protected]> wrote:
> >
> >> + log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);
> > Do you actually need the implied barriers? I'd expect that relaxed
> > accesses should be enough.
>
> You are correct. Barriers are not needed, In v1 this driver indeed
> used _relaxed versions.
>
> Due to request coming from Arnd in v1 patch series I removed it. As
> this is not data path I had no strong objection for removing it.

Independently from whether this has any material impact on performance
(this obviously isn't a hot path, unless it can be directly generated
by userspace or a guest), I believe it is important to use the right
type of accessor, if only because code gets copied around... Others
would probably argue that this is the very reason why we should always
use the "safe" option...

>
> >
> >> + if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
> >> + return IRQ_NONE;
> >> +
> >> + log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
> >> + writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
> >> +
> >> + addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
> >> + addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
> >> + request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
> >> + bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
> >> +
> >> + dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
> >> + addr, request_id, bresp);
> > What is this information? How do we make use of it? Given that this is
> > asynchronous, how do we correlate it to the offending software?
>
> Indeed this information arriving from the HW is asynchronous.
>
> There is no direct method to get the offending software.
>
> There are all kinds of hacks we do to find the offending software once
> we find this error. most of the time its a new patch introduced but
> some of the time is just digging.

OK, so that the moment, this is more of a debug tool than anything
else, right?

> > The whole think looks to me like a poor man's EDAC handling, and I'd
> > expect to be plugged in that subsystem instead. Any reason why this
> > isn't the case? It would certainly make the handling uniform for the
> > user.
>
> This logic was not plugged into EDAC as there is no "Correctable"
> error here. its just error event. Not all errors are EDAC in the sense
> of Error Detection And *Correction*. There are no correctable errors
> for this driver.

I'd argue the opposite! Because you obviously don't let a read-only
register being written to, the error has been corrected, and you
signal the correction status.

> So plugging it  under EDAC seems like abusing the EDAC system.
>
> Now that I've emphasize the reason for not putting this under EDAC,
> what do you think? should this "only uncorrectable event" driver
> should be part of EDAC?

My choice would be to plug it into the EDAC subsystem, and report all
interrupts as "Corrected" events. Optionally, and only if you are
debugging something that requires it, report the error as
"Uncorrectable", in which case the EDAC subsystem should trigger a
panic.

At least you'd get the infrastructure, logging and tooling that the
EDAC subsystem offers (parsing the kernel log doesn't really count).

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-09-12 09:21:45

by Shenhar, Talel

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver


On 9/12/2019 11:50 AM, Marc Zyngier wrote:
> On Thu, 12 Sep 2019 07:50:03 +0100,
> "Shenhar, Talel" <[email protected]> wrote:
>> Hi Marc,
>>
>>
>> On 9/11/2019 5:15 PM, Marc Zyngier wrote:
>>> [+James]
>>>
>>> Hi Talel,
>>>
>>> On Tue, 10 Sep 2019 20:05:09 +0100,
>>> Talel Shenhar <[email protected]> wrote:
>>>
>>>> + log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);
>>> Do you actually need the implied barriers? I'd expect that relaxed
>>> accesses should be enough.
>> You are correct. Barriers are not needed, In v1 this driver indeed
>> used _relaxed versions.
>>
>> Due to request coming from Arnd in v1 patch series I removed it. As
>> this is not data path I had no strong objection for removing it.
> Independently from whether this has any material impact on performance
> (this obviously isn't a hot path, unless it can be directly generated
> by userspace or a guest), I believe it is important to use the right
> type of accessor, if only because code gets copied around... Others
> would probably argue that this is the very reason why we should always
> use the "safe" option...
Arnld, would love your inputs before publishing v3 to avoid ping-pong.
>
>>>> + if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
>>>> + return IRQ_NONE;
>>>> +
>>>> + log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
>>>> + writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
>>>> +
>>>> + addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
>>>> + addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
>>>> + request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
>>>> + bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
>>>> +
>>>> + dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
>>>> + addr, request_id, bresp);
>>> What is this information? How do we make use of it? Given that this is
>>> asynchronous, how do we correlate it to the offending software?
>> Indeed this information arriving from the HW is asynchronous.
>>
>> There is no direct method to get the offending software.
>>
>> There are all kinds of hacks we do to find the offending software once
>> we find this error. most of the time its a new patch introduced but
>> some of the time is just digging.
> OK, so that the moment, this is more of a debug tool than anything
> else, right?
Not sure what do you mean by debug tool. this is used to capture iliigle
access and allow panic() based on them or simple logging.
>
>>> The whole think looks to me like a poor man's EDAC handling, and I'd
>>> expect to be plugged in that subsystem instead. Any reason why this
>>> isn't the case? It would certainly make the handling uniform for the
>>> user.
>> This logic was not plugged into EDAC as there is no "Correctable"
>> error here. its just error event. Not all errors are EDAC in the sense
>> of Error Detection And *Correction*. There are no correctable errors
>> for this driver.
> I'd argue the opposite! Because you obviously don't let a read-only
> register being written to, the error has been corrected, and you
> signal the correction status.

Not the meaning of corrected from my point of view - the system as a
whole (sw&hw) are not working state. a driver thinks it configured the
system to do A while the system doesn't really do that. and the critical
part is that the driver that did operation A doesn't even have a way to
know it.

So I would not call this corrected.

>
>> So plugging it  under EDAC seems like abusing the EDAC system.
>>
>> Now that I've emphasize the reason for not putting this under EDAC,
>> what do you think? should this "only uncorrectable event" driver
>> should be part of EDAC?
> My choice would be to plug it into the EDAC subsystem, and report all
> interrupts as "Corrected" events. Optionally, and only if you are
> debugging something that requires it, report the error as
> "Uncorrectable", in which case the EDAC subsystem should trigger a
> panic.
>
> At least you'd get the infrastructure, logging and tooling that the
> EDAC subsystem offers (parsing the kernel log doesn't really count).

I see what you say. However, I don't see too much added value in
plugging this to EDAC and feel like it would abuse EDAC framework.

James, will love your input from EDAC point of view, does it make sense
to plug un-correctable only event to EDAC?

>
> Thanks,
>
> M.
>

2019-09-18 13:44:03

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] dt-bindings: soc: al-pos: Amazon's Annapurna Labs POS

On Tue, Sep 10, 2019 at 10:05:08PM +0300, Talel Shenhar wrote:
> Document Amazon's Annapurna Labs POS SoC binding.
>
> Signed-off-by: Talel Shenhar <[email protected]>
> ---
> .../devicetree/bindings/soc/amazon/amazon,al-pos.txt | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt

Please convert to DT schema.

>
> diff --git a/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt b/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
> new file mode 100644
> index 00000000..035cc571
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
> @@ -0,0 +1,18 @@
> +Amazon's Annapurna Labs POS
> +
> +POS node is defined to describe the Point Of Serialization (POS) logger
> +unit.
> +
> +Required properties:
> +- compatible: Shall be "amazon,al-pos".
> +- reg: POS logger resources.
> +- interrupts: should contain the interrupt for pos error event.
> +
> +Example:
> +
> +al_pos {

Needs a unit-address.

> + compatible = "amazon,al-pos";
> + reg = <0x0 0xf0070084 0x0 0x00000008>;
> + interrupt-parent = <&amazon_system_fabric>;
> + interrupts = <24 IRQ_TYPE_LEVEL_HIGH>;
> +};
> --
> 2.7.4
>

2019-09-18 13:55:33

by Shenhar, Talel

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] dt-bindings: soc: al-pos: Amazon's Annapurna Labs POS

Tnx Rob for the review.

Shall be part of v3.

Waiting for responses from Arnd and James and will publish v3.

On 9/18/2019 4:35 PM, Rob Herring wrote:
> On Tue, Sep 10, 2019 at 10:05:08PM +0300, Talel Shenhar wrote:
>> Document Amazon's Annapurna Labs POS SoC binding.
>>
>> Signed-off-by: Talel Shenhar <[email protected]>
>> ---
>> .../devicetree/bindings/soc/amazon/amazon,al-pos.txt | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>> create mode 100644 Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
> Please convert to DT schema.
>
>> diff --git a/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt b/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
>> new file mode 100644
>> index 00000000..035cc571
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/soc/amazon/amazon,al-pos.txt
>> @@ -0,0 +1,18 @@
>> +Amazon's Annapurna Labs POS
>> +
>> +POS node is defined to describe the Point Of Serialization (POS) logger
>> +unit.
>> +
>> +Required properties:
>> +- compatible: Shall be "amazon,al-pos".
>> +- reg: POS logger resources.
>> +- interrupts: should contain the interrupt for pos error event.
>> +
>> +Example:
>> +
>> +al_pos {
> Needs a unit-address.
>
>> + compatible = "amazon,al-pos";
>> + reg = <0x0 0xf0070084 0x0 0x00000008>;
>> + interrupt-parent = <&amazon_system_fabric>;
>> + interrupts = <24 IRQ_TYPE_LEVEL_HIGH>;
>> +};
>> --
>> 2.7.4
>>

2019-09-19 17:12:06

by James Morse

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver

Hi guys,

On 12/09/2019 10:19, Shenhar, Talel wrote:
> On 9/12/2019 11:50 AM, Marc Zyngier wrote:
>> On Thu, 12 Sep 2019 07:50:03 +0100,
>> "Shenhar, Talel" <[email protected]> wrote:
>>> On 9/11/2019 5:15 PM, Marc Zyngier wrote:
>>>> On Tue, 10 Sep 2019 20:05:09 +0100,
>>>> Talel Shenhar <[email protected]> wrote:
>>>>> +    if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))
>>>>> +        return IRQ_NONE;
>>>>> +
>>>>> +    log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
>>>>> +    writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
>>>>> +
>>>>> +    addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
>>>>> +    addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
>>>>> +    request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
>>>>> +    bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
>>>>> +
>>>>> +    dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
>>>>> +        addr, request_id, bresp);

>>>> What is this information? How do we make use of it? Given that this is
>>>> asynchronous, how do we correlate it to the offending software?

>>> Indeed this information arriving from the HW is asynchronous.
>>>
>>> There is no direct method to get the offending software.
>>>
>>> There are all kinds of hacks we do to find the offending software once
>>> we find this error. most of the time its a new patch introduced but
>>> some of the time is just digging.

>> OK, so that the moment, this is more of a debug tool than anything
>> else, right?

> Not sure what do you mean by debug tool. this is used to capture iliigle access and allow
> panic() based on them or simple logging.

Plumbing this into edac as a 'device' gives you the existing/standard interface for
user-space. For example the 'panic_on_ue' that is exposed via sysfs, this saves you having
another interface to toggle it for your driver. You can then use the existing distro tools
to drive/monitor/sample it.


>>>> The whole think looks to me like a poor man's EDAC handling, and I'd
>>>> expect to be plugged in that subsystem instead. Any reason why this
>>>> isn't the case? It would certainly make the handling uniform for the
>>>> user.

>>> This logic was not plugged into EDAC as there is no "Correctable"
>>> error here. its just error event. Not all errors are EDAC in the sense
>>> of Error Detection And *Correction*. There are no correctable errors
>>> for this driver.

>> I'd argue the opposite! Because you obviously don't let a read-only
>> register being written to, the error has been corrected, and you
>> signal the correction status.

> Not the meaning of corrected from my point of view - the system as a whole (sw&hw) are not
> working state. a driver thinks it configured the system to do A while the system doesn't
> really do that. and the critical part is that the driver that did operation A doesn't even
> have a way to know it.
>
> So I would not call this corrected.

I don't think corrected/uncorrected helps here. If the register is read-only, and software
writes to it, its a software bug.

(from the v8.2's RAS extensions view, its somewhere between unrecoverable and uncontained)


>>> So plugging it  under EDAC seems like abusing the EDAC system.

If EDAC doesn't do what you need, it can always be extended.


>>> Now that I've emphasize the reason for not putting this under EDAC,
>>> what do you think? should this "only uncorrectable event" driver
>>> should be part of EDAC?

Sure, (its for memory controllers, but:) enum edac_type has a EDAC_EC: "Error Checking, no
correction". This wouldn't be the only device that only reports uncorrectable errors.


>> My choice would be to plug it into the EDAC subsystem, and report all
>> interrupts as "Corrected" events. Optionally, and only if you are
>> debugging something that requires it, report the error as
>> "Uncorrectable", in which case the EDAC subsystem should trigger a
>> panic.

>> At least you'd get the infrastructure, logging and tooling that the
>> EDAC subsystem offers (parsing the kernel log doesn't really count).

> I see what you say. However, I don't see too much added value in plugging this to EDAC and
> feel like it would abuse EDAC framework.

> James, will love your input from EDAC point of view, does it make sense to plug
> un-correctable only event to EDAC?

I think this device is an example of something like a "Fabric switch units" in
Documentation/driver-api/edac.rst. It makes sense that it should be described as a
'device' to edac. You can then use the existing user-space tools to control/report/monitor
the values.


Thanks,

James

2019-09-23 18:29:51

by Shenhar, Talel

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [UNVERIFIED SENDER] Re: [PATCH v2 2/3] soc: amazon: al-pos: Introduce Amazon's Annapurna Labs POS driver


On 9/19/2019 5:42 PM, James Morse wrote:
> Hi guys,
>
> On 12/09/2019 10:19, Shenhar, Talel wrote:
>> On 9/12/2019 11:50 AM, Marc Zyngier wrote:
>>> On Thu, 12 Sep 2019 07:50:03 +0100,
>>> "Shenhar, Talel" <[email protected]> wrote:
>>>> On 9/11/2019 5:15 PM, Marc Zyngier wrote:
>>>>> On Tue, 10 Sep 2019 20:05:09 +0100,
>>>>> Talel Shenhar <[email protected]> wrote:
>>>>>
>> James, will love your input from EDAC point of view, does it make sense to plug
>> un-correctable only event to EDAC?
> I think this device is an example of something like a "Fabric switch units" in
> Documentation/driver-api/edac.rst. It makes sense that it should be described as a
> 'device' to edac. You can then use the existing user-space tools to control/report/monitor
> the values.

Thank you James,

Will port this logic to be edac device.

>
>
> Thanks,
>
> James