LinuxLists.cc - [PATCH 0/9] bus: brcmstb_gisb: add support for GISBv7 arbiter

2017-03-24 14:47:54

Subject: [PATCH 0/9] bus: brcmstb_gisb: add support for GISBv7 arbiter

This patch set contains changes to enable the GISB arbiter driver
on the latest ARM64 architecture Set-Top Box chips from Broadcom.

This driver relies on being able to hook the abort handlers of
the processor core that are triggered by bus error signals
generated by the GISB bus arbiter hardware found in BCM7XXX chips.
The first three patches are based on the arm64/for-next/core
branch to enable this functionality for the arm64 architecture.

The remaining patches correct some issues with the existing driver,
add the ARM64 architecture specific support to the driver, and
finally add the new register map for the GISBv7 hardware first
appearing in the BCM7278 device.

Doug Berger (7):
arm64: mm: mark fault_info __ro_after_init
arm64: mm: install SError abort handler
bus: brcmstb_gisb: Use register offsets with writes too
bus: brcmstb_gisb: Correct hooking of ARM aborts
bus: brcmstb_gisb: correct support for 64-bit address output
bus: brcmstb_gisb: add ARM64 SError support
bus: brcmstb_gisb: update to support new revision

Florian Fainelli (2):
arm64: mm: Allow installation of memory abort handlers
bus: brcmstb_gisb: Add ARM64 support

.../devicetree/bindings/bus/brcm,gisb-arb.txt | 3 +-
arch/arm64/include/asm/system_misc.h | 5 +
arch/arm64/kernel/entry.S | 69 ++++++++++++--
arch/arm64/mm/fault.c | 48 +++++++++-
drivers/bus/Kconfig | 2 +-
drivers/bus/brcmstb_gisb.c | 106 ++++++++++++++++-----
6 files changed, 197 insertions(+), 36 deletions(-)

--
2.12.0

2017-03-24 14:47:57

by Doug Berger

Subject: [PATCH 1/9] arm64: mm: Allow installation of memory abort handlers

From: Florian Fainelli <[email protected]>

Similarly to what the ARM/Linux kernel provides, add a hook_fault_code()
function which allows drivers or other parts of the kernel to install
custom memory abort handlers. This is useful when a given SoC's bus
does not propagate the exact faulting physical address, but there is a
way to read it through, e.g., a special arbiter driver.

Signed-off-by: Florian Fainelli <[email protected]>
---
arch/arm64/include/asm/system_misc.h | 3 +++
arch/arm64/mm/fault.c | 15 ++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index bc812435bc76..e05f5b8c7c1c 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -38,6 +38,9 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,
void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
struct pt_regs *),
int sig, int code, const char *name);
+void hook_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
+ struct pt_regs *),
+ int sig, int code, const char *name);

struct mm_struct;
extern void show_pte(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4bf899fb451b..cdf1260f1005 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -488,7 +488,7 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
return 1;
}

-static const struct fault_info {
+static struct fault_info {
int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
int sig;
int code;
@@ -560,6 +560,19 @@ static const struct fault_info {
{ do_bad, SIGBUS, 0, "unknown 63" },
};

+void __init hook_fault_code(int nr,
+ int (*fn)(unsigned long, unsigned int, struct pt_regs *),
+ int sig, int code, const char *name)
+{
+ BUG_ON(nr < 0 || nr >= ARRAY_SIZE(fault_info));
+
+ fault_info[nr].fn = fn;
+ fault_info[nr].sig = sig;
+ fault_info[nr].code = code;
+ fault_info[nr].name = name;
+}
+
+
static const char *fault_name(unsigned int esr)
{
const struct fault_info *inf = fault_info + (esr & 63);
--
2.12.0
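
The hook mechanism in patch 1/9 can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: struct regs_model, fault_table, the *_model functions and MODEL_SIGBUS are hypothetical names made up for illustration, and assert() stands in for BUG_ON(). It shows a writable table indexed by the low six syndrome bits, with one entry replaced by a custom handler.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_FAULTS 64   /* the patch indexes its table with (esr & 63) */
#define MODEL_SIGBUS    7    /* illustrative constant, not taken from a header */

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct regs_model {
	unsigned long pc;
};

typedef int (*fault_fn)(unsigned long addr, unsigned int esr,
			struct regs_model *regs);

struct fault_entry {
	fault_fn fn;
	int sig;
	int code;
	const char *name;
};

static struct fault_entry fault_table[MODEL_NR_FAULTS];
static unsigned long last_reported_addr;

/* Default handler: a non-zero return means "not handled". */
static int do_bad_model(unsigned long addr, unsigned int esr,
			struct regs_model *regs)
{
	(void)addr; (void)esr; (void)regs;
	return 1;
}

/* Example custom handler: record the address, still report "unhandled". */
static int bus_error_model(unsigned long addr, unsigned int esr,
			   struct regs_model *regs)
{
	(void)esr; (void)regs;
	last_reported_addr = addr;
	return 1;
}

/* Fill every slot with the default handler. */
static void fault_table_init(void)
{
	for (size_t i = 0; i < MODEL_NR_FAULTS; i++) {
		fault_table[i].fn = do_bad_model;
		fault_table[i].sig = MODEL_SIGBUS;
		fault_table[i].code = 0;
		fault_table[i].name = "unknown";
	}
}

/* Replace one slot, mirroring the shape of hook_fault_code(). */
static void hook_fault_code_model(int nr, fault_fn fn, int sig, int code,
				  const char *name)
{
	assert(nr >= 0 && nr < MODEL_NR_FAULTS); /* plays the role of BUG_ON() */

	fault_table[nr].fn = fn;
	fault_table[nr].sig = sig;
	fault_table[nr].code = code;
	fault_table[nr].name = name;
}

/* Look up the handler from the low six syndrome bits and call it. */
static int dispatch_fault_model(unsigned long addr, unsigned int esr,
				struct regs_model *regs)
{
	const struct fault_entry *inf = &fault_table[esr & 63];

	return inf->fn(addr, esr, regs);
}
```

In this model, calling fault_table_init() and then hook_fault_code_model(16, bus_error_model, MODEL_SIGBUS, 0, "synchronous external abort") routes any syndrome whose low six bits equal 16 to the custom handler, while every other slot keeps the default.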

2017-03-24 14:48:07

by Doug Berger

Subject: [PATCH 4/9] bus: brcmstb_gisb: Use register offsets with writes too

This commit corrects the bug introduced in commit f80835875d3d
("bus: brcmstb_gisb: Look up register offsets in a table") such
that gisb_write() translates the register enumeration into an
offset from the base address for writes as well as reads.

Fixes: f80835875d3d ("bus: brcmstb_gisb: Look up register offsets in a table")
Signed-off-by: Doug Berger <[email protected]>
---
drivers/bus/brcmstb_gisb.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 72fe0a5a8bf3..a94598d0945a 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2014 Broadcom Corporation
+ * Copyright (C) 2014-2017 Broadcom
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -127,9 +127,9 @@ static void gisb_write(struct brcmstb_gisb_arb_device *gdev, u32 val, int reg)
return;

if (gdev->big_endian)
- iowrite32be(val, gdev->base + reg);
+ iowrite32be(val, gdev->base + offset);
else
- iowrite32(val, gdev->base + reg);
+ iowrite32(val, gdev->base + offset);
}

static ssize_t gisb_arb_get_timeout(struct device *dev,
--
2.12.0
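
The bug fixed by patch 4/9 is easy to reproduce in a user-space model. The sketch below is an assumption-laden illustration, not the driver: struct gisb_model, offsets_model and the *_model functions are hypothetical, the MMIO window is replaced by a plain array, and the offset values are merely illustrative. The point is that the register enumerator must be translated through the offset table before it is added to the base, for writes as well as reads.

```c
#include <stdint.h>

/* Register enumeration: small indices, not byte offsets. */
enum {
	ARB_TIMER,
	ARB_ERR_CAP_CLR,
	ARB_ERR_CAP_STATUS,
	NUM_REGS,
};

/* Per-chip offset table (illustrative values only). */
static const int offsets_model[NUM_REGS] = {
	[ARB_TIMER]          = 0x008,
	[ARB_ERR_CAP_CLR]    = 0x7e4,
	[ARB_ERR_CAP_STATUS] = 0x7f4,
};

/* Hypothetical device: an array plays the role of the MMIO window. */
struct gisb_model {
	const int *offsets;
	uint32_t regs[0x800 / 4];
};

/* Correct write: translate the enumerator into a byte offset first. */
static void gisb_write_model(struct gisb_model *g, uint32_t val, int reg)
{
	int offset = g->offsets[reg];

	if (offset < 0)		/* register absent on this chip */
		return;

	g->regs[offset / 4] = val;
}

/* Matching read path, using the same translation. */
static uint32_t gisb_read_model(const struct gisb_model *g, int reg)
{
	int offset = g->offsets[reg];

	if (offset < 0)
		return 0;

	return g->regs[offset / 4];
}
```

Had the write used the enumerator directly (base + reg), a write to ARB_ERR_CAP_CLR would have landed at byte offset 1 instead of 0x7e4, which is the behaviour the patch removes.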

2017-03-24 14:48:31

by Doug Berger

Subject: [PATCH 7/9] bus: brcmstb_gisb: Add ARM64 support

From: Florian Fainelli <[email protected]>

Hook into ARM64 data abort exception #16 (synchronous external
abort), which is how GISB errors will be funneled back to the
ARM64 CPU in case of problems.

Signed-off-by: Florian Fainelli <[email protected]>
---
drivers/bus/Kconfig | 2 +-
drivers/bus/brcmstb_gisb.c | 15 ++++++++++++---
2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 0a52da439abf..d2a5f1184022 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -57,7 +57,7 @@ config ARM_CCN

config BRCMSTB_GISB_ARB
bool "Broadcom STB GISB bus arbiter"
- depends on ARM || MIPS
+ depends on ARM || ARM64 || MIPS
default ARCH_BRCMSTB || BMIPS_GENERIC
help
Driver for the Broadcom Set Top Box System-on-a-chip internal bus
diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index c8d2a61d21ed..bf26b4017a2c 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -30,6 +30,11 @@
#include <asm/signal.h>
#endif

+#ifdef CONFIG_ARM64
+#include <asm/signal.h>
+#include <asm/system_misc.h>
+#endif
+
#ifdef CONFIG_MIPS
#include <asm/traps.h>
#endif
@@ -225,7 +230,7 @@ static int brcmstb_gisb_arb_decode_addr(struct brcmstb_gisb_arb_device *gdev,
return 0;
}

-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
static int brcmstb_bus_error_handler(unsigned long addr, unsigned int fsr,
struct pt_regs *regs)
{
@@ -235,7 +240,7 @@ static int brcmstb_bus_error_handler(unsigned long addr, unsigned int fsr,
list_for_each_entry(gdev, &brcmstb_gisb_arb_device_list, next)
brcmstb_gisb_arb_decode_addr(gdev, "bus error");

-#if !defined(CONFIG_ARM_LPAE)
+#if defined(CONFIG_ARM) && !defined(CONFIG_ARM_LPAE)
/*
* If it was an imprecise abort, then we need to correct the
* return address to be _after_ the instruction.
@@ -247,7 +252,7 @@ static int brcmstb_bus_error_handler(unsigned long addr, unsigned int fsr,
/* Always report unhandled exception */
return 1;
}
-#endif
+#endif /* CONFIG_ARM || CONFIG_ARM64 */

#ifdef CONFIG_MIPS
static int brcmstb_bus_error_handler(struct pt_regs *regs, int is_fixup)
@@ -395,6 +400,10 @@ static int __init brcmstb_gisb_arb_probe(struct platform_device *pdev)
"imprecise external abort");
#endif
#endif /* CONFIG_ARM */
+#ifdef CONFIG_ARM64
+ hook_fault_code(16, brcmstb_bus_error_handler, SIGBUS, 0,
+ "synchronous external abort");
+#endif
#ifdef CONFIG_MIPS
board_be_handler = brcmstb_bus_error_handler;
#endif
--
2.12.0

2017-03-24 14:48:19

by Doug Berger

Subject: [PATCH 5/9] bus: brcmstb_gisb: Correct hooking of ARM aborts

The fault status reporting in the FSR registers is different depending
on whether the Long Physical Address Extension (LPAE) is being used.

This commit corrects the registering of fault handlers for arm
architecture kernels when the LPAE is enabled. It also forces the
handler to report that the abort exception was unhandled so that the
appropriate signal is sent to the offending user process.

Signed-off-by: Doug Berger <[email protected]>
---
drivers/bus/brcmstb_gisb.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index a94598d0945a..9eba0143f1a4 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -225,27 +225,29 @@ static int brcmstb_gisb_arb_decode_addr(struct brcmstb_gisb_arb_device *gdev,
static int brcmstb_bus_error_handler(unsigned long addr, unsigned int fsr,
struct pt_regs *regs)
{
- int ret = 0;
struct brcmstb_gisb_arb_device *gdev;

/* iterate over each GISB arb registered handlers */
list_for_each_entry(gdev, &brcmstb_gisb_arb_device_list, next)
- ret |= brcmstb_gisb_arb_decode_addr(gdev, "bus error");
+ brcmstb_gisb_arb_decode_addr(gdev, "bus error");
+
+#if !defined(CONFIG_ARM_LPAE)
/*
* If it was an imprecise abort, then we need to correct the
* return address to be _after_ the instruction.
*/
if (fsr & (1 << 10))
regs->ARM_pc += 4;
+#endif

- return ret;
+ /* Always report unhandled exception */
+ return 1;
}
#endif

#ifdef CONFIG_MIPS
static int brcmstb_bus_error_handler(struct pt_regs *regs, int is_fixup)
{
- int ret = 0;
struct brcmstb_gisb_arb_device *gdev;
u32 cap_status;

@@ -258,7 +260,7 @@ static int brcmstb_bus_error_handler(struct pt_regs *regs, int is_fixup)
goto out;
}

- ret |= brcmstb_gisb_arb_decode_addr(gdev, "bus error");
+ brcmstb_gisb_arb_decode_addr(gdev, "bus error");
}
out:
return is_fixup ? MIPS_BE_FIXUP : MIPS_BE_FATAL;
@@ -379,9 +381,16 @@ static int __init brcmstb_gisb_arb_probe(struct platform_device *pdev)
list_add_tail(&gdev->next, &brcmstb_gisb_arb_device_list);

#ifdef CONFIG_ARM
+#ifdef CONFIG_ARM_LPAE
+ hook_fault_code(16, brcmstb_bus_error_handler, SIGBUS, 0,
+ "synchronous external abort");
+ hook_fault_code(17, brcmstb_bus_error_handler, SIGBUS, 0,
+ "asynchronous external abort");
+#else
hook_fault_code(22, brcmstb_bus_error_handler, SIGBUS, 0,
"imprecise external abort");
#endif
+#endif /* CONFIG_ARM */
#ifdef CONFIG_MIPS
board_be_handler = brcmstb_bus_error_handler;
#endif
--
2.12.0

2017-03-24 14:48:46

by Doug Berger

Subject: [PATCH 8/9] bus: brcmstb_gisb: add ARM64 SError support

Asynchronous external aborts (e.g. for buffered writes) trigger SError
aborts on the arm64 architecture. This commit hooks the SError abort
handling to check for GISB arbitration errors.

Signed-off-by: Doug Berger <[email protected]>
---
drivers/bus/brcmstb_gisb.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index bf26b4017a2c..52b5d96081eb 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -252,6 +252,28 @@ static int brcmstb_bus_error_handler(unsigned long addr, unsigned int fsr,
/* Always report unhandled exception */
return 1;
}
+
+#ifdef CONFIG_ARM64
+static int (*serror_chain)(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs);
+static int do_brahma_b53_serror(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ struct brcmstb_gisb_arb_device *gdev;
+
+ if (((esr & (3 << 22)) == 0) && ((esr & 3) == 2)) {
+ /* iterate over each GISB arb registered handlers */
+ list_for_each_entry(gdev, &brcmstb_gisb_arb_device_list, next)
+ brcmstb_gisb_arb_decode_addr(gdev, "bus error");
+ }
+
+ if (serror_chain)
+ return serror_chain(addr, esr, regs);
+
+ /* Always report unhandled exception */
+ return 1;
+}
+#endif
#endif /* CONFIG_ARM || CONFIG_ARM64 */

#ifdef CONFIG_MIPS
@@ -403,6 +425,8 @@ static int __init brcmstb_gisb_arb_probe(struct platform_device *pdev)
#ifdef CONFIG_ARM64
hook_fault_code(16, brcmstb_bus_error_handler, SIGBUS, 0,
"synchronous external abort");
+ if (list_is_singular(&brcmstb_gisb_arb_device_list))
+ serror_chain = hook_serror_handler(do_brahma_b53_serror);
#endif
#ifdef CONFIG_MIPS
board_be_handler = brcmstb_bus_error_handler;
--
2.12.0
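
The syndrome test used by do_brahma_b53_serror() in patch 8/9 can be isolated into a small predicate. The sketch below only restates the condition from the diff; the helper name esr_matches_gisb_check is hypothetical, and no meaning is assigned to the individual bits here, since the thread itself notes that the ISS encoding for SError is implementation defined.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Restates the condition from the patch:
 *   bits [23:22] of the syndrome must be zero, and
 *   bits [1:0] must equal 2.
 * What these fields mean is specific to the CPU implementation.
 */
static bool esr_matches_gisb_check(uint32_t esr)
{
	return ((esr & (3u << 22)) == 0) && ((esr & 3u) == 2);
}
```

Bits outside those two fields, including the exception class in the top of the register, do not affect the result, so in the patch the class is checked separately by the entry code.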

2017-03-24 14:49:00

by Doug Berger

Subject: [PATCH 9/9] bus: brcmstb_gisb: update to support new revision

The 7278 introduces a new version of this core. This
commit adds support for that revision.

Signed-off-by: Doug Berger <[email protected]>
---
Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt | 3 ++-
drivers/bus/brcmstb_gisb.c | 10 ++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt b/Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt
index 1eceefb20f01..8a6c3c2e58fe 100644
--- a/Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt
+++ b/Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt
@@ -3,7 +3,8 @@ Broadcom GISB bus Arbiter controller
Required properties:

- compatible:
- "brcm,gisb-arb" or "brcm,bcm7445-gisb-arb" for 28nm chips
+ "brcm,bcm7278-gisb-arb" for V7 28nm chips
+ "brcm,gisb-arb" or "brcm,bcm7445-gisb-arb" for other 28nm chips
"brcm,bcm7435-gisb-arb" for newer 40nm chips
"brcm,bcm7400-gisb-arb" for older 40nm chips and all 65nm chips
"brcm,bcm7038-gisb-arb" for 130nm chips
diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 52b5d96081eb..a91c4ae63e74 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -63,6 +63,15 @@ static const int gisb_offsets_bcm7038[] = {
[ARB_ERR_CAP_MASTER] = -1,
};

+static const int gisb_offsets_bcm7278[] = {
+ [ARB_TIMER] = 0x008,
+ [ARB_ERR_CAP_CLR] = 0x7f8,
+ [ARB_ERR_CAP_HI_ADDR] = -1,
+ [ARB_ERR_CAP_ADDR] = 0x7e0,
+ [ARB_ERR_CAP_STATUS] = 0x7f0,
+ [ARB_ERR_CAP_MASTER] = 0x7f4,
+};
+
static const int gisb_offsets_bcm7400[] = {
[ARB_TIMER] = 0x00c,
[ARB_ERR_CAP_CLR] = 0x0c8,
@@ -329,6 +338,7 @@ static const struct of_device_id brcmstb_gisb_arb_of_match[] = {
{ .compatible = "brcm,bcm7445-gisb-arb", .data = gisb_offsets_bcm7445 },
{ .compatible = "brcm,bcm7435-gisb-arb", .data = gisb_offsets_bcm7435 },
{ .compatible = "brcm,bcm7400-gisb-arb", .data = gisb_offsets_bcm7400 },
+ { .compatible = "brcm,bcm7278-gisb-arb", .data = gisb_offsets_bcm7278 },
{ .compatible = "brcm,bcm7038-gisb-arb", .data = gisb_offsets_bcm7038 },
{ },
};
--
2.12.0

2017-03-24 14:49:48

by Doug Berger

Subject: [PATCH 3/9] arm64: mm: install SError abort handler

This commit adds support for minimal handling of SError aborts and
allows them to be hooked by a driver or other part of the kernel to
install a custom SError abort handler. The hook function returns
the previously registered handler so that handlers may be chained if
desired.

The handler should return the value 0 if the error has been handled,
otherwise the handler should either call the next handler in the
chain or return a non-zero value.

Since the Instruction Specific Syndrome value for SError aborts is
implementation specific, the registered handlers must implement
their own parsing of the syndrome.

Signed-off-by: Doug Berger <[email protected]>
---
arch/arm64/include/asm/system_misc.h | 2 ++
arch/arm64/kernel/entry.S | 69 ++++++++++++++++++++++++++++++++----
arch/arm64/mm/fault.c | 31 ++++++++++++++++
3 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index e05f5b8c7c1c..60ac784ff4e6 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -41,6 +41,8 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
void hook_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
struct pt_regs *),
int sig, int code, const char *name);
+void *hook_serror_handler(int (*fn)(unsigned long, unsigned int,
+ struct pt_regs *));

struct mm_struct;
extern void show_pte(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 43512d4d7df2..d043d66b390d 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -323,18 +323,18 @@ ENTRY(vectors)
ventry el1_sync // Synchronous EL1h
ventry el1_irq // IRQ EL1h
ventry el1_fiq_invalid // FIQ EL1h
- ventry el1_error_invalid // Error EL1h
+ ventry el1_error // Error EL1h

ventry el0_sync // Synchronous 64-bit EL0
ventry el0_irq // IRQ 64-bit EL0
ventry el0_fiq_invalid // FIQ 64-bit EL0
- ventry el0_error_invalid // Error 64-bit EL0
+ ventry el0_error // Error 64-bit EL0

#ifdef CONFIG_COMPAT
ventry el0_sync_compat // Synchronous 32-bit EL0
ventry el0_irq_compat // IRQ 32-bit EL0
ventry el0_fiq_invalid_compat // FIQ 32-bit EL0
- ventry el0_error_invalid_compat // Error 32-bit EL0
+ ventry el0_error_compat // Error 32-bit EL0
#else
ventry el0_sync_invalid // Synchronous 32-bit EL0
ventry el0_irq_invalid // IRQ 32-bit EL0
@@ -374,10 +374,6 @@ ENDPROC(el0_error_invalid)
el0_fiq_invalid_compat:
inv_entry 0, BAD_FIQ, 32
ENDPROC(el0_fiq_invalid_compat)
-
-el0_error_invalid_compat:
- inv_entry 0, BAD_ERROR, 32
-ENDPROC(el0_error_invalid_compat)
#endif

el1_sync_invalid:
@@ -508,6 +504,34 @@ el1_preempt:
ret x24
#endif

+ .align 6
+el1_error:
+ kernel_entry 1
+ mrs x1, esr_el1 // read the syndrome register
+ lsr x24, x1, #ESR_ELx_EC_SHIFT // exception class
+ cmp x24, #ESR_ELx_EC_SERROR // SError exception in EL1
+ b.ne el1_error_inv
+el1_serr:
+ mrs x0, far_el1
+ enable_dbg
+ // re-enable interrupts if they were enabled in the aborted context
+ tbnz x23, #7, 1f // PSR_I_BIT
+ enable_irq
+1:
+ mov x2, sp // struct pt_regs
+ bl do_serr_abort
+
+ // disable interrupts before pulling preserved data off the stack
+ disable_irq
+ kernel_exit 1
+el1_error_inv:
+ enable_dbg
+ mov x0, sp
+ mov x2, x1
+ mov x1, #BAD_ERROR
+ b bad_mode
+ENDPROC(el1_error)
+
/*
* EL0 mode handlers.
*/
@@ -584,6 +608,11 @@ el0_svc_compat:
el0_irq_compat:
kernel_entry 0, 32
b el0_irq_naked
+
+ .align 6
+el0_error_compat:
+ kernel_entry 0, 32
+ b el0_error_naked
#endif

el0_da:
@@ -705,6 +734,32 @@ el0_irq_naked:
b ret_to_user
ENDPROC(el0_irq)

+ .align 6
+el0_error:
+ kernel_entry 0
+el0_error_naked:
+ mrs x25, esr_el1 // read the syndrome register
+ lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
+ cmp x24, #ESR_ELx_EC_SERROR // SError exception in EL0
+ b.ne el0_error_inv
+el0_serr:
+ mrs x26, far_el1
+ // enable interrupts before calling the main handler
+ enable_dbg_and_irq
+ ct_user_exit
+ bic x0, x26, #(0xff << 56)
+ mov x1, x25
+ mov x2, sp
+ bl do_serr_abort
+ b ret_to_user
+el0_error_inv:
+ enable_dbg
+ mov x0, sp
+ mov x1, #BAD_ERROR
+ mov x2, x25
+ b bad_mode
+ENDPROC(el0_error)
+
/*
* Register switch for AArch64. The callee-saved registers need to be saved
* and restored. On entry:
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 43319ed58a47..577fecea7c7d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -705,3 +705,34 @@ int cpu_enable_pan(void *__unused)
return 0;
}
#endif /* CONFIG_ARM64_PAN */
+
+static int (*serror_handler)(unsigned long, unsigned int,
+ struct pt_regs *) __ro_after_init;
+
+void *__init hook_serror_handler(int (*fn)(unsigned long, unsigned int,
+ struct pt_regs *))
+{
+ void *ret = serror_handler;
+
+ serror_handler = fn;
+ return ret;
+}
+
+asmlinkage void __exception do_serr_abort(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ struct siginfo info;
+
+ if (serror_handler)
+ if (!serror_handler(addr, esr, regs))
+ return;
+
+ pr_alert("Unhandled SError: (0x%08x) at 0x%016lx\n", esr, addr);
+ __show_regs(regs);
+
+ info.si_signo = SIGILL;
+ info.si_errno = 0;
+ info.si_code = ILL_ILLOPC;
+ info.si_addr = (void __user *)addr;
+ arm64_notify_die("", regs, &info, esr);
+}
--
2.12.0
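
The chaining contract described in patch 3/9 (the hook returns the previous handler, and a handler returns 0 once the error is handled) can be modelled in user space. This is a sketch under assumptions, not the kernel implementation: every name ending in _model or starting with handler_/chain_ is hypothetical, the register frame is reduced to a void pointer, and the syndrome values 0xA and 0xB are arbitrary.

```c
#include <stddef.h>

typedef int (*serror_fn)(unsigned long addr, unsigned int esr, void *regs);

/* Single global slot, as in the patch. */
static serror_fn serror_handler_model;

/* Install a handler and hand back the previous one for chaining. */
static serror_fn hook_serror_model(serror_fn fn)
{
	serror_fn prev = serror_handler_model;

	serror_handler_model = fn;
	return prev;
}

/* Returns 0 if some handler claimed the error, 1 if it stays unhandled. */
static int do_serr_abort_model(unsigned long addr, unsigned int esr,
			       void *regs)
{
	if (serror_handler_model && !serror_handler_model(addr, esr, regs))
		return 0;

	return 1;	/* the kernel version would report and signal here */
}

/* Two hypothetical drivers, each claiming one made-up syndrome value. */
static serror_fn chain_a, chain_b;

static int handler_a(unsigned long addr, unsigned int esr, void *regs)
{
	if (esr == 0xA)
		return 0;
	return chain_a ? chain_a(addr, esr, regs) : 1;
}

static int handler_b(unsigned long addr, unsigned int esr, void *regs)
{
	if (esr == 0xB)
		return 0;
	return chain_b ? chain_b(addr, esr, regs) : 1;
}
```

With chain_a = hook_serror_model(handler_a) followed by chain_b = hook_serror_model(handler_b), handler_b runs first and falls through to handler_a. The call order is simply the reverse of the registration order, so it depends on probe order.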

2017-03-24 14:50:00

by Doug Berger

Subject: [PATCH 6/9] bus: brcmstb_gisb: correct support for 64-bit address output

The GISB bus can support addresses beyond 32 bits, so this commit
corrects support for reading a captured 64-bit address into a 64-bit
variable by obtaining the high bits from the ARB_ERR_CAP_HI_ADDR
register (when present) and then outputting the full 64-bit value.

It also removes unused definitions.

Fixes: 44127b771d9c ("bus: add Broadcom GISB bus arbiter timeout/error handler")
Signed-off-by: Doug Berger <[email protected]>
---
drivers/bus/brcmstb_gisb.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 9eba0143f1a4..c8d2a61d21ed 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -37,8 +37,6 @@
#define ARB_ERR_CAP_CLEAR (1 << 0)
#define ARB_ERR_CAP_STATUS_TIMEOUT (1 << 12)
#define ARB_ERR_CAP_STATUS_TEA (1 << 11)
-#define ARB_ERR_CAP_STATUS_BS_SHIFT (1 << 2)
-#define ARB_ERR_CAP_STATUS_BS_MASK 0x3c
#define ARB_ERR_CAP_STATUS_WRITE (1 << 1)
#define ARB_ERR_CAP_STATUS_VALID (1 << 0)

@@ -47,7 +45,6 @@ enum {
ARB_ERR_CAP_CLR,
ARB_ERR_CAP_HI_ADDR,
ARB_ERR_CAP_ADDR,
- ARB_ERR_CAP_DATA,
ARB_ERR_CAP_STATUS,
ARB_ERR_CAP_MASTER,
};
@@ -57,7 +54,6 @@ static const int gisb_offsets_bcm7038[] = {
[ARB_ERR_CAP_CLR] = 0x0c4,
[ARB_ERR_CAP_HI_ADDR] = -1,
[ARB_ERR_CAP_ADDR] = 0x0c8,
- [ARB_ERR_CAP_DATA] = 0x0cc,
[ARB_ERR_CAP_STATUS] = 0x0d0,
[ARB_ERR_CAP_MASTER] = -1,
};
@@ -67,7 +63,6 @@ static const int gisb_offsets_bcm7400[] = {
[ARB_ERR_CAP_CLR] = 0x0c8,
[ARB_ERR_CAP_HI_ADDR] = -1,
[ARB_ERR_CAP_ADDR] = 0x0cc,
- [ARB_ERR_CAP_DATA] = 0x0d0,
[ARB_ERR_CAP_STATUS] = 0x0d4,
[ARB_ERR_CAP_MASTER] = 0x0d8,
};
@@ -77,7 +72,6 @@ static const int gisb_offsets_bcm7435[] = {
[ARB_ERR_CAP_CLR] = 0x168,
[ARB_ERR_CAP_HI_ADDR] = -1,
[ARB_ERR_CAP_ADDR] = 0x16c,
- [ARB_ERR_CAP_DATA] = 0x170,
[ARB_ERR_CAP_STATUS] = 0x174,
[ARB_ERR_CAP_MASTER] = 0x178,
};
@@ -87,7 +81,6 @@ static const int gisb_offsets_bcm7445[] = {
[ARB_ERR_CAP_CLR] = 0x7e4,
[ARB_ERR_CAP_HI_ADDR] = 0x7e8,
[ARB_ERR_CAP_ADDR] = 0x7ec,
- [ARB_ERR_CAP_DATA] = 0x7f0,
[ARB_ERR_CAP_STATUS] = 0x7f4,
[ARB_ERR_CAP_MASTER] = 0x7f8,
};
@@ -109,9 +102,13 @@ static u32 gisb_read(struct brcmstb_gisb_arb_device *gdev, int reg)
{
int offset = gdev->gisb_offsets[reg];

- /* return 1 if the hardware doesn't have ARB_ERR_CAP_MASTER */
- if (offset == -1)
- return 1;
+ if (offset < 0) {
+ /* return 1 if the hardware doesn't have ARB_ERR_CAP_MASTER */
+ if (reg == ARB_ERR_CAP_MASTER)
+ return 1;
+ else
+ return 0;
+ }

if (gdev->big_endian)
return ioread32be(gdev->base + offset);
@@ -119,6 +116,16 @@ static u32 gisb_read(struct brcmstb_gisb_arb_device *gdev, int reg)
return ioread32(gdev->base + offset);
}

+static u64 gisb_read_address(struct brcmstb_gisb_arb_device *gdev)
+{
+ u64 value;
+
+ value = (u64)gisb_read(gdev, ARB_ERR_CAP_ADDR);
+ value |= (u64)gisb_read(gdev, ARB_ERR_CAP_HI_ADDR) << 32;
+
+ return value;
+}
+
static void gisb_write(struct brcmstb_gisb_arb_device *gdev, u32 val, int reg)
{
int offset = gdev->gisb_offsets[reg];
@@ -185,7 +192,7 @@ static int brcmstb_gisb_arb_decode_addr(struct brcmstb_gisb_arb_device *gdev,
const char *reason)
{
u32 cap_status;
- unsigned long arb_addr;
+ u64 arb_addr;
u32 master;
const char *m_name;
char m_fmt[11];
@@ -197,10 +204,7 @@ static int brcmstb_gisb_arb_decode_addr(struct brcmstb_gisb_arb_device *gdev,
return 1;

/* Read the address and master */
- arb_addr = gisb_read(gdev, ARB_ERR_CAP_ADDR) & 0xffffffff;
-#if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
- arb_addr |= (u64)gisb_read(gdev, ARB_ERR_CAP_HI_ADDR) << 32;
-#endif
+ arb_addr = gisb_read_address(gdev);
master = gisb_read(gdev, ARB_ERR_CAP_MASTER);

m_name = brcmstb_gisb_master_to_str(gdev, master);
@@ -209,7 +213,7 @@ static int brcmstb_gisb_arb_decode_addr(struct brcmstb_gisb_arb_device *gdev,
m_name = m_fmt;
}

- pr_crit("%s: %s at 0x%lx [%c %s], core: %s\n",
+ pr_crit("%s: %s at 0x%llx [%c %s], core: %s\n",
__func__, reason, arb_addr,
cap_status & ARB_ERR_CAP_STATUS_WRITE ? 'W' : 'R',
cap_status & ARB_ERR_CAP_STATUS_TIMEOUT ? "timeout" : "",
--
2.12.0
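
The address assembly performed by gisb_read_address() in patch 6/9 comes down to widening each 32-bit half before shifting. The following is a minimal user-space sketch; combine_addr_model is a hypothetical name, and the two halves are passed in directly instead of being read from registers.

```c
#include <stdint.h>

/*
 * Build a 64-bit address from two 32-bit register reads.
 * The cast must happen before the shift: shifting a 32-bit value
 * left by 32 is undefined behaviour in C.
 */
static uint64_t combine_addr_model(uint32_t lo, uint32_t hi)
{
	uint64_t value;

	value = (uint64_t)lo;
	value |= (uint64_t)hi << 32;

	return value;
}
```

On chips without a high-address register the patch makes the read return 0, which in this model corresponds to passing hi = 0 and getting the low word back unchanged.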

2017-03-24 14:47:55

by Doug Berger

Subject: [PATCH 2/9] arm64: mm: mark fault_info __ro_after_init

The fault_info table must be made writeable to allow installation
of custom memory abort handlers, but it can be made read-only
after initialization to provide some protection.

Signed-off-by: Doug Berger <[email protected]>
---
arch/arm64/mm/fault.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index cdf1260f1005..43319ed58a47 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -493,7 +493,7 @@ static struct fault_info {
int sig;
int code;
const char *name;
-} fault_info[] = {
+} fault_info[] __ro_after_init = {
{ do_bad, SIGBUS, 0, "ttbr address size fault" },
{ do_bad, SIGBUS, 0, "level 1 address size fault" },
{ do_bad, SIGBUS, 0, "level 2 address size fault" },
--
2.12.0

2017-03-24 15:04:31

by Mark Rutland

Subject: Re: [PATCH 0/9] bus: brcmstb_gisb: add support for GISBv7 arbiter

On Fri, Mar 24, 2017 at 07:46:23AM -0700, Doug Berger wrote:
> This patch set contains changes to enable the GISB arbiter driver
> on the latest ARM64 architecture Set-Top Box chips from Broadcom.
>
> This driver relies on being able to hook the abort handlers of
> the processor core that are triggered by bus error signals
> generated by the GISB bus arbiter hardware found in BCM7XXX chips.

Ugh; hardware generating asynchronous exceptions is hideous. I had hoped
that such hardware was a thing of the past.

Under what circumstances does the GISB bus arbiter generate these
aborts?

Mark.

> The first three patches are based on the arm64/for-next/core
> branch to enable this functionality for the arm64 architecture.
>
> The remaining patches correct some issues with the existing driver,
> add the ARM64 architecture specific support to the driver, and
> finally add the new register map for the GISBv7 hardware first
> appearing in the BCM7278 device.
>
> Doug Berger (7):
> arm64: mm: mark fault_info __ro_after_init
> arm64: mm: install SError abort handler
> bus: brcmstb_gisb: Use register offsets with writes too
> bus: brcmstb_gisb: Correct hooking of ARM aborts
> bus: brcmstb_gisb: correct support for 64-bit address output
> bus: brcmstb_gisb: add ARM64 SError support
> bus: brcmstb_gisb: update to support new revision
>
> Florian Fainelli (2):
> arm64: mm: Allow installation of memory abort handlers
> bus: brcmstb_gisb: Add ARM64 support
>
> .../devicetree/bindings/bus/brcm,gisb-arb.txt | 3 +-
> arch/arm64/include/asm/system_misc.h | 5 +
> arch/arm64/kernel/entry.S | 69 ++++++++++++--
> arch/arm64/mm/fault.c | 48 +++++++++-
> drivers/bus/Kconfig | 2 +-
> drivers/bus/brcmstb_gisb.c | 106 ++++++++++++++++-----
> 6 files changed, 197 insertions(+), 36 deletions(-)
>
> --
> 2.12.0
>

2017-03-24 15:17:34

by Mark Rutland

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
> This commit adds support for minimal handling of SError aborts and
> allows them to be hooked by a driver or other part of the kernel to
> install a custom SError abort handler. The hook function returns
> the previously registered handler so that handlers may be chained if
> desired.
>
> The handler should return the value 0 if the error has been handled,
> otherwise the handler should either call the next handler in the
> chain or return a non-zero value.

... so the order these get called is completely dependent on probe
order...

> Since the Instruction Specific Syndrome value for SError aborts is
> implementation specific, the registered handlers must implement
> their own parsing of the syndrome.

... and drivers have to be intimately familiar with the CPU, in order to
be able to parse its IMPLEMENTATION DEFINED ESR_ELx.ISS value.

Even then, there's no guarantee there's anything useful there, since it
is IMPLEMENTATION DEFINED and could simply be RES0 or UNKNOWN in all
cases.

I do not think it is a good idea to allow arbitrary drivers to hook
this fault in this manner.

> + .align 6
> +el0_error:
> + kernel_entry 0
> +el0_error_naked:
> + mrs x25, esr_el1 // read the syndrome register
> + lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
> + cmp x24, #ESR_ELx_EC_SERROR // SError exception in EL0
> + b.ne el0_error_inv
> +el0_serr:
> + mrs x26, far_el1
> + // enable interrupts before calling the main handler
> + enable_dbg_and_irq

... why?

We don't do this for inv_entry today.

> + ct_user_exit
> + bic x0, x26, #(0xff << 56)
> + mov x1, x25
> + mov x2, sp
> + bl do_serr_abort
> + b ret_to_user
> +el0_error_inv:
> + enable_dbg
> + mov x0, sp
> + mov x1, #BAD_ERROR
> + mov x2, x25
> + b bad_mode
> +ENDPROC(el0_error)

Clearly you expect these to be delivered at arbitrary times during
execution. What if a KVM guest is executing at the time the SError is
delivered?

To be quite frank, I don't believe that we can reliably and safely
handle this misfeature in the kernel, and this infrastructure only
provides the illusion that we can.

I do not think it makes sense to do this.

Thanks,
Mark.

2017-03-24 16:03:21

by Doug Berger

Subject: Re: [PATCH 0/9] bus: brcmstb_gisb: add support for GISBv7 arbiter

On 03/24/2017 08:03 AM, Mark Rutland wrote:
> On Fri, Mar 24, 2017 at 07:46:23AM -0700, Doug Berger wrote:
>> This patch set contains changes to enable the GISB arbiter driver
>> on the latest ARM64 architecture Set-Top Box chips from Broadcom.
>>
>> This driver relies on being able to hook the abort handlers of
>> the processor core that are triggered by bus error signals
>> generated by the GISB bus arbiter hardware found in BCM7XXX chips.
>
> Ugh; hardware generating asynchronous exceptions is hideous. I had hoped
> that such hardware was a thing of the past.
Yes, hope springs eternal :)

>
> Under what circumstances does the GISB bus arbiter generate these
> aborts?
Since the GISB bus generally uses buffered writes it can early ack the
CPU bus master before the arbitration error is detected. This causes
the bus error to be seen as an asynchronous abort by the CPU.

>
> Mark.
>

2017-03-24 16:49:04

by Doug Berger

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On 03/24/2017 08:16 AM, Mark Rutland wrote:
> On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
>> This commit adds support for minimal handling of SError aborts and
>> allows them to be hooked by a driver or other part of the kernel to
>> install a custom SError abort handler. The hook function returns
>> the previously registered handler so that handlers may be chained if
>> desired.
>>
>> The handler should return the value 0 if the error has been handled,
>> otherwise the handler should either call the next handler in the
>> chain or return a non-zero value.
>
> ... so the order these get called is completely dependent on probe
> order...
Yes, but this was an attempt to keep some flexibility in handling a
very ambiguous event.

>
>> Since the Instruction Specific Syndrome value for SError aborts is
>> implementation specific, the registered handlers must implement
>> their own parsing of the syndrome.
>
> ... and drivers have to be intimately familiar with the CPU, in order to
> be able to parse its IMPLEMENTATION DEFINED ESR_ELx.ISS value.
>
> Even then, there's no guarantee there's anything useful there, since it
> is IMPLEMENTATION DEFINED and could simply be RES0 or UNKNOWN in all
> cases.
>
> I do not think it is a good idea to allow arbitrary drivers to hook
> this fault in this manner.
>
I agree. It should really be resolved in the fault handling code like
it is for the ARM architecture, but the IMPLEMENTATION DEFINED nature of
the event for ARM64 makes this unmanageable but for the most specific
use cases, which is what is attempted here.

>> + .align 6
>> +el0_error:
>> + kernel_entry 0
>> +el0_error_naked:
>> + mrs x25, esr_el1 // read the syndrome register
>> + lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
>> + cmp x24, #ESR_ELx_EC_SERROR // SError exception in EL0
>> + b.ne el0_error_inv
>> +el0_serr:
>> + mrs x26, far_el1
>> + // enable interrupts before calling the main handler
>> + enable_dbg_and_irq
>
> ... why?
>
> We don't do this for inv_entry today.
>
Yes, my initial downstream implementation modified inv_entry, but after
commit 7d9e8f71b989 ("arm64: avoid returning from bad mode") added the
user abort handling for el0_inv I tried to follow that approach so user
mode errors (i.e. bad writes) wouldn't kill the kernel.
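
For readers less familiar with the syndrome register, the exception-class test at the top of the quoted `el0_error_naked` path can be sketched in C as below. The constant values are the ones I believe the arm64 `asm/esr.h` header uses (EC in bits [31:26], SError class 0x2F, ISS in bits [24:0]); the helper function names are made up for this sketch and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the kernel's arch/arm64/include/asm/esr.h. */
#define ESR_ELx_EC_SHIFT  26
#define ESR_ELx_EC_MASK   (UINT32_C(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC_SERROR UINT32_C(0x2F)
#define ESR_ELx_ISS_MASK  UINT32_C(0x01FFFFFF)

/* Exception class: the field the assembly extracts with "lsr". */
uint32_t esr_ec(uint32_t esr)
{
    return (esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT;
}

/* Equivalent of the "cmp x24, #ESR_ELx_EC_SERROR" test. */
bool esr_is_serror(uint32_t esr)
{
    return esr_ec(esr) == ESR_ELx_EC_SERROR;
}

/*
 * Instruction Specific Syndrome. For SError its layout is
 * IMPLEMENTATION DEFINED, so this returns raw bits only.
 */
uint32_t esr_iss(uint32_t esr)
{
    return esr & ESR_ELx_ISS_MASK;
}
```

Note that only the exception class is architecturally meaningful here; anything a handler reads out of `esr_iss()` for an SError depends on the particular CPU implementation.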

>> + ct_user_exit
>> + bic x0, x26, #(0xff << 56)
>> + mov x1, x25
>> + mov x2, sp
>> + bl do_serr_abort
>> + b ret_to_user
>> +el0_error_inv:
>> + enable_dbg
>> + mov x0, sp
>> + mov x1, #BAD_ERROR
>> + mov x2, x25
>> + b bad_mode
>> +ENDPROC(el0_error)
>
> Clearly you expect these to be delivered at arbitrary times during
> execution. What if a KVM guest is executing at the time the SError is
> delivered?
The timing isn't really arbitrary in our particular use case. It is
just after the bus interface has moved on from the failing transaction
so from the bus interface's perspective it is asynchronous. The main
benefit is to help debug user mode code that accidentally maps a bad
address since we would never make such an egregious error in the kernel ;)

I'm afraid I'm not fully versed on the implications to KVM here.
>
> To be quite frank, I don't believe that we can reliably and safely
> handle this misfeature in the kernel, and this infrastructure only
> provides the illusion that we can.
>
> I do not think it makes sense to do this.
>
> Thanks,
> Mark.
>
I understand your position since this was the cleanest approach I came
up with and it is admittedly ugly. I would be happy to entertain any
better suggestion on how this could be handled more cleanly.

If you would consider an alternative implementation where we scrap the
SError handler (i.e. maintain the ugliness in our downstream kernel) in
favor of a more gentle user mode crash on SError that allows the kernel
the opportunity to service the interrupt for diagnostic purposes I could
try to repackage that.

Thanks for the review!
Doug

2017-03-24 17:36:41

by Mark Rutland

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On Fri, Mar 24, 2017 at 09:48:40AM -0700, Doug Berger wrote:
> On 03/24/2017 08:16 AM, Mark Rutland wrote:
> >On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:

> If you would consider an alternative implementation where we scrap
> the SError handler (i.e. maintain the ugliness in our downstream
> kernel) in favor of a more gentle user mode crash on SError that
> allows the kernel the opportunity to service the interrupt for
> diagnostic purposes I could try to repackage that.

If this is just for diagnostic purposes, I believe you can register a
panic notifier, which can then read from the bus. The panic will occur,
but you'll have the opportunity to log some information to dmesg.

Thanks,
Mark.

2017-03-24 17:54:28

by Florian Fainelli

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On 03/24/2017 10:35 AM, Mark Rutland wrote:
> On Fri, Mar 24, 2017 at 09:48:40AM -0700, Doug Berger wrote:
>> On 03/24/2017 08:16 AM, Mark Rutland wrote:
>>> On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
>
>> If you would consider an alternative implementation where we scrap
>> the SError handler (i.e. maintain the ugliness in our downstream
>> kernel) in favor of a more gentle user mode crash on SError that
>> allows the kernel the opportunity to service the interrupt for
>> diagnostic purposes I could try to repackage that.
>
> If this is just for diagnostic purposes, I believe you can register a
> panic notifier, which can then read from the bus. The panic will occur,
> but you'll have the opportunity to log some information to dmesg.

And crash the kernel? That sounds awful, FWIW the ARM/Linux kernel is
able to recover just fine from user-space accessing e.g: invalid
physical addresses in the GISB register space, bringing the same level
of functionality to ARM64/Linux sounds reasonable to me.
--
Florian

2017-03-24 18:32:19

by Mark Rutland

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

Hi Florian,

On Fri, Mar 24, 2017 at 10:53:48AM -0700, Florian Fainelli wrote:
> On 03/24/2017 10:35 AM, Mark Rutland wrote:
> > On Fri, Mar 24, 2017 at 09:48:40AM -0700, Doug Berger wrote:
> >> On 03/24/2017 08:16 AM, Mark Rutland wrote:
> >>> On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
> >
> >> If you would consider an alternative implementation where we scrap
> >> the SError handler (i.e. maintain the ugliness in our downstream
> >> kernel) in favor of a more gentle user mode crash on SError that
> >> allows the kernel the opportunity to service the interrupt for
> >> diagnostic purposes I could try to repackage that.
> >
> > If this is just for diagnostic purposes, I believe you can register a
> > panic notifier, which can then read from the bus. The panic will occur,
> > but you'll have the opportunity to log some information to dmesg.
>
> And crash the kernel? That sounds awful, FWIW the ARM/Linux kernel is
> able to recover just fine from user-space accessing e.g: invalid
> physical addresses in the GISB register space, bringing the same level
> of functionality to ARM64/Linux sounds reasonable to me.

I disagree, given that:

(a) You cannot determine the (HW) origin of the SError in an
architecturally portable way. i.e. when you take an SError, you have
no way of determining what asynchronous event caused it.

(b) SError is effectively an edge-triggered interrupt for fatal system
errors (e.g. it may be triggered in response to ECC errors,
corruption detected in caches, etc). Even if you can determine that
the GISB triggered *an* SError, this does not tell you that this was
the *only* SError.

If you take an SError, something bad has already happened. Your data
may already have been corrupted, and worse, you don't know when or
where specifically this occurred (nor how many times).

(c) You cannot determine the (SW) origin of an SError without relying
upon implementation details. This cannot be written in a way that
does not rely on microarchitecture, integration, etc, and would need
to be updated for every future system with this misfeature.

(d) Even if you can determine the (SW) origin of an SError by relying on
IMPLEMENTATION DEFINED details, your handler needs to be intimately
familiar with the arch in question in order to attempt to recover.

For example, the existing code tries to skip an ARM instruction in
some cases. For arm64 there are three cases that would need to be
handled (AArch64 A64, AArch32 A32/ARM, AArch32 T32/Thumb).

Further, it appears to me that the existing code is broken given
that it doesn't handle Thumb, and given that it's skipping an
instruction in response to an asynchronous event -- i.e. some
arbitrary instruction after the one which triggered the abort.
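
The three instruction-set cases in point (d) differ in how far a "skip one instruction" would have to advance the PC. The sketch below, with hypothetical names, shows only that size computation. It does not make skipping correct: for an asynchronous abort the PC need not point at the access that caused the error, which is the objection raised above.

```c
#include <stdint.h>

/* Hypothetical tag for the instruction set active when the abort was taken. */
enum insn_set { ISET_A64, ISET_A32, ISET_T32 };

/*
 * Size in bytes of the instruction whose first halfword is first_hw.
 * A64 and A32 instructions are always 4 bytes. A T32 instruction is
 * 4 bytes when bits [15:11] of its first halfword are 0b11101,
 * 0b11110 or 0b11111, and 2 bytes otherwise.
 */
unsigned int insn_size(enum insn_set set, uint16_t first_hw)
{
    switch (set) {
    case ISET_A64:
    case ISET_A32:
        return 4;
    case ISET_T32:
        return ((first_hw >> 11) >= 0x1D) ? 4 : 2;
    }
    return 0; /* unknown instruction set */
}
```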

For better or worse, SError *must* be treated as fatal.

As Doug stated:

The main benefit is to help debug user mode code that accidentally
maps a bad address since we would never make such an egregious error
in the kernel ;)

This is just one of many ways a userspace application with direct HW
access can bring down the system. I see no reason to treat it any
differently, especially given the above points.

Thanks,
Mark.

2017-03-24 19:02:21

by Florian Fainelli

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On 03/24/2017 11:31 AM, Mark Rutland wrote:
> Hi Florian,
>
> On Fri, Mar 24, 2017 at 10:53:48AM -0700, Florian Fainelli wrote:
>> On 03/24/2017 10:35 AM, Mark Rutland wrote:
>>> On Fri, Mar 24, 2017 at 09:48:40AM -0700, Doug Berger wrote:
>>>> On 03/24/2017 08:16 AM, Mark Rutland wrote:
>>>>> On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
>>>
>>>> If you would consider an alternative implementation where we scrap
>>>> the SError handler (i.e. maintain the ugliness in our downstream
>>>> kernel) in favor of a more gentle user mode crash on SError that
>>>> allows the kernel the opportunity to service the interrupt for
>>>> diagnostic purposes I could try to repackage that.
>>>
>>> If this is just for diagnostic purposes, I believe you can register a
>>> panic notifier, which can then read from the bus. The panic will occur,
>>> but you'll have the opportunity to log some information to dmesg.
>>
>> And crash the kernel? That sounds awful, FWIW the ARM/Linux kernel is
>> able to recover just fine from user-space accessing e.g: invalid
>> physical addresses in the GISB register space, bringing the same level
>> of functionality to ARM64/Linux sounds reasonable to me.
>
> I disagree, given that:
>
> (a) You cannot determine the (HW) origin of the SError in an
> architecturally portable way. i.e. when you take an SError, you have
> no way of determining what asynchronous event caused it.
>
> (b) SError is effectively an edge-triggered interrupt for fatal system
> errors (e.g. it may be triggered in response to ECC errors,
> corruption detected in caches, etc). Even if you can determine that
> the GISB triggered *an* SError, this does not tell you that this was
> the *only* SError.

Correct, which is why Doug's changes allow chaining of handlers.

>
> If you take an SError, something bad has already happened. Your data
> may already have been corrupted, and worse, you don't know when or
> where specifically this occurred (nor how many times).

Sure, but that still allows you to send the correct signal to a faulting
application (unless I am missing something here).

>
> (c) You cannot determine the (SW) origin of an SError without relying
> upon implementation details. This cannot be written in a way that
> does not rely on microarchitecture, integration, etc, and would need
> to be updated for every future system with this misfeature.

Which is exactly what is being done here, with the help of platform
specific information (we would not load brcmstb_gisb.c if we were not on
a platform where it makes sense to use that HW).

>
> (d) Even if you can determine the (SW) origin of an SError by relying on
> IMPLEMENTATION DEFINED details, your handler needs to be intimately
> familiar with the arch in question in order to attempt to recover.
>
> For example, the existing code tries to skip an ARM instruction in
> some cases. For arm64 there are three cases that would need to be
> handled (AArch64 A64, AArch32 A32/ARM, AArch32 T32/Thumb).
>
> Further, it appears to me that the existing code is broken given
> that it doesn't handle Thumb, and given that it's skipping an
> instruction in response to an asynchronous event -- i.e. some
> arbitrary instruction after the one which triggered the abort.

OK, that could presumably be fixed though.

>
> For better or worse, SError *must* be treated as fatal.

I disagree here, since this is a platform specific SError exception that
we can actually handle correctly there is a chance to actually not take
down the system on something that can be made non fatal and informative
at the same time.

>
> As Doug stated:
>
> The main benefit is to help debug user mode code that accidentally
> maps a bad address since we would never make such an egregious error
> in the kernel ;)
>
> This is just one of many ways a userspace application with direct HW
> access can bring down the system. I see no reason to treat it any
> differently, especially given the above points.

Partially disagree, in the absence of a way to specifically deal with
the exception, I would almost agree, but this is not the case here, we
have a piece of HW that can help us locate the problem, display an
informative message, and send a SIGBUS to the faulting application.

Anyway, I won't argue much further than that, but I certainly don't
think taking down an entire system is going to prove itself useful when
you need to deploy such a kernel to hundreds of people who have no clue
whatsoever what their actual problem is in the first place. Taking a
SIGBUS and printing a message can at least allow us to say: read more
carefully, it says exactly what's wrong.
--
Florian

2017-03-25 05:22:24

by Gregory Fong

Subject: Re: [PATCH 4/9] bus: brcmstb_gisb: Use register offsets with writes too

On Fri, Mar 24, 2017 at 7:46 AM, Doug Berger <[email protected]> wrote:
> This commit corrects the bug introduced in commit f80835875d3d
> ("bus: brcmstb_gisb: Look up register offsets in a table") such
> that gisb_write() translates the register enumeration into an
> offset from the base address for writes as well as reads.
>
> Fixes: f80835875d3d ("bus: brcmstb_gisb: Look up register offsets in a table")
> Signed-off-by: Doug Berger <[email protected]>

Acked-by: Gregory Fong <[email protected]>
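
As a rough illustration of the fix described in the commit message, the following user-space model looks up a per-chip offset for both reads and writes. Everything here is an assumption made for the sketch: the enum, the offsets, the `chip_x_offsets` table, the byte array standing in for the register window, and the choice to return 0 for an absent register are not taken from `brcmstb_gisb.c`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register enumeration; values are table indices, not offsets. */
enum gisb_reg { REG_TIMER, REG_ERR_CAP_CLR, REG_ERR_CAP_HI_ADDR, REG_MAX };

struct gisb_dev {
    uint8_t *base;      /* stand-in for the mapped register window */
    const int *offsets; /* per-chip table, -1 means register absent */
};

/* Made-up offsets for an imaginary chip without a HI_ADDR register. */
const int chip_x_offsets[REG_MAX] = {
    [REG_TIMER]           = 0x08,
    [REG_ERR_CAP_CLR]     = 0x10,
    [REG_ERR_CAP_HI_ADDR] = -1,
};

/* Reads translate the enum through the table; absent registers read as 0 here. */
uint32_t gisb_read(const struct gisb_dev *d, enum gisb_reg reg)
{
    int off = d->offsets[reg];
    uint32_t val = 0;

    if (off < 0)
        return 0;
    memcpy(&val, d->base + off, sizeof(val));
    return val;
}

/* Writes must use the same translation instead of the raw enum value. */
void gisb_write(struct gisb_dev *d, uint32_t val, enum gisb_reg reg)
{
    int off = d->offsets[reg];

    if (off < 0)
        return; /* silently ignore registers this chip lacks */
    memcpy(d->base + off, &val, sizeof(val));
}
```

If a write used the enum value itself as the offset, `REG_ERR_CAP_CLR` (index 1) would land at byte 1 of the window rather than at 0x10, which is the kind of mismatch the commit message describes.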

2017-03-25 05:37:04

by Gregory Fong

Subject: Re: [PATCH 6/9] bus: brcmstb_gisb: correct support for 64-bit address output

On Fri, Mar 24, 2017 at 7:46 AM, Doug Berger <[email protected]> wrote:
> The GISB bus can support addresses beyond 32-bits. So this commit
> corrects support for reading a captured 64-bit address into a 64-bit
> variable by obtaining the high bits from the ARB_ERR_CAP_HI_ADDR
> register (when present) and then outputting the full 64-bit value.
>
> It also removes unused definitions.
>
> Fixes: 44127b771d9c ("bus: add Broadcom GISB bus arbiter timeout/error handler")
> Signed-off-by: Doug Berger <[email protected]>
> ---
> drivers/bus/brcmstb_gisb.c | 36 ++++++++++++++++++++----------------
> 1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
> [snip]
> @@ -119,6 +116,16 @@ static u32 gisb_read(struct brcmstb_gisb_arb_device *gdev, int reg)
> return ioread32(gdev->base + offset);
> }
>
> +static u64 gisb_read_address(struct brcmstb_gisb_arb_device *gdev)
> +{
> + u64 value;
> +
> + value = (u64)gisb_read(gdev, ARB_ERR_CAP_ADDR);

Unlike the one on the next line, this cast can be omitted.

> + value |= (u64)gisb_read(gdev, ARB_ERR_CAP_HI_ADDR) << 32;
> +
> + return value;
> +}
> [snip]

Acked-by: Gregory Fong <[email protected]>
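
The remark about the two casts can be seen in a standalone C version of the same combination. `combine_addr()` is a hypothetical helper that takes the two 32-bit register values as plain arguments instead of reading hardware.

```c
#include <stdint.h>

/* Combine low and high 32-bit captures into one 64-bit address. */
uint64_t combine_addr(uint32_t lo, uint32_t hi)
{
    uint64_t value;

    /* No cast needed: assignment widens the unsigned 32-bit value. */
    value = lo;

    /*
     * Cast needed: without it the shift would be performed on a
     * 32-bit operand, and shifting by the full width of the type
     * is undefined behaviour in C.
     */
    value |= (uint64_t)hi << 32;

    return value;
}
```

For example, `combine_addr(0x89abcdef, 0x1)` yields `0x189abcdef`.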

2017-03-25 10:07:16

by Marc Zyngier

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On Fri, Mar 24 2017 at 07:02:05 PM, Florian Fainelli <[email protected]> wrote:
> On 03/24/2017 11:31 AM, Mark Rutland wrote:
>> Hi Florian,
>>
>> On Fri, Mar 24, 2017 at 10:53:48AM -0700, Florian Fainelli wrote:
>>> On 03/24/2017 10:35 AM, Mark Rutland wrote:
>>>> On Fri, Mar 24, 2017 at 09:48:40AM -0700, Doug Berger wrote:
>>>>> On 03/24/2017 08:16 AM, Mark Rutland wrote:
>>>>>> On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
>>>>
>>>>> If you would consider an alternative implementation where we scrap
>>>>> the SError handler (i.e. maintain the ugliness in our downstream
>>>>> kernel) in favor of a more gentle user mode crash on SError that
>>>>> allows the kernel the opportunity to service the interrupt for
>>>>> diagnostic purposes I could try to repackage that.
>>>>
>>>> If this is just for diagnostic purposes, I believe you can register a
>>>> panic notifier, which can then read from the bus. The panic will occur,
>>>> but you'll have the opportunity to log some information to dmesg.
>>>
>>> And crash the kernel? That sounds awful, FWIW the ARM/Linux kernel is
>>> able to recover just fine from user-space accessing e.g: invalid
>>> physical addresses in the GISB register space, bringing the same level
>>> of functionality to ARM64/Linux sounds reasonable to me.
>>
>> I disagree, given that:
>>
>> (a) You cannot determine the (HW) origin of the SError in an
>> architecturally portable way. i.e. when you take an SError, you have
>> no way of determining what asynchronous event caused it.
>>
>> (b) SError is effectively an edge-triggered interrupt for fatal system
>> errors (e.g. it may be triggered in response to ECC errors,
>> corruption detected in caches, etc). Even if you can determine that
>> the GISB triggered *an* SError, this does not tell you that this was
>> the *only* SError.
>
> Correct, which is why Doug's changes allow chaining of handlers.
>
>>
>> If you take an SError, something bad has already happened. Your data
>> may already have been corrupted, and worse, you don't know when or
>> where specifically this occurred (nor how many times).
>
> Sure, but that still allows you to send the correct signal to a faulting
> application (unless I am missing something here).
>
>>
>> (c) You cannot determine the (SW) origin of an SError without relying
>> upon implementation details. This cannot be written in a way that
>> does not rely on microarchitecture, integration, etc, and would need
>> to be updated for every future system with this misfeature.
>
> Which is exactly what is being done here, with the help of platform
> specific information (we would not load brcmstb_gisb.c if we were not on
> a platform where it makes sense to use that HW).
>
>>
>> (d) Even if you can determine the (SW) origin of an SError by relying on
>> IMPLEMENTATION DEFINED details, your handler needs to be intimately
>> familiar with the arch in question in order to attempt to recover.
>>
>> For example, the existing code tries to skip an ARM instruction in
>> some cases. For arm64 there are three cases that would need to be
>> handled (AArch64 A64, AArch32 A32/ARM, AArch32 T32/Thumb).
>>
>> Further, it appears to me that the existing code is broken given
>> that it doesn't handle Thumb, and given that it's skipping an
>> instruction in response to an asynchronous event -- i.e. some
>> arbitrary instruction after the one which triggered the abort.
>
> OK, that could presumably be fixed though.
>
>>
>> For better or worse, SError *must* be treated as fatal.
>
> I disagree here, since this is a platform specific SError exception that
> we can actually handle correctly there is a chance to actually not take
> down the system on something that can be made non fatal and informative
> at the same time.
>
>>
>> As Doug stated:
>>
>> The main benefit is to help debug user mode code that accidentally
>> maps a bad address since we would never make such an egregious error
>> in the kernel ;)
>>
>> This is just one of many ways a userspace application with direct HW
>> access can bring down the system. I see no reason to treat it any
>> differently, especially given the above points.
>
> Partially disagree, in the absence of a way to specifically deal with
> the exception, I would almost agree, but this is not the case here, we
> have a piece of HW that can help us locate the problem, display an
> informative message, and send a SIGBUS to the faulting application.
>
> Anyway, I won't argue much further than that, but I certainly don't
> think taking down an entire system is going to prove itself useful when
> you need to deploy such a kernel to hundreds of people who have no clue
> whatsoever what their actual problem is in the first place. Taking a
> SIGBUS and printing a message can at least allow us to say: read more
> carefully, it says exactly what's wrong.

I think there is one point that hasn't been made in this discussion. You
seem to assume that an SError should be handled the same way on an ARMv8
system as you handle it on your ARMv7 platform.

In most cases, Linux on ARMv7 runs (for better or worse) in secure mode,
making it the only software agent capable of handling an Asynchronous
Abort. On v8, Linux runs in non-secure mode, and relies on secure
firmware for a large set of platform specific services (PM, CPU
bring-up...). Crucially, error handling is one of these services.

It is largely expected that an SError should be first taken to EL3
(SCR_EL3.EA being set), handled there by any platform-specific FW able
to triage and log the error, and could even be reported to EL1NS via
some standard mechanism. If the FW decides to reinject this SError to
the non-secure side, then this is bound to be fatal, because even the FW
couldn't handle it.

So my view is that you should move that kind of error handling to the
place where it actually belongs. It will give you the opportunity to
print out your debug messages if necessary, and leave the SError
handling in the kernel the way it should be: fatal.

Thanks,

M.
--
Jazz is not dead. It just smells funny.

2017-03-27 20:19:26

by Florian Fainelli

Subject: Re: [PATCH 3/9] arm64: mm: install SError abort handler

On 03/25/2017 03:06 AM, Marc Zyngier wrote:
> On Fri, Mar 24 2017 at 07:02:05 PM, Florian Fainelli <[email protected]> wrote:
>> On 03/24/2017 11:31 AM, Mark Rutland wrote:
>>> Hi Florian,
>>>
>>> On Fri, Mar 24, 2017 at 10:53:48AM -0700, Florian Fainelli wrote:
>>>> On 03/24/2017 10:35 AM, Mark Rutland wrote:
>>>>> On Fri, Mar 24, 2017 at 09:48:40AM -0700, Doug Berger wrote:
>>>>>> On 03/24/2017 08:16 AM, Mark Rutland wrote:
>>>>>>> On Fri, Mar 24, 2017 at 07:46:26AM -0700, Doug Berger wrote:
>>>>>
>>>>>> If you would consider an alternative implementation where we scrap
>>>>>> the SError handler (i.e. maintain the ugliness in our downstream
>>>>>> kernel) in favor of a more gentle user mode crash on SError that
>>>>>> allows the kernel the opportunity to service the interrupt for
>>>>>> diagnostic purposes I could try to repackage that.
>>>>>
>>>>> If this is just for diagnostic purposes, I believe you can register a
>>>>> panic notifier, which can then read from the bus. The panic will occur,
>>>>> but you'll have the opportunity to log some information to dmesg.
>>>>
>>>> And crash the kernel? That sounds awful, FWIW the ARM/Linux kernel is
>>>> able to recover just fine from user-space accessing e.g: invalid
>>>> physical addresses in the GISB register space, bringing the same level
>>>> of functionality to ARM64/Linux sounds reasonable to me.
>>>
>>> I disagree, given that:
>>>
>>> (a) You cannot determine the (HW) origin of the SError in an
>>> architecturally portable way. i.e. when you take an SError, you have
>>> no way of determining what asynchronous event caused it.
>>>
>>> (b) SError is effectively an edge-triggered interrupt for fatal system
>>> errors (e.g. it may be triggered in response to ECC errors,
>>> corruption detected in caches, etc). Even if you can determine that
>>> the GISB triggered *an* SError, this does not tell you that this was
>>> the *only* SError.
>>
>> Correct, which is why Doug's changes allow chaining of handlers.
>>
>>>
>>> If you take an SError, something bad has already happened. Your data
>>> may already have been corrupted, and worse, you don't know when or
>>> where specifically this occurred (nor how many times).
>>
>> Sure, but that still allows you to send the correct signal to a faulting
>> application (unless I am missing something here).
>>
>>>
>>> (c) You cannot determine the (SW) origin of an SError without relying
>>> upon implementation details. This cannot be written in a way that
>>> does not rely on microarchitecture, integration, etc, and would need
>>> to be updated for every future system with this misfeature.
>>
>> Which is exactly what is being done here, with the help of platform
>> specific information (we would not load brcmstb_gisb.c if we were not on
>> a platform where it makes sense to use that HW).
>>
>>>
>>> (d) Even if you can determine the (SW) origin of an SError by relying on
>>> IMPLEMENTATION DEFINED details, your handler needs to be intimately
>>> familiar with the arch in question in order to attempt to recover.
>>>
>>> For example, the existing code tries to skip an ARM instruction in
>>> some cases. For arm64 there are three cases that would need to be
>>> handled (AArch64 A64, AArch32 A32/ARM, AArch32 T32/Thumb).
>>>
>>> Further, it appears to me that the existing code is broken given
>>> that it doesn't handle Thumb, and given that it's skipping an
>>> instruction in response to an asynchronous event -- i.e. some
>>> arbitrary instruction after the one which triggered the abort.
>>
>> OK, that could presumably be fixed though.
>>
>>>
>>> For better or worse, SError *must* be treated as fatal.
>>
>> I disagree here, since this is a platform specific SError exception that
>> we can actually handle correctly there is a chance to actually not take
>> down the system on something that can be made non fatal and informative
>> at the same time.
>>
>>>
>>> As Doug stated:
>>>
>>> The main benefit is to help debug user mode code that accidentally
>>> maps a bad address since we would never make such an egregious error
>>> in the kernel ;)
>>>
>>> This is just one of many ways a userspace application with direct HW
>>> access can bring down the system. I see no reason to treat it any
>>> differently, especially given the above points.
>>
>> Partially disagree, in the absence of a way to specifically deal with
>> the exception, I would almost agree, but this is not the case here, we
>> have a piece of HW that can help us locate the problem, display an
>> informative message, and send a SIGBUS to the faulting application.
>>
>> Anyway, I won't argue much further than that, but I certainly don't
>> think taking down an entire system is going to prove itself useful when
>> you need to deploy such a kernel to hundreds of people who have no clue
>> whatsoever what their actual problem is in the first place. Taking a
>> SIGBUS and printing a message can at least allow us to say: read more
>> carefully, it says exactly what's wrong.
>
> I think there is one point that hasn't been made in this discussion. You
> seem to assume that an SError should be handled the same way on an ARMv8
> system as you handle it on your ARMv7 platform.

Correct, that is absolutely a conscious (or not) assumption about the
platforms being discussed here, and therefore the proposed patches.

>
> In most cases, Linux on ARMv7 runs (for better or worse) in secure mode,
> making it the only software agent capable of handling an Asynchronous
> Abort. On v8, Linux runs in non-secure mode, and relies on secure
> firmware for a large set of platform specific services (PM, CPU
> bring-up...). Crucially, error handling is one of these services.
>
> It is largely expected that an SError should be first taken to EL3
> (SCR_EL3.EA being set), handled there by any platform-specific FW able
> to triage and log the error, and could even be reported to EL1NS via
> some standard mechanism. If the FW decides to reinject this SError to
> the non-secure side, then this is bound to be fatal, because even the FW
> couldn't handle it.
>
> So my view is that you should move that kind of error handling to the
> place where it actually belongs. It will give you the opportunity to
> print out your debug messages if necessary, and leave the SError
> handling in the kernel the way it should be: fatal.

Your point of view is absolutely valid, but does not necessarily match
the reality of things as implemented in real life because:

- the trusted firmware on some platforms is something that is subject to
a different process than a kernel or user-space upgrade and/or
security/certification process, and may be challenging to update in the
future, so people design it with everything they will ever need: PSCI
v0.2 services and that's it

- there is what ARM Ltd. provides as guidelines about how a platform
should be done (read: must, should?), and there is how implementors end
up making (most often uneducated, biased and under time pressure) design
decisions about how a platform will be designed, and prototype is as
close as you could get to a product in the embedded space

That being said, it is definitively something that should be done, so
we'll make sure this gets funneled back to the appropriate people so
they can think about adding SError handling where it should be done.

Thanks for the feedback!
--
Florian