2021-04-22 00:45:23

by Marc Zyngier

Subject: [PATCH 0/2] arm64: ACPI GTDT watchdog fixes

Dann recently reported that his ThunderX machine failed to boot since
64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
interrupts"), with a not so pretty crash while trying to send an IPI.

It turned out to be caused by a mix of broken firmware and a buggy
GTDT watchdog driver. Both have forever been buggy, but the above
commit revealed that the error handling path of the driver was
probably the worst part of it all.

Anyway, this short series has two goals:
- handle broken firmware in a less broken way
- make sure that the root cause of the problem can be identified
quickly

Thanks,

M.

Marc Zyngier (2):
ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
ACPI: irq: Prevent unregistering of GIC SGIs

drivers/acpi/arm64/gtdt.c | 10 ++++++----
drivers/acpi/irq.c | 6 +++++-
2 files changed, 11 insertions(+), 5 deletions(-)

--
2.29.2


2021-04-22 00:45:47

by Marc Zyngier

Subject: [PATCH 2/2] ACPI: irq: Prevent unregistering of GIC SGIs

When using ACPI on arm64, which implies the GIC IRQ model, no
table should ever provide a GSI number in the range [0:15],
as these are reserved for IPIs.

However, drivers tend to call acpi_unregister_gsi() with any
random GSI number provided by half baked tables, which results
in an exploding kernel when its IPIs have been unconfigured.

In order to catch this, check for the silly case early, warn
that something is going wrong and avoid the above disaster.

Signed-off-by: Marc Zyngier <[email protected]>
---
drivers/acpi/irq.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/irq.c b/drivers/acpi/irq.c
index e209081d644b..c68e694fca26 100644
--- a/drivers/acpi/irq.c
+++ b/drivers/acpi/irq.c
@@ -75,8 +75,12 @@ void acpi_unregister_gsi(u32 gsi)
 {
 	struct irq_domain *d = irq_find_matching_fwnode(acpi_gsi_domain_id,
 							DOMAIN_BUS_ANY);
-	int irq = irq_find_mapping(d, gsi);
+	int irq;

+	if (WARN_ON(acpi_irq_model == ACPI_IRQ_MODEL_GIC && gsi < 16))
+		return;
+
+	irq = irq_find_mapping(d, gsi);
 	irq_dispose_mapping(irq);
 }
 EXPORT_SYMBOL_GPL(acpi_unregister_gsi);
--
2.29.2
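
For readers who want to see the effect of the guard outside the kernel, here is a minimal user-space sketch of the same check. Everything in it is a hypothetical stand-in: `model_unregister_gsi()`, the `mapped[]` table, the `warnings` counter and the table size are illustrative assumptions, not kernel APIs. Only the condition (GIC IRQ model and GSI below 16) mirrors the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
#define NR_SGIS 16u	/* GIC SGIs use interrupt IDs 0..15 and back the IPIs */
#define NR_GSIS 64u	/* size of the toy mapping table, chosen arbitrarily */

enum toy_irq_model { TOY_MODEL_OTHER, TOY_MODEL_GIC };

static enum toy_irq_model model = TOY_MODEL_GIC;
static bool mapped[NR_GSIS];
static unsigned int warnings;

/* Start with the SGIs mapped, as they would be once IPIs are configured. */
static void model_reset(void)
{
	for (unsigned int i = 0; i < NR_GSIS; i++)
		mapped[i] = i < NR_SGIS;
	warnings = 0;
}

/*
 * Returns true if a mapping was disposed of, false if the request was
 * refused or there was nothing to dispose of.
 */
static bool model_unregister_gsi(uint32_t gsi)
{
	bool was_mapped;

	assert(gsi < NR_GSIS);

	/* Same condition as the patch: never tear down an SGI under the GIC model. */
	if (model == TOY_MODEL_GIC && gsi < NR_SGIS) {
		warnings++;
		fprintf(stderr, "WARN: refusing to unregister SGI %u\n",
			(unsigned int)gsi);
		return false;
	}

	was_mapped = mapped[gsi];
	mapped[gsi] = false;
	return was_mapped;
}
```

In this model, a call with a GSI in the SGI range leaves the mapping untouched and only bumps the warning counter, while any other GSI is unmapped as before.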

2021-04-22 01:05:37

by Sudeep Holla

Subject: Re: [PATCH 2/2] ACPI: irq: Prevent unregistering of GIC SGIs

On Wed, Apr 21, 2021 at 05:43:17PM +0100, Marc Zyngier wrote:
> When using ACPI on arm64, which implies the GIC IRQ model, no
> table should ever provide a GSI number in the range [0:15],
> as these are reserved for IPIs.
>
> However, drivers tend to call acpi_unregister_gsi() with any
> random GSI number provided by half baked tables, which results
> in an exploding kernel when its IPIs have been unconfigured.
>
> In order to catch this, check for the silly case early, warn
> that something is going wrong and avoid the above disaster.
>

Reviewed-by: Sudeep Holla <[email protected]>

Just curious: is this just a precaution, or do we have a platform doing
something stupid like this?

--
Regards,
Sudeep

2021-04-22 01:15:18

by Marc Zyngier

Subject: Re: [PATCH 2/2] ACPI: irq: Prevent unregistering of GIC SGIs

On Wed, 21 Apr 2021 18:15:16 +0100,
Sudeep Holla <[email protected]> wrote:
>
> On Wed, Apr 21, 2021 at 05:43:17PM +0100, Marc Zyngier wrote:
> > When using ACPI on arm64, which implies the GIC IRQ model, no
> > table should ever provide a GSI number in the range [0:15],
> > as these are reserved for IPIs.
> >
> > However, drivers tend to call acpi_unregister_gsi() with any
> > random GSI number provided by half baked tables, which results
> > in an exploding kernel when its IPIs have been unconfigured.
> >
> > In order to catch this, check for the silly case early, warn
> > that something is going wrong and avoid the above disaster.
> >
>
> Reviewed-by: Sudeep Holla <[email protected]>
>
> Just curious: is this just a precaution, or do we have a platform doing
> something stupid like this?

Without this, it could be really hard to pinpoint which driver messes
with IPIs. Having this in place would have caught the GTDT bug much
earlier (several years ago actually).

The only reason I managed to track it down in a short amount of time
is that the driver actually printed an error message before the kernel
exploded while probing a completely unrelated driver. Without this
message, I'd still be scratching my head.

The WARN_ON() would definitely point at the guilty party, and keep the
kernel running.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-04-22 01:15:24

by dann frazier

Subject: Re: [PATCH 0/2] arm64: ACPI GTDT watchdog fixes

On Wed, Apr 21, 2021 at 05:43:15PM +0100, Marc Zyngier wrote:
> Dann recently reported that his ThunderX machine failed to boot since
> 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
> interrupts"), with a not so pretty crash while trying to send an IPI.
>
> It turned out to be caused by a mix of broken firmware and a buggy
> GTDT watchdog driver. Both have forever been buggy, but the above
> commit revealed that the error handling path of the driver was
> probably the worst part of it all.
>
> Anyway, this short series has two goals:
> - handle broken firmware in a less broken way
> - make sure that the root cause of the problem can be identified
> quickly
>
> Thanks,
>
> M.
>
> Marc Zyngier (2):
> ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
> ACPI: irq: Prevent unregistering of GIC SGIs
>
> drivers/acpi/arm64/gtdt.c | 10 ++++++----
> drivers/acpi/irq.c | 6 +++++-
> 2 files changed, 11 insertions(+), 5 deletions(-)

For the series:

Tested-by: dann frazier <[email protected]>

2021-04-22 01:48:27

by Marc Zyngier

Subject: [PATCH 1/2] ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure

When failing the driver probe because of invalid firmware properties,
the GTDT driver unmaps the interrupt that it mapped earlier.

However, it never checks whether the mapping of the interrupt actually
succeeded. Even more, should the firmware report an illegal interrupt
number that overlaps with the GIC SGI range, this can result in an
IPI being unmapped, and subsequent fireworks (as reported by Dann
Frazier).

Rework the driver to have a slightly saner behaviour and actually
check whether the interrupt has been mapped before unmapping things.

Reported-by: dann frazier <[email protected]>
Fixes: ca9ae5ec4ef0 ("acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver")
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Cc: Fu Wei <[email protected]>
---
drivers/acpi/arm64/gtdt.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/arm64/gtdt.c b/drivers/acpi/arm64/gtdt.c
index f2d0e5915dab..0a0a982f9c28 100644
--- a/drivers/acpi/arm64/gtdt.c
+++ b/drivers/acpi/arm64/gtdt.c
@@ -329,7 +329,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 					int index)
 {
 	struct platform_device *pdev;
-	int irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	int irq;

 	/*
 	 * According to SBSA specification the size of refresh and control
@@ -338,7 +338,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	struct resource res[] = {
 		DEFINE_RES_MEM(wd->control_frame_address, SZ_4K),
 		DEFINE_RES_MEM(wd->refresh_frame_address, SZ_4K),
-		DEFINE_RES_IRQ(irq),
+		{},
 	};
 	int nr_res = ARRAY_SIZE(res);

@@ -348,10 +348,11 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,

 	if (!(wd->refresh_frame_address && wd->control_frame_address)) {
 		pr_err(FW_BUG "failed to get the Watchdog base address.\n");
-		acpi_unregister_gsi(wd->timer_interrupt);
 		return -EINVAL;
 	}

+	irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	res[2] = (struct resource)DEFINE_RES_IRQ(irq);
 	if (irq <= 0) {
 		pr_warn("failed to map the Watchdog interrupt.\n");
 		nr_res--;
@@ -364,7 +365,8 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	 */
 	pdev = platform_device_register_simple("sbsa-gwdt", index, res, nr_res);
 	if (IS_ERR(pdev)) {
-		acpi_unregister_gsi(wd->timer_interrupt);
+		if (irq > 0)
+			acpi_unregister_gsi(wd->timer_interrupt);
 		return PTR_ERR(pdev);
 	}

--
2.29.2
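
The reordering in this patch follows a common error-handling rule: acquire a resource only after the cheap validation has passed, and release it only if it was actually acquired. The following user-space sketch illustrates that rule under stated assumptions. All `toy_*` names, the `unregister_calls` counter and the `toy_register_fails` switch are hypothetical, and the toy mapper is simply assumed to fail for GSIs below 16; the real behaviour of `map_gt_gsi()` for such numbers is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a firmware watchdog description. */
struct toy_watchdog {
	uint64_t control_frame;
	uint64_t refresh_frame;
	uint32_t gsi;
};

static int unregister_calls;	/* counts how often a mapping is torn down */
static bool toy_register_fails;	/* forces the device registration to fail */

/* Assumption for this sketch only: mapping fails for GSIs in the SGI range. */
static int toy_map_gsi(uint32_t gsi)
{
	return gsi < 16 ? 0 : (int)gsi + 100;
}

static void toy_unregister_gsi(uint32_t gsi)
{
	(void)gsi;
	unregister_calls++;
}

static int toy_import_watchdog(const struct toy_watchdog *wd)
{
	int irq;

	/* Validate the firmware data first: nothing is mapped yet, nothing to undo. */
	if (!(wd->refresh_frame && wd->control_frame))
		return -EINVAL;

	/* Map the interrupt only once the addresses are known to be usable. */
	irq = toy_map_gsi(wd->gsi);

	if (toy_register_fails) {
		/* Undo the mapping only if it actually succeeded. */
		if (irq > 0)
			toy_unregister_gsi(wd->gsi);
		return -ENODEV;
	}

	return 0;
}
```

With this ordering, a description with a missing frame address never reaches the unmap path, and a failed mapping is never "undone", which is the property the patch restores in the driver.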

2021-04-22 13:43:50

by Hanjun Guo

Subject: Re: [PATCH 0/2] arm64: ACPI GTDT watchdog fixes

On 2021/4/22 0:43, Marc Zyngier wrote:
> Dann recently reported that his ThunderX machine failed to boot since
> 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
> interrupts"), with a not so pretty crash while trying to send an IPI.
>
> It turned out to be caused by a mix of broken firmware and a buggy
> GTDT watchdog driver. Both have forever been buggy, but the above
> commit revealed that the error handling path of the driver was
> probably the worst part of it all.
>
> Anyway, this short series has two goals:
> - handle broken firmware in a less broken way
> - make sure that the root cause of the problem can be identified
> quickly

Tested on a Kunpeng920 ARM64 server; I didn't see any issues after
applying this patch set.

Tested-by: Hanjun Guo <[email protected]>
Reviewed-by: Hanjun Guo <[email protected]>

Thanks
Hanjun

2021-04-22 14:27:34

by Lorenzo Pieralisi

Subject: Re: [PATCH 0/2] arm64: ACPI GTDT watchdog fixes

On Wed, Apr 21, 2021 at 05:43:15PM +0100, Marc Zyngier wrote:
> Dann recently reported that his ThunderX machine failed to boot since
> 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
> interrupts"), with a not so pretty crash while trying to send an IPI.
>
> It turned out to be caused by a mix of broken firmware and a buggy
> GTDT watchdog driver. Both have forever been buggy, but the above
> commit revealed that the error handling path of the driver was
> probably the worst part of it all.
>
> Anyway, this short series has two goals:
> - handle broken firmware in a less broken way
> - make sure that the root cause of the problem can be identified
> quickly
>
> Thanks,
>
> M.
>
> Marc Zyngier (2):
> ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
> ACPI: irq: Prevent unregistering of GIC SGIs
>
> drivers/acpi/arm64/gtdt.c | 10 ++++++----
> drivers/acpi/irq.c | 6 +++++-
> 2 files changed, 11 insertions(+), 5 deletions(-)

Patch(2) needs an ACK from Rafael - usually these patches go via
the ARM64 tree but I don't think it is compulsory for this series.

Thank you !

Reviewed-by: Lorenzo Pieralisi <[email protected]>

2021-04-23 13:47:41

by Catalin Marinas

Subject: Re: [PATCH 0/2] arm64: ACPI GTDT watchdog fixes

On Thu, Apr 22, 2021 at 03:23:42PM +0100, Lorenzo Pieralisi wrote:
> On Wed, Apr 21, 2021 at 05:43:15PM +0100, Marc Zyngier wrote:
> > Dann recently reported that his ThunderX machine failed to boot since
> > 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
> > interrupts"), with a not so pretty crash while trying to send an IPI.
> >
> > It turned out to be caused by a mix of broken firmware and a buggy
> > GTDT watchdog driver. Both have forever been buggy, but the above
> > commit revealed that the error handling path of the driver was
> > probably the worst part of it all.
> >
> > Anyway, this short series has two goals:
> > - handle broken firmware in a less broken way
> > - make sure that the root cause of the problem can be identified
> > quickly
> >
> > Thanks,
> >
> > M.
> >
> > Marc Zyngier (2):
> > ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
> > ACPI: irq: Prevent unregistering of GIC SGIs
> >
> > drivers/acpi/arm64/gtdt.c | 10 ++++++----
> > drivers/acpi/irq.c | 6 +++++-
> > 2 files changed, 11 insertions(+), 5 deletions(-)
>
> Patch(2) needs an ACK from Rafael - usually these patches go via
> the ARM64 tree but I don't think it is compulsory for this series.
>
> Thank you !
>
> Reviewed-by: Lorenzo Pieralisi <[email protected]>

Thanks Lorenzo.

Rafael, if there are no objections, I'll take these two patches in the
arm64 tree.

--
Catalin

2021-04-23 17:14:56

by Catalin Marinas

Subject: Re: [PATCH 0/2] arm64: ACPI GTDT watchdog fixes

On Wed, 21 Apr 2021 17:43:15 +0100, Marc Zyngier wrote:
> Dann recently reported that his ThunderX machine failed to boot since
> 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
> interrupts"), with a not so pretty crash while trying to send an IPI.
>
> It turned out to be caused by a mix of broken firmware and a buggy
> GTDT watchdog driver. Both have forever been buggy, but the above
> commit revealed that the error handling path of the driver was
> probably the worst part of it all.
>
> [...]

Applied to arm64 (for-next/core), thanks!

[1/2] ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
https://git.kernel.org/arm64/c/1ecd5b129252
[2/2] ACPI: irq: Prevent unregistering of GIC SGIs
https://git.kernel.org/arm64/c/2a20b08f06e7

--
Catalin