Jay Chen reported that using a kdump kernel on a GICv4.1 system
results in a RAS error being delivered when the secondary kernel
configures the ITS's view of the new VPE table.

As it turns out, that's because each RD still has a pointer to
the previous instance of the VPE table, and that particular
implementation is very upset by seeing two bits of the HW that
should point to the same table with different values.

To solve this, let's invalidate any reference that any RD has to
the VPE table when discovering the RDs. The ITS can then be
programmed as expected.
Reported-by: Jay Chen <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Cc: Lorenzo Pieralisi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/irqchip/irq-gic-v3.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index daec3309b014..86397522e786 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -920,6 +920,22 @@ static int __gic_update_rdist_properties(struct redist_region *region,
 {
 	u64 typer = gic_read_typer(ptr + GICR_TYPER);
 
+	/* Boot-time cleanup */
+	if ((typer & GICR_TYPER_VLPIS) && (typer & GICR_TYPER_RVPEID)) {
+		u64 val;
+
+		/* Deactivate any present vPE */
+		val = gicr_read_vpendbaser(ptr + SZ_128K + GICR_VPENDBASER);
+		if (val & GICR_VPENDBASER_Valid)
+			gicr_write_vpendbaser(GICR_VPENDBASER_PendingLast,
+					      ptr + SZ_128K + GICR_VPENDBASER);
+
+		/* Mark the VPE table as invalid */
+		val = gicr_read_vpropbaser(ptr + SZ_128K + GICR_VPROPBASER);
+		val &= ~GICR_VPROPBASER_4_1_VALID;
+		gicr_write_vpropbaser(val, ptr + SZ_128K + GICR_VPROPBASER);
+	}
+
 	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
 
 	/* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */
--
2.30.2
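
For anyone who wants to poke at the logic outside the kernel, the
cleanup in this patch can be modelled in user space. This is only a
minimal sketch under stated assumptions: struct mock_rd, the MOCK_*
constants (including their bit positions) and mock_rd_boot_cleanup()
are hypothetical names made up for the illustration, and plain struct
fields stand in for the MMIO accessors used by the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions for this mock only (assumptions). */
#define MOCK_TYPER_VLPIS		(1ULL << 1)
#define MOCK_TYPER_RVPEID		(1ULL << 7)
#define MOCK_VPENDBASER_VALID		(1ULL << 63)
#define MOCK_VPENDBASER_PENDINGLAST	(1ULL << 61)
#define MOCK_VPROPBASER_VALID		(1ULL << 63)

/* Hypothetical stand-in for one redistributor's registers. */
struct mock_rd {
	uint64_t typer;
	uint64_t vpendbaser;
	uint64_t vpropbaser;
};

/*
 * Mirror the shape of the boot-time cleanup: only touch RDs that
 * advertise both VLPIS and RVPEID, deactivate a resident vPE, then
 * clear the valid bit of the VPE table pointer.
 * Returns true if the RD was cleaned, false if it was skipped.
 */
static bool mock_rd_boot_cleanup(struct mock_rd *rd)
{
	assert(rd != NULL);

	if (!(rd->typer & MOCK_TYPER_VLPIS) || !(rd->typer & MOCK_TYPER_RVPEID))
		return false;

	/* Deactivate any present vPE */
	if (rd->vpendbaser & MOCK_VPENDBASER_VALID)
		rd->vpendbaser = MOCK_VPENDBASER_PENDINGLAST;

	/* Mark the VPE table as invalid, keeping the other fields */
	rd->vpropbaser &= ~MOCK_VPROPBASER_VALID;
	return true;
}
```

In the model, an RD that lacks either capability bit is left untouched,
which corresponds to the condition guarding the new block in the patch.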
On Thu, Dec 16, 2021 at 02:48:04PM +0000, Marc Zyngier wrote:
> Jay Chen reported that using a kdump kernel on a GICv4.1 system
> results in a RAS error being delivered when the secondary kernel
> configures the ITS's view of the new VPE table.
>
> As it turns out, that's because each RD still has a pointer to
> the previous instance of the VPE table, and that particular
> implementation is very upset by seeing two bits of the HW that
> should point to the same table with different values.
>
> To solve this, let's invalidate any reference that any RD has to
> the VPE table when discovering the RDs. The ITS can then be
> programmed as expected.
It makes sense. I believe there is an additional question though,
related to ITSes sharing the VPE table (SVPET) with RDs.

IIUC, all ITSes within a given affinity (that therefore are sharing the
VPE table) need to be quiesced before allocating a new VPE table.

Again, I am off the radar for a while and this patch makes sense on its
own, just raising the question since I was trying to understand whether
that can be an additional issue to solve on kexec; I will follow up
on this query.

It would be nice to know Alibaba's GIC HW topology if possible.

Thanks for putting together the fix and merging it.

Lorenzo

> Reported-by: Jay Chen <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Cc: Lorenzo Pieralisi <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> drivers/irqchip/irq-gic-v3.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index daec3309b014..86397522e786 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -920,6 +920,22 @@ static int __gic_update_rdist_properties(struct redist_region *region,
>  {
>  	u64 typer = gic_read_typer(ptr + GICR_TYPER);
>
> +	/* Boot-time cleanup */
> +	if ((typer & GICR_TYPER_VLPIS) && (typer & GICR_TYPER_RVPEID)) {
> +		u64 val;
> +
> +		/* Deactivate any present vPE */
> +		val = gicr_read_vpendbaser(ptr + SZ_128K + GICR_VPENDBASER);
> +		if (val & GICR_VPENDBASER_Valid)
> +			gicr_write_vpendbaser(GICR_VPENDBASER_PendingLast,
> +					      ptr + SZ_128K + GICR_VPENDBASER);
> +
> +		/* Mark the VPE table as invalid */
> +		val = gicr_read_vpropbaser(ptr + SZ_128K + GICR_VPROPBASER);
> +		val &= ~GICR_VPROPBASER_4_1_VALID;
> +		gicr_write_vpropbaser(val, ptr + SZ_128K + GICR_VPROPBASER);
> +	}
> +
>  	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
>
>  	/* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */
> --
> 2.30.2
>
On Thu, 16 Dec 2021 19:03:15 +0000,
Lorenzo Pieralisi <[email protected]> wrote:
>
> On Thu, Dec 16, 2021 at 02:48:04PM +0000, Marc Zyngier wrote:
> > Jay Chen reported that using a kdump kernel on a GICv4.1 system
> > results in a RAS error being delivered when the secondary kernel
> > configures the ITS's view of the new VPE table.
> >
> > As it turns out, that's because each RD still has a pointer to
> > the previous instance of the VPE table, and that particular
> > implementation is very upset by seeing two bits of the HW that
> > should point to the same table with different values.
> >
> > To solve this, let's invalidate any reference that any RD has to
> > the VPE table when discovering the RDs. The ITS can then be
> > programmed as expected.
>
> It makes sense. I believe there is an additional question though,
> related to ITSes sharing the VPE table (SVPET) with RDs.
>
> IIUC, all ITSes within a given affinity (that therefore are sharing the
> VPE table) need to be quiesced before allocating a new VPE table.
Yes, there is that too. I think we need a first pass iterating over
the ITSs and invalidating their VPE table pointers, as they may well be
in a shared state. If they are, the ITSs would be liable to generate
RAS errors as well, just like we saw when sharing the table
between ITS and RDs.
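
To make the ordering argument concrete, here is a small user-space
model of it. It is a sketch under assumptions, not driver code:
struct mock_its and the mock_* helpers are hypothetical, and "group"
simply stands for a set of ITSs that share one VPE table. The checker
flags the state where two ITSs of the same group hold valid pointers
to different tables, which is the situation described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one ITS: not a kernel structure. */
struct mock_its {
	int group;		/* ITSs in the same group share one VPE table */
	bool valid;		/* does this ITS hold a live table pointer? */
	uint64_t vpe_table;	/* where it believes the table lives */
};

/* True when no two valid ITSs of the same group disagree on the table. */
static bool mock_groups_consistent(const struct mock_its *its, size_t n)
{
	assert(its != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			if (its[i].group == its[j].group &&
			    its[i].valid && its[j].valid &&
			    its[i].vpe_table != its[j].vpe_table)
				return false;
		}
	}
	return true;
}

/* First pass: drop every pointer before anything is reprogrammed. */
static void mock_invalidate_all(struct mock_its *its, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		its[i].valid = false;
		its[i].vpe_table = 0;
	}
}

/* Second-pass step: point a single ITS at the new table. */
static void mock_program(struct mock_its *its, uint64_t table)
{
	its->vpe_table = table;
	its->valid = true;
}
```

Reprogramming one ITS while its peer still holds the stale pointer
makes the checker fail; running mock_invalidate_all() first keeps every
intermediate state consistent in this model.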
> Again, I am off the radar for a while and this patch makes sense on its
> own, just raising the question since I was trying to understand whether
> that can be an additional issue to solve on kexec; I will follow up
> on this query.
Yeah, please ping me in the new year if you don't hear from me, and
we'll fix that one too.
> It would be nice to know Alibaba's GIC HW topology if possible.
Indeed.
> Thanks for putting together the fix and merging it.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
The following commit has been merged into the irq/irqchip-fixes branch of irqchip:
Commit-ID: c733ebb7cb67dfb146a07c0ae329a0de9ec52f36
Gitweb: https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/c733ebb7cb67dfb146a07c0ae329a0de9ec52f36
Author: Marc Zyngier <[email protected]>
AuthorDate: Mon, 24 Jan 2022 13:38:09
Committer: Marc Zyngier <[email protected]>
CommitterDate: Wed, 26 Jan 2022 11:10:28
irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
A recent bug report outlined that the way GICv4.1 is handled across
kexec is pretty bad. We can end up in a situation where ITSs share
memory (this is the case when SVPET==1) and reprogram the base
registers, creating a situation where ITSs that are part of a given
affinity group see different pointers. Which is illegal. Boo.
In order to restore some sanity, reset the BASERn registers to 0
*before* probing any ITS. Although this isn't optimised at all,
this is only a once-per-boot cost, which shouldn't show up on
anyone's radar.
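
The control flow described above (reset everything first, bail out on
the first failure, only then probe) can be sketched in user space as
follows. This is an illustration under assumptions rather than the
driver's code: struct mock_its_bank, MOCK_BASER_NR_REGS and the mock_*
functions are hypothetical, and the "reachable" flag stands in for
mapping and quiescing the ITS successfully.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_BASER_NR_REGS 8	/* illustrative register count for the model */

/* Hypothetical stand-in for one ITS and its BASERn registers. */
struct mock_its_bank {
	bool reachable;		/* models map + quiesce succeeding */
	bool probed;
	uint64_t baser[MOCK_BASER_NR_REGS];
};

/* Zero every BASERn of one ITS, or report why that was not possible. */
static int mock_reset_one(struct mock_its_bank *its)
{
	assert(its != NULL);

	if (!its->reachable)
		return -ENODEV;

	for (size_t i = 0; i < MOCK_BASER_NR_REGS; i++)
		its->baser[i] = 0;

	return 0;
}

/*
 * Pass 1 resets all ITSs and stops at the first failure.
 * Pass 2 (the probe, reduced to a flag here) only runs if pass 1
 * succeeded for every ITS.
 */
static int mock_probe_all(struct mock_its_bank *its, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = mock_reset_one(&its[i]);

		if (err)
			return err;
	}

	for (size_t i = 0; i < n; i++)
		its[i].probed = true;

	return 0;
}
```

Note that in this model a failing ITS leaves the earlier ones reset but
unprobed, which matches the "don't even try to go any further" intent
stated in the patch comments.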
Cc: Jay Chen <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Lorenzo Pieralisi <[email protected]>
Link: https://lore.kernel.org/r/20211216190315.GA14220@lpieralisi
Link: https://lore.kernel.org/r/[email protected]
---
drivers/irqchip/irq-gic-v3-its.c | 120 ++++++++++++++++++++++++------
1 file changed, 99 insertions(+), 21 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7b8f1ec..220fa45 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -4856,6 +4856,38 @@ static struct syscore_ops its_syscore_ops = {
 	.resume = its_restore_enable,
 };
 
+static void __init __iomem *its_map_one(struct resource *res, int *err)
+{
+	void __iomem *its_base;
+	u32 val;
+
+	its_base = ioremap(res->start, SZ_64K);
+	if (!its_base) {
+		pr_warn("ITS@%pa: Unable to map ITS registers\n", &res->start);
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	val = readl_relaxed(its_base + GITS_PIDR2) & GIC_PIDR2_ARCH_MASK;
+	if (val != 0x30 && val != 0x40) {
+		pr_warn("ITS@%pa: No ITS detected, giving up\n", &res->start);
+		*err = -ENODEV;
+		goto out_unmap;
+	}
+
+	*err = its_force_quiescent(its_base);
+	if (*err) {
+		pr_warn("ITS@%pa: Failed to quiesce, giving up\n", &res->start);
+		goto out_unmap;
+	}
+
+	return its_base;
+
+out_unmap:
+	iounmap(its_base);
+	return NULL;
+}
+
 static int its_init_domain(struct fwnode_handle *handle, struct its_node *its)
 {
 	struct irq_domain *inner_domain;
@@ -4963,29 +4995,14 @@ static int __init its_probe_one(struct resource *res,
 {
 	struct its_node *its;
 	void __iomem *its_base;
-	u32 val, ctlr;
 	u64 baser, tmp, typer;
 	struct page *page;
+	u32 ctlr;
 	int err;
 
-	its_base = ioremap(res->start, SZ_64K);
-	if (!its_base) {
-		pr_warn("ITS@%pa: Unable to map ITS registers\n", &res->start);
-		return -ENOMEM;
-	}
-
-	val = readl_relaxed(its_base + GITS_PIDR2) & GIC_PIDR2_ARCH_MASK;
-	if (val != 0x30 && val != 0x40) {
-		pr_warn("ITS@%pa: No ITS detected, giving up\n", &res->start);
-		err = -ENODEV;
-		goto out_unmap;
-	}
-
-	err = its_force_quiescent(its_base);
-	if (err) {
-		pr_warn("ITS@%pa: Failed to quiesce, giving up\n", &res->start);
-		goto out_unmap;
-	}
+	its_base = its_map_one(res, &err);
+	if (!its_base)
+		return err;
 
 	pr_info("ITS %pR\n", res);
 
@@ -5249,6 +5266,23 @@ out:
 	return ret;
 }
 
+/* Mark all the BASER registers as invalid before they get reprogrammed */
+static int __init its_reset_one(struct resource *res)
+{
+	void __iomem *its_base;
+	int err, i;
+
+	its_base = its_map_one(res, &err);
+	if (!its_base)
+		return err;
+
+	for (i = 0; i < GITS_BASER_NR_REGS; i++)
+		gits_write_baser(0, its_base + GITS_BASER + (i << 3));
+
+	iounmap(its_base);
+	return 0;
+}
+
 static const struct of_device_id its_device_id[] = {
 	{ .compatible = "arm,gic-v3-its", },
 	{},
@@ -5259,6 +5293,26 @@ static int __init its_of_probe(struct device_node *node)
 	struct device_node *np;
 	struct resource res;
 
+	/*
+	 * Make sure *all* the ITS are reset before we probe any, as
+	 * they may be sharing memory. If any of the ITS fails to
+	 * reset, don't even try to go any further, as this could
+	 * result in something even worse.
+	 */
+	for (np = of_find_matching_node(node, its_device_id); np;
+	     np = of_find_matching_node(np, its_device_id)) {
+		int err;
+
+		if (!of_device_is_available(np) ||
+		    !of_property_read_bool(np, "msi-controller") ||
+		    of_address_to_resource(np, 0, &res))
+			continue;
+
+		err = its_reset_one(&res);
+		if (err)
+			return err;
+	}
+
 	for (np = of_find_matching_node(node, its_device_id); np;
 	     np = of_find_matching_node(np, its_device_id)) {
 		if (!of_device_is_available(np))
@@ -5421,11 +5475,35 @@ dom_err:
 	return err;
 }
 
+static int __init its_acpi_reset(union acpi_subtable_headers *header,
+				 const unsigned long end)
+{
+	struct acpi_madt_generic_translator *its_entry;
+	struct resource res;
+
+	its_entry = (struct acpi_madt_generic_translator *)header;
+	res = (struct resource) {
+		.start = its_entry->base_address,
+		.end = its_entry->base_address + ACPI_GICV3_ITS_MEM_SIZE - 1,
+		.flags = IORESOURCE_MEM,
+	};
+
+	return its_reset_one(&res);
+}
+
 static void __init its_acpi_probe(void)
 {
 	acpi_table_parse_srat_its();
-	acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR,
-			      gic_acpi_parse_madt_its, 0);
+	/*
+	 * Make sure *all* the ITS are reset before we probe any, as
+	 * they may be sharing memory. If any of the ITS fails to
+	 * reset, don't even try to go any further, as this could
+	 * result in something even worse.
+	 */
+	if (acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR,
+				  its_acpi_reset, 0) > 0)
+		acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR,
+				      gic_acpi_parse_madt_its, 0);
 	acpi_its_srat_maps_free();
 }
 #else