2021-10-22 10:35:32

by Valentin Schneider

[permalink] [raw]
Subject: [PATCH 0/3] irqchip/gic-v3-its: Fix LPI pending table handling vs PREEMPT_RT

Hi folks,

This is my take at fixing [1]. Reading about the LPI tables situation was
entertaining.

Tested against kexec on an Ampere eMAG. Seems to be working fine atop
5.15-rc6. On the other hand, I can only issue one kexec from 5.15-rc6-rt12 - if
I then issue another one on the new kernel, I get tasks hanging. That is true
even without my patches and without CONFIG_PREEMPT_RT.

[1]: http://lore.kernel.org/r/[email protected]

Cheers,
Valentin

Valentin Schneider (3):
irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve
irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime

drivers/irqchip/irq-gic-v3-its.c | 108 ++++++++++++++++++++++++-----
include/linux/irqchip/arm-gic-v3.h | 3 +-
2 files changed, 94 insertions(+), 17 deletions(-)

--
2.25.1


2021-10-22 10:35:36

by Valentin Schneider

[permalink] [raw]
Subject: [PATCH 1/3] irqchip/gic-v3-its: Give the percpu rdist struct its own flags field

Later patches will require tracking some per-rdist status. Reuse the bytes
"lost" to padding within the __percpu rdist struct as a flags field, and
re-encode ->lpi_enabled within said flags.

No change in functionality intended.

Signed-off-by: Valentin Schneider <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 26 ++++++++++++++------------
include/linux/irqchip/arm-gic-v3.h | 2 +-
2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..a688ed5c21e8 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -43,8 +43,10 @@
#define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
#define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2)

-#define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
-#define RDIST_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
+#define RDISTS_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
+#define RDISTS_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
+
+#define RDIST_FLAGS_LPI_ENABLED BIT(0)

static u32 lpi_id_bits;

@@ -1415,7 +1417,7 @@ static void lpi_write_config(struct irq_data *d, u8 clr, u8 set)
* And yes, we're flushing exactly: One. Single. Byte.
* Humpf...
*/
- if (gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING)
+ if (gic_rdists->flags & RDISTS_FLAGS_PROPBASE_NEEDS_FLUSHING)
gic_flush_dcache_to_poc(cfg, sizeof(*cfg));
else
dsb(ishst);
@@ -2224,7 +2226,7 @@ static int gic_reserve_range(phys_addr_t addr, unsigned long size)

static int __init its_setup_lpi_prop_table(void)
{
- if (gic_rdists->flags & RDIST_FLAGS_RD_TABLES_PREALLOCATED) {
+ if (gic_rdists->flags & RDISTS_FLAGS_RD_TABLES_PREALLOCATED) {
u64 val;

val = gicr_read_propbaser(gic_data_rdist_rd_base() + GICR_PROPBASER);
@@ -2978,8 +2980,8 @@ static int __init allocate_lpi_tables(void)
*/
val = readl_relaxed(gic_data_rdist_rd_base() + GICR_CTLR);
if ((val & GICR_CTLR_ENABLE_LPIS) && enabled_lpis_allowed()) {
- gic_rdists->flags |= (RDIST_FLAGS_RD_TABLES_PREALLOCATED |
- RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING);
+ gic_rdists->flags |= (RDISTS_FLAGS_RD_TABLES_PREALLOCATED |
+ RDISTS_FLAGS_PROPBASE_NEEDS_FLUSHING);
pr_info("GICv3: Using preallocated redistributor tables\n");
}

@@ -3044,11 +3046,11 @@ static void its_cpu_init_lpis(void)
phys_addr_t paddr;
u64 val, tmp;

- if (gic_data_rdist()->lpi_enabled)
+ if (gic_data_rdist()->flags & RDIST_FLAGS_LPI_ENABLED)
return;

val = readl_relaxed(rbase + GICR_CTLR);
- if ((gic_rdists->flags & RDIST_FLAGS_RD_TABLES_PREALLOCATED) &&
+ if ((gic_rdists->flags & RDISTS_FLAGS_RD_TABLES_PREALLOCATED) &&
(val & GICR_CTLR_ENABLE_LPIS)) {
/*
* Check that we get the same property table on all
@@ -3095,7 +3097,7 @@ static void its_cpu_init_lpis(void)
gicr_write_propbaser(val, rbase + GICR_PROPBASER);
}
pr_info_once("GIC: using cache flushing for LPI property table\n");
- gic_rdists->flags |= RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING;
+ gic_rdists->flags |= RDISTS_FLAGS_PROPBASE_NEEDS_FLUSHING;
}

/* set PENDBASE */
@@ -3158,7 +3160,7 @@ static void its_cpu_init_lpis(void)
/* Make sure the GIC has seen the above */
dsb(sy);
out:
- gic_data_rdist()->lpi_enabled = true;
+ gic_data_rdist()->flags |= RDIST_FLAGS_LPI_ENABLED;
pr_info("GICv3: CPU%d: using %s LPI pending table @%pa\n",
smp_processor_id(),
gic_data_rdist()->pend_page ? "allocated" : "reserved",
@@ -5138,8 +5140,8 @@ static int redist_disable_lpis(void)
*
* If running with preallocated tables, there is nothing to do.
*/
- if (gic_data_rdist()->lpi_enabled ||
- (gic_rdists->flags & RDIST_FLAGS_RD_TABLES_PREALLOCATED))
+ if ((gic_data_rdist()->flags & RDIST_FLAGS_LPI_ENABLED) ||
+ (gic_rdists->flags & RDISTS_FLAGS_RD_TABLES_PREALLOCATED))
return 0;

/*
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 81cbf85f73de..0dc34d7d735a 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -615,7 +615,7 @@ struct rdists {
void __iomem *rd_base;
struct page *pend_page;
phys_addr_t phys_base;
- bool lpi_enabled;
+ u64 flags;
cpumask_t *vpe_table_mask;
void *vpe_l1_base;
} __percpu *rdist;
--
2.25.1

2021-10-22 10:37:32

by Valentin Schneider

[permalink] [raw]
Subject: [PATCH 3/3] irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime

The new memreserve cpuhp callback only needs to survive up until a point
where every CPU in the system has booted once. Beyond that, it becomes a
no-op and can be put in the bin.

Signed-off-by: Valentin Schneider <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 23 ++++++++++++++++++++---
include/linux/irqchip/arm-gic-v3.h | 1 +
2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index a6a4af59205e..4ae9ae6b90fe 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -5206,6 +5206,15 @@ int its_cpu_init(void)
}

#ifdef CONFIG_EFI
+static void rdist_memreserve_cpuhp_cleanup_workfn(struct work_struct *work)
+{
+ cpuhp_remove_state_nocalls(gic_rdists->cpuhp_memreserve_state);
+ gic_rdists->cpuhp_memreserve_state = CPUHP_INVALID;
+}
+
+static DECLARE_WORK(rdist_memreserve_cpuhp_cleanup_work,
+ rdist_memreserve_cpuhp_cleanup_workfn);
+
static int its_cpu_memreserve_lpi(unsigned int cpu)
{
struct page *pend_page = gic_data_rdist()->pend_page;
@@ -5226,7 +5235,7 @@ static int its_cpu_memreserve_lpi(unsigned int cpu)
* invocation of this callback, or in a previous life before kexec.
*/
if (gic_data_rdist()->flags & RDIST_FLAGS_PENDTABLE_RESERVED)
- return 0;
+ goto out;

gic_data_rdist()->flags |= RDIST_FLAGS_PENDTABLE_RESERVED;

@@ -5234,6 +5243,11 @@ static int its_cpu_memreserve_lpi(unsigned int cpu)
paddr = page_to_phys(pend_page);
WARN_ON(gic_reserve_range(paddr, LPI_PENDBASE_SZ));

+out:
+ /* This only needs to run once per CPU */
+ if (cpumask_equal(&cpus_booted_once_mask, cpu_possible_mask))
+ schedule_work(&rdist_memreserve_cpuhp_cleanup_work);
+
return 0;
}
#endif
@@ -5421,13 +5435,14 @@ static void __init its_acpi_probe(void)
static void __init its_acpi_probe(void) { }
#endif

-static int __init its_lpi_memreserve_init(void)
+static int __init its_lpi_memreserve_init(struct rdists *rdists)
{
int state;

if (!efi_enabled(EFI_CONFIG_TABLES))
return 0;

+ rdists->cpuhp_memreserve_state = CPUHP_INVALID;
state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"irqchip/arm/gicv3/memreserve:online",
its_cpu_memreserve_lpi,
@@ -5435,6 +5450,8 @@ static int __init its_lpi_memreserve_init(void)
if (state < 0)
return state;

+ rdists->cpuhp_memreserve_state = state;
+
return 0;
}

@@ -5465,7 +5482,7 @@ int __init its_init(struct fwnode_handle *handle, struct rdists *rdists,
if (err)
return err;

- err = its_lpi_memreserve_init();
+ err = its_lpi_memreserve_init(rdists);
if (err)
return err;

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 0dc34d7d735a..95479b315918 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -624,6 +624,7 @@ struct rdists {
u64 flags;
u32 gicd_typer;
u32 gicd_typer2;
+ int cpuhp_memreserve_state;
bool has_vlpis;
bool has_rvpeid;
bool has_direct_lpi;
--
2.25.1

2021-10-23 09:12:43

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/3] irqchip/gic-v3-its: Give the percpu rdist struct its own flags field

On Fri, 22 Oct 2021 11:33:05 +0100,
Valentin Schneider <[email protected]> wrote:
>
> Later patches will require tracking some per-rdist status. Reuse the bytes
> "lost" to padding within the __percpu rdist struct as a flags field, and
> re-encode ->lpi_enabled within said flags.
>
> No change in functionality intended.
>
> Signed-off-by: Valentin Schneider <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 26 ++++++++++++++------------
> include/linux/irqchip/arm-gic-v3.h | 2 +-
> 2 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index eb0882d15366..a688ed5c21e8 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -43,8 +43,10 @@
> #define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
> #define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
>
> -#define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
> -#define RDIST_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
> +#define RDISTS_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
> +#define RDISTS_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
> +
> +#define RDIST_FLAGS_LPI_ENABLED BIT(0)

Just to reduce the churn and for me not to misread things (because
RDIST/RDISTS is pretty confusing), how about leaving the original
flags as is, and name the per-RD ones like:

#define RD_LOCAL_LPI_ENABLED BIT(0)

?

Or something else that'd be adequately different from the original
flags?

Otherwise looks sensible.

M.

--
Without deviation from the norm, progress is not possible.

2021-10-23 10:40:18

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 3/3] irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime

On Fri, 22 Oct 2021 11:33:07 +0100,
Valentin Schneider <[email protected]> wrote:
>
> The new memreserve cpuhp callback only needs to survive up until a point
> where every CPU in the system has booted once. Beyond that, it becomes a
> no-op and can be put in the bin.
>
> Signed-off-by: Valentin Schneider <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 23 ++++++++++++++++++++---
> include/linux/irqchip/arm-gic-v3.h | 1 +
> 2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index a6a4af59205e..4ae9ae6b90fe 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -5206,6 +5206,15 @@ int its_cpu_init(void)
> }
>
> #ifdef CONFIG_EFI
> +static void rdist_memreserve_cpuhp_cleanup_workfn(struct work_struct *work)
> +{
> + cpuhp_remove_state_nocalls(gic_rdists->cpuhp_memreserve_state);
> + gic_rdists->cpuhp_memreserve_state = CPUHP_INVALID;
> +}
> +
> +static DECLARE_WORK(rdist_memreserve_cpuhp_cleanup_work,
> + rdist_memreserve_cpuhp_cleanup_workfn);
> +
> static int its_cpu_memreserve_lpi(unsigned int cpu)
> {
> struct page *pend_page = gic_data_rdist()->pend_page;
> @@ -5226,7 +5235,7 @@ static int its_cpu_memreserve_lpi(unsigned int cpu)
> * invocation of this callback, or in a previous life before kexec.
> */
> if (gic_data_rdist()->flags & RDIST_FLAGS_PENDTABLE_RESERVED)
> - return 0;
> + goto out;
>
> gic_data_rdist()->flags |= RDIST_FLAGS_PENDTABLE_RESERVED;
>
> @@ -5234,6 +5243,11 @@ static int its_cpu_memreserve_lpi(unsigned int cpu)
> paddr = page_to_phys(pend_page);
> WARN_ON(gic_reserve_range(paddr, LPI_PENDBASE_SZ));
>
> +out:
> + /* This only needs to run once per CPU */
> + if (cpumask_equal(&cpus_booted_once_mask, cpu_possible_mask))
> + schedule_work(&rdist_memreserve_cpuhp_cleanup_work);

Which makes me wonder. Do we actually need any flag at all if all we
need to check is whether the CPU has been through the callback at
least once? I have the strong feeling that we are tracking the same
state multiple times here.

Also, could the cpuhp callbacks ever run concurrently? If they could,
two CPUs could schedule the cleanup work in parallel, with interesting
results. You'd need a cmpxchg on the cpuhp state in the workfn.

> +
> return 0;
> }
> #endif
> @@ -5421,13 +5435,14 @@ static void __init its_acpi_probe(void)
> static void __init its_acpi_probe(void) { }
> #endif
>
> -static int __init its_lpi_memreserve_init(void)
> +static int __init its_lpi_memreserve_init(struct rdists *rdists)
> {
> int state;
>
> if (!efi_enabled(EFI_CONFIG_TABLES))
> return 0;
>
> + rdists->cpuhp_memreserve_state = CPUHP_INVALID;
> state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> "irqchip/arm/gicv3/memreserve:online",
> its_cpu_memreserve_lpi,
> @@ -5435,6 +5450,8 @@ static int __init its_lpi_memreserve_init(void)
> if (state < 0)
> return state;
>
> + rdists->cpuhp_memreserve_state = state;
> +
> return 0;
> }
>
> @@ -5465,7 +5482,7 @@ int __init its_init(struct fwnode_handle *handle, struct rdists *rdists,
> if (err)
> return err;
>
> - err = its_lpi_memreserve_init();
> + err = its_lpi_memreserve_init(rdists);
> if (err)
> return err;
>
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index 0dc34d7d735a..95479b315918 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -624,6 +624,7 @@ struct rdists {
> u64 flags;
> u32 gicd_typer;
> u32 gicd_typer2;
> + int cpuhp_memreserve_state;
> bool has_vlpis;
> bool has_rvpeid;
> bool has_direct_lpi;

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2021-10-24 15:52:31

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 1/3] irqchip/gic-v3-its: Give the percpu rdist struct its own flags field

On 23/10/21 10:10, Marc Zyngier wrote:
> On Fri, 22 Oct 2021 11:33:05 +0100,
> Valentin Schneider <[email protected]> wrote:
>> -#define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
>> -#define RDIST_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
>> +#define RDISTS_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
>> +#define RDISTS_FLAGS_RD_TABLES_PREALLOCATED (1 << 1)
>> +
>> +#define RDIST_FLAGS_LPI_ENABLED BIT(0)
>
> Just to reduce the churn and for me not to misread things (because
> RDIST/RDISTS is pretty confusing), how about leaving the original
> flags as is, and name the per-RD ones like:
>
> #define RD_LOCAL_LPI_ENABLED BIT(0)
>
> ?
>
> Or something else that'd be adequately different from the original
> flags?
>

Aye, sounds like the right thing to do!

2021-10-24 15:53:56

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 3/3] irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime

On 23/10/21 11:37, Marc Zyngier wrote:
> On Fri, 22 Oct 2021 11:33:07 +0100,
> Valentin Schneider <[email protected]> wrote:
>> @@ -5234,6 +5243,11 @@ static int its_cpu_memreserve_lpi(unsigned int cpu)
>> paddr = page_to_phys(pend_page);
>> WARN_ON(gic_reserve_range(paddr, LPI_PENDBASE_SZ));
>>
>> +out:
>> + /* This only needs to run once per CPU */
>> + if (cpumask_equal(&cpus_booted_once_mask, cpu_possible_mask))
>> + schedule_work(&rdist_memreserve_cpuhp_cleanup_work);
>
> Which makes me wonder. Do we actually need any flag at all if all we
> need to check is whether the CPU has been through the callback at
> least once? I have the strong feeling that we are tracking the same
> state multiple times here.
>

Agreed, cf. my reply on 2/3.

> Also, could the cpuhp callbacks ever run concurrently? If they could,
> two CPUs could schedule the cleanup work in parallel, with interesting
> results. You'd need a cmpxchg on the cpuhp state in the workfn.
>

So I think the cpuhp callbacks may run concurrently, but at a quick glance
it seems like we can't get two instances of the same work executing
concurrently: schedule_work()->queue_work() doesn't re-queue a work if it's already
pending, and __queue_work() checks a work's previous pool in case it might
still be running there.

Regardless, that's one less thing to worry about if we make the cpuhp
callback body run at most once on each CPU (only a single CPU will be able
to queue the removal work).