2024-04-16 08:56:07

by Dawei Li

Subject: [PATCH v2 0/7] Remove on-stack cpumask var for irq subsystem

Hi,

This is v2 of the previous series [1] on removing on-stack cpumask variables.

Generally it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.
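
To put a rough number on this, here is a userspace sketch of the arithmetic, assuming a cpumask is essentially a bitmap with one bit per possible CPU rounded up to whole unsigned longs. The sketch_* names are hypothetical and are not kernel API.

```c
#include <limits.h>
#include <stddef.h>

/* Bits per unsigned long on the build platform (64 on typical 64-bit builds). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Bytes needed for a bitmap holding one bit per possible CPU, rounded up
 * to a whole number of unsigned longs.
 */
static size_t sketch_cpumask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}
```

With NR_CPUS=8192 this gives 1024 bytes for a single on-stack mask, which is a noticeable share of a kernel stack that is typically only a few pages.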

One may argue that alloc_cpumask_var() and its friends are the proper
way to handle these cases. But struct irq_chip::irq_set_affinity() is
called in atomic context (with a raw spinlock held), and dynamic memory
allocation in atomic context is undesirable.

So a new helper is introduced to address the issues above. It has no
context restrictions and needs no intermediate cpumask variable, whether
on the stack or on the heap.
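
The idea behind the helper can be sketched in userspace as follows. This is only an illustration under stated assumptions: the sketch_* names are hypothetical, and __builtin_ctzl is a GCC/Clang builtin standing in for the kernel's __ffs(). Each word of the three bitmaps is ANDed on the fly, so no temporary mask is ever stored.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first bit set in a & b & c, or nbits if the
 * three bitmaps share no set bit. The per-word AND lives in a register
 * only; nothing the size of a full bitmap is allocated.
 */
static size_t sketch_first_and_and(const unsigned long *a,
				   const unsigned long *b,
				   const unsigned long *c,
				   size_t nbits)
{
	size_t nwords = (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

	for (size_t i = 0; i < nwords; i++) {
		unsigned long val = a[i] & b[i] & c[i];

		if (val) {
			size_t bit = i * SKETCH_BITS_PER_LONG +
				     (size_t)__builtin_ctzl(val);

			/* Bits at or beyond nbits in the last word do not count. */
			return bit < nbits ? bit : nbits;
		}
	}

	return nbits;
}
```

A caller treats a return value greater than or equal to nbits as "no common bit", which mirrors the `>= nr_cpu_ids` checks used throughout this series.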

The gic-v3-its case (patch 3) differs from the others since it does not
involve the intersection of three cpumasks.

Patch 7 is not for the irq subsystem; it is in this series only because it
uses the new helper. Please ignore it if you find it inappropriate for
this series.

Any comments are welcome.

------------

Changes since v1:

- Rebased against tip/irq/core;

- Patch[1]: [Yury]
  - Remove ifdefery nesting on find_first_and_and_bit;
  - Update commit message;

- Patch[3]: [Marc]
  - Merge two bitmap ops into one;
  - Update commit message;

- Patch[2,4-6]: [Yury]
  - Unwrap lines;

- Patch[7]:
  Newly added. Feel free to drop/ignore it if you find it inappropriate
  for this series.

[1] v1:
https://lore.kernel.org/lkml/[email protected]/

Dawei Li (7):
  cpumask: introduce cpumask_first_and_and()
  irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack
  irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
  irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack
  irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack
  irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
  cpuidle: Avoid explicit cpumask allocation on stack

 drivers/cpuidle/coupled.c                | 13 +++---------
 drivers/irqchip/irq-bcm6345-l1.c         |  6 +-----
 drivers/irqchip/irq-gic-v3-its.c         | 15 ++++++++-----
 drivers/irqchip/irq-loongson-eiointc.c   |  8 ++-----
 drivers/irqchip/irq-riscv-aplic-direct.c |  7 ++----
 drivers/irqchip/irq-sifive-plic.c        |  7 ++----
 include/linux/cpumask.h                  | 17 +++++++++++++++
 include/linux/find.h                     | 27 ++++++++++++++++++++++++
 lib/find_bit.c                           | 12 +++++++++++
 9 files changed, 76 insertions(+), 36 deletions(-)

base-commit: 35d77eb7b974f62aaef5a0dc72d93ddb1ada4074

Thanks,

Dawei

--
2.27.0



2024-04-16 08:56:18

by Dawei Li

Subject: [PATCH v2 1/7] cpumask: introduce cpumask_first_and_and()

Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
works in-place with all inputs and produce desired output directly.

Signed-off-by: Dawei Li <[email protected]>
---
include/linux/cpumask.h | 17 +++++++++++++++++
include/linux/find.h | 27 +++++++++++++++++++++++++++
lib/find_bit.c | 12 ++++++++++++
3 files changed, 56 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 1c29947db848..c46f9e9e1d66 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -187,6 +187,23 @@ unsigned int cpumask_first_and(const struct cpumask *srcp1, const struct cpumask
 	return find_first_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), small_cpumask_bits);
 }

+/**
+ * cpumask_first_and_and - return the first cpu from *srcp1 & *srcp2 & *srcp3
+ * @srcp1: the first input
+ * @srcp2: the second input
+ * @srcp3: the third input
+ *
+ * Return: >= nr_cpu_ids if no cpus set in all.
+ */
+static inline
+unsigned int cpumask_first_and_and(const struct cpumask *srcp1,
+				   const struct cpumask *srcp2,
+				   const struct cpumask *srcp3)
+{
+	return find_first_and_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2),
+				      cpumask_bits(srcp3), small_cpumask_bits);
+}
+
 /**
  * cpumask_last - get the last CPU in a cpumask
  * @srcp:	- the cpumask pointer
diff --git a/include/linux/find.h b/include/linux/find.h
index c69598e383c1..28ec5a03393a 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -29,6 +29,8 @@ unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1, const unsign
 					unsigned long n);
 extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 					 const unsigned long *addr2, unsigned long size);
+unsigned long _find_first_and_and_bit(const unsigned long *addr1, const unsigned long *addr2,
+				      const unsigned long *addr3, unsigned long size);
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);

@@ -345,6 +347,31 @@ unsigned long find_first_and_bit(const unsigned long *addr1,
 }
 #endif

+/**
+ * find_first_and_and_bit - find the first set bit in 3 memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @addr3: The third address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the first set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_and_and_bit(const unsigned long *addr1,
+				     const unsigned long *addr2,
+				     const unsigned long *addr3,
+				     unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr1 & *addr2 & *addr3 & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_and_and_bit(addr1, addr2, addr3, size);
+}
+
 #ifndef find_first_zero_bit
 /**
  * find_first_zero_bit - find the first cleared bit in a memory region
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..dacadd904250 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,18 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
 EXPORT_SYMBOL(_find_first_and_bit);
 #endif

+/*
+ * Find the first set bit in three memory regions.
+ */
+unsigned long _find_first_and_and_bit(const unsigned long *addr1,
+				      const unsigned long *addr2,
+				      const unsigned long *addr3,
+				      unsigned long size)
+{
+	return FIND_FIRST_BIT(addr1[idx] & addr2[idx] & addr3[idx], /* nop */, size);
+}
+EXPORT_SYMBOL(_find_first_and_and_bit);
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
--
2.27.0
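
For readers following the small_const_nbits() branch in find_first_and_and_bit() above, the single-word fast path can be modelled in userspace roughly as below. This is a sketch, not the kernel code: sketch_low_mask() is a hypothetical stand-in for GENMASK(size - 1, 0), and __builtin_ctzl (a GCC/Clang builtin) stands in for __ffs().

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask with the low nbits bits set; valid for 1 <= nbits <= SKETCH_BITS_PER_LONG. */
static unsigned long sketch_low_mask(unsigned int nbits)
{
	return ~0UL >> (SKETCH_BITS_PER_LONG - nbits);
}

/*
 * Single-word case: AND the three words, drop bits at or above nbits,
 * then return the lowest set bit, or nbits if nothing is left.
 */
static unsigned int sketch_first_and_and_small(unsigned long a,
					       unsigned long b,
					       unsigned long c,
					       unsigned int nbits)
{
	unsigned long val = a & b & c & sketch_low_mask(nbits);

	return val ? (unsigned int)__builtin_ctzl(val) : nbits;
}
```

When the bitmap fits in one word and its size is a compile-time constant, the whole lookup reduces to a few register operations with no function call.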


2024-04-16 08:56:31

by Dawei Li

Subject: [PATCH v2 3/7] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Remove cpumask var on stack and use cpumask_any_and() to address it.

Signed-off-by: Dawei Li <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fca888b36680..20f954211c61 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3826,9 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
 				bool force)
 {
 	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-	struct cpumask common, *table_mask;
+	unsigned int from, cpu = nr_cpu_ids;
+	struct cpumask *table_mask;
 	unsigned long flags;
-	int from, cpu;

 	/*
 	 * Changing affinity is mega expensive, so let's be as lazy as
@@ -3850,10 +3850,15 @@ static int its_vpe_set_affinity(struct irq_data *d,
 	 * If we are offered another CPU in the same GICv4.1 ITS
 	 * affinity, pick this one. Otherwise, any CPU will do.
 	 */
-	if (table_mask && cpumask_and(&common, mask_val, table_mask))
-		cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
-	else
+	if (table_mask)
+		cpu = cpumask_any_and(mask_val, table_mask);
+	if (cpu < nr_cpu_ids) {
+		if (cpumask_test_cpu(from, mask_val) &&
+		    cpumask_test_cpu(from, table_mask))
+			cpu = from;
+	} else {
 		cpu = cpumask_first(mask_val);
+	}

 	if (from == cpu)
 		goto out;
--
2.27.0
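
The selection order in the hunk above can be modelled in userspace as follows. This is a simplified sketch under stated assumptions: masks are a single 64-bit word, SKETCH_NR_CPUS plays the role of nr_cpu_ids, the has_table flag models `table_mask != NULL`, and "any" is modelled as "first". All sketch_* names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 64u	/* hypothetical: all CPUs fit in one 64-bit word */

/* Lowest set bit of mask, or SKETCH_NR_CPUS if mask is empty. */
static unsigned int sketch_first(uint64_t mask)
{
	return mask ? (unsigned int)__builtin_ctzll(mask) : SKETCH_NR_CPUS;
}

/*
 * Choose a target CPU. The caller must pass from < SKETCH_NR_CPUS.
 *  1. If a table mask exists and intersects mask_val, prefer staying on
 *     'from' when it is in both masks, else take a CPU from the intersection.
 *  2. Otherwise fall back to the first CPU in mask_val.
 */
static unsigned int sketch_pick_cpu(unsigned int from, uint64_t mask_val,
				    int has_table, uint64_t table_mask)
{
	unsigned int cpu = SKETCH_NR_CPUS;

	if (has_table)
		cpu = sketch_first(mask_val & table_mask);

	if (cpu < SKETCH_NR_CPUS) {
		if (((mask_val >> from) & 1) && ((table_mask >> from) & 1))
			cpu = from;
	} else {
		cpu = sketch_first(mask_val);
	}

	return cpu;
}
```

Initialising cpu to the "not found" value is what lets the NULL table_mask case and the empty-intersection case share the same fallback branch.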



2024-04-16 08:56:42

by Dawei Li

Subject: [PATCH v2 2/7] irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
---
drivers/irqchip/irq-bcm6345-l1.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
index eb02d203c963..90daa274ef23 100644
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -192,14 +192,10 @@ static int bcm6345_l1_set_affinity(struct irq_data *d,
 	u32 mask = BIT(d->hwirq % IRQS_PER_WORD);
 	unsigned int old_cpu = cpu_for_irq(intc, d);
 	unsigned int new_cpu;
-	struct cpumask valid;
 	unsigned long flags;
 	bool enabled;

-	if (!cpumask_and(&valid, &intc->cpumask, dest))
-		return -EINVAL;
-
-	new_cpu = cpumask_any_and(&valid, cpu_online_mask);
+	new_cpu = cpumask_first_and_and(&intc->cpumask, dest, cpu_online_mask);
 	if (new_cpu >= nr_cpu_ids)
 		return -EINVAL;

--
2.27.0


2024-04-16 08:56:53

by Dawei Li

Subject: [PATCH v2 5/7] irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
---
drivers/irqchip/irq-riscv-aplic-direct.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
index 06bace9b7497..4a3ffe856d6c 100644
--- a/drivers/irqchip/irq-riscv-aplic-direct.c
+++ b/drivers/irqchip/irq-riscv-aplic-direct.c
@@ -54,15 +54,12 @@ static int aplic_direct_set_affinity(struct irq_data *d, const struct cpumask *m
 	struct aplic_direct *direct = container_of(priv, struct aplic_direct, priv);
 	struct aplic_idc *idc;
 	unsigned int cpu, val;
-	struct cpumask amask;
 	void __iomem *target;

-	cpumask_and(&amask, &direct->lmask, mask_val);
-
 	if (force)
-		cpu = cpumask_first(&amask);
+		cpu = cpumask_first_and(&direct->lmask, mask_val);
 	else
-		cpu = cpumask_any_and(&amask, cpu_online_mask);
+		cpu = cpumask_first_and_and(&direct->lmask, mask_val, cpu_online_mask);

 	if (cpu >= nr_cpu_ids)
 		return -EINVAL;
--
2.27.0
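
The force/non-force split in the hunk above (the same shape appears in the sifive-plic patch) can be illustrated with a small userspace model. It assumes single-word 64-bit masks and uses hypothetical sketch_* names; with force set, the online mask is simply left out of the intersection.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64u	/* hypothetical: all CPUs fit in one 64-bit word */

/* Lowest set bit of mask, or SKETCH_NR_CPUS if mask is empty. */
static unsigned int sketch_first(uint64_t mask)
{
	return mask ? (unsigned int)__builtin_ctzll(mask) : SKETCH_NR_CPUS;
}

/*
 * Pick the target CPU from the controller's local mask and the requested
 * mask. Without 'force', the CPU must also be online. Returns 0 and stores
 * the CPU in *cpu_out, or -EINVAL if no CPU qualifies.
 */
static int sketch_set_affinity(uint64_t lmask, uint64_t mask_val,
			       uint64_t online, int force,
			       unsigned int *cpu_out)
{
	unsigned int cpu;

	if (force)
		cpu = sketch_first(lmask & mask_val);
	else
		cpu = sketch_first(lmask & mask_val & online);

	if (cpu >= SKETCH_NR_CPUS)
		return -EINVAL;

	*cpu_out = cpu;
	return 0;
}
```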


2024-04-16 08:57:20

by Dawei Li

Subject: [PATCH v2 7/7] cpuidle: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() and cpumask_weight_and() to avoid the need
for a temporary cpumask on the stack.

Signed-off-by: Dawei Li <[email protected]>
---
drivers/cpuidle/coupled.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index 9acde71558d5..bb8761c8a42e 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -439,13 +439,8 @@ static int cpuidle_coupled_clear_pokes(int cpu)

 static bool cpuidle_coupled_any_pokes_pending(struct cpuidle_coupled *coupled)
 {
-	cpumask_t cpus;
-	int ret;
-
-	cpumask_and(&cpus, cpu_online_mask, &coupled->coupled_cpus);
-	ret = cpumask_and(&cpus, &cpuidle_coupled_poke_pending, &cpus);
-
-	return ret;
+	return cpumask_first_and_and(cpu_online_mask, &coupled->coupled_cpus,
+				     &cpuidle_coupled_poke_pending) < nr_cpu_ids;
 }

 /**
@@ -626,9 +621,7 @@ int cpuidle_enter_state_coupled(struct cpuidle_device *dev,

 static void cpuidle_coupled_update_online_cpus(struct cpuidle_coupled *coupled)
 {
-	cpumask_t cpus;
-	cpumask_and(&cpus, cpu_online_mask, &coupled->coupled_cpus);
-	coupled->online_count = cpumask_weight(&cpus);
+	coupled->online_count = cpumask_weight_and(cpu_online_mask, &coupled->coupled_cpus);
 }

 /**
--
2.27.0
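
Both replacements in this patch follow the same pattern: compute the answer word by word instead of materialising the intersection first. A userspace sketch of the two operations, with hypothetical sketch_* names and GCC/Clang builtins, and assuming any bits beyond the logical bitmap size are zero:

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the bits set in a & b without storing the intersection. */
static size_t sketch_weight_and(const unsigned long *a,
				const unsigned long *b,
				size_t nwords)
{
	size_t count = 0;

	for (size_t i = 0; i < nwords; i++)
		count += (size_t)__builtin_popcountl(a[i] & b[i]);

	return count;
}

/* True if a, b and c have at least one set bit in common. */
static bool sketch_any_common3(const unsigned long *a,
			       const unsigned long *b,
			       const unsigned long *c,
			       size_t nwords)
{
	for (size_t i = 0; i < nwords; i++) {
		if (a[i] & b[i] & c[i])
			return true;
	}

	return false;
}
```

The "any pokes pending" check in the patch expresses the second operation as `cpumask_first_and_and(...) < nr_cpu_ids`, which is true exactly when the three-way intersection is non-empty.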


2024-04-16 08:57:23

by Dawei Li

Subject: [PATCH v2 6/7] irqchip/sifive-plic: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
---
drivers/irqchip/irq-sifive-plic.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index f3d4cb9e34f7..8fb183ced1e7 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -164,15 +164,12 @@ static int plic_set_affinity(struct irq_data *d,
 			     const struct cpumask *mask_val, bool force)
 {
 	unsigned int cpu;
-	struct cpumask amask;
 	struct plic_priv *priv = irq_data_get_irq_chip_data(d);

-	cpumask_and(&amask, &priv->lmask, mask_val);
-
 	if (force)
-		cpu = cpumask_first(&amask);
+		cpu = cpumask_first_and(&priv->lmask, mask_val);
 	else
-		cpu = cpumask_any_and(&amask, cpu_online_mask);
+		cpu = cpumask_first_and_and(&priv->lmask, mask_val, cpu_online_mask);

 	if (cpu >= nr_cpu_ids)
 		return -EINVAL;
--
2.27.0


2024-04-16 08:57:29

by Dawei Li

Subject: [PATCH v2 4/7] irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
---
drivers/irqchip/irq-loongson-eiointc.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/irqchip/irq-loongson-eiointc.c b/drivers/irqchip/irq-loongson-eiointc.c
index 4f5e6d21d77d..c7ddebf312ad 100644
--- a/drivers/irqchip/irq-loongson-eiointc.c
+++ b/drivers/irqchip/irq-loongson-eiointc.c
@@ -93,19 +93,15 @@ static int eiointc_set_irq_affinity(struct irq_data *d, const struct cpumask *af
 	unsigned int cpu;
 	unsigned long flags;
 	uint32_t vector, regaddr;
-	struct cpumask intersect_affinity;
 	struct eiointc_priv *priv = d->domain->host_data;

 	raw_spin_lock_irqsave(&affinity_lock, flags);

-	cpumask_and(&intersect_affinity, affinity, cpu_online_mask);
-	cpumask_and(&intersect_affinity, &intersect_affinity, &priv->cpuspan_map);
-
-	if (cpumask_empty(&intersect_affinity)) {
+	cpu = cpumask_first_and_and(&priv->cpuspan_map, affinity, cpu_online_mask);
+	if (cpu >= nr_cpu_ids) {
 		raw_spin_unlock_irqrestore(&affinity_lock, flags);
 		return -EINVAL;
 	}
-	cpu = cpumask_first(&intersect_affinity);

 	vector = d->hwirq;
 	regaddr = EIOINTC_REG_ENABLE + ((vector >> 5) << 2);
--
2.27.0


2024-04-16 17:46:05

by Yury Norov

Subject: Re: [PATCH v2 1/7] cpumask: introduce cpumask_first_and_and()

On Tue, Apr 16, 2024 at 04:54:48PM +0800, Dawei Li wrote:
> Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
> free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
> works in-place with all inputs and produce desired output directly.
>
> Signed-off-by: Dawei Li <[email protected]>

Acked-by: Yury Norov <[email protected]>


2024-04-16 17:49:59

by Yury Norov

Subject: Re: [PATCH v2 1/7] cpumask: introduce cpumask_first_and_and()

On Tue, Apr 16, 2024 at 10:45:54AM -0700, Yury Norov wrote:
> On Tue, Apr 16, 2024 at 04:54:48PM +0800, Dawei Li wrote:
> > Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
> > free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
> > works in-place with all inputs and produce desired output directly.

Still there: s/produce/produces

But whatever. Also, I think this patch would better go with the rest
of the series, right?


2024-04-16 18:01:56

by Yury Norov

Subject: Re: [PATCH v2 4/7] irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack

On Tue, Apr 16, 2024 at 04:54:51PM +0800, Dawei Li wrote:
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
>
> Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
> the stack.
>
> Signed-off-by: Dawei Li <[email protected]>
> ---
> drivers/irqchip/irq-loongson-eiointc.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/irqchip/irq-loongson-eiointc.c b/drivers/irqchip/irq-loongson-eiointc.c
> index 4f5e6d21d77d..c7ddebf312ad 100644
> --- a/drivers/irqchip/irq-loongson-eiointc.c
> +++ b/drivers/irqchip/irq-loongson-eiointc.c
> @@ -93,19 +93,15 @@ static int eiointc_set_irq_affinity(struct irq_data *d, const struct cpumask *af
>  	unsigned int cpu;
>  	unsigned long flags;
>  	uint32_t vector, regaddr;
> -	struct cpumask intersect_affinity;
>  	struct eiointc_priv *priv = d->domain->host_data;
>
>  	raw_spin_lock_irqsave(&affinity_lock, flags);
>
> -	cpumask_and(&intersect_affinity, affinity, cpu_online_mask);
> -	cpumask_and(&intersect_affinity, &intersect_affinity, &priv->cpuspan_map);
> -
> -	if (cpumask_empty(&intersect_affinity)) {

This was unneeded because cpumask_and() returns true if there are set
bits.

For the series:

Reviewed-by: Yury Norov <[email protected]>

> +	cpu = cpumask_first_and_and(&priv->cpuspan_map, affinity, cpu_online_mask);
> +	if (cpu >= nr_cpu_ids) {
>  		raw_spin_unlock_irqrestore(&affinity_lock, flags);
>  		return -EINVAL;
>  	}
> -	cpu = cpumask_first(&intersect_affinity);
>
>  	vector = d->hwirq;
>  	regaddr = EIOINTC_REG_ENABLE + ((vector >> 5) << 2);
> --
> 2.27.0
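
The remark above about the removed cpumask_empty() check rests on the AND operation itself reporting whether the result is non-empty. A userspace sketch of that pattern, with hypothetical names and not the kernel implementation:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * dst = a & b, word by word. Returns true if any bit is set in the
 * result, so a caller does not need a separate emptiness check.
 */
static bool sketch_and(unsigned long *dst,
		       const unsigned long *a,
		       const unsigned long *b,
		       size_t nwords)
{
	unsigned long result = 0;

	for (size_t i = 0; i < nwords; i++) {
		dst[i] = a[i] & b[i];
		result |= dst[i];
	}

	return result != 0;
}
```

With cpumask_first_and_and() the question does not arise at all, since no destination mask exists and the returned index already encodes the empty case.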

2024-04-17 01:35:59

by Dawei Li

Subject: Re: [PATCH v2 1/7] cpumask: introduce cpumask_first_and_and()

Hi Yury,

Thanks for the review.

On Tue, Apr 16, 2024 at 10:49:46AM -0700, Yury Norov wrote:
> On Tue, Apr 16, 2024 at 10:45:54AM -0700, Yury Norov wrote:
> > On Tue, Apr 16, 2024 at 04:54:48PM +0800, Dawei Li wrote:
> > > Introduce cpumask_first_and_and() to get intersection between 3 cpumasks,
> > > free of any intermediate cpumask variable. Instead, cpumask_first_and_and()
> > > works in-place with all inputs and produce desired output directly.
>
> Still there: s/produce/produces

Oops, sorry about that. I will respin a v3 if needed.

>
> But whatever. Also, I think this patch would better go with the rest
> of the series, right?

I suppose so; this series should be applied as a whole.

>
> > >
> > > Signed-off-by: Dawei Li <[email protected]>

> >
> > Acked-by: Yury Norov <[email protected]>

Thanks!

Dawei

2024-04-17 10:56:18

by Marc Zyngier

Subject: Re: [PATCH v2 3/7] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack

On Tue, 16 Apr 2024 09:54:50 +0100,
Dawei Li <[email protected]> wrote:
>
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
>
> Remove cpumask var on stack and use cpumask_any_and() to address it.
>
> Signed-off-by: Dawei Li <[email protected]>

Reviewed-by: Marc Zyngier <[email protected]>

M.

--
Without deviation from the norm, progress is not possible.

2024-04-17 11:21:23

by Anup Patel

Subject: Re: [PATCH v2 6/7] irqchip/sifive-plic: Avoid explicit cpumask allocation on stack

On Tue, Apr 16, 2024 at 2:26 PM Dawei Li <[email protected]> wrote:
>
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
>
> Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
> the stack.
>
> Signed-off-by: Dawei Li <[email protected]>

LGTM.

Reviewed-by: Anup Patel <[email protected]>

Regards,
Anup

> ---
> drivers/irqchip/irq-sifive-plic.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index f3d4cb9e34f7..8fb183ced1e7 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -164,15 +164,12 @@ static int plic_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val, bool force)
> {
> unsigned int cpu;
> - struct cpumask amask;
> struct plic_priv *priv = irq_data_get_irq_chip_data(d);
>
> - cpumask_and(&amask, &priv->lmask, mask_val);
> -
> if (force)
> - cpu = cpumask_first(&amask);
> + cpu = cpumask_first_and(&priv->lmask, mask_val);
> else
> - cpu = cpumask_any_and(&amask, cpu_online_mask);
> + cpu = cpumask_first_and_and(&priv->lmask, mask_val, cpu_online_mask);
>
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
> --
> 2.27.0
>

2024-04-17 11:22:28

by Anup Patel

Subject: Re: [PATCH v2 5/7] irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack

On Tue, Apr 16, 2024 at 2:26 PM Dawei Li <[email protected]> wrote:
>
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
>
> Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
> the stack.
>
> Signed-off-by: Dawei Li <[email protected]>

LGTM.

Reviewed-by: Anup Patel <[email protected]>

Regards,
Anup

> ---
> drivers/irqchip/irq-riscv-aplic-direct.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
> index 06bace9b7497..4a3ffe856d6c 100644
> --- a/drivers/irqchip/irq-riscv-aplic-direct.c
> +++ b/drivers/irqchip/irq-riscv-aplic-direct.c
> @@ -54,15 +54,12 @@ static int aplic_direct_set_affinity(struct irq_data *d, const struct cpumask *m
> struct aplic_direct *direct = container_of(priv, struct aplic_direct, priv);
> struct aplic_idc *idc;
> unsigned int cpu, val;
> - struct cpumask amask;
> void __iomem *target;
>
> - cpumask_and(&amask, &direct->lmask, mask_val);
> -
> if (force)
> - cpu = cpumask_first(&amask);
> + cpu = cpumask_first_and(&direct->lmask, mask_val);
> else
> - cpu = cpumask_any_and(&amask, cpu_online_mask);
> + cpu = cpumask_first_and_and(&direct->lmask, mask_val, cpu_online_mask);
>
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
> --
> 2.27.0
>

Subject: [tip: irq/core] irqchip/sifive-plic: Avoid explicit cpumask allocation on stack

The following commit has been merged into the irq/core branch of tip:

Commit-ID: a7fb69ffd7ce438a259b2f9fbcebc62f5caf2d4f
Gitweb: https://git.kernel.org/tip/a7fb69ffd7ce438a259b2f9fbcebc62f5caf2d4f
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:53 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

irqchip/sifive-plic: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
drivers/irqchip/irq-sifive-plic.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index f3d4cb9..8fb183c 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -164,15 +164,12 @@ static int plic_set_affinity(struct irq_data *d,
const struct cpumask *mask_val, bool force)
{
unsigned int cpu;
- struct cpumask amask;
struct plic_priv *priv = irq_data_get_irq_chip_data(d);

- cpumask_and(&amask, &priv->lmask, mask_val);
-
if (force)
- cpu = cpumask_first(&amask);
+ cpu = cpumask_first_and(&priv->lmask, mask_val);
else
- cpu = cpumask_any_and(&amask, cpu_online_mask);
+ cpu = cpumask_first_and_and(&priv->lmask, mask_val, cpu_online_mask);

if (cpu >= nr_cpu_ids)
return -EINVAL;
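
The selection logic in this hunk can be modelled in userspace with single-word masks. This is only a sketch: NR_IDS, mask_first() and pick_cpu() are hypothetical names standing in for nr_cpu_ids and the cpumask helpers, and __builtin_ctzll() is a GCC/Clang builtin, not a kernel API. cpumask_any_and() makes no promise about which CPU it returns, so returning the first CPU of the three-way intersection is a valid replacement for it.

```c
#include <assert.h>

#define NR_IDS 64u	/* hypothetical stand-in for nr_cpu_ids */

/* Lowest set bit of a 64-bit mask, or NR_IDS if the mask is empty. */
static inline unsigned int mask_first(unsigned long long m)
{
	return m ? (unsigned int)__builtin_ctzll(m) : NR_IDS;
}

/*
 * Model of the reworked selection: lmask is the set of CPUs the controller
 * can target, req is the requested affinity, online is the online set.
 * With force set, the online set is ignored, as in the patch.
 */
static inline unsigned int pick_cpu(unsigned long long lmask,
				    unsigned long long req,
				    unsigned long long online, int force)
{
	if (force)
		return mask_first(lmask & req);
	return mask_first(lmask & req & online);
}
```

A result of NR_IDS corresponds to the "cpu >= nr_cpu_ids" check that makes the driver return -EINVAL.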

Subject: [tip: irq/core] irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack

The following commit has been merged into the irq/core branch of tip:

Commit-ID: 2bc32db5a262cc34753cb4208b2d3043d1cd81ae
Gitweb: https://git.kernel.org/tip/2bc32db5a262cc34753cb4208b2d3043d1cd81ae
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:51 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

irqchip/loongson-eiointc: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
drivers/irqchip/irq-loongson-eiointc.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/irqchip/irq-loongson-eiointc.c b/drivers/irqchip/irq-loongson-eiointc.c
index 4f5e6d2..c7ddebf 100644
--- a/drivers/irqchip/irq-loongson-eiointc.c
+++ b/drivers/irqchip/irq-loongson-eiointc.c
@@ -93,19 +93,15 @@ static int eiointc_set_irq_affinity(struct irq_data *d, const struct cpumask *af
unsigned int cpu;
unsigned long flags;
uint32_t vector, regaddr;
- struct cpumask intersect_affinity;
struct eiointc_priv *priv = d->domain->host_data;

raw_spin_lock_irqsave(&affinity_lock, flags);

- cpumask_and(&intersect_affinity, affinity, cpu_online_mask);
- cpumask_and(&intersect_affinity, &intersect_affinity, &priv->cpuspan_map);
-
- if (cpumask_empty(&intersect_affinity)) {
+ cpu = cpumask_first_and_and(&priv->cpuspan_map, affinity, cpu_online_mask);
+ if (cpu >= nr_cpu_ids) {
raw_spin_unlock_irqrestore(&affinity_lock, flags);
return -EINVAL;
}
- cpu = cpumask_first(&intersect_affinity);

vector = d->hwirq;
regaddr = EIOINTC_REG_ENABLE + ((vector >> 5) << 2);
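
As a rough check that the rewritten flow above selects the same CPU as the old one, both can be modelled on single-word masks. The names below (NR_IDS, mask_first(), pick_old(), pick_new()) are hypothetical, and in this model the temporary is a single word rather than a full struct cpumask, so it only illustrates the control flow, not the stack saving.

```c
#include <assert.h>

#define NR_IDS 64u	/* hypothetical stand-in for nr_cpu_ids */

/* Lowest set bit of a 64-bit mask, or NR_IDS if the mask is empty. */
static inline unsigned int mask_first(unsigned long long m)
{
	return m ? (unsigned int)__builtin_ctzll(m) : NR_IDS;
}

/* Old flow: build the intersection in a temporary, test it, then search it. */
static inline unsigned int pick_old(unsigned long long span,
				    unsigned long long affinity,
				    unsigned long long online)
{
	unsigned long long tmp;

	tmp = affinity & online;
	tmp &= span;
	if (!tmp)
		return NR_IDS;	/* the driver returned -EINVAL here */
	return mask_first(tmp);
}

/* New flow: a single search over the three-way AND, with no temporary mask. */
static inline unsigned int pick_new(unsigned long long span,
				    unsigned long long affinity,
				    unsigned long long online)
{
	return mask_first(span & affinity & online);
}
```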

Subject: [tip: irq/core] irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack

The following commit has been merged into the irq/core branch of tip:

Commit-ID: 5d650d1eba876717888a0951ed873ef0f1d8cf61
Gitweb: https://git.kernel.org/tip/5d650d1eba876717888a0951ed873ef0f1d8cf61
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:52 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

irqchip/riscv-aplic-direct: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
drivers/irqchip/irq-riscv-aplic-direct.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-riscv-aplic-direct.c b/drivers/irqchip/irq-riscv-aplic-direct.c
index 06bace9..4a3ffe8 100644
--- a/drivers/irqchip/irq-riscv-aplic-direct.c
+++ b/drivers/irqchip/irq-riscv-aplic-direct.c
@@ -54,15 +54,12 @@ static int aplic_direct_set_affinity(struct irq_data *d, const struct cpumask *m
struct aplic_direct *direct = container_of(priv, struct aplic_direct, priv);
struct aplic_idc *idc;
unsigned int cpu, val;
- struct cpumask amask;
void __iomem *target;

- cpumask_and(&amask, &direct->lmask, mask_val);
-
if (force)
- cpu = cpumask_first(&amask);
+ cpu = cpumask_first_and(&direct->lmask, mask_val);
else
- cpu = cpumask_any_and(&amask, cpu_online_mask);
+ cpu = cpumask_first_and_and(&direct->lmask, mask_val, cpu_online_mask);

if (cpu >= nr_cpu_ids)
return -EINVAL;

Subject: [tip: irq/core] irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack

The following commit has been merged into the irq/core branch of tip:

Commit-ID: 6a9a52f74e3b82ff3f5398810c1b23ad497e2df5
Gitweb: https://git.kernel.org/tip/6a9a52f74e3b82ff3f5398810c1b23ad497e2df5
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:49 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

irqchip/irq-bcm6345-l1: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
drivers/irqchip/irq-bcm6345-l1.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
index eb02d20..90daa27 100644
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -192,14 +192,10 @@ static int bcm6345_l1_set_affinity(struct irq_data *d,
u32 mask = BIT(d->hwirq % IRQS_PER_WORD);
unsigned int old_cpu = cpu_for_irq(intc, d);
unsigned int new_cpu;
- struct cpumask valid;
unsigned long flags;
bool enabled;

- if (!cpumask_and(&valid, &intc->cpumask, dest))
- return -EINVAL;
-
- new_cpu = cpumask_any_and(&valid, cpu_online_mask);
+ new_cpu = cpumask_first_and_and(&intc->cpumask, dest, cpu_online_mask);
if (new_cpu >= nr_cpu_ids)
return -EINVAL;
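
To put the stack argument from the commit message in numbers: a cpumask is a bitmap of NR_CPUS bits rounded up to whole longs. The helper below is a hypothetical userspace calculation, not kernel code, and the NR_CPUS value used in the note after it is only an example configuration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes needed for a bitmap of nr_cpus bits, rounded up to whole longs. */
static inline size_t cpumask_bytes(size_t nr_cpus)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}
```

For NR_CPUS=8192, cpumask_bytes(8192) is 1024, so each on-stack variable such as the removed "struct cpumask valid" would cost 1 KiB of stack.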


Subject: [tip: irq/core] cpumask: Introduce cpumask_first_and_and()

The following commit has been merged into the irq/core branch of tip:

Commit-ID: cdc66553c4130735f0a2db943a5259e54ff1597a
Gitweb: https://git.kernel.org/tip/cdc66553c4130735f0a2db943a5259e54ff1597a
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:48 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

cpumask: Introduce cpumask_first_and_and()

Introduce cpumask_first_and_and() to find the first CPU in the
intersection of three cpumasks without any intermediate cpumask variable.
It reads all three inputs in place and returns the result directly.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Yury Norov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
include/linux/cpumask.h | 17 +++++++++++++++++
include/linux/find.h | 27 +++++++++++++++++++++++++++
lib/find_bit.c | 12 ++++++++++++
3 files changed, 56 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 1c29947..c46f9e9 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -188,6 +188,23 @@ unsigned int cpumask_first_and(const struct cpumask *srcp1, const struct cpumask
}

/**
+ * cpumask_first_and_and - return the first cpu from *srcp1 & *srcp2 & *srcp3
+ * @srcp1: the first input
+ * @srcp2: the second input
+ * @srcp3: the third input
+ *
+ * Return: >= nr_cpu_ids if no cpus set in all.
+ */
+static inline
+unsigned int cpumask_first_and_and(const struct cpumask *srcp1,
+ const struct cpumask *srcp2,
+ const struct cpumask *srcp3)
+{
+ return find_first_and_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2),
+ cpumask_bits(srcp3), small_cpumask_bits);
+}
+
+/**
* cpumask_last - get the last CPU in a cpumask
* @srcp: - the cpumask pointer
*
diff --git a/include/linux/find.h b/include/linux/find.h
index c69598e..28ec5a0 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -29,6 +29,8 @@ unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1, const unsign
unsigned long n);
extern unsigned long _find_first_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size);
+unsigned long _find_first_and_and_bit(const unsigned long *addr1, const unsigned long *addr2,
+ const unsigned long *addr3, unsigned long size);
extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);

@@ -345,6 +347,31 @@ unsigned long find_first_and_bit(const unsigned long *addr1,
}
#endif

+/**
+ * find_first_and_and_bit - find the first set bit in 3 memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @addr3: The third address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the first set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_first_and_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2,
+ const unsigned long *addr3,
+ unsigned long size)
+{
+ if (small_const_nbits(size)) {
+ unsigned long val = *addr1 & *addr2 & *addr3 & GENMASK(size - 1, 0);
+
+ return val ? __ffs(val) : size;
+ }
+
+ return _find_first_and_and_bit(addr1, addr2, addr3, size);
+}
+
#ifndef find_first_zero_bit
/**
* find_first_zero_bit - find the first cleared bit in a memory region
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9..dacadd9 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,18 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
EXPORT_SYMBOL(_find_first_and_bit);
#endif

+/*
+ * Find the first set bit in three memory regions.
+ */
+unsigned long _find_first_and_and_bit(const unsigned long *addr1,
+ const unsigned long *addr2,
+ const unsigned long *addr3,
+ unsigned long size)
+{
+ return FIND_FIRST_BIT(addr1[idx] & addr2[idx] & addr3[idx], /* nop */, size);
+}
+EXPORT_SYMBOL(_find_first_and_and_bit);
+
#ifndef find_first_zero_bit
/*
* Find the first cleared bit in a memory region.
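
What _find_first_and_and_bit() computes can be sketched in userspace as a word-by-word scan. This is a model under stated assumptions, not the kernel's FIND_FIRST_BIT macro: BITS_PER_WORD and first_and_and_bit() are hypothetical names, and __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first bit that is set in all three bitmaps,
 * or nbits if there is no such bit. No intermediate bitmap is built:
 * each word of the intersection lives only in a local variable.
 */
static inline unsigned long first_and_and_bit(const unsigned long *a,
					      const unsigned long *b,
					      const unsigned long *c,
					      unsigned long nbits)
{
	unsigned long nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	unsigned long idx;

	for (idx = 0; idx < nwords; idx++) {
		unsigned long val = a[idx] & b[idx] & c[idx];
		unsigned long bit;

		if (!val)
			continue;
		bit = idx * BITS_PER_WORD + (unsigned long)__builtin_ctzl(val);
		/* Bits at or beyond nbits in the last word do not count. */
		return bit < nbits ? bit : nbits;
	}
	return nbits;
}
```

Returning nbits when nothing matches mirrors the "returns @size" convention documented for find_first_and_and_bit(), which is what lets callers test the result against nr_cpu_ids.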

Subject: [tip: irq/core] cpuidle: Avoid explicit cpumask allocation on stack

The following commit has been merged into the irq/core branch of tip:

Commit-ID: 6f28c4a852fab8bd759a383149dfd30511477249
Gitweb: https://git.kernel.org/tip/6f28c4a852fab8bd759a383149dfd30511477249
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:54 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

cpuidle: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Use cpumask_first_and_and() and cpumask_weight_and() to avoid the need
for a temporary cpumask on the stack.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
drivers/cpuidle/coupled.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index 9acde71..bb8761c 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -439,13 +439,8 @@ static int cpuidle_coupled_clear_pokes(int cpu)

static bool cpuidle_coupled_any_pokes_pending(struct cpuidle_coupled *coupled)
{
- cpumask_t cpus;
- int ret;
-
- cpumask_and(&cpus, cpu_online_mask, &coupled->coupled_cpus);
- ret = cpumask_and(&cpus, &cpuidle_coupled_poke_pending, &cpus);
-
- return ret;
+ return cpumask_first_and_and(cpu_online_mask, &coupled->coupled_cpus,
+ &cpuidle_coupled_poke_pending) < nr_cpu_ids;
}

/**
@@ -626,9 +621,7 @@ out:

static void cpuidle_coupled_update_online_cpus(struct cpuidle_coupled *coupled)
{
- cpumask_t cpus;
- cpumask_and(&cpus, cpu_online_mask, &coupled->coupled_cpus);
- coupled->online_count = cpumask_weight(&cpus);
+ coupled->online_count = cpumask_weight_and(cpu_online_mask, &coupled->coupled_cpus);
}

/**
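
Both cpuidle changes above reduce to operations on the AND of the inputs. A minimal single-word model, with hypothetical names and the GCC/Clang builtin __builtin_popcountll() standing in for the kernel's bitmap weight code:

```c
#include <assert.h>
#include <stdbool.h>

/* Number of CPUs present in both masks, without storing the intersection. */
static inline unsigned int weight_and(unsigned long long a,
				      unsigned long long b)
{
	return (unsigned int)__builtin_popcountll(a & b);
}

/*
 * True if at least one CPU is online, coupled and has a poke pending.
 * The patch expresses this as "first CPU of the three-way AND < nr_cpu_ids".
 */
static inline bool any_pokes_pending(unsigned long long online,
				     unsigned long long coupled,
				     unsigned long long pending)
{
	return (online & coupled & pending) != 0;
}
```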

Subject: [tip: irq/core] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack

The following commit has been merged into the irq/core branch of tip:

Commit-ID: fcb8af4cbcd122e33ceeadd347b8866d32035af7
Gitweb: https://git.kernel.org/tip/fcb8af4cbcd122e33ceeadd347b8866d32035af7
Author: Dawei Li <[email protected]>
AuthorDate: Tue, 16 Apr 2024 16:54:50 +08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 24 Apr 2024 21:23:49 +02:00

irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack

In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Remove the on-stack cpumask variable and use cpumask_any_and() instead.

Signed-off-by: Dawei Li <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
drivers/irqchip/irq-gic-v3-its.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fca888b..20f9542 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3826,9 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
bool force)
{
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
- struct cpumask common, *table_mask;
+ unsigned int from, cpu = nr_cpu_ids;
+ struct cpumask *table_mask;
unsigned long flags;
- int from, cpu;

/*
* Changing affinity is mega expensive, so let's be as lazy as
@@ -3850,10 +3850,15 @@ static int its_vpe_set_affinity(struct irq_data *d,
* If we are offered another CPU in the same GICv4.1 ITS
* affinity, pick this one. Otherwise, any CPU will do.
*/
- if (table_mask && cpumask_and(&common, mask_val, table_mask))
- cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
- else
+ if (table_mask)
+ cpu = cpumask_any_and(mask_val, table_mask);
+ if (cpu < nr_cpu_ids) {
+ if (cpumask_test_cpu(from, mask_val) &&
+ cpumask_test_cpu(from, table_mask))
+ cpu = from;
+ } else {
cpu = cpumask_first(mask_val);
+ }

if (from == cpu)
goto out;
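
The reworked CPU choice in its_vpe_set_affinity() can be modelled with single-word masks as below. This is a sketch, not the driver code: NR_IDS, mask_first(), mask_test() and pick_vpe_cpu() are hypothetical names, and has_table models a non-NULL table_mask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IDS 64u	/* hypothetical stand-in for nr_cpu_ids */

/* Lowest set bit of a 64-bit mask, or NR_IDS if the mask is empty. */
static inline unsigned int mask_first(unsigned long long m)
{
	return m ? (unsigned int)__builtin_ctzll(m) : NR_IDS;
}

/* True if cpu is a valid index and its bit is set in m. */
static inline bool mask_test(unsigned int cpu, unsigned long long m)
{
	return cpu < NR_IDS && ((m >> cpu) & 1ULL);
}

static inline unsigned int pick_vpe_cpu(unsigned int from,
					unsigned long long mask_val,
					bool has_table,
					unsigned long long table_mask)
{
	unsigned int cpu = NR_IDS;

	if (has_table)
		cpu = mask_first(mask_val & table_mask);
	if (cpu < NR_IDS) {
		/* A common CPU exists; stay on the current one if it qualifies. */
		if (mask_test(from, mask_val) && mask_test(from, table_mask))
			cpu = from;
	} else {
		/* No table or no common CPU: any CPU from mask_val will do. */
		cpu = mask_first(mask_val);
	}
	return cpu;
}
```

Note that table_mask is only tested after a common CPU has been found, which in the driver implies that the pointer is non-NULL.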