2021-10-28 07:08:29

by Jiasheng Jiang

[permalink] [raw]
Subject: [PATCH v3] cpumask: Fix implicit type conversion

The descriptions of the iterator macros in `include/linux/cpumask.h`
say the variable 'cpu' can be an int, whose value ranges from (-2^31)
to (2^31 - 1).
However, inside for_each_cpu(), both 'nr_cpu_ids' and the return value
of cpumask_next() are unsigned int, whose values range from 0 to
(2^32 - 1).
If cpumask_next() returns (2^31), the check 'cpu < nr_cpu_ids' is still
satisfied, because 'cpu' is promoted to unsigned int for the
comparison, yet the actual signed value of 'cpu' is (-2^31).
Take amd_pmu_cpu_starting() in `arch/x86/events/amd/core.c` as an
example: when 'cpu' is (-2^31), per_cpu() ends up accessing
__per_cpu_offset[-2^31], which is an out-of-bounds array access.
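
The following stand-alone user-space sketch (hypothetical values, not
the kernel macro itself) illustrates the promotion described above:

  #include <stdio.h>

  int main(void)
  {
          /* Hypothetical values, chosen to mirror the scenario above. */
          unsigned int nr_cpu_ids = 0x80000001u; /* pretend > 2^31 CPUs */
          int cpu = (int)0x80000000u;  /* cpumask_next() "returned" 2^31 */

          /*
           * For the comparison, 'cpu' is promoted to unsigned int, so
           * the bound check passes even though its signed value is
           * negative.
           */
          if (cpu < nr_cpu_ids) {
                  /*
                   * Prints -2147483648; using that value as an array
                   * index, the way per_cpu() effectively does, would
                   * be far out of bounds.
                   */
                  printf("loop body entered with cpu = %d\n", cpu);
          }

          return 0;
  }

(Compilers can flag this: gcc's -Wsign-compare, enabled by -Wextra,
warns about exactly this kind of comparison.)
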
Moreover, a negative CPU number does not make sense and can easily
cause trouble elsewhere.
Implicit type conversions like this are widely regarded as error-prone,
and documenting the iterator correctly sets a good example for others.
Thus, it seems better to fix the macro descriptions of 'cpu' and deal
with all the existing issues.

Fixes: c743f0a ("sched/fair, cpumask: Export for_each_cpu_wrap()")
Fixes: 8bd93a2 ("rcu: Accelerate grace period if last non-dynticked CPU")
Fixes: 984f2f3 ("cpumask: introduce new API, without changing anything, v3")
Signed-off-by: Jiasheng Jiang <[email protected]>
---
include/linux/cpumask.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index bfc4690..5db1d9d 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -232,7 +232,7 @@ int cpumask_any_distribute(const struct cpumask *srcp);

/**
* for_each_cpu - iterate over every cpu in a mask
- * @cpu: the (optionally unsigned) integer iterator
+ * @cpu: the unsigned integer iterator
* @mask: the cpumask pointer
*
* After the loop, cpu is >= nr_cpu_ids.
@@ -244,7 +244,7 @@ int cpumask_any_distribute(const struct cpumask *srcp);

/**
* for_each_cpu_not - iterate over every cpu in a complemented mask
- * @cpu: the (optionally unsigned) integer iterator
+ * @cpu: the unsigned integer iterator
* @mask: the cpumask pointer
*
* After the loop, cpu is >= nr_cpu_ids.
@@ -258,7 +258,7 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool

/**
* for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location
- * @cpu: the (optionally unsigned) integer iterator
+ * @cpu: the unsigned integer iterator
* @mask: the cpumask poiter
* @start: the start location
*
@@ -273,7 +273,7 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool

/**
* for_each_cpu_and - iterate over every cpu in both masks
- * @cpu: the (optionally unsigned) integer iterator
+ * @cpu: the unsigned integer iterator
* @mask1: the first cpumask pointer
* @mask2: the second cpumask pointer
*
--
2.7.4


2021-10-28 10:53:18

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH v3] cpumask: Fix implicit type conversion

On 28/10/21 07:06, Jiasheng Jiang wrote:
> The descriptions of the iterator macros in `include/linux/cpumask.h`
> say the variable 'cpu' can be an int, whose value ranges from (-2^31)
> to (2^31 - 1).
> However, inside for_each_cpu(), both 'nr_cpu_ids' and the return value
> of cpumask_next() are unsigned int, whose values range from 0 to
> (2^32 - 1).
> If cpumask_next() returns (2^31), the check 'cpu < nr_cpu_ids' is still
> satisfied, because 'cpu' is promoted to unsigned int for the
> comparison, yet the actual signed value of 'cpu' is (-2^31).
> Take amd_pmu_cpu_starting() in `arch/x86/events/amd/core.c` as an
> example: when 'cpu' is (-2^31), per_cpu() ends up accessing
> __per_cpu_offset[-2^31], which is an out-of-bounds array access.
> Moreover, a negative CPU number does not make sense and can easily
> cause trouble elsewhere.
> Implicit type conversions like this are widely regarded as error-prone,
> and documenting the iterator correctly sets a good example for others.
> Thus, it seems better to fix the macro descriptions of 'cpu' and deal
> with all the existing issues.
>

AFAIA the upper bounds for NR_CPUS are around 2^12 (arm64) and 2^13 (x86);
I don't think we're anywhere near supporting such massive systems.

I got curious and had a look at the size of .data..percpu on a defconfig
arm64 kernel - I get roughly 40KB. So purely on the percpu data side of
things, we're talking about 100TB of RAM...
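
Back-of-the-envelope (my arithmetic, assuming that ~40KB figure and the
full 2^31 CPUs needed before the overflow can trigger):

    2^31 CPUs * ~40KB/CPU ~= 2^31 * 4 * 10^4 B ~= 8.6 * 10^13 B

i.e. somewhere around 80-90TB of static per-CPU data alone.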

Trying to improve the code is laudable, but I don't see much incentive in
the churn ATM.

> Fixes: c743f0a ("sched/fair, cpumask: Export for_each_cpu_wrap()")
> Fixes: 8bd93a2 ("rcu: Accelerate grace period if last non-dynticked CPU")
> Fixes: 984f2f3 ("cpumask: introduce new API, without changing anything, v3")

Where's the v1->v2->v3 changelog? This is merely fiddling with doc headers,
what's being fixed here?