2015-02-26 06:20:07

by Oleg Drokin

Subject: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK

From: Oleg Drokin <[email protected]>

I just got a report today from Tyson Whitehead <[email protected]>
that Lustre crashes when CPUMASK_OFFSTACK is enabled.

A little investigation revealed that this code:
cpumask_t mask;
...
cpumask_copy(&mask, topology_thread_cpumask(0));
weight = cpus_weight(mask);

that was supposed to calculate the number of cpu siblings/partitions was
returning a crazy high number, over 3000, which is impossible since I only
have 8 cpu cores on my system.

So after a bit of digging I found out that:
cpumask_copy only copies up to nr_cpumask_bits (the actual number of cpus
I have in the system), whereas cpus_weight actually counts bits all the
way up to NR_CPUS.

Not only is counting up to NR_CPUS wasteful here (since we know how many
cpus the system has, it makes sense to count only that many bits anyway),
it is also wrong: the copy only filled the first 8 bits of our variable,
and the rest of it is random stack garbage.

So I propose two patches here. The first one, which I am more certain
about, converts operations on the current cpu set - cpus_weight, but also
cpus_empty and the cpus_$LOGICALop / cpus_$BINARYop families - from
NR_CPUS to nr_cpumask_bits (this is fine when CONFIG_CPUMASK_OFFSTACK is
not set, as nr_cpumask_bits is then defined to NR_CPUS anyway).
I am leaving __cpus_setall and __cpus_clear out of it, as these two look
like they deal with the entire set, and it would be useful for them to
operate on all NR_CPUS bits in case more cpus are added later.

The second patch, which I am not sure we want, but which seems useful
until struct cpumask is fully dynamic, converts what look like whole-set
operations, e.g. copies - namely cpumask_setall, cpumask_clear and
cpumask_copy - to always operate on NR_CPUS bits, to ensure there is no
stale garbage left in the mask should the cpu count increase later.

I checked the code, and allocating cpumasks on the stack is not all that
uncommon, so this should be a worthwhile fix.

Please consider.

Oleg Drokin (2):
cpumask: Properly calculate cpumask values
cpumask: make whole cpumask operations like copy to work with NR_CPUS
bits

include/linux/cpumask.h | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)

--
2.1.0


2015-02-26 06:20:04

by Oleg Drokin

Subject: [PATCH 1/2] cpumask: Properly calculate cpumask values

From: Oleg Drokin <[email protected]>

With CONFIG_CPUMASK_OFFSTACK enabled there is a disparity between the
theoretical maximum number of CPUs in the system (NR_CPUS, which is huge)
and the actual value calculated at runtime (nr_cpu_ids).

Functions like cpus_weight should only check up to nr_cpu_ids bits in the
cpu mask, as there is no point in going all the way through 8192 bits of
theoretically possible CPUs.

Signed-off-by: Oleg Drokin <[email protected]>
---
include/linux/cpumask.h | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 086549a..f0599e1 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -902,21 +902,24 @@ static inline int __cpu_test_and_set(int cpu, cpumask_t *addr)
return test_and_set_bit(cpu, addr->bits);
}

-#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), NR_CPUS)
+#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), \
+ nr_cpumask_bits)
static inline int __cpus_and(cpumask_t *dstp, const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
return bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits);
}

-#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), NR_CPUS)
+#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), \
+ nr_cpumask_bits)
static inline void __cpus_or(cpumask_t *dstp, const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits);
}

-#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), NR_CPUS)
+#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), \
+ nr_cpumask_bits)
static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
@@ -924,48 +927,50 @@ static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
}

#define cpus_andnot(dst, src1, src2) \
- __cpus_andnot(&(dst), &(src1), &(src2), NR_CPUS)
+ __cpus_andnot(&(dst), &(src1), &(src2), \
+ nr_cpumask_bits)
static inline int __cpus_andnot(cpumask_t *dstp, const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
return bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
}

-#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), NR_CPUS)
+#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), nr_cpumask_bits)
static inline int __cpus_equal(const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
return bitmap_equal(src1p->bits, src2p->bits, nbits);
}

-#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), NR_CPUS)
+#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), \
+ nr_cpumask_bits)
static inline int __cpus_intersects(const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
return bitmap_intersects(src1p->bits, src2p->bits, nbits);
}

-#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), NR_CPUS)
+#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), nr_cpumask_bits)
static inline int __cpus_subset(const cpumask_t *src1p,
const cpumask_t *src2p, unsigned int nbits)
{
return bitmap_subset(src1p->bits, src2p->bits, nbits);
}

-#define cpus_empty(src) __cpus_empty(&(src), NR_CPUS)
+#define cpus_empty(src) __cpus_empty(&(src), nr_cpumask_bits)
static inline int __cpus_empty(const cpumask_t *srcp, unsigned int nbits)
{
return bitmap_empty(srcp->bits, nbits);
}

-#define cpus_weight(cpumask) __cpus_weight(&(cpumask), NR_CPUS)
+#define cpus_weight(cpumask) __cpus_weight(&(cpumask), nr_cpumask_bits)
static inline int __cpus_weight(const cpumask_t *srcp, unsigned int nbits)
{
return bitmap_weight(srcp->bits, nbits);
}

#define cpus_shift_left(dst, src, n) \
- __cpus_shift_left(&(dst), &(src), (n), NR_CPUS)
+ __cpus_shift_left(&(dst), &(src), (n), nr_cpumask_bits)
static inline void __cpus_shift_left(cpumask_t *dstp,
const cpumask_t *srcp, int n, int nbits)
{
--
2.1.0

2015-02-26 06:20:06

by Oleg Drokin

Subject: [PATCH 2/2] cpumask: make whole cpumask operations like copy to work with NR_CPUS bits

From: Oleg Drokin <[email protected]>

When we do things like cpumask_copy with CONFIG_CPUMASK_OFFSTACK set, we
only copy the actual number of bits, equal to the number of CPUs we have.
But the underlying allocations are sized for NR_CPUS = 8192 bits, so if
the cpumask is allocated on the stack or has other prefilled values,
there is a lot of garbage left that might become exposed as more CPUs are
added to the system.
This patch converts the whole-mask functions cpumask_setall,
cpumask_clear and cpumask_copy to operate on the whole NR_CPUS width.

Signed-off-by: Oleg Drokin <[email protected]>
---
include/linux/cpumask.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0599e1..28a8bb3 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -324,21 +324,21 @@ static inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask)
}

/**
- * cpumask_setall - set all cpus (< nr_cpu_ids) in a cpumask
+ * cpumask_setall - set all cpus (< NR_CPUS) in a cpumask
* @dstp: the cpumask pointer
*/
static inline void cpumask_setall(struct cpumask *dstp)
{
- bitmap_fill(cpumask_bits(dstp), nr_cpumask_bits);
+ bitmap_fill(cpumask_bits(dstp), NR_CPUS);
}

/**
- * cpumask_clear - clear all cpus (< nr_cpu_ids) in a cpumask
+ * cpumask_clear - clear all cpus (< NR_CPUS) in a cpumask
* @dstp: the cpumask pointer
*/
static inline void cpumask_clear(struct cpumask *dstp)
{
- bitmap_zero(cpumask_bits(dstp), nr_cpumask_bits);
+ bitmap_zero(cpumask_bits(dstp), NR_CPUS);
}

/**
@@ -470,7 +470,7 @@ static inline bool cpumask_full(const struct cpumask *srcp)

/**
* cpumask_weight - Count of bits in *srcp
- * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @srcp: the cpumask to count bits (< NR_CPUS) in.
*/
static inline unsigned int cpumask_weight(const struct cpumask *srcp)
{
@@ -511,7 +511,7 @@ static inline void cpumask_shift_left(struct cpumask *dstp,
static inline void cpumask_copy(struct cpumask *dstp,
const struct cpumask *srcp)
{
- bitmap_copy(cpumask_bits(dstp), cpumask_bits(srcp), nr_cpumask_bits);
+ bitmap_copy(cpumask_bits(dstp), cpumask_bits(srcp), NR_CPUS);
}

/**
--
2.1.0

2015-02-27 12:11:21

by Rusty Russell

Subject: Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK

[email protected] writes:
> From: Oleg Drokin <[email protected]>
>
> I just got a report today from Tyson Whitehead <[email protected]>
> that Lustre crashes when CPUMASK_OFFSTACK is enabled.
>
> A little investigation revealed that this code:
> cpumask_t mask;
> ...
> cpumask_copy(&mask, topology_thread_cpumask(0));
> weight = cpus_weight(mask);

Yes. cpumask_weight should have been used here. The old cpus_* are
deprecated.

> The second patch that I am not sure if we want, but it seems to be useful
> until struct cpumask is fully dynamic is to convert what looks like
> whole-set operations e.g. copies, namely:
> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS
> bits to ensure there's no stale garbage left in the mask should the
> cpu count increases later.

You can't do this, because dynamically allocated cpumasks don't have
NR_CPUS bits.

Let's just kill all the cpus_ functions. This wasn't done originally
because archs which didn't care about offstack cpumasks didn't want the
churn. In particular, they must not copy struct cpumask by assignment,
and fixing those is a fair bit of churn.

The following is the minimal fix:

Cheers,
Rusty.

CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS: set if CPUMASK_OFFSTACK.

Using these functions with offstack cpumasks is unsafe. They use all
NR_CPUS bits, instead of nr_cpumask_bits.

Signed-off-by: Rusty Russell <[email protected]>

diff --git a/lib/Kconfig b/lib/Kconfig
index 87da53bb1fef..51b4210f3da9 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -398,8 +398,7 @@ config CPUMASK_OFFSTACK
stack overflow.

config DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
- bool "Disable obsolete cpumask functions" if DEBUG_PER_CPU_MAPS
- depends on BROKEN
+ bool "Disable obsolete cpumask functions" if CPUMASK_OFFSTACK

config CPU_RMAP
bool

2015-02-27 17:52:21

by Oleg Drokin

Subject: Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK

Hello!

On Feb 27, 2015, at 6:46 AM, Rusty Russell wrote:

> [email protected] writes:
>> From: Oleg Drokin <[email protected]>
>>
>> I just got a report today from Tyson Whitehead <[email protected]>
>> that Lustre crashes when CPUMASK_OFFSTACK is enabled.
>>
>> A little investigation revealed that this code:
>> cpumask_t mask;
>> ...
>> cpumask_copy(&mask, topology_thread_cpumask(0));
>> weight = cpus_weight(mask);
> Yes. cpumask_weight should have been used here. The old cpus_* are
> deprecated.

Oh! I guess we missed the announcement.
I'll convert it over.

Should I also do a patch converting all other users and removing the deprecated
functions while I am at it?

>> The second patch that I am not sure if we want, but it seems to be useful
>> until struct cpumask is fully dynamic is to convert what looks like
>> whole-set operations e.g. copies, namely:
>> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS
>> bits to ensure there's no stale garbage left in the mask should the
>> cpu count increases later.
> You can't do this, because dynamically allocated cpumasks don't have
> NR_CPUS bits.

Well, right now they certainly do. As in, it's a static define, and we
allocate a bitmap that fits (in my case) up to 8192 bits into such
off-stack masks.

The concern is that since the number of cpus is not really fixed, if you
initialize a cpumask and then the number of cpus grows, the end of the
mask could be garbage. Am I overthinking this, and it's not really a
problem?

> Let's just kill all the cpus_ functions. This wasn't done originally
> because archs which didn't care about offline cpumasks didn't want the
> churn. In particular, they must not copy struct cpumask by assignment,
> and fixing those is a fair bit of churn.

Ah, copy masks by assignment, I see.
Well, I guess we can eliminate the users outside of the affected arch trees
(I assume in x86 there would be no objections?) and perhaps add a warning to
checkpatch.pl?

Thanks!

Bye,
Oleg-