2002-12-16 19:05:34

by Bjorn Helgaas

[permalink] [raw]
Subject: [PATCH] Fix CPU bitmask truncation

This patch fixes some obviously incorrect bitmask truncations in 2.4.20.

diff -Nru a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h Mon Dec 16 11:58:42 2002
+++ b/include/linux/sched.h Mon Dec 16 11:58:42 2002
@@ -482,8 +482,8 @@
policy: SCHED_OTHER, \
mm: NULL, \
active_mm: &init_mm, \
- cpus_runnable: -1, \
- cpus_allowed: -1, \
+ cpus_runnable: ~0UL, \
+ cpus_allowed: ~0UL, \
run_list: LIST_HEAD_INIT(tsk.run_list), \
next_task: &tsk, \
prev_task: &tsk, \
diff -Nru a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c Mon Dec 16 11:58:42 2002
+++ b/kernel/sched.c Mon Dec 16 11:58:42 2002
@@ -116,7 +116,7 @@

#define idle_task(cpu) (init_tasks[cpu_number_map(cpu)])
#define can_schedule(p,cpu) \
- ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
+ ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))

#else

@@ -359,7 +359,7 @@
if (task_on_runqueue(p))
goto out;
add_to_runqueue(p);
- if (!synchronous || !(p->cpus_allowed & (1 << smp_processor_id())))
+ if (!synchronous || !(p->cpus_allowed & (1UL << smp_processor_id())))
reschedule_idle(p);
success = 1;
out:


2002-12-20 10:23:05

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

On Mon, Dec 16, 2002 at 12:13:29PM -0700, Bjorn Helgaas wrote:
> This patch fixes some obviously incorrect bitmask truncations in 2.4.20.

Linus, this is the 2.5.x version of the same patch originally by Bjorn
for 2.4.x. This fixes an entire class of critical 64-bit bugs.

Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.


Thanks,
Bill

Fix task->cpus_allowed bitmask truncations. Originally due to
Bjorn Helgaas for 2.4.x.

include/linux/init_task.h | 2 +-
kernel/module.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)


===== include/linux/init_task.h 1.19 vs edited =====
--- 1.19/include/linux/init_task.h Sun Sep 29 07:02:55 2002
+++ edited/include/linux/init_task.h Fri Dec 20 02:22:04 2002
@@ -63,7 +63,7 @@
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
.policy = SCHED_NORMAL, \
- .cpus_allowed = -1, \
+ .cpus_allowed = ~0UL, \
.mm = NULL, \
.active_mm = &init_mm, \
.run_list = LIST_HEAD_INIT(tsk.run_list), \
===== kernel/module.c 1.31 vs edited =====
--- 1.31/kernel/module.c Sun Dec 1 22:44:11 2002
+++ edited/kernel/module.c Fri Dec 20 02:19:53 2002
@@ -210,7 +210,7 @@
struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
setscheduler(current->pid, SCHED_FIFO, &param);
#endif
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);

/* Ack: we are alive */
atomic_inc(&stopref_thread_ack);
@@ -271,7 +271,7 @@

/* FIXME: racy with set_cpus_allowed. */
old_allowed = current->cpus_allowed;
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);

atomic_set(&stopref_thread_ack, 0);
stopref_num_threads = 0;

2002-12-20 11:07:56

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

On Mon, Dec 16, 2002 at 12:13:29PM -0700, Bjorn Helgaas wrote:
>> This patch fixes some obviously incorrect bitmask truncations in 2.4.20.

On Fri, Dec 20, 2002 at 02:30:28AM -0800, William Lee Irwin III wrote:
> Linus, this is the 2.5.x version of the same patch originally by Bjorn
> for 2.4.x. This fixes an entire class of critical 64-bit bugs.
> Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.

Actually, this looks like a non-issue from userspace on the IA64 boxen
I can get to. akpm pointed out that this seemed to pass his testing,
and on deeper inspection, IA64 userspace did not find this to be an issue.

Bjorn, could you explain on what toolchains and/or architectures you had
this issue? It sounds serious and/or real enough yet I can't actually
make this happen myself.


Thanks,
Bill

2002-12-20 12:09:23

by Andreas Schwab

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

William Lee Irwin III <[email protected]> writes:

|> ===== include/linux/init_task.h 1.19 vs edited =====
|> --- 1.19/include/linux/init_task.h Sun Sep 29 07:02:55 2002
|> +++ edited/include/linux/init_task.h Fri Dec 20 02:22:04 2002
|> @@ -63,7 +63,7 @@
|> .prio = MAX_PRIO-20, \
|> .static_prio = MAX_PRIO-20, \
|> .policy = SCHED_NORMAL, \
|> - .cpus_allowed = -1, \
|> + .cpus_allowed = ~0UL, \

This is useless. Assigning -1 to any unsigned type is garanteed to give
you all bits one, and with two's complement this also holds for any signed
type.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 N?rnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2002-12-20 16:52:30

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

On Friday 20 December 2002 4:15 am, William Lee Irwin III wrote:
> Actually, this looks like a non-issue from userspace on the IA64 boxen
> I can get to. akpm pointed out that this seemed to pass his testing,
> and on deeper inspection, IA64 userspace did not find this to be an issue.
>
> Bjorn, could you explain on what toolchains and/or architectures you had
> this issue? It sounds serious and/or real enough yet I can't actually
> make this happen myself.

This was an issue with gcc 2.96 on a 64-way IA64 box. I don't have
access to one at the moment, but as I remember, without the 2.4 changes:

- ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
+ ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))

nothing would get scheduled on CPUs 32-63. I guess those changes
aren't controversial, though.

The question of whether this was strictly necessary:

- cpus_runnable: -1, \
- cpus_allowed: -1, \
+ cpus_runnable: ~0UL, \
+ cpus_allowed: ~0UL, \

I don't specifically recall, and a quick test suggests that it really
doesn't matter. Since cpus_runnable and cpus_allowed are declared
unsigned long, I think ~0UL is a more direct expression of what is
desired, but maybe that's just a personal preference.

Bjorn

2002-12-20 17:05:18

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

William Lee Irwin III <[email protected]> writes:
|> ===== include/linux/init_task.h 1.19 vs edited =====
|> --- 1.19/include/linux/init_task.h Sun Sep 29 07:02:55 2002
|> +++ edited/include/linux/init_task.h Fri Dec 20 02:22:04 2002
|> @@ -63,7 +63,7 @@
|> .prio = MAX_PRIO-20, \
|> .static_prio = MAX_PRIO-20, \
|> .policy = SCHED_NORMAL, \
|> - .cpus_allowed = -1, \
|> + .cpus_allowed = ~0UL, \

On Fri, Dec 20, 2002 at 01:17:24PM +0100, Andreas Schwab wrote:
> This is useless. Assigning -1 to any unsigned type is garanteed to give
> you all bits one, and with two's complement this also holds for any signed
> type.

Not so on all gcc versions. The rest of the world can figure out what
to do about the versions that do not.


Bill

2002-12-20 18:15:15

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

On Fri, 2002-12-20 at 04:17, Andreas Schwab wrote:
> This is useless. Assigning -1 to any unsigned type is garanteed to give
> you all bits one, and with two's complement this also holds for any signed
> type.

Only if the -1 is the same size as the unsigned type. Otherwise it will
be 0-extended.

J

2002-12-20 19:52:24

by Andreas Schwab

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

Jeremy Fitzhardinge <[email protected]> writes:

|> On Fri, 2002-12-20 at 04:17, Andreas Schwab wrote:
|> > This is useless. Assigning -1 to any unsigned type is garanteed to give
|> > you all bits one, and with two's complement this also holds for any signed
|> > type.
|>
|> Only if the -1 is the same size as the unsigned type. Otherwise it will
|> be 0-extended.

Wrong. Unsigned arithmetics is defined as modulo MAX+1, and -1 equals MAX
modulo MAX+1 for every MAX.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 N?rnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2002-12-20 22:52:44

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation


> Linus, this is the 2.5.x version of the same patch originally by Bjorn
> for 2.4.x. This fixes an entire class of critical 64-bit bugs.
>
> Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.

Here's one in 2.5, found when adding 64 CPU support to ppc64.

Anton

===== kernel/module.c 1.31 vs edited =====
--- 1.31/kernel/module.c Mon Dec 2 17:44:11 2002
+++ edited/kernel/module.c Tue Dec 17 10:29:27 2002
@@ -210,7 +210,7 @@
struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
setscheduler(current->pid, SCHED_FIFO, &param);
#endif
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);

/* Ack: we are alive */
atomic_inc(&stopref_thread_ack);
@@ -271,7 +271,7 @@

/* FIXME: racy with set_cpus_allowed. */
old_allowed = current->cpus_allowed;
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);

atomic_set(&stopref_thread_ack, 0);
stopref_num_threads = 0;

2002-12-20 23:29:30

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

At some point in the past, I wrote:
>> Linus, this is the 2.5.x version of the same patch originally by Bjorn
>> for 2.4.x. This fixes an entire class of critical 64-bit bugs.
>> Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.

On Sat, Dec 21, 2002 at 09:57:14AM +1100, Anton Blanchard wrote:
> Here's one in 2.5, found when adding 64 CPU support to ppc64.
> Anton

That was already included in the patch I sent, which was for 2.5.x.
I'll take this to mean the problem isn't solved in modern toolchains.

Linus, I think that pretty much settles it.


Bill

2002-12-20 23:38:58

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation


> That was already included in the patch I sent, which was for 2.5.x.

OK, I only read through the 2.4 patch.

Anton

2002-12-25 21:38:04

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation

On Fri, 2002-12-20 at 17:00, Bjorn Helgaas wrote:
> This was an issue with gcc 2.96 on a 64-way IA64 box. I don't have
> access to one at the moment, but as I remember, without the 2.4 changes:
>
> - ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
> + ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
>
> nothing would get scheduled on CPUs 32-63. I guess those changes
> aren't controversial, though.

Is this a C quirk or a compiler bug ?

2002-12-25 21:44:53

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix CPU bitmask truncation



On 25 Dec 2002, Alan Cox wrote:

> On Fri, 2002-12-20 at 17:00, Bjorn Helgaas wrote:
> > This was an issue with gcc 2.96 on a 64-way IA64 box. I don't have
> > access to one at the moment, but as I remember, without the 2.4 changes:
> >
> > - ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
> > + ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
> >
> > nothing would get scheduled on CPUs 32-63. I guess those changes
> > aren't controversial, though.
>
> Is this a C quirk or a compiler bug ?

It's normal C behaviour. I wouldn't even call it a quirk.

1 << cpu

is clearly an int, and as such will have undefined behaviour for cpu >
bits-of-int.

The promotion to unsigned long happens _after_ the shift has already
happened as an int, since nothing in the sub-expression needs promotion
per se.

As undefined behaviour, the C compiler is _allowed_ to do what we meant,
but quite frankly, a C compiler that would take the undefined behaviour
case and turn it into the "unsigned long" behaviour we were looking for
would be quite interesting and almost certainly just random luck.

So if you want to test a bit in an "unsigned long", you should do either

(x >> bit) & 1

(where the shift will be on a "long", and the binary and will also be
automatically promoted to a unsigned long) or you should do

x & (1UL << bit)

which is what the patch did (it's applied in 2.5.53 already, btw).

Linus