This patch fixes some obviously incorrect bitmask truncations in 2.4.20.
diff -Nru a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h Mon Dec 16 11:58:42 2002
+++ b/include/linux/sched.h Mon Dec 16 11:58:42 2002
@@ -482,8 +482,8 @@
policy: SCHED_OTHER, \
mm: NULL, \
active_mm: &init_mm, \
- cpus_runnable: -1, \
- cpus_allowed: -1, \
+ cpus_runnable: ~0UL, \
+ cpus_allowed: ~0UL, \
run_list: LIST_HEAD_INIT(tsk.run_list), \
next_task: &tsk, \
prev_task: &tsk, \
diff -Nru a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c Mon Dec 16 11:58:42 2002
+++ b/kernel/sched.c Mon Dec 16 11:58:42 2002
@@ -116,7 +116,7 @@
#define idle_task(cpu) (init_tasks[cpu_number_map(cpu)])
#define can_schedule(p,cpu) \
- ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
+ ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
#else
@@ -359,7 +359,7 @@
if (task_on_runqueue(p))
goto out;
add_to_runqueue(p);
- if (!synchronous || !(p->cpus_allowed & (1 << smp_processor_id())))
+ if (!synchronous || !(p->cpus_allowed & (1UL << smp_processor_id())))
reschedule_idle(p);
success = 1;
out:
On Mon, Dec 16, 2002 at 12:13:29PM -0700, Bjorn Helgaas wrote:
> This patch fixes some obviously incorrect bitmask truncations in 2.4.20.
Linus, this is the 2.5.x version of the same patch originally by Bjorn
for 2.4.x. This fixes an entire class of critical 64-bit bugs.
Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.
Thanks,
Bill
Fix task->cpus_allowed bitmask truncations. Originally due to
Bjorn Helgaas for 2.4.x.
include/linux/init_task.h | 2 +-
kernel/module.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
===== include/linux/init_task.h 1.19 vs edited =====
--- 1.19/include/linux/init_task.h Sun Sep 29 07:02:55 2002
+++ edited/include/linux/init_task.h Fri Dec 20 02:22:04 2002
@@ -63,7 +63,7 @@
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
.policy = SCHED_NORMAL, \
- .cpus_allowed = -1, \
+ .cpus_allowed = ~0UL, \
.mm = NULL, \
.active_mm = &init_mm, \
.run_list = LIST_HEAD_INIT(tsk.run_list), \
===== kernel/module.c 1.31 vs edited =====
--- 1.31/kernel/module.c Sun Dec 1 22:44:11 2002
+++ edited/kernel/module.c Fri Dec 20 02:19:53 2002
@@ -210,7 +210,7 @@
struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
setscheduler(current->pid, SCHED_FIFO, ¶m);
#endif
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);
/* Ack: we are alive */
atomic_inc(&stopref_thread_ack);
@@ -271,7 +271,7 @@
/* FIXME: racy with set_cpus_allowed. */
old_allowed = current->cpus_allowed;
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);
atomic_set(&stopref_thread_ack, 0);
stopref_num_threads = 0;
On Mon, Dec 16, 2002 at 12:13:29PM -0700, Bjorn Helgaas wrote:
>> This patch fixes some obviously incorrect bitmask truncations in 2.4.20.
On Fri, Dec 20, 2002 at 02:30:28AM -0800, William Lee Irwin III wrote:
> Linus, this is the 2.5.x version of the same patch originally by Bjorn
> for 2.4.x. This fixes an entire class of critical 64-bit bugs.
> Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.
Actually, this looks like a non-issue from userspace on the IA64 boxen
I can get to. akpm pointed out that this seemed to pass his testing,
and on deeper inspection, IA64 userspace did not find this to be an issue.
Bjorn, could you explain on what toolchains and/or architectures you had
this issue? It sounds serious and/or real enough yet I can't actually
make this happen myself.
Thanks,
Bill
William Lee Irwin III <[email protected]> writes:
|> ===== include/linux/init_task.h 1.19 vs edited =====
|> --- 1.19/include/linux/init_task.h Sun Sep 29 07:02:55 2002
|> +++ edited/include/linux/init_task.h Fri Dec 20 02:22:04 2002
|> @@ -63,7 +63,7 @@
|> .prio = MAX_PRIO-20, \
|> .static_prio = MAX_PRIO-20, \
|> .policy = SCHED_NORMAL, \
|> - .cpus_allowed = -1, \
|> + .cpus_allowed = ~0UL, \
This is useless. Assigning -1 to any unsigned type is garanteed to give
you all bits one, and with two's complement this also holds for any signed
type.
Andreas.
--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 N?rnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
On Friday 20 December 2002 4:15 am, William Lee Irwin III wrote:
> Actually, this looks like a non-issue from userspace on the IA64 boxen
> I can get to. akpm pointed out that this seemed to pass his testing,
> and on deeper inspection, IA64 userspace did not find this to be an issue.
>
> Bjorn, could you explain on what toolchains and/or architectures you had
> this issue? It sounds serious and/or real enough yet I can't actually
> make this happen myself.
This was an issue with gcc 2.96 on a 64-way IA64 box. I don't have
access to one at the moment, but as I remember, without the 2.4 changes:
- ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
+ ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
nothing would get scheduled on CPUs 32-63. I guess those changes
aren't controversial, though.
The question of whether this was strictly necessary:
- cpus_runnable: -1, \
- cpus_allowed: -1, \
+ cpus_runnable: ~0UL, \
+ cpus_allowed: ~0UL, \
I don't specifically recall, and a quick test suggests that it really
doesn't matter. Since cpus_runnable and cpus_allowed are declared
unsigned long, I think ~0UL is a more direct expression of what is
desired, but maybe that's just a personal preference.
Bjorn
William Lee Irwin III <[email protected]> writes:
|> ===== include/linux/init_task.h 1.19 vs edited =====
|> --- 1.19/include/linux/init_task.h Sun Sep 29 07:02:55 2002
|> +++ edited/include/linux/init_task.h Fri Dec 20 02:22:04 2002
|> @@ -63,7 +63,7 @@
|> .prio = MAX_PRIO-20, \
|> .static_prio = MAX_PRIO-20, \
|> .policy = SCHED_NORMAL, \
|> - .cpus_allowed = -1, \
|> + .cpus_allowed = ~0UL, \
On Fri, Dec 20, 2002 at 01:17:24PM +0100, Andreas Schwab wrote:
> This is useless. Assigning -1 to any unsigned type is garanteed to give
> you all bits one, and with two's complement this also holds for any signed
> type.
Not so on all gcc versions. The rest of the world can figure out what
to do about the versions that do not.
Bill
On Fri, 2002-12-20 at 04:17, Andreas Schwab wrote:
> This is useless. Assigning -1 to any unsigned type is garanteed to give
> you all bits one, and with two's complement this also holds for any signed
> type.
Only if the -1 is the same size as the unsigned type. Otherwise it will
be 0-extended.
J
Jeremy Fitzhardinge <[email protected]> writes:
|> On Fri, 2002-12-20 at 04:17, Andreas Schwab wrote:
|> > This is useless. Assigning -1 to any unsigned type is garanteed to give
|> > you all bits one, and with two's complement this also holds for any signed
|> > type.
|>
|> Only if the -1 is the same size as the unsigned type. Otherwise it will
|> be 0-extended.
Wrong. Unsigned arithmetics is defined as modulo MAX+1, and -1 equals MAX
modulo MAX+1 for every MAX.
Andreas.
--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 N?rnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
> Linus, this is the 2.5.x version of the same patch originally by Bjorn
> for 2.4.x. This fixes an entire class of critical 64-bit bugs.
>
> Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.
Here's one in 2.5, found when adding 64 CPU support to ppc64.
Anton
===== kernel/module.c 1.31 vs edited =====
--- 1.31/kernel/module.c Mon Dec 2 17:44:11 2002
+++ edited/kernel/module.c Tue Dec 17 10:29:27 2002
@@ -210,7 +210,7 @@
struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
setscheduler(current->pid, SCHED_FIFO, ¶m);
#endif
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);
/* Ack: we are alive */
atomic_inc(&stopref_thread_ack);
@@ -271,7 +271,7 @@
/* FIXME: racy with set_cpus_allowed. */
old_allowed = current->cpus_allowed;
- set_cpus_allowed(current, 1 << (unsigned long)cpu);
+ set_cpus_allowed(current, 1UL << (unsigned long)cpu);
atomic_set(&stopref_thread_ack, 0);
stopref_num_threads = 0;
At some point in the past, I wrote:
>> Linus, this is the 2.5.x version of the same patch originally by Bjorn
>> for 2.4.x. This fixes an entire class of critical 64-bit bugs.
>> Against 2.5.52-bk as of 2:25AM 20 Dec 2002. Please apply.
On Sat, Dec 21, 2002 at 09:57:14AM +1100, Anton Blanchard wrote:
> Here's one in 2.5, found when adding 64 CPU support to ppc64.
> Anton
That was already included in the patch I sent, which was for 2.5.x.
I'll take this to mean the problem isn't solved in modern toolchains.
Linus, I think that pretty much settles it.
Bill
> That was already included in the patch I sent, which was for 2.5.x.
OK, I only read through the 2.4 patch.
Anton
On Fri, 2002-12-20 at 17:00, Bjorn Helgaas wrote:
> This was an issue with gcc 2.96 on a 64-way IA64 box. I don't have
> access to one at the moment, but as I remember, without the 2.4 changes:
>
> - ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
> + ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
>
> nothing would get scheduled on CPUs 32-63. I guess those changes
> aren't controversial, though.
Is this a C quirk or a compiler bug ?
On 25 Dec 2002, Alan Cox wrote:
> On Fri, 2002-12-20 at 17:00, Bjorn Helgaas wrote:
> > This was an issue with gcc 2.96 on a 64-way IA64 box. I don't have
> > access to one at the moment, but as I remember, without the 2.4 changes:
> >
> > - ((p)->cpus_runnable & (p)->cpus_allowed & (1 << cpu))
> > + ((p)->cpus_runnable & (p)->cpus_allowed & (1UL << cpu))
> >
> > nothing would get scheduled on CPUs 32-63. I guess those changes
> > aren't controversial, though.
>
> Is this a C quirk or a compiler bug ?
It's normal C behaviour. I wouldn't even call it a quirk.
1 << cpu
is clearly an int, and as such will have undefined behaviour for cpu >
bits-of-int.
The promotion to unsigned long happens _after_ the shift has already
happened as an int, since nothing in the sub-expression needs promotion
per se.
As undefined behaviour, the C compiler is _allowed_ to do what we meant,
but quite frankly, a C compiler that would take the undefined behaviour
case and turn it into the "unsigned long" behaviour we were looking for
would be quite interesting and almost certainly just random luck.
So if you want to test a bit in an "unsigned long", you should do either
(x >> bit) & 1
(where the shift will be on a "long", and the binary and will also be
automatically promoted to a unsigned long) or you should do
x & (1UL << bit)
which is what the patch did (it's applied in 2.5.53 already, btw).
Linus