2003-03-07 08:20:04

by Martin J. Bligh

Subject: NUMA scheduler broken

Something in current -bk breaks the NUMA scheduler.

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
Kernel          Elapsed (s)  System (s)  User (s)  CPU (%)
2.5.64-align          46.29      129.38    567.02  1504.75
2.5.64-bk            153.04       49.05    558.30   395.00

Looks like it's not node-balancing at all and just using one node.
However, I spent a few minutes looking, and can't see anything obvious
that would have caused it.

Will look at it some more tomorrow when I wake up, but if any of you
can see what's broken off the top of your head ...

M.



2003-03-07 11:41:01

by Rick Lindsley

Subject: Re: NUMA scheduler broken

Looks like __activate_task() should call nr_running_inc(rq) rather than
rq->nr_running++, and the same in wake_up_forked_process(). My guess
is that the bogus node_nr_running value is causing some really poor
scheduling decisions to be made on NUMA. See if that changes your result.

Rick

2003-03-07 12:31:49

by Ingo Molnar

Subject: Re: NUMA scheduler broken


On Fri, 7 Mar 2003, Rick Lindsley wrote:

> Looks like __activate_task() should call nr_running_inc(rq) rather than
> rq->nr_running++, and the same in wake_up_forked_process(). My guess is
> that the bogus node_nr_running value is causing some really poor
> scheduling decisions to be made on NUMA. See if that changes your
> result.

indeed. The attached patch (against BK-curr) fixes this.

Ingo

--- kernel/sched.c.orig 2003-03-07 13:40:53.000000000 +0100
+++ kernel/sched.c 2003-03-07 13:41:19.000000000 +0100
@@ -325,7 +325,7 @@
 static inline void __activate_task(task_t *p, runqueue_t *rq)
 {
 	enqueue_task(p, rq->active);
-	rq->nr_running++;
+	nr_running_inc(rq);
 }
 
 static inline void activate_task(task_t *p, runqueue_t *rq)
@@ -545,7 +545,7 @@
 		list_add_tail(&p->run_list, &current->run_list);
 		p->array = current->array;
 		p->array->nr_active++;
-		rq->nr_running++;
+		nr_running_inc(rq);
 	}
 	task_rq_unlock(rq, &flags);
 }

2003-03-07 16:13:38

by Martin J. Bligh

Subject: Re: NUMA scheduler broken

>> Looks like __activate_task() should call nr_running_inc(rq) rather than
>> rq->nr_running++, and the same in wake_up_forked_process(). My guess is
>> that the bogus node_nr_running value is causing some really poor
>> scheduling decisions to be made on NUMA. See if that changes your
>> result.

Yay! That fixes it. Nice catch, Rick.

> indeed. The attached patch (against BK-curr) fixes this.

Thanks, Ingo. Linus, could you add that one?

Thanks,

M.
