Something in current -bk breaks the NUMA scheduler.
Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
               Elapsed (s)  System (s)  User (s)  CPU (%)
2.5.64-align         46.29      129.38    567.02  1504.75
2.5.64-bk           153.04       49.05    558.30   395.00
Looks like it's not node-balancing at all and just using one node.
However, I spent a few minutes looking, and can't see anything obvious
that would have caused it.
Will look at it some more tomorrow when I wake up, but if any of you
can see what's broken off the top of your head ...
M.
Looks like __activate_task() should call nr_running_inc(rq) rather than
rq->nr_running++, and the same in wake_up_forked_process(). My guess
is that the bogus node_nr_running value is causing some really poor
scheduling decisions to be made on NUMA. See if that changes your result.
Rick
On Fri, 7 Mar 2003, Rick Lindsley wrote:
> Looks like __activate_task() should call nr_running_inc(rq) rather than
> rq->nr_running++, and the same in wake_up_forked_process(). My guess is
> that the bogus node_nr_running value is causing some really poor
> scheduling decisions to be made on NUMA. See if that changes your
> result.
indeed. The attached patch (against BK-curr) fixes this.
Ingo
--- kernel/sched.c.orig 2003-03-07 13:40:53.000000000 +0100
+++ kernel/sched.c 2003-03-07 13:41:19.000000000 +0100
@@ -325,7 +325,7 @@
 static inline void __activate_task(task_t *p, runqueue_t *rq)
 {
 	enqueue_task(p, rq->active);
-	rq->nr_running++;
+	nr_running_inc(rq);
 }
 
 static inline void activate_task(task_t *p, runqueue_t *rq)
@@ -545,7 +545,7 @@
 		list_add_tail(&p->run_list, &current->run_list);
 		p->array = current->array;
 		p->array->nr_active++;
-		rq->nr_running++;
+		nr_running_inc(rq);
 	}
 	task_rq_unlock(rq, &flags);
 }
>> Looks like __activate_task() should call nr_running_inc(rq) rather than
>> rq->nr_running++, and the same in wake_up_forked_process(). My guess is
>> that the bogus node_nr_running value is causing some really poor
>> scheduling decisions to be made on NUMA. See if that changes your
>> result.
Yay! That fixes it. Nice catch, Rick.
> indeed. The attached patch (against BK-curr) fixes this.
Thanks Ingo ... Linus, could you add that one?
Thanks,
M.