On SMP Alpha, a saturated system (with one busy userland process per
CPU) becomes almost completely unusable. The busy loops don't have to
be doing anything more complicated than "while(1) ;". The symptom is
that even logging into the system (with ssh) takes on the order of 30
seconds. (Tested on an ES40 with 4 processors, 2.4.18.)
It turns out that the problem is in kernel/schedule.c:reschedule_idle.
cycles_t oldest_idle;
...
oldest_idle = (cycles_t) -1;
...
if (oldest_idle == -1ULL) {
Since asm-alpha/timex.h defines cycles_t as unsigned int, this
comparison is always false. Changing it to (cycles_t)-1 fixes the
problem.
Patch below. I'd like this to go into 2.4.22, if nobody has a problem
with that. I would like confirmation from other eyes -- this patch
doesn't break semantics on any architecture, does it?
-andy
--- linux-2.4.21/kernel/sched.c Thu Jul 17 16:43:37 2003
+++ linux-2.4.21-idle-fix/kernel/sched.c Thu Jul 17 16:23:25 2003
@@ -282,7 +282,7 @@
target_tsk = tsk;
}
} else {
- if (oldest_idle == -1ULL) {
+ if (oldest_idle == (cycles_t)-1) {
int prio = preemption_goodness(tsk, p, cpu);
if (prio > max_prio) {
@@ -294,7 +294,7 @@
}
tsk = target_tsk;
if (tsk) {
- if (oldest_idle != -1ULL) {
+ if (oldest_idle != (cycles_t)-1) {
best_cpu = tsk->processor;
goto send_now_idle;
}
On Thu, Jul 17, 2003 at 04:51:39PM -0500, Andy Isaacson wrote:
> Since asm-alpha/timex.h defines cycles_t as unsigned int, this
> comparison is always false. Changing it to (cycles_t)-1 fixes the
> problem.
Nice catch, Andy. Thanks.
Ivan.
On Fri, Jul 18, 2003 at 02:20:43AM +0400, Ivan Kokshaysky wrote:
> On Thu, Jul 17, 2003 at 04:51:39PM -0500, Andy Isaacson wrote:
> > Since asm-alpha/timex.h defines cycles_t as unsigned int, this
> > comparison is always false. Changing it to (cycles_t)-1 fixes the
> > problem.
>
> Nice catch, Andy. Thanks.
Wasn't me, it was Bruce Keller (but I don't know if he wants his email
address plastered all over linux-kernel). I just cleaned up and
prepared a patch with bug description.
-andy