2003-05-06 16:01:01

by Hiroshi Inoue

Subject: 2.4.20: scheduler issue: bad scheduling latency case

Hi,

I have found a case in the task scheduler of kernel 2.4.20 that can cause
a bad scheduling latency of up to 10 msec (1/HZ sec) on SMP machines.

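In schedule(), if another CPU in the system sets the "need_resched" flag
of the task struct to request rescheduling while this CPU is inside the
section shown below, the request can be lost: prev->need_resched has
already been cleared at the start of the section, and prev is then
switched out without the flag being checked again.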


		case TASK_RUNNING:;
	}
	prev->need_resched = 0;		/* <-- begin section */

	/*
	 * this is the scheduler proper:
	 */

	(omitted)

	/*
	 * from this point on nothing can prevent us from
	 * switching to the next task, save this fact in
	 * sched_data.
	 */
	sched_data->curr = next;	/* <-- end section */
	task_set_cpu(next, this_cpu);


This case appears to be very rare, but I observed it several times while
compiling a Linux kernel in my environment (a machine with 2 logical CPUs
provided by a Hyper-Threading-enabled processor).

A simple patch for this issue is attached.
Does it make sense?



diff -Nru linux-2.4.20-orig/kernel/sched.c linux-2.4.20/kernel/sched.c
--- linux-2.4.20-orig/kernel/sched.c	Fri Nov 29 08:53:15 2002
+++ linux-2.4.20/kernel/sched.c	Fri Apr 11 16:04:34 2003
@@ -625,6 +625,11 @@
 		goto repeat_schedule;
 	}
 
+	if (unlikely(prev->need_resched)) {
+		prev->need_resched = 0;
+		goto repeat_schedule;
+	}
+
 	/*
 	 * from this point on nothing can prevent us from
 	 * switching to the next task, save this fact in



Regards,
Hiroshi Inoue <[email protected]>