Hi there,
grant@deltree:~$ uname -r
2.6.15.3a
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut -c-95
...
2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET /test/linux-2.6/tosh/ HTTP/
real 0m8.537s
user 0m0.970s
sys 0m1.100s
--> reboot to 2.4.32-hf32.2
grant@deltree:~$ uname -r
2.4.32-hf32.2
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut -c-95
...
2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET /test/linux-2.6/tosh/ HTTP/
real 0m2.271s
user 0m0.730s
sys 0m0.540s
Still a 4:1 slowdown. Machine .config and dmesg info:
http://bugsplatter.mine.nu/test/boxen/deltree/
While it is very nice to know I can run recent 2.6 kernels without a fuss,
2.4.latest feels better in this particular context.
This console sluggishness is noticeable enough on older hardware for me to
forgo exercising 2.6.latest.stable bugs for much time on it ;)
For those suffering deja vu, yes, I reported this last month (or, recently).
Grant.
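[Editorial note: to make the comparison repeatable without Grant's box, a hedged sketch using a synthetic log; /var/log/apache/access_log and the 192.168. filter are from the report above, the 1000-line log size is invented.]

```shell
# Build a synthetic access log so the pipeline from the report can be
# timed on any machine.  The pipeline itself is exactly the reported one.
log=$(mktemp)
i=0
while [ $i -lt 1000 ]; do
    echo '2006-02-08 12:38:13 +1100: host 192.168.0.1 "GET / HTTP/1.0"' >> "$log"
    echo '2006-02-08 12:38:13 +1100: host 193.196.182.215 "GET /x HTTP/1.0"' >> "$log"
    i=$((i + 1))
done
# Wrap each of these in time(1) interactively, once writing to the
# terminal and once with the trailing cat, to reproduce the comparison.
grep -v '192\.168\.' "$log" | cut -c-95       > /dev/null
grep -v '192\.168\.' "$log" | cut -c-95 | cat > /dev/null
# Sanity check: the filter keeps exactly the non-192.168. half.
kept=$(grep -vc '192\.168\.' "$log")
echo "$kept"
rm -f "$log"
```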
On Wed, Feb 08, 2006 at 01:11:49PM +1100, Grant Coady wrote:
> This console sluggishness is noticeable enough on older hardware for me to
> forgo exercising 2.6.latest.stable bugs for much time on it ;)
>
> For those suffering deja vu, yes, I reported this last month (or, recently).
This bug report is a bit vague in terms of what the problem is -- the
test case hits 3 major subsystems (io, vm, net), all of which have changed
rather substantially in the course of 2.6 development. Would it be possible
to profile the system using oprofile to get an idea what the hotspots are?
Have you compared basic hard disk throughput with hdparm, as well as
ensuring DMA is enabled with 32 bit io? What about testing network
performance with netperf (or a netcat of /dev/zero)? A few more data points
would be quite helpful. Cheers,
-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <[email protected]>.
On Wed, 8 Feb 2006 01:11 pm, Grant Coady wrote:
> Hi there,
>
> grant@deltree:~$ uname -r
> 2.6.15.3a
> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> -c-95 ...
> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> /test/linux-2.6/tosh/ HTTP/
>
> real 0m8.537s
> user 0m0.970s
> sys 0m1.100s
>
> --> reboot to 2.4.32-hf32.2
>
> grant@deltree:~$ uname -r
> 2.4.32-hf32.2
> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> -c-95 ...
> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> /test/linux-2.6/tosh/ HTTP/
>
> real 0m2.271s
> user 0m0.730s
> sys 0m0.540s
>
> Still a 4:1 slowdown, machine .config and dmesg info:
> http://bugsplatter.mine.nu/test/boxen/deltree/
What happens if you add "| cat" on the end of your command?
Cheers,
Con
On Tue, 7 Feb 2006 21:24:11 -0500, Benjamin LaHaise <[email protected]> wrote:
>On Wed, Feb 08, 2006 at 01:11:49PM +1100, Grant Coady wrote:
>> This console sluggishness is noticeable enough on older hardware for me to
>> forgo exercising 2.6.latest.stable bugs for much time on it ;)
>>
>> For those suffering deja vu, yes, I reported this last month (or, recently).
>
>This bug report is a bit vague in terms of what the problem is -- the
>test case hits 3 major subsystems (io, vm, net), all of which have changed
>rather substantially in the course of 2.6 development.
Vague 'cos I do not know where the problem is. One might say the
slowdown looks like a near-1ms delay per line of output, but it does
not correlate with the kernel tick frequency. :(
> Would it be possible
>to profile the system using oprofile to get an idea what the hotspots are?
Perhaps, I've yet to try that.
>Have you compared basic hard disk throughput with hdparm, as well as
>ensuring DMA is enabled with 32 bit io? What about testing network
>performance with netperf (or a netcat of /dev/zero)? A few more data points
>would be quite helpful.
Yes, the gross datapoints such as basic net I/O and disk I/O say 2.6 is
better/faster than 2.4 on this hardware; my problem is how to describe
the measured console slowdown in terms meaningful to you.
I'll take a look at oprofile, report back if I can make sense of it ;)
Grant.
On Wed, 8 Feb 2006 13:35:18 +1100, Con Kolivas <[email protected]> wrote:
>On Wed, 8 Feb 2006 01:11 pm, Grant Coady wrote:
>> Hi there,
>>
>> grant@deltree:~$ uname -r
>> 2.6.15.3a
>> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
>> -c-95 ...
>> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
>> /test/linux-2.6/tosh/ HTTP/
>>
>> real 0m8.537s
>> user 0m0.970s
>> sys 0m1.100s
>>
>> --> reboot to 2.4.32-hf32.2
>>
>> grant@deltree:~$ uname -r
>> 2.4.32-hf32.2
>> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
>> -c-95 ...
>> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
>> /test/linux-2.6/tosh/ HTTP/
>>
>> real 0m2.271s
>> user 0m0.730s
>> sys 0m0.540s
>>
>> Still a 4:1 slowdown, machine .config and dmesg info:
>> http://bugsplatter.mine.nu/test/boxen/deltree/
>
>What happens if you add "| cat" on the end of your command?
It gets faster with 2.4.32-hf32.2 by a little bit (I forgot to copy)
reboot to 2.6.15.3a, without...
real 0m8.737s
user 0m1.030s
sys 0m1.200s
with... oh shit / surprise!!
real 0m1.861s
user 0m0.560s
sys 0m0.370s
What is that telling me / you / us?
Thanks,
Grant
>
>Cheers,
>Con
On Wed, 8 Feb 2006 01:55 pm, Grant Coady wrote:
> On Wed, 8 Feb 2006 13:35:18 +1100, Con Kolivas <[email protected]> wrote:
> >On Wed, 8 Feb 2006 01:11 pm, Grant Coady wrote:
> >> Hi there,
> >>
> >> grant@deltree:~$ uname -r
> >> 2.6.15.3a
> >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> >> -c-95 ...
> >> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> >> /test/linux-2.6/tosh/ HTTP/
> >>
> >> real 0m8.537s
> >> user 0m0.970s
> >> sys 0m1.100s
> >>
> >> --> reboot to 2.4.32-hf32.2
> >>
> >> grant@deltree:~$ uname -r
> >> 2.4.32-hf32.2
> >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> >> -c-95 ...
> >> 2006-02-08 12:38:13 +1100: bugsplatter.mine.nu 193.196.182.215 "GET
> >> /test/linux-2.6/tosh/ HTTP/
> >>
> >> real 0m2.271s
> >> user 0m0.730s
> >> sys 0m0.540s
> >>
> >> Still a 4:1 slowdown, machine .config and dmesg info:
> >> http://bugsplatter.mine.nu/test/boxen/deltree/
> >
> >What happens if you add "| cat" on the end of your command?
>
> It gets faster with 2.4.32-hf32.2 by a little bit (I forgot to copy)
>
> reboot to 2.6.15.3a, without...
>
> real 0m8.737s
> user 0m1.030s
> sys 0m1.200s
>
> with... oh shit / surprise!!
>
> real 0m1.861s
> user 0m0.560s
> sys 0m0.370s
>
> What is that telling me / you / us?
Heh.
This is the terminal's fault. xterm et al use an algorithm to determine how
fast your machine is and decide whether to jump scroll or smooth scroll. This
algorithm is basically broken with the 2.6 scheduler and it decides to mostly
smooth scroll.
Cheers,
Con
On Wed, Feb 08, 2006 at 01:50:10PM +1100, Grant Coady wrote:
> Vague 'cos I do not know where the problem is. One might say the
> slowdown looks like a near-1ms delay per line of output, but it does
> not correlate with the kernel tick frequency. :(
Two things come to mind: can you try doing a vmstat 1 while running the
test and compare 2.4 vs 2.6? Also, does it make a difference if you switch
from the e100 driver to eepro100?
> I'll take a look at oprofile, report back if I can make sense of it ;)
If the CPU is pegged that will guide fixing things quite nicely, but the
fact that it's 1ms per line sounds like something more sinister.
-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <[email protected]>.
On 2/7/06, Con Kolivas <[email protected]> wrote:
> This is the terminal's fault. xterm et al use an algorithm to determine how
> fast your machine is and decide whether to jump scroll or smooth scroll. This
> algorithm is basically broken with the 2.6 scheduler and it decides to mostly
> smooth scroll.
Recent versions of xterm are supposed to fix this. (Skimming xterm's
changelog, I think it might have been fixed in version 201, but I'm
not completely sure.)
--
-Barry K. Nathan <[email protected]>
On Tue, 7 Feb 2006 20:12:40 -0800, "Barry K. Nathan" <[email protected]> wrote:
>On 2/7/06, Con Kolivas <[email protected]> wrote:
>> This is the terminal's fault. xterm et al use an algorithm to determine how
>> fast your machine is and decide whether to jump scroll or smooth scroll. This
>> algorithm is basically broken with the 2.6 scheduler and it decides to mostly
>> smooth scroll.
>
>Recent versions of xterm are supposed to fix this. (Skimming xterm's
>changelog, I think it might have been fixed in version 201, but I'm
>not completely sure.)
Yeah, but I'm using ssh from PuTTY on windoze. No GUI on the linux box.
Grant.
On Wed, 8 Feb 2006 14:00:59 +1100, Con Kolivas <[email protected]> wrote:
>This is the terminal's fault. xterm et al use an algorithm to determine how
>fast your machine is and decide whether to jump scroll or smooth scroll. This
>algorithm is basically broken with the 2.6 scheduler and it decides to mostly
>smooth scroll.
Strange it does that over localnet to a PuTTY terminal on windoze.
Seems a strange thing to do in the kernel though, presentation
buffering / management surely can be done in userspace?
Grant.
Hi Grant,
On Wed, Feb 08, 2006 at 03:51:24PM +1100, Grant Coady wrote:
> On Wed, 8 Feb 2006 14:00:59 +1100, Con Kolivas <[email protected]> wrote:
>
> >This is the terminal's fault. xterm et al use an algorithm to determine how
> >fast your machine is and decide whether to jump scroll or smooth scroll. This
> >algorithm is basically broken with the 2.6 scheduler and it decides to mostly
> >smooth scroll.
>
> Strange it does that over localnet to a PuTTY terminal on windoze.
>
> Seems a strange thing to do in the kernel though, presentation
> buffering / management surely can be done in userspace?
I suspect the sshd on the firewall gets woken up for each line and it
behaves exactly like an xterm. After having done a lot of "ls -l|cat"
on 2.6, I'm not surprised at all :-/
A good test would be to strace sshd under 2.4 and 2.6. You could even
use strace -tt. You will probably see something like 1 ms between
two reads on 2.6 and nearly nothing between them on 2.4.
> Grant.
Cheers,
Willy
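[Editorial note: Willy's strace -tt suggestion boils down to measuring the gap between consecutive syscall timestamps. A small awk sketch of that analysis step; the trace lines below are invented stand-ins, feed it output from `strace -tt -p <sshd pid>` for real.]

```shell
# Compute microsecond gaps between consecutive `strace -tt` timestamps.
cat > /tmp/trace.sample <<'EOF'
14:00:00.000100 read(4, "line1\n", 16384) = 6
14:00:00.001150 read(4, "line2\n", 16384) = 6
14:00:00.001200 read(4, "line3\n", 16384) = 6
EOF
gaps=$(awk '{
    split($1, t, "[.:]")                      # HH:MM:SS.micros
    us = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000000 + t[4]
    if (NR > 1) print us - prev               # gap to previous syscall
    prev = us
}' /tmp/trace.sample)
echo "$gaps"
rm -f /tmp/trace.sample
```

A ~1000 us gap between reads under 2.6 versus near-zero under 2.4 is what Willy predicts above.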
Hi Willy,
On Wed, 8 Feb 2006 06:17:09 +0100, Willy Tarreau <[email protected]> wrote:
>Hi Grant,
>
>On Wed, Feb 08, 2006 at 03:51:24PM +1100, Grant Coady wrote:
>> On Wed, 8 Feb 2006 14:00:59 +1100, Con Kolivas <[email protected]> wrote:
>>
>> >This is the terminal's fault. xterm et al use an algorithm to determine how
>> >fast your machine is and decide whether to jump scroll or smooth scroll. This
>> >algorithm is basically broken with the 2.6 scheduler and it decides to mostly
>> >smooth scroll.
>>
>> Strange it does that over localnet to a PuTTY terminal on windoze.
>>
>> Seems a strange thing to do in the kernel though, presentation
>> buffering / management surely can be done in userspace?
>
>I suspect the sshd on the firewall gets woken up for each line and it
>behaves exactly like an xterm. After having done a lot of "ls -l|cat"
>on 2.6, I'm not surprised at all :-/
>
>A good test would be to strace sshd under 2.4 and 2.6. You could even
>use strace -tt. You will probably see something like 1 ms between
>two reads on 2.6 and nearly nothing between them on 2.4.
Yes, it is nearly a 1ms per-line delay with 2.6, but 2.4 and 2.6 with
the trailing '|cat' give similar times; I didn't try that notion last time.
We know now it isn't the network cards or disk I/O, just an oddness in 2.6 ;)
Cheers,
Grant.
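[Editorial note: a back-of-envelope check on the ~1 ms/line figure. The wall times are from the report; the thread never states how many lines the pipeline emitted, so the 6000-line count is purely hypothetical.]

```shell
# Spread the extra 2.6 wall time over an assumed number of output lines.
perline=$(awk 'BEGIN {
    slow = 8.537; fast = 2.271; lines = 6000  # lines is a guess
    printf "%.2f", (slow - fast) * 1000 / lines
}')
echo "$perline ms extra per line"
```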
On Wed, 2006-02-08 at 14:00 +1100, Con Kolivas wrote:
> This is the terminal's fault. xterm et al use an algorithm to
> determine how fast your machine is and decide whether to jump scroll
> or smooth scroll. This algorithm is basically broken with the 2.6
> scheduler and it decides to mostly smooth scroll.
>
Hmm, I've been having a similar problem for ages. If I just do "ls" in
my home directory 10 or 20 times, approximately 20% of the time it's
"fast":
real 0m0.177s
user 0m0.028s
sys 0m0.027s
And the rest of the times it's "slow":
real 0m1.240s
user 0m0.036s
sys 0m0.040s
I rarely get anything in between - it's either ~1.2s or ~0.2s.
"time ls | cat" is always fast - 0.18 - 0.35s.
real 0m0.188s
user 0m0.014s
sys 0m0.018s
It has been this way as long as I can remember and it never made
sense...
Lee
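[Editorial note: Lee's experiment can be scripted. A hedged harness; stdout goes to /dev/null here so the script is self-contained, but writing to the terminal is what triggers the effect, so redirect to your tty to reproduce the fast/slow split. date +%s%N is GNU coreutils.]

```shell
# Run ls 20 times and record wall time in milliseconds per run, then
# inspect the distribution for the ~0.2s vs ~1.2s bimodal split.
times=/tmp/ls-times.$$
i=0
while [ $i -lt 20 ]; do
    start=$(date +%s%N)
    ls / > /dev/null        # point this at your tty to see the effect
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 )) >> "$times"
    i=$((i + 1))
done
runs=$(wc -l < "$times")
runs=$((runs))              # strip any padding wc may emit
echo "measured $runs runs"
rm -f "$times"
```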
>> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
>> -c-95 ...
>
>What happens if you add "| cat" on the end of your command?
>
Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12,
don't remember exactly)
Jan Engelhardt
--
On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> >> -c-95 ...
> >
> >What happens if you add "| cat" on the end of your command?
> >
> Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12,
> don't remember exactly)
If it's the same problem I've been seeing it goes back much farther than
2.6.10.
Lately I suspect the scheduler.
Lee
On Thu, 2006-02-09 at 15:06 -0500, Lee Revell wrote:
> On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> > >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> > >> -c-95 ...
> > >
> > >What happens if you add "| cat" on the end of your command?
> > >
> > Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12,
> > don't remember exactly)
>
> If it's the same problem I've been seeing it goes back much farther than
> 2.6.10.
>
> Lately I suspect the scheduler.
Hmm. I ran into an oddity while testing a modified kernel, and see
something in schedule() that I don't think is right...
Down where it does requeue_task(next, array) if a freshly awakened task
is to possibly receive a priority boost for the time it sat on the
runqueue, I see a potential problem. If the task didn't sit on the
queue long enough to be promoted, and isn't at the very top, it is going
to the back of the bus as soon as it gets preempted by, say, xmms. For a
task that possibly just sat through the full rotation of a busy queue
waiting for a shot at the cpu, that has got to hurt. Speculating, that
requeue looks like it's there to increase the queue rotation rate, ie to
reduce latency, but it looks to me like it can also accomplish the
opposite if the context switch rate for your queue isn't very high.
... I ended up sharing a queue with a few rampaging irman2 threads, and
each keystroke took ages. [btw, i wonder how the heck next->array could
not be rq->active there]
-Mike
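[Editorial note: the requeue behaviour Mike describes can be pictured with a toy model of one priority level's runnable list, where requeue_task() moves the selected task to the tail, so a task preempted before using its slice rejoins behind everyone it already waited for. A deliberate simplification of the real O(1) runqueue.]

```shell
# Toy model: the chosen task is requeued to the tail of its priority
# list, losing its place if it is preempted before it runs.
queue="A B C D"
next=${queue%% *}            # head of the list, as schedule() picks it
rest=${queue#* }
queue="$rest $next"          # requeue_task(next, array)
echo "$queue"
```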
On Fri, 2006-02-10 at 07:35 +0100, Mike Galbraith wrote:
> On Thu, 2006-02-09 at 15:06 -0500, Lee Revell wrote:
> > On Thu, 2006-02-09 at 18:06 +0100, Jan Engelhardt wrote:
> > > >> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log| cut
> > > >> -c-95 ...
> > > >
> > > >What happens if you add "| cat" on the end of your command?
> > > >
> > > Do you think it's the new pipe buffering thing? (Introduced 2.6.10-.12,
> > > don't remember exactly)
> >
> > If it's the same problem I've been seeing it goes back much farther than
> > 2.6.10.
> >
> > Lately I suspect the scheduler.
>
> Hmm. I ran into an oddity while testing a modified kernel, and see
> something in schedule() that I don't think is right...
>
> Down where it does requeue_task(next, array) if a freshly awakened task
> is to possibly receive a priority boost for the time it sat on the
> runqueue, I see a potential problem. If the task didn't sit on the
> queue long enough to be promoted, and isn't at the very top, it is going
> to the back of the bus as soon as it gets preempted by, say, xmms. For a
> task that possibly just sat through the full rotation of a busy queue
> waiting for a shot at the cpu, that has got to hurt. Speculating, that
> requeue looks like it's there to increase the queue rotation rate, ie to
> reduce latency, but it looks to me like it can also accomplish the
> opposite if the context switch rate for your queue isn't very high.
>
> ... I ended up sharing a queue with a few rampaging irman2 threads, and
> each keystroke took ages. [btw, i wonder how the heck next->array could
> not be rq->active there]
I guess you didn't try my pseudo suggestion. Since I happen to be
actively tinkering in this very area, I'll be a bit more direct :)
If you think it's the scheduler, how about try the patch below. It's
against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
logic in the scheduler or not. I don't see other candidates in there,
not that that means there aren't any of course.
With this patch in place, running an irman2 in one window, a make -j4
over nfs in another, and multimedia_sim as a test application in
another, I get these results...
[mikeg@Homer]:> ./multimedia_sim 0 60
nice_level = 0
duration = 60 seconds
[frames] received: 1784 dropped: 0
[latency] mean: 0.000627 max: 0.019647 stddev: 0.000786
score: 0.004240
.... test proggy attached for inspection.
To be extra sure, set both /proc/sys/kernel/sched_g1 and g2 to 0. That
will (I mean should of course;) more or less restore the original O(1)
scheduler behavior.
-Mike
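[Editorial note: for reference, the baseline setting Mike suggests above, assuming the patch below is applied; the sched_g1/sched_g2 sysctl files exist only with it.]

```shell
# Disable both throttling grace periods to approximate the original
# O(1) interactivity behaviour, per Mike's note.
echo 0 > /proc/sys/kernel/sched_g1
echo 0 > /proc/sys/kernel/sched_g2
```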
Maybe not pretty, but effective counts too...
--- linux-2.6.16-rc2-mm1x/include/linux/sched.h.org 2006-02-09 13:15:50.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/include/linux/sched.h 2006-02-09 13:16:30.000000000 +0100
@@ -721,14 +721,14 @@
unsigned short ioprio;
- unsigned long sleep_avg;
+ unsigned long sleep_avg, last_slice, throttle_stamp;
unsigned long long timestamp, last_ran;
unsigned long long sched_time; /* sched_clock time spent running */
enum sleep_type sleep_type;
unsigned long policy;
cpumask_t cpus_allowed;
- unsigned int time_slice, first_time_slice;
+ unsigned int time_slice, slice_info;
#ifdef CONFIG_SCHEDSTATS
struct sched_info sched_info;
--- linux-2.6.16-rc2-mm1x/include/linux/sysctl.h.org 2006-02-09 13:16:02.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/include/linux/sysctl.h 2006-02-09 13:16:30.000000000 +0100
@@ -147,6 +147,8 @@
KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
KERN_SPIN_RETRY=70, /* int: number of spinlock retries */
KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
+ KERN_SCHED_THROTTLE1=72, /* int: throttling grace period 1 in secs */
+ KERN_SCHED_THROTTLE2=73, /* int: throttling grace period 2 in secs */
};
--- linux-2.6.16-rc2-mm1x/kernel/sched.c.org 2006-02-09 13:15:04.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/kernel/sched.c 2006-02-10 16:55:09.000000000 +0100
@@ -158,9 +158,195 @@
#define TASK_INTERACTIVE(p) \
((p)->prio <= (p)->static_prio - DELTA(p))
-#define INTERACTIVE_SLEEP(p) \
- (JIFFIES_TO_NS(MAX_SLEEP_AVG * \
- (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+/*
+ * Interactivity boost can lead to serious starvation problems if the
+ * task being boosted turns out to be a cpu hog. To combat this, we
+ * compute a running slice_avg, which is the sane upper limit for the
+ * task's sleep_avg. If an 'interactive' task begins burning cpu, its
+ * slice_avg will decay, making it visible as a problem so corrective
+ * measures can be applied.
+ *
+ * /proc/sys/kernel tunables.
+ *
+ * sched_g1: Grace period in seconds that a task is allowed to run unchecked.
+ * sched_g2: seconds thereafter, to force a priority adjustment.
+ */
+
+int sched_g1 = 20;
+int sched_g2 = 10;
+
+/*
+ * Offset from the time we noticed a potential problem until we disable the
+ * interactive bonus multiplier, and adjust sleep_avg consumption rate.
+ */
+#define G1 (sched_g1 * HZ)
+
+/*
+ * Offset thereafter that we disable the interactive bonus divisor, and adjust
+ * a runaway task's priority.
+ */
+#define G2 (sched_g2 * HZ + G1)
+
+/*
+ * Grace period has expired.
+ */
+#define grace_expired(p, grace) ((p)->throttle_stamp && \
+ time_after_eq(jiffies, (p)->throttle_stamp + (grace)))
+
+#define NEXT_PRIO (NS_MAX_SLEEP_AVG / MAX_BONUS)
+
+/*
+ * Warning: do not reduce threshold below NS_MAX_SLEEP_AVG / MAX_BONUS
+ * else you may break the case where one of a pair of communicating tasks
+ * only sleeps a minuscule amount of time, but must be able to preempt
+ * its partner in order to get any cpu time to speak of. If you push that
+ * task to the same level or below its partner, it will not be able to
+ * preempt and will starve. This scenario was fixed for bonus calculation
+ * by converting sleep_avg to ns.
+ */
+#define THROTTLE_THRESHOLD (NEXT_PRIO)
+
+#define NS_MAX_SLEEP_AVG_PCNT (NS_MAX_SLEEP_AVG / 100)
+
+/*
+ * Masks for p->slice_info, formerly p->first_time_slice.
+ * SLICE_FTS: 0x80000000 Task is in its first ever timeslice.
+ * SLICE_NEW: 0x40000000 Slice refreshed.
+ * SLICE_SPA: 0x3FFE0000 Spare bits.
+ * SLICE_LTS: 0x0001FF80 Last time slice
+ * SLICE_AVG: 0x0000007F Task slice_avg stored as percentage.
+ */
+#define SLICE_AVG_BITS 7
+#define SLICE_LTS_BITS 10
+#define SLICE_SPA_BITS 13
+#define SLICE_NEW_BITS 1
+#define SLICE_FTS_BITS 1
+
+#define SLICE_AVG_SHIFT 0
+#define SLICE_LTS_SHIFT (SLICE_AVG_SHIFT + SLICE_AVG_BITS)
+#define SLICE_SPA_SHIFT (SLICE_LTS_SHIFT + SLICE_LTS_BITS)
+#define SLICE_NEW_SHIFT (SLICE_SPA_SHIFT + SLICE_SPA_BITS)
+#define SLICE_FTS_SHIFT (SLICE_NEW_SHIFT + SLICE_NEW_BITS)
+
+#define INFO_MASK(x) ((1U << (x))-1)
+#define SLICE_AVG_MASK (INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
+#define SLICE_LTS_MASK (INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)
+#define SLICE_SPA_MASK (INFO_MASK(SLICE_SPA_BITS) << SLICE_SPA_SHIFT)
+#define SLICE_NEW_MASK (INFO_MASK(SLICE_NEW_BITS) << SLICE_NEW_SHIFT)
+#define SLICE_FTS_MASK (INFO_MASK(SLICE_FTS_BITS) << SLICE_FTS_SHIFT)
+
+#define first_time_slice(p) ((p)->slice_info & SLICE_FTS_MASK)
+#define set_first_time_slice(p) ((p)->slice_info |= SLICE_FTS_MASK)
+#define clr_first_time_slice(p) ((p)->slice_info &= ~SLICE_FTS_MASK)
+
+#define slice_is_new(p) ((p)->slice_info & SLICE_NEW_MASK)
+#define set_slice_is_new(p) ((p)->slice_info |= SLICE_NEW_MASK)
+#define clr_slice_is_new(p) ((p)->slice_info &= ~SLICE_NEW_MASK)
+
+#define last_slice(p) \
+ ((((p)->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT) ? : \
+ DEF_TIMESLICE)
+#define set_last_slice(p, n) ((p)->slice_info = (((p)->slice_info & \
+ ~SLICE_LTS_MASK) | (((n) << SLICE_LTS_SHIFT) & SLICE_LTS_MASK)))
+
+#define slice_avg(p) \
+ ((((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT) * \
+ NS_MAX_SLEEP_AVG_PCNT)
+#define set_slice_avg(p, n) ((p)->slice_info = (((p)->slice_info & \
+ ~SLICE_AVG_MASK) | ((((n) / NS_MAX_SLEEP_AVG_PCNT) \
+ << SLICE_AVG_SHIFT) & SLICE_AVG_MASK)))
+#define slice_avg_raw(p) \
+ (((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT)
+#define set_slice_avg_raw(p, n) ((p)->slice_info = (((p)->slice_info & \
+ ~SLICE_AVG_MASK) | (((n) << SLICE_AVG_SHIFT) & SLICE_AVG_MASK)))
+
+#define cpu_avg(p) \
+ (100 - slice_avg_raw(p))
+
+#define slice_time_avg(p) \
+ (100 * last_slice(p) / max((unsigned)cpu_avg(p), 1U))
+
+#define time_this_slice(p) \
+ (jiffies - (p)->last_slice)
+
+#define cpu_this_slice(p) \
+ (100 * last_slice(p) / max((unsigned)time_this_slice(p), \
+ (unsigned)last_slice(p)))
+
+#define this_slice_avg(p) \
+ ((100 - cpu_this_slice(p)) * NS_MAX_SLEEP_AVG_PCNT)
+
+/*
+ * In order to prevent tasks from thrashing between domesticated livestock
+ * and irate rhino, once a throttle is hung on a task, the only way to get
+ * rid of it is to change behavior. We push the throttle stamp forward in
+ * time as things improve until the stamp is in the future. Only then may
+ * we safely pull our 'tranquilizer dart'.
+ */
+#define conditional_tag(p) ((!(p)->throttle_stamp && \
+ (p)->sleep_avg > slice_avg(p) + THROTTLE_THRESHOLD) ? \
+({ \
+ ((p)->throttle_stamp = jiffies) ? : 1; \
+}) : 0)
+
+/*
+ * Those who use the least cpu receive the most encouragement.
+ */
+#define SLICE_AVG_MULTIPLIER(p) \
+ (1 + NS_TO_JIFFIES(this_slice_avg(p)) * MAX_BONUS / MAX_SLEEP_AVG)
+
+#define conditional_release(p) (((p)->throttle_stamp && \
+ (p)->sched_time >= (G2 ? JIFFIES_TO_NS(HZ) : ~0ULL) && \
+ ((20 + cpu_this_slice(p) < cpu_avg(p) && (p)->sleep_avg < \
+ slice_avg(p) + THROTTLE_THRESHOLD) || cpu_avg(p) <= 5)) ? \
+({ \
+ int __ret = 0; \
+ int delay = slice_time_avg(p) - last_slice(p); \
+ if (delay > 0) { \
+ delay *= SLICE_AVG_MULTIPLIER(p); \
+ (p)->throttle_stamp += delay; \
+ } \
+ if (time_before(jiffies, (p)->throttle_stamp)) { \
+ (p)->throttle_stamp = 0; \
+ __ret++; \
+ if (!((p)->state & TASK_NONINTERACTIVE)) \
+ (p)->sleep_type = SLEEP_NORMAL; \
+ } \
+ __ret; \
+}) : 0)
+
+/*
+ * CURRENT_BONUS(p) adjusted to match slice_avg after grace expiration.
+ */
+#define ADJUSTED_BONUS(p, grace) \
+({ \
+ unsigned long sleep_avg = (p)->sleep_avg; \
+ if (grace_expired(p, (grace))) \
+ sleep_avg = min((unsigned long)(p)->sleep_avg, \
+ (unsigned long)slice_avg(p)); \
+ NS_TO_JIFFIES(sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG; \
+})
+
+#define BONUS_MULTIPLIER(p) \
+ (grace_expired(p, G1) ? : SLICE_AVG_MULTIPLIER(p))
+
+#define BONUS_DIVISOR(p) \
+ (grace_expired(p, G2) ? : (1 + ADJUSTED_BONUS(p, G1)))
+
+#define INTERACTIVE_SLEEP_AVG(p) \
+ (min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / MAX_BONUS), \
+ NS_MAX_SLEEP_AVG))
+
+/*
+ * The quantity of sleep guaranteed to elevate a task to interactive status,
+ * or if already there, to elevate it to the next priority or beyond.
+ */
+#define INTERACTIVE_SLEEP_NS(p, ns) \
+ (BONUS_MULTIPLIER(p) * (ns) >= INTERACTIVE_SLEEP_AVG(p) || \
+ ((p)->sleep_avg < INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+ (ns) + (p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p)) || \
+ ((p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+ (ns) + ((p)->sleep_avg % NEXT_PRIO) >= NEXT_PRIO))
#define TASK_PREEMPTS_CURR(p, rq) \
((p)->prio < (rq)->curr->prio)
@@ -668,7 +854,7 @@
if (rt_task(p))
return p->prio;
- bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
+ bonus = ADJUSTED_BONUS(p, G2) - MAX_BONUS / 2;
prio = p->static_prio - bonus;
if (prio < MAX_RT_PRIO)
@@ -794,19 +980,39 @@
if (likely(sleep_time > 0)) {
/*
- * User tasks that sleep a long time are categorised as
- * idle. They will only have their sleep_avg increased to a
+ * Tasks that sleep a long time are categorised as idle.
+ * They will only have their sleep_avg increased to a
* level that makes them just interactive priority to stay
* active yet prevent them suddenly becoming cpu hogs and
- * starving other processes.
+ * starving other processes. All tasks must stop at each
+ * TASK_INTERACTIVE boundary before moving on so that no
+ * single sleep slams it straight into NS_MAX_SLEEP_AVG.
*/
- if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
- unsigned long ceiling;
+ if (INTERACTIVE_SLEEP_NS(p, sleep_time)) {
+ int ticks = last_slice(p) / BONUS_DIVISOR(p);
+ unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
+
+ ticks = JIFFIES_TO_NS(ticks);
+
+ if (grace_expired(p, G2) && slice_avg(p) < ceiling)
+ ceiling = slice_avg(p);
+ /* Promote previously interactive task. */
+ else if (p->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) &&
+ !grace_expired(p, G2)) {
+
+ ceiling = p->sleep_avg / NEXT_PRIO;
+ if (ceiling < MAX_BONUS)
+ ceiling++;
+ ceiling *= NEXT_PRIO;
+ }
- ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
- DEF_TIMESLICE);
- if (p->sleep_avg < ceiling)
- p->sleep_avg = ceiling;
+ ceiling += ticks;
+
+ if (ceiling > NS_MAX_SLEEP_AVG)
+ ceiling = NS_MAX_SLEEP_AVG;
+
+ if (p->sleep_avg < ceiling)
+ p->sleep_avg = ceiling;
} else {
/*
@@ -816,9 +1022,8 @@
* If a task was sleeping with the noninteractive
* label do not apply this non-linear boost
*/
- if (p->sleep_type != SLEEP_NONINTERACTIVE || !p->mm)
- sleep_time *=
- (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
+ if (p->sleep_type != SLEEP_NONINTERACTIVE)
+ sleep_time *= BONUS_MULTIPLIER(p);
/*
* This code gives a bonus to interactive tasks.
@@ -1367,7 +1572,10 @@
out_activate:
#endif /* CONFIG_SMP */
- if (old_state == TASK_UNINTERRUPTIBLE) {
+
+ conditional_release(p);
+
+ if (old_state & TASK_UNINTERRUPTIBLE) {
rq->nr_uninterruptible--;
/*
* Tasks waking from uninterruptible sleep are likely
@@ -1468,9 +1676,27 @@
* The remainder of the first timeslice might be recovered by
* the parent if the child exits early enough.
*/
- p->first_time_slice = 1;
+ set_first_time_slice(p);
current->time_slice >>= 1;
p->timestamp = sched_clock();
+
+ /*
+ * Set up slice_info for the child.
+ *
+ * Note: The child inherits the parent's throttle,
+ * and must shake it loose. It does not inherit
+ * the parent's slice_avg.
+ */
+ set_slice_avg(p, NS_MAX_SLEEP_AVG);
+ set_last_slice(p, p->time_slice);
+ set_slice_is_new(p);
+ p->last_slice = jiffies;
+ /*
+ * Limit the difficulty to what the parent faced.
+ */
+ if (p->throttle_stamp && grace_expired(p, G2))
+ p->throttle_stamp = jiffies - G2;
+
if (unlikely(!current->time_slice)) {
/*
* This case is rare, it happens when the parent has only
@@ -1584,7 +1810,7 @@
* the sleep_avg of the parent as well.
*/
rq = task_rq_lock(p->parent, &flags);
- if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
+ if (first_time_slice(p) && task_cpu(p) == task_cpu(p->parent)) {
p->parent->time_slice += p->time_slice;
if (unlikely(p->parent->time_slice > task_timeslice(p)))
p->parent->time_slice = task_timeslice(p);
@@ -2665,6 +2891,51 @@
}
/*
+ * Calculate a task's average cpu usage rate in terms of sleep_avg, and
+ * check whether the task may soon need throttling. Must be called after
+ * refreshing the task's time slice.
+ * @p: task for which slice_avg should be computed.
+ */
+static void recalc_task_slice_avg(task_t *p)
+{
+ unsigned int slice_avg = slice_avg_raw(p);
+ unsigned int time_slice = last_slice(p);
+ int w = MAX_BONUS, idle;
+
+ if (unlikely(!time_slice))
+ set_last_slice(p, p->time_slice);
+
+ idle = 100 - cpu_this_slice(p);
+
+ /*
+ * If the task is lowering its cpu usage, speed up the
+ * effect on slice_avg so we don't over-throttle.
+ */
+ if (idle > slice_avg) {
+ w -= idle / w;
+ if (!w)
+ w = 1;
+ }
+
+ slice_avg = (w * (slice_avg ? : 1) + idle) / (w + 1);
+
+ /* Check to see if we should start/stop throttling. */
+ if (!rt_task(p) && !conditional_release(p))
+ conditional_tag(p);
+
+ /* Update slice_avg. */
+ set_slice_avg_raw(p, slice_avg);
+
+ /* Update cached slice length. */
+ if (time_slice != p->time_slice)
+ set_last_slice(p, p->time_slice);
+
+ /* And finally, stamp and tag the new slice. */
+ set_slice_is_new(p);
+ p->last_slice = jiffies;
+}
+
+/*
* This function gets called by the timer code, with HZ frequency.
* We call it with interrupts disabled.
*
@@ -2709,20 +2980,24 @@
*/
if ((p->policy == SCHED_RR) && !--p->time_slice) {
p->time_slice = task_timeslice(p);
- p->first_time_slice = 0;
+ recalc_task_slice_avg(p);
+ clr_first_time_slice(p);
set_tsk_need_resched(p);
/* put it at the end of the queue: */
requeue_task(p, rq->active);
}
+ if (unlikely(p->throttle_stamp))
+ p->throttle_stamp = 0;
goto out_unlock;
}
if (!--p->time_slice) {
dequeue_task(p, rq->active);
set_tsk_need_resched(p);
- p->prio = effective_prio(p);
p->time_slice = task_timeslice(p);
- p->first_time_slice = 0;
+ recalc_task_slice_avg(p);
+ p->prio = effective_prio(p);
+ clr_first_time_slice(p);
if (!rq->expired_timestamp)
rq->expired_timestamp = jiffies;
@@ -3033,7 +3308,7 @@
* Tasks charged proportionately less run_time at high sleep_avg to
* delay them losing their interactive status
*/
- run_time /= (CURRENT_BONUS(prev) ? : 1);
+ run_time /= BONUS_DIVISOR(prev);
spin_lock_irq(&rq->lock);
@@ -3047,7 +3322,7 @@
unlikely(signal_pending(prev))))
prev->state = TASK_RUNNING;
else {
- if (prev->state == TASK_UNINTERRUPTIBLE)
+ if (prev->state & TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible++;
deactivate_task(prev, rq);
}
@@ -3096,6 +3371,7 @@
rq->best_expired_prio = MAX_PRIO;
}
+repeat_selection:
idx = sched_find_first_bit(array->bitmap);
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);
@@ -3115,8 +3391,14 @@
dequeue_task(next, array);
next->prio = new_prio;
enqueue_task(next, array);
- } else
- requeue_task(next, array);
+
+ /*
+ * We may have just been demoted below other
+ * runnable tasks in our previous queue.
+ */
+ next->sleep_type = SLEEP_NORMAL;
+ goto repeat_selection;
+ }
}
next->sleep_type = SLEEP_NORMAL;
switch_tasks:
@@ -3134,6 +3416,14 @@
prev->sleep_avg = 0;
prev->timestamp = prev->last_ran = now;
+ /*
+ * Tag start of execution of a new timeslice.
+ */
+ if (unlikely(slice_is_new(next))) {
+ next->last_slice = jiffies;
+ clr_slice_is_new(next);
+ }
+
sched_info_switch(prev, next);
if (likely(prev != next)) {
next->timestamp = now;
--- linux-2.6.16-rc2-mm1x/kernel/sysctl.c.org 2006-02-09 13:15:17.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/kernel/sysctl.c 2006-02-09 13:16:30.000000000 +0100
@@ -69,6 +69,8 @@
extern int pid_max_min, pid_max_max;
extern int sysctl_drop_caches;
extern int percpu_pagelist_fraction;
+extern int sched_g1;
+extern int sched_g2;
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
int unknown_nmi_panic;
@@ -224,6 +226,11 @@
{ .ctl_name = 0 }
};
+/* Constants for minimum and maximum testing in vm_table and
+ * kern_table. We use these as one-element integer vectors. */
+static int zero;
+static int one_hundred = 100;
+
static ctl_table kern_table[] = {
{
.ctl_name = KERN_OSTYPE,
@@ -666,15 +673,29 @@
.proc_handler = &proc_dointvec,
},
#endif
+ {
+ .ctl_name = KERN_SCHED_THROTTLE1,
+ .procname = "sched_g1",
+ .data = &sched_g1,
+ .maxlen = sizeof (int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
+ {
+ .ctl_name = KERN_SCHED_THROTTLE2,
+ .procname = "sched_g2",
+ .data = &sched_g2,
+ .maxlen = sizeof (int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
{ .ctl_name = 0 }
};
-/* Constants for minimum and maximum testing in vm_table.
- We use these as one-element integer vectors. */
-static int zero;
-static int one_hundred = 100;
-
-
static ctl_table vm_table[] = {
{
.ctl_name = VM_OVERCOMMIT_MEMORY,
--- linux-2.6.16-rc2-mm1x/fs/pipe.c.org 2006-02-09 13:15:35.000000000 +0100
+++ linux-2.6.16-rc2-mm1x/fs/pipe.c 2006-02-09 13:16:30.000000000 +0100
@@ -39,11 +39,7 @@
{
DEFINE_WAIT(wait);
- /*
- * Pipes are system-local resources, so sleeping on them
- * is considered a noninteractive wait:
- */
- prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE|TASK_NONINTERACTIVE);
+ prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE);
mutex_unlock(PIPE_MUTEX(*inode));
schedule();
finish_wait(PIPE_WAIT(*inode), &wait);
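[For readers following along: the weighted average maintained by recalc_task_slice_avg() in the hunk above can be sketched in user space. This is an illustrative reimplementation, not the kernel code; slice_avg and idle are percentages, and MAX_BONUS = 10 is the stock 2.6 scheduler value, assumed here.]

```c
#include <assert.h>

/* Sketch of the slice_avg update in recalc_task_slice_avg() above.
 * slice_avg and idle are percentages (0..100).  The weight w starts
 * at MAX_BONUS and shrinks when idle exceeds the running average,
 * i.e. when the task's cpu usage is dropping, so the average reacts
 * faster and we don't over-throttle.  Illustrative only. */
#define MAX_BONUS 10

static unsigned int update_slice_avg(unsigned int slice_avg,
				     unsigned int idle)
{
	unsigned int w = MAX_BONUS;

	if (idle > slice_avg) {
		w -= idle / w;
		if (!w)
			w = 1;
	}
	/* the kernel uses the GNU "?:" shorthand here */
	return (w * (slice_avg ? slice_avg : 1) + idle) / (w + 1);
}
```

[A task that goes from fully idle to fully busy decays a step at a time (100 -> 90 -> 81 ...), while a cpu hog that starts sleeping is pulled up much faster because w collapses to 1 (0 -> 50 -> 75 ...).]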
On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> If you think it's the scheduler, how about try the patch below. It's
> against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> logic in the scheduler or not. I don't see other candidates in there,
> not that that means there aren't any of course.
I'll try, but it's a serious pain for me to build an -mm kernel. A
patch against 2.6.16-rc1 would be much easier.
Lee
On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > If you think it's the scheduler, how about try the patch below. It's
> > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > logic in the scheduler or not. I don't see other candidates in there,
> > not that that means there aren't any of course.
>
> I'll try, but it's a serious pain for me to build an -mm kernel. A
> patch against 2.6.16-rc1 would be much easier.
Ok, here she comes. It's a bit too reluctant to release a task so it
can reach interactive status at the moment, but for this test, that's a
feature. In fact, for this test, it's probably best to jump straight to
setting both g1 and g2 to zero.
-Mike
--- linux-2.6.16-rc1/include/linux/sched.h.org 2006-02-12 21:28:28.000000000 +0100
+++ linux-2.6.16-rc1/include/linux/sched.h 2006-02-12 21:54:40.000000000 +0100
@@ -688,6 +688,13 @@
struct audit_context; /* See audit.c */
struct mempolicy;
+enum sleep_type {
+ SLEEP_NORMAL,
+ SLEEP_NONINTERACTIVE,
+ SLEEP_INTERACTIVE,
+ SLEEP_INTERRUPTED,
+};
+
struct task_struct {
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
struct thread_info *thread_info;
@@ -709,14 +716,14 @@
unsigned short ioprio;
- unsigned long sleep_avg;
+ unsigned long sleep_avg, last_slice, throttle_stamp;
unsigned long long timestamp, last_ran;
unsigned long long sched_time; /* sched_clock time spent running */
- int activated;
+ enum sleep_type sleep_type;
unsigned long policy;
cpumask_t cpus_allowed;
- unsigned int time_slice, first_time_slice;
+ unsigned int time_slice, slice_info;
#ifdef CONFIG_SCHEDSTATS
struct sched_info sched_info;
--- linux-2.6.16-rc1/include/linux/sysctl.h.org 2006-02-12 21:28:44.000000000 +0100
+++ linux-2.6.16-rc1/include/linux/sysctl.h 2006-02-12 21:34:46.000000000 +0100
@@ -146,6 +146,8 @@
KERN_RANDOMIZE=68, /* int: randomize virtual address space */
KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
KERN_SPIN_RETRY=70, /* int: number of spinlock retries */
+ KERN_SCHED_THROTTLE1=71, /* int: throttling grace period 1 in secs */
+ KERN_SCHED_THROTTLE2=72, /* int: throttling grace period 2 in secs */
};
--- linux-2.6.16-rc1/kernel/sched.c.org 2006-02-12 21:29:13.000000000 +0100
+++ linux-2.6.16-rc1/kernel/sched.c 2006-02-12 21:58:14.000000000 +0100
@@ -149,9 +149,195 @@
#define TASK_INTERACTIVE(p) \
((p)->prio <= (p)->static_prio - DELTA(p))
-#define INTERACTIVE_SLEEP(p) \
- (JIFFIES_TO_NS(MAX_SLEEP_AVG * \
- (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+/*
+ * Interactivity boost can lead to serious starvation problems if the
+ * task being boosted turns out to be a cpu hog. To combat this, we
+ * compute a running slice_avg, which is the sane upper limit for the
+ * task's sleep_avg. If an 'interactive' task begins burning cpu, it's
+ * slice_avg will decay, making it visible as a problem so corrective
+ * measures can be applied.
+ *
+ * /proc/sys/kernel tunables.
+ *
+ * sched_g1: Grace period in seconds that a task is allowed to run unchecked.
+ * sched_g2: seconds thereafter, to force a priority adjustment.
+ */
+
+int sched_g1 = 20;
+int sched_g2 = 10;
+
+/*
+ * Offset from the time we noticed a potential problem until we disable the
+ * interactive bonus multiplier, and adjust sleep_avg consumption rate.
+ */
+#define G1 (sched_g1 * HZ)
+
+/*
+ * Offset thereafter that we disable the interactive bonus divisor, and adjust
+ * a runaway task's priority.
+ */
+#define G2 (sched_g2 * HZ + G1)
+
+/*
+ * Grace period has expired.
+ */
+#define grace_expired(p, grace) ((p)->throttle_stamp && \
+ time_after_eq(jiffies, (p)->throttle_stamp + (grace)))
+
+#define NEXT_PRIO (NS_MAX_SLEEP_AVG / MAX_BONUS)
+
+/*
+ * Warning: do not reduce threshold below NS_MAX_SLEEP_AVG / MAX_BONUS
+ * else you may break the case where one of a pair of communicating tasks
+ * only sleeps a minuscule amount of time, but must be able to preempt
+ * its partner in order to get any cpu time to speak of. If you push that
+ * task to the same level or below its partner, it will not be able to
+ * preempt and will starve. This scenario was fixed for bonus calculation
+ * by converting sleep_avg to ns.
+ */
+#define THROTTLE_THRESHOLD (NEXT_PRIO)
+
+#define NS_MAX_SLEEP_AVG_PCNT (NS_MAX_SLEEP_AVG / 100)
+
+/*
+ * Masks for p->slice_info, formerly p->first_time_slice.
+ * SLICE_FTS: 0x80000000 Task is in its first ever timeslice.
+ * SLICE_NEW: 0x40000000 Slice refreshed.
+ * SLICE_SPA: 0x3FFE0000 Spare bits.
+ * SLICE_LTS: 0x0001FF80 Last time slice.
+ * SLICE_AVG: 0x0000007F Task slice_avg stored as percentage.
+ */
+#define SLICE_AVG_BITS 7
+#define SLICE_LTS_BITS 10
+#define SLICE_SPA_BITS 13
+#define SLICE_NEW_BITS 1
+#define SLICE_FTS_BITS 1
+
+#define SLICE_AVG_SHIFT 0
+#define SLICE_LTS_SHIFT (SLICE_AVG_SHIFT + SLICE_AVG_BITS)
+#define SLICE_SPA_SHIFT (SLICE_LTS_SHIFT + SLICE_LTS_BITS)
+#define SLICE_NEW_SHIFT (SLICE_SPA_SHIFT + SLICE_SPA_BITS)
+#define SLICE_FTS_SHIFT (SLICE_NEW_SHIFT + SLICE_NEW_BITS)
+
+#define INFO_MASK(x) ((1U << (x))-1)
+#define SLICE_AVG_MASK (INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
+#define SLICE_LTS_MASK (INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)
+#define SLICE_SPA_MASK (INFO_MASK(SLICE_SPA_BITS) << SLICE_SPA_SHIFT)
+#define SLICE_NEW_MASK (INFO_MASK(SLICE_NEW_BITS) << SLICE_NEW_SHIFT)
+#define SLICE_FTS_MASK (INFO_MASK(SLICE_FTS_BITS) << SLICE_FTS_SHIFT)
+
+#define first_time_slice(p) ((p)->slice_info & SLICE_FTS_MASK)
+#define set_first_time_slice(p) ((p)->slice_info |= SLICE_FTS_MASK)
+#define clr_first_time_slice(p) ((p)->slice_info &= ~SLICE_FTS_MASK)
+
+#define slice_is_new(p) ((p)->slice_info & SLICE_NEW_MASK)
+#define set_slice_is_new(p) ((p)->slice_info |= SLICE_NEW_MASK)
+#define clr_slice_is_new(p) ((p)->slice_info &= ~SLICE_NEW_MASK)
+
+#define last_slice(p) \
+ ((((p)->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT) ? : \
+ DEF_TIMESLICE)
+#define set_last_slice(p, n) ((p)->slice_info = (((p)->slice_info & \
+ ~SLICE_LTS_MASK) | (((n) << SLICE_LTS_SHIFT) & SLICE_LTS_MASK)))
+
+#define slice_avg(p) \
+ ((((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT) * \
+ NS_MAX_SLEEP_AVG_PCNT)
+#define set_slice_avg(p, n) ((p)->slice_info = (((p)->slice_info & \
+ ~SLICE_AVG_MASK) | ((((n) / NS_MAX_SLEEP_AVG_PCNT) \
+ << SLICE_AVG_SHIFT) & SLICE_AVG_MASK)))
+#define slice_avg_raw(p) \
+ (((p)->slice_info & SLICE_AVG_MASK) >> SLICE_AVG_SHIFT)
+#define set_slice_avg_raw(p, n) ((p)->slice_info = (((p)->slice_info & \
+ ~SLICE_AVG_MASK) | (((n) << SLICE_AVG_SHIFT) & SLICE_AVG_MASK)))
+
+#define cpu_avg(p) \
+ (100 - slice_avg_raw(p))
+
+#define slice_time_avg(p) \
+ (100 * last_slice(p) / max((unsigned)cpu_avg(p), 1U))
+
+#define time_this_slice(p) \
+ (jiffies - (p)->last_slice)
+
+#define cpu_this_slice(p) \
+ (100 * last_slice(p) / max((unsigned)time_this_slice(p), \
+ (unsigned)last_slice(p)))
+
+#define this_slice_avg(p) \
+ ((100 - cpu_this_slice(p)) * NS_MAX_SLEEP_AVG_PCNT)
+
+/*
+ * In order to prevent tasks from thrashing between domesticated livestock
+ * and irate rhino, once a throttle is hung on a task, the only way to get
+ * rid of it is to change behavior. We push the throttle stamp forward in
+ * time as things improve until the stamp is in the future. Only then may
+ * we safely pull our 'tranquilizer dart'.
+ */
+#define conditional_tag(p) ((!(p)->throttle_stamp && \
+ (p)->sleep_avg > slice_avg(p) + THROTTLE_THRESHOLD) ? \
+({ \
+ ((p)->throttle_stamp = jiffies) ? : 1; \
+}) : 0)
+
+/*
+ * Those who use the least cpu receive the most encouragement.
+ */
+#define SLICE_AVG_MULTIPLIER(p) \
+ (1 + NS_TO_JIFFIES(this_slice_avg(p)) * MAX_BONUS / MAX_SLEEP_AVG)
+
+#define conditional_release(p) (((p)->throttle_stamp && \
+ (p)->sched_time >= (G2 ? JIFFIES_TO_NS(HZ) : ~0ULL) && \
+ ((20 + cpu_this_slice(p) < cpu_avg(p) && (p)->sleep_avg < \
+ slice_avg(p) + THROTTLE_THRESHOLD) || cpu_avg(p) <= 5)) ? \
+({ \
+ int __ret = 0; \
+ int delay = slice_time_avg(p) - last_slice(p); \
+ if (delay > 0) { \
+ delay *= SLICE_AVG_MULTIPLIER(p); \
+ (p)->throttle_stamp += delay; \
+ } \
+ if (time_before(jiffies, (p)->throttle_stamp)) { \
+ (p)->throttle_stamp = 0; \
+ __ret++; \
+ if (!((p)->state & TASK_NONINTERACTIVE)) \
+ (p)->sleep_type = SLEEP_NORMAL; \
+ } \
+ __ret; \
+}) : 0)
+
+/*
+ * CURRENT_BONUS(p) adjusted to match slice_avg after grace expiration.
+ */
+#define ADJUSTED_BONUS(p, grace) \
+({ \
+ unsigned long sleep_avg = (p)->sleep_avg; \
+ if (grace_expired(p, (grace))) \
+ sleep_avg = min((unsigned long)(p)->sleep_avg, \
+ (unsigned long)slice_avg(p)); \
+ NS_TO_JIFFIES(sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG; \
+})
+
+#define BONUS_MULTIPLIER(p) \
+ (grace_expired(p, G1) ? : SLICE_AVG_MULTIPLIER(p))
+
+#define BONUS_DIVISOR(p) \
+ (grace_expired(p, G2) ? : (1 + ADJUSTED_BONUS(p, G1)))
+
+#define INTERACTIVE_SLEEP_AVG(p) \
+ (min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA(p)) / MAX_BONUS), \
+ NS_MAX_SLEEP_AVG))
+
+/*
+ * The quantity of sleep guaranteed to elevate a task to interactive status,
+ * or if already there, to elevate it to the next priority or beyond.
+ */
+#define INTERACTIVE_SLEEP_NS(p, ns) \
+ (BONUS_MULTIPLIER(p) * (ns) >= INTERACTIVE_SLEEP_AVG(p) || \
+ ((p)->sleep_avg < INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+ (ns) + (p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p)) || \
+ ((p)->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) && BONUS_MULTIPLIER(p) * \
+ (ns) + ((p)->sleep_avg % NEXT_PRIO) >= NEXT_PRIO))
#define TASK_PREEMPTS_CURR(p, rq) \
((p)->prio < (rq)->curr->prio)
@@ -659,7 +845,7 @@
if (rt_task(p))
return p->prio;
- bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
+ bonus = ADJUSTED_BONUS(p, G2) - MAX_BONUS / 2;
prio = p->static_prio - bonus;
if (prio < MAX_RT_PRIO)
@@ -759,36 +945,50 @@
if (likely(sleep_time > 0)) {
/*
- * User tasks that sleep a long time are categorised as
- * idle and will get just interactive status to stay active &
- * prevent them suddenly becoming cpu hogs and starving
- * other processes.
+ * Tasks that sleep a long time are categorised as idle.
+ * They will only have their sleep_avg increased to a
+ * level that makes them just interactive priority to stay
+ * active yet prevent them suddenly becoming cpu hogs and
+ * starving other processes. All tasks must stop at each
+ * TASK_INTERACTIVE boundary before moving on so that no
+ * single sleep slams it straight into NS_MAX_SLEEP_AVG.
*/
- if (p->mm && p->activated != -1 &&
- sleep_time > INTERACTIVE_SLEEP(p)) {
- p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG -
- DEF_TIMESLICE);
+ if (INTERACTIVE_SLEEP_NS(p, sleep_time)) {
+ int ticks = last_slice(p) / BONUS_DIVISOR(p);
+ unsigned long ceiling = INTERACTIVE_SLEEP_AVG(p);
+
+ ticks = JIFFIES_TO_NS(ticks);
+
+ if (grace_expired(p, G2) && slice_avg(p) < ceiling)
+ ceiling = slice_avg(p);
+ /* Promote previously interactive task. */
+ else if (p->sleep_avg >= INTERACTIVE_SLEEP_AVG(p) &&
+ !grace_expired(p, G2)) {
+
+ ceiling = p->sleep_avg / NEXT_PRIO;
+ if (ceiling < MAX_BONUS)
+ ceiling++;
+ ceiling *= NEXT_PRIO;
+ }
+
+ ceiling += ticks;
+
+ if (ceiling > NS_MAX_SLEEP_AVG)
+ ceiling = NS_MAX_SLEEP_AVG;
+
+ if (p->sleep_avg < ceiling)
+ p->sleep_avg = ceiling;
} else {
- /*
- * The lower the sleep avg a task has the more
- * rapidly it will rise with sleep time.
- */
- sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
/*
- * Tasks waking from uninterruptible sleep are
- * limited in their sleep_avg rise as they
- * are likely to be waiting on I/O
+ * The lower the sleep avg a task has the more
+ * rapidly it will rise with sleep time. This enables
+ * tasks to rapidly recover to a low latency priority.
+ * If a task was sleeping with the noninteractive
+ * label do not apply this non-linear boost
*/
- if (p->activated == -1 && p->mm) {
- if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
- sleep_time = 0;
- else if (p->sleep_avg + sleep_time >=
- INTERACTIVE_SLEEP(p)) {
- p->sleep_avg = INTERACTIVE_SLEEP(p);
- sleep_time = 0;
- }
- }
+ if (p->sleep_type != SLEEP_NONINTERACTIVE)
+ sleep_time *= BONUS_MULTIPLIER(p);
/*
* This code gives a bonus to interactive tasks.
@@ -835,7 +1035,7 @@
* This checks to make sure it's not an uninterruptible task
* that is now waking up.
*/
- if (!p->activated) {
+ if (p->sleep_type != SLEEP_NONINTERACTIVE) {
/*
* Tasks which were woken up by interrupts (ie. hw events)
* are most likely of interactive nature. So we give them
@@ -844,13 +1044,13 @@
* on a CPU, first time around:
*/
if (in_interrupt())
- p->activated = 2;
+ p->sleep_type = SLEEP_INTERRUPTED;
else {
/*
* Normal first-time wakeups get a credit too for
* on-runqueue time, but it will be weighted down:
*/
- p->activated = 1;
+ p->sleep_type = SLEEP_INTERACTIVE;
}
}
p->timestamp = now;
@@ -1371,25 +1571,28 @@
out_activate:
#endif /* CONFIG_SMP */
- if (old_state == TASK_UNINTERRUPTIBLE) {
- rq->nr_uninterruptible--;
+
+ conditional_release(p);
+
+ if (old_state & TASK_UNINTERRUPTIBLE) {
/*
- * Tasks on involuntary sleep don't earn
- * sleep_avg beyond just interactive state.
+ * Tasks waking from uninterruptible sleep are likely
+ * to be sleeping involuntarily on I/O and are otherwise
+ * cpu bound so label them as noninteractive.
*/
- p->activated = -1;
- }
+ p->sleep_type = SLEEP_NONINTERACTIVE;
+ } else
/*
* Tasks that have marked their sleep as noninteractive get
- * woken up without updating their sleep average. (i.e. their
- * sleep is handled in a priority-neutral manner, no priority
- * boost and no penalty.)
+ * woken up with their sleep average not weighted in an
+ * interactive way.
*/
- if (old_state & TASK_NONINTERACTIVE)
- __activate_task(p, rq);
- else
- activate_task(p, rq, cpu == this_cpu);
+ if (old_state & TASK_NONINTERACTIVE)
+ p->sleep_type = SLEEP_NONINTERACTIVE;
+
+
+ activate_task(p, rq, cpu == this_cpu);
/*
* Sync wakeups (i.e. those types of wakeups where the waker
* has indicated that it will leave the CPU in short order)
@@ -1471,9 +1674,27 @@
* The remainder of the first timeslice might be recovered by
* the parent if the child exits early enough.
*/
- p->first_time_slice = 1;
+ set_first_time_slice(p);
current->time_slice >>= 1;
p->timestamp = sched_clock();
+
+ /*
+ * Set up slice_info for the child.
+ *
+ * Note: The child inherits the parent's throttle,
+ * and must shake it loose. It does not inherit
+ * the parent's slice_avg.
+ */
+ set_slice_avg(p, NS_MAX_SLEEP_AVG);
+ set_last_slice(p, p->time_slice);
+ set_slice_is_new(p);
+ p->last_slice = jiffies;
+ /*
+ * Limit the difficulty to what the parent faced.
+ */
+ if (p->throttle_stamp && grace_expired(p, G2))
+ p->throttle_stamp = jiffies - G2;
+
if (unlikely(!current->time_slice)) {
/*
* This case is rare, it happens when the parent has only
@@ -1587,7 +1808,7 @@
* the sleep_avg of the parent as well.
*/
rq = task_rq_lock(p->parent, &flags);
- if (p->first_time_slice && task_cpu(p) == task_cpu(p->parent)) {
+ if (first_time_slice(p) && task_cpu(p) == task_cpu(p->parent)) {
p->parent->time_slice += p->time_slice;
if (unlikely(p->parent->time_slice > task_timeslice(p)))
p->parent->time_slice = task_timeslice(p);
@@ -2655,6 +2876,51 @@
}
/*
+ * Calculate a task's average cpu usage rate in terms of sleep_avg, and
+ * check whether the task may soon need throttling. Must be called after
+ * refreshing the task's time slice.
+ * @p: task for which slice_avg should be computed.
+ */
+static void recalc_task_slice_avg(task_t *p)
+{
+ unsigned int slice_avg = slice_avg_raw(p);
+ unsigned int time_slice = last_slice(p);
+ int w = MAX_BONUS, idle;
+
+ if (unlikely(!time_slice))
+ set_last_slice(p, p->time_slice);
+
+ idle = 100 - cpu_this_slice(p);
+
+ /*
+ * If the task is lowering its cpu usage, speed up the
+ * effect on slice_avg so we don't over-throttle.
+ */
+ if (idle > slice_avg) {
+ w -= idle / w;
+ if (!w)
+ w = 1;
+ }
+
+ slice_avg = (w * (slice_avg ? : 1) + idle) / (w + 1);
+
+ /* Check to see if we should start/stop throttling. */
+	if (!rt_task(p) && !conditional_release(p))
+ conditional_tag(p);
+
+ /* Update slice_avg. */
+ set_slice_avg_raw(p, slice_avg);
+
+ /* Update cached slice length. */
+ if (time_slice != p->time_slice)
+ set_last_slice(p, p->time_slice);
+
+ /* And finally, stamp and tag the new slice. */
+ set_slice_is_new(p);
+ p->last_slice = jiffies;
+}
+
+/*
* This function gets called by the timer code, with HZ frequency.
* We call it with interrupts disabled.
*
@@ -2699,20 +2965,24 @@
*/
if ((p->policy == SCHED_RR) && !--p->time_slice) {
p->time_slice = task_timeslice(p);
- p->first_time_slice = 0;
+ recalc_task_slice_avg(p);
+ clr_first_time_slice(p);
set_tsk_need_resched(p);
/* put it at the end of the queue: */
requeue_task(p, rq->active);
}
+ if (unlikely(p->throttle_stamp))
+ p->throttle_stamp = 0;
goto out_unlock;
}
if (!--p->time_slice) {
dequeue_task(p, rq->active);
set_tsk_need_resched(p);
- p->prio = effective_prio(p);
p->time_slice = task_timeslice(p);
- p->first_time_slice = 0;
+ recalc_task_slice_avg(p);
+ p->prio = effective_prio(p);
+ clr_first_time_slice(p);
if (!rq->expired_timestamp)
rq->expired_timestamp = jiffies;
@@ -2959,6 +3229,12 @@
#endif
+static inline int interactive_sleep(enum sleep_type sleep_type)
+{
+ return (sleep_type == SLEEP_INTERACTIVE ||
+ sleep_type == SLEEP_INTERRUPTED);
+}
+
/*
* schedule() is the main scheduler function.
*/
@@ -3017,7 +3293,7 @@
* Tasks charged proportionately less run_time at high sleep_avg to
* delay them losing their interactive status
*/
- run_time /= (CURRENT_BONUS(prev) ? : 1);
+ run_time /= BONUS_DIVISOR(prev);
spin_lock_irq(&rq->lock);
@@ -3031,7 +3307,7 @@
unlikely(signal_pending(prev))))
prev->state = TASK_RUNNING;
else {
- if (prev->state == TASK_UNINTERRUPTIBLE)
+ if (prev->state & TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible++;
deactivate_task(prev, rq);
}
@@ -3080,16 +3356,17 @@
rq->best_expired_prio = MAX_PRIO;
}
+repeat_selection:
idx = sched_find_first_bit(array->bitmap);
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);
- if (!rt_task(next) && next->activated > 0) {
+ if (!rt_task(next) && interactive_sleep(next->sleep_type)) {
unsigned long long delta = now - next->timestamp;
if (unlikely((long long)(now - next->timestamp) < 0))
delta = 0;
- if (next->activated == 1)
+ if (next->sleep_type == SLEEP_INTERACTIVE)
delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
array = next->array;
@@ -3099,10 +3376,16 @@
dequeue_task(next, array);
next->prio = new_prio;
enqueue_task(next, array);
- } else
- requeue_task(next, array);
+
+ /*
+ * We may have just been demoted below other
+ * runnable tasks in our previous queue.
+ */
+ next->sleep_type = SLEEP_NORMAL;
+ goto repeat_selection;
+ }
}
- next->activated = 0;
+ next->sleep_type = SLEEP_NORMAL;
switch_tasks:
if (next == rq->idle)
schedstat_inc(rq, sched_goidle);
@@ -3118,6 +3401,14 @@
prev->sleep_avg = 0;
prev->timestamp = prev->last_ran = now;
+ /*
+ * Tag start of execution of a new timeslice.
+ */
+ if (unlikely(slice_is_new(next))) {
+ next->last_slice = jiffies;
+ clr_slice_is_new(next);
+ }
+
sched_info_switch(prev, next);
if (likely(prev != next)) {
next->timestamp = now;
--- linux-2.6.16-rc1/kernel/sysctl.c.org 2006-02-12 21:29:24.000000000 +0100
+++ linux-2.6.16-rc1/kernel/sysctl.c 2006-02-12 21:29:53.000000000 +0100
@@ -71,6 +71,8 @@
extern int pid_max_min, pid_max_max;
extern int sysctl_drop_caches;
extern int percpu_pagelist_fraction;
+extern int sched_g1;
+extern int sched_g2;
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
int unknown_nmi_panic;
@@ -226,6 +228,11 @@
{ .ctl_name = 0 }
};
+/* Constants for minimum and maximum testing in vm_table and
+ * kern_table. We use these as one-element integer vectors. */
+static int zero;
+static int one_hundred = 100;
+
static ctl_table kern_table[] = {
{
.ctl_name = KERN_OSTYPE,
@@ -658,15 +665,29 @@
.proc_handler = &proc_dointvec,
},
#endif
+ {
+ .ctl_name = KERN_SCHED_THROTTLE1,
+ .procname = "sched_g1",
+ .data = &sched_g1,
+ .maxlen = sizeof (int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
+ {
+ .ctl_name = KERN_SCHED_THROTTLE2,
+ .procname = "sched_g2",
+ .data = &sched_g2,
+ .maxlen = sizeof (int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
{ .ctl_name = 0 }
};
-/* Constants for minimum and maximum testing in vm_table.
- We use these as one-element integer vectors. */
-static int zero;
-static int one_hundred = 100;
-
-
static ctl_table vm_table[] = {
{
.ctl_name = VM_OVERCOMMIT_MEMORY,
--- linux-2.6.16-rc1/fs/pipe.c.org 2006-02-12 21:29:35.000000000 +0100
+++ linux-2.6.16-rc1/fs/pipe.c 2006-02-12 21:29:53.000000000 +0100
@@ -39,11 +39,7 @@
{
DEFINE_WAIT(wait);
- /*
- * Pipes are system-local resources, so sleeping on them
- * is considered a noninteractive wait:
- */
- prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE|TASK_NONINTERACTIVE);
+ prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE);
mutex_unlock(PIPE_MUTEX(*inode));
schedule();
finish_wait(PIPE_WAIT(*inode), &wait);
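[For anyone picking apart the slice_info packing in the patch above, the field layout can be checked in user space. The widths and shifts below are copied from the patch; the struct and helper forms are made up for illustration.]

```c
#include <assert.h>

/* slice_info packs five fields into what used to be first_time_slice:
 * FTS flag | NEW flag | 13 spare bits | last slice (jiffies) | avg %.
 * Widths/shifts copied from the patch; fake_task is ours. */
#define SLICE_AVG_BITS	7
#define SLICE_LTS_BITS	10
#define SLICE_SPA_BITS	13
#define SLICE_NEW_BITS	1

#define SLICE_AVG_SHIFT	0
#define SLICE_LTS_SHIFT	(SLICE_AVG_SHIFT + SLICE_AVG_BITS)
#define SLICE_SPA_SHIFT	(SLICE_LTS_SHIFT + SLICE_LTS_BITS)
#define SLICE_NEW_SHIFT	(SLICE_SPA_SHIFT + SLICE_SPA_BITS)
#define SLICE_FTS_SHIFT	(SLICE_NEW_SHIFT + SLICE_NEW_BITS)

#define INFO_MASK(x)	((1U << (x)) - 1)
#define SLICE_AVG_MASK	(INFO_MASK(SLICE_AVG_BITS) << SLICE_AVG_SHIFT)
#define SLICE_LTS_MASK	(INFO_MASK(SLICE_LTS_BITS) << SLICE_LTS_SHIFT)

struct fake_task { unsigned int slice_info; };

/* read-modify-write of the last-slice field, leaving neighbours alone */
static void set_last_slice(struct fake_task *p, unsigned int n)
{
	p->slice_info = (p->slice_info & ~SLICE_LTS_MASK) |
			((n << SLICE_LTS_SHIFT) & SLICE_LTS_MASK);
}

static unsigned int last_slice(struct fake_task *p)
{
	return (p->slice_info & SLICE_LTS_MASK) >> SLICE_LTS_SHIFT;
}
```

[With these widths the 10-bit last-slice field tops out at 1023 jiffies, the FTS flag lands in bit 31, and the masks work out to 0x0001FF80 for LTS and 0x0000007F for AVG.]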
On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > If you think it's the scheduler, how about try the patch below. It's
> > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > logic in the scheduler or not. I don't see other candidates in there,
> > > not that that means there aren't any of course.
> >
> > I'll try, but it's a serious pain for me to build an -mm kernel. A
> > patch against 2.6.16-rc1 would be much easier.
>
> Ok, here she comes. It's a bit too reluctant to release a task so it
> can reach interactive status at the moment, but for this test, that's a
> feature. In fact, for this test, it's probably best to jump straight to
> setting both g1 and g2 to zero.
Thanks, this solves the "ls" problem I was having! The strange
"oscillating" behavior is gone, now it consistently takes 0.19-0.45s.
It's still not as consistent as "time ls | cat", which takes 0.19-0.26s,
but MUCH better.
Lee
On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > If you think it's the scheduler, how about try the patch below. It's
> > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > logic in the scheduler or not. I don't see other candidates in there,
> > > not that that means there aren't any of course.
> >
> > I'll try, but it's a serious pain for me to build an -mm kernel. A
> > patch against 2.6.16-rc1 would be much easier.
>
> Ok, here she comes. It's a bit too reluctant to release a task so it
> can reach interactive status at the moment, but for this test, that's a
> feature. In fact, for this test, it's probably best to jump straight to
> setting both g1 and g2 to zero.
Not only does this fix my "time ls" test case, it seems to drastically
improve interactivity for my desktop apps. I was really being plagued
by weird stalls, it's much smoother now.
Now to regression test it...
Lee
On Sun, 2006-02-12 at 18:39 -0500, Lee Revell wrote:
> On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> > On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > > If you think it's the scheduler, how about try the patch below. It's
> > > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > > logic in the scheduler or not. I don't see other candidates in there,
> > > > not that that means there aren't any of course.
> > >
> > > I'll try, but it's a serious pain for me to build an -mm kernel. A
> > > patch against 2.6.16-rc1 would be much easier.
> >
> > Ok, here she comes. It's a bit too reluctant to release a task so it
> > can reach interactive status at the moment, but for this test, that's a
> > feature. In fact, for this test, it's probably best to jump straight to
> > setting both g1 and g2 to zero.
>
> Not only does this fix my "time ls" test case, it seems to drastically
> improve interactivity for my desktop apps. I was really being plagued
> by weird stalls, it's much smoother now.
Yeah, but under load, that reluctance to release is fairly annoying...
>
> Now to regression test it...
...and may cause test applications to not reach their proper priority
before measurement begins.
-Mike
On Mon, 2006-02-13 at 04:09 +0100, MIke Galbraith wrote:
> On Sun, 2006-02-12 at 18:39 -0500, Lee Revell wrote:
> > On Sun, 2006-02-12 at 22:36 +0100, MIke Galbraith wrote:
> > > On Sun, 2006-02-12 at 14:03 -0500, Lee Revell wrote:
> > > > On Sun, 2006-02-12 at 14:47 +0100, MIke Galbraith wrote:
> > > > > If you think it's the scheduler, how about try the patch below. It's
> > > > > against 2.6.16-rc2-mm1, and should tell you if it is the interactivity
> > > > > logic in the scheduler or not. I don't see other candidates in there,
> > > > > not that that means there aren't any of course.
> > > >
> > > > I'll try, but it's a serious pain for me to build an -mm kernel. A
> > > > patch against 2.6.16-rc1 would be much easier.
> > >
> > > Ok, here she comes. It's a bit too reluctant to release a task so it
> > > can reach interactive status at the moment, but for this test, that's a
> > > feature. In fact, for this test, it's probably best to jump straight to
> > > setting both g1 and g2 to zero.
> >
> > Not only does this fix my "time ls" test case, it seems to drastically
> > improve interactivity for my desktop apps. I was really being plagued
> > by weird stalls, it's much smoother now.
>
> Yeah, but under load, that reluctance to release is fairly annoying...
This seems to manifest on my system as the mouse getting jerky under
load. Still, I don't mind - the overall feel is still smoother - as if
the X server was getting too much CPU before.
Lee
On Sun, 2006-02-12 at 22:39 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 04:09 +0100, MIke Galbraith wrote:
> > On Sun, 2006-02-12 at 18:39 -0500, Lee Revell wrote:
> > > Not only does this fix my "time ls" test case, it seems to drastically
> > > improve interactivity for my desktop apps. I was really being plagued
> > > by weird stalls, it's much smoother now.
> >
> > Yeah, but under load, that reluctance to release is fairly annoying...
>
> This seems to manifest on my system as the mouse getting jerky under
> load. Still, I don't mind - the overall feel is still smoother - as if
> the X server was getting too much CPU before.
Not only the X server, the interactivity boost in general is way too
severe. It still needs work, but it's shaping up. Who knows, it may
even get to the point of applying for inclusion.
Now, let's see if we can get your problem fixed with something that can
possibly go into 2.6.16 as a bugfix. Can you please try the below?
This patch fixes two of what I would call thinkos in the interactivity
logic. One, the requeuing of a freshly awakened task can lead to that
task losing its slice of cpu upon preempt if it isn't alone in its
queue. Two, tasks which sleep for > INTERACTIVE_SLEEP(p), approximately
700ms, are treated as being idle, and prevented from slamming straight
to max dynamic priority for obvious reasons, yet a pure cpu hog that
sleeps for 100ms will fail the test, and subsequently have its sleep
time multiplied by MAX_BONUS.
Just in case it does fix it for you, I'll even add a blame line.
Signed-off-by: Mike Galbraith <[email protected]>
--- linux-2.6.16-rc3/kernel/sched.c.org 2006-02-13 04:44:57.000000000 +0100
+++ linux-2.6.16-rc3/kernel/sched.c 2006-02-13 05:11:18.000000000 +0100
@@ -150,8 +150,11 @@
((p)->prio <= (p)->static_prio - DELTA(p))
#define INTERACTIVE_SLEEP(p) \
- (JIFFIES_TO_NS(MAX_SLEEP_AVG * \
- (MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
+ (min(JIFFIES_TO_NS(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA((p)) + 1) / \
+ MAX_BONUS - 1), NS_MAX_SLEEP_AVG))
+
+#define BONUS_MULTIPLIER(p) \
+ ((MAX_BONUS - CURRENT_BONUS(p)) ? : 1)
#define TASK_PREEMPTS_CURR(p, rq) \
((p)->prio < (rq)->curr->prio)
@@ -708,7 +711,7 @@
* prevent them suddenly becoming cpu hogs and starving
* other processes.
*/
- if (p->mm && p->activated != -1 &&
+ if (p->mm && p->activated != -1 && BONUS_MULTIPLIER(p) *
sleep_time > INTERACTIVE_SLEEP(p)) {
p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG -
DEF_TIMESLICE);
@@ -717,7 +720,7 @@
* The lower the sleep avg a task has the more
* rapidly it will rise with sleep time.
*/
- sleep_time *= (MAX_BONUS - CURRENT_BONUS(p)) ? : 1;
+ sleep_time *= BONUS_MULTIPLIER(p);
/*
* Tasks waking from uninterruptible sleep are
@@ -3009,8 +3012,7 @@
dequeue_task(next, array);
next->prio = new_prio;
enqueue_task(next, array);
- } else
- requeue_task(next, array);
+ }
}
next->activated = 0;
switch_tasks:
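[To see why thinko two matters, plug in the stock constants. The sketch below assumes the mainline 2.6 defaults of the era (HZ=1000, MAX_BONUS=10, DEF_TIMESLICE=100 jiffies, nice-0 DELTA(p)=2) and models the pre-fix sleep_avg credit in jiffies rather than nanoseconds, purely for illustration.]

```c
#include <assert.h>

/* Assumed stock 2.6 values (nice 0), all in jiffies with HZ=1000. */
#define HZ		1000
#define MAX_BONUS	10
#define DEF_TIMESLICE	(100 * HZ / 1000)
#define MAX_SLEEP_AVG	(DEF_TIMESLICE * MAX_BONUS)
#define DELTA		2	/* nice-0 value of DELTA(p) */
#define INTERACTIVE_SLEEP \
	(MAX_SLEEP_AVG * (MAX_BONUS / 2 + DELTA + 1) / MAX_BONUS - 1)

/* Pre-fix credit: a sleep shorter than INTERACTIVE_SLEEP is
 * multiplied by (MAX_BONUS - current bonus) with no ceiling. */
static int old_sleep_credit(int sleep_avg, int sleep_time)
{
	int bonus = sleep_avg * MAX_BONUS / MAX_SLEEP_AVG;

	if (sleep_time > INTERACTIVE_SLEEP)	/* "idle", capped */
		return MAX_SLEEP_AVG - DEF_TIMESLICE;
	sleep_time *= (MAX_BONUS - bonus) ? (MAX_BONUS - bonus) : 1;
	sleep_avg += sleep_time;
	return sleep_avg > MAX_SLEEP_AVG ? MAX_SLEEP_AVG : sleep_avg;
}
```

[With these numbers a pure hog (sleep_avg 0) sleeping 100ms gets its sleep multiplied by 10 and lands on the full 1000-jiffy sleep_avg, i.e. maximum dynamic priority, while a genuinely long sleeper is capped below that. The fix applies BONUS_MULTIPLIER before the INTERACTIVE_SLEEP comparison, so the 100ms sleeper hits the same ceiling.]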
On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> Now, let's see if we can get your problem fixed with something that can
> possibly go into 2.6.16 as a bugfix. Can you please try the below?
These sorts of changes definitely need to pass through -mm first... and don't
forget -mm looks quite different to mainline.
Cheers,
Con
On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > Now, let's see if we can get your problem fixed with something that can
> > possibly go into 2.6.16 as a bugfix. Can you please try the below?
>
> These sorts of changes definitely need to pass through -mm first... and don't
> forget -mm looks quite different to mainline.
I'll leave that up to Ingo of course, and certainly have no problem with
them burning in mm. However, I must say that I personally classify
these two changes as being trivial and obviously correct enough to be
included in 2.6.16. I realize that mm is different in this area (we
talked about it), but the problems addressed by this patch still remain.
Anyway, let's wait and see if it fixes Lee's problem.
-Mike
On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > Now, let's see if we can get your problem fixed with something that can
> > > possibly go into 2.6.16 as a bugfix. Can you please try the below?
> >
> > These sorts of changes definitely need to pass through -mm first... and
> > don't forget -mm looks quite different to mainline.
>
> I'll leave that up to Ingo of course, and certainly have no problem with
> them burning in mm. However, I must say that I personally classify
> these two changes as being trivial and obviously correct enough to be
> included in 2.6.16.
This part I agree with:
- } else
- requeue_task(next, array);
+ }
The rest changes behaviour; it's not a "bug" so needs testing, should be a
separate patch from this part, and modified to suit -mm.
Cheers,
Con
On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > Now, let's see if we can get your problem fixed with something that can
> > > > possibly go into 2.6.16 as a bugfix. Can you please try the below?
> > >
> > > These sorts of changes definitely need to pass through -mm first... and
> > > don't forget -mm looks quite different to mainline.
> >
> > I'll leave that up to Ingo of course, and certainly have no problem with
> > them burning in mm. However, I must say that I personally classify
> > these two changes as being trivial and obviously correct enough to be
> > included in 2.6.16.
>
> This part I agree with:
> - } else
> - requeue_task(next, array);
> + }
>
> The rest changes behaviour; it's not a "bug" so needs testing, should be a
> separate patch from this part, and modified to suit -mm.
Well, both change behavior, and I heartily disagree. Blocking a 700ms
sleep while allowing a 100ms sleep to bypass the same checkpoint only to
then be multiplied by 10 is a bug.
Actually, the point at which a task becomes interactive is the point at
which scheduler semantics change. Ergo, as far as I'm concerned, this
should be a boundary which must be crossed before proceeding further.
That, I agree, would be a behavioral change which should be baked in mm.
-Mike
On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > Now, let's see if we can get your problem fixed with something that
> > > > > can possibly go into 2.6.16 as a bugfix. Can you please try the
> > > > > below?
> > > >
> > > > These sorts of changes definitely need to pass through -mm first...
> > > > and don't forget -mm looks quite different to mainline.
> > >
> > > I'll leave that up to Ingo of course, and certainly have no problem
> > > with them burning in mm. However, I must say that I personally
> > > classify these two changes as being trivial and obviously correct
> > > enough to be included in 2.6.16.
> >
> > This part I agree with:
> > - } else
> > - requeue_task(next, array);
> > + }
> >
> > The rest changes behaviour; it's not a "bug" so needs testing, should be
> > a separate patch from this part, and modified to suit -mm.
>
> Well, both change behavior, and I heartily disagree.
The first change was the previous behaviour for some time. Your latter change,
while it makes sense, has never been in the kernel. Either way, I don't
disagree with your reasoning, but most things that change behaviour should go
through -mm. The first, as I said, was the behaviour in mainline for some time
till my silly requeue change.
Cheers,
Con
> Blocking a 700ms
> sleep while allowing a 100ms sleep to bypass the same checkpoint only to
> then be multiplied by 10 is a bug.
>
> Actually, the point at which a task becomes interactive is the point at
> which scheduler semantics change. Ergo, as far as I'm concerned, this
> should be a boundary which must be crossed before proceeding further.
> That, I agree, would be a behavioral change which should be baked in mm.
On Mon, 2006-02-13 at 17:08 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > > Now, let's see if we can get your problem fixed with something that
> > > > > > can possibly go into 2.6.16 as a bugfix. Can you please try the
> > > > > > below?
> > > > >
> > > > > These sorts of changes definitely need to pass through -mm first...
> > > > > and don't forget -mm looks quite different to mainline.
> > > >
> > > > I'll leave that up to Ingo of course, and certainly have no problem
> > > > with them burning in mm. However, I must say that I personally
> > > > classify these two changes as being trivial and obviously correct
> > > > enough to be included in 2.6.16.
> > >
> > > This part I agree with:
> > > - } else
> > > - requeue_task(next, array);
> > > + }
> > >
> > > The rest changes behaviour; it's not a "bug" so needs testing, should be
> > > a separate patch from this part, and modified to suit -mm.
> >
> > Well, both change behavior, and I heartily disagree.
>
> The first change was the previous behaviour for some time. Your latter change
> while it makes sense has never been in the kernel. Either way I don't
> disagree with your reasoning but most things that change behaviour should go
> through -mm. The first as I said was the behaviour in mainline for some time
> till my silly requeue change.
Ok, we're basically in agreement on these changes, it's just a matter of
when. As maintainer, Ingo has to weigh the benefit, danger, etc etc.
-Mike
On Mon, 2006-02-13 at 07:35 +0100, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 17:08 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > > > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > > > Now, let's see if we can get your problem fixed with something that
> > > > > > > can possibly go into 2.6.16 as a bugfix. Can you please try the
> > > > > > > below?
> > > > > >
> > > > > > These sorts of changes definitely need to pass through -mm first...
> > > > > > and don't forget -mm looks quite different to mainline.
> > > > >
> > > > > I'll leave that up to Ingo of course, and certainly have no problem
> > > > > with them burning in mm. However, I must say that I personally
> > > > > classify these two changes as being trivial and obviously correct
> > > > > enough to be included in 2.6.16.
> > > >
> > > > This part I agree with:
> > > > - } else
> > > > - requeue_task(next, array);
> > > > + }
> > > >
> > > > The rest changes behaviour; it's not a "bug" so needs testing, should be
> > > > a separate patch from this part, and modified to suit -mm.
> > >
> > > Well, both change behavior, and I heartily disagree.
> >
> > The first change was the previous behaviour for some time. Your latter change
> > while it makes sense has never been in the kernel. Either way I don't
> > disagree with your reasoning but most things that change behaviour should go
> > through -mm. The first as I said was the behaviour in mainline for some time
> > till my silly requeue change.
>
> Ok, we're basically in agreement on these changes, it's just a matter of
> when. As maintainer, Ingo has to weigh the benefit, danger, etc etc.
Do you know which of those changes fixes the "ls" problem?
Lee
On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> Do you know which of those changes fixes the "ls" problem?
No, it could be either, both, or neither. Heck, it _could_ be a
combination of all of the things in my experimental tree for that
matter. I put this patch out there because I know they're both bugs,
and strongly suspect it'll cure the worst of the interactivity related
delays.
I'm hoping you'll test it and confirm that it fixes yours.
-Mike
On Monday 13 February 2006 17:35, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 17:08 +1100, Con Kolivas wrote:
> > On Monday 13 February 2006 16:57, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 16:37 +1100, Con Kolivas wrote:
> > > > On Monday 13 February 2006 16:32, MIke Galbraith wrote:
> > > > > On Mon, 2006-02-13 at 16:05 +1100, Con Kolivas wrote:
> > > > > > On Monday 13 February 2006 15:59, MIke Galbraith wrote:
> > > > > > > Now, let's see if we can get your problem fixed with something
> > > > > > > that can possibly go into 2.6.16 as a bugfix. Can you please
> > > > > > > try the below?
> > > > > >
> > > > > > These sorts of changes definitely need to pass through -mm
> > > > > > first... and don't forget -mm looks quite different to mainline.
> > > > >
> > > > > I'll leave that up to Ingo of course, and certainly have no problem
> > > > > with them burning in mm. However, I must say that I personally
> > > > > classify these two changes as being trivial and obviously correct
> > > > > enough to be included in 2.6.16.
> > > >
> > > > This part I agree with:
> > > > - } else
> > > > - requeue_task(next, array);
> > > > + }
> > > >
> > > > The rest changes behaviour; it's not a "bug" so needs testing, should
> > > > be a separate patch from this part, and modified to suit -mm.
> > >
> > > Well, both change behavior, and I heartily disagree.
> >
> > The first change was the previous behaviour for some time. Your latter
> > change while it makes sense has never been in the kernel. Either way I
> > don't disagree with your reasoning but most things that change behaviour
> > should go through -mm. The first as I said was the behaviour in mainline
> > for some time till my silly requeue change.
>
> Ok, we're basically in agreement on these changes, it's just a matter of
> when. As maintainer, Ingo has to weigh the benefit, danger, etc etc.
Aye and do note the change to the idle detection code currently in -mm will
already make substantial difference there, possibly related to fluctuating
behaviour.
Cheers,
Con
On Mon, 2006-02-13 at 18:15 +1100, Con Kolivas wrote:
> On Monday 13 February 2006 17:35, MIke Galbraith wrote:
> >
> > Ok, we're basically in agreement on these changes, it's just a matter of
> > when. As maintainer, Ingo has to weigh the benefit, danger, etc etc.
>
> Aye and do note the change to the idle detection code currently in -mm will
> already make substantial difference there, possibly related to fluctuating
> behaviour.
Possibly... but there also lies a two-edged sword.  Previously, some
bad boys were being truncated back to user_prio 17. In mm, that's no
longer true.
-Mike
On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > Do you know which of those changes fixes the "ls" problem?
>
> No, it could be either, both, or neither. Heck, it _could_ be a
> combination of all of the things in my experimental tree for that
> matter. I put this patch out there because I know they're both bugs,
> and strongly suspect it'll cure the worst of the interactivity related
> delays.
>
> I'm hoping you'll test it and confirm that it fixes yours.
Nope, this does not fix it. "time ls" ping-pongs back and forth between
~0.1s and ~0.9s. Must have been something else in the first patch.
Lee
On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > Do you know which of those changes fixes the "ls" problem?
> >
> > No, it could be either, both, or neither. Heck, it _could_ be a
> > combination of all of the things in my experimental tree for that
> > matter. I put this patch out there because I know they're both bugs,
> > and strongly suspect it'll cure the worst of the interactivity related
> > delays.
> >
> > I'm hoping you'll test it and confirm that it fixes yours.
>
> Nope, this does not fix it. "time ls" ping-pongs back and forth between
> ~0.1s and ~0.9s. Must have been something else in the first patch.
Oh well. Thanks for testing Lee. I was hoping this would be a case of
instant gratification, 2.6.16 being near, but it's not to be.
-Mike
On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > Do you know which of those changes fixes the "ls" problem?
> >
> > No, it could be either, both, or neither. Heck, it _could_ be a
> > combination of all of the things in my experimental tree for that
> > matter. I put this patch out there because I know they're both bugs,
> > and strongly suspect it'll cure the worst of the interactivity related
> > delays.
> >
> > I'm hoping you'll test it and confirm that it fixes yours.
>
> Nope, this does not fix it. "time ls" ping-pongs back and forth between
> ~0.1s and ~0.9s. Must have been something else in the first patch.
Hmm. Thinking about it some more, it's probably more than this alone,
but it could well be the boost qualifier I'm using...
Instead of declaring a task to be deserving of large quantities of boost
based upon their present shortage of sleep_avg, I based it upon their
not using their full slice. He who uses the least gets the most. This
made a large contribution to mitigating the parallel compile over NFS
problem the current scheduler has. The fact that (current) heuristics
which mandate that any task which sleeps for 5% of its slice may use
95% cpu practically forever can not only work, but work quite well in
the general case, tells me that the vast majority of all tasks are, and
will forever remain, cpu hogs.
The present qualifier creates positive feedback for cpu hogs by giving
them the most reward for being the biggest hog by our own definition.
If you'll pardon the pun, we give pigs wings, and hope that they don't
actually use them and fly directly overhead.  This is the root problem
as I see it, that and the fact that even if sleep_avg acquisition and
consumption were purely 1:1 as the original O(1) scheduler was, if you
sleep 1 ns longer than you run, you'll eventually be up to your neck in
sleep_avg.  (A darn good reason to use something like slice_avg to help
determine when to drain off the excess.)
Changing that qualifier would also mean that he who is _getting_ the
least cpu would get the most boost as well, so it should help with
fairness, and things like the test case mentioned in comments in the
patch where one task can end up starving its own partner.
Is there any reason that "he who uses the least gets the most" would be
inferior to "he who has the least for whatever reason gets the most"?
If I were to put a patch together that did only that (IMHO sensible)
thing, would anyone be interested in trying it?
-Mike
On Mon, 2006-02-13 at 13:35 +0100, MIke Galbraith wrote:
> On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> > On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > > Do you know which of those changes fixes the "ls" problem?
> > >
> > > No, it could be either, both, or neither. Heck, it _could_ be a
> > > combination of all of the things in my experimental tree for that
> > > matter. I put this patch out there because I know they're both bugs,
> > > and strongly suspect it'll cure the worst of the interactivity related
> > > delays.
> > >
> > > I'm hoping you'll test it and confirm that it fixes yours.
> >
> > Nope, this does not fix it. "time ls" ping-pongs back and forth between
> > ~0.1s and ~0.9s. Must have been something else in the first patch.
>
> Hmm. Thinking about it some more, it's probably more than this alone,
> but it could well be the boost qualifier I'm using...
OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
Better than mainline but the large seemingly random variance is still
perceptible and annoying. And, "ls | cat" behaves about the same as
"ls", while on mainline it was consistently faster (!).
Do you have an updated patch against -mm that I can test?
Lee
On Tue, 2006-02-14 at 23:22 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 13:35 +0100, MIke Galbraith wrote:
> > On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> > > On Mon, 2006-02-13 at 08:08 +0100, MIke Galbraith wrote:
> > > > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > > > Do you know which of those changes fixes the "ls" problem?
> > > >
> > > > No, it could be either, both, or neither. Heck, it _could_ be a
> > > > combination of all of the things in my experimental tree for that
> > > > matter. I put this patch out there because I know they're both bugs,
> > > > and strongly suspect it'll cure the worst of the interactivity related
> > > > delays.
> > > >
> > > > I'm hoping you'll test it and confirm that it fixes yours.
> > >
> > > Nope, this does not fix it. "time ls" ping-pongs back and forth between
> > > ~0.1s and ~0.9s. Must have been something else in the first patch.
> >
> > Hmm. Thinking about it some more, it's probably more than this alone,
> > but it could well be the boost qualifier I'm using...
>
> OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
> Better than mainline but the large seemingly random variance is still
> perceptible and annoying. And, "ls | cat" behaves about the same as
> "ls", while on mainline it was consistently faster (!).
Ok. That means the reduction in fluctuation had nothing to do with my
changes. It also suggests that there may be something of a regression
in the changes that are in mm, which I also carried in my patch, since
the timing for both kernels appears to be ~identical with or without my
bits. That seems a little odd to me considering what those changes do.
>
> Do you have an updated patch against -mm that I can test?
I will soon if you still want to try it. I've fixed the throttle release
thing, and am fine tuning the interactivity bits. I have it working
very well now, but want to try to squeeze some more from it.
Drop me a line if you're still interested from the interactivity side,
but I think the ls delay reduction has turned out to be a red herring.
-Mike
On Wed, 2006-02-15 at 06:22 +0100, MIke Galbraith wrote:
> > OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and
> 0.50s.
> > Better than mainline but the large seemingly random variance is
> still
> > perceptible and annoying. And, "ls | cat" behaves about the same as
> > "ls", while on mainline it was consistently faster (!).
>
> Ok. That means the reduction in fluctuation had nothing to do with my
> changes. It also suggests that there may be something of a regression
> in the changes that are in mm, which I also carried in my patch, since
> the timing for both kernels appears to be ~identical with or without my
> bits. That seems a little odd to me considering what those changes
> do.
>
> >
> > Do you have an updated patch against -mm that I can test?
>
> I will soon if you still want to try it. I've fixed the throttle
> release
> thing, and am fine tuning the interactivity bits. I have it working
> very well now, but want to try to squeeze some more from it.
>
> Drop me a line if you're still interested from the interactivity side,
> but I think the ls delay reduction has turned out to be a red
> herring.
Just to be clear - this is 2.6.16-rc2-mm1 *without* your patch that I am
talking about.
Lee
On Wed, 2006-02-15 at 01:11 -0500, Lee Revell wrote:
> On Wed, 2006-02-15 at 06:22 +0100, MIke Galbraith wrote:
> > > OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and
> > 0.50s.
> > > Better than mainline but the large seemingly random variance is
> > still
> > > perceptible and annoying. And, "ls | cat" behaves about the same as
> > > "ls", while on mainline it was consistently faster (!).
> >
> > Ok. That means the reduction in fluctuation had nothing to do with my
> > changes. It also suggests that there may be something of a regression
> > in the changes that are in mm, which I also carried in my patch, since
> > the timing for both kernels appears to be ~identical with or without my
> > bits. That seems a little odd to me considering what those changes
> > do.
> >
> > >
> > > Do you have an updated patch against -mm that I can test?
> >
> > I will soon if you still want to try it. I've fixed the throttle
> > release
> > thing, and am fine tuning the interactivity bits. I have it working
> > very well now, but want to try to squeeze some more from it.
> >
> > Drop me a line if you're still interested from the interactivity side,
> > but I think the ls delay reduction has turned out to be a red
> > herring.
>
> Just to be clear - this is 2.6.16-rc2-mm1 *without* your patch that I am
> talking about.
Exactly. 2.6.16-rc2-mm1 without my patch has a delay of .15 to .50s.
2.6.16-rc1 with my patch had a reported delay of .19 to .45s.
That's identical in my book. My patch to rc1 also contained Con's
changes that are in mm, that's constant. Subtracting the variable, my
patch, made no difference. Con's changes may be responsible for the
behavior change, but mine are certainly not.
-Mike