2003-08-19 14:57:24

by Con Kolivas

Subject: [PATCH] O17int

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Food for the starving masses.

This patch prevents any single task from being able to starve the runqueue
that it's on. This minimises the impact a poorly behaved or malicious
application has on the rest of the system. If an interactive task (eg
blender) spins waiting on a cpu hog (X), other tasks will be less affected by
the priority inversion occurring between X and blender (ie X will still suffer
but the machine won't come to a standstill). Broken apps will improve at least
partially with this, but they should be fixed for good performance.

Changes:
A task that uses up its full timeslice will not be allowed to be the very
next task scheduled unless no other tasks on that runqueue are active. This
allows the next highest priority task to be scheduled instead, or even an
array switch followed by the next task.

Dropped the on-runqueue bonus to high credit tasks on requeuing for timeslice
granularity, so that the above works well.
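The rule above can be sketched as a toy model (purely illustrative; the struct layout and names here are invented, not the O17 code):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIO 8

struct task {
    int prio;            /* 0 = highest priority */
    int used_full_slice; /* set by the timer tick when the slice expires */
    struct task *next;   /* per-priority FIFO chain */
};

struct runqueue {
    struct task *queue[MAX_PRIO];
};

/* Return the best runnable task, refusing to re-select `prev` if it
 * just consumed its whole timeslice and anything else is runnable. */
static struct task *pick_next(struct runqueue *rq, struct task *prev)
{
    struct task *skipped = NULL;
    int prio;

    for (prio = 0; prio < MAX_PRIO; prio++) {
        struct task *t;
        for (t = rq->queue[prio]; t; t = t->next) {
            if (t == prev && prev->used_full_slice) {
                skipped = t;     /* remember prev as a last resort */
                continue;
            }
            return t;
        }
    }
    return skipped; /* only prev (or nothing at all) was runnable */
}
```

So a cpu hog that burns its whole slice always yields to the next highest priority task, but still runs immediately when the runqueue is otherwise empty.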

Patch O16.3-O17int and a full patch against 2.6.0-test3 is available at
http://kernel.kolivas.org/2.5

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/QjvMZUg7+tp6mRURAuLpAJ4wZaZ1qa5n4otk/wE+muarpqilJgCff3Bi
I8J81js9UZL3XoPqR0FnjEE=
=1KDG
-----END PGP SIGNATURE-----


Attachments:
patch-O16.3-O17int (5.01 kB)

2003-08-19 17:04:38

by Måns Rullgård

Subject: Re: [PATCH] O17int

Con Kolivas <[email protected]> writes:

> Food for the starving masses.
>
> This patch prevents any single task from being able to starve the
> runqueue that it's on. This minimises the impact a poorly behaved
> application or a malicious one has on the rest of the system. If an

I have to disagree. Open a file of a few hundred lines in XEmacs and
do a regexp search for "^[> ]*-*\n\\([> ]*.*\n\\)*[> ]*foo". The
system will more or less freeze. It's a very nasty regexp, and it's
an error to try to use it, but it still shouldn't freeze the system.
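For what it's worth, the freeze inside the search itself is classic catastrophic backtracking: a backtracking matcher retries exponentially many decompositions of the input before reporting failure. A toy matcher for the stand-in pattern (a|aa)*b (not XEmacs's regexp engine, just an illustration of the mechanism) shows the blow-up:

```c
#include <assert.h>

/* Count how many recursive attempts a naive backtracking matcher for
 * the pattern (a|aa)*b makes. On a failing input of n 'a's the count
 * grows like the Fibonacci numbers, i.e. exponentially in n. */
static long attempts;

static int match_aab(const char *s)
{
    attempts++;
    if (*s == 'b')
        return 1;
    if (*s != 'a')
        return 0;
    /* Try consuming one 'a'; on failure, retry consuming "aa".
     * Exploring both paths at every position is what explodes. */
    if (match_aab(s + 1))
        return 1;
    if (s[1] == 'a' && match_aab(s + 2))
        return 1;
    return 0;
}
```

Five trailing 'a's with no 'b' cost 20 attempts, ten cost 232; each extra character multiplies the work by roughly the golden ratio, which is why a few hundred lines of text can pin the cpu for a very long time.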

--
Måns Rullgård
[email protected]

2003-08-19 18:59:32

by Måns Rullgård

Subject: Re: [PATCH] O17int


Another XEmacs strangeness:

When compiling from xemacs, everything is fine until the compilation
is done. Then xemacs starts spinning wildly in some loop doing this:

select(1024, [], NULL, NULL, {0, 0}) = 0 (Timeout)
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
select(1024, [], NULL, NULL, {0, 0}) = 0 (Timeout)
ioctl(5, 0x541b, [0]) = 0
ioctl(5, 0x541b, [0]) = 0
gettimeofday({1061288133, 302532}, NULL) = 0
select(20, [3 5 6 7 9 10 12 13 19], [], [], {0, 0}) = 1 (in [13], left {0, 0})
gettimeofday({1061288133, 302779}, NULL) = 0
select(1024, [13], NULL, NULL, {0, 0}) = 1 (in [13], left {0, 0})
read(13, 0x8dc4b08, 512) = -1 EIO (Input/output error)
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1703, 0xbfffe80c, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1162, 0xbfffe80c, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1160, 0xbfffe80c, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1003, 0xbfffe80c, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
select(1024, [], NULL, NULL, {0, 0}) = 0 (Timeout)
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
write(5, "F\v\5\0004\0\0\0017\0\0\1\4\0\223\0008\0\r\0F0\5\0004\0"..., 40) = 40
select(1024, [], NULL, NULL, {0, 0}) = 0 (Timeout)
ioctl(5, 0x541b, [0]) = 0
ioctl(5, 0x541b, [0]) = 0
gettimeofday({1061288133, 304385}, NULL) = 0
select(20, [3 5 6 7 9 10 12 13 19], [], [], {0, 0}) = 1 (in [13], left {0, 0})
gettimeofday({1061288133, 304605}, NULL) = 0
select(1024, [13], NULL, NULL, {0, 0}) = 1 (in [13], left {0, 0})
read(13, 0x8dc4b08, 512) = -1 EIO (Input/output error)
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1703, 0xbfffe80c, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1162, 0xbfffe80c, WNOHANG, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
wait4(1160, 0xbfffe80c, WNOHANG, NULL) = 0

This goes on for anything from half a second to several seconds.
During that time other processes, except X, are starved.

I saw this first with 2.6.0-test1 vanilla, then it went away in -test2
and -test3, only to show up again with O16.3int. My O16.2 kernel
seems ok, which seems strange to me since the difference from O16.2 to
O16.3 is very small.

Any ideas?

--
Måns Rullgård
[email protected]

2003-08-20 01:12:59

by Con Kolivas

Subject: Re: [PATCH] O17int

On Wed, 20 Aug 2003 04:58, Måns Rullgård wrote:
> Another XEmacs strangeness:
>
> When compiling from xemacs, everything is fine until the compilation
> is done. Then xemacs starts spinning wildly in some loop doing this:
> This goes on for anything from half a second to several seconds.
> During that time other processes, except X, are starved.
>
> I saw this first with 2.6.0-test1 vanilla, then it went away in -test2
> and -test3, only to show up again with O16.3int. My O16.2 kernel
> seems ok, which seems strange to me since the difference from O16.2 to
> O16.3 is very small.

While being a small patch, 16.2-16.3 was a large change. It removed the very
aggressive starvation avoidance of spin on wait waker/wakee in 16.2, which
clearly helped your issue.

> Any ideas?

Pretty sure we have another spinner. A reniced -11 batch run of top -d 1 and
vmstat during the spinning, and a kernel profile for that time will be
helpful.

Con

2003-08-20 01:16:46

by Con Kolivas

Subject: Re: [PATCH] O17int

On Wed, 20 Aug 2003 02:39, Måns Rullgård wrote:
> Con Kolivas <[email protected]> writes:
> > Food for the starving masses.
> >
> > This patch prevents any single task from being able to starve the
> > runqueue that it's on. This minimises the impact a poorly behaved
> > application or a malicious one has on the rest of the system. If an
>
> I have to disagree. Open a file of a few hundred lines in XEmacs and
> do a regexp search for "^[> ]*-*\n\\([> ]*.*\n\\)*[> ]*foo". The
> system will more or less freeze. It's a very nasty regexp, and it's
> an error to try to use it, but it still shouldn't freeze the system.

Reasonably sure this is a variation on the starvation which O17 won't address.
Please top/vmstat/profile this and I'll look into it. There is potential for
starvation without using up full timeslices and this may be it.

Con

2003-08-20 04:55:09

by Voluspa

Subject: Re: [PATCH] O17int


On 2003-08-19 15:01:28 Con Kolivas wrote:

> Food for the starving masses.
>
> This patch prevents any single task from being able to starve the
> runqueue that it's on.

This is completely true in my two test scenarios. Winex 3.1 running
"Baldurs Gate I" is as smooth as -O10 or -A3 and it is impossible to
trigger a permanent starvation. Switching to a text console and back, I
only hear brief, ca 1.5 seconds, "sound repeats" after which the game
recovers totally. Clicking the "Software standard BLT[on/off]", that
even triggered a short priority inversion in A3, has no impact at all.
Playability, for those who wonder, is an impressive 9 out of 10 ;-)

Blender 2.28 cannot starve xmms one iota. Within blender itself, I can
cause 1 to 5 second freezes while doing a slow "world rotate", but that
is something the application programmers have to fix.

I see no throughput regression, and overall "feel" of the system is
great. A real keeper, this one.

Mvh
Mats Johannesson

2003-08-20 07:56:56

by Mike Galbraith

Subject: Re: [PATCH] O17int

At 06:55 AM 8/20/2003 +0200, Voluspa wrote:

>Blender 2.28 can not starve xmms one iota. Within blender itself, I can
>cause 1 to 5 second freezes while doing a slow "world rotate", but that
>is something the application programmers have to fix.

I'm not so sure that it's an application bug. With Nick's patch, I cannot
trigger any delay what so ever, whereas with stock, or with Ingo's changes
[as well as my own, damn the bad luck] I can. I'm not saying it's _not_ a
bug mind you, but color me suspicious ;-)

-Mike

2003-08-20 08:53:23

by Måns Rullgård

Subject: Re: [PATCH] O17int

Con Kolivas <[email protected]> writes:

>> When compiling from xemacs, everything is fine until the compilation
>> is done. Then xemacs starts spinning wildly in some loop doing this:
>> This goes on for anything from half a second to several seconds.
>> During that time other processes, except X, are starved.
>>
>> I saw this first with 2.6.0-test1 vanilla, then it went away in -test2
>> and -test3, only to show up again with O16.3int. My O16.2 kernel
>> seems ok, which seems strange to me since the difference from O16.2 to
>> O16.3 is very small.
>
> While being a small patch, 16.2-16.3 was a large change. It removed the very
> aggressive starvation avoidance of spin on wait waker/wakee in 16.2, which
> clearly helped your issue.

I have confirmed that this change introduced the problem.

>> Any ideas?
>
> Pretty sure we have another spinner. A reniced -11 batch run of top -d 1 and
> vmstat during the spinning, and a kernel profile for that time will be
> helpful.

I'll try to get that.

--
Måns Rullgård
[email protected]

2003-08-20 09:19:11

by Måns Rullgård

Subject: Re: [PATCH] O17int

Con Kolivas <[email protected]> writes:

>> > This patch prevents any single task from being able to starve the
>> > runqueue that it's on. This minimises the impact a poorly behaved
>> > application or a malicious one has on the rest of the system. If an
>>
>> I have to disagree. Open a file of a few hundred lines in XEmacs and
>> do a regexp search for "^[> ]*-*\n\\([> ]*.*\n\\)*[> ]*foo". The
>> system will more or less freeze. It's a very nasty regexp, and it's
>> an error to try to use it, but it still shouldn't freeze the system.
>
> Reasonably sure this is a variation on the starvation which O17
> won't address.

Well, obviously.

I tried O17 minus O16.2-O16.3, and I don't get the problems, even with
X at nice 0. Does that tell you anything?

> Please top/vmstat/profile this and I'll look into it. There is
> potential for starvation without using up full timeslices and this
> may be it.

I'll do when I boot a kernel that has the problem.

--
Måns Rullgård
[email protected]

2003-08-20 11:15:22

by Con Kolivas

Subject: Re: [PATCH] O17int

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 20 Aug 2003 18:00, Mike Galbraith wrote:
> At 06:55 AM 8/20/2003 +0200, Voluspa wrote:
> >Blender 2.28 can not starve xmms one iota. Within blender itself, I can
> >cause 1 to 5 second freezes while doing a slow "world rotate", but that
> >is something the application programmers have to fix.
>
> I'm not so sure that it's an application bug. With Nick's patch, I cannot
> trigger any delay what so ever, whereas with stock, or with Ingo's changes
> [as well as my own, damn the bad luck] I can. I'm not saying it's _not_ a
> bug mind you, but color me suspicious ;-)

/me giggles like a 12 year old girl (teehee)

Try an earlier version of blender and you'll see it goes away. Other ones to
try are opening a gpg signed mail (like this mail) in kmail. The slower the
sleep avg decay in any tree the longer the spin. Nick's changes I believe
cover up the flaw. I'm not discounting Nick's work, but I do believe the apps
are broken and it's only the current scheduler design that makes it visible.
I would also like to make it impossible for priority inversion to happen but
at the moment I've just got to make sure they don't starve anything but their
dependent cpu hog. Still looking for some useful task dependency tracking.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/Q1nQZUg7+tp6mRURAgmIAJ0f6jGLZFjjguhYv+MGEz5S1DuMCwCeO+Id
ii0V4YOXlL9bB7wJ6rn8QEo=
=PygI
-----END PGP SIGNATURE-----

2003-08-20 16:26:28

by Wiktor Wodecki

Subject: Re: [PATCH] O17int

On Wed, Aug 20, 2003 at 01:01:28AM +1000, Con Kolivas wrote:
> Food for the starving masses.
snip

Sorry, but I still have the starving problem. The more I use O16/O17, the
more problems I encounter. Xterms sometimes wake up after half a second
for another half a second, and then fall asleep for a whole second;
after that, it's fully interactive. IO load seems to produce the
problem: a simple tar xf linux-2.6.0-test3.tar seems to halt the system.
New processes take ages to start. This also happened to me on O16.2.
Maybe it's because some AS patches are missing in vanilla but are in
2.6.0-test3-mm?

--
Regards,

Wiktor Wodecki



2003-08-20 16:42:15

by Nick Piggin

Subject: Re: [PATCH] O17int



Wiktor Wodecki wrote:

>On Wed, Aug 20, 2003 at 01:01:28AM +1000, Con Kolivas wrote:
>
>>Food for the starving masses.
>>
>snip
>
>Sorry, but I still have the starving problem. the more I use O16/O17 the
>more problems I encounter. xterms sometimes wake up after half a second
>for another half a second and falls asleep for a whole second then.
>after that, it's fully interactive. io-load seems to produce the
>problem. a simple tar xf linux-2.6.0-test3.tar seems to halt the system.
>new processes take ages to start. This also happend to me on O16.2.
>Maybe it's because some AS patches are missing in vanilla but are in
>2.6.0-test3-mm?
>

There are no AS patches missing in vanilla. I don't think there are any,
at least none that would change that.

It's unlikely that it's an IO problem, because it's unlikely that your tar
would be evicting anything that X or the xterm depend on. If the
machine is swapping at this time, then try setting /proc/sys/vm/swappiness
to 0.

Oh, you could give my cpu scheduler patch a try if you're bored.


2003-08-20 21:17:08

by Con Kolivas

Subject: Re: [PATCH] O17int

On Thu, 21 Aug 2003 02:27, Wiktor Wodecki wrote:
> On Wed, Aug 20, 2003 at 01:01:28AM +1000, Con Kolivas wrote:
> > Food for the starving masses.
>
> snip
>
> Sorry, but I still have the starving problem. the more I use O16/O17 the
> more problems I encounter. xterms sometimes wake up after half a second
> for another half a second and falls asleep for a whole second then.
> after that, it's fully interactive. io-load seems to produce the
> problem. a simple tar xf linux-2.6.0-test3.tar seems to halt the system.

I can't reproduce your problem here until I hit swap hard. Are you sure that's
not happening? Kernel threads now get a slight extra priority over regular
threads, which should help throughput under load. Perhaps this has tipped the
balance on your machine when you hit swap pressure hard. Do a vmstat run
while you can reproduce the problem.

> new processes take ages to start. This also happend to me on O16.2.

But not in 16.3?

> Maybe it's because some AS patches are missing in vanilla but are in
> 2.6.0-test3-mm?

As Nick said, no this isn't the case.

Con

2003-08-20 22:51:31

by Voluspa

Subject: Re: [PATCH] O17int


I'm doing a backup now, basically a "cp -a" of all relevant directories
on the system, and smoothness _has_ taken a penalty from the starvation
work. Scrolling a picture-filled webpage (eg cnn.com) in Opera 6.12 jerks
with each disk write, or thereabouts. Load is ca 2.5 to 3. Doing a "dd"
from /dev/zero to a large file doesn't affect anything since it's a long
steady write.

Ah well :-)
Mats Johannesson

2003-08-21 05:19:20

by Con Kolivas

Subject: Re: [PATCH] O17int

Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
recommending nothing on top of O16.3 yet.

More and more it just seems to be a bandaid for the priority inverting spin on
waiters, and it does seem to be to the detriment of general interactivity. I
can now reproduce some loss of interactive feel with O17.

Something specific for the spin on waiters is required that doesn't affect
general performance. The fact that I can reproduce the same starvation in
vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic problem
that needs a specific solution.

Con

2003-08-21 05:31:16

by Apurva Mehta

Subject: Re: [PATCH] O17int

* Con Kolivas <[email protected]> [19-08-2003 21:26]:
> Food for the starving masses.
[snip]

Yesterday, I got a major lockup with this patch. The system just
completely froze for more than 10 seconds. There was no mouse
movement, I could not switch workspaces.. nothing..

There was quite a bit of load. There were about 180 processes (up from
the normal 40-50) due to a number of procmail/sendmail pairs. I run
spamassassin, so each of those processes was testing for spam apart
from my other tests. I was also actively browsing the web and
playing music. That's about all that was going on.

Unfortunately I could not get the numbers. Once it passed, everything
was perfectly back to normal. I have not been able to reproduce the
freeze because I have not got enough emails at one go to reproduce
that kind of load. I will try to capture the numbers next time round.

- Apurva

--
Engineers motto: cheap, good, fast: choose any two

2003-08-21 07:49:25

by Mike Galbraith

Subject: Re: [PATCH] O17int

At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
>Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
>recommending nothing on top of O16.3 yet.
>
>More and more it just seems to be a bandaid to the priority inverting spin on
>waiters, and it does seem to be of detriment to general interacivity. I can
>now reproduce some loss of interactive feel with O17.
>
>Something specific for the spin on waiters is required that doesn't affect
>general performance. The fact that I can reproduce the same starvation in
>vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic problem
>that needs a specific solution.

I can see only one possible answer to this - never allow a normal task to
hold the cpu for long stretches (define) while there are other tasks
runnable. (see attachment)

I think the _easiest_ fix for this particular starvation (without tossing
baby out with bath water;) is to try to "down-shift" in schedule() when
next == prev. This you can do very cheaply with a find_next_bit(). That
won't help the case where there are multiple tasks involved, but should fix
the most common case for dirt cheap. (another simple alternative is to
globally "down-shift" periodically)
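Mike's next == prev down-shift might look roughly like this in userspace (my reading of the idea only; find_next_bit32() is a stand-in for the kernel's find_next_bit(), and all names are made up):

```c
#include <assert.h>

#define NUM_PRIOS 32

/* Minimal stand-in for the kernel's find_next_bit(): lowest set bit
 * at or above `from`, or NUM_PRIOS when none is set. */
static int find_next_bit32(unsigned long bitmap, int from)
{
    int i;

    for (i = from; i < NUM_PRIOS; i++)
        if (bitmap & (1UL << i))
            return i;
    return NUM_PRIOS;
}

/* When schedule() would re-select prev, shift down to the next
 * populated (lower) priority queue instead. Falls back to prev's own
 * level when nothing lower is runnable. */
static int downshift(unsigned long active_bitmap, int prev_prio)
{
    int next = find_next_bit32(active_bitmap, prev_prio + 1);

    return next < NUM_PRIOS ? next : prev_prio;
}
```

(Lower bit index means higher priority here, matching the O(1) scheduler's bitmap convention.)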

The most generally effective form of the "down-shift" anti-starvation
tactic that I've tried, is to periodically check the head of all queues
below current position (can do very quickly), and actively select the
oldest task who hasn't run since some defined deadline. Queues are
serviced based upon priority most of the time, and based upon age some of
the time.
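The age-based variant could be sketched like this (an illustration of the idea only; the deadline constant and names are invented, and it assumes each task records the tick at which it last ran):

```c
#include <assert.h>
#include <stddef.h>

#define STARVATION_LIMIT 100 /* ticks; an arbitrary made-up deadline */

struct rq_task {
    int prio;
    long last_ran; /* tick at which this task last held the cpu */
};

/* Scan the queue heads and return the task that has waited longest
 * past the deadline, or NULL when nobody is starving and normal
 * priority order should be used. */
static struct rq_task *pick_starved(struct rq_task **heads, int n, long now)
{
    struct rq_task *oldest = NULL;
    int i;

    for (i = 0; i < n; i++) {
        struct rq_task *t = heads[i];

        if (!t || now - t->last_ran < STARVATION_LIMIT)
            continue;
        if (!oldest || t->last_ran < oldest->last_ran)
            oldest = t;
    }
    return oldest;
}
```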

Everything I've tried along these lines has a common upside: works, and
downside: butt ugly :)

-Mike


Attachments:
strace-blender (3.30 kB)

2003-08-21 11:41:06

by Con Kolivas

Subject: Re: [PATCH] O17int

On Thu, 21 Aug 2003 17:53, Mike Galbraith wrote:
> At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
> >Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
> >recommending nothing on top of O16.3 yet.
> >
> >More and more it just seems to be a bandaid to the priority inverting spin
> > on waiters, and it does seem to be of detriment to general interacivity.
> > I can now reproduce some loss of interactive feel with O17.
> >
> >Something specific for the spin on waiters is required that doesn't affect
> >general performance. The fact that I can reproduce the same starvation in
> >vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic
> > problem that needs a specific solution.
>
> I can see only one possible answer to this - never allow a normal task to
> hold the cpu for long stretches (define) while there are other tasks
> runnable. (see attachment)

I assume you mean the strace? That was the only attachment, and it just looks
like shiploads of schedule() around the gettimeofday calls. Yes?

>
> I think the _easiest_ fix for this particular starvation (without tossing
> baby out with bath water;) is to try to "down-shift" in schedule() when
> next == prev. This you can do very cheaply with a find_next_bit(). That
> won't help the case where there are multiple tasks involved, but should fix
> the most common case for dirt cheap. (another simple alternative is to
> globally "down-shift" periodically)

Err funny you should say that; that's what O17 did. But it hurt because it
would never allow a task that used a full timeslice to be next==prev. The
less I throttled that, the less effective the antistarvation was. However
this is clearly a problem without using up full timeslices. I originally
thought they weren't trying to schedule lots because of the drop in ctx
during starvation but I forgot that rescheduling the same task doesn't count
as a ctx.

Also I recall that winegames got much better in O10 when everything was
charged at least one jiffy (pre nanosecond timing) suggesting those that were
waking up for minute amounts of time repeatedly were being penalised; thus
taking out the possibility of the starving task staying high priority for
long.
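Restated under nanosecond timing, that O10-style minimum charge would look something like this (a simplified sketch with invented constants, not the actual O10 code):

```c
#include <assert.h>

#define NS_PER_JIFFY 1000000L /* assuming a 1 ms tick */

/* Charge a task for its cpu time when it schedules out, but never
 * less than one jiffy's worth: a task that wakes for microseconds
 * thousands of times still bleeds sleep_avg and so loses priority. */
static long charge_run_time(long sleep_avg, long ran_ns)
{
    long charged = ran_ns < NS_PER_JIFFY ? NS_PER_JIFFY : ran_ns;

    sleep_avg -= charged;
    return sleep_avg < 0 ? 0 : sleep_avg;
}
```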

> The most generally effective form of the "down-shift" anti-starvation
> tactic that I've tried, is to periodically check the head of all queues
> below current position (can do very quickly), and actively select the
> oldest task who hasn't run since some defined deadline. Queues are
> serviced based upon priority most of the time, and based upon age some of
> the time.

Hmm also sounds fudgy.

> Everything I've tried along these lines has a common upside: works, and
> downside: butt ugly :)

Well if it works well who cares. My starvation throttler (O17) hurt
interactivity though which is why I dropped it.

I'm guessing I need to rewrite the O10 semantics to work with nanosecond
timing instead. I'll have to give that a go.

Thanks for your comments Mike. Hopefully the munchkins can stack high enough
;)

Con

2003-08-21 15:10:21

by Mike Galbraith

Subject: Re: [PATCH] O17int

At 09:46 PM 8/21/2003 +1000, Con Kolivas wrote:
>On Thu, 21 Aug 2003 17:53, Mike Galbraith wrote:
> > At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
> > >Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
> > >recommending nothing on top of O16.3 yet.
> > >
> > >More and more it just seems to be a bandaid to the priority inverting spin
> > > on waiters, and it does seem to be of detriment to general interacivity.
> > > I can now reproduce some loss of interactive feel with O17.
> > >
> > >Something specific for the spin on waiters is required that doesn't affect
> > >general performance. The fact that I can reproduce the same starvation in
> > >vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic
> > > problem that needs a specific solution.
> >
> > I can see only one possible answer to this - never allow a normal task to
> > hold the cpu for long stretches (define) while there are other tasks
> > runnable. (see attachment)
>
>I assume you mean the strace ? That was the only attachment, and it just
>looks like shiploads of schedule() from the get time of day. Yes?

(no, ~2 seconds of X being awol)

> > I think the _easiest_ fix for this particular starvation (without tossing
> > baby out with bath water;) is to try to "down-shift" in schedule() when
> > next == prev. This you can do very cheaply with a find_next_bit(). That
> > won't help the case where there are multiple tasks involved, but should fix
> > the most common case for dirt cheap. (another simple alternative is to
> > globally "down-shift" periodically)
>
>Err funny you should say that; that's what O17 did. But it hurt because it
>would never allow a task that used a full timeslice to be next==prev.

If someone is marked for resched, it means we want to give someone else the
cpu right? In this case at least, re-selecting blender is not the right
thing to do. Looks like he's burning rubber... going nowhere fast.

>The less I throttled that, the less effective the antistarvation was. However
>this is clearly a problem without using up full timeslices. I originally
>thought they weren't trying to schedule lots because of the drop in ctx
>during starvation but I forgot that rescheduling the same task doesnt count
>as a ctx.

Hmm. In what way did it hurt interactivity? I know that if you pass the
cpu off to a non-gui task who's going to use his full 100 ms slice, you'll
definitely feel it. (made workaround, will spare delicate tummies;) If
you mean that, say, X releases the cpu and has only a couple of ms left on
its slice and is alone in its queue, that preempting it at the end of
its slice after having only had the cpu for such a short time after wakeup
hurts, you can qualify the preempt decision with a cpu possession time check.

> Also I recall that winegames got much better in O10 when everything was
>charged at least one jiffy (pre nanosecond timing) suggesting those that were
>waking up for minute amounts of time repeatedly were being penalised; thus
>taking out the possibility of the starving task staying high priority for
>long.

(unsure what you mean here)

> > The most generally effective form of the "down-shift" anti-starvation
> > tactic that I've tried, is to periodically check the head of all queues
> > below current position (can do very quickly), and actively select the
> > oldest task who hasn't run since some defined deadline. Queues are
> > serviced based upon priority most of the time, and based upon age some of
> > the time.
>
>Hmm also sounds fudgy.

Yeah. I crossbred it with a ~deadline scheduler, and created a mutt.

-Mike (2'6"")

2003-08-21 22:16:16

by Wes Janzen

Subject: Re: [PATCH] O17int



Mike Galbraith wrote:

>> less I throttled that, the less effective the antistarvation was. However
>> this is clearly a problem without using up full timeslices. I originally
>> thought they weren't trying to schedule lots because of the drop in ctx
>> during starvation but I forgot that rescheduling the same task doesnt count
>> as a ctx.
>
>
> Hmm. In what way did it hurt interactivity? I know that if you pass
> the cpu off to non-gui task who's going to use his full 100 ms slice,
> you'll definitely feel it. (made workaround, will spare delicate
> tummies;) If you mean that, say X releases the cpu and has only a
> couple of ms left on it's slice and is alone in it's queue, that
> preempting it at the end of it's slice after having only had the cpu
> for such a short time after wakeup hurts, you can qualify the preempt
> decision with a cpu possession time check.

I wish I could get mm3 running so I could evaluate those interactivity
statements. I can't imagine it being worse than what I'm experiencing now:

 9 0 63968 21992 19672 319056 0 0 0 46 1231 289 87 13 0 0
 5 0 63968 21096 19704 320020 0 0 0 26 1202 300 86 14 0 0
14 0 63968 21104 19704 320020 0 0 0 0 1578 385 86 14 0 0
 8 0 63968 20448 20152 320048 0 0 7 41 1189 294 91 9 0 0
15 0 63968 19552 20132 321052 0 0 6 76 5926 1330 87 13 0 0
13 0 63968 18992 20132 321664 0 0 0 31 5464 1266 87 13 0 0
11 0 63968 18676 20100 321976 0 0 0 65 4613 1008 87 13 0 0
 8 0 63968 18044 20100 322660 0 0 0 14 4643 1132 87 13 0 0
16 0 63968 17660 20100 323024 0 0 0 20 5273 1272 87 13 0 0
14 0 63968 12092 20060 328596 0 0 0 24 5110 1221 87 13 0 0
14 0 63968 11644 20052 329052 0 0 0 68 5264 1285 87 13 0 0
13 0 63968 11324 19924 329560 0 0 0 34 4473 1125 87 13 0 0
16 0 63968 10740 19964 329880 0 0 1 74 1494 445 90 10 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free inact active si so bi bo in cs us sy id wa
22 0 63968 10452 19968 330300 0 0 1 51 4584 1059 88 12 0 0
14 0 63968 10004 19940 330736 0 0 0 90 6567 1547 87 13 0 0
15 0 63968 9188 20048 331472 5 0 10 106 5176 1289 87 13 0 0
13 0 63968 13932 20016 326824 6 0 6 26 6026 1318 87 13 0 0
12 0 63968 13412 19984 327324 0 0 0 24 4683 1083 87 13 0 0
13 1 63968 12700 20012 327956 0 0 16 46 5673 1313 87 13 0 0
12 0 63968 12252 20084 328296 0 0 6 78 3499 780 88 12 0 0
18 0 63968 11468 20060 329016 0 0 0 35 5456 1231 87 13 0 0
13 0 63968 12052 20096 328580 0 0 3 96 4430 988 87 13 0 0
13 0 63968 5012 19144 336532 0 0 0 62 5672 1225 88 12 0 0
16 0 63964 6624 17976 336088 0 0 0 297 4808 1041 88 12 0 0
11 0 63964 5616 17964 337124 0 0 0 69 5219 1094 89 11 0 0
16 1 63964 6732 15688 338276 13 0 20 155 4984 1156 87 13 0 0
 4 1 63964 7772 12544 340436 13 0 20 11 1070 201 90 10 0 0
 3 1 63964 4364 11560 344840 0 0 86 338 1038 250 92 8 0 0


That would be compiling the kernel, bunzipping a file, and some cron
mandb thing that was running gzip in the background niced. Plus X and
Mozilla, which probably starts the problem. At the end there, you see
things calm down. That's also the way it starts out, then something
sets off the "priority inversion" and the machine becomes completely
worthless. Even the task that are running aren't really accomplishing
anything. So the load goes from around 4/5 into the teens and the
context switching makes a corresponding jump. And then both
interactivity AND throughput fall through the floor.

I can't imagine any interactivity regressions that are worse than this
behavior...

And this happens with just X and Mozilla running. It happens less often
without X running, but still happens. Even if I'm at a VT, it could
take 5-6 seconds for my text to appear after I type. This happens all
the time, about once every few minutes and correlates with a
simultaneous increase in context switches and load.

>
>> Also I recall that winegames got much better in O10 when everything was
>> charged at least one jiffy (pre nanosecond timing) suggesting those
>> that were
>> waking up for minute amounts of time repeatedly were being penalised;
>> thus
>> taking out the possibility of the starving task staying high priority
>> for
>> long.
>
>
> (unsure what you mean here)

Can you set a cutoff point where, if the process uses less than X percent
of the max timeslice, it is penalized? I don't know if it's possible
to do a loop of empty spins at some point and time it to find out what
the cut-off point should be... otherwise I imagine it would need to be
tuned for every processor speed. Could you use the bogomips to gauge
the speed of the machine and use that to determine the min timeslice?
From what I understand above, that would perhaps be more selective than
just penalizing every process, and have a positive effect on
everything... of course I'm open to the possibility that I have it all
wrong ;-)
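That cutoff could be expressed something like this (hypothetical; the percentage is arbitrary, and a real implementation would use the task's measured average cpu burst, which this sketch simply takes as an input):

```c
#include <assert.h>

#define MIN_SLICE_PCT 5 /* made-up cutoff: under 5% of a slice looks spinny */

/* A task whose average cpu burst is a tiny fraction of the full
 * timeslice looks like a spin-on-wait candidate... */
static int suspected_spinner(long long avg_burst_ns, long long full_slice_ns)
{
    return avg_burst_ns * 100 < full_slice_ns * MIN_SLICE_PCT;
}

/* ...and gets its interactivity bonus withheld. */
static int effective_bonus(int bonus, long long avg_burst_ns,
                           long long full_slice_ns)
{
    return suspected_spinner(avg_burst_ns, full_slice_ns) ? 0 : bonus;
}
```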

Wes


2003-08-21 23:59:34

by Con Kolivas

Subject: Re: [PATCH] O17int

Quoting Mike Galbraith <[email protected]>:

> At 09:46 PM 8/21/2003 +1000, Con Kolivas wrote:
> >On Thu, 21 Aug 2003 17:53, Mike Galbraith wrote:
> > > At 03:26 PM 8/21/2003 +1000, Con Kolivas wrote:
> > > >Unhappy with this latest O16.3-O17int patch I'm withdrawing it, and
> > > >recommending nothing on top of O16.3 yet.
> > > >
> > > >More and more it just seems to be a bandaid to the priority inverting
> spin
> > > > on waiters, and it does seem to be of detriment to general
> interacivity.
> > > > I can now reproduce some loss of interactive feel with O17.
> > > >
> > > >Something specific for the spin on waiters is required that doesn't
> affect
> > > >general performance. The fact that I can reproduce the same starvation
> in
> > > >vanilla 2.6.0-test3 but to a lesser extent means this is an intrinsic
> > > > problem that needs a specific solution.
> > >
> > > I can see only one possible answer to this - never allow a normal task to
> > > hold the cpu for long stretches (define) while there are other tasks
> > > runnable. (see attachment)
> >
> >I assume you mean the strace ? That was the only attachment, and it just
> >looks like shiploads of schedule() from the get time of day. Yes?
>
> (no, ~2 seconds of X being awol)
>
> > > I think the _easiest_ fix for this particular starvation (without tossing
> > > baby out with bath water;) is to try to "down-shift" in schedule() when
> > > next == prev. This you can do very cheaply with a find_next_bit(). That
> > > won't help the case where there are multiple tasks involved, but should fix
> > > the most common case for dirt cheap. (another simple alternative is to
> > > globally "down-shift" periodically)
> >
> >Err funny you should say that; that's what O17 did. But it hurt because it
> >would never allow a task that used a full timeslice to be next==prev. The
>
> If someone is marked for resched, it means we want to give someone else the
> cpu right? In this case at least, re-selecting blender is not the right
> thing to do. Looks like he's burning rubber... going nowhere fast.
>
> > less I throttled that, the less effective the antistarvation was. However
> >this is clearly a problem without using up full timeslices. I originally
> >thought they weren't trying to schedule lots because of the drop in ctx
> >during starvation but I forgot that rescheduling the same task doesnt count
> >as a ctx.
>
> Hmm. In what way did it hurt interactivity? I know that if you pass the
> cpu off to non-gui task who's going to use his full 100 ms slice, you'll
> definitely feel it. (made workaround, will spare delicate tummies;) If

Well I made it so that if a full timeslice is used, and then the same task is
next==prev, the next candidate for scheduling gets up to a TIMESLICE_GRANULARITY
timeslice, not a full task timeslice. Even this was palpable as losing the X
waving around smoothness with anything else burning (kernel compile, bunzip or
whatever). Basically X is somewhere between a frequent sleeper with short
periods of cpu time and a burst of full timeslices when used heavily. When it's
full timeslices, it suffers under the same things that throttle spinners.

> you mean that, say X releases the cpu and has only a couple of ms left on
> it's slice and is alone in it's queue, that preempting it at the end of
> it's slice after having only had the cpu for such a short time after wakeup
> hurts, you can qualify the preempt decision with a cpu possession time
> check.

I was letting it run out the full timeslice unabated, and only preventing it
from getting immediately rescheduled.

>
> > Also I recall that winegames got much better in O10 when everything was
> >charged at least one jiffy (pre nanosecond timing) suggesting those that
> were
> >waking up for minute amounts of time repeatedly were being penalised; thus
> >taking out the possibility of the starving task staying high priority for
> >long.
>
> (unsure what you mean here)

Ok I think blender (we say blenndah in .au) is waking up, polling for X, then
going back to sleep only to do it all over again. I don't think it's wasting a
full timeslice at all. There seem to be two variants of this spin on wait; one
is the task that uses a full timeslice spinning (wine games), and the other is
waking up, rescheduling and going back to sleep only to do it all over again.

>
> > > The most generally effective form of the "down-shift" anti-starvation
> > > tactic that I've tried, is to periodically check the head of all queues
> > > below current position (can do very quickly), and actively select the
> > > oldest task who hasn't run since some defined deadline. Queues are
> > > serviced based upon priority most of the time, and based upon age some of
> > > the time.
> >
> >Hmm also sounds fudgy.
>
> Yeah. I crossbred it with a ~deadline scheduler, and created a mutt.

But how did this mutt perform?

> -Mike (2'6"")

+ another 2'6"" doesn't quite get us tall enough. Maybe we need another
munchkin.

Con

2003-08-22 00:09:20

by Con Kolivas

[permalink] [raw]
Subject: Re: [PATCH] O17int

Quoting Wes Janzen <[email protected]>:

> I wish I could get mm3 running so I could evaluate those interactivity
> statements. I can't imagine it being worse than what I'm experiencing now:

Umm. You didn't mention which kernel/patch? I seem to recall you were using
Osomething but which?

> That would be compiling the kernel, bunzipping a file, and some cron
> mandb thing that was running gzip in the background niced. Plus X and
> Mozilla, which probably starts the problem. At the end there, you see
> things calm down. That's also the way it starts out, then something
> sets off the "priority inversion" and the machine becomes completely
> worthless. Even the tasks that are running aren't really accomplishing
> anything. So the load goes from around 4/5 into the teens and the
> context switching makes a corresponding jump. And then both
> interactivity AND throughput fall through the floor.
>
> I can't imagine any interactivity regressions that are worse than this
> behavior...

If this is Osomething, can you tell me when it started doing this and what
vanilla does?

> And this happens with just X and Mozilla running. It happens less often
> without X running, but still happens. Even if I'm at a VT, it could
> take 5-6 seconds for my text to appear after I type. This happens all
> the time, about once every few minutes and correlates with a
> simultaneous increase in context switches and load.

Ditto

> Can you set a cutoff point where if the process uses less than X percent
> of the max timeslice, it is penalized? I don't know if it's possible
> to do a loop of empty spins at some point and time it to find out what
> the cut-off point should be...otherwise I imagine it would need to be
> tuned for every processor speed. Could you use the bogomips to gauge
> the speed of the machine and use that to determine the min timeslice?
> From what I understand above, that would perhaps be more selective than
> just penalizing every process and have a positive effect on
> everything...of course I'm open to the possibility that I have it all
> wrong ;-)

A thought, but unfortunately some arbitrary limit would just be hit by
everything randomly rather than with any meaningful frequency. The amount of
timeslice that interactive tasks use up is extremely variable depending on
circumstances.

Con

2003-08-22 00:43:01

by Felipe Alfaro Solana

[permalink] [raw]
Subject: Re: [PATCH] O17int

On Fri, 2003-08-22 at 00:18, Wes Janzen wrote:

> I wish I could get mm3 running so I could evaluate those interactivity
> statements. I can't imagine it being worse than what I'm experiencing now:

If -mm3 doesn't work, then you can download patches against vanilla
2.6.0-test3 from
http://members.optusnet.com.au/ckolivas/kernel/2.5/2.6.0-test3/patch-test3-O16.3int

2003-08-22 05:08:56

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] O17int

At 09:59 AM 8/22/2003 +1000, Con Kolivas wrote:
>Quoting Mike Galbraith <[email protected]>:
>
> > > > The most generally effective form of the "down-shift" anti-starvation
> > > > tactic that I've tried, is to periodically check the head of all queues
> > > > below current position (can do very quickly), and actively select the
> > > > oldest task who hasn't run since some defined deadline. Queues are
> > > > serviced based upon priority most of the time, and based upon age some of
> > > > the time.
> > >
> > >Hmm also sounds fudgy.
> >
> > Yeah. I crossbred it with a ~deadline scheduler, and created a mutt.
>
>But how did this mutt perform?

At the time, I was more concerned by the very long semaphore hold times I
was seeing than anything else. It helped that quite a lot. It didn't hurt
throughput in any way I could see, and it improved irman's latency
numbers. (process load was routinely hitting ~1.5 seconds max latency at
the time, that tree cut it to roughly 400-500ms iirc) Just like anything
else that increases fairness though, it hurt X feel somewhat in the
presence of load.

-Mike

2003-08-22 06:16:52

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] O17int

At 05:18 PM 8/21/2003 -0500, Wes Janzen wrote:


>Mike Galbraith wrote:
>
>>> less I throttled that, the less effective the antistarvation was. However
>>>this is clearly a problem without using up full timeslices. I originally
>>>thought they weren't trying to schedule lots because of the drop in ctx
>>>during starvation but I forgot that rescheduling the same task doesnt count
>>>as a ctx.
>>
>>
>>Hmm. In what way did it hurt interactivity? I know that if you pass the
>>cpu off to non-gui task who's going to use his full 100 ms slice, you'll
>>definitely feel it. (made workaround, will spare delicate tummies;) If
>>you mean that, say X releases the cpu and has only a couple of ms left on
>>it's slice and is alone in it's queue, that preempting it at the end of
>>it's slice after having only had the cpu for such a short time after
>>wakeup hurts, you can qualify the preempt decision with a cpu possession
>>time check.
>
>I wish I could get mm3 running so I could evaluate those interactivity
>statements. I can't imagine it being worse than what I'm experiencing now:
>
>9 0 63968 21992 19672 319056 0 0 0 46 1231 289 87 13 0 0
>5 0 63968 21096 19704 320020 0 0 0 26 1202 300 86 14 0 0
>14 0 63968 21104 19704 320020 0 0 0 0 1578 385 86 14 0 0
>8 0 63968 20448 20152 320048 0 0 7 41 1189 294 91 9 0 0
>15 0 63968 19552 20132 321052 0 0 6 76 5926 1330 87 13 0 0
>13 0 63968 18992 20132 321664 0 0 0 31 5464 1266 87 13 0 0
>11 0 63968 18676 20100 321976 0 0 0 65 4613 1008 87 13 0 0
>8 0 63968 18044 20100 322660 0 0 0 14 4643 1132 87 13 0 0
>16 0 63968 17660 20100 323024 0 0 0 20 5273 1272 87 13 0 0
>14 0 63968 12092 20060 328596 0 0 0 24 5110 1221 87 13 0 0
>14 0 63968 11644 20052 329052 0 0 0 68 5264 1285 87 13 0 0
>13 0 63968 11324 19924 329560 0 0 0 34 4473 1125 87 13 0 0
>16 0 63968 10740 19964 329880 0 0 1 74 1494 445 90 10 0 0
>procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>r b swpd free inact active si so bi bo in cs us sy id wa
>22 0 63968 10452 19968 330300 0 0 1 51 4584 1059 88 12 0 0
>14 0 63968 10004 19940 330736 0 0 0 90 6567 1547 87 13 0 0
>15 0 63968 9188 20048 331472 5 0 10 106 5176 1289 87 13 0 0
>13 0 63968 13932 20016 326824 6 0 6 26 6026 1318 87 13 0 0
>12 0 63968 13412 19984 327324 0 0 0 24 4683 1083 87 13 0 0
>13 1 63968 12700 20012 327956 0 0 16 46 5673 1313 87 13 0 0
>12 0 63968 12252 20084 328296 0 0 6 78 3499 780 88 12 0 0
>18 0 63968 11468 20060 329016 0 0 0 35 5456 1231 87 13 0 0
>13 0 63968 12052 20096 328580 0 0 3 96 4430 988 87 13 0 0
>13 0 63968 5012 19144 336532 0 0 0 62 5672 1225 88 12 0 0
>16 0 63964 6624 17976 336088 0 0 0 297 4808 1041 88 12 0 0
>11 0 63964 5616 17964 337124 0 0 0 69 5219 1094 89 11 0 0
>16 1 63964 6732 15688 338276 13 0 20 155 4984 1156 87 13 0 0
>4 1 63964 7772 12544 340436 13 0 20 11 1070 201 90 10 0 0
>3 1 63964 4364 11560 344840 0 0 86 338 1038 250 92 8 0 0
>
>
>That would be compiling the kernel, bunzipping a file, and some cron mandb
>thing that was running gzip in the background niced. Plus X and Mozilla,
>which probably starts the problem. At the end there, you see things calm
>down. That's also the way it starts out, then something sets off the
>"priority inversion" and the machine becomes completely worthless. Even
>the tasks that are running aren't really accomplishing anything. So the
>load goes from around 4/5 into the teens and the context switching makes a
>corresponding jump. And then both interactivity AND throughput fall
>through the floor.
>
>I can't imagine any interactivity regressions that are worse than this
>behavior...
>
>And this happens with just X and Mozilla running. It happens less often
>without X running, but still happens. Even if I'm at a VT, it could take
>5-6 seconds for my text to appear after I type. This happens all the
>time, about once every few minutes and correlates with a simultaneous
>increase in context switches and load.

Those high interrupt counts are all stalls? What kernel is that?



>>> Also I recall that winegames got much better in O10 when everything was
>>>charged at least one jiffy (pre nanosecond timing) suggesting those that were
>>>waking up for minute amounts of time repeatedly were being penalised; thus
>>>taking out the possibility of the starving task staying high priority for
>>>long.
>>
>>
>>(unsure what you mean here)
>
>Can you set a cutoff point where if the process uses less than X percent
>of the max timeslice, it is penalized? I don't know if it's possible
>to do a loop of empty spins at some point and time it to find out what the
>cut-off point should be...otherwise I imagine it would need to be tuned
>for every processor speed. Could you use the bogomips to gauge the speed
>of the machine and use that to determine the min timeslice?
> From what I understand above, that would perhaps be more selective than
> just penalizing every process and have a positive effect on
> everything...of course I'm open to the possibility that I have it all wrong ;-)
>
>Wes
>

2003-08-22 20:45:43

by Wes Janzen

[permalink] [raw]
Subject: Re: [PATCH] O17int

Yep, those are very bad stalls, much worse than the normal temporary
stalls when something is running in the background. I think part of
that can be attributed to mozilla, which likes to do very little for a
while, then jump up to 5% for a little bit and then up to 35% for 5
seconds or so.
With something running in the background, things get very bad. Those
vmstat lines were supposed to be reported at 5-second intervals, but they
were not coming out at that rate during the problem. Those represent
about 5 minutes of stalling. I could live with a short stall, but 5
minutes where the computer barely takes input is crazy. X becomes
totally unresponsive to the point I cannot switch to a VT.

I started a shutdown one time with the acpid daemon watching for power
button events. It took 1 hour 30 minutes from the time it said that it
was shutting down (I could hear the beep from the shutdown process) to
the point I got to "Stopping at daemon" which is barely into the
shutdown cycle. Even then I waited another 10 minutes for it to
complete the shutdown and it never did. All I was doing there was
compiling in a gnome-terminal, and had just clicked on a bug-buddy
window to get it to show debugging information. I couldn't get to a VT
so I wasn't able to get a vmstat log of that one.

Even if this is due to a bad interaction between a program and X, it
shouldn't be able to bring the system to its knees.

Kernel version:
2.6.0-test3-mm2 + O16.3int



Mike Galbraith wrote:

>
> Those high interrupt counts are all stalls? What kernel is that?
>

2003-08-22 21:15:13

by Wes Janzen

[permalink] [raw]
Subject: Re: [PATCH] O17int

My last reply didn't make it to the list due to an html attachment.
Mozilla is supposed to strip those but apparently does not sometimes...

Con Kolivas wrote:

>Umm. You didn't mention which kernel/patch? I seem to recall you were using
>Osomething but which?
>
>
For the record, that's 2.6.0-test3-mm2 + O16.3int

>
>If this is Osomething, can you tell me when it started doing this and what
>vanilla does.
>
>
Again, for posterity, this started with mm2; mm1 is similar to vanilla
except it has better interactivity. I've been running vanilla overnight
and today but haven't found any cases where it stalls like mm2 does.