LinuxLists.cc - [PATCH] Minor scheduler fix to get rid of skipping in xmms

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Hi Michael,

I don't know if really a bug due to xmms, I suspect that's the case. I'm
not familiar with xmms internals, but when I gdb'ed the process after it
froze, all the threads either stopped at poll(), write(), select(), or
nanosleep(). Some combination of the blocking calls among those is probably
causing the stall. I highly doubt it's due to the kernel since I haven't
been experiencing hangs in any other applications. It could be the socket
code though if extensive modifications to it have been made, since I've
never experienced hangs like this in the 2.4.18 kernel used by RedHat 8.0.

John Yau

>From: Michael Buesch <[email protected]>
>To: "John Yau" <[email protected]>
>CC: linux kernel mailing list <[email protected]>
>Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms
>Date: Sat, 6 Sep 2003 12:03:35 +0200
>
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On Saturday 06 September 2003 11:46, John Yau wrote:
> > Hi folks,
>
>Hi John,
>
> > xmms still completely hangs every once in a while for me. However I
> > suspect it's due to a bug in xmms that deadlocks. Anyone else
>experiencing
> > hangs with xmms while tuning into Shoutcast???
>
>Yes, that's (was?) a bug of xmms.
>
>- --
>Regards Michael Buesch [ http://www.tuxsoft.de.vu ]
>Animals on this machine: some GNUs and Penguin 2.6.0-test4-bk2
>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.2.2 (GNU/Linux)
>
>iD8DBQE/WbD/oxoigfggmSgRAs5qAJ99vZeNeMEXhl72VvVlGFMWh55HVgCeLK0R
>MgWcMSUSdEYL+OeehfDNBCc=
>=TdCG
>-----END PGP SIGNATURE-----
>

_________________________________________________________________
Get 10MB of e-mail storage! Sign up for Hotmail Extra Storage.
http://join.msn.com/?PAGE=features/es

2003-09-06 16:50:39

by Robert Love

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Sat, 2003-09-06 at 05:46, John Yau wrote:

> I'm new to patch submission process, so bear with me. This little patch I
> wrote seems to get rid of the annoying skipping in xmms except in the most
> extreme cases. See comments inlined in code for details of the fix.

This looks exactly like the granular timeslice patch Ingo did?

Robert Love

2003-09-06 16:58:07

by Michael Buesch

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday 06 September 2003 17:58, John Yau wrote:
> Hi Michael,

John,

> I don't know if really a bug due to xmms, I suspect that's the case. I'm
> not familiar with xmms internals, but when I gdb'ed the process after it
> froze, all the threads either stopped at poll(), write(), select(), or
> nanosleep().

I updated to xmms-1.2.8 final and I didn't see a hang until yet.
I didn't recognize hangs in 1.2.7, so I assume it was some
bug in the beta-versions between 1.2.7 - 1.2.8, which I used and for
which I saw the hangs.
But probably we're not talking about the same problems. :)

- --
Regards Michael Buesch [ http://www.tuxsoft.de.vu ]
Animals on this machine: some GNUs and Penguin 2.6.0-test4-bk2

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/WhIWoxoigfggmSgRAkW9AJ4iBpQO1FbnwQzbBfXKb3QALDlXjwCfR6JL
5II9W7tp6Y8733GCB9NxBuk=
=fzvJ
-----END PGP SIGNATURE-----

2003-09-06 18:00:08

[permalink] [raw]

Subject: RE: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Hmm...I just started monitoring the linux-kernel list on an archive website
last week, so I have no idea if my patch is duplicate work. I can't seem to
find the specific patch you're referring to. Can you send me a link to the
patch?

John Yau

-----Original Message-----
From: Robert Love [mailto:[email protected]]
Sent: Saturday, September 06, 2003 1:01 PM
To: John Yau
Cc: [email protected]
Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Sat, 2003-09-06 at 05:46, John Yau wrote:

> I'm new to patch submission process, so bear with me. This little
> patch I
> wrote seems to get rid of the annoying skipping in xmms except in the most

> extreme cases. See comments inlined in code for details of the fix.

This looks exactly like the granular timeslice patch Ingo did?

Robert Love

2003-09-06 18:17:59

[permalink] [raw]

Subject: RE: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Scratch that, I just found Ingo's patch. My patch does essentially the same
thing except it only allows the current active process to be preempted if it
got demoted in priority during the effective priority recalculation. This
IMHO is better because it doesn't do unnecessary context switches. If the
process were truly a CPU hog relative other processes on the run queue, then
it'd get preempted eventually when it gets demoted rather than always every
25 ms. How come Ingo's granular timeslice patch didn't get put into
2.6.0-test4?

John Yau

-----Original Message-----
From: Robert Love [mailto:[email protected]]
Sent: Saturday, September 06, 2003 1:01 PM
To: John Yau
Cc: [email protected]
Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Sat, 2003-09-06 at 05:46, John Yau wrote:

> I'm new to patch submission process, so bear with me. This little
> patch I
> wrote seems to get rid of the annoying skipping in xmms except in the most

> extreme cases. See comments inlined in code for details of the fix.

This looks exactly like the granular timeslice patch Ingo did?

Robert Love

2003-09-06 19:42:46

by Rahul Karnik

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

John Yau wrote:
> How come Ingo's granular timeslice patch didn't get put into
> 2.6.0-test4?

It is being tested in Andrew's mm kernels, along with Con's tweaks.

-Rahul
--
Rahul Karnik
[email protected]
http://www.genebrew.com/

2003-09-06 19:53:49

by Robert Love

[permalink] [raw]

Subject: RE: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Sat, 2003-09-06 at 14:17, John Yau wrote:

> Scratch that, I just found Ingo's patch. My patch does essentially the same
> thing except it only allows the current active process to be preempted if it
> got demoted in priority during the effective priority recalculation. This
> IMHO is better because it doesn't do unnecessary context switches. If the
> process were truly a CPU hog relative other processes on the run queue, then
> it'd get preempted eventually when it gets demoted rather than always every
> 25 ms.

The rationale behind Ingo's patch is to "break up" the timeslices to
give better scheduling latency to multiple tasks at the same priority.
So it is not "unnecessary context switches," just "extra context
switches."

It also recalculates the process's effective priority, like yours does,
so it also has the same advantage as your patch: to more quickly detect
tasks that have changed in interactivity, and to handle that.

Not sure which approach is better. Only testing will tell.

> How come Ingo's granular timeslice patch didn't get put into 2.6.0-test4?

Interactivity improvements are currently a contentious issue. The patch
is back in 2.6-mm, though.

Robert Love

2003-09-06 22:42:07

[permalink] [raw]

Subject: RE: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>The rationale behind Ingo's patch is to "break up" the timeslices to give
better scheduling latency to
>multiple tasks at the same priority.
>So it is not "unnecessary context switches," just "extra context switches."

Hmm...my reasoning is that those switches are unnecessary because the
interactivity bonus/penalty will take care of breaking the timeslices up in
case of a CPU hog, albeit not at precise 25 ms granularity. Though having
regularity in scheduling is nice, I think Ingo's patch somewhat negates the
purpose of having heterogenous time slice lengths. I suspect Ingo's
approach will thrash the caches quite a bit more than mine; we should
definitely test this a bit to find out for sure. Any suggestions on how to
go about that?

If we're going to do a context switch every 25 ms no matter what, we might
as well just make the scheduler a true real time scheduler, dump having
different time slice lengths and interactivity recalculations, and go
completely round robin with strictly enforced priorities and a single class
of time slice somewhere 1 to 5 ms long.

John Yau

2003-09-07 02:41:18

by Martin J. Bligh

[permalink] [raw]

Subject: RE: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>> The rationale behind Ingo's patch is to "break up" the timeslices to give
> better scheduling latency to
>> multiple tasks at the same priority.
>> So it is not "unnecessary context switches," just "extra context switches."
>
> Hmm...my reasoning is that those switches are unnecessary because the
> interactivity bonus/penalty will take care of breaking the timeslices up in
> case of a CPU hog, albeit not at precise 25 ms granularity. Though having
> regularity in scheduling is nice, I think Ingo's patch somewhat negates the
> purpose of having heterogenous time slice lengths. I suspect Ingo's
> approach will thrash the caches quite a bit more than mine; we should
> definitely test this a bit to find out for sure. Any suggestions on how to
> go about that?
>
> If we're going to do a context switch every 25 ms no matter what, we might
> as well just make the scheduler a true real time scheduler, dump having
> different time slice lengths and interactivity recalculations, and go
> completely round robin with strictly enforced priorities and a single class
> of time slice somewhere 1 to 5 ms long.

IIRC, that context switching was what sucked on cpu bound jobs (like
doing a kernel compile). If you can send me both patches (offline),
I'll do a straight comparison on the benchmarking rig I have set up
on Monday.

M.

2003-09-07 05:08:26

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Robert Love wrote:

>On Sat, 2003-09-06 at 14:17, John Yau wrote:
>
>
>>Scratch that, I just found Ingo's patch. My patch does essentially the same
>>thing except it only allows the current active process to be preempted if it
>>got demoted in priority during the effective priority recalculation. This
>>IMHO is better because it doesn't do unnecessary context switches. If the
>>process were truly a CPU hog relative other processes on the run queue, then
>>it'd get preempted eventually when it gets demoted rather than always every
>>25 ms.
>>
>
>The rationale behind Ingo's patch is to "break up" the timeslices to
>give better scheduling latency to multiple tasks at the same priority.
>So it is not "unnecessary context switches," just "extra context
>switches."
>
>It also recalculates the process's effective priority, like yours does,
>so it also has the same advantage as your patch: to more quickly detect
>tasks that have changed in interactivity, and to handle that.
>
>Not sure which approach is better. Only testing will tell.
>
>
>>How come Ingo's granular timeslice patch didn't get put into 2.6.0-test4?
>>
>
>Interactivity improvements are currently a contentious issue. The patch
>is back in 2.6-mm, though.
>

Although I think what is less contentious is that Con's stuff is an
improvement over the 2.6 tree, and the consensus is that _something_
as to be done to it. So it is quite sad that the scheduler in 2.6 is
sitting there doing nothing but waiting to be obsoleted, while Con's
good (and begnin) scheduler patches are waiting around and getting
less than 1% of the testing they need.

2003-09-07 05:14:08

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

John Yau wrote:

>>The rationale behind Ingo's patch is to "break up" the timeslices to give
>>
>better scheduling latency to
>
>>multiple tasks at the same priority.
>>So it is not "unnecessary context switches," just "extra context switches."
>>
>
>Hmm...my reasoning is that those switches are unnecessary because the
>interactivity bonus/penalty will take care of breaking the timeslices up in
>case of a CPU hog, albeit not at precise 25 ms granularity. Though having
>regularity in scheduling is nice, I think Ingo's patch somewhat negates the
>purpose of having heterogenous time slice lengths. I suspect Ingo's
>approach will thrash the caches quite a bit more than mine; we should
>definitely test this a bit to find out for sure. Any suggestions on how to
>go about that?
>
>If we're going to do a context switch every 25 ms no matter what, we might
>as well just make the scheduler a true real time scheduler, dump having
>different time slice lengths and interactivity recalculations, and go
>completely round robin with strictly enforced priorities and a single class
>of time slice somewhere 1 to 5 ms long.
>

Heh, your logic is entertaining. I don't know how you got from step 1
to step 3 ;)

Anyway, you don't have to dump different timeslice lengths because you
don't really have them to begin with. See how "Nick's scheduler policy
v12" fixes your problems by mostly reducing complexity, not adding to
it.

2003-09-07 06:17:55

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Nick Piggin <[email protected]> wrote:
>
> So it is quite sad that the scheduler in 2.6 is
> sitting there doing nothing but waiting to be obsoleted, while Con's
> good (and begnin) scheduler patches are waiting around and getting
> less than 1% of the testing they need.

My concern is the (large) performance regression with specjbb and
volanomark, due to increased idle time.

We cannot just jam all this code into Linus's tree while crossing our
fingers and hoping that something will turn up to fix this problem.
Because we don't know what causes it, nor whether we even _can_ fix it.

So this is the problem which everyone who is working on the CPU scheduler
should be concentrating on, please.

2003-09-07 06:29:35

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Morton wrote:

>Nick Piggin <[email protected]> wrote:
>
>>So it is quite sad that the scheduler in 2.6 is
>> sitting there doing nothing but waiting to be obsoleted, while Con's
>> good (and begnin) scheduler patches are waiting around and getting
>> less than 1% of the testing they need.
>>
>
>My concern is the (large) performance regression with specjbb and
>volanomark, due to increased idle time.
>
>We cannot just jam all this code into Linus's tree while crossing our
>fingers and hoping that something will turn up to fix this problem.
>Because we don't know what causes it, nor whether we even _can_ fix it.
>
>So this is the problem which everyone who is working on the CPU scheduler
>should be concentrating on, please.
>

IIRC my (equivalent to Andrew's CAN_MIGRATE) patch fixed this. There was
still a small (~8%?) performance regression, but idle times were on par
with -linus. I don't have easy access to a largeish NUMA box, so I
can't do much more.

2003-09-07 06:44:35

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Nick Piggin <[email protected]> wrote:
>
> >My concern is the (large) performance regression with specjbb and
> >volanomark, due to increased idle time.
> >
> >We cannot just jam all this code into Linus's tree while crossing our
> >fingers and hoping that something will turn up to fix this problem.
> >Because we don't know what causes it, nor whether we even _can_ fix it.
> >
> >So this is the problem which everyone who is working on the CPU scheduler
> >should be concentrating on, please.
> >
>
> IIRC my (equivalent to Andrew's CAN_MIGRATE) patch fixed this. There was
> still a small (~8%?) performance regression, but idle times were on par
> with -linus. I don't have easy access to a largeish NUMA box, so I
> can't do much more.
>

That is not clear at this time. We do know that the reaim regression was
introduced by sched-2.6.0-test2-mm2-A3, but we don't know why. Certainly
that patch did not introduce the problem which Andrew's patch fixed. And
we have theorised that Andrew's patch brought back the reaim throughput.
And we have extrapolated those observations to possible improvements in
volanomark throughput.

It's all foggy and I'd like to see a clean rerun of specjbb and volanomark
by Mark Peloquin and co, confirming that -mm6 is performing OK.

Also, I'm concerned that sched-2.6.0-test2-mm2-A3 caused slowdowns and
Andrew's patch caused speedups and they just cancelled out. Let's get
Andrew's patch into Linus's tree and see if it speeds things up. If it
does, we probably still have a problem.

2003-09-07 06:59:53

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Morton wrote:

>Nick Piggin <[email protected]> wrote:
>
>>>My concern is the (large) performance regression with specjbb and
>>>
>> >volanomark, due to increased idle time.
>> >
>> >We cannot just jam all this code into Linus's tree while crossing our
>> >fingers and hoping that something will turn up to fix this problem.
>> >Because we don't know what causes it, nor whether we even _can_ fix it.
>> >
>> >So this is the problem which everyone who is working on the CPU scheduler
>> >should be concentrating on, please.
>> >
>>
>> IIRC my (equivalent to Andrew's CAN_MIGRATE) patch fixed this. There was
>> still a small (~8%?) performance regression, but idle times were on par
>> with -linus. I don't have easy access to a largeish NUMA box, so I
>> can't do much more.
>>
>>
>
>That is not clear at this time. We do know that the reaim regression was
>introduced by sched-2.6.0-test2-mm2-A3, but we don't know why. Certainly
>that patch did not introduce the problem which Andrew's patch fixed. And
>we have theorised that Andrew's patch brought back the reaim throughput.
>And we have extrapolated those observations to possible improvements in
>volanomark throughput.
>

Earlier we _saw_ my patch do what it was supposed to:
http://members.optusnet.com.au/ckolivas/kernel/2.5/volano/
Idle time is back to mainline levels although throughput is still down
a bit. (I'd say thats due to overbalancing which could be tuned back,
I'm going to attack the SMP and NUMA stuff in the scheduler soon).

>
>
>It's all foggy and I'd like to see a clean rerun of specjbb and volanomark
>by Mark Peloquin and co, confirming that -mm6 is performing OK.
>
>
>Also, I'm concerned that sched-2.6.0-test2-mm2-A3 caused slowdowns and
>Andrew's patch caused speedups and they just cancelled out. Let's get
>Andrew's patch into Linus's tree and see if it speeds things up. If it
>does, we probably still have a problem.
>
>

The slowdowns are due to CPUs becoming idle too long, and the patches
fix that. If CPUs aren't idle, the patch will have no effect.

I have no idea how volanomark really works, so I have no idea why the
patch causes queue imbalances. Thats the problem with those jumbo
patches :P

2003-09-07 07:02:45

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Nick Piggin wrote:

>
>
> Andrew Morton wrote:
>
>>
>>
>> Also, I'm concerned that sched-2.6.0-test2-mm2-A3 caused slowdowns and
>> Andrew's patch caused speedups and they just cancelled out. Let's get
>> Andrew's patch into Linus's tree and see if it speeds things up. If it
>> does, we probably still have a problem.
>>
>>
>
> The slowdowns are due to CPUs becoming idle too long, and the patches
> fix that. If CPUs aren't idle, the patch will have no effect.

Oh... I guess this is what you were saying?

2003-09-07 07:49:07

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>
> Heh, your logic is entertaining. I don't know how you got from step 1
> to step 3 ;)

LOL...I got a bit scatterbrained. My basic argument is the fewer context
switches while maintaining interactivity the better because it's less
overhead and less cache thrashing. If we don't care about the overhead and
thrashing at all, then might as well be very aggressive with the scheduler
and use uniform 1 ms timeslices in a RR fashion. I've coded such a
scheduler in an embedded systems context; response time is awesome, but I
highly doubt it'd work for Linux workloads.

>
> Anyway, you don't have to dump different timeslice lengths because you
> don't really have them to begin with. See how "Nick's scheduler policy
> v12" fixes your problems by mostly reducing complexity, not adding to
> it.
>

I just started monitoring the list and I'm still quite a bit behind, so I'm
playing catch up on reading whenever I have a bit of free time. I'll look
for your patch and check out your code.

John Yau

2003-09-07 08:10:30

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Johnny Yau wrote:

>>Heh, your logic is entertaining. I don't know how you got from step 1
>>to step 3 ;)
>>
>
>LOL...I got a bit scatterbrained. My basic argument is the fewer context
>switches while maintaining interactivity the better because it's less
>overhead and less cache thrashing. If we don't care about the overhead and
>thrashing at all, then might as well be very aggressive with the scheduler
>and use uniform 1 ms timeslices in a RR fashion. I've coded such a
>scheduler in an embedded systems context; response time is awesome, but I
>highly doubt it'd work for Linux workloads.
>

Even if context switches don't cost anything, you still want to have
priorities so cpu hogs can be preempted by other tasks in order to
quickly respond to IO events. You want interactive tasks to be able
to sometimes get more cpu than cpu hogs, etc. Scheduling latency is
only a part of it.

2003-09-07 08:35:14

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>
> Even if context switches don't cost anything, you still want to have
> priorities so cpu hogs can be preempted by other tasks in order to
> quickly respond to IO events. You want interactive tasks to be able
> to sometimes get more cpu than cpu hogs, etc. Scheduling latency is
> only a part of it.
>

Of course priorities are still necessary =) However assuming that
interactive tasks will always finish much much earlier than hogs, it's not
really worth it to give interactive tasks any special treatment when you
have very fine timeslices.

For example you have x that will use 100 ms and y that will use 5 ms, both
of the same priority. Assuming that x entered into the queue first and y
immediately after, at 20 ms timeslice, it will be 25 ms before y finishes.
However, at 1 ms timeslice, y finishes in 10 ms.

John Yau

2003-09-07 09:26:55

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

John Yau wrote:

>>Even if context switches don't cost anything, you still want to have
>>priorities so cpu hogs can be preempted by other tasks in order to
>>quickly respond to IO events. You want interactive tasks to be able
>>to sometimes get more cpu than cpu hogs, etc. Scheduling latency is
>>only a part of it.
>>
>>
>
>Of course priorities are still necessary =) However assuming that
>interactive tasks will always finish much much earlier than hogs, it's not
>really worth it to give interactive tasks any special treatment when you
>have very fine timeslices.
>

Its actually more important when you have smaller timeslices, because
the interactive task is more likely to use all of its timeslice in a
burst of activity, then getting stuck behind all the cpu hogs.

>
>
>For example you have x that will use 100 ms and y that will use 5 ms, both
>of the same priority. Assuming that x entered into the queue first and y
>immediately after, at 20 ms timeslice, it will be 25 ms before y finishes.
>However, at 1 ms timeslice, y finishes in 10 ms.
>
>

Yes. Also, say 5 hogs running, an interactive task needs to do something
taking 2ms. At a 2ms timeslice, it will take 2ms. At a 1ms timeslice it
will take 6ms.

2003-09-07 14:33:05

by Martin J. Bligh

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

> IIRC my (equivalent to Andrew's CAN_MIGRATE) patch fixed this. There
> was still a small (~8%?) performance regression, but idle times were
> on par with -linus. I don't have easy access to a largeish NUMA box,
> so I can't do much more.

The degredations were seen on SMP (though I can also see them on NUMA).
You should be able to get access to a largish SMP (or even NUMA) box
via OSDL. Alternatively, I should be able to run some tests on Monday,
once the power is back in our lab (grrrr). Sounds like test order of
the day is:

test4
test4 + "Andrew's patch" (whatever that was, and whichever Andrew ;-))
test4 + Andrew + Con
test4 + Andres + Nick.

Though the results weren't as extreme on my machine, they were 10% or
so, and I can probably beat jbb into running fairly easily, or Mark
can do that one.

M.

2003-09-07 16:51:40

by Robert Love

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Sun, 2003-09-07 at 02:18, Andrew Morton wrote:

> We cannot just jam all this code into Linus's tree while crossing our
> fingers and hoping that something will turn up to fix this problem.
> Because we don't know what causes it, nor whether we even _can_ fix it.

Actually, this would be my argument _for_ Nick's approach. It is simple
and we all understand it.

There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
ones are an improvement, a detriment, and a noop? It is like bandaids
on top of bandaids, to fix corner cases.

And we _do_ know what causes the problem: the interactivity estimator
misestimates interactivity. What we do not know is what fixes it.

Robert Love

2003-09-07 17:31:55

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>
> Its actually more important when you have smaller timeslices, because
> the interactive task is more likely to use all of its timeslice in a
> burst of activity, then getting stuck behind all the cpu hogs.
>

Well, I didn't claim it'd be optimal, I just said that it's not worth the
extra effort. The interactive task will still finish in O((interactive_time
/ timeslice) * #hogs + interative_time) ms. As long as the cpu time
interactive tasks require are very short, they still should finish within a
reasonable amount of time.

> >
>
> Yes. Also, say 5 hogs running, an interactive task needs to do something
> taking 2ms. At a 2ms timeslice, it will take 2ms. At a 1ms timeslice it
> will take 6ms.
>

That's assuming that the interactive task gets scheduled first. In the
worst case scenario where it gets scheduled last, at 2 ms, it takes 12 ms
and at 1 ms it also takes 12 ms. Not much difference there.

2003-09-07 17:36:14

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Robert Love <[email protected]> wrote:
>
> There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
> ones are an improvement, a detriment, and a noop?

We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
we now that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.

What we don't know is whether the thing which sched-CAN_MIGRATE_TASK-fix.patch
fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.

2003-09-07 17:38:54

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

John Yau wrote:

>>Its actually more important when you have smaller timeslices, because
>>the interactive task is more likely to use all of its timeslice in a
>>burst of activity, then getting stuck behind all the cpu hogs.
>>
>>
>
>Well, I didn't claim it'd be optimal, I just said that it's not worth the
>extra effort. The interactive task will still finish in O((interactive_time
>/ timeslice) * #hogs + interative_time) ms. As long as the cpu time
>interactive tasks require are very short, they still should finish within a
>reasonable amount of time.
>

I have found it to be worth the extra effort in my patches, but maybe
you have something different in mind.

>
>>Yes. Also, say 5 hogs running, an interactive task needs to do something
>>taking 2ms. At a 2ms timeslice, it will take 2ms. At a 1ms timeslice it
>>will take 6ms.
>>
>>
>
>That's assuming that the interactive task gets scheduled first. In the
>worst case scenario where it gets scheduled last, at 2 ms, it takes 12 ms
>and at 1 ms it also takes 12 ms. Not much difference there.
>
>

No, not much difference. If the worst case scenario happens, it
indicates you have quite a big problem (ie. an interactive task not
allowed to preempt cpu hogs).

2003-09-07 18:12:53

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Morton wrote:

>Robert Love <[email protected]> wrote:
>
>> There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
>> ones are an improvement, a detriment, and a noop?
>>
>
>We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
>we now that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.
>
>What we don't know is whether the thing which sched-CAN_MIGRATE_TASK-fix.patch
>fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.
>
>

I think Robert was just talking about general improvements or regressions,
etc. I don't think Con is going too badly though - the small amount of
feedback I read about it is normally positive.

I think we might as well use -linus tree for testing while its still in
the test release phase. It probably gets a few orders of magnitude more
testing than mm.

2003-09-07 18:14:54

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Robert Love wrote:

>On Sun, 2003-09-07 at 02:18, Andrew Morton wrote:
>
>
>>We cannot just jam all this code into Linus's tree while crossing our
>>fingers and hoping that something will turn up to fix this problem.
>>Because we don't know what causes it, nor whether we even _can_ fix it.
>>
>
>Actually, this would be my argument _for_ Nick's approach. It is simple
>and we all understand it.
>

Unfortunately (or fortunately?) you can't really get from my patch to
Con's in small simple steps, its basically one or the other. I'd like
to see my patch get included in 2.6, but I'm yet to convince many others.
Con is further along that road, so my only possibility for wider testing
is to try free up mm kernels for possible use ;) (if Andrew will have
them of course). Getting Con's patch more testing wouldn't hurt
anyone though, of course.

2003-09-08 00:15:21

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Mon, 8 Sep 2003 03:30, John Yau wrote:
> > Its actually more important when you have smaller timeslices, because
> > the interactive task is more likely to use all of its timeslice in a
> > burst of activity, then getting stuck behind all the cpu hogs.
>
> Well, I didn't claim it'd be optimal, I just said that it's not worth the
> extra effort. The interactive task will still finish in
> O((interactive_time / timeslice) * #hogs + interative_time) ms. As long as
> the cpu time interactive tasks require are very short, they still should
> finish within a reasonable amount of time.
>
> > Yes. Also, say 5 hogs running, an interactive task needs to do something
> > taking 2ms. At a 2ms timeslice, it will take 2ms. At a 1ms timeslice it
> > will take 6ms.
>
> That's assuming that the interactive task gets scheduled first. In the
> worst case scenario where it gets scheduled last, at 2 ms, it takes 12 ms
> and at 1 ms it also takes 12 ms. Not much difference there.

Ultra short timeslices would dissolve a lot of the interactivity issues but
come with a serious problem - most cpu caches these days, which are
incredibly valuable to cpu performance, take time to be filled and emptied,
and 1ms timeslices are just too short to use their benefits. The time
required to derive useful benefit depends on the cpu type but can be up to
20ms. In my patches, the most interactive tasks round robin every 10ms which
can cause some detriment to performance, but fortunately interactive tasks
tend not to be as cpu bound as other tasks. What also happens is that if an
interactive task decides to use a burst of cpu it will always be dropped by
at least one priority so the tasks that only ever use small amounts of cpu
(read audio apps) will always go first.

Also O1int as of vO20 round robins them with less frequency as their
interactivity drops, which corresponds quite nicely with how cpu bound they
are, and this maintaints throughput of cpu bound tasks.

Con

2003-09-08 00:30:58

by David Lang

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Mon, 8 Sep 2003, Con Kolivas wrote:

> Ultra short timeslices would dissolve a lot of the interactivity issues but
> come with a serious problem - most cpu caches these days, which are
> incredibly valuable to cpu performance, take time to be filled and emptied,
> and 1ms timeslices are just too short to use their benefits. The time
> required to derive useful benefit depends on the cpu type but can be up to
> 20ms. In my patches, the most interactive tasks round robin every 10ms which
> can cause some detriment to performance, but fortunately interactive tasks
> tend not to be as cpu bound as other tasks. What also happens is that if an
> interactive task decides to use a burst of cpu it will always be dropped by
> at least one priority so the tasks that only ever use small amounts of cpu
> (read audio apps) will always go first.

Con,
doesn't this get affected more with the CPU and memory speed then with
clock time? (i.e. the faster CPU is calling for addresses in a sorter time
period since it gets through more of the program and the faster memory
gets the cache misses into cache faster)

or are the caches getting enough larger as CPU and memeory
speeds climb that the clock time is close to being consistant?

it just seems wrong to list a specific wall clock time nessasary to fill a
cache with so many variables, it may have been correct as of the date that
time was calculated, but as CPU's change I would expect the minimum time
to vary quite a bit based on the particular CPU (which brings up the
question of if the timeslices should be different for different CPU's)

David Lang

2003-09-08 00:40:08

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Mon, 8 Sep 2003 10:27, David Lang wrote:
> On Mon, 8 Sep 2003, Con Kolivas wrote:
> > Ultra short timeslices would dissolve a lot of the interactivity issues
> > but come with a serious problem - most cpu caches these days, which are
> > incredibly valuable to cpu performance, take time to be filled and
> > emptied, and 1ms timeslices are just too short to use their benefits. The
> > time required to derive useful benefit depends on the cpu type but can be
> > up to 20ms. In my patches, the most interactive tasks round robin every
> > 10ms which can cause some detriment to performance, but fortunately
> > interactive tasks tend not to be as cpu bound as other tasks. What also
> > happens is that if an interactive task decides to use a burst of cpu it
> > will always be dropped by at least one priority so the tasks that only
> > ever use small amounts of cpu (read audio apps) will always go first.
>
> Con,
> doesn't this get affected more with the CPU and memory speed then with
> clock time? (i.e. the faster CPU is calling for addresses in a sorter time
> period since it gets through more of the program and the faster memory
> gets the cache misses into cache faster)
>
> or are the caches getting enough larger as CPU and memeory
> speeds climb that the clock time is close to being consistant?
>
> it just seems wrong to list a specific wall clock time nessasary to fill a
> cache with so many variables, it may have been correct as of the date that
> time was calculated, but as CPU's change I would expect the minimum time
> to vary quite a bit based on the particular CPU (which brings up the
> question of if the timeslices should be different for different CPU's)

Good question. What it seems from basic testing is that the cache build
up/tear down time does not seem to change at the same rate as the clock speed
at all. From the testing I _could_ do, it made no difference across PII->PIV
in terms of deriving benefit from the minimum timeslice possible, suggesting
to me that the cache is probably the rate limiting thing but I have no
further science to back me up on that. This meant to me that despite the fact
that varying timeslice according to clock speed sounding like a good idea, it
would probably be better to vary the timeslice according to cpu architecture
first (i386), family second (P2 versus P4), and clock speed to not have any
effect. All in all it makes for another swag of confusing changes that
probably are not worth it in the grand scheme if the baseline settings work
well across the board.

There are heaps of people out there with more hardware knowledge than I who
could answer this better.

Con

2003-09-08 22:27:57

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Morton wrote:

>That is not clear at this time. We do know that the reaim regression was
>introduced by sched-2.6.0-test2-mm2-A3, but we don't know why. Certainly
>that patch did not introduce the problem which Andrew's patch fixed. And
>we have theorised that Andrew's patch brought back the reaim throughput.
>And we have extrapolated those observations to possible improvements in
>volanomark throughput.
>
>It's all foggy and I'd like to see a clean rerun of specjbb and volanomark
>by Mark Peloquin and co, confirming that -mm6 is performing OK.
>
>
For specjbb things are looking good from a throughput point of view.

2.6.0-test4 2.6.0-test4-mm6
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 9783.46 10093.09 3.16 309.63 293.50 *
4 33783.93 35763.79 5.86 1979.86 1013.52 *
7 54401.52 54288.06 -0.21 -113.46 1632.05
10 56861.59 56445.20 -0.73 -416.39 1705.85
13 56024.86 55720.23 -0.54 -304.63 1680.75
16 43874.77 48994.63 11.67 5119.86 1316.24 *
19 32658.83 31248.04 -4.32 -1410.79 979.76 *

But to get these numbers we are using much more CPU. I'll leave it to
others to decide if this is good or not.

CPU IDLE TIME
2.6.0-test4 2.6.0-test4-mm6
# of WHs %CPU %CPU %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 87.30 87.31 0.01 0.01 3.62
4 49.53 49.51 -0.04 -0.02 2.49
7 12.40 12.32 -0.65 -0.08 1.37
10 0.36 0.40 11.11 0.04 1.01
13 1.20 0.62 -48.33 -0.58 1.04
16 15.17 2.79 -81.61 -12.38 1.46 *
19 30.66 5.29 -82.75 -25.37 1.92 *

Volanomark, on the other hand is still off by quite a bit from test4 stock

Results:Throughput

tolerance = 0.00 + 3.00% of 2.6.0-test4
2.6.0-test4 2.6.0-test4-mm6
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40757 37223 -8.67 -3534.00 1222.71 *

>
>Also, I'm concerned that sched-2.6.0-test2-mm2-A3 caused slowdowns and
>Andrew's patch caused speedups and they just cancelled out. Let's get
>Andrew's patch into Linus's tree and see if it speeds things up. If it
>does, we probably still have a problem.
>

If thre is any particular patch/tree combination you would like me to
try out, please let me know and I will see if I can get the results for
you.

Steve

2003-09-08 23:15:39

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Steven Pratt <[email protected]> wrote:
>
> For specjbb things are looking good from a throughput point of view.
> ...
> Volanomark, on the other hand is still off by quite a bit from test4 stock
>

hmm, thanks.

I'm not sure that volanomark is very representative of any real-world
thing.

> ...
> If thre is any particular patch/tree combination you would like me to
> try out, please let me know and I will see if I can get the results for
> you.

Could we please see test5 versus test5 plus Andrew's patch?

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch

and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch plus

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch

What I'm afraid of is that those patches will yield improved results over
test5, and that adding

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-2.6.0-test2-mm2-A3.patch

will slow things down again.

2003-09-08 23:22:00

by William Lee Irwin III

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Mon, Sep 08, 2003 at 03:56:39PM -0700, Andrew Morton wrote:
> hmm, thanks.
> I'm not sure that volanomark is very representative of any real-world
> thing.

AIUI it's dominated by 3-tier locking issues.

-- wli

2003-09-09 02:02:42

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Tue, 9 Sep 2003 08:56, Andrew Morton wrote:
> Steven Pratt <[email protected]> wrote:
> > For specjbb things are looking good from a throughput point of view.
> > ...
> > Volanomark, on the other hand is still off by quite a bit from test4
> > stock
>
> hmm, thanks.
>
> I'm not sure that volanomark is very representative of any real-world
> thing.
>
> > ...
> > If thre is any particular patch/tree combination you would like me to
> > try out, please let me know and I will see if I can get the results for
> > you.
>
> Could we please see test5 versus test5 plus Andrew's patch?
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2
>.6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
>
> and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch plus
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2
>.6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch

Interestingly enough this drops the volano results the same proportion as
Ingo's A3 patch. 11000 ->10400 throughput with same idle, but more
schedule().

I've posted some results for test5 volano and test5-A0 here:
http://kernel.kolivas.org/2.5/volano

More testing underway.
Con

2003-09-09 02:08:50

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Tue, 9 Sep 2003 12:10, Con Kolivas wrote:
> On Tue, 9 Sep 2003 08:56, Andrew Morton wrote:
> > Steven Pratt <[email protected]> wrote:
> > > For specjbb things are looking good from a throughput point of view.
> > > ...
> > > Volanomark, on the other hand is still off by quite a bit from test4
> > > stock
> >
> > hmm, thanks.
> >
> > I'm not sure that volanomark is very representative of any real-world
> > thing.
> >
> > > ...
> > > If thre is any particular patch/tree combination you would like me to
> > > try out, please let me know and I will see if I can get the results for
> > > you.
> >
> > Could we please see test5 versus test5 plus Andrew's patch?
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4
> >/2 .6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
> >
> > and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch
> > plus
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4
> >/2 .6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch
>
> Interestingly enough this drops the volano results the same proportion as
> Ingo's A3 patch. 11000 ->10400 throughput with same idle, but more
> schedule().
>
> I've posted some results for test5 volano and test5-A0 here:
> http://kernel.kolivas.org/2.5/volano
>
> More testing underway.

Correction sorry: These changes were due to sched-CAN_MIGRATE_TASK-fix.patch
and the test results say volano-results-2.6.0-test5-A0-*

Con

2003-09-09 02:24:10

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Tue, 9 Sep 2003 12:16, Con Kolivas wrote:
> On Tue, 9 Sep 2003 12:10, Con Kolivas wrote:
> > On Tue, 9 Sep 2003 08:56, Andrew Morton wrote:
> > > Steven Pratt <[email protected]> wrote:
> > > > For specjbb things are looking good from a throughput point of view.
> > > > ...
> > > > Volanomark, on the other hand is still off by quite a bit from test4
> > > > stock
> > >
> > > hmm, thanks.
> > >
> > > I'm not sure that volanomark is very representative of any real-world
> > > thing.
> > >
> > > > ...
> > > > If thre is any particular patch/tree combination you would like me to
> > > > try out, please let me know and I will see if I can get the results
> > > > for you.
> > >
> > > Could we please see test5 versus test5 plus Andrew's patch?
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-tes
> > >t4 /2 .6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
> > >
> > > and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch
> > > plus
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-tes
> > >t4 /2
> > > .6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch
> >
> > Interestingly enough this drops the volano results the same proportion as
> > Ingo's A3 patch. 11000 ->10400 throughput with same idle, but more
> > schedule().
> >
> > I've posted some results for test5 volano and test5-A0 here:
> > http://kernel.kolivas.org/2.5/volano
> >
> > More testing underway.
>
> Correction sorry: These changes were due to
> sched-CAN_MIGRATE_TASK-fix.patch and the test results say
> volano-results-2.6.0-test5-A0-*

Further testing shows the patch: sched-balance-fix-2.6.0-test3-mm3-A0.patch to
have no effect on volano results by itself.

Con

2003-09-09 02:33:09

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Con Kolivas <[email protected]> wrote:
>
> > Correction sorry: These changes were due to
> > sched-CAN_MIGRATE_TASK-fix.patch and the test results say
> > volano-results-2.6.0-test5-A0-*
>
> Further testing shows the patch: sched-balance-fix-2.6.0-test3-mm3-A0.patch to
> have no effect on volano results by itself.

OK, thanks.

As I have just learnt that volanomark is dominated by sched_yield()-based
userspace locking I have suddenly lost all interest in it. We don't want
to tune the kernel for braindead locking implementations. Call me when it
has been converted to futexes ;)

Do we know what specjbb is doing internally?

2003-09-09 04:16:09

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Con Kolivas wrote:

>On Tue, 9 Sep 2003 12:16, Con Kolivas wrote:
>
>>On Tue, 9 Sep 2003 12:10, Con Kolivas wrote:
>>
>>>On Tue, 9 Sep 2003 08:56, Andrew Morton wrote:
>>>
>>>>Steven Pratt <[email protected]> wrote:
>>>>
>>>>>For specjbb things are looking good from a throughput point of view.
>>>>>...
>>>>>Volanomark, on the other hand is still off by quite a bit from test4
>>>>>stock
>>>>>
>>>>hmm, thanks.
>>>>
>>>>I'm not sure that volanomark is very representative of any real-world
>>>>thing.
>>>>
>>>>
>>>>>...
>>>>>If thre is any particular patch/tree combination you would like me to
>>>>>try out, please let me know and I will see if I can get the results
>>>>>for you.
>>>>>
>>>>Could we please see test5 versus test5 plus Andrew's patch?
>>>>
>>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-tes
>>>>t4 /2 .6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
>>>>
>>>>and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch
>>>>plus
>>>>
>>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-tes
>>>>t4 /2
>>>>.6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch
>>>>
>>>Interestingly enough this drops the volano results the same proportion as
>>>Ingo's A3 patch. 11000 ->10400 throughput with same idle, but more
>>>schedule().
>>>
>>>I've posted some results for test5 volano and test5-A0 here:
>>>http://kernel.kolivas.org/2.5/volano
>>>
>>>More testing underway.
>>>
>>Correction sorry: These changes were due to
>>sched-CAN_MIGRATE_TASK-fix.patch and the test results say
>>volano-results-2.6.0-test5-A0-*
>>
>
>Further testing shows the patch: sched-balance-fix-2.6.0-test3-mm3-A0.patch to
>have no effect on volano results by itself.
>

Hi Con,
Any chance you could give this
http://www.kerneltrap.org/~npiggin/v14/sched-rollup-nopolicy-v14.gz
a try? It should apply against test5.

2003-09-09 06:41:19

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Tue, 9 Sep 2003 14:14, Nick Piggin wrote:
> Hi Con,
> Any chance you could give this
> http://www.kerneltrap.org/~npiggin/v14/sched-rollup-nopolicy-v14.gz
> a try? It should apply against test5.

Tested. This patch causes a drop in volano throughput of around 8% also like
the other patches. I guess this patch must be good too :P

Con

2003-09-09 22:07:45

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Morton wrote:

>Steven Pratt <[email protected]> wrote:
>
>
>>For specjbb things are looking good from a throughput point of view.
>>...
>>Volanomark, on the other hand is still off by quite a bit from test4 stock
>>
>>
>>
>hmm, thanks.
>
>I'm not sure that volanomark is very representative of any real-world
>thing.
>
>
>
>>...
>>If thre is any particular patch/tree combination you would like me to
>>try out, please let me know and I will see if I can get the results for
>>you.
>>
>>
>
>Could we please see test5 versus test5 plus Andrew's patch?
>
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
>
This patch improves specjjb over test5 and has no real effect on any of
kernbench, volanomark or specsdet.

Specjbb Throughput
2.6.0-test5 2.6.0-test5BALANCE
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 10062.66 -0.55 -55.76 303.55
4 35316.38 34676.03 -1.81 -640.35 1059.49
7 54126.17 52717.84 -2.60 -1408.33 1623.79
10 56906.64 56587.53 -0.56 -319.11 1707.20
13 51589.86 54625.25 5.88 3035.39 1547.70 *
16 41410.52 43120.66 4.13 1710.14 1242.32 *
19 32944.48 35820.89 8.73 2876.41 988.33 *

Volanomark
2.6.0-test5 2.6.0-test5BALANCE
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40915 41391 1.16 476.00 1227.45

>
>and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch plus
>
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch
>
>
This patch degrades both specjbb and volanomark, and to a lesser degree
specsdet

Specjbb throughput
2.6.0-test5 2.6.0-test5MIGRATE
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 9980.90 -1.36 -137.52 303.55
4 35316.38 34065.11 -3.54 -1251.27 1059.49 *
7 54126.17 52697.10 -2.64 -1429.07 1623.79
10 56906.64 55466.77 -2.53 -1439.87 1707.20
13 51589.86 43152.57 -16.35 -8437.29 1547.70 *
16 41410.52 45201.21 9.15 3790.69 1242.32 *
19 32944.48 29025.16 -11.90 -3919.32 988.33 *

Volanomark
2.6.0-test5 2.6.0-test5MIGRATE
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40915 38518 -5.86 -2397.00 1227.45 *

>
>What I'm afraid of is that those patches will yield improved results over
>test5, and that adding
>
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-2.6.0-test2-mm2-A3.patch
>
I tried adding this patch to stock test5 and it failed to apply
cleanly. I have not had a chance to look at why. Did you mean for this
to be applied by itself, or was this supposed to go on top of one of the
other patches?

>
>will slow things down again.
>

Steve

2003-09-09 22:31:27

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Steven Pratt <[email protected]> wrote:
>
> >
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
> >
> This patch improves specjjb over test5 and has no real effect on any of
> kernbench, volanomark or specsdet.

Fine, it's a good fix.

> >
> >and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch plus
> >
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch
> >
> >
> This patch degrades both specjbb and volanomark, and to a lesser degree
> specsdet

ok. And just confirming: that was test5 plus
sched-CAN_MIGRATE_TASK-fix.patch plus
sched-balance-fix-2.6.0-test3-mm3-A0.patch?

I didn't expect a regression from sched-balance-fix.

> >What I'm afraid of is that those patches will yield improved results over
> >test5, and that adding
> >
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-2.6.0-test2-mm2-A3.patch
> >
> I tried adding this patch to stock test5 and it failed to apply
> cleanly. I have not had a chance to look at why. Did you mean for this
> to be applied by itself, or was this supposed to go on top of one of the
> other patches?

Yes, it applies on top of the other two patches.

Thanks for working on this: it's pretty important right now.

2003-09-09 23:34:25

by David Mosberger-Tang

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>>>>> On Wed, 10 Sep 2003 00:40:09 +0200, Andrew Morton <[email protected]> said:

Andrew> Steven Pratt <[email protected]> wrote:

>> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
>> >
>> This patch improves specjjb over test5 and has no real effect on
>> any of kernbench, volanomark or specsdet.

Andrew> Fine, it's a good fix.

Is it that simple? My reading is that it will do very bad things,
e.g., to pipe roundtrip latency on SMP machines. Something that the
O(1) scheduler has handled nicely so far.

My preference would have been to break affinity only in the presence
of a _persistent_ load imbalance of >> 1. For example, it's perfectly
OK and indeed encouraged to run N tasks on one and the same CPU, if
those tasks are (almost) never runnable at the same time.

--david
--
Interested in learning more about IA-64 Linux? Try http://www.lia64.org/book/

2003-09-09 23:55:14

by Cliff White

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>
> Nick Piggin wrote:

> Con Kolivas wrote:
>

> Hi Con,
> Any chance you could give this
> http://www.kerneltrap.org/~npiggin/v14/sched-rollup-nopolicy-v14.gz
> a try? It should apply against test5.
>
>
I have some STP tests scheduled against this also (PLM 2117)
Please let me know if you want other combinations tested - am just catching up
on
this thread.
cliffw

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2003-09-10 02:13:51

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Cliff White wrote:

>> Nick Piggin wrote:
>>
>
>>Con Kolivas wrote:
>>
>>
>
>>Hi Con,
>>Any chance you could give this
>>http://www.kerneltrap.org/~npiggin/v14/sched-rollup-nopolicy-v14.gz
>>a try? It should apply against test5.
>>
>>
>>
>I have some STP tests scheduled against this also (PLM 2117)
>Please let me know if you want other combinations tested - am just catching up
>on
>this thread.
>cliffw
>

Thanks Cliff that would be cool. If you could test this:
http://www.kerneltrap.org/~npiggin/v14/sched-rollup-v14.gz
as well would be good. The previous one is more important though.

2003-09-10 13:59:52

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Morton wrote:

>Steven Pratt <[email protected]> wrote:
>
>
>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-CAN_MIGRATE_TASK-fix.patch
>>>
>>>
>>>
>>This patch improves specjjb over test5 and has no real effect on any of
>>kernbench, volanomark or specsdet.
>>
>>
>
>Fine, it's a good fix.
>
>
>
>>>and if you have time, also test5 plus sched-CAN_MIGRATE_TASK-fix.patch plus
>>>
>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-balance-fix-2.6.0-test3-mm3-A0.patch
>>>
>>>
>>>
>>This patch degrades both specjbb and volanomark, and to a lesser degree
>>specsdet
>>
>>
>
>ok. And just confirming: that was test5 plus
>sched-CAN_MIGRATE_TASK-fix.patch plus
>sched-balance-fix-2.6.0-test3-mm3-A0.patch?
>
>
No this was test 5 plus sched-CAN_MIGRATE_TASK-fix.patch only. I seems
I misread the request. I am running that job now.

>I didn't expect a regression from sched-balance-fix.
>
>
>
>>>What I'm afraid of is that those patches will yield improved results over
>>>test5, and that adding
>>>
>>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test4/2.6.0-test4-mm6/broken-out/sched-2.6.0-test2-mm2-A3.patch
>>>
>>>
>>>
>>I tried adding this patch to stock test5 and it failed to apply
>>cleanly. I have not had a chance to look at why. Did you mean for this
>>to be applied by itself, or was this supposed to go on top of one of the
>>other patches?
>>
>>
>
>Yes, it applies on top of the other two patches.
>
>Thanks for working on this: it's pretty important right now.
>
Ok, this is submitted as well. Should have results this afternoon.

Steve

2003-09-10 18:52:11

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Ok, looks like I reversed 2 set of results yesterday when I posted.
Here is a more complete (and hopefully accurate) reporting of the 3 patches.

A) *** test5 + sched-CAN_MIGRATE_TASK-fix.patch
degrades volanomark, specjbb and specsdet

B) *** test5 + sched-balance-fix-2.6.0-test3-mm3-A0.patch
improves specjbb with no real change on sdet or volanomark

C) *** test5 + sched-CAN_MIGRATE_TASK-fix.patch + sched-balance-fix-2.6.0-test3-mm3-A0.patch
degrades volanomark and sdet, more degrade than improvement on specjbb

D) *** test5 + sched-CAN_MIGRATE_TASK-fix.patch + sched-balance-fix-2.6.0-test3-mm3-A0.patch + sched-2.6.0-test2-mm2-A3.patch
degrades volanomark and sdet, mixed results on specjbb

Below are the details for each of the runs.

A) *** test5 + sched-CAN_MIGRATE_TASK-fix.patch ********************

Volanomark
2.6.0-test5 2.6.0-test5MIGRATE
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40915 38518 -5.86 -2397.00 1227.45 *

SPECJBB
2.6.0-test5 2.6.0-test5MIGRATE
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 9980.90 -1.36 -137.52 303.55
4 35316.38 34065.11 -3.54 -1251.27 1059.49 *
7 54126.17 52697.10 -2.64 -1429.07 1623.79
10 56906.64 55466.77 -2.53 -1439.87 1707.20
13 51589.86 43152.57 -16.35 -8437.29 1547.70 *
16 41410.52 45201.21 9.15 3790.69 1242.32 *
19 32944.48 29025.16 -11.90 -3919.32 988.33 *

SPECSDET
2.6.0-test5 2.6.0-test5MIGRATE
Threads Ops/sec Ops/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 3232 3111 -3.74 -121.00 96.96 *
4 11794 11383 -3.48 -411.00 353.82 *
16 19008 18726 -1.48 -282.00 570.24
64 18736 18701 -0.19 -35.00 562.08

B) *** test5 + sched-balance-fix-2.6.0-test3-mm3-A0.patch *************************

VOLANOMARK
2.6.0-test5 2.6.0-test5BALANCE
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40915 41391 1.16 476.00 1227.45

SPECJBB
2.6.0-test5 2.6.0-test5BALANCE
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 10062.66 -0.55 -55.76 303.55
4 35316.38 34676.03 -1.81 -640.35 1059.49
7 54126.17 52717.84 -2.60 -1408.33 1623.79
10 56906.64 56587.53 -0.56 -319.11 1707.20
13 51589.86 54625.25 5.88 3035.39 1547.70 *
16 41410.52 43120.66 4.13 1710.14 1242.32 *
19 32944.48 35820.89 8.73 2876.41 988.33 *

SPECSDET
2.6.0-test5 2.6.0-test5BALANCE
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 10062.66 -0.55 -55.76 303.55
4 35316.38 34676.03 -1.81 -640.35 1059.49
7 54126.17 52717.84 -2.60 -1408.33 1623.79
10 56906.64 56587.53 -0.56 -319.11 1707.20
13 51589.86 54625.25 5.88 3035.39 1547.70 *
16 41410.52 43120.66 4.13 1710.14 1242.32 *
19 32944.48 35820.89 8.73 2876.41 988.33 *

C) *** test5 + sched-CAN_MIGRATE_TASK-fix.patch + sched-balance-fix-2.6.0-test3-mm3-A0.patch

VOLANOMARK
2.6.0-test5 2.6.0-test5MIGRATE-BALANCE
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40915 38651 -5.53 -2264.00 1227.45 *

SPECJBB
2.6.0-test5 2.6.0-test5MIGRATE-BALANCE
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 10103.50 -0.15 -14.92 303.55
4 35316.38 35420.78 0.30 104.40 1059.49
7 54126.17 54256.02 0.24 129.85 1623.79
10 56906.64 57224.95 0.56 318.31 1707.20
13 51589.86 43993.71 -14.72 -7596.15 1547.70 *
16 41410.52 45037.11 8.76 3626.59 1242.32 *
19 32944.48 30018.34 -8.88 -2926.14 988.33 *

SPECSDET
2.6.0-test5 2.6.0-test5MIGRATE-BALANCE
Threads Ops/sec Ops/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 3232 3031 -6.22 -201.00 96.96 *
4 11794 11288 -4.29 -506.00 353.82 *
16 19008 18716 -1.54 -292.00 570.24
64 18736 18775 0.21 39.00 562.08

D) *** test5 + sched-CAN_MIGRATE_TASK-fix.patch + sched-balance-fix-2.6.0-test3-mm3-A0.patch +
sched-2.6.0-test2-mm2-A3.patch

VOLANOMARK
2.6.0-test5 2.6.0-test5ALL
Msgs/sec Msgs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 40915 37204 -9.07 -3711.00 1227.45 *

SPECJBB
2.6.0-test5 2.6.0-test5ALL
# of WHs OPs/sec OPs/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 10118.42 10124.52 0.06 6.10 303.55
4 35316.38 34240.71 -3.05 -1075.67 1059.49 *
7 54126.17 54015.28 -0.20 -110.89 1623.79
10 56906.64 56618.95 -0.51 -287.69 1707.20
13 51589.86 47911.50 -7.13 -3678.36 1547.70 *
16 41410.52 46771.18 12.95 5360.66 1242.32 *
19 32944.48 33306.14 1.10 361.66 988.33

SPECSDET
2.6.0-test5 2.6.0-test5ALL
Threads Ops/sec Ops/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 3232 3097 -4.18 -135.00 96.96 *
4 11794 11251 -4.60 -543.00 353.82 *
16 19008 18657 -1.85 -351.00 570.24
64 18736 18703 -0.18 -33.00 562.08

Steve

2003-09-10 19:08:13

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Nick Piggin wrote:

>
> Cliff White wrote:
>
>>> Nick Piggin wrote:
>>>
>>> Con Kolivas wrote:
>>>
>>>
>>> Hi Con,
>>> Any chance you could give this
>>> http://www.kerneltrap.org/~npiggin/v14/sched-rollup-nopolicy-v14.gz
>>> a try? It should apply against test5.
>>>
>>>
>>>
>> I have some STP tests scheduled against this also (PLM 2117) Please
>> let me know if you want other combinations tested - am just catching
>> up on
>> this thread.
>> cliffw
>>
>
> Thanks Cliff that would be cool. If you could test this:
> http://www.kerneltrap.org/~npiggin/v14/sched-rollup-v14.gz
> as well would be good. The previous one is more important though.

I gave this a try on the same setup that I am using for the regression
tests and the scheduler tests for Andrew. What I got was the following
oops:

CPU: 5
EIP: 0060:[<c011c577>] Not tainted
EFLAGS: 00010003
EIP is at load_balance+0x257/0x3f0
eax: f6583998 ebx: c6099518 ecx: 00000000 edx: c60b9100
esi: c6099518 edi: c60994f8 ebp: f64bff44 esp: f64bff14
ds: 007b es: 007b ss: 0068
Process java (pid: 3482, threadinfo=f64be000 task=f6b13300)
Stack: c6099104 c6098bc0 c6099518 c6099100 00000000 00000005 00000080
000000ff
00000002 f6b13300 c60b9100 c60b8bc0 f64bff8c c011cef9 c60b8bc0
00000001
000000ff f64be000 c60b8bc0 c011ca9d f6b13300 00000000 00000002
c60a8bc0
Call Trace:
[<c011cef9>] schedule+0x4d9/0x550
[<c011ca9d>] schedule+0x7d/0x550
[<c010a08d>] sys_rt_sigsuspend+0xed/0x130
[<c010b01f>] syscall_call+0x7/0xb

Code: 89 19 89 4b 04 8b 47 18 0f ab 42 04 ff 02 89 57 28 8b 55 08

Steve

2003-09-10 20:23:38

by Dave Hansen

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Wed, 2003-09-10 at 12:05, Steven Pratt wrote:
> I gave this a try on the same setup that I am using for the regression
> tests and the scheduler tests for Andrew. What I got was the following
> oops:

If you compile your kernel with debugging (-g) turned on, then you can
use addr2line on the vmlinux and the eip where it crashed. That is
immensely useful in diagnosing oopses. Guessing where
load_balance+0x257 is actually located is kinda hard.

--
Dave Hansen
[email protected]

2003-09-11 00:15:26

by Cliff White

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

>
>
> Cliff White wrote:
>
> >> Nick Piggin wrote:
> >>
> >
> >>Con Kolivas wrote:
> >>
> >>
> >
> >>Hi Con,
> >>Any chance you could give this
> >>http://www.kerneltrap.org/~npiggin/v14/sched-rollup-nopolicy-v14.gz
> >>a try? It should apply against test5.
> >>
> >>
> >>
> >I have some STP tests scheduled against this also (PLM 2117)
> >Please let me know if you want other combinations tested - am just catching up
> >on
> >this thread.
> >cliffw
> >
>
> Thanks Cliff that would be cool. If you could test this:
> http://www.kerneltrap.org/~npiggin/v14/sched-rollup-v14.gz
> as well would be good. The previous one is more important though.
>
sched-rollup-v14 runs good on 2-cpu w/reaim (more in next email):
example:
http://khack.osdl.org/stp/279657/

Has oops on 4-cpu when running the dbt2-1tier database test(below):
Two runs, different oops.
Let me know how much more detail you need.
Both machines are Intel PIII x ~800mhz w/ 1GB RAM/cpu
Diffs between the 2 and 4 way would include. :
- different SCSI controllers
- database uses raw IO
kernel .config files should be the same for both, other
than SCSI.
cliffw

----------------------------
Sep 10 15:44:30 stp4-000 kernel: sdo: sdo1
Unable to handle kernel paging request at virtual address 4d7a7153
printing eip:
c011c1f9
*pde = 0e81a067
*pte = 00000000
Oops: 0002 [#1]
CPU: 3
EIP: 0060:[<c011c1f9>] Not tainted
EFLAGS: 00010003
EIP is at load_balance+0x2a9/0x4d0
eax: de9c4998 ebx: c366a518 ecx: 4d7a7153 edx: c3672100
esi: c366a518 edi: c366a4f8 ebp: c3e8df64 esp: c3e8df30
ds: 007b es: 007b ss: 0068
Process kernel (pid: 4246, threadinfo=c3e8c000 task=f7b9b300)
Stack: c3669bc0 c366a104 c3669bc0 c366a518 c366a100 00000000 00000003 00000080
0000000f 00000002 c3e8c000 c3671bc0 0000000f c3e8df88 c011c486 c3671bc0
00000000 0000000f 00000000 00000001 00000000 c0412084 c3e8dfc4 c012a6a5
Call Trace:
[<c011c486>] rebalance_tick+0x66/0xa0
[<c012a6a5>] update_process_times+0x45/0x60
[<c0117961>] smp_apic_timer_interrupt+0x141/0x150
[<c010a0ca>] apic_timer_interrupt+0x1a/0x20

Code: 89 19 89 4b 04 8b 47 18 0f ab 42 04 ff 02 89 57 28 8b 5d 08
----------
second 4-cpu run, same test
--------
nable to handle kernel paging request at virtual address 20190078
printing eip:
c011c9c4
*pde = 00000000
Oops: 0002 [#1]
CPU: 1
EIP: 0060:[<c011c9c4>] Not tainted
EFLAGS: 00010002
EIP is at schedule+0x224/0x5d0
eax: 00000001 ebx: cacfd940 ecx: 00000001 edx: 20190000
esi: de984460 edi: f73a8e00 ebp: caed3c28 esp: caed3be8
ds: 007b es: 007b ss: 0068
Process awk (pid: 9673, threadinfo=caed2000 task=cacfd940)
Stack: de984460 840e6b72 00000003 d1bf3120 c3661bc0 c3661020 00000001 bfffa000
c11c03e0 00000007 f73a8e00 c3661590 003fe5d0 caed2000 0324ef80 f7983dc0
caed3c34 c011cda6 e8912660 00000001 c0150ed4 c3661034 00000000 00000000
Call Trace:
[<c011cda6>] preempt_schedule+0x36/0x50
[<c0150ed4>] exit_mmap+0x1e4/0x240
[<c011f6a0>] mmput+0x70/0xc0
[<c0169bad>] exec_mmap+0x11d/0x210
[<c0169e03>] flush_old_exec+0x163/0xa10
[<c0169a80>] kernel_read+0x50/0x60
[<c018c2cc>] load_elf_binary+0x2dc/0xbc0
[<c011a148>] pgd_alloc+0x18/0x20
[<c011f59d>] mm_init+0xbd/0x100
[<c0169533>] copy_strings+0x213/0x290
[<c018bff0>] load_elf_binary+0x0/0xbc0
[<c016aa41>] search_binary_handler+0xa1/0x210
[<c016adc4>] do_execve+0x214/0x250
[<c0107c80>] sys_execve+0x50/0x80
[<c0109689>] sysenter_past_esp+0x52/0x71

Code: f0 0f ab 42 78 8b 42 10 05 00 00 00 40 0f 22 d8 8b 8a 9c 00

_______________________________________________

cliffw
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2003-09-11 02:53:05

by Andrew Theurer

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Robert Love <[email protected]> wrote:
>
>> There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
>> ones are an improvement, a detriment, and a noop?

> We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
> we now that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.

> What we don't know is whether the thing which
> sched-CAN_MIGRATE_TASK-fix.patch
> fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.

Sorry for jumping into this late. I didn't even know the can_migrate patch
was being discussed, let alone in -mm :). And to be fair, this really is
Ingo's aggressive idle steal patch.

Anyway, these patches are somewhat related. It would seem that A3's
shortening the tasks' run time would not only slow performance beacuse of
cache thrash, but could possibly break CAN_MIGRATE's cache warmth check,
right? That in turn would stop load balancing from working well, leading to
more idle time, which the CAN_MIGRATE patch sort of bypassed for idle cpus.

I see Nick's balance patch as somewhat harmless, at least combined with A3
patch. However, one concern is that the "ping-pong" steal interval is not
really 200ms, but 200ms/(nr_cpus-1), which without A3, could show up as a
problem, especially on an 8 way box. In addition, I do think there's a
problem with num tasks we steal. It should not be imbalance/2, it should be:
max_load - (node_nr_running / num_cpus_node). If we steal any more than
this, which is quite possible with imbalance/2, then it's likely this_cpu now
has too many tasks, and some other cpu will steal again. Using *imbalance/2
works fine on 2-way smp, but I'm pretty sure we "over steal" tasks on 4 way
and up. Anyway, I'm getting off topic here...

But Steve's latest results have me toally stumped. Why would a patch which
shortens run time and probbaly thrashes cache improve a cpu bound workload
like JBB? And why would a patch that makes sure idle cpus don't stay idle
reduce performance by so much?

Steve, are you absolutely sure your latest results on test5 are correct? Any
possibility the original results were the "good" ones?

FWIW, I have seen the CAN_MIGRATE patch make a huge difference, not just in
testing, but a -real- enterprise application used in "production". And
unlike JBB and Volano, there's no high rate of sched_yield either. They do
have a high rate of cswitches, but only because their workload message
driven. This patch made a 40% improvement on 4-way on a 2.4 distro kernel
that has O(1).

-Andrew Theurer

2003-09-11 11:05:02

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Theurer wrote:

>Robert Love <[email protected]> wrote:
>
>>> There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
>>> ones are an improvement, a detriment, and a noop?
>>>
>
>>We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
>>we now that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.
>>
>
>>What we don't know is whether the thing which
>>sched-CAN_MIGRATE_TASK-fix.patch
>>fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.
>>
>
>Sorry for jumping into this late. I didn't even know the can_migrate patch
>was being discussed, let alone in -mm :). And to be fair, this really is
>Ingo's aggressive idle steal patch.
>
>Anyway, these patches are somewhat related. It would seem that A3's
>shortening the tasks' run time would not only slow performance beacuse of
>cache thrash, but could possibly break CAN_MIGRATE's cache warmth check,
>right? That in turn would stop load balancing from working well, leading to
>more idle time, which the CAN_MIGRATE patch sort of bypassed for idle cpus.
>

Yeah thats probably right. Good thinking.

>
>I see Nick's balance patch as somewhat harmless, at least combined with A3
>patch. However, one concern is that the "ping-pong" steal interval is not
>really 200ms, but 200ms/(nr_cpus-1), which without A3, could show up as a
>problem, especially on an 8 way box. In addition, I do think there's a
>problem with num tasks we steal. It should not be imbalance/2, it should be:
>max_load - (node_nr_running / num_cpus_node). If we steal any more than
>this, which is quite possible with imbalance/2, then it's likely this_cpu now
>has too many tasks, and some other cpu will steal again. Using *imbalance/2
>works fine on 2-way smp, but I'm pretty sure we "over steal" tasks on 4 way
>and up. Anyway, I'm getting off topic here...
>

IIRC max_load is supposed to be the number of tasks on the runqueue
being stolen from, isn't it?

2003-09-11 13:02:54

by Andrew Theurer

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

On Thursday 11 September 2003 06:04, Nick Piggin wrote:
> Andrew Theurer wrote:
> >Robert Love <[email protected]> wrote:
> >>> There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
> >>> ones are an improvement, a detriment, and a noop?
> >>
> >>We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
> >>we now that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.
> >>
> >>
> >>What we don't know is whether the thing which
> >>sched-CAN_MIGRATE_TASK-fix.patch
> >>fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.
> >
> >Sorry for jumping into this late. I didn't even know the can_migrate
> > patch was being discussed, let alone in -mm :). And to be fair, this
> > really is Ingo's aggressive idle steal patch.
> >
> >Anyway, these patches are somewhat related. It would seem that A3's
> >shortening the tasks' run time would not only slow performance beacuse of
> >cache thrash, but could possibly break CAN_MIGRATE's cache warmth check,
> >right? That in turn would stop load balancing from working well, leading
> > to more idle time, which the CAN_MIGRATE patch sort of bypassed for idle
> > cpus.
>
> Yeah thats probably right. Good thinking.
>
> >I see Nick's balance patch as somewhat harmless, at least combined with A3
> >patch. However, one concern is that the "ping-pong" steal interval is not
> >really 200ms, but 200ms/(nr_cpus-1), which without A3, could show up as a
> >problem, especially on an 8 way box. In addition, I do think there's a
> >problem with num tasks we steal. It should not be imbalance/2, it should
> > be: max_load - (node_nr_running / num_cpus_node). If we steal any more
> > than this, which is quite possible with imbalance/2, then it's likely
> > this_cpu now has too many tasks, and some other cpu will steal again.
> > Using *imbalance/2 works fine on 2-way smp, but I'm pretty sure we "over
> > steal" tasks on 4 way and up. Anyway, I'm getting off topic here...
>
> IIRC max_load is supposed to be the number of tasks on the runqueue
> being stolen from, isn't it?

Yes, but I think I still got this wrong. Ideally, once we finish stealing,
the busiest runqueue should not have more than node_nr_runing/nr_cpus_node,
but more importantly, this_cpu should not have more than
node_nr_running/nr_cpus_node, so maybe it should be:

min(a,b) where
a = max_load - load_average How much we are over the load_average
b = load_average - this_load How much we are under the load_average
load_average = node_nr_runing / nr_cpus_node.
node_nr_running can be summed as we look for the busiest queue, so it should
not be too costly.
if min(a,b) is neagtive (this_cpu's runqueue length was greater than
load_average) we don't steal at all.

2003-09-11 13:53:35

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

Andrew Theurer wrote:

>On Thursday 11 September 2003 06:04, Nick Piggin wrote:
>
>>Andrew Theurer wrote:
>>
>>>Robert Love <[email protected]> wrote:
>>>
>>>>>There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
>>>>>ones are an improvement, a detriment, and a noop?
>>>>>
>>>>We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
>>>>we now that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.
>>>>
>>>>
>>>>What we don't know is whether the thing which
>>>>sched-CAN_MIGRATE_TASK-fix.patch
>>>>fixed was the thing which sched-2.6.0-test2-mm2-A3.patch broke.
>>>>
>>>Sorry for jumping into this late. I didn't even know the can_migrate
>>>patch was being discussed, let alone in -mm :). And to be fair, this
>>>really is Ingo's aggressive idle steal patch.
>>>
>>>Anyway, these patches are somewhat related. It would seem that A3's
>>>shortening the tasks' run time would not only slow performance beacuse of
>>>cache thrash, but could possibly break CAN_MIGRATE's cache warmth check,
>>>right? That in turn would stop load balancing from working well, leading
>>>to more idle time, which the CAN_MIGRATE patch sort of bypassed for idle
>>>cpus.
>>>
>>Yeah thats probably right. Good thinking.
>>
>>
>>>I see Nick's balance patch as somewhat harmless, at least combined with A3
>>>patch. However, one concern is that the "ping-pong" steal interval is not
>>>really 200ms, but 200ms/(nr_cpus-1), which without A3, could show up as a
>>>problem, especially on an 8 way box. In addition, I do think there's a
>>>problem with num tasks we steal. It should not be imbalance/2, it should
>>>be: max_load - (node_nr_running / num_cpus_node). If we steal any more
>>>than this, which is quite possible with imbalance/2, then it's likely
>>>this_cpu now has too many tasks, and some other cpu will steal again.
>>>Using *imbalance/2 works fine on 2-way smp, but I'm pretty sure we "over
>>>steal" tasks on 4 way and up. Anyway, I'm getting off topic here...
>>>
>>IIRC max_load is supposed to be the number of tasks on the runqueue
>>being stolen from, isn't it?
>>
>
>Yes, but I think I still got this wrong. Ideally, once we finish stealing,
>the busiest runqueue should not have more than node_nr_runing/nr_cpus_node,
>but more importantly, this_cpu should not have more than
>node_nr_running/nr_cpus_node, so maybe it should be:
>
>min(a,b) where
>a = max_load - load_average How much we are over the load_average
>b = load_average - this_load How much we are under the load_average
>load_average = node_nr_runing / nr_cpus_node.
>node_nr_running can be summed as we look for the busiest queue, so it should
>not be too costly.
>if min(a,b) is neagtive (this_cpu's runqueue length was greater than
>load_average) we don't steal at all.
>

Oh OK you're thinking about balancing across the entire NUMA. I was just
thinking it will eventually settle down, but you're right: its probably
better to overdampen the balancing than to underdampen it.

2003-09-11 14:41:14

by Andrew Theurer

[permalink] [raw]

Subject: Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

> >>>I see Nick's balance patch as somewhat harmless, at least combined with
> >>> A3 patch. However, one concern is that the "ping-pong" steal interval
> >>> is not really 200ms, but 200ms/(nr_cpus-1), which without A3, could
> >>> show up as a problem, especially on an 8 way box. In addition, I do
> >>> think there's a problem with num tasks we steal. It should not be
> >>> imbalance/2, it should be: max_load - (node_nr_running /
> >>> num_cpus_node). If we steal any more than this, which is quite
> >>> possible with imbalance/2, then it's likely this_cpu now has too many
> >>> tasks, and some other cpu will steal again. Using *imbalance/2 works
> >>> fine on 2-way smp, but I'm pretty sure we "over steal" tasks on 4 way
> >>> and up. Anyway, I'm getting off topic here...
> >>
> >>IIRC max_load is supposed to be the number of tasks on the runqueue
> >>being stolen from, isn't it?
> >
> >Yes, but I think I still got this wrong. Ideally, once we finish
> > stealing, the busiest runqueue should not have more than
> > node_nr_runing/nr_cpus_node, but more importantly, this_cpu should not
> > have more than
> >node_nr_running/nr_cpus_node, so maybe it should be:
> >
> >min(a,b) where
> >a = max_load - load_average How much we are over the load_average
> >b = load_average - this_load How much we are under the load_average
> >load_average = node_nr_runing / nr_cpus_node.
> >node_nr_running can be summed as we look for the busiest queue, so it
> > should not be too costly.
> >if min(a,b) is neagtive (this_cpu's runqueue length was greater than
> >load_average) we don't steal at all.
>
> Oh OK you're thinking about balancing across the entire NUMA. I was just
> thinking it will eventually settle down, but you're right: its probably
> better to overdampen the balancing than to underdampen it.

Actually this is really geared towards within the node. The goal for each cpu
in the node (or just a non NUMA system) should be to steal just enough to
have rq->nr_running to be nr_running()/nr_cpus. I'm still not sure how many
tasks we should really steal from internode balance.

2003-09-11 22:58:35