2003-08-23 05:49:03

by Con Kolivas

Subject: [PATCH]O18.1int

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Some high credit tasks were being missed due to their prolonged cpu burn at
startup flagging them as low credit tasks.

Low credit tasks can now recover to become high credit.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/RwHDZUg7+tp6mRURAie7AJ43egdTeSapoX1D0aJQcEksBTkKdwCfcyHZ
cD1TMt7oFNXvmSrqnJe7Z+E=
=41zp
-----END PGP SIGNATURE-----


Attachments:
(No filename) (418.00 B)
clearsigned data
patch-O18-O18.1int (935.00 B)

2003-08-23 09:09:42

by Thomas Schlichter

Subject: Re: [PATCH]O18.1int

On Saturday 23 August 2003 07:55, Con Kolivas wrote:
> Some high credit tasks were being missed due to their prolonged cpu burn at
> startup flagging them as low credit tasks.
>
> Low credit tasks can now recover to become high credit.
>
> Con

Hi Con!

First of all... Your interactive scheduler work is GREAT! I really like it...!

Now I tried to understand what exactly the latest patch does, and as far as I
can see the first and third hunks just delete and expand, respectively, the
macro VARYING_CREDIT(p). But the second hunk helps processes gain some
interactive_credit until they become a HIGH_CREDIT task. This looks
reasonable to me...

So, now I wanted to know how a task may lose its interactive_credit again...
The only code I saw doing this is exactly the third hunk of your patch. But
if a process is a HIGH_CREDIT task it can never lose its interactive_credit
again. Is that intended?

I think the third hunk should look like the following:
@@ -1548,7 +1545,7 @@ switch_tasks:
 	prev->sleep_avg -= run_time;
 	if ((long)prev->sleep_avg <= 0){
 		prev->sleep_avg = 0;
-		prev->interactive_credit -= VARYING_CREDIT(prev);
+		prev->interactive_credit -= !(LOW_CREDIT(prev));
 	}
 	prev->timestamp = now;

As an additional idea I think interactive_credit should be allowed to be a bit
bigger than MAX_SLEEP_AVG and a bit lower than -MAX_SLEEP_AVG. This would
make LOW_CREDIT processes stay LOW_CREDIT even if they sleep a little and
HIGH_CREDIT processes stay HIGH_CREDIT even if they do some computing...
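
Purely to illustrate that idea, a hypothetical sketch (CREDIT_MARGIN is an
invented name here; nothing like it exists in the O18 patches):

/*
 * Hypothetical: let interactive_credit drift a margin past
 * +/- MAX_SLEEP_AVG so an established HIGH/LOW_CREDIT task keeps
 * its flag through a short burst of atypical behaviour.
 */
#define CREDIT_MARGIN	(MAX_SLEEP_AVG / 4)

if (p->interactive_credit > MAX_SLEEP_AVG + CREDIT_MARGIN)
	p->interactive_credit = MAX_SLEEP_AVG + CREDIT_MARGIN;
else if (p->interactive_credit < -(MAX_SLEEP_AVG + CREDIT_MARGIN))
	p->interactive_credit = -(MAX_SLEEP_AVG + CREDIT_MARGIN);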

But of course I may completely miss something...

Thomas

2003-08-23 09:19:27

by Nick Piggin

Subject: Re: [PATCH]O18.1int



Thomas Schlichter wrote:

>On Saturday 23 August 2003 07:55, Con Kolivas wrote:
>
>>Some high credit tasks were being missed due to their prolonged cpu burn at
>>startup flagging them as low credit tasks.
>>
>>Low credit tasks can now recover to become high credit.
>>
>>Con
>>
>
>Hi Con!
>
>First of all... Your interactive scheduler work is GREAT! I really like it...!
>
>Now I tried to understand what exactly the latest patch does, and as far as I
>can see the first and third hunks just delete and expand, respectively, the
>macro VARYING_CREDIT(p). But the second hunk helps processes gain some
>interactive_credit until they become a HIGH_CREDIT task. This looks
>reasonable to me...
>
>So, now I wanted to know how a task may lose its interactive_credit again...
>The only code I saw doing this is exactly the third hunk of your patch. But
>if a process is a HIGH_CREDIT task it can never lose its interactive_credit
>again. Is that intended?
>
>I think the third hunk should look like the following:
>@@ -1548,7 +1545,7 @@ switch_tasks:
> 	prev->sleep_avg -= run_time;
> 	if ((long)prev->sleep_avg <= 0){
> 		prev->sleep_avg = 0;
>-		prev->interactive_credit -= VARYING_CREDIT(prev);
>+		prev->interactive_credit -= !(LOW_CREDIT(prev));
> 	}
> 	prev->timestamp = now;
>
>As an additional idea I think interactive_credit should be allowed to be a bit
>bigger than MAX_SLEEP_AVG and a bit lower than -MAX_SLEEP_AVG. This would
>make LOW_CREDIT processes stay LOW_CREDIT even if they sleep a little and
>HIGH_CREDIT processes stay HIGH_CREDIT even if they do some computing...
>
>But of course I may completely miss something...
>
>

Hi
I don't know what is preferred on lkml, but I dislike mixing booleans
and integer arithmetic.

if (!LOW_CREDIT(prev))
	prev->interactive_credit--;

Easier to read IMO.


2003-08-23 09:30:30

by Andrew Morton

Subject: Re: [PATCH]O18.1int


We have a problem. See this analysis from Steve Pratt.


Steven Pratt <[email protected]> wrote:
>
> Mark Peloquin wrote:
>
> > Been a while since results were posted, therefore this is a little long.
> >
> >
> > Nightly Regression Summary for 2.6.0-test3 vs 2.6.0-test3-mm3
> >
> > Benchmark    Pass/Fail  Improvements  Regressions  Results      Results          Summary
> > -----------  ---------  ------------  -----------  -----------  ---------------  -------
> > dbench.ext2  P          N             N            2.6.0-test3  2.6.0-test3-mm3  report
> > dbench.ext3  P          N             Y            2.6.0-test3  2.6.0-test3-mm3  report
>
> > The ext3 dbench regression is very significant for multithreaded: 193 ->
> 118. Looks like this regression first showed up in mm1 and does not
> exist in any of the bk trees.
>
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/dbench.ext3.throughput.plot.16.png
>
> >
> > volanomark   P          N             Y            2.6.0-test3  2.6.0-test3-mm3  report
>
> Volanomark is significant as well. 10% drop in mm tree. This one also
> appeared to show up in mm1 although it was a 14% drop then so mm3
> actually looks a little better. There were build errors on mm2 run so I
> don't have that data at this time.
> Following link illustrates the drop in mm tree for volanomark.
>
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/volanomark.throughput.plot.1.png
>
>
> > SpecJBB2000 for high warehouses also took a big hit. Probably the same
> root cause as volanomark.
> Here is the history plot for the 19 warehouse run.
>
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.results.avg.plot.19.png
>
> Huge spike in idle time.
> http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.utilization.idle.avg.plot.19.png
>
> >
> > http://ltcperf.ncsa.uiuc.edu/data/2.6.0-test3-mm3/2.6.0-test3-vs-2.6.0-test3-mm3/
> >
>

Those graphs are woeful.

Steve has done some preliminary testing which indicates that the volanomark
and specjbb regressions are due to the CPU scheduler changes.

I have verified that the ext3 regression is mostly due to setting
PF_SYNCWRITE on kjournald. I/O scheduler stuff. I don't know why, but
that patch obviously bites the dust. There is still a 10-15% regression on
dbench 16 on my 4x Xeon which is due to the CPU scheduler patches.

It's good that the reaim regression mostly went away, but it would be nice
to know why. When I was looking into the reaim problem it appeared that
setting TIMESLICE_GRANULARITY to MAX_TIMESLICE made no difference, but more
careful testing is needed on this.

There really is no point in proceeding with this fine tuning activity when
we have these large and not understood regressions floating about.

I suggest that what we need to do is to await some more complete testing of
the CPU scheduler patch alone from Steve and co. If it is fully confirmed
that the CPU scheduler changes are the culprit we need to either fix it or
go back to square one and start again with more careful testing and a less
ambitious set of changes.

It could be that we're looking at some sort of tradeoff here, and we're
already too far over to one side. I don't know.

It might help if you or a buddy could get set up with volanomark on an OSDL
4-or-8-way so that you can more closely track the effect of your changes on
such benchmarks.

2003-08-23 09:50:31

by Nick Piggin

Subject: Re: [PATCH]O18.1int



Andrew Morton wrote:

>We have a problem. See this analysis from Steve Pratt.
>
>
>Steven Pratt <[email protected]> wrote:
>
>>Mark Peloquin wrote:
>>
>>
>>>Been a while since results were posted, therefore this is a little long.
>>>
>>>
>>>Nightly Regression Summary for 2.6.0-test3 vs 2.6.0-test3-mm3
>>>
>>>Benchmark    Pass/Fail  Improvements  Regressions  Results      Results          Summary
>>>-----------  ---------  ------------  -----------  -----------  ---------------  -------
>>>dbench.ext2  P          N             N            2.6.0-test3  2.6.0-test3-mm3  report
>>>dbench.ext3  P          N             Y            2.6.0-test3  2.6.0-test3-mm3  report
>>>
>>The ext3 dbench regression is very significant for multithreaded: 193 ->
>>118. Looks like this regression first showed up in mm1 and does not
>>exist in any of the bk trees.
>>
>>http://ltcperf.ncsa.uiuc.edu/data/history-graphs/dbench.ext3.throughput.plot.16.png
>>
>>
>>>volanomark   P          N             Y            2.6.0-test3  2.6.0-test3-mm3  report
>>>
>>Volanomark is significant as well. 10% drop in mm tree. This one also
>>appeared to show up in mm1 although it was a 14% drop then so mm3
>>actually looks a little better. There were build errors on mm2 run so I
>>don't have that data at this time.
>>Following link illustrates the drop in mm tree for volanomark.
>>
>>http://ltcperf.ncsa.uiuc.edu/data/history-graphs/volanomark.throughput.plot.1.png
>>
>>
>>SpecJBB2000 for high warehouses also took a big hit. Probably the same
>>root cause as volanomark.
>>Here is the history plot for the 19 warehouse run.
>>
>>http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.results.avg.plot.19.png
>>
>>Huge spike in idle time.
>>http://ltcperf.ncsa.uiuc.edu/data/history-graphs/specjbb.utilization.idle.avg.plot.19.png
>>
>>
>>>http://ltcperf.ncsa.uiuc.edu/data/2.6.0-test3-mm3/2.6.0-test3-vs-2.6.0-test3-mm3/
>>>
>>>
>
>Those graphs are woeful.
>

Aren't they.

>
>Steve has done some preliminary testing which indicates that the volanomark
>and specjbb regressions are due to the CPU scheduler changes.
>
>I have verified that the ext3 regression is mostly due to setting
>PF_SYNCWRITE on kjournald. I/O scheduler stuff. I don't know why, but
>that patch obviously bites the dust. There is still a 10-15% regression on
>dbench 16 on my 4x Xeon which is due to the CPU scheduler patches.
>

That's fine. I never measured any improvement with it. It's sad that
it didn't go as I hoped, but that probably tells you I don't
know enough about how journalling works.

>
>It's good that the reaim regression mostly went away, but it would be nice
>to know why. When I was looking into the reaim problem it appeared that
>setting TIMESLICE_GRANULARITY to MAX_TIMESLICE made no difference, but more
>careful testing is needed on this.
>
>There really is no point in proceeding with this fine tuning activity when
>we have these large and not understood regressions floating about.
>

I think changes in the CPU scheduler cause butterflies to flap their
wings or what have you. Good luck pinning it down.

>
>I suggest that what we need to do is to await some more complete testing of
>the CPU scheduler patch alone from Steve and co. If it is fully confirmed
>that the CPU scheduler changes are the culprit we need to either fix it or
>go back to square one and start again with more careful testing and a less
>ambitious set of changes.
>
>It could be that we're looking at some sort of tradeoff here, and we're
>already too far over to one side. I don't know.
>
>It might help if you or a buddy could get set up with volanomark on an OSDL
>4-or-8-way so that you can more closely track the effect of your changes on
>such benchmarks.
>

I think you'd be wasting your time until the interactivity side of
things is working better. Unless Con has a smaller set of undisputed
improvements to test with.


2003-08-23 12:15:00

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Sat, 23 Aug 2003 19:08, Thomas Schlichter wrote:
> On Saturday 23 August 2003 07:55, Con Kolivas wrote:
> > Some high credit tasks were being missed due to their prolonged cpu burn
> > at startup flagging them as low credit tasks.
> >
> > Low credit tasks can now recover to become high credit.

> First of all... Your interactive scheduler work is GREAT! I really like
> it...!

Thank you. I never assume that no news is good news, so I appreciate that.

> Now I tried to understand what exactly the latest patch does, and as far as
> I can see the first and third hunks just delete and expand, respectively,
> the macro VARYING_CREDIT(p). But the second hunk helps processes gain some
> interactive_credit until they become a HIGH_CREDIT task. This looks
> reasonable to me...
>
> So, now I wanted to know how a task may lose its interactive_credit
> again... The only code I saw doing this is exactly the third hunk of your
> patch. But if a process is a HIGH_CREDIT task it can never lose its
> interactive_credit again. Is that intended?

Yes indeed it is.

> I think the third hunk should look like the following:
[snip]

> As an additional idea I think interactive_credit should be allowed to be a
> bit bigger than MAX_SLEEP_AVG and a bit lower than -MAX_SLEEP_AVG. This
> would make LOW_CREDIT processes stay LOW_CREDIT even if they sleep a little
> and HIGH_CREDIT processes stay HIGH_CREDIT even if they do some
> computing...
>
> But of course I may completely miss something...

Originally, flagged HIGH_CREDIT and LOW_CREDIT tasks had no limit on how much
credit they could accumulate or lose. This worked in practice, but I needed
some cutoff beyond which they would be one or the other, and MSA seemed a
decent number. If it were boundless there would eventually be variable
overflow as well, and checking for that was unnecessary overhead. I don't want
tasks flagged as high credit to be forgotten, though, so that's why I don't
let them change. Originally the same idea applied to low credit tasks, but
that proved to occasionally flag X as low credit during startup. Higher
numbers, perhaps, but what is perfect really depends on the hardware.
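
For readers without the patch at hand, the cutoff tests read roughly as
below (reconstructed from memory of the O18-era sched.c, so treat the exact
definitions as an assumption):

/*
 * Rough sketch: a task is flagged high or low credit once
 * interactive_credit crosses +/- MAX_SLEEP_AVG; the real macros
 * may differ slightly.
 */
#define HIGH_CREDIT(p)	((p)->interactive_credit > MAX_SLEEP_AVG)
#define LOW_CREDIT(p)	((p)->interactive_credit < -MAX_SLEEP_AVG)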

Cheers,
Con

P.S. All this may be moot as it looks like I, or someone else, may have to
start again.

2003-08-23 12:18:27

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Sat, 23 Aug 2003 19:18, Nick Piggin wrote:
> Hi
> I don't know what is preferred on lkml, but I dislike mixing booleans
> and integer arithmetic.
>
> if (!LOW_CREDIT(prev))
> 	prev->interactive_credit--;
>
> Easier to read IMO.

I agree. I only mixed them because of my (perhaps false) belief that it's less
of a hit not to have another if branch point. Then again, today's compilers
probably optimise it away anyway.
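
For illustration, the two forms being compared (a sketch only; with
optimisation enabled a compiler of this era can emit a branch-free setcc
sequence for either):

prev->interactive_credit -= !LOW_CREDIT(prev);	/* boolean arithmetic */

if (!LOW_CREDIT(prev))				/* explicit branch */
	prev->interactive_credit--;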

Con

2003-08-23 13:22:52

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Sat, 23 Aug 2003 19:32, Andrew Morton wrote:
> It might help if you or a buddy could get set up with volanomark on an OSDL
> 4-or-8-way so that you can more closely track the effect of your changes on
> such benchmarks.

Underway.

Con

2003-08-23 17:04:29

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Sat, 23 Aug 2003 19:49, Nick Piggin wrote:
> Andrew Morton wrote:
> >We have a problem. See this analysis from Steve Pratt.

> >Steve has done some preliminary testing which indicates that the
> > volanomark and specjbb regressions are due to the CPU scheduler changes.

> >It might help if you or a buddy could get set up with volanomark on an
> > OSDL 4-or-8-way so that you can more closely track the effect of your
> > changes on such benchmarks.

Ok here goes.
This is on 8way:

Test4:
Average throughput = 11145 messages per second

Test4-O18.1:
Average throughput = 9860 messages per second

Test3-mm3:
Average throughput = 9788 messages per second


So I grabbed test3-mm3 and started peeling back the patches
and found no change in throughput without _any_ of my Oxint patches applied,
and just Ingo's A3 patch:

Test3-mm3-A3
Average throughput = 9889 messages per second


Then finally I removed that patch so there were no interactivity patches:
Test3-mm3-ni
Average throughput = 11052 messages per second


I performed each run 3 times and have the results and profiles posted here:
http://kernel.kolivas.org/2.5/volano

wli suggested inlining sched_clock from A3 to see if that helped, but at 3am I
think it can wait. At least I've been able to track down the drop. Thanks,
zwane, for the iron access.

Con

2003-08-23 21:47:01

by Andrew Morton

Subject: Re: [PATCH]O18.1int

Con Kolivas <[email protected]> wrote:
>
> > >It might help if you or a buddy could get set up with volanomark on an
> > > OSDL 4-or-8-way so that you can more closely track the effect of your
> > > changes on such benchmarks.
>
> Ok here goes.
> This is on 8way:
>
> Test4:
> Average throughput = 11145 messages per second
>
> Test4-O18.1:
> Average throughput = 9860 messages per second
>
> Test3-mm3:
> Average throughput = 9788 messages per second
>
>
> So I grabbed test3-mm3 and started peeling back the patches
> and found no change in throughput without _any_ of my Oxint patches applied,
> and just Ingo's A3 patch:
>
> Test3-mm3-A3
> Average throughput = 9889 messages per second
>
>
> Then finally I removed that patch so there were no interactivity patches:
> Test3-mm3-ni
> Average throughput = 11052 messages per second

Well that was quick, thanks.

Surely the only reason we see more idle time in this sort of workload is
because of runqueue imbalance: some CPUs are idle while other CPUs have
more than one runnable process. That sounds like a bug more than a
tuning/balancing thing: having no runnable tasks is a sort of binary
do-something-right-now case.

We should be going across and pulling a task off another CPU synchronously
as soon as a runqueue is seen to be empty. The code tries to do that, so
hrm.
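
(The empty-runqueue path in question looks roughly like the sketch below;
paraphrased from memory of test3's kernel/sched.c, not a verbatim quote:)

/*
 * Sketch of schedule()'s empty-runqueue path: attempt a synchronous
 * pull from another CPU before falling back to the idle task.
 */
if (unlikely(!rq->nr_running)) {
#ifdef CONFIG_SMP
	load_balance(rq, 1, cpu_to_node_mask(smp_processor_id()));
#endif
	if (!rq->nr_running) {
		next = rq->idle;
		goto switch_tasks;
	}
}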

Ingo just sent the below patch which is related, but doesn't look like it
will fix it. I'll include this in test4-mm1, RSN.



From: Ingo Molnar <[email protected]>

the attached patch fixes the SMP balancing problem reported by David
Mosberger. (the 'yield-ing threads do not get spread out properly' bug).

it turns out that we never really spread out tasks in the busy-rebalance
case - contrary to my intention. The most likely incarnation of this
balancing bug is via yield() - but in theory pipe users can be affected
too.

the patch balances more aggressively with a slow frequency, on SMP (or
within the same node on NUMA - not between nodes on NUMA).



 kernel/sched.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

diff -puN kernel/sched.c~sched-balance-fix-2.6.0-test3-mm3-A0 kernel/sched.c
--- 25/kernel/sched.c~sched-balance-fix-2.6.0-test3-mm3-A0	2003-08-23 13:57:06.000000000 -0700
+++ 25-akpm/kernel/sched.c	2003-08-23 13:57:06.000000000 -0700
@@ -1144,7 +1144,6 @@ static void rebalance_tick(runqueue_t *t
 			load_balance(this_rq, idle, cpu_to_node_mask(this_cpu));
 			spin_unlock(&this_rq->lock);
 		}
-		return;
 	}
 #ifdef CONFIG_NUMA
 	if (!(j % BUSY_NODE_REBALANCE_TICK))
@@ -1152,7 +1151,7 @@ static void rebalance_tick(runqueue_t *t
 #endif
 	if (!(j % BUSY_REBALANCE_TICK)) {
 		spin_lock(&this_rq->lock);
-		load_balance(this_rq, idle, cpu_to_node_mask(this_cpu));
+		load_balance(this_rq, 0, cpu_to_node_mask(this_cpu));
 		spin_unlock(&this_rq->lock);
 	}
 }

_

2003-08-23 22:04:54

by Voluspa

Subject: Re: [PATCH]O18.1int


On 2003-08-23 12:21:14 Con Kolivas wrote:

> On Sat, 23 Aug 2003 19:08, Thomas Schlichter wrote:
>> On Saturday 23 August 2003 07:55, Con Kolivas wrote:
[...]
>> First of all... Your interactive scheduler work is GREAT! I really
>> like it...!
[...]
> P.S. All this may be moot as it looks like I, or someone else, may
> have to start again.

If you do, please remember this report. Quick summary: All problem areas
I've seen are completely resolved - or at least to a major extent.
Blender, wine and "cp -a".

Andrew Morton's red flag was a good incentive to run my tests on a
pure 2.6.0-test4 and write down the outcome. Adding the O18.1int patch
after this really stands out.

I won't repeat all my hardware or software facts, but thought I'd add a
few words on tweaks that might have influence: File system is ext2 with
all partitions mounted "noatime". Disk readahead has been upped by
"hdparm -a 512". VM swappiness reduced to 50 (I've got 128 meg mem).
/dev/rtc is changed to 666, read/write for everybody, and
/proc/sys/dev/rtc/max-user-freq is at 1024.

***
_The game "Baldurs Gate I" under winex 3.1_

2.6.0-test4:
Loads but is seriously starved. Sound repeats and graphic freezes occur
with a wave-like frequency. Mouse pointer can hardly be controlled.
Playability 0 of 10.

2.6.0-test4-O18.1int:
Not starved for most of the time, and I can not cause "priority
inversion" on demand with my usual tricks. There _are_ random freezes
but the game recovers within 2 seconds. Playability 8.
***

***
_Blender 2.23 (standalone, and xmms 1.2.7 on a directory of mp3s)_

2.6.0-test4:
Doing a slow "world plane" rotate around the axes is in perfect sync. No
freezes (and no music skips).

2.6.0-test4-O18.1int:
The same as vanilla -test4.
***

***
_Blender 2.28_

2.6.0-test4:
The slow "world plane" rotate causes a 2 to 5 second freeze, quickly
repeating itself if I continue the rotate. The mouse pointer can become
invisible, freeze, or jerk right across the screen while losing its grip
on the world plane grid.

2.6.0-test4-O18.1int:
Perfect sync! How did you achieve this, Con? No matter how long I do the
slow "world plane" rotate I cannot get it to freeze. Same with a quick
rotate.
***

***
_Blender 2.28 plus xmms 1.2.7_

2.6.0-test4:
Short blackout, ca 3 seconds, at the beginning of a new song if I starve
xmms with the slow "world plane" rotate. Not during the song, only at
the beginning.

2.6.0-test4-O18.1int:
Music perfect during any rotation. Both beginning of songs and the rest.
***

***
_"cp -a" of /usr from hda to hdc and
using Opera 6.12 "software scrollwheel" on
svt.se (page has 70+ small graphics) or a
long slashdot.org comment page_

2.6.0-test4:
"cp" PRI is 15-16, Opera PRI is 15. Slight jerk every 5 seconds while
scrolling the page at the lowest speed.

2.6.0-test4-O18.1int:
"cp" PRI is 17-18, Opera PRI is 15. Perfect smoothness while scrolling.
***

Best regards,
Mats Johannesson

2003-08-24 02:40:56

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Sun, 24 Aug 2003 07:49, Andrew Morton wrote:
> Con Kolivas <[email protected]> wrote:
> > > >It might help if you or a buddy could get set up with volanomark on an
> > > >
> > > > OSDL 4-or-8-way so that you can more closely track the effect of
> > > > your changes on such benchmarks.
> >
> > Ok here goes.
> > This is on 8way:
> >
> > Test4:
> > Average throughput = 11145 messages per second
> >
> > Test4-O18.1:
> > Average throughput = 9860 messages per second
> >
> > Test3-mm3:
> > Average throughput = 9788 messages per second
> >
> >
> > So I grabbed test3-mm3 and started peeling back the patches
> > and found no change in throughput without _any_ of my Oxint patches
> > applied, and just Ingo's A3 patch:
> >
> > Test3-mm3-A3
> > Average throughput = 9889 messages per second
> >
> >
> > Then finally I removed that patch so there were no interactivity
> > patches: Test3-mm3-ni
> > Average throughput = 11052 messages per second
>
> Well that was quick, thanks.
>
> Surely the only reason we see more idle time in this sort of workload is
> because of runqueue imbalance: some CPUs are idle while other CPUs have
> more than one runnable process. That sounds like a bug more than a
> tuning/balancing thing: having no runnable tasks is a sort of binary
> do-something-right-now case.
>
> We should be going across and pulling a task off another CPU synchronously
> as soon as a runqueue is seen to be empty. The code tries to do that, so
> hrm.
>
> Ingo just sent the below patch which is related, but doesn't look like it
> will fix it. I'll include this in test4-mm1, RSN.

Just for the record I also tried inlining sched_clock and removing the rdtsc
call entirely (and just getting the value from jiffies) and it made no
measurable impact on performance.

Con

2003-08-24 03:57:53

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Sun, 24 Aug 2003 08:03, Voluspa wrote:
> On 2003-08-23 12:21:14 Con Kolivas wrote:
> > On Sat, 23 Aug 2003 19:08, Thomas Schlichter wrote:
> >> On Saturday 23 August 2003 07:55, Con Kolivas wrote:
>
> [...]
>
> >> First of all... Your interactive scheduler work is GREAT! I really
> >> like it...!
>
> [...]
>
> > P.S. All this may be moot as it looks like I, or someone else, may
> > have to start again.
>
> If you do, please remember this report. Quick summary: All problem areas
> I've seen are completely resolved - or at least to a major extent.
> Blender, wine and "cp -a".
>
> Andrew Morton's red flag was a good incentive to run my tests on a
> pure 2.6.0-test4 and write down the outcome. Adding the O18.1int patch
> after this really stands out.

Thanks for the extensive testing and report.

I didn't want to make a fuss about these O18 patches because I've been excited
by changes previously that weren't as good as I first thought.

My mistake after O15 was that my patches made priority inversion (always
there) much, much worse at times, and I went looking for a generic solution to
priority inversion, which has destroyed better coders than I. Instead I went
looking for why it was much worse on O15+ and found two algorithm bugs. Much
better to kill off bugs than hide them under the carpet.

Furthermore, it doesn't look like the drop in performance on big SMP is my
fault after all, which is good news.

Con

2003-08-25 09:24:07

by Måns Rullgård

Subject: Re: [PATCH]O18.1int


XEmacs still spins after running a background job like make or grep.
It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
as often, or for as long, as with O16.3, but it's there and it's
irritating.

--
Måns Rullgård
[email protected]

2003-08-25 09:42:49

by Alex Riesen

Subject: Re: [PATCH]O18.1int

Måns Rullgård, Mon, Aug 25, 2003 11:24:01 +0200:
> XEmacs still spins after running a background job like make or grep.
> It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> as often, or for as long, as with O16.3, but it's there and it's
> irritating.

another example is RXVT (an X terminal emulator). Starts spinning after
its child has exited. Not always, but annoyingly often. System is
almost locked while it spins (calling select).

2003-08-25 10:09:37

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Mon, 25 Aug 2003 19:42, Alex Riesen wrote:
> Måns Rullgård, Mon, Aug 25, 2003 11:24:01 +0200:
> > XEmacs still spins after running a background job like make or grep.
> > It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> > as often, or for as long, as with O16.3, but it's there and it's
> > irritating.
>
> another example is RXVT (an X terminal emulator). Starts spinning after
> its child has exited. Not always, but annoyingly often. System is
> almost locked while it spins (calling select).

What does the vanilla kernel do with these apps running? Both immediately after
the apps have started up and some time (>1 min) after they've been running?

Con

2003-08-25 10:17:22

by Måns Rullgård

Subject: Re: [PATCH]O18.1int

Alex Riesen <[email protected]> writes:

>> XEmacs still spins after running a background job like make or grep.
>> It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
>> as often, or for as long, as with O16.3, but it's there and it's
>> irritating.
>
> another example is RXVT (an X terminal emulator). Starts spinning after
> its child has exited. Not always, but annoyingly often. System is
> almost locked while it spins (calling select).

It sounds like the same bug. IMHO, it's rather bad, since a
non-privileged process can make the system unusable for a non-zero
amount of time.

How should I go about capturing some information about this thing? Do you
know what causes it, Con?

--
Måns Rullgård
[email protected]

2003-08-25 10:21:40

by Alex Riesen

[permalink] [raw]
Subject: Re: [PATCH]O18.1int

Con Kolivas, Mon, Aug 25, 2003 12:16:13 +0200:
> On Mon, 25 Aug 2003 19:42, Alex Riesen wrote:
> > Måns Rullgård, Mon, Aug 25, 2003 11:24:01 +0200:
> > > XEmacs still spins after running a background job like make or grep.
> > > It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> > > as often, or for as long, as with O16.3, but it's there and it's
> > > irritating.
> >
> > another example is RXVT (an X terminal emulator). Starts spinning after
> > its child has exited. Not always, but annoyingly often. System is
> > almost locked while it spins (calling select).
>
> What does the vanilla kernel do with these apps running? Both immediately after
> the apps have started up and some time (>1 min) after they've been running?
>

cannot test atm. Will do in 10 hours.
RXVT behaved sanely (or probably the spin effect is very rare) in 2.4 (with
O(1) alone and your 2.4 patches) and plain 2.6-test1.

2003-08-25 10:34:27

by Alex Riesen

Subject: Re: [PATCH]O18.1int

Måns Rullgård, Mon, Aug 25, 2003 12:17:16 +0200:
> Alex Riesen <[email protected]> writes:
>
> >> XEmacs still spins after running a background job like make or grep.
> >> It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> >> as often, or for as long, as with O16.3, but it's there and it's
> >> irritating.
> >
> > another example is RXVT (an X terminal emulator). Starts spinning after
> > its child has exited. Not always, but annoyingly often. System is
> > almost locked while it spins (calling select).
>
> It sounds like the same bug. IMHO, it's rather bad, since a
> non-privileged process can make the system unusable for a non-zero
> amount of time.

the source of RXVT looks more like the bug: it does not check for
errors, even though that is a bit tricky to do portably.
It is still a problem, though: "_almost_ locked" does not make it nice.

> How should I go about capturing some information about this thing?

Use "top" and look at the dynamic priority.


2003-08-25 10:35:03

by Måns Rullgård

Subject: Re: [PATCH]O18.1int

Con Kolivas <[email protected]> writes:

>> > XEmacs still spins after running a background job like make or grep.
>> > It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
>> > as often, or for as long, as with O16.3, but it's there and it's
>> > irritating.
>>
>> another example is RXVT (an X terminal emulator). Starts spinning after
>> its child has exited. Not always, but annoyingly often. System is
>> almost locked while it spins (calling select).
>
> What does the vanilla kernel do with these apps running? Both immediately after
> the apps have started up and some time (>1 min) after they've been running?

Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
introduced the problem. With that patch reversed, everything is
fine. What problem does that patch fix?

--
Måns Rullgård
[email protected]

2003-08-25 10:41:57

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Mon, 25 Aug 2003 20:17, Måns Rullgård wrote:
> Alex Riesen <[email protected]> writes:
> >> XEmacs still spins after running a background job like make or grep.
> >> It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> >> as often, or for as long, as with O16.3, but it's there and it's
> >> irritating.
> >
> > another example is RXVT (an X terminal emulator). Starts spinning after
> > its child has exited. Not always, but annoyingly often. System is
> > almost locked while it spins (calling select).
>
> It sounds like the same bug. IMHO, it's rather bad, since a
> non-privileged process can make the system unusable for a non-zero
> amount of time.
>
> How should I go about capturing some information about this thing? Do you
> know what causes it, Con?

Read my RFC on the orthogonal interactivity patches and look under priority
inversion. It may be that, in which case it happens on vanilla as well.
Capturing useful information for me is quite easy: run a reniced -11
"top -d 1 -b" and a reniced -11 "vmstat 1", both in batch mode, to capture it
happening. That should be enough information to see what's going on and
doesn't need a kernel compile or any special tools. Renice to -11 to make sure
the tools don't get preempted in that period.

Con

2003-08-25 10:43:38

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Mon, 25 Aug 2003 20:34, Måns Rullgård wrote:
> Con Kolivas <[email protected]> writes:
> >> > XEmacs still spins after running a background job like make or grep.
> >> > It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> >> > as often, or for as long, as with O16.3, but it's there and it's
> >> > irritating.
> >>
> >> another example is RXVT (an X terminal emulator). Starts spinning after
> >> its child has exited. Not always, but annoyingly often. System is
> >> almost locked while it spins (calling select).
> >
> > What does the vanilla kernel do with these apps running? Both immediately
> > after the apps have started up and some time (>1 min) after they've been
> > running?
>
> Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
> vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
> introduced the problem. With that patch reversed, everything is
> fine. What problem does that patch fix?

It's a generic fix for priority inversion but it induces badness on SMP, and
latency in task preemption on UP, so it's not suitable.

Con

2003-08-25 11:15:23

by Måns Rullgård

Subject: Re: [PATCH]O18.1int

Con Kolivas <[email protected]> writes:

>> Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
>> vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
>> introduced the problem. With that patch reversed, everything is
>> fine. What problem does that patch fix?
>
> It's a generic fix for priority inversion but it induces badness on SMP, and
> latency in task preemption on UP, so it's not suitable.

Now I'm confused. If that patch is bad, then why is it in O18?

--
Måns Rullgård
[email protected]

2003-08-25 11:23:45

by Måns Rullgård

Subject: Re: [PATCH]O18.1int

Alex Riesen <[email protected]> writes:

>> >> XEmacs still spins after running a background job like make or grep.
>> >> It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
>> >> as often, or for as long, as with O16.3, but it's there and it's
>> >> irritating.
>> >
>> > another example is RXVT (an X terminal emulator). Starts spinning after
>> > its child has exited. Not always, but annoyingly often. System is
>> > almost locked while it spins (calling select).
>>
>> It sounds like the same bug. IMHO, it's rather bad, since a
>> non-privileged process can make the system unusable for a non-zero
>> amount of time.
>
> the source of RXVT looks more like the bug: it does not check for
> errors, even though that is a bit tricky to do portably.

A program should never be able to grab more than its share of the CPU
time, no matter how buggy it is.

>> How should I go about capturing some information about this thing?
>
> Use "top" and look at the dynamic priority.

I'll try, but it could be tricky, since it doesn't usually last very
long.

--
Måns Rullgård
[email protected]

2003-08-25 11:30:12

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Mon, 25 Aug 2003 21:15, Måns Rullgård wrote:
> Con Kolivas <[email protected]> writes:
> >> Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
> >> vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
> >> introduced the problem. With that patch reversed, everything is
> >> fine. What problem does that patch fix?
> >
> > It's a generic fix for priority inversion but it induces badness on SMP,
> > and latency in task preemption on UP, so it's not suitable.
>
> Now I'm confused. If that patch is bad, then why is it in O18?

No, the 16.2 patch is bad. 16.3 backed it out.

Con

2003-08-25 11:59:33

by Måns Rullgård

Subject: Re: [PATCH]O18.1int

Con Kolivas <[email protected]> writes:

>> >> Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
>> >> vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
>> >> introduced the problem. With that patch reversed, everything is
>> >> fine. What problem does that patch fix?
>> >
>> > It's a generic fix for priority inversion but it induces badness on SMP,
>> > and latency in task preemption on UP, so it's not suitable.
>>
>> Now I'm confused. If that patch is bad, then why is it in O18?
>
> No, the 16.2 patch is bad. 16.3 backed it out.

OK, but it somehow made XEmacs behave badly.

--
Måns Rullgård
[email protected]

2003-08-25 12:21:44

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Mon, 25 Aug 2003 21:58, Måns Rullgård wrote:
> Con Kolivas <[email protected]> writes:
> >> >> Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
> >> >> vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
> >> >> introduced the problem. With that patch reversed, everything is
> >> >> fine. What problem does that patch fix?
> >> >
> >> > It's a generic fix for priority inversion but it induces badness on
> >> > SMP, and latency in task preemption on UP, so it's not suitable.
> >>
> >> Now I'm confused. If that patch is bad, then why is it in O18?
> >
> > No, the 16.2 patch is bad. 16.3 backed it out.
>
> OK, but it somehow made XEmacs behave badly.

Well, it was a generic fix in 16.2 that helped XEmacs, as I said. O15 also had
a generic fix (a child not preempting its parent), but that too was covering
up the real issue; it wasn't as drastic as 16.2.

Con

2003-08-25 12:49:14

by Måns Rullgård

Subject: Re: [PATCH]O18.1int

Con Kolivas <[email protected]> writes:

>> >> >> Vanilla test1 has the spin effect. Test2 doesn't. I haven't tried
>> >> >> vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
>> >> >> introduced the problem. With that patch reversed, everything is
>> >> >> fine. What problem does that patch fix?
>> >> >
>> >> > It's a generic fix for priority inversion but it induces badness on
>> >> > SMP, and latency in task preemption on UP, so it's not suitable.
>> >>
>> >> Now I'm confused. If that patch is bad, then why is it in O18?
>> >
>> > No, the 16.2 patch is bad. 16.3 backed it out.
>>
>> OK, but it somehow made XEmacs behave badly.
>
> Well, it was a generic fix in 16.2 that helped XEmacs, as I said. O15
> also had a generic fix (a child not preempting its parent), but that
> too was covering up the real issue; it wasn't as drastic as 16.2.

Of the kernels I've tested, only test1 vanilla and O16.3 and later
show the problem. Btw, is it related to the XEmacs regexp search
problem, or is that a different one?

--
Måns Rullgård
[email protected]

2003-08-25 13:25:30

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Mon, 25 Aug 2003 22:49, Måns Rullgård wrote:
> Con Kolivas <[email protected]> writes:
> >> >> >> Vanilla test1 has the spin effect. Test2 doesn't. I haven't
> >> >> >> tried vanilla test3 or test4. As I've said, the O16.2-O16.3 patch
> >> >> >> introduced the problem. With that patch reversed, everything is
> >> >> >> fine. What problem does that patch fix?
> >> >> >
> >> >> > It's a generic fix for priority inversion but it induces badness on
> >> >> > SMP, and latency in task preemption on UP, so it's not suitable.
> >> >>
> >> >> Now I'm confused. If that patch is bad, then why is it in O18?
> >> >
> >> > No, the 16.2 patch is bad. 16.3 backed it out.
> >>
> >> OK, but it somehow made XEmacs behave badly.
> >
> > Well, it was a generic fix in 16.2 that helped XEmacs, as I said. O15
> > also had a generic fix (a child not preempting its parent), but that
> > too was covering up the real issue; it wasn't as drastic as 16.2.
>
> Of the kernels I've tested, only test1 vanilla and O16.3 and later
> show the problem. Btw, is it related to the XEmacs regexp search
> problem, or is that a different one?

Same thing. Both are examples of priority inversion. In your case it's a
parent-child interaction, which was worked around in O15. That interaction
seems to be the more common case.

Con

2003-08-25 21:02:59

by Alex Riesen

Subject: Re: [PATCH]O18.1int

Alex Riesen, Mon, Aug 25, 2003 12:21:33 +0200:
> > > > XEmacs still spins after running a background job like make or grep.
> > > > It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> > > > as often, or for as long, as with O16.3, but it's there and it's
> > > > irritating.
> > >
> > > another example is RXVT (an X terminal emulator). Starts spinning after
> > > its child has exited. Not always, but annoyingly often. System is
> > > almost locked while it spins (calling select).
> >
> > What does the vanilla kernel do with these apps running? Both immediately after
> > the apps have started up and some time (>1 min) after they've been running?
>
> cannot test atm. Will do in 10 hours.
> RXVT behaved sanely (or probably the spin effect is very rare) in 2.4 (with
> O(1) alone and your 2.4 patches) and plain 2.6-test1.
>

Sorry, I have to postpone this investigation. No time on the machine.

I'll try to describe the behaviour of rxvt as best I can below.

Afaics, the application (rxvt) just sleeps at the beginning waiting for
input from X, as every terminal would do. At some point its inferior
process finishes, but it fails to notice this and spins madly in an
internal loop calling select, which returns immediately (because the other
side of the pty was closed; that is the error in rxvt). Probably it has
accumulated enough "priority" up to this moment to block other
applications (the window manager, for example) when it suddenly starts running?

-alex

2003-08-25 22:41:52

by Con Kolivas

Subject: Re: [PATCH]O18.1int

On Tue, 26 Aug 2003 07:02, Alex Riesen wrote:
> Alex Riesen, Mon, Aug 25, 2003 12:21:33 +0200:
> > > > > XEmacs still spins after running a background job like make or
> > > > > grep. It's fine if I reverse patch-O16.2-O16.3. The spinning
> > > > > doesn't happen as often, or for as long, as with O16.3, but it's
> > > > > there and it's irritating.
> > > >
> > > > another example is RXVT (an X terminal emulator). Starts spinning
> > > > after its child has exited. Not always, but annoyingly often. System
> > > > is almost locked while it spins (calling select).
> > >
> > > What does the vanilla kernel do with these apps running? Both immediately
> > > after the apps have started up and some time (>1 min) after they've
> > > been running?
> >
> > cannot test atm. Will do in 10 hours.
> > RXVT behaved sanely (or probably the spin effect is very rare) in 2.4 (with
> > O(1) alone and your 2.4 patches) and plain 2.6-test1.
>
> Sorry, I have to postpone this investigation. No time on the machine.
>
> I try to describe the behaviour of rxvt as best as I can below.
>
> Afaics, the application (rxvt) just sleeps at the beginning waiting for
> input from X, as every terminal would do. At some point its inferior
> process finishes, but it fails to notice this and spins madly in an
> internal loop calling select, which returns immediately (because the other
> side of the pty was closed; that is the error in rxvt). Probably it has
> accumulated enough "priority" up to this moment to block other
> applications (the window manager, for example) when it suddenly starts running?

Something like that. Interesting that you point out select, as wli was
profiling/tracing the mozilla/acroread plugin combination that spins on wait
and also found select was causing grief. It was calling select with a 15ms
timeout, X was getting less than 5ms to do its work and respond, and it was
repeatedly timing out. Seems a common link there.

Con

2003-08-25 23:01:07

by Alex Riesen

Subject: Re: [PATCH]O18.1int

Con Kolivas, Tue, Aug 26, 2003 00:48:23 +0200:
> > Afaics, the application (rxvt) just sleeps at the beginning waiting for
> > input from X, as every terminal would do. At some point its inferior
> > process finishes, but it fails to notice this and spins madly in an
> > internal loop calling select, which returns immediately (because the other
> > side of the pty was closed; that is the error in rxvt). Probably it has
> > accumulated enough "priority" up to this moment to block other
> > applications (the window manager, for example) when it suddenly starts running?
>
> Something like that. Interesting that you point out select, as wli was
> profiling/tracing the mozilla/acroread plugin combination that spins on wait
> and also found select was causing grief. It was calling select with a 15ms
> timeout, X was getting less than 5ms to do its work and respond, and it was
> repeatedly timing out. Seems a common link there.
>

Yes, looks similar. Probably a simpler test would be to let a program
loop using select on stdin with a zero timeout (it is usually a pty :)
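
Something along these lines would do it (a made-up test program, not taken
from rxvt):

/*
 * Spin on select() with a zero timeout on stdin (usually a pty), so
 * the process burns CPU while nominally "waiting". Build with e.g.
 * "cc -o spin spin.c".
 */
#include <sys/select.h>

int main(void)
{
	for (;;) {
		fd_set rfds;
		struct timeval tv = { 0, 0 };	/* zero timeout: return at once */

		FD_ZERO(&rfds);
		FD_SET(0, &rfds);		/* fd 0: stdin */
		select(1, &rfds, NULL, NULL, &tv);
	}
	return 0;
}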

2003-08-26 22:04:02

by Alex Riesen

Subject: Re: [PATCH]O18.1int

Alex Riesen, Mon, Aug 25, 2003 12:21:33 +0200:
> Con Kolivas, Mon, Aug 25, 2003 12:16:13 +0200:
> > On Mon, 25 Aug 2003 19:42, Alex Riesen wrote:
> > > Måns Rullgård, Mon, Aug 25, 2003 11:24:01 +0200:
> > > > XEmacs still spins after running a background job like make or grep.
> > > > It's fine if I reverse patch-O16.2-O16.3. The spinning doesn't happen
> > > > as often, or for as long, as with O16.3, but it's there and it's
> > > > irritating.
> > >
> > > another example is RXVT (an X terminal emulator). Starts spinning after
> > > its child has exited. Not always, but annoyingly often. System is
> > > almost locked while it spins (calling select).
> >
> > What does the vanilla kernel do with these apps running? Both immediately after
> > the apps have started up and some time (>1 min) after they've been running?
> >
> cannot test atm. Will do in 10 hours.
> RXVT behaved sanely (or probably the spin effect is very rare) in 2.4 (with
> O(1) alone and your 2.4 patches) and plain 2.6-test1.
>

Ran RXVT on plain 2.6-test4. No freezes; dynamic priority started at
23, dropped to 15 after the window lost focus and stayed there.

As there were no execution bursts this time, vmstat did not show
anything interesting. At least I did not see anything.

-alex

2003-08-26 22:20:44

by Alex Riesen

Subject: Re: [PATCH]O18.1int

Alex Riesen, Mon, Aug 25, 2003 12:29:33 +0200:
> Nick Piggin, Mon, Aug 25, 2003 12:27:14 +0200:
> > If you have some spare time perhaps you could test my scheduler
> > patch.
>
> i'll try to. Can't promise to have it today, though.
>

tried 7a. The first thing I noticed is that the problem with rxvt eating up
all CPU time is gone :) Also, applications get lower priorities (11-16).
Can't say everything is very smooth, but somehow it makes a very good
impression. No really rough edges, but I have to admit I tried only pure
CPU load (bash -c 'while :; do :; done').
Applications feel like they start faster (subjective).
X was/is not niced.

Made the kernel the default boot for now; will see how it behaves.

-alex

2003-08-27 02:27:04

by Nick Piggin

Subject: Re: [PATCH]O18.1int



Alex Riesen wrote:

>Alex Riesen, Mon, Aug 25, 2003 12:29:33 +0200:
>
>
>>Nick Piggin, Mon, Aug 25, 2003 12:27:14 +0200:
>>
>>
>>>If you have some spare time perhaps you could test my scheduler
>>>patch.
>>>
>>>
>>i'll try to. Can't promise to have it today, though.
>>
>>
>>
>
>tried 7a. The first thing I noticed is that the problem with rxvt eating up
>all CPU time is gone :) Also, applications get lower priorities (11-16).
>Can't say everything is very smooth, but somehow it makes a very good
>impression. No really rough edges, but I have to admit I tried only pure
>CPU load (bash -c 'while :; do :; done').
>Applications feel like they start faster (subjective).
>X was/is not niced.
>
>

Thanks. Try renicing X to -10 or even -20.


2003-08-28 12:19:58

by Guillaume Chazarain

Subject: Re: [PATCH]O18.1int

Hi Con (and linux-kernel),

I noticed a regression wrt 2.6.0-test4 and 2.4.22 with this
big context-switcher:

#include <unistd.h>

#define COUNT (1024 * 1024)

int main(void)
{
	char buffer = 0;
	int fd[2], i;

	pipe(fd);

	if (fork()) {
		/* parent: write COUNT single bytes into the pipe */
		for (i = 0; i < COUNT; i++)
			write(fd[1], &buffer, 1);
	} else {
		/* child: read them back one byte at a time */
		for (i = 0; i < COUNT; i++)
			read(fd[0], &buffer, 1);
	}

	return 0;
}


Here are the timing results on my Pentium3 450:
Nothing else is running, no X.
vmstat(1) shows 200000 context switches per second.

2.4.22:
User time (seconds): 0.42
System time (seconds): 1.04
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.89
Minor (reclaiming a frame) page faults: 15

2.6.0-test4:
User time (seconds): 0.45
System time (seconds): 1.70
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.30
Minor (reclaiming a frame) page faults: 17

2.6.0-test4-nobonus:
User time (seconds): 0.42
System time (seconds): 1.26
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.24
Minor (reclaiming a frame) page faults: 17

2.6.0-test4-O18.1:
User time (seconds): 0.49
System time (seconds): 2.67
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.24
Minor (reclaiming a frame) page faults: 17

2.6.0-test4-O18.1-nobonus:
User time (seconds): 0.40
System time (seconds): 1.22
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.18
Minor (reclaiming a frame) page faults: 17

With -nobonus I mean this dumb patch that keeps the scheduling
overhead but not the computed bonus:

--- linux-2.6.0-test4-ck/kernel/sched.c.old
+++ linux-2.6.0-test4-ck/kernel/sched.c
@@ -349,6 +349,9 @@ static int effective_prio(task_t *p)
 
 	bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
 
+	if (p->pid > 100)
+		bonus = 0;
+
 	prio = p->static_prio - bonus;
 	if (prio < MAX_RT_PRIO)
 		prio = MAX_RT_PRIO;



And the top(1) results are:

2.6.0-test4:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
586 g 23 0 1336 260 1308 R 51.2 0.1 0:02.85 a.out (writer)
587 g 25 0 1336 260 1308 R 47.3 0.1 0:02.74 a.out (reader)

2.6.0-test4-ck:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
717 g 16 0 1336 260 1308 S 50.6 0.1 0:02.49 a.out (writer)
718 g 25 0 1336 260 1308 R 49.6 0.1 0:02.51 a.out (reader)



My conclusion is that the regression is not because of increased
overhead but because of wrong scheduling decisions in this case.
It runs at full speed when the reader and writer are at the same
priority.
Maybe this is also the case in the volano benchmark?

Anyway, we could reduce the overhead in schedule() by doing the sched_clock()
stuff only in the (prev != next) case.
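
A sketch of what that could look like (names follow 2.6.0-test4's
schedule(); illustrative only, not a tested patch):

/*
 * Defer the sched_clock() call until we know a real switch is
 * happening, so the prev == next fast path skips the timestamp work.
 * Surrounding code elided.
 */
if (likely(prev != next)) {
	unsigned long long now = sched_clock();

	/* ... run_time/sleep_avg accounting for prev goes here ... */
	prev->timestamp = now;
	/* ... then context_switch() as before ... */
}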


BTW, I am also interested in the patch below, which prevents Ctrl-Z'ed
(stopped) processes from gaining interactivity bonus.

--- linux-2.6.0-test4-ck/kernel/sched.c.old
+++ linux-2.6.0-test4-ck/kernel/sched.c
@@ -449,8 +449,10 @@ static void recalc_task_prio(task_t *p,
 static inline void activate_task(task_t *p, runqueue_t *rq)
 {
 	unsigned long long now = sched_clock();
+	int sleeping = p->state & (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE);
 
-	recalc_task_prio(p, now);
+	if (likely(sleeping))
+		recalc_task_prio(p, now);
 
 	/*
 	 * This checks to make sure it's not an uninterruptible task




Thanks for reading.

Guillaume






2003-08-28 13:40:14

by Guillaume Chazarain

Subject: Re: [PATCH]O18.1int

28/08/03 14:34:15, Nick Piggin <[email protected]> wrote:
>Guillaume Chazarain wrote:
>
>>Hi Con (and linux-kernel),
>>
>>I noticed a regression wrt 2.6.0-test4 and 2.4.22 with this
>>big context-switcher:
>>
>
>Hi Guillaume,
>If you get the time, would you be able to try my patch? Thanks.

Here are the results for Nick's v8:

top(1):

639 g 30 0 1336 260 1308 R 51.2 0.1 0:03.80 a.out
638 g 22 0 1336 260 1308 S 47.3 0.1 0:03.39 a.out

User time (seconds): 0.57
System time (seconds): 2.72
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.85
Minor (reclaiming a frame) page faults: 17


Guillaume.




2003-08-28 13:58:42

by Nick Piggin

Subject: Re: [PATCH]O18.1int



Guillaume Chazarain wrote:

>28/08/03 14:34:15, Nick Piggin <[email protected]> wrote:
>
>>Guillaume Chazarain wrote:
>>
>>
>>>Hi Con (and linux-kernel),
>>>
>>>I noticed a regression wrt 2.6.0-test4 and 2.4.22 with this
>>>big context-switcher:
>>>
>>>
>>Hi Guillaume,
>>If you get the time, would you be able to try my patch? Thanks.
>>
>
>Here are the results for Nick's v8:
>
>top(1):
>
> 639 g 30 0 1336 260 1308 R 51.2 0.1 0:03.80 a.out
> 638 g 22 0 1336 260 1308 S 47.3 0.1 0:03.39 a.out
>
>User time (seconds): 0.57
>System time (seconds): 2.72
>Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.85
>Minor (reclaiming a frame) page faults: 17
>
>

Thanks Guillaume; so, not very good. It's interesting that there can
be such a big difference in performance, but it's a very simple app,
so it makes a good test for the specific regression.

In both Con's and my patches, the reader gets a bit more CPU. This
might be due to it preempting the writer more often on wakeups,
which would lead to more scheduling per work done and a regression.

If this is the case, I'm not sure the behaviour is too undesirable,
though: it's often very important for woken processes to be run
quickly. It's not clear that this workload is something we would
want to optimize for, assuming the problem is what I guess.

I will look into it further when I get time tomorrow.