2002-11-18 06:17:02

by Mike Galbraith

Subject: 2.5.47 scheduler problems?

Greetings,

For testing swap throughput, I like to run make -j30 bzImage on my 500 MHz
PIII w. 128 MB RAM. For testing interactivity, I fire up KDE, start a
smaller make -j, grab a window, and wave it around.

With 2.4.20rc2+rc1aa1, running a -j10 build (not swapping) is very very
bad. However, if I set all tasks in the system to SCHED_FIFO or SCHED_RR
prior to this light make -j, I have a ~pretty smooth system.

If I do the same in 2.5.47, I have no control of my box. Setting all tasks
to SCHED_FIFO or SCHED_RR prior to starting make -j10 bzImage, I can regain
control, but interactivity under load is basically not present.

I used to be able to wave a window poorly at make -j25 (swapping heftily),
fairly smoothly at make -j20, and smoothly at make -j15 or below. This
with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
quite a while though)

-Mike
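
The "set all tasks to SCHED_FIFO or SCHED_RR" experiment described above could be reproduced roughly as follows. This is a minimal sketch, not the method actually used in the message: it assumes a Linux box with /proc mounted and Python 3 running as root, and the helper names list_pids and set_all_realtime are hypothetical. It only walks process IDs (thread-group leaders), and putting every task into a real-time class can make a machine unresponsive, so it belongs on a disposable test box only.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the numeric entries under proc_root, i.e. the visible PIDs."""
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())


def set_all_realtime(policy_name="SCHED_RR", priority=1, proc_root="/proc"):
    """Try to move every visible process to the given real-time policy.

    Returns (changed, failed) lists of PIDs. Nothing here runs unless
    this function is called explicitly.
    """
    policy = getattr(os, policy_name)   # os.SCHED_RR or os.SCHED_FIFO (Linux only)
    param = os.sched_param(priority)    # real-time priorities are 1..99 on Linux
    changed, failed = [], []
    for pid in list_pids(proc_root):
        try:
            os.sched_setscheduler(pid, policy, param)
            changed.append(pid)
        except OSError:
            # The process exited in the meantime, or we lack permission.
            failed.append(pid)
    return changed, failed
```

Calling set_all_realtime("SCHED_FIFO") as root before starting make -j10 bzImage would approximate the setup in the message; the build itself would then inherit whatever policy its parent shell has.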


2002-11-18 06:44:53

by Tim Connors

Subject: Re: 2.5.47 scheduler problems?

In linux.kernel, you wrote:
> Greetings,
>
> For testing swap throughput, I like to run make -j30 bzImage on my 500 MHz
> PIII w. 128 MB RAM. For testing interactivity, I fire up KDE, start a
> smaller make -j, grab a window, and wave it around.
>
> With 2.4.20rc2+rc1aa1, running a -j10 build (not swapping) is very very
> bad. However, if I set all tasks in the system to SCHED_FIFO or SCHED_RR
> prior to this light make -j, I have a ~pretty smooth system.
>
> If I do the same in 2.5.47, I have no control of my box. Setting all tasks
> to SCHED_FIFO or SCHED_RR prior to starting make -j10 bzImage, I can regain
> control, but interactivity under load is basically not present.

Funny that.

> I used to be able to wave a window poorly at make -j25 (swapping heftily),
> fairly smoothly at make -j20, and smoothly at make -j15 or below. This
> with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
> quite a while though)

Perhaps you should consider buying an extra 29 CPUs for your desktop?

--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/

A Chemist who falls in acid is absorbed in work.

2002-11-18 07:01:13

by Andrew Morton

Subject: Re: 2.5.47 scheduler problems?

Tim Connors wrote:
>
> > I used to be able to wave a window poorly at make -j25 (swapping heftily),
> > fairly smoothly at make -j20, and smoothly at make -j15 or below. This
> > with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
> > quite a while though)
>
> Perhaps you should consider buying an extra 29 CPUs for your desktop?
>

No. He's saying that it used to be OK, but it has got worse.

A much simpler test is to start a big compilation and then madly
waggle an X window around. Goes OK for a few seconds, and then
seizes up quite horridly. Presumably because the scheduler has
suddenly decided that the X server has become a "batch" process
and is scheduling it in a similar manner to the compilation.

If you stop wiggling the window for 5-10 seconds it comes back.
Presumably because the scheduler has decided that the X server is
"interactive" again.

When it happens, it's *very* bad. The mouse cursor doesn't move
for 0.5-1.0 seconds and then takes great leaps. It is unusable.

Strangely it does not happen (much) when the background load is
a few busywaits. It has to be a compilation - maybe short-lived
batch processes are what trigger it.

For me, the X server is sometimes the victim, and the MUA (netscape4)
is frequently victimised. This is because the MUA alternates between
periods of interactivity and periods of compute-intensive work (parsing
large mailboxes). When this problem strikes you have to just sit there
with your arms folded waiting for it to stop.

It needs fixing.
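
The stalls described above (0.5-1.0 second freezes) can be put into numbers with a small probe that sleeps for a short interval and records how late each wakeup is. This is only a sketch: the function names are hypothetical, the 10 ms interval is an assumption, and a mostly sleeping probe process is just a rough stand-in for the X server or a mail client.

```python
import time


def measure_wakeup_lateness(interval=0.01, samples=100):
    """Sleep `interval` seconds `samples` times; return how late each wakeup was."""
    lateness = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval)
        # Anything beyond the requested interval is time spent waiting to run.
        lateness.append(time.monotonic() - start - interval)
    return lateness


def summarize(lateness):
    """Return the worst and the median lateness, in seconds."""
    ordered = sorted(lateness)
    return {"max": ordered[-1], "median": ordered[len(ordered) // 2]}


if __name__ == "__main__":
    print(summarize(measure_wakeup_lateness()))
```

Running it once on an idle box and once during a large compilation gives a before/after comparison; a "max" approaching a second would correspond to the cursor freezes reported here.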

2002-11-18 07:28:21

by Mike Galbraith

Subject: Re: 2.5.47 scheduler problems?


----- Original Message -----
From: "Tim Connors" <[email protected]>
To: <[email protected]>; "Mike Galbraith" <[email protected]>
Sent: Monday, November 18, 2002 7:51 AM
Subject: Re: 2.5.47 scheduler problems?


> In linux.kernel, you wrote:
> > Greetings,
> >
> > For testing swap throughput, I like to run make -j30 bzImage on my 500 MHz
> > PIII w. 128 MB RAM. For testing interactivity, I fire up KDE, start a
> > smaller make -j, grab a window, and wave it around.
> >
> > With 2.4.20rc2+rc1aa1, running a -j10 build (not swapping) is very very
> > bad. However, if I set all tasks in the system to SCHED_FIFO or SCHED_RR
> > prior to this light make -j, I have a ~pretty smooth system.
> >
> > If I do the same in 2.5.47, I have no control of my box. Setting all tasks
> > to SCHED_FIFO or SCHED_RR prior to starting make -j10 bzImage, I can regain
> > control, but interactivity under load is basically not present.
>
> Funny that.
>
> > I used to be able to wave a window poorly at make -j25 (swapping heftily),
> > fairly smoothly at make -j20, and smoothly at make -j15 or below. This
> > with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
> > quite a while though)
>
> Perhaps you should consider buying an extra 29 CPUs for your desktop?

I have neither the need for 30 CPUs, nor the cash to pay for such a
beast :)

I gather you think my test is silly?

-Mike

2002-11-18 07:34:17

by Mike Galbraith

Subject: Re: 2.5.47 scheduler problems?


----- Original Message -----
From: "Andrew Morton" <[email protected]>
To: "Tim Connors" <[email protected]>
Cc: <[email protected]>; "Mike Galbraith" <[email protected]>
Sent: Monday, November 18, 2002 8:08 AM
Subject: Re: 2.5.47 scheduler problems?


> Tim Connors wrote:
> >
> > > I used to be able to wave a window poorly at make -j25 (swapping heftily),
> > > fairly smoothly at make -j20, and smoothly at make -j15 or below. This
> > > with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
> > > quite a while though)
> >
> > Perhaps you should consider buying an extra 29 CPUs for your desktop?
> >
>
> No. He's saying that it used to be OK, but it has got worse.
>
> A much simpler test is to start a big compilation and then madly
> waggle an X window around. Goes OK for a few seconds, and then
> seizes up quite horridly. Presumably because the scheduler has
> suddenly decided that the X server has become a "batch" process
> and is scheduling it in a similar manner to the compilation.
>
> If you stop wiggling the window for 5-10 seconds it comes back.
> Presumably because the scheduler has decided that the X server is
> "interactive" again.
>
> When it happens, it's *very* bad. The mouse cursor doesn't move
> for 0.5-1.0 seconds and then takes great leaps. It is unusable.

I was watching it this morning, without wiggling, and it seems to update
window content (make output in one and vmstat in another) about every 5
seconds.. very odd looking.

-Mike

2002-11-18 07:46:04

by Tim Connors

Subject: Re: 2.5.47 scheduler problems?

On Mon, 18 Nov 2002, Mike Galbraith wrote:

> > > If I do the same in 2.5.47, I have no control of my box. Setting all tasks
> > > to SCHED_FIFO or SCHED_RR prior to starting make -j10 bzImage, I can regain
> > > control, but interactivity under load is basically not present.
> >
> > Funny that.
> >
> > > I used to be able to wave a window poorly at make -j25 (swapping heftily),
> > > fairly smoothly at make -j20, and smoothly at make -j15 or below. This
> > > with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
> > > quite a while though)
> >
> > Perhaps you should consider buying an extra 29 CPUs for your desktop?
>
> I have neither the need for 30 CPUs, nor the cash to pay for such a
> beast :)
>
> I gather you think my test is silly?

Well, yes, 30 processes at a time on a single CPU does seem a bit silly -
given that (under the old system), you would not expect X to get more than
3% of the CPU time.
Also, scheduling normal processes (i.e., not real-time processes) as RR/FIFO
seemed pretty bad.
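
The 3% estimate above follows from simple arithmetic, sketched here under strong assumptions: every make job is one always-runnable process, X is always runnable too, and a single CPU is split evenly among them. The function name equal_share is hypothetical.

```python
def equal_share(competing_tasks):
    """CPU fraction one task gets when it and `competing_tasks` other
    always-runnable tasks split a single CPU evenly."""
    return 1.0 / (competing_tasks + 1)


for jobs in (10, 30):
    print(f"make -j{jobs}: X gets about {100 * equal_share(jobs):.1f}% of the CPU")
# prints:
# make -j10: X gets about 9.1% of the CPU
# make -j30: X gets about 3.2% of the CPU
```

In practice compile jobs block on I/O and X sleeps most of the time, so these figures are only a rough bound on what even sharing would give.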

However....

But I have to now admit that I haven't yet played with 2.5.47 seriously,
and wasn't aware of the problems which Andrew just posted.

mea culpa.


--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/

If you ever fear that machines will surpass humans in intelligence,
just ask Microsoft to write the OS. -- POTU in RHOD

2002-11-18 10:51:33

by Mike Galbraith

Subject: Re: 2.5.47 scheduler problems?


----- Original Message -----
From: "Tim Connors" <[email protected]>
To: "Mike Galbraith" <[email protected]>
Cc: <[email protected]>
Sent: Monday, November 18, 2002 8:53 AM
Subject: Re: 2.5.47 scheduler problems?


> On Mon, 18 Nov 2002, Mike Galbraith wrote:
>
> > > > If I do the same in 2.5.47, I have no control of my box. Setting all tasks
> > > > to SCHED_FIFO or SCHED_RR prior to starting make -j10 bzImage, I can regain
> > > > control, but interactivity under load is basically not present.
> > >
> > > Funny that.
> > >
> > > > I used to be able to wave a window poorly at make -j25 (swapping heftily),
> > > > fairly smoothly at make -j20, and smoothly at make -j15 or below. This
> > > > with no SCHED_RR/SCHED_FIFO. (I haven't done much testing like this in
> > > > quite a while though)
> > >
> > > Perhaps you should consider buying an extra 29 CPUs for your desktop?
> >
> > I have neither the need for 30 CPUs, nor the cash to pay for such a
> > beast :)
> >
> > I gather you think my test is silly?
>
> Well, yes, 30 processes at a time on a single CPU does seem a bit silly -
> given that (under the old system), you would not expect X to get more than
> 3% of the CPU time.

I don't try -j30 with X/KDE running.. that's much too heavy for my
little box. The whole point of doing -j30 on my box without X/KDE is
that it juuuust fills up capacity. It generally adds a minute to build
time despite quite hefty swapping. With aa kernels or heavily twiddled
stock kernels, it's more like 30 seconds. (with new gcc, -j30 is way
too much too.. oink oink;)

> Also, scheduling normal processes (i.e., not real-time processes) as RR/FIFO
> seemed pretty bad.

That was only to see if I _could_ get some CPU, and with (only:) 10
copies of gcc running.

>
> However....
>
> But I have to now admit that I haven't yet played with 2.5.47 seriously,
> and wasn't aware of the problems which Andrew just posted.
>
> mea culpa.
>
>
> --
> TimC -- http://astronomy.swin.edu.au/staff/tconnors/

-Mike

2002-11-22 05:34:24

by Jim Houston

Subject: Re: 2.5.47 scheduler problems?

Hi Mike, Rik, Everyone,

The O(1) scheduler just isn't fair. It will run a subset
of the runnable processes excluding the rest. See my earlier
emails for the details.

I had been working on a fix for this but got distracted
by Posix timers. I still hope to get back to it.

My patch is here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103508412423719&w=2

It fixes fairness but breaks nice(2). Rik van Riel has a
patch here which builds on my patch which fixes this:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103651801424031&w=2

I just gave this a spin with. The patches still apply cleanly
to linux-2.5.48 and it seems well behaved:-)

I found this problem with the LTP waitpid06 test. It actually
produced a live-lock. See this mail:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103133744217082&w=2

Jim Houston - Concurrent Computer Corp.
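
The unfairness described above (a subset of the runnable processes runs while the rest are excluded) can be illustrated with a toy model of a two-array run queue. This is a sketch under stated assumptions, not the kernel code and not the linked patch: tasks the model treats as "interactive" are requeued on the active list when their slice ends, all others go to the expired list, and the lists are swapped only when the active list is empty. Priorities, sleep-average bookkeeping, and real timeslices are left out, and the name simulate is hypothetical.

```python
from collections import deque


def simulate(tasks, ticks):
    """Run a toy active/expired run queue for `ticks` timeslices.

    tasks maps a task name to True if the model treats it as interactive.
    Returns how many timeslices each task received.
    """
    active = deque(tasks)          # task names in queue order
    expired = deque()
    runs = {name: 0 for name in tasks}
    for _ in range(ticks):
        if not active:
            # Swap the arrays only once the active one has drained.
            active, expired = expired, active
        name = active.popleft()
        runs[name] += 1
        if tasks[name]:
            active.append(name)    # "interactive": back onto the active list
        else:
            expired.append(name)   # everyone else waits for the next swap
    return runs


print(simulate({"a": True, "b": True, "c": False, "d": False}, 100))
# prints {'a': 49, 'b': 49, 'c': 1, 'd': 1}
```

In this model, as long as "a" and "b" never sleep, the active list never drains, so "c" and "d" run once and then wait indefinitely. With no task marked interactive, the same loop shares the CPU evenly.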

2002-11-22 11:03:46

by Mike Galbraith

Subject: Re: 2.5.47 scheduler problems?

At 12:41 AM 11/22/2002 -0500, Jim Houston wrote:
>Hi Mike, Rik, Everyone,
>
>The O(1) scheduler just isn't fair. It will run a subset
>of the runnable processes excluding the rest. See my earlier
>emails for the details.
>
>I had been working on a fix for this but got distracted
>by Posix timers. I still hope to get back to it.
>
>My patch is here:
>http://marc.theaimsgroup.com/?l=linux-kernel&m=103508412423719&w=2

In a brief test, this seems to cure my problem.

>It fixes fairness but breaks nice(2). Rik van Riel has a
>patch here which builds on my patch which fixes this:
>http://marc.theaimsgroup.com/?l=linux-kernel&m=103651801424031&w=2

(I haven't tested this one yet)

>I just gave this a spin with. The patches still apply cleanly
>to linux-2.5.48 and it seems well behaved:-)

It seems a little choppy still for a not swapping load, but greatly improved.

Thanks!

-Mike

2002-11-22 12:47:08

by Mike Galbraith

Subject: Re: 2.5.47 scheduler problems?

At 12:07 PM 11/22/2002 +0100, Mike Galbraith wrote:
>At 12:41 AM 11/22/2002 -0500, Jim Houston wrote:
>
>>I just gave this a spin with. The patches still apply cleanly
>>to linux-2.5.48 and it seems well behaved:-)
>
>It seems a little choppy still for a not swapping load, but greatly improved.
>
>Thanks!

(I put it into virgin 2.5.47 fwiw) I have some very odd behavior. I
wanted to see how the kernel did at make -j30 bzImage on my test box to see
what effect it has on throughput (box is 500 MHz PIII + 128 MB RAM), and get
vmstat output like the attached. I should be roughly 30Mb into swap and
paging heftily at this point.

-Mike


Attachments:
vmstat.out (2.24 kB)

2002-11-22 14:00:01

by Mike Galbraith

Subject: Re: 2.5.47 scheduler problems?

At 01:51 PM 11/22/2002 +0100, Mike Galbraith wrote:
>At 12:07 PM 11/22/2002 +0100, Mike Galbraith wrote:
>>At 12:41 AM 11/22/2002 -0500, Jim Houston wrote:
>>
>>>I just gave this a spin with. The patches still apply cleanly
>>>to linux-2.5.48 and it seems well behaved:-)
>>
>>It seems a little choppy still for a not swapping load, but greatly improved.
>>
>>Thanks!
>
>(I put it into virgin 2.5.47 fwiw) I have some very odd behavior. I
>wanted to see how the kernel did at make -j30 bzImage on my test box to
>see what effect it has on throughput (box is 500 MHz PIII + 128 MB RAM),
>and get vmstat output like the attached. I should be roughly 30Mb into
>swap and paging heftily at this point.

Never mind the vmstat output.. it seems you need both patches. With both
in 2.5.48, the build progressed in a much more normal looking fashion. I'm
not losing control of my box any more under load.

-Mike