2006-02-13 12:29:26

by Mike Galbraith

Subject: Re: 2.6 vs 2.4, ssh terminal slowdown

On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 08:08 +0100, Mike Galbraith wrote:
> > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > Do you know which of those changes fixes the "ls" problem?
> >
> > No, it could be either, both, or neither. Heck, it _could_ be a
> > combination of all of the things in my experimental tree for that
> > matter. I put this patch out there because I know they're both bugs,
> > and strongly suspect it'll cure the worst of the interactivity related
> > delays.
> >
> > I'm hoping you'll test it and confirm that it fixes yours.
>
> Nope, this does not fix it. "time ls" ping-pongs back and forth between
> ~0.1s and ~0.9s. Must have been something else in the first patch.

Hmm. Thinking about it some more, it's probably more than this alone,
but it could well be the boost qualifier I'm using...

Instead of declaring a task to be deserving of large quantities of boost
based upon its present shortage of sleep_avg, I based it upon the task
not using its full slice. He who uses the least gets the most. This
made a large contribution to mitigating the parallel compile over NFS
problem the current scheduler has. The fact that the current heuristics,
which mandate that any task which sleeps for 5% of its slice may use
95% cpu practically forever, can not only work, but work quite well in
the general case, tells me that the vast majority of all tasks are, and
will forever remain, cpu hogs.

The present qualifier creates positive feedback for cpu hogs by giving
them the most reward for being the biggest hog by our own definition.
If you'll pardon the pun, we give pigs wings, and hope that they don't
actually use them and fly directly overhead. This is the root problem
as I see it, that and the fact that even if sleep_avg acquisition and
consumption were purely 1:1 as in the original O(1) scheduler, if you
sleep 1 ns longer than you run, you'll eventually be up to your neck in
sleep_avg (a darn good reason to use something like slice_avg to help
determine when to drain off the excess).
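
The 1:1 accumulation argument above can be illustrated with a toy model.
This is a minimal sketch, not the kernel's accounting: the function name
cycles_until_saturated, the cap, and the per-cycle sleep/run figures are
all assumptions made up for illustration. It only shows that any net
surplus of sleep over run, however small, eventually pins sleep_avg at
its ceiling.

```python
def cycles_until_saturated(sleep_ns, run_ns, cap_ns):
    """Count sleep/run cycles until a toy sleep_avg first hits cap_ns.

    Assumed model (not the kernel's code): each cycle credits sleep_ns,
    clamps at cap_ns, then debits run_ns and clamps at zero.
    Returns None when the task never banks a surplus.
    """
    if sleep_ns <= run_ns and sleep_ns < cap_ns:
        return None  # no net gain per cycle, so the cap is never reached

    sleep_avg = 0
    cycles = 0
    while True:
        cycles += 1
        # Credit the time slept, 1:1, up to the ceiling.
        sleep_avg = min(cap_ns, sleep_avg + sleep_ns)
        if sleep_avg == cap_ns:
            return cycles
        # Debit the time run, 1:1, down to zero.
        sleep_avg = max(0, sleep_avg - run_ns)


if __name__ == "__main__":
    # Sleeping 1 ns longer than running still saturates a 1000 ns cap.
    print(cycles_until_saturated(sleep_ns=101, run_ns=100, cap_ns=1000))
```

With these made-up numbers the task gains 1 ns per cycle and reaches the
cap on cycle 900. A task that sleeps exactly as long as it runs never
gets there.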

Changing that qualifier would also mean that he who is _getting_ the
least cpu would get the most boost as well, so it should help with
fairness, and things like the test case mentioned in comments in the
patch where one task can end up starving its own partner.

Is there any reason that "he who uses the least gets the most" would be
inferior to "he who has the least for whatever reason gets the most"?
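
The difference between the two qualifiers can be sketched side by side.
This is a hedged illustration only: the two functions, their linear
0.0 to 1.0 scale, and the sample numbers are assumptions, not formulas
from the scheduler or from the patch under discussion.

```python
def boost_by_sleep_shortage(sleep_avg, max_sleep_avg):
    """'He who has the least gets the most': reward a low sleep_avg."""
    return 1.0 - sleep_avg / max_sleep_avg


def boost_by_slice_usage(slice_used, slice_len):
    """'He who uses the least gets the most': reward unused slice."""
    return 1.0 - slice_used / slice_len


# Hypothetical tasks. The hog has burned off its sleep_avg and eats its
# whole slice; the light task holds a middling sleep_avg and uses a
# quarter of its slice.
hog = {"sleep_avg": 0, "slice_used": 100}
light = {"sleep_avg": 500, "slice_used": 25}
MAX_SLEEP_AVG = 1000  # assumed ceiling for the toy model
SLICE_LEN = 100       # assumed slice length for the toy model

hog_shortage = boost_by_sleep_shortage(hog["sleep_avg"], MAX_SLEEP_AVG)
light_shortage = boost_by_sleep_shortage(light["sleep_avg"], MAX_SLEEP_AVG)
hog_usage = boost_by_slice_usage(hog["slice_used"], SLICE_LEN)
light_usage = boost_by_slice_usage(light["slice_used"], SLICE_LEN)

if __name__ == "__main__":
    print("shortage qualifier:", hog_shortage, light_shortage)
    print("usage qualifier:   ", hog_usage, light_usage)
```

Under the shortage qualifier the hog ranks above the light task, which
is the positive feedback described above. Under the usage qualifier the
order flips.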

If I were to put a patch together that did only that (IMHO sensible)
thing, would anyone be interested in trying it?

-Mike

2006-02-15 04:23:04

by Lee Revell

Subject: Re: 2.6 vs 2.4, ssh terminal slowdown

On Mon, 2006-02-13 at 13:35 +0100, Mike Galbraith wrote:
> On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> > On Mon, 2006-02-13 at 08:08 +0100, Mike Galbraith wrote:
> > > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > > Do you know which of those changes fixes the "ls" problem?
> > >
> > > No, it could be either, both, or neither. Heck, it _could_ be a
> > > combination of all of the things in my experimental tree for that
> > > matter. I put this patch out there because I know they're both bugs,
> > > and strongly suspect it'll cure the worst of the interactivity related
> > > delays.
> > >
> > > I'm hoping you'll test it and confirm that it fixes yours.
> >
> > Nope, this does not fix it. "time ls" ping-pongs back and forth between
> > ~0.1s and ~0.9s. Must have been something else in the first patch.
>
> Hmm. Thinking about it some more, it's probably more than this alone,
> but it could well be the boost qualifier I'm using...

OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
Better than mainline but the large seemingly random variance is still
perceptible and annoying. And, "ls | cat" behaves about the same as
"ls", while on mainline it was consistently faster (!).

Do you have an updated patch against -mm that I can test?
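
For anyone wanting to put numbers on the "ls" spread reported above, a
small harness along these lines may help. It is a sketch under
assumptions: the helper name time_command and the default of 20 runs
are arbitrary, and it is not the method used to get the figures in this
thread. Output is deliberately left attached to the terminal, since the
delay under discussion involves the terminal itself.

```python
import subprocess
import sys
import time


def time_command(argv, runs=20):
    """Run argv repeatedly; return (fastest, slowest) wall time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # stdout/stderr are inherited so terminal output cost is included.
        subprocess.run(argv, check=False)
        samples.append(time.perf_counter() - start)
    return min(samples), max(samples)


if __name__ == "__main__":
    # Defaults to "ls"; pass another command on the command line.
    command = sys.argv[1:] or ["ls"]
    fastest, slowest = time_command(command)
    print(f"min {fastest:.3f}s  max {slowest:.3f}s", file=sys.stderr)
```

The summary goes to stderr so that it stays visible if stdout is piped
through something like cat for comparison.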

Lee

2006-02-15 05:22:15

by Mike Galbraith

Subject: Re: 2.6 vs 2.4, ssh terminal slowdown

On Tue, 2006-02-14 at 23:22 -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 13:35 +0100, Mike Galbraith wrote:
> > On Mon, 2006-02-13 at 03:43 -0500, Lee Revell wrote:
> > > On Mon, 2006-02-13 at 08:08 +0100, Mike Galbraith wrote:
> > > > On Mon, 2006-02-13 at 01:38 -0500, Lee Revell wrote:
> > > > > Do you know which of those changes fixes the "ls" problem?
> > > >
> > > > No, it could be either, both, or neither. Heck, it _could_ be a
> > > > combination of all of the things in my experimental tree for that
> > > > matter. I put this patch out there because I know they're both bugs,
> > > > and strongly suspect it'll cure the worst of the interactivity related
> > > > delays.
> > > >
> > > > I'm hoping you'll test it and confirm that it fixes yours.
> > >
> > > Nope, this does not fix it. "time ls" ping-pongs back and forth between
> > > ~0.1s and ~0.9s. Must have been something else in the first patch.
> >
> > Hmm. Thinking about it some more, it's probably more than this alone,
> > but it could well be the boost qualifier I'm using...
>
> OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
> Better than mainline but the large seemingly random variance is still
> perceptible and annoying. And, "ls | cat" behaves about the same as
> "ls", while on mainline it was consistently faster (!).

Ok. That means the reduction in fluctuation had nothing to do with my
changes. It also suggests that there may be something of a regression
in the changes that are in mm, which I also carried in my patch, since
the timings for both kernels appear to be ~identical with or without my
bits. That seems a little odd to me considering what those changes do.

>
> Do you have an updated patch against -mm that I can test?

I will soon if you still want to try it. I've fixed the throttle release
thing, and am fine tuning the interactivity bits. I have it working
very well now, but want to try to squeeze some more from it.

Drop me a line if you're still interested from the interactivity side,
but I think the ls delay reduction has turned out to be a red herring.

-Mike

2006-02-15 06:11:11

by Lee Revell

Subject: Re: 2.6 vs 2.4, ssh terminal slowdown

On Wed, 2006-02-15 at 06:22 +0100, Mike Galbraith wrote:
> > OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
> > Better than mainline but the large seemingly random variance is still
> > perceptible and annoying. And, "ls | cat" behaves about the same as
> > "ls", while on mainline it was consistently faster (!).
>
> Ok. That means the reduction in fluctuation had nothing to do with my
> changes. It also suggests that there may be something of a regression
> in the changes that are in mm, which I also carried in my patch, since
> the timings for both kernels appear to be ~identical with or without my
> bits. That seems a little odd to me considering what those changes do.
>
> >
> > Do you have an updated patch against -mm that I can test?
>
> I will soon if you still want to try it. I've fixed the throttle release
> thing, and am fine tuning the interactivity bits. I have it working
> very well now, but want to try to squeeze some more from it.
>
> Drop me a line if you're still interested from the interactivity side,
> but I think the ls delay reduction has turned out to be a red herring.

Just to be clear - this is 2.6.16-rc2-mm1 *without* your patch that I am
talking about.

Lee

2006-02-15 07:17:45

by Mike Galbraith

Subject: Re: 2.6 vs 2.4, ssh terminal slowdown

On Wed, 2006-02-15 at 01:11 -0500, Lee Revell wrote:
> On Wed, 2006-02-15 at 06:22 +0100, Mike Galbraith wrote:
> > > OK, with 2.6.16-rc2-mm1, "ls" bounces around between 0.15s and 0.50s.
> > > Better than mainline but the large seemingly random variance is still
> > > perceptible and annoying. And, "ls | cat" behaves about the same as
> > > "ls", while on mainline it was consistently faster (!).
> >
> > Ok. That means the reduction in fluctuation had nothing to do with my
> > changes. It also suggests that there may be something of a regression
> > in the changes that are in mm, which I also carried in my patch, since
> > the timings for both kernels appear to be ~identical with or without my
> > bits. That seems a little odd to me considering what those changes do.
> >
> > >
> > > Do you have an updated patch against -mm that I can test?
> >
> > I will soon if you still want to try it. I've fixed the throttle release
> > thing, and am fine tuning the interactivity bits. I have it working
> > very well now, but want to try to squeeze some more from it.
> >
> > Drop me a line if you're still interested from the interactivity side,
> > but I think the ls delay reduction has turned out to be a red herring.
>
> Just to be clear - this is 2.6.16-rc2-mm1 *without* your patch that I am
> talking about.

Exactly. 2.6.16-rc2-mm1 without my patch has a delay of .15 to .50s.
2.6.16-rc1 with my patch had a reported delay of .19 to .45s.
That's identical in my book. My patch to rc1 also contained Con's
changes that are in mm; that's the constant. Subtracting the variable, my
patch, made no difference. Con's changes may be responsible for the
behavior change, but mine are certainly not.

-Mike