2002-07-01 17:55:15

by Bill Davidsen

Subject: [OKS] O(1) scheduler in 2.4

What's the issue? The most popular trees have been using it without issue
for six months or so, and I know of no cases of bad behaviour. I know
there are people who don't believe in the preempt patch, but the new
scheduler seems to work better under both desktop and server load.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


2002-07-01 18:13:21

by Tom Rini

Subject: Re: [OKS] O(1) scheduler in 2.4

On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:

> What's the issue?

a) We're at 2.4.19-rc1 right now. It would be horribly
counterproductive to put O(1) in right now.
b) 2.4 is the _stable_ tree. If every big change in 2.5 got back ported
to 2.4, it'd be just like 2.5 :)
c) I also suspect that it hasn't been as widely tested on !x86 as the
stuff currently in 2.4. And again, 2.4 is the stable tree.

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-07-01 18:49:58

by Ingo Molnar

Subject: Re: [OKS] O(1) scheduler in 2.4


On Mon, 1 Jul 2002, Bill Davidsen wrote:

> What's the issue? The most popular trees have been using it without
> issue for six months or so, and I know of no cases of bad behaviour.
> [...]

well, the patch is barely 6 months old. A new scheduler changes the
'heart' of the kernel and something like that should not be done for the
stable branch, especially since it has finally started to converge towards
a state that can be called stable ...

> [...] I know there are people who don't believe in the preempt patch,
> but the new scheduler seems to work better under both desktop and server
> load.

well, the preempt patch is rather for RT-type workloads where milliseconds
matter, whose improvements are not a matter of belief, but a matter of
hard latencies. Mere mortals should hardly notice its effects under normal
loads - perhaps a bit more 'snappiness'. But such effects do accumulate,
up, and people are seeing visible improvements with combo-patches of
lowlat-lockbreak+preempt+O(1).

Ingo

2002-07-01 23:42:16

by J.A. Magallon

Subject: Re: [OKS] O(1) scheduler in 2.4


On 2002.07.01 Tom Rini wrote:
>On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:
>
>> What's the issue?
>
>a) We're at 2.4.19-rc1 right now. It would be horribly
>counterproductive to put O(1) in right now.

.20-pre1 would be a good start, but my hope is that this is reserved for
the VM updates from -aa ;).

>b) 2.4 is the _stable_ tree. If every big change in 2.5 got back ported
>to 2.4, it'd be just like 2.5 :)

So you want to wait till 2.6.40 to be able to use an O(1) scheduler on a
kernel that does not eat up your drives? (say, next year by this same month...)

>c) I also suspect that it hasn't been as widley tested on !x86 as the
>stuff currently in 2.4. And again, 2.4 is the stable tree.
>

I know it is not a priority for 2.4, but don't say it will never happen...

--
J.A. Magallon \ Software is like sex: It's better when it's free
mailto:[email protected] \ -- Linus Torvalds, FSF T-shirt
Linux werewolf 2.4.19-rc1-jam1, Mandrake Linux 8.3 (Cooker) for i586
gcc (GCC) 3.1.1 (Mandrake Linux 8.3 3.1.1-0.7mdk)

2002-07-02 02:49:15

by Tom Rini

Subject: Re: [OKS] O(1) scheduler in 2.4

On Tue, Jul 02, 2002 at 01:44:32AM +0200, J.A. Magallon wrote:
>
> On 2002.07.01 Tom Rini wrote:
> >On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:
> >
> >> What's the issue?
> >
> >b) 2.4 is the _stable_ tree. If every big change in 2.5 got back ported
> >to 2.4, it'd be just like 2.5 :)
>
> So you want to wait till 2.6.40 to be able to use a O1 scheduler on a
> kernel that does not eat up your drives ? (say, next year by this same month...)

I assume you mean 2.4.60 here, and no, I don't think the O(1) scheduler should
go into 2.4 ever. We're aiming for a _stable_ series here. Let me
stress that again, _stable_. I'd hope that 2.4.60 is as slow in coming
as 2.0.40 is.

> >c) I also suspect that it hasn't been as widley tested on !x86 as the
> >stuff currently in 2.4. And again, 2.4 is the stable tree.
>
> I know it is not a priority for 2.4, but say it wil never happen...

I won't say it will never happen, just that I don't think it should.
It's a rather invasive thing (and as Ingo said, it's just now getting
stable).

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-07-02 14:49:57

by Bill Davidsen

Subject: Re: [OKS] O(1) scheduler in 2.4

On Mon, 1 Jul 2002, Tom Rini wrote:

> On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:
>
> > What's the issue?
>
> a) We're at 2.4.19-rc1 right now. It would be horribly
> counterproductive to put O(1) in right now.
> b) 2.4 is the _stable_ tree. If every big change in 2.5 got back ported
> to 2.4, it'd be just like 2.5 :)
> c) I also suspect that it hasn't been as widley tested on !x86 as the
> stuff currently in 2.4. And again, 2.4 is the stable tree.

Since 2.5 feature freeze isn't planned until fall, I think you can assume
there will be releases after 2.4.19... Since it has been as heavily tested
as any feature not in a stable release kernel can be, there seems little
reason to put it off for a year, assuming 2.6 releases within six months
of feature freeze.

Stable doesn't mean moribund; we are working Andrea's VM stuff in, and
that's a LOT more likely to behave differently on hardware with a different
word length. Keeping inferior performance for another year and then trying
to separate 2.5's other unintended features from any possible scheduler
issues seems like a reduction in stability for 2.6.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-02 15:10:26

by Bill Davidsen

Subject: Re: [OKS] O(1) scheduler in 2.4

On Mon, 1 Jul 2002, Ingo Molnar wrote:

>
> On Mon, 1 Jul 2002, Bill Davidsen wrote:
>
> > What's the issue? The most popular trees have been using it without
> > issue for six months or so, and I know of no cases of bad behaviour.
> > [...]
>
> well, the patch is barely 6 months old. A new scheduler changes the
> 'heart' of the kernel and something like that should not be done for the
> stable branch, especially since it has finally started to converge towards
> a state that can be called stable ...

As I noted, the VM changes which are going in without objection are more
likely to cause problems related to word length, memory organization, etc.
And they work fine, at least for Intel and SPARC. O(1) has been as tested
as any feature can be; certainly -ac kernels are run by more people than
2.5 kernels, and picking the best process to run is less likely to be
hardware dependent. There is a big win with this scheduler: it keeps the
system running far better on mixed loads, and does it without hours of
playing with nice() to get things balanced.

> > [...] I know there are people who don't believe in the preempt patch,
> > but the new scheduler seems to work better under both desktop and server
> > load.
>
> well, the preempt patch is rather for RT-type workloads where milliseconds
> matter, which improvements are not a matter of belief, but a matter of
> hard latencies. Mere mortals should hardly notice its effects under normal
> loads - perhaps a bit more 'snappiness'. But such effects do accumulate
> up, and people are seeing visible improvements with combo-patches of
> lowlat-lockbreak+preempt+O(1).

Last time I tried that, I used all but lockbreak, and the only place I saw
anything for my loads was slightly lower latency for a slow machine
playing router. But I'm running news and dns servers, and O(1) seems to
drop the load average by about 15% (as much as you can measure on a
machine with 400% swings in the demand ;-)

Thanks for the input; I just don't see that there will ever be a better
time to put it in. The 2.5 kernel is very lightly used and tested, and has
enough other things happening to mask anything short of a disaster. And
2.6 will be another stable kernel, at least numerically, initially with
much less testing.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-02 15:12:56

by Tom Rini

Subject: Re: [OKS] O(1) scheduler in 2.4

On Tue, Jul 02, 2002 at 10:46:56AM -0400, Bill Davidsen wrote:
> On Mon, 1 Jul 2002, Tom Rini wrote:
>
> > On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:
> >
> > > What's the issue?
> >
> > a) We're at 2.4.19-rc1 right now. It would be horribly
> > counterproductive to put O(1) in right now.
> > b) 2.4 is the _stable_ tree. If every big change in 2.5 got back ported
> > to 2.4, it'd be just like 2.5 :)
> > c) I also suspect that it hasn't been as widley tested on !x86 as the
> > stuff currently in 2.4. And again, 2.4 is the stable tree.
>
> Since 2.5 feature freeze isn't planned until fall, I think you can assume
> there will be releases after 2.4.19...

I sure hope so, I've got a whole bunch of PPC stuff that's been around
for ages now that just might make it into 2.4.20 :)

> Since it has been as heavily tested
> as any feature not in a stable release kernel can be, there seems little
> reason to put it off for a year, assuming 2.6 releases within six months
> of feature freeze.

Sure there is. It's called stopping feature creep. O(1) is a nice
feature, but so are the bio stuff, the initcall levels, and other things
in 2.5. But should we backport all of those to 2.4 as well?

> Stable doesn't mean moribund, we are working Andrea's VM stuff in, and
> that's a LOT more likely to behave differently on hardware with other word
> length.

Being someone who actually works on !x86 hardware all of the time, I'm
slightly wary of Andrea's VM work as well. But it's also something
which has been split into numerous small chunks, so hopefully problems
will be spotted.

> Keeping inferior performance for another year and then trying to
> separate 2.5 other unintended features from any possible scheduler issues
> seems like a reduction in stability for 2.6.

It's no more of a reduction in stability than not back porting
everything else. And making things stable is why eventually Linus says
'enough' and kicks out 2.stable.0-test1. Anyhow, since this isn't a
subsystem backport, but part of the core kernel, I would think that you
could only get limited use out of the testing (I remember reading some
of the O(1) announcements for 2.4.then-current and reading about small
bugs that weren't in the 2.5 version).

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-07-02 16:16:23

by Luigi Genoni

Subject: Re: [OKS] O(1) scheduler in 2.4

On Tue, 2 Jul 2002, J.A. Magallon wrote:

> On 2002.07.01 Tom Rini wrote:
> >On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:
> >
> >> What's the issue?
> >
> >a) We're at 2.4.19-rc1 right now. It would be horribly
> >counterproductive to put O(1) in right now.
>
> .20-pre1 would be a good start, but my hope is that this reserved for
> the vm updates from -aa ;).

If I am not wrong, in the -aa tree the O(1) scheduler has been merged, so
there is an opportunity to update both ;).


>
> >c) I also suspect that it hasn't been as widley tested on !x86 as the
> >stuff currently in 2.4. And again, 2.4 is the stable tree.
> >
>
Well, I think it has been subjected to high-quality testing, even if the
tester base was quite small...

2002-07-02 16:51:22

by Tomas Szepe

Subject: Re: [OKS] O(1) scheduler in 2.4

> > >> What's the issue?
> > >
> > >a) We're at 2.4.19-rc1 right now. It would be horribly
> > >counterproductive to put O(1) in right now.
> >
> > .20-pre1 would be a good start, but my hope is that this reserved for
> > the vm updates from -aa ;).
>
> If I am not wrong in the AA tree the O(1) scheduler has been merged, so
> there is an opportunity do update booth ;).


... and then hope the thing doesn't turn into a suicide booth.


T.

2002-07-03 07:08:00

by Rob Landley

Subject: Re: [OKS] O(1) scheduler in 2.4

On Monday 01 July 2002 10:48 pm, Tom Rini wrote:
> On Tue, Jul 02, 2002 at 01:44:32AM +0200, J.A. Magallon wrote:
> > On 2002.07.01 Tom Rini wrote:
> > >On Mon, Jul 01, 2002 at 01:52:54PM -0400, Bill Davidsen wrote:
> > >> What's the issue?
> > >
> > >b) 2.4 is the _stable_ tree. If every big change in 2.5 got back ported
> > >to 2.4, it'd be just like 2.5 :)
> >
> > So you want to wait till 2.6.40 to be able to use a O1 scheduler on a
> > kernel that does not eat up your drives ? (say, next year by this same
> > month...)
>
> I assume you mean 2.4.60 here, and no, I don't think O1 scheduler should
> go into 2.4 ever. We're aiming for a _stable_ series here. Let me

Ah, Monday morning virtue, overcompensating for 2.4.10. It's the hangover
speaking...

"We upgrade our kernel on a production machine without testing it first, and
we get mad if anything actually CHANGED. We want that upgrade to be a NOP,
darn it! We want it to be as if we never did it in the first place, that's
why we do it..."

If you want stone tablet stability, why the heck are you upgrading your
kernel? Downloading the new version off of kernel.org generally means you're
mucking about with a working box, making changes that are not 100% required.
If a security vulnerability comes out, you have the source and can patch the
specific bug in your version. (If you're not up to that, you're probably
using a vendor kernel, which is a whole 'nother can of worms.)

If you install new hardware or software, and it going "boing" would be a bad
thing, you try it on a scratch box first. If you don't, you deserve what you
get.

I'm under the impression 2.4.19 is introducing chunks of Andre Hedrick's new
IDE code. So it's ok to upgrade something that can, in case of a bug, eat
your data silently in a way that journaling won't detect. Why? LBA-48 and
ATA-133, of course. But scheduling, which is SUPPOSED to be
non-deterministic from day one and could theoretically be brain-dead round
robin without affecting anything but performance... That's not safe to
upgrade. Right.

If you have a race condition in your code that a new scheduler triggers,
ANYTHING could trigger it. 2.4.18 behaves horribly under load: try md5sum on
an ISO image and then pull up an xterm and su to another user. It can take
30 seconds. (Yeah, that's mostly IO starvation rather than the scheduler,
but still, how is the new scheduler going to do WORSE than this?)

The argument here is basically "don't change anything". It's not exactly a
series then, is it? If you want trailing edge, 2.0 is still being
maintained, let alone 2.2. Those have a great excuse for not accepting
anything new beyond a really obvious bugfix. 2.4 does not, because 2.6 isn't
out yet. Backporting of some things from 2.5 to 2.4 will occur until then,
and O(1) is an obvious eventual candidate.

> stress that again, _stable_. I'd hope that 2.4.60 is as slow in coming
> as 2.0.40 is.

So the fact that it's in Alan Cox's kernel (meaning Red Hat is shipping it in
2.4.18-5.55, meaning that if more people aren't actually USING it yet than
Marcelo's 2.4, they will be soon), and Andrea's kernel (meaning new VM
development is being done with it in mind)... It may not be "sufficiently
tested" yet but it's GETTING a lot of testing. You use anything EXCEPT a
stock vanilla 2.4, you're probably getting O(1) at this point.

If the vendors are starting to ship the thing already, what is the DOWN side
to integrating it? The down side to NEVER integrating it is eventually fewer
people using the kernel off of kernel.org.

Does this remind anybody else of the 0.90 software raid stuff? At some point
it makes more sense to keep the OLD one around as a patch for the 5% of the
community that doesn't want to upgrade. We're not there on the scheduler
yet, but "should not happen" without a qualifier means "never"...

> > >c) I also suspect that it hasn't been as widley tested on !x86 as the
> > >stuff currently in 2.4. And again, 2.4 is the stable tree.
> >
> > I know it is not a priority for 2.4, but say it wil never happen...
>
> I won't say it will never happen, just that I don't think it should.
> It's a rather invasive thing (and as Ingo said, it's just not getting
> stable).

Ingo's main objection was that the patch is only 6 months old, and that 2.4
is only now stabilizing and that bug squeezing and smoothing should be given
a little longer to ensure that people have the option of NOT upgrading, and
that those upgrading want improvements rather than critical "this just
doesn't work" fixes. And that's a fine argument.

But 2.6 isn't going to be out this year. It's not even having its first
freeze until October. Traditionally, we've been running a year and a half
between stable releases (and another six months to actually get the new one
battle-tested to where the distros and at least 50% of the production boxes
upgrade.) We've got a year to eighteen months left on that cycle. Are the
distros going to hold off adding it to 2.4 for a year to 18 months?

The real question is, how much MORE conservative than the distros should the
mainline kernels be?

Rob

2002-07-03 07:28:16

by Adrian Bunk

Subject: Re: [OKS] O(1) scheduler in 2.4

On Tue, 2 Jul 2002, Rob Landley wrote:

>...
> The real question is, how much MORE conservative than the distros should the
> mainline kernels be?

Your "the distros" are only a subset of all Linux distributions? E.g. the
2.4 kernel images in Debian (that will be in the next release of Debian)
are plain ftp.kernel.org kernels (no -ac or -aa kernels) with only very
few patches (read: bug fixes) applied.

> Rob

cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox



2002-07-03 08:35:39

by Ingo Molnar

Subject: Re: [OKS] O(1) scheduler in 2.4


On Tue, 2 Jul 2002, Rob Landley wrote:

> If you want stone tablet stability, why the heck are you upgrading your
> kernel? [...]

to get security and stability fixes.

> The argument here is basically "don't change anything". It's not
> exactly a series then, is it? If you want trailing edge, 2.0 is still
> being maintained, let alone 2.2. Those have a great excuse for not
> accepting anything new beyond a really obvious bugfix. 2.4 does not,
> because 2.6 isn't out yet. Backporting of somethings from 2.5 to 2.4
> will occur until then, and O(1) is an obvious eventual candidate.

it might be a candidate for inclusion once it has _proven_ stability and
robustness (in terms of tester and developer exposure), on the same order
of magnitude as the 2.4 kernel - but that needs time and exposure in trees
like the -ac tree and vendor trees. It might not happen at all during the
lifetime of 2.4.

Note that the O(1) scheduler isnt a security or stability fix, neither is
it a driver backport. It isnt a feature backport that enables hardware
that couldnt be used in 2.4 before. The VM was a special case because most
people agreed that it truly sucked, and even though people keep
disagreeing about that decision, the VM is in a pretty good shape now -
and we still have good correlation between the VM in 2.5, and the VM in
2.4. The 2.4 scheduler on the other hand doesnt suck for 99% of the
people, so our hands are not forced in any way - we have the choice of a
'proven-rock-solid good scheduler' vs. an 'even better, but still young
scheduler'.

if say 90% of Linux users on the planet adopt the O(1) scheduler, and in a
year or two there wont be a bigger distro (including Debian of course)
without the O(1) scheduler in it [which, admittedly, is happening
already], then it can and should perhaps be merged into 2.4. But right now
i think that the majority of 2.4 users are running the stock 2.4
scheduler.

> So the fact that it's in Alan Cox's kernel (meaning Red Hat is shipping
> it in 2.4.18-5.55, meaning that if more people aren't actually USING it
> yet than marcelo's 2.4, they will be soon), and andrea's kernel (meaning
> new VM development is being done with it in mind)... It may not be
> "sufficiently tested" yet but it's GETTING a lot of testing. You use
> anything EXCEPT a stock vanilla 2.4, you're probably getting O(1) at
> this point.

things like migration to a new kernel happen on a slightly slower scale
than the 6 months this patch has existed. I'd say in 1 year what you say
might be true. 70% of the Linux users are not running the 'very latest'
release.

also note that the O(1) scheduler patch in the Red Hat kernel rpm was a
stability fork done months ago, with stability fixes backported into it.
The 2.4 O(1) patches being distributed now are more like direct backports
of the 2.5 scheduler - this way we can get testing and feedback even from
those people who do not want to (or cannot) run a 2.5 kernel due to the
massive IO changes being underway.

i do not say that the O(1) scheduler has bugs (if i knew about any i'd
have fixed it already :), i am simply saying that to be able to say to
Marcelo "it does not have bugs and does not introduce problems" it needs
more exposure. [ And if the author of a given piece of code says things
like this then it usually does not get merged ;-) ]

> not there on the scheduler yet, but "should not happen" without a
> qualifier means "never"...

we agree here.

> The real question is, how much MORE conservative than the distros should
> the mainline kernels be?

There's a natural 'feature race' between distros, so the distros can act
as an additional (and pretty powerful) testing tool for various kernel
features - and for which the distros are willing to spend resources and
take risks as well. In fact they also act as a 'user demand' filter, for
kernel features as well. And if all distros pick up a given feature, and
it's been in for more than 6 months (instead of 'more than 6 months since
first patch'), then Marcelo will have a much easier decision :-)

Ingo

2002-07-04 03:39:36

by Bill Davidsen

Subject: Re: [OKS] O(1) scheduler in 2.4


> it might be a candidate for inclusion once it has _proven_ stability and
> robustness (in terms of tester and developer exposion), on the same order
> of magnitude as the 2.4 kernel - but that needs time and exposure in trees
> like the -ac tree and vendor trees. It might not happen at all, during the
> lifetime of 2.4.

It has already proven to be stable and robust in the sense that it isn't
worse than the stock scheduler on typical loads and is vastly better on
some.
>
> Note that the O(1) scheduler isnt a security or stability fix, neither is
> it a driver backport. It isnt a feature backport that enables hardware
> that couldnt be used in 2.4 before. The VM was a special case because most
> people agreed that it truly sucked, and even though people keep
> disagreeing about that decision, the VM is in a pretty good shape now -
> and we still have good correlation between the VM in 2.5, and the VM in
> 2.4. The 2.4 scheduler on the other hand doesnt suck for 99% of the
> people, so our hands are not forced in any way - we have the choice of a
> 'proven-rock-solid good scheduler' vs. an 'even better, but still young
> scheduler'.

Here I disagree. It sure behaves like a stability fix to me. On a system
with a mix of interactive and cpu-bound processes, including processes with
hundreds of threads, you just can't get reasonable performance balancing
with nice() because it is totally impractical to keep tuning a thread
which changes from hog to disk I/O to socket waits with a human in the
loop. The new scheduler notices this stuff and makes it work. I don't even
know for sure (as in, having tried it) whether you can have a different
nice value on threads of the same process.

This is not some neat feature to buy a few percent better this or that;
this is roughly 50% more users on the server before it falls over, and no
total bogs when many threads change to hog mode at once.

You will not hear me saying this about preempt, or low-latency, and I bet
that after I try lock-break this weekend I won't feel that I have to have
that either. The O(1) scheduler is self-defense against badly behaved
processes, and the reason it should go in mainline is so it won't depend
on someone finding the time to backport the fun stuff from 2.5 as a patch
every time.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-04 04:05:16

by Bill Davidsen

Subject: Re: [OKS] O(1) scheduler in 2.4

On Tue, 2 Jul 2002, Tom Rini wrote:

> Sure there is. It's called stopping feature creep. O(1) is a nice
> feature, but so is the bio stuff, the initcall levels, and other things
> in 2.5 as well. But should we back port all of these to 2.4 as well?

None of the other stuff (a) is a solution for any current problem I've
seen (it's NEW capability), or (b) has a functional and widely exposed port
to 2.4 already.

The only other feature I'm familiar with which even remotely fits
those two characteristics is rmap, and with the VM changes Andrea has made
I certainly don't hit really bad VM behaviour on my machines. On some, rmap
is a tad better, but compared to 2.4.16 or so 19-preX-aa is acceptable.

> > Stable doesn't mean moribund, we are working Andrea's VM stuff in, and
> > that's a LOT more likely to behave differently on hardware with other word
> > length.
>
> Being someone who actually works on !x86 hardware all of the time, I'm
> slightly warry of Andrea's VM work as well. But it's also something
> which has been split into numerous small chunks, so hopefully problems
> will be spotted.
>
> > Keeping inferior performance for another year and then trying to
> > separate 2.5 other unintended features from any possible scheduler issues
> > seems like a reduction in stability for 2.6.
>
> It's no more of a reduction in stability than not back porting
> everything else. And making things stable is why eventually Linus says
> 'enough' and kicks out 2.stable.0-test1. Anyhow, since this isn't a
> subsystem backport, but part of the core kernel, I would think that you
> could only get limited use out of the testing (I remember reading some
> of the O(1) announcments for 2.4.then-current and reading about small
> bugs that weren't in the 2.5 version).

The current scheduler has one big bug; it gives the processor to the wrong
process under some load conditions, to the point where the system appears
hung for seconds (or longer). There are two issues: one is best or
acceptable performance, and one is "best worst-case performance." The O(1)
scheduler simply doesn't have, or hasn't shown me, the jackpot case when
load changes on a machine. To me that justifies O(1). Even if it were not
faster than the current scheduler for normal load, the worst case is what
needs a fix.
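
(For readers who have not looked at the patch: the "O(1)" in the name
refers to the cost of picking the next task to run, which no longer
depends on how many processes are runnable. A minimal userspace sketch of
the idea - one run list per priority level plus a bitmap of non-empty
levels - with illustrative names, sizes and priorities rather than the
actual kernel code:)

/*
 * Minimal userspace sketch of the O(1) pick-next idea: one run list per
 * priority level plus a bitmap of non-empty levels, so choosing the next
 * task is a short bitmap scan no matter how many tasks are runnable.
 * Names, sizes and priorities are illustrative, not the kernel's.
 */
#include <stdio.h>
#include <string.h>

#define BITS_PER_LONG  (8 * (int)sizeof(unsigned long))
#define NPRIO          140   /* e.g. 100 "realtime" levels + 40 nice levels */
#define BITMAP_WORDS   ((NPRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct task { const char *name; struct task *next; };

struct runqueue {
    unsigned long bitmap[BITMAP_WORDS];
    struct task *queue[NPRIO];      /* one list per priority, 0 = highest */
};

static void enqueue(struct runqueue *rq, struct task *t, int prio)
{
    t->next = rq->queue[prio];
    rq->queue[prio] = t;
    rq->bitmap[prio / BITS_PER_LONG] |= 1UL << (prio % BITS_PER_LONG);
}

/* find the first (highest-priority) non-empty list via the bitmap */
static struct task *pick_next(struct runqueue *rq)
{
    int word, bit;

    for (word = 0; word < BITMAP_WORDS; word++) {
        if (!rq->bitmap[word])
            continue;
        for (bit = 0; bit < BITS_PER_LONG; bit++)
            if (rq->bitmap[word] & (1UL << bit))
                return rq->queue[word * BITS_PER_LONG + bit];
    }
    return NULL;                    /* nothing runnable */
}

int main(void)
{
    struct runqueue rq;
    struct task hog = { "cpu-hog" }, shell = { "admin-shell" };

    memset(&rq, 0, sizeof(rq));
    enqueue(&rq, &hog, 120);        /* an ordinary nice-0 task */
    enqueue(&rq, &shell, 100);      /* a reniced, higher-priority task */
    printf("next to run: %s\n", pick_next(&rq)->name);
    return 0;
}

The 2.5 code does the same thing with two such arrays per CPU (active and
expired) and an optimized find-first-bit, which is where the constant-time
claim comes from.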

And as I mentioned to Ingo, I don't feel that way about low-latency or
preempt, even though they help a little they don't really fix anything
broken, and I don't argue for inclusion. The current scheduler does behave
very badly in some cases, and should be fixed now, not in 18 months.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-04 04:18:46

by Tom Rini

Subject: Re: [OKS] O(1) scheduler in 2.4

On Thu, Jul 04, 2002 at 12:02:20AM -0400, Bill Davidsen wrote:

> On Tue, 2 Jul 2002, Tom Rini wrote:
>
> > Sure there is. It's called stopping feature creep. O(1) is a nice
> > feature, but so is the bio stuff, the initcall levels, and other things
> > in 2.5 as well. But should we back port all of these to 2.4 as well?
>
> None of the other stuff is (a) a solution for any current problem I've
> seen (it NEW capability), or (b) has a functional and widely exposed port
> to 2.4 already.

I believe (b), but bio is attempting to solve some of the underlying
block device issues. And all of the IDE stuff is trying to make a good
IDE subsystem. And so on and so forth.

> The only other feature which which I'm familiar which even remotely fits
> those two characteristics is rmap, and with the VM changes Andrea has made
> I certainly don't hit really bad VM behaviour on my machines. On some rmap
> is a tad better, but compared to 2.4.16 or so 19-preX-aa is acceptable.

And rmap isn't in 2.4. And I don't think it will be, nor IMHO some
parts of -aa.

> > It's no more of a reduction in stability than not back porting
> > everything else. And making things stable is why eventually Linus says
> > 'enough' and kicks out 2.stable.0-test1. Anyhow, since this isn't a
> > subsystem backport, but part of the core kernel, I would think that you
> > could only get limited use out of the testing (I remember reading some
> > of the O(1) announcments for 2.4.then-current and reading about small
> > bugs that weren't in the 2.5 version).
>
> The current scheduler has one big bug; it gives the processor to the wrong
> process under some load conditions to the point where the system appears
> hung for seconds (or longer).

So, in some corner cases it sucks. The VM has issues for corner cases
as well, which is why distros include lots of other patches in their
kernels.

> And as I mentioned to Ingo, I don't feel that way about low-latency or
> preempt, even though they help a little they don't really fix anything
> broken, and I don't argue for inclusion. The current scheduler does behave
> very badly in some cases, and should be fixed now, not in 18 months.

I don't think the low-latency, preempt or O(1) should make it into 2.4.
And since Ingo, who wrote this, doesn't think it should go into 2.4
right now, it hopefully won't.

Just because some corner cases can be fixed by massive rewrites doesn't
mean the fix should be backported. It seems I can't stress this enough:
2.4 is supposed to be _stable_. And by stable I mean doesn't crash,
lock up, or panic. Less-than-ideal VM usage or CPU usage generally
isn't solvable in small, easily verifiable patches the way fixing crashes,
lockups and panics is.

I'm not saying people shouldn't use O(1) (or preempt or low-latency or a
half dozen other things not in 2.4 proper), just that they shouldn't go
into 2.4.<current>. Vendors should decide if they want to add them on
top of a stable base. Users should decide if they want to add them on
top of a stable base.

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/

2002-07-04 06:56:25

by Ingo Molnar

Subject: Re: [OKS] O(1) scheduler in 2.4


On Wed, 3 Jul 2002, Bill Davidsen wrote:

> > it might be a candidate for inclusion once it has _proven_ stability and
> > robustness (in terms of tester and developer exposion), on the same order
> > of magnitude as the 2.4 kernel - but that needs time and exposure in trees
> > like the -ac tree and vendor trees. It might not happen at all, during the
> > lifetime of 2.4.
>
> It has already proven to be stable and robust in the sense that it isn't
> worse than the stock scheduler on typical loads and is vastly better on
> some.

this is your experience, and i'm happy about that. Whether it's the same
experience for 90% of Linux users, time will tell.

> > Note that the O(1) scheduler isnt a security or stability fix, neither is
> > it a driver backport. It isnt a feature backport that enables hardware
> > that couldnt be used in 2.4 before. The VM was a special case because most
> > people agreed that it truly sucked, and even though people keep
> > disagreeing about that decision, the VM is in a pretty good shape now -
> > and we still have good correlation between the VM in 2.5, and the VM in
> > 2.4. The 2.4 scheduler on the other hand doesnt suck for 99% of the
> > people, so our hands are not forced in any way - we have the choice of a
> > 'proven-rock-solid good scheduler' vs. an 'even better, but still young
> > scheduler'.
>
> Here I disagree. Sure behaves like a stability fix to me. On a system
> with a mix of interractive and cpu-bound processes, including processes
> with hundreds of threads, you just can't get reasonable performance
> balancing with nice() because it is totally impractical to keep tuning a
> thread which changes from hog to disk io to socket waits with a human in
> the loop. The new scheduler notices this stuff and makes it work, I
> don't even know for sure (as in tried it) if you can have different nice
> on threads of the same process.

(yes, it's possible to nice() individual threads.)
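
A minimal illustration of doing that from within the thread itself,
assuming the 2.4-era LinuxThreads model where each thread is a separately
scheduled process with its own PID; the thread body and the value 19 are
made up for the example:

/*
 * Sketch: renice one thread without touching its siblings. Under the
 * LinuxThreads model used with 2.4, each thread is its own schedulable
 * entity, so setpriority() from within the thread affects only it.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

static void *background_hog(void *arg)
{
    (void)arg;
    /* give this worker a weaker claim on the CPU than the other threads */
    if (setpriority(PRIO_PROCESS, 0, 19) == -1)
        fprintf(stderr, "setpriority: %s\n", strerror(errno));

    printf("worker (pid %d) now at nice %d\n",
           (int)getpid(), getpriority(PRIO_PROCESS, 0));
    /* ... CPU-bound work would go here ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, background_hog, NULL) != 0)
        return 1;
    pthread_join(tid, NULL);
    printf("main (pid %d) still at nice %d\n",
           (int)getpid(), getpriority(PRIO_PROCESS, 0));
    return 0;
}

(Build with -lpthread; under LinuxThreads the two printed PIDs differ,
which is exactly why renicing one thread leaves the others alone.)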

> This is not some neat feature to buy a few percent better this or that,
> this is roughly 50% more users on the server before it falls over, and
> no total bogs when many threads change to hog mode at once.

are these hard numbers? I havent seen much hard data yet from real-life
servers using the O(1) scheduler. There was lots of feedback from
desktop-class systems that behave better, but servers used to be pretty
good with the previous scheduler as well.

> You will not hear me saying this about preempt, or low-latency, and I
> bet that after I try lock-break this weekend I won't fell that I have to
> have that either. The O(1) scheduler is self defense against badly
> behaved processes, and the reason it should go in mainline is so it
> won't depend on someone finding the time to backport the fun stuff from
> 2.5 as a patch every time.

well, the O(1) scheduler indeed tries to put up as much defense against
'badly behaved' processes as possible. In fact you should try to start up
your admin shells via nice -20; that gives much more priority than it used
to under the previous scheduler - it's very close to the RT priorities,
but without the risks. This works in the other direction as well: nice +19
has a much stronger meaning (in terms of preemption and timeslice
distribution) than it used to.
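
To make "a much stronger meaning" concrete: the O(1) scheduler derives the
timeslice linearly from the nice level, roughly along the lines of the
sketch below; the constants are illustrative and the exact values changed
between versions of the patch.

/*
 * Rough illustration of a linear nice-to-timeslice mapping of the kind
 * the O(1) scheduler uses: nice -20 gets the longest slice, nice +19
 * the shortest. The min/max values here are illustrative only.
 */
#include <stdio.h>

#define MIN_TIMESLICE_MS  10
#define MAX_TIMESLICE_MS 300

static int timeslice_ms(int nice)   /* nice in -20..19 */
{
    return MIN_TIMESLICE_MS +
           (MAX_TIMESLICE_MS - MIN_TIMESLICE_MS) * (19 - nice) / 39;
}

int main(void)
{
    int nice;

    for (nice = -20; nice <= 19; nice += 13)
        printf("nice %+3d -> %3d ms timeslice\n", nice, timeslice_ms(nice));
    return 0;
}

The sample run shows the spread: nice -20 ends up with a 300 ms slice and
nice +19 with 10 ms under these illustrative constants.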

Ingo

2002-07-04 07:35:19

by Joe

Subject: Re: [OKS] O(1) scheduler in 2.4

Ingo, it's apparent you are refraining from
pushing this O(1) scheduler - that's admirable,
but don't swing too far in the other direction.

The fact is, it's working well in 2.5, it's working
well in the 2.4-ac tree, it's working well in the
2.4-aa tree, and Red Hat has been shipping it.

It will soon be the case that most Linux users
are using O(1) - thus any poor clown who
downloads the standard src from kernel.org
has a large task ahead of him if he wants
similar functionality to the majority of
linux users. This divergence may not be a
good thing...

;-)

Joe

2002-07-05 00:43:44

by Rob Landley

Subject: Re: [OKS] O(1) scheduler in 2.4

On Wednesday 03 July 2002 11:36 pm, Bill Davidsen wrote:

> This is not some neat feature to buy a few percent better this or that,
> this is roughly 50% more users on the server before it falls over, and no
> total bogs when many threads change to hog mode at once.
>
> You will not hear me saying this about preempt, or low-latency, and I bet
> that after I try lock-break this weekend I won't fell that I have to have
> that either. The O(1) scheduler is self defense against badly behaved
> processes, and the reason it should go in mainline is so it won't depend
> on someone finding the time to backport the fun stuff from 2.5 as a patch
> every time.

I've got a similar setup. At work I'm doing a simple ssh-based VPN:
connections to the VPN address range outside the local subnet are intercepted
by port forwarding to a tiny daemon (700 lines of C source, mostly comments)
that shells out to ssh (forwarding stdin and stdout back to the net
connection) to connect to the appropriate remote gateway, where it runs
netcat to complete the connection.

So each tcp/ip stream is individually wrapped in its own ssh process, which
exits automatically when the connection closes. No mess, no fuss, and
scalability is based on active connections rather than the number of systems
in the VPN.
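
Roughly, the per-connection core of such a daemon looks like the sketch
below. The listening port, the gateway host and the netcat target are
placeholders; a real daemon would also look up which gateway owns the
destination address, handle errors and log.

/*
 * Sketch of the per-connection core of a "wrap each TCP stream in its
 * own ssh" daemon: accept a redirected connection, fork, point the
 * child's stdin/stdout at the socket, and exec ssh running netcat on
 * the remote gateway. Host names, addresses and ports are placeholders.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define LISTEN_PORT 7000                /* where the redirect rule lands */

int main(void)
{
    int lsock, csock;
    struct sockaddr_in addr;

    signal(SIGCHLD, SIG_IGN);           /* let the kernel reap ssh children */

    lsock = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(LISTEN_PORT);
    if (bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) || listen(lsock, 16))
        return 1;

    for (;;) {
        csock = accept(lsock, NULL, NULL);
        if (csock < 0)
            continue;
        if (fork() == 0) {
            /* child: the socket becomes stdin/stdout of ssh, which runs
             * netcat on the far gateway to finish the hop */
            dup2(csock, 0);
            dup2(csock, 1);
            close(csock);
            close(lsock);
            execlp("ssh", "ssh", "-q", "gateway.example.com",
                   "nc", "10.1.2.3", "80", (char *)NULL);
            _exit(127);
        }
        close(csock);                   /* parent keeps only the listener */
    }
}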

Unfortunately, some of these VPN gateways are behind existing firewalls
(Cisco, etc.). If I can get a port forwarded to my VPN gateway from that
firewall, life is good (it's a little more work for the daemon to figure out
where to ssh to, but that's part of the 700 lines). But when I can't get
that, the machine has to dial out to a known public machine (the "star
server") and have its incoming data bounced off of that machine. (Evil, but
only incoming connections to those trapped machines need to use the star
server. Everybody else can still dial direct, and the trapped machines can
still dial out direct.)

The star server tends to be running LOTS of ssh processes (four for each
connection: one instance of sshd for each incoming connection, plus the
netbounce processes that sshd instance runs, which talk to one another
through named pipes. I could get that down to two processes by modifying
sshd to integrate the netbounce functionality, but it hasn't been a
bottleneck. Netbounce doesn't eat much; sshd is the real CPU hog. And it's
not as easy to rewrite netbounce to be one central process with a poll loop
as you'd think: sshd wants to run SOMETHING. So far I'm using standard sshd
code; I'd prefer not to make special-purpose modifications to the thing if
I can help it.)

The bottleneck is that with thirty big data transfers going through sixty
sshd processes (which are real CPU hogs decrypting incoming data and
encrypting outgoing data), a 700 MHz Athlon goes catatonic. The existing
bulk data-shoveling connections have their data shoveled fine, but new
incoming connections (even for short-lived "fetch me 10k of web data off the
remote box" type connections) are Not Happy. The existing scheduler's
getting confused by the fact that the sshd sessions DO sometimes block to
get/send their data, and isn't so good at keeping a running average to spot
the CPU hogs and the sessions that are more interactive or simply short lived.
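
That running average is essentially what the O(1) scheduler adds: each
task accumulates credit while it sleeps and burns it while it runs, and
the credit becomes a bonus or penalty of a few priority levels. A toy
version, with made-up constants and scaling:

/*
 * Toy version of the per-task running average the O(1) scheduler keeps:
 * time spent sleeping raises sleep_avg, time spent running burns it,
 * and the current value maps to a bonus/penalty of a few priority
 * levels around the nice-derived priority. Constants are made up.
 */
#include <stdio.h>

#define MAX_SLEEP_AVG_MS 2000   /* cap on accumulated interactivity credit */
#define MAX_BONUS          10   /* spread of +/- MAX_BONUS/2 priority levels */

struct task {
    int static_prio;            /* from nice; lower number = higher priority */
    int sleep_avg_ms;           /* running credit for time spent blocked */
};

static void account_sleep(struct task *t, int slept_ms)
{
    t->sleep_avg_ms += slept_ms;
    if (t->sleep_avg_ms > MAX_SLEEP_AVG_MS)
        t->sleep_avg_ms = MAX_SLEEP_AVG_MS;
}

static void account_run(struct task *t, int ran_ms)
{
    t->sleep_avg_ms -= ran_ms;
    if (t->sleep_avg_ms < 0)
        t->sleep_avg_ms = 0;
}

/* tasks that mostly sleep end up with a better (numerically lower) priority */
static int effective_prio(const struct task *t)
{
    int bonus = t->sleep_avg_ms * MAX_BONUS / MAX_SLEEP_AVG_MS;
    return t->static_prio - bonus + MAX_BONUS / 2;
}

int main(void)
{
    struct task bulk_sshd = { 120, 0 }, fresh_conn = { 120, 0 };

    account_run(&bulk_sshd, 1500);      /* always runnable: pure CPU hog */
    account_sleep(&fresh_conn, 1500);   /* mostly blocked on the network */

    printf("bulk sshd        -> effective prio %d\n", effective_prio(&bulk_sshd));
    printf("fresh connection -> effective prio %d\n", effective_prio(&fresh_conn));
    return 0;
}

The point for this workload is the sign of the result: the mostly-blocked
sessions come out ahead of the entrenched data shovelers.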

That's why I'm playing with the O(1) scheduler. I may need to put rate
limiting in netbounce anyway, but the problem I'm HITTING is that the
existing scheduler is melting down so badly that past a fairly low saturation
level, fresh connection attempts through the star server are timing out.
(This hardware seems like it should be able to handle around 100 simultaneous
connections, and it's currently melting down around 30.)

Yeah, I'm beating the CPU to death encrypting and decrypting data. Yeah, I
could throw more hardware at the problem (and will). I could take another
stab at redesigning the star server to consolidate all the netbounce
processes into a single poll loop (which would require modifying sshd), but
netbounce isn't the problem: the two sshd processes per connection are. (I
could merge all the connections to and from each box into a single sshd
process per gateway, but that clashes with the way the rest of the VPN works,
which is simple and surprisingly reliable, and there would still be at least
one per box anyway. And what that really MEANS is that I'd be bypassing the
process scheduler and doing my own manual scheduling.)

This is a real-world situation of a pure scheduling problem. The star server
has a quarter gigabyte of ram and isn't going anywhere near swap. The
scheduler has plenty of hints about CPU usage, blocking for I/O, and freshly
spawned processes needing to start at a higher priority than entrenched
saturation level data shovelers.

Hence putting "play with O(1)" on my to-do list...

Rob

2002-07-05 06:16:08

by Andrew Rodland

Subject: Re: [OKS] O(1) scheduler in 2.4

On Thu, 4 Jul 2002 08:56:01 +0200 (CEST)
Ingo Molnar <[email protected]> wrote:

> well, the O(1) scheduler indeed tries to put up as much defense
> against 'badly behaved' processes as possible. In fact you should try
> to start up your admin shells via nice -20, that gives much more
> priority than it used to under the previous scheduler - it's very
> close to the RT priorities, but without the risks. This works in the
> other direction as well: nice +19 has a much stronger meaning (in
> terms of preemption and timeslice distribution) than it used to.

Very nearly off topic, but I've had a few people on IRC tell me that
they love O(1) specifically because it has a 'nice that actually does
something'. As a matter of fact, I've had to change my X startup
scripts, to make it a bit less selfish; the defaults are just plain
silly, now.

I had thought before that I had a complaint about processes that spawn a
large number of children, and then reap them all at once, but it turns
out that I was just running myself out of memory while conducting the
test, and that if I avoid swapping, I don't run into any problems. I'm
running 2.4.19-pre10-ac2 + preempt + some little things, on a 400mhz
laptop, and it's just about as smooth as I could ask for.

As for O(1) in mainline, I think that it's better than what we've got.
But as for me, as long as O(1)-sched keeps moving, and AC keeps cranking
out the patches, I'll be happy. >:)

2002-07-05 06:54:30

by Adrian Bunk

Subject: Re: [OKS] O(1) scheduler in 2.4

On Fri, 5 Jul 2002, Andrew Rodland wrote:

>...
> Very nearly off topic, but I've had a few people on IRC tell me that
> they love O(1) specifically because it has a 'nice that actually does
> something'. As a matter of fact, I've had to change my X startup
> scripts, to make it a bit less selfish; the defaults are just plain
> silly, now.
>...

This is exactly a reason why O(1) shouldn't go into 2.4:

E.g. my X is niced to -10, as suggested by the installation routine of my
distribution (Debian unstable/testing). It would be a bad surprise for
_many_ people if they upgrade their 2.4 kernel because of other security
and/or stability fixes and such a setting is then wrong.


cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox



2002-07-05 07:00:12

by Andrew Rodland

Subject: Re: [OKS] O(1) scheduler in 2.4

On Fri, 5 Jul 2002 08:56:59 +0200 (CEST)
Adrian Bunk <[email protected]> wrote:

> On Fri, 5 Jul 2002, Andrew Rodland wrote:
>
> >...
> > Very nearly off topic, but I've had a few people on IRC tell me that
> > they love O(1) specifically because it has a 'nice that actually
> > does something'. As a matter of fact, I've had to change my X
> > startup scripts, to make it a bit less selfish; the defaults are
> > just plain silly, now.
> >...
>
> This is exactly a reason why O(1) shouldn't go into 2.4:
>
> E.g. my X is as suggested by my the installation routine of my
> distribution (Debian unstable/testing) niced to -10. It would be a bad
> surprise for _many_ people if they upgrade their 2.4 kernel because of
> other security and/or stability fixes and such a setting is then
> wrong.
>

Same setup, actually -- I changed it to -3 and it seems nicer.

As for it going into 2.4, well, I'm not incredibly strongly for it, but
I do get a feeling that most of the distros (especially the ones famous
for patching their kernels beyond recognizability) will start jumping on
this particular wagon soon. Does the kernel want to be like Debian
("Well, yeah, the releases are horribly out of date, but normal human
beings don't actually _use_ the releases")?

P.S. Do not suppose from this message that I do not love debian
immensely. :)

2002-07-05 09:14:06

by William Lee Irwin III

Subject: Re: [OKS] O(1) scheduler in 2.4

On Thu, Jul 04, 2002 at 08:56:01AM +0200, Ingo Molnar wrote:
> are these hard numbers? I havent seen much hard data yet from real-life
> servers using the O(1) scheduler. There was lots of feedback from
> desktop-class systems that behave better, but servers used to be pretty
> good with the previous scheduler as well.

I seem to recall some testing having been done demonstrating such
differences. I'll ask around when I get back from vacation, though I'll
confess it's far afield from my usual interests.


Cheers,
Bill

2002-07-05 11:21:26

by Bill Davidsen

Subject: Re: [OKS] O(1) scheduler in 2.4

Rob, while I'm sure O(1) would help, you have designed this network to
have a high overhead. I'll send you some notes on how to easily reduce the
overhead to at most one sshd per machine connected to the bounce machine.
And give you an option to move the crypt overhead to the machines at the
endpoints.

On Thu, 4 Jul 2002, Rob Landley wrote:
[...snip...]
> The bottleneck is that with thirty big data transfers going through sixty
> sshd processes (which are real CPU hogs decrypting incoming data and
> encrypting outgoing data), a 700 mhz athlon goes catatonic. The existing
> bulk data shoveling connections have their data shoveled fine, but new
> incoming connections (even for short lived "fetch me 10k of web data of the
> remote box" type connections) are Not Happy. The existing scheduler's
> getting confused by the fact that the sshd sessions DO sometimes block to
> get/send their data, and isn't so good at keeping a running average to spot
> the CPU hogs and the sessions that are more interactive or simply short lived.
>
> That's why I'm playing with the O(1) scheduler. I may need to put rate
> limiting in netbounce anyway, but the problem I'm HITTING is that the
> existing scheduler is melting down so badly that past a fairly low saturation
> level, fresh connection attempts through the star server are timing out.
> (This hardware seems like it should be able to handle around 100 simultaneous
> connections, and it's currently melting down around 30.)
>
> Yeah, I'm beating the CPU to death encrypting and decrypting data. Yeah, I
> could throw more hardware at the problem (and will). I could take another
> stab at redesigning the star server to consolidate all the netbounce
> processes into a single poll loop (which would require modifying sshd), but
> netbounce isn't the problem: the two sshd processes per connection are. (I
> could merge all the connections to and from each box into a single sshd
> process per gateway, but that clashes with the way the rest of the VPN works,
> which is simple and suprisingly reliable, and there would still be at least
> one per box anyway. And what that really MEANS is that I'd be bypassing the
> process scheduler and doing my own manual scheduling.)
>
> This is a real-world situation of a pure scheduling problem. The star server
> has a quarter gigabyte of ram and isn't going anywhere near swap. The
> scheduler has plenty of hints about CPU usage, blocking for I/O, and freshly
> spawned processes needing to start at a higher priority than entrenched
> saturation level data shovelers.
[...snip...]

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-05 21:04:57

by Rob Landley

Subject: Re: [OKS] O(1) scheduler in 2.4

On Friday 05 July 2002 07:17 am, Bill Davidsen wrote:
> Rob, while I'm sure O(1) would help, you have designed this network to
> have a high overhead.

The star server design has high overhead. The direct point to point does
not. I'm trying to make the star server work the way the rest of the network
works, WITHOUT extensively redesigning the parts of the network that don't
need to use the star server.

The star server's inherently a kludge, and I know it, and I'm trying to
minimize its use. It's inherently a single point of failure, and a
bottleneck on an otherwise distributed and scalable system, and it
potentially doubles bandwidth usage and generally MORE than doubles the
bandwidth bill because fast connections are expensive and often metered. No
redesign will fix those fundamental problems. The star server really exists
for political reasons: some people think that having a process behind the
firewall go out and fetch incoming connections is more secure than forwarding
a port. Either way, a way in exists by definition, or you haven't got a VPN.
On a technical level, I really do suggest just forwarding the port.

> I'll send you some notes on how to easily reduce the
> overhead to max one sshd per machine connected to the bounce machine. And
> give you an option to move the crypt overhead to the machines at the end
> points.

Thanks for your suggestions. I mentioned last time that I could redesign my
existing code in a number of ways to get around the old flawed scheduler,
yes. And there are easier ones than you suggested (I REALLY don't want to
over-complicate what is currently a very simple design). And redoing it
probably seems like a much easier problem to tackle when you don't know what
the full set of design requirements are. (Among other things, nodes move
dynamically. They go down, their IP address changes...)

I did stop and reconsider your suggestion about removing the star server's
redundant decrypt/re-encrypt step. It could be done without introducing a
ppp layer (which has several of the aforementioned design requirements
problems I won't go into here). Unfortunately, if I did that, the initial
handshaking a client box does with the star server (to identify itself and
the type of connection it wants to make, etc) wouldn't be encrypted or
cryptographically verified either (unless I did it myself, and right now all
the encryption is neatly handled by ssh, which I already mentioned not
wanting to modify). As I said, if O(1) doesn't work I have options. (I have
a long to-do list to get through first but I hope to be able to try it on a
stress-testable server sometime after the weekend.)

The other thing is that CPU usage should scale with bandwidth shoveled, and
that should be mostly true whether it's one process or 100. (Yeah, modulo
cache flushing, but it's the same process and the data it works on is a
use-once stream no matter how you look at it.) The star server is hooked up
to the internet, not a LAN. If it's got a faster-than-10-megabit connection
somebody's putting a LOT of money behind it; they could definitely afford to
throw SMP CPU time at the problem, and in that case having multiple processes
makes scaling easier. Having CPU usage be a limiting factor was acceptable
in the initial design, but the behavior under load of the old scheduler is a
bit... unexpected at times.

And at THIS point, the question is whether to redesign the app or fix the
scheduler. (I expect the multi-threaded people hit this all the time. :)

Rob

2002-07-06 04:34:01

by Bill Davidsen

[permalink] [raw]
Subject: Re: [OKS] O(1) scheduler in 2.4

On Fri, 5 Jul 2002, Rob Landley wrote:

> I did stop and reconsider your suggestion about removing the star server's
> redundant decrypt/re-encrypt step. It could be done without introducing a
> ppp layer (which has several of the aforementioned design requirements
> problems I won't go into here). Unfortunately, if I did that, the initial
> handshaking a client box does with the star server (to identify itself and
> the type of connection it wants to make, etc) wouldn't be encrypted or
> cryptographically verified either (unless I did it myself, and right now all
> the encryption is neatly handled by ssh, which I already mentioned not
> wanting to modify).

That's not correct... if you set the encryption type to none, the
connection and port forwarding are not encrypted, but the handshake still
is, using a password, a host key, or both. You can make a fully
authenticated non-encrypted connection. I like running the popular "sleep"
program as the main command, and using port forwarding for what you do,
since you reject running ppp over ssh.

I'm running 19-pre10ac2+smp patches; as I recall ac4 or 5 are out, but I
just stopped upgrading when I got stability. If you run uni you should be
able to drop in the new kernel, push the encryption overhead to the
endpoints, and have nearly no work on the star server.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-07 05:06:30

by Rob Landley

[permalink] [raw]
Subject: Re: [OKS] O(1) scheduler in 2.4

On Saturday 06 July 2002 12:31 am, Bill Davidsen wrote:

> That's not correct... if you set the encryption type to none the
> connection and port forwarding are not encrypted, but the handshake still
> is, using password, host key, or requiring both. You can make a fully
> authenticated non-encrypted connection. I like running the popular "sleep"
> program as the main command, and using port forwarding for what you do,
> since you reject running ppp over ssh.

I'm sending data through the connection to tell the star server who we are
(which sshd may know but the star server doesn't), and I don't want that
snooped: box identification keys and such that could be used to forge
access. Yes, I could redesign the entire handshaking protocol to be based on
an md5 sum with a timestamp, and redo the key distribution of all the other
boxes in my network to try to get ssh to inform us of which box is connecting
in, but I don't want to.
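
Purely as an illustration (this is not the actual code; the box id, the
shared per-box secret, and the token layout are all assumptions), here is a
minimal sketch of what an md5-sum-plus-timestamp handshake token could look
like. The only real API used is OpenSSL's MD5() from <openssl/md5.h>, linked
with -lcrypto.

/*
 * Hypothetical sketch of an md5+timestamp handshake token.  Not from
 * any real bounce daemon; names and layout are made up for illustration.
 */
#include <openssl/md5.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Build "boxid:timestamp:hexdigest", where the digest covers the box id,
 * the timestamp, and a shared per-box secret. */
static void make_token(const char *box_id, const char *secret,
                       char *out, size_t outlen)
{
        unsigned char digest[MD5_DIGEST_LENGTH];
        char hex[2 * MD5_DIGEST_LENGTH + 1];
        char buf[256];
        time_t now = time(NULL);
        int i;

        snprintf(buf, sizeof(buf), "%s:%ld:%s", box_id, (long)now, secret);
        MD5((const unsigned char *)buf, strlen(buf), digest);

        for (i = 0; i < MD5_DIGEST_LENGTH; i++)
                sprintf(hex + 2 * i, "%02x", digest[i]);

        snprintf(out, outlen, "%s:%ld:%s", box_id, (long)now, hex);
}

int main(void)
{
        char token[320];

        make_token("node17", "per-box-shared-secret", token, sizeof(token));
        printf("%s\n", token);
        return 0;
}

The receiving end would recompute the digest from the claimed box id, the
claimed timestamp, and its own copy of the secret, and reject anything whose
timestamp falls outside a small window, so a snooped token can't be replayed
indefinitely. That is roughly the redesign being declined here.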

The star server is a kludge. Period. Moving CPU usage won't change that.
The problem you're trying to solve ONLY affects the star server. The SIMPLE
solution is to put rate limiting into netbounce, which I specced out last
week before deciding O(1) would be a better solution.
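
For illustration only, and not the spec mentioned above: a minimal
token-bucket sketch of the kind of per-connection rate limiting a bounce
daemon could do. All of the names and numbers here are assumptions.

/*
 * Token-bucket sketch: each connection gets a bucket that refills at
 * 'rate' bytes per second up to a 'burst' ceiling; the shoveling loop
 * asks how much it may send before each write.
 */
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

struct bucket {
        double tokens;          /* bytes we may still send right now */
        double rate;            /* refill rate, bytes per second */
        double burst;           /* maximum bucket depth in bytes */
        struct timeval last;    /* when we last refilled */
};

static double tv_diff(const struct timeval *a, const struct timeval *b)
{
        return (a->tv_sec - b->tv_sec) + (a->tv_usec - b->tv_usec) / 1e6;
}

/* Refill the bucket, then report how many of 'want' bytes this connection
 * may shovel on this pass; a return of 0 means skip it and go service the
 * next connection. */
static size_t bucket_allow(struct bucket *bk, size_t want)
{
        struct timeval now;
        size_t grant;

        gettimeofday(&now, NULL);
        bk->tokens += bk->rate * tv_diff(&now, &bk->last);
        if (bk->tokens > bk->burst)
                bk->tokens = bk->burst;
        bk->last = now;

        grant = (size_t)bk->tokens;
        if (grant > want)
                grant = want;
        bk->tokens -= grant;
        return grant;
}

int main(void)
{
        /* Start with a full bucket: 64 KB/s sustained, 128 KB burst. */
        struct bucket bk = { .tokens = 128 * 1024, .rate = 64 * 1024,
                             .burst = 128 * 1024 };

        gettimeofday(&bk.last, NULL);
        printf("may send %zu bytes\n", bucket_allow(&bk, 32 * 1024));
        return 0;
}

The point of doing it this way is that fairness comes from the buckets, not
from the scheduler: a select()-style loop would simply round-robin past
connections whose bucket is empty, so one saturated stream can't starve the
others even when the box is CPU-bound.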

It's not the load that's the problem (a T1 line, DSL connection, or 10baseT
can't saturate it). It's the unfairness, which was found in laboratory
stress testing with a 100baseT network connection and a 700 MHz processor.
(If you can afford a 100baseT connection to the internet which you can keep
saturated for long periods of time, you can usually afford more than a 700
MHz processor. Really.)

Moving the limiting factor from CPU time (which is cheap) to network
bandwidth (which is expensive) makes the unfairness harder to fix anyway.
The O(1) scheduler gives the behavior we want; randomly dropping packets
because your network connection is saturated (which is the normal case in
the field today anyway) does not. If sheer bandwidth saturation causes a
similar problem in the second round of stress testing, I may have to put
rate limiting into netbounce anyway, although I'm hoping a combination of
processing latency and O(1) will do that for me even when the star server is
not CPU-limited but bandwidth-limited by real-world internet connections.
(I'm not holding my breath, but dealing with the current problem means I can
hold off for a while...)

You keep trying to find a way to fix the wrong problem: it's not the
bandwidth, it's the unfairness. O(1) is a quick and easy fix that might
address this (on Tuesday), and so far it's only a problem when a kludge I'm
trying to minimize the use of is stress-tested under laboratory conditions.
I only mentioned it in the first place as a real-world application of O(1)
to an existing problem, yes, one that could be solved in other ways.

I could get this thing to work under DOS if I wanted to, without any
scheduler at all; I just really don't consider it a good use of time. I
would happily have the star server do twice the work if it avoids adding
complexity and overhead to the nodes that are NOT using the star server.
Doing otherwise is optimizing the wrong thing: the star server is a bad idea
requested by management for customers who want to have a VPN without
configuring their firewalls to work with it. It -CAN'T- be efficient: it's a
single bottleneck for the entire network that sends every packet through the
same interface twice; the only question is how inefficient it will be.
I'm not rewriting the way the rest of the nodes work to coddle the star
server unless I have no choice, and I had about three alternatives lined up
to try before I decided that O(1) would be a cleaner and easier thing to try.

> I'm running 19-pre10ac2+smp patches; as I recall ac4 or 5 are out, but I
> just stopped upgrading when I got stability. If you run uni you should be
> able to drop in the new kernel, push the encryption overhead to the
> endpoints, and have nearly no work on the star server.

What the heck does the new kernel have to do with rewriting my app so that
ssh is used in a different manner? I could do that on 2.4.18 just fine,
and I've repeatedly said I don't want to, and going truly in-depth as to the
reasons WHY is off-topic here.

I got the O(1) patch for 19-rc1 and will be testing it on my laptop in a few
hours. I really don't want to continue this thread.

Rob

2002-07-07 10:58:07

by Bill Davidsen

[permalink] [raw]
Subject: Re: [OKS] O(1) scheduler in 2.4

On Sat, 6 Jul 2002, Rob Landley wrote:

> I got the O(1) patch for 19-rc1 and will be testing it on my laptop in a few
> hours. I really don't want to continue this thread.

Agreed; I haven't been able to communicate the idea to you in several
tries, or you would not be talking about rewriting your application (unless
changing the router IP requires that). You appear to be looking for the
best way to do something that doesn't need to be done, and I wish you joy
of it.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.