It seems there are two low-latency projects out there. The one by Robert Love:
http://tech9.net/rml/linux/
and the original one:
http://www.uow.edu.au/~andrewm/linux/schedlat.html
Correct me if I'm wrong, but the former uses spinlocks to know when it can
preempt the kernel, and the latter just tries to reduce latency by adding
(un)conditional_schedule calls at key places in the kernel?
My questions are:
1) Which of these two projects has better latency performance? Has anyone
benchmarked them against each other?
2) Will either of these ever be merged into Linus' kernel (2.5?)
3) Is there a possibility that either of these will make it to non-x86
platforms? (for me: alpha) The second patch looks like it would
straightforwardly work on any arch, but the config.in for it is only in
arch/i386. Robert Love's patches would need some arch-specific asm...
Thanks,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
Bob McElrath wrote:
>
> It seems there are two low-latency projects out there. The one by Robert Love:
> http://tech9.net/rml/linux/
> and the original one:
> http://www.uow.edu.au/~andrewm/linux/schedlat.html
>
> Correct me if I'm wrong, but the former uses spinlocks to know when it can
> preempt the kernel, and the latter just tries to reduce latency by adding
> (un)conditional_schedule calls at key places in the kernel?
Pretty much. The second one also reorganises various areas of the
kernel which can traverse very long lists while holding spinlocks.
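The core of the reschedule-point approach is tiny. A minimal sketch of
the idea (need_resched is the real 2.4 task field; the helper name follows
the low-latency patch, and the surrounding loop is purely illustrative):

/* Sketch of an explicit reschedule point, 2.4-style. */
static inline void conditional_schedule(void)
{
        if (current->need_resched) {    /* someone wants the CPU */
                current->state = TASK_RUNNING;
                schedule();
        }
}

        /* ...then, inside a long-running kernel loop: */
        for (i = 0; i < nr_pages; i++) {
                zero_one_page(page++);          /* hypothetical slow per-page work */
                conditional_schedule();         /* bounds the latency of the loop */
        }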
> My questions are:
> 1) Which of these two projects has better latency performance? Has anyone
> benchmarked them against each other?
I haven't seen any rigorous latency measurements on Rob's stuff, and
I haven't seriously measured the reschedule-based patch for months. But
I would expect the preempt patch to perform significantly worse because
it doesn't attempt to break up the abovementioned long-held locks. (It can
do so, though - a straightforward adaptation of the reschedule patch's
changes will fix it).
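By "break up" I mean dropping and retaking the lock inside a long
traversal so that a reschedule point can run in between. A rough sketch
(all names here are made up, and a real version must cope with the list
changing while the lock is dropped):

spin_lock(&foo_lock);
while (!list_empty(&foo_list)) {
        struct foo *p = list_entry(foo_list.next, struct foo, list);

        list_del(&p->list);             /* detach so it is safe to drop the lock */
        spin_unlock(&foo_lock);         /* open a latency window */
        process_foo(p);                 /* hypothetical per-item work */
        conditional_schedule();         /* reschedule if someone is waiting */
        spin_lock(&foo_lock);           /* retake before touching the list */
}
spin_unlock(&foo_lock);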
> 2) Will either of these ever be merged into Linus' kernel (2.5?)
Controversial. My vague feeling is that they shouldn't. Here's
why:
The great majority of users and applications really only need
a mostly-better-than-ten-millisecond latency. This gives good
responsiveness for user interfaces and media streaming. This
can trivially be achieved with the current kernel via a thirty line
patch (which _should_ be applied to 2.4.x. I need to get off my
butt).
But the next rank of applications - instrumentation, control systems,
media production systems, etc. require 500-1000 usec latencies, and
the group of people who require this is considerably smaller. And their
requirements are quite aggressive. And maintaining that performance
with either approach is a fair bit of work and impacts (by definition)
the whole kernel. That's all an argument for keeping it offstream.
> 3) Is there a possibility that either of these will make it to non-x86
> platforms? (for me: alpha) The second patch looks like it would
> straightforwardly work on any arch, but the config.in for it is only in
> arch/i386. Robert Love's patches would need some arch-specific asm...
>
The rescheduling patch should work fine on any architecture - just copy
the arch/i386/config.in changes.
-
On October 6, 2001 08:46 am, Andrew Morton wrote:
> Bob McElrath wrote:
> > 1) Which of these two projects has better latency performance? Has anyone
> > benchmarked them against each other?
>
> I haven't seen any rigorous latency measurements on Rob's stuff, and
> I haven't seriously measured the reschedule-based patch for months. But
> I would expect the preempt patch to perform significantly worse because
> it doesn't attempt to break up the abovementioned long-held locks.
Nor should it. The preemption patch should properly address only what is
needed to implement preemption, and a patch similar to yours should be
applied on top to break up the remaining lock latencies. (Perhaps a duh?)
> (It can
> do so, though - a straightforward adaptation of the reschedule patch's
> changes will fix it).
Yep.
--
Daniel
Andrew Morton [[email protected]] wrote:
> Bob McElrath wrote:
> > 3) Is there a possibility that either of these will make it to non-x86
> > platforms? (for me: alpha) The second patch looks like it would
> > straightforwardly work on any arch, but the config.in for it is only in
> > arch/i386. Robert Love's patches would need some arch-specific asm...
>
> The rescheduling patch should work fine on any architecture - just copy
> the arch/i386/config.in changes.
I'm running it (2.4.10-pre4-low-latency) on my alpha now, so if you want to add
the appropriate magic to arch/alpha/config.in, please do.
Unfortunately, the use-once stuff broke again in the VM of this kernel, and now
I can't perceive any advantage from low latency because of all the swapping. :(
Cheers,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
On Fri, Oct 05, 2001 at 11:46:39PM -0700, Andrew Morton wrote:
> Bob McElrath wrote:
> > 2) Will either of these ever be merged into Linus' kernel (2.5?)
>
> Controversial. My vague feeling is that they shouldn't. Here's
> why:
>
> The great majority of users and applications really only need
> a mostly-better-than-ten-millisecond latency. This gives good
> responsiveness for user interfaces and media streaming. This
> can trivially be achieved with the current kernel via a thirty line
> patch (which _should_ be applied to 2.4.x. I need to get off my
> butt).
>
> But the next rank of applications - instrumentation, control systems,
> media production systems, etc. require 500-1000 usec latencies, and
> the group of people who require this is considerably smaller. And their
> requirements are quite aggressive. And maintaining that performance
> with either approach is a fair bit of work and impacts (by definition)
> the whole kernel. That's all an argument for keeping it offstream.
>
And exactly how is low latency going to hurt the majority?
This reminds me of when 4GB on ia32 was enough, or 16-bit UIDs, or...
Should those have been left out too just because the people who needed them
were few?
If the requirements for manufacturing control, or audio processing, etc.
will make my home box or my server work better, then why not include it?
On Sat, 2001-10-06 at 18:00, Mike Fedyk wrote:
> And exactly how is low latency going to hurt the majority?
The problem is people argue that a preemptible kernel lowers throughput
since I/O is now interrupted. Of course, if they fear that, maybe we
should switch to cooperative multitasking!
Anyhow, tests show the preemptible kernel has a negligible effect on
throughput -- in fact in some cases we improve it since over time we
better distribute system load. This is one reason why I ask for dbench
or bonnie benchmarks from the preemption users. Results are good.
The other concern is that added complexity is a Bad Thing, and I agree,
but the complexity of preemption is insanely low. In fact, since we use
so many preexisting constructs (such as SMP locks), it's practically
nothing.
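To give a feel for how little there is, the guts amount to a per-task
nesting count wrapped around the existing locking rules. A rough sketch,
not the actual patch (all names illustrative):

#define preempt_disable()       do { current->preempt_count++; } while (0)
#define preempt_enable()                                        \
        do {                                                    \
                if (--current->preempt_count == 0 &&            \
                    current->need_resched)                      \
                        preempt_schedule(); /* safe to switch */ \
        } while (0)

/* Spinlocks become preemption brackets, even on UP: */
#define spin_lock(lock)   do { preempt_disable(); _raw_spin_lock(lock); } while (0)
#define spin_unlock(lock) do { _raw_spin_unlock(lock); preempt_enable(); } while (0)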
> This reminds me of when 4GB on ia32 was enough, or 16-bit UIDs, or...
>
> Should those have been left out too just because the people who needed them
> were few?
Agreed.
> If the requirements for manufacturing control, or audio processing, etc.
> will make my home box or my server work better, then why not include it?
That is my thought process, too.
Robert Love
On Sat, 2001-10-06 at 02:05, Bob McElrath wrote:
> [...]
> Correct me if I'm wrong, but the former uses spinlocks to know when it can
> preempt the kernel, and the latter just tries to reduce latency by adding
> (un)conditional_schedule and placing it at key places in the kernel?
Correct. The low-latency patch does some other work to try to break up
huge routines, too.
> My questions are:
> 1) Which of these two projects has better latency performance? Has anyone
> benchmarked them against each other?
I suspect you will find a lower average latency with the preemption
patch. However, I suspect with the low-latency patch you may see a
lower maximum since it works on some of the terribly long-held lock
situations.
In truth, a combination of the two could prove useful. I have been
working on finding the worst-case non-preemption regions (longest held
lock regions) in the kernel.
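The measurement itself is straightforward: timestamp the acquire, check
at release, and remember the worst case. A sketch in the spirit of the
preempt-stats instrumentation (all names illustrative):

unsigned long long t0, held;
static unsigned long long worst_held;   /* worst hold time seen so far */

spin_lock(&foo_lock);
t0 = get_cycles();                      /* cycle counter at acquire */
/* ... the critical section being measured ... */
held = get_cycles() - t0;               /* cycles the lock was held */
spin_unlock(&foo_lock);

if (held > worst_held) {
        worst_held = held;
        printk("new worst-case hold: %Lu cycles\n", held);
}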
> 2) Will either of these ever be merged into Linus' kernel (2.5?)
I hope :)
> 3) Is there a possibility that either of these will make it to non-x86
> platforms? (for me: alpha) The second patch looks like it would
> straightforwardly work on any arch, but the config.in for it is only in
> arch/i386. Robert Love's patches would need some arch-specific asm...
Andrew's patch should work fine on all platforms, although I think the
configure statement is in the processor-specific section, so you will need
to copy it into arch/alpha/config.in.
The preemption patch has a small amount of arch-dependent code, but we
are working on supporting all architectures. 2.5...
Robert Love
On Sat, Oct 06, 2001 at 06:36:49PM -0400, Robert Love wrote:
> On Sat, 2001-10-06 at 02:05, Bob McElrath wrote:
> > 3) Is there a possibility that either of these will make it to non-x86
> > platforms? (for me: alpha) The second patch looks like it would
> > straightforwardly work on any arch, but the config.in for it is only in
> > arch/i386. Robert Love's patches would need some arch-specific asm...
>
> Andrew's patch should work fine on all platforms, although I think the
> configure statement is in the processor-specific section, so you will need
> to copy it into arch/alpha/config.in.
>
> The preemption patch has a small amount of arch-dependent code, but we
> are working on supporting all architectures. 2.5...
>
If you decide to provide support for PPC32 I'd be happy to test it on top of
a benh or bitkeeper kernel tree.
Mike
On Sat, 2001-10-06 at 02:46, Andrew Morton wrote:
> [...]
> > My questions are:
> > 1) Which of these two projects has better latency performance? Has anyone
> > benchmarked them against each other?
>
> I haven't seen any rigorous latency measurements on Rob's stuff, and
> I haven't seriously measured the reschedule-based patch for months. But
> I would expect the preempt patch to perform significantly worse because
> it doesn't attempt to break up the abovementioned long-held locks. (It can
> do so, though - a straightforward adaptation of the reschedule patch's
> changes will fix it).
We've gotten some great benchmarks (I originally asked all the users for
them); I would be happy to send some your way if I can dig them up.
Basically we saw average latency drop to under 5ms; 1ms in many cases.
Worst-case latency tended to be around 50ms, but we have measured locks
(using the preempt-stats patch) which are still in the way-too-long range.
I think preemption is a very natural and clean solution to the problem --
it's the way things should just be, anyhow.
Nonetheless, running a lock-breaking patch on top of preemption is
interesting. I am looking into doing this with the lock times I have
collected.
> > 2) Will either of these ever be merged into Linus' kernel (2.5?)
>
> Controversial. My vague feeling is that they shouldn't. Here's
> why:
>
> The great majority of users and applications really only need
> a mostly-better-than-ten-millisecond latency. This gives good
> responsiveness for user interfaces and media streaming. This
> can trivially be achieved with the current kernel via a thirty line
> patch (which _should_ be applied to 2.4.x. I need to get off my
> butt).
>
> But the next rank of applications - instrumentation, control systems,
> media production systems, etc. require 500-1000 usec latencies, and
> the group of people who require this is considerably smaller. And their
> requirements are quite aggressive. And maintaining that performance
> with either approach is a fair bit of work and impacts (by definition)
> the whole kernel. That's all an argument for keeping it offstream.
With preemption, we can gain the <10ms that most "regular" users want.
Without it, we don't have it.
With preemption, we can come super close to the 0.5-1ms latency (on
average) that the specialized groups you list want. With preemption and
perhaps some other work (something akin to your low-latency patch) we
can achieve it for sure ... perhaps better.
If we can achieve such great results, and keep throughput low, and do it
with such little complexity -- of course, after we prove all this -- why
not merge it? Anyhow, it's a configure option!
> [...]
Robert Love
On 6 Oct 2001, Robert Love wrote:
> If we can achieve such great results, and keep throughput low, and do it
^^^
> with such little complexity -- of course, after we prove all this -- why
> not merge it? Anyhow, it's a configure option!
heh.
-jwb
On Sat, 2001-10-06 at 22:38, Jeffrey W. Baker wrote:
> On 6 Oct 2001, Robert Love wrote:
>
> > If we can achieve such great results, and keep throughput low, and do it
> ^^^
> > with such little complexity -- of course, after we prove all this -- why
> > not merge it? Anyhow, it's a configure option!
>
> heh.
I guess we should aim to keep them high, eh?
Robert Love
Mike Fedyk wrote:
>On Fri, Oct 05, 2001 at 11:46:39PM -0700, Andrew Morton wrote:
> > But the next rank of applications - instrumentation, control systems,
> > media production systems, etc. require 500-1000 usec latencies, and
> > the group of people who require this is considerably smaller. And their
> > requirements are quite aggressive. And maintaining that performance
> > with either approach is a fair bit of work and impacts (by definition)
> > the whole kernel. That's all an argument for keeping it offstream.
> >
>
> And exactly how is low latency going to hurt the majority?
>
> This reminds me of when 4GB on ia32 was enough, or 16-bit UIDs, or...
Low latency obviously won't do damage by itself. But Andrew Morton
said it well: "And maintaining that performance
with either approach is a fair bit of work and impacts (by definition)
the whole kernel."
I.e. it is too much work to get right (and keep right). The number
of developers is finite, and their time can be better spent on other
improvements. All future improvements will be harder if we also have
to _maintain_ extreme low latency. This is not a fix-it-once thing.
Helge Hafting
Helge Hafting wrote:
>
> Mike Fedyk wrote:
> >On Fri, Oct 05, 2001 at 11:46:39PM -0700, Andrew Morton wrote:
> > > But the next rank of applications - instrumentation, control systems,
> > > media production systems, etc. require 500-1000 usec latencies, and
> > > the group of people who require this is considerably smaller. And their
> > > requirements are quite aggressive. And maintaining that performance
> > > with either approach is a fair bit of work and impacts (by definition)
> > > the whole kernel. That's all an argument for keeping it offstream.
> > >
> >
> > And exactly how is low latency going to hurt the majority?
> >
> > This reminds me of when 4GB on ia32 was enough, or 16-bit UIDs, or...
>
> Low latency obviously won't do damage by itself. But Andrew Morton
> said it well: "And maintaining that performance
> with either approach is a fair bit of work and impacts (by definition)
> the whole kernel."
>
> I.e. it is too much work to get right (and keep right). The number
> of developers is finite, and their time can be better spent on other
> improvements. All future improvements will be harder if we also have
> to _maintain_ extreme low latency. This is not a fix-it-once thing.
>
Well, no, but do we want to improve as kernel writers, or just stay
"hackers"? If low latency was a concern the same way lack of dead locks
and avoiding OOPs is today, don't you think we would be better coders?
As for me, I want to shoot for the higher goal. Even if I miss, I will
still have accomplished more than if I had shot for the mundane.
George
george anzinger wrote:
>
> Well, no, but do we want to improve as kernel writers, or just stay
> "hackers"? If low latency was a concern the same way lack of dead locks
> and avoiding OOPs is today, don't you think we would be better coders?
> As for me, I want to shoot for the higher goal. Even if I miss, I will
> still have accomplished more than if I had shot for the mundane.
Right. It needs to be a conscious, planned decision: "from now on,
holding a lock for more than 500 usecs is a bug".
So someone, be it Linus, "the community" or my Mum needs to decide
that this is a feature which the kernel will henceforth support.
It's a new feature - it should be treated as such.
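Once that decision is made, it can even be checked mechanically: a debug
build could timestamp every acquire and complain at release. A sketch
only (assumes a cycle counter; cpu_khz is the x86 clock rate in kHz, so
cpu_khz cycles is one millisecond's worth; the wrapper is hypothetical):

#define MAX_HOLD_CYCLES ((unsigned long long)cpu_khz / 2)   /* ~500 usec */

static inline void checked_spin_unlock(spinlock_t *lock,
                                       unsigned long long acquired)
{
        unsigned long long held = get_cycles() - acquired;

        spin_unlock(lock);
        if (held > MAX_HOLD_CYCLES)
                printk(KERN_WARNING "lock held %Lu cycles (> 500 usec)\n",
                       held);
}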
-
> Right. It needs to be a conscious, planned decision: "from now on,
> holding a lock for more than 500 usecs is a bug".
Firstly you can start with "of course some hardware will stall the bus
longer than that"
Alan Cox ([email protected]) wrote:
> > Right. It needs to be a conscious, planned decision: "from now on,
> > holding a lock for more than 500 usecs is a bug".
>
> Firstly you can start with "of course some hardware will stall the bus
> longer than that"
So?
Some hardware miscalculates certain floating point operations,
but we still use FPUs.
Some ethernet cards corrupt the packets, but Linux still supports ethernet.
Some IDE hard drives lock up in DMA mode, yet Linux still supports DMA.
Some softmodems don't work at all, yet Linux still supports modems.
There is always buggy hardware in every category. No reason not to use the good ones.
david, just being a PITA ...
--
David Balazic
--------------
"Be excellent to each other." - Bill S. Preston, Esq., & "Ted" Theodore Logan
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -