LinuxLists.cc - Realtime-preempt performs worse for many threads?

2005-11-01 20:02:45

Subject: Realtime-preempt performs worse for many threads?

Hi!

I've been developing some code for the OpenPBX project
(http://www.openpbx.org) and wrote a program to test how the system
responds when hundreds of threads are spawned. These threads run at
high priority (SCHED_FIFO) and use clock_nanosleep with absolute
timeouts on a 20ms loop cycle.
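
The test program itself was not posted, so the following is only a minimal sketch of the kind of loop described above: sleep until an absolute deadline, advance the deadline by 20ms, and record how late each wakeup was. The helper names (timespec_add_ns, timespec_diff_ns, run_periodic) and the choice of CLOCK_MONOTONIC are assumptions made for illustration, not details taken from the original code.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS    20000000L   /* 20 ms loop cycle, as described in the post */

/* Advance an absolute deadline by ns nanoseconds, keeping tv_nsec in range. */
void timespec_add_ns(struct timespec *t, long ns)
{
    assert(ns >= 0 && ns < NSEC_PER_SEC);
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Signed difference a - b in nanoseconds. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/* Hypothetical worker body: run `cycles` periods and return the worst
 * wakeup latency in nanoseconds, or -1 on error. */
int64_t run_periodic(int cycles)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < cycles; i++) {
        int rc;

        timespec_add_ns(&next, PERIOD_NS);
        /* Absolute timeout: sleeping until `next` avoids accumulating drift.
         * clock_nanosleep returns the error number directly, not -1. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        int64_t late = timespec_diff_ns(&now, &next);  /* wakeup latency */
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Each test thread would call something like run_periodic() and report the value it returns. On glibc older than 2.17 the clock functions live in librt, so the program needs to be linked with -lrt.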

With the stock 2.6.14 kernel, I get latencies in the order of several
milliseconds (but less than 20ms) when running 1250 threads
simultaneously. However, when I switch to a kernel patched with
realtime-preempt latency increases to several hundred milliseconds in
many cases.

When I only spawn 10 or so threads, realtime-preempt gives me
latencies of less than 1ms while the stock kernel still gives me a few
milliseconds. However, when the number of threads sleeping on
clock_nanosleep increases to several hundred, things just break.

Should I assume that realtime-preempt at this time is not ready to
deal with hundreds of realtime threads sleeping most of the time on
clock_nanosleep?

Any ideas on how to maybe debug this and see if there is some kind of problem?

Thanks!

Carlos

--
"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
-- Thomas Jefferson

2005-11-01 23:28:04

by Esben Nielsen

Subject: Re: Realtime-preempt performs worse for many threads?

On Tue, 1 Nov 2005, Carlos Antunes wrote:

> Hi!
>
> I've been developing some code for the OpenPBX project
> (http://www.openpbx.org) and wrote a program to test how the system
> responds when hundreds of threads are spawned. These threads run at
> high priority (SCHED_FIFO) and use clock_nanosleep with absolute
> timeouts on a 20ms loop cycle.
>
> With the stock 2.6.14 kernel, I get latencies in the order of several
> milliseconds (but less than 20ms) when running 1250 threads
> simultaneously. However, when I switch to a kernel patched with
> realtime-preempt latency increases to several hundred milliseconds in
> many cases.

There is only one explanation:
Some of the operations (task switch, nanosleep etc.) are more expensive in
the RT kernel. Thus your 1250 threads spend 100% CPU doing what they do.
You therefore get very bad latencies.

>
> When I only spawn 10 or so threads, realtime-preempt gives me
> latencies of less than 1ms while the stock kernel still gives me a few
> milliseconds. However, when the number of threads sleeping on
> clock_nanosleep increases to several hundred, things just break.
>
> Should I assume that realtime-preempt at this time is not ready to
> deal with hundreds of realtime threads sleeping most of the time on
> clock_nanosleep?
>

Well, apparently not. But who would want to do that in a _real_
application?
In practice, I would say that you cannot really guarantee latencies for
more than the 10-20 highest-priority threads. The probability that
those 10-20 threads run for a non-deterministically long time
becomes very high, especially if they are event-triggered: the events
could come in bursts. Therefore you can't consider the remaining threads "RT".
This is what happens in your case: as you increase the number of threads,
the CPU gets more and more to do at the highest priority. At some point
you hit 100% workload for hundreds of ms in a row. The next threads thus can't
meet their deadlines. As you put all your threads at the same priority,
nobody is really among the highest-priority threads and nobody obtains
deterministic latencies.

> Any ideas on how to maybe debug this and see if there is some kind of problem?
>

No bug, only extra overhead. The RT kernel has a lower _throughput_.
Apparently 1250 threads doing nanosleep for 20ms is its limit on your
machine.
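
As a back-of-the-envelope check on the saturation argument: 1250 threads that each wake once per 20ms period produce 62,500 wakeups per second. If all of that lands on a single CPU (an assumption, since the thread does not describe the hardware), each sleep/wake cycle would have to cost about 16 microseconds for the workload alone to reach 100% load. The function names below are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>

/* Wakeups per second produced by `threads` threads that each wake once
 * per `period_ms` milliseconds. */
double wakeups_per_sec(int threads, double period_ms)
{
    assert(threads > 0 && period_ms > 0.0);
    return threads * (1000.0 / period_ms);
}

/* CPU time available per wakeup, in microseconds, when `busy_fraction`
 * of one CPU (assumed single-CPU machine) is spent servicing wakeups. */
double budget_us_per_wakeup(int threads, double period_ms, double busy_fraction)
{
    return busy_fraction * 1e6 / wakeups_per_sec(threads, period_ms);
}
```

With the numbers from the original post, wakeups_per_sec(1250, 20.0) gives 62500 and budget_us_per_wakeup(1250, 20.0, 1.0) gives 16. Comparing that figure with a measured per-wakeup cost is one way to test whether overhead alone can account for the latencies.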

Esben


2005-11-02 00:29:42

by Carlos Antunes

Subject: Re: Realtime-preempt performs worse for many threads?

On 11/1/05, Carlos Antunes wrote:
> On 11/1/05, Esben Nielsen wrote:
> > On Tue, 1 Nov 2005, Carlos Antunes wrote:
> >
> > > Hi!
> > >
> > > I've been developing some code for the OpenPBX project
> > > (http://www.openpbx.org) and wrote a program to test how the system
> > > responds when hundreds of threads are spawned. These threads run at
> > > high priority (SCHED_FIFO) and use clock_nanosleep with absolute
> > > timeouts on a 20ms loop cycle.
> > >
> > > With the stock 2.6.14 kernel, I get latencies in the order of several
> > > milliseconds (but less than 20ms) when running 1250 threads
> > > simultaneously. However, when I switch to a kernel patched with
> > > realtime-preempt latency increases to several hundred milliseconds in
> > > many cases.
> >
> > There is only one explanation:
> > Some of the operations (task switch, nanosleep etc.) are more expensive in
> > the RT kernel. Thus your 1250 threads spend 100% CPU doing what they do.
> > You therefore get very bad latencies.
> >
>
> Esben,
>
> Thanks for replying. Let me challenge this assumption of yours, though.
>
> I just ran a test with those 1250 threads (all they do is sleep for
> 20ms, wake up, increment a number, and repeat the process). The CPU
> was 86% *IDLE* while running this. One thread took 1.3 seconds to wake
> up once. Do you think this is, well, normal, given how RT is supposed
> to operate?
>

Esben,

If, instead of SCHED_FIFO, I use SCHED_OTHER, I get a max latency on the
order of 13ms running those 1250 threads. With SCHED_FIFO (the only
change), I get 1.3 seconds. Does that make sense to you?
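
For anyone wanting to repeat the comparison, here is a minimal sketch of making the scheduling policy the only variable between runs. It is not the original test code: pick_priority and set_policy are hypothetical helpers, and choosing a mid-range priority for the realtime policies is an arbitrary assumption. Switching to SCHED_FIFO or SCHED_RR normally requires root or CAP_SYS_NICE.

```c
#include <assert.h>
#include <sched.h>

/* Pick a priority that is valid for `policy`: 0 for SCHED_OTHER, and an
 * arbitrary mid-range value for SCHED_FIFO / SCHED_RR.
 * Returns -1 if the policy is not recognised. */
int pick_priority(int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    return lo + (hi - lo) / 2;
}

/* Switch the caller to `policy`. On Linux, pid 0 means the calling thread.
 * Returns 0 on success, -1 on failure (errno is set by the kernel call). */
int set_policy(int policy)
{
    struct sched_param sp = { .sched_priority = pick_priority(policy) };

    if (sp.sched_priority < 0)
        return -1;
    return sched_setscheduler(0, policy, &sp);
}
```

Calling set_policy(SCHED_OTHER), set_policy(SCHED_FIFO) or set_policy(SCHED_RR) at the start of each worker keeps everything else in the test identical.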

Thanks!

Carlos


2005-11-02 00:31:15

by jmerkey

Subject: Re: Realtime-preempt performs worse for many threads?

Oink Oink. Verified.

Jeff


2005-11-02 00:33:04

by Esben Nielsen

Subject: Re: Realtime-preempt performs worse for many threads?

On Tue, 1 Nov 2005, Carlos Antunes wrote:

> On 11/1/05, Carlos Antunes wrote:
> > On 11/1/05, Esben Nielsen wrote:
> > > On Tue, 1 Nov 2005, Carlos Antunes wrote:
> > >
> > > > Hi!
> > > >
> > > > I've been developing some code for the OpenPBX project
> > > > (http://www.openpbx.org) and wrote a program to test how the system
> > > > responds when hundreds of threads are spawned. These threads run at
> > > > high priority (SCHED_FIFO) and use clock_nanosleep with absolute
> > > > timeouts on a 20ms loop cycle.
> > > >
> > > > With the stock 2.6.14 kernel, I get latencies in the order of several
> > > > milliseconds (but less than 20ms) when running 1250 threads
> > > > simultaneously. However, when I switch to a kernel patched with
> > > > realtime-preempt latency increases to several hundred milliseconds in
> > > > many cases.
> > >
> > > There is only one explanation:
> > > Some of the operations (task switch, nanosleep etc.) are more expensive in
> > > the RT kernel. Thus your 1250 threads spend 100% CPU doing what they do.
> > > You therefore get very bad latencies.
> > >
> >
> > Esben,
> >
> > Thanks for replying. Let me challenge this assumption of yours, though.
> >
> > I just ran a test with those 1250 threads (all they do is sleep for
> > 20ms, wake up, increment a number, and repeat the process). The CPU
> > was 86% *IDLE* while running this. One thread took 1.3 seconds to wake
> > up once. Do you think this is, well, normal, given how RT is supposed
> > to operate?
> >
>
> Esben,
>
> If, instead of SCHED_FIFO, I use SCHED_OTHER, I get a max latency on the
> order of 13ms running those 1250 threads. With SCHED_FIFO (the only
> change), I get 1.3 seconds. Does that make sense to you?
>

No. I can't explain that one. You might have discovered a bug in
SCHED_FIFO. What about SCHED_RR?

Esben
