2001-10-10 00:36:51

by Ed Sweetman

[permalink] [raw]
Subject: 2.4.10-ac10-preempt lmbench output.

I'm very pleased so far with ac10 with the preempt patch. Much better than
2.4.9-ac18-preempt, which is what I was using. I'm just going to put up some
output from lmbench to see if anyone who is running the non-preempt version
is seeing better or worse timings and scores. Perhaps the improvement is
all in my head due to me moving my ATAPI devices off of the Promise card
(since you're not supposed to put any on it) and now everything is generally
running faster regardless of the kernel being used. Heh. So here they are:

ftp://ftp.bitmover.com/lmbench/LMbench2.tgz
(top of README says lmbench 2alpha8)
compiled without any changes to the Makefile (gcc 2.95.4)

Simple syscall: 0.3226 microseconds
Simple read: 0.8185 microseconds
Simple write: 0.5791 microseconds
Simple stat: 3.7546 microseconds
Simple open/close: 5.6581 microseconds
lat_fs (ext2)
0k 1000 36770 123993
1k 1000 15526 74383
4k 1000 15202 73692
10k 1000 9124 51972
FIFO latency: 8.0457 microseconds
Signal handler installation: 0.932 microseconds
Signal handler overhead: 2.852 microseconds
Protection fault: 0.761 microseconds
Pipe latency: 7.9139 microseconds
Pagefaults on /something.avi: 13098 usecs
Process fork+exit: 249.6818 microseconds
Process fork+execve: 298.0000 microseconds
Process fork+/bin/sh -c: 7883.0000 microseconds
AF_UNIX sock stream latency: 11.0054 microseconds
Select on 200 tcp fd's: 62.7955 microseconds
Select on 200 fd's: 18.5960 microseconds
Fcntl lock latency: 7.3516 microseconds
lat_ctx on an Eterm process
"size=0k ovr=2.82
"size=1024k ovr=301.96

That's all from lmbench2. Anyone without the preempt patch using the same
kernel care to compare? I'm very pleased.
Heavily I/O-bound processes (dbench 32) still cause something as light as an
mp3 player to skip, though. That probably won't be fixed until 2.5, since
you need to have preemption in the VM and the rest of the kernel.


2001-10-10 01:18:27

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> mp3 player to skip, though. That probably wont be fixed intil 2.5, since
> you need to have preemption in the vm and the rest of the kernel.

xmms skipping during I/O should have nothing to do with preemption.

As Alan noted, for the ring of DMA fragments to expire you need a
scheduler latency on the order of seconds. Right now (assuming the low-latency
points in the read/write paths), when we have bad latencies under writes they
are on the order of 10 msec, and they can be brought down further by putting
preemption checks in the buffer LRU list write paths.

The reason xmms skips, I believe, is that the VM is doing write
throttling. I have at least one idea on how to fix it, but it has nothing
to do with preemption in the VM or any other scheduler-related
thing.

So I wouldn't expect the preemptive patch (or similar) to fix any playback
skips where buffering is possible. It's nearly impossible that it makes any
difference.

The preemptive patch can matter only if you're doing real-time signal
processing, where buffering of any kind isn't possible.

Andrea

2001-10-10 02:01:47

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, 2001-10-09 at 20:36, safemode wrote:
> I'm very pleased so far with ac10 with the preempt patch. Much better than
> 2.4.9-ac18-preempt, which is what i was using. I'm just going to put up some
> output from lmbench to see if anyone who is running the non-preempt version
> is seeing better or worse timings and scores. Perhaps the improvement is
> all in my head due to me moving my atapi devices off of the promise card
> (since you're not supposed to put any on it) and now everything is generally
> running faster despite the kernel being used. Heh. so here they are

I've noticed good improvements on 2.4.10-ac10, too. You may want to try
Rik's eatcache patch, available at
http://www.surriel.com/patches/2.4/2.4.10-ac9-eatcache - it does a
noticeable job of preventing the cache thrashing that occurs during
heavy cache activity. This should result in less VM activity, and
thus less lock-held time. He can use the feedback to tune it
better.

Also, you will really want to run lmbench on 2.4.10-ac10-nopreempt
yourself. While a lot of lmbench is fairly kernel-specific, it is not
machine-agnostic: a faster CPU will change almost every
result.

Robert Love

2001-10-10 02:09:37

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, 2001-10-09 at 21:18, Andrea Arcangeli wrote:
> xmms skips during I/O should have nothing to do with preemption.

Why does the preemption patch make a difference for me, then? I'm not doing
anything even remotely close to real-time processing.

> As Alan noted for the ring of dma fragments to expire you need a
> scheduler latency of the order of seconds, now (assuming the ll points
> in read/write paths) when we've bad latencies under writes it's of the
> order of 10msec and it can be turned down further by putting preemption
> checks in the buffer lru lists write paths.

Isn't mp3 decoding done `just in time', i.e. we decode x and buffer x,
decode y and buffer y... hopefully quickly enough that it sounds
like coherent sound? Thus, if the task cannot be scheduled as
required, there are noticeable gaps in the sound, not because the
sound card buffer ran dry but because the mp3 couldn't even be decoded
to fill the buffer!
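
For concreteness, a minimal sketch of the kind of decode-and-buffer loop
meant here (not actual xmms/freeamp code; decode_next_frame() is a
hypothetical decoder call, and /dev/dsp is the OSS audio device). If this
loop isn't scheduled often enough, the device buffer drains and playback
skips:

#include <fcntl.h>
#include <unistd.h>

#define FRAME_BYTES 4608	/* one decoded frame: 1152 samples, 16-bit stereo */

extern int decode_next_frame(unsigned char *pcm);	/* hypothetical */

int play(void)
{
	unsigned char pcm[FRAME_BYTES];
	int dsp = open("/dev/dsp", O_WRONLY);
	int n;

	if (dsp < 0)
		return -1;
	while ((n = decode_next_frame(pcm)) > 0) {
		/* write() blocks when the driver's DMA ring is full;
		 * the player sleeps until a fragment drains. */
		if (write(dsp, pcm, n) < 0)
			break;
	}
	close(dsp);
	return 0;
}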

Anyhow, if we have latencies of 10ms (and in reality there are higher
latencies, too), these can cause the sort of system response scenarios
that are a problem. Preemption makes these latencies effectively 0
(outside of locks).

Robert Love

2001-10-10 02:09:28

by Ed Sweetman

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tuesday 09 October 2001 21:18, Andrea Arcangeli wrote:
> On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> > mp3 player to skip, though. That probably wont be fixed intil 2.5,
> > since you need to have preemption in the vm and the rest of the kernel.
>
> xmms skips during I/O should have nothing to do with preemption.
>
> As Alan noted for the ring of dma fragments to expire you need a
> scheduler latency of the order of seconds, now (assuming the ll points
> in read/write paths) when we've bad latencies under writes it's of the
> order of 10msec and it can be turned down further by putting preemption
> checks in the buffer lru lists write paths.
>
> The reason xmms skips I believe is because the vm is doing write
> throttling. I've at least one idea on how to fix it but it has nothing
> to do with preemption in the VM or whatever else scheduler related
> thing.
>
> So I wouldn't expect to fix any playback skips where buffering is
> possible by using the preemptive patch etc.. It's nearly impossible that
> it makes any difference.
>
> The preemptive patch can matter only if you're doing real time signal
> processing where any kind of buffering isn't possible.
>
> Andrea

That's what I would think too at first. What's confusing me is the fact that
it is affected by priority, which means preemption can solve the problem.
If I run the mp3 player at nice -n -20, I get no skips. Why else would that
be, if not that preemption is dictating that freeamp's process gets whatever
it wants when it wants?
My question is why freeamp needs to be at -20 nice just to do its thing,
when logic dictates that it should be dbench that skips and is throttled down
when run at the same priority as freeamp, and not freeamp, since freeamp isn't
trying to abuse its resources.
I mean, if renicing the process allows it not to skip, what else is going on
if it's not preemption? Isn't that the purpose of priorities -
preempting lower priorities? Or is nice something totally different,
separate from priorities?
I'm not exactly seeing how renicing it to -20, and the kernel letting freeamp
do what it wants over anything else, does not fall under the definition of
preemption. It can't possibly have nothing to do with preemption if
preemption directly affects it.

Ok, so maybe I'm wrong and it has nothing to do with preemption; if so, what
exactly is allowing freeamp to play perfectly when run with nice -n -20 and
not at normal 0? And why is that the default behavior of the kernel? It
seems quite unfair in a multiuser, multiprocessing system.

2001-10-10 02:29:52

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 10:09:33PM -0400, safemode wrote:
> On Tuesday 09 October 2001 21:18, Andrea Arcangeli wrote:
> > On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> > > mp3 player to skip, though. That probably wont be fixed intil 2.5,
> > > since you need to have preemption in the vm and the rest of the kernel.
> >
> > xmms skips during I/O should have nothing to do with preemption.
> >
> > As Alan noted for the ring of dma fragments to expire you need a
> > scheduler latency of the order of seconds, now (assuming the ll points
> > in read/write paths) when we've bad latencies under writes it's of the
> > order of 10msec and it can be turned down further by putting preemption
> > checks in the buffer lru lists write paths.
> >
> > The reason xmms skips I believe is because the vm is doing write
> > throttling. I've at least one idea on how to fix it but it has nothing
> > to do with preemption in the VM or whatever else scheduler related
> > thing.
> >
> > So I wouldn't expect to fix any playback skips where buffering is
> > possible by using the preemptive patch etc.. It's nearly impossible that
> > it makes any difference.
> >
> > The preemptive patch can matter only if you're doing real time signal
> > processing where any kind of buffering isn't possible.
> >
> > Andrea
>
> That's what i would think too at first. What's confusing me is the fact that
> it is affected by priority. Which means preemption can solve the problem.
> If i run the mp3 player at nice -n -20, i get no skips. Why else would that

As Dan Mann noted privately, there's of course also the possibility that
the scheduler scheduled xmms away for a long time because of a very
high CPU load; this seems confirmed since the skips go away with nice
-n -20.

> be if not that preemption is dictating that freeamp's process gets whatever
> it wants when it wants ?

If nice -n -20 fixes the problem, then it has nothing to do with scheduler
latency or with the write throttling, and the preemption patch cannot
help at all.

If nice -n -20 fixes the problem it simply means your CPU load was too high.
The Linux scheduler is fair. So to fix it there are only these possible
ways:

1) buy a faster CPU
2) add additional CPUs to your system
3) reduce the CPU load of your system by stopping some of the CPU
eaters
4) run xmms RT or with higher priority (or reduce the priority of the
other CPU hogs) - see the sketch below
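
A hedged sketch of what the "run xmms RT" part of option 4 amounts to (not
code from any particular player; switching to SCHED_FIFO needs root, and a
runaway SCHED_FIFO task can lock up the box, so renicing is usually the
safer choice):

#include <stdio.h>
#include <sched.h>
#include <sys/types.h>

int make_realtime(pid_t pid)
{
	struct sched_param sp;

	sp.sched_priority = 1;	/* the lowest RT priority is enough */
	if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0) {
		perror("sched_setscheduler");	/* typically EPERM without root */
		return -1;
	}
	return 0;
}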

As said, it's very, very unlikely that preemption points can fix xmms
skips anyway; the worst scheduler latency is always on the order of
milliseconds, and to generate skips you need a latency of seconds.

I thought your problem wasn't just xmms being scheduled away due to high
CPU load, because dbench is intended to be an I/O benchmark, but maybe
you have lots of cache and you do little actual I/O?

The problem I was talking about in my earlier email applies to RT tasks
too, so if you were doing lots of I/O and xmms started doing write
throttling, just running nice -n -20 wouldn't have helped.

> I mean, if renicing the process allows it not to skip, what else is going on

The reason it allows it not to skip is that the scheduler gives more
CPU to xmms.

There's nothing magic in the software: if you divide the CPU into 10 parts
and give 1/10 of the CPU to xmms, but xmms needs 1/2 of the CPU to
play your .mp3, then there's nothing you can do to fix it except tell
the scheduler to give more CPU to xmms (renicing to -20 gives more CPU
to xmms, enough to sustain the .mp3 decoding without dropouts).

> Ok, so maybe i'm wrong and it has nothing to do with preemption, if then what

Correct, it has nothing to do with preemption.

> not at normal 0. And why is that the default behavior of the kernel ? It
> seems quite unfair in a multiuser-multiprocessing system.

On the contrary, the scheduler is fair, so it divides the CPU among all the
tasks in your system. If xmms didn't skip, the scheduler wouldn't be fair,
and as you say that would be very bad in a multiuser system.

Andrea

2001-10-10 02:37:23

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, 2001-10-09 at 22:30, Andrea Arcangeli wrote:
> As said it's very very unlikely that preemption points can fix xmms
> skips anyways, the worst scheduler latency is always of the order of the
> msecs, to generate skips you need a latency of seconds.
>
> [...]
>
> There's nothing magic in the software, if you divide the cpu in 10 parts
> and you give 1/10 of the cpu to xmms, but xmms needs 1/2 of the cpu to
> play your .mp3 then there's nothing you can do to fix it but to tell
> the scheduler to give more cpu to xmms (renicing to -20 gives more cpu

What if the CPU does divide its time into two 1/2 parts, and gives one
each to xmms and dbench? Everything runs fine, since xmms needs 1/2 of the
CPU to play without skips.

Now dbench (or any task) is in kernel space for too long. The CPU time
xmms needs will of course still be given, but _too late_. It's not just
a CPU resource problem, it's a timing problem. xmms needs x units of CPU
every y units of time. Just getting the x whenever is not enough.

With the preempt-kernel patch, the long-lasting kernel-space activity dbench
is engaged in won't hog the CPU until it completes. When xmms is ready
(time y arrives), the scheduler will give it the CPU.

Robert Love

2001-10-10 02:51:27

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 10:10:26PM -0400, Robert Love wrote:
> On Tue, 2001-10-09 at 21:18, Andrea Arcangeli wrote:
> > xmms skips during I/O should have nothing to do with preemption.
>
> Why does preemption patch make a difference for me, then? I'm not doing
> anything even remotely close to real-time processing.

I don't see how it can make any difference with buffered playback of
data. It doesn't make sense that it makes any difference.

The only factors that can generate xmms skips are:

1) VM write throttling: xmms blocking for seconds during a page fault,
or while reading the mp3 from disk waiting on I/O submitted by another
I/O-bound application (or xmms blocking while swapping out some page
during swapout storms).

I have ideas on how to avoid the write throttling of xmms with heuristics in
balance_dirty(); the swapout blocking cannot be avoided (it
has to block or die).

This problem cannot be avoided from userspace.

2) xmms isn't getting enough CPU, like running xmms on a 16 MHz 386;
it will generate dropouts. Run xmms RT or give it the right amount
of CPU using priorities to fix the dropouts.

3) a scheduler latency on the order of seconds; the only thing
that could generate that is the read/write paths with lots of
cache.

Ah, wait a moment: if the preemptive patch fixes your problem I guess
you're hitting the lack of preemption in read/write. I tend to forget
about that since I fixed it ages ago in -aa, in both 2.2 and 2.4, by
putting the needed preemption points in the copy-user paths, exactly because
that could really hang the machine for many seconds on a multi-GB RAM
box. Andrew suggested moving the scheduler points down to
generic_file_read/write and I plan to do that soon.
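
A minimal sketch of what such a preemption point looks like (not the actual
-aa copy-user code): a conditional schedule() inside the long-running copy
loop, so a multi-gigabyte copy cannot monopolize the CPU for seconds.

#include <linux/sched.h>
#include <linux/string.h>

#define CHUNK 4096

/* Sketch only: copy in page-sized chunks and offer to reschedule between
 * chunks.  No locks are held here, so calling schedule() is safe. */
static void copy_with_resched(char *dst, const char *src, unsigned long len)
{
	while (len) {
		unsigned long n = len > CHUNK ? CHUNK : len;

		memcpy(dst, src, n);	/* stands in for the real __copy_user() */
		dst += n;
		src += n;
		len -= n;

		if (current->need_resched)	/* the preemption point */
			schedule();
	}
}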

So could you please check if running -aa or the preemptive patch makes
any difference for you while using xmms?

> > As Alan noted for the ring of dma fragments to expire you need a
> > scheduler latency of the order of seconds, now (assuming the ll points
> > in read/write paths) when we've bad latencies under writes it's of the
> > order of 10msec and it can be turned down further by putting preemption
> > checks in the buffer lru lists write paths.
>
> Isn't mp3 decoding done `just in time' ie we decode x and buffer x,

mp3 decoding is done with huge buffering; this means that even if xmms
runs 0.5 sec after it got the wakeup (due to scheduler latency
problems), it will have no problem decoding the rest of the mp3 and putting
it in the buffer for the soundcard.

Of course if xmms runs after the soundcard DMA ring has dried out, then
there will be a dropout, but it would need seconds of scheduler latency
to generate such a dropout, which isn't going to happen.

Of course if you divide the CPU across 100 tasks that are all CPU hogs, then
it is likely xmms will run once every few seconds (if you assume all tasks
are pure CPU hogs it will be a round robin), but that's completely
unrelated to the preemption latency.

> decode y and buffer y...hopefully in a quick enough manner it sounds
> like coherent sound? Thus, if the task can not be scheduled as
> required, there are noticeable latencies in the sound not because the
> sound card buffer ran dry but because the mp3 couldn't even be decoded
> to fill the buffer!
>
> Anyhow, if we have latencies of 10ms (and in reality there are higher
> latencies, too), these can cause the sort of system response scenerios
> that are a problem. Preemption makes these latencies effectively 0
> (outside of locks).

Almost all kernel CPU hogs (except the copy-user paths where I put the
reschedule points) that can stall the system for milliseconds are covered
by locks. So we need to do things like vmscan.c::shrink_cache does even
with the preemptive kernel patch, but then calling schedule explicitly
won't make any difference in either scheduler latency or performance (it
will actually be faster, because you do the explicit preemption check
only in the CPU hogs rather than on every last spin_unlock).
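
What such code has to do - with or without the preempt patch - is roughly
the following; a hedged sketch, not the real vmscan.c::shrink_cache:

#include <linux/sched.h>
#include <linux/spinlock.h>

#define NR_ENTRIES 1024		/* placeholder for the list length */

static spinlock_t example_lock = SPIN_LOCK_UNLOCKED;

/* Scan a long list under a spinlock, with an explicit preemption point.
 * The preempt patch alone cannot reschedule here because preemption is
 * disabled while the lock is held, so the loop has to drop the lock,
 * schedule, and retake it. */
static void scan_long_list(void)
{
	int i;

	spin_lock(&example_lock);
	for (i = 0; i < NR_ENTRIES; i++) {
		/* ... examine/process one list entry under the lock ... */

		if (current->need_resched) {
			spin_unlock(&example_lock);
			schedule();		/* explicit preemption point */
			spin_lock(&example_lock);
			/* the list may have changed while we slept */
		}
	}
	spin_unlock(&example_lock);
}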

Andrea

2001-10-10 03:06:22

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 10:37:56PM -0400, Robert Love wrote:
> On Tue, 2001-10-09 at 22:30, Andrea Arcangeli wrote:
> > As said it's very very unlikely that preemption points can fix xmms
> > skips anyways, the worst scheduler latency is always of the order of the
> > msecs, to generate skips you need a latency of seconds.
> >
> > [...]
> >
> > There's nothing magic in the software, if you divide the cpu in 10 parts
> > and you give 1/10 of the cpu to xmms, but xmms needs 1/2 of the cpu to
> > play your .mp3 then there's nothing you can do to fix it but to tell
> > the scheduler to give more cpu to xmms (renicing to -20 gives more cpu
>
> What if the CPU does divide its time into two 1/2 parts, and gives one
> each to xmms and dbench. Everything runs fine, since xmms needs 1/2 cpu
> to play without skips.

Of course. (BTW, when running dbench there's usually more than one
thread generating I/O, usually 20-40, depending on the parameter,
but let's assume there's only one thread for the sake of this example.)

> Now dbench (or any task) is in kernel space for too long. The CPU time

The time in kernel space decreases the timeslice too, so it doesn't
matter if it runs in kernel space too long; it will still be accounted
against its 1/2 of the time.

> xmms needs will of course still be given, but _too late_. Its just not

I think the issue you raise is that dbench gets 10 msec more of CPU
time and xmms starts running 10 msec later than expected (because of the
worst-case scheduler latency peak of 10 msec).

But that doesn't matter. The scheduler isn't perfect anyway. The
resolution of the scheduler is 10 msec too, so you can easily lose 10 msec
anywhere else, regardless of any 10 msec scheduler latency.

The only tasks that can get hurt by the scheduler latency are real-time
tasks running with RT priority that expect to be running less than
10 msec after their wakeup; this is obviously not the xmms case, which can
live fine even if it starts running hundreds of milliseconds after
its wakeup.

The point is that to avoid dropouts dbench must take, say, 40% of the CPU
and xmms another 40% of the CPU. Then the 10 msec doesn't matter. If each
one takes exactly 50% of the CPU you can run into dropouts anyway because of
scheduler imprecision.

So again: the preemptive patch cannot make any difference, except for
the read/write copy-user paths that Ingo originally fixed ages ago in
2.2, that I also later fixed in all -aa 2.2 and 2.4, and that are
also fixed in the lowlatency patches from Andrew (but in
generic_file_read/write rather than in copy-user, possibly to avoid some
overhead for short copy-users, but the end result for an xmms user is
exactly the same).

So for any non-real-time case where buffering is possible, running the
lowlatency patch from Ingo or Andrew, the preemptive patch, or -aa isn't
going to make any difference.

> a cpu resource problem, its a timing problem. xmms needs x units of CPU
> every y units of time. Just getting the x whenever is not enough.
>
> With preempt-kernel patch, the long-lasting kernel space activity dbench
> is engaged in won't hog the CPU until it completes. When xmms is ready
> (time y arrives), the scheduler will yield the CPU.
>
> Robert Love


Andrea

2001-10-10 03:23:53

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, 2001-10-09 at 23:06, Andrea Arcangeli wrote:
> [...]
> I think the issue you raise is that dbench gets a 10msec more of cpu
> time and xmms starts running 10msec later than expected (because of the
> scheduler latency peak worst case of 10msec).
>
> But that doesn't matter. The scheduler isn't perfect anyways. The
> resolution of the scheduler is 10msec too, so you can easily lose 10msec
> anywhere else no matter of whatever scheduler latency of 10msec. [...]

I generally agree with everything you say.

I think, however, you are making two assumptions:

(a) xmms has a very large leeway in the timing of its execution

(b) the maximum time a process sits in kernel space is 10ms.

While I agree (a) is true, it may not be so in all scenarios.
Furthermore, the specified leeway does not exist for all timing-critical
tasks. Not all of these tasks are specialized real-time applications,
either.

Most importantly, however, the maximum latency of the system is not
10ms. Even _with_ preemption, we have observed greater latencies (due
to long-held locks).

This is why I believe a preemptible kernel benefits more than just
real-time signal processing.

Robert Love

2001-10-10 03:58:08

by Dieter Nützel

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 10, 2001 at 03:06, Andrea Arcangeli wrote:
> On Tue, Oct 09, 2001 at 10:37:56PM -0400, Robert Love wrote:
> > On Tue, 2001-10-09 at 22:30, Andrea Arcangeli wrote:
> > > As said it's very very unlikely that preemption points can fix xmms
> > > skips anyways, the worst scheduler latency is always of the order of the
> > > msecs, to generate skips you need a latency of seconds.

[...]
> The point is that to avoid dropouts dbench must take say 40% of the cpu
> and xmms another 40% of the cpu. Then the 10msec doesn't matter. If each
> one takes 50% of cpu exactly you can run in dropouts anyways because of
> scheduler imprecisions.

I get the dropouts (2~3 sec) after dbench 32 has been running for 9~10 seconds.
I've tried with RT artsd and nice -20 mpg123.

Kernel: 2.4.11-pre6 + 00_vm-1 + preempt

Only solution:
I have to copy the test MP3 file into /dev/shm.

The CPU (1 GHz Athlon II) is ~75% idle during the hiccup.
The dbench processes are mostly in wait_page/wait_cache, if I remember right.
So I think that you are right that it is a file I/O wait (latency) problem.

Please hurry up with your read/write copy-user paths lowlatency patches ;-)

> So again: the preemptive patch cannot make any difference, except for
> the read/write copy-user paths that originally Ingo fixed ages ago in
> 2.2, and that I also later fixed in all -aa 2.2 and 2.4 and that are
> also fixed in the lowlatency patches from Andrew (but in the
> generic_file_read/write rather than in copy-user, to possible avoid some
> overhead for short copy users, but the end result for an xmms user is
> exactly the same).

Andrew, do you have a current version of your lowlatency patches handy?

Robert, you are running a dual PIII system, right?
Could that be the reason why you aren't seeing the hiccup with your nice preempt
patch? Are you running ReiserFS or ext2/3?

Thanks,
Dieter

2001-10-10 04:01:38

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, 2001-10-09 at 23:57, Dieter Nützel wrote:
> Robert you are running a dual PIII system, right?
> Could that be the ground why you aren't see the hiccup with your nice preempt
> patch? Are you running ReiserFS or EXT2/3?

No, I am on a single P3-733. I am using ext3.

I have had reports from users on both UP and SMP systems who say audio
playback is undisturbed during heavy I/O with the preempt-kernel patch. Of
course, I don't know their definition of undisturbed... but I would wager
it doesn't include 2-3 s skips.

Robert Love

2001-10-10 04:03:58

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, 2001-10-10 at 00:02, Robert Love wrote:
> On Tue, 2001-10-09 at 23:57, Dieter Nützel wrote:
> > Robert you are running a dual PIII system, right?
> > Could that be the ground why you aren't see the hiccup with your nice preempt
> > patch? Are you running ReiserFS or EXT2/3?
>
> No, I am on a single P3-733. I am using ext3.

Oh, I'm not using Linus's tree, either.

Right now I am 2.4.10-ac10 + preempt-kernel + Rik's eatcache

Robert Love

2001-10-10 04:03:38

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 11:24:36PM -0400, Robert Love wrote:
> On Tue, 2001-10-09 at 23:06, Andrea Arcangeli wrote:
> > [...]
> > I think the issue you raise is that dbench gets a 10msec more of cpu
> > time and xmms starts running 10msec later than expected (because of the
> > scheduler latency peak worst case of 10msec).
> >
> > But that doesn't matter. The scheduler isn't perfect anyways. The
> > resolution of the scheduler is 10msec too, so you can easily lose 10msec
> > anywhere else no matter of whatever scheduler latency of 10msec. [...]
>
> I agree with generally everything you say.
>
> I think, however, you are making two assumptions:
>
> (a) xmms has a very large leeway in the timing of its execution

Yes.

> (b) the maximum time a process sits in kernel space is 10ms.

Actually a task can sit in the kernel for its whole life and still
provide usec scheduler latencies (ksoftirqd).

> While I agree (a) is true, it may not be so in all scenerios.

Correct; of course if you run xmms on a 16 MHz CPU it will drop out no
matter the size of the DMA buffer. And you're right that the slower the
CPU is, the stricter the scheduler latency requirements to avoid the
dropouts are. If the CPU is very slow, then as soon as xmms becomes runnable
it may not have enough time to decode the next mp3 data before the DMA ring
dries out.

So yes, I'm assuming playback on any recent CPU, where xmms hurts only
because it generates high-frequency rescheduling, which in turn means
TLB flushing on x86 (not because of the real CPU load). And mostly
because of its "moving" GUI, not even because of the sound backend :).

> Furthermore, the specified leeway does not exist for all timing-critical
> tasks. Not all of these tasks are specialized real-time applications,
> either.
>
> Most importantly, however, the maximum latency of the system is not
> 10ms. Even _with_ preemption, we have observed greater latencies (due
> to long held locks).

I was reading a very detailed latency analysis done on the 2.4.10 SuSE
kernel by Takashi, and it showed 10 msec peaks of worst-case latency.

The stress loads during the latency measurement were x11perf (but OK, that's
mostly userspace), /proc with top, disk write, disk read and disk copy.

But of course in -aa, as said, only the most important part of the
low-latency patch is included, so without -aa I know for sure that the
scheduler latency can run up to several seconds with lots of RAM in the
system (this is why I included only those scheduler points even in 2.2
for the multi-gigabyte machines, where the lack of rescheduling in
read/write could become a pathological case visible with the naked eye while
typing in the shell).

And of course if you only apply the preempt patch and you don't add the
explicit points in the CPU hogs under locks, you'll cover only the
lockless copy-user and semaphore parts, but not the other bits, like the
ones in the memory management that are also covered since 2.4.1[01] and
in recent 2.2.

> This is why I believe the a preemptible kernel benefits more than just
> real-time signal processing.

Provided that read/write gets fixed as in either Andrew's patch,
Ingo's patch or -aa, I believe it cannot make any visible difference for
mp3 playback on any recent machine. Feel free to experiment yourself
with 2.4.11aa1.

Andrea

2001-10-10 04:23:21

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, Oct 10, 2001 at 05:57:46AM +0200, Dieter Nützel wrote:
> On Tue, Oct 10, 2001 at 03:06, Andrea Arcangeli wrote:
> > On Tue, Oct 09, 2001 at 10:37:56PM -0400, Robert Love wrote:
> > > On Tue, 2001-10-09 at 22:30, Andrea Arcangeli wrote:
> > > > As said it's very very unlikely that preemption points can fix xmms
> > > > skips anyways, the worst scheduler latency is always of the order of the
> > > > msecs, to generate skips you need a latency of seconds.
>
> [...]
> > The point is that to avoid dropouts dbench must take say 40% of the cpu
> > and xmms another 40% of the cpu. Then the 10msec doesn't matter. If each
> > one takes 50% of cpu exactly you can run in dropouts anyways because of
> > scheduler imprecisions.
>
> I get the dropouts (2~3 sec) after dbench 32 is running for 9~10 seconds.
> I've tried with RT artds and nice -20 mpg123.
>
> Kernel: 2.4.11-pre6 + 00_vm-1 + preempt
>
> Only solution:
> I have to copy the test MPG3 file into /dev/shm.

If copying the mp3 data into /dev/shm fixes the problem it could also be
an I/O overload. But it could also still be the VM write throttling: to
read the mp3 from disk you need to allocate some cache, while to read
from /dev/shm you don't need to allocate anything, because it was already
allocated when you copied the file there. Or it could be both things
together.

Just as the CPU is divided among all the CPU hogs, the disk bandwidth is also
divided among all the applications doing I/O at the same time (modulo the
fact that the global bandwidth dramatically decreases when multiple apps do
I/O at the same time due to the seeks, the thing that the elevator tries to
avoid by introducing some degree of unfairness into the I/O patterns).

So if this is just an I/O overload (possible too), some possible fixes
could be:

1) buy a faster disk
2) try elvtune -r 1 -w 2 /dev/hd[abcd] /dev/sd[abcd]; that will tend
to decrease the global I/O disk bandwidth of the system, but it will
increase fairness

> CPU (1 GHz Athlon II) is ~75% idle during the hiccup.

Of course, I can imagine. This is totally unrelated to scheduler
latencies; it's either VM write throttling, or I/O congestion (so you
don't have enough bandwidth to read the file), or both.

> The dbench processes are mostly in wait_page/wait_cache if I remember right.
> So I think that you are right it is a file IO wait (latency) problem.

Yes.

> Please hurry up with your read/write copy-user paths lowlatency patches ;-)

In the meantime you can use the preemption points in the copy-user paths;
they can add a bit more overhead, but nothing significant. I believe
moving the reschedule points into read/write as suggested by Andrew is
more a cleanup than an improvement.

BTW, this is the relevant patch:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.11aa1/00_copy-user-lat-5

You're probably more interested in the possible heuristic that I have in
mind to avoid xmms waiting for I/O completion of the work submitted by
dbench. That of course assumes the VM write throttling was a relevant cause
of the dropouts, and that the dropouts weren't just due to I/O
congestion (too little disk bandwidth).

BTW, to find out whether the reason for the dropouts was the VM write
throttling or too little disk bandwidth, you can run ps l <pid_of_xmms>:
if it says wait_on_buffer all the time it's the VM write throttling; if
it always says something else it's too little disk bandwidth. As said
above, I suspect you'll see both things because it is probably a mixed
effect. If it's not VM write throttling, only a faster disk or elvtune
tweaking can help you; there's no renice-IO -n -20 that allows you to
prioritize the I/O bandwidth of a certain application.

Andrea

2001-10-10 04:26:51

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, Oct 10, 2001 at 12:02:13AM -0400, Robert Love wrote:
> On Tue, 2001-10-09 at 23:57, Dieter Nützel wrote:
> > Robert you are running a dual PIII system, right?
> > Could that be the ground why you aren't see the hiccup with your nice preempt
> > patch? Are you running ReiserFS or EXT2/3?
>
> No, I am on a single P3-733. I am using ext3.
>
> I have had reports from users on both UP and SMP systems that say audio
> playback is undisturbed during heavy I/O with preempt-kernel patch. Of
> course, I don't know their definition of undisturbed...but I would wager
> it doesn't include 2-3s skips.

If it's purely I/O, even mainline, which is missing the reschedule points,
shouldn't matter.

In fact the only thing that hurts during pure I/O (I mean not I/O from
cache, I mean real I/O to disk) is the walk of the LRU dirty lists in
buffer.c and of the VM lists (the latter are covered in the latest 2.4). And
the preemptive patch alone cannot help there, since they're both covered
by locks and the explicit checks in the preemptive patch will get a
result equal to the lowlatency approach.

If it's mixed I/O, half from cache and half from disk, then the lack of
reschedule points could of course be the culprit.

Andrea

2001-10-10 04:42:38

by Dieter Nützel

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

Am Mittwoch, 10. Oktober 2001 06:23 schrieb Andrea Arcangeli:
> On Wed, Oct 10, 2001 at 05:57:46AM +0200, Dieter Nützel wrote:

[...]
> > I get the dropouts (2~3 sec) after dbench 32 is running for 9~10 seconds.

It is mostly only _ONE_ dropout like above.

> > I've tried with RT artds and nice -20 mpg123.
> >
> > Kernel: 2.4.11-pre6 + 00_vm-1 + preempt
> >
> > Only solution:
> > I have to copy the test MPG3 file into /dev/shm.
>
> If copying the mp3 data into /dev/shm fixes the problem it could be also
> an I/O overload.

The above plus nice -20 mpg123 *.mp3
I've forgotten to clarify this, sorry.

Should I try 2.4.11 + 00_vm-1 or 2.4.11aa1, again?

> So if this is just an I/O overload (possible too) some possible fixes
> could be:
>
> 1) buy faster disk

:-)

But I've checked it with two SCSI disks.
IBM DDYS U160 10k for dbench 32 and
IBM DDRS UW 7.2k for the mp3s

=> the hiccup

> 2) try with elvtune -r 1 -w 2 /dev/hd[abcd] /dev/sd[abcd] that will try
> to decrease the global I/O disk bandwith of the system, but it will
> increase fairness

OK, after I had some sleep...

> > CPU (1 GHz Athlon II) is ~75% idle during the hiccup.
>
> Of course I can imagine. This is totally unrelated to scheduler
> latencies, it's either vm write throttling or I/O congestion so you
> don't have enough bandwith to read the file or both.

It is only _ONE_ time during the whole dbench 32 run.

> > The dbench processes are mostly in wait_page/wait_cache if I remember
> > right. So I think that you are right it is a file IO wait (latency)
> > problem.
>
> Yes.
>
> > Please hurry up with your read/write copy-user paths lowlatency patches
> > ;-)
>
> In the meantime you can use the preemption points in the copy-user, they
> can add a bit more of overhead but nothing interesting, I believe it's
> more a cleanup than an improvement to move the reschedule points in
> read/write as suggested by Andrew.
>
> BTW, this is the relevant patch:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.11
>aa1/00_copy-user-lat-5

GREAT.

> You're probably more interested in the possible heuristic that I've in
> mind to avoid xmms to wait I/O completion for the work submitted by
> dbench. Of course assuming the vm write throttling was a relevant cause
> of the dropouts, and that the dropouts weren't just due an I/O
> congestion (too low disk bendwith).

> BTW, to find out if the reason of the dropouts where the vm write
> throttling or the too low disk bandwith you can run ps l <pid_of_xmms>,

What do you mean here? I can't find a meaningful ps option.

> if it says wait_on_buffer all the time it's the vm write throttling, if
> it says always something else it's the too low disk bandwith, I suspect
> as said above that you'll see both things because it is probably a mixed
> effect. If it's not vm write throttling only a faster disk or elvtune
> tweaking can help you, there's no renice-IO -n -20 that allows to
> prioritize the I/O bandwith to a certain application.

Thanks,
Dieter

2001-10-10 04:48:27

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, Oct 10, 2001 at 06:42:37AM +0200, Dieter Nützel wrote:
> Am Mittwoch, 10. Oktober 2001 06:23 schrieb Andrea Arcangeli:
> > On Wed, Oct 10, 2001 at 05:57:46AM +0200, Dieter Nützel wrote:
>
> [...]
> > > I get the dropouts (2~3 sec) after dbench 32 is running for 9~10 seconds.
>
> It is mostly only _ONE_ dropout like above.

One isn't really too bad actually, at least while there's huge I/O going
on.

> The above plus nice -20 mpg123 *.mp3
> I've forgotten to clearify this, sorry.
>
> Should I try 2.4.11 + 00_vm-1 or 2.4.11aa1, again?

2.4.11aa1, with the read/write reschedule points as well, would be more
interesting, I think.

> > You're probably more interested in the possible heuristic that I've in
> > mind to avoid xmms to wait I/O completion for the work submitted by
> > dbench. Of course assuming the vm write throttling was a relevant cause
> > of the dropouts, and that the dropouts weren't just due an I/O
> > congestion (too low disk bendwith).
>
> > BTW, to find out if the reason of the dropouts where the vm write
> > throttling or the too low disk bandwith you can run ps l <pid_of_xmms>,
>
> What do you mean here? I can't find a meaningfully ps option.

I meant the output of `ps l` (WCHAN column xmms row).

Andrea

2001-10-10 05:14:12

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

Dieter Nützel wrote:
>
> Andrew have you a current version of your lowlatency patches handy?
>

mm.. Nice people keep sending me updates. It's at
http://www.uow.edu.au/~andrewm/linux/schedlat.html and applies
to 2.4.11 with one little reject. I don't know how it's
performing at present - it's time for another round of tuning
and testing.

wrt this discussion: I would assume that xmms is simply stalling
on disk access. All it takes is for one of its text pages to be
dropped and it could have to wait a very long time indeed to
come back to life. The disk read latency could easily exceed
any sane buffering in the sound card or its driver.

The application should be using mlockall(MCL_FUTURE) and it should
run `nice -19' (SCHED_FIFO and SCHED_RR are rather risky - if the
app gets stuck in a loop, it's time to hit the big button). If the
app isn't doing both these things then it just doesn't have a chance.
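
As a hedged illustration of the mlockall part (not code from any particular
player), the startup call would look roughly like this; raising priority
(the `nice -19' part) still has to be done separately, e.g. from the shell
or via setpriority():

#include <stdio.h>
#include <sys/mman.h>

/* Lock current and future pages into RAM so a dropped text/data page
 * never has to be read back from a busy disk in the middle of playback. */
int lock_down_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		perror("mlockall");	/* typically EPERM without root */
		return -1;
	}
	return 0;
}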

I don't understand why Andrea is pointing at write throttling? xmms
doesn't do any disk writes, does it??

Andrea's VM has a rescheduling point in shrink_cache(), which is the
analogue of the other VM's page_launder(). This rescheduling point
is *absolutely critical*, because it opens up what is probably the
longest-held spinlock in the kernel (under common use). If there
were a similar rescheduling point in page_launder(), comparisons
would be more valid...


I would imagine that for a (very) soft requirement such as audio
playback, the below patch, combined with mlockall and renicing
should fix the problems. I would expect that this patch will
give effects which are similar to the preempt patch. This is because
most of the other latency problems are under locks - icache/dcache
shrinking and zap_page_range(), etc.

This patch should go into the stock 2.4 kernel.

Oh. And always remember to `renice -19' your X server.



--- linux-2.4.11/mm/filemap.c Tue Oct 9 21:31:40 2001
+++ linux-akpm/mm/filemap.c Tue Oct 9 21:47:51 2001
@@ -1230,6 +1230,9 @@ found_page:
page_cache_get(page);
spin_unlock(&pagecache_lock);

+ if (current->need_resched)
+ schedule();
+
if (!Page_Uptodate(page))
goto page_not_up_to_date;
generic_file_readahead(reada_ok, filp, inode, page);
@@ -2725,6 +2728,9 @@ generic_file_write(struct file *file,con
if (!PageLocked(page)) {
PAGE_BUG(page);
}
+
+ if (current->need_resched)
+ schedule();

kaddr = kmap(page);
status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
--- linux-2.4.11/fs/buffer.c Tue Oct 9 21:31:40 2001
+++ linux-akpm/fs/buffer.c Tue Oct 9 22:08:51 2001
@@ -29,6 +29,7 @@
/* async buffer flushing, 1999 Andrea Arcangeli <[email protected]> */

#include <linux/config.h>
+#include <linux/compiler.h>
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/slab.h>
@@ -231,6 +232,10 @@ static int write_some_buffers(kdev_t dev
static void write_unlocked_buffers(kdev_t dev)
{
do {
+ if (unlikely(current->need_resched)) {
+ __set_current_state(TASK_RUNNING);
+ schedule();
+ }
spin_lock(&lru_list_lock);
} while (write_some_buffers(dev));
run_task_queue(&tq_disk);
--- linux-2.4.11/fs/proc/array.c Sun Sep 23 12:48:44 2001
+++ linux-akpm/fs/proc/array.c Tue Oct 9 21:47:51 2001
@@ -414,6 +414,9 @@ static inline void statm_pte_range(pmd_t
pte_t page = *pte;
struct page *ptpage;

+ if (current->need_resched)
+ schedule(); /* For `top' and `ps' */
+
address += PAGE_SIZE;
pte++;
if (pte_none(page))
--- linux-2.4.11/fs/proc/generic.c Sun Sep 23 12:48:44 2001
+++ linux-akpm/fs/proc/generic.c Tue Oct 9 21:47:51 2001
@@ -98,6 +98,9 @@ proc_file_read(struct file * file, char
retval = n;
break;
}
+
+ if (current->need_resched)
+ schedule(); /* Some proc files are large */

/* This is a hack to allow mangling of file pos independent
* of actual bytes read. Simply place the data at page,

2001-10-10 05:26:22

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 10:13:58PM -0700, Andrew Morton wrote:
> I don't understand why Andrea is pointing at write throttling? xmms
> doesn't do any disk writes, does it??

Of course it doesn't. You're right that it could be just because of an I/O
bandwidth shortage. But it could really also be because of VM write
throttling.

xmms can end up waiting for the completion of I/O submitted by other
I/O-bound tasks. This is because xmms is reading from disk, and in turn it is
allocating cache, and in turn it is allocating memory. While allocating
memory it may need to write throttle.

Copying the file to /dev/shm fixed the problem, but that would cover both
the write throttling and the disk bandwidth problems at the same time, and
I guess it's a mixed effect of both things.

> Andrea's VM has a rescheduling point in shrink_cache(), which is the
> analogue of the other VM's page_launder(). This rescheduling point
> is *absolutely critial*, because it opens up what is probably the
> longest-held spinlock in the kernel (under common use). If there
> were a similar reschedulig point in page_launder(), comparisons
> would be more valid...

Indeed.

> I would imagine that for a (very) soft requirement such as audio
> playback, the below patch, combined with mlockall and renicing
> should fix the problems. I would expect that this patch will
> give effects which are similar to the preempt patch. This is because

I haven't checked the patch in detail yet, but it seems you covered
read/write, some bits in /proc, and an LRU list during buffer flushing. I
agree that it should be enough to give the same effects as the preempt
patch.

> most of the other latency problems are under locks - icache/dcache
> shrinking and zap_page_range(), etc.

Exactly.

> This patch should go into the stock 2.4 kernel.
>
> Oh. And always remember to `renice -19' your X server.

I don't renice my X server (I'd rather renice all CPU hogs to +19 and
leave -20 for something that really needs to run as fast as possible
regardless of the X server).

Andrea

2001-10-10 05:58:28

by Justin A

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> Heavily io bound processes (dbench 32) still causes something as light as an
> mp3 player to skip, though. That probably wont be fixed intil 2.5, since

What buffer size are you using in your mp3 player? I have xmms set to
5000 ms or so and it never skips. mpg321 (esd or oss) also never skips no
matter what I do, but the original mpg123-oss will skip with even light load
on the cpu/disk.

This is with 2.4.10-ac9+preempt on an athlon 700

-Justin

2001-10-10 11:41:53

by Ed Sweetman

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wednesday 10 October 2001 01:26, Andrea Arcangeli wrote:
> On Tue, Oct 09, 2001 at 10:13:58PM -0700, Andrew Morton wrote:
> > I don't understand why Andrea is pointing at write throttling? xmms
> > doesn't do any disk writes, does it??
>
> Of course it doesn't. You're right it could be just because of I/O
> bandwith shortage. But it could really be also because of vm write
> throttling.
>
> xmms can end waiting I/O completion for I/O submitted by other I/O bound
> tasks. This because xmms is reading from disk and in turn it is
> allocating cache and in turn it is allocating memory. While allocating
> memory it may need to write throttle.
>
> Copying the file to /dev/shm fixed the problem but that would cover both
> the write throttling and the disk bandwith problems at the same time and
> I guess it's a mixed effect of both things.
>
> > Andrea's VM has a rescheduling point in shrink_cache(), which is the
> > analogue of the other VM's page_launder(). This rescheduling point
> > is *absolutely critial*, because it opens up what is probably the
> > longest-held spinlock in the kernel (under common use). If there
> > were a similar reschedulig point in page_launder(), comparisons
> > would be more valid...
>
> Indeed.
>
> > I would imagine that for a (very) soft requirement such as audio
> > playback, the below patch, combined with mlockall and renicing
> > should fix the problems. I would expect that this patch will
> > give effects which are similar to the preempt patch. This is because
>
> I didn't checked the patch in the detail yet but it seems you covered
> read/write some bits in /proc and a lru list during buffer flushing. I
> agree that it should be enough to give the same effects of the preempt
> patch.
>
> > most of the other latency problems are under locks - icache/dcache
> > shrinking and zap_page_range(), etc.
>
> Exactly.
>
> > This patch should go into the stock 2.4 kernel.
> >
> > Oh. And always remember to `renice -19' your X server.
Blah, you shouldn't need to. You shouldn't have anything able to lag your X
server unless you're running so many programs that your CPU time slices are
too small for its needs (or memory).


> I don't renice my X server (I rather renice all cpu hogs to +19 and I
> left -20 for something that really needs to run as fast as possible
> regardless of the X server).
>
> Andrea

freeamp runs with no noticeable CPU usage, meaning it's 0.0 nearly 100% of the
time, and I have 256K of input buffer and 16K of output. Then I have a
process like dbench create a bunch of threads (32) and cause freeamp to skip.
Now how is that a fair spread of CPU? The point is that this doesn't have
to do with CPU spread and getting locked out of the CPU. It has to do with
dbench holding the kernel for too long in places, and the kernel should know
that and tell it to wait, since other processes are behaving. There needs
to be a threshold of kernel usage before the kernel will begin to preempt
that task in favor of all the ones within the threshold, unless YOU want that
kernel hog to run at full speed, in which case you can renice it to a lower
nice (higher priority). Dbench is getting its share of CPU, maybe, but it's
getting far too much of its share of kernel time, and that needs to be
stopped; it's unfair in a multi-user, multiprocessing system. That's what
I meant earlier.

It's just my opinion that kernel hogs should have to be given a user-defined
higher priority in order to hog the kernel, and that everything else
shouldn't need one just to run because you're running a kernel hog.

2001-10-10 11:59:59

by Ed Sweetman

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

OK, I copied the mp3 into /dev/shm and, without any renicing of anything, it
plays fine during dbench 32. So the problem is disk access taking too long.

Which is strange, since I'm running dbench on a separate hard disk on a
totally different controller.




On Wednesday 10 October 2001 07:41, safemode wrote:
> On Wednesday 10 October 2001 01:26, Andrea Arcangeli wrote:
> > On Tue, Oct 09, 2001 at 10:13:58PM -0700, Andrew Morton wrote:
> > >
> > > Oh. And always remember to `renice -19' your X server.
>
> Blah, You shouldn't need to. You shouldn't have anything able to lag your
> X server unless you're running so many programs your cpu time slices are
> too small for it's needs ( or memory).
>
> > I don't renice my X server (I rather renice all cpu hogs to +19 and I
> > left -20 for something that really needs to run as fast as possible
> > regardless of the X server).
> >
> > Andrea
>
> freeamp runs with no noticable cpu usage, meaning it's 0.0 nearly 100% of
> the time and i have 256K of input buffer and 16K of output. Then i have a
> process like dbench create a bunch of threads (32) and cause freeamp to
> skip. Now how is that a fair spread of cpu? The point is that this doesn't
> have to do with cpu spread and getting locked out of cpu. It just has to
> do with dbench holding the kernel for too long in places and the kernel
> should know that and tell it to wait since other processes are behaving.
> There needs to be a threshhold of kernel usage before the kernel will begin
> to preempt that task for all the ones within the threshhold unless YOU want
> that kernel hogger to run at full speed. In which case you can renice it to
> a lower nice (higher priority). Dbench is getting it's share of cpu maybe,
> but it's getting for too much of it's share of kernel time and that needs
> to be stopped and it's unfair in a multi-user multiprocessing system.
> That's what i meant earlier.
>
> It's just my opinion that kernel hoggers should need to be given user
> defined higher priority to hog the kernel and not everything else to just
> run because you're running a kernel hogger.

2001-10-10 13:37:08

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, Oct 10, 2001 at 08:00:04AM -0400, safemode wrote:
> OK, i copied the mp3 into /dev/shm and without any renicing of anything it
> plays fine during dbench 32. so the problem is disk access taking too long.
>
> Which is strange since i'm running dbench on a separate hdd on a totally
> different controller.

Then if you know it's not disk congestion, it's most probably due to the VM
write throttling.

Andrea

2001-10-10 16:00:13

by Dieter Nützel

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

Am Mittwoch, 10. Oktober 2001 05:25 schrieb Justin A:
> On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> > Heavily io bound processes (dbench 32) still causes something as light as
> > an mp3 player to skip, though. That probably wont be fixed intil 2.5,
> > since
>
> What buffer size are you using in your mp3 player? I have xmms set to
> 5000ms or so and it never skips.

OK, I'll give xmms with this buffer size a go, too.

> mpg321(esd or oss) also never skips no matter what I do,

Do you have a link to the mpg321 (oss) version for me?

> but the original mpg123-oss will with even light load
> on the cpu/disk.

I get the hiccup with mpg123 and noatun (artsd, KDE-2.2.1).

>
> This is with 2.4.10-ac9+preempt on an athlon 700

Here with Linus' tree.

-Dieter

2001-10-10 18:14:33

by George Anzinger

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

Andrew Morton wrote:
>
> Dieter Nützel wrote:
> >
> > Andrew have you a current version of your lowlatency patches handy?
> >
>
> mm.. Nice people keep sending me updates. It's at
> http://www.uow.edu.au/~andrewm/linux/schedlat.html and applies
> to 2.4.11 with one little reject. I don't know how it's
> performing at present - it's time for another round of tuning
> and testing.
>
> wrt this discussion: I would assume that xmms is simply stalling
> on disk access. All it takes is for one of its text pages to be
> dropped and it could have to wait a very long time indeed to
> come back to life. The disk read latency could easily exceed
> any sane buffering in the sound card or its driver.
>
> The application should be using mlockall(MCL_FUTURE) and it should
> run `nice -19' (SCHED_FIFO and SCHED_RR are rather risky - if the
> app gets stuck in a loop, it's time to hit the big button).

When running any RT tasks it is always wise to have an open shell running
at a higher priority. It is also necessary to have an open console
path to the shell, which may mean that X needs to be up there too. But
if this is just a back door, an alternative console outside of
X could also do the trick.

George

> If the
> app isn't doing both these things then it just doesn't have a chance.
>
> I don't understand why Andrea is pointing at write throttling? xmms
> doesn't do any disk writes, does it??
>
> Andrea's VM has a rescheduling point in shrink_cache(), which is the
> analogue of the other VM's page_launder(). This rescheduling point
> is *absolutely critial*, because it opens up what is probably the
> longest-held spinlock in the kernel (under common use). If there
> were a similar reschedulig point in page_launder(), comparisons
> would be more valid...
>
> I would imagine that for a (very) soft requirement such as audio
> playback, the below patch, combined with mlockall and renicing
> should fix the problems. I would expect that this patch will
> give effects which are similar to the preempt patch. This is because
> most of the other latency problems are under locks - icache/dcache
> shrinking and zap_page_range(), etc.
>
> This patch should go into the stock 2.4 kernel.
>
> Oh. And always remember to `renice -19' your X server.
>
> --- linux-2.4.11/mm/filemap.c Tue Oct 9 21:31:40 2001
> +++ linux-akpm/mm/filemap.c Tue Oct 9 21:47:51 2001
> @@ -1230,6 +1230,9 @@ found_page:
> page_cache_get(page);
> spin_unlock(&pagecache_lock);
>
> + if (current->need_resched)
> + schedule();
> +
> if (!Page_Uptodate(page))
> goto page_not_up_to_date;
> generic_file_readahead(reada_ok, filp, inode, page);
> @@ -2725,6 +2728,9 @@ generic_file_write(struct file *file,con
> if (!PageLocked(page)) {
> PAGE_BUG(page);
> }
> +
> + if (current->need_resched)
> + schedule();
>
> kaddr = kmap(page);
> status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
> --- linux-2.4.11/fs/buffer.c Tue Oct 9 21:31:40 2001
> +++ linux-akpm/fs/buffer.c Tue Oct 9 22:08:51 2001
> @@ -29,6 +29,7 @@
> /* async buffer flushing, 1999 Andrea Arcangeli <[email protected]> */
>
> #include <linux/config.h>
> +#include <linux/compiler.h>
> #include <linux/sched.h>
> #include <linux/fs.h>
> #include <linux/slab.h>
> @@ -231,6 +232,10 @@ static int write_some_buffers(kdev_t dev
> static void write_unlocked_buffers(kdev_t dev)
> {
> do {
> + if (unlikely(current->need_resched)) {
> + __set_current_state(TASK_RUNNING);
> + schedule();
> + }
> spin_lock(&lru_list_lock);
> } while (write_some_buffers(dev));
> run_task_queue(&tq_disk);
> --- linux-2.4.11/fs/proc/array.c Sun Sep 23 12:48:44 2001
> +++ linux-akpm/fs/proc/array.c Tue Oct 9 21:47:51 2001
> @@ -414,6 +414,9 @@ static inline void statm_pte_range(pmd_t
> pte_t page = *pte;
> struct page *ptpage;
>
> + if (current->need_resched)
> + schedule(); /* For `top' and `ps' */
> +
> address += PAGE_SIZE;
> pte++;
> if (pte_none(page))
> --- linux-2.4.11/fs/proc/generic.c Sun Sep 23 12:48:44 2001
> +++ linux-akpm/fs/proc/generic.c Tue Oct 9 21:47:51 2001
> @@ -98,6 +98,9 @@ proc_file_read(struct file * file, char
> retval = n;
> break;
> }
> +
> + if (current->need_resched)
> + schedule(); /* Some proc files are large */
>
> /* This is a hack to allow mangling of file pos independent
> * of actual bytes read. Simply place the data at page,

2001-10-10 19:47:29

by Roger Larsson

[permalink] [raw]
Subject: Buffers, dbench and latency

I am merging two comments... since they are related!

On Wednesday 10 October 2001 04:51, Andrea Arcangeli wrote:
> Of course if xmms runs after the soundcard dma ring dried out, then
> there will be a dropout, but it would need seconds of scheduler latency
> to generate such a dropout which isn't going to happen.

On Wednesday 10 October 2001 07:25, Justin A wrote:
> On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> > Heavily io bound processes (dbench 32) still causes something as light
> > as an mp3 player to skip, though. That probably wont be fixed intil
> > 2.5, since
>
> What buffer size are you using in your mp3 player? I have xmms set to
> 5000ms or so and it never skips. mpg321(esd or oss) also never skips no
> matter what I do, but the original mpg123-oss will with even light load
> on the cpu/disk.
>
> This is with 2.4.10-ac9+preempt on an athlon 700
>

5000 ms == 5 s of buffered audio. Assume 44100 samples/s, 16 bit and 2
channels (5 channels is not that unusual); this gives a buffer of 882 kB,
or about 215 pages! (Justin could probably fill in the actual details, but
it does not really matter for the discussion below.)
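
(Spelling out the arithmetic: 5 s * 44100 samples/s * 2 channels * 2 bytes
per sample = 882000 bytes, i.e. roughly 882 kB, and 882000 / 4096 is about
215 pages.)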

I do not think this is the size of the DMA ring buffer...

This is what I think the situation is. I have not read the xmms source - it
could be implemented with one non-blocking thread instead - but the analysis
holds anyway (IMHO):

* One thread reads from disk into an xmms ring buffer of 5000 ms; this
gives that thread lots of time to keep the buffer filled.

* Another thread reads from this buffer and writes to the DMA ring buffer
(usually fragmented into several parts). When the DMA buffer gets full,
this thread goes to sleep... (a rough sketch of this structure follows below)
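
Something like this, structurally (this is NOT the xmms source - just a
sketch of the two-thread structure; the file name, buffer size and use of
/dev/dsp are made up for the illustration, and error and shutdown handling
are left out):

#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define RING_BYTES 882000   /* ~5 s of 44100 Hz, 16-bit stereo */
#define CHUNK      4096

static char ring[RING_BYTES];
static size_t fill, rpos, wpos;          /* bytes buffered, read/write offsets */
static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *disk_reader(void *arg)      /* thread 1: disk -> ring buffer */
{
        int fd = open("track.raw", O_RDONLY);   /* hypothetical input file */
        char buf[CHUNK];
        ssize_t n, i;

        while ((n = read(fd, buf, CHUNK)) > 0) {
                pthread_mutex_lock(&lk);
                while (RING_BYTES - fill < (size_t)n)
                        pthread_cond_wait(&not_full, &lk);
                for (i = 0; i < n; i++)
                        ring[(wpos + i) % RING_BYTES] = buf[i];
                wpos = (wpos + n) % RING_BYTES;
                fill += n;
                pthread_cond_signal(&not_empty);
                pthread_mutex_unlock(&lk);
        }
        close(fd);
        return NULL;
}

static void *player(void *arg)           /* thread 2: ring buffer -> sound card */
{
        int dsp = open("/dev/dsp", O_WRONLY);   /* OSS output device */
        char buf[CHUNK];
        size_t i;

        for (;;) {
                pthread_mutex_lock(&lk);
                while (fill < CHUNK)
                        pthread_cond_wait(&not_empty, &lk);
                for (i = 0; i < CHUNK; i++)
                        buf[i] = ring[(rpos + i) % RING_BYTES];
                rpos = (rpos + CHUNK) % RING_BYTES;
                fill -= CHUNK;
                pthread_cond_signal(&not_full);
                pthread_mutex_unlock(&lk);
                write(dsp, buf, CHUNK);  /* blocks while the driver's DMA ring is full */
        }
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, disk_reader, NULL);
        pthread_create(&t2, NULL, player, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}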

In this situation we could have two causes for dropouts.

* The second thread is really an RT thread. It gets woken up when one
fragment has been used by the audio DMA, i.e. new data can be
written (often it was blocked in the write, and thus only needs to get
the CPU for a brief moment). But it has to react in time...

Checking the source for emu10k1:
   the default fragment length is 20 ms and the total default buffer
   length is 500 ms, but the maximum buffer size is 65536 bytes = 16 pages.
   This gives a DMA buffer time in our example (44100 Hz, 16 bit, stereo)
   of about 370 ms.
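   (Spelling out the arithmetic: 65536 bytes / (44100 samples/s * 2 channels
   * 2 bytes per sample) = 65536 / 176400 bytes/s, i.e. roughly 0.37 s.)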

So this process has an absolute RT limit of 370 ms, not seconds!
 + This is the part where the different low-latency approaches matter,
   since you might stay in the kernel for longer than this; there are
   loops over page descriptors where the data touched might not be in
   the cache... The 10 ms figure claimed for 2.4.10 is not the whole
   truth...
 + The process buffer needs to be locked in memory, as does all the code
   that has to execute to read from it and write to the DMA buffer,
   since a 5 s buffer will be written and then stay idle for 5 s before
   it is used (read) the next time...

* The same goes for the other thread - it has more time to spare, but
it has to keep the process buffer filled with data. And it is usually some
sort of CPU hog - decoding compressed audio streams - so it has to get
scheduled in and allowed to run.

Note: dbench threads, like most IO-limited ones, behave nicely from a
scheduler viewpoint - mostly waiting on IO resources.
On my computer they use less CPU (0.3% each) than
artsd (10.8% and 1.1%) and noatun (1.9%). And that is not strange
in any way, since the audio processes actually have to do some
calculations...

Now the scheduler has to choose - a process mostly waiting on
IO or one that actually uses a part of its time slice...

If you are not running with a lower nice level or SCHED_FIFO/RT
to guarantee that you will be selected at the next rescheduling, you
will be in trouble...
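
For concreteness, the knobs mentioned above (plus the mlockall() mentioned
earlier in the thread) look roughly like this from userspace - just a
sketch, not any player's actual code:

#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

static void raise_audio_priority(int use_fifo)
{
        /* Lock current and future pages so neither the big process
         * buffer nor the player code itself can be paged out. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                perror("mlockall");

        if (use_fifo) {
                /* Risky: a looping SCHED_FIFO task can freeze the box. */
                struct sched_param sp;
                sp.sched_priority = 1;
                if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
                        perror("sched_setscheduler");
        } else {
                /* Safer: just renice the process as far as it will go. */
                if (setpriority(PRIO_PROCESS, 0, -19) != 0)
                        perror("setpriority");
        }
}

int main(void)
{
        raise_audio_priority(0);   /* 0 = plain renice, 1 = SCHED_FIFO */
        /* ... audio threads would be started here ... */
        return 0;
}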

/RogerL

--
Roger Larsson
Skellefteå
Sweden

2001-10-10 20:10:15

by Justin A

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, Oct 10, 2001 at 05:37:40PM +0200, Dieter Nützel wrote:
> On Wednesday, 10 October 2001 05:25, Justin A wrote:
> > On Tue, Oct 09, 2001 at 08:36:56PM -0400, safemode wrote:
> > > Heavily io bound processes (dbench 32) still causes something as light as
> > > an mp3 player to skip, though. That probably wont be fixed intil 2.5,
> > > since
> >
> > What buffer size are you using in your mp3 player? I have xmms set to
> > 5000ms or so and it never skips.
>
> OK, I'll give xmms with this buffer size a go, too.
>
> > mpg321(esd or oss) also never skips no matter what I do,
>
> Do you have link to the mpg321 (oss) version for me?

It should be in the same version:

-o dt Set output devicetype to dt [esd,alsa,arts,sun,oss]

>
> > but the original mpg123-oss will with even light load
> > on the cpu/disk.
>
> I get the hiccup with mpg123 and noatun (artsd, KDE-2.2.1).

Have you tried the -b option in mpg123?

-b n output buffer: n Kbytes [0]

Even maxed out it has no effect on the quality of the playback.
>
> >
> > This is with 2.4.10-ac9+preempt on an athlon 700
>
> Here with Linus tree.
>
> -Dieter

This behavior (xmms and mpg321 fine, mpg123 skipping) has always been the
same for me; xmms was the more reliable player even on my Pentium 100. It
may just be better design in mpg321 and xmms.

-Justin

2001-10-10 23:42:30

by Ed Sweetman

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wednesday 10 October 2001 09:36, Andrea Arcangeli wrote:
> On Wed, Oct 10, 2001 at 08:00:04AM -0400, safemode wrote:
> > OK, i copied the mp3 into /dev/shm and without any renicing of anything
> > it plays fine during dbench 32. so the problem is disk access taking too
> > long.
> >
> > Which is strange since i'm running dbench on a separate hdd on a totally
> > different controller.
>
> then if you know it's not disk congestion, it's most probably due the vm
> write throttling.
>
> Andrea

How is it that a process at the same priority is allowed to throttle the
kernel's vm and starve other processes at the same priority? That sounds
like dbench is being allowed to preempt other processes at the same priority,
even if it is indirect preemption. The effect is the same.

2001-10-11 00:30:57

by Mike Fedyk

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Wed, Oct 10, 2001 at 07:42:31PM -0400, safemode wrote:
> On Wednesday 10 October 2001 09:36, Andrea Arcangeli wrote:
> > On Wed, Oct 10, 2001 at 08:00:04AM -0400, safemode wrote:
> > > OK, i copied the mp3 into /dev/shm and without any renicing of anything
> > > it plays fine during dbench 32. so the problem is disk access taking too
> > > long.
> > >
> > > Which is strange since i'm running dbench on a separate hdd on a totally
> > > different controller.
> >
> > then if you know it's not disk congestion, it's most probably due the vm
> > write throttling.
> >
> > Andrea
>
> How is it that a process at the same priority as allowed to throttle the
> kernel's vm and starve other processes at the same priority. That sounds
> like dbench is being allowed to preempt other processes at the same priority.
> even if it is indirect preemption. The effect is the same.

The problem is that the disk subsystem doesn't take into account the
priority of the process initiating the heavy (or any for that matter) IO.

AFAICT, the only way to get fair disk access is to modify (shorten) the
elevator queue lengths (which IMHO are much too long). Check out elvtune
(I'm testing "-r 500 -w 750" right now) in the util-linux package.
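
For anyone who wants to try it, elvtune operates per block device, e.g.
(the values are just what I happen to be testing, not a recommendation):

	elvtune -r 500 -w 750 /dev/hda

and running elvtune with only the device argument should print the
current settings.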

Mike

2001-10-13 19:34:10

by Pavel Machek

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

Hi!

> Now dbench (or any task) is in kernel space for too long. The CPU time
> xmms needs will of course still be given, but _too late_. Its just not
> a cpu resource problem, its a timing problem. xmms needs x units of CPU
> every y units of time. Just getting the x whenever is not enough.

Yep, with

x = 60msec
y = 600msec

So you can give it time up to 540msec late with no drop-outs.

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-10-13 20:42:53

by Mike Fedyk

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Fri, Oct 12, 2001 at 01:22:20PM +0000, Pavel Machek wrote:
> Hi!
>
> > Now dbench (or any task) is in kernel space for too long. The CPU time
> > xmms needs will of course still be given, but _too late_. Its just not
> > a cpu resource problem, its a timing problem. xmms needs x units of CPU
> > every y units of time. Just getting the x whenever is not enough.
>
> Yep, with
>
> x = 60msec
> y = 600msec
>
> So you can give it time up to 540msec late with no drop-outs.
>

How fast was the processor/memory on the system that produced these numbers?

2001-10-13 23:21:39

by Robert Love

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

On Fri, 2001-10-12 at 09:22, Pavel Machek wrote:
> Hi!
>
> > Now dbench (or any task) is in kernel space for too long. The CPU time
> > xmms needs will of course still be given, but _too late_. Its just not
> > a cpu resource problem, its a timing problem. xmms needs x units of CPU
> > every y units of time. Just getting the x whenever is not enough.
>
> Yep, with
>
> x = 60msec
> y = 600msec

How are you arriving at that y? On what system?

I agree with the x value, though. However, people have been shouting a
lot of numbers around about the size of audio buffers, the size of DMA
buffers, etc. and they aren't realistic. If you calculate out the times
that have been quoted, keeping in mind the stereo and all, the buffers
are huge. We need to find out what the sizes _really_ are, and then
keep in mind that we have multiple channels (plus any other sound output
from X and what not).

Honestly, I don't know what the buffer sizes are. I do know audio skips
for me, and the preempt-kernel patch improves that.

Robert Love

2001-10-14 06:18:47

by Pavel Machek

[permalink] [raw]
Subject: Re: 2.4.10-ac10-preempt lmbench output.

Hi!

> > > Now dbench (or any task) is in kernel space for too long. The CPU time
> > > xmms needs will of course still be given, but _too late_. Its just not
> > > a cpu resource problem, its a timing problem. xmms needs x units of CPU
> > > every y units of time. Just getting the x whenever is not enough.
> >
> > Yep, with
> >
> > x = 60msec
> > y = 600msec
>
> How are you arriving at that y? On what system?

Toshiba Satellite notebook. I remember being able to ^Z a splay process
playing an mp3 and bg it again in time for it not to skip. That means
that y is at least 300 msec or so.

[I wanted to retry it on k6/400 with sblive and mpg123 (not splay) and
could not do the trick.]
Pavel
--
Casualties in World Trade Center: 6453 dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.