On Tue, 2007-07-31 at 10:57 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, 2007-07-31 at 01:46 +0200, Kasper Sandberg wrote:
> >
> > > could perhaps be filesystem related, i have my maildir (extremely
> > > large) on reiserfs, and /home on xfs. what my mail client will do is
> > > download mail, spamassassin it (loading database from home), then it
> > > will put it on the imap server, placing it on reiserfs, and then a
> > > "local" copy in my home.
> >
> > Ooh, do you perchance have PREEMPT_BKL=y?
> >
sorry for the late response.
nope, i run totally without preemption. i did however test with it: it
seemed to not matter in terms of smoothness, but reduced the throughput
slightly.
> > If so, try on another filesystem than reiserfs (or disable
> > PREEMPT_BKL, but that is obviously the lesser of the two choices).
> >
> > Ingo traced a 1+ second latency at my end to BKL priority inversion
> > between tty and reiserfs.
>
> ah, indeed, that makes quite a bit of sense. Almost all of the Reiser3
> code runs under the BKL, and the only other major kernel infrastructure
> that has BKL dependencies is the TTY code. Kasper, as a debugging
> matter, could you try to move that spamassassin workload off into a
> non-Reiser3 filesystem and/or disable PREEMPT_BKL? If that makes a
> noticeable difference (for the better ;) then we can continue figuring
> out what's happening exactly.
the process is like this:
mail client fetches mail
mail client invokes spamassassin
if spam -> spam
else filtering
if it matches certain filters, it gets put into my imap server, which is
on reiserfs.
> Ingo
* Kasper Sandberg <[email protected]> wrote:
> > ah, indeed, that makes quite a bit of sense. Almost all of the
> > Reiser3 code runs under the BKL, and the only other major kernel
> > infrastructure that has BKL dependencies is the TTY code. Kasper, as
> > a debugging matter, could you try to move that spamassassin workload
> > off into a non-Reiser3 filesystem and/or disable PREEMPT_BKL? If
> > that makes a noticeable difference (for the better ;) then we can
> > continue figuring out what's happening exactly.
>
> the process is like this:
> mail client fetches mail
> mail client invokes spamassassin
> if spam -> spam
> else filtering
> if it matches certain filters, it gets put into my imap server, which is
> on reiserfs.
do you have any filesystem that is not reiserfs? If yes, could you, as a
test, check whether file activities on _that_ file system still cause
these lags, or is the lag purely connected to the reiser3 filesystem?
Ingo
Kasper,
could you please try the "chew-max" latency-printing utility:
http://people.redhat.com/mingo/cfs-scheduler/tools/chew-max.c
if you start it on an idle system it prints a single line:
$ ./chew-max
pid 14506, prio 0, interval of 99984800 nsec
and prints nothing else. It continues looping and looping (using up 100%
of CPU time), and the moment it's preempted, it prints a line about that
preemption latency. Under higher load it will print something like this:
out for 63 ms [max: 66], ran for 5 ms, load 7
out for 85 ms [max: 85], ran for 4 ms, load 5
out for 7 ms [max: 85], ran for 0 ms, load 0
out for 105 ms [max: 105], ran for 3 ms, load 3
out for 174 ms [max: 174], ran for 6 ms, load 3
out for 219 ms [max: 219], ran for 3 ms, load 1
out for 78 ms [max: 219], ran for 3 ms, load 3
so that we get a picture of your latencies, could you run this tool while
you are seeing those 'bad' desktop latencies? (Since your CPU has two
cores it might make sense to run two instances of chew-max.)
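
To illustrate the measurement idea described above (spin on the CPU and report whenever the loop was kept off the CPU), here is a minimal Python sketch. It is not the chew-max source: the function name `detect_gaps`, the 1 ms threshold and the output format are assumptions made for illustration, and a Python interpreter adds jitter of its own, so the numbers are only indicative.

```python
import time


def detect_gaps(duration_s=1.0, threshold_ns=1_000_000):
    """Busy-loop on the monotonic clock for duration_s seconds.

    Any jump between two consecutive clock reads that is at least
    threshold_ns long is taken to mean the loop was not running
    (preempted or otherwise stalled). Returns the gaps in nanoseconds.
    """
    gaps = []
    prev = time.monotonic_ns()
    end = prev + int(duration_s * 1e9)
    while prev < end:
        now = time.monotonic_ns()
        gap = now - prev
        if gap >= threshold_ns:
            gaps.append(gap)
        prev = now
    return gaps


if __name__ == "__main__":
    worst = 0
    for gap in detect_gaps(duration_s=2.0):
        worst = max(worst, gap)
        print(f"out for {gap // 1_000_000} ms [max: {worst // 1_000_000}]")
```

On an idle machine this should print little or nothing; under load the reported gaps grow, which is the signal being asked for here.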
record the latencies like this:
./chew-max > chew1.out &
./chew-max > chew2.out &
and send us the chew1.out and chew2.out files (bzip2 -9 compressed).
Thanks!
Ingo
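
For anyone collecting such chew1.out / chew2.out files, a small script can summarize them before sending. The sketch below assumes only the line format shown in the sample output above; the names `parse_chew` and `summarize` are hypothetical, and lines that do not match (such as the initial "pid ..." line) are skipped.

```python
import bz2
import re
import sys

# Matches lines like: "out for 63 ms [max: 66], ran for 5 ms, load 7"
LINE_RE = re.compile(
    r"out for\s+(\d+) ms \[max:\s*(\d+)\], ran for\s+(\d+) ms, load\s+(\d+)"
)


def parse_chew(text):
    """Return one dict per matching line of chew-max style output."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            out_ms, max_ms, ran_ms, load = map(int, m.groups())
            rows.append(
                {"out_ms": out_ms, "max_ms": max_ms, "ran_ms": ran_ms, "load": load}
            )
    return rows


def summarize(rows):
    """Return sample count, worst and (upper) median 'out for' latency in ms."""
    if not rows:
        return None
    outs = sorted(r["out_ms"] for r in rows)
    return {
        "samples": len(outs),
        "worst_ms": outs[-1],
        "median_ms": outs[len(outs) // 2],
    }


if __name__ == "__main__":
    # Usage (file names as in the message above): python summarize.py chew1.out chew2.out
    for path in sys.argv[1:]:
        opener = bz2.open if path.endswith(".bz2") else open
        with opener(path, "rt") as f:
            print(path, summarize(parse_chew(f.read())))
```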
* Ingo Molnar <[email protected]> wrote:
> do you have any filesystem that is not reiserfs? If yes, could you, as
> a test, check whether file activities on _that_ file system still
> cause these lags, or is the lag purely connected to the reiser3
> filesystem?
i still have little debug info from you to start from: no
cfs-debug-info.sh output of the problematic workload and no kernel
.config.
i tried to reproduce your problems based on your existing description: i
did a lot of reiser3 testing yesterday and i also wrote a 'BKL latency
simulator' (which does a faux lock_kernel() + unlock_kernel() so that
the testcode runs into the BKL all the time) - but still this had no
visible effect on desktop latencies so either i have some subtle
difference in my setup or this aspect of your workload is not the cause
of the smoothness problem.
could you please give us more debug info and try to simplify the "bad"
case down to something that can be pinpointed and triggered more exactly?
Do you see any particular 'ruckle' (stutter) in the 3D game when you see a
smoothness problem? Anything that we could clearly label as 'anomalous
latency' in a tracer output? (in that case i'll send you tracing patches
so that we can catch a trace of that 'hiccup')
You said the imap stuff could be causing the smoothness problem: as a
debugging thing could you try to renice all the imap activities (imap
daemon / mailer) to nice +19, does that make the game magically smooth
again? If yes then this is an indicator that the problem is interaction
between the game and the imap activities.
Ingo
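
The renice test suggested above can be scripted. The following is a rough, Linux-only sketch that looks up processes by name in /proc and sets them to nice +19 via `os.setpriority`; the process names "imapd" and "mailclient" are placeholders (the actual daemon and mailer names are not given in this thread), and the helper names are hypothetical.

```python
import os


def pids_by_name(name):
    """Return PIDs whose /proc/<pid>/comm equals name (Linux only).

    Note: the kernel truncates comm to 15 characters.
    """
    if not os.path.isdir("/proc"):
        return []
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def renice(pids, nice=19):
    """Set the nice value of each PID; return the PIDs that were changed."""
    changed = []
    for pid in pids:
        try:
            os.setpriority(os.PRIO_PROCESS, pid, nice)
            changed.append(pid)
        except (PermissionError, ProcessLookupError):
            pass  # not our process, or it has already exited
    return changed


if __name__ == "__main__":
    # Placeholder names: substitute the real imap daemon and mail client.
    for name in ("imapd", "mailclient"):
        print(name, renice(pids_by_name(name)))
```

Raising the nice value of your own processes needs no special privileges; lowering it again afterwards generally requires root.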
First off, sorry for the late response.
On Thu, 2007-08-02 at 17:42 +0200, Ingo Molnar wrote:
> Kasper,
>
> could you please try the "chew-max" latency-printing utility:
>
> http://people.redhat.com/mingo/cfs-scheduler/tools/chew-max.c
>
> if you start it on an idle system it prints a single line:
>
> $ ./chew-max
> pid 14506, prio 0, interval of 99984800 nsec
>
> and prints nothing else. It continues looping and looping (using up 100%
> of CPU time), and the moment it's preempted, it prints a line about that
> preemption latency. Under higher load it will print something like this:
>
> out for 63 ms [max: 66], ran for 5 ms, load 7
> out for 85 ms [max: 85], ran for 4 ms, load 5
> out for 7 ms [max: 85], ran for 0 ms, load 0
> out for 105 ms [max: 105], ran for 3 ms, load 3
> out for 174 ms [max: 174], ran for 6 ms, load 3
> out for 219 ms [max: 219], ran for 3 ms, load 1
> out for 78 ms [max: 219], ran for 3 ms, load 3
>
> so that we get a picture of your latencies, could you run this tool while
> you are seeing those 'bad' desktop latencies? (Since your CPU has two
bad is not the exact word, it's pretty good, certainly better than the old
vanilla scheduler, just not as smooth as it could be.
> cores it might make sense to run two instances of chew-max.)
it's a single-core socket 754 first-gen amd64 :)
>
> record the latencies like this:
>
> ./chew-max > chew1.out &
> ./chew-max > chew2.out &
>
> and send us the chew1.out and chew2.out files (bzip2 -9 compressed).
> Thanks!
i've attached it (bzip2'ed)
i've come to think it is IO related, but not entirely down to
reiserfs, perhaps xfs as well.
i will conduct some tests as soon as possible.
>
> Ingo