2009-09-08 23:42:43

by Frans Pop

Subject: [long] Another BFS versus CFS shakedown

I've also run some tests and have very consciously tried to pay attention
to interactivity, while also trying to get some "hard" data.

I've only BCCed Con on this mail as I get the impression he'll not be
interested in following the LKML thread.

Con: you are very welcome to respond either privately or on lkml if
you want to follow up on any of the results.

System info
-----------
HP 2510p Core2 Duo 1.33GHz 2GB notebook
Debian stable ("Lenny"), KDE desktop environment
Wireless networking (iwlagn)
Notebook is in a docking station with a second (main) display connected
using "old" style X and graphics drivers (no KMS)

CFS was tested with 2.6.31-rc9
BFS was tested with 2.6.30.5 + the bfs-209 patch

In both cases I've not done anything special with kernel configs. I've
just used my old .30 config as base for BFS, and my current .31 one for
CFS. I can't remember making any changes since .30, which was confirmed
by a quick look at the diff.

Kernel configs + test script and logs available at:
http://alioth.debian.org/~fjp/tmp/linux_BFSvsCFS/

BFS general impression
----------------------
I've used BFS for over a day yesterday and today, and in general I'm very
impressed. During normal use (coding and testing a shell script that's
CPU/memory heavy + normal mail/news/browser + amarok) I've not seen any
strange issues. My notebook even suspended and resumed (StR) without any
problems.

With CFS I regularly have short freezes of the mouse cursor or when
typing. I think that it's related to KDE's news reader knode updating
from my local news server. With CFS I also saw such freezes a few times,
but they _seemed_ less frequent and less severe. No hard data though.

But this evening, while I was preparing and running the tests, I've had 4
freezes of the desktop. The first two times it was only a partial freeze:
taskbar was frozen, but I could still switch apps and use the graphical
console; the last two times it was a full freeze of the display and
keyboard (incl. e.g. numlock), but in the background everything continued
to run normally and I could log in over SSH without any problem. On
reboot some file systems did fail to unmount though.

Normally my desktop and X.Org are 100% reliable.


Test description
----------------
I've done two tests. The first consisted of:
- playing Marillion's "B'Sides Themselves" in amarok from an NFS share
- having the game "Chromium B.S.U." display its opening graphics on
the laptop display; this has very smoothly flowing graphics and is
thus a nice visual reference for latency issues; the game itself is
quite fast-paced and can get starved quite easily
- the two tasks above resulted in ~10% overall CPU usage
- running a script that does kernel compiles and also runs the script
  I had been working on

The script was invoked as:
./scheduler-tests 2>&1 | tee `uname -r`.log

The main steps in the script are:
- stop cron; clear ccache; prepare for kernel build (allnoconfig)
- 3 x make -j4 kernel build; 2 with 'time', 1 with Jens' 'latt' [1]
- 3 x make -j2 kernel build; 2 with 'time', 1 with Jens' 'latt' [1]
- 4 runs of my own script [2], the last two in parallel

[1] I used Peter's version from:
http://marc.info/?l=linux-kernel&m=125242343131497&w=2
[2] The script produces .dot files showing graphs of Debian package
    dependencies: http://alioth.debian.org/~fjp/debtree/
It very inefficiently queries the package management databases
and forks insane numbers of sub shells, but the output is great ;-)

Disclaimer: I have no idea what the numbers from 'latt' mean or how
reliable they are.
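
For concreteness, the main loop of the script boils down to something
like the sketch below. Note this is only an illustration: the real
script is at the URL above, and the latt invocation (the -c client
count option and the quoting of the command) is an assumption on my
part, not checked against latt's actual interface.

   #!/bin/sh
   # Sketch only: quiesce the system, then run timed and latt-wrapped
   # kernel builds, as in the step list above.
   /etc/init.d/cron stop
   ccache -C                      # clear the compiler cache
   cd linux && make allnoconfig

   for j in 4 2; do
       for i in 1 2; do           # two 'time'd builds per -j level
           make clean >/dev/null
           time make -j$j >/dev/null
       done
       make clean >/dev/null
       ./latt -c$((2 * j)) "make -j$j"   # assumed latt syntax
   done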

The second test was:
- still playing Marillion
- playing a movie streamed from vlc on my server to vlc on the
  notebook's laptop display, with sound muted
- running a make -j4 kernel compile
- actually playing Chromium

Test results
------------
Right, let's get down to the meat after that long intro.
Challenger goes first.

              BFS                            CFS
              =========================      ==========================
make -j4 (1)  real    2m40.232s              real    2m41.907s
time          user    3m25.617s              user    3m15.792s
              sys     0m34.450s              sys     0m33.534s

make -j4 (2)  real    2m16.196s              real    2m19.140s
time          user    3m16.212s              user    3m6.052s
              sys     0m32.770s              sys     0m31.930s

make -j4 (3)  Entries: 3088 (clients=8)      Entries: 3168 (clients=8)
latt          Max    19066 usec              Max    23665 usec
              Avg       73 usec              Avg     8637 usec
              Stdev    694 usec              Stdev   7565 usec
----------------------------------------------------------------------
make -j2 (1)  real    2m14.962s              real    2m32.508s
time          user    3m8.740s               user    3m8.320s
              sys     0m32.470s              sys     0m31.554s

make -j2 (2)  real    2m15.650s              real    2m33.396s
time          user    3m8.428s               user    3m3.147s
              sys     0m31.490s              sys     0m31.566s

make -j2 (3)  Entries: 1568 (clients=4)      Entries: 1732 (clients=4)
latt          Max     8064 usec              Max    24859 usec
              Avg       78 usec              Avg     9431 usec
              Stdev    393 usec              Stdev   7431 usec
----------------------------------------------------------------------
debtree (1)   real    1m31.299s              real    1m8.275s
time          user    1m13.973s              user    0m46.395s
              sys     0m19.653s              sys     0m14.277s

debtree (2)   real    1m27.140s              real    1m3.181s
time          user    1m15.441s              user    0m46.223s
              sys     0m19.765s              sys     0m14.097s

The difference between (1) and (2) is probably that for (1) the cache
was still empty, while during (2) all needed data was already in memory.

debtree (3)   This is mostly as background for (4) which ran in parallel.
time          Results are not fully comparable due to timing issues!
              real    1m20.773s              real    1m6.512s
              user    1m5.460s               user    0m46.251s
              sys     0m17.813s              sys     0m13.361s

debtree (4)   Entries: 160 (clients=4)       Entries: 192 (clients=4)
latt          Max      134 usec              Max    21214 usec
              Avg       27 usec              Avg    12139 usec
              Stdev     17 usec              Stdev   6707 usec

Observations during scripted tests
----------------------------------
- music playback was never a problem
- with CFS the Chromium opening graphics stayed smooth and at close to
normal speed, some minor slowdown only during -j4 kernel builds
- with BFS there was a very notable slowdown and sometimes short skips
in the Chromium opening graphics during -j4 compiles; during -j2
compiles it stayed smooth, with maybe a very slight slowdown
- with CFS overall CPU usage is horrible during -j2 kernel compiles:
  top -d1 shows idle between 5 and 30% (!), probably averaging around
  15%, and that's with amarok and chromium running as well; for -j4
  usage is close to 100% full time
- BFS shows very close to 100% with both -j2 and -j4

Observations during interactive tests
-------------------------------------
Unfortunately the desktop froze completely with BFS very shortly after I
started the test, so observations are not completely reliable.

- with CFS the movie showed major skips during -j4 compile and Chromium
was only barely playable (and zero fun); with compile at nice -n 10
Chromium was a lot more playable, but movie still skipped a lot
- with BFS I only had a _very_ short observation period, but the movie
  seemed to play almost completely normally, even _without_ nicing the
  -j4 build; at the same time the game behaved much as it did with CFS
  after nicing the build

Very Rough Conclusions
----------------------
* BFS is faster in real time for both -j4 and -j2 kernel compiles, but
uses more resources getting there
* CFS might have done better if it had been using the CPUs at 100%
* BFS is indeed more efficient at -j2 than -j4 on a system with 2 cores,
  but when running more tasks than there are cores, interactive tasks
  slow down
* BFS does significantly worse running my script, which means that I lost
time doing my development work yesterday and today :-(
* BFS shows significantly better "latt" figures
* But at the same time only BFS showed notable slowdown in Chromium during
kernel compiles
* BFS seems to distribute capacity much more equally and fluently: when
there is too much work and no priorities are assigned, all tasks suffer,
but none are starved
* There is certainly room for improvement in CFS; the under-use of the
  CPUs and the movie skips are quite bad

With BFS I suspect that running the kernel builds niced, which I normally
do, would have shown perfect Chromium behavior.
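
(For the record, "niced" here means starting the build along the lines
of:

   nice -n 10 make -j4

i.e. at the same nice -n 10 level used for the compile in the
interactive test above.)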


I won't have the opportunity to do follow-up testing in the very short
but am in general prepared to spend more time on this in the coming
months.

Hope this is of value.

Cheers,
FJP


2009-09-09 00:01:17

by Nikos Chantziaras

Subject: Re: [long] Another BFS versus CFS shakedown

On 09/09/2009 02:42 AM, Frans Pop wrote:
> But this evening, while I was preparing and running the tests, I've had 4
> freezes of the desktop.

Unfortunately BFS isn't (yet?) reliable enough to run such tests on.
This might be the cause of the hangs (from bfs-faq.txt):

Currently known problems?
[...]
3. Stuck tasks after extensive use of trace functions
(ptrace etc.).
[...]
5. More likely to show up bugs in *other* code due to
being much more aggressive at using multiple CPUs so
race conditions will show up more frequently.

2009-09-09 00:44:14

by Frans Pop

Subject: Re: [long] Another BFS versus CFS shakedown

On Wednesday 09 September 2009, Frans Pop wrote:
> BFS general impression
> ----------------------
> I've used BFS for over a day yesterday and today, and in general I'm
> very impressed. During normal use (coding and testing a shell script
> that's CPU/memory heavy + normal mail/news/browser + amarok) I've not
> seen any strange issues. My notebook even suspended and resumed (StR)
> without any problems.
>
> With CFS I regularly have short freezes of the mouse cursor or when
> typing. I think that it's related to KDE's news reader knode updating
> from my local news server. With CFS I also saw such freezes a few
> times, but they _seemed_ less frequent and less severe. No hard data
> though.

The second "CFS" should have been "BFS" here. Sorry.

> But this evening, while I was preparing and running the tests, I've had
> 4 freezes of the desktop. The first two times it was only a partial
> freeze: taskbar was frozen, but I could still switch apps and use the
> graphical console; the last two times it was a full freeze of the
> display and keyboard (incl. e.g. numlock), but in the background
> everything continued to run normally and I could log in over SSH
> without any problem. On reboot some file systems did fail to unmount
> though.
>
> Normally my desktop and X.Org are 100% reliable.

Cheers,
FJP

P.S. I've received a very positive and friendly private reply from Con.

2009-09-11 06:20:41

by Ingo Molnar

Subject: Re: [long] Another BFS versus CFS shakedown


Frans, thanks for the detailed tests, they are very useful!

* Frans Pop <[email protected]> wrote:

> [1] I used Peter's version from:
> http://marc.info/?l=linux-kernel&m=125242343131497&w=2
[...]
> Disclaimer: I have no idea what the numbers from 'latt' mean or
> how reliable they are.

Note, the one you used was a still buggy version of latt.c producing
bogus latency numbers - you will need the fix to it attached below.

Furthermore, the following tune might be needed on mainline to make
it produce consistently good max numbers (not just good averages):

echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns

Let me pick out the worst observed mainline interactive behavior you
reported:

> - with CFS the movie showed major skips during -j4 compile and
> Chromium was only barely playable (and zero fun); with compile
> at nice -n 10 Chromium was a lot more playable, but movie still
> skipped a lot

FYI, this ought to be fixed in the latest scheduler tree, which you
can find in -tip:

http://people.redhat.com/mingo/tip.git/README
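
(For reference, pulling the tip tree amounts to a clone along these
lines; the repository path is as recalled from the README of that era,
so treat it as an assumption:

   git clone git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git tip

)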

The fix was confirmed by Nikos Chantziaras for a very similar set
of workloads (and partly confirmed by Jens for a different
workload).

Also, the make -j<nr_cpus> performance results improved in the latest
-tip, although it's still an open question whether all issues have
been fixed.

Ingo

--- latt.c.orig
+++ latt.c
@@ -39,6 +39,7 @@ static unsigned int verbose;
 struct stats
 {
 	double n, mean, M2, max;
+	int max_pid;
 };
 
 static void update_stats(struct stats *stats, unsigned long long val)
@@ -85,22 +86,6 @@ static double stddev_stats(struct stats
 	return sqrt(variance);
 }
 
-/*
- * The std dev of the mean is related to the std dev by:
- *
- *            s
- * s_mean = -------
- *          sqrt(n)
- *
- */
-static double stddev_mean_stats(struct stats *stats)
-{
-	double variance = stats->M2 / (stats->n - 1);
-	double variance_mean = variance / stats->n;
-
-	return sqrt(variance_mean);
-}
-
 struct stats delay_stats;
 
 static int pipes[MAX_CLIENTS*2][2];
@@ -212,7 +197,7 @@ static unsigned long usec_since(struct t
 static void log_delay(unsigned long delay)
 {
 	if (verbose) {
-		fprintf(stderr, "log delay %8lu usec\n", delay);
+		fprintf(stderr, "log delay %8lu usec (pid %d)\n", delay, getpid());
 		fflush(stderr);
 	}
 
@@ -300,7 +285,7 @@ static int __write_ts(int i, struct time
 	return write(fd, ts, sizeof(*ts)) != sizeof(*ts);
 }
 
-static long __read_ts(int i, struct timespec *ts)
+static long __read_ts(int i, struct timespec *ts, pid_t *cpids)
 {
 	int fd = pipes[2*i+1][0];
 	struct timespec t;
@@ -309,11 +294,14 @@ static long __read_ts(int i, struct time
 		return -1;
 
 	log_delay(usec_since(ts, &t));
+	if (verbose)
+		fprintf(stderr, "got delay %ld from child %d [pid %d]\n", usec_since(ts, &t), i, cpids[i]);
 
 	return 0;
 }
 
-static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts)
+static int read_ts(struct pollfd *pfd, unsigned int nr, struct timespec *ts,
+		   pid_t *cpids)
 {
 	unsigned int i;
 
@@ -322,7 +310,7 @@ static int read_ts(struct pollfd *pfd, u
 			return -1L;
 		if (pfd[i].revents & POLLIN) {
 			pfd[i].events = 0;
-			if (__read_ts(i, &ts[i]))
+			if (__read_ts(i, &ts[i], cpids))
 				return -1L;
 			nr--;
 		}
@@ -368,7 +356,6 @@ static void run_parent(pid_t *cpids)
 	srand(1234);
 
 	do {
-		unsigned long delay;
 		unsigned pending_events;
 
 		do_rand_sleep();
@@ -404,17 +391,17 @@ static void run_parent(pid_t *cpids)
 		 */
 		pending_events = clients;
 		while (pending_events) {
-			int evts = poll(ipfd, clients, 0);
+			int evts = poll(ipfd, clients, -1);
 
 			if (evts < 0) {
 				do_exit = 1;
 				break;
 			} else if (!evts) {
-				/* printf("bugger2\n"); */
+				printf("bugger2\n");
 				continue;
 			}
 
-			if (read_ts(ipfd, evts, t1)) {
+			if (read_ts(ipfd, evts, t1, cpids)) {
 				do_exit = 1;
 				break;
 			}

2009-09-11 07:04:49

by Frans Pop

Subject: Re: [long] Another BFS versus CFS shakedown

On Friday 11 September 2009, Ingo Molnar wrote:
> Note, the one you used was a still buggy version of latt.c producing
> bogus latency numbers - you will need the fix to it attached below.

Yes, I'm aware of that and have already copied Jens' latest version.

> Furthermore, the following tune might be needed on mainline to make
> it produce consistently good max numbers (not just good averages):
>
> echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns

Ack. I've seen the patches to change some defaults floating by.
Hmmm. I think the proposed new default for my system is 2ms with 2 CPUs?
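
For reference, the tunables involved can be read back directly. On
2.6.31-era kernels the relevant files should be the following (the
exact set may differ per kernel version):

   cat /proc/sys/kernel/sched_latency_ns
   cat /proc/sys/kernel/sched_min_granularity_ns
   cat /proc/sys/kernel/sched_wakeup_granularity_ns

A proposed 2ms default would then show up as 2000000 in
sched_wakeup_granularity_ns.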

I will not test against TIP at this time, but I plan to do the following:
- repeat my tests now using vanilla 2.6.31 for both BFS and CFS
This will provide a baseline to verify improvements.
- do two additional runs with CFS with some modified tunables
- do one more run probably when .32-rc2 is out
I'd expect that to have the scheduler fixes, while the worst post-merge
issues should be resolved.

I also have a couple of ideas for getting additional data. I'll post my
results as follow-ups.

I'm very impressed with the responses to the issues that have been raised,
but I think we do owe Con a huge thank you for setting off that process.

I also think there is a lot to be said for having a very straightforward
alternative scheduler available for baseline comparisons. It's much
easier to come out and say "something's broken" if you know some latency
issue is not due to buggy hardware or applications or orange bunnies with
a cosmic ray gun. I'll not go into the question of whether such a
scheduler should be in mainline.

Cheers,
FJP

2009-09-11 07:17:10

by Jens Axboe

Subject: Re: [long] Another BFS versus CFS shakedown

On Fri, Sep 11 2009, Frans Pop wrote:
> On Friday 11 September 2009, Ingo Molnar wrote:
> > Note, the one you used was a still buggy version of latt.c producing
> > bogus latency numbers - you will need the fix to it attached below.
>
> Yes, I'm aware of that and have already copied Jens' latest version.

BTW, I've put it in a git repo; it quickly gets really confusing with
so many versions going around. It can be accessed here:

git://git.kernel.dk/latt.git

and as with my other repos, snapshots are automatically generated every
hour when new commits have been made. To get the very latest latt and
not have to use git, download:

http://brick.kernel.dk/snaps/latt-git-latest.tar.gz
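
To build from a clone or snapshot, something along these lines should
work (a sketch, assuming no Makefile is shipped; -lm is needed for the
sqrt() used in the stats code):

   git clone git://git.kernel.dk/latt.git
   cd latt
   gcc -Wall -O2 -o latt latt.c -lm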

--
Jens Axboe

2009-09-11 07:53:37

by Ingo Molnar

Subject: Re: [long] Another BFS versus CFS shakedown


* Jens Axboe <[email protected]> wrote:

> On Fri, Sep 11 2009, Frans Pop wrote:
> > On Friday 11 September 2009, Ingo Molnar wrote:
> > > Note, the one you used was a still buggy version of latt.c producing
> > > bogus latency numbers - you will need the fix to it attached below.
> >
> > Yes, I'm aware of that and have already copied Jens' latest version.
>
> BTW, I've put it in a git repo; it quickly gets really confusing with
> so many versions going around. It can be accessed here:
>
> git://git.kernel.dk/latt.git
>
> and as with my other repos, snapshots are automatically generated every
> hour when new commits have been made. To get the very latest latt and
> not have to use git, download:
>
> http://brick.kernel.dk/snaps/latt-git-latest.tar.gz

Btw., your earlier latt reports should be discarded as invalid due
to that bug.

With the fixed latt.c version the mainline latencies (both
worst-case and average) were reported to be good, so in that area,
for this kind of measurement, mainline seems to be working well.

[ What happened is that the poll() bug was creating false latencies
  in the mainline scheduler tests. (BFS avoided measuring that bug
  incidentally: its aggressive balancer moved the wakee tasks away
  from the buggy, busy-looping poll() parent task. Two instances of
  latt.c would possibly have shown similar latencies.) ]

I see you added new 'work generator' changes to latt.c now, will
check/validate that version of latt.c too.

Thanks,

Ingo

2009-09-11 07:58:31

by Jens Axboe

Subject: Re: [long] Another BFS versus CFS shakedown

On Fri, Sep 11 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Fri, Sep 11 2009, Frans Pop wrote:
> > > On Friday 11 September 2009, Ingo Molnar wrote:
> > > > Note, the one you used was a still buggy version of latt.c producing
> > > > bogus latency numbers - you will need the fix to it attached below.
> > >
> > > Yes, I'm aware of that and have already copied Jens' latest version.
> >
> > BTW, I've put it in a git repo; it quickly gets really confusing
> > with so many versions going around. It can be accessed here:
> >
> > git://git.kernel.dk/latt.git
> >
> > and as with my other repos, snapshots are automatically generated every
> > hour when new commits have been made. To get the very latest latt and
> > not have to use git, download:
> >
> > http://brick.kernel.dk/snaps/latt-git-latest.tar.gz
>
> Btw., your earlier latt reports should be discarded as invalid due
> to that bug.

Yes

> With the fixed latt.c version the mainline latencies (both
> worst-case and average) were reported to be good, so in that area,
> for this kind of measurement, mainline seems to be working well.
>
> [ What happened is that the poll() bug was creating false latencies
>   in the mainline scheduler tests. (BFS avoided measuring that bug
>   incidentally: its aggressive balancer moved the wakee tasks away
>   from the buggy, busy-looping poll() parent task. Two instances of
>   latt.c would possibly have shown similar latencies.) ]
>
> I see you added new 'work generator' changes to latt.c now, will
> check/validate that version of latt.c too.

I did, it's a simple 'generate random data and compress it' work item
for each client. You can control the amount of work with -x, which sets
the KB of data it'll work on. Stats are generated both for wakeup
latency and for work processing latency.
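
Expressed as shell, each work item is conceptually something like the
following (an illustration of the idea only, not the actual latt code):

   # Generate X KB of random data and compress it; latt's -x sets X.
   X=8
   dd if=/dev/urandom bs=1k count=$X 2>/dev/null | gzip -c >/dev/null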

--
Jens Axboe