2008-03-25 12:50:01

by Sanders, Rob M.

Subject: Performance changes between 2.6.13 and 2.6.23

Hello all,
I've been lurking on the digest for some time (I don't want to receive the full lkml traffic at work) and saw the
posts about Wine performance regressions in 2.6.24. Some of what I saw there, particularly Andi Kleen's
responses, mirrors something that I see on my box at home. I had emailed Andi directly since I only read the
digest, and I'm posting this here at his suggestion. Please CC: [email protected] with any replies, as I'm only
getting the lkml digest.

I'm running on a dual 2 GHz G5 Power Mac with 2 GB RAM. I recently upgraded from YDL4.0.91 (2.6.13 kernel) to
YDL6 (2.6.23 kernel) and noticed that the box as a whole seems more sluggish after the upgrade.
Of more concern, the main application that I build has seen a 4-5X slowdown. Under
YDL4.0.91 I could process roughly 1e8 data points in ~2 seconds; under YDL6.0 I now process 1e8 data points
in ~8 seconds. Total CPU load (from top) is about 5% under both systems. The application spawns multiple
processes and uses semaphores and shared memory to move data between them. When I use 'vmstat 3', the
single biggest difference I see between YDL4.0.91 and YDL6 is that the YDL4.0.91 system is context switching
about 7000 times per interval, whereas the YDL6 system is context switching about 1200 times per interval.

I've been talking somewhat with Owen Stampflee (who works for the distro maker, TSS) and have rebuilt the kernel on
the 2.6.23 box, removing some of the things that I don't need (Cell support, etc.). There did seem (subjectively)
to be a slight improvement after that. When I get some more time to play at home, I'm going to take the 2.6.13 .config
I use, build a 2.6.23 kernel from it with 'make oldconfig', and try to figure out whether it is a config issue.

I realize that going from 2.6.13 to 2.6.23 is a *huge* change, and that the problems may be tied not to the kernel
but to other things, but does anyone have other suggestions for finding the cause of the slowdown?
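To make the vmstat comparison between the two installs repeatable, the `cs` column can be averaged over a fixed number of samples. The sketch below is a hypothetical helper (`avg_cs` is not part of any tool) and assumes procps-style `vmstat` output with its two header lines. Per the procps documentation, `cs` is context switches per second averaged over each interval, and the first data line is an average since boot, so it is skipped.

```sh
# avg_cs: hypothetical helper. Reads procps-style vmstat output on stdin and
# prints the mean of the "cs" column (context switches per second).
avg_cs() {
    awk '
        # Until the column header is found, look for the "cs" field name.
        !col { for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next }
        # Numeric rows are samples; the first one is the since-boot average.
        $col ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
        END { if (n) printf "%.0f\n", sum / n; else print "no samples" }
    '
}

# Usage: 11 samples 3 s apart; the mean covers the last 10.
#   vmstat 3 11 | avg_cs
```

Running the same command on both installs, with the application under the same load, gives two numbers that can be compared directly.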

Thank you...


Rob


2008-03-25 13:26:16

by Bart Van Assche

Subject: Re: Performance changes between 2.6.13 and 2.6.23

On Tue, Mar 25, 2008 at 1:34 PM, Sanders, Rob M. <[email protected]> wrote:
> Of more particular concern, the main application that I build has seen a 4-5X
> slowdown in performance.

Is this a single-threaded or a multithreaded application? Is this
application CPU-bound or I/O-bound? Does it rely heavily on memory
allocation?

I'm not sure this will tell you the cause, but what might help is to
compile and run lmbench2 and interbench, and to compare the results
for the two kernel versions. Make sure that only the kernel version
differs between the two test runs, and that all other components
(including libc and libpthread) stay the same. Booting the old YDL with
the new kernel should work, while booting the new YDL with an old kernel
will probably cause trouble.
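A simple way to follow this advice is to record the relevant versions alongside each benchmark run and diff them afterwards. A minimal sketch, assuming a glibc-based system where `getconf` knows the `GNU_LIBC_VERSION` and `GNU_LIBPTHREAD_VERSION` variables; the helper name `print_env` is hypothetical.

```sh
# print_env: hypothetical helper that records the components that should stay
# fixed between benchmark runs (everything except the kernel).
print_env() {
    echo "kernel=$(uname -r)"
    # glibc's getconf reports these; print "unknown" where it does not.
    echo "libc=$(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
    echo "pthread=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || echo unknown)"
}

# Usage: save one file per run, then compare; ideally only "kernel=" differs.
#   print_env > env-run1.txt
#   diff env-run1.txt env-run2.txt
```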

See also http://www.bitmover.com/lmbench/ and
http://members.optusnet.com.au/ckolivas/interbench/.

Bart.

2008-03-25 13:55:56

by Sanders, Rob M.

Subject: RE: Performance changes between 2.6.13 and 2.6.23

Bart,
Each process is single-threaded, although each process is linked with the
pthread library (-lpthread). For this particular application I would expect the
bottleneck to be I/O between the processes. I hadn't thought
about trying to boot YDL4 using the new kernel; I'll try that, and I'll
look at lmbench2 and interbench. Thanks....

Rob



2008-03-25 14:23:23

by Bart Van Assche

Subject: Re: Performance changes between 2.6.13 and 2.6.23

On Tue, Mar 25, 2008 at 2:52 PM, Sanders, Rob M. <[email protected]> wrote:
> Each process is single-threaded, although each process is linked with the
> pthread library (-lpthread). For this particular application I would expect the
> bottleneck to be I/O between the processes. I hadn't thought
> about trying to boot YDL4 using the new kernel; I'll try that, and I'll
> look at lmbench2 and interbench. Thanks....

In that case it might be interesting to observe the number of context
switches per second caused by the different processes. If the product
of the context switch time reported by lmbench2 and the number of
context switches per second is more than about 0.1 (that product is the
fraction of each second spent switching), a lot of time is being spent
just on context switching, and the application should probably be
optimized to cause fewer context switches. This holds for any OS.
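The rule of thumb above is a single multiplication: switch cost (seconds per switch) times switch rate (switches per second) gives the fraction of each second spent switching. For example, a hypothetical 5 microsecond switch cost at 10000 switches per second gives 5e-6 * 10000 = 0.05. A minimal sketch, assuming the cost is given in microseconds (the unit lmbench's context-switch test reports); `ctx_overhead` is a hypothetical name.

```sh
# ctx_overhead: hypothetical helper for the rule of thumb above.
#   $1 = cost of one context switch in microseconds (e.g. from lmbench)
#   $2 = observed context switches per second
# Prints the fraction of each second spent switching, flagged if above 0.1.
ctx_overhead() {
    awk -v us="$1" -v rate="$2" 'BEGIN {
        frac = us / 1e6 * rate
        note = (frac > 0.1) ? " (> 0.1)" : ""
        printf "%.3f%s\n", frac, note
    }'
}

# Usage:
#   ctx_overhead 5 10000
```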

On a Linux system you can observe the number of context switches
performed by all processes, e.g. with the following bash script:

interval=5; last=""
while true; do
    ctxt=$(while read -r col1 col2 rest; do
        if [ "$col1" = ctxt ]; then echo "$col2"; fi; done </proc/stat)
    if [ -n "$last" ]; then echo $(( (ctxt - last) / interval )); fi  # switches/s
    last=$ctxt; sleep "$interval"
done

The above script uses as many bash built-ins as possible so that it
causes as few context switches as possible.
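The script above counts switches for the whole system. To see which of the application's processes are responsible, newer kernels also expose per-task counters (`voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches`) in /proc/PID/status; older kernels such as 2.6.13 may not have them. A minimal sketch that sums the two (`ctxsw_total` is a hypothetical name):

```sh
# ctxsw_total: hypothetical helper. Prints voluntary + involuntary context
# switches recorded in a /proc/PID/status file; fails if neither field exists.
ctxsw_total() {
    awk '/^voluntary_ctxt_switches:/    { v = $2; found = 1 }
         /^nonvoluntary_ctxt_switches:/ { n = $2; found = 1 }
         END { if (!found) exit 1; print v + n }' "$1"
}

# Usage: per-second rate for one process (PID 1234 is a placeholder).
#   a=$(ctxsw_total /proc/1234/status); sleep 5
#   b=$(ctxsw_total /proc/1234/status); echo $(( (b - a) / 5 ))
```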

Bart.

2008-03-25 16:41:48

by Ray Lee

Subject: Re: Performance changes between 2.6.13 and 2.6.23

On Tue, Mar 25, 2008 at 5:34 AM, Sanders, Rob M. <[email protected]> wrote:
> Hello all,
> I've been lurking on the digest for some time (don't want to receive full lkml traffic at work) and saw the
> posts about Wine performance regressions in 2.6.24. Some of what I saw there, particularly Andi Kleen's
> responses, mirror something that I see on my box at home. I had emailed Andi directly since I only read the
> digest, and I'm posting this here at his suggestion. Please CC: [email protected] with any replies, as I'm only
> getting the lkml digest.
> I'm running on a dual 2GHZ G5 Powermac w/2GB ram. I recently upgraded from YDL4.0.91 (2.6.13 kernel) to
> YDL6 (2.6.23 kernel) and noticed that the overall performance of the box seems more sluggish after the upgrade.
> Of more particular concern, the main application that I build has seen a 4-5X slowdown in performance. Under
> YDL4.0.91 I could process roughly 1e8 data points in ~2 seconds, and under YDL6.0 I now process 1e8 data points

So, two processors, and multiple processes passing data back and
forth. Key point seems to be:

> in ~8 seconds. Total CPU loading (from top) is about 5%, under both systems. The application spawns multiple
> processes and uses semaphores and shared memory to move data between the processes. When I use 'vmstat 3' the
> single biggest difference I see between YDL4.0.91 and YDL6 is that the YDL4.0.91 system is
> context switching about 7000 per interval, whereas the YDL6 system is context switching about 1200 times per interval.

Many more context switches per second, so a lot less work is getting
done each time.

> I've been talking somewhat with Owen Stampflee (works for the distro maker, TSS) and have rebuilt the kernel on
> the 2.6 box removing some of the things that I don't need on my box (cell support, etc). There did seem (subjectively)
> to be a slight improvement after that. When I get some more time to play at home I'm going to take the 2.6.13 .config
> I use and build a 2.6.23 kernel using 'make oldconfig' and try to figure if it is a config issue.
> I realize that going from 2.6.13 to 2.6.23 is a *huge* change, and that the problems may not be tied to the kernel
> but to other things, but are there any other suggestions folks have for finding the cause of the performance slowdown?

A lot has changed between 2.6.23 and current mainline as well.
Particularly in the scheduler, which I suspect is the issue for your
test. If possible, could you try a 2.6.25-rc-latest kernel, both
before and after an "echo 5 > /proc/sys/kernel/sched_features" and see
if that makes any difference?
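The before/after comparison can be scripted so that the original setting is always restored. A minimal sketch, assuming root access and a kernel with CONFIG_SCHED_DEBUG that exposes the knob at /proc/sys/kernel/sched_features as in the message above; the location and format of this knob have changed in later kernels, so the path is kept in a variable. `with_sched_features` and `./run_benchmark.sh` are hypothetical names.

```sh
# Path to the scheduler feature knob; override if your kernel puts it elsewhere.
SCHED_FEATURES=${SCHED_FEATURES:-/proc/sys/kernel/sched_features}

# with_sched_features VALUE CMD...: hypothetical helper. Writes VALUE to the
# knob, runs CMD, then restores the previous value and returns CMD's status.
with_sched_features() {
    local value=$1 old status
    shift
    old=$(cat "$SCHED_FEATURES") || return 1
    echo "$value" > "$SCHED_FEATURES" || return 1
    "$@"
    status=$?
    echo "$old" > "$SCHED_FEATURES"
    return "$status"
}

# Usage: run the same benchmark with the default setting and with 5.
#   ./run_benchmark.sh
#   with_sched_features 5 ./run_benchmark.sh
```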

2008-03-25 16:47:22

by Sanders, Rob M.

Subject: RE: Performance changes between 2.6.13 and 2.6.23




Ray,
I'll add that to the list of things to try. I do want to clarify that the YDL4 system, with 7000 context switches every
3 seconds, was processing 4-5 times as much data as the YDL6 system with 1200 context switches. The other test
I want to do is to (re)install YDL5 and move my app over to it. I hadn't used YDL5 because my compilers didn't work
on it; I've since upgraded them (at least to eval versions) for YDL6.

Rob

2008-03-27 19:37:28

by Sanders, Rob M.

Subject: RE: Performance changes between 2.6.13 and 2.6.23




Sorry to disappear for a few days; I just hadn't had time to do anything at home. I finally had some time to
run some tests, and at this point I'm thinking that the problem is not with the kernel. I'm not quite sure where to look
next, but I've done the following tests:

YDL4.0.91 with 2.6.13 kernel - normal speed
YDL6.0 with 2.6.23 kernel - 4-5x slowdown
YDL4.1 with 2.6.15 kernel - 4-5x slowdown
YDL4.0.91 with 2.6.15 kernel - *normal speed* - kernel config pulled from YDL4.1 install
YDL4.0.91 with 2.6.23 kernel - failed boot

I'm going to try to test a few other distros (Fedora 8/9, possibly openSUSE) to see whether they have similar
issues. If anyone has other suggestions (including websites to read), feel free to drop me a private line
(at [email protected]). Thanks....

Rob

2008-03-28 09:29:19

by Ingo Molnar

Subject: Re: Performance changes between 2.6.13 and 2.6.23


* Sanders, Rob M. <[email protected]> wrote:

> Sorry to disappear for a few days; I just hadn't had time to do
> anything at home. I finally had some time to run some tests, and at this
> point I'm thinking that the problem is not with the kernel. I'm not quite
> sure where to look next, but I've done the following tests:
>
> YDL4.0.91 with 2.6.13 kernel - normal speed
> YDL6.0 with 2.6.23 kernel - 4-5x slowdown
> YDL4.1 with 2.6.15 kernel - 4-5x slowdown
> YDL4.0.91 with 2.6.15 kernel - *normal speed* - kernel config pulled from YDL4.1 install
> YDL4.0.91 with 2.6.23 kernel - failed boot

if you suspect the scheduler then please try the following suggestions i
made in another thread:

---------->
could you run this script while such a slowdown is really prominent:

http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and send me the output it generates? The output is the most useful if
you do this on a kernel that has CONFIG_SCHED_DEBUG=y and
CONFIG_SCHEDSTATS=y enabled.
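Before running the script, it is worth checking whether the running kernel was built with these options. A minimal sketch, assuming the kernel config is available as a plain file (many distributions install /boot/config-$(uname -r)) or as /proc/config.gz (only when the kernel exposes it); `kconfig_enabled` is a hypothetical name.

```sh
# kconfig_enabled FILE OPTION...: hypothetical helper. For each OPTION, reports
# whether FILE (a kernel .config, optionally gzip-compressed) sets it to y.
kconfig_enabled() {
    local file=$1 reader opt
    shift
    case "$file" in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    for opt in "$@"; do
        # $reader is deliberately unquoted so "gzip -dc" splits into two words.
        if $reader "$file" | grep -q "^$opt=y\$"; then
            echo "$opt=y"
        else
            echo "$opt not set"
        fi
    done
}

# Usage:
#   kconfig_enabled /boot/config-$(uname -r) CONFIG_SCHED_DEBUG CONFIG_SCHEDSTATS
```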

on the off chance that this issue has been fixed in the soon-to-be
2.6.25 kernel, you might also want to try x86.git/latest, which is based
on the latest Linus tree and has all relevant x86 fixes and improvements
added as well:
http://people.redhat.com/mingo/x86.git/README

several of the changes can affect performance.

a third (and most comprehensive) way to debug this would be to send me a
scheduler trace of such a slowdown, you can generate a scheduler trace
the following way:

http://people.redhat.com/mingo/sched-devel.git/readme-tracer.txt

but we can probably give a first estimate based on the cfs-debug-info
output already. Btw., you can combine the scheduler and the x86 git trees
into a temporary unified tree with these two commands:

git checkout -b tmp x86/latest
git merge sched-devel/latest

(run "make oldconfig" to pick up the new config options.)

Ingo

2008-03-28 10:34:19

by Sanders, Rob M.

Subject: RE: Performance changes between 2.6.13 and 2.6.23


Ingo,
Thanks. I'll try this over the weekend if I get some free time, but it might not happen.
This is a busy weekend....

Rob