2008-03-16 05:39:00

by Allan Menezes

[permalink] [raw]
Subject: HPL benchmarking linux kernel for shared memory performance.

Hi,
I eliminated in which kernel verion this begins to happen. I am
benchmarking a single node with hpl and openmpi beta 1.3 and gotoblas v1.24
for personal noncommercial reasons.
I tried a single node with the command $ mpirun -np 1 ./xhpl
and with kernel ver 2.6.23.14 i get over 38Gflops
but with kernel ver 2.6.23.15 compiling with the same .config i get 7.xx
Gflops which is 1/5th that of the other kernel.
Keep in mind that only the kernels have changed from kernel.org not any
hardware or anything else as all else is same! but the performace drops
to 1/5th
There is no network involved in this single node quad core intel test
but just shared memory.
So the shared memory or smp performance of the newer kernels is far far
worse than upto 2.6.23.14!
Even with 2.6.25.rc5 the performance is degraded!
Can you please help me find why this is occurring? Please advise!
mY set up Quad core Q6600 intel overclocked stably to 2.88 GHZ , 6 gig
ddr2 800 mhz dual channel ram,
Allan Menezes


2008-03-16 10:43:21

by Marcin Slusarz

[permalink] [raw]
Subject: Re: HPL benchmarking linux kernel for shared memory performance.

On Sun, Mar 16, 2008 at 01:38:51AM -0400, Allan Menezes wrote:
> Hi,
> I eliminated in which kernel verion this begins to happen. I am
> benchmarking a single node with hpl and openmpi beta 1.3 and gotoblas v1.24
> for personal noncommercial reasons.
> I tried a single node with the command $ mpirun -np 1 ./xhpl
> and with kernel ver 2.6.23.14 i get over 38Gflops
> but with kernel ver 2.6.23.15 compiling with the same .config i get 7.xx
> Gflops which is 1/5th that of the other kernel.
> Keep in mind that only the kernels have changed from kernel.org not any
> hardware or anything else as all else is same! but the performace drops to
> 1/5th
> There is no network involved in this single node quad core intel test but
> just shared memory.
> So the shared memory or smp performance of the newer kernels is far far
> worse than upto 2.6.23.14!
> Even with 2.6.25.rc5 the performance is degraded!
> Can you please help me find why this is occurring? Please advise!
> mY set up Quad core Q6600 intel overclocked stably to 2.88 GHZ , 6 gig ddr2
> 800 mhz dual channel ram,
> Allan Menezes

If you want to speed up bug resolution, you can bisect it to only one patch with git
(google for git bisect) on this tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.23.y.git

ps: please post your .config and dmesg output

Marcin

2008-03-16 18:14:18

by Roger Heflin

[permalink] [raw]
Subject: Re: HPL benchmarking linux kernel for shared memory performance.

Allan Menezes wrote:
> Hi,
> I eliminated in which kernel verion this begins to happen. I am
> benchmarking a single node with hpl and openmpi beta 1.3 and gotoblas v1.24
> for personal noncommercial reasons.
> I tried a single node with the command $ mpirun -np 1 ./xhpl
> and with kernel ver 2.6.23.14 i get over 38Gflops
> but with kernel ver 2.6.23.15 compiling with the same .config i get 7.xx
> Gflops which is 1/5th that of the other kernel.
> Keep in mind that only the kernels have changed from kernel.org not any
> hardware or anything else as all else is same! but the performace drops
> to 1/5th
> There is no network involved in this single node quad core intel test
> but just shared memory.
> So the shared memory or smp performance of the newer kernels is far far
> worse than upto 2.6.23.14!
> Even with 2.6.25.rc5 the performance is degraded!
> Can you please help me find why this is occurring? Please advise!
> mY set up Quad core Q6600 intel overclocked stably to 2.88 GHZ , 6 gig
> ddr2 800 mhz dual channel ram,
> Allan Menezes

Make sure that it is actually using multiple processes or threads, if it was
only using a single thread the number could be that low, and a number of really
trivial things being wrong could cause it to only be using one thread. "top -H"
will show threads.

Is the HPL executable identical (the same executable should work just fine so
long as the underlying HW is the same, so no need to recompile/relink xhpl)?

And you might also try Intel's MKL with hpl and see if that is better or worse
than gotoblas (this probably won't affect the problem, but it would be good to
know if the Intel MKL is better or worse than gotoblas).

Also, as the other person mentioned use git bisect and figure out which patch
did it, if you can identify one patch someone can probably figure it out.

Roger