2005-09-13 21:12:22

by Danny ter Haar

Subject: Q: why _less_ performance on machine with SMP then with UP kernel ?

I've been posting here recently about our newsgateway.
In short:

If i enable both CPUs i get less performance than with only one CPU enabled.

Long version:

Here is a description of the setup:
---
It's a Tyan server/motherboard
http://www.tyan.com/products/html/ta26b2882.html
with 2 x Opteron 250 CPUs and 4 GB of ECC RAM.
There are 8 SCSI disks for storage,
copper Gig-E for internal communication to spool/header servers etc.,
and an acenic fiber-optic Gig-E 64-bit PCI card for the link to the internet.

Bandwidth use is sampled from the ethernet switch and visualised
with mrtg.

Take today for example:
http://newsgate.newsserver.nl/kernel/2.6.14-rc1-ethernet-bandwidth.png

From yesterday till 10:30am i ran 2.6.13.1 in UP mode.
As you can see blue (==incoming traffic) is fairly constant.
This morning i compiled/installed 2.6.14-rc1-smp.
I let it run till 12:15 but it's clear that it can't keep up
with the flow of data. I rebooted to 2.6.14-rc1 (UP) and that
keeps up with the data just fine.

So what is the difference between UP & SMP ?
Shared memory, shared interrupts.
I don't know _why_ it's not living up to _my_ expectation.
I hoped that the load (it's between 4 and 5 on the UP kernel) would drop
because certain processes would be split over the processors.

Anybody want to try and explain to me where i'm making an error ?

Config file & kern.log output can be found at :
http://newsgate.newsserver.nl/kernel/

A very confused

Danny


2005-09-13 21:54:18

by Andrew Walrond

Subject: Re: Q: why _less_ performance on machine with SMP then with UP kernel ?

On Tuesday 13 September 2005 22:12, Danny ter Haar wrote:
>
> From yesterday till 10:30am i ran 2.6.13.1 in UP mode.
> As you can see blue (==incoming traffic) is fairly constant.
> This morning i compiled/installed 2.6.14-rc1-smp.
> I let it run till 12:15 but it's clear that it can't keep up
> with the flow of data. I rebooted to 2.6.14-rc1 (UP) and that
> keeps up with the data just fine.
>
> So what is the difference between UP & SMP ?

Is there any indication in the system log that your userland (news?) software
was having problems? It may be entirely unrelated to your problem, but you
should anyway be aware of a nasty unresolved issue with all smp kernels >=
2.6.12 on smp x86_64 systems:

http://bugzilla.kernel.org/show_bug.cgi?id=4851

If you have any indication of userland problems, you might try
echo 0 > /proc/sys/kernel/randomize_va_space
which much reduces (but seemingly does not completely remove) this issue for
most people.
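
For anyone wanting to try this, a minimal sketch for checking the current setting first might look like the following. The helper name show_va_randomization is made up for illustration; only the /proc path comes from the suggestion above. The write is left commented out since it needs root and lasts only until the next reboot.

```shell
#!/bin/sh
# Path of the knob mentioned above.
KNOB=/proc/sys/kernel/randomize_va_space

# Hypothetical helper: print the current value of the given file,
# or "unknown" if it is not readable (e.g. the kernel lacks this sysctl).
show_va_randomization() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo unknown
    fi
}

echo "randomize_va_space is currently: $(show_va_randomization "$KNOB")"

# To turn randomization off until the next reboot (needs root), uncomment:
# echo 0 > "$KNOB"
```

Noting the old value before changing it makes it easy to restore later.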

>
> A very confused
>

One of the major symptoms of this particular bug ;)

Andrew Walrond

2005-09-14 06:16:55

by Danny ter Haar

Subject: Re: Q: why _less_ performance on machine with SMP then with UP kernel ?

Andrew Walrond wrote:
>> So what is the difference between UP & SMP ?
>Is there any indication in the system log that your userland (news?) software
>was having problems?

Not really. The load of the machine doesn't get as high as normal.
On machines that feed usenet to us (usenet ==push system) i see that we
grow "backlog" (we're not accepting usenet as fast as we should).

>It may be entirely unrelated to your problem, but you
>should anyway be aware of a nasty unresolved issue with all smp kernels >=
>2.6.12 on smp x86_64 systems:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=4851


Now that you mention it, i saw this in the log file when running the SMP
kernel:

newsgate kernel: mv[7024]: segfault at 00002aaaaabc3648 rip 00002aaaaaaac80e rsp 00007fffffdc17c0 error 4

It was only a oneliner, no further details.
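
For reference, the "error 4" field in such a line is the x86 page-fault error code, and its low three bits can be decoded by hand. A small sketch follows; the function name decode_fault_error is made up for illustration, and it only looks at bits 0 to 2.

```shell
#!/bin/sh
# Hypothetical helper: decode the low three bits of an x86 page-fault
# error code as printed in the kernel's "segfault at ... error N" line.
#   bit 0: 0 = page not present, 1 = protection violation
#   bit 1: 0 = read access,      1 = write access
#   bit 2: 0 = kernel mode,      1 = user mode
decode_fault_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then w="write"; else w="read"; fi
    if [ $((code & 4)) -ne 0 ]; then u="user mode"; else u="kernel mode"; fi
    echo "$u $w, $p"
}

decode_fault_error 4    # prints: user mode read, page not present
```

So the mv process faulted on a user-mode read of an address with no page mapped behind it. That by itself does not say which bug caused it.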

>If you have any indication of userland problems, you might try
> echo 0 > /proc/sys/kernel/randomize_va_space
>which much reduces (but seemingly does not completely remove) this issue for
>most people.

What i did try is the following (earlier, on other SMP kernels):
I tried binding certain processes to a certain cpu.
Did the same for the irq's.
Just to see if it mattered.

It didn't...
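
For the record, this kind of pinning can be sketched roughly as below. The helper cpu_mask is made up for illustration, and the PID and IRQ number in the commented-out lines are placeholders, not the real ones from this machine. Both taskset and /proc/irq/N/smp_affinity take a hexadecimal CPU bitmask.

```shell
#!/bin/sh
# Hypothetical helper: build the hexadecimal CPU bitmask (bit N = CPU N)
# from a list of CPU numbers.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # prints 1 (CPU 0 only)
cpu_mask 1      # prints 2 (CPU 1 only)
cpu_mask 0 1    # prints 3 (both CPUs)

# Example usage (needs root; PID 1234 and IRQ 24 are placeholders):
# taskset -p "$(cpu_mask 1)" 1234                    # pin a process to CPU 1
# echo "$(cpu_mask 0)" > /proc/irq/24/smp_affinity   # route an IRQ to CPU 0
```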

>> A very confused
>One of the major symptoms of this particular bug ;)

I know, that's why a plea for help is _my_ only resort ;-)
I do understand that at this level most people send in patches.

Thanks for your reply!

Will try the "echo 0" next time i boot a smp kernel.

Danny


2005-09-14 07:58:05

by Andrew Walrond

Subject: Re: Q: why _less_ performance on machine with SMP then with UP kernel ?

On Wednesday 14 September 2005 07:16, Danny ter Haar wrote:
>
> Now that you mention it, i saw this in the log file when running the SMP
> kernel:
>
> newsgate kernel: mv[7024]: segfault at 00002aaaaabc3648 rip
> 00002aaaaaaac80e rsp 00007fffffdc17c0 error 4
>
> It was only a oneliner, no further details.
>

Almost certainly due to bug 4851. You _really_ do not want to be using 2.6.12+
smp kernels on your hardware until this is sorted out. Stick to 2.6.11.12

I think this bug is going to bite a _lot_ of amd64 smp and dual core people as
they/distros start upgrading to these newer kernels.

Andrew/Linus: you might consider including a warning in kernel release
messages until this problem gets resolved, since it affects userland in such
an unpredictable way and we have a good handle on which systems it affects
(smp AMD64)

Andrew Walrond