2009-10-28 19:24:54

by Igor Chudov

Subject: Hyperthreading on 4 core CPU DECREASES performance???

At work, we have some users (in finance) who run a multithreaded app,
and they need every bit of performance we can squeeze from their
computers.

We are looking into whether we can obtain additional speed by enabling
Intel hyperthreading, as opposed to disabling it.

If we are able to get benefits from hyperthreading, it will be a huge
argument for converting certain high-end users' desktops to Linux
from Windows XP.

I wrote a test Perl script that starts several tasks in parallel. All
these tasks perform a certain amount of calculations and exit. The
test completes when all of them exit.
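
The original Perl script is not included in the thread. As a rough illustration only, a test of the kind described above could look like the following Python sketch. The workload, the function name run_parallel and the iteration count are all assumptions made for this example, not the author's code.

```python
import subprocess
import sys
import time

# Placeholder CPU-bound workload run by each child; not the original Perl test.
CHILD_CODE = (
    "import sys\n"
    "total = 0\n"
    "for i in range(int(sys.argv[1])):\n"
    "    total += i * i\n"
)


def run_parallel(n_tasks, iterations):
    """Start n_tasks CPU-bound children and wait for all of them.

    Returns (wall_seconds, per_task_seconds), where each per-task value is
    the time from the common start until that child was seen to have exited.
    """
    start = time.monotonic()
    procs = [
        subprocess.Popen([sys.executable, "-c", CHILD_CODE, str(iterations)])
        for _ in range(n_tasks)
    ]
    finished = {}
    while len(finished) < n_tasks:
        for idx, proc in enumerate(procs):
            if idx not in finished and proc.poll() is not None:
                finished[idx] = time.monotonic() - start
        time.sleep(0.01)  # polling granularity limits timing resolution
    per_task = [finished[i] for i in range(n_tasks)]
    return max(per_task), per_task


if __name__ == "__main__":
    wall, per_task = run_parallel(n_tasks=4, iterations=2_000_000)
    print(f"wall time: {wall:.2f}s")
    for idx, seconds in enumerate(per_task):
        print(f"task {idx}: {seconds:.2f}s")
```

Printing the per-task times alongside the wall time makes uneven finishing visible: if two tasks share a core, they would be expected to finish noticeably later than the others.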

The results were actually a disappointment when the number of tasks was
equal to the number of physical cores. For the test with four parallel
subprocesses on four cores, it takes longer to run with HT than
without HT.

I think that I understand why.

What I found is that not all of these parallel tasks finish at the
same time. This happens because, quite often, two tasks are assigned to
two logical CPUs that share the same core, while other tasks get a
core to themselves and some cores sit idle.

I cannot believe that I am the only guy with this problem, and hope
that the Linux community has found a solution. For example, perhaps
the scheduler could assign higher priority to some logical CPUs (say,
to 0, 2, 4, 6) and lower priority to others. This way the higher
priority ones would be filled with running tasks before the fake
"shadow processors" 1, 3, 5, 7 are utilized.

I would like to know if perhaps there is a boot option to this effect.

This is Ubuntu Hardy, 2.6.24 kernel. I tried the same with 2.6.31,
with the same effect.

This article from Intel:

http://software.intel.com/sites/oss/pdfs/mclinux.pdf

It talks about intelligent handling of multiple cores as a done deal,
but in my experience that did not actually occur.

I would like to know how can I make the scheduler to prefer to spread
the tasks across physical cores as opposed to bundling two on one core
and leaving some cores idle.
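
Which logical CPUs share a physical core differs between machines, so the 0, 2, 4, 6 numbering above is not guaranteed. On kernels that expose thread_siblings_list in sysfs, the mapping can be read directly. The following is a minimal Python sketch under that assumption; the function names are invented for this example.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(siblings_by_cpu):
    """Given {cpu: [sibling cpus]}, keep the lowest-numbered CPU of each core."""
    return sorted({min(siblings) for siblings in siblings_by_cpu.values()})


def read_siblings(base="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU that exposes it (Linux only)."""
    siblings_by_cpu = {}
    pattern = os.path.join(base, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        # Strip the base first, because the base path itself contains "cpu".
        cpu = int(re.search(r"cpu(\d+)", path[len(base):]).group(1))
        with open(path) as handle:
            siblings_by_cpu[cpu] = parse_cpu_list(handle.read())
    return siblings_by_cpu


if __name__ == "__main__":
    print("one logical CPU per physical core:", one_cpu_per_core(read_siblings()))
```

The resulting list is a candidate set of CPUs to pin tasks to when there are no more tasks than physical cores.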


2009-10-28 20:46:29

by Andreas Mohr

Subject: Re: Hyperthreading on 4 core CPU DECREASES performance???

Hi,

> I would like to know how can I make the scheduler to prefer to spread
> the tasks across physical cores as opposed to bundling two on one core
> and leaving some cores idle.

Did you have /sys/devices/system/cpu/sched_smt_power_savings activated?

Andreas Mohr

2009-10-28 21:11:48

by Igor Chudov

Subject: Re: Hyperthreading on 4 core CPU DECREASES performance???

On Wed, Oct 28, 2009 at 3:46 PM, Andreas Mohr wrote:
>> I would like to know how can I make the scheduler to prefer to spread
>> the tasks across physical cores as opposed to bundling two on one core
>> and leaving some cores idle.
>
> Did you have /sys/devices/system/cpu/sched_smt_power_savings activated?

Andreas, no, I did not have it activated; it was set to 0. I checked
it before and double-checked it now.
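
For reference, the check discussed here amounts to reading one small sysfs file. A hedged Python sketch follows; the helper name read_sysfs_int is made up for this example, and the file is assumed to be absent on kernels that do not provide this tunable.

```python
def read_sysfs_int(path):
    """Return the integer stored in a sysfs file, or None if it is unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_sysfs_int("/sys/devices/system/cpu/sched_smt_power_savings")
    if value is None:
        print("tunable not available on this kernel")
    elif value == 0:
        print("SMT power savings off")
    else:
        print(f"SMT power savings set to {value}")
```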

2009-10-28 22:23:44

by Alan

Subject: Re: Hyperthreading on 4 core CPU DECREASES performance???

> This is Ubuntu Hardy, 2.6.24 kernel. I tried the same with 2.6.31,
> with the same effect.
>
> This article from Intel:
>
> http://software.intel.com/sites/oss/pdfs/mclinux.pdf
>
> It talks about intelligent handling of multiple cores as a done deal,
> but in my experience that did not actually occur.
>
> I would like to know how can I make the scheduler to prefer to spread
> the tasks across physical cores as opposed to bundling two on one core
> and leaving some cores idle.

The scheduler will try and do balancing for packages and for HT. Make
sure your distribution is built with the SCHED_SMT and SCHED_MC options
enabled. Some of the non-enterprise distributions may well not have these
enabled.

For hand laying out threads see: man pthread_setaffinity_np

The win from HT depends a lot on the CPU and also the workload mix. In
some workloads it will decrease performance.
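
pthread_setaffinity_np is the C interface for pinning threads. For a test made of separate processes, a similar effect can be sketched in Python with os.sched_setaffinity, which is Linux-only. The helper names and the choice of preferred CPUs below are assumptions for illustration; a real layout would choose one logical CPU per physical core.

```python
import os


def assign_round_robin(n_tasks, preferred_cpus):
    """Map task index -> logical CPU, cycling through preferred_cpus."""
    return [preferred_cpus[i % len(preferred_cpus)] for i in range(n_tasks)]


def pin_current_process(cpu):
    """Restrict the calling process to one logical CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Hypothetical choice: treat every CPU we are allowed to use as preferred.
    allowed = sorted(os.sched_getaffinity(0))
    plan = assign_round_robin(4, allowed)
    pids = []
    for cpu in plan:
        pid = os.fork()
        if pid == 0:
            # Child: pin itself, do a small placeholder workload, then exit.
            pin_current_process(cpu)
            sum(i * i for i in range(100_000))
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    print("task -> CPU plan:", plan)
```

Pinning removes the scheduler's freedom to rebalance, so it tends to help only when the number of tasks and their placement are known in advance.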


2009-10-29 00:00:09

by Igor Chudov

Subject: Re: Hyperthreading on 4 core CPU DECREASES performance???

On Wed, Oct 28, 2009 at 5:24 PM, Alan Cox wrote:
>> This is Ubuntu Hardy, 2.6.24 kernel. I tried the same with 2.6.31,
>> with the same effect.

>> I would like to know how can I make the scheduler to prefer to spread
>> the tasks across physical cores as opposed to bundling two on one core
>> and leaving some cores idle.
>
> The scheduler will try and do balancing for packages and for HT. Make
> sure your distribution is built with the SCHED_SMT and SCHED_MC options
> enabled. Some of the non-enterprise distributions may well not have these
> enabled.

Yes, for the stock Hardy -server and -generic kernels:

CONFIG_SCHED_MC=y
CONFIG_SCHED_SMT=y
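
One way to check these options for the running kernel is to parse its config file. The Python sketch below assumes the config is installed as /boot/config-<release>, as Debian and Ubuntu do; other distributions may keep it elsewhere. The function name is invented for this example.

```python
import os


def parse_kernel_config(text):
    """Parse 'CONFIG_X=value' lines into a dict, skipping comments and blanks."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


if __name__ == "__main__":
    path = f"/boot/config-{os.uname().release}"
    try:
        with open(path) as handle:
            options = parse_kernel_config(handle.read())
    except OSError:
        options = {}
        print(f"could not read {path}")
    for name in ("CONFIG_SCHED_SMT", "CONFIG_SCHED_MC"):
        print(name, "=", options.get(name, "not set"))
```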

>
> For hand laying out threads see: man pthread_setaffinity_np
>
> The win from HT depends a lot on the CPU and also the workload mix. In
> some workloads it will decrease performance.

OK, I think that I understand. Thanks.

So, what you are basically saying is that there is no option to
improve this behavior, right?

Thanks Alan.

2009-10-29 00:44:37

by Ben Gamari

Subject: Re: Hyperthreading on 4 core CPU DECREASES performance???

Excerpts from Igor Chudov's message of Wed Oct 28 20:00:10 -0400 2009:
> On Wed, Oct 28, 2009 at 5:24 PM, Alan Cox wrote:
> > The win from HT depends a lot on the CPU and also the workload mix. In
> > some workloads it will decrease performance.
>
> OK, I think that I understand. Thanks.
>
> So, what you are basically saying is that there is no option to
> improve this behavior, right?
>
He was saying (correct me if I'm wrong) that SMT isn't necessarily a
win. For some workloads that utilize a variety of processor
resources, Hyperthreading can help extract parallelism and improve
overall pipeline utilization. In many types of workloads, however, this
increased parallelism only results in increased cache and TLB thrashing
and bandwidth contention. If your workload has a large cache footprint,
this will probably be a significant consideration. This problem is
intrinsic to SMT, and outside of the usual scheduling considerations
for cache-friendliness, there is little that the kernel can do.

Whoever wrote that intelligent handling of multiple cores was a done
deal was lying. Certainly, the kernel is very good at scheduling
across multiple cores, but using them in a way that extracts
performance is completely up to the application and is by no means an
easy task.

- Ben