2010-01-26 10:02:14

by Justin Piszcz

Subject: Hyperthreading on Core i7s: To use or not to use?

Hello,

Is the following the 'correct' kernel [CPU] configuration for a Core i7 860/870?

- Multi-core support
- Cores: 8
- SMT: Enabled/ON

From CONFIG_SCHED_SMT:

  SMT scheduler support improves the CPU scheduler's decision making
  when dealing with Intel Pentium 4 chips with HyperThreading at a
  cost of slightly increased overhead in some places. If unsure say
  N here.

Does this also 'help' and/or 'apply' as much when dealing with Core i7s?

--

A quick little benchmark (pbzip2 -9 on the Linux kernel source); the two
8-CPU runs (HT off/on) are within the noise:
- Multicore(8)/HT(Off) = 73.72user 0.33system 0:09.50elapsed 779%CPU (0avgtext+0avgdata 458528maxresident)k
- Multicore(8)/HT(On) = 74.28user 0.40system 0:09.67elapsed 772%CPU (0avgtext+0avgdata 428304maxresident)k
- Multicore(4)/HT(On) = 68.76user 0.30system 0:17.44elapsed 396%CPU (0avgtext+0avgdata 213616maxresident)k
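To put a number on 'within the noise', the GNU time(1) summary lines can be turned into data. A minimal Python sketch, assuming the lines use GNU time's default format as shown above; `parse_time_line` and the other names are hypothetical helpers, not part of any tool:

```python
import re

# Matches the start of GNU time's default summary line, e.g.
# "73.72user 0.33system 0:09.50elapsed 779%CPU (...)"
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed\s+(?P<cpu>\d+)%CPU"
)


def parse_elapsed(field: str) -> float:
    """Convert an elapsed field like '0:09.50' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line: str) -> dict:
    """Extract user, system, elapsed (all in seconds) and %CPU from one line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not a GNU time summary line: {line!r}")
    return {
        "user": float(m["user"]),
        "system": float(m["system"]),
        "elapsed": parse_elapsed(m["elapsed"]),
        "cpu_percent": int(m["cpu"]),
    }


def relative_difference(a: float, b: float) -> float:
    """Gap between two timings as a fraction of the faster one."""
    return abs(a - b) / min(a, b)


ht_off = parse_time_line("73.72user 0.33system 0:09.50elapsed 779%CPU")
ht_on = parse_time_line("74.28user 0.40system 0:09.67elapsed 772%CPU")
print(f"{relative_difference(ht_off['elapsed'], ht_on['elapsed']):.1%}")  # 1.8%
```

A single run per configuration cannot show how large the noise actually is; repeating each configuration several times and comparing the spread would.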

--

Has anyone done any in-depth benchmarking for the core i7s that have multiple
cores and HT disabled/enabled?

Justin.


2010-01-26 10:57:00

by Daniel J Blueman

Subject: Re: Hyperthreading on Core i7s: To use or not to use?

On Jan 26, 10:10 am, Justin Piszcz wrote:
> Has anyone done any in-depth benchmarking for the core i7s that have multiple
> cores and HT disabled/enabled?

With my Dell Studio 15 (model 1557) laptop, there is no option to
disable HT in the current BIOS, so I booted with maxcpus=4 instead (since the
kernel enumerates non-sibling cores first). That gave me a 5-15% speedup on
some large image-processing jobs (convolution, FFTs, conversion) running on all
available cores, presumably due to better cache efficiency.
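The maxcpus=4 trick relies on logical CPUs 0-3 being four different physical cores. As far as I know the numbering follows the firmware's CPU tables and can differ between machines, so it may be worth checking before relying on it. A minimal Python sketch, assuming the sysfs file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (which lists each logical CPU's SMT siblings); the helper names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '0,4' or '0-1,8-9' into a set of ints."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif chunk:
            cpus.add(int(chunk))
    return cpus


def read_siblings(root: Path = SYSFS_CPU) -> dict[int, frozenset[int]]:
    """Map each logical CPU to its SMT sibling set (the CPU itself included)."""
    siblings = {}
    for path in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # 'cpu5' -> 5
        siblings[cpu] = frozenset(parse_cpu_list(path.read_text()))
    return siblings


def first_n_on_distinct_cores(siblings: dict[int, frozenset[int]], n: int) -> bool:
    """True if logical CPUs 0..n-1 each sit on a different physical core."""
    return len({siblings[cpu] for cpu in range(n)}) == n


# Layout where sibling threads are numbered N and N+4.
split = {c: frozenset({c % 4, c % 4 + 4}) for c in range(8)}
# Layout where sibling threads are numbered next to each other.
paired = {c: frozenset({c - c % 2, c - c % 2 + 1}) for c in range(8)}
print(first_n_on_distinct_cores(split, 4), first_n_on_distinct_cores(paired, 4))
# True False
```

On the real machine, `first_n_on_distinct_cores(read_siblings(), 4)` answers the question; it should be run on a kernel booted with all CPUs online, since CPUs that were never brought up may not report their topology.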

Booting with maxcpus=4 prevents any of the cores from sitting in C6, which is
needed for Turbo Boost and a lower thermal profile. Even so, I found
scheduling latency and responsiveness under load better when booting with
maxcpus=4, so I favour it when plugged in.

Clearly, having the BIOS option would benefit certain applications;
Dell should give their users the choice!

Perhaps the 'noht' boot option should be reintroduced to initialise
all cores, but only expose non-sibling cores to the OS (thus allowing
C6)?
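Short of a 'noht' option, a runtime approximation is CPU hotplug: boot with every thread, then take the second thread of each core offline by writing 0 to /sys/devices/system/cpu/cpuN/online. This is a suggestion, not what was proposed above; it assumes a kernel built with CONFIG_HOTPLUG_CPU, and whether the offlined threads then let their cores reach C6 is not established in this thread. A Python sketch with a dry-run default; the names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '0,4' or '0-1' into a set of ints."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif chunk:
            cpus.add(int(chunk))
    return cpus


def secondary_threads(siblings: dict[int, set[int]]) -> list[int]:
    """Logical CPUs that are not the lowest-numbered thread of their core."""
    return sorted(cpu for cpu, sibs in siblings.items() if cpu != min(sibs))


def offline_secondary_threads(dry_run: bool = True) -> list[int]:
    """Take every secondary SMT thread offline (needs root unless dry_run)."""
    siblings = {
        int(p.parent.parent.name[3:]): parse_cpu_list(p.read_text())
        for p in SYSFS_CPU.glob("cpu[0-9]*/topology/thread_siblings_list")
    }
    targets = secondary_threads(siblings)  # read everything before changing anything
    for cpu in targets:
        online = SYSFS_CPU / f"cpu{cpu}" / "online"
        if dry_run:
            print(f"would write 0 to {online}")
        else:
            online.write_text("0")
    return targets
```

Writing 1 to the same `online` files brings the threads back.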

Daniel

tip: modprobe msr and use turbostat to monitor turbo-boost and C-state
residency: http://bugzilla.kernel.org/attachment.cgi?id=24673
--
Daniel J Blueman

2010-01-26 13:11:57

by peng huang

Subject: Re: Hyperthreading on Core i7s: To use or not to use?

Hello,

I have done some benchmarking on processors with HT. If you run more
processes/threads than there are physical cores, you should probably enable
HT to get more benefit, unless your processes compete with each other for
cache.
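Applying that rule of thumb needs the number of physical cores, not just logical CPUs. A minimal Python sketch, assuming the usual x86 /proc/cpuinfo fields `physical id` and `core id` (a physical core is a distinct pair of the two); it only tells you whether a workload oversubscribes the physical cores and cannot predict the cache competition mentioned above. The names are hypothetical:

```python
from __future__ import annotations


def count_cores(cpuinfo: str) -> tuple[int, int]:
    """Return (logical CPUs, physical cores) from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    for block in cpuinfo.strip().split("\n\n"):  # one block per logical CPU
        fields = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            # Without topology fields, treat every logical CPU as its own core.
            cores.add((fields.get("physical id", "0"),
                       fields.get("core id", fields["processor"])))
    return logical, len(cores)


def ht_worth_trying(runnable_threads: int, cpuinfo: str) -> bool:
    """More runnable threads than physical cores, on a CPU with HT enabled."""
    logical, physical = count_cores(cpuinfo)
    return logical > physical and runnable_threads > physical


# Synthetic 2-core/4-thread example in /proc/cpuinfo layout.
sample = "\n\n".join(
    f"processor\t: {p}\nphysical id\t: 0\ncore id\t\t: {p % 2}" for p in range(4)
)
print(count_cores(sample))  # (4, 2)
```

On a live system, `count_cores(open("/proc/cpuinfo").read())` gives the real counts.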

I will send you some test data later.

-huang

--
peng huang

2010-01-27 00:50:24

by J.A. Magallón

Subject: Re: Hyperthreading on Core i7s: To use or not to use?

On Tue, 26 Jan 2010 10:56:57 +0000, Daniel J Blueman wrote:

> On Jan 26, 10:10 am, Justin Piszcz wrote:
> > Quick little benchmark (pbzip2 -9 linux kernel source), the benchmark is
> > really within the noise (8 on/off)
> > - Multicore(8)/HT(Off) = 73.72user 0.33system 0:09.50elapsed 779%CPU (0avgtext+0avgdata 458528maxresident)k
> > - Multicore(8)/HT(On) = 74.28user 0.40system 0:09.67elapsed 772%CPU (0avgtext+0avgdata 428304maxresident)k
> > - Multicore(4)/HT(On) = 68.76user 0.30system 0:17.44elapsed 396%CPU (0avgtext+0avgdata 213616maxresident)k

What does 'Multicore(n)' mean? And HT?
Are they BIOS, kernel or pbzip2 options?

Well, with that i7 you have:
- 1 processor
- 4 cores
- 8 threads (4x2)

(BTW, I think the 'CPUs' nomenclature in the kernel is misleading...
with today's processors, what does 'CPU' mean? Processor, core, thread...?)

With that processor, you should:
- Configure the kernel for 8 'kernel CPUs', which means it can support
8 threads
- Turn SCHED_MC on, as you have several cores inside the same processor that
share L3 cache
- Turn SCHED_SMT on, as you have several threads per core

I don't know if SCHED_SMT is very useful for apps that aren't hand-crafted,
but a few extra scheduling calculations can't hurt too much...
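Those recommendations correspond to the kernel options CONFIG_NR_CPUS, CONFIG_SCHED_MC and CONFIG_SCHED_SMT, all of which only matter with CONFIG_SMP=y. A small Python sketch that checks a .config against them; the required values and the minimum of 8 CPUs are taken from the list above, and the function names are hypothetical:

```python
from __future__ import annotations

# Values recommended above for a 4-core/8-thread Core i7.
RECOMMENDED = {"CONFIG_SMP": "y", "CONFIG_SCHED_MC": "y", "CONFIG_SCHED_SMT": "y"}
NOT_SET = " is not set"


def parse_kconfig(text: str) -> dict[str, str]:
    """Parse .config lines; '# CONFIG_FOO is not set' is recorded as 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            options[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options


def check_config(text: str, min_nr_cpus: int = 8) -> list[str]:
    """List every way the .config departs from the recommendations."""
    opts = parse_kconfig(text)
    problems = [f"{key} should be {want}"
                for key, want in RECOMMENDED.items() if opts.get(key, "n") != want]
    if int(opts.get("CONFIG_NR_CPUS", "1")) < min_nr_cpus:
        problems.append(f"CONFIG_NR_CPUS should be at least {min_nr_cpus}")
    return problems


sample = """CONFIG_SMP=y
CONFIG_NR_CPUS=8
CONFIG_SCHED_MC=y
# CONFIG_SCHED_SMT is not set
"""
print(check_config(sample))  # ['CONFIG_SCHED_SMT should be y']
```

Run against the .config of the kernel under test (for example `check_config(open(".config").read())`), it lists what differs.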

What does 'within the noise' mean to you?
In the Multicore(4)/HT(On) run, it takes nearly 2x the time to do the work... (look at elapsed!)
So, assuming the kernel is 'intelligent':
- Using 4 threads, it takes 17.44 s, and the scheduler is using 4 threads located
on different cores.
- Using 8 threads, it takes 9.50 s. So efficiency is 17.44/(2*9.50) = ~92%
Very good! So this Hyper-Threading is not like the old one in the P4s; it works
much better, even with each pair of threads competing for registers and L1 (or L2?)
cache.

So Hyper-Threading is good; why should you disable it?
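The efficiency calculation, written out. A minimal Python sketch; like the reasoning above, it assumes the 4-thread run put one thread on each physical core and that both runs ran at the same clock speed, which Turbo Boost does not guarantee. The function name is hypothetical:

```python
from __future__ import annotations


def smt_scaling(t_one_per_core: float, t_two_per_core: float) -> tuple[float, float]:
    """Speedup from running two threads per core instead of one, and that
    speedup as a fraction of a perfect 2x (1.0 = throughput doubled)."""
    speedup = t_one_per_core / t_two_per_core
    return speedup, speedup / 2


speedup, efficiency = smt_scaling(17.44, 9.50)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# speedup 1.84x, efficiency 92%
```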

> With my Dell Studio 15 (model 1557) laptop, there is no option to
> disable HT in the current BIOS, so booting with maxcpus=4 (since the
> kernel enumerates non-sibling cores first) gave me a 5-15% speedup on
> some large image processing (convolution, FFTs, conversion) on all
> available cores, presumably due to better cache efficiency.
>

Sure, you don't get a full extra processor, but the sibling thread can do some work.

An example: a ray-tracing engine. If you don't know ray tracing, let's just
say it involves traversing a tree structure and doing some floating-point
calculations on each node, repeated millions of times.
On an old [email protected] GHz with HT, here are some simple benchmarks:

- One thread does about 540 kilo-rays per second (kR/s)
- Both threads do about 800 kR/s. Efficiency: about 74%
In other words, the 'second' thread counts as about half an extra CPU,
as if you had 1.5 CPUs instead of 2.0. Anyway, it's more than 1.0.

And this is not seriously hand-crafted, just POSIX threads code.
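The arithmetic behind these figures, with $R_1$ and $R_2$ the ray rates for one and two threads on the same core:

```latex
S = \frac{R_2}{R_1} = \frac{800\ \mathrm{kR/s}}{540\ \mathrm{kR/s}} \approx 1.48,
\qquad
E = \frac{R_2}{2R_1} = \frac{800}{1080} \approx 0.74
```

That is, roughly 1.5 CPUs' worth of throughput from 2 logical CPUs.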

If your application is any more complex than the pathological case of
summing two vectors, you can use the HT sibling for something useful.
And even in that case you can code your program in a cache-aware fashion,
perhaps doing an interleaved sum instead of one contiguous chunk per thread (?)...
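One way to read 'interleaved' here: instead of giving each thread one contiguous half of the vectors, deal out small blocks round-robin so that two sibling threads work on neighbouring memory at the same time. A Python sketch of the two partitioning schemes only; the block size is an assumption (for example 8 doubles per 64-byte cache line), the names are hypothetical, and CPython threads cannot show any HT cache effect, so this illustrates the index layout, not the performance claim (which is itself marked with '(?)' above):

```python
from __future__ import annotations

import threading


def chunked(n: int, workers: int) -> list[list[int]]:
    """One contiguous run of indices per worker (the last one may be shorter)."""
    size = -(-n // workers)  # ceiling division
    return [list(range(w * size, min(n, (w + 1) * size))) for w in range(workers)]


def interleaved(n: int, workers: int, block: int = 1) -> list[list[int]]:
    """Deal out blocks of `block` consecutive indices round-robin to the workers."""
    parts: list[list[int]] = [[] for _ in range(workers)]
    for start in range(0, n, block):
        parts[(start // block) % workers].extend(range(start, min(n, start + block)))
    return parts


def parallel_add(a: list[float], b: list[float], parts: list[list[int]]) -> list[float]:
    """Element-wise a + b, with each index list handled by its own thread."""
    out = [0.0] * len(a)

    def work(indices: list[int]) -> None:
        for i in indices:
            out[i] = a[i] + b[i]

    threads = [threading.Thread(target=work, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


print(chunked(8, 2))         # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(interleaved(8, 2, 2))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

In a C/pthreads version, the same `interleaved` layout with a cache-line-sized `block` is what would let two HT siblings share recently fetched lines.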

--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2010.1 (Cooker) for x86_64
Linux 2.6.32.3-desktop-0.rc2.1mnb (gcc 4.4.2 ) SMP

Subject: Re: Hyperthreading on Core i7s: To use or not to use?

On Wed, 27 Jan 2010, J.A. Magallón wrote:
> On Tue, 26 Jan 2010 10:56:57 +0000, Daniel J Blueman wrote:
> Well, with that i7 you have:
> - 1 processor
> - 4 cores
> - 8 threads (4x2)

And Turbo Boost, let's not forget it.

> What does 'within the noise' mean to you?
> In the Multicore(4)/HT(On) run, it takes nearly 2x the time to do the work... (look at elapsed!)
> So, assuming the kernel is 'intelligent':
> - Using 4 threads, it takes 17.44 s, and the scheduler is using 4 threads located
> on different cores.
> - Using 8 threads, it takes 9.50 s. So efficiency is 17.44/(2*9.50) = ~92%
> Very good! So this Hyper-Threading is not like the old one in the P4s; it works
> much better, even with each pair of threads competing for registers and L1 (or L2?)
> cache.
>
> So Hyper-Threading is good; why should you disable it?

When, for some reason, it gives you less 'automatic overclocking' and your
workload happens to benefit more from fewer cores at a higher clock than from
more cores at a lower clock, I suppose.

Which might just mean that anyone in that situation should try the
'power-aware scheduler', since it supposedly tries to idle threads/cores a lot
more, which should make it easier for the active cores to overclock themselves.

Using Len's userspace utility to track the real frequency of boosted cores
might give some insight. It is in his pmtools package, available at:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
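For a rough idea of what such a tool measures: the usual method compares two per-CPU MSRs, IA32_MPERF (0xE7), which ticks at a fixed reference rate while the CPU is not idle, and IA32_APERF (0xE8), which ticks at the actual clock; the base frequency times ΔAPERF/ΔMPERF is the average clock while busy. A minimal Python sketch, not Len's code: it needs root and the msr driver (`modprobe msr`), the base frequency must be supplied by the caller, and the names are hypothetical:

```python
from __future__ import annotations

import os
import struct
import time

MSR_IA32_MPERF = 0xE7  # fixed-rate counter, advances only while the CPU is in C0
MSR_IA32_APERF = 0xE8  # actual-clock counter, advances only while the CPU is in C0


def read_msr(cpu: int, register: int) -> int:
    """Read a 64-bit MSR through /dev/cpu/N/msr (the file offset selects the MSR)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


def busy_mhz(base_mhz: float, d_aperf: int, d_mperf: int) -> float | None:
    """Average clock while not idle; None if the CPU never left idle."""
    if d_mperf == 0:
        return None
    return base_mhz * d_aperf / d_mperf


def sample_busy_mhz(cpu: int, base_mhz: float, interval: float = 1.0) -> float | None:
    """Measure one CPU's average busy clock over `interval` seconds."""
    a0, m0 = read_msr(cpu, MSR_IA32_APERF), read_msr(cpu, MSR_IA32_MPERF)
    time.sleep(interval)
    a1, m1 = read_msr(cpu, MSR_IA32_APERF), read_msr(cpu, MSR_IA32_MPERF)
    return busy_mhz(base_mhz, a1 - a0, m1 - m0)
```

A result above the processor's rated base frequency means Turbo Boost was active on that CPU during the interval.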

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh