2006-12-27 14:16:30

by Martin Knoblauch

Subject: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels

Hi, (please CC on replies, thanks)

for the ganglia project (http://ganglia.sourceforge.net/) we are
trying to find a heuristic to determine the number of physical CPU
"cores" as opposed to virtual processors added by enabling HT. The
method should work on 2.4 and 2.6 kernels.

So far it seems that looking at the "physical id", "core id" and "cpu
cores" of /proc/cpuinfo is the way to go.

 In 2.6 I would try to find the distinct "physical id"s and sum
up the corresponding "cpu cores". The question is whether this would
work for 2.4 based systems.
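
A minimal sketch of the heuristic described above, in Python purely for illustration and not taken from Ganglia. It assumes the x86-style /proc/cpuinfo layout, where each logical CPU is a block of "key : value" lines and blocks are separated by blank lines. The name count_physical_cores is hypothetical. It returns None when the fields are absent, which is exactly the open question for 2.4 kernels.

```python
def count_physical_cores(cpuinfo_text):
    """Sum "cpu cores" once per distinct "physical id".

    Returns None if no block carries both fields.
    """
    cores_per_package = {}
    physical_id = None
    cpu_cores = None
    # The trailing "" forces a flush of the last block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and cpu_cores is not None:
                cores_per_package[physical_id] = cpu_cores
            physical_id = cpu_cores = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "cpu cores":
            cpu_cores = int(value)
    return sum(cores_per_package.values()) or None
```

The input would normally be the full contents of /proc/cpuinfo read as one string.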

Does anybody recall when the "physical id", "core id" and "cpu cores"
were added to /proc/cpuinfo ?

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de


2006-12-27 14:24:33

by Arjan van de Ven

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels

On Wed, 2006-12-27 at 06:16 -0800, Martin Knoblauch wrote:
> Hi, (please CC on replies, thanks)
>
> for the ganglia project (http://ganglia.sourceforge.net/) we are
> trying to find a heuristic to determine the number of physical CPU
> "cores" as opposed to virtual processors added by enabling HT. The
> method should work on 2.4 and 2.6 kernels.

I have a counter question for you.. what are you trying to do with the
"these two are SMT siblings" information ?

Because I suspect "HT" is the wrong level of detection for what you
really want to achieve....

If you want to decide "shares caches" then at least 2.6 kernels directly
export that (and HT is just the wrong way to go about this).
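
As a rough illustration of what reading exported cache sharing could look like: the sketch below assumes the sysfs layout /sys/devices/system/cpu/cpuN/cache/indexM/ with the files level, type and shared_cpu_map, which is not named in this thread and may be missing on a given kernel or architecture. Both function names are hypothetical.

```python
import glob
import os


def parse_cpu_mask(mask_text):
    """Turn a hex CPU bitmask such as "00000000,00000003" into a set of CPU numbers."""
    value = int(mask_text.strip().replace(",", ""), 16)
    cpus = set()
    bit = 0
    while value:
        if value & 1:
            cpus.add(bit)
        value >>= 1
        bit += 1
    return cpus


def shared_cache_groups(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Map (level, type) of each cache of `cpu` to the set of CPUs sharing it.

    Assumed sysfs layout; returns an empty dict if the directory is absent.
    """
    groups = {}
    pattern = os.path.join(sysfs_root, "cpu%d" % cpu, "cache", "index*")
    for index_dir in glob.glob(pattern):
        with open(os.path.join(index_dir, "level")) as f:
            level = f.read().strip()
        with open(os.path.join(index_dir, "type")) as f:
            cache_type = f.read().strip()
        with open(os.path.join(index_dir, "shared_cpu_map")) as f:
            groups[(level, cache_type)] = parse_cpu_mask(f.read())
    return groups
```

Two logical CPUs that appear in the same set share that cache, regardless of whether they are HT siblings or separate cores.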
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-27 14:37:20

by Jan Engelhardt

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


On Dec 27 2006 06:16, Martin Knoblauch wrote:
>
> So far it seems that looking at the "physical id", "core id" and "cpu
>cores" of /proc/cpuinfo is the way to go.

Possibly, but it does not catch all cases.

$ grep '"physical id' /erk/kernel/linux-2.6.20-rc2/ -r

returns exactly three lines, for
/erk/kernel/linux-2.6.20-rc2/arch/i386/kernel/cpu/proc.c
/erk/kernel/linux-2.6.20-rc2/arch/ia64/kernel/setup.c
/erk/kernel/linux-2.6.20-rc2/arch/x86_64/kernel/setup.c

So what'cha doing about, say, sparc64? Here is the /proc/cpuinfo of a
standard SMP one:

15:31 ares:~ # cat /proc/cpuinfo
cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
prom : OBP 3.30.0 2003/11/11 10:37
type : sun4u
ncpus probed : 2
ncpus active : 2
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0Bogo : 800.49
Cpu0ClkTck : 0000000017d78400
Cpu1Bogo : 800.05
Cpu1ClkTck : 0000000017d78400
MMU Type : Spitfire
State:
CPU0: online
CPU1: online
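
To make the gap concrete, here is a small sketch (Python, hypothetical names) that reports which of the three topology keys a given /proc/cpuinfo text contains. On output like the sparc64 listing above it returns an empty list, so a caller would have to fall back to some other count.

```python
TOPOLOGY_KEYS = ("physical id", "core id", "cpu cores")


def topology_keys_present(cpuinfo_text):
    """Return the topology keys that occur as a "key : value" line in the text."""
    keys = {line.partition(":")[0].strip() for line in cpuinfo_text.splitlines()}
    return [key for key in TOPOLOGY_KEYS if key in keys]
```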


> In 2.6 I would try to find the distinct "physical id"s and sum
>up the corresponding "cpu cores". The question is whether this would
>work for 2.4 based systems.
>
> Does anybody recall when the "physical id", "core id" and "cpu cores"
>were added to /proc/cpuinfo ?

Why don't you check it out? 2.4.34 only has the "physical id" string for
x86_64. It does not seem to have CONFIG_SCHED_SMT at all. (Time to leave
the dead horse alone.)


-`J'
--

2006-12-27 14:40:49

by Martin Knoblauch

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


--- Arjan van de Ven wrote:

> On Wed, 2006-12-27 at 06:16 -0800, Martin Knoblauch wrote:
> > Hi, (please CC on replies, thanks)
> >
> > for the ganglia project (http://ganglia.sourceforge.net/) we are
> > trying to find a heuristic to determine the number of physical CPU
> > "cores" as opposed to virtual processors added by enabling HT. The
> > method should work on 2.4 and 2.6 kernels.
>
> I have a counter question for you.. what are you trying to do with
> the "these two are SMT siblings" information ?
>
> Because I suspect "HT" is the wrong level of detection for what you
> really want to achieve....
>
> If you want to decide "shares caches" then at least 2.6 kernels
> directly export that (and HT is just the wrong way to go about this).
> --
Hi Arjan,

one piece of information that Ganglia collects for a node is the
"number of CPUs", originally meaning "physical CPUs". With the
introduction of HT and multi-core things are a bit more complex now. We
have decided that HT siblings do not qualify as "real" CPUs, while
multi-cores do.

Currently we are doing "sysconf(_SC_NPROCESSORS_ONLN)". But this
includes both physical and virtual (HT) cores. We are looking for a
method that only shows "real iron" and works on 2.6 and 2.4 kernels.
Whether this has any practical value is a completely different question.
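
For reference, the same query expressed in Python (a sketch only; the function name is hypothetical, and this is not Ganglia's code). It shows why the number is too coarse for our purpose: the result counts online logical processors, so HT siblings are included.

```python
import os


def online_logical_cpus():
    """Number of online logical processors, HT siblings included."""
    try:
        return os.sysconf("SC_NPROCESSORS_ONLN")
    except (ValueError, OSError, AttributeError):
        # The sysconf name is unknown or os.sysconf is unavailable here;
        # os.cpu_count() may itself return None.
        return os.cpu_count()
```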

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-27 15:13:05

by Arjan van de Ven

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


>
> one piece of information that Ganglia collects for a node is the
> "number of CPUs", originally meaning "physical CPUs".

Ok I was afraid of that.

> With the
> introduction of HT and multi-core things are a bit more complex now. We
> have decided that HT siblings do not qualify as "real" CPUs, while
> multi-cores do.

I think that decision is a mistake, and is probably based on experiences
with the first generation of HT capable Pentium 4 processors.

The original p4 HT to a large degree suffered from a too small cache
that now was shared. SMT in general isn't per se all that different in
performance than dual core, at least not on a fundamental level, it's
all a matter of how many resources each thread has on average. With dual
core sharing the cache for example, that already is part HT. Putting the
"boundary" at HT-but-not-dual-core is going to be highly artificial and
while it may work for the current hardware, in general it's not a good
way of separating things (just look at the PowerPC processors, those are
highly SMT as well), and I suspect that your distinction is just going
to break all the time over the next 10 years ;) Or even today on the
current "large cache" P4 processors with HT it already breaks. (just
those tend to be the expensive models so more rare)

I would strongly urge you to reconsider this decision; if you want to
show "sockets" that sounds reasonable, or even if you want to do it on
the "bus sharing" level (FSB/HT), but HT.. just sounds wrong.





--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-27 15:22:44

by glebn

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels

On Wed, Dec 27, 2006 at 04:13:00PM +0100, Arjan van de Ven wrote:
> The original p4 HT to a large degree suffered from a too small cache
> that now was shared. SMT in general isn't per se all that different in
> performance than dual core, at least not on a fundamental level, it's
> all a matter of how many resources each thread has on average. With dual
> core sharing the cache for example, that already is part HT. Putting the
> "boundary" at HT-but-not-dual-core is going to be highly artificial and
> while it may work for the current hardware, in general it's not a good
> way of separating things (just look at the PowerPC processors, those are
> highly SMT as well), and I suspect that your distinction is just going
> to break all the time over the next 10 years ;) Or even today on the
> current "large cache" P4 processors with HT it already breaks. (just
> those tend to be the expensive models so more rare)
>
If I run two threads that are doing only calculations and very little or no
IO at all on the same socket will modern HT and dual core be the same
(or close) performance wise?

--
Gleb.

2006-12-27 15:38:59

by Arjan van de Ven

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


> If I run two threads that are doing only calculations and very little or no
> IO at all on the same socket will modern HT and dual core be the same
> (or close) performance wise?

it depends on how cache/memory bandwidth sensitive your calculation
is.... if your calculation is memory bandwidth sensitive then they're
the same. If your calculation is very sensitive to having the dataset fit
in cache.. it's a different ballgame again, because then it
depends on whether the cores in your dual core share the cache or not.


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-27 15:41:59

by Martin Knoblauch

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


--- Gleb Natapov wrote:

> On Wed, Dec 27, 2006 at 04:13:00PM +0100, Arjan van de Ven wrote:
> > The original p4 HT to a large degree suffered from a too small cache
> > that now was shared. SMT in general isn't per se all that different in
> > performance than dual core, at least not on a fundamental level, it's
> > all a matter of how many resources each thread has on average. With dual
> > core sharing the cache for example, that already is part HT. Putting the
> > "boundary" at HT-but-not-dual-core is going to be highly artificial and
> > while it may work for the current hardware, in general it's not a good
> > way of separating things (just look at the PowerPC processors, those are
> > highly SMT as well), and I suspect that your distinction is just going
> > to break all the time over the next 10 years ;) Or even today on the
> > current "large cache" P4 processors with HT it already breaks. (just
> > those tend to be the expensive models so more rare)
> >
> If I run two threads that are doing only calculations and very little or no
> IO at all on the same socket will modern HT and dual core be the same
> (or close) performance wise?
>
Hi Gleb,

this is a real interesting question. Ganglia is coming [originally]
from the HPC side of computing. At least in the past HT as implemented
on XEONs did help a lot. Running two CPU+memory-bandwidth intensive
processes on the same physical CPU would at best result in a 50/50
performance split. So, knowing how many "real" CPUs are in a system is
interesting to us.

Other workloads (like lots of java threads doing mixed IO and CPU
stuff) of course can benefit from HT.

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-27 15:51:07

by Martin Knoblauch

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


--- Gleb Natapov wrote:

> >
> If I run two threads that are doing only calculations and very little or no
> IO at all on the same socket will modern HT and dual core be the same
> (or close) performance wise?
>

actually I wanted to write that "HT as implemented on XEONs did not
help a lot for HPC workloads in the past"....

Cheers
Martin

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-27 15:53:30

by Arjan van de Ven

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


>
> this is a real interesting question. Ganglia is coming [originally]
> from the HPC side of computing. At least in the past HT as implemented
> on XEONs did help a lot. Running two CPU+memory-bandwidth intensive
> processes on the same physical CPU would at best result in a 50/50
> performance split. So, knowing how many "real" CPUs are in a system is
> interesting to us.

but this 50/50 split is most likely because of either a cache or a
bandwidth bottleneck, at which point "HT" is the wrong measure.

(also if you go to things like SUN's Niagara CPU then it's even more
clear that SMT isn't the right measure for performance capability)


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-27 16:09:35

by Arjan van de Ven

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels


>
> actually I wanted to write that "HT as implemented on XEONs did not
> help a lot for HPC workloads in the past"....


btw this is exactly the problem I am trying to point out: ".. as
implemented in generation XYZ model ABC of processor DEF".
that's going to be really fragile and in fact won't work even for
processors you can buy today (power5 and sparc niagara for example, and
depending on the workload, even on today's 16MB cache Xeons).

once your program (and many others) have such a check, then the next
step will be pressure on the kernel code to "fake" the old situation
when there is a processor where <vague criteria of the day> no longer
holds. It's basically a road to madness :-(

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-27 17:14:54

by Bernd Eckenfels

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels

In article <1167235772.3281.3977.camel@xxxxxxxxxxxxxxxxxxxxx> you wrote:
> once your program (and many others) have such a check, then the next
> step will be pressure on the kernel code to "fake" the old situation
> when there is a processor where <vague criteria of the day> no longer
> holds. It's basically a road to madness :-(

I agree that for HPC sizing a benchmark at various levels of parallelism
is better. The question is whether the code in question is only for inventory
purposes. In that case I would do something like x sockets, y cores and z cm
threads.

Bernd

2006-12-27 17:52:05

by Martin Knoblauch

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels

>In article <1167235772.3281.3977.camel@xxxxxxxxxxxxxxxxxxxxx> you wrote:
>> once your program (and many others) have such a check, then the next
>> step will be pressure on the kernel code to "fake" the old situation
>> when there is a processor where <vague criteria of the day> no longer
>> holds. It's basically a road to madness :-(
>
> I agree that for HPC sizing a benchmark at various levels of
> parallelism is better. The question is whether the code in question
> is only for inventory purposes. In that case I would do something
> like x sockets, y cores and z cm threads.
>
> Bernd

For sizing purposes, doing benchmarks is the only way. For the purpose
of Ganglia the sockets/cores/threads info is purely for inventory. And
we are likely going to add the new information to our metrics.
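
A sketch of what such an inventory metric might compute, assuming Python for illustration and a /proc/cpuinfo that carries "physical id" and "core id" per logical CPU (the function name and the returned dictionary are hypothetical). Sockets are distinct physical ids, cores are distinct (physical id, core id) pairs, and threads are the logical processors themselves. Where a field is missing, the corresponding count is reported as None rather than guessed.

```python
def cpu_inventory(cpuinfo_text):
    """Count sockets, cores and threads from /proc/cpuinfo text."""
    sockets = set()
    cores = set()
    threads = 0
    current = {}
    # The trailing "" forces a flush of the last block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if "processor" in current:
                threads += 1
                if "physical id" in current:
                    sockets.add(current["physical id"])
                    if "core id" in current:
                        cores.add((current["physical id"], current["core id"]))
            current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return {
        "sockets": len(sockets) or None,
        "cores": len(cores) or None,
        "threads": threads,
    }
```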

But - we still need to find a way to extract the info :-)

Cheers
Martin
PS: I have likely killed the CC this time. Sorry.

------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2006-12-27 20:27:22

by Suresh Siddha

Subject: Re: How to detect multi-core and/or HT-enabled CPUs in 2.4.x and 2.6.x kernels

On Wed, Dec 27, 2006 at 09:52:02AM -0800, Martin Knoblauch wrote:
> For sizing purposes, doing benchmarks is the only way. For the purpose
> of Ganglia the sockets/cores/threads info is purely for inventory. And
> we are likely going to add the new information to our metrics.
>
> But - we still need to find a way to extract the info :-)

Only the 2.4 x86_64 kernels are exporting limited info("physical id",
"siblings") through /proc/cpuinfo.

Some of the distros based on 2.4 kernels have the complete topology
(physical id, core id, cpu cores, siblings) exported through /proc/cpuinfo.

thanks,
suresh