Hello!
When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
varies very much on different architectures.
Is it possible to make it look more identical (as far as the different
archs allow it)?
So that one at least can count the cpus on every system the same way.
If so, who would be the one I should contact, and who would accept / verify
a patch doing that?
Greetings,
Nico
--
Keep it simple & stupid, use what's available.
Please use pgp encryption: 8D0E 27A4 is my id.
http://nico.schotteli.us | http://linux.schottelius.org
On Tue, Apr 19, 2005 at 02:15:30PM +0200, Nico Schottelius wrote:
> When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
> varies very much on different architectures.
>
> Is it possible to make it look more identical (as far as the different
> archs allow it)?
>
> So that one at least can count the cpus on every system the same way.
>
> If so, who would the one I should contact and who would accept / verify
> a patch doing that?
If you change it now, how many tools would break?
Maybe if you can list what statistics you think should be common to all
systems, that could be presented in another file that is always the same
format on each architecture.
Certainly looking at arm and i386, other than the bogomips field there
is nothing in common between their cpuinfo contents. They don't even
capitalize bogomips the same way.
I doubt this is really doable. If all you want is the number of CPUs
then something like sysconf(_SC_NPROCESSORS_CONF) should do.
Len Sorensen
On Tue, 2005-04-19 at 09:24 -0400, Lennart Sorensen wrote:
> On Tue, Apr 19, 2005 at 02:15:30PM +0200, Nico Schottelius wrote:
> > When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
> > varies very much on different architectures.
> >
> > Is it possible to make it look more identical (as far as the different
> > archs allow it)?
> >
> > So that one at least can count the cpus on every system the same way.
> >
> > If so, who would the one I should contact and who would accept / verify
> > a patch doing that?
>
> If you change it now, how many tools would break?
>
Lots. Please don't change the format of /proc/cpuinfo.
Lee
Lee Revell [Tue, Apr 19, 2005 at 03:17:00PM -0400]:
> On Tue, 2005-04-19 at 09:24 -0400, Lennart Sorensen wrote:
> > On Tue, Apr 19, 2005 at 02:15:30PM +0200, Nico Schottelius wrote:
> > > When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
> > > varies very much on different architectures.
> > >
> > > Is it possible to make it look more identical (as far as the different
> > > archs allow it)?
> > >
> > > So that one at least can count the cpus on every system the same way.
> > >
> > > If so, who would the one I should contact and who would accept / verify
> > > a patch doing that?
> >
> > If you change it now, how many tools would break?
> >
>
> Lots. Please don't change the format of /proc/cpuinfo.
Can you tell me which ones?
And if there are really that many tools which depend on that
information, wouldn't it make much more sense to make
it (as far as possible) the same?
I must say I was really impressed by how easily I got the number of
cpus on *BSD (I am not a BSD user, still impressed).
They also have the same format on every arch, and mostly the same
across the different BSDs (as far as I have seen).
In general, what are the advantages of having very different cpuinfo
formats? Tools would need to know less about the arch and could
depend only on "I am on Linux".
Just some thoughts,
Nico
>
> Lee
>
>
--
Keep it simple & stupid, use what's available.
Please use pgp encryption: 8D0E 27A4 is my id.
http://nico.schotteli.us | http://linux.schottelius.org
On Tue, 19 Apr 2005 22:00:12 +0200
Nico Schottelius <[email protected]> wrote:
> Can you tell me which ones?
glibc even parses /proc/cpuinfo, so by implication every
application does.
On Tue, Apr 19, 2005 at 10:00:12PM +0200, Nico Schottelius wrote:
> Can you tell me which ones?
top for example would probably break. Maybe not but I suspect it would.
mplayer probably would since it uses it to find the cpu type and
features that cpu supports.
> And if there are really that many tools, which are dependent on
> those information, wouldn't it be much more senseful to make
> it (as far as possible) the same?
Well the tools that care are often architecture specific. After all the
info in cpuinfo is very architecture specific.
> I must say I was really impressed, how easy I got the number of
> cpus on *BSD (I am not a bsd user, still impressed).
Well there still is that sysconf call to get the number of cpus, which
is way better in C than parsing a text file to get the info as far as I
am concerned.
> They also have the same format on every arch and mostly the same
> between different bsds (as far as I have seen).
What info do they provide in that on BSD?
> In general, where are the advantages of having very different cpuinfo
> formats? Tools would need to know less about the arch and could
> depend on "I am on Linux" only.
The info in cpuinfo is only of interest to a tool that knows about the
architecture it is on.
It is not meant to look up the number of cpus the system has. It is
meant to provide the info about the cpus in the system, which means it
provides info relevant to the cpu on a given architecture.
Len Sorensen
On Tue, 2005-04-19 at 22:00 +0200, Nico Schottelius wrote:
> Can you tell me which ones?
>
Multimedia apps like JACK and mplayer that use the TSC for high res
timing need to know the CPU speed, and /proc/cpuinfo is the fast way to
get it.
Why don't you create sysfs entries instead? It would be better to have
all the cpuinfo contents as one value per file anyway (faster
application startup).
Lee
Lee Revell [Tue, Apr 19, 2005 at 04:42:12PM -0400]:
> On Tue, 2005-04-19 at 22:00 +0200, Nico Schottelius wrote:
> > Can you tell me which ones?
> >
>
> Multimedia apps like JACK and mplayer that use the TSC for high res
> timing need to know the CPU speed, and /proc/cpuinfo is the fast way to
> get it.
>
> Why don't you create sysfs entries instead? It would be better to have
> all the cpuinfo contents as one value per file anyway (faster
> application startup).
Well, sounds very good. It's a chance for me to learn to program
sysfs and also to create something useful.
So the right location to place that data would be
/sys/devices/system/cpu/cpuX?
Nico
--
Keep it simple & stupid, use what's available.
Please use pgp encryption: 8D0E 27A4 is my id.
http://nico.schotteli.us | http://linux.schottelius.org
On Tue, 19 Apr 2005, Nico Schottelius wrote:
> Lee Revell [Tue, Apr 19, 2005 at 04:42:12PM -0400]:
>> On Tue, 2005-04-19 at 22:00 +0200, Nico Schottelius wrote:
>>> Can you tell me which ones?
>>>
>>
>> Multimedia apps like JACK and mplayer that use the TSC for high res
>> timing need to know the CPU speed, and /proc/cpuinfo is the fast way to
>> get it.
>>
>> Why don't you create sysfs entries instead? It would be better to have
>> all the cpuinfo contents as one value per file anyway (faster
>> application startup).
>
> Well, sounds very good. It's a chance for me to learn to program
> sysfs and also to create something useful.
>
> So the right location to place that data would be
> /sys/devices/system/cpu/cpuX?
IIRC there was such a patch not very long ago and it was rejected (to
avoid making the kernel larger, or something like that...). But I may be
wrong.
I think that all data in /proc that is not user-process data should be
exported through sysfs or something similar. Maybe in the future someone
will even write a userspace fs (using FUSE?) to emulate the old /proc
entries from sysfs-exported data. That would allow removing all the proc
code from the kernel. Maybe even user-process data should be exported
through sysfs (like /sys/processes/123/maps/41610000/file)? But this is
my personal opinion.
Grzegorz Kulewski
On Tue, Apr 19, 2005 at 09:24:17AM -0400, Lennart Sorensen wrote:
> If you change it now, how many tools would break?
>
> Maybe if you can list what statistics you think should be common to all
> systems, that could be presented in another file that is always the same
> format on each architecture.
>
> Certainly looking at arm and i386, other than the bogomips field there
> is nothing in common between their cpuinfo contents. THey don't even
> capitalize bogomips the same either.
>
> I doubt this is really doable. If all you want is the number of CPUs
> then something like sysconf(_SC_NPROCESSORS_CONF) should do.
Which in glibc is implemented by counting the number of "processor:"
records in /proc/cpuinfo, so pointing Nico at sysconf() doesn't really
avoid his parser at all.
Ralf
On Tue, 19 Apr 2005, Nico Schottelius wrote:
>When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
>varies very much on different architectures.
Yep, and it has been this way since the beginning of time.
>So that one at least can count the cpus on every system the same way.
Hah. Give me a minute to stop laughing... I argued the same point almost
a decade ago. Linus decided to be an ass and flat-out refused to ever
export numcpu (or any of its current-day derivatives), which brought us
to the bullshit of parsing the arch-dependent /proc/cpuinfo.
Short of a kernel module to export the kernel variables, that's the only
damned way to find the number of cpus in a Linux system. I was bitched at
by other Distributed.net developers years ago for adding this sort of code
to count up the cpus under linux -- at the time, libc/glibc's sysconf()
didn't support getting cpu info under linux. Today, glibc's sysconf()
parses /proc/cpuinfo.
>If so, who would the one I should contact and who would accept / verify
>a patch doing that?
Linus has already spoken. Don't waste your time. (unless he's willing to
rethink this whole stupidity.)
Beyond counting cpus, each arch is reporting very different things, so
combining them into one general format really doesn't make sense. The
notion of putting all that info in sysfs space isn't bad except it takes
up a lot more memory.
--Ricky
>>When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
>>varies very much on different architectures.
>
> Yep, and it has been this way since the begining of time.
>
>>So that one at least can count the cpus on every system the same way.
>
> Hah. Give me a minute to stop laughing... I argued the same point almost
> a decade ago. Linus decided to be an ass and flat refused to ever export
> numcpu (or any of the current day derivatives) which brought us to the
> bullshit of parsing the arch dependant /proc/cpuinfo.
Hey Ricky.
Not to be a pain but how exactly would that interface look today
in your eyes?
Single AthlonXP system - 1 cpu right?
Dual Opteron - 2 cpu right?
Now come the interesting things :
Single P4 w/ HT enabled - 1 or 2?
even more interesting :
DualCore P4 w/ HT disabled - 1 or 2 ?
And to top it off :
DualCore P4 w/ HT enabled - 1, 2 or 4 ?
Show me a scalable interface that can account for all the cases here.
One piece of software might want to count each virtual CPU as 1, and
hence count the DC P4 w/ HT as 4. Another might want to count only the
cores, hence 2. Yet another might want to count it as 1.
Then of course we might have a system with four dual-core whatevers
with HT, on 4 CPU boards in some kind of NUMA. Do you want to count
4 CPUs (4 boards), or 16 CPUs (4 boards * 4 CPUs per board), or
32 CPUs (4 boards * 4 CPUs per board * 2 cores per CPU), or .. or .. or ..
It quickly gets out of hand, and everybody will want to count it
differently. If you set a standard of "only count physical CPUs", the
next guy will think differently. Same if you set the standard to "only
count physical cores".
Today we have dual-core, HT, and other kinds of SMT; add multiple CPUs
per board in a NUMA setup or some kind of clustering ...
So yes, I do agree that it would be good to have an easy way to get it,
but the question is .. what is a person after?
// Stefan
Ricky Beam <[email protected]> wrote:
>
> Short of a kernel module to export the kernel variables, that's the only
> damned way to find the number of cpus in a Linux system.
Question is: do you need to know the number of CPUs (why?) or do you need
to know the number of CPUs which you're currently allowed to use or do you
need to know the maximum number of CPUs which you are allowed to bind
yourself to, or what?
Probably these things can be worked out via the get/set_affinity() syscalls
and/or via the cpuset sysfs interfaces, but it isn't as simple as you're
assuming.
On Sat, 7 May 2005, I wrote:
>> Hah. Give me a minute to stop laughing... I argued the same point almost
>> a decade ago. Linus decided to be an ass and flat refused to ever export
>> numcpu (or any of the current day derivatives) which brought us to the
>> bullshit of parsing the arch dependant /proc/cpuinfo.
>
>Not to be a pain but how exactly would that interface look today
>in your eyes?
...
That's why I said, "or any of the current day derivatives".
Back when I first brought this up (8 years ago?), it was simple... numcpu
was it. There weren't any virtual processors or multi-core critters.
CPU affinity, cpumasks, and sysfs weren't even dreams.
Today, things are more complicated... much more complicated. However,
they've generally already been hashed out and handled in some fashion.
The kernel already knows how many cpus there are, how many are online,
which ones are virtual (at least to the point that the scheduler knows),
etc. I'm not sure what difference multi-core chips really make as they're
just two+ cpus in the same package -- yes, that means all of them have to
be offline to physically remove the processor, but that's a pretty
hardcore, specialized function to begin with.
The issue with detecting HT enabled processors came up shortly after
they became available and /proc/cpuinfo and associated apps were updated
accordingly.
My point is, and has always been, that it's much faster and far more
efficient to have a "binary" view of what the kernel has always known
than spinning around in one's chair grokking a wad of mostly meaningless
ASCII text engineered to make sense only to the eyeballs of humans.
Most of /proc fits in this boat... with cpuinfo in the driver's seat.
(I won't launch into my oft repeated ASCII vs. binary /proc flamewar.
We have 4GHz processors now, so nobody cares about being efficient
despite about a 10x(+) speedup if the ascii middleman were taken out
and shot.)
--Ricky
Hi Andrew,
On Fri, May 06, 2005 at 09:14:55PM -0700, Andrew Morton wrote:
> Ricky Beam <[email protected]> wrote:
> >
> > Short of a kernel module to export the kernel variables, that's the only
> > damned way to find the number of cpus in a Linux system.
>
> Question is: do you need to know the number of CPUs (why?) or do you need
> to know the number of CPUs which you're currently allowed to use or do you
> need to know the maximum number of CPUs which you are allowed to bind
> yourself to, or what?
I personally think that what would be useful is not "the number of CPUs"
(which does not make any sense), but an enumeration of :
- the physical nodes (for NUMA)
- the physical CPUs
- each CPU's cores (for multi-core)
- each core's siblings (for HT/SMT)
each of which would report their respective id for {set,get}_affinity().
This way, the application would be able to choose how it needs to spread
over available CPUs depending on its workload. But IMHO, this should
definitely not be put in cpuinfo. I consider that cpuinfo is for the human.
> Probably these things can be worked out via the get/set_affinity() syscalls
> and/or via the cpuset sysfs interfaces, but it isn't as simple as you're
> assuming.
At least it would be simpler with some layout info like above.
Cheers,
Willy
On Sat, May 07, 2005 at 09:58:29AM +0200, Willy Tarreau wrote:
> I personally think that what would be useful is not "the number of CPUs"
> (which does not make any sense), but an enumeration of :
>
> - the physical nodes (for NUMA)
> - the physical CPUs
> - each CPU's cores (for multi-core)
> - each core's siblings (for HT/SMT)
>
> each of which would report their respective id for {set,get}_affinity().
> This way, the application would be able to choose how it needs to spread
> over available CPUs depending on its workload.
Typically, the application shouldn't care. The scheduler should be
deciding where processes would be best run.
> > Probably these things can be worked out via the get/set_affinity() syscalls
> > and/or via the cpuset sysfs interfaces, but it isn't as simple as you're
> > assuming.
>
> At least it would be simpler with some layout info like above.
There's nothing stopping a process from parsing /dev/cpu/x/cpuid
to find out anything it wants about the layout of cores/siblings,
but I'll bet you'd be hard pressed to find a single application
that would benefit from knowing about the layout.
Processes know nothing about what the kernel has scheduled on
other threads/cores, and they shouldn't. Trying to do the same
cpu arbitration in userspace is madness.
What /could/ be useful would be a way to tell sched_setaffinity
and co "I have two threads, I'd like them both to run on different cores,
avoiding HT pairs, and never be migrated off them" without having to care
about the layout of the cpus in each application.
Dave
Hi Dave,
On Sat, May 07, 2005 at 12:53:57PM -0400, Dave Jones wrote:
> On Sat, May 07, 2005 at 09:58:29AM +0200, Willy Tarreau wrote:
> What /could/ be useful would be a way to tell sched_setaffinity
> and co "I have two threads, I'd like them both to run on different cores,
> avoiding HT pairs, and never be migrated off them" without having to care
> about the layout of the cpus in each application.
Well, that's exactly why I formulated the proposal. A CPU-intensive
application which benefits from the cache would do better to run on HT
pairs. A network-hungry application will prefer running on only one
sibling of each HT pair, and probably one process per core, particularly
when each core receives one NIC's interrupts. A memory-bandwidth-intensive
application will choose to run on a single NUMA node, etc... So either
the application can work this out itself from its understanding of the
CPU layout, or it can ask the system "hey, I'd like this type of
workload, how many processes should I start, and where should I bind
them?". I agree that the latter seems more portable and puts less burden
on the application.
Regards,
Willy
On Sat, May 07, 2005 at 07:05:56PM +0200, Willy Tarreau wrote:
> Well, that's exactly for this that I formulated the proposal. A
> CPU-intensive application which benefits from the cache would better
> choose to run on HT pairs. A network-hungry application will prefer
> running on only one sibling of each HT pair, and probably one process
> per core, particularly when each core receives one NIC's interrupt.
> A memory bandwidth intensive application will choose to run on a
> single NUMA node, etc... So either the application can choose this
> itself from its understanding of the CPU layout, or it can ask the
> system "hey, I'd like this type of workload, how many process should
> I start, and where should I bind them ?".
I think generalising this and having a method to do this in the kernel
is a much better idea than each application parsing this themselves.
Things are only getting more and more complex as time goes on,
and I don't trust application developers to get it right.
Centralising this in the kernel (or maybe even glibc) means we can get
it right, and have every application benefit. If we get it wrong, we
fix it, and all the applications are fixed without needing patching or
recompiling.
Dave
On Sat, May 07, 2005 at 01:20:05PM -0400, Dave Jones wrote:
> On Sat, May 07, 2005 at 07:05:56PM +0200, Willy Tarreau wrote:
>
> > Well, that's exactly for this that I formulated the proposal. A
> > CPU-intensive application which benefits from the cache would better
> > choose to run on HT pairs. A network-hungry application will prefer
> > running on only one sibling of each HT pair, and probably one process
> > per core, particularly when each core receives one NIC's interrupt.
> > A memory bandwidth intensive application will choose to run on a
> > single NUMA node, etc... So either the application can choose this
> > itself from its understanding of the CPU layout, or it can ask the
> > system "hey, I'd like this type of workload, how many process should
> > I start, and where should I bind them ?".
>
> I think generalising this and having a method to do this in the kernel
> is a much better idea than each application parsing this themselves.
> Things are only getting more and more complex as time goes on,
> and I don't trust application developers to get it right.
>
> Centralising this in the kernel (or maybe even glibc) means we can get
> it right, and have every application benefit. If we get it wrong, we
> fix it, and all the applications are fixed without needing fixing/recompiling.
Agreed.
Even more, support for newer layouts would only require a kernel upgrade
and not an application update. And porting applications to other
architectures would be more transparent.
Regards,
Willy
Hi Ricky.
>>Not to be a pain but how exactly would that interface look today
>>in your eyes?
> Back when I first brought this up (8 years ago?), it was simple... numcpu
> was it. There weren't any virtual processors or multi-core critters.
Weren't there? Hmm. The first SMT implementation dates back to 1970.
Or the HEP-1 from 1982. Or the Tera from 1990.
They weren't called SMT back then, though.
Irrelevant to the discussion though.
And no, "they didn't run Linux" doesn't cut it in my eyes.
People run Linux on anything, always have, always will.
> CPU affinity, cpumasks, and sysfs weren't even dreams.
> Today, things are more complicated... much more complicated. However,
> they've generally already been hashed out and handled in some fashion.
> The kernel already knows how many cpus there are, how many are online,
> which ones are virtual (at least to the point that the scheduler knows),
> etc. I'm not sure what difference multi-core chips really make as they're
> just two+ cpus in the same package -- yes, that means all of them have to
> be offline to physically remove the processor, but that's pretty hardcore,
> specialized function to begin with.
Pretty big generalization there. But tell me, an HT DualCore CPU - how
DO you think it should end up being visible?
Also, remember that some database vendors have said they will charge
per CPU package and some have said it's per CPU core.
Whatever interface is chosen has to accommodate both.
// Stefan
Hi.
> I personally think that what would be useful is not "the number of CPUs"
> (which does not make any sense), but an enumeration of :
>
> - the physical nodes (for NUMA)
> - the physical CPUs
> - each CPU's cores (for multi-core)
> - each core's siblings (for HT/SMT)
>
> each of which would report their respective id for {set,get}_affinity().
> This way, the application would be able to choose how it needs to spread
> over available CPUs depending on its workload. But IMHO, this should
> definitely not be put in cpuinfo. I consider that cpuinfo is for the human.
When one defines it one way, you can be sure some company will come
along and figure out something that doesn't fit into that representation.
Like - stick a board into the CPU slot of some motherboard. That board
has two dual-core SMT chips.
Oops.
Now the funny part - there is a company selling those things (not
dual-core yet, but SMT anyhow).
How do you fit that into the model?
// Stefan
On Sat, 7 May 2005, Stefan Smietanowski wrote:
>> Back when I first brought this up (8 years ago?), it was simple... numcpu
>> was it. There weren't any virtual processors or multi-core critters.
... as far as Linux was concerned, which is the whole point. We aren't
talking about those ancient Crays and other oddball (by modern
definition) machines like quad 386s -- yes, I've seen one of those;
yes, it was more dust than actual computer :-)
>Pretty big generalization there. But tell me, a HT DualCore CPU - how
>DO you think it should end up being visible?
One has to make generalizations. I'm not typin' for days here.
Your HT DC CPU counts as 4 cpus total... the same as two HT processors.
The system does not fundamentally need to make a distinction between
dual core and two actual chips + heat sinks + fans. The system will
perform almost identically (if not actually identically) to a dual
(single-core) processor system.
>Also, remember the some database vendors have said that they will charge
>per cpu package and some have said it's per cpu core.
That's between you and the licensor. Some count virtual processors, some
count logical processors, and I'm sure there are some that are worded
based on the physical number of processor chips (even if they aren't online.)
But I don't pander to the greedy bastard database vendors :-) "I swear
we're only running Oracle on *one* of the 8 processors. Honest."
(you can never satisfy everyone all of the time.)
Personally, I don't count HT as a "second processor"... because it's not.
--Ricky
On Sat, May 07, 2005 at 07:54:48PM +0200, Stefan Smietanowski wrote:
> When one defines it one way you can be sure there'll come some company
> and figure something out that doesn't fit into that representation.
>
> Like - Stick a board into the CPU slot of some motherboard. That board
> has two DualCore, SMT chips.
>
> Oops.
>
> Now the funny part - there is a company selling those things (not
> dualcore yet, but SMT anyhow).
>
> How do you fit it into that model?
Two CPUs on a board accessing memory through the same bus are just like
a NUMA node. Anyway, as Dave said, it's even better to have the kernel
translate the application's needs into hardware resources, as it is best
placed to deal with those hardware builders' fantasies.
Regards,
Willy
* Ricky Beam ([email protected]) wrote:
>
> Your HT DC CPU counts as 4 cpus total... the same as two HT processors.
> The system does not fundamentally need to make a distinction on dual core
> vs. two actual chips + heat sinks + fans. The system will perform almost
> identically (if not acutally identically) to a dual (single core) processor
> system.
That *might* be the case for a current one; but I'm sure dual-core
processors will end up sharing a level of cache/bus arbitration
sometime. Anyway, it is a bit nasty not to reveal dual-core-ness
to things; after all, they might get upset if someone tried to hotswap
one.....
Dave
P.S. On the side note of early dual core chips; there is an R65c00/21
and R65c29 listed in a 1985 Rockwell datasheet I've got - dual 6502s!
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
Stefan Smietanowski <[email protected]> wrote:
>> I personally think that what would be useful is not "the number of CPUs"
>> (which does not make any sense), but an enumeration of :
>>
>> - the physical nodes (for NUMA)
>> - the physical CPUs
>> - each CPU's cores (for multi-core)
>> - each core's siblings (for HT/SMT)
>>
>> each of which would report their respective id for {set,get}_affinity().
>> This way, the application would be able to choose how it needs to spread
>> over available CPUs depending on its workload. But IMHO, this should
>> definitely not be put in cpuinfo. I consider that cpuinfo is for the human.
>
> When one defines it one way you can be sure there'll come some company
> and figure something out that doesn't fit into that representation.
>
> Like - Stick a board into the CPU slot of some motherboard. That board
> has two DualCore, SMT chips.
Obviously it must be a tree of CPU groups. CPUs in one NUMA node go into
one group, a multi-core CPU has all its cores in one group, and an HT
pair is a group too. This will scale from UP (a degenerate tree with
just one CPU) to clusters with multicore HT-capable CPUs on PCI boards.
e.g. if you choose to represent it as a string:
Object = <Offline-Indicator>? (<CPU>|<Group>) <Details>?
Offline-Indicator = "-"
CPU = qr/[a-z0-9_]+/ # or more restrictive
Group = "[" <Object> ("," <Object>)* "]"
Details = "{" <Vardef> ("," <Vardef>)* "}"
Vardef = <Key> "=" <Value> # or "=>" if you think perl
Key = qr/[a-z0-9_]+/
Value = qr/"([^"\000]|"")*"/
The described board in a UP system might look like:
[cpu0{speed="4.77MHz", RAM="80KB"},
[[[boardcpu0_0, boardcpu0_0_ht]{group="ht"},
[boardcpu0_1, boardcpu0_1_ht]{group="ht"}],
[[boardcpu1_0, boardcpu1_0_ht]{group="ht"},
[boardcpu1_1, boardcpu1_1_ht]{group="ht"}]{group="numa",RAM="4GB"}]
To count them, do:
#!/usr/bin/perl
# Strip quoted values and {...} details, then count commas:
# every remaining comma separates two objects, so CPUs = commas + 1.
$_ = join '', <>;
s/"[^"]*"//g;
s/\{[^}]*\}//g;
$numcpus = (s/,/,/g) + 1;
print "$numcpus\n";
--
Top 100 things you don't want the sysadmin to say:
43. The backup procedure works fine, but the restore is tricky!
On Sat, May 07, 2005 at 01:20:05PM -0400, Dave Jones wrote:
> On Sat, May 07, 2005 at 07:05:56PM +0200, Willy Tarreau wrote:
> > system "hey, I'd like this type of workload, how many process should
> > I start, and where should I bind them ?".
>
> I think generalising this and having a method to do this in the kernel
> is a much better idea than each application parsing this themselves.
> Things are only getting more and more complex as time goes on,
> and I don't trust application developers to get it right.
As a developer of a multiprocess/multithreaded application I can assure
you that you are right not to trust application developers to get this
right. The idea that a programmer understands the behavior of the
applications they write is largely a myth. Furthermore, I suspect
that SMT will evolve in directions that make the idea of a processor
more and more fuzzy. I don't think it is wise to construct any
interface that suggests knowing the hardware details is good, or that
processes should be bound by userland. Certainly it is sometimes
necessary for userland to do this, but we should look at that as a
bug in the kernel.
Thanks,
Jim
--
[email protected]
SDF Public Access UNIX System - http://sdf.lonestar.org
"Bodo Eggert <[email protected]>" <[email protected]> writes:
>
> Obviously it must be a tree of CPU groups. CPUs in one NUMA node go into
> one group, multi-core CPUs have all cores in one group and HT is a group,
> too. This will scale from UP (degenerated tree with just one CPU) to
> clusters with multicore HT-capable CPUs on PCI boards.
All this information (except that HT and multicore are folded into a
single level) is already there in sysfs.
libnuma uses it to discover the topology and report it to the
user.
-Andi
On Sat, 2005-05-07 at 21:25, Jim Nance wrote:
> On Sat, May 07, 2005 at 01:20:05PM -0400, Dave Jones wrote:
> > On Sat, May 07, 2005 at 07:05:56PM +0200, Willy Tarreau wrote:
>
> > > system "hey, I'd like this type of workload, how many process should
> > > I start, and where should I bind them ?".
> >
> > I think generalising this and having a method to do this in the kernel
> > is a much better idea than each application parsing this themselves.
> > Things are only getting more and more complex as time goes on,
> > and I don't trust application developers to get it right.
>
> As a developer of a multiprocess/multithreaded application I can assure
> you that you are right not to trust application developers to get this
> right. The idea that a programmer understands the behavior of the
> applications they write is largely a myth. Furthermore, I suspect
> that SMT will evolve in directions that make the idea of a processor
> more and more fuzzy. I don't think it is wise to construct any
> interface that suggests knowing the hardware details is good, or that
> processes should be bound by userland. Certainly it is sometimes
> necessary for userland to do this, but we should look at that as a
> bug in the kernel.
>
> Thanks,
>
> Jim
Aw c'mon. Don't we believe in the C programming philosophy of trusting
the programmer? You know, give them enough rope to hang themselves?
Personally, I don't care because I can parse cpuid and the like directly
myself, but examples of legitimate uses for this knowledge are
compilers, JVMs, and threading libraries, all of which, although low-level,
are technically userland. I don't think it's our job to protect the user
from themselves; it's to give them a reasonable default, and the
interfaces to take advantage of the trickier stuff for special purposes
if they wish.
Andrew Morton wrote:
> Ricky Beam <[email protected]> wrote:
>
>>Short of a kernel module to export the kernel variables, that's the only
>> damned way to find the number of cpus in a Linux system.
>
>
> Question is: do you need to know the number of CPUs (why?) or do you need
> to know the number of CPUs which you're currently allowed to use or do you
> need to know the maximum number of CPUs which you are allowed to bind
> yourself to, or what?
I can see responsible programs checking Ncpu before deciding how many
threads to start, so it seems that some accurate info could be useful in
the real world.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
Jim Nance wrote:
> On Sat, May 07, 2005 at 01:20:05PM -0400, Dave Jones wrote:
>
>>On Sat, May 07, 2005 at 07:05:56PM +0200, Willy Tarreau wrote:
>
>
>> > system "hey, I'd like this type of workload, how many process should
>> > I start, and where should I bind them ?".
>>
>>I think generalising this and having a method to do this in the kernel
>>is a much better idea than each application parsing this themselves.
>>Things are only getting more and more complex as time goes on,
>>and I don't trust application developers to get it right.
>
>
> As a developer of a multiprocess/multithreaded application I can assure
> you that you are right not to trust application developers to get this
> right. The idea that a programmer understands the behavior of the
> applications they write is largely a myth. Furthermore, I suspect
> that SMT will evolve in directions that make the idea of a processor
> more and more fuzzy. I don't think it is wise to construct any
> interface that suggests knowing the hardware details is good, or that
> processes should be bound by userland. Certainly it is sometimes
> necessary for userland to do this, but we should look at that as a
> bug in the kernel.
Might I suggest that if you like the "we know best just trust us"
approach, there is another OS to use. Making information available to
good applications will improve system performance, or at least allow
better limitation of requests for resources, and bad applications will
be bad regardless of what you hide. You don't hide the CPU hardware any
more than the memory size.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
Ricky Beam wrote:
> On Tue, 19 Apr 2005, Nico Schottelius wrote:
>
>>When I wrote schwanz3(*) for fun, I noticed /proc/cpuinfo
>>varies very much on different architectures.
>
>
> Yep, and it has been this way since the beginning of time.
>
>
>>So that one at least can count the cpus on every system the same way.
>
>
> Hah. Give me a minute to stop laughing... I argued the same point almost
> a decade ago. Linus decided to be an ass and flat refused to ever export
> numcpu (or any of the current day derivatives) which brought us to the
> bullshit of parsing the arch-dependent /proc/cpuinfo.
Don't ever take up a career as a diplomat; no one in their right mind
would want such a tactless person for a diplomatic job, say UN ambassador
for instance.
Linus did what was probably right then. I would agree that there is room
for something better now. Just to prove it could be done (not that this
is the only or best way):
cpu0 {
socket: 0
chip-cache: 0
num-core: 2
per-core-cache: 512k
num-siblings: 2
sibling-cache: 0
family: i86
features: sse2 sse3 xxs bvd
# stepping and revision info
}
cpu1 {
socket: 1
chip-cache: 0
num-core: 1
per-core-cache: 512k
num-siblings: 2
sibling-cache: 64k
family: i86
features: sse2 sse3 xxs bvd kook2
# stepping and revision info
}
This is just proof of concept, you can have per-chip, per-core, and
per-sibling cache for instance, but I can't believe that anyone would
make a chip where the cache per core or per sibling differed, or the
instruction set, etc. Depending on where you buy your BS, Intel and AMD
will (or won't) make single and dual core chips to fit the same socket.
The complexity wasn't needed a decade ago, and I'm not sure it is now,
other than it being easy to display if people don't complain about
breaking the existing format.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
On Mon, May 09, 2005 at 02:00:55PM -0400, Bill Davidsen wrote:
> Linus did what was probably right then. I would agree that there is room
> for something better now. Just to prove it could be done (not that this
> is the only or best way):
I suspect many architecture's /proc/cpuinfo were not decided by Linus at
all, but by whoever ported linux to that architecture.
> cpu0 {
> socket: 0
> chip-cache: 0
> num-core: 2
> per-core-cache: 512k
> num-siblings: 2
> sibling-cache: 0
> family: i86
> features: sse2 sse3 xxs bvd
> # stepping and revision info
> }
> cpu1 {
> socket: 1
> chip-cache: 0
> num-core: 1
> per-core-cache: 512k
> num-siblings: 2
> sibling-cache: 64k
> family: i86
> features: sse2 sse3 xxs bvd kook2
> # stepping and revision info
> }
Where do NUMA nodes fit into that?
> This is just proof of concept, you can have per-chip, per-core, and
> per-sibling cache for instance, but I can't believe that anyone would
> make a chip where the cache per core or per sibling differed, or the
> instruction set, etc. Depending on where you buy your BS, Intel and AMD
> will (or won't) make single and dual core chips to fit the same socket.
Have you seen the Cell processor? Multi core with different instruction
set for the smaller execution cores than the main one.
> The complexity wasn't needed a decade ago, and I'm not sure it is now,
> other than it being easy to display if people don't complain about
> breaking the existing format.
But people always like complaining about all changes that they notice.
Len Sorensen
Bill Davidsen wrote:
> Might I suggest that if you like the "we know best just trust us"
> approach, there is another OS to use. Making information available to
> good applications will improve system performance, or at least allow
> better limitation of requests for resources
What will you do with the information? The kernel is doing all the
resource allocation and scheduling.
From a higher-level, the application wants the best performance.
Doesn't it make more sense to have an API that lets you query things
like: how many cores do I have, how many separate memory interfaces do I
have, how many cores handle interrupts, etc.
Based on that information you tell the system: "I've got 4 processes,
please put them all on cores with separate memory connectivity since
they're all memory-intensive. Now please put these other two threads on
the same cpu since they share memory but serialize each other by design."
The app shouldn't care about the details of architecture, but it should
be able to work with the system to give the best performance.
Chris
On Mon, May 09, 2005 at 02:09:10PM -0600, Chris Friesen wrote:
> Bill Davidsen wrote:
>
> >Might I suggest that if you like the "we know best just trust us"
> >approach, there is another OS to use. Making information available to
> >good applications will improve system performance, or at least allow
> >better limitation of requests for resources
>
> What will you do with the information? The kernel is doing all the
> resource allocation and scheduling.
>
> From a higher-level, the application wants the best performance.
> Doesn't it make more sense to have an API that lets you query things
> like: how many cores do I have, how many separate memory interfaces do I
> have, how many cores handle interrupts, etc.
>
> Based on that information you tell the system: "I've got 4 processes,
> please put them all on cores with separate memory connectivity since
> they're all memory-intensive. Now please put these other two threads on
> the same cpu since they share memory but serialize each other by design."
>
> The app shouldn't care about the details of architecture, but it should
> be able to work with the system to give the best performance.
What if the process is able to split itself into, say, 4 or 8 or 16
threads? If you only have the hardware to run 2 threads, you might
get less context-switch overhead running fewer threads at a time, while
if you have 16 CPUs available, running 4 threads will not be the fastest
way to get the job done. Being able to "optimally" configure the
program on the fly might be handy (although a config setting for the
optimal configuration on the particular machine would do the same thing).
Now on the other hand if the process could tell that there were 8 cpu
cores and decided to run 8 threads, but the admin was running another
program already that was using 4 cores, then auto detecting the core
count and starting 8 threads might still be inefficient, and 4 would
have been optimal.
I think make has the right idea. Let the user and/or admin decide how
to allocate the resources. If they don't know what they are doing, well,
who does? As long as the user can tell what their machine is, they
should be able to decide how many threads to start in a given program.
/proc/cpuinfo, as it currently is, is not too bad for that task.
Adding an option or config setting to the program where the user can
tell it how many threads to run seems like the right solution. If
the program is simpler to write as 2 threads running at the same time,
with no obvious overhead from doing it that way, then run it as 2 threads
even if you only have 1 CPU core to run it on. Context switches are hardly
that expensive on a modern machine.
Len Sorensen
I don't want to butt in on a private fight but, philosophically,
I would argue that it is up to the kernel to report the real
hardware configuration in an easy to use, and extensible, way.
This only needs to be done once. To argue about what application
writers could or should use, based on what happens today is
just a cop-out; the only thing one must say is that if the app
doesn't understand the architecture, it must provide defaults.
- Brian
Good Afternoon Bill,
Thanks for the input. Let me make a couple of comments.
On Mon, May 09, 2005 at 02:14:14PM -0400, Bill Davidsen wrote:
> Might I suggest that if you like the "we know best just trust us"
> approach, there is another OS to use. Making information available to
> good applications will improve system performance, or at least allow
> better limitation of requests for resources, and bad applications will
> be bad regardless of what you hide. You don't hide the CPU hardware any
> more than the memory size.
You could use a similar argument for cooperative rather than
preemptive multitasking. It might even be a valid argument,
assuming you controlled all the processes running on the system.
But it didn't work very well in practice.
I see two problems with encouraging applications to get involved
with processor selection.
The first is they don't have enough information to get it right.
There are going to be other processes running on the machine.
The optimal set of processors to run on is going to depend on
what else is running and what it is doing at that instant. This
isn't information a usermode process has good access to. Say I
have an application that wants to bind its 2 threads to the two
processors on a single SMT chip. Now say I run two of these
applications on a machine with 2 SMT chips on it. What keeps
both of them from binding themselves to the same chip? Should
it be the application's responsibility to look through the process
table and see what other applications are bound to what processors?
What prevents races if they do?
The second is that once you give userland an interface, it becomes
very difficult to remove it once it no longer makes sense. See
the thread on this mailing list concerning the C/H/S values returned
for disk drives as an example. Having to support a particular interface
may make it impossible to add improvements we want to add in the
future. For example, if at some point in the future we come up with
a really great scheduling algorithm, it won't help if the programs
have already bound themselves to particular processors.
Now I know there are exceptions to rules. But in general I would say
that if an application needs to know about the configuration of the
processors, then it's compensating for shortcomings in the kernel.
Thanks,
Jim
--
[email protected]
SDF Public Access UNIX System - http://sdf.lonestar.org
Hi Jim,
On Tue, May 10, 2005 at 02:23:01AM +0000, Jim Nance wrote:
> Now I know there are exceptions to rules. But in general I would say
> that if an application needs to know about the configuration of the
> processors, then it's compensating for shortcomings in the kernel.
I cannot agree. When an application needs to know such things, it is
because it has been developed primarily for a certain platform and
optimized for that platform. The kernel's scheduler is written for a
general purpose and not for some particular apps which will run two
concurrent threads at 100% CPU each for example, with lots of memory
exchanges through the CPU cache. It is not uncommon to have to manually
play with /proc before starting some specific apps. It is for this case
that some help from the kernel might be welcome. Instead of forcing eth0
interrupt to CPU0 then binding your process to CPU0, you might prefer to
tell the kernel "this app needs to run on the CPU which receives ints
from eth0", whatever it is.
Regards,
willy
Jim wrote:
> I see two problems with encouraging applications to get involved
> with processor selection.
I suspect you are confusing "application" with "not kernel".
There are basically three layers of software on big systems:
1) kernel
2) administration (system services, libraries and utilities)
3) application
Something like a batch manager is an example in layer (2) that needs
extensive knowledge of a system's hardware, and extensive ability to
manage exactly what runs and allocates where.
Large systems very much expect to manage what threads run where. These
API's are already present - check out sched_setaffinity, mbind,
set_mempolicy, and cpusets. The details of what hardware is where,
including memory, processor and interconnect, are also there as well, in
various /proc and /sys files.
No - we don't expect the application to know all this. But we absolutely
require that various admin level programs know this stuff in intimate
detail, and enable the administration of large systems in a variety of
ways.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401
Andrew wrote:
> Probably these things can be worked out via the get/set_affinity() syscalls
> and/or via the cpuset sysfs interfaces, but it isn't as simple as you're
> assuming.
Yes - it's all there. Sometimes the ways to discover it aren't pretty,
but that's one thing that libraries are good for - to wrap such detail.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401
On Tue, May 10, 2005 at 12:21:21AM -0700, Paul Jackson wrote:
> Yes - it's all there. Sometimes the ways to discover it aren't pretty,
> but that's one thing that libraries are good for - to wrap such detail.
And then when the kernel adds something new, you update one library
rather than 1000s of applications.
Perhaps making it hard to get at without a certain library is a good
way to avoid too many applications polling at it just because they can.
Len Sorensen
Chris Friesen wrote:
> Bill Davidsen wrote:
>
>> Might I suggest that if you like the "we know best just trust us"
>> approach, there is another OS to use. Making information available to
>> good applications will improve system performance, or at least allow
>> better limitation of requests for resources
>
>
> What will you do with the information? The kernel is doing all the
> resource allocation and scheduling.
>
> From a higher-level, the application wants the best performance.
> Doesn't it make more sense to have an API that lets you query things
> like: how many cores do I have, how many separate memory interfaces do I
> have, how many cores handle interrupts, etc.
>
> Based on that information you tell the system: "I've got 4 processes,
> please put them all on cores with separate memory connectivity since
> they're all memory-intensive. Now please put these other two threads on
> the same cpu since they share memory but serialize each other by design."
Unless you actually have such a feature, saying "let's not make what we
have useful because we could have something better someday" seems to be
a needless sacrifice of electrons.
Bill Davidsen wrote:
> Unless you actually have such a feature, saying "let's not make what we
> have useful because we could have something better someday" seems to be
> a needless sacrifice of electrons.
This has already been addressed. If we add a crappy interface now, it
will become very difficult to remove it in the future.
Chris
Jim Nance wrote:
> Good Afternoon Bill,
>
> Thanks for the input. Let me make a couple of comments.
>
> On Mon, May 09, 2005 at 02:14:14PM -0400, Bill Davidsen wrote:
>
>
>>Might I suggest that if you like the "we know best just trust us"
>>approach, there is another OS to use. Making information available to
>>good applications will improve system performance, or at least allow
>>better limitation of requests for resources, and bad applications will
>>be bad regardless of what you hide. You don't hide the CPU hardware any
>>more than the memory size.
>
>
> You could use a similar argument for cooperative rather than
> preemptive multitasking. It might even be a valid argument,
> assuming you controlled all the processes running on the system.
> But it didn't work very well in practice.
>
> I see two problems with encouraging applications to get involved
> with processor selection.
>
> The first is they don't have enough information to get it right.
The application doesn't have to get it "right" however you define that,
but unless you expect to extend the API to add a
signal(STARTMORETHREADS) it makes sense for the application to have some
idea what the hardware config is, because the kernel doesn't know what
the application is going to do (future) vs. what it already did (past).
Running a number of threads somewhat close to the number of CPUs, or
sockets, or something else the application can know is far better than
starting 64 threads in case this is a big NUMA machine, or running
single thread while seven of eight CPUs do nothing.
By looking at Ncpu and ldavg a smart application can avoid being really
wrong, which gives the kernel a better chance of improving throughput.
> There are going to be other processes running on the machine.
> The optimal set of processors to run on is going to depend on
> what else is running and what it is doing at that instant. This
> isn't information a usermode process has good access to. Say I
> have an application that wants to bind its 2 threads to the two
> processors on a single SMT chip. Now say I run two of these
> applications on a machine with 2 SMT chips on it. What keeps
> both of them from binding themselves to the same chip? Should
> it be the applications responsibility to look through the process
> table and see what other applicatioins are bound to what processors?
> What prevents races if they do?
If the application can choose a sane number of threads, that makes the
problem of memory management and CPU scheduling easier. Just because the
application can't do a perfect job doesn't mean that it should do
without information needed to do something reasonable.
Perfect is the enemy of better.
Lennart Sorensen wrote:
> On Mon, May 09, 2005 at 02:00:55PM -0400, Bill Davidsen wrote:
>
>>Linus did what was probably right then. I would agree that there is room
>>for something better now. Just to prove it could be done (not that this
>>is the only or best way):
>
>
> I suspect many architecture's /proc/cpuinfo were not decided by Linus at
> all, but by whoever ported linux to that architecture.
>
>
>> cpu0 {
>> socket: 0
>> chip-cache: 0
>> num-core: 2
>> per-core-cache: 512k
>> num-siblings: 2
>> sibling-cache: 0
>> family: i86
>> features: sse2 sse3 xxs bvd
>> # stepping and revision info
>> }
>> cpu1 {
>> socket: 1
>> chip-cache: 0
>> num-core: 1
>> per-core-cache: 512k
>> num-siblings: 2
>> sibling-cache: 64k
>> family: i86
>> features: sse2 sse3 xxs bvd kook2
>> # stepping and revision info
>> }
>
>
> Where does numa nodes fit into that?
>
>
>>This is just proof of concept, you can have per-chip, per-core, and
>>per-sibling cache for instance, but I can't believe that anyone would
>>make a chip where the cache per core or per sibling differed, or the
>>instruction set, etc. Depending on where you buy your BS, Intel and AMD
>>will (or won't) make single and dual core chips to fit the same socket.
>
>
> Have you seen the Cell processor? Multi core with different instruction
> set for the smaller execution cores than the main one.
I'm aware of it, but until someone actually produces a multicore whose
cores don't all execute the same instruction set (a 386 plus a P4?), I
assume that all the cores used by the program will be the same.
I wrote for the DEC Rainbow (8086 and Z80; one did disk and video, one
did serial and network), and IIRC the memory addresses were shared but
the I/O addresses weren't. Also something I can't easily name which had
a 68010 and a 4-bit RISC in a single carrier. Early microcomputer days
were fun, or at least I thought it was fun to cope with bizarre and
unreliable hardware when I was young.
Lennart Sorensen wrote:
> On Tue, May 10, 2005 at 12:21:21AM -0700, Paul Jackson wrote:
>
>>Yes - it's all there. Sometimes the ways to discover it aren't pretty,
>>but that's one thing that libraries are good for - to wrap such detail.
>
>
> And then when the kernel adds something new, you update one library
> rather than 1000s of applications.
>
> Perhaps making it hard to get at without a certain library is a good
> way to avoid too many applications polling at it just because they can.
The advantage of /proc is that it works from C, Java, Perl, Python, etc.
Oh, and humans, the reason all the applications run.