2008-07-03 23:41:56

by Nathan Lynch

Subject: [PATCH 1/2] elf loader support for auxvec base platform string

Some IBM POWER-based platforms have the ability to run in a
mode which mostly appears to the OS as a different processor from the
actual hardware. For example, a Power6 system may appear to be a
Power5+, which makes the AT_PLATFORM value "power5+".

However, some applications (virtual machines, optimized libraries) can
benefit from knowledge of the underlying CPU model. A new aux vector
entry, AT_BASE_PLATFORM, will denote the actual hardware. For
example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
will be "power5+" and AT_BASE_PLATFORM will be "power6".

If the architecture has defined ELF_BASE_PLATFORM, copy that value to
the user stack in the same manner as ELF_PLATFORM.

Signed-off-by: Nathan Lynch <[email protected]>
---

Next patch implements ELF/AT_BASE_PLATFORM for powerpc.

fs/binfmt_elf.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d48ff5f..834c2c4 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -131,6 +131,10 @@ static int padzero(unsigned long elf_bss)
 #define STACK_ALLOC(sp, len) ({ sp -= len ; sp; })
 #endif

+#ifndef ELF_BASE_PLATFORM
+#define ELF_BASE_PLATFORM NULL
+#endif
+
 static int
 create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 		unsigned long load_addr, unsigned long interp_load_addr)
@@ -142,7 +146,9 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 	elf_addr_t __user *envp;
 	elf_addr_t __user *sp;
 	elf_addr_t __user *u_platform;
+	elf_addr_t __user *u_base_platform;
 	const char *k_platform = ELF_PLATFORM;
+	const char *k_base_platform = ELF_BASE_PLATFORM;
 	int items;
 	elf_addr_t *elf_info;
 	int ei_index = 0;
@@ -172,6 +178,19 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 			return -EFAULT;
 	}

+	/*
+	 * If this architecture has a "base" platform capability
+	 * string, copy it to userspace.
+	 */
+	u_base_platform = NULL;
+	if (k_base_platform) {
+		size_t len = strlen(k_base_platform) + 1;
+
+		u_base_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
+		if (__copy_to_user(u_base_platform, k_base_platform, len))
+			return -EFAULT;
+	}
+
 	/* Create the ELF interpreter info */
 	elf_info = (elf_addr_t *)current->mm->saved_auxv;
 	/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
@@ -208,6 +227,10 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 		NEW_AUX_ENT(AT_PLATFORM,
 			    (elf_addr_t)(unsigned long)u_platform);
 	}
+	if (k_base_platform) {
+		NEW_AUX_ENT(AT_BASE_PLATFORM,
+			    (elf_addr_t)(unsigned long)u_base_platform);
+	}
 	if (bprm->interp_flags & BINPRM_FLAGS_EXECFD) {
 		NEW_AUX_ENT(AT_EXECFD, bprm->interp_data);
 	}
--
1.5.5.1
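
For illustration only, a userspace program on a current glibc could read
both strings roughly as follows. This is a sketch and not part of the
patch: getauxval() is a later glibc interface than this thread, the helper
names auxv_string() and describe_platform() are made up for the example,
and the fallback define assumes AT_BASE_PLATFORM has the value 24.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Assumption: older headers may lack the constant; 24 is the assumed value. */
#ifndef AT_BASE_PLATFORM
#define AT_BASE_PLATFORM 24
#endif

/* Return the string an auxv entry points at, or NULL if the entry is absent. */
const char *auxv_string(unsigned long type)
{
	unsigned long val = getauxval(type);

	return val ? (const char *)val : NULL;
}

/*
 * Format both platform strings into buf. When the kernel supplies no
 * AT_BASE_PLATFORM entry, the base platform is reported as equal to
 * AT_PLATFORM.
 */
int describe_platform(char *buf, size_t len)
{
	const char *platform = auxv_string(AT_PLATFORM);
	const char *base = auxv_string(AT_BASE_PLATFORM);

	if (!platform)
		platform = "unknown";
	if (!base)
		base = platform;
	return snprintf(buf, len, "platform=%s base=%s", platform, base);
}
```

On the Power6 system in Power5+ compatibility mode described above, the
two strings would be expected to differ; on architectures that do not
define ELF_BASE_PLATFORM the entry is simply absent.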


2008-07-04 02:20:17

by Roland McGrath

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Why not just use ELF_HWCAP for this? It looks like powerpc only has 3 bits
left there (keeping it to 32), but 3 is not 0. If not that, why not use
dsocaps? That is, some magic in the vDSO, which glibc already supports on
all machines where it uses the vDSO. (For how it works, see the use in
arch/x86/vdso/vdso32/note.S for CONFIG_XEN.)


Thanks,
Roland

2008-07-04 02:37:17

by Mikael Pettersson

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Nathan Lynch writes:
> Some IBM POWER-based platforms have the ability to run in a
> mode which mostly appears to the OS as a different processor from the
> actual hardware. For example, a Power6 system may appear to be a
> Power5+, which makes the AT_PLATFORM value "power5+".
>
> However, some applications (virtual machines, optimized libraries) can
> benefit from knowledge of the underlying CPU model. A new aux vector
> entry, AT_BASE_PLATFORM, will denote the actual hardware. For
> example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
> will be "power5+" and AT_BASE_PLATFORM will be "power6".

Why on earth would you ever want AT_PLATFORM to differ from AT_BASE_PLATFORM?
In cases that matter you admit that AT_BASE_PLATFORM takes precedence,
so why involve a fake lame not-quite-the-platform in the first place?

Workaround for buggy software?

2008-07-07 05:49:53

by Benjamin Herrenschmidt

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

On Thu, 2008-07-03 at 19:19 -0700, Roland McGrath wrote:
> Why not just use ELF_HWCAP for this? It looks like powerpc only has 3 bits
> left there (keeping it to 32), but 3 is not 0. If not that, why not use
> dsocaps? That is, some magic in the vDSO, which glibc already supports on
> all machines where it uses the vDSO. (For how it works, see the use in
> arch/x86/vdso/vdso32/note.S for CONFIG_XEN.)

Well, we use strings to represent the platforms already (ie, the actual
CPU microarchitecture). Fitting those into bits would be annoying, it
makes sense to have AT_BASE_PLATFORM to be the "base" variant of
AT_PLATFORM.

_However_ there is a bug in that this patch adds an entry without
bumping the number of entries in the cached array (ie.
AT_VECTOR_SIZE_BASE needs to be updated).

Ben.

2008-07-07 06:18:53

by Roland McGrath

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

> Well, we use strings to represent the platforms already (ie, the actual
> CPU microarchitecture). Fitting those into bits would be annoying, it

Then use dsocaps.

> makes sense to have AT_BASE_PLATFORM to be the "base" variant of
> AT_PLATFORM.

I understand why you think so. But let's not be too abstract. The
purpose of the addition is to drive ld.so's selection of libraries, yes?

AT_PLATFORM is a lousy model. Handling it is clunky in glibc, because
we have to compare against all the known strings to turn it back into a
bit for the ld.so.cache bitmasks. We're not going to do the same for
AT_BASE_PLATFORM too, because it just sucks.

Using dsocaps gives you the best of both worlds. You can freely choose
new strings in the kernel without the ld.so code having to know about
them (which is not true of AT_PLATFORM, but may be true of how you are
thinking about "strings are nice"). You do have to map all the
possibilities that a single kernel build can produce into distinct bits.
But, there are 32 unallocated bits to start with. Moreover, those bit
assignments are not part of any permanent ABI like bits in AT_* values.
They just have to match up between this kernel build and the ld.conf.d
file installed along with it--kernel hackers and kernel packagers have
to coordinate, not kernel hackers and userland hackers.


Thanks,
Roland
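
To make the "compare against all the known strings" point concrete, here
is a minimal sketch of the kind of string-to-bit lookup being described.
It is not glibc's actual code: the table contents, the bit numbering and
the name platform_to_mask() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of platform strings the loader was built to know. */
static const char *const known_platforms[] = {
	"power4", "power5", "power5+", "power6",
};

/*
 * Map a platform string to a single-bit mask, as would be needed to match
 * it against a cache bitmask. Returns 0 for NULL or for a string the
 * table does not contain.
 */
unsigned long platform_to_mask(const char *name)
{
	size_t i;

	if (!name)
		return 0;
	for (i = 0; i < sizeof(known_platforms) / sizeof(known_platforms[0]); i++) {
		if (strcmp(name, known_platforms[i]) == 0)
			return 1UL << i;
	}
	return 0;
}
```

A string introduced by a newer kernel gets no bit from such a table until
the table itself is rebuilt, which is the coupling between kernel and
userland that dsocaps is meant to avoid.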

2008-07-07 06:23:57

by Benjamin Herrenschmidt

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

On Sun, 2008-07-06 at 23:18 -0700, Roland McGrath wrote:
>
> Using dsocaps gives you the best of both worlds. You can freely choose
> new strings in the kernel without the ld.so code having to know about
> them (which is not true of AT_PLATFORM, but may be true of how you are
> thinking about "strings are nice"). You do have to map all the
> possibilities that a single kernel build can produce into distinct bits.
> But, there are 32 unallocated bits to start with. Moreover, those bit
> assignments are not part of any permanent ABI like bits in AT_* values.
> They just have to match up between this kernel build and the ld.conf.d
> file installed along with it--kernel hackers and kernel packagers have
> to coordinate, not kernel hackers and userland hackers.

I'm not sure... if ld.conf.d isn't part of the kernel source tree then
it -will- end in tears...

Ben.

2008-07-07 06:36:09

by Roland McGrath

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

> I'm not sure... if ld.conf.d isn't part of the kernel source tree then
> it -will- end in tears...

Of course, you should include the file you want people to install
as part of the kernel source or build. You can copy it into
place in make install or something if you like (convention is to
call it something.conf in /etc/ld.so.conf.d); then run ldconfig.
The x86-xen case has not bothered to include its one-line file as
such, just the comment in arch/x86/vdso/vdso32/note.S telling you
what it is.


Thanks,
Roland

2008-07-07 06:49:19

by Benjamin Herrenschmidt

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

On Sun, 2008-07-06 at 23:35 -0700, Roland McGrath wrote:
> > I'm not sure... if ld.conf.d isn't part of the kernel source tree then
> > it -will- end in tears...
>
> Of course, you should include the file you want people to install
> as part of the kernel source or build. You can copy it into
> place in make install or something if you like (convention is to
> call it something.conf in /etc/ld.so.conf.d); then run ldconfig.
> The x86-xen case has not bothered to include its one-line file as
> such, just the comment in arch/x86/vdso/vdso32/note.S telling you
> what it is.

Nathan, can you discuss that with Steve Munroe and see if he's ok
with such an approach ?

Cheers,
Ben.

2008-07-07 07:50:16

by Andreas Schwab

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Roland McGrath <[email protected]> writes:

>> I'm not sure... if ld.conf.d isn't part of the kernel source tree then
>> it -will- end in tears...
>
> Of course, you should include the file you want people to install
> as part of the kernel source or build. You can copy it into
> place in make install or something if you like (convention is to
> call it something.conf in /etc/ld.so.conf.d); then run ldconfig.

That will make it part of the kernel ABI, since the mapping depends on
the running kernel, doesn't it?

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2008-07-07 09:32:14

by Roland McGrath

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

> That will make it part of the kernel ABI, since the mapping depends on
> the running kernel, doesn't it?

Well, not the permanent ABI in the sense that AT_* et al are. This
mapping must agree among all users sharing the same ld.so.cache file.
That is all. So if you were to change the meaning of a bit that was
used before with a different string, then you could not have the
conflicting ld.so.conf.d files both installed at the same time
(ldconfig will complain and fail). If you wanted to have two kernels
both installed that disagree on the string for a given bit, then you'd
have to switch the ld.so.conf.d files and re-run ldconfig when you
switch which kernel you're booting.

There are 32 bits free now. One can anticipate that reassigning a bit
would come up only after these are exhausted. With prudent use, this
will take a very long time to happen. Then the oldest CPU type string
might be retired to reuse its bit. It seems unlikely that there will
be a single installation (root directory) that really needs to have
installed both a kernel optimized for the oldest CPU model known and a
kernel optimized for the newest CPU model known. If there is, we can
worry about it then.

At any rate, the point remains that these bit assignments are not part
of any published userland ABI one has to think about in all the ways
that the real ABI implies. There is nowhere that has to know them or
will ever consider them, except the kernel with the vDSO image built
inside and the ld.so.conf.d file that goes with it.


Thanks,
Roland

2008-07-07 10:01:25

by Andreas Schwab

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Roland McGrath <[email protected]> writes:

> There are 32 bits free now. One can anticipate that reassigning a bit
> would come up only after these are exhausted. With prudent use, this
> will take a very long time to happen. Then the oldest CPU type string
> might be retired to reuse its bit. It seems unlikely that there will
> be a single installation (root directory) that really needs to have
> installed both a kernel optimized for the oldest CPU model known and a
> kernel optimized for the newest CPU model known.

The kernel does not have to come from the same place as the root
filesystem. You may want to run a new kernel with an old filesystem, or
vice-versa.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2008-07-07 15:55:37

by Nathan Lynch

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Mikael Pettersson wrote:
> Nathan Lynch writes:
> > Some IBM POWER-based platforms have the ability to run in a
> > mode which mostly appears to the OS as a different processor from the
> > actual hardware. For example, a Power6 system may appear to be a
> > Power5+, which makes the AT_PLATFORM value "power5+".
> >
> > However, some applications (virtual machines, optimized libraries) can
> > benefit from knowledge of the underlying CPU model. A new aux vector
> > entry, AT_BASE_PLATFORM, will denote the actual hardware. For
> > example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
> > will be "power5+" and AT_BASE_PLATFORM will be "power6".
>
> Why on earth would you ever want AT_PLATFORM to differ from AT_BASE_PLATFORM?
> In cases that matter you admit that AT_BASE_PLATFORM takes precedence,
> so why involve a fake lame not-quite-the-platform in the first place?
>
> Workaround for buggy software?

My apologies, I did not explain the motivation well.

The idea is that while AT_PLATFORM indicates the instruction set
supported, AT_BASE_PLATFORM indicates the underlying
microarchitecture. It's not a matter of buggy software, or of one
value taking precedence over the other.

2008-07-07 16:17:04

by Nathan Lynch

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Roland McGrath wrote:
> > Well, we use strings to represent the platforms already (ie, the actual
> > CPU microarchitecture). Fitting those into bits would be annoying, it
>
> Then use dsocaps.
>
> > makes sense to have AT_BASE_PLATFORM to be the "base" variant of
> > AT_PLATFORM.
>
> I understand why you think so. But let's not be too abstract. The
> purpose of the addition is to drive ld.so's selection of libraries, yes?

That is one purpose. But there are others (JVMs, performance tools).
dsocaps seems to be an ld.so-specific thing... or am I missing how a
"third-party" program would use it?

2008-07-07 22:17:54

by Nathan Lynch

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Benjamin Herrenschmidt wrote:
> On Thu, 2008-07-03 at 19:19 -0700, Roland McGrath wrote:
> > Why not just use ELF_HWCAP for this? It looks like powerpc only has 3 bits
> > left there (keeping it to 32), but 3 is not 0. If not that, why not use
> > dsocaps? That is, some magic in the vDSO, which glibc already supports on
> > all machines where it uses the vDSO. (For how it works, see the use in
> > arch/x86/vdso/vdso32/note.S for CONFIG_XEN.)
>
> Well, we use strings to represent the platforms already (ie, the actual
> CPU microarchitecture). Fitting those into bits would be annoying, it
> makes sense to have AT_BASE_PLATFORM to be the "base" variant of
> AT_PLATFORM.
>
> _However_ there is a bug in that this patch adds an entry without
> bumping the number of entries in the cached array (ie.
> AT_VECTOR_SIZE_BASE needs to be updated).

Ugh, yes. I was hoping to work this in such a way that AT_VECTOR_SIZE
(and thus the size of mm_struct) increases only for architectures that
implement AT_BASE_PLATFORM... would it be wrong to account for it in
AT_VECTOR_SIZE_ARCH?

2008-07-07 22:57:17

by Benjamin Herrenschmidt

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

On Mon, 2008-07-07 at 12:01 +0200, Andreas Schwab wrote:
> Roland McGrath <[email protected]> writes:
>
> > There are 32 bits free now. One can anticipate that reassigning a bit
> > would come up only after these are exhausted. With prudent use, this
> > will take a very long time to happen. Then the oldest CPU type string
> > might be retired to reuse its bit. It seems unlikely that there will
> > be a single installation (root directory) that really needs to have
> > installed both a kernel optimized for the oldest CPU model known and a
> > kernel optimized for the newest CPU model known.
>
> The kernel does not have to come from the same place as the root
> filesystem. You may want to run a new kernel with an old filesystem, or
> vice-versa.

I agree, I'm pretty dubious here...

Cheers,
Ben.

2008-07-07 23:02:03

by Benjamin Herrenschmidt

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

On Mon, 2008-07-07 at 17:17 -0500, Nathan Lynch wrote:
> Benjamin Herrenschmidt wrote:
> > On Thu, 2008-07-03 at 19:19 -0700, Roland McGrath wrote:
> > > Why not just use ELF_HWCAP for this? It looks like powerpc only has 3 bits
> > > left there (keeping it to 32), but 3 is not 0. If not that, why not use
> > > dsocaps? That is, some magic in the vDSO, which glibc already supports on
> > > all machines where it uses the vDSO. (For how it works, see the use in
> > > arch/x86/vdso/vdso32/note.S for CONFIG_XEN.)
> >
> > Well, we use strings to represent the platforms already (ie, the actual
> > CPU microarchitecture). Fitting those into bits would be annoying, it
> > makes sense to have AT_BASE_PLATFORM to be the "base" variant of
> > AT_PLATFORM.
> >
> > _However_ there is a bug in that this patch adds an entry without
> > bumping the number of entries in the cached array (ie.
> > AT_VECTOR_SIZE_BASE needs to be updated).
>
> Ugh, yes. I was hoping to work this in such a way that AT_VECTOR_SIZE
> (and thus the size of mm_struct) increases only for architectures that
> implement AT_BASE_PLATFORM... would it be wrong to account for it in
> AT_VECTOR_SIZE_ARCH?

Yes. The latter is for things added from ARCH_DLINFO. Since the
code for AT_BASE_PLATFORM is in the generic binfmt_elf, it would
be asking for trouble to not account for it in the base AT_VECTOR_SIZE.

Ben.
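
The accounting problem can be sketched in isolation. The following is a
toy model, not kernel code: the DEMO_* constants, struct demo_auxv and
demo_new_aux_ent() are invented for the example and the sizes are
arbitrary. It assumes the saved vector holds two words per entry plus one
terminating entry, so every entry the generic code can emit has to be
counted in the base size.

```c
#include <stddef.h>

/* Illustrative values only; they are not the kernel's. */
#define DEMO_VECTOR_SIZE_ARCH 1
#define DEMO_VECTOR_SIZE_BASE 3
/* Two words per entry, plus one terminating entry. */
#define DEMO_VECTOR_SIZE \
	(2 * (DEMO_VECTOR_SIZE_ARCH + DEMO_VECTOR_SIZE_BASE + 1))

struct demo_auxv {
	unsigned long words[DEMO_VECTOR_SIZE];
	size_t used;	/* number of words written so far */
};

/*
 * Append an (id, value) pair while keeping two words free for the
 * terminator. Returns 0 on success, -1 if the array is already full.
 */
int demo_new_aux_ent(struct demo_auxv *v, unsigned long id, unsigned long val)
{
	if (v->used + 2 > DEMO_VECTOR_SIZE - 2)
		return -1;
	v->words[v->used++] = id;
	v->words[v->used++] = val;
	return 0;
}
```

With these toy sizes the fifth entry is refused. Code that appends entries
without such a check would instead write past the end of the array, which
is why the count needs to be bumped whenever a new entry is added.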

2008-07-08 00:31:38

by Roland McGrath

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

> > The kernel does not have to come from the same place as the root
> > filesystem. You may want to run a new kernel with an old filesystem, or
> > vice-versa.

Well, it's not like these bits are really going to change in practice. My
point was just that this is a far "softer" ABI than the general kernel-user
contract. Sorry if my being precise about things gives you indigestion.

> I agree, I'm pretty dubious here...

Dubious about whether the dsocaps bit assignments are "part of the ABI"?
Fine. Let's talk again when you've used up 32 bits and want to figure out
what to do next.

Dubious about whether dsocaps is the right thing to do? I think you are
overlooking what the actual kernel-user compatibility reality is here.

Firstly, what is the "risk" in the "gone wrong" case? The risk is that a
DSO load via ld.so.cache will overlook a /lib/power99/foo.so match and get
a /lib/foo.so match instead because ldconfig doesn't know about "power99".
If foo.so wasn't in ld.so.cache at all, there is no problem.
If you used LD_LIBRARY_PATH, there is no problem.
If you used dlopen with an explicit file name (has /), there is no problem.

What happens if you boot a kernel that uses dsocaps with the new string
"power99", but you are missing the ld.so.conf.d file to match your kernel?
Then a DSO load via ld.so.cache will overlook "power99" matches.

How do you fix it?
Install the right (tiny text) file, and run ldconfig.

What happens now if you are using a new kernel that supplies a new string
"power99" in AT_PLATFORM or another auxv element used the same way, but
with an old root filesystem (say one including any glibc that exists today)?
Then a DSO load via ld.so.cache will overlook "power99" matches.

How do you fix it?
You add a bit assignment in the glibc sources, recompile glibc,
install a new whole glibc package. (Conceivably if you are extremely
careful you can manage to redo an otherwise completely identical
build to the glibc on your old system and replace only ld{64,}.so.1
and ldconfig.) Then run the new ldconfig.

In short, if you use a root filesystem from before kernels started using
the new string, then you will degenerate to default-platform library
matches from loads via ld.so.cache (i.e. /lib/foo.so, not /lib/somecpu/foo.so).
If you want to do better than that for this case, it's intractable using
AT_PLATFORM, and simple using dsocaps (probably simpler than booting a
special kernel was).

I haven't figured out what in this old-vs-new picture you think AT_PLATFORM
or something else like it would ever buy you.


Thanks,
Roland

2008-07-08 00:54:18

by Benjamin Herrenschmidt

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

Adding Steve to the CC list as I'd like his input from the
glibc/powerpc side as he's the requester of that feature in the first
place.

Steve: Roland is proposing to use dsocaps instead of AT_BASE_PLATFORM.

Cheers,
Ben.

On Mon, 2008-07-07 at 17:31 -0700, Roland McGrath wrote:
> > > The kernel does not have to come from the same place as the root
> > > filesystem. You may want to run a new kernel with an old filesystem, or
> > > vice-versa.
>
> Well, it's not like these bits are really going to change in practice. My
> point was just that this is a far "softer" ABI than the general kernel-user
> contract. Sorry if my being precise about things gives you indigestion.
>
> > I agree, I'm pretty dubious here...
>
> Dubious about whether the dsocaps bit assignments are "part of the ABI"?
> Fine. Let's talk again when you've used up 32 bits and want to figure out
> what to do next.
>
> Dubious about whether dsocaps is the right thing to do? I think you are
> overlooking what the actual kernel-user compatibility reality is here.
>
> Firstly, what is the "risk" in the "gone wrong" case? The risk is that a
> DSO load via ld.so.cache will overlook a /lib/power99/foo.so match and get
> a /lib/foo.so match instead because ldconfig doesn't know about "power99".
> If foo.so wasn't in ld.so.cache at all, there is no problem.
> If you used LD_LIBRARY_PATH, there is no problem.
> If you used dlopen with an explicit file name (has /), there is no problem.
>
> What happens if you boot a kernel that uses dsocaps with the new string
> "power99", but you are missing the ld.so.conf.d file to match your kernel?
> Then a DSO load via ld.so.cache will overlook "power99" matches.
>
> How do you fix it?
> Install the right (tiny text) file, and run ldconfig.
>
> What happens now if you are using a new kernel that supplies a new string
> "power99" in AT_PLATFORM or another auxv element used the same way, but
> with an old root filesystem (say one including any glibc that exists today)?
> Then a DSO load via ld.so.cache will overlook "power99" matches.
>
> How do you fix it?
> You add a bit assignment in the glibc sources, recompile glibc,
> install a new whole glibc package. (Conceivably if you are extremely
> careful you can manage to redo an otherwise completely identical
> build to the glibc on your old system and replace only ld{64,}.so.1
> and ldconfig.) Then run the new ldconfig.
>
> In short, if you use a root filesystem from before kernels started using
> the new string, then you will degenerate to default-platform library
> matches from loads via ld.so.cache (i.e. /lib/foo.so, not /lib/somecpu/foo.so).
> If you want to do better than that for this case, it's intractable using
> AT_PLATFORM, and simple using dsocaps (probably simpler than booting a
> special kernel was).
>
> I haven't figured out what in this old-vs-new picture you think AT_PLATFORM
> or something else like it would ever buy you.
>
>
> Thanks,
> Roland

2008-07-08 18:32:22

by Steven Munroe

Subject: Re: [PATCH 1/2] elf loader support for auxvec base platform string

On Tue, 2008-07-08 at 10:48 +1000, Benjamin Herrenschmidt wrote:
> Adding Steve to the CC list as I'd like his input from the
> glibc/powerpc side as he's the requester of that feature in the first
> place.
>
> Steve: Roland is proposing to use dsocaps instead of AT_BASE_PLATFORM.
>

I am willing to discuss better solutions with Roland. It seems like I am
finally on the air for linuxppc-dev but it seems some of my earlier
notes got lost.

So I will restate. AT_BASE_PLATFORM is a proposed solution to several
problems, including CPU-tuned library selection. If dsocaps is a better
solution for library selection I am happy to consider and discuss this.

However, it is not clear that dsocaps is a solution to all the
requirements we need to address for virtualization and partition
migration of applications. This requires a durable and public API
accessible from any application or library.

First the problem:

We want to support migration of running partitions (including the kernel
and all running applications), and we have to deal with mixed-platform
clusters. If we want to migrate freely between POWER5+ and POWER6 (or
POWER7) systems then we need to make sure the application and its
libraries restrict themselves to the lowest ISA Version level (2.04 in
this case).

So the hardware and hypervisor support and enforce CPU compatibility
modes. For example, a partition is created on a POWER6 to run in POWER5+
mode. There are HID bits set to restrict the instruction set to the
POWER5+ subset. So running a program that uses new POWER6 instructions
on this partition will SIGILL.

So while this is really a POWER6 machine it is wrong for the kernel to
return AT_PLATFORM=power6. The /lib/power6/libc.so and libm.so do use
the new ISA V2.05 instructions that will SIGILL in this (POWER5+
compatible) partition.

In this case the kernel should return AT_PLATFORM=power5+
because /lib/power5+/libc.so is built --with-cpu=power5+ and only uses
the ISA V2.04 instructions.

But that introduces some new problems. The processor, internal pipeline
(micro-architecture), and performance monitor unit (PMU events have to
match the pipeline structure) have not changed (still POWER6/7). This
has implications for application performance and many performance tools.

For example, oProfile/PAPI/libpfm need to know what the processor really
is, because misprogramming the PMU gives bogus results or can even crash
the system. Another example is a JVM/JIT compiler, which needs to know
what the supported ISA level is (from AT_PLATFORM and AT_HWCAP) but can
generate better code if it knows that the base platform is different and
what the actual micro-architecture is. For these examples the
AT_PLATFORM/AT_HWCAP based library selection mechanism does not apply.
And except for oProfile, these examples are user mode
applications/libraries that need this information from a simple,
durable, and public API. To me AT_BASE_PLATFORM seems like the minimal,
simplest, and most general solution to these problems.
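
As a rough sketch of the JIT case, the two strings would play separate
roles: one bounds the instructions that may be emitted, the other only
guides scheduling. The struct and function names below are hypothetical
and do not come from any real JVM.

```c
#include <stddef.h>

/* Hypothetical description of what a code generator should target. */
struct codegen_target {
	const char *isa;	/* instruction set the generator may emit */
	const char *tune;	/* micro-architecture to schedule for */
};

/*
 * The ISA always follows AT_PLATFORM, so nothing is emitted that the
 * partition would reject. Tuning follows AT_BASE_PLATFORM when the
 * kernel provides it and otherwise falls back to AT_PLATFORM.
 */
struct codegen_target pick_target(const char *platform,
				  const char *base_platform)
{
	struct codegen_target t;

	t.isa = platform;
	t.tune = base_platform ? base_platform : platform;
	return t;
}
```

For the compatibility-mode partition discussed here, this would yield
isa "power5+" with tune "power6".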

Ok, now back to library selection and dsocaps. Running power5+ libraries
on a power6 will execute (will not SIGILL) but may not be optimal. The
best performance also requires careful instruction selection and
scheduling. For example, the performance of memset/memcpy/memcmp depends
on tuning to the detailed timing of the Load/Store pipelines, Store Queue
depth, and L2 cache clocking. This can be very different between
processor generations.

For these power5+ compatible partitions, we would like the option to
build libraries with -mcpu=power5+ -mtune=power6, etc. The details of
how this will work are TBD. I put forth AT_BASE_PLATFORM with the thought
that it could be a search modifier in addition to AT_PLATFORM
(i.e. /lib/power5+/power6/libc.so).

If dsocaps is a better mechanism for library selection I am more than
willing to discuss how dsocaps works and how it can be applied to this
specific case.
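
Since the details are TBD, the following is only one possible reading of
the "search modifier" idea, written out as a sketch. The function name
build_candidates(), the fixed buffer size and the most-specific-first
ordering are all assumptions; nothing here reflects how ld.so actually
searches.

```c
#include <stdio.h>
#include <string.h>

#define CAND_LEN 256

/*
 * Fill out[] with candidate paths, most specific first, and return how
 * many were written (at most three). The nested directory is only tried
 * when the base platform differs from the platform.
 */
int build_candidates(char out[3][CAND_LEN], const char *platform,
		     const char *base, const char *lib)
{
	int n = 0;

	if (platform && base && strcmp(platform, base) != 0)
		snprintf(out[n++], CAND_LEN, "/lib/%s/%s/%s",
			 platform, base, lib);
	if (platform)
		snprintf(out[n++], CAND_LEN, "/lib/%s/%s", platform, lib);
	snprintf(out[n++], CAND_LEN, "/lib/%s", lib);
	return n;
}
```

With platform "power5+" and base "power6" the first candidate is
/lib/power5+/power6/libc.so, followed by /lib/power5+/libc.so and then
/lib/libc.so.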