LinuxLists.cc - CET shadow stack app compatibility

2022-11-15 00:09:28

Subject: CET shadow stack app compatibility

Hi Linus,

Could you weigh in on some brewing ecosystem compatibility issues
around x86 CET[0] shadow stacks? This is the CPU security feature that
keeps a separate protected stack to record return addresses, and
verifies them on return. Support for this feature is not upstream in
the kernel and so the issues discussed here are future problems that
have not happened yet.

The issues all have a root cause of support for CET in tools spreading
widely while kernel support was still in development. This has led to:
1. Some existing binaries (node.js, PyPy, CRIU) that will break when
glibc updates to use the kernel CET APIs.
2. GCC C++ exception stack unwinding code expecting old development
versions of the kernel ABI.

On the first issue, once there is kernel support, glibc plans to
immediately update in such a way that some existing distro binaries
will break against it. So the scenario is existing distro binaries
being used with future versions of glibc. The known extent of breakage
is limited to some packages of node.js and PyPy, and any version of
CRIU, but it’s reasonable to assume that there are undetected breakages
based on how it came about.

The breakage derives from how the decision is made on whether to enable
shadow stack enforcement. Glibc will do this by checking a bit in the
elf header of the binary. It then tells the kernel to turn CET on via a
separate kernel API. But instead of this elf bit being selected by
application developers, it was mostly applied in various automated ways
(mostly default on) by distro builds for years. This huge amount of
untested enablement has not generated any visible issues for users yet,
because without kernel support the presence of this bit has not
generated any actual CET enforcement.
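
As a rough illustration of the marker check described above: the bit in
question is, as far as I know, the GNU_PROPERTY_X86_FEATURE_1_SHSTK flag
stored in the binary's .note.gnu.property note rather than in the ELF
file header proper. The Python sketch below decodes one such note and
reports the feature bits. It assumes a 64-bit little-endian note with
8-byte property alignment; make_note() is a hypothetical helper that only
builds sample input, and none of this is glibc's actual loader code.

```python
import struct

# Constants from the x86 psABI / <elf.h>.
NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
GNU_PROPERTY_X86_FEATURE_1_IBT = 0x1
GNU_PROPERTY_X86_FEATURE_1_SHSTK = 0x2


def x86_feature_1(note: bytes) -> int:
    """Return the X86_FEATURE_1_AND bitmask of one GNU property note.

    Assumes a 64-bit little-endian note; returns 0 if the note is not a
    GNU property note or carries no X86_FEATURE_1_AND property.
    """
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    if ntype != NT_GNU_PROPERTY_TYPE_0 or namesz != 4 or note[12:16] != b"GNU\0":
        return 0
    desc = note[16:16 + descsz]
    pos = 0
    while pos + 8 <= len(desc):
        pr_type, pr_datasz = struct.unpack_from("<II", desc, pos)
        data = desc[pos + 8:pos + 8 + pr_datasz]
        if pr_type == GNU_PROPERTY_X86_FEATURE_1_AND and len(data) >= 4:
            return struct.unpack_from("<I", data)[0]
        # Each property is padded to an 8-byte boundary on 64-bit targets.
        pos += 8 + ((pr_datasz + 7) & ~7)
    return 0


def has_shstk(note: bytes) -> bool:
    """True if the note marks the object as shadow stack compatible."""
    return bool(x86_feature_1(note) & GNU_PROPERTY_X86_FEATURE_1_SHSTK)


def make_note(bits: int) -> bytes:
    """Hypothetical helper: build a sample note carrying the given bits."""
    prop = struct.pack("<III", GNU_PROPERTY_X86_FEATURE_1_AND, 4, bits) + b"\0" * 4
    header = struct.pack("<III", 4, len(prop), NT_GNU_PROPERTY_TYPE_0)
    return header + b"GNU\0" + prop
```

For example, has_shstk(make_note(GNU_PROPERTY_X86_FEATURE_1_SHSTK)) is
true, while a note carrying only the IBT bit is reported as not marked
for shadow stack. On a real binary the note bytes would come from the
PT_GNU_PROPERTY segment or the .note.gnu.property section.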

In some ways it is a variation of past compatibility problems around
distros overriding package defaults for compiler hardening. But the
difference is that the kernel support is involved in doing the
enforcement in this case, leading to the issues going undetected.

For the second issue, there are also problems lurking in gcc. The gcc
CET support has preceded the kernel changes and the unwinding code
assumes things about the kernel shadow stack signal frame ABI that have
changed over the course of CET kernel development. It is compatible by
luck for now, but old GCCs that apply the existing elf bit (going back
to gcc-8) can generate future binaries that would constrain the shadow
stack signal frame from expanding, which there are already plans to do.

I would like to make this go smoother all around by having the kernel
detect the existing elf bit and refuse to enable CET for these
applications, like this[1]. Then the binaries derived from the pre-
kernel support era would just continue to run normally without CET
enforcement. The intention would be to force tools to pick a new elf
bit to denote compatibility for this feature. With a tools reset, this
time the upstream kernel would have shadow stack support ahead of tools
and so any issues would likely show up earlier.

The best place to exclude the old binaries from shadow stack support
would be in the glibc loader, but developers of that (on CC) are
against creating new CET elf bits. So the kernel would be taking a
stand here and would essentially burn this bit from the kernel side.

Are you generally ok with the kernel reaching out and getting involved
in this shadow stack enablement decision like this?

Thanks,

Rick

[0] https://lwn.net/Articles/885220/
[1] https://lore.kernel.org/lkml/[email protected]/

2022-11-15 02:28:40

by Linus Torvalds

Subject: Re: CET shadow stack app compatibility

On Mon, Nov 14, 2022 at 3:15 PM Edgecombe, Rick P
<[email protected]> wrote:
>
> I would like to make this go smoother all around by having the kernel
> detect the existing elf bit and refuse to enable CET for these
> applications, like this[1].

Honestly, I don't want to preemptively say "this won't work".

That said, once CET is enabled in the kernel, and it turns out that
people complain that it breaks existing binaries, at that point I
guess it gets disabled again. Possibly at that point using something
like your suggested patch. But I'm not doing it until actual problems
appear, and until we actually have this code in the kernel.

I'm disgusted by glibc being willing to just upgrade and break
existing binaries and take the "you shouldn't upgrade glibc if you
have old binaries" approach.

But hey, I guess that's par for the course for glibc, and there's
nothing I can do about that.

But yes, once people complain, I'll just make sure that old binaries
continue to work, and at that point the glibc and tooling people will
presumably have to fix their broken situation to get CET at all.

Because no, the kernel doesn't enable CET if it breaks binaries.
That's how we roll.

Linus

2022-11-15 08:06:07

by Florian Weimer

Subject: Re: CET shadow stack app compatibility

* Linus Torvalds:

> I'm disgusted by glibc being willing to just upgrade and break
> existing binaries and take the "you shouldn't upgrade glibc if you
> have old binaries" approach.

We've been in this position for years. Every time we use a new system
call to implement existing functionality in glibc, some applications
break. Mostly due to seccomp filters. They break even if there would
be no observable differences for applications in the way the new system
calls would be invoked if the seccomp filter wouldn't block them.
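
The failure mode described above can be sketched roughly as follows. A
libc wrapper typically tries the new system call and falls back to the
old one only when the kernel reports ENOSYS. The sketch assumes, purely
for illustration, a seccomp filter that rejects unknown system calls with
EPERM, so the fallback never triggers. All function names here are
hypothetical stand-ins, not glibc or kernel interfaces.

```python
import errno


def call_with_fallback(new_call, old_call):
    """Try new_call; fall back to old_call only if it fails with ENOSYS."""
    try:
        return new_call()
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # The kernel does not know the new system call: use the old one.
            return old_call()
        # Any other error (such as EPERM from a filter) reaches the caller.
        raise


def old_syscall():
    """Stand-in for the long-established system call."""
    return "old path"


def new_syscall_on_old_kernel():
    """Stand-in for a new system call on a kernel that lacks it."""
    raise OSError(errno.ENOSYS, "Function not implemented")


def new_syscall_under_seccomp():
    """Stand-in for a new system call blocked by an (assumed) EPERM filter."""
    raise OSError(errno.EPERM, "Operation not permitted")
```

With new_syscall_on_old_kernel the wrapper quietly returns the result of
old_syscall. With new_syscall_under_seccomp the EPERM error propagates to
the application, even though the old system call would have worked.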

I proposed a new ENOSYS handshake between userspace and kernel to reduce
the amount of breakage (but not all of it). Senior kernel developers
rejected it, so we didn't implement it in glibc.

[PATCH] syscalls: Document OCI seccomp filter interactions & workaround
<https://lore.kernel.org/linux-api/[email protected]/>

(It deals with OCI because it's well-documented, but the same principle
would have applied to browser sandboxes, too.)

Instead, we work with distributions and upstreams to make sure the
applications are ready before the next distribution glibc update.
Fortunately, there seems to be a pretty broad overlap between
seccomp-using applications and applications with frequent, more-or-less
mandatory updates, so the transition periods are relatively short. You
didn't seem to have noticed, so maybe we aren't doing such a bad job
after all.

I don't see why CET or x86 shadow stack support could not be handled in
the same way. (There is probably a similar overlap.) At least we
should try how far we can get with the existing binaries, and if things
turn out not working after all, we will have to start over with
different markers. But the kernel shouldn't have to care.

Based on what we have seen so far (and since fixed), it's mostly shared
objects that weren't marked up correctly. The posted hack didn't even
deal with that case. If the main executable has the current markers,
the kernel will not disable shadow stack, and the process will still
crash after loading the incorrectly marked shared object. Someone has
to step in and fix things for real (so that they don't break again just
after rebuild with a current toolchain adding the current markers). The
kernel patch makes this harder because it's not possible anymore to use
an existing distribution for this kind of work. Instead, we'd have to
wait for a rebuild with the new markers, and of course this rebuild will
put us in exactly the same position as before: the incompatibilities
will be back because they are no longer masked by the kernel.

Fortunately, we are in a way better situation on x86 than where we are
with PAC on AArch64: there you have to reboot with a custom kernel
option to disable PAC and restore compatibility with applications. (As
far as I know, PAC state isn't process-switched, which I find rather
flabbergasting.) Furthermore, the way it was deployed in application
and libraries was largely unconditional (hard-coded into hand-written
assembly, without preprocessor conditionals to see if PAC was enabled
during the build). At least the presence of CET features depends on CET
compiler flags, and we can easily turn it off on a per-process basis if
there are any incompatibilities.

Thanks,
Florian

2022-11-15 09:55:56

by Peter Zijlstra

Subject: Re: CET shadow stack app compatibility

Let me hijack this and go off on a tangent..

On Mon, Nov 14, 2022 at 11:15:44PM +0000, Edgecombe, Rick P wrote:

> The breakage derives from how the decision is made on whether to enable
> shadow stack enforcement. Glibc will do this by checking a bit in the
> elf header of the binary. It then tells the kernel to turn CET on via a
> separate kernel API. But instead of this elf bit being selected by
> application developers, it was mostly applied in various automated ways
> (mostly default on) by distro builds for years. This huge amount of
> untested enablement has not generated any visible issues for users yet,
> because without kernel support the presence of this bit has not
> generated any actual CET enforcement.

CET is two things, ideally we'd fully eradicate the term CET, never
again mention CET, ever. Whoever at Intel decided to push that term has
created so much confusion it's not funny :/

The feature at hand here is backward edge control flow -- or shadow
stacks (the means to implement this). Be explicit about this, do *NOT*
use CET ever again.

The other thing CET has is forward edge control flow -- or indirect
branch tracking, this is a completely different and independent feature
and not advertised or implemented here.

These things are obviously related, but since they're two independent
features there's the endless confusion as to which is actually meant.

(go (re)watch the last plumbers conf talks on the subject -- there's
always someone who gets it wrong)

The only things that should have CET in their name are the CR4 bit and
the two MSRs, nothing more.
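
To make the independence of the two features concrete: the CPU reports
them through separate CPUID bits in leaf 7, subleaf 0 (shadow stack in
ECX bit 7, indirect branch tracking in EDX bit 20). The Python sketch
below only decodes register values passed in as integers; it does not
execute CPUID itself, and the function and key names are illustrative
choices rather than kernel identifiers.

```python
# CPUID.(EAX=7, ECX=0) feature bits for the two CET components.
CPUID7_ECX_SHSTK = 1 << 7   # shadow stack (backward edge)
CPUID7_EDX_IBT = 1 << 20    # indirect branch tracking (forward edge)


def cet_features(ecx: int, edx: int) -> dict:
    """Decode shadow stack and IBT support from CPUID leaf 7 registers."""
    return {
        "shadow_stack": bool(ecx & CPUID7_ECX_SHSTK),
        "ibt": bool(edx & CPUID7_EDX_IBT),
    }
```

A CPU can report either bit without the other, which is why "CET support"
on its own does not say which feature is meant.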

ELF bits should not, must not, be called CET. API, not CET. Compiler
features, also not CET.

(and I know it's too late to eradicate some of it, but please, at least
make sure the kernel doesn't propagate this nonsense).

2022-11-15 17:20:23

by Edgecombe, Rick P

Subject: Re: CET shadow stack app compatibility

On Tue, 2022-11-15 at 08:33 +0100, Florian Weimer wrote:
> Based on what we have seen so far (and since fixed), it's mostly
> shared
> objects that weren't marked up correctly.

For the benefit of anyone that is not involved in CET... As PeterZ was
just discussing, "CET" consists of two mostly independent features:
"IBT" and "Shadow Stack". I am currently trying to enable userspace
shadow stack in the kernel. No IBT enforcement will happen in userspace
for the time being.

For IBT, which seems to be in worse shape than shadow stack from an
existing userspace perspective, I have also seen shared objects with
issues.

For shadow stack, it was just JITing binaries. Of course if glibc is
compiled in non-permissive mode there is an additional category of
issues around dlopen()ing that we haven't even discussed yet. And the
past issues around makecontext() we have already worked around from the
kernel. If you are aware of any other specific compatibility problems,
please share so we can discuss the extent.

> The posted hack didn't even
> deal with that case. If the main executable has the current markers,
> the kernel will not disable shadow stack, and the process will still
> crash after loading the incorrectly marked shared object.

The proposed glibc changes would not enable shadow stack unless the
execing binary has the elf bit marked. So if we block those binaries
(which the kernel can easily check) from enabling shadow stack, none of
the linked shared objects will have shadow stack either. So I think we
are ok to hold this in our back pocket to resolve the known issues if
anyone complains.

Where the shared objects could come into play is, in the event that we
have to block the old elf bit from the kernel, and a new one is
properly marked on a new executable, future glibcs could decide to
honor the old bits when checking shared libraries. So you could have an
executable with SHSTK2 bit loading a problem SO with just SHSTK1 bit.

It would indeed be more difficult for the kernel to detect this,
especially in the dlopen() case, but it should not prevent simply
blocking any day 1 kernel support binaries. Please, please, don't do
this in the future if it comes up though. If the kernel can't find any
good options, it risks shadow stack getting reverted for everyone.

> Someone has
> to step in and fix things for real (so that they don't break again
> just
> after rebuild with a current toolchain adding the current markers).
> The
> kernel patch makes this harder because it's not possible anymore to
> use
> an existing distribution for this kind of work.

There was an EXPERT config for things like this, and I was mulling a
runtime sysctl. But I think now the idea is that the patch could serve
a "better than a full revert" purpose. Not an ideal solution.

But I still don't see why doing the order:
1. kernel support
2. libc support
3. compiler support

...wouldn't have generated a more normal situation where old binaries
don't break against new kernels and testing can easily happen to reduce
issues further. So we could still reset and do exactly that.

> Instead, we'd have to
> wait for a rebuild with the new markers, and of course this rebuild
> will
> put us in exactly the same position as before: the incompatibilities
> will be back because they are no longer masked by the kernel.

People building new apps and testing them against upstream kernels and
finding issues sounds like business as usual. I'm not trying to solve
all possible userspace mistakes from the kernel.

2022-11-15 17:20:57

by Edgecombe, Rick P

Subject: Re: CET shadow stack app compatibility

On Tue, 2022-11-15 at 10:43 +0100, Peter Zijlstra wrote:
> CET is two things, ideally we'd fully eradicate the term CET, never
> again mention CET, ever. Whoever at Intel decided to push that term
> has
> created so much confusion it's not funny :/
>
> The feature at hand here is backward edge control flow -- or shadow
> stacks (the means to implement this). Be explicit about this, do
> *NOT*
> use CET ever again.
>
> The other thing CET has is forward edge control flow -- or indirect
> branch tracking, this is a completely different and independent
> feature
> and not advertised or implemented here.
>
> These things are obviously related, but since they're two independent
> features there's the endless confusion as to which is actually meant.
>
> (go (re)watch the last plumbers conf talks on the subject -- there's
> always someone who gets it wrong)
>
> The only things that should have CET in their name are the CR4 bit
> and
> the two MSRs, nothing more.

The only other place in the kernel where it has to be that way is the
"control protection" fault handler.

I agree it's confusing, but when you talk about "shadow stacks", a lot
of people don't connect it to the HW feature. Whereas they have heard
of CET. So for contexts like this, I thought it was useful to jog
memories. I could put more distance between it... "x86 shadow stacks
(you may have heard of CET)".

>
> ELF bits should not, must not, be called CET. API, not CET, Compiler
> features, also not CET.

So the arch_prctl()s can't be shared between shadow stack and IBT? They
don't have to be, but this is a new thing after a fair amount of
earlier discussion.

>
> (and I know it's too late to eradicate some of it, but please, at
> least
> make sure the kernel doesn't propagate this nonsense).

2022-11-15 20:19:56

by Peter Zijlstra

Subject: Re: CET shadow stack app compatibility

On Tue, Nov 15, 2022 at 05:04:40PM +0000, Edgecombe, Rick P wrote:
> > ELF bits should not, must not, be called CET. API, not CET, Compiler
> > features, also not CET.
>
> So the arch_prctl()s can't be shared between shadow stack and IBT? They
> don't have to be, but this is a new thing after a fair amount of
> earlier discussion.

I would very strongly suggest IBT not use that interface and instead we
follow ARM64 BTI's lead -- such that application developers don't go
insane trying to use two nearly identical solutions.

I mean, the toolchain folks made a godawful mess of things, but we
don't have to.

2022-12-02 19:47:22

by Florian Weimer

Subject: Re: CET shadow stack app compatibility

* Rick P. Edgecombe:

> For IBT, which seems to be in worse shape than shadow stack from an
> existing userspace perspective, I have also seen shared objects with
> issues.
>
> For shadow stack, it was just JITing binaries.

Except that the actual JITters are usually in shared objects, too, and
you just assume here that they get loaded by a main program from the
same build. 8-) I think most of them are reusable independently, or are
bundled into applications built with a different toolchain.

> Of course if glibc is compiled in non-permissive mode there is an
> additional category of issues around dlopen()ing that we haven't even
> discussed yet. And the past issues around makecontext() we have
> already worked around from the kernel. If you are aware of any other
> specific compatibility problems, please share so we can discuss the
> extent.

H.J. ran most of the experiments on Fedora. We did some early
validation many years ago, using the first ABI iteration. We didn't
have as much reach as we liked in terms of hardening at the time, if I
recall correctly, but there were only very few cases where something did
not work and was also not marked as incompatible.

>> The posted hack didn't even
>> deal with that case. If the main executable has the current markers,
>> the kernel will not disable shadow stack, and the process will still
>> crash after loading the incorrectly marked shared object.
>
> The proposed glibc changes would not enable shadow stack unless the
> execing binary has the elf bit marked. So if we block those binaries
> (which the kernel can easily check) from enabling shadow stack, none of
> the linked shared objects will have shadow stack either. So I think we
> are ok to hold this in our back pocket to resolve the known issues if
> anyone complains.

See above, the assumption that the JITter and the main program come from
the same build that is implicit in this is not actually true in
practice.

> Where the shared objects could come into play is, in the event that we
> have to block the old elf bit from the kernel, and a new one is
> properly marked on a new executable, future glibcs could decide to
> honor the old bits when checking shared libraries. So you could have an
> executable with SHSTK2 bit loading a problem SO with just SHSTK1 bit.

Right. But we can also have policies in userspace to paper over this.
I'm not worried about it. I want to see how far we can get before
making the flip in an upstream version of glibc, but if the kernel
enforces SHSTK2 (even just on executables), I need a toolchain update
plus a rebuild of a large chunk of the distribution.

So with reusing SHSTK1 markup, it goes like this:

1. Get a Fedora rawhide kernel with userspace SHSTK support.
2. Get the glibc patches from H.J., and gate them behind a tunable
   (off by default). Kernel behavior should not change with this
   new glibc because the required arch_prctl does not happen
   (and the old ones currently in glibc have different numbers).
3. Run the Fedora graphical desktop with the tunable switched on and
   a few key third-party applications to see where we stand in terms
   of compatibility.
3b. Do the same thing with RHEL and some enterprise applications
   (using the kernel and glibc from 1 & 2 for a start).
4. (Optional.) Flip the default of the tunable to on.

I don't know how quickly we can get past step 1, but it seems fairly
soon, maybe three months, considering the upcoming end-of-year break.

With SHSTK2 markup required by the kernel, it goes like this:

1. Get a Fedora rawhide kernel with userspace SHSTK support.
2. Get a SHSTK2-enabled toolchain. GCC is currently freezing for the 13
   release, so this is not a good time of the year for that. It's
   probably going to be a custom compiler, unless we want to wait a
   couple of months, and even then it's got to be a downstream-only
   backport at first because to upstream, this will have a “not
   finished” whiff (it's the umpteenth ABI change).
3. Get the glibc patches from H.J. We would probably put it behind
   a tunable as well.
4. Rebuild key parts of Fedora, probably directly in rawhide (the
   rolling integration distribution).
5. Run the Fedora rawhide graphical desktop etc.
6. RHEL testing will require a SHSTK2 port to a different compiler
   and another mass rebuild. ISV application testing is not meaningful
   until the ISVs have switched to a newer compiler.

That's going to take much longer than three months. Maybe we have to do
this in the end, but even then, we have no way of forcing developers to
test on SHSTK-capable hardware on new-enough before turning on the
SHSTK2 bit.

In the end, we might still need SHSTK2, but we don't know that yet, and
the first approach is quite cheap, so I really want to try it.

Keep in mind that just because some useful interface is provided by the
kernel, we can't necessarily use it in glibc immediately because with
all those seccomp filters out there (and other dependencies on internal
glibc/kernel interface details), too much would break if we exposed it
into existing applications without some coordination. SHSTK isn't
*that* different, except that we have some binary markup to guide us at
run time.

> But I still don't see why doing the order:
> 1. kernel support
> 2. libc support
> 3. compiler support
>
> ...wouldn't have generated a more normal situation where old binaries
> don't break against new kernels and testing can easily happen to reduce
> issues further. So we could still reset and do exactly that.

No matter in which order you do it, some group will want to change ABI
or semantics. We actually had multiple different iterations in
different orders, and everybody wanted to put their mark onto this
feature, changing the ABI. I don't care at all about the internal ABI
between glibc and the kernel, but the markup of the binaries (besides
glibc itself) is quite important to me.

In retrospect, separating SHSTK from IBT from the start would have
helped a lot because I think we could have done that in libc without
compiler support. But I don't think anyone expected this to take four
to five years to implement (or probably longer for IBT).

>> Instead, we'd have to
>> wait for a rebuild with the new markers, and of course this rebuild
>> will
>> put us in exactly the same position as before: the incompatibilities
>> will be back because they are no longer masked by the kernel.
>
> People building new apps and testing them against upstream kernels and
> finding issues sounds like business as usual. I'm not trying to solve
> all possible userspace mistakes from the kernel.

They also have to test on the right hardware and with a new/unreleased
glibc.

I think it would be helpful to those developers if we could give them an
existing distribution early on they can use for experiments. Not just
getting SHSTK going, but also playing with the perf integration (which
to me is the real goal here).

Thanks,
Florian

2022-12-05 20:00:10

by Edgecombe, Rick P

Subject: Re: CET shadow stack app compatibility

On Fri, 2022-12-02 at 19:48 +0100, Florian Weimer wrote:
> * Rick P. Edgecombe:
>
> > For IBT, which seems to be in worse shape than shadow stack from an
> > existing userspace perspective, I have also seen shared objects
> > with
> > issues.
> >
> > For shadow stack, it was just JITing binaries.
>
> Except that the actual JITters are usually in shared objects, too,
> and
> you just assume here that they get loaded by a main program from the
> same build. 8-) I think most of them are reusable independently, or
> are
> bundled into applications built with a different toolchain.

So I guess the situation must be a SHSTK2 binary dlopen()s a
broken SHSTK1 DSO (broken because of JITing or whatever) using a future
version of glibc. It would depend on how the future implementation of
SHSTK2 in glibc would handle this. I can only hope glibc would do the
right thing to avoid whatever situation caused the creation of SHSTK2.

If the scenario is SHSTK1 binary dlopen()s a broken SHSTK1 DSO, it
would already not have shadow stack because SHSTK1 was blocked from
getting shadow stack enabled.
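
The two scenarios above can be written down as a small decision model.
This is only a sketch under stated assumptions: "SHSTK1" and "SHSTK2" are
the shorthand marker names used in this thread, the two boolean policy
knobs are hypothetical, and a DSO whose marker is not honored is assumed
to cause shadow stack to be switched off (permissive behavior) rather
than the dlopen() being refused.

```python
def exe_gets_shadow_stack(exe_marker, kernel_blocks_shstk1):
    """Would shadow stack be enabled at exec time for this executable?"""
    if exe_marker is None:
        return False
    if exe_marker == "SHSTK1" and kernel_blocks_shstk1:
        # The kernel refuses to enable shadow stack for the old marker.
        return False
    return True


def dso_runs_with_shadow_stack(exe_marker, dso_marker,
                               kernel_blocks_shstk1,
                               glibc_honors_shstk1_on_dso):
    """Would code in a dlopen()ed DSO run with shadow stack enforced?"""
    if not exe_gets_shadow_stack(exe_marker, kernel_blocks_shstk1):
        return False
    if dso_marker == "SHSTK2":
        return True
    if dso_marker == "SHSTK1":
        # Hypothetical future glibc policy: trust the old bit on DSOs or not.
        return glibc_honors_shstk1_on_dso
    return False
```

In this model a broken SHSTK1 DSO only ends up under enforcement when it
is loaded by a SHSTK2 executable and glibc chooses to honor the old bit
on shared objects, which is the case that depends on future glibc
behavior.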

>
> > Of course if glibc is compiled in non-permissive mode there is an
> > additional category of issues around dlopen()ing that we haven't
> > even
> > discussed yet. And the past issues around makecontext() we have
> > already worked around from the kernel. If you are aware of any
> > other
> > specific compatibility problems, please share so we can discuss the
> > extent.
>
> H.J. ran most of the experiments on Fedora. We did some early
> validation many years ago, using the first ABI iteration. We didn't
> have as much reach as we liked in terms of hardening at the time, if
> I
> recall correctly, but there were only very few cases where something
> did
> not work and was also not marked as incompatible.

I think most binaries will work automatically. The problem is the
standard is not "doesn't break *too many* binaries".

>
> > > The posted hack didn't even
> > > deal with that case. If the main executable has the current
> > > markers,
> > > the kernel will not disable shadow stack, and the process will
> > > still
> > > crash after loading the incorrectly marked shared object.
> >
> > The proposed glibc changes would not enable shadow stack unless the
> > execing binary has the elf bit marked. So if we block those
> > binaries
> > (which the kernel can easily check) from enabling shadow stack,
> > none of
> > the linked shared objects will have shadow stack either. So I think
> > we
> > are ok to hold this in our back pocket to resolve the known issues
> > if
> > anyone complains.
>
> See above, the assumption that the JITter and the main program come
> from
> the same build that is implicit in this is not actually true in
> practice.

Hmm, not sure I understand your point. Are you saying that the kernel
can't resolve the found issues by blocking SHSTK1 execing binaries? I
think it can by depending on nice future glibc behavior.

In general, the point that the kernel can't fully stop userspace from
breaking itself is well taken.

>
> > Where the shared objects could come into play is, in the event that
> > we
> > have to block the old elf bit from the kernel, and a new one is
> > properly marked on a new executable, future glibcs could decide to
> > honor the old bits when checking shared libraries. So you could
> > have an
> > executable with SHSTK2 bit loading a problem SO with just SHSTK1
> > bit.
>
> Right. But we can also have policies in userspace to paper over
> this.
> I'm not worried about it. I want to see how far we can get before
> making the flip in an upstream version of glibc, but if the kernel
> enforces SHSTK2 (even just on executables), I need a toolchain update
> plus a rebuild of a large chunk of the distribution.

The existing gcc's assume wrong ABI as well, so it's probably safest to
use an updated toolchain in any case. I wasn't able to find any
binaries that broke because of the GCC issues, but it wasn't an
exhaustive search.

But remember, even that filter patch had a Kconfig to disable it.
Distros with the resources to test everything on SHSTK hardware and
users that don't build their own glibcs could probably minimize the
impact. But smaller distros or users could at least not be surprised or
wait for SHSTK2 to make its way through.

>
> So with reusing SHSTK1 markup, it goes like this:
>
> 1. Get a Fedora rawhide kernel with userspace SHSTK support.
> 2. Get the glibc patches from H.J., and gate them behind a tunable
> (off by default). Kernel behavior should not change with this
> new glibc because the required arch_prctl does not happen
> (and the old ones currently in glibc have different numbers).
> 3. Run the Fedora graphical desktop with the tunable switched on and
> a few key
> third-party applications to see where we stand in terms of
> compatibility.
> 3b Do the same thing with RHEL and some enterprise applications
> (using the kernel and glibc from 1 & 2 for a start).
> 4. (Optional.) Flip the default of the tunable to on.
>
> I don't know how quickly we can get past step 1, but it seems fairly
> soon, maybe three months, considering the upcoming end-of-year break.
>
> With SHSTK2 markup required by the kernel, it goes like this:
>
> 1. Get a Fedora rawhide kernel with userspace SHSTK support.
> 2. Get a SHSTK2-enabled toolchain. GCC is currently freezing for the
> 13
> release, so this is not a good time of the year for that. It's
> probably going to be a custom compiler, unless we want to wait a
> couple of months, and even then it's got to be a downstream-only
> backport at first because to upstream, this will have a “not
> finished” whiff (it's the umpteenth ABI change).
> 3. Get the glibc patches from H.J. We would probably put it behind
> a tunable as well.
> 4. Rebuild key parts of Fedora, probably directly in rawhide (the
> rolling integration distribution).
> 5. Run the Fedora rawhide graphical desktop etc.
> 6. RHEL testing will require a SHSTK2 port to a different compiler
> and another mass rebuild. ISV application testing is not
> meaningful
> until the ISVs have switched to a newer compiler.
>
> That's going to take much longer than three months. Maybe we have to
> do
> this in the end, but even then, we have no way of forcing developers
> to
> test on SHSTK-capable hardware on new-enough before turning on the
> SHSTK2 bit.
>
> In the end, we might still need SHSTK2, but we don't know that yet,
> and
> the first approach is quite cheap, so I really want to try it.

Yes, this is the working plan at this point. I removed the elf header
bit filter in the latest revision. I still personally would favor
starting over with SHSTK2 from the beginning, even if it led to slower
roll out. That would be a feature, not a bug, in my view.

If we do end up needing SHSTK2 though, then it resets the clock and the
rollout is the slowest of the possibilities.

>
> Keep in mind that just because some useful interface is provided by
> the
> kernel, we can't necessarily use it in glibc immediately because with
> all those seccomp filters out there (and other dependencies on
> internal
> glibc/kernel interface details), too much would break if we exposed
> it
> into existing applications without some coordination. SHSTK isn't
> *that* different, except that we have some binary markup to guide us
> at
> run time.

The thing that is rare is that the way that it has been rolled out
restricts existing behavior under the nose of the application
developers AND it depends on kernel/HW support. In the analogy of
forced compiler hardening options, as best I can tell (I'm educating
myself on this history only recently), larger distros started doing
this and found and fixed the issues. Then smaller ones picked it up
after that.

With shadow stack, we seem to be well down this path already because of
the lack of kernel support.

>
> > But I still don't see why doing the order:
> > 1. kernel support
> > 2. libc support
> > 3. compiler support
> >
> > ...wouldn't have generated a more normal situation where old
> > binaries
> > don't break against new kernels and testing can easily happen to
> > reduce
> > issues further. So we could still reset and do exactly that.
>
> No matter in which order you do it, some group will want to change
> ABI
> or semantics. We actually had multiple different iterations in
> different orders, and everybody wanted to put their mark onto this
> feature, changing the ABI. I don't care at all about the internal
> ABI
> between glibc and the kernel, but the markup of the binaries (besides
> glibc itself) is quite important to me.

I'm late to this project, but for my changes to the enablement ABI I
really had no choice. I preferred SHSTK2 to resolve the boot problems
too and we did this other ABI change after extreme resistance from the
glibc side. So it was really trying to prevent an insta-revert rather
than putting any marks on anything.

Whatever the spec, we really need to prevent compatibility sensitive
features like this from making it upstream in userspace before the
kernel changes. The kernel has high backwards compatibility standards.
To try to achieve this, it should have flexibility to design its own
ABI. Putting the userspace changes upstream ahead of time for a feature
like this constrains the kernel.

The idea that userspace can finalize on all the bits and ABI for future
features and then wait lurking to cause kernel regressions if the
kernel doesn't match is wrong. It also caused these concrete issues. So
hopefully everyone is on the same page about this for the future. Just
want to be clear in case.

>
> In retrospect, separating SHSTK from IBT from the start would have
> helped a lot because I think we could have done that in libc without
> compiler support. But I don't think anyone expected this to take
> four
> to five years to implement (or probably longer for IBT).
>
> > > Instead, we'd have to
> > > wait for a rebuild with the new markers, and of course this
> > > rebuild
> > > will
> > > put us in exactly the same position as before: the
> > > incompatibilities
> > > will be back because they are no longer masked by the kernel.
> >
> > People building new apps and testing them against upstream kernels
> > and
> > finding issues sounds like business as usual. I'm not trying to
> > solve
> > all possible userspace mistakes from the kernel.
>
> They also have to test on the right hardware and with a
> new/unreleased
> glibc.
>
> I think it would be helpful to those developers if we could give them
> an
> existing distribution early on they can use for experiments. Not
> just
> getting SHSTK going, but also playing with the perf integration
> (which
> to me is the real goal here).
>
>

Agreed. A Kconfig or sysctl would have worked fine for this purpose
though.