2021-02-01 17:58:19

by Jason A. Donenfeld

Subject: forkat(int pidfd), execveat(int pidfd), other awful things?

Hi Andy & others,

I was reversing some NT stuff recently and marveling over how wild and
crazy things are over in Windows-land. A few things related to process
creation caught my interest:

- It's possible to create a new process with an *arbitrary parent
process*, which means it'll then inherit various things like handles
and security attributes and tokens from that new parent process.

- It's possible to create a new process with the memory space handle
of a different process. Consider this on Linux, and you have some
abomination like `forkat(int pidfd)`.

The big question is "why!?" At first I was just amused by its presence
in NT. Everything is an object and you can usually freely mix and
match things, and it's very flexible, which is cool. But this is NT,
not Linux.

Jann and I were discussing, though, that maybe some variant of these
features might be useful to get rid of setuid executables. Imagine
something like `systemd-sudod`, forked off of PID 1 very early.
Subsequently all new processes on the system run with
PR_SET_NO_NEW_PRIVS or similar policies to prevent non-root->root
transition. Then, if you want to transition, you ask systemd-sudod (or
polkitd, or whatever else you have in mind) to make you a new process,
and it then does the various policy checks, and executes a new process
for you as the parent of the requesting process.
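
As a rough illustration of the first half of that scheme, the existing
prctl(2) interface already lets a process give up privilege gain on exec.
This is only a minimal sketch: the helper names lock_down_privs() and
privs_locked() are made up for this example, and it does not cover the
(hypothetical) systemd-sudod side at all.

```c
#include <sys/prctl.h>

/* Set no_new_privs for the calling thread. Returns 0 on success, -1 on
 * error. The flag cannot be cleared again; it is inherited by children
 * and preserved across execve(), so setuid bits and file capabilities
 * no longer grant extra privileges to this thread or its descendants. */
int lock_down_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if no_new_privs is set, 0 if it is not, -1 on error. */
int privs_locked(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

An init process would call something like lock_down_privs() before
spawning anything else, so that every later process inherits the flag.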

So how would that work? Well, executing processes with arbitrary
parents would be part of it, as above. But we'd probably want to more
carefully control that new process. Which chroot is it in? How do
cgroups work? And so on. And ultimately this design leads to something
like ZwCreateProcess, where you have several arguments, each to a
handle to some part of the new process state, or null to be inherited
from its parent.

int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
namespace_fd, const char *pathname, char *const argv[], char *const
envp[]);

One could imagine this growing pretty unwieldy. There's also this
other design aspect of Linux that's worth considering. Namespaces and
other process-inherited resources are generally hierarchical, with
children getting the resource from their parent. This makes sense and
is simple to conceptualize. Every time we add a new thing_fd as a
pointer to one of these resources, and allow it to be used outside of
that hierarchy, it introduces a kind of "escape hatch". That might be
considered "bad design" by some; it might not be by others. Seen this
way, NT is one massive escape hatch, with pretty much everything being
an object with a handle.

But! Maybe this is nonetheless an interesting design avenue to
explore. The introduction of pidfd is sort of just the "beginning" of
that kind of design.

Is any of this interesting to you as a future of privilege escalation
and management on Linux?

Jason


2021-02-01 18:03:08

by Jason A. Donenfeld

Subject: Re: forkat(int pidfd), execveat(int pidfd), other awful things?

> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);

A variant on the same scheme would be:

int execve_remote(int pidfd, int root_dirfd, int cgroup_fd, int
namespace_fd, const char *pathname, char *const argv[], char *const
envp[]);

Unpriv'd process calls fork(), and from that fork sends its pidfd
through a unix socket to systemd-sudod, which then calls execve_remote
on that pidfd.
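
The hand-off step in that flow can be sketched with the existing
SCM_RIGHTS mechanism. A pidfd is an ordinary file descriptor, so the same
helpers would work for it. The names send_fd() and recv_fd() are
hypothetical, and execve_remote() itself does not exist, so only the
descriptor passing is shown.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer large enough for one descriptor, suitably aligned. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;                       /* at least one data byte is needed */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number referring to the same open
file description, which is what would let a daemon act on the sender's
pidfd.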

There are a lot of (potentially very bad) ways to skin this cat.

2021-02-01 18:27:54

by Christian Brauner

Subject: Re: forkat(int pidfd), execveat(int pidfd), other awful things?

On Mon, Feb 01, 2021 at 06:47:17PM +0100, Jason A. Donenfeld wrote:
> Hi Andy & others,
>
> I was reversing some NT stuff recently and marveling over how wild and
> crazy things are over in Windows-land. A few things related to process
> creation caught my interest:
>
> - It's possible to create a new process with an *arbitrary parent
> process*, which means it'll then inherit various things like handles
> and security attributes and tokens from that new parent process.
>
> - It's possible to create a new process with the memory space handle
> of a different process. Consider this on Linux, and you have some
> abomination like `forkat(int pidfd)`.
>
> The big question is "why!?" At first I was just amused by its presence
> in NT. Everything is an object and you can usually freely mix and
> match things, and it's very flexible, which is cool. But this is NT,
> not Linux.
>
> Jann and I were discussing, though, that maybe some variant of these
> features might be useful to get rid of setuid executables. Imagine
> something like `systemd-sudod`, forked off of PID 1 very early.
> Subsequently all new processes on the system run with
> PR_SET_NO_NEW_PRIVS or similar policies to prevent non-root->root
> transition. Then, if you want to transition, you ask systemd-sudod (or
> polkitd, or whatever else you have in mind) to make you a new process,
> and it then does the various policy checks, and executes a new process
> for you as the parent of the requesting process.
>
> So how would that work? Well, executing processes with arbitrary
> parents would be part of it, as above. But we'd probably want to more
> carefully control that new process. Which chroot is it in? How do
> cgroups work? And so on. And ultimately this design leads to something
> like ZwCreateProcess, where you have several arguments, each to a
> handle to some part of the new process state, or null to be inherited
> from its parent.
>
> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);
>
> One could imagine this growing pretty unwieldy. There's also this
> other design aspect of Linux that's worth considering. Namespaces and
> other process-inherited resources are generally hierarchical, with
> children getting the resource from their parent. This makes sense and
> is simple to conceptualize. Everytime we add a new thing_fd as a
> pointer to one of these resources, and allow it to be used outside of
> that hierarchy, it introduces a kind of "escape hatch". That might be
> considered "bad design" by some; it might not be by others. Seen this
> way, NT is one massive escape hatch, with pretty much everything being
> an object with a handle.
>
> But! Maybe this is nonetheless an interesting design avenue to
> explore. The introduction of pidfd is sort of just the "beginning" of
> that kind of design.
>
> Is any of this interesting to you as a future of privilege escalation
> and management on Linux?

A bunch of this was discussed in a breakout room during Linux Plumbers
last year and I also had discussions with Lennart about this a little
while ago.

One API I had proposed was to extend pidfd_open() to give you a
pidfd that does not yet refer to any process, i.e. instead of

int pidfd = pidfd_open(1234, 0);

you could do

int pidfd = pidfd_open(-1/-ESRCH, 0);

which would give you an empty process handle without any mentionable
properties.

A simple/dumb design would then be to let clone3() not just return
pidfds but also take pidfds as an argument. You could then hand off the
pidfd to another process via SCM_RIGHTS/pidfd_getfd() and have it create
a process for you with the privileges of the caller; you'd still be the
parent.

Or, in addition to pidfd_open(), we add new syscalls to configure a
process context, pidfd_configure() or something similar. I initially
proposed this design before we ended up with what we have now.

So yes, I would love to have at least the concept to create a process
for another process, delegated fork, essentially.

Christian

2021-02-01 18:34:40

by Andy Lutomirski

Subject: Re: forkat(int pidfd), execveat(int pidfd), other awful things?

On Mon, Feb 1, 2021 at 9:47 AM Jason A. Donenfeld wrote:
>
> Hi Andy & others,
>
> I was reversing some NT stuff recently and marveling over how wild and
> crazy things are over in Windows-land. A few things related to process
> creation caught my interest:
>
> - It's possible to create a new process with an *arbitrary parent
> process*, which means it'll then inherit various things like handles
> and security attributes and tokens from that new parent process.
>
> - It's possible to create a new process with the memory space handle
> of a different process. Consider this on Linux, and you have some
> abomination like `forkat(int pidfd)`.

My general thought is that this is an excellent idea, but maybe not
quite in this form. I do rather like a lot about the NT design,
although I have to say that their actual taste in the structures
passed into APIs is baroque at best.

If we're going to do this, though, can we stay away from fork and
exec entirely? Fork is cute but inefficient, and exec is the source
of never-ending complexity and bugs in the kernel. But I also think
that the whole project can be decoupled into two almost-orthogonal pieces:

1. Inserting new processes into unusual places in the process tree.
The only part of setuid that really needs kernel help to replace is
for the daemon to be able to make its newly-spawned child be a child
of the process that called out to the daemon. Christian's pidfd
proposal could help here, and there could be a new API that is only a
minor tweak to existing fork/exec to fork-and-reparent.

2. A sane process creation API. It would be delightful to be able to
create a fully-specified process without forking. This might end up
being a fairly complicated project, though -- there are a lot of
inherited process properties to be enumerated.

(Bonus #3): binfmts are a pretty big attack surface. Having a way to
handle all the binfmt magic in userspace might be a nice extension to
#2.

--Andy

2021-02-01 18:41:20

by Casey Schaufler

Subject: Re: forkat(int pidfd), execveat(int pidfd), other awful things?

On 2/1/2021 9:47 AM, Jason A. Donenfeld wrote:
> Hi Andy & others,
>
> I was reversing some NT stuff recently and marveling over how wild and
> crazy things are over in Windows-land. A few things related to process
> creation caught my interest:
>
> - It's possible to create a new process with an *arbitrary parent
> process*, which means it'll then inherit various things like handles
> and security attributes and tokens from that new parent process.
>
> - It's possible to create a new process with the memory space handle
> of a different process. Consider this on Linux, and you have some
> abomination like `forkat(int pidfd)`.
>
> The big question is "why!?" At first I was just amused by its presence
> in NT. Everything is an object and you can usually freely mix and
> match things, and it's very flexible, which is cool. But this is NT,
> not Linux.
>
> Jann and I were discussing, though, that maybe some variant of these
> features might be useful to get rid of setuid executables. Imagine
> something like `systemd-sudod`, forked off of PID 1 very early.
> Subsequently all new processes on the system run with
> PR_SET_NO_NEW_PRIVS or similar policies to prevent non-root->root
> transition. Then, if you want to transition, you ask systemd-sudod (or
> polkitd, or whatever else you have in mind) to make you a new process,
> and it then does the various policy checks, and executes a new process
> for you as the parent of the requesting process.
>
> So how would that work? Well, executing processes with arbitrary
> parents would be part of it, as above. But we'd probably want to more
> carefully control that new process. Which chroot is it in? How do
> cgroups work? And so on. And ultimately this design leads to something
> like ZwCreateProcess, where you have several arguments, each to a
> handle to some part of the new process state, or null to be inherited
> from its parent.
>
> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);
>
> One could imagine this growing pretty unwieldy. There's also this
> other design aspect of Linux that's worth considering. Namespaces and
> other process-inherited resources are generally hierarchical, with
> children getting the resource from their parent. This makes sense and
> is simple to conceptualize. Everytime we add a new thing_fd as a
> pointer to one of these resources, and allow it to be used outside of
> that hierarchy, it introduces a kind of "escape hatch". That might be
> considered "bad design" by some; it might not be by others. Seen this
> way, NT is one massive escape hatch, with pretty much everything being
> an object with a handle.
>
> But! Maybe this is nonetheless an interesting design avenue to
> explore. The introduction of pidfd is sort of just the "beginning" of
> that kind of design.
>
> Is any of this interesting to you as a future of privilege escalation
> and management on Linux?

TL;DR - We have plenty of flayed cats.

My brief analysis of your proposal doesn't lead me to think
that there's anything you couldn't already do with systemd and
an application launcher. We already have a bunch of security
mechanisms and behaviors that the masses have decided are too
complicated or dangerous to use. And some that *are* too
complicated or dangerous to use. I wouldn't see these mechanisms
as "hardening" the kernel. I would see them as complicating
what passes for the Linux security policy.

>
> Jason

2021-02-02 09:27:09

by David Laight

Subject: RE: forkat(int pidfd), execveat(int pidfd), other awful things?

From: Andy Lutomirski
> Sent: 01 February 2021 18:30
...
> 2. A sane process creation API. It would be delightful to be able to
> create a fully-specified process without forking. This might end up
> being a fairly complicated project, though -- there are a lot of
> inherited process properties to be enumerated.

Since you are going to (eventually) load in a program image,
having to do several system calls to create the process isn't
likely to be a problem.
So using separate calls for each property isn't really an issue
and solves the horrid problem of the API structure.

So you could create an embryonic process that inherits a lot
of stuff from the current process, then do actions that
sort out the fds, argv, namespace etc.,
finally running the new program.

It would probably make implementing posix_spawn() easier.
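
For comparison, today's posix_spawn() already has a small version of this
"describe first, run later" shape: the caller builds up a list of file
actions and only then creates the process. The sketch below uses only the
standard POSIX calls; the wrapper name spawn_with_stdout() is made up for
this example.

```c
#include <fcntl.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] (looked up via PATH) with stdout redirected to out_path.
 * Returns the child's exit status, or -1 on any failure. */
int spawn_with_stdout(const char *out_path, char *const argv[])
{
    posix_spawn_file_actions_t fa;
    pid_t pid;
    int status, err;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;

    /* Step 1: describe the child's state; nothing is created yet. */
    err = posix_spawn_file_actions_addopen(&fa, 1, out_path,
                                           O_WRONLY | O_CREAT | O_TRUNC,
                                           0644);

    /* Step 2: create the process and start the program in one call. */
    if (err == 0)
        err = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&fa);

    if (err != 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The difference from the embryonic-process idea is that the "description"
here lives in a userspace structure, not in a kernel object that could be
configured call by call or handed to another process.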

David


2021-07-28 16:40:07

by John Cotton Ericson

Subject: Leveraging pidfs for process creation without fork

Hi,

I was excited to learn about pidfds the other day, precisely in
hopes that it would open the door to such a "sane process creation API".
I searched the LKML, found this thread, and now hope to rekindle the
discussion; my apologies if there has been more discussion since that I
missed and I am making redundant noise.

----

On Tue, Feb 2, 2021, at 4:23 AM, David Laight wrote:
> From: Andy Lutomirski
> > Sent: 01 February 2021 18:30
> ...
> > 2. A sane process creation API. It would be delightful to be able to
> > create a fully-specified process without forking. This might end up
> > being a fairly complicated project, though -- there are a lot of
> > inherited process properties to be enumerated.
>
> Since you are going to (eventually) load in a program image
> have to do several system calls to create the process isn't
> likely to be a problem.
> So using separate calls for each property isn't really an issue
> and solves the horrid problem of the API structure.

I definitely concur that creating an embryonic process and then setting
the properties separately sounds like the right approach. I'm no expert,
but I gather from afar that between BPF and io_uring, plenty of people are
investigating general methods of batched/pipelined communication with
the kernel, and so there's little reason to go around making more ad-hoc
mammoth syscalls for specific sets of tasks.

----

> So you could create an embryonic process that inherits a lot
> of stuff from the current process, the do actions that
> sort out the fds, argv, namespace etc.
> Finally running the new program.

All that sounds good, but I wonder if it would be possible to have a
flag such that inheritance (where practical) would *not* be the default
for new processes. I'm convinced that better security will always be an
uphill battle until privileges/capabilities/resources are *not* shared
by default. Only when more sharing requires monotonically more
programmer effort will productivity/laziness align with the principle of
least privilege.

With fork/exec, there's no good way to achieve this, I think it's safe
to say. But with the embryonic processes method, where one has the
ability to e.g. set/unset file descriptors on the embryo under
construction, it seems quite natural.

There is one wrinkle of interface evolution: as new sandboxing
mechanisms / namespaces are created, we would either need to create
yet-new "no really, default no-share" flags, or arguably be causing API
breakage as previously "leaking" privileges are patched up. I am hopeful
that either having versioned flags, or thoroughly documenting up-front
that the exact behavior is subject to change as "leaks are plugged" is
OK, but I recognize that the former might be too much complexity and the
latter too weasel-wordy, and therefore the whole idea of "opt-in sharing
only" will have to wait.

----

The security <-> ergonomics aspect is the main point of interest for me,
but there are a few other random ideas:

1. I originally thought an fd to an embryonic process should in fact
point to the task_struct rather than pid, since there is no risk of the
data becoming useless asynchronously --- an embryonic process is never
scheduled and cannot do anything like exiting on its own. But there is
no reason an embryonic process need start with just one thread, so
allowing entire embryonic thread groups might actually be virtuous. I
don't know for sure, but I figure in that case it is simpler to just
stick with the pid indirection.

2. Embryonic processes can be "forked at rest" (i.e. just duplicated),
which would allow a regime where they are used as templates for process
creation, duplicated ("forked at rest"), and sent around for other tasks
to spawn processes themselves. If my idea for "opt-in sharing only"
fails per the above, sending around an "as isolated as possible" embryo
template could be a decent fallback.

That's all I got. I hope continuing this design process is of interest
to others.

Cheers,

John

2021-07-29 14:26:54

by Christian Brauner

Subject: Re: Leveraging pidfs for process creation without fork

On Wed, Jul 28, 2021 at 12:37:57PM -0400, John Cotton Ericson wrote:
> Hi,
>
> I was excited to learn about about pidfds the other day, precisely in hopes
> that it would open the door to such a "sane process creation API". I
> searched the LKML, found this thread, and now hope to rekindle the
> discussion; my apologies if there has been more discussion since that I

Yeah, I haven't forgotten this discussion. A proposal is on my todo list
for this year. So far I've scheduled some time to work on this in the
fall.

Thanks!
Christian

2021-07-29 14:58:21

by John Ericson

Subject: Re: Leveraging pidfs for process creation without fork

Wonderful, looking forward to reading it then!

John

On Thu, Jul 29, 2021, at 10:24 AM, Christian Brauner wrote:
> On Wed, Jul 28, 2021 at 12:37:57PM -0400, John Cotton Ericson wrote:
> > Hi,
> >
> > I was excited to learn about about pidfds the other day, precisely in hopes
> > that it would open the door to such a "sane process creation API". I
> > searched the LKML, found this thread, and now hope to rekindle the
> > discussion; my apologies if there has been more discussion since that I
>
> Yeah, I haven't forgotten this discussion. A proposal is on my todo list
> for this year. So far I've scheduled some time to work on this in the
> fall.
>
> Thanks!
> Christian

2021-07-30 01:44:20

by Al Viro

Subject: Re: Leveraging pidfs for process creation without fork

On Thu, Jul 29, 2021 at 04:24:15PM +0200, Christian Brauner wrote:
> On Wed, Jul 28, 2021 at 12:37:57PM -0400, John Cotton Ericson wrote:
> > Hi,
> >
> > I was excited to learn about about pidfds the other day, precisely in hopes
> > that it would open the door to such a "sane process creation API". I
> > searched the LKML, found this thread, and now hope to rekindle the
> > discussion; my apologies if there has been more discussion since that I
>
> Yeah, I haven't forgotten this discussion. A proposal is on my todo list
> for this year. So far I've scheduled some time to work on this in the
> fall.

Keep in mind that quite a few places in kernel/exit.c very much rely upon the
lack of anything outside of thread group adding threads into it. Same for
fs/exec.c.

2021-07-31 22:47:10

by Al Viro

Subject: Re: Leveraging pidfs for process creation without fork

On Sat, Jul 31, 2021 at 03:11:03PM -0700, John Ericson wrote:
> Do you mind pointing out one of those examples? I'm new to this, but if they follow a pattern I should be able to find the other examples based off it. I'm certainly curious to take a look :).
>
> I hope these issues aren't too deep. Ideally there's a nice decoupling so the creating process is just manipulating "inert" data structures for the embryo that the scheduler doesn't even need to see, and then after the embryonic process is submitted, when the context switches to it for the first time, that's a completely normal process without special cases.
>
> The place where complexity is hardest to avoid, I think, would be cleaning up the yet-unborn embryonic processes orphaned by exited parent(s), because that will have to handle all the semi-initialized states those could be in (as opposed to real processes).

It's more on the exit/exec/coredump side, actually. For
exit we want to be sure that no new live threads will appear in a
group once the last live thread has entered do_exit(). For
exec (de_thread(), for starters) you want to have all threads
except for the one that does execve() to be killed and your
thread to take over as group leader. Look for the machinery there
and in do_exit()/release_task() involved in that. For coredump
you want all threads except for the dumper to be brought into do_exit()
and stopped there, so that the dumping one can access their state.

Then there's fun with ->sighand treatment - the whole thing
critically relies upon ->sighand being shared for the entire thread
group; look at the ->sighand->siglock uses.

The whole area is full of rather subtle places. Again, the
real headache comes from the exit and execve. Embryonic threads are
passive; it's the ones already running that can (and do) cause PITA.

What do you want that for, BTW?

2021-08-02 12:20:58

by Christian Brauner

Subject: Re: Leveraging pidfs for process creation without fork

On Sat, Jul 31, 2021 at 10:42:16PM +0000, Al Viro wrote:
> On Sat, Jul 31, 2021 at 03:11:03PM -0700, John Ericson wrote:
> > Do you mind pointing out one of those examples? I'm new to this, but if they follow a pattern I should be able to find the other examples based off it. I'm certainly curious to take a look :).
> >
> > I hope these issues aren't too deep. Ideally there's a nice decoupling so the creating process is just manipulating "inert" data structures for the embryo that the scheduler doesn't even need to see, and then after the embryonic process is submitted, when the context switches to it for the first time, that's a completely normal process without special cases.
> >
> > The place where complexity is hardest to avoid, I think, would be cleaning up the yet-unborn embryonic processes orphaned by exited parent(s), because that will have to handle all the semi-initialized states those could be in (as opposed to real processes).
>
> It's more on the exit/exec/coredump side, actually. For
> exit we want to be sure that no new live threads will appear in a
> group once the last live thread has entered do_exit(). For
> exec (de_thread(), for starters) you want to have all threads
> except for the one that does execve() to be killed and your
> thread to take over as group leader. Look for the machinery there
> and in do_exit()/release_task() involved into that. For coredump
> you want all threads except for dumper to be brought into do_exit()
> and stopped there, for dumping one to be able to access their state.
>
> Then there's fun with ->sighand treatment - the whole thing
> critically relies upon ->sighand being shared for the entire thread
> group; look at the ->sighand->siglock uses.
>
> The whole area is full of rather subtle places. Again, the
> real headache comes from the exit and execve. Embryonic threads are
> passive; it's the ones already running that can (and do) cause PITA.

Iiuc, you're talking about adding a thread into a thread-group tg1 from
a thread in another thread-group tg2. I don't think that's a very
pressing use-case and I agree that that sounds rather nasty right now.
Unless I'm missing something, a simple api to create something like a
processes configuration context doesn't require this.

2021-08-03 06:01:58

by John Cotton Ericson

Subject: Re: Leveraging pidfs for process creation without fork

On Mon, Aug 2, 2021, at 8:19 AM, Christian Brauner wrote:
> On Sat, Jul 31, 2021 at 10:42:16PM +0000, Al Viro wrote:
> >
> > It's more on the exit/exec/coredump side, actually. For
> > exit we want to be sure that no new live threads will appear in a
> > group once the last live thread has entered do_exit(). For
> > exec (de_thread(), for starters) you want to have all threads
> > except for the one that does execve() to be killed and your
> > thread to take over as group leader. Look for the machinery there
> > and in do_exit()/release_task() involved into that. For coredump
> > you want all threads except for dumper to be brought into do_exit()
> > and stopped there, for dumping one to be able to access their state.
> >
> > Then there's fun with ->sighand treatment - the whole thing
> > critically relies upon ->sighand being shared for the entire thread
> > group; look at the ->sighand->siglock uses.
> >
> > The whole area is full of rather subtle places. Again, the
> > real headache comes from the exit and execve. Embryonic threads are
> > passive; it's the ones already running that can (and do) cause PITA.

I took a look at de_thread and begin_new_exec. It does seem that whatever
trouble there is stems from a bit of mixing of concerns, as I thought.

Most of begin_new_exec seems to be about wiping clean the current process's
state, including the de_thread and unsharing various things. But then
operations like that first bprm_creds_from_file call (of perhaps more
recent vintage [1]) are about initializing new state from the binprm argument.

It is interesting to me to note that some of the "unsharing" happens at
clone time (the namespaces), and some happens (also) at exec time (file
table, signal handlers). This to me is more good concrete evidence fork
+ exec is awkward and strews concerns.

There perhaps will be some subtleties about in which order state can be
set up on the embryonic process, but I don't think any de_thread will be
needed because there will never be threads from a "previous" state lying
around. Indeed there is no "previous" anything, just the current
everything-inert embryonic process.

I would propose trying to rip up begin_new_exec so the unsharing,
de_thread-ing etc. is just done in the traditional exec path, and just
the bprm bits with a non-current fresh embryonic task_struct are done in
the new one.

[1]: 56305aa9b6fab91a5555a45796b79c1b0a6353d1

> Iiuc, you're talking about adding a thread into a thread-group tg1 from
> a thread in another thread-group tg2. I don't think that's a very
> pressing use-case and I agree that that sounds rather nasty right now.
> Unless I'm missing something, a simple api to create something like a
> processes configuration context doesn't require this.

Agreed.

I did mention embryonic processes with multiple threads, but that was just a
shower thought and not something I really care about. Also, since that
would entail adding a thread to an inert thread group the creator has
full powers over (it's "on the operating table"), I don't think it would
be so bad.

(To keep this new surgery metaphor going, exec would be self-surgery,
and adding a thread to a *live* thread group would be surgery without
anesthesia.)

> a processes configuration context

This phrase stuck with me, Christian. Not to rush you on your concrete
proposal, but it sounds like you are envisioning building up a separate
struct with instructions on how to produce a process, rather than
mutating unscheduled but otherwise genuine `task_struct`s?

> > What do you want that for, BTW?

Those security + ergonomic things I mentioned in my original email are
the main goal.

I have a personal *long*-term goal to see something like CloudABI
resurrected. I think it got most of the interfaces right, but not
process management, and now that there are pidfds, we have a chance to
do better.

I'm in no rush, so happy to just see very linux-specific interfaces
evolve in a good direction for now. Writing a personality or some other
shim is not the interesting part, to say the least, so I'm happy to wait
ages before doing that while the internals marinate.

John