2009-06-24 23:01:04

by Denys Vlasenko

Subject: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

In some circumstances running process needs to re-execute
its image.

Among other useful cases, it is _crucial_ for NOMMU arches.

They need it to perform daemonization. Classic sequence
of "fork, parent dies, child continues" can't be used
due to lack of fork on NOMMU, and instead we have to do
"vfork, child re-exec itself (with a flag to not daemonize)
and therefore unblocks parent, parent dies".
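
For illustration only (this is not part of the posted patch), the
vfork-based sequence above might look roughly like the sketch below.
The flag name "--no-daemonize" and all function names are hypothetical;
a real daemon would also call setsid(), redirect stdio and so on in the
re-executed child.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical marker: tells the re-executed child not to daemonize again. */
#define NO_DAEMONIZE_FLAG "--no-daemonize"

/* Returns 1 if argv already carries the marker (we are the re-executed child). */
int is_reexeced_child(int argc, char **argv)
{
	for (int i = 1; i < argc; i++)
		if (strcmp(argv[i], NO_DAEMONIZE_FLAG) == 0)
			return 1;
	return 0;
}

/* Returns a NULL-terminated copy of argv with the marker appended, or NULL. */
char **argv_with_flag(int argc, char **argv)
{
	char **nargv = malloc((argc + 2) * sizeof(*nargv));

	if (!nargv)
		return NULL;
	memcpy(nargv, argv, argc * sizeof(*nargv));
	nargv[argc] = (char *)NO_DAEMONIZE_FLAG;
	nargv[argc + 1] = NULL;
	return nargv;
}

/* "vfork, child re-execs itself, parent dies" for systems without fork(). */
void daemonize_by_reexec(int argc, char **argv, char **envp)
{
	char **nargv;
	pid_t pid;

	if (is_reexeced_child(argc, argv))
		return;		/* already the detached child: carry on */

	nargv = argv_with_flag(argc, argv);	/* allocate before vfork() */
	if (!nargv)
		_exit(1);

	pid = vfork();
	if (pid < 0)
		_exit(1);
	if (pid == 0) {
		/* Child: after vfork() only exec or _exit are safe. */
		execve("/proc/self/exe", nargv, envp);
		_exit(127);	/* exec failed, e.g. /proc is not mounted */
	}
	/* Parent: resumes once the child has exec'ed or exited; now dies. */
	_exit(0);
}
```

A program would call daemonize_by_reexec(argc, argv, environ) early in
main(); the execve() of /proc/self/exe in the child is the step that
fails when /proc is not mounted.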

Another crucial use case on NOMMU is POSIX shell support.
Imagine a shell command of the form "func1 | func2 | func3".
This can be implemented on NOMMU by vforking thrice,
re-executing the shell in every child in the form
"<shell> -c 'body of funcN'", and letting parent wait and collect
exitcodes and such. As far as I can see, it's the only way
to implement it correctly on NOMMU.

The program may re-execute itself by name if it knows the name,
but we generally may be unsure about it. Binary may be renamed,
or even deleted while it is being run.

More elegant way is to execute /proc/self/exe.
This works just fine as long as /proc is mounted.

But it breaks if /proc isn't mounted, and this can happen in real-world
usage. For example, when shell invoked very early in initrd/initramfs.

With this patch, it is possible to execute /proc/self/exe
even if /proc is not mounted. In the below example,
./sh is a static shell binary:

# chroot . ./sh
/ # echo $0
./sh
/ # . /proc/self/exe
hush: /proc/self/exe: No such file or directory
/ # /proc/self/exe <==========
/ # echo $0
/proc/self/exe
/ # exit
/ # exit
#

On an unpatched kernel, command marked with <=== would fail.

How patch does it: when execve syscall discovers that opening of binary
image fails, a small bit of code is added to special case "/proc/self/exe"
string. If binary name is *exactly* that string, and if error is ENOENT
or EACCES, then exec will still succeed, using current binary's image.
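
As a rough userspace model of that condition (a sketch only; the
function name is made up here, and the real check lives in open_exec()
in the attached patch):

```c
#include <errno.h>
#include <string.h>

/*
 * Models the fallback test described above: the fallback applies only
 * when opening the binary failed with -ENOENT or -EACCES *and* the
 * name is byte-for-byte "/proc/self/exe". No path normalization is
 * done, so "/proc//self/exe" or "/proc/self/exe/" do not match.
 */
int reexec_fallback_applies(const char *name, long open_err)
{
	if (open_err != -ENOENT && open_err != -EACCES)
		return 0;
	return strcmp(name, "/proc/self/exe") == 0;
}
```

The comparison is on the string passed to execve, not on the file it
would resolve to, so a procfs mounted at another location is not
covered by it.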

Please apply.

Signed-off-by: Denys Vlasenko <[email protected]>
--
vda


Attachments:
linux-2.6.30_proc_self_exe.patch (852.00 B)

2009-06-24 23:21:35

by Andrew Morton

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, 25 Jun 2009 01:00:56 +0200
Denys Vlasenko <[email protected]> wrote:

> In some circumstances running process needs to re-execute
> its image.
>
> Among other useful cases, it is _crucial_ for NOMMU arches.
>
> They need it to perform daemonization. Classic sequence
> of "fork, parent dies, child continues" can't be used
> due to lack of fork on NOMMU, and instead we have to do
> "vfork, child re-exec itself (with a flag to not daemonize)
> and therefore unblocks parent, parent dies".
>
> Another crucial use case on NOMMU is POSIX shell support.
> Imagine a shell command of the form "func1 | func2 | func3".
> This can be implemented on NOMMU by vforking thrice,
> re-executing the shell in every child in the form
> "<shell> -c 'body of funcN'", and letting parent wait and collect
> exitcodes and such. As far as I can see, it's the only way
> to implement it correctly on NOMMU.
>
> The program may re-execute itself by name if it knows the name,
> but we generally may be unsure about it. Binary may be renamed,
> or even deleted while it is being run.
>
> More elegant way is to execute /proc/self/exe.
> This works just fine as long as /proc is mounted.
>
> But it breaks if /proc isn't mounted, and this can happen in real-world
> usage. For example, when shell invoked very early in initrd/initramfs.

Why can't userspace mount /proc before doing the daemonization?

> With this patch, it is possible to execute /proc/self/exe
> even if /proc is not mounted. In the below example,
> ./sh is a static shell binary:
>
> # chroot . ./sh
> / # echo $0
> ./sh
> / # . /proc/self/exe
> hush: /proc/self/exe: No such file or directory
> / # /proc/self/exe <==========
> / # echo $0
> /proc/self/exe
> / # exit
> / # exit
> #
>
> On an unpatched kernel, command marked with <=== would fail.
>
> How patch does it: when execve syscall discovers that opening of binary
> image fails, a small bit of code is added to special case "/proc/self/exe"
> string. If binary name is *exactly* that string, and if error is ENOENT
> or EACCES, then exec will still succeed, using current binary's image.
>
> Please apply.
>
>
> diff -urp ../linux-2.6.30.org/fs/exec.c linux-2.6.30/fs/exec.c
> --- ../linux-2.6.30.org/fs/exec.c 2009-06-10 05:05:27.000000000 +0200
> +++ linux-2.6.30/fs/exec.c 2009-06-25 00:20:13.000000000 +0200
> @@ -652,9 +652,25 @@ struct file *open_exec(const char *name)
>  	file = do_filp_open(AT_FDCWD, name,
>  				O_LARGEFILE | O_RDONLY | FMODE_EXEC, 0,
>  				MAY_EXEC | MAY_OPEN);
> -	if (IS_ERR(file))
> -		goto out;
> +	if (IS_ERR(file)) {
> +		if ((PTR_ERR(file) == -ENOENT || PTR_ERR(file) == -EACCES)
> +		 && strcmp(name, "/proc/self/exe") == 0
> +		) {
> +			struct file *sv = file;
> +			struct mm_struct *mm;
>
> +			mm = get_task_mm(current);
> +			if (!mm)
> +				goto out;
> +			file = get_mm_exe_file(mm);
> +			mmput(mm);
> +			if (file)
> +				goto ok;
> +			file = sv;
> +		}
> +		goto out;
> +	}
> +ok:
>  	err = -EACCES;
>  	if (!S_ISREG(file->f_path.dentry->d_inode->i_mode))
>  		goto exit;

Oh geeze. Hard-coded "/proc/self/exec" in the middle of the core exec
code? You're a brave man.

Relatively minor observations:

- The code layout is weird

- This hack should be hidden in a separate function, not splattered
all over the middle of open_exec().

- That function should be documented in a way which will permit
readers to understand why it exists.


But don't do any of that yet. This will be an unpopular patch and I
fear for its future ;)

2009-06-24 23:49:21

by Denys Vlasenko

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, Jun 25, 2009 at 1:21 AM, Andrew Morton<[email protected]> wrote:
> On Thu, 25 Jun 2009 01:00:56 +0200
> Denys Vlasenko <[email protected]> wrote:
>> In some circumstances running process needs to re-execute
>> its image.
...
>> More elegant way is to execute /proc/self/exe.
>> This works just fine as long as /proc is mounted.
>>
>> But it breaks if /proc isn't mounted, and this can happen in real-world
>> usage. For example, when shell invoked very early in initrd/initramfs.
>
> Why can't userspace mount /proc before doing the daemonization?

Some people want to unset CONFIG_PROC_FS, and still have
working POSIX compatible shell. Coincidentally, NOMMU
machines, ones which *require* re-execution of the shell to support that,
tend to be the most memory starved machines too (thus most likely
to be those where people desire to unset CONFIG_PROC_FS).

> Oh geeze. Hard-coded "/proc/self/exec" in the middle of the core exec
> code? You're a brave man.

There are other alternatives. This looked to be the least ugly
to me.

We can special-case execve(NULL, ...).
But I feared people would say this will change previously-buggy
userspace code into one acting weirdly; in some cases
leading to infinite execve loops. Do you think it's better
than "/proc/self/exe"?

Then I thought about using a special name to mean "re-execute me",
like "", or "/./self" or whatever. Whatever I thought about,
it was either risking a collision with a real file, or was too ugly,
or both.

Then it occurred to me that "/proc/self/exe" _already is_
such a name. It is _already used_ for this purpose, so the userspace
does not need to be changed.

For the extra non-intrusiveness, the hack kicks in only if
/proc/self/exe does not exist.


[code style notes skipped. I will re-write it in whatever form
you like it most, when/if it will be agreed on in principle ]

> But don't do any of that yet. This will be an unpopular patch and I
> fear for its future ;)

Propose some other way to make it possible to re-execute a binary
without /proc.
--
vda

2009-06-24 23:59:06

by Al Viro

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, Jun 25, 2009 at 01:00:56AM +0200, Denys Vlasenko wrote:
> More elegant way is to execute /proc/self/exe.
> This works just fine as long as /proc is mounted.

So mount it.

> But it breaks if /proc isn't mounted, and this can happen in real-world
> usage. For example, when shell invoked very early in initrd/initramfs.

So mount it.

> With this patch, it is possible to execute /proc/self/exe
> even if /proc is not mounted.

> How patch does it: when execve syscall discovers that opening of binary
> image fails, a small bit of code is added to special case "/proc/self/exe"
> string. If binary name is *exactly* that string, and if error is ENOENT
> or EACCES, then exec will still succeed, using current binary's image.
>
> Please apply.

No. This is just plain sick. Magical pathnames have no business being
in the kernel. If procfs is too much for your sensitive soul, do an
extremely trimmed-down version that would consist of *one* *file* (yes,
as root and only node on fs). Said file being a procfs-style symlink,
doing exactly what /proc/self/exec would do.

On such system you can just mkdir /proc/self, touch /proc/self/exec,
mount -t self_exec none /proc/self/exec and be done with that. No
magic needed, end of the story.

2009-06-25 00:07:38

by Mike Frysinger

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Wed, Jun 24, 2009 at 19:58, Al Viro wrote:
> On Thu, Jun 25, 2009 at 01:00:56AM +0200, Denys Vlasenko wrote:
>> More elegant way is to execute /proc/self/exe.
>> This works just fine as long as /proc is mounted.
>
> So mount it.

well, in the busybox case, in order to run mount you might have to
exec yourself first ...

> No. This is just plain sick. Magical pathnames have no business being
> in the kernel. If procfs is too much for your sensitive soul, do an
> extremely trimmed-down version that would consist of *one* *file* (yes,
> as root and only node on fs). Said file being a procfs-style symlink,
> doing exactly what /proc/self/exec would do.
>
> On such system you can just mkdir /proc/self, touch /proc/self/exec,
> mount -t self_exec none /proc/self/exec and be done with that. No
> magic needed, end of the story.

if that is acceptable, how about a special binfmt that depends on
EMBEDDED and we put the magic there.
-mike

2009-06-25 00:26:39

by Andrew Morton

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, 25 Jun 2009 01:49:13 +0200
Denys Vlasenko <[email protected]> wrote:

> Propose some other way to make it possible to re-execute a binary
> without /proc.

Add a system call to do this?

2009-06-25 08:09:38

by Alan

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

> With this patch, it is possible to execute /proc/self/exe
> even if /proc is not mounted. In the below example,
> ./sh is a static shell binary:

What if the user has procfs mounted somewhere else, what if they are in a
chroot where you don't want them to patch the binary and re-exec it ?

It would be far far cleaner for NOMMU to have a NOMMU private "reexec()"
call that didn't rely on procfs or hacking names into the kernel.

So NAK

Alan

2009-06-25 18:02:27

by Eric W. Biederman

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

Denys Vlasenko <[email protected]> writes:

> In some circumstances running process needs to re-execute
> its image.
>
> Among other useful cases, it is _crucial_ for NOMMU arches.
>
> They need it to perform daemonization. Classic sequence
> of "fork, parent dies, child continues" can't be used
> due to lack of fork on NOMMU, and instead we have to do
> "vfork, child re-exec itself (with a flag to not daemonize)
> and therefore unblocks parent, parent dies".

Why?

I would expect a simple assembly wrapper around clone would work.
I.e. Create a new process but share the MM and reuse the same
stack. When clone returns if we are the parent exit, otherwise
continue on with life.

> Another crucial use case on NOMMU is POSIX shell support.
> Imagine a shell command of the form "func1 | func2 | func3".
> This can be implemented on NOMMU by vforking thrice,
> re-executing the shell in every child in the form
> "<shell> -c 'body of funcN'", and letting parent wait and collect
> exitcodes and such. As far as I can see, it's the only way
> to implement it correctly on NOMMU.
>
> The program may re-execute itself by name if it knows the name,
> but we generally may be unsure about it. Binary may be renamed,
> or even deleted while it is being run.

It really sounds like you want to implement fork for NOMMU.
If you have a base address register which everything is relative
to I can't imagine it would be too hard. Heck you might even be
able to do it in user space.

Eric

2009-06-25 18:16:57

by Mike Frysinger

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, Jun 25, 2009 at 14:02, Eric W. Biederman wrote:
> Denys Vlasenko <[email protected]> writes:
>> In some circumstances running process needs to re-execute
>> its image.
>>
>> Among other useful cases, it is _crucial_ for NOMMU arches.
>>
>> They need it to perform daemonization. Classic sequence
>> of "fork, parent dies, child continues" can't be used
>> due to lack of fork on NOMMU, and instead we have to do
>> "vfork, child re-exec itself (with a flag to not daemonize)
>> and therefore unblocks parent, parent dies".
>
> Why?
>
> I would expect a simple assembly wrapper around clone would work.
> I.e.  Create a new process but share the MM and reuse the same
> stack. When clone returns if we are the parent exit, otherwise
> continue on with life.

this has already been implemented in uClibc with the hints from Jamie
Lokier, but this was done only recently. before that, the only way to
daemonize was to vfork+re-exec.

>> Another crucial use case on NOMMU is POSIX shell support.
>> Imagine a shell command of the form "func1 | func2 | func3".
>> This can be implemented on NOMMU by vforking thrice,
>> re-executing the shell in every child in the form
>> "<shell> -c 'body of funcN'", and letting parent wait and collect
>> exitcodes and such. As far as I can see, it's the only way
>> to implement it correctly on NOMMU.
>>
>> The program may re-execute itself by name if it knows the name,
>> but we generally may be unsure about it. Binary may be renamed,
>> or even deleted while it is being run.
>
> It really sounds like you want to implement fork for NOMMU.
> If you have a base address register which everything is relative
> to I can't imagine it would be too hard.  Heck you might even be
> able to do it in user space.

that isnt really feasible.
-mike

2009-06-26 08:00:40

by Denys Vlasenko

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, Jun 25, 2009 at 10:10 AM, Alan Cox<[email protected]> wrote:
>> With this patch, it is possible to execute /proc/self/exe
>> even if /proc is not mounted. In the below example,
>> ./sh is a static shell binary:
>
> What if the user has procfs mounted somewhere else, what if they are in a
> chroot where you don't want them to patch the binary and re-exec it ?
>
> It would be far far cleaner for NOMMU to have a NOMMU private "reexec()"
> call that didn't rely on procfs or hacking names into the kernel.
>
> So NAK

I am ok with it. Are other people ok with adding a syscall
just for this purpose? Al?
--
vda

2009-06-26 08:18:40

by Florian Weimer

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

* Denys Vlasenko:

> Some people want to unset CONFIG_PROC_FS, and still have
> working POSIX compatible shell.

Nowadays, the dynamic linker requires a mounted /proc with
/proc/self/exe for some features (notably $ORIGIN handling). So it's
not self-exec only that doesn't work if /proc isn't there. I guess
the kernel must make sure that /proc is always there (unlikely), or we
need a different mechanism to get this data.

--
Florian Weimer <[email protected]>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

2009-06-26 13:26:55

by Mike Frysinger

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Fri, Jun 26, 2009 at 04:00, Denys Vlasenko wrote:
> On Thu, Jun 25, 2009 at 10:10 AM, Alan Cox wrote:
>>> With this patch, it is possible to execute /proc/self/exe
>>> even if /proc is not mounted. In the below example,
>>> ./sh is a static shell binary:
>>
>> What if the user has procfs mounted somewhere else, what if they are in a
>> chroot where you don't want them to patch the binary and re-exec it ?
>>
>> It would be far far cleaner for NOMMU to have a NOMMU private "reexec()"
>> call that didn't rely on procfs or hacking names into the kernel.
>>
>> So NAK
>
> I am ok with it. Are other people ok with adding a syscall
> just for this purpose? Al?

please try a custom binfmt first
-mike

2009-06-26 22:55:40

by Denys Vlasenko

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Fri, Jun 26, 2009 at 3:26 PM, Mike Frysinger<[email protected]> wrote:
> On Fri, Jun 26, 2009 at 04:00, Denys Vlasenko wrote:
>> On Thu, Jun 25, 2009 at 10:10 AM, Alan Cox wrote:
>>>> With this patch, it is possible to execute /proc/self/exe
>>>> even if /proc is not mounted. In the below example,
>>>> ./sh is a static shell binary:
>>>
>>> What if the user has procfs mounted somewhere else, what if they are in a
>>> chroot where you don't want them to patch the binary and re-exec it ?
>>>
>>> It would be far far cleaner for NOMMU to have a NOMMU private "reexec()"
>>> call that didn't rely on procfs or hacking names into the kernel.
>>>
>>> So NAK
>>
>> I am ok with it. Are other people ok with adding a syscall
>> just for this purpose? Al?
>
> please try a custom binfmt first

I did not understand you.
--
vda

2009-06-26 23:18:38

by Denys Vlasenko

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu, Jun 25, 2009 at 1:58 AM, Al Viro<[email protected]> wrote:
>> With this patch, it is possible to execute /proc/self/exe
>> even if /proc is not mounted.
>
>> How patch does it: when execve syscall discovers that opening of binary
>> image fails, a small bit of code is added to special case "/proc/self/exe"
>> string. If binary name is *exactly* that string, and if error is ENOENT
>> or EACCES, then exec will still succeed, using current binary's image.
>>
>> Please apply.
>
> No. This is just plain sick. Magical pathnames have no business being
> in the kernel.

This is not a magical *pathname*. It only looks like it.
This is the magic 1st argument of execve which makes it perform reexecve().
Sorry, I should have explained it in the first email...

Creating entire new syscall reexecve() just for this purpose seems excessive.
Special-casing execve allows to avoid this.

I could have used "Please reexec me!!!" as a magic 1st parameter to execve.
This has two downsides: the string, however weird, still *can* match
a real file. Second, userspace needs to be modified to use such a name.

Magic parameter of the form "/proc/self/exe" does not suffer from
2nd problem. Userspace already uses it exactly for this purpose,
no change needed.

> If procfs is too much for your sensitive soul, do an
> extremely trimmed-down version that would consist of *one* *file* (yes,
> as root and only node on fs). Said file being a procfs-style symlink,
> doing exactly what /proc/self/exec would do.
>
> On such system you can just mkdir /proc/self, touch /proc/self/exec,
> mount -t self_exec none /proc/self/exec and be done with that. No
> magic needed, end of the story.

This would use many times more memory than a small code addition
on an execve's error path I posted.

It also would require mounting a filesystem. So the shell started by
init=/bin/sh on NOMMU machine either will need to be programmed
to do it when execve("/proc/self/exe") fails, or the user will need
to be taught to do it by hand before user can be sure the shell
will be able to run some POSIX constructs like function calls
in pipes etc.

--
vda

2009-06-27 11:16:29

by Pavel Machek

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Thu 2009-06-25 01:49:13, Denys Vlasenko wrote:
> On Thu, Jun 25, 2009 at 1:21 AM, Andrew Morton<[email protected]> wrote:
> > On Thu, 25 Jun 2009 01:00:56 +0200
> > Denys Vlasenko <[email protected]> wrote:
> >> In some circumstances running process needs to re-execute
> >> its image.
> ...
> >> More elegant way is to execute /proc/self/exe.
> >> This works just fine as long as /proc is mounted.
> >>
> >> But it breaks if /proc isn't mounted, and this can happen in real-world
> >> usage. For example, when shell invoked very early in initrd/initramfs.
> >
> > Why can't userspace mount /proc before doing the daemonization?
>
> Some people want to unset CONFIG_PROC_FS, and still have
> working POSIX compatible shell. Coincidentally, NOMMU
> machines, ones which *require* re-execution of the shell to support that,
> tend to be the most memory starved machines too (thus most likely
> to be those where people desire to unset CONFIG_PROC_FS).

And some people want to mount /proc on /xyzzy.

Create minimal PROCMINI fs with just /proc/self/exe?

> We can special-case execve(NULL, ...).
> But I feared people would say this will change previously-buggy
> userspace code into one acting weirdly; in some cases
> leading to infinite execve loops. Do you think it's better
> than "/proc/self/exe"?

Yes. Or... add execme() syscall?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-28 19:31:58

by Mike Frysinger

Subject: Re: [PATCH] allow execve'ing "/proc/self/exe" even if /proc is not mounted

On Fri, Jun 26, 2009 at 18:55, Denys Vlasenko wrote:
> On Fri, Jun 26, 2009 at 3:26 PM, Mike Frysinger wrote:
>> On Fri, Jun 26, 2009 at 04:00, Denys Vlasenko wrote:
>>> On Thu, Jun 25, 2009 at 10:10 AM, Alan Cox wrote:
>>>>> With this patch, it is possible to execute /proc/self/exe
>>>>> even if /proc is not mounted. In the below example,
>>>>> ./sh is a static shell binary:
>>>>
>>>> What if the user has procfs mounted somewhere else, what if they are in a
>>>> chroot where you don't want them to patch the binary and re-exec it ?
>>>>
>>>> It would be far far cleaner for NOMMU to have a NOMMU private "reexec()"
>>>> call that didn't rely on procfs or hacking names into the kernel.
>>>>
>>>> So NAK
>>>
>>> I am ok with it. Are other people ok with adding a syscall
>>> just for this purpose? Al?
>>
>> please try a custom binfmt first
>
> I did not understand you.

i was thinking fs/binfmt_*.c will get executed all the time, but they
may not get the chance if execve() aborts early due to the file not
being found. if that's the case, then nm me.
-mike