2012-02-29 15:16:42

by Cyrill Gorcunov

[permalink] [raw]
Subject: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file

Hi guys,

at restore time we would like to have a way for /proc/pid/exe
symlink recovering. So I thought extending prctl might be
a good idea.

Still maybe there some other good and 'right' way to do it,
so I would like to gather opinions.

Please review, thanks!

Cyrill
---
From: Cyrill Gorcunov <[email protected]>
Subject: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file

When we do restore we would like to have a way to setup
a former mm_struct::exe_file so that /proc/pid/exe would
point to the original executable file a process had at
checkpoint time.

For this sake PR_SET_MM_EXE_FILE code is introduced.

Note, if mm_struct::exe_file already mapped more than once
we refuse to change anything (which prevents kernel from
potential problems).

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
include/linux/prctl.h | 1
kernel/sys.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 73 insertions(+), 1 deletion(-)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -118,5 +118,6 @@
# define PR_SET_MM_ENV_START 10
# define PR_SET_MM_ENV_END 11
# define PR_SET_MM_AUXV 12
+# define PR_SET_MM_EXE_FILE 13

#endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1701,6 +1701,66 @@ static bool vma_flags_mismatch(struct vm
(vma->vm_flags & banned);
}

+/* Expects mm->mmap_sem is read-taken */
+static int prctl_set_mm_exe_file(struct mm_struct *mm,
+ const void __user *path,
+ size_t size)
+{
+ struct file *new_exe_file;
+ char *pathbuf;
+ int ret = 0;
+
+ if (size >= PATH_MAX)
+ return -EINVAL;
+
+ /*
+ * We allow to change only those exe's which
+ * are not mapped several times. This one
+ * is early test while mmap_sem is taken.
+ */
+ if (mm->num_exe_file_vmas > 1)
+ return -EBUSY;
+
+ up_read(&mm->mmap_sem);
+
+ pathbuf = kmalloc(size, GFP_TEMPORARY);
+ if (!pathbuf) {
+ ret = -ENOMEM;
+ goto err_down;
+ }
+
+ if (copy_from_user(pathbuf, path, size)) {
+ kfree(pathbuf);
+ ret = -EFAULT;
+ goto err_down;
+ }
+ pathbuf[size-1] = '\0';
+
+ new_exe_file = open_exec(pathbuf);
+ kfree(pathbuf);
+
+ down_read(&mm->mmap_sem);
+
+ if (IS_ERR(new_exe_file))
+ return PTR_ERR(new_exe_file);
+
+ /*
+ * We allow to change only those exe's which
+ * are not mapped several times.
+ */
+ if (mm->num_exe_file_vmas < 2) {
+ set_mm_exe_file(mm, new_exe_file);
+ ret = 0;
+ } else
+ ret = -EBUSY;
+
+ return ret;
+
+err_down:
+ down_read(&mm->mmap_sem);
+ return ret;
+}
+
static int prctl_set_mm(int opt, unsigned long addr,
unsigned long arg4, unsigned long arg5)
{
@@ -1709,7 +1769,9 @@ static int prctl_set_mm(int opt, unsigne
struct vm_area_struct *vma;
int error = 0;

- if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
+ if (arg5 || (arg4 &&
+ opt != PR_SET_MM_AUXV &&
+ opt != PR_SET_MM_EXE_FILE))
return -EINVAL;

if (!capable(CAP_SYS_ADMIN))
@@ -1837,6 +1899,15 @@ static int prctl_set_mm(int opt, unsigne

return 0;
}
+
+ /*
+ * This to restore /proc/self/exe link.
+ */
+ case PR_SET_MM_EXE_FILE:
+ error = prctl_set_mm_exe_file(mm, (const void __user *)addr, arg4);
+ if (error)
+ goto out;
+ break;
default:
error = -EINVAL;
goto out;


2012-02-29 15:23:34

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file

On 02/29/2012 07:16 PM, Cyrill Gorcunov wrote:
> Hi guys,
>
> at restore time we would like to have a way for /proc/pid/exe
> symlink recovering. So I thought extending prctl might be
> a good idea.
>
> Still maybe there some other good and 'right' way to do it,
> so I would like to gather opinions.

If using the prctl for this, then taking a given file descriptor
(with fcheck) instead of taking path and opening it in kernel is
better.

> Please review, thanks!
>
> Cyrill

Thanks,
Pavel

2012-02-29 15:31:39

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file

On Wed, Feb 29, 2012 at 07:23:07PM +0400, Pavel Emelyanov wrote:
> On 02/29/2012 07:16 PM, Cyrill Gorcunov wrote:
> > Hi guys,
> >
> > at restore time we would like to have a way for /proc/pid/exe
> > symlink recovering. So I thought extending prctl might be
> > a good idea.
> >
> > Still maybe there some other good and 'right' way to do it,
> > so I would like to gather opinions.
>
> If using the prctl for this, then taking a given file descriptor
> (with fcheck) instead of taking path and opening it in kernel is
> better.
>

Sounds like a plan. Thanks Pavel!

Cyrill

2012-02-29 19:31:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file

On 02/29, Cyrill Gorcunov wrote:
>
> +static int prctl_set_mm_exe_file(struct mm_struct *mm,
> + const void __user *path,
> + size_t size)
> +{
> + struct file *new_exe_file;
> + char *pathbuf;
> + int ret = 0;
> +
> + if (size >= PATH_MAX)
> + return -EINVAL;
> +
> + /*
> + * We allow to change only those exe's which
> + * are not mapped several times. This one
> + * is early test while mmap_sem is taken.
> + */
> + if (mm->num_exe_file_vmas > 1)
> + return -EBUSY;

I don't really understand this check, but it is racy. Another thread
can change ->num_exe_file_vmas right after the check.

> + up_read(&mm->mmap_sem);

up? I do not see down...

> + new_exe_file = open_exec(pathbuf);
> + kfree(pathbuf);
> +
> + down_read(&mm->mmap_sem);

probably you meant "up" here. OK, I am ignoring ->mmap_sem, I can't
understand what did you really mean ;)

> + if (IS_ERR(new_exe_file))
> + return PTR_ERR(new_exe_file);
> +
> + /*
> + * We allow to change only those exe's which
> + * are not mapped several times.
> + */
> + if (mm->num_exe_file_vmas < 2) {
> + set_mm_exe_file(mm, new_exe_file);
> + ret = 0;
> + } else
> + ret = -EBUSY;
> +
> + return ret;

Both success/EBUSY leak new_exe_file. And I agree with Pavel,
prctl_set_mm_exe_file() should take fd, not filename.

I simply can't understand why set_mm_exe_file() is safe. What
if we race with another thread doing set_mm_exe_file() too?
Or it can race with added_exe_file_vma/removed_exe_file_vma.

And. set_mm_exe_file() sets ->num_exe_file_vmas = 0, this is
simply wrong? It should match the number of VM_EXECUTABLE
vmas.

In short, I do not understand the patch at all. It seems, you
only need to replace mm->exe_file under down_write(mmap_sem)
and nothing else.

Oleg.

2012-02-29 20:01:10

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [RFC] c/r: prctl: Add ability to set new mm_struct::exe_file

On Wed, Feb 29, 2012 at 08:24:00PM +0100, Oleg Nesterov wrote:
> On 02/29, Cyrill Gorcunov wrote:
> >
> > +static int prctl_set_mm_exe_file(struct mm_struct *mm,
> > + const void __user *path,
> > + size_t size)
> > +{
> > + struct file *new_exe_file;
> > + char *pathbuf;
> > + int ret = 0;
> > +
> > + if (size >= PATH_MAX)
> > + return -EINVAL;
> > +
> > + /*
> > + * We allow to change only those exe's which
> > + * are not mapped several times. This one
> > + * is early test while mmap_sem is taken.
> > + */
> > + if (mm->num_exe_file_vmas > 1)
> > + return -EBUSY;
>
> I don't really understand this check, but it is racy. Another thread
> can change ->num_exe_file_vmas right after the check.
>
> > + up_read(&mm->mmap_sem);
>
> up? I do not see down...

down is taken in calling routine (as pointed in comment on
prctl_set_mm_exe_file), thus I suppose I miss something since
the calling functions which increment/decrement num_exe_file_vmas
(such as mremap) do down_write(mmap_sem) first.

>
> > + new_exe_file = open_exec(pathbuf);
> > + kfree(pathbuf);
> > +
> > + down_read(&mm->mmap_sem);
>
> probably you meant "up" here. OK, I am ignoring ->mmap_sem, I can't
> understand what did you really mean ;)
>

nop, down instead ;)

> > + if (IS_ERR(new_exe_file))
> > + return PTR_ERR(new_exe_file);
> > +
> > + /*
> > + * We allow to change only those exe's which
> > + * are not mapped several times.
> > + */
> > + if (mm->num_exe_file_vmas < 2) {
> > + set_mm_exe_file(mm, new_exe_file);
> > + ret = 0;
> > + } else
> > + ret = -EBUSY;
> > +
> > + return ret;
>
> Both success/EBUSY leak new_exe_file. And I agree with Pavel,

yeah, will fix, thanks!

> prctl_set_mm_exe_file() should take fd, not filename.
>

yes, i'll switch to this idea

> I simply can't understand why set_mm_exe_file() is safe. What
> if we race with another thread doing set_mm_exe_file() too?
> Or it can race with added_exe_file_vma/removed_exe_file_vma.

really, Oleg, I don't see race here since this routine is
caller under down_read and I've been releasing mmap_sem for
short time then reacquiring it, and recheck for number of
num_exe_file_vmas. so I presume I miss something obvious
here.

>
> And. set_mm_exe_file() sets ->num_exe_file_vmas = 0, this is
> simply wrong? It should match the number of VM_EXECUTABLE
> vmas.
>

yes, it's a nit which sould be fixed. thanks!

> In short, I do not understand the patch at all. It seems, you
> only need to replace mm->exe_file under down_write(mmap_sem)
> and nothing else.

I can't just replace it, I wanted to check it the new symlink
will indeed point to executable (such ceheck btw is done
in open_exec() helper) and I actually wonted to replace
only freshly created executables which didn't have any
remaps on executable VMA (still I might be wrong
here and it's indeed safe just to replace old exe_file).

That's why I posted it as RFC and really appreciate
feedback (so, thanks a lot, Oleg!).

Cyrill