2010-11-23 14:42:12

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

On 10/25, KOSAKI Motohiro wrote:
>
> Because execve() makes new mm struct and setup stack and
> copy argv. It mean the task have two mm while execve() temporary.
> Unfortunately this nascent mm is not pointed any tasks, then
> OOM-killer can't detect this memory usage. therefore OOM-killer
> may kill incorrect task.
>
> Thus, this patch added signal->in_exec_mm member and track
> nascent mm usage.

Stupid question.

Can't we just account these allocations in the old -mm temporary?

IOW. Please look at the "patch" below. It is of course incomplete
and wrong (to the point inc_mm_counter() is not safe without
SPLIT_RSS_COUNTING), and copy_strings/flush_old_exec are not the
best places to play with mm-counters, just to explain what I mean.

It is very simple. copy_strings() increments MM_ANONPAGES every
time we add a new page into bprm->vma. This makes this memory
visible to select_bad_process().

When exec changes ->mm (or if it fails), we change MM_ANONPAGES
counter back.

Most probably I missed something, but what do you think?

Oleg.

--- x/include/linux/binfmts.h
+++ x/include/linux/binfmts.h
@@ -29,6 +29,7 @@ struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
#ifdef CONFIG_MMU
struct vm_area_struct *vma;
+ unsigned long mm_anonpages;
#else
# define MAX_ARG_PAGES 32
struct page *page[MAX_ARG_PAGES];
--- x/fs/exec.c
+++ x/fs/exec.c
@@ -457,6 +457,9 @@ static int copy_strings(int argc, const
goto out;
}

+ bmrp->mm_anonpages--;
+ inc_mm_counter(current->mm, MM_ANONPAGES);
+
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
kunmap(kmapped_page);
@@ -1003,6 +1006,7 @@ int flush_old_exec(struct linux_binprm *
/*
* Release all of the old mmap stuff
*/
+ add_mm_counter(current->mm, bprm->mm_anonpages);
retval = exec_mmap(bprm->mm);
if (retval)
goto out;
@@ -1426,8 +1430,10 @@ int do_execve(const char * filename,
return retval;

out:
- if (bprm->mm)
- mmput (bprm->mm);
+ if (bprm->mm) {
+ add_mm_counter(current->mm, bprm->mm_anonpages);
+ mmput(bprm->mm);
+ }

out_file:
if (bprm->file) {


2010-11-24 00:24:48

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

Hi

> On 10/25, KOSAKI Motohiro wrote:
> >
> > Because execve() makes new mm struct and setup stack and
> > copy argv. It mean the task have two mm while execve() temporary.
> > Unfortunately this nascent mm is not pointed any tasks, then
> > OOM-killer can't detect this memory usage. therefore OOM-killer
> > may kill incorrect task.
> >
> > Thus, this patch added signal->in_exec_mm member and track
> > nascent mm usage.
>
> Stupid question.
>
> Can't we just account these allocations in the old -mm temporary?
>
> IOW. Please look at the "patch" below. It is of course incomplete
> and wrong (to the point inc_mm_counter() is not safe without
> SPLIT_RSS_COUNTING), and copy_strings/flush_old_exec are not the
> best places to play with mm-counters, just to explain what I mean.
>
> It is very simple. copy_strings() increments MM_ANONPAGES every
> time we add a new page into bprm->vma. This makes this memory
> visible to select_bad_process().
>
> When exec changes ->mm (or if it fails), we change MM_ANONPAGES
> counter back.
>
> Most probably I missed something, but what do you think?

Because, If the pages of argv is swapping out when processing execve,
This accouing doesn't work.

Of cource, changing swapping-out logic is one of way. But I did hope
no VM core logic change. taking implict mlocking argv area during execve
is also one of option. But I did think implicit mlocking is more risky.

Is this enough explanation? Please don't hesitate say "no". If people
don't like my approach, I don't hesitate change my thinking.

Thanks.

2010-11-24 11:16:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

On 11/24, KOSAKI Motohiro wrote:
>
> Hi
>
> > On 10/25, KOSAKI Motohiro wrote:
> > >
> > > Because execve() makes new mm struct and setup stack and
> > > copy argv. It mean the task have two mm while execve() temporary.
> > > Unfortunately this nascent mm is not pointed any tasks, then
> > > OOM-killer can't detect this memory usage. therefore OOM-killer
> > > may kill incorrect task.
> > >
> > > Thus, this patch added signal->in_exec_mm member and track
> > > nascent mm usage.
> >
> > Stupid question.
> >
> > Can't we just account these allocations in the old -mm temporary?
> >
> > IOW. Please look at the "patch" below. It is of course incomplete
> > and wrong (to the point inc_mm_counter() is not safe without
> > SPLIT_RSS_COUNTING), and copy_strings/flush_old_exec are not the
> > best places to play with mm-counters, just to explain what I mean.
> >
> > It is very simple. copy_strings() increments MM_ANONPAGES every
> > time we add a new page into bprm->vma. This makes this memory
> > visible to select_bad_process().
> >
> > When exec changes ->mm (or if it fails), we change MM_ANONPAGES
> > counter back.
> >
> > Most probably I missed something, but what do you think?
>
> Because, If the pages of argv is swapping out when processing execve,
> This accouing doesn't work.

Why?

If copy_strings() inserts the new page into bprm->vma and then
this page is swapped out, inc_mm_counter(current->mm, MM_ANONPAGES)
becomes incorrect, yes. And we can't turn it into MM_SWAPENTS.

But does this really matter? oom_badness() counts MM_ANONPAGES +
MM_SWAPENTS, and result is the same.

> Is this enough explanation? Please don't hesitate say "no". If people
> don't like my approach, I don't hesitate change my thinking.

Well, certainly I can't say no ;)

But it would be nice to find a more simple fix (if it can work,
of course).


And. I need a simple solution for the older kernels.

Oleg.

2010-11-25 11:06:59

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

> > > Stupid question.
> > >
> > > Can't we just account these allocations in the old -mm temporary?
> > >
> > > IOW. Please look at the "patch" below. It is of course incomplete
> > > and wrong (to the point inc_mm_counter() is not safe without
> > > SPLIT_RSS_COUNTING), and copy_strings/flush_old_exec are not the
> > > best places to play with mm-counters, just to explain what I mean.
> > >
> > > It is very simple. copy_strings() increments MM_ANONPAGES every
> > > time we add a new page into bprm->vma. This makes this memory
> > > visible to select_bad_process().
> > >
> > > When exec changes ->mm (or if it fails), we change MM_ANONPAGES
> > > counter back.
> > >
> > > Most probably I missed something, but what do you think?
> >
> > Because, If the pages of argv is swapping out when processing execve,
> > This accouing doesn't work.
>
> Why?
>
> If copy_strings() inserts the new page into bprm->vma and then
> this page is swapped out, inc_mm_counter(current->mm, MM_ANONPAGES)
> becomes incorrect, yes. And we can't turn it into MM_SWAPENTS.
>
> But does this really matter? oom_badness() counts MM_ANONPAGES +
> MM_SWAPENTS, and result is the same.

Ah, I got it. I did too strongly get stucked correct accounting. but
you mean it's not must.

Okey, I'll tackle this one at this weekend hopefully.



> > Is this enough explanation? Please don't hesitate say "no". If people
> > don't like my approach, I don't hesitate change my thinking.
>
> Well, certainly I can't say no ;)
>
> But it would be nice to find a more simple fix (if it can work,
> of course).
>
>
> And. I need a simple solution for the older kernels.

Alright. It is certinally considerable one.

2010-11-25 14:10:01

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

On 11/25, KOSAKI Motohiro wrote:
>
> > > > It is very simple. copy_strings() increments MM_ANONPAGES every
> > > > time we add a new page into bprm->vma. This makes this memory
> > > > visible to select_bad_process().
> > > >
> > > > When exec changes ->mm (or if it fails), we change MM_ANONPAGES
> > > > counter back.
> > > >
> > > > Most probably I missed something, but what do you think?
> > >
> > > Because, If the pages of argv is swapping out when processing execve,
> > > This accouing doesn't work.
> >
> > Why?
> >
> > If copy_strings() inserts the new page into bprm->vma and then
> > this page is swapped out, inc_mm_counter(current->mm, MM_ANONPAGES)
> > becomes incorrect, yes. And we can't turn it into MM_SWAPENTS.
> >
> > But does this really matter? oom_badness() counts MM_ANONPAGES +
> > MM_SWAPENTS, and result is the same.
>
> Ah, I got it. I did too strongly get stucked correct accounting. but
> you mean it's not must.

Yes. In fact, I _think_ this patch makes accounting better, even if
the extra MM_ANONPAGES numbers are not 100% correct.

Even if we add signal->in_exec_mm, nobody except oom_badness() will
look at it.

With this patch, say, /proc/pid/statm or /proc/pid/status will report
the memory allocated by the execing task. Even if technically this is
not correct (and 'swap' part may be wrong), this makes sense imho.
Otherwise, there is no way to see that this task allocates (may be
a lot) of memory.

This can "confuse" update_hiwater_rss(), but imho this is fine too.


> > > Is this enough explanation? Please don't hesitate say "no". If people
> > > don't like my approach, I don't hesitate change my thinking.
> >
> > Well, certainly I can't say no ;)
> >
> > But it would be nice to find a more simple fix (if it can work,
> > of course).
> >
> >
> > And. I need a simple solution for the older kernels.
>
> Alright. It is certinally considerable one.

Great! I'll send the patch tomorrow.

Even if you prefer another fix for 2.6.37/stable, I'd like to see
your review to know if it is correct or not (for backporting).

Oleg.

2010-11-25 19:44:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

On 11/25, Oleg Nesterov wrote:
>
> Great! I'll send the patch tomorrow.
>
> Even if you prefer another fix for 2.6.37/stable, I'd like to see
> your review to know if it is correct or not (for backporting).

OK, what do you think about the patch below?

Seems to work, with this patch the test-case doesn't kill the
system (sysctl_oom_kill_allocating_task == 0).

I didn't dare to change !CONFIG_MMU case, I do not know how to
test it.

The patch is not complete, compat_copy_strings() needs changes.
But, shouldn't it use get_arg_page() too? Otherwise, where do
we check RLIMIT_STACK?

The patch asks for the cleanups. In particular, I think exec_mmap()
should accept bprm, not mm. But I'd prefer to do this later.

Oleg.

include/linux/binfmts.h | 1 +
fs/exec.c | 28 ++++++++++++++++++++++++++--
2 files changed, 27 insertions(+), 2 deletions(-)

--- K/include/linux/binfmts.h~acct_exec_mem 2010-08-19 11:35:00.000000000 +0200
+++ K/include/linux/binfmts.h 2010-11-25 20:19:33.000000000 +0100
@@ -33,6 +33,7 @@ struct linux_binprm{
# define MAX_ARG_PAGES 32
struct page *page[MAX_ARG_PAGES];
#endif
+ unsigned long vma_pages;
struct mm_struct *mm;
unsigned long p; /* current top of mem */
unsigned int
--- K/fs/exec.c~acct_exec_mem 2010-11-25 15:16:56.000000000 +0100
+++ K/fs/exec.c 2010-11-25 20:20:49.000000000 +0100
@@ -162,6 +162,25 @@ out:
return error;
}

+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+ struct mm_struct *mm = current->mm;
+ long diff = pages - bprm->vma_pages;
+
+ if (!mm || !diff)
+ return;
+
+ bprm->vma_pages += diff;
+
+#ifdef SPLIT_RSS_COUNTING
+ add_mm_counter(mm, MM_ANONPAGES, diff);
+#else
+ spin_lock(&mm->page_table_lock);
+ add_mm_counter(mm, MM_ANONPAGES, diff);
+ spin_unlock(&mm->page_table_lock);
+#endif
+}
+
#ifdef CONFIG_MMU

static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
@@ -186,6 +205,8 @@ static struct page *get_arg_page(struct
unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
struct rlimit *rlim;

+ acct_arg_size(bprm, size / PAGE_SIZE);
+
/*
* We've historically supported up to 32 pages (ARG_MAX)
* of argument strings even with small stacks
@@ -1003,6 +1024,7 @@ int flush_old_exec(struct linux_binprm *
/*
* Release all of the old mmap stuff
*/
+ acct_arg_size(bprm, 0);
retval = exec_mmap(bprm->mm);
if (retval)
goto out;
@@ -1426,8 +1448,10 @@ int do_execve(const char * filename,
return retval;

out:
- if (bprm->mm)
- mmput (bprm->mm);
+ if (bprm->mm) {
+ acct_arg_size(bprm, 0);
+ mmput(bprm->mm);
+ }

out_file:
if (bprm->file) {

2010-11-29 05:25:37

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

> On 11/25, Oleg Nesterov wrote:
> >
> > Great! I'll send the patch tomorrow.
> >
> > Even if you prefer another fix for 2.6.37/stable, I'd like to see
> > your review to know if it is correct or not (for backporting).
>
> OK, what do you think about the patch below?

Great. Thanks a lot.


>
> Seems to work, with this patch the test-case doesn't kill the
> system (sysctl_oom_kill_allocating_task == 0).
>
> I didn't dare to change !CONFIG_MMU case, I do not know how to
> test it.
>
> The patch is not complete, compat_copy_strings() needs changes.
> But, shouldn't it use get_arg_page() too? Otherwise, where do
> we check RLIMIT_STACK?
>

Because NOMMU doesn't have variable length argv. Instead it is still
using MAX_ARG_STRLEN as old MMU code.

32 pages hard coded argv limitation naturally prevent this nascent mm
issue.


> The patch asks for the cleanups. In particular, I think exec_mmap()
> should accept bprm, not mm. But I'd prefer to do this later.
>
> Oleg.

General request. Please consider to keep Brad's reported-by tag.


>
> include/linux/binfmts.h | 1 +
> fs/exec.c | 28 ++++++++++++++++++++++++++--
> 2 files changed, 27 insertions(+), 2 deletions(-)
>
> --- K/include/linux/binfmts.h~acct_exec_mem 2010-08-19 11:35:00.000000000 +0200
> +++ K/include/linux/binfmts.h 2010-11-25 20:19:33.000000000 +0100
> @@ -33,6 +33,7 @@ struct linux_binprm{
> # define MAX_ARG_PAGES 32
> struct page *page[MAX_ARG_PAGES];
> #endif
> + unsigned long vma_pages;
> struct mm_struct *mm;
> unsigned long p; /* current top of mem */
> unsigned int
> --- K/fs/exec.c~acct_exec_mem 2010-11-25 15:16:56.000000000 +0100
> +++ K/fs/exec.c 2010-11-25 20:20:49.000000000 +0100
> @@ -162,6 +162,25 @@ out:
> return error;
> }
>
> +static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)

Please move this function into #ifdef CONFIG_MMU. nommu code doesn't use it.

> +{
> + struct mm_struct *mm = current->mm;
> + long diff = pages - bprm->vma_pages;

I prefer to cast signed before assignment. It's safer more.


> +
> + if (!mm || !diff)
> + return;
> +
> + bprm->vma_pages += diff;
> +
> +#ifdef SPLIT_RSS_COUNTING
> + add_mm_counter(mm, MM_ANONPAGES, diff);
> +#else
> + spin_lock(&mm->page_table_lock);
> + add_mm_counter(mm, MM_ANONPAGES, diff);
> + spin_unlock(&mm->page_table_lock);
> +#endif

OK, looks good.


> +}
> +
> #ifdef CONFIG_MMU
>
> static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> @@ -186,6 +205,8 @@ static struct page *get_arg_page(struct
> unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
> struct rlimit *rlim;
>
> + acct_arg_size(bprm, size / PAGE_SIZE);
> +
> /*
> * We've historically supported up to 32 pages (ARG_MAX)
> * of argument strings even with small stacks
> @@ -1003,6 +1024,7 @@ int flush_old_exec(struct linux_binprm *
> /*
> * Release all of the old mmap stuff
> */
> + acct_arg_size(bprm, 0);

Why do we need this unacct here? I mean 1) if exec_mmap() is success,
we don't need unaccount at all 2) if exec_mmap() is failure, an epilogue of
do_execve() does unaccount thing.


> retval = exec_mmap(bprm->mm);
> if (retval)
> goto out;
> @@ -1426,8 +1448,10 @@ int do_execve(const char * filename,
> return retval;
>
> out:
> - if (bprm->mm)
> - mmput (bprm->mm);
> + if (bprm->mm) {
> + acct_arg_size(bprm, 0);
> + mmput(bprm->mm);
> + }
>
> out_file:
> if (bprm->file) {
>


2010-11-29 11:41:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

On 11/29, KOSAKI Motohiro wrote:
>
> > The patch is not complete, compat_copy_strings() needs changes.
> > But, shouldn't it use get_arg_page() too? Otherwise, where do
> > we check RLIMIT_STACK?
>
> Because NOMMU doesn't have variable length argv. Instead it is still
> using MAX_ARG_STRLEN as old MMU code.
>
> 32 pages hard coded argv limitation naturally prevent this nascent mm
> issue.

Ah, I didn't mean NOMMU. I meant compat_execve()->compat_copy_strings().
If a 32bit process execs we seem to miss the RLIMIT_STACK check, no?

> > The patch asks for the cleanups. In particular, I think exec_mmap()
> > should accept bprm, not mm. But I'd prefer to do this later.
> >
> > Oleg.
>
> General request. Please consider to keep Brad's reported-by tag.

Yes, yes, sure.

> > +static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)

OK.

> Please move this function into #ifdef CONFIG_MMU. nommu code doesn't use it.

Well it does, to revert the MM_ANONPAGES counter. I'll add the empty
function for NOMMU.

> > +{
> > + struct mm_struct *mm = current->mm;
> > + long diff = pages - bprm->vma_pages;
>
> I prefer to cast signed before assignment. It's safer more.

OK.

> > @@ -1003,6 +1024,7 @@ int flush_old_exec(struct linux_binprm *
> > /*
> > * Release all of the old mmap stuff
> > */
> > + acct_arg_size(bprm, 0);
>
> Why do we need this unacct here? I mean 1) if exec_mmap() is success,
> we don't need unaccount at all

Yes, we already killed all sub-threads. But this doesn't mean nobody
else can use current->mm, think about CLONE_VM. The simplest example
is vfork().

> 2) if exec_mmap() is failure, an epilogue of
> do_execve() does unaccount thing.

Yes.

Thanks Kosaki!

I'll resend v2 today. I am still not sure about compat_copy_strings()...

Oleg.

2010-11-29 18:31:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

On 11/29, Oleg Nesterov wrote:
>
> I'll resend v2 today.

OK, please see below, just for your review.

I was going to sent it "officially" with the changelog/etc, but

> I am still not sure about compat_copy_strings()...

Yes, I think it needs the same checks. It should use get_arg_page()
or we need more copy-and-paste code, I think it should also check
fatal_signal_pending() like copy_strings() does.

I was going to export get_arg_page/acct_arg_size, but it is so
ugly. I'll try to find the way to unify copy_strings and
compat_copy_strings, not sure it is possible to do cleanly.

Probably this needs a separate patch in any case.

Oleg.

Changes:

- move acct_arg_size() under CONFIG_MMU

- add the "nop" version for NOMMMU

include/linux/binfmts.h | 1 +
fs/exec.c | 32 ++++++++++++++++++++++++++++++--
2 files changed, 31 insertions(+), 2 deletions(-)

--- K/include/linux/binfmts.h~acct_exec_mem 2010-08-19 11:35:00.000000000 +0200
+++ K/include/linux/binfmts.h 2010-11-29 17:29:35.000000000 +0100
@@ -29,6 +29,7 @@ struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
#ifdef CONFIG_MMU
struct vm_area_struct *vma;
+ unsigned long vma_pages;
#else
# define MAX_ARG_PAGES 32
struct page *page[MAX_ARG_PAGES];
--- K/fs/exec.c~acct_exec_mem 2010-11-25 15:16:56.000000000 +0100
+++ K/fs/exec.c 2010-11-29 17:51:43.000000000 +0100
@@ -164,6 +164,25 @@ out:

#ifdef CONFIG_MMU

+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+ struct mm_struct *mm = current->mm;
+ long diff = (long)(pages - bprm->vma_pages);
+
+ if (!mm || !diff)
+ return;
+
+ bprm->vma_pages = pages;
+
+#ifdef SPLIT_RSS_COUNTING
+ add_mm_counter(mm, MM_ANONPAGES, diff);
+#else
+ spin_lock(&mm->page_table_lock);
+ add_mm_counter(mm, MM_ANONPAGES, diff);
+ spin_unlock(&mm->page_table_lock);
+#endif
+}
+
static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
@@ -186,6 +205,8 @@ static struct page *get_arg_page(struct
unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
struct rlimit *rlim;

+ acct_arg_size(bprm, size / PAGE_SIZE);
+
/*
* We've historically supported up to 32 pages (ARG_MAX)
* of argument strings even with small stacks
@@ -276,6 +297,10 @@ static bool valid_arg_len(struct linux_b

#else

+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+}
+
static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
@@ -1003,6 +1028,7 @@ int flush_old_exec(struct linux_binprm *
/*
* Release all of the old mmap stuff
*/
+ acct_arg_size(bprm, 0);
retval = exec_mmap(bprm->mm);
if (retval)
goto out;
@@ -1426,8 +1452,10 @@ int do_execve(const char * filename,
return retval;

out:
- if (bprm->mm)
- mmput (bprm->mm);
+ if (bprm->mm) {
+ acct_arg_size(bprm, 0);
+ mmput(bprm->mm);
+ }

out_file:
if (bprm->file) {

2010-11-30 00:06:45

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH 4/4] oom: don't ignore rss in nascent mm

> On 11/29, KOSAKI Motohiro wrote:
> >
> > > The patch is not complete, compat_copy_strings() needs changes.
> > > But, shouldn't it use get_arg_page() too? Otherwise, where do
> > > we check RLIMIT_STACK?
> >
> > Because NOMMU doesn't have variable length argv. Instead it is still
> > using MAX_ARG_STRLEN as old MMU code.
> >
> > 32 pages hard coded argv limitation naturally prevent this nascent mm
> > issue.
>
> Ah, I didn't mean NOMMU. I meant compat_execve()->compat_copy_strings().
> If a 32bit process execs we seem to miss the RLIMIT_STACK check, no?

Ah, yes. that's bug. You have found more serious issue ;)



> > > The patch asks for the cleanups. In particular, I think exec_mmap()
> > > should accept bprm, not mm. But I'd prefer to do this later.
> > >
> > > Oleg.
> >
> > General request. Please consider to keep Brad's reported-by tag.
>
> Yes, yes, sure.
>
> > > +static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
>
> OK.
>
> > Please move this function into #ifdef CONFIG_MMU. nommu code doesn't use it.
>
> Well it does, to revert the MM_ANONPAGES counter. I'll add the empty
> function for NOMMU.
>
> > > +{
> > > + struct mm_struct *mm = current->mm;
> > > + long diff = pages - bprm->vma_pages;
> >
> > I prefer to cast signed before assignment. It's safer more.
>
> OK.
>
> > > @@ -1003,6 +1024,7 @@ int flush_old_exec(struct linux_binprm *
> > > /*
> > > * Release all of the old mmap stuff
> > > */
> > > + acct_arg_size(bprm, 0);
> >
> > Why do we need this unacct here? I mean 1) if exec_mmap() is success,
> > we don't need unaccount at all
>
> Yes, we already killed all sub-threads. But this doesn't mean nobody
> else can use current->mm, think about CLONE_VM. The simplest example
> is vfork().

Right you are.


> > 2) if exec_mmap() is failure, an epilogue of
> > do_execve() does unaccount thing.
>
> Yes.
>
> Thanks Kosaki!
>
> I'll resend v2 today. I am still not sure about compat_copy_strings()...
>
> Oleg.
>


2010-11-30 20:02:30

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/2] exec: more excessive argument size fixes for 2.6.37/stable

On 11/29, Oleg Nesterov wrote:
>
> I was going to export get_arg_page/acct_arg_size, but it is so
> ugly.

But I think this is the only option for 2.6.37/stable.

So. I am sending 2 patches, hopefully they fix the problems
and there are simple enough for 2.6.27/stable.

> I'll try to find the way to unify copy_strings and
> compat_copy_strings, not sure it is possible to do cleanly.

I'll send the cleanups which unify compat/non-compat code on
top of these fixes, this is not stable material.

Oleg.

2010-11-30 20:03:27

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/2] exec: make argv/envp memory visible to oom-killer

Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()->copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm->mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.

Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes ->mm or fails.

Reported-by: Brad Spengler <[email protected]>
By-discussion-with: KOSAKI Motohiro <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/binfmts.h | 1 +
fs/exec.c | 32 ++++++++++++++++++++++++++++++--
2 files changed, 31 insertions(+), 2 deletions(-)

--- K/include/linux/binfmts.h~acct_exec_mem 2010-11-30 18:27:15.000000000 +0100
+++ K/include/linux/binfmts.h 2010-11-30 18:28:54.000000000 +0100
@@ -29,6 +29,7 @@ struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
#ifdef CONFIG_MMU
struct vm_area_struct *vma;
+ unsigned long vma_pages;
#else
# define MAX_ARG_PAGES 32
struct page *page[MAX_ARG_PAGES];
--- K/fs/exec.c~acct_exec_mem 2010-11-30 18:27:15.000000000 +0100
+++ K/fs/exec.c 2010-11-30 18:28:54.000000000 +0100
@@ -164,6 +164,25 @@ out:

#ifdef CONFIG_MMU

+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+ struct mm_struct *mm = current->mm;
+ long diff = (long)(pages - bprm->vma_pages);
+
+ if (!mm || !diff)
+ return;
+
+ bprm->vma_pages = pages;
+
+#ifdef SPLIT_RSS_COUNTING
+ add_mm_counter(mm, MM_ANONPAGES, diff);
+#else
+ spin_lock(&mm->page_table_lock);
+ add_mm_counter(mm, MM_ANONPAGES, diff);
+ spin_unlock(&mm->page_table_lock);
+#endif
+}
+
static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
@@ -186,6 +205,8 @@ static struct page *get_arg_page(struct
unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
struct rlimit *rlim;

+ acct_arg_size(bprm, size / PAGE_SIZE);
+
/*
* We've historically supported up to 32 pages (ARG_MAX)
* of argument strings even with small stacks
@@ -276,6 +297,10 @@ static bool valid_arg_len(struct linux_b

#else

+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+{
+}
+
static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
@@ -1003,6 +1028,7 @@ int flush_old_exec(struct linux_binprm *
/*
* Release all of the old mmap stuff
*/
+ acct_arg_size(bprm, 0);
retval = exec_mmap(bprm->mm);
if (retval)
goto out;
@@ -1426,8 +1452,10 @@ int do_execve(const char * filename,
return retval;

out:
- if (bprm->mm)
- mmput (bprm->mm);
+ if (bprm->mm) {
+ acct_arg_size(bprm, 0);
+ mmput(bprm->mm);
+ }

out_file:
if (bprm->file) {

2010-11-30 20:03:41

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/2] exec: copy-and-paste the fixes into compat_do_execve() paths

Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.

compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/binfmts.h | 4 ++++
fs/exec.c | 8 ++++----
fs/compat.c | 28 +++++++++++++++-------------
3 files changed, 23 insertions(+), 17 deletions(-)

--- K/include/linux/binfmts.h~compat_get_arg_page 2010-11-30 18:28:54.000000000 +0100
+++ K/include/linux/binfmts.h 2010-11-30 18:30:45.000000000 +0100
@@ -60,6 +60,10 @@ struct linux_binprm{
unsigned long loader, exec;
};

+extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
+extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+ int write);
+
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- K/fs/exec.c~compat_get_arg_page 2010-11-30 18:28:54.000000000 +0100
+++ K/fs/exec.c 2010-11-30 18:30:45.000000000 +0100
@@ -164,7 +164,7 @@ out:

#ifdef CONFIG_MMU

-static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -183,7 +183,7 @@ static void acct_arg_size(struct linux_b
#endif
}

-static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -297,11 +297,11 @@ static bool valid_arg_len(struct linux_b

#else

-static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
--- K/fs/compat.c~compat_get_arg_page 2010-11-30 17:55:20.000000000 +0100
+++ K/fs/compat.c 2010-11-30 18:30:45.000000000 +0100
@@ -1350,6 +1350,10 @@ static int compat_count(compat_uptr_t __
argv++;
if (i++ >= max)
return -E2BIG;
+
+ if (fatal_signal_pending(current))
+ return -ERESTARTNOHAND;
+ cond_resched();
}
}
return i;
@@ -1391,6 +1395,12 @@ static int compat_copy_strings(int argc,
while (len > 0) {
int offset, bytes_to_copy;

+ if (fatal_signal_pending(current)) {
+ ret = -ERESTARTNOHAND;
+ goto out;
+ }
+ cond_resched();
+
offset = pos % PAGE_SIZE;
if (offset == 0)
offset = PAGE_SIZE;
@@ -1407,18 +1417,8 @@ static int compat_copy_strings(int argc,
if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
struct page *page;

-#ifdef CONFIG_STACK_GROWSUP
- ret = expand_stack_downwards(bprm->vma, pos);
- if (ret < 0) {
- /* We've exceed the stack rlimit. */
- ret = -E2BIG;
- goto out;
- }
-#endif
- ret = get_user_pages(current, bprm->mm, pos,
- 1, 1, 1, &page, NULL);
- if (ret <= 0) {
- /* We've exceed the stack rlimit. */
+ page = get_arg_page(bprm, pos, 1);
+ if (!page) {
ret = -E2BIG;
goto out;
}
@@ -1539,8 +1539,10 @@ int compat_do_execve(char * filename,
return retval;

out:
- if (bprm->mm)
+ if (bprm->mm) {
+ acct_arg_size(bprm, 0);
mmput(bprm->mm);
+ }

out_file:
if (bprm->file) {

2010-11-30 20:07:16

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/4] exec: unify compat/non-compat code

(remove stable)

On 11/30, Oleg Nesterov wrote:
>
> I'll send the cleanups which unify compat/non-compat code on
> top of these fixes, this is not stable material.

On top of

[PATCH 1/2] exec: make argv/envp memory visible to oom-killer
[PATCH 2/2] exec: copy-and-paste the fixes into compat_do_execve() paths

Imho, execve code in fs/compat.c must die. It is very hard to
maintain this copy-and-paste horror.

Oleg.

2010-11-30 20:08:07

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/4] exec: introduce "bool compat" argument

No functional changes, preparation to simplify the review.

And the new (and currently unused) "bool compat" argument to
get_arg_ptr(), count(), and copy_strings().

Add this argument to do_execve() as well, and rename it to
do_execve_common().

Reintroduce do_execve() as a trivial wrapper() on top of
do_execve_common(compat => false).

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)

--- K/fs/exec.c~2_is_compat_arg 2010-11-30 19:14:54.000000000 +0100
+++ K/fs/exec.c 2010-11-30 19:47:24.000000000 +0100
@@ -391,7 +391,7 @@ err:
}

static const char __user *
-get_arg_ptr(const char __user * const __user *argv, int argc)
+get_arg_ptr(const char __user * const __user *argv, int argc, bool compat)
{
const char __user *ptr;

@@ -404,13 +404,13 @@ get_arg_ptr(const char __user * const __
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user * argv, int max)
+static int count(const char __user * const __user *argv, int max, bool compat)
{
int i = 0;

if (argv != NULL) {
for (;;) {
- const char __user *p = get_arg_ptr(argv, i);
+ const char __user *p = get_arg_ptr(argv, i, compat);

if (!p)
break;
@@ -435,7 +435,7 @@ static int count(const char __user * con
* ensures the destination page is created and not swapped out.
*/
static int copy_strings(int argc, const char __user *const __user *argv,
- struct linux_binprm *bprm)
+ struct linux_binprm *bprm, bool compat)
{
struct page *kmapped_page = NULL;
char *kaddr = NULL;
@@ -448,7 +448,7 @@ static int copy_strings(int argc, const
unsigned long pos;

ret = -EFAULT;
- str = get_arg_ptr(argv, argc);
+ str = get_arg_ptr(argv, argc, compat);
if (IS_ERR(str))
goto out;

@@ -531,7 +531,8 @@ int copy_strings_kernel(int argc, const
int r;
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv, bprm);
+ r = copy_strings(argc, (const char __user *const __user *)argv,
+ bprm, false);
set_fs(oldfs);
return r;
}
@@ -1382,10 +1383,10 @@ EXPORT_SYMBOL(search_binary_handler);
/*
* sys_execve() executes a new program.
*/
-int do_execve(const char * filename,
+static int do_execve_common(const char *filename,
const char __user *const __user *argv,
const char __user *const __user *envp,
- struct pt_regs * regs)
+ struct pt_regs *regs, bool compat)
{
struct linux_binprm *bprm;
struct file *file;
@@ -1427,11 +1428,11 @@ int do_execve(const char * filename,
if (retval)
goto out_file;

- bprm->argc = count(argv, MAX_ARG_STRINGS);
+ bprm->argc = count(argv, MAX_ARG_STRINGS, compat);
if ((retval = bprm->argc) < 0)
goto out;

- bprm->envc = count(envp, MAX_ARG_STRINGS);
+ bprm->envc = count(envp, MAX_ARG_STRINGS, compat);
if ((retval = bprm->envc) < 0)
goto out;

@@ -1444,11 +1445,11 @@ int do_execve(const char * filename,
goto out;

bprm->exec = bprm->p;
- retval = copy_strings(bprm->envc, envp, bprm);
+ retval = copy_strings(bprm->envc, envp, bprm, compat);
if (retval < 0)
goto out;

- retval = copy_strings(bprm->argc, argv, bprm);
+ retval = copy_strings(bprm->argc, argv, bprm, compat);
if (retval < 0)
goto out;

@@ -1492,6 +1493,14 @@ out_ret:
return retval;
}

+int do_execve(const char *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp,
+ struct pt_regs *regs)
+{
+ return do_execve_common(filename, argv, envp, regs, false);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;

2010-11-30 20:08:20

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/4] exec: introduce get_arg_ptr() helper

Introduce get_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

--- K/fs/exec.c~1_get_arg_ptr 2010-11-30 18:30:45.000000000 +0100
+++ K/fs/exec.c 2010-11-30 19:14:54.000000000 +0100
@@ -390,6 +390,17 @@ err:
return err;
}

+static const char __user *
+get_arg_ptr(const char __user * const __user *argv, int argc)
+{
+ const char __user *ptr;
+
+ if (get_user(ptr, argv + argc))
+ return ERR_PTR(-EFAULT);
+
+ return ptr;
+}
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -399,13 +410,14 @@ static int count(const char __user * con

if (argv != NULL) {
for (;;) {
- const char __user * p;
+ const char __user *p = get_arg_ptr(argv, i);

- if (get_user(p, argv))
- return -EFAULT;
if (!p)
break;
- argv++;
+
+ if (IS_ERR(p))
+ return -EFAULT;
+
if (i++ >= max)
return -E2BIG;

@@ -435,16 +447,18 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;

- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
- ret = -EFAULT;
+ ret = -EFAULT;
+ str = get_arg_ptr(argv, argc);
+ if (IS_ERR(str))
goto out;
- }

- if (!valid_arg_len(bprm, len)) {
- ret = -E2BIG;
+ len = strnlen_user(str, MAX_ARG_STRLEN);
+ if (!len)
+ goto out;
+
+ ret = -E2BIG;
+ if (!valid_arg_len(bprm, len))
goto out;
- }

/* We're going to work our way backwords. */
pos = bprm->p;

2010-11-30 20:08:30

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 3/4] exec: unify compat_do_execve() code

Teach get_arg_ptr() to handle compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 25 ++++++
fs/compat.c | 235 ------------------------------------------------------------
2 files changed, 25 insertions(+), 235 deletions(-)

--- K/fs/exec.c~3_use_compat 2010-11-30 19:47:24.000000000 +0100
+++ K/fs/exec.c 2010-11-30 20:15:11.000000000 +0100
@@ -55,6 +55,7 @@
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -395,6 +396,18 @@ get_arg_ptr(const char __user * const __
{
const char __user *ptr;

+#ifdef CONFIG_COMPAT
+ if (unlikely(compat)) {
+ compat_uptr_t __user *a = (void __user*)argv;
+ compat_uptr_t p;
+
+ if (get_user(p, a + argc))
+ return ERR_PTR(-EFAULT);
+
+ return compat_ptr(p);
+ }
+#endif
+
if (get_user(ptr, argv + argc))
return ERR_PTR(-EFAULT);

@@ -1501,6 +1514,18 @@ int do_execve(const char *filename,
return do_execve_common(filename, argv, envp, regs, false);
}

+#ifdef CONFIG_COMPAT
+int compat_do_execve(char * filename,
+ compat_uptr_t __user *argv,
+ compat_uptr_t __user *envp,
+ struct pt_regs * regs)
+{
+ return do_execve_common(filename,
+ (void __user*)argv, (void __user*)envp,
+ regs, true);
+}
+#endif
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
--- K/fs/compat.c~3_use_compat 2010-11-30 18:30:45.000000000 +0100
+++ K/fs/compat.c 2010-11-30 20:17:28.000000000 +0100
@@ -1330,241 +1330,6 @@ compat_sys_openat(unsigned int dfd, cons
return do_sys_open(dfd, filename, flags, mode);
}

-/*
- * compat_count() counts the number of arguments/envelopes. It is basically
- * a copy of count() from fs/exec.c, except that it works with 32 bit argv
- * and envp pointers.
- */
-static int compat_count(compat_uptr_t __user *argv, int max)
-{
- int i = 0;
-
- if (argv != NULL) {
- for (;;) {
- compat_uptr_t p;
-
- if (get_user(p, argv))
- return -EFAULT;
- if (!p)
- break;
- argv++;
- if (i++ >= max)
- return -E2BIG;
-
- if (fatal_signal_pending(current))
- return -ERESTARTNOHAND;
- cond_resched();
- }
- }
- return i;
-}
-
-/*
- * compat_copy_strings() is basically a copy of copy_strings() from fs/exec.c
- * except that it works with 32 bit argv and envp pointers.
- */
-static int compat_copy_strings(int argc, compat_uptr_t __user *argv,
- struct linux_binprm *bprm)
-{
- struct page *kmapped_page = NULL;
- char *kaddr = NULL;
- unsigned long kpos = 0;
- int ret;
-
- while (argc-- > 0) {
- compat_uptr_t str;
- int len;
- unsigned long pos;
-
- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
- ret = -EFAULT;
- goto out;
- }
-
- if (len > MAX_ARG_STRLEN) {
- ret = -E2BIG;
- goto out;
- }
-
- /* We're going to work our way backwords. */
- pos = bprm->p;
- str += len;
- bprm->p -= len;
-
- while (len > 0) {
- int offset, bytes_to_copy;
-
- if (fatal_signal_pending(current)) {
- ret = -ERESTARTNOHAND;
- goto out;
- }
- cond_resched();
-
- offset = pos % PAGE_SIZE;
- if (offset == 0)
- offset = PAGE_SIZE;
-
- bytes_to_copy = offset;
- if (bytes_to_copy > len)
- bytes_to_copy = len;
-
- offset -= bytes_to_copy;
- pos -= bytes_to_copy;
- str -= bytes_to_copy;
- len -= bytes_to_copy;
-
- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
- kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
- }
- if (copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
- }
- }
- }
- ret = 0;
-out:
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- return ret;
-}
-
-/*
- * compat_do_execve() is mostly a copy of do_execve(), with the exception
- * that it processes 32 bit argv and envp pointers.
- */
-int compat_do_execve(char * filename,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs * regs)
-{
- struct linux_binprm *bprm;
- struct file *file;
- struct files_struct *displaced;
- bool clear_in_exec;
- int retval;
-
- retval = unshare_files(&displaced);
- if (retval)
- goto out_ret;
-
- retval = -ENOMEM;
- bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
- if (!bprm)
- goto out_files;
-
- retval = prepare_bprm_creds(bprm);
- if (retval)
- goto out_free;
-
- retval = check_unsafe_exec(bprm);
- if (retval < 0)
- goto out_free;
- clear_in_exec = retval;
- current->in_execve = 1;
-
- file = open_exec(filename);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
-
- sched_exec();
-
- bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
-
- retval = bprm_mm_init(bprm);
- if (retval)
- goto out_file;
-
- bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
- if ((retval = bprm->argc) < 0)
- goto out;
-
- bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
- if ((retval = bprm->envc) < 0)
- goto out;
-
- retval = prepare_binprm(bprm);
- if (retval < 0)
- goto out;
-
- retval = copy_strings_kernel(1, &bprm->filename, bprm);
- if (retval < 0)
- goto out;
-
- bprm->exec = bprm->p;
- retval = compat_copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
-
- retval = compat_copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
-
- retval = search_binary_handler(bprm, regs);
- if (retval < 0)
- goto out;
-
- /* execve succeeded */
- current->fs->in_exec = 0;
- current->in_execve = 0;
- acct_update_integrals(current);
- free_bprm(bprm);
- if (displaced)
- put_files_struct(displaced);
- return retval;
-
-out:
- if (bprm->mm) {
- acct_arg_size(bprm, 0);
- mmput(bprm->mm);
- }
-
-out_file:
- if (bprm->file) {
- allow_write_access(bprm->file);
- fput(bprm->file);
- }
-
-out_unmark:
- if (clear_in_exec)
- current->fs->in_exec = 0;
- current->in_execve = 0;
-
-out_free:
- free_bprm(bprm);
-
-out_files:
- if (displaced)
- reset_files_struct(displaced);
-out_ret:
- return retval;
-}
-
#define __COMPAT_NFDBITS (8 * sizeof(compat_ulong_t))

static int poll_select_copy_remaining(struct timespec *end_time, void __user *p,

2010-11-30 20:08:50

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 4/4] exec: unexport acct_arg_size() and get_arg_page()

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/binfmts.h | 4 ----
fs/exec.c | 8 ++++----
2 files changed, 4 insertions(+), 8 deletions(-)

--- K/include/linux/binfmts.h~4_unexport_arg_helpers 2010-11-30 18:30:45.000000000 +0100
+++ K/include/linux/binfmts.h 2010-11-30 20:38:13.000000000 +0100
@@ -60,10 +60,6 @@ struct linux_binprm{
unsigned long loader, exec;
};

-extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
-extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
- int write);
-
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- K/fs/exec.c~4_unexport_arg_helpers 2010-11-30 20:15:11.000000000 +0100
+++ K/fs/exec.c 2010-11-30 20:38:13.000000000 +0100
@@ -165,7 +165,7 @@ out:

#ifdef CONFIG_MMU

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -184,7 +184,7 @@ void acct_arg_size(struct linux_binprm *
#endif
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -298,11 +298,11 @@ static bool valid_arg_len(struct linux_b

#else

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;

2010-12-01 00:12:53

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 1/2] exec: make argv/envp memory visible to oom-killer

> Brad Spengler published a local memory-allocation DoS that
> evades the OOM-killer (though not the virtual memory RLIMIT):
> http://www.grsecurity.net/~spender/64bit_dos.c
>
> execve()->copy_strings() can allocate a lot of memory, but
> this is not visible to oom-killer, nobody can see the nascent
> bprm->mm and take it into account.
>
> With this patch get_arg_page() increments current's MM_ANONPAGES
> counter every time we allocate the new page for argv/envp. When
> do_execve() succeds or fails, we change this counter back.
>
> Technically this is not 100% correct, we can't know if the new
> page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
> I don't think this really matters and everything becomes correct
> once exec changes ->mm or fails.
>
> Reported-by: Brad Spengler <[email protected]>
> By-discussion-with: KOSAKI Motohiro <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>

Looks good to me.
Reviewed-by: KOSAKI Motohiro <[email protected]>


Thank you very much.


> --- K/fs/exec.c~acct_exec_mem 2010-11-30 18:27:15.000000000 +0100
> +++ K/fs/exec.c 2010-11-30 18:28:54.000000000 +0100
> @@ -164,6 +164,25 @@ out:
>
> #ifdef CONFIG_MMU
>
> +static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)

One minor request.

I guess this function can easily makes confusing to a code reader. So I
hope you write small function comments. describe to
- What is oom nascent issue
- Why we think inaccurate account is ok


> +{
> + struct mm_struct *mm = current->mm;
> + long diff = (long)(pages - bprm->vma_pages);
> +
> + if (!mm || !diff)
> + return;
> +
> + bprm->vma_pages = pages;
> +
> +#ifdef SPLIT_RSS_COUNTING
> + add_mm_counter(mm, MM_ANONPAGES, diff);
> +#else
> + spin_lock(&mm->page_table_lock);
> + add_mm_counter(mm, MM_ANONPAGES, diff);
> + spin_unlock(&mm->page_table_lock);
> +#endif
> +}
> +



2010-12-01 03:04:41

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 2/2] exec: copy-and-paste the fixes into compat_do_execve() paths

> Note: this patch targets 2.6.37 and tries to be as simple as possible.
> That is why it adds more copy-and-paste horror into fs/compat.c and
> uglifies fs/exec.c, this will be cleanuped later.
>
> compat_copy_strings() plays with bprm->vma/mm directly and thus has
> two problems: it lacks the RLIMIT_STACK check and argv/envp memory
> is not visible to oom killer.
>
> Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
> to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
> as do_execve() does.
>
> Add the fatal_signal_pending/cond_resched checks into compat_count() and
> compat_copy_strings(), this matches the code in fs/exec.c and certainly
> makes sense.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Looks good to me.
Reviewed-by: KOSAKI Motohiro <[email protected]>


2010-12-01 03:09:53

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 0/4] exec: unify compat/non-compat code

> (remove stable)
>
> On 11/30, Oleg Nesterov wrote:
> >
> > I'll send the cleanups which unify compat/non-compat code on
> > top of these fixes, this is not stable material.
>
> On top of
>
> [PATCH 1/2] exec: make argv/envp memory visible to oom-killer
> [PATCH 2/2] exec: copy-and-paste the fixes into compat_do_execve() paths
>
> Imho, execve code in fs/compat.c must die. It is very hard to
> maintain this copy-and-paste horror.

I strongly like this series. (yes, I made fault to forgot to change compat.c
multiple times ;)

Unfortunatelly, this is a bit large and I have no time now. I expect I
can review this at this or next weekend.....
Hopefully, anyoneelse will review this and ignore me....


2010-12-01 17:38:36

by Milton Miller

[permalink] [raw]
Subject: (No subject header)

On Tue, 30 Nov 2010 about 20:01:29 -0000, Oleg Nesterov wrote:
> Teach get_arg_ptr() to handle compat = T case correctly.

> #include <asm/uaccess.h>
> #include <asm/mmu_context.h>
> @@ -395,6 +396,18 @@ get_arg_ptr(const char __user * const __
> {
> const char __user *ptr;
>
> +#ifdef CONFIG_COMPAT
> + if (unlikely(compat)) {

This should not be marked unlikely. Unlikely tells gcc the path
with over 99% confidence and disables branch predictors on some
architectures. If called from a compat processes this will result
in a mispredicted branch every iteration. Just use if (compat)
and let the hardware branch predictors do their job.

> + compat_uptr_t __user *a = (void __user*)argv;
> + compat_uptr_t p;
> +
> + if (get_user(p, a + argc))
> + return ERR_PTR(-EFAULT);
> +
> + return compat_ptr(p);
> + }
> +#endif
> +
> if (get_user(ptr, argv + argc))
> return ERR_PTR(-EFAULT);
>
> @@ -1501,6 +1514,18 @@ int do_execve(const char *filename,
> return do_execve_common(filename, argv, envp, regs, false);
> }
>
> +#ifdef CONFIG_COMPAT
> +int compat_do_execve(char * filename,
> + compat_uptr_t __user *argv,
> + compat_uptr_t __user *envp,
> + struct pt_regs * regs)
> +{
> + return do_execve_common(filename,
> + (void __user*)argv, (void __user*)envp,

Shouldn't these be compat_ptr(argv)? (makes a difference on s390)

> + regs, true);
> +}
> +#endif
> +
> void set_binfmt(struct linux_binfmt *new)
> {
> struct mm_struct *mm = current->mm;

Thanks,
milton

2010-12-01 18:15:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/2] exec: make argv/envp memory visible to oom-killer

On 12/01, KOSAKI Motohiro wrote:
>
> > +static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
>
> One minor request.
>
> I guess this function can easily makes confusing to a code reader. So I
> hope you write small function comments. describe to
> - What is oom nascent issue
> - Why we think inaccurate account is ok

Agreed, this needs a comment.

The patch was already applied, I'll send a separate one on top
of the next "unify exec/compat" series. Or, I'll add the comments
into this series, depending on review.

Thanks,

Oleg.

2010-12-01 18:34:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: (No subject header)

On 12/01, Milton Miller wrote:
>
> On Tue, 30 Nov 2010 about 20:01:29 -0000, Oleg Nesterov wrote:
> > Teach get_arg_ptr() to handle compat = T case correctly.
>
> > #include <asm/uaccess.h>
> > #include <asm/mmu_context.h>
> > @@ -395,6 +396,18 @@ get_arg_ptr(const char __user * const __
> > {
> > const char __user *ptr;
> >
> > +#ifdef CONFIG_COMPAT
> > + if (unlikely(compat)) {
>
> This should not be marked unlikely. Unlikely tells gcc the path
> with over 99% confidence and disables branch predictors on some
> architectures. If called from a compat processes this will result
> in a mispredicted branch every iteration. Just use if (compat)
> and let the hardware branch predictors do their job.

This applies to almost every likely/unlikely, and I think that compat
processes should fall into "unlikely category". But I don't really mind,
I can remove this hint, I added it mostly as documentation.

> > +#ifdef CONFIG_COMPAT
> > +int compat_do_execve(char * filename,
> > + compat_uptr_t __user *argv,
> > + compat_uptr_t __user *envp,
> > + struct pt_regs * regs)
> > +{
> > + return do_execve_common(filename,
> > + (void __user*)argv, (void __user*)envp,
>
> Shouldn't these be compat_ptr(argv)? (makes a difference on s390)

I'll recheck, but I don't think so. Please note that compat_ptr()
accepts "compat_uptr_t", not "compat_uptr_t *".

argv should be correct as a pointer to user-space, otherwise the
current code is buggy. For example, compat_do_execve() passes
argv to compat_count() which does get_user(argv) without any
conversion.

IOW, even if this should be fixed, I think this have nothing to
do with this patch. But I'll recheck, thanks.

Oleg.

2011-02-25 18:01:22

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/4 RESEND] exec: unify compat/non-compat code

On 12/01, Oleg Nesterov wrote:
>
> On 12/01, Milton Miller wrote:
> >
> > > +#ifdef CONFIG_COMPAT
> > > +int compat_do_execve(char * filename,
> > > + compat_uptr_t __user *argv,
> > > + compat_uptr_t __user *envp,
> > > + struct pt_regs * regs)
> > > +{
> > > + return do_execve_common(filename,
> > > + (void __user*)argv, (void __user*)envp,
> >
> > Shouldn't these be compat_ptr(argv)? (makes a difference on s390)
>
> I'll recheck, but I don't think so. Please note that compat_ptr()
> accepts "compat_uptr_t", not "compat_uptr_t *".
>
> argv should be correct as a pointer to user-space, otherwise the
> current code is buggy. For example, compat_do_execve() passes
> argv to compat_count() which does get_user(argv) without any
> conversion.

So, once again, this should not (and can not) be compat_ptr(argv) afaics.

I don't understand the s390 asm, but compat_wrapper.S:sys32_execve_wrapper
looks correct. If not, the current code is already buggy and s390 should
be fixed. argv/envp are not compat ptrs, they just point to compat_ data,
we should not do any conversion.

I am resending this series unchanged, plus the trivial 5/5 to document
acct_arg_size().

----------------------------------------------------------------------

execve code in fs/compat.c must die. It is very hard to maintain this
copy-and-paste horror. And the only reason for this duplication is that
argv/envp point to char* or compat_uptr_t depending on compat. We can
add the trivial helper which hides the difference and unify the code.

Oleg.

2011-02-25 18:01:46

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/5] exec: introduce "bool compat" argument

No functional changes, preparation to simplify the review.

And the new (and currently unused) "bool compat" argument to
get_arg_ptr(), count(), and copy_strings().

Add this argument to do_execve() as well, and rename it to
do_execve_common().

Reintroduce do_execve() as a trivial wrapper() on top of
do_execve_common(compat => false).

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)

--- 38/fs/exec.c~2_is_compat_arg 2011-02-25 18:04:50.000000000 +0100
+++ 38/fs/exec.c 2011-02-25 18:05:05.000000000 +0100
@@ -396,7 +396,7 @@ err:
}

static const char __user *
-get_arg_ptr(const char __user * const __user *argv, int argc)
+get_arg_ptr(const char __user * const __user *argv, int argc, bool compat)
{
const char __user *ptr;

@@ -409,13 +409,13 @@ get_arg_ptr(const char __user * const __
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user * argv, int max)
+static int count(const char __user * const __user *argv, int max, bool compat)
{
int i = 0;

if (argv != NULL) {
for (;;) {
- const char __user *p = get_arg_ptr(argv, i);
+ const char __user *p = get_arg_ptr(argv, i, compat);

if (!p)
break;
@@ -440,7 +440,7 @@ static int count(const char __user * con
* ensures the destination page is created and not swapped out.
*/
static int copy_strings(int argc, const char __user *const __user *argv,
- struct linux_binprm *bprm)
+ struct linux_binprm *bprm, bool compat)
{
struct page *kmapped_page = NULL;
char *kaddr = NULL;
@@ -453,7 +453,7 @@ static int copy_strings(int argc, const
unsigned long pos;

ret = -EFAULT;
- str = get_arg_ptr(argv, argc);
+ str = get_arg_ptr(argv, argc, compat);
if (IS_ERR(str))
goto out;

@@ -536,7 +536,8 @@ int copy_strings_kernel(int argc, const
int r;
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv, bprm);
+ r = copy_strings(argc, (const char __user *const __user *)argv,
+ bprm, false);
set_fs(oldfs);
return r;
}
@@ -1387,10 +1388,10 @@ EXPORT_SYMBOL(search_binary_handler);
/*
* sys_execve() executes a new program.
*/
-int do_execve(const char * filename,
+static int do_execve_common(const char *filename,
const char __user *const __user *argv,
const char __user *const __user *envp,
- struct pt_regs * regs)
+ struct pt_regs *regs, bool compat)
{
struct linux_binprm *bprm;
struct file *file;
@@ -1432,11 +1433,11 @@ int do_execve(const char * filename,
if (retval)
goto out_file;

- bprm->argc = count(argv, MAX_ARG_STRINGS);
+ bprm->argc = count(argv, MAX_ARG_STRINGS, compat);
if ((retval = bprm->argc) < 0)
goto out;

- bprm->envc = count(envp, MAX_ARG_STRINGS);
+ bprm->envc = count(envp, MAX_ARG_STRINGS, compat);
if ((retval = bprm->envc) < 0)
goto out;

@@ -1449,11 +1450,11 @@ int do_execve(const char * filename,
goto out;

bprm->exec = bprm->p;
- retval = copy_strings(bprm->envc, envp, bprm);
+ retval = copy_strings(bprm->envc, envp, bprm, compat);
if (retval < 0)
goto out;

- retval = copy_strings(bprm->argc, argv, bprm);
+ retval = copy_strings(bprm->argc, argv, bprm, compat);
if (retval < 0)
goto out;

@@ -1497,6 +1498,14 @@ out_ret:
return retval;
}

+int do_execve(const char *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp,
+ struct pt_regs *regs)
+{
+ return do_execve_common(filename, argv, envp, regs, false);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;

2011-02-25 18:01:52

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/5] exec: introduce get_arg_ptr() helper

Introduce get_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

--- 38/fs/exec.c~1_get_arg_ptr 2011-02-25 18:01:59.000000000 +0100
+++ 38/fs/exec.c 2011-02-25 18:04:50.000000000 +0100
@@ -395,6 +395,17 @@ err:
return err;
}

+static const char __user *
+get_arg_ptr(const char __user * const __user *argv, int argc)
+{
+ const char __user *ptr;
+
+ if (get_user(ptr, argv + argc))
+ return ERR_PTR(-EFAULT);
+
+ return ptr;
+}
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -404,13 +415,14 @@ static int count(const char __user * con

if (argv != NULL) {
for (;;) {
- const char __user * p;
+ const char __user *p = get_arg_ptr(argv, i);

- if (get_user(p, argv))
- return -EFAULT;
if (!p)
break;
- argv++;
+
+ if (IS_ERR(p))
+ return -EFAULT;
+
if (i++ >= max)
return -E2BIG;

@@ -440,16 +452,18 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;

- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
- ret = -EFAULT;
+ ret = -EFAULT;
+ str = get_arg_ptr(argv, argc);
+ if (IS_ERR(str))
goto out;
- }

- if (!valid_arg_len(bprm, len)) {
- ret = -E2BIG;
+ len = strnlen_user(str, MAX_ARG_STRLEN);
+ if (!len)
+ goto out;
+
+ ret = -E2BIG;
+ if (!valid_arg_len(bprm, len))
goto out;
- }

/* We're going to work our way backwords. */
pos = bprm->p;

2011-02-25 18:02:37

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 3/5] exec: unify compat_do_execve() code

Teach get_arg_ptr() to handle compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 25 ++++++
fs/compat.c | 235 ------------------------------------------------------------
2 files changed, 25 insertions(+), 235 deletions(-)

--- 38/fs/exec.c~3_use_compat 2011-02-25 18:05:05.000000000 +0100
+++ 38/fs/exec.c 2011-02-25 18:05:17.000000000 +0100
@@ -55,6 +55,7 @@
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -400,6 +401,18 @@ get_arg_ptr(const char __user * const __
{
const char __user *ptr;

+#ifdef CONFIG_COMPAT
+ if (unlikely(compat)) {
+ compat_uptr_t __user *a = (void __user *)argv;
+ compat_uptr_t p;
+
+ if (get_user(p, a + argc))
+ return ERR_PTR(-EFAULT);
+
+ return compat_ptr(p);
+ }
+#endif
+
if (get_user(ptr, argv + argc))
return ERR_PTR(-EFAULT);

@@ -1506,6 +1519,18 @@ int do_execve(const char *filename,
return do_execve_common(filename, argv, envp, regs, false);
}

+#ifdef CONFIG_COMPAT
+int compat_do_execve(char *filename,
+ compat_uptr_t __user *argv,
+ compat_uptr_t __user *envp,
+ struct pt_regs *regs)
+{
+ return do_execve_common(filename,
+ (void __user *)argv, (void __user*)envp,
+ regs, true);
+}
+#endif
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
--- 38/fs/compat.c~3_use_compat 2011-02-25 18:01:58.000000000 +0100
+++ 38/fs/compat.c 2011-02-25 18:05:17.000000000 +0100
@@ -1330,241 +1330,6 @@ compat_sys_openat(unsigned int dfd, cons
return do_sys_open(dfd, filename, flags, mode);
}

-/*
- * compat_count() counts the number of arguments/envelopes. It is basically
- * a copy of count() from fs/exec.c, except that it works with 32 bit argv
- * and envp pointers.
- */
-static int compat_count(compat_uptr_t __user *argv, int max)
-{
- int i = 0;
-
- if (argv != NULL) {
- for (;;) {
- compat_uptr_t p;
-
- if (get_user(p, argv))
- return -EFAULT;
- if (!p)
- break;
- argv++;
- if (i++ >= max)
- return -E2BIG;
-
- if (fatal_signal_pending(current))
- return -ERESTARTNOHAND;
- cond_resched();
- }
- }
- return i;
-}
-
-/*
- * compat_copy_strings() is basically a copy of copy_strings() from fs/exec.c
- * except that it works with 32 bit argv and envp pointers.
- */
-static int compat_copy_strings(int argc, compat_uptr_t __user *argv,
- struct linux_binprm *bprm)
-{
- struct page *kmapped_page = NULL;
- char *kaddr = NULL;
- unsigned long kpos = 0;
- int ret;
-
- while (argc-- > 0) {
- compat_uptr_t str;
- int len;
- unsigned long pos;
-
- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
- ret = -EFAULT;
- goto out;
- }
-
- if (len > MAX_ARG_STRLEN) {
- ret = -E2BIG;
- goto out;
- }
-
- /* We're going to work our way backwords. */
- pos = bprm->p;
- str += len;
- bprm->p -= len;
-
- while (len > 0) {
- int offset, bytes_to_copy;
-
- if (fatal_signal_pending(current)) {
- ret = -ERESTARTNOHAND;
- goto out;
- }
- cond_resched();
-
- offset = pos % PAGE_SIZE;
- if (offset == 0)
- offset = PAGE_SIZE;
-
- bytes_to_copy = offset;
- if (bytes_to_copy > len)
- bytes_to_copy = len;
-
- offset -= bytes_to_copy;
- pos -= bytes_to_copy;
- str -= bytes_to_copy;
- len -= bytes_to_copy;
-
- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
- kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
- }
- if (copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
- }
- }
- }
- ret = 0;
-out:
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- return ret;
-}
-
-/*
- * compat_do_execve() is mostly a copy of do_execve(), with the exception
- * that it processes 32 bit argv and envp pointers.
- */
-int compat_do_execve(char * filename,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs * regs)
-{
- struct linux_binprm *bprm;
- struct file *file;
- struct files_struct *displaced;
- bool clear_in_exec;
- int retval;
-
- retval = unshare_files(&displaced);
- if (retval)
- goto out_ret;
-
- retval = -ENOMEM;
- bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
- if (!bprm)
- goto out_files;
-
- retval = prepare_bprm_creds(bprm);
- if (retval)
- goto out_free;
-
- retval = check_unsafe_exec(bprm);
- if (retval < 0)
- goto out_free;
- clear_in_exec = retval;
- current->in_execve = 1;
-
- file = open_exec(filename);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
-
- sched_exec();
-
- bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
-
- retval = bprm_mm_init(bprm);
- if (retval)
- goto out_file;
-
- bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
- if ((retval = bprm->argc) < 0)
- goto out;
-
- bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
- if ((retval = bprm->envc) < 0)
- goto out;
-
- retval = prepare_binprm(bprm);
- if (retval < 0)
- goto out;
-
- retval = copy_strings_kernel(1, &bprm->filename, bprm);
- if (retval < 0)
- goto out;
-
- bprm->exec = bprm->p;
- retval = compat_copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
-
- retval = compat_copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
-
- retval = search_binary_handler(bprm, regs);
- if (retval < 0)
- goto out;
-
- /* execve succeeded */
- current->fs->in_exec = 0;
- current->in_execve = 0;
- acct_update_integrals(current);
- free_bprm(bprm);
- if (displaced)
- put_files_struct(displaced);
- return retval;
-
-out:
- if (bprm->mm) {
- acct_arg_size(bprm, 0);
- mmput(bprm->mm);
- }
-
-out_file:
- if (bprm->file) {
- allow_write_access(bprm->file);
- fput(bprm->file);
- }
-
-out_unmark:
- if (clear_in_exec)
- current->fs->in_exec = 0;
- current->in_execve = 0;
-
-out_free:
- free_bprm(bprm);
-
-out_files:
- if (displaced)
- reset_files_struct(displaced);
-out_ret:
- return retval;
-}
-
#define __COMPAT_NFDBITS (8 * sizeof(compat_ulong_t))

static int poll_select_copy_remaining(struct timespec *end_time, void __user *p,

2011-02-25 18:03:13

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 4/5] exec: unexport acct_arg_size() and get_arg_page()

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/binfmts.h | 4 ----
fs/exec.c | 8 ++++----
2 files changed, 4 insertions(+), 8 deletions(-)

--- 38/include/linux/binfmts.h~4_unexport_arg_helpers 2011-02-25 18:01:57.000000000 +0100
+++ 38/include/linux/binfmts.h 2011-02-25 18:05:27.000000000 +0100
@@ -60,10 +60,6 @@ struct linux_binprm {
unsigned long loader, exec;
};

-extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
-extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
- int write);
-
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- 38/fs/exec.c~4_unexport_arg_helpers 2011-02-25 18:05:17.000000000 +0100
+++ 38/fs/exec.c 2011-02-25 18:05:27.000000000 +0100
@@ -165,7 +165,7 @@ out:

#ifdef CONFIG_MMU

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -184,7 +184,7 @@ void acct_arg_size(struct linux_binprm *
#endif
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -303,11 +303,11 @@ static bool valid_arg_len(struct linux_b

#else

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;

2011-02-25 18:03:27

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 5/5] exec: document acct_arg_size()

Add the comment to explain acct_arg_size().

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- 38/fs/exec.c~5_doc_acct_arg_size 2011-02-25 18:05:27.000000000 +0100
+++ 38/fs/exec.c 2011-02-25 18:05:34.000000000 +0100
@@ -164,7 +164,12 @@ out:
}

#ifdef CONFIG_MMU
-
+/*
+ * The nascent bprm->mm is not visible until exec_mmap() but it can
+ * use a lot of memory, account these pages in current->mm temporary
+ * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
+ * change the counter back via acct_arg_size(0).
+ */
static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;

2011-02-25 18:55:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/4 RESEND] exec: unify compat/non-compat code

On Fri, Feb 25, 2011 at 9:52 AM, Oleg Nesterov <[email protected]> wrote:
>> On 12/01, Milton Miller wrote:
>> >
>> > > +#ifdef CONFIG_COMPAT
>> > > +int compat_do_execve(char * filename,
>> > > + compat_uptr_t __user *argv,
>> > > + compat_uptr_t __user *envp,
>> > > + struct pt_regs * regs)
>> > > +{
>> > > + return do_execve_common(filename,
>> > > + ? ? ? ? ? ? ? ? ? ? ? ? (void __user*)argv, (void __user*)envp,
>> >
>> > Shouldn't these be compat_ptr(argv)? ?(makes a difference on s390)

Indeed. The "compat_uptr_t __user *argv" is wrong, and it should be just

compat_uptr_t argv;

and then every time you turn it into a pointer, it should use
"compat_ptr(argv)".

Then, since it's a pointer to an array of pointers, when you do that,
you should turn it into a pointer to "compat_uptr_t", so you actually
have this:

- user passes "compat_uptr_t"

- the kernel can turn that into "compat_uptr_t __user *" by doing

compat_uptr_t __user *pptr;
pptr = compat_ptr(argv);

- the kernel needs to fetch the individual entries with

compat_uptr_t cuptr = get_user(pptr);

- the kernel can then turn _those_ into the actual pointers to the string with

const char __user *str = compat_ptr(cuptr);

so you need two levels of compat_ptr() conversion.

> So, once again, this should not (and can not) be compat_ptr(argv) afaics.

It can be, and probably should. But the low-level s390 wrapper
function may have done one of the levels already. It probably
shouldn't, and we _should_ do the "compat_ptr()" thing a the generic C
level. That's what we do with all the other pointers, after all.

Linus

2011-02-25 18:58:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/5] exec: introduce "bool compat" argument

On Fri, Feb 25, 2011 at 9:52 AM, Oleg Nesterov <[email protected]> wrote:
> No functional changes, preparation to simplify the review.

I think this is wrong.

If you introduce the "bool compat" thing, you should also change the
type of the argument pointers to some opaque type at the same time.
It's no longer really a

const char __user *const __user *

pointer at that point. Trying to claim it is, is just wrong. The type
suddently becomes conditional on that 'compat' variable.

Linus

2011-02-25 19:11:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 3/5] exec: unify compat_do_execve() code

On Fri, Feb 25, 2011 at 9:53 AM, Oleg Nesterov <[email protected]> wrote:
> Teach get_arg_ptr() to handle compat = T case correctly.

Does it?

> +#ifdef CONFIG_COMPAT
> +int compat_do_execve(char *filename,
> + ? ? ? compat_uptr_t __user *argv,
> + ? ? ? compat_uptr_t __user *envp,
> + ? ? ? struct pt_regs *regs)
> +{
> + ? ? ? return do_execve_common(filename,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (void __user *)argv, (void __user*)envp,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? regs, true);
> +}
> +#endif

I really suspect this should be something like

typedef union {
compat_uptr_t compat;
const char __user *native;
} conditional_user_ptr_t;

...

int compat_do_execve(char *filename,
compat_uptr_t argv,
compat_uptr_t envp,
struct pt_regs *regs)
{
return do_execve_common(filename,
compat_ptr(argv), compat_ptr(envp), regs);

where that 'do_execve_common()' takes it's arguments as

union conditional_user_ptr_t __user *argv,
union conditional_user_ptr_t __user *envp

and then in get_arg_ptr() we do the proper union member dereference
depending on the "compat" flag.

THAT would actually have the type system help us track what is
actually going on, and would clarify the rules. It would also make it
clear that "do_execve_common()" does *not* take some kind of random
pointer to user space (much less a "const char __user *const char
__user *"). It really does take a pointer to user space, but what that
pointer contains in turn depends on the "compat" flag.

IOW, it really acts as a pointer to a user-space union, and I think
we'd be better off having the type show that.

Linus

2011-02-26 12:44:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/4 RESEND] exec: unify compat/non-compat code

On 02/25, Linus Torvalds wrote:
>
> On Fri, Feb 25, 2011 at 9:52 AM, Oleg Nesterov <[email protected]> wrote:
> >> On 12/01, Milton Miller wrote:
> >> >
> >> > > +#ifdef CONFIG_COMPAT
> >> > > +int compat_do_execve(char * filename,
> >> > > + compat_uptr_t __user *argv,
> >> > > + compat_uptr_t __user *envp,
> >> > > + struct pt_regs * regs)
> >> > > +{
> >> > > + return do_execve_common(filename,
> >> > > + ? ? ? ? ? ? ? ? ? ? ? ? (void __user*)argv, (void __user*)envp,
> >> >
> >> > Shouldn't these be compat_ptr(argv)? ?(makes a difference on s390)
>
> Indeed. The "compat_uptr_t __user *argv" is wrong, and it should be just
>
> compat_uptr_t argv;
>
> and then every time you turn it into a pointer, it should use
> "compat_ptr(argv)".

Oh, perhaps, and I was thinking about this too. But this is another
issue, no? Or I misunderstood.

First of all, I agree that perhaps it makes sense to change the
signature of compat_do_execve()

- compat_do_execve(compat_uptr_t __user *argv)
+ compat_do_execve(compat_uptr_t argv)

but this has nothing to do with this series. We can do this before
or after ("after" seems simpler").

> - user passes "compat_uptr_t"

Yes,

> - the kernel can turn that into "compat_uptr_t __user *" by doing
>
> compat_uptr_t __user *pptr;
> pptr = compat_ptr(argv);

Yes! and the kernel already does this before it calls compat_do_execve(),
iow compat_do_execve() gets the result of compat_ptr(compat_ptr_from_user).

> - the kernel needs to fetch the individual entries with
>
> compat_uptr_t cuptr = get_user(pptr);
>
> - the kernel can then turn _those_ into the actual pointers to the string with
>
> const char __user *str = compat_ptr(cuptr);

Yes, and this is exactly what get_arg_ptr(compat => true) does.

> > So, once again, this should not (and can not) be compat_ptr(argv) afaics.
>
> It can be, and probably should.

Only if we change the signature of compat_do_execve(). With the current
code yet another compat_ptr() is not needed and it is simply wrong, this
is what I meant when I replied to Milton.

> But the low-level s390 wrapper
> function may have done one of the levels already. It probably
> shouldn't, and we _should_ do the "compat_ptr()" thing a the generic C
> level.

Agreed, but currently this compat_ptr() thing belongs to the caller.

IOW. Lets look at the current code. arch/ calls
compat_do_execve(compat_uptr_t __user *argv)->compat_count(argv) which
does get_user(argv) without any conversion, because argv was already
converted or arch/ is buggy.

Both do_execve() and compat_do_execve() accept the valid pointer
which does not need any conversion. But this pointer points to different
things, either to "char*" of "compat_uptr_t".

However, please see my reply to 2-3/5, I agree that this is confusing
and can be cleanuped.

Or do you think I missed something else?

Oleg.

2011-02-26 12:46:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/5] exec: introduce "bool compat" argument

On 02/25, Linus Torvalds wrote:
>
> On Fri, Feb 25, 2011 at 9:52 AM, Oleg Nesterov <[email protected]> wrote:
> > No functional changes, preparation to simplify the review.
>
> I think this is wrong.
>
> If you introduce the "bool compat" thing, you should also change the
> type of the argument pointers to some opaque type at the same time.
> It's no longer really a
>
> const char __user *const __user *
>
> pointer at that point. Trying to claim it is, is just wrong. The type
> suddently becomes conditional on that 'compat' variable.

Yes, this is true.

And I agree this could be done in more clean way, just we need more
changed. Please see the next email.

Oleg.

2011-02-26 12:46:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/5] exec: unify compat_do_execve() code

On 02/25, Linus Torvalds wrote:
>
> On Fri, Feb 25, 2011 at 9:53 AM, Oleg Nesterov <[email protected]> wrote:
> > Teach get_arg_ptr() to handle compat = T case correctly.
>
> Does it?

I think it does.

> > +#ifdef CONFIG_COMPAT
> > +int compat_do_execve(char *filename,
> > + ? ? ? compat_uptr_t __user *argv,
> > + ? ? ? compat_uptr_t __user *envp,
> > + ? ? ? struct pt_regs *regs)
> > +{
> > + ? ? ? return do_execve_common(filename,
> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (void __user *)argv, (void __user*)envp,
> > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? regs, true);
> > +}
> > +#endif
>
> I really suspect this should be something like
>
> typedef union {
> compat_uptr_t compat;
> const char __user *native;
> } conditional_user_ptr_t;

Personally I don't really like this union, to me "void __user*" looks
better, but I won't insist.

> int compat_do_execve(char *filename,
> compat_uptr_t argv,
> compat_uptr_t envp,
> {
> return do_execve_common(filename,
> compat_ptr(argv), compat_ptr(envp), regs);

Indeed! But, again, this has nothing to do with this series. We can
do this later and change the callers in arch/.

> where that 'do_execve_common()' takes it's arguments as
>
> union conditional_user_ptr_t __user *argv,
> union conditional_user_ptr_t __user *envp
>
> and then in get_arg_ptr() we do the proper union member dereference
> depending on the "compat" flag.

Once again, to me "void __user*" looks better (just simpler). In this
case get_arg_ptr() becomes (without const/__user for the clarity)

void *get_arg_ptr(void **argv, int argc, bool compat)
{
char *ptr;

#ifdef CONFIG_COMPAT
if (unlikely(compat)) {
compat_uptr_t *a = argv;
compat_uptr_t p;

if (get_user(p, a + argc))
return ERR_PTR(-EFAULT);

return compat_ptr(p);
}
#endif

if (get_user(ptr, argv + argc))
return ERR_PTR(-EFAULT);

return ptr;
}

Otherwise, get_arg_ptr() should return conditional_user_ptr_t as well,
this looks like the unnecessary complication to me, but of course this
is subjective.

So, what do you think?

Oleg.

2011-02-26 13:06:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/5] exec: unify compat_do_execve() code

On 02/26, Oleg Nesterov wrote:
>
> Once again, to me "void __user*" looks better (just simpler). In this
> case get_arg_ptr() becomes (without const/__user for the clarity)
>
> void *get_arg_ptr(void **argv, int argc, bool compat)
> {
> char *ptr;
>
> #ifdef CONFIG_COMPAT
> if (unlikely(compat)) {
> compat_uptr_t *a = argv;
> compat_uptr_t p;
>
> if (get_user(p, a + argc))
> return ERR_PTR(-EFAULT);
>
> return compat_ptr(p);
> }
> #endif
>
> if (get_user(ptr, argv + argc))
> return ERR_PTR(-EFAULT);
>
> return ptr;
> }
>
> Otherwise, get_arg_ptr() should return conditional_user_ptr_t as well,

No, this is not true, I am stupid.

Still,

> this looks like the unnecessary complication to me, but of course this
> is subjective.
>
> So, what do you think?

Yes, please.

Oleg.

2011-02-26 15:56:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 3/5] exec: unify compat_do_execve() code

On Sat, Feb 26, 2011 at 4:37 AM, Oleg Nesterov <[email protected]> wrote:
>>
>> ? typedef union {
>> ? ? ?compat_uptr_t compat;
>> ? ? ?const char __user *native;
>> ? ?} conditional_user_ptr_t;
>
> Personally I don't really like this union, to me "void __user*" looks
> better, but I won't insist.

Umm. "void __user *" may look simpler/better, but it's WRONG.

Using "const char __user *const __user *" is correct - but only for
the non-compat case.

And using "void __user *" may result in compiling code, but it will
have lost all actual information about the type. We don't do that in
the kernel if we can avoid it, because "void *" basically does no type
checking. Sure, sometimes it's the only thing we can do, but _if_ we
have a type, we should use it.

And that "union" really is the true type. You are passing a user
pointer down that can be either of those members.

So if you think it looks ugly, then you shouldn't do that "conditional
compat argument at run-time at all". Because the real ugliness of the
type comes not from the type, but from the fact that you pass a
pointer that can contain two different things.


> Once again, to me "void __user*" looks better (just simpler). In this
> case get_arg_ptr() becomes (without const/__user for the clarity)

No.

I simply won't apply that. It's WRONG. It's wrong because you've
dropped all the type information.

With the right union,

> ? ? ? ?void *get_arg_ptr(void **argv, int argc, bool compat)
> ? ? ? ?{
> ? ? ? ? ? ? ? ?char *ptr;
>
> ? ? ? ?#ifdef CONFIG_COMPAT
> ? ? ? ? ? ? ? ?if (unlikely(compat)) {
> ? ? ? ? ? ? ? ? ? ? ? ?compat_uptr_t *a = argv;
> ? ? ? ? ? ? ? ? ? ? ? ?compat_uptr_t p;
>
> ? ? ? ? ? ? ? ? ? ? ? ?if (get_user(p, a + argc))
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?return ERR_PTR(-EFAULT);
>
> ? ? ? ? ? ? ? ? ? ? ? ?return compat_ptr(p);
> ? ? ? ? ? ? ? ?}
> ? ? ? ?#endif
>
> ? ? ? ? ? ? ? ?if (get_user(ptr, &argv. + argc))
> ? ? ? ? ? ? ? ? ? ? ? ?return ERR_PTR(-EFAULT);
>
> ? ? ? ? ? ? ? ?return ptr;
> ? ? ? ?}
>
> Otherwise, get_arg_ptr() should return conditional_user_ptr_t as well,

No it shouldn't. The get_arg_ptr() should always just return the
actual pointer. It will have _resolved_ the ambiguity! That's what the
"compat_ptr()" thing does in the return case inside teh CONFIG_COMPAT.

So the correct way to do this is something like the following (yeah,
maybe I got the syntax wrong, I didn't test this, I just wrote it in
my MUA):

void *get_arg_ptr(const union compat_ptr_union __user *argv,
int argc, bool compat)
{
char *ptr;

#ifdef CONFIG_COMPAT
if (unlikely(compat)) {
compat_uptr_t p;

if (get_user(p, &argv->compat + argc))
return ERR_PTR(-EFAULT);

return compat_ptr(p);
}
#endif

if (get_user(ptr, &argv->noncompat +argc))
return ERR_PTR(-EFAULT);

return ptr;
}

and notice how it gets the types right, and it even has one line LESS
than your version, exactly because it gets the types right and doesn't
need that implied cast in your

compat_uptr_t *a = argv;

(in fact, I think your version needs an _explicit_ cast in order to
not get a warning: you can't just cast "void **" to something else).

See? The advantage of the union is that the types are correct, which
means that the casts are unnecessary.

The advantage of the union is also that you see what is going on, and
it's clear from the function prototype that this doesn't just take a
random user pointer, it takes a user pointer to something that can be
two different types.

See? Correct typing is important.

Linus

2011-02-26 18:59:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/5] exec: unify compat_do_execve() code

On 02/26, Linus Torvalds wrote:
>
> On Sat, Feb 26, 2011 at 4:37 AM, Oleg Nesterov <[email protected]> wrote:
> >>
> > Otherwise, get_arg_ptr() should return conditional_user_ptr_t as well,
>
> No it shouldn't.

(Yes I am stupid, see the next email).

> and notice how it gets the types right, and it even has one line LESS
> than your version, exactly because it gets the types right and doesn't
> need that implied cast in your
>
> compat_uptr_t *a = argv;
>
> (in fact, I think your version needs an _explicit_ cast in order to
> not get a warning: you can't just cast "void **" to something else).

Yes, and get_user(argv) in the !compat case doesn't look nice, I agree.

> See? The advantage of the union is that the types are correct, which
> means that the casts are unnecessary.

My point was, apart from the trivial get_arg_ptr() helper, nobody else
uses this argv/envp, so I thought it is OK to drop the type info and
use "void *".

But as I said, I won't insist. I'll redo/resend.

Thanks.

Oleg.

2011-03-01 20:56:42

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 0/5] exec: unify native/compat code

On 02/26, Oleg Nesterov wrote:
>
> On 02/26, Linus Torvalds wrote:
> >
> > See? The advantage of the union is that the types are correct, which
> > means that the casts are unnecessary.
>
> My point was, apart from the trivial get_arg_ptr() helper, nobody else
> uses this argv/envp, so I thought it is OK to drop the type info and
> use "void *".
>
> But as I said, I won't insist. I'll redo/resend.

Well, yes... But it turns out I didn't actually read what you proposed.

typedef union {
compat_uptr_t compat;
const char __user *native;
} conditional_user_ptr_t;

...

where that 'do_execve_common()' takes it's arguments as

union conditional_user_ptr_t __user *argv,
union conditional_user_ptr_t __user *envp

I hope you didn't really mean this...

OK, we have two kinds of pointers, the union makes sense. But I think
we do not want the 3rd kind, pointer to the union. This can't help to
avoid the casts. Yes, get_arg_ptr() can do

&argv->native

but this still means the cast even if looks differently (and tricky).

And. How can we pass "argv" from do_execve() to do_execve_common() ?
We need another cast.

So. If you insist you prefer the pointer to the union - no need to
convince me. Just say this and I'll redo again.

This patch does:

typedef union {
const char __user *const __user *native;
compat_uptr_t __user *compat;
} conditional_user_ptr_t;

static int do_execve_common(const char *filename,
conditional_user_ptr_t argv,
conditional_user_ptr_t envp,
struct pt_regs *regs, bool compat)

get_arg_ptr() does argv.native/compat, this looks more understandable.

Do you agree?

copy_strings_kernel() still needs the cast, but this is only because
we want to add "__user" for annotation.

Oleg.

2011-03-01 20:57:03

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 1/5] exec: introduce get_arg_ptr() helper

Introduce get_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

--- 38/fs/exec.c~1_get_arg_ptr 2011-03-01 21:15:47.000000000 +0100
+++ 38/fs/exec.c 2011-03-01 21:17:45.000000000 +0100
@@ -395,6 +395,17 @@ err:
return err;
}

+static const char __user *
+get_arg_ptr(const char __user * const __user *argv, int argc)
+{
+ const char __user *ptr;
+
+ if (get_user(ptr, argv + argc))
+ return ERR_PTR(-EFAULT);
+
+ return ptr;
+}
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -404,13 +415,14 @@ static int count(const char __user * con

if (argv != NULL) {
for (;;) {
- const char __user * p;
+ const char __user *p = get_arg_ptr(argv, i);

- if (get_user(p, argv))
- return -EFAULT;
if (!p)
break;
- argv++;
+
+ if (IS_ERR(p))
+ return -EFAULT;
+
if (i++ >= max)
return -E2BIG;

@@ -440,16 +452,18 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;

- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
- ret = -EFAULT;
+ ret = -EFAULT;
+ str = get_arg_ptr(argv, argc);
+ if (IS_ERR(str))
goto out;
- }

- if (!valid_arg_len(bprm, len)) {
- ret = -E2BIG;
+ len = strnlen_user(str, MAX_ARG_STRLEN);
+ if (!len)
+ goto out;
+
+ ret = -E2BIG;
+ if (!valid_arg_len(bprm, len))
goto out;
- }

/* We're going to work our way backwords. */
pos = bprm->p;

2011-03-01 20:57:29

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 2/5] exec: introduce "bool compat" argument

No functional changes, preparation to simplify the review.

And the new (and currently unused) "bool compat" argument to
get_arg_ptr(), count(), and copy_strings().

Add this argument to do_execve() as well, and rename it to
do_execve_common().

Reintroduce do_execve() as a trivial wrapper() on top of
do_execve_common(compat => false).

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)

--- 38/fs/exec.c~2_is_compat_arg 2011-03-01 21:17:45.000000000 +0100
+++ 38/fs/exec.c 2011-03-01 21:17:46.000000000 +0100
@@ -396,7 +396,7 @@ err:
}

static const char __user *
-get_arg_ptr(const char __user * const __user *argv, int argc)
+get_arg_ptr(const char __user * const __user *argv, int argc, bool compat)
{
const char __user *ptr;

@@ -409,13 +409,13 @@ get_arg_ptr(const char __user * const __
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user * argv, int max)
+static int count(const char __user * const __user *argv, int max, bool compat)
{
int i = 0;

if (argv != NULL) {
for (;;) {
- const char __user *p = get_arg_ptr(argv, i);
+ const char __user *p = get_arg_ptr(argv, i, compat);

if (!p)
break;
@@ -440,7 +440,7 @@ static int count(const char __user * con
* ensures the destination page is created and not swapped out.
*/
static int copy_strings(int argc, const char __user *const __user *argv,
- struct linux_binprm *bprm)
+ struct linux_binprm *bprm, bool compat)
{
struct page *kmapped_page = NULL;
char *kaddr = NULL;
@@ -453,7 +453,7 @@ static int copy_strings(int argc, const
unsigned long pos;

ret = -EFAULT;
- str = get_arg_ptr(argv, argc);
+ str = get_arg_ptr(argv, argc, compat);
if (IS_ERR(str))
goto out;

@@ -536,7 +536,8 @@ int copy_strings_kernel(int argc, const
int r;
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv, bprm);
+ r = copy_strings(argc, (const char __user *const __user *)argv,
+ bprm, false);
set_fs(oldfs);
return r;
}
@@ -1387,10 +1388,10 @@ EXPORT_SYMBOL(search_binary_handler);
/*
* sys_execve() executes a new program.
*/
-int do_execve(const char * filename,
+static int do_execve_common(const char *filename,
const char __user *const __user *argv,
const char __user *const __user *envp,
- struct pt_regs * regs)
+ struct pt_regs *regs, bool compat)
{
struct linux_binprm *bprm;
struct file *file;
@@ -1432,11 +1433,11 @@ int do_execve(const char * filename,
if (retval)
goto out_file;

- bprm->argc = count(argv, MAX_ARG_STRINGS);
+ bprm->argc = count(argv, MAX_ARG_STRINGS, compat);
if ((retval = bprm->argc) < 0)
goto out;

- bprm->envc = count(envp, MAX_ARG_STRINGS);
+ bprm->envc = count(envp, MAX_ARG_STRINGS, compat);
if ((retval = bprm->envc) < 0)
goto out;

@@ -1449,11 +1450,11 @@ int do_execve(const char * filename,
goto out;

bprm->exec = bprm->p;
- retval = copy_strings(bprm->envc, envp, bprm);
+ retval = copy_strings(bprm->envc, envp, bprm, compat);
if (retval < 0)
goto out;

- retval = copy_strings(bprm->argc, argv, bprm);
+ retval = copy_strings(bprm->argc, argv, bprm, compat);
if (retval < 0)
goto out;

@@ -1497,6 +1498,14 @@ out_ret:
return retval;
}

+int do_execve(const char *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp,
+ struct pt_regs *regs)
+{
+ return do_execve_common(filename, argv, envp, regs, false);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;

2011-03-01 20:57:49

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 3/5] exec: introduce conditional_user_ptr_t

No functional changes, preparation.

Introduce conditional_user_ptr_t, change do_execve() paths to use it
instead of "char __user * const __user *argv".

This makes the argv/envp arguments opaque, we are ready to handle the
compat case which needs argv pointing to compat_uptr_t.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)

--- 38/fs/exec.c~3_typdef_for_argv 2011-03-01 21:17:46.000000000 +0100
+++ 38/fs/exec.c 2011-03-01 21:17:46.000000000 +0100
@@ -395,12 +395,16 @@ err:
return err;
}

+typedef union {
+ const char __user *const __user *native;
+} conditional_user_ptr_t;
+
static const char __user *
-get_arg_ptr(const char __user * const __user *argv, int argc, bool compat)
+get_arg_ptr(conditional_user_ptr_t argv, int argc, bool compat)
{
const char __user *ptr;

- if (get_user(ptr, argv + argc))
+ if (get_user(ptr, argv.native + argc))
return ERR_PTR(-EFAULT);

return ptr;
@@ -409,11 +413,11 @@ get_arg_ptr(const char __user * const __
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user *argv, int max, bool compat)
+static int count(conditional_user_ptr_t argv, int max, bool compat)
{
int i = 0;

- if (argv != NULL) {
+ if (argv.native != NULL) {
for (;;) {
const char __user *p = get_arg_ptr(argv, i, compat);

@@ -439,7 +443,7 @@ static int count(const char __user * con
* processes's memory to the new process's stack. The call to get_user_pages()
* ensures the destination page is created and not swapped out.
*/
-static int copy_strings(int argc, const char __user *const __user *argv,
+static int copy_strings(int argc, conditional_user_ptr_t argv,
struct linux_binprm *bprm, bool compat)
{
struct page *kmapped_page = NULL;
@@ -530,15 +534,19 @@ out:
/*
* Like copy_strings, but get argv and its values from kernel memory.
*/
-int copy_strings_kernel(int argc, const char *const *argv,
+int copy_strings_kernel(int argc, const char *const *ptr,
struct linux_binprm *bprm)
{
int r;
mm_segment_t oldfs = get_fs();
+ conditional_user_ptr_t argv = {
+ .native = (const char __user *const __user *)ptr,
+ };
+
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv,
- bprm, false);
+ r = copy_strings(argc, argv, bprm, false);
set_fs(oldfs);
+
return r;
}
EXPORT_SYMBOL(copy_strings_kernel);
@@ -1389,8 +1397,7 @@ EXPORT_SYMBOL(search_binary_handler);
* sys_execve() executes a new program.
*/
static int do_execve_common(const char *filename,
- const char __user *const __user *argv,
- const char __user *const __user *envp,
+ conditional_user_ptr_t argv, conditional_user_ptr_t envp,
struct pt_regs *regs, bool compat)
{
struct linux_binprm *bprm;
@@ -1499,10 +1506,12 @@ out_ret:
}

int do_execve(const char *filename,
- const char __user *const __user *argv,
- const char __user *const __user *envp,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
struct pt_regs *regs)
{
+ conditional_user_ptr_t argv = { .native = __argv };
+ conditional_user_ptr_t envp = { .native = __envp };
return do_execve_common(filename, argv, envp, regs, false);
}

2011-03-01 20:58:14

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 4/5] exec: unify do_execve/compat_do_execve code

Add the appropriate member into conditional_user_ptr_t union and
teach get_arg_ptr() to handle compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/binfmts.h | 4
fs/exec.c | 33 +++++-
fs/compat.c | 235 ------------------------------------------------
3 files changed, 29 insertions(+), 243 deletions(-)

--- 38/include/linux/binfmts.h~4_use_compat 2011-03-01 21:15:45.000000000 +0100
+++ 38/include/linux/binfmts.h 2011-03-01 21:17:47.000000000 +0100
@@ -60,10 +60,6 @@ struct linux_binprm {
unsigned long loader, exec;
};

-extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
-extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
- int write);
-
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- 38/fs/exec.c~4_use_compat 2011-03-01 21:17:46.000000000 +0100
+++ 38/fs/exec.c 2011-03-01 21:17:47.000000000 +0100
@@ -55,6 +55,7 @@
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -164,7 +165,7 @@ out:

#ifdef CONFIG_MMU

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -183,7 +184,7 @@ void acct_arg_size(struct linux_binprm *
#endif
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -302,11 +303,11 @@ static bool valid_arg_len(struct linux_b

#else

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -397,6 +398,7 @@ err:

typedef union {
const char __user *const __user *native;
+ compat_uptr_t __user *compat;
} conditional_user_ptr_t;

static const char __user *
@@ -404,6 +406,17 @@ get_arg_ptr(conditional_user_ptr_t argv,
{
const char __user *ptr;

+#ifdef CONFIG_COMPAT
+ if (unlikely(compat)) {
+ compat_uptr_t p;
+
+ if (get_user(p, argv.compat + argc))
+ return ERR_PTR(-EFAULT);
+
+ return compat_ptr(p);
+ }
+#endif
+
if (get_user(ptr, argv.native + argc))
return ERR_PTR(-EFAULT);

@@ -1515,6 +1528,18 @@ int do_execve(const char *filename,
return do_execve_common(filename, argv, envp, regs, false);
}

+#ifdef CONFIG_COMPAT
+int compat_do_execve(char *filename,
+ compat_uptr_t __user *__argv,
+ compat_uptr_t __user *__envp,
+ struct pt_regs *regs)
+{
+ conditional_user_ptr_t argv = { .compat = __argv };
+ conditional_user_ptr_t envp = { .compat = __envp };
+ return do_execve_common(filename, argv, envp, regs, true);
+}
+#endif
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
--- 38/fs/compat.c~4_use_compat 2011-03-01 21:15:45.000000000 +0100
+++ 38/fs/compat.c 2011-03-01 21:17:47.000000000 +0100
@@ -1330,241 +1330,6 @@ compat_sys_openat(unsigned int dfd, cons
return do_sys_open(dfd, filename, flags, mode);
}

-/*
- * compat_count() counts the number of arguments/envelopes. It is basically
- * a copy of count() from fs/exec.c, except that it works with 32 bit argv
- * and envp pointers.
- */
-static int compat_count(compat_uptr_t __user *argv, int max)
-{
- int i = 0;
-
- if (argv != NULL) {
- for (;;) {
- compat_uptr_t p;
-
- if (get_user(p, argv))
- return -EFAULT;
- if (!p)
- break;
- argv++;
- if (i++ >= max)
- return -E2BIG;
-
- if (fatal_signal_pending(current))
- return -ERESTARTNOHAND;
- cond_resched();
- }
- }
- return i;
-}
-
-/*
- * compat_copy_strings() is basically a copy of copy_strings() from fs/exec.c
- * except that it works with 32 bit argv and envp pointers.
- */
-static int compat_copy_strings(int argc, compat_uptr_t __user *argv,
- struct linux_binprm *bprm)
-{
- struct page *kmapped_page = NULL;
- char *kaddr = NULL;
- unsigned long kpos = 0;
- int ret;
-
- while (argc-- > 0) {
- compat_uptr_t str;
- int len;
- unsigned long pos;
-
- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
- ret = -EFAULT;
- goto out;
- }
-
- if (len > MAX_ARG_STRLEN) {
- ret = -E2BIG;
- goto out;
- }
-
- /* We're going to work our way backwords. */
- pos = bprm->p;
- str += len;
- bprm->p -= len;
-
- while (len > 0) {
- int offset, bytes_to_copy;
-
- if (fatal_signal_pending(current)) {
- ret = -ERESTARTNOHAND;
- goto out;
- }
- cond_resched();
-
- offset = pos % PAGE_SIZE;
- if (offset == 0)
- offset = PAGE_SIZE;
-
- bytes_to_copy = offset;
- if (bytes_to_copy > len)
- bytes_to_copy = len;
-
- offset -= bytes_to_copy;
- pos -= bytes_to_copy;
- str -= bytes_to_copy;
- len -= bytes_to_copy;
-
- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
- kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
- }
- if (copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
- }
- }
- }
- ret = 0;
-out:
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- return ret;
-}
-
-/*
- * compat_do_execve() is mostly a copy of do_execve(), with the exception
- * that it processes 32 bit argv and envp pointers.
- */
-int compat_do_execve(char * filename,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs * regs)
-{
- struct linux_binprm *bprm;
- struct file *file;
- struct files_struct *displaced;
- bool clear_in_exec;
- int retval;
-
- retval = unshare_files(&displaced);
- if (retval)
- goto out_ret;
-
- retval = -ENOMEM;
- bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
- if (!bprm)
- goto out_files;
-
- retval = prepare_bprm_creds(bprm);
- if (retval)
- goto out_free;
-
- retval = check_unsafe_exec(bprm);
- if (retval < 0)
- goto out_free;
- clear_in_exec = retval;
- current->in_execve = 1;
-
- file = open_exec(filename);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
-
- sched_exec();
-
- bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
-
- retval = bprm_mm_init(bprm);
- if (retval)
- goto out_file;
-
- bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
- if ((retval = bprm->argc) < 0)
- goto out;
-
- bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
- if ((retval = bprm->envc) < 0)
- goto out;
-
- retval = prepare_binprm(bprm);
- if (retval < 0)
- goto out;
-
- retval = copy_strings_kernel(1, &bprm->filename, bprm);
- if (retval < 0)
- goto out;
-
- bprm->exec = bprm->p;
- retval = compat_copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
-
- retval = compat_copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
-
- retval = search_binary_handler(bprm, regs);
- if (retval < 0)
- goto out;
-
- /* execve succeeded */
- current->fs->in_exec = 0;
- current->in_execve = 0;
- acct_update_integrals(current);
- free_bprm(bprm);
- if (displaced)
- put_files_struct(displaced);
- return retval;
-
-out:
- if (bprm->mm) {
- acct_arg_size(bprm, 0);
- mmput(bprm->mm);
- }
-
-out_file:
- if (bprm->file) {
- allow_write_access(bprm->file);
- fput(bprm->file);
- }
-
-out_unmark:
- if (clear_in_exec)
- current->fs->in_exec = 0;
- current->in_execve = 0;
-
-out_free:
- free_bprm(bprm);
-
-out_files:
- if (displaced)
- reset_files_struct(displaced);
-out_ret:
- return retval;
-}
-
#define __COMPAT_NFDBITS (8 * sizeof(compat_ulong_t))

static int poll_select_copy_remaining(struct timespec *end_time, void __user *p,

2011-03-01 20:58:37

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v2 5/5] exec: document acct_arg_size()

Add the comment to explain acct_arg_size().

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- 38/fs/exec.c~5_doc_acct_arg_size 2011-03-01 21:17:47.000000000 +0100
+++ 38/fs/exec.c 2011-03-01 21:17:47.000000000 +0100
@@ -164,7 +164,12 @@ out:
}

#ifdef CONFIG_MMU
-
+/*
+ * The nascent bprm->mm is not visible until exec_mmap() but it can
+ * use a lot of memory, account these pages in current->mm temporary
+ * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
+ * change the counter back via acct_arg_size(0).
+ */
static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;

2011-03-01 21:40:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] exec: unify native/compat code

On Tue, Mar 1, 2011 at 12:47 PM, Oleg Nesterov <[email protected]> wrote:
> ? ? ? ?where that 'do_execve_common()' takes it's arguments as
>
> ? ? ? ? ? ? ? ?union conditional_user_ptr_t __user *argv,
> ? ? ? ? ? ? ? ?union conditional_user_ptr_t __user *envp
>
> I hope you didn't really mean this...

I really did mean that (although not the double "union" + "_t" thing
for the typedef).

But I'm not going to claim that it has to be done exactly that way,
the union can certainly be encapsulated differently too.

So I'm ok with your alternative

> ? ? ? ?typedef union {
> ? ? ? ? ? ? ? ?const char __user *const __user *native;
> ? ? ? ? ? ? ? ?compat_uptr_t __user *compat;
> ? ? ? ?} conditional_user_ptr_t;

model instead, which moves the pointer into the union.

However, if you do this, then I have one more suggestion: just move
the "compat" flag in there too!

Every time you pass the union, you're going to pass the compat flag to
distinguish the cases. So do it like this:

struct conditional_ptr {
int is_compat;
union {
const char __user *const __user *native;
compat_uptr_t __user *compat;
};
};

and it will all look much cleaner, I bet.

Linus

2011-03-02 16:36:23

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v3 0/4] exec: unify native/compat code

On 03/01, Linus Torvalds wrote:
>
> So I'm ok with your alternative
>
> > ? ? ? ?typedef union {
> > ? ? ? ? ? ? ? ?const char __user *const __user *native;
> > ? ? ? ? ? ? ? ?compat_uptr_t __user *compat;
> > ? ? ? ?} conditional_user_ptr_t;
>
> model instead, which moves the pointer into the union.
>
> However, if you do this, then I have one more suggestion: just move
> the "compat" flag in there too!
>
> Every time you pass the union, you're going to pass the compat flag to
> distinguish the cases. So do it like this:
>
> struct conditional_ptr {
> int is_compat;
> union {
> const char __user *const __user *native;
> compat_uptr_t __user *compat;
> };
> };
>
> and it will all look much cleaner, I bet.

Heh. I knew. I swear, I knew you would suggest this ;)

OK, please find v3. I had to deanonymize the union though, otherwise
the initializer in do_execve() becomes nontrivial.



But I don't think this is right. Not only this adds 200 bytes to exec.o.
To me, is_compat is not the private property of argv/envp. Yes, currently
nobody except get_arg_ptr() needs to know the difference. But who knows,
it is possible that we will need more "if (compat)" code in future. IOW,
I think that the explicit argument is a win.

Never mind. I agree with everything as long as we can remove this c-a-p
compat_do_execve().

Oleg.

2011-03-02 16:36:58

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v3 1/4] exec: introduce get_arg_ptr() helper

Introduce get_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

--- 38/fs/exec.c~1_get_arg_ptr 2011-03-02 15:15:27.000000000 +0100
+++ 38/fs/exec.c 2011-03-02 15:16:44.000000000 +0100
@@ -395,6 +395,17 @@ err:
return err;
}

+static const char __user *
+get_arg_ptr(const char __user * const __user *argv, int argc)
+{
+ const char __user *ptr;
+
+ if (get_user(ptr, argv + argc))
+ return ERR_PTR(-EFAULT);
+
+ return ptr;
+}
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -404,13 +415,14 @@ static int count(const char __user * con

if (argv != NULL) {
for (;;) {
- const char __user * p;
+ const char __user *p = get_arg_ptr(argv, i);

- if (get_user(p, argv))
- return -EFAULT;
if (!p)
break;
- argv++;
+
+ if (IS_ERR(p))
+ return -EFAULT;
+
if (i++ >= max)
return -E2BIG;

@@ -440,16 +452,18 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;

- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
- ret = -EFAULT;
+ ret = -EFAULT;
+ str = get_arg_ptr(argv, argc);
+ if (IS_ERR(str))
goto out;
- }

- if (!valid_arg_len(bprm, len)) {
- ret = -E2BIG;
+ len = strnlen_user(str, MAX_ARG_STRLEN);
+ if (!len)
+ goto out;
+
+ ret = -E2BIG;
+ if (!valid_arg_len(bprm, len))
goto out;
- }

/* We're going to work our way backwords. */
pos = bprm->p;

2011-03-02 16:37:08

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v3 2/4] exec: introduce struct conditional_ptr

No functional changes, preparation.

Introduce struct conditional_ptr, change do_execve() paths to use it
instead of "char __user * const __user *argv".

This makes the argv/envp arguments opaque, we are ready to handle the
compat case which needs argv pointing to compat_uptr_t.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

--- 38/fs/exec.c~2_typedef_for_argv 2011-03-02 15:40:22.000000000 +0100
+++ 38/fs/exec.c 2011-03-02 15:40:44.000000000 +0100
@@ -395,12 +395,15 @@ err:
return err;
}

-static const char __user *
-get_arg_ptr(const char __user * const __user *argv, int argc)
+struct conditional_ptr {
+ const char __user *const __user *native;
+};
+
+static const char __user *get_arg_ptr(struct conditional_ptr argv, int argc)
{
const char __user *ptr;

- if (get_user(ptr, argv + argc))
+ if (get_user(ptr, argv.native + argc))
return ERR_PTR(-EFAULT);

return ptr;
@@ -409,11 +412,11 @@ get_arg_ptr(const char __user * const __
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user * argv, int max)
+static int count(struct conditional_ptr argv, int max)
{
int i = 0;

- if (argv != NULL) {
+ if (argv.native != NULL) {
for (;;) {
const char __user *p = get_arg_ptr(argv, i);

@@ -439,7 +442,7 @@ static int count(const char __user * con
* processes's memory to the new process's stack. The call to get_user_pages()
* ensures the destination page is created and not swapped out.
*/
-static int copy_strings(int argc, const char __user *const __user *argv,
+static int copy_strings(int argc, struct conditional_ptr argv,
struct linux_binprm *bprm)
{
struct page *kmapped_page = NULL;
@@ -530,14 +533,19 @@ out:
/*
* Like copy_strings, but get argv and its values from kernel memory.
*/
-int copy_strings_kernel(int argc, const char *const *argv,
+int copy_strings_kernel(int argc, const char *const *__argv,
struct linux_binprm *bprm)
{
int r;
mm_segment_t oldfs = get_fs();
+ struct conditional_ptr argv = {
+ .native = (const char __user *const __user *)__argv,
+ };
+
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv, bprm);
+ r = copy_strings(argc, argv, bprm);
set_fs(oldfs);
+
return r;
}
EXPORT_SYMBOL(copy_strings_kernel);
@@ -1387,10 +1395,10 @@ EXPORT_SYMBOL(search_binary_handler);
/*
* sys_execve() executes a new program.
*/
-int do_execve(const char * filename,
- const char __user *const __user *argv,
- const char __user *const __user *envp,
- struct pt_regs * regs)
+static int do_execve_common(const char *filename,
+ struct conditional_ptr argv,
+ struct conditional_ptr envp,
+ struct pt_regs *regs)
{
struct linux_binprm *bprm;
struct file *file;
@@ -1497,6 +1505,16 @@ out_ret:
return retval;
}

+int do_execve(const char *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ struct pt_regs *regs)
+{
+ struct conditional_ptr argv = { .native = __argv };
+ struct conditional_ptr envp = { .native = __envp };
+ return do_execve_common(filename, argv, envp, regs);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;

2011-03-02 16:37:29

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v3 3/4] exec: unify do_execve/compat_do_execve code

Add the appropriate members into struct conditional_ptr and teach
get_arg_ptr() to handle is_compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(is_compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov <[email protected]>
---

include/linux/binfmts.h | 4
fs/exec.c | 58 +++++++++--
fs/compat.c | 235 ------------------------------------------------
3 files changed, 46 insertions(+), 251 deletions(-)

--- 38/include/linux/binfmts.h~3_handle_compat_case 2011-03-02 15:15:25.000000000 +0100
+++ 38/include/linux/binfmts.h 2011-03-02 15:47:15.000000000 +0100
@@ -60,10 +60,6 @@ struct linux_binprm {
unsigned long loader, exec;
};

-extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
-extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
- int write);
-
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- 38/fs/exec.c~3_handle_compat_case 2011-03-02 15:40:44.000000000 +0100
+++ 38/fs/exec.c 2011-03-02 16:21:57.000000000 +0100
@@ -55,6 +55,7 @@
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -164,7 +165,7 @@ out:

#ifdef CONFIG_MMU

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -183,7 +184,7 @@ void acct_arg_size(struct linux_binprm *
#endif
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -302,11 +303,11 @@ static bool valid_arg_len(struct linux_b

#else

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -396,17 +397,34 @@ err:
}

struct conditional_ptr {
- const char __user *const __user *native;
+#ifdef CONFIG_COMPAT
+ bool is_compat;
+#endif
+ union {
+ const char __user *const __user *native;
+ compat_uptr_t __user *compat;
+ } ptr;
};

static const char __user *get_arg_ptr(struct conditional_ptr argv, int argc)
{
- const char __user *ptr;
+ const char __user *native;

- if (get_user(ptr, argv.native + argc))
+#ifdef CONFIG_COMPAT
+ if (unlikely(argv.is_compat)) {
+ compat_uptr_t compat;
+
+ if (get_user(compat, argv.ptr.compat + argc))
+ return ERR_PTR(-EFAULT);
+
+ return compat_ptr(compat);
+ }
+#endif
+
+ if (get_user(native, argv.ptr.native + argc))
return ERR_PTR(-EFAULT);

- return ptr;
+ return native;
}

/*
@@ -416,7 +434,7 @@ static int count(struct conditional_ptr
{
int i = 0;

- if (argv.native != NULL) {
+ if (argv.ptr.native != NULL) {
for (;;) {
const char __user *p = get_arg_ptr(argv, i);

@@ -539,7 +557,7 @@ int copy_strings_kernel(int argc, const
int r;
mm_segment_t oldfs = get_fs();
struct conditional_ptr argv = {
- .native = (const char __user *const __user *)__argv,
+ .ptr.native = (const char __user *const __user *)__argv,
};

set_fs(KERNEL_DS);
@@ -1510,11 +1528,27 @@ int do_execve(const char *filename,
const char __user *const __user *__envp,
struct pt_regs *regs)
{
- struct conditional_ptr argv = { .native = __argv };
- struct conditional_ptr envp = { .native = __envp };
+ struct conditional_ptr argv = { .ptr.native = __argv };
+ struct conditional_ptr envp = { .ptr.native = __envp };
return do_execve_common(filename, argv, envp, regs);
}

+#ifdef CONFIG_COMPAT
+int compat_do_execve(char *filename,
+ compat_uptr_t __user *__argv,
+ compat_uptr_t __user *__envp,
+ struct pt_regs *regs)
+{
+ struct conditional_ptr argv = {
+ .is_compat = true, .ptr.compat = __argv,
+ };
+ struct conditional_ptr envp = {
+ .is_compat = true, .ptr.compat = __envp,
+ };
+ return do_execve_common(filename, argv, envp, regs);
+}
+#endif
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
--- 38/fs/compat.c~3_handle_compat_case 2011-03-02 15:15:25.000000000 +0100
+++ 38/fs/compat.c 2011-03-02 15:47:15.000000000 +0100
@@ -1330,241 +1330,6 @@ compat_sys_openat(unsigned int dfd, cons
return do_sys_open(dfd, filename, flags, mode);
}

-/*
- * compat_count() counts the number of arguments/envelopes. It is basically
- * a copy of count() from fs/exec.c, except that it works with 32 bit argv
- * and envp pointers.
- */
-static int compat_count(compat_uptr_t __user *argv, int max)
-{
- int i = 0;
-
- if (argv != NULL) {
- for (;;) {
- compat_uptr_t p;
-
- if (get_user(p, argv))
- return -EFAULT;
- if (!p)
- break;
- argv++;
- if (i++ >= max)
- return -E2BIG;
-
- if (fatal_signal_pending(current))
- return -ERESTARTNOHAND;
- cond_resched();
- }
- }
- return i;
-}
-
-/*
- * compat_copy_strings() is basically a copy of copy_strings() from fs/exec.c
- * except that it works with 32 bit argv and envp pointers.
- */
-static int compat_copy_strings(int argc, compat_uptr_t __user *argv,
- struct linux_binprm *bprm)
-{
- struct page *kmapped_page = NULL;
- char *kaddr = NULL;
- unsigned long kpos = 0;
- int ret;
-
- while (argc-- > 0) {
- compat_uptr_t str;
- int len;
- unsigned long pos;
-
- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
- ret = -EFAULT;
- goto out;
- }
-
- if (len > MAX_ARG_STRLEN) {
- ret = -E2BIG;
- goto out;
- }
-
- /* We're going to work our way backwords. */
- pos = bprm->p;
- str += len;
- bprm->p -= len;
-
- while (len > 0) {
- int offset, bytes_to_copy;
-
- if (fatal_signal_pending(current)) {
- ret = -ERESTARTNOHAND;
- goto out;
- }
- cond_resched();
-
- offset = pos % PAGE_SIZE;
- if (offset == 0)
- offset = PAGE_SIZE;
-
- bytes_to_copy = offset;
- if (bytes_to_copy > len)
- bytes_to_copy = len;
-
- offset -= bytes_to_copy;
- pos -= bytes_to_copy;
- str -= bytes_to_copy;
- len -= bytes_to_copy;
-
- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
- kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
- }
- if (copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
- }
- }
- }
- ret = 0;
-out:
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- return ret;
-}
-
-/*
- * compat_do_execve() is mostly a copy of do_execve(), with the exception
- * that it processes 32 bit argv and envp pointers.
- */
-int compat_do_execve(char * filename,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs * regs)
-{
- struct linux_binprm *bprm;
- struct file *file;
- struct files_struct *displaced;
- bool clear_in_exec;
- int retval;
-
- retval = unshare_files(&displaced);
- if (retval)
- goto out_ret;
-
- retval = -ENOMEM;
- bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
- if (!bprm)
- goto out_files;
-
- retval = prepare_bprm_creds(bprm);
- if (retval)
- goto out_free;
-
- retval = check_unsafe_exec(bprm);
- if (retval < 0)
- goto out_free;
- clear_in_exec = retval;
- current->in_execve = 1;
-
- file = open_exec(filename);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
-
- sched_exec();
-
- bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
-
- retval = bprm_mm_init(bprm);
- if (retval)
- goto out_file;
-
- bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
- if ((retval = bprm->argc) < 0)
- goto out;
-
- bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
- if ((retval = bprm->envc) < 0)
- goto out;
-
- retval = prepare_binprm(bprm);
- if (retval < 0)
- goto out;
-
- retval = copy_strings_kernel(1, &bprm->filename, bprm);
- if (retval < 0)
- goto out;
-
- bprm->exec = bprm->p;
- retval = compat_copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
-
- retval = compat_copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
-
- retval = search_binary_handler(bprm, regs);
- if (retval < 0)
- goto out;
-
- /* execve succeeded */
- current->fs->in_exec = 0;
- current->in_execve = 0;
- acct_update_integrals(current);
- free_bprm(bprm);
- if (displaced)
- put_files_struct(displaced);
- return retval;
-
-out:
- if (bprm->mm) {
- acct_arg_size(bprm, 0);
- mmput(bprm->mm);
- }
-
-out_file:
- if (bprm->file) {
- allow_write_access(bprm->file);
- fput(bprm->file);
- }
-
-out_unmark:
- if (clear_in_exec)
- current->fs->in_exec = 0;
- current->in_execve = 0;
-
-out_free:
- free_bprm(bprm);
-
-out_files:
- if (displaced)
- reset_files_struct(displaced);
-out_ret:
- return retval;
-}
-
#define __COMPAT_NFDBITS (8 * sizeof(compat_ulong_t))

static int poll_select_copy_remaining(struct timespec *end_time, void __user *p,

2011-03-02 16:37:39

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v3 4/4] exec: document acct_arg_size()

Add the comment to explain acct_arg_size().

Signed-off-by: Oleg Nesterov <[email protected]>
---

fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- 38/fs/exec.c~4_doc_acct_arg_size 2011-03-02 16:21:57.000000000 +0100
+++ 38/fs/exec.c 2011-03-02 16:27:24.000000000 +0100
@@ -164,7 +164,12 @@ out:
}

#ifdef CONFIG_MMU
-
+/*
+ * The nascent bprm->mm is not visible until exec_mmap() but it can
+ * use a lot of memory, account these pages in current->mm temporary
+ * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
+ * change the counter back via acct_arg_size(0).
+ */
static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;

2011-03-02 16:53:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] exec: unify native/compat code

On 03/02, Oleg Nesterov wrote:
>
> Never mind. I agree with everything as long as we can remove this c-a-p
> compat_do_execve().

forgot to mention...

And probably you meant we should pass "struct conditional_ptr*", not
by value. I can redo again.

And sorry for the duplicated 4/4 emails...

Oleg.

2011-03-02 18:01:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] exec: unify native/compat code

On Wed, Mar 2, 2011 at 8:44 AM, Oleg Nesterov <[email protected]> wrote:
>
> forgot to mention...
>
> And probably you meant we should pass "struct conditional_ptr*", not
> by value. I can redo again.

No, I think we're ok with passing the structure by value - it's a
small structure that would generally be passed in registers (at least
on some architectures, I guess it will depend on the ABI), and we do
the "struct-by-value" thing for other things too (notably the page
table entries), so it's not a new thing in the kernel.

So I think I finally have no complaints. Of course, I didn't actually
check whether it _works_, but I assume it does.

If the s390 people (who actually do special things with compat
pointers) can test, that would be ok, but I'm certainly happily going
to apply this series when the next merge window opens.

Linus

2011-03-02 19:39:42

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] exec: unify native/compat code

From: Linus Torvalds <[email protected]>
Date: Wed, 2 Mar 2011 10:00:23 -0800

> No, I think we're ok with passing the structure by value - it's a
> small structure that would generally be passed in registers (at least
> on some architectures, I guess it will depend on the ABI), and we do
> the "struct-by-value" thing for other things too (notably the page
> table entries), so it's not a new thing in the kernel.

We purposely don't do that "page table entry typedef'd to aggregate" stuff
on sparc32 because otherwise such values get passed on the stack.

Architectures can currently avoid this bad code generation for the
page table case, but with this new code they won't be able to avoid
pass-by-value.

2011-03-02 19:48:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] exec: unify native/compat code

On Wed, Mar 2, 2011 at 11:40 AM, David Miller <[email protected]> wrote:
>
> We purposely don't do that "page table entry typedef'd to aggregate" stuff
> on sparc32 because otherwise such values get passed on the stack.
>
> Architectures can currently avoid this bad code generation for the
> page table case, but with this new code they won't be able to avoid
> pass-by-value.

Well, the thing is, on architectures that _can_ pass by value, it
avoids one indirection.

And if you do pass it on stack, then the code generated will be the
same as if we passed a pointer. So sparc may not be able to take
advantage of the optimization, but I don't think the code generation
would be worse.

For the page table case, we don't have that kind of trade-off: the
trade-off there is literally just between "pass in registers, or pass
on stack". Here the trade-off is "pass as an aggregate value or pass
as a pointer to an aggregate value".

That said, since I suspect that the main user will always just get
inlined (ie the helper function that actually fetches the pointers), I
suspect even sparc will see the advantage of the pass-by-value model.

But you might want to actually test the difference and look at the
code generation.

Linus

2011-03-02 19:53:36

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] exec: unify native/compat code

From: Linus Torvalds <[email protected]>
Date: Wed, 2 Mar 2011 11:48:03 -0800

> Well, the thing is, on architectures that _can_ pass by value, it
> avoids one indirection.
>
> And if you do pass it on stack, then the code generated will be the
> same as if we passed a pointer. So sparc may not be able to take
> advantage of the optimization, but I don't think the code generation
> would be worse.

That's a good point, the situation here is different than the page table
one.

2011-03-03 03:01:22

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] exec: introduce get_arg_ptr() helper

Hi

Sorry for the long delay. now I'm getting stuck sucky paper work. ;-)
In short, I don't find any issue in this patch. So, I'll test it at
this weekend if linus haven't merged it yet.

A few small and cosmetic comments are below. but anyway I don't want
keep this up in the air.
Reviewed-by: KOSAKI Motohiro <[email protected]>



> Introduce get_arg_ptr() helper, convert count() and copy_strings()
> to use it.
>
> No functional changes, preparation. This helper is trivial, it just
> reads the pointer from argv/envp user-space array.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
>
> fs/exec.c | 36 +++++++++++++++++++++++++-----------
> 1 file changed, 25 insertions(+), 11 deletions(-)
>
> --- 38/fs/exec.c~1_get_arg_ptr 2011-03-02 15:15:27.000000000 +0100
> +++ 38/fs/exec.c 2011-03-02 15:16:44.000000000 +0100
> @@ -395,6 +395,17 @@ err:
> return err;
> }
>
> +static const char __user *
> +get_arg_ptr(const char __user * const __user *argv, int argc)
> +{

[argc, argv] is natural order to me than [argv, argc].
and "get_" prefix are usually used for reference count incrementing
function in linux. so, i _personally_ prefer to call "user_arg_ptr".


2011-03-03 03:08:24

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] exec: introduce struct conditional_ptr

> No functional changes, preparation.
>
> Introduce struct conditional_ptr, change do_execve() paths to use it
> instead of "char __user * const __user *argv".
>
> This makes the argv/envp arguments opaque, we are ready to handle the
> compat case which needs argv pointing to compat_uptr_t.
>
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
>
> fs/exec.c | 42 ++++++++++++++++++++++++++++++------------
> 1 file changed, 30 insertions(+), 12 deletions(-)
>
> --- 38/fs/exec.c~2_typedef_for_argv 2011-03-02 15:40:22.000000000 +0100
> +++ 38/fs/exec.c 2011-03-02 15:40:44.000000000 +0100
> @@ -395,12 +395,15 @@ err:
> return err;
> }
>
> -static const char __user *
> -get_arg_ptr(const char __user * const __user *argv, int argc)
> +struct conditional_ptr {

I _personally_ don't like "conditional". Its name is based on code logic.
It's unclear what mean "conditional". From data strucuture view, It is
"opaque userland pointer".

but again, it is my personal preference.

Reviewed-by: KOSAKI Motohiro <[email protected]>


2011-03-03 03:09:29

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] exec: document acct_arg_size()

> Add the comment to explain acct_arg_size().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
>
> fs/exec.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> --- 38/fs/exec.c~4_doc_acct_arg_size 2011-03-02 16:21:57.000000000 +0100
> +++ 38/fs/exec.c 2011-03-02 16:27:24.000000000 +0100
> @@ -164,7 +164,12 @@ out:
> }
>
> #ifdef CONFIG_MMU
> -
> +/*
> + * The nascent bprm->mm is not visible until exec_mmap() but it can
> + * use a lot of memory, account these pages in current->mm temporary
> + * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
> + * change the counter back via acct_arg_size(0).
> + */
> static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
> {
> struct mm_struct *mm = current->mm;
>

Yeah! Thank you very much to make proper and clear comment.
Reviewed-by: KOSAKI Motohiro <[email protected]>


2011-03-03 03:13:13

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] exec: unify do_execve/compat_do_execve code

> @@ -1510,11 +1528,27 @@ int do_execve(const char *filename,
> const char __user *const __user *__envp,
> struct pt_regs *regs)
> {
> - struct conditional_ptr argv = { .native = __argv };
> - struct conditional_ptr envp = { .native = __envp };
> + struct conditional_ptr argv = { .ptr.native = __argv };
> + struct conditional_ptr envp = { .ptr.native = __envp };
> return do_execve_common(filename, argv, envp, regs);
> }
>
> +#ifdef CONFIG_COMPAT
> +int compat_do_execve(char *filename,
> + compat_uptr_t __user *__argv,
> + compat_uptr_t __user *__envp,
> + struct pt_regs *regs)
> +{
> + struct conditional_ptr argv = {
> + .is_compat = true, .ptr.compat = __argv,
> + };

Please don't mind to compress a line.

struct conditional_ptr argv = {
.is_compat = true,
.ptr.compat = __argv,
};

is more good readability.


Other parts looks very good to me.
Reviewed-by: KOSAKI Motohiro <[email protected]>


2011-03-03 15:57:05

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] exec: introduce get_arg_ptr() helper

On 03/03, KOSAKI Motohiro wrote:
>
> > +static const char __user *
> > +get_arg_ptr(const char __user * const __user *argv, int argc)
> > +{
>
> [argc, argv] is natural order to me than [argv, argc].

Yes... in fact, "argc" is misnamed here. It doesn't mean the number of
arguments, it is the index in the array. Perhaps this should be [argv, nr].

> and "get_" prefix are usually used for reference count incrementing
> function in linux. so, i _personally_ prefer to call "user_arg_ptr".

Agreed, the name is ugly. I'll rename and resend keeping your reviewed-by.

[2/4]
> I _personally_ don't like "conditional". Its name is based on code logic.
> It's unclear what mean "conditional". From data strucuture view, It is
> "opaque userland pointer".

I agree with any naming, just suggest a better name ;)

[3/4]
> > + struct conditional_ptr argv = {
> > + .is_compat = true, .ptr.compat = __argv,
> > + };
>
> Please don't mind to compress a line.
>
> struct conditional_ptr argv = {
> .is_compat = true,
> .ptr.compat = __argv,
> };

OK, will do.

Thanks for review!

Oleg.

2011-03-03 16:08:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] exec: introduce get_arg_ptr() helper

On Thu, Mar 3, 2011 at 7:47 AM, Oleg Nesterov <[email protected]> wrote:
>> I _personally_ don't like "conditional". Its name is based on code logic.
>> It's unclear what mean "conditional". From data strucuture view, It is
>> "opaque userland pointer".
>
> I agree with any naming, just suggest a better name ;)

Maybe just "struct user_arg_ptr" or something?

Linus

2011-03-05 20:39:50

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v4 0/4] exec: unify native/compat code

On 03/03, Linus Torvalds wrote:
>
> On Thu, Mar 3, 2011 at 7:47 AM, Oleg Nesterov <[email protected]> wrote:
> >> I _personally_ don't like "conditional". Its name is based on code logic.
> >> It's unclear what mean "conditional". From data strucuture view, It is
> >> "opaque userland pointer".
> >
> > I agree with any naming, just suggest a better name ;)
>
> Maybe just "struct user_arg_ptr" or something?

OK, nothing else was suggessted, I assume Kosaki agrees.

So rename conditional_ptr to user_arg_ptr.

Also rename get_user_ptr() to get_user_arg_ptr(). It was suggested to
use the same "user_arg_ptr" for this helper too, but this is not
grep-friendly. As for get_ in the name... Well, I can redo again ;)
But this matches get_user() and this is all what this helper does.

Otherwise unchanged.

Oleg.

2011-03-05 20:40:48

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v4 3/4] exec: unify do_execve/compat_do_execve code

Add the appropriate members into struct user_arg_ptr and teach
get_user_arg_ptr() to handle is_compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(is_compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
---

include/linux/binfmts.h | 4
fs/exec.c | 58 +++++++++--
fs/compat.c | 235 ------------------------------------------------
3 files changed, 46 insertions(+), 251 deletions(-)

--- 38/include/linux/binfmts.h~3_handle_compat_case 2011-03-05 21:13:45.000000000 +0100
+++ 38/include/linux/binfmts.h 2011-03-05 21:23:15.000000000 +0100
@@ -60,10 +60,6 @@ struct linux_binprm {
unsigned long loader, exec;
};

-extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
-extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
- int write);
-
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- 38/fs/exec.c~3_handle_compat_case 2011-03-05 21:22:42.000000000 +0100
+++ 38/fs/exec.c 2011-03-05 21:23:15.000000000 +0100
@@ -55,6 +55,7 @@
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -164,7 +165,7 @@ out:

#ifdef CONFIG_MMU

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -183,7 +184,7 @@ void acct_arg_size(struct linux_binprm *
#endif
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -302,11 +303,11 @@ static bool valid_arg_len(struct linux_b

#else

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -396,17 +397,34 @@ err:
}

struct user_arg_ptr {
- const char __user *const __user *native;
+#ifdef CONFIG_COMPAT
+ bool is_compat;
+#endif
+ union {
+ const char __user *const __user *native;
+ compat_uptr_t __user *compat;
+ } ptr;
};

static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
{
- const char __user *ptr;
+ const char __user *native;

- if (get_user(ptr, argv.native + nr))
+#ifdef CONFIG_COMPAT
+ if (unlikely(argv.is_compat)) {
+ compat_uptr_t compat;
+
+ if (get_user(compat, argv.ptr.compat + nr))
+ return ERR_PTR(-EFAULT);
+
+ return compat_ptr(compat);
+ }
+#endif
+
+ if (get_user(native, argv.ptr.native + nr))
return ERR_PTR(-EFAULT);

- return ptr;
+ return native;
}

/*
@@ -416,7 +434,7 @@ static int count(struct user_arg_ptr arg
{
int i = 0;

- if (argv.native != NULL) {
+ if (argv.ptr.native != NULL) {
for (;;) {
const char __user *p = get_user_arg_ptr(argv, i);

@@ -539,7 +557,7 @@ int copy_strings_kernel(int argc, const
int r;
mm_segment_t oldfs = get_fs();
struct user_arg_ptr argv = {
- .native = (const char __user *const __user *)__argv,
+ .ptr.native = (const char __user *const __user *)__argv,
};

set_fs(KERNEL_DS);
@@ -1510,10 +1528,28 @@ int do_execve(const char *filename,
const char __user *const __user *__envp,
struct pt_regs *regs)
{
- struct user_arg_ptr argv = { .native = __argv };
- struct user_arg_ptr envp = { .native = __envp };
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
+ return do_execve_common(filename, argv, envp, regs);
+}
+
+#ifdef CONFIG_COMPAT
+int compat_do_execve(char *filename,
+ compat_uptr_t __user *__argv,
+ compat_uptr_t __user *__envp,
+ struct pt_regs *regs)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
return do_execve_common(filename, argv, envp, regs);
}
+#endif

void set_binfmt(struct linux_binfmt *new)
{
--- 38/fs/compat.c~3_handle_compat_case 2011-03-05 21:13:45.000000000 +0100
+++ 38/fs/compat.c 2011-03-05 21:23:15.000000000 +0100
@@ -1330,241 +1330,6 @@ compat_sys_openat(unsigned int dfd, cons
return do_sys_open(dfd, filename, flags, mode);
}

-/*
- * compat_count() counts the number of arguments/envelopes. It is basically
- * a copy of count() from fs/exec.c, except that it works with 32 bit argv
- * and envp pointers.
- */
-static int compat_count(compat_uptr_t __user *argv, int max)
-{
- int i = 0;
-
- if (argv != NULL) {
- for (;;) {
- compat_uptr_t p;
-
- if (get_user(p, argv))
- return -EFAULT;
- if (!p)
- break;
- argv++;
- if (i++ >= max)
- return -E2BIG;
-
- if (fatal_signal_pending(current))
- return -ERESTARTNOHAND;
- cond_resched();
- }
- }
- return i;
-}
-
-/*
- * compat_copy_strings() is basically a copy of copy_strings() from fs/exec.c
- * except that it works with 32 bit argv and envp pointers.
- */
-static int compat_copy_strings(int argc, compat_uptr_t __user *argv,
- struct linux_binprm *bprm)
-{
- struct page *kmapped_page = NULL;
- char *kaddr = NULL;
- unsigned long kpos = 0;
- int ret;
-
- while (argc-- > 0) {
- compat_uptr_t str;
- int len;
- unsigned long pos;
-
- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
- ret = -EFAULT;
- goto out;
- }
-
- if (len > MAX_ARG_STRLEN) {
- ret = -E2BIG;
- goto out;
- }
-
- /* We're going to work our way backwords. */
- pos = bprm->p;
- str += len;
- bprm->p -= len;
-
- while (len > 0) {
- int offset, bytes_to_copy;
-
- if (fatal_signal_pending(current)) {
- ret = -ERESTARTNOHAND;
- goto out;
- }
- cond_resched();
-
- offset = pos % PAGE_SIZE;
- if (offset == 0)
- offset = PAGE_SIZE;
-
- bytes_to_copy = offset;
- if (bytes_to_copy > len)
- bytes_to_copy = len;
-
- offset -= bytes_to_copy;
- pos -= bytes_to_copy;
- str -= bytes_to_copy;
- len -= bytes_to_copy;
-
- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
- kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
- }
- if (copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
- }
- }
- }
- ret = 0;
-out:
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- return ret;
-}
-
-/*
- * compat_do_execve() is mostly a copy of do_execve(), with the exception
- * that it processes 32 bit argv and envp pointers.
- */
-int compat_do_execve(char * filename,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs * regs)
-{
- struct linux_binprm *bprm;
- struct file *file;
- struct files_struct *displaced;
- bool clear_in_exec;
- int retval;
-
- retval = unshare_files(&displaced);
- if (retval)
- goto out_ret;
-
- retval = -ENOMEM;
- bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
- if (!bprm)
- goto out_files;
-
- retval = prepare_bprm_creds(bprm);
- if (retval)
- goto out_free;
-
- retval = check_unsafe_exec(bprm);
- if (retval < 0)
- goto out_free;
- clear_in_exec = retval;
- current->in_execve = 1;
-
- file = open_exec(filename);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
-
- sched_exec();
-
- bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
-
- retval = bprm_mm_init(bprm);
- if (retval)
- goto out_file;
-
- bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
- if ((retval = bprm->argc) < 0)
- goto out;
-
- bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
- if ((retval = bprm->envc) < 0)
- goto out;
-
- retval = prepare_binprm(bprm);
- if (retval < 0)
- goto out;
-
- retval = copy_strings_kernel(1, &bprm->filename, bprm);
- if (retval < 0)
- goto out;
-
- bprm->exec = bprm->p;
- retval = compat_copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
-
- retval = compat_copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
-
- retval = search_binary_handler(bprm, regs);
- if (retval < 0)
- goto out;
-
- /* execve succeeded */
- current->fs->in_exec = 0;
- current->in_execve = 0;
- acct_update_integrals(current);
- free_bprm(bprm);
- if (displaced)
- put_files_struct(displaced);
- return retval;
-
-out:
- if (bprm->mm) {
- acct_arg_size(bprm, 0);
- mmput(bprm->mm);
- }
-
-out_file:
- if (bprm->file) {
- allow_write_access(bprm->file);
- fput(bprm->file);
- }
-
-out_unmark:
- if (clear_in_exec)
- current->fs->in_exec = 0;
- current->in_execve = 0;
-
-out_free:
- free_bprm(bprm);
-
-out_files:
- if (displaced)
- reset_files_struct(displaced);
-out_ret:
- return retval;
-}
-
#define __COMPAT_NFDBITS (8 * sizeof(compat_ulong_t))

static int poll_select_copy_remaining(struct timespec *end_time, void __user *p,

2011-03-05 20:40:14

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v4 1/4] exec: introduce get_user_arg_ptr() helper

Introduce get_user_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
---

fs/exec.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

--- 38/fs/exec.c~1_get_arg_ptr 2011-03-05 21:13:46.000000000 +0100
+++ 38/fs/exec.c 2011-03-05 21:21:56.000000000 +0100
@@ -395,6 +395,17 @@ err:
return err;
}

+static const char __user *
+get_user_arg_ptr(const char __user * const __user *argv, int nr)
+{
+ const char __user *ptr;
+
+ if (get_user(ptr, argv + nr))
+ return ERR_PTR(-EFAULT);
+
+ return ptr;
+}
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -404,13 +415,14 @@ static int count(const char __user * con

if (argv != NULL) {
for (;;) {
- const char __user * p;
+ const char __user *p = get_user_arg_ptr(argv, i);

- if (get_user(p, argv))
- return -EFAULT;
if (!p)
break;
- argv++;
+
+ if (IS_ERR(p))
+ return -EFAULT;
+
if (i++ >= max)
return -E2BIG;

@@ -440,16 +452,18 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;

- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
- ret = -EFAULT;
+ ret = -EFAULT;
+ str = get_user_arg_ptr(argv, argc);
+ if (IS_ERR(str))
goto out;
- }

- if (!valid_arg_len(bprm, len)) {
- ret = -E2BIG;
+ len = strnlen_user(str, MAX_ARG_STRLEN);
+ if (!len)
+ goto out;
+
+ ret = -E2BIG;
+ if (!valid_arg_len(bprm, len))
goto out;
- }

/* We're going to work our way backwords. */
pos = bprm->p;

2011-03-05 20:40:29

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v4 2/4] exec: introduce struct user_arg_ptr

No functional changes, preparation.

Introduce struct user_arg_ptr, change do_execve() paths to use it
instead of "char __user * const __user *argv".

This makes the argv/envp arguments opaque, we are ready to handle the
compat case which needs argv pointing to compat_uptr_t.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
---

fs/exec.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

--- 38/fs/exec.c~2_typedef_for_argv 2011-03-05 21:21:56.000000000 +0100
+++ 38/fs/exec.c 2011-03-05 21:22:42.000000000 +0100
@@ -395,12 +395,15 @@ err:
return err;
}

-static const char __user *
-get_user_arg_ptr(const char __user * const __user *argv, int nr)
+struct user_arg_ptr {
+ const char __user *const __user *native;
+};
+
+static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
{
const char __user *ptr;

- if (get_user(ptr, argv + nr))
+ if (get_user(ptr, argv.native + nr))
return ERR_PTR(-EFAULT);

return ptr;
@@ -409,11 +412,11 @@ get_user_arg_ptr(const char __user * con
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user * argv, int max)
+static int count(struct user_arg_ptr argv, int max)
{
int i = 0;

- if (argv != NULL) {
+ if (argv.native != NULL) {
for (;;) {
const char __user *p = get_user_arg_ptr(argv, i);

@@ -439,7 +442,7 @@ static int count(const char __user * con
* processes's memory to the new process's stack. The call to get_user_pages()
* ensures the destination page is created and not swapped out.
*/
-static int copy_strings(int argc, const char __user *const __user *argv,
+static int copy_strings(int argc, struct user_arg_ptr argv,
struct linux_binprm *bprm)
{
struct page *kmapped_page = NULL;
@@ -530,14 +533,19 @@ out:
/*
* Like copy_strings, but get argv and its values from kernel memory.
*/
-int copy_strings_kernel(int argc, const char *const *argv,
+int copy_strings_kernel(int argc, const char *const *__argv,
struct linux_binprm *bprm)
{
int r;
mm_segment_t oldfs = get_fs();
+ struct user_arg_ptr argv = {
+ .native = (const char __user *const __user *)__argv,
+ };
+
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv, bprm);
+ r = copy_strings(argc, argv, bprm);
set_fs(oldfs);
+
return r;
}
EXPORT_SYMBOL(copy_strings_kernel);
@@ -1387,10 +1395,10 @@ EXPORT_SYMBOL(search_binary_handler);
/*
* sys_execve() executes a new program.
*/
-int do_execve(const char * filename,
- const char __user *const __user *argv,
- const char __user *const __user *envp,
- struct pt_regs * regs)
+static int do_execve_common(const char *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ struct pt_regs *regs)
{
struct linux_binprm *bprm;
struct file *file;
@@ -1497,6 +1505,16 @@ out_ret:
return retval;
}

+int do_execve(const char *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ struct pt_regs *regs)
+{
+ struct user_arg_ptr argv = { .native = __argv };
+ struct user_arg_ptr envp = { .native = __envp };
+ return do_execve_common(filename, argv, envp, regs);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;

2011-03-05 20:53:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] exec: unify do_execve/compat_do_execve code

Ok, everything looks fine to me.

Except looking at this, I don't think this part:

On Sat, Mar 5, 2011 at 12:31 PM, Oleg Nesterov <[email protected]> wrote:
>
> ?struct user_arg_ptr {
> - ? ? ? const char __user *const __user *native;
> +#ifdef CONFIG_COMPAT
> + ? ? ? bool is_compat;
> +#endif
> + ? ? ? union {
> + ? ? ? ? ? ? ? const char __user *const __user *native;
> + ? ? ? ? ? ? ? compat_uptr_t __user *compat;
> + ? ? ? } ptr;
> ?};

will necessarily even compile on an architecture that doesn't have any
'compat' support.

Do we even define 'compat_uptr_t' for that case? I don't think so.

So I suspect you need two of those annoying #ifdef's. Or we need to
have some way to guarantee that 'compat_uptr_t' exists.

Linus

2011-03-05 20:41:04

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v4 4/4] exec: document acct_arg_size()

Add the comment to explain acct_arg_size().

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
---

fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- 38/fs/exec.c~4_doc_acct_arg_size 2011-03-05 21:23:15.000000000 +0100
+++ 38/fs/exec.c 2011-03-05 21:23:33.000000000 +0100
@@ -164,7 +164,12 @@ out:
}

#ifdef CONFIG_MMU
-
+/*
+ * The nascent bprm->mm is not visible until exec_mmap() but it can
+ * use a lot of memory, account these pages in current->mm temporary
+ * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
+ * change the counter back via acct_arg_size(0).
+ */
static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;

2011-03-05 21:29:58

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] exec: unify do_execve/compat_do_execve code

On 03/05, Linus Torvalds wrote:
>
> Ok, everything looks fine to me.
>
> Except looking at this, I don't think this part:
>
> On Sat, Mar 5, 2011 at 12:31 PM, Oleg Nesterov <[email protected]> wrote:
> >
> > ?struct user_arg_ptr {
> > - ? ? ? const char __user *const __user *native;
> > +#ifdef CONFIG_COMPAT
> > + ? ? ? bool is_compat;
> > +#endif
> > + ? ? ? union {
> > + ? ? ? ? ? ? ? const char __user *const __user *native;
> > + ? ? ? ? ? ? ? compat_uptr_t __user *compat;
> > + ? ? ? } ptr;
> > ?};
>
> will necessarily even compile on an architecture that doesn't have any
> 'compat' support.

Aaaaaaaaaaaaaaaaaah, now this is a really good point.

> Do we even define 'compat_uptr_t' for that case? I don't think so.

Indeed, you are right.

What I was thinking about? I do not know.

> So I suspect you need two of those annoying #ifdef's.

please expect v5 tomorrow.

Oleg.

2011-03-06 17:11:27

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v5 1/4] exec: introduce get_user_arg_ptr() helper

Introduce get_user_arg_ptr() helper, convert count() and copy_strings()
to use it.

No functional changes, preparation. This helper is trivial, it just
reads the pointer from argv/envp user-space array.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Tested-by: KOSAKI Motohiro <[email protected]>
---

fs/exec.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

--- 38/fs/exec.c~1_get_arg_ptr 2011-03-06 17:48:00.000000000 +0100
+++ 38/fs/exec.c 2011-03-06 17:51:01.000000000 +0100
@@ -395,6 +395,17 @@ err:
return err;
}

+static const char __user *
+get_user_arg_ptr(const char __user * const __user *argv, int nr)
+{
+ const char __user *ptr;
+
+ if (get_user(ptr, argv + nr))
+ return ERR_PTR(-EFAULT);
+
+ return ptr;
+}
+
/*
* count() counts the number of strings in array ARGV.
*/
@@ -404,13 +415,14 @@ static int count(const char __user * con

if (argv != NULL) {
for (;;) {
- const char __user * p;
+ const char __user *p = get_user_arg_ptr(argv, i);

- if (get_user(p, argv))
- return -EFAULT;
if (!p)
break;
- argv++;
+
+ if (IS_ERR(p))
+ return -EFAULT;
+
if (i++ >= max)
return -E2BIG;

@@ -440,16 +452,18 @@ static int copy_strings(int argc, const
int len;
unsigned long pos;

- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(str, MAX_ARG_STRLEN))) {
- ret = -EFAULT;
+ ret = -EFAULT;
+ str = get_user_arg_ptr(argv, argc);
+ if (IS_ERR(str))
goto out;
- }

- if (!valid_arg_len(bprm, len)) {
- ret = -E2BIG;
+ len = strnlen_user(str, MAX_ARG_STRLEN);
+ if (!len)
+ goto out;
+
+ ret = -E2BIG;
+ if (!valid_arg_len(bprm, len))
goto out;
- }

/* We're going to work our way backwords. */
pos = bprm->p;

2011-03-06 17:12:04

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v5 3/4] exec: unify do_execve/compat_do_execve code

Add the appropriate members into struct user_arg_ptr and teach
get_user_arg_ptr() to handle is_compat = T case correctly.

This allows us to remove the compat_do_execve() code from fs/compat.c
and reimplement compat_do_execve() as the trivial wrapper on top of
do_execve_common(is_compat => true).

In fact, this fixes another (minor) bug. "compat_uptr_t str" can
overflow after "str += len" in compat_copy_strings() if a 64bit
application execs via sys32_execve().

Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
need them any longer.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Tested-by: KOSAKI Motohiro <[email protected]>
---

include/linux/binfmts.h | 4
fs/exec.c | 58 +++++++++--
fs/compat.c | 235 ------------------------------------------------
3 files changed, 46 insertions(+), 251 deletions(-)

--- 38/include/linux/binfmts.h~3_handle_compat_case 2011-03-06 17:48:00.000000000 +0100
+++ 38/include/linux/binfmts.h 2011-03-06 17:52:26.000000000 +0100
@@ -60,10 +60,6 @@ struct linux_binprm {
unsigned long loader, exec;
};

-extern void acct_arg_size(struct linux_binprm *bprm, unsigned long pages);
-extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
- int write);
-
#define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
#define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)

--- 38/fs/exec.c~3_handle_compat_case 2011-03-06 17:51:44.000000000 +0100
+++ 38/fs/exec.c 2011-03-06 17:56:26.000000000 +0100
@@ -55,6 +55,7 @@
#include <linux/fs_struct.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/compat.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -164,7 +165,7 @@ out:

#ifdef CONFIG_MMU

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;
long diff = (long)(pages - bprm->vma_pages);
@@ -183,7 +184,7 @@ void acct_arg_size(struct linux_binprm *
#endif
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -302,11 +303,11 @@ static bool valid_arg_len(struct linux_b

#else

-void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
+static inline void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
}

-struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
+static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
int write)
{
struct page *page;
@@ -396,17 +397,36 @@ err:
}

struct user_arg_ptr {
- const char __user *const __user *native;
+#ifdef CONFIG_COMPAT
+ bool is_compat;
+#endif
+ union {
+ const char __user *const __user *native;
+#ifdef CONFIG_COMPAT
+ compat_uptr_t __user *compat;
+#endif
+ } ptr;
};

static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
{
- const char __user *ptr;
+ const char __user *native;

- if (get_user(ptr, argv.native + nr))
+#ifdef CONFIG_COMPAT
+ if (unlikely(argv.is_compat)) {
+ compat_uptr_t compat;
+
+ if (get_user(compat, argv.ptr.compat + nr))
+ return ERR_PTR(-EFAULT);
+
+ return compat_ptr(compat);
+ }
+#endif
+
+ if (get_user(native, argv.ptr.native + nr))
return ERR_PTR(-EFAULT);

- return ptr;
+ return native;
}

/*
@@ -416,7 +436,7 @@ static int count(struct user_arg_ptr arg
{
int i = 0;

- if (argv.native != NULL) {
+ if (argv.ptr.native != NULL) {
for (;;) {
const char __user *p = get_user_arg_ptr(argv, i);

@@ -539,7 +559,7 @@ int copy_strings_kernel(int argc, const
int r;
mm_segment_t oldfs = get_fs();
struct user_arg_ptr argv = {
- .native = (const char __user *const __user *)__argv,
+ .ptr.native = (const char __user *const __user *)__argv,
};

set_fs(KERNEL_DS);
@@ -1510,11 +1530,29 @@ int do_execve(const char *filename,
const char __user *const __user *__envp,
struct pt_regs *regs)
{
- struct user_arg_ptr argv = { .native = __argv };
- struct user_arg_ptr envp = { .native = __envp };
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
return do_execve_common(filename, argv, envp, regs);
}

+#ifdef CONFIG_COMPAT
+int compat_do_execve(char *filename,
+ compat_uptr_t __user *__argv,
+ compat_uptr_t __user *__envp,
+ struct pt_regs *regs)
+{
+ struct user_arg_ptr argv = {
+ .is_compat = true,
+ .ptr.compat = __argv,
+ };
+ struct user_arg_ptr envp = {
+ .is_compat = true,
+ .ptr.compat = __envp,
+ };
+ return do_execve_common(filename, argv, envp, regs);
+}
+#endif
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
--- 38/fs/compat.c~3_handle_compat_case 2011-03-06 17:48:00.000000000 +0100
+++ 38/fs/compat.c 2011-03-06 17:52:26.000000000 +0100
@@ -1330,241 +1330,6 @@ compat_sys_openat(unsigned int dfd, cons
return do_sys_open(dfd, filename, flags, mode);
}

-/*
- * compat_count() counts the number of arguments/envelopes. It is basically
- * a copy of count() from fs/exec.c, except that it works with 32 bit argv
- * and envp pointers.
- */
-static int compat_count(compat_uptr_t __user *argv, int max)
-{
- int i = 0;
-
- if (argv != NULL) {
- for (;;) {
- compat_uptr_t p;
-
- if (get_user(p, argv))
- return -EFAULT;
- if (!p)
- break;
- argv++;
- if (i++ >= max)
- return -E2BIG;
-
- if (fatal_signal_pending(current))
- return -ERESTARTNOHAND;
- cond_resched();
- }
- }
- return i;
-}
-
-/*
- * compat_copy_strings() is basically a copy of copy_strings() from fs/exec.c
- * except that it works with 32 bit argv and envp pointers.
- */
-static int compat_copy_strings(int argc, compat_uptr_t __user *argv,
- struct linux_binprm *bprm)
-{
- struct page *kmapped_page = NULL;
- char *kaddr = NULL;
- unsigned long kpos = 0;
- int ret;
-
- while (argc-- > 0) {
- compat_uptr_t str;
- int len;
- unsigned long pos;
-
- if (get_user(str, argv+argc) ||
- !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
- ret = -EFAULT;
- goto out;
- }
-
- if (len > MAX_ARG_STRLEN) {
- ret = -E2BIG;
- goto out;
- }
-
- /* We're going to work our way backwords. */
- pos = bprm->p;
- str += len;
- bprm->p -= len;
-
- while (len > 0) {
- int offset, bytes_to_copy;
-
- if (fatal_signal_pending(current)) {
- ret = -ERESTARTNOHAND;
- goto out;
- }
- cond_resched();
-
- offset = pos % PAGE_SIZE;
- if (offset == 0)
- offset = PAGE_SIZE;
-
- bytes_to_copy = offset;
- if (bytes_to_copy > len)
- bytes_to_copy = len;
-
- offset -= bytes_to_copy;
- pos -= bytes_to_copy;
- str -= bytes_to_copy;
- len -= bytes_to_copy;
-
- if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
- struct page *page;
-
- page = get_arg_page(bprm, pos, 1);
- if (!page) {
- ret = -E2BIG;
- goto out;
- }
-
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- kmapped_page = page;
- kaddr = kmap(kmapped_page);
- kpos = pos & PAGE_MASK;
- flush_cache_page(bprm->vma, kpos,
- page_to_pfn(kmapped_page));
- }
- if (copy_from_user(kaddr+offset, compat_ptr(str),
- bytes_to_copy)) {
- ret = -EFAULT;
- goto out;
- }
- }
- }
- ret = 0;
-out:
- if (kmapped_page) {
- flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
- put_page(kmapped_page);
- }
- return ret;
-}
-
-/*
- * compat_do_execve() is mostly a copy of do_execve(), with the exception
- * that it processes 32 bit argv and envp pointers.
- */
-int compat_do_execve(char * filename,
- compat_uptr_t __user *argv,
- compat_uptr_t __user *envp,
- struct pt_regs * regs)
-{
- struct linux_binprm *bprm;
- struct file *file;
- struct files_struct *displaced;
- bool clear_in_exec;
- int retval;
-
- retval = unshare_files(&displaced);
- if (retval)
- goto out_ret;
-
- retval = -ENOMEM;
- bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
- if (!bprm)
- goto out_files;
-
- retval = prepare_bprm_creds(bprm);
- if (retval)
- goto out_free;
-
- retval = check_unsafe_exec(bprm);
- if (retval < 0)
- goto out_free;
- clear_in_exec = retval;
- current->in_execve = 1;
-
- file = open_exec(filename);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
-
- sched_exec();
-
- bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
-
- retval = bprm_mm_init(bprm);
- if (retval)
- goto out_file;
-
- bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
- if ((retval = bprm->argc) < 0)
- goto out;
-
- bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
- if ((retval = bprm->envc) < 0)
- goto out;
-
- retval = prepare_binprm(bprm);
- if (retval < 0)
- goto out;
-
- retval = copy_strings_kernel(1, &bprm->filename, bprm);
- if (retval < 0)
- goto out;
-
- bprm->exec = bprm->p;
- retval = compat_copy_strings(bprm->envc, envp, bprm);
- if (retval < 0)
- goto out;
-
- retval = compat_copy_strings(bprm->argc, argv, bprm);
- if (retval < 0)
- goto out;
-
- retval = search_binary_handler(bprm, regs);
- if (retval < 0)
- goto out;
-
- /* execve succeeded */
- current->fs->in_exec = 0;
- current->in_execve = 0;
- acct_update_integrals(current);
- free_bprm(bprm);
- if (displaced)
- put_files_struct(displaced);
- return retval;
-
-out:
- if (bprm->mm) {
- acct_arg_size(bprm, 0);
- mmput(bprm->mm);
- }
-
-out_file:
- if (bprm->file) {
- allow_write_access(bprm->file);
- fput(bprm->file);
- }
-
-out_unmark:
- if (clear_in_exec)
- current->fs->in_exec = 0;
- current->in_execve = 0;
-
-out_free:
- free_bprm(bprm);
-
-out_files:
- if (displaced)
- reset_files_struct(displaced);
-out_ret:
- return retval;
-}
-
#define __COMPAT_NFDBITS (8 * sizeof(compat_ulong_t))

static int poll_select_copy_remaining(struct timespec *end_time, void __user *p,

2011-03-06 17:11:11

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v5 0/4] exec: unify native/compat code

On 03/06, KOSAKI Motohiro wrote:
>
> And, I happily reported this series run successfully my testsuite.
> Could you please add my tested-by tag?

Sure, thanks a lot Kosaki.

I hope this is the last version. Changes:

- as Linus pointed out, we do not have compat_uptr_t without
CONFIG_COMPAT. Add another ifdef into struct user_arg_ptr.

Oleg.

2011-03-06 17:12:20

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v5 4/4] exec: document acct_arg_size()

Add the comment to explain acct_arg_size().

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
---

fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- 38/fs/exec.c~4_doc_acct_arg_size 2011-03-06 17:56:26.000000000 +0100
+++ 38/fs/exec.c 2011-03-06 17:56:47.000000000 +0100
@@ -164,7 +164,12 @@ out:
}

#ifdef CONFIG_MMU
-
+/*
+ * The nascent bprm->mm is not visible until exec_mmap() but it can
+ * use a lot of memory, account these pages in current->mm temporary
+ * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
+ * change the counter back via acct_arg_size(0).
+ */
static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
{
struct mm_struct *mm = current->mm;

2011-03-06 17:11:59

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH v5 2/4] exec: introduce struct user_arg_ptr

No functional changes, preparation.

Introduce struct user_arg_ptr, change do_execve() paths to use it
instead of "char __user * const __user *argv".

This makes the argv/envp arguments opaque, we are ready to handle the
compat case which needs argv pointing to compat_uptr_t.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Tested-by: KOSAKI Motohiro <[email protected]>
---

fs/exec.c | 42 ++++++++++++++++++++++++++++++------------
1 file changed, 30 insertions(+), 12 deletions(-)

--- 38/fs/exec.c~2_typedef_for_argv 2011-03-06 17:51:01.000000000 +0100
+++ 38/fs/exec.c 2011-03-06 17:51:44.000000000 +0100
@@ -395,12 +395,15 @@ err:
return err;
}

-static const char __user *
-get_user_arg_ptr(const char __user * const __user *argv, int nr)
+struct user_arg_ptr {
+ const char __user *const __user *native;
+};
+
+static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
{
const char __user *ptr;

- if (get_user(ptr, argv + nr))
+ if (get_user(ptr, argv.native + nr))
return ERR_PTR(-EFAULT);

return ptr;
@@ -409,11 +412,11 @@ get_user_arg_ptr(const char __user * con
/*
* count() counts the number of strings in array ARGV.
*/
-static int count(const char __user * const __user * argv, int max)
+static int count(struct user_arg_ptr argv, int max)
{
int i = 0;

- if (argv != NULL) {
+ if (argv.native != NULL) {
for (;;) {
const char __user *p = get_user_arg_ptr(argv, i);

@@ -439,7 +442,7 @@ static int count(const char __user * con
* processes's memory to the new process's stack. The call to get_user_pages()
* ensures the destination page is created and not swapped out.
*/
-static int copy_strings(int argc, const char __user *const __user *argv,
+static int copy_strings(int argc, struct user_arg_ptr argv,
struct linux_binprm *bprm)
{
struct page *kmapped_page = NULL;
@@ -530,14 +533,19 @@ out:
/*
* Like copy_strings, but get argv and its values from kernel memory.
*/
-int copy_strings_kernel(int argc, const char *const *argv,
+int copy_strings_kernel(int argc, const char *const *__argv,
struct linux_binprm *bprm)
{
int r;
mm_segment_t oldfs = get_fs();
+ struct user_arg_ptr argv = {
+ .native = (const char __user *const __user *)__argv,
+ };
+
set_fs(KERNEL_DS);
- r = copy_strings(argc, (const char __user *const __user *)argv, bprm);
+ r = copy_strings(argc, argv, bprm);
set_fs(oldfs);
+
return r;
}
EXPORT_SYMBOL(copy_strings_kernel);
@@ -1387,10 +1395,10 @@ EXPORT_SYMBOL(search_binary_handler);
/*
* sys_execve() executes a new program.
*/
-int do_execve(const char * filename,
- const char __user *const __user *argv,
- const char __user *const __user *envp,
- struct pt_regs * regs)
+static int do_execve_common(const char *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ struct pt_regs *regs)
{
struct linux_binprm *bprm;
struct file *file;
@@ -1497,6 +1505,16 @@ out_ret:
return retval;
}

+int do_execve(const char *filename,
+ const char __user *const __user *__argv,
+ const char __user *const __user *__envp,
+ struct pt_regs *regs)
+{
+ struct user_arg_ptr argv = { .native = __argv };
+ struct user_arg_ptr envp = { .native = __envp };
+ return do_execve_common(filename, argv, envp, regs);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;

2011-03-06 12:04:09

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] exec: unify native/compat code

> On 03/03, Linus Torvalds wrote:
> >
> > On Thu, Mar 3, 2011 at 7:47 AM, Oleg Nesterov <[email protected]> wrote:
> > >> I _personally_ don't like "conditional". Its name is based on code logic.
> > >> It's unclear what mean "conditional". From data strucuture view, It is
> > >> "opaque userland pointer".
> > >
> > > I agree with any naming, just suggest a better name ;)
> >
> > Maybe just "struct user_arg_ptr" or something?
>
> OK, nothing else was suggessted, I assume Kosaki agrees.

Sure. :)

And, I happily reported this series run successfully my testsuite.
Could you please add my tested-by tag?

thanks.


>
> So rename conditional_ptr to user_arg_ptr.
>
> Also rename get_user_ptr() to get_user_arg_ptr(). It was suggested to
> use the same "user_arg_ptr" for this helper too, but this is not
> grep-friendly. As for get_ in the name... Well, I can redo again ;)
> But this matches get_user() and this is all what this helper does.
>
> Otherwise unchanged.
>
> Oleg.
>