2010-02-06 00:43:17

by Michael Neuling

[permalink] [raw]
Subject: Stack size protection broken on ppc64

On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
now more restrictive than it was before. On 2.6.31 with 4k pages I
could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.

Using 64k pages is even worse. I can't even run '/bin/ls' with a 1MB
stack (ulimit -s 1024; /bin/ls). Hence, it seems new kernels are too
restrictive, rather than the old kernels being too liberal.

I've not tested with any other architectures.

Bisecting, I found that this is the culprit (which is in 2.6.32)

commit fc63cf237078c86214abcb2ee9926d8ad289da9b
Author: Anton Blanchard <[email protected]>
exec: setup_arg_pages() fails to return errors

Looking at the patch, it's probably just unmasking a preexisting issue.
The error path for expand_stack() (and others) was modified to:
---
ret = expand_stack(vma, stack_base);
if (ret)
ret = -EFAULT;

out_unlock:
up_write(&mm->mmap_sem);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(setup_arg_pages);
---

So previously expand_stack errors were not returned correctly by
setup_arg_pages, but now they are.

Any clues how to fix this?

Mikey


2010-02-06 04:24:05

by Anton Blanchard

[permalink] [raw]
Subject: Re: Stack size protection broken on ppc64


Hi,

> On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
> now more restrictive than it was before. On 2.6.31 with 4k pages I
> could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
> mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.
>
> Using 64k pages is even worse. I can't even run '/bin/ls' with a 1MB
> stack (ulimit -s 1024; /bin/ls). Hence, it seems new kernels are too
> restrictive, rather than the old kernels being too liberal.

It looks like this is causing it:

#define EXTRA_STACK_VM_PAGES 20 /* random */

...

#ifdef CONFIG_STACK_GROWSUP
stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
#else
stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
#endif

Which got added back in 2005 in a memory overcommit patch. It only took 5
years for us to go back and review that random setting :)

The comment from Andries explains the purpose:

(1) It reserves a reasonable amount of virtual stack space (amount
randomly chosen, no guarantees given) when the process is started, so
that the common utilities will not be killed by segfault on stack
extension.

This explains why 64kB is much worse. The extra stack reserve should be in kB
and we also need to be careful not to ask for more than our rlimit.

Anton

2010-02-06 10:22:18

by Michael Neuling

[permalink] [raw]
Subject: Re: Stack size protection broken on ppc64

> > On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
> > now more restrictive than it was before. On 2.6.31 with 4k pages I
> > could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
> > mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.
> >
> > Using 64k pages is even worse. I can't even run '/bin/ls' with a 1MB
> > stack (ulimit -s 1024; /bin/ls). Hence, it seems new kernels are too
> > restrictive, rather than the old kernels being too liberal.
>
> It looks like this is causing it:
>
> #define EXTRA_STACK_VM_PAGES 20 /* random */
>
> ...
>
> #ifdef CONFIG_STACK_GROWSUP
> stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> #else
> stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> #endif
>
> Which got added back in 2005 in a memory overcommit patch. It only took 5
> years for us to go back and review that random setting :)
>
> The comment from Andries explains the purpose:
>
> (1) It reserves a reasonable amount of virtual stack space (amount
> randomly chosen, no guarantees given) when the process is started, so
> that the common utilities will not be killed by segfault on stack
> extension.
>
> This explains why 64kB is much worse. The extra stack reserve should be in kB
> and we also need to be careful not to ask for more than our rlimit.

Cool, thanks. The following is based on this and fixes the problem for
me on PPC64 ie. the !CONFIG_STACK_GROWSUP case.

Mikey

[PATCH] Restrict stack space reservation to rlimit

When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

Also, reserve the same stack size independent of page size.

This fixes a bug unmasked by fc63cf237078c86214abcb2ee9926d8ad289da9b

Signed-off-by: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: [email protected]
---
fs/exec.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

Index: clone1/fs/exec.c
===================================================================
--- clone1.orig/fs/exec.c
+++ clone1/fs/exec.c
@@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
return 0;
}

-#define EXTRA_STACK_VM_PAGES 20 /* random */
+#define EXTRA_STACK_VM_SIZE 81920UL /* randomly 20 4K pages */

/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

+ stack_base = min(EXTRA_STACK_VM_SIZE,
+ current->signal->rlim[RLIMIT_STACK].rlim_cur) -
+ PAGE_SIZE;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_base;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_base;
#endif
ret = expand_stack(vma, stack_base);
if (ret)

2010-02-08 00:07:56

by Michael Neuling

[permalink] [raw]
Subject: [PATCH] Restrict stack space reservation to rlimit

apkm, linus: this or something like it needs to go into 2.6.33 (& 32) to
fix 'ulimit -s'.

Mikey

[PATCH] Restrict stack space reservation to rlimit

When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

Also, reserve the same stack size independent of page size.

This fixes a bug unmasked by fc63cf237078c86214abcb2ee9926d8ad289da9b

Signed-off-by: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: [email protected]
---
fs/exec.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

Index: clone1/fs/exec.c
===================================================================
--- clone1.orig/fs/exec.c
+++ clone1/fs/exec.c
@@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
return 0;
}

-#define EXTRA_STACK_VM_PAGES 20 /* random */
+#define EXTRA_STACK_VM_SIZE 81920UL /* randomly 20 4K pages */

/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

+ stack_base = min(EXTRA_STACK_VM_SIZE,
+ current->signal->rlim[RLIMIT_STACK].rlim_cur) -
+ PAGE_SIZE;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_base;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_base;
#endif
ret = expand_stack(vma, stack_base);
if (ret)

2010-02-08 00:09:23

by Anton Blanchard

[permalink] [raw]
Subject: Re: Stack size protection broken on ppc64


Hi,

> Cool, thanks. The following is based on this and fixes the problem for
> me on PPC64 ie. the !CONFIG_STACK_GROWSUP case.

Thanks! Seeing the original setting of EXTRA_STACK_VM_PAGES is more or
less random, I wonder if we should round EXTRA_STACK_VM_SIZE up to 128kB
(or even down to 64kB) so it operates better with > 4kB pages.

But in the end its probably of little use for the default OVERCOMMIT_GUESS
setting, so the main thing is we dont terminate processes incorrectly.

Acked-by: Anton Blanchard <[email protected]>

Anton

2010-02-08 00:28:46

by Michael Neuling

[permalink] [raw]
Subject: [PATCH] Restrict stack space reservation to rlimit

When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

Also, reserve the same stack size independent of page size.

This fixes a bug cause by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
"mm: variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b
"exec: setup_arg_pages() fails to return errors".

Signed-off-by: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: [email protected]
---
Update commit message to include patch name and SHA1 of related
patches.

fs/exec.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

Index: clone1/fs/exec.c
===================================================================
--- clone1.orig/fs/exec.c
+++ clone1/fs/exec.c
@@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
return 0;
}

-#define EXTRA_STACK_VM_PAGES 20 /* random */
+#define EXTRA_STACK_VM_SIZE 81920UL /* randomly 20 4K pages */

/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

+ stack_base = min(EXTRA_STACK_VM_SIZE,
+ current->signal->rlim[RLIMIT_STACK].rlim_cur) -
+ PAGE_SIZE;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_base;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_base;
#endif
ret = expand_stack(vma, stack_base);
if (ret)

2010-02-08 05:06:11

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit

Hi

> apkm, linus: this or something like it needs to go into 2.6.33 (& 32) to
> fix 'ulimit -s'.

"fix ulimit -s" is too cool explanation ;-)
we are not ESPer. please consider to provide what bug is exist.


> Mikey
>
> [PATCH] Restrict stack space reservation to rlimit
>
> When reserving stack space for a new process, make sure we're not
> attempting to allocate more than rlimit allows.
>
> Also, reserve the same stack size independent of page size.

Why do we need page size independent stack size? It seems to have
compatibility breaking risk.


>
> This fixes a bug unmasked by fc63cf237078c86214abcb2ee9926d8ad289da9b
>
> Signed-off-by: Michael Neuling <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: [email protected]
> ---
> fs/exec.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> Index: clone1/fs/exec.c
> ===================================================================
> --- clone1.orig/fs/exec.c
> +++ clone1/fs/exec.c
> @@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
> return 0;
> }
>
> -#define EXTRA_STACK_VM_PAGES 20 /* random */
> +#define EXTRA_STACK_VM_SIZE 81920UL /* randomly 20 4K pages */
>
> /*
> * Finalizes the stack vm_area_struct. The flags and permissions are updated,
> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> goto out_unlock;
> }
>
> + stack_base = min(EXTRA_STACK_VM_SIZE,
> + current->signal->rlim[RLIMIT_STACK].rlim_cur) -
> + PAGE_SIZE;
> #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_end + stack_base;
> #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_start - stack_base;
> #endif
> ret = expand_stack(vma, stack_base);
> if (ret)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2010-02-08 05:12:28

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit


Hi,

> Why do we need page size independent stack size? It seems to have
> compatibility breaking risk.

I don't think so. The current behaviour is clearly wrong, we dont need a
16x larger stack just because you went from a 4kB to a 64kB base page
size. The user application stack usage is the same in both cases.

Anton

2010-02-08 05:22:28

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit

>
> Hi,
>
> > Why do we need page size independent stack size? It seems to have
> > compatibility breaking risk.
>
> I don't think so. The current behaviour is clearly wrong, we dont need a
> 16x larger stack just because you went from a 4kB to a 64kB base page
> size. The user application stack usage is the same in both cases.

I didn't discuss which behavior is better. Michael said he want to apply
his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
compatibility patch.

Your answer doesn't explain why can't we wait it until next merge window.


btw, personally, I like page size indepent stack size. but I'm not sure
why making stack size independency is related to bug fix.


2010-02-08 05:32:45

by Anton Blanchard

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit


Hi,

> I didn't discuss which behavior is better. Michael said he want to apply
> his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> compatibility patch.
>
> Your answer doesn't explain why can't we wait it until next merge window.
>
>
> btw, personally, I like page size indepent stack size. but I'm not sure
> why making stack size independency is related to bug fix.

OK sorry, I misunderstood your initial mail. I agree fixing the bit that
regressed in 2.6.32 is the most important thing. The difference in page size is
clearly wrong but since it isn't a regression we could probably live with it
until 2.6.34

Anton

2010-02-08 05:37:53

by Michael Neuling

[permalink] [raw]
Subject: [PATCH] Restrict stack space reservation to rlimit

> >
> > Hi,
> >
> > > Why do we need page size independent stack size? It seems to have
> > > compatibility breaking risk.
> >
> > I don't think so. The current behaviour is clearly wrong, we dont need a
> > 16x larger stack just because you went from a 4kB to a 64kB base page
> > size. The user application stack usage is the same in both cases.
>
> I didn't discuss which behavior is better. Michael said he want to apply
> his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> compatibility patch.
>
> Your answer doesn't explain why can't we wait it until next merge window.
>
> btw, personally, I like page size indepent stack size. but I'm not sure
> why making stack size independency is related to bug fix.

I tend to agree.

Below is just the bug fix to limit the reservation size based rlimit.
We still reserve different stack sizes based on the page size as
before (unless we hit rlimit of course).

Mikey

Restrict stack space reservation to rlimit

When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

This fixes a bug cause by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
"mm: variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b
"exec: setup_arg_pages() fails to return errors".

Signed-off-by: Michael Neuling <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: [email protected]
---
fs/exec.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

+ stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
+ current->signal->rlim[RLIMIT_STACK].rlim_cur -
+ PAGE_SIZE);
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_base;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_base;
#endif
ret = expand_stack(vma, stack_base);
if (ret)

2010-02-08 06:05:48

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit

> > >
> > > Hi,
> > >
> > > > Why do we need page size independent stack size? It seems to have
> > > > compatibility breaking risk.
> > >
> > > I don't think so. The current behaviour is clearly wrong, we dont need a
> > > 16x larger stack just because you went from a 4kB to a 64kB base page
> > > size. The user application stack usage is the same in both cases.
> >
> > I didn't discuss which behavior is better. Michael said he want to apply
> > his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> > compatibility patch.
> >
> > Your answer doesn't explain why can't we wait it until next merge window.
> >
> > btw, personally, I like page size indepent stack size. but I'm not sure
> > why making stack size independency is related to bug fix.
>
> I tend to agree.
>
> Below is just the bug fix to limit the reservation size based rlimit.
> We still reserve different stack sizes based on the page size as
> before (unless we hit rlimit of course).

Thanks.

I agree your patch in almost part. but I have very few requests.


> Mikey
>
> Restrict stack space reservation to rlimit
>
> When reserving stack space for a new process, make sure we're not
> attempting to allocate more than rlimit allows.
>
> This fixes a bug cause by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
> "mm: variable length argument support" and unmasked by
> fc63cf237078c86214abcb2ee9926d8ad289da9b
> "exec: setup_arg_pages() fails to return errors".

Your initial mail have following problem use-case. please append it
into the patch description.

On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
now more restrictive than it was before. On 2.6.31 with 4k pages I
could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.


>
> Signed-off-by: Michael Neuling <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: [email protected]
> ---
> fs/exec.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: linux-2.6-ozlabs/fs/exec.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/fs/exec.c
> +++ linux-2.6-ozlabs/fs/exec.c
> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> goto out_unlock;
> }
>
> + stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> + current->signal->rlim[RLIMIT_STACK].rlim_cur -
> + PAGE_SIZE);

This line is a bit unclear why "- PAGE_SIZE" is necessary.
personally, I like following likes explicit comments.

stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);

/* Initial stack must not cause stack overflow. */
if (stack_expand + PAGE_SIZE > stack_lim)
stack_expand = stack_lim - PAGE_SIZE;

note: accessing rlim_cur require ACCESS_ONCE.


Thought?


> #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_end + stack_base;
> #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_start - stack_base;
> #endif
> ret = expand_stack(vma, stack_base);
> if (ret)
>


2010-02-08 06:12:11

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit

>
> Hi,
>
> > I didn't discuss which behavior is better. Michael said he want to apply
> > his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> > compatibility patch.
> >
> > Your answer doesn't explain why can't we wait it until next merge window.
> >
> >
> > btw, personally, I like page size indepent stack size. but I'm not sure
> > why making stack size independency is related to bug fix.
>
> OK sorry, I misunderstood your initial mail. I agree fixing the bit that
> regressed in 2.6.32 is the most important thing. The difference in page size is
> clearly wrong but since it isn't a regression we could probably live with it
> until 2.6.34

thanks!

2010-02-08 07:07:50

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit

On Mon, Feb 8, 2010 at 2:05 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> --- linux-2.6-ozlabs.orig/fs/exec.c
>> +++ linux-2.6-ozlabs/fs/exec.c
>> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
>>                       goto out_unlock;
>>       }
>>
>> +     stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
>> +                      current->signal->rlim[RLIMIT_STACK].rlim_cur -
>> +                        PAGE_SIZE);
>
> This line is a bit unclear why "- PAGE_SIZE" is necessary.
> personally, I like following likes explicit comments.
>
>        stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>        stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
>
>        /* Initial stack must not cause stack overflow. */
>        if (stack_expand + PAGE_SIZE > stack_lim)
>                stack_expand = stack_lim - PAGE_SIZE;
>
> note: accessing rlim_cur require ACCESS_ONCE.
>
>
> Thought?

It's better to use the helper function: rlimit().

2010-02-08 07:11:53

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit

> On Mon, Feb 8, 2010 at 2:05 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> >> --- linux-2.6-ozlabs.orig/fs/exec.c
> >> +++ linux-2.6-ozlabs/fs/exec.c
> >> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> >>                       goto out_unlock;
> >>       }
> >>
> >> +     stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> >> +                      current->signal->rlim[RLIMIT_STACK].rlim_cur -
> >> +                        PAGE_SIZE);
> >
> > This line is a bit unclear why "- PAGE_SIZE" is necessary.
> > personally, I like following likes explicit comments.
> >
> >        stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> >        stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
> >
> >        /* Initial stack must not cause stack overflow. */
> >        if (stack_expand + PAGE_SIZE > stack_lim)
> >                stack_expand = stack_lim - PAGE_SIZE;
> >
> > note: accessing rlim_cur require ACCESS_ONCE.
> >
> >
> > Thought?
>
> It's better to use the helper function: rlimit().

AFAIK, stable tree doesn't have rlimit(). but yes, making two patch
(for mainline and for stable) is good opinion.


2010-02-08 10:45:55

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Restrict stack space reservation to rlimit


In message <[email protected]> you wrote:
> > > >
> > > > Hi,
> > > >
> > > > > Why do we need page size independent stack size? It seems to have
> > > > > compatibility breaking risk.
> > > >
> > > > I don't think so. The current behaviour is clearly wrong, we dont need
a
> > > > 16x larger stack just because you went from a 4kB to a 64kB base page
> > > > size. The user application stack usage is the same in both cases.
> > >
> > > I didn't discuss which behavior is better. Michael said he want to apply
> > > his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> > > compatibility patch.
> > >
> > > Your answer doesn't explain why can't we wait it until next merge window.
> > >
> > > btw, personally, I like page size indepent stack size. but I'm not sure
> > > why making stack size independency is related to bug fix.
> >
> > I tend to agree.
> >
> > Below is just the bug fix to limit the reservation size based rlimit.
> > We still reserve different stack sizes based on the page size as
> > before (unless we hit rlimit of course).
>
> Thanks.
>
> I agree your patch in almost part. but I have very few requests.
>
>
> > Mikey
> >
> > Restrict stack space reservation to rlimit
> >
> > When reserving stack space for a new process, make sure we're not
> > attempting to allocate more than rlimit allows.
> >
> > This fixes a bug cause by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
> > "mm: variable length argument support" and unmasked by
> > fc63cf237078c86214abcb2ee9926d8ad289da9b
> > "exec: setup_arg_pages() fails to return errors".
>
> Your initial mail have following problem use-case. please append it
> into the patch description.
>
> On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
> now more restrictive than it was before. On 2.6.31 with 4k pages I
> could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
> mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.

Ok, I'll add this info in.

>
> >
> > Signed-off-by: Michael Neuling <[email protected]>
> > Cc: Anton Blanchard <[email protected]>
> > Cc: [email protected]
> > ---
> > fs/exec.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6-ozlabs/fs/exec.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/fs/exec.c
> > +++ linux-2.6-ozlabs/fs/exec.c
> > @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> > goto out_unlock;
> > }
> >
> > + stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> > + current->signal->rlim[RLIMIT_STACK].rlim_cur -
> > + PAGE_SIZE);
>
> This line is a bit unclear why "- PAGE_SIZE" is necessary.

This is because the stack is already 1 page in size. I'm going to
change that code to make it clearer... hopefully :-)

> personally, I like following likes explicit comments.
>
> stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
>
> /* Initial stack must not cause stack overflow. */
> if (stack_expand + PAGE_SIZE > stack_lim)
> stack_expand = stack_lim - PAGE_SIZE;
>
> note: accessing rlim_cur require ACCESS_ONCE.
>
>
> Thought?

Thanks, looks better/clearer to me too. I'll change, new patch coming....

Mikey

>
>
> > #ifdef CONFIG_STACK_GROWSUP
> > - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > + stack_base = vma->vm_end + stack_base;
> > #else
> > - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > + stack_base = vma->vm_start - stack_base;
> > #endif
> > ret = expand_stack(vma, stack_base);
> > if (ret)
> >
>
>
>

2010-02-09 06:11:28

by Michael Neuling

[permalink] [raw]
Subject: [PATCH] Restrict initial stack space expansion to rlimit

When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.

This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
to return errors". This bug means when limiting the stack to less the
20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
be killed before they start. This is particularly bad with 64K pages,
where a ulimit below 1280K will kill every process.

Signed-off-by: Michael Neuling <[email protected]>
Cc: [email protected]
---
Attempts to answer comments from Kosaki Motohiro.

Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
probably ACK for an arch with CONFIG_STACK_GROWSUP.

As noted, stable needs the same patch, but 2.6.32 doesn't have the
rlimit() helper.

fs/exec.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -555,6 +555,7 @@ static int shift_arg_pages(struct vm_are
}

#define EXTRA_STACK_VM_PAGES 20 /* random */
+#define ALIGN_DOWN(addr,size) ((addr)&(~((size)-1)))

/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -570,7 +571,7 @@ int setup_arg_pages(struct linux_binprm
struct vm_area_struct *vma = bprm->vma;
struct vm_area_struct *prev = NULL;
unsigned long vm_flags;
- unsigned long stack_base;
+ unsigned long stack_base, stack_expand, stack_expand_lim, stack_size;

#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -627,10 +628,24 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

+ stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_size = vma->vm_end - vma->vm_start;
+ if (rlimit(RLIMIT_STACK) < stack_size)
+ stack_expand_lim = 0; /* don't shrick the stack */
+ else
+ /*
+ * Align this down to a page boundary as expand_stack
+ * will align it up.
+ */
+ stack_expand_lim = ALIGN_DOWN(rlimit(RLIMIT_STACK) - stack_size,
+ PAGE_SIZE);
+ /* Initial stack must not cause stack overflow. */
+ if (stack_expand > stack_expand_lim)
+ stack_expand = stack_expand_lim;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_expand;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_expand;
#endif
ret = expand_stack(vma, stack_base);
if (ret)

2010-02-09 06:46:55

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

> When reserving stack space for a new process, make sure we're not
> attempting to expand the stack by more than rlimit allows.
>
> This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
> variable length argument support" and unmasked by
> fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
> to return errors". This bug means when limiting the stack to less the
> 20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
> be killed before they start. This is particularly bad with 64K pages,
> where a ulimit below 1280K will kill every process.
>
> Signed-off-by: Michael Neuling <[email protected]>
> Cc: [email protected]
> ---
> Attempts to answer comments from Kosaki Motohiro.
>
> Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
> probably ACK for an arch with CONFIG_STACK_GROWSUP.
>
> As noted, stable needs the same patch, but 2.6.32 doesn't have the
> rlimit() helper.
>
> fs/exec.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> Index: linux-2.6-ozlabs/fs/exec.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/fs/exec.c
> +++ linux-2.6-ozlabs/fs/exec.c
> @@ -555,6 +555,7 @@ static int shift_arg_pages(struct vm_are
> }
>
> #define EXTRA_STACK_VM_PAGES 20 /* random */
> +#define ALIGN_DOWN(addr,size) ((addr)&(~((size)-1)))
>
> /*
> * Finalizes the stack vm_area_struct. The flags and permissions are updated,
> @@ -570,7 +571,7 @@ int setup_arg_pages(struct linux_binprm
> struct vm_area_struct *vma = bprm->vma;
> struct vm_area_struct *prev = NULL;
> unsigned long vm_flags;
> - unsigned long stack_base;
> + unsigned long stack_base, stack_expand, stack_expand_lim, stack_size;
>
> #ifdef CONFIG_STACK_GROWSUP
> /* Limit stack size to 1GB */
> @@ -627,10 +628,24 @@ int setup_arg_pages(struct linux_binprm
> goto out_unlock;
> }
>
> + stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_size = vma->vm_end - vma->vm_start;
> + if (rlimit(RLIMIT_STACK) < stack_size)
> + stack_expand_lim = 0; /* don't shrick the stack */
> + else
> + /*
> + * Align this down to a page boundary as expand_stack
> + * will align it up.
> + */
> + stack_expand_lim = ALIGN_DOWN(rlimit(RLIMIT_STACK) - stack_size,
> + PAGE_SIZE);
> + /* Initial stack must not cause stack overflow. */
> + if (stack_expand > stack_expand_lim)
> + stack_expand = stack_expand_lim;
> #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_end + stack_expand;
> #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_start - stack_expand;
> #endif
> ret = expand_stack(vma, stack_base);
> if (ret)

Umm.. It looks correct. but the nested complex if statement seems a bit ugly.
Instead, How about following?

note: it's untested.



===============
From: Michael Neuling <[email protected]>
Subject: Restrict initial stack space expansion to rlimit

When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.

This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
to return errors". This bug means when limiting the stack to less the
20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
be killed before they start. This is particularly bad with 64K pages,
where a ulimit below 1280K will kill every process.

[[email protected]: cleanups]
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: [email protected]
---
Attempts to answer comments from Kosaki Motohiro.

Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
probably ACK for an arch with CONFIG_STACK_GROWSUP.

As noted, stable needs the same patch, but 2.6.32 doesn't have the
rlimit() helper.

diff --git a/fs/exec.c b/fs/exec.c
index 6f7fb0c..325bad4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -573,6 +573,9 @@ int setup_arg_pages(struct linux_binprm *bprm,
struct vm_area_struct *prev = NULL;
unsigned long vm_flags;
unsigned long stack_base;
+ unsigned long stack_size;
+ unsigned long stack_expand;
+ unsigned long rlim_stack;

#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -629,10 +632,27 @@ int setup_arg_pages(struct linux_binprm *bprm,
goto out_unlock;
}

+ stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_size = vma->vm_end - vma->vm_start;
+ /*
+ * Align this down to a page boundary as expand_stack
+ * will align it up.
+ */
+ rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
+ if (rlim_stack < stack_size)
+ rlim_stack = stack_size;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack) {
+ stack_base = vma->vm_start + rlim_stack;
+ } else {
+ stack_base = vma->vm_end + stack_expand;
+ }
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack) {
+ stack_base = vma->vm_end - rlim_stack;
+ } else {
+ stack_base = vma->vm_start - stack_expand;
+ }
#endif
ret = expand_stack(vma, stack_base);
if (ret)


2010-02-09 08:59:31

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

In message <[email protected]> you wrote:
> > When reserving stack space for a new process, make sure we're not
> > attempting to expand the stack by more than rlimit allows.
> >
> > This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
> > variable length argument support" and unmasked by
> > fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
> > to return errors". This bug means when limiting the stack to less the
> > 20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
> > be killed before they start. This is particularly bad with 64K pages,
> > where a ulimit below 1280K will kill every process.
> >
> > Signed-off-by: Michael Neuling <[email protected]>
> > Cc: [email protected]
> > ---
> > Attempts to answer comments from Kosaki Motohiro.
> >
> > Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
> > probably ACK for an arch with CONFIG_STACK_GROWSUP.
> >
> > As noted, stable needs the same patch, but 2.6.32 doesn't have the
> > rlimit() helper.
> >
> > fs/exec.c | 21 ++++++++++++++++++---
> > 1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > Index: linux-2.6-ozlabs/fs/exec.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/fs/exec.c
> > +++ linux-2.6-ozlabs/fs/exec.c
> > @@ -555,6 +555,7 @@ static int shift_arg_pages(struct vm_are
> > }
> >
> > #define EXTRA_STACK_VM_PAGES 20 /* random */
> > +#define ALIGN_DOWN(addr,size) ((addr)&(~((size)-1)))
> >
> > /*
> > * Finalizes the stack vm_area_struct. The flags and permissions are updat
ed,
> > @@ -570,7 +571,7 @@ int setup_arg_pages(struct linux_binprm
> > struct vm_area_struct *vma = bprm->vma;
> > struct vm_area_struct *prev = NULL;
> > unsigned long vm_flags;
> > - unsigned long stack_base;
> > + unsigned long stack_base, stack_expand, stack_expand_lim, stack_size;
> >
> > #ifdef CONFIG_STACK_GROWSUP
> > /* Limit stack size to 1GB */
> > @@ -627,10 +628,24 @@ int setup_arg_pages(struct linux_binprm
> > goto out_unlock;
> > }
> >
> > + stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > + stack_size = vma->vm_end - vma->vm_start;
> > + if (rlimit(RLIMIT_STACK) < stack_size)
> > + stack_expand_lim = 0; /* don't shrick the stack */
> > + else
> > + /*
> > + * Align this down to a page boundary as expand_stack
> > + * will align it up.
> > + */
> > + stack_expand_lim = ALIGN_DOWN(rlimit(RLIMIT_STACK) - stack_size
,
> > + PAGE_SIZE);
> > + /* Initial stack must not cause stack overflow. */
> > + if (stack_expand > stack_expand_lim)
> > + stack_expand = stack_expand_lim;
> > #ifdef CONFIG_STACK_GROWSUP
> > - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > + stack_base = vma->vm_end + stack_expand;
> > #else
> > - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > + stack_base = vma->vm_start - stack_expand;
> > #endif
> > ret = expand_stack(vma, stack_base);
> > if (ret)
>
> Umm.. It looks correct. but the nested complex if statement seems a bit ugly.
> Instead, How about following?

I don't like the duplicated code in the #ifdef/else but I can live with it.

> note: it's untested.

Works for me on ppc64 with 4k and 64k pages. Thanks!

I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
as well.

Mikey

>
>
>
> ===============
> From: Michael Neuling <[email protected]>
> Subject: Restrict initial stack space expansion to rlimit
>
> When reserving stack space for a new process, make sure we're not
> attempting to expand the stack by more than rlimit allows.
>
> This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
> variable length argument support" and unmasked by
> fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
> to return errors". This bug means when limiting the stack to less the
> 20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
> be killed before they start. This is particularly bad with 64K pages,
> where a ulimit below 1280K will kill every process.
>
> [[email protected]: cleanups]
> Signed-off-by: Michael Neuling <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> Cc: [email protected]
> ---
> Attempts to answer comments from Kosaki Motohiro.
>
> Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
> probably ACK for an arch with CONFIG_STACK_GROWSUP.
>
> As noted, stable needs the same patch, but 2.6.32 doesn't have the
> rlimit() helper.
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 6f7fb0c..325bad4 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -573,6 +573,9 @@ int setup_arg_pages(struct linux_binprm *bprm,
> struct vm_area_struct *prev = NULL;
> unsigned long vm_flags;
> unsigned long stack_base;
> + unsigned long stack_size;
> + unsigned long stack_expand;
> + unsigned long rlim_stack;
>
> #ifdef CONFIG_STACK_GROWSUP
> /* Limit stack size to 1GB */
> @@ -629,10 +632,27 @@ int setup_arg_pages(struct linux_binprm *bprm,
> goto out_unlock;
> }
>
> + stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_size = vma->vm_end - vma->vm_start;
> + /*
> + * Align this down to a page boundary as expand_stack
> + * will align it up.
> + */
> + rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> + if (rlim_stack < stack_size)
> + rlim_stack = stack_size;
> #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + if (stack_size + stack_expand > rlim_stack) {
> + stack_base = vma->vm_start + rlim_stack;
> + } else {
> + stack_base = vma->vm_end + stack_expand;
> + }
> #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + if (stack_size + stack_expand > rlim_stack) {
> + stack_base = vma->vm_end - rlim_stack;
> + } else {
> + stack_base = vma->vm_start - stack_expand;
> + }
> #endif
> ret = expand_stack(vma, stack_base);
> if (ret)
>
>
>

2010-02-09 21:28:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

On Tue, 09 Feb 2010 19:59:27 +1100
Michael Neuling <[email protected]> wrote:

> > > + /* Initial stack must not cause stack overflow. */
> > > + if (stack_expand > stack_expand_lim)
> > > + stack_expand = stack_expand_lim;
> > > #ifdef CONFIG_STACK_GROWSUP
> > > - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > > + stack_base = vma->vm_end + stack_expand;
> > > #else
> > > - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > > + stack_base = vma->vm_start - stack_expand;
> > > #endif
> > > ret = expand_stack(vma, stack_base);
> > > if (ret)
> >
> > Umm.. It looks correct. but the nested complex if statement seems a bit ugly.
> > Instead, How about following?
>
> I don't like the duplicated code in the #ifdef/else but I can live with it.

cleanup the cleanup:

--- a/fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit-cleanup-cleanup
+++ a/fs/exec.c
@@ -637,20 +637,17 @@ int setup_arg_pages(struct linux_binprm
* will align it up.
*/
rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
- if (rlim_stack < stack_size)
- rlim_stack = stack_size;
+ rlim_stack = min(rlim_stack, stack_size);
#ifdef CONFIG_STACK_GROWSUP
- if (stack_size + stack_expand > rlim_stack) {
+ if (stack_size + stack_expand > rlim_stack)
stack_base = vma->vm_start + rlim_stack;
- } else {
+ else
stack_base = vma->vm_end + stack_expand;
- }
#else
- if (stack_size + stack_expand > rlim_stack) {
+ if (stack_size + stack_expand > rlim_stack)
stack_base = vma->vm_end - rlim_stack;
- } else {
+ else
stack_base = vma->vm_start - stack_expand;
- }
#endif
ret = expand_stack(vma, stack_base);
if (ret)
_

> > note: it's untested.
>
> Works for me on ppc64 with 4k and 64k pages. Thanks!
>
> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> as well.

There's only one CONFIG_GROWSUP arch - parisc.

Guys, here's the rolled-up patch. Could someone please test it on
parisc?

err, I'm not sure what one needs to do to test it, actually.
Presumably it involves setting an unusual `ulimit -s'. Can someone
please suggest a test plan?




From: Michael Neuling <[email protected]>

When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.

This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba ("mm:
variable length argument support") and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b ("exec: setup_arg_pages() fails
to return errors").

This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
80K on 4K pages or 'ulimit -s 79') all processes will be killed before
they start. This is particularly bad with 64K pages, where a ulimit below
1280K will kill every process.

Signed-off-by: Michael Neuling <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Americo Wang <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: James Morris <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: <[email protected]>

fs/exec.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff -puN fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit fs/exec.c
--- a/fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit
+++ a/fs/exec.c
@@ -571,6 +571,9 @@ int setup_arg_pages(struct linux_binprm
struct vm_area_struct *prev = NULL;
unsigned long vm_flags;
unsigned long stack_base;
+ unsigned long stack_size;
+ unsigned long stack_expand;
+ unsigned long rlim_stack;

#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -627,10 +630,24 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

+ stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_size = vma->vm_end - vma->vm_start;
+ /*
+ * Align this down to a page boundary as expand_stack
+ * will align it up.
+ */
+ rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
+ rlim_stack = min(rlim_stack, stack_size);
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack)
+ stack_base = vma->vm_start + rlim_stack;
+ else
+ stack_base = vma->vm_end + stack_expand;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack)
+ stack_base = vma->vm_end - rlim_stack;
+ else
+ stack_base = vma->vm_start - stack_expand;
#endif
ret = expand_stack(vma, stack_base);
if (ret)
_

2010-02-09 21:51:34

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit


> > > note: it's untested.
> >
> > Works for me on ppc64 with 4k and 64k pages. Thanks!
> >
> > I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> > as well.
>
> There's only one CONFIG_GROWSUP arch - parisc.
>
> Guys, here's the rolled-up patch.

FYI the rolled up patch still works fine on PPC64. Thanks.

> Could someone please test it on parisc?
>
> err, I'm not sure what one needs to do to test it, actually.
> Presumably it involves setting an unusual `ulimit -s'. Can someone
> please suggest a test plan?

How about doing:
'ulimit -s 15; ls'
before and after the patch is applied. Before it's applied, 'ls' should
be killed. After the patch is applied, 'ls' should no longer be killed.

I'm suggesting a stack limit of 15KB since it's small enough to trigger
20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
case to handle correctly with this code.

4K pages on parisc should be fine to test with.

Mikey

>
> From: Michael Neuling <[email protected]>
>
> When reserving stack space for a new process, make sure we're not
> attempting to expand the stack by more than rlimit allows.
>
> This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba ("mm:
> variable length argument support") and unmasked by
> fc63cf237078c86214abcb2ee9926d8ad289da9b ("exec: setup_arg_pages() fails
> to return errors").
>
> This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
> 80K on 4K pages or 'ulimit -s 79') all processes will be killed before
> they start. This is particularly bad with 64K pages, where a ulimit below
> 1280K will kill every process.
>
> Signed-off-by: Michael Neuling <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Americo Wang <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: James Morris <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Serge Hallyn <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: <[email protected]>
>
> fs/exec.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff -puN fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit
fs/exec.c
> --- a/fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit
> +++ a/fs/exec.c
> @@ -571,6 +571,9 @@ int setup_arg_pages(struct linux_binprm
> struct vm_area_struct *prev = NULL;
> unsigned long vm_flags;
> unsigned long stack_base;
> + unsigned long stack_size;
> + unsigned long stack_expand;
> + unsigned long rlim_stack;
>
> #ifdef CONFIG_STACK_GROWSUP
> /* Limit stack size to 1GB */
> @@ -627,10 +630,24 @@ int setup_arg_pages(struct linux_binprm
> goto out_unlock;
> }
>
> + stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_size = vma->vm_end - vma->vm_start;
> + /*
> + * Align this down to a page boundary as expand_stack
> + * will align it up.
> + */
> + rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> + rlim_stack = min(rlim_stack, stack_size);
> #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + if (stack_size + stack_expand > rlim_stack)
> + stack_base = vma->vm_start + rlim_stack;
> + else
> + stack_base = vma->vm_end + stack_expand;
> #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + if (stack_size + stack_expand > rlim_stack)
> + stack_base = vma->vm_end - rlim_stack;
> + else
> + stack_base = vma->vm_start - stack_expand;
> #endif
> ret = expand_stack(vma, stack_base);
> if (ret)
> _
>

2010-02-09 22:27:23

by Helge Deller

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

On 02/09/2010 10:51 PM, Michael Neuling wrote:
>>> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
>>> as well.
>>
>> There's only one CONFIG_GROWSUP arch - parisc.
>> Could someone please test it on parisc?

I did.

> How about doing:
> 'ulimit -s 15; ls'
> before and after the patch is applied. Before it's applied, 'ls' should
> be killed. After the patch is applied, 'ls' should no longer be killed.
>
> I'm suggesting a stack limit of 15KB since it's small enough to trigger
> 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
> case to handle correctly with this code.
>
> 4K pages on parisc should be fine to test with.

Mikey, thanks for the suggested test plan.

I'm not sure if your patch does it correct for parisc/stack-grows-up-case.

I tested your patch on a 4k pages kernel:
root@c3000:~# uname -a
Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/Linux

Without your patch:
root@c3000:~# ulimit -s 15; ls
Killed
-> correct.

With your patch:
root@c3000:~# ulimit -s 15; ls
Killed
_or_:
root@c3000:~# ulimit -s 15; ls
Segmentation fault
-> ??

Any idea?

Helge


>> From: Michael Neuling<[email protected]>
>>
>> When reserving stack space for a new process, make sure we're not
>> attempting to expand the stack by more than rlimit allows.
>>
>> This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba ("mm:
>> variable length argument support") and unmasked by
>> fc63cf237078c86214abcb2ee9926d8ad289da9b ("exec: setup_arg_pages() fails
>> to return errors").
>>
>> This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
>> 80K on 4K pages or 'ulimit -s 79') all processes will be killed before
>> they start. This is particularly bad with 64K pages, where a ulimit below
>> 1280K will kill every process.
>>
>> Signed-off-by: Michael Neuling<[email protected]>
>> Cc: KOSAKI Motohiro<[email protected]>
>> Cc: Americo Wang<[email protected]>
>> Cc: Anton Blanchard<[email protected]>
>> Cc: Oleg Nesterov<[email protected]>
>> Cc: James Morris<[email protected]>
>> Cc: Ingo Molnar<[email protected]>
>> Cc: Serge Hallyn<[email protected]>
>> Cc: Benjamin Herrenschmidt<[email protected]>
>> Cc:<[email protected]>
>>
>> fs/exec.c | 21 +++++++++++++++++++--
>> 1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff -puN fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit
> fs/exec.c
>> --- a/fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit
>> +++ a/fs/exec.c
>> @@ -571,6 +571,9 @@ int setup_arg_pages(struct linux_binprm
>> struct vm_area_struct *prev = NULL;
>> unsigned long vm_flags;
>> unsigned long stack_base;
>> + unsigned long stack_size;
>> + unsigned long stack_expand;
>> + unsigned long rlim_stack;
>>
>> #ifdef CONFIG_STACK_GROWSUP
>> /* Limit stack size to 1GB */
>> @@ -627,10 +630,24 @@ int setup_arg_pages(struct linux_binprm
>> goto out_unlock;
>> }
>>
>> + stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>> + stack_size = vma->vm_end - vma->vm_start;
>> + /*
>> + * Align this down to a page boundary as expand_stack
>> + * will align it up.
>> + */
>> + rlim_stack = rlimit(RLIMIT_STACK)& PAGE_MASK;
>> + rlim_stack = min(rlim_stack, stack_size);
>> #ifdef CONFIG_STACK_GROWSUP
>> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>> + if (stack_size + stack_expand> rlim_stack)
>> + stack_base = vma->vm_start + rlim_stack;
>> + else
>> + stack_base = vma->vm_end + stack_expand;
>> #else
>> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>> + if (stack_size + stack_expand> rlim_stack)
>> + stack_base = vma->vm_end - rlim_stack;
>> + else
>> + stack_base = vma->vm_start - stack_expand;
>> #endif
>> ret = expand_stack(vma, stack_base);
>> if (ret)

2010-02-10 05:13:01

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

> On 02/09/2010 10:51 PM, Michael Neuling wrote:
> >>> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> >>> as well.
> >>
> >> There's only one CONFIG_GROWSUP arch - parisc.
> >> Could someone please test it on parisc?
>
> I did.
>
> > How about doing:
> > 'ulimit -s 15; ls'
> > before and after the patch is applied. Before it's applied, 'ls' should
> > be killed. After the patch is applied, 'ls' should no longer be killed.
> >
> > I'm suggesting a stack limit of 15KB since it's small enough to trigger
> > 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
> > case to handle correctly with this code.
> >
> > 4K pages on parisc should be fine to test with.
>
> Mikey, thanks for the suggested test plan.
>
> I'm not sure if your patch does it correct for parisc/stack-grows-up-case.
>
> I tested your patch on a 4k pages kernel:
> root@c3000:~# uname -a
> Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/Linux
>
> Without your patch:
> root@c3000:~# ulimit -s 15; ls
> Killed
> -> correct.
>
> With your patch:
> root@c3000:~# ulimit -s 15; ls
> Killed
> _or_:
> root@c3000:~# ulimit -s 15; ls
> Segmentation fault
> -> ??
>
> Any idea?

My x86_64 box also makes segmentation fault. I think "ulimit -s 15" is too small stack for ls.
"ulimit -s 27; ls " wroks perfectly fine.


2010-02-10 05:30:20

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit



In message <[email protected]> you wrote:
> > On 02/09/2010 10:51 PM, Michael Neuling wrote:
> > >>> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> > >>> as well.
> > >>
> > >> There's only one CONFIG_GROWSUP arch - parisc.
> > >> Could someone please test it on parisc?
> >
> > I did.
> >
> > > How about doing:
> > > 'ulimit -s 15; ls'
> > > before and after the patch is applied. Before it's applied, 'ls' should
> > > be killed. After the patch is applied, 'ls' should no longer be killed.
> > >
> > > I'm suggesting a stack limit of 15KB since it's small enough to trigger
> > > 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
> > > case to handle correctly with this code.
> > >
> > > 4K pages on parisc should be fine to test with.
> >
> > Mikey, thanks for the suggested test plan.
> >
> > I'm not sure if your patch does it correct for parisc/stack-grows-up-case.
> >
> > I tested your patch on a 4k pages kernel:
> > root@c3000:~# uname -a
> > Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/Li
nux
> >
> > Without your patch:
> > root@c3000:~# ulimit -s 15; ls
> > Killed
> > -> correct.
> >
> > With your patch:
> > root@c3000:~# ulimit -s 15; ls
> > Killed
> > _or_:
> > root@c3000:~# ulimit -s 15; ls
> > Segmentation fault
> > -> ??
> >
> > Any idea?
>
> My x86_64 box also makes segmentation fault. I think "ulimit -s 15" is too sm
all stack for ls.
> "ulimit -s 27; ls " wroks perfectly fine.

Arrh. I asked Helge offline earlier to check what use to work on parisc
on 2.6.31.

I guess PPC has a nice clean non-bloated ABI :-D

Mikey

2010-02-10 05:31:11

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit



In message <[email protected]> you wrote:
> > On 02/09/2010 10:51 PM, Michael Neuling wrote:
> > >>> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> > >>> as well.
> > >>
> > >> There's only one CONFIG_GROWSUP arch - parisc.
> > >> Could someone please test it on parisc?
> >
> > I did.
> >
> > > How about doing:
> > > 'ulimit -s 15; ls'
> > > before and after the patch is applied. Before it's applied, 'ls' should
> > > be killed. After the patch is applied, 'ls' should no longer be killed.
> > >
> > > I'm suggesting a stack limit of 15KB since it's small enough to trigger
> > > 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
> > > case to handle correctly with this code.
> > >
> > > 4K pages on parisc should be fine to test with.
> >
> > Mikey, thanks for the suggested test plan.
> >
> > I'm not sure if your patch does it correct for parisc/stack-grows-up-case.
> >
> > I tested your patch on a 4k pages kernel:
> > root@c3000:~# uname -a
> > Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/Li
nux
> >
> > Without your patch:
> > root@c3000:~# ulimit -s 15; ls
> > Killed
> > -> correct.
> >
> > With your patch:
> > root@c3000:~# ulimit -s 15; ls
> > Killed
> > _or_:
> > root@c3000:~# ulimit -s 15; ls
> > Segmentation fault
> > -> ??
> >
> > Any idea?
>
> My x86_64 box also makes segmentation fault. I think "ulimit -s 15" is too sm
all stack for ls.
> "ulimit -s 27; ls " wroks perfectly fine.

Arrh. I asked Helge offline earlier to check what use to work on parisc
on 2.6.31.

I guess PPC has a nice clean non-bloated ABI :-D

Mikey

2010-02-11 22:16:15

by Helge Deller

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

On 02/10/2010 06:31 AM, Michael Neuling wrote:
> In message<[email protected]> you wrote:
>>> On 02/09/2010 10:51 PM, Michael Neuling wrote:
>>>>>> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
>>>>>> as well.
>>>>>
>>>>> There's only one CONFIG_GROWSUP arch - parisc.
>>>>> Could someone please test it on parisc?
>>>
>>> I did.
>>>
>>>> How about doing:
>>>> 'ulimit -s 15; ls'
>>>> before and after the patch is applied. Before it's applied, 'ls' should
>>>> be killed. After the patch is applied, 'ls' should no longer be killed.
>>>>
>>>> I'm suggesting a stack limit of 15KB since it's small enough to trigger
>>>> 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
>>>> case to handle correctly with this code.
>>>>
>>>> 4K pages on parisc should be fine to test with.
>>>
>>> Mikey, thanks for the suggested test plan.
>>>
>>> I'm not sure if your patch does it correct for parisc/stack-grows-up-case.
>>>
>>> I tested your patch on a 4k pages kernel:
>>> root@c3000:~# uname -a
>>> Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/Li
> nux
>>>
>>> Without your patch:
>>> root@c3000:~# ulimit -s 15; ls
>>> Killed
>>> -> correct.
>>>
>>> With your patch:
>>> root@c3000:~# ulimit -s 15; ls
>>> Killed
>>> _or_:
>>> root@c3000:~# ulimit -s 15; ls
>>> Segmentation fault
>>> -> ??
>>>
>>> Any idea?
>>
>> My x86_64 box also makes segmentation fault. I think "ulimit -s 15" is too sm
> all stack for ls.
>> "ulimit -s 27; ls " wroks perfectly fine.
>
> Arrh. I asked Helge offline earlier to check what use to work on parisc
> on 2.6.31.
>
> I guess PPC has a nice clean non-bloated ABI :-D

Hi Mikey,

I tested again, and it works for me with "ulimit -s 27" as well (on a 4k, 32bit kernel).
Still, I'm not 100% sure if your patch is correct.
Anyway, it seems to work.

But what makes me wonder is, why EXTRA_STACK_VM_PAGES is defined in pages at all.
You wrote in your patch description:
> This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
> 80K on 4K pages or 'ulimit -s 79') all processes will be killed before
> they start. This is particularly bad with 64K pages, where a ulimit below
> 1280K will kill every process.

Wouldn't it make sense to define and use EXTRA_STACK_VM_SIZE instead (e.g. as 20*4096 = 80k)?
This extra stack reservation should IMHO be independend of the actual kernel page size.

Helge

2010-02-11 22:23:15

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Restrict initial stack space expansion to rlimit

In message <[email protected]> you wrote:
> On 02/10/2010 06:31 AM, Michael Neuling wrote:
> > In message<[email protected]> you wrote:
> >>> On 02/09/2010 10:51 PM, Michael Neuling wrote:
> >>>>>> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> >>>>>> as well.
> >>>>>
> >>>>> There's only one CONFIG_GROWSUP arch - parisc.
> >>>>> Could someone please test it on parisc?
> >>>
> >>> I did.
> >>>
> >>>> How about doing:
> >>>> 'ulimit -s 15; ls'
> >>>> before and after the patch is applied. Before it's applied, 'ls' should
> >>>> be killed. After the patch is applied, 'ls' should no longer be killed.
> >>>>
> >>>> I'm suggesting a stack limit of 15KB since it's small enough to trigger
> >>>> 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickie
r
> >>>> case to handle correctly with this code.
> >>>>
> >>>> 4K pages on parisc should be fine to test with.
> >>>
> >>> Mikey, thanks for the suggested test plan.
> >>>
> >>> I'm not sure if your patch does it correct for parisc/stack-grows-up-case
.
> >>>
> >>> I tested your patch on a 4k pages kernel:
> >>> root@c3000:~# uname -a
> >>> Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/
Li
> > nux
> >>>
> >>> Without your patch:
> >>> root@c3000:~# ulimit -s 15; ls
> >>> Killed
> >>> -> correct.
> >>>
> >>> With your patch:
> >>> root@c3000:~# ulimit -s 15; ls
> >>> Killed
> >>> _or_:
> >>> root@c3000:~# ulimit -s 15; ls
> >>> Segmentation fault
> >>> -> ??
> >>>
> >>> Any idea?
> >>
> >> My x86_64 box also makes segmentation fault. I think "ulimit -s 15" is too
sm
> > all stack for ls.
> >> "ulimit -s 27; ls " wroks perfectly fine.
> >
> > Arrh. I asked Helge offline earlier to check what use to work on parisc
> > on 2.6.31.
> >
> > I guess PPC has a nice clean non-bloated ABI :-D
>
> Hi Mikey,
>
> I tested again, and it works for me with "ulimit -s 27" as well (on a
> 4k, 32bit kernel).
> Still, I'm not 100% sure if your patch is correct.

Thanks for retesting

Did "ulimit -s 27" fail before you applied?

> Anyway, it seems to work.
>
> But what makes me wonder is, why EXTRA_STACK_VM_PAGES is defined in pages at
all.
> You wrote in your patch description:
> > This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
> > 80K on 4K pages or 'ulimit -s 79') all processes will be killed before
> > they start. This is particularly bad with 64K pages, where a ulimit below
> > 1280K will kill every process.
>
> Wouldn't it make sense to define and use EXTRA_STACK_VM_SIZE instead
> (e.g. as 20*4096 = 80k)? This extra stack reservation should IMHO be
> independend of the actual kernel page size.

If you look back through this thread, that has already been noted but
it's a separate issue to this bug, so that change will be deferred till
2.6.34.

Mikey

2010-02-12 05:44:20

by Michael Neuling

[permalink] [raw]
Subject: [PATCH] Create initial stack independent of PAGE_SIZE

Currently we create the initial stack based on the PAGE_SIZE. This is
unnecessary.

This creates this initial stack independent of the PAGE_SIZE.

It also bumps up the number of 4k pages allocated from 20 to 32, to
align with 64K page systems.

Signed-off-by: Michael Neuling <[email protected]>
---
This is the second half of my original patch. This can be targeted for
2.6.34 as it's just a cleanup.

Tested on PPC64 with 4k and 64k pages.

fs/exec.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -554,8 +554,6 @@ static int shift_arg_pages(struct vm_are
return 0;
}

-#define EXTRA_STACK_VM_PAGES 20 /* random */
-
/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
* the stack is optionally relocated, and some extra space is added.
@@ -630,7 +628,7 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}

- stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
stack_size = vma->vm_end - vma->vm_start;
/*
* Align this down to a page boundary as expand_stack

2010-02-12 07:20:46

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Create initial stack independent of PAGE_SIZE

> Currently we create the initial stack based on the PAGE_SIZE. This is
> unnecessary.

I don't think this is enough explanation. In past mail, you described
why page size dependency is harmful. I hope you add it into the patch
description.

IOW, we don't need to change the unnecessary-but-non-harmful behavior.

>
> This creates this initial stack independent of the PAGE_SIZE.
>
> It also bumps up the number of 4k pages allocated from 20 to 32, to
> align with 64K page systems.

Why do we need page-aligning? Do you mean this code doesn't works on
128K (or more larger) page systems?


>
> Signed-off-by: Michael Neuling <[email protected]>
> ---
> This is the second half of my original patch. This can be targeted for
> 2.6.34 as it's just a cleanup.
>
> Tested on PPC64 with 4k and 64k pages.
>
> fs/exec.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> Index: linux-2.6-ozlabs/fs/exec.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/fs/exec.c
> +++ linux-2.6-ozlabs/fs/exec.c
> @@ -554,8 +554,6 @@ static int shift_arg_pages(struct vm_are
> return 0;
> }
>
> -#define EXTRA_STACK_VM_PAGES 20 /* random */
> -
> /*
> * Finalizes the stack vm_area_struct. The flags and permissions are updated,
> * the stack is optionally relocated, and some extra space is added.
> @@ -630,7 +628,7 @@ int setup_arg_pages(struct linux_binprm
> goto out_unlock;
> }
>
> - stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
> stack_size = vma->vm_end - vma->vm_start;
> /*
> * Align this down to a page boundary as expand_stack


2010-02-12 09:02:21

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH] Create initial stack independent of PAGE_SIZE

In message <[email protected]> you wrote:
> > Currently we create the initial stack based on the PAGE_SIZE. This is
> > unnecessary.
>
> I don't think this is enough explanation. In past mail, you described
> why page size dependency is harmful. I hope you add it into the patch
> description.

I don't think it's harmful, it's just irrelevant. Stack size is
independent of page size.

> IOW, we don't need to change the unnecessary-but-non-harmful behavior.
>
> >
> > This creates this initial stack independent of the PAGE_SIZE.
> >
> > It also bumps up the number of 4k pages allocated from 20 to 32, to
> > align with 64K page systems.
>
> Why do we need page-aligning? Do you mean this code doesn't works on
> 128K (or more larger) page systems?

If the "random" setting is not a common multiple of the 4k and 64k
pages, they will end up getting aligned differently, hence causing what
we are trying to avoid in the first place with this patch.

I should probably add this as a comment in the code comment?

Mikey

>
>
> >
> > Signed-off-by: Michael Neuling <[email protected]>
> > ---
> > This is the second half of my original patch. This can be targeted for
> > 2.6.34 as it's just a cleanup.
> >
> > Tested on PPC64 with 4k and 64k pages.
> >
> > fs/exec.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > Index: linux-2.6-ozlabs/fs/exec.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/fs/exec.c
> > +++ linux-2.6-ozlabs/fs/exec.c
> > @@ -554,8 +554,6 @@ static int shift_arg_pages(struct vm_are
> > return 0;
> > }
> >
> > -#define EXTRA_STACK_VM_PAGES 20 /* random */
> > -
> > /*
> > * Finalizes the stack vm_area_struct. The flags and permissions are updat
ed,
> > * the stack is optionally relocated, and some extra space is added.
> > @@ -630,7 +628,7 @@ int setup_arg_pages(struct linux_binprm
> > goto out_unlock;
> > }
> >
> > - stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > + stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
> > stack_size = vma->vm_end - vma->vm_start;
> > /*
> > * Align this down to a page boundary as expand_stack
>
>
>

2010-02-12 09:51:15

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Create initial stack independent of PAGE_SIZE

> In message <[email protected]> you wrote:
> > > Currently we create the initial stack based on the PAGE_SIZE. This is
> > > unnecessary.
> >
> > I don't think this is enough explanation. In past mail, you described
> > why page size dependency is harmful. I hope you add it into the patch
> > description.
>
> I don't think it's harmful, it's just irrelevant. Stack size is
> independent of page size.

ok.

> > IOW, we don't need to change the unnecessary-but-non-harmful behavior.
> >
> > >
> > > This creates this initial stack independent of the PAGE_SIZE.
> > >
> > > It also bumps up the number of 4k pages allocated from 20 to 32, to
> > > align with 64K page systems.
> >
> > Why do we need page-aligning? Do you mean this code doesn't works on
> > 128K (or more larger) page systems?
>
> If the "random" setting is not a common multiple of the 4k and 64k
> pages, they will end up getting aligned differently, hence causing what
> we are trying to avoid in the first place with this patch.

I see. ok,
Reviewed-by: KOSAKI Motohiro <[email protected]>

>
> I should probably add this as a comment in the code comment?

probably, It's not big matter.

Thanks :)



>
> Mikey
>
> >
> >
> > >
> > > Signed-off-by: Michael Neuling <[email protected]>
> > > ---
> > > This is the second half of my original patch. This can be targeted for
> > > 2.6.34 as it's just a cleanup.
> > >
> > > Tested on PPC64 with 4k and 64k pages.
> > >
> > > fs/exec.c | 4 +---
> > > 1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > Index: linux-2.6-ozlabs/fs/exec.c
> > > ===================================================================
> > > --- linux-2.6-ozlabs.orig/fs/exec.c
> > > +++ linux-2.6-ozlabs/fs/exec.c
> > > @@ -554,8 +554,6 @@ static int shift_arg_pages(struct vm_are
> > > return 0;
> > > }
> > >
> > > -#define EXTRA_STACK_VM_PAGES 20 /* random */
> > > -
> > > /*
> > > * Finalizes the stack vm_area_struct. The flags and permissions are updat
> ed,
> > > * the stack is optionally relocated, and some extra space is added.
> > > @@ -630,7 +628,7 @@ int setup_arg_pages(struct linux_binprm
> > > goto out_unlock;
> > > }
> > >
> > > - stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > > + stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
> > > stack_size = vma->vm_end - vma->vm_start;
> > > /*
> > > * Align this down to a page boundary as expand_stack
> >
> >
> >