2004-09-16 17:01:37

by Utz Lehmann

Subject: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

Hi

With the flexmmap memory layout there is at least a 128 MB gap between
mmap_base and TASK_SIZE. I think this is for the case that a running process
expands its stack soft rlimit.

If there is a hard limit for the stack, this minimum gap is just a waste of
space. This patch reduces the gap to the hard limit + a 1 MB hole. If a process
has an 8192 KB hard limit, it has an additional 119 MB of space available over
the current behavior.

The current implementation also has a problem. If the stack soft limit is
128 MB or more, there is no hole between the stack and mmap_base. If there is a
mapping at mmap_base, stack overflows are not detected. The patch adds a
1 MB hole between them.

Tested only on x86.
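
The arithmetic above can be checked with a small user-space model. This is
only a sketch, not kernel code. It assumes TASK_SIZE is 3 GB (the usual i386
split), 4 KB pages, that the gap starts from the stack soft limit, and that an
unlimited rlimit can be modelled as UINT64_MAX. The function names are made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a default i386 configuration (not taken from the patch). */
#define TASK_SIZE  UINT64_C(0xC0000000)   /* 3 GB of user address space */
#define PAGE_MASK  (~UINT64_C(0xFFF))     /* 4 KB pages */
#define MB         (UINT64_C(1) << 20)
#define UNLIMITED  UINT64_MAX             /* stand-in for RLIM_INFINITY */

/* Hypothetical model of the current behaviour: clamp the soft limit only. */
uint64_t mmap_base_old(uint64_t soft)
{
	uint64_t min_gap = 128 * MB;
	uint64_t max_gap = TASK_SIZE / 6 * 5;
	uint64_t gap = soft;

	if (gap < min_gap)
		gap = min_gap;
	else if (gap > max_gap)
		gap = max_gap;

	return TASK_SIZE - (gap & PAGE_MASK);
}

/* Hypothetical model of the patched behaviour: cap the gap at the hard
 * limit and always add a 1 MB hole below the stack region. */
uint64_t mmap_base_new(uint64_t soft, uint64_t hard)
{
	uint64_t min_hole = 1 * MB;
	uint64_t min_gap = 128 * MB - min_hole;
	uint64_t max_gap = TASK_SIZE / 6 * 5 - min_hole;
	uint64_t gap = soft;

	assert(soft <= hard);   /* a soft limit never exceeds the hard limit */

	if (gap < min_gap)
		gap = min_gap;
	else if (gap > max_gap)
		gap = max_gap;
	if (gap > hard)
		gap = hard;

	return TASK_SIZE - ((gap + min_hole) & PAGE_MASK);
}
```

Under these assumptions, mmap_base_new(8 * MB, 8 * MB) lies 119 MB above
mmap_base_old(8 * MB), and a 256 MB soft limit with no hard limit moves
mmap_base from 256 MB to 257 MB below TASK_SIZE.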


Signed-off-by: Utz Lehmann <[email protected]>

diff -Nrup linux-2.6.9-rc2/arch/i386/mm/mmap.c linux-2.6.9-rc2-gap/arch/i386/mm/mmap.c
--- linux-2.6.9-rc2/arch/i386/mm/mmap.c 2004-09-16 11:18:15.363366420 +0200
+++ linux-2.6.9-rc2-gap/arch/i386/mm/mmap.c 2004-09-16 16:01:13.197508592 +0200
@@ -30,10 +30,13 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave an at least ~128 MB hole.
+ * Leave an at least 1 MB hole between stack and mmap_base.
+ * Leave an at least 128 MB gap between TASK_SIZE and mmap_base with a
+ * soft rlimit stack.
  */
-#define MIN_GAP (128*1024*1024)
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_HOLE (1*1024*1024)
+#define MIN_GAP (128*1024*1024 - MIN_HOLE)
+#define MAX_GAP (TASK_SIZE/6*5 - MIN_HOLE)
 
 static inline unsigned long mmap_base(struct mm_struct *mm)
 {
@@ -43,8 +46,10 @@ static inline unsigned long mmap_base(st
 		gap = MIN_GAP;
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
+	if (gap > current->rlim[RLIMIT_STACK].rlim_max)
+		gap = current->rlim[RLIMIT_STACK].rlim_max;
 
-	return TASK_SIZE - (gap & PAGE_MASK);
+	return TASK_SIZE - ((gap + MIN_HOLE) & PAGE_MASK);
 }
 
 /*
diff -Nrup linux-2.6.9-rc2/arch/ppc64/mm/mmap.c linux-2.6.9-rc2-gap/arch/ppc64/mm/mmap.c
--- linux-2.6.9-rc2/arch/ppc64/mm/mmap.c 2004-09-16 11:18:19.760799910 +0200
+++ linux-2.6.9-rc2-gap/arch/ppc64/mm/mmap.c 2004-09-16 16:37:44.995858703 +0200
@@ -30,10 +30,13 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave an at least ~128 MB hole.
+ * Leave an at least 1 MB hole between stack and mmap_base.
+ * Leave an at least 128 MB gap between TASK_SIZE and mmap_base with a
+ * soft rlimit stack.
  */
-#define MIN_GAP (128*1024*1024)
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_HOLE (1*1024*1024)
+#define MIN_GAP (128*1024*1024 - MIN_HOLE)
+#define MAX_GAP (TASK_SIZE/6*5 - MIN_HOLE)
 
 static inline unsigned long mmap_base(void)
 {
@@ -43,8 +46,10 @@ static inline unsigned long mmap_base(vo
 		gap = MIN_GAP;
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
+	if (gap > current->rlim[RLIMIT_STACK].rlim_max)
+		gap = current->rlim[RLIMIT_STACK].rlim_max;
 
-	return TASK_SIZE - (gap & PAGE_MASK);
+	return TASK_SIZE - ((gap + MIN_HOLE) & PAGE_MASK);
 }
 
 static inline int mmap_is_legacy(void)
diff -Nrup linux-2.6.9-rc2/arch/s390/mm/mmap.c linux-2.6.9-rc2-gap/arch/s390/mm/mmap.c
--- linux-2.6.9-rc2/arch/s390/mm/mmap.c 2004-09-16 11:18:19.855787673 +0200
+++ linux-2.6.9-rc2-gap/arch/s390/mm/mmap.c 2004-09-16 16:37:59.459999725 +0200
@@ -30,10 +30,13 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave an at least ~128 MB hole.
+ * Leave an at least 1 MB hole between stack and mmap_base.
+ * Leave an at least 128 MB gap between TASK_SIZE and mmap_base with a
+ * soft rlimit stack.
  */
-#define MIN_GAP (128*1024*1024)
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_HOLE (1*1024*1024)
+#define MIN_GAP (128*1024*1024 - MIN_HOLE)
+#define MAX_GAP (TASK_SIZE/6*5 - MIN_HOLE)
 
 static inline unsigned long mmap_base(void)
 {
@@ -43,8 +46,10 @@ static inline unsigned long mmap_base(vo
 		gap = MIN_GAP;
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
+	if (gap > current->rlim[RLIMIT_STACK].rlim_max)
+		gap = current->rlim[RLIMIT_STACK].rlim_max;
 
-	return TASK_SIZE - (gap & PAGE_MASK);
+	return TASK_SIZE - ((gap + MIN_HOLE) & PAGE_MASK);
 }
 
 static inline int mmap_is_legacy(void)


2004-09-16 17:40:38

by Andrea Arcangeli

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

On Thu, Sep 16, 2004 at 06:56:13PM +0200, Utz Lehmann wrote:
> Hi
>
> With the flexmmap memory layout there is at least a 128 MB gap between
> mmap_base and TASK_SIZE. I think this is for the case that a running process
> expands its stack soft rlimit.
>
> If there is a hard limit for the stack, this minimum gap is just a waste of
> space. This patch reduces the gap to the hard limit + a 1 MB hole. If a process
> has an 8192 KB hard limit, it has an additional 119 MB of space available over
> the current behavior.
>
> The current implementation also has a problem. If the stack soft limit is
> 128 MB or more, there is no hole between the stack and mmap_base. If there is a
> mapping at mmap_base, stack overflows are not detected. The patch adds a
> 1 MB hole between them.

Several years ago I developed a sysctl, carried in all my 2.2 and 2.4 kernels
(including all 2.2 and 2.4 SUSE kernels), that major software vendors
require for the safety of their apps. IIRC I tried to merge it once but
failed (it was not applied to mainline). Now I've just got another bugzilla
open about the lack of the sysctl, and the major app is now again not
foolproof. A fixed number won't work, so I have to drop such a fixed GAP
anyway and rewrite it by forward porting my patch.

The sysctl in question is /proc/sys/vm/heap-stack-gap, so I recommend
dropping all those fixed GAP sizes and implementing this instead:

http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/00_silent-stack-overflow-20

If you reinvent the wheel and prefer not to reuse the above code to
make a sysctl, at least make sure to use the name "heap-stack-gap" to
avoid any pointless incompatibility.

2004-09-16 17:49:23

by Arjan van de Ven

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

On Thu, Sep 16, 2004 at 06:56:13PM +0200, Utz Lehmann wrote:
> Hi
>
> With the flexmmap memory layout there is at least a 128 MB gap between
> mmap_base and TASK_SIZE. I think this is for the case that a running process
> expands its stack soft rlimit.
>
> If there is a hard limit for the stack, this minimum gap is just a waste of
> space. This patch reduces the gap to the hard limit + a 1 MB hole. If a process
> has an 8192 KB hard limit, it has an additional 119 MB of space available over
> the current behavior.


I'm not so convinced this is the right approach... a bit of room for
apps to increase their stack sounds useful. (And a "reasonable" amount is
specified by SUS AFAIK; 128 MB is quite reasonable.)


> The current implementation also has a problem. If the stack soft limit is
> 128 MB or more, there is no hole between the stack and mmap_base. If there is a
> mapping at mmap_base, stack overflows are not detected. The patch adds a
> 1 MB hole between them.

ack on this part.



2004-09-16 18:27:02

by Utz Lehmann

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

Arjan van de Ven [[email protected]] wrote:
> On Thu, Sep 16, 2004 at 06:56:13PM +0200, Utz Lehmann wrote:
> > Hi
> >
> > With the flexmmap memory layout there is at least a 128 MB gap between
> > mmap_base and TASK_SIZE. I think this is for the case that a running process
> > expands its stack soft rlimit.
> >
> > If there is a hard limit for the stack, this minimum gap is just a waste of
> > space. This patch reduces the gap to the hard limit + a 1 MB hole. If a process
> > has an 8192 KB hard limit, it has an additional 119 MB of space available over
> > the current behavior.
>
>
> I'm not so convinced this is the right approach... a bit of room for
> apps to increase their stack sounds useful. (And a "reasonable" amount is
> specified by SUS AFAIK; 128 MB is quite reasonable.)

This is only for a hard limited (rlim_max) stack. A non-root application
cannot increase it anyway.
The default (rlim_cur = ~8 MB, rlim_max = unlimited) is unchanged and gets a
gap of 128 MB.

A check for CAP_SYS_RESOURCE can be added, but I don't think it's worth it.
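
Whether a process falls into the hard-limited case can be checked from user
space with getrlimit(2). The following is a minimal sketch; the helper names
are invented for illustration, and it deliberately does not look at
CAP_SYS_RESOURCE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/* True if the hard limit is finite, i.e. an unprivileged process can
 * never raise its stack limit beyond rl->rlim_max. */
bool stack_is_hard_limited(const struct rlimit *rl)
{
	assert(rl != NULL);
	return rl->rlim_max != RLIM_INFINITY;
}

/* Print one limit, spelling out RLIM_INFINITY as "unlimited". */
static void print_limit(const char *name, rlim_t value)
{
	if (value == RLIM_INFINITY)
		printf("%s: unlimited\n", name);
	else
		printf("%s: %llu KB\n", name, (unsigned long long)(value / 1024));
}

/* Print the stack limits of the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_stack_limits(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0) {
		perror("getrlimit");
		return -1;
	}
	print_limit("soft (rlim_cur)", rl.rlim_cur);
	print_limit("hard (rlim_max)", rl.rlim_max);
	printf("hard limited: %s\n", stack_is_hard_limited(&rl) ? "yes" : "no");
	return 0;
}
```

With the default limits described above, this would be expected to report a
soft limit of about 8 MB and an unlimited hard limit, so the process keeps the
128 MB gap.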

2004-09-16 19:08:13

by Ulrich Drepper

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Utz Lehmann wrote:

> A check for CAP_SYS_RESOURCE can be added, but I don't think it's worth it.

It is needed. Otherwise how do you allow increasing the stack size
again once it has been limited? I've no problem with using the smallest
reserved stack region with !CAP_SYS_RESOURCE, but otherwise the existing
method should be used.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFBSeRu2ijCOnn/RHQRAi9kAKCvg6KSntcpjNT0Ld8wLQuS5RqxtACfUwDY
X59x6hGwCtUZUgbX2O/hV7k=
=mOVA
-----END PGP SIGNATURE-----

2004-09-17 13:19:59

by Utz Lehmann

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

Ulrich Drepper [[email protected]] wrote:
> > A check for CAP_SYS_RESOURCE can be added, but I don't think it's worth it.
>
> It is needed. Otherwise how do you allow increasing the stack size
> again once it has been limited? I've no problem with using the smallest
> reserved stack region with !CAP_SYS_RESOURCE, but otherwise the existing
> method should be used.

I made that change. The following patch only reduces the gap when the
application cannot extend the stack space anyway (hard limited stack &&
!CAP_SYS_RESOURCE). All other cases stay unchanged except for the 1 MB hole
for soft limited stacks >128 MB.

It gives a nice way of making most of the default 128 MB gap usable for
applications: just run them with a hard stack limit.

Now I can allocate more than 3.8 GiB in one chunk on x86 (this patch +
exec-shield + 4g/4g + ulimit -H -s 8192).
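
The cases described above can be tabulated with a small user-space model of
the revised check. This is a sketch under assumptions, not the kernel code:
TASK_SIZE is taken as 3 GB (a 4g/4g kernel would use a larger value), pages
are 4 KB, UINT64_MAX stands in for an unlimited rlimit, the capability check
is reduced to a boolean parameter, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TASK_SIZE  UINT64_C(0xC0000000)  /* assumed 3 GB split */
#define PAGE_MASK  (~UINT64_C(0xFFF))    /* assumed 4 KB pages */
#define MB         (UINT64_C(1) << 20)
#define UNLIMITED  UINT64_MAX            /* stand-in for RLIM_INFINITY */

#define MIN_HOLE   (1 * MB)
#define MIN_GAP    (128 * MB - MIN_HOLE)
#define MAX_GAP    (TASK_SIZE / 6 * 5 - MIN_HOLE)

/* Bytes reserved between mmap_base and TASK_SIZE (stack gap plus hole).
 * cap_sys_resource models the result of capable(CAP_SYS_RESOURCE). */
uint64_t reserved_for_stack(uint64_t soft, uint64_t hard, bool cap_sys_resource)
{
	uint64_t gap = soft;

	assert(soft <= hard);   /* a soft limit never exceeds the hard limit */

	if (gap < MIN_GAP)
		gap = MIN_GAP;
	else if (gap > MAX_GAP)
		gap = MAX_GAP;

	/* Shrink the gap only if the process can never grow past 'hard'. */
	if (gap > hard && !cap_sys_resource)
		gap = hard;

	return (gap + MIN_HOLE) & PAGE_MASK;
}
```

Under these assumptions an unprivileged process with an 8 MB hard limit
reserves 9 MB, the same limits with CAP_SYS_RESOURCE reserve 128 MB, the
default soft-only limit reserves 128 MB, and a 256 MB soft limit reserves
257 MB.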


Signed-off-by: Utz Lehmann <[email protected]>

diff -Nrup linux-2.6.9-rc2/arch/i386/mm/mmap.c linux-2.6.9-rc2-gap4/arch/i386/mm/mmap.c
--- linux-2.6.9-rc2/arch/i386/mm/mmap.c 2004-09-16 11:18:15.363366420 +0200
+++ linux-2.6.9-rc2-gap4/arch/i386/mm/mmap.c 2004-09-17 12:14:16.734968291 +0200
@@ -30,10 +30,13 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave an at least ~128 MB hole.
+ * Leave an at least 1 MB hole between stack and mmap_base.
+ * Leave an at least 128 MB gap between TASK_SIZE and mmap_base with a
+ * soft rlimit stack.
  */
-#define MIN_GAP (128*1024*1024)
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_HOLE (1*1024*1024)
+#define MIN_GAP (128*1024*1024 - MIN_HOLE)
+#define MAX_GAP (TASK_SIZE/6*5 - MIN_HOLE)
 
 static inline unsigned long mmap_base(struct mm_struct *mm)
 {
@@ -43,8 +46,11 @@ static inline unsigned long mmap_base(st
 		gap = MIN_GAP;
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
+	if ((gap > current->rlim[RLIMIT_STACK].rlim_max) &&
+	    !capable(CAP_SYS_RESOURCE))
+		gap = current->rlim[RLIMIT_STACK].rlim_max;
 
-	return TASK_SIZE - (gap & PAGE_MASK);
+	return TASK_SIZE - ((gap + MIN_HOLE) & PAGE_MASK);
 }
 
 /*
diff -Nrup linux-2.6.9-rc2/arch/ppc64/mm/mmap.c linux-2.6.9-rc2-gap4/arch/ppc64/mm/mmap.c
--- linux-2.6.9-rc2/arch/ppc64/mm/mmap.c 2004-09-16 11:18:19.760799910 +0200
+++ linux-2.6.9-rc2-gap4/arch/ppc64/mm/mmap.c 2004-09-17 12:15:05.572696938 +0200
@@ -30,10 +30,13 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave an at least ~128 MB hole.
+ * Leave an at least 1 MB hole between stack and mmap_base.
+ * Leave an at least 128 MB gap between TASK_SIZE and mmap_base with a
+ * soft rlimit stack.
  */
-#define MIN_GAP (128*1024*1024)
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_HOLE (1*1024*1024)
+#define MIN_GAP (128*1024*1024 - MIN_HOLE)
+#define MAX_GAP (TASK_SIZE/6*5 - MIN_HOLE)
 
 static inline unsigned long mmap_base(void)
 {
@@ -43,8 +46,11 @@ static inline unsigned long mmap_base(vo
 		gap = MIN_GAP;
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
+	if ((gap > current->rlim[RLIMIT_STACK].rlim_max) &&
+	    !capable(CAP_SYS_RESOURCE))
+		gap = current->rlim[RLIMIT_STACK].rlim_max;
 
-	return TASK_SIZE - (gap & PAGE_MASK);
+	return TASK_SIZE - ((gap + MIN_HOLE) & PAGE_MASK);
 }
 
 static inline int mmap_is_legacy(void)
diff -Nrup linux-2.6.9-rc2/arch/s390/mm/mmap.c linux-2.6.9-rc2-gap4/arch/s390/mm/mmap.c
--- linux-2.6.9-rc2/arch/s390/mm/mmap.c 2004-09-16 11:18:19.855787673 +0200
+++ linux-2.6.9-rc2-gap4/arch/s390/mm/mmap.c 2004-09-17 12:15:23.054452086 +0200
@@ -30,10 +30,13 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave an at least ~128 MB hole.
+ * Leave an at least 1 MB hole between stack and mmap_base.
+ * Leave an at least 128 MB gap between TASK_SIZE and mmap_base with a
+ * soft rlimit stack.
  */
-#define MIN_GAP (128*1024*1024)
-#define MAX_GAP (TASK_SIZE/6*5)
+#define MIN_HOLE (1*1024*1024)
+#define MIN_GAP (128*1024*1024 - MIN_HOLE)
+#define MAX_GAP (TASK_SIZE/6*5 - MIN_HOLE)
 
 static inline unsigned long mmap_base(void)
 {
@@ -43,8 +46,11 @@ static inline unsigned long mmap_base(vo
 		gap = MIN_GAP;
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
+	if ((gap > current->rlim[RLIMIT_STACK].rlim_max) &&
+	    !capable(CAP_SYS_RESOURCE))
+		gap = current->rlim[RLIMIT_STACK].rlim_max;
 
-	return TASK_SIZE - (gap & PAGE_MASK);
+	return TASK_SIZE - ((gap + MIN_HOLE) & PAGE_MASK);
 }
 
 static inline int mmap_is_legacy(void)

2004-09-17 13:21:50

by Arjan van de Ven

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

On Fri, Sep 17, 2004 at 03:18:30PM +0200, Utz Lehmann wrote:
> Ulrich Drepper [[email protected]] wrote:
> > > A check for CAP_SYS_RESOURCE can be added, but I don't think it's worth it.
> >
> > It is needed. Otherwise how do you allow increasing the stack size
> > again once it has been limited? I've no problem with using the smallest
> > reserved stack region with !CAP_SYS_RESOURCE, but otherwise the existing
> > method should be used.
>
> I made that change. The following patch only reduces the gap when the
> application cannot extend the stack space anyway (hard limited stack &&
> !CAP_SYS_RESOURCE). All other cases stay unchanged except for the 1 MB hole
> for soft limited stacks >128 MB.
>
> It gives a nice way of making most of the default 128 MB gap usable for
> applications: just run them with a hard stack limit.
>
> Now I can allocate more than 3.8 GiB in one chunk on x86 (this patch +
> exec-shield + 4g/4g + ulimit -H -s 8192).

Ack; nice work!



2004-09-17 13:25:03

by Utz Lehmann

Subject: Re: [PATCH] flexmmap: optimise mmap_base gap for hard limited stack

Andrea Arcangeli [[email protected]] wrote:
> Several years ago I developed a sysctl, carried in all my 2.2 and 2.4 kernels
> (including all 2.2 and 2.4 SUSE kernels), that major software vendors
> require for the safety of their apps. IIRC I tried to merge it once but
> failed (it was not applied to mainline). Now I've just got another bugzilla
> open about the lack of the sysctl, and the major app is now again not
> foolproof. A fixed number won't work, so I have to drop such a fixed GAP
> anyway and rewrite it by forward porting my patch.
>
> The sysctl in question is /proc/sys/vm/heap-stack-gap, so I recommend
> dropping all those fixed GAP sizes and implementing this instead:
>
> http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/00_silent-stack-overflow-20
>
> If you reinvent the wheel and prefer not to reuse the above code to
> make a sysctl, at least make sure to use the name "heap-stack-gap" to
> avoid any pointless incompatibility.

I don't know if I understand your patch correctly. It looks like there is a
gap that moves along below the actual stack. If this is the case, then I think
it's not suited for the flexmmap layout. With flexmmap the mappings are
allocated top down from mmap_base.