2014-04-18 01:25:51

by Davidlohr Bueso

Subject: [PATCH v3] ipc,shm: disable shmmax and shmall by default

The default size for shmmax is, and always has been, 32MB.
Today, this value is rather small, forcing users to
increase it via sysctl, which causes unnecessary work and
userspace application workarounds. For example:

http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html

Unix has historically required setting these limits for shared
memory, and Linux inherited such behavior. The consequence of this
is added complexity for users and administrators. One very common
example is database setup/installation documents and scripts,
where users must manually calculate the values for these limits.
This also requires (some) knowledge of how the underlying memory
management works, so on many occasions the limits end up
flat out wrong. Disabling these limits sooner could have
saved companies a lot of time, headaches and money for support.
But it's never too late: simplify users' lives now.

Instead of choosing yet another arbitrary value, larger than 32MB,
this patch disables the use of both shmmax and shmall by default,
allowing users to create segments of unlimited size. Users and
applications that already explicitly set these values through sysctl
are left untouched, so their behavior does not change.

So a value of 0 bytes or pages, for shmmax and shmall, respectively,
implies unlimited memory, as opposed to disabling sysv shared memory.
This is safe because 0 could not have been a usable limit before: SHMMIN is
hardcoded to 1 and cannot be modified. This change will of course
be reflected in shmctl(IPC_INFO) calls. Any application that does
preliminary checking of the size of shmmax must also check for
shmmin, and therefore the kernel can safely make this change. It is
well stated that any sizes must be within both ranges.

Another advantage of setting these values to 0 is that we automatically
take care of any variable overflow problems, where the limit can
accidentally become 0. Without this change, such situations are just
*broken*, where shmmax = 0 < shmmin = 1.

This change allows Linux to treat shm just as regular anonymous memory.
One important difference between them, though, is handling out-of-memory
conditions: as opposed to regular anon memory, the OOM killer will not
free the memory as it is shm, allowing users to potentially abuse this.
To overcome this situation, the shm_rmid_forced option must be enabled.

Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
---
Changes from v2:
- Improve changelog (per Andrew/Manfred).
- Minor documentation updates (per Michael).

Changes from v1:
- Respect SHMMIN even when shmmax is 0 (unlimited).
This fixes the shmget02 test that broke in v1. (per Manfred).

- Update changelog regarding OOM description (per Kosaki)

include/linux/shm.h | 3 ++-
include/uapi/linux/shm.h | 8 ++++----
ipc/shm.c | 6 ++++--
3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/shm.h b/include/linux/shm.h
index 1e2cd2e..34e6ba74 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -4,7 +4,8 @@
#include <asm/page.h>
#include <uapi/linux/shm.h>

-#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
+/* max shm system wide (pages), 0 being unlimited */
+#define SHMALL 0
#include <asm/shmparam.h>
struct shmid_kernel /* private to the kernel */
{
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 78b6941..d645c0c 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -9,14 +9,14 @@

/*
* SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
- * be increased by sysctl
+ * be modified by sysctl. By default, disable SHMMAX and SHMALL with
+ * 0 bytes, thus allowing processes to have unlimited shared memory.
*/
-
-#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
+#define SHMMAX 0 /* max shared seg size (bytes) */
#define SHMMIN 1 /* min shared seg size (bytes) */
#define SHMMNI 4096 /* max num of segs system wide */
#ifndef __KERNEL__
-#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
+#define SHMALL 0
#endif
#define SHMSEG SHMMNI /* max shared segs per process */

diff --git a/ipc/shm.c b/ipc/shm.c
index 7645961..8630561 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
int id;
vm_flags_t acctflag = 0;

- if (size < SHMMIN || size > ns->shm_ctlmax)
+ if (size < SHMMIN ||
+ (ns->shm_ctlmax && size > ns->shm_ctlmax))
return -EINVAL;

- if (ns->shm_tot + numpages > ns->shm_ctlall)
+ if (ns->shm_ctlall &&
+ ns->shm_tot + numpages > ns->shm_ctlall)
return -ENOSPC;

shp = ipc_rcu_alloc(sizeof(*shp));
--
1.8.1.4
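
The two checks changed in newseg() can be modelled outside the kernel. The
following is a minimal userspace sketch, not the kernel code itself: the
struct shm_limits type and the shm_check_limits() helper are hypothetical
names introduced here for illustration, and only the two conditions mirror
the hunk above (0 meaning "no limit", SHMMIN still enforced).

```c
#include <errno.h>

#define SHMMIN 1UL /* minimum segment size in bytes, as in the patch */

/* Hypothetical userspace stand-in for the per-namespace limits. */
struct shm_limits {
    unsigned long ctlmax; /* max segment size in bytes, 0 = unlimited */
    unsigned long ctlall; /* max total pages system wide, 0 = unlimited */
    unsigned long tot;    /* pages currently accounted */
};

/*
 * Mirror of the patched checks: returns 0 if a segment of `size` bytes
 * spanning `numpages` pages would be accepted, -EINVAL if the size is
 * out of range, or -ENOSPC if the system-wide page limit would be exceeded.
 */
static int shm_check_limits(const struct shm_limits *lim,
                            unsigned long size, unsigned long numpages)
{
    /* SHMMIN always applies; the upper bound only when it is non-zero. */
    if (size < SHMMIN || (lim->ctlmax && size > lim->ctlmax))
        return -EINVAL;

    /* The system-wide page limit is skipped entirely when it is 0. */
    if (lim->ctlall && lim->tot + numpages > lim->ctlall)
        return -ENOSPC;

    return 0;
}
```

With both limits at 0, any size of at least SHMMIN passes; a zero-byte
request is still rejected with -EINVAL, which is the v1 to v2 fix noted in
the changelog.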



2014-04-18 09:26:38

by Manfred Spraul

Subject: Re: [PATCH v3] ipc,shm: disable shmmax and shmall by default

Hi Davidlohr,

On 04/18/2014 03:25 AM, Davidlohr Bueso wrote:
> So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> implies unlimited memory, as opposed to disabling sysv shared memory.
That might be a second risk:
Right now, a sysadmin can prevent sysv memory allocations with

# sysctl kernel.shmall=0

After your patch is applied, this line allows unlimited allocations.

Obviously my patch has the opposite problem: 64-bit wrap-arounds.

> --- a/include/uapi/linux/shm.h
> +++ b/include/uapi/linux/shm.h
> @@ -9,14 +9,14 @@
>
> /*
> * SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
> - * be increased by sysctl
> + * be modified by sysctl. By default, disable SHMMAX and SHMALL with
> + * 0 bytes, thus allowing processes to have unlimited shared memory.
> */
> -
> -#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
> +#define SHMMAX 0 /* max shared seg size (bytes) */
> #define SHMMIN 1 /* min shared seg size (bytes) */
> #define SHMMNI 4096 /* max num of segs system wide */
> #ifndef __KERNEL__
> -#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
> +#define SHMALL 0
> #endif
> #define SHMSEG SHMMNI /* max shared segs per process */
>
The "#ifndef __KERNEL__" is not required:
As there is no reference to PAGE_SIZE anymore, one definition for SHMALL
is sufficient.


--
Manfred
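
The reversal described above for shmall=0 can be seen by putting the old
and new page-limit conditions side by side. This is only an illustrative
sketch: old_shmall_allows() and new_shmall_allows() are hypothetical helper
names, and each one just negates the corresponding condition from the
ipc/shm.c hunk in the patch.

```c
#include <stdbool.h>

/* Pre-patch condition: refuse once the page total would exceed shmall. */
static bool old_shmall_allows(unsigned long shm_tot, unsigned long numpages,
                              unsigned long shmall)
{
    return !(shm_tot + numpages > shmall);
}

/* Post-patch condition: shmall == 0 disables the check altogether. */
static bool new_shmall_allows(unsigned long shm_tot, unsigned long numpages,
                              unsigned long shmall)
{
    return !(shmall && shm_tot + numpages > shmall);
}
```

For a one-page request on an empty system with shmall=0, the old condition
refuses the allocation and the new one accepts it; for any non-zero shmall
the two agree.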

Subject: Re: [PATCH v3] ipc,shm: disable shmmax and shmall by default

On Fri, Apr 18, 2014 at 11:26 AM, Manfred Spraul
<[email protected]> wrote:
> Hi Davidlohr,
>
>
> On 04/18/2014 03:25 AM, Davidlohr Bueso wrote:
>>
>> So a value of 0 bytes or pages, for shmmax and shmall, respectively,
>> implies unlimited memory, as opposed to disabling sysv shared memory.
>
> That might be a second risk:
> Right now, a sysadmin can prevent sysv memory allocations with
>
> # sysctl kernel.shmall=0
>
> After your patch is applied, this line allows unlimited allocations.

Good point. I wonder if some folk may get bitten by this complete
reversal of the semantics of shmall==0.

> Obviously my patch has the opposite problem: 64-bit wrap-arounds.

I know you alluded to a case in another thread, but I couldn't quite
work out from the mail you referred to whether this was really the
problem. (And I assume those folks were forced to fix their set-up
scripts anyway.) So, it's not clear to me whether this is a real
problem. (And your patch does not worsen things from the current
situation, right?)

Cheers,

Michael



>> --- a/include/uapi/linux/shm.h
>> +++ b/include/uapi/linux/shm.h
>> @@ -9,14 +9,14 @@
>> /*
>> * SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
>> - * be increased by sysctl
>> + * be modified by sysctl. By default, disable SHMMAX and SHMALL with
>> + * 0 bytes, thus allowing processes to have unlimited shared memory.
>> */
>> -
>> -#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
>> +#define SHMMAX 0 /* max shared seg size (bytes) */
>> #define SHMMIN 1 /* min shared seg size (bytes) */
>> #define SHMMNI 4096 /* max num of segs system wide */
>> #ifndef __KERNEL__
>> -#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
>> +#define SHMALL 0
>> #endif
>> #define SHMSEG SHMMNI /* max shared segs per process */
>>
>
> The "#ifndef __KERNEL__" is not required:
> As there is no reference to PAGE_SIZE anymore, one definition for SHMALL is
> sufficient.
>
>
> --
> Manfred



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2014-04-18 16:33:38

by Davidlohr Bueso

Subject: Re: [PATCH v3] ipc,shm: disable shmmax and shmall by default

On Fri, 2014-04-18 at 11:26 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
>
> On 04/18/2014 03:25 AM, Davidlohr Bueso wrote:
> > So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> > implies unlimited memory, as opposed to disabling sysv shared memory.
> That might be a second risk:
> Right now, a sysadmin can prevent sysv memory allocations with
>
> # sysctl kernel.shmall=0

Yeah, I had pointed this out previously, and it is addressed in the
changelog. shmall = 0 directly contradicts size < shmmin = 1, so I don't
know who's wrong there...

2014-04-18 17:51:10

by Manfred Spraul

Subject: Re: [PATCH v3] ipc,shm: disable shmmax and shmall by default

On 04/18/2014 05:36 PM, Michael Kerrisk (man-pages) wrote:
> On Fri, Apr 18, 2014 at 11:26 AM, Manfred Spraul
> <[email protected]> wrote:
>> Obviously my patch has the opposite problem: 64-bit wrap-arounds.
> I know you alluded to a case in another thread, but I couldn't quite
> work out from the mail you referred to whether this was really the
> problem. (And I assume those folks were forced to fix their set-up
> scripts anyway.) So, it's not clear to me whether this is a real
> problem. (And your patch does not worsen things from the current
> situation, right?)
a) When I wrote the comment it was just an idea.
But now I think wrap-around could be an issue, e.g.
find_vma_intersection(,addr,addr+ULONG_MAX) always returns false, even
if there are vmas in between.

b) If we make ULONG_MAX the default, then it should work.

--
Manfred
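
The wrap-around concern in (a) can be illustrated with a simplified
userspace model. This is a sketch under stated assumptions, not the
kernel's find_vma_intersection(): range_intersects() is a hypothetical
helper that tests a single half-open range [vm_start, vm_end) against
[start, end), and range_end_saturating() is one possible guard that was not
proposed in the thread.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Simplified model of an intersection test between a mapping
 * [vm_start, vm_end) and a query range [start, end).
 */
static bool range_intersects(unsigned long vm_start, unsigned long vm_end,
                             unsigned long start, unsigned long end)
{
    return vm_end > start && vm_start < end;
}

/*
 * Hypothetical guard: compute addr + len, clamping to ULONG_MAX instead
 * of letting the unsigned addition wrap around.
 */
static unsigned long range_end_saturating(unsigned long addr,
                                          unsigned long len)
{
    return len > ULONG_MAX - addr ? ULONG_MAX : addr + len;
}
```

Because unsigned arithmetic wraps, addr + ULONG_MAX equals addr - 1, so the
query range ends before it starts and no mapping above addr is ever
reported as intersecting. Clamping the end address avoids that in this
model.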