Hi all,
the increase of SHMMAX/SHMALL is now a 4 patch series, and still
not ready for merging (see at the end, TASK_SIZE and s390).
If we increase the default limits for SHMMAX and SHMALL,
integer overflows could happen:
SHMMAX:
- shmmem_file_setup places a hard limit on the segment size:
MAX_LFS_FILESIZE.
on 32-bit, the limit is > 1 TB.
--> 32-bit: 4 GB-1 segments are possible.
Rounded up to full pages the actual allocated size
is 0.
--> patch 3
on 64-bit, this is 0x7fff ffff ffff ffff
--> no chance for an overflow.
- shmat:
- find_vma_intersection does not handle overflows properly
--> patch 1.
- do_mmap_pgoff limits mappings to TASK_SIZE
3 GB on 32-bit (assuming x86)
47 bits on 64-bit (assuming x86)
- do_mmap_pgoff checks for overflows:
map 2 GB, starting from addr=2.5GB fails.
SHMALL:
- after creating 8192 segments size (1L<<63)-1, shm_tot
overflows and returns 0.
--> patch 2.
And finally:
Patch 4, increase the limits to ULONG_MAX
Open points:
- Better ideas to handle uapi: Is it worth the effort to get
access to TASK_SIZE? I would say no.
- Better ideas with regards to SHMALL? The values are probably
large enough, but still arbitrary.
- The TASK_SIZE definition for e.g. S390 differs: It's not
a constant, instead it is the current task size for current.
And it seems that the task size can change based on
(virtual) memory pressure (s390_mmap_check()).
For new namespaces, this might have interesting effects, i.e.
this must be fixed.
--
Manfred
shm_tot counts the total number of pages used by shm segments.
If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number
can overflow. Subsequent calls to shmctl(,SHM_INFO,) would return
wrong values for shm_tot.
The patch adds a detection for overflows.
Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/shm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/ipc/shm.c b/ipc/shm.c
index 382e2fb..2dfa3d6 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -493,7 +493,8 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (size < SHMMIN || size > ns->shm_ctlmax)
return -EINVAL;
- if (ns->shm_tot + numpages > ns->shm_ctlall)
+ if (ns->shm_tot + numpages < ns->shm_tot ||
+ ns->shm_tot + numpages > ns->shm_ctlall)
return -ENOSPC;
shp = ipc_rcu_alloc(sizeof(*shp));
--
1.9.0
System V shared memory
a) can be abused to trigger out-of-memory conditions and the standard
measures against out-of-memory do not work:
- it is not possible to use setrlimit to limit the size of shm segments.
- segments can exist without association with any processes, thus
the oom-killer is unable to free that memory.
b) is typically used for shared information - today often multiple GB.
(e.g. database shared buffers)
The current default is a maximum segment size of 32 MB and a maximum total
size of 8 GB. This is often too much for a) and not enough for b), which
means that lots of users must change the defaults.
This patch increases the default limits to the supported maximum, which is
perfect for case b). The defaults are used after boot and as the initial
value for each new namespace.
Admins/distros that need a protection against a) should reduce the limits
and/or enable shm_rmid_forced.
Further notes:
- The patch only changes default, overrides behave as before:
# sysctl kernel/shmall=33554432
would recreate the previous limit for SHMMAX (for the current namespace).
- Disabling sysv shm allocation is possible with:
# sysctl kernel.shmall=0
(not a new feature, also per-namespace)
- The limits are intentionally not set to ULONG_MAX, to avoid triggering
overflows in user space.
[not unreasonable, see http://marc.info/?l=linux-mm&m=139638334330127]
- The the maximum segment size is set to TASK_SIZE. Segments larger than
TASK_SIZE do not make sense, because such segments can't be mapped.
- The limit for the total memory is 256*TASK_SIZE.
This would be 768 GB for x86-32 and 64 PB for x86-64.
Values larger than that might make sense, but not in the next few weeks.
Signed-off-by: Manfred Spraul <[email protected]>
Reported-by: Davidlohr Bueso <[email protected]>
Cc: [email protected]
---
include/linux/shm.h | 5 ++++-
include/uapi/linux/shm.h | 10 ++++++++--
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index 1e2cd2e..7cafb08 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -4,7 +4,10 @@
#include <asm/page.h>
#include <uapi/linux/shm.h>
-#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
+#define SHMMAX TASK_SIZE /* max shared seg size (bytes) */
+#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16))
+ /* max shm system wide (pages) */
+
#include <asm/shmparam.h>
struct shmid_kernel /* private to the kernel */
{
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 78b6941..a20bb7a 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -9,14 +9,20 @@
/*
* SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
- * be increased by sysctl
+ * be modified by sysctl
*/
-#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
#define SHMMIN 1 /* min shared seg size (bytes) */
#define SHMMNI 4096 /* max num of segs system wide */
#ifndef __KERNEL__
+/*
+ * The real values is TASK_SIZE, which is not exported as uapi.
+ * Since this is only the boot time default, 1 GB is a sufficiently
+ * accurate approximation of TASK_SIZE.
+ */
+#define SHMMAX 0x40000000 /* max shared seg size (bytes) */
#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
+ /* max shm system wide (pages) */
#endif
#define SHMSEG SHMMNI /* max shared segs per process */
--
1.9.0
SHMMAX is the upper limit of a shared memory segment,
counted in bytes. The actual allocation is that size, rounded
up to the next full page.
Add a check that prevents the creation of segments where the
rounded up size causes an integer overflow.
Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/shm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/ipc/shm.c b/ipc/shm.c
index 2dfa3d6..f000696 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -493,6 +493,9 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (size < SHMMIN || size > ns->shm_ctlmax)
return -EINVAL;
+ if (numpages << PAGE_SHIFT < size)
+ return -ENOSPC;
+
if (ns->shm_tot + numpages < ns->shm_tot ||
ns->shm_tot + numpages > ns->shm_ctlall)
return -ENOSPC;
--
1.9.0
find_vma_intersection does not work properly if addr+size overflows.
The patch adds a manual check before the call to find_vma_intersection.
Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/shm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/ipc/shm.c b/ipc/shm.c
index 7645961..382e2fb 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1160,6 +1160,9 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
down_write(¤t->mm->mmap_sem);
if (addr && !(shmflg & SHM_REMAP)) {
err = -EINVAL;
+ if (addr + size < addr)
+ goto invalid;
+
if (find_vma_intersection(current->mm, addr, addr + size))
goto invalid;
/*
--
1.9.0