System V shared memory
a) can be abused to trigger out-of-memory conditions and the standard
measures against out-of-memory do not work:
- it is not possible to use setrlimit to limit the size of shm segments.
- segments can exist without association with any processes, thus
the oom-killer is unable to free that memory.
b) is typically used for shared information - today often multiple GB.
(e.g. database shared buffers)
The current default is a maximum segment size of 32 MB and a maximum total
size of 8 GB. This is often too much for a) and not enough for b), which
means that lots of users must change the defaults.
This patch increases the default limits to ULONG_MAX, which is perfect for
case b). The defaults are used after boot and as the initial value for
each new namespace.
Admins/distros that need a protection against a) should reduce the limits
and/or enable shm_rmid_forced.
Further notes:
- The patch only changes the boot time default, overrides behave as before:
# sysctl kernel/shmall=33554432
would recreate the previous limit for SHMMAX (for the current namespace).
- Disabling sysv shm allocation is possible with:
# sysctl kernel.shmall=0
(not a new feature, also per-namespace)
- ULONG_MAX is not really infinity, but 18 Exabyte segment size and
75 Zettabyte total size. This should be enough for the next few weeks.
(assuming a 64-bit system with 4k pages)
Risks:
- The patch breaks installations that use "take current value and increase
it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
After a:
# CUR=`sysctl -n kernel.shmmax`
# NEW=`echo $CUR+1 | bc -l`
# sysctl -n kernel.shmmax=$NEW
shmmax ends up as 0, which disables shm allocations.
- There is no wrap-around protection for ns->shm_ctlall, i.e. the 75 ZB
limit is not enforced.
Signed-off-by: Manfred Spraul <[email protected]>
Reported-by: Davidlohr Bueso <[email protected]>
Cc: [email protected]
---
include/linux/shm.h | 1 -
include/uapi/linux/shm.h | 8 +++-----
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index 1e2cd2e..b33bbeb 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -4,7 +4,6 @@
#include <asm/page.h>
#include <uapi/linux/shm.h>
-#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
#include <asm/shmparam.h>
struct shmid_kernel /* private to the kernel */
{
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 78b6941..b7370f9 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -9,15 +9,13 @@
/*
* SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
- * be increased by sysctl
+ * be modified by sysctl
*/
-#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
+#define SHMMAX ULONG_MAX /* max shared seg size (bytes) */
#define SHMMIN 1 /* min shared seg size (bytes) */
#define SHMMNI 4096 /* max num of segs system wide */
-#ifndef __KERNEL__
-#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
-#endif
+#define SHMALL ULONG_MAX /* max shm system wide (pages) */
#define SHMSEG SHMMNI /* max shared segs per process */
--
1.9.0
> -----Original Message-----
> From: Manfred Spraul [mailto:[email protected]]
> Sent: Friday, April 18, 2014 2:19 AM
> To: Andrew Morton; Davidlohr Bueso
> Cc: LKML; KAMEZAWA Hiroyuki; Motohiro Kosaki JP; [email protected]; [email protected]; [email protected]; Manfred Spraul;
> [email protected]
> Subject: [PATCH] ipc/shm: Increase the defaults for SHMALL, SHMMAX to infinity
>
> System V shared memory
>
> a) can be abused to trigger out-of-memory conditions and the standard
> measures against out-of-memory do not work:
>
> - it is not possible to use setrlimit to limit the size of shm segments.
>
> - segments can exist without association with any processes, thus
> the oom-killer is unable to free that memory.
>
> b) is typically used for shared information - today often multiple GB.
> (e.g. database shared buffers)
>
> The current default is a maximum segment size of 32 MB and a maximum total size of 8 GB. This is often too much for a) and not
> enough for b), which means that lots of users must change the defaults.
>
> This patch increases the default limits to ULONG_MAX, which is perfect for case b). The defaults are used after boot and as the initial
> value for each new namespace.
>
> Admins/distros that need a protection against a) should reduce the limits and/or enable shm_rmid_forced.
>
> Further notes:
> - The patch only changes the boot time default, overrides behave as before:
> # sysctl kernel/shmall=33554432
> would recreate the previous limit for SHMMAX (for the current namespace).
>
> - Disabling sysv shm allocation is possible with:
> # sysctl kernel.shmall=0
> (not a new feature, also per-namespace)
>
> - ULONG_MAX is not really infinity, but 18 Exabyte segment size and
> 75 Zettabyte total size. This should be enough for the next few weeks.
> (assuming a 64-bit system with 4k pages)
>
> Risks:
> - The patch breaks installations that use "take current value and increase
> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
> After a:
> # CUR=`sysctl -n kernel.shmmax`
> # NEW=`echo $CUR+1 | bc -l`
> # sysctl -n kernel.shmmax=$NEW
> shmmax ends up as 0, which disables shm allocations.
>
> - There is no wrap-around protection for ns->shm_ctlall, i.e. the 75 ZB
> limit is not enforced.
>
> Signed-off-by: Manfred Spraul <[email protected]>
> Reported-by: Davidlohr Bueso <[email protected]>
> Cc: [email protected]
I'm ok either ULONG_MAX or 0 (special value of infinity).
Acked-by: KOSAKI Motohiro <[email protected]>
On Fri, 2014-04-18 at 11:18 +0200, Manfred Spraul wrote:
> System V shared memory
>
> a) can be abused to trigger out-of-memory conditions and the standard
> measures against out-of-memory do not work:
>
> - it is not possible to use setrlimit to limit the size of shm segments.
>
> - segments can exist without association with any processes, thus
> the oom-killer is unable to free that memory.
>
> b) is typically used for shared information - today often multiple GB.
> (e.g. database shared buffers)
>
> The current default is a maximum segment size of 32 MB and a maximum total
> size of 8 GB. This is often too much for a) and not enough for b), which
> means that lots of users must change the defaults.
>
> This patch increases the default limits to ULONG_MAX, which is perfect for
> case b). The defaults are used after boot and as the initial value for
> each new namespace.
>
> Admins/distros that need a protection against a) should reduce the limits
> and/or enable shm_rmid_forced.
>
> Further notes:
> - The patch only changes the boot time default, overrides behave as before:
> # sysctl kernel/shmall=33554432
> would recreate the previous limit for SHMMAX (for the current namespace).
>
> - Disabling sysv shm allocation is possible with:
> # sysctl kernel.shmall=0
> (not a new feature, also per-namespace)
>
> - ULONG_MAX is not really infinity, but 18 Exabyte segment size and
> 75 Zettabyte total size. This should be enough for the next few weeks.
> (assuming a 64-bit system with 4k pages)
>
> Risks:
> - The patch breaks installations that use "take current value and increase
> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
This really scares me. The probability of occurrence is now much higher,
and not just theoretical. It would legitimately break userspace.
On Sat, Apr 19, 2014 at 8:55 AM, Davidlohr Bueso <[email protected]> wrote:
> On Fri, 2014-04-18 at 11:18 +0200, Manfred Spraul wrote:
>> System V shared memory
>>
>> a) can be abused to trigger out-of-memory conditions and the standard
>> measures against out-of-memory do not work:
>>
>> - it is not possible to use setrlimit to limit the size of shm segments.
>>
>> - segments can exist without association with any processes, thus
>> the oom-killer is unable to free that memory.
>>
>> b) is typically used for shared information - today often multiple GB.
>> (e.g. database shared buffers)
>>
>> The current default is a maximum segment size of 32 MB and a maximum total
>> size of 8 GB. This is often too much for a) and not enough for b), which
>> means that lots of users must change the defaults.
>>
>> This patch increases the default limits to ULONG_MAX, which is perfect for
>> case b). The defaults are used after boot and as the initial value for
>> each new namespace.
>>
>> Admins/distros that need a protection against a) should reduce the limits
>> and/or enable shm_rmid_forced.
>>
>> Further notes:
>> - The patch only changes the boot time default, overrides behave as before:
>> # sysctl kernel/shmall=33554432
>> would recreate the previous limit for SHMMAX (for the current namespace).
>>
>> - Disabling sysv shm allocation is possible with:
>> # sysctl kernel.shmall=0
>> (not a new feature, also per-namespace)
>>
>> - ULONG_MAX is not really infinity, but 18 Exabyte segment size and
>> 75 Zettabyte total size. This should be enough for the next few weeks.
>> (assuming a 64-bit system with 4k pages)
>>
>> Risks:
>> - The patch breaks installations that use "take current value and increase
>> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
>
> This really scares me. The probability of occurrence is now much higher,
> and not just theoretical. It would legitimately break userspace.
I'm missing something. Manfred's patch doesn't actually change the
behavior on this point does it? If the problem is more than
theoretical, then it _already_ affects users, right? (And they would
therefore already be working around the problem.)
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
On 04/19/2014 08:55 AM, Davidlohr Bueso wrote:
> On Fri, 2014-04-18 at 11:18 +0200, Manfred Spraul wrote:
>> - ULONG_MAX is not really infinity, but 18 Exabyte segment size and
>> 75 Zettabyte total size. This should be enough for the next few weeks.
>> (assuming a 64-bit system with 4k pages)
Note: I found three integer overflows, none of them critical.
I will send patches, I just must get a 32-bit test setup first.
>> Risks:
>> - The patch breaks installations that use "take current value and increase
>> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
> This really scares me. The probability of occurrence is now much higher,
> and not just theoretical. It would legitimately break userspace.
That's why I mentioned it.
For shmmax, there is a simple answer: Use TASK_SIZE instead of ULONG_MAX.
- sufficiently far away from overflow.
- values beyond TASK_SIZE are useless anyway, you can't map such segments.
I don't have a good answer for shmall. 1L<<(BITS_PER_LONG-1) is too ugly.
Any proposals?
--
Manfred
On 04/19/2014 09:10 AM, Michael Kerrisk (man-pages) wrote:
> On Sat, Apr 19, 2014 at 8:55 AM, Davidlohr Bueso <[email protected]> wrote:
>> On Fri, 2014-04-18 at 11:18 +0200, Manfred Spraul wrote:
>>> Risks:
>>> - The patch breaks installations that use "take current value and increase
>>> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
>> This really scares me. The probability of occurrence is now much higher,
>> and not just theoretical. It would legitimately break userspace.
> I'm missing something. Manfred's patch doesn't actually change the
> behavior on this point does it? If the problem is more than
> theoretical, then it _already_ affects users, right? (And they would
> therefore already be working around the problem.)
The current default is 32 MB. if some increases it by 1 MB, then the
result is 33 MB.
The new default would be ULONG_MAX. If someone increases it by 1 MB,
then the result is 1 MB - 1 byte.
--
Manfred
On 04/19/2014 10:45 AM, Manfred Spraul wrote:
> On 04/19/2014 09:10 AM, Michael Kerrisk (man-pages) wrote:
>> On Sat, Apr 19, 2014 at 8:55 AM, Davidlohr Bueso <[email protected]> wrote:
>>> On Fri, 2014-04-18 at 11:18 +0200, Manfred Spraul wrote:
>>>> Risks:
>>>> - The patch breaks installations that use "take current value and increase
>>>> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
>>> This really scares me. The probability of occurrence is now much higher,
>>> and not just theoretical. It would legitimately break userspace.
>> I'm missing something. Manfred's patch doesn't actually change the
>> behavior on this point does it? If the problem is more than
>> theoretical, then it _already_ affects users, right? (And they would
>> therefore already be working around the problem.)
> The current default is 32 MB. if some increases it by 1 MB, then the
> result is 33 MB.
> The new default would be ULONG_MAX. If someone increases it by 1 MB,
> then the result is 1 MB - 1 byte.
Ahh. Got it now--sorry for being slow.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
On 04/19/2014 10:37 AM, Manfred Spraul wrote:
> On 04/19/2014 08:55 AM, Davidlohr Bueso wrote:
>> On Fri, 2014-04-18 at 11:18 +0200, Manfred Spraul wrote:
>>> - ULONG_MAX is not really infinity, but 18 Exabyte segment size and
>>> 75 Zettabyte total size. This should be enough for the next few weeks.
>>> (assuming a 64-bit system with 4k pages)
> Note: I found three integer overflows, none of them critical.
> I will send patches, I just must get a 32-bit test setup first.
>>> Risks:
>>> - The patch breaks installations that use "take current value and increase
>>> it a bit". [seems to exist, http://marc.info/?l=linux-mm&m=139638334330127]
>> This really scares me. The probability of occurrence is now much higher,
>> and not just theoretical. It would legitimately break userspace.
> That's why I mentioned it.
> For shmmax, there is a simple answer: Use TASK_SIZE instead of ULONG_MAX.
> - sufficiently far away from overflow.
> - values beyond TASK_SIZE are useless anyway, you can't map such segments.
>
> I don't have a good answer for shmall. 1L<<(BITS_PER_LONG-1) is too ugly.
> Any proposals?
If shmmax is TASK_SIZE, would not the existing
#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
suffice?
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/