On Mon, 2014-03-31 at 17:05 -0700, Andrew Morton wrote:
> On Mon, 31 Mar 2014 16:25:32 -0700 Davidlohr Bueso <[email protected]> wrote:
>
> > On Mon, 2014-03-31 at 16:13 -0700, Andrew Morton wrote:
> > > On Mon, 31 Mar 2014 15:59:33 -0700 Davidlohr Bueso <[email protected]> wrote:
> > >
> > > > >
> > > > > - Shouldn't there be a way to alter this namespace's shm_ctlmax?
> > > >
> > > > Unfortunately this would also add the complexity I previously mentioned.
> > >
> > > But if the current namespace's shm_ctlmax is too small, you're screwed.
> > > Have to shut down the namespace all the way back to init_ns and start
> > > again.
> > >
> > > > > - What happens if we just nuke the limit altogether and fall back to
> > > > > the next check, which presumably is the rlimit bounds?
> > > >
> > > > afaik we only have rlimit for msgqueues. But in any case, while I like
> > > > that simplicity, it's too late. Too many workloads (specially DBs) rely
> > > > heavily on shmmax. Removing it and relying on something else would thus
> > > > cause a lot of things to break.
> > >
> > > It would permit larger shm segments - how could that break things? It
> > > would make most or all of these issues go away?
> > >
> >
> > So sysadmins wouldn't be very happy, per man shmget(2):
> >
> > EINVAL A new segment was to be created and size < SHMMIN or size >
> > SHMMAX, or no new segment was to be created, a segment with given key
> > existed, but size is greater than the size of that segment.
>
> So their system will act as if they had set SHMMAX=enormous. What
> problems could that cause?
So, just like any sysctl configurable, only privileged users can change
this value. If we remove this option, users can theoretically create
huge segments, thus ignoring any custom limit previously set. This is
what I fear. Think of it kind of like mlock's rlimit. And for that
matter, why does sysctl exist at all, the same would go for the rest of
the limits.
> Look. The 32M thing is causing problems. Arbitrarily increasing the
> arbitrary 32M to an arbitrary 128M won't fix anything - we still have
> the problem. Think bigger, please: how can we make this problem go
> away for ever?
That's the thing, I don't think we can make it go away without breaking
userspace. I'm not saying that my 4x increase is the correct value, I
don't think any default value is really correct, as with any other
hardcoded limits there are pros and cons. That's really why we give
users the option to change it to the "correct" one via sysctl. All I'm
saying is that 32mb is just too small for default in today's systems,
and increasing it is just making a bad situation a tiny bit better.
Thanks,
Davidlohr
On Tue, Apr 1, 2014 at 1:01 PM, Davidlohr Bueso <[email protected]> wrote:
> On Mon, 2014-03-31 at 17:05 -0700, Andrew Morton wrote:
>> On Mon, 31 Mar 2014 16:25:32 -0700 Davidlohr Bueso <[email protected]> wrote:
>>
>> > On Mon, 2014-03-31 at 16:13 -0700, Andrew Morton wrote:
>> > > On Mon, 31 Mar 2014 15:59:33 -0700 Davidlohr Bueso <[email protected]> wrote:
>> > >
>> > > > >
>> > > > > - Shouldn't there be a way to alter this namespace's shm_ctlmax?
>> > > >
>> > > > Unfortunately this would also add the complexity I previously mentioned.
>> > >
>> > > But if the current namespace's shm_ctlmax is too small, you're screwed.
>> > > Have to shut down the namespace all the way back to init_ns and start
>> > > again.
>> > >
>> > > > > - What happens if we just nuke the limit altogether and fall back to
>> > > > > the next check, which presumably is the rlimit bounds?
>> > > >
>> > > > afaik we only have rlimit for msgqueues. But in any case, while I like
>> > > > that simplicity, it's too late. Too many workloads (specially DBs) rely
>> > > > heavily on shmmax. Removing it and relying on something else would thus
>> > > > cause a lot of things to break.
>> > >
>> > > It would permit larger shm segments - how could that break things? It
>> > > would make most or all of these issues go away?
>> > >
>> >
>> > So sysadmins wouldn't be very happy, per man shmget(2):
>> >
>> > EINVAL A new segment was to be created and size < SHMMIN or size >
>> > SHMMAX, or no new segment was to be created, a segment with given key
>> > existed, but size is greater than the size of that segment.
>>
>> So their system will act as if they had set SHMMAX=enormous. What
>> problems could that cause?
>
> So, just like any sysctl configurable, only privileged users can change
> this value. If we remove this option, users can theoretically create
> huge segments, thus ignoring any custom limit previously set. This is
> what I fear. Think of it kind of like mlock's rlimit. And for that
> matter, why does sysctl exist at all, the same would go for the rest of
> the limits.
Hmm. It's hard to agree. AFAIK 32MB was just borrowed from other Unixes
and it doesn't reflect any Linux internals. Look, a non-privileged user
can use unlimited memory, at least on Linux. So I don't see any
difference between regular anon memory and shmem.

So, I personally like 0 bytes as the default.
On Tue, 2014-04-01 at 14:10 -0400, KOSAKI Motohiro wrote:
> On Tue, Apr 1, 2014 at 1:01 PM, Davidlohr Bueso <[email protected]> wrote:
> > On Mon, 2014-03-31 at 17:05 -0700, Andrew Morton wrote:
> >> On Mon, 31 Mar 2014 16:25:32 -0700 Davidlohr Bueso <[email protected]> wrote:
> >>
> >> > On Mon, 2014-03-31 at 16:13 -0700, Andrew Morton wrote:
> >> > > On Mon, 31 Mar 2014 15:59:33 -0700 Davidlohr Bueso <[email protected]> wrote:
> >> > >
> >> > > > >
> >> > > > > - Shouldn't there be a way to alter this namespace's shm_ctlmax?
> >> > > >
> >> > > > Unfortunately this would also add the complexity I previously mentioned.
> >> > >
> >> > > But if the current namespace's shm_ctlmax is too small, you're screwed.
> >> > > Have to shut down the namespace all the way back to init_ns and start
> >> > > again.
> >> > >
> >> > > > > - What happens if we just nuke the limit altogether and fall back to
> >> > > > > the next check, which presumably is the rlimit bounds?
> >> > > >
> >> > > > afaik we only have rlimit for msgqueues. But in any case, while I like
> >> > > > that simplicity, it's too late. Too many workloads (specially DBs) rely
> >> > > > heavily on shmmax. Removing it and relying on something else would thus
> >> > > > cause a lot of things to break.
> >> > >
> >> > > It would permit larger shm segments - how could that break things? It
> >> > > would make most or all of these issues go away?
> >> > >
> >> >
> >> > So sysadmins wouldn't be very happy, per man shmget(2):
> >> >
> >> > EINVAL A new segment was to be created and size < SHMMIN or size >
> >> > SHMMAX, or no new segment was to be created, a segment with given key
> >> > existed, but size is greater than the size of that segment.
> >>
> >> So their system will act as if they had set SHMMAX=enormous. What
> >> problems could that cause?
> >
> > So, just like any sysctl configurable, only privileged users can change
> > this value. If we remove this option, users can theoretically create
> > huge segments, thus ignoring any custom limit previously set. This is
> > what I fear. Think of it kind of like mlock's rlimit. And for that
> > matter, why does sysctl exist at all, the same would go for the rest of
> > the limits.
>
> Hmm. It's hard to agree. AFAIK 32MB is just borrowed from other Unix
> and it doesn't respect any Linux internals.
Agreed, it's stupid, but it's what Linux chose to use since forever.
> Look, non privileged user
> can user unlimited memory, at least on linux. So I don't find out any
> difference between regular anon and shmem.
Fine, let's try it, if users complain we can revert.
>
> So, I personally like 0 byte per default.
If by this you mean 0 bytes == unlimited, then I agree. It's less harsh
than removing it entirely. So instead of removing the limit we can just
set it to 0 by default, and in newseg(), if shm_ctlmax == 0, we don't
return EINVAL no matter how large the passed size is; otherwise, if the
user _explicitly_ set it via sysctl, we respect that. Andrew, do you
agree with this? If so I'll send a patch.
Thanks,
Davidlohr
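
A minimal sketch of the size check proposed above, assuming that a
limit of 0 is treated as "unlimited". The helper name shm_size_check()
is hypothetical and only stands in for the relevant test in newseg();
it is not the actual kernel code. SHMMIN is assumed to be 1.

```c
#include <errno.h>
#include <stddef.h>

#define SHMMIN 1UL /* assumed minimum segment size in bytes */

/*
 * Hypothetical stand-in for the size test in newseg().
 * ctlmax == 0 means "no upper limit"; any other value is treated as
 * an explicit limit, for example one set by an admin through sysctl.
 * Returns 0 if the size is acceptable, -EINVAL otherwise.
 */
int shm_size_check(size_t size, size_t ctlmax)
{
    if (size < SHMMIN)
        return -EINVAL;
    if (ctlmax != 0 && size > ctlmax)
        return -EINVAL;
    return 0;
}
```

With this shape, shm_size_check(1UL << 30, 0) succeeds, while
shm_size_check(64UL << 20, 32UL << 20) still fails with -EINVAL, so an
explicitly configured limit keeps its current meaning.
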
On Tue, 01 Apr 2014 10:01:39 -0700 Davidlohr Bueso <[email protected]> wrote:
> > > EINVAL A new segment was to be created and size < SHMMIN or size >
> > > SHMMAX, or no new segment was to be created, a segment with given key
> > > existed, but size is greater than the size of that segment.
> >
> > So their system will act as if they had set SHMMAX=enormous. What
> > problems could that cause?
>
> So, just like any sysctl configurable, only privileged users can change
> this value. If we remove this option, users can theoretically create
> huge segments, thus ignoring any custom limit previously set. This is
> what I fear.
What's wrong with that? What are we actually protecting the system
from? tmpfs exhaustion?
> Think of it kind of like mlock's rlimit. And for that
> matter, why does sysctl exist at all, the same would go for the rest of
> the limits.
These things exist to protect the system from intentional or accidental
service denials. What are the service denials in this case?
> > Look. The 32M thing is causing problems. Arbitrarily increasing the
> > arbitrary 32M to an arbitrary 128M won't fix anything - we still have
> > the problem. Think bigger, please: how can we make this problem go
> > away for ever?
>
> That's the thing, I don't think we can make it go away without breaking
> userspace.
Still waiting for details!
> I'm not saying that my 4x increase is the correct value, I
> don't think any default value is really correct, as with any other
> hardcoded limits there are pros and cons. That's really why we give
> users the option to change it to the "correct" one via sysctl. All I'm
> saying is that 32mb is just too small for default in today's systems,
> and increasing it is just making a bad situation a tiny bit better.
Let's understand what's preventing us from making it a great deal better.
On Tue, Apr 1, 2014 at 2:31 PM, Davidlohr Bueso <[email protected]> wrote:
> On Tue, 2014-04-01 at 14:10 -0400, KOSAKI Motohiro wrote:
>> On Tue, Apr 1, 2014 at 1:01 PM, Davidlohr Bueso <[email protected]> wrote:
>> > On Mon, 2014-03-31 at 17:05 -0700, Andrew Morton wrote:
>> >> On Mon, 31 Mar 2014 16:25:32 -0700 Davidlohr Bueso <[email protected]> wrote:
>> >>
>> >> > On Mon, 2014-03-31 at 16:13 -0700, Andrew Morton wrote:
>> >> > > On Mon, 31 Mar 2014 15:59:33 -0700 Davidlohr Bueso <[email protected]> wrote:
>> >> > >
>> >> > > > >
>> >> > > > > - Shouldn't there be a way to alter this namespace's shm_ctlmax?
>> >> > > >
>> >> > > > Unfortunately this would also add the complexity I previously mentioned.
>> >> > >
>> >> > > But if the current namespace's shm_ctlmax is too small, you're screwed.
>> >> > > Have to shut down the namespace all the way back to init_ns and start
>> >> > > again.
>> >> > >
>> >> > > > > - What happens if we just nuke the limit altogether and fall back to
>> >> > > > > the next check, which presumably is the rlimit bounds?
>> >> > > >
>> >> > > > afaik we only have rlimit for msgqueues. But in any case, while I like
>> >> > > > that simplicity, it's too late. Too many workloads (specially DBs) rely
>> >> > > > heavily on shmmax. Removing it and relying on something else would thus
>> >> > > > cause a lot of things to break.
>> >> > >
>> >> > > It would permit larger shm segments - how could that break things? It
>> >> > > would make most or all of these issues go away?
>> >> > >
>> >> >
>> >> > So sysadmins wouldn't be very happy, per man shmget(2):
>> >> >
>> >> > EINVAL A new segment was to be created and size < SHMMIN or size >
>> >> > SHMMAX, or no new segment was to be created, a segment with given key
>> >> > existed, but size is greater than the size of that segment.
>> >>
>> >> So their system will act as if they had set SHMMAX=enormous. What
>> >> problems could that cause?
>> >
>> > So, just like any sysctl configurable, only privileged users can change
>> > this value. If we remove this option, users can theoretically create
>> > huge segments, thus ignoring any custom limit previously set. This is
>> > what I fear. Think of it kind of like mlock's rlimit. And for that
>> > matter, why does sysctl exist at all, the same would go for the rest of
>> > the limits.
>>
>> Hmm. It's hard to agree. AFAIK 32MB is just borrowed from other Unix
>> and it doesn't respect any Linux internals.
>
> Agreed, it's stupid, but it's what Linux chose to use since forever.
>
>> Look, non privileged user
>> can user unlimited memory, at least on linux. So I don't find out any
>> difference between regular anon and shmem.
>
> Fine, let's try it, if users complain we can revert.
>
>>
>> So, I personally like 0 byte per default.
>
> If by this you mean 0 bytes == unlimited, then I agree. It's less harsh
> then removing it entirely. So instead of removing the limit we can just
> set it by default to 0, and in newseg() if shm_ctlmax == 0 then we don't
> return EINVAL if the passed size is great (obviously), otherwise, if the
> user _explicitly_ set it via sysctl then we respect that. Andrew, do you
> agree with this? If so I'll send a patch.
Yes, my 0 bytes means unlimited. I totally agree we shouldn't remove the knob
entirely.
On Tue, 2014-04-01 at 15:51 -0400, KOSAKI Motohiro wrote:
> >> So, I personally like 0 byte per default.
> >
> > If by this you mean 0 bytes == unlimited, then I agree. It's less harsh
> > then removing it entirely. So instead of removing the limit we can just
> > set it by default to 0, and in newseg() if shm_ctlmax == 0 then we don't
> > return EINVAL if the passed size is great (obviously), otherwise, if the
> > user _explicitly_ set it via sysctl then we respect that. Andrew, do you
> > agree with this? If so I'll send a patch.
>
> Yes, my 0 bytes mean unlimited. I totally agree we shouldn't remove the knob
> entirely.
Hmmm so 0 won't really work because it could be weirdly used to disable
shm altogether... we cannot go to some negative value either since we're
dealing with an unsigned type, and cutting the range in half could also hurt
users that set the limit above that. So I was thinking of simply setting
SHMMAX to ULONG_MAX and being done with it. Users can then set it manually
if they want a smaller value.
Makes sense?
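
A small sketch of the ULONG_MAX alternative, under the assumption that
the limit is stored as an unsigned long. The names SHMMAX_UNLIMITED and
shm_size_check_max() are hypothetical. With the default at ULONG_MAX
the existing comparison needs no special case, because no size can
exceed it.

```c
#include <errno.h>
#include <limits.h>

#define SHMMIN 1UL                 /* assumed minimum segment size */
#define SHMMAX_UNLIMITED ULONG_MAX /* hypothetical default value */

/*
 * Plain range check with no sentinel: a size is valid when
 * SHMMIN <= size <= ctlmax. With ctlmax == ULONG_MAX the upper bound
 * can never be exceeded, so the check behaves as "unlimited".
 */
int shm_size_check_max(unsigned long size, unsigned long ctlmax)
{
    if (size < SHMMIN || size > ctlmax)
        return -EINVAL;
    return 0;
}
```

A "negative" sentinel is not a separate option for an unsigned type:
(unsigned long)-1 is the same value as ULONG_MAX.
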
On Tue, Apr 1, 2014 at 5:01 PM, Davidlohr Bueso <[email protected]> wrote:
> On Tue, 2014-04-01 at 15:51 -0400, KOSAKI Motohiro wrote:
>> >> So, I personally like 0 byte per default.
>> >
>> > If by this you mean 0 bytes == unlimited, then I agree. It's less harsh
>> > then removing it entirely. So instead of removing the limit we can just
>> > set it by default to 0, and in newseg() if shm_ctlmax == 0 then we don't
>> > return EINVAL if the passed size is great (obviously), otherwise, if the
>> > user _explicitly_ set it via sysctl then we respect that. Andrew, do you
>> > agree with this? If so I'll send a patch.
>>
>> Yes, my 0 bytes mean unlimited. I totally agree we shouldn't remove the knob
>> entirely.
>
> Hmmm so 0 won't really work because it could be weirdly used to disable
> shm altogether... we cannot go to some negative value either since we're
> dealing with unsigned, and cutting the range in half could also hurt
> users that set the limit above that. So I was thinking of simply setting
> SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> if they want a smaller value.
>
> Makes sense?
I don't think people use 0 for disabling, but ULONG_MAX makes sense to me too.
On Tue, 1 Apr 2014 17:12:50 -0400 KOSAKI Motohiro <[email protected]> wrote:
> On Tue, Apr 1, 2014 at 5:01 PM, Davidlohr Bueso <[email protected]> wrote:
> > On Tue, 2014-04-01 at 15:51 -0400, KOSAKI Motohiro wrote:
> >> >> So, I personally like 0 byte per default.
> >> >
> >> > If by this you mean 0 bytes == unlimited, then I agree. It's less harsh
> >> > then removing it entirely. So instead of removing the limit we can just
> >> > set it by default to 0, and in newseg() if shm_ctlmax == 0 then we don't
> >> > return EINVAL if the passed size is great (obviously), otherwise, if the
> >> > user _explicitly_ set it via sysctl then we respect that. Andrew, do you
> >> > agree with this? If so I'll send a patch.
> >>
> >> Yes, my 0 bytes mean unlimited. I totally agree we shouldn't remove the knob
> >> entirely.
> >
> > Hmmm so 0 won't really work because it could be weirdly used to disable
> > shm altogether... we cannot go to some negative value either since we're
> > dealing with unsigned, and cutting the range in half could also hurt
> > users that set the limit above that. So I was thinking of simply setting
> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> > if they want a smaller value.
> >
> > Makes sense?
>
> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
Distros could have set it to [U]LONG_MAX in initscripts ten years ago
- less phone calls, happier customers. And they could do so today.
But they haven't. What are the risks of doing this?
>> > Hmmm so 0 won't really work because it could be weirdly used to disable
>> > shm altogether... we cannot go to some negative value either since we're
>> > dealing with unsigned, and cutting the range in half could also hurt
>> > users that set the limit above that. So I was thinking of simply setting
>> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
>> > if they want a smaller value.
>> >
>> > Makes sense?
>>
>> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
>
> Distros could have set it to [U]LONG_MAX in initscripts ten years ago
> - less phone calls, happier customers. And they could do so today.
>
> But they haven't. What are the risks of doing this?
I have no idea really. But at least I'm sure the current default is much worse.
1. Solaris changed the default to total-memory/4 as of Solaris 10, for DBs.
http://www.postgresql.org/docs/9.1/static/kernel-resources.html
2. RHEL changed the default to a very big size as of RHEL5 (now it is
64GB), even though many boxes didn't have 64GB of memory at that time.
On Tue, 2014-04-01 at 17:12 -0400, KOSAKI Motohiro wrote:
> On Tue, Apr 1, 2014 at 5:01 PM, Davidlohr Bueso <[email protected]> wrote:
> > On Tue, 2014-04-01 at 15:51 -0400, KOSAKI Motohiro wrote:
> >> >> So, I personally like 0 byte per default.
> >> >
> >> > If by this you mean 0 bytes == unlimited, then I agree. It's less harsh
> >> > then removing it entirely. So instead of removing the limit we can just
> >> > set it by default to 0, and in newseg() if shm_ctlmax == 0 then we don't
> >> > return EINVAL if the passed size is great (obviously), otherwise, if the
> >> > user _explicitly_ set it via sysctl then we respect that. Andrew, do you
> >> > agree with this? If so I'll send a patch.
> >>
> >> Yes, my 0 bytes mean unlimited. I totally agree we shouldn't remove the knob
> >> entirely.
> >
> > Hmmm so 0 won't really work because it could be weirdly used to disable
> > shm altogether... we cannot go to some negative value either since we're
> > dealing with unsigned, and cutting the range in half could also hurt
> > users that set the limit above that. So I was thinking of simply setting
> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> > if they want a smaller value.
> >
> > Makes sense?
>
> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
Yeah, you're right, SHMMIN is 1 and users _cannot_ change it.
On Tue, 1 Apr 2014 17:41:54 -0400 KOSAKI Motohiro <[email protected]> wrote:
> >> > Hmmm so 0 won't really work because it could be weirdly used to disable
> >> > shm altogether... we cannot go to some negative value either since we're
> >> > dealing with unsigned, and cutting the range in half could also hurt
> >> > users that set the limit above that. So I was thinking of simply setting
> >> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> >> > if they want a smaller value.
> >> >
> >> > Makes sense?
> >>
> >> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
> >
> > Distros could have set it to [U]LONG_MAX in initscripts ten years ago
> > - less phone calls, happier customers. And they could do so today.
> >
> > But they haven't. What are the risks of doing this?
>
> I have no idea really. But at least I'm sure current default is much worse.
>
> 1. Solaris changed the default to total-memory/4 since Solaris 10 for DB.
> http://www.postgresql.org/docs/9.1/static/kernel-resources.html
>
> 2. RHEL changed the default to very big size since RHEL5 (now it is
> 64GB). Even tough many box don't have 64GB memory at that time.
Ah-hah, that's interesting info.
Let's make the default 64GB?
Then we can blame RH if something goes wrong ;)
On Tue, 2014-04-01 at 14:48 -0700, Andrew Morton wrote:
> On Tue, 1 Apr 2014 17:41:54 -0400 KOSAKI Motohiro <[email protected]> wrote:
>
> > >> > Hmmm so 0 won't really work because it could be weirdly used to disable
> > >> > shm altogether... we cannot go to some negative value either since we're
> > >> > dealing with unsigned, and cutting the range in half could also hurt
> > >> > users that set the limit above that. So I was thinking of simply setting
> > >> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> > >> > if they want a smaller value.
> > >> >
> > >> > Makes sense?
> > >>
> > >> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
> > >
> > > Distros could have set it to [U]LONG_MAX in initscripts ten years ago
> > > - less phone calls, happier customers. And they could do so today.
> > >
> > > But they haven't. What are the risks of doing this?
> >
> > I have no idea really. But at least I'm sure current default is much worse.
> >
> > 1. Solaris changed the default to total-memory/4 since Solaris 10 for DB.
> > http://www.postgresql.org/docs/9.1/static/kernel-resources.html
> >
> > 2. RHEL changed the default to very big size since RHEL5 (now it is
> > 64GB). Even tough many box don't have 64GB memory at that time.
>
> Ah-hah, that's interesting info.
>
> Let's make the default 64GB?
But again, yet another arbitrary value...
On Tue, 01 Apr 2014 15:02:31 -0700 Davidlohr Bueso <[email protected]> wrote:
> On Tue, 2014-04-01 at 14:48 -0700, Andrew Morton wrote:
> > On Tue, 1 Apr 2014 17:41:54 -0400 KOSAKI Motohiro <[email protected]> wrote:
> >
> > > >> > Hmmm so 0 won't really work because it could be weirdly used to disable
> > > >> > shm altogether... we cannot go to some negative value either since we're
> > > >> > dealing with unsigned, and cutting the range in half could also hurt
> > > >> > users that set the limit above that. So I was thinking of simply setting
> > > >> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> > > >> > if they want a smaller value.
> > > >> >
> > > >> > Makes sense?
> > > >>
> > > >> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
> > > >
> > > > Distros could have set it to [U]LONG_MAX in initscripts ten years ago
> > > > - less phone calls, happier customers. And they could do so today.
> > > >
> > > > But they haven't. What are the risks of doing this?
> > >
> > > I have no idea really. But at least I'm sure current default is much worse.
> > >
> > > 1. Solaris changed the default to total-memory/4 since Solaris 10 for DB.
> > > http://www.postgresql.org/docs/9.1/static/kernel-resources.html
> > >
> > > 2. RHEL changed the default to very big size since RHEL5 (now it is
> > > 64GB). Even tough many box don't have 64GB memory at that time.
> >
> > Ah-hah, that's interesting info.
> >
> > Let's make the default 64GB?
>
> But again, yet another arbitrary value...
Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
timeframe, but infinity has since become larger so pickanumber.
On Tue, Apr 1, 2014 at 5:48 PM, Andrew Morton <[email protected]> wrote:
> On Tue, 1 Apr 2014 17:41:54 -0400 KOSAKI Motohiro <[email protected]> wrote:
>
>> >> > Hmmm so 0 won't really work because it could be weirdly used to disable
>> >> > shm altogether... we cannot go to some negative value either since we're
>> >> > dealing with unsigned, and cutting the range in half could also hurt
>> >> > users that set the limit above that. So I was thinking of simply setting
>> >> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
>> >> > if they want a smaller value.
>> >> >
>> >> > Makes sense?
>> >>
>> >> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
>> >
>> > Distros could have set it to [U]LONG_MAX in initscripts ten years ago
>> > - less phone calls, happier customers. And they could do so today.
>> >
>> > But they haven't. What are the risks of doing this?
>>
>> I have no idea really. But at least I'm sure current default is much worse.
>>
>> 1. Solaris changed the default to total-memory/4 since Solaris 10 for DB.
>> http://www.postgresql.org/docs/9.1/static/kernel-resources.html
>>
>> 2. RHEL changed the default to very big size since RHEL5 (now it is
>> 64GB). Even tough many box don't have 64GB memory at that time.
>
> Ah-hah, that's interesting info.
>
> Let's make the default 64GB?
64GB was infinity at that time, but it is no longer near infinity today. I'd
like a very large number, or one proportional to total memory.
But I'm open. Please let me know if anyone knows of a disadvantage of a
very large value.
On Tue, 2014-04-01 at 18:49 -0400, KOSAKI Motohiro wrote:
> On Tue, Apr 1, 2014 at 5:48 PM, Andrew Morton <[email protected]> wrote:
> > On Tue, 1 Apr 2014 17:41:54 -0400 KOSAKI Motohiro <[email protected]> wrote:
> >
> >> >> > Hmmm so 0 won't really work because it could be weirdly used to disable
> >> >> > shm altogether... we cannot go to some negative value either since we're
> >> >> > dealing with unsigned, and cutting the range in half could also hurt
> >> >> > users that set the limit above that. So I was thinking of simply setting
> >> >> > SHMMAX to ULONG_MAX and be done with it. Users can then set it manually
> >> >> > if they want a smaller value.
> >> >> >
> >> >> > Makes sense?
> >> >>
> >> >> I don't think people use 0 for disabling. but ULONG_MAX make sense to me too.
> >> >
> >> > Distros could have set it to [U]LONG_MAX in initscripts ten years ago
> >> > - less phone calls, happier customers. And they could do so today.
> >> >
> >> > But they haven't. What are the risks of doing this?
> >>
> >> I have no idea really. But at least I'm sure current default is much worse.
> >>
> >> 1. Solaris changed the default to total-memory/4 since Solaris 10 for DB.
> >> http://www.postgresql.org/docs/9.1/static/kernel-resources.html
> >>
> >> 2. RHEL changed the default to very big size since RHEL5 (now it is
> >> 64GB). Even tough many box don't have 64GB memory at that time.
> >
> > Ah-hah, that's interesting info.
> >
> > Let's make the default 64GB?
>
> 64GB is infinity at that time, but it no longer near infinity today. I like
> very large or total memory proportional number.
So I still like 0 for unlimited. Nice, clean and much easier to look at
than ULONG_MAX. And since we cannot disable shm through SHMMIN, I really
don't see any disadvantages, as opposed to some other arbitrary value.
Furthermore it wouldn't break userspace: any existing sysctl would
continue to work, and if not set, the user never has to worry about this
tunable again.
Please let me know if you all agree with this...
>> > Ah-hah, that's interesting info.
>> >
>> > Let's make the default 64GB?
>>
>> 64GB is infinity at that time, but it no longer near infinity today. I like
>> very large or total memory proportional number.
>
> So I still like 0 for unlimited. Nice, clean and much easier to look at
> than ULONG_MAX. And since we cannot disable shm through SHMMIN, I really
> don't see any disadvantages, as opposed to some other arbitrary value.
> Furthermore it wouldn't break userspace: any existing sysctl would
> continue to work, and if not set, the user never has to worry about this
> tunable again.
>
> Please let me know if you all agree with this...
Sure thing. Why not. :)
On Tue, 2014-04-01 at 19:56 -0400, KOSAKI Motohiro wrote:
> >> > Ah-hah, that's interesting info.
> >> >
> >> > Let's make the default 64GB?
> >>
> >> 64GB is infinity at that time, but it no longer near infinity today. I like
> >> very large or total memory proportional number.
> >
> > So I still like 0 for unlimited. Nice, clean and much easier to look at
> > than ULONG_MAX. And since we cannot disable shm through SHMMIN, I really
> > don't see any disadvantages, as opposed to some other arbitrary value.
> > Furthermore it wouldn't break userspace: any existing sysctl would
> > continue to work, and if not set, the user never has to worry about this
> > tunable again.
> >
> > Please let me know if you all agree with this...
>
> Surething. Why not. :)
*sigh* actually, the plot thickens a bit with SHMALL (total size of shm
segments system wide, in pages). Currently by default:
#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
This deals with physical memory; at least, admins are advised to set
it to some large percentage of RAM / pagesize. So I think that if we
lose control over the default value, users can potentially DoS the
system, or at least cause excessive swapping if it is not manually set.
But then again the same goes for anon mem... so do we care?
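
To make the SHMALL arithmetic concrete, here is a small sketch that
evaluates the formula above. It assumes 4 KiB pages and SHMMNI == 4096;
neither value is stated in this thread. The function name
shmall_default_pages() is hypothetical.

```c
/* Assumed values for illustration only. */
#define SHMMAX_32M 0x2000000ULL /* the 32MB default discussed here */
#define PAGE_4K    4096ULL      /* assumed page size */
#define SHMMNI_DEF 4096ULL      /* assumed max number of segments */

/*
 * Mirrors SHMMAX/getpagesize()*(SHMMNI/16) and returns the default
 * system-wide limit in pages.
 */
unsigned long long shmall_default_pages(unsigned long long shmmax,
                                        unsigned long long page_size,
                                        unsigned long long shmmni)
{
    return shmmax / page_size * (shmmni / 16);
}
```

Under these assumptions, shmall_default_pages(SHMMAX_32M, PAGE_4K,
SHMMNI_DEF) gives 8192 * 256 = 2097152 pages, or 8 GiB in total. If
SHMMAX were 0, the same formula would evaluate to 0 pages, so a derived
SHMALL default would need its own handling.
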
On Tue, Apr 01 2014, Davidlohr Bueso <[email protected]> wrote:
> On Tue, 2014-04-01 at 19:56 -0400, KOSAKI Motohiro wrote:
>> >> > Ah-hah, that's interesting info.
>> >> >
>> >> > Let's make the default 64GB?
>> >>
>> >> 64GB is infinity at that time, but it no longer near infinity today. I like
>> >> very large or total memory proportional number.
>> >
>> > So I still like 0 for unlimited. Nice, clean and much easier to look at
>> > than ULONG_MAX. And since we cannot disable shm through SHMMIN, I really
>> > don't see any disadvantages, as opposed to some other arbitrary value.
>> > Furthermore it wouldn't break userspace: any existing sysctl would
>> > continue to work, and if not set, the user never has to worry about this
>> > tunable again.
>> >
>> > Please let me know if you all agree with this...
>>
>> Surething. Why not. :)
>
> *sigh* actually, the plot thickens a bit with SHMALL (total size of shm
> segments system wide, in pages). Currently by default:
>
> #define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
>
> This deals with physical memory, at least admins are recommended to set
> it to some large percentage of ram / pagesize. So I think that if we
> loose control over the default value, users can potentially DoS the
> system, or at least cause excessive swapping if not manually set, but
> then again the same goes for anon mem... so do we care?
At least when there's an egregious anon leak the oom killer has the
power to free the memory by killing until the memory is unreferenced.
This isn't true for shm or tmpfs. So shm is more effective than anon at
crushing a machine.
(2014/04/02 10:08), Greg Thelen wrote:
>
> On Tue, Apr 01 2014, Davidlohr Bueso <[email protected]> wrote:
>
>> On Tue, 2014-04-01 at 19:56 -0400, KOSAKI Motohiro wrote:
>>>>>> Ah-hah, that's interesting info.
>>>>>>
>>>>>> Let's make the default 64GB?
>>>>>
>>>>> 64GB is infinity at that time, but it no longer near infinity today. I like
>>>>> very large or total memory proportional number.
>>>>
>>>> So I still like 0 for unlimited. Nice, clean and much easier to look at
>>>> than ULONG_MAX. And since we cannot disable shm through SHMMIN, I really
>>>> don't see any disadvantages, as opposed to some other arbitrary value.
>>>> Furthermore it wouldn't break userspace: any existing sysctl would
>>>> continue to work, and if not set, the user never has to worry about this
>>>> tunable again.
>>>>
>>>> Please let me know if you all agree with this...
>>>
>>> Surething. Why not. :)
>>
>> *sigh* actually, the plot thickens a bit with SHMALL (total size of shm
>> segments system wide, in pages). Currently by default:
>>
>> #define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
>>
>> This deals with physical memory, at least admins are recommended to set
>> it to some large percentage of ram / pagesize. So I think that if we
>> lose control over the default value, users can potentially DoS the
>> system, or at least cause excessive swapping if not manually set, but
>> then again the same goes for anon mem... so do we care?
>
> At least when there's an egregious anon leak the oom killer has the
> power to free the memory by killing until the memory is unreferenced.
> This isn't true for shm or tmpfs. So shm is more effective than anon at
> crushing a machine.
Hm.. wouldn't the kernel.shm_rmid_forced sysctl work together with the oom-killer?
http://www.openwall.com/lists/kernel-hardening/2011/07/26/7
I'd like to handle this kind of issue under memcg, but hmm.. tmpfs's limit is half
of memory by default.
Thanks,
-Kame
On Tue, Apr 01 2014, Kamezawa Hiroyuki <[email protected]> wrote:
>> On Tue, Apr 01 2014, Davidlohr Bueso <[email protected]> wrote:
>>
>>> On Tue, 2014-04-01 at 19:56 -0400, KOSAKI Motohiro wrote:
>>>>>>> Ah-hah, that's interesting info.
>>>>>>>
>>>>>>> Let's make the default 64GB?
>>>>>>
>>>>>> 64GB is infinity at that time, but it no longer near infinity today. I like
>>>>>> very large or total memory proportional number.
>>>>>
>>>>> So I still like 0 for unlimited. Nice, clean and much easier to look at
>>>>> than ULONG_MAX. And since we cannot disable shm through SHMMIN, I really
>>>>> don't see any disadvantages, as opposed to some other arbitrary value.
>>>>> Furthermore it wouldn't break userspace: any existing sysctl would
>>>>> continue to work, and if not set, the user never has to worry about this
>>>>> tunable again.
>>>>>
>>>>> Please let me know if you all agree with this...
>>>>
>>>> Surething. Why not. :)
>>>
>>> *sigh* actually, the plot thickens a bit with SHMALL (total size of shm
>>> segments system wide, in pages). Currently by default:
>>>
>>> #define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
>>>
>>> This deals with physical memory, at least admins are recommended to set
>>> it to some large percentage of ram / pagesize. So I think that if we
>>> lose control over the default value, users can potentially DoS the
>>> system, or at least cause excessive swapping if not manually set, but
>>> then again the same goes for anon mem... so do we care?
>>
> (2014/04/02 10:08), Greg Thelen wrote:
>>
>> At least when there's an egregious anon leak the oom killer has the
>> power to free the memory by killing until the memory is unreferenced.
>> This isn't true for shm or tmpfs. So shm is more effective than anon at
>> crushing a machine.
>
> Hm..sysctl.kernel.shm_rmid_forced won't work with oom-killer ?
>
> http://www.openwall.com/lists/kernel-hardening/2011/07/26/7
>
> I like to handle this kind of issue under memcg but hmm..tmpfs's limit is half
> of memory at default.
Ah, yes. I forgot about shm_rmid_forced. Thanks. It would give the oom
killer the ability to clean up shm (as it does with anon) when
shm_rmid_forced=1.
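For anyone wanting to check how a given box is configured, here is a minimal
userspace sketch that reads the tunable through procfs. The helper names
(read_long_from_file, shm_rmid_forced_enabled) are made up for illustration;
the only thing taken from the kernel is the /proc/sys/kernel/shm_rmid_forced
path, which is absent on kernels that predate the sysctl.

```c
#include <stdio.h>

/* Read a single integer from a procfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * *value is left untouched when the file cannot be opened. */
int read_long_from_file(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Returns 1 if shm_rmid_forced is enabled, 0 if it is disabled,
 * and -1 if the sysctl file cannot be read. */
int shm_rmid_forced_enabled(void)
{
	long v;

	if (read_long_from_file("/proc/sys/kernel/shm_rmid_forced", &v) < 0)
		return -1;
	return v != 0;
}
```

A caller would treat -1 as "unknown" rather than as "disabled", since the
file may simply not exist.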
The default size for shmmax is, and always has been, 32MB.
Today, in the XXI century, it seems that this value is rather small,
making users have to increase it via sysctl, which can cause
unnecessary work and userspace application workarounds[1].

Instead of choosing yet another arbitrary value, larger than 32MB,
this patch disables the use of both shmmax and shmall by default,
allowing users to create segments of unlimited sizes. Users and
applications that already explicitly set these values through sysctl
are left untouched, so their behavior does not change.

So a value of 0 bytes or pages, for shmmax and shmall, respectively,
implies unlimited memory, as opposed to disabling sysv shared memory.
This is safe as 0 could not possibly have been used previously, since
SHMMIN is hardcoded to 1 and cannot be modified.

This change allows Linux to treat shm just as regular anonymous memory.
One important difference between them, though, is handling out-of-memory
conditions: as opposed to regular anon memory, the OOM killer will not
kill processes that are hogging memory through shm, allowing users to
potentially abuse this. To overcome this situation, the shm_rmid_forced
option must be enabled.

Running this patch through LTP, everything passes, except the following,
which, due to the nature of this change, is quite expected:

   shmget02 1 TFAIL : call succeeded unexpectedly

[1]: http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html

Signed-off-by: Davidlohr Bueso <[email protected]>
---
 include/linux/shm.h      | 2 +-
 include/uapi/linux/shm.h | 8 ++++----
 ipc/shm.c                | 6 ++++--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/shm.h b/include/linux/shm.h
index 1e2cd2e..0ca06a3 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -4,7 +4,7 @@
 #include <asm/page.h>
 #include <uapi/linux/shm.h>

-#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
+#define SHMALL 0 /* max shm system wide (pages) */
 #include <asm/shmparam.h>
 struct shmid_kernel /* private to the kernel */
 {
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 78b6941..5f0ef28 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -9,14 +9,14 @@

 /*
  * SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
- * be increased by sysctl
+ * be increased by sysctl. By default, disable SHMMAX and SHMALL with
+ * 0 bytes, thus allowing processes to have unlimited shared memory.
  */
-
-#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
+#define SHMMAX 0 /* max shared seg size (bytes) */
 #define SHMMIN 1 /* min shared seg size (bytes) */
 #define SHMMNI 4096 /* max num of segs system wide */
 #ifndef __KERNEL__
-#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
+#define SHMALL 0
 #endif
 #define SHMSEG SHMMNI /* max shared segs per process */

diff --git a/ipc/shm.c b/ipc/shm.c
index 7645961..ae01ffa 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	int id;
 	vm_flags_t acctflag = 0;

-	if (size < SHMMIN || size > ns->shm_ctlmax)
+	if (ns->shm_ctlmax &&
+	    (size < SHMMIN || size > ns->shm_ctlmax))
 		return -EINVAL;

-	if (ns->shm_tot + numpages > ns->shm_ctlall)
+	if (ns->shm_ctlall &&
+	    ns->shm_tot + numpages > ns->shm_ctlall)
 		return -ENOSPC;

 	shp = ipc_rcu_alloc(sizeof(*shp));
--
1.8.1.4
(2014/04/03 9:20), Davidlohr Bueso wrote:
> The default size for shmmax is, and always has been, 32Mb.
> Today, in the XXI century, it seems that this value is rather small,
> making users have to increase it via sysctl, which can cause
> unnecessary work and userspace application workarounds[1].
>
> Instead of choosing yet another arbitrary value, larger than 32Mb,
> this patch disables the use of both shmmax and shmall by default,
> allowing users to create segments of unlimited sizes. Users and
> applications that already explicitly set these values through sysctl
> are left untouched, and thus does not change any of the behavior.
>
> So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> implies unlimited memory, as opposed to disabling sysv shared memory.
> This is safe as 0 cannot possibly be used previously as SHMMIN is
> hardcoded to 1 and cannot be modified.
>
> This change allows Linux to treat shm just as regular anonymous memory.
> One important difference between them, though, is handling out-of-memory
> conditions: as opposed to regular anon memory, the OOM killer will not
> kill processes that are hogging memory through shm, allowing users to
> potentially abuse this. To overcome this situation, the shm_rmid_forced
> option must be enabled.
>
> Running this patch through LTP, everything passes, except the following,
> which, due to the nature of this change, is quite expected:
>
> shmget02 1 TFAIL : call succeeded unexpectedly
>
> [1]: http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
looks good to me
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
When this goes mainline, updating a man page of ipcs may be required.
Thanks,
-Kame
> ---
> include/linux/shm.h | 2 +-
> include/uapi/linux/shm.h | 8 ++++----
> ipc/shm.c | 6 ++++--
> 3 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index 1e2cd2e..0ca06a3 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -4,7 +4,7 @@
> #include <asm/page.h>
> #include <uapi/linux/shm.h>
>
> -#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
> +#define SHMALL 0 /* max shm system wide (pages) */
> #include <asm/shmparam.h>
> struct shmid_kernel /* private to the kernel */
> {
> diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
> index 78b6941..5f0ef28 100644
> --- a/include/uapi/linux/shm.h
> +++ b/include/uapi/linux/shm.h
> @@ -9,14 +9,14 @@
>
> /*
> * SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
> - * be increased by sysctl
> + * be increased by sysctl. By default, disable SHMMAX and SHMALL with
> + * 0 bytes, thus allowing processes to have unlimited shared memory.
> */
> -
> -#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
> +#define SHMMAX 0 /* max shared seg size (bytes) */
> #define SHMMIN 1 /* min shared seg size (bytes) */
> #define SHMMNI 4096 /* max num of segs system wide */
> #ifndef __KERNEL__
> -#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
> +#define SHMALL 0
> #endif
> #define SHMSEG SHMMNI /* max shared segs per process */
>
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7645961..ae01ffa 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> int id;
> vm_flags_t acctflag = 0;
>
> - if (size < SHMMIN || size > ns->shm_ctlmax)
> + if (ns->shm_ctlmax &&
> + (size < SHMMIN || size > ns->shm_ctlmax))
> return -EINVAL;
>
> - if (ns->shm_tot + numpages > ns->shm_ctlall)
> + if (ns->shm_ctlall &&
> + ns->shm_tot + numpages > ns->shm_ctlall)
> return -ENOSPC;
>
> shp = ipc_rcu_alloc(sizeof(*shp));
>
Hi Davidlohr,
On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
> The default size for shmmax is, and always has been, 32Mb.
> Today, in the XXI century, it seems that this value is rather small,
> making users have to increase it via sysctl, which can cause
> unnecessary work and userspace application workarounds[1].
>
> Instead of choosing yet another arbitrary value, larger than 32Mb,
> this patch disables the use of both shmmax and shmall by default,
> allowing users to create segments of unlimited sizes. Users and
> applications that already explicitly set these values through sysctl
> are left untouched, and thus does not change any of the behavior.
>
> So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> implies unlimited memory, as opposed to disabling sysv shared memory.
> This is safe as 0 cannot possibly be used previously as SHMMIN is
> hardcoded to 1 and cannot be modified.
Are we sure that no user space apps use shmctl(IPC_INFO) and print a
pretty error message if shmall is too small?
We would break these apps.
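To make the concern concrete, here is a rough sketch of the kind of startup
check such an application might contain. It is an illustration only: the
function names are hypothetical and not taken from any real program. The
shmctl(0, IPC_INFO, ...) call and struct shminfo are the Linux-specific
interface from shmctl(2), and the fallback definition of IPC_INFO assumes
the Linux value 3.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes IPC_INFO only with this defined */
#endif
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef IPC_INFO
#define IPC_INFO 3 /* assumption: Linux value, used if the headers hide it */
#endif

/* Naive check: treats the reported shmmax as a plain upper bound. */
int naive_fits(unsigned long shmmax, unsigned long want)
{
	return want <= shmmax;
}

/* Check that understands "0 means unlimited", as proposed in the patch. */
int zero_aware_fits(unsigned long shmmax, unsigned long want)
{
	return shmmax == 0 || want <= shmmax;
}

/* Query the current limit. Returns 0 on success, -1 on error (errno set). */
int query_shmmax(unsigned long *shmmax)
{
	struct shminfo info;

	if (shmctl(0, IPC_INFO, (struct shmid_ds *)&info) < 0)
		return -1;
	*shmmax = info.shmmax;
	return 0;
}

/* Hypothetical application startup check built on the naive comparison. */
int check_shm_limit(unsigned long want)
{
	unsigned long shmmax;

	if (query_shmmax(&shmmax) < 0) {
		perror("shmctl(IPC_INFO)");
		return -1;
	}
	if (!naive_fits(shmmax, want)) {
		fprintf(stderr, "change shmmax\n");
		return -1;
	}
	return 0;
}
```

With a reported shmmax of 0, naive_fits() rejects every non-zero request,
so an application written this way would refuse to start even though the
kernel would have allowed the segment.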
--
Manfred
On Thu, 2014-04-03 at 21:02 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
>
> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
> > The default size for shmmax is, and always has been, 32Mb.
> > Today, in the XXI century, it seems that this value is rather small,
> > making users have to increase it via sysctl, which can cause
> > unnecessary work and userspace application workarounds[1].
> >
> > Instead of choosing yet another arbitrary value, larger than 32Mb,
> > this patch disables the use of both shmmax and shmall by default,
> > allowing users to create segments of unlimited sizes. Users and
> > applications that already explicitly set these values through sysctl
> > are left untouched, and thus does not change any of the behavior.
> >
> > So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> > implies unlimited memory, as opposed to disabling sysv shared memory.
> > This is safe as 0 cannot possibly be used previously as SHMMIN is
> > hardcoded to 1 and cannot be modified.
> Are we sure that no user space apps uses shmctl(IPC_INFO) and prints a
> pretty error message if shmall is too small?
> We would break these apps.
Good point. 0 bytes/pages would definitely trigger an unexpected error
message if users did this. But on the other hand I'm not sure this
actually is a _real_ scenario, since upon overflow the value can still
end up being 0, which is totally bogus and would cause the same
breakage.
So I see two possible workarounds:
(i) Use ULONG_MAX for the shmmax default instead. This would make shmall
default to 1152921504606846720 and 268435456, for 64 and 32bit systems,
respectively.
(ii) Keep the 0 bytes, but add a new "transition" tunable that, if set
(default off), would allow 0 bytes to be unlimited. With time, users
could hopefully update their applications and we could eventually get
rid of it. This _seems_ to be the less aggressive way to go.
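As a sanity check on the numbers in (i), the old SHMALL formula can be
evaluated in userspace. This is only a sketch: it assumes 4096-byte pages
and uses UINT64_MAX and UINT32_MAX to stand in for ULONG_MAX on 64 and
32bit systems; shmall_default() is a made-up helper name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SHMMNI 4096u /* max num of segs system wide, as in the patch context */

/* Old default: SHMMAX / PAGE_SIZE * (SHMMNI / 16), result in pages.
 * The division happens first, so the quotient is truncated. */
uint64_t shmall_default(uint64_t shmmax, uint64_t page_size)
{
	return shmmax / page_size * (SHMMNI / 16);
}

/* Print the resulting defaults for the three SHMMAX values of interest. */
void print_shmall_defaults(void)
{
	const uint64_t page_size = 4096; /* assumption: 4 KiB pages */

	printf("SHMMAX=32MB       -> %" PRIu64 " pages\n",
	       shmall_default(0x2000000, page_size));
	printf("SHMMAX=UINT64_MAX -> %" PRIu64 " pages\n",
	       shmall_default(UINT64_MAX, page_size));
	printf("SHMMAX=UINT32_MAX -> %" PRIu64 " pages\n",
	       shmall_default(UINT32_MAX, page_size));
}
```

Under these assumptions the 64bit case gives 1152921504606846720 as stated.
The 32bit case comes out as 268435200, slightly below the 268435456 (2^28)
quoted above, because ULONG_MAX / PAGE_SIZE truncates to 1048575 before the
multiplication by 256.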
Thoughts?
Thanks,
Davidlohr
On Wed, Apr 2, 2014 at 8:20 PM, Davidlohr Bueso <[email protected]> wrote:
> The default size for shmmax is, and always has been, 32Mb.
> Today, in the XXI century, it seems that this value is rather small,
> making users have to increase it via sysctl, which can cause
> unnecessary work and userspace application workarounds[1].
>
> Instead of choosing yet another arbitrary value, larger than 32Mb,
> this patch disables the use of both shmmax and shmall by default,
> allowing users to create segments of unlimited sizes. Users and
> applications that already explicitly set these values through sysctl
> are left untouched, and thus does not change any of the behavior.
>
> So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> implies unlimited memory, as opposed to disabling sysv shared memory.
> This is safe as 0 cannot possibly be used previously as SHMMIN is
> hardcoded to 1 and cannot be modified.
>
> This change allows Linux to treat shm just as regular anonymous memory.
> One important difference between them, though, is handling out-of-memory
> conditions: as opposed to regular anon memory, the OOM killer will not
> kill processes that are hogging memory through shm, allowing users to
> potentially abuse this. To overcome this situation, the shm_rmid_forced
> option must be enabled.
I'm very slightly against this sentence.
The OOM killer WILL kill the process, because touching shm increases RSS anyway.
But the kill doesn't free any memory, because it's shmem.
>
> Running this patch through LTP, everything passes, except the following,
> which, due to the nature of this change, is quite expected:
>
> shmget02 1 TFAIL : call succeeded unexpectedly
>
> [1]: http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> include/linux/shm.h | 2 +-
> include/uapi/linux/shm.h | 8 ++++----
> ipc/shm.c | 6 ++++--
> 3 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index 1e2cd2e..0ca06a3 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -4,7 +4,7 @@
> #include <asm/page.h>
> #include <uapi/linux/shm.h>
>
> -#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
> +#define SHMALL 0 /* max shm system wide (pages) */
> #include <asm/shmparam.h>
> struct shmid_kernel /* private to the kernel */
> {
> diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
> index 78b6941..5f0ef28 100644
> --- a/include/uapi/linux/shm.h
> +++ b/include/uapi/linux/shm.h
> @@ -9,14 +9,14 @@
>
> /*
> * SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
> - * be increased by sysctl
> + * be increased by sysctl. By default, disable SHMMAX and SHMALL with
> + * 0 bytes, thus allowing processes to have unlimited shared memory.
> */
> -
> -#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
> +#define SHMMAX 0 /* max shared seg size (bytes) */
> #define SHMMIN 1 /* min shared seg size (bytes) */
> #define SHMMNI 4096 /* max num of segs system wide */
> #ifndef __KERNEL__
> -#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
> +#define SHMALL 0
> #endif
> #define SHMSEG SHMMNI /* max shared segs per process */
>
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7645961..ae01ffa 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> int id;
> vm_flags_t acctflag = 0;
>
> - if (size < SHMMIN || size > ns->shm_ctlmax)
> + if (ns->shm_ctlmax &&
> + (size < SHMMIN || size > ns->shm_ctlmax))
> return -EINVAL;
>
> - if (ns->shm_tot + numpages > ns->shm_ctlall)
> + if (ns->shm_ctlall &&
> + ns->shm_tot + numpages > ns->shm_ctlall)
> return -ENOSPC;
>
> shp = ipc_rcu_alloc(sizeof(*shp));
Looks good.
Acked-by: KOSAKI Motohiro <[email protected]>
On Thu, Apr 3, 2014 at 3:50 PM, Davidlohr Bueso <[email protected]> wrote:
> On Thu, 2014-04-03 at 21:02 +0200, Manfred Spraul wrote:
>> Hi Davidlohr,
>>
>> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
>> > The default size for shmmax is, and always has been, 32Mb.
>> > Today, in the XXI century, it seems that this value is rather small,
>> > making users have to increase it via sysctl, which can cause
>> > unnecessary work and userspace application workarounds[1].
>> >
>> > Instead of choosing yet another arbitrary value, larger than 32Mb,
>> > this patch disables the use of both shmmax and shmall by default,
>> > allowing users to create segments of unlimited sizes. Users and
>> > applications that already explicitly set these values through sysctl
>> > are left untouched, and thus does not change any of the behavior.
>> >
>> > So a value of 0 bytes or pages, for shmmax and shmall, respectively,
>> > implies unlimited memory, as opposed to disabling sysv shared memory.
>> > This is safe as 0 cannot possibly be used previously as SHMMIN is
>> > hardcoded to 1 and cannot be modified.
>
>> Are we sure that no user space apps uses shmctl(IPC_INFO) and prints a
>> pretty error message if shmall is too small?
>> We would break these apps.
>
> Good point. 0 bytes/pages would definitely trigger an unexpected error
> message if users did this. But on the other hand I'm not sure this
> actually is a _real_ scenario, since upon overflow the value can still
> end up being 0, which is totally bogus and would cause the same
> breakage.
>
> So I see two possible workarounds:
> (i) Use ULONG_MAX for the shmmax default instead. This would make shmall
> default to 1152921504606846720 and 268435456, for 64 and 32bit systems,
> respectively.
>
> (ii) Keep the 0 bytes, but add a new a "transition" tunable that, if set
> (default off), would allow 0 bytes to be unlimited. With time, users
> could hopefully update their applications and we could eventually get
> rid of it. This _seems_ to be the less aggressive way to go.
Do you mean
set 0: IPC_INFO return shmmax = 0.
set 1: IPC_INFO return shmmax = ULONG_MAX.
?
That makes sense.
> This change allows Linux to treat shm just as regular anonymous memory.
> One important difference between them, though, is handling out-of-memory
> conditions: as opposed to regular anon memory, the OOM killer will not
> kill processes that are hogging memory through shm, allowing users to
> potentially abuse this. To overcome this situation, the shm_rmid_forced
> option must be enabled.
Off topic: systemd implemented a similar feature, RemoveIPC, and it is
enabled by default.
http://lists.freedesktop.org/archives/systemd-devel/2014-March/018232.html
On Thu, 2014-04-03 at 19:39 -0400, KOSAKI Motohiro wrote:
> On Thu, Apr 3, 2014 at 3:50 PM, Davidlohr Bueso <[email protected]> wrote:
> > On Thu, 2014-04-03 at 21:02 +0200, Manfred Spraul wrote:
> >> Hi Davidlohr,
> >>
> >> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
> >> > The default size for shmmax is, and always has been, 32Mb.
> >> > Today, in the XXI century, it seems that this value is rather small,
> >> > making users have to increase it via sysctl, which can cause
> >> > unnecessary work and userspace application workarounds[1].
> >> >
> >> > Instead of choosing yet another arbitrary value, larger than 32Mb,
> >> > this patch disables the use of both shmmax and shmall by default,
> >> > allowing users to create segments of unlimited sizes. Users and
> >> > applications that already explicitly set these values through sysctl
> >> > are left untouched, and thus does not change any of the behavior.
> >> >
> >> > So a value of 0 bytes or pages, for shmmax and shmall, respectively,
> >> > implies unlimited memory, as opposed to disabling sysv shared memory.
> >> > This is safe as 0 cannot possibly be used previously as SHMMIN is
> >> > hardcoded to 1 and cannot be modified.
> >
> >> Are we sure that no user space apps uses shmctl(IPC_INFO) and prints a
> >> pretty error message if shmall is too small?
> >> We would break these apps.
> >
> > Good point. 0 bytes/pages would definitely trigger an unexpected error
> > message if users did this. But on the other hand I'm not sure this
> > actually is a _real_ scenario, since upon overflow the value can still
> > end up being 0, which is totally bogus and would cause the same
> > breakage.
> >
> > So I see two possible workarounds:
> > (i) Use ULONG_MAX for the shmmax default instead. This would make shmall
> > default to 1152921504606846720 and 268435456, for 64 and 32bit systems,
> > respectively.
> >
> > (ii) Keep the 0 bytes, but add a new a "transition" tunable that, if set
> > (default off), would allow 0 bytes to be unlimited. With time, users
> > could hopefully update their applications and we could eventually get
> > rid of it. This _seems_ to be the less aggressive way to go.
>
> Do you mean
>
> set 0: IPC_INFO return shmmax = 0.
> set 1: IPC_INFO return shmmax = ULONG_MAX.
>
> ?
>
> That makes sense.
Well I was mostly referring to:
set 0: leave things as they are now.
set 1: this patch.
I don't think it makes much sense to set unlimited for both 0 and
ULONG_MAX, that would probably just create even more confusion.
But then again, we shouldn't even care about breaking things with shmmax
or shmall with 0 value, it just makes no sense from a user PoV. shmmax
cannot be 0 unless there's an overflow, which voids any valid cases, and
thus shmall cannot be 0 either as it would go against any values set for
shmmax. I think it's safe to ignore this.
On Fri, Apr 4, 2014 at 1:00 AM, Davidlohr Bueso <[email protected]> wrote:
> On Thu, 2014-04-03 at 19:39 -0400, KOSAKI Motohiro wrote:
>> On Thu, Apr 3, 2014 at 3:50 PM, Davidlohr Bueso <[email protected]> wrote:
>> > On Thu, 2014-04-03 at 21:02 +0200, Manfred Spraul wrote:
>> >> Hi Davidlohr,
>> >>
>> >> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
>> >> > The default size for shmmax is, and always has been, 32Mb.
>> >> > Today, in the XXI century, it seems that this value is rather small,
>> >> > making users have to increase it via sysctl, which can cause
>> >> > unnecessary work and userspace application workarounds[1].
>> >> >
>> >> > Instead of choosing yet another arbitrary value, larger than 32Mb,
>> >> > this patch disables the use of both shmmax and shmall by default,
>> >> > allowing users to create segments of unlimited sizes. Users and
>> >> > applications that already explicitly set these values through sysctl
>> >> > are left untouched, and thus does not change any of the behavior.
>> >> >
>> >> > So a value of 0 bytes or pages, for shmmax and shmall, respectively,
>> >> > implies unlimited memory, as opposed to disabling sysv shared memory.
>> >> > This is safe as 0 cannot possibly be used previously as SHMMIN is
>> >> > hardcoded to 1 and cannot be modified.
>> >
>> >> Are we sure that no user space apps uses shmctl(IPC_INFO) and prints a
>> >> pretty error message if shmall is too small?
>> >> We would break these apps.
>> >
>> > Good point. 0 bytes/pages would definitely trigger an unexpected error
>> > message if users did this. But on the other hand I'm not sure this
>> > actually is a _real_ scenario, since upon overflow the value can still
>> > end up being 0, which is totally bogus and would cause the same
>> > breakage.
>> >
>> > So I see two possible workarounds:
>> > (i) Use ULONG_MAX for the shmmax default instead. This would make shmall
>> > default to 1152921504606846720 and 268435456, for 64 and 32bit systems,
>> > respectively.
>> >
>> > (ii) Keep the 0 bytes, but add a new a "transition" tunable that, if set
>> > (default off), would allow 0 bytes to be unlimited. With time, users
>> > could hopefully update their applications and we could eventually get
>> > rid of it. This _seems_ to be the less aggressive way to go.
>>
>> Do you mean
>>
>> set 0: IPC_INFO return shmmax = 0.
>> set 1: IPC_INFO return shmmax = ULONG_MAX.
>>
>> ?
>>
>> That makes sense.
>
> Well I was mostly referring to:
>
> set 0: leave things as there are now.
> set 1: this patch.
I don't recommend this approach, because many users will never switch to 1
and we will end up with API fragmentation.
> I don't think it makes much sense to set unlimited for both 0 and
> ULONG_MAX, that would probably just create even more confusion.
>
> But then again, we shouldn't even care about breaking things with shmmax
> or shmall with 0 value, it just makes no sense from a user PoV. shmmax
> cannot be 0 unless there's an overflow, which voids any valid cases, and
> thus shmall cannot be 0 either as it would go against any values set for
> shmmax. I think it's safe to ignore this.
Agreed.
IMHO, until you find an actual incompatibility issue with this, we don't
need the switch, because we can't make a good workaround for that. I'd
suggest merging your patch and seeing what happens.
Hi,
On 04/05/2014 08:24 PM, KOSAKI Motohiro wrote:
> On Fri, Apr 4, 2014 at 1:00 AM, Davidlohr Bueso <[email protected]> wrote:
>> I don't think it makes much sense to set unlimited for both 0 and
>> ULONG_MAX, that would probably just create even more confusion.
I agree.
Unlimited was INT_MAX since 0.99.10 and ULONG_MAX since 2.3.39 (with
proper backward compatibility for user space).
Adding a second value for unlimited just creates confusion.
>> But then again, we shouldn't even care about breaking things with shmmax
>> or shmall with 0 value, it just makes no sense from a user PoV. shmmax
>> cannot be 0 unless there's an overflow, which voids any valid cases, and
>> thus shmall cannot be 0 either as it would go against any values set for
>> shmmax. I think it's safe to ignore this.
> Agreed.
> IMHO, until you find out any incompatibility issue of this, we don't
> need the switch
> because we can't make good workaround for that. I'd suggest to merge your patch
> and see what happen.
I disagree:
- "shmctl(,IPC_INFO,&buf); if (my_memory_size > buf.shmmax)
perror("change shmmax");" worked correctly since 0.99.10. I don't think
that merging the patch and seeing what happens is the right approach.
- setting shmmax by default to ULONG_MAX is the perfect workaround.
What reasons are there against the one-line patch?
>
> -#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
> +#define SHMMAX ULONG_MAX /* max shared seg size (bytes) */
>
--
Manfred
On Sun, 2014-04-06 at 08:42 +0200, Manfred Spraul wrote:
> Hi,
>
> On 04/05/2014 08:24 PM, KOSAKI Motohiro wrote:
> > On Fri, Apr 4, 2014 at 1:00 AM, Davidlohr Bueso <[email protected]> wrote:
> >> I don't think it makes much sense to set unlimited for both 0 and
> >> ULONG_MAX, that would probably just create even more confusion.
> I agree.
> Unlimited was INT_MAX since 0.99.10 and ULONG_MAX since 2.3.39 (with
> proper backward compatibility for user space).
>
> Adding a second value for unlimited just creates confusion.
> >> But then again, we shouldn't even care about breaking things with shmmax
> >> or shmall with 0 value, it just makes no sense from a user PoV. shmmax
> >> cannot be 0 unless there's an overflow, which voids any valid cases, and
> >> thus shmall cannot be 0 either as it would go against any values set for
> >> shmmax. I think it's safe to ignore this.
> > Agreed.
> > IMHO, until you find out any incompatibility issue of this, we don't
> > need the switch
> > because we can't make good workaround for that. I'd suggest to merge your patch
> > and see what happen.
> I disagree:
> - "shmctl(,IPC_INFO,&buf); if (my_memory_size > buf.shmmax)
> perror("change shmmax");" worked correctly since 0.99.10. I don't think
> that merging the patch and seeing what happens is the right approach.
I agree, we *must* get this right the first time. So no rushing into
things that might come back and bite us later.
That said, if users are doing that kind of check, then they must also
check against shmmin, which has _always_ been 1. So shmmax == 0 is a no
no. Otherwise it's not the kernel's fault that they're misusing the API,
which IMO is pretty straightforward for such things. And if shmmax
cannot be 0, shmall cannot be 0.
> - setting shmmax by default to ULONG_MAX is the perfect workaround.
>
> What reasons are there against the one-line patch?
There's really nothing wrong with it, it's just that 0 is a much nicer
value to have for 'unlimited'. And if we can get away with it, then
let's; otherwise yes, we should go with this path.
Hi Davidlohr,
On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
> The default size for shmmax is, and always has been, 32Mb.
> Today, in the XXI century, it seems that this value is rather small,
> making users have to increase it via sysctl, which can cause
> unnecessary work and userspace application workarounds[1].
>
> [snip]
> Running this patch through LTP, everything passes, except the following,
> which, due to the nature of this change, is quite expected:
>
> shmget02 1 TFAIL : call succeeded unexpectedly
Why is this TFAIL expected?
>
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7645961..ae01ffa 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> int id;
> vm_flags_t acctflag = 0;
>
> - if (size < SHMMIN || size > ns->shm_ctlmax)
> + if (ns->shm_ctlmax &&
> + (size < SHMMIN || size > ns->shm_ctlmax))
> return -EINVAL;
>
> - if (ns->shm_tot + numpages > ns->shm_ctlall)
> + if (ns->shm_ctlall &&
> + ns->shm_tot + numpages > ns->shm_ctlall)
> return -ENOSPC;
>
> shp = ipc_rcu_alloc(sizeof(*shp));
Ok, I understand it:
Your patch disables checking shmmax, shmall *AND* checking for SHMMIN.
a) Have you double checked that 0-sized shm segments work properly?
Does the swap code handle it properly, ...?
b) Isn't that yet another risk for user space incompatibility?
c) The patch summary is misleading, the impact on SHMMIN is not mentioned.
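The effect on SHMMIN can be shown with a small userspace model of the size
test in newseg(). This is a sketch, not kernel code: the two function names
are invented, and the second one is just one possible variant that keeps
the lower bound unconditional, not something proposed in the patch.

```c
#include <errno.h>
#include <stddef.h>

#define SHMMIN 1 /* min shared seg size (bytes), as in the patch context */

/* Size test as posted: when shm_ctlmax is 0 the whole condition is
 * skipped, which also skips the SHMMIN comparison. */
int size_check_as_posted(size_t size, size_t shm_ctlmax)
{
	if (shm_ctlmax &&
	    (size < SHMMIN || size > shm_ctlmax))
		return -EINVAL;
	return 0;
}

/* Hypothetical variant: SHMMIN is always enforced and only the upper
 * bound is disabled when shm_ctlmax is 0. */
int size_check_keep_shmmin(size_t size, size_t shm_ctlmax)
{
	if (size < SHMMIN)
		return -EINVAL;
	if (shm_ctlmax && size > shm_ctlmax)
		return -EINVAL;
	return 0;
}
```

With shm_ctlmax == 0, size_check_as_posted(0, 0) returns 0, so a zero-sized
request gets through, while size_check_keep_shmmin(0, 0) still returns
-EINVAL.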
--
Manfred
On Fri, 2014-04-11 at 20:28 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
>
> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
> > The default size for shmmax is, and always has been, 32Mb.
> > Today, in the XXI century, it seems that this value is rather small,
> > making users have to increase it via sysctl, which can cause
> > unnecessary work and userspace application workarounds[1].
> >
> > [snip]
> > Running this patch through LTP, everything passes, except the following,
> > which, due to the nature of this change, is quite expected:
> >
> > shmget02 1 TFAIL : call succeeded unexpectedly
> Why is this TFAIL expected?
So looking at shmget02.c, this is the case that fails:

	for (i = 0; i < TST_TOTAL; i++) {
		/*
		 * Look for a failure ...
		 */
		TEST(shmget(*(TC[i].skey), TC[i].size, TC[i].flags));

		if (TEST_RETURN != -1) {
			tst_resm(TFAIL, "call succeeded unexpectedly");
			continue;
		}

Where TC[0] is:

struct test_case_t {
	int *skey;
	int size;
	int flags;
	int error;
} TC[] = {
	/* EINVAL - size is 0 */
	{
	&shmkey2, 0, IPC_CREAT | IPC_EXCL | SHM_RW, EINVAL},

So it's expected because now 0 is actually valid. And before:

EINVAL A new segment was to be created and size < SHMMIN or size > SHMMAX
> >
> > diff --git a/ipc/shm.c b/ipc/shm.c
> > index 7645961..ae01ffa 100644
> > --- a/ipc/shm.c
> > +++ b/ipc/shm.c
> > @@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> > int id;
> > vm_flags_t acctflag = 0;
> >
> > - if (size < SHMMIN || size > ns->shm_ctlmax)
> > + if (ns->shm_ctlmax &&
> > + (size < SHMMIN || size > ns->shm_ctlmax))
> > return -EINVAL;
> >
> > - if (ns->shm_tot + numpages > ns->shm_ctlall)
> > + if (ns->shm_ctlall &&
> > + ns->shm_tot + numpages > ns->shm_ctlall)
> > return -ENOSPC;
> >
> > shp = ipc_rcu_alloc(sizeof(*shp));
> Ok, I understand it:
> Your patch disables checking shmmax, shmall *AND* checking for SHMMIN.
Right, if shmmax is 0, then there's no point checking for shmmin,
otherwise we'd always end up returning EINVAL.
>
> a) Have you double checked that 0-sized shm segments work properly?
> Does the swap code handle it properly, ...?
Hmm so I've been using this patch just fine on my laptop since I sent
it. So far I haven't seen any issues. Are you referring to something in
particular? I'd be happy to run any cases you're concerned with.
> b) Isn't that yet another risk for user space incompatibility?
Sorry, I don't follow here.
> c) The patch summary is misleading, the impact on SHMMIN is not mentioned.
Sure, I can explicitly add it to the changelog.
Thanks,
Davidlohr
On Fri, 2014-04-11 at 13:27 -0700, Davidlohr Bueso wrote:
> On Fri, 2014-04-11 at 20:28 +0200, Manfred Spraul wrote:
> > Your patch disables checking shmmax, shmall *AND* checking for SHMMIN.
>
> Right, if shmmax is 0, then there's no point checking for shmmin,
> otherwise we'd always end up returning EINVAL.
Actually, that's completely bogus.
Now that I think of it, shmget(key, 0, flg) should still return EINVAL.
That has *nothing* to do with any limits we are changing here and is
simply wrong since the passed size still cannot be less than 1 (shmmin).
I'll update the patch, thanks for pointing this out.
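
To make the intended semantics concrete, here is a minimal userspace sketch of the size check as described above. It is not the v2 patch itself: check_seg_size() is a hypothetical helper, and it assumes shm_ctlmax == 0 means "no upper limit" while SHMMIN is always enforced.

```c
#include <errno.h>
#include <stddef.h>

#define SHMMIN 1 /* min shared seg size (bytes), as in the uapi header */

/*
 * Hypothetical userspace model of the size check in newseg().
 * Assumption: shm_ctlmax == 0 means "no upper limit".
 * The SHMMIN check does not depend on any tunable, so a 0-sized
 * request is rejected regardless of how shmmax is configured.
 */
static int check_seg_size(size_t size, size_t shm_ctlmax)
{
	if (size < SHMMIN)
		return -EINVAL;
	if (shm_ctlmax && size > shm_ctlmax)
		return -EINVAL;
	return 0;
}
```

With this ordering, check_seg_size(0, 0) still fails with -EINVAL, so the LTP shmget02 size-0 case would keep passing.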
On 04/11/2014 10:27 PM, Davidlohr Bueso wrote:
> On Fri, 2014-04-11 at 20:28 +0200, Manfred Spraul wrote:
>> Hi Davidlohr,
>>
>> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
>>> The default size for shmmax is, and always has been, 32Mb.
>>> Today, in the XXI century, it seems that this value is rather small,
>>> making users have to increase it via sysctl, which can cause
>>> unnecessary work and userspace application workarounds[1].
>>>
>>> [snip]
>>> Running this patch through LTP, everything passes, except the following,
>>> which, due to the nature of this change, is quite expected:
>>>
>>> shmget02 1 TFAIL : call succeeded unexpectedly
>> Why is this TFAIL expected?
> So looking at shmget02.c, this is the case that fails:
>
> for (i = 0; i < TST_TOTAL; i++) {
> /*
> * Look for a failure ...
> */
>
> TEST(shmget(*(TC[i].skey), TC[i].size, TC[i].flags));
>
> if (TEST_RETURN != -1) {
> tst_resm(TFAIL, "call succeeded unexpectedly");
> continue;
> }
>
> Where TC[0] is:
> struct test_case_t {
> int *skey;
> int size;
> int flags;
> int error;
> } TC[] = {
> /* EINVAL - size is 0 */
> {
> &shmkey2, 0, IPC_CREAT | IPC_EXCL | SHM_RW, EINVAL},
>
> So it's expected because now 0 is actually valid. And before:
>
> EINVAL A new segment was to be created and size < SHMMIN or size > SHMMAX
>
>>> diff --git a/ipc/shm.c b/ipc/shm.c
>>> index 7645961..ae01ffa 100644
>>> --- a/ipc/shm.c
>>> +++ b/ipc/shm.c
>>> @@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>>> int id;
>>> vm_flags_t acctflag = 0;
>>>
>>> - if (size < SHMMIN || size > ns->shm_ctlmax)
>>> + if (ns->shm_ctlmax &&
>>> + (size < SHMMIN || size > ns->shm_ctlmax))
>>> return -EINVAL;
>>>
>>> - if (ns->shm_tot + numpages > ns->shm_ctlall)
>>> + if (ns->shm_ctlall &&
>>> + ns->shm_tot + numpages > ns->shm_ctlall)
>>> return -ENOSPC;
>>>
>>> shp = ipc_rcu_alloc(sizeof(*shp));
>> Ok, I understand it:
>> Your patch disables checking shmmax, shmall *AND* checking for SHMMIN.
> Right, if shmmax is 0, then there's no point checking for shmmin,
> otherwise we'd always end up returning EINVAL.
>
>> a) Have you double checked that 0-sized shm segments work properly?
>> Does the swap code handle it properly, ...?
> Hmm so I've been using this patch just fine on my laptop since I sent
> it. So far I haven't seen any issues. Are you referring to something in
> particular? I'd be happy to run any cases you're concerned with.
I'm thinking about malicious applications.
Create 0-sized segments and then map them. Does find_vma_intersection
handle that case?
The same for all other functions that are called by the shm code.
You can't replace code review by "runs for a month"
>> b) Isn't that yet another risk for user space incompatibility?
> Sorry, I don't follow here.
Applications expect that shmget(,0,) fails.
--
Manfred
On Sat, 2014-04-12 at 10:50 +0200, Manfred Spraul wrote:
> On 04/11/2014 10:27 PM, Davidlohr Bueso wrote:
> > On Fri, 2014-04-11 at 20:28 +0200, Manfred Spraul wrote:
> >> Hi Davidlohr,
> >>
> >> On 04/03/2014 02:20 AM, Davidlohr Bueso wrote:
> >>> The default size for shmmax is, and always has been, 32Mb.
> >>> Today, in the XXI century, it seems that this value is rather small,
> >>> making users have to increase it via sysctl, which can cause
> >>> unnecessary work and userspace application workarounds[1].
> >>>
> >>> [snip]
> >>> Running this patch through LTP, everything passes, except the following,
> >>> which, due to the nature of this change, is quite expected:
> >>>
> >>> shmget02 1 TFAIL : call succeeded unexpectedly
> >> Why is this TFAIL expected?
> > So looking at shmget02.c, this is the case that fails:
> >
> > for (i = 0; i < TST_TOTAL; i++) {
> > /*
> > * Look for a failure ...
> > */
> >
> > TEST(shmget(*(TC[i].skey), TC[i].size, TC[i].flags));
> >
> > if (TEST_RETURN != -1) {
> > tst_resm(TFAIL, "call succeeded unexpectedly");
> > continue;
> > }
> >
> > Where TC[0] is:
> > struct test_case_t {
> > int *skey;
> > int size;
> > int flags;
> > int error;
> > } TC[] = {
> > /* EINVAL - size is 0 */
> > {
> > &shmkey2, 0, IPC_CREAT | IPC_EXCL | SHM_RW, EINVAL},
> >
> > So it's expected because now 0 is actually valid. And before:
> >
> > EINVAL A new segment was to be created and size < SHMMIN or size > SHMMAX
> >
> >>> diff --git a/ipc/shm.c b/ipc/shm.c
> >>> index 7645961..ae01ffa 100644
> >>> --- a/ipc/shm.c
> >>> +++ b/ipc/shm.c
> >>> @@ -490,10 +490,12 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> >>> int id;
> >>> vm_flags_t acctflag = 0;
> >>>
> >>> - if (size < SHMMIN || size > ns->shm_ctlmax)
> >>> + if (ns->shm_ctlmax &&
> >>> + (size < SHMMIN || size > ns->shm_ctlmax))
> >>> return -EINVAL;
> >>>
> >>> - if (ns->shm_tot + numpages > ns->shm_ctlall)
> >>> + if (ns->shm_ctlall &&
> >>> + ns->shm_tot + numpages > ns->shm_ctlall)
> >>> return -ENOSPC;
> >>>
> >>> shp = ipc_rcu_alloc(sizeof(*shp));
> >> Ok, I understand it:
> >> Your patch disables checking shmmax, shmall *AND* checking for SHMMIN.
> > Right, if shmmax is 0, then there's no point checking for shmmin,
> > otherwise we'd always end up returning EINVAL.
> >
> >> a) Have you double checked that 0-sized shm segments work properly?
> >> Does the swap code handle it properly, ...?
> > Hmm so I've been using this patch just fine on my laptop since I sent
> > it. So far I haven't seen any issues. Are you referring to something in
> > particular? I'd be happy to run any cases you're concerned with.
> I'm thinking about malicious applications.
> Create 0-sized segments and then map them. Does find_vma_intersection
> handle that case?
> The same for all other functions that are called by the shm code.
Right I agree, which is why I corrected it in v2.
> You can't replace code review by "runs for a month"
Manfred, I was not referring to that at all.
> >> b) Isn't that yet another risk for user space incompatibility?
> > Sorry, I don't follow here.
> Applications expect that shmget(,0,) fails.
Again, v2.
Hi Andrew,
On 04/02/2014 12:08 AM, Andrew Morton wrote:
> Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
> timeframe, but infinity has since become larger so pickanumber.
I think infinity is the right solution:
The only common case where infinity is wrong would be Android - and
Android disables sysv shm entirely.
There are two patches:
http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
Could you apply one of them?
I wrote the first one, thus I'm biased which one is better.
--
Manfred
On Sun, 2014-04-13 at 20:05 +0200, Manfred Spraul wrote:
> Hi Andrew,
>
> On 04/02/2014 12:08 AM, Andrew Morton wrote:
> > Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
> > timeframe, but infinity has since become larger so pickanumber.
>
> I think infinity is the right solution:
> The only common case where infinity is wrong would be Android - and
> Android disables sysv shm entirely.
>
> There are two patches:
> http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
If you apply this one, please include the below, which updates a missing
definition for SHMALL.
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index d9497b7..0774ec4 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -9,14 +9,14 @@
/*
* SHMMAX, SHMMNI and SHMALL are upper limits are defaults which can
- * be increased by sysctl
+ * be decreased by sysctl.
*/
#define SHMMAX ULONG_MAX /* max shared seg size (bytes) */
#define SHMMIN 1 /* min shared seg size (bytes) */
#define SHMMNI 4096 /* max num of segs system wide */
#ifndef __KERNEL__
-#define SHMALL (SHMMAX/getpagesize()*(SHMMNI/16))
+#define SHMALL ULONG_MAX
#endif
#define SHMSEG SHMMNI /* max shared segs per process */
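
For reference, the removed SHMALL formula can be worked through with the old 32MB SHMMAX default mentioned earlier in the thread. The sketch below is only an arithmetic illustration: OLD_SHMMAX and old_shmall_pages() are names made up here, and the page size is passed in as a parameter rather than read from the system.

```c
#include <stddef.h>

#define OLD_SHMMAX 0x2000000UL /* the old 32MB default (assumed name) */
#define SHMMNI     4096        /* max num of segs system wide */

/*
 * Evaluate the old userspace SHMALL formula,
 *   SHMMAX / getpagesize() * (SHMMNI / 16),
 * for a caller-supplied page size. The result is in pages.
 */
static unsigned long old_shmall_pages(unsigned long page_size)
{
	return OLD_SHMMAX / page_size * (SHMMNI / 16);
}
```

Assuming 4096-byte pages, this gives 8192 * 256 = 2097152 pages, i.e. 8GB of shared memory in total.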
> http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
>
> Could you apply one of them?
> I wrote the first one, thus I'm biased which one is better.
>
> --
> Manfred
On Sun, 13 Apr 2014 20:05:34 +0200 Manfred Spraul <[email protected]> wrote:
> Hi Andrew,
>
> On 04/02/2014 12:08 AM, Andrew Morton wrote:
> > Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
> > timeframe, but infinity has since become larger so pickanumber.
>
> I think infinity is the right solution:
> The only common case where infinity is wrong would be Android - and
> Android disables sysv shm entirely.
>
> There are two patches:
> http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
> http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
>
> Could you apply one of them?
> I wrote the first one, thus I'm biased which one is better.
I like your patch because applying it might encourage you to send more
kernel patches - I miss the old days ;)
But I do worry about disrupting existing systems so I like Davidlohr's
idea of making the change a no-op for people who are currently
explicitly setting shmmax and shmall.
In an ideal world, system administrators would review this change,
would remove their explicit limit-setting and would retest everything
then roll it out. But in the real world with Davidlohr's patch, they
just won't know that we did this and they'll still be manually
configuring shmmax/shmall ten years from now. I almost wonder if we
should drop a printk_once("hey, you don't need to do that any more")
when shmmax/shmall are altered?
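
A rough userspace model of that warn-once idea, purely as a sketch: set_shm_limit() and the notice flag are hypothetical names, and fprintf() stands in for printk_once().

```c
#include <stdbool.h>
#include <stdio.h>

/* Set once the notice has been emitted; models printk_once() state. */
static bool shm_limit_notice_printed;

/*
 * Hypothetical handler for a write to shmmax or shmall.
 * Stores the new value and prints the notice on the first write only.
 * Returns 1 if the notice was printed by this call, 0 otherwise.
 */
static int set_shm_limit(unsigned long *limit, unsigned long new_value)
{
	int printed = 0;

	*limit = new_value;
	if (!shm_limit_notice_printed) {
		shm_limit_notice_printed = true;
		fprintf(stderr, "shm: limits default to unlimited, "
				"setting them explicitly is optional\n");
		printed = 1;
	}
	return printed;
}
```

The limit is still applied on every write; only the message is suppressed after the first one.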
I think the changelogs for both patches could afford to spend much more
time talking about *why* we're making this change. What problem is
the current code causing? This is a somewhat risky change and we
should demonstrate good reasons for making it. If people end up taking
damage because of this change, they are going to be looking at that
changelog trying to work out why we did this to them, so let's explain
it carefully.
On Wed, 2014-04-16 at 15:46 -0700, Andrew Morton wrote:
> On Sun, 13 Apr 2014 20:05:34 +0200 Manfred Spraul <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > On 04/02/2014 12:08 AM, Andrew Morton wrote:
> > > Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
> > > timeframe, but infinity has since become larger so pickanumber.
> >
> > I think infinity is the right solution:
> > The only common case where infinity is wrong would be Android - and
> > Android disables sysv shm entirely.
> >
> > There are two patches:
> > http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
> > http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
> >
> > Could you apply one of them?
> > I wrote the first one, thus I'm biased which one is better.
>
> I like your patch because applying it might encourage you to send more
> kernel patches - I miss the old days ;)
>
> But I do worry about disrupting existing systems so I like Davidlohr's
> idea of making the change a no-op for people who are currently
> explicitly setting shmmax and shmall.
>
> In an ideal world, system administrators would review this change,
> would remove their explicit limit-setting and would retest everything
> then roll it out. But in the real world with Davidlohr's patch, they
> just won't know that we did this and they'll still be manually
> configuring shmmax/shmall ten years from now. I almost wonder if we
> should drop a printk_once("hey, you don't need to do that any more")
> when shmmax/shmall are altered?
That's a good idea, and along with the manpage update (+ probably some
blog/lwn post) users should be well informed. We want them to update
their scripts. Cc'ing Michael Kerrisk btw, who might give us a fresh
userspace perspective.
> I think the changelogs for both patches could afford to spend much more
> time talking about *why* we're making this change. What problem is
> the current code causing? This is a somewhat risky change and we
> should demonstrate good reasons for making it. If people end up taking
> damage because of this change, they are going to be looking at that
> changelog trying to work out why we did this to them, so let's explain
> it carefully.
Fair enough, although that's really why I added the link to Robert Haas'
blog post. In my past life I did some technical support for Oracle, so I
*know* the pain such limits can cause. How does the following sound?
"Unix has historically required setting these limits for shared
memory, and Linux inherited such behavior. The consequence of this
is added complexity for users and administrators. One very common
example is database setup/installation documents and scripts, where
users must manually calculate the values for these limits. This also
requires (some) knowledge of how the underlying memory management works,
thus causing, on many occasions, the limits to just be flat out wrong.
Disabling these limits sooner could have saved companies a lot of time,
headaches and money for support. But it's never too late: let's simplify
users' lives now."
On Thu, Apr 17, 2014 at 12:46 AM, Andrew Morton
<[email protected]> wrote:
> On Sun, 13 Apr 2014 20:05:34 +0200 Manfred Spraul <[email protected]> wrote:
>
>> Hi Andrew,
>>
>> On 04/02/2014 12:08 AM, Andrew Morton wrote:
>> > Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
>> > timeframe, but infinity has since become larger so pickanumber.
>>
>> I think infinity is the right solution:
>> The only common case where infinity is wrong would be Android - and
>> Android disables sysv shm entirely.
>>
>> There are two patches:
>> http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
>> http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
>>
>> Could you apply one of them?
>> I wrote the first one, thus I'm biased which one is better.
>
> I like your patch because applying it might encourage you to send more
> kernel patches - I miss the old days ;)
>
> But I do worry about disrupting existing systems so I like Davidlohr's
> idea of making the change a no-op for people who are currently
> explicitly setting shmmax and shmall.
Agreed. It's hard to imagine situations where people might care
nowadays, but there's no limits to people's insane inventiveness. Some
people really might want to set an upper limit.
> In an ideal world, system administrators would review this change,
And in the ideal world, patches such as this would CC
[email protected], as described in
Documentation/SubmitChecklist, so that users who care about getting
advance warning on API changes could be alerted and might even review
and comment...
> would remove their explicit limit-setting and would retest everything
> then roll it out. But in the real world with Davidlohr's patch, they
> just won't know that we did this and they'll still be manually
> configuring shmmax/shmall ten years from now. I almost wonder if we
> should drop a printk_once("hey, you don't need to do that any more")
> when shmmax/shmall are altered?
Makes some sense. But then what about the (strange) people who really
do want to set a limit. Do we just say that they have to live with the
message?
Cheers,
Michael
On 04/17/2014 12:41 PM, Michael Kerrisk wrote:
> On Thu, Apr 17, 2014 at 12:46 AM, Andrew Morton
> <[email protected]> wrote:
>> On Sun, 13 Apr 2014 20:05:34 +0200 Manfred Spraul <[email protected]> wrote:
>>
>>> Hi Andrew,
>>>
>>> On 04/02/2014 12:08 AM, Andrew Morton wrote:
>>>> Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
>>>> timeframe, but infinity has since become larger so pickanumber.
>>> I think infinity is the right solution:
>>> The only common case where infinity is wrong would be Android - and
>>> Android disables sysv shm entirely.
>>>
>>> There are two patches:
>>> http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
>>> http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
>>>
>>> Could you apply one of them?
>>> I wrote the first one, thus I'm biased which one is better.
>> I like your patch because applying it might encourage you to send more
>> kernel patches - I miss the old days ;)
>>
>> But I do worry about disrupting existing systems so I like Davidlohr's
>> idea of making the change a no-op for people who are currently
>> explicitly setting shmmax and shmall.
> Agreed. It's hard to imagine situations where people might care
> nowadays, but there's no limits to people's insane inventiveness. Some
> people really might want to set an upper limit.
I don't understand that: neither patch has any impact after an explicit
sysctl that overwrites shmmax.
>> In an ideal world, system administrators would review this change,
> And in the ideal world, patches such as this would CC
> [email protected], as described in
> Documentation/SubmitChecklist, so that users who care about getting
> advance warning on API changes could be alerted and might even review
> and comment...
Good point.
Davidlohr: Your patch has an impact on shmctl(,IPC_INFO,).
Could you add that for v3?
I'll try to make a v2 (with your update to the uapi header file) tomorrow.
--
Manfred
On Thu, Apr 17, 2014 at 6:41 PM, Manfred Spraul
<[email protected]> wrote:
> On 04/17/2014 12:41 PM, Michael Kerrisk wrote:
>>
>> On Thu, Apr 17, 2014 at 12:46 AM, Andrew Morton
>> <[email protected]> wrote:
>>>
>>> On Sun, 13 Apr 2014 20:05:34 +0200 Manfred Spraul
>>> <[email protected]> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> On 04/02/2014 12:08 AM, Andrew Morton wrote:
>>>>>
>>>>> Well, I'm assuming 64GB==infinity. It *was* infinity in the RHEL5
>>>>> timeframe, but infinity has since become larger so pickanumber.
>>>>
>>>> I think infinity is the right solution:
>>>> The only common case where infinity is wrong would be Android - and
>>>> Android disables sysv shm entirely.
>>>>
>>>> There are two patches:
>>>> http://marc.info/?l=linux-kernel&m=139730332306185&q=raw
>>>> http://marc.info/?l=linux-kernel&m=139727299800644&q=raw
>>>>
>>>> Could you apply one of them?
>>>> I wrote the first one, thus I'm biased which one is better.
>>>
>>> I like your patch because applying it might encourage you to send more
>>> kernel patches - I miss the old days ;)
>>>
>>> But I do worry about disrupting existing systems so I like Davidlohr's
>>> idea of making the change a no-op for people who are currently
>>> explicitly setting shmmax and shmall.
>>
>> Agreed. It's hard to imagine situations where people might care
>> nowadays, but there's no limits to people's insane inventiveness. Some
>> people really might want to set an upper limit.
>
> I don't understand that: neither patch has any impact after an explicit
> sysctl that overwrites shmmax.
You don't understand it, because I was being dense :-}. I
misunderstood your patch. I think I was thrown by this line in the
commit message:
The patch disables both limits by setting the limits to ULONG_MAX.
Of course, your patch doesn't *disable* the limits, it simply sets the
defaults to the maximum.
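
A small sketch of that distinction, under stated assumptions: struct shm_limits, shm_limits_init() and check_size() are illustrative names, not kernel code. The check itself is unchanged; only the initial values move to ULONG_MAX, so an explicit sysctl-style write is still honored.

```c
#include <errno.h>
#include <limits.h>

#define SHMMIN 1 /* min shared seg size (bytes) */

/* Illustrative stand-in for the per-namespace shm tunables. */
struct shm_limits {
	unsigned long shm_ctlmax; /* max segment size, bytes */
	unsigned long shm_ctlall; /* max total shm, pages */
};

/* New defaults: the largest representable value, not "disabled". */
static void shm_limits_init(struct shm_limits *l)
{
	l->shm_ctlmax = ULONG_MAX;
	l->shm_ctlall = ULONG_MAX;
}

/* Same comparison as before the patch; it is always evaluated. */
static int check_size(const struct shm_limits *l, unsigned long size)
{
	if (size < SHMMIN || size > l->shm_ctlmax)
		return -EINVAL;
	return 0;
}
```

After shm_limits_init(), a 1GB request passes; once shm_ctlmax is lowered to, say, 1024, a 4096-byte request fails with -EINVAL exactly as it would have before.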
>>> In an ideal world, system administrators would review this change,
>>
>> And in the ideal world, patches such as this would CC
>> [email protected], as described in
>> Documentation/SubmitChecklist, so that users who care about getting
>> advance warning on API changes could be alerted and might even review
>> and comment...
>
> Good point.
> Davidlohr: Your patch has an impact on shmctl(,IPC_INFO,).
> Could you add that for v3?
Well, actually, BOTH patches change the API, because they both affect
SHMALL/SHMMAX.
Cheers,
Michael
> I'll try to make a v2 (with your update to the uapi header file) tomorrow.
>
> --
> Manfred
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/