(2014/04/01 9:05), Andrew Morton wrote:
> On Mon, 31 Mar 2014 16:25:32 -0700 Davidlohr Bueso <[email protected]> wrote:
>
>> On Mon, 2014-03-31 at 16:13 -0700, Andrew Morton wrote:
>>> On Mon, 31 Mar 2014 15:59:33 -0700 Davidlohr Bueso <[email protected]> wrote:
>>>
>>>>>
>>>>> - Shouldn't there be a way to alter this namespace's shm_ctlmax?
>>>>
>>>> Unfortunately this would also add the complexity I previously mentioned.
>>>
>>> But if the current namespace's shm_ctlmax is too small, you're screwed.
>>> Have to shut down the namespace all the way back to init_ns and start
>>> again.
>>>
>>>>> - What happens if we just nuke the limit altogether and fall back to
>>>>> the next check, which presumably is the rlimit bounds?
>>>>
>>>> afaik we only have rlimit for msgqueues. But in any case, while I like
>>>> that simplicity, it's too late. Too many workloads (specially DBs) rely
>>>> heavily on shmmax. Removing it and relying on something else would thus
>>>> cause a lot of things to break.
>>>
>>> It would permit larger shm segments - how could that break things? It
>>> would make most or all of these issues go away?
>>>
>>
>> So sysadmins wouldn't be very happy, per man shmget(2):
>>
>> EINVAL A new segment was to be created and size < SHMMIN or size >
>> SHMMAX, or no new segment was to be created, a segment with given key
>> existed, but size is greater than the size of that segment.
>
> So their system will act as if they had set SHMMAX=enormous. What
> problems could that cause?
>
>
> Look. The 32M thing is causing problems. Arbitrarily increasing the
> arbitrary 32M to an arbitrary 128M won't fix anything - we still have
> the problem. Think bigger, please: how can we make this problem go
> away for ever?
>
Our middleware engineers have been complaining about this sysctl limit.
A system administrator needs to calculate the required sysctl value by summing
the requirements of all planned middleware, and middleware providers need to write
"please calculate the sysctl param by....." in their installation manuals.
Now, I think containers will become the base application platform. In a container,
memory is limited by the memory cgroup, and the container's admin should be able
to overwrite the limit inside the container to an arbitrary value.
Because of these, I vote for
1. removing the limit
(though removing it may break applications...)
or
2. having the container admin set the value based on memcg's limit.
BTW, if /proc/sys is bind-mounted read-only by the lxc runtime,
it seems difficult for the admin to modify it. I have no idea whether that is
a missing kernel feature or a userland problem.
Thanks,
-Kame
On Tue, 01 Apr 2014 15:29:05 +0900 Kamezawa Hiroyuki <[email protected]> wrote:
> >
> > So their system will act as if they had set SHMMAX=enormous. What
> > problems could that cause?
> >
> >
> > Look. The 32M thing is causing problems. Arbitrarily increasing the
> > arbitrary 32M to an arbitrary 128M won't fix anything - we still have
> > the problem. Think bigger, please: how can we make this problem go
> > away for ever?
> >
>
> Our middleware engineers have been complaining about this sysctl limit.
> A system administrator needs to calculate the required sysctl value by summing
> the requirements of all planned middleware, and middleware providers need to write
> "please calculate the sysctl param by....." in their installation manuals.
Why aren't people just setting the sysctl to a petabyte? What problems
would that lead to?
>> Our middleware engineers have been complaining about this sysctl limit.
>> A system administrator needs to calculate the required sysctl value by summing
>> the requirements of all planned middleware, and middleware providers need to write
>> "please calculate the sysctl param by....." in their installation manuals.
>
> Why aren't people just setting the sysctl to a petabyte? What problems
> would that lead to?
I don't have much knowledge of Fujitsu middleware. But I'd like to describe a
very funny bug I saw.
1. middleware-A suggested setting SHMMAX to a very large value (maybe
LONG_MAX, but my memory was flushed)
2. middleware-B suggested increasing SHMMAX by some dozen megabytes.
Finally, the sum overflowed and nothing worked at all.
Let's demonstrate.
# echo 18446744073709551615 > /proc/sys/kernel/shmmax
# cat /proc/sys/kernel/shmmax
18446744073709551615
# echo 18446744073709551616 > /proc/sys/kernel/shmmax
# cat /proc/sys/kernel/shmmax
0
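For illustration, a minimal C sketch of the same unsigned wraparound (the
values are hypothetical, not taken from any real middleware; presumably the
proc handler parses the written string into the same unsigned long type,
which is why echoing 2^64 above wraps to 0):

#include <stdio.h>
#include <limits.h>

int main(void)
{
	/* middleware-A's "set it huge" advice */
	unsigned long shmmax = ULONG_MAX;
	/* middleware-B's "add a few dozen MB on top" advice */
	unsigned long bump = 64UL << 20;
	/* unsigned arithmetic silently wraps around */
	unsigned long sum = shmmax + bump;

	printf("shmmax + 64MB = %lu\n", sum);	/* prints 67108863, not huge */
	return 0;
}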
That's why much open source software continues this silly game. But
again, I don't have knowledge of Fujitsu middleware. I'm waiting for
kamezawa-san's answer.
On Tue, 2014-04-01 at 16:15 -0400, KOSAKI Motohiro wrote:
> >> Our middleware engineers have been complaining about this sysctl limit.
> >> A system administrator needs to calculate the required sysctl value by summing
> >> the requirements of all planned middleware, and middleware providers need to write
> >> "please calculate the sysctl param by....." in their installation manuals.
> >
> > Why aren't people just setting the sysctl to a petabyte? What problems
> > would that lead to?
>
> I don't have much knowledge of Fujitsu middleware. But I'd like to describe a
> very funny bug I saw.
>
> 1. middleware-A suggested setting SHMMAX to a very large value (maybe
> LONG_MAX, but my memory was flushed)
> 2. middleware-B suggested increasing SHMMAX by some dozen megabytes.
>
> Finally, the sum overflowed and nothing worked at all.
>
> Let's demonstrate.
>
> # echo 18446744073709551615 > /proc/sys/kernel/shmmax
> # cat /proc/sys/kernel/shmmax
> 18446744073709551615
> # echo 18446744073709551616 > /proc/sys/kernel/shmmax
> # cat /proc/sys/kernel/shmmax
> 0
hehe, what a nasty little tunable this is. Reminds me of this:
https://access.redhat.com/site/solutions/16333
(2014/04/02 5:15), KOSAKI Motohiro wrote:
>>> Our middleware engineers have been complaining about this sysctl limit.
>>> A system administrator needs to calculate the required sysctl value by summing
>>> the requirements of all planned middleware, and middleware providers need to write
>>> "please calculate the sysctl param by....." in their installation manuals.
>>
>> Why aren't people just setting the sysctl to a petabyte? What problems
>> would that lead to?
>
> I don't have much knowledge of Fujitsu middleware. But I'd like to describe a
> very funny bug I saw.
>
> 1. middleware-A suggested setting SHMMAX to a very large value (maybe
> LONG_MAX, but my memory was flushed)
> 2. middleware-B suggested increasing SHMMAX by some dozen megabytes.
>
> Finally, the sum overflowed and nothing worked at all.
>
> Let's demonstrate.
>
> # echo 18446744073709551615 > /proc/sys/kernel/shmmax
> # cat /proc/sys/kernel/shmmax
> 18446744073709551615
> # echo 18446744073709551616 > /proc/sys/kernel/shmmax
> # cat /proc/sys/kernel/shmmax
> 0
>
> That's why much open source software continues this silly game. But
> again, I don't have knowledge of Fujitsu middleware. I'm waiting for
> kamezawa-san's answer.
>
Nowadays, middleware/applications are required to install automatically without
any operations by the admin. But shmmax tends to be a value the admin has to modify
by hand after installation. It's not the only such problem, but it is one.
I tell the MW engineers, "your middleware/application can modify it automatically
as needed; there will be no pain."
But they tend not to do it. My guess at the application writers' way of thinking:
- If the OS imposes a limit, the limit should have some meaning.
There may be an unknown, OS-internal reason which the system admin needs to check;
for example, the OS might consume more resources when shmmax is enlarged.
- If the OS imposes a limit, it should be modified by the admin.
I guess customers think so, too. There is no official statement that "increasing shmmax
will not consume any resources and will not cause any problem inside the kernel."
So admins end up having to set it, and middleware has to write "please modify the sysctl
value based on this calculation....." in its manual.
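As a sketch of what "modify it automatically" could look like (the 1GB
requirement is made up for illustration; error handling is minimal, and
writing to /proc/sys needs privilege):

#include <stdio.h>

#define SHMMAX_PROC "/proc/sys/kernel/shmmax"

/* Raise shmmax to at least 'needed'; leave it alone if already big enough. */
static int raise_shmmax(unsigned long needed)
{
	unsigned long cur = 0;
	FILE *f = fopen(SHMMAX_PROC, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%lu", &cur) != 1)
		cur = 0;
	fclose(f);

	if (cur >= needed)
		return 0;		/* already large enough */

	f = fopen(SHMMAX_PROC, "w");
	if (!f)
		return -1;		/* e.g. /proc/sys bind-mounted read-only */
	fprintf(f, "%lu\n", needed);
	fclose(f);
	return 0;
}

int main(void)
{
	if (raise_shmmax(1UL << 30) != 0)	/* hypothetical 1GB requirement */
		perror("raise_shmmax");
	return 0;
}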
I think the worst problem with this "limit" is that it's hard to explain why it
exists. All I can answer is "I guess it's just legacy, hehe...."
Thanks,
-Kame
(2014/04/02 4:19), Andrew Morton wrote:
> On Tue, 01 Apr 2014 15:29:05 +0900 Kamezawa Hiroyuki <[email protected]> wrote:
>
>>>
>>> So their system will act as if they had set SHMMAX=enormous. What
>>> problems could that cause?
>>>
>>>
>>> Look. The 32M thing is causing problems. Arbitrarily increasing the
>>> arbitrary 32M to an arbitrary 128M won't fix anything - we still have
>>> the problem. Think bigger, please: how can we make this problem go
>>> away for ever?
>>>
>>
>> Our middleware engineers have been complaining about this sysctl limit.
>> A system administrator needs to calculate the required sysctl value by summing
>> the requirements of all planned middleware, and middleware providers need to write
>> "please calculate the sysctl param by....." in their installation manuals.
>
> Why aren't people just setting the sysctl to a petabyte? What problems
> would that lead to?
>
They (and admins) don't know the fact that setting it to petabytes won't cause any pain.
In their thinking:
==
If the kernel imposes a limit, it should have some (bad) side-effect, and
the limit represents a trade-off which must be handled by the admin.
In this case, they think setting this value large will consume tons of resources.
==
They don't care about the kernel's implementation; they care about what the API says.
Of course, whenever I was asked, I answered "set it to petabytes". But the fact that
*the default is small* makes them doubtful.
Thanks,
-Kame
> Why aren't people just setting the sysctl to a petabyte? What problems
> would that lead to?
Historically - hanging real-world desktop systems when someone
accidentally creates a giant SHM segment and maps it.
If you are running with vm overcommit set to actually do checks then it
*shouldn't* blow up nowadays.
More to the point, wtf are people still using the prehistoric sys5 IPC APIs
and not shmemfs/POSIX shmem?
(2014/04/02 23:55), One Thousand Gnomes wrote:
>> Why aren't people just setting the sysctl to a petabyte? What problems
>> would that lead to?
>
> Historically - hanging real-world desktop systems when someone
> accidentally creates a giant SHM segment and maps it.
>
> If you are running with vm overcommit set to actually do checks then it
> *shouldn't* blow up nowadays.
>
> More to the point, wtf are people still using the prehistoric sys5 IPC APIs
> and not shmemfs/POSIX shmem?
>
AFAIK, there are many SysV IPC users.
And admins use ipcs etc. for checking status. I guess they will not
change their attitude until they see trouble with SysV IPC.
*) I think some Red Hat document (MRG?) clearly says SysV IPC is obsolete, but...
I tend to recommend POSIX shared memory when people start new development, but
there is another trap.
IIUC, for POSIX shmem, an implicit size limit is imposed by the tmpfs filesystem size.
The tmpfs mounted on /dev/shm tends to be limited to half the system memory.
It's hard for users to discover that limit before hitting trouble.
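A small POSIX shm sketch of that trap (the segment name and size are made up;
link with -lrt on older glibc):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const char *name = "/demo_seg";	/* hypothetical name */
	off_t size = 1UL << 30;		/* 1GB, for example */
	int fd = shm_open(name, O_CREAT | O_RDWR, 0600);

	if (fd < 0) {
		perror("shm_open");
		return 1;
	}
	/* This usually succeeds regardless of the tmpfs limit,
	 * because tmpfs pages are allocated lazily... */
	if (ftruncate(fd, size) < 0) {
		perror("ftruncate");
		return 1;
	}
	char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* ...so the failure only shows up when pages are touched:
	 * past the /dev/shm limit this raises SIGBUS, which is why
	 * users hit the limit only after trouble starts. */
	p[0] = 1;

	munmap(p, size);
	close(fd);
	shm_unlink(name);
	return 0;
}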
Thanks,
-Kame