From: Solofo Ramangalahy <[email protected]>

Initialize msgmnb value to
min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE)
to increase the default value for larger machines.

The MSG_CPU_SCALE scaling factor is defined to be 4, as 16384 x 4 = 65536
is an already used and recommended value.

The msgmni value is made dependent on msgmnb to keep the memory
dedicated to message queues within the 1/MSG_MEM_SCALE of lowmem
bound.

Unlike msgmni, the value is not scaled (down) with respect to the
number of ipc namespaces, for simplicity.

To disable recomputation when the user explicitly sets a value,
we reuse the callback defined for msgmni.

As msgmni and msgmnb are correlated, a user setting of either of the two
disables recomputation of both, for now. This is refined in a later
patch.

When a negative value is written to /proc/sys/kernel/msgmnb,
automatic recomputing is re-enabled.

Signed-off-by: Solofo Ramangalahy <[email protected]>
---
Documentation/sysctl/kernel.txt | 28 ++++++++++++++++++++++++++++
include/linux/msg.h | 6 ++++++
ipc/ipc_sysctl.c | 5 +++--
ipc/msg.c | 17 +++++++++++++----
4 files changed, 50 insertions(+), 6 deletions(-)
Index: b/ipc/msg.c
===================================================================
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -38,6 +38,7 @@
#include <linux/rwsem.h>
#include <linux/nsproxy.h>
#include <linux/ipc_namespace.h>
+#include <linux/cpumask.h>
#include <asm/current.h>
#include <asm/uaccess.h>
@@ -92,7 +93,7 @@ void recompute_msgmni(struct ipc_namespa
si_meminfo(&i);
allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
- / MSGMNB;
+ / ns->msg_ctlmnb;
nb_ns = atomic_read(&nr_ipc_ns);
allowed /= nb_ns;
@@ -108,11 +109,19 @@ void recompute_msgmni(struct ipc_namespa
ns->msg_ctlmni = allowed;
}
+/*
+ * Scale msgmnb with the number of online cpus, up to 4x MSGMNB.
+ */
+void recompute_msgmnb(struct ipc_namespace *ns)
+{
+ ns->msg_ctlmnb =
+ min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE);
+}
void msg_init_ns(struct ipc_namespace *ns)
{
ns->msg_ctlmax = MSGMAX;
- ns->msg_ctlmnb = MSGMNB;
+ recompute_msgmnb(ns);
recompute_msgmni(ns);
@@ -132,8 +141,8 @@ void __init msg_init(void)
{
msg_init_ns(&init_ipc_ns);
- printk(KERN_INFO "msgmni has been set to %d\n",
- init_ipc_ns.msg_ctlmni);
+ printk(KERN_INFO "msgmni has been set to %d, msgmnb to %d\n",
+ init_ipc_ns.msg_ctlmni, init_ipc_ns.msg_ctlmnb);
ipc_init_proc_interface("sysvipc/msg",
" key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n",
Index: b/include/linux/msg.h
===================================================================
--- a/include/linux/msg.h
+++ b/include/linux/msg.h
@@ -58,6 +58,12 @@ struct msginfo {
* more than 16 GB : msgmni = 32K (IPCMNI)
*/
#define MSG_MEM_SCALE 32
+/*
+ * Scaling factor to compute msgmnb: ns->msg_ctlmnb is between MSGMNB
+ * and MSGMNB * MSG_CPU_SCALE. This leads to a max msgmnb value of
+ * 65536 which is an already used and recommended value.
+ */
+#define MSG_CPU_SCALE 4
#define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */
#define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */
Index: b/ipc/ipc_sysctl.c
===================================================================
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -42,6 +42,7 @@ static void tunable_set_callback(int val
* Re-enable automatic recomputing only if not already
* enabled.
*/
+ recompute_msgmnb(current->nsproxy->ipc_ns);
recompute_msgmni(current->nsproxy->ipc_ns);
cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
}
@@ -210,8 +211,8 @@ static struct ctl_table ipc_kern_table[]
.data = &init_ipc_ns.msg_ctlmnb,
.maxlen = sizeof (init_ipc_ns.msg_ctlmnb),
.mode = 0644,
- .proc_handler = proc_ipc_dointvec,
- .strategy = sysctl_ipc_data,
+ .proc_handler = proc_ipc_callback_dointvec,
+ .strategy = sysctl_ipc_registered_data,
},
{
.ctl_name = KERN_SEM,
Index: b/Documentation/sysctl/kernel.txt
===================================================================
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -179,6 +179,34 @@ kernel stack.
==============================================================
+msgmnb
+
+Maximum size in bytes (not in message count) of a single SystemV IPC
+message queue (b stands for bytes).
+
+This value is dynamic and depends on the online cpu count of the
+machine (taking cpu hotplug into account).
+
+Computed values are between MSGMNB and MSGMNB*MSG_CPU_SCALE #define
+constants (currently [16384,65536]).
+
+The exact value is automatically (re)computed, but:
+. If the value is positioned from user space (via procfs or sysctl()),
+ to a positive value then the automatic recomputation is
+ disabled. This leaves control to user space. E.g.
+
+ # echo 16384 > /proc/sys/kernel/msgmnb
+
+. If the value is positioned from user space to a negative value, then
+ the computation is reenabled. E.g.
+
+ # echo -1 > /proc/sys/kernel/msgmnb
+
+See recompute_msgmnb() function in ipc/ directory for details.
+The value of msgmnb is coupled with the value of msgmni.
+
+==============================================================
+
osrelease, ostype & version:
# cat osrelease
--
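
For readers following the arithmetic in the patch above, here is a minimal userspace sketch of the two computations. It is not the kernel code: model_msgmnb() and model_msgmni() are hypothetical names, lowmem is passed in as a byte count instead of being read via si_meminfo(), and the clamping that follows in recompute_msgmni() is left out because the hunk does not show it.

```c
#include <stdio.h>

/* Constants as they appear in the patch (include/linux/msg.h). */
#define MSGMNB        16384
#define MSG_CPU_SCALE 4
#define MSG_MEM_SCALE 32

/*
 * Hypothetical userspace model of recompute_msgmnb():
 * scale MSGMNB with the online CPU count, capped at MSGMNB * MSG_CPU_SCALE.
 */
static unsigned int model_msgmnb(unsigned int online_cpus)
{
	unsigned int scaled = MSGMNB * online_cpus;
	unsigned int cap = MSGMNB * MSG_CPU_SCALE;

	return scaled < cap ? scaled : cap;
}

/*
 * Hypothetical model of the part of recompute_msgmni() visible in the hunk:
 * 1/MSG_MEM_SCALE of lowmem, divided by the per-queue byte limit and by the
 * number of ipc namespaces. Assumption: lowmem is given in bytes here, and
 * the later bounds checks are omitted.
 */
static unsigned long long model_msgmni(unsigned long long lowmem_bytes,
				       unsigned int msgmnb,
				       unsigned int nr_ipc_ns)
{
	unsigned long long allowed = lowmem_bytes / MSG_MEM_SCALE / msgmnb;

	return allowed / nr_ipc_ns;
}
```

With 1 GiB of lowmem and one namespace, a 1-cpu machine keeps msgmnb at 16384, while a machine with 4 or more cpus gets 65536. Because msgmnb is now the divisor, the larger machine ends up with a quarter as many queue identifiers before any clamping, which is the coupling the description mentions.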
On Tue, 24 Jun 2008 11:34:53 +0200
<[email protected]> wrote:
> From: Solofo Ramangalahy <[email protected]>
>
> Initialize msgmnb value to
> min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE)
> to increase the default value for larger machines.
>
> MSG_CPU_SCALE scaling factor is defined to be 4, as 16384 x 4 = 65536
> is an already used and recommended value.
>
> The msgmni value is made dependent on msgmnb to keep the memory
> dedicated to message queues within the 1/MSG_MEM_SCALE of lowmem
> bound.
>
> Unlike msgmni, the value is not scaled (down) with respect to the
> number of ipc namespaces for simplicity.
>
> To disable recomputation when the user explicitly sets a value,
> we reuse the callback defined for msgmni.
>
> As msgmni and msgmnb are correlated, user settings of any of the two
> disable recomputation of both, for now. This is refined in a later
> patch.
>
> When a negative value is put in /proc/sys/kernel/msgmnb
> automatic recomputing is re-enabled.
>
Thanks for taking the time to describe this work so well.
>
> ---
> Documentation/sysctl/kernel.txt | 28 ++++++++++++++++++++++++++++
> include/linux/msg.h | 6 ++++++
> ipc/ipc_sysctl.c | 5 +++--
> ipc/msg.c | 17 +++++++++++++----
> 4 files changed, 50 insertions(+), 6 deletions(-)
>
> Index: b/ipc/msg.c
> ===================================================================
> --- a/ipc/msg.c
> +++ b/ipc/msg.c
> @@ -38,6 +38,7 @@
> #include <linux/rwsem.h>
> #include <linux/nsproxy.h>
> #include <linux/ipc_namespace.h>
> +#include <linux/cpumask.h>
>
> #include <asm/current.h>
> #include <asm/uaccess.h>
> @@ -92,7 +93,7 @@ void recompute_msgmni(struct ipc_namespa
>
> si_meminfo(&i);
> allowed = (((i.totalram - i.totalhigh) / MSG_MEM_SCALE) * i.mem_unit)
> - / MSGMNB;
> + / ns->msg_ctlmnb;
> nb_ns = atomic_read(&nr_ipc_ns);
> allowed /= nb_ns;
>
> @@ -108,11 +109,19 @@ void recompute_msgmni(struct ipc_namespa
>
> ns->msg_ctlmni = allowed;
> }
> +/*
> + * Scale msgmnb with the number of online cpus, up to 4x MSGMNB.
> + */
> +void recompute_msgmnb(struct ipc_namespace *ns)
> +{
> + ns->msg_ctlmnb =
> + min(MSGMNB * num_online_cpus(), MSGMNB * MSG_CPU_SCALE);
> +}
>
> void msg_init_ns(struct ipc_namespace *ns)
> {
> ns->msg_ctlmax = MSGMAX;
> - ns->msg_ctlmnb = MSGMNB;
> + recompute_msgmnb(ns);
>
> recompute_msgmni(ns);
>
> @@ -132,8 +141,8 @@ void __init msg_init(void)
> {
> msg_init_ns(&init_ipc_ns);
>
> - printk(KERN_INFO "msgmni has been set to %d\n",
> - init_ipc_ns.msg_ctlmni);
> + printk(KERN_INFO "msgmni has been set to %d, msgmnb to %d\n",
> + init_ipc_ns.msg_ctlmni, init_ipc_ns.msg_ctlmnb);
>
> ipc_init_proc_interface("sysvipc/msg",
> " key msqid perms cbytes qnum lspid lrpid uid gid cuid cgid stime rtime ctime\n",
> Index: b/include/linux/msg.h
> ===================================================================
> --- a/include/linux/msg.h
> +++ b/include/linux/msg.h
> @@ -58,6 +58,12 @@ struct msginfo {
> * more than 16 GB : msgmni = 32K (IPCMNI)
> */
> #define MSG_MEM_SCALE 32
> +/*
> + * Scaling factor to compute msgmnb: ns->msg_ctlmnb is between MSGMNB
> + * and MSGMNB * MSG_CPU_SCALE. This leads to a max msgmnb value of
> + * 65536 which is an already used and recommended value.
> + */
> +#define MSG_CPU_SCALE 4
>
> #define MSGMNI 16 /* <= IPCMNI */ /* max # of msg queue identifiers */
> #define MSGMAX 8192 /* <= INT_MAX */ /* max size of message (bytes) */
> Index: b/ipc/ipc_sysctl.c
> ===================================================================
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -42,6 +42,7 @@ static void tunable_set_callback(int val
> * Re-enable automatic recomputing only if not already
> * enabled.
> */
> + recompute_msgmnb(current->nsproxy->ipc_ns);
> recompute_msgmni(current->nsproxy->ipc_ns);
> cond_register_ipcns_notifier(current->nsproxy->ipc_ns);
> }
> @@ -210,8 +211,8 @@ static struct ctl_table ipc_kern_table[]
> .data = &init_ipc_ns.msg_ctlmnb,
> .maxlen = sizeof (init_ipc_ns.msg_ctlmnb),
> .mode = 0644,
> - .proc_handler = proc_ipc_dointvec,
> - .strategy = sysctl_ipc_data,
> + .proc_handler = proc_ipc_callback_dointvec,
> + .strategy = sysctl_ipc_registered_data,
> },
> {
> .ctl_name = KERN_SEM,
> Index: b/Documentation/sysctl/kernel.txt
> ===================================================================
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -179,6 +179,34 @@ kernel stack.
>
> ==============================================================
>
> +msgmnb
> +
> +Maximum size in bytes (not in message count) of a single SystemV IPC
> +message queue (b stands for bytes).
> +
> +This value is dynamic and depends on the online cpu count of the
> +machine (taking cpu hotplug into account).
> +
> +Computed values are between MSGMNB and MSGMNB*MSG_CPU_SCALE #define
> +constants (currently [16384,65536]).
> +
> +The exact value is automatically (re)computed, but:
> +. If the value is positioned from user space (via procfs or sysctl()),
> + to a positive value then the automatic recomputation is
> + disabled. This leaves control to user space. E.g.
> +
> + # echo 16384 > /proc/sys/kernel/msgmnb
> +
> +. If the value is positioned from user space to a negative value, then
> + the computation is reenabled. E.g.
> +
> + # echo -1 > /proc/sys/kernel/msgmnb
> +
> +See recompute_msgmnb() function in ipc/ directory for details.
> +The value of msgmnb is coupled with the value of msgmni.
> +
The magical positive-versus-negative number trick is a bit obscure, and
I don't think there's any precedent for it in the kernel ABI (which is
what this is).
Is there anything we can do to reduce the unusualness of this
interface? Say, add a new /proc/sys/kernel/automatic-msgmnb which
contains the automatic scaling and leave /proc/sys/kernel/msgmnb
containing the manual scaling? Or something like that?
Andrew Morton wrote:
> The magical positive-versus-negative number trick is a bit obscure, and
> I don't think there's any precedent for it in the kernel ABI (which is
> what this is).
>
> Is there anything we can do to reduce the unusualness of this
> interface? Say, add a new /proc/sys/kernel/automatic-msgmnb which
> contains the automatic scaling and leave /proc/sys/kernel/msgmnb
> containing the manual scaling? Or something like that?
Well, I plead guilty ;-)
I made this proposal when sending the msgmni scaling patches
(unfortunately my network is down, so I can't look up the reference thread).
From what I have in my folders, here's the complete story:

. January 08: sent the patches
. 02/05/2008: got an answer from Yasunori Goto:

Yasunori Goto wrote:
> Hmmm. I suppose this may be side effect which user does not wish.
>
> I would like to recommend there should be a switch which can turn
> on/off automatic recomputing.
> If user would like to change this value, it should be turned off.
> Otherwise, his request will be rejected with some messages.
>
> Probably, user can understand easier than this side effect.

. 02/11/2008: resent the patches after fixing the issues:

Nadia Derbey wrote:
> Resending the set of patches after Yasunori's remark about being able
> to turn on/off automatic recomputing.
> (see message at http://lkml.org/lkml/2008/2/5/149).
> I actually introduced an intermediate solution: when msgmni is set by
> hand, it is unregistered from the ipcns notifier chain (i.e.
> automatic recomputing is disabled). This corresponds to an implicit
> turn off. Setting it to a negative value makes it registered back in
> the notifier chain (which corresponds to the turn on proposed by
> Yasunori).

And I don't remember anybody complaining about that :-(
Sorry for introducing this "magical positive-vs-negative # trick".
Will think a bit more about your suggestion.

Regards,
Nadia
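
The sign-based behaviour recounted above can be modelled in a few lines of userspace C. This is only a sketch of the semantics as described in the thread, not the sysctl handler itself: struct msg_tunable, write_msgmnb() and cpu_count_changed() are hypothetical names, the auto_recompute flag stands in for being registered on the ipcns notifier chain, and treating zero like a positive value is an assumption (the documentation text only mentions positive and negative values).

```c
#include <stdbool.h>

#define MSGMNB        16384
#define MSG_CPU_SCALE 4

/* Hypothetical stand-in for the per-namespace state touched by the sysctl. */
struct msg_tunable {
	int msgmnb;          /* value exposed through /proc/sys/kernel/msgmnb */
	bool auto_recompute; /* models "registered in the ipcns notifier chain" */
};

/* Automatic value: MSGMNB scaled by online cpus, capped at 4x. */
static int auto_msgmnb(unsigned int online_cpus)
{
	unsigned int factor = online_cpus < MSG_CPU_SCALE ?
			      online_cpus : MSG_CPU_SCALE;

	return (int)(MSGMNB * factor);
}

/* Models a write to the tunable: the sign of the value selects the mode. */
static void write_msgmnb(struct msg_tunable *t, int val,
			 unsigned int online_cpus)
{
	if (val < 0) {
		/* Negative: recompute now and re-enable automatic updates. */
		t->msgmnb = auto_msgmnb(online_cpus);
		t->auto_recompute = true;
	} else {
		/* Assumption: zero is handled like a positive, manual value. */
		t->msgmnb = val;
		t->auto_recompute = false;
	}
}

/* Models a cpu hotplug notification: only acts in automatic mode. */
static void cpu_count_changed(struct msg_tunable *t, unsigned int online_cpus)
{
	if (t->auto_recompute)
		t->msgmnb = auto_msgmnb(online_cpus);
}
```

The sketch makes the objection concrete: the same file carries both a byte limit and a mode switch, and which one a write means depends only on the sign of the number.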
Andrew Morton wrote:
> The magical positive-versus-negative number trick is a bit obscure, and
> I don't think there's any precedent for it in the kernel ABI (which is
> what this is).
>
> Is there anything we can do to reduce the unusualness of this
> interface? Say, add a new /proc/sys/kernel/automatic-msgmnb which
> contains the automatic scaling and leave /proc/sys/kernel/msgmnb
> containing the manual scaling? Or something like that?
Well, I don't know if I understood your proposal correctly: is it one value
in automatic-msgmnb and another one in msgmnb?
I don't clearly see how this could work.

IMHO, we should keep /proc/sys/kernel/msgmnb as a way to externalize the
current tunable value (whether it is automatically recomputed or not).

Also keep the current strategy: as soon as a value is written into that
file, give up on the automatic recomputing.

And use the file you propose as a way to go back and forth between
automatic recomputing and manual setting.

So the process would be the following:
1) kernel boots in "automatic recomputing mode"
   /proc/sys/kernel/msgmnb contains whatever value has been computed
   /proc/sys/kernel/automatic-msgmnb contains "ON"

2) echo <val> > /proc/sys/kernel/msgmnb
   . sets msg_ctlmnb to <val>
   . de-activates automatic recomputing (i.e. if, say, a cpu disappears
     it won't be recomputed anymore)
   . /proc/sys/kernel/automatic-msgmnb now contains "OFF"

   Echoing "OFF" into /proc/sys/kernel/automatic-msgmnb would have the same
   effect (except that msg_ctlmnb's value would stay blocked at its current
   value)

3) echo "ON" > /proc/sys/kernel/automatic-msgmnb
   . recomputes msgmnb's value based on the current available resources
   . re-activates automatic recomputing for msgmnb.

Of course, all this should be applied to msgmni too.
And maybe this automatic-xxx file should be located under sysfs?
   --> create a /sys/kernel/automatic directory and have 1 file per
       tunable to be scaled (who knows, maybe we will add other ones in
       the future?)

Now, maybe this is what you actually proposed and I completely
misunderstood it?

Regards,
Nadia
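
The three-step process proposed above can be sketched as a small userspace state machine. Nothing here exists in the kernel: struct msgmnb_state and the functions boot(), write_msgmnb(), write_automatic() and cpu_count_changed() are hypothetical names chosen for illustration, and the "ON"/"OFF" strings of the proposed automatic-msgmnb file are reduced to a boolean.

```c
#include <stdbool.h>

#define MSGMNB        16384
#define MSG_CPU_SCALE 4

/* Hypothetical model: one value file plus one separate mode file. */
struct msgmnb_state {
	int value;      /* what reading msgmnb would return */
	bool automatic; /* what reading automatic-msgmnb would return (ON/OFF) */
};

/* Automatic value: MSGMNB scaled by online cpus, capped at 4x. */
static int auto_msgmnb(unsigned int online_cpus)
{
	unsigned int factor = online_cpus < MSG_CPU_SCALE ?
			      online_cpus : MSG_CPU_SCALE;

	return (int)(MSGMNB * factor);
}

/* Step 1: boot in automatic recomputing mode. */
static void boot(struct msgmnb_state *s, unsigned int online_cpus)
{
	s->value = auto_msgmnb(online_cpus);
	s->automatic = true;
}

/* Step 2: writing a value sets it and switches automatic mode off. */
static void write_msgmnb(struct msgmnb_state *s, int val)
{
	s->value = val;
	s->automatic = false;
}

/*
 * Step 2 (alternative) and step 3: writing the mode file.
 * OFF freezes the current value; ON recomputes and re-enables updates.
 */
static void write_automatic(struct msgmnb_state *s, bool on,
			    unsigned int online_cpus)
{
	s->automatic = on;
	if (on)
		s->value = auto_msgmnb(online_cpus);
}

/* A cpu appearing or disappearing only matters in automatic mode. */
static void cpu_count_changed(struct msgmnb_state *s, unsigned int online_cpus)
{
	if (s->automatic)
		s->value = auto_msgmnb(online_cpus);
}
```

Compared with the sign trick, every write to the value file here means a byte limit, and the mode is visible by reading the second file instead of being inferred from history.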
On Thu, 26 Jun 2008 16:49:02 +0200 Nadia Derbey <[email protected]> wrote:
> >>+. If the value is positioned from user space to a negative value, then
> >>+ the computation is reenabled. E.g.
> >>+
> >>+ # echo -1 > /proc/sys/kernel/msgmnb
> >>+
> >>+See recompute_msgmnb() function in ipc/ directory for details.
> >>+The value of msgmnb is coupled with the value of msgmni.
> >>+
> >
> >
> > The magical positive-versus-negative number trick is a bit obscure, and
> > I don't think there's any precedent for it in the kernel ABI (which is
> > what this is).
> >
> > Is there anything we can do to reduce the unusualness of this
> > interface? Say, add a new /proc/sys/kernel/automatic-msgmnb which
> > contains the automatic scaling and leave /proc/sys/kernel/msgmnb
> > containing the manual scaling? Or something like that?
>
> Well, I don't know if I understood your proposal correctly: is it one value
> in automatic-msgmnb and another one in msgmnb?
> I don't clearly see how this could work.
>
> IMHO, we should keep /proc/sys/kernel/msgmnb as a way to externalize the
> current tunable value (whether it is automatically recomputed or not).
>
> Also keep the current strategy: as soon as a value is written into that
> file, give up on the automatic recomputing.
>
> And use the file you propose as a way to go back and forth between
> automatic recomputing and manual setting.
>
> So the process would be the following:
> 1) kernel boots in "automatic recomputing mode"
>    /proc/sys/kernel/msgmnb contains whatever value has been computed
>    /proc/sys/kernel/automatic-msgmnb contains "ON"
>
> 2) echo <val> > /proc/sys/kernel/msgmnb
>    . sets msg_ctlmnb to <val>
>    . de-activates automatic recomputing (i.e. if, say, a cpu disappears
>      it won't be recomputed anymore)
>    . /proc/sys/kernel/automatic-msgmnb now contains "OFF"
>
>    Echoing "OFF" into /proc/sys/kernel/automatic-msgmnb would have the same
>    effect (except that msg_ctlmnb's value would stay blocked at its current
>    value)
>
> 3) echo "ON" > /proc/sys/kernel/automatic-msgmnb
>    . recomputes msgmnb's value based on the current available resources
>    . re-activates automatic recomputing for msgmnb.
>
> Of course, all this should be applied to msgmni too.
> And maybe this automatic-xxx file should be located under sysfs?
>    --> create a /sys/kernel/automatic directory and have 1 file per
>        tunable to be scaled (who knows, maybe we will add other ones in
>        the future?)
>
> Now, maybe this is what you actually proposed and I completely
> misunderstood it?
>
I don't know what I proposed, sorry ;) I didn't think about it very hard.
But the positive-values-mean-one-thing/negative-values-mean-another-thing
trick is unusual and rather unpleasing. I was hoping you guys could come up
with a cleaner interface.