2007-01-16 06:27:51

by Nadia Derbey

Subject: [RFC][PATCH 1/6] Tunable structure and registration routines

[PATCH 01/06]

Defines the auto_tune structure: this is the structure that contains the
information needed by the adjustment routine for a given tunable.
Also defines the registration routines.

The fork kernel component defines a tunable structure for the threads-max
tunable and registers it.


Signed-off-by: Nadia Derbey <[email protected]>


---
Documentation/00-INDEX | 2
Documentation/auto_tune.txt | 333 ++++++++++++++++++++++++++++++++++++++++++++
fs/Kconfig | 2
include/linux/akt.h | 186 ++++++++++++++++++++++++
include/linux/akt_ops.h | 186 ++++++++++++++++++++++++
init/main.c | 2
kernel/Makefile | 1
kernel/autotune/Kconfig | 30 +++
kernel/autotune/Makefile | 7
kernel/autotune/akt.c | 123 ++++++++++++++++
kernel/fork.c | 18 ++
11 files changed, 890 insertions(+)

Index: linux-2.6.20-rc4/Documentation/00-INDEX
===================================================================
--- linux-2.6.20-rc4.orig/Documentation/00-INDEX 2007-01-15 13:08:13.000000000 +0100
+++ linux-2.6.20-rc4/Documentation/00-INDEX 2007-01-15 14:17:22.000000000 +0100
@@ -52,6 +52,8 @@ applying-patches.txt
- description of various trees and how to apply their patches.
arm/
- directory with info about Linux on the ARM architecture.
+auto_tune.txt
+ - info on the Automatic Kernel Tunables (AKT) feature.
basic_profiling.txt
- basic instructions for those who wants to profile Linux kernel.
binfmt_misc.txt
Index: linux-2.6.20-rc4/Documentation/auto_tune.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/Documentation/auto_tune.txt 2007-01-15 14:19:18.000000000 +0100
@@ -0,0 +1,333 @@
+ Automatic Kernel Tunables
+ =========================
+
+ Nadia Derbey ([email protected])
+
+
+
+This feature aims at making the kernel automatically change the tunables
+values as it sees resources running out.
+
+The AKT framework is made of 2 parts:
+
+1) Kernel part:
+Interfaces are provided to the kernel subsystems, to (un)register the
+tunables that might be automatically tuned in the future.
+
+Registering a tunable consists in the following steps:
+- a structure is declared and filled by the kernel subsystem for the
+registered tunable
+- that tunable structure is registered into sysfs
+
+Registration should be done during the kernel subsystem initialization step.
+
+Unregistering a tunable is the reverse operation. It should not be necessary
+for the kernel subsystems: it is only useful when unloading modules that would
+have registered a tunable during their loading step.
+
+The routine interfaces are the following:
+
+1.1) Declaring a tunable:
+
+A tunable structure should be declared and defined by the kernel subsystems as
+follows:
+
+DEFINE_TUNABLE(structure_name, threshold, min, max,
+ tunable_variable_ptr, checked_variable_ptr,
+ tunable_variable_type);
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+- threshold: percentage to apply to the tunable value to detect if adjustment
+is needed
+
+- min: minimum value the tunable can ever reach (needed when adjusting down
+the tunable)
+
+- max: maximum value the tunable can ever reach (needed when adjusting up the
+tunable)
+
+- tunable_variable_ptr: address of the tunable that will be adjusted if
+needed.
+(ex: in kernel/fork.c it is max_threads's address)
+
+- checked_variable_ptr: address of the variable that is controlled by the
+tunable. This is the calling subsystem's object counter.
+(ex: in kernel/fork.c it is nr_threads's address: nr_threads should
+always remain < max_threads)
+
+- tunable_variable_type: this type is important since it helps choosing the
+appropriate automatic tuning routine.
+It can be one of short / ushort / int / uint / size_t / long / ulong
+
+The automatic tuning routine (i.e. the routine that should be called when
+automatic tuning is activated) is set to the default one:
+default_auto_tuning_<type>().
+<type> is chosen according to the tunable_variable_type parameters.
+All the previously listed parameters are useful to this routine.
+Refer to the description of the automatic adjustment routine to see how
+these parameters are actually used.
+
+Refer to "Updating the auto-tuning function pointer" to know how to set
+this routine to another one.
+
+
+1.2) Updating a tunable's characteristics
+
+1.2.1) Updating min / max values:
+
+Sometimes, when calling DEFINE_TUNABLE(), the min and max values are not
+exactly known, yet. In that case, the following routine should be called
+once these values are known:
+
+set_tunable_min_max(structure_name, new_min, new_max, tunable_variable_type)
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+- new_min: minimum value the tunable can ever reach
+
+- new_max: maximum value the tunable can ever reach
+
+1.2.2) Updating the auto-tuning function pointer:
+
+If the default auto-tuning routine doesn't fit your needs, you can define
+another one and associate it to the tunable using the following routine:
+
+set_autotuning_routine(structure_name, auto_tune)
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+- auto_tune: routine that should be called when automatic tuning is activated.
+If this parameter is not NULL, it should be set to a function pointer defined
+by the kernel subsystem caller. See 1.5) for the routine prototype. See also
+maxfiles_auto_tuning() in fs/file_table.c for an example.
+
+
+1.3) Registering a tunable:
+
+Once declared and its min / max / auto_tuning routine updated, the tunable
+structure should be registered using the following routine:
+
+int register_tunable(struct auto_tune *tunable_addr);
+
+Parameters:
+- tunable_addr: address of the tunable structure previously declared.
+
+Return value:
+- 0 : successful
+- < 0 : failure
+
+
+Registering a tunable makes it potentially automatically adjustable:
+the tunable is viewed as a kobject with 3 attributes (i.e. 3 files at sysfs
+level):
+- autotune (rw): (de)activates auto tuning for that tunable
+- min (rw): sets the tunable's minimum value
+- max (rw): sets the tunable's maximum value
+
+The only way to make a registered tunable automatically adjustable is through
+sysfs (see the sysfs part for more details).
+
+
+
+1.4) Unregistering a tunable:
+
+int unregister_tunable(struct auto_tune *reg_tun_addr);
+
+Parameters:
+- reg_tun_addr: address of the tunable structure to unregister
+
+
+This routine is only useful for modules: when unloading, they should
+unregister any previously registered tunable.
+
+
+
+1.5) Automatic tuning routine:
+
+The 2nd main service provided by the kernel part is a function pointer
+(auto_tune_func): it points to the routine that actually automatically
+adjusts the tunable passed in as a parameter.
+
+This is accomplished by one of the following:
+- if an automatic tuning routine has been provided during the tunable
+declaration, that routine will actually be called.
+- if no automatic tuning routine has been provided, the default one is called.
+NOTE: it can process one of the following types, depending on the type used
+ when declaring the tunable (see DEFINE_TUNABLE above): short, ushort,
+ int, uint, size_t, long, ulong.
+
+
+If the automatic tuning routine is provided by the kernel subsystem caller,
+it should be declared as follows:
+
+int <routine_name>(int cmd, struct auto_tune *params);
+
+Parameters:
+- cmd: tuning direction
+ . AKT_UP: the tunable will be adjusted upwards (i.e. its value is
+ increased if needed)
+ . AKT_DOWN: the tunable is adjusted downwards (i.e. its value is
+ decreased if needed)
+- params: pointer to the previously registered tunable structure
+
+
+Any kernel subsystem that has registered a tunable should call
+auto_tune_func() as follows:
+
++-------------------------+--------------------------------------------+
+| Step                    | Routine to call                            |
++-------------------------+--------------------------------------------+
+| Declaration phase       | DEFINE_TUNABLE(name, values...);           |
++-------------------------+--------------------------------------------+
+| Initialization routine  | set_tunable_min_max(name, min, max, type); |
+|                         | set_autotuning_routine(name, routine);     |
+|                         | register_tunable(&name);                   |
+| Note: the 1st 2 calls   |                                            |
+| are optional            |                                            |
++-------------------------+--------------------------------------------+
+| Alloc                   | activate_auto_tuning(AKT_UP, &name);       |
++-------------------------+--------------------------------------------+
+| Free                    | activate_auto_tuning(AKT_DOWN, &name);     |
++-------------------------+--------------------------------------------+
+| module_exit() routine   | unregister_tunable(&name);                 |
++-------------------------+--------------------------------------------+
+
+activate_auto_tuning is a static inline defined in akt.h, that does the
+following:
+. if <tunable is registered> and <auto tuning is allowed for tunable>
+. call the routine stored in tunable->auto_tune
+
+
+The effect of the default automatic tuning routine is the following:
+
+           +----------------------------------------------------------------+
+           |                Tunable automatically adjustable                |
+           +---------------+------------------------------------------------+
+           |      NO       |                      YES                       |
++----------+---------------+------------------------------------------------+
+| AKT_UP   | No effect     | If the tunable value exceeds the specified     |
+|          |               | threshold, that value is increased up to a     |
+|          |               | maximum value.                                 |
+|          |               | The maximum value is specified during the      |
+|          |               | tunable declaration and can be changed at any  |
+|          |               | time through sysfs                             |
++----------+---------------+------------------------------------------------+
+| AKT_DOWN | No effect     | If the tunable value falls under the specified |
+|          |               | threshold, that value is decreased down to a   |
+|          |               | minimum value.                                 |
+|          |               | The minimum value is specified during the      |
+|          |               | tunable declaration and can be changed at any  |
+|          |               | time through sysfs                             |
++----------+---------------+------------------------------------------------+
+
+
+1.6. Default automatic adjustment routine
+
+The last service provided by AKT at the kernel level is the default automatic
+adjustment routine. As seen above, this routine supports various tunable
+types. It works as follows (only the AKT_UP direction is described here -
+AKT_DOWN does the reverse operation):
+
+The 2nd parameter passed in to this routine is a pointer to a previously
+registered tunable structure. That structure contains the following fields (see
+1.1 for the detailed description):
+- threshold
+- key
+- min
+- max
+- tunable
+- checked
+
+When this routine is entered, it does the following:
+1. <*checked> is compared to <*tunable> * threshold / 100
+2. if <*checked> is greater or equal, <*tunable> is set to:
+ <*tunable> + (<*tunable> * (100 - threshold) / 100), capped at max
+
+
+
+1.6) akt and sysfs:
+
+AKT uses sysfs to enable the tunables management from the user world (mainly
+making them automatic or manual).
+
+akt uses sysfs in the following way:
+- a tunables subsystem (tunables_subsys) is declared and registered during akt
+initialization.
+- registering a tunable is equivalent to registering the corresponding kobject
+within that subsystem.
+- each tunable kobject has 3 associated attributes, all with a RW mode (i.e.
+the show() and store() methods are provided for them):
+ . autotune: (de)activates automatic tuning for the tunable
+ . max: sets a new maximum value for the tunable
+ . min: sets a new minimum value for the tunable
+
+
+1.7) tunables that are namespace dependent
+
+In this paragraph, the particular case of tunables that are namespace
+dependent is presented.
+
+1.7.1) Declaring a tunable:
+
+The tunable structure for such tunables should be declared in the namespace
+structure that contains the associated tunable (ex: the tunable structure for
+msg_ctlmni should be declared in the ipc_namespace structure).
+
+The tunable structure should be declared as follows:
+
+DECLARE_TUNABLE(structure_name);
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+1.7.2) Initializing the tunable structure
+
+Then the tunable structure should be initialized by calling the following
+routine:
+
+init_tunable_ipcns(namespace_ptr, structure_name, threshold, min, max,
+ tunable_variable_ptr, checked_variable_ptr,
+ tunable_variable_type);
+
+Parameters:
+- namespace_ptr: pointer to the namespace the tunable belongs to.
+
+See DEFINE_TUNABLE for the other parameters
+
+1.7.3) Registering the tunable structure
+
+register_tunable should be called, giving it the tunable structure address
+that belongs to the init namespace.
+
+This applies to activate_auto_tuning too.
+
+All the routines that show/store attributes or that do the auto tuning are
+namespace dependent.
+
+
+2) User part:
+
+As seen above, the only way to activate automatic tuning is from user side:
+- the directory /sys/tunables is created during the init phase.
+- each time a tunable is registered by a kernel subsystem, a directory is
+created for it under /sys/tunables.
+- This directory contains 1 file for each tunable kobject attribute:
++-----------+---------------+-------------------+----------------------------+
+| attribute | default value | how to set it     | effect                     |
++-----------+---------------+-------------------+----------------------------+
+| autotune  | 0             | echo 1 > autotune | makes the tunable automatic|
+|           |               | echo 0 > autotune | makes the tunable manual   |
++-----------+---------------+-------------------+----------------------------+
+| max       | max value set | echo <M> > max    | sets the tunable max value |
+|           | during tunable|                   | to <M>                     |
+|           | definition    |                   |                            |
++-----------+---------------+-------------------+----------------------------+
+| min       | min value set | echo <m> > min    | sets the tunable min value |
+|           | during tunable|                   | to <m>                     |
+|           | definition    |                   |                            |
++-----------+---------------+-------------------+----------------------------+
+
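A note on the user part documented in auto_tune.txt above: the sysfs writes
shown in the table can also be issued from a program. The following is a
minimal userspace C sketch, not part of the patch. The helper names
write_tunable_attr() and set_autotune() are made up for illustration, and the
/sys/tunables/<tunable_name> layout is taken from the documentation text.
register_tunable() in this patch only validates the structure, so the files
presumably appear with a later patch in the series.

```c
#include <stdio.h>

/*
 * Write a short string to <dir>/<attr>, the way "echo 1 > autotune" would.
 * 'dir' is the tunable's directory, e.g. "/sys/tunables/<tunable_name>".
 * Returns 0 on success, -1 on failure.
 * (Hypothetical helper, not an API provided by the patch.)
 */
int write_tunable_attr(const char *dir, const char *attr, const char *value)
{
	char path[512];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path would have been truncated */

	f = fopen(path, "w");
	if (f == NULL)
		return -1;

	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a failed write shows up here */
	return fclose(f) == 0 ? 0 : -1;
}

/* Make one tunable automatic (enable != 0) or manual (enable == 0). */
int set_autotune(const char *dir, int enable)
{
	return write_tunable_attr(dir, "autotune", enable ? "1" : "0");
}
```

The same write_tunable_attr() call would cover the min and max attributes,
passing the new bound as a decimal string.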
Index: linux-2.6.20-rc4/fs/Kconfig
===================================================================
--- linux-2.6.20-rc4.orig/fs/Kconfig 2007-01-15 13:08:14.000000000 +0100
+++ linux-2.6.20-rc4/fs/Kconfig 2007-01-15 14:20:20.000000000 +0100
@@ -925,6 +925,8 @@ config PROC_KCORE
bool "/proc/kcore support" if !ARM
depends on PROC_FS && MMU

+source "kernel/autotune/Kconfig"
+
config PROC_VMCORE
bool "/proc/vmcore support (EXPERIMENTAL)"
depends on PROC_FS && EXPERIMENTAL && CRASH_DUMP
Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/akt.h 2007-01-15 14:26:24.000000000 +0100
@@ -0,0 +1,186 @@
+/*
+ * linux/include/akt.h
+ *
+ * Automatic Kernel Tunables support for Linux.
+ * This file contains structures definitions and prototypes needed for AKT
+ * support.
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef AKT_H
+#define AKT_H
+
+#include <linux/types.h>
+#include <linux/kobject.h>
+
+
+
+/*
+ * First parameter passed to the adjustment routine
+ */
+#define AKT_UP 0 /* adjustment "up" */
+#define AKT_DOWN 1 /* adjustment "down" */
+
+
+struct auto_tune;
+/*
+ * Automatic adjustment routine.
+ * Returns 0, if the tunable value has not been changed, 1 else
+ */
+typedef int (*auto_tune_fn)(int, struct auto_tune *);
+
+
+/*
+ * Structure used to describe the min / max values for a tunable inside the
+ * auto_tune structure.
+ * These values are type dependent and are used as high / low boundaries when
+ * tuning up or down.
+ * The type is known when the tunable is defined (see DEFINE_TUNABLE macro).
+ */
+struct typed_value {
+ union {
+ short val_short;
+ ushort val_ushort;
+ int val_int;
+ uint val_uint;
+ size_t val_size_t;
+ long val_long;
+ ulong val_ulong;
+ } value;
+};
+
+
+
+/*
+ * This is the structure that describes a tunable. One of these structures is
+ * allocated for each registered tunable, and the associated kobject exported
+ * via sysfs.
+ *
+ * The structure lock (tunable_lck) protects
+ * against concurrent accesses to tunable and checked pointers
+ *
+ * A pointer to this structure is passed in to the automatic adjustment
+ * routine.
+ * automatic adjustment principle is the following:
+ * AKT_UP:
+ * 1. *checked is compared to *tunable * threshold
+ * 2. if *checked is greater, the tunable is adjusted up
+ * AKT_DOWN: reverse operation
+ */
+struct auto_tune {
+ spinlock_t tunable_lck; /* serializes access to the structure fields */
+ auto_tune_fn auto_tune; /* auto tuning routine registered by the */
+ /* calling kernel subsystem. If NULL, the */
+ /* auto tuning routine that will be called */
+ /* is the default one that processes uints */
+ int (*check_parms)(struct auto_tune *); /* min / max checking */
+ /* routine ptr: points to */
+ /* the appropriate routine */
+ /* depending on the */
+ /* tunable type */
+ const char *name;
+ char flags; /* Only 2 bits are meaningful: */
+ /* bit 0: set to 1 if the associated tunable can */
+ /* be automatically adjusted */
+ /* bit 1: set to 1 if the tunable has been */
+ /* registered */
+ /* bits 2-7: unused */
+ char threshold; /* threshold to enable the adjustment expressed as */
+ /* a %age */
+ struct typed_value min; /* min value the tunable can ever reach */
+ /* and associated show / store routines) */
+ struct typed_value max; /* max value the tunable can ever reach */
+ /* and associated show / store routines) */
+ void *tunable; /* address of the tunable to adjust */
+ void *checked; /* address of the variable that is controlled by */
+ /* the tunable. This is the calling subsystem's */
+ /* object counter */
+};
+
+
+/*
+ * Flags for a registered tunable
+ */
+#define TUNABLE_REGISTERED 0x02
+
+
+/*
+ * When calling this routine the tunable lock should be held
+ */
+static inline int is_tunable_registered(struct auto_tune *tunable)
+{
+ return (tunable->flags & TUNABLE_REGISTERED) == TUNABLE_REGISTERED;
+}
+
+
+#ifdef CONFIG_AKT
+
+
+
+#define TUNABLE_INIT(_name, _thresh, _min, _max, _tun, _chk, type) \
+ { \
+ .tunable_lck = SPIN_LOCK_UNLOCKED, \
+ .auto_tune = default_auto_tuning_##type, \
+ .check_parms = check_parms_##type, \
+ .name = (_name), \
+ .flags = 0, \
+ .threshold = (_thresh), \
+ .min = { \
+ .value = { .val_##type = (_min), }, \
+ }, \
+ .max = { \
+ .value = { .val_##type = (_max), }, \
+ }, \
+ .tunable = (_tun), \
+ .checked = (_chk), \
+ }
+
+
+#define DEFINE_TUNABLE(s, thr, min, max, tun, chk, type) \
+ struct auto_tune s = TUNABLE_INIT(#s, thr, min, max, tun, chk, type)
+
+#define set_tunable_min_max(s, _min, _max, type) \
+ do { \
+ (s).min.value.val_##type = _min; \
+ (s).max.value.val_##type = _max; \
+ } while (0)
+
+
+
+extern int register_tunable(struct auto_tune *);
+extern int unregister_tunable(struct auto_tune *);
+
+
+#else /* CONFIG_AKT */
+
+
+#define DEFINE_TUNABLE(s, thresh, min, max, tun, chk, type)
+#define set_tunable_min_max(s, min, max, type) do { } while (0)
+
+
+#define register_tunable(a) 0
+#define unregister_tunable(a) 0
+
+
+#endif /* CONFIG_AKT */
+
+extern void fork_late_init(void);
+
+#endif /* AKT_H */
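A note on the val_##type idiom used by TUNABLE_INIT and set_tunable_min_max in
akt.h above: the type argument is pasted onto "val_" by the preprocessor to
select the union member. The sketch below is a cut-down userspace illustration
with only two members. SET_TYPED, GET_TYPED and the roundtrip_*() helpers are
invented names, not part of the patch.

```c
/* Cut-down copy of the union from akt.h, keeping only two members. */
struct typed_value {
	union {
		int val_int;
		long val_long;
	} value;
};

/* Same idea as set_tunable_min_max(): 'type' picks the union member. */
#define SET_TYPED(tv, v, type)	((tv).value.val_##type = (v))
#define GET_TYPED(tv, type)	((tv).value.val_##type)

/* Store a bound as an int and read it back through the same member. */
int roundtrip_int(int v)
{
	struct typed_value tv;

	SET_TYPED(tv, v, int);		/* expands to tv.value.val_int = v */
	return GET_TYPED(tv, int);	/* expands to tv.value.val_int */
}

/* Same round trip through the long member. */
long roundtrip_long(long v)
{
	struct typed_value tv;

	SET_TYPED(tv, v, long);		/* expands to tv.value.val_long = v */
	return GET_TYPED(tv, long);
}
```

Token pasting needs the type to be a single token, which is presumably why the
header relies on the ushort / uint / ulong typedefs rather than spelling out
"unsigned long".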
Index: linux-2.6.20-rc4/include/linux/akt_ops.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/akt_ops.h 2007-01-15 14:28:16.000000000 +0100
@@ -0,0 +1,186 @@
+/*
+ * linux/include/akt_ops.h
+ *
+ * Automatic Kernel Tunables support for Linux.
+ * This file contains the definitions for the type dependent routines
+ * needed for AKT support.
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef AKT_OPS_H
+#define AKT_OPS_H
+
+#include <linux/errno.h>
+
+
+/*
+ * Checks that min and max values are coherent
+ * Called by register_tunable()
+ * Type independent - can be one of short / ushort / int / uint / long /
+ * ulong / size_t
+ */
+#define __check_parms(p, type) \
+( { \
+ int __rc; \
+ type _min = p->min.value.val_##type; \
+ type _max = p->max.value.val_##type; \
+ \
+ if (_min > _max) \
+ __rc = 1; \
+ else \
+ __rc = 0; \
+ __rc; \
+} )
+
+static inline int check_parms_short(struct auto_tune *p)
+{
+ return __check_parms(p, short);
+}
+
+static inline int check_parms_ushort(struct auto_tune *p)
+{
+ return __check_parms(p, ushort);
+}
+
+static inline int check_parms_int(struct auto_tune *p)
+{
+ return __check_parms(p, int);
+}
+
+static inline int check_parms_uint(struct auto_tune *p)
+{
+ return __check_parms(p, uint);
+}
+
+static inline int check_parms_size_t(struct auto_tune *p)
+{
+ return __check_parms(p, size_t);
+}
+
+static inline int check_parms_long(struct auto_tune *p)
+{
+ return __check_parms(p, long);
+}
+
+static inline int check_parms_ulong(struct auto_tune *p)
+{
+ return __check_parms(p, ulong);
+}
+
+
+/*
+ * FUNCTION: This is the routine called to accomplish auto tuning if none
+ * has been specified for a tunable.
+ * It can be called by any kernel subsystem that is allocating or
+ * freeing an object whose maximum value is controlled by a
+ * tunable.
+ * ex: max # of semaphore ids is controlled by sc_semmni
+ * ==> this routine might be called by sys_semget() to "adjust up"
+ * and by semctl_down() to "adjust down"
+ *
+ * Upwards adjustment:
+ * Adjustment is needed if the checked variable has reached
+ * (threshold / 100 * tunable)
+ * In that case, tunable is set to
+ * (tunable + tunable * (100 - threshold) / 100)
+ *
+ * Downwards adjustment:
+ * Adjustment is needed if the checked variable has fallen
+ * under (threshold / 100 * tunable previous value)
+ * In that case tunable is set back to its previous value,
+ * i.e. to (tunable * 100 / (200 - threshold))
+ *
+ * PARAMETERS: direction: controls the adjustment direction (up / down)
+ * p: pointer to the registered tunable structure
+ *
+ * EXECUTION ENVIRONMENT: This routine should be called with the
+ * p->tunable_lck lock held
+ *
+ * Type independent - can be one of short / ushort / int / uint / long /
+ * ulong / size_t
+ *
+ * RETURN VALUE: 1 if tunable has been adjusted
+ * 0 else
+ */
+#define __default_auto_tuning(direction, p, type) \
+( { \
+ int __rc; \
+ type _chk = *((type *) p->checked); \
+ type _tun = *((type *) p->tunable); \
+ type _thr = (type) p->threshold; \
+ type _min = (type) p->min.value.val_##type; \
+ type _max = (type) p->max.value.val_##type; \
+ \
+ if (direction == AKT_UP) { \
+ if ((_chk >= (_tun * _thr) / 100) && (_tun < _max)) { \
+ type ___x = (_tun * (200 - _thr)) / 100; \
+ *((type *) p->tunable) = min(_max, ___x); \
+ __rc = 1; \
+ } else \
+ __rc = 0; \
+ } else { \
+ if ((_chk < (_tun * _thr) / (200 - _thr)) && (_tun>_min)) { \
+ type ___x = (_tun * 100) / (200 - _thr); \
+ *((type *) p->tunable) = max(_min, ___x); \
+ __rc = 1; \
+ } else \
+ __rc = 0; \
+ } \
+ __rc; \
+} )
+
+static inline int default_auto_tuning_short(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, short);
+}
+
+static inline int default_auto_tuning_ushort(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, ushort);
+}
+
+static inline int default_auto_tuning_int(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, int);
+}
+
+static inline int default_auto_tuning_uint(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, uint);
+}
+
+static inline int default_auto_tuning_size_t(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, size_t);
+}
+
+static inline int default_auto_tuning_long(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, long);
+}
+
+static inline int default_auto_tuning_ulong(int dir, struct auto_tune *p)
+{
+ return __default_auto_tuning(dir, p, ulong);
+}
+
+
+
+#endif /* AKT_OPS_H */
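A worked example of the arithmetic in __default_auto_tuning() above may help
when reviewing it. The following userspace sketch mirrors the int variant of
the macro, without the spinlock and without the kernel's min()/max() helpers.
The struct tun_model type and the function name model_auto_tuning_int() are
assumptions made for this illustration only.

```c
#include <assert.h>

#define AKT_UP   0	/* adjustment "up", same value as in akt.h */
#define AKT_DOWN 1	/* adjustment "down", same value as in akt.h */

/* Userspace stand-in for the int-typed fields of struct auto_tune. */
struct tun_model {
	int threshold;	/* percentage, expected in [1-99] */
	int min;	/* lowest value the tunable may take */
	int max;	/* highest value the tunable may take */
	int *tunable;	/* the limit being adjusted */
	int *checked;	/* the object counter bounded by *tunable */
};

/* Mirrors __default_auto_tuning(direction, p, int); returns 1 if changed. */
int model_auto_tuning_int(int direction, struct tun_model *p)
{
	int chk = *p->checked;
	int tun = *p->tunable;
	int thr = p->threshold;
	int x;

	assert(thr > 0 && thr < 100);

	if (direction == AKT_UP) {
		if (chk >= (tun * thr) / 100 && tun < p->max) {
			/* grow by (100 - thr) percent, capped at max */
			x = (tun * (200 - thr)) / 100;
			*p->tunable = x < p->max ? x : p->max;
			return 1;
		}
		return 0;
	}

	if (chk < (tun * thr) / (200 - thr) && tun > p->min) {
		/* undo one upward step, floored at min */
		x = (tun * 100) / (200 - thr);
		*p->tunable = x > p->min ? x : p->min;
		return 1;
	}
	return 0;
}
```

With threshold = 80, a limit of 100 and 80 objects in use, an AKT_UP call
raises the limit to 100 * 120 / 100 = 120. If usage then drops to 79, which is
below 120 * 80 / 120 = 80, an AKT_DOWN call brings the limit back to
120 * 100 / 120 = 100.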
Index: linux-2.6.20-rc4/init/main.c
===================================================================
--- linux-2.6.20-rc4.orig/init/main.c 2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/init/main.c 2007-01-15 14:29:17.000000000 +0100
@@ -54,6 +54,7 @@
#include <linux/pid_namespace.h>
#include <linux/compile.h>
#include <linux/device.h>
+#include <linux/akt.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -613,6 +614,7 @@ asmlinkage void __init start_kernel(void
signals_init();
/* rootfs populating might need page-writeback */
page_writeback_init();
+ fork_late_init();
#ifdef CONFIG_PROC_FS
proc_root_init();
#endif
Index: linux-2.6.20-rc4/kernel/Makefile
===================================================================
--- linux-2.6.20-rc4.orig/kernel/Makefile 2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/kernel/Makefile 2007-01-15 14:30:43.000000000 +0100
@@ -50,6 +50,7 @@ obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_AKT) += autotune/

ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
Index: linux-2.6.20-rc4/kernel/autotune/Kconfig
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/Kconfig 2007-01-15 14:31:25.000000000 +0100
@@ -0,0 +1,30 @@
+#
+# Automatic Kernel Tunables
+#
+
+menu "Automatic Kernel Tunables"
+
+config AKT
+ bool "Automatic kernel tunable (kernel support)"
+ depends on PROC_FS && SYSFS
+ help
+ This is a functionality that enables automatic adjustment of kernel
+ tunables: when this feature is enabled the kernel can automatically
+ change the tunables values as it sees resources running out.
+
+ The list of kernel tunables that can potentially be automatically
+ adjusted can be found under /sys/tunables.
+
+ In order to make a tunable actually automatic, issue the following
+ command:
+ echo 1 > /sys/tunables/<tunable_name>/autotune
+
+ In order to make it manual, issue the following command:
+ echo 0 > /sys/tunables/<tunable_name>/autotune
+
+ See Documentation/auto_tune.txt for more details.
+
+ If unsure, say N.
+
+endmenu
+
Index: linux-2.6.20-rc4/kernel/autotune/Makefile
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/Makefile 2007-01-15 14:31:57.000000000 +0100
@@ -0,0 +1,7 @@
+#
+# Makefile for akt
+#
+
+obj-y := akt.o
+
+
Index: linux-2.6.20-rc4/kernel/autotune/akt.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/akt.c 2007-01-15 14:51:54.000000000 +0100
@@ -0,0 +1,123 @@
+/*
+ * linux/kernel/autotune/akt.c
+ *
+ * Automatic Kernel Tunables for Linux - Kernel support
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/*
+ * FUNCTIONS:
+ * register_tunable (exported)
+ * unregister_tunable (exported)
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/akt.h>
+
+
+
+
+
+
+
+/*
+ * FUNCTION: Inserts a tunable structure into sysfs
+ * This routine serves also as a checker for the tunable
+ * structure fields.
+ * This routine is called by any kernel subsystem that wants to
+ * use akt services (automatic tunables adjustment) in the future
+ *
+ * NOTE: when calling this routine, the tunable structure should have already
+ * been filled by defining it with DEFINE_TUNABLE()
+ *
+ * RETURN VALUE: 0: successful
+ * <0 if failure
+ */
+int register_tunable(struct auto_tune *tun)
+{
+ if (tun == NULL) {
+ printk(KERN_ERR "\tBad tunable structure pointer (NULL)\n");
+ return -EINVAL;
+ }
+
+ if (tun->threshold <= 0 || tun->threshold >= 100) {
+ printk(KERN_ERR "\tBad threshold (%d) value "
+ "- should be in the [1-99] interval\n",
+ tun->threshold);
+ return -EINVAL;
+ }
+
+ if (tun->tunable == NULL) {
+ printk(KERN_ERR "\tBad tunable pointer (NULL)\n");
+ return -EINVAL;
+ }
+
+ if (tun->checked == NULL) {
+ printk(KERN_ERR "\tBad checked value pointer (NULL)\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Check the min / max value
+ */
+ if (tun->check_parms(tun)) {
+ printk(KERN_ERR "\tBad min / max values\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+
+/*
+ * FUNCTION: Removes a tunable structure from sysfs.
+ * This routine is called by any kernel subsystem that doesn't
+ * need the akt services anymore
+ *
+ * NOTE: reg_tun should point to a previously registered tunable
+ *
+ * RETURN VALUE: 0: successful
+ * <0 if failure
+ */
+int unregister_tunable(struct auto_tune *reg_tun)
+{
+ if (reg_tun == NULL) {
+ printk(KERN_ERR "\tBad tunable address (NULL)\n");
+ return -EINVAL;
+ }
+
+ spin_lock(&reg_tun->tunable_lck);
+
+ BUG_ON(!is_tunable_registered(reg_tun));
+
+ reg_tun->flags = 0;
+
+ spin_unlock(&reg_tun->tunable_lck);
+
+ return 0;
+}
+
+
+
+
+EXPORT_SYMBOL_GPL(register_tunable);
+EXPORT_SYMBOL_GPL(unregister_tunable);
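For callers, the checks performed by register_tunable() above boil down to
four conditions. The userspace sketch below restates them for an int-typed
tunable; struct reg_model and model_register_check() are hypothetical names
used only for this illustration, and the printk() diagnostics are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the fields register_tunable() looks at. */
struct reg_model {
	int threshold;	/* percentage */
	int min;
	int max;
	int *tunable;
	int *checked;
};

/* Returns 0 if the structure would pass the checks, -EINVAL otherwise. */
int model_register_check(const struct reg_model *t)
{
	if (t == NULL)
		return -EINVAL;
	if (t->threshold <= 0 || t->threshold >= 100)
		return -EINVAL;		/* must be in the [1-99] interval */
	if (t->tunable == NULL || t->checked == NULL)
		return -EINVAL;
	if (t->min > t->max)
		return -EINVAL;		/* what check_parms_int() rejects */
	return 0;
}
```

Note that min == max is accepted, so a tunable defined with 0 / 0 bounds, as
max_threads_akt is in kernel/fork.c, passes the range check even before
set_tunable_min_max() has been called.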
Index: linux-2.6.20-rc4/kernel/fork.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/fork.c 2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/kernel/fork.c 2007-01-15 14:36:48.000000000 +0100
@@ -49,6 +49,8 @@
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
+#include <linux/akt.h>
+#include <linux/akt_ops.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -65,6 +67,13 @@ int nr_threads; /* The idle threads do

int max_threads; /* tunable limit on nr_threads */

+#define THREADTHRESH 80
+/*
+ * The actual values for min and max will be known during fork_init
+ */
+DEFINE_TUNABLE(max_threads_akt, THREADTHRESH, 0, 0, &max_threads,
+ &nr_threads, int);
+
DEFINE_PER_CPU(unsigned long, process_counts) = 0;

__cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
@@ -152,12 +161,21 @@ void __init fork_init(unsigned long memp
if(max_threads < 20)
max_threads = 20;

+ set_tunable_min_max(max_threads_akt, max_threads, mempages / 2, int);
+
init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
init_task.signal->rlim[RLIMIT_SIGPENDING] =
init_task.signal->rlim[RLIMIT_NPROC];
}

+void __init fork_late_init(void)
+{
+ if (register_tunable(&max_threads_akt))
+ printk(KERN_WARNING
+ "Failed registering tunable max_threads\n");
+}
+
static struct task_struct *dup_task_struct(struct task_struct *orig)
{
struct task_struct *tsk;

--


2007-01-25 00:36:59

by Randy Dunlap

Subject: Re: [RFC][PATCH 1/6] Tunable structure and registration routines

On Tue, 16 Jan 2007 07:15:17 +0100 [email protected] wrote:

> [PATCH 01/06]
>
> Defines the auto_tune structure: this is the structure that contains the
> information needed by the adjustment routine for a given tunable.
> Also defines the registration routines.
>
> The fork kernel component defines a tunable structure for the threads-max
> tunable and registers it.
>
> Signed-off-by: Nadia Derbey <[email protected]>
> ---
> Documentation/00-INDEX | 2
> Documentation/auto_tune.txt | 333 ++++++++++++++++++++++++++++++++++++++++++++
> fs/Kconfig | 2
> include/linux/akt.h | 186 ++++++++++++++++++++++++
> include/linux/akt_ops.h | 186 ++++++++++++++++++++++++
> init/main.c | 2
> kernel/Makefile | 1
> kernel/autotune/Kconfig | 30 +++
> kernel/autotune/Makefile | 7
> kernel/autotune/akt.c | 123 ++++++++++++++++
> kernel/fork.c | 18 ++
> 11 files changed, 890 insertions(+)
>
> Index: linux-2.6.20-rc4/Documentation/auto_tune.txt
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.20-rc4/Documentation/auto_tune.txt 2007-01-15 14:19:18.000000000 +0100
> @@ -0,0 +1,333 @@
> + Automatic Kernel Tunables
> + =========================
> +
> + Nadia Derbey ([email protected])
> +
> +
> +
> +This feature aims at making the kernel automatically change the tunables
> +values as it sees resources running out.
> +
> +The AKT framework is made of 2 parts:
> +
> +1) Kernel part:
> +Interfaces are provided to the kernel subsystems, to (un)register the
> +tunables that might be automatically tuned in the future.
> +
> +Registering a tunable consists in the following steps:

s/in/of/

> +- a structure is declared and filled by the kernel subsystem for the
> +registered tunable
> +- that tunable structure is registered into sysfs
> +
> +Registration should be done during the kernel subsystem initialization step.

...

> +Any kernel subsystem that has registered a tunable should call
> +auto_tune_func() as follows:
> +
> ++-------------------------+--------------------------------------------+
> +| Step | Routine to call |
> ++-------------------------+--------------------------------------------+
> +| Declaration phase | DEFINE_TUNABLE(name, values...); |
> ++-------------------------+--------------------------------------------+
> +| Initialization routine | set_tunable_min_max(name, min, max); |
> +| | set_autotuning_routine(name, routine); |
> +| | register_tunable(&name); |
> +| Note: the 1st 2 calls | |
> +| are optional | |
> ++-------------------------+--------------------------------------------+
> +| Alloc | activate_auto_tuning(AKT_UP, &name); |
> ++-------------------------+--------------------------------------------+
> +| Free | activate_auto_tuning(AKT_DOWN, &name); |

So does Free always use AKT_DOWN? why does it matter?
Seems unneeded and inconsistent.
How does one activate a tunable for downward adjustment?

> ++-------------------------+--------------------------------------------+
> +| module_exit() routine | unregister_tunable(&name); |
> ++-------------------------+--------------------------------------------+
> +
> +activate_auto_tuning is a static inline defined in akt.h, that does the
> +following:
> +. if <tunable is registered> and <auto tuning is allowd for tunable>

allowed

> +. call the routine stored in tunable->auto_tune
> +
> +
> +The effect of the default automatic tuning routine is the following:
> +
> + +----------------------------------------------------------------+
> + | Tunable automatically adjustable |
> + +---------------+------------------------------------------------+
> + | NO | YES |
> ++----------+---------------+------------------------------------------------+
> +| AKT_UP | No effect | If the tunable value exceeds the specified |
> +| | | threshold, that value is increased up to a |
> +| | | maximum value. |
> +| | | The maximum value is specified during the |
> +| | | tunable declaration and can be changed at any |
> +| | | time through sysfs |
> ++----------+---------------+------------------------------------------------+
> +| AKT_DOWN | No effect | If the tunable value falls under the specified |
> +| | | threshold, that value is decreased down to a |
> +| | | minimum value. |
> +| | | The minimum value is specified during the |
> +| | | tunable declaration and can be changed at any |
> +| | | time through sysfs |
> ++----------+---------------+------------------------------------------------+
> +
> +
> +1.6. Default automatic adjustment routine
> +
> +The last service provided by AKT at the kernel level is the default automatic
> +adjustment routine. As seen, above, this routine supports various tunables
> +types. It works as follows (only the AKT_UP direction is described here -
> +AKT_DOWN does the reverse operation):
> +
> +The 2nd parameter passed in to this routine is a pointer to a previously
> +registerd tunable structure. That structure contains the following fields (see

registered

> +1.1 for the detailed description):
> +- threshold
> +- key
> +- min
> +- max
> +- tunable
> +- checked
> +
> +When this routine is entered, it does the following:
> +1. <*checked> is compared to <*tunable> * threshold
> +2. if <*checked> is greater, <*tunable> is set to:
> + <*tunable> + (<*tunable> * (100 - threshold) / 100)
> +
> +
> +
> +1.6) akt and sysfs:
> +
...

> +
> +1.7) tunables that are namespace dependent
> +
...

> +
> +1.7.2) Initializing the tunable structure
> +
> +Then the tunable structure should be initialized by calling the following
> +routine:
> +
> +init_tunable_ipcns(namespace_ptr, structure_name, threshold, min, max,
> + tunable_variable_ptr, checked_variable_ptr,
> + tunable_variable_type);
> +
> +Parameters:
> +- namespace_ptr: pointer to the namespace the tunable belongs to.
> +
> +See DEFINE_TUNABLE for the other parameters

end with a period/full-stop '.'.

> +
> +1.7.3) Registering the tunable structure
> +
...

> +
> +2) User part:
> +
> +As seen above, the only way to activate automatic tuning is from user side:
> +- the directory /sys/tunables is created during the init phase.
> +- each time a tunable is registered by a kernel subsystem, a directory is
> +created for it under /sys/tunables.
> +- This directory contains 1 file for each tunable kobject attribute:

Please try to limit text documentation to 80 columns or less.

> ++-----------+---------------+-------------------+----------------------------+
> +| attribute | default value | how to set it | effect |
> ++-----------+---------------+-------------------+----------------------------+
> +| autotune | 0 | echo 1 > autotune | makes the tunable automatic|
> +| | | echo 0 > autotune | makes the tunable manual |
> ++-----------+---------------+-------------------+----------------------------+
> +| max | max value set | echo <M> > max | sets the tunable max value |
> +| | during tunable| | to <M> |
> +| | definition | | |
> ++-----------+---------------+-------------------+----------------------------+
> +| min | min value set | echo <m> > min | sets the tunable min value |
> +| | during tunable| | to <m> |
> +| | definition | | |
> ++-----------+---------------+-------------------+----------------------------+
> +
> Index: linux-2.6.20-rc4/fs/Kconfig
> ===================================================================
> --- linux-2.6.20-rc4.orig/fs/Kconfig 2007-01-15 13:08:14.000000000 +0100
> +++ linux-2.6.20-rc4/fs/Kconfig 2007-01-15 14:20:20.000000000 +0100
> @@ -925,6 +925,8 @@ config PROC_KCORE
> bool "/proc/kcore support" if !ARM
> depends on PROC_FS && MMU
>
> +source "kernel/autotune/Kconfig"

Why is that in the File systems menu? Seems odd to me
for it to be there. If it's just because it depends on
PROC_FS and SYSFS, then it should just go completely after
the File systems menu.

> config PROC_VMCORE
> bool "/proc/vmcore support (EXPERIMENTAL)"
> depends on PROC_FS && EXPERIMENTAL && CRASH_DUMP
> Index: linux-2.6.20-rc4/include/linux/akt.h
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.20-rc4/include/linux/akt.h 2007-01-15 14:26:24.000000000 +0100
> @@ -0,0 +1,186 @@
> +
> +#ifndef AKT_H
> +#define AKT_H
> +
> +#include <linux/types.h>
> +#include <linux/kobject.h>
> +
> +/*
> + * First parameter passed to the adjustment routine
> + */
> +#define AKT_UP 0 /* adjustment "up" */
> +#define AKT_DOWN 1 /* adjustment "down" */
> +
> +
> +struct auto_tune {
> + spinlock_t tunable_lck; /* serializes access to the stucture fields */
> + auto_tune_fn auto_tune; /* auto tuning routine registered by the */
> + /* calling kernel susbsystem. If NULL, the */
> + /* auto tuning routine that will be called */
> + /* is the default one that processes uints */
> + int (*check_parms)(struct auto_tune *); /* min / max checking */
> + /* routine ptr: points to */
> + /* the appropriate routine */
> + /* depending on the */
> + /* tunable type */
> + const char *name;
> + char flags; /* Only 2 bits are meaningful: */

Make flags unsigned char so that no sign bit is needed.
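
A rough userspace sketch of the point, not code from the patch: the
AKT_F_* names and struct flags_demo are made up for illustration. With
unsigned char, all 8 bits are plain value bits, so masking and testing
them never involves a sign bit.

```c
#include <stdio.h>

/* Hypothetical bit names; the patch only documents bit 0 and bit 1. */
#define AKT_F_AUTO_ADJUST	(1u << 0)	/* tunable may be auto-adjusted */
#define AKT_F_REGISTERED	(1u << 1)	/* tunable has been registered */

/* Stand-in for the flags member of struct auto_tune. */
struct flags_demo {
	unsigned char flags;	/* 8 value bits, no sign bit */
};

/* Returns 1 if the "registered" bit is set, 0 otherwise. */
static int demo_is_registered(const struct flags_demo *d)
{
	return (d->flags & AKT_F_REGISTERED) != 0;
}

/* Returns 1 if the "auto adjust" bit is set, 0 otherwise. */
static int demo_is_auto(const struct flags_demo *d)
{
	return (d->flags & AKT_F_AUTO_ADJUST) != 0;
}
```

With a plain char, whether the top bit is a sign bit depends on the
architecture, which is one more reason to spell out unsigned.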

> + /* bit 0: set to 1 if the associated tunable can */
> + /* be automatically adjusted */
> + /* bits 1: set to 1 if the tunable has been */
> + /* registered */
> + /* bits 2-7: useless */

unused ??

> + char threshold; /* threshold to enable the adjustment expressed as */
> + /* a %age */
> + struct typed_value min; /* min value the tunable can ever reach */
> + /* and associated show / store routines) */
> + struct typed_value max; /* max value the tunable can ever reach */
> + /* and associated show / store routines) */
> + void *tunable; /* address of the tunable to adjust */
> + void *checked; /* address of the variable that is controlled by */
> + /* the tunable. This is the calling subsystem's */
> + /* object counter */
> +};
> +

...

> +
> +extern void fork_late_init(void);

Looks like the wrong header file for that extern.

> +#endif /* AKT_H */

> Index: linux-2.6.20-rc4/kernel/autotune/akt.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.20-rc4/kernel/autotune/akt.c 2007-01-15 14:51:54.000000000 +0100
> @@ -0,0 +1,123 @@
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/akt.h>
> +
> +
> +
> + Too Much Whitespace. :)
> +
> +
> +
> +/*
> + * FUNCTION: Inserts a tunable structure into sysfs
> + * This routine serves also as a checker for the tunable
> + * structure fields.
> + * This routine is called by any kernel subsystem that wants to
> + * use akt services (automatic tunables adjustment) in the future
> + *
> + * NOTE: when calling this routine, the tunable structure should have already
> + * been filled by defining it with DEFINE_TUNABLE()
> + *
> + * RETURN VALUE: 0: successful
> + * <0 if failure
> + */

Please use kernel-doc format for function comment blocks.
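
Something along these lines, as a sketch only: the comment wording is
adapted from the block quoted above, and the function body is a
userspace stand-in so that the example compiles on its own (the real
checks are the ones in the patch).

```c
#include <errno.h>
#include <stddef.h>

struct auto_tune;	/* defined in the patch's akt.h; opaque in this sketch */

/**
 * register_tunable - insert a tunable structure into sysfs
 * @tun: tunable structure, already filled in with DEFINE_TUNABLE()
 *
 * Also serves as a checker for the tunable structure fields.
 * Called by any kernel subsystem that wants to use the akt services
 * (automatic tunables adjustment) in the future.
 *
 * Returns 0 on success, a negative errno value on failure.
 */
int register_tunable(struct auto_tune *tun)
{
	if (tun == NULL)
		return -EINVAL;

	return 0;	/* field checks left out of this stand-in */
}
```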

> +int register_tunable(struct auto_tune *tun)
> +{
> + if (tun == NULL) {
> + printk(KERN_ERR "\tBad tunable structure pointer (NULL)\n");

Each printk() needs something that tells which module or part
of the kernel it's coming from (sometimes called a prefix).
And drop the \t (tab). IOW, replace the tab with a prefix, e.g.:

printk(KERN_ERR "autotune: Bad tunable structure NULL pointer\n");

> + return -EINVAL;
> + }
> +
> + if (tun->threshold <= 0 || tun->threshold >= 100) {
> + printk(KERN_ERR "\tBad threshold (%d) value "
> + "- should be in the [1-99] interval\n",
> + tun->threshold);

Replace \t with a prefix (and more below).

> + return -EINVAL;
> + }
> +
> + if (tun->tunable == NULL) {
> + printk(KERN_ERR "\tBad tunable pointer (NULL)\n");
> + return -EINVAL;
> + }
> +
> + if (tun->checked == NULL) {
> + printk(KERN_ERR "\tBad checked value pointer (NULL)\n");
> + return -EINVAL;
> + }
> +
> + /*
> + * Check the min / max value
> + */
> + if (tun->check_parms(tun)) {
> + printk(KERN_ERR "\tBad min / max values\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +
> +/*
> + * FUNCTION: Removes a tunable structure from sysfs.
> + * This routine is called by any kernel subsystem that doesn't
> + * need the akt services anymore
> + *
> + * NOTE: reg_tun should point to a previously registered tunable
> + *
> + * RETURN VALUE: 0: successful
> + * <0 if failure
> + */
> +int unregister_tunable(struct auto_tune *reg_tun)
> +{
> + if (reg_tun == NULL) {
> + printk(KERN_ERR "\tBad tunable address (NULL)\n");
> + return -EINVAL;
> + }
> +
> + spin_lock(&reg_tun->tunable_lck);
> +
> + BUG_ON(!is_tunable_registered(reg_tun));
> +
> + reg_tun->flags = 0;
> +
> + spin_unlock(&reg_tun->tunable_lck);
> +
> + return 0;
> +}
> +
> + Too Much Whitespace....
> +
> +
> +EXPORT_SYMBOL_GPL(register_tunable);
> +EXPORT_SYMBOL_GPL(unregister_tunable);

---
~Randy

2007-01-25 16:23:45

by Nadia Derbey

Subject: Re: [RFC][PATCH 1/6] Tunable structure and registration routines

Randy,

Thanks for reviewing the code!
My comments embedded.
I'll re-send the patches as soon as possible.

Regards,
Nadia

Randy Dunlap wrote:
> On Tue, 16 Jan 2007 07:15:17 +0100 [email protected] wrote:
>
>
>>[PATCH 01/06]
>>
<snip>
>
>
>>+Any kernel subsystem that has registered a tunable should call
>>+auto_tune_func() as follows:
>>+
>>++-------------------------+--------------------------------------------+
>>+| Step | Routine to call |
>>++-------------------------+--------------------------------------------+
>>+| Declaration phase | DEFINE_TUNABLE(name, values...); |
>>++-------------------------+--------------------------------------------+
>>+| Initialization routine | set_tunable_min_max(name, min, max); |
>>+| | set_autotuning_routine(name, routine); |
>>+| | register_tunable(&name); |
>>+| Note: the 1st 2 calls | |
>>+| are optional | |
>>++-------------------------+--------------------------------------------+
>>+| Alloc | activate_auto_tuning(AKT_UP, &name); |
>>++-------------------------+--------------------------------------------+
>>+| Free | activate_auto_tuning(AKT_DOWN, &name); |
>
>
> So does Free always use AKT_DOWN? why does it matter?
> Seems unneeded and inconsistent.

Tuning down is recommended in order to come back to the default tunable
value.
I agree with you: today it has almost no effect, except on the tunable
value. If we take the ipc's example, grow_ary() just returns if the new
tunable value happens to be lower than the previous one.
But we can imagine, in the future, that grow_ary could deallocate the
unused memory.
+ in that particular case, lowering the tunable value makes the 1st loop
in ipc_addid() shorter.

> How does one activate a tunable for downward adjustment?

Actually a tunable is activated to be dynamically adjusted (whatever the
direction).
But you are giving me an idea for a future enhancement: we can imagine a
tunable that could be allowed to increase only (or decrease only). In
that case, we should move the autotune sysfs attribute into an 'up' and
a 'down' attribute?

<snip>

>>+
>>+2) User part:
>>+
>>+As seen above, the only way to activate automatic tuning is from user side:
>>+- the directory /sys/tunables is created during the init phase.
>>+- each time a tunable is registered by a kernel subsystem, a directory is
>>+created for it under /sys/tunables.
>>+- This directory contains 1 file for each tunable kobject attribute:
>
>
> Please try to limit text documentation to 80 columns or less.

That's exactly what I did?



<snip>

>>Index: linux-2.6.20-rc4/fs/Kconfig
>>===================================================================
>>--- linux-2.6.20-rc4.orig/fs/Kconfig 2007-01-15 13:08:14.000000000 +0100
>>+++ linux-2.6.20-rc4/fs/Kconfig 2007-01-15 14:20:20.000000000 +0100
>>@@ -925,6 +925,8 @@ config PROC_KCORE
>> bool "/proc/kcore support" if !ARM
>> depends on PROC_FS && MMU
>>
>>+source "kernel/autotune/Kconfig"
>
>
> Why is that in the File systems menu? Seems odd to me
> for it to be there. If it's just because it depends on
> PROC_FS and SYSFS, then it should just go completely after
> the File systems menu.
>

Since the tunables that are handled in AKT, I wanted the feature to be
close to CONFIG_PROC_FS.
Now, I do not agree with your proposal: putting it after the FS menu
means that it would appear in the main menu, right? I'll try to find a
better place for it.



>>Index: linux-2.6.20-rc4/include/linux/akt.h
>>===================================================================
>>--- /dev/null 1970-01-01 00:00:00.000000000 +0000
>>+++ linux-2.6.20-rc4/include/linux/akt.h 2007-01-15 14:26:24.000000000 +0100
>>@@ -0,0 +1,186 @@
>>+


<snip>

>>+ char flags; /* Only 2 bits are meaningful: */
>
>
> Make flags unsigned char so that no sign bit is needed.
>
>
>>+ /* bit 0: set to 1 if the associated tunable can */
>>+ /* be automatically adjusted */
>>+ /* bits 1: set to 1 if the tunable has been */
>>+ /* registered */
>>+ /* bits 2-7: useless */
>
>
> unused ??

yep

<snip>

>
>
>>+
>>+extern void fork_late_init(void);
>
>
> Looks like the wrong header file for that extern.
>
>

Actually, I wanted the changes to the existing kernel files to be as
small as possible. That's why everything is concentrated, whenever
possible, in the added files.

Regards,
Nadia




2007-01-25 16:39:03

by Randy Dunlap

Subject: Re: [RFC][PATCH 1/6] Tunable structure and registration routines

On Thu, 25 Jan 2007 17:26:31 +0100 Nadia Derbey wrote:

> Randy,
>
> Thanks for reviewing the code!
> My comments embedded.
> I'll re-send the patches as soon as possible.

OK, thanks.


> Randy Dunlap wrote:
> > On Tue, 16 Jan 2007 07:15:17 +0100 [email protected] wrote:
> >
> >
> >>[PATCH 01/06]
> >>
> <snip>
> >
> >
> >>+Any kernel subsystem that has registered a tunable should call
> >>+auto_tune_func() as follows:
> >>+
> >>++-------------------------+--------------------------------------------+
> >>+| Step | Routine to call |
> >>++-------------------------+--------------------------------------------+
> >>+| Declaration phase | DEFINE_TUNABLE(name, values...); |
> >>++-------------------------+--------------------------------------------+
> >>+| Initialization routine | set_tunable_min_max(name, min, max); |
> >>+| | set_autotuning_routine(name, routine); |
> >>+| | register_tunable(&name); |
> >>+| Note: the 1st 2 calls | |
> >>+| are optional | |
> >>++-------------------------+--------------------------------------------+
> >>+| Alloc | activate_auto_tuning(AKT_UP, &name); |
> >>++-------------------------+--------------------------------------------+
> >>+| Free | activate_auto_tuning(AKT_DOWN, &name); |
> >
> >
> > So does Free always use AKT_DOWN? why does it matter?
> > Seems unneeded and inconsistent.
>
> Tuning down is recommended in order to come back to the default tunable
> value.

Let me try to be clearer. What is Alloc? and why is AKT_UP
associated with Alloc and AKT_DOWN associated with Free (whatever
that means)?


> I agree with you: today it has almost no effect, except on the tunable
> value. If we take the ipc's example, grow_ary() just returns if the new
> tunable value happens to be lower than the previous one.
> But we can imagine, in the future, that grow_ary could deallocate the
> unused memory.
> + in that particular case, lowering the tunable value makes the 1st loop
> in ipc_addid() shorter.
>
> > How does one activate a tunable for downward adjustment?
>
> Actually a tunable is activated to be dynamically adjusted (whatever the
> direction).
> But you are giving me an idea for a future enhancement: we can imagine a
> tunable that could be allowed to increase only (or decrease only). In
> that case, we should move the autotune sysfs attribute into an 'up' and
> a 'down' attribute?

Couldn't the tunable owner just adjust the min value to a new
(larger) min value, e.g.?


> >>+extern void fork_late_init(void);
> >
> >
> > Looks like the wrong header file for that extern.
> >
> >
>
> Actually, I wanted the changes to the existing kernel files to be as
> small as possible. That's why everything is concentrated, whenever
> possible, in the added files.

I suppose that's OK for review, but it shouldn't be merged that way.

---
~Randy

2007-01-25 16:58:49

by Nadia Derbey

Subject: Re: [RFC][PATCH 1/6] Tunable structure and registration routines

Randy Dunlap wrote:
> On Thu, 25 Jan 2007 17:26:31 +0100 Nadia Derbey wrote:
>>>>+Any kernel subsystem that has registered a tunable should call
>>>>+auto_tune_func() as follows:
>>>>+
>>>>++-------------------------+--------------------------------------------+
>>>>+| Step | Routine to call |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Declaration phase | DEFINE_TUNABLE(name, values...); |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Initialization routine | set_tunable_min_max(name, min, max); |
>>>>+| | set_autotuning_routine(name, routine); |
>>>>+| | register_tunable(&name); |
>>>>+| Note: the 1st 2 calls | |
>>>>+| are optional | |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Alloc | activate_auto_tuning(AKT_UP, &name); |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Free | activate_auto_tuning(AKT_DOWN, &name); |
>>>
>>>
>>>So does Free always use AKT_DOWN? why does it matter?
>>>Seems unneeded and inconsistent.
>>
>>Tuning down is recommended in order to come back to the default tunable
>>value.
>
>
> Let me try to be clearer. What is Alloc? and why is AKT_UP
> associated with Alloc and AKT_DOWN associated with Free (whatever
> that means)?

Alloc stands for resource allocation: in a subsystem where resource
allocation depends on a tunable value, we should tune up that value
prior to the allocation itself. Let's come back to the ipc subsystem
example: ipc_addid() is the routine that adds an entry to an ipc array.
The 1st thing it does (via grow_ary()) is to allocate some more space
for the ipc array if needed, i.e. if the ipc tunable value has
increased. That's why the tunable should be tuned up before calling
ipc_addid().

AKT_DOWN is the reverse operation: we are freeing resources, so the
tunable has no reason to remain with a big value.
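
The "tune up, then allocate" ordering can be sketched in userspace C as
follows. This is an illustration under assumptions, not the code from
the patch: struct tun_sketch is an int-only stand-in for struct
auto_tune, the function names are made up, and dividing by 100 is one
reading of "threshold expressed as a %age" in the documentation. Only
the AKT_UP step from section 1.6 is shown, since the documentation
describes AKT_DOWN only as the reverse operation.

```c
#include <stdio.h>

/* Simplified, int-only stand-in for the patch's struct auto_tune. */
struct tun_sketch {
	int threshold;	/* percentage, expected in [1-99] */
	int min;	/* lowest value the tunable may reach */
	int max;	/* highest value the tunable may reach */
	int *tunable;	/* the limit to adjust, e.g. max_threads */
	int *checked;	/* the object counter, e.g. nr_threads */
};

/*
 * AKT_UP step: if the counter exceeds threshold% of the limit, raise
 * the limit by (100 - threshold)% of its value, capped at max.
 */
static void tune_up_sketch(struct tun_sketch *t)
{
	long limit = *t->tunable;
	long trigger = limit * t->threshold / 100;

	if (*t->checked > trigger) {
		long next = limit + limit * (100 - t->threshold) / 100;

		if (next > t->max)
			next = t->max;
		*t->tunable = (int)next;
	}
}

/* Hypothetical allocation path: tune up first, then check the limit. */
static int alloc_one_sketch(struct tun_sketch *t)
{
	tune_up_sketch(t);
	if (*t->checked >= *t->tunable)
		return -1;	/* limit reached even after tuning */
	(*t->checked)++;
	return 0;
}
```

With a limit of 100, a threshold of 80 and 81 objects in use, the limit
is raised to 100 + 100 * 20 / 100 = 120 before the allocation proceeds.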

>
>
>
>>I agree with you: today it has almost no effect, except on the tunable
>>value. If we take the ipc's example, grow_ary() just returns if the new
>>tunable value happens to be lower than the previous one.
>>But we can imagine, in the future, that grow_ary could deallocate the
>>unused memory.
>>+ in that particular case, lowering the tunable value makes the 1st loop
>>in ipc_addid() shorter.
>>
>>
>>>How does one activate a tunable for downward adjustment?
>>
>>Actually a tunable is activated to be dynamically adjusted (whatever the
>>direction).
>>But you are giving me an idea for a future enhancement: we can imagine a
>>tunable that could be allowed to increase only (or decrease only). In
>>that case, we should move the autotune sysfs attribute into an 'up' and
>>a 'down' attribute?
>
>
> Couldn't the tunable owner just adjust the min value to a new
> (larger) min value, e.g.?

You're completely right: setting the min value to the default one should
be enough!

>
>
>
>>>>+extern void fork_late_init(void);
>>>
>>>
>>>Looks like the wrong header file for that extern.
>>>
>>>
>>
>>Actually, I wanted the changes to the existing kernel files to be as
>>small as possible. That's why everything is concentrated, whenever
>>possible, in the added files.
>
>
> I suppose that's OK for review, but it shouldn't be merged that way.
>
> ---
> ~Randy
>


Regards,
Nadia