This series adds support for something that seems like many people always
wanted but nobody added it yet, so here's the ability to set sysctl parameters
via kernel command line options in the form of sysctl.vm.something=1
The important part is Patch 1. The second, not so important part is an attempt
to clean up legacy one-off parameters that do the same thing as a sysctl.
I don't want to remove them completely for compatibility reasons, but with
generic sysctl support the idea is to remove the one-off param handlers and
treat the parameters as aliases for the sysctl variants.
I have identified several parameters that mention sysctl counterparts in
Documentation/admin-guide/kernel-parameters.txt but there might be more. The
conversion also has varying level of success:
- numa_zonelist_order is converted in Patch 2 together with adding the
necessary infrastructure. It's easy as it doesn't really do anything but warn
on deprecated value these days.
- hung_task_panic is converted in Patch 3, but there's a downside that now it
only accepts 0 and 1, while previously it was any integer value
- nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic, so
there's no straighforward conversion possible
- traceoff_on_warning is a flag without value and it would be required to
handle that somehow in the conversion infractructure, which seems pointless
for a single flag
Anyway I hope that Patch 1 is mature enough to go regardless of the fate of the
less important second part.
Changes since RFCv2
- make proc_mnt internal to functions (Kees)
- use kasprintf when building path (Kees)
- improve error reporting - common errnos are translated to more obvious
messages and the rest uses %pe
Vlastimil Babka (3):
kernel/sysctl: support setting sysctl parameters from kernel command
line
kernel/sysctl: support handling command line aliases
kernel/hung_task convert hung_task_panic boot parameter to sysctl
.../admin-guide/kernel-parameters.txt | 11 +-
fs/proc/proc_sysctl.c | 135 ++++++++++++++++++
include/linux/sysctl.h | 4 +
init/main.c | 2 +
kernel/hung_task.c | 10 --
mm/page_alloc.c | 9 --
6 files changed, 151 insertions(+), 20 deletions(-)
--
2.25.1
We can now handle sysctl parameters on kernel command line and have
infrastructure to convert legacy command line options that duplicate sysctl
to become a sysctl alias.
This patch converts the hung_task_panic parameter. Note that the sysctl handler
is more strict and allows only 0 and 1, while the legacy parameter allowed
any non-zero value. But there is little reason anyone would not be using 1.
Signed-off-by: Vlastimil Babka <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
fs/proc/proc_sysctl.c | 1 +
kernel/hung_task.c | 10 ----------
3 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 81ff626fc700..e0b8840404a1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1457,7 +1457,7 @@
[KNL] Should the hung task detector generate panics.
Format: <integer>
- A nonzero value instructs the kernel to panic when a
+ A value of 1 instructs the kernel to panic when a
hung task is detected. The default value is controlled
by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
option. The value selected by this boot parameter can
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 97eb0b552bf8..77b1b844b02b 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1743,6 +1743,7 @@ struct sysctl_alias {
*/
static const struct sysctl_alias sysctl_aliases[] = {
{"numa_zonelist_order", "vm.numa_zonelist_order" },
+ {"hung_task_panic", "kernel.hung_task_panic" },
{ }
};
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 14a625c16cb3..b22b5eeab3cb 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -63,16 +63,6 @@ static struct task_struct *watchdog_task;
unsigned int __read_mostly sysctl_hung_task_panic =
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE;
-static int __init hung_task_panic_setup(char *str)
-{
- int rc = kstrtouint(str, 0, &sysctl_hung_task_panic);
-
- if (rc)
- return rc;
- return 1;
-}
-__setup("hung_task_panic=", hung_task_panic_setup);
-
static int
hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
{
--
2.25.1
We can now handle sysctl parameters on kernel command line, but historically
some parameters introduced their own command line equivalent, which we don't
want to remove for compatibility reasons. We can however convert them to the
generic infrastructure with a table translating the legacy command line
parameters to their sysctl names, and removing the one-off param handlers.
This patch adds the support and makes the first conversion to demonstrate it,
on the (deprecated) numa_zonelist_order parameter.
Signed-off-by: Vlastimil Babka <[email protected]>
---
fs/proc/proc_sysctl.c | 48 ++++++++++++++++++++++++++++++++++++-------
mm/page_alloc.c | 9 --------
2 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 653188c9c4c9..97eb0b552bf8 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1727,6 +1727,37 @@ int __init proc_sys_init(void)
return sysctl_init();
}
+struct sysctl_alias {
+ const char *kernel_param;
+ const char *sysctl_param;
+};
+
+/*
+ * Historically some settings had both sysctl and a command line parameter.
+ * With the generic sysctl. parameter support, we can handle them at a single
+ * place and only keep the historical name for compatibility. This is not meant
+ * to add brand new aliases. When adding existing aliases, consider whether
+ * the possibly different moment of changing the value (e.g. from early_param
+ * to the moment do_sysctl_args() is called) is an issue for the specific
+ * parameter.
+ */
+static const struct sysctl_alias sysctl_aliases[] = {
+ {"numa_zonelist_order", "vm.numa_zonelist_order" },
+ { }
+};
+
+static const char *sysctl_find_alias(char *param)
+{
+ const struct sysctl_alias *alias;
+
+ for (alias = &sysctl_aliases[0]; alias->kernel_param != NULL; alias++) {
+ if (strcmp(alias->kernel_param, param) == 0)
+ return alias->sysctl_param;
+ }
+
+ return NULL;
+}
+
/* Set sysctl value passed on kernel command line. */
static int process_sysctl_arg(char *param, char *val,
const char *unused, void *arg)
@@ -1740,15 +1771,18 @@ static int process_sysctl_arg(char *param, char *val,
loff_t pos = 0;
ssize_t wret;
- if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
- return 0;
-
- param += sizeof("sysctl") - 1;
+ if (strncmp(param, "sysctl", sizeof("sysctl") - 1) == 0) {
+ param += sizeof("sysctl") - 1;
- if (param[0] != '/' && param[0] != '.')
- return 0;
+ if (param[0] != '/' && param[0] != '.')
+ return 0;
- param++;
+ param++;
+ } else {
+ param = (char *) sysctl_find_alias(param);
+ if (!param)
+ return 0;
+ }
if (!proc_mnt) {
proc_fs_type = get_fs_type("proc");
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..de7a134b1b8a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5460,15 +5460,6 @@ static int __parse_numa_zonelist_order(char *s)
return 0;
}
-static __init int setup_numa_zonelist_order(char *s)
-{
- if (!s)
- return 0;
-
- return __parse_numa_zonelist_order(s);
-}
-early_param("numa_zonelist_order", setup_numa_zonelist_order);
-
char numa_zonelist_order[] = "Node";
/*
--
2.25.1
A recently proposed patch to add vm_swappiness command line parameter in
addition to existing sysctl [1] made me wonder why we don't have a general
support for passing sysctl parameters via command line. Googling found only
somebody else wondering the same [2], but I haven't found any prior discussion
with reasons why not to do this.
Settings the vm_swappiness issue aside (the underlying issue might be solved in
a different way), quick search of kernel-parameters.txt shows there are already
some that exist as both sysctl and kernel parameter - hung_task_panic,
nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
would remove the need to add more of those one-offs and might be handy in
situations where configuration by e.g. /etc/sysctl.d/ is impractical.
Hence, this patch adds a new parse_args() pass that looks for parameters
prefixed by 'sysctl.' and tries to interpret them as writes to the
corresponding sys/ files using an temporary in-kernel procfs mount. This
mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
registered sysctl tables. Errors due to e.g. invalid parameter name or value
are reported in the kernel log.
The processing is hooked right before the init process is loaded, as some
handlers might be more complicated than simple setters and might need some
subsystems to be initialized. At the moment the init process can be started and
eventually execute a process writing to /proc/sys/ then it should be also fine
to do that from the kernel.
Sysctls registered later on module load time are not set by this mechanism -
it's expected that in such scenarios, setting sysctl values from userspace is
practical enough.
[1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
[2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
[3] https://lore.kernel.org/r/[email protected]/
Signed-off-by: Vlastimil Babka <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 9 ++
fs/proc/proc_sysctl.c | 100 ++++++++++++++++++
include/linux/sysctl.h | 4 +
init/main.c | 2 +
4 files changed, 115 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c07815d230bc..81ff626fc700 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4793,6 +4793,15 @@
switches= [HW,M68k]
+ sysctl.*= [KNL]
+ Set a sysctl parameter, right before loading the init
+ process, as if the value was written to the respective
+ /proc/sys/... file. Both '.' and '/' are recognized as
+ separators. Unrecognized parameters and invalid values
+ are reported in the kernel log. Sysctls registered
+ later by a loaded module cannot be set this way.
+ Example: sysctl.vm.swappiness=40
+
sysfs.deprecated=0|1 [KNL]
Enable/disable old style sysfs layout for old udev
on older distributions. When this option is enabled
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index c75bb4632ed1..653188c9c4c9 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -14,6 +14,7 @@
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/bpf-cgroup.h>
+#include <linux/mount.h>
#include "internal.h"
static const struct dentry_operations proc_sys_dentry_operations;
@@ -1725,3 +1726,102 @@ int __init proc_sys_init(void)
return sysctl_init();
}
+
+/* Set sysctl value passed on kernel command line. */
+static int process_sysctl_arg(char *param, char *val,
+ const char *unused, void *arg)
+{
+ char *path;
+ struct vfsmount *proc_mnt = *((struct vfsmount **)arg);
+ struct file_system_type *proc_fs_type;
+ struct file *file;
+ int len;
+ int err;
+ loff_t pos = 0;
+ ssize_t wret;
+
+ if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
+ return 0;
+
+ param += sizeof("sysctl") - 1;
+
+ if (param[0] != '/' && param[0] != '.')
+ return 0;
+
+ param++;
+
+ if (!proc_mnt) {
+ proc_fs_type = get_fs_type("proc");
+ if (!proc_fs_type) {
+ pr_err("Failed to find procfs to set sysctl from command line");
+ return 0;
+ }
+ proc_mnt = kern_mount(proc_fs_type);
+ put_filesystem(proc_fs_type);
+ if (IS_ERR(proc_mnt)) {
+ pr_err("Failed to mount procfs to set sysctl from command line");
+ return 0;
+ }
+ *((struct vfsmount **)arg) = proc_mnt;
+ }
+
+ path = kasprintf(GFP_KERNEL, "sys/%s", param);
+ if (!path)
+ panic("%s: Failed to allocate path for %s\n", __func__, param);
+ strreplace(path, '.', '/');
+
+ file = file_open_root(proc_mnt->mnt_root, proc_mnt, path, O_WRONLY, 0);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ if (err == -ENOENT)
+ pr_err("Failed to set sysctl parameter '%s=%s': parameter not found",
+ param, val);
+ else if (err == -EACCES)
+ pr_err("Failed to set sysctl parameter '%s=%s': permission denied (read-only?)",
+ param, val);
+ else
+ pr_err("Error %pe opening proc file to set sysctl parameter '%s=%s'",
+ file, param, val);
+ goto out;
+ }
+ len = strlen(val);
+ wret = kernel_write(file, val, len, &pos);
+ if (wret < 0) {
+ err = wret;
+ if (err == -EINVAL)
+ pr_err("Failed to set sysctl parameter '%s=%s': invalid value",
+ param, val);
+ else
+ pr_err("Error %pe writing to proc file to set sysctl parameter '%s=%s'",
+ ERR_PTR(err), param, val);
+ } else if (wret != len) {
+ pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
+ wret, len, path, param, val);
+ }
+
+ err = filp_close(file, NULL);
+ if (err)
+ pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
+ ERR_PTR(err), param, val);
+out:
+ kfree(path);
+ return 0;
+}
+
+void do_sysctl_args(void)
+{
+ char *command_line;
+ struct vfsmount *proc_mnt = NULL;
+
+ command_line = kstrdup(saved_command_line, GFP_KERNEL);
+ if (!command_line)
+ panic("%s: Failed to allocate copy of command line\n", __func__);
+
+ parse_args("Setting sysctl args", command_line,
+ NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
+
+ if (proc_mnt)
+ kern_unmount(proc_mnt);
+
+ kfree(command_line);
+}
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 02fa84493f23..5f3f2a00d75f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
void unregister_sysctl_table(struct ctl_table_header * table);
extern int sysctl_init(void);
+void do_sysctl_args(void);
extern struct ctl_table sysctl_mount_point[];
@@ -236,6 +237,9 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
{
}
+void do_sysctl_args(void)
+{
+}
#endif /* CONFIG_SYSCTL */
int sysctl_max_threads(struct ctl_table *table, int write,
diff --git a/init/main.c b/init/main.c
index ee4947af823f..a91ea166a731 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1367,6 +1367,8 @@ static int __ref kernel_init(void *unused)
rcu_end_inkernel_boot();
+ do_sysctl_args();
+
if (ramdisk_execute_command) {
ret = run_init_process(ramdisk_execute_command);
if (!ret)
--
2.25.1
On 3/30/20 1:55 PM, Vlastimil Babka wrote:
> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
>
> Settings the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
>
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and tries to interpret them as writes to the
> corresponding sys/ files using an temporary in-kernel procfs mount. This
> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
> registered sysctl tables. Errors due to e.g. invalid parameter name or value
> are reported in the kernel log.
>
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.
>
> Sysctls registered later on module load time are not set by this mechanism -
> it's expected that in such scenarios, setting sysctl values from userspace is
> practical enough.
>
> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> [3] https://lore.kernel.org/r/[email protected]/
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Boo, all the error prints should terminate with \n
Will wait for feedback before resend.
On Mon, Mar 30, 2020 at 01:55:33PM +0200, Vlastimil Babka wrote:
> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
>
> Settings the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
>
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and tries to interpret them as writes to the
> corresponding sys/ files using an temporary in-kernel procfs mount. This
> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
> registered sysctl tables. Errors due to e.g. invalid parameter name or value
> are reported in the kernel log.
>
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.
>
> Sysctls registered later on module load time are not set by this mechanism -
> it's expected that in such scenarios, setting sysctl values from userspace is
> practical enough.
>
> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> [3] https://lore.kernel.org/r/[email protected]/
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 9 ++
> fs/proc/proc_sysctl.c | 100 ++++++++++++++++++
> include/linux/sysctl.h | 4 +
> init/main.c | 2 +
> 4 files changed, 115 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..81ff626fc700 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>
> switches= [HW,M68k]
>
> + sysctl.*= [KNL]
> + Set a sysctl parameter, right before loading the init
> + process, as if the value was written to the respective
> + /proc/sys/... file. Both '.' and '/' are recognized as
> + separators. Unrecognized parameters and invalid values
> + are reported in the kernel log. Sysctls registered
> + later by a loaded module cannot be set this way.
> + Example: sysctl.vm.swappiness=40
> +
> sysfs.deprecated=0|1 [KNL]
> Enable/disable old style sysfs layout for old udev
> on older distributions. When this option is enabled
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index c75bb4632ed1..653188c9c4c9 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -14,6 +14,7 @@
> #include <linux/mm.h>
> #include <linux/module.h>
> #include <linux/bpf-cgroup.h>
> +#include <linux/mount.h>
> #include "internal.h"
>
> static const struct dentry_operations proc_sys_dentry_operations;
> @@ -1725,3 +1726,102 @@ int __init proc_sys_init(void)
>
> return sysctl_init();
> }
> +
> +/* Set sysctl value passed on kernel command line. */
> +static int process_sysctl_arg(char *param, char *val,
> + const char *unused, void *arg)
> +{
> + char *path;
> + struct vfsmount *proc_mnt = *((struct vfsmount **)arg);
I would just make this:
struct vfsmount **proc_mnt = (struct vfsmount **)arg;
> + struct file_system_type *proc_fs_type;
> + struct file *file;
> + int len;
> + int err;
> + loff_t pos = 0;
> + ssize_t wret;
> +
> + if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
> + return 0;
> +
> + param += sizeof("sysctl") - 1;
> +
> + if (param[0] != '/' && param[0] != '.')
> + return 0;
> +
> + param++;
> +
> + if (!proc_mnt) {
if (!*proc_mnt) {
I would also add a comment here to explain that this is doing an
on-demand mount so that it doesn't have to mount proc if there are not
sysctl parameters.
> + proc_fs_type = get_fs_type("proc");
> + if (!proc_fs_type) {
> + pr_err("Failed to find procfs to set sysctl from command line");
> + return 0;
> + }
> + proc_mnt = kern_mount(proc_fs_type);
*proc_mnt = kern_mount(proc_fs_type);
> + put_filesystem(proc_fs_type);
> + if (IS_ERR(proc_mnt)) {
if (IS_ERR(*proc_mnt)) {
> + pr_err("Failed to mount procfs to set sysctl from command line");
> + return 0;
> + }
> + *((struct vfsmount **)arg) = proc_mnt;
Then drop this line.
> + }
> +
> + path = kasprintf(GFP_KERNEL, "sys/%s", param);
> + if (!path)
> + panic("%s: Failed to allocate path for %s\n", __func__, param);
> + strreplace(path, '.', '/');
> +
> + file = file_open_root(proc_mnt->mnt_root, proc_mnt, path, O_WRONLY, 0);
file = file_open_root((*proc_mnt)->mnt_root, *proc_mnt, path, O_WRONLY, 0);
> + if (IS_ERR(file)) {
> + err = PTR_ERR(file);
> + if (err == -ENOENT)
> + pr_err("Failed to set sysctl parameter '%s=%s': parameter not found",
> + param, val);
> + else if (err == -EACCES)
> + pr_err("Failed to set sysctl parameter '%s=%s': permission denied (read-only?)",
> + param, val);
> + else
> + pr_err("Error %pe opening proc file to set sysctl parameter '%s=%s'",
> + file, param, val);
> + goto out;
> + }
> + len = strlen(val);
> + wret = kernel_write(file, val, len, &pos);
> + if (wret < 0) {
> + err = wret;
> + if (err == -EINVAL)
> + pr_err("Failed to set sysctl parameter '%s=%s': invalid value",
> + param, val);
> + else
> + pr_err("Error %pe writing to proc file to set sysctl parameter '%s=%s'",
> + ERR_PTR(err), param, val);
> + } else if (wret != len) {
> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> + wret, len, path, param, val);
> + }
> +
> + err = filp_close(file, NULL);
> + if (err)
> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> + ERR_PTR(err), param, val);
> +out:
> + kfree(path);
> + return 0;
> +}
> +
> +void do_sysctl_args(void)
> +{
> + char *command_line;
> + struct vfsmount *proc_mnt = NULL;
> +
> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
> + if (!command_line)
> + panic("%s: Failed to allocate copy of command line\n", __func__);
> +
> + parse_args("Setting sysctl args", command_line,
> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> +
> + if (proc_mnt)
> + kern_unmount(proc_mnt);
> +
> + kfree(command_line);
> +}
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 02fa84493f23..5f3f2a00d75f 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
> void unregister_sysctl_table(struct ctl_table_header * table);
>
> extern int sysctl_init(void);
> +void do_sysctl_args(void);
>
> extern struct ctl_table sysctl_mount_point[];
>
> @@ -236,6 +237,9 @@ static inline void setup_sysctl_set(struct ctl_table_set *p,
> {
> }
>
> +void do_sysctl_args(void)
As with the others in the no-op case:
static inline void do_sysctl_args(void)
> +{
> +}
> #endif /* CONFIG_SYSCTL */
>
> int sysctl_max_threads(struct ctl_table *table, int write,
> diff --git a/init/main.c b/init/main.c
> index ee4947af823f..a91ea166a731 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1367,6 +1367,8 @@ static int __ref kernel_init(void *unused)
>
> rcu_end_inkernel_boot();
>
> + do_sysctl_args();
> +
> if (ramdisk_execute_command) {
> ret = run_init_process(ramdisk_execute_command);
> if (!ret)
> --
> 2.25.1
>
Otherwise, yes, looks good!
--
Kees Cook
On Mon, Mar 30, 2020 at 01:55:34PM +0200, Vlastimil Babka wrote:
> We can now handle sysctl parameters on kernel command line, but historically
> some parameters introduced their own command line equivalent, which we don't
> want to remove for compatibility reasons. We can however convert them to the
> generic infrastructure with a table translating the legacy command line
> parameters to their sysctl names, and removing the one-off param handlers.
>
> This patch adds the support and makes the first conversion to demonstrate it,
> on the (deprecated) numa_zonelist_order parameter.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Kees Cook <[email protected]>
-Kees
> ---
> fs/proc/proc_sysctl.c | 48 ++++++++++++++++++++++++++++++++++++-------
> mm/page_alloc.c | 9 --------
> 2 files changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 653188c9c4c9..97eb0b552bf8 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -1727,6 +1727,37 @@ int __init proc_sys_init(void)
> return sysctl_init();
> }
>
> +struct sysctl_alias {
> + const char *kernel_param;
> + const char *sysctl_param;
> +};
> +
> +/*
> + * Historically some settings had both sysctl and a command line parameter.
> + * With the generic sysctl. parameter support, we can handle them at a single
> + * place and only keep the historical name for compatibility. This is not meant
> + * to add brand new aliases. When adding existing aliases, consider whether
> + * the possibly different moment of changing the value (e.g. from early_param
> + * to the moment do_sysctl_args() is called) is an issue for the specific
> + * parameter.
> + */
> +static const struct sysctl_alias sysctl_aliases[] = {
> + {"numa_zonelist_order", "vm.numa_zonelist_order" },
> + { }
> +};
> +
> +static const char *sysctl_find_alias(char *param)
> +{
> + const struct sysctl_alias *alias;
> +
> + for (alias = &sysctl_aliases[0]; alias->kernel_param != NULL; alias++) {
> + if (strcmp(alias->kernel_param, param) == 0)
> + return alias->sysctl_param;
> + }
> +
> + return NULL;
> +}
> +
> /* Set sysctl value passed on kernel command line. */
> static int process_sysctl_arg(char *param, char *val,
> const char *unused, void *arg)
> @@ -1740,15 +1771,18 @@ static int process_sysctl_arg(char *param, char *val,
> loff_t pos = 0;
> ssize_t wret;
>
> - if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
> - return 0;
> -
> - param += sizeof("sysctl") - 1;
> + if (strncmp(param, "sysctl", sizeof("sysctl") - 1) == 0) {
> + param += sizeof("sysctl") - 1;
>
> - if (param[0] != '/' && param[0] != '.')
> - return 0;
> + if (param[0] != '/' && param[0] != '.')
> + return 0;
>
> - param++;
> + param++;
> + } else {
> + param = (char *) sysctl_find_alias(param);
> + if (!param)
> + return 0;
> + }
>
> if (!proc_mnt) {
> proc_fs_type = get_fs_type("proc");
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..de7a134b1b8a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5460,15 +5460,6 @@ static int __parse_numa_zonelist_order(char *s)
> return 0;
> }
>
> -static __init int setup_numa_zonelist_order(char *s)
> -{
> - if (!s)
> - return 0;
> -
> - return __parse_numa_zonelist_order(s);
> -}
> -early_param("numa_zonelist_order", setup_numa_zonelist_order);
> -
> char numa_zonelist_order[] = "Node";
>
> /*
> --
> 2.25.1
>
--
Kees Cook
On Mon, Mar 30, 2020 at 06:23:57PM +0200, Vlastimil Babka wrote:
> Boo, all the error prints should terminate with \n
> Will wait for feedback before resend.
eek, yes, good catch. :)
--
Kees Cook
On Mon, Mar 30, 2020 at 01:55:35PM +0200, Vlastimil Babka wrote:
> We can now handle sysctl parameters on kernel command line and have
> infrastructure to convert legacy command line options that duplicate sysctl
> to become a sysctl alias.
>
> This patch converts the hung_task_panic parameter. Note that the sysctl handler
> is more strict and allows only 0 and 1, while the legacy parameter allowed
> any non-zero value. But there is little reason anyone would not be using 1.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
I'll let others double-check, but I think this change should be okay. If
not, we can adjust the sysctl handler to accept an arbitrary int.
Reviewed-by: Kees Cook <[email protected]>
-Kees
> ---
> Documentation/admin-guide/kernel-parameters.txt | 2 +-
> fs/proc/proc_sysctl.c | 1 +
> kernel/hung_task.c | 10 ----------
> 3 files changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 81ff626fc700..e0b8840404a1 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1457,7 +1457,7 @@
> [KNL] Should the hung task detector generate panics.
> Format: <integer>
>
> - A nonzero value instructs the kernel to panic when a
> + A value of 1 instructs the kernel to panic when a
> hung task is detected. The default value is controlled
> by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
> option. The value selected by this boot parameter can
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 97eb0b552bf8..77b1b844b02b 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -1743,6 +1743,7 @@ struct sysctl_alias {
> */
> static const struct sysctl_alias sysctl_aliases[] = {
> {"numa_zonelist_order", "vm.numa_zonelist_order" },
> + {"hung_task_panic", "kernel.hung_task_panic" },
> { }
> };
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 14a625c16cb3..b22b5eeab3cb 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -63,16 +63,6 @@ static struct task_struct *watchdog_task;
> unsigned int __read_mostly sysctl_hung_task_panic =
> CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE;
>
> -static int __init hung_task_panic_setup(char *str)
> -{
> - int rc = kstrtouint(str, 0, &sysctl_hung_task_panic);
> -
> - if (rc)
> - return rc;
> - return 1;
> -}
> -__setup("hung_task_panic=", hung_task_panic_setup);
> -
> static int
> hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> {
> --
> 2.25.1
>
--
Kees Cook
Sorry to be late to the apocalypse review party for this, feedback below.
On Mon, Mar 30, 2020 at 01:55:33PM +0200, Vlastimil Babka wrote:
> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
>
> Settings the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
>
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and tries to interpret them as writes to the
> corresponding sys/ files using an temporary in-kernel procfs mount. This
> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
> registered sysctl tables.
"even though we don't handle modular sysctls" might be safer to add.
> Errors due to e.g. invalid parameter name or value
> are reported in the kernel log.
>
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.
This is wonderful when we think about existing sysctls which have
corresponding silly boot params that do the same thing. However, shoving
a boot param capability down every possible built-in sysctl brings
forward support considerations we should take serious, as this would
add a new user interface and we'll have to support it.
Simply put, not all sysctls should be born to be boot params. I suggest
we white-list which ones we can process, so that only sysctls we *do*
review and agree are good candidates get allowed to also be boot params.
Calling a proc hanlder early might seem functional, but if the subsystem
defers evaluation of a setting later, then any boot param set would be
lifted anyway. I think each syscl would need to be reviewed for this to
be supported in a way that doesn't create odd unexpected system settings
which we later have to support forever.
Should we not do this, we'll have to live with the consequences of
supporting the full swoop of sysctls are boot params, whatever
consequences those may be.
> Sysctls registered later on module load time are not set by this mechanism -
> it's expected that in such scenarios, setting sysctl values from userspace is
> practical enough.
I'm just not sure if its worth supporting these, for modules we have
module params, but those with more creative userspace might have a
better idea as to why we'd want to support this. I just can't see it
yet.
> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> [3] https://lore.kernel.org/r/[email protected]/
>
> Signed-off-by: Vlastimil Babka <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 9 ++
> fs/proc/proc_sysctl.c | 100 ++++++++++++++++++
> include/linux/sysctl.h | 4 +
> init/main.c | 2 +
> 4 files changed, 115 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..81ff626fc700 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>
> switches= [HW,M68k]
>
> + sysctl.*= [KNL]
> + Set a sysctl parameter, right before loading the init
> + process, as if the value was written to the respective
> + /proc/sys/... file. Both '.' and '/' are recognized as
> + separators. Unrecognized parameters and invalid values
> + are reported in the kernel log. Sysctls registered
> + later by a loaded module cannot be set this way.
> + Example: sysctl.vm.swappiness=40
> +
> sysfs.deprecated=0|1 [KNL]
> Enable/disable old style sysfs layout for old udev
> on older distributions. When this option is enabled
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index c75bb4632ed1..653188c9c4c9 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -14,6 +14,7 @@
> #include <linux/mm.h>
> #include <linux/module.h>
> #include <linux/bpf-cgroup.h>
> +#include <linux/mount.h>
> #include "internal.h"
>
> static const struct dentry_operations proc_sys_dentry_operations;
> @@ -1725,3 +1726,102 @@ int __init proc_sys_init(void)
>
> return sysctl_init();
> }
> +
> +/* Set sysctl value passed on kernel command line. */
> +static int process_sysctl_arg(char *param, char *val,
> + const char *unused, void *arg)
> +{
> + char *path;
> + struct vfsmount *proc_mnt = *((struct vfsmount **)arg);
> + struct file_system_type *proc_fs_type;
> + struct file *file;
> + int len;
> + int err;
> + loff_t pos = 0;
> + ssize_t wret;
> +
> + if (strncmp(param, "sysctl", sizeof("sysctl") - 1))
> + return 0;
> +
> + param += sizeof("sysctl") - 1;
> +
> + if (param[0] != '/' && param[0] != '.')
> + return 0;
> +
> + param++;
> +
> + if (!proc_mnt) {
> + proc_fs_type = get_fs_type("proc");
> + if (!proc_fs_type) {
> + pr_err("Failed to find procfs to set sysctl from command line");
> + return 0;
> + }
> + proc_mnt = kern_mount(proc_fs_type);
> + put_filesystem(proc_fs_type);
> + if (IS_ERR(proc_mnt)) {
> + pr_err("Failed to mount procfs to set sysctl from command line");
> + return 0;
> + }
> + *((struct vfsmount **)arg) = proc_mnt;
> + }
> +
> + path = kasprintf(GFP_KERNEL, "sys/%s", param);
> + if (!path)
> + panic("%s: Failed to allocate path for %s\n", __func__, param);
> + strreplace(path, '.', '/');
> +
> + file = file_open_root(proc_mnt->mnt_root, proc_mnt, path, O_WRONLY, 0);
> + if (IS_ERR(file)) {
> + err = PTR_ERR(file);
> + if (err == -ENOENT)
> + pr_err("Failed to set sysctl parameter '%s=%s': parameter not found",
> + param, val);
> + else if (err == -EACCES)
> + pr_err("Failed to set sysctl parameter '%s=%s': permission denied (read-only?)",
> + param, val);
> + else
> + pr_err("Error %pe opening proc file to set sysctl parameter '%s=%s'",
> + file, param, val);
> + goto out;
> + }
> + len = strlen(val);
> + wret = kernel_write(file, val, len, &pos);
> + if (wret < 0) {
> + err = wret;
> + if (err == -EINVAL)
> + pr_err("Failed to set sysctl parameter '%s=%s': invalid value",
> + param, val);
> + else
> + pr_err("Error %pe writing to proc file to set sysctl parameter '%s=%s'",
> + ERR_PTR(err), param, val);
> + } else if (wret != len) {
> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> + wret, len, path, param, val);
> + }
> +
> + err = filp_close(file, NULL);
> + if (err)
> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> + ERR_PTR(err), param, val);
> +out:
> + kfree(path);
> + return 0;
> +}
> +
> +void do_sysctl_args(void)
> +{
> + char *command_line;
> + struct vfsmount *proc_mnt = NULL;
> +
> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
move __kstrncpy() to a generic kstrncpy().
> + if (!command_line)
> + panic("%s: Failed to allocate copy of command line\n", __func__);
> +
> + parse_args("Setting sysctl args", command_line,
> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> +
> + if (proc_mnt)
> + kern_unmount(proc_mnt);
> +
> + kfree(command_line);
> +}
Then, can we get this tested as part of lib/test_sysctl.c with its
respective tools/testing/selftests/sysctl/sysctl.sh ?
Luis
On 3/30/20 10:43 AM, Kees Cook wrote:
...
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 81ff626fc700..e0b8840404a1 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1457,7 +1457,7 @@
>> [KNL] Should the hung task detector generate panics.
>> Format: <integer>
>>
>> - A nonzero value instructs the kernel to panic when a
>> + A value of 1 instructs the kernel to panic when a
>> hung task is detected. The default value is controlled
>> by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
>> option. The value selected by this boot parameter can
>> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
>> index 97eb0b552bf8..77b1b844b02b 100644
>> --- a/fs/proc/proc_sysctl.c
>> +++ b/fs/proc/proc_sysctl.c
>> @@ -1743,6 +1743,7 @@ struct sysctl_alias {
>> */
>> static const struct sysctl_alias sysctl_aliases[] = {
>> {"numa_zonelist_order", "vm.numa_zonelist_order" },
Hi Vlastimil,
Maybe best to delete the above line? Because:
a) it was added as an example, and now that you have a real use case in this patch,
the example is no longer required, and
b) numa_zonelist_order is deprecated, as a boot param. Adding support to it in this
brand-new mechanism seems to be going a bit in the opposite direction of deprecation.
And, I don't think you really want all the sysctls to be enabled as boot params, right? Your
comment right above sysctl_aliases[] (shown in patch 2) sort of indicates that only some items
are meant to be both sysctl's and boot params. And that makes sense.
In fact, the sysctl_aliases[] is (or could be) effectively the whitelist that Luis Chamberlain
was requesting in another thread. A whitelist makes good sense, for the reasons Luis listed.
As such, keeping it limited to items that we want, seems like the way to go, IMHO.
thanks,
--
John Hubbard
NVIDIA
>> + {"hung_task_panic", "kernel.hung_task_panic" },
>> { }
>> };
>>
>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>> index 14a625c16cb3..b22b5eeab3cb 100644
>> --- a/kernel/hung_task.c
>> +++ b/kernel/hung_task.c
>> @@ -63,16 +63,6 @@ static struct task_struct *watchdog_task;
>> unsigned int __read_mostly sysctl_hung_task_panic =
>> CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE;
>>
>> -static int __init hung_task_panic_setup(char *str)
>> -{
>> - int rc = kstrtouint(str, 0, &sysctl_hung_task_panic);
>> -
>> - if (rc)
>> - return rc;
>> - return 1;
>> -}
>> -__setup("hung_task_panic=", hung_task_panic_setup);
>> -
>> static int
>> hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
>> {
>> --
>> 2.25.1
>>
>
On 3/31/20 2:34 AM, John Hubbard wrote:
> On 3/30/20 10:43 AM, Kees Cook wrote:
> ...
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 81ff626fc700..e0b8840404a1 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1457,7 +1457,7 @@
>>> [KNL] Should the hung task detector generate panics.
>>> Format: <integer>
>>>
>>> - A nonzero value instructs the kernel to panic when a
>>> + A value of 1 instructs the kernel to panic when a
>>> hung task is detected. The default value is controlled
>>> by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
>>> option. The value selected by this boot parameter can
>>> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
>>> index 97eb0b552bf8..77b1b844b02b 100644
>>> --- a/fs/proc/proc_sysctl.c
>>> +++ b/fs/proc/proc_sysctl.c
>>> @@ -1743,6 +1743,7 @@ struct sysctl_alias {
>>> */
>>> static const struct sysctl_alias sysctl_aliases[] = {
>>> {"numa_zonelist_order", "vm.numa_zonelist_order" },
>
>
> Hi Vlastimil,
>
> Maybe best to delete the above line? Because:
>
> a) it was added as an example, and now that you have a real use case in this patch,
> the example is no longer required, and
>
> b) numa_zonelist_order is deprecated, as a boot param. Adding support to it in this
> brand-new mechanism seems to be going a bit in the opposite direction of deprecation.
Well, this aliases table is not the brand new mechanism, it's just for handling
sysctls that also have a legacy boot param. numa_zonelist_order is such legacy
boot param, so we can handle it here instead of its special handler. If we
decide to remove it later, we can do that, but there is no user-visible effect
on its deprecation by this series.
> And, I don't think you really want all the sysctls to be enabled as boot params, right? Your
The point of Patch 1 is very much so that all sysctls can be set using a boot
param in the form of sysctl.foo.bar=baz
> comment right above sysctl_aliases[] (shown in patch 2) sort of indicates that only some items
> are meant to be both sysctl's and boot params. And that makes sense.
Patches 2+3 are only about handling the legacy boot params that have a sysctl
counterpart.
> In fact, the sysctl_aliases[] is (or could be) effectively the whitelist that Luis Chamberlain
> was requesting in another thread. A whitelist makes good sense, for the reasons Luis listed.
> As such, keeping it limited to items that we want, seems like the way to go, IMHO.
See my reply there once I send it :)
> thanks,
>
On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> Sorry to be late to the apocalypse review party for this, feedback below.
>
> On Mon, Mar 30, 2020 at 01:55:33PM +0200, Vlastimil Babka wrote:
>> A recently proposed patch to add vm_swappiness command line parameter in
>> addition to existing sysctl [1] made me wonder why we don't have a general
>> support for passing sysctl parameters via command line. Googling found only
>> somebody else wondering the same [2], but I haven't found any prior discussion
>> with reasons why not to do this.
>>
>> Settings the vm_swappiness issue aside (the underlying issue might be solved in
>> a different way), quick search of kernel-parameters.txt shows there are already
>> some that exist as both sysctl and kernel parameter - hung_task_panic,
>> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
>> would remove the need to add more of those one-offs and might be handy in
>> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
>>
>> Hence, this patch adds a new parse_args() pass that looks for parameters
>> prefixed by 'sysctl.' and tries to interpret them as writes to the
>> corresponding sys/ files using an temporary in-kernel procfs mount. This
>> mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically
>> registered sysctl tables.
>
> "even though we don't handle modular sysctls" might be safer to add.
OK
>> Errors due to e.g. invalid parameter name or value
>> are reported in the kernel log.
>>
>> The processing is hooked right before the init process is loaded, as some
>> handlers might be more complicated than simple setters and might need some
>> subsystems to be initialized. At the moment the init process can be started and
>> eventually execute a process writing to /proc/sys/ then it should be also fine
>> to do that from the kernel.
>
> This is wonderful when we think about existing sysctls which have
> corresponding silly boot params that do the same thing. However, shoving
> a boot param capability down every possible built-in sysctl brings
> forward support considerations we should take serious, as this would
> add a new user interface and we'll have to support it.
Hmm, if I boot with an initramfs with init process that does mount /proc and set
some sysctl there as the very first thing, then this will be almost the same
moment as my patch does it. There is no further kernel initialization in
between. So with your logic we already do support all non-modular sysctls to be
set so early.
> Simply put, not all sysctls should be born to be boot params. I suggest
> we white-list which ones we can process, so that only sysctls we *do*
> review and agree are good candidates get allowed to also be boot params.
By above, the nuber of sysctls that will be problematic with this boot param
mechanism, but work properly when set from init process immediately, should be
near zero, and I would expect truly zero. As such, whitelist approach seems
excessive to me and it would take a lot of effort to build it, and it will be a
lottery which sysctl would work as boot param on which kernel version. Sounds
like a lot of trouble for little benefit to me.
> Calling a proc hanlder early might seem functional, but if the subsystem
> defers evaluation of a setting later, then any boot param set would be
> lifted anyway.
I'm not sure I understand, can you show me some example please?
> I think each syscl would need to be reviewed for this to
> be supported in a way that doesn't create odd unexpected system settings
> which we later have to support forever.
We would already do per the initramfs argument.
> Should we not do this, we'll have to live with the consequences of
> supporting the full swoop of sysctls are boot params, whatever
> consequences those may be.
Of course when the first user tries to set some particular sysctl as boot param
and finds and reports it doesn't work as intended, then it can be fixed or
blacklisted and it can't break anyone else?
>> Sysctls registered later on module load time are not set by this mechanism -
>> it's expected that in such scenarios, setting sysctl values from userspace is
>> practical enough.
>
> I'm just not sure if its worth supporting these, for modules we have
> module params, but those with more creative userspace might have a
> better idea as to why we'd want to support this. I just can't see it
> yet.
Sure, I can defer that part for later now.
>> [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
>> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
>> [3] https://lore.kernel.org/r/[email protected]/
>>
>> Signed-off-by: Vlastimil Babka <[email protected]>
>> ---
>
On Tue 31-03-20 09:42:46, Vlastimil Babka wrote:
[...]
> > Should we not do this, we'll have to live with the consequences of
> > supporting the full swoop of sysctls are boot params, whatever
> > consequences those may be.
>
> Of course when the first user tries to set some particular sysctl as boot param
> and finds and reports it doesn't work as intended, then it can be fixed or
> blacklisted and it can't break anyone else?
Absolutely agreed. I would be really careful to not overengineer this
whole thing.
--
Michal Hocko
SUSE Labs
On Tue, Mar 31, 2020 at 09:42:46AM +0200, Vlastimil Babka wrote:
> On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> >
> > This is wonderful when we think about existing sysctls which have
> > corresponding silly boot params that do the same thing. However, shoving
> > a boot param capability down every possible built-in sysctl brings
> > forward support considerations we should take serious, as this would
> > add a new user interface and we'll have to support it.
>
> Hmm, if I boot with an initramfs with init process that does mount /proc and set
> some sysctl there as the very first thing, then this will be almost the same
> moment as my patch does it. There is no further kernel initialization in
> between. So with your logic we already do support all non-modular sysctls to be
> set so early.
Yes, true. Then by all means:
Reviewed-by: Luis Chamberlain <[email protected]>
Luis
On Mon, Mar 30, 2020 at 01:55:34PM +0200, Vlastimil Babka wrote:
> We can now handle sysctl parameters on kernel command line, but historically
> some parameters introduced their own command line equivalent, which we don't
> want to remove for compatibility reasons. We can however convert them to the
> generic infrastructure with a table translating the legacy command line
> parameters to their sysctl names, and removing the one-off param handlers.
>
> This patch adds the support and makes the first conversion to demonstrate it,
> on the (deprecated) numa_zonelist_order parameter.
>
> Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Luis
On 3/31/20 12:27 AM, Vlastimil Babka wrote:
...
> Patches 2+3 are only about handling the legacy boot params that have a sysctl
> counterpart.
>
OK, I misread what those were for.
>> In fact, the sysctl_aliases[] is (or could be) effectively the whitelist that Luis Chamberlain
>> was requesting in another thread. A whitelist makes good sense, for the reasons Luis listed.
>> As such, keeping it limited to items that we want, seems like the way to go, IMHO.
>
> See my reply there once I send it :)
>
Saw that, and it all sounds good now.
thanks,
--
John Hubbard
NVIDIA
On Tue, Mar 31, 2020 at 09:48:36AM +0200, Michal Hocko wrote:
> On Tue 31-03-20 09:42:46, Vlastimil Babka wrote:
> [...]
> > > Should we not do this, we'll have to live with the consequences of
> > > supporting the full swoop of sysctls are boot params, whatever
> > > consequences those may be.
> >
> > Of course when the first user tries to set some particular sysctl as boot param
> > and finds and reports it doesn't work as intended, then it can be fixed or
> > blacklisted and it can't break anyone else?
>
> Absolutely agreed. I would be really careful to not overengineer this
> whole thing.
Right -- this is supposed to be _simple_, and I think that's the primary
benefit here. If we encounter problems we can fix those cases. The
careful place, I think, needs to be with converting existing boot params
to be aliases. That's when timing considerations need to be taken into
account carefully.
--
Kees Cook
On 2020/03/30 20:55, Vlastimil Babka wrote:
> @@ -63,16 +63,6 @@ static struct task_struct *watchdog_task;
> unsigned int __read_mostly sysctl_hung_task_panic =
> CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE;
>
> -static int __init hung_task_panic_setup(char *str)
> -{
> - int rc = kstrtouint(str, 0, &sysctl_hung_task_panic);
> -
> - if (rc)
> - return rc;
> - return 1;
> -}
> -__setup("hung_task_panic=", hung_task_panic_setup);
Can we defer removal of this handler for "one release cycle" (and instead emit a line
saying that "this parameter will be replaced by ..." during that cycle) ? I welcome
PATCH 1/3, but kernel testing projects (e.g. syzbot) needs to update their settings
between PATCH 1/3 was merged into linux.git and PATCH 3/3 is merged into linux.git .
https://lkml.kernel.org/r/CACT4Y+YE-j5ncjTGN6UhngfCNRgVo-QDZ3VCBGACdbs9-v+axQ@mail.gmail.com says
"Announcing unmerged changes is too early (as this patch showed). And once it's in linux-next it's already too late.."
> -
> static int
> hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> {
>
On 4/1/20 1:12 AM, Tetsuo Handa wrote:
> On 2020/03/30 20:55, Vlastimil Babka wrote:
>> @@ -63,16 +63,6 @@ static struct task_struct *watchdog_task;
>> unsigned int __read_mostly sysctl_hung_task_panic =
>> CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE;
>>
>> -static int __init hung_task_panic_setup(char *str)
>> -{
>> - int rc = kstrtouint(str, 0, &sysctl_hung_task_panic);
>> -
>> - if (rc)
>> - return rc;
>> - return 1;
>> -}
>> -__setup("hung_task_panic=", hung_task_panic_setup);
>
> Can we defer removal of this handler for "one release cycle" (and instead emit a line
> saying that "this parameter will be replaced by ..." during that cycle) ? I welcome
The old parameter is not removed, it's just handled differently, see patch 2.
Both old name and new sysctl.name will work.
> PATCH 1/3, but kernel testing projects (e.g. syzbot) needs to update their settings
> between PATCH 1/3 was merged into linux.git and PATCH 3/3 is merged into linux.git .
>
> https://lkml.kernel.org/r/CACT4Y+YE-j5ncjTGN6UhngfCNRgVo-QDZ3VCBGACdbs9-v+axQ@mail.gmail.com says
> "Announcing unmerged changes is too early (as this patch showed). And once it's in linux-next it's already too late.."
>
>> -
>> static int
>> hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
>> {
>>
>
On 3/31/20 12:44 AM, Luis Chamberlain wrote:
>> + } else if (wret != len) {
>> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
>> + wret, len, path, param, val);
>> + }
>> +
>> + err = filp_close(file, NULL);
>> + if (err)
>> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
>> + ERR_PTR(err), param, val);
>> +out:
>> + kfree(path);
>> + return 0;
>> +}
>> +
>> +void do_sysctl_args(void)
>> +{
>> + char *command_line;
>> + struct vfsmount *proc_mnt = NULL;
>> +
>> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
>
> can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
I don't follow, what am I missing? Do you mean this?
size_t len = strlen(saved_command_line);
command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
What would be the advantage over plain kstrdup()?
As for kfree_const(), when would command_line be .rodata? I don't see using
kstrndup() resulting in that.
> move __kstrncpy() to a generic kstrncpy().
>
>> + if (!command_line)
>> + panic("%s: Failed to allocate copy of command line\n", __func__);
>> +
>> + parse_args("Setting sysctl args", command_line,
>> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
>> +
>> + if (proc_mnt)
>> + kern_unmount(proc_mnt);
>> +
>> + kfree(command_line);
>> +}
>
> Then, can we get this tested as part of lib/test_sysctl.c with its
> respective tools/testing/selftests/sysctl/sysctl.sh ?
Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
should be build with 'y', which would be needed anyway) and expand the test
instructions so that the test kernel boot has to include it on the command line,
and then I verify it has been set? Or do you see a better way?
Thanks,
Vlastimil
> Luis
>
On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> >> + } else if (wret != len) {
> >> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> >> + wret, len, path, param, val);
> >> + }
> >> +
> >> + err = filp_close(file, NULL);
> >> + if (err)
> >> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> >> + ERR_PTR(err), param, val);
> >> +out:
> >> + kfree(path);
> >> + return 0;
> >> +}
> >> +
> >> +void do_sysctl_args(void)
> >> +{
> >> + char *command_line;
> >> + struct vfsmount *proc_mnt = NULL;
> >> +
> >> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
> >
> > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
>
> I don't follow, what am I missing? Do you mean this?
>
> size_t len = strlen(saved_command_line);
> command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
>
> What would be the advantage over plain kstrdup()?
> As for kfree_const(), when would command_line be .rodata? I don't see using
> kstrndup() resulting in that.
The const nature of using kstrdup() comes with using const for your
purpose. ie:
const char *const_command_line = saved_command_line;
The point of a kstrncpy() then is to ensure force a const throughout
your use if you know you don't need modifications.
> > move __kstrncpy() to a generic kstrncpy().
> >
> >> + if (!command_line)
> >> + panic("%s: Failed to allocate copy of command line\n", __func__);
> >> +
> >> + parse_args("Setting sysctl args", command_line,
> >> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> >> +
> >> + if (proc_mnt)
> >> + kern_unmount(proc_mnt);
> >> +
> >> + kfree(command_line);
> >> +}
> >
> > Then, can we get this tested as part of lib/test_sysctl.c with its
> > respective tools/testing/selftests/sysctl/sysctl.sh ?
>
> Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> should be build with 'y', which would be needed anyway) and expand the test
> instructions so that the test kernel boot has to include it on the command line,
> and then I verify it has been set? Or do you see a better way?
We don't necessarily have a way to test the use boot params today.
That reveals an are which we should eventually put some focus on
in the future. In the meantime we have to deal with what we have.
So let's think about this:
You are adding a new cmdline sysctl boot param, and also a wrapper
for those old boot bootparams to also work using both new sysctl
path and old path. Testing just these both should suffice.
How about this:
For testing the new feature you are adding, can you extend the default
boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
upon boot we can verify the proc handlers for these new boot params got
kicked, and likewise some other proc handlers which also can be used
from the cmdline are *not* set. For this later set, we already have
a series of test syctls you can use. In fact, you can use the existing
syctls for both cases already I believe, its just a matter of adding
this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
and these tests would take place *first* on the script.
That would test both cases with one kernel.
You could then also add a bogus new sysctl which also expands to a silly
raw boot param to test the wrapper you are providing. That would be the
only new test syctl you would need to add.
If you can think of a way to *break* your new functionality and ensure
it doesn't break the kernel, even better.
Luis
On Thu, Apr 02, 2020 at 04:04:42PM +0000, Luis Chamberlain wrote:
> On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> > On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > >> + } else if (wret != len) {
> > >> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> > >> + wret, len, path, param, val);
> > >> + }
> > >> +
> > >> + err = filp_close(file, NULL);
> > >> + if (err)
> > >> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> > >> + ERR_PTR(err), param, val);
> > >> +out:
> > >> + kfree(path);
> > >> + return 0;
> > >> +}
> > >> +
> > >> +void do_sysctl_args(void)
> > >> +{
> > >> + char *command_line;
> > >> + struct vfsmount *proc_mnt = NULL;
> > >> +
> > >> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > >
> > > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> >
> > I don't follow, what am I missing? Do you mean this?
> >
> > size_t len = strlen(saved_command_line);
> > command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> >
> > What would be the advantage over plain kstrdup()?
> > As for kfree_const(), when would command_line be .rodata? I don't see using
> > kstrndup() resulting in that.
>
> The const nature of using kstrdup() comes with using const for your
> purpose. ie:
>
> const char *const_command_line = saved_command_line;
>
> The point of a kstrncpy() then is to ensure force a const throughout
> your use if you know you don't need modifications.
I'm not following this suggestion. It _is_ modifying it. That's why it's
making a copy. What am I missing?
> > >> + parse_args("Setting sysctl args", command_line,
> > >> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> > >> +
> > >> + if (proc_mnt)
> > >> + kern_unmount(proc_mnt);
> > >> +
> > >> + kfree(command_line);
> > >> +}
> > >
> > > Then, can we get this tested as part of lib/test_sysctl.c with its
> > > respective tools/testing/selftests/sysctl/sysctl.sh ?
> >
> > Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> > should be build with 'y', which would be needed anyway) and expand the test
> > instructions so that the test kernel boot has to include it on the command line,
> > and then I verify it has been set? Or do you see a better way?
>
> We don't necessarily have a way to test the use boot params today.
> That reveals an are which we should eventually put some focus on
> in the future. In the meantime we have to deal with what we have.
>
> So let's think about this:
>
> You are adding a new cmdline sysctl boot param, and also a wrapper
> for those old boot bootparams to also work using both new sysctl
> path and old path. Testing just these both should suffice.
>
> How about this:
>
> For testing the new feature you are adding, can you extend the default
> boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
> upon boot we can verify the proc handlers for these new boot params got
> kicked, and likewise some other proc handlers which also can be used
> from the cmdline are *not* set. For this later set, we already have
> a series of test syctls you can use. In fact, you can use the existing
> syctls for both cases already I believe, its just a matter of adding
> this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
> and these tests would take place *first* on the script.
This seems... messy. I'm all for testing this, but I'd rather this not
be internally driven. This is an external interface (boot params), so
I'd rather an external driver handle that testing. We don't have a
common method to do that with the kernel, though.
> That would test both cases with one kernel.
>
> You could then also add a bogus new sysctl which also expands to a silly
> raw boot param to test the wrapper you are providing. That would be the
> only new test syctl you would need to add.
Sure, that seems reasonable. Supporting externally driven testing makes
sense for this.
--
Kees Cook
On Thu, Apr 02, 2020 at 10:23:13AM -0700, Kees Cook wrote:
> On Thu, Apr 02, 2020 at 04:04:42PM +0000, Luis Chamberlain wrote:
> > On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> > > On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > > >> + } else if (wret != len) {
> > > >> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> > > >> + wret, len, path, param, val);
> > > >> + }
> > > >> +
> > > >> + err = filp_close(file, NULL);
> > > >> + if (err)
> > > >> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> > > >> + ERR_PTR(err), param, val);
> > > >> +out:
> > > >> + kfree(path);
> > > >> + return 0;
> > > >> +}
> > > >> +
> > > >> +void do_sysctl_args(void)
> > > >> +{
> > > >> + char *command_line;
> > > >> + struct vfsmount *proc_mnt = NULL;
> > > >> +
> > > >> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > > >
> > > > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> > >
> > > I don't follow, what am I missing? Do you mean this?
> > >
> > > size_t len = strlen(saved_command_line);
> > > command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> > >
> > > What would be the advantage over plain kstrdup()?
> > > As for kfree_const(), when would command_line be .rodata? I don't see using
> > > kstrndup() resulting in that.
> >
> > The const nature of using kstrdup() comes with using const for your
> > purpose. ie:
> >
> > const char *const_command_line = saved_command_line;
> >
> > The point of a kstrncpy() then is to ensure force a const throughout
> > your use if you know you don't need modifications.
>
> I'm not following this suggestion. It _is_ modifying it. That's why it's
> making a copy. What am I missing?
We modify the copied bootparams to allow new sysctls to map to old boot params?
If so, then yes, this cannot be used.
> > > >> + parse_args("Setting sysctl args", command_line,
> > > >> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> > > >> +
> > > >> + if (proc_mnt)
> > > >> + kern_unmount(proc_mnt);
> > > >> +
> > > >> + kfree(command_line);
> > > >> +}
> > > >
> > > > Then, can we get this tested as part of lib/test_sysctl.c with its
> > > > respective tools/testing/selftests/sysctl/sysctl.sh ?
> > >
> > > Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> > > should be build with 'y', which would be needed anyway) and expand the test
> > > instructions so that the test kernel boot has to include it on the command line,
> > > and then I verify it has been set? Or do you see a better way?
> >
> > We don't necessarily have a way to test the use boot params today.
> > That reveals an are which we should eventually put some focus on
> > in the future. In the meantime we have to deal with what we have.
> >
> > So let's think about this:
> >
> > You are adding a new cmdline sysctl boot param, and also a wrapper
> > for those old boot bootparams to also work using both new sysctl
> > path and old path. Testing just these both should suffice.
> >
> > How about this:
> >
> > For testing the new feature you are adding, can you extend the default
> > boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
> > upon boot we can verify the proc handlers for these new boot params got
> > kicked, and likewise some other proc handlers which also can be used
> > from the cmdline are *not* set. For this later set, we already have
> > a series of test syctls you can use. In fact, you can use the existing
> > syctls for both cases already I believe, its just a matter of adding
> > this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
> > and these tests would take place *first* on the script.
>
> This seems... messy.
It is all we have.
> I'm all for testing this,
OK so we do want to test it.
> but I'd rather this not be internally driven.
This is the least cumbersome solution I could think of. Other things
would require things like using qemu, etc. That seems much more messsy.
> This is an external interface (boot params), so
> I'd rather an external driver handle that testing. We don't have a
> common method to do that with the kernel, though.
Right... which begs the question now -- how do we test this sort of
stuff? The above would at least get us coverage while we iron something
more generic out for boot params.
> > That would test both cases with one kernel.
> >
> > You could then also add a bogus new sysctl which also expands to a silly
> > raw boot param to test the wrapper you are providing. That would be the
> > only new test syctl you would need to add.
>
> Sure, that seems reasonable. Supporting externally driven testing makes
> sense for this.
But again, what exactly?
Luis
On Thu, Apr 02, 2020 at 08:59:32PM +0000, Luis Chamberlain wrote:
> On Thu, Apr 02, 2020 at 10:23:13AM -0700, Kees Cook wrote:
> > On Thu, Apr 02, 2020 at 04:04:42PM +0000, Luis Chamberlain wrote:
> > > On Wed, Apr 01, 2020 at 01:01:47PM +0200, Vlastimil Babka wrote:
> > > > On 3/31/20 12:44 AM, Luis Chamberlain wrote:
> > > > >> + } else if (wret != len) {
> > > > >> + pr_err("Wrote only %ld bytes of %d writing to proc file %s to set sysctl parameter '%s=%s'",
> > > > >> + wret, len, path, param, val);
> > > > >> + }
> > > > >> +
> > > > >> + err = filp_close(file, NULL);
> > > > >> + if (err)
> > > > >> + pr_err("Error %pe closing proc file to set sysctl parameter '%s=%s'",
> > > > >> + ERR_PTR(err), param, val);
> > > > >> +out:
> > > > >> + kfree(path);
> > > > >> + return 0;
> > > > >> +}
> > > > >> +
> > > > >> +void do_sysctl_args(void)
> > > > >> +{
> > > > >> + char *command_line;
> > > > >> + struct vfsmount *proc_mnt = NULL;
> > > > >> +
> > > > >> + command_line = kstrdup(saved_command_line, GFP_KERNEL);
> > > > >
> > > > > can you use kstrndup() ? And then use kfree_const()? Yes, feel free to
> > > >
> > > > I don't follow, what am I missing? Do you mean this?
> > > >
> > > > size_t len = strlen(saved_command_line);
> > > > command_line = kstrndup(saved_command_line, len, GFP_KERNEL);
> > > >
> > > > What would be the advantage over plain kstrdup()?
> > > > As for kfree_const(), when would command_line be .rodata? I don't see using
> > > > kstrndup() resulting in that.
> > >
> > > The const nature of using kstrdup() comes with using const for your
> > > purpose. ie:
> > >
> > > const char *const_command_line = saved_command_line;
> > >
> > > The point of a kstrncpy() then is to ensure force a const throughout
> > > your use if you know you don't need modifications.
> >
> > I'm not following this suggestion. It _is_ modifying it. That's why it's
> > making a copy. What am I missing?
>
> We modify the copied bootparams to allow new sysctls to map to old boot params?
>
> If so, then yes, this cannot be used.
I feel like I've lost track of this thread. This strdup is so that the
command line can have '\0's injected while it steps through the args
(and for doing the . and / replacement). I don't know what you mean by
"map" here: this is standard parse_args() usage.
> > > > >> + parse_args("Setting sysctl args", command_line,
> > > > >> + NULL, 0, -1, -1, &proc_mnt, process_sysctl_arg);
> > > > >> +
> > > > >> + if (proc_mnt)
> > > > >> + kern_unmount(proc_mnt);
> > > > >> +
> > > > >> + kfree(command_line);
> > > > >> +}
> > > > >
> > > > > Then, can we get this tested as part of lib/test_sysctl.c with its
> > > > > respective tools/testing/selftests/sysctl/sysctl.sh ?
> > > >
> > > > Hmm so I add some sysctl to the test "module" (in fact the 'config' file says it
> > > > should be build with 'y', which would be needed anyway) and expand the test
> > > > instructions so that the test kernel boot has to include it on the command line,
> > > > and then I verify it has been set? Or do you see a better way?
> > >
> > > We don't necessarily have a way to test the use boot params today.
> > > That reveals an are which we should eventually put some focus on
> > > in the future. In the meantime we have to deal with what we have.
> > >
> > > So let's think about this:
> > >
> > > You are adding a new cmdline sysctl boot param, and also a wrapper
> > > for those old boot bootparams to also work using both new sysctl
> > > path and old path. Testing just these both should suffice.
> > >
> > > How about this:
> > >
> > > For testing the new feature you are adding, can you extend the default
> > > boot params *always* if a new CONFIG_TEST_SYSCTL_CMDLINE is set? Then
> > > upon boot we can verify the proc handlers for these new boot params got
> > > kicked, and likewise some other proc handlers which also can be used
> > > from the cmdline are *not* set. For this later set, we already have
> > > a series of test syctls you can use. In fact, you can use the existing
> > > syctls for both cases already I believe, its just a matter of adding
> > > this new CONFIG_TEST_SYSCTL_CMDLINE which would extend the cmdline,
> > > and these tests would take place *first* on the script.
> >
> > This seems... messy.
>
> It is all we have.
> > I'm all for testing this,
>
> OK so we do want to test it.
>
> > but I'd rather this not be internally driven.
>
> This is the least cumbersome solution I could think of. Other things
> would require things like using qemu, etc. That seems much more messsy.
Yes. Doing an internal extension isn't testing the actual code.
>
> > This is an external interface (boot params), so
> > I'd rather an external driver handle that testing. We don't have a
> > common method to do that with the kernel, though.
>
> Right... which begs the question now -- how do we test this sort of
> stuff? The above would at least get us coverage while we iron something
> more generic out for boot params.
>
> > > That would test both cases with one kernel.
> > >
> > > You could then also add a bogus new sysctl which also expands to a silly
> > > raw boot param to test the wrapper you are providing. That would be the
> > > only new test syctl you would need to add.
> >
> > Sure, that seems reasonable. Supporting externally driven testing makes
> > sense for this.
>
> But again, what exactly?
I don't think anything is needed for this series. It can be boot tested
manually.
--
Kees Cook
On Fri, Apr 03, 2020 at 04:57:51PM -0700, Kees Cook wrote:
> On Thu, Apr 02, 2020 at 08:59:32PM +0000, Luis Chamberlain wrote:
> > We modify the copied bootparams to allow new sysctls to map to old boot params?
>
> This strdup is so that the
> command line can have '\0's injected while it steps through the args
> (and for doing the . and / replacement).
Please ignore the const feedback then.
> > This is the least cumbersome solution I could think of. Other things
> > would require things like using qemu, etc. That seems much more messsy.
>
> Yes. Doing an internal extension isn't testing the actual code.
But it would.
> > > This is an external interface (boot params), so
> > > I'd rather an external driver handle that testing. We don't have a
> > > common method to do that with the kernel, though.
> >
> > Right... which begs the question now -- how do we test this sort of
> > stuff? The above would at least get us coverage while we iron something
> > more generic out for boot params.
> >
> > > > That would test both cases with one kernel.
> > > >
> > > > You could then also add a bogus new sysctl which also expands to a silly
> > > > raw boot param to test the wrapper you are providing. That would be the
> > > > only new test syctl you would need to add.
> > >
> > > Sure, that seems reasonable. Supporting externally driven testing makes
> > > sense for this.
> >
> > But again, what exactly?
>
> I don't think anything is needed for this series. It can be boot tested
> manually.
Why test it manually when it could be tested automatically with a new kconfig?
Luis
On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> > Yes. Doing an internal extension isn't testing the actual code.
>
> But it would.
>
> [...]
> > I don't think anything is needed for this series. It can be boot tested
> > manually.
>
> Why test it manually when it could be tested automatically with a new kconfig?
So, my impression is that adding code to the internals to test the
internals isn't a valid test (or at least makes it fragile) because the
test would depend on the changes to the internals (or at least depend on
non-default non-production CONFIGs).
Can you send a patch for what you think this should look like? Perhaps
I'm not correctly imagining what you're describing?
Regardless of testing, I think this series is ready for -mm.
--
Kees Cook
On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> > > Yes. Doing an internal extension isn't testing the actual code.
> >
> > But it would.
> >
> > [...]
> > > I don't think anything is needed for this series. It can be boot tested
> > > manually.
> >
> > Why test it manually when it could be tested automatically with a new kconfig?
>
> So, my impression is that adding code to the internals to test the
> internals isn't a valid test (or at least makes it fragile) because the
> test would depend on the changes to the internals (or at least depend on
> non-default non-production CONFIGs).
The *internal* aspect here is an extension to boot params under a
kconfig which would simply append to it, as if the user would have
added some more params. Since we already have test sysctl params the
only one we'd need to add on the test driver would be a dummy one which
tests the alias, on the second patch. We should have enough sysctls to
already test dummy values.
Nothing else would be needed as the sysctl test driver would just need
to test that the values expected when this is enabled is set.
> Can you send a patch for what you think this should look like? Perhaps
> I'm not correctly imagining what you're describing?
I rather get the person involved in the changes to do the testing so
as they're the ones designing the feature. If however it is not clear
what I mean I'm happy to elaborate.
Vlastimil do you get what I mean?
> Regardless of testing, I think this series is ready for -mm.
I'm happy for it to go in provided we at least devise a follow up plan
for testing. Otherwise -- like other things, it won't get done.
Luis
On Tue, Apr 14, 2020 at 01:25:07PM +0200, Vlastimil Babka wrote:
> On 4/6/20 7:08 PM, Luis Chamberlain wrote:
> > On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
> >> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> >> > > Yes. Doing an internal extension isn't testing the actual code.
> >> >
> >> > But it would.
> >> >
> >> > [...]
> >> > > I don't think anything is needed for this series. It can be boot tested
> >> > > manually.
> >> >
> >> > Why test it manually when it could be tested automatically with a new kconfig?
> >>
> >> So, my impression is that adding code to the internals to test the
> >> internals isn't a valid test (or at least makes it fragile) because the
> >> test would depend on the changes to the internals (or at least depend on
> >> non-default non-production CONFIGs).
> >
> > The *internal* aspect here is an extension to boot params under a
> > kconfig which would simply append to it, as if the user would have
>
> So there's no such kconfig yet to apply boot parameters specified by configure,
> right? That would itself be a new feature.
I cannot say, there is no easy grammatical expression for this. For
kernel testing, no, I am not aware of one.
> Or could we use bootconfig? (CC Masami)
Neat, yeah, that seems like a set of helpers of which could help for
sure.
> >> Regardless of testing, I think this series is ready for -mm.
> >
> > I'm happy for it to go in provided we at least devise a follow up plan
> > for testing. Otherwise -- like other things, it won't get done.
>
> OK I'll send a v2 and we can discuss the test driver details. I don't want to
> neglect testing, but also not block the new functionality,
So long as the testing part gets done, I'm happy :)
> in case it means
> testing means adding another new functionality.
Yeah, I can see how some new code may need to be added for that, its a
good point.
But think about this for a second, if we *didn't* have such code added,
how could it have easily been tested before? The question is rhetorical,
as I'm alluding to that a lot of old code wasn't designed for easy unit
testing either, and I'd invite one to consider the value of the change
in philosophy on first code design when one *does* take that into
consideration. There are certain best practices that I'd wager are
probably good for us to evaluate for the long term of kernel evolution,
and I think that always advocating designing testing around new
functionality would be one.
Once and if you do add your testing, and if such testing is expanded to
test parsing the cmdline further, and to purposely break it, who knows
what other bugs bugs we'll find.
But maybe Masami already uncovered all the fun bugs.
Luis
On 4/6/20 7:08 PM, Luis Chamberlain wrote:
> On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
>> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
>> > > Yes. Doing an internal extension isn't testing the actual code.
>> >
>> > But it would.
>> >
>> > [...]
>> > > I don't think anything is needed for this series. It can be boot tested
>> > > manually.
>> >
>> > Why test it manually when it could be tested automatically with a new kconfig?
>>
>> So, my impression is that adding code to the internals to test the
>> internals isn't a valid test (or at least makes it fragile) because the
>> test would depend on the changes to the internals (or at least depend on
>> non-default non-production CONFIGs).
>
> The *internal* aspect here is an extension to boot params under a
> kconfig which would simply append to it, as if the user would have
So there's no such kconfig yet to apply boot parameters specified by configure,
right? That would itself be a new feature. Or could we use bootconfig? (CC Masami)
> added some more params. Since we already have test sysctl params the
> only one we'd need to add on the test driver would be a dummy one which
> tests the alias, on the second patch. We should have enough sysctls to
> already test dummy values.
Right.
> Nothing else would be needed as the sysctl test driver would just need
> to test that the values expected when this is enabled is set.
>
>> Can you send a patch for what you think this should look like? Perhaps
>> I'm not correctly imagining what you're describing?
>
> I rather get the person involved in the changes to do the testing so
> as they're the ones designing the feature. If however it is not clear
> what I mean I'm happy to elaborate.
>
> Vlastimil do you get what I mean?
Hopefully :)
>> Regardless of testing, I think this series is ready for -mm.
>
> I'm happy for it to go in provided we at least devise a follow up plan
> for testing. Otherwise -- like other things, it won't get done.
OK I'll send a v2 and we can discuss the test driver details. I don't want to
neglect testing, but also not block the new functionality, in case it means
testing means adding another new functionality.
Thanks,
Vlastimil
> Luis
>
On Tue, 14 Apr 2020 13:25:07 +0200
Vlastimil Babka <[email protected]> wrote:
> On 4/6/20 7:08 PM, Luis Chamberlain wrote:
> > On Mon, Apr 06, 2020 at 08:58:50AM -0700, Kees Cook wrote:
> >> On Mon, Apr 06, 2020 at 02:08:36PM +0000, Luis Chamberlain wrote:
> >> > > Yes. Doing an internal extension isn't testing the actual code.
> >> >
> >> > But it would.
> >> >
> >> > [...]
> >> > > I don't think anything is needed for this series. It can be boot tested
> >> > > manually.
> >> >
> >> > Why test it manually when it could be tested automatically with a new kconfig?
> >>
> >> So, my impression is that adding code to the internals to test the
> >> internals isn't a valid test (or at least makes it fragile) because the
> >> test would depend on the changes to the internals (or at least depend on
> >> non-default non-production CONFIGs).
> >
> > The *internal* aspect here is an extension to boot params under a
> > kconfig which would simply append to it, as if the user would have
>
> So there's no such kconfig yet to apply boot parameters specified by configure,
> right? That would itself be a new feature. Or could we use bootconfig? (CC Masami)
Yes, I think you can use bootconfig to add this feature more flexibly.
I think your patch is easily modified to use bootconfig. :)
Thank you,
--
Masami Hiramatsu <[email protected]>