Upon running into an old kmod v19 issue with mount (get_fs_type()) a few of us
hunted for the cause of the issue. Although the issue ended up being a
userspace issue, a stress test driver was written to help reproduce the issue,
and along the way a few other fixes and sanity checks were implemented.
I've taken the time to generalize the stress test driver as a kselftest driver
with a 9 test cases. The last two test cases reveal an existing issue which
is not yet addressed upstream, even if you have kmod v19 present. A fix is
proposed in the last patch. Orignally we had discarded this patch as too
complex due to the alias handling, but upon further analysis of test cases
and memory pressure issues, it seems worth considering. Other than the
last patch I don't think much of the other patches are controversial, but
sending as RFC first just in case.
If its not clear, an end goal here is to make module loading a bit more
deterministic with stronger sanity checks and stress tests. Please note,
the stress test diver requires 4 GiB of RAM to run all tests without running
out of memory. A lot of this has to do with the memory requirements needed
for a dynamic test for multiple threads, but note that the final memory
pressure and OOMs actually don't come from this allocation, but instead
from many finit_module() calls, this consumes quite a bit of memory, specially
if you have a lot of dependencies which also need to be loaded prior to
your needed module -- as is the case for filesystem drivers.
These patches are available on my linux-next git-tree on my branch
20161208-kmod-test-driver-try2 [0], which is based on linux-next tag
next-20161208. Patches are also available based on v4.9-rc8 [1] for
those looking for a bit more stable tree given x86_64 on linux-next is
hosed at the moment.
Since kmod.c doesn't seem to get much love, and since I've been digging
quite a bit into it for other users (firmware) I suppose I could volunteer
myself to maintain this code as well, unless there are oppositions to this.
[0] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20161208-kmod-test-driver-try2
[1] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20161208-kmod-test-driver
Luis R. Rodriguez (10):
kmod: add test driver to stress test the module loader
module: fix memory leak on early load_module() failures
kmod: add dynamic max concurrent thread count
kmod: provide wrappers for kmod_concurrent inc/dec
kmod: return -EBUSY if modprobe limit is reached
kmod: provide sanity check on kmod_concurrent access
kmod: use simplified rate limit printk
sysctl: add support for unsigned int properly
kmod: add helpers for getting kmod count and limit
kmod: add a sanity check on module loading
Documentation/admin-guide/kernel-parameters.txt | 7 +
include/linux/kmod.h | 9 +
include/linux/sysctl.h | 3 +
init/Kconfig | 23 +
init/main.c | 1 +
kernel/kmod.c | 244 ++++-
kernel/module.c | 12 +-
kernel/sysctl.c | 198 +++-
lib/Kconfig.debug | 25 +
lib/Makefile | 1 +
lib/test_kmod.c | 1248 +++++++++++++++++++++++
tools/testing/selftests/kmod/Makefile | 11 +
tools/testing/selftests/kmod/config | 7 +
tools/testing/selftests/kmod/kmod.sh | 448 ++++++++
14 files changed, 2199 insertions(+), 38 deletions(-)
create mode 100644 lib/test_kmod.c
create mode 100644 tools/testing/selftests/kmod/Makefile
create mode 100644 tools/testing/selftests/kmod/config
create mode 100755 tools/testing/selftests/kmod/kmod.sh
--
2.10.1
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests early
(refer to the force_init_test module parameter), however since a lot of
test can get a system out of memory fast we leave this disabled for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel
OOM fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since these API calls
only allow each one parameter a test driver for these is rather
simple. Other factors that can help out test driver though are
the number of calls we issue and knowing current limitations of
each. This exposes configuration as much as possible through
userspace to be able to build tests directly from userspace.
Since it allows multiple misc devices its will eventually (once we
add a knob to let us create new devices at will) also be possible to
perform more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
Test completed
You can also request for specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the current available number of tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail, as such they are not currently
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0007
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
# tools/testing/selftests/kmod/kmod.sh -t 0010
# tools/testing/selftests/kmod/kmod.sh -t 0011
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them.
If you use these paritions for your rootfs you can change the default
test driver used for get_fs_type() by exporting it into your
environment. For example of other test defaults you can override
refer to kmod.sh allow_user_defaults().
Behind the scenes this is how we fine tune at a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build differnt test cases.
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized version of
module order for us. Once it finds the modules it needs, as per depmod
symbol dep map, it will finit_module() the respective modules which
are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother
userspace as the module already is loaded. This is all handled by
kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for
"fs-xfs", not for "xfs". The optimization in kernel described in b)
fails to catch if we have a lot of consecutive get_fs_type() calls.
The reason is the optimization in place does not look for aliases. This
means two consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This one explanation why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of memory)
we have a period of time during which module request for the same name
either with request_module() or get_fs_type() will *also* fail to load
even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules, its important to note though that even though a get_fs_type() call
does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures
with certain tests. For instance test 0006 fails on qemu with 1024 MiB
of RAM. It panics a box after reaping all userspace processes and still
not having enough memory to reap.
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
lib/Kconfig.debug | 25 +
lib/Makefile | 1 +
lib/test_kmod.c | 1248 +++++++++++++++++++++++++++++++++
tools/testing/selftests/kmod/Makefile | 11 +
tools/testing/selftests/kmod/config | 7 +
tools/testing/selftests/kmod/kmod.sh | 449 ++++++++++++
6 files changed, 1741 insertions(+)
create mode 100644 lib/test_kmod.c
create mode 100644 tools/testing/selftests/kmod/Makefile
create mode 100644 tools/testing/selftests/kmod/config
create mode 100755 tools/testing/selftests/kmod/kmod.sh
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7446097f72bd..6cad548e0682 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1994,6 +1994,31 @@ config BUG_ON_DATA_CORRUPTION
If unsure, say N.
+config TEST_KMOD
+ tristate "kmod stress tester"
+ default n
+ depends on m
+ select TEST_LKM
+ select XFS_FS
+ select TUN
+ select BTRFS_FS
+ help
+ Test the kernel's module loading mechanism: kmod. kmod implements
+ support to load modules using the Linux kernel's usermode helper.
+ This test provides a series of tests against kmod.
+
+ Although technically you can either build test_kmod as a module or
+ into the kernel we disallow building it into the kernel since
+ it stress tests request_module() and this will very likely cause
+ some issues by taking over precious threads available from other
+ module load requests, ultimately this could be fatal.
+
+ To run tests run:
+
+ tools/testing/selftests/kmod/kmod.sh --help
+
+ If unsure, say N.
+
source "samples/Kconfig"
source "lib/Kconfig.kgdb"
diff --git a/lib/Makefile b/lib/Makefile
index d15e235f72ea..3c5a14821e16 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
obj-$(CONFIG_TEST_PRINTF) += test_printf.o
obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
obj-$(CONFIG_TEST_UUID) += test_uuid.o
+obj-$(CONFIG_TEST_KMOD) += test_kmod.o
ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/test_kmod.c b/lib/test_kmod.c
new file mode 100644
index 000000000000..63fded83b9b6
--- /dev/null
+++ b/lib/test_kmod.c
@@ -0,0 +1,1248 @@
+/*
+ * kmod stress test driver
+ *
+ * Copyright (C) 2016 Luis R. Rodriguez <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of copyleft-next (version 0.3.1 or later) as published
+ * at http://copyleft-next.org/.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+/*
+ * This driver provides an interface to trigger and test the kernel's
+ * module loader through a series of configurations and a few triggers.
+ * To test this driver use the following script as root:
+ *
+ * tools/testing/selftests/kmod/kmod.sh --help
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kmod.h>
+#include <linux/printk.h>
+#include <linux/kthread.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/vmalloc.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+
+#define TEST_START_NUM_THREADS 50
+#define TEST_START_DRIVER "test_module"
+#define TEST_START_TEST_FS "xfs"
+#define TEST_START_TEST_CASE TEST_KMOD_DRIVER
+
+
+static bool force_init_test = false;
+module_param(force_init_test, bool_enable_only, 0644);
+MODULE_PARM_DESC(force_init_test,
+ "Force kicking a test immediatley after driver loads");
+
+/*
+ * For device allocation / registration
+ */
+static DEFINE_MUTEX(reg_dev_mutex);
+static LIST_HEAD(reg_test_devs);
+
+/*
+ * num_test_devs actually represents the *next* ID of the next
+ * device we will allow to create.
+ */
+static int num_test_devs;
+
+/**
+ * enum kmod_test_case - linker table test case
+ *
+ * If you add a test case, please be sure to review if you need to se
+ * @need_mod_put for your tests case.
+ *
+ * @TEST_KMOD_DRIVER: stress tests request_module()
+ * @TEST_KMOD_FS_TYPE: stress tests get_fs_type()
+ */
+enum kmod_test_case {
+ __TEST_KMOD_INVALID = 0,
+
+ TEST_KMOD_DRIVER,
+ TEST_KMOD_FS_TYPE,
+
+ __TEST_KMOD_MAX,
+};
+
+struct test_config {
+ char *test_driver;
+ char *test_fs;
+ unsigned int num_threads;
+ enum kmod_test_case test_case;
+ int test_result;
+};
+
+struct kmod_test_device;
+
+/**
+ * kmod_test_device_info - thread info
+ *
+ * @ret_sync: return value if request_module() is used, sync request for
+ * @TEST_KMOD_DRIVER
+ * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
+ * @thread_idx: thread ID
+ * @test_dev: test device test is being performed under
+ * @need_mod_put: Some tests (get_fs_type() is one) requires putting the module
+ * (module_put(fs_sync->owner)) when done, otherwise you will not be able
+ * to unload the respective modules and re-test. We use this to keep
+ * accounting of when we need this and to help out in case we need to
+ * error out and deal with module_put() on error.
+ */
+struct kmod_test_device_info {
+ int ret_sync;
+ struct file_system_type *fs_sync;
+ struct task_struct *task_sync;
+ unsigned int thread_idx;
+ struct kmod_test_device *test_dev;
+ bool need_mod_put;
+};
+
+/**
+ * kmod_test_device - test device to help test kmod
+ *
+ * @dev_idx: unique ID for test device
+ * @config: configuration for the test
+ * @misc_dev: we use a misc device under the hood
+ * @dev: pointer to misc_dev's own struct device
+ * @config_mutex: protects configuration of test
+ * @trigger_mutex: the test trigger can only be fired once at a time
+ * @thread_lock: protects @done count, and the @info per each thread
+ * @done: number of threads which have completed or failed
+ * @test_is_oom: when we run out of memory, use this to halt moving forward
+ * @kthreads_done: completion used to signal when all work is done
+ * @list: needed to be part of the reg_test_devs
+ * @info: array of info for each thread
+ */
+struct kmod_test_device {
+ int dev_idx;
+ struct test_config config;
+ struct miscdevice misc_dev;
+ struct device *dev;
+ struct mutex config_mutex;
+ struct mutex trigger_mutex;
+ struct mutex thread_mutex;
+
+ unsigned int done;
+
+ bool test_is_oom;
+ struct completion kthreads_done;
+ struct list_head list;
+
+ struct kmod_test_device_info *info;
+};
+
+static const char *test_case_str(enum kmod_test_case test_case)
+{
+ switch (test_case) {
+ case TEST_KMOD_DRIVER:
+ return "TEST_KMOD_DRIVER";
+ case TEST_KMOD_FS_TYPE:
+ return "TEST_KMOD_FS_TYPE";
+ default:
+ return "invalid";
+ }
+}
+
+static struct miscdevice *dev_to_misc_dev(struct device *dev)
+{
+ return dev_get_drvdata(dev);
+}
+
+static struct kmod_test_device *misc_dev_to_test_dev(struct miscdevice *misc_dev)
+{
+ return container_of(misc_dev, struct kmod_test_device, misc_dev);
+}
+
+static struct kmod_test_device *dev_to_test_dev(struct device *dev)
+{
+ struct miscdevice *misc_dev;
+
+ misc_dev = dev_to_misc_dev(dev);
+
+ return misc_dev_to_test_dev(misc_dev);
+}
+
+/* Must run with thread_mutex held */
+static void kmod_test_done_check(struct kmod_test_device *test_dev,
+ unsigned int idx)
+{
+ struct test_config *config = &test_dev->config;
+
+ test_dev->done++;
+ dev_dbg(test_dev->dev, "Done thread count: %u\n", test_dev->done);
+
+ if (test_dev->done == config->num_threads) {
+ dev_info(test_dev->dev, "Done: %u threads have all run now\n",
+ test_dev->done);
+ dev_info(test_dev->dev, "Last thread to run: %u\n", idx);
+ complete(&test_dev->kthreads_done);
+ }
+}
+
+static void test_kmod_put_module(struct kmod_test_device_info *info)
+{
+ struct kmod_test_device *test_dev = info->test_dev;
+ struct test_config *config = &test_dev->config;
+
+ if (!info->need_mod_put)
+ return;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ break;
+ case TEST_KMOD_FS_TYPE:
+ if (info && info->fs_sync && info->fs_sync->owner)
+ module_put(info->fs_sync->owner);
+ break;
+ default:
+ BUG();
+ }
+
+ info->need_mod_put = true;
+}
+
+static int run_request(void *data)
+{
+ struct kmod_test_device_info *info = data;
+ struct kmod_test_device *test_dev = info->test_dev;
+ struct test_config *config = &test_dev->config;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ info->ret_sync = request_module("%s", config->test_driver);
+ break;
+ case TEST_KMOD_FS_TYPE:
+ info->fs_sync = get_fs_type(config->test_fs);
+ info->need_mod_put = true;
+ break;
+ default:
+ /* __trigger_config_run() already checked for test sanity */
+ BUG();
+ return -EINVAL;
+ }
+
+ dev_dbg(test_dev->dev, "Ran thread %u\n", info->thread_idx);
+
+ test_kmod_put_module(info);
+
+ mutex_lock(&test_dev->thread_mutex);
+ info->task_sync = NULL;
+ kmod_test_done_check(test_dev, info->thread_idx);
+ mutex_unlock(&test_dev->thread_mutex);
+
+ return 0;
+}
+
+static int tally_work_test(struct kmod_test_device_info *info)
+{
+ struct kmod_test_device *test_dev = info->test_dev;
+ struct test_config *config = &test_dev->config;
+ int err_ret = 0;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ /*
+ * Only capture errors, if one is found that's
+ * enough, for now.
+ */
+ if (info->ret_sync != 0)
+ err_ret = info->ret_sync;
+ dev_info(test_dev->dev,
+ "Sync thread %d return status: %d\n",
+ info->thread_idx, info->ret_sync);
+ break;
+ case TEST_KMOD_FS_TYPE:
+ /* For now we make this simple */
+ if (!info->fs_sync)
+ err_ret = -EINVAL;
+ dev_info(test_dev->dev, "Sync thread %u fs: %s\n",
+ info->thread_idx, info->fs_sync ? config->test_fs :
+ "NULL");
+ break;
+ default:
+ BUG();
+ }
+
+ return err_ret;
+}
+
+/*
+ * XXX: add result option to display if all errors did not match.
+ * For now we just keep any error code if one was found.
+ *
+ * If this ran it means *all* tasks were created fine and we
+ * are now just collecting results.
+ *
+ * Only propagate errors, do not override with a subsequent sucess case.
+ */
+static void tally_up_work(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ struct kmod_test_device_info *info;
+ unsigned int idx;
+ int err_ret = 0;
+ int ret = 0;
+
+ mutex_lock(&test_dev->thread_mutex);
+
+ dev_info(test_dev->dev, "Results:\n");
+
+ for (idx=0; idx < config->num_threads; idx++) {
+ info = &test_dev->info[idx];
+ ret = tally_work_test(info);
+ if (ret)
+ err_ret = ret;
+ }
+
+ /*
+ * Note: request_module() returns 256 for a module not found even
+ * though modprobe itself returns 1.
+ */
+ config->test_result = err_ret;
+
+ mutex_unlock(&test_dev->thread_mutex);
+}
+
+static int try_one_request(struct kmod_test_device *test_dev, unsigned int idx)
+{
+ struct kmod_test_device_info *info = &test_dev->info[idx];
+ int fail_ret = -ENOMEM;
+
+ mutex_lock(&test_dev->thread_mutex);
+
+ info->thread_idx = idx;
+ info->test_dev = test_dev;
+ info->task_sync = kthread_run(run_request, info, "%s-%u",
+ KBUILD_MODNAME, idx);
+
+ if (!info->task_sync || IS_ERR(info->task_sync)) {
+ test_dev->test_is_oom = true;
+ dev_err(test_dev->dev, "Setting up thread %u failed\n", idx);
+ info->task_sync = NULL;
+ goto err_out;
+ } else
+ dev_dbg(test_dev->dev, "Kicked off thread %u\n", idx);
+
+ mutex_unlock(&test_dev->thread_mutex);
+
+ return 0;
+
+err_out:
+ info->ret_sync = fail_ret;
+ mutex_unlock(&test_dev->thread_mutex);
+
+ return fail_ret;
+}
+
+static void test_dev_kmod_stop_tests(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ struct kmod_test_device_info *info;
+ unsigned int i;
+
+ dev_info(test_dev->dev, "Ending request_module() tests\n");
+
+ mutex_lock(&test_dev->thread_mutex);
+
+ for (i=0; i < config->num_threads; i++) {
+ info = &test_dev->info[i];
+ if (info->task_sync && !IS_ERR(info->task_sync)) {
+ dev_info(test_dev->dev,
+ "Stopping still-running thread %i\n", i);
+ kthread_stop(info->task_sync);
+ }
+
+ /*
+ * info->task_sync is well protected, it can only be
+ * NULL or a pointer to a struct. If its NULL we either
+ * never ran, or we did and we completed the work. Completed
+ * tasks *always* put the module for us. This is a sanity
+ * check -- just in case.
+ */
+ if (info->task_sync && info->need_mod_put)
+ test_kmod_put_module(info);
+ }
+
+ mutex_unlock(&test_dev->thread_mutex);
+}
+
+/*
+ * Only wait *iff* we did not run into any errors during all of our thread
+ * set up. If run into any issues we stop threads and just bail out with
+ * an error to the trigger. This also means we don't need any tally work
+ * for any threads which fail.
+ */
+static int try_requests(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ unsigned int idx;
+ int ret;
+ bool any_error = false;
+
+ for (idx=0; idx < config->num_threads; idx++) {
+ if (test_dev->test_is_oom) {
+ any_error = true;
+ break;
+ }
+
+ ret = try_one_request(test_dev, idx);
+ if (ret) {
+ any_error = true;
+ break;
+ }
+ }
+
+ if (!any_error) {
+ test_dev->test_is_oom = false;
+ dev_info(test_dev->dev,
+ "No errors were found while initializing threads\n");
+ wait_for_completion(&test_dev->kthreads_done);
+ tally_up_work(test_dev);
+ } else {
+ test_dev->test_is_oom = true;
+ dev_info(test_dev->dev,
+ "At least one thread failed to start, stop all work\n");
+ test_dev_kmod_stop_tests(test_dev);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int run_test_driver(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ dev_info(test_dev->dev, "Test case: %s (%u)\n",
+ test_case_str(config->test_case),
+ config->test_case);
+ dev_info(test_dev->dev, "Test driver to load: %s\n",
+ config->test_driver);
+ dev_info(test_dev->dev, "Number of threads to run: %u\n",
+ config->num_threads);
+ dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
+ config->num_threads - 1);
+
+ return try_requests(test_dev);
+}
+
+static int run_test_fs_type(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ dev_info(test_dev->dev, "Test case: %s (%u)\n",
+ test_case_str(config->test_case),
+ config->test_case);
+ dev_info(test_dev->dev, "Test filesystem to load: %s\n",
+ config->test_fs);
+ dev_info(test_dev->dev, "Number of threads to run: %u\n",
+ config->num_threads);
+ dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
+ config->num_threads - 1);
+
+ return try_requests(test_dev);
+}
+
+static ssize_t config_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+ int len = 0;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ len += sprintf(buf, "Custom trigger configuration for: %s\n",
+ dev_name(dev));
+
+ len += sprintf(buf+len, "Number of threads:\t%u\n",
+ config->num_threads);
+
+ len += sprintf(buf+len, "Test_case:\t%s (%u)\n",
+ test_case_str(config->test_case),
+ config->test_case);
+
+ if (config->test_driver)
+ len += sprintf(buf+len, "driver:\t%s\n",
+ config->test_driver);
+ else
+ len += sprintf(buf+len, "driver:\tEMTPY\n");
+
+ if (config->test_fs)
+ len += sprintf(buf+len, "fs:\t%s\n",
+ config->test_fs);
+ else
+ len += sprintf(buf+len, "fs:\tEMTPY\n");
+
+
+ mutex_unlock(&test_dev->config_mutex);
+
+ return len;
+}
+static DEVICE_ATTR_RO(config);
+
+/*
+ * This ensures we don't allow kicking threads through if our configuration
+ * is faulty.
+ */
+static int __trigger_config_run(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ test_dev->done = 0;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ return run_test_driver(test_dev);
+ case TEST_KMOD_FS_TYPE:
+ return run_test_fs_type(test_dev);
+ default:
+ dev_warn(test_dev->dev,
+ "Invalid test case requested: %u\n",
+ config->test_case);
+ return -EINVAL;
+ }
+}
+
+static int trigger_config_run(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ int ret;
+
+ mutex_lock(&test_dev->trigger_mutex);
+ mutex_lock(&test_dev->config_mutex);
+
+ ret = __trigger_config_run(test_dev);
+ if (ret < 0)
+ goto out;
+ dev_info(test_dev->dev, "General test result: %d\n",
+ config->test_result);
+
+ /*
+ * We must return 0 after a trigger even unless something went
+ * wrong with the setup of the test. If the test setup went fine
+ * then userspace must just check the result of config->test_result.
+ * One issue with relying on the return from a call in the kernel
+ * is if the kernel returns a possitive value using this trigger
+ * will not return the value to userspace, it would be lost.
+ *
+ * By not relying on capturing the return value of tests we are using
+ * through the trigger it also us to run tests with set -e and only
+ * fail when something went wrong with the driver upon trigger
+ * requests.
+ */
+ ret = 0;
+
+out:
+ mutex_unlock(&test_dev->config_mutex);
+ mutex_unlock(&test_dev->trigger_mutex);
+
+ return ret;
+}
+
+static ssize_t
+trigger_config_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ int ret;
+
+ if (test_dev->test_is_oom)
+ return -ENOMEM;
+
+ /* For all intents and purposes we don't care what userspace
+ * sent this trigger, we care only that we were triggered.
+ * We treat the return value only for caputuring issues with
+ * the test setup. At this point all the test variables should
+ * have been allocated so typically this should never fail.
+ */
+ ret = trigger_config_run(test_dev);
+ if (unlikely(ret < 0))
+ goto out;
+
+ /*
+ * Note: any return > 0 will be treated as success
+ * and the error value will not be available to userspace.
+ * Do not rely on trying to send to userspace a test value
+ * return value as possitive return errors will be lost.
+ */
+ if (WARN_ON(ret > 0))
+ return -EINVAL;
+
+ ret = count;
+out:
+ return ret;
+}
+static DEVICE_ATTR_WO(trigger_config);
+
+/*
+ * XXX: move to kstrncpy() once merged.
+ *
+ * Users should use kfree_const() when freeing these.
+ */
+static int __kstrncpy(char **dst, const char *name, size_t count, gfp_t gfp)
+{
+ *dst = kstrndup(name, count, gfp);
+ if (!*dst)
+ return -ENOSPC;
+ return count;
+}
+
+static int config_copy_test_driver_name(struct test_config *config,
+ const char *name,
+ size_t count)
+{
+ return __kstrncpy(&config->test_driver, name, count, GFP_KERNEL);
+}
+
+
+static int config_copy_test_fs(struct test_config *config, const char *name,
+ size_t count)
+{
+ return __kstrncpy(&config->test_fs, name, count, GFP_KERNEL);
+}
+
+static void __kmod_config_free(struct test_config *config)
+{
+ if (!config)
+ return;
+
+ kfree_const(config->test_driver);
+ config->test_driver = NULL;
+
+ kfree_const(config->test_fs);
+ config->test_driver = NULL;
+}
+
+static void kmod_config_free(struct kmod_test_device *test_dev)
+{
+ struct test_config *config;
+
+ if (!test_dev)
+ return;
+
+ config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ __kmod_config_free(config);
+ mutex_unlock(&test_dev->config_mutex);
+}
+
+static ssize_t config_test_driver_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+ int copied;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ kfree_const(config->test_driver);
+ config->test_driver = NULL;
+
+ copied = config_copy_test_driver_name(config, buf, count);
+ mutex_unlock(&test_dev->config_mutex);
+
+ return copied;
+}
+
+static ssize_t config_test_driver_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ strcpy(buf, config->test_driver);
+ strcat(buf, "\n");
+ mutex_unlock(&test_dev->config_mutex);
+
+ return strlen(buf) + 1;
+}
+static DEVICE_ATTR(config_test_driver, 0644, config_test_driver_show,
+ config_test_driver_store);
+
+static ssize_t config_test_fs_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+ int copied;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ kfree_const(config->test_fs);
+ config->test_fs = NULL;
+
+ copied = config_copy_test_fs(config, buf, count);
+ mutex_unlock(&test_dev->config_mutex);
+
+ return copied;
+}
+
+static ssize_t config_test_fs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ strcpy(buf, config->test_fs);
+ strcat(buf, "\n");
+ mutex_unlock(&test_dev->config_mutex);
+
+ return strlen(buf) + 1;
+}
+static DEVICE_ATTR(config_test_fs, 0644, config_test_fs_show,
+ config_test_fs_store);
+
+static int trigger_config_run_driver(struct kmod_test_device *test_dev,
+ const char *test_driver)
+{
+ int copied;
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ config->test_case = TEST_KMOD_DRIVER;
+
+ kfree_const(config->test_driver);
+ config->test_driver = NULL;
+
+ copied = config_copy_test_driver_name(config, test_driver,
+ strlen(test_driver));
+ mutex_unlock(&test_dev->config_mutex);
+
+ if (copied != strlen(test_driver)) {
+ test_dev->test_is_oom = true;
+ return -EINVAL;
+ }
+
+ test_dev->test_is_oom = false;
+
+ return trigger_config_run(test_dev);
+}
+
+static int trigger_config_run_fs(struct kmod_test_device *test_dev,
+ const char *fs_type)
+{
+ int copied;
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ config->test_case = TEST_KMOD_FS_TYPE;
+
+ kfree_const(config->test_fs);
+ config->test_driver = NULL;
+
+ copied = config_copy_test_fs(config, fs_type, strlen(fs_type));
+ mutex_unlock(&test_dev->config_mutex);
+
+ if (copied != strlen(fs_type)) {
+ test_dev->test_is_oom = true;
+ return -EINVAL;
+ }
+
+ test_dev->test_is_oom = false;
+
+ return trigger_config_run(test_dev);
+}
+
+static void free_test_dev_info(struct kmod_test_device *test_dev)
+{
+ if (test_dev->info) {
+ vfree(test_dev->info);
+ test_dev->info = NULL;
+ }
+}
+
+static int kmod_config_sync_info(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ free_test_dev_info(test_dev);
+ test_dev->info = vzalloc(config->num_threads *
+ sizeof(struct kmod_test_device_info));
+ if (!test_dev->info) {
+ dev_err(test_dev->dev, "Cannot alloc test_dev info\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/*
+ * Old kernels may not have this, if you want to port this code to
+ * test it on older kernels.
+ */
+#ifdef get_kmod_umh_limit
+static unsigned int kmod_init_test_thread_limit(void)
+{
+ return get_kmod_umh_limit();
+}
+#else
+static unsigned int kmod_init_test_thread_limit(void)
+{
+ return TEST_START_NUM_THREADS;
+}
+#endif
+
+static int __kmod_config_init(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ int ret = -ENOMEM, copied;
+
+ __kmod_config_free(config);
+
+ copied = config_copy_test_driver_name(config, TEST_START_DRIVER,
+ strlen(TEST_START_DRIVER));
+ if (copied != strlen(TEST_START_DRIVER))
+ goto err_out;
+
+ copied = config_copy_test_fs(config, TEST_START_TEST_FS,
+ strlen(TEST_START_TEST_FS));
+ if (copied != strlen(TEST_START_TEST_FS))
+ goto err_out;
+
+ config->num_threads = kmod_init_test_thread_limit();
+ config->test_result = 0;
+ config->test_case = TEST_START_TEST_CASE;
+
+ ret = kmod_config_sync_info(test_dev);
+ if (ret)
+ goto err_out;
+
+ test_dev->test_is_oom = false;
+
+ return 0;
+
+err_out:
+ test_dev->test_is_oom = true;
+ WARN_ON(test_dev->test_is_oom);
+
+ __kmod_config_free(config);
+
+ return ret;
+}
+
+static ssize_t reset_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ int ret;
+
+ mutex_lock(&test_dev->trigger_mutex);
+ mutex_lock(&test_dev->config_mutex);
+
+ ret = __kmod_config_init(test_dev);
+ if (ret < 0) {
+ ret = -ENOMEM;
+ dev_err(dev, "could not alloc settings for config trigger: %d\n",
+ ret);
+ goto out;
+ }
+
+ dev_info(dev, "reset\n");
+ ret = count;
+
+out:
+ mutex_unlock(&test_dev->config_mutex);
+ mutex_unlock(&test_dev->trigger_mutex);
+
+ return ret;
+}
+static DEVICE_ATTR_WO(reset);
+
+static int test_dev_config_update_uint_sync(struct kmod_test_device *test_dev,
+ const char *buf, size_t size,
+ unsigned int *config,
+ int (*test_sync)(struct kmod_test_device *test_dev))
+{
+ int ret;
+ char *end;
+ long new = simple_strtol(buf, &end, 0);
+ unsigned int old_val;
+ if (end == buf || new > UINT_MAX)
+ return -EINVAL;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ old_val = *config;
+ *(unsigned int *)config = new;
+
+ ret = test_sync(test_dev);
+ if (ret) {
+ *(unsigned int *)config = old_val;
+
+ ret = test_sync(test_dev);
+ WARN_ON(ret);
+
+ mutex_unlock(&test_dev->config_mutex);
+ return -EINVAL;
+ }
+
+ mutex_unlock(&test_dev->config_mutex);
+ /* Always return full write size even if we didn't consume all */
+ return size;
+}
+
+static int test_dev_config_update_uint_range(struct kmod_test_device *test_dev,
+ const char *buf, size_t size,
+ unsigned int *config,
+ unsigned int min,
+ unsigned int max)
+{
+ char *end;
+ long new = simple_strtol(buf, &end, 0);
+ if (end == buf || new < min || new > max || new > UINT_MAX)
+ return -EINVAL;
+
+ mutex_lock(&test_dev->config_mutex);
+ *(unsigned int *)config = new;
+ mutex_unlock(&test_dev->config_mutex);
+
+ /* Always return full write size even if we didn't consume all */
+ return size;
+}
+
+static int test_dev_config_update_int(struct kmod_test_device *test_dev,
+ const char *buf, size_t size,
+ int *config)
+{
+ char *end;
+ long new = simple_strtol(buf, &end, 0);
+ if (end == buf || new > INT_MAX || new < INT_MIN)
+ return -EINVAL;
+ mutex_lock(&test_dev->config_mutex);
+ *(int *)config = new;
+ mutex_unlock(&test_dev->config_mutex);
+ /* Always return full write size even if we didn't consume all */
+ return size;
+}
+
+static ssize_t test_dev_config_show_int(struct kmod_test_device *test_dev,
+ char *buf,
+ int config)
+{
+ int val;
+
+ mutex_lock(&test_dev->config_mutex);
+ val = config;
+ mutex_unlock(&test_dev->config_mutex);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t test_dev_config_show_uint(struct kmod_test_device *test_dev,
+ char *buf,
+ unsigned int config)
+{
+ unsigned int val;
+
+ mutex_lock(&test_dev->config_mutex);
+ val = config;
+ mutex_unlock(&test_dev->config_mutex);
+
+ return snprintf(buf, PAGE_SIZE, "%u\n", val);
+}
+
+static ssize_t test_result_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_update_int(test_dev, buf, count,
+ &config->test_result);
+}
+
+static ssize_t config_num_threads_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_update_uint_sync(test_dev, buf, count,
+ &config->num_threads,
+ kmod_config_sync_info);
+}
+
+static ssize_t config_num_threads_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_show_int(test_dev, buf, config->num_threads);
+}
+static DEVICE_ATTR(config_num_threads, 0644, config_num_threads_show,
+ config_num_threads_store);
+
+static ssize_t config_test_case_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_update_uint_range(test_dev, buf, count,
+ &config->test_case,
+ __TEST_KMOD_INVALID + 1,
+ __TEST_KMOD_MAX - 1);
+}
+
+static ssize_t config_test_case_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_show_uint(test_dev, buf, config->test_case);
+}
+static DEVICE_ATTR(config_test_case, 0644, config_test_case_show,
+ config_test_case_store);
+
+static ssize_t test_result_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_show_int(test_dev, buf, config->test_result);
+}
+static DEVICE_ATTR(test_result, 0644, test_result_show, test_result_store);
+
+#define TEST_KMOD_DEV_ATTR(name) &dev_attr_##name.attr
+
+static struct attribute *test_dev_attrs[] = {
+ TEST_KMOD_DEV_ATTR(trigger_config),
+ TEST_KMOD_DEV_ATTR(config),
+ TEST_KMOD_DEV_ATTR(reset),
+
+ TEST_KMOD_DEV_ATTR(config_test_driver),
+ TEST_KMOD_DEV_ATTR(config_test_fs),
+ TEST_KMOD_DEV_ATTR(config_num_threads),
+ TEST_KMOD_DEV_ATTR(config_test_case),
+ TEST_KMOD_DEV_ATTR(test_result),
+
+ NULL,
+};
+
+ATTRIBUTE_GROUPS(test_dev);
+
+static int kmod_config_init(struct kmod_test_device *test_dev)
+{
+ int ret;
+
+ mutex_lock(&test_dev->config_mutex);
+ ret = __kmod_config_init(test_dev);
+ mutex_unlock(&test_dev->config_mutex);
+
+ return ret;
+}
+
+/*
+ * XXX: this could perhaps be made generic already too, but a hunt
+ * for actual users would be needed first. It could be generic
+ * if other test drivers end up using a similar mechanism.
+ */
+const char *test_dev_get_name(const char *base, int idx, gfp_t gfp)
+{
+ const char *name_const;
+ char *name;
+
+ if (!base)
+ return NULL;
+ if (strlen(base) > 30)
+ return NULL;
+ name = kzalloc(1024, gfp);
+ if (!name)
+ return NULL;
+
+ strncat(name, base, strlen(base));
+ sprintf(name+(strlen(base)), "%d", idx);
+ name_const = kstrdup_const(name, gfp);
+
+ kfree(name);
+
+ return name_const;
+}
+
+static struct kmod_test_device *alloc_test_dev_kmod(int idx)
+{
+ int ret;
+ struct kmod_test_device *test_dev;
+ struct miscdevice *misc_dev;
+
+ test_dev = vzalloc(sizeof(struct kmod_test_device));
+ if (!test_dev) {
+ pr_err("Cannot alloc test_dev\n");
+ goto err_out;
+ }
+
+ mutex_init(&test_dev->config_mutex);
+ mutex_init(&test_dev->trigger_mutex);
+ mutex_init(&test_dev->thread_mutex);
+
+ init_completion(&test_dev->kthreads_done);
+
+ ret = kmod_config_init(test_dev);
+ if (ret < 0) {
+ pr_err("Cannot alloc kmod_config_init()\n");
+ goto err_out_free;
+ }
+
+ test_dev->dev_idx = idx;
+ misc_dev = &test_dev->misc_dev;
+
+ misc_dev->minor = MISC_DYNAMIC_MINOR;
+ misc_dev->name = test_dev_get_name("test_kmod", test_dev->dev_idx,
+ GFP_KERNEL);
+ if (!misc_dev->name) {
+ pr_err("Cannot alloc misc_dev->name\n");
+ goto err_out_free_config;
+ }
+ misc_dev->groups = test_dev_groups;
+
+ return test_dev;
+
+err_out_free_config:
+ free_test_dev_info(test_dev);
+ kmod_config_free(test_dev);
+err_out_free:
+ vfree(test_dev);
+ test_dev = NULL;
+err_out:
+ return NULL;
+}
+
+static void free_test_dev_kmod(struct kmod_test_device *test_dev)
+{
+ if (test_dev) {
+ kfree_const(test_dev->misc_dev.name);
+ test_dev->misc_dev.name = NULL;
+ free_test_dev_info(test_dev);
+ kmod_config_free(test_dev);
+ vfree(test_dev);
+ test_dev = NULL;
+ }
+}
+
+static struct kmod_test_device *register_test_dev_kmod(void)
+{
+ struct kmod_test_device *test_dev = NULL;
+ int ret;
+
+ mutex_unlock(®_dev_mutex);
+
+ /* int should suffice for number of devices, test for wrap */
+ if (unlikely(num_test_devs + 1) < 0) {
+ pr_err("reached limit of number of test devices\n");
+ goto out;
+ }
+
+ test_dev = alloc_test_dev_kmod(num_test_devs);
+ if (!test_dev)
+ goto out;
+
+ ret = misc_register(&test_dev->misc_dev);
+ if (ret) {
+ pr_err("could not register misc device: %d\n", ret);
+ free_test_dev_kmod(test_dev);
+ goto out;
+ }
+
+ test_dev->dev = test_dev->misc_dev.this_device;
+ list_add_tail(&test_dev->list, ®_test_devs);
+ dev_info(test_dev->dev, "interface ready\n");
+
+ num_test_devs++;
+
+out:
+ mutex_unlock(®_dev_mutex);
+
+ return test_dev;
+
+}
+
+static int __init test_kmod_init(void)
+{
+ struct kmod_test_device *test_dev;
+ int ret;
+
+ test_dev = register_test_dev_kmod();
+ if (!test_dev) {
+ pr_err("Cannot add first test kmod device\n");
+ return -ENODEV;
+ }
+
+ /*
+ * With some work we might be able to gracefully enable
+ * testing with this driver built-in, for now this seems
+ * rather risky. For those willing to try have at it,
+ * and enable the below. Good luck! If that works, try
+ * lowering the init level for more fun.
+ */
+ if (force_init_test) {
+ ret = trigger_config_run_driver(test_dev, "tun");
+ if (WARN_ON(ret))
+ return ret;
+ ret = trigger_config_run_fs(test_dev, "btrfs");
+ if (WARN_ON(ret))
+ return ret;
+ }
+
+ return 0;
+}
+late_initcall(test_kmod_init);
+
+static
+void unregister_test_dev_kmod(struct kmod_test_device *test_dev)
+{
+ mutex_lock(&test_dev->trigger_mutex);
+ mutex_lock(&test_dev->config_mutex);
+
+ test_dev_kmod_stop_tests(test_dev);
+
+ dev_info(test_dev->dev, "removing interface\n");
+ misc_deregister(&test_dev->misc_dev);
+
+ mutex_unlock(&test_dev->config_mutex);
+ mutex_unlock(&test_dev->trigger_mutex);
+
+ free_test_dev_kmod(test_dev);
+}
+
+static void __exit test_kmod_exit(void)
+{
+ struct kmod_test_device *test_dev, *tmp;
+
+ mutex_lock(®_dev_mutex);
+ list_for_each_entry_safe(test_dev, tmp, ®_test_devs, list) {
+ list_del(&test_dev->list);
+ unregister_test_dev_kmod(test_dev);
+ }
+ mutex_unlock(®_dev_mutex);
+}
+module_exit(test_kmod_exit);
+
+MODULE_AUTHOR("Luis R. Rodriguez <[email protected]>");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/kmod/Makefile b/tools/testing/selftests/kmod/Makefile
new file mode 100644
index 000000000000..fa2ccc5fb3de
--- /dev/null
+++ b/tools/testing/selftests/kmod/Makefile
@@ -0,0 +1,11 @@
+# Makefile for kmod loading selftests
+
+# No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
+all:
+
+TEST_PROGS := kmod.sh
+
+include ../lib.mk
+
+# Nothing to clean up.
+clean:
diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
new file mode 100644
index 000000000000..259f4fd6b5e2
--- /dev/null
+++ b/tools/testing/selftests/kmod/config
@@ -0,0 +1,7 @@
+CONFIG_TEST_KMOD=m
+CONFIG_TEST_LKM=m
+CONFIG_XFS_FS=m
+
+# For the module parameter force_init_test is used
+CONFIG_TUN=m
+CONFIG_BTRFS_FS=m
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
new file mode 100755
index 000000000000..9ea1864d8bae
--- /dev/null
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -0,0 +1,449 @@
+#!/bin/bash
+#
+# Copyright (C) 2016 Luis R. Rodriguez <[email protected]>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of copyleft-next (version 0.3.1 or later) as published
+# at http://copyleft-next.org/.
+
+# This is a stress test script for kmod, the kernel module loader. It uses
+# test_kmod which exposes a series of knobs for the API for us so we can
+# tweak each test in userspace rather than in kernelspace.
+#
+# The way kmod works is it uses the kernel's usermode helper API to eventually
+# call /sbin/modprobe. It has a limit of the number of concurrent calls
+# possible. The kernel interface to load modules is request_module(), however
+# mount uses get_fs_type(). Both behave slightly differently, but the
+# differences are important enough to test each call separately. For this
+# reason test_kmod starts by providing tests for both calls.
+#
+# The test driver test_kmod assumes a series of defaults which you can
+# override by exporting to your environment prior running this script.
+# For instance this script assumes you do not have xfs loaded upon boot.
+# If this is false, export DEFAULT_KMOD_FS="ext4" prior to running this
+# script if the filesyste module you don't have loaded upon bootup
+# is ext4 instead. Refer to allow_user_defaults() for a list of user
+# override variables possible.
+#
+# You'll want at least 4096 GiB of RAM to expect to run these tests
+# without running out of memory on them. For other requirements refer
+# to test_reqs()
+
+set -e
+
+TEST_DRIVER="test_kmod"
+
+function allow_user_defaults()
+{
+ if [ -z $DEFAULT_KMOD_DRIVER ]; then
+ DEFAULT_KMOD_DRIVER="test_module"
+ fi
+
+ if [ -z $DEFAULT_KMOD_FS ]; then
+ DEFAULT_KMOD_FS="xfs"
+ fi
+
+ if [ -z $PROC_DIR ]; then
+ PROC_DIR="/proc/sys/kernel/"
+ fi
+
+ if [ -z $MODPROBE_LIMIT ]; then
+ MODPROBE_LIMIT=50
+ fi
+
+ if [ -z $DIR ]; then
+ DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0/"
+ fi
+
+ MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
+}
+
+test_reqs()
+{
+ if ! which modprobe 2> /dev/null > /dev/null; then
+ echo "$0: You need modprobe installed"
+ exit 1
+ fi
+
+ if ! which kmod 2> /dev/null > /dev/null; then
+ echo "$0: You need kmod installed"
+ exit 1
+ fi
+
+ # kmod 19 has a bad bug where it returns 0 when modprobe
+ # gets called *even* if the module was not loaded due to
+ # some bad heuristics. For details see:
+ #
+ # A work around is possible in-kernel but its rather
+ # complex.
+ KMOD_VERSION=$(kmod --version | awk '{print $3}')
+ if [[ $KMOD_VERSION -le 19 ]]; then
+ echo "$0: You need at least kmod 20"
+ echo "kmod <= 19 is buggy, for details see:"
+ echo "http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4"
+ exit 1
+ fi
+}
+
+function load_req_mod()
+{
+ if [ ! -d $DIR ]; then
+ # Alanis: "Oh isn't it ironic?"
+ modprobe $TEST_DRIVER
+ if [ ! -d $DIR ]; then
+ echo "$0: $DIR not present"
+ echo "You must have the following enabled in your kernel:"
+ cat $PWD/config
+ exit 1
+ fi
+ fi
+}
+
+test_finish()
+{
+ echo "Test completed"
+}
+
+errno_name_to_val()
+{
+ case "$1" in
+ # kmod calls modprobe and upon of a module not found
+ # modprobe returns just 1... However in the kernel we
+ # *sometimes* see 256...
+ MODULE_NOT_FOUND)
+ echo 256;;
+ SUCCESS)
+ echo 0;;
+ -EPERM)
+ echo -1;;
+ -ENOENT)
+ echo -2;;
+ -EINVAL)
+ echo -22;;
+ -ERR_ANY)
+ echo -123456;;
+ *)
+ echo invalid;;
+ esac
+}
+
+errno_val_to_name()
+ case "$1" in
+ 256)
+ echo MODULE_NOT_FOUND;;
+ 0)
+ echo SUCCESS;;
+ -1)
+ echo -EPERM;;
+ -2)
+ echo -ENOENT;;
+ -22)
+ echo -EINVAL;;
+ -123456)
+ echo -ERR_ANY;;
+ *)
+ echo invalid;;
+ esac
+
+config_set_test_case_driver()
+{
+ if ! echo -n 1 >$DIR/config_test_case; then
+ echo "$0: Unable to set to test case to driver" >&2
+ exit 1
+ fi
+}
+
+config_set_test_case_fs()
+{
+ if ! echo -n 2 >$DIR/config_test_case; then
+ echo "$0: Unable to set to test case to fs" >&2
+ exit 1
+ fi
+}
+
+config_num_threads()
+{
+ if ! echo -n $1 >$DIR/config_num_threads; then
+ echo "$0: Unable to set to number of threads" >&2
+ exit 1
+ fi
+}
+
+config_get_modprobe_limit()
+{
+ if [[ -f ${MODPROBE_LIMIT_FILE} ]] ; then
+ MODPROBE_LIMIT=$(cat $MODPROBE_LIMIT_FILE)
+ fi
+ echo $MODPROBE_LIMIT
+}
+
+config_num_thread_limit_extra()
+{
+ MODPROBE_LIMIT=$(config_get_modprobe_limit)
+ let EXTRA_LIMIT=$MODPROBE_LIMIT+$1
+ config_num_threads $EXTRA_LIMIT
+}
+
+# For special characters use printf directly,
+# refer to kmod_test_0001
+config_set_driver()
+{
+ if ! echo -n $1 >$DIR/config_test_driver; then
+ echo "$0: Unable to set driver" >&2
+ exit 1
+ fi
+}
+
+config_set_fs()
+{
+ if ! echo -n $1 >$DIR/config_test_fs; then
+ echo "$0: Unable to set driver" >&2
+ exit 1
+ fi
+}
+
+config_get_driver()
+{
+ cat $DIR/config_test_driver
+}
+
+config_get_test_result()
+{
+ cat $DIR/test_result
+}
+
+config_reset()
+{
+ if ! echo -n "1" >"$DIR"/reset; then
+ echo "$0: reset shuld have worked" >&2
+ exit 1
+ fi
+}
+
+config_show_config()
+{
+ echo "----------------------------------------------------"
+ cat "$DIR"/config
+ echo "----------------------------------------------------"
+}
+
+config_trigger()
+{
+ if ! echo -n "1" >"$DIR"/trigger_config 2>/dev/null; then
+ echo "$1: FAIL - loading should have worked"
+ config_show_config
+ exit 1
+ fi
+ echo "$1: OK! - loading kmod test"
+}
+
+config_trigger_want_fail()
+{
+ if echo "1" > $DIR/trigger_config 2>/dev/null; then
+ echo "$1: FAIL - test case was expected to fail"
+ config_show_config
+ exit 1
+ fi
+ echo "$1: OK! - kmod test case failed as expected"
+}
+
+config_expect_result()
+{
+ RC=$(config_get_test_result)
+ RC_NAME=$(errno_val_to_name $RC)
+
+ ERRNO_NAME=$2
+ ERRNO=$(errno_name_to_val $ERRNO_NAME)
+
+ if [[ $ERRNO_NAME = "-ERR_ANY" ]]; then
+ if [[ $RC -ge 0 ]]; then
+ echo "$1: FAIL, test expects $ERRNO_NAME - got $RC_NAME ($RC)" >&2
+ config_show_config
+ exit 1
+ fi
+ elif [[ $RC != $ERRNO ]]; then
+ echo "$1: FAIL, test expects $ERRNO_NAME ($ERRNO) - got $RC_NAME ($RC)" >&2
+ config_show_config
+ exit 1
+ fi
+ echo "$1: OK! - Return value: $RC ($RC_NAME), expected $ERRNO_NAME"
+}
+
+kmod_defaults_driver()
+{
+ config_reset
+ modprobe -r $DEFAULT_KMOD_DRIVER
+ config_set_driver $DEFAULT_KMOD_DRIVER
+}
+
+kmod_defaults_fs()
+{
+ config_reset
+ modprobe -r $DEFAULT_KMOD_FS
+ config_set_fs $DEFAULT_KMOD_FS
+ config_set_test_case_fs
+}
+
+kmod_test_0001_driver()
+{
+ NAME='\000'
+
+ kmod_defaults_driver
+ config_num_threads 1
+ printf '\000' >"$DIR"/config_test_driver
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
+kmod_test_0001_fs()
+{
+ NAME='\000'
+
+ kmod_defaults_fs
+ config_num_threads 1
+ printf '\000' >"$DIR"/config_test_fs
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+kmod_test_0001()
+{
+ kmod_test_0001_driver
+ kmod_test_0001_fs
+}
+
+kmod_test_0002_driver()
+{
+ NAME="nope-$DEFAULT_KMOD_DRIVER"
+
+ kmod_defaults_driver
+ config_set_driver $NAME
+ config_num_threads 1
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
+kmod_test_0002_fs()
+{
+ NAME="nope-$DEFAULT_KMOD_FS"
+
+ kmod_defaults_fs
+ config_set_fs $NAME
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+kmod_test_0002()
+{
+ kmod_test_0002_driver
+ kmod_test_0002_fs
+}
+
+kmod_test_0003()
+{
+ kmod_defaults_fs
+ config_num_threads 1
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0004()
+{
+ kmod_defaults_fs
+ config_num_threads 2
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0005()
+{
+ kmod_defaults_driver
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0006()
+{
+ kmod_defaults_fs
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0007()
+{
+ kmod_test_0005
+ kmod_test_0006
+}
+
+kmod_test_0008()
+{
+ kmod_defaults_driver
+ MODPROBE_LIMIT=$(config_get_modprobe_limit)
+ let EXTRA=$MODPROBE_LIMIT/2
+ config_num_thread_limit_extra $EXTRA
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+kmod_test_0009()
+{
+ kmod_defaults_fs
+ #MODPROBE_LIMIT=$(config_get_modprobe_limit)
+ #let EXTRA=$MODPROBE_LIMIT/3
+ config_num_thread_limit_extra 5
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+trap "test_finish" EXIT
+test_reqs
+allow_user_defaults
+load_req_mod
+
+usage()
+{
+ echo "Usage: $0 [ -t <4-number-digit> ]"
+ echo "Valid tests: 0001-0011"
+ echo
+ echo "0001 - Simple test - 1 thread for empty string"
+ echo "0002 - Simple test - 1 thread for modules/filesystems that do not exist"
+ echo "0003 - Simple test - 1 thread for get_fs_type() only"
+ echo "0004 - Simple test - 2 threads for get_fs_type() only"
+ echo "0005 - multithreaded tests with default setup - request_module() only"
+ echo "0006 - multithreaded tests with default setup - get_fs_type() only"
+ echo "0007 - multithreaded tests with default setup test request_module() and get_fs_type()"
+ echo "0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()"
+ echo "0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()"
+ exit 1
+}
+
+# You can ask for a specific test:
+if [[ $# > 0 ]] ; then
+ if [[ $1 != "-t" ]]; then
+ usage
+ fi
+
+ re='^[0-9]+$'
+ if ! [[ $2 =~ $re ]]; then
+ usage
+ fi
+
+ RUN_TEST=kmod_test_$2
+ $RUN_TEST
+ exit 0
+fi
+
+# Once tese are enabled please leave them as-is. Write your own test,
+# we have tons of space.
+kmod_test_0001
+kmod_test_0002
+kmod_test_0003
+kmod_test_0004
+kmod_test_0005
+kmod_test_0006
+kmod_test_0007
+
+#kmod_test_0008
+#kmod_test_0009
+
+exit 0
--
2.10.1
While looking for early possible module loading failures I was
able to reproduce a memory leak possible with kmemleak. There
are a few rare ways to trigger a failure:
o we've run into a failure while processing kernel parameters
(parse_args() returns an error)
o mod_sysfs_setup() fails
o we're a live patch module and copy_module_elf() fails
Chances of running into this issue is really low.
kmemleak splat:
unreferenced object 0xffff9f2c4ada1b00 (size 32):
comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
hex dump (first 32 bytes):
6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
[<ffffffff8c1bc581>] kstrdup+0x31/0x60
[<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
[<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
[<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
[<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
[<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
[<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
[<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
[<ffffffff8c0a2b79>] kthread+0xc9/0xe0
[<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
kernel/module.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/module.c b/kernel/module.c
index f7482db0f843..e420ed67e533 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3722,6 +3722,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
mod_sysfs_teardown(mod);
coming_cleanup:
mod->state = MODULE_STATE_GOING;
+ destroy_params(mod->kp, mod->num_kp);
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
--
2.10.1
We currently statically limit the number of modprobe threads which
we allow to run concurrently to 50. As per Keith Owens, this was a
completely arbitrary value, and it was set in the 2.3.38 days [0]
over 16 years ago in year 2000.
Although we haven't yet hit our lower limits, experimentation [1]
shows that when and if we hit this limit in the worst case, will be
fatal -- consider get_fs_type() failures upon mount on a system which
has many partitions, some of which might even be with the same
filesystem. Its best to be prudent and increase and set this
value to something more sensible which ensures we're far from hitting
the limit and also allows default build/user run time override.
The worst case is fatal given that once a module fails to load there
is a period of time during which subsequent request for the same module
will fail, so in the case of partitions its not just one request that
could fail, but whole series of partitions. This later issue of a
module request failure domino effect can be addressed later, but
increasing the limit to something more meaninful should at least give us
enough cushion to avoid this for a while.
Set this value up with a bit more meaninful modern limits:
Bump this up to 64 max for small systems (CONFIG_BASE_SMALL)
Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
Also allow the default max limit to be further fine tuned at compile
time and at initialization at run time at boot up using the kernel
parameter: max_modprobes.
[0] https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ab1c4ec7410f6ec64e1511d1a7d850fc99c09b44
[1] https://github.com/mcgrof/test_request_module
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 7 ++++
include/linux/kmod.h | 3 +-
init/Kconfig | 23 +++++++++++++
init/main.c | 1 +
kernel/kmod.c | 43 ++++++++++++++++---------
5 files changed, 61 insertions(+), 16 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index be2d6d0a03a4..92bcccc65ea4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1700,6 +1700,13 @@
keepinitrd [HW,ARM]
+ kmod.max_modprobes [KNL]
+ This lets you set the max allowed of concurrent
+ modprobes threads possible on a system overriding the
+ default heuristic of:
+
+ min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT)
+
kernelcore= [KNL,X86,IA-64,PPC]
Format: nn[KMGTPE] | "mirror"
This parameter
diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index fcfd2bf14d3f..15783cd7f056 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -38,13 +38,14 @@ int __request_module(bool wait, const char *name, ...);
#define request_module_nowait(mod...) __request_module(false, mod)
#define try_then_request_module(x, mod...) \
((x) ?: (__request_module(true, mod), (x)))
+void init_kmod_umh(void);
#else
static inline int request_module(const char *name, ...) { return -ENOSYS; }
static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
+static inline void init_kmod_umh(void) { }
#define try_then_request_module(x, mod...) (x)
#endif
-
struct cred;
struct file;
diff --git a/init/Kconfig b/init/Kconfig
index 271692a352f1..da2c25746937 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
If unsure, or if you need to build out-of-tree modules, say N.
+config MAX_KMOD_CONCURRENT
+ int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
+ range 0 14
+ default 6 if !BASE_SMALL
+ default 7 if BASE_SMALL
+ help
+ The kernel restricts the number of possible concurrent calls to
+ request_module() to help avoid a recursive loop possible with
+ modules. The default maximum number of concurrent threads allowed
+ to run request_module() will be:
+
+ max_modprobes = min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT);
+
+ The value set in CONFIG_MAX_KMOD_CONCURRENT represents then the power
+ of 2 value used at boot time for the above computation. You can
+ override the default built value using the kernel parameter:
+
+ kmod.max_modprobes=4096
+
+ We set this to default to 64 (2^6) concurrent modprobe threads for
+ small systems, for larger systems this defaults to 128 (2^7)
+ concurrent modprobe threads.
+
endif # MODULES
config MODULES_TREE_LOOKUP
diff --git a/init/main.c b/init/main.c
index 8161208d4ece..1fa441aa32c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -638,6 +638,7 @@ asmlinkage __visible void __init start_kernel(void)
thread_stack_cache_init();
cred_init();
fork_init();
+ init_kmod_umh();
proc_caches_init();
buffer_init();
key_init();
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 0277d1216f80..cb6f7ca7b8a5 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -44,6 +44,9 @@
#include <trace/events/module.h>
extern int max_threads;
+unsigned int max_modprobes;
+module_param(max_modprobes, uint, 0644);
+MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
#define CAP_BSET (void *)1
#define CAP_PI (void *)2
@@ -125,10 +128,8 @@ int __request_module(bool wait, const char *fmt, ...)
{
va_list args;
char module_name[MODULE_NAME_LEN];
- unsigned int max_modprobes;
int ret;
static atomic_t kmod_concurrent = ATOMIC_INIT(0);
-#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;
/*
@@ -152,19 +153,6 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;
- /* If modprobe needs a service that is in a module, we get a recursive
- * loop. Limit the number of running kmod threads to max_threads/2 or
- * MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
- * would be to run the parents of this process, counting how many times
- * kmod was invoked. That would mean accessing the internals of the
- * process tables to get the command line, proc_pid_cmdline is static
- * and it is not worth changing the proc code just to handle this case.
- * KAO.
- *
- * "trace the ppid" is simple, but will fail if someone's
- * parent exits. I think this is as good as it gets. --RR
- */
- max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
atomic_inc(&kmod_concurrent);
if (atomic_read(&kmod_concurrent) > max_modprobes) {
/* We may be blaming an innocent here, but unlikely */
@@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
return ret;
}
EXPORT_SYMBOL(__request_module);
+
+/*
+ * If modprobe needs a service that is in a module, we get a recursive
+ * loop. Limit the number of running kmod threads to max_threads/2 or
+ * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
+ * would be to run the parents of this process, counting how many times
+ * kmod was invoked. That would mean accessing the internals of the
+ * process tables to get the command line, proc_pid_cmdline is static
+ * and it is not worth changing the proc code just to handle this case.
+ *
+ * "trace the ppid" is simple, but will fail if someone's
+ * parent exits. I think this is as good as it gets.
+ *
+ * You can override with with a kernel parameter, for instance to allow
+ * 4096 concurrent modprobe instances:
+ *
+ * kmod.max_modprobes=4096
+ */
+void __init init_kmod_umh(void)
+{
+ if (!max_modprobes)
+ max_modprobes = min(max_threads/2,
+ 2 << CONFIG_MAX_KMOD_CONCURRENT);
+}
+
#endif /* CONFIG_MODULES */
static void call_usermodehelper_freeinfo(struct subprocess_info *info)
--
2.10.1
kmod_concurrent is used as an atomic counter for enabling
the allowed limit of modprobe calls, provide wrappers for it
to enable this to be expanded on more easily. This will be done
later.
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
kernel/kmod.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index cb6f7ca7b8a5..049d7eabda38 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -44,6 +44,9 @@
#include <trace/events/module.h>
extern int max_threads;
+
+static atomic_t kmod_concurrent = ATOMIC_INIT(0);
+
unsigned int max_modprobes;
module_param(max_modprobes, uint, 0644);
MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
@@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
return -ENOMEM;
}
+static int kmod_umh_threads_get(void)
+{
+ atomic_inc(&kmod_concurrent);
+ if (atomic_read(&kmod_concurrent) < max_modprobes)
+ return 0;
+ atomic_dec(&kmod_concurrent);
+ return -ENOMEM;
+}
+
+static void kmod_umh_threads_put(void)
+{
+ atomic_dec(&kmod_concurrent);
+}
+
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
@@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
va_list args;
char module_name[MODULE_NAME_LEN];
int ret;
- static atomic_t kmod_concurrent = ATOMIC_INIT(0);
static int kmod_loop_msg;
/*
@@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;
- atomic_inc(&kmod_concurrent);
- if (atomic_read(&kmod_concurrent) > max_modprobes) {
+ ret = kmod_umh_threads_get();
+ if (ret) {
/* We may be blaming an innocent here, but unlikely */
if (kmod_loop_msg < 5) {
printk(KERN_ERR
@@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
module_name);
kmod_loop_msg++;
}
- atomic_dec(&kmod_concurrent);
- return -ENOMEM;
+ return ret;
}
trace_module_request(module_name, wait, _RET_IP_);
ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
- atomic_dec(&kmod_concurrent);
+ kmod_umh_threads_put();
return ret;
}
EXPORT_SYMBOL(__request_module);
--
2.10.1
Running out of our modprobe limit is not a memory limit but
a system specific established limitation set to avoid a possible
recursive issue with modprobe. This gives userspace a better idea
of what happened if we can't load a module, it could use this to
wait and try again.
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
kernel/kmod.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 049d7eabda38..ab38539f7e91 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -117,7 +117,7 @@ static int kmod_umh_threads_get(void)
if (atomic_read(&kmod_concurrent) < max_modprobes)
return 0;
atomic_dec(&kmod_concurrent);
- return -ENOMEM;
+ return -EBUSY;
}
static void kmod_umh_threads_put(void)
--
2.10.1
Only decrement *iff* we're possitive. Warn if we've hit
a situation where the counter is already 0 after we're done
with a modprobe call, this would tell us we have an unaccounted
counter access -- this in theory should not be possible as
only one routine controls the counter, however preemption is
one case that could trigger this situation. Avoid that situation
by disabling preemptiong while we access the counter.
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
kernel/kmod.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index ab38539f7e91..09cf35a2075a 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -113,16 +113,28 @@ static int call_modprobe(char *module_name, int wait)
static int kmod_umh_threads_get(void)
{
+ int ret = 0;
+
+ preempt_disable();
atomic_inc(&kmod_concurrent);
if (atomic_read(&kmod_concurrent) < max_modprobes)
- return 0;
- atomic_dec(&kmod_concurrent);
- return -EBUSY;
+ goto out;
+
+ atomic_dec_if_positive(&kmod_concurrent);
+ ret = -EBUSY;
+out:
+ preempt_enable();
+ return 0;
}
static void kmod_umh_threads_put(void)
{
- atomic_dec(&kmod_concurrent);
+ int ret;
+
+ preempt_disable();
+ ret = atomic_dec_if_positive(&kmod_concurrent);
+ WARN_ON(ret < 0);
+ preempt_enable();
}
/**
--
2.10.1
Just use the simplified rate limit printk when the max modprobe
limit is reached, while at it throw out a bone should the error
be triggered.
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
kernel/kmod.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 09cf35a2075a..ef65f4c3578a 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -158,7 +158,6 @@ int __request_module(bool wait, const char *fmt, ...)
va_list args;
char module_name[MODULE_NAME_LEN];
int ret;
- static int kmod_loop_msg;
/*
* We don't allow synchronous module loading from async. Module
@@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
ret = kmod_umh_threads_get();
if (ret) {
- /* We may be blaming an innocent here, but unlikely */
- if (kmod_loop_msg < 5) {
- printk(KERN_ERR
- "request_module: runaway loop modprobe %s\n",
- module_name);
- kmod_loop_msg++;
- }
+ pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
+ max_modprobes, module_name);
return ret;
}
--
2.10.1
Commit e7d316a02f6838 ("sysctl: handle error writing UINT_MAX to u32 fields")
added proc_douintvec() to start help adding support for unsigned int,
this however was only half the work needed, all these issues are present
with the current implementation:
o Printing the values shows a negative value, this happens
since do_proc_dointvec() and this uses proc_put_long()
o We can easily wrap around the int values: UINT_MAX is
4294967295, if we echo in 4294967295 + 1 we end up with 0,
using 4294967295 + 2 we end up with 1.
o We echo negative values in and they are accepted
Fix all these issues by adding our own do_proc_douintvec(). Likewise to
keep parity provide the other typically useful proc_douintvec_minmax().
Adding proc_douintvec_minmax_sysadmin() is easy but we wait for an actual
user for that.
Cc: Subash Abhinov Kasiviswanathan <[email protected]>
Cc: Heinrich Schuchardt <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/linux/sysctl.h | 3 +
kernel/sysctl.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 181 insertions(+), 6 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index adf4e51cf597..a35d40ecc211 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -47,6 +47,9 @@ extern int proc_douintvec(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_minmax(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+extern int proc_douintvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos);
extern int proc_dointvec_jiffies(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_userhz_jiffies(struct ctl_table *, int,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1a292ebcbbb6..06711e648fa3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2125,12 +2125,12 @@ static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
return 0;
}
-static int do_proc_douintvec_conv(bool *negp, unsigned long *lvalp,
- int *valp,
- int write, void *data)
+static int do_proc_douintvec_conv(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data)
{
if (write) {
- if (*negp)
+ if (*lvalp > (unsigned long) UINT_MAX)
return -EINVAL;
*valp = *lvalp;
} else {
@@ -2243,6 +2243,115 @@ static int do_proc_dointvec(struct ctl_table *table, int write,
buffer, lenp, ppos, conv, data);
}
+static int __do_proc_douintvec(void *tbl_data, struct ctl_table *table,
+ int write, void __user *buffer,
+ size_t *lenp, loff_t *ppos,
+ int (*conv)(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data),
+ void *data)
+{
+ unsigned int *i, vleft;
+ bool first = true;
+ int err = 0;
+ size_t left;
+ char *kbuf = NULL, *p;
+
+ if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
+ *lenp = 0;
+ return 0;
+ }
+
+ i = (unsigned int *) tbl_data;
+ vleft = table->maxlen / sizeof(*i);
+ left = *lenp;
+
+ if (!conv)
+ conv = do_proc_douintvec_conv;
+
+ if (write) {
+ if (*ppos) {
+ switch (sysctl_writes_strict) {
+ case SYSCTL_WRITES_STRICT:
+ goto out;
+ case SYSCTL_WRITES_WARN:
+ warn_sysctl_write(table);
+ break;
+ default:
+ break;
+ }
+ }
+
+ if (left > PAGE_SIZE - 1)
+ left = PAGE_SIZE - 1;
+ p = kbuf = memdup_user_nul(buffer, left);
+ if (IS_ERR(kbuf))
+ return PTR_ERR(kbuf);
+ }
+
+ for (; left && vleft--; i++, first=false) {
+ unsigned long lval;
+ bool neg;
+
+ if (write) {
+ left -= proc_skip_spaces(&p);
+
+ if (!left)
+ break;
+ err = proc_get_long(&p, &left, &lval, &neg,
+ proc_wspace_sep,
+ sizeof(proc_wspace_sep), NULL);
+ if (neg) {
+ err = -EINVAL;
+ break;
+ }
+ if (err)
+ break;
+ if (conv(&lval, i, 1, data)) {
+ err = -EINVAL;
+ break;
+ }
+ } else {
+ if (conv(&lval, i, 0, data)) {
+ err = -EINVAL;
+ break;
+ }
+ if (!first)
+ err = proc_put_char(&buffer, &left, '\t');
+ if (err)
+ break;
+ err = proc_put_long(&buffer, &left, lval, false);
+ if (err)
+ break;
+ }
+ }
+
+ if (!write && !first && left && !err)
+ err = proc_put_char(&buffer, &left, '\n');
+ if (write && !err && left)
+ left -= proc_skip_spaces(&p);
+ if (write) {
+ kfree(kbuf);
+ if (first)
+ return err ? : -EINVAL;
+ }
+ *lenp -= left;
+out:
+ *ppos += *lenp;
+ return err;
+}
+
+static int do_proc_douintvec(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos,
+ int (*conv)(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data),
+ void *data)
+{
+ return __do_proc_douintvec(table->data, table, write,
+ buffer, lenp, ppos, conv, data);
+}
+
/**
* proc_dointvec - read a vector of integers
* @table: the sysctl table
@@ -2278,8 +2387,8 @@ int proc_dointvec(struct ctl_table *table, int write,
int proc_douintvec(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- return do_proc_dointvec(table, write, buffer, lenp, ppos,
- do_proc_douintvec_conv, NULL);
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_douintvec_conv, NULL);
}
/*
@@ -2384,6 +2493,62 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
do_proc_dointvec_minmax_conv, ¶m);
}
+struct do_proc_douintvec_minmax_conv_param {
+ unsigned int *min;
+ unsigned int *max;
+};
+
+static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data)
+{
+ struct do_proc_douintvec_minmax_conv_param *param = data;
+ if (write) {
+ unsigned int val = *lvalp;
+ if ((param->min && *param->min > val) ||
+ (param->max && *param->max < val))
+ return -ERANGE;
+
+ if (*lvalp > (unsigned long) UINT_MAX)
+ return -EINVAL;
+ *valp = val;
+ } else {
+ unsigned int val = *valp;
+ *lvalp = (unsigned long) val;
+ }
+ return 0;
+}
+
+/**
+ * proc_douintvec_minmax - read a vector of unsigned ints with min/max values
+ * @table: the sysctl table
+ * @write: %TRUE if this is a write to the sysctl file
+ * @buffer: the user buffer
+ * @lenp: the size of the user buffer
+ * @ppos: file position
+ *
+ * Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer
+ * values from/to the user buffer, treated as an ASCII string. Negative
+ * strings are not allowed.
+ *
+ * This routine will ensure the values are within the range specified by
+ * table->extra1 (min) and table->extra2 (max). There is a final sanity
+ * check for UINT_MAX to avoid having to support wrap around uses from
+ * userspace.
+ *
+ * Returns 0 on success.
+ */
+int proc_douintvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct do_proc_douintvec_minmax_conv_param param = {
+ .min = (unsigned int *) table->extra1,
+ .max = (unsigned int *) table->extra2,
+ };
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_douintvec_minmax_conv, ¶m);
+}
+
static void validate_coredump_safety(void)
{
#ifdef CONFIG_COREDUMP
@@ -2891,6 +3056,12 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
return -ENOSYS;
}
+int proc_douintvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ return -ENOSYS;
+}
+
int proc_dointvec_jiffies(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -2933,6 +3104,7 @@ EXPORT_SYMBOL(proc_dointvec);
EXPORT_SYMBOL(proc_douintvec);
EXPORT_SYMBOL(proc_dointvec_jiffies);
EXPORT_SYMBOL(proc_dointvec_minmax);
+EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
--
2.10.1
This adds helpers for getting access to the kmod count and limit from
userspace. While at it, this also lets userspace fine tune the kmod
limit after boot, it uses the shiny new proc_douintvec_minmax().
These knobs should help userspace more gracefully and deterministically
handle module loading.
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/linux/kmod.h | 8 +++++
kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/sysctl.c | 14 +++++++++
3 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 15783cd7f056..94c7379cff94 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -39,13 +39,21 @@ int __request_module(bool wait, const char *name, ...);
#define try_then_request_module(x, mod...) \
((x) ?: (__request_module(true, mod), (x)))
void init_kmod_umh(void);
+unsigned int get_kmod_umh_limit(void);
+int sysctl_kmod_count(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+int sysctl_kmod_limit(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
#else
static inline int request_module(const char *name, ...) { return -ENOSYS; }
static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
static inline void init_kmod_umh(void) { }
+static unsigned int get_kmod_umh_limit(void) { return 0; }
#define try_then_request_module(x, mod...) (x)
#endif
+#define get_kmod_umh_limit get_kmod_umh_limit
+
struct cred;
struct file;
diff --git a/kernel/kmod.c b/kernel/kmod.c
index ef65f4c3578a..a0f449f77ed7 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -138,6 +138,27 @@ static void kmod_umh_threads_put(void)
}
/**
+ * get_kmod_umh_limit - get concurrent modprobe thread limit
+ *
+ * Returns the number of allowed concurrent modprobe calls.
+ */
+unsigned int get_kmod_umh_limit(void)
+{
+ return max_modprobes;
+}
+EXPORT_SYMBOL_GPL(get_kmod_umh_limit);
+
+/**
+ * get_kmod_umh_count - get number of concurrent modprobe calls running
+ *
+ * Returns the number of concurrent modprobe calls currently running.
+ */
+int get_kmod_umh_count(void)
+{
+ return atomic_read(&kmod_concurrent);
+}
+
+/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
* @fmt: printf style format string for the name of the module
@@ -196,6 +217,11 @@ int __request_module(bool wait, const char *fmt, ...)
}
EXPORT_SYMBOL(__request_module);
+static void __set_max_modprobes(unsigned int suggested)
+{
+ max_modprobes = min((unsigned int) max_threads/2, suggested);
+}
+
/*
* If modprobe needs a service that is in a module, we get a recursive
* loop. Limit the number of running kmod threads to max_threads/2 or
@@ -212,12 +238,65 @@ EXPORT_SYMBOL(__request_module);
* 4096 concurrent modprobe instances:
*
* kmod.max_modprobes=4096
+ *
+ * You can also set the limit via sysctl:
+ *
+ * echo 4096 > /proc/sys/kernel/kmod-limit
+ *
+ * You can also set the query the current thread count:
+ *
+ * cat /proc/sys/kernel/kmod-count
+ *
+ * These knobs should enable userspace to more gracefully and
+ * deterministically handle module loading.
*/
void __init init_kmod_umh(void)
{
if (!max_modprobes)
- max_modprobes = min(max_threads/2,
- 2 << CONFIG_MAX_KMOD_CONCURRENT);
+ __set_max_modprobes(2 << CONFIG_MAX_KMOD_CONCURRENT);
+}
+
+int sysctl_kmod_count(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret = 0;
+ int count = get_kmod_umh_count();
+
+ t = *table;
+ t.data = &count;
+
+ if (write)
+ return -EPERM;
+
+ ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+
+ return ret;
+}
+
+int sysctl_kmod_limit(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret;
+ unsigned int local_max_modprobes = max_modprobes;
+ unsigned int min = 0;
+ unsigned int max = max_threads/2;
+
+ t = *table;
+ t.data = &local_max_modprobes;
+ t.extra1 = &min;
+ t.extra2 = &max;
+
+ ret = proc_douintvec_minmax(&t, write, buffer, lenp, ppos);
+ if (ret == -ERANGE)
+ pr_err("modprobe thread valid range: %u - %u\n", min, max);
+ if (ret || !write)
+ return ret;
+
+ __set_max_modprobes((unsigned int) local_max_modprobes);
+
+ return 0;
}
#endif /* CONFIG_MODULES */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 06711e648fa3..0ba56001e49b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -660,6 +660,20 @@ static struct ctl_table kern_table[] = {
.extra1 = &one,
.extra2 = &one,
},
+ {
+ .procname = "kmod-count",
+ .data = NULL, /* filled in by handler */
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = sysctl_kmod_count,
+ },
+ {
+ .procname = "kmod-limit",
+ .data = NULL, /* filled in by handler */
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = sysctl_kmod_limit,
+ },
#endif
#ifdef CONFIG_UEVENT_HELPER
{
--
2.10.1
kmod has an optimization in place whereby if a some kernel code
uses request_module() on a module already loaded we never bother
userspace as the module already is loaded. This is not true for
get_fs_type() though as it uses aliases.
Additionally kmod <= v19 was broken -- it returns 0 to modprobe calls,
assuming the kernel module is built-in, where really we have a race as
the module starts forming. kmod <= v19 has incorrect userspace heuristics,
a userspace kmod fix is available for it:
http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4
This changes kmod to address both:
o Provides the alias optimization for get_fs_type() so modules already
loaded do not get re-requested.
o Provides a sanity test to verify modprobe's work
This is important given how any get_fs_type() users assert success
means we're ready to go, and tests with the new test_kmod stress driver
reveal that request_module() and get_fs_type() might fail for a few
other reasons. You don't need old kmod to fail on request_module() or
get_fs_type(), with the right system setup, these calls *can* fail
today.
Although this does get us in the business of keeping alias maps in
kernel, the the work to support and maintain this is trivial.
Aditionally, since it may be important get_fs_type() should not fail on
certain systems, this tightens things up a bit more.
The TL;DR:
kmod <= v19 will return 0 on modprobe calls if you are built-in,
however its heuristics for checking if you are built-in were broken.
It assumed that having the directory /sys/module/module-name
but not having the file /sys/module/module-name/initstate
is sufficient to assume a module is built-in.
The kernel loads the inittstate attribute *after* it creates the
directory. This is an issue when modprobe returns 0 for kernel calls
which assumes a return of 0 on request_module() can give you the
right to assert the module is loaded and live.
We cannot trust returns of modprobe as 0 in the kernel, we need to
verify that modules are live if modprobe return 0 but only if modules
*are* modules. The kernel heuristic we use to determine if a module is
built-in is that if modprobe returns 0 we know we must be built-in or
a module, but if we are a module clearly we must have a lingering kmod
dangling on our linked list. If there is no modules there we are *somewhat*
certain the module must be built in.
This is not enough though... we cannot easily work around this since the
kernel can use aliases to userspace for modules calls. For instance
fs/namespace.c uses fs-modulename for filesystesms on get_fs_type(), so
these need to be taken into consideration as well.
Using kmod <= 19 will give you a NULL get_fs_type() return even though
the module was loaded... That is a corner case, there are other failures
for request_module() though -- the other failures are not easy to
reproduce though but fortunately we have a stress test driver to help
with that now. Use the following tests:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
You can more easily see this error if you have kmod <= v19 installed.
You will need to install kmod <= v19, be sure to install its modprobe
into /sbin/ as by default the 'make install' target does not replace
your own.
This test helps cure test_kmod cases 0008 0009 so enable them.
Reported-by: Martin Wilck <[email protected]>
Reported-by: Randy Wright <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
kernel/kmod.c | 73 ++++++++++++++++++++++++++++++++++++
kernel/module.c | 11 ++++--
tools/testing/selftests/kmod/kmod.sh | 9 ++---
3 files changed, 85 insertions(+), 8 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index a0f449f77ed7..6bf0feab41d1 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -61,6 +61,11 @@ static DECLARE_RWSEM(umhelper_sem);
#ifdef CONFIG_MODULES
+bool finished_loading(const char *name);
+int module_wait_until_finished(const char *name);
+struct module *find_module_all(const char *name, size_t len,
+ bool even_unformed);
+
/*
modprobe_path is set via /proc/sys.
*/
@@ -158,6 +163,72 @@ int get_kmod_umh_count(void)
return atomic_read(&kmod_concurrent);
}
+static bool kmod_exists(char *name)
+{
+ struct module *mod;
+
+ mutex_lock(&module_mutex);
+ mod = find_module_all(name, strlen(name), true);
+ mutex_unlock(&module_mutex);
+
+ if (mod)
+ return true;
+
+ return false;
+}
+
+/*
+ * The assumption is this must be a module, it could still not be live though
+ * since kmod <= 19 returns 0 even if it was not ready yet. Allow for force
+ * wait check in case you are stuck on old userspace.
+ */
+static int wait_for_kmod(char *name)
+{
+ int ret = 0;
+
+ if (!finished_loading(name))
+ ret = module_wait_until_finished(name);
+
+ return ret;
+}
+
+/*
+ * kmod <= 19 will tell us modprobe returned 0 even if the module
+ * is not ready yet, it does this because it checks the /sys/module/mod-name
+ * directory and if its created but the /sys/module/mod-name/initstate is not
+ * created it assumes you have a built-in driver. At this point the module
+ * is still unformed, and telling the kernel at any point via request_module()
+ * will cause issues given a lot of places in the kernel assert that the driver
+ * will be present and ready. We need to account for this.
+ *
+ * If we had a module and even if buggy modprobe returned 0, we know we'd at
+ * least have a dangling kmod entry we could fetch.
+ *
+ * If modprobe returned 0 and we cannot find a kmod entry this is a good
+ * indicator your by userspace and kernel space that what you have is built-in.
+ *
+ * If modprobe returned 0 and we can find a kmod entry we should air on the
+ * side of caution and wait for the module to become ready or going.
+ *
+ * In the worst case, for built-in, we have to check on the module list for
+ * as many aliases possible the kernel gives the module, if that is n, that
+ * n traversals on the module list.
+ */
+static int finished_kmod_load(char *name)
+{
+ int ret = 0;
+ bool is_fs = (strlen(name) > 3) && (strncmp(name, "fs-", 3) == 0);
+
+ if (kmod_exists(name)) {
+ ret = wait_for_kmod(name);
+ } else {
+ if (is_fs && kmod_exists(name + 3))
+ ret = wait_for_kmod(name + 3);
+ }
+
+ return ret;
+}
+
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
@@ -211,6 +282,8 @@ int __request_module(bool wait, const char *fmt, ...)
trace_module_request(module_name, wait, _RET_IP_);
ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+ if (!ret)
+ ret = finished_kmod_load(module_name);
kmod_umh_threads_put();
return ret;
diff --git a/kernel/module.c b/kernel/module.c
index e420ed67e533..bf854321dca0 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -590,8 +590,8 @@ EXPORT_SYMBOL_GPL(find_symbol);
* Search for module by name: must hold module_mutex (or preempt disabled
* for read-only access).
*/
-static struct module *find_module_all(const char *name, size_t len,
- bool even_unformed)
+struct module *find_module_all(const char *name, size_t len,
+ bool even_unformed)
{
struct module *mod;
@@ -3325,7 +3325,7 @@ static int post_relocation(struct module *mod, const struct load_info *info)
}
/* Is this module of this name done loading? No locks held. */
-static bool finished_loading(const char *name)
+bool finished_loading(const char *name)
{
struct module *mod;
bool ret;
@@ -3486,6 +3486,11 @@ static int may_init_module(void)
return 0;
}
+int module_wait_until_finished(const char *name)
+{
+ return wait_event_interruptible(module_wq, finished_loading(name));
+}
+
/*
* We try to place it in the list now to make sure it's unique before
* we dedicate too many resources. In particular, temporary percpu
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index 9ea1864d8bae..ccf35b8d1671 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -382,7 +382,7 @@ kmod_test_0008()
let EXTRA=$MODPROBE_LIMIT/2
config_num_thread_limit_extra $EXTRA
config_trigger ${FUNCNAME[0]}
- config_expect_result ${FUNCNAME[0]} -EINVAL
+ config_expect_result ${FUNCNAME[0]} SUCCESS
}
kmod_test_0009()
@@ -392,7 +392,7 @@ kmod_test_0009()
#let EXTRA=$MODPROBE_LIMIT/3
config_num_thread_limit_extra 5
config_trigger ${FUNCNAME[0]}
- config_expect_result ${FUNCNAME[0]} -EINVAL
+ config_expect_result ${FUNCNAME[0]} SUCCESS
}
trap "test_finish" EXIT
@@ -442,8 +442,7 @@ kmod_test_0004
kmod_test_0005
kmod_test_0006
kmod_test_0007
-
-#kmod_test_0008
-#kmod_test_0009
+kmod_test_0008
+kmod_test_0009
exit 0
--
2.10.1
On Thu, Dec 8, 2016 at 10:47 AM, Luis R. Rodriguez <[email protected]> wrote:
> This adds a new stress test driver for kmod: the kernel module loader.
> The new stress test driver, test_kmod, is only enabled as a module right
> now. It should be possible to load this as built-in and load tests early
> (refer to the force_init_test module parameter), however since a lot of
> test can get a system out of memory fast we leave this disabled for now.
>
> Using a system with 1024 MiB of RAM can *easily* get your kernel
> OOM fast with this test driver.
>
> The test_kmod driver exposes API knobs for us to fine tune simple
> request_module() and get_fs_type() calls. Since these API calls
> only allow each one parameter a test driver for these is rather
> simple. Other factors that can help out test driver though are
> the number of calls we issue and knowing current limitations of
> each. This exposes configuration as much as possible through
> userspace to be able to build tests directly from userspace.
>
> Since it allows multiple misc devices its will eventually (once we
> add a knob to let us create new devices at will) also be possible to
> perform more tests in parallel, provided you have enough memory.
>
> We only enable tests we know work as of right now.
>
> Demo screenshots:
>
> # tools/testing/selftests/kmod/kmod.sh
> kmod_test_0001_driver: OK! - loading kmod test
> kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
> kmod_test_0001_fs: OK! - loading kmod test
> kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
> kmod_test_0002_driver: OK! - loading kmod test
> kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
> kmod_test_0002_fs: OK! - loading kmod test
> kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
> kmod_test_0003: OK! - loading kmod test
> kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0004: OK! - loading kmod test
> kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0005: OK! - loading kmod test
> kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0006: OK! - loading kmod test
> kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0005: OK! - loading kmod test
> kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0006: OK! - loading kmod test
> kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> Test completed
>
> You can also request for specific tests:
>
> # tools/testing/selftests/kmod/kmod.sh -t 0001
> kmod_test_0001_driver: OK! - loading kmod test
> kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
> kmod_test_0001_fs: OK! - loading kmod test
> kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
> Test completed
>
> Lastly, the current available number of tests:
>
> # tools/testing/selftests/kmod/kmod.sh --help
> Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
> Valid tests: 0001-0009
>
> 0001 - Simple test - 1 thread for empty string
> 0002 - Simple test - 1 thread for modules/filesystems that do not exist
> 0003 - Simple test - 1 thread for get_fs_type() only
> 0004 - Simple test - 2 threads for get_fs_type() only
> 0005 - multithreaded tests with default setup - request_module() only
> 0006 - multithreaded tests with default setup - get_fs_type() only
> 0007 - multithreaded tests with default setup test request_module() and get_fs_type()
> 0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
> 0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
>
> The following test cases currently fail, as such they are not currently
> enabled by default:
>
> # tools/testing/selftests/kmod/kmod.sh -t 0007
> # tools/testing/selftests/kmod/kmod.sh -t 0008
> # tools/testing/selftests/kmod/kmod.sh -t 0009
> # tools/testing/selftests/kmod/kmod.sh -t 0010
> # tools/testing/selftests/kmod/kmod.sh -t 0011
>
> To be sure to run them as intended please unload both of the modules:
>
> o test_module
> o xfs
>
> And ensure they are not loaded on your system prior to testing them.
> If you use these paritions for your rootfs you can change the default
> test driver used for get_fs_type() by exporting it into your
> environment. For example of other test defaults you can override
> refer to kmod.sh allow_user_defaults().
>
> Behind the scenes this is how we fine tune at a test case prior to
> hitting a trigger to run it:
>
> cat /sys/devices/virtual/misc/test_kmod0/config
> echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
> echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
> echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
> cat /sys/devices/virtual/misc/test_kmod0/config
> echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
>
> Finally to trigger:
>
> echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
>
> The kmod.sh script uses the above constructs to build differnt test cases.
Typo: different
> A bit of interpretation of the current failures follows, first two
> premises:
>
> a) When request_module() is used userspace figures out an optimized version of
> module order for us. Once it finds the modules it needs, as per depmod
> symbol dep map, it will finit_module() the respective modules which
> are needed for the original request_module() request.
>
> b) We have an optimization in place whereby if a kernel uses
> request_module() on a module already loaded we never bother
> userspace as the module already is loaded. This is all handled by
> kernel/kmod.c.
>
> A few things to consider to help identify root causes of issues:
>
> 0) kmod 19 has a broken heuristic for modules being assumed to be
> built-in to your kernel and will return 0 even though request_module()
> failed. Upgrade to a newer version of kmod.
>
> 1) A get_fs_type() call for "xfs" will request_module() for
> "fs-xfs", not for "xfs". The optimization in kernel described in b)
> fails to catch if we have a lot of consecutive get_fs_type() calls.
> The reason is the optimization in place does not look for aliases. This
> means two consecutive get_fs_type() calls will bump kmod_concurrent, whereas
> request_module() will not.
>
> This one explanation why test case 0009 fails at least once for
> get_fs_type().
>
> 2) If a module fails to load --- for whatever reason (kmod_concurrent
> limit reached, file not yet present due to rootfs switch, out of memory)
> we have a period of time during which module request for the same name
> either with request_module() or get_fs_type() will *also* fail to load
> even if the file for the module is ready.
>
> This explains why *multiple* NULLs are possible on test 0009.
>
> 3) finit_module() consumes quite a bit of memory.
Is this due to reading the module into kernel memory or something else?
> 4) Filesystems typically also have more dependent modules than other
> modules, its important to note though that even though a get_fs_type() call
> does not incur additional kmod_concurrent bumps, since userspace
> loads dependencies it finds it needs via finit_module_fd(), it *will*
> take much more memory to load a module with a lot of dependencies.
>
> Because of 3) and 4) we will easily run into out of memory failures
> with certain tests. For instance test 0006 fails on qemu with 1024 MiB
> of RAM. It panics a box after reaping all userspace processes and still
> not having enough memory to reap.
Are the buffers not released until after all the dependent modules are
loaded? I thought it would load one by one?
> Signed-off-by: Luis R. Rodriguez <[email protected]>
This is a great selftest, thanks for working on it!
Notes below...
> ---
> lib/Kconfig.debug | 25 +
> lib/Makefile | 1 +
> lib/test_kmod.c | 1248 +++++++++++++++++++++++++++++++++
> tools/testing/selftests/kmod/Makefile | 11 +
> tools/testing/selftests/kmod/config | 7 +
> tools/testing/selftests/kmod/kmod.sh | 449 ++++++++++++
> 6 files changed, 1741 insertions(+)
> create mode 100644 lib/test_kmod.c
> create mode 100644 tools/testing/selftests/kmod/Makefile
> create mode 100644 tools/testing/selftests/kmod/config
> create mode 100755 tools/testing/selftests/kmod/kmod.sh
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 7446097f72bd..6cad548e0682 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1994,6 +1994,31 @@ config BUG_ON_DATA_CORRUPTION
>
> If unsure, say N.
>
> +config TEST_KMOD
> + tristate "kmod stress tester"
> + default n
> + depends on m
> + select TEST_LKM
> + select XFS_FS
> + select TUN
> + select BTRFS_FS
Since the desired FS can be changed at runtime, maybe these selects
aren't needed?
> + help
> + Test the kernel's module loading mechanism: kmod. kmod implements
> + support to load modules using the Linux kernel's usermode helper.
> + This test provides a series of tests against kmod.
> +
> + Although technically you can either build test_kmod as a module or
> + into the kernel we disallow building it into the kernel since
> + it stress tests request_module() and this will very likely cause
> + some issues by taking over precious threads available from other
> + module load requests, ultimately this could be fatal.
> +
> + To run tests run:
> +
> + tools/testing/selftests/kmod/kmod.sh --help
> +
> + If unsure, say N.
> +
> source "samples/Kconfig"
>
> source "lib/Kconfig.kgdb"
> diff --git a/lib/Makefile b/lib/Makefile
> index d15e235f72ea..3c5a14821e16 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -55,6 +55,7 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
> obj-$(CONFIG_TEST_PRINTF) += test_printf.o
> obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
> obj-$(CONFIG_TEST_UUID) += test_uuid.o
> +obj-$(CONFIG_TEST_KMOD) += test_kmod.o
>
> ifeq ($(CONFIG_DEBUG_KOBJECT),y)
> CFLAGS_kobject.o += -DDEBUG
> diff --git a/lib/test_kmod.c b/lib/test_kmod.c
> new file mode 100644
> index 000000000000..63fded83b9b6
> --- /dev/null
> +++ b/lib/test_kmod.c
> @@ -0,0 +1,1248 @@
> +/*
> + * kmod stress test driver
> + *
> + * Copyright (C) 2016 Luis R. Rodriguez <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of copyleft-next (version 0.3.1 or later) as published
> + * at http://copyleft-next.org/.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +/*
> + * This driver provides an interface to trigger and test the kernel's
> + * module loader through a series of configurations and a few triggers.
> + * To test this driver use the following script as root:
> + *
> + * tools/testing/selftests/kmod/kmod.sh --help
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/kmod.h>
> +#include <linux/printk.h>
> +#include <linux/kthread.h>
> +#include <linux/sched.h>
> +#include <linux/fs.h>
> +#include <linux/miscdevice.h>
> +#include <linux/vmalloc.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +
> +#define TEST_START_NUM_THREADS 50
> +#define TEST_START_DRIVER "test_module"
> +#define TEST_START_TEST_FS "xfs"
> +#define TEST_START_TEST_CASE TEST_KMOD_DRIVER
> +
> +
> +static bool force_init_test = false;
> +module_param(force_init_test, bool_enable_only, 0644);
> +MODULE_PARM_DESC(force_init_test,
> + "Force kicking a test immediatley after driver loads");
Typo: immediately
> +
> +/*
> + * For device allocation / registration
> + */
> +static DEFINE_MUTEX(reg_dev_mutex);
> +static LIST_HEAD(reg_test_devs);
> +
> +/*
> + * num_test_devs actually represents the *next* ID of the next
> + * device we will allow to create.
> + */
> +static int num_test_devs;
> +
> +/**
> + * enum kmod_test_case - linker table test case
> + *
> + * If you add a test case, please be sure to review if you need to se
> + * @need_mod_put for your tests case.
> + *
> + * @TEST_KMOD_DRIVER: stress tests request_module()
> + * @TEST_KMOD_FS_TYPE: stress tests get_fs_type()
> + */
> +enum kmod_test_case {
> + __TEST_KMOD_INVALID = 0,
> +
> + TEST_KMOD_DRIVER,
> + TEST_KMOD_FS_TYPE,
> +
> + __TEST_KMOD_MAX,
> +};
> +
> +struct test_config {
> + char *test_driver;
> + char *test_fs;
> + unsigned int num_threads;
> + enum kmod_test_case test_case;
> + int test_result;
> +};
> +
> +struct kmod_test_device;
> +
> +/**
> + * kmod_test_device_info - thread info
> + *
> + * @ret_sync: return value if request_module() is used, sync request for
> + * @TEST_KMOD_DRIVER
> + * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
> + * @thread_idx: thread ID
> + * @test_dev: test device test is being performed under
> + * @need_mod_put: Some tests (get_fs_type() is one) requires putting the module
> + * (module_put(fs_sync->owner)) when done, otherwise you will not be able
> + * to unload the respective modules and re-test. We use this to keep
> + * accounting of when we need this and to help out in case we need to
> + * error out and deal with module_put() on error.
> + */
> +struct kmod_test_device_info {
> + int ret_sync;
> + struct file_system_type *fs_sync;
> + struct task_struct *task_sync;
> + unsigned int thread_idx;
> + struct kmod_test_device *test_dev;
> + bool need_mod_put;
> +};
> +
> +/**
> + * kmod_test_device - test device to help test kmod
> + *
> + * @dev_idx: unique ID for test device
> + * @config: configuration for the test
> + * @misc_dev: we use a misc device under the hood
> + * @dev: pointer to misc_dev's own struct device
> + * @config_mutex: protects configuration of test
> + * @trigger_mutex: the test trigger can only be fired once at a time
> + * @thread_lock: protects @done count, and the @info per each thread
> + * @done: number of threads which have completed or failed
> + * @test_is_oom: when we run out of memory, use this to halt moving forward
> + * @kthreads_done: completion used to signal when all work is done
> + * @list: needed to be part of the reg_test_devs
> + * @info: array of info for each thread
> + */
> +struct kmod_test_device {
> + int dev_idx;
> + struct test_config config;
> + struct miscdevice misc_dev;
> + struct device *dev;
> + struct mutex config_mutex;
> + struct mutex trigger_mutex;
> + struct mutex thread_mutex;
> +
> + unsigned int done;
> +
> + bool test_is_oom;
> + struct completion kthreads_done;
> + struct list_head list;
> +
> + struct kmod_test_device_info *info;
> +};
> +
> +static const char *test_case_str(enum kmod_test_case test_case)
> +{
> + switch (test_case) {
> + case TEST_KMOD_DRIVER:
> + return "TEST_KMOD_DRIVER";
> + case TEST_KMOD_FS_TYPE:
> + return "TEST_KMOD_FS_TYPE";
> + default:
> + return "invalid";
> + }
> +}
> +
> +static struct miscdevice *dev_to_misc_dev(struct device *dev)
> +{
> + return dev_get_drvdata(dev);
> +}
> +
> +static struct kmod_test_device *misc_dev_to_test_dev(struct miscdevice *misc_dev)
> +{
> + return container_of(misc_dev, struct kmod_test_device, misc_dev);
> +}
> +
> +static struct kmod_test_device *dev_to_test_dev(struct device *dev)
> +{
> + struct miscdevice *misc_dev;
> +
> + misc_dev = dev_to_misc_dev(dev);
> +
> + return misc_dev_to_test_dev(misc_dev);
> +}
> +
> +/* Must run with thread_mutex held */
> +static void kmod_test_done_check(struct kmod_test_device *test_dev,
> + unsigned int idx)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + test_dev->done++;
> + dev_dbg(test_dev->dev, "Done thread count: %u\n", test_dev->done);
> +
> + if (test_dev->done == config->num_threads) {
> + dev_info(test_dev->dev, "Done: %u threads have all run now\n",
> + test_dev->done);
> + dev_info(test_dev->dev, "Last thread to run: %u\n", idx);
> + complete(&test_dev->kthreads_done);
> + }
> +}
> +
> +static void test_kmod_put_module(struct kmod_test_device_info *info)
> +{
> + struct kmod_test_device *test_dev = info->test_dev;
> + struct test_config *config = &test_dev->config;
> +
> + if (!info->need_mod_put)
> + return;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + break;
> + case TEST_KMOD_FS_TYPE:
> + if (info && info->fs_sync && info->fs_sync->owner)
> + module_put(info->fs_sync->owner);
> + break;
> + default:
> + BUG();
> + }
> +
> + info->need_mod_put = true;
> +}
> +
> +static int run_request(void *data)
> +{
> + struct kmod_test_device_info *info = data;
> + struct kmod_test_device *test_dev = info->test_dev;
> + struct test_config *config = &test_dev->config;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + info->ret_sync = request_module("%s", config->test_driver);
> + break;
> + case TEST_KMOD_FS_TYPE:
> + info->fs_sync = get_fs_type(config->test_fs);
> + info->need_mod_put = true;
> + break;
> + default:
> + /* __trigger_config_run() already checked for test sanity */
> + BUG();
> + return -EINVAL;
> + }
> +
> + dev_dbg(test_dev->dev, "Ran thread %u\n", info->thread_idx);
> +
> + test_kmod_put_module(info);
> +
> + mutex_lock(&test_dev->thread_mutex);
> + info->task_sync = NULL;
> + kmod_test_done_check(test_dev, info->thread_idx);
> + mutex_unlock(&test_dev->thread_mutex);
> +
> + return 0;
> +}
> +
> +static int tally_work_test(struct kmod_test_device_info *info)
> +{
> + struct kmod_test_device *test_dev = info->test_dev;
> + struct test_config *config = &test_dev->config;
> + int err_ret = 0;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + /*
> + * Only capture errors, if one is found that's
> + * enough, for now.
> + */
> + if (info->ret_sync != 0)
> + err_ret = info->ret_sync;
> + dev_info(test_dev->dev,
> + "Sync thread %d return status: %d\n",
> + info->thread_idx, info->ret_sync);
> + break;
> + case TEST_KMOD_FS_TYPE:
> + /* For now we make this simple */
> + if (!info->fs_sync)
> + err_ret = -EINVAL;
> + dev_info(test_dev->dev, "Sync thread %u fs: %s\n",
> + info->thread_idx, info->fs_sync ? config->test_fs :
> + "NULL");
> + break;
> + default:
> + BUG();
> + }
> +
> + return err_ret;
> +}
> +
> +/*
> + * XXX: add result option to display if all errors did not match.
> + * For now we just keep any error code if one was found.
> + *
> + * If this ran it means *all* tasks were created fine and we
> + * are now just collecting results.
> + *
> + * Only propagate errors, do not override with a subsequent sucess case.
> + */
> +static void tally_up_work(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + struct kmod_test_device_info *info;
> + unsigned int idx;
> + int err_ret = 0;
> + int ret = 0;
> +
> + mutex_lock(&test_dev->thread_mutex);
> +
> + dev_info(test_dev->dev, "Results:\n");
> +
> + for (idx=0; idx < config->num_threads; idx++) {
> + info = &test_dev->info[idx];
> + ret = tally_work_test(info);
> + if (ret)
> + err_ret = ret;
> + }
> +
> + /*
> + * Note: request_module() returns 256 for a module not found even
> + * though modprobe itself returns 1.
> + */
> + config->test_result = err_ret;
> +
> + mutex_unlock(&test_dev->thread_mutex);
> +}
> +
> +static int try_one_request(struct kmod_test_device *test_dev, unsigned int idx)
> +{
> + struct kmod_test_device_info *info = &test_dev->info[idx];
> + int fail_ret = -ENOMEM;
> +
> + mutex_lock(&test_dev->thread_mutex);
> +
> + info->thread_idx = idx;
> + info->test_dev = test_dev;
> + info->task_sync = kthread_run(run_request, info, "%s-%u",
> + KBUILD_MODNAME, idx);
> +
> + if (!info->task_sync || IS_ERR(info->task_sync)) {
> + test_dev->test_is_oom = true;
> + dev_err(test_dev->dev, "Setting up thread %u failed\n", idx);
> + info->task_sync = NULL;
> + goto err_out;
> + } else
> + dev_dbg(test_dev->dev, "Kicked off thread %u\n", idx);
> +
> + mutex_unlock(&test_dev->thread_mutex);
> +
> + return 0;
> +
> +err_out:
> + info->ret_sync = fail_ret;
> + mutex_unlock(&test_dev->thread_mutex);
> +
> + return fail_ret;
> +}
> +
> +static void test_dev_kmod_stop_tests(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + struct kmod_test_device_info *info;
> + unsigned int i;
> +
> + dev_info(test_dev->dev, "Ending request_module() tests\n");
> +
> + mutex_lock(&test_dev->thread_mutex);
> +
> + for (i=0; i < config->num_threads; i++) {
> + info = &test_dev->info[i];
> + if (info->task_sync && !IS_ERR(info->task_sync)) {
> + dev_info(test_dev->dev,
> + "Stopping still-running thread %i\n", i);
> + kthread_stop(info->task_sync);
> + }
> +
> + /*
> + * info->task_sync is well protected, it can only be
> + * NULL or a pointer to a struct. If its NULL we either
> + * never ran, or we did and we completed the work. Completed
> + * tasks *always* put the module for us. This is a sanity
> + * check -- just in case.
> + */
> + if (info->task_sync && info->need_mod_put)
> + test_kmod_put_module(info);
> + }
> +
> + mutex_unlock(&test_dev->thread_mutex);
> +}
> +
> +/*
> + * Only wait *iff* we did not run into any errors during all of our thread
> + * set up. If run into any issues we stop threads and just bail out with
> + * an error to the trigger. This also means we don't need any tally work
> + * for any threads which fail.
> + */
> +static int try_requests(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + unsigned int idx;
> + int ret;
> + bool any_error = false;
> +
> + for (idx=0; idx < config->num_threads; idx++) {
> + if (test_dev->test_is_oom) {
> + any_error = true;
> + break;
> + }
> +
> + ret = try_one_request(test_dev, idx);
> + if (ret) {
> + any_error = true;
> + break;
> + }
> + }
> +
> + if (!any_error) {
> + test_dev->test_is_oom = false;
> + dev_info(test_dev->dev,
> + "No errors were found while initializing threads\n");
> + wait_for_completion(&test_dev->kthreads_done);
> + tally_up_work(test_dev);
> + } else {
> + test_dev->test_is_oom = true;
> + dev_info(test_dev->dev,
> + "At least one thread failed to start, stop all work\n");
> + test_dev_kmod_stop_tests(test_dev);
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static int run_test_driver(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + dev_info(test_dev->dev, "Test case: %s (%u)\n",
> + test_case_str(config->test_case),
> + config->test_case);
> + dev_info(test_dev->dev, "Test driver to load: %s\n",
> + config->test_driver);
> + dev_info(test_dev->dev, "Number of threads to run: %u\n",
> + config->num_threads);
> + dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
> + config->num_threads - 1);
> +
> + return try_requests(test_dev);
> +}
> +
> +static int run_test_fs_type(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + dev_info(test_dev->dev, "Test case: %s (%u)\n",
> + test_case_str(config->test_case),
> + config->test_case);
> + dev_info(test_dev->dev, "Test filesystem to load: %s\n",
> + config->test_fs);
> + dev_info(test_dev->dev, "Number of threads to run: %u\n",
> + config->num_threads);
> + dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
> + config->num_threads - 1);
> +
> + return try_requests(test_dev);
> +}
> +
> +static ssize_t config_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> + int len = 0;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + len += sprintf(buf, "Custom trigger configuration for: %s\n",
> + dev_name(dev));
> +
> + len += sprintf(buf+len, "Number of threads:\t%u\n",
> + config->num_threads);
> +
> + len += sprintf(buf+len, "Test_case:\t%s (%u)\n",
> + test_case_str(config->test_case),
> + config->test_case);
> +
> + if (config->test_driver)
> + len += sprintf(buf+len, "driver:\t%s\n",
> + config->test_driver);
> + else
> + len += sprintf(buf+len, "driver:\tEMTPY\n");
> +
> + if (config->test_fs)
> + len += sprintf(buf+len, "fs:\t%s\n",
> + config->test_fs);
> + else
> + len += sprintf(buf+len, "fs:\tEMTPY\n");
These should all use snprintf...
> +
> +
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return len;
> +}
> +static DEVICE_ATTR_RO(config);
> +
> +/*
> + * This ensures we don't allow kicking threads through if our configuration
> + * is faulty.
> + */
> +static int __trigger_config_run(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + test_dev->done = 0;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + return run_test_driver(test_dev);
> + case TEST_KMOD_FS_TYPE:
> + return run_test_fs_type(test_dev);
> + default:
> + dev_warn(test_dev->dev,
> + "Invalid test case requested: %u\n",
> + config->test_case);
> + return -EINVAL;
> + }
> +}
> +
> +static int trigger_config_run(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + int ret;
> +
> + mutex_lock(&test_dev->trigger_mutex);
> + mutex_lock(&test_dev->config_mutex);
> +
> + ret = __trigger_config_run(test_dev);
> + if (ret < 0)
> + goto out;
> + dev_info(test_dev->dev, "General test result: %d\n",
> + config->test_result);
> +
> + /*
> + * We must return 0 after a trigger even unless something went
> + * wrong with the setup of the test. If the test setup went fine
> + * then userspace must just check the result of config->test_result.
> + * One issue with relying on the return from a call in the kernel
> + * is if the kernel returns a possitive value using this trigger
> + * will not return the value to userspace, it would be lost.
> + *
> + * By not relying on capturing the return value of tests we are using
> + * through the trigger it also us to run tests with set -e and only
> + * fail when something went wrong with the driver upon trigger
> + * requests.
> + */
> + ret = 0;
> +
> +out:
> + mutex_unlock(&test_dev->config_mutex);
> + mutex_unlock(&test_dev->trigger_mutex);
> +
> + return ret;
> +}
> +
> +static ssize_t
> +trigger_config_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + int ret;
> +
> + if (test_dev->test_is_oom)
> + return -ENOMEM;
> +
> + /* For all intents and purposes we don't care what userspace
> + * sent this trigger, we care only that we were triggered.
> + * We treat the return value only for caputuring issues with
> + * the test setup. At this point all the test variables should
> + * have been allocated so typically this should never fail.
> + */
> + ret = trigger_config_run(test_dev);
> + if (unlikely(ret < 0))
> + goto out;
> +
> + /*
> + * Note: any return > 0 will be treated as success
> + * and the error value will not be available to userspace.
> + * Do not rely on trying to send to userspace a test value
> + * return value as possitive return errors will be lost.
> + */
> + if (WARN_ON(ret > 0))
> + return -EINVAL;
> +
> + ret = count;
> +out:
> + return ret;
> +}
> +static DEVICE_ATTR_WO(trigger_config);
> +
> +/*
> + * XXX: move to kstrncpy() once merged.
> + *
> + * Users should use kfree_const() when freeing these.
> + */
> +static int __kstrncpy(char **dst, const char *name, size_t count, gfp_t gfp)
> +{
> + *dst = kstrndup(name, count, gfp);
> + if (!*dst)
> + return -ENOSPC;
> + return count;
> +}
> +
> +static int config_copy_test_driver_name(struct test_config *config,
> + const char *name,
> + size_t count)
> +{
> + return __kstrncpy(&config->test_driver, name, count, GFP_KERNEL);
> +}
> +
> +
> +static int config_copy_test_fs(struct test_config *config, const char *name,
> + size_t count)
> +{
> + return __kstrncpy(&config->test_fs, name, count, GFP_KERNEL);
> +}
> +
> +static void __kmod_config_free(struct test_config *config)
> +{
> + if (!config)
> + return;
> +
> + kfree_const(config->test_driver);
> + config->test_driver = NULL;
> +
> + kfree_const(config->test_fs);
> + config->test_driver = NULL;
> +}
> +
> +static void kmod_config_free(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config;
> +
> + if (!test_dev)
> + return;
> +
> + config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + __kmod_config_free(config);
> + mutex_unlock(&test_dev->config_mutex);
> +}
> +
> +static ssize_t config_test_driver_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> + int copied;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + kfree_const(config->test_driver);
> + config->test_driver = NULL;
> +
> + copied = config_copy_test_driver_name(config, buf, count);
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return copied;
> +}
> +
> +static ssize_t config_test_driver_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + strcpy(buf, config->test_driver);
> + strcat(buf, "\n");
IIUC, the show/store API uses a max size of PAGE_SIZE. If that's
correct, it's possible that this show routine could write past the end
of buf, due to the end newline, etc. Best to use snprintf like you do
below for the other shows.
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return strlen(buf) + 1;
> +}
> +static DEVICE_ATTR(config_test_driver, 0644, config_test_driver_show,
> + config_test_driver_store);
> +
> +static ssize_t config_test_fs_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> + int copied;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + kfree_const(config->test_fs);
> + config->test_fs = NULL;
> +
> + copied = config_copy_test_fs(config, buf, count);
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return copied;
> +}
> +
> +static ssize_t config_test_fs_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + strcpy(buf, config->test_fs);
> + strcat(buf, "\n");
> + mutex_unlock(&test_dev->config_mutex);
Same here... (which, btw, could likely use to be a helper function,
the show and store functions here are identical except for test_driver
vs test_fs).
> +
> + return strlen(buf) + 1;
> +}
> +static DEVICE_ATTR(config_test_fs, 0644, config_test_fs_show,
> + config_test_fs_store);
> +
> +static int trigger_config_run_driver(struct kmod_test_device *test_dev,
> + const char *test_driver)
> +{
> + int copied;
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + config->test_case = TEST_KMOD_DRIVER;
> +
> + kfree_const(config->test_driver);
> + config->test_driver = NULL;
> +
> + copied = config_copy_test_driver_name(config, test_driver,
> + strlen(test_driver));
> + mutex_unlock(&test_dev->config_mutex);
> +
> + if (copied != strlen(test_driver)) {
Can't these copied tests just check < 0? (i.e. avoid the repeated
strlen which can be fragile.)
> + test_dev->test_is_oom = true;
> + return -EINVAL;
> + }
> +
> + test_dev->test_is_oom = false;
> +
> + return trigger_config_run(test_dev);
> +}
> +
> +static int trigger_config_run_fs(struct kmod_test_device *test_dev,
> + const char *fs_type)
> +{
> + int copied;
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + config->test_case = TEST_KMOD_FS_TYPE;
> +
> + kfree_const(config->test_fs);
> + config->test_driver = NULL;
> +
> + copied = config_copy_test_fs(config, fs_type, strlen(fs_type));
> + mutex_unlock(&test_dev->config_mutex);
> +
> + if (copied != strlen(fs_type)) {
> + test_dev->test_is_oom = true;
> + return -EINVAL;
> + }
> +
> + test_dev->test_is_oom = false;
> +
> + return trigger_config_run(test_dev);
> +}
These two functions are almost identical too. Only test_case and the
copy function change...
> +
> +static void free_test_dev_info(struct kmod_test_device *test_dev)
> +{
> + if (test_dev->info) {
> + vfree(test_dev->info);
> + test_dev->info = NULL;
> + }
> +}
vfree() already checks for NULL, you can drop the if.
> +
> +static int kmod_config_sync_info(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + free_test_dev_info(test_dev);
> + test_dev->info = vzalloc(config->num_threads *
> + sizeof(struct kmod_test_device_info));
> + if (!test_dev->info) {
> + dev_err(test_dev->dev, "Cannot alloc test_dev info\n");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Old kernels may not have this, if you want to port this code to
> + * test it on older kernels.
> + */
> +#ifdef get_kmod_umh_limit
> +static unsigned int kmod_init_test_thread_limit(void)
> +{
> + return get_kmod_umh_limit();
> +}
> +#else
> +static unsigned int kmod_init_test_thread_limit(void)
> +{
> + return TEST_START_NUM_THREADS;
> +}
> +#endif
> +
> +static int __kmod_config_init(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + int ret = -ENOMEM, copied;
> +
> + __kmod_config_free(config);
> +
> + copied = config_copy_test_driver_name(config, TEST_START_DRIVER,
> + strlen(TEST_START_DRIVER));
> + if (copied != strlen(TEST_START_DRIVER))
> + goto err_out;
> +
> + copied = config_copy_test_fs(config, TEST_START_TEST_FS,
> + strlen(TEST_START_TEST_FS));
> + if (copied != strlen(TEST_START_TEST_FS))
> + goto err_out;
> +
> + config->num_threads = kmod_init_test_thread_limit();
> + config->test_result = 0;
> + config->test_case = TEST_START_TEST_CASE;
> +
> + ret = kmod_config_sync_info(test_dev);
> + if (ret)
> + goto err_out;
> +
> + test_dev->test_is_oom = false;
> +
> + return 0;
> +
> +err_out:
> + test_dev->test_is_oom = true;
> + WARN_ON(test_dev->test_is_oom);
> +
> + __kmod_config_free(config);
> +
> + return ret;
> +}
> +
> +static ssize_t reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + int ret;
> +
> + mutex_lock(&test_dev->trigger_mutex);
> + mutex_lock(&test_dev->config_mutex);
> +
> + ret = __kmod_config_init(test_dev);
> + if (ret < 0) {
> + ret = -ENOMEM;
> + dev_err(dev, "could not alloc settings for config trigger: %d\n",
> + ret);
> + goto out;
> + }
> +
> + dev_info(dev, "reset\n");
> + ret = count;
> +
> +out:
> + mutex_unlock(&test_dev->config_mutex);
> + mutex_unlock(&test_dev->trigger_mutex);
> +
> + return ret;
> +}
> +static DEVICE_ATTR_WO(reset);
> +
> +static int test_dev_config_update_uint_sync(struct kmod_test_device *test_dev,
> + const char *buf, size_t size,
> + unsigned int *config,
> + int (*test_sync)(struct kmod_test_device *test_dev))
> +{
> + int ret;
> + char *end;
> + long new = simple_strtol(buf, &end, 0);
> + unsigned int old_val;
> + if (end == buf || new > UINT_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + old_val = *config;
> + *(unsigned int *)config = new;
> +
> + ret = test_sync(test_dev);
> + if (ret) {
> + *(unsigned int *)config = old_val;
> +
> + ret = test_sync(test_dev);
> + WARN_ON(ret);
> +
> + mutex_unlock(&test_dev->config_mutex);
> + return -EINVAL;
> + }
> +
> + mutex_unlock(&test_dev->config_mutex);
> + /* Always return full write size even if we didn't consume all */
> + return size;
> +}
> +
> +static int test_dev_config_update_uint_range(struct kmod_test_device *test_dev,
> + const char *buf, size_t size,
> + unsigned int *config,
> + unsigned int min,
> + unsigned int max)
> +{
> + char *end;
> + long new = simple_strtol(buf, &end, 0);
> + if (end == buf || new < min || new > max || new > UINT_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&test_dev->config_mutex);
> + *(unsigned int *)config = new;
config is already an unsigned int *, why cast?
> + mutex_unlock(&test_dev->config_mutex);
> +
> + /* Always return full write size even if we didn't consume all */
> + return size;
> +}
> +
> +static int test_dev_config_update_int(struct kmod_test_device *test_dev,
> + const char *buf, size_t size,
> + int *config)
> +{
> + char *end;
> + long new = simple_strtol(buf, &end, 0);
> + if (end == buf || new > INT_MAX || new < INT_MIN)
> + return -EINVAL;
> + mutex_lock(&test_dev->config_mutex);
> + *(int *)config = new;
config is already an int *, why cast?
> + mutex_unlock(&test_dev->config_mutex);
> + /* Always return full write size even if we didn't consume all */
> + return size;
> +}
> +
> +static ssize_t test_dev_config_show_int(struct kmod_test_device *test_dev,
> + char *buf,
> + int config)
> +{
> + int val;
> +
> + mutex_lock(&test_dev->config_mutex);
> + val = config;
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return snprintf(buf, PAGE_SIZE, "%d\n", val);
> +}
> +
> +static ssize_t test_dev_config_show_uint(struct kmod_test_device *test_dev,
> + char *buf,
> + unsigned int config)
> +{
> + unsigned int val;
> +
> + mutex_lock(&test_dev->config_mutex);
> + val = config;
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return snprintf(buf, PAGE_SIZE, "%u\n", val);
> +}
> +
> +static ssize_t test_result_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_update_int(test_dev, buf, count,
> + &config->test_result);
> +}
> +
> +static ssize_t config_num_threads_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_update_uint_sync(test_dev, buf, count,
> + &config->num_threads,
> + kmod_config_sync_info);
> +}
> +
> +static ssize_t config_num_threads_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_show_int(test_dev, buf, config->num_threads);
> +}
> +static DEVICE_ATTR(config_num_threads, 0644, config_num_threads_show,
> + config_num_threads_store);
> +
> +static ssize_t config_test_case_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_update_uint_range(test_dev, buf, count,
> + &config->test_case,
> + __TEST_KMOD_INVALID + 1,
> + __TEST_KMOD_MAX - 1);
> +}
> +
> +static ssize_t config_test_case_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_show_uint(test_dev, buf, config->test_case);
> +}
> +static DEVICE_ATTR(config_test_case, 0644, config_test_case_show,
> + config_test_case_store);
> +
> +static ssize_t test_result_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_show_int(test_dev, buf, config->test_result);
> +}
> +static DEVICE_ATTR(test_result, 0644, test_result_show, test_result_store);
> +
> +#define TEST_KMOD_DEV_ATTR(name) &dev_attr_##name.attr
> +
> +static struct attribute *test_dev_attrs[] = {
> + TEST_KMOD_DEV_ATTR(trigger_config),
> + TEST_KMOD_DEV_ATTR(config),
> + TEST_KMOD_DEV_ATTR(reset),
> +
> + TEST_KMOD_DEV_ATTR(config_test_driver),
> + TEST_KMOD_DEV_ATTR(config_test_fs),
> + TEST_KMOD_DEV_ATTR(config_num_threads),
> + TEST_KMOD_DEV_ATTR(config_test_case),
> + TEST_KMOD_DEV_ATTR(test_result),
> +
> + NULL,
> +};
> +
> +ATTRIBUTE_GROUPS(test_dev);
> +
> +static int kmod_config_init(struct kmod_test_device *test_dev)
> +{
> + int ret;
> +
> + mutex_lock(&test_dev->config_mutex);
> + ret = __kmod_config_init(test_dev);
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return ret;
> +}
> +
> +/*
> + * XXX: this could perhaps be made generic already too, but a hunt
> + * for actual users would be needed first. It could be generic
> + * if other test drivers end up using a similar mechanism.
> + */
> +const char *test_dev_get_name(const char *base, int idx, gfp_t gfp)
> +{
> + const char *name_const;
> + char *name;
> +
> + if (!base)
> + return NULL;
> + if (strlen(base) > 30)
> + return NULL;
why?
> + name = kzalloc(1024, gfp);
> + if (!name)
> + return NULL;
> +
> + strncat(name, base, strlen(base));
> + sprintf(name+(strlen(base)), "%d", idx);
> + name_const = kstrdup_const(name, gfp);
> +
> + kfree(name);
> +
> + return name_const;
> +}
What is going on here? Why not just:
return kasprintf(gfp, "%s%d", base, idx);
For all of that code? And kstrdup_const is pointless here since it'll
always just do the dup (as the kmalloc source isn't in rodata).
> +
> +static struct kmod_test_device *alloc_test_dev_kmod(int idx)
> +{
> + int ret;
> + struct kmod_test_device *test_dev;
> + struct miscdevice *misc_dev;
> +
> + test_dev = vzalloc(sizeof(struct kmod_test_device));
> + if (!test_dev) {
> + pr_err("Cannot alloc test_dev\n");
> + goto err_out;
> + }
> +
> + mutex_init(&test_dev->config_mutex);
> + mutex_init(&test_dev->trigger_mutex);
> + mutex_init(&test_dev->thread_mutex);
> +
> + init_completion(&test_dev->kthreads_done);
> +
> + ret = kmod_config_init(test_dev);
> + if (ret < 0) {
> + pr_err("Cannot alloc kmod_config_init()\n");
> + goto err_out_free;
> + }
> +
> + test_dev->dev_idx = idx;
> + misc_dev = &test_dev->misc_dev;
> +
> + misc_dev->minor = MISC_DYNAMIC_MINOR;
> + misc_dev->name = test_dev_get_name("test_kmod", test_dev->dev_idx,
> + GFP_KERNEL);
> + if (!misc_dev->name) {
> + pr_err("Cannot alloc misc_dev->name\n");
> + goto err_out_free_config;
> + }
> + misc_dev->groups = test_dev_groups;
> +
> + return test_dev;
> +
> +err_out_free_config:
> + free_test_dev_info(test_dev);
> + kmod_config_free(test_dev);
> +err_out_free:
> + vfree(test_dev);
> + test_dev = NULL;
> +err_out:
> + return NULL;
> +}
> +
> +static void free_test_dev_kmod(struct kmod_test_device *test_dev)
> +{
> + if (test_dev) {
> + kfree_const(test_dev->misc_dev.name);
> + test_dev->misc_dev.name = NULL;
> + free_test_dev_info(test_dev);
> + kmod_config_free(test_dev);
> + vfree(test_dev);
> + test_dev = NULL;
> + }
> +}
> +
> +static struct kmod_test_device *register_test_dev_kmod(void)
> +{
> + struct kmod_test_device *test_dev = NULL;
> + int ret;
> +
> + mutex_unlock(®_dev_mutex);
> +
> + /* int should suffice for number of devices, test for wrap */
> + if (unlikely(num_test_devs + 1) < 0) {
> + pr_err("reached limit of number of test devices\n");
> + goto out;
> + }
> +
> + test_dev = alloc_test_dev_kmod(num_test_devs);
> + if (!test_dev)
> + goto out;
> +
> + ret = misc_register(&test_dev->misc_dev);
> + if (ret) {
> + pr_err("could not register misc device: %d\n", ret);
> + free_test_dev_kmod(test_dev);
> + goto out;
> + }
> +
> + test_dev->dev = test_dev->misc_dev.this_device;
> + list_add_tail(&test_dev->list, ®_test_devs);
> + dev_info(test_dev->dev, "interface ready\n");
> +
> + num_test_devs++;
> +
> +out:
> + mutex_unlock(®_dev_mutex);
> +
> + return test_dev;
> +
> +}
> +
> +static int __init test_kmod_init(void)
> +{
> + struct kmod_test_device *test_dev;
> + int ret;
> +
> + test_dev = register_test_dev_kmod();
> + if (!test_dev) {
> + pr_err("Cannot add first test kmod device\n");
> + return -ENODEV;
> + }
> +
> + /*
> + * With some work we might be able to gracefully enable
> + * testing with this driver built-in, for now this seems
> + * rather risky. For those willing to try have at it,
> + * and enable the below. Good luck! If that works, try
> + * lowering the init level for more fun.
> + */
> + if (force_init_test) {
> + ret = trigger_config_run_driver(test_dev, "tun");
> + if (WARN_ON(ret))
> + return ret;
> + ret = trigger_config_run_fs(test_dev, "btrfs");
> + if (WARN_ON(ret))
> + return ret;
> + }
> +
> + return 0;
> +}
> +late_initcall(test_kmod_init);
> +
> +static
> +void unregister_test_dev_kmod(struct kmod_test_device *test_dev)
> +{
> + mutex_lock(&test_dev->trigger_mutex);
> + mutex_lock(&test_dev->config_mutex);
> +
> + test_dev_kmod_stop_tests(test_dev);
> +
> + dev_info(test_dev->dev, "removing interface\n");
> + misc_deregister(&test_dev->misc_dev);
> +
> + mutex_unlock(&test_dev->config_mutex);
> + mutex_unlock(&test_dev->trigger_mutex);
> +
> + free_test_dev_kmod(test_dev);
> +}
> +
> +static void __exit test_kmod_exit(void)
> +{
> + struct kmod_test_device *test_dev, *tmp;
> +
> + mutex_lock(®_dev_mutex);
> + list_for_each_entry_safe(test_dev, tmp, ®_test_devs, list) {
> + list_del(&test_dev->list);
> + unregister_test_dev_kmod(test_dev);
> + }
> + mutex_unlock(®_dev_mutex);
> +}
> +module_exit(test_kmod_exit);
> +
> +MODULE_AUTHOR("Luis R. Rodriguez <[email protected]>");
> +MODULE_LICENSE("GPL");
> diff --git a/tools/testing/selftests/kmod/Makefile b/tools/testing/selftests/kmod/Makefile
> new file mode 100644
> index 000000000000..fa2ccc5fb3de
> --- /dev/null
> +++ b/tools/testing/selftests/kmod/Makefile
> @@ -0,0 +1,11 @@
> +# Makefile for kmod loading selftests
> +
> +# No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
> +all:
> +
> +TEST_PROGS := kmod.sh
> +
> +include ../lib.mk
> +
> +# Nothing to clean up.
> +clean:
> diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
> new file mode 100644
> index 000000000000..259f4fd6b5e2
> --- /dev/null
> +++ b/tools/testing/selftests/kmod/config
> @@ -0,0 +1,7 @@
> +CONFIG_TEST_KMOD=m
> +CONFIG_TEST_LKM=m
> +CONFIG_XFS_FS=m
> +
> +# For the module parameter force_init_test is used
> +CONFIG_TUN=m
> +CONFIG_BTRFS_FS=m
> diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
> new file mode 100755
> index 000000000000..9ea1864d8bae
> --- /dev/null
> +++ b/tools/testing/selftests/kmod/kmod.sh
> @@ -0,0 +1,449 @@
> +#!/bin/bash
> +#
> +# Copyright (C) 2016 Luis R. Rodriguez <[email protected]>
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms of copyleft-next (version 0.3.1 or later) as published
> +# at http://copyleft-next.org/.
> +
> +# This is a stress test script for kmod, the kernel module loader. It uses
> +# test_kmod which exposes a series of knobs for the API for us so we can
> +# tweak each test in userspace rather than in kernelspace.
> +#
> +# The way kmod works is it uses the kernel's usermode helper API to eventually
> +# call /sbin/modprobe. It has a limit of the number of concurrent calls
> +# possible. The kernel interface to load modules is request_module(), however
> +# mount uses get_fs_type(). Both behave slightly differently, but the
> +# differences are important enough to test each call separately. For this
> +# reason test_kmod starts by providing tests for both calls.
> +#
> +# The test driver test_kmod assumes a series of defaults which you can
> +# override by exporting to your environment prior running this script.
> +# For instance this script assumes you do not have xfs loaded upon boot.
> +# If this is false, export DEFAULT_KMOD_FS="ext4" prior to running this
> +# script if the filesyste module you don't have loaded upon bootup
> +# is ext4 instead. Refer to allow_user_defaults() for a list of user
> +# override variables possible.
> +#
> +# You'll want at least 4096 GiB of RAM to expect to run these tests
4TiB of RAM? I assume this was meant to be 4 GiB not 4096?
> +# without running out of memory on them. For other requirements refer
> +# to test_reqs()
> +
> +set -e
> +
> +TEST_DRIVER="test_kmod"
> +
> +function allow_user_defaults()
> +{
> + if [ -z $DEFAULT_KMOD_DRIVER ]; then
> + DEFAULT_KMOD_DRIVER="test_module"
> + fi
> +
> + if [ -z $DEFAULT_KMOD_FS ]; then
> + DEFAULT_KMOD_FS="xfs"
> + fi
> +
> + if [ -z $PROC_DIR ]; then
> + PROC_DIR="/proc/sys/kernel/"
> + fi
> +
> + if [ -z $MODPROBE_LIMIT ]; then
> + MODPROBE_LIMIT=50
> + fi
> +
> + if [ -z $DIR ]; then
> + DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0/"
> + fi
> +
> + MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
> +}
> +
> +test_reqs()
> +{
> + if ! which modprobe 2> /dev/null > /dev/null; then
> + echo "$0: You need modprobe installed"
While not a huge deal, I prefer that error messages end up on stderr,
so adding >&2 to all the failure echos (or providing an err function)
would be nice. (This happens in later places...)
> + exit 1
> + fi
> +
> + if ! which kmod 2> /dev/null > /dev/null; then
> + echo "$0: You need kmod installed"
> + exit 1
> + fi
> +
> + # kmod 19 has a bad bug where it returns 0 when modprobe
> + # gets called *even* if the module was not loaded due to
> + # some bad heuristics. For details see:
> + #
> + # A work around is possible in-kernel but its rather
> + # complex.
> + KMOD_VERSION=$(kmod --version | awk '{print $3}')
> + if [[ $KMOD_VERSION -le 19 ]]; then
> + echo "$0: You need at least kmod 20"
> + echo "kmod <= 19 is buggy, for details see:"
> + echo "http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4"
> + exit 1
> + fi
> +}
> +
> +function load_req_mod()
> +{
> + if [ ! -d $DIR ]; then
> + # Alanis: "Oh isn't it ironic?"
> + modprobe $TEST_DRIVER
> + if [ ! -d $DIR ]; then
> + echo "$0: $DIR not present"
> + echo "You must have the following enabled in your kernel:"
> + cat $PWD/config
I like this (minimum config in the test directory). Are other tests
doing this too?
> + exit 1
> + fi
> + fi
> +}
> +
> +test_finish()
> +{
> + echo "Test completed"
> +}
> +
> +errno_name_to_val()
> +{
> + case "$1" in
> + # kmod calls modprobe and upon of a module not found
> + # modprobe returns just 1... However in the kernel we
> + # *sometimes* see 256...
> + MODULE_NOT_FOUND)
> + echo 256;;
> + SUCCESS)
> + echo 0;;
> + -EPERM)
> + echo -1;;
> + -ENOENT)
> + echo -2;;
> + -EINVAL)
> + echo -22;;
> + -ERR_ANY)
> + echo -123456;;
> + *)
> + echo invalid;;
> + esac
> +}
> +
> +errno_val_to_name()
> + case "$1" in
> + 256)
> + echo MODULE_NOT_FOUND;;
> + 0)
> + echo SUCCESS;;
> + -1)
> + echo -EPERM;;
> + -2)
> + echo -ENOENT;;
> + -22)
> + echo -EINVAL;;
> + -123456)
> + echo -ERR_ANY;;
> + *)
> + echo invalid;;
> + esac
> +
> +config_set_test_case_driver()
> +{
> + if ! echo -n 1 >$DIR/config_test_case; then
> + echo "$0: Unable to set to test case to driver" >&2
> + exit 1
> + fi
> +}
> +
> +config_set_test_case_fs()
> +{
> + if ! echo -n 2 >$DIR/config_test_case; then
> + echo "$0: Unable to set to test case to fs" >&2
> + exit 1
> + fi
> +}
> +
> +config_num_threads()
> +{
> + if ! echo -n $1 >$DIR/config_num_threads; then
> + echo "$0: Unable to set to number of threads" >&2
> + exit 1
> + fi
> +}
> +
> +config_get_modprobe_limit()
> +{
> + if [[ -f ${MODPROBE_LIMIT_FILE} ]] ; then
> + MODPROBE_LIMIT=$(cat $MODPROBE_LIMIT_FILE)
> + fi
> + echo $MODPROBE_LIMIT
> +}
> +
> +config_num_thread_limit_extra()
> +{
> + MODPROBE_LIMIT=$(config_get_modprobe_limit)
> + let EXTRA_LIMIT=$MODPROBE_LIMIT+$1
> + config_num_threads $EXTRA_LIMIT
> +}
> +
> +# For special characters use printf directly,
> +# refer to kmod_test_0001
> +config_set_driver()
> +{
> + if ! echo -n $1 >$DIR/config_test_driver; then
> + echo "$0: Unable to set driver" >&2
> + exit 1
> + fi
> +}
> +
> +config_set_fs()
> +{
> + if ! echo -n $1 >$DIR/config_test_fs; then
> + echo "$0: Unable to set driver" >&2
> + exit 1
> + fi
> +}
> +
> +config_get_driver()
> +{
> + cat $DIR/config_test_driver
> +}
> +
> +config_get_test_result()
> +{
> + cat $DIR/test_result
> +}
> +
> +config_reset()
> +{
> + if ! echo -n "1" >"$DIR"/reset; then
> + echo "$0: reset shuld have worked" >&2
> + exit 1
> + fi
> +}
> +
> +config_show_config()
> +{
> + echo "----------------------------------------------------"
> + cat "$DIR"/config
> + echo "----------------------------------------------------"
> +}
> +
> +config_trigger()
> +{
> + if ! echo -n "1" >"$DIR"/trigger_config 2>/dev/null; then
> + echo "$1: FAIL - loading should have worked"
> + config_show_config
> + exit 1
> + fi
> + echo "$1: OK! - loading kmod test"
> +}
> +
> +config_trigger_want_fail()
> +{
> + if echo "1" > $DIR/trigger_config 2>/dev/null; then
> + echo "$1: FAIL - test case was expected to fail"
> + config_show_config
> + exit 1
> + fi
> + echo "$1: OK! - kmod test case failed as expected"
> +}
> +
> +config_expect_result()
> +{
> + RC=$(config_get_test_result)
> + RC_NAME=$(errno_val_to_name $RC)
> +
> + ERRNO_NAME=$2
> + ERRNO=$(errno_name_to_val $ERRNO_NAME)
> +
> + if [[ $ERRNO_NAME = "-ERR_ANY" ]]; then
> + if [[ $RC -ge 0 ]]; then
> + echo "$1: FAIL, test expects $ERRNO_NAME - got $RC_NAME ($RC)" >&2
> + config_show_config
> + exit 1
> + fi
> + elif [[ $RC != $ERRNO ]]; then
> + echo "$1: FAIL, test expects $ERRNO_NAME ($ERRNO) - got $RC_NAME ($RC)" >&2
> + config_show_config
> + exit 1
> + fi
> + echo "$1: OK! - Return value: $RC ($RC_NAME), expected $ERRNO_NAME"
> +}
> +
> +kmod_defaults_driver()
> +{
> + config_reset
> + modprobe -r $DEFAULT_KMOD_DRIVER
> + config_set_driver $DEFAULT_KMOD_DRIVER
> +}
> +
> +kmod_defaults_fs()
> +{
> + config_reset
> + modprobe -r $DEFAULT_KMOD_FS
> + config_set_fs $DEFAULT_KMOD_FS
> + config_set_test_case_fs
> +}
> +
> +kmod_test_0001_driver()
> +{
> + NAME='\000'
> +
> + kmod_defaults_driver
> + config_num_threads 1
> + printf '\000' >"$DIR"/config_test_driver
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
> +}
> +
> +kmod_test_0001_fs()
> +{
> + NAME='\000'
> +
> + kmod_defaults_fs
> + config_num_threads 1
> + printf '\000' >"$DIR"/config_test_fs
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +kmod_test_0001()
> +{
> + kmod_test_0001_driver
> + kmod_test_0001_fs
> +}
> +
> +kmod_test_0002_driver()
> +{
> + NAME="nope-$DEFAULT_KMOD_DRIVER"
> +
> + kmod_defaults_driver
> + config_set_driver $NAME
> + config_num_threads 1
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
> +}
> +
> +kmod_test_0002_fs()
> +{
> + NAME="nope-$DEFAULT_KMOD_FS"
> +
> + kmod_defaults_fs
> + config_set_fs $NAME
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +kmod_test_0002()
> +{
> + kmod_test_0002_driver
> + kmod_test_0002_fs
> +}
> +
> +kmod_test_0003()
> +{
> + kmod_defaults_fs
> + config_num_threads 1
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0004()
> +{
> + kmod_defaults_fs
> + config_num_threads 2
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0005()
> +{
> + kmod_defaults_driver
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0006()
> +{
> + kmod_defaults_fs
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0007()
> +{
> + kmod_test_0005
> + kmod_test_0006
> +}
> +
> +kmod_test_0008()
> +{
> + kmod_defaults_driver
> + MODPROBE_LIMIT=$(config_get_modprobe_limit)
> + let EXTRA=$MODPROBE_LIMIT/2
> + config_num_thread_limit_extra $EXTRA
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +kmod_test_0009()
> +{
> + kmod_defaults_fs
> + #MODPROBE_LIMIT=$(config_get_modprobe_limit)
> + #let EXTRA=$MODPROBE_LIMIT/3
> + config_num_thread_limit_extra 5
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +trap "test_finish" EXIT
> +test_reqs
> +allow_user_defaults
> +load_req_mod
> +
> +usage()
> +{
> + echo "Usage: $0 [ -t <4-number-digit> ]"
> + echo "Valid tests: 0001-0011"
> + echo
> + echo "0001 - Simple test - 1 thread for empty string"
> + echo "0002 - Simple test - 1 thread for modules/filesystems that do not exist"
> + echo "0003 - Simple test - 1 thread for get_fs_type() only"
> + echo "0004 - Simple test - 2 threads for get_fs_type() only"
> + echo "0005 - multithreaded tests with default setup - request_module() only"
> + echo "0006 - multithreaded tests with default setup - get_fs_type() only"
> + echo "0007 - multithreaded tests with default setup test request_module() and get_fs_type()"
> + echo "0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()"
> + echo "0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()"
> + exit 1
> +}
> +
> +# You can ask for a specific test:
> +if [[ $# > 0 ]] ; then
> + if [[ $1 != "-t" ]]; then
> + usage
> + fi
> +
> + re='^[0-9]+$'
> + if ! [[ $2 =~ $re ]]; then
> + usage
> + fi
> +
> + RUN_TEST=kmod_test_$2
> + $RUN_TEST
> + exit 0
> +fi
> +
> +# Once tese are enabled please leave them as-is. Write your own test,
> +# we have tons of space.
> +kmod_test_0001
> +kmod_test_0002
> +kmod_test_0003
> +kmod_test_0004
> +kmod_test_0005
> +kmod_test_0006
> +kmod_test_0007
> +
> +#kmod_test_0008
> +#kmod_test_0009
While it's documented in the commit log, I think a short note for each
disabled test should be added here too.
> +
> +exit 0
> --
> 2.10.1
>
-Kees
--
Kees Cook
Nexus Security
On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> We currently statically limit the number of modprobe threads which
> we allow to run concurrently to 50. As per Keith Owens, this was a
> completely arbitrary value, and it was set in the 2.3.38 days [0]
> over 16 years ago in year 2000.
>
> Although we haven't yet hit our lower limits, experimentation [1]
> shows that when and if we hit this limit in the worst case, will be
> fatal -- consider get_fs_type() failures upon mount on a system which
> has many partitions, some of which might even be with the same
> filesystem. Its best to be prudent and increase and set this
> value to something more sensible which ensures we're far from hitting
> the limit and also allows default build/user run time override.
>
> The worst case is fatal given that once a module fails to load there
> is a period of time during which subsequent request for the same module
> will fail, so in the case of partitions its not just one request that
> could fail, but whole series of partitions. This later issue of a
> module request failure domino effect can be addressed later, but
> increasing the limit to something more meaninful should at least give us
> enough cushion to avoid this for a while.
>
> Set this value up with a bit more meaninful modern limits:
>
> Bump this up to 64 max for small systems (CONFIG_BASE_SMALL)
> Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
>
> Also allow the default max limit to be further fine tuned at compile
> time and at initialization at run time at boot up using the kernel
> parameter: max_modprobes.
>
> [0] https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ab1c4ec7410f6ec64e1511d1a7d850fc99c09b44
> [1] https://github.com/mcgrof/test_request_module
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 7 ++++
> include/linux/kmod.h | 3 +-
> init/Kconfig | 23 +++++++++++++
> init/main.c | 1 +
> kernel/kmod.c | 43 ++++++++++++++++---------
> 5 files changed, 61 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index be2d6d0a03a4..92bcccc65ea4 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1700,6 +1700,13 @@
>
> keepinitrd [HW,ARM]
>
> + kmod.max_modprobes [KNL]
> + This lets you set the max allowed of concurrent
> + modprobes threads possible on a system overriding the
> + default heuristic of:
> +
> + min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT)
> +
> kernelcore= [KNL,X86,IA-64,PPC]
> Format: nn[KMGTPE] | "mirror"
> This parameter
> diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> index fcfd2bf14d3f..15783cd7f056 100644
> --- a/include/linux/kmod.h
> +++ b/include/linux/kmod.h
> @@ -38,13 +38,14 @@ int __request_module(bool wait, const char *name, ...);
> #define request_module_nowait(mod...) __request_module(false, mod)
> #define try_then_request_module(x, mod...) \
> ((x) ?: (__request_module(true, mod), (x)))
> +void init_kmod_umh(void);
> #else
> static inline int request_module(const char *name, ...) { return -ENOSYS; }
> static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
> +static inline void init_kmod_umh(void) { }
> #define try_then_request_module(x, mod...) (x)
> #endif
>
> -
> struct cred;
> struct file;
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 271692a352f1..da2c25746937 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
>
> If unsure, or if you need to build out-of-tree modules, say N.
>
> +config MAX_KMOD_CONCURRENT
> + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> + range 0 14
> + default 6 if !BASE_SMALL
> + default 7 if BASE_SMALL
> + help
> + The kernel restricts the number of possible concurrent calls to
> + request_module() to help avoid a recursive loop possible with
> + modules. The default maximum number of concurrent threads allowed
> + to run request_module() will be:
> +
> + max_modprobes = min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT);
> +
> + The value set in CONFIG_MAX_KMOD_CONCURRENT represents then the power
> + of 2 value used at boot time for the above computation. You can
> + override the default built value using the kernel parameter:
> +
> + kmod.max_modprobes=4096
> +
> + We set this to default to 64 (2^6) concurrent modprobe threads for
> + small systems, for larger systems this defaults to 128 (2^7)
> + concurrent modprobe threads.
> +
> endif # MODULES
>
> config MODULES_TREE_LOOKUP
> diff --git a/init/main.c b/init/main.c
> index 8161208d4ece..1fa441aa32c6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -638,6 +638,7 @@ asmlinkage __visible void __init start_kernel(void)
> thread_stack_cache_init();
> cred_init();
> fork_init();
> + init_kmod_umh();
> proc_caches_init();
> buffer_init();
> key_init();
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 0277d1216f80..cb6f7ca7b8a5 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -44,6 +44,9 @@
> #include <trace/events/module.h>
>
> extern int max_threads;
> +unsigned int max_modprobes;
> +module_param(max_modprobes, uint, 0644);
> +MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
>
> #define CAP_BSET (void *)1
> #define CAP_PI (void *)2
> @@ -125,10 +128,8 @@ int __request_module(bool wait, const char *fmt, ...)
> {
> va_list args;
> char module_name[MODULE_NAME_LEN];
> - unsigned int max_modprobes;
> int ret;
> static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> -#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
> static int kmod_loop_msg;
>
> /*
> @@ -152,19 +153,6 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
> - /* If modprobe needs a service that is in a module, we get a recursive
> - * loop. Limit the number of running kmod threads to max_threads/2 or
> - * MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> - * would be to run the parents of this process, counting how many times
> - * kmod was invoked. That would mean accessing the internals of the
> - * process tables to get the command line, proc_pid_cmdline is static
> - * and it is not worth changing the proc code just to handle this case.
> - * KAO.
> - *
> - * "trace the ppid" is simple, but will fail if someone's
> - * parent exits. I think this is as good as it gets. --RR
> - */
> - max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
> atomic_inc(&kmod_concurrent);
> if (atomic_read(&kmod_concurrent) > max_modprobes) {
> /* We may be blaming an innocent here, but unlikely */
> @@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
> +
> +/*
> + * If modprobe needs a service that is in a module, we get a recursive
> + * loop. Limit the number of running kmod threads to max_threads/2 or
> + * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> + * would be to run the parents of this process, counting how many times
> + * kmod was invoked. That would mean accessing the internals of the
> + * process tables to get the command line, proc_pid_cmdline is static
> + * and it is not worth changing the proc code just to handle this case.
> + *
> + * "trace the ppid" is simple, but will fail if someone's
> + * parent exits. I think this is as good as it gets.
> + *
> + * You can override with with a kernel parameter, for instance to allow
> + * 4096 concurrent modprobe instances:
> + *
> + * kmod.max_modprobes=4096
> + */
> +void __init init_kmod_umh(void)
What does umh mean?
> +{
> + if (!max_modprobes)
> + max_modprobes = min(max_threads/2,
> + 2 << CONFIG_MAX_KMOD_CONCURRENT);
> +}
> +
> #endif /* CONFIG_MODULES */
>
> static void call_usermodehelper_freeinfo(struct subprocess_info *info)
> --
> 2.10.1
>
--
Kees Cook
Nexus Security
On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> kmod_concurrent is used as an atomic counter for enabling
> the allowed limit of modprobe calls, provide wrappers for it
> to enable this to be expanded on more easily. This will be done
> later.
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> kernel/kmod.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index cb6f7ca7b8a5..049d7eabda38 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -44,6 +44,9 @@
> #include <trace/events/module.h>
>
> extern int max_threads;
> +
> +static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> +
> unsigned int max_modprobes;
> module_param(max_modprobes, uint, 0644);
> MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
> @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> return -ENOMEM;
> }
>
> +static int kmod_umh_threads_get(void)
> +{
> + atomic_inc(&kmod_concurrent);
> + if (atomic_read(&kmod_concurrent) < max_modprobes)
> + return 0;
> + atomic_dec(&kmod_concurrent);
> + return -ENOMEM;
> +}
> +
> +static void kmod_umh_threads_put(void)
> +{
> + atomic_dec(&kmod_concurrent);
> +}
Can you use a kref here instead? We're trying to kill raw use of
atomic_t for reference counting...
> +
> /**
> * __request_module - try to load a kernel module
> * @wait: wait (or not) for the operation to complete
> @@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
> va_list args;
> char module_name[MODULE_NAME_LEN];
> int ret;
> - static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> static int kmod_loop_msg;
>
> /*
> @@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
> - atomic_inc(&kmod_concurrent);
> - if (atomic_read(&kmod_concurrent) > max_modprobes) {
> + ret = kmod_umh_threads_get();
> + if (ret) {
> /* We may be blaming an innocent here, but unlikely */
> if (kmod_loop_msg < 5) {
> printk(KERN_ERR
> @@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
> module_name);
> kmod_loop_msg++;
> }
> - atomic_dec(&kmod_concurrent);
> - return -ENOMEM;
> + return ret;
> }
>
> trace_module_request(module_name, wait, _RET_IP_);
>
> ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
>
> - atomic_dec(&kmod_concurrent);
> + kmod_umh_threads_put();
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
> --
> 2.10.1
>
--
Kees Cook
Nexus Security
On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
>
> o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
> o mod_sysfs_setup() fails
> o we're a live patch module and copy_module_elf() fails
>
> Chances of running into this issue is really low.
>
> kmemleak splat:
>
> unreferenced object 0xffff9f2c4ada1b00 (size 32):
> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
> hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
Acked-by: Kees Cook <[email protected]>
Is this worth sending through -stable too?
-Kees
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f7482db0f843..e420ed67e533 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3722,6 +3722,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> mod_sysfs_teardown(mod);
> coming_cleanup:
> mod->state = MODULE_STATE_GOING;
> + destroy_params(mod->kp, mod->num_kp);
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> --
> 2.10.1
>
--
Kees Cook
Nexus Security
On Thu, Dec 08, 2016 at 12:28:07PM -0800, Kees Cook wrote:
> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 0277d1216f80..cb6f7ca7b8a5 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -44,6 +44,9 @@
> > @@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
> > return ret;
> > }
> > EXPORT_SYMBOL(__request_module);
> > +
> > +/*
> > + * If modprobe needs a service that is in a module, we get a recursive
> > + * loop. Limit the number of running kmod threads to max_threads/2 or
> > + * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> > + * would be to run the parents of this process, counting how many times
> > + * kmod was invoked. That would mean accessing the internals of the
> > + * process tables to get the command line, proc_pid_cmdline is static
> > + * and it is not worth changing the proc code just to handle this case.
> > + *
> > + * "trace the ppid" is simple, but will fail if someone's
> > + * parent exits. I think this is as good as it gets.
> > + *
> > + * You can override with with a kernel parameter, for instance to allow
> > + * 4096 concurrent modprobe instances:
> > + *
> > + * kmod.max_modprobes=4096
> > + */
> > +void __init init_kmod_umh(void)
>
> What does umh mean?
umh is user mode helper. kmod.c actually implements the kernel's umh code.
A subsequent series I will want to move all that to umh.c and keep module
loading separate in kmod.c But that's for later as a cleanup.
BTW any chance I can have you trim replies to file name and hunk for changes
you reply to ? As an example I did that here :)
Luis
On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> > kmod_concurrent is used as an atomic counter for enabling
> > the allowed limit of modprobe calls, provide wrappers for it
> > to enable this to be expanded on more easily. This will be done
> > later.
> >
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
> > ---
> > kernel/kmod.c | 27 +++++++++++++++++++++------
> > 1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index cb6f7ca7b8a5..049d7eabda38 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -44,6 +44,9 @@
> > #include <trace/events/module.h>
> >
> > extern int max_threads;
> > +
> > +static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> > +
> > unsigned int max_modprobes;
> > module_param(max_modprobes, uint, 0644);
> > MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
> > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > return -ENOMEM;
> > }
> >
> > +static int kmod_umh_threads_get(void)
> > +{
> > + atomic_inc(&kmod_concurrent);
> > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > + return 0;
> > + atomic_dec(&kmod_concurrent);
> > + return -ENOMEM;
> > +}
> > +
> > +static void kmod_umh_threads_put(void)
> > +{
> > + atomic_dec(&kmod_concurrent);
> > +}
>
> Can you use a kref here instead? We're trying to kill raw use of
> atomic_t for reference counting...
That's a much broader functional change than I was looking for, but I am up for
it. Can you describe the benefit of using kref you expect or why this is an
ongoing crusade? Since its a larger functional change how about doing this
change later, and we can test impact with the tress test driver. In theory if
there are benefits can't we add a test case to prove the gains?
Luis
On Thu, Dec 8, 2016 at 2:30 PM, Kees Cook <[email protected]> wrote:
> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
>> While looking for early possible module loading failures I was
>> able to reproduce a memory leak possible with kmemleak. There
>> are a few rare ways to trigger a failure:
>>
>> o we've run into a failure while processing kernel parameters
>> (parse_args() returns an error)
>> o mod_sysfs_setup() fails
>> o we're a live patch module and copy_module_elf() fails
>>
>> Chances of running into this issue is really low.
>>
>> kmemleak splat:
>>
>> unreferenced object 0xffff9f2c4ada1b00 (size 32):
>> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
>> hex dump (first 32 bytes):
>> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> backtrace:
>> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
>> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
>> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
>> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
>> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
>> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
>> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
>> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
>> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
>> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
>> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
>> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>
> Acked-by: Kees Cook <[email protected]>
>
> Is this worth sending through -stable too?
Yes, for some reason git-send e-mail complained to me about
[email protected] not being a valid local address, so I had to remove
it, but indeed. I'll try to fix this e-mail issue later and add your
tag.
Luis
On Thu, Dec 8, 2016 at 1:10 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Thu, Dec 8, 2016 at 2:30 PM, Kees Cook <[email protected]> wrote:
>> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
>>> While looking for early possible module loading failures I was
>>> able to reproduce a memory leak possible with kmemleak. There
>>> are a few rare ways to trigger a failure:
>>>
>>> o we've run into a failure while processing kernel parameters
>>> (parse_args() returns an error)
>>> o mod_sysfs_setup() fails
>>> o we're a live patch module and copy_module_elf() fails
>>>
>>> Chances of running into this issue is really low.
>>>
>>> kmemleak splat:
>>>
>>> unreferenced object 0xffff9f2c4ada1b00 (size 32):
>>> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
>>> hex dump (first 32 bytes):
>>> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>> backtrace:
>>> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
>>> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
>>> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
>>> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
>>> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
>>> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
>>> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
>>> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
>>> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
>>> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
>>> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
>>> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>>
>> Acked-by: Kees Cook <[email protected]>
>>
>> Is this worth sending through -stable too?
>
> Yes, for some reason git-send e-mail complained to me about
> [email protected] not being a valid local address, so I had to remove
> it, but indeed. I'll try to fix this e-mail issue later and add your
> tag.
Yup, you want [email protected]. :)
-Kees
--
Kees Cook
Nexus Security
On Thu, 8 Dec 2016, Luis R. Rodriguez wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
>
> o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
> o mod_sysfs_setup() fails
> o we're a live patch module and copy_module_elf() fails
>
> Chances of running into this issue is really low.
>
> kmemleak splat:
>
> unreferenced object 0xffff9f2c4ada1b00 (size 32):
> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
> hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
What about
Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs")
?
Miroslav
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f7482db0f843..e420ed67e533 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3722,6 +3722,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> mod_sysfs_teardown(mod);
> coming_cleanup:
> mod->state = MODULE_STATE_GOING;
> + destroy_params(mod->kp, mod->num_kp);
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> --
> 2.10.1
>
On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
>
> Although this does get us in the business of keeping alias maps in
> kernel, the the work to support and maintain this is trivial.
You've implemented a special treatment for request_module("fs-$X")in
finished_kmod_load(), but there are many more aliases defined (and
used) in the kernel. Do you plan to implement special code for "char-
major-$X", "crypto-$X", "binfmt-$X" etc. later?
Regards
Martin
--
Dr. Martin Wilck <[email protected]>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
On Fri, Dec 9, 2016 at 12:03 PM, Martin Wilck <[email protected]> wrote:
> On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
>>
>> Although this does get us in the business of keeping alias maps in
>> kernel, the the work to support and maintain this is trivial.
>
> You've implemented a special treatment for request_module("fs-$X")in
> finished_kmod_load(), but there are many more aliases defined (and
> used) in the kernel. Do you plan to implement special code for "char-
> major-$X", "crypto-$X", "binfmt-$X" etc. later?
Yeah, no, that is just complete garbage.
Those module aliases already exist in the module info section. We just
don't parse the alias tags in the kernel.
So the real fix is to make find_module_all() just do that.
Doing random ad-hoc "let's prefix with 'fs-xyz'" games are completely
unacceptable. That's just pure shit. Stop this idiocy.
Linus
On Thu, Dec 08, 2016 at 12:24:35PM -0800, Kees Cook wrote:
> On Thu, Dec 8, 2016 at 10:47 AM, Luis R. Rodriguez <[email protected]> wrote:
> > The kmod.sh script uses the above constructs to build differnt test cases.
>
> Typo: different
Fixed.
> > 3) finit_module() consumes quite a bit of memory.
>
> Is this due to reading the module into kernel memory or something else?
Very likely yes, but to be honest I have not had chance to instrument too
carefully, its TODO work :)
> > 4) Filesystems typically also have more dependent modules than other
> > modules, its important to note though that even though a get_fs_type() call
> > does not incur additional kmod_concurrent bumps, since userspace
> > loads dependencies it finds it needs via finit_module_fd(), it *will*
> > take much more memory to load a module with a lot of dependencies.
> >
> > Because of 3) and 4) we will easily run into out of memory failures
> > with certain tests. For instance test 0006 fails on qemu with 1024 MiB
> > of RAM. It panics a box after reaping all userspace processes and still
> > not having enough memory to reap.
>
> Are the buffers not released until after all the dependent modules are
> loaded? I thought it would load one by one?
kmod.c allows up to kmod_concurrent concurrent requests out to userspace,
how it handles this is up to userspace, but note that prior to the knobs
exposed in this patch set userspace neither knew what kmod_concurrent
was nor how many concurrent threads are active at any point in time.
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
>
> This is a great selftest, thanks for working on it!
My pleasure.
> Notes below...
>
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 7446097f72bd..6cad548e0682 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1994,6 +1994,31 @@ config BUG_ON_DATA_CORRUPTION
> >
> > If unsure, say N.
> >
> > +config TEST_KMOD
> > + tristate "kmod stress tester"
> > + default n
> > + depends on m
> > + select TEST_LKM
> > + select XFS_FS
> > + select TUN
> > + select BTRFS_FS
>
> Since the desired FS can be changed at runtime, maybe these selects
> aren't needed?
Well yes and no, yes because its the defaults built-in. No, because as you note
we can alter the defaults in userspace. Without the alternatives being set the
driver will not really work at all though. Here is an example where Arnd's
kconfig "suggests" for kconfig could come in handy. Until we have that I think
I'd prefer to keep it this way.
> > diff --git a/lib/test_kmod.c b/lib/test_kmod.c
> > new file mode 100644
> > index 000000000000..63fded83b9b6
> > --- /dev/null
> > +++ b/lib/test_kmod.c
> > @@ -0,0 +1,1248 @@
> > +static bool force_init_test = false;
> > +module_param(force_init_test, bool_enable_only, 0644);
> > +MODULE_PARM_DESC(force_init_test,
> > + "Force kicking a test immediatley after driver loads");
>
> Typo: immediately
Fixed.
> > +static ssize_t config_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> > + struct test_config *config = &test_dev->config;
> > + int len = 0;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > +
> > + len += sprintf(buf, "Custom trigger configuration for: %s\n",
> > + dev_name(dev));
> > +
> > + len += sprintf(buf+len, "Number of threads:\t%u\n",
> > + config->num_threads);
> > +
> > + len += sprintf(buf+len, "Test_case:\t%s (%u)\n",
> > + test_case_str(config->test_case),
> > + config->test_case);
> > +
> > + if (config->test_driver)
> > + len += sprintf(buf+len, "driver:\t%s\n",
> > + config->test_driver);
> > + else
> > + len += sprintf(buf+len, "driver:\tEMTPY\n");
> > +
> > + if (config->test_fs)
> > + len += sprintf(buf+len, "fs:\t%s\n",
> > + config->test_fs);
> > + else
> > + len += sprintf(buf+len, "fs:\tEMTPY\n");
>
> These should all use snprintf...
Fixed. If the caller is sysfs_kf_seq_show() then max is PAGE_SIZE, will
use that as the limit to start with.
> > +static ssize_t config_test_driver_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + strcpy(buf, config->test_driver);
> > + strcat(buf, "\n");
>
> IIUC, the show/store API uses a max size of PAGE_SIZE. If that's
> correct, it's possible that this show routine could write past the end
> of buf, due to the end newline, etc. Best to use snprintf like you do
> below for the other shows.
Sure.
> > +static ssize_t config_test_fs_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + strcpy(buf, config->test_fs);
> > + strcat(buf, "\n");
> > + mutex_unlock(&test_dev->config_mutex);
>
> Same here... (which, btw, could likely use to be a helper function,
> the show and store functions here are identical except for test_driver
> vs test_fs).
Sure, I'm starting to think a lot of test boiler plate for setup and show of
config stuff could be shared. We can consider this more once we have a few more
test drivers like this. I have 3 total now in the pipeline.
> > +
> > + return strlen(buf) + 1;
> > +}
> > +static DEVICE_ATTR(config_test_fs, 0644, config_test_fs_show,
> > + config_test_fs_store);
> > +
> > +static int trigger_config_run_driver(struct kmod_test_device *test_dev,
> > + const char *test_driver)
> > +{
> > + int copied;
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > +
> > + config->test_case = TEST_KMOD_DRIVER;
> > +
> > + kfree_const(config->test_driver);
> > + config->test_driver = NULL;
> > +
> > + copied = config_copy_test_driver_name(config, test_driver,
> > + strlen(test_driver));
> > + mutex_unlock(&test_dev->config_mutex);
> > +
> > + if (copied != strlen(test_driver)) {
>
> Can't these copied tests just check < 0? (i.e. avoid the repeated
> strlen which can be fragile.)
Sure, it can be:
if (copied <= 0 || copied != strlen(test_driver)) {
That way its both a negative check and also that something
non-empty was passed.
> > + test_dev->test_is_oom = true;
> > + return -EINVAL;
And come to think of it, these should return -ENOMEM;
> > + }
> > +
> > + test_dev->test_is_oom = false;
> > +
> > + return trigger_config_run(test_dev);
> > +}
> > +
> > +static int trigger_config_run_fs(struct kmod_test_device *test_dev,
> > + const char *fs_type)
> > +{
> > + int copied;
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + config->test_case = TEST_KMOD_FS_TYPE;
> > +
> > + kfree_const(config->test_fs);
> > + config->test_driver = NULL;
> > +
> > + copied = config_copy_test_fs(config, fs_type, strlen(fs_type));
> > + mutex_unlock(&test_dev->config_mutex);
> > +
> > + if (copied != strlen(fs_type)) {
> > + test_dev->test_is_oom = true;
> > + return -EINVAL;
> > + }
> > +
> > + test_dev->test_is_oom = false;
> > +
> > + return trigger_config_run(test_dev);
> > +}
>
> These two functions are almost identical too. Only test_case and the
> copy function change...
They are now shared.
> > +static void free_test_dev_info(struct kmod_test_device *test_dev)
> > +{
> > + if (test_dev->info) {
> > + vfree(test_dev->info);
> > + test_dev->info = NULL;
> > + }
> > +}
>
> vfree() already checks for NULL, you can drop the if.
Fixed.
> > +static int test_dev_config_update_uint_range(struct kmod_test_device *test_dev,
> > + const char *buf, size_t size,
> > + unsigned int *config,
> > + unsigned int min,
> > + unsigned int max)
> > +{
> > + char *end;
> > + long new = simple_strtol(buf, &end, 0);
> > + if (end == buf || new < min || new > max || new > UINT_MAX)
> > + return -EINVAL;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + *(unsigned int *)config = new;
>
> config is already an unsigned int *, why cast?
Fixed.
> > +static int test_dev_config_update_int(struct kmod_test_device *test_dev,
> > + const char *buf, size_t size,
> > + int *config)
> > +{
> > + char *end;
> > + long new = simple_strtol(buf, &end, 0);
> > + if (end == buf || new > INT_MAX || new < INT_MIN)
> > + return -EINVAL;
> > + mutex_lock(&test_dev->config_mutex);
> > + *(int *)config = new;
>
> config is already an int *, why cast?
Fixed.
> > +/*
> > + * XXX: this could perhaps be made generic already too, but a hunt
> > + * for actual users would be needed first. It could be generic
> > + * if other test drivers end up using a similar mechanism.
> > + */
> > +const char *test_dev_get_name(const char *base, int idx, gfp_t gfp)
> > +{
> > + const char *name_const;
> > + char *name;
> > +
> > + if (!base)
> > + return NULL;
> > + if (strlen(base) > 30)
> > + return NULL;
>
> why?
It was an arbitrary limit, will use PAGE_SIZE. But I'll just remove the
entire routine (see below).
>
> > + name = kzalloc(1024, gfp);
> > + if (!name)
> > + return NULL;
> > +
> > + strncat(name, base, strlen(base));
> > + sprintf(name+(strlen(base)), "%d", idx);
> > + name_const = kstrdup_const(name, gfp);
> > +
> > + kfree(name);
> > +
> > + return name_const;
> > +}
>
> What is going on here? Why not just:
> return kasprintf(gfp, "%s%d", base, idx);
>
> For all of that code? And kstrdup_const is pointless here since it'll
> always just do the dup (as the kmalloc source isn't in rodata).
Heh, yeah, true, nuked.
> > diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
> > new file mode 100644
> > index 000000000000..259f4fd6b5e2
> > --- /dev/null
> > +++ b/tools/testing/selftests/kmod/config
> > @@ -0,0 +1,7 @@
> > +CONFIG_TEST_KMOD=m
> > +CONFIG_TEST_LKM=m
> > +CONFIG_XFS_FS=m
> > +
> > +# For the module parameter force_init_test is used
> > +CONFIG_TUN=m
> > +CONFIG_BTRFS_FS=m
> > diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
> > new file mode 100755
> > index 000000000000..9ea1864d8bae
> > --- /dev/null
> > +++ b/tools/testing/selftests/kmod/kmod.sh
> > @@ -0,0 +1,449 @@
> > +#!/bin/bash
> > +#
<-- snip -->
> > +# You'll want at least 4096 GiB of RAM to expect to run these tests
>
> 4TiB of RAM? I assume this was meant to be 4 GiB not 4096?
Whoops, yeah sorry 4 GiB only.
> > +# without running out of memory on them. For other requirements refer
> > +# to test_reqs()
> > +
> > +set -e
> > +
> > +TEST_DRIVER="test_kmod"
> > +
> > +function allow_user_defaults()
> > +{
> > + if [ -z $DEFAULT_KMOD_DRIVER ]; then
> > + DEFAULT_KMOD_DRIVER="test_module"
> > + fi
> > +
> > + if [ -z $DEFAULT_KMOD_FS ]; then
> > + DEFAULT_KMOD_FS="xfs"
> > + fi
> > +
> > + if [ -z $PROC_DIR ]; then
> > + PROC_DIR="/proc/sys/kernel/"
> > + fi
> > +
> > + if [ -z $MODPROBE_LIMIT ]; then
> > + MODPROBE_LIMIT=50
> > + fi
> > +
> > + if [ -z $DIR ]; then
> > + DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0/"
> > + fi
> > +
> > + MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
> > +}
> > +
> > +test_reqs()
> > +{
> > + if ! which modprobe 2> /dev/null > /dev/null; then
> > + echo "$0: You need modprobe installed"
>
> While not a huge deal, I prefer that error messages end up on stderr,
> so adding >&2 to all the failure echos (or providing an err function)
> would be nice. (This happens in later places...)
Addressed.
> > +function load_req_mod()
> > +{
> > + if [ ! -d $DIR ]; then
> > + # Alanis: "Oh isn't it ironic?"
> > + modprobe $TEST_DRIVER
> > + if [ ! -d $DIR ]; then
> > + echo "$0: $DIR not present"
> > + echo "You must have the following enabled in your kernel:"
> > + cat $PWD/config
>
> I like this (minimum config in the test directory). Are other tests
> doing this too?
mcgrof@ergon ~/linux-next (git::(no branch, rebasing 20161213-kmod-test-driver))$ find tools/testing/selftests/ -name config
tools/testing/selftests/static_keys/config
tools/testing/selftests/cpu-hotplug/config
tools/testing/selftests/ipc/config
tools/testing/selftests/mount/config
tools/testing/selftests/zram/config
tools/testing/selftests/seccomp/config
tools/testing/selftests/memory-hotplug/config
tools/testing/selftests/vm/config
tools/testing/selftests/ftrace/config
tools/testing/selftests/pstore/config
tools/testing/selftests/firmware/config
tools/testing/selftests/net/config
tools/testing/selftests/bpf/config
tools/testing/selftests/user/config
tools/testing/selftests/kmod/config
Seems like a hipster trend.
> > +# Once tese are enabled please leave them as-is. Write your own test,
> > +# we have tons of space.
> > +kmod_test_0001
> > +kmod_test_0002
> > +kmod_test_0003
> > +kmod_test_0004
> > +kmod_test_0005
> > +kmod_test_0006
> > +kmod_test_0007
> > +
> > +#kmod_test_0008
> > +#kmod_test_0009
>
> While it's documented in the commit log, I think a short note for each
> disabled test should be added here too.
Will do, thanks so much for the review!
Luis
On Thu 2016-12-08 11:48:14, Luis R. Rodriguez wrote:
> We currently statically limit the number of modprobe threads which
> we allow to run concurrently to 50. As per Keith Owens, this was a
> completely arbitrary value, and it was set in the 2.3.38 days [0]
> over 16 years ago in year 2000.
>
> Although we haven't yet hit our lower limits, experimentation [1]
> shows that when and if we hit this limit in the worst case, will be
> fatal -- consider get_fs_type() failures upon mount on a system which
> has many partitions, some of which might even be with the same
> filesystem. Its best to be prudent and increase and set this
> value to something more sensible which ensures we're far from hitting
> the limit and also allows default build/user run time override.
>
> The worst case is fatal given that once a module fails to load there
> is a period of time during which subsequent request for the same module
> will fail, so in the case of partitions its not just one request that
> could fail, but whole series of partitions. This later issue of a
> module request failure domino effect can be addressed later, but
> increasing the limit to something more meaninful should at least give us
> enough cushion to avoid this for a while.
>
> Set this value up with a bit more meaninful modern limits:
>
> Bump this up to 64 max for small systems (CONFIG_BASE_SMALL)
> Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 271692a352f1..da2c25746937 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
>
> If unsure, or if you need to build out-of-tree modules, say N.
>
> +config MAX_KMOD_CONCURRENT
> + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> + range 0 14
Would not too small range break loading module dependencies?
I am not sure how it is implemented but it might require having
some more module loads in progress.
I would give 6 as minimum. Nobody has troubles with the current limit.
> + default 6 if !BASE_SMALL
> + default 7 if BASE_SMALL
Aren't the conditions inversed?
> diff --git a/init/main.c b/init/main.c
> index 8161208d4ece..1fa441aa32c6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -638,6 +638,7 @@ asmlinkage __visible void __init start_kernel(void)
> thread_stack_cache_init();
> cred_init();
> fork_init();
> + init_kmod_umh();
> proc_caches_init();
> buffer_init();
> key_init();
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 0277d1216f80..cb6f7ca7b8a5 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
> +
> +/*
> + * If modprobe needs a service that is in a module, we get a recursive
> + * loop. Limit the number of running kmod threads to max_threads/2 or
> + * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> + * would be to run the parents of this process, counting how many times
> + * kmod was invoked. That would mean accessing the internals of the
> + * process tables to get the command line, proc_pid_cmdline is static
> + * and it is not worth changing the proc code just to handle this case.
> + *
> + * "trace the ppid" is simple, but will fail if someone's
> + * parent exits. I think this is as good as it gets.
> + *
> + * You can override with with a kernel parameter, for instance to allow
> + * 4096 concurrent modprobe instances:
> + *
> + * kmod.max_modprobes=4096
> + */
> +void __init init_kmod_umh(void)
> +{
> + if (!max_modprobes)
> + max_modprobes = min(max_threads/2,
> + 2 << CONFIG_MAX_KMOD_CONCURRENT);
This should be
1 << CONFIG_MAX_KMOD_CONCURRENT);
1 << 1 = 2;
Note that this calculation is mentioned also some comments and
documentation.
Best Regards,
Petr
On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> Only decrement *iff* we're possitive. Warn if we've hit
> a situation where the counter is already 0 after we're done
> with a modprobe call, this would tell us we have an unaccounted
> counter access -- this in theory should not be possible as
> only one routine controls the counter, however preemption is
> one case that could trigger this situation. Avoid that situation
> by disabling preemptiong while we access the counter.
I am curious about it. How could enabled preemption cause that
the counter will get negative?
Unaccounted access would be possible if put() is called
without get() or if put() is called before get().
I do not see a way how the value might get negative when
the calls are paired and ordered.
Best Regards,
Petr
On Thu 2016-12-08 11:49:01, Luis R. Rodriguez wrote:
> Just use the simplified rate limit printk when the max modprobe
> limit is reached, while at it throw out a bone should the error
> be triggered.
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> kernel/kmod.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 09cf35a2075a..ef65f4c3578a 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -158,7 +158,6 @@ int __request_module(bool wait, const char *fmt, ...)
> va_list args;
> char module_name[MODULE_NAME_LEN];
> int ret;
> - static int kmod_loop_msg;
>
> /*
> * We don't allow synchronous module loading from async. Module
> @@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
>
> ret = kmod_umh_threads_get();
> if (ret) {
> - /* We may be blaming an innocent here, but unlikely */
> - if (kmod_loop_msg < 5) {
> - printk(KERN_ERR
> - "request_module: runaway loop modprobe %s\n",
> - module_name);
> - kmod_loop_msg++;
> - }
> + pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
> + max_modprobes, module_name);
I like this change. I would only be even more descriptive in which
limit is reached. Something like
pr_err_ratelimited("request_module: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
module_name, max_modprobes);
Either way, feel free to add:
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Wed, 2016-12-14 at 17:23 +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:49:01, Luis R. Rodriguez wrote:
> > Just use the simplified rate limit printk when the max modprobe
> > limit is reached, while at it throw out a bone should the error
> > be triggered.
[]
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
[]
> > @@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
> >
> > ret = kmod_umh_threads_get();
> > if (ret) {
> > - /* We may be blaming an innocent here, but unlikely */
> > - if (kmod_loop_msg < 5) {
> > - printk(KERN_ERR
> > - "request_module: runaway loop modprobe %s\n",
> > - module_name);
> > - kmod_loop_msg++;
> > - }
> > + pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
> > + max_modprobes, module_name);
>
> I like this change. I would only be even more descriptive in which
> limit is reached. Something like
>
> pr_err_ratelimited("request_module: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
> module_name, max_modprobes);
>
> Either way, feel free to add:
>
> Reviewed-by: Petr Mladek <[email protected]>
Seems sensible.
I suggest using "%s: ", __func__ instead of embedding
the function name.
On Wed, Dec 14, 2016 at 05:08:58PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> > Only decrement *iff* we're possitive. Warn if we've hit
> > a situation where the counter is already 0 after we're done
> > with a modprobe call, this would tell us we have an unaccounted
> > counter access -- this in theory should not be possible as
> > only one routine controls the counter, however preemption is
> > one case that could trigger this situation. Avoid that situation
> > by disabling preemptiong while we access the counter.
>
> I am curious about it. How could enabled preemption cause that
> the counter will get negative?
As the commit log describes today in theory this is not possible
was we have only have one routine controlling the counter. If we
were to expand this then such possibilities become more real.
> Unaccounted access would be possible if put() is called
> without get() or if put() is called before get().
Exactly, so buggy users of the get/put calls in future calls.
I can just drop the preemption disable / enable for now as it
should not be an issue now.
> I do not see a way how the value might get negative when
> the calls are paired and ordered.
Right, this just matches parity with module_put(), its perhaps
*preemptively* too cautious though so I could just drop the
preemption enable/disable for now as that would slow down
things a bit.
Luis
"Luis R. Rodriguez" <[email protected]> writes:
> kmod has an optimization in place whereby if a some kernel code
> uses request_module() on a module already loaded we never bother
> userspace as the module already is loaded. This is not true for
> get_fs_type() though as it uses aliases.
Well, the obvious thing to do here is block kmod if we're currently
loading the same module. Otherwise it has to do some weird spinning
thing in userspace anyway.
We already have module_wq for this, we just need a bit more code to
share the return value; and there's a weird corner case there where we
have "modprobe foo param=invalid" then "modprobe foo param=valid" and we
fail both with -EINVAL, but it's probably not worth fixing.
Cheers,
Rusty.
On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> > > kmod_concurrent is used as an atomic counter for enabling
> > > the allowed limit of modprobe calls, provide wrappers for it
> > > to enable this to be expanded on more easily. This will be done
> > > later.
> > >
> > > Signed-off-by: Luis R. Rodriguez <[email protected]>
> > > ---
> > > kernel/kmod.c | 27 +++++++++++++++++++++------
> > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > return -ENOMEM;
> > > }
> > >
> > > +static int kmod_umh_threads_get(void)
> > > +{
> > > + atomic_inc(&kmod_concurrent);
This approach might actually cause false failures. If we
are on the limit and more processes do this increment
in parallel, it makes the number bigger that it should be.
> > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > + return 0;
> > > + atomic_dec(&kmod_concurrent);
> > > + return -ENOMEM;
> > > +}
> > > +
> > > +static void kmod_umh_threads_put(void)
> > > +{
> > > + atomic_dec(&kmod_concurrent);
> > > +}
> >
> > Can you use a kref here instead? We're trying to kill raw use of
> > atomic_t for reference counting...
>
> That's a much broader functional change than I was looking for, but I am up for
> it. Can you describe the benefit of using kref you expect or why this is an
> ongoing crusade? Since its a larger functional change how about doing this
> change later, and we can test impact with the tress test driver. In theory if
> there are benefits can't we add a test case to prove the gains?
Kees probably refers to the kref improvements that Peter Zijlstra
is working on, see
https://lkml.kernel.org/r/[email protected]
The advantage is that the new refcount API handles over and
underflow.
Another advantage is that it increments/decrements the value
only when it is safe. It uses cmpxchg to make sure that
the checks are valid.
Best Regards,
Petr
On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> Only decrement *iff* we're possitive. Warn if we've hit
> a situation where the counter is already 0 after we're done
> with a modprobe call, this would tell us we have an unaccounted
> counter access -- this in theory should not be possible as
> only one routine controls the counter, however preemption is
> one case that could trigger this situation. Avoid that situation
> by disabling preemptiong while we access the counter.
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> kernel/kmod.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index ab38539f7e91..09cf35a2075a 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -113,16 +113,28 @@ static int call_modprobe(char *module_name, int wait)
>
> static int kmod_umh_threads_get(void)
> {
> + int ret = 0;
> +
> + preempt_disable();
> atomic_inc(&kmod_concurrent);
> if (atomic_read(&kmod_concurrent) < max_modprobes)
> - return 0;
> - atomic_dec(&kmod_concurrent);
> - return -EBUSY;
> + goto out;
I though more about it and the disabled preemtion might make
sense here. It makes sure that we are not rescheduled here
and that kmod_concurrent is not increased by mistake for too long.
Well, it still would make sense to increment the value
only when it is under the limit and set the incremented
value using cmpxchg to avoid races.
I mean to use similar trick that is used by refcount_inc(), see
https://lkml.kernel.org/r/[email protected]
> + atomic_dec_if_positive(&kmod_concurrent);
> + ret = -EBUSY;
> +out:
> + preempt_enable();
> + return 0;
> }
>
> static void kmod_umh_threads_put(void)
> {
> - atomic_dec(&kmod_concurrent);
> + int ret;
> +
> + preempt_disable();
> + ret = atomic_dec_if_positive(&kmod_concurrent);
> + WARN_ON(ret < 0);
> + preempt_enable();
The disabled preemption does not make much sense here.
We do not need to tie the atomic operation and the WARN
together so tightly.
Best Regards,
Petr
On Thu 2016-12-08 11:49:20, Luis R. Rodriguez wrote:
> This adds helpers for getting access to the kmod count and limit from
> userspace. While at it, this also lets userspace fine tune the kmod
> limit after boot, it uses the shiny new proc_douintvec_minmax().
>
> These knobs should help userspace more gracefully and deterministically
> handle module loading.
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> include/linux/kmod.h | 8 +++++
> kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> kernel/sysctl.c | 14 +++++++++
> 3 files changed, 103 insertions(+), 2 deletions(-)
I am not sure if it is worth it. As you say in the 3rd patch,
there was rather low limit for 16 years and nobody probably had
problems with it.
Anyway, it seems that such know should also get documented in
Documentation/sysctl/kernel.txt
Best Regards,
Petr
On Fri, Dec 09, 2016 at 12:56:21PM -0800, Linus Torvalds wrote:
> On Fri, Dec 9, 2016 at 12:03 PM, Martin Wilck <[email protected]> wrote:
> > On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
> >>
> >> Although this does get us in the business of keeping alias maps in
> >> kernel, the the work to support and maintain this is trivial.
> >
> > You've implemented a special treatment for request_module("fs-$X")in
> > finished_kmod_load(), but there are many more aliases defined (and
> > used) in the kernel. Do you plan to implement special code for "char-
> > major-$X", "crypto-$X", "binfmt-$X" etc. later?
>
> Yeah, no, that is just complete garbage.
>
> Those module aliases already exist in the module info section. We just
> don't parse the alias tags in the kernel.
>
> So the real fix is to make find_module_all() just do that.
Ah yes, that is much sexier, this is now done and it works nicely, thanks
for the suggestion.
> Doing random ad-hoc "let's prefix with 'fs-xyz'" games are completely
> unacceptable. That's just pure shit. Stop this idiocy.
Look at that fin DNA in action :)
Luis
On Thu 2016-12-08 11:48 -0800, Luis R. Rodriguez wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
>
> o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
> o mod_sysfs_setup() fails
> o we're a live patch module and copy_module_elf() fails
>
> Chances of running into this issue is really low.
>
> kmemleak splat:
>
> unreferenced object 0xffff9f2c4ada1b00 (size 32):
> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
> hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
Reviewed-by: Aaron Tomlin <[email protected]>
--
Aaron Tomlin
On Tue, Dec 13, 2016 at 10:10:41PM +0100, Luis R. Rodriguez wrote:
> On Thu, Dec 08, 2016 at 12:24:35PM -0800, Kees Cook wrote:
> > On Thu, Dec 8, 2016 at 10:47 AM, Luis R. Rodriguez <[email protected]> wrote:
> > > 3) finit_module() consumes quite a bit of memory.
> >
> > Is this due to reading the module into kernel memory or something else?
>
> Very likely yes, but to be honest I have not had chance to instrument too
> carefully, its TODO work :)
I've checked and the issue is since get_fs_type() does not check for
aliases we end up hammering tons of module requests, this in turn is
an analysis on load_module(). Within there layout_and_allocate()
uses first a local copy of the passed user data and mapping it into
a struct module, after a bit of sanity checks it finally allocates a
copy for us, so its struct module size * however many requests were
allowed to get in for load_module(). We could simply avoid an allocation
if the module is already present. I have this as another optimization
now but am running many other tests to compare performance.
> > > +# Once tese are enabled please leave them as-is. Write your own test,
> > > +# we have tons of space.
> > > +kmod_test_0001
> > > +kmod_test_0002
> > > +kmod_test_0003
> > > +kmod_test_0004
> > > +kmod_test_0005
> > > +kmod_test_0006
> > > +kmod_test_0007
> > > +
> > > +#kmod_test_0008
> > > +#kmod_test_0009
> >
> > While it's documented in the commit log, I think a short note for each
> > disabled test should be added here too.
>
> Will do, thanks so much for the review!
As I added test 0008's reason for why I think it fails I realized that the reason the test
can sometimes fail is very different than test 0009 which is for get_fs_type(). You see
get_fs_type() hammers kmod concurrent since we don't have an alias check and moprobe
calling fs-xfs for instance does not catch that the module is already loaded so it
delays the get_fs_type() call and so the __request_module() call, hogging up its
kmod concurrent increment.
For direct request_module() calls we don't have the alias issue, but since
we don't check if a module is loaded prior to calling userspace (I now have a fix
for this, reducing this latency does help) it means there are often times the
chances we will pour in tons of requests without them getting processed and
go over the concurrent limit.
I've added a clutch into __request_module() then so instead of just failing
we first check if we're at a threshold (say about 1/4 away from limit) and
if so we let a few threads breath, until they are done. This fixes *both*
test cases without much code changes, however as I've noted in other threads,
this is not the only issue to address.
Luis
On Thu, Dec 15, 2016 at 05:56:19PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:49:20, Luis R. Rodriguez wrote:
> > This adds helpers for getting access to the kmod count and limit from
> > userspace. While at it, this also lets userspace fine tune the kmod
> > limit after boot, it uses the shiny new proc_douintvec_minmax().
> >
> > These knobs should help userspace more gracefully and deterministically
> > handle module loading.
> >
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
> > ---
> > include/linux/kmod.h | 8 +++++
> > kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> > kernel/sysctl.c | 14 +++++++++
> > 3 files changed, 103 insertions(+), 2 deletions(-)
>
> I am not sure if it is worth it. As you say in the 3rd patch,
> there was rather low limit for 16 years and nobody probably had
> problems with it.
Note, *probably* - ie, this could have gone unreported for a while, and
to be frank how can we know for sure a pesky module just did not load due
to this? In the case of get_fs_type() issue this can be fatal for a partition
mount, not a good example to wait to look forward to before we take this
serious.
I added the sysctl value mostly for read purposes, the count is probably
useless for any accounting to be done in userspace due to delays this
reading and making this value useful in userspace can have, I can nuke
that. The kmod-limit however seems very useful so that userspace knows
how to properly thread *safely* modprobe calls more deterministically.
Adding write support to let one bump the limit was just an easy convenience
possible given the read support was being added, but its use should
really only be useful for testing purposes post bootup given that the
real value in the limit will be important at boot time prior to the sysctl
parsing. The real know tweak which should be used in case of issues is
the module parameter added earlier.
So I could drop the kmod-count, and just make the kmod-limit read-only.
Thoughts?
> Anyway, it seems that such know should also get documented in
> Documentation/sysctl/kernel.txt
Will do if we keep them, thanks.
Luis
On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
> > > > kmod_concurrent is used as an atomic counter for enabling
> > > > the allowed limit of modprobe calls, provide wrappers for it
> > > > to enable this to be expanded on more easily. This will be done
> > > > later.
> > > >
> > > > Signed-off-by: Luis R. Rodriguez <[email protected]>
> > > > ---
> > > > kernel/kmod.c | 27 +++++++++++++++++++++------
> > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > > --- a/kernel/kmod.c
> > > > +++ b/kernel/kmod.c
> > > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > > return -ENOMEM;
> > > > }
> > > >
> > > > +static int kmod_umh_threads_get(void)
> > > > +{
> > > > + atomic_inc(&kmod_concurrent);
>
> This approach might actually cause false failures. If we
> are on the limit and more processes do this increment
> in parallel, it makes the number bigger that it should be.
This approach is *exactly* what the existing code does :P
I just provided wrappers. I agree with the old approach though,
reason is it acts as a lock in for the bump. What seems rather
stupid though is to just reject with an error on limit without first
taking a breather. I've now added a little clutch so that we first
take some fresh air when close to the limit, this reduces the chances
of going fatal.
With a clutch in place we can still go over the limit, its just we'd
have a few threads waiting until previous calls clear out. If there
is enough calls waiting eventually we'll fail.
Note though that __request_module() can wait, but here is an option
to not wait so such a clutch can only wait if we're allowed to.
> > > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > > + return 0;
> > > > + atomic_dec(&kmod_concurrent);
> > > > + return -ENOMEM;
> > > > +}
> > > > +
> > > > +static void kmod_umh_threads_put(void)
> > > > +{
> > > > + atomic_dec(&kmod_concurrent);
> > > > +}
> > >
> > > Can you use a kref here instead? We're trying to kill raw use of
> > > atomic_t for reference counting...
> >
> > That's a much broader functional change than I was looking for, but I am up for
> > it. Can you describe the benefit of using kref you expect or why this is an
> > ongoing crusade? Since its a larger functional change how about doing this
> > change later, and we can test impact with the tress test driver. In theory if
> > there are benefits can't we add a test case to prove the gains?
>
> Kees probably refers to the kref improvements that Peter Zijlstra
> is working on, see
> https://lkml.kernel.org/r/[email protected]
>
> The advantage is that the new refcount API handles over and
> underflow.
>
> Another advantage is that it increments/decrements the value
> only when it is safe. It uses cmpxchg to make sure that
> the checks are valid.
Great thanks, will look into that.
Luis
On Thu, Dec 15, 2016 at 10:57:42AM +1030, Rusty Russell wrote:
> "Luis R. Rodriguez" <[email protected]> writes:
> > kmod has an optimization in place whereby if a some kernel code
> > uses request_module() on a module already loaded we never bother
> > userspace as the module already is loaded. This is not true for
> > get_fs_type() though as it uses aliases.
>
> Well, the obvious thing to do here is block kmod if we're currently
> loading the same module.
OK thanks, I've now added this, it sure helps. Test cases 0008 and 0009 require
hammering on the test over and over to see a failure on vanilla kernels,
an upper bound I found was about 150 times each test. Running test 0008
150 times with this enhancement you mentioned shaves off ~4 seconds.
For test 0009 it shaves off ~16 seconds, but as I note below the alias support
was needed as well.
> Otherwise it has to do some weird spinning
> thing in userspace anyway.
Right, but note that the get_fs_type() tests would still fail given
module.c was not alias-aware yet. I have the patches to add support
for the aliases now though and this is part of what helped shave
off time from the tests.
> We already have module_wq for this, we just need a bit more code to
> share the return value; and there's a weird corner case there where we
> have "modprobe foo param=invalid" then "modprobe foo param=valid" and we
> fail both with -EINVAL, but it's probably not worth fixing.
Hm OK. Although the set of patches I have fix and optimize now some
of these corner cases one issue that I still didn't quite yet figure
out was that a failure propagates secondary failures. That is,
say a module fails and you have loaded 4 request for the same module,
if the first request failed the last 3 *could* also fail. You can
trigger and see this with the latest script:
http://drvbp1.linux-foundation.org/~mcgrof/2016/12/16/kmod.sh
The latest version of the test_kmod driver:
http://drvbp1.linux-foundation.org/~mcgrof/2016/12/16/test_kmod.patch
./kmod.sh -t 0008
./kmod.sh -t 0009
When either of these fail you'll on dmesg that either a few NULL or
errors were found. It may not be worth fixing this race... given
that after apply all of my patches I no longer see this at all,
but I'm pretty sure a test case can be created to replicate more
easily.
FWIW a few things did occur to me:
a) list_add_rcu() is used so new modules get added first
b) find_module_all() returns the last module which was added as it traverses
the module list
Because of a) and b) if two modules for the same driver can be on
the list at the same time then we'll get very likely a module which
is unformed or going than a live module. Changing module addition
to use list_add_tail_rcu() should mean we typically get the first
module added to the list for the module name I think, but other
than that I could not think clearly of the root case to allowing
multiple errors.
BTW should find_module_all() use rcu to traverse?
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -594,7 +594,7 @@ static struct module *find_module_all(const char *name, size_t len,
module_assert_mutex_or_preempt();
- list_for_each_entry(mod, &modules, list) {
+ list_for_each_entry_rcu(mod, &modules, list) {
if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
continue;
if (strlen(mod->name) == len && !memcmp(mod->name, name, len))
@@ -3532,7 +3532,7 @@ static int add_unformed_module(struct module *mod)
goto out;
}
mod_update_bounds(mod);
- list_add_rcu(&mod->list, &modules);
+ list_add_tail_rcu(&mod->list, &modules);
mod_tree_insert(mod);
err = 0;
On Wed, Dec 14, 2016 at 04:38:27PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:48:14, Luis R. Rodriguez wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 271692a352f1..da2c25746937 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
> >
> > If unsure, or if you need to build out-of-tree modules, say N.
> >
> > +config MAX_KMOD_CONCURRENT
> > + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> > + range 0 14
>
> Would not too small range break loading module dependencies?
No, dependencies are resolved by depmod, so userspace looks at the list and
just finit_module() the depenencies, skipping kmod. So the limit is
really only for kernel acting like a boss.
> I am not sure how it is implemented but it might require having
> some more module loads in progress.
Dependencies should be OK, a more serious concern with dependencies is
the aggregate memory it takes to load all dep modules for one required
module since finit_module() ends up allocating the struct module to copy
over data from userspace.
> I would give 6 as minimum. Nobody has troubles with the current limit.
Fair enough! Although disabling modprobe calls all together seemed like
a fun test, that should we allow that via the module parameter at least?
> > + default 6 if !BASE_SMALL
> > + default 7 if BASE_SMALL
>
> Aren't the conditions inversed?
Whoops yes, sorry.
> > +void __init init_kmod_umh(void)
> > +{
> > + if (!max_modprobes)
> > + max_modprobes = min(max_threads/2,
> > + 2 << CONFIG_MAX_KMOD_CONCURRENT);
>
> This should be
>
> 1 << CONFIG_MAX_KMOD_CONCURRENT);
>
> 1 << 1 = 2;
>
> Note that this calculation is mentioned also some comments and
> documentation.
Heh sorry, yes fixed! Good thing I had still tested all along with the
value I intended though :P
Luis
On Wed, Dec 14, 2016 at 05:23:50PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:49:01, Luis R. Rodriguez wrote:
> > Just use the simplified rate limit printk when the max modprobe
> > limit is reached, while at it throw out a bone should the error
> > be triggered.
> >
> > Signed-off-by: Luis R. Rodriguez <[email protected]>
> > ---
> > kernel/kmod.c | 10 ++--------
> > 1 file changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 09cf35a2075a..ef65f4c3578a 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -158,7 +158,6 @@ int __request_module(bool wait, const char *fmt, ...)
> > va_list args;
> > char module_name[MODULE_NAME_LEN];
> > int ret;
> > - static int kmod_loop_msg;
> >
> > /*
> > * We don't allow synchronous module loading from async. Module
> > @@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
> >
> > ret = kmod_umh_threads_get();
> > if (ret) {
> > - /* We may be blaming an innocent here, but unlikely */
> > - if (kmod_loop_msg < 5) {
> > - printk(KERN_ERR
> > - "request_module: runaway loop modprobe %s\n",
> > - module_name);
> > - kmod_loop_msg++;
> > - }
> > + pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
> > + max_modprobes, module_name);
>
> I like this change. I would only be even more descriptive in which
> limit is reached. Something like
>
> pr_err_ratelimited("request_module: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
> module_name, max_modprobes);
Sure, changed.
> Either way, feel free to add:
>
> Reviewed-by: Petr Mladek <[email protected]>
Thanks!
Luis
On Fri, Dec 09, 2016 at 06:06:44PM +0100, Miroslav Benes wrote:
>
> Reviewed-by: Miroslav Benes <[email protected]>
>
> What about
>
> Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs")
>
> ?
Sure thing, added thanks!
Luis
"Luis R. Rodriguez" <[email protected]> writes:
> On Thu, Dec 15, 2016 at 10:57:42AM +1030, Rusty Russell wrote:
>> "Luis R. Rodriguez" <[email protected]> writes:
>> > kmod has an optimization in place whereby if a some kernel code
>> > uses request_module() on a module already loaded we never bother
>> > userspace as the module already is loaded. This is not true for
>> > get_fs_type() though as it uses aliases.
>>
>> Well, the obvious thing to do here is block kmod if we're currently
>> loading the same module.
>
> OK thanks, I've now added this, it sure helps. Test cases 0008 and 0009 require
> hammering on the test over and over to see a failure on vanilla kernels,
> an upper bound I found was about 150 times each test. Running test 0008
> 150 times with this enhancement you mentioned shaves off ~4 seconds.
> For test 0009 it shaves off ~16 seconds, but as I note below the alias support
> was needed as well.
>
>> Otherwise it has to do some weird spinning
>> thing in userspace anyway.
>
> Right, but note that the get_fs_type() tests would still fail given
> module.c was not alias-aware yet.
AFAICT the mistake here is that kmod is returning "done, OK" when the
module it is trying to load is already loading (but not finished
loading). That's the root problem; it's an attempt at optimization by
kmod which goes awry.
Looking at the code in the kernel, we *already* get this right: block if
a module is still loading anyway. Once it succeeds we return -EBUSY; if
it fails we'll proceed to try to load it again.
I don't understand what you're trying to fix with adding aliases
in-kernel?
> FWIW a few things did occur to me:
>
> a) list_add_rcu() is used so new modules get added first
Only after we're sure that there are no duplicates.
> b) find_module_all() returns the last module which was added as it traverses
> the module list
> BTW should find_module_all() use rcu to traverse?
Yes; the kallsyms code does this on Oops. Not really a big issue in
practice, but a nice fix.
Thanks,
Rusty.
>
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -594,7 +594,7 @@ static struct module *find_module_all(const char *name, size_t len,
>
> module_assert_mutex_or_preempt();
>
> - list_for_each_entry(mod, &modules, list) {
> + list_for_each_entry_rcu(mod, &modules, list) {
> if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
> continue;
> if (strlen(mod->name) == len && !memcmp(mod->name, name, len))
> @@ -3532,7 +3532,7 @@ static int add_unformed_module(struct module *mod)
> goto out;
> }
> mod_update_bounds(mod);
> - list_add_rcu(&mod->list, &modules);
> + list_add_tail_rcu(&mod->list, &modules);
> mod_tree_insert(mod);
> err = 0;
>
"Luis R. Rodriguez" <[email protected]> writes:
> On Dec 16, 2016 9:54 PM, "Rusty Russell" <[email protected]> wrote:
> > AFAICT the mistake here is that kmod is returning "done, OK" when the
> > module it is trying to load is already loading (but not finished
> > loading). That's the root problem; it's an attempt at optimization by
> > kmod which goes awry.
>
> This is true! To be precise though the truth of the matter is that kmod'd
> respective usermode helper: modprobe can be buggy and may lie to us. It may
> allow request_module() to return 0 but since we don't validate it, any
> assumption we make can be deadly. In the case of get_fs_type() its a null
> dereference.
Wait, what?? I can't see that in get_fs_type, which hasn't changed
since 2013. If a caller is assuming get_fs_type() doesn't return NULL,
they're broken and need fixing of course:
struct file_system_type *get_fs_type(const char *name)
{
struct file_system_type *fs;
const char *dot = strchr(name, '.');
int len = dot ? dot - name : strlen(name);
fs = __get_fs_type(name, len);
if (!fs && (request_module("fs-%.*s", len, name) == 0))
fs = __get_fs_type(name, len);
if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
put_filesystem(fs);
fs = NULL;
}
return fs;
}
Where does this NULL-deref is the module isn't correctly loaded?
> *Iff* we want a sanity check to verify kmod's umh is not lying to us we
> need to verify after 0 was returned that it was not lying to us. Since kmod
> accepts aliases but find_modules_all() only works on the real module name a
> validation check cannot happen when all you have are aliases.
request_module() should block until resolution, but that's fundamentally
a userspace problem. Let's not paper over it in kernelspace.
> *Iff* we are sure we don't want a validation (or another earlier
> optimization to avoid calling out to modrobe if the alias requested is
> already present, which does the time shaving I mentioned on the tests) then
> naturally no request_module() calls returning 0 can assert information
> about the requested module. I think we might need to change more code if we
> accept we cannot trust request_module() calls, or we accept userspace
> telling the kernel something may mean we sometimes crash. This later
> predicament seems rather odd so hence the patch.
>
> Perhaps in some cases validation of work from a umh is not critical in
> kernel but for request_module() I can tell you that today get_fs_type code
> currently asserts the module found can never be NULL.
OK, what am I missing in the code above?
> > Looking at the code in the kernel, we *already* get this right: block if
> > a module is still loading anyway. Once it succeeds we return -EBUSY if
> >
> > it fails we'll proceed to try to load it again.
> >
> > I don't understand what you're trying to fix with adding aliases
> > in-kernel?
>
> Two fold now:
>
> a) validation on request_module() work when an alias is used
But why?
> b) since kmod accepts aliaes, if we get aliases support, it means we could
> *also* preemptively avoid calling out to userspace for modules already
> present.
No, because once we have a module we don't request it: requesting is the
fallback case.
> >> FWIW a few things did occur to me:
> >>
> >> a) list_add_rcu() is used so new modules get added first
> >
> > Only after we're sure that there are no duplicates.
> >
> >
> OK! This is a very critical assertion. I should be able to add a debug
> WARN_ON() should two modules be on the modules list for the same module
> then ?
Yes, names must be unique.
>> b) find_module_all() returns the last module which was added as it
> traverses
>> the module list
>
>> BTW should find_module_all() use rcu to traverse?
>
> Yes; the kallsyms code does this on Oops. Not really a big issue in
> practice, but a nice fix.
>
> Ok, will bundle into my queue.
Please submit to Jessica for her module queue, as it's orthogonal
AFAICT.
> I will note though that I still think there's a bug in this code --
> upon a failure other "spinning" requests can fail, I believe this may
> be due to not having another state or informing pending modules too
> early of a failure but I haven't been able to prove this conjecture
> yet.
That's possible, but I can't see it from quickly re-checking the code.
The module should be fully usable at this point; the module's init has
been called successfully, so in the case of __get_fs_type() it should
now succeed. The module cleans up its init section, but that should be
independent.
If there is a race, it's likely to be when some other caller wakes the
queue. Moving the wakeup as soon as possible should make it easier to
trigger:
diff --git a/kernel/module.c b/kernel/module.c
index f57dd63186e6..78bd89d41a22 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3397,6 +3397,7 @@ static noinline int do_init_module(struct module *mod)
/* Now it's a first class citizen! */
mod->state = MODULE_STATE_LIVE;
+ wake_up_all(&module_wq);
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_LIVE, mod);
@@ -3445,7 +3446,6 @@ static noinline int do_init_module(struct module *mod)
*/
call_rcu_sched(&freeinit->rcu, do_free_init);
mutex_unlock(&module_mutex);
- wake_up_all(&module_wq);
return 0;
Thanks,
Rusty.
On Mon, Dec 19, 2016 at 6:53 PM, Rusty Russell <[email protected]> wrote:
> Where does this NULL-deref is the module isn't correctly loaded?
No you are right, sorry -- I had confused a failure to mount over null
deref, my mistake.
>> *Iff* we want a sanity check to verify kmod's umh is not lying to us we
>> need to verify after 0 was returned that it was not lying to us. Since kmod
>> accepts aliases but find_modules_all() only works on the real module name a
>> validation check cannot happen when all you have are aliases.
>
> request_module() should block until resolution, but that's fundamentally
> a userspace problem. Let's not paper over it in kernelspace.
OK -- if userspace messes up again it may be a bit hard to prove
unless we have a validation debug thing in place, would such a thing
in debug form be reasonable ?
>> Yes; the kallsyms code does this on Oops. Not really a big issue in
>> practice, but a nice fix.
>>
>> Ok, will bundle into my queue.
>
> Please submit to Jessica for her module queue, as it's orthogonal
> AFAICT.
Will do.
>> I will note though that I still think there's a bug in this code --
>> upon a failure other "spinning" requests can fail, I believe this may
>> be due to not having another state or informing pending modules too
>> early of a failure but I haven't been able to prove this conjecture
>> yet.
>
> That's possible, but I can't see it from quickly re-checking the code.
>
> The module should be fully usable at this point; the module's init has
> been called successfully, so in the case of __get_fs_type() it should
> now succeed. The module cleans up its init section, but that should be
> independent.
>
> If there is a race, it's likely to be when some other caller wakes the
> queue. Moving the wakeup as soon as possible should make it easier to
> trigger:
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f57dd63186e6..78bd89d41a22 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3397,6 +3397,7 @@ static noinline int do_init_module(struct module *mod)
>
> /* Now it's a first class citizen! */
> mod->state = MODULE_STATE_LIVE;
> + wake_up_all(&module_wq);
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_LIVE, mod);
>
> @@ -3445,7 +3446,6 @@ static noinline int do_init_module(struct module *mod)
> */
> call_rcu_sched(&freeinit->rcu, do_free_init);
> mutex_unlock(&module_mutex);
> - wake_up_all(&module_wq);
>
> return 0;
>
Will give this a shot, thanks!
Luis
"Luis R. Rodriguez" <[email protected]> writes:
> OK -- if userspace messes up again it may be a bit hard to prove
> unless we have a validation debug thing in place, would such a thing
> in debug form be reasonable ?
That makes perfect sense. Untested hack:
diff --git a/fs/filesystems.c b/fs/filesystems.c
index c5618db110be..e5c90e80c7d3 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -275,9 +275,10 @@ struct file_system_type *get_fs_type(const char *name)
int len = dot ? dot - name : strlen(name);
fs = __get_fs_type(name, len);
- if (!fs && (request_module("fs-%.*s", len, name) == 0))
+ if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
fs = __get_fs_type(name, len);
-
+ WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
+ }
if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
put_filesystem(fs);
fs = NULL;
Maybe a similar hack for try_then_request_module(), but many places seem
to open-code request_module() so it's not as trivial...
Cheers,
Rusty.
On Tue, Dec 20, 2016 at 8:21 PM, Rusty Russell <[email protected]> wrote:
> "Luis R. Rodriguez" <[email protected]> writes:
>> OK -- if userspace messes up again it may be a bit hard to prove
>> unless we have a validation debug thing in place, would such a thing
>> in debug form be reasonable ?
>
> That makes perfect sense. Untested hack:
>
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index c5618db110be..e5c90e80c7d3 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -275,9 +275,10 @@ struct file_system_type *get_fs_type(const char *name)
> int len = dot ? dot - name : strlen(name);
>
> fs = __get_fs_type(name, len);
> - if (!fs && (request_module("fs-%.*s", len, name) == 0))
> + if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
> fs = __get_fs_type(name, len);
> -
> + WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
> + }
> if (dot && fs && !(fs->fs_flags & FS_HAS_SUBTYPE)) {
> put_filesystem(fs);
> fs = NULL;
This is precisely a type of debug patch we had added first to verify "WTF".
> Maybe a similar hack for try_then_request_module(), but many places seem
> to open-code request_module() so it's not as trivial...
Right, out of ~350 request_module() calls (not included try requests)
only ~46 check the return value. Hence a validation check, and come to
think of it, *this* was the issue that originally had me believing
that in some places we might end up in a null deref --if those open
coded request_module() calls assume the driver is loaded there could
be many places where a NULL is inevitable. Granted, I agree they
should be fixed, we could add a grammar rule to start nagging at
driver developers for started, but it does beg the question also of
what a tightly knit validation for modprobe might look like, and hence
this patch and now the completed not-yet-posted alias work.
Would it be worthy as a kconfig kmod debugging aide for now? I can
follow up with a semantic patch to nag about checking the return value
of request_module(), and we can have 0-day then also complain about
new invalid uses.
Luis
+++ Luis R. Rodriguez [16/12/16 09:05 +0100]:
>On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
>> On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
>> > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
>> > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <[email protected]> wrote:
>> > > > kmod_concurrent is used as an atomic counter for enabling
>> > > > the allowed limit of modprobe calls, provide wrappers for it
>> > > > to enable this to be expanded on more easily. This will be done
>> > > > later.
>> > > >
>> > > > Signed-off-by: Luis R. Rodriguez <[email protected]>
>> > > > ---
>> > > > kernel/kmod.c | 27 +++++++++++++++++++++------
>> > > > 1 file changed, 21 insertions(+), 6 deletions(-)
>> > > >
>> > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
>> > > > index cb6f7ca7b8a5..049d7eabda38 100644
>> > > > --- a/kernel/kmod.c
>> > > > +++ b/kernel/kmod.c
>> > > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
>> > > > return -ENOMEM;
>> > > > }
>> > > >
>> > > > +static int kmod_umh_threads_get(void)
>> > > > +{
>> > > > + atomic_inc(&kmod_concurrent);
>>
>> This approach might actually cause false failures. If we
>> are on the limit and more processes do this increment
>> in parallel, it makes the number bigger that it should be.
>
>This approach is *exactly* what the existing code does :P
>I just provided wrappers. I agree with the old approach though,
>reason is it acts as a lock in for the bump.
I think what Petr meant was that we could run into false failures when multiple
atomic increments happen between the first increment and the subsequent
atomic_read.
Say max_modprobes is 64 -
atomic_inc(&kmod_concurrent); // thread 1: kmod_concurrent is 63
atomic_inc(&kmod_concurrent); // thread 2: kmod_concurrent is 64
atomic_inc(&kmod_concurrent); // thread 3: kmod_concurrent is 65
if (atomic_read(&kmod_concurrent) < max_modprobes) // if all threads read 65 here, then all will error out
return 0; // when the first two should have succeeded (false failures)
atomic_dec(&kmod_concurrent);
return -ENOMEM;
But yeah, I think this issue was already in the existing kmod code..
Jessica
+++ Luis R. Rodriguez [08/12/16 11:48 -0800]:
>kmod_concurrent is used as an atomic counter for enabling
>the allowed limit of modprobe calls, provide wrappers for it
>to enable this to be expanded on more easily. This will be done
>later.
>
>Signed-off-by: Luis R. Rodriguez <[email protected]>
>---
> kernel/kmod.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
>diff --git a/kernel/kmod.c b/kernel/kmod.c
>index cb6f7ca7b8a5..049d7eabda38 100644
>--- a/kernel/kmod.c
>+++ b/kernel/kmod.c
>@@ -44,6 +44,9 @@
> #include <trace/events/module.h>
>
> extern int max_threads;
>+
>+static atomic_t kmod_concurrent = ATOMIC_INIT(0);
>+
> unsigned int max_modprobes;
> module_param(max_modprobes, uint, 0644);
> MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
>@@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> return -ENOMEM;
> }
>
>+static int kmod_umh_threads_get(void)
>+{
>+ atomic_inc(&kmod_concurrent);
>+ if (atomic_read(&kmod_concurrent) < max_modprobes)
Should this not be <=? I think this only allows up to max_modprobes-1 concurrent threads.
>+ return 0;
>+ atomic_dec(&kmod_concurrent);
>+ return -ENOMEM;
>+}
>+
>+static void kmod_umh_threads_put(void)
>+{
>+ atomic_dec(&kmod_concurrent);
>+}
>+
> /**
> * __request_module - try to load a kernel module
> * @wait: wait (or not) for the operation to complete
>@@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
> va_list args;
> char module_name[MODULE_NAME_LEN];
> int ret;
>- static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> static int kmod_loop_msg;
>
> /*
>@@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
>- atomic_inc(&kmod_concurrent);
>- if (atomic_read(&kmod_concurrent) > max_modprobes) {
>+ ret = kmod_umh_threads_get();
>+ if (ret) {
> /* We may be blaming an innocent here, but unlikely */
> if (kmod_loop_msg < 5) {
> printk(KERN_ERR
>@@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
> module_name);
> kmod_loop_msg++;
> }
>- atomic_dec(&kmod_concurrent);
>- return -ENOMEM;
>+ return ret;
> }
>
> trace_module_request(module_name, wait, _RET_IP_);
>
> ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
>
>- atomic_dec(&kmod_concurrent);
>+ kmod_umh_threads_put();
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
>--
>2.10.1
>