Changes since v2 [1]:
- Add kselftests to verify the expected behavior.
- Split a clean-up change to the module_mutex comment into a separate patch.
- Clarify the description of the main patch and mention that it also fixes
a recently reported problem with module vmap space allocation errors.
- Improve some code comments.
Changes since v1 [2]:
- Change the error returned by a duplicate load when the main insert
fails from -ENODEV to -EBUSY.
- Change the error returned by a duplicate load when a same-name module
is going from -EAGAIN to -EBUSY.
- Use a per-shared_load_info completion object to inform waiting loads
when the main one is done.
- Add a patch to correct wake up of module_wq.
[1] https://lore.kernel.org/linux-modules/[email protected]/
[2] https://lore.kernel.org/linux-modules/[email protected]/
Petr Pavlu (4):
module: Correct wake up of module_wq
module: Update a comment describing what is protected by module_mutex
module: Merge same-name module load requests
selftests: kmod: Add tests for merging same-name module load requests
kernel/module/main.c | 222 +++++++++++++-----
tools/testing/selftests/kmod/.gitignore | 1 +
tools/testing/selftests/kmod/Makefile | 17 +-
tools/testing/selftests/kmod/init_module.c | 49 ++++
tools/testing/selftests/kmod/kmod.sh | 139 +++++++++++
.../selftests/kmod/kmod_test_0014/Makefile | 14 ++
.../kmod/kmod_test_0014/kmod_test_0014.c | 29 +++
7 files changed, 402 insertions(+), 69 deletions(-)
create mode 100644 tools/testing/selftests/kmod/.gitignore
create mode 100644 tools/testing/selftests/kmod/init_module.c
create mode 100644 tools/testing/selftests/kmod/kmod_test_0014/Makefile
create mode 100644 tools/testing/selftests/kmod/kmod_test_0014/kmod_test_0014.c
--
2.35.3
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded, which causes the init function of the new module to
return an error.
The module loader currently serializes all such requests, with the
barrier in add_unformed_module(). This creates a lot of unnecessary work
and delays the boot. It can prevent udev from loading drivers for other
devices and might cause timeouts of services waiting on them and
subsequently a failed boot.
The mentioned serialization was introduced as a side-effect of commit
6e6de3dee51a. Before that commit, the kernel merged some of these load
requests, although more by accident than by design, relying on specific
timing. This patch brings the behavior back in a more explicit form.
The logic is improved as follows:
* A check whether a module load matches an already loaded module is
moved right after a module name is determined. -EEXIST continues to be
returned if the module exists and is live, -EBUSY is returned if
a same-name module is going.
* A new reference-counted shared_load_info structure is introduced to
keep track of duplicate load requests. Two loads are considered
equivalent if their module name matches. In case a load duplicates
another running insert, the code waits for its completion and then
returns -EEXIST or -EBUSY depending on whether it succeeded.
Moving the check for same-name module loads earlier also has a positive
effect on reducing memory pressure. For instance, David Hildenbrand and
Lin Liu reported [1] that when KASAN_INLINE is enabled (resulting in
large module size), with plenty of devices that udev wants to probe and
with plenty of CPUs that can carry out that probing concurrently, the
system can actually run out of module vmap space and trigger vmap
allocation errors. The patch fixes this problem as well because it avoids
the duplicate layout_and_allocate() work.
[1] https://lore.kernel.org/all/[email protected]/
Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Reviewed-by: Petr Mladek <[email protected]>
Signed-off-by: Petr Pavlu <[email protected]>
---
kernel/module/main.c | 217 ++++++++++++++++++++++++++++++-------------
1 file changed, 155 insertions(+), 62 deletions(-)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5288843ca40f..2228c0f725e7 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -66,11 +66,28 @@
* uses RCU list operations).
* 2) module_use links,
* 3) mod_tree.addr_min/mod_tree.addr_max,
- * 4) list of unloaded_tainted_modules.
+ * 4) list of unloaded_tainted_modules,
+ * 5) list of running_loads.
*/
DEFINE_MUTEX(module_mutex);
LIST_HEAD(modules);
+/* Shared information to track duplicate module loads. */
+struct shared_load_info {
+ char name[MODULE_NAME_LEN];
+ refcount_t refcnt;
+ struct list_head list;
+ struct completion done;
+ int err;
+};
+static LIST_HEAD(running_loads);
+
+/*
+ * Waiting for a module load when the exact module name is not known, for
+ * example, when resolving symbols from other modules.
+ */
+static DECLARE_WAIT_QUEUE_HEAD(module_wq);
+
/* Work queue for freeing init sections in success case */
static void do_free_init(struct work_struct *w);
static DECLARE_WORK(init_free_wq, do_free_init);
@@ -124,9 +141,6 @@ static void mod_update_bounds(struct module *mod)
int modules_disabled;
core_param(nomodule, modules_disabled, bint, 0);
-/* Waiting for a module to finish initializing? */
-static DECLARE_WAIT_QUEUE_HEAD(module_wq);
-
static BLOCKING_NOTIFIER_HEAD(module_notify_list);
int register_module_notifier(struct notifier_block *nb)
@@ -764,8 +778,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
strscpy(last_unloaded_module.taints, module_flags(mod, buf, false), sizeof(last_unloaded_module.taints));
free_module(mod);
- /* someone could wait for the module in add_unformed_module() */
- wake_up_interruptible(&module_wq);
return 0;
out:
mutex_unlock(&module_mutex);
@@ -2373,26 +2385,6 @@ static int post_relocation(struct module *mod, const struct load_info *info)
return module_finalize(info->hdr, info->sechdrs, mod);
}
-/* Is this module of this name done loading? No locks held. */
-static bool finished_loading(const char *name)
-{
- struct module *mod;
- bool ret;
-
- /*
- * The module_mutex should not be a heavily contended lock;
- * if we get the occasional sleep here, we'll go an extra iteration
- * in the wait_event_interruptible(), which is harmless.
- */
- sched_annotate_sleep();
- mutex_lock(&module_mutex);
- mod = find_module_all(name, strlen(name), true);
- ret = !mod || mod->state == MODULE_STATE_LIVE;
- mutex_unlock(&module_mutex);
-
- return ret;
-}
-
/* Call module constructors. */
static void do_mod_ctors(struct module *mod)
{
@@ -2523,7 +2515,6 @@ static noinline int do_init_module(struct module *mod)
schedule_work(&init_free_wq);
mutex_unlock(&module_mutex);
- wake_up_interruptible(&module_wq);
return 0;
@@ -2539,7 +2530,6 @@ static noinline int do_init_module(struct module *mod)
klp_module_going(mod);
ftrace_release_mod(mod);
free_module(mod);
- wake_up_interruptible(&module_wq);
return ret;
}
@@ -2551,43 +2541,138 @@ static int may_init_module(void)
return 0;
}
+static struct shared_load_info *
+shared_load_info_alloc(const struct load_info *info)
+{
+ struct shared_load_info *shared_info =
+ kzalloc(sizeof(*shared_info), GFP_KERNEL);
+ if (shared_info == NULL)
+ return ERR_PTR(-ENOMEM);
+
+ strscpy(shared_info->name, info->name, sizeof(shared_info->name));
+ refcount_set(&shared_info->refcnt, 1);
+ INIT_LIST_HEAD(&shared_info->list);
+ init_completion(&shared_info->done);
+ return shared_info;
+}
+
+static void shared_load_info_get(struct shared_load_info *shared_info)
+{
+ refcount_inc(&shared_info->refcnt);
+}
+
+static void shared_load_info_put(struct shared_load_info *shared_info)
+{
+ if (refcount_dec_and_test(&shared_info->refcnt))
+ kfree(shared_info);
+}
+
/*
- * We try to place it in the list now to make sure it's unique before
- * we dedicate too many resources. In particular, temporary percpu
+ * Check that a module load is unique and make it visible to others. The code
+ * looks for parallel running inserts and already loaded modules. Two inserts
+ * are considered equivalent if their module name matches. In case this load
+ * duplicates another running insert, the code waits for its completion and
+ * then returns -EEXIST or -EBUSY depending on whether it succeeded.
+ *
+ * Detecting early that a load is unique avoids dedicating too many cycles and
+ * resources to bring up the module. In particular, it prevents temporary percpu
* memory exhaustion.
+ *
+ * Merging same-name load requests primarily helps during the boot process. It
+ * can happen that the kernel receives a burst of requests to load the same
+ * module (for example, the same module for each individual CPU) and loading it
+ * eventually fails during its init call. Merging the requests ensures that
+ * only one full attempt to load the module is made.
+ *
+ * On a non-error return, it is guaranteed that this load is unique.
*/
-static int add_unformed_module(struct module *mod)
+static struct shared_load_info *add_running_load(const struct load_info *info)
{
- int err;
struct module *old;
+ struct shared_load_info *shared_info;
- mod->state = MODULE_STATE_UNFORMED;
-
-again:
mutex_lock(&module_mutex);
- old = find_module_all(mod->name, strlen(mod->name), true);
- if (old != NULL) {
- if (old->state != MODULE_STATE_LIVE) {
- /* Wait in case it fails to load. */
+
+ /* Search if there is a running load of a module with the same name. */
+ list_for_each_entry(shared_info, &running_loads, list)
+ if (strcmp(shared_info->name, info->name) == 0) {
+ int err;
+
+ shared_load_info_get(shared_info);
mutex_unlock(&module_mutex);
- err = wait_event_interruptible(module_wq,
- finished_loading(mod->name));
- if (err)
- goto out_unlocked;
- goto again;
+
+ err = wait_for_completion_interruptible(
+ &shared_info->done);
+ /*
+ * Return -EBUSY when the parallel load failed for any
+ * reason. This load might have turned out differently,
+ * but we are not going to retry.
+ */
+ if (!err)
+ err = shared_info->err ? -EBUSY : -EEXIST;
+ shared_load_info_put(shared_info);
+ shared_info = ERR_PTR(err);
+ goto out_unlocked;
+ }
+
+ /* Search if there is a live module with the given name already. */
+ old = find_module_all(info->name, strlen(info->name), true);
+ if (old != NULL) {
+ if (old->state == MODULE_STATE_LIVE) {
+ shared_info = ERR_PTR(-EEXIST);
+ goto out;
}
- err = -EEXIST;
+
+ /*
+ * Any active load always has its record in running_loads and so
+ * would be found above. This applies independently of whether such
+ * a module is currently in MODULE_STATE_UNFORMED,
+ * MODULE_STATE_COMING, or even in MODULE_STATE_GOING if its
+ * initialization failed. It therefore means this must be an
+ * older going module and the caller should try later once it is
+ * gone.
+ */
+ WARN_ON(old->state != MODULE_STATE_GOING);
+ shared_info = ERR_PTR(-EBUSY);
goto out;
}
- mod_update_bounds(mod);
- list_add_rcu(&mod->list, &modules);
- mod_tree_insert(mod);
- err = 0;
+
+ /* The load is unique, make it visible to others. */
+ shared_info = shared_load_info_alloc(info);
+ if (IS_ERR(shared_info))
+ goto out;
+ list_add(&shared_info->list, &running_loads);
out:
mutex_unlock(&module_mutex);
out_unlocked:
- return err;
+ return shared_info;
+}
+
+static void finalize_running_load(struct shared_load_info *shared_info, int err)
+{
+ /* Inform other duplicate inserts that the load finished. */
+ mutex_lock(&module_mutex);
+ list_del(&shared_info->list);
+ shared_info->err = err;
+ mutex_unlock(&module_mutex);
+
+ complete_all(&shared_info->done);
+ shared_load_info_put(shared_info);
+
+ /* Tell other modules waiting on this one that it completed loading. */
+ wake_up_interruptible(&module_wq);
+}
+
+static void add_unformed_module(struct module *mod)
+{
+ mod->state = MODULE_STATE_UNFORMED;
+
+ mutex_lock(&module_mutex);
+ mod_update_bounds(mod);
+ list_add_rcu(&mod->list, &modules);
+ mod_tree_insert(mod);
+ mutex_unlock(&module_mutex);
}
static int complete_formation(struct module *mod, struct load_info *info)
@@ -2672,6 +2757,7 @@ static int unknown_module_param_cb(char *param, char *val, const char *modname,
static int load_module(struct load_info *info, const char __user *uargs,
int flags)
{
+ struct shared_load_info *shared_info;
struct module *mod;
long err = 0;
char *after_dashes;
@@ -2709,38 +2795,43 @@ static int load_module(struct load_info *info, const char __user *uargs,
goto free_copy;
/*
- * Now that we know we have the correct module name, check
- * if it's blacklisted.
+ * Now that we know we have the correct module name, check if there is
+ * another load of the same name in progress.
*/
+ shared_info = add_running_load(info);
+ if (IS_ERR(shared_info)) {
+ err = PTR_ERR(shared_info);
+ goto free_copy;
+ }
+
+ /* Check if the module is blacklisted. */
if (blacklisted(info->name)) {
err = -EPERM;
pr_err("Module %s is blacklisted\n", info->name);
- goto free_copy;
+ goto free_shared;
}
err = rewrite_section_headers(info, flags);
if (err)
- goto free_copy;
+ goto free_shared;
/* Check module struct version now, before we try to use module. */
if (!check_modstruct_version(info, info->mod)) {
err = -ENOEXEC;
- goto free_copy;
+ goto free_shared;
}
/* Figure out module layout, and allocate all the memory. */
mod = layout_and_allocate(info, flags);
if (IS_ERR(mod)) {
err = PTR_ERR(mod);
- goto free_copy;
+ goto free_shared;
}
audit_log_kern_module(mod->name);
/* Reserve our place in the list. */
- err = add_unformed_module(mod);
- if (err)
- goto free_module;
+ add_unformed_module(mod);
#ifdef CONFIG_MODULE_SIG
mod->sig_ok = info->sig_ok;
@@ -2847,7 +2938,9 @@ static int load_module(struct load_info *info, const char __user *uargs,
/* Done! */
trace_module_load(mod);
- return do_init_module(mod);
+ err = do_init_module(mod);
+ finalize_running_load(shared_info, err);
+ return err;
sysfs_cleanup:
mod_sysfs_teardown(mod);
@@ -2880,15 +2973,15 @@ static int load_module(struct load_info *info, const char __user *uargs,
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
mod_tree_remove(mod);
- wake_up_interruptible(&module_wq);
/* Wait for RCU-sched synchronizing before releasing mod->list. */
synchronize_rcu();
mutex_unlock(&module_mutex);
- free_module:
/* Free lock-classes; relies on the preceding sync_rcu() */
lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
module_deallocate(mod, info);
+ free_shared:
+ finalize_running_load(shared_info, err);
free_copy:
free_copy(info, flags);
return err;
--
2.35.3
Add two tests to check that loading the same module multiple times in
parallel results only in one real attempt to initialize it.
Synchronization of the loads is done by waiting 1000 ms in the init
function of a sample module kmod_test_0014. The tests measure time
needed to perform all inserts to verify that the loads get merged by the
module loader and are not serialized.
* Case 0014 checks the situation where the load is successful. It should
result in one insert returning 0 and the remaining inserts returning
EEXIST.
* Case 0015 checks the situation where the load fails because the module
init function returns ENODEV. It should result in one insert returning
this error code and the remaining inserts returning EBUSY.
The tests use a simple init_module program to load kmod_test_0014.ko. It
makes it possible to obtain the return code directly from the finit_module
syscall.
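The exit-status bookkeeping the new kmod.sh helper relies on can be sketched
stand-alone. The snippet below uses plain `sh -c 'exit N'` jobs as stand-ins
for the real module inserts; it only demonstrates how parallel jobs are
started and how each exit status is mapped back to a negative errno-style
value (the real script additionally guards the `wait` with `|| true`,
presumably because it runs with errexit enabled):

```shell
#!/bin/sh
# Stand-ins for parallel module inserts: exit codes model the errno values
# returned by init_module (0 = success, 17 = EEXIST, 16 = EBUSY).
pids=""
for code in 0 17 17 16; do
	sh -c "exit $code" &
	pids="$pids $!"
done

success=0 eexist=0 ebusy=0
for pid in $pids; do
	wait "$pid"           # exit statuses are retained until waited for
	rc=$((0 - $?))        # map back to the negative errno convention
	case $rc in
	0)   success=$((success + 1)) ;;
	-17) eexist=$((eexist + 1)) ;;
	-16) ebusy=$((ebusy + 1)) ;;
	esac
done
echo "success=$success eexist=$eexist ebusy=$ebusy"
```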
Signed-off-by: Petr Pavlu <[email protected]>
---
tools/testing/selftests/kmod/.gitignore | 1 +
tools/testing/selftests/kmod/Makefile | 17 ++-
tools/testing/selftests/kmod/init_module.c | 49 ++++++
tools/testing/selftests/kmod/kmod.sh | 139 ++++++++++++++++++
.../selftests/kmod/kmod_test_0014/Makefile | 14 ++
.../kmod/kmod_test_0014/kmod_test_0014.c | 29 ++++
6 files changed, 244 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/kmod/.gitignore
create mode 100644 tools/testing/selftests/kmod/init_module.c
create mode 100644 tools/testing/selftests/kmod/kmod_test_0014/Makefile
create mode 100644 tools/testing/selftests/kmod/kmod_test_0014/kmod_test_0014.c
diff --git a/tools/testing/selftests/kmod/.gitignore b/tools/testing/selftests/kmod/.gitignore
new file mode 100644
index 000000000000..ea3afcaccc79
--- /dev/null
+++ b/tools/testing/selftests/kmod/.gitignore
@@ -0,0 +1 @@
+init_module
diff --git a/tools/testing/selftests/kmod/Makefile b/tools/testing/selftests/kmod/Makefile
index 5b3e746a0bee..48aeaae76211 100644
--- a/tools/testing/selftests/kmod/Makefile
+++ b/tools/testing/selftests/kmod/Makefile
@@ -1,12 +1,19 @@
# SPDX-License-Identifier: GPL-2.0-only
# Makefile for kmod loading selftests
-# No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
-all:
-
TEST_PROGS := kmod.sh
+# compile but not part of 'make run_tests'
+TEST_GEN_PROGS_EXTENDED := init_module kmod_test_0014.ko
+
+# override lib.mk's default rule
+override define CLEAN
+ $(RM) $(TEST_GEN_PROGS_EXTENDED)
+ $(MAKE) -C kmod_test_0014 clean
+endef
+
include ../lib.mk
-# Nothing to clean up.
-clean:
+$(OUTPUT)/kmod_test_0014.ko: $(wildcard kmod_test_0014/Makefile kmod_test_0014/*.[ch])
+ $(MAKE) -C kmod_test_0014
+ cp kmod_test_0014/kmod_test_0014.ko $@
diff --git a/tools/testing/selftests/kmod/init_module.c b/tools/testing/selftests/kmod/init_module.c
new file mode 100644
index 000000000000..529abd0a8357
--- /dev/null
+++ b/tools/testing/selftests/kmod/init_module.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Simple program to insert a module, similar to insmod but tailored
+ * specifically for the needs of module loader tests.
+ */
+
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <errno.h>
+
+int main(int argc, char *argv[])
+{
+ const char *filename, *args;
+ int fd, ret;
+
+ if (argc < 2 || argc > 3) {
+ fprintf(stderr, "usage: %s filename [args]\n", argv[0]);
+ return EINVAL;
+ }
+
+ filename = argv[1];
+ args = argc > 2 ? argv[2] : "";
+
+ fd = open(filename, O_RDONLY);
+ if (fd == -1) {
+ ret = errno;
+ fprintf(stderr, "init_module: could not open file %s: %s\n",
+ filename, strerror(ret));
+ return ret;
+ }
+
+ ret = syscall(SYS_finit_module, fd, args, 0);
+ if (ret != 0) {
+ ret = errno;
+ fprintf(stderr, "init_module: could not load module %s: %s\n",
+ filename, strerror(ret));
+ }
+
+ close(fd);
+
+ return ret;
+}
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..8836d7fe1a0a 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -65,6 +65,8 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
ALL_TESTS="$ALL_TESTS 0011:1:1"
ALL_TESTS="$ALL_TESTS 0012:1:1"
ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:1:1"
+ALL_TESTS="$ALL_TESTS 0015:1:1"
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
@@ -171,6 +173,12 @@ errno_name_to_val()
echo -1;;
-ENOENT)
echo -2;;
+ -EBUSY)
+ echo -16;;
+ -EEXIST)
+ echo -17;;
+ -ENODEV)
+ echo -19;;
-EINVAL)
echo -22;;
-ERR_ANY)
@@ -190,6 +198,12 @@ errno_val_to_name()
echo -EPERM;;
-2)
echo -ENOENT;;
+ -16)
+ echo -EBUSY;;
+ -17)
+ echo -EEXIST;;
+ -19)
+ echo -ENODEV;;
-22)
echo -EINVAL;;
-123456)
@@ -504,6 +518,129 @@ kmod_test_0013()
"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head -n1"
}
+# Test that loading the same module multiple times in parallel results only in
+# one real attempt to initialize it. Synchronization of the loads is done by
+# waiting 1000 ms in the init function of a sample module kmod_test_0014. Tests
+# measure time needed to perform all inserts to verify that the loads get merged
+# by the module loader and are not serialized.
+#
+# * Case 0014 checks a situation when the load is successful. It should result
+# in one insert returning 0 and remaining inserts returning EEXIST.
+# * Case 0015 checks a situation when the load is failing because the module
+# init function returns ENODEV. It should result in one insert returning this
+# error code and remaining inserts returning EBUSY.
+kmod_check_parallel_loads()
+{
+ local test_name="$1"
+ local parallel_loads="$2"
+ local insmod_params="$3"
+ local max_time="$4"
+ local success_exp="$5"
+ local ebusy_exp="$6"
+ local eexist_exp="$7"
+ local enodev_exp="$8"
+ local other_exp="$9"
+
+ local EBUSY=$(errno_name_to_val -EBUSY)
+ local EEXIST=$(errno_name_to_val -EEXIST)
+ local ENODEV=$(errno_name_to_val -ENODEV)
+ local start end elapsed pids
+ local success_cnt=0 ebusy_cnt=0 eexist_cnt=0 enodev_cnt=0 other_cnt=0
+ local test_ok=true
+
+ # Do a dummy insert to get the file into the page cache.
+ ./init_module kmod_test_0014.ko "sleep_msecs=0 retval=0"
+ rmmod kmod_test_0014
+
+ # Run the parallel loads.
+ let start=$(date +%s%N)/1000000
+
+ for i in $(seq 0 $(($parallel_loads - 1))); do
+ ./init_module kmod_test_0014.ko "$insmod_params" &
+ pids[$i]=$!
+ done
+
+ for pid in ${pids[*]}; do
+ local rc
+ { wait $pid; let rc=-$?; } || true
+
+ case $rc in
+ 0)
+ let success_cnt=$success_cnt+1;;
+ $EBUSY)
+ let ebusy_cnt=$ebusy_cnt+1;;
+ $EEXIST)
+ let eexist_cnt=$eexist_cnt+1;;
+ $ENODEV)
+ let enodev_cnt=$enodev_cnt+1;;
+ *)
+ let other_cnt=$other_cnt+1;;
+ esac
+ done
+
+ let end=$(date +%s%N)/1000000
+ let elapsed=$end-$start
+
+ if (( $success_cnt > 0 )); then
+ rmmod kmod_test_0014
+ fi
+
+ # Check the results.
+ if (( $elapsed > $max_time )); then
+ echo "$test_name: FAIL, time to load the modules, test expects '<$max_time' ms - got '$elapsed' ms" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - time to load the modules is within limit, test expects '<$max_time' ms - got '$elapsed' ms" >&2
+ fi
+ if (( $success_cnt != $success_exp )); then
+ echo "$test_name: FAIL, number of successful loads, test expects '$success_exp' - got '$success_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of successful loads, test expects '$success_exp' - got '$success_cnt'" >&2
+ fi
+ if (( $ebusy_cnt != $ebusy_exp )); then
+ echo "$test_name: FAIL, number of loads returning EBUSY, test expects '$ebusy_exp' - got '$ebusy_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning EBUSY, test expects '$ebusy_exp' - got '$ebusy_cnt'" >&2
+ fi
+ if (( $eexist_cnt != $eexist_exp )); then
+ echo "$test_name: FAIL, number of loads returning EEXIST, test expects '$eexist_exp' - got '$eexist_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning EEXIST, test expects '$eexist_exp' - got '$eexist_cnt'" >&2
+ fi
+ if (( $enodev_cnt != $enodev_exp )); then
+ echo "$test_name: FAIL, number of loads returning ENODEV, test expects '$enodev_exp' - got '$enodev_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning ENODEV, test expects '$enodev_exp' - got '$enodev_cnt'" >&2
+ fi
+ if (( $other_cnt != $other_exp )); then
+ echo "$test_name: FAIL, number of loads returning other values, test expects '$other_exp' - got '$other_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning other values, test expects '$other_exp' - got '$other_cnt'" >&2
+ fi
+
+ [ $test_ok = true ] || exit 1
+}
+
+kmod_test_0014()
+{
+ kmod_check_parallel_loads \
+ "${FUNCNAME[0]}" 4 "sleep_msecs=1000 retval=0" 3000 \
+ 1 0 3 0 0
+}
+
+kmod_test_0015()
+{
+ local ENODEV=$(errno_name_to_val -ENODEV)
+ kmod_check_parallel_loads \
+ "${FUNCNAME[0]}" 4 "sleep_msecs=1000 retval=$ENODEV" 3000 \
+ 0 3 0 1 0
+}
+
list_tests()
{
echo "Test ID list:"
@@ -525,6 +662,8 @@ list_tests()
echo "0011 x $(get_test_count 0011) - test completely disabling module autoloading"
echo "0012 x $(get_test_count 0012) - test /proc/modules address visibility under CAP_SYSLOG"
echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* visibility under CAP_SYSLOG"
+ echo "0014 x $(get_test_count 0014) - test handling of parallel loads, success case"
+ echo "0015 x $(get_test_count 0015) - test handling of parallel loads, init returning error"
}
usage()
diff --git a/tools/testing/selftests/kmod/kmod_test_0014/Makefile b/tools/testing/selftests/kmod/kmod_test_0014/Makefile
new file mode 100644
index 000000000000..c90afe5a98ad
--- /dev/null
+++ b/tools/testing/selftests/kmod/kmod_test_0014/Makefile
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+KMOD_TEST_0014_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST)))))
+KDIR ?= $(abspath $(KMOD_TEST_0014_DIR)/../../../../..)
+
+MODULES = kmod_test_0014.ko
+
+obj-m += kmod_test_0014.o
+
+all:
+ $(MAKE) -C $(KDIR) M=$(KMOD_TEST_0014_DIR) modules
+
+clean:
+ $(MAKE) -C $(KDIR) M=$(KMOD_TEST_0014_DIR) clean
diff --git a/tools/testing/selftests/kmod/kmod_test_0014/kmod_test_0014.c b/tools/testing/selftests/kmod/kmod_test_0014/kmod_test_0014.c
new file mode 100644
index 000000000000..2b2ebeefe689
--- /dev/null
+++ b/tools/testing/selftests/kmod/kmod_test_0014/kmod_test_0014.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+
+static unsigned int sleep_msecs = 1000;
+module_param(sleep_msecs, uint, 0644);
+MODULE_PARM_DESC(sleep_msecs, "init sleep in milliseconds, default 1000");
+
+static int retval;
+module_param(retval, int, 0644);
+MODULE_PARM_DESC(retval, "init return value, default 0");
+
+static int __init kmod_test_0014_init(void)
+{
+ msleep(sleep_msecs);
+ return retval;
+}
+
+static void __exit kmod_test_0014_exit(void)
+{
+}
+
+module_init(kmod_test_0014_init);
+module_exit(kmod_test_0014_exit);
+
+MODULE_AUTHOR("Petr Pavlu <[email protected]>");
+MODULE_LICENSE("GPL");
--
2.35.3
On 16.10.22 14:30, Petr Pavlu wrote:
> During a system boot, it can happen that the kernel receives a burst of
> requests to insert the same module but loading it eventually fails
> during its init call. For instance, udev can make a request to insert
> a frequency module for each individual CPU when another frequency module
> is already loaded which causes the init function of the new module to
> return an error.
>
> The module loader currently serializes all such requests, with the
> barrier in add_unformed_module(). This creates a lot of unnecessary work
> and delays the boot. It can prevent udev from loading drivers for other
> devices and might cause timeouts of services waiting on them and
> subsequently a failed boot.
>
> The mentioned serialization was introduced as a side-effect of commit
> 6e6de3dee51a. The kernel before that merged some of same load requests
> although it was more by accident and relied on specific timing. The
> patch brings this behavior back in a more explicit form.
>
> The logic is improved as follows:
> * A check whether a module load matches an already loaded module is
> moved right after a module name is determined. -EEXIST continues to be
> returned if the module exists and is live, -EBUSY is returned if
> a same-name module is going.
Can you clarify why the EBUSY change is needed? Why not simply return
EEXIST?
If you have thread 0 loading the module and thread 1 unloading the
module concurrently, then it's pretty much unpredictable what the
outcome will be either way, no?
Add a random sleep to thread 1 (such that the module is *not* going yet)
and the result will be EEXIST.
I suggest avoiding an EBUSY change unless there is a real reason to do so.
User space that concurrently loads and unloads the same module is shaky
already, no?
> * A new reference-counted shared_load_info structure is introduced to
> keep track of duplicate load requests. Two loads are considered
> equivalent if their module name matches. In case a load duplicates
> another running insert, the code waits for its completion and then
> returns -EEXIST or -EBUSY depending on whether it succeeded.
>
> Moving the check for same-name module loads earlier has also a positive
> effect on reducing memory pressure. For instance, David Hildenbrand and
> Lin Liu reported [1] that when KASAN_INLINE is enabled (resulting in
> large module size), with plenty of devices that udev wants to probe and
> with plenty of CPUs that can carry out that probing concurrently, the
> system can actually run out of module vmap space and trigger vmap
> allocation errors. This is fixed by the patch too as it avoids duplicate
> layout_and_allocate() work.
It might be reasonable to add the kernel messages here. Can you also add
the Reported-by?
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
> Reviewed-by: Petr Mladek <[email protected]>
> Signed-off-by: Petr Pavlu <[email protected]>
> ---
> kernel/module/main.c | 217 ++++++++++++++++++++++++++++++-------------
> 1 file changed, 155 insertions(+), 62 deletions(-)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 5288843ca40f..2228c0f725e7 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -66,11 +66,28 @@
> * uses RCU list operations).
> * 2) module_use links,
> * 3) mod_tree.addr_min/mod_tree.addr_max,
> - * 4) list of unloaded_tainted_modules.
> + * 4) list of unloaded_tainted_modules,
> + * 5) list of running_loads.
> */
> DEFINE_MUTEX(module_mutex);
> LIST_HEAD(modules);
>
> +/* Shared information to track duplicate module loads. */
> +struct shared_load_info {
> + char name[MODULE_NAME_LEN];
> + refcount_t refcnt;
> + struct list_head list;
> + struct completion done;
> + int err;
> +};
> +static LIST_HEAD(running_loads);
> +
> +/*
> + * Waiting for a module load when the exact module name is not known, for
> + * example, when resolving symbols from another modules.
> + */
> +static DECLARE_WAIT_QUEUE_HEAD(module_wq);
> +
> /* Work queue for freeing init sections in success case */
> static void do_free_init(struct work_struct *w);
> static DECLARE_WORK(init_free_wq, do_free_init);
> @@ -124,9 +141,6 @@ static void mod_update_bounds(struct module *mod)
> int modules_disabled;
> core_param(nomodule, modules_disabled, bint, 0);
>
> -/* Waiting for a module to finish initializing? */
> -static DECLARE_WAIT_QUEUE_HEAD(module_wq);
> -
> static BLOCKING_NOTIFIER_HEAD(module_notify_list);
>
> int register_module_notifier(struct notifier_block *nb)
> @@ -764,8 +778,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
> strscpy(last_unloaded_module.taints, module_flags(mod, buf, false), sizeof(last_unloaded_module.taints));
>
> free_module(mod);
> - /* someone could wait for the module in add_unformed_module() */
> - wake_up_interruptible(&module_wq);
> return 0;
> out:
> mutex_unlock(&module_mutex);
> @@ -2373,26 +2385,6 @@ static int post_relocation(struct module *mod, const struct load_info *info)
> return module_finalize(info->hdr, info->sechdrs, mod);
> }
>
> -/* Is this module of this name done loading? No locks held. */
> -static bool finished_loading(const char *name)
> -{
> - struct module *mod;
> - bool ret;
> -
> - /*
> - * The module_mutex should not be a heavily contended lock;
> - * if we get the occasional sleep here, we'll go an extra iteration
> - * in the wait_event_interruptible(), which is harmless.
> - */
> - sched_annotate_sleep();
> - mutex_lock(&module_mutex);
> - mod = find_module_all(name, strlen(name), true);
> - ret = !mod || mod->state == MODULE_STATE_LIVE;
> - mutex_unlock(&module_mutex);
> -
> - return ret;
> -}
> -
> /* Call module constructors. */
> static void do_mod_ctors(struct module *mod)
> {
> @@ -2523,7 +2515,6 @@ static noinline int do_init_module(struct module *mod)
> schedule_work(&init_free_wq);
>
> mutex_unlock(&module_mutex);
> - wake_up_interruptible(&module_wq);
>
> return 0;
>
> @@ -2539,7 +2530,6 @@ static noinline int do_init_module(struct module *mod)
> klp_module_going(mod);
> ftrace_release_mod(mod);
> free_module(mod);
> - wake_up_interruptible(&module_wq);
> return ret;
> }
>
> @@ -2551,43 +2541,138 @@ static int may_init_module(void)
> return 0;
> }
>
> +static struct shared_load_info *
> +shared_load_info_alloc(const struct load_info *info)
> +{
> + struct shared_load_info *shared_info =
> + kzalloc(sizeof(*shared_info), GFP_KERNEL);
> + if (shared_info == NULL)
> + return ERR_PTR(-ENOMEM);
> +
> + strscpy(shared_info->name, info->name, sizeof(shared_info->name));
> + refcount_set(&shared_info->refcnt, 1);
> + INIT_LIST_HEAD(&shared_info->list);
> + init_completion(&shared_info->done);
> + return shared_info;
> +}
> +
> +static void shared_load_info_get(struct shared_load_info *shared_info)
> +{
> + refcount_inc(&shared_info->refcnt);
> +}
> +
> +static void shared_load_info_put(struct shared_load_info *shared_info)
> +{
> + if (refcount_dec_and_test(&shared_info->refcnt))
> + kfree(shared_info);
> +}
> +
> /*
> - * We try to place it in the list now to make sure it's unique before
> - * we dedicate too many resources. In particular, temporary percpu
> + * Check that a module load is unique and make it visible to others. The code
> + * looks for parallel running inserts and already loaded modules. Two inserts
> + * are considered equivalent if their module name matches. In case this load
> + * duplicates another running insert, the code waits for its completion and
> + * then returns -EEXIST or -EBUSY depending on whether it succeeded.
> + *
> + * Detecting early that a load is unique avoids dedicating too many cycles and
> + * resources to bring up the module. In particular, it prevents temporary percpu
> * memory exhaustion.
> + *
> + * Merging same-name load requests then primarily helps during the boot
> + * process. It can happen that the kernel receives a burst of requests to load
> + * the same module (for example, the same module for each individual CPU) and
> + * loading it eventually fails during its init call. Merging the requests
> + * ensures that only one full attempt to load the module is made.
> + *
> + * On a non-error return, it is guaranteed that this load is unique.
> */
> -static int add_unformed_module(struct module *mod)
> +static struct shared_load_info *add_running_load(const struct load_info *info)
> {
> - int err;
> struct module *old;
> + struct shared_load_info *shared_info;
>
> - mod->state = MODULE_STATE_UNFORMED;
> -
> -again:
> mutex_lock(&module_mutex);
> - old = find_module_all(mod->name, strlen(mod->name), true);
> - if (old != NULL) {
> - if (old->state != MODULE_STATE_LIVE) {
> - /* Wait in case it fails to load. */
> +
> + /* Search if there is a running load of a module with the same name. */
> + list_for_each_entry(shared_info, &running_loads, list)
> + if (strcmp(shared_info->name, info->name) == 0) {
> + int err;
> +
> + shared_load_info_get(shared_info);
> mutex_unlock(&module_mutex);
> - err = wait_event_interruptible(module_wq,
> - finished_loading(mod->name));
> - if (err)
> - goto out_unlocked;
> - goto again;
> +
> + err = wait_for_completion_interruptible(
> + &shared_info->done);
> + /*
> + * Return -EBUSY when the parallel load failed for any
> + * reason. This load might end up another way but we are
> + * not going to try.
Why not? Usually "-EAGAIN" signals that user space should retry. But I
hope that we can avoid EBUSY altogether and simply retry here.
I'd suggest shared_load_info_put()+retry.
No need to optimize for corner cases (concurrent load failing so we
don't retry ourselves).
> + */
> + if (!err)
> + err = shared_info->err ? -EBUSY : -EEXIST;
> + shared_load_info_put(shared_info);
> + shared_info = ERR_PTR(err);
> + goto out_unlocked;
> + }
> +
> + /* Search if there is a live module with the given name already. */
> + old = find_module_all(info->name, strlen(info->name), true);
> + if (old != NULL) {
> + if (old->state == MODULE_STATE_LIVE) {
> + shared_info = ERR_PTR(-EEXIST);
> + goto out;
> }
> - err = -EEXIST;
> +
> + /*
> + * Any active load always has its record in running_loads and so
> + * would be found above. This applies independently of whether such
> + * a module is currently in MODULE_STATE_UNFORMED,
> + * MODULE_STATE_COMING, or even in MODULE_STATE_GOING if its
> + * initialization failed. It therefore means this must be an
> + * older going module and the caller should try later once it is
> + * gone.
> + */
> + WARN_ON(old->state != MODULE_STATE_GOING);
> + shared_info = ERR_PTR(-EBUSY);
As raised above, why not EEXIST? Concurrent loading+unloading is racy
either way.
> goto out;
> }
> - mod_update_bounds(mod);
> - list_add_rcu(&mod->list, &modules);
> - mod_tree_insert(mod);
> - err = 0;
> +
> + /* The load is unique, make it visible to others. */
> + shared_info = shared_load_info_alloc(info);
> + if (IS_ERR(shared_info))
> + goto out;
> + list_add(&shared_info->list, &running_loads);
>
> out:
> mutex_unlock(&module_mutex);
> out_unlocked:
> - return err;
> + return shared_info;
> +}
> +
> +static void finalize_running_load(struct shared_load_info *shared_info, int err)
s/finalize/release? It would be nice if the name could correspond to an
opposite action of "add_running_load".
> +{
> + /* Inform other duplicate inserts that the load finished. */
> + mutex_lock(&module_mutex);
> + list_del(&shared_info->list);
> + shared_info->err = err;
> + mutex_unlock(&module_mutex);
> +
> + complete_all(&shared_info->done);
> + shared_load_info_put(shared_info);
> +
> + /* Tell other modules waiting on this one that it completed loading. */
> + wake_up_interruptible(&module_wq);
> +}
> +
[...]
> sysfs_cleanup:
> mod_sysfs_teardown(mod);
> @@ -2880,15 +2973,15 @@ static int load_module(struct load_info *info, const char __user *uargs,
> /* Unlink carefully: kallsyms could be walking list. */
> list_del_rcu(&mod->list);
> mod_tree_remove(mod);
> - wake_up_interruptible(&module_wq);
> /* Wait for RCU-sched synchronizing before releasing mod->list. */
> synchronize_rcu();
> mutex_unlock(&module_mutex);
> - free_module:
> /* Free lock-classes; relies on the preceding sync_rcu() */
> lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
>
> module_deallocate(mod, info);
> + free_shared:
Ideally, the label matches what's actually being done. So maybe
"release_shared" if you go with "release_..."
> + finalize_running_load(shared_info, err);
> free_copy:
> free_copy(info, flags);
> return err;
--
Thanks,
David / dhildenb
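For readers following along, the request-merging scheme in the patch can be approximated in plain userspace C. This is a rough sketch with hypothetical names, using a pthread condition variable to stand in for the kernel's struct completion and a plain mutex for module_mutex; it is not the kernel code itself:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MY_EEXIST 17
#define MY_EBUSY  16

/*
 * Userspace analogue of shared_load_info: the first request for a given
 * name performs the real work, parallel duplicates wait and inherit the
 * first request's result instead of redoing the work.
 */
struct running_load {
	char name[64];
	int refcnt;		/* protected by load_lock */
	int done;		/* the first requester finished */
	int err;		/* ... and this was its result */
	pthread_cond_t cond;	/* stands in for struct completion */
	struct running_load *next;
};

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static struct running_load *running_loads;

/* do_load() is the expensive operation; merged duplicates skip it. */
static int request_load(const char *name, int (*do_load)(const char *))
{
	struct running_load *r, **pp;
	int err;

	pthread_mutex_lock(&load_lock);
	for (r = running_loads; r; r = r->next) {
		if (strcmp(r->name, name) != 0)
			continue;
		/* Duplicate request: wait for the first one to finish. */
		r->refcnt++;
		while (!r->done)
			pthread_cond_wait(&r->cond, &load_lock);
		/* -EBUSY if the merged load failed, -EEXIST otherwise. */
		err = r->err ? -MY_EBUSY : -MY_EEXIST;
		if (--r->refcnt == 0) {
			pthread_cond_destroy(&r->cond);
			free(r);
		}
		pthread_mutex_unlock(&load_lock);
		return err;
	}

	/* Unique request: make it visible, then work without the lock. */
	r = calloc(1, sizeof(*r));
	if (r == NULL) {
		pthread_mutex_unlock(&load_lock);
		return -12;	/* -ENOMEM */
	}
	strncpy(r->name, name, sizeof(r->name) - 1);
	r->refcnt = 1;
	pthread_cond_init(&r->cond, NULL);
	r->next = running_loads;
	running_loads = r;
	pthread_mutex_unlock(&load_lock);

	err = do_load(name);

	/* Unlink the record and publish the result to any duplicates. */
	pthread_mutex_lock(&load_lock);
	for (pp = &running_loads; *pp != r; pp = &(*pp)->next)
		;
	*pp = r->next;
	r->err = err;
	r->done = 1;
	pthread_cond_broadcast(&r->cond);
	if (--r->refcnt == 0) {
		pthread_cond_destroy(&r->cond);
		free(r);
	}
	pthread_mutex_unlock(&load_lock);
	return err;
}

/* Demo helpers (not part of the scheme itself). */
static int demo_calls;
static int demo_load(const char *name)
{
	(void)name;
	__atomic_add_fetch(&demo_calls, 1, __ATOMIC_SEQ_CST);
	usleep(100000);		/* keep the load running long enough to merge */
	return 0;
}

static void *demo_thread(void *arg)
{
	*(int *)arg = request_load("modX", demo_load);
	return NULL;
}
```

The same refcounting rule as in the patch keeps the record alive: every waiter takes a reference before sleeping, so the record is only freed once the finisher and all waiters have dropped theirs.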
On Sun 2022-10-16 14:30:30, Petr Pavlu wrote:
> During a system boot, it can happen that the kernel receives a burst of
> requests to insert the same module but loading it eventually fails
> during its init call. For instance, udev can make a request to insert
> a frequency module for each individual CPU when another frequency module
> is already loaded, which causes the init function of the new module to
> return an error.
>
> The module loader currently serializes all such requests, with the
> barrier in add_unformed_module(). This creates a lot of unnecessary work
> and delays the boot. It can prevent udev from loading drivers for other
> devices and might cause timeouts of services waiting on them and
> subsequently a failed boot.
>
> The mentioned serialization was introduced as a side-effect of commit
> 6e6de3dee51a. The kernel before that merged some of the same load requests
> although it was more by accident and relied on specific timing. The
> patch brings this behavior back in a more explicit form.
>
> The logic is improved as follows:
> * A check whether a module load matches an already loaded module is
> moved right after a module name is determined. -EEXIST continues to be
> returned if the module exists and is live, -EBUSY is returned if
> a same-name module is going.
> * A new reference-counted shared_load_info structure is introduced to
> keep track of duplicate load requests. Two loads are considered
> equivalent if their module name matches. In case a load duplicates
> another running insert, the code waits for its completion and then
> returns -EEXIST or -EBUSY depending on whether it succeeded.
>
> Moving the check for same-name module loads earlier also has a positive
> effect on reducing memory pressure. For instance, David Hildenbrand and
> Lin Liu reported [1] that when KASAN_INLINE is enabled (resulting in
> large module size), with plenty of devices that udev wants to probe and
> with plenty of CPUs that can carry out that probing concurrently, the
> system can actually run out of module vmap space and trigger vmap
> allocation errors. This is fixed by the patch too as it avoids duplicate
> layout_and_allocate() work.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
> Reviewed-by: Petr Mladek <[email protected]>
> Signed-off-by: Petr Pavlu <[email protected]>
All these tags should be in the order of (logical) appearance.
Something like:
Suggested-by: if any
Signed-off-by: author of the patch
Acked-by: in order of appearance
Reviewed-by: in order of appearance
Tested-by: in order of appearance
Signed-off-by: committer (upstream maintainer)
I know that I provided Reviewed-by before you did the final changes.
Well, you should use my tag only when you made just cosmetic changes or
changes that I requested, so that one can expect that I would still
agree with the updated patch.
Best Regards,
Petr
On Sun 2022-10-16 14:30:31, Petr Pavlu wrote:
> Add two tests to check that loading the same module multiple times in
> parallel results in only one real attempt to initialize it.
> Synchronization of the loads is done by waiting 1000 ms in the init
I do not have good experience with this kind of synchronization.
It is usually not reliable. The test might also be very slow, especially
when false positives are worked around by prolonging the delay.
Alternative solution would be to have two modules:
1st module would provide a counter, for example:
int modB_load_cnt;
module_param(modB_load_cnt, int, 0444);
EXPORT_SYMBOL(modB_load_cnt);
EXPORT_SYMBOL() should allow the 2nd module to directly increment
the counter.
module_param() should make the value readable via
/sys/module/modA/parameters/modB_load_cnt, where it can be
checked by kmod.sh.
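A rough sketch of the two suggested modules (hypothetical file and symbol names, untested kernel-module fragments, build files omitted):

```c
/* modA.c: owns the counter and exposes it via /sys/module/modA/parameters/ */
#include <linux/module.h>

int modB_load_cnt;
EXPORT_SYMBOL(modB_load_cnt);
module_param(modB_load_cnt, int, 0444);
MODULE_LICENSE("GPL");

/* modB.c: bumps the counter on every init attempt */
#include <linux/module.h>

extern int modB_load_cnt;

static int __init modB_init(void)
{
	modB_load_cnt++;	/* visible to the test script through modA */
	return -ENODEV;		/* simulate the failing-init case (test 0015) */
}
module_init(modB_init);
MODULE_LICENSE("GPL");
```

After N parallel inserts of modB, the script would assert that modB_load_cnt reads 1, proving the loads were merged rather than serialized.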
> function of a sample module kmod_test_0014. The tests measure the time
> needed to perform all inserts to verify that the loads get merged by the
> module loader and are not serialized.
>
> * Case 0014 checks a situation when the load is successful. It should
> result in one insert returning 0 and remaining inserts returning
> EEXIST.
> * Case 0015 checks a situation when the load is failing because the
> module init function returns ENODEV. It should result in one insert
> returning this error code and remaining inserts returning EBUSY.
>
> The tests use a simple init_module program to load kmod_test_0014.ko. This
> makes it possible to directly obtain the return code of the finit_module
> syscall.
The tested scenarios look good.
Best Regards,
Petr
On 10/17/22 09:43, David Hildenbrand wrote:
> On 16.10.22 14:30, Petr Pavlu wrote:
>> During a system boot, it can happen that the kernel receives a burst of
>> requests to insert the same module but loading it eventually fails
>> during its init call. For instance, udev can make a request to insert
>> a frequency module for each individual CPU when another frequency module
>> is already loaded, which causes the init function of the new module to
>> return an error.
>>
>> The module loader currently serializes all such requests, with the
>> barrier in add_unformed_module(). This creates a lot of unnecessary work
>> and delays the boot. It can prevent udev from loading drivers for other
>> devices and might cause timeouts of services waiting on them and
>> subsequently a failed boot.
>>
>> The mentioned serialization was introduced as a side-effect of commit
>> 6e6de3dee51a. The kernel before that merged some of the same load requests
>> although it was more by accident and relied on specific timing. The
>> patch brings this behavior back in a more explicit form.
>>
>> The logic is improved as follows:
>> * A check whether a module load matches an already loaded module is
>> moved right after a module name is determined. -EEXIST continues to be
>> returned if the module exists and is live, -EBUSY is returned if
>> a same-name module is going.
>
> Can you clarify why the EBUSY change is needed? Why not simply return
> EEXIST?
>
> If you have thread 0 loading the module and thread 1 unloading the
> module concurrently, then it's pretty much unpredictable what the
> outcome will be either way, no?
>
> Add a random sleep to thread 1 (such that the module is *not* going yet)
> and the result will be EEXIST.
>
> I suggest avoiding a EBUSY change unless there is real reason to do so.
>
> User space that concurrently loads and unloads the same module is shaky
> already, no?
I cannot quite think of a scenario where it would practically matter how this
corner case is handled. Prior to 6e6de3dee51a ("kernel/module.c: Only return
-EEXIST for modules that have finished loading"), an init_module() call would
have returned EEXIST in this case. After the mentioned commit, the loader
waits for the old module to be gone and then proceeds with the load. Finally,
this patch changes init_module() to immediately return EBUSY.
With the proposed changes, EEXIST and EBUSY are used as follows:
* EEXIST is returned from init_module() if a given module is already loaded or
becomes live by a parallel load.
* EBUSY is returned if a concurrent operation is detected on a module with the
same name and the module is not live. This applies to both init_module() and
delete_module().
I think it is generally a good idea to return EEXIST from init_module() only
if a given module is fully operational. Userspace (udev) typically handles
EEXIST as "success" and so there is some potential for confusion otherwise.
However, I don't feel strongly about this particular case.
>> * A new reference-counted shared_load_info structure is introduced to
>> keep track of duplicate load requests. Two loads are considered
>> equivalent if their module name matches. In case a load duplicates
>> another running insert, the code waits for its completion and then
>> returns -EEXIST or -EBUSY depending on whether it succeeded.
>>
>> Moving the check for same-name module loads earlier also has a positive
>> effect on reducing memory pressure. For instance, David Hildenbrand and
>> Lin Liu reported [1] that when KASAN_INLINE is enabled (resulting in
>> large module size), with plenty of devices that udev wants to probe and
>> with plenty of CPUs that can carry out that probing concurrently, the
>> system can actually run out of module vmap space and trigger vmap
>> allocation errors. This is fixed by the patch too as it avoids duplicate
>> layout_and_allocate() work.
>
>> It might be reasonable to add the kernel messages here. Can you also add
> the Reported-by?
Ok, I avoided adding the Reported-by tag because I was not sure how to
properly record that it applies only to the vmap allocation issue. I suspect
it can be clarified after the tag in a "[...]" note.
My plan is to add the following:
[ 165.842123] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.843359] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.844894] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.847028] CPU: 253 PID: 4995 Comm: systemd-udevd Not tainted 5.19.0 #2
[ 165.935689] Hardware name: Lenovo ThinkSystem SR950 -[7X12ABC1WW]-/-[7X12ABC1WW]-, BIOS -[PSE130O-1.81]- 05/20/2020
[ 165.947343] Call Trace:
[ 165.950075] <TASK>
[ 165.952425] dump_stack_lvl+0x57/0x81
[ 165.956532] warn_alloc.cold+0x95/0x18a
[ 165.981219] __vmalloc_node_range+0x291/0x560
[ 166.003741] module_alloc+0xe7/0x170
[ 166.011840] move_module+0x4c/0x630
[ 166.015751] layout_and_allocate+0x32c/0x560
[ 166.020519] load_module+0x8e0/0x25c0
[ 166.053854] __do_sys_finit_module+0x11a/0x1c0
[ 166.068494] do_syscall_64+0x59/0x90
[ 166.095855] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 165.818200] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
Reported-by: Lin Liu <[email protected]>
Reported-by: David Hildenbrand <[email protected]>
[the vmap allocation issue]
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>>
>> Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
>> Reviewed-by: Petr Mladek <[email protected]>
>> Signed-off-by: Petr Pavlu <[email protected]>
>> ---
>> kernel/module/main.c | 217 ++++++++++++++++++++++++++++++-------------
>> 1 file changed, 155 insertions(+), 62 deletions(-)
>>
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index 5288843ca40f..2228c0f725e7 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -66,11 +66,28 @@
>> * uses RCU list operations).
>> * 2) module_use links,
>> * 3) mod_tree.addr_min/mod_tree.addr_max,
>> - * 4) list of unloaded_tainted_modules.
>> + * 4) list of unloaded_tainted_modules,
>> + * 5) list of running_loads.
>> */
>> DEFINE_MUTEX(module_mutex);
>> LIST_HEAD(modules);
>>
>> +/* Shared information to track duplicate module loads. */
>> +struct shared_load_info {
>> + char name[MODULE_NAME_LEN];
>> + refcount_t refcnt;
>> + struct list_head list;
>> + struct completion done;
>> + int err;
>> +};
>> +static LIST_HEAD(running_loads);
>> +
>> +/*
>> + * Waiting for a module load when the exact module name is not known, for
>> + * example, when resolving symbols from other modules.
>> + */
>> +static DECLARE_WAIT_QUEUE_HEAD(module_wq);
>> +
>> /* Work queue for freeing init sections in success case */
>> static void do_free_init(struct work_struct *w);
>> static DECLARE_WORK(init_free_wq, do_free_init);
>> @@ -124,9 +141,6 @@ static void mod_update_bounds(struct module *mod)
>> int modules_disabled;
>> core_param(nomodule, modules_disabled, bint, 0);
>>
>> -/* Waiting for a module to finish initializing? */
>> -static DECLARE_WAIT_QUEUE_HEAD(module_wq);
>> -
>> static BLOCKING_NOTIFIER_HEAD(module_notify_list);
>>
>> int register_module_notifier(struct notifier_block *nb)
>> @@ -764,8 +778,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>> strscpy(last_unloaded_module.taints, module_flags(mod, buf, false), sizeof(last_unloaded_module.taints));
>>
>> free_module(mod);
>> - /* someone could wait for the module in add_unformed_module() */
>> - wake_up_interruptible(&module_wq);
>> return 0;
>> out:
>> mutex_unlock(&module_mutex);
>> @@ -2373,26 +2385,6 @@ static int post_relocation(struct module *mod, const struct load_info *info)
>> return module_finalize(info->hdr, info->sechdrs, mod);
>> }
>>
>> -/* Is this module of this name done loading? No locks held. */
>> -static bool finished_loading(const char *name)
>> -{
>> - struct module *mod;
>> - bool ret;
>> -
>> - /*
>> - * The module_mutex should not be a heavily contended lock;
>> - * if we get the occasional sleep here, we'll go an extra iteration
>> - * in the wait_event_interruptible(), which is harmless.
>> - */
>> - sched_annotate_sleep();
>> - mutex_lock(&module_mutex);
>> - mod = find_module_all(name, strlen(name), true);
>> - ret = !mod || mod->state == MODULE_STATE_LIVE;
>> - mutex_unlock(&module_mutex);
>> -
>> - return ret;
>> -}
>> -
>> /* Call module constructors. */
>> static void do_mod_ctors(struct module *mod)
>> {
>> @@ -2523,7 +2515,6 @@ static noinline int do_init_module(struct module *mod)
>> schedule_work(&init_free_wq);
>>
>> mutex_unlock(&module_mutex);
>> - wake_up_interruptible(&module_wq);
>>
>> return 0;
>>
>> @@ -2539,7 +2530,6 @@ static noinline int do_init_module(struct module *mod)
>> klp_module_going(mod);
>> ftrace_release_mod(mod);
>> free_module(mod);
>> - wake_up_interruptible(&module_wq);
>> return ret;
>> }
>>
>> @@ -2551,43 +2541,138 @@ static int may_init_module(void)
>> return 0;
>> }
>>
>> +static struct shared_load_info *
>> +shared_load_info_alloc(const struct load_info *info)
>> +{
>> + struct shared_load_info *shared_info =
>> + kzalloc(sizeof(*shared_info), GFP_KERNEL);
>> + if (shared_info == NULL)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + strscpy(shared_info->name, info->name, sizeof(shared_info->name));
>> + refcount_set(&shared_info->refcnt, 1);
>> + INIT_LIST_HEAD(&shared_info->list);
>> + init_completion(&shared_info->done);
>> + return shared_info;
>> +}
>> +
>> +static void shared_load_info_get(struct shared_load_info *shared_info)
>> +{
>> + refcount_inc(&shared_info->refcnt);
>> +}
>> +
>> +static void shared_load_info_put(struct shared_load_info *shared_info)
>> +{
>> + if (refcount_dec_and_test(&shared_info->refcnt))
>> + kfree(shared_info);
>> +}
>> +
>> /*
>> - * We try to place it in the list now to make sure it's unique before
>> - * we dedicate too many resources. In particular, temporary percpu
>> + * Check that a module load is unique and make it visible to others. The code
>> + * looks for parallel running inserts and already loaded modules. Two inserts
>> + * are considered equivalent if their module name matches. In case this load
>> + * duplicates another running insert, the code waits for its completion and
>> + * then returns -EEXIST or -EBUSY depending on whether it succeeded.
>> + *
>> + * Detecting early that a load is unique avoids dedicating too many cycles and
>> + * resources to bring up the module. In particular, it prevents temporary percpu
>> * memory exhaustion.
>> + *
>> + * Merging same-name load requests then primarily helps during the boot
>> + * process. It can happen that the kernel receives a burst of requests to load
>> + * the same module (for example, the same module for each individual CPU) and
>> + * loading it eventually fails during its init call. Merging the requests
>> + * ensures that only one full attempt to load the module is made.
>> + *
>> + * On a non-error return, it is guaranteed that this load is unique.
>> */
>> -static int add_unformed_module(struct module *mod)
>> +static struct shared_load_info *add_running_load(const struct load_info *info)
>> {
>> - int err;
>> struct module *old;
>> + struct shared_load_info *shared_info;
>>
>> - mod->state = MODULE_STATE_UNFORMED;
>> -
>> -again:
>> mutex_lock(&module_mutex);
>> - old = find_module_all(mod->name, strlen(mod->name), true);
>> - if (old != NULL) {
>> - if (old->state != MODULE_STATE_LIVE) {
>> - /* Wait in case it fails to load. */
>> +
>> + /* Search if there is a running load of a module with the same name. */
>> + list_for_each_entry(shared_info, &running_loads, list)
>> + if (strcmp(shared_info->name, info->name) == 0) {
>> + int err;
>> +
>> + shared_load_info_get(shared_info);
>> mutex_unlock(&module_mutex);
>> - err = wait_event_interruptible(module_wq,
>> - finished_loading(mod->name));
>> - if (err)
>> - goto out_unlocked;
>> - goto again;
>> +
>> + err = wait_for_completion_interruptible(
>> + &shared_info->done);
>> + /*
>> + * Return -EBUSY when the parallel load failed for any
>> + * reason. This load might end up another way but we are
>> + * not going to try.
>
> Why not? Usually "-EAGAIN" signals that user space should retry. But I
> hope that we can avoid EBUSY altogether and simply retry here.
>
> I'd suggest shared_load_info_put()+retry.
>
> No need to optimize for corner cases (concurrent load failing so we
> don't retry ourselves).
Avoiding a retry in this case is actually the main motivation for this patch.
It looks like I'm still failing to explain this in the commit message, but please
see my replies on previous versions of the patch where I provided more details
about the observed issue [1, 2].
Worth noting is that both your scenario and my case are situations where
the same module is attempted to be loaded multiple times, once for each CPU.
Even if only one attempt is eventually fully processed, the decision that
other parallel loads are not needed happens quite late. In particular, udev
(libkmod) still has to load and decompress a given module binary multiple
times. Ideally, I think this should be prevented altogether by improving other
parts of the whole process. Udev could be made smarter to avoid duplicate
loads or the kernel could model uevents related to CPUs differently. This is
something that I was also considering but eventually settled on trying to fix
only the immediate kernel regression.
>> + */
>> + if (!err)
>> + err = shared_info->err ? -EBUSY : -EEXIST;
>> + shared_load_info_put(shared_info);
>> + shared_info = ERR_PTR(err);
>> + goto out_unlocked;
>> + }
>> +
>> + /* Search if there is a live module with the given name already. */
>> + old = find_module_all(info->name, strlen(info->name), true);
>> + if (old != NULL) {
>> + if (old->state == MODULE_STATE_LIVE) {
>> + shared_info = ERR_PTR(-EEXIST);
>> + goto out;
>> }
>> - err = -EEXIST;
>> +
>> + /*
>> + * Any active load always has its record in running_loads and so
>> + * would be found above. This applies independently of whether such
>> + * a module is currently in MODULE_STATE_UNFORMED,
>> + * MODULE_STATE_COMING, or even in MODULE_STATE_GOING if its
>> + * initialization failed. It therefore means this must be an
>> + * older going module and the caller should try later once it is
>> + * gone.
>> + */
>> + WARN_ON(old->state != MODULE_STATE_GOING);
>> + shared_info = ERR_PTR(-EBUSY);
>
> As raised above, why not EEXIST? Concurrent loading+unloading is racy
> either way.
Discussed above.
>> goto out;
>> }
>> - mod_update_bounds(mod);
>> - list_add_rcu(&mod->list, &modules);
>> - mod_tree_insert(mod);
>> - err = 0;
>> +
>> + /* The load is unique, make it visible to others. */
>> + shared_info = shared_load_info_alloc(info);
>> + if (IS_ERR(shared_info))
>> + goto out;
>> + list_add(&shared_info->list, &running_loads);
>>
>> out:
>> mutex_unlock(&module_mutex);
>> out_unlocked:
>> - return err;
>> + return shared_info;
>> +}
>> +
>> +static void finalize_running_load(struct shared_load_info *shared_info, int err)
>
> s/finalize/release? It would be nice if the name could correspond to an
> opposite action of "add_running_load".
>
Ok, I can change this.
Note though that functions add_running_load() and finalize_running_load() are
not exactly opposite actions. I tried to avoid using "release" because it
typically suggests to me a simple freeing of resources, while
finalize_running_load() still does some important work.
>> +{
>> + /* Inform other duplicate inserts that the load finished. */
>> + mutex_lock(&module_mutex);
>> + list_del(&shared_info->list);
>> + shared_info->err = err;
>> + mutex_unlock(&module_mutex);
>> +
>> + complete_all(&shared_info->done);
>> + shared_load_info_put(shared_info);
>> +
>> + /* Tell other modules waiting on this one that it completed loading. */
>> + wake_up_interruptible(&module_wq);
>> +}
>> +
>
>
> [...]
>
>> sysfs_cleanup:
>> mod_sysfs_teardown(mod);
>> @@ -2880,15 +2973,15 @@ static int load_module(struct load_info *info, const char __user *uargs,
>> /* Unlink carefully: kallsyms could be walking list. */
>> list_del_rcu(&mod->list);
>> mod_tree_remove(mod);
>> - wake_up_interruptible(&module_wq);
>> /* Wait for RCU-sched synchronizing before releasing mod->list. */
>> synchronize_rcu();
>> mutex_unlock(&module_mutex);
>> - free_module:
>> /* Free lock-classes; relies on the preceding sync_rcu() */
>> lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
>>
>> module_deallocate(mod, info);
>> + free_shared:
>
> Ideally, the label matches what's actually being done. So maybe
> "release_shared" if you go with "release_..."
Ok.
[1] https://lore.kernel.org/linux-modules/[email protected]/
[2] https://lore.kernel.org/linux-modules/[email protected]/
Thanks,
Petr
>> User space that concurrently loads and unloads the same module is shaky
>> already, no?
>
> I cannot quite think of a scenario where it would practically matter how this
> corner case is handled. Prior to 6e6de3dee51a ("kernel/module.c: Only return
> -EEXIST for modules that have finished loading"), an init_module() call would
> have returned EEXIST in this case. After the mentioned commit, the loader
> waits for the old module to be gone and then proceeds with the load. Finally,
> this patch changes init_module() to immediately return EBUSY.
>
> With the proposed changes, EEXIST and EBUSY are used as follows:
> * EEXIST is returned from init_module() if a given module is already loaded or
> becomes live by a parallel load.
> * EBUSY is returned if a concurrent operation is detected on a module with the
> same name and the module is not live. This applies to both init_module() and
> delete_module().
>
> I think it is generally a good idea to return EEXIST from init_module() only
> if a given module is fully operational. Userspace (udev) typically handles
> EEXIST as "success" and so there is some potential for confusion otherwise.
>
> However, I don't feel strongly about this particular case.
>
I'd vote to keep it simple and not change the way errors are returned
unless there is a real reason.
EBUSY is currently documented to be only returned for "Timeout while
trying to resolve a symbol reference by this module.". You're changing
that.
EEXIST: "A module with this name is already loaded." -- which includes
IMHO if the module is concurrently going away. Again, it's all racy
either way.
>>> * A new reference-counted shared_load_info structure is introduced to
>>> keep track of duplicate load requests. Two loads are considered
>>> equivalent if their module name matches. In case a load duplicates
>>> another running insert, the code waits for its completion and then
>>> returns -EEXIST or -EBUSY depending on whether it succeeded.
>>>
>>> Moving the check for same-name module loads earlier also has a positive
>>> effect on reducing memory pressure. For instance, David Hildenbrand and
>>> Lin Liu reported [1] that when KASAN_INLINE is enabled (resulting in
>>> large module size), with plenty of devices that udev wants to probe and
>>> with plenty of CPUs that can carry out that probing concurrently, the
>>> system can actually run out of module vmap space and trigger vmap
>>> allocation errors. This is fixed by the patch too as it avoids duplicate
>>> layout_and_allocate() work.
>>
>> It might be reasonable to add the kernel messages here. Can you also add
>> the Reported-by?
>
> Ok, I avoided adding the Reported-by tag because I was not sure how to
> properly record that it applies only to the vmap allocation issue. I suspect
> it can be clarified after the tag in a "[...]" note.
>
> My plan is to add the following:
>
> [ 165.842123] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
> [ 165.843359] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
> [ 165.844894] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
> [ 165.847028] CPU: 253 PID: 4995 Comm: systemd-udevd Not tainted 5.19.0 #2
> [ 165.935689] Hardware name: Lenovo ThinkSystem SR950 -[7X12ABC1WW]-/-[7X12ABC1WW]-, BIOS -[PSE130O-1.81]- 05/20/2020
> [ 165.947343] Call Trace:
> [ 165.950075] <TASK>
> [ 165.952425] dump_stack_lvl+0x57/0x81
> [ 165.956532] warn_alloc.cold+0x95/0x18a
> [ 165.981219] __vmalloc_node_range+0x291/0x560
> [ 166.003741] module_alloc+0xe7/0x170
> [ 166.011840] move_module+0x4c/0x630
> [ 166.015751] layout_and_allocate+0x32c/0x560
> [ 166.020519] load_module+0x8e0/0x25c0
> [ 166.053854] __do_sys_finit_module+0x11a/0x1c0
> [ 166.068494] do_syscall_64+0x59/0x90
> [ 166.095855] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 165.818200] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
>
> Reported-by: Lin Liu <[email protected]>
> Reported-by: David Hildenbrand <[email protected]>
> [the vmap allocation issue]
Sounds good. In addition, you can also mention in the text which issue
was reported by whom.
[...]
>>> -static int add_unformed_module(struct module *mod)
>>> +static struct shared_load_info *add_running_load(const struct load_info *info)
>>> {
>>> - int err;
>>> struct module *old;
>>> + struct shared_load_info *shared_info;
>>>
>>> - mod->state = MODULE_STATE_UNFORMED;
>>> -
>>> -again:
>>> mutex_lock(&module_mutex);
>>> - old = find_module_all(mod->name, strlen(mod->name), true);
>>> - if (old != NULL) {
>>> - if (old->state != MODULE_STATE_LIVE) {
>>> - /* Wait in case it fails to load. */
>>> +
>>> + /* Search if there is a running load of a module with the same name. */
>>> + list_for_each_entry(shared_info, &running_loads, list)
>>> + if (strcmp(shared_info->name, info->name) == 0) {
>>> + int err;
>>> +
>>> + shared_load_info_get(shared_info);
>>> mutex_unlock(&module_mutex);
>>> - err = wait_event_interruptible(module_wq,
>>> - finished_loading(mod->name));
>>> - if (err)
>>> - goto out_unlocked;
>>> - goto again;
>>> +
>>> + err = wait_for_completion_interruptible(
>>> + &shared_info->done);
>>> + /*
>>> + * Return -EBUSY when the parallel load failed for any
>>> + * reason. This load might end up another way but we are
>>> + * not going to try.
>>
>> Why not? Usually "-EAGAIN" signals that user space should retry. But I
>> hope that we can avoid EBUSY altogether and simply retry here.
>>
>> I'd suggest shared_load_info_put()+retry.
>>
>> No need to optimize for corner cases (concurrent load failing so we
>> don't retry ourselves).
>
> Avoiding a retry in this case is actually the main motivation for this patch.
> It looks I'm still failing to explain this in the commit message, but please
> see my replies on previous versions of the patch where I provided more details
> about the observed issue [1, 2].
Why would the common case we care about be one where a parallel load
*failed* (not with -EEXIST but with some other error)?
That would mean we're optimizing for the case that 400 CPUs try loading
the same module and loading the module essentially always fails.
Is this really what we want to optimize?
Isn't there a way to not report EBUSY in that case as well? Return the
error from the other load that failed?
>
> Worth noting is that both your scenario and my case are situations where
> the same module load is attempted multiple times, once per CPU.
> Even if only one attempt is eventually fully processed, the decision that
> other parallel loads are not needed happens quite late. In particular, udev
> (libkmod) still has to load and decompress a given module binary multiple
> times. Ideally, I think this should be prevented altogether by improving other
> parts of the whole process. Udev could be made smarter to avoid duplicate
> loads or the kernel could model uevents related to CPUs differently. This is
> something that I was also considering but eventually settled on trying to fix
> only the immediate kernel regression.
Yes, exactly same thoughts here.
--
Thanks,
David / dhildenb
On Sun, Oct 16, 2022 at 02:30:31PM +0200, Petr Pavlu wrote:
> diff --git a/tools/testing/selftests/kmod/init_module.c b/tools/testing/selftests/kmod/init_module.c
> @@ -525,6 +662,8 @@ list_tests()
> echo "0011 x $(get_test_count 0011) - test completely disabling module autoloading"
> echo "0012 x $(get_test_count 0012) - test /proc/modules address visibility under CAP_SYSLOG"
> echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* visibility under CAP_SYSLOG"
> + echo "0014 x $(get_test_count 0014) - test handling of parallel loads, success case"
> + echo "0015 x $(get_test_count 0015) - test handling of parallel loads, init returning error"
Good stuff!
So test 0015 mimics the error reported by Prarit Bhargava through
commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules
that have finished loading")? If so it would be good to document that
here.
Also, this patch should go first, with the ALL_TESTS variable set to
disable the tests which are known to fail, so as to demonstrate the *new*
issues as they are failing; then your fix; and then you enable those
tests afterwards.
If 6e6de3dee51a introduced another regression which is not covered
by the tests we'd add it, and fix on top of it as a stepping stone
to prove / send to stable.
Luis
On Tue, Oct 25, 2022 at 10:51:14AM -0700, Luis Chamberlain wrote:
> On Sun, Oct 16, 2022 at 02:30:31PM +0200, Petr Pavlu wrote:
> > diff --git a/tools/testing/selftests/kmod/init_module.c b/tools/testing/selftests/kmod/init_module.c
> > @@ -525,6 +662,8 @@ list_tests()
> > echo "0011 x $(get_test_count 0011) - test completely disabling module autoloading"
> > echo "0012 x $(get_test_count 0012) - test /proc/modules address visibility under CAP_SYSLOG"
> > echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* visibility under CAP_SYSLOG"
> > + echo "0014 x $(get_test_count 0014) - test handling of parallel loads, success case"
> > + echo "0015 x $(get_test_count 0015) - test handling of parallel loads, init returning error"
>
> Good stuff!
>
> So test 0015 mimics the error reported by Prarit Bhargava through
> commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules
> that have finished loading")? If so it would be good to document that
> here.
>
> Also, this patch should go first, with the ALL_TESTS variable set to
> disable the tests which are known to fail, so as to demonstrate the *new*
> issues as they are failing; then your fix; and then you enable those
> tests afterwards.
>
> If 6e6de3dee51a introduced another regression which is not covered
> by the tests we'd add it, and fix on top of it as a stepping stone
> to prove / send to stable.
On second thought, this could also be made into a self-contained C code
test within lib/test_kmod.c:
* You'd extend kmod_test_case with a new enum value, say TEST_KMOD_INIT_RACES
* You could extend run_request() to run the series of tests required.
* request_module() does not allow you to specify module parameters,
but we could have optional support added, and make the existing
request_module() always pass that empty / NULL.
Kind of like lib/test_vmalloc.c which has an array of tests, you can peg
new tests to the TEST_KMOD_INIT_RACES for it.
Let me know what folks think.
Luis
[A different fix than the one from this thread was selected, but it is still
useful to discuss these test cases and whether they should be added in some form.]
On 10/17/22 15:51, Petr Mladek wrote:
> On Sun 2022-10-16 14:30:31, Petr Pavlu wrote:
>> Add two tests to check that loading the same module multiple times in
>> parallel results only in one real attempt to initialize it.
>> Synchronization of the loads is done by waiting 1000 ms in the init
>
> I do not have a good experience with this kind of synchronization.
> It usually is not reliable. The test might be very slow especially when
> false positives are solved by prolonging the delay.
>
> Alternative solution would be to have two modules:
>
> 1st module would provide a counter, for example:
>
> int modB_load_cnt;
> module_param(modB_load_cnt, int, 0444);
> EXPORT_SYMBOL(modB_load_cnt);
>
> EXPORT_SYMBOL() should allow to directly increment the counter
> from the 2nd module.
>
> module_param() should make the value readable via
> /sys/module/modA/parameters/modB_load_cnt. It can be
> checked by kmod.sh.
I agree that it would be best to avoid any synchronization based on timeouts
in these tests.
My reading is that your idea would let the tests stop measuring how long
it took in total to process all module inserts. I was able to implement
this change.
It unfortunately doesn't help with the 1 second timeout that the
kmod_test_0014 module (modB in your description) has in its init function. Its
purpose is to make sure that any parallel loads of the same module which were
started by kmod.sh manage to reach add_unformed_module(), sleep there and
therefore hit the updated logic.
One option to avoid this timeout is to extend modA to register a kprobe on
finished_loading() and export, via a module parameter, which of the loads
started by kmod.sh have reached this point. This approach works ok on my
system and a prototype is pasted at the end of this mail. Two shortcomings
are that it relies on internal knowledge of the module loader code and that
finished_loading() might not always be available for probing, as it could
get inlined in some configurations.
To summarize, I see the following options for these tests:
* Use a timeout to synchronize the loads.
* Use the outlined kprobe approach.
* Don't add these tests at all.
Any opinions what would be preferred? I'm leaning towards not adding these
tests as they look fragile to me.
Thanks,
Petr
---
diff --git a/tools/testing/selftests/kmod/.gitignore b/tools/testing/selftests/kmod/.gitignore
new file mode 100644
index 000000000000..ea3afcaccc79
--- /dev/null
+++ b/tools/testing/selftests/kmod/.gitignore
@@ -0,0 +1 @@
+init_module
diff --git a/tools/testing/selftests/kmod/Makefile b/tools/testing/selftests/kmod/Makefile
index 5b3e746a0bee..1c684ac700b5 100644
--- a/tools/testing/selftests/kmod/Makefile
+++ b/tools/testing/selftests/kmod/Makefile
@@ -1,12 +1,20 @@
# SPDX-License-Identifier: GPL-2.0-only
# Makefile for kmod loading selftests
-# No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
-all:
-
TEST_PROGS := kmod.sh
+# compile but not part of 'make run_tests'
+TEST_GEN_PROGS_EXTENDED := \
+ init_module test_kmod_parallel_count.ko test_kmod_parallel_sleep.ko
+
+# override lib.mk's default rule
+override define CLEAN
+ $(RM) $(TEST_GEN_PROGS_EXTENDED)
+ $(MAKE) -C test_kmod_parallel clean
+endef
+
include ../lib.mk
-# Nothing to clean up.
-clean:
+$(OUTPUT)/test_kmod_parallel_count.ko $(OUTPUT)/test_kmod_parallel_sleep.ko: $(wildcard test_kmod_parallel/Makefile test_kmod_parallel/*.[ch])
+ $(MAKE) -C test_kmod_parallel
+ cp test_kmod_parallel/$(notdir $@) $@
diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
index 259f4fd6b5e2..704c4735686c 100644
--- a/tools/testing/selftests/kmod/config
+++ b/tools/testing/selftests/kmod/config
@@ -5,3 +5,7 @@ CONFIG_XFS_FS=m
# For the module parameter force_init_test is used
CONFIG_TUN=m
CONFIG_BTRFS_FS=m
+
+# For the module test_kmod_parallel
+CONFIG_KALLSYMS=y
+CONFIG_KPROBES=y
diff --git a/tools/testing/selftests/kmod/init_module.c b/tools/testing/selftests/kmod/init_module.c
new file mode 100644
index 000000000000..529abd0a8357
--- /dev/null
+++ b/tools/testing/selftests/kmod/init_module.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Simple program to insert a module, similar to insmod but tailored
+ * specifically for the needs of module loader tests.
+ */
+
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <errno.h>
+
+int main(int argc, char *argv[])
+{
+ const char *filename, *args;
+ int fd, ret;
+
+ if (argc < 2 || argc > 3) {
+ fprintf(stderr, "usage: %s filename [args]\n", argv[0]);
+ return EINVAL;
+ }
+
+ filename = argv[1];
+ args = argc > 2 ? argv[2] : "";
+
+ fd = open(filename, O_RDONLY);
+ if (fd == -1) {
+ ret = errno;
+ fprintf(stderr, "init_module: could not open file %s: %s\n",
+ filename, strerror(ret));
+ return ret;
+ }
+
+ ret = syscall(SYS_finit_module, fd, args, 0);
+ if (ret != 0) {
+ ret = errno;
+ fprintf(stderr, "init_module: could not load module %s: %s\n",
+ filename, strerror(ret));
+ }
+
+ close(fd);
+
+ return ret;
+}
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index afd42387e8b2..8ca82b16f165 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -65,6 +65,8 @@ ALL_TESTS="$ALL_TESTS 0010:1:1"
ALL_TESTS="$ALL_TESTS 0011:1:1"
ALL_TESTS="$ALL_TESTS 0012:1:1"
ALL_TESTS="$ALL_TESTS 0013:1:1"
+ALL_TESTS="$ALL_TESTS 0014:1:1"
+ALL_TESTS="$ALL_TESTS 0015:1:1"
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
@@ -171,6 +173,12 @@ errno_name_to_val()
echo -1;;
-ENOENT)
echo -2;;
+ -EBUSY)
+ echo -16;;
+ -EEXIST)
+ echo -17;;
+ -ENODEV)
+ echo -19;;
-EINVAL)
echo -22;;
-ERR_ANY)
@@ -190,6 +198,12 @@ errno_val_to_name()
echo -EPERM;;
-2)
echo -ENOENT;;
+ -16)
+ echo -EBUSY;;
+ -17)
+ echo -EEXIST;;
+ -19)
+ echo -ENODEV;;
-22)
echo -EINVAL;;
-123456)
@@ -504,6 +518,135 @@ kmod_test_0013()
"cat /sys/module/${DEFAULT_KMOD_DRIVER}/sections/.*text | head -n1"
}
+# Check that loading the same module multiple times in parallel results only in
+# one real attempt to initialize it. The function utilizes two sample modules
+# test_kmod_parallel_count and test_kmod_parallel_sleep:
+# the count module exports a wait function that blocks callers until its
+# "state" parameter is set to 1 and records how many times it was entered,
+# and the sleep module calls this wait function from its init and then
+# returns the value of its init_retval parameter.
+#
+# Cases:
+# * Test 0014 checks a situation when the load is successful. It should result
+# in one insert returning 0 and remaining inserts returning EEXIST.
+# * Test 0015 checks a situation when the load is failing because the module
+# init function returns ENODEV. It should result in one insert returning this
+# error code and remaining inserts returning EBUSY.
+kmod_check_parallel_loads()
+{
+ local test_name="$1"
+ local parallel_loads="$2"
+ local insmod_params="$3"
+ local success_exp="$4"
+ local ebusy_exp="$5"
+ local eexist_exp="$6"
+ local enodev_exp="$7"
+ local other_exp="$8"
+
+ local EBUSY=$(errno_name_to_val -EBUSY)
+ local EEXIST=$(errno_name_to_val -EEXIST)
+ local ENODEV=$(errno_name_to_val -ENODEV)
+ local init_cnt pids hits rc
+ local success_cnt=0 ebusy_cnt=0 eexist_cnt=0 enodev_cnt=0 other_cnt=0
+ local test_ok=true
+
+ # Load the counter module
+ ./init_module test_kmod_parallel_count.ko
+
+ # Run the parallel loads.
+ for i in $(seq 0 $(($parallel_loads - 1))); do
+ (echo $BASHPID > /sys/module/test_kmod_parallel_count/parameters/pids_watched;
+ exec ./init_module test_kmod_parallel_sleep.ko "$insmod_params") &
+ pids[$i]=$!
+ done
+
+ # Wait for all secondary loads to block in add_unformed_module().
+ while true; do
+ hits=( $(cat /sys/module/test_kmod_parallel_count/parameters/pids_hit) )
+ (( ${#hits[@]} >= $parallel_loads - 1 )) && break
+ sleep 0.1
+ done
+ echo > /sys/module/test_kmod_parallel_count/parameters/pids_watched
+
+ # Set the state to 1 which unblocks the init function.
+ echo 1 > /sys/module/test_kmod_parallel_count/parameters/state
+
+ # Collect all results.
+ for pid in ${pids[*]}; do
+ { wait $pid; let rc=-$?; } || true
+
+ case $rc in
+ 0)
+ let success_cnt=$success_cnt+1;;
+ $EBUSY)
+ let ebusy_cnt=$ebusy_cnt+1;;
+ $EEXIST)
+ let eexist_cnt=$eexist_cnt+1;;
+ $ENODEV)
+ let enodev_cnt=$enodev_cnt+1;;
+ *)
+ let other_cnt=$other_cnt+1;;
+ esac
+ done
+
+ if (( $success_cnt > 0 )); then
+ rmmod test_kmod_parallel_sleep
+ fi
+
+ init_cnt=$(cat /sys/module/test_kmod_parallel_count/parameters/init_count)
+ rmmod test_kmod_parallel_count
+
+ # Check the results.
+ if (( $init_cnt != 1 )); then
+ echo "$test_name: FAIL, number of init calls, test expects '1' - got '$init_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of init calls, test expects '1' - got '$init_cnt'" >&2
+ fi
+ if (( $success_cnt != $success_exp )); then
+ echo "$test_name: FAIL, number of successful loads, test expects '$success_exp' - got '$success_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of successful loads, test expects '$success_exp' - got '$success_cnt'" >&2
+ fi
+ if (( $ebusy_cnt != $ebusy_exp )); then
+ echo "$test_name: FAIL, number of loads returning EBUSY, test expects '$ebusy_exp' - got '$ebusy_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning EBUSY, test expects '$ebusy_exp' - got '$ebusy_cnt'" >&2
+ fi
+ if (( $eexist_cnt != $eexist_exp )); then
+ echo "$test_name: FAIL, number of loads returning EEXIST, test expects '$eexist_exp' - got '$eexist_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning EEXIST, test expects '$eexist_exp' - got '$eexist_cnt'" >&2
+ fi
+ if (( $enodev_cnt != $enodev_exp )); then
+ echo "$test_name: FAIL, number of loads returning ENODEV, test expects '$enodev_exp' - got '$enodev_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning ENODEV, test expects '$enodev_exp' - got '$enodev_cnt'" >&2
+ fi
+ if (( $other_cnt != $other_exp )); then
+ echo "$test_name: FAIL, number of loads returning other values, test expects '$other_exp' - got '$other_cnt'" >&2
+ test_ok=false
+ else
+ echo "$test_name: OK! - number of loads returning other values, test expects '$other_exp' - got '$other_cnt'" >&2
+ fi
+
+ [ $test_ok = true ] || exit 1
+}
+
+kmod_test_0014()
+{
+ kmod_check_parallel_loads \
+ "${FUNCNAME[0]}" 4 "sleep_msecs=1000 init_retval=0" 1 0 3 0 0
+}
+
+kmod_test_0015()
+{
+ local ENODEV=$(errno_name_to_val -ENODEV)
+ kmod_check_parallel_loads \
+ "${FUNCNAME[0]}" 4 "sleep_msecs=1000 init_retval=$ENODEV" 0 3 0 1 0
+}
+
list_tests()
{
echo "Test ID list:"
@@ -525,6 +668,8 @@ list_tests()
echo "0011 x $(get_test_count 0011) - test completely disabling module autoloading"
echo "0012 x $(get_test_count 0012) - test /proc/modules address visibility under CAP_SYSLOG"
echo "0013 x $(get_test_count 0013) - test /sys/module/*/sections/* visibility under CAP_SYSLOG"
+ echo "0014 x $(get_test_count 0014) - test handling of parallel loads, success case"
+ echo "0015 x $(get_test_count 0015) - test handling of parallel loads, init returning error"
}
usage()
diff --git a/tools/testing/selftests/kmod/settings b/tools/testing/selftests/kmod/settings
new file mode 100644
index 000000000000..ba4d85f74cd6
--- /dev/null
+++ b/tools/testing/selftests/kmod/settings
@@ -0,0 +1 @@
+timeout=90
diff --git a/tools/testing/selftests/kmod/test_kmod_parallel/Makefile b/tools/testing/selftests/kmod/test_kmod_parallel/Makefile
new file mode 100644
index 000000000000..8d2b24aa868f
--- /dev/null
+++ b/tools/testing/selftests/kmod/test_kmod_parallel/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+KMOD_TEST_PARALLEL_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST)))))
+KDIR ?= $(abspath $(KMOD_TEST_PARALLEL_DIR)/../../../../..)
+
+obj-m := test_kmod_parallel_count.o test_kmod_parallel_sleep.o
+
+all:
+ $(MAKE) -C $(KDIR) M=$(KMOD_TEST_PARALLEL_DIR) modules
+
+clean:
+ $(MAKE) -C $(KDIR) M=$(KMOD_TEST_PARALLEL_DIR) clean
diff --git a/tools/testing/selftests/kmod/test_kmod_parallel/test_kmod_parallel_count.c b/tools/testing/selftests/kmod/test_kmod_parallel/test_kmod_parallel_count.c
new file mode 100644
index 000000000000..8de9351d6df9
--- /dev/null
+++ b/tools/testing/selftests/kmod/test_kmod_parallel/test_kmod_parallel_count.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/kstrtox.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/string.h>
+#include <linux/wait.h>
+
+/*
+ * State parameter: value 0 blocks the init function of test_kmod_parallel_sleep
+ * when it invokes test_kmod_parallel_count_wait(); value 1 lets it proceed.
+ */
+static unsigned int state;
+static DECLARE_WAIT_QUEUE_HEAD(state_wq);
+
+static int state_set(const char *val, const struct kernel_param *kp)
+{
+ int err;
+
+	err = kstrtouint(val, 0, &state);
+ if (err != 0)
+ return err;
+
+ wake_up_interruptible(&state_wq);
+ return 0;
+}
+
+static const struct kernel_param_ops state_ops = {
+ .set = state_set,
+ .get = param_get_uint,
+};
+
+module_param_cb(state, &state_ops, &state, 0644);
+MODULE_PARM_DESC(state, "module state switch, default 0");
+
+/*
+ * Counter to record how many times the init function of
+ * test_kmod_parallel_sleep was invoked.
+ */
+static unsigned int init_count;
+module_param(init_count, uint, 0444);
+MODULE_PARM_DESC(init_count, "counter for test_kmod_parallel_sleep_init()");
+
+int test_kmod_parallel_count_wait(void)
+{
+ kernel_param_lock(THIS_MODULE);
+ init_count++;
+ kernel_param_unlock(THIS_MODULE);
+
+ /* Wait for an external actor to transfer the state from 0 to 1. */
+ return wait_event_interruptible(state_wq, state == 1);
+}
+EXPORT_SYMBOL(test_kmod_parallel_count_wait);
+
+/*
+ * Tasks watched for invocation of finished_loading() and which ones actually
+ * executed the function.
+ */
+static DEFINE_SPINLOCK(pids_lock);
+static pid_t pids_watched[4] = { -1, -1, -1, -1 };
+static pid_t pids_hit[4] = { -1, -1, -1, -1 };
+
+static int format_pids(char *buffer, const pid_t *pids, size_t pids_size)
+{
+ int ret = 0;
+ size_t i;
+
+ for (i = 0; i < pids_size; i++) {
+ if (pids[i] == -1)
+ break;
+ ret += sprintf(buffer + ret, "%s%d", ret != 0 ? " " : "",
+ pids[i]);
+ }
+
+ ret += sprintf(buffer + ret, "\n");
+ return ret;
+}
+
+static int pids_watched_set(const char *val, const struct kernel_param *kp)
+{
+ int err;
+ size_t i;
+ pid_t pid;
+
+ spin_lock(&pids_lock);
+
+ /* Handle the reset case. */
+ if (sysfs_streq(val, "")) {
+ for (i = 0; i < ARRAY_SIZE(pids_watched); i++) {
+ pids_watched[i] = -1;
+ pids_hit[i] = -1;
+ }
+ goto success;
+ }
+
+ /* Add one PID to the watch set. */
+ err = kstrtoint(val, 0, &pid);
+ if (err != 0)
+ goto err;
+ if (pid < 0) {
+ err = -EINVAL;
+ goto err;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(pids_watched); i++)
+ if (pids_watched[i] == -1) {
+ pids_watched[i] = pid;
+ break;
+ }
+
+ if (i == ARRAY_SIZE(pids_watched)) {
+ err = -EINVAL;
+ goto err;
+ }
+
+success:
+ err = 0;
+err:
+ spin_unlock(&pids_lock);
+ return err;
+}
+
+static int pids_watched_get(char *buffer, const struct kernel_param *kp)
+{
+ int ret;
+
+ spin_lock(&pids_lock);
+ ret = format_pids(buffer, pids_watched, ARRAY_SIZE(pids_watched));
+ spin_unlock(&pids_lock);
+
+ return ret;
+}
+
+static const struct kernel_param_ops pids_watched_ops = {
+ .set = pids_watched_set,
+ .get = pids_watched_get,
+};
+
+module_param_cb(pids_watched, &pids_watched_ops, NULL, 0644);
+MODULE_PARM_DESC(pids_watched, "PIDs watched for finished_loading()");
+
+static int pids_hit_get(char *buffer, const struct kernel_param *kp)
+{
+ int ret;
+
+ spin_lock(&pids_lock);
+ ret = format_pids(buffer, pids_hit, ARRAY_SIZE(pids_hit));
+ spin_unlock(&pids_lock);
+
+ return ret;
+}
+
+static const struct kernel_param_ops pids_hit_ops = {
+ .get = pids_hit_get,
+};
+
+module_param_cb(pids_hit, &pids_hit_ops, NULL, 0444);
+MODULE_PARM_DESC(pids_hit, "PIDs which hit finished_loading()");
+
+static int __kprobes finished_loading_pre(struct kprobe *p,
+ struct pt_regs *regs)
+{
+ size_t i;
+
+ spin_lock(&pids_lock);
+
+ /* Check if this task is watched. */
+ for (i = 0; i < ARRAY_SIZE(pids_watched); i++)
+ if (current->pid == pids_watched[i])
+ break;
+ if (i == ARRAY_SIZE(pids_watched))
+ goto out;
+
+ /* Record the hit. */
+ for (i = 0; i < ARRAY_SIZE(pids_hit); i++) {
+ if (current->pid == pids_hit[i])
+ break;
+ if (pids_hit[i] != -1)
+ continue;
+ pids_hit[i] = current->pid;
+ break;
+ }
+
+out:
+ spin_unlock(&pids_lock);
+ return 0;
+}
+
+static struct kprobe finished_loading_kp = {
+ .symbol_name = "finished_loading",
+ .pre_handler = finished_loading_pre,
+};
+
+static int __init test_kmod_parallel_count_init(void)
+{
+ int err;
+
+ err = register_kprobe(&finished_loading_kp);
+ if (err < 0) {
+ pr_err("register_kprobe failed, returned %d\n", err);
+ return err;
+ }
+ return 0;
+}
+
+static void __exit test_kmod_parallel_count_exit(void)
+{
+ unregister_kprobe(&finished_loading_kp);
+}
+
+module_init(test_kmod_parallel_count_init);
+module_exit(test_kmod_parallel_count_exit);
+
+MODULE_AUTHOR("Petr Pavlu <[email protected]>");
+MODULE_LICENSE("GPL v2");
+MODULE_INFO(test, "Y");
diff --git a/tools/testing/selftests/kmod/test_kmod_parallel/test_kmod_parallel_sleep.c b/tools/testing/selftests/kmod/test_kmod_parallel/test_kmod_parallel_sleep.c
new file mode 100644
index 000000000000..cd1b8184af2a
--- /dev/null
+++ b/tools/testing/selftests/kmod/test_kmod_parallel/test_kmod_parallel_sleep.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+int test_kmod_parallel_count_wait(void);
+
+static int init_retval;
+module_param(init_retval, int, 0644);
+MODULE_PARM_DESC(init_retval, "init return value, default 0");
+
+static int __init test_kmod_parallel_sleep_init(void)
+{
+ int err;
+
+ err = test_kmod_parallel_count_wait();
+ if (err != 0)
+ return err;
+
+ return init_retval;
+}
+
+static void __exit test_kmod_parallel_sleep_exit(void)
+{
+}
+
+module_init(test_kmod_parallel_sleep_init);
+module_exit(test_kmod_parallel_sleep_exit);
+
+MODULE_AUTHOR("Petr Pavlu <[email protected]>");
+MODULE_LICENSE("GPL v2");
+MODULE_INFO(test, "Y");
On Thu 2023-01-12 10:03:10, Petr Pavlu wrote:
> [A different fix that the one from this thread was selected but it is still
> useful to discuss these test cases and if they should be added in some form.]
>
> On 10/17/22 15:51, Petr Mladek wrote:
> > On Sun 2022-10-16 14:30:31, Petr Pavlu wrote:
> >> Add two tests to check that loading the same module multiple times in
> >> parallel results only in one real attempt to initialize it.
> >> Synchronization of the loads is done by waiting 1000 ms in the init
> >
> > I do not have a good experience with this kind of synchronization.
> > It usually is not reliable. The test might be very slow especially when
> > false positives are solved by prolonging the delay.
> >
> > Alternative solution would be to have two modules:
> >
> > 1st module would provide a counter, for example:
> >
> > int modB_load_cnt;
> > module_param(modB_load_cnt, int, 0444);
> > EXPORT_SYMBOL(modB_load_cnt);
> >
> > EXPORT_SYMBOL() should allow to directly increment the counter
> > from the 2nd module.
> >
> > module_param() should make the value readable via
> > /sys/module/modA/parameters/modB_load_cnt. It can be
> > checked by kmod.sh.
>
> I agree that it would be best to avoid any synchronization based on timeouts
> in these tests.
>
> My reading is that your idea would let the tests stop measuring how long
> it took in total to process all module inserts. I was able to implement
> this change.
>
> It unfortunately doesn't help with the 1 second timeout that the
> kmod_test_0014 module (modB in your description) has in its init function. Its
> purpose is to make sure that any parallel loads of the same module which were
> started by kmod.sh manage to reach add_unformed_module(), sleep there and
> therefore hit the updated logic.
I see.
> One option to avoid this timeout is to extend modA to register a kprobe on
> finished_loading() and export, via a module parameter, which of the loads
> started by kmod.sh have reached this point. This approach works ok on my
> system and a prototype is pasted at the end of this mail. Two shortcomings
> are that it relies on internal knowledge of the module loader code and that
> finished_loading() might not always be available for probing, as it could
> get inlined in some configurations.
Yeah, it is a bit fragile as well.
> To summarize, I see the following options for these tests:
> * Use a timeout to synchronize the loads.
> * Use the outlined kprobe approach.
> * Don't add these tests at all.
Yet another solution would be to add support for this test into
the module loader code. I mean that add_unformed_module() might
increment a global counter when a certain module is waiting there.
The global counter might then be used to unblock the init()
callback.
The test module might be distinguished, for example, by a test-specific
module info string. For example, see
check_modinfo_livepatch(). It looks for the info string defined
in the livepatch modules, see MODULE_INFO(livepatch, "Y"); in
samples/livepatch/livepatch-sample.c.
That said, I do not like this much either. I am not sure if it is
more or less crazy than the kprobe approach.
> Any opinions what would be preferred? I'm leaning towards not adding these
> tests as they look fragile to me.
I do not have strong opinion.
My experience is that some tests are not worth the effort. The
maintenance or false positives might add more harm than good.
My feeling is that this one belongs into this category.
We could keep the timeout, which makes the test error prone.
Or we could use some hacks that make it hard to maintain.
Best Regards,
Petr