2020-12-03 13:55:39

by Jessica Yu

[permalink] [raw]
Subject: [PATCH RFC 0/1] Delay module uevent until after initialization

Hi,

This patch addresses a race condition between udev and the module loader
that was recently described here:

https://github.com/systemd/systemd/issues/17586

Currently, the module loader issues a KOBJ_ADD uevent before it calls a
module's initialization function. Some mount units expect that the module
has initialized already upon receiving the uevent. For instance, the
configfs module creates the /sys/kernel/config mount point during its init
function, but the module loader issues the uevent before the init function
is called. If udev processes the uevent before the module loader calls the
init function, then the mount unit will fail, since /sys/kernel/config will
not exist yet.

Nicolas Morey-Chaisemartin provided a simple test script to trigger the
race condition:

while true; do
umount configfs
rmmod configfs
sleep 1
modprobe configfs
ls -alFd /sys/kernel/config
sleep 1
systemctl status sys-kernel-config.mount | tail -n 1
done

When the mount fails due to the race condition, you would see a message like:

systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.

I ran the script for about 30 minutes on my own machine and managed to trigger
the failure condition 4 times. With the patch applied, I was not able to
trigger the failed condition anymore after running the script for the same
amount of time. Nicolas also reported similar test results after testing a
kernel with the patch applied.

This is sent first as an RFC to both LKML and systemd mailing lists since
the uevent call has been like this in the module loader for more than a
decade (since v2.6), I would be cautious as to not break anything that
actually relies on the current behavior for whatever reason. More testing
would be greatly appreciated.

Thanks,

Jessica

Jessica Yu (1):
module: delay kobject uevent until after module init call

kernel/module.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)


2020-12-03 13:55:59

by Jessica Yu

[permalink] [raw]
Subject: [PATCH RFC 1/1] module: delay kobject uevent until after module init call

Apparently there has been a longstanding race between udev/systemd and
the module loader. Currently, the module loader sends a uevent right
after sysfs initialization, but before the module calls its init
function. However, some udev rules expect that the module has
initialized already upon receiving the uevent.

This race has been triggered recently (see link in references) in some
systemd mount unit files. For instance, the configfs module creates the
/sys/kernel/config mount point in its init function, however the module
loader issues the uevent before this happens. sys-kernel-config.mount
expects to be able to mount /sys/kernel/config upon receipt of the
module loading uevent, but if the configfs module has not called its
init function yet, then this directory will not exist and the mount unit
fails. A similar situation exists for sys-fs-fuse-connections.mount, as
the fuse sysfs mount point is created during the fuse module's init
function. If udev is faster than module initialization then the mount
unit would fail in a similar fashion.

To fix this race, delay the module KOBJ_ADD uevent until after the
module has finished calling its init routine.

References: https://github.com/systemd/systemd/issues/17586
Signed-off-by: Jessica Yu <[email protected]>
---
kernel/module.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index a40ec708f8f2..e1dd0df57244 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1897,7 +1897,6 @@ static int mod_sysfs_init(struct module *mod)
if (err)
mod_kobject_put(mod);

- /* delay uevent until full sysfs population */
out:
return err;
}
@@ -1934,7 +1933,6 @@ static int mod_sysfs_setup(struct module *mod,
add_sect_attrs(mod, info);
add_notes_attrs(mod, info);

- kobject_uevent(&mod->mkobj.kobj, KOBJ_ADD);
return 0;

out_unreg_modinfo_attrs:
@@ -3656,6 +3654,9 @@ static noinline int do_init_module(struct module *mod)
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_LIVE, mod);

+ /* Delay uevent until module has finished its init routine */
+ kobject_uevent(&mod->mkobj.kobj, KOBJ_ADD);
+
/*
* We need to finish all async code before the module init sequence
* is done. This has potential to deadlock. For example, a newly
--
2.16.4

Subject: Re: [PATCH RFC 1/1] module: delay kobject uevent until after module init call

I tested this on a system that had the script in patch#0 fail 100% of the time.
With this patch, I haven't seen any failure in a few hours.
Haven't seen any unexpected side effect either.

Tested-By: Nicolas Morey-Chaisemartin <[email protected]>

On 12/3/20 2:51 PM, Jessica Yu wrote:
> Apparently there has been a longstanding race between udev/systemd and
> the module loader. Currently, the module loader sends a uevent right
> after sysfs initialization, but before the module calls its init
> function. However, some udev rules expect that the module has
> initialized already upon receiving the uevent.
>
> This race has been triggered recently (see link in references) in some
> systemd mount unit files. For instance, the configfs module creates the
> /sys/kernel/config mount point in its init function, however the module
> loader issues the uevent before this happens. sys-kernel-config.mount
> expects to be able to mount /sys/kernel/config upon receipt of the
> module loading uevent, but if the configfs module has not called its
> init function yet, then this directory will not exist and the mount unit
> fails. A similar situation exists for sys-fs-fuse-connections.mount, as
> the fuse sysfs mount point is created during the fuse module's init
> function. If udev is faster than module initialization then the mount
> unit would fail in a similar fashion.
>
> To fix this race, delay the module KOBJ_ADD uevent until after the
> module has finished calling its init routine.
>
> References: https://github.com/systemd/systemd/issues/17586
> Signed-off-by: Jessica Yu <[email protected]>
> ---
> kernel/module.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index a40ec708f8f2..e1dd0df57244 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1897,7 +1897,6 @@ static int mod_sysfs_init(struct module *mod)
> if (err)
> mod_kobject_put(mod);
>
> - /* delay uevent until full sysfs population */
> out:
> return err;
> }
> @@ -1934,7 +1933,6 @@ static int mod_sysfs_setup(struct module *mod,
> add_sect_attrs(mod, info);
> add_notes_attrs(mod, info);
>
> - kobject_uevent(&mod->mkobj.kobj, KOBJ_ADD);
> return 0;
>
> out_unreg_modinfo_attrs:
> @@ -3656,6 +3654,9 @@ static noinline int do_init_module(struct module *mod)
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_LIVE, mod);
>
> + /* Delay uevent until module has finished its init routine */
> + kobject_uevent(&mod->mkobj.kobj, KOBJ_ADD);
> +
> /*
> * We need to finish all async code before the module init sequence
> * is done. This has potential to deadlock. For example, a newly
>

2020-12-03 18:35:58

by Greg KH

[permalink] [raw]
Subject: Re: [systemd-devel] [PATCH RFC 1/1] module: delay kobject uevent until after module init call

On Thu, Dec 03, 2020 at 02:51:24PM +0100, Jessica Yu wrote:
> Apparently there has been a longstanding race between udev/systemd and
> the module loader. Currently, the module loader sends a uevent right
> after sysfs initialization, but before the module calls its init
> function. However, some udev rules expect that the module has
> initialized already upon receiving the uevent.
>
> This race has been triggered recently (see link in references) in some
> systemd mount unit files. For instance, the configfs module creates the
> /sys/kernel/config mount point in its init function, however the module
> loader issues the uevent before this happens. sys-kernel-config.mount
> expects to be able to mount /sys/kernel/config upon receipt of the
> module loading uevent, but if the configfs module has not called its
> init function yet, then this directory will not exist and the mount unit
> fails. A similar situation exists for sys-fs-fuse-connections.mount, as
> the fuse sysfs mount point is created during the fuse module's init
> function. If udev is faster than module initialization then the mount
> unit would fail in a similar fashion.
>
> To fix this race, delay the module KOBJ_ADD uevent until after the
> module has finished calling its init routine.
>
> References: https://github.com/systemd/systemd/issues/17586
> Signed-off-by: Jessica Yu <[email protected]>

Nice fix:
Reviewed-by: Greg Kroah-Hartman <[email protected]>

2020-12-10 12:42:16

by Jessica Yu

[permalink] [raw]
Subject: Re: [PATCH RFC 1/1] module: delay kobject uevent until after module init call

+++ Jessica Yu [03/12/20 14:51 +0100]:
>Apparently there has been a longstanding race between udev/systemd and
>the module loader. Currently, the module loader sends a uevent right
>after sysfs initialization, but before the module calls its init
>function. However, some udev rules expect that the module has
>initialized already upon receiving the uevent.
>
>This race has been triggered recently (see link in references) in some
>systemd mount unit files. For instance, the configfs module creates the
>/sys/kernel/config mount point in its init function, however the module
>loader issues the uevent before this happens. sys-kernel-config.mount
>expects to be able to mount /sys/kernel/config upon receipt of the
>module loading uevent, but if the configfs module has not called its
>init function yet, then this directory will not exist and the mount unit
>fails. A similar situation exists for sys-fs-fuse-connections.mount, as
>the fuse sysfs mount point is created during the fuse module's init
>function. If udev is faster than module initialization then the mount
>unit would fail in a similar fashion.
>
>To fix this race, delay the module KOBJ_ADD uevent until after the
>module has finished calling its init routine.
>
>References: https://github.com/systemd/systemd/issues/17586
>Signed-off-by: Jessica Yu <[email protected]>

Thanks all, this has been applied to modules-next to try to get as
much -next time as possible before the upcoming merge window.

Jessica