2017-07-21 14:05:29

by Matt Redfearn

Subject: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

When the kernel does not have a binary format handler for an executable
it is attempting to load and CONFIG_MODULES is enabled, it will attempt
to load a module for that format. If the kernel does not have a binary
format handler for the modprobe executable, this will trigger another
module load. Previously this recursive module loading was caught and an
error message printed informing the user that the executable could not
be executed:

request_module: runaway loop modprobe binfmt-464c
Starting init:/sbin/init exists but couldn't execute it (error -8)

Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
no longer caught, it just ends up waiting indefinitely for the kmod_wq
wait queue. Hence the kernel appears to hang silently when starting
userspace.
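Here is a minimal userspace sketch of why the wait never ends. It assumes C11 <stdatomic.h> as a stand-in for the kernel's atomic_t, and model_atomic_dec_if_positive() and model_try_claim_slot() are hypothetical names, not kernel code. The throttle waits until atomic_dec_if_positive(&kmod_concurrent_max) >= 0, which only succeeds while a slot is free. In the recursive case every slot is held by a modprobe that is itself blocked waiting for a slot. Nothing ever increments the counter again, so the condition stays false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model (an assumption, not kernel code) of atomic_dec_if_positive():
 * decrement *v only if the result stays >= 0, and return the old value
 * minus one whether or not the decrement was stored.
 */
static int model_atomic_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old - 1 >= 0) {
		/* On failure, old is reloaded with the current value and rechecked. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return old - 1;
	}
	return old - 1;
}

/* The throttle's wait condition: true once a modprobe slot was claimed. */
static bool model_try_claim_slot(atomic_int *kmod_concurrent_max)
{
	return model_atomic_dec_if_positive(kmod_concurrent_max) >= 0;
}
```

When the counter reaches 0, every further claim fails without changing it. A recursive modprobe chain leaves the throttle in exactly that state.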

This problem was observed when the binfmt handler for MIPS o32 binaries
is not built into a 64-bit kernel and the root filesystem is o32 ABI.

Catch this by adding a guard to search_binary_handler(). If there is no
binary format handler available to load an executable, and the
executable matches modprobe_path, i.e. the userspace helper that would
be executed to load a module, then do not attempt to load the module
since it will just end up here again when it fails to execute. This
actually improves the original behaviour since the "runaway loop"
warning is no longer printed, and we simply get:

Starting init:/sbin/init exists but couldn't execute it (error -8)

Fixes: 6d7964a722af ("kmod: throttle kmod thread limit")
Signed-off-by: Matt Redfearn <[email protected]>
---

What we really need to detect is that exec'ing modprobe failed, but
currently it does not get as far as an actual error since it just ends
up stuck waiting for the modprobes to complete, which they never will.
Open to suggestions of a different / better way to fix this.

Thanks,
Matt

---
fs/exec.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 62175cbcc801..004bb50a01fe 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1644,6 +1644,9 @@ int search_binary_handler(struct linux_binprm *bprm)
if (printable(bprm->buf[0]) && printable(bprm->buf[1]) &&
printable(bprm->buf[2]) && printable(bprm->buf[3]))
return retval;
+ /* Game over if we need to load a module to execute modprobe */
+ if (strcmp(bprm->filename, modprobe_path) == 0)
+ return retval;
if (request_module("binfmt-%04x", *(ushort *)(bprm->buf + 2)) < 0)
return retval;
need_retry = false;
--
2.7.4
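The effect of the proposed guard can be modelled in userspace as below. This is a sketch under stated assumptions, not fs/exec.c. model_search_binary_handler(), the have_handler flag and the call counter are all hypothetical. request_module() is reduced to a direct recursive call, because exec'ing modprobe re-enters the same handler lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
static const char *model_modprobe_path = "/sbin/modprobe";
static int model_request_module_calls;

/*
 * Model of the fallback in search_binary_handler() once no registered
 * handler accepted the file. have_handler stands in for "some binfmt
 * claimed the file" and is an assumption of this sketch.
 */
static int model_search_binary_handler(const char *filename, bool have_handler)
{
	if (have_handler)
		return 0;
	/* Proposed guard: never request a module in order to run modprobe. */
	if (strcmp(filename, model_modprobe_path) == 0)
		return -ENOEXEC;
	/* Model request_module(): it execs modprobe, re-entering this path. */
	model_request_module_calls++;
	(void)model_search_binary_handler(model_modprobe_path, have_handler);
	/* modprobe could not run, so no handler was loaded: report -ENOEXEC. */
	return -ENOEXEC;
}
```

With the guard, the chain stops after a single request instead of recursing until some limit is hit. The caller sees -ENOEXEC, which is error -8 on Linux and matches the "couldn't execute it (error -8)" message in the commit log.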


2017-08-02 00:12:04

by Luis Chamberlain

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> When the kernel does not have a binary format handler for an executable
> it is attempting to load and CONFIG_MODULES is enabled, it will attempt
> to load a module for that format. If the kernel does not have a binary
> format handler for the modprobe executable, this will trigger another
> module load.

Ah fun.

> Previously this recursive module loading was caught and an
> error message printed informing the user that the executable could not
> be executed:
>
> request_module: runaway loop modprobe binfmt-464c
> Starting init:/sbin/init exists but couldn't execute it (error -8)

It's incorrect to believe, though, that this message came up before after
only one cross-dependency loop; it would only come up once we created a loop
that went over 50 requests. In this case the loop happens because the binfmt
handler for modprobe is not loaded yet, so exec'ing modprobe tries again to
load the binfmt module for it, which leads to the loop and the > 50 limit
being reached.

modprobe binfmt for /sbin/init --> modprobe binfmt for modprobe --> loop
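The chain above can be sketched as a bounded recursion. This is a rough model only. MODEL_MAX_MODPROBES stands in for the old limit of 50 that Luis mentions. Each nested call stands in for a modprobe exec that needs the same missing binfmt module. The pre-4.13 error code is not reproduced.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_MAX_MODPROBES 50  /* the "> 50 requests" bound described above */

static int model_concurrent;    /* modprobe requests currently in flight */

/*
 * Model of the pre-4.13 check: every nested request bumps a counter, and
 * past the bound the request is refused with a warning instead of waiting.
 * Returns the nesting depth at which the loop was cut off.
 */
static int model_request_binfmt(const char *module)
{
	int depth;

	if (++model_concurrent > MODEL_MAX_MODPROBES) {
		printf("request_module: runaway loop modprobe %s\n", module);
		depth = model_concurrent;
	} else {
		/* Exec'ing modprobe needs the same missing handler: recurse. */
		depth = model_request_binfmt(module);
	}
	model_concurrent--;
	return depth;
}
```

In 4.13-rc1 this bound no longer exists, so the same chain blocks on the throttle instead of being cut off.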

> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> no longer caught, it just ends up waiting indefinitely for the kmod_wq
> wait queue. Hence the kernel appears to hang silently when starting
> userspace.

Indeed, recursive issues were no longer expected to exist.

> This problem was observed when the binfmt handler for MIPS o32 binaries
> is not built into a 64-bit kernel and the root filesystem is o32 ABI.

I see!

Another way to see this is that the binfmt handler for modprobe should
be built-in, and it's unclear if there is a sensible way to ensure this
with kconfig. Perhaps on some architectures it may be possible by not
allowing some binfmt handlers as modules, but I would be surprised if
this could be a general thing.

> Catch this by adding a guard to search_binary_handler(). If there is no
> binary format handler available to load an exectuable, and the
> executable matches modprobe_path, i.e. the userspace helper that would
> be executed to load a module, then do not attempt to load the module
> since it will just end up here again when it fails to execute. This
> actually improves the original behaviour since the "runaway loop"
> warning is no longer printed, and we simply get:
>
> Starting init:/sbin/init exists but couldn't execute it (error -8)

Neat, indeed this error message is much more meaningful.

> Fixes: 6d7964a722af ("kmod: throttle kmod thread limit")
> Signed-off-by: Matt Redfearn <[email protected]>
> ---
>
> What we really need to detect is that exec'ing modprobe failed, but
> currently it does not get as far as an actual error since it just ends
> up stuck waiting for the modprobes to complete,

Well right, it won't return as the kernel is busy trying to load the
first binfmt for init by first calling modprobe for the binfmt for...
modprobe.

> which they never will.

Yup.

> Open to suggestions of a different / better way to fix this.
>
> Thanks,
> Matt
>
> ---
> fs/exec.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 62175cbcc801..004bb50a01fe 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1644,6 +1644,9 @@ int search_binary_handler(struct linux_binprm *bprm)
> if (printable(bprm->buf[0]) && printable(bprm->buf[1]) &&
> printable(bprm->buf[2]) && printable(bprm->buf[3]))
> return retval;
> + /* Game over if we need to load a module to execute modprobe */
> + if (strcmp(bprm->filename, modprobe_path) == 0)
> + return retval;

Wouldn't this always break setups where modprobe itself needs a binfmt module?

This also does not solve another issue I could think of now:

The *old* implementation would also prevent daisy chaining a set of
50 different binaries which require different binfmt loaders. The
current implementation enables this and we'd just wait. There's a bound to
the number of binfmt loaders, though, so this would be bounded. If, however,
a 2nd loader loaded the first binary, we'd run into the same issue, I think.

If we can't think of a good way to resolve this we'll just have to revert
6d7964a722af for now.

> if (request_module("binfmt-%04x", *(ushort *)(bprm->buf + 2)) < 0)
> return retval;
> need_retry = false;
> --
> 2.7.4

Luis

2017-08-02 02:28:24

by Kees Cook

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
>> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
>> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
>> no longer caught, it just ends up waiting indefinitely for the kmod_wq
>> wait queue. Hence the kernel appears to hang silently when starting
>> userspace.
>
> Indeed, recursive issues were no longer expected to exist.

Errr, yeah, recursive binfmt loads can still happen.

> The *old* implementation would also prevent daisy chaining a set of
> 50 different binaries which require different binfmt loaders. The
> current implementation enables this and we'd just wait. There's a bound to
> the number of binfmt loaders, though, so this would be bounded. If, however,
> a 2nd loader loaded the first binary, we'd run into the same issue, I think.
>
> If we can't think of a good way to resolve this we'll just have to revert
> 6d7964a722af for now.

The weird but "normal" recursive case is usually a script calling a
script calling a misc format. Getting a chain of modprobes running,
though, seems unlikely. I *think* Matt's patch is okay, but I agree,
it'd be better for the request_module() to fail.

-Kees

--
Kees Cook
Pixel Security

2017-08-02 23:23:37

by Luis Chamberlain

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <[email protected]> wrote:
> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
> >> wait queue. Hence the kernel appears to hang silently when starting
> >> userspace.
> >
> > Indeed, recursive issues were no longer expected to exist.
>
> Errr, yeah, recursive binfmt loads can still happen.
>
> > The *old* implementation would also prevent daisy chaining a set of
> > 50 different binaries which require different binfmt loaders. The
> > current implementation enables this and we'd just wait. There's a bound to
> > the number of binfmt loaders, though, so this would be bounded. If, however,
> > a 2nd loader loaded the first binary, we'd run into the same issue, I think.
> >
> > If we can't think of a good way to resolve this we'll just have to revert
> > 6d7964a722af for now.
>
> The weird but "normal" recursive case is usually a script calling a
> script calling a misc format. Getting a chain of modprobes running,
> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> it'd be better for the request_module() to fail.

In that case, how about we just have each waiter wait at most X seconds:
if the number of concurrent ongoing modprobe calls hasn't dropped by even
one in X seconds, we give up on request_module() for the module and
clearly indicate what happened.
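Here is a userspace analogue of the proposed bounded wait, assuming POSIX threads. struct model_throttle, model_claim_slot() and model_release_slot() are hypothetical names. One difference is worth noting. pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline, whereas the kernel macro takes a relative timeout in jiffies.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Userspace model of the kmod throttle; all names here are hypothetical. */
struct model_throttle {
	pthread_mutex_t lock;
	pthread_cond_t freed;   /* plays the role of kmod_wq */
	int slots;              /* plays the role of kmod_concurrent_max */
};

/* Claim a slot, waiting at most timeout_ms; returns 0 or -ETIME. */
static int model_claim_slot(struct model_throttle *t, long timeout_ms)
{
	struct timespec deadline;
	int ret = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&t->lock);
	while (t->slots == 0) {
		int rc = pthread_cond_timedwait(&t->freed, &t->lock, &deadline);

		/* Give up only if the deadline passed with no slot freed. */
		if (rc == ETIMEDOUT && t->slots == 0) {
			ret = -ETIME;
			break;
		}
	}
	if (ret == 0)
		t->slots--;
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Release a slot and wake a waiter, as a finishing modprobe would. */
static void model_release_slot(struct model_throttle *t)
{
	pthread_mutex_lock(&t->lock);
	t->slots++;
	pthread_cond_signal(&t->freed);
	pthread_mutex_unlock(&t->lock);
}
```

The loop re-checks the slot count after every wakeup. A spurious wakeup, or a slot taken first by another waiter, therefore just resumes the wait until the deadline passes.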

Matt, can you test?

Note I've used wait_event_killable_timeout() to only accept SIGKILL
for now. I've seen issues with SIGCHLD, and at modprobe this could
even be a bigger issue, so this would restrict the signals received
*only* to SIGKILL.

It would be good to come up with a simple test case for this in
tools/testing/selftests/kmod/kmod.sh

Luis

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5b74e36c0ca8..dc19880c02f5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
__ret; \
})

+#define __wait_event_killable_timeout(wq_head, condition, timeout) \
+ ___wait_event(wq_head, ___wait_cond_timeout(condition), \
+ TASK_KILLABLE, 0, timeout, \
+ __ret = schedule_timeout(__ret))
+
+/**
+ * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
+ * @wq_head: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @timeout: timeout, in jiffies
+ *
+ * The process is put to sleep (TASK_KILLABLE) until the
+ * @condition evaluates to true or a kill signal is received.
+ * The @condition is checked each time the waitqueue @wq_head is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ *
+ * Returns:
+ * 0 if the @condition evaluated to %false after the @timeout elapsed,
+ * 1 if the @condition evaluated to %true after the @timeout elapsed,
+ * the remaining jiffies (at least 1) if the @condition evaluated
+ * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
+ * interrupted by a kill signal.
+ *
+ * Only kill signals interrupt this process.
+ */
+#define wait_event_killable_timeout(wq_head, condition, timeout) \
+({ \
+ long __ret = timeout; \
+ might_sleep(); \
+ if (!___wait_cond_timeout(condition)) \
+ __ret = __wait_event_killable_timeout(wq_head, \
+ condition, timeout); \
+ __ret; \
+})
+

#define __wait_event_lock_irq(wq_head, condition, lock, cmd) \
(void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 6d016c5d97c8..1b5f7bada8d2 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);

/*
+ * If modprobe can't be called after this time we assume it's very likely
+ * your userspace has created a recursive dependency, and we'll have no
+ * option but to fail.
+ */
+#define MAX_KMOD_TIMEOUT 5
+
+/*
modprobe_path is set via /proc/sys.
*/
char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
@@ -167,8 +174,18 @@ int __request_module(bool wait, const char *fmt, ...)
pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
atomic_read(&kmod_concurrent_max),
MAX_KMOD_CONCURRENT, module_name);
- wait_event_interruptible(kmod_wq,
- atomic_dec_if_positive(&kmod_concurrent_max) >= 0);
+ ret = wait_event_killable_timeout(kmod_wq,
+ atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
+ MAX_KMOD_TIMEOUT * HZ);
+ if (!ret) {
+ pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
+ module_name, atomic_read(&kmod_concurrent_max), MAX_KMOD_TIMEOUT);
+ pr_warn_ratelimited("request_module: recursive modprobe call very likely!");
+ return -ETIME;
+ } else if (ret == -ERESTARTSYS) {
+ pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
+ return ret;
+ }
}

trace_module_request(module_name, wait, _RET_IP_);
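The kernel-doc in this patch lists four kinds of return value, but the caller in __request_module() only needs to tell three outcomes apart. Below is a small sketch of that mapping. It assumes MODEL_ERESTARTSYS mirrors the kernel-internal ERESTARTSYS (512), which userspace <errno.h> does not export. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors the kernel-internal ERESTARTSYS; not available in userspace. */
#define MODEL_ERESTARTSYS 512

enum model_wait_outcome { MODEL_GOT_SLOT, MODEL_TIMED_OUT, MODEL_KILLED };

/*
 * Classify a wait_event_killable_timeout() result per its kernel-doc:
 *   0            -> timed out with the condition still false
 *   -ERESTARTSYS -> interrupted by a kill signal
 *   >= 1         -> condition true (1 if it became true as the timeout
 *                   elapsed, otherwise the remaining jiffies)
 */
static enum model_wait_outcome model_classify(long ret)
{
	if (ret == 0)
		return MODEL_TIMED_OUT;
	if (ret == -MODEL_ERESTARTSYS)
		return MODEL_KILLED;
	return MODEL_GOT_SLOT;
}

/* Error code the patched __request_module() returns for each outcome. */
static int model_request_module_error(long ret)
{
	switch (model_classify(ret)) {
	case MODEL_TIMED_OUT:
		return -ETIME;
	case MODEL_KILLED:
		return -MODEL_ERESTARTSYS;
	default:
		return 0;  /* slot claimed: go on to run modprobe */
	}
}
```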

2017-08-04 00:02:46

by Kees Cook

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Wed, Aug 2, 2017 at 4:23 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
>> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <[email protected]> wrote:
>> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
>> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
>> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
>> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
>> >> wait queue. Hence the kernel appears to hang silently when starting
>> >> userspace.
>> >
>> > Indeed, recursive issues were no longer expected to exist.
>>
>> Errr, yeah, recursive binfmt loads can still happen.
>>
>> > The *old* implementation would also prevent daisy chaining a set of
>> > 50 different binaries which require different binfmt loaders. The
>> > current implementation enables this and we'd just wait. There's a bound to
>> > the number of binfmt loaders, though, so this would be bounded. If, however,
>> > a 2nd loader loaded the first binary, we'd run into the same issue, I think.
>> >
>> > If we can't think of a good way to resolve this we'll just have to revert
>> > 6d7964a722af for now.
>>
>> The weird but "normal" recursive case is usually a script calling a
>> script calling a misc format. Getting a chain of modprobes running,
>> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
>> it'd be better for the request_module() to fail.
>
> In that case, how about we just have each waiter wait at most X seconds:
> if the number of concurrent ongoing modprobe calls hasn't dropped by even
> one in X seconds, we give up on request_module() for the module and
> clearly indicate what happened.
>
> Matt, can you test?
>
> Note I've used wait_event_killable_timeout() to only accept SIGKILL
> for now. I've seen issues with SIGCHLD, and at modprobe this could
> even be a bigger issue, so this would restrict the signals received
> *only* to SIGKILL.
>
> It would be good to come up with a simple test case for this in
> tools/testing/selftests/kmod/kmod.sh
>
> Luis
>
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 5b74e36c0ca8..dc19880c02f5 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
> __ret; \
> })
>
> +#define __wait_event_killable_timeout(wq_head, condition, timeout) \
> + ___wait_event(wq_head, ___wait_cond_timeout(condition), \
> + TASK_KILLABLE, 0, timeout, \
> + __ret = schedule_timeout(__ret))
> +
> +/**
> + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> + * @wq_head: the waitqueue to wait on
> + * @condition: a C expression for the event to wait for
> + * @timeout: timeout, in jiffies
> + *
> + * The process is put to sleep (TASK_KILLABLE) until the
> + * @condition evaluates to true or a kill signal is received.
> + * The @condition is checked each time the waitqueue @wq_head is woken up.
> + *
> + * wake_up() has to be called after changing any variable that could
> + * change the result of the wait condition.
> + *
> + * Returns:
> + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> + * the remaining jiffies (at least 1) if the @condition evaluated
> + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> + * interrupted by a kill signal.
> + *
> + * Only kill signals interrupt this process.
> + */
> +#define wait_event_killable_timeout(wq_head, condition, timeout) \
> +({ \
> + long __ret = timeout; \
> + might_sleep(); \
> + if (!___wait_cond_timeout(condition)) \
> + __ret = __wait_event_killable_timeout(wq_head, \
> + condition, timeout); \
> + __ret; \
> +})
> +
>
> #define __wait_event_lock_irq(wq_head, condition, lock, cmd) \
> (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 6d016c5d97c8..1b5f7bada8d2 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
> static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
>
> /*
> + * If modprobe can't be called after this time we assume it's very likely
> + * your userspace has created a recursive dependency, and we'll have no
> + * option but to fail.
> + */
> +#define MAX_KMOD_TIMEOUT 5

Would this mean slow (swappy) systems could start failing modprobe
just due to access times?

-Kees

--
Kees Cook
Pixel Security

2017-08-04 00:10:42

by Luis Chamberlain

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Thu, Aug 03, 2017 at 05:02:40PM -0700, Kees Cook wrote:
> On Wed, Aug 2, 2017 at 4:23 PM, Luis R. Rodriguez <[email protected]> wrote:
> > On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> >> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <[email protected]> wrote:
> >> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> >> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> >> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> >> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
> >> >> wait queue. Hence the kernel appears to hang silently when starting
> >> >> userspace.
> >> >
> >> > Indeed, recursive issues were no longer expected to exist.
> >>
> >> Errr, yeah, recursive binfmt loads can still happen.
> >>
> >> > The *old* implementation would also prevent daisy chaining a set of
> >> > 50 different binaries which require different binfmt loaders. The
> >> > current implementation enables this and we'd just wait. There's a bound to
> >> > the number of binfmt loaders, though, so this would be bounded. If, however,
> >> > a 2nd loader loaded the first binary, we'd run into the same issue, I think.
> >> >
> >> > If we can't think of a good way to resolve this we'll just have to revert
> >> > 6d7964a722af for now.
> >>
> >> The weird but "normal" recursive case is usually a script calling a
> >> script calling a misc format. Getting a chain of modprobes running,
> >> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> >> it'd be better for the request_module() to fail.
> >
> > In that case, how about we just have each waiter wait at most X seconds:
> > if the number of concurrent ongoing modprobe calls hasn't dropped by even
> > one in X seconds, we give up on request_module() for the module and
> > clearly indicate what happened.
> >
> > Matt, can you test?
> >
> > Note I've used wait_event_killable_timeout() to only accept SIGKILL
> > for now. I've seen issues with SIGCHLD, and at modprobe this could
> > even be a bigger issue, so this would restrict the signals received
> > *only* to SIGKILL.
> >
> > It would be good to come up with a simple test case for this in
> > tools/testing/selftests/kmod/kmod.sh
> >
> > Luis
> >
> > diff --git a/include/linux/wait.h b/include/linux/wait.h
> > index 5b74e36c0ca8..dc19880c02f5 100644
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
> > __ret; \
> > })
> >
> > +#define __wait_event_killable_timeout(wq_head, condition, timeout) \
> > + ___wait_event(wq_head, ___wait_cond_timeout(condition), \
> > + TASK_KILLABLE, 0, timeout, \
> > + __ret = schedule_timeout(__ret))
> > +
> > +/**
> > + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> > + * @wq_head: the waitqueue to wait on
> > + * @condition: a C expression for the event to wait for
> > + * @timeout: timeout, in jiffies
> > + *
> > + * The process is put to sleep (TASK_KILLABLE) until the
> > + * @condition evaluates to true or a kill signal is received.
> > + * The @condition is checked each time the waitqueue @wq_head is woken up.
> > + *
> > + * wake_up() has to be called after changing any variable that could
> > + * change the result of the wait condition.
> > + *
> > + * Returns:
> > + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> > + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> > + * the remaining jiffies (at least 1) if the @condition evaluated
> > + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> > + * interrupted by a kill signal.
> > + *
> > + * Only kill signals interrupt this process.
> > + */
> > +#define wait_event_killable_timeout(wq_head, condition, timeout) \
> > +({ \
> > + long __ret = timeout; \
> > + might_sleep(); \
> > + if (!___wait_cond_timeout(condition)) \
> > + __ret = __wait_event_killable_timeout(wq_head, \
> > + condition, timeout); \
> > + __ret; \
> > +})
> > +
> >
> > #define __wait_event_lock_irq(wq_head, condition, lock, cmd) \
> > (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 6d016c5d97c8..1b5f7bada8d2 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
> > static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
> >
> > /*
> > + * If modprobe can't be called after this time we assume it's very likely
> > + * your userspace has created a recursive dependency, and we'll have no
> > + * option but to fail.
> > + */
> > +#define MAX_KMOD_TIMEOUT 5
>
> Would this mean slow (swappy) systems could start failing modprobe
> just due to access times?

No, this is pre-launch and depends on *all* running kmod threads.
The wait would *only* fail if we had already hit the limit of 50 concurrent
kmod threads running at the same time and *none* of them finished for 5 seconds
straight. If at any point any modprobe call finishes, that clears this
and the waiting modprobe chugs on. So this would only happen if we were
maxed out, with all kmod threads busy and none returning, for X seconds
straight.
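Luis's argument can be checked with a simplified timeline model. The model is hypothetical and uses milliseconds. A waiter that starts blocking at start_ms goes ahead as soon as any in-flight modprobe finishes within the timeout window, no matter how slow the other modprobes are. It ignores several waiters racing for the same freed slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model: all slots are busy when the waiter starts blocking at
 * start_ms. It proceeds if any running modprobe finishes within timeout_ms,
 * and fails only if none finishes in that whole window.
 */
static bool model_waiter_proceeds(long start_ms, long timeout_ms,
				  const long *finish_ms, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (finish_ms[i] >= start_ms &&
		    finish_ms[i] <= start_ms + timeout_ms)
			return true;
	}
	return false;
}
```

A single slow, swapping modprobe therefore cannot cause a failure on its own. The timeout only fires when every slot stays occupied for the whole window.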

The name probably should reflect that better then, MAX_KMOD_ALL_BUSY_TIMEOUT
maybe?

Luis

2017-08-07 10:26:15

by Matt Redfearn

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

Hi Luis,


On 03/08/17 00:23, Luis R. Rodriguez wrote:
> On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
>> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <[email protected]> wrote:
>>> On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
>>>> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
>>>> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
>>>> no longer caught, it just ends up waiting indefinitely for the kmod_wq
>>>> wait queue. Hence the kernel appears to hang silently when starting
>>>> userspace.
>>> Indeed, recursive issues were no longer expected to exist.
>> Errr, yeah, recursive binfmt loads can still happen.
>>
>>> The *old* implementation would also prevent daisy chaining a set of
>>> 50 different binaries which require different binfmt loaders. The
>>> current implementation enables this and we'd just wait. There's a bound to
>>> the number of binfmt loaders, though, so this would be bounded. If, however,
>>> a 2nd loader loaded the first binary, we'd run into the same issue, I think.
>>>
>>> If we can't think of a good way to resolve this we'll just have to revert
>>> 6d7964a722af for now.
>> The weird but "normal" recursive case is usually a script calling a
>> script calling a misc format. Getting a chain of modprobes running,
>> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
>> it'd be better for the request_module() to fail.
> In that case, how about we just have each waiter wait at most X seconds:
> if the number of concurrent ongoing modprobe calls hasn't dropped by even
> one in X seconds, we give up on request_module() for the module and
> clearly indicate what happened.
>
> Matt, can you test?

Sure - I've tested this patch on Cavium Octeon under the same conditions
as before (64-bit kernel with 32-bit userspace and no binfmt handler built in).

The failing modprobe is now caught and rather than silence we get the
expected kernel panic, albeit all the warnings look quite noisy.

VFS: Mounted root (ext3 filesystem) readonly on device 8:5.
devtmpfs: mounted
Freeing unused kernel memory: 372K
This architecture does not have kernel memory protection.
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module binfmt-4c46, throttling...
request_module: modprobe binfmt-4c46 cannot be processed, kmod busy with 0 threads for more than 5 seconds now
request_module: recursive modprobe call very likely!
Starting init: /sbin/init exists but couldn't execute it (error -8)
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module binfmt-4c46, throttling...
request_module: modprobe binfmt-4c46 cannot be processed, kmod busy with 0 threads for more than 5 seconds now
request_module: recursive modprobe call very likely!
Starting init: /bin/sh exists but couldn't execute it (error -8)
Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
---[ end Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.


In any case, this is better than the 4.13-rc1 behavior, so

Tested-by: Matt Redfearn <[email protected]>

Thanks,
Matt

>
> Note I've used wait_event_killable_timeout() to only accept SIGKILL
> for now. I've seen issues wit SIGCHILD and at modprobe this could
> even be a bigger issue, so this would restrict the signals received
> *only* to SIGKILL.
>
> It would be good to come up with a simple test case for this in
> tools/testing/selftests/kmod/kmod.sh
>
> Luis
>
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 5b74e36c0ca8..dc19880c02f5 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
> __ret; \
> })
>
> +#define __wait_event_killable_timeout(wq_head, condition, timeout) \
> + ___wait_event(wq_head, ___wait_cond_timeout(condition), \
> + TASK_KILLABLE, 0, timeout, \
> + __ret = schedule_timeout(__ret))
> +
> +/**
> + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> + * @wq_head: the waitqueue to wait on
> + * @condition: a C expression for the event to wait for
> + * @timeout: timeout, in jiffies
> + *
> + * The process is put to sleep (TASK_KILLABLE) until the
> + * @condition evaluates to true or a kill signal is received.
> + * The @condition is checked each time the waitqueue @wq_head is woken up.
> + *
> + * wake_up() has to be called after changing any variable that could
> + * change the result of the wait condition.
> + *
> + * Returns:
> + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> + * the remaining jiffies (at least 1) if the @condition evaluated
> + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> + * interrupted by a kill signal.
> + *
> + * Only kill signals interrupt this process.
> + */
> +#define wait_event_killable_timeout(wq_head, condition, timeout) \
> +({ \
> + long __ret = timeout; \
> + might_sleep(); \
> + if (!___wait_cond_timeout(condition)) \
> + __ret = __wait_event_killable_timeout(wq_head, \
> + condition, timeout); \
> + __ret; \
> +})
> +
>
> #define __wait_event_lock_irq(wq_head, condition, lock, cmd) \
> (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 6d016c5d97c8..1b5f7bada8d2 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
> static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
>
> /*
> + * If modprobe can't be called after this time we assume its very likely
> + * your userspace has created a recursive dependency, and we'll have no
> + * option but to fail.
> + */
> +#define MAX_KMOD_TIMEOUT 5
> +
> +/*
> modprobe_path is set via /proc/sys.
> */
> char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
> @@ -167,8 +174,18 @@ int __request_module(bool wait, const char *fmt, ...)
> pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
> atomic_read(&kmod_concurrent_max),
> MAX_KMOD_CONCURRENT, module_name);
> - wait_event_interruptible(kmod_wq,
> - atomic_dec_if_positive(&kmod_concurrent_max) >= 0);
> + ret = wait_event_killable_timeout(kmod_wq,
> + atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
> + MAX_KMOD_TIMEOUT * HZ);
> + if (!ret) {
> + pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
> + module_name, MAX_KMOD_CONCURRENT, MAX_KMOD_TIMEOUT);
> + pr_warn_ratelimited("request_module: recursive modprobe call very likely!");
> + return -ETIME;
> + } else if (ret == -ERESTARTSYS) {
> + pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
> + return ret;
> + }
> }
>
> trace_module_request(module_name, wait, _RET_IP_);
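To make the failure mode concrete, here is a single-threaded userspace model of the throttle in __request_module(). It is hypothetical: none of these names are kernel API, the slot count of 50 is an assumed stand-in for MAX_KMOD_CONCURRENT, and the model simply propagates the error upward, whereas in the kernel each level would fail its exec instead. Each nested call stands for exec'ing modprobe when its binfmt handler is not built in, which triggers yet another request_module(). Once every slot is held by a caller further up the chain, no slot can ever be freed: the old wait_event_interruptible() would sleep there forever, while the bounded wait in the patch turns it into -ETIME.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for MAX_KMOD_CONCURRENT; the value is an assumption. */
#define MODEL_MAX_CONCURRENT 50

static int model_slots = MODEL_MAX_CONCURRENT; /* plays the role of kmod_concurrent_max */
static int model_timeouts;                     /* number of waits that gave up */

/* Same contract as "atomic_dec_if_positive(v) >= 0" in the patch. */
static bool model_dec_if_positive(int *v)
{
	if (*v > 0) {
		(*v)--;
		return true;
	}
	return false;
}

/*
 * One request_module() call. If the binfmt handler needed to exec modprobe
 * is not built in, running modprobe triggers a nested request_module() for
 * that same handler, and this caller waits for it to finish.
 */
static int model_request_module(bool modprobe_binfmt_builtin)
{
	int ret;

	if (!model_dec_if_positive(&model_slots)) {
		/*
		 * Every slot is held by a caller further up this chain, each of
		 * which is waiting on us, so no slot can be released. Without a
		 * timeout this is a silent hang; with it we give up.
		 */
		model_timeouts++;
		return -ETIME;
	}

	ret = modprobe_binfmt_builtin ? 0
				      : model_request_module(modprobe_binfmt_builtin);

	model_slots++; /* modprobe "exited": release the slot */
	return ret;
}
```

With a built-in handler, model_request_module(true) returns 0 at once; with a modular one, model_request_module(false) recurses until the slots run out, records a single timeout, and unwinds with -ETIME, leaving all slots released again.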

2017-08-08 19:23:56

by Luis Chamberlain

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Mon, Aug 07, 2017 at 11:26:09AM +0100, Matt Redfearn wrote:
> Hi Luis,
> On 03/08/17 00:23, Luis R. Rodriguez wrote:
> > On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> > > On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <[email protected]> wrote:
> > > > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> > > > > Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> > > > > merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> > > > > no longer caught, it just ends up waiting indefinitely for the kmod_wq
> > > > > wait queue. Hence the kernel appears to hang silently when starting
> > > > > userspace.
> > > > Indeed, the recursive issue was no longer expected to exist.
> > > Errr, yeah, recursive binfmt loads can still happen.
> > >
> > > > The *old* implementation would also prevent daisy chaining a set of 50
> > > > different binaries which require different binfmt loaders. The current
> > > > implementation enables this and we'd just wait. There's a bound on the
> > > > number of binfmt loaders though, so this would be bounded. If, however,
> > > > a 2nd loader loaded the first binary, we'd run into the same issue I think.
> > > >
> > > > If we can't think of a good way to resolve this we'll just have to revert
> > > > 6d7964a722af for now.
> > > The weird but "normal" recursive case is usually a script calling a
> > > script calling a misc format. Getting a chain of modprobes running,
> > > though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> > > it'd be better for the request_module() to fail.
> > In that case, how about we just have each waiter wait at most X seconds:
> > if the number of concurrent ongoing modprobe calls hasn't dropped by even
> > one in X seconds, we give up on request_module() for the module and clearly
> > indicate what happened.
> >
> > Matt, can you test?
>
> Sure - I've tested this patch on Cavium Octeon under the same conditions as
> before (64-bit kernel with 32-bit userspace & no binfmt handler built in).
>
> The failing modprobe is now caught and rather than silence we get the
> expected kernel panic, albeit all the warnings look quite noisy.

Thanks for testing! I agree it's all too verbose; I'll clean that up and
resubmit with a cleaner log.

I tried to devise a test case for this but for the life of me I could not. If
you happen to come up with something please feel free to submit one for
lib/test_kmod.c!

> In any case, this is better than the 4.13-rc1 behavior, so
>
> Tested-by: Matt Redfearn <[email protected]>

Luis

2017-08-09 00:09:46

by Luis Chamberlain

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

On Wed, Aug 02, 2017 at 02:12:00AM +0200, Luis R. Rodriguez wrote:
> On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 62175cbcc801..004bb50a01fe 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1644,6 +1644,9 @@ int search_binary_handler(struct linux_binprm *bprm)
> > if (printable(bprm->buf[0]) && printable(bprm->buf[1]) &&
> > printable(bprm->buf[2]) && printable(bprm->buf[3]))
> > return retval;
> > + /* Game over if we need to load a module to execute modprobe */
> > + if (strcmp(bprm->filename, modprobe_path) == 0)
> > + return retval;
>
> Wouldn't this just break having a binfmt used for modprobe always?

The place where you put the check is only reached when a system has
CONFIG_MODULES and the first search for built-in handlers yielded no results,
so it would not break the case where the handler is built in.
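A rough userspace model of the fallback path under discussion, i.e. what search_binary_handler() does once no built-in handler has accepted the file. The printable() definition and the "binfmt-%04x" alias built from header bytes 2 and 3 are modelled on fs/exec.c as I recall it, so treat them as assumptions; the kernel reads those two bytes as a native-endian ushort, whereas this sketch hard-codes little-endian. model_binfmt_request() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match the printable() helper in fs/exec.c of this era. */
#define printable(c) (((c) == '\t') || ((c) == '\n') || (0x20 <= (c) && (c) <= 0x7e))

/*
 * Model of the module-loading fallback, reached only after no built-in
 * handler accepted the file. Returns true and writes the module alias to
 * 'alias' if a module load would be requested.
 */
static bool model_binfmt_request(const char *filename, const unsigned char *buf,
				 const char *modprobe_path, char *alias, size_t len)
{
	/* Text-like headers (e.g. "#!/b") never trigger a module load. */
	if (printable(buf[0]) && printable(buf[1]) &&
	    printable(buf[2]) && printable(buf[3]))
		return false;

	/* The guard proposed in the RFC: never load a module to exec modprobe. */
	if (strcmp(filename, modprobe_path) == 0)
		return false;

	/* Bytes 2..3 read as a 16-bit value; little-endian is assumed here. */
	snprintf(alias, len, "binfmt-%04x",
		 (unsigned)buf[2] | ((unsigned)buf[3] << 8));
	return true;
}
```

For an ELF header ("\x7f" "ELF") executed as /sbin/init this yields binfmt-464c, while the same header executed as /sbin/modprobe hits the proposed guard and requests nothing; a "#!" script never requests a module at all.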

Thinking about this a little further, having a binfmt handler that is not
built in seems to be the real issue in this particular case, and indeed
making it modular makes no sense, as modprobe would be needed to load it.

Although the alternative patch I suggested still makes sense as a *generic*
loop detection complaint/error fix, putting this check in place and bailing
still makes sense as well. But this sort of thing seems to be the type of
system build error userspace could try to pick up on proactively, i.e. you
should not get to the point where you boot into this; the build system should
somehow complain about it.

Cc'ing linux-modules folks to see if perhaps kmod could do something about this
more proactively.

Ideally, if we could do this via kconfig for an architecture, that'd be even
better, but it's not clear whether this sort of thing is visible for MIPS in
kconfig, so kmod could be the next place to look.

We'd need userspace kmod to verify that the binary format for modprobe / kmod
was built in, and otherwise fail.
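One way such a userspace check could look is sketched below. It is purely hypothetical: it assumes the tool can read the kernel's .config text and already knows which option provides the handler for modprobe's binary format (for the MIPS o32 case in this thread, presumably CONFIG_MIPS32_O32). config_option_builtin() only answers whether that option is built in (=y); modular (=m), "is not set", and absent options all count as failures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: is 'option' built in (=y) according to the kernel
 * .config text? "=m", "# ... is not set" and absent options all count as
 * not built in.
 */
static bool config_option_builtin(const char *config, const char *option)
{
	size_t n = strlen(option);
	const char *line = config;

	while (line && *line) {
		/* Require an exact "OPTION=y" line, not a longer option sharing the prefix. */
		if (strncmp(line, option, n) == 0 &&
		    strncmp(line + n, "=y", 2) == 0 &&
		    (line[n + 2] == '\n' || line[n + 2] == '\0'))
			return true;
		line = strchr(line, '\n');
		if (line)
			line++; /* advance to the start of the next line */
	}
	return false;
}
```

A build or install step could then refuse to proceed when the check fails, rather than letting the system boot into the hang.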

> This also does not solve another issue I could think of now:
>
> The *old* implementation would also prevent daisy chaining a set of 50
> different binaries which require different binfmt loaders. The current
> implementation enables this and we'd just wait. There's a bound on the
> number of binfmt loaders though, so this would be bounded. If, however,
> a 2nd loader loaded the first binary, we'd run into the same issue I think.

Upon testing: the 2nd loader will not cause another bump of the kmod
concurrency count, given the original module would already have a struct
module present on the modules list. So these loops don't bump kmod
concurrency; they just keep the system waiting forever.

Userspace kmod detects these sorts of loops, but only for symbol references;
it doesn't check for request_module() calls, and even if it did, it would
then also have to consider aliasing.

kmod handles loops through exported symbol references: it won't let a system
complete the 'make modules_install' target, as depmod will fail when this is
detected. The kmod git tree has some tests for this, see
testsuite/module-playground/mod-loop* -- loading any of those yields an error
at modules_install time, as depmod picks it up:

depmod: ERROR: Found 7 modules in dependency cycles!
depmod: ERROR: Cycle detected: mod_loop_a -> mod_loop_b -> mod_loop_c -> mod_loop_a
depmod: ERROR: Cycle detected: mod_loop_a -> mod_loop_b -> mod_loop_c -> mod_loop_g
depmod: ERROR: Cycle detected: mod_loop_a -> mod_loop_b -> mod_loop_c -> mod_loop_f
depmod: ERROR: Cycle detected: mod_loop_d -> mod_loop_e -> mod_loop_d
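depmod's actual implementation isn't shown in this thread. As a stand-in, the textbook way to detect such cycles is a three-color depth-first search over the dependency graph, sketched below with a hypothetical adjacency-matrix representation (dep[u][v] meaning module u uses a symbol exported by module v). Unlike depmod, it only reports whether any cycle exists rather than listing each one.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MODS 16

/* WHITE: unvisited, GREY: on the current DFS path, BLACK: fully explored. */
enum color { WHITE, GREY, BLACK };

static bool dfs_visit(int u, int n, bool dep[][MAX_MODS], enum color *c)
{
	c[u] = GREY;
	for (int v = 0; v < n; v++) {
		if (!dep[u][v])
			continue;
		if (c[v] == GREY)
			return true; /* back edge: v is still on the path, so a cycle */
		if (c[v] == WHITE && dfs_visit(v, n, dep, c))
			return true;
	}
	c[u] = BLACK;
	return false;
}

/* Returns true if the first n modules contain any dependency cycle. */
static bool has_dependency_cycle(int n, bool dep[][MAX_MODS])
{
	enum color c[MAX_MODS] = { WHITE };

	for (int u = 0; u < n; u++)
		if (c[u] == WHITE && dfs_visit(u, n, dep, c))
			return true;
	return false;
}
```

Modelling mod_loop_a -> mod_loop_b -> mod_loop_c -> mod_loop_a as modules 0, 1 and 2 reports a cycle; dropping the c -> a edge leaves a plain chain, which does not.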

So -- I will continue to submit the new generic alternative patch I suggested but
we should discuss this particular error further to try to more proactively
prevent it if possible.

It seems we already have in place userspace tools to prevent further loops, the
new warning should help catch others which escape our imagination at this time.
Two other types of issues would be desirable in the future for userspace
to detect proactively:

o module loops using request_module() and aliases
o when the modprobe binfmt is not built-in

Luis

2017-09-08 21:23:47

by Lucas De Marchi

Subject: Re: [RFC PATCH] exec: Avoid recursive modprobe for binary format handlers

Hi,

On Tue, Aug 8, 2017 at 5:09 PM, Luis R. Rodriguez <[email protected]> wrote:
> On Wed, Aug 02, 2017 at 02:12:00AM +0200, Luis R. Rodriguez wrote:
>> On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
>> > diff --git a/fs/exec.c b/fs/exec.c
>> > index 62175cbcc801..004bb50a01fe 100644
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1644,6 +1644,9 @@ int search_binary_handler(struct linux_binprm *bprm)
>> > if (printable(bprm->buf[0]) && printable(bprm->buf[1]) &&
>> > printable(bprm->buf[2]) && printable(bprm->buf[3]))
>> > return retval;
>> > + /* Game over if we need to load a module to execute modprobe */
>> > + if (strcmp(bprm->filename, modprobe_path) == 0)
>> > + return retval;
>>
>> Wouldn't this just break having a binfmt used for modprobe always?
>
> The place where you put the check is only reached when a system has
> CONFIG_MODULES and the first search for built-in handlers yielded no results,
> so it would not break the case where the handler is built in.
>
> Thinking about this a little further, having a binfmt handler that is not
> built in seems to be the real issue in this particular case, and indeed
> making it modular makes no sense, as modprobe would be needed to load it.
>
> Although the alternative patch I suggested still makes sense as a *generic*
> loop detection complaint/error fix, putting this check in place and bailing
> still makes sense as well. But this sort of thing seems to be the type of
> system build error userspace could try to pick up on proactively, i.e. you
> should not get to the point where you boot into this; the build system should
> somehow complain about it.
>
> Cc'ing linux-modules folks to see if perhaps kmod could do something about this
> more proactively.

Tracking at runtime with modprobe/libkmod would be really difficult, as a
module can be loaded from different sources. I don't see a reliable way to do
that. One thing often forgotten is that, due to install rules, the user can
even add anything as a dependency with kmod not even knowing about it (softdep
is related, but at least kmod knows what the user is trying to do and uses it
to handle dependencies).

For this particular case, not going through the modprobe helper would be a
way to accomplish that, since you wouldn't need the corresponding binfmt
module to run modprobe. Udev handles module loading via libkmod, but the only
way to trigger it is via the rules rather than via a request from the kernel.


Lucas De Marchi