I think there is a race (albeit a hard-to-hit one) between load_module()
error handling and kprobe registration which could cause a kernel page to
become read-only, panic due to protection fault.
In short, the protection that gets applied [at the bug_cleanup label] can be
overridden by another CPU when executing set_all_modules_text_ro().
Therefore creating the possibility for the kprobe registration code path to
touch a [formed] module that is being deallocated. Consequently we could
free a mapped page, that is not 'writable'. The same page, when later
accessed, will result in a page fault which cannot be handled. Below is an
attempt to illustrate the race. Please note we assume that:
- kprobe uses ftrace
- parse_args() or mod_sysfs_setup() would have to fail
- CPU Y and X do not attempt to load the same module
- CPU Y would have to sneak in *after* CPU X called the two 'unset'
functions but before CPU X removes the module from the list of all
modules
CPU X
...
load_module
// Unknown/invalid module
// parameter specified ...
after_dashes = parse_args(...)
if (IS_ERR(after_dashes))
err = PTR_ERR(after_dashes)
goto coming_cleanup:
...
bug_cleanup:
module_disable_ro(mod)
module_disable_nx(mod)
...
// set_all_modules_text_ro() on CPU Y sneaks in here <-----.
// and overrides the effects of the previous 'unset' |
... |
list_del_rcu(&mod->list) |
|
|
CPU Y |
... |
sys_finit_module |
load_module |
do_init_module |
do_one_initcall |
// mod->init |
kprobe_example_init |
register_kprobe |
arm_kprobe |
// kprobe uses ftrace |
arm_kprobe_ftrace |
register_ftrace_function |
ftrace_startup |
ftrace_startup_enable |
ftrace_run_update_code |
ftrace_arch_code_modify_post_process |
{ |
// |
// Set all [formed] module's |
// core and init pages as |
// read-only under |
// module_mutex ... |
// |
set_all_modules_text_ro() ---------'
}
The following patches (I hope) is an attempt to address this theoretical
race. Please let me know your thoughts.
Aaron Tomlin (2):
module: Ensure a module's state is set accordingly during module
coming cleanup code
module: When modifying a module's text ignore modules which are going
away too
kernel/module.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--
2.5.5
By default, during the access permission modification of a module's core
and init pages, we only ignore modules that are malformed. There is no
reason not to extend this to modules which are going away too.
This patch makes both set_all_modules_text_rw() and
set_all_modules_text_ro() skip modules which are going away too.
Signed-off-by: Aaron Tomlin <[email protected]>
---
kernel/module.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index ff93ab8..09c386b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1953,7 +1953,8 @@ void set_all_modules_text_rw(void)
mutex_lock(&module_mutex);
list_for_each_entry_rcu(mod, &modules, list) {
- if (mod->state == MODULE_STATE_UNFORMED)
+ if (mod->state == MODULE_STATE_UNFORMED ||
+ mod->state == MODULE_STATE_GOING)
continue;
frob_text(&mod->core_layout, set_memory_rw);
@@ -1969,7 +1970,8 @@ void set_all_modules_text_ro(void)
mutex_lock(&module_mutex);
list_for_each_entry_rcu(mod, &modules, list) {
- if (mod->state == MODULE_STATE_UNFORMED)
+ if (mod->state == MODULE_STATE_UNFORMED ||
+ mod->state == MODULE_STATE_GOING)
continue;
frob_text(&mod->core_layout, set_memory_ro);
--
2.5.5
In load_module() in the event of an error, for e.g. unknown module
parameter(s) specified we go to perform some module coming clean up
operations. At this point the module is still in a "formed" state
when it is actually going away.
This patch updates the module's state accordingly to ensure anyone on the
module_notify_list waiting for a module going away notification will be
notified accordingly.
Signed-off-by: Aaron Tomlin <[email protected]>
---
kernel/module.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/module.c b/kernel/module.c
index f57dd63..ff93ab8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3708,6 +3708,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
sysfs_cleanup:
mod_sysfs_teardown(mod);
coming_cleanup:
+ mod->state = MODULE_STATE_GOING;
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
--
2.5.5
Aaron Tomlin <[email protected]> writes:
> In load_module() in the event of an error, for e.g. unknown module
> parameter(s) specified we go to perform some module coming clean up
> operations. At this point the module is still in a "formed" state
> when it is actually going away.
>
> This patch updates the module's state accordingly to ensure anyone on the
> module_notify_list waiting for a module going away notification will be
> notified accordingly.
I recall a similar proposal before.
I've audited all the subscribers to check they didn't look at
mod->state; they seem OK.
We actually do this in the init-failed path, so this should be OK.
Acked-by: Rusty Russell <[email protected]>
Thanks,
Rusty.
> Signed-off-by: Aaron Tomlin <[email protected]>
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f57dd63..ff93ab8 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3708,6 +3708,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> sysfs_cleanup:
> mod_sysfs_teardown(mod);
> coming_cleanup:
> + mod->state = MODULE_STATE_GOING;
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> --
> 2.5.5
Aaron Tomlin <[email protected]> writes:
> By default, during the access permission modification of a module's core
> and init pages, we only ignore modules that are malformed. There is no
> reason not to extend this to modules which are going away too.
Well, it depends on all the callers (ie. ftrace): is that also ignoring
modules which are going away?
Otherwise, we set MODULE_STATE_GOING, ftrace walks all the modules and
this one is still RO...
Thanks,
Rusty.
> This patch makes both set_all_modules_text_rw() and
> set_all_modules_text_ro() skip modules which are going away too.
>
> Signed-off-by: Aaron Tomlin <[email protected]>
> ---
> kernel/module.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index ff93ab8..09c386b 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1953,7 +1953,8 @@ void set_all_modules_text_rw(void)
>
> mutex_lock(&module_mutex);
> list_for_each_entry_rcu(mod, &modules, list) {
> - if (mod->state == MODULE_STATE_UNFORMED)
> + if (mod->state == MODULE_STATE_UNFORMED ||
> + mod->state == MODULE_STATE_GOING)
> continue;
>
> frob_text(&mod->core_layout, set_memory_rw);
> @@ -1969,7 +1970,8 @@ void set_all_modules_text_ro(void)
>
> mutex_lock(&module_mutex);
> list_for_each_entry_rcu(mod, &modules, list) {
> - if (mod->state == MODULE_STATE_UNFORMED)
> + if (mod->state == MODULE_STATE_UNFORMED ||
> + mod->state == MODULE_STATE_GOING)
> continue;
>
> frob_text(&mod->core_layout, set_memory_ro);
> --
> 2.5.5
On Wed, 26 Oct 2016 11:35:18 +1030
Rusty Russell <[email protected]> wrote:
> Aaron Tomlin <[email protected]> writes:
> > By default, during the access permission modification of a module's core
> > and init pages, we only ignore modules that are malformed. There is no
> > reason not to extend this to modules which are going away too.
>
> Well, it depends on all the callers (ie. ftrace): is that also ignoring
> modules which are going away?
>
> Otherwise, we set MODULE_STATE_GOING, ftrace walks all the modules and
> this one is still RO...
>
Actually, looking into this more, you are correct. There's a
possibility in enabling ftrace after the module is about to go but
before ftrace_release_mod() is called (which will remove the module
text from the ftrace function list).
I don't see any reason for not allowing set_all_modules_text_rw() from
being called if a module is going. If a module is going, shouldn't its
text be rw anyway?
Perhaps just preventing it from turning into ro will be sufficient. And
remove the check from set_all_modules_text_rw().
-- Steve
> Thanks,
> Rusty.
>
> > This patch makes both set_all_modules_text_rw() and
> > set_all_modules_text_ro() skip modules which are going away too.
> >
> > Signed-off-by: Aaron Tomlin <[email protected]>
> > ---
> > kernel/module.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/module.c b/kernel/module.c
> > index ff93ab8..09c386b 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -1953,7 +1953,8 @@ void set_all_modules_text_rw(void)
> >
> > mutex_lock(&module_mutex);
> > list_for_each_entry_rcu(mod, &modules, list) {
> > - if (mod->state == MODULE_STATE_UNFORMED)
> > + if (mod->state == MODULE_STATE_UNFORMED ||
> > + mod->state == MODULE_STATE_GOING)
> > continue;
> >
> > frob_text(&mod->core_layout, set_memory_rw);
> > @@ -1969,7 +1970,8 @@ void set_all_modules_text_ro(void)
> >
> > mutex_lock(&module_mutex);
> > list_for_each_entry_rcu(mod, &modules, list) {
> > - if (mod->state == MODULE_STATE_UNFORMED)
> > + if (mod->state == MODULE_STATE_UNFORMED ||
> > + mod->state == MODULE_STATE_GOING)
> > continue;
> >
> > frob_text(&mod->core_layout, set_memory_ro);
> > --
> > 2.5.5
This looks line to me. Rusty, do you have any issues with this?
Maybe we should add a comment to why a going module shouldn't be
converted to ro (because of ftrace and kprobes). But other than that,
I have no issue with it.
I also added Jessica to the Cc as I notice she will be the new module
maintainer: http://lwn.net/Articles/704653/
-- Steve
On Thu, 27 Oct 2016 10:36:06 +0100
Aaron Tomlin <[email protected]> wrote:
> By default, during the access permission modification of a module's core
> and init pages, we only ignore modules that are malformed. Albeit for a
> module which is going away, it does not make sense to change its text to
> RO since the module should be RW, before deallocation.
>
> This patch makes set_all_modules_text_ro() skip modules which are going
> away too.
>
> Signed-off-by: Aaron Tomlin <[email protected]>
> ---
> kernel/module.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index ff93ab8..2a383df 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1969,7 +1969,8 @@ void set_all_modules_text_ro(void)
>
> mutex_lock(&module_mutex);
> list_for_each_entry_rcu(mod, &modules, list) {
> - if (mod->state == MODULE_STATE_UNFORMED)
> + if (mod->state == MODULE_STATE_UNFORMED ||
> + mod->state == MODULE_STATE_GOING)
> continue;
>
> frob_text(&mod->core_layout, set_memory_ro);
By default, during the access permission modification of a module's core
and init pages, we only ignore modules that are malformed. Albeit for a
module which is going away, it does not make sense to change its text to
RO since the module should be RW, before deallocation.
This patch makes set_all_modules_text_ro() skip modules which are going
away too.
Signed-off-by: Aaron Tomlin <[email protected]>
---
kernel/module.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/module.c b/kernel/module.c
index ff93ab8..2a383df 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1969,7 +1969,8 @@ void set_all_modules_text_ro(void)
mutex_lock(&module_mutex);
list_for_each_entry_rcu(mod, &modules, list) {
- if (mod->state == MODULE_STATE_UNFORMED)
+ if (mod->state == MODULE_STATE_UNFORMED ||
+ mod->state == MODULE_STATE_GOING)
continue;
frob_text(&mod->core_layout, set_memory_ro);
--
2.5.5
On Thu 2016-10-27 09:49 -0400, Steven Rostedt wrote:
[ ... ]
> I also added Jessica to the Cc as I notice she will be the new module
> maintainer: http://lwn.net/Articles/704653/
Hi Jessica,
Any thoughts?
Thanks,
--
Aaron Tomlin
+++ Rusty Russell [26/10/16 11:24 +1030]:
>Aaron Tomlin <[email protected]> writes:
>> In load_module() in the event of an error, for e.g. unknown module
>> parameter(s) specified we go to perform some module coming clean up
>> operations. At this point the module is still in a "formed" state
>> when it is actually going away.
>>
>> This patch updates the module's state accordingly to ensure anyone on the
>> module_notify_list waiting for a module going away notification will be
>> notified accordingly.
>
>I recall a similar proposal before.
>
>I've audited all the subscribers to check they didn't look at
>mod->state; they seem OK.
>
>We actually do this in the init-failed path, so this should be OK.
We did discuss a similar proposal before:
https://lkml.kernel.org/r/[email protected]
The complaint back then was that we need to be in the COMING state for
strong_try_module_get() to fail. But it will also correctly fail for GOING
modules in the module_is_live() check in the subsequent call to
try_module_get(), so I believe we are still OK here.
Jessica
+++ Aaron Tomlin [07/11/16 11:46 +0000]:
>Hi Jessica,
>
>Any thoughts?
Hi Aaron,
Thanks for your patience as I slowly work through a large swath of emails :-)
Anyway, this looks fine to me. A going module's text should be (or
soon will be) rw anyway, so checking for going modules in the ro
case should be enough.
Rusty, if you give your ack for the second patch, I can apply both
patches to my modules-next branch. I'll also incorporate Steven's
suggestion for a comment explaining why going modules shouldn't be
converted to ro in this context.
Thanks,
Jessica
On Wed, 9 Nov 2016, Jessica Yu wrote:
> +++ Rusty Russell [26/10/16 11:24 +1030]:
> > Aaron Tomlin <[email protected]> writes:
> > > In load_module() in the event of an error, for e.g. unknown module
> > > parameter(s) specified we go to perform some module coming clean up
> > > operations. At this point the module is still in a "formed" state
> > > when it is actually going away.
> > >
> > > This patch updates the module's state accordingly to ensure anyone on the
> > > module_notify_list waiting for a module going away notification will be
> > > notified accordingly.
> >
> > I recall a similar proposal before.
> >
> > I've audited all the subscribers to check they didn't look at
> > mod->state; they seem OK.
> >
> > We actually do this in the init-failed path, so this should be OK.
>
> We did discuss a similar proposal before:
>
> https://lkml.kernel.org/r/[email protected]
>
> The complaint back then was that we need to be in the COMING state for
> strong_try_module_get() to fail. But it will also correctly fail for GOING
> modules in the module_is_live() check in the subsequent call to
> try_module_get(), so I believe we are still OK here.
FWIW, I looked and this is true. Even the error -ENOENT could be better in
this case than -EBUSY (since the module is going away).
Reviewed-by: Miroslav Benes <[email protected]>
for the patch, if you want it.
Anyway, the comment above strong_try_module_get() is not true for almost 9
nine years. So how about something like:
-->8--
>From 872e11394fdaba8fb9a333e114dc92273d2d1bf5 Mon Sep 17 00:00:00 2001
From: Miroslav Benes <[email protected]>
Date: Wed, 16 Nov 2016 16:45:48 +0100
Subject: [PATCH] module: Fix a comment above strong_try_module_get()
The comment above strong_try_module_get() function is not true anymore.
Return values changed with commit c9a3ba55bb5d ("module: wait for
dependent modules doing init.").
Signed-off-by: Miroslav Benes <[email protected]>
---
kernel/module.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index f57dd63186e6..67160ca8110e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -313,8 +313,9 @@ struct load_info {
} index;
};
-/* We require a truly strong try_module_get(): 0 means failure due to
- ongoing or failed initialization etc. */
+/* We require a truly strong try_module_get(): 0 means success.
+ * Otherwise an error is returned due to ongoing or failed
+ * initialization etc. */
static inline int strong_try_module_get(struct module *mod)
{
BUG_ON(mod && mod->state == MODULE_STATE_UNFORMED);
--
2.10.2
Aaron Tomlin <[email protected]> writes:
> By default, during the access permission modification of a module's core
> and init pages, we only ignore modules that are malformed. Albeit for a
> module which is going away, it does not make sense to change its text to
> RO since the module should be RW, before deallocation.
>
> This patch makes set_all_modules_text_ro() skip modules which are going
> away too.
>
> Signed-off-by: Aaron Tomlin <[email protected]>
Acked-by: Rusty Russell <[email protected]>
Thanks!
Rusty.
> ---
> kernel/module.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index ff93ab8..2a383df 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1969,7 +1969,8 @@ void set_all_modules_text_ro(void)
>
> mutex_lock(&module_mutex);
> list_for_each_entry_rcu(mod, &modules, list) {
> - if (mod->state == MODULE_STATE_UNFORMED)
> + if (mod->state == MODULE_STATE_UNFORMED ||
> + mod->state == MODULE_STATE_GOING)
> continue;
>
> frob_text(&mod->core_layout, set_memory_ro);
> --
> 2.5.5
On Wed, 9 Nov 2016 05:40:58 -0500
Jessica Yu <[email protected]> wrote:
> +++ Aaron Tomlin [07/11/16 11:46 +0000]:
> >Hi Jessica,
> >
> >Any thoughts?
>
> Hi Aaron,
>
> Thanks for your patience as I slowly work through a large swath of emails :-)
>
> Anyway, this looks fine to me. A going module's text should be (or
> soon will be) rw anyway, so checking for going modules in the ro
> case should be enough.
>
> Rusty, if you give your ack for the second patch, I can apply both
> patches to my modules-next branch. I'll also incorporate Steven's
> suggestion for a comment explaining why going modules shouldn't be
> converted to ro in this context.
>
Hi Jessica,
Have you pulled these in? I haven't noticed them in linux-next.
-- Steve
+++ Steven Rostedt [23/11/16 11:00 -0500]:
>On Wed, 9 Nov 2016 05:40:58 -0500
>Jessica Yu <[email protected]> wrote:
>
>> +++ Aaron Tomlin [07/11/16 11:46 +0000]:
>> >Hi Jessica,
>> >
>> >Any thoughts?
>>
>> Hi Aaron,
>>
>> Thanks for your patience as I slowly work through a large swath of emails :-)
>>
>> Anyway, this looks fine to me. A going module's text should be (or
>> soon will be) rw anyway, so checking for going modules in the ro
>> case should be enough.
>>
>> Rusty, if you give your ack for the second patch, I can apply both
>> patches to my modules-next branch. I'll also incorporate Steven's
>> suggestion for a comment explaining why going modules shouldn't be
>> converted to ro in this context.
>>
>
>Hi Jessica,
>
>Have you pulled these in? I haven't noticed them in linux-next.
I currently have this queued up for 4.10. I'm still clearing up some
(unrelated to this patch) maintainership transition questions, but
expect the new modules tree to be pushed out and included in -next
by this week.
Thanks,
Jessica
+++ Miroslav Benes [16/11/16 16:49 +0100]:
>On Wed, 9 Nov 2016, Jessica Yu wrote:
>
>> +++ Rusty Russell [26/10/16 11:24 +1030]:
>> > Aaron Tomlin <[email protected]> writes:
>> > > In load_module() in the event of an error, for e.g. unknown module
>> > > parameter(s) specified we go to perform some module coming clean up
>> > > operations. At this point the module is still in a "formed" state
>> > > when it is actually going away.
>> > >
>> > > This patch updates the module's state accordingly to ensure anyone on the
>> > > module_notify_list waiting for a module going away notification will be
>> > > notified accordingly.
>> >
>> > I recall a similar proposal before.
>> >
>> > I've audited all the subscribers to check they didn't look at
>> > mod->state; they seem OK.
>> >
>> > We actually do this in the init-failed path, so this should be OK.
>>
>> We did discuss a similar proposal before:
>>
>> https://lkml.kernel.org/r/[email protected]
>>
>> The complaint back then was that we need to be in the COMING state for
>> strong_try_module_get() to fail. But it will also correctly fail for GOING
>> modules in the module_is_live() check in the subsequent call to
>> try_module_get(), so I believe we are still OK here.
>
>FWIW, I looked and this is true. Even the error -ENOENT could be better in
>this case than -EBUSY (since the module is going away).
>
>Reviewed-by: Miroslav Benes <[email protected]>
>
>for the patch, if you want it.
>
>Anyway, the comment above strong_try_module_get() is not true for almost 9
>nine years. So how about something like:
>
>-->8--
>
>From 872e11394fdaba8fb9a333e114dc92273d2d1bf5 Mon Sep 17 00:00:00 2001
>From: Miroslav Benes <[email protected]>
>Date: Wed, 16 Nov 2016 16:45:48 +0100
>Subject: [PATCH] module: Fix a comment above strong_try_module_get()
>
>The comment above strong_try_module_get() function is not true anymore.
>Return values changed with commit c9a3ba55bb5d ("module: wait for
>dependent modules doing init.").
>
>Signed-off-by: Miroslav Benes <[email protected]>
Thanks Miroslav, that comment was confusing and needed updating. I've
queued this on top of the other patches.
Jessica