2015-08-17 23:20:12

by Laura Abbott

[permalink] [raw]
Subject: 0be964be0 "module: Sanitize RCU usage and locking" breaks symbol_put_addr?

Hi,

We received a few bug backtraces:

[ 41.536933] ------------[ cut here ]------------
[ 41.537545] WARNING: CPU: 1 PID: 813 at kernel/module.c:291 module_assert_mutex_or_preempt+0x49/0x90()
[ 41.538174] Modules linked in: mxl5007t af9013 ... dvb_usb_af9015(+) ... dvb_usb_v2 dvb_core rc_core ...
[ 41.542457] CPU: 1 PID: 813 Comm: systemd-udevd Not tainted 4.2.0-0.rc6.git0.1.fc24.x86_64+debug #1
...
[ 41.545938] Call Trace:
[ 41.546607] [<ffffffff81868d8e>] dump_stack+0x4c/0x65
[ 41.547280] [<ffffffff810ab406>] warn_slowpath_common+0x86/0xc0
[ 41.547959] [<ffffffffa057d0b0>] ? af9013_read_ucblocks+0x20/0x20 [af9013]
[ 41.548633] [<ffffffffa057d0b0>] ? af9013_read_ucblocks+0x20/0x20 [af9013]
[ 41.549315] [<ffffffff810ab53a>] warn_slowpath_null+0x1a/0x20
[ 41.549994] [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
[ 41.550664] [<ffffffff81150822>] __module_address+0x32/0x150
[ 41.551346] [<ffffffffa057d0b0>] ? af9013_read_ucblocks+0x20/0x20 [af9013]
[ 41.552037] [<ffffffffa057d0b0>] ? af9013_read_ucblocks+0x20/0x20 [af9013]
[ 41.552684] [<ffffffff81150956>] __module_text_address+0x16/0x70
[ 41.553361] [<ffffffffa057d0b0>] ? af9013_read_ucblocks+0x20/0x20 [af9013]
[ 41.554049] [<ffffffffa057d0b0>] ? af9013_read_ucblocks+0x20/0x20 [af9013]
[ 41.554701] [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
[ 41.555392] [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]
[ 41.556078] [<ffffffffa04cdfd5>] dvb_usbv2_probe+0xc85/0x11a0 [dvb_usb_v2]
[ 41.556750] [<ffffffffa05607c4>] af9015_probe+0x84/0xf0 [dvb_usb_af9015]
[ 41.557483] [<ffffffff8161c03b>] usb_probe_interface+0x1bb/0x2e0
[ 41.558169] [<ffffffff81579f26>] driver_probe_device+0x1f6/0x450
[ 41.558837] [<ffffffff8157a214>] __driver_attach+0x94/0xa0
[ 41.559469] [<ffffffff8157a180>] ? driver_probe_device+0x450/0x450
[ 41.560126] [<ffffffff815778f3>] bus_for_each_dev+0x73/0xc0
[ 41.560748] [<ffffffff815796fe>] driver_attach+0x1e/0x20
[ 41.561442] [<ffffffff8157922e>] bus_add_driver+0x1ee/0x280
[ 41.562088] [<ffffffff8157b0a0>] driver_register+0x60/0xe0
[ 41.562712] [<ffffffff8161a87d>] usb_register_driver+0xad/0x160
[ 41.563348] [<ffffffffa0567000>] ? 0xffffffffa0567000
[ 41.563971] [<ffffffffa056701e>] af9015_usb_driver_init+0x1e/0x1000 [dvb_usb_af9015]
[ 41.564580] [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[ 41.565210] [<ffffffff8124ac65>] ? kmem_cache_alloc_trace+0x355/0x380
[ 41.565834] [<ffffffff81867c37>] ? do_init_module+0x28/0x1e9
[ 41.566428] [<ffffffff81867c6f>] do_init_module+0x60/0x1e9
[ 41.567042] [<ffffffff81154167>] load_module+0x21f7/0x28d0
[ 41.567633] [<ffffffff8114f600>] ? m_show+0x1b0/0x1b0
[ 41.568252] [<ffffffff81026d79>] ? sched_clock+0x9/0x10
[ 41.568861] [<ffffffff810e6ddc>] ? local_clock+0x1c/0x20
[ 41.569453] [<ffffffff811549b8>] SyS_init_module+0x178/0x1c0
[ 41.570059] [<ffffffff8187282e>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 41.570630] ---[ end trace 31a9dd90d4f559f5 ]---

Based on my understanding, this is spitting a warning that the module_mutex
isn't held. There's a nice comment in symbol_put_addr right before the call:

/* module_text_address is safe here: we're supposed to have reference
* to module from symbol_get, so it can't go away. */
modaddr = __module_text_address(a);

so it looks like this is an erroneous warning which shouldn't need the mutex held.
Any ideas or am I off base here?

Thanks,
Laura


2015-08-18 02:31:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 0be964be0 "module: Sanitize RCU usage and locking" breaks symbol_put_addr?

On Mon, Aug 17, 2015 at 04:20:09PM -0700, Laura Abbott wrote:
> Hi,
>
> We received a few bug backtraces:
>
> [ 41.536933] ------------[ cut here ]------------
> [ 41.537545] WARNING: CPU: 1 PID: 813 at kernel/module.c:291 module_assert_mutex_or_preempt+0x49/0x90()
> [ 41.538174] Modules linked in: mxl5007t af9013 ... dvb_usb_af9015(+) ... dvb_usb_v2 dvb_core rc_core ...
> [ 41.542457] CPU: 1 PID: 813 Comm: systemd-udevd Not tainted 4.2.0-0.rc6.git0.1.fc24.x86_64+debug #1
> ...
> [ 41.549994] [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
> [ 41.550664] [<ffffffff81150822>] __module_address+0x32/0x150
> [ 41.552684] [<ffffffff81150956>] __module_text_address+0x16/0x70
> [ 41.554701] [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
> [ 41.555392] [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]

> Based on my understanding, this is spitting a warning that the module_mutex
> isn't held. There's a nice comment in symbol_put_addr right before the call:
>
> /* module_text_address is safe here: we're supposed to have reference
> * to module from symbol_get, so it can't go away. */
> modaddr = __module_text_address(a);
>
> so it looks like this is an erroneous warning which shouldn't need the mutex held.
> Any ideas or am I off base here?

That comment is wrong, you still need either preempt disabled or hold
the module mutex because you're going to iterate the data structure in
order to look up the module.

The fact that you hold a reference on the module you're going to
(hopefully) find, doesn't stabilize the data structure.

So I think maybe symbol_put_addr() is broken and it wants something
like:

---
kernel/module.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index b86b7bf1be38..8f051a106676 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1063,11 +1063,15 @@ void symbol_put_addr(void *addr)
if (core_kernel_text(a))
return;

- /* module_text_address is safe here: we're supposed to have reference
- * to module from symbol_get, so it can't go away. */
+ /*
+ * Even though we hold a reference on the module; we still need to
+ * disable preemption in order to safely traverse the data structure.
+ */
+ preempt_disable();
modaddr = __module_text_address(a);
BUG_ON(!modaddr);
module_put(modaddr);
+ preempt_enable();
}
EXPORT_SYMBOL_GPL(symbol_put_addr);

2015-08-19 03:12:56

by Rusty Russell

[permalink] [raw]
Subject: Re: 0be964be0 "module: Sanitize RCU usage and locking" breaks symbol_put_addr?

Peter Zijlstra <[email protected]> writes:
> On Mon, Aug 17, 2015 at 04:20:09PM -0700, Laura Abbott wrote:
>> Hi,
>>
>> We received a few bug backtraces:
>>
>> [ 41.536933] ------------[ cut here ]------------
>> [ 41.537545] WARNING: CPU: 1 PID: 813 at kernel/module.c:291 module_assert_mutex_or_preempt+0x49/0x90()
>> [ 41.538174] Modules linked in: mxl5007t af9013 ... dvb_usb_af9015(+) ... dvb_usb_v2 dvb_core rc_core ...
>> [ 41.542457] CPU: 1 PID: 813 Comm: systemd-udevd Not tainted 4.2.0-0.rc6.git0.1.fc24.x86_64+debug #1
>> ...
>> [ 41.549994] [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
>> [ 41.550664] [<ffffffff81150822>] __module_address+0x32/0x150
>> [ 41.552684] [<ffffffff81150956>] __module_text_address+0x16/0x70
>> [ 41.554701] [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
>> [ 41.555392] [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]
>
>> Based on my understanding, this is spitting a warning that the module_mutex
>> isn't held. There's a nice comment in symbol_put_addr right before the call:
>>
>> /* module_text_address is safe here: we're supposed to have reference
>> * to module from symbol_get, so it can't go away. */
>> modaddr = __module_text_address(a);
>>
>> so it looks like this is an erroneous warning which shouldn't need the mutex held.
>> Any ideas or am I off base here?
>
> That comment is wrong, you still need either preempt disabled or hold
> the module mutex because you're going to iterate the data structure in
> order to look up the module.

Indeed! That comment is wrong, and your fix is good.

Care to S-O-B on it?

Thanks,
Rusty.

>
> The fact that you hold a reference on the module you're going to
> (hopefully) find, doesn't stabilize the data structure.
>
> So I think maybe symbol_put_addr() is broken and it wants something
> like:
>
> ---
> kernel/module.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index b86b7bf1be38..8f051a106676 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1063,11 +1063,15 @@ void symbol_put_addr(void *addr)
> if (core_kernel_text(a))
> return;
>
> - /* module_text_address is safe here: we're supposed to have reference
> - * to module from symbol_get, so it can't go away. */
> + /*
> + * Even though we hold a reference on the module; we still need to
> + * disable preemption in order to safely traverse the data structure.
> + */
> + preempt_disable();
> modaddr = __module_text_address(a);
> BUG_ON(!modaddr);
> module_put(modaddr);
> + preempt_enable();
> }
> EXPORT_SYMBOL_GPL(symbol_put_addr);
>

2015-08-19 12:42:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: 0be964be0 "module: Sanitize RCU usage and locking" breaks symbol_put_addr?


On Wed, Aug 19, 2015 at 06:19:43AM +0930, Rusty Russell wrote:
> Indeed! That comment is wrong, and your fix is good.
>
> Care to S-O-B on it?

Of course, here goes.

---
Subject: module: Fix locking in symbol_put_addr()

Laura reported an assertion triggering:

[<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
[<ffffffff81150822>] __module_address+0x32/0x150
[<ffffffff81150956>] __module_text_address+0x16/0x70
[<ffffffff81150f19>] symbol_put_addr+0x29/0x40
[<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]

Which lead us to inspect symbol_put_addr(). This function has a comment
claiming it doesn't need to disable preemption around the module lookup
because it holds a reference to the module it wants to find, which
therefore cannot go away.

This is wrong (and a false optimization too, preempt_disable() is really
rather cheap, and I doubt any of this is on uber critical paths,
otherwise it would've retained a pointer to the actual module anyway and
avoided the second lookup).

While its true that the module cannot go away while we hold a reference
on it, the data structure we do the lookup in very much _CAN_ change
while we do the lookup. Therefore fix the comment and add the
required preempt_disable().

Reported-by: Laura Abbott <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/module.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index b86b7bf1be38..8f051a106676 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1063,11 +1063,15 @@ void symbol_put_addr(void *addr)
if (core_kernel_text(a))
return;

- /* module_text_address is safe here: we're supposed to have reference
- * to module from symbol_get, so it can't go away. */
+ /*
+ * Even though we hold a reference on the module; we still need to
+ * disable preemption in order to safely traverse the data structure.
+ */
+ preempt_disable();
modaddr = __module_text_address(a);
BUG_ON(!modaddr);
module_put(modaddr);
+ preempt_enable();
}
EXPORT_SYMBOL_GPL(symbol_put_addr);

2015-08-20 05:34:36

by Rusty Russell

[permalink] [raw]
Subject: Re: 0be964be0 "module: Sanitize RCU usage and locking" breaks symbol_put_addr?

Peter Zijlstra <[email protected]> writes:
> On Wed, Aug 19, 2015 at 06:19:43AM +0930, Rusty Russell wrote:
>> Indeed! That comment is wrong, and your fix is good.
>>
>> Care to S-O-B on it?
>
> Of course, here goes.

Thanks! This is an ancient bug (2009) which your extra assertions
caught. It's unlikely to have ever bitten anyone, but I've CC'd stable
for it anyway:

Fixes: a6e6abd575fc ("module: remove module_text_address()")
Cc: [email protected]

Applied,
Rusty.