2017-09-20 18:37:10

by Andrey Konovalov

[permalink] [raw]
Subject: usb/net/p54: trying to register non-static key in p54_unregister_leds

Hi!

I've got the following report while fuzzing the kernel with syzkaller.

On commit ebb2c2437d8008d46796902ff390653822af6cc4 (Sep 18).

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 1404 Comm: kworker/1:1 Not tainted
4.14.0-rc1-42251-gebb2c2437d80-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
__lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
flush_work+0xf0/0x8c0 kernel/workqueue.c:2886
__cancel_work_timer+0x51d/0x870 kernel/workqueue.c:2961
cancel_delayed_work_sync+0x1f/0x30 kernel/workqueue.c:3081
p54_unregister_leds+0x6c/0xc0 drivers/net/wireless/intersil/p54/led.c:160
p54_unregister_common+0x3d/0xb0 drivers/net/wireless/intersil/p54/main.c:856
p54u_disconnect+0x86/0x120 drivers/net/wireless/intersil/p54/p54usb.c:1073
usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
__device_release_driver drivers/base/dd.c:861
device_release_driver_internal+0x4f4/0x5c0 drivers/base/dd.c:893
device_release_driver+0x1e/0x30 drivers/base/dd.c:918
bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
device_del+0x5c4/0xab0 drivers/base/core.c:1985
usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
hub_port_connect drivers/usb/core/hub.c:4754
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
process_scheduled_works kernel/workqueue.c:2179
worker_thread+0xb2b/0x1850 kernel/workqueue.c:2255
kthread+0x3a1/0x470 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431


2017-09-20 19:55:10

by Johannes Berg

[permalink] [raw]
Subject: Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:

> It seems this is caused as a result of:
>     -> lock_map_acquire(&work->lockdep_map);
>     lock_map_release(&work->lockdep_map);
>
>     in flush_work() [0]

Agree.

> This was added by:
>
> commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> Author: Stephen Boyd <[email protected]>
> Date:   Fri Apr 20 17:28:50 2012 -0700
>
> workqueue: Catch more locking problems with flush_work()

Yes, but that doesn't matter.
    
> Looking at the Stephen's patch, it's clear that it was made
> with "static DECLARE_WORK(work, my_work)" in mind. However
> p54's led_work is "per-device", hence it is stored in the
> devices context p54_common, which is dynamically allocated.
> So, maybe revert Stephen's patch?

I disagree - as the lockdep warning says:

> > INFO: trying to register non-static key.
> > the code is fine but needs lockdep annotation.
> > turning off the locking correctness validator.

What it needs is to actually correctly go through initializing the work
at least once.

Without more information, I can't really say what's going on, but I
assume that something is failing and p54_unregister_leds() is getting
invoked without p54_init_leds() having been invoked, so essentially
it's trying to flush a work that was never initialized?

INIT_DELAYED_WORK() does, after all, initialize the lockdep map
properly via __INIT_WORK().

johannes

2017-09-24 14:13:41

by Johannes Berg

[permalink] [raw]
Subject: Re: [RESEND] Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

On Sat, 2017-09-23 at 21:37 +0200, Christian Lamparter wrote:

> But this also begs the question: Is this really working then?
> From what I can tell, if CONFIG_LOCKDEP is not set then there's no
> BUG no WARN, no other splat or any other odd system behaviour. Does
> [cancel | flush]_[delayed_]work[_sync] really "just work" by
> *accident*, as long the delayed_work | work_struct is zeroed out? 

It looks like it does, but I'm not sure it's not more or less by
accident. Look at get_work_pool() for example, it might actually return
non-NULL in this case, and then in start_flush_work() you'll probably
fall into one of the few "already_gone" cases.

> And should it work in the future as well?

I guess it's not really guaranteed, the API doesn't state anything to
that effect. Not that I'm looking forward to a new workqueue rewrite ;)

johannes

2017-09-23 19:37:37

by Christian Lamparter

[permalink] [raw]
Subject: [RESEND] Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

This got rejected by gmail once. Let's see if it works now.

On Thursday, September 21, 2017 8:22:45 PM CEST Andrey Konovalov wrote:
> On Wed, Sep 20, 2017 at 9:55 PM, Johannes Berg
> <[email protected]> wrote:
> > On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:
> >
> >> It seems this is caused as a result of:
> >> -> lock_map_acquire(&work->lockdep_map);
> >> lock_map_release(&work->lockdep_map);
> >>
> >> in flush_work() [0]
> >
> > Agree.
> >
> >> This was added by:
> >>
> >> commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> >> Author: Stephen Boyd <[email protected]>
> >> Date: Fri Apr 20 17:28:50 2012 -0700
> >>
> >> workqueue: Catch more locking problems with flush_work()
> >
> > Yes, but that doesn't matter.
> >
> >> Looking at the Stephen's patch, it's clear that it was made
> >> with "static DECLARE_WORK(work, my_work)" in mind. However
> >> p54's led_work is "per-device", hence it is stored in the
> >> devices context p54_common, which is dynamically allocated.
> >> So, maybe revert Stephen's patch?
> >
> > I disagree - as the lockdep warning says:
> >
> >> > INFO: trying to register non-static key.
> >> > the code is fine but needs lockdep annotation.
> >> > turning off the locking correctness validator.
> >
> > What it needs is to actually correctly go through initializing the work
> > at least once.
> >
> > Without more information, I can't really say what's going on, but I
> > assume that something is failing and p54_unregister_leds() is getting
> > invoked without p54_init_leds() having been invoked, so essentially
> > it's trying to flush a work that was never initialized?
> >
> > INIT_DELAYED_WORK() does, after all, initialize the lockdep map
> > properly via __INIT_WORK().

Ok, thanks. This does indeed explain it.

But this also begs the question: Is this really working then?
>From what I can tell, if CONFIG_LOCKDEP is not set then there's no BUG
no WARN, no other splat or any other odd system behaviour. Does
[cancel | flush]_[delayed_]work[_sync] really "just work" by *accident*,
as long the delayed_work | work_struct is zeroed out?
And should it work in the future as well?

> Since I'm able to reproduce this, please let me know if you need me to
> collect some debug traces to help with the triage.

Do you want to take a shot at making a patch too? At a quick glance, it
should be enough to move the [#ifdef CONFIG_P54_LEDS ... #endif] block
in p54_unregister_common() into the if (priv->registered) { block
(preferably before the ieee80211_unregister_hw(dev).

Regards,
Christian

2017-09-20 19:27:46

by Christian Lamparter

[permalink] [raw]
Subject: Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

On Wednesday, September 20, 2017 8:37:08 PM CEST Andrey Konovalov wrote:
> Hi!
>
> I've got the following report while fuzzing the kernel with syzkaller.
>
> On commit ebb2c2437d8008d46796902ff390653822af6cc4 (Sep 18).
>
> INFO: trying to register non-static key.
> the code is fine but needs lockdep annotation.
> turning off the locking correctness validator.
> CPU: 1 PID: 1404 Comm: kworker/1:1 Not tainted
> 4.14.0-rc1-42251-gebb2c2437d80-dirty #205
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: usb_hub_wq hub_event
> Call Trace:
> __dump_stack lib/dump_stack.c:16
> dump_stack+0x292/0x395 lib/dump_stack.c:52
> register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
> __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
> lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
> flush_work+0xf0/0x8c0 kernel/workqueue.c:2886
> __cancel_work_timer+0x51d/0x870 kernel/workqueue.c:2961
> cancel_delayed_work_sync+0x1f/0x30 kernel/workqueue.c:3081
> p54_unregister_leds+0x6c/0xc0 drivers/net/wireless/intersil/p54/led.c:160
> p54_unregister_common+0x3d/0xb0 drivers/net/wireless/intersil/p54/main.c:856
> p54u_disconnect+0x86/0x120 drivers/net/wireless/intersil/p54/p54usb.c:1073
> usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
> __device_release_driver drivers/base/dd.c:861
> device_release_driver_internal+0x4f4/0x5c0 drivers/base/dd.c:893
> device_release_driver+0x1e/0x30 drivers/base/dd.c:918
> bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
> device_del+0x5c4/0xab0 drivers/base/core.c:1985
> usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
> usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
> hub_port_connect drivers/usb/core/hub.c:4754
> hub_port_connect_change drivers/usb/core/hub.c:5009
> port_event drivers/usb/core/hub.c:5115
> hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
> process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
> process_scheduled_works kernel/workqueue.c:2179
> worker_thread+0xb2b/0x1850 kernel/workqueue.c:2255
> kthread+0x3a1/0x470 kernel/kthread.c:231
> ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>

It seems this is caused as a result of:
-> lock_map_acquire(&work->lockdep_map);
lock_map_release(&work->lockdep_map);

in flush_work() [0]

This was added by:

commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
Author: Stephen Boyd <[email protected]>
Date: Fri Apr 20 17:28:50 2012 -0700

workqueue: Catch more locking problems with flush_work()

Looking at the Stephen's patch, it's clear that it was made
with "static DECLARE_WORK(work, my_work)" in mind. However
p54's led_work is "per-device", hence it is stored in the
devices context p54_common, which is dynamically allocated.
So, maybe revert Stephen's patch?

[0] <http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2853>

2017-09-21 18:22:46

by Andrey Konovalov

[permalink] [raw]
Subject: Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

On Wed, Sep 20, 2017 at 9:55 PM, Johannes Berg
<[email protected]> wrote:
> On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:
>
>> It seems this is caused as a result of:
>> -> lock_map_acquire(&work->lockdep_map);
>> lock_map_release(&work->lockdep_map);
>>
>> in flush_work() [0]
>
> Agree.
>
>> This was added by:
>>
>> commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
>> Author: Stephen Boyd <[email protected]>
>> Date: Fri Apr 20 17:28:50 2012 -0700
>>
>> workqueue: Catch more locking problems with flush_work()
>
> Yes, but that doesn't matter.
>
>> Looking at the Stephen's patch, it's clear that it was made
>> with "static DECLARE_WORK(work, my_work)" in mind. However
>> p54's led_work is "per-device", hence it is stored in the
>> devices context p54_common, which is dynamically allocated.
>> So, maybe revert Stephen's patch?
>
> I disagree - as the lockdep warning says:
>
>> > INFO: trying to register non-static key.
>> > the code is fine but needs lockdep annotation.
>> > turning off the locking correctness validator.
>
> What it needs is to actually correctly go through initializing the work
> at least once.
>
> Without more information, I can't really say what's going on, but I
> assume that something is failing and p54_unregister_leds() is getting
> invoked without p54_init_leds() having been invoked, so essentially
> it's trying to flush a work that was never initialized?
>
> INIT_DELAYED_WORK() does, after all, initialize the lockdep map
> properly via __INIT_WORK().

Since I'm able to reproduce this, please let me know if you need me to
collect some debug traces to help with the triage.

Thanks!

>
> johannes
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

2017-09-26 15:06:32

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [RESEND] Re: usb/net/p54: trying to register non-static key in p54_unregister_leds

On Sat, Sep 23, 2017 at 9:37 PM, 'Christian Lamparter' via syzkaller
<[email protected]> wrote:
> This got rejected by gmail once. Let's see if it works now.
>
> On Thursday, September 21, 2017 8:22:45 PM CEST Andrey Konovalov wrote:
>> On Wed, Sep 20, 2017 at 9:55 PM, Johannes Berg
>> <[email protected]> wrote:
>> > On Wed, 2017-09-20 at 21:27 +0200, Christian Lamparter wrote:
>> >
>> >> It seems this is caused as a result of:
>> >> -> lock_map_acquire(&work->lockdep_map);
>> >> lock_map_release(&work->lockdep_map);
>> >>
>> >> in flush_work() [0]
>> >
>> > Agree.
>> >
>> >> This was added by:
>> >>
>> >> commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
>> >> Author: Stephen Boyd <[email protected]>
>> >> Date: Fri Apr 20 17:28:50 2012 -0700
>> >>
>> >> workqueue: Catch more locking problems with flush_work()
>> >
>> > Yes, but that doesn't matter.
>> >
>> >> Looking at the Stephen's patch, it's clear that it was made
>> >> with "static DECLARE_WORK(work, my_work)" in mind. However
>> >> p54's led_work is "per-device", hence it is stored in the
>> >> devices context p54_common, which is dynamically allocated.
>> >> So, maybe revert Stephen's patch?
>> >
>> > I disagree - as the lockdep warning says:
>> >
>> >> > INFO: trying to register non-static key.
>> >> > the code is fine but needs lockdep annotation.
>> >> > turning off the locking correctness validator.
>> >
>> > What it needs is to actually correctly go through initializing the work
>> > at least once.
>> >
>> > Without more information, I can't really say what's going on, but I
>> > assume that something is failing and p54_unregister_leds() is getting
>> > invoked without p54_init_leds() having been invoked, so essentially
>> > it's trying to flush a work that was never initialized?
>> >
>> > INIT_DELAYED_WORK() does, after all, initialize the lockdep map
>> > properly via __INIT_WORK().
>
> Ok, thanks. This does indeed explain it.
>
> But this also begs the question: Is this really working then?
> From what I can tell, if CONFIG_LOCKDEP is not set then there's no BUG
> no WARN, no other splat or any other odd system behaviour. Does
> [cancel | flush]_[delayed_]work[_sync] really "just work" by *accident*,
> as long the delayed_work | work_struct is zeroed out?
> And should it work in the future as well?
>
>> Since I'm able to reproduce this, please let me know if you need me to
>> collect some debug traces to help with the triage.
>
> Do you want to take a shot at making a patch too? At a quick glance, it
> should be enough to move the [#ifdef CONFIG_P54_LEDS ... #endif] block
> in p54_unregister_common() into the if (priv->registered) { block
> (preferably before the ieee80211_unregister_hw(dev).

Just mailed a patch.

>
> Regards,
> Christian
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.