2007-05-06 15:16:53

by Satyam Sharma

[permalink] [raw]
Subject: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

On 5/6/07, Satyam Sharma <[email protected]> wrote:
> Hi Ray,
>
> On 5/6/07, Ray Lee <[email protected]> wrote:
> > Upon resume (from suspend to RAM) of my laptop, I'm getting the
> > following. (Not a regression, it's been there a while.) This is under
> > 2.6.21. Everything *seems* fine afterward, but... <shrug> -- it's not
> > like I use bluetooth all that often.
> >
> > My laptop has the bluetooth adapter hooked via USB insternally, so
> > it's the hci_usb driver. Authors cc:d.
> >
> > [ 1.329537] BUG: sleeping function called from invalid context at
> > net/core/sock.c:1523
> > [ 1.329574] in_atomic():1, irqs_disabled():0
> > [ 1.329576] INFO: lockdep is turned off.
> > [ 1.329578]
> > [ 1.329578] Call Trace:
> > [ 1.329585] [<ffffffff8029cedc>] debug_show_held_locks+0x1c/0x30
> > [ 1.329598] [<ffffffff8020b2f5>] __might_sleep+0xe5/0xf0
> > [ 1.329603] [<ffffffff803cfdbc>] lock_sock_nested+0x2c/0x120
> > [ 1.329617] [<ffffffff881c9a6a>] :bluetooth:hci_sock_dev_event+0x4a/0xf0
> > [ 1.329627] [<ffffffff881c9ae7>] :bluetooth:hci_sock_dev_event+0xc7/0xf0
> > [ 1.329634] [<ffffffff80269cef>] notifier_call_chain+0x2f/0x50
> > [ 1.329639] [<ffffffff80269d49>] atomic_notifier_call_chain+0x39/0x70
> > [ 1.329649] [<ffffffff881c4d86>] :bluetooth:hci_notify+0x16/0x20
> > [ 1.329657] [<ffffffff881c5deb>] :bluetooth:hci_unregister_dev+0x5b/0x80
> > [ 1.329664] [<ffffffff8825e136>] :hci_usb:hci_usb_disconnect+0x56/0x90
> > [ 1.329683] [<ffffffff8801e0fe>] :usbcore:usb_unbind_interface+0x4e/0xa0
> > [ 1.329690] [<ffffffff80392e03>] __device_release_driver+0x93/0xc0
> > [ 1.329694] [<ffffffff803933e6>] device_release_driver+0x46/0x70
> > [ 1.329699] [<ffffffff80392638>] bus_remove_device+0x78/0x90
> > [ 1.329703] [<ffffffff80390697>] device_del+0x187/0x200
> > [ 1.329717] [<ffffffff8801b4d2>] :usbcore:usb_disable_device+0x82/0x110
> > [ 1.329731] [<ffffffff8801745a>] :usbcore:usb_disconnect+0xba/0x140
> > [ 1.329746] [<ffffffff88018440>] :usbcore:hub_thread+0x400/0xcc0
> > [ 1.329757] [<ffffffff80297970>] autoremove_wake_function+0x0/0x40
> > [ 1.329772] [<ffffffff88018040>] :usbcore:hub_thread+0x0/0xcc0
> > [ 1.329775] [<ffffffff802977a0>] keventd_create_kthread+0x0/0x90
> > [ 1.329781] [<ffffffff80233e73>] kthread+0xd3/0x110
> > [ 1.329784] [<ffffffff80228890>] schedule_tail+0x0/0xe0
> > [ 1.329791] [<ffffffff80260648>] child_rip+0xa/0x12
> > [ 1.329796] [<ffffffff80260200>] restore_args+0x0/0x30
> > [ 1.329802] [<ffffffff80233da0>] kthread+0x0/0x110
> > [ 1.329806] [<ffffffff8026063e>] child_rip+0x0/0x12
> > [ 1.329809]
>
> Hmmm ... net/bluetooth/hci_sock.c:hci_sock_dev_event() is calling
> lock_sock() which can sleep (while holding the hci_sk_list.lock
> read-write spinlock).
>
> Can't really convert hci_sk_list.lock to a rwsem as
> hci_sock_dev_event() is the notifier_call for hci_sock_nblock which is
> an atomic notifier and so executes in atomic context.
>
> We could convert hci_sock_nblock itself to a blocking notifier too,
> but then I'm not _that_ familiar with this code to do that myself.
>
> Anyway, this doesn't look like anything to do with suspend-resume.
> Perhaps you could try to send this on [email protected] or
> [email protected] or something.

Well, it was a cool summer Sunday evening in Kanpur and ... I decided
to stay in the lab and investigate this :-(

Anyway, the hci_notifier is called from the following six call sites:

hci_dev_open() and hci_dev_close() -> both called from
hci_sock_ioctl() => both can sleep
hci_register_dev() and hci_unregister_dev() => again both are capable
of sleeping
hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
and .resume() of the hci_usb_driver, and again both of these can sleep

Is there any other reason why hci_notifier must be an atomic notifier?

(CC'ing Alan Stern just in case, apparently hci_notifier became atomic
when notifier chains were classified into atomic / blocking)

Patch below makes hci_notifier a blocking notifier. Compile-tested
only, I have no bluetooth hardware to run this, so let me know if this
would be wrong.

(Note that this only goes half-way in resolving this bug, because we
would also need to convert hci_sk_list.lock from rwlock to rwsem for
that. But that is not possible as other users of hci_sk_list.lock
_are_ atomic)

Satyam

---

Make hci_notifier a blocking notifier, as none of its users call it
from atomic context.

Signed-off-by: Satyam Sharma <[email protected]>

---

net/bluetooth/hci_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

---

diff -ruNp a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
--- a/net/bluetooth/hci_core.c 2007-04-26 08:38:32.000000000 +0530
+++ b/net/bluetooth/hci_core.c 2007-05-06 20:16:30.000000000 +0530
@@ -72,23 +72,23 @@ DEFINE_RWLOCK(hci_cb_list_lock);
struct hci_proto *hci_proto[HCI_MAX_PROTO];

/* HCI notifiers list */
-static ATOMIC_NOTIFIER_HEAD(hci_notifier);
+static BLOCKING_NOTIFIER_HEAD(hci_notifier);

/* ---- HCI notifications ---- */

int hci_register_notifier(struct notifier_block *nb)
{
- return atomic_notifier_chain_register(&hci_notifier, nb);
+ return blocking_notifier_chain_register(&hci_notifier, nb);
}

int hci_unregister_notifier(struct notifier_block *nb)
{
- return atomic_notifier_chain_unregister(&hci_notifier, nb);
+ return blocking_notifier_chain_unregister(&hci_notifier, nb);
}

static void hci_notify(struct hci_dev *hdev, int event)
{
- atomic_notifier_call_chain(&hci_notifier, event, hdev);
+ blocking_notifier_call_chain(&hci_notifier, event, hdev);
}

/* ---- HCI requests ---- */


2007-05-06 22:06:41

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

On Sun, 6 May 2007, Satyam Sharma wrote:

> Anyway, the hci_notifier is called from the following six call sites:
>
> hci_dev_open() and hci_dev_close() -> both called from
> hci_sock_ioctl() => both can sleep
> hci_register_dev() and hci_unregister_dev() => again both are capable
> of sleeping
> hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
> and .resume() of the hci_usb_driver, and again both of these can sleep
>
> Is there any other reason why hci_notifier must be an atomic notifier?
>
> (CC'ing Alan Stern just in case, apparently hci_notifier became atomic
> when notifier chains were classified into atomic / blocking)

I don't remember exactly why this particular choice was made. Perhaps we
found that the notifier callout routines didn't use any blocking
primitives (we may have been mistaken about this -- there was a lot of
code to check) and so therefore the choice didn't matter. In that case we
probably just decided to make it an atomic notifier to keep things simple.

As you found, changing it to a blocking notifier is very easy. Provided
all the callers are non-atomic it should work just fine.

Alan Stern

2007-05-06 23:15:04

by Ray Lee

[permalink] [raw]
Subject: Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

On 5/6/07, Alan Stern <[email protected]> wrote:
> On Sun, 6 May 2007, Satyam Sharma wrote:
>
> > Anyway, the hci_notifier is called from the following six call sites:
> >
> > hci_dev_open() and hci_dev_close() -> both called from
> > hci_sock_ioctl() => both can sleep
> > hci_register_dev() and hci_unregister_dev() => again both are capable
> > of sleeping
> > hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
> > and .resume() of the hci_usb_driver, and again both of these can sleep
> >
> > Is there any other reason why hci_notifier must be an atomic notifier?
> >
> > (CC'ing Alan Stern just in case, apparently hci_notifier became atomic
> > when notifier chains were classified into atomic / blocking)
>
> I don't remember exactly why this particular choice was made. Perhaps we
> found that the notifier callout routines didn't use any blocking
> primitives (we may have been mistaken about this -- there was a lot of
> code to check) and so therefore the choice didn't matter. In that case we
> probably just decided to make it an atomic notifier to keep things simple.
>
> As you found, changing it to a blocking notifier is very easy. Provided
> all the callers are non-atomic it should work just fine.

Okay, I'll go ahead and try the patch, then, and report back.

Thanks,

Ray

2007-05-06 23:56:08

by Satyam Sharma

[permalink] [raw]
Subject: Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

(Dropped Pavel, Rafael and linux-pm from CC list, this isn't a PM
error so don't want to spam them; and added bluez-devel)

On 5/7/07, Ray Lee <[email protected]> wrote:
> On 5/6/07, Alan Stern <[email protected]> wrote:
> > On Sun, 6 May 2007, Satyam Sharma wrote:
> >
> > > Anyway, the hci_notifier is called from the following six call sites:
> > >
> > > hci_dev_open() and hci_dev_close() -> both called from
> > > hci_sock_ioctl() => both can sleep
> > > hci_register_dev() and hci_unregister_dev() => again both are capable
> > > of sleeping
> > > hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
> > > and .resume() of the hci_usb_driver, and again both of these can sleep
> > >
> > > Is there any other reason why hci_notifier must be an atomic notifier?
> > >
> > > (CC'ing Alan Stern just in case, apparently hci_notifier became atomic
> > > when notifier chains were classified into atomic / blocking)
> >
> > I don't remember exactly why this particular choice was made. Perhaps we
> > found that the notifier callout routines didn't use any blocking
> > primitives (we may have been mistaken about this -- there was a lot of
> > code to check) and so therefore the choice didn't matter. In that case we
> > probably just decided to make it an atomic notifier to keep things simple.
> >
> > As you found, changing it to a blocking notifier is very easy. Provided
> > all the callers are non-atomic it should work just fine.
>
> Okay, I'll go ahead and try the patch, then, and report back.

You'd still get the BUG message. To fully resolve the problem, we need
to make the hci_sock_dev_event() notifier callout blocking (which
happened with this patch) but also convert hci_sk_list.lock to a
rwsem, but some users of that rwlock (other than hci_sock_dev_event)
are atomic.

However, please do try and get back, as your testing would still be
helpful to see whether converting hci_notifier to blocking had other
side-effects -- if you only see the same message again and otherwise
things seem fine, then we're good as far as at least this change was
concerned.

Thanks,
Satyam

2007-05-16 19:10:11

by Ray Lee

[permalink] [raw]
Subject: Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

Apologies for taking so long to get back to you -- I've been on the
road for the last week and have finally got to a point where I could
test the patch.

On 5/6/07, Satyam Sharma <[email protected]> wrote:
> (Dropped Pavel, Rafael and linux-pm from CC list, this isn't a PM
> error so don't want to spam them; and added bluez-devel)
>
> On 5/7/07, Ray Lee <[email protected]> wrote:
> > On 5/6/07, Alan Stern <[email protected]> wrote:
> > > On Sun, 6 May 2007, Satyam Sharma wrote:
> > >
> > > > Anyway, the hci_notifier is called from the following six call sites:
> > > >
> > > > hci_dev_open() and hci_dev_close() -> both called from
> > > > hci_sock_ioctl() => both can sleep
> > > > hci_register_dev() and hci_unregister_dev() => again both are capable
> > > > of sleeping
> > > > hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
> > > > and .resume() of the hci_usb_driver, and again both of these can sleep
> > > >
> > > > Is there any other reason why hci_notifier must be an atomic notifier?
> > > >
> > > > (CC'ing Alan Stern just in case, apparently hci_notifier became atomic
> > > > when notifier chains were classified into atomic / blocking)
> > >
> > > I don't remember exactly why this particular choice was made. Perhaps we
> > > found that the notifier callout routines didn't use any blocking
> > > primitives (we may have been mistaken about this -- there was a lot of
> > > code to check) and so therefore the choice didn't matter. In that case we
> > > probably just decided to make it an atomic notifier to keep things simple.
> > >
> > > As you found, changing it to a blocking notifier is very easy. Provided
> > > all the callers are non-atomic it should work just fine.
> >
> > Okay, I'll go ahead and try the patch, then, and report back.
>
> You'd still get the BUG message. To fully resolve the problem, we need
> to make the hci_sock_dev_event() notifier callout blocking (which
> happened with this patch) but also convert hci_sk_list.lock to a
> rwsem, but some users of that rwlock (other than hci_sock_dev_event)
> are atomic.
>
> However, please do try and get back, as your testing would still be
> helpful to see whether converting hci_notifier to blocking had other
> side-effects -- if you only see the same message again and otherwise
> things seem fine, then we're good as far as at least this change was
> concerned.

Yes, it's roughly the same trace. There are some differences, though
those are likely due to me finding a new way to trigger the issue. (My
laptop has a button to turn the WiFi/Bluetooth on and off. Hitting
that and causing a disconnect of the internal Bluetooth connector
triggers the same issue without going through a suspend/resume cycle.)

[ 272.539154] BUG: sleeping function called from invalid context at
net/core/sock.c:1547
[ 272.539161] in_atomic():1, irqs_disabled():0
[ 272.539163] 2 locks held by khubd/1350:
[ 272.539165] #0: ((hci_notifier).rwsem){----}, at:
[<ffffffff8023741b>] __blocking_notifier_call_chain+0x3b/0x80
[ 272.539175] #1: (old_style_rw_init#2){-.-?}, at:
[<ffffffff88203c53>] hci_sock_dev_event+0x53/0x100 [bluetooth]
[ 272.539196]
[ 272.539197] Call Trace:
[ 272.539203] [<ffffffff80245833>] debug_show_held_locks+0x13/0x30
[ 272.539216] [<ffffffff80224963>] __might_sleep+0xc3/0xf0
[ 272.539221] [<ffffffff803c29dc>] lock_sock_nested+0x2c/0x120
[ 272.539231] [<ffffffff88203c53>] :bluetooth:hci_sock_dev_event+0x53/0x100
[ 272.539241] [<ffffffff88203c76>] :bluetooth:hci_sock_dev_event+0x76/0x100
[ 272.539250] [<ffffffff8045d073>] notifier_call_chain+0x53/0x80
[ 272.539256] [<ffffffff80237431>] __blocking_notifier_call_chain+0x51/0x80
[ 272.539262] [<ffffffff80237471>] blocking_notifier_call_chain+0x11/0x20
[ 272.539270] [<ffffffff881fed16>] :bluetooth:hci_notify+0x16/0x20
[ 272.539278] [<ffffffff881ffdbb>] :bluetooth:hci_unregister_dev+0x5b/0x80
[ 272.539286] [<ffffffff88224136>] :hci_usb:hci_usb_disconnect+0x56/0x90
[ 272.539309] [<ffffffff8801e66e>] :usbcore:usb_unbind_interface+0x4e/0xa0
[ 272.539315] [<ffffffff80380d86>] __device_release_driver+0x86/0xc0
[ 272.539320] [<ffffffff803812e6>] device_release_driver+0x46/0x70
[ 272.539325] [<ffffffff803804e3>] bus_remove_device+0x63/0x90
[ 272.539329] [<ffffffff8037e474>] device_del+0x1a4/0x2e0
[ 272.539344] [<ffffffff8801b8c6>] :usbcore:usb_disable_device+0x96/0x120
[ 272.539358] [<ffffffff880173ba>] :usbcore:usb_disconnect+0xba/0x140
[ 272.539372] [<ffffffff88017ac3>] :usbcore:hub_thread+0x263/0xdb0
[ 272.539382] [<ffffffff80456b66>] __sched_text_start+0x296/0x2ce
[ 272.539389] [<ffffffff8023eb30>] autoremove_wake_function+0x0/0x40
[ 272.539403] [<ffffffff88017860>] :usbcore:hub_thread+0x0/0xdb0
[ 272.539408] [<ffffffff8023e73d>] kthread+0x4d/0x80
[ 272.539414] [<ffffffff8020a298>] child_rip+0xa/0x12
[ 272.539420] [<ffffffff80209e50>] restore_args+0x0/0x30
[ 272.539424] [<ffffffff8023e844>] kthreadd+0xd4/0x160
[ 272.539429] [<ffffffff8023e6f0>] kthread+0x0/0x80
[ 272.539433] [<ffffffff8020a28e>] child_rip+0x0/0x12

Ray

2007-05-17 04:13:16

by Satyam Sharma

[permalink] [raw]
Subject: Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

On 5/17/07, Ray Lee <[email protected]> wrote:
> Apologies for taking so long to get back to you -- I've been on the
> road for the last week and have finally got to a point where I could
> test the patch.
>
> On 5/6/07, Satyam Sharma <[email protected]> wrote:
> > (Dropped Pavel, Rafael and linux-pm from CC list, this isn't a PM
> > error so don't want to spam them; and added bluez-devel)
> >
> > On 5/7/07, Ray Lee <[email protected]> wrote:
> > > On 5/6/07, Alan Stern <[email protected]> wrote:
> > > > On Sun, 6 May 2007, Satyam Sharma wrote:
> > > >
> > > > > Anyway, the hci_notifier is called from the following six call sites:
> > > > >
> > > > > hci_dev_open() and hci_dev_close() -> both called from
> > > > > hci_sock_ioctl() => both can sleep
> > > > > hci_register_dev() and hci_unregister_dev() => again both are capable
> > > > > of sleeping
> > > > > hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
> > > > > and .resume() of the hci_usb_driver, and again both of these can sleep
> > > > >
> > > > > Is there any other reason why hci_notifier must be an atomic notifier?
> > > > >
> > > > > (CC'ing Alan Stern just in case, apparently hci_notifier became atomic
> > > > > when notifier chains were classified into atomic / blocking)
> > > >
> > > > I don't remember exactly why this particular choice was made. Perhaps we
> > > > found that the notifier callout routines didn't use any blocking
> > > > primitives (we may have been mistaken about this -- there was a lot of
> > > > code to check) and so therefore the choice didn't matter. In that case we
> > > > probably just decided to make it an atomic notifier to keep things simple.
> > > >
> > > > As you found, changing it to a blocking notifier is very easy. Provided
> > > > all the callers are non-atomic it should work just fine.
> > >
> > > Okay, I'll go ahead and try the patch, then, and report back.
> >
> > You'd still get the BUG message. To fully resolve the problem, we need
> > to make the hci_sock_dev_event() notifier callout blocking (which
> > happened with this patch) but also convert hci_sk_list.lock to a
> > rwsem, but some users of that rwlock (other than hci_sock_dev_event)
> > are atomic.
> >
> > However, please do try and get back, as your testing would still be
> > helpful to see whether converting hci_notifier to blocking had other
> > side-effects -- if you only see the same message again and otherwise
> > things seem fine, then we're good as far as at least this change was
> > concerned.
>
> Yes, it's roughly the same trace. There are some differences, though
> those are likely due to me finding a new way to trigger the issue. (My
> laptop has a button to turn the WiFi/Bluetooth on and off. Hitting
> that and causing a disconnect of the internal Bluetooth connector
> triggers the same issue without going through a suspend/resume cycle.)

Hi Ray,

This issue has actually been resolved, see the patch at:
http://lkml.org/lkml/2007/5/16/149

[ We've slightly altered the locking scheme, but it's also good to know
that converting hci_notifier to a blocking notifier doesn't cause any
troubles either. If this is fine with other drivers too, this could actually
be a separate patch. ]

I'll also soon send that patch to Andrew, will Cc you too.

Thanks,
Satyam

2007-05-17 04:33:01

by Ray Lee

[permalink] [raw]
Subject: Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

On 5/16/07, Satyam Sharma <[email protected]> wrote:
> This issue has actually been resolved, see the patch at:
> http://lkml.org/lkml/2007/5/16/149

Ah, excellent. Thanks!

Ray