2013-07-06 18:03:16

by Maarten Lankhorst

[permalink] [raw]
Subject: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage with hcd_urb_list_lock

I didn't even know I still had lockdep on.
The following lockdep splat happened when I plugged in a usb bluetooth dongle, using
the pre-rc1 3.11 kernel at HEAD b2c311075db

=================================
[ INFO: inconsistent lock state ]
3.10.0+ #106 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
irq/42-xhci_hcd/97 [HC0[0]:SC0[2]:HE1:SE0] takes:
(hcd_urb_list_lock){?.....}, at: [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
{IN-HARDIRQ-W} state was registered at:
[<ffffffff810afbc6>] __lock_acquire+0x861/0x1c2e
[<ffffffff810b1556>] lock_acquire+0x87/0x139
[<ffffffff81698a1a>] _raw_spin_lock+0x3b/0x4a
[<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
[<ffffffff814a65f1>] ehci_urb_done+0x5b/0x9c
[<ffffffff814a8e18>] qh_completions+0x311/0x3ca
[<ffffffff814aa613>] ehci_handle_intr_unlinks+0xad/0x143
[<ffffffff814a7098>] ehci_hrtimer_func+0x6b/0xbd
[<ffffffff8107cfc8>] __run_hrtimer+0x61/0x1f9
[<ffffffff8107d7ca>] hrtimer_interrupt+0x10a/0x253
[<ffffffff810256c0>] local_apic_timer_interrupt+0x38/0x5e
[<ffffffff81025ab5>] smp_apic_timer_interrupt+0x46/0x5b
[<ffffffff816a152f>] apic_timer_interrupt+0x6f/0x80
irq event stamp: 224
hardirqs last enabled at (224): [<ffffffff8168e8d4>] __slab_free+0x1c7/0x2ed
hardirqs last disabled at (223): [<ffffffff8168e846>] __slab_free+0x139/0x2ed
softirqs last enabled at (206): [<ffffffff810d8024>] irq_forced_thread_fn+0x49/0x5a
softirqs last disabled at (212): [<ffffffff810d7fff>] irq_forced_thread_fn+0x24/0x5a

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(hcd_urb_list_lock);
<Interrupt>
lock(hcd_urb_list_lock);

*** DEADLOCK ***

1 lock held by irq/42-xhci_hcd/97:
#0: (&(&xhci->lock)->rlock){+.-...}, at: [<ffffffff814befdd>] xhci_irq+0x2f/0x143d

stack backtrace:
CPU: 1 PID: 97 Comm: irq/42-xhci_hcd Not tainted 3.10.0+ #106
Hardware name: Acer Aspire M3985/Aspire M3985, BIOS P01-A1 03/12/2012
ffffffff8210c150 ffff88040834da48 ffffffff81691af4 0000000000000007
ffff8804082e20b0 ffff88040834daa8 ffffffff8168cb10 0000000000000002
ffff880400000001 ffff880400000000 ffffffff8100f4f7 ffff88040834dac4
Call Trace:
[<ffffffff81691af4>] dump_stack+0x4f/0x84
[<ffffffff8168cb10>] print_usage_bug+0x1f5/0x206
[<ffffffff8100f4f7>] ? save_stack_trace+0x2f/0x50
[<ffffffff810af30c>] mark_lock+0x276/0x2cf
[<ffffffff810ae8cc>] ? check_usage_forwards+0x12f/0x12f
[<ffffffff810af925>] __lock_acquire+0x5c0/0x1c2e
[<ffffffff810b1e04>] ? mark_held_locks+0x6d/0x117
[<ffffffff8168e8d4>] ? __slab_free+0x1c7/0x2ed
[<ffffffff810b1f5a>] ? trace_hardirqs_on_caller+0xac/0x1bb
[<ffffffff810b2076>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
[<ffffffff810b1556>] lock_acquire+0x87/0x139
[<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
[<ffffffff81698a1a>] _raw_spin_lock+0x3b/0x4a
[<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
[<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
[<ffffffff814bf55a>] xhci_irq+0x5ac/0x143d
[<ffffffff81699171>] ? _raw_spin_unlock_irq+0x3b/0x5d
[<ffffffff8108386d>] ? finish_task_switch+0x7c/0x101
[<ffffffff81083830>] ? finish_task_switch+0x3f/0x101
[<ffffffff81697060>] ? __schedule+0x42a/0x885
[<ffffffff810d7fdb>] ? irq_thread_fn+0x48/0x48
[<ffffffff814c03fc>] xhci_msi_irq+0x11/0x15
[<ffffffff810d8009>] irq_forced_thread_fn+0x2e/0x5a
[<ffffffff810d7d36>] irq_thread+0x10e/0x131
[<ffffffff810d7ee3>] ? irq_finalize_oneshot.part.33+0xe0/0xe0
[<ffffffff810d7c28>] ? wake_threads_waitq+0x44/0x44
[<ffffffff81079b61>] kthread+0xea/0xef
[<ffffffff81079a77>] ? flush_kthread_work+0x19c/0x19c
[<ffffffff816a082c>] ret_from_fork+0x7c/0xb0
[<ffffffff81079a77>] ? flush_kthread_work+0x19c/0x19c


2013-07-06 20:48:50

by Alan Stern

[permalink] [raw]
Subject: Re: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage with hcd_urb_list_lock

On Sat, 6 Jul 2013, Maarten Lankhorst wrote:

> I didn't even know I still had lockdep on.
> The following lockdep splat happened when I plugged in a usb bluetooth dongle, using
> the pre-rc1 3.11 kernel at HEAD b2c311075db
>
> =================================
> [ INFO: inconsistent lock state ]
> 3.10.0+ #106 Not tainted
> ---------------------------------
> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> irq/42-xhci_hcd/97 [HC0[0]:SC0[2]:HE1:SE0] takes:
> (hcd_urb_list_lock){?.....}, at: [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e

> stack backtrace:
> CPU: 1 PID: 97 Comm: irq/42-xhci_hcd Not tainted 3.10.0+ #106
> Hardware name: Acer Aspire M3985/Aspire M3985, BIOS P01-A1 03/12/2012
> ffffffff8210c150 ffff88040834da48 ffffffff81691af4 0000000000000007
> ffff8804082e20b0 ffff88040834daa8 ffffffff8168cb10 0000000000000002
> ffff880400000001 ffff880400000000 ffffffff8100f4f7 ffff88040834dac4
> Call Trace:
> [<ffffffff81691af4>] dump_stack+0x4f/0x84
> [<ffffffff8168cb10>] print_usage_bug+0x1f5/0x206
> [<ffffffff8100f4f7>] ? save_stack_trace+0x2f/0x50
> [<ffffffff810af30c>] mark_lock+0x276/0x2cf
> [<ffffffff810ae8cc>] ? check_usage_forwards+0x12f/0x12f
> [<ffffffff810af925>] __lock_acquire+0x5c0/0x1c2e
> [<ffffffff810b1e04>] ? mark_held_locks+0x6d/0x117
> [<ffffffff8168e8d4>] ? __slab_free+0x1c7/0x2ed
> [<ffffffff810b1f5a>] ? trace_hardirqs_on_caller+0xac/0x1bb
> [<ffffffff810b2076>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> [<ffffffff810b1556>] lock_acquire+0x87/0x139
> [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> [<ffffffff81698a1a>] _raw_spin_lock+0x3b/0x4a
> [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
> [<ffffffff814bf55a>] xhci_irq+0x5ac/0x143d
> [<ffffffff81699171>] ? _raw_spin_unlock_irq+0x3b/0x5d
> [<ffffffff8108386d>] ? finish_task_switch+0x7c/0x101
> [<ffffffff81083830>] ? finish_task_switch+0x3f/0x101
> [<ffffffff81697060>] ? __schedule+0x42a/0x885
> [<ffffffff810d7fdb>] ? irq_thread_fn+0x48/0x48
> [<ffffffff814c03fc>] xhci_msi_irq+0x11/0x15

It looks like xhci_msi_irq() needs to call local_irq_save() and
local_irq_restore().

Alan Stern

2013-07-07 09:32:31

by Ming Lei

[permalink] [raw]
Subject: Re: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage with hcd_urb_list_lock

On Sun, Jul 7, 2013 at 4:48 AM, Alan Stern <[email protected]> wrote:
> On Sat, 6 Jul 2013, Maarten Lankhorst wrote:
>
>> I didn't even know I still had lockdep on.
>> The following lockdep splat happened when I plugged in a usb bluetooth dongle, using
>> the pre-rc1 3.11 kernel at HEAD b2c311075db
>>
>> =================================
>> [ INFO: inconsistent lock state ]
>> 3.10.0+ #106 Not tainted
>> ---------------------------------
>> inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> irq/42-xhci_hcd/97 [HC0[0]:SC0[2]:HE1:SE0] takes:
>> (hcd_urb_list_lock){?.....}, at: [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
>
>> stack backtrace:
>> CPU: 1 PID: 97 Comm: irq/42-xhci_hcd Not tainted 3.10.0+ #106

Maarten, I am wondering you are running kernel from linus tree since
looks xhci threaded irq is enabled from the log.

If yes, please don't do that because USB host controller driver still
doesn't support threaded irq now.

>> Hardware name: Acer Aspire M3985/Aspire M3985, BIOS P01-A1 03/12/2012
>> ffffffff8210c150 ffff88040834da48 ffffffff81691af4 0000000000000007
>> ffff8804082e20b0 ffff88040834daa8 ffffffff8168cb10 0000000000000002
>> ffff880400000001 ffff880400000000 ffffffff8100f4f7 ffff88040834dac4
>> Call Trace:
>> [<ffffffff81691af4>] dump_stack+0x4f/0x84
>> [<ffffffff8168cb10>] print_usage_bug+0x1f5/0x206
>> [<ffffffff8100f4f7>] ? save_stack_trace+0x2f/0x50
>> [<ffffffff810af30c>] mark_lock+0x276/0x2cf
>> [<ffffffff810ae8cc>] ? check_usage_forwards+0x12f/0x12f
>> [<ffffffff810af925>] __lock_acquire+0x5c0/0x1c2e
>> [<ffffffff810b1e04>] ? mark_held_locks+0x6d/0x117
>> [<ffffffff8168e8d4>] ? __slab_free+0x1c7/0x2ed
>> [<ffffffff810b1f5a>] ? trace_hardirqs_on_caller+0xac/0x1bb
>> [<ffffffff810b2076>] ? trace_hardirqs_on+0xd/0xf
>> [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
>> [<ffffffff810b1556>] lock_acquire+0x87/0x139
>> [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
>> [<ffffffff81698a1a>] _raw_spin_lock+0x3b/0x4a
>> [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
>> [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
>> [<ffffffff814bf55a>] xhci_irq+0x5ac/0x143d
>> [<ffffffff81699171>] ? _raw_spin_unlock_irq+0x3b/0x5d
>> [<ffffffff8108386d>] ? finish_task_switch+0x7c/0x101
>> [<ffffffff81083830>] ? finish_task_switch+0x3f/0x101
>> [<ffffffff81697060>] ? __schedule+0x42a/0x885
>> [<ffffffff810d7fdb>] ? irq_thread_fn+0x48/0x48
>> [<ffffffff814c03fc>] xhci_msi_irq+0x11/0x15
>
> It looks like xhci_msi_irq() needs to call local_irq_save() and
> local_irq_restore().

Looks it isn't needed.

Thanks,
--
Ming Lei

2013-07-11 16:32:46

by Sarah Sharp

[permalink] [raw]
Subject: Re: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage with hcd_urb_list_lock

On Sat, Jul 06, 2013 at 04:48:47PM -0400, Alan Stern wrote:
> On Sat, 6 Jul 2013, Maarten Lankhorst wrote:
>
> > I didn't even know I still had lockdep on.
> > The following lockdep splat happened when I plugged in a usb bluetooth dongle, using
> > the pre-rc1 3.11 kernel at HEAD b2c311075db
> >
> > =================================
> > [ INFO: inconsistent lock state ]
> > 3.10.0+ #106 Not tainted
> > ---------------------------------
> > inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > irq/42-xhci_hcd/97 [HC0[0]:SC0[2]:HE1:SE0] takes:
> > (hcd_urb_list_lock){?.....}, at: [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
>
> > stack backtrace:
> > CPU: 1 PID: 97 Comm: irq/42-xhci_hcd Not tainted 3.10.0+ #106
> > Hardware name: Acer Aspire M3985/Aspire M3985, BIOS P01-A1 03/12/2012
> > ffffffff8210c150 ffff88040834da48 ffffffff81691af4 0000000000000007
> > ffff8804082e20b0 ffff88040834daa8 ffffffff8168cb10 0000000000000002
> > ffff880400000001 ffff880400000000 ffffffff8100f4f7 ffff88040834dac4
> > Call Trace:
> > [<ffffffff81691af4>] dump_stack+0x4f/0x84
> > [<ffffffff8168cb10>] print_usage_bug+0x1f5/0x206
> > [<ffffffff8100f4f7>] ? save_stack_trace+0x2f/0x50
> > [<ffffffff810af30c>] mark_lock+0x276/0x2cf
> > [<ffffffff810ae8cc>] ? check_usage_forwards+0x12f/0x12f
> > [<ffffffff810af925>] __lock_acquire+0x5c0/0x1c2e
> > [<ffffffff810b1e04>] ? mark_held_locks+0x6d/0x117
> > [<ffffffff8168e8d4>] ? __slab_free+0x1c7/0x2ed
> > [<ffffffff810b1f5a>] ? trace_hardirqs_on_caller+0xac/0x1bb
> > [<ffffffff810b2076>] ? trace_hardirqs_on+0xd/0xf
> > [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > [<ffffffff810b1556>] lock_acquire+0x87/0x139
> > [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > [<ffffffff81698a1a>] _raw_spin_lock+0x3b/0x4a
> > [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > [<ffffffff814bf55a>] xhci_irq+0x5ac/0x143d
> > [<ffffffff81699171>] ? _raw_spin_unlock_irq+0x3b/0x5d
> > [<ffffffff8108386d>] ? finish_task_switch+0x7c/0x101
> > [<ffffffff81083830>] ? finish_task_switch+0x3f/0x101
> > [<ffffffff81697060>] ? __schedule+0x42a/0x885
> > [<ffffffff810d7fdb>] ? irq_thread_fn+0x48/0x48
> > [<ffffffff814c03fc>] xhci_msi_irq+0x11/0x15
>
> It looks like xhci_msi_irq() needs to call local_irq_save() and
> local_irq_restore().

Alan, does this only need to be called when MSI is enabled?
xhci_msi_irq() basically calls xhci_irq, which is called if MSI is
disabled and legacy PCI interrupts are used.

I'm trying to figure out whether to put the calls to local_irq_save()
and local_irq_restore() in xhci_msi_irq() or xhci_irq().

Sarah Sharp

2013-07-11 16:58:56

by Alan Stern

[permalink] [raw]
Subject: Re: inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage with hcd_urb_list_lock

On Thu, 11 Jul 2013, Sarah Sharp wrote:

> On Sat, Jul 06, 2013 at 04:48:47PM -0400, Alan Stern wrote:
> > On Sat, 6 Jul 2013, Maarten Lankhorst wrote:
> >
> > > I didn't even know I still had lockdep on.
> > > The following lockdep splat happened when I plugged in a usb bluetooth dongle, using
> > > the pre-rc1 3.11 kernel at HEAD b2c311075db
> > >
> > > =================================
> > > [ INFO: inconsistent lock state ]
> > > 3.10.0+ #106 Not tainted
> > > ---------------------------------
> > > inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> > > irq/42-xhci_hcd/97 [HC0[0]:SC0[2]:HE1:SE0] takes:
> > > (hcd_urb_list_lock){?.....}, at: [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
> >
> > > stack backtrace:
> > > CPU: 1 PID: 97 Comm: irq/42-xhci_hcd Not tainted 3.10.0+ #106
> > > Hardware name: Acer Aspire M3985/Aspire M3985, BIOS P01-A1 03/12/2012
> > > ffffffff8210c150 ffff88040834da48 ffffffff81691af4 0000000000000007
> > > ffff8804082e20b0 ffff88040834daa8 ffffffff8168cb10 0000000000000002
> > > ffff880400000001 ffff880400000000 ffffffff8100f4f7 ffff88040834dac4
> > > Call Trace:
> > > [<ffffffff81691af4>] dump_stack+0x4f/0x84
> > > [<ffffffff8168cb10>] print_usage_bug+0x1f5/0x206
> > > [<ffffffff8100f4f7>] ? save_stack_trace+0x2f/0x50
> > > [<ffffffff810af30c>] mark_lock+0x276/0x2cf
> > > [<ffffffff810ae8cc>] ? check_usage_forwards+0x12f/0x12f
> > > [<ffffffff810af925>] __lock_acquire+0x5c0/0x1c2e
> > > [<ffffffff810b1e04>] ? mark_held_locks+0x6d/0x117
> > > [<ffffffff8168e8d4>] ? __slab_free+0x1c7/0x2ed
> > > [<ffffffff810b1f5a>] ? trace_hardirqs_on_caller+0xac/0x1bb
> > > [<ffffffff810b2076>] ? trace_hardirqs_on+0xd/0xf
> > > [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > > [<ffffffff810b1556>] lock_acquire+0x87/0x139
> > > [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > > [<ffffffff81698a1a>] _raw_spin_lock+0x3b/0x4a
> > > [<ffffffff8149440e>] ? usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > > [<ffffffff8149440e>] usb_hcd_unlink_urb_from_ep+0x28/0x4e
> > > [<ffffffff814bf55a>] xhci_irq+0x5ac/0x143d
> > > [<ffffffff81699171>] ? _raw_spin_unlock_irq+0x3b/0x5d
> > > [<ffffffff8108386d>] ? finish_task_switch+0x7c/0x101
> > > [<ffffffff81083830>] ? finish_task_switch+0x3f/0x101
> > > [<ffffffff81697060>] ? __schedule+0x42a/0x885
> > > [<ffffffff810d7fdb>] ? irq_thread_fn+0x48/0x48
> > > [<ffffffff814c03fc>] xhci_msi_irq+0x11/0x15
> >
> > It looks like xhci_msi_irq() needs to call local_irq_save() and
> > local_irq_restore().
>
> Alan, does this only need to be called when MSI is enabled?
> xhci_msi_irq() basically calls xhci_irq, which is called if MSI is
> disabled and legacy PCI interrupts are used.
>
> I'm trying to figure out whether to put the calls to local_irq_save()
> and local_irq_restore() in xhci_msi_irq() or xhci_irq().

Ming Lei pointed out that my comment was wrong. The reason for the
lockdep violation was because Maarten had set up xhci-hcd to use
threaded interrupts. You can see this in the current process name in
the lockdep warning: "irq/42-xhci_hcd" is the name of an
interrupt-handler thread.

Without threaded interrupts, it seems likely that no changes are
needed. Certainly no changes are needed in xhci_irq(), because usbcore
disables local interrupts explicitly before invoking legacy interrupt
handlers (see usb_hcd_irq() in hcd.c). In the absense of threaded
interrupts, I assume xhci_msi_irq() will be invoked with interrupts
disabled -- if not then it will need to disable them before calling
xhci_irq().

Alan Stern