2015-11-22 11:24:10

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [4.3] kworker busy in pm_runtime_work

On 16 November 2015 at 23:22, Alan Stern <[email protected]> wrote:
> On Mon, 16 Nov 2015, Daniel J Blueman wrote:
>
>> Tuning USB suspend [1] in 4.3 on a Dell XPS 15 9553 (Skylake), I see a
>> kworker thread spinning in rpm_suspend [2].
>>
>> What is the most useful debug to get here beyond the immediate [3]?
>
> You can try doing:
>
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control

kworker and ksoftirqd spinning occurs when I echo 'auto' to all the
USB control entries. Using Alan's excellent tip, we see this being
logged repeatedly at a high rate:
[ 353.245180] usb usb1-port4: status 0107 change 0000
[ 353.245194] usb usb1-port12: status 0507 change 0000
[ 353.245202] hub 1-0:1.0: state 7 ports 16 chg 0000 evt 0000
[ 353.245203] hub 1-0:1.0: hub_suspend
[ 353.245205] usb usb1: bus auto-suspend, wakeup 1
[ 353.245206] usb usb1: bus suspend fail, err -16
[ 353.245207] hub 1-0:1.0: hub_resume
...

So, EBUSY. Both the webcam is not open, and the bluetooth interface
[1] is rfkill'd; the situation occurs even if I unload all related
modules.

What further debug would be useful?

Thanks!
Daniel

-- [1]

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 0a5c:6410 Broadcom Corp.
Bus 001 Device 003: ID 1bcf:2b95 Sunplus Innovation Technology Inc.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
--
Daniel J Blueman


2015-11-23 15:52:45

by Alan Stern

[permalink] [raw]
Subject: Re: [4.3] kworker busy in pm_runtime_work

On Sun, 22 Nov 2015, Daniel J Blueman wrote:

> On 16 November 2015 at 23:22, Alan Stern <[email protected]> wrote:
> > On Mon, 16 Nov 2015, Daniel J Blueman wrote:
> >
> >> Tuning USB suspend [1] in 4.3 on a Dell XPS 15 9553 (Skylake), I see a
> >> kworker thread spinning in rpm_suspend [2].
> >>
> >> What is the most useful debug to get here beyond the immediate [3]?
> >
> > You can try doing:
> >
> > echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
>
> kworker and ksoftirqd spinning occurs when I echo 'auto' to all the
> USB control entries. Using Alan's excellent tip, we see this being
> logged repeatedly at a high rate:
> [ 353.245180] usb usb1-port4: status 0107 change 0000
> [ 353.245194] usb usb1-port12: status 0507 change 0000
> [ 353.245202] hub 1-0:1.0: state 7 ports 16 chg 0000 evt 0000
> [ 353.245203] hub 1-0:1.0: hub_suspend
> [ 353.245205] usb usb1: bus auto-suspend, wakeup 1
> [ 353.245206] usb usb1: bus suspend fail, err -16
> [ 353.245207] hub 1-0:1.0: hub_resume
> ...
>
> So, EBUSY. Both the webcam is not open, and the bluetooth interface
> [1] is rfkill'd; the situation occurs even if I unload all related
> modules.
>
> What further debug would be useful?
>
> Thanks!
> Daniel
>
> -- [1]
>
> Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 001 Device 002: ID 0a5c:6410 Broadcom Corp.
> Bus 001 Device 003: ID 1bcf:2b95 Sunplus Innovation Technology Inc.
> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Since bus 1 uses an xHCI controller, you should do:

echo 'module xhci-hcd =p' >/sys/kernel/debug/dynamic_debug/control

I'm reasonably sure this will end up printing "suspend failed
because a port is resuming", since that's the only place where
xhci_bus_suspend() fails with -EBUSY, but you should try it to confirm
this.

Alan Stern

2015-11-24 22:23:08

by Daniel J Blueman

[permalink] [raw]
Subject: Re: [4.3] kworker busy in pm_runtime_work

On 23 November 2015 at 23:52, Alan Stern <[email protected]> wrote:
> On Sun, 22 Nov 2015, Daniel J Blueman wrote:
>
>> On 16 November 2015 at 23:22, Alan Stern <[email protected]> wrote:
>> > On Mon, 16 Nov 2015, Daniel J Blueman wrote:
>> >
>> >> Tuning USB suspend [1] in 4.3 on a Dell XPS 15 9553 (Skylake), I see a
>> >> kworker thread spinning in rpm_suspend [2].
>> >>
>> >> What is the most useful debug to get here beyond the immediate [3]?
>> >
>> > You can try doing:
>> >
>> > echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
>>
>> kworker and ksoftirqd spinning occurs when I echo 'auto' to all the
>> USB control entries. Using Alan's excellent tip, we see this being
>> logged repeatedly at a high rate:
>> [ 353.245180] usb usb1-port4: status 0107 change 0000
>> [ 353.245194] usb usb1-port12: status 0507 change 0000
>> [ 353.245202] hub 1-0:1.0: state 7 ports 16 chg 0000 evt 0000
>> [ 353.245203] hub 1-0:1.0: hub_suspend
>> [ 353.245205] usb usb1: bus auto-suspend, wakeup 1
>> [ 353.245206] usb usb1: bus suspend fail, err -16
>> [ 353.245207] hub 1-0:1.0: hub_resume
>> ...
>>
>> So, EBUSY. Both the webcam is not open, and the bluetooth interface
>> [1] is rfkill'd; the situation occurs even if I unload all related
>> modules.
>>
>> What further debug would be useful?
>>
>> Thanks!
>> Daniel
>>
>> -- [1]
>>
>> Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>> Bus 001 Device 002: ID 0a5c:6410 Broadcom Corp.
>> Bus 001 Device 003: ID 1bcf:2b95 Sunplus Innovation Technology Inc.
>> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>
> Since bus 1 uses an xHCI controller, you should do:
>
> echo 'module xhci-hcd =p' >/sys/kernel/debug/dynamic_debug/control
>
> I'm reasonably sure this will end up printing "suspend failed
> because a port is resuming", since that's the only place where
> xhci_bus_suspend() fails with -EBUSY, but you should try it to confirm
> this.

I had to use:
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control

and indeed we see:
[29172.246221] xhci_hcd 0000:00:14.0: get port status, actual port 11
status = 0xe63
[29172.246222] xhci_hcd 0000:00:14.0: Get port status returned 0x507
[29172.246224] xhci_hcd 0000:00:14.0: get port status, actual port 12
status = 0x2a0
[29172.246228] xhci_hcd 0000:00:14.0: get port status, actual port 13
status = 0x2a0
[29172.246228] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246231] xhci_hcd 0000:00:14.0: get port status, actual port 14
status = 0x2a0
[29172.246232] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246235] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246248] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246254] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246264] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246275] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246285] xhci_hcd 0000:00:14.0: Get port status returned 0x507
[29172.246294] xhci_hcd 0000:00:14.0: get port status, actual port 14
status = 0x2a0
[29172.246302] xhci_hcd 0000:00:14.0: suspend failed because a port is resuming
[29172.246321] xhci_hcd 0000:00:14.0: Get port status returned 0x107
[29172.246332] xhci_hcd 0000:00:14.0: get port status, actual port 6
status = 0x2a0
[29172.246346] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246359] xhci_hcd 0000:00:14.0: get port status, actual port 13
status = 0x2a0
[29172.246364] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246366] xhci_hcd 0000:00:14.0: get port status, actual port 15
status = 0x2a0
[29172.246371] xhci_hcd 0000:00:14.0: suspend failed because a port is resuming
[29172.246380] xhci_hcd 0000:00:14.0: Get port status returned 0x100
[29172.246382] xhci_hcd 0000:00:14.0: get port status, actual port 1
status = 0x2a0
[29172.246383] xhci_hcd 0000:00:14.0: Get port status returned 0x100
...
--
Daniel J Blueman

2015-11-25 15:42:31

by Alan Stern

[permalink] [raw]
Subject: Re: [4.3] kworker busy in pm_runtime_work

On Wed, 25 Nov 2015, Daniel J Blueman wrote:

> On 23 November 2015 at 23:52, Alan Stern <[email protected]> wrote:
> > On Sun, 22 Nov 2015, Daniel J Blueman wrote:
> >
> >> On 16 November 2015 at 23:22, Alan Stern <[email protected]> wrote:
> >> > On Mon, 16 Nov 2015, Daniel J Blueman wrote:
> >> >
> >> >> Tuning USB suspend [1] in 4.3 on a Dell XPS 15 9553 (Skylake), I see a
> >> >> kworker thread spinning in rpm_suspend [2].
> >> >>
> >> >> What is the most useful debug to get here beyond the immediate [3]?
> >> >
> >> > You can try doing:
> >> >
> >> > echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> >>
> >> kworker and ksoftirqd spinning occurs when I echo 'auto' to all the
> >> USB control entries. Using Alan's excellent tip, we see this being
> >> logged repeatedly at a high rate:
> >> [ 353.245180] usb usb1-port4: status 0107 change 0000
> >> [ 353.245194] usb usb1-port12: status 0507 change 0000
> >> [ 353.245202] hub 1-0:1.0: state 7 ports 16 chg 0000 evt 0000
> >> [ 353.245203] hub 1-0:1.0: hub_suspend
> >> [ 353.245205] usb usb1: bus auto-suspend, wakeup 1
> >> [ 353.245206] usb usb1: bus suspend fail, err -16
> >> [ 353.245207] hub 1-0:1.0: hub_resume
> >> ...
> >>
> >> So, EBUSY. Both the webcam is not open, and the bluetooth interface
> >> [1] is rfkill'd; the situation occurs even if I unload all related
> >> modules.
> >>
> >> What further debug would be useful?
> >>
> >> Thanks!
> >> Daniel
> >>
> >> -- [1]
> >>
> >> Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> >> Bus 001 Device 002: ID 0a5c:6410 Broadcom Corp.
> >> Bus 001 Device 003: ID 1bcf:2b95 Sunplus Innovation Technology Inc.
> >> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> >
> > Since bus 1 uses an xHCI controller, you should do:
> >
> > echo 'module xhci-hcd =p' >/sys/kernel/debug/dynamic_debug/control
> >
> > I'm reasonably sure this will end up printing "suspend failed
> > because a port is resuming", since that's the only place where
> > xhci_bus_suspend() fails with -EBUSY, but you should try it to confirm
> > this.
>
> I had to use:
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
>
> and indeed we see:
> [29172.246221] xhci_hcd 0000:00:14.0: get port status, actual port 11
> status = 0xe63
> [29172.246222] xhci_hcd 0000:00:14.0: Get port status returned 0x507
> [29172.246224] xhci_hcd 0000:00:14.0: get port status, actual port 12
> status = 0x2a0
> [29172.246228] xhci_hcd 0000:00:14.0: get port status, actual port 13
> status = 0x2a0
> [29172.246228] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246231] xhci_hcd 0000:00:14.0: get port status, actual port 14
> status = 0x2a0
> [29172.246232] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246235] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246248] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246254] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246264] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246275] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246285] xhci_hcd 0000:00:14.0: Get port status returned 0x507
> [29172.246294] xhci_hcd 0000:00:14.0: get port status, actual port 14
> status = 0x2a0
> [29172.246302] xhci_hcd 0000:00:14.0: suspend failed because a port is resuming
> [29172.246321] xhci_hcd 0000:00:14.0: Get port status returned 0x107
> [29172.246332] xhci_hcd 0000:00:14.0: get port status, actual port 6
> status = 0x2a0
> [29172.246346] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246359] xhci_hcd 0000:00:14.0: get port status, actual port 13
> status = 0x2a0
> [29172.246364] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246366] xhci_hcd 0000:00:14.0: get port status, actual port 15
> status = 0x2a0
> [29172.246371] xhci_hcd 0000:00:14.0: suspend failed because a port is resuming
> [29172.246380] xhci_hcd 0000:00:14.0: Get port status returned 0x100
> [29172.246382] xhci_hcd 0000:00:14.0: get port status, actual port 1
> status = 0x2a0
> [29172.246383] xhci_hcd 0000:00:14.0: Get port status returned 0x100

It's time to get some assistance from the xhci-hcd maintainer.

Mathias, Daniel's machine is spinning, trying to suspend the root hub
on the xHCI HS/FS/LS bus and failing, over and over again. It looks
like either bus_state->resuming_ports or bus_state->port_remote_wakeup
is nonzero when it shouldn't be.

The information we have is all above. Can you help Daniel debug this?

Alan Stern

2015-11-26 13:23:53

by Mathias Nyman

[permalink] [raw]
Subject: Re: [4.3] kworker busy in pm_runtime_work

On 25.11.2015 17:42, Alan Stern wrote:
> On Wed, 25 Nov 2015, Daniel J Blueman wrote:
>
>> On 23 November 2015 at 23:52, Alan Stern <[email protected]> wrote:
>>> On Sun, 22 Nov 2015, Daniel J Blueman wrote:
>>>
>>>> On 16 November 2015 at 23:22, Alan Stern <[email protected]> wrote:
>>>>> On Mon, 16 Nov 2015, Daniel J Blueman wrote:
>>>>>
>>>>>> Tuning USB suspend [1] in 4.3 on a Dell XPS 15 9553 (Skylake), I see a
>>>>>> kworker thread spinning in rpm_suspend [2].
>>>>>>
>>>>>> What is the most useful debug to get here beyond the immediate [3]?
>>>>>
>>>>> You can try doing:
>>>>>
>>>>> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
>>>>
>>>> kworker and ksoftirqd spinning occurs when I echo 'auto' to all the
>>>> USB control entries. Using Alan's excellent tip, we see this being
>>>> logged repeatedly at a high rate:
>>>> [ 353.245180] usb usb1-port4: status 0107 change 0000
>>>> [ 353.245194] usb usb1-port12: status 0507 change 0000
>>>> [ 353.245202] hub 1-0:1.0: state 7 ports 16 chg 0000 evt 0000
>>>> [ 353.245203] hub 1-0:1.0: hub_suspend
>>>> [ 353.245205] usb usb1: bus auto-suspend, wakeup 1
>>>> [ 353.245206] usb usb1: bus suspend fail, err -16
>>>> [ 353.245207] hub 1-0:1.0: hub_resume
>>>> ...
>>>>
>>>> So, EBUSY. Both the webcam is not open, and the bluetooth interface
>>>> [1] is rfkill'd; the situation occurs even if I unload all related
>>>> modules.
>>>>
>>>> What further debug would be useful?
>>>>
>>>> Thanks!
>>>> Daniel
>>>>
>>>> -- [1]
>>>>
>>>> Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>>>> Bus 001 Device 002: ID 0a5c:6410 Broadcom Corp.
>>>> Bus 001 Device 003: ID 1bcf:2b95 Sunplus Innovation Technology Inc.
>>>> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>>>
>>> Since bus 1 uses an xHCI controller, you should do:
>>>
>>> echo 'module xhci-hcd =p' >/sys/kernel/debug/dynamic_debug/control
>>>
>>> I'm reasonably sure this will end up printing "suspend failed
>>> because a port is resuming", since that's the only place where
>>> xhci_bus_suspend() fails with -EBUSY, but you should try it to confirm
>>> this.
>>
>> I had to use:
>> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
>>
>> and indeed we see:
>> [29172.246221] xhci_hcd 0000:00:14.0: get port status, actual port 11
>> status = 0xe63
>> [29172.246222] xhci_hcd 0000:00:14.0: Get port status returned 0x507
>> [29172.246224] xhci_hcd 0000:00:14.0: get port status, actual port 12
>> status = 0x2a0
>> [29172.246228] xhci_hcd 0000:00:14.0: get port status, actual port 13
>> status = 0x2a0
>> [29172.246228] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246231] xhci_hcd 0000:00:14.0: get port status, actual port 14
>> status = 0x2a0
>> [29172.246232] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246235] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246248] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246254] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246264] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246275] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246285] xhci_hcd 0000:00:14.0: Get port status returned 0x507
>> [29172.246294] xhci_hcd 0000:00:14.0: get port status, actual port 14
>> status = 0x2a0
>> [29172.246302] xhci_hcd 0000:00:14.0: suspend failed because a port is resuming
>> [29172.246321] xhci_hcd 0000:00:14.0: Get port status returned 0x107
>> [29172.246332] xhci_hcd 0000:00:14.0: get port status, actual port 6
>> status = 0x2a0
>> [29172.246346] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246359] xhci_hcd 0000:00:14.0: get port status, actual port 13
>> status = 0x2a0
>> [29172.246364] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246366] xhci_hcd 0000:00:14.0: get port status, actual port 15
>> status = 0x2a0
>> [29172.246371] xhci_hcd 0000:00:14.0: suspend failed because a port is resuming
>> [29172.246380] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>> [29172.246382] xhci_hcd 0000:00:14.0: get port status, actual port 1
>> status = 0x2a0
>> [29172.246383] xhci_hcd 0000:00:14.0: Get port status returned 0x100
>
> It's time to get some assistance from the xhci-hcd maintainer.
>
> Mathias, Daniel's machine is spinning, trying to suspend the root hub
> on the xHCI HS/FS/LS bus and failing, over and over again. It looks
> like either bus_state->resuming_ports or bus_state->port_remote_wakeup
> is nonzero when it shouldn't be.
>
> The information we have is all above. Can you help Daniel debug this?
>
> Alan Stern
>

Looks like bits in bus_state->resuming_ports may be left uncleared.

We set a bit in bus_state->resuming_ports if we get a port status event with a port
in resume link state. We only clear it when a "get port status" request finds the port
in resume state with a resume timer passed 20ms.

If something happens during this 20ms to the port, and it moves to some other link state,
then resuming_ports is never cleared.

There is a fix for what now appears to be a subset of this issue in 4.4-rc2, clearing
the bus_state->resuming_ports in case host initiated resume code moved the port link to
U0 and before the "get port status" request finds the port in resume state.
I now see that this is not enough.

I'll write a patch reordering how we check flags, timers and link state in "get port status"

Daniel, can you test a patch like this if I write it for 4.3 kernel?

-Mathias