My laptop hangs on resume from suspend if I have USB 3.0 enabled in the BIOS;
it works fine with ehci_hcd (USB 2.0 only).
The way I reproduce the problem is with this command:
$ i3lock && systemctl suspend
This is what I see on the screen when it hangs:
https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
Some logs:
https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
I also tried Linux 4.10.1 and I could reproduce this problem there as well.
Please let me know if I can provide more info.
Thanks,
Diego
Hi Greg,
On Wed, Mar 8, 2017 at 5:15 PM, Greg KH wrote:
> Has any previous kernel ever worked properly before? If so, any chance
> you can use 'git bisect' to find the offending commit?
I'm not sure; this is my work machine and I only started using it
about a month ago.
I will try older kernels to see whether the results differ, and I
will report back either way.
>
> And are you sure you have updated your bios to the latest version?
Yes.
>
> thanks,
>
> greg k-h
Thanks,
Diego
On Wed, Mar 8, 2017 at 3:49 PM, Diego Viola wrote:
> [...]
I've created a bug report here.
https://bugzilla.kernel.org/show_bug.cgi?id=194819
Diego
On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
> [...]
Has any previous kernel ever worked properly before? If so, any chance
you can use 'git bisect' to find the offending commit?
And are you sure you have updated your bios to the latest version?
thanks,
greg k-h
On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola wrote:
> [...]
I found another workaround: I can suspend/resume fine with
`i3lock && systemctl suspend` if I unplug all my USB devices
(keyboard, mouse, etc.). This is with the default BIOS settings
(both USB 2.0 and 3.0 enabled).
I'm also seeing some messages like this in dmesg:
[ 16.172190] usb 2-6: device descriptor read/64, error -110
Would this indicate a hardware/firmware/power issue?
Thanks,
Diego
On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola wrote:
> [...]
OK, I've built Linux 4.4.52 (using localmodconfig) and rebooted into
it. I did a suspend/resume and it hung the first time I tried to
resume, which isn't much different from the latest kernel.
My dmesg is still being spammed with these messages:
[ 260.043673] usb 2-1: Device not responding to setup address.
[ 260.246918] usb 2-1: device not accepting address 15, error -71
[ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
[ 261.341340] usb 2-1: USB disconnect, device number 17
I guess it's safe to assume at this point that this is a hardware problem?
Thanks,
Diego
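The numeric codes in the dmesg lines quoted in this thread are negated Linux errno values: -110 is ETIMEDOUT and -71 is EPROTO. As a minimal sketch, a small helper can translate them while reading logs. The function name usb_errno_name is made up for illustration, and it deliberately covers only the two codes that appear in this thread.

```shell
# Hypothetical helper: translate the USB error codes seen in this thread
# into their Linux errno names (values from asm-generic/errno.h).
usb_errno_name() {
    case "$1" in
        -110) echo "ETIMEDOUT (connection timed out)" ;;
        -71)  echo "EPROTO (protocol error)" ;;
        *)    echo "unknown" ;;
    esac
}

# Example: decode the code from "device descriptor read/64, error -110"
usb_errno_name -110
```

Neither code settles the hardware-versus-driver question on its own; both only say that the device did not answer the host as expected.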
On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola wrote:
> [...]
Hello,
I've found something interesting that seems to be the cause of my
problem.
As soon as I boot my system I can see this process in the D state
(uninterruptible sleep):
[root@myhost ~]# ps aux | grep " D"
root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
[root@myhost ~]#
I'm not exactly sure why that is, but after a 'rmmod rtsx_usb_ms'
the problem is gone: I have suspended and resumed ~40 times since
unloading the module and it has not hung again.
Diego
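Note that `ps aux | grep " D"` also matches the grep process itself, as the output above shows. A sketch of a stricter filter that looks only at the state column follows. The function name list_dstate is an assumption for illustration; it expects `ps -eo pid,stat,comm` style input on stdin and prints only the first word of the command name.

```shell
# Hypothetical helper: print PID, state and command name of every task
# whose state column starts with "D" (uninterruptible sleep).
# Expects "ps -eo pid,stat,comm" output on stdin; the header line is skipped.
list_dstate() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $2, $3 }'
}

# Typical use on a live system:
#   ps -eo pid,stat,comm | list_dstate
```

A task that stays in this list across several invocations, like the rtsx_usb_ms thread here, is a candidate for closer inspection.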
On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola wrote:
> [...]
Adding Roger Tseng to the CC also.
Diego
On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola wrote:
> [...]
According to this document:
http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
My computer only has an SD card slot and no MEMSTICK slot.
lsusb says this though:
Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129 Card Reader Controller
Maybe the driver gets locked up looking for the MEMSTICK slot?
Diego
+Alan
On 15 March 2017 at 15:00, Diego Viola wrote:
> [...]
>>> Hello,
>>>
>>> I've found something interesting that seems to be the cause of my
>>> problem.
>>>
>>> As soon as I boot my system I can see this process in the D state
>>> (uninterruptible sleep):
>>>
>>> [root@myhost ~]# ps aux | grep " D"
>>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>>> [root@myhost ~]#
>>>
>>> I'm not exactly sure why that is, but after a 'rmmod rtsx_usb_ms'
>>> the problem is gone: I have suspended and resumed ~40 times since
>>> unloading the module and it has not hung again.
That's a good observation!
I suspect that drivers/memstick/host/rtsx_usb_ms.c isn't behaving
properly from a PM point of view. Perhaps it tries to access its device
while the device is still runtime suspended. Exactly why, I don't
know yet.
Moreover, we have had issues before with this driver and with its
corresponding SD card driver, drivers/mmc/host/rtsx_usb_sdmmc.c. On
top of that, their devices share the same USB MFD device as parent,
which is managed by drivers/mfd/rtsx_usb.c.
Unfortunately, I am still learning about USB; however, I know runtime
PM and system suspend well, so perhaps I can still help.
Anyway, I have looped in Alan, let's see if he has some input to this.
>>>
>>> Diego
>>
>> Adding Roger Tseng to the CC also.
>>
>> Diego
>
> According to this document:
>
> http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>
> My computer only has a SD card slot and no MEMSTICK slot.
>
> lsusb says this though:
>
> Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
> Card Reader Controller
>
> Maybe the driver gets locked up looking for the MEMSTICK slot?
Yes, correct!
>
> Diego
Kind regards
Uffe
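The runtime PM state that Ulf refers to is visible in sysfs through each device's power/control and power/runtime_status files. As a rough sketch, the helper below prints both for a given device directory. The function name show_runtime_pm and the example path /sys/bus/usb/devices/1-5 are assumptions for illustration; the real path of the card reader has to be looked up on the affected machine.

```shell
# Hypothetical helper: print the runtime PM policy and current state of a
# device, given its sysfs directory (e.g. /sys/bus/usb/devices/1-5, an
# assumed path that must be adjusted to the actual card reader).
show_runtime_pm() {
    dev=$1
    printf 'control=%s status=%s\n' \
        "$(cat "$dev/power/control")" \
        "$(cat "$dev/power/runtime_status")"
}

# Typical use on a live system:
#   show_runtime_pm /sys/bus/usb/devices/1-5
```

Writing "on" to power/control keeps the device from being runtime suspended, which may be one way to test whether the hang depends on the device being suspended.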
On Thu, 16 Mar 2017, Ulf Hansson wrote:
> +Alan
>
> On 15 March 2017 at 15:00, Diego Viola wrote:
> > [...]
>
> That's a good observation!
>
> I suspect that drivers/memstick/host/rtsx_usb_ms.c isn't behaving
> properly from a PM point of view. Perhaps it tries to access its device
> while the device is still runtime suspended. Exactly why, I don't
> know yet.
>
> Moreover, we have had issues before with this driver and with its
> corresponding SD card driver, drivers/mmc/host/rtsx_usb_sdmmc.c. On
> top of that, their devices share the same USB MFD device as parent,
> which is managed by drivers/mfd/rtsx_usb.c.
>
> Unfortunately, I am still learning about USB; however, I know runtime
> PM and system suspend well, so perhaps I can still help.
>
> Anyway, I have looped in Alan, let's see if he has some input to this.
Is the rtsx_usb_ms device attached to an xHCI controller?
How is the hang during resume related to the actions of the xhci-hcd
driver? (You'll probably need to enable dynamic debugging for xhci-hcd
and use a network console to get the answer.)
If this problem really is related to xhci-hcd, have you tried bringing
it to the attention of the xhci-hcd maintainer?
Are you using the most up-to-date version of the kernel? xhci-hcd is
still getting fixes at a very high rate.
Alan Stern
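Dynamic debugging for xhci-hcd, as Alan suggests, is normally switched on by writing a query to the dynamic debug control file. The sketch below assumes a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs mounted at /sys/kernel/debug. The function name enable_dyndbg is made up for illustration, and the addresses and interface name in the netconsole comment are placeholders, not values from this thread.

```shell
# Hypothetical helper: enable all pr_debug/dev_dbg messages of a module by
# writing a query to the dynamic debug control file (needs root on a real
# system). The control path can be overridden, e.g. for a dry run.
enable_dyndbg() {
    module=$1
    control=${2:-/sys/kernel/debug/dynamic_debug/control}
    printf 'module %s +p\n' "$module" > "$control"
}

# Typical use on a live system, as root:
#   enable_dyndbg xhci_hcd
#
# A network console can then carry the messages off the hanging machine.
# Placeholder source/target addresses and interface, adjust before use:
#   modprobe netconsole netconsole=6665@192.0.2.10/eth0,6666@192.0.2.20/
```

The messages are then read on the receiving host, since the hung machine cannot be inspected locally after a failed resume.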
On Thu, Mar 16, 2017 at 12:45 PM, Diego Viola wrote:
> On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern wrote:
>> [...]
>>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>>> >>>> resume, which isn't much different than using the latest kernel.
>>> >>>>
>>> >>>> My dmesg is still being spammed with these messages:
>>> >>>>
>>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>>> >>>>
>>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Diego
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> I've found something interesting and what it seems to be the cause of
>>> >>> my problem.
>>> >>>
>>> >>> As soon as I boot my system I can see this process being in the D-state:
>>> >>>
>>> >>> [root@myhost ~]# ps aux | grep " D"
>>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>>> >>> [root@myhost ~]#
>>> >>>
>>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>>> >>> after I disabled the module and the suspend/resume problem is gone.
>>>
>>> That's a good observation!
>>>
>>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>>> properly from PM point of view. Perhaps it tries to access its device
>>> while it from a runtime PM point view still is in a runtime suspended
>>> state. Exactly why I don't know yet.
>>>
>>> Moreover we have had issues with this driver before and its
>>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>>> top of that, both their corresponding devices shares the same usb mfd
>>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>>
>>> Unfortunate my knowledge about USB is still in the learning phase,
>>> however I know well about runtime PM ans system suspend, so perhaps I
>>> still might be able to help.
>>>
>>> Anyway, I have looped in Alan, let's see if he has some input to this.
>>
>> Is the rtsx_usb_ms device attached to an xHCI controller?
>
> I think so, I'm not sure.
>
> lsusb -t reveals rtsx_usb is under xhci_hcd as seen here:
>
> https://bugzilla.kernel.org/attachment.cgi?id=255301
>
> Also, I tried disabling USB 3.0 from the BIOS and I'm still able to
> see rtsx_usb_ms is being loaded after that and the [rtsx_usb_ms_2]
> also shows up as a D-state process still, but no hanging occurs when
> USB 3.0 (xhci_hcd) is disabled.
>
>>
>> How is the hang during resume related to the actions of the xhci-hcd
>> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
>> and use a network console to get the answer.)
>
> OK, I'll do this and get back with a trace.
>
>>
>> If this problem really is related to xhci-hcd, have you tried bringing
>> it to the attention of the xhci-hcd maintainer?
>
> No, not yet. I'm also not sure who the current maintainer for xhci_hcd is?
>
> modinfo says the author is Sarah Sharp but does she still maintains it?
>
>>
>> Are you using the most up-to-date version of the kernel? xhci-hcd is
>> still getting fixes at a very high rate.
>
> Yes, I'm currently on 4.10.2-ARCH.
>
> I will keep an eye on xhci_hcd changes on the latest git and give them
> a try also.
>
>>
>> Alan Stern
>>
>>> >>>
>>> >>> Diego
>>> >>
>>> >> Adding Roger Tseng to the CC also.
>>> >>
>>> >> Diego
>>> >
>>> > According to this document:
>>> >
>>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>>> >
>>> > My computer only has a SD card slot and no MEMSTICK slot.
>>> >
>>> > lsusb says this though:
>>> >
>>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>>> > Card Reader Controller
>>> >
>>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>>
>>> Yes correct!
>>>
>>> >
>>> > Diego
>>>
>>> Kind regards
>>> Uffe
>>>
>>
>
> Thanks,
> Diego
Here is the lsusb -t output with USB 3.0 disabled in the BIOS:
https://bugzilla.kernel.org/attachment.cgi?id=255303
Diego
On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern <[email protected]> wrote:
> On Thu, 16 Mar 2017, Ulf Hansson wrote:
>
>> +Alan
>>
>> On 15 March 2017 at 15:00, Diego Viola <[email protected]> wrote:
>> > On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola <[email protected]> wrote:
>> >> On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola <[email protected]> wrote:
>> >>> On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola <[email protected]> wrote:
>> >>>> On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola <[email protected]> wrote:
>> >>>>> On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola <[email protected]> wrote:
>> >>>>>> Hi Greg,
>> >>>>>>
>> >>>>>> On Wed, Mar 8, 2017 at 5:15 PM, Greg KH <[email protected]> wrote:
>> >>>>>>> On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
>> >>>>>>>> It hangs on resume from suspend if I have USB 3.0 enabled on the BIOS,
>> >>>>>>>> it works fine with ehci_hcd or USB 2.0.
>> >>>>>>>>
>> >>>>>>>> The way I reproduce the problem is with this command:
>> >>>>>>>>
>> >>>>>>>> $ i3lock && systemctl suspend
>> >>>>>>>>
>> >>>>>>>> This is what I see on the screen when it hangs:
>> >>>>>>>>
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
>> >>>>>>>>
>> >>>>>>>> Some logs:
>> >>>>>>>>
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
>> >>>>>>>>
>> >>>>>>>> I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
>> >>>>>>>>
>> >>>>>>>> I also tried Linux 4.10.1 and I could reproduce this problem there as well.
>> >>>>>>>>
>> >>>>>>>> Please let me know if I could provide more info.
>> >>>>>>>
>> >>>>>>> Has any previous kernel ever worked properly before? If so, any chance
>> >>>>>>> you can use 'git bisect' to find the offending commit?
>> >>>>>>
>> >>>>>> I'm not sure, this is my work machine and I've only started using it
>> >>>>>> recently (since about a month ago or so).
>> >>>>>>
>> >>>>>> I will try older kernels and see if I get any different results, I
>> >>>>>> will report back in any case.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> And are you sure you have updated your bios to the latest version?
>> >>>>>>
>> >>>>>> Yes.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> thanks,
>> >>>>>>>
>> >>>>>>> greg k-h
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Diego
>> >>>>>
>> >>>>> I found another workaround, I can suspend/resume fine with `i3lock &&
>> >>>>> systemctl suspend` if I disconnect/unplug all my USB devices
>> >>>>> (keyboard, mouse, etc). This with the default settings in the BIOS
>> >>>>> (both USB 2.0 and 3.0 enabled).
>> >>>>>
>> >>>>> I'm also seeing some messages like this in dmesg:
>> >>>>>
>> >>>>> [ 16.172190] usb 2-6: device descriptor read/64, error -110
>> >>>>>
>> >>>>> Would this indicate a hardware/firmware/power issue?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Diego
>> >>>>
>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>> >>>> resume, which isn't much different than using the latest kernel.
>> >>>>
>> >>>> My dmesg is still being spammed with these messages:
>> >>>>
>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>> >>>>
>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>> >>>>
>> >>>> Thanks,
>> >>>> Diego
>> >>>
>> >>> Hello,
>> >>>
>> >>> I've found something interesting and what it seems to be the cause of
>> >>> my problem.
>> >>>
>> >>> As soon as I boot my system I can see this process being in the D-state:
>> >>>
>> >>> [root@myhost ~]# ps aux | grep " D"
>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>> >>> [root@myhost ~]#
>> >>>
>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>> >>> after I disabled the module and the suspend/resume problem is gone.
>>
>> That's a good observation!
>>
>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>> properly from PM point of view. Perhaps it tries to access its device
>> while it from a runtime PM point view still is in a runtime suspended
>> state. Exactly why I don't know yet.
>>
>> Moreover we have had issues with this driver before and its
>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>> top of that, both their corresponding devices shares the same usb mfd
>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>
>> Unfortunate my knowledge about USB is still in the learning phase,
>> however I know well about runtime PM ans system suspend, so perhaps I
>> still might be able to help.
>>
>> Anyway, I have looped in Alan, let's see if he has some input to this.
>
> Is the rtsx_usb_ms device attached to an xHCI controller?
I think so, but I'm not sure.
lsusb -t shows rtsx_usb under xhci_hcd, as seen here:
https://bugzilla.kernel.org/attachment.cgi?id=255301
Also, I tried disabling USB 3.0 in the BIOS. rtsx_usb_ms still gets
loaded and [rtsx_usb_ms_2] still shows up as a D-state process, but
the hang on resume does not occur when USB 3.0 (xhci_hcd) is disabled.
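For anyone reproducing the D-state check above, here is a minimal sketch that matches on the STAT column instead of grepping the whole line for " D". It assumes procps-style `ps -eo pid,stat,comm` output; the helper name filter_d_state is made up for illustration.

```shell
#!/bin/sh
# Print "PID COMMAND" for rows whose STAT column starts with "D"
# (uninterruptible sleep). Expects `ps -eo pid,stat,comm`-style rows on
# stdin: PID, STAT, COMMAND. Only the first word of COMMAND is printed.
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | filter_d_state
#   lsusb -t    # shows which host controller driver each device sits under
```

Matching on the STAT field avoids catching the grep process itself or unrelated lines that merely contain " D".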
>
> How is the hang during resume related to the actions of the xhci-hcd
> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
> and use a network console to get the answer.)
OK, I'll do this and get back with a trace.
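A rough sketch of what that setup could look like, assuming a kernel built with CONFIG_DYNAMIC_DEBUG and the netconsole module available. The two helper functions are hypothetical and only build the strings; every IP address, port, interface name and MAC address below is a placeholder, not a value from this thread.

```shell
#!/bin/sh
# Build the dynamic-debug query that turns on the debug messages of a module.
dyndbg_enable_query() {
    printf 'module %s +p\n' "$1"
}

# Build a netconsole= parameter string in the form
#   <src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# As root, on the machine that hangs (placeholder addresses):
#   mount -t debugfs none /sys/kernel/debug    # if not already mounted
#   dyndbg_enable_query xhci_hcd > /sys/kernel/debug/dynamic_debug/control
#   dmesg -n 8                                 # let debug messages reach the console
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.2 eth0 6666 192.168.1.1 00:11:22:33:44:55)"
# On the receiving machine, listen for the UDP log stream, for example:
#   nc -u -l 6666    # exact flags depend on the netcat variant
```

The network console matters here because the machine hangs on resume, so anything logged only to the local disk or screen is likely to be lost.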
>
> If this problem really is related to xhci-hcd, have you tried bringing
> it to the attention of the xhci-hcd maintainer?
No, not yet. I'm also not sure who the current maintainer of xhci_hcd is.
modinfo says the author is Sarah Sharp, but does she still maintain it?
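One way to answer this from a kernel source tree, sketched under assumptions: the MAINTAINERS file lists sections as a title line followed by tagged lines such as "M:", and the xHCI section is assumed to be titled "USB XHCI DRIVER". The helper maintainers_for is hypothetical; scripts/get_maintainer.pl is the script shipped with the kernel.

```shell
#!/bin/sh
# Print the "M:" (maintainer) entries of one section of a MAINTAINERS-style
# file read from stdin. A section starts at its title line and ends at the
# next blank line.
maintainers_for() {
    awk -v title="$1" '
        $0 == title         { in_section = 1; next }
        /^$/                { in_section = 0 }
        in_section && /^M:/ { sub(/^M:[ \t]*/, ""); print }
    '
}

# In a kernel source tree:
#   maintainers_for "USB XHCI DRIVER" < MAINTAINERS
#   ./scripts/get_maintainer.pl -f drivers/usb/host/xhci.c
```

The modinfo author field records who wrote the module, which is not necessarily who maintains it today, so the MAINTAINERS file is the more reliable source.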
>
> Are you using the most up-to-date version of the kernel? xhci-hcd is
> still getting fixes at a very high rate.
Yes, I'm currently on 4.10.2-ARCH.
I will also keep an eye on xhci_hcd changes in the latest git tree and
give them a try.
>
> Alan Stern
>
>> >>>
>> >>> Diego
>> >>
>> >> Adding Roger Tseng to the CC also.
>> >>
>> >> Diego
>> >
>> > According to this document:
>> >
>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>> >
>> > My computer only has a SD card slot and no MEMSTICK slot.
>> >
>> > lsusb says this though:
>> >
>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>> > Card Reader Controller
>> >
>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>
>> Yes correct!
>>
>> >
>> > Diego
>>
>> Kind regards
>> Uffe
>>
>
Thanks,
Diego
On Thu, Mar 16, 2017 at 12:51 PM, Diego Viola <[email protected]> wrote:
> On Thu, Mar 16, 2017 at 12:45 PM, Diego Viola <[email protected]> wrote:
>> On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern <[email protected]> wrote:
>>> On Thu, 16 Mar 2017, Ulf Hansson wrote:
>>>
>>>> +Alan
>>>>
>>>> On 15 March 2017 at 15:00, Diego Viola <[email protected]> wrote:
>>>> > On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola <[email protected]> wrote:
>>>> >> On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola <[email protected]> wrote:
>>>> >>> On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola <[email protected]> wrote:
>>>> >>>> On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola <[email protected]> wrote:
>>>> >>>>> On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola <[email protected]> wrote:
>>>> >>>>>> Hi Greg,
>>>> >>>>>>
>>>> >>>>>> On Wed, Mar 8, 2017 at 5:15 PM, Greg KH <[email protected]> wrote:
>>>> >>>>>>> On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
>>>> >>>>>>>> It hangs on resume from suspend if I have USB 3.0 enabled on the BIOS,
>>>> >>>>>>>> it works fine with ehci_hcd or USB 2.0.
>>>> >>>>>>>>
>>>> >>>>>>>> The way I reproduce the problem is with this command:
>>>> >>>>>>>>
>>>> >>>>>>>> $ i3lock && systemctl suspend
>>>> >>>>>>>>
>>>> >>>>>>>> This is what I see on the screen when it hangs:
>>>> >>>>>>>>
>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
>>>> >>>>>>>>
>>>> >>>>>>>> Some logs:
>>>> >>>>>>>>
>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
>>>> >>>>>>>>
>>>> >>>>>>>> I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
>>>> >>>>>>>>
>>>> >>>>>>>> I also tried Linux 4.10.1 and I could reproduce this problem there as well.
>>>> >>>>>>>>
>>>> >>>>>>>> Please let me know if I could provide more info.
>>>> >>>>>>>
>>>> >>>>>>> Has any previous kernel ever worked properly before? If so, any chance
>>>> >>>>>>> you can use 'git bisect' to find the offending commit?
>>>> >>>>>>
>>>> >>>>>> I'm not sure, this is my work machine and I've only started using it
>>>> >>>>>> recently (since about a month ago or so).
>>>> >>>>>>
>>>> >>>>>> I will try older kernels and see if I get any different results, I
>>>> >>>>>> will report back in any case.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> And are you sure you have updated your bios to the latest version?
>>>> >>>>>>
>>>> >>>>>> Yes.
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> thanks,
>>>> >>>>>>>
>>>> >>>>>>> greg k-h
>>>> >>>>>>
>>>> >>>>>> Thanks,
>>>> >>>>>> Diego
>>>> >>>>>
>>>> >>>>> I found another workaround, I can suspend/resume fine with `i3lock &&
>>>> >>>>> systemctl suspend` if I disconnect/unplug all my USB devices
>>>> >>>>> (keyboard, mouse, etc). This with the default settings in the BIOS
>>>> >>>>> (both USB 2.0 and 3.0 enabled).
>>>> >>>>>
>>>> >>>>> I'm also seeing some messages like this in dmesg:
>>>> >>>>>
>>>> >>>>> [ 16.172190] usb 2-6: device descriptor read/64, error -110
>>>> >>>>>
>>>> >>>>> Would this indicate a hardware/firmware/power issue?
>>>> >>>>>
>>>> >>>>> Thanks,
>>>> >>>>> Diego
>>>> >>>>
>>>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>>>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>>>> >>>> resume, which isn't much different than using the latest kernel.
>>>> >>>>
>>>> >>>> My dmesg is still being spammed with these messages:
>>>> >>>>
>>>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>>>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>>>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>>>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>>>> >>>>
>>>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Diego
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>> I've found something interesting and what it seems to be the cause of
>>>> >>> my problem.
>>>> >>>
>>>> >>> As soon as I boot my system I can see this process being in the D-state:
>>>> >>>
>>>> >>> [root@myhost ~]# ps aux | grep " D"
>>>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>>>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>>>> >>> [root@myhost ~]#
>>>> >>>
>>>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>>>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>>>> >>> after I disabled the module and the suspend/resume problem is gone.
>>>>
>>>> That's a good observation!
>>>>
>>>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>>>> properly from PM point of view. Perhaps it tries to access its device
>>>> while it from a runtime PM point view still is in a runtime suspended
>>>> state. Exactly why I don't know yet.
>>>>
>>>> Moreover we have had issues with this driver before and its
>>>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>>>> top of that, both their corresponding devices shares the same usb mfd
>>>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>>>
>>>> Unfortunate my knowledge about USB is still in the learning phase,
>>>> however I know well about runtime PM ans system suspend, so perhaps I
>>>> still might be able to help.
>>>>
>>>> Anyway, I have looped in Alan, let's see if he has some input to this.
>>>
>>> Is the rtsx_usb_ms device attached to an xHCI controller?
>>
>> I think so, I'm not sure.
>>
>> lsusb -t reveals rtsx_usb is under xhci_hcd as seen here:
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255301
>>
>> Also, I tried disabling USB 3.0 from the BIOS and I'm still able to
>> see rtsx_usb_ms is being loaded after that and the [rtsx_usb_ms_2]
>> also shows up as a D-state process still, but no hanging occurs when
>> USB 3.0 (xhci_hcd) is disabled.
>>
>>>
>>> How is the hang during resume related to the actions of the xhci-hcd
>>> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
>>> and use a network console to get the answer.)
>>
>> OK, I'll do this and get back with a trace.
>>
>>>
>>> If this problem really is related to xhci-hcd, have you tried bringing
>>> it to the attention of the xhci-hcd maintainer?
>>
>> No, not yet. I'm also not sure who the current maintainer for xhci_hcd is?
>>
>> modinfo says the author is Sarah Sharp but does she still maintains it?
>>
>>>
>>> Are you using the most up-to-date version of the kernel? xhci-hcd is
>>> still getting fixes at a very high rate.
>>
>> Yes, I'm currently on 4.10.2-ARCH.
>>
>> I will keep an eye on xhci_hcd changes on the latest git and give them
>> a try also.
>>
>>>
>>> Alan Stern
>>>
>>>> >>>
>>>> >>> Diego
>>>> >>
>>>> >> Adding Roger Tseng to the CC also.
>>>> >>
>>>> >> Diego
>>>> >
>>>> > According to this document:
>>>> >
>>>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>>>> >
>>>> > My computer only has a SD card slot and no MEMSTICK slot.
>>>> >
>>>> > lsusb says this though:
>>>> >
>>>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>>>> > Card Reader Controller
>>>> >
>>>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>>>
>>>> Yes correct!
>>>>
>>>> >
>>>> > Diego
>>>>
>>>> Kind regards
>>>> Uffe
>>>>
>>>
>>
>> Thanks,
>> Diego
>
> lsusb -t with USB 3.0 disabled on BIOS:
>
> https://bugzilla.kernel.org/attachment.cgi?id=255303
>
> Diego
Hrm, the lsusb -t output shows rtsx_usb, not rtsx_usb_ms.
I'm getting confused here.
Diego
On Thu, Mar 16, 2017 at 1:02 PM, Diego Viola <[email protected]> wrote:
> On Thu, Mar 16, 2017 at 12:51 PM, Diego Viola <[email protected]> wrote:
>> On Thu, Mar 16, 2017 at 12:45 PM, Diego Viola <[email protected]> wrote:
>>> On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern <[email protected]> wrote:
>>>> On Thu, 16 Mar 2017, Ulf Hansson wrote:
>>>>
>>>>> +Alan
>>>>>
>>>>> On 15 March 2017 at 15:00, Diego Viola <[email protected]> wrote:
>>>>> > On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola <[email protected]> wrote:
>>>>> >> On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola <[email protected]> wrote:
>>>>> >>> On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola <[email protected]> wrote:
>>>>> >>>> On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola <[email protected]> wrote:
>>>>> >>>>> On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola <[email protected]> wrote:
>>>>> >>>>>> Hi Greg,
>>>>> >>>>>>
>>>>> >>>>>> On Wed, Mar 8, 2017 at 5:15 PM, Greg KH <[email protected]> wrote:
>>>>> >>>>>>> On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
>>>>> >>>>>>>> It hangs on resume from suspend if I have USB 3.0 enabled on the BIOS,
>>>>> >>>>>>>> it works fine with ehci_hcd or USB 2.0.
>>>>> >>>>>>>>
>>>>> >>>>>>>> The way I reproduce the problem is with this command:
>>>>> >>>>>>>>
>>>>> >>>>>>>> $ i3lock && systemctl suspend
>>>>> >>>>>>>>
>>>>> >>>>>>>> This is what I see on the screen when it hangs:
>>>>> >>>>>>>>
>>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
>>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
>>>>> >>>>>>>>
>>>>> >>>>>>>> Some logs:
>>>>> >>>>>>>>
>>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
>>>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
>>>>> >>>>>>>>
>>>>> >>>>>>>> I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
>>>>> >>>>>>>>
>>>>> >>>>>>>> I also tried Linux 4.10.1 and I could reproduce this problem there as well.
>>>>> >>>>>>>>
>>>>> >>>>>>>> Please let me know if I could provide more info.
>>>>> >>>>>>>
>>>>> >>>>>>> Has any previous kernel ever worked properly before? If so, any chance
>>>>> >>>>>>> you can use 'git bisect' to find the offending commit?
>>>>> >>>>>>
>>>>> >>>>>> I'm not sure, this is my work machine and I've only started using it
>>>>> >>>>>> recently (since about a month ago or so).
>>>>> >>>>>>
>>>>> >>>>>> I will try older kernels and see if I get any different results, I
>>>>> >>>>>> will report back in any case.
>>>>> >>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> And are you sure you have updated your bios to the latest version?
>>>>> >>>>>>
>>>>> >>>>>> Yes.
>>>>> >>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> thanks,
>>>>> >>>>>>>
>>>>> >>>>>>> greg k-h
>>>>> >>>>>>
>>>>> >>>>>> Thanks,
>>>>> >>>>>> Diego
>>>>> >>>>>
>>>>> >>>>> I found another workaround, I can suspend/resume fine with `i3lock &&
>>>>> >>>>> systemctl suspend` if I disconnect/unplug all my USB devices
>>>>> >>>>> (keyboard, mouse, etc). This with the default settings in the BIOS
>>>>> >>>>> (both USB 2.0 and 3.0 enabled).
>>>>> >>>>>
>>>>> >>>>> I'm also seeing some messages like this in dmesg:
>>>>> >>>>>
>>>>> >>>>> [ 16.172190] usb 2-6: device descriptor read/64, error -110
>>>>> >>>>>
>>>>> >>>>> Would this indicate a hardware/firmware/power issue?
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>> Diego
>>>>> >>>>
>>>>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>>>>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>>>>> >>>> resume, which isn't much different than using the latest kernel.
>>>>> >>>>
>>>>> >>>> My dmesg is still being spammed with these messages:
>>>>> >>>>
>>>>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>>>>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>>>>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>>>>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>>>>> >>>>
>>>>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>> Diego
>>>>> >>>
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I've found something interesting and what it seems to be the cause of
>>>>> >>> my problem.
>>>>> >>>
>>>>> >>> As soon as I boot my system I can see this process being in the D-state:
>>>>> >>>
>>>>> >>> [root@myhost ~]# ps aux | grep " D"
>>>>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>>>>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>>>>> >>> [root@myhost ~]#
>>>>> >>>
>>>>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>>>>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>>>>> >>> after I disabled the module and the suspend/resume problem is gone.
>>>>>
>>>>> That's a good observation!
>>>>>
>>>>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>>>>> properly from PM point of view. Perhaps it tries to access its device
>>>>> while it from a runtime PM point view still is in a runtime suspended
>>>>> state. Exactly why I don't know yet.
>>>>>
>>>>> Moreover we have had issues with this driver before and its
>>>>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>>>>> top of that, both their corresponding devices shares the same usb mfd
>>>>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>>>>
>>>>> Unfortunate my knowledge about USB is still in the learning phase,
>>>>> however I know well about runtime PM ans system suspend, so perhaps I
>>>>> still might be able to help.
>>>>>
>>>>> Anyway, I have looped in Alan, let's see if he has some input to this.
>>>>
>>>> Is the rtsx_usb_ms device attached to an xHCI controller?
>>>
>>> I think so, I'm not sure.
>>>
>>> lsusb -t reveals rtsx_usb is under xhci_hcd as seen here:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255301
>>>
>>> Also, I tried disabling USB 3.0 from the BIOS and I'm still able to
>>> see rtsx_usb_ms is being loaded after that and the [rtsx_usb_ms_2]
>>> also shows up as a D-state process still, but no hanging occurs when
>>> USB 3.0 (xhci_hcd) is disabled.
>>>
>>>>
>>>> How is the hang during resume related to the actions of the xhci-hcd
>>>> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
>>>> and use a network console to get the answer.)
>>>
>>> OK, I'll do this and get back with a trace.
>>>
>>>>
>>>> If this problem really is related to xhci-hcd, have you tried bringing
>>>> it to the attention of the xhci-hcd maintainer?
>>>
>>> No, not yet. I'm also not sure who the current maintainer for xhci_hcd is?
>>>
>>> modinfo says the author is Sarah Sharp but does she still maintains it?
>>>
>>>>
>>>> Are you using the most up-to-date version of the kernel? xhci-hcd is
>>>> still getting fixes at a very high rate.
>>>
>>> Yes, I'm currently on 4.10.2-ARCH.
>>>
>>> I will keep an eye on xhci_hcd changes on the latest git and give them
>>> a try also.
>>>
>>>>
>>>> Alan Stern
>>>>
>>>>> >>>
>>>>> >>> Diego
>>>>> >>
>>>>> >> Adding Roger Tseng to the CC also.
>>>>> >>
>>>>> >> Diego
>>>>> >
>>>>> > According to this document:
>>>>> >
>>>>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>>>>> >
>>>>> > My computer only has a SD card slot and no MEMSTICK slot.
>>>>> >
>>>>> > lsusb says this though:
>>>>> >
>>>>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>>>>> > Card Reader Controller
>>>>> >
>>>>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>>>>
>>>>> Yes correct!
>>>>>
>>>>> >
>>>>> > Diego
>>>>>
>>>>> Kind regards
>>>>> Uffe
>>>>>
>>>>
>>>
>>> Thanks,
>>> Diego
>>
>> lsusb -t with USB 3.0 disabled on BIOS:
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255303
>>
>> Diego
>
> Hrm, that's rtsx_usb and not rtsx_usb_ms.
>
> I'm getting confused here.
>
> Diego
CC'ing Mathias Nyman.
On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern <[email protected]> wrote:
> On Thu, 16 Mar 2017, Ulf Hansson wrote:
>
>> +Alan
>>
>> On 15 March 2017 at 15:00, Diego Viola <[email protected]> wrote:
>> > On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola <[email protected]> wrote:
>> >> On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola <[email protected]> wrote:
>> >>> On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola <[email protected]> wrote:
>> >>>> On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola <[email protected]> wrote:
>> >>>>> On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola <[email protected]> wrote:
>> >>>>>> Hi Greg,
>> >>>>>>
>> >>>>>> On Wed, Mar 8, 2017 at 5:15 PM, Greg KH <[email protected]> wrote:
>> >>>>>>> On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
>> >>>>>>>> It hangs on resume from suspend if I have USB 3.0 enabled on the BIOS,
>> >>>>>>>> it works fine with ehci_hcd or USB 2.0.
>> >>>>>>>>
>> >>>>>>>> The way I reproduce the problem is with this command:
>> >>>>>>>>
>> >>>>>>>> $ i3lock && systemctl suspend
>> >>>>>>>>
>> >>>>>>>> This is what I see on the screen when it hangs:
>> >>>>>>>>
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
>> >>>>>>>>
>> >>>>>>>> Some logs:
>> >>>>>>>>
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
>> >>>>>>>>
>> >>>>>>>> I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
>> >>>>>>>>
>> >>>>>>>> I also tried Linux 4.10.1 and I could reproduce this problem there as well.
>> >>>>>>>>
>> >>>>>>>> Please let me know if I could provide more info.
>> >>>>>>>
>> >>>>>>> Has any previous kernel ever worked properly before? If so, any chance
>> >>>>>>> you can use 'git bisect' to find the offending commit?
>> >>>>>>
>> >>>>>> I'm not sure, this is my work machine and I've only started using it
>> >>>>>> recently (since about a month ago or so).
>> >>>>>>
>> >>>>>> I will try older kernels and see if I get any different results, I
>> >>>>>> will report back in any case.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> And are you sure you have updated your bios to the latest version?
>> >>>>>>
>> >>>>>> Yes.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> thanks,
>> >>>>>>>
>> >>>>>>> greg k-h
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Diego
>> >>>>>
>> >>>>> I found another workaround, I can suspend/resume fine with `i3lock &&
>> >>>>> systemctl suspend` if I disconnect/unplug all my USB devices
>> >>>>> (keyboard, mouse, etc). This with the default settings in the BIOS
>> >>>>> (both USB 2.0 and 3.0 enabled).
>> >>>>>
>> >>>>> I'm also seeing some messages like this in dmesg:
>> >>>>>
>> >>>>> [ 16.172190] usb 2-6: device descriptor read/64, error -110
>> >>>>>
>> >>>>> Would this indicate a hardware/firmware/power issue?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Diego
>> >>>>
>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>> >>>> resume, which isn't much different than using the latest kernel.
>> >>>>
>> >>>> My dmesg is still being spammed with these messages:
>> >>>>
>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>> >>>>
>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>> >>>>
>> >>>> Thanks,
>> >>>> Diego
>> >>>
>> >>> Hello,
>> >>>
>> >>> I've found something interesting and what it seems to be the cause of
>> >>> my problem.
>> >>>
>> >>> As soon as I boot my system I can see this process being in the D-state:
>> >>>
>> >>> [root@myhost ~]# ps aux | grep " D"
>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>> >>> [root@myhost ~]#
>> >>>
>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>> >>> after I disabled the module and the suspend/resume problem is gone.
>>
>> That's a good observation!
>>
>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>> properly from PM point of view. Perhaps it tries to access its device
>> while it from a runtime PM point view still is in a runtime suspended
>> state. Exactly why I don't know yet.
>>
>> Moreover we have had issues with this driver before and its
>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>> top of that, both their corresponding devices shares the same usb mfd
>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>
>> Unfortunate my knowledge about USB is still in the learning phase,
>> however I know well about runtime PM ans system suspend, so perhaps I
>> still might be able to help.
>>
>> Anyway, I have looped in Alan, let's see if he has some input to this.
>
> Is the rtsx_usb_ms device attached to an xHCI controller?
>
> How is the hang during resume related to the actions of the xhci-hcd
> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
> and use a network console to get the answer.)
>
> If this problem really is related to xhci-hcd, have you tried bringing
> it to the attention of the xhci-hcd maintainer?
>
> Are you using the most up-to-date version of the kernel? xhci-hcd is
> still getting fixes at a very high rate.
>
> Alan Stern
>
>> >>>
>> >>> Diego
>> >>
>> >> Adding Roger Tseng to the CC also.
>> >>
>> >> Diego
>> >
>> > According to this document:
>> >
>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>> >
>> > My computer only has a SD card slot and no MEMSTICK slot.
>> >
>> > lsusb says this though:
>> >
>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>> > Card Reader Controller
>> >
>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>
>> Yes correct!
>>
>> >
>> > Diego
>>
>> Kind regards
>> Uffe
>>
>
Alan,
I'm not sure if you saw or if it's useful, but I already got a trace
with netconsole a while ago while trying to reproduce the hang:
https://bugzilla.kernel.org/attachment.cgi?id=255227
Could you point me to some instructions about enabling dynamic
debugging for xhci-hcd?
Thanks,
Diego
On Thu, Mar 16, 2017 at 2:14 PM, Diego Viola <[email protected]> wrote:
> On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern <[email protected]> wrote:
>> On Thu, 16 Mar 2017, Ulf Hansson wrote:
>>
>>> +Alan
>>>
>>> On 15 March 2017 at 15:00, Diego Viola <[email protected]> wrote:
>>> > On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola <[email protected]> wrote:
>>> >> On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola <[email protected]> wrote:
>>> >>> On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola <[email protected]> wrote:
>>> >>>> On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola <[email protected]> wrote:
>>> >>>>> On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola <[email protected]> wrote:
>>> >>>>>> Hi Greg,
>>> >>>>>>
>>> >>>>>> On Wed, Mar 8, 2017 at 5:15 PM, Greg KH <[email protected]> wrote:
>>> >>>>>>> On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
>>> >>>>>>>> It hangs on resume from suspend if I have USB 3.0 enabled on the BIOS,
>>> >>>>>>>> it works fine with ehci_hcd or USB 2.0.
>>> >>>>>>>>
>>> >>>>>>>> The way I reproduce the problem is with this command:
>>> >>>>>>>>
>>> >>>>>>>> $ i3lock && systemctl suspend
>>> >>>>>>>>
>>> >>>>>>>> This is what I see on the screen when it hangs:
>>> >>>>>>>>
>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
>>> >>>>>>>>
>>> >>>>>>>> Some logs:
>>> >>>>>>>>
>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
>>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
>>> >>>>>>>>
>>> >>>>>>>> I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
>>> >>>>>>>>
>>> >>>>>>>> I also tried Linux 4.10.1 and I could reproduce this problem there as well.
>>> >>>>>>>>
>>> >>>>>>>> Please let me know if I could provide more info.
>>> >>>>>>>
>>> >>>>>>> Has any previous kernel ever worked properly before? If so, any chance
>>> >>>>>>> you can use 'git bisect' to find the offending commit?
>>> >>>>>>
>>> >>>>>> I'm not sure, this is my work machine and I've only started using it
>>> >>>>>> recently (since about a month ago or so).
>>> >>>>>>
>>> >>>>>> I will try older kernels and see if I get any different results, I
>>> >>>>>> will report back in any case.
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> And are you sure you have updated your bios to the latest version?
>>> >>>>>>
>>> >>>>>> Yes.
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> thanks,
>>> >>>>>>>
>>> >>>>>>> greg k-h
>>> >>>>>>
>>> >>>>>> Thanks,
>>> >>>>>> Diego
>>> >>>>>
>>> >>>>> I found another workaround, I can suspend/resume fine with `i3lock &&
>>> >>>>> systemctl suspend` if I disconnect/unplug all my USB devices
>>> >>>>> (keyboard, mouse, etc). This with the default settings in the BIOS
>>> >>>>> (both USB 2.0 and 3.0 enabled).
>>> >>>>>
>>> >>>>> I'm also seeing some messages like this in dmesg:
>>> >>>>>
>>> >>>>> [ 16.172190] usb 2-6: device descriptor read/64, error -110
>>> >>>>>
>>> >>>>> Would this indicate a hardware/firmware/power issue?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Diego
>>> >>>>
>>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>>> >>>> resume, which isn't much different than using the latest kernel.
>>> >>>>
>>> >>>> My dmesg is still being spammed with these messages:
>>> >>>>
>>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>>> >>>>
>>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Diego
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> I've found something interesting and what it seems to be the cause of
>>> >>> my problem.
>>> >>>
>>> >>> As soon as I boot my system I can see this process being in the D-state:
>>> >>>
>>> >>> [root@myhost ~]# ps aux | grep " D"
>>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>>> >>> [root@myhost ~]#
>>> >>>
>>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>>> >>> after I disabled the module and the suspend/resume problem is gone.
>>>
>>> That's a good observation!
>>>
>>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>>> properly from PM point of view. Perhaps it tries to access its device
>>> while it from a runtime PM point view still is in a runtime suspended
>>> state. Exactly why I don't know yet.
>>>
>>> Moreover we have had issues with this driver before and its
>>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>>> top of that, both their corresponding devices shares the same usb mfd
>>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>>
>>> Unfortunate my knowledge about USB is still in the learning phase,
>>> however I know well about runtime PM ans system suspend, so perhaps I
>>> still might be able to help.
>>>
>>> Anyway, I have looped in Alan, let's see if he has some input to this.
>>
>> Is the rtsx_usb_ms device attached to an xHCI controller?
>>
>> How is the hang during resume related to the actions of the xhci-hcd
>> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
>> and use a network console to get the answer.)
>>
>> If this problem really is related to xhci-hcd, have you tried bringing
>> it to the attention of the xhci-hcd maintainer?
>>
>> Are you using the most up-to-date version of the kernel? xhci-hcd is
>> still getting fixes at a very high rate.
>>
>> Alan Stern
>>
>>> >>>
>>> >>> Diego
>>> >>
>>> >> Adding Roger Tseng to the CC also.
>>> >>
>>> >> Diego
>>> >
>>> > According to this document:
>>> >
>>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>>> >
>>> > My computer only has a SD card slot and no MEMSTICK slot.
>>> >
>>> > lsusb says this though:
>>> >
>>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>>> > Card Reader Controller
>>> >
>>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>>
>>> Yes correct!
>>>
>>> >
>>> > Diego
>>>
>>> Kind regards
>>> Uffe
>>>
>>
>
> Alan,
>
> I'm not sure if you saw or if it's useful, but I already got a trace
> with netconsole a while ago while trying to reproduce the hang:
>
> https://bugzilla.kernel.org/attachment.cgi?id=255227
>
> Could you point me to some instructions about enabling dynamic
> debugging for xhci-hcd?
>
> Thanks,
> Diego
Already explained here.
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/dynamic-debug-howto.rst
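For reference, the how-to linked above also describes switching debug messages on at runtime by writing a query to the dynamic debug control file. A minimal sketch, assuming a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs mounted at /sys/kernel/debug; the function name enable_dyndbg and its optional second argument are hypothetical and exist only so the write can be tried against a scratch file first:

```shell
# Sketch: turn on all pr_debug()/dev_dbg() call sites of one module.
# $1 = module name, $2 = control file (defaults to the usual debugfs path).
enable_dyndbg() {
    module=$1
    control=${2:-/sys/kernel/debug/dynamic_debug/control}
    # "+p" enables printing for every call site matched by the query.
    printf 'module %s +p\n' "$module" > "$control"
}
```

Run as root, `enable_dyndbg xhci_hcd` should have roughly the same effect as booting with xhci_hcd.dyndbg, without a reboot.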
On Fri, 17 Mar 2017, Diego Viola wrote:
> Hi,
>
> Here's the log to the netconsole dmesg capture, I've used
> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>
> I did the usual suspend/resume cycle with i3lock, it hung after the
> third attempt when trying to resume from suspend.
>
> https://bugzilla.kernel.org/attachment.cgi?id=255309
I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
maintainer.
Alan Stern
>
> Please let me know if I should provide something else.
>
> Thanks,
> Diego
>
On Thu, Mar 16, 2017 at 12:07 PM, Alan Stern <[email protected]> wrote:
> On Thu, 16 Mar 2017, Ulf Hansson wrote:
>
>> +Alan
>>
>> On 15 March 2017 at 15:00, Diego Viola <[email protected]> wrote:
>> > On Tue, Mar 14, 2017 at 4:15 PM, Diego Viola <[email protected]> wrote:
>> >> On Tue, Mar 14, 2017 at 2:20 PM, Diego Viola <[email protected]> wrote:
>> >>> On Thu, Mar 9, 2017 at 2:15 PM, Diego Viola <[email protected]> wrote:
>> >>>> On Thu, Mar 9, 2017 at 11:11 AM, Diego Viola <[email protected]> wrote:
>> >>>>> On Wed, Mar 8, 2017 at 5:40 PM, Diego Viola <[email protected]> wrote:
>> >>>>>> Hi Greg,
>> >>>>>>
>> >>>>>> On Wed, Mar 8, 2017 at 5:15 PM, Greg KH <[email protected]> wrote:
>> >>>>>>> On Wed, Mar 08, 2017 at 03:49:19PM -0300, Diego Viola wrote:
>> >>>>>>>> It hangs on resume from suspend if I have USB 3.0 enabled on the BIOS,
>> >>>>>>>> it works fine with ehci_hcd or USB 2.0.
>> >>>>>>>>
>> >>>>>>>> The way I reproduce the problem is with this command:
>> >>>>>>>>
>> >>>>>>>> $ i3lock && systemctl suspend
>> >>>>>>>>
>> >>>>>>>> This is what I see on the screen when it hangs:
>> >>>>>>>>
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170308_095000.jpg
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/IMG_20170307_133928.jpg
>> >>>>>>>>
>> >>>>>>>> Some logs:
>> >>>>>>>>
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg1.txt
>> >>>>>>>> https://dl.dropboxusercontent.com/u/6005119/dell/dmesg2.txt
>> >>>>>>>>
>> >>>>>>>> I'm on Arch Linux x86_64, kernel 4.9.11-1-ARCH.
>> >>>>>>>>
>> >>>>>>>> I also tried Linux 4.10.1 and I could reproduce this problem there as well.
>> >>>>>>>>
>> >>>>>>>> Please let me know if I could provide more info.
>> >>>>>>>
>> >>>>>>> Has any previous kernel ever worked properly before? If so, any chance
>> >>>>>>> you can use 'git bisect' to find the offending commit?
>> >>>>>>
>> >>>>>> I'm not sure, this is my work machine and I've only started using it
>> >>>>>> recently (since about a month ago or so).
>> >>>>>>
>> >>>>>> I will try older kernels and see if I get any different results, I
>> >>>>>> will report back in any case.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> And are you sure you have updated your bios to the latest version?
>> >>>>>>
>> >>>>>> Yes.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> thanks,
>> >>>>>>>
>> >>>>>>> greg k-h
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Diego
>> >>>>>
>> >>>>> I found another workaround, I can suspend/resume fine with `i3lock &&
>> >>>>> systemctl suspend` if I disconnect/unplug all my USB devices
>> >>>>> (keyboard, mouse, etc). This with the default settings in the BIOS
>> >>>>> (both USB 2.0 and 3.0 enabled).
>> >>>>>
>> >>>>> I'm also seeing some messages like this in dmesg:
>> >>>>>
>> >>>>> [ 16.172190] usb 2-6: device descriptor read/64, error -110
>> >>>>>
>> >>>>> Would this indicate a hardware/firmware/power issue?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Diego
>> >>>>
>> >>>> OK, I've built Linux 4.4.52 (I did a localmodconfig) and rebooted into
>> >>>> it, I did a suspend/resume and it hanged the first time I tried to
>> >>>> resume, which isn't much different than using the latest kernel.
>> >>>>
>> >>>> My dmesg is still being spammed with these messages:
>> >>>>
>> >>>> [ 260.043673] usb 2-1: Device not responding to setup address.
>> >>>> [ 260.246918] usb 2-1: device not accepting address 15, error -71
>> >>>> [ 260.633662] usb 2-1: new high-speed USB device number 17 using xhci_hcd
>> >>>> [ 261.341340] usb 2-1: USB disconnect, device number 17
>> >>>>
>> >>>> I guess it's safe to assume at this point that this is a hardware problem?
>> >>>>
>> >>>> Thanks,
>> >>>> Diego
>> >>>
>> >>> Hello,
>> >>>
>> >>> I've found something interesting and what it seems to be the cause of
>> >>> my problem.
>> >>>
>> >>> As soon as I boot my system I can see this process being in the D-state:
>> >>>
>> >>> [root@myhost ~]# ps aux | grep " D"
>> >>> root 269 0.0 0.0 0 0 ? D 14:11 0:00 [rtsx_usb_ms_2]
>> >>> root 1424 0.0 0.0 10788 2172 pts/2 S+ 14:19 0:00 grep D
>> >>> [root@myhost ~]#
>> >>>
>> >>> I'm not exactly sure why that is, but if I do a 'rmmod rtsx_usb_ms'
>> >>> the problem is gone. I already tried suspending/resuming ~40 times
>> >>> after I disabled the module and the suspend/resume problem is gone.
>>
>> That's a good observation!
>>
>> It suspect the drivers/memstick/host/rtsx_usb_ms.c isn't behaving
>> properly from PM point of view. Perhaps it tries to access its device
>> while it from a runtime PM point view still is in a runtime suspended
>> state. Exactly why I don't know yet.
>>
>> Moreover we have had issues with this driver before and its
>> corresponding SD card driver in drivers/mmc/host/rtsx_usb_sdmmc.c. On
>> top of that, both their corresponding devices shares the same usb mfd
>> device as parent, which is managed by drivers/mfd/rtsx_usb.c.
>>
>> Unfortunate my knowledge about USB is still in the learning phase,
>> however I know well about runtime PM ans system suspend, so perhaps I
>> still might be able to help.
>>
>> Anyway, I have looped in Alan, let's see if he has some input to this.
>
> Is the rtsx_usb_ms device attached to an xHCI controller?
>
> How is the hang during resume related to the actions of the xhci-hcd
> driver? (You'll probably need to enable dynamic debugging for xhci-hcd
> and use a network console to get the answer.)
>
> If this problem really is related to xhci-hcd, have you tried bringing
> it to the attention of the xhci-hcd maintainer?
>
> Are you using the most up-to-date version of the kernel? xhci-hcd is
> still getting fixes at a very high rate.
>
> Alan Stern
>
>> >>>
>> >>> Diego
>> >>
>> >> Adding Roger Tseng to the CC also.
>> >>
>> >> Diego
>> >
>> > According to this document:
>> >
>> > http://downloads.dell.com/manuals/all-products/esuprt_laptop/esuprt_inspiron_laptop/inspiron-15-5558-laptop_reference%20guide_en-us.pdf
>> >
>> > My computer only has a SD card slot and no MEMSTICK slot.
>> >
>> > lsusb says this though:
>> >
>> > Bus 001 Device 005: ID 0bda:0129 Realtek Semiconductor Corp. RTS5129
>> > Card Reader Controller
>> >
>> > Maybe the driver gets locked up looking for the MEMSTICK slot?
>>
>> Yes correct!
>>
>> >
>> > Diego
>>
>> Kind regards
>> Uffe
>>
>
Hi,
Here's the log from the netconsole dmesg capture; I used
xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
I did the usual suspend/resume cycle with i3lock, it hung after the
third attempt when trying to resume from suspend.
https://bugzilla.kernel.org/attachment.cgi?id=255309
Please let me know if I should provide something else.
Thanks,
Diego
On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <[email protected]> wrote:
> On Fri, 17 Mar 2017, Diego Viola wrote:
>
>> Hi,
>>
>> Here's the log to the netconsole dmesg capture, I've used
>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>
>> I did the usual suspend/resume cycle with i3lock, it hung after the
>> third attempt when trying to resume from suspend.
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>
> I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
> maintainer.
>
> Alan Stern
>
>>
>> Please let me know if I should provide something else.
>>
>> Thanks,
>> Diego
>>
>
I've forwarded my email to Mathias Nyman.
Diego
On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <[email protected]> wrote:
> On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <[email protected]> wrote:
>> On Fri, 17 Mar 2017, Diego Viola wrote:
>>
>>> Hi,
>>>
>>> Here's the log to the netconsole dmesg capture, I've used
>>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>>
>>> I did the usual suspend/resume cycle with i3lock, it hung after the
>>> third attempt when trying to resume from suspend.
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>>
>> I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
>> maintainer.
>>
>> Alan Stern
>>
>>>
>>> Please let me know if I should provide something else.
>>>
>>> Thanks,
>>> Diego
>>>
>>
>
> I've forwarded my email to Mathias Nyman.
>
> Diego
Still a problem with 4.11.0-rc2-ARCH+
commit d528ae0d3dfedea553812c957a6ed1e87feeed8a
On Fri, Mar 17, 2017 at 5:18 PM, Diego Viola <[email protected]> wrote:
> On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <[email protected]> wrote:
>> On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <[email protected]> wrote:
>>> On Fri, 17 Mar 2017, Diego Viola wrote:
>>>
>>>> Hi,
>>>>
>>>> Here's the log to the netconsole dmesg capture, I've used
>>>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>>>
>>>> I did the usual suspend/resume cycle with i3lock, it hung after the
>>>> third attempt when trying to resume from suspend.
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>>>
>>> I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
>>> maintainer.
>>>
>>> Alan Stern
>>>
>>>>
>>>> Please let me know if I should provide something else.
>>>>
>>>> Thanks,
>>>> Diego
>>>>
>>>
>>
>> I've forwarded my email to Mathias Nyman.
>>
>> Diego
>
> Still a problem with 4.11.0-rc2-ARCH+
>
> commit d528ae0d3dfedea553812c957a6ed1e87feeed8a
I have had a conversation with oiaohm over IRC about this, some
interesting things he had said about this issue:
2017-03-18 18:08:02 oiaohm That driver that was going dead
because because it was physical port less was usb stack. So maybe it
that bit of hardware still doing stupid.
2017-03-18 18:21:44 oiaohm I guess this current log of yours is
with the realtek memstick black listed.
2017-03-18 18:21:55 oiaohm because it does not exist.
2017-03-18 18:22:09 oiaohm physically.
2017-03-18 18:23:04 oiaohm Maybe. If the hardware is not inited
the usb stack might not try to suspend it.
2017-03-18 18:26:30 oiaohm No matter how you look at it the thing
is broken hardware. I don't know if that realtek is USB 3.0
2017-03-18 18:27:02 oiaohm Or it sitting on a USB 3.0 hub inside
the machine.
2017-03-18 18:27:39 oiaohm You cannot expect a driver to work
when the hardware is portless.
2017-03-18 18:27:51 oiaohm and it should have a port.
2017-03-18 18:27:52 oiaohm either.
2017-03-18 18:29:03 oiaohm rtsx_usb_ms this is a memstick driver
there should be memstick port on your system or a header for a
memstick port both mean the pull down and pull up circuits are present
so the hardware cannot function right.
2017-03-18 18:29:38 oiaohm You gone over the machine and there is
no memstick port exposed being a laptop the odds of internal header is
basically never happens.
2017-03-18 18:30:27 oiaohm so it broken hardware.
2017-03-18 18:31:18 oiaohm the correct answer with broken
hardware is don't init the part blacklist the driver.
2017-03-18 18:40:36 oiaohm You can think of it this way the
hardware gets lost because it cannot tell if something is connected so
is sending messages and waiting for responses that will never come.
But when hardware is there due to different speeds of cards it has no
clear clue what the time frame is.
2017-03-18 18:41:27 oiaohm So the hardware being lost and kinda
jammed is purely to be expected if it does not have all it required
circuits to function.
2017-03-18 18:48:42 oiaohm You have the realtek controller for a
memstick port and it cannot tell if the proper hardware is present or
not that is what is triggering the driver to load.
2017-03-18 18:49:18 oiaohm There is a difference when you have
the USB 3.0 controller active.
2017-03-18 18:49:49 oiaohm You will see a lot of windows users
noting they need to disable the USB 3.0 controller to hibernate.
2017-03-18 18:50:19 oiaohm In usb 2.0 the operating system polls
the USB ports and does a lot of the messaging. In USB 3.0 controller
it does that polling.
2017-03-18 18:50:40 oiaohm USB 3.0 controllers normally presume
all the hardware that is inited is functional.
2017-03-18 18:51:07 oiaohm Linux kernel doing USB 2.0 polling
itself presumes the hardware could be busted.
2017-03-18 18:56:37 oiaohm USB 3.0 controller is interrupt driven
to the OS so it does a lot of heavy lifting of USB by itself. USB
2.0 and before controllers are like win modems basically brainless and
depending on the OS to do everything.
2017-03-18 18:58:27 oiaohm So usb 2.0 controller not showing the
issue and the usb 3.0 showing the issue is kind of expected. If you
did not init the hardware and usb 3.0 controller still showed a issue
then there would be a problem.
2017-03-18 19:02:59 oiaohm dviola I guess the only thing you were
missing is that the USB 3.0 controller had proper controller so can
think for itself and USB 2.0 and before is like a brainless winmodem
so the OS can work around a few USB hardware issues in USB 2.0
controller mode.
2017-03-18 19:09:58 oiaohm do remember the difference between usb
2.0 and usb 3.0 at times you have no choice but to force back to usb
2.0
2017-03-18 19:10:12 oiaohm With broken bits of hardware.
2017-03-18 22:30:38 oiaohm I think everyone is being confused by
a basic hardware construction cost cutting move.
2017-03-18 22:31:18 oiaohm Now maybe they will be able to come up
with some solution to allow memory stick part of the realtek where it
not a port to be sanely not inited.
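The workaround suggested in the log above (do not load the memstick driver at all) can be made persistent with a modprobe.d entry. A minimal sketch, assuming the distribution reads /etc/modprobe.d/*.conf; the function name blacklist_module and the default file name are hypothetical:

```shell
# Sketch: write a modprobe.d "blacklist" entry for one module.
# $1 = module name, $2 = config file (hypothetical default name).
blacklist_module() {
    module=$1
    conf=${2:-/etc/modprobe.d/blacklist-$module.conf}
    # "blacklist" stops alias-based autoloading; an explicit
    # "modprobe <module>" would still load the driver.
    printf 'blacklist %s\n' "$module" > "$conf"
}
```

Run as root, `blacklist_module rtsx_usb_ms` covers future boots; `rmmod rtsx_usb_ms` remains the way to drop the driver from the running kernel.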
On 19.03.2017 23:29, Diego Viola wrote:
> On Fri, Mar 17, 2017 at 5:18 PM, Diego Viola <[email protected]> wrote:
>> On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <[email protected]> wrote:
>>> On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <[email protected]> wrote:
>>>> On Fri, 17 Mar 2017, Diego Viola wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here's the log to the netconsole dmesg capture, I've used
>>>>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>>>>
>>>>> I did the usual suspend/resume cycle with i3lock, it hung after the
>>>>> third attempt when trying to resume from suspend.
>>>>>
>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>>>>
>>>> I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
>>>> maintainer.
>>>>
>>>> Alan Stern
>>>>
>>>>>
>>>>> Please let me know if I should provide something else.
>>>>>
>>>>> Thanks,
>>>>> Diego
>>>>>
>>>>
>>>
>>> I've forwarded my email to Mathias Nyman.
>>>
>>> Diego
>>
>> Still a problem with 4.11.0-rc2-ARCH+
>>
From a quick glance it looks like rtsx_usb_ms probably takes a mutex (&ucr->dev_mutex),
then issues a usb_bulk_msg() and waits for it to complete with the mutex held.
The USB message times out and usb core kills the URB, but the URB probably never gets completed,
so the function never returns.
Everything using ucr->dev_mutex would then block. For example, the kthread rtsx_usb_detect_ms_card,
which continuously tries to detect an MS card, takes and releases the same ucr->dev_mutex on
each try.
[ 614.026502] INFO: task kworker/u8:0:5 blocked for more than 120 seconds.
[ 614.027865] Not tainted 4.10.3-1-ARCH #1
[ 614.029116] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 614.030467] kworker/u8:0 D 0 5 2 0x00000000
[ 614.031812] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 614.033179] Call Trace:
[ 614.034550] __schedule+0x22f/0x700
[ 614.035940] schedule+0x3d/0x90
[ 614.037334] schedule_preempt_disabled+0x15/0x20
[ 614.038680] __mutex_lock_slowpath+0x19b/0x2d0
[ 614.040067] ? flush_workqueue+0x204/0x580
[ 614.041456] mutex_lock+0x23/0x30
[ 614.042163] acpi_device_hotplug+0x43/0x3e7
[ 614.042882] acpi_hotplug_work_fn+0x1e/0x29
[ 614.043612] process_one_work+0x1e5/0x470
[ 614.044356] worker_thread+0x48/0x4e0
[ 614.045077] kthread+0x101/0x140
[ 614.045788] ? process_one_work+0x470/0x470
[ 614.046495] ? kthread_create_on_node+0x60/0x60
[ 614.047215] ret_from_fork+0x2c/0x40
[ 614.047950] INFO: task rtsx_usb_ms_1:235 blocked for more than 120 seconds.
[ 614.048697] Not tainted 4.10.3-1-ARCH #1
[ 614.049465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 614.050265] rtsx_usb_ms_1 D 0 235 2 0x00000000
[ 614.051064] Call Trace:
[ 614.051841] __schedule+0x22f/0x700
[ 614.052626] schedule+0x3d/0x90
[ 614.053411] usb_kill_urb.part.4+0x6c/0xa0 [usbcore]
[ 614.054198] ? wake_atomic_t_function+0x60/0x60
[ 614.055005] usb_kill_urb+0x21/0x30 [usbcore]
[ 614.055819] usb_start_wait_urb+0xe5/0x170 [usbcore]
[ 614.056652] usb_bulk_msg+0xbd/0x160 [usbcore]
[ 614.057489] rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
[ 614.058306] rtsx_usb_read_register+0x6c/0xc0 [rtsx_usb]
[ 614.059118] rtsx_usb_detect_ms_card+0x98/0x120 [rtsx_usb_ms]
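Tasks stuck like the two in this report sit in uninterruptible sleep (state D) and can be listed without waiting for the 120-second hung-task warning. A minimal sketch, assuming a procps-style ps; the function name list_dstate is hypothetical, and it only filters "PID STAT COMMAND" lines read from standard input:

```shell
# Sketch: keep only the lines whose second field (process state) starts with D.
list_dstate() {
    awk '$2 ~ /^D/ { print }'
}
```

A possible invocation is `ps -eo pid=,stat=,comm= | list_dstate`, which avoids the false match on the grep process itself that `ps aux | grep " D"` produces.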
There is a lot going on in xhci during the last suspend before this:
URBs are canceled, devices are reset and re-enumerated, a descriptor read times out,
and the device firmware changed.
It's possible we end up in a situation where xhci never gives back the URB.
4.11-rc2 has better xhci tracing; it shows each URB enqueue, dequeue and giveback.
Could you try enabling xhci tracing before suspending? (This is not the same as xhci verbose dynamic debug.)
It will generate a lot of data, so it is better to remove all extra USB devices.
xhci tracing can be enabled with:
mount -t debugfs none /sys/kernel/debug
echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
and then send the output of cat /sys/kernel/debug/tracing/trace
-Mathias
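The tracing steps above can be wrapped so that the trace is saved to a file for attaching to the bug report. A minimal sketch, assuming debugfs is already mounted at /sys/kernel/debug; the function names enable_xhci_trace and save_trace, the directory argument and the default output file name are hypothetical:

```shell
# Sketch: enable the xhci-hcd trace events. Appending (>>) keeps any
# events that are already enabled, as in the commands above.
# $1 = tracing directory (defaults to the usual debugfs location).
enable_xhci_trace() {
    tracedir=${1:-/sys/kernel/debug/tracing}
    echo xhci-hcd >> "$tracedir/set_event"
}

# Sketch: copy the current trace buffer into a regular file.
# $1 = tracing directory, $2 = output file.
save_trace() {
    tracedir=${1:-/sys/kernel/debug/tracing}
    out=${2:-xhci-trace.txt}
    cat "$tracedir/trace" > "$out"
}
```

The intended order, run as root, would be enable_xhci_trace, then the suspend/resume cycle, then save_trace. If resume hangs, the trace cannot be saved this way and the netconsole capture is the fallback.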
On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
<[email protected]> wrote:
> On 19.03.2017 23:29, Diego Viola wrote:
>>
>> On Fri, Mar 17, 2017 at 5:18 PM, Diego Viola <[email protected]>
>> wrote:
>>>
>>> On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <[email protected]>
>>> wrote:
>>>>
>>>> On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <[email protected]>
>>>> wrote:
>>>>>
>>>>> On Fri, 17 Mar 2017, Diego Viola wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here's the log to the netconsole dmesg capture, I've used
>>>>>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>>>>>
>>>>>> I did the usual suspend/resume cycle with i3lock, it hung after the
>>>>>> third attempt when trying to resume from suspend.
>>>>>>
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>>>>>
>>>>>
>>>>> I'm not an expert on xHCI. This should be CC'ed to the xhci-hcd
>>>>> maintainer.
>>>>>
>>>>> Alan Stern
>>>>>
>>>>>>
>>>>>> Please let me know if I should provide something else.
>>>>>>
>>>>>> Thanks,
>>>>>> Diego
>>>>>>
>>>>>
>>>>
>>>> I've forwarded my email to Mathias Nyman.
>>>>
>>>> Diego
>>>
>>>
>>> Still a problem with 4.11.0-rc2-ARCH+
>>>
>
> From a quick glance it looks like rtsx_usb_ms probaly takes a mutex
> (&ucr->dev_mutex)
> and then issues a usb_bulk_msg() and waits for it to complete with mutex
> held.
> The usb message times out, usb core kills the urb but the URB probably never
> gets completed,
> and function never returns.
>
> Everyting using ucr->dev_mutex would block, for example the kthread,
> rtsx_usb_detect_ms_card
> that continuously tries to detect a ms card, takes and releases the same
> ucr->dev_mutex for
> each try.
>
> [ 614.026502] INFO: task kworker/u8:0:5 blocked for more than 120 seconds.
> [ 614.027865] Not tainted 4.10.3-1-ARCH #1
> [ 614.029116] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 614.030467] kworker/u8:0 D 0 5 2 0x00000000
> [ 614.031812] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [ 614.033179] Call Trace:
> [ 614.034550] __schedule+0x22f/0x700
> [ 614.035940] schedule+0x3d/0x90
> [ 614.037334] schedule_preempt_disabled+0x15/0x20
> [ 614.038680] __mutex_lock_slowpath+0x19b/0x2d0
> [ 614.040067] ? flush_workqueue+0x204/0x580
> [ 614.041456] mutex_lock+0x23/0x30
> [ 614.042163] acpi_device_hotplug+0x43/0x3e7
> [ 614.042882] acpi_hotplug_work_fn+0x1e/0x29
> [ 614.043612] process_one_work+0x1e5/0x470
> [ 614.044356] worker_thread+0x48/0x4e0
> [ 614.045077] kthread+0x101/0x140
> [ 614.045788] ? process_one_work+0x470/0x470
> [ 614.046495] ? kthread_create_on_node+0x60/0x60
> [ 614.047215] ret_from_fork+0x2c/0x40
> [ 614.047950] INFO: task rtsx_usb_ms_1:235 blocked for more than 120
> seconds.
> [ 614.048697] Not tainted 4.10.3-1-ARCH #1
> [ 614.049465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 614.050265] rtsx_usb_ms_1 D 0 235 2 0x00000000
> [ 614.051064] Call Trace:
> [ 614.051841] __schedule+0x22f/0x700
> [ 614.052626] schedule+0x3d/0x90
> [ 614.053411] usb_kill_urb.part.4+0x6c/0xa0 [usbcore]
> [ 614.054198] ? wake_atomic_t_function+0x60/0x60
> [ 614.055005] usb_kill_urb+0x21/0x30 [usbcore]
> [ 614.055819] usb_start_wait_urb+0xe5/0x170 [usbcore]
> [ 614.056652] usb_bulk_msg+0xbd/0x160 [usbcore]
> [ 614.057489] rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
> [ 614.058306] rtsx_usb_read_register+0x6c/0xc0 [rtsx_usb]
> [ 614.059118] rtsx_usb_detect_ms_card+0x98/0x120 [rtsx_usb_ms]
>
> There is a lot going on in xhci during the last suspend befor this.
> URBs are canceled, devices reset and re-enumerated, timeout while reading
> descriptor,
> device firmware changed.
>
> It's possible we end up in a situation where xhci never givers back the
> URB.
>
> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue and
> giveback.
>
> Could you try enabling xhci tracing before suspending (not the same as xhci
> verbose dynamic debug)
> It will generate a lot of data, so better to remove all extra USB devices.
>
> xhci tracing can be added with:
>
> mount -t debugfs none /sys/kernel/debug
> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>
> and then send the output of cat /sys/kernel/debug/tracing/trace
>
> -Mathias
>
>
https://bugzilla.kernel.org/attachment.cgi?id=255367
This is with Linux 4.11.0-rc3-ARCH.
USB mouse/keyboard was unplugged before booting the machine.
I didn't do a suspend/resume before getting this trace, should I do that?
Should I reproduce the hang and get a netconsole dmesg capture with
tracing enabled?
Thanks,
Diego
On 20.03.2017 17:39, Diego Viola wrote:
> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
> <[email protected]> wrote:
>> On 19.03.2017 23:29, Diego Viola wrote:
>>>
>>>> Still a problem with 4.11.0-rc2-ARCH+
>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue and
>> giveback.
>>
>> Could you try enabling xhci tracing before suspending (not the same as xhci
>> verbose dynamic debug)
>> It will generate a lot of data, so better to remove all extra USB devices.
>>
>> xhci tracing can be added with:
>>
>> mount -t debugfs none /sys/kernel/debug
>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>
>> and then send the output of cat /sys/kernel/debug/tracing/trace
>
> https://bugzilla.kernel.org/attachment.cgi?id=255367
>
> This is with Linux 4.11.0-rc3-ARCH.
>
> USB mouse/keyboard was unplugged before booting the machine.
>
> I didn't do a suspend/resume before getting this trace, should I do that?
>
> Should I reproduce the hang and get a netconsole dmesg capture with
> tracing enabled?
A trace and a dmesg of the same suspend/resume hang would be great.
And if you can, then one of a successful suspend/resume for reference.
(I haven't yet checked the one you added to bugzilla)
-Mathias
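For reference, the capture steps discussed above could be wrapped in a small shell helper along these lines. This is only a sketch under assumptions: the function names and the TRACING_DIR variable are made up for illustration, debugfs is assumed to be mounted already (the mount command quoted above does that), and the commands need root.

```shell
# Hypothetical helpers; the names and TRACING_DIR are illustrative assumptions.
# Default location of the tracing files once debugfs is mounted.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Enable the xhci-hcd trace events (same effect as the echo quoted above).
enable_xhci_tracing() {
    echo xhci-hcd >> "$TRACING_DIR/set_event"
}

# Save the current trace buffer to the file named by the first argument.
save_xhci_trace() {
    cat "$TRACING_DIR/trace" > "$1"
}
```

Calling enable_xhci_tracing before suspending and save_xhci_trace with an output file name after a successful resume would produce the reference trace asked for here.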
On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
<[email protected]> wrote:
> On 20.03.2017 17:39, Diego Viola wrote:
>>
>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>> <[email protected]> wrote:
>>>
>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>
>>>>
>
>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>
>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>> and
>>> giveback.
>>>
>>> Could you try enabling xhci tracing before suspending (not the same as
>>> xhci
>>> verbose dynamic debug)
>>> It will generate a lot of data, so better to remove all extra USB
>>> devices.
>>>
>>> xhci tracing can be added with:
>>>
>>> mount -t debugfs none /sys/kernel/debug
>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>
>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>
>> This is with Linux 4.11.0-rc3-ARCH.
>>
>> USB mouse/keyboard was unplugged before booting the machine.
>>
>> I didn't do a suspend/resume before getting this trace, should I do that?
>>
>> Should I reproduce the hang and get a netconsole dmesg capture with
>> tracing enabled?
>
>
> A trace and a dmesg of the same suspend/resume hang would be great.
I can capture the dmesg with netconsole once the machine hangs, but
I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
the hang. I'm unable to use ssh after the hang.
> And if you can, then one of a successful suspend/resume for reference.
Here's the trace after a successful suspend/resume:
https://bugzilla.kernel.org/attachment.cgi?id=255369
>
> (I haven't yet checked the one you added to bugzilla)
>
> -Mathias
Diego
On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]> wrote:
> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
> <[email protected]> wrote:
>> On 20.03.2017 17:39, Diego Viola wrote:
>>>
>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>> <[email protected]> wrote:
>>>>
>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>
>>>>>
>>
>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>
>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>> and
>>>> giveback.
>>>>
>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>> xhci
>>>> verbose dynamic debug)
>>>> It will generate a lot of data, so better to remove all extra USB
>>>> devices.
>>>>
>>>> xhci tracing can be added with:
>>>>
>>>> mount -t debugfs none /sys/kernel/debug
>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>
>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>
>>> This is with Linux 4.11.0-rc3-ARCH.
>>>
>>> USB mouse/keyboard was unplugged before booting the machine.
>>>
>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>
>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>> tracing enabled?
>>
>>
>> A trace and a dmesg of the same suspend/resume hang would be great.
>
> I can capture the dmesg with netconsole once the machine hangs, but
> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
> the hang. I'm unable to use ssh after the hang.
>
>> And if you can, then one of a successful suspend/resume for reference.
>
> Here's the trace after a successful suspend/resume:
>
> https://bugzilla.kernel.org/attachment.cgi?id=255369
>
>>
>> (I haven't yet checked the one you added to bugzilla)
>>
>> -Mathias
>
> Diego
ftrace_dump_on_oops is what I was looking for.
Diego
On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]> wrote:
> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]> wrote:
>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>> <[email protected]> wrote:
>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>
>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>> <[email protected]> wrote:
>>>>>
>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>
>>>>>>
>>>
>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>
>>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>>> and
>>>>> giveback.
>>>>>
>>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>>> xhci
>>>>> verbose dynamic debug)
>>>>> It will generate a lot of data, so better to remove all extra USB
>>>>> devices.
>>>>>
>>>>> xhci tracing can be added with:
>>>>>
>>>>> mount -t debugfs none /sys/kernel/debug
>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>
>>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>>
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>>
>>>> This is with Linux 4.11.0-rc3-ARCH.
>>>>
>>>> USB mouse/keyboard was unplugged before booting the machine.
>>>>
>>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>>
>>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>>> tracing enabled?
>>>
>>>
>>> A trace and a dmesg of the same suspend/resume hang would be great.
>>
>> I can capture the dmesg with netconsole once the machine hangs, but
>> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
>> the hang. I'm unable to use ssh after the hang.
>>
>>> And if you can, then one of a successful suspend/resume for reference.
>>
>> Here's the trace after a successful suspend/resume:
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255369
>>
>>>
>>> (I haven't yet checked the one you added to bugzilla)
>>>
>>> -Mathias
>>
>> Diego
>
> ftrace_dump_on_oops is what I was looking for.
>
> Diego
I tried ftrace_dump_on_oops but I can't see the trace coming in, not
sure what I'm doing wrong. :(
Diego
On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]> wrote:
> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]> wrote:
>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]> wrote:
>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>> <[email protected]> wrote:
>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>
>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>
>>>>>>>
>>>>
>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>
>>>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>>>> and
>>>>>> giveback.
>>>>>>
>>>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>>>> xhci
>>>>>> verbose dynamic debug)
>>>>>> It will generate a lot of data, so better to remove all extra USB
>>>>>> devices.
>>>>>>
>>>>>> xhci tracing can be added with:
>>>>>>
>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>>
>>>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>>>
>>>>>
>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>>>
>>>>> This is with Linux 4.11.0-rc3-ARCH.
>>>>>
>>>>> USB mouse/keyboard was unplugged before booting the machine.
>>>>>
>>>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>>>
>>>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>>>> tracing enabled?
>>>>
>>>>
>>>> A trace and a dmesg of the same suspend/resume hang would be great.
>>>
>>> I can capture the dmesg with netconsole once the machine hangs, but
>>> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
>>> the hang. I'm unable to use ssh after the hang.
>>>
>>>> And if you can, then one of a successful suspend/resume for reference.
>>>
>>> Here's the trace after a successful suspend/resume:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255369
>>>
>>>>
>>>> (I haven't yet checked the one you added to bugzilla)
>>>>
>>>> -Mathias
>>>
>>> Diego
>>
>> ftrace_dump_on_oops is what I was looking for.
>>
>> Diego
>
> I tried ftrace_dump_on_oops but I can't see the trace coming in, not
> sure what I'm doing wrong. :(
>
> Diego
I was able to obtain the trace with this: hung_task_panic=1
no_console_suspend=1 ftrace_dump_on_oops
Diego
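A quick way to double-check that those three parameters actually made it onto the kernel command line is sketched below. The function name is hypothetical, and it only looks at parameter names, not at their values.

```shell
# Hypothetical helper: print each of the three debug parameters that is
# missing from the command-line string passed as the first argument.
missing_debug_params() {
    cmdline="$1"
    for want in hung_task_panic no_console_suspend ftrace_dump_on_oops; do
        found=no
        # Split the command line on whitespace and compare parameter names.
        for arg in $cmdline; do
            case "$arg" in
                "$want"|"$want"=*) found=yes ;;
            esac
        done
        [ "$found" = yes ] || echo "$want"
    done
}
```

Running missing_debug_params "$(cat /proc/cmdline)" prints nothing when all three are present.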
On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <[email protected]> wrote:
> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]> wrote:
>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]> wrote:
>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]> wrote:
>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>> <[email protected]> wrote:
>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>
>>>>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>>>>> and
>>>>>>> giveback.
>>>>>>>
>>>>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>>>>> xhci
>>>>>>> verbose dynamic debug)
>>>>>>> It will generate a lot of data, so better to remove all extra USB
>>>>>>> devices.
>>>>>>>
>>>>>>> xhci tracing can be added with:
>>>>>>>
>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>>>
>>>>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>>>>
>>>>>>
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>>>>
>>>>>> This is with Linux 4.11.0-rc3-ARCH.
>>>>>>
>>>>>> USB mouse/keyboard was unplugged before booting the machine.
>>>>>>
>>>>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>>>>
>>>>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>>>>> tracing enabled?
>>>>>
>>>>>
>>>>> A trace and a dmesg of the same suspend/resume hang would be great.
>>>>
>>>> I can capture the dmesg with netconsole once the machine hangs, but
>>>> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
>>>> the hang. I'm unable to use ssh after the hang.
>>>>
>>>>> And if you can, then one of a successful suspend/resume for reference.
>>>>
>>>> Here's the trace after a successful suspend/resume:
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255369
>>>>
>>>>>
>>>>> (I haven't yet checked the one you added to bugzilla)
>>>>>
>>>>> -Mathias
>>>>
>>>> Diego
>>>
>>> ftrace_dump_on_oops is what I was looking for.
>>>
>>> Diego
>>
>> I tried ftrace_dump_on_oops but I can't see the trace coming in, not
>> sure what I'm doing wrong. :(
>>
>> Diego
>
> I was able to obtain the trace with this: hung_task_panic=1
> no_console_suspend=1 ftrace_dump_on_oops
>
> Diego
Here's the log I was able to obtain today, dmesg + ftrace at the time
of the crash:
https://bugzilla.kernel.org/attachment.cgi?id=255419
USB keyboard and mouse were plugged in when I reproduced this.
Please let me know if you need more info.
Diego
On 22.03.2017 00:52, Diego Viola wrote:
> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <[email protected]> wrote:
>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]> wrote:
>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]> wrote:
>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]> wrote:
>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>> <[email protected]> wrote:
>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>
>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>
>>>>>>>> xhci tracing can be added with:
>>>>>>>>
>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>
> Here's the log I was able to obtain today, dmesg + ftrace at the time
> of the crash:
>
> https://bugzilla.kernel.org/attachment.cgi?id=255419
>
> USB keyboard and mouse were plugged in when I reproduced this.
>
> Please let me know if you need more info.
>
Thanks, I'm looking at the logs and so far the most suspicious looking entry is:
[ 257.060941] rtsx_usb-254 0.... 119946155us : xhci_urb_enqueue: ep1out-bulk: urb ffff880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream 0 flags 00010000
[ 257.063601] rtsx_usb-254 0.... 119946162us : xhci_urb_enqueue: ep0out-control: urb ffff880105a93300 pipe 2147484928 length 0/0 sgs 0/0 stream 0 flags 00100000
It enqueues the same URB, without ever giving it back or actually queuing any trbs for
the urb, well, it might just fail to enqueue it in the first place.
I need to search for a URB that has been dequeued but never given back in the trace
-Mathias
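One way to do that search mechanically is sketched below. It assumes the trace uses the xhci_urb_dequeue and xhci_urb_giveback event names and the "urb <address>" field layout seen in the lines above; the function name is made up for illustration.

```shell
# Hypothetical helper: print URB addresses that were dequeued but never
# given back. Reads trace text from the named files, or stdin if none.
urbs_never_given_back() {
    awk '
        {
            event = ""; urb = ""
            # Find the event name and the address that follows "urb".
            for (i = 1; i <= NF; i++) {
                if ($i == "xhci_urb_dequeue:" || $i == "xhci_urb_giveback:")
                    event = $i
                if ($i == "urb" && i < NF)
                    urb = $(i + 1)
            }
            if (event == "" || urb == "") next
            # A dequeue marks the URB as pending; a giveback clears it.
            if (event == "xhci_urb_dequeue:") pending[urb] = 1
            else delete pending[urb]
        }
        END { for (u in pending) print u }
    ' "$@" | sort
}
```

Any address it prints was cancelled at least once with no later giveback in the captured window, which is the pattern being looked for here.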
On 22.03.2017 19:51, Mathias Nyman wrote:
> On 22.03.2017 00:52, Diego Viola wrote:
>> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <[email protected]> wrote:
>>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]> wrote:
>>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]> wrote:
>>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]> wrote:
>>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>>> <[email protected]> wrote:
>>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>>
>>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>>
>>>>>>>>> xhci tracing can be added with:
>>>>>>>>>
>>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>
>> Here's the log I was able to obtain today, dmesg + ftrace at the time
>> of the crash:
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255419
>>
>> USB keyboard and mouse were plugged in when I reproduced this.
>>
>> Please let me know if you need more info.
>>
>
> Thanks, I'm looking at the logs and so far the most suspicious looking entry is:
>
> [ 257.060941] rtsx_usb-254 0.... 119946155us : xhci_urb_enqueue: ep1out-bulk: urb ffff880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream 0 flags 00010000
> [ 257.063601] rtsx_usb-254 0.... 119946162us : xhci_urb_enqueue: ep0out-control: urb ffff880105a93300 pipe 2147484928 length 0/0 sgs 0/0 stream 0 flags 00100000
>
> It enqueues the same URB, without ever giving it back or actually queuing any trbs for
> the urb, well, it might just fail to enqueue it in the first place.
>
> I need to search for a URB that has been dequeued but never given back in the trace
Ok, found a much more likely candidate:
[ 258.004078] kworker/-544 0d..1 121599183us : xhci_urb_dequeue: ep1out-bulk: urb ffff880105a930c0 pipe 3221259520...
We try to kill this URB "ffff880105a930c0", twice, and it's never given back.
Trace is missing "xhci_dbg_cancel_urb: Cancel URB..." entry in log after
xhci_urb_dequeue, so it never got added to the list for cancellation in xhci driver.
xhci_urb_dequeue() has one place where it just returns an error without
giving back the urb or queuing it for cancellation.
This is in my opinion a bug in xhci_urb_dequeue()
rtsx_usb_ms is a good test for usb, it seems to be constantly queuing urbs at all
inappropriate times.
If I write a patch can you try it out?
-Mathias
On Thu, Mar 23, 2017 at 2:02 PM, Mathias Nyman
<[email protected]> wrote:
> On 22.03.2017 19:51, Mathias Nyman wrote:
>>
>> On 22.03.2017 00:52, Diego Viola wrote:
>>>
>>> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <[email protected]>
>>> wrote:
>>>>
>>>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]>
>>>> wrote:
>>>>>
>>>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> xhci tracing can be added with:
>>>>>>>>>>
>>>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>
>>>
>>> Here's the log I was able to obtain today, dmesg + ftrace at the time
>>> of the crash:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255419
>>>
>>> USB keyboard and mouse were plugged in when I reproduced this.
>>>
>>> Please let me know if you need more info.
>>>
>>
>> Thanks, I'm looking at the logs and so far the most suspicious looking
>> entry is:
>>
>> [ 257.060941] rtsx_usb-254 0.... 119946155us : xhci_urb_enqueue:
>> ep1out-bulk: urb ffff880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream
>> 0 flags 00010000
>> [ 257.063601] rtsx_usb-254 0.... 119946162us : xhci_urb_enqueue:
>> ep0out-control: urb ffff880105a93300 pipe 2147484928 length 0/0 sgs 0/0
>> stream 0 flags 00100000
>>
>> It enqueues the same URB, without ever giving it back or actually queuing
>> any trbs for
>> the urb, well, it might just fail to enqueue it in the first place.
>>
>> I need to search for a URB that has been dequeued but never given back in
>> the trace
>
>
> Ok, found a much more likely candidate:
>
> [ 258.004078] kworker/-544 0d..1 121599183us : xhci_urb_dequeue:
> ep1out-bulk: urb ffff880105a930c0 pipe 3221259520...
>
> We try to kill this URB "ffff880105a930c0", twice, and it's never given back.
> Trace is missing "xhci_dbg_cancel_urb: Cancel URB..." entry in log after
> xhci_urb_dequeue, so it never got added to the list for cancellation in xhci
> driver.
>
> xhci_urb_dequeue() has one place where it just returns an error without
> giving back the urb or queuing it for cancellation.
> This is in my opinion a bug in xhci_urb_dequeue()
>
> rtsx_usb_ms is a good test for usb, it seems to be constantly queuing urbs
> at all
> inappropriate times.
>
> If I write a patch can you try it out?
Yes.
>
> -Mathias
>
>
>
Thanks,
Diego
Manually give back URB if we are not able to add it to cancel queue and
stop the endpoint normally.
This can happen if device just reset before URB timed out and dequeued,
leading to missing endpoint ring.
This caused a hang on Dell Inspiron 5558/0VNM2T at resume from suspend
as urb was never returned.
[ 245.270505] INFO: task rtsx_usb_ms_1:254 blocked for more than 120 seconds.
[ 245.272244] Tainted: G W 4.11.0-rc3-ARCH #2
[ 245.273983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.275737] rtsx_usb_ms_1 D 0 254 2 0x00000000
[ 245.277524] Call Trace:
[ 245.279278] __schedule+0x2d3/0x8a0
[ 245.281077] schedule+0x3d/0x90
[ 245.281961] usb_kill_urb.part.3+0x6c/0xa0 [usbcore]
[ 245.282861] ? wake_atomic_t_function+0x60/0x60
[ 245.283760] usb_kill_urb+0x21/0x30 [usbcore]
[ 245.284649] usb_start_wait_urb+0xe5/0x170 [usbcore]
[ 245.285541] ? try_to_del_timer_sync+0x53/0x80
[ 245.286434] usb_bulk_msg+0xbd/0x160 [usbcore]
[ 245.287326] rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
Reported-by: [email protected]
Cc: [email protected]
Signed-off-by: Mathias Nyman <[email protected]>
---
drivers/usb/host/xhci.c | 43 +++++++++++++++++++++++++------------------
1 file changed, 25 insertions(+), 18 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 50aee8b..953fd8f 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1477,6 +1477,7 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
struct xhci_ring *ep_ring;
struct xhci_virt_ep *ep;
struct xhci_command *command;
+ struct xhci_virt_device *vdev;
xhci = hcd_to_xhci(hcd);
spin_lock_irqsave(&xhci->lock, flags);
@@ -1485,15 +1486,27 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
/* Make sure the URB hasn't completed or been unlinked already */
ret = usb_hcd_check_unlink_urb(hcd, urb, status);
- if (ret || !urb->hcpriv)
+ if (ret)
goto done;
+
+ /* give back URB now if we can't queue it for cancel */
+ vdev = xhci->devs[urb->dev->slot_id];
+ urb_priv = urb->hcpriv;
+ if (!vdev || !urb_priv)
+ goto err_giveback;
+
+ ep_index = xhci_get_endpoint_index(&urb->ep->desc);
+ ep = &vdev->eps[ep_index];
+ ep_ring = xhci_urb_to_transfer_ring(xhci, urb);
+ if (!ep || !ep_ring)
+ goto err_giveback;
+
temp = readl(&xhci->op_regs->status);
if (temp == 0xffffffff || (xhci->xhc_state & XHCI_STATE_HALTED)) {
xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
"HW died, freeing TD.");
- urb_priv = urb->hcpriv;
for (i = urb_priv->num_tds_done;
- i < urb_priv->num_tds && xhci->devs[urb->dev->slot_id];
+ i < urb_priv->num_tds;
i++) {
td = &urb_priv->td[i];
if (!list_empty(&td->td_list))
@@ -1501,23 +1514,9 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
if (!list_empty(&td->cancelled_td_list))
list_del_init(&td->cancelled_td_list);
}
-
- usb_hcd_unlink_urb_from_ep(hcd, urb);
- spin_unlock_irqrestore(&xhci->lock, flags);
- usb_hcd_giveback_urb(hcd, urb, -ESHUTDOWN);
- xhci_urb_free_priv(urb_priv);
- return ret;
+ goto err_giveback;
}
- ep_index = xhci_get_endpoint_index(&urb->ep->desc);
- ep = &xhci->devs[urb->dev->slot_id]->eps[ep_index];
- ep_ring = xhci_urb_to_transfer_ring(xhci, urb);
- if (!ep_ring) {
- ret = -EINVAL;
- goto done;
- }
-
- urb_priv = urb->hcpriv;
i = urb_priv->num_tds_done;
if (i < urb_priv->num_tds)
xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
@@ -1554,6 +1553,14 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
done:
spin_unlock_irqrestore(&xhci->lock, flags);
return ret;
+
+err_giveback:
+ if (urb_priv)
+ xhci_urb_free_priv(urb_priv);
+ usb_hcd_unlink_urb_from_ep(hcd, urb);
+ spin_unlock_irqrestore(&xhci->lock, flags);
+ usb_hcd_giveback_urb(hcd, urb, -ESHUTDOWN);
+ return ret;
}
/* Drop an endpoint from a new bandwidth configuration for this device.
--
1.9.1
On Thu, Mar 23, 2017 at 2:12 PM, Diego Viola <[email protected]> wrote:
> On Thu, Mar 23, 2017 at 2:02 PM, Mathias Nyman
> <[email protected]> wrote:
>> On 22.03.2017 19:51, Mathias Nyman wrote:
>>>
>>> On 22.03.2017 00:52, Diego Viola wrote:
>>>>
>>>> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <[email protected]>
>>>> wrote:
>>>>>
>>>>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xhci tracing can be added with:
>>>>>>>>>>>
>>>>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>
>>>>
>>>> Here's the log I was able to obtain today, dmesg + ftrace at the time
>>>> of the crash:
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255419
>>>>
>>>> USB keyboard and mouse were plugged in when I reproduced this.
>>>>
>>>> Please let me know if you need more info.
>>>>
>>>
>>> Thanks, I'm looking at the logs and so far the most suspicious looking
>>> entry is:
>>>
>>> [ 257.060941] rtsx_usb-254 0.... 119946155us : xhci_urb_enqueue:
>>> ep1out-bulk: urb ffff880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream
>>> 0 flags 00010000
>>> [ 257.063601] rtsx_usb-254 0.... 119946162us : xhci_urb_enqueue:
>>> ep0out-control: urb ffff880105a93300 pipe 2147484928 length 0/0 sgs 0/0
>>> stream 0 flags 00100000
>>>
>>> It enqueues the same URB, without ever giving it back or actually queuing
>>> any trbs for
>>> the urb, well, it might just fail to enqueue it in the first place.
>>>
>>> I need to search for a URB that has been dequeued but never given back in
>>> the trace
>>
>>
>> Ok, found a much more likely candidate:
>>
>> [ 258.004078] kworker/-544 0d..1 121599183us : xhci_urb_dequeue:
>> ep1out-bulk: urb ffff880105a930c0 pipe 3221259520...
>>
>> We try to kill this URB "ffff880105a930c0", twice, and it's never given back.
>> Trace is missing "xhci_dbg_cancel_urb: Cancel URB..." entry in log after
>> xhci_urb_dequeue, so it never got added to the list for cancellation in xhci
>> driver.
>>
>> xhci_urb_dequeue() has one place where it just returns an error without
>> giving back the urb or queuing it for cancellation.
>> This is in my opinion a bug in xhci_urb_dequeue()
>>
>> rtsx_usb_ms is a good test for usb, it seems to be constantly queuing urbs
>> at all
>> inappropriate times.
>>
>> If I write a patch can you try it out?
>
> Yes.
>
>>
>> -Mathias
>>
>>
>>
>
> Thanks,
> Diego
Hi Mathias,
I tested your patch with Linux 4.11-rc3 and can confirm that it solves
the problem.
I've tested suspend and resume with i3lock 150 times and it works.
Thank you, I appreciate it a lot.
Diego
On 24.03.2017 18:25, Diego Viola wrote:
> On Thu, Mar 23, 2017 at 2:12 PM, Diego Viola <[email protected]> wrote:
>> On Thu, Mar 23, 2017 at 2:02 PM, Mathias Nyman
>> <[email protected]> wrote:
>>> On 22.03.2017 19:51, Mathias Nyman wrote:
>>>>
>>>> On 22.03.2017 00:52, Diego Viola wrote:
>>>>>
>>>>> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> xhci tracing can be added with:
>>>>>>>>>>>>
>>>>>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>
>>>>>
>>>>> Here's the log I was able to obtain today, dmesg + ftrace at the time
>>>>> of the crash:
>>>>>
>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255419
>>>>>
>>>>> USB keyboard and mouse were plugged in when I reproduced this.
>>>>>
>>>>> Please let me know if you need more info.
>>>>>
>>>>
>>>> Thanks, I'm looking at the logs and so far the most suspicious looking
>>>> entry is:
>>>>
>>>> [ 257.060941] rtsx_usb-254 0.... 119946155us : xhci_urb_enqueue:
>>>> ep1out-bulk: urb ffff880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream
>>>> 0 flags 00010000
>>>> [ 257.063601] rtsx_usb-254 0.... 119946162us : xhci_urb_enqueue:
>>>> ep0out-control: urb ffff880105a93300 pipe 2147484928 length 0/0 sgs 0/0
>>>> stream 0 flags 00100000
>>>>
>>>> It enqueues the same URB, without ever giving it back or actually queuing
>>>> any trbs for
>>>> the urb, well, it might just fail to enqueue it in the first place.
>>>>
>>>> I need to search for a URB that has been dequeued but never given back in
>>>> the trace
>>>
>>>
>>> Ok, found a much more likely candidate:
>>>
>>> [ 258.004078] kworker/-544 0d..1 121599183us : xhci_urb_dequeue:
>>> ep1out-bulk: urb ffff880105a930c0 pipe 3221259520...
>>>
>>> We try to kill this URB "ffff880105a930c0", twice, and it's never given back.
>>> Trace is missing "xhci_dbg_cancel_urb: Cancel URB..." entry in log after
>>> xhci_urb_dequeue, so it never got added to the list for cancellation in xhci
>>> driver.
>>>
>>> xhci_urb_dequeue() has one place where it just returns an error without
>>> giving back the urb or queuing it for cancellation.
>>> This is in my opinion a bug in xhci_urb_dequeue()
>>>
>>> rtsx_usb_ms is a good test for usb, it seems to be constantly queuing urbs
>>> at all
>>> inappropriate times.
>>>
>>> If I write a patch can you try it out?
>>
>> Yes.
>>
>>>
>>> -Mathias
>>>
>>>
>>>
>>
>> Thanks,
>> Diego
>
> Hi Mathias,
>
> I tested your patch with Linux 4.11-rc3 and can confirm that it solves
> the problem.
>
> I've tested suspend and resume with i3lock 150 times and it works.
>
> Thank you, I appreciate it a lot.
>
> Diego
>
Great, I'll send it forward, it can still make 4.11 final.
Thanks for testing
-Mathias