Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp17864950ybl; Thu, 2 Jan 2020 13:38:15 -0800 (PST) X-Google-Smtp-Source: APXvYqxmHyWmiRHYR6mVy+R80dE59MEj6vK4cf+jp8bduAIs5xzki+yTSMZON1qRabUk+sZ9D+Rl X-Received: by 2002:a9d:6196:: with SMTP id g22mr97666951otk.204.1578001095166; Thu, 02 Jan 2020 13:38:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578001095; cv=none; d=google.com; s=arc-20160816; b=cBRPUE8Z9vzs4yStDRF1mVfOJyR+ykBTWbS99Go10fRHwhR2BVc+TWQaLbnmq1Hhj0 5KIqHaT22VxySbEWXPDCn6wZ7L2j0lPCrX7FT0+IndnwxPTy7BQqtNi6pkrBtBn9rAMg 7/3QVqlVI4ErfmdQp7ZUVwsIHDzRk/9bTIpSKTQwWuKr+D3luuXCC+W43flbHjveKx7Z 7z0jlfQJGG7w4OWFtKIKpJXHnFlVPJfI/6DkSes/n8USJS2oRvleKiekomgPLgvDbxMs gS3R84ASvnARS1frt/C4ygvsT6FW4kpql4Qn/cIVdaa384434Nw862NrmzD4bBDxT1DS 6xjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:in-reply-to :subject:cc:to:from:date; bh=IN6d2eKCz8huCJRn9t4vu6MpsPaNEbmeVMHV+pjV0Cc=; b=KKnMlyzBN04uAbCin2MYHnluTrRlg/UICeD8ZLbkWEWt/YBOpEuI34kOZq5FWMJaqP niSOy6RunbpvSeO4/9xIieNRf8HjalFNTiNiF1NvP5GOsVZ3h7IUDxCXP0J6NsuoUKXD tsnrtXiKzQlYesnEz3Z5vTeih7YNAaGTBmSJZSj5winYkH9WcloNaiaSGQ5VUjyjXTzX A8f/G1e2eTL+vhI6hdyBRE/nvJxElVnnZp2Zw7ZzRcsDnx1uh72mFYz/luxtsMHvhggx ZLhyJWgzR9btTLFBVwVHOpLUXWy1pOcAsHZrA4BaZGQgEzN9cIoVqH7jTiYDfiRT48F4 6N+Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j13si29380236otq.146.2020.01.02.13.38.03; Thu, 02 Jan 2020 13:38:15 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726148AbgABVhL (ORCPT + 99 others); Thu, 2 Jan 2020 16:37:11 -0500 Received: from iolanthe.rowland.org ([192.131.102.54]:55358 "HELO iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1726099AbgABVhK (ORCPT ); Thu, 2 Jan 2020 16:37:10 -0500 Received: (qmail 21999 invoked by uid 2102); 2 Jan 2020 16:37:09 -0500 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 2 Jan 2020 16:37:09 -0500 Date: Thu, 2 Jan 2020 16:37:09 -0500 (EST) From: Alan Stern X-X-Sender: stern@iolanthe.rowland.org To: AceLan Kao , Mathias Nyman cc: Greg Kroah-Hartman , Kai-Heng Feng , Thinh Nguyen , Harry Pan , David Heinzelmann , Andrey Konovalov , Nicolas Saenz Julienne , Mathieu Malaterre , USB list , "Linux-Kernel@Vger. Kernel. Org" Subject: Re: [PATCH] usb: hub: move resume delay at the head of all USB access functions In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 25 Dec 2019, AceLan Kao wrote: > Here[1] are the dmesg and the usbmon log from wireshark, and > /sys/kernel/debug/usb/usbmon/0u. > > I verified this issue on Dell XPS 13 + Dell Salomon WD19 docking > station(plug-in 3 USB disk on it) > After s2idle 7 times, 2 usb disks lost. But from wireshark log, the > packets look normal, no error. > > So, I re-do the test again and log the usbmon/0u output, but it's greek to me. > Hope you can help to find some clues in the logs. > Thanks. > > 1. https://people.canonical.com/~acelan/bugs/usb_issue/ Maybe Mathias can help figure out what's going on. It clearly is an xHCI-related problem of some sort. I can't get much out of these logs. For one thing, the time period covered by the usbmon trace is different from the time period in your dmesg log. When you collect two kinds of logs for a test, it's important that they should be collected at the _same_ time! Otherwise they record different events, which is no use. For another, your usbmon trace used the 0u file, but you should have used the 2u file instead. All the errors you get involve devices on bus 2; including other buses in the trace just makes it more confusing and causes more entries to be dropped. Also, I don't understand why your Dell Dock devices show up the way they do. The SuperSpeed dock is on bus 2, and the high speed dock is on bus 3, which is attached to a different xHCI controller! A hub (which is what a dock is) should show up twice, and both instances should be attached to the same controller. In any case, it's clear that you've got some problem involving the xHCI controller and the SuperSpeed Dell Dock (device 2-1). Here's an excerpt from the dmesg log showing what goes wrong. This occurs during a system resume; apparently the dock's connection drops while the system is suspended and there are errors when the system tries to re-activate it: [ 721.507202] usb 2-1: USB disconnect, device number 32 [ 721.507206] usb 2-1.3: USB disconnect, device number 33 [ 721.507209] usb 2-1.3.1: USB disconnect, device number 35 [ 721.540214] usb 2-1.3.3: USB disconnect, device number 36 [ 721.584518] usb 2-1.4: USB disconnect, device number 34 [ 721.647466] xhci_hcd 0000:00:0d.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state. That's the first problem. [ 721.919433] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 38 using xhci_hcd [ 721.942597] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47 [ 721.942600] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 721.942602] usb 2-1: Product: Dell dock [ 721.942604] usb 2-1: Manufacturer: Dell Inc. [ 721.944849] hub 2-1:1.0: USB hub found [ 721.945232] hub 2-1:1.0: 4 ports detected [ 722.178395] hub 2-1:1.0: hub_ext_port_status failed (err = -71) [ 722.178570] usb 2-1-port3: attempt power cycle [ 722.559212] usb 2-1: USB disconnect, device number 38 [ 722.559311] usb 2-1: Failed to suspend device, error -19 And that's a second error. Then the system tries again: [ 722.831431] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 43 using xhci_hcd [ 722.854608] usb 2-1: New USB device found, idVendor=0bda, idProduct=0487, bcdDevice= 1.47 [ 722.854611] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 722.854614] usb 2-1: Product: Dell dock [ 722.854616] usb 2-1: Manufacturer: Dell Inc. [ 722.857034] hub 2-1:1.0: USB hub found [ 722.857440] hub 2-1:1.0: 4 ports detected [ 723.090106] hub 2-1:1.0: hub_ext_port_status failed (err = -71) [ 723.090316] usb 2-1-port3: attempt power cycle [ 723.471212] usb 2-1: USB disconnect, device number 43 [ 723.471274] usb 2-1: Failed to suspend device, error -19 And it fails in the same way. A third try ends up succeeding: [ 723.743465] usb 2-1: new SuperSpeedPlus Gen 2 USB device number 48 using xhci_hcd ... I don't see why the device numbers jump around in such a crazy manner. When device 38 disconnects, the next device is number 43. And when that fails, the next number is 48! It looks like device numbers are somehow being allocated for the four child devices even though the log doesn't show them being detected. Here's a portion of the usbmon trace that apparently corresponds somewhat to part of the log above, although it's hard to be certain. In this trace, device 77 on bus 2 is 2-1. Unlike the log above, it only has a child attached to port 3, not to port 4: ffff9879b43ff840 291366827 S Ci:2:001:0 s a3 00 0000 0001 0004 4 < ffff9879b43ff840 291366846 C Ci:2:001:0 0 4 = 03020000 ffff9879b2f54780 291469875 S Ci:2:077:0 s a3 00 0000 0001 0004 4 < ffff9879b2f54780 291470062 C Ci:2:077:0 0 4 = a0020000 ffff9879b2f54780 291470069 S Ci:2:077:0 s a3 00 0000 0002 0004 4 < ffff9879b2f54780 291470208 C Ci:2:077:0 0 4 = a0020000 ffff9879b2f54780 291470214 S Ci:2:077:0 s a3 00 0000 0003 0004 4 < ffff9879b2f54780 291470387 C Ci:2:077:0 0 4 = 03021000 ffff9879b2f54780 291470392 S Co:2:077:0 s 23 01 0014 0003 0000 0 ffff9879b2f54780 291470527 C Co:2:077:0 0 0 ffff9879b2f54780 291470533 S Ci:2:077:0 s a3 00 0000 0004 0004 4 < ffff9879b2f54780 291470709 C Ci:2:077:0 0 4 = a0020000 ffff9879b43ff480 291573804 S Ii:2:077:1 -115:128 2 < ffff9879b2f54780 291573839 S Ci:2:077:0 s a3 00 0000 0003 0004 4 < ffff9879b2f54780 291574068 C Ci:2:077:0 0 4 = 03020000 ffff9879b2f54780 291574166 S Ci:2:077:0 s a3 00 0000 0003 0004 4 < ffff9879b2f54780 291574363 C Ci:2:077:0 0 4 = 03020000 ffff9879b2f54780 291574383 S Co:2:077:0 s 23 03 0004 0003 0000 0 ffff9879b2f54780 291574580 C Co:2:077:0 0 0 ffff9879b2f54780 291593804 S Ci:2:077:0 s a3 00 0002 0003 0008 8 < ffff9879ae2969c0 291595804 C Ii:2:001:1 0:2048 1 = 02 ffff9879ae2969c0 291595824 S Ii:2:001:1 -115:2048 4 < ffff9879ba3476c0 291595843 S Ci:2:001:0 s a3 00 0000 0001 0004 4 < ffff9879ba3476c0 291595857 C Ci:2:001:0 0 4 = c0024100 As far as I can tell, the device was working fine until at this point it disconnected itself. ffff9879ba3476c0 291595863 S Co:2:001:0 s 23 01 0010 0001 0000 0 ffff9879ba3476c0 291595867 C Co:2:001:0 0 0 ffff9879ba3476c0 291595870 S Co:2:001:0 s 23 01 0019 0001 0000 0 ffff9879ba3476c0 291595875 C Co:2:001:0 0 0 ffff9879ba3476c0 291595879 S Co:2:001:0 s 23 03 001c 0001 0000 0 ffff9879ba3476c0 291595884 C Co:2:001:0 0 0 ffff9879b2f54780 291600949 C Ci:2:077:0 -71 0 ffff9879b2f54780 291600997 S Co:2:077:0 s 23 03 0004 0003 0000 0 ffff9879b2f54780 291601000 E Co:2:077:0 -19 0 Mathias, any suggestions on how Acelan can track down the reasons for these problems? Alan Stern