Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758274AbcDADzT (ORCPT ); Thu, 31 Mar 2016 23:55:19 -0400 Received: from mail-am1on0082.outbound.protection.outlook.com ([157.56.112.82]:12025 "EHLO emea01-am1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751717AbcDADzQ convert rfc822-to-8bit (ORCPT ); Thu, 31 Mar 2016 23:55:16 -0400 From: Rajesh Bhagat To: Mathias Nyman CC: "gregkh@linuxfoundation.org" , "linux-usb@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Sriram Dash Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Topic: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Index: AQHRgOP0koj0An7NrUiFwLeaADZ9zZ9fDpyAgAQ7AdCAAFktgIABTHuggAB1NoCAAQay8IAAsdaAgAdP/gCAAk/0gIACPkawgAC4cQCAANwGQA== Date: Fri, 1 Apr 2016 03:55:11 +0000 Message-ID: References: <1458284463-12743-1-git-send-email-rajesh.bhagat@nxp.com> <56EBE485.1060301@linux.intel.com> <56EFBBD1.6080600@intel.com> <56F1350B.7000006@linux.intel.com> <56F2A697.1000701@linux.intel.com> <56FAB97F.4030206@linux.intel.com> <56FD35F4.3020005@linux.intel.com> In-Reply-To: <56FD35F4.3020005@linux.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: linux.intel.com; dkim=none (message not signed) header.d=none;linux.intel.com; dmarc=none action=none header.from=nxp.com; x-originating-ip: [192.88.169.1] x-ms-office365-filtering-correlation-id: 277b9ca2-f2f9-4a79-2ef3-08d359e16cbc x-microsoft-exchange-diagnostics: 1;DB3PR04MB0555;5:5jrozhHl/Vl6u6nQ9HYShz35/pVBeBOefck9NpcqudcV0V/txT3GeefPZgAAgpEiK5f4cnelC/x66j/Hrvz3whU2rv+63BDGJHs+VHWtwRru7hu1gLVkbyD1helmlSr5sFCiKvL/dIPnugLGZiA8Xg==;24:7SiHLYgKum8QMorVWnpS46exRvZPg0s8JtVpiFNTzHpyGRVtixNt2qT4pimmpJ6oUhROlzHSRECd2ZmmErZz8hk8/pMieFlnjhMgAKRj5a4= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB3PR04MB0555; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001);SRVR:DB3PR04MB0555;BCL:0;PCL:0;RULEID:;SRVR:DB3PR04MB0555; x-forefront-prvs: 0899B47777 x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(6009001)(24454002)(377424004)(13464003)(377454003)(66066001)(5004730100002)(5890100001)(189998001)(86362001)(10400500002)(33656002)(5008740100001)(3280700002)(3846002)(77096005)(5002640100001)(6116002)(122556002)(102836003)(81166005)(76576001)(575784001)(87936001)(92566002)(50986999)(76176999)(110136002)(2950100001)(54356999)(1220700001)(2900100001)(4326007)(74316001)(93886004)(19580405001)(3660700001)(2906002)(106116001)(1096002)(5003600100002)(586003)(19580395003)(49343001);DIR:OUT;SFP:1101;SCL:1;SRVR:DB3PR04MB0555;H:HE1PR0401MB2028.eurprd04.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-OriginatorOrg: nxp.com X-MS-Exchange-CrossTenant-originalarrivaltime: 01 Apr 2016 03:55:11.0219 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 686ea1d3-bc2b-4c6f-a92c-d99c5c301635 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR04MB0555 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7832 Lines: 184 > -----Original Message----- > From: Mathias Nyman [mailto:mathias.nyman@linux.intel.com] > Sent: Thursday, March 31, 2016 8:07 PM > To: Rajesh Bhagat > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; linux- > kernel@vger.kernel.org; Sriram Dash > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On 31.03.2016 06:51, Rajesh Bhagat wrote: > > > > > > > > Hello Mathias, > > > > Thanks a lot for your support. > > > > Attached patch is working and allows the completion handler to be > > called. And resume operation completes and shell comes back (but with lot of > errors). > > > > Now, some other points which needs our attention here are: > > 1. usb core hub code is not checking the error status of hcd->driver- > >reset_device(xhci_discover_or_reset_device) > > and continues considering reset_device was successful (and causes other issues). > Hence, a check is needed on return > > value of reset_device and proper action is required on it. > > 2. Even if above return value is checked and reset_device status is used. USB core > hub retries for N times which is not > > required in this case as adding to the resume operation time. But if in this case > we return -ENOTCONN instead of -EINVAL > > from hcd->driver->reset_device(xhci_discover_or_reset_device), code returns > early and doesn't retry. > > > > Proposed Solution for above with your suggested patches is as below: > > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index > > 7cad1fa..efeba31 100644 > > --- a/drivers/usb/core/hub.c > > +++ b/drivers/usb/core/hub.c > > @@ -2807,13 +2807,17 @@ done: > > struct usb_hcd *hcd = bus_to_hcd(udev->bus); > > > > update_devnum(udev, 0); > > - /* The xHC may think the device is already reset, > > - * so ignore the status. > > + /* > > + * check the status of resetting the device and update > > + * the udev status. > > */ > > if (hcd->driver->reset_device) > > - hcd->driver->reset_device(hcd, udev); > > + status = > > + hcd->driver->reset_device(hcd, udev); > > > > - usb_set_device_state(udev, USB_STATE_DEFAULT); > > + if (status == 0) > > + usb_set_device_state(udev, USB_STATE_DEFAULT); > > + else > > + usb_set_device_state(udev, > > + USB_STATE_NOTATTACHED); > > } > > } else { > > if (udev) > > diff --git a/drivers/usb/host/xhci-ring.c > > b/drivers/usb/host/xhci-ring.c index b3a0a22..9427175f 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -310,6 +310,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) > > return -ESHUTDOWN; > > } > > > > + /* writing the CMD_RING_ABORT bit should create a command completion > > + * event, add a command completion timeout for it as well > > + */ > > + mod_timer(&xhci->cmd_timer, jiffies + > > + XHCI_CMD_DEFAULT_TIMEOUT); > > return 0; > > } > > > > @@ -1243,6 +1247,7 @@ void xhci_handle_command_timeout(unsigned long data) > > int ret; > > unsigned long flags; > > u64 hw_ring_state; > > + u32 old_status; > > struct xhci_command *cur_cmd = NULL; > > xhci = (struct xhci_hcd *) data; > > > > @@ -1250,6 +1255,7 @@ void xhci_handle_command_timeout(unsigned long data) > > spin_lock_irqsave(&xhci->lock, flags); > > if (xhci->current_cmd) { > > cur_cmd = xhci->current_cmd; > > + old_status = cur_cmd->status; > > cur_cmd->status = COMP_CMD_ABORT; > > } > > > > @@ -1272,7 +1278,15 @@ void xhci_handle_command_timeout(unsigned long > data) > > } > > /* command timeout on stopped ring, ring can't be aborted */ > > xhci_dbg(xhci, "Command timeout on stopped ring\n"); > > - xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > > + > > + /* is this the second timeout for the same command? ring wont restart */ > > + if (old_status == COMP_CMD_ABORT) { > > + xhci_err(xhci, "Abort timeout, Fail to restart cmd ring\n"); > > + xhci_cleanup_command_queue(xhci); > > + /* everything else here to halt, cleanup, free etc kill xhc */ > > + } else > > + xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > > + > > spin_unlock_irqrestore(&xhci->lock, flags); > > return; > > } > > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index > > c502c22..bd18966 100644 > > --- a/drivers/usb/host/xhci.c > > +++ b/drivers/usb/host/xhci.c > > @@ -3450,7 +3450,7 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, > struct usb_device *udev) > > if (ret == 1) > > return 0; > > else > > - return -EINVAL; > > + return -ENOTCONN; /* Don't retry */ > > } > > > > if (virt_dev->tt_info) > > > > Sample output after above patch (timer is set as wakeup source): > > > > root@phoenix:~# echo mem > /sys/power/state > > PM: Syncing filesystems ... done. > > Freezing user space processes ... (elapsed 0.001 seconds) done. > > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > > PM: suspend of devices complete after 155.658 msecs > > PM: late suspend of devices complete after 1.594 msecs > > PM: noirq suspend of devices complete after 1.496 msecs > > PM: noirq resume of devices complete after 1.290 msecs > > PM: early resume of devices complete after 1.250 msecs usb usb1: root > > hub lost power or was reset usb usb2: root hub lost power or was reset > > xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring > > xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID > > xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127. > > xhci-hcd xhci-hcd.0.auto: Abort timeout, Fail to restart cmd ring > > xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID > > xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127. > > PM: resume of devices complete after 25436.366 msecs <= resume time is > increased a lot even after using -ENOTCONN > > Restarting tasks ... > > usb 1-1: USB disconnect, device number 2 usb 1-1.2: USB disconnect, > > device number 3 usb 2-1: USB disconnect, device number 2 done. > > root@phoenix:~# > > > > Please share your opinion on other changes for patch submission as well as resume > time. > > > > I think more effort should be put into investigating why this happens in the first place. > What is the root cause? why doesn't xhci start properly after resume in this case. > Hello Mathias, I understand your point and would surely debug the root cause of the issue. But implementing the fallback mechanism in SW, if HW does not respond well seems to a better solution to me. Just to add, code prior to common implementation of xhci_handle_command_timeout was handling the above case and was __not__ dependent on HW. Please comment on the possibility of above fallback mechanism in XHCI SW and any side effects that you can foresee. > Optimizing resume time and error paths should be secondary here, resuming faster > isn't really a matter when xhci is completely stuck. > I agree on above point. - Rajesh Bhagat > -Mathias