Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754016AbcCWDxm (ORCPT ); Tue, 22 Mar 2016 23:53:42 -0400 Received: from mail-db3on0053.outbound.protection.outlook.com ([157.55.234.53]:16752 "EHLO emea01-db3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753727AbcCWDxl convert rfc822-to-8bit (ORCPT ); Tue, 22 Mar 2016 23:53:41 -0400 X-Greylist: delayed 171319 seconds by postgrey-1.27 at vger.kernel.org; Tue, 22 Mar 2016 23:53:40 EDT From: Rajesh Bhagat To: Mathias Nyman CC: "gregkh@linuxfoundation.org" , "linux-usb@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Sriram Dash Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Topic: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Index: AQHRgOP0koj0An7NrUiFwLeaADZ9zZ9fDpyAgAQ7AdCAAFktgIABTHuggAB1NoCAAQay8A== Date: Wed, 23 Mar 2016 03:53:36 +0000 Message-ID: References: <1458284463-12743-1-git-send-email-rajesh.bhagat@nxp.com> <56EBE485.1060301@linux.intel.com> <56EFBBD1.6080600@intel.com> <56F1350B.7000006@linux.intel.com> In-Reply-To: <56F1350B.7000006@linux.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: linux.intel.com; dkim=none (message not signed) header.d=none;linux.intel.com; dmarc=none action=none header.from=nxp.com; x-originating-ip: [192.88.169.1] x-ms-office365-filtering-correlation-id: 76eafec2-d219-4055-72aa-08d352ceb6ae x-microsoft-exchange-diagnostics: 1;DB3PR04MB0555;5:TBRkwh2T9XyPDlxYG/OXIpjyzG0oMBPpboGNgNYYdKRa+lpnHBtDeNfM1tb7+V9lkf+KWsu83WlboP5xj4kGfGOcMmjGY5nSJVljqNfcVarzpGYVhQXbcCYd6difASJvjDyNRvy4T3ItGwiDqnnUtA==;24:U/Jmd5zaHfzCChVy5NEU2u6N3CmpY5YzaykgvPDmIbiEWOTJ8f+zbHvbxeB/xPswPSizoKrQD4eyNGVizjWxWBcRxvgzJrJOl+w1T2k24II= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB3PR04MB0555; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001);SRVR:DB3PR04MB0555;BCL:0;PCL:0;RULEID:;SRVR:DB3PR04MB0555; x-forefront-prvs: 08902E536D x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(6009001)(377454003)(50084003)(24454002)(5004730100002)(76576001)(54356999)(50986999)(11100500001)(2950100001)(2900100001)(76176999)(74316001)(189998001)(106116001)(87936001)(10400500002)(110136002)(86362001)(77096005)(5002640100001)(33656002)(92566002)(81166005)(5003600100002)(122556002)(66066001)(93886004)(5890100001)(1220700001)(5008740100001)(19580395003)(19580405001)(2906002)(3660700001)(102836003)(6116002)(3280700002)(4326007)(586003)(1096002)(3846002)(49343001)(21314002);DIR:OUT;SFP:1101;SCL:1;SRVR:DB3PR04MB0555;H:HE1PR0401MB2028.eurprd04.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-OriginatorOrg: nxp.com X-MS-Exchange-CrossTenant-originalarrivaltime: 23 Mar 2016 03:53:36.8654 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 686ea1d3-bc2b-4c6f-a92c-d99c5c301635 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR04MB0555 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5524 Lines: 140 > -----Original Message----- > From: Mathias Nyman [mailto:mathias.nyman@linux.intel.com] > Sent: Tuesday, March 22, 2016 5:36 PM > To: Rajesh Bhagat > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; linux- > kernel@vger.kernel.org; Sriram Dash > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On 22.03.2016 07:19, Rajesh Bhagat wrote: > > > > > >> -----Original Message----- > >> From: Mathias Nyman [mailto:mathias.nyman@intel.com] > >> Sent: Monday, March 21, 2016 2:46 PM > >> To: Rajesh Bhagat ; Mathias Nyman > >> ; linux-usb@vger.kernel.org; linux- > >> kernel@vger.kernel.org > >> Cc: gregkh@linuxfoundation.org; Sriram Dash > >> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation > >> due to XHCI commmand timeout > >> > >> On 21.03.2016 06:18, Rajesh Bhagat wrote: > >>> > >>> > >>>> > >>>> Hi > >>>> > >>>> I think clearing the whole command ring is a bit too much in this case. > >>>> It may cause issues for all attached devices when one command times out. > >>>> > >>> > >>> Hi Mathias, > >>> > >>> I understand your point, But I want to understand how would > >>> completion handler be called if a command is timed out and > >>> xhci_abort_cmd_ring is successful. In this case all the code would be waiting on > completion handler forever. > >>> > >>> > >>> 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) -> > >>> xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd > >>> > >>> In our case command is timed out, Hence we hit the case #2 but > >>> xhci_abort_cmd_ring is success which does not calls complete. > >> > >> xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register. > >> This will generate a command completion event with status "command > >> aborted" for the pending command. > >> This event is then followed by a "command ring stopped" command completion > event. > >> > >> See xHCI specs 5.4.5 and 4.6.1.2 > >> > >> handle_cmd_completion() will check if cmd_comp_code == > >> COMP_CMD_ABORT, goto event_handled, and call > >> xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for the aborted > command. > >> > >> If xHCI already processed the aborted command, we might only get a > >> command ring stopped event, in this case handle_cmd_completion() will > >> call xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the > >> commands that were tagged for "abort" that still remain on the command ring to > NO-OP commands. > >> > >> The completion callback will be called for these NO-OP command later > >> when we get a command completion event for them. > >> > > > > Thanks Mathias for detailed explanation. Now I understand how > > completion handler is supposed to be called in this scenario. > > > > But in our case, somehow we are not getting any event and > > handle_cmd_completion function is not getting called even after successful > xhci_abort_cmd_ring when command timed out. > > > > Now, my point here is code prior to this patch xhci: rework command > > timeout and cancellation, Code would have returned in case command timed out in > xhci_alloc_dev itself. > > > > - /* XXX: how much time for xHC slot assignment? */ > > - timeleft = wait_for_completion_interruptible_timeout( > > - command->completion, > > - XHCI_CMD_DEFAULT_TIMEOUT); > > - if (timeleft <= 0) { > > - xhci_warn(xhci, "%s while waiting for a slot\n", > > - timeleft == 0 ? "Timeout" : "Signal"); > > - /* cancel the enable slot request */ > > - ret = xhci_cancel_cmd(xhci, NULL, command->command_trb); > > - return ret; > > - } > > + wait_for_completion(command->completion); > > > > But after this patch, we are waiting for hardware event, which is > > somehow not generated and causing a hang scenario. > > > > IMO, The assumption that "xhci_abort_cmd_ring would always generate an > > event and handle_cmd_completion would be called" will not be always be true if HW > is in bad state. > > > > Please share your opinion. > > > > writing the CA (command abort) bit in CRCR (command ring control register) will stop > the command ring, and CRR (command ring running) will be set to 0 by xHC. > xhci_abort_cmd_ring() polls this bit up to 5 seconds. > If it's not 0 then the driver considers the command abort as failed. > > The scenario you're thinking of is that xHC would still react to CA bit set, it would stop > the command ring and set CRR 0, but not send a command completion event. > > Have you tried adding some debug to handle_cmd_completion() and see if you receive > any event after command abortion? > Yes. We have added debug prints at first line of handle_cmd_completion, and we are not getting those prints. The last print messages that we get are as below from xhci_alloc_dev while resume operation: xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto: Abort command ring May be somehow, USB controller is in bad state and not responding to the commands. Please suggest how XHCI driver can handle such situations. > The patches I mentioned earlier are also worth trying out, they might fix the cause > why command aborts in the first place. > We tried both the patches you mentioned. But they dint come to rescue here. > -Mathias > > > > >