Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756752AbcLBCqp (ORCPT ); Thu, 1 Dec 2016 21:46:45 -0500 Received: from mail-yw0-f182.google.com ([209.85.161.182]:36828 "EHLO mail-yw0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751105AbcLBCqn (ORCPT ); Thu, 1 Dec 2016 21:46:43 -0500 MIME-Version: 1.0 In-Reply-To: <58402579.6000308@linux.intel.com> References: <613dafc211127a4589306e91e231e151feb5ce80.1480496291.git.baolin.wang@linaro.org> <583EDD87.8030307@linux.intel.com> <58402579.6000308@linux.intel.com> From: Baolin Wang Date: Fri, 2 Dec 2016 10:46:42 +0800 Message-ID: Subject: Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command To: Mathias Nyman Cc: mathias.nyman@intel.com, Greg KH , USB , LKML , Mark Brown Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3138 Lines: 85 On 1 December 2016 at 21:28, Mathias Nyman wrote: > On 01.12.2016 06:54, Baolin Wang wrote: >> >> On 30 November 2016 at 22:09, Mathias Nyman >> wrote: >>> >>> On 30.11.2016 11:02, Baolin Wang wrote: >>>> >>>> >>>> If the hardware never responds to the stop endpoint command, the >>>> URBs will never be completed, and we might hang the USB subsystem. >>>> The original watchdog timer is used to watch if one stop endpoint >>>> command is timeout, if timeout, then the watchdog timer will set >>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all >>>> pending URBs. >>>> >>>> But now we already have one command timer to control command timeout, >>>> thus we can also use the command timer to watch the stop endpoint >>>> command, instead of one duplicate watchdog timer which need to be >>>> removed. >>>> >>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if >>>> this is the last stop endpoint command of one endpoint. Since we >>>> can make sure we only set one stop endpoint command for one endpoint >>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove >>>> this flag. >>>> >>>> We also need to clean up the command queue before trying to halt the >>>> xHCI host in xhci_stop_endpoint_command_timeout() function. >>> >>> >>> >>> This isn't a bad idea. >>> >>> There are anyway some corner cases and details that need to be >>> checked, such as suspend (which will clear the command queue), module >>> unload >>> and abrupt host removal (like pci hotplug removal of host controller) >>> we need to make sure we can trust the command timer to always return the >>> canceled URB >> >> >> Yes, you are right, we need to check these carefully. >> >> Suspend process, module unload and abrupt host removal, they all will >> issue usb_disconnect() firstly before clear the command queue, it will >> check URBs for every endpoint by >> usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(), >> which will make sure every URBs of endpoints will be cancelled by the >> stop endpoint command responding or the timeout function of stop >> endpoint command (xhci_stop_endpoint_command_timeout()) in >> usb_hcd_flush_endpoint(). From that point, we can make sure the >> command timer will be useful to handle stop endpoint command timeout. >> Please correct me if I said something wrong. Thanks. >> > > This relies on current queued command that times out to be the stop endpoint > command. > > If host partially, or completely hangs there might be any number of commands > in the > command queue, and we would need to wait for each one of them to timeout, > finish > before we finally get to the stop endpoint command, and give back the urb. > > I think it would be better to first fix the issues with the current watchdog > function, get > those fixes into stable, and then think about moving to the command queue > timer. OK, make sense. Thanks. > > In short, this patch doesn't currently fix any existing issue, but might > cause the > timeout to be more unreliable > > -Mathias -- Baolin.wang Best Regards