Subject: Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command
To: Baolin Wang
References: <613dafc211127a4589306e91e231e151feb5ce80.1480496291.git.baolin.wang@linaro.org> <583EDD87.8030307@linux.intel.com>
Cc: mathias.nyman@intel.com, Greg KH, USB, LKML, Mark Brown
From: Mathias Nyman
Message-ID: <58402579.6000308@linux.intel.com>
Date: Thu, 1 Dec 2016 15:28:25 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 01.12.2016 06:54, Baolin Wang wrote:
> On 30 November 2016 at 22:09, Mathias Nyman wrote:
>> On 30.11.2016 11:02, Baolin Wang wrote:
>>>
>>> If the hardware never responds to the stop endpoint command, the
>>> URBs will never be completed, and we might hang the USB subsystem.
>>> The original watchdog timer is used to watch if one stop endpoint
>>> command is timeout, if timeout, then the watchdog timer will set
>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>>> pending URBs.
>>>
>>> But now we already have one command timer to control command timeout,
>>> thus we can also use the command timer to watch the stop endpoint
>>> command, instead of one duplicate watchdog timer which need to be
>>> removed.
>>>
>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identify if
>>> this is the last stop endpoint command of one endpoint. Since we
>>> can make sure we only set one stop endpoint command for one endpoint
>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>>> this flag.
>>>
>>> We also need to clean up the command queue before trying to halt the
>>> xHCI host in xhci_stop_endpoint_command_timeout() function.
>>
>>
>> This isn't a bad idea.
>>
>> There are anyway some corner cases and details that need to be
>> checked, such as suspend (which will clear the command queue), module unload
>> and abrupt host removal (like pci hotplug removal of host controller)
>> we need to make sure we can trust the command timer to always return the
>> canceled URB
>
> Yes, you are right, we need to check these carefully.
>
> Suspend process, module unload and abrupt host removal, they all will
> issue usb_disconnect() firstly before clear the command queue, it will
> check URBs for every endpoint by
> usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(),
> which will make sure every URBs of endpoints will be cancelled by the
> stop endpoint command responding or the timeout function of stop
> endpoint command (xhci_stop_endpoint_command_timeout()) in
> usb_hcd_flush_endpoint(). From that point, we can make sure the
> command timer will be useful to handle stop endpoint command timeout.
> Please correct me if I said something wrong. Thanks.
>

This relies on the currently queued command that times out being the
stop endpoint command. If the host partially or completely hangs, there
might be any number of commands in the command queue, and we would need
to wait for each one of them to time out or finish before we finally get
to the stop endpoint command and give back the URB.

I think it would be better to first fix the issues with the current
watchdog function, get those fixes into stable, and then think about
moving to the command queue timer.
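To put a rough number on the queueing concern, here is a minimal
userspace sketch. It is not xhci driver code: the 5 second per-command
timeout, the strictly serialized queue model, and the helper name
worst_case_wait_secs() are all assumptions made for illustration only.

```c
/* Assumed per-command timeout in seconds; illustrative value only. */
#define CMD_TIMEOUT_SECS 5

/*
 * Hypothetical helper: worst-case wait before the URB is given back
 * when the stop endpoint command sits at zero-based position `pos`
 * in a strictly serialized command queue and the host is hung, so
 * every command ahead of it, and then the stop endpoint command
 * itself, has to time out in turn.
 */
static unsigned int worst_case_wait_secs(unsigned int pos)
{
	return (pos + 1) * CMD_TIMEOUT_SECS;
}
```

Under these assumptions a stop endpoint command queued behind three
other commands, worst_case_wait_secs(3), could take up to 20 seconds to
time out, whereas a dedicated per-endpoint watchdog does not depend on
how many commands are queued ahead of it.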

In short, this patch doesn't currently fix any existing issue, but it
might make the timeout less reliable.

-Mathias