Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759700AbcLBBRL (ORCPT ); Thu, 1 Dec 2016 20:17:11 -0500 Received: from mga03.intel.com ([134.134.136.65]:54929 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753274AbcLBBRJ (ORCPT ); Thu, 1 Dec 2016 20:17:09 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,284,1477983600"; d="scan'208";a="12446228" Subject: Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command To: Baolin Wang References: <613dafc211127a4589306e91e231e151feb5ce80.1480496291.git.baolin.wang@linaro.org> <583FB8F0.6070503@linux.intel.com> <583FC4AF.3030502@linux.intel.com> <583FD4C6.7020306@linux.intel.com> Cc: mathias.nyman@intel.com, Greg KH , USB , LKML , Mark Brown From: Lu Baolu Message-ID: <5840CB93.4010908@linux.intel.com> Date: Fri, 2 Dec 2016 09:17:07 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4360 Lines: 99 Hi, On 12/01/2016 04:03 PM, Baolin Wang wrote: > On 1 December 2016 at 15:44, Lu Baolu wrote: >> Hi, >> >> On 12/01/2016 03:35 PM, Baolin Wang wrote: >>> On 1 December 2016 at 14:35, Lu Baolu wrote: >>>> Hi, >>>> >>>> On 12/01/2016 02:04 PM, Baolin Wang wrote: >>>>> Hi Baolu, >>>>> >>>>> On 1 December 2016 at 13:45, Lu Baolu wrote: >>>>>> Hi, >>>>>> >>>>>> On 11/30/2016 05:02 PM, Baolin Wang wrote: >>>>>>> If the hardware never responds to the stop endpoint command, the >>>>>>> URBs will never be completed, and we might hang the USB subsystem. >>>>>>> The original watchdog timer is used to watch if one stop endpoint >>>>>>> command is timeout, if timeout, then the watchdog timer will set >>>>>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all >>>>>>> pending URBs. >>>>>>> >>>>>>> But now we already have one command timer to control command timeout, >>>>>>> thus we can also use the command timer to watch the stop endpoint >>>>>>> command, instead of one duplicate watchdog timer which need to be >>>>>>> removed. >>>>>>> >>>>>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if >>>>>>> this is the last stop endpoint command of one endpoint. Since we >>>>>>> can make sure we only set one stop endpoint command for one endpoint >>>>>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove >>>>>>> this flag. >>>>>> I am afraid you can't do this. "stop_cmds_pending" was added >>>>>> to fix the problem described in the comments that you want to >>>>>> remove. But I didn't find any fix of this problem in your patch. >>>>> Now we can not pending another stop endpoint command for the same one >>>>> endpoint, since will check 'EP_HALT_PENDING' flag in >>>>> xhci_urb_dequeue() function to avoid this. But after some >>>>> investigation, I think I missed the stop endpoint command in >>>>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag, >>>>> maybe need to add 'EP_HALT_PENDING' flag checking in >>>>> xhci_stop_device() function. DId I miss something else? Thanks. >>>> Consider below three threads running on different CPUs at the same time. >>>> >>>> Thread A: xhci_handle_cmd_stop_ep() --- in interrupt handler >>>> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired >>>> Thread C: xhci_urb_dequeue --- called by usb core >>>> >>>> They are serialized by xhci->lock. Let's consider below sequence: >>>> >>>> Thread A: >>>> - delete xhci->cmd_timer), but will fail due to Thread B. >>>> - clear EP_HALT_PENDING bit >>>> >>>> Thread B: >>>> - halt the host controller >>>> >>>> Thread C: >>>> - set EP_HALT_PENDING bit >>>> - enqueue another stop endpoint command >>>> - add the timer back >>> Ah, thanks for you comments. >>> If thread B halted the host, then the thread C xhci_urb_dequeue() will >>> check host state failed, then just return and can not set another stop >>> endpoint command. >> Not so simple. Thread B will release xhci->lock before xhci_halt(). > Yes. > >>> But from your example, I think your meaning is we >>> should not halt the host by thread B, since we have handled the stop >>> endpoint command event. >>> >>> So I think we need to check the xhci command of this stop endpoint >>> command in xhci_stop_endpoint_command_timeout(), if the xhci command >>> of this stop endpoint command is not in the command list (which means >>> the stop endpoint command event has been handled), then just return >>> and do not halt the controller. Right? >>> >> I'd like suggest you to put this change into a separated patch. >> It's actually a different topic from the main purpose of this patch. > OK, I will. Thanks for your comments. > If you are going to work out a v2 patch, please consider whether we can totally leverage the common command mechanism to handle this stop endpoint command. Currently, when a stop endpoint command is issued for urb unlink, there are two timers for this command. This is duplicated and we should remove the stop-endpoint-only timer. The timeout functions are also different. The stop-endpoint-only timer just halt the host controller (this should be the last sort of way), while the common command timer will try to recover the situation by aborting and restart the command ring. Best regards, Lu Baolu