Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757510AbcLBCsh (ORCPT ); Thu, 1 Dec 2016 21:48:37 -0500 Received: from mail-yw0-f172.google.com ([209.85.161.172]:34447 "EHLO mail-yw0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752648AbcLBCsf (ORCPT ); Thu, 1 Dec 2016 21:48:35 -0500 MIME-Version: 1.0 In-Reply-To: <5840CB93.4010908@linux.intel.com> References: <613dafc211127a4589306e91e231e151feb5ce80.1480496291.git.baolin.wang@linaro.org> <583FB8F0.6070503@linux.intel.com> <583FC4AF.3030502@linux.intel.com> <583FD4C6.7020306@linux.intel.com> <5840CB93.4010908@linux.intel.com> From: Baolin Wang Date: Fri, 2 Dec 2016 10:48:33 +0800 Message-ID: Subject: Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command To: Lu Baolu Cc: mathias.nyman@intel.com, Greg KH , USB , LKML , Mark Brown Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4607 Lines: 103 On 2 December 2016 at 09:17, Lu Baolu wrote: > Hi, > > On 12/01/2016 04:03 PM, Baolin Wang wrote: >> On 1 December 2016 at 15:44, Lu Baolu wrote: >>> Hi, >>> >>> On 12/01/2016 03:35 PM, Baolin Wang wrote: >>>> On 1 December 2016 at 14:35, Lu Baolu wrote: >>>>> Hi, >>>>> >>>>> On 12/01/2016 02:04 PM, Baolin Wang wrote: >>>>>> Hi Baolu, >>>>>> >>>>>> On 1 December 2016 at 13:45, Lu Baolu wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On 11/30/2016 05:02 PM, Baolin Wang wrote: >>>>>>>> If the hardware never responds to the stop endpoint command, the >>>>>>>> URBs will never be completed, and we might hang the USB subsystem. >>>>>>>> The original watchdog timer is used to watch if one stop endpoint >>>>>>>> command is timeout, if timeout, then the watchdog timer will set >>>>>>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all >>>>>>>> pending URBs. >>>>>>>> >>>>>>>> But now we already have one command timer to control command timeout, >>>>>>>> thus we can also use the command timer to watch the stop endpoint >>>>>>>> command, instead of one duplicate watchdog timer which need to be >>>>>>>> removed. >>>>>>>> >>>>>>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if >>>>>>>> this is the last stop endpoint command of one endpoint. Since we >>>>>>>> can make sure we only set one stop endpoint command for one endpoint >>>>>>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove >>>>>>>> this flag. >>>>>>> I am afraid you can't do this. "stop_cmds_pending" was added >>>>>>> to fix the problem described in the comments that you want to >>>>>>> remove. But I didn't find any fix of this problem in your patch. >>>>>> Now we can not pending another stop endpoint command for the same one >>>>>> endpoint, since will check 'EP_HALT_PENDING' flag in >>>>>> xhci_urb_dequeue() function to avoid this. But after some >>>>>> investigation, I think I missed the stop endpoint command in >>>>>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag, >>>>>> maybe need to add 'EP_HALT_PENDING' flag checking in >>>>>> xhci_stop_device() function. DId I miss something else? Thanks. >>>>> Consider below three threads running on different CPUs at the same time. >>>>> >>>>> Thread A: xhci_handle_cmd_stop_ep() --- in interrupt handler >>>>> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired >>>>> Thread C: xhci_urb_dequeue --- called by usb core >>>>> >>>>> They are serialized by xhci->lock. Let's consider below sequence: >>>>> >>>>> Thread A: >>>>> - delete xhci->cmd_timer), but will fail due to Thread B. >>>>> - clear EP_HALT_PENDING bit >>>>> >>>>> Thread B: >>>>> - halt the host controller >>>>> >>>>> Thread C: >>>>> - set EP_HALT_PENDING bit >>>>> - enqueue another stop endpoint command >>>>> - add the timer back >>>> Ah, thanks for you comments. >>>> If thread B halted the host, then the thread C xhci_urb_dequeue() will >>>> check host state failed, then just return and can not set another stop >>>> endpoint command. >>> Not so simple. Thread B will release xhci->lock before xhci_halt(). >> Yes. >> >>>> But from your example, I think your meaning is we >>>> should not halt the host by thread B, since we have handled the stop >>>> endpoint command event. >>>> >>>> So I think we need to check the xhci command of this stop endpoint >>>> command in xhci_stop_endpoint_command_timeout(), if the xhci command >>>> of this stop endpoint command is not in the command list (which means >>>> the stop endpoint command event has been handled), then just return >>>> and do not halt the controller. Right? >>>> >>> I'd like suggest you to put this change into a separated patch. >>> It's actually a different topic from the main purpose of this patch. >> OK, I will. Thanks for your comments. >> > > If you are going to work out a v2 patch, please consider whether > we can totally leverage the common command mechanism to > handle this stop endpoint command. > > Currently, when a stop endpoint command is issued for urb unlink, > there are two timers for this command. This is duplicated and we > should remove the stop-endpoint-only timer. The timeout functions > are also different. The stop-endpoint-only timer just halt the host > controller (this should be the last sort of way), while the common > command timer will try to recover the situation by aborting and > restart the command ring. Yes, thanks for your reminding and I will think about that. -- Baolin.wang Best Regards