Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758761AbcLAHoM (ORCPT ); Thu, 1 Dec 2016 02:44:12 -0500 Received: from mga06.intel.com ([134.134.136.31]:45746 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757487AbcLAHoK (ORCPT ); Thu, 1 Dec 2016 02:44:10 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,723,1477983600"; d="scan'208";a="197770117" Subject: Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command To: Baolin Wang References: <613dafc211127a4589306e91e231e151feb5ce80.1480496291.git.baolin.wang@linaro.org> <583FB8F0.6070503@linux.intel.com> <583FC4AF.3030502@linux.intel.com> Cc: mathias.nyman@intel.com, Greg KH , USB , LKML , Mark Brown From: Lu Baolu Message-ID: <583FD4C6.7020306@linux.intel.com> Date: Thu, 1 Dec 2016 15:44:06 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3453 Lines: 82 Hi, On 12/01/2016 03:35 PM, Baolin Wang wrote: > On 1 December 2016 at 14:35, Lu Baolu wrote: >> Hi, >> >> On 12/01/2016 02:04 PM, Baolin Wang wrote: >>> Hi Baolu, >>> >>> On 1 December 2016 at 13:45, Lu Baolu wrote: >>>> Hi, >>>> >>>> On 11/30/2016 05:02 PM, Baolin Wang wrote: >>>>> If the hardware never responds to the stop endpoint command, the >>>>> URBs will never be completed, and we might hang the USB subsystem. >>>>> The original watchdog timer is used to watch if one stop endpoint >>>>> command is timeout, if timeout, then the watchdog timer will set >>>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all >>>>> pending URBs. >>>>> >>>>> But now we already have one command timer to control command timeout, >>>>> thus we can also use the command timer to watch the stop endpoint >>>>> command, instead of one duplicate watchdog timer which need to be >>>>> removed. >>>>> >>>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if >>>>> this is the last stop endpoint command of one endpoint. Since we >>>>> can make sure we only set one stop endpoint command for one endpoint >>>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove >>>>> this flag. >>>> I am afraid you can't do this. "stop_cmds_pending" was added >>>> to fix the problem described in the comments that you want to >>>> remove. But I didn't find any fix of this problem in your patch. >>> Now we can not pending another stop endpoint command for the same one >>> endpoint, since will check 'EP_HALT_PENDING' flag in >>> xhci_urb_dequeue() function to avoid this. But after some >>> investigation, I think I missed the stop endpoint command in >>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag, >>> maybe need to add 'EP_HALT_PENDING' flag checking in >>> xhci_stop_device() function. DId I miss something else? Thanks. >> Consider below three threads running on different CPUs at the same time. >> >> Thread A: xhci_handle_cmd_stop_ep() --- in interrupt handler >> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired >> Thread C: xhci_urb_dequeue --- called by usb core >> >> They are serialized by xhci->lock. Let's consider below sequence: >> >> Thread A: >> - delete xhci->cmd_timer), but will fail due to Thread B. >> - clear EP_HALT_PENDING bit >> >> Thread B: >> - halt the host controller >> >> Thread C: >> - set EP_HALT_PENDING bit >> - enqueue another stop endpoint command >> - add the timer back > Ah, thanks for you comments. > If thread B halted the host, then the thread C xhci_urb_dequeue() will > check host state failed, then just return and can not set another stop > endpoint command. Not so simple. Thread B will release xhci->lock before xhci_halt(). > But from your example, I think your meaning is we > should not halt the host by thread B, since we have handled the stop > endpoint command event. > > So I think we need to check the xhci command of this stop endpoint > command in xhci_stop_endpoint_command_timeout(), if the xhci command > of this stop endpoint command is not in the command list (which means > the stop endpoint command event has been handled), then just return > and do not halt the controller. Right? > I'd like suggest you to put this change into a separated patch. It's actually a different topic from the main purpose of this patch. Best regards, Lu Baolu