Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757580AbcLAGfc (ORCPT ); Thu, 1 Dec 2016 01:35:32 -0500 Received: from mga06.intel.com ([134.134.136.31]:16763 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751388AbcLAGfb (ORCPT ); Thu, 1 Dec 2016 01:35:31 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,723,1477983600"; d="scan'208";a="1066505369" Subject: Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command To: Baolin Wang References: <613dafc211127a4589306e91e231e151feb5ce80.1480496291.git.baolin.wang@linaro.org> <583FB8F0.6070503@linux.intel.com> Cc: mathias.nyman@intel.com, Greg KH , USB , LKML , Mark Brown From: Lu Baolu Message-ID: <583FC4AF.3030502@linux.intel.com> Date: Thu, 1 Dec 2016 14:35:27 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3370 Lines: 83 Hi, On 12/01/2016 02:04 PM, Baolin Wang wrote: > Hi Baolu, > > On 1 December 2016 at 13:45, Lu Baolu wrote: >> Hi, >> >> On 11/30/2016 05:02 PM, Baolin Wang wrote: >>> If the hardware never responds to the stop endpoint command, the >>> URBs will never be completed, and we might hang the USB subsystem. >>> The original watchdog timer is used to watch if one stop endpoint >>> command is timeout, if timeout, then the watchdog timer will set >>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all >>> pending URBs. >>> >>> But now we already have one command timer to control command timeout, >>> thus we can also use the command timer to watch the stop endpoint >>> command, instead of one duplicate watchdog timer which need to be >>> removed. >>> >>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if >>> this is the last stop endpoint command of one endpoint. Since we >>> can make sure we only set one stop endpoint command for one endpoint >>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove >>> this flag. >> I am afraid you can't do this. "stop_cmds_pending" was added >> to fix the problem described in the comments that you want to >> remove. But I didn't find any fix of this problem in your patch. > Now we can not pending another stop endpoint command for the same one > endpoint, since will check 'EP_HALT_PENDING' flag in > xhci_urb_dequeue() function to avoid this. But after some > investigation, I think I missed the stop endpoint command in > xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag, > maybe need to add 'EP_HALT_PENDING' flag checking in > xhci_stop_device() function. DId I miss something else? Thanks. Consider below three threads running on different CPUs at the same time. Thread A: xhci_handle_cmd_stop_ep() --- in interrupt handler Thread B: xhci_stop_endpoint_command_timeout() --- timer expired Thread C: xhci_urb_dequeue --- called by usb core They are serialized by xhci->lock. Let's consider below sequence: Thread A: - delete xhci->cmd_timer), but will fail due to Thread B. - clear EP_HALT_PENDING bit Thread B: - halt the host controller Thread C: - set EP_HALT_PENDING bit - enqueue another stop endpoint command - add the timer back Best regards, Lu Baolu > >> - * The timer may also fire if the host takes a very long time to respond to the >> - * command, and the stop endpoint command completion handler cannot delete the >> - * timer before the timer function is called. Another endpoint cancellation may >> - * sneak in before the timer function can grab the lock, and that may queue >> - * another stop endpoint command and add the timer back. So we cannot use a >> - * simple flag to say whether there is a pending stop endpoint command for a >> - * particular endpoint. >> - * >> - * Instead we use a combination of that flag and a counter for the number of >> - * pending stop endpoint commands. If the timer is the tail end of the last >> - * stop endpoint command, and the endpoint's command is still pending, we assume >> - * the host is dying. >> >> Best regards, >> Lu Baolu >> >>> We also need to clean up the command queue before trying to halt the >>> xHCI host in xhci_stop_endpoint_command_timeout() function. >>> >>> Signed-off-by: Baolin Wang > >