Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751282AbdCQQaZ (ORCPT ); Fri, 17 Mar 2017 12:30:25 -0400 Received: from netrider.rowland.org ([192.131.102.5]:60801 "HELO netrider.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751200AbdCQQaX (ORCPT ); Fri, 17 Mar 2017 12:30:23 -0400 Date: Fri, 17 Mar 2017 12:30:21 -0400 (EDT) From: Alan Stern X-X-Sender: stern@netrider.rowland.org To: Guenter Roeck cc: Greg Kroah-Hartman , , , Douglas Anderson , Brian Norris Subject: Re: [PATCH v3] usb: hub: Fix error loop seen after hub communication errors In-Reply-To: <1489692253-20299-1-git-send-email-linux@roeck-us.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5875 Lines: 148 On Thu, 16 Mar 2017, Guenter Roeck wrote: > While stress testing a usb controller using a bind/unbind looop, the > following error loop was observed. > > usb 7-1.2: new low-speed USB device number 3 using xhci-hcd > usb 7-1.2: hub failed to enable device, error -108 > usb 7-1-port2: cannot disable (err = -22) > usb 7-1-port2: couldn't allocate usb_device > usb 7-1-port2: cannot disable (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > ** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22 > ** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22) > > This continues forever. After adding tracebacks into the code, > the call sequence leading to this is found to be as follows. > > [] hub_activate+0x368/0x7b8 > [] hub_resume+0x2c/0x3c > [] usb_resume_interface.isra.6+0x128/0x158 > [] usb_suspend_both+0x1e8/0x288 > [] usb_runtime_suspend+0x3c/0x98 > [] __rpm_callback+0x48/0x7c > [] rpm_callback+0xa8/0xd4 > [] rpm_suspend+0x84/0x758 > [] rpm_idle+0x2c8/0x498 > [] __pm_runtime_idle+0x60/0xac > [] usb_autopm_put_interface+0x6c/0x7c > [] hub_event+0x10ac/0x12ac > [] process_one_work+0x390/0x6b8 > [] worker_thread+0x480/0x610 > [] kthread+0x164/0x178 > [] ret_from_fork+0x10/0x40 > > kick_hub_wq() is called from hub_activate() even after failures to > communicate with the hub. This results in an endless sequence of > hub event -> hub activate -> wq trigger -> hub event -> ... > > Provide two solutions for the problem. > > - Only trigger the hub event queue if communication with the hub > was successful. > - After a suspend failure, only resume already suspended interfaces > if the communication with the device is still possible. > > Each of the changes fixes the observed problem. Use both to improve > robustness. > > Signed-off-by: Guenter Roeck > --- > v3: In hub.c, abort immediately if hub_port_status() returns an error. > Since hub_port_status() already logs the error, don't log it again. > In device,c, log the error return value from usb_suspend_device() > if usb_get_status() failed as well. > Don't check if the hub is still accessible if the error returned > from hub_port_status() is -EBUSY. > v2: Instead of not triggering the hub wq after an error to submit an urb, > implement a more complex error detection and handling. Do it in two > places. Marked as RFC to determine if one (or both) of those solutions > are viable. > > drivers/usb/core/driver.c | 18 ++++++++++++++++++ > drivers/usb/core/hub.c | 5 ++++- > 2 files changed, 22 insertions(+), 1 deletion(-) Acked-by: Alan Stern > diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c > index cdee5130638b..7ebdf2a4e8fe 100644 > --- a/drivers/usb/core/driver.c > +++ b/drivers/usb/core/driver.c > @@ -1331,6 +1331,24 @@ static int usb_suspend_both(struct usb_device *udev, pm_message_t msg) > */ > if (udev->parent && !PMSG_IS_AUTO(msg)) > status = 0; > + > + /* > + * If the device is inaccessible, don't try to resume > + * suspended interfaces and just return the error. > + */ > + if (status && status != -EBUSY) { > + int err; > + u16 devstat; > + > + err = usb_get_status(udev, USB_RECIP_DEVICE, 0, > + &devstat); > + if (err) { > + dev_err(&udev->dev, > + "Failed to suspend device, error %d\n", > + status); > + goto done; > + } > + } > } > > /* If the suspend failed, resume interfaces that did get suspended */ > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > index 5a420657f9f7..5ab7cd09e1a4 100644 > --- a/drivers/usb/core/hub.c > +++ b/drivers/usb/core/hub.c > @@ -1066,6 +1066,9 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) > > portstatus = portchange = 0; > status = hub_port_status(hub, port1, &portstatus, &portchange); > + if (status) > + goto abort; > + > if (udev || (portstatus & USB_PORT_STAT_CONNECTION)) > dev_dbg(&port_dev->dev, "status %04x change %04x\n", > portstatus, portchange); > @@ -1198,7 +1201,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) > > /* Scan all ports that need attention */ > kick_hub_wq(hub); > - > +abort: > if (type == HUB_INIT2 || type == HUB_INIT3) { > /* Allow autosuspend if it was suppressed */ > disconnected: >