Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933252AbdCJVOO (ORCPT ); Fri, 10 Mar 2017 16:14:14 -0500 Received: from iolanthe.rowland.org ([192.131.102.54]:45120 "HELO iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S932482AbdCJVOE (ORCPT ); Fri, 10 Mar 2017 16:14:04 -0500 Date: Fri, 10 Mar 2017 16:14:02 -0500 (EST) From: Alan Stern X-X-Sender: stern@iolanthe.rowland.org To: Guenter Roeck cc: Greg Kroah-Hartman , Douglas Anderson , Brian Norris , , Subject: Re: [PATCH] usb: hub: Fix error loop seen after hub communication errors In-Reply-To: <1489007329-8352-1-git-send-email-linux@roeck-us.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4024 Lines: 93 On Wed, 8 Mar 2017, Guenter Roeck wrote: > While stress testing a usb controller using a bind/unbind looop, the > following error loop was observed. > > usb 7-1.2: new low-speed USB device number 3 using xhci-hcd > usb 7-1.2: hub failed to enable device, error -108 > usb 7-1-port2: cannot disable (err = -22) > usb 7-1-port2: couldn't allocate usb_device > usb 7-1-port2: cannot disable (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 ... > This continues forever. After adding tracebacks into the code, > the call sequence leading to this is found to be as follows. > > [] hub_activate+0x368/0x7b8 > [] hub_resume+0x2c/0x3c > [] usb_resume_interface.isra.6+0x128/0x158 > [] usb_suspend_both+0x1e8/0x288 > [] usb_runtime_suspend+0x3c/0x98 > [] __rpm_callback+0x48/0x7c > [] rpm_callback+0xa8/0xd4 > [] rpm_suspend+0x84/0x758 > [] rpm_idle+0x2c8/0x498 > [] __pm_runtime_idle+0x60/0xac > [] usb_autopm_put_interface+0x6c/0x7c > [] hub_event+0x10ac/0x12ac > [] process_one_work+0x390/0x6b8 > [] worker_thread+0x480/0x610 > [] kthread+0x164/0x178 > [] ret_from_fork+0x10/0x40 > > kick_hub_wq() is called from hub_activate() even after failures to > communicate with the hub. This results in an endless sequence of > hub event -> hub activate -> wq trigger -> hub event -> ... > To break the loop, only trigger the hub event queue if communication > with the hub is successful. > > Signed-off-by: Guenter Roeck > --- > drivers/usb/core/hub.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > index 5286bf67869a..f22ab428b310 100644 > --- a/drivers/usb/core/hub.c > +++ b/drivers/usb/core/hub.c > @@ -1197,7 +1197,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) > &hub->leds, LED_CYCLE_PERIOD); > > /* Scan all ports that need attention */ > - kick_hub_wq(hub); > + if (!status) > + kick_hub_wq(hub); > > if (type == HUB_INIT2 || type == HUB_INIT3) { > /* Allow autosuspend if it was suppressed */ I'm not at all sure this is the best solution. status == 0 means that an URB was successfully _submitted_; it does not mean that future communication with the hub will be successful. The problem arises here because the system is unable to runtime-suspend the hub (you didn't say why, but probably because of the same communication problem). When a suspend fails, drivers are informed that the device is back at full power, and of course the hub driver tries to determine the hub's current state at that time. When this fails, there's no reason to keep the hub at full power any longer, so the system tries to do another runtime suspend. This suspend fails just like the first one, and there's your loop. This same kind of thing could happen with any driver. In your case, the problem could be solved by making sure that when the hub fails to suspend, it is still at least minimally functional. We don't do this now; when usb_port_suspend fails there is no checking to see whether the device is still working. (It would be interesting to know exactly at which point usb_port_suspend fails during your test. And why didn't your system mark the hub as disconnected when the host controller was unbound?) An even worse problem is what to do when the device is working okay but runtime suspend still fails (for example, if remote wakeup cannot be enabled). How do we prevent the system from going into a similar loop then? Alan Stern