Received: by 10.213.65.68 with SMTP id h4csp864045imn; Tue, 27 Mar 2018 10:09:48 -0700 (PDT) X-Google-Smtp-Source: AIpwx4/tM81OGoZbU1SNnhUQrDCVbdCgsel5p4uMER2TY1N3TYF9rrBxj/d1za1oSIbJVNNB0bs9 X-Received: by 2002:a17:902:bc41:: with SMTP id t1-v6mr177913plz.56.1522170588454; Tue, 27 Mar 2018 10:09:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522170588; cv=none; d=google.com; s=arc-20160816; b=nne7we0oPU4h4Wvcg///MraVCihgPCE4t27HuDNqpVGeIlQ8aU2qEss5QJm3ghHxTx C2nHYr+b6/49S5M8cdRXnzJDpTPdiWm1m1+6HmzhuNFXs2VgodhO2ehiCMU5K9aiUCD3 2e1HQpqZI7GsQeUraRcddmX/rMfDHjPOOkcT2iFqtfrOu/Za/lqrGJ6L4KCxVd8onr2J JlpaAhsqfNCXbwsz21K1UDuMQiUciBd9GPDzKQ2Hybxs31KY0/mYZeLdERAMmKc9Tjs/ 4yLL1f9XTsNzLzu/lqncHZvGcakQz3OT+EWBIGS7H9madwyZG5vL4SZ7xmO4/cXg7d5L Jwbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:in-reply-to :subject:cc:to:from:date:arc-authentication-results; bh=PO/wL8L2tozpvzytTJlShCQnOR/L9L+4xSRoKqaYxrI=; b=J4CwYgFzrRwERy52EYf3UGPlOIehoNjVtOaF2p4xbuxtGmVZkoT/aqWspy+jMk0fMi 4puiw9o8iM2IX40DgkngH5amULLhbyt6lJ3mw0KCndeQsO8EcF0GXwnZiVPCei+Ow6rv ewuN18tfeQrPGaL4k9aCtKKwcxMgZ2pcmdD8trvg9XUGujx1AefC9JfvlkdL5zS0ojU+ nSCkLwZtHQqpsbo2zsk3wktyDYgbdn6vVnOkzYj9iUw+HZBTi/wW28dCRqeY2wvjxRi8 wERNgzNZRy6NZuBH2krKsophP2l7856r43yaexypz4HkkgHWt3TwPn/qR4BnOSiCuyTm BJkw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l69si1232558pfk.180.2018.03.27.10.09.25; Tue, 27 Mar 2018 10:09:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752365AbeC0RGw (ORCPT + 99 others); Tue, 27 Mar 2018 13:06:52 -0400 Received: from iolanthe.rowland.org ([192.131.102.54]:58850 "HELO iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1753118AbeC0RGu (ORCPT ); Tue, 27 Mar 2018 13:06:50 -0400 Received: (qmail 5320 invoked by uid 2102); 27 Mar 2018 13:06:49 -0400 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 27 Mar 2018 13:06:49 -0400 Date: Tue, 27 Mar 2018 13:06:49 -0400 (EDT) From: Alan Stern X-X-Sender: stern@iolanthe.rowland.org To: Jonathan Liu cc: Jeffy Chen , linux-kernel , Brian Norris , , Douglas Anderson , AMAN DEEP , , Greg Kroah-Hartman , Subject: Re: [v6] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks() In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 26 Mar 2018, Jonathan Liu wrote: > > I am experiencing a USB function call hang from userspace with OCHI > > (full speed USB device) after updating from Linux 4.14.15 to 4.14.24 > > and noticed this commit. > > > > Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w > > and amended with addr2line): > > [] (__schedule) from [] (schedule+0x50/0xb4) > > kernel/sched/core.c:2792 > > [] (schedule) from [] > > (usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59 > > [] (usb_kill_urb.part.3) from [] > > (usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690 > > [] (usbdev_ioctl) from [] > > (do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835 > > [] (do_vfs_ioctl) from [] (SyS_ioctl+0x34/0x5c) > > fs/ioctl.c:47 > > [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x54) > > include/linux/file.h:39 > > > > Afterwards the kernel is unresponsive to disconnect/connect of the > > full speed USB device but I can connect/disconnect a high speed USB > > device to the same port and communicate to it without problem since it > > uses EHCI (OHCI is companion controller). If I try to connect the full > > speed USB device again it is still unresponsive. The userspace > > application is still hanging after all this. > > > > Could this commit be causing the issue? It sounds like the URB gets cancelled but is never given back. But although the commit you mentioned does affect this logic, I don't see the problem could arise. > There does seem to be something wrong with this commit. I did some > more testing with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime. > I connected a sun7i-a20-pcduino3b running Linux 4.15.12 that has OTG > port with gadget zero kernel module (g_zero) loaded to the USB port > and did some testing with usbtest from tools/usb/testusb.c in kernel > repository. > > I had to do the following change to avoid getting flooded with > "usbtest 1-1:3.0: unlink retry" kernel messages after test 11 of > testusb starts: > --- a/drivers/usb/misc/usbtest.c > +++ b/drivers/usb/misc/usbtest.c > @@ -1473,7 +1473,7 @@ static int unlink1(struct usbtest_dev *dev, int > pipe, int > size, int async) > * resubmission, but since we're testing unlink > * paths, we can't. > */ > - ERROR(dev, "unlink retry\n"); > + /*ERROR(dev, "unlink retry\n");*/ That's a bizarre message in any case. The condition under which it gets printed is _not_ an error, just a race. I would expect it to show up every now and then just based on random chance. But if it gets printed over and over again then something is definitely wrong. > If I revert the commit, I can run "./testusb -a" to completion without > any issues with Linux 4.16.0-rc6 on sun7i-a20-olinuxino-lime. > I don't get a kernel stall detected when testing on x86_64 but I do > still get at least one "unlink retry" error message from kernel in > dmesg which can be reproduced using "./testusb -a" or more frequently > with "./testusb -a -t 11 -c 5000". A few of those "unlink retry" messages during a test with 5000 iterations does not mean anything is wrong; don't worry about it. It would be great if you could debug this farther. I'm not sure what to suggest, however. Clearly we need to know what's going on in ohci-q.c's finish_unlinks() routine; unfortunately this routine probably gets called thousands of times before the problem shows up. Anyway, here's a patch you can try out on the sun7i-a20-olinuxino-lime. I have no idea whether it will help pinpoint the source of the problem. Alan Stern Index: usb-4.x/drivers/usb/host/ohci-hcd.c =================================================================== --- usb-4.x.orig/drivers/usb/host/ohci-hcd.c +++ usb-4.x/drivers/usb/host/ohci-hcd.c @@ -327,14 +327,21 @@ static int ohci_urb_dequeue(struct usb_h * some upcoming INTR_SF to call finish_unlinks() */ urb_priv = urb->hcpriv; - if (urb_priv->ed->state == ED_OPER) + if (urb_priv->ed->state == ED_OPER) { + ohci_info(ohci, "start dequeue URB %p ed %p\n", urb, + urb_priv->ed); start_ed_unlink(ohci, urb_priv->ed); - + } else + ohci_info(ohci, "idle dequeue URB %p ed %p\n", urb, + urb_priv->ed); if (ohci->rh_state != OHCI_RH_RUNNING) { /* With HC dead, we can clean up right away */ ohci_work(ohci); } } + else + ohci_info(ohci, "failed dequeue URB %p\n", urb); + spin_unlock_irqrestore (&ohci->lock, flags); return rc; } Index: usb-4.x/drivers/usb/host/ohci-q.c =================================================================== --- usb-4.x.orig/drivers/usb/host/ohci-q.c +++ usb-4.x/drivers/usb/host/ohci-q.c @@ -1022,6 +1022,7 @@ skip_ed: *last = ed->ed_next; ed->ed_next = NULL; modified = 0; + ohci_info(ohci, "scanning unlinked ed %p\n", ed); /* unlink urbs as requested, but rescan the list after * we call a completion since it might have unlinked @@ -1072,6 +1073,8 @@ rescan_this: /* if URB is done, clean up */ if (urb_priv->td_cnt >= urb_priv->length) { modified = completed = 1; + ohci_info(ohci, "finish dequeue URB %p ed %p\n", + urb, ed); finish_urb(ohci, urb, 0); } }