Date: Thu, 6 May 2021 09:49:04 -0400
From: Alan Stern
To: Guido Kiener
Cc: Dmitry Vyukov, syzbot, Greg Kroah-Hartman, "dpenkler@gmail.com",
    "lee.jones@linaro.org", USB list, "bp@alien8.de", "dwmw@amazon.co.uk",
    "hpa@zytor.com", "linux-kernel@vger.kernel.org", "luto@kernel.org",
    "mingo@redhat.com", "syzkaller-bugs@googlegroups.com",
    "tglx@linutronix.de", "x86@kernel.org"
Subject: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx
Message-ID: <20210506134904.GA734112@rowland.harvard.edu>
References: <58bda4726ca24d0e963a6787d4c86313@rohde-schwarz.com>
In-Reply-To: <58bda4726ca24d0e963a6787d4c86313@rohde-schwarz.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 05, 2021 at 10:22:24PM +0000, Guido Kiener wrote:
> > -----Original Message-----
> > From: Alan Stern
> > Sent: Tuesday, May 4, 2021 5:14 PM
> > To: Kiener Guido 14DS1
> > Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
> >
> > On Mon, May 03, 2021 at
09:56:05PM +0000, Guido Kiener wrote:
> > > Hi all,
> > >
> > > Dave and I discussed the "self-detected stall on CPU" caused by the
> > > usbtmc driver.
> > >
> > > What happened?
> > > The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe
> > > receives an erroneous urb with status -EPROTO (-71).
> > > See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> > > -EPROTO does not abort/shutdown the pipe and the urb is resubmitted to
> > > receive the next packet. However, the callback handler usbtmc_interrupt
> > > is called again with the same erroneous status -EPROTO, and this seems
> > > to result in an endless loop.
> > > According to
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> > > the error -EPROTO indicates a hardware problem or a bad cable.
> > >
> > > Most usb drivers do not react in a specific way to these hardware
> > > problems and resubmit the urb. We assume these drivers will run into
> > > the same endless loop. Some other driver samples are:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
> > >
> > > Possible solutions:
> > > Hardware defects or bad cables seem to be a common problem for most usb
> > > drivers and I assume we do not want to fix this problem in all
> > > class-specific drivers, but in lower-level host drivers, e.g.:
> > > 1. Use a counter and close the pipe after some detected errors.
> > > 2. Delay the resubmission of the urb to avoid high cpu usage.
> > > 3. Do nothing, since it is just a rare problem.
> > >
> > > We've never seen this problem in our products and we do not dare to
> > > change anything.
> >
> > Drivers are not consistent in the way they handle these errors, as you
> > have seen. A few try to take active measures, such as retries with
> > increasing timeouts. Many drivers just ignore them, which is not a very
> > good idea.
> >
> > The general feeling among kernel USB developers is that a -EPROTO,
> > -EILSEQ, or -ETIME error should be regarded as fatal, much the same as
> > an unplug event. The driver should avoid resubmitting URBs and just wait
> > to be unbound from the device.
>
> Thanks for your assessment. I agree with the general feeling. I counted
> about a hundred specific usb drivers, so wouldn't it be better to fix the
> problem in some of the host drivers (e.g. urb.c)?
> We could return an error when calling usb_submit_urb() on an erroneous pipe.
> I cannot estimate the side effects, and we would need to check again how
> all drivers deal with the error situation. Maybe there are some special
> drivers that need specialized error handling.
> In this case these drivers could reset the (new?) error flag to allow
> calling usb_submit_urb() again without error. This could work, couldn't it?

That is feasible, although it would be an awkward approach. As you said,
the side effects aren't clear. But it might work.

> > If you would like to audit drivers and fix them up to behave this way,
> > that would be great.
>
> Currently not. I cannot pull the USB cable in home office :-), but I will
> keep an eye on it. When I'm more involved in the next USB driver issue,
> then I will test bad cables and maybe get more ideas on how we could test
> and fix this rare error.

Will you be able to test patches?

Alan Stern
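
For illustration only, the policy discussed in this thread (regard -EPROTO, -EILSEQ, or -ETIME as fatal and stop resubmitting) might be sketched in plain userspace C as below. This is not kernel code and not a proposed patch: struct mock_urb, mock_submit(), mock_complete(), and status_is_fatal() are hypothetical stand-ins for struct urb, usb_submit_urb(), and a driver's completion handler. Statuses that real drivers also stop on (for example, unlink or shutdown) are left out to keep the sketch short.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct urb. */
struct mock_urb {
    int status;     /* 0 on success, negative errno on failure */
    int resubmits;  /* number of times the handler resubmitted this URB */
};

/* Errors the thread suggests treating much like an unplug event. */
static bool status_is_fatal(int status)
{
    switch (status) {
    case -EPROTO:
    case -EILSEQ:
    case -ETIME:
        return true;
    default:
        return false;
    }
}

/* Hypothetical stand-in for usb_submit_urb(): only counts the call. */
static int mock_submit(struct mock_urb *urb)
{
    urb->resubmits++;
    return 0;
}

/*
 * Completion handler sketch. Returns true if the URB was resubmitted,
 * false if the handler gave up and now waits to be unbound.
 */
static bool mock_complete(struct mock_urb *urb)
{
    if (status_is_fatal(urb->status))
        return false;   /* do not resubmit: this breaks the endless loop */

    mock_submit(urb);
    return true;
}
```

With this shape, a URB that keeps completing with -EPROTO is handled once and never resubmitted, so the completion handler cannot be re-entered in a tight loop by the same failing transfer.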