Date: Tue, 4 May 2021 11:13:31 -0400
From: Alan Stern
To: Guido Kiener
Cc: Dmitry Vyukov, syzbot, Greg Kroah-Hartman, dpenkler@gmail.com, lee.jones@linaro.org, USB list, bp@alien8.de, dwmw@amazon.co.uk, hpa@zytor.com, linux-kernel@vger.kernel.org, luto@kernel.org, mingo@redhat.com, syzkaller-bugs@googlegroups.com, tglx@linutronix.de, x86@kernel.org
Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
Message-ID: <20210504151331.GB657070@rowland.harvard.edu>

On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> Hi all,
>
> Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc driver.
>
> What happened?
> The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe receives an erroneous urb with status -EPROTO (-71).
> See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> -EPROTO does not abort/shut down the pipe, and the urb is resubmitted to receive the next packet. However, the callback handler usbtmc_interrupt is called again with the same erroneous status -EPROTO, and this seems to result in an endless loop.
> According to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> the error -EPROTO indicates a hardware problem or a bad cable.
>
> Most USB drivers do not react in a specific way to these hardware problems; they simply resubmit the urb. We assume these drivers will run into the same endless loop. Some other driver examples are:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
>
> Possible solutions:
> Hardware defects or bad cables seem to be a common problem for most USB drivers, and I assume we do not want to fix this problem in every class-specific driver but rather in the lower-level host drivers, e.g.:
> 1. Use a counter and close the pipe after a number of detected errors.
> 2. Delay the resubmission of the urb to avoid high CPU usage.
> 3. Do nothing, since it is just a rare problem.
>
> We've never seen this problem in our products, and we do not dare to change anything.

Drivers are not consistent in the way they handle these errors, as you have seen. A few try to take active measures, such as retries with increasing timeouts. Many drivers just ignore them, which is not a very good idea.

The general feeling among kernel USB developers is that an -EPROTO, -EILSEQ, or -ETIME error should be regarded as fatal, much the same as an unplug event. The driver should avoid resubmitting URBs and just wait to be unbound from the device.
If you would like to audit drivers and fix them up to behave this way, that would be great.

(FYI, by far the most common causes of these errors are that the user has unplugged the USB cable or that the device's firmware has crashed. It is quite rare for the cause to be intermittent, although not entirely unheard of; for example, someone once reported errors resulting from electromagnetic or power-line interference caused by flickering fluorescent lights or something of that sort. It's pretty safe to ignore this possibility.)

Alan Stern