Date: Mon, 25 Oct 2021 12:22:00 -0400
From: Alan Stern
To: Krzysztof Kozlowski
Cc: Felipe Balbi, Greg Kroah-Hartman, syzbot, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org, syzkaller-bugs@googlegroups.com, Pavel Skripkin,
    Thierry Escande, Andrey Konovalov
Subject: Re: [syzbot] INFO: task hung in port100_probe
Message-ID: <20211025162200.GC1258186@rowland.harvard.edu>
References: <000000000000c644cd05c55ca652@google.com>
    <9e06e977-9a06-f411-ab76-7a44116e883b@canonical.com>
    <20210722144721.GA6592@rowland.harvard.edu>
    <20211020220503.GB1140001@rowland.harvard.edu>
    <7d26fa0f-3a45-cefc-fd83-e8979ba6107c@canonical.com>
In-Reply-To: <7d26fa0f-3a45-cefc-fd83-e8979ba6107c@canonical.com>

On Mon, Oct 25, 2021 at 04:57:23PM +0200, Krzysztof Kozlowski wrote:
> On 21/10/2021 00:05, Alan Stern wrote:
> >>
> >> The syzkaller reproducer fails if >1 of threads are running these usb
> >> gadgets.
> >> When this happens, no "in_urb" completion happens. No this
> >> "ack" port100_recv_ack().
> >>
> >> I added some debugs and simply dummy_hcd dummy_timer() is woken up on
> >> enqueuing in_urb and then is looping crazy on a previous URB (some older
> >> URB, coming from before port100 driver probe started). The dummy_timer()
> >> loop never reaches the second "in_urb" to process it, I think.
> >
> > Is there any way you can track down what's happening in that crazy loop?
> > That is, what driver was responsible for the previous URB?
> >
> > We have seen this sort of thing before, where a driver submits an URB
> > for a gadget which has disconnected. The URB fails with -EPROTO status
> > but the URB's completion handler does an automatic resubmit. That can
> > lead to a very tight loop with dummy-hcd, and it could easily prevent
> > some other important processing from occurring. The simple solution is
> > to prevent the driver from resubmitting when the completion status is
> > -EPROTO.
>
> Hi Alan,
>
> Thanks for the reply.
>
> The URB which causes crazy loop is the port100 driver second URB, the
> one called ack or in_urb.
>
> The flow is:
> 1. probe()
> 2. port100_get_command_type_mask()
> 3. port100_send_cmd_async()
> 4. port100_send_frame_async()
> 5. usb_submit_urb(dev->out_urb)
>    The call succeeds, the dummy_hcd picks it up and immediately ends the
>    timer-loop with -EPROTO

So that URB completes immediately.

>    The completion here does not resubmit another/same URB. I checked this
>    carefully and I hope I did not miss anything.

Yeah, I see the same thing.

> 6. port100_submit_urb_for_ack() which sends the in_urb:
>    usb_submit_urb(dev->in_urb)
>    ... wait for completion
>    ... dummy_hcd loops on this URB around line 2000:
>    if (status == -EINPROGRESS)
>      continue

Do I understand this correctly? You're saying that dummy-hcd executes
the following jump at line 1975:

		/* incomplete transfer? */
		if (status == -EINPROGRESS)
			continue;

which goes back up to the loop head on line 1831:

	list_for_each_entry_safe(urbp, tmp, &dum_hcd->urbp_list, urbp_list) {

Is that right? I don't see why this should cause any problem. It won't
loop back to the same URB; it will make its way through the list.
(Unless the list has somehow gotten corrupted...) dum_hcd->urbp_list
should be short (perhaps 32 entries at most), so the loop should reach
the end of the list fairly quickly.

Now, doing all this 1000 times per second could use up a significant
portion of the available time. Do you think that's the reason for the
problem? It seems pretty unlikely.

Alan Stern
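
As a rough userspace illustration of the list-walk argument above (this is
not dummy-hcd code; struct fake_urbp, walk_once() and pass_over_pending()
are made-up names, and the 32-entry bound is only the estimate given above),
a "continue" inside such a loop moves on to the saved next entry rather
than revisiting the current one, so one pass over N pending entries runs
the loop body N times:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 32	/* assumed upper bound, from the estimate above */

/* Hypothetical stand-in for one entry on dum_hcd->urbp_list. */
struct fake_urbp {
	int status;		/* -EINPROGRESS while the transfer is incomplete */
	struct fake_urbp *next;
};

/* One simulated timer pass; returns how many entries were visited. */
int walk_once(struct fake_urbp *head)
{
	struct fake_urbp *urbp, *tmp;
	int visited = 0;

	/* Save the next pointer before the body runs, in the same spirit
	 * as list_for_each_entry_safe(). */
	for (urbp = head; urbp != NULL; urbp = tmp) {
		tmp = urbp->next;
		visited++;

		/* incomplete transfer? */
		if (urbp->status == -EINPROGRESS)
			continue;	/* advances to tmp, not back to urbp */

		/* A real handler would complete the URB here. */
	}
	return visited;
}

/* Build a list of n still-pending entries and run one pass over it. */
int pass_over_pending(int n)
{
	struct fake_urbp entries[MAX_ENTRIES];
	int i;

	assert(n >= 0 && n <= MAX_ENTRIES);
	for (i = 0; i < n; i++) {
		entries[i].status = -EINPROGRESS;
		entries[i].next = (i + 1 < n) ? &entries[i + 1] : NULL;
	}
	return walk_once(n > 0 ? &entries[0] : NULL);
}
```

With every entry still pending, pass_over_pending(32) returns 32: each
entry is looked at once and the pass ends. At 1000 passes per second that
works out to roughly 32,000 cheap iterations per second in this model.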