Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp834965pxb; Wed, 3 Nov 2021 13:13:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwotz8g50FnN0WdT8zHufrIcXbGvylO8GTuz3N0iDlhgBIyQsFsC5ZyZk9r1co1yfPvK1Y8 X-Received: by 2002:a17:906:e103:: with SMTP id gj3mr5179823ejb.456.1635970394399; Wed, 03 Nov 2021 13:13:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635970394; cv=none; d=google.com; s=arc-20160816; b=XdCu6DCY4JayFsVCgPNbc9OEjaDHPP3eLV+t3M1OMBVuGgv1sx0Gpatne4cR6wlDCf IPtLTIQFn133dw9fIXTPU0s4MaIPwiMSD6XCge8lUE/xVs0MELzXDuioH8uTbJ3UA3fy ctFHbIHBXh8wHI0qpB4jV2pTFENvKg4P4sRJzYo2r/w8BS5o0l2yHDgs92c3KuLZW70+ BlwxbIrz5I6Xy/NZ1NXMJ2k5Rg0N948iIvir1cR8hOShWfEf3Gu+LYXcVstq1KHtq/Sw ZTuM5ItzDyedTQyrELrh1STUrdIISwVQ0C2u2b2dXpCVC4kMhZqPh3lQ/geELveTY0Ow sHLA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=IW9CIPa5ts5gzlYS3u3FhNYpYisgfeyxXBcVZcr4b4w=; b=Srd2KCjCXcuuZWd4kvXBd2OiyGTF3/i9BfPAWtaRU0htiSc0EeFFKF0yrnchkJGIHE oVRCLRvbDNwkKLlpaO8b9Uv67pas28W4YeDYL9BrxYmGLKjp9heXG7OgPSjqwoY+fNb4 ptCzUhXC6CqIf8/X15pvFHiMRuZJ8gUiHHC+PXlvH0+g6fBei0c1ZJG3BU8to5dXiy0D lVOwWUvYhUZcwMhRsr1KL0aqvnJhqPN7EMWsISdPv7PGJh4UzFauLA3PIrmpnDOpOz6n G7vc3iyLgtVnCe65uGBdPXfDDxvU+0NoLrT5mdYgEO8h1d78UwgEE+msutJ3vuNW8dXe JzeA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-bluetooth-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-bluetooth-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bq5si4814458edb.617.2021.11.03.13.12.30; Wed, 03 Nov 2021 13:13:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-bluetooth-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-bluetooth-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-bluetooth-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230243AbhKCUOQ (ORCPT + 99 others); Wed, 3 Nov 2021 16:14:16 -0400 Received: from netrider.rowland.org ([192.131.102.5]:49937 "HELO netrider.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S230172AbhKCUOQ (ORCPT ); Wed, 3 Nov 2021 16:14:16 -0400 Received: (qmail 1535365 invoked by uid 1000); 3 Nov 2021 16:11:38 -0400 Date: Wed, 3 Nov 2021 16:11:38 -0400 From: Alan Stern To: Marcel Holtmann Cc: Benjamin Berg , linux-usb@vger.kernel.org, linux-bluetooth@vger.kernel.org Subject: Re: Userspace enumeration hang while btusb tries to load firmware of removed device Message-ID: <20211103201138.GC1529362@rowland.harvard.edu> References: <20211103182303.GB1529362@rowland.harvard.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-bluetooth@vger.kernel.org On Wed, Nov 03, 2021 at 08:31:03PM +0100, Marcel Holtmann wrote: > Hi Alan, > > >> a user is seeing a hang in fprintd while enumerating devices which > >> appears to be caused by an interaction of: > >> > >> * system is resuming from S3 > >> * btusb starts loading firmware > >> * bluetooth device disappears (probably thinkpad_acpi rfkill) > >> * libusb enumerates USB devices (fprintd in this case) > >> > >> When this happens, the firmware load fails after a timeout of 10s. It > >> appears that if userspace queries information about the root hub in > >> question during this time, it will hang until the btusb firmware load > >> has timed out. > >> > >> Attaching the full kernel log, below an excerpt, you can see: > >> * At :12 device removal: "usb 5-4: USB disconnect, device number 33" > >> * libusb enumeration retrieves information about the usb5 root hub, > >> and blocks on this > >> * At :14 there is a tx timeout on hci0 > >> * At :23 the firmware load finally fails > >> * Then usb_disable_device happens > >> * libusb/fprintd gets the usb5 HUB information and continues its > >> enumeration > >> > >> As I see it, there may be two issues: > >> 1. userspace should not block due to the firmware load hanging > >> 2. btusb should give up more quickly when the device disappears > >> > >> Does anyone have a good idea about the possible cause or how we can fix > >> the problem? > >> > >> Downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=2019857 > > > > I'm not familiar with the btusb driver, so someone on the > > linux-bluetooth mailing list would have a better idea about this. > > However, it does look as though btusb keeps the device locked during the > > entire 10-second period while it tries to send over the firmware, and it > > doesn't abort the procedure when it starts getting disconnection errors > > but instead persists until a timeout expires. Keeping the device locked > > would certainly block lsusb. > > > > In general, locking the device during a firmware upload seems like > > the right thing to do -- you don't want extraneous transfers from > > other processes messing up the firmware! So overall, it appears that > > the whole problem would be solved if the firmware transfer were > > aborted as soon as the -ENODEV errors start appearing. > > the problem seems to be that we hitting HCI command timeout. So the > firmware download is done via HCI commands. These commands are send to > the transport driver btusb.c via hdev->send (as btusb_send_frame). > This triggers the usb_submit_urb or queues them via data->deferred > anchor. All this reports back the error properly except that nobody > does anything with it. > > See hci_send_frame() last portion: > > err = hdev->send(hdev, skb); > if (err < 0) { > bt_dev_err(hdev, "sending frame failed (%d)", err); > kfree_skb(skb); > } > > And that is it. We are not checking for ENODEV or any error here. That > means the failure of the HCI command gets only caught via the HCI > command timeout. I don’t know how to do this yet, but you would have > to look there to fail HCI command right away instead of waiting for > the timeout. I have no idea how all the different layers work here. Clearly something has to signal hdev->req_wait_q after setting hdev->req_status to some appropriate value. Can this be done in btusb.c, either when the URB is submitted or when it completes? Or in hci_send_frame? Alan Stern