Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp797013pxb; Wed, 3 Nov 2021 12:33:05 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwHN4N96gVkBopiB7qJKnpJOB769bYHE1OzqhNSaR8PomIKBh8OrAID2SkIQqSrxxKJ3yfz X-Received: by 2002:a05:6638:2045:: with SMTP id t5mr330468jaj.127.1635967985641; Wed, 03 Nov 2021 12:33:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635967985; cv=none; d=google.com; s=arc-20160816; b=Zfi06zcOVsghhe+sG2NHUNQCjfcJqIyqxZjMDNlCuTjHJeBD3+r9xc/SRTaND5fl2D bogXjLzqeb/4MbvpsX0/Sav6rgqLpGMBCfCT6t1X7EzXr3vFKG/9ydJOlZri72a2Mzpy n86zgkRA/ealBxAwMXxthitXwXu+dJAMFwMlZ+ugMWVgbGDAgO1oYNmBJOm/uspZeufv HJqrJJBMOQQEjwb4yznHXJBKNWjAj9Y+hgS8EFq0swgdBcgAcDZTBYSmD+4WmIX117Rh ES5MKOba9YvKk/Qccn8OSsB8S2NMnRkIxTjFZyTSwJEhq/iloIP3OqHZG33QZLN0dI2r QeQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version; bh=IQI4ezHyOyzLKoSeTZx2P1AswyvJERplfxrhIvRmBK8=; b=dkE+6sUfrkhLHHXWFWt5ZJzCYI23uWn4njupVFu4uNyOfN3wOVXzF3cNBKrL6klsYX r67cU/JwlUwXkIZ+pyVCqoNtKOHFEoXXCpnV+MjFqWHOh8AofEtCbc6k5iZteG5K3fQp uF+JPKqUVdv6wJULTYxdLuCTwELcvG3dlbii8RU9xS6MDYrrDrA0ngSkj1Nj4I6xfW0C 7HZeEeGtR4ThnlloeL+wtMbeWp1neF/ADjA4SSeQrkZKWVnN85jXP7zs9mSo4vZL23z3 F+H+rS3wZrVAlV60LToU6v+Q2G55YQS88xBk6NgtaVyBZiW23O8d69UIAz7anHZIYF5S fUHw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-bluetooth-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-bluetooth-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f18si5119712jav.26.2021.11.03.12.31.59; Wed, 03 Nov 2021 12:33:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-bluetooth-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-bluetooth-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-bluetooth-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230198AbhKCTdn convert rfc822-to-8bit (ORCPT + 99 others); Wed, 3 Nov 2021 15:33:43 -0400 Received: from coyote.holtmann.net ([212.227.132.17]:38865 "EHLO mail.holtmann.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230156AbhKCTdm (ORCPT ); Wed, 3 Nov 2021 15:33:42 -0400 Received: from smtpclient.apple (p4fefc15c.dip0.t-ipconnect.de [79.239.193.92]) by mail.holtmann.org (Postfix) with ESMTPSA id 27C9DCED15; Wed, 3 Nov 2021 20:31:04 +0100 (CET) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.20.0.1.32\)) Subject: Re: Userspace enumeration hang while btusb tries to load firmware of removed device From: Marcel Holtmann In-Reply-To: <20211103182303.GB1529362@rowland.harvard.edu> Date: Wed, 3 Nov 2021 20:31:03 +0100 Cc: Benjamin Berg , linux-usb@vger.kernel.org, linux-bluetooth@vger.kernel.org Content-Transfer-Encoding: 8BIT Message-Id: References: <20211103182303.GB1529362@rowland.harvard.edu> To: Alan Stern X-Mailer: Apple Mail (2.3693.20.0.1.32) Precedence: bulk List-ID: X-Mailing-List: linux-bluetooth@vger.kernel.org Hi Alan, >> a user is seeing a hang in fprintd while enumerating devices which >> appears to be caused by an interaction of: >> >> * system is resuming from S3 >> * btusb starts loading firmware >> * bluetooth device disappears (probably thinkpad_acpi rfkill) >> * libusb enumerates USB devices (fprintd in this case) >> >> When this happens, the firmware load fails after a timeout of 10s. It >> appears that if userspace queries information about the root hub in >> question during this time, it will hang until the btusb firmware load >> has timed out. >> >> Attaching the full kernel log, below an excerpt, you can see: >> * At :12 device removal: "usb 5-4: USB disconnect, device number 33" >> * libusb enumeration retrieves information about the usb5 root hub, >> and blocks on this >> * At :14 there is a tx timeout on hci0 >> * At :23 the firmware load finally fails >> * Then usb_disable_device happens >> * libusb/fprintd gets the usb5 HUB information and continues its >> enumeration >> >> As I see it, there may be two issues: >> 1. userspace should not block due to the firmware load hanging >> 2. btusb should give up more quickly when the device disappears >> >> Does anyone have a good idea about the possible cause or how we can fix >> the problem? >> >> Downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=2019857 > > I'm not familiar with the btusb driver, so someone on the > linux-bluetooth mailing list would have a better idea about this. > However, it does look as though btusb keeps the device locked during the > entire 10-second period while it tries to send over the firmware, and it > doesn't abort the procedure when it starts getting disconnection errors > but instead persists until a timeout expires. Keeping the device locked > would certainly block lsusb. > > In general, locking the device during a firmware upload seems like > the right thing to do -- you don't want extraneous transfers from > other processes messing up the firmware! So overall, it appears that > the whole problem would be solved if the firmware transfer were > aborted as soon as the -ENODEV errors start appearing. the problem seems to be that we hitting HCI command timeout. So the firmware download is done via HCI commands. These commands are send to the transport driver btusb.c via hdev->send (as btusb_send_frame). This triggers the usb_submit_urb or queues them via data->deferred anchor. All this reports back the error properly except that nobody does anything with it. See hci_send_frame() last portion: err = hdev->send(hdev, skb); if (err < 0) { bt_dev_err(hdev, "sending frame failed (%d)", err); kfree_skb(skb); } And that is it. We are not checking for ENODEV or any error here. That means the failure of the HCI command gets only caught via the HCI command timeout. I don’t know how to do this yet, but you would have to look there to fail HCI command right away instead of waiting for the timeout. Regards Marcel