Date: Mon, 4 Mar 2024 11:15:24 -0500
From: Alan Stern
To: Sam Sun, Greg KH, Tejun Heo
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
    "xrivendell7@gmail.com", hgajjar@de.adit-jv.com,
    quic_ugoswami@quicinc.com, stanley_chang@realtek.com,
    heikki.krogerus@linux.intel.com
Subject: Re: [Bug] INFO: task hung in hub_activate

On Mon, Mar 04, 2024 at 08:10:02PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
>
> We encountered a task hung in function hub_activate(). It was reported
> before by Syzbot several years ago
> (https://groups.google.com/g/syzkaller-lts-bugs/c/_komEgHj03Y/m/rbcVKyLXBwAJ),
> but no repro at that time.
> We have a C repro this time and kernel
> config is attached to this email. The bug report is listed below.

Never mind the rest of the kernel log; I figured out what's going on.
Here are the important parts:

> ppid:8106 flags:0x00000006
> Call Trace:
>
>  context_switch kernel/sched/core.c:5376 [inline]
>  __schedule+0xcea/0x59e0 kernel/sched/core.c:6688
>  __schedule_loop kernel/sched/core.c:6763 [inline]
>  schedule+0xe9/0x270 kernel/sched/core.c:6778
>  schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6835
>  __mutex_lock_common kernel/locking/mutex.c:679 [inline]
>  __mutex_lock+0x509/0x940 kernel/locking/mutex.c:747
>  device_lock include/linux/device.h:992 [inline]
>  usb_deauthorize_interface+0x4d/0x130 drivers/usb/core/message.c:1789
>  interface_authorized_store+0xaf/0x110 drivers/usb/core/sysfs.c:1178
>  dev_attr_store+0x54/0x80 drivers/base/core.c:2366

usb_deauthorize_interface() starts by calling device_lock() on the
usb_interface's parent usb_device.

> ppid:8109 flags:0x00004006
> Call Trace:
>
>  context_switch kernel/sched/core.c:5376 [inline]
>  __schedule+0xcea/0x59e0 kernel/sched/core.c:6688
>  __schedule_loop kernel/sched/core.c:6763 [inline]
>  schedule+0xe9/0x270 kernel/sched/core.c:6778
>  kernfs_drain+0x36c/0x550 fs/kernfs/dir.c:505
>  __kernfs_remove+0x280/0x650 fs/kernfs/dir.c:1465
>  kernfs_remove_by_name_ns+0xb4/0x130 fs/kernfs/dir.c:1673
>  kernfs_remove_by_name include/linux/kernfs.h:623 [inline]
>  remove_files+0x96/0x1c0 fs/sysfs/group.c:28
>  sysfs_remove_group+0x8b/0x180 fs/sysfs/group.c:292
>  sysfs_remove_groups fs/sysfs/group.c:316 [inline]
>  sysfs_remove_groups+0x60/0xa0 fs/sysfs/group.c:308
>  device_remove_groups drivers/base/core.c:2734 [inline]
>  device_remove_attrs+0x192/0x290 drivers/base/core.c:2909
>  device_del+0x391/0xa30 drivers/base/core.c:3813
>  usb_disable_device+0x360/0x7b0 drivers/usb/core/message.c:1416
>  usb_set_configuration+0x1243/0x1c40 drivers/usb/core/message.c:2063
>  usb_deauthorize_device+0xe4/0x110 drivers/usb/core/hub.c:2638
>  authorized_store+0x122/0x140 drivers/usb/core/sysfs.c:747
>  dev_attr_store+0x54/0x80 drivers/base/core.c:2366

Among other things, usb_disable_device() calls device_del() for the
usb_device's child interfaces.

For brevity, let A be the parent usb_device and let B be the child
usb_interface. Then in broad terms, we have:

    CPU 0                                   CPU 1
    -----------------------------           ----------------------------
    usb_deauthorize_device(A)
      device_lock(A)                        usb_deauthorize_interface(B)
      usb_set_configuration(A, -1)            device_lock(A)
        usb_disable_device(B)
          device_del(B)
            sysfs_remove_group(B, intf_attrs)

The problem now is:

 1. The kernfs core (kernfs_drain) on CPU 0 can't remove the
    intf_attrs sysfs attribute group while CPU 1 is in the middle
    of running a callback routine for one of the attribute files
    in that group.

 2. The callback routine on CPU 1 can't grab A's lock while CPU 0
    is holding it.

Result: deadlock.

This seems to be the only case where an interface sysfs callback routine
tries to acquire the parent device's lock. That lock is needed here
because when an interface is deauthorized, the kernel has to unbind the
driver for that interface -- and binding or unbinding a USB interface
driver requires that the parent device's lock be held.

Three ideas stand out. First, the device_lock() call should be
interruptible, because it is called when a user process writes to the
"authorized" attribute file. But that alone won't fix the problem.

Second, we could avoid the deadlock by adding a timeout to this
device_lock() call. But we probably don't want a deauthorize operation
to fail because of a timeout from a contested lock.

Third, this must be a generic problem. It will occur any time a sysfs
attribute callback tries to lock its device while another process is
trying to unregister that device. We faced this sort of problem some
years ago when we were worrying about "suicidal" attributes -- ones
which would unregister their own devices.
I don't remember what the fix was or how it worked. But we need
something like it here.

Greg and Tejun, any ideas? Is it possible somehow for an attribute file
to be removed while its callback is still running?

Alan Stern
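
For illustration only, the two-step cycle described above can be modelled
as a tiny wait-for graph in userspace C. This is a sketch, not kernel
code: every name in it (struct wait_state, has_deadlock(),
reported_state(), after_callback_gives_up()) is hypothetical, and
ATTR_ACTIVE_REF merely stands in for whatever kernfs_drain is waiting on
while the attribute callback runs. It shows that the reported state
contains a cycle, and that the cycle would disappear if the callback on
CPU 1 gave up on A's lock (say, interrupted or timed out) and returned.

```c
#include <stdbool.h>

/* Hypothetical model of the report; none of these names exist in the kernel. */
enum resource { LOCK_A, ATTR_ACTIVE_REF, NUM_RESOURCES };
enum task { TASK_NONE = -1, CPU0_DEAUTH_DEVICE, CPU1_DEAUTH_INTERFACE, NUM_TASKS };

struct wait_state {
	enum task owner[NUM_RESOURCES];	/* task currently holding each resource */
	int waits_for[NUM_TASKS];	/* resource each task is blocked on, or -1 */
};

/*
 * Follow the chain "task -> resource it waits for -> owner of that
 * resource". Arriving back at the starting task means a wait cycle.
 */
bool has_deadlock(const struct wait_state *s, enum task start)
{
	enum task t = start;

	for (int steps = 0; steps < NUM_TASKS; steps++) {
		int res = s->waits_for[t];

		if (res < 0)
			return false;	/* t is not blocked, so the chain ends */
		t = s->owner[res];
		if (t == TASK_NONE)
			return false;	/* the awaited resource is free */
		if (t == start)
			return true;
	}
	return false;
}

/*
 * The state from the report: CPU 0 holds A's lock and waits for the
 * attribute to drain; CPU 1 is inside the attribute callback and waits
 * for A's lock.
 */
struct wait_state reported_state(void)
{
	struct wait_state s;

	s.owner[LOCK_A] = CPU0_DEAUTH_DEVICE;
	s.owner[ATTR_ACTIVE_REF] = CPU1_DEAUTH_INTERFACE;
	s.waits_for[CPU0_DEAUTH_DEVICE] = ATTR_ACTIVE_REF;
	s.waits_for[CPU1_DEAUTH_INTERFACE] = LOCK_A;
	return s;
}

/*
 * The same state after CPU 1 stops waiting for A's lock and its
 * callback returns, which releases the attribute.
 */
struct wait_state after_callback_gives_up(void)
{
	struct wait_state s = reported_state();

	s.waits_for[CPU1_DEAUTH_INTERFACE] = -1;
	s.owner[ATTR_ACTIVE_REF] = TASK_NONE;
	return s;
}
```

Calling has_deadlock() on reported_state() from either task reports a
cycle; on after_callback_gives_up() it does not. The model says nothing
about whether failing the deauthorize operation in that way is
acceptable, which is the open question above.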