Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp4135238iog; Tue, 21 Jun 2022 12:49:18 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sARtPzEKg53nBI1GXc3xnnWys+d5s6w/rw7MqRJWNus7Z0XJvAfmw1qEQ4QO4Ot4C1v45u X-Received: by 2002:a17:90a:eb0c:b0:1ec:c985:bc06 with SMTP id j12-20020a17090aeb0c00b001ecc985bc06mr5624686pjz.30.1655840958746; Tue, 21 Jun 2022 12:49:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655840958; cv=none; d=google.com; s=arc-20160816; b=G79fc5qLWdGr+QAN+g0LxTdJ41bzZhapH4YItf2LThwC4eWE9G9nJJ4ONUJjSTr0J7 jlJvlI7mSHfvcfLQYzZ2BgB4o1Cku54jj1e4ZgP0jCk2ToHPLd0I1QHzTDcPxM8OiTJj R71UDG3ihXQoyT1eIVHUfd3sTnj91zL58Pe8y+jxpfvGA6JYtE+qsTmpr83QTZp+xnSi E463HsQ3Ds7m4M2M4FoZiQPxiXLICocp+sDJPuy7AfWFhdKJH/hpN1ozAySy6t8meT1X 8Qxh/asg3bx7NfE7CNLicC80rvd5nhipmiL9yLnFFG7M1gTndbrz27xcnUAb1FOgMMpX J9+g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=INh+4krfD39xvde28L65iXkRMDb0Vl8OJLWvZM7CN1Q=; b=i8uMPreZU/oJD1LMCR9onID44Mopaw8XqMVtAevzM3pOAeYRs5uD22I9L3KMG/yM39 tJd9Zh7aGOb9tXdwTBbq3bLWcI1OCgqmiJVC3llWPlnGokVqPemZJ4JLV4kiGWmi0Uct kRSQYt9yFzeYzHufvz5HSUuSGb4wmXc1lQ7hXR3fszKh+f3gz0l8c+McWcdbyiO1OiT2 78Zw1qoxtNaLujGue7S1NxSAw0wcsunFkwzNjUiz98qLuAymXwPA9dIQ93ZLkys4dJIq Js5Ant10v4zXBAJG8Ulr576P4bkvR6X3NgxL/CAuaM+0omze8No229ghVViKTwlYR/36 ElGQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=ae+P9Ndv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d21-20020a170902aa9500b0016a29155e3asi6251520plr.353.2022.06.21.12.49.07; Tue, 21 Jun 2022 12:49:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=ae+P9Ndv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354017AbiFUTed (ORCPT + 99 others); Tue, 21 Jun 2022 15:34:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354012AbiFUTec (ORCPT ); Tue, 21 Jun 2022 15:34:32 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 91FA02229F for ; Tue, 21 Jun 2022 12:34:31 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 29A7561798 for ; Tue, 21 Jun 2022 19:34:31 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8EDDAC3411C; Tue, 21 Jun 2022 19:34:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1655840070; bh=JqOVJ9WxigD2WJ71Jncl66ZjSZFq5vF7VG2S7PJYiW0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=ae+P9NdveG66dRlRPUZ4jBeGo8H8r607tNRd8tNebkRdO4pLBXMXDCTN5WQLEBqRY sJzGebHANRM3WfLlk2UByyD2Nq8YCSwbbhMnDeLOlpC4Rp9ro2id6gntNLkcWruZD7 zblSfIjwvwSJROG8FI/zp+7LmHCRKGsR9s8hgREU= Date: Tue, 21 Jun 2022 21:34:26 +0200 From: Greg KH To: "zhangwensheng (E)" Cc: rafael@kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com Subject: Re: [PATCH -next] driver core: fix deadlock in __driver_attach Message-ID: References: <20220608094355.3298420-1-zhangwensheng5@huawei.com> <4e5e9276-e32d-0903-1f4e-20880cc72d82@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <4e5e9276-e32d-0903-1f4e-20880cc72d82@huawei.com> X-Spam-Status: No, score=-7.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org A: http://en.wikipedia.org/wiki/Top_post Q: Were do I find info about this thing called top-posting? A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? A: No. Q: Should I include quotations after my reply? http://daringfireball.net/2007/07/on_top On Thu, Jun 16, 2022 at 04:00:58PM +0800, zhangwensheng (E) wrote: > sorry that I didn't see your reply. > it is real not potential, I have triggered this problem successfully and > proven that this change can fix it. > > stack like commit b232b02bf3c2 ("driver core: fix deadlock in > __device_attach"). > list below: > ??? In __driver_attach function, The lock holding logic is as follows: > ??? ... > ??? __driver_attach > ??? if (driver_allows_async_probing(drv)) > ????? device_lock(dev)????? // get lock dev > ??????? async_schedule_dev(__driver_attach_async_helper, dev); // func > ????????? async_schedule_node > ??????????? async_schedule_node_domain(func) > ????????????? entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC); > ????????????? /* when fail or work limit, sync to execute func, but > ???????????????? __driver_attach_async_helper will get lock dev as > ???????????????? will, which will lead to A-A deadlock.? */ > ????????????? if (!entry || atomic_read(&entry_count) > MAX_WORK) { > ??????????????? func; > ????????????? else > ??????????????? queue_work_node(node, system_unbound_wq, &entry->work) > ????? device_unlock(dev) > > ??? As above show, when it is allowed to do async probes, because of > ??? out of memory or work limit, async work is not be allowed, to do > ??? sync execute instead. it will lead to A-A deadlock because of > ??? __driver_attach_async_helper getting lock dev. > > ??? Because it's logic is same as commit b232b02bf3c2 ("driver core: fix > deadlock > ??? in __device_attach"),? I simplify the description. > > > Reproduce: > and it can be reproduce by make the condition > (if (!entry || atomic_read(&entry_count) > MAX_WORK)) untenable, like below: > > [? 370.785650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [? 370.787154] task:swapper/0?????? state:D stack:??? 0 pid:??? 1 ppid:???? > 0 flags:0x00004000 > [? 370.788865] Call Trace: > [? 370.789374]? > [? 370.789841]? __schedule+0x482/0x1050 > [? 370.790613]? schedule+0x92/0x1a0 > [? 370.791290]? schedule_preempt_disabled+0x2c/0x50 > [? 370.792256]? __mutex_lock.isra.0+0x757/0xec0 > [? 370.793158]? __mutex_lock_slowpath+0x1f/0x30 > [? 370.794079]? mutex_lock+0x50/0x60 > [? 370.794795]? __device_driver_lock+0x2f/0x70 > [? 370.795677]? ? driver_probe_device+0xd0/0xd0 > [? 370.796576]? __driver_attach_async_helper+0x1d/0xd0 > [? 370.797318]? ? driver_probe_device+0xd0/0xd0 > [? 370.797957]? async_schedule_node_domain+0xa5/0xc0 > [? 370.798652]? async_schedule_node+0x19/0x30 > [? 370.799243]? __driver_attach+0x246/0x290 > [? 370.799828]? ? driver_allows_async_probing+0xa0/0xa0 > [? 370.800548]? bus_for_each_dev+0x9d/0x130 > [? 370.801132]? driver_attach+0x22/0x30 > [? 370.801666]? bus_add_driver+0x290/0x340 > [? 370.802246]? driver_register+0x88/0x140 > [? 370.802817]? ? virtio_scsi_init+0x116/0x116 > [? 370.803425]? scsi_register_driver+0x1a/0x30 > [? 370.804057]? init_sd+0x184/0x226 > [? 370.804533]? do_one_initcall+0x71/0x3a0 > [? 370.805107]? kernel_init_freeable+0x39a/0x43a > [? 370.805759]? ? rest_init+0x150/0x150 > [? 370.806283]? kernel_init+0x26/0x230 > [? 370.806799]? ret_from_fork+0x1f/0x30 > > And my change can fix it. Ok, please put that type of information in the changelog text. thanks, greg k-h