Date: Wed, 6 Dec 2023 15:12:46 -0800
From: Andrew Morton
To: Ming Lei, Thomas Gleixner,
    linux-kernel@vger.kernel.org, Keith Busch,
    linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
    Yi Zhang, Guangwu Zhang, Chengming Zhou, Jens Axboe
Subject: Re: [PATCH V4 resend] lib/group_cpus.c: avoid to acquire cpu
    hotplug lock in group_cpus_evenly
Message-Id: <20231206151246.99bbf0f253b85f053bea9199@linux-foundation.org>
In-Reply-To: <20231120120059.ef0614c2295b2102100cb56e@linux-foundation.org>
References: <20231120083559.285174-1-ming.lei@redhat.com>
    <20231120120059.ef0614c2295b2102100cb56e@linux-foundation.org>

On Mon, 20 Nov 2023 12:00:59 -0800 Andrew Morton wrote:

> On Mon, 20 Nov 2023 16:35:59 +0800 Ming Lei wrote:
>
> > group_cpus_evenly() could be part of storage driver's error handler,
> > such as nvme driver, which may happen during CPU hotplug, in which
> > storage queue has to drain its pending IOs because all CPUs associated
> > with the queue are offline and the queue is becoming inactive. And
> > handling IO needs error handler to provide forward progress.
> >
> > Then deadlock is caused:
> >
> > 1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's
> > handler is waiting for inflight IO
> >
> > 2) error handler is waiting for CPU hotplug lock
> >
> > 3) inflight IO can't be completed in blk-mq's CPU hotplug handler because
> > error handling can't provide forward progress.
> >
> > Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(),
> > in which two stage spreads are taken: 1) the 1st stage is over all present
> > CPUs; 2) the end stage is over all other CPUs.
> >
> > Turns out the two stage spread just needs consistent 'cpu_present_mask', and
> > remove the CPU hotplug lock by storing it into one local cache. This way
> > doesn't change correctness, because all CPUs are still covered.
>
> I'm not sure what is the intended merge path for this, but I can do lib/.
>
> Do you think that a -stable backport is needed?  It sounds that way.
>
> If so, are we able to identify a suitable Fixes: target?  That would
> predate f7b3ea8cf72f3 ("genirq/affinity: Move group_cpus_evenly() into
> lib/").  No?

I think this predates 428e211641ed8 ("genirq/affinity: Replace
deprecated CPU-hotplug functions.") also.

I'll slap a cc:stable on it and I'll let you and the -stable
maintainers figure it out.
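
For anyone skimming the thread, here is a rough userspace sketch of the idea in the quoted changelog: read the present mask once into a local copy, then run both spreading stages against that copy so that no lock is needed to keep the two stages consistent. This is not the code from the patch. The 64-bit masks, the round-robin spread() helper and the name group_cpus_sketch() are all hypothetical stand-ins; the real group_cpus_evenly() works on struct cpumask and is NUMA-aware.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SIM 64  /* hypothetical: one bit per CPU in a uint64_t */

/*
 * Hypothetical helper: hand every CPU set in @mask to a group,
 * round-robin, continuing from *next. A simplification of the real
 * spreading logic, which also balances across NUMA nodes.
 */
static void spread(uint64_t mask, unsigned int ngroups,
                   unsigned int *next, uint64_t *groups)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
        if (mask & (UINT64_C(1) << cpu)) {
            groups[*next] |= UINT64_C(1) << cpu;
            *next = (*next + 1) % ngroups;
        }
    }
}

/*
 * Two-stage spread over a local snapshot of the present mask.
 * @present may be changed by someone else after we return; we only
 * read it once, so both stages agree on which CPUs are "present".
 */
void group_cpus_sketch(const uint64_t *present, uint64_t possible,
                       unsigned int ngroups, uint64_t *groups)
{
    unsigned int next = 0;
    uint64_t snap;

    assert(ngroups > 0);

    /* Single read into a local cache; no hotplug lock is taken. */
    snap = *present & possible;

    for (unsigned int i = 0; i < ngroups; i++)
        groups[i] = 0;

    /* Stage 1: CPUs that were present in the snapshot. */
    spread(snap, ngroups, &next, groups);

    /* Stage 2: all other possible CPUs, using the same snapshot. */
    spread(possible & ~snap, ngroups, &next, groups);
}
```

Because stage 2 is computed as "possible minus snapshot", every possible CPU lands in exactly one group no matter how the live present mask changes in the meantime. That is the sense in which the changelog says all CPUs are still covered.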