Date: Wed, 6 Dec 2023 16:41:44 -0800
From: Yury Norov
To: Ming Lei
Cc: Thomas Gleixner, Andrew Morton, linux-kernel@vger.kernel.org,
    Keith Busch, linux-nvme@lists.infradead.org,
    linux-block@vger.kernel.org, Yi Zhang, Guangwu Zhang,
    Chengming Zhou, Jens Axboe
Subject: Re: [PATCH V4 resend] lib/group_cpus.c: avoid to acquire cpu hotplug lock in group_cpus_evenly
References: <20231120083559.285174-1-ming.lei@redhat.com>

Hi Ming,

On Mon, Nov 20, 2023 at 04:35:59PM +0800, Ming Lei wrote:
> group_cpus_evenly() could be part of storage driver's error handler,
> such as nvme driver, when may happen during CPU hotplug, in which
> storage queue has to drain its pending IOs because all CPUs associated
> with the queue are offline and the queue is becoming inactive. And
> handling IO needs error handler to provide forward progress.
>
> Then dead lock is caused:
>
> 1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's
> handler is waiting for inflight IO
>
> 2) error handler is waiting for CPU hotplug lock
>
> 3) inflight IO can't be completed in blk-mq's CPU hotplug handler because
> error handling can't provide forward progress.
>
> Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(),
> in which two stage spreads are taken: 1) the 1st stage is over all present
> CPUs; 2) the end stage is over all other CPUs.
>
> Turns out the two stage spread just needs consistent 'cpu_present_mask', and
> remove the CPU hotplug lock by storing it into one local cache. This way
> doesn't change correctness, because all CPUs are still covered.
>
> Cc: Keith Busch
> Cc: linux-nvme@lists.infradead.org
> Cc: linux-block@vger.kernel.org
> Reported-by: Yi Zhang
> Reported-by: Guangwu Zhang
> Tested-by: Guangwu Zhang
> Reviewed-by: Chengming Zhou
> Reviewed-by: Jens Axboe
> Signed-off-by: Ming Lei
> ---
>  lib/group_cpus.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index aa3f6815bb12..ee272c4cefcc 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -366,13 +366,25 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
>  	if (!masks)
>  		goto fail_node_to_cpumask;
>  
> -	/* Stabilize the cpumasks */
> -	cpus_read_lock();
>  	build_node_to_cpumask(node_to_cpumask);
>  
> +	/*
> +	 * Make a local cache of 'cpu_present_mask', so the two stages
> +	 * spread can observe consistent 'cpu_present_mask' without holding
> +	 * cpu hotplug lock, then we can reduce deadlock risk with cpu
> +	 * hotplug code.
> +	 *
> +	 * Here CPU hotplug may happen when reading `cpu_present_mask`, and
> +	 * we can live with the case because it only affects that hotplug
> +	 * CPU is handled in the 1st or 2nd stage, and either way is correct
> +	 * from API user viewpoint since 2-stage spread is sort of
> +	 * optimization.
> +	 */
> +	cpumask_copy(npresmsk, data_race(cpu_present_mask));

Now that you initialize npresmsk explicitly, you can allocate it with
alloc_cpumask_var() instead of zalloc_cpumask_var(). The same actually
holds for nmsk too, even before this patch. Maybe fix that in a separate
preparatory patch?

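To illustrate the alloc_cpumask_var() point, here is a minimal userspace
sketch. It assumes a hypothetical 64-bit toy_mask_t in place of struct
cpumask, and none of the toy_* names exist in the kernel. The idea it shows
is only that a buffer which is fully overwritten by a copy before its first
read does not need to be zeroed at allocation time, and that the resulting
snapshot stays stable if the source mask changes afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t toy_mask_t;

/*
 * Models alloc_cpumask_var(): the memory is NOT zeroed. That is fine as
 * long as the caller overwrites every bit before the first read.
 */
static toy_mask_t *toy_alloc_mask(void)
{
	return malloc(sizeof(toy_mask_t));
}

/* Models cpumask_copy(): every bit of dst is overwritten from src. */
static void toy_mask_copy(toy_mask_t *dst, const toy_mask_t *src)
{
	*dst = *src;
}

/*
 * Snapshot a "present" mask into freshly allocated, uninitialized memory.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static toy_mask_t *toy_snapshot_present(const toy_mask_t *present)
{
	toy_mask_t *snap;

	assert(present != NULL);

	snap = toy_alloc_mask();
	if (!snap)
		return NULL;

	/* The copy is the initialization, so zeroing would be redundant. */
	toy_mask_copy(snap, present);
	return snap;
}
```

A later change to the source mask (the toy equivalent of a hotplug event)
does not affect the snapshot, which is the consistency property both
spreading stages rely on.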
> +
>  	/* grouping present CPUs first */
>  	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
> -				  cpu_present_mask, nmsk, masks);
> +				  npresmsk, nmsk, masks);
>  	if (ret < 0)
>  		goto fail_build_affinity;
>  	nr_present = ret;
> @@ -387,15 +399,13 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
>  		curgrp = 0;
>  	else
>  		curgrp = nr_present;
> -	cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
> +	cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk);
>  	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
>  				  npresmsk, nmsk, masks);

The first thing the helper does is check whether npresmsk is empty, and
cpumask_andnot() returns false in exactly that case. So, assuming that the
present cpumask in the previous call can't be empty, we can save a few
cycles if we drop the corresponding check in the helper and do something
like this:

	if (cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk) == 0) {
		nr_others = 0;
		goto fail_build_affinity;
	}

	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
				  npresmsk, nmsk, masks);

That's not directly related to this patch, though.

So, if you fix the zalloc_cpumask_var() usage, the patch looks good to me.

Reviewed-by: Yury Norov

>  	if (ret >= 0)
>  		nr_others = ret;
>  
>  fail_build_affinity:
> -	cpus_read_unlock();
> -
>  	if (ret >= 0)
>  		WARN_ON(nr_present + nr_others < numgrps);
>  
> -- 
> 2.41.0
>
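
For reference, the cpumask_andnot() short-circuit suggested above can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: toy_mask_t is a hypothetical 64-bit stand-in for struct
cpumask, toy_group_others() is a made-up placeholder for the second-stage
call, and the helper_calls counter exists purely to show when the expensive
helper would be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t toy_mask_t;

/*
 * Models cpumask_andnot(): dst = a & ~b, returning false if dst ends up
 * empty. dst may alias b, as in the patch.
 */
static bool toy_mask_andnot(toy_mask_t *dst, const toy_mask_t *a,
			    const toy_mask_t *b)
{
	*dst = *a & ~*b;
	return *dst != 0;
}

/* Count set bits; stands in for the real grouping work. */
static int toy_weight(toy_mask_t m)
{
	int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/*
 * Second stage: turn a snapshot of the present mask into "possible but
 * not present" in place, and skip the helper when nothing is left.
 * Returns the number of CPUs handled in this stage.
 */
static int toy_group_others(toy_mask_t *npres, const toy_mask_t *possible,
			    int *helper_calls)
{
	assert(npres && possible && helper_calls);

	/* The return value already says whether the result is empty. */
	if (!toy_mask_andnot(npres, possible, npres))
		return 0;

	(*helper_calls)++;
	return toy_weight(*npres);
}
```

With the emptiness test folded into the andnot call, the helper no longer
needs its own check for this call site, provided the first-stage mask is
known to be non-empty.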