Date: Thu, 17 Mar 2022 17:22:57 +0100
From: Frederic Weisbecker
To: Zqiang
Cc: paulmck@kernel.org, rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] rcu: Deoffload rdp if rcuop/rcuog kthreads spawn failed
Message-ID: <20220317162257.GA463894@lothringen>
References: <20220314023314.795253-1-qiang1.zhang@intel.com>
In-Reply-To: <20220314023314.795253-1-qiang1.zhang@intel.com>

Hi Zqiang,

Thanks for poking into this! Several comments:

On Mon, Mar 14, 2022 at 10:33:14AM +0800, Zqiang wrote:
> When CONFIG_RCU_NOCB_CPU is enabled and 'rcu_nocbs' is set, the rcuop
> and rcuog kthreads are created. However, creation of either kthread may
> fail; if it does, deoffload the per-CPU rdps that belong to
> rcu_nocb_mask.
>
> Signed-off-by: Zqiang
> ---
> v1->v2:
> Invert the locking dependency order between
> rcu_state.barrier_mutex and hotplug lock.
>
> Holding nocb_gp_kthread_mutex, ensure that
> the nocb_gp_kthread exists.
>
>  kernel/rcu/tree_nocb.h | 63 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 47 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 46694e13398a..4ec96d0c11de 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -972,10 +972,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
>  	}
>  	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
>
> -	if (wake_gp)
> -		wake_up_process(rdp_gp->nocb_gp_kthread);
> -
> -	return 0;
> +	return wake_gp;
>  }
>
>  static long rcu_nocb_rdp_deoffload(void *arg)
> @@ -984,11 +981,18 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	struct rcu_segcblist *cblist = &rdp->cblist;
>  	unsigned long flags;
>  	int ret;
> +	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>
> +	/*
> +	 *When rcuog or rcuop spawn fail, direct call rcu_nocb_rdp_deoffload().
> +	 *due to the target CPU(rdp->cpu) is not online(cpu_online(rdp->cpu)
> +	 *return false) yet. this warning will be triggered.
> +	 */
>  	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());

How about:

	WARN_ON_ONCE((rdp->cpu != raw_smp_processor_id()) && cpu_online(rdp->cpu));

>
>  	pr_info("De-offloading %d\n", rdp->cpu);
>
> +	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);

Please instead lock nocb_gp_kthread_mutex right above the
rdp_gp->nocb_gp_kthread check below. It doesn't look needed before that.

>  	rcu_nocb_lock_irqsave(rdp, flags);
>  	/*
>  	 * Flush once and for all now. This suffices because we are
> @@ -1010,9 +1014,19 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
>  	invoke_rcu_core();
>  	ret = rdp_offload_toggle(rdp, false, flags);

Better use a new wake_gp variable for clarity.
> -	swait_event_exclusive(rdp->nocb_state_wq,
> -			!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> +
> +	if (rdp_gp->nocb_gp_kthread) {
> +		if (ret)
> +			wake_up_process(rdp_gp->nocb_gp_kthread);
> +		swait_event_exclusive(rdp->nocb_state_wq,
> +			!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
>  					SEGCBLIST_KTHREAD_GP));
> +	} else {
> +		rcu_nocb_lock_irqsave(rdp, flags);
> +		rcu_segcblist_clear_flags(&rdp->cblist,
> +			SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> +		rcu_nocb_unlock_irqrestore(rdp, flags);
> +	}

And you can unlock nocb_gp_kthread_mutex here.

>  	/* Stop nocb_gp_wait() from iterating over this structure. */
>  	list_del_rcu(&rdp->nocb_entry_rdp);
>  	/*
> @@ -1030,12 +1044,12 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	 * rcu_nocb_unlock_irqrestore() anymore.
>  	 */
>  	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> -
> +	mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
>  	/* Sanity check */
>  	WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
>
>
> -	return ret;
> +	return 0;
>  }
>
>  int rcu_nocb_cpu_deoffload(int cpu)
> @@ -1043,8 +1057,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
>  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  	int ret = 0;
>
> -	mutex_lock(&rcu_state.barrier_mutex);
>  	cpus_read_lock();
> +	mutex_lock(&rcu_state.barrier_mutex);

Please do the locking order change in a separate patch. It's a significant
change on its own.
>  	if (rcu_rdp_is_offloaded(rdp)) {
>  		if (cpu_online(cpu)) {
>  			ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
> @@ -1068,6 +1082,7 @@ static long rcu_nocb_rdp_offload(void *arg)
>  	struct rcu_segcblist *cblist = &rdp->cblist;
>  	unsigned long flags;
>  	int ret;
> +	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>
>  	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
>  	/*
> @@ -1077,6 +1092,12 @@
>  	if (!rdp->nocb_gp_rdp)
>  		return -EINVAL;
>
> +	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
> +	if (!rdp_gp->nocb_gp_kthread) {
> +		mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> +		return -EINVAL;
> +	}

I believe you don't need to hold nocb_gp_kthread_mutex here. I think a
simple WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread) is enough because it's
unexpected here.

> +
>  	pr_info("Offloading %d\n", rdp->cpu);
>
>  	/*
> @@ -1112,6 +1133,8 @@ static long rcu_nocb_rdp_offload(void *arg)
>  	 * rcu_nocb_unlock() rcu_nocb_unlock()
>  	 */
>  	ret = rdp_offload_toggle(rdp, true, flags);
> +	if (ret)

You can use wake_gp here too.

> +		wake_up_process(rdp_gp->nocb_gp_kthread);

Thanks!