Received: by 2002:a05:6a10:144:0:0:0:0 with SMTP id 4csp788866pxw; Sat, 9 Apr 2022 00:03:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJygKyvVE1tw7OqfENv5ZzUZ08wCOvVOv57fhuEhT8wK0hBSMo3V5tyeHSOXBB8hwPRxAW0o X-Received: by 2002:a17:906:2b93:b0:6cf:bb48:5a80 with SMTP id m19-20020a1709062b9300b006cfbb485a80mr21314982ejg.681.1649487798616; Sat, 09 Apr 2022 00:03:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649487798; cv=none; d=google.com; s=arc-20160816; b=i4H7qvL29pgYX9yzcUwumtE3uvmeUnSs9LqjFFaBnDt9Nd2N8VlTpZbngV+dw3/Oq6 I9ISxoS8lGdBz1Se8pW1KSsZixgFpzaoy7VbLC7qFaS5iMHamVG5mS7FGkFwaN1e7x/K wS0X/yjgN9B1KbENijqEs0etUPFuPHo+CKbokL44MluyWxdfNhB/+o16a7+qS7AhsH/w mNarByiUD9oOOhHRF3uwRaou2zk91jTYOo/i0KO5GIMJG/TKMdnT2rseYuQaTbRWwg+I ACy0r988bsL4N6LO4Utfmi/zTzXZ0WTnQwZYR++A952fjee4Zb3PzpTLNZ8sVYi982TX C6iA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=OQ3yXPs/ePkmyqPbhNIG1CvTCw2rw1NphQOx38b16uI=; b=TZWNPJ7/jCIORwq/96x1Yfu6/pYxjaPqeCABzIE4hOrFaip/2iFOYbl+2oU3oaa2+8 RbXCjj5V+ijBeOrYtfSW/FCnid4p14D4Kmw+noVRTc/rvZ+vcWl93XPNSek1JoJVJrBp keEiSqjXK0V+PeSTC22v0viO4F+/qkzFbYsMwM5SwgimRt/2+5gW8pp/cQMn5kKtt8Fn CMQ9ebMeHX1y/NiV0Ewyoga8Ae/LoRFMGaRahqtDiv/4SYyE9WfrMXWlxp9J47238ISa iQgVuA38ncbZxLAuFOjB150qzHzuO8QlsiZwI/VdGIdBqtFFq18v4Q4Q8a+j+I1tEIb1 DfpQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=cuJUaTKW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g10-20020a1709061e0a00b006e843669008si2339462ejj.394.2022.04.09.00.02.26; Sat, 09 Apr 2022 00:03:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=cuJUaTKW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237134AbiDHOns (ORCPT + 99 others); Fri, 8 Apr 2022 10:43:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33292 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233339AbiDHOnn (ORCPT ); Fri, 8 Apr 2022 10:43:43 -0400 Received: from mail-io1-xd2e.google.com (mail-io1-xd2e.google.com [IPv6:2607:f8b0:4864:20::d2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7EFDEF1E9E for ; Fri, 8 Apr 2022 07:41:39 -0700 (PDT) Received: by mail-io1-xd2e.google.com with SMTP id g21so10788381iom.13 for ; Fri, 08 Apr 2022 07:41:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=OQ3yXPs/ePkmyqPbhNIG1CvTCw2rw1NphQOx38b16uI=; b=cuJUaTKW54ST5EiyzqertJtD6UBOB75Rl7LRp5oTlOLilLM1b5cNmhXmelDcK8pxff o609683EyCDOtYG5OMxQ1qC9F3WRVoYqS/L4d4piVvR77abxl3PmjhKc7gXn0k6KyOe5 P3sSPvO2afz1ae/Jhl3lPQXf0fKINpu3z4wJc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=OQ3yXPs/ePkmyqPbhNIG1CvTCw2rw1NphQOx38b16uI=; b=VMh0HK2ynA5FFTwNXOYVAh3mGIy5E1yA9GCnYfvKKy3ecw3ywDq/6mneKp28e61K9h O1XTRqRWFXFRb2mvtpS644wL732McbkEFkgdsoZC86UfmheCJgMJSSsUKXyd4i9nFkNi mNVZc6SwFcEgUAsd4gFHqPUghwTSGYTUoVfrffzmgblqOCC1pQIm73QFiiNg6Yfmw3mb QQRXeFrAisYw1e1zifIJHe2hREh0j5F16pVuK0UywZTfCPYsKqeqe2KYold7uBnrb75B ZsniM+GQqkb78I5PouV5Aaiy1o3e0W9b1lfzI0hLZfuUM5BB1hYgHhK5zya+3/+r0qyN o7eA== X-Gm-Message-State: AOAM5327fK5GFcPHK7Js2HiXVW88ETaVvcqTZzf6AylkC5cIJFA9lGlM 0paV0Mx5PEQLNAkV0dxNCmgrL+W+3m8xCqwz0+Dqxg== X-Received: by 2002:a5d:8450:0:b0:64c:cc87:c5fc with SMTP id w16-20020a5d8450000000b0064ccc87c5fcmr8326802ior.190.1649428898763; Fri, 08 Apr 2022 07:41:38 -0700 (PDT) MIME-Version: 1.0 References: <20220408045734.1158817-1-kaleshsingh@google.com> <20220408143444.GC4285@paulmck-ThinkPad-P17-Gen-1> In-Reply-To: <20220408143444.GC4285@paulmck-ThinkPad-P17-Gen-1> From: Joel Fernandes Date: Fri, 8 Apr 2022 10:41:26 -0400 Message-ID: Subject: Re: [PATCH v2] EXP rcu: Move expedited grace period (GP) work to RT kthread_worker To: "Paul E. McKenney" Cc: Kalesh Singh , Suren Baghdasaryan , kernel-team , Tejun Heo , Tim Murray , Wei Wang , Kyle Lin , Chunwei Lu , Lulu Wang , Frederic Weisbecker , Neeraj Upadhyay , Josh Triplett , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , rcu , LKML Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 8, 2022 at 10:34 AM Paul E. McKenney wrote: > > On Fri, Apr 08, 2022 at 06:42:42AM -0400, Joel Fernandes wrote: > > On Fri, Apr 8, 2022 at 12:57 AM Kalesh Singh wrote: > > > > > [...] > > > @@ -334,15 +334,13 @@ static bool exp_funnel_lock(unsigned long s) > > > * Select the CPUs within the specified rcu_node that the upcoming > > > * expedited grace period needs to wait for. > > > */ > > > -static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) > > > +static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp) > > > { > > > int cpu; > > > unsigned long flags; > > > unsigned long mask_ofl_test; > > > unsigned long mask_ofl_ipi; > > > int ret; > > > - struct rcu_exp_work *rewp = > > > - container_of(wp, struct rcu_exp_work, rew_work); > > > struct rcu_node *rnp = container_of(rewp, struct rcu_node, rew); > > > > > > raw_spin_lock_irqsave_rcu_node(rnp, flags); > > > @@ -417,13 +415,119 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) > > > rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false); > > > } > > > > > > +static void rcu_exp_sel_wait_wake(unsigned long s); > > > + > > > +#ifdef CONFIG_RCU_EXP_KTHREAD > > > > Just my 2c: > > > > Honestly, I am not sure if the benefits of duplicating the code to use > > normal workqueues outweighs the drawbacks (namely code complexity, > > code duplication - which can in turn cause more bugs and maintenance > > headaches down the line). The code is harder to read and adding more > > 30 character function names does not help. > > > > For something as important as expedited GPs, I can't imagine a > > scenario where an RT kthread worker would cause "issues". If it does > > cause issues, that's what the -rc cycles and the stable releases are > > for. I prefer to trust the process than take a one-foot-in-the-door > > approach. > > > > So please, can we just keep it simple? > > Yes and no. > > This is a bug fix, but only for those systems that are expecting real-time > response from synchronize_rcu_expedited(). As far as I know, this is only > Android. The rest of the systems are just fine with the current behavior. As far as you know, but are you sure? > In addition, this bug fix introduces significant risks, especially in > terms of performance for throughput-oriented workloads. Could you explain what the risk is? That's the part I did not follow. How can making synchronize_rcu_expedited() work getting priority introduce throughput issues? > So yes, let's do this bug fix (with appropriate adjustment), but let's > also avoid exposing the non-Android workloads to risks from the inevitable > unintended consequences. ;-) I would argue the risk is also adding code complexity and more bugs without clear rationale for why it is being done. There's always risk with any change, but that's the -rc cycles and stable kernels help catch those. I think we should not add more code complexity if it is a theoretical concern. There's also another possible risk - there is a possible hidden problem here that probably the non-Android folks haven't noticed or been able to debug. I would rather just do the right thing. Just my 2c, - Joel > > Thanx, Paul > > > Thanks, > > > > - Joel > > > > > > > +static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp) > > > +{ > > > + struct rcu_exp_work *rewp = > > > + container_of(wp, struct rcu_exp_work, rew_work); > > > + > > > + __sync_rcu_exp_select_node_cpus(rewp); > > > +} > > > + > > > +static inline bool rcu_gp_par_worker_started(void) > > > +{ > > > + return !!READ_ONCE(rcu_exp_par_gp_kworker); > > > +} > > > + > > > +static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp) > > > +{ > > > + kthread_init_work(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus); > > > + /* > > > + * Use rcu_exp_par_gp_kworker, because flushing a work item from > > > + * another work item on the same kthread worker can result in > > > + * deadlock. > > > + */ > > > + kthread_queue_work(rcu_exp_par_gp_kworker, &rnp->rew.rew_work); > > > +} > > > + > > > +static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) > > > +{ > > > + kthread_flush_work(&rnp->rew.rew_work); > > > +} > > > + > > > +/* > > > + * Work-queue handler to drive an expedited grace period forward. > > > + */ > > > +static void wait_rcu_exp_gp(struct kthread_work *wp) > > > +{ > > > + struct rcu_exp_work *rewp; > > > + > > > + rewp = container_of(wp, struct rcu_exp_work, rew_work); > > > + rcu_exp_sel_wait_wake(rewp->rew_s); > > > +} > > > + > > > +static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew) > > > +{ > > > + kthread_init_work(&rew->rew_work, wait_rcu_exp_gp); > > > + kthread_queue_work(rcu_exp_gp_kworker, &rew->rew_work); > > > +} > > > + > > > +static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew) > > > +{ > > > +} > > > +#else /* !CONFIG_RCU_EXP_KTHREAD */ > > > +static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) > > > +{ > > > + struct rcu_exp_work *rewp = > > > + container_of(wp, struct rcu_exp_work, rew_work); > > > + > > > + __sync_rcu_exp_select_node_cpus(rewp); > > > +} > > > + > > > +static inline bool rcu_gp_par_worker_started(void) > > > +{ > > > + return !!READ_ONCE(rcu_par_gp_wq); > > > +} > > > + > > > +static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp) > > > +{ > > > + int cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1); > > > + > > > + INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus); > > > + /* If all offline, queue the work on an unbound CPU. */ > > > + if (unlikely(cpu > rnp->grphi - rnp->grplo)) > > > + cpu = WORK_CPU_UNBOUND; > > > + else > > > + cpu += rnp->grplo; > > > + queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work); > > > +} > > > + > > > +static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) > > > +{ > > > + flush_work(&rnp->rew.rew_work); > > > +} > > > + > > > +/* > > > + * Work-queue handler to drive an expedited grace period forward. > > > + */ > > > +static void wait_rcu_exp_gp(struct work_struct *wp) > > > +{ > > > + struct rcu_exp_work *rewp; > > > + > > > + rewp = container_of(wp, struct rcu_exp_work, rew_work); > > > + rcu_exp_sel_wait_wake(rewp->rew_s); > > > +} > > > + > > > +static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew) > > > +{ > > > + INIT_WORK_ONSTACK(&rew->rew_work, wait_rcu_exp_gp); > > > + queue_work(rcu_gp_wq, &rew->rew_work); > > > +} > > > + > > > +static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew) > > > +{ > > > + destroy_work_on_stack(&rew->rew_work); > > > +} > > > +#endif /* CONFIG_RCU_EXP_KTHREAD */ > > > + > > > /* > > > * Select the nodes that the upcoming expedited grace period needs > > > * to wait for. > > > */ > > > static void sync_rcu_exp_select_cpus(void) > > > { > > > - int cpu; > > > struct rcu_node *rnp; > > > > > > trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("reset")); > > > @@ -435,28 +539,21 @@ static void sync_rcu_exp_select_cpus(void) > > > rnp->exp_need_flush = false; > > > if (!READ_ONCE(rnp->expmask)) > > > continue; /* Avoid early boot non-existent wq. */ > > > - if (!READ_ONCE(rcu_par_gp_wq) || > > > + if (!rcu_gp_par_worker_started() || > > > rcu_scheduler_active != RCU_SCHEDULER_RUNNING || > > > rcu_is_last_leaf_node(rnp)) { > > > - /* No workqueues yet or last leaf, do direct call. */ > > > + /* No worker started yet or last leaf, do direct call. */ > > > sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work); > > > continue; > > > } > > > - INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus); > > > - cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1); > > > - /* If all offline, queue the work on an unbound CPU. */ > > > - if (unlikely(cpu > rnp->grphi - rnp->grplo)) > > > - cpu = WORK_CPU_UNBOUND; > > > - else > > > - cpu += rnp->grplo; > > > - queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work); > > > + sync_rcu_exp_select_cpus_queue_work(rnp); > > > rnp->exp_need_flush = true; > > > } > > > > > > - /* Wait for workqueue jobs (if any) to complete. */ > > > + /* Wait for jobs (if any) to complete. */ > > > rcu_for_each_leaf_node(rnp) > > > if (rnp->exp_need_flush) > > > - flush_work(&rnp->rew.rew_work); > > > + sync_rcu_exp_select_cpus_flush_work(rnp); > > > } > > > > > > /* > > > @@ -622,17 +719,6 @@ static void rcu_exp_sel_wait_wake(unsigned long s) > > > rcu_exp_wait_wake(s); > > > } > > > > > > -/* > > > - * Work-queue handler to drive an expedited grace period forward. > > > - */ > > > -static void wait_rcu_exp_gp(struct work_struct *wp) > > > -{ > > > - struct rcu_exp_work *rewp; > > > - > > > - rewp = container_of(wp, struct rcu_exp_work, rew_work); > > > - rcu_exp_sel_wait_wake(rewp->rew_s); > > > -} > > > - > > > #ifdef CONFIG_PREEMPT_RCU > > > > > > /* > > > @@ -848,20 +934,19 @@ void synchronize_rcu_expedited(void) > > > } else { > > > /* Marshall arguments & schedule the expedited grace period. */ > > > rew.rew_s = s; > > > - INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp); > > > - queue_work(rcu_gp_wq, &rew.rew_work); > > > + synchronize_rcu_expedited_queue_work(&rew); > > > } > > > > > > /* Wait for expedited grace period to complete. */ > > > rnp = rcu_get_root(); > > > wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3], > > > sync_exp_work_done(s)); > > > - smp_mb(); /* Workqueue actions happen before return. */ > > > + smp_mb(); /* Work actions happen before return. */ > > > > > > /* Let the next expedited grace period start. */ > > > mutex_unlock(&rcu_state.exp_mutex); > > > > > > if (likely(!boottime)) > > > - destroy_work_on_stack(&rew.rew_work); > > > + synchronize_rcu_expedited_destroy_work(&rew); > > > } > > > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); > > > > > > base-commit: 42e7a03d3badebd4e70aea5362d6914dfc7c220b > > > -- > > > 2.35.1.1178.g4f1659d476-goog > > >