Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E48FC25B4E for ; Tue, 24 Jan 2023 16:58:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234083AbjAXQ6c (ORCPT ); Tue, 24 Jan 2023 11:58:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233451AbjAXQ6b (ORCPT ); Tue, 24 Jan 2023 11:58:31 -0500 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9EB0B45BC1 for ; Tue, 24 Jan 2023 08:58:28 -0800 (PST) Received: by mail-qt1-x832.google.com with SMTP id s4so13594872qtx.6 for ; Tue, 24 Jan 2023 08:58:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=4qUiDe46C6VRX7ZHZU9nb0H49/VoZpIdfPJmqtDQGds=; b=dZLjr/KYxIvFemNW2IsflLazv16AGJ4Fictk81LIIBucOqK80TE4ZdQwUDTMCFirT2 ti88u8/FV3ud464/GMC9gY2V1+hDnRTJci4xfZO5CQ1KSLNf16H5AnyGzk8x6aAF0blh Rml9MRKRq/eyzi2ZrtAij6Ws2ZcT+ojfBfy38= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=4qUiDe46C6VRX7ZHZU9nb0H49/VoZpIdfPJmqtDQGds=; b=RIDqADR3A/c+oAXOJTDOpzuDTN7cPF+o/T+W7Yk1rIN7ysGXz+IMN6Ck5TZJ8rEHqa xf4DLqNKPizJzs6yI5bKY3qID/Lea5WYQpQRn1Oh4P2IqmLdjokYAHjwthvEOZy8vIyX fADcm3aLSM3oZrC8LdoV6PgLwFQ1WxHxsgvogcM5nLDx0tvHewbnKImpVmeQMDQz8MIH c+EDLrlMVKjDUr2CWLnaoElKKkMrYV7E69cEvVr7UWaG/emG8SVhDqdNPbOJ9jTF35mo SCN7ZV12H+3+hq47YWvoEKiKFZqiM0NkjxQO57JRs1rSN996r+0xaoJCnbSpkYE1r8t5 +AOQ== X-Gm-Message-State: AFqh2kre/+0IZiusankixCWyHouNxFdNSUa18gQfMCARvA2blx5ZE6EZ i0tGW7ejLPSpsfQRQ+e58eSZmg== X-Google-Smtp-Source: AMrXdXsG8yos8UdZcq3YT2hj5MtqT79VpjjiRIn2PzdIsXHIA3p0yrq7MaW0BO5X5trciymOBLcUVw== X-Received: by 2002:ac8:44c6:0:b0:3b6:3272:960e with SMTP id b6-20020ac844c6000000b003b63272960emr42175379qto.54.1674579507583; Tue, 24 Jan 2023 08:58:27 -0800 (PST) Received: from localhost (129.239.188.35.bc.googleusercontent.com. [35.188.239.129]) by smtp.gmail.com with ESMTPSA id x12-20020ac87ecc000000b0039cc0fbdb61sm1532332qtj.53.2023.01.24.08.58.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Jan 2023 08:58:26 -0800 (PST) Date: Tue, 24 Jan 2023 16:58:26 +0000 From: Joel Fernandes To: Frederic Weisbecker Cc: "Zhang, Qiang1" , "Paul E. McKenney" , quic_neeraju@quicinc.com, rcu@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] rcu: Remove impossible wakeup rcu GP kthread action from rcu_report_qs_rdp() Message-ID: References: <375EBA2F-F47A-4BC6-B4D4-2E96D5D15AB6@joelfernandes.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Frederic, On Mon, Jan 23, 2023 at 10:11:15PM +0100, Frederic Weisbecker wrote: > On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote: > > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker wrote: > > > > > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote: > > >>> What am I missing? > > >> > > >> That the acceleration is also done by __note_gp_changes() once the > > >> grace period ends anyway, so if any acceleration was missed as you > > >> say, it will be done anyway. > > >> > > >> Also it is done by scheduler tick raising softirq: > > >> > > >> rcu_pending() does this: > > >> /* Has RCU gone idle with this CPU needing another grace period? */ > > >> if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) && > > >> !rcu_rdp_is_offloaded(rdp) && > > >> !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > > >> return 1; > > >> > > >> and rcu_core(): > > >> /* No grace period and unregistered callbacks? */ > > >> if (!rcu_gp_in_progress() && > > >> rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) { > > >> rcu_nocb_lock_irqsave(rdp, flags); > > >> if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL)) > > >> rcu_accelerate_cbs_unlocked(rnp, rdp); > > >> rcu_nocb_unlock_irqrestore(rdp, flags); > > >> } > > >> > > >> So, I am not sure if you need needacc at all. Those CBs that have not > > >> been assigned grace period numbers will be taken care off :) > > > > > > But that's only when there is no grace period pending, so it can't happen while > > > we report a QS. > > > > > > OTOH without the needacc, those callbacks waiting to be accelerated would be > > > eventually processed but only on the next tick following the end of a grace > > > period...if none has started since then. So if someone else starts a new GP > > > before the current CPU, we must wait another GP, etc... > > > > > > That's potentially dangerous. > > > > Waiting for just one more GP cannot be dangerous IMO. Anyway there is no > > guarantee that callback will run immediately at end of GP, there may be one or > > more GPs before callback can run, if I remember correctly. That is by > > design.. but please correct me if my understanding is different from yours. > > It's not bound to just one GP. If you have N CPUs flooding callbacks for a > long while, your CPU has 1/N chances to be the one starting the next GP on > each turn. Relying on the acceleration to be performed only when no GP is > running may theoretically starve your callbacks forever. I tend to agree with you, however I was mostly referring to the "needac". The theoretical case is even more theoretical because you have to be half way through the transition _and_ flooding at the same time such that there is not a chance for RCU to be idle for a long period of time (which we don't really see in our systems per-se). But I agree, even if theoretical, maybe better to handle it. > > > And unfortunately we can't do the acceleration from __note_gp_changes() > > > due to lock ordering restrictions: nocb_lock -> rnp_lock > > > Yeah I was more going for the fact that if the offload to deoffload transition completed before note_gp_changes() was called, which obviously we cannot assume... But yeah for the 'limbo between offload to not offload state', eventually RCU goes idle, the scheduling clock interrupt (via rcu_pending()) will raise softirq or note_gp_changes() will accelerate. Which does not help the theoretical flooding case above, but still... > > > > Ah. This part I am not sure. Appreciate if point me to any old archive > > links or documentation detailing that, if possible… > > It's not documented but the code in nocb_gp_wait() or nocb_cb_wait() has > that locking order for example. > > Excerpt: > > rcu_nocb_lock_irqsave(rdp, flags); if (rcu_segcblist_nextgp(cblist, > &cur_gp_seq) && rcu_seq_done(&rnp->gp_seq, cur_gp_seq) && > raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */ > needwake_gp = rcu_advance_cbs(rdp->mynode, rdp); > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ } > > Thanks. Ah I know what you mean now, in other words there is a chance of an ABBA deadlock if we do not acquire the locks in the correct order. Thank you for clarifying. Indeed rcutree_migrate_callbacks() also confirms it. I guess to Frederic's point, another case other than the CB flooding, that can starve CB accelerations is if the system is in the 'limbo between offload and de-offload' state for a long period of time. In such a situation also, neither note_gp_changes(), nor the scheduling-clock interrupt via softirq will help accelerate CBs. Whether such a limbo state is possible indefinitely, I am not sure... Thanks a lot for the discussions and it is always a learning experience talking about these...! thanks, - Joel