Date:   Wed, 1 Dec 2021 13:55:02 +0100
From:   Frederic Weisbecker <frederic@kernel.org>
To:     Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc:     "Paul E . McKenney" <paulmck@kernel.org>,
        LKML <linux-kernel@vger.kernel.org>,
        Uladzislau Rezki <urezki@gmail.com>,
        Boqun Feng <boqun.feng@gmail.com>,
        Josh Triplett <josh@joshtriplett.org>,
        Joel Fernandes <joel@joelfernandes.org>, rcu@vger.kernel.org
Subject: Re: [PATCH 1/6] rcu/nocb: Remove rdp from nocb list when de-offloaded
Message-ID: <20211201125502.GA628470@lothringen>
References: <20211123003708.468409-1-frederic@kernel.org>
 <20211123003708.468409-2-frederic@kernel.org>
 <84ab6b4a-6fc4-be3f-d990-1f46265a46e6@quicinc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <84ab6b4a-6fc4-be3f-d990-1f46265a46e6@quicinc.com>
Precedence: bulk

On Wed, Dec 01, 2021 at 02:55:16PM +0530, Neeraj Upadhyay wrote:
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 2461fe8d0c23..cc1165559177 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -625,7 +625,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >   	 * and the global grace-period kthread are awakened if needed.
> >   	 */
> >   	WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
> > -	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > +	/*
> > +	 * An rdp can be removed from the list after being de-offloaded or added
> > +	 * to the list before being (re-)offloaded. If the below loop happens while
> > +	 * an rdp is de-offloaded and then re-offloaded shortly afterward, we may
> > +	 * shortcut and ignore a part of the rdp list due to racy list iteration.
> > +	 * Fortunately a new run through the entire loop is forced after an rdp is
> > +	 * added here so that such race get quickly fixed.
> > +	 */
> > +	list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
> 
> Can we hit a (unlikely) case where repeated calls to de-offload/offload
> cause this loop to continue for long time?

Probably not, I guess the only situation for that to happen would be:

	 DEOFFLOAD rdp 0
	 OFFLOAD rdp 0
 	 DEOFFLOAD rdp 1
	 OFFLOAD rdp 1
 	 DEOFFLOAD rdp 2
	 OFFLOAD rdp 2
	 etc...

But even then you'd need a very bad luck.

> 
> 
> >   		bool needwake_state = false;
> >   		if (!nocb_gp_enabled_cb(rdp))
> 
> Now that we can probe flags here, without holding the nocb_gp_lock first (
> the case where de-offload and offload happens while we are iterating the
> list); can it cause a WARNING from below code?
> 
> 
> 	WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
> 	rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
> 
> The sequence like this is possible?
> 
> 1. <de-offload>
>     Clear SEGCBLIST_OFFLOADED
> 2. nocb_gp_wait() clears SEGCBLIST_KTHREAD_GP in
> nocb_gp_update_state_deoffloading() and continues to next rdp.
> 3. <offload>
>     rdp_offload_toggle() hasn't been called yet.
> 4. rcuog thread migrates to different CPU, while executing the
> loop in nocb_gp_wait().
> 5. nocb_gp_wait() reaches the tail rdp.
> 6. Current CPU , where  rcog thread is running hasn't observed
> SEGCBLIST_OFFLOADED clearing done in step 1; so, nocb_gp_enabled_cb()
> passes.

This shouldn't happen. If a task migrates from CPU X to CPU Y, CPU Y is
guaranteed to see what CPU X saw due to scheduler ordering.

But you have a good point that checking those flags outside the nocb lock
is asking for trouble. It used to be an optimization but now that the rdp
entries are removed when they are not needed anymore, we should probably
lock unconditionally before the call to nocb_gp_enabled_cb().

Thanks.

> 7. nocb_gp_wait() acquires the rdp's nocb lock and read the state to
> be deoffloaded; however, SEGCBLIST_KTHREAD_GP is not set and
> we hit WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist,
> SEGCBLIST_KTHREAD_GP));
> 
> Thanks
> Neeraj