->donelist becomes != NULL only in rcu_process_callbacks().
rcu_process_callbacks() always calls rcu_do_batch() when
->donelist != NULL.
rcu_do_batch() schedules rcu_process_callbacks() again if
->donelist was not flushed entirely.
So ->donelist != NULL means that rcu_tasklet is either
TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
check it in __rcu_pending().
[ This patch was tested with rcutorture.ko, I don't understand
it's output, but it says "End of test: SUCCESS". So if this
patch incorrect blame Paul, not me! ]
Signed-off-by: Oleg Nesterov <[email protected]>
--- 2.6.15/kernel/rcupdate.c~2_DONE 2006-01-08 21:35:21.000000000 +0300
+++ 2.6.15/kernel/rcupdate.c 2006-01-08 21:55:45.000000000 +0300
@@ -454,10 +454,6 @@ static int __rcu_pending(struct rcu_ctrl
if (!rdp->curlist && rdp->nxtlist)
return 1;
- /* This cpu has finished callbacks to invoke */
- if (rdp->donelist)
- return 1;
-
/* The rcu core waits for a quiescent state from the cpu */
if (rdp->quiescbatch != rcp->cur || rdp->qs_pending)
return 1;
On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> ->donelist becomes != NULL only in rcu_process_callbacks().
>
> rcu_process_callbacks() always calls rcu_do_batch() when
> ->donelist != NULL.
>
> rcu_do_batch() schedules rcu_process_callbacks() again if
> ->donelist was not flushed entirely.
>
> So ->donelist != NULL means that rcu_tasklet is either
> TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> check it in __rcu_pending().
>
> [ This patch was tested with rcutorture.ko, I don't understand
> it's output, but it says "End of test: SUCCESS". So if this
> patch incorrect blame Paul, not me! ]
;-)
Did you run the newest version of rcutorture.ko that includes
Vatsa's CPU-hotplug additions?
Thanx, Paul
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.15/kernel/rcupdate.c~2_DONE 2006-01-08 21:35:21.000000000 +0300
> +++ 2.6.15/kernel/rcupdate.c 2006-01-08 21:55:45.000000000 +0300
> @@ -454,10 +454,6 @@ static int __rcu_pending(struct rcu_ctrl
> if (!rdp->curlist && rdp->nxtlist)
> return 1;
>
> - /* This cpu has finished callbacks to invoke */
> - if (rdp->donelist)
> - return 1;
> -
> /* The rcu core waits for a quiescent state from the cpu */
> if (rdp->quiescbatch != rcp->cur || rdp->qs_pending)
> return 1;
>
On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> ->donelist becomes != NULL only in rcu_process_callbacks().
>
> rcu_process_callbacks() always calls rcu_do_batch() when
> ->donelist != NULL.
>
> rcu_do_batch() schedules rcu_process_callbacks() again if
> ->donelist was not flushed entirely.
>
> So ->donelist != NULL means that rcu_tasklet is either
> TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> check it in __rcu_pending().
Do I smell a bug wrt CPU Hotplug here? Basically, I see that we do
a rcu_move_batch of ->curlist and ->nxtlist of the dead CPU. Why not
->donelist? If we have to do a rcu_move_batch of ->donelist also,
then perhaps the ->donelist != NULL check is required in
rcu_pending? This is considering that the RCU tasklet of the dead
CPU is killed (rather than moved over to a different CPU).
--
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
Srivatsa Vaddagiri wrote:
> On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> > ->donelist becomes != NULL only in rcu_process_callbacks().
> >
> > rcu_process_callbacks() always calls rcu_do_batch() when
> > ->donelist != NULL.
> >
> > rcu_do_batch() schedules rcu_process_callbacks() again if
> > ->donelist was not flushed entirely.
> >
> > So ->donelist != NULL means that rcu_tasklet is either
> > TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> > check it in __rcu_pending().
>
> Do I smell a bug wrt CPU Hotplug here? Basically, I see that we do
> a rcu_move_batch of ->curlist and ->nxtlist of the dead CPU. Why not
> ->donelist?
After the quick reading CPU Hotplug code I think you are right, there
is a bug in rcu_offline_cpu(). It should also move ->donelist, it is
lost otherwise.
> If we have to do a rcu_move_batch of ->donelist also,
> then perhaps the ->donelist != NULL check is required in
> rcu_pending?
rcu_move_batch() always adds entries to the ->nxttail, so I think
this patch is correct.
> This is considering that the RCU tasklet of the dead
> CPU is killed (rather than moved over to a different CPU).
Yes, it is killed explicitly in rcu_offline_cpu() via tasklet_kill_immediate().
Note that we can't remove this tasklet_kill_immediate() and rely on
takeover_tasklets(). rcu_process_callbacks() does __get_cpu_var(),
so it can't find orphaned rcu_data anyway if the tasklet was moved
to another cpu.
So, do we need something like this (untested, uncompiled) patch or not?
--- 2.6.15/kernel/rcupdate.c~ 2006-01-09 20:23:32.000000000 +0300
+++ 2.6.15/kernel/rcupdate.c 2006-01-09 20:26:20.000000000 +0300
@@ -355,8 +355,9 @@ static void __rcu_offline_cpu(struct rcu
spin_unlock_bh(&rcp->lock);
rcu_move_batch(this_rdp, rdp->curlist, rdp->curtail);
rcu_move_batch(this_rdp, rdp->nxtlist, rdp->nxttail);
-
+ rcu_move_batch(this_rdp, rdp->donelist, rdp->donetail);
}
+
static void rcu_offline_cpu(int cpu)
{
struct rcu_data *this_rdp = &get_cpu_var(rcu_data);
Oleg.
On Mon, Jan 09, 2006 at 05:51:08PM +0300, Oleg Nesterov wrote:
> Srivatsa Vaddagiri wrote:
> > If we have to do a rcu_move_batch of ->donelist also,
> > then perhaps the ->donelist != NULL check is required in
> > rcu_pending?
>
> rcu_move_batch() always adds entries to the ->nxttail, so I think
> this patch is correct.
Hmm ..adding entries on dead cpu's ->donelist to ->nxtlist of some other CPU
doesnt make sense (it re-triggers graceperiods for all those callbacks which is
not needed). Maybe rcu_move_batch should take that into account and instead add
to ->donelist of current CPU which is processing the CPU_DEAD callback.
In which case, ->donelist != NULL check is still reqd in rcu_pending ?
--
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
Srivatsa Vaddagiri wrote:
>
> On Mon, Jan 09, 2006 at 05:51:08PM +0300, Oleg Nesterov wrote:
> > Srivatsa Vaddagiri wrote:
> > > If we have to do a rcu_move_batch of ->donelist also,
> > > then perhaps the ->donelist != NULL check is required in
> > > rcu_pending?
> >
> > rcu_move_batch() always adds entries to the ->nxttail, so I think
> > this patch is correct.
>
> Hmm ..adding entries on dead cpu's ->donelist to ->nxtlist of some other CPU
> doesnt make sense (it re-triggers graceperiods for all those callbacks which is
> not needed).
Yes, but this is rare event, so ...
> Maybe rcu_move_batch should take that into account and instead add
> to ->donelist of current CPU which is processing the CPU_DEAD callback.
May be you are right,
> In which case, ->donelist != NULL check is still reqd in rcu_pending ?
Yes, in that case it is required. Do you think it is worth to optimize
CPU_DEAD case?
Oleg.
On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> ->donelist becomes != NULL only in rcu_process_callbacks().
>
> rcu_process_callbacks() always calls rcu_do_batch() when
> ->donelist != NULL.
>
> rcu_do_batch() schedules rcu_process_callbacks() again if
> ->donelist was not flushed entirely.
>
> So ->donelist != NULL means that rcu_tasklet is either
> TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> check it in __rcu_pending().
As Vatsa noted, this is needed if the CPU-hotplug case moves
from ->donelist to ->donelist. It could be omitted if CPU-hotplug
instead moves from ->donelist to ->nextlist, as is the case in Oleg's
patch. The extra grace-period delay should not be a problem for the
presumably rare hotplug case, but:
o the extra test in __rcu_pending() should be quite inexpensive,
since the cacheline is already loaded given the earlier tests.
o although tasklet_schedule() looks to be perfectly reliable
right now, and although any bugs in tasklet_schedule() must
be fixed, having RCU leakage be the major symptom of
tasklet_schedule() failure sounds quite unfriendly to me.
So I am not (yet) convinced that this patch is the way to go.
Thanx, Paul
> [ This patch was tested with rcutorture.ko, I don't understand
> it's output, but it says "End of test: SUCCESS". So if this
> patch incorrect blame Paul, not me! ]
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.15/kernel/rcupdate.c~2_DONE 2006-01-08 21:35:21.000000000 +0300
> +++ 2.6.15/kernel/rcupdate.c 2006-01-08 21:55:45.000000000 +0300
> @@ -454,10 +454,6 @@ static int __rcu_pending(struct rcu_ctrl
> if (!rdp->curlist && rdp->nxtlist)
> return 1;
>
> - /* This cpu has finished callbacks to invoke */
> - if (rdp->donelist)
> - return 1;
> -
> /* The rcu core waits for a quiescent state from the cpu */
> if (rdp->quiescbatch != rcp->cur || rdp->qs_pending)
> return 1;
>
"Paul E. McKenney" wrote:
>
> On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> > ->donelist becomes != NULL only in rcu_process_callbacks().
> >
> > rcu_process_callbacks() always calls rcu_do_batch() when
> > ->donelist != NULL.
> >
> > rcu_do_batch() schedules rcu_process_callbacks() again if
> > ->donelist was not flushed entirely.
> >
> > So ->donelist != NULL means that rcu_tasklet is either
> > TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> > check it in __rcu_pending().
>
> As Vatsa noted, this is needed if the CPU-hotplug case moves
> from ->donelist to ->donelist. It could be omitted if CPU-hotplug
> instead moves from ->donelist to ->nextlist, as is the case in Oleg's
> patch. The extra grace-period delay should not be a problem for the
> presumably rare hotplug case, but:
Just to be sure. So do you agree that CPU-hotplug is buggy now (without
that patch) ?
> o the extra test in __rcu_pending() should be quite inexpensive,
> since the cacheline is already loaded given the earlier tests.
Yes, it was a cleanup, not an optimization.
> o although tasklet_schedule() looks to be perfectly reliable
> right now, and although any bugs in tasklet_schedule() must
> be fixed, having RCU leakage be the major symptom of
> tasklet_schedule() failure sounds quite unfriendly to me.
>
> So I am not (yet) convinced that this patch is the way to go.
Ok, I agree.
Oleg.
On Mon, Jan 09, 2006 at 11:31:20PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> > > ->donelist becomes != NULL only in rcu_process_callbacks().
> > >
> > > rcu_process_callbacks() always calls rcu_do_batch() when
> > > ->donelist != NULL.
> > >
> > > rcu_do_batch() schedules rcu_process_callbacks() again if
> > > ->donelist was not flushed entirely.
> > >
> > > So ->donelist != NULL means that rcu_tasklet is either
> > > TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> > > check it in __rcu_pending().
> >
> > As Vatsa noted, this is needed if the CPU-hotplug case moves
> > from ->donelist to ->donelist. It could be omitted if CPU-hotplug
> > instead moves from ->donelist to ->nextlist, as is the case in Oleg's
> > patch. The extra grace-period delay should not be a problem for the
> > presumably rare hotplug case, but:
>
> Just to be sure. So do you agree that CPU-hotplug is buggy now (without
> that patch) ?
Hmmm... So your thought is that __rcu_offline_cpu() moves nxtlist and
curlist, but not donelist, but then returns to rcu_offline_cpu(), which
might well do the tasklet_kill_immediate() before the tasklet completed
processing all of donelist.
Seems plausible to me. If true, your patch adding the following statement
to the ed of __rcu_offline_cpu seems like a reasonable fix:
rcu_move_batch(this_rdp, rdp->donelist, rdp->donetail);
Vatsa, is there something that Oleg and I are missing?
> > o the extra test in __rcu_pending() should be quite inexpensive,
> > since the cacheline is already loaded given the earlier tests.
>
> Yes, it was a cleanup, not an optimization.
>
> > o although tasklet_schedule() looks to be perfectly reliable
> > right now, and although any bugs in tasklet_schedule() must
> > be fixed, having RCU leakage be the major symptom of
> > tasklet_schedule() failure sounds quite unfriendly to me.
> >
> > So I am not (yet) convinced that this patch is the way to go.
>
> Ok, I agree.
Sounds good!
Thanx, Paul
On Mon, Jan 09, 2006 at 11:59:33AM -0800, Paul E. McKenney wrote:
> Hmmm... So your thought is that __rcu_offline_cpu() moves nxtlist and
> curlist, but not donelist, but then returns to rcu_offline_cpu(), which
> might well do the tasklet_kill_immediate() before the tasklet completed
> processing all of donelist.
>
> Seems plausible to me. If true, your patch adding the following statement
> to the ed of __rcu_offline_cpu seems like a reasonable fix:
>
> rcu_move_batch(this_rdp, rdp->donelist, rdp->donetail);
>
> Vatsa, is there something that Oleg and I are missing?
I think this should take care of the CPU Hotplug bug, with the caveat
that the callbacks on ->donelist will wait for additional grace period before
being invoked (which seems ok).
Oleg, do you want to resend the patch after some testing?
- vatsa
Srivatsa Vaddagiri wrote:
>
> On Mon, Jan 09, 2006 at 11:59:33AM -0800, Paul E. McKenney wrote:
> > Hmmm... So your thought is that __rcu_offline_cpu() moves nxtlist and
> > curlist, but not donelist, but then returns to rcu_offline_cpu(), which
> > might well do the tasklet_kill_immediate() before the tasklet completed
> > processing all of donelist.
> >
> > Seems plausible to me. If true, your patch adding the following statement
> > to the ed of __rcu_offline_cpu seems like a reasonable fix:
> >
> > rcu_move_batch(this_rdp, rdp->donelist, rdp->donetail);
> >
> > Vatsa, is there something that Oleg and I are missing?
>
> I think this should take care of the CPU Hotplug bug, with the caveat
> that the callbacks on ->donelist will wait for additional grace period before
> being invoked (which seems ok).
>
> Oleg, do you want to resend the patch after some testing?
Sure. The problem is that I have no idea how to test this change.
However, the patch seems "obviously correct", so I am sending it.
Srivatsa, I am sorry, I forgot to add you on CC: list while re-sending
these cleanups. I hope you can see them on lkml.
Oleg.
On Sun, Jan 08, 2006 at 10:19:24PM +0300, Oleg Nesterov wrote:
> ->donelist becomes != NULL only in rcu_process_callbacks().
>
> So ->donelist != NULL means that rcu_tasklet is either
> TASKLET_STATE_SCHED or TASKLET_STATE_RUN, we don't need to
> check it in __rcu_pending().
>
> [ This patch was tested with rcutorture.ko, I don't understand
> it's output, but it says "End of test: SUCCESS". So if this
> patch incorrect blame Paul, not me! ]
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.15/kernel/rcupdate.c~2_DONE 2006-01-08 21:35:21.000000000 +0300
> +++ 2.6.15/kernel/rcupdate.c 2006-01-08 21:55:45.000000000 +0300
> @@ -454,10 +454,6 @@ static int __rcu_pending(struct rcu_ctrl
> if (!rdp->curlist && rdp->nxtlist)
> return 1;
>
> - /* This cpu has finished callbacks to invoke */
> - if (rdp->donelist)
> - return 1;
> -
> /* The rcu core waits for a quiescent state from the cpu */
> if (rdp->quiescbatch != rcp->cur || rdp->qs_pending)
> return 1;
This may not be a good idea. For example, during cpu hotplug,
a cpu may inherit a set of finished callbacks that need to be
invoked. So, an rcu_pending() check needs to detect that.
Thanks
Dipankar