2006-01-27 18:41:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch 1/2] rcu batch tuning

Dipankar Sarma wrote:
>
> +static void force_quiescent_state(struct rcu_data *rdp,
> + struct rcu_ctrlblk *rcp)
> +{
> + int cpu;
> + cpumask_t cpumask;
> + set_need_resched();
> + if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
> + rdp->last_rs_qlen = rdp->qlen;
> + /*
> + * Don't send IPI to itself. With irqs disabled,
> + * rdp->cpu is the current cpu.
> + */
> + cpumask = rcp->cpumask;
> + cpu_clear(rdp->cpu, cpumask);
> + for_each_cpu_mask(cpu, cpumask)
> + smp_send_reschedule(cpu);
> + }
> +}
> [ ... snip ... ]
>
> @@ -92,17 +128,13 @@ void fastcall call_rcu(struct rcu_head *
> rdp = &__get_cpu_var(rcu_data);
> *rdp->nxttail = head;
> rdp->nxttail = &head->next;
> -
> - if (unlikely(++rdp->count > 10000))
> - set_need_resched();
> -
> + if (unlikely(++rdp->qlen > qhimark)) {
> + rdp->blimit = INT_MAX;
> + force_quiescent_state(rdp, &rcu_ctrlblk);
> + }

When ->qlen exceeds qhimark for the first time we send reschedule IPI to
other CPUs and force_quiescent_state() records ->last_rs_qlen = ->qlen.
But we don't reset ->last_rs_qlen when ->qlen goes to 0, this means that
next time we need ++rdp->qlen > qhimark + rsinterval to force other CPUS
to pass quiescent state, no?

Also, it seems to me it's better to have 2 counters, one for length(->donelist)
and another for length(->curlist + ->nxtlist). I think we don't need
force_quiescent_state() when all rcu callbacks are placed in ->donelist,
we only need to increase rdp->blimit in this case.

Oleg.


2006-01-27 23:43:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch 1/2] rcu batch tuning

On Fri, Jan 27, 2006 at 10:57:59PM +0300, Oleg Nesterov wrote:
> Dipankar Sarma wrote:
> >
> > +static void force_quiescent_state(struct rcu_data *rdp,
> > + struct rcu_ctrlblk *rcp)
> > +{
> > + int cpu;
> > + cpumask_t cpumask;
> > + set_need_resched();
> > + if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
> > + rdp->last_rs_qlen = rdp->qlen;
> > + /*
> > + * Don't send IPI to itself. With irqs disabled,
> > + * rdp->cpu is the current cpu.
> > + */
> > + cpumask = rcp->cpumask;
> > + cpu_clear(rdp->cpu, cpumask);
> > + for_each_cpu_mask(cpu, cpumask)
> > + smp_send_reschedule(cpu);
> > + }
> > +}
> > [ ... snip ... ]
> >
> > @@ -92,17 +128,13 @@ void fastcall call_rcu(struct rcu_head *
> > rdp = &__get_cpu_var(rcu_data);
> > *rdp->nxttail = head;
> > rdp->nxttail = &head->next;
> > -
> > - if (unlikely(++rdp->count > 10000))
> > - set_need_resched();
> > -
> > + if (unlikely(++rdp->qlen > qhimark)) {
> > + rdp->blimit = INT_MAX;
> > + force_quiescent_state(rdp, &rcu_ctrlblk);
> > + }
>
> When ->qlen exceeds qhimark for the first time we send reschedule IPI to
> other CPUs and force_quiescent_state() records ->last_rs_qlen = ->qlen.
> But we don't reset ->last_rs_qlen when ->qlen goes to 0, this means that
> next time we need ++rdp->qlen > qhimark + rsinterval to force other CPUS
> to pass quiescent state, no?

Good catch -- this could well explain Lee's continuing to hit
latency problems. Although this would not cause the first
latency event, only subsequent ones, it seems to me that ->last_rs_qlen
should be reset whenever ->blimit is reset.

Dipankar, thoughts?

> Also, it seems to me it's better to have 2 counters, one for length(->donelist)
> and another for length(->curlist + ->nxtlist). I think we don't need
> force_quiescent_state() when all rcu callbacks are placed in ->donelist,
> we only need to increase rdp->blimit in this case.

True, currently the patch keeps the sum of the length of all three lists,
and takes both actions when the sum gets too large. But the only way
you would get unneeded IPIs would be if callback processing was
stalled, but callback generation and grace-period processing was
still proceeding. Seems at first glance to be an unusual corner
case, with the only downside being some extra IPIs. Or am I missing
some aspect?

Thanx, Paul

2006-01-28 17:08:52

by Dipankar Sarma

[permalink] [raw]
Subject: Re: [patch 1/2] rcu batch tuning

On Fri, Jan 27, 2006 at 10:57:59PM +0300, Oleg Nesterov wrote:
> Dipankar Sarma wrote:
> >
> > @@ -92,17 +128,13 @@ void fastcall call_rcu(struct rcu_head *
> > rdp = &__get_cpu_var(rcu_data);
> > *rdp->nxttail = head;
> > rdp->nxttail = &head->next;
> > -
> > - if (unlikely(++rdp->count > 10000))
> > - set_need_resched();
> > -
> > + if (unlikely(++rdp->qlen > qhimark)) {
> > + rdp->blimit = INT_MAX;
> > + force_quiescent_state(rdp, &rcu_ctrlblk);
> > + }
>
> When ->qlen exceeds qhimark for the first time we send reschedule IPI to
> other CPUs and force_quiescent_state() records ->last_rs_qlen = ->qlen.
> But we don't reset ->last_rs_qlen when ->qlen goes to 0, this means that
> next time we need ++rdp->qlen > qhimark + rsinterval to force other CPUS
> to pass quiescent state, no?

Yes. I have fixed it my code.

>
> Also, it seems to me it's better to have 2 counters, one for length(->donelist)
> and another for length(->curlist + ->nxtlist). I think we don't need
> force_quiescent_state() when all rcu callbacks are placed in ->donelist,
> we only need to increase rdp->blimit in this case.

We could, but I am not sure that ->qlen is not a good measure of
grace period not happening. In fact, in the past, long donelists
often resulted in longer grace periods. So, no harm in forcing
reschedule and allowing ksoftirq to relinquish cpu.

Thanks
Dipankar