Date: Fri, 27 Jan 2006 15:42:31 -0800
From: "Paul E. McKenney"
To: Oleg Nesterov
Cc: Dipankar Sarma, linux-kernel@vger.kernel.org, Andrew Morton, Linus Torvalds
Subject: Re: [patch 1/2] rcu batch tuning
Message-ID: <20060127234231.GD10075@us.ibm.com>
Reply-To: paulmck@us.ibm.com
References: <43DA7B47.11D10B09@tv-sign.ru>
In-Reply-To: <43DA7B47.11D10B09@tv-sign.ru>

On Fri, Jan 27, 2006 at 10:57:59PM +0300, Oleg Nesterov wrote:
> Dipankar Sarma wrote:
> >
> > +static void force_quiescent_state(struct rcu_data *rdp,
> > +			struct rcu_ctrlblk *rcp)
> > +{
> > +	int cpu;
> > +	cpumask_t cpumask;
> > +	set_need_resched();
> > +	if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
> > +		rdp->last_rs_qlen = rdp->qlen;
> > +		/*
> > +		 * Don't send IPI to itself. With irqs disabled,
> > +		 * rdp->cpu is the current cpu.
> > +		 */
> > +		cpumask = rcp->cpumask;
> > +		cpu_clear(rdp->cpu, cpumask);
> > +		for_each_cpu_mask(cpu, cpumask)
> > +			smp_send_reschedule(cpu);
> > +	}
> > +}
>
> [ ... snip ...
]
> >
> > @@ -92,17 +128,13 @@ void fastcall call_rcu(struct rcu_head *
> >  	rdp = &__get_cpu_var(rcu_data);
> >  	*rdp->nxttail = head;
> >  	rdp->nxttail = &head->next;
> > -
> > -	if (unlikely(++rdp->count > 10000))
> > -		set_need_resched();
> > -
> > +	if (unlikely(++rdp->qlen > qhimark)) {
> > +		rdp->blimit = INT_MAX;
> > +		force_quiescent_state(rdp, &rcu_ctrlblk);
> > +	}
>
> When ->qlen exceeds qhimark for the first time, we send a reschedule IPI
> to the other CPUs and force_quiescent_state() records ->last_rs_qlen =
> ->qlen.  But we don't reset ->last_rs_qlen when ->qlen goes to 0, which
> means that next time we need ++rdp->qlen > qhimark + rsinterval to force
> the other CPUs to pass through a quiescent state, no?

Good catch -- this could well explain Lee's continuing to hit latency
problems.  Although this would not cause the first latency event, only
subsequent ones, it seems to me that ->last_rs_qlen should be reset
whenever ->blimit is reset.  Dipankar, thoughts?

> Also, it seems to me it's better to have two counters, one for the length
> of ->donelist and another for the length of ->curlist + ->nxtlist.  I
> think we don't need force_quiescent_state() when all rcu callbacks are
> placed in ->donelist; we only need to increase rdp->blimit in this case.

True -- currently the patch keeps the sum of the lengths of all three
lists, and takes both actions when that sum gets too large.  The only way
you would get unneeded IPIs is if callback processing stalled while
callback generation and grace-period processing were still proceeding.
At first glance that seems an unusual corner case, with the only downside
being some extra IPIs.  Or am I missing some aspect?

						Thanx, Paul