Date: Thu, 26 Jun 2008 08:27:28 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Gautham R Shenoy <ego@in.ibm.com>,
       Dhaval Giani <dhaval@linux.vnet.ibm.com>,
       Dipankar Sarma <dipankar@in.ibm.com>, laijs@cn.fujitsu.com,
       Peter Zijlstra <a.p.zijlstra@chello.nl>,
       lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fix rcu vs hotplug race
Message-ID: <20080626152728.GA24972@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20080623103700.GA4043@linux.vnet.ibm.com> <20080623105844.GC28192@elte.hu> <20080623114941.GB3160@in.ibm.com> <20080624110144.GA8695@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080624110144.GA8695@elte.hu>
User-Agent: Mutt/1.5.13 (2006-08-11)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2242
Lines: 47

On Tue, Jun 24, 2008 at 01:01:44PM +0200, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <ego@in.ibm.com> wrote:
> 
> > > hm, not sure; we might just be fighting the symptom, and we might 
> > > now create a silent resource leak instead. Isn't a full RCU quiescent 
> > > state forced (on all CPUs) before a CPU is cleared out of 
> > > cpu_online_map? That way the to-be-offlined CPU should never 
> > > actually show up in rcp->cpumask.
> > 
> > No, this does not happen currently. The rcp->cpumask is always 
> > initialized to cpu_online_map&~nohz_cpu_mask when we start a new 
> > batch. Hence, before the batch ends, if a cpu goes offline we _can_ 
> > have a stale rcp->cpumask until the RCU subsystem has handled its 
> > CPU_DEAD notification.
> > 
> > Thus for a tiny interval, the rcp->cpumask would contain the offlined 
> > CPU. One of the alternatives is probably to handle this using 
> > CPU_DYING notifier instead of CPU_DEAD where we can call 
> > __rcu_offline_cpu().
> > 
> > The WARN_ON() that Dhaval was hitting was because of some cpu-offline 
> > that was called just before we did a local_irq_save inside call_rcu(). 
> > But at that time, the rcp->cpumask was still stale, and hence we ended 
> > up sending a smp_reschedule() to an offlined cpu. So the check may not 
> > create any resource leak.
> 
> the check may not, but the problem it highlights might, and with the 
> patch we'd end up hiding potential problems in this area.
> 
> Paul, what do you think about this mixed CPU hotplug plus RCU workload?

RCU most certainly needs to work correctly in the face of arbitrary sequences
of CPU-hotplug events, and should therefore be tested with arbitrary
CPU-hotplug tests.  And RCU also most certainly needs to refrain from
issuing spurious warning messages that might over time be ignored,
possibly causing someone to miss a real bug.  My concern with this patch
is in the second spurious-warning area.

Not sure I answered the actual question, though...

							Thanx, Paul