Date: Fri, 6 Sep 2013 13:38:21 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Frederic Weisbecker
Cc: Eric Dumazet, linux-kernel@vger.kernel.org, mingo@elte.hu,
    laijs@cn.fujitsu.com, dipankar@in.ibm.com, akpm@linux-foundation.org,
    mathieu.desnoyers@efficios.com, josh@joshtriplett.org, niv@us.ibm.com,
    tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
    dhowells@redhat.com, edumazet@google.com, darren@dvhart.com, sbw@mit.edu
Subject: Re: [PATCH] rcu: Is it safe to enter an RCU read-side critical section?
Message-ID: <20130906203821.GX3966@linux.vnet.ibm.com>
In-Reply-To: <20130906185927.GE2706@somewhere>

On Fri, Sep 06, 2013 at 08:59:29PM +0200, Frederic Weisbecker wrote:
> On Fri, Sep 06, 2013 at 10:41:17AM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 06, 2013 at 10:21:28AM -0700, Eric Dumazet wrote:
> > > On Fri, 2013-09-06 at 08:18 -0700, Paul E. McKenney wrote:
> > >
> > > > int rcu_is_cpu_idle(void)
> > > > {
> > > >     int ret;
> > > >
> > > >     preempt_disable();
> > > >     ret = (atomic_read(&__get_cpu_var(rcu_dynticks).dynticks) & 0x1) == 0;
> > > >     preempt_enable();
> > > >     return ret;
> > > > }
> > >
> > > Paul I find this very confusing.
> > >
> > > If caller doesn't have preemption disabled, what could be the meaning of
> > > this rcu_is_cpu_idle() call ?
> > >
> > > Its result is meaningless if suddenly thread is preempted, so what is
> > > the point ?
> > >
> > > Sorry if this is obvious to you.
> >
> > It is a completely fair question. In fact, this might well now be
> > pointing to a bug given NO_HZ_FULL.
> >
> > The assumption is that if you don't have preemption disabled, you had
> > better be running on a CPU that RCU is paying attention to. The rationale
> > involves preemptible RCU.
> >
> > Suppose that you just did rcu_read_lock() on a CPU that RCU is paying
> > attention to. All is well, and rcu_is_cpu_idle() will return false, as
> > expected. Suppose now that it is possible to be preempted and suddenly
> > find yourself running on a CPU that RCU is not paying attention to.
> > This would have the effect of making your RCU read-side critical section
> > be ignored. Therefore, it had better not be possible to be preempted
> > from a CPU to which RCU is paying attention to a CPU that RCU is ignoring.
> >
> > So if rcu_is_cpu_idle() returns false, you had better be guaranteed
> > that whatever CPU you are running on (which might well be a different
> > one than the rcu_is_cpu_idle() was running on) is being watched by RCU.
> >
> > So, Frederic, does this still work with NO_HZ_FULL? If not, I believe
> > we have a bigger problem than the preempt_disable() in rcu_is_cpu_idle()!
>
> Sure it works well, because the scheduler task entrypoints exit those RCU
> extended quiescent states.
>
> Imagine that you're running in an RCU read-side critical section on CPU 0, which
> is not in extended quiescent state. Now you get preempted in the middle of your
> RCU read-side critical section (you called rcu_read_lock() but not yet rcu_read_unlock()).
>
> Later on, the task is woken up to be scheduled on CPU 1. If CPU 1 is in extended
> quiescent state because it runs in userspace, it receives a scheduler IPI,
> then schedule_user() is called at the end of the interrupt and in turn calls rcu_user_exit()
> before the task is resumed to the code it was running on CPU 0, in the middle of
> the RCU read-side critical section.
>
> See, the key here is the rcu_user_exit() that restores the CPU on RCU's state machine.
> There are other possible scheduler entrypoints when a CPU runs in user extended quiescent
> state: exception and syscall entries or even preempt_schedule_irq() in case we receive an irq
> in the kernel while we haven't yet reached the call to rcu_user_exit()... All of these should
> be covered, otherwise you bet RCU would be prompt to warn.
>
> That's why when we call rcu_is_cpu_idle() from an RCU read-side critical section, it's legit even
> if we can be preempted anytime around it.
> And preempt_disable() is probably not even necessary, except perhaps if __get_cpu_var() itself
> relies on non-preemptibility for its own correctness on the address calculation.

Whew!!! ;-)

But the problem for rcu_is_cpu_idle() was not the calls from the scheduler,
but rather those from lockdep. If the overhead is a concern, you could
switch to the primitives I will be supplying for Steven.

							Thanx, Paul
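
For anyone following the thread, the migration scenario Frederic describes
can be sketched as a small user-space model. This is not kernel code and
makes several assumptions: every model_* name is hypothetical, the per-CPU
rcu_dynticks state is replaced by a plain two-element array, and C11
atomics stand in for atomic_read(). The only things taken from the thread
are the parity test from rcu_is_cpu_idle() (an even counter means the CPU
is in an extended quiescent state) and the rule that the scheduler entry
point calls rcu_user_exit() before the task resumes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2 /* hypothetical: two CPUs are enough for the scenario */

/* Hypothetical stand-in for the per-CPU rcu_dynticks state. */
struct model_dynticks {
    atomic_int dynticks; /* even: extended quiescent state, odd: RCU is watching */
};

static struct model_dynticks model_cpu[MODEL_NR_CPUS];

/* Same parity test as the quoted rcu_is_cpu_idle(), with the CPU passed in. */
static bool model_rcu_is_cpu_idle(int cpu)
{
    return (atomic_load(&model_cpu[cpu].dynticks) & 0x1) == 0;
}

/* Models rcu_user_exit(): leave the quiescent state, counter becomes odd. */
static void model_rcu_user_exit(int cpu)
{
    if (model_rcu_is_cpu_idle(cpu))
        atomic_fetch_add(&model_cpu[cpu].dynticks, 1);
}

/* Models rcu_user_enter(): return to userspace, counter becomes even. */
static void model_rcu_user_enter(int cpu)
{
    if (!model_rcu_is_cpu_idle(cpu))
        atomic_fetch_add(&model_cpu[cpu].dynticks, 1);
}

/*
 * Models a scheduler entry point on the destination CPU: the quiescent
 * state is exited first, and only then does the task resume there.
 */
static int model_schedule_task_on(int cpu)
{
    model_rcu_user_exit(cpu);
    assert(!model_rcu_is_cpu_idle(cpu));
    return cpu;
}

/* Returns true if RCU was watching both before and after the migration. */
static bool model_migration_scenario(void)
{
    atomic_store(&model_cpu[0].dynticks, 1); /* CPU 0: in kernel, watched */
    atomic_store(&model_cpu[1].dynticks, 0); /* CPU 1: in userspace, ignored */

    int cpu = 0;
    /* The task would call rcu_read_lock() here, on CPU 0. */
    bool watched_before = !model_rcu_is_cpu_idle(cpu);

    /* Preempted inside the critical section, later woken on CPU 1. */
    cpu = model_schedule_task_on(1);
    bool watched_after = !model_rcu_is_cpu_idle(cpu);

    return watched_before && watched_after;
}
```

Calling model_migration_scenario() from a test harness shows the point of
the argument: the check gives the same answer on whichever CPU the task
resumes on, because the destination CPU has already left its quiescent
state by the time the task runs there. The model says nothing about the
lockdep callers mentioned above.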