Date: Mon, 4 May 2015 11:39:06 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Rik van Riel
Cc: Paolo Bonzini, Ingo Molnar, Andy Lutomirski, linux-kernel@vger.kernel.org,
	X86 ML, williams@redhat.com, Andrew Lutomirski, fweisbec@redhat.com,
	Peter Zijlstra, Heiko Carstens, Thomas Gleixner, Linus Torvalds
Subject: Re: question about RCU dynticks_nesting
Message-ID: <20150504183906.GS5381@linux.vnet.ibm.com>
In-Reply-To: <55479749.7070608@redhat.com>

On Mon, May 04, 2015 at 11:59:05AM -0400, Rik van Riel wrote:
> On 05/04/2015 05:26 AM, Paolo Bonzini wrote:
>
> > Isn't this racy?
> >
> >    synchronize_rcu CPU                nohz CPU
> >    ---------------------------------------------------------
> >    set flag = 0
> >                                       read flag = 0
> >                                       return to userspace
> >    set TIF_NOHZ
> >
> > and there's no guarantee that TIF_NOHZ is ever processed by the
> > nohz CPU.
>
> Looking at the code some more, a flag is not going to be enough.
>
> An irq can hit while we are in kernel mode, leading to the
> task's "rcu active" counter being incremented twice.
>
> However, currently the RCU code seems to use a much more
> complex counting scheme, with a different increment for
> kernel/task use, and irq use.
>
> This counter seems to be modeled on the task preempt_counter,
> where we do care about whether we are in task context, irq
> context, or softirq context.
>
> On the other hand, the RCU code only seems to care about
> whether or not a CPU is in an extended quiescent state,
> or is potentially in an RCU critical section.
>
> Paul, what is the reason for RCU using a complex counter,
> instead of a simple increment for each potential kernel/RCU
> entry, like rcu_read_lock() does with CONFIG_PREEMPT_RCU
> enabled?

Heh!  I found out why the hard way.

You see, there are architectures where a CPU can enter an interrupt
level without ever exiting, and perhaps vice versa.  But only if that
CPU is non-idle at the time.  So, when a CPU enters idle, it is
necessary to reset the interrupt nesting to zero.  But that means that
it is in turn necessary to count task-level nesting separately from
interrupt-level nesting, so that we can determine when the CPU goes
idle from a task-level viewpoint.  Hence the use of masks and fields
within the counter.

It -might- be possible to simplify this somewhat, especially now that
we have unified idle loops.  Except that I don't trust the architectures
to be reasonable about this at this point.
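To make the masks-and-fields idea concrete, here is a minimal sketch of
a split nesting counter.  The field width, the names (IRQ_NEST_BITS,
TASK_NEST_VALUE, and so on), and the idle-entry rule are invented for
illustration; this is not the actual rcu_dynticks layout.  Task-level
entries bump a wide upper field, irq entries bump the low bits, so idle
entry can zero a possibly-unbalanced irq count without losing track of
task-level state:

/*
 * Illustrative split nesting counter -- NOT the kernel's actual
 * rcu_dynticks layout.  Task-level nesting lives in the upper bits,
 * irq nesting in the lower bits.
 */
#include <stdio.h>

#define IRQ_NEST_BITS	16
#define IRQ_NEST_MASK	((1UL << IRQ_NEST_BITS) - 1)
#define TASK_NEST_VALUE	(1UL << IRQ_NEST_BITS)	/* one task-level entry */

static unsigned long nesting;	/* per-CPU in real life */

static void task_enter(void) { nesting += TASK_NEST_VALUE; }
static void task_exit(void)  { nesting -= TASK_NEST_VALUE; }
static void irq_enter_(void) { nesting += 1; }
static void irq_exit_(void)  { nesting -= 1; }

/*
 * On idle entry the irq field may be unbalanced, because some
 * architectures enter an interrupt level without ever exiting it.
 * Zero the irq bits but account the task-level field separately,
 * which is what makes the two-field split necessary at all.
 */
static void idle_enter(void)
{
	nesting &= ~IRQ_NEST_MASK;	/* discard stale irq count */
	task_exit();			/* leave task-level context */
}

int main(void)
{
	task_enter();	/* kernel entry from userspace */
	irq_enter_();	/* a normal, balanced interrupt... */
	irq_exit_();
	irq_enter_();	/* ...and one that never "exits" on this arch */
	idle_enter();	/* reset irq field, drop task-level count */
	printf("quiescent: %s\n", nesting == 0 ? "yes" : "no"); /* yes */
	return 0;
}

With a single flat counter, the unbalanced irq_enter_() would leave the
counter nonzero forever and the CPU would never look quiescent; the
split lets idle entry throw the stale irq count away.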
Furthermore, the associated nesting checks do trigger when people are
making certain types of changes to architectures, so they are a useful
debugging tool.  Which is another reason that I am reluctant to
change it.

> In fact, would we be able to simply use tsk->rcu_read_lock_nesting
> as an indicator of whether or not we should bother waiting on that
> task or CPU when doing synchronize_rcu?

Depends on exactly what you are asking.  If you are asking if I could
add a few more checks to preemptible RCU and speed up grace-period
detection in a number of cases, the answer is very likely "yes".  This
is on my list, but not particularly high priority.

If you are asking whether CPU 0 could access ->rcu_read_lock_nesting
of some task running on some other CPU, in theory, the answer is "yes",
but in practice that would require putting full memory barriers in both
rcu_read_lock() and rcu_read_unlock(), so the real answer is "no".
(A rough sketch of the ordering problem appears below.)

Or am I missing your point?

							Thanx, Paul
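To illustrate the barrier cost mentioned above, here is a rough sketch
of what the read-side primitives would have to look like if a remote
CPU were allowed to sample a task's nesting counter directly.  The
names are invented and this is not the kernel's preemptible-RCU
implementation, which avoids these barriers precisely because only the
owning task reads its own counter:

/*
 * Illustration only -- NOT the kernel's rcu_read_lock()/unlock().
 */
static __thread int nesting_sketch;  /* stand-in for ->rcu_read_lock_nesting */

static inline void sketch_rcu_read_lock(void)
{
	nesting_sketch++;
	/*
	 * Full barrier: without it, a remote CPU could observe the
	 * critical section's memory accesses before it observes the
	 * increment, and wrongly treat this task as quiescent.
	 */
	__sync_synchronize();
}

static inline void sketch_rcu_read_unlock(void)
{
	/* Matching full barrier before the decrement becomes visible. */
	__sync_synchronize();
	nesting_sketch--;
}

int main(void)
{
	sketch_rcu_read_lock();
	/* reads of RCU-protected data would go here */
	sketch_rcu_read_unlock();
	return 0;
}

Two full barriers on every read-side critical section would defeat the
point of rcu_read_lock() being cheap, which is why the practical answer
above is "no".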