From: Andy Lutomirski
Date: Fri, 17 Jul 2015 11:59:18 -0700
Subject: Re: Reconciling rcu_irq_enter()/rcu_nmi_enter() with context tracking
To: Paul McKenney
Cc: Sasha Levin, Frédéric Weisbecker, "linux-kernel@vger.kernel.org", Peter Zijlstra, X86 ML, Rik van Riel
In-Reply-To: <20150717044921.GA18298@linux.vnet.ibm.com>
References: <20150717042907.GZ3717@linux.vnet.ibm.com> <20150717044921.GA18298@linux.vnet.ibm.com>

On Thu, Jul 16, 2015 at 9:49 PM, Paul E. McKenney wrote:
> On Thu, Jul 16, 2015 at 09:29:07PM -0700, Paul E. McKenney wrote:
>> On Thu, Jul 16, 2015 at 06:53:15PM -0700, Andy Lutomirski wrote:
>> > For reasons that mystify me a bit, we currently track context tracking
>> > state separately from rcu's watching state. This results in strange
>> > artifacts: nothing generic causes IRQs to enter CONTEXT_KERNEL, and we
>> > can nest exceptions inside the IRQ handler (an example would be
>> > wrmsr_safe failing), and, in -next, we splat a warning:
>> >
>> > https://gist.github.com/sashalevin/a006a44989312f6835e7
>> >
>> > I'm trying to make context tracking more exact, which will fix this
>> > issue (the particular splat that Sasha hit shouldn't be possible when
>> > I'm done), but I think it would be nice to unify all of this stuff.
>> > Would it be plausible for us to guarantee that RCU state is always in
>> > sync with context tracking state?
>> > If so, we could maybe simplify
>> > things and have fewer state variables.
>>
>> A noble goal. Might even be possible, and maybe even advantageous.
>>
>> But it is usually easier to say than to do. RCU really does need to make
>> some adjustments when the state changes, as do the other subsystems.
>> It might or might not be possible to do the transitions atomically.
>> And if the transitions are not atomic, there will still be weird code
>> paths where (say) the processor is considered non-idle, but RCU doesn't
>> realize it yet. Such a code path could not safely use rcu_read_lock(),
>> so you still need RCU to be able to scream if someone tries it.
>> Contrariwise, if there is a code path where the processor is considered
>> idle, but RCU thinks it is non-idle, that code path can stall
>> grace periods. (Yes, not a problem if the code path is short enough.
>> At least if the underlying VCPU is making progress...)
>>
>> Still, I cannot prove that it is impossible, and if it is possible,
>> then as you say, there might well be benefits.
>>
>> > Doing this for NMIs might be weird. Would it make sense to have a
>> > CONTEXT_NMI that's somehow valid even if the NMI happened while
>> > changing context tracking state?
>>
>> Face it, NMIs are weird. ;-)
>>
>> > Thoughts? As it stands, I think we might already be broken for real:
>> >
>> > Syscall -> user_exit. Perf NMI hits *during* user_exit. Perf does
>> > copy_from_user_nmi, which can fault, causing do_page_fault to get
>> > called, which calls exception_enter(), which can't be a good thing.
>> >
>> > RCU is okay (sort of) because of rcu_nmi_enter, but this seems very fragile.
>>
>> Actually, I see more cases where people forget irq_enter() than
>> rcu_nmi_enter(). "We will just nip in quickly and do something without
>> actually letting the irq system know. Oh, and we want some event tracing
>> in that code path." Boom!
>>
>> > Thoughts?
>> > As it stands, I need to do something because -tip and thus
>> > -next spews occasional warnings.
>>
>> Tell me more?
>
> And for completeness, RCU also has the following requirements on the
> state-transition mechanism:
>
> 1. It must be possible to reliably sample some other CPU's state.
>    This is an energy-efficiency requirement, as RCU is not normally
>    permitted to wake up idle CPUs. Nor nohz CPUs, for that matter.

NOHZ needs this for vtime accounting, too. I think Rik might be
thinking about this. Maybe the underlying state could be shared?

>
> 2. RCU must be able to track passage through idle and nohz states.
>    In other words, if RCU samples at t=0 and finds that the CPU
>    is executing (say) in kernel mode, and RCU samples again at
>    t=10 and again finds that the CPU is executing in kernel mode,
>    RCU needs to be able to determine whether or not that CPU passed
>    through idle or nohz betweentimes.

And RCU can do this for CONTEXT_KERNEL vs CONTEXT_USER because the
context tracking stuff notifies RCU. The thing I'm less than happy
with is that we can currently be CONTEXT_USER but still rcu-awake.
This is manageable, but it seems messy.

>
> 3. In some configurations, RCU needs to be able to block entry into
>    nohz state, both for idle and userspace.
>

Hmm. I suppose we could be CONTEXT_USER but still have RCU awake,
although the tick would have to stay on. Grumble.

--Andy
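
Requirements 1 and 2 above can be sketched with a per-CPU sequence
counter that is bumped on every transition, so that another CPU can
read it without waking the target. This is a minimal user-space sketch
in C11, not the kernel's implementation: struct cpu_ctx and the ctx_*
helpers are hypothetical names, and the even/odd convention is an
assumption made for the example. Requirement 3 (blocking entry into
nohz) is not covered.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-CPU state word (an assumption for illustration).
 * Odd value:  the CPU is in a state that RCU must watch (e.g. kernel).
 * Even value: the CPU is quiescent (idle or nohz userspace).
 */
struct cpu_ctx {
	atomic_uint seq;
};

/* Called by the owning CPU when it enters idle/nohz: odd -> even. */
static inline void ctx_enter_quiescent(struct cpu_ctx *c)
{
	atomic_fetch_add(&c->seq, 1);
}

/* Called by the owning CPU when it leaves idle/nohz: even -> odd. */
static inline void ctx_exit_quiescent(struct cpu_ctx *c)
{
	atomic_fetch_add(&c->seq, 1);
}

/* Requirement 1: a remote CPU samples the state with a plain load. */
static inline unsigned int ctx_snapshot(struct cpu_ctx *c)
{
	return atomic_load(&c->seq);
}

static inline bool ctx_is_quiescent(unsigned int snap)
{
	return (snap & 1u) == 0;
}

/*
 * Requirement 2: given an earlier snapshot, decide whether the CPU has
 * been quiescent at any point since. Either it was quiescent when the
 * snapshot was taken, or the counter moved, which implies at least one
 * odd -> even transition in between.
 */
static inline bool ctx_passed_quiescent(struct cpu_ctx *c, unsigned int snap)
{
	unsigned int now = atomic_load(&c->seq);

	return ctx_is_quiescent(snap) || now != snap;
}
```

With this layout, two samples that both say "kernel mode" can still be
told apart: if the counter value differs between them, the CPU passed
through a quiescent state betweentimes.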