Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752272AbbGRNM7 (ORCPT ); Sat, 18 Jul 2015 09:12:59 -0400
Received: from mail-wi0-f182.google.com ([209.85.212.182]:36793 "EHLO
        mail-wi0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751195AbbGRNM5 (ORCPT );
        Sat, 18 Jul 2015 09:12:57 -0400
Date: Sat, 18 Jul 2015 15:12:53 +0200
From: Frederic Weisbecker
To: Andy Lutomirski
Cc: Paul McKenney, Sasha Levin, "linux-kernel@vger.kernel.org",
        Peter Zijlstra, X86 ML, Rik van Riel
Subject: Re: Reconciling rcu_irq_enter()/rcu_nmi_enter() with context tracking
Message-ID: <20150718131252.GB1747@lerouge>
References: <20150717042907.GZ3717@linux.vnet.ibm.com>
        <20150717044921.GA18298@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5092
Lines: 102

On Fri, Jul 17, 2015 at 11:59:18AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 16, 2015 at 9:49 PM, Paul E. McKenney wrote:
> > On Thu, Jul 16, 2015 at 09:29:07PM -0700, Paul E. McKenney wrote:
> >> On Thu, Jul 16, 2015 at 06:53:15PM -0700, Andy Lutomirski wrote:
> >> > For reasons that mystify me a bit, we currently track context tracking
> >> > state separately from rcu's watching state. This results in strange
> >> > artifacts: nothing generic causes IRQs to enter CONTEXT_KERNEL, and we
> >> > can nest exceptions inside the IRQ handler (an example would be
> >> > wrmsr_safe failing), and, in -next, we splat a warning:
> >> >
> >> > https://gist.github.com/sashalevin/a006a44989312f6835e7
> >> >
> >> > I'm trying to make context tracking more exact, which will fix this
> >> > issue (the particular splat that Sasha hit shouldn't be possible when
> >> > I'm done), but I think it would be nice to unify all of this stuff.
> >> > Would it be plausible for us to guarantee that RCU state is always in
> >> > sync with context tracking state? If so, we could maybe simplify
> >> > things and have fewer state variables.
> >>
> >> A noble goal. Might even be possible, and maybe even advantageous.
> >>
> >> But it is usually easier to say than to do. RCU really does need to make
> >> some adjustments when the state changes, as do the other subsystems.
> >> It might or might not be possible to do the transitions atomically.
> >> And if the transitions are not atomic, there will still be weird code
> >> paths where (say) the processor is considered non-idle, but RCU doesn't
> >> realize it yet. Such a code path could not safely use rcu_read_lock(),
> >> so you still need RCU to be able to scream if someone tries it.
> >> Contrariwise, if there is a code path where the processor is considered
> >> idle, but RCU thinks it is non-idle, that code path can stall
> >> grace periods. (Yes, not a problem if the code path is short enough.
> >> At least if the underlying VCPU is making progress...)
> >>
> >> Still, I cannot prove that it is impossible, and if it is possible,
> >> then as you say, there might well be benefits.
> >>
> >> > Doing this for NMIs might be weird. Would it make sense to have a
> >> > CONTEXT_NMI that's somehow valid even if the NMI happened while
> >> > changing context tracking state?
> >>
> >> Face it, NMIs are weird. ;-)
> >>
> >> > Thoughts? As it stands, I think we might already be broken for real:
> >> >
> >> > Syscall -> user_exit. Perf NMI hits *during* user_exit. Perf does
> >> > copy_from_user_nmi, which can fault, causing do_page_fault to get
> >> > called, which calls exception_enter(), which can't be a good thing.
> >> >
> >> > RCU is okay (sort of) because of rcu_nmi_enter, but this seems very fragile.
> >>
> >> Actually, I see more cases where people forget irq_enter() than
> >> rcu_nmi_enter().
"We will just nip in quickly and do something without > >> actually letting the irq system know. Oh, and we want some event tracing > >> in that code path." Boom! > >> > >> > Thoughts? As it stands, I need to do something because -tip and thus > >> > -next spews occasional warnings. > >> > >> Tell me more? > > > > And for completeness, RCU also has the following requirements on the > > state-transition mechanism: > > > > 1. It must be possible to reliably sample some other CPU's state. > > This is an energy-efficiency requirement, as RCU is not normally > > permitted to wake up idle CPUs. Nor nohz CPUs, for that matter. > > NOHZ needs this for vtime accounting, too. I think Rik might be > thinking about this. Maybe the underlying state could be shared? > > > > > 2. RCU must be able to track passage through idle and nohz states. > > In other words, if RCU samples at t=0 and finds that the CPU > > is executing (say) in kernel mode, and RCU samples again at > > t=10 and again finds that the CPU is executing in kernel mode, > > RCU needs to be able to determine whether or not that CPU passed > > through idle or nohz betweentimes. > > And RCU can do this for CONTEXT_KERNEL vs CONTEXT_USER because the > context tracking stuff notifies RCU. The think I'm less than happy > with is that we can currently be CONTEXT_USER but still rcu-awake. > This is manageable, but it seems messy. When we interrupt userspace, right? I don't see that much as a problem, until we use a unified context tracking for both RCU and context tracking. > > > > > 3. In some configurations, RCU needs to be able to block entry into > > nohz state, both for idle and userspace. > > > > Hmm. I suppose we could be CONTEXT_USER but still have RCU awake, > although the tick would have to stay on. Well 3) is handled by the tick nohz code so it's still external. 