Date: Wed, 19 Aug 2015 19:10:09 +0200
From: Frederic Weisbecker
To: Andy Lutomirski
Cc: Andy Lutomirski, X86 ML, Sasha Levin, Brian Gerst, Denys Vlasenko, linux-kernel@vger.kernel.org, Oleg Nesterov, Borislav Petkov, Rik van Riel
Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts
Message-ID: <20150819171007.GA21717@lerouge>
References: <20150818221623.GA12858@lerouge> <20150818230235.GA13685@lerouge>

On Tue, Aug 18, 2015 at 04:07:51PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker wrote:
> > On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker wrote:
> >> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> >> >> This fixes a couple minor holes if we took an IRQ very early in syscall
> >> >> processing:
> >> >>
> >> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
> >> >>   was fine), but we could warn if all the debugging options were
> >> >>   set.
> >> >
> >> > So this is fixing issues after your changes that call user_exit() from
> >> > IRQs, right?
> >>
> >> Yes.
> >> Here's an example splat, courtesy of Sasha:
> >>
> >> https://gist.github.com/sashalevin/a006a44989312f6835e7
> >>
> >> >
> >> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> >> > That's where the real issue is.
> >>
> >> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
> >> when entering the kernel for a non-NMI reason.
> >
> > Why? IRQs don't need that! We already have irq_enter()/irq_exit().
> >
>
> Those are certainly redundant.

So? What's the point in duplicating a hook in arch code that core code already has?

> I want to have a real hook to call
> that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
> context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
> exist.

You're not answering _why_ you want that.

>
> > And we don't want to call rcu_user_*() pairs on IRQs, you're
> > introducing a serious performance regression here! And I'm talking about
> > the code that's currently in -tip.
>
> Is there an easy way to fix it? For example, could we figure out what
> makes it take so long and make it faster?

Sure, just remove your arch IRQ hook.

> If we need to, we could
> back out the IRQ bit and change the assertions for 4.3, but I'd rather
> keep the exact context tracking if at all possible.

I have no idea what you mean by exact context tracking here.

But if we ever want to call irq_enter() from arch hooks, and I have no idea why we would ever want to do that since it involves multiplying the complexity by $NR_ARCHS and moving C code to ASM, we need serious reasons!

And that's certainly not something we are going to plan now for next week's merge window.

> >> That means that we can
> >> avoid all of the (expensive!) checks for what context we're in.
> >
> > If you're referring to context tracking, the context check is a per-cpu
> > read. Not something that's usually considered expensive.
>
> In -tip, there aren't even extra branches, except those imposed by the
> user_exit implementation.

No, there is the "call enter_from_user_mode" in the IRQ fast path.

> >
> >
> >> It also means that (other than IRQs, which need further cleanup), we only
> >> switch once per user/kernel switch.
> >
> > ???
>
> In 4.2 and before, we can switch multiple times on the way out of the
> kernel, via SCHEDULE_USER, do_notify_resume, etc. In -tip, we do it
> exactly once no matter what.

That's what we want for syscalls, but not for IRQs.

> >
> >
> >>
> >> The cost for doing should be essentially zero, modulo artifacts from
> >> poor inlining.
> >
> > And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> > implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> > the existing one in irq_enter/exit() (those even imply a lock currently, which will
> > probably be turned to seqcount, but still, full memory barriers...).
> >
> > I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> > concerns current tip:x86/asm).
>
> Why do we need these heavyweight barriers?

Actually they're not full barriers but atomic ones (smp_mb__after_atomic_stuff()).
I suspect we can't do much better given RCU requirements. Still, we don't need to call it twice.

>
> If there's actually a measurable performance hit in IRQs in -tip, then
> can we come up with a better fix?

I'm sure it's very easily measurable.

> For example, we could change all
> the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
> context" and make the IRQ entry do a lighter weight context tracking
> operation.

I don't see what we need to check, actually. Context tracking can be in any state while in IRQ.

>
> But I think I'm still missing something fundamental about the
> performance: why is irq_enter() any faster than user_exit()?
It's slightly faster, at least because it takes care of nesting IRQs, which are likely when softirqs get interrupted. Now of course we wouldn't call user_exit() in this case, but the hook is there in generic code; no need for anything from the arch.