From: Andy Lutomirski
Date: Tue, 18 Aug 2015 15:35:30 -0700
Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts
To: Frederic Weisbecker
Cc: Andy Lutomirski, X86 ML, Sasha Levin, Brian Gerst, Denys Vlasenko,
    "linux-kernel@vger.kernel.org", Oleg Nesterov, Borislav Petkov,
    Rik van Riel
In-Reply-To: <20150818221623.GA12858@lerouge>

On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker wrote:
> On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> This fixes a couple minor holes if we took an IRQ very early in syscall
>> processing:
>>
>> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
>>   was fine), but we could warn if all the debugging options were
>>   set.
>
> So this is fixing issues after your changes that call user_exit() from
> IRQs, right?

Yes. Here's an example splat, courtesy of Sasha:

https://gist.github.com/sashalevin/a006a44989312f6835e7

>
> But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> That's where the real issue is.

In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
when entering the kernel for a non-NMI reason. That means that we can
avoid all of the (expensive!) checks for what context we're in. It
also means that (other than IRQs, which need further cleanup), we only
switch once per user/kernel switch.
The cost for doing so should be essentially zero, modulo artifacts from
poor inlining. IMO the code is much more straightforward than it used
to be, and it has the potential to be quite fast. For one thing, we
never invoke context tracking with IRQs on, and Rik had some profiles
suggesting that a bunch of the overhead involved dealing with repeated
irq flag manipulation.

One way or another, IRQs need to switch from RCU-not-watching to
RCU-watching, and I don't see what's wrong with user_exit for this
purpose. Of course, if user_exit is slow, we should fix that. Also,
this isn't really related to IRQs calling user_exit. It's that IRQs
can recurse into other entries (#GP in Sasha's case) which also
validate the context.

None of the speedups that will be enabled are written yet, but I
strongly suspect they will be soon :)

In my book, the fact that we now have context tracking assertions all
over the place is a good thing. It means we're much less likely to
break it.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC