Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751486AbbHRXIO (ORCPT ); Tue, 18 Aug 2015 19:08:14 -0400 Received: from mail-ob0-f178.google.com ([209.85.214.178]:34852 "EHLO mail-ob0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750867AbbHRXIL (ORCPT ); Tue, 18 Aug 2015 19:08:11 -0400 MIME-Version: 1.0 In-Reply-To: <20150818230235.GA13685@lerouge> References: <20150818221623.GA12858@lerouge> <20150818230235.GA13685@lerouge> From: Andy Lutomirski Date: Tue, 18 Aug 2015 16:07:51 -0700 Message-ID: Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts To: Frederic Weisbecker Cc: Andy Lutomirski , X86 ML , Sasha Levin , Brian Gerst , Denys Vlasenko , "linux-kernel@vger.kernel.org" , Oleg Nesterov , Borislav Petkov , Rik van Riel Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3680 Lines: 96 On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker wrote: > On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote: >> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker wrote: >> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote: >> >> This fixes a couple minor holes if we took an IRQ very early in syscall >> >> processing: >> >> >> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU >> >> was fine), but we could warn if all the debugging options were >> >> set. >> > >> > So this is fixing issues after your changes that call user_exit() from >> > IRQs, right? >> >> Yes. Here's an example splat, courtesy of Sasha: >> >> https://gist.github.com/sashalevin/a006a44989312f6835e7 >> >> > >> > But the IRQs aren't supposed to call user_exit(), they have their own hooks. >> > That's where the real issue is. >> >> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL >> when entering the kernel for a non-NMI reason. > > Why? IRQs don't need that! We already have irq_enter()/irq_exit(). > Those are certainly redundant. I want to have a real hook to call that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ context from CONTEXT_KERNEL" (aka noop), but that doesn't currently exist. > And we don't want to call rcu_user_*() pairs on IRQs, you're > introducing a serious performance regression here! And I'm talking about > the code that's currently in -tip. Is there an easy way to fix it? For example, could we figure out what makes it take so long and make it faster? If we need to, we could back out the IRQ bit and change the assertions for 4.3, but I'd rather keep the exact context tracking if at all possible. > >> That means that we can >> avoid all of the (expensive!) checks for what context we're in. > > If you're referring to context tracking, the context check is a per-cpu > read. Not something that's usually considered expensive. In -tip, there aren't even extra branches, except those imposed by the user_exit implementation. > >> It also means that (other than IRQs, which need further cleanup), we only >> switch once per user/kernel switch. > > ??? In 4.2 and before, we can switch multiple times on the way out of the kernel, via SCHEDULE_USER, do_notify_resume, etc. In -tip, we do it exactly once no matter what. > >> >> The cost for doing should be essentially zero, modulo artifacts from >> poor inlining. > > And modulo rcu_user_*() that do multiple costly atomic_add_return() operations > implying full memory barriers. Plus the unnecessary vtime accounting that doubles > the existing one in irq_enter/exit() (those even imply a lock currently, which will > probably be turned to seqcount, but still, full memory barriers...). > > I'm sorry but I'm going to NACK any code that does that in IRQs (and again that > concerns current tip:x86/asm). Why do we need these heavyweight barriers? If there's actually a measurable performance hit in IRQs in -tip, then can we come up with a better fix? For example, we could change all the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ context" and make the IRQ entry do a lighter weight context tracking operation. But I think I'm still missing something fundamental about the performance: why is irq_enter() any faster than user_exit()? --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/