MIME-Version: 1.0
In-Reply-To: <20150818221623.GA12858@lerouge>
References: <ad9154dd60f669e94e60d36d23c3267b2ac4c94d.1439924771.git.luto@kernel.org>
 <20150818221623.GA12858@lerouge>
From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 18 Aug 2015 15:35:30 -0700
Message-ID: <CALCETrVQCi_RZqRSTy9bs0V+RB6cLHVfYq4Ouq_JLMoJePg1zA@mail.gmail.com>
Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
        Sasha Levin <sasha.levin@oracle.com>, Brian Gerst <brgerst@gmail.com>,
        Denys Vlasenko <dvlasenk@redhat.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Oleg Nesterov <oleg@redhat.com>, Borislav Petkov <bp@alien8.de>,
        Rik van Riel <riel@redhat.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2311
Lines: 59

On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> This fixes a couple minor holes if we took an IRQ very early in syscall
>> processing:
>>
>>  - We could enter the IRQ with CONTEXT_USER.  Everything worked (RCU
>>    was fine), but we could warn if all the debugging options were
>>    set.
>
> So this is fixing issues after your changes that call user_exit() from
> IRQs, right?

Yes.  Here's an example splat, courtesy of Sasha:

https://gist.github.com/sashalevin/a006a44989312f6835e7

>
> But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> That's where the real issue is.

In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
when entering the kernel for a non-NMI reason.  That means that we can
avoid all of the (expensive!) checks for what context we're in.  It
also means that, other than for IRQs (which need further cleanup), we
only switch context once per user/kernel transition.
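
As a rough illustration of that invariant, here is a standalone userspace
sketch, not kernel code.  CONTEXT_KERNEL and CONTEXT_USER mirror the
kernel's names, but struct model_cpu and every model_* function are
hypothetical and exist only for this example.  The point is that the entry
hook never asks which context it is in: it sets the state unconditionally
and keeps only a debug assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* State names mirror the kernel's; everything else is a made-up model. */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

struct model_cpu {
	enum ctx_state state;	/* what context tracking believes */
	unsigned int switches;	/* number of state changes so far */
};

/* Hypothetical hook run once on every non-NMI entry from user mode. */
static void model_enter_from_user(struct model_cpu *cpu)
{
	/* Debug-only validation; it does not steer control flow. */
	assert(cpu->state == CONTEXT_USER);

	/* Unconditional switch: no "which context are we in?" query. */
	cpu->state = CONTEXT_KERNEL;
	cpu->switches++;
}

/* Hypothetical hook run once on the way back to user mode. */
static void model_exit_to_user(struct model_cpu *cpu)
{
	assert(cpu->state == CONTEXT_KERNEL);
	cpu->state = CONTEXT_USER;
	cpu->switches++;
}

/*
 * One syscall round trip: user -> kernel -> user.
 * Returns true if the syscall body observed CONTEXT_KERNEL.
 */
bool model_syscall(struct model_cpu *cpu)
{
	bool saw_kernel;

	model_enter_from_user(cpu);
	saw_kernel = (cpu->state == CONTEXT_KERNEL);
	model_exit_to_user(cpu);

	return saw_kernel;
}
```

In this model a round trip costs exactly two state changes, one in each
direction, which is the "switch once per transition" property described
above.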

The cost of doing so should be essentially zero, modulo artifacts from
poor inlining.  IMO the code is much more straightforward than it used
to be, and it has the potential to be quite fast.  For one thing, we
never invoke context tracking with IRQs on, and Rik had some profiles
suggesting that a bunch of the overhead came from repeated IRQ flag
manipulation.

One way or another, IRQs need to switch from RCU-not-watching to
RCU-watching, and I don't see what's wrong with user_exit for this
purpose.  Of course, if user_exit is slow, we should fix that.

Also, this isn't really related to IRQs calling user_exit.  It's that
IRQs can recurse into other entries (#GP in Sasha's case) which also
validate the context.
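
The hole can be sketched with a small userspace model.  This is an
illustration under assumptions, not the actual entry code: struct
model_cpu and the model_* functions are hypothetical names.  It assumes
that an IRQ arriving during early syscall processing interrupts kernel
code and so does not switch context itself, and that the nested entry
taken from the IRQ handler validates the context.

```c
#include <stdbool.h>

/* State names mirror the kernel's; everything else is a made-up model. */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

struct model_cpu {
	enum ctx_state state;	/* what context tracking believes */
	bool irqs_on;		/* are interrupts enabled? */
	bool irq_pending;	/* is an IRQ waiting to be delivered? */
	unsigned int warnings;	/* failed context validations */
};

/* Nested entry (a fault taken from kernel code) validates the context. */
static void model_nested_entry(struct model_cpu *cpu)
{
	if (cpu->state != CONTEXT_KERNEL)
		cpu->warnings++;
}

/*
 * Deliver a pending IRQ if interrupts are on.  The IRQ interrupted
 * kernel code, so it does not switch context; its handler then
 * recurses into another entry that checks the state.
 */
static void model_irq_window(struct model_cpu *cpu)
{
	if (cpu->irqs_on && cpu->irq_pending) {
		cpu->irq_pending = false;
		model_nested_entry(cpu);
	}
}

/*
 * Model the start of one syscall and return the number of warnings.
 * track_before_irqs_on selects the ordering: context tracking first
 * (the patched order) or interrupts first (the old order).
 */
unsigned int model_syscall(bool track_before_irqs_on, bool irq_pending)
{
	struct model_cpu cpu = {
		.state = CONTEXT_USER,
		.irqs_on = false,
		.irq_pending = irq_pending,
		.warnings = 0,
	};

	if (track_before_irqs_on) {
		cpu.state = CONTEXT_KERNEL;
		cpu.irqs_on = true;
		model_irq_window(&cpu);
	} else {
		cpu.irqs_on = true;
		model_irq_window(&cpu);	/* IRQ lands while still CONTEXT_USER */
		cpu.state = CONTEXT_KERNEL;
	}

	return cpu.warnings;
}
```

With interrupts enabled first and an IRQ pending, the model records one
failed validation; with context tracking done first it records none.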

None of the speedups that will be enabled are written yet, but I
strongly suspect they will be soon :)

In my book, the fact that we now have context tracking assertions all
over the place is a good thing.  It means we're much less likely to
break it.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC