Date: Wed, 19 Aug 2015 19:10:09 +0200
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
        Sasha Levin <sasha.levin@oracle.com>, Brian Gerst <brgerst@gmail.com>,
        Denys Vlasenko <dvlasenk@redhat.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Oleg Nesterov <oleg@redhat.com>, Borislav Petkov <bp@alien8.de>,
        Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling
 interrupts
Message-ID: <20150819171007.GA21717@lerouge>
References: <ad9154dd60f669e94e60d36d23c3267b2ac4c94d.1439924771.git.luto@kernel.org>
 <20150818221623.GA12858@lerouge>
 <CALCETrVQCi_RZqRSTy9bs0V+RB6cLHVfYq4Ouq_JLMoJePg1zA@mail.gmail.com>
 <20150818230235.GA13685@lerouge>
 <CALCETrVqA0JVxZeD+7vMD-9xmF8udXw5RO0fwQA2X45DSVqM0Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrVqA0JVxZeD+7vMD-9xmF8udXw5RO0fwQA2X45DSVqM0Q@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5090
Lines: 135

On Tue, Aug 18, 2015 at 04:07:51PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> >> >> This fixes a couple minor holes if we took an IRQ very early in syscall
> >> >> processing:
> >> >>
> >> >>  - We could enter the IRQ with CONTEXT_USER.  Everything worked (RCU
> >> >>    was fine), but we could warn if all the debugging options were
> >> >>    set.
> >> >
> >> > So this is fixing issues after your changes that call user_exit() from
> >> > IRQs, right?
> >>
> >> Yes.  Here's an example splat, courtesy of Sasha:
> >>
> >> https://gist.github.com/sashalevin/a006a44989312f6835e7
> >>
> >> >
> >> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> >> > That's where the real issue is.
> >>
> >> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
> >> when entering the kernel for a non-NMI reason.
> >
> > Why? IRQs don't need that! We already have irq_enter()/irq_exit().
> >
> 
> Those are certainly redundant.

So? What's the point in duplicating a hook in arch code that core code already
has?

> I want to have a real hook to call
> that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
> context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
> exist.

You're not answering _why_ you want that.

> 
> > And we don't want to call rcu_user_*() pairs on IRQs, you're
> > introducing a serious performance regression here! And I'm talking about
> > the code that's currently in -tip.
> 
> Is there an easy way to fix it?  For example, could we figure out what
> makes it take so long and make it faster?

Sure, just remove your arch IRQ hook.

> If we need to, we could
> back out the IRQ bit and change the assertions for 4.3, but I'd rather
> keep the exact context tracking if at all possible.

I have no idea what you mean by exact context tracking here.

But If we ever want to call irq_enter() using arch hooks, and I have no idea why
we would ever want to do that since that involve complexifying the code
by $NR_ARCHS and moving C code to ASM, we need serious reasons! And that's
certainly not something we are going to plan now for the next week's merge window.

> >> That means that we can
> >> avoid all of the (expensive!) checks for what context we're in.
> >
> > If you're referring to context tracking, the context check is a per-cpu
> > read. Not something that's usually considered expensive.
> 
> In -tip, there aren't even extra branches, except those imposed by the
> user_exit implementation.

No there is the "call enter_from_user_mode" in the IRQ fast path.

> 
> >
> >> It also means that (other than IRQs, which need further cleanup), we only
> >> switch once per user/kernel switch.
> >
> > ???
> 
> In 4.2 and before, we can switch multiple times on the way out of the
> kernel, via SCHEDULE_USER, do_notify_resume, etc.  In -tip, we do it
> exactly once no matter what.

That's what we want for syscalls but not for IRQs.

> 
> >
> >>
> >> The cost for doing should be essentially zero, modulo artifacts from
> >> poor inlining.
> >
> > And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> > implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> > the existing one in irq_enter/exit() (those even imply a lock currently, which will
> > probably be turned to seqcount, but still, full memory barriers...).
> >
> > I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> > concerns current tip:x86/asm).
> 
> Why do we need these heavyweight barriers?

Actually it's not full barriers but atomic ones (smp_mb__after_atomic_stuff())
I suspect we can't do much better given RCU requirements.

Still we don't need to call it twice.

> 
> If there's actually a measurable performance hit in IRQs in -tip, then
> can we come up with a better fix?

I'm sure it's very easily measurable.

> For example, we could change all
> the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
> context" and make the IRQ entry do a lighter weight context tracking
> operation.

I don't see what we need to check actually. Context tracking can be in any
state while in IRQ.

> 
> But I think I'm still missing something fundamental about the
> performance: why is irq_enter() any faster than user_exit()?

It's stlightly faster at least because it takes care of nesting IRQs which
is likely with softirqs that get interrupted.

Now of course we wouldn't call user_exit() in this case, but the hook is there
in generic code, no need for anything from the arch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/