Date: Thu, 23 Jul 2015 23:17:44 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Willy Tarreau <w@1wt.eu>, Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Steven Rostedt <rostedt@goodmis.org>, Brian Gerst <brgerst@gmail.com>
Subject: Re: Dealing with the NMI mess
Message-ID: <20150723211744.GM25159@twins.programming.kicks-ass.net>
References: <CALCETrUf9s-o-ETMiSxxjMGxVeH7di4O9vTi0Oe7wS-RCiVXLA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrUf9s-o-ETMiSxxjMGxVeH7di4O9vTi0Oe7wS-RCiVXLA@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1866
Lines: 46

On Thu, Jul 23, 2015 at 01:21:16PM -0700, Andy Lutomirski wrote:
> 3. Forbid faults (other than MCE) inside NMI.
> 
> Option 3 is almost easy.  There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB.  #DB is easy to
> fix (e.g. with my patches or Peter's patches).
> 
> What if we went all out and forbade page faults in NMI as well.  There
> are two reasons that I can think of that we might page fault inside an
> NMI:
> 
> b) User memory access faults.
> 
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space.  What if we did it differently?  Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode. 

> Peter, can this be done without breaking the perf ABI?  If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.

Not really; but also almost :/

So the thing is that we currently attach the user backtrace to all
events -- and there can be many before we return to userspace again.

So none of those events would have a userspace stack, I'm sure that's
going to confuse the tooling.

OTOH, userspace stacks are a best effort thing, we bail at the first
sign of trouble (eg. the stack page is not there).

Now realistically this 'never' happens, and it would result in
consistently truncated user traces, where your proposal would result in
a whole bunch of events with no user traces and then an 'extra' event
with a one.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/