MIME-Version: 1.0
In-Reply-To: <CA+55aFy0-rj7hp3zOUAZD5y5Zp=v6Cu3TG0SHB-buj3oYTJcZg@mail.gmail.com>
References: <CALCETrUf9s-o-ETMiSxxjMGxVeH7di4O9vTi0Oe7wS-RCiVXLA@mail.gmail.com>
 <CA+55aFwR8mHw=wm+Uecy0ERgrD7WbijBn9kj_ZAd47L4GyG5Xw@mail.gmail.com>
 <CALCETrVAzhE7w3BDjqRack54BLncZALbnAOZyeXHx1cSTryy4g@mail.gmail.com>
 <CA+55aFyxs8Q5WrjN9o4Zmfd_4+muLkcoO8cXyv5Nt+Pf8c0TBQ@mail.gmail.com>
 <20150723173105.6795c0dc@gandalf.local.home> <CA+55aFy0-rj7hp3zOUAZD5y5Zp=v6Cu3TG0SHB-buj3oYTJcZg@mail.gmail.com>
From: Andy Lutomirski <luto@amacapital.net>
Date: Thu, 23 Jul 2015 14:50:11 -0700
Message-ID: <CALCETrWMgsrgEYWpzPFapOj+-SvZfadDAZ7SH7O8bFsR2b6F1Q@mail.gmail.com>
Subject: Re: Dealing with the NMI mess
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>, X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Willy Tarreau <w@1wt.eu>, Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <peterz@infradead.org>, Brian Gerst <brgerst@gmail.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2939
Lines: 68

On Thu, Jul 23, 2015 at 2:48 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 23, 2015 at 2:31 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>> Let me get this straight. The idea is in the #DB handler to detect that
>> it was triggered in NMI context, and if so, simply disarm that
>> breakpoint permanently, right?
>
> No, for simplicity, I'd make it cover not just NMI code, but any
> "kernel code with interrupts disabled".
>
> Because that's the test we'd use for "use ret instead of iret".
>
> And that wider test is exactly because it's so damn hard to get the
> exact instruction boundaries right. Let's *not* go down the path
> (again) of having to get the whole %rip range and "magic stack pointer
> values" etc.
>
> Make it simple and completely unambiguous. The rule really would be:
>
>  - if we return to kernel space and interrupts are disabled, we will
> use "ret" rather than "iret"
>
>    Hard rule. Simple. Straightforward. No random %rip values. No
> random %rsp values. NO CRAP.
>
>  - but because we use "ret" rather than "iret" we can't get RF
> semantics, it means that #DB is special. RF is supposed to make us
> make forward progress
>
>    So for that reason, #DB just says "if the breakpoint happened
> during that interrupts-ff reghion, I will clear %dr7 to guarantee
> forward progress"

What if we relax it slightly: "if the breakpoint happened during that
interrupts-off region, I will clear all *kernel breakpoints* in %dr7
to guarantee forward progress"?

Watchpoints don't need RF to make forward progress, and, by leaving
watchpoints alone, we avoid breaking gdb.

>
> So those would be the two main rules. Very simple, and avoiding all nasty cases.
>
> Now, I'd be willing to then hide the "oops, we clear dr7 very
> agrressively" issue by having a few additional _heuristics_. But I
> call them "heuristics" because unlike the current NMI nesting games,
> they aren't about core stability. They are about "ok, maybe somebody
> wants to trigger those faults, and we'll be _nice_ and try to make it
> easy for them", but nothing more.
>
> So for example, if that "#DB clears %dr7" happened, it sounds easy to
> set _TIF_USER_WORK_MASK, and just force %dr7 to be re-loaded from a
> cached value, so that if we disabled things because of some user stack
> trace access, it will be re-enabled by the time we return to user
> space. I think that sounds reasonable, but it's not something the core
> low-level entry x86 assembly code needs to even care about. It's not
> that level of "core", it's just being polite.

Once we limit it to instruction breakpoints, I don't think re-enabling
before returning to userspace matters.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/