MIME-Version: 1.0
In-Reply-To: <CALCETrXhXPj_b6rUMn=SR0QwE92rL=k5DCFraZwBj9FpUgadYw@mail.gmail.com>
References: <cover.1457805972.git.luto@kernel.org>
	<a3b871a4eb533340d04255409dfecc94f88c647d.1457805972.git.luto@kernel.org>
	<20160314120202.GD15800@pd.tnic>
	<CALCETrW6E0Nz6gSmRKTvHbQDhnHVpuhzmgZB1nZ3m-DL-Bt=tQ@mail.gmail.com>
	<CA+55aFwhBA8yPhuP8FfHjgbfDj3dwEjR86kkTH3rymYZqikhjw@mail.gmail.com>
	<CALCETrXhXPj_b6rUMn=SR0QwE92rL=k5DCFraZwBj9FpUgadYw@mail.gmail.com>
Date: Mon, 14 Mar 2016 11:04:59 -0700
Message-ID: <CA+55aFwr80aZ3-1H4VPB5=jeU_Yt=hYFfFvZC5-cNXzgXxGf3A@mail.gmail.com>
Subject: Re: [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access
 fails without !panic_on_oops
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>, xen-devel <Xen-devel@lists.xen.org>,
        Arjan van de Ven <arjan@linux.intel.com>,
        Borislav Petkov <bp@alien8.de>, X86 ML <x86@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        KVM list <kvm@vger.kernel.org>, Andy Lutomirski <luto@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Peter Zijlstra <peterz@infradead.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org

On Mon, Mar 14, 2016 at 10:17 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> So yes, let's please warn.  I'm okay with removing the panic_on_oops
> thing though.  (But if anyone suggests that we should stop OOPSing on
> bad kernel page faults, I *will* fight back.)

Bad kernel page faults are something completely different. They would
be actual bugs regardless.

The MSR thing has *often* been just silly "this CPU doesn't do that
MSR": generic bootup setup code etc. that just didn't know or care
about the particular badly documented rule for one particular random
CPU version and stepping.

In fact, when I say "often", I suspect I should really just say
"always". I don't think we've ever found a case where oopsing would
have been the right thing. But it has definitely caused lots of
problems, especially in the early paths where your code doesn't even
work right now.

Now, when it comes to the warning, I guess I could live with it, but I
think it's stupid to make this a low-level exception handler thing.

So what I think should be done:

 - make sure that wr/rdmsr_safe() actually works during very early
init. At some point it didn't.

 - get rid of the current wrmsr/rdmsr *entirely*. It's shit.

 - Add this wrapper:

      #define wrmsr(msr, low, high) \
        WARN_ON_ONCE(wrmsr_safe(msr, low, high))

and be done with it. We could even decide to make that WARN_ON_ONCE()
be something we could configure out, because it's really a debugging
thing, and it isn't like a failed MSR write should be all that fatal.
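
To make the behavior of that wrapper concrete, here is a minimal userspace
sketch. It is not kernel code: the wrmsr_safe() below is a mock that only
"knows" one made-up MSR number (FAKE_SUPPORTED_MSR), WARN_ON_ONCE() is a
simplified stand-in that does not yield the condition value the way the
kernel macro does, and warn_count, write_attempts and poke_msr_repeatedly()
are hypothetical names added purely for illustration. The only thing it is
meant to show is that a failing write gets reported once per call site and
execution then simply carries on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: the one MSR number the mock below accepts. */
#define FAKE_SUPPORTED_MSR 0x10u

int warn_count;      /* how many warnings were actually printed */
int write_attempts;  /* how many MSR writes were attempted */

/*
 * Mock of wrmsr_safe(): returns 0 on success and nonzero on failure.
 * Real hardware would fault on an unsupported MSR; here we just
 * compare against a made-up constant.
 */
int wrmsr_safe(uint32_t msr, uint32_t low, uint32_t high)
{
    (void)low;
    (void)high;
    write_attempts++;
    return msr == FAKE_SUPPORTED_MSR ? 0 : -1;
}

/*
 * Simplified stand-in for WARN_ON_ONCE(): the condition is evaluated
 * every time, but the warning is printed at most once per call site.
 */
#define WARN_ON_ONCE(cond)                                              \
    do {                                                                \
        static int warned_;                                             \
        if ((cond) && !warned_) {                                       \
            warned_ = 1;                                                \
            warn_count++;                                               \
            fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__);  \
        }                                                               \
    } while (0)

/* The wrapper from the mail, on top of the mock pieces above. */
#define wrmsr(msr, low, high) \
    WARN_ON_ONCE(wrmsr_safe(msr, low, high))

/* One call site, hit 'times' times: at most one warning in total. */
void poke_msr_repeatedly(uint32_t msr, int times)
{
    assert(times >= 0);
    for (int i = 0; i < times; i++)
        wrmsr(msr, 0, 0);
}
```

Calling poke_msr_repeatedly() with an MSR number the mock rejects still
attempts every write, but warns only on the first failure and never stops
the caller.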

None of this insane complicated crap that buys us exactly *nothing*,
and depends on fancy new exception handling support etc etc.

So what's the downside to just doing this simple thing?

                      Linus