MIME-Version: 1.0
In-Reply-To: <CA+55aFwhBA8yPhuP8FfHjgbfDj3dwEjR86kkTH3rymYZqikhjw@mail.gmail.com>
References: <cover.1457805972.git.luto@kernel.org> <a3b871a4eb533340d04255409dfecc94f88c647d.1457805972.git.luto@kernel.org>
 <20160314120202.GD15800@pd.tnic> <CALCETrW6E0Nz6gSmRKTvHbQDhnHVpuhzmgZB1nZ3m-DL-Bt=tQ@mail.gmail.com>
 <CA+55aFwhBA8yPhuP8FfHjgbfDj3dwEjR86kkTH3rymYZqikhjw@mail.gmail.com>
From: Andy Lutomirski <luto@amacapital.net>
Date: Mon, 14 Mar 2016 10:17:07 -0700
Message-ID: <CALCETrXhXPj_b6rUMn=SR0QwE92rL=k5DCFraZwBj9FpUgadYw@mail.gmail.com>
Subject: Re: [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access
 fails without !panic_on_oops
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>, xen-devel <Xen-devel@lists.xen.org>,
        Arjan van de Ven <arjan@linux.intel.com>,
        Borislav Petkov <bp@alien8.de>, X86 ML <x86@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        KVM list <kvm@vger.kernel.org>, Andy Lutomirski <luto@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Peter Zijlstra <peterz@infradead.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1911
Lines: 42

On Mon, Mar 14, 2016 at 10:11 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mar 14, 2016 10:05 AM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>
>> We could probably remove that check and let custom fixups run early.
>> I don't see any compelling reason to keep them disabled.  That should
>> probably be a separate change, though.
>
> Or we could just use the existing wrmsr_safe() code and not add this new
> special code at all.
>
> Look, why are you doing this? We should get rid of the difference between
> wrmsr and wrmsr_safe(), not make it bigger.
>
> Just make everything safe. There has never in the history of anything been
> an advantage to making things oops and to making things more complicated.
>
> Why is rd/wrmsr() so magically important?

Because none of this is actually safe unless the code that calls
whatever_safe actually does something intelligent with the return
value.  I think that most code that does rdmsr or wrmsr actually
thinks "I know that this MSR exists and I want to access it" and the
code may actually malfunction if the access fails.  So I think we
really do want the warning so we can fix the bugs if they happen.  And
I wouldn't be at all surprised if there's a bug or two in perf where
some perf event registers itself incorrectly in some cases because it
messed up the probe sequence.  We don't want to totally ignore the
resulting failure of the perf event -- we want to get a warning so
that we know about the bug.

Or suppose we're writing some important but easy-to-miss MSR, like PAT
or one of the excessive number of system call setup MSRs.  If the
write fails, the system could easily still appear to work until
something unfortunate happens.

So yes, let's please warn.  I'm okay with removing the panic_on_oops
thing though.  (But if anyone suggests that we should stop OOPSing on
bad kernel page faults, I *will* fight back.)

--Andy