Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934693AbcCOKV6 (ORCPT ); Tue, 15 Mar 2016 06:21:58 -0400 Received: from mail-wm0-f65.google.com ([74.125.82.65]:33170 "EHLO mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932766AbcCOKVs (ORCPT ); Tue, 15 Mar 2016 06:21:48 -0400 Date: Tue, 15 Mar 2016 11:21:42 +0100 From: Ingo Molnar To: Linus Torvalds Cc: Andy Lutomirski , Paolo Bonzini , xen-devel , Arjan van de Ven , Borislav Petkov , X86 ML , Andrew Morton , KVM list , Andy Lutomirski , "linux-kernel@vger.kernel.org" , Peter Zijlstra Subject: Re: [PATCH v4 2/5] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops Message-ID: <20160315102142.GA23406@gmail.com> References: <20160314120202.GD15800@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2495 Lines: 64 * Linus Torvalds wrote: > On Mon, Mar 14, 2016 at 10:17 AM, Andy Lutomirski wrote: > > > > So yes, let's please warn. I'm okay with removing the panic_on_oops > > thing though. (But if anyone suggests that we should stop OOPSing on > > bad kernel page faults, I *will* fight back.) > > Bad kernel page faults are something completely different. They would > be actual bugs regardless. > > The MSR thing has *often* been just silly "this CPU doesn't do that > MSR". Generic bootup setup code etc that just didn't know or care > about the particular badly documented rule for one particular random > CPU version and stepping. > > In fact, when I say "often", I suspect I should really just say > "always". I don't think we've ever found a case where oopsing would > have been the right thing. But it has definitely caused lots of > problems, especially in the early paths where your code doesn't even > work right now. > > Now, when it comes to the warning, I guess I could live with it, but I > think it's stupid to make this a low-level exception handler thing. > > So what I think should be done: > > - make sure that wr/rdmsr_safe() actually works during very early > init. At some point it didn't. > > - get rid of the current wrmsr/rdmsr *entirely*. It's shit. > > - Add this wrapper: > > #define wrmsr(msr, low, high) \ > WARN_ON_ONCE(wrmsr_safe(msr, low, high)) > > and be done with it. We could even decide to make that WARN_ON_ONCE() > be something we could configure out, because it's really a debugging > thing and isn't like it should be all that fatal. > > None of this insane complicated crap that buys us exactly *nothing*, > and depends on fancy new exception handling support etc etc. > > So what's the downside to just doing this simple thing? This was my original suggestion as well. The objection was that sometimes it's performance critical. To which I replied that 1) I highly doubt that a single extra branch of the check is measurable 2) and even if so, then _those_ performance critical cases should be done in some special way (rdmsr_nocheck() or whatever) - at which point the argument was that there's a lot of such cases. Which final line of argument I still don't really buy, btw., but it became a me against everyone else argument and I cowardly punted. Now that the 800 lb gorilla is on my side I of course stand my ground, principles are principles! Thanks, Ingo