2016-03-11 16:48:34

by Andy Lutomirski

Subject: Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <[email protected]> wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
>> > These could still be open coded in an inlined fashion, like the scheduler usage.
>>
>> We could have a raw_rdmsr for those.
>>
>> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
>> worth the effort. This isn't a frequent source of bugs to my knowledge, and we
>> don't try to recover from incorrect CR writes, out-of-bounds MMIO, etc., so do we
>> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
>> for code that doesn't use the _safe variants?
>
> It's just the general principle really: don't crash the kernel on bootup. There are
> few things more user-hostile than that.
>
> Also, this would maintain the status quo: since we now (accidentally) don't crash
> the kernel on distro kernels (but silently and unsafely ignore the faulting
> instruction), we should not regress that behavior (by adding the chance to crash
> again), but improve upon it.

Just a heads up: the extable improvements in tip:ras/core make it
straightforward to get the best of all worlds: explicit failure
handling (written in C!), no fast-path overhead whatsoever, and no new
garbage in the exception handlers.

Patches coming once I test them.

>
> Thanks,
>
> Ingo



--
Andy Lutomirski
AMA Capital Management, LLC


2016-03-12 16:02:55

by Ingo Molnar

Subject: Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops


* Andy Lutomirski <[email protected]> wrote:

> On Thu, Oct 1, 2015 at 12:15 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Andy Lutomirski <[email protected]> wrote:
> >
> >> > These could still be open coded in an inlined fashion, like the scheduler usage.
> >>
> >> We could have a raw_rdmsr for those.
> >>
> >> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is
> >> worth the effort. This isn't a frequent source of bugs to my knowledge, and we
> >> don't try to recover from incorrect CR writes, out-of-bounds MMIO, etc., so do we
> >> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures
> >> for code that doesn't use the _safe variants?
> >
> > It's just the general principle really: don't crash the kernel on bootup. There are
> > few things more user-hostile than that.
> >
> > Also, this would maintain the status quo: since we now (accidentally) don't crash
> > the kernel on distro kernels (but silently and unsafely ignore the faulting
> > instruction), we should not regress that behavior (by adding the chance to crash
> > again), but improve upon it.
>
> Just a heads up: the extable improvements in tip:ras/core make it
> straightforward to get the best of all worlds: explicit failure
> handling (written in C!), no fast-path overhead whatsoever, and no new
> garbage in the exception handlers.

I _knew_ I should have merged them into tip:x86/mm, not tip:ras/core ;-)

I had a quick look at your new MSR series and I'm very happy with that direction!

Thanks,

Ingo