Message-ID: <1337693441.13348.36.camel@gandalf.stny.rr.com>
Subject: Re: NMI vs #PF clash
From: Steven Rostedt <rostedt@goodmis.org>
To: Avi Kivity <avi@redhat.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        "H. Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>,
        Paul Turner <pjt@google.com>, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Tue, 22 May 2012 09:30:41 -0400
In-Reply-To: <4FBB8C40.6080304@redhat.com>
References: <4FBB8C40.6080304@redhat.com>
Content-Type: text/plain; charset="ISO-8859-15"
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2534
Lines: 63

On Tue, 2012-05-22 at 15:53 +0300, Avi Kivity wrote:
> The recent changes to NMI allow exceptions to take place in NMI
> handlers, but I think that a #PF (say, due to access to vmalloc space)
> is still problematic.  Consider the sequence
> 
>   #PF  (cr2 set by processor)
>     NMI
>       ...
>       #PF (cr2 clobbered)
>         do_page_fault()
>         IRET
>       ...
>       IRET
>     do_page_fault()
>       address = read_cr2()

This is still problematic. But the "allow faults in NMI" change wasn't
written for page faults, although they won't totally crash the system
like they used to. If an NMI triggers during a page fault routine before
the reading of the cr2, and it takes a page fault, then yes, this will
corrupt the cr2 and cause unpredictable results (not good).
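
To make the sequence concrete, here is a minimal userspace sketch of it.
This is not kernel code: a plain variable stands in for the cr2
register, and every sim_* name is hypothetical, made up only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cr2 register (assumption, not kernel code). */
static uintptr_t sim_cr2;

/* The processor stores the faulting address in cr2 when it raises #PF. */
static void sim_raise_pf(uintptr_t addr)
{
	sim_cr2 = addr;
}

/* An NMI handler that itself faults, e.g. on an access to vmalloc space. */
static void sim_nmi_with_fault(uintptr_t nmi_addr)
{
	sim_raise_pf(nmi_addr);	/* nested #PF overwrites cr2 */
	/* the nested do_page_fault() would run and IRET here */
}

/* Returns the address the outer do_page_fault() ends up reading. */
static uintptr_t sim_outer_fault(uintptr_t addr, int nmi_hits,
				 uintptr_t nmi_addr)
{
	sim_raise_pf(addr);			/* outer #PF sets cr2 */
	if (nmi_hits)
		sim_nmi_with_fault(nmi_addr);	/* NMI lands before read_cr2() */
	return sim_cr2;				/* address = read_cr2() */
}

/* Usage: without the NMI the address is intact, with it the address is lost. */
void sim_selfcheck(void)
{
	assert(sim_outer_fault(0x1000, 0, 0x2000) == 0x1000);
	assert(sim_outer_fault(0x1000, 1, 0x2000) == 0x2000);
}
```

In the second call the outer handler sees the address of the fault taken
inside the NMI instead of its own.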

That said, we still should not be having page faults in NMI. The fault
handling was to allow breakpoints in the NMI code, which should not be a
problem here. There is code to handle nested breakpoints because of
NMIs.

The only time I found #PF useful in NMIs was for debugging. Having a
stack dump of all tasks (sysrq-t) when the NMI watchdog detects a
deadlock can be very useful. But stack traces can trigger page faults,
and before this fault handling in NMI code went in, I could not get a
full task state dump from NMI context. This was because the first page
fault taken by a stack dump would re-enable NMIs when it returned with
IRET, and as dumping the state of all tasks out to the serial port took
a long time, another NMI would come in and corrupt the NMI stack,
leading to a system hang or triple fault reboot, never letting the task
dump finish.

This code now alleviates that problem.

>  
> The last line reads the overwritten cr2 value.
> 
> I vaguely remember some discussion about this back in the day, but I
> can't find anything in the code to save/restore cr2 in the NMI handler. 
> Did I miss it?  Or perhaps the page fault handler ignores the incorrect
> cr2 and IRETs, to fault back immediately?
> 

Now if we want to handle page faults from NMI context, we could do some
tricks to have the NMI detect that it interrupted a page fault before it
read the cr2 and, in that case, save off the cr2 register and restore it
before returning.

Or we could just have the NMI always restore the cr2 register.
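
A rough sketch of that simpler option, again as a userspace simulation
and not a patch: the NMI entry saves cr2, runs the handler, and writes
the saved value back if it changed. The demo_* names are hypothetical
stand-ins (demo_read_cr2()/demo_write_cr2() just access a variable), and
where such a save/restore would really live in the NMI entry path is
left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cr2 register (assumption, not kernel code). */
static uintptr_t demo_cr2;

static uintptr_t demo_read_cr2(void)
{
	return demo_cr2;
}

static void demo_write_cr2(uintptr_t val)
{
	demo_cr2 = val;
}

/* NMI handler body that takes a page fault, which clobbers cr2. */
static void demo_nmi_body(uintptr_t nmi_fault_addr)
{
	demo_write_cr2(nmi_fault_addr);	/* models the nested #PF */
}

/* NMI entry that always preserves cr2 across the handler body. */
static void demo_nmi(uintptr_t nmi_fault_addr)
{
	uintptr_t saved = demo_read_cr2();	/* save on entry */

	demo_nmi_body(nmi_fault_addr);

	if (demo_read_cr2() != saved)		/* restore only if clobbered */
		demo_write_cr2(saved);
}

/* Outer #PF interrupted by the NMI before it gets to read cr2. */
static uintptr_t demo_outer_fault(uintptr_t addr, uintptr_t nmi_fault_addr)
{
	demo_write_cr2(addr);		/* processor sets cr2 for the outer #PF */
	demo_nmi(nmi_fault_addr);	/* NMI arrives, faults, and returns */
	return demo_read_cr2();		/* address = read_cr2() */
}

/* Usage: the outer handler still sees its own faulting address. */
void demo_selfcheck(void)
{
	assert(demo_outer_fault(0x1000, 0x2000) == 0x1000);
}
```

Doing this unconditionally avoids having to detect whether the NMI
interrupted a page fault before its cr2 read, at the cost of a cr2 read
on every NMI.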

-- Steve

