Date: Thu, 15 Jul 2010 18:01:17 -0400
From: Mathieu Desnoyers
To: Linus Torvalds
Cc: LKML, Andrew Morton, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Frederic Weisbecker, Thomas Gleixner, Christoph Hellwig, Li Zefan,
	Lai Jiangshan, Johannes Berg, Masami Hiramatsu,
	Arnaldo Carvalho de Melo, Tom Zanussi, KOSAKI Motohiro, Andi Kleen,
	"H. Peter Anvin", Jeremy Fitzhardinge, "Frank Ch. Eigler", Tejun Heo
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe
Message-ID: <20100715220117.GA1499@Krystal>
References: <20100714203940.GC22096@Krystal> <20100714222115.GA30122@Krystal>
	<20100715183153.GA9276@Krystal>

Hi Linus,

I modified your code, intending to handle the fake NMI entry gracefully,
given that NMIs are not necessarily disabled at the entry point. It uses a
"need fake NMI" flag rather than playing games with CS and faults. When a
fake NMI is needed, it simply jumps back to the beginning of the regular
NMI code.

The NMI exit code and the fake NMI entry are made reentrant with respect to
NMI handler interruption by testing, at the very beginning of the NMI
handler, whether an NMI is nested over the whole nmi_atomic .. nmi_atomic_end
code region.
This code assumes NMIs have a separate stack. This code is still utterly
untested and might eat your Doritos, only provided for general enjoyment.

Thanks,

Mathieu

#
# Two per-cpu variables: an "are we nested" flag (one byte),
# and a "do we need to execute a fake NMI" flag (one byte).
# The %rsp at which the stack copy is saved is at a fixed address, which leaves
# enough room at the bottom of the NMI stack for the "real" NMI entry stack.
# This assumes we have a separate NMI stack.
# The NMI stack copy top of stack is at nmi_stack_copy.
# The NMI stack copy "rip" is at nmi_stack_copy_rip, which is set to
# nmi_stack_copy-32.
#
nmi:
	# Test if nested over atomic code.
	cmpq $nmi_atomic,0(%rsp)
	jae nmi_addr_is_ae
	# Test if nested over general NMI code.
	cmpb $0,%__percpu_seg:nmi_stack_nesting
	jne nmi_nested_set_fake_and_return
	# create new stack
is_unnested_nmi:
	# Save some space for nested NMIs. The exception itself
	# will never use more space, but it might use less (since
	# it will be a kernel-kernel transition).
	# Save %rax on top of the stack (need to temporarily use it)
	pushq %rax
	movq %rsp, %rax
	movq $nmi_stack_copy,%rsp
	# copy the five words of stack info. rip starts at 8+0(%rax).
	pushq 8+32(%rax)	# ss
	pushq 8+24(%rax)	# rsp
	pushq 8+16(%rax)	# eflags
	pushq 8+8(%rax)		# cs
	pushq 8+0(%rax)		# rip
	movq 0(%rax),%rax	# restore %rax

set_nmi_nesting:
	# and set the nesting flags
	movb $0xff,%__percpu_seg:nmi_stack_nesting

regular_nmi_code:
	...
	# regular NMI code goes here, and can take faults,
	# because this sequence now has proper nested-NMI
	# handling
	...

nmi_atomic:
	# An NMI nesting over the whole nmi_atomic .. nmi_atomic_end region will
	# be handled specially. This includes the fake NMI entry point.
	cmpb $0,%__percpu_seg:need_fake_nmi
	jne fake_nmi
	movb $0,%__percpu_seg:nmi_stack_nesting
	iret

	# This is the fake NMI entry point.
fake_nmi:
	movb $0x0,%__percpu_seg:need_fake_nmi
	jmp regular_nmi_code
nmi_atomic_end:

	# Make sure the address is in the nmi_atomic range and in the CS
	# segment.
nmi_addr_is_ae:
	cmpq $nmi_atomic_end,0(%rsp)
	jae is_unnested_nmi
	# The saved rip points to the final NMI iret. Check the CS segment to
	# make sure.
	cmpw $__KERNEL_CS,8(%rsp)
	jne is_unnested_nmi

	# This is the case when we hit just as we're supposed to do the atomic
	# code of a previous NMI. We run the NMI using the old return address
	# that is still on the stack, rather than copy the new one that is
	# bogus and points to where the nested NMI interrupted the original
	# NMI handler!
	# Easy: just set the stack pointer to point to the stack copy, clear
	# need_fake_nmi (because we are directly going to execute the
	# requested NMI) and jump to "nesting flag set" (which is followed by
	# regular NMI code execution).
	movq $nmi_stack_copy_rip,%rsp
	movb $0x0,%__percpu_seg:need_fake_nmi
	jmp set_nmi_nesting

	# This is the actual nested case. Make sure we branch to the fake NMI
	# handler after this handler is done.
nmi_nested_set_fake_and_return:
	movb $0xff,%__percpu_seg:need_fake_nmi
	popfq
	jmp *(%rsp)

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com