Date: Fri, 5 Jul 2019 15:49:16 +0200
From: Peter Zijlstra
To: Linus Torvalds
Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Steven Rostedt,
    Andrew Lutomirski, Peter Anvin, Dave Hansen, Juergen Gross,
    Linux List Kernel Mailing, He Zhe, Joel Fernandes, devel@etsukata.com
Subject: Re: [PATCH v2 5/7] x86/mm, tracing: Fix CR2 corruption
Message-ID: <20190705134916.GU3402@hirez.programming.kicks-ass.net>
References: <20190704195555.580363209@infradead.org> <20190704200050.534802824@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 05, 2019 at 11:18:51AM +0900, Linus Torvalds wrote:
> On Fri, Jul 5, 2019 at 5:03 AM Peter Zijlstra wrote:
> >
> > Despite the current efforts to read CR2 before tracing happens there
> > still exist a number of possible holes:
>
> So this whole series disturbs me for the simple reason that I thought
> tracing was supposed to save/restore cr2 and make it unnecessary to
> worry about this in non-tracing code.
>
> That is very much what the NMI code explicitly does. Why shouldn't all
> the other tracing code do the same thing in case they can take page
> faults?
>
> So I don't think the patches are wrong per se, but this seems to solve
> it at the wrong level.

My thinking is that doing it that way results in far too many sites
which we have to fix, and a possible fragility of the interface.

Invariably we'll get multiple interfaces for the same thing, one which
preserves CR2 and one which doesn't -- in the name of performance. And
then someone uses the wrong one, and we're back where we started.

Conversely, this way we get to fix it in one place.
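To make the ordering argument concrete, here is a minimal user-space sketch. It is not kernel code: `fake_cr2`, `trace_hook()` and both handlers are hypothetical stand-ins, and the only assumption is that a tracer may itself fault and overwrite CR2 before the handler has read it.

```c
#include <assert.h>

/* Stand-in for the CR2 register: address of the most recent page fault. */
static unsigned long fake_cr2;

static unsigned long read_cr2(void)
{
	return fake_cr2;
}

/* Hypothetical tracer that takes its own page fault and clobbers CR2. */
static void trace_hook(void)
{
	fake_cr2 = 0xdead0000UL;
}

/* Fragile ordering: tracing runs before CR2 has been read. */
static unsigned long handler_reads_late(void)
{
	trace_hook();
	return read_cr2();
}

/* Robust ordering: CR2 is captured first, in the one common entry path. */
static unsigned long handler_reads_early(void)
{
	unsigned long address = read_cr2();

	trace_hook();
	return address;
}

/* Small self-check showing which ordering keeps the faulting address. */
static void cr2_order_selfcheck(void)
{
	fake_cr2 = 0x1000UL;
	assert(handler_reads_late() == 0xdead0000UL);	/* address lost */

	fake_cr2 = 0x1000UL;
	assert(handler_reads_early() == 0x1000UL);	/* address preserved */
}
```

Calling `cr2_order_selfcheck()` from a `main()` exercises both orderings. The point of the sketch is only that capturing the value at a single entry point does not depend on every tracer remembering to save and restore it.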

Also; all previous attempts at fixing this have been about pushing the
read_cr2() earlier; notably:

  0ac09f9f8cd1 ("x86, trace: Fix CR2 corruption when tracing page faults")
  d4078e232267 ("x86, trace: Further robustify CR2 handling vs tracing")

And I'm thinking that with the exception of this patch, the rest are
worthwhile cleanups regardless.

Also; while looking at this, if we do continue with the C wrappers from
the very last patch, we can do horrible things like this on top and move
the read_cr2() back into C code.

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -826,7 +826,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
  */
 #define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
 
-.macro idtentry_part do_sym, has_error_code:req, read_cr2:req, paranoid:req, shift_ist=-1, ist_offset=0
+.macro idtentry_part do_sym, has_error_code:req, paranoid:req, shift_ist=-1, ist_offset=0
 
 	.if \paranoid
 	call	paranoid_entry
@@ -836,10 +836,6 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
 	.endif
 	UNWIND_HINT_REGS
 
-	.if \read_cr2
-	GET_CR2_INTO(%rdx);			/* can clobber %rax */
-	.endif
-
 	.if \has_error_code
 	movq	ORIG_RAX(%rsp), %rsi		/* get error code */
 	movq	$-1, ORIG_RAX(%rsp)		/* no syscall to restart */
@@ -885,7 +881,6 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
  *		fresh stack.  (This is for #DB, which has a nasty habit
  *		of recursing.)
  * @create_gap:	create a 6-word stack gap when coming from kernel mode.
- * @read_cr2:	load CR2 into the 3rd argument; done before calling any C code
  *
  * idtentry generates an IDT stub that sets up a usable kernel context,
  * creates struct pt_regs, and calls @do_sym.  The stub has the following
@@ -910,7 +905,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0 read_cr2=0
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -948,7 +943,7 @@ ENTRY(\sym)
 .Lfrom_usermode_no_gap_\@:
 	.endif
 
-	idtentry_part \do_sym, \has_error_code, \read_cr2, \paranoid, \shift_ist, \ist_offset
+	idtentry_part \do_sym, \has_error_code, \paranoid, \shift_ist, \ist_offset
 
 	.if \paranoid == 1
 	/*
@@ -957,7 +952,7 @@ ENTRY(\sym)
 	 * run in real process context if user_mode(regs).
 	 */
 .Lfrom_usermode_switch_stack_\@:
-	idtentry_part \do_sym, \has_error_code, \read_cr2, 0
+	idtentry_part \do_sym, \has_error_code, paranoid=0
 	.endif
 
 _ASM_NOKPROBE(\sym)
@@ -969,7 +964,7 @@ idtentry overflow			do_overflow			has_er
 idtentry bounds				do_bounds			has_error_code=0
 idtentry invalid_op			do_invalid_op			has_error_code=0
 idtentry device_not_available		do_device_not_available		has_error_code=0
-idtentry double_fault			do_double_fault			has_error_code=1 paranoid=2 read_cr2=1
+idtentry double_fault			do_double_fault			has_error_code=1 paranoid=2
 idtentry coprocessor_segment_overrun	do_coprocessor_segment_overrun	has_error_code=0
 idtentry invalid_TSS			do_invalid_TSS			has_error_code=1
 idtentry segment_not_present		do_segment_not_present		has_error_code=1
@@ -1142,10 +1137,10 @@ idtentry xenint3		do_int3			has_error_co
 #endif
 
 idtentry general_protection	do_general_protection	has_error_code=1
-idtentry page_fault		do_page_fault		has_error_code=1	read_cr2=1
+idtentry page_fault		do_page_fault		has_error_code=1
 
 #ifdef CONFIG_KVM_GUEST
-idtentry async_page_fault	do_async_page_fault	has_error_code=1	read_cr2=1
+idtentry async_page_fault	do_async_page_fault	has_error_code=1
 #endif
 
 #ifdef CONFIG_X86_MCE
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -22,20 +22,34 @@
 #define CALL_enter_from_user_mode(_regs)
 #endif
 
+#define __IDT_NR1	1
+#define __IDT_NR2	2
+#define __IDT_NR3	2
+
+#define IDT_NR(n)	__IDT_NR##n
+
+#define __IDT_TRAP1(t1,a1)
+#define __IDT_TRAP2(t1,a1,t2,a2)
+#define __IDT_TRAP3(t1,a1,t2,a2,t3,a3)	t3 a3 = read_cr2()
+
+#define IDT_TRAP(n,...)	__IDT_TRAP##n(__VA_ARGS__)
+
 #define IDTENTRYx(n, name, ...)						\
 	static notrace void __idt_##name(__IDT_MAP(n, __IDT_DECL, __VA_ARGS__)); \
 	NOKPROBE_SYMBOL(__idt_##name);					\
-	dotraplinkage notrace void name(__IDT_MAP(n, __IDT_DECL, __VA_ARGS__)) \
+	dotraplinkage notrace void name(__IDT_MAP(__IDT_NR(n), __IDT_DECL, __VA_ARGS__)) \
 	{								\
 		__IDT_MAP(n, __IDT_TEST, __VA_ARGS__);			\
+		__IDT_TRAP(n, __VA_ARGS__);				\
 		trace_hardirqs_off();					\
 		CALL_enter_from_user_mode(regs);			\
 		__idt_##name(__IDT_MAP(n, __IDT_ARGS, __VA_ARGS__));	\
 	}								\
 	NOKPROBE_SYMBOL(name);						\
-	dotraplinkage notrace void name##_paranoid(__IDT_MAP(n, __IDT_DECL, __VA_ARGS__)) \
+	dotraplinkage notrace void name##_paranoid(__IDT_MAP(__IDT_NR(n), __IDT_DECL, __VA_ARGS__)) \
 	{								\
 		__IDT_MAP(n, __IDT_TEST, __VA_ARGS__);			\
+		__IDT_TRAP(n, __VA_ARGS__);				\
 		trace_hardirqs_off();					\
 		__idt_##name(__IDT_MAP(n, __IDT_ARGS, __VA_ARGS__));	\
 	}								\
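
For readers unfamiliar with the preprocessor trick in the idtentry.h hunk, the following stand-alone sketch shows the same idea outside the kernel: the argument count is token-pasted onto a macro name, and only the three-argument form expands to a local variable initialised from read_cr2(). All `DEMO_`/`demo_` names and `fake_cr2` are hypothetical, and `__IDT_MAP` is not reproduced because its definition is not shown here. Note also that the hunk as posted defines `IDT_NR` and `IDT_TRAP` but the wrapper bodies spell them `__IDT_NR(n)` and `__IDT_TRAP(n, ...)`, so the names presumably need reconciling before it builds; the sketch simply uses one name throughout.

```c
#include <assert.h>

/* Stand-in for the kernel's read_cr2(); the stored value is arbitrary. */
static unsigned long fake_cr2 = 0x1000UL;

static unsigned long read_cr2(void)
{
	return fake_cr2;
}

/* Per-arity expansions: only the 3-argument form declares an extra local. */
#define DEMO_TRAP1(t1, a1)
#define DEMO_TRAP2(t1, a1, t2, a2)
#define DEMO_TRAP3(t1, a1, t2, a2, t3, a3)	t3 a3 = read_cr2()

/* Paste the argument count onto the name to pick one expansion above. */
#define DEMO_TRAP(n, ...)	DEMO_TRAP##n(__VA_ARGS__)

static unsigned long seen_address;
static long seen_error;

/* Inner handler: wants the faulting address as its third argument. */
static void demo_inner_fault(void *regs, long error_code, unsigned long address)
{
	(void)regs;
	seen_error = error_code;
	seen_address = address;
}

/* Outer wrapper: takes two parameters, materialises the third from CR2. */
static void demo_outer_fault(void *regs, long error_code)
{
	DEMO_TRAP(3, void *, regs, long, error_code, unsigned long, address);

	demo_inner_fault(regs, error_code, address);
}

/* Two-argument wrapper: the same macro expands to nothing here. */
static void demo_outer_trap(void *regs, long error_code)
{
	DEMO_TRAP(2, void *, regs, long, error_code);

	(void)regs;
	seen_error = error_code;
}

/* Self-check: the 3-argument wrapper picks up CR2, the 2-argument one does not. */
static void demo_trap_selfcheck(void)
{
	fake_cr2 = 0x2000UL;
	demo_outer_fault(0, 4);
	assert(seen_address == 0x2000UL && seen_error == 4);

	demo_outer_trap(0, 7);
	assert(seen_error == 7 && seen_address == 0x2000UL);
}
```

The design point mirrored from the hunk is that the entry stub only ever passes two arguments, and the C wrapper generated for a three-argument handler reads CR2 itself as its first action, before any tracing call in the wrapper body.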