From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Linus Torvalds,
    Dominik Brodowski, Andy Lutomirski, Borislav Petkov, Brian Gerst,
    Denys Vlasenko, "H. Peter Anvin", Josh Poimboeuf, Peter Zijlstra,
    Thomas Gleixner, dan.j.williams@intel.com, Ingo Molnar
Subject: [PATCH 4.14 063/167] x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros
Date: Wed, 21 Feb 2018 13:47:54 +0100
Message-Id: <20180221124527.948125254@linuxfoundation.org>
X-Mailer: git-send-email 2.16.2
In-Reply-To: <20180221124524.639039577@linuxfoundation.org>
References: <20180221124524.639039577@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

4.14-stable review patch.
If anyone has any objections, please let me know.

------------------

From: Dominik Brodowski

commit dde3036d62ba3375840b10ab9ec0d568fd773b07 upstream.

Previously, error_entry() and paranoid_entry() saved the GP registers
onto stack space previously allocated by their callers. Combine these
two steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro
for that.

This adds a significant amount of text size. However, Ingo Molnar points
out that:

    "these numbers also _very_ significantly over-represent the
    extra footprint. The assumptions that resulted in us compressing
    the IRQ entry code have changed very significantly with the new
    x86 IRQ allocation code we introduced in the last year:

    - IRQ vectors are usually populated in tightly clustered groups.

      With our new vector allocator code the typical per CPU
      allocation percentage on x86 systems is ~3 device vectors and
      ~10 fixed vectors out of ~220 vectors - i.e. a very low ~6%
      utilization (!). [...]

      The days where we allocated a lot of vectors on every CPU and
      the compression of the IRQ entry code text mattered are over.

    - Another issue is that only a small minority of vectors is
      frequent enough to actually matter to cache utilization in
      practice: 3-4 key IPIs and 1-2 device IRQs at most - and those
      vectors tend to be tightly clustered as well into about two
      groups, and are probably already on 2-3 cache lines in practice.

      For the common case of 'cache cold' IRQs it's the depth of the
      call chain and the fragmentation of the resulting I$ that should
      be the main performance limit - not the overall size of it.

    - The CPU side cost of IRQ delivery is still very expensive even
      in the best, most cached case, as in 'over a thousand cycles'.
      So much stuff is done that maybe contemporary x86 IRQ entry
      microcode already prefetches the IDT entry and its expected
      call target address."[*]

    [*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com

The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need
modification. Previously, %rsp was manually decreased by 15*8; with
this patch, %rsp is decreased by 15 pushq instructions.

[jpoimboe@redhat.com: unwind hint improvements]

Suggested-by: Linus Torvalds
Signed-off-by: Dominik Brodowski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman

---
 arch/x86/entry/calling.h  |   42 +-----------------------------------------
 arch/x86/entry/entry_64.S |   20 +++++++++-----------
 2 files changed, 10 insertions(+), 52 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -97,46 +97,6 @@ For 32-bit we have the following convent
 
 #define SIZEOF_PTREGS 21*8
 
-	.macro ALLOC_PT_GPREGS_ON_STACK
-	addq $-(15*8), %rsp
-	.endm
-
-	.macro SAVE_AND_CLEAR_REGS offset=0
-	/*
-	 * Save registers and sanitize registers of values that a
-	 * speculation attack might otherwise want to exploit. The
-	 * lower registers are likely clobbered well before they
-	 * could be put to use in a speculative execution gadget.
-	 * Interleave XOR with PUSH for better uop scheduling:
-	 */
-	movq %rdi, 14*8+\offset(%rsp)
-	movq %rsi, 13*8+\offset(%rsp)
-	movq %rdx, 12*8+\offset(%rsp)
-	movq %rcx, 11*8+\offset(%rsp)
-	movq %rax, 10*8+\offset(%rsp)
-	movq %r8, 9*8+\offset(%rsp)
-	xorq %r8, %r8 /* nospec r8 */
-	movq %r9, 8*8+\offset(%rsp)
-	xorq %r9, %r9 /* nospec r9 */
-	movq %r10, 7*8+\offset(%rsp)
-	xorq %r10, %r10 /* nospec r10 */
-	movq %r11, 6*8+\offset(%rsp)
-	xorq %r11, %r11 /* nospec r11 */
-	movq %rbx, 5*8+\offset(%rsp)
-	xorl %ebx, %ebx /* nospec rbx */
-	movq %rbp, 4*8+\offset(%rsp)
-	xorl %ebp, %ebp /* nospec rbp */
-	movq %r12, 3*8+\offset(%rsp)
-	xorq %r12, %r12 /* nospec r12 */
-	movq %r13, 2*8+\offset(%rsp)
-	xorq %r13, %r13 /* nospec r13 */
-	movq %r14, 1*8+\offset(%rsp)
-	xorq %r14, %r14 /* nospec r14 */
-	movq %r15, 0*8+\offset(%rsp)
-	xorq %r15, %r15 /* nospec r15 */
-	UNWIND_HINT_REGS offset=\offset
-	.endm
-
 	.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax
 	/*
 	 * Push registers and sanitize registers of values that a
@@ -211,7 +171,7 @@ For 32-bit we have the following convent
  * is just setting the LSB, which makes it an invalid stack address and is also
  * a signal to the unwinder that it's a pt_regs pointer in disguise.
  *
- * NOTE: This macro must be used *after* SAVE_AND_CLEAR_REGS because it corrupts
+ * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts
  * the original rbp.
  */
 .macro ENCODE_FRAME_POINTER ptregs_offset=0
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -867,7 +867,9 @@ ENTRY(\sym)
 	pushq $-1 /* ORIG_RAX: no syscall to restart */
 	.endif
 
-	ALLOC_PT_GPREGS_ON_STACK
+	/* Save all registers in pt_regs */
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
 
 	.if \paranoid < 2
 	testb $3, CS(%rsp) /* If coming from userspace, switch stacks */
@@ -1115,15 +1117,12 @@ idtentry machine_check do_mce has_err
 #endif
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
+ * Switch gs if needed.
  * Use slow, but surefire "are we in kernel?" check.
  * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
  */
 ENTRY(paranoid_entry)
-	UNWIND_HINT_FUNC
 	cld
-	SAVE_AND_CLEAR_REGS 8
-	ENCODE_FRAME_POINTER 8
 	movl $1, %ebx
 	movl $MSR_GS_BASE, %ecx
 	rdmsr
@@ -1136,7 +1135,7 @@ ENTRY(paranoid_entry)
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
 	ret
-END(paranoid_entry)
+ENDPROC(paranoid_entry)
 
 /*
  * "Paranoid" exit path from exception stack. This is invoked
@@ -1167,14 +1166,12 @@ ENTRY(paranoid_exit)
 END(paranoid_exit)
 
 /*
- * Save all registers in pt_regs, and switch gs if needed.
+ * Switch gs if needed.
  * Return: EBX=0: came from user mode; EBX=1: otherwise
  */
 ENTRY(error_entry)
-	UNWIND_HINT_FUNC
+	UNWIND_HINT_REGS offset=8
 	cld
-	SAVE_AND_CLEAR_REGS 8
-	ENCODE_FRAME_POINTER 8
 	testb $3, CS+8(%rsp)
 	jz .Lerror_kernelspace
 
@@ -1565,7 +1562,8 @@ end_repeat_nmi:
 	 * frame to point back to repeat_nmi.
 	 */
 	pushq $-1 /* ORIG_RAX: no syscall to restart */
-	ALLOC_PT_GPREGS_ON_STACK
+	PUSH_AND_CLEAR_REGS
+	ENCODE_FRAME_POINTER
 
 	/*
 	 * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit