Date: Thu, 03 Jan 2019 16:34:58 -0800
From: hpa@zytor.com
To: Nadav Amit, Ingo Molnar, Andy Lutomirski, Peter Zijlstra, Josh Poimboeuf, Edward Cree
CC: Thomas Gleixner, LKML, Nadav Amit, X86 ML, Paolo Abeni, Borislav Petkov, David Woodhouse
Subject: Re: [RFC v2 1/6] x86: introduce kernel restartable sequence
In-Reply-To: <20181231072112.21051-2-namit@vmware.com>
References: <20181231072112.21051-1-namit@vmware.com> <20181231072112.21051-2-namit@vmware.com>
Message-ID: <7C07ACBD-A269-4F00-A3FD-2041B27146D4@zytor.com>
User-Agent: K-9 Mail for Android
X-Mailing-List: linux-kernel@vger.kernel.org

On December 30, 2018 11:21:07 PM PST, Nadav
Amit wrote:
>It is sometimes beneficial to have a restartable sequence - very few
>instructions which if they are preempted jump to a predefined point.
>
>To provide such functionality on x86-64, we use an empty REX-prefix
>(opcode 0x40) as an indication for instruction in such a sequence. Before
>calling the schedule IRQ routine, if the "magic" prefix is found, we
>call a routine to adjust the instruction pointer. It is expected that
>this opcode is not in common use.
>
>The following patch will make use of this function. Since there are no
>other users (yet?), the patch does not bother to create a general
>infrastructure and API that others can use for such sequences. Yet, it
>should not be hard to make such extension later.
>
>Signed-off-by: Nadav Amit
>---
> arch/x86/entry/entry_64.S            | 16 ++++++++++++++--
> arch/x86/include/asm/nospec-branch.h | 12 ++++++++++++
> arch/x86/kernel/traps.c              |  7 +++++++
> 3 files changed, 33 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>index 1f0efdb7b629..e144ff8b914f 100644
>--- a/arch/x86/entry/entry_64.S
>+++ b/arch/x86/entry/entry_64.S
>@@ -644,12 +644,24 @@ retint_kernel:
> 	/* Interrupts are off */
> 	/* Check if we need preemption */
> 	btl	$9, EFLAGS(%rsp)		/* were interrupts off? */
>-	jnc	1f
>+	jnc	2f
> 0:	cmpl	$0, PER_CPU_VAR(__preempt_count)
>+	jnz	2f
>+
>+	/*
>+	 * Allow to use restartable code sections in the kernel. Consider an
>+	 * instruction with the first byte having REX prefix without any bits
>+	 * set as an indication for an instruction in such a section.
>+	 */
>+	movq	RIP(%rsp), %rax
>+	cmpb	$KERNEL_RESTARTABLE_PREFIX, (%rax)
> 	jnz	1f
>+	mov	%rsp, %rdi
>+	call	restart_kernel_rseq
>+1:
> 	call	preempt_schedule_irq
> 	jmp	0b
>-1:
>+2:
> #endif
> 	/*
> 	 * The iretq could re-enable interrupts:
>diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
>index dad12b767ba0..be4713ef0940 100644
>--- a/arch/x86/include/asm/nospec-branch.h
>+++ b/arch/x86/include/asm/nospec-branch.h
>@@ -54,6 +54,12 @@
> 	jnz	771b;				\
> 	add	$(BITS_PER_LONG/8) * nr, sp;
>
>+/*
>+ * An empty REX-prefix is an indication that this instruction is part of kernel
>+ * restartable sequence.
>+ */
>+#define KERNEL_RESTARTABLE_PREFIX	(0x40)
>+
> #ifdef __ASSEMBLY__
>
> /*
>@@ -150,6 +156,12 @@
> #endif
> .endm
>
>+.macro restartable_seq_prefix
>+#ifdef CONFIG_PREEMPT
>+	.byte KERNEL_RESTARTABLE_PREFIX
>+#endif
>+.endm
>+
> #else /* __ASSEMBLY__ */
>
> #define ANNOTATE_NOSPEC_ALTERNATIVE \
>diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>index 85cccadb9a65..b1e855bad5ac 100644
>--- a/arch/x86/kernel/traps.c
>+++ b/arch/x86/kernel/traps.c
>@@ -59,6 +59,7 @@
> #include
> #include
> #include
>+#include
>
> #ifdef CONFIG_X86_64
> #include
>@@ -186,6 +187,12 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
> 	return 0;
> }
>
>+asmlinkage __visible void restart_kernel_rseq(struct pt_regs *regs)
>+{
>+	if (user_mode(regs) || *(u8 *)regs->ip != KERNEL_RESTARTABLE_PREFIX)
>+		return;
>+}
>+
> static nokprobe_inline int
> do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
> 		  struct pt_regs *regs, long error_code)

A 0x40 prefix is *not* a noop. It changes the interpretation of byte
registers 4 through 7 from ah, ch, dh, bh to spl, bpl, sil and dil.

It may not matter in your application but:

a. You need to clarify that this is the case, and why;
b. Phrase it differently so others don't propagate the same misunderstanding.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
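
To make the byte-register point concrete, here is a minimal sketch in plain
user-space C (not kernel code). It only tabulates which 8-bit register a 3-bit
register field selects with and without a REX prefix, assuming REX.R and REX.B
are clear, as they are for a bare 0x40. The names byte_reg_name() and
rex_changes_operand() are hypothetical and exist only for illustration; they
are not part of the patch or of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return the name of the 8-bit register selected by a 3-bit register
 * field (0..7). has_rex is true when any REX prefix, including the
 * "empty" 0x40, precedes the opcode. Extension bits (REX.R/REX.B) are
 * assumed clear, so only registers 0..7 are covered.
 */
static const char *byte_reg_name(unsigned int field, bool has_rex)
{
	static const char *const legacy[8] = {
		"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
	};
	static const char *const with_rex[8] = {
		"al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"
	};

	assert(field < 8);
	return has_rex ? with_rex[field] : legacy[field];
}

/*
 * True when adding a REX prefix would make the same register field
 * refer to a different byte register, i.e. the prefix is not a no-op
 * for that operand.
 */
static bool rex_changes_operand(unsigned int field)
{
	return strcmp(byte_reg_name(field, false),
		      byte_reg_name(field, true)) != 0;
}
```

Under these assumptions, rex_changes_operand() is false for fields 0 through 3
and true for fields 4 through 7, so a byte-sized instruction that names ah, ch,
dh or bh would change meaning if 0x40 were prepended to it.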