From: Nadav Amit
To: Ingo Molnar, Andy Lutomirski, Peter Zijlstra, Josh Poimboeuf, Edward Cree
CC: "H . Peter Anvin", Thomas Gleixner, LKML, Nadav Amit, X86 ML, Paolo Abeni, Borislav Petkov, David Woodhouse, Nadav Amit
Subject: [RFC v2 1/6] x86: introduce kernel restartable sequence
Date: Sun, 30 Dec 2018 23:21:07 -0800
Message-ID: <20181231072112.21051-2-namit@vmware.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181231072112.21051-1-namit@vmware.com>
References: <20181231072112.21051-1-namit@vmware.com>
X-Mailing-List: linux-kernel@vger.kernel.org

It is sometimes beneficial to have a restartable sequence: a very short
run of instructions that, if preempted, jumps to a predefined point.

To provide such functionality on x86-64, we use an empty REX-prefix
(opcode 0x40) as an indication that an instruction belongs to such a
sequence. Before calling the schedule IRQ routine, if the "magic" prefix
is found, we call a routine to adjust the instruction pointer. It is
expected that this opcode is not in common use.

The following patch will make use of this function. Since there are no
other users (yet?), the patch does not bother to create a general
infrastructure and API that others can use for such sequences. Still, it
should not be hard to add such an extension later.

Signed-off-by: Nadav Amit
---
 arch/x86/entry/entry_64.S            | 16 ++++++++++++++--
 arch/x86/include/asm/nospec-branch.h | 12 ++++++++++++
 arch/x86/kernel/traps.c              |  7 +++++++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..e144ff8b914f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -644,12 +644,24 @@ retint_kernel:
 	/* Interrupts are off */
 	/* Check if we need preemption */
 	btl	$9, EFLAGS(%rsp)		/* were interrupts off? */
-	jnc	1f
+	jnc	2f
 0:	cmpl	$0, PER_CPU_VAR(__preempt_count)
+	jnz	2f
+
+	/*
+	 * Allow to use restartable code sections in the kernel. Consider an
+	 * instruction with the first byte having REX prefix without any bits
+	 * set as an indication for an instruction in such a section.
+	 */
+	movq	RIP(%rsp), %rax
+	cmpb	$KERNEL_RESTARTABLE_PREFIX, (%rax)
 	jnz	1f
+	mov	%rsp, %rdi
+	call	restart_kernel_rseq
+1:
 	call	preempt_schedule_irq
 	jmp	0b
-1:
+2:
 #endif
 	/*
 	 * The iretq could re-enable interrupts:
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index dad12b767ba0..be4713ef0940 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -54,6 +54,12 @@
 	jnz	771b;				\
 	add	$(BITS_PER_LONG/8) * nr, sp;

+/*
+ * An empty REX-prefix is an indication that this instruction is part of kernel
+ * restartable sequence.
+ */
+#define KERNEL_RESTARTABLE_PREFIX		(0x40)
+
 #ifdef __ASSEMBLY__

 /*
@@ -150,6 +156,12 @@
 #endif
 .endm

+.macro restartable_seq_prefix
+#ifdef CONFIG_PREEMPT
+	.byte KERNEL_RESTARTABLE_PREFIX
+#endif
+.endm
+
 #else /* __ASSEMBLY__ */

 #define ANNOTATE_NOSPEC_ALTERNATIVE				\
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 85cccadb9a65..b1e855bad5ac 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include
 #include
 #include
+#include

 #ifdef CONFIG_X86_64
 #include
@@ -186,6 +187,12 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
 	return 0;
 }

+asmlinkage __visible void restart_kernel_rseq(struct pt_regs *regs)
+{
+	if (user_mode(regs) || *(u8 *)regs->ip != KERNEL_RESTARTABLE_PREFIX)
+		return;
+}
+
 static nokprobe_inline int
 do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
 		  struct pt_regs *regs, long error_code)
--
2.17.1
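
The prefix check that the patch adds to retint_kernel can be illustrated in isolation with a small user-space sketch in C. This is not part of the patch and not kernel code. The names struct fake_regs, struct fake_rseq and fake_restart_rseq are hypothetical, and rewinding the instruction pointer to the start of the sequence is an assumption about what a complete handler might do: restart_kernel_rseq as posted here only tests the prefix and returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as KERNEL_RESTARTABLE_PREFIX in the patch: a REX prefix with no bits set. */
#define KERNEL_RESTARTABLE_PREFIX 0x40

/* Hypothetical stand-in for struct pt_regs; only the saved instruction pointer is modelled. */
struct fake_regs {
	const uint8_t *ip;
};

/* Hypothetical descriptor of one restartable sequence covering [start, end). */
struct fake_rseq {
	const uint8_t *start;
	const uint8_t *end;
};

/* Mirrors the cmpb in retint_kernel: does the interrupted instruction start with 0x40? */
static bool ip_is_marked(const struct fake_regs *regs)
{
	return *regs->ip == KERNEL_RESTARTABLE_PREFIX;
}

/*
 * Assumed behaviour, not taken from the patch: if the interrupted
 * instruction is marked and lies inside the sequence, rewind ip to the
 * start of the sequence. Returns true when ip was changed.
 */
static bool fake_restart_rseq(struct fake_regs *regs, const struct fake_rseq *seq)
{
	assert(regs != NULL && seq != NULL);

	if (!ip_is_marked(regs))
		return false;
	if (regs->ip < seq->start || regs->ip >= seq->end)
		return false;

	regs->ip = seq->start;
	return true;
}
```

With a byte buffer such as { 0x90, 0x40, 0x90, 0x40, 0x90, 0x90 } and a sequence spanning offsets 1 to 5, a saved ip at offset 3 is rewound to offset 1, while a saved ip at offset 0 (a plain nop with no prefix) is left untouched.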