Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753400AbbDATZ1 (ORCPT );
	Wed, 1 Apr 2015 15:25:27 -0400
Received: from mail.kernel.org ([198.145.29.136]:47898 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753241AbbDATZZ (ORCPT );
	Wed, 1 Apr 2015 15:25:25 -0400
From: Andy Lutomirski
To: Ingo Molnar, x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Borislav Petkov, Denys Vlasenko, Andy Lutomirski
Subject: [PATCH urgent] x86, asm: Disable opportunistic SYSRET if regs->flags has TF set
Date: Wed, 1 Apr 2015 12:25:20 -0700
Message-Id: <2805a341e0dddb37b018486b0ab4162e2f2fb118.1427916036.git.luto@kernel.org>
X-Mailer: git-send-email 2.3.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2649
Lines: 71

When I wrote the opportunistic SYSRET code, I missed an important
difference between SYSRET and IRET. Both instructions are capable
of setting EFLAGS.TF, but they behave differently when doing so.
IRET will not issue a #DB trap after execution when it sets TF.
This is critical -- otherwise you'd never be able to make forward
progress when returning to userspace. SYSRET, on the other hand,
will trap with #DB immediately after returning to CPL3, and the next
instruction will never execute.

This breaks anything that opportunistically SYSRETs to a user
context with TF set. For example, running this code with TF set and
a SIGTRAP handler loaded never gets past post_nop.

	extern unsigned char post_nop[];

	asm volatile ("pushfq\n\t"
		      "popq %%r11\n\t"
		      "nop\n\t"
		      "post_nop:"
		      : : "c" (post_nop) : "r11");

In my defense, I can't find this documented in the AMD or Intel manual.

Fix it by using IRET to restore TF. Since it's late, I'm keeping this
minimal and keeping "testq" instead of switching to "testl".

Fixes: 2a23c6b8a9c4 x86_64, entry: Use sysret to return to userspace when possible
Signed-off-by: Andy Lutomirski
---

This affects 4.0-rc as well as -tip. A full test case lives here:

https://git.kernel.org/cgit/linux/kernel/git/luto/misc-tests.git/

It's called single_step_syscall_64. On Intel systems, the 32-bit version
of that test fails for unrelated reasons, but that's not a regression,
and fixing it will be much more intrusive.

 arch/x86/kernel/entry_64.S | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 750c6efcb718..369f2716ef3f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -715,7 +715,14 @@ retint_swapgs:		/* return to user-space */
 	cmpq %r11,EFLAGS(%rsp)		/* R11 == RFLAGS */
 	jne opportunistic_sysret_failed
 
-	testq $X86_EFLAGS_RF,%r11	/* sysret can't restore RF */
+	/*
+	 * SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET,
+	 * restoring TF results in a trap from userspace immediately after
+	 * SYSRET. This would cause an infinite loop whenever #DB happens
+	 * with register state that satisfies the opportunistic SYSRET
+	 * conditions.
+	 */
+	testq $(X86_EFLAGS_RF|X86_EFLAGS_TF),%r11
 	jnz opportunistic_sysret_failed
 
 	/* nothing to check for RSP */
-- 
2.3.0
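
For readers who find the flag test easier to follow in C, the check that the hunk above changes can be modelled roughly as follows. This is only an illustrative sketch, not kernel code: the helper name sysret_flags_ok() is hypothetical, and it models only the RF/TF test, not the other opportunistic SYSRET conditions (such as R11 == RFLAGS) that entry_64.S checks around it. The two constants are the architectural RFLAGS bit values (TF is bit 8, RF is bit 16).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits: TF (trap flag) is bit 8, RF (resume flag) is bit 16. */
#define X86_EFLAGS_TF UINT64_C(0x00000100)
#define X86_EFLAGS_RF UINT64_C(0x00010000)

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns true when the saved user RFLAGS value does not rule out SYSRET
 * on account of RF or TF.  This mirrors
 *
 *	testq $(X86_EFLAGS_RF|X86_EFLAGS_TF),%r11
 *	jnz opportunistic_sysret_failed
 *
 * RF cannot be restored by SYSRET at all; TF can be restored, but doing so
 * traps immediately in userspace, so both bits force the IRET path.
 */
static inline bool sysret_flags_ok(uint64_t saved_rflags)
{
	return (saved_rflags & (X86_EFLAGS_RF | X86_EFLAGS_TF)) == 0;
}
```

With a typical user RFLAGS value of 0x202 (IF plus the always-set bit 1), sysret_flags_ok() returns true; OR-ing in either X86_EFLAGS_TF or X86_EFLAGS_RF makes it return false, which corresponds to jumping to opportunistic_sysret_failed and returning via IRET.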