Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751727AbcCFRah (ORCPT ); Sun, 6 Mar 2016 12:30:37 -0500 Received: from mail-ob0-f193.google.com ([209.85.214.193]:33360 "EHLO mail-ob0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751484AbcCFRaa (ORCPT ); Sun, 6 Mar 2016 12:30:30 -0500 MIME-Version: 1.0 In-Reply-To: <20160306165546.GA6902@pd.tnic> References: <20160306165546.GA6902@pd.tnic> Date: Sun, 6 Mar 2016 12:30:28 -0500 Message-ID: Subject: Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling From: Brian Gerst To: Borislav Petkov Cc: Andy Lutomirski , "the arch/x86 maintainers" , Linux Kernel Mailing List , Oleg Nesterov , Andrew Cooper Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5137 Lines: 125 On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov wrote: > On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: >> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >> if a user does SYSENTER with TF set, we will single-step through the >> kernel until something clears TF. There is absolutely nothing we can >> do to prevent this short of turning off SYSENTER [1]. >> >> Simplify the handling considerably with two changes: >> >> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >> add TF to that list of flags to sanitize with no overhead whatsoever. >> >> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >> >> That's all we need to do. >> >> Don't get too excited -- our handling is still buggy on 32-bit >> kernels. There's nothing wrong with the SYSENTER code itself, but >> the #DB prologue has a clever fixup for traps on the very first >> instruction of entry_SYSENTER_32, and the fixup doesn't work quite >> correctly. The next two patches will fix that. >> >> [1] We could probably prevent it by forcing BTF on at all times and >> making sure we clear TF before any branches in the SYSENTER >> code. Needless to say, this is a bad idea. >> >> Signed-off-by: Andy Lutomirski >> --- >> arch/x86/entry/entry_32.S | 42 ++++++++++++++++++++++---------- >> arch/x86/entry/entry_64_compat.S | 9 ++++++- >> arch/x86/include/asm/proto.h | 15 ++++++++++-- >> arch/x86/kernel/traps.c | 52 +++++++++++++++++++++++++++++++++------- >> 4 files changed, 94 insertions(+), 24 deletions(-) >> >> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S >> index a8c3424c3392..7700cf695d8c 100644 >> --- a/arch/x86/entry/entry_32.S >> +++ b/arch/x86/entry/entry_32.S >> @@ -287,7 +287,26 @@ need_resched: >> END(resume_kernel) >> #endif >> >> - # SYSENTER call handler stub >> +GLOBAL(__begin_SYSENTER_singlestep_region) >> +/* >> + * All code from here through __end_SYSENTER_singlestep_region is subject >> + * to being single-stepped if a user program sets TF and executes SYSENTER. >> + * There is absolutely nothing that we can do to prevent this from happening >> + * (thanks Intel!). To keep our handling of this situation as simple as >> + * possible, we handle TF just like AC and NT, except that our #DB handler >> + * will ignore all of the single-step traps generated in this range. >> + */ >> + >> +#ifdef CONFIG_XEN >> +/* >> + * Xen doesn't set %esp to be precisely what the normal SYSENTER >> + * entry point expects, so fix it up before using the normal path. >> + */ >> +ENTRY(xen_sysenter_target) >> + addl $5*4, %esp /* remove xen-provided frame */ >> + jmp sysenter_past_esp >> +#endif >> + >> ENTRY(entry_SYSENTER_32) >> movl TSS_sysenter_sp0(%esp), %esp >> sysenter_past_esp: > > Can you do this below (ontop of your diff) and get rid of those > __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat > globals and use the entry_SYSENTER_{32|compat} symbols instead? > > From a quick scan, kallsyms can give you the symbol size too so that you > can compute where it ends: > > readelf -a vmlinux | grep entry_SYSENTER > 55454: ffffffff8170aff0 99 FUNC GLOBAL DEFAULT 1 entry_SYSENTER_compat > 62596: ffffffff8170b053 0 NOTYPE GLOBAL DEFAULT 1 __end_entry_SYSENTER_comp > > 0xffffffff8170aff0 + 99 = 0xffffffff8170b053 > > --- > Index: b/arch/x86/entry/entry_32.S > =================================================================== > --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 > +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 > @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio > * will ignore all of the single-step traps generated in this range. > */ > > +ENTRY(entry_SYSENTER_32) > #ifdef CONFIG_XEN > -/* > - * Xen doesn't set %esp to be precisely what the normal SYSENTER > - * entry point expects, so fix it up before using the normal path. > - */ > -ENTRY(xen_sysenter_target) > addl $5*4, %esp /* remove xen-provided frame */ > jmp sysenter_past_esp > -#endif > - > -ENTRY(entry_SYSENTER_32) > +#else > movl TSS_sysenter_sp0(%esp), %esp > +#endif > sysenter_past_esp: > pushl $__USER_DS /* pt_regs->ss */ > pushl %ebp /* pt_regs->sp (stashed in bp) */ This won't work if you run a Xen enabled kernel on bare metal. This would work though: ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", X86_FEATURE_XENPV I haven't read the Xen hypervisor code, but what are those 5 words that were pushed on the stack by the hypervisor? It suspiciously is the size of an IRET frame. Considering that we don't use SYSEXIT on Xen anymore, can we just redirect SYSENTER to the INT80 handler? Perhaps even just disable SYSENTER support in the VDSO on Xen. I can't imagine SYSENTER is any faster than INT80 on Xen, because it has to trap to the hypervisor first. -- Brian Gerst