Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751664AbcCFRhL (ORCPT ); Sun, 6 Mar 2016 12:37:11 -0500 Received: from mail-ob0-f178.google.com ([209.85.214.178]:35642 "EHLO mail-ob0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751271AbcCFRhD (ORCPT ); Sun, 6 Mar 2016 12:37:03 -0500 MIME-Version: 1.0 In-Reply-To: References: <20160306165546.GA6902@pd.tnic> From: Andy Lutomirski Date: Sun, 6 Mar 2016 09:36:42 -0800 Message-ID: Subject: Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling To: Brian Gerst Cc: Borislav Petkov , Andy Lutomirski , "the arch/x86 maintainers" , Linux Kernel Mailing List , Oleg Nesterov , Andrew Cooper Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5838 Lines: 142 On Sun, Mar 6, 2016 at 9:30 AM, Brian Gerst wrote: > On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov wrote: >> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: >>> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >>> if a user does SYSENTER with TF set, we will single-step through the >>> kernel until something clears TF. There is absolutely nothing we can >>> do to prevent this short of turning off SYSENTER [1]. >>> >>> Simplify the handling considerably with two changes: >>> >>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>> add TF to that list of flags to sanitize with no overhead whatsoever. >>> >>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >>> >>> That's all we need to do. >>> >>> Don't get too excited -- our handling is still buggy on 32-bit >>> kernels. There's nothing wrong with the SYSENTER code itself, but >>> the #DB prologue has a clever fixup for traps on the very first >>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite >>> correctly. The next two patches will fix that. >>> >>> [1] We could probably prevent it by forcing BTF on at all times and >>> making sure we clear TF before any branches in the SYSENTER >>> code. Needless to say, this is a bad idea. >>> >>> Signed-off-by: Andy Lutomirski >>> --- >>> arch/x86/entry/entry_32.S | 42 ++++++++++++++++++++++---------- >>> arch/x86/entry/entry_64_compat.S | 9 ++++++- >>> arch/x86/include/asm/proto.h | 15 ++++++++++-- >>> arch/x86/kernel/traps.c | 52 +++++++++++++++++++++++++++++++++------- >>> 4 files changed, 94 insertions(+), 24 deletions(-) >>> >>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S >>> index a8c3424c3392..7700cf695d8c 100644 >>> --- a/arch/x86/entry/entry_32.S >>> +++ b/arch/x86/entry/entry_32.S >>> @@ -287,7 +287,26 @@ need_resched: >>> END(resume_kernel) >>> #endif >>> >>> - # SYSENTER call handler stub >>> +GLOBAL(__begin_SYSENTER_singlestep_region) >>> +/* >>> + * All code from here through __end_SYSENTER_singlestep_region is subject >>> + * to being single-stepped if a user program sets TF and executes SYSENTER. >>> + * There is absolutely nothing that we can do to prevent this from happening >>> + * (thanks Intel!). To keep our handling of this situation as simple as >>> + * possible, we handle TF just like AC and NT, except that our #DB handler >>> + * will ignore all of the single-step traps generated in this range. >>> + */ >>> + >>> +#ifdef CONFIG_XEN >>> +/* >>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER >>> + * entry point expects, so fix it up before using the normal path. >>> + */ >>> +ENTRY(xen_sysenter_target) >>> + addl $5*4, %esp /* remove xen-provided frame */ >>> + jmp sysenter_past_esp >>> +#endif >>> + >>> ENTRY(entry_SYSENTER_32) >>> movl TSS_sysenter_sp0(%esp), %esp >>> sysenter_past_esp: >> >> Can you do this below (ontop of your diff) and get rid of those >> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat >> globals and use the entry_SYSENTER_{32|compat} symbols instead? >> >> From a quick scan, kallsyms can give you the symbol size too so that you >> can compute where it ends: >> >> readelf -a vmlinux | grep entry_SYSENTER >> 55454: ffffffff8170aff0 99 FUNC GLOBAL DEFAULT 1 entry_SYSENTER_compat >> 62596: ffffffff8170b053 0 NOTYPE GLOBAL DEFAULT 1 __end_entry_SYSENTER_comp >> >> 0xffffffff8170aff0 + 99 = 0xffffffff8170b053 >> >> --- >> Index: b/arch/x86/entry/entry_32.S >> =================================================================== >> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 >> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 >> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio >> * will ignore all of the single-step traps generated in this range. >> */ >> >> +ENTRY(entry_SYSENTER_32) >> #ifdef CONFIG_XEN >> -/* >> - * Xen doesn't set %esp to be precisely what the normal SYSENTER >> - * entry point expects, so fix it up before using the normal path. >> - */ >> -ENTRY(xen_sysenter_target) >> addl $5*4, %esp /* remove xen-provided frame */ >> jmp sysenter_past_esp >> -#endif >> - >> -ENTRY(entry_SYSENTER_32) >> +#else >> movl TSS_sysenter_sp0(%esp), %esp >> +#endif >> sysenter_past_esp: >> pushl $__USER_DS /* pt_regs->ss */ >> pushl %ebp /* pt_regs->sp (stashed in bp) */ > > > This won't work if you run a Xen enabled kernel on bare metal. This > would work though: > > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", > X86_FEATURE_XENPV That might work. > > I haven't read the Xen hypervisor code, but what are those 5 words > that were pushed on the stack by the hypervisor? It suspiciously is > the size of an IRET frame. I think it is nominally an IRET frame. One might wonder what's in it, given that the hypervisor doesn't know what the old IP or SP was. > Considering that we don't use SYSEXIT on > Xen anymore, can we just redirect SYSENTER to the INT80 handler? > Perhaps even just disable SYSENTER support in the VDSO on Xen. I > can't imagine SYSENTER is any faster than INT80 on Xen, because it has > to trap to the hypervisor first. > I think we should leave it enabled -- having user programs on Xen PV trap into the hypervisor for a system call using SYSENTER will still be much faster than using INT $0x80 as long as the hypervisor does something reasonable. I think that Andrew Cooper has some patches in the works to clean a bunch of the Xen PV entry prologue stuff up. --Andy