Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753108AbcCGSrI (ORCPT ); Mon, 7 Mar 2016 13:47:08 -0500 Received: from mail-ob0-f180.google.com ([209.85.214.180]:33213 "EHLO mail-ob0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752607AbcCGSrC (ORCPT ); Mon, 7 Mar 2016 13:47:02 -0500 MIME-Version: 1.0 In-Reply-To: References: From: Andy Lutomirski Date: Mon, 7 Mar 2016 10:46:42 -0800 Message-ID: Subject: Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling To: Brian Gerst Cc: Andy Lutomirski , "the arch/x86 maintainers" , Linux Kernel Mailing List , Borislav Petkov , Oleg Nesterov , Andrew Cooper Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3939 Lines: 85 On Mon, Mar 7, 2016 at 10:41 AM, Brian Gerst wrote: > On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski wrote: >> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: >>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: >>>> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >>>> if a user does SYSENTER with TF set, we will single-step through the >>>> kernel until something clears TF. There is absolutely nothing we can >>>> do to prevent this short of turning off SYSENTER [1]. >>>> >>>> Simplify the handling considerably with two changes: >>>> >>>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>>> add TF to that list of flags to sanitize with no overhead whatsoever. >>>> >>>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >>> >>> What is wrong with the current method of clearing TF and setting >>> TIF_SINGLESTEP on the first debug trap? This patch actually increases >>> complexity because it has to check for a range of addresses rather >>> than just the first instruction, plus it has to singlestep all the way >>> through the SYSENTER prologue. >>> >>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think >>> this patch is an improvement. >> >> TIF_SINGLESTEP has bizarrely overloaded semantics. >> >> There's this: >> >> /* >> * If we stepped into a sysenter/syscall insn, it trapped in >> * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. >> * If user-mode had set TF itself, then it's still clear from >> * do_debug() and we need to set it again to restore the user >> * state so we don't wrongly set TIF_FORCED_TF below. >> * If enable_single_step() was used last and that is what >> * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are >> * already set and our bookkeeping is fine. >> */ >> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) >> regs->flags |= X86_EFLAGS_TF; >> >> but TIF_SINGLESTEP is also used for other things. (And I need to >> follow up with a patch to remove that comment.) This results in >> incomprehensible behavior: if a user program sets TF and does >> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does >> not happen if a user sets TF and does INT80 or SYSCALL. There was at >> least one bug in here that Oleg fixed a while back, and I wouldn't be >> at all surprised if there were others. >> >> I don't know what would happen if TF were set and SYSENTER were used >> to do a syscall where the __get_user to load the syscall nr failed. >> That happens before the TIF_SINGLESTEP fixup. >> >> Basically, the overloaded use of TIF_SINGLESTEP was complicated and >> hard to understand, and the new behavior is straightforward and >> consistent with other entries, even if it's a bit slower. >> >> We could introduce a new TIF_SYSENTER_TF and use it directly, or we >> could accelerate the TF fixup in regs->flags by switching to a >> different entry path (I had a draft patch to do that), but I tend to >> favor simplicity for things that aren't performance-critical. > > The alternate entry path wouldn't be very big, just 5 instructions > with the OR being the only difference. Agreed. It's just more code to maintain and test. I can certainly do it. I scrapped it the first time because it was messy on Xen PV. It was extra messy because I did this part of the series before I cleaned up the garbage ENTRY(debug) code, and that got in the way as well. > > Another option is to use a per-cpu var instead of a TIF flag. Then we'd need to add a branch in the TF-not-set path, which would be a bit unfortunate. As a third option, we could force BTF on, which would reduce the number of useless traps. In any case, I'd be glad to speed this up as a follow-up patch if anyone cares about performance. --Andy