Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753168AbcCGSDq (ORCPT ); Mon, 7 Mar 2016 13:03:46 -0500 Received: from mail-ob0-f182.google.com ([209.85.214.182]:35569 "EHLO mail-ob0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751146AbcCGSDi (ORCPT ); Mon, 7 Mar 2016 13:03:38 -0500 MIME-Version: 1.0 In-Reply-To: References: From: Andy Lutomirski Date: Mon, 7 Mar 2016 10:03:18 -0800 Message-ID: Subject: Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling To: Brian Gerst Cc: Andy Lutomirski , "the arch/x86 maintainers" , Linux Kernel Mailing List , Borislav Petkov , Oleg Nesterov , Andrew Cooper Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2889 Lines: 60 On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: > On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: >> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >> if a user does SYSENTER with TF set, we will single-step through the >> kernel until something clears TF. There is absolutely nothing we can >> do to prevent this short of turning off SYSENTER [1]. >> >> Simplify the handling considerably with two changes: >> >> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >> add TF to that list of flags to sanitize with no overhead whatsoever. >> >> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. > > What is wrong with the current method of clearing TF and setting > TIF_SINGLESTEP on the first debug trap? This patch actually increases > complexity because it has to check for a range of addresses rather > than just the first instruction, plus it has to singlestep all the way > through the SYSENTER prologue. > > Unless there is an actual issue with TIF_SINGLESTEP, I don't think > this patch is an improvement. TIF_SINGLESTEP has bizarrely overloaded semantics. There's this: /* * If we stepped into a sysenter/syscall insn, it trapped in * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. * If user-mode had set TF itself, then it's still clear from * do_debug() and we need to set it again to restore the user * state so we don't wrongly set TIF_FORCED_TF below. * If enable_single_step() was used last and that is what * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are * already set and our bookkeeping is fine. */ if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) regs->flags |= X86_EFLAGS_TF; but TIF_SINGLESTEP is also used for other things. (And I need to follow up with a patch to remove that comment.) This results in incomprehensible behavior: if a user program sets TF and does SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does not happen if a user sets TF and does INT80 or SYSCALL. There was at least one bug in here that Oleg fixed a while back, and I wouldn't be at all surprised if there were others. I don't know what would happen if TF were set and SYSENTER were used to do a syscall where the __get_user to load the syscall nr failed. That happens before the TIF_SINGLESTEP fixup. Basically, the overloaded use of TIF_SINGLESTEP was complicated and hard to understand, and the new behavior is straightforward and consistent with other entries, even if it's a bit slower. We could introduce a new TIF_SYSENTER_TF and use it directly, or we could accelerate the TF fixup in regs->flags by switching to a different entry path (I had a draft patch to do that), but I tend to favor simplicity for things that aren't performance-critical.