In-Reply-To: <20180215163602.61162-5-namit@vmware.com>
References: <20180215163602.61162-1-namit@vmware.com> <20180215163602.61162-5-namit@vmware.com>
From: Andy Lutomirski
Date: Thu, 15 Feb 2018 20:02:47 +0000
Subject: Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
To: Nadav Amit
Cc: Ingo Molnar, Thomas Gleixner, Andy Lutomirski, Peter Zijlstra, Dave Hansen, Willy Tarreau, Nadav Amit, X86 ML, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit wrote:
> Based on the understanding that there should be no way for userspace to
> address the kernel-space from compatibility mode, disable it while
> running in compatibility mode as long as the 64-bit code segment of the
> user is not used.
>
> Reenabling PTI is performed by restoring NX-bits to the userspace
> mappings, flushing the TLBs, and notifying all the CPUs that use the
> affected mm to disable PTI. Each core responds by removing the present
> bit for the 64-bit code-segment, and marking that PTI is disabled on
> that core.
>

I dislike this patch because it's conflating two things. The patch
claims to merely disable PTI for compat tasks, whatever those are.
But it's also introducing a much stronger concept of what a compat
task is. The kernel currently mostly doesn't care whether a task is
"compat" or not, and I think that most remaining code paths that do
care are buggy and should be removed.

I think the right way to approach this is to add a new arch_prctl()
that changes allowable bitness, like this:

arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64);

This would set the current task to work the normal way, where 32-bit
and 64-bit CS are available. You could set just X86_ALLOW_CS32 to
deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode. This
would make nice attack surface reduction tools for the more paranoid
sandbox users to use. Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0)
would return -EINVAL.
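
To make the proposed semantics concrete, here is a minimal userspace
sketch of the argument checking such an arch_prctl() might do. None of
this exists in the kernel: the flag values, struct task_cs_policy, and
both helper names are hypothetical, and rejecting unknown bits is an
extra assumption beyond the "0 returns -EINVAL" rule stated above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical values: the proposal names these flags but assigns no numbers. */
#define X86_ALLOW_CS32   (1UL << 0)
#define X86_ALLOW_CS64   (1UL << 1)
#define X86_ALLOW_CS_ALL (X86_ALLOW_CS32 | X86_ALLOW_CS64)

/* Hypothetical per-task state; a real patch would keep this in the task. */
struct task_cs_policy {
	unsigned long allowed_cs;
};

/* Sketch of the ARCH_SET_ALLOWED_GDT_CS handler's validation. */
static inline int set_allowed_gdt_cs(struct task_cs_policy *t,
				     unsigned long mask)
{
	/* An empty mask would leave no usable CS. Unknown bits: assumed invalid. */
	if (mask == 0 || (mask & ~X86_ALLOW_CS_ALL))
		return -EINVAL;

	t->allowed_cs = mask;
	return 0;
}

/* True when only the 32-bit CS is allowed, i.e. 64-bit mode is denied. */
static inline int task_is_cs32_only(const struct task_cs_policy *t)
{
	return t->allowed_cs == X86_ALLOW_CS32;
}

/* Usage example covering the three cases described in the text. */
static inline void policy_selfcheck(void)
{
	struct task_cs_policy t = { .allowed_cs = X86_ALLOW_CS_ALL };

	assert(set_allowed_gdt_cs(&t, 0) == -EINVAL);
	assert(set_allowed_gdt_cs(&t, X86_ALLOW_CS32) == 0);
	assert(task_is_cs32_only(&t));
}
```

The point of the sketch is that the policy is an explicit, per-task
opt-in rather than something inferred from how the task happens to be
running.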

A separate patch would turn PTI off if you set X86_ALLOW_CS32. This
has the downside that old code doesn't get the benefit without some
code change, but that's not the end of the world.

> +static void pti_cpu_update_func(void *info)
> +{
> +	struct mm_struct *mm = (struct mm_struct *)info;
> +
> +	if (mm != this_cpu_read(cpu_tlbstate.loaded_mm))
> +		return;
> +
> +	/*
> +	 * Keep CS64 and CPU settings in sync despite potential concurrent
> +	 * updates.
> +	 */
> +	set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
> +}

I don't like this at all. IMO a sane implementation should never
change PTI status on a remote CPU. Just track it per task.

> +void __pti_reenable(void)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int cpu;
> +
> +	if (!mm_pti_disable(mm))
> +		return;
> +
> +	/*
> +	 * Prevent spurious page-fault storm while we set the NX-bit and have
> +	 * yet not updated the per-CPU pti_disable flag.
> +	 */
> +	down_write(&mm->mmap_sem);
> +
> +	if (!mm_pti_disable(mm))
> +		goto out;
> +
> +	/*
> +	 * First, mark the PTI is enabled. Although we do anything yet, we are
> +	 * safe as long as we do not reenable CS64. Since we did not update the
> +	 * page tables yet, this may lead to spurious page-faults, but we need
> +	 * the pti_disable in mm to be set for __pti_set_user_pgd() to do the
> +	 * right thing. Holding mmap_sem would ensure matter we hold the
> +	 * mmap_sem to prevent them from swamping the system.
> +	 */
> +	mm->context.pti_disable = PTI_DISABLE_OFF;
> +
> +	/* Second, restore the NX bits. */
> +	pti_update_user_pgds(mm, true);

You're holding mmap_sem, but there are code paths that touch page
tables that don't hold mmap_sem, such as the stack extension code.

> +
> +bool pti_handle_segment_not_present(long error_code)
> +{
> +	if (!static_cpu_has(X86_FEATURE_PTI))
> +		return false;
> +
> +	if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
> +		return false;
> +
> +	pti_reenable();
> +	return true;
> +}

Please don't. You're trying to emulate the old behavior here, but
you're emulating it wrong. In particular, you won't trap on LAR.
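
For reference, here is a small standalone sketch of what the quoted
error_code comparison is matching. The selector error code layout (EXT,
IDT, TI bits, then the descriptor index) follows the architectural
format; struct sel_error and both helper names are made up for
illustration, and the value 6 for GDT_ENTRY_DEFAULT_USER_CS is assumed
to be the x86_64 one. The check only fires when the CPU raises a
segment-not-present fault, and LAR reads a descriptor's access rights
without raising that fault when the present bit is clear, so a handler
built this way never sees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 GDT slot of the 64-bit user code segment. */
#define GDT_ENTRY_DEFAULT_USER_CS 6

/* Hypothetical decoded view of a selector error code. */
struct sel_error {
	bool ext;        /* bit 0: event external to the program */
	bool idt;        /* bit 1: index refers to the IDT */
	bool ldt;        /* bit 2: TI, index refers to the LDT (if !idt) */
	unsigned index;  /* bits 3..15: descriptor index */
};

static inline struct sel_error decode_selector_error(long error_code)
{
	uint16_t ec = (uint16_t)error_code;
	struct sel_error e = {
		.ext = ec & 0x1,
		.idt = ec & 0x2,
		.ldt = ec & 0x4,
		.index = ec >> 3,
	};
	return e;
}

/*
 * Same comparison as the quoted patch: true only for the GDT entry of the
 * 64-bit user CS with EXT, IDT and TI all clear.
 */
static inline bool names_user_cs64(long error_code)
{
	return (unsigned short)error_code == (GDT_ENTRY_DEFAULT_USER_CS << 3);
}
```

With the assumed index of 6, the only low 16-bit value that matches is
0x30; anything that inspects the descriptor without faulting is
invisible to this path.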