Received: by 10.223.185.116 with SMTP id b49csp789887wrg; Fri, 16 Feb 2018 07:15:04 -0800 (PST) X-Google-Smtp-Source: AH8x2255xqzNP9+WaiNS4nSybP74eMVGkossgSDhwxAsrA6+Kq2p0cU8wogOmDSRw+d4hZNFMBEV X-Received: by 10.101.97.79 with SMTP id o15mr1462472pgv.31.1518794104156; Fri, 16 Feb 2018 07:15:04 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518794104; cv=none; d=google.com; s=arc-20160816; b=xKOaKvh4iel2h238EN7Y+7ug6ulJrtGj+2Kmi/9k67Xee4rFNyOzlQzsVQ2H9GoVoK dRBdhhDy3UuLlhrDzcw0LZDtQTFZTeCzO0P5JisLkObykb3BbNZpdfd4B6gQzAjHTAzU NE0gqL2IVzkR7cMtZosP3bjO21oHglZ0ZXw5InoGBG+gl1JGurRx6s+s1mhMoRCXkrPh DuHQQLEZP9TWv1ujIwUjSZr2wyo9GfLVeVsosrvHtIOt+cV0gXoWm1lpJDPJdwZnI1UB EETWcJobAArBCV9omSD9Y2HNoBZBKjLr1vvZThIsVTL6VeNigBz8HzOeM8JQpDnHO31o r6iA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature:arc-authentication-results; bh=6MYeJECKhEZs+2rndBfEA9vBaIIgtb68MrmLdL/rIEw=; b=E2hUiMyl9ZccWC/pzz+4HU9I3Mr2Qq+yTjK5zG8807HdpN6oh9s+0pd6vWdAFVE2uh TEXv/QN1IFOii1ZCDgzsuB0aSjRyOVqkT5HJn/JqntBHlcXx+8PUOt8PuSEg0ZE+VKcS 0fEh7iWhP5rWfapYR+oBBxOAXhyljk4cuqQshPdXenza9Gww+JXqbMKAW38N5XYkhdD7 SYTkhnIMigqGQxxKJB5P4rHImCUlpb67nnlEDMDfyjwfWsb+/hHqLWN/tbPM0unUoYQm YuMkuXCw8MrnfibGuJYpXFbaMpE3VcWZOeI1dxG9rBxaOOr2Wbj0crjI1N1etwgmhCCM flrg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=GP8w6gZg; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j5-v6si2549570pll.591.2018.02.16.07.14.48; Fri, 16 Feb 2018 07:15:04 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=GP8w6gZg; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1163148AbeBOU6r (ORCPT + 99 others); Thu, 15 Feb 2018 15:58:47 -0500 Received: from mail-wm0-f68.google.com ([74.125.82.68]:36631 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1162995AbeBOU6q (ORCPT ); Thu, 15 Feb 2018 15:58:46 -0500 Received: by mail-wm0-f68.google.com with SMTP id f3so3321035wmc.1 for ; Thu, 15 Feb 2018 12:58:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=6MYeJECKhEZs+2rndBfEA9vBaIIgtb68MrmLdL/rIEw=; b=GP8w6gZgFPg1lQSQ66ECl+WwMBCQ5Y123WHrI7AvWTGeZosOUTvE9D8xMovolsHr9Z W+s6Zydq3LNj1H+UCzOEk7HTLmd1MiVqKYJJQwL4tqVh3n6JMV+F5zhoQmwk+cOXFupz Sy+uQBVUm4ZTmIUyHkMzutYpVdmLchfYZml7MThZsHmwvBkQFRApPM2W7hGXiK2Xx0Ul SgjDhm32/kX+7Max588Rh+72YQbLzebuXmKXBrSA0SripGoemGzt945xc7g55fdr1OxC TtqeIK4rV4luwpa7+ATbLYvIL8N84wYwrU28VC9J7bWYulFiDNdxO/jkvHsTm0Mn8eDT r5zQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=6MYeJECKhEZs+2rndBfEA9vBaIIgtb68MrmLdL/rIEw=; b=saFCIqmZzADSRFXepgjonIcYNLNW6/SYuXwP4UB6Rw3Nu9/Qr/ljvwIv2OVJHyaNCK IQWQR66IUcYxFNdX7wSQzy2adwPtEvW4DdrxN88mQdZasdKMu93Vlm8XybAUTmanQakP /sNnu6ydeb2q4a5ZXUXU/+X/FiKiRcsqavY7kRdptoFmgo1P36kAvKPLo40ciIIYYrjB +1d9s/gZHZq4363ta5hY9pWUGktvb7XtemAeLTd70ICuzLfWIStD/stbMgor7INvslE/ 08Tl+69UGyzle1zkGMh4nrxxewh+W0AzIX5/AWzRO0byeqfsC2y1tWa/B6pSpmQL1zHB IMcg== X-Gm-Message-State: APf1xPCggD1egIfokLHGRw6LpHITAQac5V512fmU6eul1qexB0+bu7io jYsk6CkaFXiP3EmwQuAWBts= X-Received: by 10.80.145.246 with SMTP id h51mr5068426eda.302.1518728324628; Thu, 15 Feb 2018 12:58:44 -0800 (PST) Received: from [10.2.101.129] ([208.91.2.2]) by smtp.gmail.com with ESMTPSA id o63sm713609eda.17.2018.02.15.12.58.42 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 15 Feb 2018 12:58:43 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode From: Nadav Amit In-Reply-To: Date: Thu, 15 Feb 2018 12:58:38 -0800 Cc: Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Dave Hansen , Willy Tarreau , X86 ML , LKML Content-Transfer-Encoding: quoted-printable Message-Id: <9EB804CA-0EC9-4CBB-965A-F3C8520201E7@gmail.com> References: <20180215163602.61162-1-namit@vmware.com> <20180215163602.61162-5-namit@vmware.com> To: Andy Lutomirski X-Mailer: Apple Mail (2.3273) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Andy Lutomirski wrote: > On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit wrote: >> Based on the understanding that there should be no way for userspace = to >> address the kernel-space from compatibility mode, disable it while >> running in compatibility mode as long as the 64-bit code segment of = the >> user is not used. >>=20 >> Reenabling PTI is performed by restoring NX-bits to the userspace >> mappings, flushing the TLBs, and notifying all the CPUs that use the >> affected mm to disable PTI. Each core responds by removing the = present >> bit for the 64-bit code-segment, and marking that PTI is disabled on >> that core. >=20 > I dislike this patch because it's conflating two things. The patch > claims to merely disable PTI for compat tasks, whatever those are. > But it's also introducing a much stronger concept of what a compat > task is. The kernel currently mostly doesn't care whether a task is > "compat" or not, and I think that most remaining code paths that do > care are buggy and should be removed. >=20 > I think the right way to approach this is to add a new arch_prctl() > that changes allowable bitness, like this: >=20 > arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64); >=20 > this would set the current task to work the normal way, where 32-bit > and 64-bit CS are available. You could set just X86_ALLOW_CS32 to > deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode. This > would make nice attack surface reduction tools for the more paranoid > sandbox users to use. Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0) > would return -EINVAL. >=20 > A separate patch would turn PTI off if you set X86_ALLOW_CS32. >=20 > This has the downside that old code doesn't get the benefit without > some code change, but that's not the end of the world. >=20 >> +static void pti_cpu_update_func(void *info) >> +{ >> + struct mm_struct *mm =3D (struct mm_struct *)info; >> + >> + if (mm !=3D this_cpu_read(cpu_tlbstate.loaded_mm)) >> + return; >> + >> + /* >> + * Keep CS64 and CPU settings in sync despite potential = concurrent >> + * updates. >> + */ >> + set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable)); >> +} >=20 > I don't like this at all. IMO a sane implementation should never > change PTI status on a remote CPU. Just track it per task. =46rom your comments in the past, I understood that this requirement (enabling/disabling PTI per mm) came from Linus. I can do it per task, = but that means that the NX-bit will always be cleared on the kernel = page-table for user mappings. >> +void __pti_reenable(void) >> +{ >> + struct mm_struct *mm =3D current->mm; >> + int cpu; >> + >> + if (!mm_pti_disable(mm)) >> + return; >> + >> + /* >> + * Prevent spurious page-fault storm while we set the NX-bit = and have >> + * yet not updated the per-CPU pti_disable flag. >> + */ >> + down_write(&mm->mmap_sem); >> + >> + if (!mm_pti_disable(mm)) >> + goto out; >> + >> + /* >> + * First, mark the PTI is enabled. Although we do anything = yet, we are >> + * safe as long as we do not reenable CS64. Since we did not = update the >> + * page tables yet, this may lead to spurious page-faults, = but we need >> + * the pti_disable in mm to be set for __pti_set_user_pgd() = to do the >> + * right thing. Holding mmap_sem would ensure matter we hold = the >> + * mmap_sem to prevent them from swamping the system. >> + */ >> + mm->context.pti_disable =3D PTI_DISABLE_OFF; >> + >> + /* Second, restore the NX bits. */ >> + pti_update_user_pgds(mm, true); >=20 > You're holding mmap_sem, but there are code paths that touch page > tables that don't hold mmap_sem, such as the stack extension code. Do they change user mappings? These are the only ones = pti_update_user_pgds() changes. >> + >> +bool pti_handle_segment_not_present(long error_code) >> +{ >> + if (!static_cpu_has(X86_FEATURE_PTI)) >> + return false; >> + >> + if ((unsigned short)error_code !=3D GDT_ENTRY_DEFAULT_USER_CS = << 3) >> + return false; >> + >> + pti_reenable(); >> + return true; >> +} >=20 > Please don't. You're trying to emulate the old behavior here, but > you're emulating it wrong. In particular, you won't trap on LAR. Yes, I thought I=E2=80=99ll manage to address LAR, but failed. I thought = you said this is not a =E2=80=9Cshow-stopper=E2=80=9D. I=E2=80=99ll adapt your = approach of using prctl, although it really limits the benefit of this mechanism.