Subject: Re: [RFC 2/2] x86/pti/64: Remove the SYSCALL64 entry trampoline
From: Andy Lutomirski
Date: Sun, 22 Jul 2018 13:59:21 -0700
To: Linus Torvalds
Cc: Andrew Lutomirski, the arch/x86 maintainers, Linux Kernel Mailing List, Borislav Petkov, Dave Hansen
Message-Id: <422DF5AC-6B45-406F-B3FC-DD1AA9BC18F6@amacapital.net>
References: <1e3d01ce04315218d3f8ee269528bb774a4d1d60.1532281180.git.luto@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jul 22, 2018, at 11:27 AM, Linus Torvalds wrote:
>
>> On Sun, Jul 22, 2018 at 10:45 AM Andy Lutomirski wrote:
>>
>> This patch changes the code to map the percpu TSS into the user page
>> tables to allow the non-trampoline SYSCALL64 path to work under PTI.
>
> Me likey.
>
> However:
>
>> This does not add a new direct information leak, since the TSS is
>> readable by Meltdown from the cpu_entry_area alias regardless.
>
> Afaik, it does now potentially expose through meltdown the per-thread
> entry stack info, which is new.

It’s always been exposed through the RO alias. The only new exposure is the *address* of the RW alias, I think.

>
> But I don't think that's a show-stopper.
>
>> static void __init pti_clone_user_shared(void)
>> {
>> +       for_each_possible_cpu(cpu) {
>
> But this code is pretty disgusting and seems wrong.
>
> Do you really want to do all the _possible_ cpu's, not just the
> online ones? I'd rather expose less (think MAXCPU) and then have the
> CPU hotplug code expose the page as the CPU comes up?

We already have exactly the same issue for cpu_entry_area. If we change it, I think we should do cpu_entry_area at the same time. But that’s awkward because cpu_entry_area is mapped one PMD at a time right now.

It’s also awkward to expose a percpu page dynamically, because (I think) percpu data isn’t guaranteed to all be in the same PGD-sized area. A vmalloc fault in the early SYSCALL64 path is fatal.

>
>> +               unsigned long va = (unsigned long)&per_cpu(cpu_tss_rw, cpu);
>> +               phys_addr_t pa = per_cpu_ptr_to_phys((void *)va);
>> +               pte_t *target_pte;
>> +
>> +               target_pte = pti_user_pagetable_walk_pte(va);
>
> This function only exists if CONFIG_X86_VSYSCALL_EMULATION, so it
> won't even compile under (very unusual) configurations.

Oops.

>
> The "disgusting" part is that I think it could/should share more code
> with the vsyscall case, and the whole target-pte checking and setting
> should be shared too.

I tried that. It was uglier. The percpu code wants to make up a new PTE because the real kernel mapping uses large pages. The vsyscall code wants to copy a PTE because it’s really a PTE and it has unusual permissions.

>
> Because not being shared, I react to this:
>
>> +               set_pte(target_pte, pfn_pte(pa >> PAGE_SHIFT, PAGE_KERNEL));
>
> Hmm.
> The vsyscall code just does
>
>        *target_pte = ..
>
> without any set_pte() stuff. Do we want/need the PVOP cases, and if
> so, why doesn't the vsyscall case need it?

It doesn’t need it. I could use plain assignment.

>
> Anyway, I love the approach, and how this gets rid of the nasty
> trampoline, so no real complaints, just "this needs some fixups".
>
>

I’ll do the fixups. I think that, if we want to unmap the pages for CPUs that aren’t present, that should be a separate patch. I’m also not convinced it adds much value.

In general, PTI is fairly crappy, and it leaks all kinds of information. I suspect the worst leak is the NMI stack for local and remote CPUs. Fixing *that* is going to be fugly, but may actually be important, because I can easily imagine malicious user code that causes arbitrary kernel memory to get read and spilled on the NMI stack.

What we *should* do IMO is defer allocation of percpu space for not-present CPUs to save a bunch of memory. But that’s a major change and will probably break things.
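
For anyone following the "make up a new PTE" point in the thread: the quoted hunk builds a fresh 4 KiB entry from a physical address rather than copying an existing one. Below is a rough user-space model of that arithmetic only, not kernel code. The `model_*` names are hypothetical, the flag bits are the architectural x86-64 positions as I understand them, and `PTE_KERNEL` is an assumed, simplified stand-in for the kernel's `PAGE_KERNEL`, which goes through additional masking.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel's own definitions live in
 * its pgtable headers and are subject to extra masking. */
#define PAGE_SHIFT   12
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_GLOBAL   (1ULL << 8)
#define PTE_NX       (1ULL << 63)
#define PTE_KERNEL   (PTE_PRESENT | PTE_RW | PTE_ACCESSED | \
                      PTE_DIRTY | PTE_GLOBAL | PTE_NX)

/* Hypothetical stand-in for pfn_pte(): the frame number goes in the
 * address bits, the protection flags go in the low and high bits. */
static uint64_t model_pfn_pte(uint64_t pfn, uint64_t prot)
{
    /* The frame number must fit below bit 52 once shifted into place. */
    assert((pfn >> (52 - PAGE_SHIFT)) == 0);
    return (pfn << PAGE_SHIFT) | prot;
}

/* Build a fresh 4 KiB PTE for the page containing physical address pa,
 * mirroring pfn_pte(pa >> PAGE_SHIFT, PAGE_KERNEL) in the quoted hunk. */
static uint64_t model_make_kernel_pte(uint64_t pa)
{
    return model_pfn_pte(pa >> PAGE_SHIFT, PTE_KERNEL);
}
```

The shift discards the offset within the page, so any address inside the same 4 KiB page yields the same entry. That is why a new small-page PTE can be constructed here even though the kernel's own mapping of the percpu area uses large pages.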