From: Andy Lutomirski
Date: Thu, 13 Jun 2019 09:16:12 -0700
Subject: Re: [RFC 00/10] Process-local memory allocations for hiding KVM secrets
To: Nadav Amit
Cc: Andy Lutomirski, Dave Hansen, Marius Hillenbrand, kvm list, LKML, Kernel Hardening, Linux-MM, Alexander Graf, David Woodhouse, the arch/x86 maintainers, Peter Zijlstra
List-ID: linux-kernel@vger.kernel.org

On Wed, Jun 12, 2019 at 6:50 PM Nadav Amit wrote:
>
> > On Jun 12, 2019, at 6:30 PM, Andy Lutomirski wrote:
> >
> > On Wed, Jun 12, 2019 at 1:27 PM Andy Lutomirski wrote:
> >>
> >>> On Jun 12, 2019, at 12:55 PM, Dave Hansen wrote:
> >>>
> >>>> On 6/12/19 10:08 AM, Marius Hillenbrand wrote:
> >>>> This patch series proposes to introduce a region for what we call
> >>>> process-local memory into the kernel's virtual address space.
> >>>
> >>> It might be fun to cc some x86 folks on this series. They might have
> >>> some relevant opinions. ;)
> >>>
> >>> A few high-level questions:
> >>>
> >>> Why go to all this trouble to hide guest state like registers if all the
> >>> guest data itself is still mapped?
> >>>
> >>> Where's the context-switching code? Did I just miss it?
> >>>
> >>> We've discussed having per-cpu page tables where a given PGD is only in
> >>> use from one CPU at a time. I *think* this scheme still works in such a
> >>> case, it just adds one more PGD entry that would have to be context-switched.
> >>
> >> Fair warning: Linus is on record as absolutely hating this idea. He might
> >> change his mind, but it's an uphill battle.
> >
> > I looked at the patch, and it (sensibly) has nothing to do with
> > per-cpu PGDs. So it's in great shape!
> >
> > Seriously, though, here are some very high-level review comments:
> >
> > Please don't call it "process local", since "process" is meaningless.
> > Call it "mm local" or something like that.
> >
> > We already have a per-mm kernel mapping: the LDT. So please nix all
> > the code that adds a new VA region, etc., except to the extent that
> > some of it consists of valid cleanups in and of itself. Instead,
> > please refactor the LDT code (arch/x86/kernel/ldt.c, mainly) to make
> > it use a more general "mm local" address range, and then reuse the
> > same infrastructure for other fancy things. The code that makes it
> > KASLR-able should be in its very own patch that applies *after* the
> > code that makes it all work so that, when the KASLR part causes a
> > crash, we can bisect it.
> >
> > +	/*
> > +	 * Faults in process-local memory may be caused by process-local
> > +	 * addresses leaking into other contexts.
> > +	 * tbd: warn and handle gracefully.
> > +	 */
> > +	if (unlikely(fault_in_process_local(address))) {
> > +		pr_err("page fault in PROCLOCAL at %lx", address);
> > +		force_sig_fault(SIGSEGV, SEGV_MAPERR, (void __user *)address, current);
> > +	}
> > +
> >
> > Huh? Either it's an OOPS or you shouldn't print any special
> > debugging. As it is, you're just blatantly leaking the address of the
> > mm-local range to malicious user programs.
> >
> > Also, you should IMO consider using this mechanism for kmap_atomic().
> >
> > Hi, Nadav!
>
> Well, some context for the "hi" would have been helpful. (Do I have a bug
> and I still don't understand it?)

Fair enough :)

> Perhaps you are referring to a use-case for a similar mechanism that I
> mentioned before. I did implement something similar (but not the way that
> you wanted) to improve the performance of seccomp and system-calls when
> retpolines are used. I set up a per-mm code area that held code that used
> direct calls to invoke seccomp filters and frequently used system-calls.
>
> My mechanism, I think, is not suitable for this use-case. I needed my
> code-page to be at the same 2GB range as the kernel text/modules, which does
> complicate things. For the same reason, it is also limited in the size of
> the data/code that it can hold.

I actually meant the opposite. If we had a general-purpose per-mm kernel
address range, could it be used to optimize kmap_atomic() by limiting the
scope of any shootdowns?

As a rough sketch, we'd have some kmap_atomic slots for each cpu *in the
mm-local region*. I'm not entirely sure this is a win.

--Andy