Subject: Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME
To: Andy Lutomirski
Cc: "Kirill A. Shutemov", Andrew Morton, X86 ML, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Borislav Petkov, Peter Zijlstra, David Howells, Kees Cook, Kai Huang, Jacob Pan, Alison Schofield, Linux-MM, kvm list, keyrings@vger.kernel.org, LKML, Tom Lendacky
References: <20190508144422.13171-1-kirill.shutemov@linux.intel.com> <20190508144422.13171-46-kirill.shutemov@linux.intel.com> <3c658cce-7b7e-7d45-59a0-e17dae986713@intel.com>
From: Dave Hansen <dave.hansen@intel.com>
Message-ID: <5cbfa2da-ba2e-ed91-d0e8-add67753fc12@intel.com>
Date: Mon, 17 Jun 2019 11:27:59 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

Tom Lendacky, could you take a look down in the message to the talk of
SEV?  I want to make sure I'm not misrepresenting what it does today.

...
>> I actually don't care all that much which one we end up with.  It's not
>> like the extra syscall in the second option means much.
>
> The benefit of the second one is that, if sys_encrypt is absent, it
> just works.  In the first model, programs need a fallback because
> they'll segfault if mprotect_encrypt() gets ENOSYS.

Well, by the time they get here, they would have already had to allocate
and set up the encryption key.  I don't think this would really be the
"normal" malloc() path, for instance.

>> How do we
>> eventually stack it on top of persistent memory filesystems or Device
>> DAX?
>
> How do we stack anonymous memory on top of persistent memory or Device
> DAX?  I'm confused.

If our interface to MKTME is:

	fd = open("/dev/mktme");
	ptr = mmap(fd);

Then it's hard to combine with an interface which is:

	fd = open("/dev/dax123");
	ptr = mmap(fd);

Whereas if we have something like mprotect() (or madvise() or something
else taking a pointer), we can just do:

	fd = open("/dev/anything987");
	ptr = mmap(fd);
	sys_encrypt(ptr);

Now, we might not *do* it that way for dax, for instance, but I'm just
saying that if we go the /dev/mktme route, we never get a choice.

> I think that, in the long run, we're going to have to either expand
> the core mm's concept of what "memory" is or just have a whole
> parallel set of mechanisms for memory that doesn't work like memory.
...
> I expect that some day normal memory will be able to be repurposed as
> SGX pages on the fly, and that will also look a lot more like SEV or
> XPFO than like this model of MKTME.

I think you're drawing the line at pages where the kernel can manage
contents vs. not manage contents.  I'm not sure that's the right
distinction to make, though.  The thing that is important is whether the
kernel can manage the lifetime and location of the data in the page.

Basically: Can the kernel choose where the page comes from and get the
page back when it wants?

I really don't like the current state of things like with SEV or with
KVM direct device assignment where the physical location is quite locked
down and the kernel really can't manage the memory.  I'm trying really
hard to make sure future hardware is more permissive about such things.
My hope is that these are a temporary blip and not the new normal.

> So, if we upstream MKTME as anonymous memory with a magic config
> syscall, I predict that, in a few years, it will end up inheriting
> all downsides of both approaches with few of the upsides.  Programs
> like QEMU will need to learn to manipulate pages that can't be
> accessed outside the VM without special VM buy-in, so the fact that
> MKTME pages are fully functional and can be GUP-ed won't be very
> useful.  And the VM will learn about all these things, but MKTME won't
> really fit in.

Kai Huang (who is on cc) has been doing the QEMU enabling and might want
to weigh in.  I'd also love to hear from the AMD folks in case I'm not
grokking some aspect of SEV.

But, my understanding is that, even today, neither QEMU nor the kernel
can see SEV-encrypted guest memory.  So QEMU should already understand
how to not interact with guest memory.  I _assume_ it's also already
doing this with anonymous memory, without needing /dev/sme or something.

> And, one of these days, someone will come up with a version of XPFO
> that could actually be upstreamed, and it seems entirely plausible
> that it will be totally incompatible with MKTME-as-anonymous-memory
> and that users of MKTME will actually get *worse* security.

I'm not following here.  XPFO just means that we don't keep the direct
map around all the time for all memory.  If XPFO and
MKTME-as-anonymous-memory were both in play, I think we'd just be
creating/destroying the MKTME-enlightened direct map instead of a
vanilla one.
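
For reference, here is a rough userspace sketch of the ENOSYS fallback
discussed earlier in this message, written against a pointer-based
interface.  Everything named encrypt_mprotect_* below is hypothetical:
the function-pointer signature, the integer key argument, and the stub
that always fails with ENOSYS are assumptions made for illustration, not
the interface from the patch.  Only mmap(), mprotect() and munmap() are
real calls.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical shape of the proposed call; an assumption, not a real API. */
typedef int (*encrypt_mprotect_fn)(void *addr, size_t len, int prot, int key);

/* Stand-in for a kernel that lacks the syscall: always fails with ENOSYS. */
int encrypt_mprotect_missing(void *addr, size_t len, int prot, int key)
{
    (void)addr; (void)len; (void)prot; (void)key;
    errno = ENOSYS;
    return -1;
}

/*
 * Try to set protections and an encryption key in one step.  If the call
 * does not exist (ENOSYS), fall back to plain mprotect() so the range does
 * not stay PROT_NONE and fault on first access.
 * Returns 1 if the encrypting call succeeded, 0 if the fallback was used,
 * and -1 on any other error.
 */
int protect_region(encrypt_mprotect_fn enc, void *addr, size_t len,
                   int prot, int key)
{
    if (enc(addr, len, prot, key) == 0)
        return 1;
    if (errno != ENOSYS)
        return -1;
    return mprotect(addr, len, prot) == 0 ? 0 : -1;
}

/* Map an inaccessible anonymous page, then make it usable via the helper. */
int demo_fallback(void)
{
    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int rc = protect_region(encrypt_mprotect_missing, p, len,
                            PROT_READ | PROT_WRITE, 1);
    if (rc >= 0) {
        p[0] = 'x';              /* would segfault without the fallback */
        assert(p[0] == 'x');
    }
    munmap(p, len);
    return rc;
}
```

Note that protect_region() only takes a pointer and a length, so it does
not care whether the mapping came from anonymous memory or from some
device file.  That is the composability point made above for a
pointer-based call, as opposed to a dedicated /dev/mktme mapping.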