From: Christophe de Dinechin
Subject: Re: [RFCv1 7/7] KVM: unmap guest memory using poisoned pages
Date: Wed, 7 Apr 2021 15:31:28 +0200
To: "Kirill A. Shutemov"
Cc: David Hildenbrand, Dave Hansen, Dave Hansen, Andy Lutomirski,
    Peter Zijlstra, Sean Christopherson, Jim Mattson, David Rientjes,
    "Edgecombe, Rick P", "Kleen, Andi", "Yamahata, Isaku",
    x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
In-Reply-To: <20210407131647.djajbwhqsmlafsyo@box.shutemov.name>

> On 7 Apr 2021, at 15:16, Kirill A. Shutemov wrote:
>
> On Tue, Apr 06, 2021 at 04:57:46PM +0200, David Hildenbrand wrote:
>> On 06.04.21 16:33, Dave Hansen wrote:
>>> On 4/6/21 12:44 AM, David Hildenbrand wrote:
>>>> On 02.04.21 17:26, Kirill A. Shutemov wrote:
>>>>> TDX architecture aims to provide resiliency against confidentiality and
>>>>> integrity attacks. Towards this goal, the TDX architecture helps enforce
>>>>> the enabling of memory integrity for all TD-private memory.
>>>>>
>>>>> The CPU memory controller computes the integrity check value (MAC) for
>>>>> the data (cache line) during writes, and it stores the MAC with the
>>>>> memory as meta-data. A 28-bit MAC is stored in the ECC bits.
>>>>>
>>>>> Checking of memory integrity is performed during memory reads. If the
>>>>> integrity check fails, the CPU poisons the cache line.
>>>>>
>>>>> On a subsequent consumption (read) of the poisoned data by software,
>>>>> there are two possible scenarios:
>>>>>
>>>>> - Core determines that the execution can continue and it treats
>>>>>   poison with exception semantics signaled as a #MCE
>>>>>
>>>>> - Core determines execution cannot continue, and it does an unbreakable
>>>>>   shutdown
>>>>>
>>>>> For more details, see Chapter 14 of Intel TDX Module EAS[1]
>>>>>
>>>>> As some integrity check failures may lead to system shutdown, the host
>>>>> kernel must not allow any writes to TD-private memory. This requirement
This = requirment >>>>> clashes with KVM design: KVM expects the guest memory to be mapped = into >>>>> host userspace (e.g. QEMU). >>>>=20 >>>> So what you are saying is that if QEMU would write to such memory, = it >>>> could crash the kernel? What a broken design. >>>=20 >>> IMNHO, the broken design is mapping the memory to userspace in the = first >>> place. Why the heck would you actually expose something with the = MMU to >>> a context that can't possibly meaningfully access or safely write to = it? >>=20 >> I'd say the broken design is being able to crash the machine via a = simple >> memory write, instead of only crashing a single process in case = you're doing >> something nasty. =46rom the evaluation of the problem it feels like = this was a >> CPU design workaround: instead of properly cleaning up when it gets = tricky >> within the core, just crash the machine. And that's a CPU "feature", = not a >> kernel "feature". Now we have to fix broken HW in the kernel - once = again. >>=20 >> However, you raise a valid point: it does not make too much sense to = to map >> this into user space. Not arguing against that; but crashing the = machine is >> just plain ugly. >>=20 >> I wonder: why do we even *want* a VMA/mmap describing that memory? = Sounds >> like: for hacking support for that memory type into QEMU/KVM. >>=20 >> This all feels wrong, but I cannot really tell how it could be = better. That >> memory can really only be used (right now?) with hardware = virtualization >> from some point on. =46rom that point on (right from the start?), = there should >> be no VMA/mmap/page tables for user space anymore. >>=20 >> Or am I missing something? Is there still valid user space access? >=20 > There is. For IO (e.g. virtio) the guest mark a range of memory as = shared > (or unencrypted for AMD SEV). The range is not pre-defined. >=20 >>> This started with SEV. QEMU creates normal memory mappings with the = SEV >>> C-bit (encryption) disabled. 
>>> The kernel plumbs those into NPT, but when
>>> those are instantiated, they have the C-bit set. So, we have mismatched
>>> mappings. Where does that lead? The two mappings not only differ in
>>> the encryption bit, causing one side to read gibberish if the other
>>> writes: they're not even cache coherent.
>>>
>>> That's the situation *TODAY*, even ignoring TDX.
>>>
>>> BTW, I'm pretty sure I know the answer to the "why would you expose this
>>> to userspace" question: it's what QEMU/KVM did already for
>>> non-encrypted memory, so this was the quickest way to get SEV working.
>>>
>>
>> Yes, I guess so. It was the fastest way to "hack" it into QEMU.
>>
>> Would we ever even want a VMA/mmap/process page tables for that memory? How
>> could user space ever do something *not so nasty* with that memory (in the
>> current context of VMs)?
>
> In the future, the memory should still be manageable by host MM: migration,
> swapping, etc. But it's a long way there. For now, the guest memory is
> effectively pinned on the host.

Is there even a theoretical way to restore an encrypted page, e.g. from
(host) swap, without breaking the integrity check? Or will that only be
possible with assistance from within the encrypted enclave?

>
> -- 
> Kirill A. Shutemov
>