MIME-Version: 1.0
References:
 <20201020061859.18385-1-kirill.shutemov@linux.intel.com>
 <20201026152910.happu7wic4qjxmp7@box>
In-Reply-To: <20201026152910.happu7wic4qjxmp7@box>
From: Andy Lutomirski
Date: Mon, 26 Oct 2020 16:58:16 -0700
Subject: Re: [RFCv2 00/16] KVM protected memory extension
To: "Kirill A. Shutemov"
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Paolo Bonzini,
 Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, David Rientjes, Andrea Arcangeli, Kees Cook, Will Drewry,
 "Edgecombe, Rick P", "Kleen, Andi", Liran Alon, Mike Rapoport, X86 ML,
 kvm list, Linux-MM, LKML, "Kirill A. Shutemov"
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 26, 2020 at 8:29 AM Kirill A. Shutemov wrote:
>
> On Wed, Oct 21, 2020 at 11:20:56AM -0700, Andy Lutomirski wrote:
> > > On Oct 19, 2020, at 11:19 PM, Kirill A. Shutemov wrote:
> >
> > > For removing the userspace mapping, use a trick similar to what NUMA
> > > balancing does: convert memory that belongs to KVM memory slots to
> > > PROT_NONE: all existing entries converted to PROT_NONE with mprotect() and
> > > the newly faulted in pages get PROT_NONE from the updated vm_page_prot.
> > > The new VMA flag -- VM_KVM_PROTECTED -- indicates that the pages in the
> > > VMA must be treated in a special way in the GUP and fault paths. The flag
> > > allows GUP to return the page even though it is mapped with PROT_NONE, but
> > > only if the new GUP flag -- FOLL_KVM -- is specified. Any userspace access
> > > to the memory would result in SIGBUS. Any GUP access without FOLL_KVM
> > > would result in -EFAULT.
> > >
> >
> > I definitely like the direction this patchset is going in, and I think
> > that allowing KVM guests to have memory that is inaccessible to QEMU
> > is a great idea.
> >
> > I do wonder, though: do we really want to do this with these PROT_NONE
> > tricks, or should we actually come up with a way to have KVM guest map
> > memory that isn't mapped into QEMU's mm_struct at all? As an example
> > of the latter, I mean something a bit like this:
> >
> > https://lkml.kernel.org/r/CALCETrUSUp_7svg8EHNTk3nQ0x9sdzMCU=h8G-Sy6=SODq5GHg@mail.gmail.com
> >
> > I don't mean to say that this is a requirement of any kind of
> > protected memory like this, but I do think we should understand the
> > tradeoffs, in terms of what a full implementation looks like, the
> > effort and time frames involved, and the maintenance burden of
> > supporting whatever gets merged going forward.
>
> I considered the PROT_NONE trick neat. Complete removing of the mapping
> from QEMU would require more changes into KVM and I'm not really familiar
> with it.

I think it's neat. The big tradeoff I'm concerned about is that it
will likely become ABI once it lands. That is, if this series lands,
then we will always have to support the case in which QEMU has a
special non-present mapping that is nonetheless reflected as present
in a guest. This is a bizarre state of affairs, it may become
obsolete if a better API ever shows up, and it might end up placing
constraints on the Linux VM that we don't love going forward.

I don't think my proposal in the referenced thread above is that crazy
or that difficult to implement. The basic idea is to have a way to
create an mm_struct that is not loaded in CR3 anywhere. Instead, KVM
will reference it, much as it currently references QEMU's mm_struct,
to mirror mappings into the guest. This means it would be safe to
have "protected" memory mapped into the special mm_struct because
nothing other than KVM will ever reference the PTEs. But I think that
someone who really understands the KVM memory mapping code should
chime in.

>
> About tradeoffs: the trick interferes with AutoNUMA.
> I didn't put much thought into how we can get it work together. Need to
> look into it.
>
> Do you see other tradeoffs?
>
> --
> Kirill A. Shutemov
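
For reference, here is a minimal sketch of the baseline behaviour the quoted cover letter builds on: what plain mprotect(PROT_NONE) looks like from userspace on an unpatched Linux kernel. It does not exercise VM_KVM_PROTECTED or FOLL_KVM, which exist only in the RFC, and probe_prot_none() is a hypothetical helper name chosen for illustration. On a stock kernel the access is expected to raise SIGSEGV, whereas the cover letter describes SIGBUS for the protected VMAs.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustration only: ordinary mprotect(PROT_NONE) on anonymous memory.
 * This is NOT the VM_KVM_PROTECTED / FOLL_KVM mechanism from the RFC.
 * probe_prot_none() is a hypothetical helper name.
 *
 * Returns the number of the signal that killed the child when it
 * touched the page, 0 if the child exited normally, -1 on setup error.
 */
int probe_prot_none(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p;
	pid_t pid;
	int status;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	p[0] = 1; /* fault the page in while it is still accessible */

	/* Existing PTEs for the range lose all access permissions. */
	if (mprotect(p, (size_t)page, PROT_NONE) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	if (pid == 0) {
		/* Child: any access to the PROT_NONE page faults. */
		volatile char c = *(volatile char *)p;
		(void)c;
		_exit(0); /* reached only if the read did not fault */
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	munmap(p, (size_t)page);

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling probe_prot_none() from a small main() and comparing the result against SIGSEGV is enough to see the stock behaviour. The fork() is there only so the faulting access kills a child rather than the caller.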