Date: Mon, 19 Apr 2021 21:53:54 +0300
From: "Kirill A. Shutemov"
To: Sean Christopherson
Cc: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
    Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi",
    "Yamahata, Isaku", Erdem Aktas, Steve Rutherford, Peter Gonda,
    David Hildenbrand, x86@kernel.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Message-ID: <20210419185354.v3rgandtrel7bzjj@box>
References: <20210416154106.23721-1-kirill.shutemov@linux.intel.com>
    <20210416154106.23721-14-kirill.shutemov@linux.intel.com>
    <20210419142602.khjbzktk5tk5l6lk@box.shutemov.name>
    <20210419164027.dqiptkebhdt5cfmy@box.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 19, 2021 at 06:09:29PM +0000, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Mon, Apr 19, 2021 at 04:01:46PM +0000, Sean Christopherson wrote:
> > > But fundamentally the private pages, are well, private. They can't be shared
> > > across processes, so I think we could (should?) require the VMA to always be
> > > MAP_PRIVATE. Does that buy us enough to rely on the VMA alone? I.e. is that
> > > enough to prevent userspace and unaware kernel code from acquiring a reference
> > > to the underlying page?
> >
> > Shared pages should be fine too (you folks wanted tmpfs support).
>
> Is that a conflict though? If the private->shared conversion request is kicked
> out to userspace, then userspace can re-mmap() the files as MAP_SHARED, no?
>
> Allowing MAP_SHARED for guest private memory feels wrong. The data can't be
> shared, and dirty data can't be written back to the file.

It can be remapped, but faulting in the page would produce a hwpoison entry.
I don't see another way to make Google's use-case with tmpfs-backed guest
memory work.

> > The poisoned pages must be useless outside of the process with the blessed
> > struct kvm. See kvm_pfn_map in the patch.
>
> The big requirement for kernel TDX support is that the pages are useless in the
> host. Regarding the guest, for TDX, the TDX Module guarantees that at most a
> single KVM guest can have access to a page at any given time. I believe the RMP
> provides the same guarantees for SEV-SNP.
>
> SEV/SEV-ES could still end up with corruption if multiple guests map the same
> private page, but that's obviously not the end of the world since it's the status
> quo today. Living with that shortcoming might be a worthy tradeoff if punting
> mutual exclusion between guests to firmware/hardware allows us to simplify the
> kernel implementation.

The critical question is whether we ever need to translate hva->pfn after
the page is added to the guest private memory. I believe we do, but I
never checked. And that's the reason we need to keep hwpoison entries
around, which encode the pfn.

If we don't, it would simplify the solution: kvm_pfn_map is not needed.
A single bit per page would be enough.

> > > >  - Add a new GUP flag to retrive such pages from the userspace mapping.
> > > >    Used only for private mapping population.
> > >
> > > >  - Shared gfn ranges managed by userspace, based on hypercalls from the
> > > >    guest.
> > > >
> > > >  - Shared mappings get populated via normal VMA. Any poisoned pages here
> > > >    would lead to SIGBUS.
> > > >
> > > > So far it looks pretty straight-forward.
> > > >
> > > > The only thing that I don't understand is at way point the page gets tied
> > > > to the KVM instance. Currently we do it just before populating shadow
> > > > entries, but it would not work with the new scheme: as we poison pages
> > > > on fault it they may never get inserted into shadow entries. That's not
> > > > good as we rely on the info to unpoison page on free.
> > >
> > > Can you elaborate on what you mean by "unpoison"? If the page is never actually
> > > mapped into the guest, then its poisoned status is nothing more than a software
> > > flag, i.e. nothing extra needs to be done on free.
> >
> > Normally, poisoned flag preserved for freed pages as it usually indicate
> > hardware issue. In this case we need return page to the normal circulation.
> > So we need a way to differentiate two kinds of page poison. Current patch
> > does this by adding page's pfn to kvm_pfn_map. But this will not work if
> > we uncouple poisoning and adding to shadow PTE.
>
> Why use PG_hwpoison then?

Page flags are scarce. I don't want to occupy a new one until I'm sure
I must.

And we can re-use the existing infrastructure to SIGBUS on access to such
pages.

-- 
 Kirill A. Shutemov
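
To make the "two kinds of page poison" point above concrete, here is a toy
userspace model of the bit-per-page variant. It is only a sketch, not kernel
code and not what the patch does: every name in it (poison_for_guest,
poison_for_hw_error, free_page_model, page_is_poisoned, NR_PAGES) is made up
for illustration, a plain bool array stands in for PG_hwpoison, and it assumes
that a real hardware error takes precedence over guest-private poison.

```c
/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES      64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Stand-in for the PG_hwpoison page flag, one entry per pfn. */
static bool page_poisoned[NR_PAGES];

/* One bit per page: set when the poison was applied for guest-private use. */
static unsigned long guest_private[NR_WORDS];

static void set_guest_bit(size_t pfn)
{
    guest_private[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

static void clear_guest_bit(size_t pfn)
{
    guest_private[pfn / BITS_PER_WORD] &= ~(1UL << (pfn % BITS_PER_WORD));
}

static bool test_guest_bit(size_t pfn)
{
    return (guest_private[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}

/* Poison a page because it became guest-private memory. */
void poison_for_guest(size_t pfn)
{
    assert(pfn < NR_PAGES);
    page_poisoned[pfn] = true;
    set_guest_bit(pfn);
}

/* Poison a page because of a (simulated) hardware memory error. */
void poison_for_hw_error(size_t pfn)
{
    assert(pfn < NR_PAGES);
    page_poisoned[pfn] = true;
    /* Assumption: a real hardware error overrides guest-private poison. */
    clear_guest_bit(pfn);
}

bool page_is_poisoned(size_t pfn)
{
    assert(pfn < NR_PAGES);
    return page_poisoned[pfn];
}

/* Model of freeing a page. Returns true if the page may be reused. */
bool free_page_model(size_t pfn)
{
    assert(pfn < NR_PAGES);
    if (!page_poisoned[pfn])
        return true;
    if (test_guest_bit(pfn)) {
        /* Guest-private poison: unpoison and return to circulation. */
        clear_guest_bit(pfn);
        page_poisoned[pfn] = false;
        return true;
    }
    /* Hardware poison: keep the flag, keep the page out of circulation. */
    return false;
}
```

The bitmap only answers "was this page poisoned for a guest?". It cannot
answer "which pfn backs this hva?", which is why the open question about
hva->pfn translation decides whether something like kvm_pfn_map is still
needed.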