Date: Mon, 19 Apr 2021 18:09:29 +0000
From: Sean Christopherson
To: "Kirill A. Shutemov"
Cc: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
    Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi",
    "Yamahata, Isaku", Erdem Aktas, Steve Rutherford, Peter Gonda,
    David Hildenbrand, x86@kernel.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
In-Reply-To: <20210419164027.dqiptkebhdt5cfmy@box.shutemov.name>

On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> On Mon, Apr 19, 2021 at 04:01:46PM +0000, Sean Christopherson wrote:
> > But fundamentally the private pages are, well, private. They can't be shared
> > across processes, so I think we could (should?) require the VMA to always be
> > MAP_PRIVATE. Does that buy us enough to rely on the VMA alone? I.e. is that
> > enough to prevent userspace and unaware kernel code from acquiring a reference
> > to the underlying page?
>
> Shared pages should be fine too (you folks wanted tmpfs support).

Is that a conflict though? If the private->shared conversion request is kicked
out to userspace, then userspace can re-mmap() the files as MAP_SHARED, no?

Allowing MAP_SHARED for guest private memory feels wrong. The data can't be
shared, and dirty data can't be written back to the file.

> The poisoned pages must be useless outside of the process with the blessed
> struct kvm. See kvm_pfn_map in the patch.

The big requirement for kernel TDX support is that the pages are useless in the
host.
Regarding the guest, for TDX, the TDX Module guarantees that at most a single
KVM guest can have access to a page at any given time. I believe the RMP
provides the same guarantees for SEV-SNP.

SEV/SEV-ES could still end up with corruption if multiple guests map the same
private page, but that's obviously not the end of the world since it's the
status quo today. Living with that shortcoming might be a worthy tradeoff if
punting mutual exclusion between guests to firmware/hardware allows us to
simplify the kernel implementation.

> > > - Add a new GUP flag to retrieve such pages from the userspace mapping.
> > >   Used only for private mapping population.
> >
> > > - Shared gfn ranges are managed by userspace, based on hypercalls from the
> > >   guest.
> > >
> > > - Shared mappings get populated via normal VMA. Any poisoned pages here
> > >   would lead to SIGBUS.
> > >
> > > So far it looks pretty straightforward.
> > >
> > > The only thing that I don't understand is at what point the page gets tied
> > > to the KVM instance. Currently we do it just before populating shadow
> > > entries, but it would not work with the new scheme: as we poison pages
> > > on fault, they may never get inserted into shadow entries. That's not
> > > good as we rely on the info to unpoison the page on free.
> >
> > Can you elaborate on what you mean by "unpoison"? If the page is never actually
> > mapped into the guest, then its poisoned status is nothing more than a software
> > flag, i.e. nothing extra needs to be done on free.
>
> Normally, the poisoned flag is preserved for freed pages, as it usually
> indicates a hardware issue. In this case we need to return the page to normal
> circulation. So we need a way to differentiate two kinds of page poison. The
> current patch does this by adding the page's pfn to kvm_pfn_map. But this will
> not work if we uncouple poisoning and adding to the shadow PTE.

Why use PG_hwpoison then?