Date: Wed, 9 Nov 2022 18:54:04 +0300
From: "Kirill A. Shutemov"
To: Isaku Yamahata, Hugh Dickins
Cc: Vishal Annapurve, Chao Peng, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org,
	qemu-devel@nongnu.org, Paolo Bonzini, Jonathan Corbet,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86@kernel.org, "H . Peter Anvin", Jeff Layton, "J . Bruce Fields",
	Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
	"Maciej S . Szmigiero", Vlastimil Babka, Yu Zhang,
	"Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com,
	dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
	aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com,
	Muchun Song, wei.w.wang@intel.com
Subject: Re: [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM
Message-ID: <20221109155404.istawiyvwr3yffag@box.shutemov.name>
References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com>
	<20221108004141.GF1063309@ls.amr.corp.intel.com>
In-Reply-To: <20221108004141.GF1063309@ls.amr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 07, 2022 at 04:41:41PM -0800, Isaku Yamahata wrote:
> On Thu, Nov 03, 2022 at 05:43:52PM +0530,
> Vishal Annapurve wrote:
>
> > On Tue, Oct 25, 2022 at 8:48 PM Chao Peng wrote:
> > >
> > > This patch series implements KVM guest private memory for confidential
> > > computing scenarios like Intel
> > > TDX[1]. If a TDX host accesses
> > > TDX-protected guest memory, machine check can happen which can further
> > > crash the running host system, this is terrible for multi-tenant
> > > configurations. The host accesses include those from KVM userspace like
> > > QEMU. This series addresses KVM userspace induced crash by introducing
> > > new mm and KVM interfaces so KVM userspace can still manage guest memory
> > > via a fd-based approach, but it can never access the guest memory
> > > content.
> > >
> > > The patch series touches both core mm and KVM code. I appreciate
> > > Andrew/Hugh and Paolo/Sean can review and pick these patches. Any other
> > > reviews are always welcome.
> > > - 01: mm change, target for mm tree
> > > - 02-08: KVM change, target for KVM tree
> > >
> > > Given KVM is the only current user for the mm part, I have chatted with
> > > Paolo and he is OK to merge the mm change through KVM tree, but
> > > reviewed-by/acked-by is still expected from the mm people.
> > >
> > > The patches have been verified in Intel TDX environment, but Vishal has
> > > done an excellent work on the selftests[4] which are dedicated for this
> > > series, making it possible to test this series without innovative
> > > hardware and fancy steps of building a VM environment. See Test section
> > > below for more info.
> > >
> > >
> > > Introduction
> > > ============
> > > KVM userspace being able to crash the host is horrible. Under current
> > > KVM architecture, all guest memory is inherently accessible from KVM
> > > userspace and is exposed to the mentioned crash issue. The goal of this
> > > series is to provide a solution to align mm and KVM, on a userspace
> > > inaccessible approach of exposing guest memory.
> > >
> > > Normally, KVM populates secondary page table (e.g. EPT) by using a host
> > > virtual address (hva) from core mm page table (e.g. x86 userspace page
> > > table).
> > > This requires guest memory being mmaped into KVM userspace, but
> > > this is also the source where the mentioned crash issue can happen. In
> > > theory, apart from those 'shared' memory for device emulation etc, guest
> > > memory doesn't have to be mmaped into KVM userspace.
> > >
> > > This series introduces fd-based guest memory which will not be mmaped
> > > into KVM userspace. KVM populates secondary page table by using a
> >
> > With no mappings in place for userspace VMM, IIUC, looks like the host
> > kernel will not be able to find the culprit userspace process in case
> > of Machine check error on guest private memory. As implemented in
> > hwpoison_user_mappings, host kernel tries to look at the processes
> > which have mapped the pfns with hardware error.
> >
> > Is there a modification needed in mce handling logic of the host
> > kernel to immediately send a signal to the vcpu thread accessing
> > faulting pfn backing guest private memory?
>
> mce_register_decode_chain() can be used. MCE physical address(p->mce_addr)
> includes host key id in addition to real physical address. By searching used
> hkid by KVM, we can determine if the page is assigned to guest TD or not. If
> yes, send SIGBUS.
>
> kvm_machine_check() can be enhanced for KVM specific use. This is before
> memory_failure() is called, though.
>
> any other ideas?

That's too KVM-centric. It will not work for other possible users of
restricted memfd.

I tried to find a way to get it right: we need to pass information about
the corrupted page to the restricted memfd code so it can invalidate its
users. On the next request for the page, the user will see an error. In
the case of KVM, the error will likely escalate to SIGBUS.

The problem is that the core-mm code that handles memory failure knows
nothing about restricted memfd. It only sees that the page belongs to a
normal memfd.

AFAICS, there's no way to intercept it from the shim level. shmem code has
to be patched.
shmem_error_remove_page() has to call into restricted memfd code.

Hugh, are you okay with this? Or maybe you have a better idea?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
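
The flow proposed above (the shmem error path calls into the restricted
memfd code, which invalidates its users, and the next request for the page
fails) can be modelled in userspace. What follows is a rough Python sketch
under stated assumptions, not kernel code. RestrictedMemfd, ToyVmm,
error_remove_page(), get_page() and invalidate() are hypothetical names
made up for illustration; none of them is an interface from the patch
series, and the SIGBUS is only recorded in a list rather than delivered.

```python
import errno

# EHWPOISON is defined on Linux; fall back to its Linux value elsewhere.
EHWPOISON = getattr(errno, "EHWPOISON", 133)


class RestrictedMemfd:
    """Toy stand-in for a restricted memfd (hypothetical, not a kernel API)."""

    def __init__(self, nr_pages):
        self.nr_pages = nr_pages
        self.poisoned = set()   # page indexes hit by a memory failure
        self.notifiers = []     # users to call back on invalidation

    def register_notifier(self, user):
        self.notifiers.append(user)

    def get_page(self, index):
        """Hand out a page, or fail if it has been marked corrupted."""
        if not 0 <= index < self.nr_pages:
            raise IndexError(index)
        if index in self.poisoned:
            raise OSError(EHWPOISON, "page is hardware-poisoned")
        return ("pfn", index)

    def error_remove_page(self, index):
        """Assumed entry point that the shmem error path would call."""
        self.poisoned.add(index)
        for user in self.notifiers:
            user.invalidate(index)


class ToyVmm:
    """Toy KVM-like user that caches pages it obtained from the memfd."""

    def __init__(self, memfd):
        self.memfd = memfd
        self.mapped = {}    # index -> page, models secondary page table entries
        self.signals = []   # records the signals the real thing might raise
        memfd.register_notifier(self)

    def invalidate(self, index):
        # Drop the stale mapping so the next access has to ask again.
        self.mapped.pop(index, None)

    def fault_in(self, index):
        try:
            self.mapped[index] = self.memfd.get_page(index)
            return True
        except OSError as err:
            if err.errno != EHWPOISON:
                raise
            # In the KVM case the error would likely escalate to SIGBUS.
            self.signals.append("SIGBUS")
            return False


if __name__ == "__main__":
    memfd = RestrictedMemfd(nr_pages=4)
    vmm = ToyVmm(memfd)
    vmm.fault_in(1)
    memfd.error_remove_page(1)          # memory failure reported for page 1
    print(vmm.fault_in(1), vmm.signals)
```

The point of the model is the direction of the calls: the memory-failure
side only knows about the memfd, and the memfd is the one that knows its
users. A user never learns about the corruption through a userspace
mapping, only through the invalidation callback and the failed request
that follows.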