Date: Mon, 31 May 2021 23:07:12 +0300
From: "Kirill A. Shutemov"
To: Sean Christopherson
Cc: "Kirill A.
    Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra, Jim Mattson,
    David Rientjes, "Edgecombe, Rick P", "Kleen, Andi", "Yamahata, Isaku",
    Erdem Aktas, Steve Rutherford, Peter Gonda, David Hildenbrand, Chao Peng,
    x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Message-ID: <20210531200712.qjxghakcaj4s6ara@box.shutemov.name>
References: <20210419142602.khjbzktk5tk5l6lk@box.shutemov.name>
    <20210419164027.dqiptkebhdt5cfmy@box.shutemov.name>
    <20210419185354.v3rgandtrel7bzjj@box>
    <20210419225755.nsrtjfvfcqscyb6m@box.shutemov.name>
    <20210521123148.a3t4uh4iezm6ax47@box>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 26, 2021 at 07:46:52PM +0000, Sean Christopherson wrote:
> On Fri, May 21, 2021, Kirill A. Shutemov wrote:
> > Hi Sean,
> >
> > The core patch of the approach we've discussed before is below. It
> > introduce a new page type with the required semantics.
> >
> > The full patchset can be found here:
> >
> >  git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-guest-only
> >
> > but only the patch below is relevant for TDX. QEMU patch is attached.
>
> Can you post the whole series?

I hoped to get it posted as part of TDX host enabling.

As it is, the feature is incomplete for pure KVM. I didn't implement on
the KVM side the checks that are provided by the TDX module/hardware, so
nothing prevents the same page from being added to multiple KVM instances.

> The KVM behavior and usage of FOLL_GUEST is very relevant to TDX.

The patch can be found here:

 https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=kvm-unmapped-guest-only&id=2cd6c2c20528696a46a2a59383ca81638bf856b5

> > CONFIG_HAVE_KVM_PROTECTED_MEMORY has to be changed to what is appropriate
> > for TDX and FOLL_GUEST has to be used in hva_to_pfn_slow() when running
> > TDX guest.
>
> This behavior in particular is relevant; KVM should provide FOLL_GUEST iff the
> access is private or the VM type doesn't differentiate between private and
> shared.

I added FOLL_GUEST if the KVM instance has the feature enabled.

On top of that, TDX-specific code has to check that the page is in fact
PageGuest() before inserting it into the private SEPT.

The scheme makes sure that user-accessible memory cannot be added as
private to a TD.

> > When page get inserted into private sept we must make sure it is
> > PageGuest() or SIGBUS otherwise.
>
> More KVM feedback :-)
>
> Ideally, KVM will synchronously exit to userspace with detailed information on
> the bad behavior, not do SIGBUS. Hopefully that infrastructure will be in place
> sooner than later.
>
> https://lkml.kernel.org/r/YKxJLcg/WomPE422@google.com

My experiments are still on v5.11, but I can rebase to whatever is needed
once the infrastructure hits upstream.

> > Inserting PageGuest() into shared is fine, but the page will not be accessible
> > from userspace.
>
> Even if it can be functionally fine, I don't think we want to allow KVM to map
> PageGuest() as shared memory. The only reason to map memory shared is to share
> it with something, e.g. the host, that doesn't have access to private memory, so
> I can't envision a use case.
>
> On the KVM side, it's trivially easy to omit FOLL_GUEST for shared memory, while
> always passing FOLL_GUEST would require manually zapping.
> Manual zapping isn't a big deal, but I do think it can be avoided if userspace
> must either remap the hva or define a new KVM memslot (new gpa->hva), both of
> which will automatically zap any existing translations.
>
> Aha, thought of a concrete problem. If KVM maps PageGuest() into shared memory,
> then KVM must ensure that the page is not mapped private via a different hva/gpa,
> and is not mapped _any_ other guest because the TDX-Module's 1:1 PFN:TD+GPA
> enforcement only applies to private memory. The explicit "VM_WRITE | VM_SHARED"
> requirement below makes me think this wouldn't be prevented.

Hm. I didn't realize that the TDX module doesn't prevent the same page
from being used as shared and private at the same time.

Omitting FOLL_GUEST for shared memory doesn't look like the right
approach. IIUC, it would require the kernel to track which memory is
shared and which is private, which defeats the purpose of the rework. I
would rather enforce !PageGuest() when the shared SEPT is populated, in
addition to enforcing PageGuest() for the private SEPT.

Do you see any problems with this?

> Oh, and the other nicety is that I think it would avoid having to explicitly
> handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
> memory exposed to KVM must be !PageGuest(), then it is also eligible for
> copy_{to,from}_user().

copy_{to,from}_user() is enforced by setting PTE entries to PROT_NONE.
Or do I miss your point?

>
> > Any feedback is welcome.
> >
> > -------------------------------8<-------------------------------------------
> >
> > From: "Kirill A. Shutemov"
> > Date: Fri, 16 Apr 2021 01:30:48 +0300
> > Subject: [PATCH] mm: Introduce guest-only pages
> >
> > PageGuest() pages are only allowed to be used as guest memory. Userspace
> > is not allowed read from or write to such pages.
> >
> > On page fault, PageGuest() pages produce PROT_NONE page table entries.
> > Read or write there will trigger SIGBUS. Access to such pages via
> > syscall leads to -EIO.
> >
> > The new mprotect(2) flag PROT_GUEST translates to VM_GUEST. Any page
> > fault to VM_GUEST VMA produces PageGuest() page.
> >
> > Only shared tmpfs/shmem mappings are supported.
>
> Is limiting this to tmpfs/shmem only for the PoC/RFC, or is it also expected to
> be the long-term behavior?

I expect it to be enough to cover all relevant cases, no?

Note that MAP_ANONYMOUS|MAP_SHARED also fits here.

-- 
 Kirill A. Shutemov
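
For illustration only, the rule proposed above (require PageGuest() when
populating the private SEPT, require !PageGuest() when populating the shared
SEPT) can be modelled in plain user-space C. This is a sketch under stated
assumptions, not kernel code: struct model_page, enum sept_kind and
sept_insert_allowed() are hypothetical names that exist neither in the
patchset nor in the TDX module interface, and how a rejected insertion is
reported (SIGBUS or an exit to userspace) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: models only the PageGuest() bit. */
struct model_page {
	bool guest;	/* true if the real page would satisfy PageGuest() */
};

/* Which secure EPT the page is about to be inserted into (assumed names). */
enum sept_kind {
	SEPT_PRIVATE,
	SEPT_SHARED,
};

/*
 * Model of the proposed check:
 *   - private SEPT accepts only PageGuest() pages;
 *   - shared SEPT accepts only !PageGuest() pages.
 * Returns true if the insertion would be allowed.
 */
static bool sept_insert_allowed(const struct model_page *page,
				enum sept_kind kind)
{
	assert(page != NULL);

	if (kind == SEPT_PRIVATE)
		return page->guest;

	return !page->guest;
}
```

With this rule a given page can pass the check for exactly one of the two
SEPT kinds, so the same page cannot be populated as both shared and private
without the kernel tracking which memory is which.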