Date: Thu, 3 Jun 2021 02:33:53 +0300
From: "Kirill A. Shutemov"
To: Sean Christopherson
Cc: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
    Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi",
    "Yamahata, Isaku", Erdem Aktas, Steve Rutherford, Peter Gonda,
    David Hildenbrand, Chao Peng, x86@kernel.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Message-ID: <20210602233353.gxq35yxluhas5knp@box>

On Wed, Jun 02, 2021 at 05:51:02PM +0000, Sean Christopherson wrote:
> > Omitting FOLL_GUEST for shared memory doesn't look like a right approach.
> > IIUC, it would require the kernel to track what memory is shared and what
> > is private, which defeats the purpose of the rework. I would rather enforce
> > !PageGuest() when the shared SEPT is populated, in addition to enforcing
> > PageGuest() for the private SEPT.
>
> Isn't that what omitting FOLL_GUEST would accomplish?  For shared memory,
> including mapping memory into the shared EPT, KVM will omit FOLL_GUEST and thus
> require the memory to be readable/writable according to the guest access type.

Ah. I guess I see what you're saying: we can pipe the shared bit of the GPA
from direct_page_fault() (or whatever handles the fault) down to
hva_to_pfn_slow() and omit FOLL_GUEST if the shared bit is set. Right?

I guess it's doable, but the code shuffling is going to be ugly.

> By definition, that excludes PageGuest() because PageGuest() pages must always
> be unmapped, e.g. PROTNONE.  And for private EPT, because PageGuest() is always
> PROTNONE or whatever, it will require FOLL_GUEST to retrieve the PTE/PMD/Pxx.
>
> On a semi-related topic, I don't think can_follow_write_pte() is the correct
> place to hook PageGuest().  TDX's S-EPT has a quirk where all private guest
> memory must be mapped writable, but that quirk doesn't hold true for non-TDX
> guests.  It should be legal to map private guest memory as read-only.

Hm. The point of the change in can_follow_write_pte() is to allow writing to
a PageGuest() page only if FOLL_GUEST is used and the mapping is writable.
Without the change, gup(FOLL_GUEST|FOLL_WRITE) would fail.

It doesn't prevent using read-only guest mappings as read-only. But if you
want to write to it, it has to be writable (in addition to FOLL_GUEST).

> And I believe the below snippet in follow_page_pte() will be problematic
> too, since FOLL_NUMA is added unless FOLL_FORCE is set.
> I suspect the correct approach is to handle FOLL_GUEST as an exception to
> pte_protnone(), though that might require adjusting pte_protnone() to be
> meaningful even when CONFIG_NUMA_BALANCING=n.
>
>	if ((flags & FOLL_NUMA) && pte_protnone(pte))
>		goto no_page;
>	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
>		pte_unmap_unlock(ptep, ptl);
>		return NULL;
>	}

Good catch. I'll look into how to untangle NUMA balancing and PageGuest().
It shouldn't be hard. PageGuest() pages should be subject to balancing.

> > Do you see any problems with this?
> >
> > > Oh, and the other nicety is that I think it would avoid having to explicitly
> > > handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
> > > memory exposed to KVM must be !PageGuest(), then it is also eligible for
> > > copy_{to,from}_user().
> >
> > copy_{to,from}_user() enforce by setting PTE entries to PROT_NONE.
>
> But KVM does _not_ want those PTEs PROT_NONE.  If KVM is accessing memory that
> is also accessible by the guest, then it must be shared.  And if it's shared,
> it must also be accessible to host userspace, i.e. something other than PROT_NONE,
> otherwise the memory isn't actually shared with anything.
>
> As above, any guest-accessible memory that is accessed by the host must be
> shared, and so must be mapped with the required permissions.

I don't see a contradiction here: copy_{to,from}_user() would fail with
-EFAULT on a PROT_NONE PTE.

By saying in the initial posting that inserting PageGuest() into shared is
fine, I didn't mean it's useful, just allowed.

-- 
 Kirill A. Shutemov
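
For reference, the lookup rules discussed above can be modeled in a few lines of userspace C. This is only a rough sketch under stated assumptions, not the patch and not kernel code: struct model_pte, the MODEL_FOLL_* values and both helper functions are hypothetical names made up for illustration, and the real can_follow_write_pte() also handles FOLL_FORCE/COW cases that are left out here. It models three points: a PageGuest() page is reachable only with FOLL_GUEST, writing additionally needs a writable mapping, and FOLL_GUEST is treated as an exception to the FOLL_NUMA/pte_protnone() bail-out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's definitions. */
enum {
	MODEL_FOLL_WRITE = 1u << 0,
	MODEL_FOLL_NUMA  = 1u << 1,
	MODEL_FOLL_GUEST = 1u << 2,
};

/* Toy stand-in for the PTE state the thread talks about (an assumption). */
struct model_pte {
	bool writable;   /* the mapping allows writes */
	bool protnone;   /* the PTE is PROT_NONE */
	bool page_guest; /* the backing page is PageGuest() */
};

/* Write is allowed to a guest page only with FOLL_GUEST and a writable mapping. */
static bool model_can_follow_write(struct model_pte pte, unsigned int flags)
{
	if (pte.page_guest)
		return (flags & MODEL_FOLL_GUEST) && pte.writable;
	return pte.writable;
}

/* Returns true if the lookup would hand back the page, false if it bails out. */
static bool model_follow_pte(struct model_pte pte, unsigned int flags)
{
	bool guest_lookup = pte.page_guest && (flags & MODEL_FOLL_GUEST);

	/* PageGuest() pages are only reachable with FOLL_GUEST. */
	if (pte.page_guest && !(flags & MODEL_FOLL_GUEST))
		return false;

	/* FOLL_GUEST is an exception to the NUMA-hinting protnone check. */
	if ((flags & MODEL_FOLL_NUMA) && pte.protnone && !guest_lookup)
		return false;

	if ((flags & MODEL_FOLL_WRITE) && !model_can_follow_write(pte, flags))
		return false;

	return true;
}

/* Usage example: a read-only guest mapping can be read but not written. */
static void model_examples(void)
{
	struct model_pte guest_ro = { .writable = false, .protnone = true,
				      .page_guest = true };

	assert(model_follow_pte(guest_ro, MODEL_FOLL_GUEST));
	assert(!model_follow_pte(guest_ro, MODEL_FOLL_GUEST | MODEL_FOLL_WRITE));
}
```

In this model a lookup without FOLL_GUEST never returns a PageGuest() page, which corresponds to the "omit FOLL_GUEST for shared memory" idea: the shared path would simply fail on guest pages.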