From: David Stevens
Date: Fri, 7 Jul 2023 10:35:02 +0900
Subject: Re: [PATCH v7 5/8] KVM: x86/mmu: Don't pass FOLL_GET to __kvm_follow_pfn
To: Isaku Yamahata
Cc: Yu Zhang, Sean Christopherson, Marc Zyngier, Michael Ellerman, Peter Xu, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org
In-Reply-To: <20230706155805.GD3894444@ls.amr.corp.intel.com>
References: <20230704075054.3344915-1-stevensd@google.com> <20230704075054.3344915-6-stevensd@google.com> <20230705101800.ut4c6topn6ylwczs@linux.intel.com> <20230706155805.GD3894444@ls.amr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 7, 2023 at 12:58 AM Isaku Yamahata wrote:
>
> On Thu, Jul 06, 2023 at 01:52:08PM +0900,
> David Stevens wrote:
>
> > On Wed, Jul 5, 2023 at 7:17 PM Yu Zhang wrote:
> > >
> > > On Tue, Jul 04, 2023 at 04:50:50PM +0900, David Stevens wrote:
> > > > From: David Stevens
> > > >
> > > > Stop passing FOLL_GET to __kvm_follow_pfn. This allows the host to map
> > > > memory into the guest that is backed by un-refcounted struct pages - for
> > > > example, higher order non-compound pages allocated by the amdgpu driver
> > > > via ttm_pool_alloc_page.
> > >
> > > I guess you mean the tail pages of the higher order non-compound pages?
> > > And as to the head page, it is said to be set to one coincidentally[*],
> > > and shall not be considered as refcounted. IIUC, refcount of this head
> > > page will be increased and decreased soon in hva_to_pfn_remapped(), so
> > > this may not be a problem(?). But treating this head page differently,
> > > as a refcounted one (e.g., to set the A/D flags), is weird.
> > >
> > > Or maybe I missed some context, e.g., can the head page be allocated to
> > > guest at all?
> >
> > Yes, this is to allow mapping the tail pages of higher order
> > non-compound pages - I should have been more precise in my wording.
> > The head pages can already be mapped into the guest.
> >
> > Treating the head and tail pages differently would require changing how KVM
> > behaves in a situation it supports today (rather than just adding
> > support for an unsupported situation). Currently, without this series,
> > KVM can map VM_PFNMAP|VM_IO memory backed by refcounted pages into the
> > guest. When that happens, KVM sets the A/D flags. I'm not sure whether
> > that's actually valid behavior, nor do I know whether anyone actually
> > cares about it. But it's what KVM does today, and I would shy away
> > from modifying that behavior without good reason.
> > > >
> > > > The bulk of this change is tracking the is_refcounted_page flag so that
> > > > non-refcounted pages don't trigger page_count() == 0 warnings. This is
> > > > done by storing the flag in an unused bit in the sptes.
> > >
> > > Also, maybe we should mention this only works on x86-64.
> > > >
> > > > Signed-off-by: David Stevens
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c          | 44 +++++++++++++++++++++------------
> > > >  arch/x86/kvm/mmu/mmu_internal.h |  1 +
> > > >  arch/x86/kvm/mmu/paging_tmpl.h  |  9 ++++---
> > > >  arch/x86/kvm/mmu/spte.c         |  4 ++-
> > > >  arch/x86/kvm/mmu/spte.h         | 12 ++++++++-
> > > >  arch/x86/kvm/mmu/tdp_mmu.c      | 22 ++++++++++-------
> > > >  6 files changed, 62 insertions(+), 30 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e44ab512c3a1..b1607e314497 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > >
> > > ...
> > >
> > > > @@ -2937,6 +2943,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
> > > >         bool host_writable = !fault || fault->map_writable;
> > > >         bool prefetch = !fault || fault->prefetch;
> > > >         bool write_fault = fault && fault->write;
> > > > +       bool is_refcounted = !fault || fault->is_refcounted_page;
> > >
> > > Just wonder, what if a non-refcounted page is prefetched? Or is it possible in
> > > practice?
> >
> > Prefetching is still done via gfn_to_page_many_atomic, which sets
> > FOLL_GET. That's fixable, but it's not something this series currently
> > does.
>
> So if we prefetch a page, REFCOUNTED bit is cleared unconditionally with this
> hunk. kvm_set_page_{dirty, accessed} won't be called as expected for prefetched
> spte. If I read the patch correctly, REFCOUNTED bit in SPTE should represent
> whether the corresponding page is ref-countable or not, right?
>
> Because direct_pte_prefetch_many() is for legacy KVM MMU and FNAME(prefetch_pte)
> is shadow paging, we need to test it with legacy KVM MMU or shadow paging to hit
> the issue, though.
>

direct_pte_prefetch_many and prefetch_gpte both pass NULL for the fault
parameter, so is_refcounted will evaluate to true. So the spte's
refcounted bit will get set in that case.

-David
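
[Editorial illustration] The point above hinges on the "!fault || fault->is_refcounted_page"
pattern in the quoted hunk. Below is a minimal, self-contained userspace sketch of that
logic only; the names SPTE_REFCOUNTED_BIT, struct fault_info, and make_spte_flags are
hypothetical stand-ins, not the patch's actual identifiers, and the bit position is
arbitrary. It simply demonstrates why a NULL fault (as passed by the prefetch paths)
leaves the refcounted bit set, while a real fault on a non-refcounted page clears it.

/* Build with: cc -Wall sketch.c && ./a.out */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical software-available bit in a 64-bit SPTE-like value. */
#define SPTE_REFCOUNTED_BIT (1ULL << 59)

struct fault_info {
	bool is_refcounted_page;
	bool map_writable;
};

/* Mirrors the "!fault || fault->is_refcounted_page" pattern from the hunk. */
static uint64_t make_spte_flags(const struct fault_info *fault)
{
	bool is_refcounted = !fault || fault->is_refcounted_page;
	uint64_t spte = 0;

	if (is_refcounted)
		spte |= SPTE_REFCOUNTED_BIT;
	return spte;
}

int main(void)
{
	struct fault_info non_refcounted = { .is_refcounted_page = false };

	/* Prefetch-style callers pass NULL, so the bit ends up set. */
	printf("prefetch (NULL fault): refcounted=%d\n",
	       !!(make_spte_flags(NULL) & SPTE_REFCOUNTED_BIT));

	/* A fault on a non-refcounted page leaves the bit clear. */
	printf("fault on non-refcounted page: refcounted=%d\n",
	       !!(make_spte_flags(&non_refcounted) & SPTE_REFCOUNTED_BIT));
	return 0;
}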