From: David Stevens
Date: Wed, 5 Jul 2023 18:08:17 +0900
Subject: Re: [PATCH v7 2/8] KVM: Introduce __kvm_follow_pfn function
To: Zhi Wang
Cc: Sean Christopherson, Marc Zyngier, Michael Ellerman, Peter Xu,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    kvm@vger.kernel.org
References: <20230704075054.3344915-1-stevensd@google.com>
    <20230704075054.3344915-3-stevensd@google.com>
    <20230705114732.000005c6.zhi.wang.linux@gmail.com>
In-Reply-To: <20230705114732.000005c6.zhi.wang.linux@gmail.com>

On Wed, Jul 5, 2023 at 5:47 PM Zhi Wang wrote:
>
> On Tue, 4 Jul 2023 16:50:47 +0900
> David Stevens wrote:
>
> > From: David Stevens
> >
> > Introduce __kvm_follow_pfn, which will replace __gfn_to_pfn_memslot.
> > __kvm_follow_pfn refactors the old API's arguments into a struct and,
> > where possible, combines the boolean arguments into a single flags
> > argument.
> >
> > Signed-off-by: David Stevens
> > ---
> >  include/linux/kvm_host.h |  16 ++++
> >  virt/kvm/kvm_main.c      | 171 ++++++++++++++++++++++-----------------
> >  virt/kvm/kvm_mm.h        |   3 +-
> >  virt/kvm/pfncache.c      |   8 +-
> >  4 files changed, 122 insertions(+), 76 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 9d3ac7720da9..ef2763c2b12e 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -97,6 +97,7 @@
> >  #define KVM_PFN_ERR_HWPOISON   (KVM_PFN_ERR_MASK + 1)
> >  #define KVM_PFN_ERR_RO_FAULT   (KVM_PFN_ERR_MASK + 2)
> >  #define KVM_PFN_ERR_SIGPENDING (KVM_PFN_ERR_MASK + 3)
> > +#define KVM_PFN_ERR_NEEDS_IO   (KVM_PFN_ERR_MASK + 4)
> >
> >  /*
> >   * error pfns indicate that the gfn is in slot but faild to
> > @@ -1156,6 +1157,21 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
> >  void kvm_release_page_clean(struct page *page);
> >  void kvm_release_page_dirty(struct page *page);
> >
> > +struct kvm_follow_pfn {
> > +        const struct kvm_memory_slot *slot;
> > +        gfn_t gfn;
> > +        unsigned int flags;
> > +        bool atomic;
> > +        /* Allow a read fault to create a writeable mapping. */
> > +        bool allow_write_mapping;
> > +
> > +        /* Outputs of __kvm_follow_pfn */
> > +        hva_t hva;
> > +        bool writable;
> > +};
> > +
> > +kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll);
> > +
> >  kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
> >  kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
> >                            bool *writable);
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 371bd783ff2b..b13f22861d2f 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2486,24 +2486,22 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> >   * true indicates success, otherwise false is returned. It's also the
> >   * only part that runs if we can in atomic context.
> >   */
> > -static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
> > -                            bool *writable, kvm_pfn_t *pfn)
> > +static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
> >  {
> >          struct page *page[1];
> > +        bool write_fault = foll->flags & FOLL_WRITE;
> >
> >          /*
> >           * Fast pin a writable pfn only if it is a write fault request
> >           * or the caller allows to map a writable pfn for a read fault
> >           * request.
> >           */
> > -        if (!(write_fault || writable))
> > +        if (!(write_fault || foll->allow_write_mapping))
> >                  return false;
> >
> > -        if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
> > +        if (get_user_page_fast_only(foll->hva, FOLL_WRITE, page)) {
> >                  *pfn = page_to_pfn(page[0]);
> > -
> > -                if (writable)
> > -                        *writable = true;
> > +                foll->writable = foll->allow_write_mapping;
> >                  return true;
> >          }
> >
> > @@ -2514,35 +2512,26 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
> >   * The slow path to get the pfn of the specified host virtual address,
> >   * 1 indicates success, -errno is returned if error is detected.
> >   */
> > -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> > -                           bool interruptible, bool *writable, kvm_pfn_t *pfn)
> > +static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
> >  {
> > -        unsigned int flags = FOLL_HWPOISON;
> > +        unsigned int flags = FOLL_HWPOISON | FOLL_GET | foll->flags;
> >          struct page *page;
> >          int npages;
> >
> >          might_sleep();
> >
> > -        if (writable)
> > -                *writable = write_fault;
> > -
> > -        if (write_fault)
> > -                flags |= FOLL_WRITE;
> > -        if (async)
> > -                flags |= FOLL_NOWAIT;
> > -        if (interruptible)
> > -                flags |= FOLL_INTERRUPTIBLE;
> > -
> > -        npages = get_user_pages_unlocked(addr, 1, &page, flags);
> > +        npages = get_user_pages_unlocked(foll->hva, 1, &page, flags);
> >          if (npages != 1)
> >                  return npages;
> >
> > +        foll->writable = (foll->flags & FOLL_WRITE) && foll->allow_write_mapping;
> > +
> >          /* map read fault as writable if possible */
> > -        if (unlikely(!write_fault) && writable) {
> > +        if (unlikely(!foll->writable) && foll->allow_write_mapping) {
> >                  struct page *wpage;
> >
> > -                if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
> > -                        *writable = true;
> > +                if (get_user_page_fast_only(foll->hva, FOLL_WRITE, &wpage)) {
> > +                        foll->writable = true;
> >                          put_page(page);
> >                          page = wpage;
> >                  }
> > @@ -2572,23 +2561,23 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
> >          return get_page_unless_zero(page);
> >  }
> >
> > -static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> > -                               unsigned long addr, bool write_fault,
> > -                               bool *writable, kvm_pfn_t *p_pfn)
> > +static int hva_to_pfn_remapped(struct vm_area_struct *vma, struct kvm_follow_pfn *foll,
> > +                               kvm_pfn_t *p_pfn)
> >  {
> >          kvm_pfn_t pfn;
> >          pte_t *ptep;
> >          spinlock_t *ptl;
> > +        bool write_fault = foll->flags & FOLL_WRITE;
> >          int r;
> >
> > -        r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
> > +        r = follow_pte(vma->vm_mm, foll->hva, &ptep, &ptl);
> >          if (r) {
> >                  /*
> >                   * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
> >                   * not call the fault handler, so do it here.
> >                   */
> >                  bool unlocked = false;
> > -                r = fixup_user_fault(current->mm, addr,
> > +                r = fixup_user_fault(current->mm, foll->hva,
> >                                       (write_fault ? FAULT_FLAG_WRITE : 0),
> >                                       &unlocked);
> >                  if (unlocked)
> > @@ -2596,7 +2585,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >                  if (r)
> >                          return r;
> >
> > -                r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
> > +                r = follow_pte(vma->vm_mm, foll->hva, &ptep, &ptl);
> >                  if (r)
> >                          return r;
> >          }
> > @@ -2606,8 +2595,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >                  goto out;
> >          }
> >
> > -        if (writable)
> > -                *writable = pte_write(*ptep);
> > +        foll->writable = pte_write(*ptep) && foll->allow_write_mapping;
> >          pfn = pte_pfn(*ptep);
> >
> >          /*
> > @@ -2652,24 +2640,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >   * 2): @write_fault = false && @writable, @writable will tell the caller
> >   * whether the mapping is writable.
> >   */
> > -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
> > -                     bool *async, bool write_fault, bool *writable)
> > +kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *foll)
> >  {
> >          struct vm_area_struct *vma;
> >          kvm_pfn_t pfn;
> >          int npages, r;
> >
> >          /* we can do it either atomically or asynchronously, not both */
> > -        BUG_ON(atomic && async);
> > +        BUG_ON(foll->atomic && (foll->flags & FOLL_NOWAIT));
> >
> > -        if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
> > +        if (hva_to_pfn_fast(foll, &pfn))
> >                  return pfn;
> >
> > -        if (atomic)
> > +        if (foll->atomic)
> >                  return KVM_PFN_ERR_FAULT;
> >
> > -        npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
> > -                                 writable, &pfn);
> > +        npages = hva_to_pfn_slow(foll, &pfn);
> >          if (npages == 1)
> >                  return pfn;
> >          if (npages == -EINTR)
> > @@ -2677,83 +2663,122 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
> >
> >          mmap_read_lock(current->mm);
> >          if (npages == -EHWPOISON ||
> > -            (!async && check_user_page_hwpoison(addr))) {
> > +            (!(foll->flags & FOLL_NOWAIT) && check_user_page_hwpoison(foll->hva))) {
> >                  pfn = KVM_PFN_ERR_HWPOISON;
> >                  goto exit;
> >          }
> >
> >  retry:
> > -        vma = vma_lookup(current->mm, addr);
> > +        vma = vma_lookup(current->mm, foll->hva);
> >
> >          if (vma == NULL)
> >                  pfn = KVM_PFN_ERR_FAULT;
> >          else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
> > -                r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
> > +                r = hva_to_pfn_remapped(vma, foll, &pfn);
> >                  if (r == -EAGAIN)
> >                          goto retry;
> >                  if (r < 0)
> >                          pfn = KVM_PFN_ERR_FAULT;
> >          } else {
> > -                if (async && vma_is_valid(vma, write_fault))
> > -                        *async = true;
> > -                pfn = KVM_PFN_ERR_FAULT;
> > +                if ((foll->flags & FOLL_NOWAIT) &&
> > +                    vma_is_valid(vma, foll->flags & FOLL_WRITE))
> > +                        pfn = KVM_PFN_ERR_NEEDS_IO;
> > +                else
> > +                        pfn = KVM_PFN_ERR_FAULT;
> >          }
> >  exit:
> >          mmap_read_unlock(current->mm);
> >          return pfn;
> >  }
> >
> > -kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
> > -                               bool atomic, bool interruptible, bool *async,
> > -                               bool write_fault, bool *writable, hva_t *hva)
> > +kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
> >  {
> > -        unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
> > -
> > -        if (hva)
> > -                *hva = addr;
> > +        foll->hva = __gfn_to_hva_many(foll->slot, foll->gfn, NULL,
> > +                                      foll->flags & FOLL_WRITE);
> >
> > -        if (addr == KVM_HVA_ERR_RO_BAD) {
> > -                if (writable)
> > -                        *writable = false;
> > +        if (foll->hva == KVM_HVA_ERR_RO_BAD)
> >                  return KVM_PFN_ERR_RO_FAULT;
> > -        }
> >
>
> Can you explain why updating foll->writable = false (previously *writeable
> = false) is omitted here?
>
> In the caller where the struct kvm_follow_pfn is initialized, e.g.
> __gfn_to_pfn_memslot()/gfn_to_pfn_prot(), .writable is not initialized.
> IIUC, they expect __kvm_follow_pfn() to update it and return .writable to
> the upper caller.
>
> As one of the outputs, it would be better to initialize it either in the
> caller or update it in __kvm_follow_pfn(). Otherwise
> __gfn_to_pfn_memslot()/gfn_to_pfn_prot() will return random data from the
> stack to the caller via bool *writable, which doesn't sound nice.

Entries omitted from an initializer are initialized to zero, so
.writable does get initialized in all of the patches in this series via
designated initializers. That said, you're right that explicitly setting
it to false is a good idea, in case someone someday adds a caller that
doesn't use an initializer when declaring its kvm_follow_pfn.

-David
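P.S. For anyone following along, here is a minimal standalone sketch of
the initializer rule in play here. This is plain userspace C, not kernel
code; the struct is a stand-in whose fields mirror kvm_follow_pfn and is
not the kernel definition.

/*
 * Standalone sketch (assumption: userspace C, hypothetical stand-in
 * struct mirroring the kvm_follow_pfn fields under discussion).
 */
#include <stdbool.h>
#include <stdio.h>

struct follow_pfn_demo {
        unsigned long gfn;
        unsigned int flags;
        bool atomic;
        bool allow_write_mapping;
        /* outputs */
        unsigned long hva;
        bool writable;
};

int main(void)
{
        /*
         * C (C11 6.7.9) guarantees that members omitted from an
         * initializer are implicitly initialized as if they had static
         * storage duration, i.e. to zero/false.  So .writable is false
         * here even though it is never named.
         */
        struct follow_pfn_demo foll = {
                .gfn = 0x1234,
                .allow_write_mapping = true,
        };
        printf("writable = %d\n", foll.writable);  /* prints 0 */

        /*
         * A declaration with no initializer at all gets indeterminate
         * values instead -- the "random data in the stack" case, and
         * the reason an explicit .writable = false assignment would
         * still be cheap insurance.
         */
        struct follow_pfn_demo uninit;
        (void)uninit;  /* reading uninit.writable here would be undefined */

        return 0;
}

Nothing above is kernel API; it only demonstrates the C guarantee that
omitted members of a designated initializer are zero-initialized, and the
hazard when no initializer is used at all.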