From: David Stevens
To: Sean Christopherson
Cc: Yu Zhang, Isaku Yamahata, Marc Zyngier, Michael Ellerman, Peter Xu,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    kvm@vger.kernel.org, David Stevens
Subject: [PATCH v8 3/8] KVM: mmu: Make __kvm_follow_pfn not imply FOLL_GET
Date: Thu, 24 Aug 2023 17:04:03 +0900
Message-ID: <20230824080408.2933205-4-stevensd@google.com>
X-Mailer: git-send-email 2.42.0.rc1.204.g551eb34607-goog
In-Reply-To: <20230824080408.2933205-1-stevensd@google.com>
References: <20230824080408.2933205-1-stevensd@google.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: David Stevens

Make it so that __kvm_follow_pfn does not imply FOLL_GET. This allows
callers to resolve a gfn when the associated pfn has a valid struct page
that isn't being actively refcounted (e.g. tail pages of non-compound
higher order pages). For a caller to safely omit FOLL_GET, all usages of
the returned pfn must be guarded by a mmu notifier.

This also adds an is_refcounted_page out parameter to kvm_follow_pfn that
is set when the returned pfn has an associated struct page with a valid
refcount. Callers that don't pass FOLL_GET should remember this value and
use it to avoid places like kvm_is_ad_tracked_page that assume a non-zero
refcount.

Signed-off-by: David Stevens
---
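Note for reviewers (not part of the patch): the sketch below illustrates how a
caller is expected to use the two new fields. The function example_map_gfn(),
its signature, and the surrounding locking assumptions are made up for
illustration only; the actual callers are converted separately.

static kvm_pfn_t example_map_gfn(struct kvm_memory_slot *slot, gfn_t gfn,
                                 bool *refcounted)
{
        /*
         * No FOLL_GET: the caller promises that every use of the returned
         * pfn is protected by the mmu notifier, so no reference is taken
         * and no kvm_release_pfn_clean() may be called on the result.
         */
        struct kvm_follow_pfn foll = {
                .slot = slot,
                .gfn = gfn,
                .flags = FOLL_WRITE,
                .guarded_by_mmu_notifier = true,
        };
        kvm_pfn_t pfn;

        pfn = __kvm_follow_pfn(&foll);
        if (is_error_noslot_pfn(pfn))
                return pfn;

        /*
         * Remember whether the pfn is backed by a refcounted struct page,
         * so later dirty/accessed tracking can skip paths (e.g.
         * kvm_is_ad_tracked_page) that assume a non-zero refcount.
         */
        *refcounted = foll.is_refcounted_page;
        return pfn;
}

The intent is that a caller either passes FOLL_GET and later releases the
page, or sets guarded_by_mmu_notifier and never touches the refcount;
__kvm_follow_pfn() now warns if neither is done.
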
 include/linux/kvm_host.h |  7 ++++
 virt/kvm/kvm_main.c      | 84 ++++++++++++++++++++++++----------------
 virt/kvm/pfncache.c      |  2 +-
 3 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 59d9b5e5db33..713fc2d91f95 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1164,10 +1164,17 @@ struct kvm_follow_pfn {
 	bool atomic;
 	/* Try to create a writable mapping even for a read fault */
 	bool try_map_writable;
+	/*
+	 * Usage of the returned pfn will be guarded by a mmu notifier. Must
+	 * be true if FOLL_GET is not set.
+	 */
+	bool guarded_by_mmu_notifier;
 
 	/* Outputs of __kvm_follow_pfn */
 	hva_t hva;
 	bool writable;
+	/* True if the returned pfn is for a page with a valid refcount. */
+	bool is_refcounted_page;
 };
 
 kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5fde46f05117..963b96cd8ff9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2481,6 +2481,25 @@ static inline int check_user_page_hwpoison(unsigned long addr)
 	return rc == -EHWPOISON;
 }
 
+static kvm_pfn_t kvm_follow_refcounted_pfn(struct kvm_follow_pfn *foll,
+					   struct page *page)
+{
+	kvm_pfn_t pfn = page_to_pfn(page);
+
+	foll->is_refcounted_page = true;
+
+	/*
+	 * FIXME: Ideally, KVM wouldn't pass FOLL_GET to gup() when the caller
+	 * doesn't want to grab a reference, but gup() doesn't support getting
+	 * just the pfn, i.e. FOLL_GET is effectively mandatory. If that ever
+	 * changes, drop this and simply don't pass FOLL_GET to gup().
+	 */
+	if (!(foll->flags & FOLL_GET))
+		put_page(page);
+
+	return pfn;
+}
+
 /*
  * The fast path to get the writable pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned.  It's also the
@@ -2499,8 +2518,8 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 		return false;
 
 	if (get_user_page_fast_only(foll->hva, FOLL_WRITE, page)) {
-		*pfn = page_to_pfn(page[0]);
 		foll->writable = true;
+		*pfn = kvm_follow_refcounted_pfn(foll, page[0]);
 		return true;
 	}
 
@@ -2513,7 +2532,7 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
  */
 static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 {
-	unsigned int flags = FOLL_HWPOISON | foll->flags;
+	unsigned int flags = FOLL_HWPOISON | FOLL_GET | foll->flags;
 	struct page *page;
 	int npages;
 
@@ -2535,7 +2554,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 			page = wpage;
 		}
 	}
-	*pfn = page_to_pfn(page);
+	*pfn = kvm_follow_refcounted_pfn(foll, page);
 	return npages;
 }
 
@@ -2550,16 +2569,6 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
-static int kvm_try_get_pfn(kvm_pfn_t pfn)
-{
-	struct page *page = kvm_pfn_to_refcounted_page(pfn);
-
-	if (!page)
-		return 1;
-
-	return get_page_unless_zero(page);
-}
-
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       struct kvm_follow_pfn *foll, kvm_pfn_t *p_pfn)
 {
@@ -2568,6 +2577,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	pte_t pte;
 	spinlock_t *ptl;
 	bool write_fault = foll->flags & FOLL_WRITE;
+	struct page *page;
 	int r;
 
 	r = follow_pte(vma->vm_mm, foll->hva, &ptep, &ptl);
@@ -2601,28 +2611,29 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	pfn = pte_pfn(pte);
 
 	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn.  This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
+	 * Now deal with reference counting. If kvm_pfn_to_refcounted_page
+	 * returns NULL, then there's no refcount to worry about.
 	 *
-	 * Certain IO or PFNMAP mappings can be backed with valid
-	 * struct pages, but be allocated without refcounting e.g.,
-	 * tail pages of non-compound higher order allocations, which
-	 * would then underflow the refcount when the caller does the
-	 * required put_page. Don't allow those pages here.
+	 * Otherwise, certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages but be allocated without refcounting e.g., tail pages of
+	 * non-compound higher order allocations. If FOLL_GET is set and we
+	 * increment such a refcount, then when that pfn is eventually passed to
+	 * kvm_release_pfn_clean, its refcount would hit zero and be incorrectly
+	 * freed. Therefore don't allow those pages here when FOLL_GET is set.
 	 */
-	if (!kvm_try_get_pfn(pfn))
-		r = -EFAULT;
+	page = kvm_pfn_to_refcounted_page(pfn);
+	if (!page)
+		goto out;
+
+	if (get_page_unless_zero(page))
+		WARN_ON_ONCE(kvm_follow_refcounted_pfn(foll, page) != pfn);
 
 out:
 	pte_unmap_unlock(ptep, ptl);
-	*p_pfn = pfn;
+	if (!foll->is_refcounted_page && !foll->guarded_by_mmu_notifier)
+		r = -EFAULT;
+	else
+		*p_pfn = pfn;
 
 	return r;
 }
@@ -2696,6 +2707,11 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *foll)
 kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
 {
 	foll->writable = false;
+	foll->is_refcounted_page = false;
+
+	if (WARN_ON_ONCE(!(foll->flags & FOLL_GET) && !foll->guarded_by_mmu_notifier))
+		return KVM_PFN_ERR_FAULT;
+
 	foll->hva = __gfn_to_hva_many(foll->slot, foll->gfn, NULL,
 				      foll->flags & FOLL_WRITE);
 
@@ -2720,7 +2736,7 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 	struct kvm_follow_pfn foll = {
 		.slot = slot,
 		.gfn = gfn,
-		.flags = 0,
+		.flags = FOLL_GET,
 		.atomic = atomic,
 		.try_map_writable = !!writable,
 	};
@@ -2752,7 +2768,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 	struct kvm_follow_pfn foll = {
 		.slot = gfn_to_memslot(kvm, gfn),
 		.gfn = gfn,
-		.flags = write_fault ? FOLL_WRITE : 0,
+		.flags = FOLL_GET | (write_fault ? FOLL_WRITE : 0),
 		.try_map_writable = !!writable,
 	};
 	pfn = __kvm_follow_pfn(&foll);
@@ -2767,7 +2783,7 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 	struct kvm_follow_pfn foll = {
 		.slot = slot,
 		.gfn = gfn,
-		.flags = FOLL_WRITE,
+		.flags = FOLL_GET | FOLL_WRITE,
 	};
 	return __kvm_follow_pfn(&foll);
 }
@@ -2778,7 +2794,7 @@ kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gf
 	struct kvm_follow_pfn foll = {
 		.slot = slot,
 		.gfn = gfn,
-		.flags = FOLL_WRITE,
+		.flags = FOLL_GET | FOLL_WRITE,
 		.atomic = true,
 	};
 	return __kvm_follow_pfn(&foll);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 86cd40acad11..c558f510ab51 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -147,7 +147,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	struct kvm_follow_pfn foll = {
 		.slot = gpc->memslot,
 		.gfn = gpa_to_gfn(gpc->gpa),
-		.flags = FOLL_WRITE,
+		.flags = FOLL_WRITE | FOLL_GET,
 		.hva = gpc->uhva,
 	};
 
-- 
2.42.0.rc1.204.g551eb34607-goog