Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp3245745pxm; Mon, 28 Feb 2022 15:28:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJyWoMu6gs+mQPxddENhHSVuNX1mkHZgfX2QLe7ag9sEkcTjFmaZhOmcSxjUKrOGchLMpaPW X-Received: by 2002:aa7:8a4d:0:b0:4f2:6d3f:5fe3 with SMTP id n13-20020aa78a4d000000b004f26d3f5fe3mr24228249pfa.48.1646090894937; Mon, 28 Feb 2022 15:28:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1646090894; cv=none; d=google.com; s=arc-20160816; b=qTn3wIKzamD9Ox7fxjvyberv4SEHtYj5REy1CPutPfJN/AlT2GX7xBD7L2rO7XVJOG FWx1MQEL9fDrnXlWiBj85ZY1EcGhf2Acv3069QlvPtItWhHBNVsEmZS0TSskf0Io1Dcs +upihqHwcjfDusviKx6BnqJyFbmrh7XBx094f++rmOeiAKW1hJxG3jqfcYRAchq3TsaD 5yPfTqewUZGiPDk6pkI9kPD2VU8VfZ8/j9n7GldpmanPFdoePQ1dfPj0LbHA4GJcQEwU 6vbQGpI7E02/gJ69eKrTtiFHJDzuqoNXsKIOJveRoigrOpIN8TQnmNF8eOEzjuBQDaF9 MJ0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=YoMcd1THGICE2VmFdB1g9ucNae/b72PiOrCiwNgNrHU=; b=xJAKSyi/jQCRC/8ryvjTyFJZI2PomfCet5jA7YlJakA62rRhG/n2Z9/yFvGCHW4iNo Oa98jqZUFTs23OxGIdEnRnFtcIa5QBZNqBOHwEkNpNCBNNO38j92Pa9R0YGkldr7rqqp ZRkSbFtKHZyvAKJJy+6V3GJxy8bYVGpfGQQiUOIMRaPeV1c7oAUSlxaSLgbDRhw8Kakh M2m+R4xF3CuLUEpXRrvkAB8QeTrwn4MehTgjgGYlYk6QWtEI8aWsBfPBkTCYpkUYg7ij CK59MVCGqcuPxEZnrAmxnXdgO6T+5FUbyAg5rt3Ta4q/02T1wPxhdW28DQqOViBvtMol F0oQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=KhtPHrlk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w7-20020a636207000000b00373bbd3db2bsi11131189pgb.735.2022.02.28.15.27.58; Mon, 28 Feb 2022 15:28:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=KhtPHrlk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229607AbiB1XQA (ORCPT + 99 others); Mon, 28 Feb 2022 18:16:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229457AbiB1XP6 (ORCPT ); Mon, 28 Feb 2022 18:15:58 -0500 Received: from mail-ej1-x62a.google.com (mail-ej1-x62a.google.com [IPv6:2a00:1450:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A19904349D for ; Mon, 28 Feb 2022 15:15:18 -0800 (PST) Received: by mail-ej1-x62a.google.com with SMTP id a8so27939583ejc.8 for ; Mon, 28 Feb 2022 15:15:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=YoMcd1THGICE2VmFdB1g9ucNae/b72PiOrCiwNgNrHU=; b=KhtPHrlkCURcM/UDBBkeodB+MwqbXhCYxIVriYq1RYVisuUokQNsczDB3hQnCjh13P dGriIG4ser0KM6mSB/so8J613WiFLIc4Mv35AFphKcMNEeILY+yj3elC4P9dqFziDi6M DFzQ4yMg0YkaDkaa8nKOCbK74RpLZ++Fhot/hpWZNCazzfI/tCKutWNL8ddELyM9XemI S5bzecFcpoQjnzKR0VZ40WnFmFLmcXM29CFHi2KJQPelJ5MrExIWnORPuwm2p9P91+ip U36GnMV++U65e0g3o61gUjhzODsUsCrWU6Jw+tvcxqubsYJ2UO2ARirKrPb65RR16DNQ f4PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=YoMcd1THGICE2VmFdB1g9ucNae/b72PiOrCiwNgNrHU=; b=pJtJiiFQkPZd+S09CpdP6cNPcQcYSlF+PTtZLkDaiFyOl3ThKAvRQ1vu6gh4T9dajQ /hvZN/m4gKR0TjDvQpROSVVZS8+siHR+jlOY1NiDOWVzVNic8LVP/iyfbljBZZJ6+4Px yY9+/KVxPkVhQF3/noXuROPAsLRKSoZF1po2VUZjDfNSkVuur0PpqtmMIQnySVQ5Vv7T 6Mf7m3oGV0KncLjZ8V5UKitYDWHfjro34pP9ovYiMK6nJ7ng5L884h8NKdP5MpgiytQp J/9Db7BRtVrg29+dp3AGet7gijyIptu72PwfZYs1CigQsUbpRMekmEufjEmDMMzheNLB Pkjg== X-Gm-Message-State: AOAM530tUstHAIztTU1T6N6VV6jrhIgE72cmEwBZTRYfXPC+26xsy9HY fbUar45RLWNBXhqy+YSGDmCG5KlrA9ucjBZzHyfmHeFfhFk= X-Received: by 2002:a17:906:d14e:b0:6cd:8d7e:eec9 with SMTP id br14-20020a170906d14e00b006cd8d7eeec9mr16959496ejb.28.1646090116919; Mon, 28 Feb 2022 15:15:16 -0800 (PST) MIME-Version: 1.0 References: <20220226001546.360188-1-seanjc@google.com> <20220226001546.360188-4-seanjc@google.com> In-Reply-To: <20220226001546.360188-4-seanjc@google.com> From: Ben Gardon Date: Mon, 28 Feb 2022 15:15:03 -0800 Message-ID: Subject: Re: [PATCH v3 03/28] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap To: Sean Christopherson Cc: Paolo Bonzini , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , David Hildenbrand , kvm , LKML , David Matlack , Mingwei Zhang Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-18.1 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 25, 2022 at 4:16 PM Sean Christopherson wrote: > > Fix misleading and arguably wrong comments in the TDP MMU's fast zap > flow. The comments, and the fact that actually zapping invalid roots was > added separately, strongly suggests that zapping invalid roots is an > optimization and not required for correctness. That is a lie. > > KVM _must_ zap invalid roots before returning from kvm_mmu_zap_all_fast(), > because when it's called from kvm_mmu_invalidate_zap_pages_in_memslot(), > KVM is relying on it to fully remove all references to the memslot. Once > the memslot is gone, KVM's mmu_notifier hooks will be unable to find the > stale references as the hva=>gfn translation is done via the memslots. > If KVM doesn't immediately zap SPTEs and userspace unmaps a range after > deleting a memslot, KVM will fail to zap in response to the mmu_notifier > due to not finding a memslot corresponding to the notifier's range, which > leads to a variation of use-after-free. > > The other misleading comment (and code) explicitly states that roots > without a reference should be skipped. While that's technically true, > it's also extremely misleading as it should be impossible for KVM to > encounter a defunct root on the list while holding mmu_lock for write. > Opportunstically add a WARN to enforce that invariant. > > Fixes: b7cccd397f31 ("KVM: x86/mmu: Fast invalidation for TDP MMU") > Fixes: 4c6654bd160d ("KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns") > Signed-off-by: Sean Christopherson A couple nits about missing words, but otherwise looks good. Reviewed-by: Ben Gardon > --- > arch/x86/kvm/mmu/mmu.c | 8 +++++++ > arch/x86/kvm/mmu/tdp_mmu.c | 46 +++++++++++++++++++++----------------- > 2 files changed, 33 insertions(+), 21 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index b2c1c4eb6007..80607513a1f2 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -5662,6 +5662,14 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) > > write_unlock(&kvm->mmu_lock); > > + /* > + * Zap the invalidated TDP MMU roots, all SPTEs must be dropped before > + * returning to the caller, e.g. if the zap is in response to a memslot > + * deletion, mmu_notifier callbacks will be unable to reach the SPTEs > + * associated with the deleted memslot once the update completes, and > + * Deferring the zap until the final reference to the root is put would > + * lead to use-after-free. > + */ > if (is_tdp_mmu_enabled(kvm)) { > read_lock(&kvm->mmu_lock); > kvm_tdp_mmu_zap_invalidated_roots(kvm); > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c > index 9357780ec28f..12866113fb4f 100644 > --- a/arch/x86/kvm/mmu/tdp_mmu.c > +++ b/arch/x86/kvm/mmu/tdp_mmu.c > @@ -826,12 +826,11 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm) > } > > /* > - * Since kvm_tdp_mmu_zap_all_fast has acquired a reference to each > - * invalidated root, they will not be freed until this function drops the > - * reference. Before dropping that reference, tear down the paging > - * structure so that whichever thread does drop the last reference > - * only has to do a trivial amount of work. Since the roots are invalid, > - * no new SPTEs should be created under them. > + * Zap all invalidated roots to ensure all SPTEs are dropped before the "fast > + * zap" completes. Since kvm_tdp_mmu_invalidate_all_roots() has acquired a > + * reference to each invalidated root, roots will not be freed until after this > + * function drops the gifted reference, e.g. so that vCPUs don't get stuck with > + * tearing paging structures. Nit: tearing down paging structures > */ > void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) > { > @@ -855,21 +854,25 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) > } > > /* > - * Mark each TDP MMU root as invalid so that other threads > - * will drop their references and allow the root count to > - * go to 0. > + * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that > + * is about to be zapped, e.g. in response to a memslots update. The caller is > + * responsible for invoking kvm_tdp_mmu_zap_invalidated_roots() to the actual Nit: to do > + * zapping. > * > - * Also take a reference on all roots so that this thread > - * can do the bulk of the work required to free the roots > - * once they are invalidated. Without this reference, a > - * vCPU thread might drop the last reference to a root and > - * get stuck with tearing down the entire paging structure. > + * Take a reference on all roots to prevent the root from being freed before it > + * is zapped by this thread. Freeing a root is not a correctness issue, but if > + * a vCPU drops the last reference to a root prior to the root being zapped, it > + * will get stuck with tearing down the entire paging structure. > * > - * Roots which have a zero refcount should be skipped as > - * they're already being torn down. > - * Already invalid roots should be referenced again so that > - * they aren't freed before kvm_tdp_mmu_zap_all_fast is > - * done with them. > + * Get a reference even if the root is already invalid, > + * kvm_tdp_mmu_zap_invalidated_roots() assumes it was gifted a reference to all > + * invalid roots, e.g. there's no epoch to identify roots that were invalidated > + * by a previous call. Roots stay on the list until the last reference is > + * dropped, so even though all invalid roots are zapped, a root may not go away > + * for quite some time, e.g. if a vCPU blocks across multiple memslot updates. > + * > + * Because mmu_lock is held for write, it should be impossible to observe a > + * root with zero refcount, i.e. the list of roots cannot be stale. > * > * This has essentially the same effect for the TDP MMU > * as updating mmu_valid_gen does for the shadow MMU. > @@ -879,9 +882,10 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) > struct kvm_mmu_page *root; > > lockdep_assert_held_write(&kvm->mmu_lock); > - list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) > - if (refcount_inc_not_zero(&root->tdp_mmu_root_count)) > + list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { > + if (!WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) > root->role.invalid = true; > + } > } > > /* > -- > 2.35.1.574.g5d30c73bfb-goog >