Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp727377pxm; Fri, 25 Feb 2022 18:28:20 -0800 (PST) X-Google-Smtp-Source: ABdhPJyxfiokC9zefPXLRheyP58U9XM1cTNkRHxOjfeX+/XUoD8ia+F+Vp6mJpYpLGjgX5MVQUPD X-Received: by 2002:a65:52cc:0:b0:374:3ee6:c632 with SMTP id z12-20020a6552cc000000b003743ee6c632mr8438639pgp.91.1645842500615; Fri, 25 Feb 2022 18:28:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1645842500; cv=none; d=google.com; s=arc-20160816; b=u67NBYifq0++maUKmQANxA86/wrsbSZq/abBA32rAeJRMftHrW1J6DOidRK0xG0vcM g4isVUc1+NDJSAaIfu3idG4TccVAHBrvLIPlnWN1mWiTBoYpZu/Rxys9C/3E6WS1vD7X wRzCrdCAzoBNYjsYrL1J6l4pT7tnyghOLLCuMS6xx9dgB/Ng8YmExS3ut174JTWptGbw /xgzsuBIw/Q0yaMomdo4PJmzfWr7+iTQICf9LpNkLijqN8QGeiBEgz1ChFdtG0xjNLbc rjD32UCEEYIqVb/2wGBUpWMsahtbNuElnsHtXEVzs8t3kuRy/NXVOBJjyaDho7musYEH EujA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:references:mime-version :message-id:in-reply-to:date:reply-to:dkim-signature; bh=Jj10j2aUg8MzEq8hNDXHAtFe0jk7/OZgYIYqdt1cOoc=; b=0NyFbHxkmlGkcWqb1WWMilGvLmkuP2rFgPiouPW4jH69WJ1OMC30Ej3MxQlccqqfwI f28pVwf14UZPEa/sahTelKDH7XveI9npoG8JUbT7EQNEC8BuUZDRQ4VM2PBTz5x7Wqah H0cbwHa8Uxf5OZW+1F8l2bD8eiso7sDVKeVAv1o5mkjCKeE6VkcsFCnBl6tY0QeFJi11 k41EV3t+C2TNQiW2S9R8rl2/xoFldUzmk8RAm5aLcIScf3aS1ANye1GcTP8BulMn8E5s dyMtf7152fsJlMJ2AS0YzyrfZFnh1OvawtCoavyY02/ZK9y0LSFji3218wa0mlxsLAIr RcuQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=D2bVuMsw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id hd21-20020a17090b459500b001bd196888c7si213766pjb.36.2022.02.25.18.28.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 25 Feb 2022 18:28:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=D2bVuMsw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id AB5DB274870; Fri, 25 Feb 2022 17:56:04 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240067AbiBZAQn (ORCPT + 99 others); Fri, 25 Feb 2022 19:16:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240106AbiBZAQk (ORCPT ); Fri, 25 Feb 2022 19:16:40 -0500 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BA6E2177E8 for ; Fri, 25 Feb 2022 16:16:07 -0800 (PST) Received: by mail-pj1-x104a.google.com with SMTP id 97-20020a17090a09ea00b001bc54996b62so6480519pjo.3 for ; Fri, 25 Feb 2022 16:16:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=Jj10j2aUg8MzEq8hNDXHAtFe0jk7/OZgYIYqdt1cOoc=; b=D2bVuMswqCIu+UG+EbEQzN6gM9KXJVT/zxc3AY9+iTQqtXVpHhdM9v0rQKNQOneKYn TQuXm3NDvUYi2SmHd4dr0u+ZMKGEOtqN798Hl830qR4uSuTKj6SwrOv/iG1EmfH95473 zSN+CrNkkuFMFH2/evxrT9A2Gy7Mdsg++tsrtvD/M9yNK75LEI2fqokwA85WII7OOmHL 6FLI7TpHmCd2tEVHkUq5XxG92BPqrfEifCe1XZZ5u1Fe0M3idj3nPxZCN2KAvAehEgf9 ZzBbOryoXNMqmhCpVH9edTem6LJAoVZ6xV2yHTV429B9uQ1qT/NiXinAvUBsEh0yng7h Tw6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=Jj10j2aUg8MzEq8hNDXHAtFe0jk7/OZgYIYqdt1cOoc=; b=095x+wGLwFY9GMeW4ernhC1k0HffXFHDSxvV3OWBDpEIrs4VUdpjKh+v+WE21OJjIC dwEg0CQqr01LsPTZ8BpGTQAgVqlIjityf3DKgGaZoIOvmvp6UkQFac8ypJPZvsHcQ7HI nJonOYH9a2ygzfFC6Nf5/rHYuAe4uJHh5AFyNysr+8BLqPNYWitHNOhufJd52R2fAjJH PTWWinbBu63UhSG0bnKLZzgiZEs0n4OXVYJ+c6PqeWikFxWetGjLW6jcGBUGjrx7TbjJ +VzLr/uK9/iJjs5uv9v/1FLSAs9KHi0L/JgInIFr7NYLoqeFS7EzNNF/NcgLV5qb2qZ6 UyHw== X-Gm-Message-State: AOAM532FJcAuX0VEVvxBuOlxGZznsl8fR9rCmnuZqGcHmQDCQtqLGIt3 vubDamN404tyJrjBNUXzFQ7KMr0MA1c= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a17:90b:4b52:b0:1bc:b208:dc5c with SMTP id mi18-20020a17090b4b5200b001bcb208dc5cmr761796pjb.1.1645834566189; Fri, 25 Feb 2022 16:16:06 -0800 (PST) Reply-To: Sean Christopherson Date: Sat, 26 Feb 2022 00:15:21 +0000 In-Reply-To: <20220226001546.360188-1-seanjc@google.com> Message-Id: <20220226001546.360188-4-seanjc@google.com> Mime-Version: 1.0 References: <20220226001546.360188-1-seanjc@google.com> X-Mailer: git-send-email 2.35.1.574.g5d30c73bfb-goog Subject: [PATCH v3 03/28] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap From: Sean Christopherson To: Paolo Bonzini , Christian Borntraeger , Janosch Frank , Claudio Imbrenda Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , David Hildenbrand , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Ben Gardon , Mingwei Zhang Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Fix misleading and arguably wrong comments in the TDP MMU's fast zap flow. The comments, and the fact that actually zapping invalid roots was added separately, strongly suggests that zapping invalid roots is an optimization and not required for correctness. That is a lie. KVM _must_ zap invalid roots before returning from kvm_mmu_zap_all_fast(), because when it's called from kvm_mmu_invalidate_zap_pages_in_memslot(), KVM is relying on it to fully remove all references to the memslot. Once the memslot is gone, KVM's mmu_notifier hooks will be unable to find the stale references as the hva=>gfn translation is done via the memslots. If KVM doesn't immediately zap SPTEs and userspace unmaps a range after deleting a memslot, KVM will fail to zap in response to the mmu_notifier due to not finding a memslot corresponding to the notifier's range, which leads to a variation of use-after-free. The other misleading comment (and code) explicitly states that roots without a reference should be skipped. While that's technically true, it's also extremely misleading as it should be impossible for KVM to encounter a defunct root on the list while holding mmu_lock for write. Opportunstically add a WARN to enforce that invariant. Fixes: b7cccd397f31 ("KVM: x86/mmu: Fast invalidation for TDP MMU") Fixes: 4c6654bd160d ("KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns") Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 8 +++++++ arch/x86/kvm/mmu/tdp_mmu.c | 46 +++++++++++++++++++++----------------- 2 files changed, 33 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b2c1c4eb6007..80607513a1f2 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5662,6 +5662,14 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) write_unlock(&kvm->mmu_lock); + /* + * Zap the invalidated TDP MMU roots, all SPTEs must be dropped before + * returning to the caller, e.g. if the zap is in response to a memslot + * deletion, mmu_notifier callbacks will be unable to reach the SPTEs + * associated with the deleted memslot once the update completes, and + * Deferring the zap until the final reference to the root is put would + * lead to use-after-free. + */ if (is_tdp_mmu_enabled(kvm)) { read_lock(&kvm->mmu_lock); kvm_tdp_mmu_zap_invalidated_roots(kvm); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 9357780ec28f..12866113fb4f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -826,12 +826,11 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm) } /* - * Since kvm_tdp_mmu_zap_all_fast has acquired a reference to each - * invalidated root, they will not be freed until this function drops the - * reference. Before dropping that reference, tear down the paging - * structure so that whichever thread does drop the last reference - * only has to do a trivial amount of work. Since the roots are invalid, - * no new SPTEs should be created under them. + * Zap all invalidated roots to ensure all SPTEs are dropped before the "fast + * zap" completes. Since kvm_tdp_mmu_invalidate_all_roots() has acquired a + * reference to each invalidated root, roots will not be freed until after this + * function drops the gifted reference, e.g. so that vCPUs don't get stuck with + * tearing paging structures. */ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) { @@ -855,21 +854,25 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) } /* - * Mark each TDP MMU root as invalid so that other threads - * will drop their references and allow the root count to - * go to 0. + * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that + * is about to be zapped, e.g. in response to a memslots update. The caller is + * responsible for invoking kvm_tdp_mmu_zap_invalidated_roots() to the actual + * zapping. * - * Also take a reference on all roots so that this thread - * can do the bulk of the work required to free the roots - * once they are invalidated. Without this reference, a - * vCPU thread might drop the last reference to a root and - * get stuck with tearing down the entire paging structure. + * Take a reference on all roots to prevent the root from being freed before it + * is zapped by this thread. Freeing a root is not a correctness issue, but if + * a vCPU drops the last reference to a root prior to the root being zapped, it + * will get stuck with tearing down the entire paging structure. * - * Roots which have a zero refcount should be skipped as - * they're already being torn down. - * Already invalid roots should be referenced again so that - * they aren't freed before kvm_tdp_mmu_zap_all_fast is - * done with them. + * Get a reference even if the root is already invalid, + * kvm_tdp_mmu_zap_invalidated_roots() assumes it was gifted a reference to all + * invalid roots, e.g. there's no epoch to identify roots that were invalidated + * by a previous call. Roots stay on the list until the last reference is + * dropped, so even though all invalid roots are zapped, a root may not go away + * for quite some time, e.g. if a vCPU blocks across multiple memslot updates. + * + * Because mmu_lock is held for write, it should be impossible to observe a + * root with zero refcount, i.e. the list of roots cannot be stale. * * This has essentially the same effect for the TDP MMU * as updating mmu_valid_gen does for the shadow MMU. @@ -879,9 +882,10 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) struct kvm_mmu_page *root; lockdep_assert_held_write(&kvm->mmu_lock); - list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) - if (refcount_inc_not_zero(&root->tdp_mmu_root_count)) + list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { + if (!WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) root->role.invalid = true; + } } /* -- 2.35.1.574.g5d30c73bfb-goog