From: Ben Gardon
Date: Mon, 6 Feb 2023 14:06:41 -0800
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log
To: Vipin Sharma
Cc: seanjc@google.com, pbonzini@redhat.com, dmatlack@google.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230203192822.106773-3-vipinsh@google.com>
References: <20230203192822.106773-1-vipinsh@google.com> <20230203192822.106773-3-vipinsh@google.com>

On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma wrote:
>
> No need to check all of the conditions in __handle_changed_spte() as
> clearing dirty log only involves resetting dirty or writable bit.
>
> Make atomic change to dirty or writable bit and mark pfn dirty.
>
> Tested on 160 VCPU-160 GB VM and found that performance of clear dirty
> log stage improved by ~38% in dirty_log_perf_test

Dang! That's a big improvement.
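One detail worth spelling out for readers who don't have the atomic API
memorized (my gloss, not something stated in the patch): atomic64_fetch_and()
returns the value the location held *before* the AND, which is what lets the
new helper below hand the pre-clear SPTE back to its caller. Roughly, with
the names taken from the patch:

	atomic64_t *sptep = (atomic64_t *)rcu_dereference(iter->sptep);
	u64 old_spte;

	/* Atomically: old_spte = *sptep; *sptep = old_spte & ~mask; */
	old_spte = (u64)atomic64_fetch_and(~mask, sptep);

That is why clear_dirty_pt_masked() can reuse the return value for both the
tracepoint and kvm_set_pfn_dirty().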
...

> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index 30a52e5e68de..21046b34f94e 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
>  void tdp_iter_next(struct tdp_iter *iter);
>  void tdp_iter_restart(struct tdp_iter *iter);
>
> +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> +{
> +	atomic64_t *sptep;
> +
> +	if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> +		sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> +		return (u64)atomic64_fetch_and(~mask, sptep);
> +	}
> +
> +	__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> +	return iter->old_spte;
> +}
> +

If you do end up sending another version of the series and feel like
breaking up this patch, you could probably split this part out, since
the change to how we set the SPTE and how we handle that change are
somewhat independent. I like the switch to atomic64_fetch_and though.
No idea if it's faster, but I would believe it could be.

Totally optional, but there's currently no validation on the mask.
Maybe we're only calling this in one place, but it might be worth
clarifying the limits (if any) on what bits can be set in the mask. I
don't think there necessarily need to be limits as of this commit, but
the handling around this function where it's called here would
obviously not be sufficient if the mask were -1UL or something. (A
rough sketch of the kind of check I mean is at the bottom of this
mail.)

> #endif /* __KVM_X86_MMU_TDP_ITER_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index bba33aea0fb0..83f15052aa6c 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -710,18 +710,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
>   *			    notifier for access tracking. Leaving record_acc_track
>   *			    unset in that case prevents page accesses from being
>   *			    double counted.
> - * @record_dirty_log:	    Record the page as dirty in the dirty bitmap if
> - *			    appropriate for the change being made. Should be set
> - *			    unless performing certain dirty logging operations.
> - *			    Leaving record_dirty_log unset in that case prevents page
> - *			    writes from being double counted.

I was kind of hesitant about getting rid of this, but now that I see
it going, I love it.

...

> @@ -1694,18 +1681,22 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
>  		mask &= ~(1UL << (iter.gfn - gfn));
>
>  		if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
> -			if (is_writable_pte(iter.old_spte))
> -				new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
> -			else
> +			if (!is_writable_pte(iter.old_spte))
>  				continue;
> +
> +			clear_bits = PT_WRITABLE_MASK;
>  		} else {
> -			if (iter.old_spte & shadow_dirty_mask)
> -				new_spte = iter.old_spte & ~shadow_dirty_mask;
> -			else
> +			if (!(iter.old_spte & shadow_dirty_mask))
>  				continue;
> +
> +			clear_bits = shadow_dirty_mask;
>  		}
>
> -		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
> +		iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
> +		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
> +					       iter.old_spte,
> +					       iter.old_spte & ~clear_bits);
> +		kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
>  	}

Nice!
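To make the mask-validation comment above concrete, here is a rough,
untested sketch. The set of allowed bits is just what the single caller
in this patch needs, so treat it as an illustration of the idea rather
than a requirement:

static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
{
	atomic64_t *sptep;

	/*
	 * Only the writable and dirty bits are expected here; clearing
	 * anything else would need the full __handle_changed_spte()
	 * bookkeeping.
	 */
	WARN_ON_ONCE(mask & ~(PT_WRITABLE_MASK | shadow_dirty_mask));

	if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
		sptep = (atomic64_t *)rcu_dereference(iter->sptep);
		return (u64)atomic64_fetch_and(~mask, sptep);
	}

	__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
	return iter->old_spte;
}

Again, totally optional -- it just makes the helper's contract explicit
if it ever grows more callers.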