Date: Tue, 6 Apr 2021 23:42:06 +0000
From: Sean Christopherson
To: Keqian Zhu <zhukeqian1@huawei.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini,
    wanghaibin.wang@huawei.com, Ben Gardon
Subject: Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily
Message-ID:
References: <20200828081157.15748-1-zhukeqian1@huawei.com>
 <107696eb-755f-7807-a484-da63aad01ce4@huawei.com>
In-Reply-To: <107696eb-755f-7807-a484-da63aad01ce4@huawei.com>

+Ben

On Tue, Apr 06, 2021, Keqian Zhu wrote:
> Hi Paolo,
>
> I plan to rework this patch and do a full test. What do you think about
> this idea (enabling dirty logging for huge pages lazily)?

Ben, don't you also have something similar (or maybe the exact opposite?) in
the hopper?  This sounds very familiar, but I can't quite connect the dots
that are floating around my head...

> PS: As dirty logging for the TDP MMU is now supported, I will need to add
> more code.
>
> On 2020/8/28 16:11, Keqian Zhu wrote:
> > Currently, when enabling dirty logging with init-all-set, we only
> > write protect huge pages and leave normal pages untouched, so that
> > dirty logging can be enabled for those pages lazily.
> >
> > It seems that enabling dirty logging lazily is feasible for huge pages
> > too; it not only reduces the time needed to start dirty logging, but
> > also greatly reduces the side effects on the guest when the dirty rate
> > is high.
> >
> > (This code is untested and for RFC purposes :-) ).
> >
> > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  3 +-
> >  arch/x86/kvm/mmu/mmu.c          | 65 ++++++++++++++++++++++++++-------
> >  arch/x86/kvm/vmx/vmx.c          |  3 +-
> >  arch/x86/kvm/x86.c              | 22 +++++------
> >  4 files changed, 62 insertions(+), 31 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 5303dbc5c9bc..201a068cf43d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
> >
> >  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
> >  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > -				      struct kvm_memory_slot *memslot,
> > -				      int start_level);
> > +				      struct kvm_memory_slot *memslot);
> >  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> >  				   const struct kvm_memory_slot *memslot);
> >  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 43fdb0c12a5d..4b7d577de6cd 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
> >  }
> >
> >  /**
> > - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
> > + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages
> >   * @kvm: kvm instance
> >   * @slot: slot to protect
> >   * @gfn_offset: start of the BITS_PER_LONG pages we care about
> >   * @mask: indicates which pages we should protect
> >   *
> > - * Used when we do not need to care about huge page mappings: e.g. during dirty
> > - * logging we do not have any such mappings.
> > + * @ret: true if all pages are write protected
> > + */
> > +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm,
> > +					struct kvm_memory_slot *slot,
> > +					gfn_t gfn_offset, unsigned long mask)
> > +{
> > +	struct kvm_rmap_head *rmap_head;
> > +	bool protected, all_protected;
> > +	gfn_t start_gfn = slot->base_gfn + gfn_offset;
> > +	int i;
> > +
> > +	all_protected = true;
> > +	while (mask) {
> > +		protected = false;
> > +		for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
> > +			rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot);
> > +			protected |= __rmap_write_protect(kvm, rmap_head, false);
> > +		}
> > +
> > +		all_protected &= protected;
> > +		/* clear the first set bit */
> > +		mask &= mask - 1;
> > +	}
> > +
> > +	return all_protected;
> > +}
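The loop above is the standard dirty-mask walk: __ffs() yields the index of
the lowest set bit, and "mask &= mask - 1" clears that bit, so each set bit
is visited exactly once. For anyone who wants to poke at the pattern outside
the kernel, here is a minimal userspace sketch; the toy_* names are stand-ins
invented for illustration, not KVM symbols:

  #include <stdbool.h>
  #include <stdio.h>

  /* Toy stand-in for __rmap_write_protect(): report "protected" only for
   * even gfns so that both outcomes of the walk get exercised. */
  static bool toy_rmap_write_protect(unsigned long gfn, int level)
  {
          printf("  write-protect gfn %lu at level %d\n", gfn, level);
          return (gfn & 1) == 0;
  }

  /* Same shape as kvm_mmu_write_protect_largepage_masked(): visit every
   * set bit of @mask once, try each hugepage level for that gfn, and
   * report whether every page ended up write-protected. */
  static bool toy_protect_masked(unsigned long base_gfn, unsigned long mask)
  {
          bool all_protected = true;

          while (mask) {
                  /* __builtin_ctzl() is 0-based, like the kernel's __ffs(). */
                  unsigned long gfn = base_gfn + __builtin_ctzl(mask);
                  bool protected = false;
                  int level;

                  for (level = 2; level <= 3; level++)    /* 2M, then 1G */
                          protected |= toy_rmap_write_protect(gfn, level);

                  all_protected &= protected;
                  mask &= mask - 1;       /* clear the lowest set bit */
          }

          return all_protected;
  }

  int main(void)
  {
          /* bits 0, 3 and 5 set => gfns 100, 103 and 105 */
          printf("all protected: %d\n", toy_protect_masked(100, 0x29));
          return 0;
  }

Note the all-or-nothing return value: the caller below skips the 4K-level
handling only when every gfn in the mask was covered by a write-protected
huge mapping.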
> > +
> > +/**
> > + * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
> > + * @kvm: kvm instance
> > + * @slot: slot to protect
> > + * @gfn_offset: start of the BITS_PER_LONG pages we care about
> > + * @mask: indicates which pages we should protect
> >   */
> >  static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >  					    struct kvm_memory_slot *slot,
> > @@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
> >
> >  /**
> >   * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
> > - * PT level pages.
> > - *
> > - * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
> > - * enable dirty logging for them.
> > - *
> > - * Used when we do not need to care about huge page mappings: e.g. during dirty
> > - * logging we do not have any such mappings.
> > + * dirty pages.
> >   */
> >  void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> >  				struct kvm_memory_slot *slot,
> >  				gfn_t gfn_offset, unsigned long mask)
> >  {
> > +	/*
> > +	 * If we're with initial-all-set, huge pages are NOT
> > +	 * write protected when we start dirty log, so we must
> > +	 * write protect them here.
> > +	 */
> > +	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> > +		if (kvm_mmu_write_protect_largepage_masked(kvm, slot,
> > +							   gfn_offset, mask))
> > +			return;
> > +	}
> > +
> > +	/* Then we can handle the 4K level pages */
> > +
> >  	if (kvm_x86_ops.enable_log_dirty_pt_masked)
> >  		kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset,
> >  						       mask);
> > @@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
> >  }
> >
> >  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > -				      struct kvm_memory_slot *memslot,
> > -				      int start_level)
> > +				      struct kvm_memory_slot *memslot)
> >  {
> >  	bool flush;
> >
> >  	spin_lock(&kvm->mmu_lock);
> > -	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
> > -				  start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
> > +	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > +				      false);
> >  	spin_unlock(&kvm->mmu_lock);
> >
> >  	/*
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 819c185adf09..ba871c52ef8b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
> >  static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> >  				      struct kvm_memory_slot *slot)
> >  {
> > -	if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
> > -		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> > +	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> >  	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
> >  }
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d39d6cf1d473..c31c32f1424b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
> >  	 * is enabled the D-bit or the W-bit will be cleared.
> >  	 */
> >  	if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> > +		/*
> > +		 * If we're with initial-all-set, we don't need
> > +		 * to write protect any page because they're
> > +		 * reported as dirty already.
> > +		 */
> > +		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> > +			return;
> > +
> >  		if (kvm_x86_ops.slot_enable_log_dirty) {
> >  			kvm_x86_ops.slot_enable_log_dirty(kvm, new);
> >  		} else {
> > -			int level =
> > -				kvm_dirty_log_manual_protect_and_init_set(kvm) ?
> > -				PG_LEVEL_2M : PG_LEVEL_4K;
> > -
> > -			/*
> > -			 * If we're with initial-all-set, we don't need
> > -			 * to write protect any small page because
> > -			 * they're reported as dirty already. However
> > -			 * we still need to write-protect huge pages
> > -			 * so that the page split can happen lazily on
> > -			 * the first write to the huge page.
> > -			 */
> > -			kvm_mmu_slot_remove_write_access(kvm, new, level);
> > +			kvm_mmu_slot_remove_write_access(kvm, new);
> >  		}
> >  	} else {
> >  		if (kvm_x86_ops.slot_disable_log_dirty)
> >
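A final note on when the new path actually runs: with init-all-set, nothing
is write protected at the moment dirty logging is enabled, and
kvm_arch_mmu_enable_log_dirty_pt_masked() is reached only once userspace
clears dirty bits via KVM_CLEAR_DIRTY_LOG. Roughly, the userspace side looks
like the sketch below (it assumes an existing vm_fd and a memslot already
created with KVM_MEM_LOG_DIRTY_PAGES; error handling omitted):

  #include <linux/kvm.h>
  #include <string.h>
  #include <sys/ioctl.h>

  /* Opt in to manual dirty-log protect + init-all-set, i.e. the mode where
   * kvm_dirty_log_manual_protect_and_init_set() returns true. */
  static int enable_init_all_set(int vm_fd)
  {
          struct kvm_enable_cap cap;

          memset(&cap, 0, sizeof(cap));
          cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
          cap.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                        KVM_DIRTY_LOG_INITIALLY_SET;
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }

  /* Clearing a chunk of the dirty bitmap is what funnels into
   * kvm_arch_mmu_enable_log_dirty_pt_masked() and, with this patch, lazily
   * write protects any huge pages still mapping the cleared gfns. */
  static int clear_dirty_chunk(int vm_fd, __u32 slot, __u64 first_page,
                               __u32 num_pages, void *bitmap)
  {
          struct kvm_clear_dirty_log log;

          memset(&log, 0, sizeof(log));
          log.slot = slot;
          log.first_page = first_page;
          log.num_pages = num_pages;
          log.dirty_bitmap = bitmap;
          return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &log);
  }

So the write protection (and the eventual huge page split) is deferred until
the first clear of the corresponding bitmap chunk, which is exactly the
"lazy" part of the proposal.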