From: Keqian Zhu
To: , ,
Cc: , Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Keqian Zhu
Subject: [RFC PATCH] KVM: x86: Support write protect huge pages lazily
Date: Fri, 28 Aug 2020 16:11:57 +0800
Message-ID: <20200828081157.15748-1-zhukeqian1@huawei.com>
X-Mailer: git-send-email 2.8.4.windows.1

Currently, when dirty logging is enabled with init-all-set, we write
protect only huge pages and leave normal pages untouched, so that dirty
logging for the normal pages can be enabled lazily.

It seems feasible to enable dirty logging lazily for huge pages as well.
Doing so not only reduces the time needed to start dirty logging, but
also greatly reduces the side effects on the guest when the dirty rate
is high.

(This code is untested and posted for RFC purposes only. :-) )

Signed-off-by: Keqian Zhu
---
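A note on "init-all-set" (illustrative only, not part of the patch): it
refers to the KVM_DIRTY_LOG_INITIALLY_SET flag of
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2. A minimal sketch of how a VMM might
request this mode, assuming vm_fd is an open KVM VM file descriptor:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int enable_init_all_set(int vm_fd)
    {
            struct kvm_enable_cap cap = {
                    .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                    /* bit 0: manual clear/protect, bit 1: bitmap starts all set */
                    .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                               KVM_DIRTY_LOG_INITIALLY_SET,
            };

            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }

With this mode, a slot's dirty bitmap starts with all bits set, and write
protection is deferred until userspace clears the corresponding bits via
KVM_CLEAR_DIRTY_LOG, which is what makes the lazy approach above possible.
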
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/mmu/mmu.c          | 65 ++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/vmx.c          |  3 +-
 arch/x86/kvm/x86.c              | 22 +++++------
 4 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5303dbc5c9bc..201a068cf43d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-				      struct kvm_memory_slot *memslot,
-				      int start_level);
+				      struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 43fdb0c12a5d..4b7d577de6cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 }
 
 /**
- * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * kvm_mmu_write_protect_largepage_masked - write protect selected largepages
  * @kvm: kvm instance
  * @slot: slot to protect
  * @gfn_offset: start of the BITS_PER_LONG pages we care about
  * @mask: indicates which pages we should protect
  *
- * Used when we do not need to care about huge page mappings: e.g. during dirty
- * logging we do not have any such mappings.
+ * @ret: true if all pages are write protected
+ */
+static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm,
+				struct kvm_memory_slot *slot,
+				gfn_t gfn_offset, unsigned long mask)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool protected, all_protected;
+	gfn_t start_gfn = slot->base_gfn + gfn_offset;
+	int i;
+
+	all_protected = true;
+	while (mask) {
+		protected = false;
+		for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+			rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot);
+			protected |= __rmap_write_protect(kvm, rmap_head, false);
+		}
+
+		all_protected &= protected;
+		/* clear the first set bit */
+		mask &= mask - 1;
+	}
+
+	return all_protected;
+}
+
+/**
+ * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * @kvm: kvm instance
+ * @slot: slot to protect
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should protect
  */
 static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
@@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
 
 /**
  * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
- * PT level pages.
- *
- * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
- * enable dirty logging for them.
- *
- * Used when we do not need to care about huge page mappings: e.g. during dirty
- * logging we do not have any such mappings.
+ * dirty pages.
  */
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 				struct kvm_memory_slot *slot,
 				gfn_t gfn_offset, unsigned long mask)
 {
+	/*
+	 * If we're with initial-all-set, huge pages are NOT
+	 * write protected when we start dirty log, so we must
+	 * write protect them here.
+	 */
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
+		if (kvm_mmu_write_protect_largepage_masked(kvm, slot,
+					gfn_offset, mask))
+			return;
+	}
+
+	/* Then we can handle the 4K level pages */
+
 	if (kvm_x86_ops.enable_log_dirty_pt_masked)
 		kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset,
 				mask);
@@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
 }
 
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-				      struct kvm_memory_slot *memslot,
-				      int start_level)
+				      struct kvm_memory_slot *memslot)
 {
 	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
-				  start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
+	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
+				      false);
 	spin_unlock(&kvm->mmu_lock);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 819c185adf09..ba871c52ef8b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				      struct kvm_memory_slot *slot)
 {
-	if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
-		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
+	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
 	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d39d6cf1d473..c31c32f1424b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 * is enabled the D-bit or the W-bit will be cleared.
 	 */
 	if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+		/*
+		 * If we're with initial-all-set, we don't need
+		 * to write protect any page because they're
+		 * reported as dirty already.
+		 */
+		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+			return;
+
 		if (kvm_x86_ops.slot_enable_log_dirty) {
 			kvm_x86_ops.slot_enable_log_dirty(kvm, new);
 		} else {
-			int level =
-				kvm_dirty_log_manual_protect_and_init_set(kvm) ?
-				PG_LEVEL_2M : PG_LEVEL_4K;
-
-			/*
-			 * If we're with initial-all-set, we don't need
-			 * to write protect any small page because
-			 * they're reported as dirty already. However
-			 * we still need to write-protect huge pages
-			 * so that the page split can happen lazily on
-			 * the first write to the huge page.
-			 */
-			kvm_mmu_slot_remove_write_access(kvm, new, level);
+			kvm_mmu_slot_remove_write_access(kvm, new);
 		}
 	} else {
 		if (kvm_x86_ops.slot_disable_log_dirty)
-- 
2.23.0