From: Ben Gardon
Date: Mon, 19 Apr 2021 12:20:53 -0700
Subject: Re: [RFC PATCH v2 2/2] KVM: x86: Not wr-protect huge page with init_all_set dirty log
To: Keqian Zhu
Cc: LKML, kvm, Paolo Bonzini, Sean Christopherson, wanghaibin.wang@huawei.com
In-Reply-To: <20210416082511.2856-3-zhukeqian1@huawei.com>
References: <20210416082511.2856-1-zhukeqian1@huawei.com> <20210416082511.2856-3-zhukeqian1@huawei.com>

On Fri, Apr 16, 2021 at 1:25 AM Keqian Zhu wrote:
>
> Currently during start dirty logging, if we're with init-all-set,
> we write protect huge pages and leave normal pages untouched, for
> that we can enable dirty logging for these pages lazily.
>
> Actually enable dirty logging lazily for huge pages is feasible
> too, which not only reduces the time of start dirty logging, also
> greatly reduces side-effect on guest when there is high dirty rate.
>
> Signed-off-by: Keqian Zhu
> ---
>  arch/x86/kvm/mmu/mmu.c | 48 ++++++++++++++++++++++++++++++++++++++----
>  arch/x86/kvm/x86.c     | 37 +++++++++-----------------------
>  2 files changed, 54 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2ce5bc2ea46d..98fa25172b9a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1188,8 +1188,7 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>   * @gfn_offset: start of the BITS_PER_LONG pages we care about
>   * @mask: indicates which pages we should protect
>   *
> - * Used when we do not need to care about huge page mappings: e.g. during dirty
> - * logging we do not have any such mappings.
> + * Used when we do not need to care about huge page mappings.
>   */
>  static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>  					    struct kvm_memory_slot *slot,
> @@ -1246,13 +1245,54 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
>   * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
>   * enable dirty logging for them.
>   *
> - * Used when we do not need to care about huge page mappings: e.g. during dirty
> - * logging we do not have any such mappings.
> + * We need to care about huge page mappings: e.g. during dirty logging we may
> + * have any such mappings.
>   */
>  void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  				struct kvm_memory_slot *slot,
>  				gfn_t gfn_offset, unsigned long mask)
>  {
> +	gfn_t start, end;
> +
> +	/*
> +	 * Huge pages are NOT write protected when we start dirty log with
> +	 * init-all-set, so we must write protect them at here.
> +	 *
> +	 * The gfn_offset is guaranteed to be aligned to 64, but the base_gfn
> +	 * of memslot has no such restriction, so the range can cross two large
> +	 * pages.
> +	 */
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> +		start = slot->base_gfn + gfn_offset + __ffs(mask);
> +		end = slot->base_gfn + gfn_offset + __fls(mask);
> +		kvm_mmu_slot_gfn_write_protect(kvm, slot, start, PG_LEVEL_2M);
> +
> +		/* Cross two large pages? */
> +		if (ALIGN(start << PAGE_SHIFT, PMD_SIZE) !=
> +		    ALIGN(end << PAGE_SHIFT, PMD_SIZE))
> +			kvm_mmu_slot_gfn_write_protect(kvm, slot, end,
> +						       PG_LEVEL_2M);
> +	}
> +
> +	/*
> +	 * RFC:
> +	 *
> +	 * 1. I don't return early when kvm_mmu_slot_gfn_write_protect() returns
> +	 * true, because I am not very clear about the relationship between
> +	 * legacy mmu and tdp mmu. AFAICS, the code logic is NOT an if/else
> +	 * manner.
> +	 *
> +	 * The kvm_mmu_slot_gfn_write_protect() returns true when we hit a
> +	 * writable large page mapping in legacy mmu mapping or tdp mmu mapping.
> +	 * Do we still have normal mapping in that case? (e.g. We have large
> +	 * mapping in legacy mmu and normal mapping in tdp mmu).

Right, we can't return early because the two MMUs could map the page in
different ways, but each MMU could also map the page in multiple ways
independently. For example, if the legacy MMU was being used and we were
running a nested VM, a page could be mapped 2M in EPT01 and 4K in EPT02,
so we'd still need kvm_mmu_slot_gfn_write_protect calls for both levels.
I don't think there's a case where we can return early here based only on
the information the first calls to kvm_mmu_slot_gfn_write_protect return.

> +	 *
> +	 * 2. kvm_mmu_slot_gfn_write_protect() doesn't tell us whether the large
> +	 * page mapping exist. If it exists but is clean, we can return early.
> +	 * However, we have to do invasive change.

What do you mean by invasive change?

> +	 */
> +
> +	/* Then we can handle the PT level pages */
>  	if (kvm_x86_ops.cpu_dirty_log_size)
>  		kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask);
>  	else
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eca63625aee4..dfd676ffa7da 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10888,36 +10888,19 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>  		 */
>  		kvm_mmu_zap_collapsible_sptes(kvm, new);
>  	} else {
> -		/* By default, write-protect everything to log writes. */
> -		int level = PG_LEVEL_4K;
> +		/*
> +		 * If we're with initial-all-set, we don't need to write protect
> +		 * any page because they're reported as dirty already.
> +		 */
> +		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> +			return;
>
>  		if (kvm_x86_ops.cpu_dirty_log_size) {
> -			/*
> -			 * Clear all dirty bits, unless pages are treated as
> -			 * dirty from the get-go.
> -			 */
> -			if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
> -				kvm_mmu_slot_leaf_clear_dirty(kvm, new);
> -
> -			/*
> -			 * Write-protect large pages on write so that dirty
> -			 * logging happens at 4k granularity. No need to
> -			 * write-protect small SPTEs since write accesses are
> -			 * logged by the CPU via dirty bits.
> -			 */
> -			level = PG_LEVEL_2M;
> -		} else if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> -			/*
> -			 * If we're with initial-all-set, we don't need
> -			 * to write protect any small page because
> -			 * they're reported as dirty already. However
> -			 * we still need to write-protect huge pages
> -			 * so that the page split can happen lazily on
> -			 * the first write to the huge page.
> -			 */
> -			level = PG_LEVEL_2M;
> +			kvm_mmu_slot_leaf_clear_dirty(kvm, new);
> +			kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_2M);
> +		} else {
> +			kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_4K);
>  		}
> -		kvm_mmu_slot_remove_write_access(kvm, new, level);
>  	}
>  }
>
> --
> 2.23.0
>
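
For readers following the boundary arithmetic in the proposed
kvm_arch_mmu_enable_log_dirty_pt_masked() change, below is a minimal
stand-alone C sketch of the same check. It is not kernel code and not part
of the patch: PAGE_SHIFT, PMD_SIZE, and ALIGN are redefined locally,
my_ffs()/my_fls() stand in for the kernel's __ffs()/__fls(), a 64-bit
unsigned long is assumed, and the memslot base GFN is a made-up value
chosen so that one 64-page dirty chunk straddles a 2M boundary.

#include <stdio.h>

/* Local stand-ins for the kernel definitions the patch relies on
 * (assumptions: 4K base pages, 2M huge pages, ALIGN() rounds up). */
#define PAGE_SHIFT	12
#define PMD_SIZE	(1UL << 21)
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Stand-ins for __ffs()/__fls(); the mask must be non-zero. */
static unsigned long my_ffs(unsigned long m) { return __builtin_ctzl(m); }
static unsigned long my_fls(unsigned long m) { return 63 - __builtin_clzl(m); }

int main(void)
{
	/* Hypothetical memslot whose base_gfn is not 2M (512-page) aligned. */
	unsigned long base_gfn   = 0x1001f0;	/* made-up example value */
	unsigned long gfn_offset = 0;		/* always a multiple of 64 */
	unsigned long mask       = ~0UL;	/* all 64 pages of this chunk dirty */

	unsigned long start = base_gfn + gfn_offset + my_ffs(mask);
	unsigned long end   = base_gfn + gfn_offset + my_fls(mask);

	unsigned long start_2m = ALIGN(start << PAGE_SHIFT, PMD_SIZE);
	unsigned long end_2m   = ALIGN(end << PAGE_SHIFT, PMD_SIZE);

	printf("start gfn %#lx -> 2M boundary %#lx\n", start, start_2m);
	printf("end   gfn %#lx -> 2M boundary %#lx\n", end, end_2m);
	printf("%s\n", start_2m != end_2m
	       ? "range crosses a 2M boundary: write-protect both huge pages"
	       : "range stays within one 2M page: one write-protect call suffices");
	return 0;
}

With the values above, the first and last dirty GFNs land in different 2M
regions, which is exactly the situation the "Cross two large pages?" check
in the patch guards against with a second kvm_mmu_slot_gfn_write_protect()
call.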