References: <20200925212302.3979661-1-bgardon@google.com> <20200925212302.3979661-21-bgardon@google.com> <20200930181556.GJ32672@linux.intel.com>
In-Reply-To:
From: Ben Gardon
Date: Wed, 30 Sep 2020 15:33:50 -0700
Message-ID:
Subject: Re: [PATCH 20/22] kvm: mmu: NX largepage recovery for TDP MMU
To: Paolo Bonzini
Cc: Sean Christopherson, LKML, kvm, Cannon Matthews, Peter Xu, Peter Shier, Peter Feiner, Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov, Xiao Guangrong
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 30, 2020 at 12:56 PM Paolo Bonzini wrote:
>
> On 30/09/20 20:15, Sean Christopherson wrote:
> > On Fri, Sep 25, 2020 at 02:23:00PM -0700, Ben Gardon wrote:
> >> +/*
> >> + * Clear non-leaf SPTEs and free the page tables they point to, if those SPTEs
> >> + * exist in order to allow execute access on a region that would otherwise be
> >> + * mapped as a large page.
> >> + */
> >> +void kvm_tdp_mmu_recover_nx_lpages(struct kvm *kvm)
> >> +{
> >> +	struct kvm_mmu_page *sp;
> >> +	bool flush;
> >> +	int rcu_idx;
> >> +	unsigned int ratio;
> >> +	ulong to_zap;
> >> +	u64 old_spte;
> >> +
> >> +	rcu_idx = srcu_read_lock(&kvm->srcu);
> >> +	spin_lock(&kvm->mmu_lock);
> >> +
> >> +	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
> >> +	to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0;
> >
> > This is broken, and possibly related to Paolo's INIT_LIST_HEAD issue.  The TDP
> > MMU never increments nx_lpage_splits, it instead has its own counter,
> > tdp_mmu_lpage_disallowed_page_count.  Unless I'm missing something, to_zap is
> > guaranteed to be zero and thus this is completely untested.
>
> Except if you do shadow paging (through nested EPT) and then it bombs
> immediately. :)
>
> > I don't see any reason for a separate tdp_mmu_lpage_disallowed_page_count,
> > a single VM can't have both a legacy MMU and a TDP MMU, so it's not like there
> > will be collisions with other code incrementing nx_lpage_splits.  And the TDP
> > MMU should be updating stats anyways.
>
> This is true, but having two counters is necessary (in the current
> implementation) because otherwise you zap more than the requested ratio
> of pages.
>
> The simplest solution is to add a "bool tdp_page" to struct
> kvm_mmu_page, so that you can have a single list of
> lpage_disallowed_pages and a single thread.  The while loop can then
> dispatch to the right "zapper" code.

I actually did add that bool in patch 4: kvm: mmu: Allocate and free
TDP MMU roots. I'm a little nervous about putting them in the same
list, but I agree it would definitely simplify the implementation of
reclaim.

> Anyway this patch is completely broken, so let's kick it away to the
> next round.

Understood, sorry I didn't test this one better. I'll incorporate your
feedback and include it in the next series.
>
> Paolo
>
> >> +
> >> +	while (to_zap &&
> >> +	       !list_empty(&kvm->arch.tdp_mmu_lpage_disallowed_pages)) {
> >> +		/*
> >> +		 * We use a separate list instead of just using active_mmu_pages
> >> +		 * because the number of lpage_disallowed pages is expected to
> >> +		 * be relatively small compared to the total.
> >> +		 */
> >> +		sp = list_first_entry(&kvm->arch.tdp_mmu_lpage_disallowed_pages,
> >> +				      struct kvm_mmu_page,
> >> +				      lpage_disallowed_link);
> >> +
> >> +		old_spte = *sp->parent_sptep;
> >> +		*sp->parent_sptep = 0;
> >> +
> >> +		list_del(&sp->lpage_disallowed_link);
> >> +		kvm->arch.tdp_mmu_lpage_disallowed_page_count--;
> >> +
> >> +		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), sp->gfn,
> >> +				    old_spte, 0, sp->role.level + 1);
> >> +
> >> +		flush = true;
> >> +
> >> +		if (!--to_zap || need_resched() ||
> >> +		    spin_needbreak(&kvm->mmu_lock)) {
> >> +			flush = false;
> >> +			kvm_flush_remote_tlbs(kvm);
> >> +			if (to_zap)
> >> +				cond_resched_lock(&kvm->mmu_lock);
> >> +		}
> >> +	}
> >> +
> >> +	if (flush)
> >> +		kvm_flush_remote_tlbs(kvm);
> >> +
> >> +	spin_unlock(&kvm->mmu_lock);
> >> +	srcu_read_unlock(&kvm->srcu, rcu_idx);
> >> +}
> >> +
> >> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> >> index 2ecb047211a6d..45ea2d44545db 100644
> >> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> >> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> >> @@ -43,4 +43,6 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
> >>
> >>  bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
> >>  				   struct kvm_memory_slot *slot, gfn_t gfn);
> >> +
> >> +void kvm_tdp_mmu_recover_nx_lpages(struct kvm *kvm);
> >>  #endif /* __KVM_X86_MMU_TDP_MMU_H */
> >> --
> >> 2.28.0.709.gb0816b6eb0-goog
> >>
> >
>