Date: Wed, 12 Jun 2024 11:56:03 -0700
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"Huang, Kai" <kai.huang@intel.com>,
	"sagis@google.com" <sagis@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Aktas, Erdem" <erdemaktas@google.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"dmatlack@google.com" <dmatlack@google.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>
Subject: Re: [PATCH v2 15/15] KVM: x86/tdp_mmu: Add a helper function to walk
 down the TDP MMU
Message-ID: <20240612185603.GK386318@ls.amr.corp.intel.com>
References: <20240530210714.364118-1-rick.p.edgecombe@intel.com>
 <20240530210714.364118-16-rick.p.edgecombe@intel.com>
 <CABgObfbpNN842noAe77WYvgi5MzK2SAA_FYw-=fGa+PcT_Z22w@mail.gmail.com>
 <af69a8359cd5edf892d68764789de7f357c58d5e.camel@intel.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <af69a8359cd5edf892d68764789de7f357c58d5e.camel@intel.com>

On Fri, Jun 07, 2024 at 11:39:14PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Fri, 2024-06-07 at 11:31 +0200, Paolo Bonzini wrote:
> > > -int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> > > -                        int *root_level)
> > > +static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64
> > > *sptes,
> > > +                                 enum kvm_tdp_mmu_root_types root_type)
> > >   {
> > > -       struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
> > > +       struct kvm_mmu_page *root = tdp_mmu_get_root(vcpu, root_type);
> > 
> > I think this function should take the struct kvm_mmu_page * directly.
> > 
> > > +{
> > > +       *root_level = vcpu->arch.mmu->root_role.level;
> > > +
> > > +       return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, KVM_DIRECT_ROOTS);
> > 
> > Here you pass root_to_sp(vcpu->arch.mmu->root.hpa);
> 
> I see. It is another case of more indirection to try to send the decision making
> through the helpers. We can try to open code things more.
> 
> > 
> > > +int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa,
> > > +                                    kvm_pfn_t *pfn)
> > > +{
> > > +       u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
> > > +       int leaf;
> > > +
> > > +       lockdep_assert_held(&vcpu->kvm->mmu_lock);
> > > +
> > > +       rcu_read_lock();
> > > +       leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, KVM_MIRROR_ROOTS);
> > 
> > and likewise here.
> > 
> > You might also consider using a kvm_mmu_root_info for the mirror root,
> > even though the pgd field is not used.
> 
> This came up on the last version actually. The reason against it was that it
> used that tiny bit of extra memory for the pgd. It does look more symmetrical
> though.
> 
> > 
> > Then __kvm_tdp_mmu_get_walk() can take a struct kvm_mmu_root_info * instead.
> 
> Ahh, I see. Yes, that's a good reason.
> 
> > 
> > kvm_tdp_mmu_get_walk_mirror_pfn() doesn't belong in this series, but
> > introducing __kvm_tdp_mmu_get_walk() can stay here.
> 
> Ok, we can split it.
> 
> > 
> > Looking at the later patch, which uses
> > kvm_tdp_mmu_get_walk_mirror_pfn(), I think this function is a bit
> > overkill. I'll do a pre-review of the init_mem_region function,
> > especially the usage of kvm_gmem_populate:
> > 
> > +    slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> > +    if (!kvm_slot_can_be_private(slot) || !kvm_mem_is_private(kvm, gfn)) {
> > +        ret = -EFAULT;
> > +        goto out_put_page;
> > +    }
> > 
> > The slots_lock is taken, so checking kvm_slot_can_be_private is unnecessary.
> > 
> > Checking kvm_mem_is_private perhaps should also be done in
> > kvm_gmem_populate() itself. I'll send a patch.
> > 
> > +    read_lock(&kvm->mmu_lock);
> > +
> > +    ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
> > +    if (ret < 0)
> > +        goto out;
> > +    if (ret > PG_LEVEL_4K) {
> > +        ret = -EINVAL;
> > +        goto out;
> > +    }
> > +    if (mmu_pfn != pfn) {
> > +        ret = -EAGAIN;
> > +        goto out;
> > +    }
> > 
> > If you require pre-faulting, you don't need to return mmu_pfn and
> > things would be seriously wrong if the two didn't match, wouldn't
> > they?
> 
> Yea, I'm not sure why it would be a normal condition. Maybe Isaku can comment on
> the thinking?

Sean suggested that KVM_TDX_INIT_MEM_REGION check whether the two PFNs, one
from the TDP MMU and one from guest_memfd, match.  As pointed out, the two PFNs
must match under the appropriate lock (otherwise something is badly broken).
Personally, I'm fine with removing the check and no longer returning the pfn.

https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/

  Then KVM provides a dedicated TDX ioctl(), i.e. what is/was KVM_TDX_INIT_MEM_REGION,
  to do PAGE.ADD.  KVM_TDX_INIT_MEM_REGION wouldn't need to map anything, it would
  simply need to verify that the pfn from guest_memfd() is the same as what's in
  the TDP MMU.
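
For reference, here is a minimal userspace sketch of the check order in the
quoted hunk, just to make the three failure cases explicit.  It is not kernel
code: struct mock_mapping, mock_walk_mirror_pfn(), check_prefaulted() and
LEVEL_4K are hypothetical stand-ins, and the -ENOENT returned for a missing
mapping is an assumption (the quoted hunk only shows that a negative value is
propagated).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
typedef uint64_t pfn_t;
#define LEVEL_4K 1

struct mock_mapping {
	int present;	/* nonzero if the mirror page table maps the GPA */
	int level;	/* mapping level; LEVEL_4K means a 4K leaf */
	pfn_t pfn;	/* PFN recorded in the leaf entry */
};

/*
 * Stand-in for the walk helper: returns the leaf level and fills *pfn,
 * or returns a negative errno (assumed -ENOENT) if nothing is mapped.
 */
static int mock_walk_mirror_pfn(const struct mock_mapping *m, pfn_t *pfn)
{
	assert(m != NULL && pfn != NULL);

	if (!m->present)
		return -ENOENT;

	*pfn = m->pfn;
	return m->level;
}

/* Same order of checks as the quoted hunk: walk error, level, PFN match. */
static int check_prefaulted(const struct mock_mapping *m, pfn_t gmem_pfn)
{
	pfn_t mmu_pfn;
	int ret;

	ret = mock_walk_mirror_pfn(m, &mmu_pfn);
	if (ret < 0)
		return ret;		/* not mapped: pre-fault is missing */
	if (ret > LEVEL_4K)
		return -EINVAL;		/* mapped, but not as a 4K page */
	if (mmu_pfn != gmem_pfn)
		return -EAGAIN;		/* the comparison under discussion */

	return 0;
}
```

If pre-faulting is required and the locking is right, the last comparison
should never fail, so dropping it would also remove the need for the pfn output
parameter in the walk helper.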
-- 
Isaku Yamahata <isaku.yamahata@intel.com>