2023-06-22 22:37:09

by Huang, Kai

Subject: Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask

On Thu, 2023-06-22 at 10:32 -0500, Michael Roth wrote:
> On Thu, Jun 22, 2023 at 09:55:22AM +0000, Huang, Kai wrote:
> >
> > >
> > > So if we were to straight-forwardly implement that based on how TDX
> > > currently handles checking for the shared bit in GPA, paired with how
> > > SEV-SNP handles checking for private bit in fault flags, it would look
> > > something like:
> > >
> > > bool kvm_fault_is_private(kvm, gpa, err)
> > > {
> > >         /* SEV-SNP handling */
> > >         if (kvm->arch.mmu_private_fault_mask)
> > >                 return !!(err & arch.mmu_private_fault_mask);
> > >
> > >         /* TDX handling */
> > >         if (kvm->arch.gfn_shared_mask)
> > >                 return !!(gpa & arch.gfn_shared_mask);
> >
> > The logic of the two is identical. I think they need to be converged.
>
> I think they're just different enough that trying too hard to converge
> them might obfuscate things. If the determination didn't come from 2
> completely different fields (gpa vs. fault flags) maybe it could be
> simplified a bit more, but having a well-defined open-coded handler that
> gets called once to set fault->is_private at initial fault time, so that
> that ugliness never needs to be looked at again by KVM MMU, seems like a
> good way to keep things simple through the rest of the handling.

I actually agree with the second half (the part after "but ..."), but IIUC it
doesn't conflict with converging arch.mmu_private_fault_mask and
arch.gfn_shared_mask into one. No?

>
> >
> > Either SEV-SNP should convert the error code private bit to the gfn_shared_mask,
> > or TDX's shared bit should be converted to some private error bit.
>
> struct kvm_page_fault seems to be the preferred way to pass additional
> params/metadata around, and the .is_private field was introduced to track
> this private/shared state as part of the UPM base series:
>
> https://lore.kernel.org/lkml/[email protected]/

Sure. Agreed.

>
> So it seems like unnecessary complexity to track/encode that state into
> other additional places rather than just encapsulating it all in
> fault->is_private (or some similar field). Synthesizing all this
> platform-specific handling into a well-defined value that's conveyed
> by something like fault->is_private, in a way where KVM MMU doesn't need
> to worry as much about platform-specific stuff, seems like a good thing,
> and in line with what the UPM base series was trying to do by adding the
> fault->is_private field.

Yes, we should just set fault->is_private and pass it to the rest of the common
KVM MMU code.

>
> So all I'm really proposing is that whatever SNP and TDX end up doing
> should center around setting that fault->is_private field and keeping
> everything contained there. 
>

I agree.

> If there are better ways to handle *how*
> that's done I don't have any complaints there, but moving/adding bits
> to GPA/error_flags after fault time just seems unnecessary to me when
> fault->is_private field can serve that purpose just as well.

Perhaps you missed my point. My point is that arch.mmu_private_fault_mask and
arch.gfn_shared_mask seem redundant because the logic around them is exactly
the same. I do believe we should have fault->is_private passed to the common
MMU code.

In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
"gfn_shared_mask" in _common_ KVM MMU code. We already have enough mechanisms in
KVM common code:

1) fault->is_private
2) kvm_mmu_page_role.private
3) an Xarray to tell whether a GFN is private or shared

I am not convinced that we need to have "mmu_private_fault_mask" and
"gfn_shared_mask" in common KVM MMU code. Instead, they can be in AMD and
Intel's vendor code.

Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
the fault handler can just strip away the "shared bit" at the very beginning (at
least for TDX), but for the rest of the time I think we should already have
enough infrastructure to handle private/shared mapping.

Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
applied to the GFN for shared mappings in the normal EPT. I guess AMD doesn't
need that for shared mappings. So "gfn_shared_mask" may be useful in this case,
but w/o it I believe we can also achieve this in another way via a vendor
callback.

>
> >
> > Perhaps converting SEV-SNP makes more sense because if I recall correctly SEV
> > guest also has a C-bit, correct?
>
> That's correct, but the C-bit doesn't show in the GPA that gets passed
> up to KVM during an #NPF, and instead gets encoded into the fault's
> error_flags.
>
> >
> > Or, ...
> >
> > >
> > >         return false;
> > > }
> > >
> > > kvm_mmu_do_page_fault(vcpu, gpa, err, ...)
> > > {
> > >         struct kvm_page_fault fault = {
> > >                 ...
> > >                 .is_private = kvm_fault_is_private(vcpu->kvm, gpa, err)
> >
> > ... should we do something like:
> >
> >                 .is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm,
> >                                                                     gpa, err);
>
> We actually had exactly this in v7 of SNP hypervisor patches:
>
> https://lore.kernel.org/linux-coco/[email protected]/T/#m17841f5bfdfb8350d69d78c6831dd8f3a4748638
>
> but Sean was hoping to avoid a callback, which is why we ended up using
> a bitmask in this version, since that's basically all the callback would
> need to do. It's unfortunate that we don't have a common scheme
> between SNP/TDX, but maybe that's still possible; I just think that
> whatever it ends up being, it should live and be contained inside
> whatever helper ends up setting fault->is_private.

Sure. If the function call can be avoided in the fault handler, then we should
avoid it.

>
> There's some other awkwardness with a callback approach. It sort of ties
> into your question about selftests so I'll address it below...
>
>
> >
> > ?
> >
> > > };
> > >
> > > ...
> > > }
> > >
> > > And then arch.mmu_private_fault_mask and arch.gfn_shared_mask would be
> > > set per-KVM-instance, just like they are now with current SNP and TDX
> > > patchsets, since stuff like KVM self-test wouldn't be setting those
> > > masks, so it makes sense to do it per-instance in that regard.
> > >
> > > But that still gets a little awkward for the KVM self-test use-case where
> > > .is_private should sort of be ignored in favor of whatever the xarray
> > > reports via kvm_mem_is_private(). 
> > >
> >
> > I must have missed something. Why does the KVM self-test have an impact on
> > how KVM handles private faults?
>
> The self-tests I'm referring to here are the ones from Vishal that shipped with
> v10 of Chao's UPM/fd-based private memory series, and also as part of Sean's
> gmem tree:
>
> https://github.com/sean-jc/linux/commit/a0f5f8c911804f55935094ad3a277301704330a6
>
> These exercise gmem/UPM handling without the need for any SNP/TDX
> hardware support. They do so by "trusting" the shared/private state
> that the VMM sets via KVM_SET_MEMORY_ATTRIBUTES. So if VMM says it
> should be private, KVM MMU will treat it as private, so we'd never
> get a mismatch, so KVM_EXIT_MEMORY_FAULT will never be generated.

If KVM supports a test mode, or, per another thread, KVM wants to support a
software-based KVM_X86_PROTECTED_VM and distinguish it from hardware-based
confidential VMs such as TDX and SEV-SNP:

https://lore.kernel.org/lkml/9e3e99f78fcbd7db21368b5fe1d931feeb4db567.1686858861.git.isaku.yamahata@intel.com/T/#me627bed3d9acf73ea882e8baa76dfcb27759c440

then it's fine to handle that when setting up fault->is_private. But my point
is that the KVM self-test itself should not dictate how KVM implements the
fault handler -- it is KVM itself that wants to support additional
software-based things.

[...]

(Thanks for explaining additional background).


2023-06-22 23:43:50

by Isaku Yamahata

Subject: Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask

On Thu, Jun 22, 2023 at 10:31:08PM +0000,
"Huang, Kai" <[email protected]> wrote:

> > If there are better ways to handle *how*
> > that's done I don't have any complaints there, but moving/adding bits
> > to GPA/error_flags after fault time just seems unnecessary to me when
> > fault->is_private field can serve that purpose just as well.
>
> Perhaps you missed my point. My point is that arch.mmu_private_fault_mask and
> arch.gfn_shared_mask seem redundant because the logic around them is exactly
> the same. I do believe we should have fault->is_private passed to the common
> MMU code.
>
> In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
> "gfn_shared_mask" in _common_ KVM MMU code. We already have enough mechanisms in
> KVM common code:
>
> 1) fault->is_private
> 2) kvm_mmu_page_role.private
> 3) an Xarray to tell whether a GFN is private or shared
>
> I am not convinced that we need to have "mmu_private_fault_mask" and
> "gfn_shared_mask" in common KVM MMU code. Instead, they can be in AMD and
> Intel's vendor code.
>
> Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
> the fault handler can just strip away the "shared bit" at the very beginning (at
> least for TDX), but for the rest of the time I think we should already have
> enough infrastructure to handle private/shared mapping.
>
> Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
> applied to the GFN for shared mapping in normal EPT. I guess AMD doesn't need
> that for shared mappings. So "gfn_shared_mask" may be useful in this case, but
> w/o it I believe we can also achieve this in another way via a vendor callback.


"2) kvm_mmu_page_role.private" above has a different meaning. There are two
distinct notions:

a.) whether the fault is private or not.
b.) whether the page table the fault handler is walking is private or conventional.

a.) is common to SNP, TDX and PROTECTED_VM. It makes sense in
kvm_mmu_do_page_fault() and __kvm_faultin_pfn(). After kvm_faultin_pfn(), the
fault handler can mostly forget it for SNP and PROTECTED_VM (large page
adjustment needs it, though). This is what we're discussing in this thread.

b.) is specific to TDX. The TDX KVM MMU introduces one more page table.

--
Isaku Yamahata <[email protected]>

2023-06-23 00:02:16

by Huang, Kai

Subject: Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask

On Thu, 2023-06-22 at 16:39 -0700, Isaku Yamahata wrote:
> On Thu, Jun 22, 2023 at 10:31:08PM +0000,
> "Huang, Kai" <[email protected]> wrote:
>
> > > If there are better ways to handle *how*
> > > that's done I don't have any complaints there, but moving/adding bits
> > > to GPA/error_flags after fault time just seems unnecessary to me when
> > > fault->is_private field can serve that purpose just as well.
> >
> > Perhaps you missed my point. My point is that arch.mmu_private_fault_mask and
> > arch.gfn_shared_mask seem redundant because the logic around them is exactly
> > the same. I do believe we should have fault->is_private passed to the common
> > MMU code.
> >
> > In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
> > "gfn_shared_mask" in _common_ KVM MMU code. We already have enough mechanisms in
> > KVM common code:
> >
> > 1) fault->is_private
> > 2) kvm_mmu_page_role.private
> > 3) an Xarray to tell whether a GFN is private or shared
> >
> > I am not convinced that we need to have "mmu_private_fault_mask" and
> > "gfn_shared_mask" in common KVM MMU code. Instead, they can be in AMD and
> > Intel's vendor code.
> >
> > Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
> > the fault handler can just strip away the "shared bit" at the very beginning (at
> > least for TDX), but for the rest of the time I think we should already have
> > enough infrastructure to handle private/shared mapping.
> >
> > Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
> > applied to the GFN for shared mapping in normal EPT. I guess AMD doesn't need
> > that for shared mappings. So "gfn_shared_mask" may be useful in this case, but
> > w/o it I believe we can also achieve this in another way via a vendor callback.
>
>
> "2) kvm_mmu_page_role.private" above has a different meaning.
>
> a). The fault is private or not.
> b). page table the fault handler is walking is private or conventional.
>
> a.) is common for SNP, TDX and PROTECTED_VM. It makes sense in
> kvm_mmu_do_page_fault() and __kvm_faultin_pfn(). After kvm_faultin_pfn(), the
> fault handler can mostly forget it for SNP and PROTECTED_VM. (large page
> adjustment needs it, though.) This is what we're discussing in this thread.
>
> b.) is specific to TDX. TDX KVM MMU introduces one more page table.
>
>

I don't buy the last sentence. Even if it's not strictly necessary for AMD from
the hardware's perspective, the concept remains true for AMD too. So why can't
we use it for AMD?

Note "gfn_shared_mask" and kvm_mmu_page_role.private are kind of redundant too.
I do believe we should avoid introducing a lot of fields to KVM *common* code
due to vendor differences (i.e., in this case, "gfn_shared_mask" and
"mmu_private_fault_mask").

2023-06-23 14:50:50

by Isaku Yamahata

Subject: Re: [PATCH RFC v9 04/51] KVM: x86: Determine shared/private faults using a configurable mask

On Thu, Jun 22, 2023 at 11:52:56PM +0000,
"Huang, Kai" <[email protected]> wrote:

> On Thu, 2023-06-22 at 16:39 -0700, Isaku Yamahata wrote:
> > On Thu, Jun 22, 2023 at 10:31:08PM +0000,
> > "Huang, Kai" <[email protected]> wrote:
> >
> > > > If there are better ways to handle *how*
> > > > that's done I don't have any complaints there, but moving/adding bits
> > > > to GPA/error_flags after fault time just seems unnecessary to me when
> > > > fault->is_private field can serve that purpose just as well.
> > >
> > > Perhaps you missed my point. My point is that arch.mmu_private_fault_mask and
> > > arch.gfn_shared_mask seem redundant because the logic around them is exactly
> > > the same. I do believe we should have fault->is_private passed to the common
> > > MMU code.
> > >
> > > In fact, now I am wondering why we need to have "mmu_private_fault_mask" and
> > > "gfn_shared_mask" in _common_ KVM MMU code. We already have enough mechanisms in
> > > KVM common code:
> > >
> > > 1) fault->is_private
> > > 2) kvm_mmu_page_role.private
> > > 3) an Xarray to tell whether a GFN is private or shared
> > >
> > > I am not convinced that we need to have "mmu_private_fault_mask" and
> > > "gfn_shared_mask" in common KVM MMU code. Instead, they can be in AMD and
> > > Intel's vendor code.
> > >
> > > Maybe it makes sense to have "gfn_shared_mask" in the KVM common code so that
> > > the fault handler can just strip away the "shared bit" at the very beginning (at
> > > least for TDX), but for the rest of the time I think we should already have
> > > enough infrastructure to handle private/shared mapping.
> > >
> > > Btw, one minor issue is, if I recall correctly, for TDX the shared bit must be
> > > applied to the GFN for shared mapping in normal EPT. I guess AMD doesn't need
> > > that for shared mappings. So "gfn_shared_mask" may be useful in this case, but
> > > w/o it I believe we can also achieve this in another way via a vendor callback.
> >
> >
> > "2) kvm_mmu_page_role.private" above has a different meaning.
> >
> > a). The fault is private or not.
> > b). page table the fault handler is walking is private or conventional.
> >
> > a.) is common for SNP, TDX and PROTECTED_VM. It makes sense in
> > kvm_mmu_do_page_fault() and __kvm_faultin_pfn(). After kvm_faultin_pfn(), the
> > fault handler can mostly forget it for SNP and PROTECTED_VM. (large page
> > adjustment needs it, though.) This is what we're discussing in this thread.
> >
> > b.) is specific to TDX. TDX KVM MMU introduces one more page table.
> >
> >
>
> I don't buy the last sentence. Even if it's not strictly necessary for AMD from
> the hardware's perspective, the concept remains true for AMD too. So why can't
> we use it for AMD?

We can use it for AMD. Let me rephrase: only TDX uses it now; SEV-SNP may or
may not use it, at their option.
--
Isaku Yamahata <[email protected]>