2024-03-11 10:49:05

by francisco flynn

[permalink] [raw]
Subject: [PATCH] KVM: x86/mmu: treat WC memory as MMIO

when doing kvm_tdp_mmu_map for WC memory, such as pages
allocated by amdgpu ttm driver for ttm_write_combined
caching mode(e.g. host coherent in vulkan),
the spte would be set to WB, in this case, vcpu write
to these pages would goes to cache first, and never
be write-combined and host-coherent anymore. so
WC memory should be treated as MMIO, and the effective
memory type is depending on guest PAT.

Signed-off-by: francisco_flynn <[email protected]>
---
arch/x86/include/asm/memtype.h | 2 ++
arch/x86/kvm/mmu/spte.c | 5 +++--
arch/x86/mm/pat/memtype.c | 8 ++++++++
3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/memtype.h b/arch/x86/include/asm/memtype.h
index 113b2fa51849..b49887d8580e 100644
--- a/arch/x86/include/asm/memtype.h
+++ b/arch/x86/include/asm/memtype.h
@@ -23,6 +23,8 @@ extern void memtype_free_io(resource_size_t start, resource_size_t end);

extern bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn);

+extern bool pat_pfn_is_wc(unsigned long pfn);
+
bool x86_has_pat_wp(void);
enum page_cache_mode pgprot2cachemode(pgprot_t pgprot);

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4a599130e9c9..7842458dafa8 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -88,9 +88,10 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
{
if (pfn_valid(pfn))
- return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)) &&
+ return !is_zero_pfn(pfn) &&
+ (PageReserved(pfn_to_page(pfn)) || pat_pfn_is_wc(pfn)) &&
/*
- * Some reserved pages, such as those from NVDIMM
+ * Some reserved or WC pages, such as those from NVDIMM
* DAX devices, are not for MMIO, and can be mapped
* with cached memory type for better performance.
* However, the above check misconceives those pages
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0904d7e8e126..a5dd906cd987 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -716,6 +716,14 @@ bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
}
EXPORT_SYMBOL_GPL(pat_pfn_immune_to_uc_mtrr);

+bool pat_pfn_is_wc(unsigned long pfn)
+{
+ enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
+
+ return cm == _PAGE_CACHE_MODE_WC;
+}
+EXPORT_SYMBOL_GPL(pat_pfn_is_wc);
+
/**
* memtype_reserve_io - Request a memory type mapping for a region of memory
* @start: start (physical address) of the region
--
2.25.1



2024-03-11 16:27:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86/mmu: treat WC memory as MMIO

On Mon, Mar 11, 2024, francisco_flynn wrote:
> when doing kvm_tdp_mmu_map for WC memory, such as pages
> allocated by amdgpu ttm driver for ttm_write_combined
> caching mode(e.g. host coherent in vulkan),
> the spte would be set to WB, in this case, vcpu write
> to these pages would goes to cache first, and never
> be write-combined and host-coherent anymore. so
> WC memory should be treated as MMIO, and the effective
> memory type is depending on guest PAT.

No, the effective memtype is not fully guest controlled. By forcing the EPT memtype
to UC, the guest can only use UC or WC. I don't know if there's a use case for
the host mapping memory WC while the guest uses WB, but it should be a moot point,
because this this series should do what you want (allow guest to map GPU buffers
as WC).

https://lore.kernel.org/all/[email protected]

> Signed-off-by: francisco_flynn <[email protected]>
> ---
> arch/x86/include/asm/memtype.h | 2 ++
> arch/x86/kvm/mmu/spte.c | 5 +++--

Please use get_maintainers.pl.

> arch/x86/mm/pat/memtype.c | 8 ++++++++
> 3 files changed, 13 insertions(+), 2 deletions(-)

2024-03-12 08:42:27

by francisco flynn

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86/mmu: treat WC memory as MMIO



On 2024/3/12 00:20, Sean Christopherson wrote:
> On Mon, Mar 11, 2024, francisco_flynn wrote:
>> when doing kvm_tdp_mmu_map for WC memory, such as pages
>> allocated by amdgpu ttm driver for ttm_write_combined
>> caching mode(e.g. host coherent in vulkan),
>> the spte would be set to WB, in this case, vcpu write
>> to these pages would goes to cache first, and never
>> be write-combined and host-coherent anymore. so
>> WC memory should be treated as MMIO, and the effective
>> memory type is depending on guest PAT.
>
> No, the effective memtype is not fully guest controlled. By forcing the EPT memtype
> to UC, the guest can only use UC or WC. I don't know if there's a use case for

Well,it's actually the host mapping memory WC and guest uses WC,
one use case is virtio-gpu host blob, which is to map physical GPU buffers into guest

> the host mapping memory WC while the guest uses WB, but it should be a moot point,
> because this this series should do what you want (allow guest to map GPU buffers
> as WC).
>
> https://lore.kernel.org/all/[email protected]
>

yes, this is what i want, but for virtio-gpu device, if we mapping WC typed
GPU buffer into guest, kvm_arch_has_noncoherent_dma would return false,
so on cpu without self-snoop support, guest PAT will be ignored, the effective
memory type would be set to WB, causing data inconsistency.

+ if (!static_cpu_has(X86_FEATURE_SELFSNOOP) &&
+ !kvm_arch_has_noncoherent_dma(vcpu->kvm))
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;


>> Signed-off-by: francisco_flynn <[email protected]>
>> ---
>> arch/x86/include/asm/memtype.h | 2 ++
>> arch/x86/kvm/mmu/spte.c | 5 +++--
>
> Please use get_maintainers.pl.

sure.

>
>> arch/x86/mm/pat/memtype.c | 8 ++++++++
>> 3 files changed, 13 insertions(+), 2 deletions(-)


2024-03-12 15:22:02

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86/mmu: treat WC memory as MMIO

On Tue, Mar 12, 2024, francisco flynn wrote:
> On 2024/3/12 00:20, Sean Christopherson wrote:
> > On Mon, Mar 11, 2024, francisco_flynn wrote:
> >> when doing kvm_tdp_mmu_map for WC memory, such as pages
> >> allocated by amdgpu ttm driver for ttm_write_combined
> >> caching mode(e.g. host coherent in vulkan),
> >> the spte would be set to WB, in this case, vcpu write
> >> to these pages would goes to cache first, and never
> >> be write-combined and host-coherent anymore. so
> >> WC memory should be treated as MMIO, and the effective
> >> memory type is depending on guest PAT.
> >
> > No, the effective memtype is not fully guest controlled. By forcing the EPT memtype
> > to UC, the guest can only use UC or WC. I don't know if there's a use case for
>
> Well,it's actually the host mapping memory WC and guest uses WC,

No, when the guest is running, the host, i.e. KVM, sets the EPT memory type to UC

static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
if (is_mmio)
return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

which effectively makes the guest "MTRR" memtype UC, and thus restricts the guest
to using UC or WC.

Your use case wants to map the memory as WC in the guest, but there are zero
guarantees that *every* use case wants to access such memory as WC (or UC),
i.e. forcing UC could cause performance regressions for existing use cases.

Ideally, KVM would force the EPT memtype to match the host PAT memtype while still
honoring guest PAT, but if we wanted to go that route, then KVM should (a) stuff
the exact memtype, (b) stuff the memtype for all of guest memory, and (c) do so
for all flavors of KVM on x86, not just EPT on VMX.

Unfortunatley, making that change now risks breaking 15+ years of KVM ABI. And
there's also the whole "unsafe on Intel CPUs without self-snoop" problem.

> one use case is virtio-gpu host blob, which is to map physical GPU buffers into guest
>
> > the host mapping memory WC while the guest uses WB, but it should be a moot point,
> > because this this series should do what you want (allow guest to map GPU buffers
> > as WC).
> >
> > https://lore.kernel.org/all/[email protected]
> >
>
> yes, this is what i want, but for virtio-gpu device, if we mapping WC typed
> GPU buffer into guest, kvm_arch_has_noncoherent_dma would return false,
> so on cpu without self-snoop support, guest PAT will be ignored, the effective
> memory type would be set to WB, causing data inconsistency.

My understanding is that every Intel CPU released in the last 8+ years supports
self-snoop. See check_memory_type_self_snoop_errata().

IMO, that's a perfectly reasonable line to draw: if you want virtio-gpu support,
upgrade to Ivy Bridge or later.

2024-03-13 02:07:46

by francisco flynn

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86/mmu: treat WC memory as MMIO

On 2024/3/12 23:21, Sean Christopherson wrote:
> On Tue, Mar 12, 2024, francisco flynn wrote:
>> On 2024/3/12 00:20, Sean Christopherson wrote:
>>> On Mon, Mar 11, 2024, francisco_flynn wrote:
>>>> when doing kvm_tdp_mmu_map for WC memory, such as pages
>>>> allocated by amdgpu ttm driver for ttm_write_combined
>>>> caching mode(e.g. host coherent in vulkan),
>>>> the spte would be set to WB, in this case, vcpu write
>>>> to these pages would goes to cache first, and never
>>>> be write-combined and host-coherent anymore. so
>>>> WC memory should be treated as MMIO, and the effective
>>>> memory type is depending on guest PAT.
>>>
>>> No, the effective memtype is not fully guest controlled. By forcing the EPT memtype
>>> to UC, the guest can only use UC or WC. I don't know if there's a use case for
>>
>> Well,it's actually the host mapping memory WC and guest uses WC,
>
> No, when the guest is running, the host, i.e. KVM, sets the EPT memory type to UC
>
> static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> {
> if (is_mmio)
> return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
>
> which effectively makes the guest "MTRR" memtype UC, and thus restricts the guest
> to using UC or WC.
>
> Your use case wants to map the memory as WC in the guest, but there are zero
> guarantees that *every* use case wants to access such memory as WC (or UC),
> i.e. forcing UC could cause performance regressions for existing use cases.
>

yes, this may cause performance regressions in some cases.

> Ideally, KVM would force the EPT memtype to match the host PAT memtype while still
> honoring guest PAT, but if we wanted to go that route, then KVM should (a) stuff
> the exact memtype, (b) stuff the memtype for all of guest memory, and (c) do so
> for all flavors of KVM on x86, not just EPT on VMX.
>

it's true.

> Unfortunatley, making that change now risks breaking 15+ years of KVM ABI. And
> there's also the whole "unsafe on Intel CPUs without self-snoop" problem.
>
>> one use case is virtio-gpu host blob, which is to map physical GPU buffers into guest
>>
>>> the host mapping memory WC while the guest uses WB, but it should be a moot point,
>>> because this this series should do what you want (allow guest to map GPU buffers
>>> as WC).
>>>
>>> https://lore.kernel.org/all/[email protected]
>>>
>>
>> yes, this is what i want, but for virtio-gpu device, if we mapping WC typed
>> GPU buffer into guest, kvm_arch_has_noncoherent_dma would return false,
>> so on cpu without self-snoop support, guest PAT will be ignored, the effective
>> memory type would be set to WB, causing data inconsistency.
>
> My understanding is that every Intel CPU released in the last 8+ years supports
> self-snoop. See check_memory_type_self_snoop_errata().
>
> IMO, that's a perfectly reasonable line to draw: if you want virtio-gpu support,
> upgrade to Ivy Bridge or later.

it make sense.