LinuxLists.cc - Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-01 09:36:56

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 30/05/2015 12:59, Xiao Guangrong wrote:
> Currently guest MTRR is completely prohibited if cache snoop is supported on
> IOMMU (!noncoherent_dma) and host does the emulation based on the knowledge
> from host side, however, host side is not the good point to know
> what the purpose of guest is. A good example is that pass-throughed VGA
> frame buffer is not always UC as host expected

Can you explain how? The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.

> +bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> + int page_num)
> +{
> + struct mtrr_looker looker;
> + struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> + u64 start = gfn_to_gpa(gfn), end = gfn_to_gpa(gfn + page_num);
> + int type = -1;
> +
> + mtrr_for_each_mem_type(&looker, mtrr_state, start, end) {
> + if (type == -1) {
> + type = looker.mem_type;
> + continue;
> + }
> +
> + if (type != looker.mem_type)
> + return false;
> + }
> +
> + if ((type != -1) && looker.partial_map &&
> + (mtrr_state->def_type != type))
> + return false;
> +

No Pascal-like parentheses.

Does this have a performance impact on shadow? Perhaps we could cache
in struct kvm_arch_memory_slot whether the memslot is covered by MTRRs?

Paolo

2015-06-01 09:38:46

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 01/06/2015 11:36, Paolo Bonzini wrote:
> Does this have a performance impact on shadow? Perhaps we could cache
> in struct kvm_arch_memory_slot whether the memslot is covered by MTRRs?

Nevermind, patch 15 answers my question.

Paolo

2015-06-03 03:01:01

by Xiao Guangrong

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 06/01/2015 05:36 PM, Paolo Bonzini wrote:
>
>
> On 30/05/2015 12:59, Xiao Guangrong wrote:
>> Currently guest MTRR is completely prohibited if cache snoop is supported on
>> IOMMU (!noncoherent_dma) and host does the emulation based on the knowledge
>> from host side, however, host side is not the good point to know
>> what the purpose of guest is. A good example is that pass-throughed VGA
>> frame buffer is not always UC as host expected
>
> Can you explain how? The original idea was that such a framebuffer
> would be kvm_is_reserved_pfn and thus be unconditionally UC.

Yes, frame-buffer is always UC in current code, however, UC for frame-buffer
causes bad performance. It's quoted from Documentation/x86/mtrr.txt:

| This is most useful when you have a video (VGA) card on a PCI or AGP bus.
| Enabling write-combining allows bus write transfers to be combined into a
| larger transfer before bursting over the PCI/AGP bus. This can increase
| performance of image write operations 2.5 times or more.

So that guest will configure the range to MTRR, this patchset follows
guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to emulate
guest cache type as guest expects.

2015-06-03 07:56:16

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 03/06/2015 04:56, Xiao Guangrong wrote:
>
>
> On 06/01/2015 05:36 PM, Paolo Bonzini wrote:
>>
>>
>> On 30/05/2015 12:59, Xiao Guangrong wrote:
>>> Currently guest MTRR is completely prohibited if cache snoop is
>>> supported on
>>> IOMMU (!noncoherent_dma) and host does the emulation based on the
>>> knowledge
>>> from host side, however, host side is not the good point to know
>>> what the purpose of guest is. A good example is that pass-throughed VGA
>>> frame buffer is not always UC as host expected
>>
>> Can you explain how? The original idea was that such a framebuffer
>> would be kvm_is_reserved_pfn and thus be unconditionally UC.
>
> Yes, frame-buffer is always UC in current code, however, UC for
> frame-buffer causes bad performance.

Understood now, thanks.

> So that guest will configure the range to MTRR, this patchset follows
> guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
> emulate guest cache type as guest expects.

Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
coherency. AMD, has special logic to do this, for example:

- if guest PAT says "UC" and host MTRR says "WB", the processor will not
cache the memory but will snoop the cache as if CR0.CD=1

- if guest PAT says "WC" and host (nested page table) PAT says "WB" and
host MTRR says "WB", the processor will still do write combining but
also snoop the cache as if CR0.CD=1

I am worried that the lack of this feature could cause problems if
guests map QEMU's VGA framebuffer as uncached. We have this problem on
ARM, so it's not 100% theoretical.

So, why do you need to always use IPAT=0? Can patch 15 keep the current
logic for RAM, like this:

if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
VMX_EPT_MT_EPTE_SHIFT;
else
ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
| VMX_EPT_IPAT_BIT;

?

Paolo

2015-06-04 08:27:46

by Xiao Guangrong

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 06/03/2015 03:55 PM, Paolo Bonzini wrote:
>
>
> On 03/06/2015 04:56, Xiao Guangrong wrote:
>>
>>
>> On 06/01/2015 05:36 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 30/05/2015 12:59, Xiao Guangrong wrote:
>>>> Currently guest MTRR is completely prohibited if cache snoop is
>>>> supported on
>>>> IOMMU (!noncoherent_dma) and host does the emulation based on the
>>>> knowledge
>>>> from host side, however, host side is not the good point to know
>>>> what the purpose of guest is. A good example is that pass-throughed VGA
>>>> frame buffer is not always UC as host expected
>>>
>>> Can you explain how? The original idea was that such a framebuffer
>>> would be kvm_is_reserved_pfn and thus be unconditionally UC.
>>
>> Yes, frame-buffer is always UC in current code, however, UC for
>> frame-buffer causes bad performance.
>
> Understood now, thanks.
>
>> So that guest will configure the range to MTRR, this patchset follows
>> guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
>> emulate guest cache type as guest expects.
>
> Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
> coherency. AMD, has special logic to do this, for example:
>
> - if guest PAT says "UC" and host MTRR says "WB", the processor will not
> cache the memory but will snoop the cache as if CR0.CD=1
>
> - if guest PAT says "WC" and host (nested page table) PAT says "WB" and
> host MTRR says "WB", the processor will still do write combining but
> also snoop the cache as if CR0.CD=1
>
> I am worried that the lack of this feature could cause problems if
> guests map QEMU's VGA framebuffer as uncached. We have this problem on
> ARM, so it's not 100% theoretical.

CR0.CD is always 0 in both host and guest, i guess it's why we cleared
CR0.CD and CR0.NW in vmx_set_cr0().

>
> So, why do you need to always use IPAT=0? Can patch 15 keep the current
> logic for RAM, like this:
>
> if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
> ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
> VMX_EPT_MT_EPTE_SHIFT;
> else
> ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
> | VMX_EPT_IPAT_BIT;

Yeah, it's okay, actually we considered this way, however
- it's light enough, it did not hurt guest performance based on our
benchmark.
- the logic has always used for noncherent_dma case, extend it to
normal case should have low risk and also help us to check the logic.
- completely follow MTRRS spec would be better than host hides it.

2015-06-04 08:31:10

by Xiao Guangrong

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 06/04/2015 04:23 PM, Xiao Guangrong wrote:
>
>
> On 06/03/2015 03:55 PM, Paolo Bonzini wrote:
>>
>>
>> On 03/06/2015 04:56, Xiao Guangrong wrote:
>>>
>>>
>>> On 06/01/2015 05:36 PM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 30/05/2015 12:59, Xiao Guangrong wrote:
>>>>> Currently guest MTRR is completely prohibited if cache snoop is
>>>>> supported on
>>>>> IOMMU (!noncoherent_dma) and host does the emulation based on the
>>>>> knowledge
>>>>> from host side, however, host side is not the good point to know
>>>>> what the purpose of guest is. A good example is that pass-throughed VGA
>>>>> frame buffer is not always UC as host expected
>>>>
>>>> Can you explain how? The original idea was that such a framebuffer
>>>> would be kvm_is_reserved_pfn and thus be unconditionally UC.
>>>
>>> Yes, frame-buffer is always UC in current code, however, UC for
>>> frame-buffer causes bad performance.
>>
>> Understood now, thanks.
>>
>>> So that guest will configure the range to MTRR, this patchset follows
>>> guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
>>> emulate guest cache type as guest expects.
>>
>> Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
>> coherency. AMD, has special logic to do this, for example:
>>
>> - if guest PAT says "UC" and host MTRR says "WB", the processor will not
>> cache the memory but will snoop the cache as if CR0.CD=1
>>
>> - if guest PAT says "WC" and host (nested page table) PAT says "WB" and
>> host MTRR says "WB", the processor will still do write combining but
>> also snoop the cache as if CR0.CD=1
>>
>> I am worried that the lack of this feature could cause problems if
>> guests map QEMU's VGA framebuffer as uncached. We have this problem on
>> ARM, so it's not 100% theoretical.
>
> CR0.CD is always 0 in both host and guest, i guess it's why we cleared
> CR0.CD and CR0.NW in vmx_set_cr0().

It reminds me that we should check guest CR0.CD before check guest MTRR
and disable guest PAT if guest CR0.CD = 1.

2015-06-04 08:34:24

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 04/06/2015 10:26, Xiao Guangrong wrote:
>>
>> CR0.CD is always 0 in both host and guest, i guess it's why we cleared
>> CR0.CD and CR0.NW in vmx_set_cr0().
>
> It reminds me that we should check guest CR0.CD before check guest MTRR
> and disable guest PAT if guest CR0.CD = 1.

I think it can be done separately, on top.

Paolo

2015-06-04 08:36:58

by Paolo Bonzini

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

On 04/06/2015 10:23, Xiao Guangrong wrote:
>>
>> So, why do you need to always use IPAT=0? Can patch 15 keep the current
>> logic for RAM, like this:
>>
>> if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
>> ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
>> VMX_EPT_MT_EPTE_SHIFT;
>> else
>> ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
>> | VMX_EPT_IPAT_BIT;
>
> Yeah, it's okay, actually we considered this way, however
> - it's light enough, it did not hurt guest performance based on our
> benchmark.
> - the logic has always used for noncherent_dma case, extend it to
> normal case should have low risk and also help us to check the logic.

But noncoherent_dma is not the common case, so it's not necessarily true
that the risk is low.

> - completely follow MTRRS spec would be better than host hides it.

We are a virtualization platform, we know well when MTRRs are necessary.

Tis a risk from blindly obeying the guest MTRRs: userspace can see stale
data if the guest's accesses bypass the cache. AMD bypasses this by
enabling snooping even in cases that ordinarily wouldn't snoop; for
Intel the solution is that RAM-backed areas should always use IPAT.

Paolo

2015-06-05 06:42:15

by Xiao Guangrong

[permalink] [raw]

Subject: Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

[ CCed Zhang Yang ]

On 06/04/2015 04:36 PM, Paolo Bonzini wrote:
>
>
> On 04/06/2015 10:23, Xiao Guangrong wrote:
>>>
>>> So, why do you need to always use IPAT=0? Can patch 15 keep the current
>>> logic for RAM, like this:
>>>
>>> if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
>>> ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
>>> VMX_EPT_MT_EPTE_SHIFT;
>>> else
>>> ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
>>> | VMX_EPT_IPAT_BIT;
>>
>> Yeah, it's okay, actually we considered this way, however
>> - it's light enough, it did not hurt guest performance based on our
>> benchmark.
>> - the logic has always used for noncherent_dma case, extend it to
>> normal case should have low risk and also help us to check the logic.
>
> But noncoherent_dma is not the common case, so it's not necessarily true
> that the risk is low.

I thought noncoherent_dma exists on 1st generation(s) IOMMU, it should
be fully tested at that time.

>
>> - completely follow MTRRS spec would be better than host hides it.
>
> We are a virtualization platform, we know well when MTRRs are necessary.
>
> Tis a risk from blindly obeying the guest MTRRs: userspace can see stale
> data if the guest's accesses bypass the cache. AMD bypasses this by
> enabling snooping even in cases that ordinarily wouldn't snoop; for
> Intel the solution is that RAM-backed areas should always use IPAT.

Not sure if UC and other cacheable type combinations on guest and host
will cause problem. The SMD mentioned that snoop is not required only when
"The UC attribute comes from the MTRRs and the processors are not required
to snoop their caches since the data could never have been cached."
(Vol 3. 11.5.2.2)
VMX do not touch hardware MTRR MSRs and i guess snoop works under this case.

I also noticed if SS (self-snooping) is supported we need not to invalidate
cache when programming memory type (Vol 3. 11.11.8), so that means CPU works
well on the page which has different cache types i guess.

After think it carefully, we (Zhang Yang) doubt if always set WB for DMA
memory is really a good idea because we can not assume WB DMA works well for
all devices. One example is that audio DMA (not a MMIO region) is required WC
to improve its performance.

However, we think the SDM is not clear enough so let's do full vMTRR on MMIO
and noncoherent_dma first.　:)