Recording and slides:
https://drive.google.com/corp/drive/folders/18QbkitOXcZyYXpT558wXf9Hs-rQs8mhX?resourcekey=0-qOuxyhLUBUGlCwHrzPqAkQ
Key Takeaways:
- Intel CPU<->CPU accesses are coherent for guest WC/UC accesses, so KVM can
honor guest PAT for all VMs without putting the host or guest at risk. I.e.
KVM x86 doesn't need new uAPI, we can simply delete the IPAT code.
- Intel CPUs need an SFENCE after VM-Exit, but there's already an smp_mb()
buried in srcu_read_lock(), and KVM uses SRCU to protect memslots, i.e. an
SFENCE is guaranteed before KVM (or userspace) will access guest memory after
VM-Exit. TODO: add and use smp_mb__after_srcu_read_lock() to pair with
smp_mb__after_srcu_read_unlock() and document the need for a barrier on Intel.
- IOMMU (via VFIO/IOMMUFD) mappings need cache flush operations on map() and
unmap() to prevent the guest from using non-coherent DMA to read stale data
on x86 (and likely other architectures).
- ARM's architecture doesn't guarantee coherency for mismatched memtypes, so
KVM still needs to figure out a solution for ARM, and possibly RISC-V as
well. But for CPU<->CPU access, KVM guarantees host safety, just not
functional correctness for the guest, i.e. finding a solution can likely be
deferred until a use case comes along.
- CPU<->Device coherency on ARM is messy and needs further discussion.
- GPU drivers flush caches when mapping and unmapping buffers, so the existing
virtio GPU use case is ok (though ideally it would be ported to use IOMMUFD's
mediated device support).
- Virtio GPU guest drivers are responsible for using CLFLUSH{OPT} instead of
  WBINVD (which is intercepted and ignored by KVM).
- KVM x86's support for virtualizing MTRRs on Intel CPUs can also be dropped
(it was effectively a workaround for ignoring guest PAT).
On Wed, Jan 24, 2024 at 10:24:44AM -0800, Sean Christopherson wrote:
> - ARM's architecture doesn't guarantee coherency for mismatched memtypes, so
> KVM still needs to figure out a solution for ARM, and possibly RISC-V as
> well. But for CPU<->CPU access, KVM guarantees host safety, just not
> functional correctness for the guest, i.e. finding a solution can likely be
> deferred until a use case comes along.
Regarding the side discussion on ARM DMA coherency enforcement..
Reading the docs more fully, SMMU has no analog to the Intel/AMD
per-page "ignore No Snoop" functionality. The SMMU driver already does
the correct thing at the IOMMU API level to indicate this.
Various things say SMMU should map PCIe No Snoop to Normal-iNC-oNC-OSH
on the output transaction.
ARM docs recommend that the VMM clear the "No Snoop Enable" in the PCI
endpoint config space if they want to block No Snoop. I guess this
works for everything and is something we should think about
generically in VFIO to better support iommu drivers that lack
IOMMU_CAP_ENFORCE_CACHE_COHERENCY.
ARM KVM probably needs to do something with
kvm_arch_register_noncoherent_dma() to understand that the VM can
still make the cache incoherent even if FWB is set.
Relatedly, the SMMU nested translation design is similar to KVM's, where
the S1 can contribute memory attributes. STE.S2FWB behaves similarly to
KVM's FWB in that it prevents the S1 from overriding cacheable in the S2.
The nested patches are still to be posted, but the current draft does
not set S2FWB; I will get that fixed.
We may have another vfio/iommufd/smmu issue where non-RAM pages are
mapped into the SMMU with IOMMU_CACHABLE. It's unclear when this would
be practically important, but it seems wrong.
Jason