From: Ankit Agrawal <[email protected]>
Currently, KVM for ARM64 maps memory that is considered device at stage 2
with DEVICE_nGnRE memory attributes; this setting overrides (per the
ARM architecture [1]) any device MMIO mapping present at stage 1.
As a result, a guest operating system cannot determine device MMIO
memory attributes on its own: its choice is always overridden by the
KVM stage 2 default.
This set-up does not allow guest operating systems to select device
memory attributes independently from KVM stage-2 mappings
(refer to [1], "Combining stage 1 and stage 2 memory type attributes").
This turns out to be an issue because guest operating systems
(e.g. Linux) may request to map device MMIO regions with memory
attributes, such as the Normal-NC memory type, that guarantee better
performance (e.g. the gathering attribute, which for some devices can
generate larger PCIe memory write TLPs) and allow specific operations
(e.g. unaligned transactions).
The default device stage 2 mapping was chosen in KVM for ARM64 since
it was considered safer (i.e. it would not allow guests to trigger
uncontained failures ultimately crashing the machine), but failures
with this mapping turned out to be reported asynchronously (SError)
anyway, defeating the purpose.
For these reasons, relax the KVM stage 2 device memory attributes
from DEVICE_nGnRE to Normal-NC.
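To illustrate why the stage 2 attribute decides what the guest can
obtain, here is a simplified model of the combining rule referenced in
[1]: without FEAT_S2FWB, the more restrictive of the stage 1 and stage 2
memory types wins. The enum, its ordering and the function name are
assumptions made for this sketch; they are not kernel definitions, and
shareability and inner/outer cacheability are ignored.

```c
#include <assert.h>

/*
 * Hypothetical model types, ordered from most to least restrictive.
 * The ordering is what makes "take the smaller value" implement the
 * "more restrictive type wins" rule in this sketch.
 */
enum model_mem_type {
	MODEL_DEVICE_nGnRnE,	/* most restrictive */
	MODEL_DEVICE_nGnRE,
	MODEL_DEVICE_nGRE,
	MODEL_DEVICE_GRE,
	MODEL_NORMAL_NC,
	MODEL_NORMAL_WT,
	MODEL_NORMAL_WB,	/* least restrictive */
};

/* Combine the guest (stage 1) and hypervisor (stage 2) memory types. */
static enum model_mem_type model_combine(enum model_mem_type s1,
					 enum model_mem_type s2)
{
	return s1 < s2 ? s1 : s2;
}
```

In this model, model_combine(MODEL_NORMAL_NC, MODEL_DEVICE_nGnRE) yields
MODEL_DEVICE_nGnRE, so a guest request for Normal-NC is lost under the
current default. With MODEL_NORMAL_NC at stage 2, the guest gets
Normal-NC when it asks for it and still gets a Device type when it asks
for one.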
Generalizing to other devices may be problematic, however. For example,
the GICv2 VCPU interface, which is effectively a shared peripheral, can
allow a guest to affect another guest's interrupt distribution. Hence
limit the change to VFIO PCI as a precaution. This is achieved by
making the VFIO PCI core module set a flag that KVM tests before
applying the relaxed attributes. This could be extended to other
devices in the future once that is deemed safe.
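The gating described above can be sketched as a small standalone model:
the driver marks the VMA, and the stage 2 fault path picks Normal-NC
only for device memory that carries the mark. The GATE_* bit values and
the function name are stand-ins invented for this sketch; the real
definitions and the real KVM fault-handling code are in the patches
themselves and differ from this.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bits for this sketch only; not the kernel's values. */
#define GATE_VM_ALLOW_ANY_UNCACHED	(1UL << 0)
#define GATE_PROT_DEVICE		(1UL << 0)
#define GATE_PROT_NORMAL_NC		(1UL << 1)

/*
 * Pick the stage 2 memory attribute bit for a mapping.
 * vm_flags:  flags of the VMA backing the faulting address
 * is_device: whether the backing memory is considered device memory
 */
static unsigned long gate_s2_io_prot(unsigned long vm_flags, bool is_device)
{
	if (!is_device)
		return 0;	/* ordinary memory: no device attribute bit */

	/* Only VMAs explicitly marked by their driver get Normal-NC. */
	if (vm_flags & GATE_VM_ALLOW_ANY_UNCACHED)
		return GATE_PROT_NORMAL_NC;

	/* Everything else keeps the existing Device default. */
	return GATE_PROT_DEVICE;
}
```

Keeping the Device default for unmarked VMAs means regions such as the
GICv2 VCPU interface are unaffected unless a driver opts in.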
[1] section D8.5 - DDI0487J_a_a-profile_architecture_reference_manual.pdf
Applied over v6.8-rc5.
History
=======
v7 -> v8
- Changed the commit messages of patches 2/4 and 4/4 to include the
detailed description of the VM_ALLOW_ANY_UNCACHED flag posted by
Jason.
- Added more detailed comment in the vfio_pci_core about
VM_ALLOW_ANY_UNCACHED flag.
- Rebased to v6.8-rc5.
v6 -> v7
- Changed VM_VFIO_ALLOW_WC to VM_ALLOW_ANY_UNCACHED based on a
suggestion from Alex Williamson.
- Refactored stage2_set_prot_attr() based on Will's suggestion to
reorganize the switch cases. Also updated the case to return -EINVAL
when both KVM_PGTABLE_PROT_DEVICE and KVM_PGTABLE_PROT_NORMAL_NC are set.
- Fixed nits pointed out by Oliver and Catalin.
v5 -> v6
- Rebased to v6.8-rc2
v4 -> v5
- Moved the cover letter description text to patch 1/4.
- Cleaned up stage2_set_prot_attr() based on Marc Zyngier's suggestions.
- Moved the mm header file changes to a separate patch.
- Rebased to v6.7-rc3.
v3 -> v4
- Moved the vfio-pci change to use the VM_VFIO_ALLOW_WC into
a separate patch.
- Added a check to warn when NORMAL_NC and DEVICE are
set simultaneously.
- Fixed miscellaneous nitpicks suggested in v3.
v2 -> v3
- Added a new patch (and converted to patch series) suggested by
Catalin Marinas to ensure the code changes are restricted to
VFIO PCI devices.
- Introduced VM_VFIO_ALLOW_WC flag for VFIO PCI to communicate
with VMM.
- Reverted GIC mapping to DEVICE.
v1 -> v2
- Updated commit log to the one posted by
Lorenzo Pieralisi <[email protected]> (Thanks!)
- Added new flag to represent the NORMAL_NC setting. Updated
stage2_set_prot_attr() to handle new flag.
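As a rough illustration of the v6 -> v7 item about rejecting
contradictory requests, the following standalone sketch selects a
memory attribute from two protection bits and returns -EINVAL when both
are set. The ATTR_* names, bit values and function name are hypothetical
and only model the behaviour described in the history above; this is not
the code of stage2_set_prot_attr().

```c
#include <assert.h>
#include <errno.h>

/* Stand-in protection bits for this sketch only. */
#define ATTR_PROT_DEVICE	(1UL << 0)
#define ATTR_PROT_NORMAL_NC	(1UL << 1)

/* Hypothetical result type for the selected stage 2 memory attribute. */
enum attr_s2_memattr {
	ATTR_S2_NORMAL,
	ATTR_S2_NORMAL_NC,
	ATTR_S2_DEVICE,
};

/* Return 0 and fill *out, or -EINVAL if the request is contradictory. */
static int attr_pick_memattr(unsigned long prot, enum attr_s2_memattr *out)
{
	switch (prot & (ATTR_PROT_DEVICE | ATTR_PROT_NORMAL_NC)) {
	case ATTR_PROT_DEVICE | ATTR_PROT_NORMAL_NC:
		return -EINVAL;	/* both requested: refuse */
	case ATTR_PROT_DEVICE:
		*out = ATTR_S2_DEVICE;
		break;
	case ATTR_PROT_NORMAL_NC:
		*out = ATTR_S2_NORMAL_NC;
		break;
	default:
		*out = ATTR_S2_NORMAL;
	}
	return 0;
}
```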
v7 Link:
https://lore.kernel.org/all/[email protected]/
Suggested-by: Jason Gunthorpe <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Ankit Agrawal <[email protected]>
Ankit Agrawal (4):
kvm: arm64: introduce new flag for non-cacheable IO memory
mm: introduce new flag to indicate wc safe
kvm: arm64: set io memory s2 pte as normalnc for vfio pci device
vfio: convey kvm that the vfio-pci device is wc safe
arch/arm64/include/asm/kvm_pgtable.h | 2 ++
arch/arm64/include/asm/memory.h | 2 ++
arch/arm64/kvm/hyp/pgtable.c | 24 +++++++++++++++++++-----
arch/arm64/kvm/mmu.c | 14 ++++++++++----
drivers/vfio/pci/vfio_pci_core.c | 18 +++++++++++++++++-
include/linux/mm.h | 14 ++++++++++++++
6 files changed, 64 insertions(+), 10 deletions(-)
--
2.34.1
From: Ankit Agrawal <[email protected]>
The VM_ALLOW_ANY_UNCACHED flag is implemented for ARM64,
allowing KVM stage 2 device mapping attributes to use Normal-NC
rather than DEVICE_nGnRE, which allows guest mappings
supporting combining attributes (WC). ARM does not architecturally
guarantee this is safe, and indeed some MMIO regions like the GICv2
VCPU interface can trigger uncontained faults if Normal-NC is used.
To safely use VFIO in KVM the platform must guarantee full safety
in the guest where no action taken against an MMIO mapping can
trigger an uncontained failure. We believe that most VFIO PCI
platforms support this for both mapping types, at least in common
flows, based on some expectations of how PCI IP is integrated. So
make vfio-pci set the VM_ALLOW_ANY_UNCACHED flag.
Suggested-by: Catalin Marinas <[email protected]>
Acked-by: Jason Gunthorpe <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Ankit Agrawal <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1cbc990d42e0..c93bea18fc4b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1862,8 +1862,24 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
/*
* See remap_pfn_range(), called from vfio_pci_fault() but we can't
* change vm_flags within the fault handler. Set them now.
+ *
+ * VM_ALLOW_ANY_UNCACHED: The VMA flag is implemented for ARM64,
+ * allowing KVM stage 2 device mapping attributes to use Normal-NC
+ * rather than DEVICE_nGnRE, which allows guest mappings
+ * supporting combining attributes (WC). ARM does not
+ * architecturally guarantee this is safe, and indeed some MMIO
+ * regions like the GICv2 VCPU interface can trigger uncontained
+ * faults if Normal-NC is used.
+ *
+ * To safely use VFIO in KVM the platform must guarantee full
+ * safety in the guest where no action taken against an MMIO
+ * mapping can trigger an uncontained failure. We believe that
+ * most VFIO PCI platforms support this for both mapping types,
+ * at least in common flows, based on some expectations of how
+ * PCI IP is integrated. So set VM_ALLOW_ANY_UNCACHED in VMA flags.
*/
- vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+ vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
+ VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &vfio_pci_mmap_ops;
return 0;
--
2.34.1
On Tue, Feb 20, 2024 at 12:59:26PM +0530, [email protected] wrote:
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1cbc990d42e0..c93bea18fc4b 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1862,8 +1862,24 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> /*
> * See remap_pfn_range(), called from vfio_pci_fault() but we can't
> * change vm_flags within the fault handler. Set them now.
> + *
> + * VM_ALLOW_ANY_UNCACHED: The VMA flag is implemented for ARM64,
> + * allowing KVM stage 2 device mapping attributes to use Normal-NC
> + * rather than DEVICE_nGnRE, which allows guest mappings
> + * supporting combining attributes (WC). ARM does not
Nitpick: "supporting write-combining" (if you plan to respin).
--
Catalin
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 1cbc990d42e0..c93bea18fc4b 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1862,8 +1862,24 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>>       /*
>>        * See remap_pfn_range(), called from vfio_pci_fault() but we can't
>>        * change vm_flags within the fault handler.  Set them now.
>> +      *
>> +      * VM_ALLOW_ANY_UNCACHED: The VMA flag is implemented for ARM64,
>> +      * allowing KVM stage 2 device mapping attributes to use Normal-NC
>> +      * rather than DEVICE_nGnRE, which allows guest mappings
>> +      * supporting combining attributes (WC). ARM does not
>
> Nitpick: "supporting write-combining" (if you plan to respin).
Ack.
On Tue, 20 Feb 2024 12:59:26 +0530
<[email protected]> wrote:
> From: Ankit Agrawal <[email protected]>
>
> The VM_ALLOW_ANY_UNCACHED flag is implemented for ARM64,
> allowing KVM stage 2 device mapping attributes to use Normal-NC
> rather than DEVICE_nGnRE, which allows guest mappings
> supporting combining attributes (WC). ARM does not architecturally
> guarantee this is safe, and indeed some MMIO regions like the GICv2
> VCPU interface can trigger uncontained faults if Normal-NC is used.
>
> To safely use VFIO in KVM the platform must guarantee full safety
> in the guest where no action taken against a MMIO mapping can
> trigger an uncontained failure. We believe that most VFIO PCI
> platforms support this for both mapping types, at least in common
> flows, based on some expectations of how PCI IP is integrated. So
> make vfio-pci set the VM_ALLOW_ANY_UNCACHED flag.
>
> Suggested-by: Catalin Marinas <[email protected]>
> Acked-by: Jason Gunthorpe <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> Signed-off-by: Ankit Agrawal <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1cbc990d42e0..c93bea18fc4b 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1862,8 +1862,24 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> /*
> * See remap_pfn_range(), called from vfio_pci_fault() but we can't
> * change vm_flags within the fault handler. Set them now.
> + *
> + * VM_ALLOW_ANY_UNCACHED: The VMA flag is implemented for ARM64,
> + * allowing KVM stage 2 device mapping attributes to use Normal-NC
> + * rather than DEVICE_nGnRE, which allows guest mappings
> + * supporting combining attributes (WC). ARM does not
> + * architecturally guarantee this is safe, and indeed some MMIO
> + * regions like the GICv2 VCPU interface can trigger uncontained
> + * faults if Normal-NC is used.
> + *
> + * To safely use VFIO in KVM the platform must guarantee full
> + * safety in the guest where no action taken against a MMIO
> + * mapping can trigger an uncontained failure. We believe that
> + * most VFIO PCI platforms support this for both mapping types,
> + * at least in common flows, based on some expectations of how
> + * PCI IP is integrated. So set VM_ALLOW_ANY_UNCACHED in VMA flags.
> */
> - vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> + vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
> + VM_DONTEXPAND | VM_DONTDUMP);
> vma->vm_ops = &vfio_pci_mmap_ops;
>
> return 0;
Acked-by: Alex Williamson <[email protected]>
On Tue, 20 Feb 2024 07:29:22 +0000,
<[email protected]> wrote:
>
> From: Ankit Agrawal <[email protected]>
>
> Currently, KVM for ARM64 maps at stage 2 memory that is considered device
> with DEVICE_nGnRE memory attributes; this setting overrides (per
> ARM architecture [1]) any device MMIO mapping present at stage 1,
> resulting in a set-up whereby a guest operating system cannot
> determine device MMIO mapping memory attributes on its own but
> it is always overridden by the KVM stage 2 default.
>
> This set-up does not allow guest operating systems to select device
> memory attributes independently from KVM stage-2 mappings
> (refer to [1], "Combining stage 1 and stage 2 memory type attributes"),
> which turns out to be an issue in that guest operating systems
> (e.g. Linux) may request to map devices MMIO regions with memory
> attributes that guarantee better performance (e.g. gathering
> attribute - that for some devices can generate larger PCIe memory
> writes TLPs) and specific operations (e.g. unaligned transactions)
> such as the NormalNC memory type.
>
> The default device stage 2 mapping was chosen in KVM for ARM64 since
> it was considered safer (i.e. it would not allow guests to trigger
> uncontained failures ultimately crashing the machine) but this
> turned out to be asynchronous (SError) defeating the purpose.
>
> For these reasons, relax the KVM stage 2 device memory attributes
> from DEVICE_nGnRE to Normal-NC.
>
> Generalizing to other devices may be problematic, however. E.g.
> GICv2 VCPU interface, which is effectively a shared peripheral, can
> allow a guest to affect another guest's interrupt distribution. Hence
> limit the change to VFIO PCI as caution. This is achieved by
> making the VFIO PCI core module set a flag that is tested by KVM
> to activate the code. This could be extended to other devices in
> the future once that is deemed safe.
>
> [1] section D8.5 - DDI0487J_a_a-profile_architecture_reference_manual.pdf
>
> Applied over v6.8-rc5.
For the series,
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Without deviation from the norm, progress is not possible.