2020-09-03 09:38:59

by Suthikulpanit, Suravee

Subject: [PATCH 0/2 v2] iommu: amd: Fix intremap IO_PAGE_FAULT for VMs

Interrupt remapping IO_PAGE_FAULT has been observed on systems with a
large number of VMs with pass-through devices. This can be reproduced with
64 VMs + 64 pass-through VFs of Mellanox MT28800 Family [ConnectX-5 Ex],
where each VM runs a small-packet netperf test via the pass-through device
to the netserver running on the host. All VMs run in a reboot loop
to trigger IRTE updates.

In addition, to accelerate the failure, irqbalance is triggered periodically
(e.g. every 1-5 sec), which should generate a large number of IRTE updates.
This setup generally triggers IO_PAGE_FAULT within 3-4 hours.

Investigation has shown that the issue is in the code that updates the IRTE
while remapping is enabled. Please see patch 2/2 for a detailed discussion.

This series has been tested in the setup mentioned above
for up to 96 hours w/o seeing issues.

Changes from v1 (https://lkml.org/lkml/2020/9/2/26)
* Fix typo in comments and commit messages
* Fix logic to check for X86_FEATURE_CX16 support in patch 2/2

Thanks,
Suravee

Suravee Suthikulpanit (2):
iommu: amd: Restore IRTE.RemapEn bit after programming IRTE
iommu: amd: Use cmpxchg_double() when updating 128-bit IRTE

drivers/iommu/amd/Kconfig | 2 +-
drivers/iommu/amd/init.c | 21 +++++++++++++++++++--
drivers/iommu/amd/iommu.c | 19 +++++++++++++++----
3 files changed, 35 insertions(+), 7 deletions(-)

--
2.17.1


2020-09-03 09:39:45

by Suthikulpanit, Suravee

Subject: [PATCH 1/2 v2] iommu: amd: Restore IRTE.RemapEn bit after programming IRTE

Currently, the RemapEn (valid) bit is accidentally cleared when
programming IRTE w/ guestMode=0. It should be restored to
the prior state.
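
As a rough illustration of the fix described above, the following
user-space sketch saves the valid bit before the entry is zeroed and
puts it back afterwards. The struct, the bit positions and the
demo_* names are hypothetical simplifications for this example only;
they are not the driver's struct irte_ga layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only; NOT the real IRTE layout. */
#define DEMO_IRTE_VALID (1ULL << 0)
#define DEMO_IRTE_DM    (1ULL << 6)

/* Hypothetical stand-in for a 128-bit IRTE: two 64-bit halves. */
struct demo_irte {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Reprogram the entry for non-guest mode. The valid bit is captured
 * before both halves are cleared and restored afterwards, so an entry
 * that was enabled stays enabled and a disabled one stays disabled.
 */
static void demo_deactivate_guest_mode(struct demo_irte *entry,
				       int dest_mode, uint64_t vector)
{
	uint64_t valid;

	assert(entry != NULL);
	valid = entry->lo & DEMO_IRTE_VALID;

	entry->lo = 0;
	entry->hi = 0;

	entry->lo |= valid;		/* restore the prior state */
	if (dest_mode)
		entry->lo |= DEMO_IRTE_DM;
	entry->hi = vector & 0xff;
}
```

Without the save/restore pair, clearing entry->lo would leave the valid
bit at 0 regardless of its previous value.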

Reviewed-by: Joao Martins <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Fixes: b9fc6b56f478 ("iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices")
---
drivers/iommu/amd/iommu.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ba9f3dbc5b94..967f4e96d1eb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3850,6 +3850,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
struct irq_cfg *cfg = ir_data->cfg;
+ u64 valid = entry->lo.fields_remap.valid;

if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
!entry || !entry->lo.fields_vapic.guest_mode)
@@ -3858,6 +3859,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
entry->lo.val = 0;
entry->hi.val = 0;

+ entry->lo.fields_remap.valid = valid;
entry->lo.fields_remap.dm = apic->irq_dest_mode;
entry->lo.fields_remap.int_type = apic->irq_delivery_mode;
entry->hi.fields.vector = cfg->vector;
--
2.17.1

2020-09-03 09:40:29

by Suthikulpanit, Suravee

Subject: [PATCH 2/2 v2] iommu: amd: Use cmpxchg_double() when updating 128-bit IRTE

When using the 128-bit interrupt-remapping table entry (IRTE) (a.k.a. GA mode),
the current driver disables interrupt remapping while it updates the IRTE
so that the upper and lower 64-bit values can be updated safely.

However, this creates a small window in which an interrupt can
arrive and result in IO_PAGE_FAULT (for interrupt), as shown below.

  IOMMU Driver            Device IRQ
  ============            ===========
  irte.RemapEn=0
       ...
  change IRTE             IRQ from device ==> IO_PAGE_FAULT !!
       ...
  irte.RemapEn=1

This scenario has been observed when changing irq affinity on a system
running an I/O-intensive workload, in which the destination APIC ID
in the IRTE is updated.

Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
disabling interrupt remapping. However, this means that several features
that require GA (128-bit IRTE) support will also be affected if cmpxchg16b
is not supported (which is unprecedented for AMD processors w/ IOMMU).
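
The idea of replacing the whole entry in a single compare-and-exchange
can be sketched in user space. This is a scaled-down analogy, not the
kernel code: it assumes a hypothetical 64-bit entry made of two 32-bit
halves and uses C11 atomics in place of cmpxchg_double()/cmpxchg16b.
All demo_* names and DEMO_VALID are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical valid bit in the low half of the scaled-down entry. */
#define DEMO_VALID 1u

/* Pack the two 32-bit halves into the single word that gets swapped. */
static uint64_t demo_pack(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Replace both halves in one atomic step. A concurrent reader sees
 * either the complete old entry or the complete new one; there is no
 * intermediate state with the valid bit cleared.
 * Returns false if the entry changed between the load and the exchange.
 */
static bool demo_modify_entry(_Atomic uint64_t *entry,
			      uint32_t new_lo, uint32_t new_hi)
{
	uint64_t expected;

	assert(entry != NULL);
	expected = atomic_load(entry);
	return atomic_compare_exchange_strong(entry, &expected,
					      demo_pack(new_lo, new_hi));
}
```

As in the patch, a failed exchange is not expected when updates are
serialized by the caller, so the return value mainly serves as a
sanity check.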

Reviewed-by: Joao Martins <[email protected]>
Reported-by: Sean Osborne <[email protected]>
Tested-by: Erik Rockstrom <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure")
---
drivers/iommu/amd/Kconfig | 2 +-
drivers/iommu/amd/init.c | 21 +++++++++++++++++++--
drivers/iommu/amd/iommu.c | 17 +++++++++++++----
3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index 1f061d91e0b8..626b97d0dd21 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -10,7 +10,7 @@ config AMD_IOMMU
select IOMMU_API
select IOMMU_IOVA
select IOMMU_DMA
- depends on X86_64 && PCI && ACPI
+ depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
help
With this option you can enable support for AMD IOMMU hardware in
your system. An IOMMU is a hardware component which provides
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c652f16eb702..ac09e4063677 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1511,7 +1511,14 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+
+ /*
+ * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
+ * GAM also requires GA mode. Therefore, we need to
+ * check cmpxchg16b support before enabling it.
+ */
+ if (!boot_cpu_has(X86_FEATURE_CX16) ||
+ ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1520,8 +1527,18 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
+
+ /*
+ * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
+ * XT, GAM also requires GA mode. Therefore, we need to
+ * check cmpxchg16b support before enabling them.
+ */
+ if (!boot_cpu_has(X86_FEATURE_CX16) ||
+ ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
+ break;
+ }
+
/*
* Note: Since iommu_update_intcapxt() leverages
* the IOMMU MMIO access to MSI capability block registers
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 967f4e96d1eb..a382d7a73eaa 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3292,6 +3292,7 @@ static int alloc_irq_index(u16 devid, int count, bool align,
static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
struct amd_ir_data *data)
{
+ bool ret;
struct irq_remap_table *table;
struct amd_iommu *iommu;
unsigned long flags;
@@ -3309,10 +3310,18 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,

entry = (struct irte_ga *)table->table;
entry = &entry[index];
- entry->lo.fields_remap.valid = 0;
- entry->hi.val = irte->hi.val;
- entry->lo.val = irte->lo.val;
- entry->lo.fields_remap.valid = 1;
+
+ ret = cmpxchg_double(&entry->lo.val, &entry->hi.val,
+ entry->lo.val, entry->hi.val,
+ irte->lo.val, irte->hi.val);
+ /*
+ * We use cmpxchg16 to atomically update the 128-bit IRTE,
+ * and it cannot be updated by the hardware or other processors
+ * behind us, so the return value of cmpxchg16 should be the
+ * same as the old value.
+ */
+ WARN_ON(!ret);
+
if (data)
data->ref = entry;

--
2.17.1

2020-09-04 09:57:25

by Joerg Roedel

Subject: Re: [PATCH 0/2 v2] iommu: amd: Fix intremap IO_PAGE_FAULT for VMs

On Thu, Sep 03, 2020 at 09:38:20AM +0000, Suravee Suthikulpanit wrote:
> Suravee Suthikulpanit (2):
> iommu: amd: Restore IRTE.RemapEn bit after programming IRTE
> iommu: amd: Use cmpxchg_double() when updating 128-bit IRTE

Applied both for v5.9, thanks Suravee.