Interrupt remapping IO_PAGE_FAULT has been observed under system w/
large number of VMs w/ pass-through devices. This can be reproduced with
64 VMs + 64 pass-through VFs of Mellanox MT28800 Family [ConnectX-5 Ex],
where each VM runs small-packet netperf test via the pass-through device
to the netserver running on the host. All VMs are running in reboot loop,
to trigger IRTE updates.
In addition, to accelerate the failure, irqbalance is triggered periodically
(e.g. 1-5 sec), which should generate large amount of updates to IRTE.
This setup generally triggers IO_PAGE_FAULT within 3-4 hours.
Investigation has shown that the issue is in the code to update IRTE
while remapping is enabled. Please see patch 2/2 for detail discussion.
This serires has been tested running in the setup mentioned above
upto 96 hours w/o seeing issues.
Thanks,
Suravee
Suravee Suthikulpanit (2):
iommu: amd: Restore IRTE.RemapEn bit after programming IRTE
iommu: amd: Use cmpxchg_double() when updating 128-bit IRTE
drivers/iommu/amd/Kconfig | 2 +-
drivers/iommu/amd/init.c | 21 +++++++++++++++++++--
drivers/iommu/amd/iommu.c | 19 +++++++++++++++----
3 files changed, 35 insertions(+), 7 deletions(-)
--
2.17.1
When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
current driver disables interrupt remapping when it updates the IRTE
so that the upper and lower 64-bit values can be updated safely.
However, this creates a small window, where the interrupt could
arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.
IOMMU Driver Device IRQ
============ ===========
irte.RemapEn=0
...
change IRTE IRQ from device ==> IO_PAGE_FAULT !!
...
irte.RemapEn=1
This scenario has been observed when changing irq affinity on a system
running I/O-intensive workload, in which the destination APIC ID
in the IRTE is updated.
Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
disabling the interrupt remapping. However, this means several features,
which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
is not supported (which is unprecedented for AMD processors w/ IOMMU).
Reported-by: Sean Osborne <[email protected]>
Tested-by: Erik Rockstrom <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/Kconfig | 2 +-
drivers/iommu/amd/init.c | 21 +++++++++++++++++++--
drivers/iommu/amd/iommu.c | 17 +++++++++++++----
3 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index 1f061d91e0b8..626b97d0dd21 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -10,7 +10,7 @@ config AMD_IOMMU
select IOMMU_API
select IOMMU_IOVA
select IOMMU_DMA
- depends on X86_64 && PCI && ACPI
+ depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
help
With this option you can enable support for AMD IOMMU hardware in
your system. An IOMMU is a hardware component which provides
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c652f16eb702..ad30467f6930 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1511,7 +1511,14 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+
+ /*
+ * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
+ * GAM also requires GA mode. Therefore, we need to
+ * check cmbxchg16b support before enabling it.
+ */
+ if (!boot_cpu_has(X86_FEATURE_CX16) ||
+ ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1520,8 +1527,18 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
+
+ /*
+ * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
+ * XT, GAM also requires GA mode. Therefore, we need to
+ * check cmbxchg16b support before enabling them.
+ */
+ if (boot_cpu_has(X86_FEATURE_CX16) ||
+ ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
+ break;
+ }
+
/*
* Note: Since iommu_update_intcapxt() leverages
* the IOMMU MMIO access to MSI capability block registers
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 967f4e96d1eb..a382d7a73eaa 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3292,6 +3292,7 @@ static int alloc_irq_index(u16 devid, int count, bool align,
static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
struct amd_ir_data *data)
{
+ bool ret;
struct irq_remap_table *table;
struct amd_iommu *iommu;
unsigned long flags;
@@ -3309,10 +3310,18 @@ static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
entry = (struct irte_ga *)table->table;
entry = &entry[index];
- entry->lo.fields_remap.valid = 0;
- entry->hi.val = irte->hi.val;
- entry->lo.val = irte->lo.val;
- entry->lo.fields_remap.valid = 1;
+
+ ret = cmpxchg_double(&entry->lo.val, &entry->hi.val,
+ entry->lo.val, entry->hi.val,
+ irte->lo.val, irte->hi.val);
+ /*
+ * We use cmpxchg16 to atomically update the 128-bit IRTE,
+ * and it cannot be updated by the hardware or other processors
+ * behind us, so the return value of cmpxchg16 should be the
+ * same as the old value.
+ */
+ WARN_ON(!ret);
+
if (data)
data->ref = entry;
--
2.17.1
Currently, the RemapEn (valid) bit is accidentally cleared when
programming IRTE w/ guestMode=0. It should be restored to
the prior state.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/iommu.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ba9f3dbc5b94..967f4e96d1eb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3850,6 +3850,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
struct irq_cfg *cfg = ir_data->cfg;
+ u64 valid = entry->lo.fields_remap.valid;
if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
!entry || !entry->lo.fields_vapic.guest_mode)
@@ -3858,6 +3859,7 @@ int amd_iommu_deactivate_guest_mode(void *data)
entry->lo.val = 0;
entry->hi.val = 0;
+ entry->lo.fields_remap.valid = valid;
entry->lo.fields_remap.dm = apic->irq_dest_mode;
entry->lo.fields_remap.int_type = apic->irq_delivery_mode;
entry->hi.fields.vector = cfg->vector;
--
2.17.1
On 9/2/20 5:51 AM, Suravee Suthikulpanit wrote:
> Currently, the RemapEn (valid) bit is accidentally cleared when
> programming IRTE w/ guestMode=0. It should be restored to
> the prior state.
>
Probably requires:
Fixes: b9fc6b56f478 ("iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic
mode for pass-through devices")
?
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
FWIW,
Reviewed-by: Joao Martins <[email protected]>
On 9/2/20 5:51 AM, Suravee Suthikulpanit wrote:
> When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
> current driver disables interrupt remapping when it updates the IRTE
> so that the upper and lower 64-bit values can be updated safely.
>
> However, this creates a small window, where the interrupt could
> arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.
>
> IOMMU Driver Device IRQ
> ============ ===========
> irte.RemapEn=0
> ...
> change IRTE IRQ from device ==> IO_PAGE_FAULT !!
> ...
> irte.RemapEn=1
>
> This scenario has been observed when changing irq affinity on a system
> running I/O-intensive workload, in which the destination APIC ID
> in the IRTE is updated.
>
> Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
> disabling the interrupt remapping. However, this means several features,
> which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
> is not supported (which is unprecedented for AMD processors w/ IOMMU).
>
Probably requires:
Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure")
?
> Reported-by: Sean Osborne <[email protected]>
> Tested-by: Erik Rockstrom <[email protected]>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
With the comments below addressed, FWIW:
Reviewed-by: Joao Martins <[email protected]>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index c652f16eb702..ad30467f6930 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -1511,7 +1511,14 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
> iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
> else
> iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
> - if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
> +
> + /*
> + * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
> + * GAM also requires GA mode. Therefore, we need to
> + * check cmbxchg16b support before enabling it.
> + */
s/cmbxchg16b/cmpxchg16b
> + if (!boot_cpu_has(X86_FEATURE_CX16) ||
> + ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
> amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
> break;
> case 0x11:
> @@ -1520,8 +1527,18 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
> iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
> else
> iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
> - if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
> +
> + /*
> + * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
> + * XT, GAM also requires GA mode. Therefore, we need to
> + * check cmbxchg16b support before enabling them.
s/cmbxchg16b/cmpxchg16b
> + */
> + if (boot_cpu_has(X86_FEATURE_CX16) ||
You probably want !boot_cpu_has(X86_FEATURE_CX16) ?
> + ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
> amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
> + break;
> + }
> +
> /*
> * Note: Since iommu_update_intcapxt() leverages
> * the IOMMU MMIO access to MSI capability block registers
Hi,
I'll send out V2 with fixes to the review comments below ...
On 9/2/20 10:26 PM, Joao Martins wrote:
> On 9/2/20 5:51 AM, Suravee Suthikulpanit wrote:
>> When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
>> current driver disables interrupt remapping when it updates the IRTE
>> so that the upper and lower 64-bit values can be updated safely.
>>
>> However, this creates a small window, where the interrupt could
>> arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.
>>
>> IOMMU Driver Device IRQ
>> ============ ===========
>> irte.RemapEn=0
>> ...
>> change IRTE IRQ from device ==> IO_PAGE_FAULT !!
>> ...
>> irte.RemapEn=1
>>
>> This scenario has been observed when changing irq affinity on a system
>> running I/O-intensive workload, in which the destination APIC ID
>> in the IRTE is updated.
>>
>> Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
>> disabling the interrupt remapping. However, this means several features,
>> which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
>> is not supported (which is unprecedented for AMD processors w/ IOMMU).
>>
> Probably requires:
>
> Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure")
>
Yes, I will include this in V2.
>
>> Reported-by: Sean Osborne <[email protected]>
>> Tested-by: Erik Rockstrom <[email protected]>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>
> With the comments below addressed, FWIW:
>
> Reviewed-by: Joao Martins <[email protected]>
>
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index c652f16eb702..ad30467f6930 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -1511,7 +1511,14 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
>> iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
>> else
>> iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
>> - if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
>> +
>> + /*
>> + * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
>> + * GAM also requires GA mode. Therefore, we need to
>> + * check cmbxchg16b support before enabling it.
>> + */
>
> s/cmbxchg16b/cmpxchg16b
>
>> + if (!boot_cpu_has(X86_FEATURE_CX16) ||
>> + ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
>> amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
>> break;
>> case 0x11:
>> @@ -1520,8 +1527,18 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
>> iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
>> else
>> iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
>> - if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
>> +
>> + /*
>> + * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
>> + * XT, GAM also requires GA mode. Therefore, we need to
>> + * check cmbxchg16b support before enabling them.
>
> s/cmbxchg16b/cmpxchg16b
>
>> + */
>> + if (boot_cpu_has(X86_FEATURE_CX16) ||
>
> You probably want !boot_cpu_has(X86_FEATURE_CX16) ?
.... Ah, sorry!! I forgot to change it back after testing for the negative case. Thank you for catching this.
Suravee
>
>> + ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
>> amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
>> + break;
>> + }
>> +
>> /*
>> * Note: Since iommu_update_intcapxt() leverages
>> * the IOMMU MMIO access to MSI capability block registers