2017-07-21 08:59:20

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 00/13] Fix the on-flight DMA issue on system with amd iommu

When kernel panicked and jump into the kdump kernel, DMA started by the
1st kernel is not stopped, this is called on-flight DMA. In the current
code it will disable iommu and build new translation table and attach
device to it. This will cause:

1. IO_PAGE_FAULT warning message can be seen.
2. transfer data to or from incorrect areas of memory.

Sometime it causes the dump failure or kernel hang.

The principle of the fix is to defer the assignment of device to domain to
device driver initializtion stage. A new call-back is_attach_deferred() is
added to iommu-ops, will check whether we need defer the domain attach/detach
in iommu-core code. If defer is needed, just return directly from amd iommu
attach/detach function. The attachment will be done in device driver
initializaiton stage when calling get_domain().

Change history:
v8:v7:
Rebase patchset v7 on the latest v4.13-rc1.

- And re-enable printing IO_PAGE_FAULT message in kdump kernel.
- Only disable iommu if amd_iommu=off is specified in kdump kernel.


v6->v7:
Two main changes are made according to Joerg's suggestion:
- Add is_attach_deferred call-back to iommu-ops. With this domain
can be deferred to device driver init cleanly.

- Allocate memory below 4G for dev table if translation pre-enabled.
AMD engineer pointed out that it's unsafe to update the device-table
while iommu is enabled. device-table pointer update is split up into
two 32bit writes in the IOMMU hardware. So updating it while the IOMMU
is enabled could have some nasty side effects.

v5->v6:
According to Joerg's comments made several below main changes:
- Add sanity check when copy old dev tables.

- If a device is set up with guest translations (DTE.GV=1), then don't
copy that information but move the device over to an empty guest-cr3
table and handle the faults in the PPR log (which just answer them
with INVALID).

v5:
bnx2 NIC can't reset itself during driver init. Post patch to reset
it during driver init. IO_PAGE_FAULT can't be seen anymore.

Below is link of v5 post.
https://lists.linuxfoundation.org/pipermail/iommu/2016-September/018527.html


Baoquan He (12):
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper functions
Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables()
iommu/amd: copy old trans table from old kernel
iommu/amd: Do sanity check for irq remap of old dev table entry
iommu: Add is_attach_deferred call-back to iommu-ops
iommu/amd: Use is_attach_deferred call-back
iommu/amd: Allocate memory below 4G for dev table if translation
pre-enabled
iommu/amd: Don't copy GCR3 table root pointer
iommu/amd: Clear out the GV flag when handle deferred domain attach

root (1):
iommu/amd: Disable iommu only if amd_iommu=off is specified

drivers/iommu/amd_iommu.c | 81 ++++++++-------
drivers/iommu/amd_iommu_init.c | 212 ++++++++++++++++++++++++++++++++++++----
drivers/iommu/amd_iommu_proto.h | 2 +
drivers/iommu/amd_iommu_types.h | 56 ++++++++++-
drivers/iommu/amd_iommu_v2.c | 18 +++-
drivers/iommu/iommu.c | 8 ++
include/linux/iommu.h | 1 +
7 files changed, 315 insertions(+), 63 deletions(-)

--
2.5.5


2017-07-21 08:59:23

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 01/13] iommu/amd: Detect pre enabled translation

Add functions to check whether translation is already enabled in IOMMU.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu_init.c | 24 ++++++++++++++++++++++++
drivers/iommu/amd_iommu_proto.h | 1 +
drivers/iommu/amd_iommu_types.h | 4 ++++
3 files changed, 29 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5cc597b383c7..e39857ce6481 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -258,6 +258,25 @@ static int amd_iommu_enable_interrupts(void);
static int __init iommu_go_to_state(enum iommu_init_state state);
static void init_device_table_dma(void);

+bool translation_pre_enabled(struct amd_iommu *iommu)
+{
+ return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
+}
+
+static void clear_translation_pre_enabled(struct amd_iommu *iommu)
+{
+ iommu->flags &= ~AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
+static void init_translation_status(struct amd_iommu *iommu)
+{
+ u32 ctrl;
+
+ ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+ if (ctrl & (1<<CONTROL_IOMMU_EN))
+ iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
static inline void update_last_devid(u16 devid)
{
if (devid > amd_iommu_last_bdf)
@@ -1399,6 +1418,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)

iommu->int_enabled = false;

+ init_translation_status(iommu);
+
+ if (translation_pre_enabled(iommu))
+ pr_warn("Translation is already enabled - trying to copy translation structures\n");
+
ret = init_iommu_from_acpi(iommu, h);
if (ret)
return ret;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 466260f8a1df..a9666d2005bb 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -87,4 +87,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
return !!(iommu->features & f);
}

+extern bool translation_pre_enabled(struct amd_iommu *iommu);
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 294a409e283b..d15966b62b33 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -417,6 +417,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
#define APERTURE_PAGE_INDEX(a) (((a) >> 21) & 0x3fULL)


+
/*
* This struct is used to pass information about
* incoming PPR faults around.
@@ -435,6 +436,8 @@ struct iommu_domain;
struct irq_domain;
struct amd_irte_ops;

+#define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0)
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
@@ -569,6 +572,7 @@ struct amd_iommu {
struct amd_irte_ops *irte_ops;
#endif

+ u32 flags;
volatile u64 __aligned(8) cmd_sem;
};

--
2.5.5

2017-07-21 08:59:30

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 02/13] iommu/amd: add several helper functions

Move single iommu enabling codes into a wrapper function early_enable_iommu().
This can make later kdump change easier.

And also add iommu_disable_command_buffer and iommu_disable_event_buffer
for later usage.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index e39857ce6481..4ca6e3257d92 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
amd_iommu_reset_cmd_buffer(iommu);
}

+/*
+ * This function disables the command buffer
+ */
+static void iommu_disable_command_buffer(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
+}
+
static void __init free_command_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
@@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
}

+/*
+ * This function disables the command buffer
+ */
+static void iommu_disable_event_buffer(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+}
+
static void __init free_event_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
@@ -2046,6 +2062,19 @@ static void iommu_enable_ga(struct amd_iommu *iommu)
#endif
}

+static void early_enable_iommu(struct amd_iommu *iommu)
+{
+ iommu_disable(iommu);
+ iommu_init_flags(iommu);
+ iommu_set_device_table(iommu);
+ iommu_enable_command_buffer(iommu);
+ iommu_enable_event_buffer(iommu);
+ iommu_set_exclusion_range(iommu);
+ iommu_enable_ga(iommu);
+ iommu_enable(iommu);
+ iommu_flush_all_caches(iommu);
+}
+
/*
* This function finally enables all IOMMUs found in the system after
* they have been initialized
@@ -2054,17 +2083,8 @@ static void early_enable_iommus(void)
{
struct amd_iommu *iommu;

- for_each_iommu(iommu) {
- iommu_disable(iommu);
- iommu_init_flags(iommu);
- iommu_set_device_table(iommu);
- iommu_enable_command_buffer(iommu);
- iommu_enable_event_buffer(iommu);
- iommu_set_exclusion_range(iommu);
- iommu_enable_ga(iommu);
- iommu_enable(iommu);
- iommu_flush_all_caches(iommu);
- }
+ for_each_iommu(iommu)
+ early_enable_iommu(iommu);

#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
--
2.5.5

2017-07-21 08:59:34

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 03/13] Revert "iommu/amd: Suppress IO_PAGE_FAULTs in kdump kernel"

This reverts commit 54bd63570484167cb13edf81e31fff107b879981.

We still need the IO_PAGE_FAULT message to warn error after the
issue of on-flight dma in kdump kernel is fixed.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 3 +--
drivers/iommu/amd_iommu_init.c | 9 ---------
drivers/iommu/amd_iommu_types.h | 1 -
3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 688e77576e5a..e8a6d8109564 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2086,8 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
flags |= tmp;
}

-
- flags &= ~(DTE_FLAG_SA | 0xffffULL);
+ flags &= ~(0xffffUL);
flags |= domain->id;

amd_iommu_dev_table[devid].data[1] = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 4ca6e3257d92..f6da5fe03b31 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -29,7 +29,6 @@
#include <linux/export.h>
#include <linux/iommu.h>
#include <linux/kmemleak.h>
-#include <linux/crash_dump.h>
#include <asm/pci-direct.h>
#include <asm/iommu.h>
#include <asm/gart.h>
@@ -1942,14 +1941,6 @@ static void init_device_table_dma(void)
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
set_dev_entry_bit(devid, DEV_ENTRY_VALID);
set_dev_entry_bit(devid, DEV_ENTRY_TRANSLATION);
- /*
- * In kdump kernels in-flight DMA from the old kernel might
- * cause IO_PAGE_FAULTs. There are no reports that a kdump
- * actually failed because of that, so just disable fault
- * reporting in the hardware to get rid of the messages
- */
- if (is_kdump_kernel())
- set_dev_entry_bit(devid, DEV_ENTRY_NO_PAGE_FAULT);
}
}

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index d15966b62b33..608e81ca5e92 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -322,7 +322,6 @@
#define IOMMU_PTE_IW (1ULL << 62)

#define DTE_FLAG_IOTLB (1ULL << 32)
-#define DTE_FLAG_SA (1ULL << 34)
#define DTE_FLAG_GV (1ULL << 55)
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DTE_GLX_SHIFT (56)
--
2.5.5

2017-07-21 08:59:37

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 04/13] iommu/amd: Define bit fields for DTE particularly

In AMD-Vi spec several bits of IO PTE fields and DTE fields are similar
so that both of them can share the same MACRO definition. However
defining them respectively can make code more read-able. Do it now.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 8 ++++----
drivers/iommu/amd_iommu_types.h | 18 ++++++++++++++----
2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e8a6d8109564..e5a03f259986 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1537,9 +1537,9 @@ static int iommu_map_page(struct protection_domain *dom,

if (count > 1) {
__pte = PAGE_SIZE_PTE(phys_addr, page_size);
- __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
} else
- __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte = phys_addr | IOMMU_PTE_PR | IOMMU_PTE_FC;

if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -2053,7 +2053,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)

pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= IOMMU_PTE_IR | IOMMU_PTE_IW | IOMMU_PTE_P | IOMMU_PTE_TV;
+ pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;

flags = amd_iommu_dev_table[devid].data[1];

@@ -2096,7 +2096,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
static void clear_dte_entry(u16 devid)
{
/* remove entry from the device table seen by the hardware */
- amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV;
+ amd_iommu_dev_table[devid].data[0] = DTE_FLAG_V | DTE_FLAG_TV;
amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;

amd_iommu_apply_erratum_63(devid);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 608e81ca5e92..ea36af19b5b9 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -265,7 +265,7 @@
#define PM_LEVEL_INDEX(x, a) (((a) >> PM_LEVEL_SHIFT((x))) & 0x1ffULL)
#define PM_LEVEL_ENC(x) (((x) << 9) & 0xe00ULL)
#define PM_LEVEL_PDE(x, a) ((a) | PM_LEVEL_ENC((x)) | \
- IOMMU_PTE_P | IOMMU_PTE_IR | IOMMU_PTE_IW)
+ IOMMU_PTE_PR | IOMMU_PTE_IR | IOMMU_PTE_IW)
#define PM_PTE_LEVEL(pte) (((pte) >> 9) & 0x7ULL)

#define PM_MAP_4k 0
@@ -314,13 +314,23 @@
#define PTE_LEVEL_PAGE_SIZE(level) \
(1ULL << (12 + (9 * (level))))

-#define IOMMU_PTE_P (1ULL << 0)
-#define IOMMU_PTE_TV (1ULL << 1)
+/*
+ * Bit value definition for I/O PTE fields
+ */
+#define IOMMU_PTE_PR (1ULL << 0)
#define IOMMU_PTE_U (1ULL << 59)
#define IOMMU_PTE_FC (1ULL << 60)
#define IOMMU_PTE_IR (1ULL << 61)
#define IOMMU_PTE_IW (1ULL << 62)

+/*
+ * Bit value definition for DTE fields
+ */
+#define DTE_FLAG_V (1ULL << 0)
+#define DTE_FLAG_TV (1ULL << 1)
+#define DTE_FLAG_IR (1ULL << 61)
+#define DTE_FLAG_IW (1ULL << 62)
+
#define DTE_FLAG_IOTLB (1ULL << 32)
#define DTE_FLAG_GV (1ULL << 55)
#define DTE_FLAG_MASK (0x3ffULL << 32)
@@ -342,7 +352,7 @@
#define GCR3_VALID 0x01ULL

#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
-#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
+#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
#define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)

--
2.5.5

2017-07-21 08:59:40

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 05/13] iommu/amd: Add function copy_dev_tables()

Add function copy_dev_tables to copy the old DEV table entries of the panicked
kernel to the new allocated DEV table. Since all iommus share the same DTE table
the copy only need be done one time. Besides, we also need to:

- Check whether all IOMMUs actually use the same device table with the same size

- Verify that the size of the old device table is the expected size.

- Reserve the old domain id occupied in 1st kernel to avoid touching the old
io-page tables. Then on-flight DMA can continue looking it up.

And also define MACRO DEV_DOMID_MASK to replace magic number 0xffffULL, it can be
reused in copy_dev_tables().

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 2 +-
drivers/iommu/amd_iommu_init.c | 55 +++++++++++++++++++++++++++++++++++++++++
drivers/iommu/amd_iommu_types.h | 1 +
3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e5a03f259986..4d00f1bda900 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
flags |= tmp;
}

- flags &= ~(0xffffUL);
+ flags &= ~DEV_DOMID_MASK;
flags |= domain->id;

amd_iommu_dev_table[devid].data[1] = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index f6da5fe03b31..c58f091ce232 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -842,6 +842,61 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
}


+static int copy_dev_tables(void)
+{
+ struct dev_table_entry *old_devtb = NULL;
+ u32 lo, hi, devid, old_devtb_size;
+ phys_addr_t old_devtb_phys;
+ u64 entry, last_entry = 0;
+ struct amd_iommu *iommu;
+ u16 dom_id, dte_v;
+ static int copied;
+
+ for_each_iommu(iommu) {
+ if (!translation_pre_enabled(iommu)) {
+ pr_err("IOMMU:%d is not pre-enabled!/n",
+ iommu->index);
+ return -1;
+ }
+
+ /* All IOMMUs should use the same device table with the same size */
+ lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+ hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+ entry = (((u64) hi) << 32) + lo;
+ if (last_entry && last_entry != entry) {
+ pr_err("IOMMU:%d should use the same dev table as others!/n",
+ iommu->index);
+ return -1;
+ }
+ last_entry = entry;
+
+ old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+ if (old_devtb_size != dev_table_size) {
+ pr_err("The device table size of IOMMU:%d is not expected!/n",
+ iommu->index);
+ return -1;
+ }
+
+ old_devtb_phys = entry & PAGE_MASK;
+ old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+ if (!old_devtb)
+ return -1;
+
+ if (copied)
+ continue;
+ for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
+ amd_iommu_dev_table[devid] = old_devtb[devid];
+ dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
+ dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
+ if (dte_v && dom_id)
+ __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ }
+ memunmap(old_devtb);
+ copied = 1;
+ }
+ return 0;
+}
+
void amd_iommu_apply_erratum_63(u16 devid)
{
int sysmgt;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index ea36af19b5b9..1c06bcc06f5c 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -336,6 +336,7 @@
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DTE_GLX_SHIFT (56)
#define DTE_GLX_MASK (3)
+#define DEV_DOMID_MASK 0xffffULL

#define DTE_GCR3_VAL_A(x) (((x) >> 12) & 0x00007ULL)
#define DTE_GCR3_VAL_B(x) (((x) >> 15) & 0x0ffffULL)
--
2.5.5

2017-07-21 08:59:45

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 06/13] iommu/amd: copy old trans table from old kernel

Here several things need be done:
- If iommu is pre-enabled in a normal kernel, just disable it and print
warning.

- If failed to copy dev table of old kernel, continue to proceed as
it does in normal kernel.

- Disable and Re-enable event/cmd buffer, install the copied DTE table
to reg, and detect and enable guest vapic.

- Flush all caches

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu_init.c | 51 ++++++++++++++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index c58f091ce232..aa9b5918d11f 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -37,6 +37,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>

+#include <linux/crash_dump.h>
#include "amd_iommu_proto.h"
#include "amd_iommu_types.h"
#include "irq_remapping.h"
@@ -1489,9 +1490,12 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->int_enabled = false;

init_translation_status(iommu);
-
- if (translation_pre_enabled(iommu))
- pr_warn("Translation is already enabled - trying to copy translation structures\n");
+ if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
+ iommu_disable(iommu);
+ clear_translation_pre_enabled(iommu);
+ pr_warn("Translation was enabled for IOMMU:%d but we are not in kdump mode\n",
+ iommu->index);
+ }

ret = init_iommu_from_acpi(iommu, h);
if (ret)
@@ -1986,8 +1990,7 @@ static int __init init_memory_definitions(struct acpi_table_header *table)
}

/*
- * Init the device table to not allow DMA access for devices and
- * suppress all page faults
+ * Init the device table to not allow DMA access for devices
*/
static void init_device_table_dma(void)
{
@@ -2128,9 +2131,43 @@ static void early_enable_iommu(struct amd_iommu *iommu)
static void early_enable_iommus(void)
{
struct amd_iommu *iommu;
+ bool is_pre_enabled = false;

- for_each_iommu(iommu)
- early_enable_iommu(iommu);
+ for_each_iommu(iommu) {
+ if (translation_pre_enabled(iommu)) {
+ is_pre_enabled = true;
+ break;
+ }
+ }
+
+ if (!is_pre_enabled) {
+ for_each_iommu(iommu)
+ early_enable_iommu(iommu);
+ } else {
+ pr_warn("Translation is already enabled - trying to copy translation structures\n");
+ if (copy_dev_tables()) {
+ pr_err("Failed to copy DEV table from previous kernel.\n");
+ /*
+ * If failed to copy dev tables from old kernel, continue to proceed
+ * as it does in normal kernel.
+ */
+ for_each_iommu(iommu) {
+ clear_translation_pre_enabled(iommu);
+ early_enable_iommu(iommu);
+ }
+ } else {
+ pr_info("Copied DEV table from previous kernel.\n");
+ for_each_iommu(iommu) {
+ iommu_disable_command_buffer(iommu);
+ iommu_disable_event_buffer(iommu);
+ iommu_enable_command_buffer(iommu);
+ iommu_enable_event_buffer(iommu);
+ iommu_enable_ga(iommu);
+ iommu_set_device_table(iommu);
+ iommu_flush_all_caches(iommu);
+ }
+ }
+ }

#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
--
2.5.5

2017-07-21 08:59:51

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 08/13] iommu: Add is_attach_deferred call-back to iommu-ops

This new call-back will be used to check if the domain attach need be
deferred for now. If yes, the domain attach/detach will return directly.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/iommu.c | 8 ++++++++
include/linux/iommu.h | 1 +
2 files changed, 9 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea160afed..86581b115b92 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1283,6 +1283,10 @@ static int __iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
int ret;
+ if ((domain->ops->is_attach_deferred != NULL) &&
+ domain->ops->is_attach_deferred(domain, dev))
+ return 0;
+
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;

@@ -1324,6 +1328,10 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
static void __iommu_detach_device(struct iommu_domain *domain,
struct device *dev)
{
+ if ((domain->ops->is_attach_deferred != NULL) &&
+ domain->ops->is_attach_deferred(domain, dev))
+ return;
+
if (unlikely(domain->ops->detach_dev == NULL))
return;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..63983c9e6c3a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -225,6 +225,7 @@ struct iommu_ops {
u32 (*domain_get_windows)(struct iommu_domain *domain);

int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+ bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev);

unsigned long pgsize_bitmap;
};
--
2.5.5

2017-07-21 08:59:55

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

AMD pointed out it's unsafe to update the device-table while iommu
is enabled. It turns out that device-table pointer update is split
up into two 32bit writes in the IOMMU hardware. So updating it while
the IOMMU is enabled could have some nasty side effects.

The only way to work around this is to allocate the device-table below
4GB if translation is pre-enabled in kdump kernel. If allocation failed,
still use the old one.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu_init.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 052fa4a977d8..d7c301d0d672 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2149,11 +2149,23 @@ static void early_enable_iommu(struct amd_iommu *iommu)
*/
static void early_enable_iommus(void)
{
+ struct dev_table_entry *dev_tbl;
struct amd_iommu *iommu;
bool is_pre_enabled = false;

for_each_iommu(iommu) {
if (translation_pre_enabled(iommu)) {
+ gfp_t gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;;
+
+ dev_tbl = (void *)__get_free_pages(gfp_flag,
+ get_order(dev_table_size));
+ if (dev_tbl != NULL) {
+ memcpy(dev_tbl, amd_iommu_dev_table, dev_table_size);
+ free_pages((unsigned long)amd_iommu_dev_table,
+ get_order(dev_table_size));
+ amd_iommu_dev_table = dev_tbl;
+ }
+
is_pre_enabled = true;
break;
}
--
2.5.5

2017-07-21 09:00:10

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified

From: root <[email protected]>

It's ok to disable iommu in normal kernel. While there's no need
to disable it in kdump kernel after the on-flight dma issue has
heen fixed.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu_init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 8b4bac978062..880f693c809b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2494,7 +2494,8 @@ static int __init early_amd_iommu_init(void)
goto out;

/* Disable any previously enabled IOMMUs */
- disable_iommus();
+ if (amd_iommu_disabled)
+ disable_iommus();

if (amd_iommu_irq_remap)
amd_iommu_irq_remap = check_ioapic_information();
--
2.5.5

2017-07-21 09:00:01

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 11/13] iommu/amd: Don't copy GCR3 table root pointer

When iommu is pre_enabled in kdump kernel, if a device is set up with
guest translations (DTE.GV=1), then don't copy GCR3 table root pointer
but move the device over to an empty guest-cr3 table and handle the
faults in the PPR log (which answer them with INVALID). After all these
PPR faults are recoverable for the device and we should not allow the
device to change old-kernels data when we don't have to.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 28 +++-------------------------
drivers/iommu/amd_iommu_init.c | 11 +++++++++++
drivers/iommu/amd_iommu_proto.h | 1 +
drivers/iommu/amd_iommu_types.h | 24 ++++++++++++++++++++++++
drivers/iommu/amd_iommu_v2.c | 18 +++++++++++++++++-
5 files changed, 56 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4ff413b34b51..46d077784da0 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -103,30 +103,6 @@ int amd_iommu_max_glx_val = -1;
static const struct dma_map_ops amd_iommu_dma_ops;

/*
- * This struct contains device specific data for the IOMMU
- */
-struct iommu_dev_data {
- struct list_head list; /* For domain->dev_list */
- struct list_head dev_data_list; /* For global dev_data_list */
- struct protection_domain *domain; /* Domain the device is bound to */
- u16 devid; /* PCI Device ID */
- u16 alias; /* Alias Device ID */
- bool iommu_v2; /* Device can make use of IOMMUv2 */
- bool passthrough; /* Device is identity mapped */
- struct {
- bool enabled;
- int qdep;
- } ats; /* ATS state */
- bool pri_tlp; /* PASID TLB required for
- PPR completions */
- u32 errata; /* Bitmap for errata to apply */
- bool use_vapic; /* Enable device to use vapic mode */
- bool defer_attach;
-
- struct ratelimit_state rs; /* Ratelimit IOPF messages */
-};
-
-/*
* general struct to manage commands send to an IOMMU
*/
struct iommu_cmd {
@@ -386,10 +362,11 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
return dev_data;
}

-static struct iommu_dev_data *get_dev_data(struct device *dev)
+struct iommu_dev_data *get_dev_data(struct device *dev)
{
return dev->archdata.iommu;
}
+EXPORT_SYMBOL(get_dev_data);

/*
* Find or create an IOMMU group for a acpihid device.
@@ -2540,6 +2517,7 @@ static int dir2prot(enum dma_data_direction direction)
else
return 0;
}
+
/*
* This function contains common code for mapping of a physically
* contiguous memory region into DMA address space. It is used by all
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index d7c301d0d672..8b4bac978062 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -209,6 +209,7 @@ u16 *amd_iommu_alias_table;
* for a specific device. It is also indexed by the PCI device id.
*/
struct amd_iommu **amd_iommu_rlookup_table;
+EXPORT_SYMBOL(amd_iommu_rlookup_table);

/*
* This table is used to find the irq remapping table for a given device id
@@ -262,6 +263,7 @@ bool translation_pre_enabled(struct amd_iommu *iommu)
{
return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
}
+EXPORT_SYMBOL(translation_pre_enabled);

static void clear_translation_pre_enabled(struct amd_iommu *iommu)
{
@@ -852,6 +854,7 @@ static int copy_dev_tables(void)
struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
static int copied;
+ u64 tmp;

for_each_iommu(iommu) {
if (!translation_pre_enabled(iommu)) {
@@ -895,6 +898,14 @@ static int copy_dev_tables(void)
amd_iommu_dev_table[devid].data[1]
= old_devtb[devid].data[1];
__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ /* If gcr3 table existed, mask it out */
+ if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
+ tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+ amd_iommu_dev_table[devid].data[1] &= ~tmp;
+ tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+ amd_iommu_dev_table[devid].data[0] &= ~tmp;
+ }
}

irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index a9666d2005bb..90e62e9b01c5 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -88,4 +88,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
}

extern bool translation_pre_enabled(struct amd_iommu *iommu);
+extern struct iommu_dev_data *get_dev_data(struct device *dev);
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 7149fa52063f..32c68f2fc1dc 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -619,6 +619,30 @@ struct devid_map {
bool cmd_line;
};

+/*
+ * This struct contains device specific data for the IOMMU
+ */
+struct iommu_dev_data {
+ struct list_head list; /* For domain->dev_list */
+ struct list_head dev_data_list; /* For global dev_data_list */
+ struct protection_domain *domain; /* Domain the device is bound to */
+ u16 devid; /* PCI Device ID */
+ u16 alias; /* Alias Device ID */
+ bool iommu_v2; /* Device can make use of IOMMUv2 */
+ bool passthrough; /* Device is identity mapped */
+ struct {
+ bool enabled;
+ int qdep;
+ } ats; /* ATS state */
+ bool pri_tlp; /* PASID TLB required for
+ PPR completions */
+ u32 errata; /* Bitmap for errata to apply */
+ bool use_vapic; /* Enable device to use vapic mode */
+ bool defer_attach;
+
+ struct ratelimit_state rs; /* Ratelimit IOPF messages */
+};
+
/* Map HPET and IOAPIC ids to the devid used by the IOMMU */
extern struct list_head ioapic_map;
extern struct list_head hpet_map;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 6629c472eafd..0245d414a7b3 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -562,13 +562,29 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
unsigned long flags;
struct fault *fault;
bool finish;
- u16 tag;
+ u16 tag, devid;
int ret;
+ struct iommu_dev_data *dev_data;
+ struct pci_dev *pdev = NULL;

iommu_fault = data;
tag = iommu_fault->tag & 0x1ff;
finish = (iommu_fault->tag >> 9) & 1;

+ devid = iommu_fault->device_id;
+ pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+ if (!pdev)
+ return -ENODEV;
+ dev_data = get_dev_data(&pdev->dev);
+
+ /* In kdump kernel pci dev is not initialized yet -> send INVALID */
+ if (translation_pre_enabled(amd_iommu_rlookup_table[devid])
+ && dev_data->defer_attach) {
+ amd_iommu_complete_ppr(pdev, iommu_fault->pasid,
+ PPR_INVALID, tag);
+ goto out;
+ }
+
ret = NOTIFY_DONE;
dev_state = get_device_state(iommu_fault->device_id);
if (dev_state == NULL)
--
2.5.5

2017-07-21 08:59:59

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 12/13] iommu/amd: Clear out the GV flag when handle deferred domain attach

When handle deferred domain attach, we need check if the domain is
v2. If not, should try to clear out the GV flag which could be
copied from the old device table entry.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 46d077784da0..98aaccecbb76 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2085,6 +2085,11 @@ static void clear_dte_entry(u16 devid)
amd_iommu_apply_erratum_63(devid);
}

+static void clear_dte_flag_gv(u16 devid)
+{
+ amd_iommu_dev_table[devid].data[0] &= (~DTE_FLAG_GV);
+}
+
static void do_attach(struct iommu_dev_data *dev_data,
struct protection_domain *domain)
{
@@ -2459,6 +2464,7 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
*/
static struct protection_domain *get_domain(struct device *dev)
{
+ struct iommu_dev_data *dev_data = get_dev_data(dev);
struct protection_domain *domain;
struct iommu_domain *io_domain;

@@ -2466,11 +2472,21 @@ static struct protection_domain *get_domain(struct device *dev)
return ERR_PTR(-EINVAL);

domain = get_dev_data(dev)->domain;
- if (domain == NULL && get_dev_data(dev)->defer_attach) {
+ if (domain == NULL && dev_data->defer_attach) {
+ u16 alias = amd_iommu_alias_table[dev_data->devid];
get_dev_data(dev)->defer_attach = false;
io_domain = iommu_get_domain_for_dev(dev);
domain = to_pdomain(io_domain);
attach_device(dev, domain);
+ /*
+ * If the deferred attached domain is not v2, should clear out
+ * the old GV flag.
+ */
+ if (!(domain->flags & PD_IOMMUV2_MASK)) {
+ clear_dte_flag_gv(dev_data->devid);
+ if (alias != dev_data->devid)
+ clear_dte_flag_gv(dev_data->devid);
+ }
}
if (!dma_ops_domain(domain))
return ERR_PTR(-EBUSY);
--
2.5.5

2017-07-21 09:01:23

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 09/13] iommu/amd: Use is_attach_deferred call-back

Implement call-back is_attach_deferred and use it to defer the
domain attach from iommu driver init to device driver init when
iommu is pre-enabled in kdump kernel.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 5508f57a3e4f..4ff413b34b51 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -121,6 +121,7 @@ struct iommu_dev_data {
PPR completions */
u32 errata; /* Bitmap for errata to apply */
bool use_vapic; /* Enable device to use vapic mode */
+ bool defer_attach;

struct ratelimit_state rs; /* Ratelimit IOPF messages */
};
@@ -371,12 +372,17 @@ static u16 get_alias(struct device *dev)
static struct iommu_dev_data *find_dev_data(u16 devid)
{
struct iommu_dev_data *dev_data;
+ struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];

dev_data = search_dev_data(devid);

- if (dev_data == NULL)
+ if (dev_data == NULL) {
dev_data = alloc_dev_data(devid);

+ if (translation_pre_enabled(iommu))
+ dev_data->defer_attach = true;
+ }
+
return dev_data;
}

@@ -2477,11 +2483,18 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
static struct protection_domain *get_domain(struct device *dev)
{
struct protection_domain *domain;
+ struct iommu_domain *io_domain;

if (!check_device(dev))
return ERR_PTR(-EINVAL);

domain = get_dev_data(dev)->domain;
+ if (domain == NULL && get_dev_data(dev)->defer_attach) {
+ get_dev_data(dev)->defer_attach = false;
+ io_domain = iommu_get_domain_for_dev(dev);
+ domain = to_pdomain(io_domain);
+ attach_device(dev, domain);
+ }
if (!dma_ops_domain(domain))
return ERR_PTR(-EBUSY);

@@ -3372,6 +3385,13 @@ static void amd_iommu_apply_resv_region(struct device *dev,
WARN_ON_ONCE(reserve_iova(&dma_dom->iovad, start, end) == NULL);
}

+static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct iommu_dev_data *dev_data = dev->archdata.iommu;
+ return dev_data->defer_attach;
+}
+
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3388,6 +3408,7 @@ const struct iommu_ops amd_iommu_ops = {
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = amd_iommu_put_resv_regions,
.apply_resv_region = amd_iommu_apply_resv_region,
+ .is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap = AMD_IOMMU_PGSIZES,
};

--
2.5.5

2017-07-21 09:01:39

by Baoquan He

[permalink] [raw]
Subject: [PATCH v8 07/13] iommu/amd: Do sanity check for irq remap of old dev table entry

Firstly split the dev table entry copy into address translation part and
irq remapping part. Because these two parts could be enabled
independently.

Secondly check if IntCtl and IntTabLen are 10b and 1000b if they are
set.

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu.c | 5 -----
drivers/iommu/amd_iommu_init.c | 25 ++++++++++++++++++++++---
drivers/iommu/amd_iommu_types.h | 8 ++++++++
3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4d00f1bda900..5508f57a3e4f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3776,11 +3776,6 @@ EXPORT_SYMBOL(amd_iommu_device_info);

static struct irq_chip amd_ir_chip;

-#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
-#define DTE_IRQ_REMAP_INTCTL (2ULL << 60)
-#define DTE_IRQ_TABLE_LEN (8ULL << 1)
-#define DTE_IRQ_REMAP_ENABLE 1ULL
-
static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
{
u64 dte;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index aa9b5918d11f..052fa4a977d8 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -845,12 +845,12 @@ static int get_dev_entry_bit(u16 devid, u8 bit)

static int copy_dev_tables(void)
{
+ u64 int_ctl, int_tab_len, entry, last_entry = 0;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
- u64 entry, last_entry = 0;
struct amd_iommu *iommu;
- u16 dom_id, dte_v;
+ u16 dom_id, dte_v, irq_v;
static int copied;

for_each_iommu(iommu) {
@@ -889,8 +889,27 @@ static int copy_dev_tables(void)
amd_iommu_dev_table[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
- if (dte_v && dom_id)
+ if (dte_v && dom_id) {
+ amd_iommu_dev_table[devid].data[0]
+ = old_devtb[devid].data[0];
+ amd_iommu_dev_table[devid].data[1]
+ = old_devtb[devid].data[1];
__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ }
+
+ irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
+ int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
+ int_tab_len = old_devtb[devid].data[2] & DTE_IRQ_TABLE_LEN_MASK;
+ if (irq_v && (int_ctl || int_tab_len)) {
+ if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
+ (int_tab_len != DTE_IRQ_TABLE_LEN)) {
+ pr_err("Wrong old irq remapping flag: %#x\n", devid);
+ return -1;
+ }
+
+ amd_iommu_dev_table[devid].data[2]
+ = old_devtb[devid].data[2];
+ }
}
memunmap(old_devtb);
copied = 1;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 1c06bcc06f5c..7149fa52063f 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -250,6 +250,14 @@

#define GA_GUEST_NR 0x1

+/* Bit value definition for dte irq remapping fields*/
+#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
+#define DTE_IRQ_REMAP_INTCTL_MASK (0x3ULL << 60)
+#define DTE_IRQ_TABLE_LEN_MASK (0xfULL << 1)
+#define DTE_IRQ_REMAP_INTCTL (2ULL << 60)
+#define DTE_IRQ_TABLE_LEN (8ULL << 1)
+#define DTE_IRQ_REMAP_ENABLE 1ULL
+
#define PAGE_MODE_NONE 0x00
#define PAGE_MODE_1_LEVEL 0x01
#define PAGE_MODE_2_LEVEL 0x02
--
2.5.5

2017-07-23 22:34:36

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 11/13] iommu/amd: Don't copy GCR3 table root pointer

Hi Baoquan,

[auto build test WARNING on iommu/next]
[also build test WARNING on v4.13-rc1]
[cannot apply to next-20170721]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Baoquan-He/Fix-the-on-flight-DMA-issue-on-system-with-amd-iommu/20170724-060048
base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: x86_64-randconfig-x005-201730 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

drivers//iommu/amd_iommu_v2.c: In function 'ppr_notifier':
>> drivers//iommu/amd_iommu_v2.c:566:6: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
int ret;
^~~

vim +/ret +566 drivers//iommu/amd_iommu_v2.c

028eeacc Joerg Roedel 2011-11-24 556
028eeacc Joerg Roedel 2011-11-24 557 static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
028eeacc Joerg Roedel 2011-11-24 558 {
028eeacc Joerg Roedel 2011-11-24 559 struct amd_iommu_fault *iommu_fault;
028eeacc Joerg Roedel 2011-11-24 560 struct pasid_state *pasid_state;
028eeacc Joerg Roedel 2011-11-24 561 struct device_state *dev_state;
028eeacc Joerg Roedel 2011-11-24 562 unsigned long flags;
028eeacc Joerg Roedel 2011-11-24 563 struct fault *fault;
028eeacc Joerg Roedel 2011-11-24 564 bool finish;
49f3c23d Baoquan He 2017-07-21 565 u16 tag, devid;
028eeacc Joerg Roedel 2011-11-24 @566 int ret;
49f3c23d Baoquan He 2017-07-21 567 struct iommu_dev_data *dev_data;
49f3c23d Baoquan He 2017-07-21 568 struct pci_dev *pdev = NULL;
028eeacc Joerg Roedel 2011-11-24 569
028eeacc Joerg Roedel 2011-11-24 570 iommu_fault = data;
028eeacc Joerg Roedel 2011-11-24 571 tag = iommu_fault->tag & 0x1ff;
028eeacc Joerg Roedel 2011-11-24 572 finish = (iommu_fault->tag >> 9) & 1;
028eeacc Joerg Roedel 2011-11-24 573
49f3c23d Baoquan He 2017-07-21 574 devid = iommu_fault->device_id;
49f3c23d Baoquan He 2017-07-21 575 pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
49f3c23d Baoquan He 2017-07-21 576 if (!pdev)
49f3c23d Baoquan He 2017-07-21 577 return -ENODEV;
49f3c23d Baoquan He 2017-07-21 578 dev_data = get_dev_data(&pdev->dev);
49f3c23d Baoquan He 2017-07-21 579
49f3c23d Baoquan He 2017-07-21 580 /* In kdump kernel pci dev is not initialized yet -> send INVALID */
49f3c23d Baoquan He 2017-07-21 581 if (translation_pre_enabled(amd_iommu_rlookup_table[devid])
49f3c23d Baoquan He 2017-07-21 582 && dev_data->defer_attach) {
49f3c23d Baoquan He 2017-07-21 583 amd_iommu_complete_ppr(pdev, iommu_fault->pasid,
49f3c23d Baoquan He 2017-07-21 584 PPR_INVALID, tag);
49f3c23d Baoquan He 2017-07-21 585 goto out;
49f3c23d Baoquan He 2017-07-21 586 }
49f3c23d Baoquan He 2017-07-21 587
028eeacc Joerg Roedel 2011-11-24 588 ret = NOTIFY_DONE;
028eeacc Joerg Roedel 2011-11-24 589 dev_state = get_device_state(iommu_fault->device_id);
028eeacc Joerg Roedel 2011-11-24 590 if (dev_state == NULL)
028eeacc Joerg Roedel 2011-11-24 591 goto out;
028eeacc Joerg Roedel 2011-11-24 592
028eeacc Joerg Roedel 2011-11-24 593 pasid_state = get_pasid_state(dev_state, iommu_fault->pasid);
53d340ef Joerg Roedel 2014-07-08 594 if (pasid_state == NULL || pasid_state->invalid) {
028eeacc Joerg Roedel 2011-11-24 595 /* We know the device but not the PASID -> send INVALID */
028eeacc Joerg Roedel 2011-11-24 596 amd_iommu_complete_ppr(dev_state->pdev, iommu_fault->pasid,
028eeacc Joerg Roedel 2011-11-24 597 PPR_INVALID, tag);
028eeacc Joerg Roedel 2011-11-24 598 goto out_drop_state;
028eeacc Joerg Roedel 2011-11-24 599 }
028eeacc Joerg Roedel 2011-11-24 600
028eeacc Joerg Roedel 2011-11-24 601 spin_lock_irqsave(&pasid_state->lock, flags);
028eeacc Joerg Roedel 2011-11-24 602 atomic_inc(&pasid_state->pri[tag].inflight);
028eeacc Joerg Roedel 2011-11-24 603 if (finish)
028eeacc Joerg Roedel 2011-11-24 604 pasid_state->pri[tag].finish = true;
028eeacc Joerg Roedel 2011-11-24 605 spin_unlock_irqrestore(&pasid_state->lock, flags);
028eeacc Joerg Roedel 2011-11-24 606
028eeacc Joerg Roedel 2011-11-24 607 fault = kzalloc(sizeof(*fault), GFP_ATOMIC);
028eeacc Joerg Roedel 2011-11-24 608 if (fault == NULL) {
028eeacc Joerg Roedel 2011-11-24 609 /* We are OOM - send success and let the device re-fault */
028eeacc Joerg Roedel 2011-11-24 610 finish_pri_tag(dev_state, pasid_state, tag);
028eeacc Joerg Roedel 2011-11-24 611 goto out_drop_state;
028eeacc Joerg Roedel 2011-11-24 612 }
028eeacc Joerg Roedel 2011-11-24 613
028eeacc Joerg Roedel 2011-11-24 614 fault->dev_state = dev_state;
028eeacc Joerg Roedel 2011-11-24 615 fault->address = iommu_fault->address;
028eeacc Joerg Roedel 2011-11-24 616 fault->state = pasid_state;
028eeacc Joerg Roedel 2011-11-24 617 fault->tag = tag;
028eeacc Joerg Roedel 2011-11-24 618 fault->finish = finish;
b00675b8 Alexey Skidanov 2014-07-08 619 fault->pasid = iommu_fault->pasid;
028eeacc Joerg Roedel 2011-11-24 620 fault->flags = iommu_fault->flags;
028eeacc Joerg Roedel 2011-11-24 621 INIT_WORK(&fault->work, do_fault);
028eeacc Joerg Roedel 2011-11-24 622
028eeacc Joerg Roedel 2011-11-24 623 queue_work(iommu_wq, &fault->work);
028eeacc Joerg Roedel 2011-11-24 624
028eeacc Joerg Roedel 2011-11-24 625 ret = NOTIFY_OK;
028eeacc Joerg Roedel 2011-11-24 626
028eeacc Joerg Roedel 2011-11-24 627 out_drop_state:
dc88db7e Joerg Roedel 2014-07-08 628
dc88db7e Joerg Roedel 2014-07-08 629 if (ret != NOTIFY_OK && pasid_state)
dc88db7e Joerg Roedel 2014-07-08 630 put_pasid_state(pasid_state);
dc88db7e Joerg Roedel 2014-07-08 631
028eeacc Joerg Roedel 2011-11-24 632 put_device_state(dev_state);
028eeacc Joerg Roedel 2011-11-24 633
028eeacc Joerg Roedel 2011-11-24 634 out:
028eeacc Joerg Roedel 2011-11-24 635 return ret;
028eeacc Joerg Roedel 2011-11-24 636 }
028eeacc Joerg Roedel 2011-11-24 637

:::::: The code at line 566 was first introduced by commit
:::::: 028eeacc412a8bebf6711e58629b0cab56a9ba87 iommu/amd: Implement IO page-fault handler

:::::: TO: Joerg Roedel <[email protected]>
:::::: CC: Joerg Roedel <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (6.90 kB)
.config.gz (25.23 kB)
Download all attachments

2017-07-23 23:54:06

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 11/13] iommu/amd: Don't copy GCR3 table root pointer

On 07/24/17 at 06:33am, kbuild test robot wrote:
> Hi Baoquan,
>
> [auto build test WARNING on iommu/next]
> [also build test WARNING on v4.13-rc1]
> [cannot apply to next-20170721]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Baoquan-He/Fix-the-on-flight-DMA-issue-on-system-with-amd-iommu/20170724-060048
> base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
> config: x86_64-randconfig-x005-201730 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
> http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
>
> All warnings (new ones prefixed by >>):
>
> drivers//iommu/amd_iommu_v2.c: In function 'ppr_notifier':
> >> drivers//iommu/amd_iommu_v2.c:566:6: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
> int ret;
> ^~~
>
> vim +/ret +566 drivers//iommu/amd_iommu_v2.c

Thanks, it should return NOTIFY_DONE anyway when ppr faults is handled
in kdump kernel since the GCR3 table root pointer has been made NULL
intentionally.

I will add this into patch 11/13 when repost need be done.


>From 742d8a51d8832e12884800840c4ebe802767d808 Mon Sep 17 00:00:00 2001
From: Baoquan He <[email protected]>
Date: Mon, 24 Jul 2017 07:48:10 +0800
Subject: [PATCH] iommu/amd: The ppr faults handled in kdump kernel should
return NOTIFY_DONE

Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd_iommu_v2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 0245d414a7b3..e705fac89cb4 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -578,6 +578,7 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
dev_data = get_dev_data(&pdev->dev);

/* In kdump kernel pci dev is not initialized yet -> send INVALID */
+ ret = NOTIFY_DONE;
if (translation_pre_enabled(amd_iommu_rlookup_table[devid])
&& dev_data->defer_attach) {
amd_iommu_complete_ppr(pdev, iommu_fault->pasid,
@@ -585,7 +586,6 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
goto out;
}

- ret = NOTIFY_DONE;
dev_state = get_device_state(iommu_fault->device_id);
if (dev_state == NULL)
goto out;
--
2.5.5

2017-07-27 15:04:49

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] iommu/amd: Detect pre enabled translation

On Fri, Jul 21, 2017 at 04:58:59PM +0800, Baoquan He wrote:
> diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
> index 294a409e283b..d15966b62b33 100644
> --- a/drivers/iommu/amd_iommu_types.h
> +++ b/drivers/iommu/amd_iommu_types.h
> @@ -417,6 +417,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
> #define APERTURE_PAGE_INDEX(a) (((a) >> 21) & 0x3fULL)
>
>
> +

Forgot to remove that from the diff?

> /*
> * This struct is used to pass information about
> * incoming PPR faults around.
> @@ -435,6 +436,8 @@ struct iommu_domain;
> struct irq_domain;
> struct amd_irte_ops;
>
> +#define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0)
> +
> /*
> * This structure contains generic data for IOMMU protection domains
> * independent of their use.

2017-07-27 15:06:16

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 02/13] iommu/amd: add several helper functions

On Fri, Jul 21, 2017 at 04:59:00PM +0800, Baoquan He wrote:
> Move single iommu enabling codes into a wrapper function early_enable_iommu().
> This can make later kdump change easier.
>
> And also add iommu_disable_command_buffer and iommu_disable_event_buffer
> for later usage.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
> 1 file changed, 31 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index e39857ce6481..4ca6e3257d92 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
> amd_iommu_reset_cmd_buffer(iommu);
> }
>
> +/*
> + * This function disables the command buffer
> + */
> +static void iommu_disable_command_buffer(struct amd_iommu *iommu)
> +{
> + iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
> +}
> +
> static void __init free_command_buffer(struct amd_iommu *iommu)
> {
> free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
> @@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
> iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> }
>
> +/*
> + * This function disables the command buffer

s/command buffer/event log/

> + */
> +static void iommu_disable_event_buffer(struct amd_iommu *iommu)

Please also use event_log here.

> +{
> + iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> +}
> +

2017-07-27 15:29:29

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 05/13] iommu/amd: Add function copy_dev_tables()

On Fri, Jul 21, 2017 at 04:59:03PM +0800, Baoquan He wrote:
> Add function copy_dev_tables to copy the old DEV table entries of the panicked

Since there is only one (for now), you can name the function in
singular: copy_dev_table() or copy_device_table().

> kernel to the new allocated DEV table. Since all iommus share the same DTE table
> the copy only need be done one time. Besides, we also need to:
>
> - Check whether all IOMMUs actually use the same device table with the same size
>
> - Verify that the size of the old device table is the expected size.
>
> - Reserve the old domain id occupied in 1st kernel to avoid touching the old
> io-page tables. Then on-flight DMA can continue looking it up.
>
> And also define MACRO DEV_DOMID_MASK to replace magic number 0xffffULL, it can be
> reused in copy_dev_tables().
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> drivers/iommu/amd_iommu.c | 2 +-
> drivers/iommu/amd_iommu_init.c | 55 +++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/amd_iommu_types.h | 1 +
> 3 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index e5a03f259986..4d00f1bda900 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
> flags |= tmp;
> }
>
> - flags &= ~(0xffffUL);
> + flags &= ~DEV_DOMID_MASK;
> flags |= domain->id;
>
> amd_iommu_dev_table[devid].data[1] = flags;
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index f6da5fe03b31..c58f091ce232 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -842,6 +842,61 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
> }
>
>
> +static int copy_dev_tables(void)
> +{
> + struct dev_table_entry *old_devtb = NULL;
> + u32 lo, hi, devid, old_devtb_size;
> + phys_addr_t old_devtb_phys;
> + u64 entry, last_entry = 0;
> + struct amd_iommu *iommu;
> + u16 dom_id, dte_v;
> + static int copied;
> +
> + for_each_iommu(iommu) {
> + if (!translation_pre_enabled(iommu)) {
> + pr_err("IOMMU:%d is not pre-enabled!/n",
> + iommu->index);
> + return -1;
> + }
> +
> + /* All IOMMUs should use the same device table with the same size */
> + lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
> + hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
> + entry = (((u64) hi) << 32) + lo;
> + if (last_entry && last_entry != entry) {
> + pr_err("IOMMU:%d should use the same dev table as others!/n",
> + iommu->index);
> + return -1;
> + }
> + last_entry = entry;
> +
> + old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
> + if (old_devtb_size != dev_table_size) {
> + pr_err("The device table size of IOMMU:%d is not expected!/n",
> + iommu->index);
> + return -1;
> + }

I think the loop can end here. There is only one table to copy, so you
don't need to copy it for every iommu. Just do the checks in the loop
and the copy once after the loop.

> +
> + old_devtb_phys = entry & PAGE_MASK;
> + old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> + if (!old_devtb)
> + return -1;
> +
> + if (copied)
> + continue;
> + for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
> + amd_iommu_dev_table[devid] = old_devtb[devid];
> + dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
> + dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
> + if (dte_v && dom_id)
> + __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
> + }

Please only copy the DTE when DTE.V is set in the old table. Also
don't copy any of the IOMMUv2 enablement bits from the old DTE. PRI
faults are recoverable by design and a device shouldn't fail on a
negative answer from the IOMMU.

2017-07-27 15:38:28

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 06/13] iommu/amd: copy old trans table from old kernel

On Fri, Jul 21, 2017 at 04:59:04PM +0800, Baoquan He wrote:
> @@ -2128,9 +2131,43 @@ static void early_enable_iommu(struct amd_iommu *iommu)
> static void early_enable_iommus(void)
> {
> struct amd_iommu *iommu;
> + bool is_pre_enabled = false;
>
> - for_each_iommu(iommu)
> - early_enable_iommu(iommu);
> + for_each_iommu(iommu) {
> + if (translation_pre_enabled(iommu)) {
> + is_pre_enabled = true;
> + break;
> + }
> + }

is_pre_enabled should only be true when _all_ iommus are pre-enabled. If
only one is found disabled just disable the others and continue without
copying the device table.

2017-07-27 15:53:31

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] iommu/amd: Use is_attach_deferred call-back

On Fri, Jul 21, 2017 at 04:59:07PM +0800, Baoquan He wrote:
> +static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
> + struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev->archdata.iommu;
> + return dev_data->defer_attach;
> +}

If the redundant newline from patch 1 needs a new home, put it here :-)

2017-07-27 15:55:54

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

On Fri, Jul 21, 2017 at 04:59:08PM +0800, Baoquan He wrote:
> AMD pointed out it's unsafe to update the device-table while iommu
> is enabled. It turns out that device-table pointer update is split
> up into two 32bit writes in the IOMMU hardware. So updating it while
> the IOMMU is enabled could have some nasty side effects.
>
> The only way to work around this is to allocate the device-table below
> 4GB if translation is pre-enabled in kdump kernel. If allocation failed,
> still use the old one.

Not only for the kdump kernel. The old device table must also be below
4GB so that its pointer can be updated with a 32bit write.

If the old table is above 4GB you still need the second write to zero
the upper parts of the pointer in hardware.

2017-07-27 15:57:59

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 11/13] iommu/amd: Don't copy GCR3 table root pointer

On Fri, Jul 21, 2017 at 04:59:09PM +0800, Baoquan He wrote:
> When iommu is pre_enabled in kdump kernel, if a device is set up with
> guest translations (DTE.GV=1), then don't copy GCR3 table root pointer
> but move the device over to an empty guest-cr3 table and handle the
> faults in the PPR log (which answer them with INVALID). After all these
> PPR faults are recoverable for the device and we should not allow the
> device to change old-kernels data when we don't have to.

Okay, forget my previous comment about disabling IOMMUv2 features while
copying. Its done in this patch.

2017-07-27 15:58:56

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified

On Fri, Jul 21, 2017 at 04:59:11PM +0800, Baoquan He wrote:
> From: root <[email protected]>

You probaly need to reset the author on this one.

>
> It's ok to disable iommu in normal kernel. While there's no need
> to disable it in kdump kernel after the on-flight dma issue has
> heen fixed.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> drivers/iommu/amd_iommu_init.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 8b4bac978062..880f693c809b 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2494,7 +2494,8 @@ static int __init early_amd_iommu_init(void)
> goto out;
>
> /* Disable any previously enabled IOMMUs */
> - disable_iommus();
> + if (amd_iommu_disabled)
> + disable_iommus();
>
> if (amd_iommu_irq_remap)
> amd_iommu_irq_remap = check_ioapic_information();
> --
> 2.5.5

2017-07-28 02:31:04

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 01/13] iommu/amd: Detect pre enabled translation

On 07/27/17 at 05:04pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:58:59PM +0800, Baoquan He wrote:
> > diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
> > index 294a409e283b..d15966b62b33 100644
> > --- a/drivers/iommu/amd_iommu_types.h
> > +++ b/drivers/iommu/amd_iommu_types.h
> > @@ -417,6 +417,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
> > #define APERTURE_PAGE_INDEX(a) (((a) >> 21) & 0x3fULL)
> >
> >
> > +
>
> Forgot to remove that from the diff?

Many thanks for your reviewing and great suggestions, Joerg!

Will withdraw this change in this patch.

>
> > /*
> > * This struct is used to pass information about
> > * incoming PPR faults around.
> > @@ -435,6 +436,8 @@ struct iommu_domain;
> > struct irq_domain;
> > struct amd_irte_ops;
> >
> > +#define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0)
> > +
> > /*
> > * This structure contains generic data for IOMMU protection domains
> > * independent of their use.

2017-07-28 02:37:07

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 02/13] iommu/amd: add several helper functions

On 07/27/17 at 05:06pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:00PM +0800, Baoquan He wrote:
> > Move single iommu enabling codes into a wrapper function early_enable_iommu().
> > This can make later kdump change easier.
> >
> > And also add iommu_disable_command_buffer and iommu_disable_event_buffer
> > for later usage.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
> > 1 file changed, 31 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index e39857ce6481..4ca6e3257d92 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
> > amd_iommu_reset_cmd_buffer(iommu);
> > }
> >
> > +/*
> > + * This function disables the command buffer
> > + */
> > +static void iommu_disable_command_buffer(struct amd_iommu *iommu)
> > +{
> > + iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
> > +}
> > +
> > static void __init free_command_buffer(struct amd_iommu *iommu)
> > {
> > free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
> > @@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
> > iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> > }
> >
> > +/*
> > + * This function disables the command buffer
>
> s/command buffer/event log/

Forgot changing it after copying from command buffer code.

>
> > + */
> > +static void iommu_disable_event_buffer(struct amd_iommu *iommu)
>
> Please also use event_log here.

Sure, will change. Thanks.

>
> > +{
> > + iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +}
> > +

2017-07-28 03:59:21

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 05/13] iommu/amd: Add function copy_dev_tables()

On 07/27/17 at 05:29pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:03PM +0800, Baoquan He wrote:
> > Add function copy_dev_tables to copy the old DEV table entries of the panicked
>
> Since there is only one (for now), you can name the function in
> singular: copy_dev_table() or copy_device_table().

Make sense, will change it to copy_device_table(). Thanks.

>
> > kernel to the new allocated DEV table. Since all iommus share the same DTE table
> > the copy only need be done one time. Besides, we also need to:
> >
> > - Check whether all IOMMUs actually use the same device table with the same size
> >
> > - Verify that the size of the old device table is the expected size.
> >
> > - Reserve the old domain id occupied in 1st kernel to avoid touching the old
> > io-page tables. Then on-flight DMA can continue looking it up.
> >
> > And also define MACRO DEV_DOMID_MASK to replace magic number 0xffffULL, it can be
> > reused in copy_dev_tables().
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > drivers/iommu/amd_iommu.c | 2 +-
> > drivers/iommu/amd_iommu_init.c | 55 +++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/amd_iommu_types.h | 1 +
> > 3 files changed, 57 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index e5a03f259986..4d00f1bda900 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
> > flags |= tmp;
> > }
> >
> > - flags &= ~(0xffffUL);
> > + flags &= ~DEV_DOMID_MASK;
> > flags |= domain->id;
> >
> > amd_iommu_dev_table[devid].data[1] = flags;
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index f6da5fe03b31..c58f091ce232 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -842,6 +842,61 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
> > }
> >
> >
> > +static int copy_dev_tables(void)
> > +{
> > + struct dev_table_entry *old_devtb = NULL;
> > + u32 lo, hi, devid, old_devtb_size;
> > + phys_addr_t old_devtb_phys;
> > + u64 entry, last_entry = 0;
> > + struct amd_iommu *iommu;
> > + u16 dom_id, dte_v;
> > + static int copied;
> > +
> > + for_each_iommu(iommu) {
> > + if (!translation_pre_enabled(iommu)) {
> > + pr_err("IOMMU:%d is not pre-enabled!/n",
> > + iommu->index);
> > + return -1;
> > + }
> > +
> > + /* All IOMMUs should use the same device table with the same size */
> > + lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
> > + hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
> > + entry = (((u64) hi) << 32) + lo;
> > + if (last_entry && last_entry != entry) {
> > + pr_err("IOMMU:%d should use the same dev table as others!/n",
> > + iommu->index);
> > + return -1;
> > + }
> > + last_entry = entry;
> > +
> > + old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
> > + if (old_devtb_size != dev_table_size) {
> > + pr_err("The device table size of IOMMU:%d is not expected!/n",
> > + iommu->index);
> > + return -1;
> > + }
>
> I think the loop can end here. There is only one table to copy, so you
> don't need to copy it for every iommu. Just do the checks in the loop
> and the copy once after the loop.

Ok, will change. I worried that device table addr might not be stored
into iommu register correctly. I am fine we only do the check only at
the first time.

>
> > +
> > + old_devtb_phys = entry & PAGE_MASK;
> > + old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
> > + if (!old_devtb)
> > + return -1;
> > +
> > + if (copied)
> > + continue;
> > + for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
> > + amd_iommu_dev_table[devid] = old_devtb[devid];
> > + dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
> > + dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
> > + if (dte_v && dom_id)
> > + __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
> > + }
>
> Please only copy the DTE when DTE.V is set in the old table. Also

Sorry, the sanity check about DTE.V has been done in patch 7/13. I
should rewrite the subject and git log for patch 7/13.

> don't copy any of the IOMMUv2 enablement bits from the old DTE. PRI
> faults are recoverable by design and a device shouldn't fail on a
> negative answer from the IOMMU.

Yes, this is done in patch 11/13. Saw your comment in patch 11/13.

>

2017-07-28 04:02:43

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 09/13] iommu/amd: Use is_attach_deferred call-back

On 07/27/17 at 05:53pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:07PM +0800, Baoquan He wrote:
> > +static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
> > + struct device *dev)
> > +{
> > + struct iommu_dev_data *dev_data = dev->archdata.iommu;
> > + return dev_data->defer_attach;
> > +}
>
> If the redundant newline from patch 1 needs a new home, put it here :-)

Got it, will add a newline at this place. Thanks.

2017-07-28 06:52:11

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 06/13] iommu/amd: copy old trans table from old kernel

On 07/27/17 at 05:38pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:04PM +0800, Baoquan He wrote:
> > @@ -2128,9 +2131,43 @@ static void early_enable_iommu(struct amd_iommu *iommu)
> > static void early_enable_iommus(void)
> > {
> > struct amd_iommu *iommu;
> > + bool is_pre_enabled = false;
> >
> > - for_each_iommu(iommu)
> > - early_enable_iommu(iommu);
> > + for_each_iommu(iommu) {
> > + if (translation_pre_enabled(iommu)) {
> > + is_pre_enabled = true;
> > + break;
> > + }
> > + }
>
> is_pre_enabled should only be true when _all_ iommus are pre-enabled. If
> only one is found disabled just disable the others and continue without
> copying the device table.

OK, will change as you suggested.

>

2017-07-28 09:06:24

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

Hi Joerg,

On 07/27/17 at 05:55pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:08PM +0800, Baoquan He wrote:
> > AMD pointed out it's unsafe to update the device-table while iommu
> > is enabled. It turns out that device-table pointer update is split
> > up into two 32bit writes in the IOMMU hardware. So updating it while
> > the IOMMU is enabled could have some nasty side effects.
> >
> > The only way to work around this is to allocate the device-table below
> > 4GB if translation is pre-enabled in kdump kernel. If allocation failed,
> > still use the old one.
>
> Not only for the kdump kernel. The old device table must also be below
> 4GB so that its pointer can be updated with a 32bit write.
>
> If the old table is above 4GB you still need the second write to zero
> the upper parts of the pointer in hardware.

Do you mean the allocation of amd_iommu_dev_table in
early_amd_iommu_init() also need be addressed for 1st kernel? Seems we
don't make sure that for 1st kernel, like adding GFP_DMA32 flag when
allocate amd_iommu_dev_table in amd_iommu_dev_table
early_amd_iommu_init().

2017-07-28 09:07:05

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 11/13] iommu/amd: Don't copy GCR3 table root pointer

On 07/27/17 at 05:57pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:09PM +0800, Baoquan He wrote:
> > When iommu is pre_enabled in kdump kernel, if a device is set up with
> > guest translations (DTE.GV=1), then don't copy GCR3 table root pointer
> > but move the device over to an empty guest-cr3 table and handle the
> > faults in the PPR log (which answer them with INVALID). After all these
> > PPR faults are recoverable for the device and we should not allow the
> > device to change old-kernels data when we don't have to.
>
> Okay, forget my previous comment about disabling IOMMUv2 features while
> copying. Its done in this patch.

Yeah, as you said. Thanks.

>

2017-07-28 09:08:47

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 13/13] iommu/amd: Disable iommu only if amd_iommu=off is specified

On 07/27/17 at 05:58pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:11PM +0800, Baoquan He wrote:
> > From: root <[email protected]>
>
> You probaly need to reset the author on this one.

Oops, sorry. I made this patch on a testing machine. Will correct it.

Thanks a lot for all these comments and great suggestions.

>
> >
> > It's ok to disable iommu in normal kernel. While there's no need
> > to disable it in kdump kernel after the on-flight dma issue has
> > heen fixed.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > drivers/iommu/amd_iommu_init.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index 8b4bac978062..880f693c809b 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2494,7 +2494,8 @@ static int __init early_amd_iommu_init(void)
> > goto out;
> >
> > /* Disable any previously enabled IOMMUs */
> > - disable_iommus();
> > + if (amd_iommu_disabled)
> > + disable_iommus();
> >
> > if (amd_iommu_irq_remap)
> > amd_iommu_irq_remap = check_ioapic_information();
> > --
> > 2.5.5

2017-07-28 11:14:43

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

On Fri, Jul 28, 2017 at 05:06:19PM +0800, Baoquan He wrote:
> Do you mean the allocation of amd_iommu_dev_table in
> early_amd_iommu_init() also need be addressed for 1st kernel? Seems we
> don't make sure that for 1st kernel, like adding GFP_DMA32 flag when
> allocate amd_iommu_dev_table in amd_iommu_dev_table
> early_amd_iommu_init().

Yes, exactly, the first device table also needs to be allocated with
GFP_DMA32 so that it ends up below 4GB.

2017-07-28 11:15:58

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

On 07/28/17 at 01:14pm, Joerg Roedel wrote:
> On Fri, Jul 28, 2017 at 05:06:19PM +0800, Baoquan He wrote:
> > Do you mean the allocation of amd_iommu_dev_table in
> > early_amd_iommu_init() also need be addressed for 1st kernel? Seems we
> > don't make sure that for 1st kernel, like adding GFP_DMA32 flag when
> > allocate amd_iommu_dev_table in amd_iommu_dev_table
> > early_amd_iommu_init().
>
> Yes, exactly, the first device table also needs to be allocated with
> GFP_DMA32 so that it ends up below 4GB.

Got it, will do. Thanks!

2017-07-28 11:18:13

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

On Fri, Jul 28, 2017 at 07:15:53PM +0800, Baoquan He wrote:
> On 07/28/17 at 01:14pm, Joerg Roedel wrote:
> > Yes, exactly, the first device table also needs to be allocated with
> > GFP_DMA32 so that it ends up below 4GB.
>
> Got it, will do. Thanks!

Oh, and you also need to check in the kdump kernel whether the old
device-table is really below 4GB and abort the copy otherwise.

2017-07-28 11:27:04

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

On 07/28/17 at 01:18pm, Joerg Roedel wrote:
> On Fri, Jul 28, 2017 at 07:15:53PM +0800, Baoquan He wrote:
> > On 07/28/17 at 01:14pm, Joerg Roedel wrote:
> > > Yes, exactly, the first device table also needs to be allocated with
> > > GFP_DMA32 so that it ends up below 4GB.
> >
> > Got it, will do. Thanks!
>
> Oh, and you also need to check in the kdump kernel whether the old
> device-table is really below 4GB and abort the copy otherwise.

Ah, indeed. Will change it. Thanks.

2017-07-31 10:01:21

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 02/13] iommu/amd: add several helper functions

Hi Joerg,

On 07/27/17 at 05:06pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:00PM +0800, Baoquan He wrote:
> > Move single iommu enabling codes into a wrapper function early_enable_iommu().
> > This can make later kdump change easier.
> >
> > And also add iommu_disable_command_buffer and iommu_disable_event_buffer
> > for later usage.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
> > 1 file changed, 31 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index e39857ce6481..4ca6e3257d92 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
> > amd_iommu_reset_cmd_buffer(iommu);
> > }
> >
> > +/*
> > + * This function disables the command buffer
> > + */
> > +static void iommu_disable_command_buffer(struct amd_iommu *iommu)
> > +{
> > + iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
> > +}
> > +
> > static void __init free_command_buffer(struct amd_iommu *iommu)
> > {
> > free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
> > @@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
> > iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> > }
> >
> > +/*
> > + * This function disables the command buffer
>
> s/command buffer/event log/
>
> > + */
> > +static void iommu_disable_event_buffer(struct amd_iommu *iommu)
>
> Please also use event_log here.

I found the event log related handling functions all take
xxx_event_buffer names, like:
alloc_event_buffer()
iommu_enable_event_buffer()
free_event_buffer()

So for consistency, I plan to still use iommu_disable_event_buffer().
Maybe later we can change them in a clean up patch independently.

Thanks
Baoquan

>
> > +{
> > + iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +}
> > +

2017-07-31 10:07:06

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 02/13] iommu/amd: add several helper functions

Hi Baoquan,

On Mon, Jul 31, 2017 at 06:01:11PM +0800, Baoquan He wrote:
> I found the event log related handling functions all take
> xxx_event_buffer names, like:
> alloc_event_buffer()
> iommu_enable_event_buffer()
> free_event_buffer()
>
> So for consistency, I plan to still use iommu_disable_event_buffer().
> Maybe later we can change them in a clean up patch independently.

Yeah, makes sense. Go ahead with it.



Joerg

2017-07-31 10:15:35

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

Hi Joerg,

On 07/28/17 at 05:06pm, Baoquan He wrote:
> Hi Joerg,
>
> On 07/27/17 at 05:55pm, Joerg Roedel wrote:
> > On Fri, Jul 21, 2017 at 04:59:08PM +0800, Baoquan He wrote:
> > > AMD pointed out it's unsafe to update the device-table while iommu
> > > is enabled. It turns out that device-table pointer update is split
> > > up into two 32bit writes in the IOMMU hardware. So updating it while
> > > the IOMMU is enabled could have some nasty side effects.
> > >
> > > The only way to work around this is to allocate the device-table below
> > > 4GB if translation is pre-enabled in kdump kernel. If allocation failed,
> > > still use the old one.
> >
> > Not only for the kdump kernel. The old device table must also be below
> > 4GB so that its pointer can be updated with a 32bit write.
> >
> > If the old table is above 4GB you still need the second write to zero
> > the upper parts of the pointer in hardware.
>
> Do you mean the allocation of amd_iommu_dev_table in
> early_amd_iommu_init() also need be addressed for 1st kernel? Seems we
> don't make sure that for 1st kernel, like adding GFP_DMA32 flag when
> allocate amd_iommu_dev_table in amd_iommu_dev_table
> early_amd_iommu_init().

I plan to add GFP_DMA32 when allocate amd_iommu_dev_table in
early_amd_iommu_init() as below. Then in kdump kernel we don't need to
worry if the old amd_iommu_dev_table could be above 4G, right? And might
not need to check if it's above 4G, right?

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 781a138..85d6445 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2436,7 +2436,8 @@ static int __init early_amd_iommu_init(void)

/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
- amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ amd_iommu_dev_table = (void *)__get_free_pages(
+ GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
get_order(dev_table_size));
if (amd_iommu_dev_table == NULL)
goto out;
--
2.9.4


2017-07-31 10:21:57

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

Hi Baoquan,

On Mon, Jul 31, 2017 at 06:15:30PM +0800, Baoquan He wrote:
> I plan to add GFP_DMA32 when allocate amd_iommu_dev_table in
> early_amd_iommu_init() as below. Then in kdump kernel we don't need to
> worry if the old amd_iommu_dev_table could be above 4G, right? And might
> not need to check if it's above 4G, right?
>
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 781a138..85d6445 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2436,7 +2436,8 @@ static int __init early_amd_iommu_init(void)
>
> /* Device table - directly used by all IOMMUs */
> ret = -ENOMEM;
> - amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> + amd_iommu_dev_table = (void *)__get_free_pages(
> + GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
> get_order(dev_table_size));
> if (amd_iommu_dev_table == NULL)
> goto out;

Yeah, adding GFP_DMA32 is right. But you still need to check it in the
kdump path. Not checking it would mean you trust the old kernel, but
since it paniced there is no reason to put any trust in what happened
before.



Joerg

2017-07-31 10:29:30

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

On 07/31/17 at 12:21pm, Joerg Roedel wrote:
> Hi Baoquan,
>
> On Mon, Jul 31, 2017 at 06:15:30PM +0800, Baoquan He wrote:
> > I plan to add GFP_DMA32 when allocate amd_iommu_dev_table in
> > early_amd_iommu_init() as below. Then in kdump kernel we don't need to
> > worry if the old amd_iommu_dev_table could be above 4G, right? And might
> > not need to check if it's above 4G, right?
> >
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index 781a138..85d6445 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2436,7 +2436,8 @@ static int __init early_amd_iommu_init(void)
> >
> > /* Device table - directly used by all IOMMUs */
> > ret = -ENOMEM;
> > - amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > + amd_iommu_dev_table = (void *)__get_free_pages(
> > + GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
> > get_order(dev_table_size));
> > if (amd_iommu_dev_table == NULL)
> > goto out;
>
> Yeah, adding GFP_DMA32 is right. But you still need to check it in the
> kdump path. Not checking it would mean you trust the old kernel, but
> since it paniced there is no reason to put any trust in what happened
> before.

You are right, it could be touched accidentally. It must be checked.
Thanks a lot for your answer!