2014-12-22 09:16:35

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

This patchset is an update of Bill Sumner's patchset, implements a fix for:
If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
when a panic happens, the kdump kernel will boot with these faults:

dmar: DRHD: handling fault status reg 102
dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
DMAR:[fault reason 01] Present bit in root entry is clear

dmar: DRHD: handling fault status reg 2
dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

On some system, the interrupt remapping fault will also happen even if the
intel_iommu is not set to on, because the interrupt remapping will be enabled
when x2apic is needed by the system.

The cause of the DMA fault is described in Bill's original version, and the
INTR-Remap fault is caused by a similar reason. In short, the initialization
of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
response.

To fix this problem, we modifies the behaviors of the intel vt-d in the
crashdump kernel:

For DMA Remapping:
1. To accept the vt-d hardware in an active state,
2. Do not disable and re-enable the translation, keep it enabled.
3. Use the old root entry table, do not rewrite the RTA register.
4. Malloc and use new context entry table and page table, copy data from the
old ones that used by the old kernel.
5. to use different portions of the iova address ranges for the device drivers
in the crashdump kernel than the iova ranges that were in-use at the time
of the panic.
6. After device driver is loaded, when it issues the first dma_map command,
free the dmar_domain structure for this device, and generate a new one, so
that the device can be assigned a new and empty page table.
7. When a new context entry table is generated, we also save its address to
the old root entry table.

For Interrupt Remapping:
1. To accept the vt-d hardware in an active state,
2. Do not disable and re-enable the interrupt remapping, keep it enabled.
3. Use the old interrupt remapping table, do not rewrite the IRTA register.
4. When ioapic entry is setup, the interrupt remapping table is changed, and
the updated data will be stored to the old interrupt remapping table.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.
5. Minimal code-changes among the existing mainline intel vt-d code.

Summary of changes in this patch set:
1. Added some useful function for root entry table in code intel-iommu.c
2. Added new members to struct root_entry and struct irte;
3. Functions to load old root entry table to iommu->root_entry from the memory
of old kernel.
4. Functions to malloc new context entry table and page table and copy the data
from the old ones to the malloced new ones.
5. Functions to enable support for DMA remapping in kdump kernel.
6. Functions to load old irte data from the old kernel to the kdump kernel.
7. Some code changes that support other behaviours that have been listed.
8. In the new functions, use physical address as "unsigned long" type, not
pointers.

Original version by Bill Sumner:
https://lkml.org/lkml/2014/1/10/518
https://lkml.org/lkml/2014/4/15/716
https://lkml.org/lkml/2014/4/24/836

Zhenhua's last of Bill's patchset:
https://lkml.org/lkml/2014/10/21/134
https://lkml.org/lkml/2014/12/15/121

Changed in this version:
1. Do not disable and re-enable traslation and interrupt remapping.
2. Use old root entry table.
3. Use old interrupt remapping table.
4. Use "unsigned long" as physical address.
5. Use intel_unmap to unmap the old dma;

This patchset should be applied with this one together:
https://lkml.org/lkml/2014/11/5/43
x86/iommu: fix incorrect bit operations in setting values

Bill Sumner (5):
iommu/vt-d: Update iommu_attach_domain() and its callers
iommu/vt-d: Items required for kdump
iommu/vt-d: data types and functions used for kdump
iommu/vt-d: Add domain-id functions
iommu/vt-d: enable kdump support in iommu module

Li, Zhen-Hua (10):
iommu/vt-d: Update iommu_attach_domain() and its callers
iommu/vt-d: Items required for kdump
iommu/vt-d: Add domain-id functions
iommu/vt-d: functions to copy data from old mem
iommu/vt-d: Add functions to load and save old re
iommu/vt-d: datatypes and functions used for kdump
iommu/vt-d: enable kdump support in iommu module
iommu/vtd: assign new page table for dma_map
iommu/vt-d: Copy functions for irte
iommu/vt-d: Use old irte in kdump kernel

drivers/iommu/intel-iommu.c | 1050 +++++++++++++++++++++++++++++++++--
drivers/iommu/intel_irq_remapping.c | 99 +++-
include/linux/intel-iommu.h | 18 +
3 files changed, 1123 insertions(+), 44 deletions(-)

--
2.0.0-rc0


2014-12-22 09:16:48

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 03/10] iommu/vt-d: Add domain-id functions

Interfaces for when a new domain in the crashdump kernel needs some
values from the panicked kernel's context entries.

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 62 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5ce2850..c0bebd6 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -381,6 +381,13 @@ struct domain_values_entry {
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
};

+static struct domain_values_entry *intel_iommu_did_to_domain_values_entry(
+ int did, struct intel_iommu *iommu);
+
+static int intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu);
+
+static int device_to_domain_id(struct intel_iommu *iommu, u8 bus, u8 devfn);
+
#endif /* CONFIG_CRASH_DUMP */

/*
@@ -4832,3 +4839,58 @@ static void __init check_tylersburg_isoch(void)
printk(KERN_WARNING "DMAR: Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
vtisochctrl);
}
+
+#ifdef CONFIG_CRASH_DUMP
+
+/*
+ * Interfaces for when a new domain in the crashdump kernel needs some
+ * values from the panicked kernel's context entries
+ *
+ */
+static struct domain_values_entry *intel_iommu_did_to_domain_values_entry(
+ int did, struct intel_iommu *iommu)
+{
+ struct domain_values_entry *dve; /* iterator */
+
+ list_for_each_entry(dve, &domain_values_list[iommu->seq_id], link)
+ if (dve->did == did)
+ return dve;
+ return NULL;
+}
+
+/* Mark domain-id's from old kernel as in-use on this iommu so that a new
+ * domain-id is allocated in the case where there is a device in the new kernel
+ * that was not in the old kernel -- and therefore a new domain-id is needed.
+ */
+static int intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu)
+{
+ struct domain_values_entry *dve; /* iterator */
+
+ pr_info("IOMMU:%d Domain ids from panicked kernel:\n", iommu->seq_id);
+
+ list_for_each_entry(dve, &domain_values_list[iommu->seq_id], link) {
+ set_bit(dve->did, iommu->domain_ids);
+ pr_info("DID did:%d(0x%4.4x)\n", dve->did, dve->did);
+ }
+
+ pr_info("----------------------------------------\n");
+ return 0;
+}
+
+static int device_to_domain_id(struct intel_iommu *iommu, u8 bus, u8 devfn)
+{
+ int did = -1; /* domain-id returned */
+ struct root_entry *root;
+ struct context_entry *context;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->lock, flags);
+ root = &iommu->root_entry[bus];
+ context = get_context_addr_from_root(root);
+ if (context && context_present(context+devfn))
+ did = context_domain_id(context+devfn);
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ return did;
+}
+
+#endif /* CONFIG_CRASH_DUMP */
--
2.0.0-rc0

2014-12-22 09:16:54

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 04/10] iommu/vt-d: functions to copy data from old mem

Add some functions to copy the data from old kernel.
These functions are used to copy context tables and page tables.

To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
use a link here, store the pointers , and then use iounmap to free them in
another place.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 97 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 9 +++++
2 files changed, 106 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c0bebd6..8a7ad72 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -388,6 +388,13 @@ static int intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu);

static int device_to_domain_id(struct intel_iommu *iommu, u8 bus, u8 devfn);

+struct iommu_remapped_entry {
+ struct list_head list;
+ void __iomem *mem;
+};
+static LIST_HEAD(__iommu_remapped_mem);
+static DEFINE_MUTEX(__iommu_mem_list_lock);
+
#endif /* CONFIG_CRASH_DUMP */

/*
@@ -4843,6 +4850,96 @@ static void __init check_tylersburg_isoch(void)
#ifdef CONFIG_CRASH_DUMP

/*
+ * Copy memory from a physically-addressed area into a virtually-addressed area
+ */
+int __iommu_load_from_oldmem(void *to, unsigned long from, unsigned long size)
+{
+ unsigned long pfn; /* Page Frame Number */
+ size_t csize = (size_t)size; /* Num(bytes to copy) */
+ unsigned long offset; /* Lower 12 bits of to */
+ void __iomem *virt_mem;
+ struct iommu_remapped_entry *mapped;
+
+ pfn = from >> VTD_PAGE_SHIFT;
+ offset = from & (~VTD_PAGE_MASK);
+
+ if (page_is_ram(pfn)) {
+ memcpy(to, pfn_to_kaddr(pfn) + offset, csize);
+ } else{
+
+ mapped = kzalloc(sizeof(struct iommu_remapped_entry),
+ GFP_KERNEL);
+ if (!mapped)
+ return -ENOMEM;
+
+ virt_mem = ioremap_cache((unsigned long)from, size);
+ if (!virt_mem) {
+ kfree(mapped);
+ return -ENOMEM;
+ }
+ memcpy(to, virt_mem, size);
+
+ mutex_lock(&__iommu_mem_list_lock);
+ mapped->mem = virt_mem;
+ list_add_tail(&mapped->list, &__iommu_remapped_mem);
+ mutex_unlock(&__iommu_mem_list_lock);
+ }
+ return size;
+}
+
+/*
+ * Copy memory from a virtually-addressed area into a physically-addressed area
+ */
+int __iommu_save_to_oldmem(unsigned long to, void *from, unsigned long size)
+{
+ unsigned long pfn; /* Page Frame Number */
+ size_t csize = (size_t)size; /* Num(bytes to copy) */
+ unsigned long offset; /* Lower 12 bits of to */
+ void __iomem *virt_mem;
+ struct iommu_remapped_entry *mapped;
+
+ pfn = to >> VTD_PAGE_SHIFT;
+ offset = to & (~VTD_PAGE_MASK);
+
+ if (page_is_ram(pfn)) {
+ memcpy(pfn_to_kaddr(pfn) + offset, from, csize);
+ } else{
+ mapped = kzalloc(sizeof(struct iommu_remapped_entry),
+ GFP_KERNEL);
+ if (!mapped)
+ return -ENOMEM;
+
+ virt_mem = ioremap_cache((unsigned long)to, size);
+ if (!virt_mem) {
+ kfree(mapped);
+ return -ENOMEM;
+ }
+ memcpy(virt_mem, from, size);
+ mutex_lock(&__iommu_mem_list_lock);
+ mapped->mem = virt_mem;
+ list_add_tail(&mapped->list, &__iommu_remapped_mem);
+ mutex_unlock(&__iommu_mem_list_lock);
+ }
+ return size;
+}
+
+/*
+ * Free the mapped memory for ioremap;
+ */
+int __iommu_free_mapped_mem(void)
+{
+ struct iommu_remapped_entry *mem_entry, *tmp;
+
+ mutex_lock(&__iommu_mem_list_lock);
+ list_for_each_entry_safe(mem_entry, tmp, &__iommu_remapped_mem, list) {
+ iounmap(mem_entry->mem);
+ list_del(&mem_entry->list);
+ kfree(mem_entry);
+ }
+ mutex_unlock(&__iommu_mem_list_lock);
+ return 0;
+}
+/*
* Interfaces for when a new domain in the crashdump kernel needs some
* values from the panicked kernel's context entries
*
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a65208a..8ffa523 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -26,6 +26,7 @@
#include <linux/iova.h>
#include <linux/io.h>
#include <linux/dma_remapping.h>
+#include <linux/crash_dump.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>

@@ -368,4 +369,12 @@ extern int dmar_ir_support(void);

extern const struct attribute_group *intel_iommu_groups[];

+#ifdef CONFIG_CRASH_DUMP
+extern int __iommu_load_from_oldmem(void *to, unsigned long from,
+ unsigned long size);
+extern int __iommu_save_to_oldmem(unsigned long to, void *from,
+ unsigned long size);
+extern int __iommu_free_mapped_mem(void);
+#endif /* CONFIG_CRASH_DUMP */
+
#endif
--
2.0.0-rc0

2014-12-22 09:16:59

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 05/10] iommu/vt-d: Add functions to load and save old re

Add functions to load root entry table from old kernel, and to save updated
root entry table.
Add two member in struct intel_iommu, to store the RTA in old kernel, and
the mapped virt address of it.

We use the old RTA in dump kernel, and when the iommu->root_entry is used as
a cache in kdump kernel, its phys address will not be save to RTA register,
but when its data is changed, we will save the new data to old root entry table.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 5 +++++
2 files changed, 54 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8a7ad72..126294db 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -388,6 +388,10 @@ static int intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu);

static int device_to_domain_id(struct intel_iommu *iommu, u8 bus, u8 devfn);

+static void __iommu_load_old_root_entry(struct intel_iommu *iommu);
+
+static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index);
+
struct iommu_remapped_entry {
struct list_head list;
void __iomem *mem;
@@ -4990,4 +4994,49 @@ static int device_to_domain_id(struct intel_iommu *iommu, u8 bus, u8 devfn)
return did;
}

+/*
+ * Load the old root entry table to new root entry table.
+ */
+static void __iommu_load_old_root_entry(struct intel_iommu *iommu)
+{
+ if ((!iommu)
+ || (!iommu->root_entry)
+ || (!iommu->root_entry_old_virt)
+ || (!iommu->root_entry_old_phys))
+ return;
+ memcpy(iommu->root_entry, iommu->root_entry_old_virt, PAGE_SIZE);
+}
+
+/*
+ * When the data in new root entry table is changed, this function
+ * must be called to save the updated data to old root entry table.
+ */
+static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
+{
+ u8 start;
+ unsigned long size;
+ void __iomem *to;
+ void *from;
+
+ if ((!iommu)
+ || (!iommu->root_entry)
+ || (!iommu->root_entry_old_virt)
+ || (!iommu->root_entry_old_phys))
+ return;
+
+ if (index < -1 || index >= ROOT_ENTRY_NR)
+ return;
+
+ if (index == -1) {
+ start = 0;
+ size = ROOT_ENTRY_NR * sizeof(struct root_entry);
+ } else {
+ start = index * sizeof(struct root_entry);
+ size = sizeof(struct root_entry);
+ }
+ to = iommu->root_entry_old_virt;
+ from = iommu->root_entry;
+ memcpy(to + start, from + start, size);
+}
+
#endif /* CONFIG_CRASH_DUMP */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8ffa523..8e29b97 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -329,6 +329,11 @@ struct intel_iommu {
spinlock_t lock; /* protect context, domain ids */
struct root_entry *root_entry; /* virtual address */

+#ifdef CONFIG_CRASH_DUMP
+ void __iomem *root_entry_old_virt; /* mapped from old root entry */
+ unsigned long root_entry_old_phys; /* root entry in old kernel */
+#endif
+
struct iommu_flush flush;
#endif
struct q_inval *qi; /* Queued invalidation info */
--
2.0.0-rc0

2014-12-22 09:17:07

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 06/10] iommu/vt-d: datatypes and functions used for kdump

Populate it with support functions to copy iommu translation tables from
from the panicked kernel into the kdump kernel in the event of a crash.

Functions:
malloc new context table and copy old context table to the new one.
malloc new page table and copy old page table to the new one.

Bill Sumner:
Original version, the creation of the data types and functions.

Li, Zhenhua:
Minor change:
Update the usage of context_get_* and context_put*, use context_*
and context_set_* for replacement.
Update the name of the function that copies root entry table.
Use new function to copy old context entry tables and page tables.
Use "unsigned long" for physical address.
Change incorrect aw_shift[4] and a few comments in copy_context_entry().

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 540 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 540 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 126294db..f9849cb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -399,6 +399,62 @@ struct iommu_remapped_entry {
static LIST_HEAD(__iommu_remapped_mem);
static DEFINE_MUTEX(__iommu_mem_list_lock);

+/* ========================================================================
+ * Copy iommu translation tables from old kernel into new kernel.
+ * Entry to this set of functions is: intel_iommu_load_translation_tables()
+ * ------------------------------------------------------------------------
+ */
+
+/*
+ * Lists of domain_values_entry to hold domain values found during the copy.
+ * One list for each iommu in g_number_of_iommus.
+ */
+static struct list_head *domain_values_list;
+
+
+#define RET_BADCOPY -1 /* Return-code: Cannot copy translate tables */
+
+/*
+ * Struct copy_page_addr_parms is used to allow copy_page_addr()
+ * to accumulate values across multiple calls and returns.
+ */
+struct copy_page_addr_parms {
+ u32 first; /* flag: first-time */
+ u32 last; /* flag: last-time */
+ u32 bus; /* last bus number we saw */
+ u32 devfn; /* last devfn we saw */
+ u32 shift; /* last shift we saw */
+ u64 pte; /* Page Table Entry */
+ u64 next_addr; /* next-expected page_addr */
+
+ u64 page_addr; /* page_addr accumulating size */
+ u64 page_size; /* page_size accumulated */
+
+ struct domain_values_entry *dve; /* to accumulate iova ranges */
+};
+
+enum returns_from_copy_context_entry {
+RET_CCE_NOT_PRESENT = 1,
+RET_CCE_NEW_PAGE_TABLES,
+RET_CCE_PASS_THROUGH_1,
+RET_CCE_PASS_THROUGH_2,
+RET_CCE_RESERVED_VALUE,
+RET_CCE_PREVIOUS_DID
+};
+
+static int copy_context_entry(struct intel_iommu *iommu, u32 bus, u32 devfn,
+ void *ppap, struct context_entry *ce);
+
+static int copy_context_entry_table(struct intel_iommu *iommu,
+ u32 bus, void *ppap,
+ unsigned long *context_new_p,
+ unsigned long context_old_phys);
+
+static int copy_root_entry_table(struct intel_iommu *iommu, void *ppap);
+
+static int intel_iommu_load_translation_tables(struct dmar_drhd_unit *drhd,
+ int g_num_of_iommus);
+
#endif /* CONFIG_CRASH_DUMP */

/*
@@ -5039,4 +5095,488 @@ static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
memcpy(to + start, from + start, size);
}

+/*
+ * constant for initializing instances of copy_page_addr_parms properly.
+ */
+static struct copy_page_addr_parms copy_page_addr_parms_init = {1, 0};
+
+
+
+/*
+ * Lowest-level function in the 'Copy Page Tables' set
+ * Called once for each page_addr present in an iommu page-address table.
+ *
+ * Because of the depth-first traversal of the page-tables by the
+ * higher-level functions that call 'copy_page_addr', all pages
+ * of a domain will be presented in ascending order of IO Virtual Address.
+ *
+ * This function accumulates each contiguous range of these IOVAs and
+ * reserves it within the proper domain in the crashdump kernel when a
+ * non-contiguous range is detected, as determined by any of the following:
+ * 1. a change in the bus or device owning the presented page
+ * 2. a change in the page-size of the presented page (parameter shift)
+ * 3. a change in the page-table entry of the presented page
+ * 4. a presented IOVA that does not match the expected next-page address
+ * 5. the 'last' flag is set, indicating that all IOVAs have been seen.
+ */
+static int copy_page_addr(u64 page_addr, u32 shift, u32 bus, u32 devfn,
+ u64 pte, struct domain_values_entry *dve,
+ void *parms)
+{
+ struct copy_page_addr_parms *ppap = parms;
+
+ u64 page_size = ((u64)1 << shift); /* page_size */
+ u64 pfn_lo; /* For reserving IOVA range */
+ u64 pfn_hi; /* For reserving IOVA range */
+ struct iova *iova_p; /* For reserving IOVA range */
+
+ if (!ppap) {
+ pr_err("ERROR: ppap is NULL: 0x%3.3x(%3.3d) DevFn: 0x%3.3x(%3.3d) Page: 0x%16.16llx Size: 0x%16.16llx(%lld)\n",
+ bus, bus, devfn, devfn, page_addr,
+ page_size, page_size);
+ return 0;
+ }
+
+ /* If (only extending current addr range) */
+ if (ppap->first == 0 &&
+ ppap->last == 0 &&
+ ppap->bus == bus &&
+ ppap->devfn == devfn &&
+ ppap->shift == shift &&
+ (ppap->pte & ~VTD_PAGE_MASK) == (pte & ~VTD_PAGE_MASK) &&
+ ppap->next_addr == page_addr) {
+
+ /* Update page size and next-expected address */
+ ppap->next_addr += page_size;
+ ppap->page_size += page_size;
+ return 0;
+ }
+
+ if (!ppap->first) {
+ /* Close-out the accumulated IOVA address range */
+
+ if (!ppap->dve) {
+ pr_err("%s ERROR: ppap->dve is NULL -- needed to reserve range for B:D:F=%2.2x:%2.2x:%1.1x\n",
+ __func__,
+ ppap->bus, ppap->devfn >> 3, ppap->devfn & 0x7);
+ return RET_BADCOPY;
+ }
+ pfn_lo = IOVA_PFN(ppap->page_addr);
+ pfn_hi = IOVA_PFN(ppap->page_addr + ppap->page_size);
+ iova_p = reserve_iova(&ppap->dve->iovad, pfn_lo, pfn_hi);
+ }
+
+ /* Prepare for a new IOVA address range */
+ ppap->first = 0; /* Not first-time anymore */
+ ppap->bus = bus;
+ ppap->devfn = devfn;
+ ppap->shift = shift;
+ ppap->pte = pte;
+ ppap->next_addr = page_addr + page_size; /* Next-expected page_addr */
+
+ ppap->page_addr = page_addr; /* Addr(new page) */
+ ppap->page_size = page_size; /* Size(new page) */
+
+ ppap->dve = dve; /* adr(device_values_entry for new range) */
+
+ return 0;
+}
+
+/*
+ * Recursive function to copy the tree of page tables (max 6 recursions)
+ * Parameter 'shift' controls the recursion
+ */
+static int copy_page_table(unsigned long *dma_pte_new_p,
+ unsigned long dma_pte_phys,
+ u32 shift, u64 page_addr,
+ struct intel_iommu *iommu,
+ u32 bus, u32 devfn,
+ struct domain_values_entry *dve, void *ppap)
+{
+ int ret; /* Integer return code */
+ struct dma_pte *p; /* Virtual adr(each entry) iterator */
+ struct dma_pte *pgt_new_virt; /* Adr(dma_pte in new kernel) */
+ unsigned long dma_pte_next; /* Adr(next table down) */
+ u64 u; /* index(each entry in page_table) */
+
+
+ /* If (already done all levels -- problem) */
+ if (shift < 12) {
+ pr_err("ERROR %s shift < 12 %lx\n", __func__, dma_pte_phys);
+ pr_err("shift %d, page_addr %16.16llu bus %3.3u devfn %3.3u\n",
+ shift, page_addr, bus, devfn);
+ return RET_BADCOPY;
+ }
+
+ /* allocate a page table in the new kernel
+ * copy contents from old kernel
+ * then update each entry in the table in the new kernel
+ */
+
+ pgt_new_virt = (struct dma_pte *)alloc_pgtable_page(iommu->node);
+ if (!pgt_new_virt)
+ return -ENOMEM;
+
+ ret = __iommu_load_from_oldmem(pgt_new_virt,
+ dma_pte_phys,
+ VTD_PAGE_SIZE);
+
+ if (ret <= 0)
+ return ret;
+
+ for (u = 0, p = pgt_new_virt; u < 512; u++, p++) {
+
+ if (((p->val & DMA_PTE_READ) == 0) &&
+ ((p->val & DMA_PTE_WRITE) == 0))
+ continue;
+
+ if (dma_pte_superpage(p) || (shift == 12)) {
+
+ ret = copy_page_addr(page_addr | (u << shift),
+ shift, bus, devfn, p->val, dve, ppap);
+ if (ret)
+ return ret;
+ continue;
+ }
+
+ ret = copy_page_table(&dma_pte_next,
+ (p->val & VTD_PAGE_MASK),
+ shift-9, page_addr | (u << shift),
+ iommu, bus, devfn, dve, ppap);
+ if (ret)
+ return ret;
+
+ p->val &= ~VTD_PAGE_MASK; /* Clear old and set new pgd */
+ p->val |= ((u64)dma_pte_next & VTD_PAGE_MASK);
+ }
+
+ *dma_pte_new_p = virt_to_phys(pgt_new_virt);
+ __iommu_flush_cache(iommu, pgt_new_virt, VTD_PAGE_SIZE);
+
+ return 0;
+}
+
+
+/*
+ * Called once for each context_entry found in a copied context_entry_table
+ * Each context_entry represents one PCIe device handled by the IOMMU.
+ *
+ * The 'domain_values_list' contains one 'domain_values_entry' for each
+ * unique domain-id found while copying the context entries for each iommu.
+ *
+ * The Intel-iommu spec. requires that every context_entry that contains
+ * the same domain-id point to the same set of page translation tables.
+ * The hardware uses this to improve the use of its translation cache.
+ * In order to insure that the copied translate tables abide by this
+ * requirement, this function keeps a list of domain-ids (dids) that
+ * have already been seen for this iommu. This function checks each entry
+ * already on the list for a domain-id that matches the domain-id in this
+ * context_entry. If found, this function places the address of the previous
+ * context's tree of page translation tables into this context_entry.
+ * If a matching previous entry is not found, a new 'domain_values_entry'
+ * structure is created for the domain-id in this context_entry and
+ * copy_page_table is called to duplicate its tree of page tables.
+ */
+static int copy_context_entry(struct intel_iommu *iommu, u32 bus, u32 devfn,
+ void *ppap, struct context_entry *ce)
+{
+ int ret = 0; /* Integer Return Code */
+ u32 shift = 0; /* bits to shift page_addr */
+ u64 page_addr = 0; /* Address of translated page */
+ unsigned long pgt_old_phys; /* Adr(page_table in the old kernel) */
+ unsigned long pgt_new_phys; /* Adr(page_table in the new kernel) */
+ u8 t; /* Translation-type from context */
+ u8 aw; /* Address-width from context */
+ u32 aw_shift[8] = {
+ 12+9+9, /* [000b] 30-bit AGAW (2-level page table) */
+ 12+9+9+9, /* [001b] 39-bit AGAW (3-level page table) */
+ 12+9+9+9+9, /* [010b] 48-bit AGAW (4-level page table) */
+ 12+9+9+9+9+9, /* [011b] 57-bit AGAW (5-level page table) */
+ 12+9+9+9+9+9+7, /* [100b] 64-bit AGAW (6-level page table) */
+ 0, /* [101b] Reserved */
+ 0, /* [110b] Reserved */
+ 0, /* [111b] Reserved */
+ };
+
+ struct domain_values_entry *dve = NULL;
+
+ if (!context_present(ce)) { /* If (context not present) */
+ ret = RET_CCE_NOT_PRESENT; /* Skip it */
+ goto exit;
+ }
+
+ t = context_translation_type(ce);
+ /* If we have seen this domain-id before on this iommu,
+ * give this context the same page-tables and we are done.
+ */
+ list_for_each_entry(dve, &domain_values_list[iommu->seq_id], link) {
+ if (dve->did == (int) context_domain_id(ce)) {
+ switch (t) {
+ case 0: /* page tables */
+ case 1: /* page tables */
+ context_set_address_root(ce,
+ virt_to_phys(dve->pgd));
+ ret = RET_CCE_PREVIOUS_DID;
+ break;
+
+ case 2: /* Pass through */
+ if (dve->pgd == NULL)
+ ret = RET_CCE_PASS_THROUGH_2;
+ else
+ ret = RET_BADCOPY;
+ break;
+
+ default: /* Bad value of 't'*/
+ ret = RET_BADCOPY;
+ break;
+ }
+ goto exit;
+ }
+ }
+
+ /* Since we now know that this is a new domain-id for this iommu,
+ * create a new entry, add it to the list, and handle its
+ * page tables.
+ */
+
+ dve = kcalloc(1, sizeof(struct domain_values_entry), GFP_KERNEL);
+ if (!dve) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ dve->did = (int) context_domain_id(ce);
+ dve->gaw = (int) agaw_to_width(context_address_width(ce));
+ dve->pgd = NULL;
+ init_iova_domain(&dve->iovad, DMA_32BIT_PFN);
+
+ list_add(&dve->link, &domain_values_list[iommu->seq_id]);
+
+
+ if (t == 0 || t == 1) { /* If (context has page tables) */
+ aw = context_address_width(ce);
+ shift = aw_shift[aw];
+
+ pgt_old_phys = context_address_root(ce) << VTD_PAGE_SHIFT;
+
+ ret = copy_page_table(&pgt_new_phys, pgt_old_phys,
+ shift-9, page_addr, iommu, bus, devfn, dve, ppap);
+
+ if (ret) /* if (problem) bail out */
+ goto exit;
+
+ context_set_address_root(ce, pgt_new_phys);
+ dve->pgd = phys_to_virt(pgt_new_phys);
+ ret = RET_CCE_NEW_PAGE_TABLES;
+ goto exit;
+ }
+
+ if (t == 2) { /* If (Identity mapped pass-through) */
+ ret = RET_CCE_PASS_THROUGH_1; /* REVISIT: Skip for now */
+ goto exit;
+ }
+
+ ret = RET_CCE_RESERVED_VALUE; /* Else ce->t is a Reserved value */
+ /* Note fall-through */
+
+exit: /* all returns come through here to insure good clean-up */
+ return ret;
+}
+
+
+/*
+ * Called once for each context_entry_table found in the root_entry_table
+ */
+static int copy_context_entry_table(struct intel_iommu *iommu,
+ u32 bus, void *ppap,
+ unsigned long *context_new_p,
+ unsigned long context_old_phys)
+{
+ int ret = 0; /* Integer return code */
+ struct context_entry *ce; /* Iterator */
+ unsigned long context_new_phys; /* adr(table in new kernel) */
+ struct context_entry *context_new_virt; /* adr(table in new kernel) */
+ u32 devfn = 0; /* PCI Device & function */
+
+ /* allocate a context-entry table in the new kernel
+ * copy contents from old kernel
+ * then update each entry in the table in the new kernel
+ */
+ context_new_virt =
+ (struct context_entry *)alloc_pgtable_page(iommu->node);
+ if (!context_new_virt)
+ return -ENOMEM;
+
+ context_new_phys = virt_to_phys(context_new_virt);
+
+ __iommu_load_from_oldmem(context_new_virt,
+ context_old_phys,
+ VTD_PAGE_SIZE);
+
+ for (devfn = 0, ce = context_new_virt; devfn < 256; devfn++, ce++) {
+
+ if (!context_present(ce)) /* If (context not present) */
+ continue; /* Skip it */
+
+ ret = copy_context_entry(iommu, bus, devfn, ppap, ce);
+ if (ret < 0) /* if (problem) */
+ return RET_BADCOPY;
+
+ switch (ret) {
+ case RET_CCE_NOT_PRESENT:
+ continue;
+ case RET_CCE_NEW_PAGE_TABLES:
+ continue;
+ case RET_CCE_PASS_THROUGH_1:
+ continue;
+ case RET_CCE_PASS_THROUGH_2:
+ continue;
+ case RET_CCE_RESERVED_VALUE:
+ return RET_BADCOPY;
+ case RET_CCE_PREVIOUS_DID:
+ continue;
+ default:
+ return RET_BADCOPY;
+ };
+ }
+
+ *context_new_p = context_new_phys;
+ return 0;
+}
+
+
+/*
+ * Highest-level function in the 'copy translation tables' set of functions
+ */
+static int copy_root_entry_table(struct intel_iommu *iommu, void *ppap)
+{
+ int ret = 0; /* Integer return code */
+ u32 bus; /* Index: root-entry-table */
+ struct root_entry *re; /* Virt(iterator: new table) */
+ unsigned long context_old_phys; /* Phys(context table entry) */
+ unsigned long context_new_phys; /* Phys(new context_entry) */
+
+ /*
+ * allocate a root-entry table in the new kernel
+ * copy contents from old kernel
+ * then update each entry in the table in the new kernel
+ */
+
+ if (!iommu->root_entry_old_phys)
+ return -ENOMEM;
+
+ for (bus = 0, re = iommu->root_entry; bus < 256; bus += 1, re += 1) {
+ if (!root_present(re))
+ continue;
+
+ context_old_phys = get_context_phys_from_root(re);
+
+ if (!context_old_phys)
+ continue;
+
+ context_new_phys = 0;
+ ret = copy_context_entry_table(iommu, bus, ppap,
+ &context_new_phys,
+ context_old_phys);
+ if (ret)
+ return ret;
+ __iommu_flush_cache(iommu,
+ phys_to_virt(context_new_phys),
+ VTD_PAGE_SIZE);
+
+ set_root_value(re, context_new_phys);
+ }
+
+ return 0;
+}
+/*
+ * Interface to the "copy translation tables" set of functions
+ * from mainline code.
+ */
+static int intel_iommu_load_translation_tables(struct dmar_drhd_unit *drhd,
+ int g_num_of_iommus)
+{
+ struct intel_iommu *iommu; /* Virt(iommu hardware registers) */
+ unsigned long long q; /* quadword scratch */
+ int ret = 0; /* Integer return code */
+ int i = 0; /* Loop index */
+ unsigned long flags;
+
+ /* Structure so copy_page_addr() can accumulate things
+ * over multiple calls and returns
+ */
+ struct copy_page_addr_parms ppa_parms = copy_page_addr_parms_init;
+ struct copy_page_addr_parms *ppap = &ppa_parms;
+
+
+ iommu = drhd->iommu;
+ q = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
+ if (!q)
+ return -1;
+
+ /* If (list needs initializing) do it here */
+ if (!domain_values_list) {
+ domain_values_list =
+ kcalloc(g_num_of_iommus, sizeof(struct list_head),
+ GFP_KERNEL);
+
+ if (!domain_values_list) {
+ pr_err("Allocation failed for domain_values_list array\n");
+ return -ENOMEM;
+ }
+ for (i = 0; i < g_num_of_iommus; i++)
+ INIT_LIST_HEAD(&domain_values_list[i]);
+ }
+
+ spin_lock_irqsave(&iommu->lock, flags);
+
+ /* Load the root-entry table from the old kernel
+ * foreach context_entry_table in root_entry
+ * foreach context_entry in context_entry_table
+ * foreach level-1 page_table_entry in context_entry
+ * foreach level-2 page_table_entry in level 1 page_table_entry
+ * Above pattern continues up to 6 levels of page tables
+ * Sanity-check the entry
+ * Process the bus, devfn, page_address, page_size
+ */
+ if (!iommu->root_entry) {
+ iommu->root_entry =
+ (struct root_entry *)alloc_pgtable_page(iommu->node);
+ if (!iommu->root_entry) {
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ return -ENOMEM;
+ }
+ }
+
+ iommu->root_entry_old_phys = q & VTD_PAGE_MASK;
+ if (!iommu->root_entry_old_phys) {
+ pr_err("Could not read old root entry address.");
+ return -1;
+ }
+
+ iommu->root_entry_old_virt = ioremap_cache(iommu->root_entry_old_phys,
+ VTD_PAGE_SIZE);
+ if (!iommu->root_entry_old_virt) {
+ pr_err("Could not map the old root entry.");
+ return -ENOMEM;
+ }
+
+ __iommu_load_old_root_entry(iommu);
+ ret = copy_root_entry_table(iommu, ppap);
+ __iommu_update_old_root_entry(iommu, -1);
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+ __iommu_free_mapped_mem();
+
+ if (ret)
+ return ret;
+
+
+ ppa_parms.last = 1;
+ copy_page_addr(0, 0, 0, 0, 0, NULL, ppap);
+
+ return 0;
+}
+
#endif /* CONFIG_CRASH_DUMP */
--
2.0.0-rc0

2014-12-22 09:17:12

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 07/10] iommu/vt-d: enable kdump support in iommu module

Modify the operation of the following functions when called during crash dump:
device_to_domain_id
get_domain_for_dev
init_dmars
intel_iommu_init

Bill Sumner:
Original version.

Zhenhua:
Minor change,
The name of new calling functions.
Do not disable and re-enable TE in kdump kernel.

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 142 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 125 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f9849cb..4efed7c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -907,6 +907,11 @@ static struct context_entry * device_to_context_entry(struct intel_iommu *iommu,
set_root_value(root, phy_addr);
set_root_present(root);
__iommu_flush_cache(iommu, root, sizeof(*root));
+
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel())
+ __iommu_update_old_root_entry(iommu, bus);
+#endif
}
spin_unlock_irqrestore(&iommu->lock, flags);
return &context[devfn];
@@ -958,7 +963,8 @@ static void free_context_table(struct intel_iommu *iommu)

spin_lock_irqsave(&iommu->lock, flags);
if (!iommu->root_entry) {
- goto out;
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ return;
}
for (i = 0; i < ROOT_ENTRY_NR; i++) {
root = &iommu->root_entry[i];
@@ -966,10 +972,26 @@ static void free_context_table(struct intel_iommu *iommu)
if (context)
free_pgtable_page(context);
}
- free_pgtable_page(iommu->root_entry);
- iommu->root_entry = NULL;
-out:
+
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ iommu->root_entry_old_phys = 0;
+ root = iommu->root_entry;
+ iommu->root_entry = NULL;
+ } else {
+#endif
+ free_pgtable_page(iommu->root_entry);
+ iommu->root_entry = NULL;
+#ifdef CONFIG_CRASH_DUMP
+ }
+#endif
+
spin_unlock_irqrestore(&iommu->lock, flags);
+
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel())
+ iounmap(root);
+#endif
}

static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
@@ -2381,6 +2403,9 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
unsigned long flags;
u8 bus, devfn;
int did = -1; /* Default to "no domain_id supplied" */
+#ifdef CONFIG_CRASH_DUMP
+ struct domain_values_entry *dve = NULL;
+#endif /* CONFIG_CRASH_DUMP */

domain = find_domain(dev);
if (domain)
@@ -2414,6 +2439,24 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
domain = alloc_domain(0);
if (!domain)
return NULL;
+
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ /*
+ * if this device had a did in the old kernel
+ * use its values instead of generating new ones
+ */
+ did = device_to_domain_id(iommu, bus, devfn);
+ if (did > 0 || (did == 0 && !cap_caching_mode(iommu->cap)))
+ dve = intel_iommu_did_to_domain_values_entry(did,
+ iommu);
+ if (dve)
+ gaw = dve->gaw;
+ else
+ did = -1;
+ }
+#endif /* CONFIG_CRASH_DUMP */
+
domain->id = iommu_attach_domain(domain, iommu, did);
if (domain->id < 0) {
free_domain_mem(domain);
@@ -2425,6 +2468,18 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
return NULL;
}

+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel() && dve) {
+
+ if (domain->pgd)
+ free_pgtable_page(domain->pgd);
+
+ domain->pgd = dve->pgd;
+
+ copy_reserved_iova(&dve->iovad, &domain->iovad);
+ }
+#endif /* CONFIG_CRASH_DUMP */
+
/* register PCI DMA alias device */
if (dev_is_pci(dev)) {
tmp = dmar_insert_dev_info(iommu, PCI_BUS_NUM(dma_alias),
@@ -2948,14 +3003,35 @@ static int __init init_dmars(void)
if (ret)
goto free_iommu;

- /*
- * TBD:
- * we could share the same root & context tables
- * among all IOMMU's. Need to Split it later.
- */
- ret = iommu_alloc_root_entry(iommu);
- if (ret)
- goto free_iommu;
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ pr_info("IOMMU Copying translate tables from panicked kernel\n");
+ ret = intel_iommu_load_translation_tables(drhd,
+ g_num_of_iommus);
+ if (ret) {
+ pr_err("IOMMU: Copy translate tables failed\n");
+
+ /* Best to stop trying */
+ goto free_iommu;
+ }
+ pr_info("IOMMU: root_cache:0x%12.12llx phys:0x%12.12llx\n",
+ (u64)iommu->root_entry,
+ (u64)iommu->root_entry_old_phys);
+ intel_iommu_get_dids_from_old_kernel(iommu);
+ } else {
+#endif /* CONFIG_CRASH_DUMP */
+ /*
+ * TBD:
+ * we could share the same root & context tables
+ * among all IOMMU's. Need to Split it later.
+ */
+ ret = iommu_alloc_root_entry(iommu);
+ if (ret)
+ goto free_iommu;
+#ifdef CONFIG_CRASH_DUMP
+ }
+#endif
+
if (!ecap_pass_through(iommu->ecap))
hw_pass_through = 0;
}
@@ -2972,6 +3048,16 @@ static int __init init_dmars(void)

check_tylersburg_isoch();

+#ifdef CONFIG_CRASH_DUMP
+ /*
+ * In the crashdump kernel: Skip setting-up new domains for
+ * si, rmrr, and the isa bus on the expectation that these
+ * translations were copied from the old kernel.
+ */
+ if (is_kdump_kernel())
+ goto skip_new_domains_for_si_rmrr_isa;
+#endif /* CONFIG_CRASH_DUMP */
+
/*
* If pass through is not set or not enabled, setup context entries for
* identity mappings for rmrr, gfx, and isa and may fall back to static
@@ -3012,6 +3098,10 @@ static int __init init_dmars(void)

iommu_prepare_isa();

+#ifdef CONFIG_CRASH_DUMP
+skip_new_domains_for_si_rmrr_isa:;
+#endif /* CONFIG_CRASH_DUMP */
+
/*
* for each drhd
* enable fault log
@@ -3040,7 +3130,15 @@ static int __init init_dmars(void)

iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL);
iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
- iommu_enable_translation(iommu);
+
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ if (!(iommu->gcmd & DMA_GCMD_TE))
+ iommu_enable_translation(iommu);
+ } else
+#endif
+ iommu_enable_translation(iommu);
+
iommu_disable_protect_mem_regions(iommu);
}

@@ -4351,12 +4449,22 @@ int __init intel_iommu_init(void)
goto out_free_dmar;
}

+#ifdef CONFIG_CRASH_DUMP
/*
- * Disable translation if already enabled prior to OS handover.
+ * If (This is the crash kernel)
+ * Set: copy iommu translate tables from old kernel
+ * Skip disabling the iommu hardware translations
*/
- for_each_active_iommu(iommu, drhd)
- if (iommu->gcmd & DMA_GCMD_TE)
- iommu_disable_translation(iommu);
+ if (is_kdump_kernel()) {
+ pr_info("IOMMU Skip disabling iommu hardware translations\n");
+ } else
+#endif /* CONFIG_CRASH_DUMP */
+ /*
+ * Disable translation if already enabled prior to OS handover.
+ */
+ for_each_active_iommu(iommu, drhd)
+ if (iommu->gcmd & DMA_GCMD_TE)
+ iommu_disable_translation(iommu);

if (dmar_dev_scope_init() < 0) {
if (force_on)
--
2.0.0-rc0

2014-12-22 09:17:17

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 08/10] iommu/vtd: assign new page table for dma_map

When a device driver issues the first dma_map command for a
device, we assign a new and empty page-table, thus removing all
mappings from the old kernel for the device.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 56 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4efed7c..7f8b546 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -44,6 +44,7 @@
#include <asm/irq_remapping.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
+#include <linux/dma-mapping.h>

#include "irq_remapping.h"

@@ -455,6 +456,8 @@ static int copy_root_entry_table(struct intel_iommu *iommu, void *ppap);
static int intel_iommu_load_translation_tables(struct dmar_drhd_unit *drhd,
int g_num_of_iommus);

+static void unmap_device_dma(struct dmar_domain *domain, struct device *dev);
+
#endif /* CONFIG_CRASH_DUMP */

/*
@@ -3199,14 +3202,30 @@ static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
return NULL;
}

- /* make sure context mapping is ok */
- if (unlikely(!domain_context_mapped(dev))) {
- ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
- if (ret) {
- printk(KERN_ERR "Domain context map for %s failed",
- dev_name(dev));
- return NULL;
- }
+ /* if in kdump kernel, we need to unmap the mapped dma pages,
+ * detach this device first.
+ */
+ if (likely(domain_context_mapped(dev))) {
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ unmap_device_dma(domain, dev);
+ domain = get_domain_for_dev(dev,
+ DEFAULT_DOMAIN_ADDRESS_WIDTH);
+ if (!domain) {
+ pr_err("Allocating domain for %s failed",
+ dev_name(dev));
+ return NULL;
+ }
+ } else
+#endif
+ return domain;
+ }
+
+ ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
+ if (ret) {
+ pr_err("Domain context map for %s failed",
+ dev_name(dev));
+ return NULL;
}

return domain;
@@ -5687,4 +5706,25 @@ static int intel_iommu_load_translation_tables(struct dmar_drhd_unit *drhd,
return 0;
}

+static void unmap_device_dma(struct dmar_domain *domain, struct device *dev)
+{
+ struct intel_iommu *iommu;
+ struct context_entry *ce;
+ struct iova *iova;
+ u8 bus, devfn;
+ phys_addr_t phys_addr;
+ dma_addr_t dev_addr;
+
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ ce = device_to_context_entry(iommu, bus, devfn);
+ phys_addr = context_address_root(ce) << VTD_PAGE_SHIFT;
+ dev_addr = phys_to_dma(dev, phys_addr);
+
+ iova = find_iova(&domain->iovad, IOVA_PFN(dev_addr));
+ if (iova)
+ intel_unmap(dev, dev_addr);
+
+ domain_remove_one_dev_info(domain, dev);
+}
+
#endif /* CONFIG_CRASH_DUMP */
--
2.0.0-rc0

2014-12-22 09:17:27

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 10/10] iommu/vt-d: Use old irte in kdump kernel

Fix the intr-remapping fault.

[1.594890] dmar: DRHD: handling fault status reg 2
[1.594894] dmar: INTR-REMAP: Request device [[41:00.0] fault index 4d
[1.594894] INTR-REMAP:[fault reason 34] Present field in the IRTE entry
is clear

Use old irte in kdump kernel, do not disable and re-enable interrupt
remapping.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 42 ++++++++++++++++++++++++++++++++-----
1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 13f2034..e244186 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -198,6 +198,11 @@ static int modify_irte(int irq, struct irte *irte_modified)

set_64bit(&irte->low, irte_modified->low);
set_64bit(&irte->high, irte_modified->high);
+
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel())
+ __iommu_update_old_irte(iommu, index);
+#endif
__iommu_flush_cache(iommu, irte, sizeof(*irte));

rc = qi_flush_iec(iommu, index, 0);
@@ -259,6 +264,11 @@ static int clear_entries(struct irq_2_iommu *irq_iommu)
bitmap_release_region(iommu->ir_table->bitmap, index,
irq_iommu->irte_mask);

+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel())
+ __iommu_update_old_irte(iommu, -1);
+#endif
+
return qi_flush_iec(iommu, index, irq_iommu->irte_mask);
}

@@ -640,11 +650,20 @@ static int __init intel_enable_irq_remapping(void)
*/
dmar_fault(-1, iommu);

- /*
- * Disable intr remapping and queued invalidation, if already
- * enabled prior to OS handover.
- */
- iommu_disable_irq_remapping(iommu);
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ /* Do notdisable irq and then re-enable again. */
+ } else {
+#endif
+ /*
+ * Disable intr remapping and queued invalidation,
+ * if already enabled prior to OS handover.
+ */
+ iommu_disable_irq_remapping(iommu);
+
+#ifdef CONFIG_CRASH_DUMP
+ }
+#endif

dmar_disable_qi(iommu);
}
@@ -687,7 +706,20 @@ static int __init intel_enable_irq_remapping(void)
if (intel_setup_irq_remapping(iommu))
goto error;

+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ unsigned long long q;
+
+ q = dmar_readq(iommu->reg + DMAR_IRTA_REG);
+ iommu->ir_table->base_old_phys = q & VTD_PAGE_MASK;
+ iommu->ir_table->base_old_virt = ioremap_cache(
+ iommu->ir_table->base_old_phys,
+ INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
+ __iommu_load_old_irte(iommu);
+ } else
+#endif
iommu_set_irq_remapping(iommu, eim);
+
setup = 1;
}

--
2.0.0-rc0

2014-12-22 09:17:23

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 09/10] iommu/vt-d: Copy functions for irte

Functions to copy the irte data from the old kernel into the kdump kernel.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 57 +++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 4 +++
2 files changed, 61 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index a55b207..13f2034 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -8,6 +8,7 @@
#include <linux/irq.h>
#include <linux/intel-iommu.h>
#include <linux/acpi.h>
+#include <linux/crash_dump.h>
#include <asm/io_apic.h>
#include <asm/smp.h>
#include <asm/cpu.h>
@@ -17,6 +18,11 @@

#include "irq_remapping.h"

+#ifdef CONFIG_CRASH_DUMP
+static int __iommu_load_old_irte(struct intel_iommu *iommu);
+static int __iommu_update_old_irte(struct intel_iommu *iommu, int index);
+#endif /* CONFIG_CRASH_DUMP */
+
struct ioapic_scope {
struct intel_iommu *iommu;
unsigned int id;
@@ -1296,3 +1302,54 @@ int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool insert)

return ret;
}
+
+#ifdef CONFIG_CRASH_DUMP
+
+static int __iommu_load_old_irte(struct intel_iommu *iommu)
+{
+ if ((!iommu)
+ || (!iommu->ir_table)
+ || (!iommu->ir_table->base)
+ || (!iommu->ir_table->base_old_phys)
+ || (!iommu->ir_table->base_old_virt))
+ return -1;
+
+ memcpy(iommu->ir_table->base,
+ iommu->ir_table->base_old_virt,
+ INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
+
+ return 0;
+}
+
+static int __iommu_update_old_irte(struct intel_iommu *iommu, int index)
+{
+ int start;
+ unsigned long size;
+ void __iomem *to;
+ void *from;
+
+ if ((!iommu)
+ || (!iommu->ir_table)
+ || (!iommu->ir_table->base)
+ || (!iommu->ir_table->base_old_phys)
+ || (!iommu->ir_table->base_old_virt))
+ return -1;
+
+ if (index < -1 || index >= INTR_REMAP_TABLE_ENTRIES)
+ return -1;
+
+ if (index == -1) {
+ start = 0;
+ size = INTR_REMAP_TABLE_ENTRIES * sizeof(struct irte);
+ } else {
+ start = index * sizeof(struct irte);
+ size = sizeof(struct irte);
+ }
+
+ to = iommu->ir_table->base_old_virt;
+ from = iommu->ir_table->base;
+ memcpy(to + start, from + start, size);
+
+ return 0;
+}
+#endif /* CONFIG_CRASH_DUMP */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8e29b97..76c6ea5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -290,6 +290,10 @@ struct q_inval {
struct ir_table {
struct irte *base;
unsigned long *bitmap;
+#ifdef CONFIG_CRASH_DUMP
+ void __iomem *base_old_virt;
+ unsigned long base_old_phys;
+#endif
};
#endif

--
2.0.0-rc0

2014-12-22 09:16:46

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 01/10] iommu/vt-d: Update iommu_attach_domain() and its callers

Allow specification of the domain-id for the new domain.
This patch only adds the 'did' parameter to iommu_attach_domain()
and modifies all of its callers to specify the default value of -1
which says "no did specified, allocate a new one".

This is no functional change from current behaviour -- just enables
a functional change to be made in a later patch.

Bill Sumner:
Original version.

Li, Zhenhua:
Minor change, add change to function __iommu_attach_domain.

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1232336..2dc6250 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1534,31 +1534,36 @@ static struct dmar_domain *alloc_domain(int flags)
}

static int __iommu_attach_domain(struct dmar_domain *domain,
- struct intel_iommu *iommu)
+ struct intel_iommu *iommu,
+ int domain_number)
{
int num;
unsigned long ndomains;

ndomains = cap_ndoms(iommu->cap);
- num = find_first_zero_bit(iommu->domain_ids, ndomains);
- if (num < ndomains) {
- set_bit(num, iommu->domain_ids);
- iommu->domains[num] = domain;
- } else {
- num = -ENOSPC;
- }
+ if (domain_number < 0) {
+ num = find_first_zero_bit(iommu->domain_ids, ndomains);
+ if (num < ndomains) {
+ set_bit(num, iommu->domain_ids);
+ iommu->domains[num] = domain;
+ } else {
+ num = -ENOSPC;
+ }
+ } else
+ num = domain_number;

return num;
}

static int iommu_attach_domain(struct dmar_domain *domain,
- struct intel_iommu *iommu)
+ struct intel_iommu *iommu,
+ int domain_number)
{
int num;
unsigned long flags;

spin_lock_irqsave(&iommu->lock, flags);
- num = __iommu_attach_domain(domain, iommu);
+ num = __iommu_attach_domain(domain, iommu, domain_number);
spin_unlock_irqrestore(&iommu->lock, flags);
if (num < 0)
pr_err("IOMMU: no free domain ids\n");
@@ -1577,7 +1582,7 @@ static int iommu_attach_vm_domain(struct dmar_domain *domain,
if (iommu->domains[num] == domain)
return num;

- return __iommu_attach_domain(domain, iommu);
+ return __iommu_attach_domain(domain, iommu, -1);
}

static void iommu_detach_domain(struct dmar_domain *domain,
@@ -2231,6 +2236,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
u16 dma_alias;
unsigned long flags;
u8 bus, devfn;
+ int did = -1; /* Default to "no domain_id supplied" */

domain = find_domain(dev);
if (domain)
@@ -2264,7 +2270,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
domain = alloc_domain(0);
if (!domain)
return NULL;
- domain->id = iommu_attach_domain(domain, iommu);
+ domain->id = iommu_attach_domain(domain, iommu, did);
if (domain->id < 0) {
free_domain_mem(domain);
return NULL;
@@ -2442,7 +2448,7 @@ static int __init si_domain_init(int hw)
return -EFAULT;

for_each_active_iommu(iommu, drhd) {
- ret = iommu_attach_domain(si_domain, iommu);
+ ret = iommu_attach_domain(si_domain, iommu, -1);
if (ret < 0) {
domain_exit(si_domain);
return -EFAULT;
@@ -3866,7 +3872,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
iommu_enable_translation(iommu);

if (si_domain) {
- ret = iommu_attach_domain(si_domain, iommu);
+ ret = iommu_attach_domain(si_domain, iommu, -1);
if (ret < 0 || si_domain->id != ret)
goto disable_iommu;
domain_attach_iommu(si_domain, iommu);
--
2.0.0-rc0

2014-12-22 09:16:44

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH 02/10] iommu/vt-d: Items required for kdump

Add structure type domain_values_entry used for kdump;
Add context entry functions needed for kdump.

Bill Sumner:
Original version;

Li, Zhenhua:
Changed the name of new functions, make them consistent with current
context get/set functions.

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 70 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2dc6250..5ce2850 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -40,6 +40,7 @@
#include <linux/pci-ats.h>
#include <linux/memblock.h>
#include <linux/dma-contiguous.h>
+#include <linux/crash_dump.h>
#include <asm/irq_remapping.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
@@ -208,6 +209,12 @@ get_context_addr_from_root(struct root_entry *root)
NULL);
}

+static inline unsigned long
+get_context_phys_from_root(struct root_entry *root)
+{
+ return root_present(root) ? (root->val & VTD_PAGE_MASK) : 0;
+}
+
/*
* low 64 bits:
* 0: present
@@ -228,6 +235,32 @@ static inline bool context_present(struct context_entry *context)
{
return (context->lo & 1);
}
+
+static inline int context_fault_enable(struct context_entry *c)
+{
+ return((c->lo >> 1) & 0x1);
+}
+
+static inline int context_translation_type(struct context_entry *c)
+{
+ return((c->lo >> 2) & 0x3);
+}
+
+static inline u64 context_address_root(struct context_entry *c)
+{
+ return((c->lo >> VTD_PAGE_SHIFT));
+}
+
+static inline int context_address_width(struct context_entry *c)
+{
+ return((c->hi >> 0) & 0x7);
+}
+
+static inline int context_domain_id(struct context_entry *c)
+{
+ return((c->hi >> 8) & 0xffff);
+}
+
static inline void context_set_present(struct context_entry *context)
{
context->lo |= 1;
@@ -313,6 +346,43 @@ static inline int first_pte_in_page(struct dma_pte *pte)
return !((unsigned long)pte & ~VTD_PAGE_MASK);
}

+
+#ifdef CONFIG_CRASH_DUMP
+
+/*
+ * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
+ *
+ * Fixes the crashdump kernel to deal with an active iommu and legacy
+ * DMA from the (old) panicked kernel in a manner similar to how legacy
+ * DMA is handled when no hardware iommu was in use by the old kernel --
+ * allow the legacy DMA to continue into its current buffers.
+ *
+ * In the crashdump kernel, this code:
+ * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
+ * 2. Do not re-enable IOMMU's translating.
+ * 3. In kdump kernel, use the old root entry table.
+ * 4. Leaves the current translations in-place so that legacy DMA will
+ * continue to use its current buffers.
+ * 5. Allocates to the device drivers in the crashdump kernel
+ * portions of the iova address ranges that are different
+ * from the iova address ranges that were being used by the old kernel
+ * at the time of the panic.
+ *
+ */
+
+struct domain_values_entry {
+ struct list_head link; /* link entries into a list */
+ struct iova_domain iovad; /* iova's that belong to this domain */
+ struct dma_pte *pgd; /* virtual address */
+ int did; /* domain id */
+ int gaw; /* max guest address width */
+ int iommu_superpage; /* Level of superpages supported:
+ 0 == 4KiB (no superpages), 1 == 2MiB,
+ 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
+};
+
+#endif /* CONFIG_CRASH_DUMP */
+
/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
--
2.0.0-rc0

2014-12-22 09:44:42

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

This version works for 3.19.0-rc1.

Zhenhua
On 12/22/2014 05:15 PM, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some system, the interrupt remapping fault will also happen even if the
> intel_iommu is not set to on, because the interrupt remapping will be enabled
> when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the
> INTR-Remap fault is caused by a similar reason. In short, the initialization
> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
> response.
>
> To fix this problem, we modifies the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry table and page table, copy data from the
> old ones that used by the old kernel.
> 5. to use different portions of the iova address ranges for the device drivers
> in the crashdump kernel than the iova ranges that were in-use at the time
> of the panic.
> 6. After device driver is loaded, when it issues the first dma_map command,
> free the dmar_domain structure for this device, and generate a new one, so
> that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, we also save its address to
> the old root entry table.
>
> For Interrupt Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
> the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
> for that device.
> 2. This approach behaves in a manner very similar to operation without an
> active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the
> device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
> This supports the practice of creating a special kdump kernel without
> drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code-changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful function for root entry table in code intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load old root entry table to iommu->root_entry from the memory
> of old kernel.
> 4. Functions to malloc new context entry table and page table and copy the data
> from the old ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support other behaviours that have been listed.
> 8. In the new functions, use physical address as "unsigned long" type, not
> pointers.
>
> Original version by Bill Sumner:
> https://lkml.org/lkml/2014/1/10/518
> https://lkml.org/lkml/2014/4/15/716
> https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's last of Bill's patchset:
> https://lkml.org/lkml/2014/10/21/134
> https://lkml.org/lkml/2014/12/15/121
>
> Changed in this version:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. Use "unsigned long" as physical address.
> 5. Use intel_unmap to unmap the old dma;
>
> This patchset should be applied with this one together:
> https://lkml.org/lkml/2014/11/5/43
> x86/iommu: fix incorrect bit operations in setting values
>
> Bill Sumner (5):
> iommu/vt-d: Update iommu_attach_domain() and its callers
> iommu/vt-d: Items required for kdump
> iommu/vt-d: data types and functions used for kdump
> iommu/vt-d: Add domain-id functions
> iommu/vt-d: enable kdump support in iommu module
>
> Li, Zhen-Hua (10):
> iommu/vt-d: Update iommu_attach_domain() and its callers
> iommu/vt-d: Items required for kdump
> iommu/vt-d: Add domain-id functions
> iommu/vt-d: functions to copy data from old mem
> iommu/vt-d: Add functions to load and save old re
> iommu/vt-d: datatypes and functions used for kdump
> iommu/vt-d: enable kdump support in iommu module
> iommu/vtd: assign new page table for dma_map
> iommu/vt-d: Copy functions for irte
> iommu/vt-d: Use old irte in kdump kernel
>
> drivers/iommu/intel-iommu.c | 1050 +++++++++++++++++++++++++++++++++--
> drivers/iommu/intel_irq_remapping.c | 99 +++-
> include/linux/intel-iommu.h | 18 +
> 3 files changed, 1123 insertions(+), 44 deletions(-)
>

2014-12-26 05:19:57

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Hi Zhen-Hua,

I tested your patch and found two problems.

[1]
Kenel panic occurs during 2nd kernel boot.

..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0 #25
Hardware name: FUJITSU-SV PRIMERGY BX920 S2/D3030, BIOS 080015 Rev.3D81.3030 02/10/2012
0000000000000002 ffff880036167d08 ffffffff815b1c6a 0000000000000000
ffffffff817f7670 ffff880036167d88 ffffffff815b19f1 0000000000000008
ffff880036167d98 ffff880036167d38 ffffffff810a5d2f ffff880036167d98
Call Trace:
[<ffffffff815b1c6a>] dump_stack+0x48/0x5e
[<ffffffff815b19f1>] panic+0xbb/0x1fa
[<ffffffff810a5d2f>] ? vprintk_default+0x1f/0x30
[<ffffffff814c6a6c>] panic_if_irq_remap+0x1c/0x20
[<ffffffff81b53985>] check_timer+0x1e7/0x5ed
[<ffffffff8129bd9d>] ? radix_tree_lookup+0xd/0x10
[<ffffffff81b5413b>] setup_IO_APIC+0x261/0x292
[<ffffffff81b50302>] native_smp_prepare_cpus+0x214/0x25d
[<ffffffff81b41c65>] kernel_init_freeable+0x1dc/0x28c
[<ffffffff815aaf00>] ? rest_init+0x80/0x80
[<ffffffff815aaf0e>] kernel_init+0xe/0xf0
[<ffffffff815b5d2c>] ret_from_fork+0x7c/0xb0
[<ffffffff815aaf00>] ? rest_init+0x80/0x80
---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC


This panic seems to be related to unflushed cache. I confirmed this
problem was fixed by the following patch.

--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -200,8 +200,13 @@ static int modify_irte(int irq, struct irte *irte_modified)
set_64bit(&irte->high, irte_modified->high);

#ifdef CONFIG_CRASH_DUMP
- if (is_kdump_kernel())
+ if (is_kdump_kernel()) {
__iommu_update_old_irte(iommu, index);
+ __iommu_flush_cache(iommu,
+ iommu->ir_table->base_old_virt +
+ index * sizeof(struct irte),
+ sizeof(struct irte));
+ }
#endif
__iommu_flush_cache(iommu, irte, sizeof(*irte));


[2]
Some DMAR error messages are still found in 2nd kernel boot.

dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr ffded000
DMAR:[fault reason 01] Present bit in root entry is clear

I confiremd your commit 1a2262 was already applied. Any idea?

Thanks,
Takao Indoh


On 2014/12/22 18:15, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some system, the interrupt remapping fault will also happen even if the
> intel_iommu is not set to on, because the interrupt remapping will be enabled
> when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the
> INTR-Remap fault is caused by a similar reason. In short, the initialization
> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
> response.
>
> To fix this problem, we modifies the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry table and page table, copy data from the
> old ones that used by the old kernel.
> 5. to use different portions of the iova address ranges for the device drivers
> in the crashdump kernel than the iova ranges that were in-use at the time
> of the panic.
> 6. After device driver is loaded, when it issues the first dma_map command,
> free the dmar_domain structure for this device, and generate a new one, so
> that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, we also save its address to
> the old root entry table.
>
> For Interrupt Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
> the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
> for that device.
> 2. This approach behaves in a manner very similar to operation without an
> active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the
> device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
> This supports the practice of creating a special kdump kernel without
> drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code-changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful function for root entry table in code intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load old root entry table to iommu->root_entry from the memory
> of old kernel.
> 4. Functions to malloc new context entry table and page table and copy the data
> from the old ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support other behaviours that have been listed.
> 8. In the new functions, use physical address as "unsigned long" type, not
> pointers.
>
> Original version by Bill Sumner:
> https://lkml.org/lkml/2014/1/10/518
> https://lkml.org/lkml/2014/4/15/716
> https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's last of Bill's patchset:
> https://lkml.org/lkml/2014/10/21/134
> https://lkml.org/lkml/2014/12/15/121
>
> Changed in this version:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. Use "unsigned long" as physical address.
> 5. Use intel_unmap to unmap the old dma;
>
> This patchset should be applied with this one together:
> https://lkml.org/lkml/2014/11/5/43
> x86/iommu: fix incorrect bit operations in setting values
>
> Bill Sumner (5):
> iommu/vt-d: Update iommu_attach_domain() and its callers
> iommu/vt-d: Items required for kdump
> iommu/vt-d: data types and functions used for kdump
> iommu/vt-d: Add domain-id functions
> iommu/vt-d: enable kdump support in iommu module
>
> Li, Zhen-Hua (10):
> iommu/vt-d: Update iommu_attach_domain() and its callers
> iommu/vt-d: Items required for kdump
> iommu/vt-d: Add domain-id functions
> iommu/vt-d: functions to copy data from old mem
> iommu/vt-d: Add functions to load and save old re
> iommu/vt-d: datatypes and functions used for kdump
> iommu/vt-d: enable kdump support in iommu module
> iommu/vtd: assign new page table for dma_map
> iommu/vt-d: Copy functions for irte
> iommu/vt-d: Use old irte in kdump kernel
>
> drivers/iommu/intel-iommu.c | 1050 +++++++++++++++++++++++++++++++++--
> drivers/iommu/intel_irq_remapping.c | 99 +++-
> include/linux/intel-iommu.h | 18 +
> 3 files changed, 1123 insertions(+), 44 deletions(-)
>

2014-12-26 06:47:55

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Hi Takao Indoh,

Thank you very much for your testing. I will add your update in next
version.
Also I think a flush for __iommu_update_old_root_entry is also necessary.

Currently I have no idea about your fault, does it happen before or
during its loading? Could you send me your full kernel log as an
attachment?

Regards and Merry Christmas.
Zhenhua

On 12/26/2014 01:13 PM, Takao Indoh wrote:
> Hi Zhen-Hua,
>
> I tested your patch and found two problems.
>
> [1]
> Kenel panic occurs during 2nd kernel boot.
>
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0 #25
> Hardware name: FUJITSU-SV PRIMERGY BX920 S2/D3030, BIOS 080015 Rev.3D81.3030 02/10/2012
> 0000000000000002 ffff880036167d08 ffffffff815b1c6a 0000000000000000
> ffffffff817f7670 ffff880036167d88 ffffffff815b19f1 0000000000000008
> ffff880036167d98 ffff880036167d38 ffffffff810a5d2f ffff880036167d98
> Call Trace:
> [<ffffffff815b1c6a>] dump_stack+0x48/0x5e
> [<ffffffff815b19f1>] panic+0xbb/0x1fa
> [<ffffffff810a5d2f>] ? vprintk_default+0x1f/0x30
> [<ffffffff814c6a6c>] panic_if_irq_remap+0x1c/0x20
> [<ffffffff81b53985>] check_timer+0x1e7/0x5ed
> [<ffffffff8129bd9d>] ? radix_tree_lookup+0xd/0x10
> [<ffffffff81b5413b>] setup_IO_APIC+0x261/0x292
> [<ffffffff81b50302>] native_smp_prepare_cpus+0x214/0x25d
> [<ffffffff81b41c65>] kernel_init_freeable+0x1dc/0x28c
> [<ffffffff815aaf00>] ? rest_init+0x80/0x80
> [<ffffffff815aaf0e>] kernel_init+0xe/0xf0
> [<ffffffff815b5d2c>] ret_from_fork+0x7c/0xb0
> [<ffffffff815aaf00>] ? rest_init+0x80/0x80
> ---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
>
>
> This panic seems to be related to unflushed cache. I confirmed this
> problem was fixed by the following patch.
>
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -200,8 +200,13 @@ static int modify_irte(int irq, struct irte *irte_modified)
> set_64bit(&irte->high, irte_modified->high);
>
> #ifdef CONFIG_CRASH_DUMP
> - if (is_kdump_kernel())
> + if (is_kdump_kernel()) {
> __iommu_update_old_irte(iommu, index);
> + __iommu_flush_cache(iommu,
> + iommu->ir_table->base_old_virt +
> + index * sizeof(struct irte),
> + sizeof(struct irte));
> + }
> #endif
> __iommu_flush_cache(iommu, irte, sizeof(*irte));
>
>
> [2]
> Some DMAR error messages are still found in 2nd kernel boot.
>
> dmar: DRHD: handling fault status reg 2
> dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr ffded000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> I confiremd your commit 1a2262 was already applied. Any idea?
>
> Thanks,
> Takao Indoh
>
>
> On 2014/12/22 18:15, Li, Zhen-Hua wrote:
>> This patchset is an update of Bill Sumner's patchset, implements a fix for:
>> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
>> when a panic happens, the kdump kernel will boot with these faults:
>>
>> dmar: DRHD: handling fault status reg 102
>> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
>> DMAR:[fault reason 01] Present bit in root entry is clear
>>
>> dmar: DRHD: handling fault status reg 2
>> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>>
>> On some system, the interrupt remapping fault will also happen even if the
>> intel_iommu is not set to on, because the interrupt remapping will be enabled
>> when x2apic is needed by the system.
>>
>> The cause of the DMA fault is described in Bill's original version, and the
>> INTR-Remap fault is caused by a similar reason. In short, the initialization
>> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
>> response.
>>
>> To fix this problem, we modifies the behaviors of the intel vt-d in the
>> crashdump kernel:
>>
>> For DMA Remapping:
>> 1. To accept the vt-d hardware in an active state,
>> 2. Do not disable and re-enable the translation, keep it enabled.
>> 3. Use the old root entry table, do not rewrite the RTA register.
>> 4. Malloc and use new context entry table and page table, copy data from the
>> old ones that used by the old kernel.
>> 5. to use different portions of the iova address ranges for the device drivers
>> in the crashdump kernel than the iova ranges that were in-use at the time
>> of the panic.
>> 6. After device driver is loaded, when it issues the first dma_map command,
>> free the dmar_domain structure for this device, and generate a new one, so
>> that the device can be assigned a new and empty page table.
>> 7. When a new context entry table is generated, we also save its address to
>> the old root entry table.
>>
>> For Interrupt Remapping:
>> 1. To accept the vt-d hardware in an active state,
>> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
>> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
>> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
>> the updated data will be stored to the old interrupt remapping table.
>>
>> Advantages of this approach:
>> 1. All manipulation of the IO-device is done by the Linux device-driver
>> for that device.
>> 2. This approach behaves in a manner very similar to operation without an
>> active iommu.
>> 3. Any activity between the IO-device and its RMRR areas is handled by the
>> device-driver in the same manner as during a non-kdump boot.
>> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
>> This supports the practice of creating a special kdump kernel without
>> drivers for any devices that are not required for taking a crashdump.
>> 5. Minimal code-changes among the existing mainline intel vt-d code.
>>
>> Summary of changes in this patch set:
>> 1. Added some useful function for root entry table in code intel-iommu.c
>> 2. Added new members to struct root_entry and struct irte;
>> 3. Functions to load old root entry table to iommu->root_entry from the memory
>> of old kernel.
>> 4. Functions to malloc new context entry table and page table and copy the data
>> from the old ones to the malloced new ones.
>> 5. Functions to enable support for DMA remapping in kdump kernel.
>> 6. Functions to load old irte data from the old kernel to the kdump kernel.
>> 7. Some code changes that support other behaviours that have been listed.
>> 8. In the new functions, use physical address as "unsigned long" type, not
>> pointers.
>>
>> Original version by Bill Sumner:
>> https://lkml.org/lkml/2014/1/10/518
>> https://lkml.org/lkml/2014/4/15/716
>> https://lkml.org/lkml/2014/4/24/836
>>
>> Zhenhua's last of Bill's patchset:
>> https://lkml.org/lkml/2014/10/21/134
>> https://lkml.org/lkml/2014/12/15/121
>>
>> Changed in this version:
>> 1. Do not disable and re-enable traslation and interrupt remapping.
>> 2. Use old root entry table.
>> 3. Use old interrupt remapping table.
>> 4. Use "unsigned long" as physical address.
>> 5. Use intel_unmap to unmap the old dma;
>>
>> This patchset should be applied with this one together:
>> https://lkml.org/lkml/2014/11/5/43
>> x86/iommu: fix incorrect bit operations in setting values
>>
>> Bill Sumner (5):
>> iommu/vt-d: Update iommu_attach_domain() and its callers
>> iommu/vt-d: Items required for kdump
>> iommu/vt-d: data types and functions used for kdump
>> iommu/vt-d: Add domain-id functions
>> iommu/vt-d: enable kdump support in iommu module
>>
>> Li, Zhen-Hua (10):
>> iommu/vt-d: Update iommu_attach_domain() and its callers
>> iommu/vt-d: Items required for kdump
>> iommu/vt-d: Add domain-id functions
>> iommu/vt-d: functions to copy data from old mem
>> iommu/vt-d: Add functions to load and save old re
>> iommu/vt-d: datatypes and functions used for kdump
>> iommu/vt-d: enable kdump support in iommu module
>> iommu/vtd: assign new page table for dma_map
>> iommu/vt-d: Copy functions for irte
>> iommu/vt-d: Use old irte in kdump kernel
>>
>> drivers/iommu/intel-iommu.c | 1050 +++++++++++++++++++++++++++++++++--
>> drivers/iommu/intel_irq_remapping.c | 99 +++-
>> include/linux/intel-iommu.h | 18 +
>> 3 files changed, 1123 insertions(+), 44 deletions(-)
>>
>

2014-12-26 07:28:42

by Takao Indoh

[permalink] [raw]
Subject: Re: [PATCH 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On 2014/12/26 15:46, Li, ZhenHua wrote:
> Hi Takao Indoh,
>
> Thank you very much for your testing. I will add your update in next
> version.
> Also I think a flush for __iommu_update_old_root_entry is also necessary.
>
> Currently I have no idea about your fault, does it happen before or
> during its loading? Could you send me your full kernel log as an
> attachment?

Sure, see attached file.

I removed 9/10 and 10/10 patches from my kernel to avoid panic problem I
reported in previous mail, and then tested kdump. So please ignore
intr-remap fault message in log file. Also please ignore stack trace
starting with the following message, it's a problem of my box.

Flags mismatch irq 0. 00000080 (i801_smbus) vs. 00015a00 (timer)

Thanks,
Takao Indoh

>
> Regards and Merry Christmas.
> Zhenhua
>
> On 12/26/2014 01:13 PM, Takao Indoh wrote:
>> Hi Zhen-Hua,
>>
>> I tested your patch and found two problems.
>>
>> [1]
>> Kenel panic occurs during 2nd kernel boot.
>>
>> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>> Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0 #25
>> Hardware name: FUJITSU-SV PRIMERGY BX920 S2/D3030, BIOS 080015 Rev.3D81.3030 02/10/2012
>> 0000000000000002 ffff880036167d08 ffffffff815b1c6a 0000000000000000
>> ffffffff817f7670 ffff880036167d88 ffffffff815b19f1 0000000000000008
>> ffff880036167d98 ffff880036167d38 ffffffff810a5d2f ffff880036167d98
>> Call Trace:
>> [<ffffffff815b1c6a>] dump_stack+0x48/0x5e
>> [<ffffffff815b19f1>] panic+0xbb/0x1fa
>> [<ffffffff810a5d2f>] ? vprintk_default+0x1f/0x30
>> [<ffffffff814c6a6c>] panic_if_irq_remap+0x1c/0x20
>> [<ffffffff81b53985>] check_timer+0x1e7/0x5ed
>> [<ffffffff8129bd9d>] ? radix_tree_lookup+0xd/0x10
>> [<ffffffff81b5413b>] setup_IO_APIC+0x261/0x292
>> [<ffffffff81b50302>] native_smp_prepare_cpus+0x214/0x25d
>> [<ffffffff81b41c65>] kernel_init_freeable+0x1dc/0x28c
>> [<ffffffff815aaf00>] ? rest_init+0x80/0x80
>> [<ffffffff815aaf0e>] kernel_init+0xe/0xf0
>> [<ffffffff815b5d2c>] ret_from_fork+0x7c/0xb0
>> [<ffffffff815aaf00>] ? rest_init+0x80/0x80
>> ---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
>>
>>
>> This panic seems to be related to unflushed cache. I confirmed this
>> problem was fixed by the following patch.
>>
>> --- a/drivers/iommu/intel_irq_remapping.c
>> +++ b/drivers/iommu/intel_irq_remapping.c
>> @@ -200,8 +200,13 @@ static int modify_irte(int irq, struct irte *irte_modified)
>> set_64bit(&irte->high, irte_modified->high);
>>
>> #ifdef CONFIG_CRASH_DUMP
>> - if (is_kdump_kernel())
>> + if (is_kdump_kernel()) {
>> __iommu_update_old_irte(iommu, index);
>> + __iommu_flush_cache(iommu,
>> + iommu->ir_table->base_old_virt +
>> + index * sizeof(struct irte),
>> + sizeof(struct irte));
>> + }
>> #endif
>> __iommu_flush_cache(iommu, irte, sizeof(*irte));
>>
>>
>> [2]
>> Some DMAR error messages are still found in 2nd kernel boot.
>>
>> dmar: DRHD: handling fault status reg 2
>> dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr ffded000
>> DMAR:[fault reason 01] Present bit in root entry is clear
>>
>> I confiremd your commit 1a2262 was already applied. Any idea?
>>
>> Thanks,
>> Takao Indoh
>>
>>
>> On 2014/12/22 18:15, Li, Zhen-Hua wrote:
>>> This patchset is an update of Bill Sumner's patchset, implements a fix for:
>>> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
>>> when a panic happens, the kdump kernel will boot with these faults:
>>>
>>> dmar: DRHD: handling fault status reg 102
>>> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
>>> DMAR:[fault reason 01] Present bit in root entry is clear
>>>
>>> dmar: DRHD: handling fault status reg 2
>>> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
>>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>>>
>>> On some system, the interrupt remapping fault will also happen even if the
>>> intel_iommu is not set to on, because the interrupt remapping will be enabled
>>> when x2apic is needed by the system.
>>>
>>> The cause of the DMA fault is described in Bill's original version, and the
>>> INTR-Remap fault is caused by a similar reason. In short, the initialization
>>> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
>>> response.
>>>
>>> To fix this problem, we modifies the behaviors of the intel vt-d in the
>>> crashdump kernel:
>>>
>>> For DMA Remapping:
>>> 1. To accept the vt-d hardware in an active state,
>>> 2. Do not disable and re-enable the translation, keep it enabled.
>>> 3. Use the old root entry table, do not rewrite the RTA register.
>>> 4. Malloc and use new context entry table and page table, copy data from the
>>> old ones that used by the old kernel.
>>> 5. to use different portions of the iova address ranges for the device drivers
>>> in the crashdump kernel than the iova ranges that were in-use at the time
>>> of the panic.
>>> 6. After device driver is loaded, when it issues the first dma_map command,
>>> free the dmar_domain structure for this device, and generate a new one, so
>>> that the device can be assigned a new and empty page table.
>>> 7. When a new context entry table is generated, we also save its address to
>>> the old root entry table.
>>>
>>> For Interrupt Remapping:
>>> 1. To accept the vt-d hardware in an active state,
>>> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
>>> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
>>> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
>>> the updated data will be stored to the old interrupt remapping table.
>>>
>>> Advantages of this approach:
>>> 1. All manipulation of the IO-device is done by the Linux device-driver
>>> for that device.
>>> 2. This approach behaves in a manner very similar to operation without an
>>> active iommu.
>>> 3. Any activity between the IO-device and its RMRR areas is handled by the
>>> device-driver in the same manner as during a non-kdump boot.
>>> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
>>> This supports the practice of creating a special kdump kernel without
>>> drivers for any devices that are not required for taking a crashdump.
>>> 5. Minimal code-changes among the existing mainline intel vt-d code.
>>>
>>> Summary of changes in this patch set:
>>> 1. Added some useful function for root entry table in code intel-iommu.c
>>> 2. Added new members to struct root_entry and struct irte;
>>> 3. Functions to load old root entry table to iommu->root_entry from the memory
>>> of old kernel.
>>> 4. Functions to malloc new context entry table and page table and copy the data
>>> from the old ones to the malloced new ones.
>>> 5. Functions to enable support for DMA remapping in kdump kernel.
>>> 6. Functions to load old irte data from the old kernel to the kdump kernel.
>>> 7. Some code changes that support other behaviours that have been listed.
>>> 8. In the new functions, use physical address as "unsigned long" type, not
>>> pointers.
>>>
>>> Original version by Bill Sumner:
>>> https://lkml.org/lkml/2014/1/10/518
>>> https://lkml.org/lkml/2014/4/15/716
>>> https://lkml.org/lkml/2014/4/24/836
>>>
>>> Zhenhua's last of Bill's patchset:
>>> https://lkml.org/lkml/2014/10/21/134
>>> https://lkml.org/lkml/2014/12/15/121
>>>
>>> Changed in this version:
>>> 1. Do not disable and re-enable traslation and interrupt remapping.
>>> 2. Use old root entry table.
>>> 3. Use old interrupt remapping table.
>>> 4. Use "unsigned long" as physical address.
>>> 5. Use intel_unmap to unmap the old dma;
>>>
>>> This patchset should be applied with this one together:
>>> https://lkml.org/lkml/2014/11/5/43
>>> x86/iommu: fix incorrect bit operations in setting values
>>>
>>> Bill Sumner (5):
>>> iommu/vt-d: Update iommu_attach_domain() and its callers
>>> iommu/vt-d: Items required for kdump
>>> iommu/vt-d: data types and functions used for kdump
>>> iommu/vt-d: Add domain-id functions
>>> iommu/vt-d: enable kdump support in iommu module
>>>
>>> Li, Zhen-Hua (10):
>>> iommu/vt-d: Update iommu_attach_domain() and its callers
>>> iommu/vt-d: Items required for kdump
>>> iommu/vt-d: Add domain-id functions
>>> iommu/vt-d: functions to copy data from old mem
>>> iommu/vt-d: Add functions to load and save old re
>>> iommu/vt-d: datatypes and functions used for kdump
>>> iommu/vt-d: enable kdump support in iommu module
>>> iommu/vtd: assign new page table for dma_map
>>> iommu/vt-d: Copy functions for irte
>>> iommu/vt-d: Use old irte in kdump kernel
>>>
>>> drivers/iommu/intel-iommu.c | 1050 +++++++++++++++++++++++++++++++++--
>>> drivers/iommu/intel_irq_remapping.c | 99 +++-
>>> include/linux/intel-iommu.h | 18 +
>>> 3 files changed, 1123 insertions(+), 44 deletions(-)
>>>
>>
>
>
>


Attachments:
log.txt (50.53 kB)

2014-12-29 03:16:52

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Hi Takao Indoh,

Happy New Year, and thank you very much for you help. The flush is quite
a problem, as there are several places the flush function should be called,
I think the flush should be placed in functions like __iommu_update_old_*.
Created a small patch for this, it is attached.



As I cannot reproduce your problems on my system, so could you please try
these steps?
1. Apply the latest patchset, including 9/10 and 10/10, and then apply the
attached patch_for_flush.patch. And then test the kernel.

2. If 1 does not fix the DMAR fault problems, then it might be caused by
7/10, so please *unpatch* it from the kernel (others and the attached one
should be patched), and then test the kernel.

Regards
Zhenhua

On 12/26/2014 03:27 PM, Takao Indoh wrote:
> On 2014/12/26 15:46, Li, ZhenHua wrote:
>> Hi Takao Indoh,
>>
>> Thank you very much for your testing. I will add your update in next
>> version.
>> Also I think a flush for __iommu_update_old_root_entry is also necessary.
>>
>> Currently I have no idea about your fault, does it happen before or
>> during its loading? Could you send me your full kernel log as an
>> attachment?
> Sure, see attached file.
>
> I removed 9/10 and 10/10 patches from my kernel to avoid panic problem I
> reported in previous mail, and then tested kdump. So please ignore
> intr-remap fault message in log file. Also please ignore stack trace
> starting with the following message, it's a problem of my box.
>
> Flags mismatch irq 0. 00000080 (i801_smbus) vs. 00015a00 (timer)
>
> Thanks,
> Takao Indoh
>
>> Regards and Merry Christmas.
>> Zhenhua
>>
>> On 12/26/2014 01:13 PM, Takao Indoh wrote:
>>> Hi Zhen-Hua,
>>>
>>> I tested your patch and found two problems.
>>>
>>> [1]
>>> Kenel panic occurs during 2nd kernel boot.
>>>
>>> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>>> Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0 #25
>>> Hardware name: FUJITSU-SV PRIMERGY BX920 S2/D3030, BIOS 080015 Rev.3D81.3030 02/10/2012
>>> 0000000000000002 ffff880036167d08 ffffffff815b1c6a 0000000000000000
>>> ffffffff817f7670 ffff880036167d88 ffffffff815b19f1 0000000000000008
>>> ffff880036167d98 ffff880036167d38 ffffffff810a5d2f ffff880036167d98
>>> Call Trace:
>>> [<ffffffff815b1c6a>] dump_stack+0x48/0x5e
>>> [<ffffffff815b19f1>] panic+0xbb/0x1fa
>>> [<ffffffff810a5d2f>] ? vprintk_default+0x1f/0x30
>>> [<ffffffff814c6a6c>] panic_if_irq_remap+0x1c/0x20
>>> [<ffffffff81b53985>] check_timer+0x1e7/0x5ed
>>> [<ffffffff8129bd9d>] ? radix_tree_lookup+0xd/0x10
>>> [<ffffffff81b5413b>] setup_IO_APIC+0x261/0x292
>>> [<ffffffff81b50302>] native_smp_prepare_cpus+0x214/0x25d
>>> [<ffffffff81b41c65>] kernel_init_freeable+0x1dc/0x28c
>>> [<ffffffff815aaf00>] ? rest_init+0x80/0x80
>>> [<ffffffff815aaf0e>] kernel_init+0xe/0xf0
>>> [<ffffffff815b5d2c>] ret_from_fork+0x7c/0xb0
>>> [<ffffffff815aaf00>] ? rest_init+0x80/0x80
>>> ---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
>>>
>>>
>>> This panic seems to be related to unflushed cache. I confirmed this
>>> problem was fixed by the following patch.
>>>
>>> --- a/drivers/iommu/intel_irq_remapping.c
>>> +++ b/drivers/iommu/intel_irq_remapping.c
>>> @@ -200,8 +200,13 @@ static int modify_irte(int irq, struct irte *irte_modified)
>>> set_64bit(&irte->high, irte_modified->high);
>>>
>>> #ifdef CONFIG_CRASH_DUMP
>>> - if (is_kdump_kernel())
>>> + if (is_kdump_kernel()) {
>>> __iommu_update_old_irte(iommu, index);
>>> + __iommu_flush_cache(iommu,
>>> + iommu->ir_table->base_old_virt +
>>> + index * sizeof(struct irte),
>>> + sizeof(struct irte));
>>> + }
>>> #endif
>>> __iommu_flush_cache(iommu, irte, sizeof(*irte));
>>>
>>>
>>> [2]
>>> Some DMAR error messages are still found in 2nd kernel boot.
>>>
>>> dmar: DRHD: handling fault status reg 2
>>> dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr ffded000
>>> DMAR:[fault reason 01] Present bit in root entry is clear
>>>
>>> I confiremd your commit 1a2262 was already applied. Any idea?
>>>
>>> Thanks,
>>> Takao Indoh
>>>
>>>
>>> On 2014/12/22 18:15, Li, Zhen-Hua wrote:
>>>> This patchset is an update of Bill Sumner's patchset, implements a fix for:
>>>> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
>>>> when a panic happens, the kdump kernel will boot with these faults:
>>>>
>>>> dmar: DRHD: handling fault status reg 102
>>>> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
>>>> DMAR:[fault reason 01] Present bit in root entry is clear
>>>>
>>>> dmar: DRHD: handling fault status reg 2
>>>> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
>>>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>>>>
>>>> On some system, the interrupt remapping fault will also happen even if the
>>>> intel_iommu is not set to on, because the interrupt remapping will be enabled
>>>> when x2apic is needed by the system.
>>>>
>>>> The cause of the DMA fault is described in Bill's original version, and the
>>>> INTR-Remap fault is caused by a similar reason. In short, the initialization
>>>> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
>>>> response.
>>>>
>>>> To fix this problem, we modifies the behaviors of the intel vt-d in the
>>>> crashdump kernel:
>>>>
>>>> For DMA Remapping:
>>>> 1. To accept the vt-d hardware in an active state,
>>>> 2. Do not disable and re-enable the translation, keep it enabled.
>>>> 3. Use the old root entry table, do not rewrite the RTA register.
>>>> 4. Malloc and use new context entry table and page table, copy data from the
>>>> old ones that used by the old kernel.
>>>> 5. to use different portions of the iova address ranges for the device drivers
>>>> in the crashdump kernel than the iova ranges that were in-use at the time
>>>> of the panic.
>>>> 6. After device driver is loaded, when it issues the first dma_map command,
>>>> free the dmar_domain structure for this device, and generate a new one, so
>>>> that the device can be assigned a new and empty page table.
>>>> 7. When a new context entry table is generated, we also save its address to
>>>> the old root entry table.
>>>>
>>>> For Interrupt Remapping:
>>>> 1. To accept the vt-d hardware in an active state,
>>>> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
>>>> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
>>>> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
>>>> the updated data will be stored to the old interrupt remapping table.
>>>>
>>>> Advantages of this approach:
>>>> 1. All manipulation of the IO-device is done by the Linux device-driver
>>>> for that device.
>>>> 2. This approach behaves in a manner very similar to operation without an
>>>> active iommu.
>>>> 3. Any activity between the IO-device and its RMRR areas is handled by the
>>>> device-driver in the same manner as during a non-kdump boot.
>>>> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
>>>> This supports the practice of creating a special kdump kernel without
>>>> drivers for any devices that are not required for taking a crashdump.
>>>> 5. Minimal code-changes among the existing mainline intel vt-d code.
>>>>
>>>> Summary of changes in this patch set:
>>>> 1. Added some useful function for root entry table in code intel-iommu.c
>>>> 2. Added new members to struct root_entry and struct irte;
>>>> 3. Functions to load old root entry table to iommu->root_entry from the memory
>>>> of old kernel.
>>>> 4. Functions to malloc new context entry table and page table and copy the data
>>>> from the old ones to the malloced new ones.
>>>> 5. Functions to enable support for DMA remapping in kdump kernel.
>>>> 6. Functions to load old irte data from the old kernel to the kdump kernel.
>>>> 7. Some code changes that support other behaviours that have been listed.
>>>> 8. In the new functions, use physical address as "unsigned long" type, not
>>>> pointers.
>>>>
>>>> Original version by Bill Sumner:
>>>> https://lkml.org/lkml/2014/1/10/518
>>>> https://lkml.org/lkml/2014/4/15/716
>>>> https://lkml.org/lkml/2014/4/24/836
>>>>
>>>> Zhenhua's last of Bill's patchset:
>>>> https://lkml.org/lkml/2014/10/21/134
>>>> https://lkml.org/lkml/2014/12/15/121
>>>>
>>>> Changed in this version:
>>>> 1. Do not disable and re-enable traslation and interrupt remapping.
>>>> 2. Use old root entry table.
>>>> 3. Use old interrupt remapping table.
>>>> 4. Use "unsigned long" as physical address.
>>>> 5. Use intel_unmap to unmap the old dma;
>>>>
>>>> This patchset should be applied with this one together:
>>>> https://lkml.org/lkml/2014/11/5/43
>>>> x86/iommu: fix incorrect bit operations in setting values
>>>>
>>>> Bill Sumner (5):
>>>> iommu/vt-d: Update iommu_attach_domain() and its callers
>>>> iommu/vt-d: Items required for kdump
>>>> iommu/vt-d: data types and functions used for kdump
>>>> iommu/vt-d: Add domain-id functions
>>>> iommu/vt-d: enable kdump support in iommu module
>>>>
>>>> Li, Zhen-Hua (10):
>>>> iommu/vt-d: Update iommu_attach_domain() and its callers
>>>> iommu/vt-d: Items required for kdump
>>>> iommu/vt-d: Add domain-id functions
>>>> iommu/vt-d: functions to copy data from old mem
>>>> iommu/vt-d: Add functions to load and save old re
>>>> iommu/vt-d: datatypes and functions used for kdump
>>>> iommu/vt-d: enable kdump support in iommu module
>>>> iommu/vtd: assign new page table for dma_map
>>>> iommu/vt-d: Copy functions for irte
>>>> iommu/vt-d: Use old irte in kdump kernel
>>>>
>>>> drivers/iommu/intel-iommu.c | 1050 +++++++++++++++++++++++++++++++++--
>>>> drivers/iommu/intel_irq_remapping.c | 99 +++-
>>>> include/linux/intel-iommu.h | 18 +
>>>> 3 files changed, 1123 insertions(+), 44 deletions(-)
>>>>
>>
>>


Attachments:
patch_for_flush.patch (1.40 kB)