2015-05-11 09:54:04

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

This patchset is an update of Bill Sumner's patchset, implements a fix for:
If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
when a panic happens, the kdump kernel will boot with these faults:

dmar: DRHD: handling fault status reg 102
dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
DMAR:[fault reason 01] Present bit in root entry is clear

dmar: DRHD: handling fault status reg 2
dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

On some system, the interrupt remapping fault will also happen even if the
intel_iommu is not set to on, because the interrupt remapping will be enabled
when x2apic is needed by the system.

The cause of the DMA fault is described in Bill's original version, and the
INTR-Remap fault is caused by a similar reason. In short, the initialization
of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
response.

To fix this problem, we modifies the behaviors of the intel vt-d in the
crashdump kernel:

For DMA Remapping:
1. To accept the vt-d hardware in an active state,
2. Do not disable and re-enable the translation, keep it enabled.
3. Use the old root entry table, do not rewrite the RTA register.
4. Malloc and use new context entry table, copy data from the old ones that
used by the old kernel.
5. Keep using the old page tables before driver is loaded.
6. After device driver is loaded, when it issues the first dma_map command,
free the dmar_domain structure for this device, and generate a new one, so
that the device can be assigned a new and empty page table.
7. When a new context entry table is generated, we also save its address to
the old root entry table.

For Interrupt Remapping:
1. To accept the vt-d hardware in an active state,
2. Do not disable and re-enable the interrupt remapping, keep it enabled.
3. Use the old interrupt remapping table, do not rewrite the IRTA register.
4. When ioapic entry is setup, the interrupt remapping table is changed, and
the updated data will be stored to the old interrupt remapping table.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.
5. Minimal code-changes among the existing mainline intel vt-d code.

Summary of changes in this patch set:
1. Added some useful function for root entry table in code intel-iommu.c
2. Added new members to struct root_entry and struct irte;
3. Functions to load old root entry table to iommu->root_entry from the memory
of old kernel.
4. Functions to malloc new context entry table and copy the data from the old
ones to the malloced new ones.
5. Functions to enable support for DMA remapping in kdump kernel.
6. Functions to load old irte data from the old kernel to the kdump kernel.
7. Some code changes that support other behaviours that have been listed.
8. In the new functions, use physical address as "unsigned long" type, not
pointers.

Original version by Bill Sumner:
https://lkml.org/lkml/2014/1/10/518
https://lkml.org/lkml/2014/4/15/716
https://lkml.org/lkml/2014/4/24/836

Zhenhua's updates:
https://lkml.org/lkml/2014/10/21/134
https://lkml.org/lkml/2014/12/15/121
https://lkml.org/lkml/2014/12/22/53
https://lkml.org/lkml/2015/1/6/1166
https://lkml.org/lkml/2015/1/12/35
https://lkml.org/lkml/2015/3/19/33
https://lkml.org/lkml/2015/4/10/135

Changelog[v11]:
1. Fix some conflicts with the latest upstream kernel.
Add flush in iommu_context_addr

Changelog[v10]:
1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
Use one flag which stores the te and ir status in last kernel:
iommu->pre_enabled_trans
iommu->pre_enabled_ir

Changelog[v9]:
1. Add new function iommu_attach_domain_with_id.
2. Do not copy old page tables, keep using the old ones.
3. Remove functions:
intel_iommu_did_to_domain_values_entry
intel_iommu_get_dids_from_old_kernel
device_to_domain_id
copy_page_addr
copy_page_table
copy_context_entry
copy_context_entry_table
4. Add new function device_to_existing_context_entry.

Changelog[v8]:
1. Add a missing __iommu_flush_cache in function copy_page_table.

Changelog[v7]:
1. Use __iommu_flush_cache to flush the data to hardware.

Changelog[v6]:
1. Use "unsigned long" as type of physical address.
2. Use new function unmap_device_dma to unmap the old dma.
3. Some small incorrect bits order for aw shift.

Changelog[v5]:
1. Do not disable and re-enable traslation and interrupt remapping.
2. Use old root entry table.
3. Use old interrupt remapping table.
4. New functions to copy data from old kernel, and save to old kernel mem.
5. New functions to save updated root entry table and irte table.
6. Use intel_unmap to unmap the old dma;
7. Allocate new pages while driver is being loaded.

Changelog[v4]:
1. Cut off the patches that move some defines and functions to new files.
2. Reduce the numbers of patches to five, make it more easier to read.
3. Changed the name of functions, make them consistent with current context
get/set functions.
4. Add change to function __iommu_attach_domain.

Changelog[v3]:
1. Commented-out "#define DEBUG 1" to eliminate debug messages.
2. Updated the comments about changes in each version.
3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
struct as recommended by Baoquan He [[email protected]]
init_iova_domain(&domain->iovad, DMA_32BIT_PFN);

Changelog[v2]:
The following series implements a fix for:
A kdump problem about DMA that has been discussed for a long time. That is,
when a kernel panics and boots into the kdump kernel, DMA started by the
panicked kernel is not stopped before the kdump kernel is booted and the
kdump kernel disables the IOMMU while this DMA continues. This causes the
IOMMU to stop translating the DMA addresses as IOVAs and begin to treat
them as physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors or
(2) generate PCI SERR errors or
(3) transfer data to or from incorrect areas of memory. Often this
causes the dump to fail.

Changelog[v1]:
The original version.

Changed in this version:
1. Do not disable and re-enable traslation and interrupt remapping.
2. Use old root entry table.
3. Use old interrupt remapping table.
4. Use "unsigned long" as physical address.
5. Use intel_unmap to unmap the old dma;

Baoquan He <[email protected]> helps testing this patchset.
Takao Indoh <[email protected]> gives valuable suggestions.

Li, Zhen-Hua (10):
iommu/vt-d: New function to attach domain with id
iommu/vt-d: Items required for kdump
iommu/vt-d: Function to get existing context entry
iommu/vt-d: functions to copy data from old mem
iommu/vt-d: Add functions to load and save old re
iommu/vt-d: datatypes and functions used for kdump
iommu/vt-d: enable kdump support in iommu module
iommu/vt-d: assign new page table for dma_map
iommu/vt-d: Copy functions for irte
iommu/vt-d: Use old irte in kdump kernel

drivers/iommu/intel-iommu.c | 536 ++++++++++++++++++++++++++++++++++--
drivers/iommu/intel_irq_remapping.c | 96 ++++++-
include/linux/intel-iommu.h | 16 ++
3 files changed, 623 insertions(+), 25 deletions(-)

--
2.0.0-rc0


2015-05-11 09:57:59

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 01/10] iommu/vt-d: New function to attach domain with id

Allow specification of the domain-id for the new domain.
This patch only adds a new function iommu_attach_domain_with_id, it is like
the function iommu_attach_domain(), only adding a parameter "did".

Bill Sumner:
(In older versions) Add new 'did' parameter to iommu_attach_domain();
The caller of this function.

Li, Zhenhua:
New function iommu_attach_domain_with_id(), instead of updating function
iommu_attach_domain();

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43be..cb9d6cc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1552,6 +1552,16 @@ static int iommu_attach_domain(struct dmar_domain *domain,
return num;
}

+static int iommu_attach_domain_with_id(struct dmar_domain *domain,
+ struct intel_iommu *iommu,
+ int domain_number)
+{
+ if (domain_number >= 0)
+ return domain_number;
+
+ return iommu_attach_domain(domain, iommu);
+}
+
static int iommu_attach_vm_domain(struct dmar_domain *domain,
struct intel_iommu *iommu)
{
@@ -2220,6 +2230,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
u16 dma_alias;
unsigned long flags;
u8 bus, devfn;
+ int did = -1; /* Default to "no domain_id supplied" */

domain = find_domain(dev);
if (domain)
@@ -2253,7 +2264,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
domain = alloc_domain(0);
if (!domain)
return NULL;
- domain->id = iommu_attach_domain(domain, iommu);
+ domain->id = iommu_attach_domain_with_id(domain, iommu, did);
if (domain->id < 0) {
free_domain_mem(domain);
return NULL;
--
2.0.0-rc0

2015-05-11 09:54:16

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

Add context entry functions needed for kdump.

Bill Sumner:
Original version;

Li, Zhenhua:
Changed the name of new functions, make them consistent with current
context get/set functions.
Remove the structure dve which is not used in new version.

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index cb9d6cc..1e7ceb5 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -190,6 +190,31 @@ struct root_entry {
};
#define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))

+static inline bool root_present(struct root_entry *root)
+{
+ return (root->lo & 1);
+}
+
+static inline void set_root_value(struct root_entry *root, unsigned long value)
+{
+ root->lo &= ~VTD_PAGE_MASK;
+ root->lo |= value & VTD_PAGE_MASK;
+}
+
+static inline struct context_entry *
+get_context_addr_from_root(struct root_entry *root)
+{
+ return (struct context_entry *)
+ (root_present(root)?phys_to_virt(
+ root->lo & VTD_PAGE_MASK) :
+ NULL);
+}
+
+static inline unsigned long
+get_context_phys_from_root(struct root_entry *root)
+{
+ return root_present(root) ? (root->lo & VTD_PAGE_MASK) : 0;
+}

/*
* low 64 bits:
@@ -211,6 +236,32 @@ static inline bool context_present(struct context_entry *context)
{
return (context->lo & 1);
}
+
+static inline int context_fault_enable(struct context_entry *c)
+{
+ return((c->lo >> 1) & 0x1);
+}
+
+static inline int context_translation_type(struct context_entry *c)
+{
+ return((c->lo >> 2) & 0x3);
+}
+
+static inline u64 context_address_root(struct context_entry *c)
+{
+ return((c->lo >> VTD_PAGE_SHIFT));
+}
+
+static inline int context_address_width(struct context_entry *c)
+{
+ return((c->hi >> 0) & 0x7);
+}
+
+static inline int context_domain_id(struct context_entry *c)
+{
+ return((c->hi >> 8) & 0xffff);
+}
+
static inline void context_set_present(struct context_entry *context)
{
context->lo |= 1;
@@ -296,6 +347,27 @@ static inline int first_pte_in_page(struct dma_pte *pte)
return !((unsigned long)pte & ~VTD_PAGE_MASK);
}

+
+/*
+ * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
+ *
+ * Fixes the crashdump kernel to deal with an active iommu and legacy
+ * DMA from the (old) panicked kernel in a manner similar to how legacy
+ * DMA is handled when no hardware iommu was in use by the old kernel --
+ * allow the legacy DMA to continue into its current buffers.
+ *
+ * In the crashdump kernel, this code:
+ * 1. skips disabling the IOMMU's translating.
+ * 2. Do not re-enable IOMMU's translating.
+ * 3. In kdump kernel, use the old root entry table.
+ * 4. Allocate pages for new context entry, copy data from old context entries
+ * in the old kernel to the new ones.
+ *
+ * In other kinds of kernel, for example, a kernel started by kexec,
+ * do the same thing as crashdump kernel.
+ */
+
+
/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
--
2.0.0-rc0

2015-05-11 09:54:21

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 03/10] iommu/vt-d: Function to get existing context entry

Interface for when a new domain in the old kernel needs some
values from the panicked kernel's context entries.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1e7ceb5..07e6118 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -367,6 +367,10 @@ static inline int first_pte_in_page(struct dma_pte *pte)
* do the same thing as crashdump kernel.
*/

+static struct context_entry *device_to_existing_context_entry(
+ struct intel_iommu *iommu,
+ u8 bus, u8 devfn);
+

/*
* This domain is a statically identity mapping domain.
@@ -4810,3 +4814,22 @@ static void __init check_tylersburg_isoch(void)
printk(KERN_WARNING "DMAR: Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
vtisochctrl);
}
+
+static struct context_entry *device_to_existing_context_entry(
+ struct intel_iommu *iommu,
+ u8 bus, u8 devfn)
+{
+ struct root_entry *root;
+ struct context_entry *context;
+ struct context_entry *ret = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->lock, flags);
+ root = &iommu->root_entry[bus];
+ context = get_context_addr_from_root(root);
+ if (context && context_present(context+devfn))
+ ret = &context[devfn];
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ return ret;
+}
+
--
2.0.0-rc0

2015-05-11 09:54:27

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

Add some functions to copy the data from old kernel.
These functions are used to copy context tables and page tables.

To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
use a link here, store the pointers , and then use iounmap to free them in
another place.

Li, Zhen-hua:
The functions and logics.

Takao Indoh:
Check if pfn is ram:
if (page_is_ram(pfn))

Signed-off-by: Li, Zhen-Hua <[email protected]>
Signed-off-by: Takao Indoh <[email protected]>
---
drivers/iommu/intel-iommu.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 6 +++
2 files changed, 108 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 07e6118..0b97c15 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -371,6 +371,17 @@ static struct context_entry *device_to_existing_context_entry(
struct intel_iommu *iommu,
u8 bus, u8 devfn);

+/*
+ * A structure used to store the address allocated by ioremap();
+ * The we need to call iounmap() to free them out of spin_lock_irqsave/unlock;
+ */
+struct iommu_remapped_entry {
+ struct list_head list;
+ void __iomem *mem;
+};
+static LIST_HEAD(__iommu_remapped_mem);
+static DEFINE_MUTEX(__iommu_mem_list_lock);
+

/*
* This domain is a statically identity mapping domain.
@@ -4833,3 +4844,94 @@ static struct context_entry *device_to_existing_context_entry(
return ret;
}

+/*
+ * Copy memory from a physically-addressed area into a virtually-addressed area
+ */
+int __iommu_load_from_oldmem(void *to, unsigned long from, unsigned long size)
+{
+ unsigned long pfn; /* Page Frame Number */
+ size_t csize = (size_t)size; /* Num(bytes to copy) */
+ unsigned long offset; /* Lower 12 bits of to */
+ void __iomem *virt_mem;
+ struct iommu_remapped_entry *mapped;
+
+ pfn = from >> VTD_PAGE_SHIFT;
+ offset = from & (~VTD_PAGE_MASK);
+
+ if (page_is_ram(pfn)) {
+ memcpy(to, pfn_to_kaddr(pfn) + offset, csize);
+ } else{
+
+ mapped = kzalloc(sizeof(struct iommu_remapped_entry),
+ GFP_KERNEL);
+ if (!mapped)
+ return -ENOMEM;
+
+ virt_mem = ioremap_cache((unsigned long)from, size);
+ if (!virt_mem) {
+ kfree(mapped);
+ return -ENOMEM;
+ }
+ memcpy(to, virt_mem, size);
+
+ mutex_lock(&__iommu_mem_list_lock);
+ mapped->mem = virt_mem;
+ list_add_tail(&mapped->list, &__iommu_remapped_mem);
+ mutex_unlock(&__iommu_mem_list_lock);
+ }
+ return size;
+}
+
+/*
+ * Copy memory from a virtually-addressed area into a physically-addressed area
+ */
+int __iommu_save_to_oldmem(unsigned long to, void *from, unsigned long size)
+{
+ unsigned long pfn; /* Page Frame Number */
+ size_t csize = (size_t)size; /* Num(bytes to copy) */
+ unsigned long offset; /* Lower 12 bits of to */
+ void __iomem *virt_mem;
+ struct iommu_remapped_entry *mapped;
+
+ pfn = to >> VTD_PAGE_SHIFT;
+ offset = to & (~VTD_PAGE_MASK);
+
+ if (page_is_ram(pfn)) {
+ memcpy(pfn_to_kaddr(pfn) + offset, from, csize);
+ } else{
+ mapped = kzalloc(sizeof(struct iommu_remapped_entry),
+ GFP_KERNEL);
+ if (!mapped)
+ return -ENOMEM;
+
+ virt_mem = ioremap_cache((unsigned long)to, size);
+ if (!virt_mem) {
+ kfree(mapped);
+ return -ENOMEM;
+ }
+ memcpy(virt_mem, from, size);
+ mutex_lock(&__iommu_mem_list_lock);
+ mapped->mem = virt_mem;
+ list_add_tail(&mapped->list, &__iommu_remapped_mem);
+ mutex_unlock(&__iommu_mem_list_lock);
+ }
+ return size;
+}
+
+/*
+ * Free the mapped memory for ioremap;
+ */
+int __iommu_free_mapped_mem(void)
+{
+ struct iommu_remapped_entry *mem_entry, *tmp;
+
+ mutex_lock(&__iommu_mem_list_lock);
+ list_for_each_entry_safe(mem_entry, tmp, &__iommu_remapped_mem, list) {
+ iounmap(mem_entry->mem);
+ list_del(&mem_entry->list);
+ kfree(mem_entry);
+ }
+ mutex_unlock(&__iommu_mem_list_lock);
+ return 0;
+}
+
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 796ef96..ced1fac 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -380,4 +380,10 @@ extern int dmar_ir_support(void);

extern const struct attribute_group *intel_iommu_groups[];

+extern int __iommu_load_from_oldmem(void *to, unsigned long from,
+ unsigned long size);
+extern int __iommu_save_to_oldmem(unsigned long to, void *from,
+ unsigned long size);
+extern int __iommu_free_mapped_mem(void);
+
#endif
--
2.0.0-rc0

2015-05-11 09:54:36

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re

Add functions to load root entry table from old kernel, and to save updated
root entry table.
Add two member in struct intel_iommu, to store the RTA in old kernel, and
the mapped virt address of it.

We use the old RTA in dump kernel, and when the iommu->root_entry is used as
a cache in kdump kernel, its phys address will not be save to RTA register,
but when its data is changed, we will save the new data to old root entry table.

Li, Zhen-hua:
The functions and logics.

Takao Indoh:
Add __iommu_flush_cache.

Signed-off-by: Li, Zhen-Hua <[email protected]>
Signed-off-by: Takao Indoh <[email protected]>
---
drivers/iommu/intel-iommu.c | 54 ++++++++++++++++++++++++++++++++++++++++++++-
include/linux/intel-iommu.h | 3 +++
2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0b97c15..3a5d446 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -371,6 +371,10 @@ static struct context_entry *device_to_existing_context_entry(
struct intel_iommu *iommu,
u8 bus, u8 devfn);

+static void __iommu_load_old_root_entry(struct intel_iommu *iommu);
+
+static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index);
+
/*
* A structure used to store the address allocated by ioremap();
* The we need to call iounmap() to free them out of spin_lock_irqsave/unlock;
@@ -382,7 +386,6 @@ struct iommu_remapped_entry {
static LIST_HEAD(__iommu_remapped_mem);
static DEFINE_MUTEX(__iommu_mem_list_lock);

-
/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
@@ -4935,3 +4938,52 @@ int __iommu_free_mapped_mem(void)
return 0;
}

+/*
+ * Load the old root entry table to new root entry table.
+ */
+static void __iommu_load_old_root_entry(struct intel_iommu *iommu)
+{
+ if ((!iommu)
+ || (!iommu->root_entry)
+ || (!iommu->root_entry_old_virt)
+ || (!iommu->root_entry_old_phys))
+ return;
+ memcpy(iommu->root_entry, iommu->root_entry_old_virt, PAGE_SIZE);
+
+ __iommu_flush_cache(iommu, iommu->root_entry, PAGE_SIZE);
+}
+
+/*
+ * When the data in new root entry table is changed, this function
+ * must be called to save the updated data to old root entry table.
+ */
+static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
+{
+ u8 start;
+ unsigned long size;
+ void __iomem *to;
+ void *from;
+
+ if ((!iommu)
+ || (!iommu->root_entry)
+ || (!iommu->root_entry_old_virt)
+ || (!iommu->root_entry_old_phys))
+ return;
+
+ if (index < -1 || index >= ROOT_ENTRY_NR)
+ return;
+
+ if (index == -1) {
+ start = 0;
+ size = ROOT_ENTRY_NR * sizeof(struct root_entry);
+ } else {
+ start = index * sizeof(struct root_entry);
+ size = sizeof(struct root_entry);
+ }
+ to = iommu->root_entry_old_virt;
+ from = iommu->root_entry;
+ memcpy(to + start, from + start, size);
+
+ __iommu_flush_cache(iommu, to + start, size);
+}
+
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ced1fac..e7cac12 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -340,6 +340,9 @@ struct intel_iommu {
spinlock_t lock; /* protect context, domain ids */
struct root_entry *root_entry; /* virtual address */

+ void __iomem *root_entry_old_virt; /* mapped from old root entry */
+ unsigned long root_entry_old_phys; /* root entry in old kernel */
+
struct iommu_flush flush;
#endif
struct q_inval *qi; /* Queued invalidation info */
--
2.0.0-rc0

2015-05-11 09:54:40

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump

Populate it with support functions to copy iommu translation tables from
from the panicked kernel into the kdump kernel in the event of a crash.

Functions:
Use old root entry table, and load the old data to root_entry as cache.
Malloc new context table and copy old context table to the new one.

Bill Sumner:
Original version, the creation of the data types and functions.

Li, Zhenhua:
Create new function iommu_check_pre_te_status() to check status.
Update the caller of context_get_* and context_put*, use context_*
and context_set_* for replacement.
Update the name of the function that loads root entry table.
Use new function to copy old context entry tables and page tables.
Use "unsigned long" for physical address.
Remove the functions to copy page table in Bill's version.
Remove usage of dve and ppap in Bill's version.

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 121 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 3 ++
2 files changed, 124 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3a5d446..28c3c64 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -386,6 +386,18 @@ struct iommu_remapped_entry {
static LIST_HEAD(__iommu_remapped_mem);
static DEFINE_MUTEX(__iommu_mem_list_lock);

+/* ========================================================================
+ * Copy iommu translation tables from old kernel into new kernel.
+ * Entry to this set of functions is: intel_iommu_load_translation_tables()
+ * ------------------------------------------------------------------------
+ */
+
+static int copy_root_entry_table(struct intel_iommu *iommu);
+
+static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
+
+static void iommu_check_pre_te_status(struct intel_iommu *iommu);
+
/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
@@ -4987,3 +4999,112 @@ static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
__iommu_flush_cache(iommu, to + start, size);
}

+/*
+ * Load root entry tables from old kernel.
+ */
+static int copy_root_entry_table(struct intel_iommu *iommu)
+{
+ u32 bus; /* Index: root-entry-table */
+ struct root_entry *re; /* Virt(iterator: new table) */
+ unsigned long context_old_phys; /* Phys(context table entry) */
+ struct context_entry *context_new_virt; /* Virt(new context_entry) */
+
+ /*
+ * A new root entry table has been allocated ,
+ * we need copy re from old kernel to the new allocated one.
+ */
+
+ if (!iommu->root_entry_old_phys)
+ return -ENOMEM;
+
+ for (bus = 0, re = iommu->root_entry; bus < 256; bus += 1, re += 1) {
+ if (!root_present(re))
+ continue;
+
+ context_old_phys = get_context_phys_from_root(re);
+
+ if (!context_old_phys)
+ continue;
+
+ context_new_virt =
+ (struct context_entry *)alloc_pgtable_page(iommu->node);
+
+ if (!context_new_virt)
+ return -ENOMEM;
+
+ __iommu_load_from_oldmem(context_new_virt,
+ context_old_phys,
+ VTD_PAGE_SIZE);
+
+ __iommu_flush_cache(iommu, context_new_virt, VTD_PAGE_SIZE);
+
+ set_root_value(re, virt_to_phys(context_new_virt));
+ }
+
+ return 0;
+}
+
+/*
+ * Interface to the "load translation tables" set of functions
+ * from mainline code.
+ */
+static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
+{
+ unsigned long long q; /* quadword scratch */
+ int ret = 0; /* Integer return code */
+ unsigned long flags;
+
+ q = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
+ if (!q)
+ return -1;
+
+ spin_lock_irqsave(&iommu->lock, flags);
+
+ /* Load the root-entry table from the old kernel
+ * foreach context_entry_table in root_entry
+ * Copy each entry table from old kernel
+ */
+ if (!iommu->root_entry) {
+ iommu->root_entry =
+ (struct root_entry *)alloc_pgtable_page(iommu->node);
+ if (!iommu->root_entry) {
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ return -ENOMEM;
+ }
+ }
+
+ iommu->root_entry_old_phys = q & VTD_PAGE_MASK;
+ if (!iommu->root_entry_old_phys) {
+ pr_err("Could not read old root entry address.");
+ return -1;
+ }
+
+ iommu->root_entry_old_virt = ioremap_cache(iommu->root_entry_old_phys,
+ VTD_PAGE_SIZE);
+ if (!iommu->root_entry_old_virt) {
+ pr_err("Could not map the old root entry.");
+ return -ENOMEM;
+ }
+
+ __iommu_load_old_root_entry(iommu);
+ ret = copy_root_entry_table(iommu);
+ __iommu_flush_cache(iommu, iommu->root_entry, PAGE_SIZE);
+ __iommu_update_old_root_entry(iommu, -1);
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+ __iommu_free_mapped_mem();
+
+ return ret;
+}
+
+static void iommu_check_pre_te_status(struct intel_iommu *iommu)
+{
+ u32 sts;
+
+ sts = readl(iommu->reg + DMAR_GSTS_REG);
+ if (sts & DMA_GSTS_TES) {
+ pr_info("Translation is enabled prior to OS.\n");
+ iommu->pre_enabled_trans = 1;
+ }
+}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index e7cac12..90e122e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -340,6 +340,9 @@ struct intel_iommu {
spinlock_t lock; /* protect context, domain ids */
struct root_entry *root_entry; /* virtual address */

+ /* whether translation is enabled prior to OS*/
+ u8 pre_enabled_trans;
+
void __iomem *root_entry_old_virt; /* mapped from old root entry */
unsigned long root_entry_old_phys; /* root entry in old kernel */

--
2.0.0-rc0

2015-05-11 09:54:46

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

Modify the operation of the following functions when called during crash dump:
iommu_context_addr
free_context_table
get_domain_for_dev
init_dmars
intel_iommu_init

Bill Sumner:
Original version.

Zhenhua:
The name of new calling functions.
Do not disable and re-enable TE in kdump kernel.
Use the did and gaw from old context entry;

Signed-off-by: Bill Sumner <[email protected]>
Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 95 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 83 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 28c3c64..91545bf 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);

static void iommu_check_pre_te_status(struct intel_iommu *iommu);
+static u8 g_translation_pre_enabled;

/*
* This domain is a statically identity mapping domain.
@@ -794,6 +795,9 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
phy_addr = virt_to_phys((void *)context);
*entry = phy_addr | 1;
__iommu_flush_cache(iommu, entry, sizeof(*entry));
+
+ if (iommu->pre_enabled_trans)
+ __iommu_update_old_root_entry(iommu, bus);
}
return &context[devfn];
}
@@ -887,13 +891,15 @@ static void clear_context_table(struct intel_iommu *iommu, u8 bus, u8 devfn)

static void free_context_table(struct intel_iommu *iommu)
{
+ struct root_entry *root = NULL;
int i;
unsigned long flags;
struct context_entry *context;

spin_lock_irqsave(&iommu->lock, flags);
if (!iommu->root_entry) {
- goto out;
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ return;
}
for (i = 0; i < ROOT_ENTRY_NR; i++) {
context = iommu_context_addr(iommu, i, 0, 0);
@@ -908,10 +914,23 @@ static void free_context_table(struct intel_iommu *iommu)
free_pgtable_page(context);

}
+
+ if (iommu->pre_enabled_trans) {
+ iommu->root_entry_old_phys = 0;
+ root = iommu->root_entry_old_virt;
+ iommu->root_entry_old_virt = NULL;
+ }
+
free_pgtable_page(iommu->root_entry);
iommu->root_entry = NULL;
-out:
+
spin_unlock_irqrestore(&iommu->lock, flags);
+
+ /* We put this out of spin_unlock is because iounmap() may
+ * cause error if surrounded by spin_lock and unlock;
+ */
+ if (iommu->pre_enabled_trans)
+ iounmap(root);
}

static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
@@ -2333,6 +2352,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
unsigned long flags;
u8 bus, devfn;
int did = -1; /* Default to "no domain_id supplied" */
+ struct context_entry *ce = NULL;

domain = find_domain(dev);
if (domain)
@@ -2366,6 +2386,20 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
domain = alloc_domain(0);
if (!domain)
return NULL;
+
+ if (iommu->pre_enabled_trans) {
+ /*
+ * if this device had a did in the old kernel
+ * use its values instead of generating new ones
+ */
+ ce = device_to_existing_context_entry(iommu, bus, devfn);
+
+ if (ce) {
+ did = context_domain_id(ce);
+ gaw = agaw_to_width(context_address_width(ce));
+ }
+ }
+
domain->id = iommu_attach_domain_with_id(domain, iommu, did);
if (domain->id < 0) {
free_domain_mem(domain);
@@ -2897,6 +2931,7 @@ static int __init init_dmars(void)
goto free_g_iommus;
}

+ g_translation_pre_enabled = 0; /* To know whether to skip RMRR */
for_each_active_iommu(iommu, drhd) {
g_iommus[iommu->seq_id] = iommu;

@@ -2904,14 +2939,30 @@ static int __init init_dmars(void)
if (ret)
goto free_iommu;

- /*
- * TBD:
- * we could share the same root & context tables
- * among all IOMMU's. Need to Split it later.
- */
- ret = iommu_alloc_root_entry(iommu);
- if (ret)
- goto free_iommu;
+ iommu_check_pre_te_status(iommu);
+ if (iommu->pre_enabled_trans) {
+ pr_info("IOMMU Copying translate tables from panicked kernel\n");
+ ret = intel_iommu_load_translation_tables(iommu);
+ if (ret) {
+ pr_err("IOMMU: Copy translate tables failed\n");
+
+ /* Best to stop trying */
+ goto free_iommu;
+ }
+ pr_info("IOMMU: root_cache:0x%12.12llx phys:0x%12.12llx\n",
+ (u64)iommu->root_entry,
+ (u64)iommu->root_entry_old_phys);
+ } else {
+ /*
+ * TBD:
+ * we could share the same root & context tables
+ * among all IOMMU's. Need to Split it later.
+ */
+ ret = iommu_alloc_root_entry(iommu);
+ if (ret)
+ goto free_iommu;
+ }
+
if (!ecap_pass_through(iommu->ecap))
hw_pass_through = 0;
}
@@ -2929,6 +2980,14 @@ static int __init init_dmars(void)
check_tylersburg_isoch();

/*
+ * In the second kernel: Skip setting-up new domains for
+ * si, rmrr, and the isa bus on the expectation that these
+ * translations were copied from the old kernel.
+ */
+ if (g_translation_pre_enabled)
+ goto skip_new_domains_for_si_rmrr_isa;
+
+ /*
* If pass through is not set or not enabled, setup context entries for
* identity mappings for rmrr, gfx, and isa and may fall back to static
* identity mapping if iommu_identity_mapping is set.
@@ -2968,6 +3027,8 @@ static int __init init_dmars(void)

iommu_prepare_isa();

+skip_new_domains_for_si_rmrr_isa:;
+
/*
* for each drhd
* enable fault log
@@ -2996,7 +3057,13 @@ static int __init init_dmars(void)

iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL);
iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
- iommu_enable_translation(iommu);
+
+ if (iommu->pre_enabled_trans) {
+ if (!(iommu->gcmd & DMA_GCMD_TE))
+ iommu_enable_translation(iommu);
+ } else
+ iommu_enable_translation(iommu);
+
iommu_disable_protect_mem_regions(iommu);
}

@@ -4282,11 +4349,14 @@ int __init intel_iommu_init(void)
}

/*
- * Disable translation if already enabled prior to OS handover.
+ * We do not need to disable translation if already enabled prior
+ * to OS handover. Because we will try to copy old tables;
*/
+ /*
for_each_active_iommu(iommu, drhd)
if (iommu->gcmd & DMA_GCMD_TE)
iommu_disable_translation(iommu);
+ */

if (dmar_dev_scope_init() < 0) {
if (force_on)
@@ -5106,5 +5176,6 @@ static void iommu_check_pre_te_status(struct intel_iommu *iommu)
if (sts & DMA_GSTS_TES) {
pr_info("Translation is enabled prior to OS.\n");
iommu->pre_enabled_trans = 1;
+ g_translation_pre_enabled = 1;
}
}
--
2.0.0-rc0

2015-05-11 09:56:33

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 08/10] iommu/vt-d: assign new page table for dma_map

When a device driver issues the first dma_map command for a device, we
assign a new and empty page-table, thus removing all mappings from the
old kernel for the device.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel-iommu.c | 58 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 91545bf..3cc1027 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -396,6 +396,9 @@ static int copy_root_entry_table(struct intel_iommu *iommu);

static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);

+static void unmap_device_dma(struct dmar_domain *domain,
+ struct device *dev,
+ struct intel_iommu *iommu);
static void iommu_check_pre_te_status(struct intel_iommu *iommu);
static u8 g_translation_pre_enabled;

@@ -3115,6 +3118,7 @@ static struct iova *intel_alloc_iova(struct device *dev,
static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
{
struct dmar_domain *domain;
+ struct intel_iommu *iommu;
int ret;

domain = get_domain_for_dev(dev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
@@ -3124,14 +3128,30 @@ static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
return NULL;
}

- /* make sure context mapping is ok */
- if (unlikely(!domain_context_mapped(dev))) {
- ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
- if (ret) {
- printk(KERN_ERR "Domain context map for %s failed",
- dev_name(dev));
- return NULL;
- }
+ /* if in kdump kernel, we need to unmap the mapped dma pages,
+ * detach this device first.
+ */
+ if (likely(domain_context_mapped(dev))) {
+ iommu = domain_get_iommu(domain);
+ if (iommu->pre_enabled_trans) {
+ unmap_device_dma(domain, dev, iommu);
+
+ domain = get_domain_for_dev(dev,
+ DEFAULT_DOMAIN_ADDRESS_WIDTH);
+ if (!domain) {
+ pr_err("Allocating domain for %s failed",
+ dev_name(dev));
+ return NULL;
+ }
+ } else
+ return domain;
+ }
+
+ ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
+ if (ret) {
+ pr_err("Domain context map for %s failed",
+ dev_name(dev));
+ return NULL;
}

return domain;
@@ -5168,6 +5188,28 @@ static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
return ret;
}

+static void unmap_device_dma(struct dmar_domain *domain,
+ struct device *dev,
+ struct intel_iommu *iommu)
+{
+ struct context_entry *ce;
+ struct iova *iova;
+ phys_addr_t phys_addr;
+ dma_addr_t dev_addr;
+ struct pci_dev *pdev;
+
+ pdev = to_pci_dev(dev);
+ ce = iommu_context_addr(iommu, pdev->bus->number, pdev->devfn, 1);
+ phys_addr = context_address_root(ce) << VTD_PAGE_SHIFT;
+ dev_addr = phys_to_dma(dev, phys_addr);
+
+ iova = find_iova(&domain->iovad, IOVA_PFN(dev_addr));
+ if (iova)
+ intel_unmap(dev, dev_addr);
+
+ domain_remove_one_dev_info(domain, dev);
+}
+
static void iommu_check_pre_te_status(struct intel_iommu *iommu)
{
u32 sts;
--
2.0.0-rc0

2015-05-11 09:54:53

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 09/10] iommu/vt-d: Copy functions for irte

Functions to copy the irte data from the old kernel into the kdump kernel.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 68 +++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 4 +++
2 files changed, 72 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 5709ae9..c2a4406 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -17,6 +17,10 @@

#include "irq_remapping.h"

+static int __iommu_load_old_irte(struct intel_iommu *iommu);
+static int __iommu_update_old_irte(struct intel_iommu *iommu, int index);
+static void iommu_check_pre_ir_status(struct intel_iommu *iommu);
+
struct ioapic_scope {
struct intel_iommu *iommu;
unsigned int id;
@@ -1299,3 +1303,67 @@ int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool insert)

return ret;
}
+
+static int __iommu_load_old_irte(struct intel_iommu *iommu)
+{
+ if ((!iommu)
+ || (!iommu->ir_table)
+ || (!iommu->ir_table->base)
+ || (!iommu->ir_table->base_old_phys)
+ || (!iommu->ir_table->base_old_virt))
+ return -1;
+
+ memcpy(iommu->ir_table->base,
+ iommu->ir_table->base_old_virt,
+ INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
+
+ __iommu_flush_cache(iommu, iommu->ir_table->base,
+ INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
+
+ return 0;
+}
+
+static int __iommu_update_old_irte(struct intel_iommu *iommu, int index)
+{
+ int start;
+ unsigned long size;
+ void __iomem *to;
+ void *from;
+
+ if ((!iommu)
+ || (!iommu->ir_table)
+ || (!iommu->ir_table->base)
+ || (!iommu->ir_table->base_old_phys)
+ || (!iommu->ir_table->base_old_virt))
+ return -1;
+
+ if (index < -1 || index >= INTR_REMAP_TABLE_ENTRIES)
+ return -1;
+
+ if (index == -1) {
+ start = 0;
+ size = INTR_REMAP_TABLE_ENTRIES * sizeof(struct irte);
+ } else {
+ start = index * sizeof(struct irte);
+ size = sizeof(struct irte);
+ }
+
+ to = iommu->ir_table->base_old_virt;
+ from = iommu->ir_table->base;
+ memcpy(to + start, from + start, size);
+
+ __iommu_flush_cache(iommu, to + start, size);
+
+ return 0;
+}
+
+static void iommu_check_pre_ir_status(struct intel_iommu *iommu)
+{
+ u32 sts;
+
+ sts = readl(iommu->reg + DMAR_GSTS_REG);
+ if (sts & DMA_GSTS_IRES) {
+ pr_info("IR is enabled prior to OS.\n");
+ iommu->pre_enabled_ir = 1;
+ }
+}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 90e122e..d7a0b5d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -301,6 +301,8 @@ struct q_inval {
struct ir_table {
struct irte *base;
unsigned long *bitmap;
+ void __iomem *base_old_virt;
+ unsigned long base_old_phys;
};
#endif

@@ -342,6 +344,8 @@ struct intel_iommu {

/* whether translation is enabled prior to OS*/
u8 pre_enabled_trans;
+ /* whether interrupt remapping is enabled prior to OS*/
+ u8 pre_enabled_ir;

void __iomem *root_entry_old_virt; /* mapped from old root entry */
unsigned long root_entry_old_phys; /* root entry in old kernel */
--
2.0.0-rc0

2015-05-11 09:55:00

by Li, ZhenHua

[permalink] [raw]
Subject: [PATCH v11 10/10] iommu/vt-d: Use old irte in kdump kernel

Fix the intr-remapping fault.

[1.594890] dmar: DRHD: handling fault status reg 2
[1.594894] dmar: INTR-REMAP: Request device [[41:00.0] fault index 4d
[1.594894] INTR-REMAP:[fault reason 34] Present field in the IRTE entry
is clear

Use old irte in second kernel, do not disable and re-enable interrupt
remapping.

Signed-off-by: Li, Zhen-Hua <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index c2a4406..46d80ad 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -197,6 +197,10 @@ static int modify_irte(int irq, struct irte *irte_modified)

set_64bit(&irte->low, irte_modified->low);
set_64bit(&irte->high, irte_modified->high);
+
+ if (iommu->pre_enabled_ir)
+ __iommu_update_old_irte(iommu, index);
+
__iommu_flush_cache(iommu, irte, sizeof(*irte));

rc = qi_flush_iec(iommu, index, 0);
@@ -258,6 +262,9 @@ static int clear_entries(struct irq_2_iommu *irq_iommu)
bitmap_release_region(iommu->ir_table->bitmap, index,
irq_iommu->irte_mask);

+ if (iommu->pre_enabled_ir)
+ __iommu_update_old_irte(iommu, -1);
+
return qi_flush_iec(iommu, index, irq_iommu->irte_mask);
}

@@ -657,11 +664,13 @@ static int __init intel_enable_irq_remapping(void)
*/
dmar_fault(-1, iommu);

+ iommu_check_pre_ir_status(iommu);
+
/*
- * Disable intr remapping and queued invalidation, if already
- * enabled prior to OS handover.
+ * Here we do not disable intr remapping,
+ * if already enabled prior to OS handover.
*/
- iommu_disable_irq_remapping(iommu);
+ /* iommu_disable_irq_remapping(iommu); */

dmar_disable_qi(iommu);
}
@@ -697,7 +706,18 @@ static int __init intel_enable_irq_remapping(void)
* Setup Interrupt-remapping for all the DRHD's now.
*/
for_each_iommu(iommu, drhd) {
- iommu_set_irq_remapping(iommu, eim);
+ if (iommu->pre_enabled_ir) {
+ unsigned long long q;
+
+ q = dmar_readq(iommu->reg + DMAR_IRTA_REG);
+ iommu->ir_table->base_old_phys = q & VTD_PAGE_MASK;
+ iommu->ir_table->base_old_virt = ioremap_cache(
+ iommu->ir_table->base_old_phys,
+ INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
+ __iommu_load_old_irte(iommu);
+ } else
+ iommu_set_irq_remapping(iommu, eim);
+
setup = true;
}

--
2.0.0-rc0

2015-05-12 06:19:32

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
> when a panic happens, the kdump kernel will boot with these faults:

Test passed on my local workstation. Thanks, Zhenhua.

>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

2015-05-12 08:18:23

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add context entry functions needed for kdump.
>
> Bill Sumner:
> Original version;
>
> Li, Zhenhua:
> Changed the name of new functions, make them consistent with current
> context get/set functions.
> Remove the structure dve which is not used in new version.
>
> Signed-off-by: Bill Sumner <[email protected]>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index cb9d6cc..1e7ceb5 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -190,6 +190,31 @@ struct root_entry {
> };
> #define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
>
> +static inline bool root_present(struct root_entry *root)
> +{
> + return (root->lo & 1);
> +}
> +
> +static inline void set_root_value(struct root_entry *root, unsigned long value)
> +{
> + root->lo &= ~VTD_PAGE_MASK;
> + root->lo |= value & VTD_PAGE_MASK;
> +}
> +
> +static inline struct context_entry *
> +get_context_addr_from_root(struct root_entry *root)
> +{
> + return (struct context_entry *)
> + (root_present(root)?phys_to_virt(
> + root->lo & VTD_PAGE_MASK) :
> + NULL);
> +}
> +
> +static inline unsigned long
> +get_context_phys_from_root(struct root_entry *root)
> +{
> + return root_present(root) ? (root->lo & VTD_PAGE_MASK) : 0;
> +}
>
> /*
> * low 64 bits:
> @@ -211,6 +236,32 @@ static inline bool context_present(struct context_entry *context)
> {
> return (context->lo & 1);
> }
> +
> +static inline int context_fault_enable(struct context_entry *c)
> +{
> + return((c->lo >> 1) & 0x1);
> +}
> +
> +static inline int context_translation_type(struct context_entry *c)
> +{
> + return((c->lo >> 2) & 0x3);
> +}
> +
> +static inline u64 context_address_root(struct context_entry *c)
> +{
> + return((c->lo >> VTD_PAGE_SHIFT));
> +}
> +
> +static inline int context_address_width(struct context_entry *c)
> +{
> + return((c->hi >> 0) & 0x7);
> +}
> +
> +static inline int context_domain_id(struct context_entry *c)
> +{
> + return((c->hi >> 8) & 0xffff);
> +}
> +
> static inline void context_set_present(struct context_entry *context)
> {
> context->lo |= 1;
> @@ -296,6 +347,27 @@ static inline int first_pte_in_page(struct dma_pte *pte)
> return !((unsigned long)pte & ~VTD_PAGE_MASK);
> }
>
> +
> +/*
> + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
> + *
> + * Fixes the crashdump kernel to deal with an active iommu and legacy
> + * DMA from the (old) panicked kernel in a manner similar to how legacy
> + * DMA is handled when no hardware iommu was in use by the old kernel --
> + * allow the legacy DMA to continue into its current buffers.
> + *
> + * In the crashdump kernel, this code:
> + * 1. skips disabling the IOMMU's translating.
> + * 2. Do not re-enable IOMMU's translating.
> + * 3. In kdump kernel, use the old root entry table.
> + * 4. Allocate pages for new context entry, copy data from old context entries
> + * in the old kernel to the new ones.
> + *
> + * In other kinds of kernel, for example, a kernel started by kexec,
> + * do the same thing as crashdump kernel.
> + */
> +
> +

Above comments should come along with the code changes instead of putting it
in this patch.

BTW, there's one more blank line at the end..

> /*
> * This domain is a statically identity mapping domain.
> * 1. This domain creats a static 1:1 mapping to all usable memory.
> --
> 2.0.0-rc0
>

2015-05-12 08:30:15

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add some functions to copy the data from old kernel.
> These functions are used to copy context tables and page tables.
>
> To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
> use a link here, store the pointers , and then use iounmap to free them in
> another place.
>
> Li, Zhen-hua:
> The functions and logics.
>
> Takao Indoh:
> Check if pfn is ram:
> if (page_is_ram(pfn))
>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> Signed-off-by: Takao Indoh <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 102 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 6 +++
> 2 files changed, 108 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 07e6118..0b97c15 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -371,6 +371,17 @@ static struct context_entry *device_to_existing_context_entry(
> struct intel_iommu *iommu,
> u8 bus, u8 devfn);
>
> +/*
> + * A structure used to store the address allocated by ioremap();
> + * The we need to call iounmap() to free them out of spin_lock_irqsave/unlock;
> + */
> +struct iommu_remapped_entry {
> + struct list_head list;
> + void __iomem *mem;
> +};
> +static LIST_HEAD(__iommu_remapped_mem);
> +static DEFINE_MUTEX(__iommu_mem_list_lock);
> +
>
> /*
> * This domain is a statically identity mapping domain.
> @@ -4833,3 +4844,94 @@ static struct context_entry *device_to_existing_context_entry(
> return ret;
> }
>
> +/*
> + * Copy memory from a physically-addressed area into a virtually-addressed area
> + */
> +int __iommu_load_from_oldmem(void *to, unsigned long from, unsigned long size)
> +{
> + unsigned long pfn; /* Page Frame Number */
> + size_t csize = (size_t)size; /* Num(bytes to copy) */
> + unsigned long offset; /* Lower 12 bits of to */

comment for above variable are unnecessary, they are clear enough.
BTW, use size_t size in function argument is better.

> + void __iomem *virt_mem;
> + struct iommu_remapped_entry *mapped;
> +
> + pfn = from >> VTD_PAGE_SHIFT;
> + offset = from & (~VTD_PAGE_MASK);
> +
> + if (page_is_ram(pfn)) {
> + memcpy(to, pfn_to_kaddr(pfn) + offset, csize);
> + } else{
> +
> + mapped = kzalloc(sizeof(struct iommu_remapped_entry),
> + GFP_KERNEL);
> + if (!mapped)
> + return -ENOMEM;
> +
> + virt_mem = ioremap_cache((unsigned long)from, size);
> + if (!virt_mem) {
> + kfree(mapped);
> + return -ENOMEM;
> + }
> + memcpy(to, virt_mem, size);

csize or size? as previous comment use size_t in argument it will be ok here.

> +
> + mutex_lock(&__iommu_mem_list_lock);
> + mapped->mem = virt_mem;
> + list_add_tail(&mapped->list, &__iommu_remapped_mem);
> + mutex_unlock(&__iommu_mem_list_lock);
> + }
> + return size;

type mismatch?

> +}
> +
> +/*
> + * Copy memory from a virtually-addressed area into a physically-addressed area
> + */
> +int __iommu_save_to_oldmem(unsigned long to, void *from, unsigned long size)
> +{
> + unsigned long pfn; /* Page Frame Number */
> + size_t csize = (size_t)size; /* Num(bytes to copy) */
> + unsigned long offset; /* Lower 12 bits of to */

Ditto about data type and comments..

> + void __iomem *virt_mem;
> + struct iommu_remapped_entry *mapped;
> +
> + pfn = to >> VTD_PAGE_SHIFT;
> + offset = to & (~VTD_PAGE_MASK);
> +
> + if (page_is_ram(pfn)) {
> + memcpy(pfn_to_kaddr(pfn) + offset, from, csize);
> + } else{
> + mapped = kzalloc(sizeof(struct iommu_remapped_entry),
> + GFP_KERNEL);
> + if (!mapped)
> + return -ENOMEM;
> +
> + virt_mem = ioremap_cache((unsigned long)to, size);
> + if (!virt_mem) {
> + kfree(mapped);
> + return -ENOMEM;
> + }
> + memcpy(virt_mem, from, size);
> + mutex_lock(&__iommu_mem_list_lock);
> + mapped->mem = virt_mem;
> + list_add_tail(&mapped->list, &__iommu_remapped_mem);
> + mutex_unlock(&__iommu_mem_list_lock);
> + }
> + return size;
> +}

Above two functions looks duplicate, can they be merged as one function and
add a argument about direction?

> +
> +/*
> + * Free the mapped memory for ioremap;
> + */
> +int __iommu_free_mapped_mem(void)
> +{
> + struct iommu_remapped_entry *mem_entry, *tmp;
> +
> + mutex_lock(&__iommu_mem_list_lock);
> + list_for_each_entry_safe(mem_entry, tmp, &__iommu_remapped_mem, list) {
> + iounmap(mem_entry->mem);
> + list_del(&mem_entry->list);
> + kfree(mem_entry);
> + }
> + mutex_unlock(&__iommu_mem_list_lock);
> + return 0;
> +}
> +
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 796ef96..ced1fac 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -380,4 +380,10 @@ extern int dmar_ir_support(void);
>
> extern const struct attribute_group *intel_iommu_groups[];
>
> +extern int __iommu_load_from_oldmem(void *to, unsigned long from,
> + unsigned long size);
> +extern int __iommu_save_to_oldmem(unsigned long to, void *from,
> + unsigned long size);
> +extern int __iommu_free_mapped_mem(void);
> +
> #endif
> --
> 2.0.0-rc0
>

2015-05-12 08:39:09

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re

Seems the subject was truncated? Maybe "re" means root entry? Then please fix it

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add functions to load root entry table from old kernel, and to save updated
> root entry table.
> Add two member in struct intel_iommu, to store the RTA in old kernel, and
> the mapped virt address of it.

Please explain a bit what is "RTA" in patch log, it is unclear to most of people
who do not know iommu details.

>
> We use the old RTA in dump kernel, and when the iommu->root_entry is used as
> a cache in kdump kernel, its phys address will not be save to RTA register,
> but when its data is changed, we will save the new data to old root entry table.
>
> Li, Zhen-hua:
> The functions and logics.
>
> Takao Indoh:
> Add __iommu_flush_cache.
>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> Signed-off-by: Takao Indoh <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 54 ++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/intel-iommu.h | 3 +++
> 2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0b97c15..3a5d446 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -371,6 +371,10 @@ static struct context_entry *device_to_existing_context_entry(
> struct intel_iommu *iommu,
> u8 bus, u8 devfn);
>
> +static void __iommu_load_old_root_entry(struct intel_iommu *iommu);
> +
> +static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index);
> +
> /*
> * A structure used to store the address allocated by ioremap();
> * The we need to call iounmap() to free them out of spin_lock_irqsave/unlock;
> @@ -382,7 +386,6 @@ struct iommu_remapped_entry {
> static LIST_HEAD(__iommu_remapped_mem);
> static DEFINE_MUTEX(__iommu_mem_list_lock);
>
> -
> /*
> * This domain is a statically identity mapping domain.
> * 1. This domain creats a static 1:1 mapping to all usable memory.
> @@ -4935,3 +4938,52 @@ int __iommu_free_mapped_mem(void)
> return 0;
> }
>
> +/*
> + * Load the old root entry table to new root entry table.
> + */
> +static void __iommu_load_old_root_entry(struct intel_iommu *iommu)
> +{
> + if ((!iommu)
> + || (!iommu->root_entry)
> + || (!iommu->root_entry_old_virt)
> + || (!iommu->root_entry_old_phys))
> + return;
> + memcpy(iommu->root_entry, iommu->root_entry_old_virt, PAGE_SIZE);
> +
> + __iommu_flush_cache(iommu, iommu->root_entry, PAGE_SIZE);
> +}
> +
> +/*
> + * When the data in new root entry table is changed, this function
> + * must be called to save the updated data to old root entry table.
> + */
> +static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
> +{
> + u8 start;
> + unsigned long size;
> + void __iomem *to;
> + void *from;
> +
> + if ((!iommu)
> + || (!iommu->root_entry)
> + || (!iommu->root_entry_old_virt)
> + || (!iommu->root_entry_old_phys))
> + return;
> +
> + if (index < -1 || index >= ROOT_ENTRY_NR)
> + return;
> +
> + if (index == -1) {
> + start = 0;
> + size = ROOT_ENTRY_NR * sizeof(struct root_entry);
> + } else {
> + start = index * sizeof(struct root_entry);
> + size = sizeof(struct root_entry);
> + }
> + to = iommu->root_entry_old_virt;
> + from = iommu->root_entry;
> + memcpy(to + start, from + start, size);
> +
> + __iommu_flush_cache(iommu, to + start, size);
> +}
> +
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ced1fac..e7cac12 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -340,6 +340,9 @@ struct intel_iommu {
> spinlock_t lock; /* protect context, domain ids */
> struct root_entry *root_entry; /* virtual address */
>
> + void __iomem *root_entry_old_virt; /* mapped from old root entry */
> + unsigned long root_entry_old_phys; /* root entry in old kernel */
> +
> struct iommu_flush flush;
> #endif
> struct q_inval *qi; /* Queued invalidation info */
> --
> 2.0.0-rc0
>

2015-05-12 08:48:30

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump

The patch subject is bad, previous patch you use "Items for kdump", this
patch you use "datatypes and functions used for kdump" then what's the
difference between these two patches?

Please think about a better one for these patches..

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Populate it with support functions to copy iommu translation tables from
> from the panicked kernel into the kdump kernel in the event of a crash.
>
> Functions:
> Use old root entry table, and load the old data to root_entry as cache.
> Malloc new context table and copy old context table to the new one.
>
> Bill Sumner:
> Original version, the creation of the data types and functions.
>
> Li, Zhenhua:
> Create new function iommu_check_pre_te_status() to check status.
> Update the caller of context_get_* and context_put*, use context_*
> and context_set_* for replacement.
> Update the name of the function that loads root entry table.
> Use new function to copy old context entry tables and page tables.
> Use "unsigned long" for physical address.
> Remove the functions to copy page table in Bill's version.
> Remove usage of dve and ppap in Bill's version.
>
> Signed-off-by: Bill Sumner <[email protected]>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 121 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 3 ++
> 2 files changed, 124 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3a5d446..28c3c64 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -386,6 +386,18 @@ struct iommu_remapped_entry {
> static LIST_HEAD(__iommu_remapped_mem);
> static DEFINE_MUTEX(__iommu_mem_list_lock);
>
> +/* ========================================================================

Please remove the ====..

> + * Copy iommu translation tables from old kernel into new kernel.

One more space between "new" and "kernel"

> + * Entry to this set of functions is: intel_iommu_load_translation_tables()
> + * ------------------------------------------------------------------------

Drop above --- line is better

> + */
> +
> +static int copy_root_entry_table(struct intel_iommu *iommu);
> +
> +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
> +
> +static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> +
> /*
> * This domain is a statically identity mapping domain.
> * 1. This domain creats a static 1:1 mapping to all usable memory.
> @@ -4987,3 +4999,112 @@ static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
> __iommu_flush_cache(iommu, to + start, size);
> }
>
> +/*
> + * Load root entry tables from old kernel.
> + */
> +static int copy_root_entry_table(struct intel_iommu *iommu)
> +{
> + u32 bus; /* Index: root-entry-table */
> + struct root_entry *re; /* Virt(iterator: new table) */
> + unsigned long context_old_phys; /* Phys(context table entry) */
> + struct context_entry *context_new_virt; /* Virt(new context_entry) */
> +
> + /*
> + * A new root entry table has been allocated ,

One more space before ",'

> + * we need copy re from old kernel to the new allocated one.
> + */
> +
> + if (!iommu->root_entry_old_phys)
> + return -ENOMEM;
> +
> + for (bus = 0, re = iommu->root_entry; bus < 256; bus += 1, re += 1) {
> + if (!root_present(re))
> + continue;
> +
> + context_old_phys = get_context_phys_from_root(re);
> +
> + if (!context_old_phys)
> + continue;
> +
> + context_new_virt =
> + (struct context_entry *)alloc_pgtable_page(iommu->node);
> +
> + if (!context_new_virt)
> + return -ENOMEM;
> +
> + __iommu_load_from_oldmem(context_new_virt,
> + context_old_phys,
> + VTD_PAGE_SIZE);
> +
> + __iommu_flush_cache(iommu, context_new_virt, VTD_PAGE_SIZE);
> +
> + set_root_value(re, virt_to_phys(context_new_virt));
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Interface to the "load translation tables" set of functions
> + * from mainline code.
> + */
> +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
> +{
> + unsigned long long q; /* quadword scratch */
> + int ret = 0; /* Integer return code */

comment for ret is not necessary, "quadword" is also unnecessary, just explain
the purpose of 'q' is enough.

> + unsigned long flags;
> +
> + q = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
> + if (!q)
> + return -1;
> +
> + spin_lock_irqsave(&iommu->lock, flags);
> +
> + /* Load the root-entry table from the old kernel
> + * foreach context_entry_table in root_entry
> + * Copy each entry table from old kernel
> + */
> + if (!iommu->root_entry) {
> + iommu->root_entry =
> + (struct root_entry *)alloc_pgtable_page(iommu->node);
> + if (!iommu->root_entry) {
> + spin_unlock_irqrestore(&iommu->lock, flags);
> + return -ENOMEM;
> + }
> + }
> +
> + iommu->root_entry_old_phys = q & VTD_PAGE_MASK;
> + if (!iommu->root_entry_old_phys) {
> + pr_err("Could not read old root entry address.");
> + return -1;
> + }
> +
> + iommu->root_entry_old_virt = ioremap_cache(iommu->root_entry_old_phys,
> + VTD_PAGE_SIZE);
> + if (!iommu->root_entry_old_virt) {
> + pr_err("Could not map the old root entry.");
> + return -ENOMEM;
> + }
> +
> + __iommu_load_old_root_entry(iommu);
> + ret = copy_root_entry_table(iommu);
> + __iommu_flush_cache(iommu, iommu->root_entry, PAGE_SIZE);
> + __iommu_update_old_root_entry(iommu, -1);
> +
> + spin_unlock_irqrestore(&iommu->lock, flags);
> +
> + __iommu_free_mapped_mem();
> +
> + return ret;
> +}
> +
> +static void iommu_check_pre_te_status(struct intel_iommu *iommu)
> +{
> + u32 sts;
> +
> + sts = readl(iommu->reg + DMAR_GSTS_REG);
> + if (sts & DMA_GSTS_TES) {
> + pr_info("Translation is enabled prior to OS.\n");
> + iommu->pre_enabled_trans = 1;
> + }
> +}
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index e7cac12..90e122e 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -340,6 +340,9 @@ struct intel_iommu {
> spinlock_t lock; /* protect context, domain ids */
> struct root_entry *root_entry; /* virtual address */
>
> + /* whether translation is enabled prior to OS*/
> + u8 pre_enabled_trans;
> +
> void __iomem *root_entry_old_virt; /* mapped from old root entry */
> unsigned long root_entry_old_phys; /* root entry in old kernel */
>
> --
> 2.0.0-rc0
>

2015-05-12 08:53:16

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Modify the operation of the following functions when called during crash dump:
> iommu_context_addr
> free_context_table
> get_domain_for_dev
> init_dmars
> intel_iommu_init
>
> Bill Sumner:
> Original version.
>
> Zhenhua:
> The name of new calling functions.
> Do not disable and re-enable TE in kdump kernel.
> Use the did and gaw from old context entry;
>
> Signed-off-by: Bill Sumner <[email protected]>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 95 +++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 83 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 28c3c64..91545bf 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
> static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
>
> static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> +static u8 g_translation_pre_enabled;
>
> /*
> * This domain is a statically identity mapping domain.
> @@ -794,6 +795,9 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
> phy_addr = virt_to_phys((void *)context);
> *entry = phy_addr | 1;
> __iommu_flush_cache(iommu, entry, sizeof(*entry));
> +
> + if (iommu->pre_enabled_trans)
> + __iommu_update_old_root_entry(iommu, bus);
> }
> return &context[devfn];
> }
> @@ -887,13 +891,15 @@ static void clear_context_table(struct intel_iommu *iommu, u8 bus, u8 devfn)
>
> static void free_context_table(struct intel_iommu *iommu)
> {
> + struct root_entry *root = NULL;
> int i;
> unsigned long flags;
> struct context_entry *context;
>
> spin_lock_irqsave(&iommu->lock, flags);
> if (!iommu->root_entry) {
> - goto out;
> + spin_unlock_irqrestore(&iommu->lock, flags);
> + return;
> }
> for (i = 0; i < ROOT_ENTRY_NR; i++) {
> context = iommu_context_addr(iommu, i, 0, 0);
> @@ -908,10 +914,23 @@ static void free_context_table(struct intel_iommu *iommu)
> free_pgtable_page(context);
>
> }
> +
> + if (iommu->pre_enabled_trans) {
> + iommu->root_entry_old_phys = 0;
> + root = iommu->root_entry_old_virt;
> + iommu->root_entry_old_virt = NULL;
> + }
> +
> free_pgtable_page(iommu->root_entry);
> iommu->root_entry = NULL;
> -out:
> +
> spin_unlock_irqrestore(&iommu->lock, flags);
> +
> + /* We put this out of spin_unlock is because iounmap() may
> + * cause error if surrounded by spin_lock and unlock;
> + */
> + if (iommu->pre_enabled_trans)
> + iounmap(root);
> }
>
> static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
> @@ -2333,6 +2352,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
> unsigned long flags;
> u8 bus, devfn;
> int did = -1; /* Default to "no domain_id supplied" */
> + struct context_entry *ce = NULL;
>
> domain = find_domain(dev);
> if (domain)
> @@ -2366,6 +2386,20 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
> domain = alloc_domain(0);
> if (!domain)
> return NULL;
> +
> + if (iommu->pre_enabled_trans) {
> + /*
> + * if this device had a did in the old kernel
> + * use its values instead of generating new ones
> + */
> + ce = device_to_existing_context_entry(iommu, bus, devfn);
> +
> + if (ce) {
> + did = context_domain_id(ce);
> + gaw = agaw_to_width(context_address_width(ce));
> + }
> + }
> +
> domain->id = iommu_attach_domain_with_id(domain, iommu, did);
> if (domain->id < 0) {
> free_domain_mem(domain);
> @@ -2897,6 +2931,7 @@ static int __init init_dmars(void)
> goto free_g_iommus;
> }
>
> + g_translation_pre_enabled = 0; /* To know whether to skip RMRR */
> for_each_active_iommu(iommu, drhd) {
> g_iommus[iommu->seq_id] = iommu;
>
> @@ -2904,14 +2939,30 @@ static int __init init_dmars(void)
> if (ret)
> goto free_iommu;
>
> - /*
> - * TBD:
> - * we could share the same root & context tables
> - * among all IOMMU's. Need to Split it later.
> - */
> - ret = iommu_alloc_root_entry(iommu);
> - if (ret)
> - goto free_iommu;
> + iommu_check_pre_te_status(iommu);
> + if (iommu->pre_enabled_trans) {
> + pr_info("IOMMU Copying translate tables from panicked kernel\n");
> + ret = intel_iommu_load_translation_tables(iommu);
> + if (ret) {
> + pr_err("IOMMU: Copy translate tables failed\n");
> +
> + /* Best to stop trying */
> + goto free_iommu;
> + }
> + pr_info("IOMMU: root_cache:0x%12.12llx phys:0x%12.12llx\n",
> + (u64)iommu->root_entry,
> + (u64)iommu->root_entry_old_phys);
> + } else {
> + /*
> + * TBD:
> + * we could share the same root & context tables
> + * among all IOMMU's. Need to Split it later.
> + */
> + ret = iommu_alloc_root_entry(iommu);
> + if (ret)
> + goto free_iommu;
> + }
> +
> if (!ecap_pass_through(iommu->ecap))
> hw_pass_through = 0;
> }
> @@ -2929,6 +2980,14 @@ static int __init init_dmars(void)
> check_tylersburg_isoch();
>
> /*
> + * In the second kernel: Skip setting-up new domains for

Use kexec kernel instead of second kernel is better because there's no context
reader will be confused.

> + * si, rmrr, and the isa bus on the expectation that these
> + * translations were copied from the old kernel.
> + */
> + if (g_translation_pre_enabled)
> + goto skip_new_domains_for_si_rmrr_isa;
> +
> + /*
> * If pass through is not set or not enabled, setup context entries for
> * identity mappings for rmrr, gfx, and isa and may fall back to static
> * identity mapping if iommu_identity_mapping is set.
> @@ -2968,6 +3027,8 @@ static int __init init_dmars(void)
>
> iommu_prepare_isa();
>
> +skip_new_domains_for_si_rmrr_isa:;
> +
> /*
> * for each drhd
> * enable fault log
> @@ -2996,7 +3057,13 @@ static int __init init_dmars(void)
>
> iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL);
> iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
> - iommu_enable_translation(iommu);
> +
> + if (iommu->pre_enabled_trans) {
> + if (!(iommu->gcmd & DMA_GCMD_TE))
> + iommu_enable_translation(iommu);
> + } else
> + iommu_enable_translation(iommu);
> +
> iommu_disable_protect_mem_regions(iommu);
> }
>
> @@ -4282,11 +4349,14 @@ int __init intel_iommu_init(void)
> }
>
> /*
> - * Disable translation if already enabled prior to OS handover.
> + * We do not need to disable translation if already enabled prior
> + * to OS handover. Because we will try to copy old tables;
> */
> + /*
> for_each_active_iommu(iommu, drhd)
> if (iommu->gcmd & DMA_GCMD_TE)
> iommu_disable_translation(iommu);
> + */

Above commented out code is useless?

>
> if (dmar_dev_scope_init() < 0) {
> if (force_on)
> @@ -5106,5 +5176,6 @@ static void iommu_check_pre_te_status(struct intel_iommu *iommu)
> if (sts & DMA_GSTS_TES) {
> pr_info("Translation is enabled prior to OS.\n");
> iommu->pre_enabled_trans = 1;
> + g_translation_pre_enabled = 1;
> }
> }
> --
> 2.0.0-rc0
>

2015-05-12 09:01:52

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 09/10] iommu/vt-d: Copy functions for irte

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Functions to copy the irte data from the old kernel into the kdump kernel.

Use above line as subject is better, if irte means interrupt remappin table
entry then descripbe it then I like the long format in subject line.

Also it need a better patch description about why do we need this, and what
has been added in this patch.

>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel_irq_remapping.c | 68 +++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 4 +++
> 2 files changed, 72 insertions(+)
>
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 5709ae9..c2a4406 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -17,6 +17,10 @@
>
> #include "irq_remapping.h"
>
> +static int __iommu_load_old_irte(struct intel_iommu *iommu);
> +static int __iommu_update_old_irte(struct intel_iommu *iommu, int index);
> +static void iommu_check_pre_ir_status(struct intel_iommu *iommu);
> +
> struct ioapic_scope {
> struct intel_iommu *iommu;
> unsigned int id;
> @@ -1299,3 +1303,67 @@ int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool insert)
>
> return ret;
> }
> +
> +static int __iommu_load_old_irte(struct intel_iommu *iommu)
> +{
> + if ((!iommu)
> + || (!iommu->ir_table)
> + || (!iommu->ir_table->base)
> + || (!iommu->ir_table->base_old_phys)
> + || (!iommu->ir_table->base_old_virt))
> + return -1;
> +
> + memcpy(iommu->ir_table->base,
> + iommu->ir_table->base_old_virt,
> + INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
> +
> + __iommu_flush_cache(iommu, iommu->ir_table->base,
> + INTR_REMAP_TABLE_ENTRIES*sizeof(struct irte));
> +
> + return 0;
> +}
> +
> +static int __iommu_update_old_irte(struct intel_iommu *iommu, int index)
> +{
> + int start;
> + unsigned long size;
> + void __iomem *to;
> + void *from;
> +
> + if ((!iommu)
> + || (!iommu->ir_table)
> + || (!iommu->ir_table->base)
> + || (!iommu->ir_table->base_old_phys)
> + || (!iommu->ir_table->base_old_virt))
> + return -1;
> +
> + if (index < -1 || index >= INTR_REMAP_TABLE_ENTRIES)
> + return -1;
> +
> + if (index == -1) {
> + start = 0;
> + size = INTR_REMAP_TABLE_ENTRIES * sizeof(struct irte);
> + } else {
> + start = index * sizeof(struct irte);
> + size = sizeof(struct irte);
> + }
> +
> + to = iommu->ir_table->base_old_virt;
> + from = iommu->ir_table->base;
> + memcpy(to + start, from + start, size);
> +
> + __iommu_flush_cache(iommu, to + start, size);
> +
> + return 0;
> +}
> +
> +static void iommu_check_pre_ir_status(struct intel_iommu *iommu)
> +{
> + u32 sts;
> +
> + sts = readl(iommu->reg + DMAR_GSTS_REG);
> + if (sts & DMA_GSTS_IRES) {
> + pr_info("IR is enabled prior to OS.\n");
> + iommu->pre_enabled_ir = 1;
> + }
> +}
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 90e122e..d7a0b5d 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -301,6 +301,8 @@ struct q_inval {
> struct ir_table {
> struct irte *base;
> unsigned long *bitmap;
> + void __iomem *base_old_virt;
> + unsigned long base_old_phys;
> };
> #endif
>
> @@ -342,6 +344,8 @@ struct intel_iommu {
>
> /* whether translation is enabled prior to OS*/
> u8 pre_enabled_trans;
> + /* whether interrupt remapping is enabled prior to OS*/
> + u8 pre_enabled_ir;
>
> void __iomem *root_entry_old_virt; /* mapped from old root entry */
> unsigned long root_entry_old_phys; /* root entry in old kernel */
> --
> 2.0.0-rc0
>

2015-05-12 09:05:18

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some system, the interrupt remapping fault will also happen even if the
> intel_iommu is not set to on, because the interrupt remapping will be enabled
> when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the
> INTR-Remap fault is caused by a similar reason. In short, the initialization
> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
> response.
>
> To fix this problem, we modifies the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry table, copy data from the old ones that
> used by the old kernel.
> 5. Keep using the old page tables before driver is loaded.
> 6. After device driver is loaded, when it issues the first dma_map command,
> free the dmar_domain structure for this device, and generate a new one, so
> that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, we also save its address to
> the old root entry table.
>
> For Interrupt Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
> the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
> for that device.
> 2. This approach behaves in a manner very similar to operation without an
> active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the
> device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
> This supports the practice of creating a special kdump kernel without
> drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code-changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful function for root entry table in code intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load old root entry table to iommu->root_entry from the memory
> of old kernel.
> 4. Functions to malloc new context entry table and copy the data from the old
> ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support other behaviours that have been listed.
> 8. In the new functions, use physical address as "unsigned long" type, not
> pointers.
>
> Original version by Bill Sumner:
> https://lkml.org/lkml/2014/1/10/518
> https://lkml.org/lkml/2014/4/15/716
> https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's updates:
> https://lkml.org/lkml/2014/10/21/134
> https://lkml.org/lkml/2014/12/15/121
> https://lkml.org/lkml/2014/12/22/53
> https://lkml.org/lkml/2015/1/6/1166
> https://lkml.org/lkml/2015/1/12/35
> https://lkml.org/lkml/2015/3/19/33
> https://lkml.org/lkml/2015/4/10/135
>
> Changelog[v11]:
> 1. Fix some conflicts with the latest upstream kernel.
> Add flush in iommu_context_addr
>
> Changelog[v10]:
> 1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
> Use one flag which stores the te and ir status in last kernel:
> iommu->pre_enabled_trans
> iommu->pre_enabled_ir
>
> Changelog[v9]:
> 1. Add new function iommu_attach_domain_with_id.
> 2. Do not copy old page tables, keep using the old ones.
> 3. Remove functions:
> intel_iommu_did_to_domain_values_entry
> intel_iommu_get_dids_from_old_kernel
> device_to_domain_id
> copy_page_addr
> copy_page_table
> copy_context_entry
> copy_context_entry_table
> 4. Add new function device_to_existing_context_entry.
>
> Changelog[v8]:
> 1. Add a missing __iommu_flush_cache in function copy_page_table.
>
> Changelog[v7]:
> 1. Use __iommu_flush_cache to flush the data to hardware.
>
> Changelog[v6]:
> 1. Use "unsigned long" as type of physical address.
> 2. Use new function unmap_device_dma to unmap the old dma.
> 3. Some small incorrect bits order for aw shift.
>
> Changelog[v5]:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. New functions to copy data from old kernel, and save to old kernel mem.
> 5. New functions to save updated root entry table and irte table.
> 6. Use intel_unmap to unmap the old dma;
> 7. Allocate new pages while driver is being loaded.
>
> Changelog[v4]:
> 1. Cut off the patches that move some defines and functions to new files.
> 2. Reduce the numbers of patches to five, make it more easier to read.
> 3. Changed the name of functions, make them consistent with current context
> get/set functions.
> 4. Add change to function __iommu_attach_domain.
>
> Changelog[v3]:
> 1. Commented-out "#define DEBUG 1" to eliminate debug messages.
> 2. Updated the comments about changes in each version.
> 3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
> struct as recommended by Baoquan He [[email protected]]
> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>
> Changelog[v2]:
> The following series implements a fix for:
> A kdump problem about DMA that has been discussed for a long time. That is,
> when a kernel panics and boots into the kdump kernel, DMA started by the
> panicked kernel is not stopped before the kdump kernel is booted and the
> kdump kernel disables the IOMMU while this DMA continues. This causes the
> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat
> them as physical memory addresses -- which causes the DMA to either:
> (1) generate DMAR errors or
> (2) generate PCI SERR errors or
> (3) transfer data to or from incorrect areas of memory. Often this
> causes the dump to fail.
>
> Changelog[v1]:
> The original version.
>
> Changed in this version:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. Use "unsigned long" as physical address.
> 5. Use intel_unmap to unmap the old dma;
>
> Baoquan He <[email protected]> helps testing this patchset.
> Takao Indoh <[email protected]> gives valuable suggestions.
>
> Li, Zhen-Hua (10):
> iommu/vt-d: New function to attach domain with id
> iommu/vt-d: Items required for kdump
> iommu/vt-d: Function to get existing context entry
> iommu/vt-d: functions to copy data from old mem
> iommu/vt-d: Add functions to load and save old re
> iommu/vt-d: datatypes and functions used for kdump
> iommu/vt-d: enable kdump support in iommu module
> iommu/vt-d: assign new page table for dma_map
> iommu/vt-d: Copy functions for irte
> iommu/vt-d: Use old irte in kdump kernel
>
> drivers/iommu/intel-iommu.c | 536 ++++++++++++++++++++++++++++++++++--
> drivers/iommu/intel_irq_remapping.c | 96 ++++++-
> include/linux/intel-iommu.h | 16 ++
> 3 files changed, 623 insertions(+), 25 deletions(-)
>
> --
> 2.0.0-rc0
>

Zhenhua,

I reviewed the patchset from code struct point of view, see the inline
comments, thanks a lot and appreciated for your work.

Dave

2015-05-12 09:35:15

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

To fix the dmar fault in running v10 with latest upstream version, added a flush in the new function iommu_context_addr.

Thanks
Zhenhua

>
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some system, the interrupt remapping fault will also happen even if the
> intel_iommu is not set to on, because the interrupt remapping will be enabled
> when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the
> INTR-Remap fault is caused by a similar reason. In short, the initialization
> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
> response.
>
> To fix this problem, we modifies the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry table, copy data from the old ones that
> used by the old kernel.
> 5. Keep using the old page tables before driver is loaded.
> 6. After device driver is loaded, when it issues the first dma_map command,
> free the dmar_domain structure for this device, and generate a new one, so
> that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, we also save its address to
> the old root entry table.
>
> For Interrupt Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
> the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
> for that device.
> 2. This approach behaves in a manner very similar to operation without an
> active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the
> device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
> This supports the practice of creating a special kdump kernel without
> drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code-changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful function for root entry table in code intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load old root entry table to iommu->root_entry from the memory
> of old kernel.
> 4. Functions to malloc new context entry table and copy the data from the old
> ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support other behaviours that have been listed.
> 8. In the new functions, use physical address as "unsigned long" type, not
> pointers.
>
> Original version by Bill Sumner:
> https://lkml.org/lkml/2014/1/10/518
> https://lkml.org/lkml/2014/4/15/716
> https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's updates:
> https://lkml.org/lkml/2014/10/21/134
> https://lkml.org/lkml/2014/12/15/121
> https://lkml.org/lkml/2014/12/22/53
> https://lkml.org/lkml/2015/1/6/1166
> https://lkml.org/lkml/2015/1/12/35
> https://lkml.org/lkml/2015/3/19/33
> https://lkml.org/lkml/2015/4/10/135
>
> Changelog[v11]:
> 1. Fix some conflicts with the latest upstream kernel.
> Add flush in iommu_context_addr
>
> Changelog[v10]:
> 1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
> Use one flag which stores the te and ir status in last kernel:
> iommu->pre_enabled_trans
> iommu->pre_enabled_ir
>
> Changelog[v9]:
> 1. Add new function iommu_attach_domain_with_id.
> 2. Do not copy old page tables, keep using the old ones.
> 3. Remove functions:
> intel_iommu_did_to_domain_values_entry
> intel_iommu_get_dids_from_old_kernel
> device_to_domain_id
> copy_page_addr
> copy_page_table
> copy_context_entry
> copy_context_entry_table
> 4. Add new function device_to_existing_context_entry.
>
> Changelog[v8]:
> 1. Add a missing __iommu_flush_cache in function copy_page_table.
>
> Changelog[v7]:
> 1. Use __iommu_flush_cache to flush the data to hardware.
>
> Changelog[v6]:
> 1. Use "unsigned long" as type of physical address.
> 2. Use new function unmap_device_dma to unmap the old dma.
> 3. Some small incorrect bits order for aw shift.
>
> Changelog[v5]:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. New functions to copy data from old kernel, and save to old kernel mem.
> 5. New functions to save updated root entry table and irte table.
> 6. Use intel_unmap to unmap the old dma;
> 7. Allocate new pages while driver is being loaded.
>
> Changelog[v4]:
> 1. Cut off the patches that move some defines and functions to new files.
> 2. Reduce the numbers of patches to five, make it more easier to read.
> 3. Changed the name of functions, make them consistent with current context
> get/set functions.
> 4. Add change to function __iommu_attach_domain.
>
> Changelog[v3]:
> 1. Commented-out "#define DEBUG 1" to eliminate debug messages.
> 2. Updated the comments about changes in each version.
> 3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
> struct as recommended by Baoquan He [[email protected]]
> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>
> Changelog[v2]:
> The following series implements a fix for:
> A kdump problem about DMA that has been discussed for a long time. That is,
> when a kernel panics and boots into the kdump kernel, DMA started by the
> panicked kernel is not stopped before the kdump kernel is booted and the
> kdump kernel disables the IOMMU while this DMA continues. This causes the
> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat
> them as physical memory addresses -- which causes the DMA to either:
> (1) generate DMAR errors or
> (2) generate PCI SERR errors or
> (3) transfer data to or from incorrect areas of memory. Often this
> causes the dump to fail.
>
> Changelog[v1]:
> The original version.
>
> Changed in this version:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. Use "unsigned long" as physical address.
> 5. Use intel_unmap to unmap the old dma;
>
> Baoquan He <[email protected]> helps testing this patchset.
> Takao Indoh <[email protected]> gives valuable suggestions.
>
> Li, Zhen-Hua (10):
> iommu/vt-d: New function to attach domain with id
> iommu/vt-d: Items required for kdump
> iommu/vt-d: Function to get existing context entry
> iommu/vt-d: functions to copy data from old mem
> iommu/vt-d: Add functions to load and save old re
> iommu/vt-d: datatypes and functions used for kdump
> iommu/vt-d: enable kdump support in iommu module
> iommu/vt-d: assign new page table for dma_map
> iommu/vt-d: Copy functions for irte
> iommu/vt-d: Use old irte in kdump kernel
>
> drivers/iommu/intel-iommu.c | 536 ++++++++++++++++++++++++++++++++++--
> drivers/iommu/intel_irq_remapping.c | 96 ++++++-
> include/linux/intel-iommu.h | 16 ++
> 3 files changed, 623 insertions(+), 25 deletions(-)
>
> --
> 2.0.0-rc0
>

2015-05-13 01:46:35

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

On 05/12/2015 04:17 PM, Dave Young wrote:
> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
>> Add context entry functions needed for kdump.
>> +/*
>> + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
>> + *
>> + * Fixes the crashdump kernel to deal with an active iommu and legacy
>> + * DMA from the (old) panicked kernel in a manner similar to how legacy
>> + * DMA is handled when no hardware iommu was in use by the old kernel --
>> + * allow the legacy DMA to continue into its current buffers.
>> + *
>> + * In the crashdump kernel, this code:
>> + * 1. skips disabling the IOMMU's translating.
>> + * 2. Do not re-enable IOMMU's translating.
>> + * 3. In kdump kernel, use the old root entry table.
>> + * 4. Allocate pages for new context entry, copy data from old context entries
>> + * in the old kernel to the new ones.
>> + *
>> + * In other kinds of kernel, for example, a kernel started by kexec,
>> + * do the same thing as crashdump kernel.
>> + */
>> +
>> +
>
> Above comments should come along with the code changes instead of putting it
> in this patch.
>
> BTW, there's one more blank line at the end..
Code change is in 00/10, the cover letter.
And the blank does not matter, I checked the patch with
scripts/checkpatch.pl, no warnings, no errors.

2015-05-13 01:48:13

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re

On 05/12/2015 04:37 PM, Dave Young wrote:
> Seems the subject was truncated? Maybe "re" means root entry? Then please fix it
>
> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
>> Add functions to load root entry table from old kernel, and to save updated
>> root entry table.
>> Add two member in struct intel_iommu, to store the RTA in old kernel, and
>> the mapped virt address of it.
>
> Please explain a bit what is "RTA" in patch log, it is unclear to most of people
> who do not know iommu details.
>
>>

If people want to understand this patchset, I assume they have read the
vt-d specs. RTA is defined clearly in this spec.

2015-05-13 01:55:59

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

One thing must be pointed out:
There is a known issue that hpsa driver cannot work well in kdump
kernel. And this patchset is not intended to fix this problem.

So this patchset cannot work with HP smart array devices which need hpsa
driver.

On 05/11/2015 05:52 PM, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
> dmar: DRHD: handling fault status reg 102
> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
> DMAR:[fault reason 01] Present bit in root entry is clear
>
> dmar: DRHD: handling fault status reg 2
> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some system, the interrupt remapping fault will also happen even if the
> intel_iommu is not set to on, because the interrupt remapping will be enabled
> when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the
> INTR-Remap fault is caused by a similar reason. In short, the initialization
> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
> response.
>
> To fix this problem, we modifies the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry table, copy data from the old ones that
> used by the old kernel.
> 5. Keep using the old page tables before driver is loaded.
> 6. After device driver is loaded, when it issues the first dma_map command,
> free the dmar_domain structure for this device, and generate a new one, so
> that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, we also save its address to
> the old root entry table.
>
> For Interrupt Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When ioapic entry is setup, the interrupt remapping table is changed, and
> the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
> for that device.
> 2. This approach behaves in a manner very similar to operation without an
> active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the
> device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
> This supports the practice of creating a special kdump kernel without
> drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code-changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful function for root entry table in code intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load old root entry table to iommu->root_entry from the memory
> of old kernel.
> 4. Functions to malloc new context entry table and copy the data from the old
> ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support other behaviours that have been listed.
> 8. In the new functions, use physical address as "unsigned long" type, not
> pointers.
>
> Original version by Bill Sumner:
> https://lkml.org/lkml/2014/1/10/518
> https://lkml.org/lkml/2014/4/15/716
> https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's updates:
> https://lkml.org/lkml/2014/10/21/134
> https://lkml.org/lkml/2014/12/15/121
> https://lkml.org/lkml/2014/12/22/53
> https://lkml.org/lkml/2015/1/6/1166
> https://lkml.org/lkml/2015/1/12/35
> https://lkml.org/lkml/2015/3/19/33
> https://lkml.org/lkml/2015/4/10/135
>
> Changelog[v11]:
> 1. Fix some conflicts with the latest upstream kernel.
> Add flush in iommu_context_addr
>
> Changelog[v10]:
> 1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
> Use one flag which stores the te and ir status in last kernel:
> iommu->pre_enabled_trans
> iommu->pre_enabled_ir
>
> Changelog[v9]:
> 1. Add new function iommu_attach_domain_with_id.
> 2. Do not copy old page tables, keep using the old ones.
> 3. Remove functions:
> intel_iommu_did_to_domain_values_entry
> intel_iommu_get_dids_from_old_kernel
> device_to_domain_id
> copy_page_addr
> copy_page_table
> copy_context_entry
> copy_context_entry_table
> 4. Add new function device_to_existing_context_entry.
>
> Changelog[v8]:
> 1. Add a missing __iommu_flush_cache in function copy_page_table.
>
> Changelog[v7]:
> 1. Use __iommu_flush_cache to flush the data to hardware.
>
> Changelog[v6]:
> 1. Use "unsigned long" as type of physical address.
> 2. Use new function unmap_device_dma to unmap the old dma.
> 3. Some small incorrect bits order for aw shift.
>
> Changelog[v5]:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. New functions to copy data from old kernel, and save to old kernel mem.
> 5. New functions to save updated root entry table and irte table.
> 6. Use intel_unmap to unmap the old dma;
> 7. Allocate new pages while driver is being loaded.
>
> Changelog[v4]:
> 1. Cut off the patches that move some defines and functions to new files.
> 2. Reduce the numbers of patches to five, make it more easier to read.
> 3. Changed the name of functions, make them consistent with current context
> get/set functions.
> 4. Add change to function __iommu_attach_domain.
>
> Changelog[v3]:
> 1. Commented-out "#define DEBUG 1" to eliminate debug messages.
> 2. Updated the comments about changes in each version.
> 3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
> struct as recommended by Baoquan He [[email protected]]
> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>
> Changelog[v2]:
> The following series implements a fix for:
> A kdump problem about DMA that has been discussed for a long time. That is,
> when a kernel panics and boots into the kdump kernel, DMA started by the
> panicked kernel is not stopped before the kdump kernel is booted and the
> kdump kernel disables the IOMMU while this DMA continues. This causes the
> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat
> them as physical memory addresses -- which causes the DMA to either:
> (1) generate DMAR errors or
> (2) generate PCI SERR errors or
> (3) transfer data to or from incorrect areas of memory. Often this
> causes the dump to fail.
>
> Changelog[v1]:
> The original version.
>
> Changed in this version:
> 1. Do not disable and re-enable traslation and interrupt remapping.
> 2. Use old root entry table.
> 3. Use old interrupt remapping table.
> 4. Use "unsigned long" as physical address.
> 5. Use intel_unmap to unmap the old dma;
>
> Baoquan He <[email protected]> helps testing this patchset.
> Takao Indoh <[email protected]> gives valuable suggestions.
>
> Li, Zhen-Hua (10):
> iommu/vt-d: New function to attach domain with id
> iommu/vt-d: Items required for kdump
> iommu/vt-d: Function to get existing context entry
> iommu/vt-d: functions to copy data from old mem
> iommu/vt-d: Add functions to load and save old re
> iommu/vt-d: datatypes and functions used for kdump
> iommu/vt-d: enable kdump support in iommu module
> iommu/vt-d: assign new page table for dma_map
> iommu/vt-d: Copy functions for irte
> iommu/vt-d: Use old irte in kdump kernel
>
> drivers/iommu/intel-iommu.c | 536 ++++++++++++++++++++++++++++++++++--
> drivers/iommu/intel_irq_remapping.c | 96 ++++++-
> include/linux/intel-iommu.h | 16 ++
> 3 files changed, 623 insertions(+), 25 deletions(-)
>

2015-05-13 02:11:13

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Modify the operation of the following functions when called during crash dump:
> iommu_context_addr
> free_context_table
> get_domain_for_dev
> init_dmars
> intel_iommu_init
>
> Bill Sumner:
> Original version.
>
> Zhenhua:
> The name of new calling functions.
> Do not disable and re-enable TE in kdump kernel.
> Use the did and gaw from old context entry;
>
> Signed-off-by: Bill Sumner <[email protected]>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 95 +++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 83 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 28c3c64..91545bf 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
> static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
>
> static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> +static u8 g_translation_pre_enabled;
Hi Zhenhua,

I haven't checked patch one by one, am going through the code flow.

About g_translation_pre_enabled, I don't think it's necessary to define
it as a global variable. Both its assignment and judgement are in
init_dmars(). In this situation a local variable translation_pre_enabled
in init_dmars() is enough.

You can assign value to it here:

iommu_check_pre_te_status(iommu);
if (iommu->pre_enabled_trans) {
translation_pre_enabled = 1;
...
}

Thanks
Baoquan

2015-05-13 02:29:15

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

>> +static u8 g_translation_pre_enabled;
> Hi Zhenhua,
>
> I haven't checked patch one by one, am going through the code flow.
>
> About g_translation_pre_enabled, I don't think it's necessary to define
> it as a global variable. Both its assignment and judgement are in
> init_dmars(). In this situation a local variable translation_pre_enabled
> in init_dmars() is enough.
>
> You can assign value to it here:
>
> iommu_check_pre_te_status(iommu);
> if (iommu->pre_enabled_trans) {
> translation_pre_enabled = 1;
> ...
> }
>
> Thanks
> Baoquan
>
Hi Baoquan,
This variable is only be used in this file, for it is defined as static.
Till now, I think both global and local variable are fine, got the same
thing.
But I believe global is better, because if other functions want to know
whether translation is enabled, this global variable is a good choice.

Thanks
Zhenhua




2015-05-13 02:36:54

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

On 05/13/15 at 10:28am, Li, ZhenHua wrote:
> >>+static u8 g_translation_pre_enabled;
> >Hi Zhenhua,
> >
> >I haven't checked patch one by one, am going through the code flow.
> >
> >About g_translation_pre_enabled, I don't think it's necessary to define
> >it as a global variable. Both its assignment and judgement are in
> >init_dmars(). In this situation a local variable translation_pre_enabled
> >in init_dmars() is enough.
> >
> >You can assign value to it here:
> >
> > iommu_check_pre_te_status(iommu);
> > if (iommu->pre_enabled_trans) {
> > translation_pre_enabled = 1;
> > ...
> > }
> >
> >Thanks
> >Baoquan
> >
> Hi Baoquan,
> This variable is only be used in this file, for it is defined as static.
> Till now, I think both global and local variable are fine, got the same
> thing.
> But I believe global is better, because if other functions want to
> know whether translation is enabled, this global variable is a good
> choice.

OK, I don't insist on this. But I think we don't have obligation to
consider the future usage for a function or variable. We can often see a
static function is redefined as non-static since it need be used in
other file or the similar thing for variable. It's also good to change
it when it's really needed.

Anyway, I am fine with this. Just for now it's a little uncomfortable to
me.

Thanks
Baoquan

2015-05-13 06:31:48

by Alexander Duyck

[permalink] [raw]
Subject: Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

On 05/12/2015 06:45 PM, Li, ZhenHua wrote:
> On 05/12/2015 04:17 PM, Dave Young wrote:
>> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
>>> Add context entry functions needed for kdump.
>>> +/*
>>> + * Fix Crashdump failure caused by leftover DMA through a hardware
>>> IOMMU
>>> + *
>>> + * Fixes the crashdump kernel to deal with an active iommu and legacy
>>> + * DMA from the (old) panicked kernel in a manner similar to how legacy
>>> + * DMA is handled when no hardware iommu was in use by the old
>>> kernel --
>>> + * allow the legacy DMA to continue into its current buffers.
>>> + *
>>> + * In the crashdump kernel, this code:
>>> + * 1. skips disabling the IOMMU's translating.
>>> + * 2. Do not re-enable IOMMU's translating.
>>> + * 3. In kdump kernel, use the old root entry table.
>>> + * 4. Allocate pages for new context entry, copy data from old
>>> context entries
>>> + * in the old kernel to the new ones.
>>> + *
>>> + * In other kinds of kernel, for example, a kernel started by kexec,
>>> + * do the same thing as crashdump kernel.
>>> + */
>>> +
>>> +
>>
>> Above comments should come along with the code changes instead of
>> putting it
>> in this patch.
>>
>> BTW, there's one more blank line at the end..
> Code change is in 00/10, the cover letter.
> And the blank does not matter, I checked the patch with
> scripts/checkpatch.pl, no warnings, no errors.

I agree with Dave. The comment by itself makes no sense. You would be
better off moving this to patch 3 where you actually start to add code.

You remove the double space in patch 5 of your set so it must of
triggered something.

Neither of these are grounds for rejecting the patches, but it is
something you may want to clean up if you end up resubmitting.

- Alex

2015-05-13 08:43:27

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

On 05/13/15 at 09:45am, Li, ZhenHua wrote:
> On 05/12/2015 04:17 PM, Dave Young wrote:
> >On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> >>Add context entry functions needed for kdump.
> >>+/*
> >>+ * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
> >>+ *
> >>+ * Fixes the crashdump kernel to deal with an active iommu and legacy
> >>+ * DMA from the (old) panicked kernel in a manner similar to how legacy
> >>+ * DMA is handled when no hardware iommu was in use by the old kernel --
> >>+ * allow the legacy DMA to continue into its current buffers.
> >>+ *
> >>+ * In the crashdump kernel, this code:
> >>+ * 1. skips disabling the IOMMU's translating.
> >>+ * 2. Do not re-enable IOMMU's translating.
> >>+ * 3. In kdump kernel, use the old root entry table.
> >>+ * 4. Allocate pages for new context entry, copy data from old context entries
> >>+ * in the old kernel to the new ones.
> >>+ *
> >>+ * In other kinds of kernel, for example, a kernel started by kexec,
> >>+ * do the same thing as crashdump kernel.
> >>+ */
> >>+
> >>+
> >
> >Above comments should come along with the code changes instead of putting it
> >in this patch.
> >
> >BTW, there's one more blank line at the end..
> Code change is in 00/10, the cover letter.

I means the real code, not the changelog.

> And the blank does not matter, I checked the patch with
> scripts/checkpatch.pl, no warnings, no errors.

Why add two line breaks if one is enough? Adding such check in checkpatch.pl
make sense to me actually.

Thanks
Dave

2015-05-13 08:50:37

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re

On 05/13/15 at 09:47am, Li, ZhenHua wrote:
> On 05/12/2015 04:37 PM, Dave Young wrote:
> >Seems the subject was truncated? Maybe "re" means root entry? Then please fix it
> >
> >On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> >>Add functions to load root entry table from old kernel, and to save updated
> >>root entry table.
> >>Add two member in struct intel_iommu, to store the RTA in old kernel, and
> >>the mapped virt address of it.
> >
> >Please explain a bit what is "RTA" in patch log, it is unclear to most of people
> >who do not know iommu details.
> >
> >>
>
> If people want to understand this patchset, I assume they have read the vt-d
> specs. RTA is defined clearly in this spec.
>

I think explain a bit is better, it will be easier for review, RTA is defined in spec,
right, but if you refer to it in kernel source code, describe the meaning will help
people to understand your code.

I would not stick on this 'RTA', but a descriptive subject and patch log is still
important, please check the logs, not only for this patch.

Thanks
Dave

2015-05-13 08:57:15

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Populate it with support functions to copy iommu translation tables from
> from the panicked kernel into the kdump kernel in the event of a crash.
>
> Functions:
> Use old root entry table, and load the old data to root_entry as cache.
> Malloc new context table and copy old context table to the new one.
>
> Bill Sumner:
> Original version, the creation of the data types and functions.
>
> Li, Zhenhua:
> Create new function iommu_check_pre_te_status() to check status.
> Update the caller of context_get_* and context_put*, use context_*
> and context_set_* for replacement.
> Update the name of the function that loads root entry table.
> Use new function to copy old context entry tables and page tables.
> Use "unsigned long" for physical address.
> Remove the functions to copy page table in Bill's version.
> Remove usage of dve and ppap in Bill's version.
>
> Signed-off-by: Bill Sumner <[email protected]>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 121 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 3 ++
> 2 files changed, 124 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3a5d446..28c3c64 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -386,6 +386,18 @@ struct iommu_remapped_entry {
> static LIST_HEAD(__iommu_remapped_mem);
> static DEFINE_MUTEX(__iommu_mem_list_lock);
>
> +/* ========================================================================
> + * Copy iommu translation tables from old kernel into new kernel.
> + * Entry to this set of functions is: intel_iommu_load_translation_tables()
> + * ------------------------------------------------------------------------
> + */
> +
> +static int copy_root_entry_table(struct intel_iommu *iommu);
> +
> +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
> +
> +static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> +
> /*
> * This domain is a statically identity mapping domain.
> * 1. This domain creats a static 1:1 mapping to all usable memory.
> @@ -4987,3 +4999,112 @@ static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
> __iommu_flush_cache(iommu, to + start, size);
> }
>
> +/*
> + * Load root entry tables from old kernel.
> + */
> +static int copy_root_entry_table(struct intel_iommu *iommu)
> +{
> + u32 bus; /* Index: root-entry-table */
> + struct root_entry *re; /* Virt(iterator: new table) */
> + unsigned long context_old_phys; /* Phys(context table entry) */
> + struct context_entry *context_new_virt; /* Virt(new context_entry) */
> +
> + /*
> + * A new root entry table has been allocated ,
> + * we need copy re from old kernel to the new allocated one.
> + */
> +
> + if (!iommu->root_entry_old_phys)
> + return -ENOMEM;
> +
> + for (bus = 0, re = iommu->root_entry; bus < 256; bus += 1, re += 1) {
> + if (!root_present(re))
> + continue;
> +
> + context_old_phys = get_context_phys_from_root(re);
> +
> + if (!context_old_phys)
> + continue;
> +
> + context_new_virt =
> + (struct context_entry *)alloc_pgtable_page(iommu->node);
> +
> + if (!context_new_virt)
> + return -ENOMEM;
> +
> + __iommu_load_from_oldmem(context_new_virt,
> + context_old_phys,
> + VTD_PAGE_SIZE);
> +
> + __iommu_flush_cache(iommu, context_new_virt, VTD_PAGE_SIZE);
> +
> + set_root_value(re, virt_to_phys(context_new_virt));
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Interface to the "load translation tables" set of functions
> + * from mainline code.
> + */
> +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
> +{
> + unsigned long long q; /* quadword scratch */
> + int ret = 0; /* Integer return code */
> + unsigned long flags;
> +
> + q = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
> + if (!q)
> + return -1;
> +
> + spin_lock_irqsave(&iommu->lock, flags);
> +
> + /* Load the root-entry table from the old kernel
> + * foreach context_entry_table in root_entry
> + * Copy each entry table from old kernel
> + */
> + if (!iommu->root_entry) {
> + iommu->root_entry =
> + (struct root_entry *)alloc_pgtable_page(iommu->node);
> + if (!iommu->root_entry) {
> + spin_unlock_irqrestore(&iommu->lock, flags);
> + return -ENOMEM;
> + }
> + }
> +
> + iommu->root_entry_old_phys = q & VTD_PAGE_MASK;
> + if (!iommu->root_entry_old_phys) {
> + pr_err("Could not read old root entry address.");
> + return -1;
> + }
> +

I didn't find where you call iounmap to free mapping of
iommu->root_entry_old_phys. Am I missing anything?

> + iommu->root_entry_old_virt = ioremap_cache(iommu->root_entry_old_phys,
> + VTD_PAGE_SIZE);
> + if (!iommu->root_entry_old_virt) {
> + pr_err("Could not map the old root entry.");
> + return -ENOMEM;
> + }

2015-05-13 08:59:39

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump

Hi Dave,

iommu->root_entry_old_virt is used to store the mapped old rta.
iommu->root_entry_old_phys is used to store the physical address stored
in registers.
So we must not free/unmap iommu->root_entry_old_phys .

Zhenhua
On 05/13/2015 04:56 PM, Baoquan He wrote:
>> +
>> + iommu->root_entry_old_phys = q & VTD_PAGE_MASK;
>> + if (!iommu->root_entry_old_phys) {
>> + pr_err("Could not read old root entry address.");
>> + return -1;
>> + }
>> +
>
> I didn't find where you call iounmap to free mapping of
> iommu->root_entry_old_phys. Am I missing anything?
>
>> + iommu->root_entry_old_virt = ioremap_cache(iommu->root_entry_old_phys,
>> + VTD_PAGE_SIZE);
>> + if (!iommu->root_entry_old_virt) {
>> + pr_err("Could not map the old root entry.");
>> + return -ENOMEM;
>> + }

2015-05-13 09:01:20

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add some functions to copy the data from old kernel.
> These functions are used to copy context tables and page tables.
>
> To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
> use a link here, store the pointers , and then use iounmap to free them in
> another place.

Hi Zhenhua,

I remember you mentioned iounmap will cause error inside
spin_lock_irqsave. Do you know why it happened now? And could you also
describe why avoid calling iounmap between
spin_lock_irqsave/unlock_irqsave is needed here and what's the status
now?

I think other reviewer may want to know it too.

Thanks
Baoquan

2015-05-13 09:11:46

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump

On 05/13/15 at 04:58pm, Li, ZhenHua wrote:
> Hi Dave,
>
> iommu->root_entry_old_virt is used to store the mapped old rta.
> iommu->root_entry_old_phys is used to store the physical address
> stored in registers.
> So we must not free/unmap iommu->root_entry_old_phys .

Oh, yes. I was mistaken on this. Thanks for telling.

>
> Zhenhua
> On 05/13/2015 04:56 PM, Baoquan He wrote:
> >>+
> >>+ iommu->root_entry_old_phys = q & VTD_PAGE_MASK;
> >>+ if (!iommu->root_entry_old_phys) {
> >>+ pr_err("Could not read old root entry address.");
> >>+ return -1;
> >>+ }
> >>+
> >
> >I didn't find where you call iounmap to free mapping of
> >iommu->root_entry_old_phys. Am I missing anything?
> >
> >>+ iommu->root_entry_old_virt = ioremap_cache(iommu->root_entry_old_phys,
> >>+ VTD_PAGE_SIZE);
> >>+ if (!iommu->root_entry_old_virt) {
> >>+ pr_err("Could not map the old root entry.");
> >>+ return -ENOMEM;
> >>+ }
>

2015-05-13 09:14:11

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

Hi Baoquan,
I am using a list here to store all the mapped addresses, and unmap them
out of iounmap.

About the reason, please check the old mails. I cannot remember the
detailed reasons.

Thanks
Zhenhua

On 05/13/2015 05:00 PM, Baoquan He wrote:
> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
>> Add some functions to copy the data from old kernel.
>> These functions are used to copy context tables and page tables.
>>
>> To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
>> use a link here, store the pointers , and then use iounmap to free them in
>> another place.
>
> Hi Zhenhua,
>
> I remember you mentioned iounmap will cause error inside
> spin_lock_irqsave. Do you know why it happened now? And could you also
> describe why avoid calling iounmap between
> spin_lock_irqsave/unlock_irqsave is needed here and what's the status
> now?
>
> I think other reviewer may want to know it too.
>
> Thanks
> Baoquan
>

2015-05-13 09:21:11

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

On 05/13/15 at 05:13pm, Li, ZhenHua wrote:
> Hi Baoquan,
> I am using a list here to store all the mapped addresses, and unmap
> them out of iounmap.

>
> About the reason, please check the old mails. I cannot remember the
> detailed reasons.

Yeah, I understand that the list is used to collect all mapped addresses
and unmap them all after spin_unlock_irqsave.

that's why it's better to put the reason in patch log. Patch log is used
to record this kind of explanation so that you needn't say it again and
again. My personal opinion.

Thanks
Baoquan

2015-05-18 10:07:03

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Hi Joerg,

Testing from HP: passed.
Testing from He Baoquan(Redhat): passed

The problem that dmar fault came again when running v10 with latest
kernel is fixed.
And I think there is no need to update the code to a new version now.

So please tell me if there are still any code need to be changed.

Thanks
Zhenhua

On 05/13/2015 09:54 AM, Li, ZhenHua wrote:
> One thing must be pointed out:
> There is a known issue that hpsa driver cannot work well in kdump
> kernel. And this patchset is not intended to fix this problem.
>
> So this patchset cannot work with HP smart array devices which need hpsa
> driver.
>
> On 05/11/2015 05:52 PM, Li, Zhen-Hua wrote:
>> This patchset is an update of Bill Sumner's patchset, implements a fix
>> for:
>> If a kernel boots with intel_iommu=on on a system that supports intel
>> vt-d,
>> when a panic happens, the kdump kernel will boot with these faults:
>>
>> dmar: DRHD: handling fault status reg 102
>> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff80000
>> DMAR:[fault reason 01] Present bit in root entry is clear
>>
>> dmar: DRHD: handling fault status reg 2
>> dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
>> INTR-REMAP:[fault reason 34] Present field in the IRTE entry is
>> clear
>>
>> On some system, the interrupt remapping fault will also happen even if
>> the
>> intel_iommu is not set to on, because the interrupt remapping will be
>> enabled
>> when x2apic is needed by the system.
>>
>> The cause of the DMA fault is described in Bill's original version,
>> and the
>> INTR-Remap fault is caused by a similar reason. In short, the
>> initialization
>> of vt-d drivers causes the in-flight DMA and interrupt requests get wrong
>> response.
>>
>> To fix this problem, we modifies the behaviors of the intel vt-d in the
>> crashdump kernel:
>>
>> For DMA Remapping:
>> 1. To accept the vt-d hardware in an active state,
>> 2. Do not disable and re-enable the translation, keep it enabled.
>> 3. Use the old root entry table, do not rewrite the RTA register.
>> 4. Malloc and use new context entry table, copy data from the old ones
>> that
>> used by the old kernel.
>> 5. Keep using the old page tables before driver is loaded.
>> 6. After device driver is loaded, when it issues the first dma_map
>> command,
>> free the dmar_domain structure for this device, and generate a new
>> one, so
>> that the device can be assigned a new and empty page table.
>> 7. When a new context entry table is generated, we also save its
>> address to
>> the old root entry table.
>>
>> For Interrupt Remapping:
>> 1. To accept the vt-d hardware in an active state,
>> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
>> 3. Use the old interrupt remapping table, do not rewrite the IRTA
>> register.
>> 4. When ioapic entry is setup, the interrupt remapping table is
>> changed, and
>> the updated data will be stored to the old interrupt remapping table.
>>
>> Advantages of this approach:
>> 1. All manipulation of the IO-device is done by the Linux device-driver
>> for that device.
>> 2. This approach behaves in a manner very similar to operation without an
>> active iommu.
>> 3. Any activity between the IO-device and its RMRR areas is handled by
>> the
>> device-driver in the same manner as during a non-kdump boot.
>> 4. If an IO-device has no driver in the kdump kernel, it is simply
>> left alone.
>> This supports the practice of creating a special kdump kernel without
>> drivers for any devices that are not required for taking a crashdump.
>> 5. Minimal code-changes among the existing mainline intel vt-d code.
>>
>> Summary of changes in this patch set:
>> 1. Added some useful function for root entry table in code intel-iommu.c
>> 2. Added new members to struct root_entry and struct irte;
>> 3. Functions to load old root entry table to iommu->root_entry from
>> the memory
>> of old kernel.
>> 4. Functions to malloc new context entry table and copy the data from
>> the old
>> ones to the malloced new ones.
>> 5. Functions to enable support for DMA remapping in kdump kernel.
>> 6. Functions to load old irte data from the old kernel to the kdump
>> kernel.
>> 7. Some code changes that support other behaviours that have been listed.
>> 8. In the new functions, use physical address as "unsigned long" type,
>> not
>> pointers.
>>
>> Original version by Bill Sumner:
>> https://lkml.org/lkml/2014/1/10/518
>> https://lkml.org/lkml/2014/4/15/716
>> https://lkml.org/lkml/2014/4/24/836
>>
>> Zhenhua's updates:
>> https://lkml.org/lkml/2014/10/21/134
>> https://lkml.org/lkml/2014/12/15/121
>> https://lkml.org/lkml/2014/12/22/53
>> https://lkml.org/lkml/2015/1/6/1166
>> https://lkml.org/lkml/2015/1/12/35
>> https://lkml.org/lkml/2015/3/19/33
>> https://lkml.org/lkml/2015/4/10/135
>>
>> Changelog[v11]:
>> 1. Fix some conflicts with the latest upstream kernel.
>> Add flush in iommu_context_addr
>>
>> Changelog[v10]:
>> 1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
>> Use one flag which stores the te and ir status in last kernel:
>> iommu->pre_enabled_trans
>> iommu->pre_enabled_ir
>>
>> Changelog[v9]:
>> 1. Add new function iommu_attach_domain_with_id.
>> 2. Do not copy old page tables, keep using the old ones.
>> 3. Remove functions:
>> intel_iommu_did_to_domain_values_entry
>> intel_iommu_get_dids_from_old_kernel
>> device_to_domain_id
>> copy_page_addr
>> copy_page_table
>> copy_context_entry
>> copy_context_entry_table
>> 4. Add new function device_to_existing_context_entry.
>>
>> Changelog[v8]:
>> 1. Add a missing __iommu_flush_cache in function copy_page_table.
>>
>> Changelog[v7]:
>> 1. Use __iommu_flush_cache to flush the data to hardware.
>>
>> Changelog[v6]:
>> 1. Use "unsigned long" as type of physical address.
>> 2. Use new function unmap_device_dma to unmap the old dma.
>> 3. Some small incorrect bits order for aw shift.
>>
>> Changelog[v5]:
>> 1. Do not disable and re-enable traslation and interrupt remapping.
>> 2. Use old root entry table.
>> 3. Use old interrupt remapping table.
>> 4. New functions to copy data from old kernel, and save to old
>> kernel mem.
>> 5. New functions to save updated root entry table and irte table.
>> 6. Use intel_unmap to unmap the old dma;
>> 7. Allocate new pages while driver is being loaded.
>>
>> Changelog[v4]:
>> 1. Cut off the patches that move some defines and functions to
>> new files.
>> 2. Reduce the numbers of patches to five, make it more easier to
>> read.
>> 3. Changed the name of functions, make them consistent with
>> current context
>> get/set functions.
>> 4. Add change to function __iommu_attach_domain.
>>
>> Changelog[v3]:
>> 1. Commented-out "#define DEBUG 1" to eliminate debug messages.
>> 2. Updated the comments about changes in each version.
>> 3. Fixed: one-line added to Copy-Translations patch to initialize
>> the iovad
>> struct as recommended by Baoquan He [[email protected]]
>> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>>
>> Changelog[v2]:
>> The following series implements a fix for:
>> A kdump problem about DMA that has been discussed for a long
>> time. That is,
>> when a kernel panics and boots into the kdump kernel, DMA started
>> by the
>> panicked kernel is not stopped before the kdump kernel is booted
>> and the
>> kdump kernel disables the IOMMU while this DMA continues. This
>> causes the
>> IOMMU to stop translating the DMA addresses as IOVAs and begin to
>> treat
>> them as physical memory addresses -- which causes the DMA to either:
>> (1) generate DMAR errors or
>> (2) generate PCI SERR errors or
>> (3) transfer data to or from incorrect areas of memory. Often
>> this
>> causes the dump to fail.
>>
>> Changelog[v1]:
>> The original version.
>>
>> Changed in this version:
>> 1. Do not disable and re-enable traslation and interrupt remapping.
>> 2. Use old root entry table.
>> 3. Use old interrupt remapping table.
>> 4. Use "unsigned long" as physical address.
>> 5. Use intel_unmap to unmap the old dma;
>>
>> Baoquan He <[email protected]> helps testing this patchset.
>> Takao Indoh <[email protected]> gives valuable suggestions.
>>
>> Li, Zhen-Hua (10):
>> iommu/vt-d: New function to attach domain with id
>> iommu/vt-d: Items required for kdump
>> iommu/vt-d: Function to get existing context entry
>> iommu/vt-d: functions to copy data from old mem
>> iommu/vt-d: Add functions to load and save old re
>> iommu/vt-d: datatypes and functions used for kdump
>> iommu/vt-d: enable kdump support in iommu module
>> iommu/vt-d: assign new page table for dma_map
>> iommu/vt-d: Copy functions for irte
>> iommu/vt-d: Use old irte in kdump kernel
>>
>> drivers/iommu/intel-iommu.c | 536
>> ++++++++++++++++++++++++++++++++++--
>> drivers/iommu/intel_irq_remapping.c | 96 ++++++-
>> include/linux/intel-iommu.h | 16 ++
>> 3 files changed, 623 insertions(+), 25 deletions(-)
>>
>

2015-05-19 01:13:40

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Hi,

On 05/18/15 at 06:05pm, Li, ZhenHua wrote:
> Hi Joerg,
>
> Testing from HP: passed.
> Testing from He Baoquan(Redhat): passed
>
> The problem that dmar fault came again when running v10 with latest
> kernel is fixed.
> And I think there is no need to update the code to a new version now.
>
> So please tell me if there are still any code need to be changed.

Have you reviewed all my comments? Although it is only iommu driver fixes,
The patches are sent to lkml, no? If you do not want other people's comments
why will you send to us? I would say do not assume it is a closed circle for
only iommu driver.

It is not easy to review patches actually, frankly I feel kernel code review
is not so strict like the early days, maybe one reason is people are getting
tired to argue.

Thanks
Dave

2015-05-19 07:44:52

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Hi Dave,

Don't worry :).
Of cause I have read your comments, but most of them contains no actual
code change request, so I did not reply them one by one.
When we are sure there is no actual code change needed, I will update
the comments and other format problems if necessary.

Regards
Zhenhua

On 05/19/2015 09:13 AM, Dave Young wrote:
> Hi,
>
> On 05/18/15 at 06:05pm, Li, ZhenHua wrote:
>> Hi Joerg,
>>
>> Testing from HP: passed.
>> Testing from He Baoquan(Redhat): passed
>>
>> The problem that dmar fault came again when running v10 with latest
>> kernel is fixed.
>> And I think there is no need to update the code to a new version now.
>>
>> So please tell me if there are still any code need to be changed.
>
> Have you reviewed all my comments? Although it is only iommu driver fixes,
> The patches are sent to lkml, no? If you do not want other people's comments
> why will you send to us? I would say do not assume it is a closed circle for
> only iommu driver.
>
> It is not easy to review patches actually, frankly I feel kernel code review
> is not so strict like the early days, maybe one reason is people are getting
> tired to argue.
>
> Thanks
> Dave
>

2015-05-20 23:53:29

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 08/10] iommu/vt-d: assign new page table for dma_map

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> When a device driver issues the first dma_map command for a device, we
> assign a new and empty page-table, thus removing all mappings from the
> old kernel for the device.

Hi Zhenhua,

>From your patch I got it will remove all mappings, assign a new
page-table. But I didn't got why you stress an empty page-table. Did I
miss anything?

Thanks
Baoquan

>
> Signed-off-by: Li, Zhen-Hua <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 58 ++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 50 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 91545bf..3cc1027 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -396,6 +396,9 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
>
> static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
>
> +static void unmap_device_dma(struct dmar_domain *domain,
> + struct device *dev,
> + struct intel_iommu *iommu);
> static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> static u8 g_translation_pre_enabled;
>
> @@ -3115,6 +3118,7 @@ static struct iova *intel_alloc_iova(struct device *dev,
> static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
> {
> struct dmar_domain *domain;
> + struct intel_iommu *iommu;
> int ret;
>
> domain = get_domain_for_dev(dev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
> @@ -3124,14 +3128,30 @@ static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
> return NULL;
> }
>
> - /* make sure context mapping is ok */
> - if (unlikely(!domain_context_mapped(dev))) {
> - ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
> - if (ret) {
> - printk(KERN_ERR "Domain context map for %s failed",
> - dev_name(dev));
> - return NULL;
> - }
> + /* if in kdump kernel, we need to unmap the mapped dma pages,
> + * detach this device first.
> + */
> + if (likely(domain_context_mapped(dev))) {
> + iommu = domain_get_iommu(domain);
> + if (iommu->pre_enabled_trans) {
> + unmap_device_dma(domain, dev, iommu);
> +
> + domain = get_domain_for_dev(dev,
> + DEFAULT_DOMAIN_ADDRESS_WIDTH);
> + if (!domain) {
> + pr_err("Allocating domain for %s failed",
> + dev_name(dev));
> + return NULL;
> + }
> + } else
> + return domain;
> + }
> +
> + ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
> + if (ret) {
> + pr_err("Domain context map for %s failed",
> + dev_name(dev));
> + return NULL;
> }
>
> return domain;
> @@ -5168,6 +5188,28 @@ static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
> return ret;
> }
>
> +static void unmap_device_dma(struct dmar_domain *domain,
> + struct device *dev,
> + struct intel_iommu *iommu)
> +{
> + struct context_entry *ce;
> + struct iova *iova;
> + phys_addr_t phys_addr;
> + dma_addr_t dev_addr;
> + struct pci_dev *pdev;
> +
> + pdev = to_pci_dev(dev);
> + ce = iommu_context_addr(iommu, pdev->bus->number, pdev->devfn, 1);
> + phys_addr = context_address_root(ce) << VTD_PAGE_SHIFT;
> + dev_addr = phys_to_dma(dev, phys_addr);
> +
> + iova = find_iova(&domain->iovad, IOVA_PFN(dev_addr));
> + if (iova)
> + intel_unmap(dev, dev_addr);
> +
> + domain_remove_one_dev_info(domain, dev);
> +}
> +
> static void iommu_check_pre_te_status(struct intel_iommu *iommu)
> {
> u32 sts;
> --
> 2.0.0-rc0
>

2015-05-21 01:28:24

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 08/10] iommu/vt-d: assign new page table for dma_map

Hi Baoquan,

In the early version of this patchset, old page tables are used by new
kernel. But as discussed, we need to make kernel use new pages when
there is a new dma request , so we need to unmap the pages which were
mapped in old kernel, and this is what this patch does.

Thanks
Zhenhua

On 05/21/2015 07:52 AM, Baoquan He wrote:
> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
>> When a device driver issues the first dma_map command for a device, we
>> assign a new and empty page-table, thus removing all mappings from the
>> old kernel for the device.
>
> Hi Zhenhua,
>
> From your patch I got it will remove all mappings, assign a new
> page-table. But I didn't got why you stress an empty page-table. Did I
> miss anything?
>
> Thanks
> Baoquan
>
>>
>> Signed-off-by: Li, Zhen-Hua <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 58 ++++++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 50 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 91545bf..3cc1027 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -396,6 +396,9 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
>>
>> static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
>>
>> +static void unmap_device_dma(struct dmar_domain *domain,
>> + struct device *dev,
>> + struct intel_iommu *iommu);
>> static void iommu_check_pre_te_status(struct intel_iommu *iommu);
>> static u8 g_translation_pre_enabled;
>>
>> @@ -3115,6 +3118,7 @@ static struct iova *intel_alloc_iova(struct device *dev,
>> static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
>> {
>> struct dmar_domain *domain;
>> + struct intel_iommu *iommu;
>> int ret;
>>
>> domain = get_domain_for_dev(dev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
>> @@ -3124,14 +3128,30 @@ static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
>> return NULL;
>> }
>>
>> - /* make sure context mapping is ok */
>> - if (unlikely(!domain_context_mapped(dev))) {
>> - ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
>> - if (ret) {
>> - printk(KERN_ERR "Domain context map for %s failed",
>> - dev_name(dev));
>> - return NULL;
>> - }
>> + /* if in kdump kernel, we need to unmap the mapped dma pages,
>> + * detach this device first.
>> + */
>> + if (likely(domain_context_mapped(dev))) {
>> + iommu = domain_get_iommu(domain);
>> + if (iommu->pre_enabled_trans) {
>> + unmap_device_dma(domain, dev, iommu);
>> +
>> + domain = get_domain_for_dev(dev,
>> + DEFAULT_DOMAIN_ADDRESS_WIDTH);
>> + if (!domain) {
>> + pr_err("Allocating domain for %s failed",
>> + dev_name(dev));
>> + return NULL;
>> + }
>> + } else
>> + return domain;
>> + }
>> +
>> + ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
>> + if (ret) {
>> + pr_err("Domain context map for %s failed",
>> + dev_name(dev));
>> + return NULL;
>> }
>>
>> return domain;
>> @@ -5168,6 +5188,28 @@ static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
>> return ret;
>> }
>>
>> +static void unmap_device_dma(struct dmar_domain *domain,
>> + struct device *dev,
>> + struct intel_iommu *iommu)
>> +{
>> + struct context_entry *ce;
>> + struct iova *iova;
>> + phys_addr_t phys_addr;
>> + dma_addr_t dev_addr;
>> + struct pci_dev *pdev;
>> +
>> + pdev = to_pci_dev(dev);
>> + ce = iommu_context_addr(iommu, pdev->bus->number, pdev->devfn, 1);
>> + phys_addr = context_address_root(ce) << VTD_PAGE_SHIFT;
>> + dev_addr = phys_to_dma(dev, phys_addr);
>> +
>> + iova = find_iova(&domain->iovad, IOVA_PFN(dev_addr));
>> + if (iova)
>> + intel_unmap(dev, dev_addr);
>> +
>> + domain_remove_one_dev_info(domain, dev);
>> +}
>> +
>> static void iommu_check_pre_te_status(struct intel_iommu *iommu)
>> {
>> u32 sts;
>> --
>> 2.0.0-rc0
>>

2015-05-21 06:55:39

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 08/10] iommu/vt-d: assign new page table for dma_map

On 05/21/15 at 09:27am, Li, ZhenHua wrote:
> Hi Baoquan,
>
> In the early version of this patchset, old page tables are used by new
> kernel. But as discussed, we need to make kernel use new pages when
> there is a new dma request , so we need to unmap the pages which were
> mapped in old kernel, and this is what this patch does.

OK, just a new page table allocated in init_domain(), right? I thought a
specific empty page-table is allocated for these new domains in kdump
kernel.

>
> Thanks
> Zhenhua
>
> On 05/21/2015 07:52 AM, Baoquan He wrote:
> >On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> >>When a device driver issues the first dma_map command for a device, we
> >>assign a new and empty page-table, thus removing all mappings from the
> >>old kernel for the device.
> >
> >Hi Zhenhua,
> >
> > From your patch I got it will remove all mappings, assign a new
> >page-table. But I didn't got why you stress an empty page-table. Did I
> >miss anything?
> >
> >Thanks
> >Baoquan
> >
> >>
> >>Signed-off-by: Li, Zhen-Hua <[email protected]>
> >>---
> >> drivers/iommu/intel-iommu.c | 58 ++++++++++++++++++++++++++++++++++++++-------
> >> 1 file changed, 50 insertions(+), 8 deletions(-)
> >>
> >>diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> >>index 91545bf..3cc1027 100644
> >>--- a/drivers/iommu/intel-iommu.c
> >>+++ b/drivers/iommu/intel-iommu.c
> >>@@ -396,6 +396,9 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
> >>
> >> static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
> >>
> >>+static void unmap_device_dma(struct dmar_domain *domain,
> >>+ struct device *dev,
> >>+ struct intel_iommu *iommu);
> >> static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> >> static u8 g_translation_pre_enabled;
> >>
> >>@@ -3115,6 +3118,7 @@ static struct iova *intel_alloc_iova(struct device *dev,
> >> static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
> >> {
> >> struct dmar_domain *domain;
> >>+ struct intel_iommu *iommu;
> >> int ret;
> >>
> >> domain = get_domain_for_dev(dev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
> >>@@ -3124,14 +3128,30 @@ static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
> >> return NULL;
> >> }
> >>
> >>- /* make sure context mapping is ok */
> >>- if (unlikely(!domain_context_mapped(dev))) {
> >>- ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
> >>- if (ret) {
> >>- printk(KERN_ERR "Domain context map for %s failed",
> >>- dev_name(dev));
> >>- return NULL;
> >>- }
> >>+ /* if in kdump kernel, we need to unmap the mapped dma pages,
> >>+ * detach this device first.
> >>+ */
> >>+ if (likely(domain_context_mapped(dev))) {
> >>+ iommu = domain_get_iommu(domain);
> >>+ if (iommu->pre_enabled_trans) {
> >>+ unmap_device_dma(domain, dev, iommu);
> >>+
> >>+ domain = get_domain_for_dev(dev,
> >>+ DEFAULT_DOMAIN_ADDRESS_WIDTH);
> >>+ if (!domain) {
> >>+ pr_err("Allocating domain for %s failed",
> >>+ dev_name(dev));
> >>+ return NULL;
> >>+ }
> >>+ } else
> >>+ return domain;
> >>+ }
> >>+
> >>+ ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
> >>+ if (ret) {
> >>+ pr_err("Domain context map for %s failed",
> >>+ dev_name(dev));
> >>+ return NULL;
> >> }
> >>
> >> return domain;
> >>@@ -5168,6 +5188,28 @@ static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
> >> return ret;
> >> }
> >>
> >>+static void unmap_device_dma(struct dmar_domain *domain,
> >>+ struct device *dev,
> >>+ struct intel_iommu *iommu)
> >>+{
> >>+ struct context_entry *ce;
> >>+ struct iova *iova;
> >>+ phys_addr_t phys_addr;
> >>+ dma_addr_t dev_addr;
> >>+ struct pci_dev *pdev;
> >>+
> >>+ pdev = to_pci_dev(dev);
> >>+ ce = iommu_context_addr(iommu, pdev->bus->number, pdev->devfn, 1);
> >>+ phys_addr = context_address_root(ce) << VTD_PAGE_SHIFT;
> >>+ dev_addr = phys_to_dma(dev, phys_addr);
> >>+
> >>+ iova = find_iova(&domain->iovad, IOVA_PFN(dev_addr));
> >>+ if (iova)
> >>+ intel_unmap(dev, dev_addr);
> >>+
> >>+ domain_remove_one_dev_info(domain, dev);
> >>+}
> >>+
> >> static void iommu_check_pre_te_status(struct intel_iommu *iommu)
> >> {
> >> u32 sts;
> >>--
> >>2.0.0-rc0
> >>
>

2015-05-21 08:41:26

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 08/10] iommu/vt-d: assign new page table for dma_map

Hi Baoquan,
During driver being loaded and initialized, when there is a new dma
request, the function
__get_valid_domain_for_dev
is called, and then new page is mapped.

Please check this:
struct dma_map_ops intel_dma_ops = {
.alloc = intel_alloc_coherent,
.free = intel_free_coherent,
.map_sg = intel_map_sg,
.unmap_sg = intel_unmap_sg,
.map_page = intel_map_page,
.unmap_page = intel_unmap_page,
.mapping_error = intel_mapping_error,
};

You can also add dump_stack() in __get_valid_domain_for_dev to debug.

Thanks
Zhenhua

On 05/21/2015 02:54 PM, Baoquan He wrote:
> On 05/21/15 at 09:27am, Li, ZhenHua wrote:
>> Hi Baoquan,
>>
>> In the early version of this patchset, old page tables are used by new
>> kernel. But as discussed, we need to make kernel use new pages when
>> there is a new dma request , so we need to unmap the pages which were
>> mapped in old kernel, and this is what this patch does.
>
> OK, just a new page table allocated in init_domain(), right? I thought a
> specific empty page-table is allocated for these new domains in kdump
> kernel.
>
>>
>> Thanks
>> Zhenhua
>>
>> On 05/21/2015 07:52 AM, Baoquan He wrote:
>>> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
>>>> When a device driver issues the first dma_map command for a device, we
>>>> assign a new and empty page-table, thus removing all mappings from the
>>>> old kernel for the device.
>>>
>>> Hi Zhenhua,
>>>
>>> From your patch I got it will remove all mappings, assign a new
>>> page-table. But I didn't got why you stress an empty page-table. Did I
>>> miss anything?
>>>
>>> Thanks
>>> Baoquan
>>>
>>>>
>>>> Signed-off-by: Li, Zhen-Hua <[email protected]>
>>>> --

2015-05-21 10:11:17

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v11 08/10] iommu/vt-d: assign new page table for dma_map

On 05/21/15 at 04:40pm, Li, ZhenHua wrote:
> Hi Baoquan,
> During driver being loaded and initialized, when there is a new dma
> request, the function
> __get_valid_domain_for_dev
> is called, and then new page is mapped.
>
> Please check this:
> struct dma_map_ops intel_dma_ops = {
> .alloc = intel_alloc_coherent,
> .free = intel_free_coherent,
> .map_sg = intel_map_sg,
> .unmap_sg = intel_unmap_sg,
> .map_page = intel_map_page,
> .unmap_page = intel_unmap_page,
> .mapping_error = intel_mapping_error,
> };
>
> You can also add dump_stack() in __get_valid_domain_for_dev to debug.

Yeah, I saw that. At the beginning I am just wondering why you say a new
page-table, and also mention it's a empty page-table. I think here that
new page-table is an empty page-table when you said them.

No confusion any more. Thanks for explanation.

>
> Thanks
> Zhenhua
>
> On 05/21/2015 02:54 PM, Baoquan He wrote:
> >On 05/21/15 at 09:27am, Li, ZhenHua wrote:
> >>Hi Baoquan,
> >>
> >>In the early version of this patchset, old page tables are used by new
> >>kernel. But as discussed, we need to make kernel use new pages when
> >>there is a new dma request , so we need to unmap the pages which were
> >>mapped in old kernel, and this is what this patch does.
> >
> >OK, just a new page table allocated in init_domain(), right? I thought a
> >specific empty page-table is allocated for these new domains in kdump
> >kernel.
> >
> >>
> >>Thanks
> >>Zhenhua
> >>
> >>On 05/21/2015 07:52 AM, Baoquan He wrote:
> >>>On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> >>>>When a device driver issues the first dma_map command for a device, we
> >>>>assign a new and empty page-table, thus removing all mappings from the
> >>>>old kernel for the device.
> >>>
> >>>Hi Zhenhua,
> >>>
> >>> From your patch I got it will remove all mappings, assign a new
> >>>page-table. But I didn't got why you stress an empty page-table. Did I
> >>>miss anything?
> >>>
> >>>Thanks
> >>>Baoquan
> >>>
> >>>>
> >>>>Signed-off-by: Li, Zhen-Hua <[email protected]>
> >>>>--

2015-06-08 14:15:45

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

On Mon, 2015-05-11 at 17:52 +0800, Li, Zhen-Hua wrote:
> Add some functions to copy the data from old kernel.
> These functions are used to copy context tables and page tables.
>
> To avoid calling iounmap between spin_lock_irqsave and
> spin_unlock_irqrestore,
> use a link here, store the pointers , and then use iounmap to free
> them in
> another place.
>
> Li, Zhen-hua:
> The functions and logics.

Surely this isn't specific to the Intel IOMMU? Shouldn't it live
elsewhere — either in generic IOMMU code or perhaps in generic kexec
support code?

Don't we need to solve the same kexec problem on *all* platforms with
an IOMMU, and won't they all need something like this?

And I think you're misusing VTD_PAGE_{SHIFT,MASK} when you should be
using the normal PAGE_{SHIFT,MASK}. And shouldn't physical addresses be
phys_addr_t?

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation


Attachments:
smime.p7s (5.56 kB)

2015-06-08 14:26:41

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Mon, 2015-05-11 at 17:52 +0800, Li, Zhen-Hua wrote:
> To fix this problem, we modifies the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry table, copy data from the old ones that
> used by the old kernel.

There are some interesting corner cases to handle here.

Firstly, you don't seem to handle the case of extended root/context
tables (where ecap_ecs is set). You need to preserve the DMA_RTADDR_RT
bit in the Root Table Address register, surely?

I think we also need to cope with inheriting page tables from a kernel
that *doesn't* use extended page tables, even on hardware that supports
it. Which means the use of extended page tables in the new kernel would
need to be contingent on something *other* than the pure hardware
capabilities.

> 5. Keep using the old page tables before driver is loaded.
> 6. After device driver is loaded, when it issues the first dma_map command,
> free the dmar_domain structure for this device, and generate a new one, so
> that the device can be assigned a new and empty page table.

What if there were RMRRs for this device? Don't we need to ensure that
those are present in the "new and empty page table" when we take it
over?

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation


Attachments:
smime.p7s (5.56 kB)

2015-06-08 15:21:59

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

Hi David,

On Mon, Jun 08, 2015 at 03:15:35PM +0100, David Woodhouse wrote:
> Surely this isn't specific to the Intel IOMMU? Shouldn't it live
> elsewhere — either in generic IOMMU code or perhaps in generic kexec
> support code?

I put a bigger rework of this on-top of Zhen-Hua's patches, you can find
the result in my x86/vt-d branch. With my patches I also removed this
pointer collecting concept and do the iomap_cache and iounmap calls
before the spin-lock is taken, so this problem is now solved
differently.

> And I think you're misusing VTD_PAGE_{SHIFT,MASK} when you should be
> using the normal PAGE_{SHIFT,MASK}.

I think VT_PAGE_* is correct, since the VT-d driver also runs on ia64.
There the system page-size is different from the VT-d page-size.

>And shouldn't physical addresses be phys_addr_t?

This is changed where appropriate, I hope.


Joerg

2015-06-08 15:30:08

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Mon, Jun 08, 2015 at 03:26:23PM +0100, David Woodhouse wrote:
> There are some interesting corner cases to handle here.
>
> Firstly, you don't seem to handle the case of extended root/context
> tables (where ecap_ecs is set). You need to preserve the DMA_RTADDR_RT
> bit in the Root Table Address register, surely?
>
> I think we also need to cope with inheriting page tables from a kernel
> that *doesn't* use extended page tables, even on hardware that supports
> it. Which means the use of extended page tables in the new kernel would
> need to be contingent on something *other* than the pure hardware
> capabilities.

Hmm, I also limited this functionality to kdump kernels. Do we still
need to preserve these extended data structures even when there is no
upstream support for them yet?

> > 5. Keep using the old page tables before driver is loaded.
> > 6. After device driver is loaded, when it issues the first dma_map command,
> > free the dmar_domain structure for this device, and generate a new one, so
> > that the device can be assigned a new and empty page table.
>
> What if there were RMRRs for this device? Don't we need to ensure that
> those are present in the "new and empty page table" when we take it
> over?

That is still a problem, but not specific to this patch-set. RMRRs will
not be restored, because domains allocated out of the DMA-API path will
not get any RMRR mappings. This is also a problem with device hotplug
(unplug a device with RMRRs and replug it in and the RMRR mappings will
be gone).

I agree that this needs to be fixed.


Joerg

2015-06-08 15:44:14

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

On Mon, 2015-06-08 at 17:21 +0200, Joerg Roedel wrote:
> Hi David,
>
> On Mon, Jun 08, 2015 at 03:15:35PM +0100, David Woodhouse wrote:
> > Surely this isn't specific to the Intel IOMMU? Shouldn't it live
> > elsewhere — either in generic IOMMU code or perhaps in generic kexec
> > support code?
>
> I put a bigger rework of this on-top of Zhen-Hua's patches, you can find
> the result in my x86/vt-d branch. With my patches I also removed this
> pointer collecting concept and do the iomap_cache and iounmap calls
> before the spin-lock is taken, so this problem is now solved
> differently.
>
> > And I think you're misusing VTD_PAGE_{SHIFT,MASK} when you should be
> > using the normal PAGE_{SHIFT,MASK}.
>
> I think VT_PAGE_* is correct, since the VT-d driver also runs on ia64.
> There the system page-size is different from the VT-d page-size.

That's the problem. In __iommu_load_from_oldmem we start with a
physical address in 'from', convert to a VT-d PFN in 'pfn':

+ pfn = from >> VTD_PAGE_SHIFT;

.. and then proceed to pass that pfn to non-VT-d functions like
page_is_ram() and pfn_to_kaddr() which really need their input pfn
values to be in terms of PAGE_SHIFT not VTD_PAGE_SHIFT.

But it looks like you've completely eliminated that now (including the
page_is_ram check). So although it *was* wrong, it doesn't matter now.

> > And shouldn't physical addresses be phys_addr_t?
>
> This is changed where appropriate, I hope.

OK. In fact once it's purely internal to intel-iommu.c it doesn't
matter as much since we don't put page tables in high memory on 32-bit
machines.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation


Attachments:
smime.p7s (5.56 kB)

2015-06-08 15:50:38

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Mon, 2015-06-08 at 17:29 +0200, Joerg Roedel wrote:
> On Mon, Jun 08, 2015 at 03:26:23PM +0100, David Woodhouse wrote:
> > There are some interesting corner cases to handle here.
> >
> > Firstly, you don't seem to handle the case of extended root/context
> > tables (where ecap_ecs is set). You need to preserve the DMA_RTADDR_RT
> > bit in the Root Table Address register, surely?
> >
> > I think we also need to cope with inheriting page tables from a kernel
> > that *doesn't* use extended page tables, even on hardware that supports
> > it. Which means the use of extended page tables in the new kernel would
> > need to be contingent on something *other* than the pure hardware
> > capabilities.
>
> Hmm, I also limited this functionality to kdump kernels. Do we still
> need to preserve these extended data structures even when there is no
> upstream support for them yet?

We *do* have upstream support. The 4.1 kernel will use the extended
root/context tables and will set the DMA_RTADDR_RTT bit in the Root
Table Address register, even though it doesn't yet actually *use* any
of the shiny new bits in the extended context tables.

So the code which copies the context tables needs to take that into
account.

> That is still a problem, but not specific to this patch-set. RMRRs will
> not be restored, because domains allocated out of the DMA-API path will
> not get any RMRR mappings. This is also a problem with device hotplug
> (unplug a device with RMRRs and replug it in and the RMRR mappings will
> be gone).
>
> I agree that this needs to be fixed.

Yeah. We need the same thing with hardware passthrough — currently I
think we refuse to put RMRR-afflicted devices into the passthrough
domain purely because we lack the capability to install the RMRR
regions if/when we later take it *out* of the hardware passthrough
domain. Although I can't quite remember the logic there; surely if it's
RMRR-afflicted and we have iommu=pt, it'll *never* be taken out of the
1:1 domain? A device driver might come along and tell us it really is
64-bit capable and thus we might put it *in* to the passthrough domain
where previously we'd kept it out... but taking it *out*... ?

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-06-08 16:13:26

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Mon, Jun 08, 2015 at 04:50:24PM +0100, David Woodhouse wrote:
> On Mon, 2015-06-08 at 17:29 +0200, Joerg Roedel wrote:
> > Hmm, I also limited this functionality to kdump kernels. Do we still
> > need to preserve these extended data structures even when there is no
> > upstream support for them yet?
>
> We *do* have upstream support. The 4.1 kernel will use the extended
> root/context tables and will set the DMA_RTADDR_RTT bit in the Root
> Table Address register, even though it doesn't yet actually *use* any
> of the shiny new bits in the extended context tables.
>
> So the code which copies the context tables needs to take that into
> account.

Right, I missed that until now. So what the code with my changes does
is, it sets the DMA_RTADDR_RTT bit as it would do on a normal boot. But
unlike the root entry table address, this bit can not be changed while
translation is active.

So I think we need to read out that bit when we find translation enabled
and if it is different from what we would set it to, we bail out of any
copying, disable translation and proceed as in a normal boot.

> Yeah. We need the same thing with hardware passthrough — currently I
> think we refuse to put RMRR-afflicted devices into the passthrough
> domain purely because we lack the capability to install the RMRR
> regions if/when we later take it *out* of the hardware passthrough
> domain. Although I can't quite remember the logic there; surely if it's
> RMRR-afflicted and we have iommu=pt, it'll *never* be taken out of the
> 1:1 domain? A device driver might come along and tell us it really is
> 64-bit capable and thus we might put it *in* to the passthrough domain
> where previously we'd kept it out... but taking it *out*... ?

Well, yeah, the logic is complicated. My hope is to move all that logic
to the iommu core code to have it unified between iommu drivers.

The way it should work then is basically: Every device (better:
iommu-group) has a default domain (which can be a PT domain) and if the
device has RMRR mappings, it can not be taken out of that domain. The
default-domain branch of my tree contains the beginnings of that, but
there is still a lot of work to do to get there.


Joerg

2015-06-09 12:56:07

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Mon, 2015-06-08 at 18:13 +0200, Joerg Roedel wrote:
> On Mon, Jun 08, 2015 at 04:50:24PM +0100, David Woodhouse wrote:
> > On Mon, 2015-06-08 at 17:29 +0200, Joerg Roedel wrote:
> > > Hmm, I also limited this functionality to kdump kernels. Do we still
> > > need to preserve these extended data structures even when there is no
> > > upstream support for them yet?
> >
> > We *do* have upstream support. The 4.1 kernel will use the extended
> > root/context tables and will set the DMA_RTADDR_RTT bit in the Root
> > Table Address register, even though it doesn't yet actually *use* any
> > of the shiny new bits in the extended context tables.
> >
> > So the code which copies the context tables needs to take that into
> > account.
>
> Right, I missed that until now. So what the code with my changes does
> is, it sets the DMA_RTADDR_RTT bit as it would do on a normal boot. But
> unlike the root entry table address, this bit can not be changed while
> translation is active.
>
> So I think we need to read out that bit when we find translation enabled
> and if it is different from what we would set it to, we bail out of any
> copying, disable translation and proceed as in a normal boot.

Given that this is only for kdump and not the general case of kexec,
that's probably tolerable. Of course we do still need to make it *not*
broken for the case where DMA_RTADDR_RTT is set, as it is at the
moment.

And I suspect if we're doing that, it might be simple enough to make it
convert to/from the extended page tables. I don't think we want to
preserve PASID tables; only the "second level" (i.e. traditional)
translation. So we could happily pull the page table pointer out of
either kind of context entry, and install it into either kind. I think
there's a simple mapping of translation types too. I need to sort out
the translation types when adding the real PASID support (imminently!)
anyway.

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-06-10 09:22:02

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Tue, Jun 09, 2015 at 01:55:50PM +0100, David Woodhouse wrote:
> On Mon, 2015-06-08 at 18:13 +0200, Joerg Roedel wrote:
> > So I think we need to read out that bit when we find translation enabled
> > and if it is different from what we would set it to, we bail out of any
> > copying, disable translation and proceed as in a normal boot.
>
> Given that this is only for kdump and not the general case of kexec,
> that's probably tolerable. Of course we do still need to make it *not*
> broken for the case where DMA_RTADDR_RTT is set, as it is at the
> moment.

Yes, I just sent a patch for this and will include it into my x86/vt-d
branch if not objections come in.

> And I suspect if we're doing that, it might be simple enough to make it
> convert to/from the extended page tables. I don't think we want to
> preserve PASID tables; only the "second level" (i.e. traditional)
> translation. So we could happily pull the page table pointer out of
> either kind of context entry, and install it into either kind. I think
> there's a simple mapping of translation types too. I need to sort out
> the translation types when adding the real PASID support (imminently!)
> anyway.

What happens when we take away the PASID tables from a device? Can it
also go into some failure state?
When doing this, we need at least setup the page request queue before we
copy over anything and change the root-entry. Then we can handle any
faults that are caused by this and tell the device to not try further.


Joerg

2015-06-10 09:34:19

by Li, ZhenHua

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On 06/10/2015 05:21 PM, Joerg Roedel wrote:
> On Tue, Jun 09, 2015 at 01:55:50PM +0100, David Woodhouse wrote:
>> On Mon, 2015-06-08 at 18:13 +0200, Joerg Roedel wrote:
>>> So I think we need to read out that bit when we find translation enabled
>>> and if it is different from what we would set it to, we bail out of any
>>> copying, disable translation and proceed as in a normal boot.
>>
>> Given that this is only for kdump and not the general case of kexec,
>> that's probably tolerable. Of course we do still need to make it *not*
>> broken for the case where DMA_RTADDR_RTT is set, as it is at the
>> moment.
>
> Yes, I just sent a patch for this and will include it into my x86/vt-d
> branch if not objections come in.
>
>> And I suspect if we're doing that, it might be simple enough to make it
>> convert to/from the extended page tables. I don't think we want to
>> preserve PASID tables; only the "second level" (i.e. traditional)
>> translation. So we could happily pull the page table pointer out of
>> either kind of context entry, and install it into either kind. I think
>> there's a simple mapping of translation types too. I need to sort out
>> the translation types when adding the real PASID support (imminently!)
>> anyway.
>
> What happens when we take away the PASID tables from a device? Can it
> also go into some failure state?
> When doing this, we need at least setup the page request queue before we
> copy over anything and change the root-entry. Then we can handle any
> faults that are caused by this and tell the device to not try further.
>
>
> Joerg
>
Is PASID part of new specs? Is there any plan to upgrade the driver to
support the latest vt-d specs?

Zhenhua

2015-06-10 14:11:38

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Wed, 2015-06-10 at 17:32 +0800, Li, ZhenHua wrote:
>
> Is PASID part of new specs? Is there any plan to upgrade the driver to
> support the latest vt-d specs?

Yes, and yes. I'm currently working on the latter — and the extended
page table support in 4.1 is the precursor to that work.

--
dwmw2


Attachments:
smime.p7s (5.56 kB)