2014-01-10 22:07:47

by Sumner, William

Subject: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

v2->v3:
1. Commented-out "#define DEBUG 1" to eliminate debug messages
2. Updated the comments about changes in each version in all patches in the set.
3. Fixed: one line added to the "Copy-Translations" patch to initialize the iovad
struct, as recommended by Baoquan He [[email protected]]:
init_iova_domain(&domain->iovad, DMA_32BIT_PFN);

v1->v2:
The following series implements a fix for a long-standing kdump problem
with DMA: when a kernel panics and boots into the kdump kernel, DMA started
by the panicked kernel is not stopped before the kdump kernel is booted, and
the kdump kernel disables the IOMMU while this DMA continues. The IOMMU then
stops translating the DMA addresses as IOVAs and begins to treat them as
physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors, (2) generate PCI SERR errors, or (3) transfer
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
1. to accept the iommu hardware in an active state,
2. to leave the current translations in-place so that legacy DMA will continue
using its current buffers until the device drivers in the crashdump kernel
load and initialize their devices,
3. to use different portions of the iova address ranges for the device drivers
in the crashdump kernel than the iova ranges that were in-use at the time
of the panic.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.

Changes since the RFC version of this patch:
1. Consolidated all of the operational code into the "copy..." functions.
The "process..." functions were primarily used for diagnostics and
exploration; however, there was a small amount of operational code that
used the "process..." functions.
This operational code has been moved into the "copy..." functions.

2. Removed the "Process ..." functions and the diagnostic code that ran
on that function set. This removed about 1/4 of the code -- which this
operational patch set no longer needs. These portions of the RFC patch
could be formatted as a separate patch and submitted independently
at a later date.

3. Re-formatted the code to the Linux kernel coding style.
The checkpatch script still finds some lines to complain about;
however most of these lines are either (1) lines that I did not change,
or (2) lines that only changed by adding a level of indent which pushed
them over 80-characters, or (3) new lines whose intent is far clearer when
longer than 80-characters.

4. Updated the remaining debug print to be significantly more flexible.
This allows control over the amount of debug output sent to the console,
which can vary widely.

5. Fixed a couple of minor bugs found by testing on a machine with a
very large IO configuration.

At a high level, this code operates primarily during iommu initialization
and device-driver initialization.

During intel-iommu hardware initialization:
In intel_iommu_init(void)
* If (This is the crash kernel)
. Set flag: crashdump_accepting_active_iommu (all changes below check this)
. Skip disabling the iommu hardware translations

In init_dmars()
* Duplicate the intel iommu translation tables from the old kernel
in the new kernel
. The root-entry table, all context-entry tables,
and all page-translation-entry tables
. The duplicate tables contain updated physical addresses to link them together.
. The duplicate tables are mapped into kernel virtual addresses
in the new kernel which allows most of the existing iommu code
to operate without change.
. Do some minimal sanity-checks during the copy
. Place the address of the new root-entry structure into "struct intel_iommu"

* Skip setting up new domains for 'si', 'rmrr', 'isa'
. Translations for 'rmrr' and 'isa' ranges have been copied from the old kernel
. This patch has not yet been tested with iommu pass-through enabled

* Existing (unchanged) code near the end of init_dmars() (sketched just after this outline):
. Loads the address of the (now new) root-entry structure from
"struct intel_iommu" into the iommu hardware and does the hardware flushes.
This changes the active translation tables from the ones in the old kernel
to the copies in the new kernel.
. This is legal because the translations in the two sets of tables are
currently identical:
Virtualization Technology for Directed I/O. Architecture Specification,
February 2011, Rev. 1.3 (section 11.2, paragraph 2)

In iommu_init_domains()
* Mark as in-use all domain-id's from the old kernel
. In case the new kernel contains a device that was not in the old kernel
and a new, unused domain-id is actually needed, the bitmap will give us one.

When a new domain is created for a device:
* If (this device has a context in the old kernel)
. Get domain-id, address-width, and IOVA ranges from the old kernel context;
. Get address(page-entry-tables) from the copy in the new kernel;
. And apply all of the above values to the new domain structure.
* Else
. Create a new domain as normal
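
A minimal sketch of that existing root-table switch, paraphrased from memory
of the mainline intel-iommu code of this era (it is not part of this patch
set, and details such as the locking and wait macros may differ):

	static void iommu_set_root_entry(struct intel_iommu *iommu)
	{
		u32 sts;
		unsigned long flag;

		raw_spin_lock_irqsave(&iommu->register_lock, flag);

		/* Point the hardware at the root-entry table (here, the new copy) */
		dmar_writeq(iommu->reg + DMAR_RTADDR_REG,
			    virt_to_phys(iommu->root_entry));

		/* Ask the IOMMU to latch the new root-table pointer */
		writel(iommu->gcmd | DMA_GCMD_SRTP, iommu->reg + DMAR_GCMD_REG);

		/* Wait until hardware reports the pointer switch complete */
		IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG,
			      readl, (sts & DMA_GSTS_RTPS), sts);

		raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
	}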


Bill Sumner (6):
Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
Crashdump-Accepting-Active-IOMMU-Utility-functions
Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
Crashdump-Accepting-Active-IOMMU-Copy-Translations
Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
Crashdump-Accepting-Active-IOMMU-Call-From-Mainline

drivers/iommu/intel-iommu.c | 1293 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 1225 insertions(+), 68 deletions(-)

--
Bill Sumner <[email protected]>


2014-01-10 22:08:02

by Sumner, William

Subject: [PATCHv3 4/6] Crashdump-Accepting-Active-IOMMU-Copy-Translations

This patch contains a set of functions that duplicate into the
crashdump kernel the IOMMU translation tables that were in-use by the
panicked kernel at the time of the panic. Note that all of these
tables (pages) are linked together by physical memory addresses --
beginning with the address of the root-entry-table which is obtained
from the IOMMU hardware. These functions are located in reverse order
of being called -- from the highest-level (copying the root entry table)
to the lowest-level (accumulating each IOVA address range) -- while the
copy operation works its way down the tree of tables from the root-entry
to the lowest-level page-table.

During the traversal up and down this set of functions the structure
'copy_page_addr_parms' is passed as an anonymous structure via a void*
since most of the functions do not need it. Those functions that do
need it cast the pointer appropriately.

Entry to this set of functions from the mainline code is only through
the interface function 'copy_intel_iommu_translation_tables' which
instantiates the 'copy_page_addr_parms' structure on its stack, initializes
the global lists of domain-id structures (once only) and begins the copy.

The function that copies the individual page tables is recursive (max 6
levels) to accommodate the various IOVA address widths. It stops either when
the page-address shift reaches 12 bits (4k page addresses) or earlier when a
page table contains superpage entries.
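
As a worked example (illustrative only, using the aw_shift[] values from the
patch below): for a 4-level, 48-bit AGAW table, the shift passed down the
recursion goes 39, 30, 21, 12, peeling off one 9-bit index level per call.
A standalone snippet that prints this walk:

	#include <stdio.h>

	int main(void)
	{
		unsigned int shift;

		/* 48-bit AGAW: the top-level call receives 48 - 9 = 39 */
		for (shift = 39; shift >= 12; shift -= 9)
			printf("level shift %2u: each entry spans 2^%u bytes\n",
			       shift, shift);
		return 0;
	}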

Note: The copying of the translation tables into the crashdump kernel
is done during system initialization when there is only one CPU active;
therefore no locks are necessary during the copy. After the copied
translation tables become active, the locks within the existing code
protect them.

v1->v2:
Updated patch description

v2->v3:
Added "init_iova_domain(&domain->iovad, DMA_32BIT_PFN);" as recommended
by Baoquan He [[email protected]]

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 530 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 530 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 85e30e5..6737550 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4795,3 +4795,533 @@ static int intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu)
return 0;
}
#endif /* CONFIG_CRASH_DUMP */
+#ifdef CONFIG_CRASH_DUMP
+
+
+
+/* ========================================================================
+ * Copy iommu translation tables from old kernel into new kernel
+ * Entry to this set of functions is: copy_intel_iommu_translation_tables()
+ * ------------------------------------------------------------------------
+ */
+
+/*
+ * Struct copy_page_addr_parms is used to allow copy_page_addr()
+ * to accumulate values across multiple calls and returns.
+ */
+struct copy_page_addr_parms {
+ u32 first; /* flag: first-time */
+ u32 last; /* flag: last-time */
+ u32 bus; /* last bus number we saw */
+ u32 devfn; /* last devfn we saw */
+ u32 shift; /* last shift we saw */
+ u64 pte; /* Page Table Entry */
+ u64 next_addr; /* next-expected page_addr */
+
+ u64 page_addr; /* page_addr accumulating size */
+ u64 page_size; /* page_size accumulated */
+
+ struct dmar_domain *domain; /* to accumulate iova ranges */
+};
+
+/*
+ * constant for initializing instances of copy_page_addr_parms properly.
+ */
+static struct copy_page_addr_parms copy_page_addr_parms_init = {1, 0};
+
+
+
+/*
+ * Lowest-level function in the 'Copy Page Tables' set
+ * Called once for each page_addr present in an iommu page-address table.
+ */
+static int copy_page_addr(u64 page_addr, u32 shift, u32 bus, u32 devfn,
+ u64 pte, struct dmar_domain *domain,
+ void *parms)
+{
+ struct copy_page_addr_parms *ppap = parms;
+
+ u64 page_size = ((u64)1 << shift); /* page_size */
+ u64 pfn_lo; /* For reserving IOVA range */
+ u64 pfn_hi; /* For reserving IOVA range */
+ struct iova *iova_p; /* For reserving IOVA range */
+
+ if (!ppap) {
+ pr_err("ERROR: ppap is NULL: 0x%3.3x(%3.3d) DevFn: 0x%3.3x(%3.3d) Page: 0x%16.16llx Size: 0x%16.16llx(%lld)\n",
+ bus, bus, devfn, devfn, page_addr,
+ page_size, page_size);
+ return 0;
+ }
+
+ if (!ppap->last) { /* If (Not last time) */
+ if (pr_dbg.copy_page_addr)
+ pr_debug("ADDR::B:D:F=%2.2x:%2.2x:%1.1x Addr:0x%12.12llx Size:0x%12.12llx(%lld) Pte:0x%16.16llx\n",
+ bus, devfn >> 3, devfn & 0x7,
+ page_addr, page_size, page_size, pte);
+
+ /* If (only extending current addr range) */
+ if (ppap->first == 0 &&
+ ppap->bus == bus &&
+ ppap->devfn == devfn &&
+ ppap->shift == shift &&
+ (ppap->pte & ~VTD_PAGE_MASK) == (pte & ~VTD_PAGE_MASK) &&
+ ppap->next_addr == page_addr) {
+
+ /* Update page size and next-expected address */
+ ppap->next_addr += page_size;
+ ppap->page_size += page_size;
+ return 0;
+ }
+ }
+
+ if (!ppap->first) {
+ /* Print out the accumulated address range */
+
+ if (pr_dbg.addr_ranges)
+ pr_debug("DATA B:D:F=%2.2x:%2.2x:%1.1x Addr:0x%12.12llx Size:0x%12.12llx(%lld) Pte:0x%16.16llx\n",
+ ppap->bus, ppap->devfn >> 3, ppap->devfn & 0x7,
+ ppap->page_addr,
+ ppap->page_size, ppap->page_size, ppap->pte);
+
+ if (!ppap->domain) {
+ pr_err("%s ERROR: Domain is NULL -- needed to reserve range for B:D:F=%2.2x:%2.2x:%1.1x\n",
+ __func__,
+ ppap->bus, ppap->devfn >> 3, ppap->devfn & 0x7);
+ return 0;
+ }
+ pfn_lo = IOVA_PFN(ppap->page_addr);
+ pfn_hi = IOVA_PFN(ppap->page_addr + ppap->page_size);
+ iova_p = reserve_iova(&ppap->domain->iovad, pfn_lo, pfn_hi);
+
+ if (iova_p)
+ if (pr_dbg.reserved_ranges)
+ pr_debug("RSVD B:D:F=%2.2x:%2.2x:%1.1x (0x%16.16lx, 0x%16.16lx) did=0x%4.4x\n",
+ ppap->bus,
+ ppap->devfn >> 3, ppap->devfn & 0x7,
+ iova_p->pfn_lo, iova_p->pfn_hi,
+ ppap->domain->id);
+ }
+
+ /* Prepare for a new page */
+ ppap->first = 0; /* Not first-time anymore */
+ ppap->bus = bus;
+ ppap->devfn = devfn;
+ ppap->shift = shift;
+ ppap->pte = pte;
+ ppap->next_addr = page_addr + page_size; /* Next-expected page_addr */
+
+ ppap->page_addr = page_addr; /* Addr(new page) */
+ ppap->page_size = page_size; /* Size(new page) */
+
+ ppap->domain = domain; /* adr(domain for the new range) */
+
+ return 0;
+}
+
+/*
+ * Recursive function to copy the tree of page tables (max 6 recursions)
+ * Parameter 'shift' controls the recursion
+ */
+static int copy_page_table(struct dma_pte **dma_pte_new_p,
+ struct dma_pte *dma_pte_phys,
+ u32 shift, u64 page_addr,
+ struct intel_iommu *iommu,
+ u32 bus, u32 devfn,
+ struct dmar_domain *domain, void *ppap)
+{
+ int ret; /* Integer return code */
+ struct dma_pte *p; /* Physical adr(each entry) iterator */
+ struct dma_pte *pgt_new_virt; /* Adr(dma_pte in new kernel) */
+ struct dma_pte *dma_pte_next; /* Adr(next table down) */
+ u64 u; /* index(each entry in page_table) */
+
+ if (pr_dbg.copy_page_table)
+ pr_debug("ENTER %s B:D:F:%2.2x:%2.2x:%1.1x phys:%16.16llx shift:%d addr:%16.16llx\n",
+ __func__, bus, devfn >> 3, devfn & 0x7,
+ (u64)dma_pte_phys, shift, page_addr);
+
+ /* If (already done all levels -- problem) */
+ if (shift < 12) {
+ pr_err("ERROR %s shift < 12 %p\n", __func__, dma_pte_phys);
+ pr_err("shift %d, page_addr %16.16llu bus %3.3u devfn %3.3u\n",
+ shift, page_addr, bus, devfn);
+ return 2;
+ }
+
+ /* allocate a page table in the new kernel
+ * copy contents from old kernel
+ * then update each entry in the table in the new kernel
+ */
+
+ pgt_new_virt = (struct dma_pte *)alloc_pgtable_page(iommu->node);
+ if (!pgt_new_virt)
+ return -ENOMEM;
+
+ ret = oldcopy(pgt_new_virt, dma_pte_phys, VTD_PAGE_SIZE);
+ if (ret <= 0)
+ return ret;
+
+ for (u = 0, p = pgt_new_virt; u < 512; u++, p++) {
+
+ if (((p->val & DMA_PTE_READ) == 0) &&
+ ((p->val & DMA_PTE_WRITE) == 0))
+ continue;
+
+ if (dma_pte_superpage(p) || (shift == 12)) {
+
+ ret = copy_page_addr(page_addr | (u << shift),
+ shift, bus, devfn, p->val, domain, ppap);
+ if (ret)
+ return ret;
+ continue;
+ }
+
+ ret = copy_page_table(&dma_pte_next,
+ (struct dma_pte *)(p->val & VTD_PAGE_MASK),
+ shift-9, page_addr | (u << shift),
+ iommu, bus, devfn, domain, ppap);
+ if (ret)
+ return ret;
+
+ p->val &= ~VTD_PAGE_MASK; /* Clear old and set new pgd */
+ p->val |= ((u64)dma_pte_next & VTD_PAGE_MASK);
+ }
+
+ *dma_pte_new_p = (struct dma_pte *)virt_to_phys(pgt_new_virt);
+ __iommu_flush_cache(iommu, pgt_new_virt, VTD_PAGE_SIZE);
+
+ if (pr_dbg.copy_page_table)
+ pr_debug("LEAVE %s new page:%16.16llx(phys) %16.16llx(virt)\n",
+ __func__, (u64)(*dma_pte_new_p), (u64)pgt_new_virt);
+
+ return 0;
+}
+
+
+
+static int copy_context_entry(struct intel_iommu *iommu, u32 bus, u32 devfn,
+ void *ppap, struct context_entry *ce)
+{
+ int ret = 0; /* Integer Return Code */
+ u32 shift = 0; /* bits to shift page_addr */
+ u64 page_addr = 0; /* Address of translated page */
+ struct dma_pte *pgt_old_phys; /* Adr(page_table in the old kernel) */
+ struct dma_pte *pgt_new_phys; /* Adr(page_table in the new kernel) */
+ unsigned long asr; /* New asr value for new context */
+ u8 t; /* Translation-type from context */
+ u8 aw; /* Address-width from context */
+ u32 aw_shift[8] = {
+ 12+9+9, /* [000b] 30-bit AGAW (2-level page table) */
+ 12+9+9+9, /* [001b] 39-bit AGAW (3-level page table) */
+ 12+9+9+9+9, /* [010b] 48-bit AGAW (4-level page table) */
+ 12+9+9+9+9+9, /* [011b] 57-bit AGAW (5-level page table) */
+ 12+9+9+9+9+9+9, /* [100b] 64-bit AGAW (6-level page table) */
+ 0, /* [101b] Reserved */
+ 0, /* [110b] Reserved */
+ 0, /* [111b] Reserved */
+ };
+
+ struct dmar_domain *domain = NULL; /* To hold domain & device */
+ /* values from old kernel */
+ struct device_domain_info *info = NULL; /* adr(new for this device) */
+ struct device_domain_info *i = NULL; /* iterator for foreach */
+
+
+ pr_debug("CTXT B:D:F:%2.2x:%2.2x:%1.1x at virt: 0x%16.16llx hi:%16.16llx lo:%16.16llx\n",
+ bus, devfn >> 3, devfn & 0x7,
+ (u64) ce, ce->hi, ce->lo);
+
+ if (!context_get_p(ce)) { /* If (context not present) */
+ ret = 0; /* Skip it */
+ goto exit;
+ }
+
+ pr_debug("CTXT B:D:F:%2.2x:%2.2x:%1.1x p=%d fpd=%d t=%d asr=%16.16llx aw=%d aval=%d did=0x%4.4x\n",
+ bus, devfn >> 3, devfn & 0x7,
+ (int) context_get_p(ce),
+ (int) context_get_fpdi(ce),
+ (int) context_get_t(ce),
+ (u64) context_get_asr(ce),
+ (int) context_get_aw(ce),
+ (int) context_get_aval(ce),
+ (u32) context_get_did(ce));
+
+ info = alloc_devinfo_mem();
+ if (!info) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+ /* info->segment = segment; May need this later */
+ info->bus = bus;
+ info->devfn = devfn;
+ info->iommu = iommu;
+
+ list_for_each_entry(i, &device_domain_values_list[iommu->seq_id],
+ global) {
+ if (i->domain->id == (int) context_get_did(ce)) {
+ domain = i->domain;
+ pr_debug("CTXT B:D:F:%2.2x:%2.2x:%1.1x Found did=0x%4.4x\n",
+ bus, devfn >> 3, devfn & 0x7, i->domain->id);
+ break;
+ }
+ }
+
+ if (!domain) {
+ domain = alloc_domain();
+ if (!domain) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+ INIT_LIST_HEAD(&domain->devices);
+ domain->id = (int) context_get_did(ce);
+ domain->agaw = (int) context_get_aw(ce);
+ domain->pgd = NULL;
+ init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
+
+ pr_debug("CTXT Allocated new list entry, did:%d\n",
+ domain->id);
+ }
+
+ info->domain = domain;
+ list_add(&info->link, &domain->devices);
+ list_add(&info->global, &device_domain_values_list[iommu->seq_id]);
+
+ if (domain->pgd) {
+ asr = virt_to_phys(domain->pgd) >> VTD_PAGE_SHIFT;
+ context_put_asr(ce, asr);
+ ret = 4;
+ goto exit;
+ }
+
+ t = context_get_t(ce);
+
+ if (t == 0 || t == 1) { /* If (context has page tables) */
+ aw = context_get_aw(ce);
+ shift = aw_shift[aw];
+
+ pgt_old_phys = (struct dma_pte *)(context_get_asr(ce) << 12);
+
+ ret = copy_page_table(&pgt_new_phys, pgt_old_phys,
+ shift-9, page_addr, iommu, bus, devfn, domain, ppap);
+
+ if (ret) /* if (problem) bail out */
+ goto exit;
+
+ asr = ((unsigned long)(pgt_new_phys)) >> VTD_PAGE_SHIFT;
+ context_put_asr(ce, asr);
+ domain->pgd = phys_to_virt((unsigned long)pgt_new_phys);
+ ret = 1;
+ goto exit;
+ }
+
+ if (t == 2) { /* If (Identity mapped pass-through) */
+ ret = 2; /* REVISIT: Skip for now */
+ goto exit;
+ }
+
+ ret = 3; /* Else ce->t is a Reserved value */
+
+exit: /* all returns come through here to ensure good clean-up */
+
+ if (ret < 0) {
+ if (info)
+ free_devinfo_mem(info);
+ if (domain)
+ free_domain_mem(domain);
+ }
+ return ret;
+}
+
+
+static int copy_context_entry_table(struct intel_iommu *iommu,
+ u32 bus, void *ppap,
+ struct context_entry **context_new_p,
+ struct context_entry *context_old_phys)
+{
+ int ret = 0; /* Integer return code */
+ struct context_entry *ce; /* Iterator */
+ struct context_entry *context_new_phys; /* adr(table in new kernel) */
+ struct context_entry *context_new_virt; /* adr(table in new kernel) */
+ u32 devfn = 0; /* PCI Device & function */
+
+ /* allocate a context-entry table in the new kernel
+ * copy contents from old kernel
+ * then update each entry in the table in the new kernel
+ */
+ context_new_virt =
+ (struct context_entry *)alloc_pgtable_page(iommu->node);
+ if (!context_new_virt)
+ return -ENOMEM;
+
+ context_new_phys =
+ (struct context_entry *)virt_to_phys(context_new_virt);
+
+ oldcopy(context_new_virt, context_old_phys, VTD_PAGE_SIZE);
+
+ for (devfn = 0, ce = context_new_virt; devfn < 256; devfn++, ce++) {
+
+ if (!context_get_p(ce)) /* If (context not present) */
+ continue; /* Skip it */
+
+ ret = copy_context_entry(iommu, bus, devfn, ppap, ce);
+ if (ret == 0) /* if (context not present) */
+ continue;
+ if (ret == 1) /* if (page tables were copied) */
+ continue;
+ if (ret == 2) /* if (identity-mapped pass-through) */
+ continue; /* REVISIT -- Skip for now */
+ if (ret < 0) /* if (problem) */
+ return ret;
+ }
+
+ *context_new_p = context_new_phys;
+ __iommu_flush_cache(iommu, context_new_virt, VTD_PAGE_SIZE);
+ return 0;
+}
+
+
+
+static int copy_root_entry_table(struct intel_iommu *iommu, void *ppap,
+ struct root_entry **root_new_virt_p,
+ struct root_entry *root_old_phys)
+{
+ int ret = 0; /* Integer return code */
+ u32 bus; /* Index: root-entry-table */
+ struct root_entry *re; /* Virt(iterator: new table) */
+ struct root_entry *root_new_virt; /* Virt(table in new kernel) */
+ struct context_entry *context_old_phys; /* Phys(context table entry) */
+ struct context_entry *context_new_phys; /* Phys(new context_entry) */
+
+ /*
+ * allocate a root-entry table in the new kernel
+ * copy contents from old kernel
+ * then update each entry in the table in the new kernel
+ */
+
+ root_new_virt = (struct root_entry *)alloc_pgtable_page(iommu->node);
+ if (!root_new_virt)
+ return -ENOMEM;
+
+ oldcopy(root_new_virt, root_old_phys, VTD_PAGE_SIZE);
+
+ for (bus = 0, re = root_new_virt; bus < 256; bus += 1, re += 1) {
+
+ if (!root_present(re))
+ continue;
+
+ pr_debug("ROOT Bus: %2.2x re->val: %llx rsvd1: %llx\n",
+ bus, re->val, re->rsvd1);
+
+ context_old_phys = get_context_phys_from_root(re);
+
+ if (!context_old_phys)
+ continue;
+
+ ret = copy_context_entry_table(iommu, bus, ppap,
+ &context_new_phys,
+ context_old_phys);
+ if (ret)
+ return ret;
+
+ re->val &= ~VTD_PAGE_MASK;
+ set_root_value(re, (unsigned long)context_new_phys);
+ }
+
+ *root_new_virt_p = root_new_virt;
+ __iommu_flush_cache(iommu, root_new_virt, VTD_PAGE_SIZE);
+ return 0;
+}
+
+/*
+ * Interface to the "copy translation tables" set of functions
+ * from mainline code.
+ */
+static int copy_intel_iommu_translation_tables(struct dmar_drhd_unit *drhd,
+ struct root_entry **root_old_phys_p,
+ struct root_entry **root_new_virt_p)
+{
+ struct intel_iommu *iommu; /* Virt(iommu hardware registers) */
+ unsigned long long q; /* quadword scratch */
+ struct root_entry *root_phys; /* Phys(table in old kernel) */
+ struct root_entry *root_new; /* Virt(table in new kernel) */
+ int ret = 0; /* Integer return code */
+ int i = 0; /* Loop index */
+
+ /* Structure so copy_page_addr() can accumulate things
+ * over multiple calls and returns
+ */
+ struct copy_page_addr_parms ppa_parms = copy_page_addr_parms_init;
+ struct copy_page_addr_parms *ppap = &ppa_parms;
+
+
+ pr_debug("ENTER %s\n", __func__);
+
+ iommu = drhd->iommu;
+ q = readq(iommu->reg + DMAR_RTADDR_REG);
+ pr_debug("IOMMU %d: DMAR_RTADDR_REG:0x%16.16llx\n", iommu->seq_id, q);
+
+ if (!q)
+ return -1;
+
+ *root_old_phys_p = (struct root_entry *)q; /* Returned to caller */
+
+ /* If (list needs initializing) do it here */
+ if (!device_domain_values_list) {
+ device_domain_values_list =
+ kcalloc(g_num_of_iommus, sizeof(struct list_head),
+ GFP_KERNEL);
+
+ if (!device_domain_values_list) {
+ pr_err("Allocation failed for device_domain_values_list array\n");
+ return -ENOMEM;
+ }
+ for (i = 0; i < g_num_of_iommus; i++)
+ INIT_LIST_HEAD(&device_domain_values_list[i]);
+ }
+
+ /* Copy the root-entry table from the old kernel
+ * foreach context_entry_table in root_entry
+ * foreach context_entry in context_entry_table
+ * foreach level-1 page_table_entry in context_entry
+ * foreach level-2 page_table_entry in level 1 page_table_entry
+ * Above pattern continues up to 6 levels of page tables
+ * Sanity-check the entry
+ * Process the bus, devfn, page_address, page_size
+ */
+
+ root_phys = (struct root_entry *)q;
+ ret = copy_root_entry_table(iommu, ppap, &root_new, root_phys);
+ if (ret)
+ return ret;
+
+
+ ppa_parms.last = 1;
+ copy_page_addr(0, 0, 0, 0, 0, NULL, ppap);
+ *root_new_virt_p = root_new; /* Returned to caller */
+
+ /* The translation tables in the new kernel should now contain
+ * the same translations as the tables in the old kernel.
+ * This will allow us to update the iommu hdw to use the new tables.
+ *
+ * NOTE: Neither the iommu hardware nor the iommu->root_entry
+ * struct-value is updated herein.
+ * These are left for the caller to do.
+ */
+
+ { /* Dump the new root-entry table on the console */
+ u64 *p;
+ int i;
+
+ pr_debug("ROOT_ENTRY TABLE (NEW) START\n");
+
+ for (p = (void *)root_new, i = 0; i < 256; p += 2, i++)
+ if (p[1] != 0 || p[0] != 0 || i == 255)
+ pr_debug("i:%3.3d, p:0x%12.12llx %16.16llx %16.16llx\n",
+ i, (u64)p, p[1], p[0]);
+
+ pr_debug("ROOT_ENTRY TABLE (NEW) END\n");
+ }
+ pr_debug("LEAVE %s\n", __func__);
+ return 0;
+}
+#endif /* CONFIG_CRASH_DUMP */
--
Bill Sumner <[email protected]>

2014-01-10 22:07:59

by Sumner, William

Subject: [PATCHv3 5/6] Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU

This patch contains a function to print the hardware registers of each
IOMMU onto the system console during crashdump kernel initialization.

This function seemed far too useful to leave out.

v1->v2:
Updated patch description
Fixed: "Advanced Fault Log register"

v2->v3:
No change

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6737550..457ac80b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5325,3 +5325,79 @@ static int copy_intel_iommu_translation_tables(struct dmar_drhd_unit *drhd,
return 0;
}
#endif /* CONFIG_CRASH_DUMP */
+#ifdef CONFIG_CRASH_DUMP
+
+
+/* =========================================================================
+ * Diagnostic print
+ * ------------------------------------------------------------------------
+ */
+
+static struct intel_iommu_register_print {
+ int len; /* Length of register */
+ int idx; /* Index to read register */
+ char reg[20]; /* Linux name of register */
+ char txt[40]; /* Description */
+} intel_iommu_register_print_v[] = {
+ {1, DMAR_VER_REG, "DMAR_VER_REG", "Arch version supported by this IOMMU"},
+ {2, DMAR_CAP_REG, "DMAR_CAP_REG", "Hardware supported capabilities"},
+ {2, DMAR_ECAP_REG, "DMAR_ECAP_REG", "Extended capabilities supported"},
+ {1, DMAR_GCMD_REG, "DMAR_GCMD_REG", "Global command register"},
+ {1, DMAR_GSTS_REG, "DMAR_GSTS_REG", "Global status register "},
+ {2, DMAR_RTADDR_REG, "DMAR_RTADDR_REG", "Root entry table"},
+ {2, DMAR_CCMD_REG, "DMAR_CCMD_REG", "Context command reg"},
+ {1, DMAR_FSTS_REG, "DMAR_FSTS_REG", "Fault Status register"},
+ {1, DMAR_FECTL_REG, "DMAR_FECTL_REG", "Fault control register"},
+ {1, DMAR_FEDATA_REG, "DMAR_FEDATA_REG", "Fault event interrupt data register"},
+ {1, DMAR_FEADDR_REG, "DMAR_FEADDR_REG", "Fault event interrupt addr register"},
+ {1, DMAR_FEUADDR_REG, "DMAR_FEUADDR_REG", "Upper address register"},
+ {2, DMAR_AFLOG_REG, "DMAR_AFLOG_REG", "Advanced Fault Log register"},
+ {1, DMAR_PMEN_REG, "DMAR_PMEN_REG", "Enable Protected Memory Region"},
+ {1, DMAR_PLMBASE_REG, "DMAR_PLMBASE_REG", "PMRR Low addr"},
+ {1, DMAR_PLMLIMIT_REG, "DMAR_PLMLIMIT_REG", "PMRR low limit"},
+ {2, DMAR_PHMBASE_REG, "DMAR_PHMBASE_REG", "pmrr high base addr"},
+ {2, DMAR_PHMLIMIT_REG, "DMAR_PHMLIMIT_REG", "pmrr high limit"},
+ {2, DMAR_IQH_REG, "DMAR_IQH_REG", "Invalidation queue head register"},
+ {2, DMAR_IQT_REG, "DMAR_IQT_REG", "Invalidation queue tail register"},
+ {2, DMAR_IQA_REG, "DMAR_IQA_REG", "Invalidation queue addr register"},
+ {1, DMAR_ICS_REG, "DMAR_ICS_REG", "Invalidation complete status register"},
+ {2, DMAR_IRTA_REG, "DMAR_IRTA_REG", "Interrupt remapping table addr register"},
+};
+
+static void print_intel_iommu_registers(struct dmar_drhd_unit *drhd)
+{
+ struct intel_iommu *iommu; /* Virt adr(iommu hardware registers) */
+ unsigned long long q; /* quadword scratch */
+ u32 ver; /* DMAR_VER_REG */
+
+ int m = ARRAY_SIZE(intel_iommu_register_print_v);
+ struct intel_iommu_register_print *p = &intel_iommu_register_print_v[0];
+
+ iommu = drhd->iommu;
+
+ pr_debug("ENTER %s\n", __func__);
+ ver = readl(iommu->reg + DMAR_VER_REG);
+ pr_debug("IOMMU %d: reg_base_addr %llx ver %d:%d cap %llx ecap %llx\n",
+ iommu->seq_id,
+ (unsigned long long)drhd->reg_base_addr,
+ DMAR_VER_MAJOR(ver), DMAR_VER_MINOR(ver),
+ (unsigned long long)iommu->cap,
+ (unsigned long long)iommu->ecap);
+
+ q = readq(iommu->reg + DMAR_RTADDR_REG);
+ pr_debug("IOMMU %d: DMAR_RTADDR_REG:0x%16.16llx\n", iommu->seq_id, q);
+
+ for (; p < &intel_iommu_register_print_v[m]; p++)
+ if (p->len == 2)
+ pr_debug("0x%16.16llx %-20s %-40s\n",
+ (u64)readq(iommu->reg + p->idx), p->reg,
+ p->txt);
+ else
+ pr_debug(" 0x%8.8x %-20s %-40s\n",
+ (u32)readl(iommu->reg + p->idx), p->reg,
+ p->txt);
+
+ pr_debug("LEAVE %s\n", __func__);
+}
+#endif /* CONFIG_CRASH_DUMP */
--
Bill Sumner <[email protected]>

2014-01-10 22:07:55

by Sumner, William

Subject: [PATCHv3 3/6] Crashdump-Accepting-Active-IOMMU-Domain-Interfaces

This patch contains functions called from among the mainline code
to retrieve information from the translation tables that have been
copied into the crashdump kernel from the panicked kernel.

Most often this happens when the crashdump kernel is setting up a new
domain for a device. The crashdump kernel will assign to the device
(and to its new domain) the same IOVA addressing width and domain-id
that were used for this device by the panicked kernel. In the new
domain all of the IOVA addresses that were in-use by this device at
the time of the panic will be marked as "in-use" so that the crashdump
kernel will avoid re-using them.
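
A minimal sketch of that IOVA-reservation idea (an illustrative fragment,
not lifted from the patches; the actual calls appear in the diffs below):
ranges found live in the old kernel are reserved in the new domain's iova
allocator, so the DMA API cannot hand them out again:

	/* During the copy: mark an old-kernel IOVA range as in-use */
	reserve_iova(&domain->iovad, IOVA_PFN(start), IOVA_PFN(end));

	/* Later: new allocations come only from the remaining space */
	iova = alloc_iova(&domain->iovad, nrpages,
			  dma_mask >> PAGE_SHIFT, 1);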

This patch also includes one function to retrieve a bitmap of all
domain-ids in-use by the panicked kernel. These are marked "in-use"
so that if there is a device being used by the crashdump kernel that
was not being used by the panicked kernel then that new device will
receive a fresh, unused domain-id.

The 'device_domain_values_list' is used during the operation of
duplicating (copying) the translation tables from the panicked kernel
into the crashdump kernel, to ensure that all devices that were assigned
to any specific domain-id in the panicked kernel are also assigned to
that same domain-id in the crashdump kernel. Additionally, all context
entries (there is one per device) that contain the same domain-id *must*
point to the same set of page-tables. The 'device_domain_values_list'
ensures that if the copy operation has already seen 'this' domain-id,
it will simply point the current context entry to the same top-level
page-table that was created previously for this domain-id.

No locks are necessary for 'device_domain_values_list'. Updates
are done only during the initialization operation of copying the
translation tables -- when only a single CPU is active.
Accesses after Linux goes multi-CPU are all read-only.

v1->v2:
Updated patch description

v2->v3:
No change

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 266 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 266 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4172a2b..85e30e5 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4529,3 +4529,269 @@ static int oldcopy(void *to, void *from, int size)
return (int) ret;
}
#endif /* CONFIG_CRASH_DUMP */
+#ifdef CONFIG_CRASH_DUMP
+
+
+
+/* ------------------------------------------------------------------------
+ * Interfaces for when a new domain in the new kernel needs some values
+ * from the old kernel's context entries
+ * ------------------------------------------------------------------------
+ */
+
+/* List to hold domain values found during the copy operation */
+static struct list_head *device_domain_values_list;
+
+
+/* Utility function for interface functions that follow. */
+static int
+context_get_entry(struct context_entry *context_addr,
+ struct intel_iommu *iommu, u32 bus, int devfn)
+{
+ unsigned long long q; /* quadword scratch */
+ struct root_entry *root_phys; /* Phys adr (root table entry) */
+ struct root_entry root_temp; /* Local copy of root_entry */
+ struct context_entry *context_phys; /* Phys adr */
+
+ if (pr_dbg.domain_get)
+ pr_debug("ENTER %s B:D:F=%2.2x:%2.2x:%1.1x &context_entry:0x%llx &intel_iommu:0x%llx\n",
+ __func__, bus, devfn>>3, devfn&7,
+ (u64)context_addr, (u64)iommu);
+
+ if (bus > 255) /* Sanity check */
+ return -5;
+ if (devfn > 255 || devfn < 0) /* Sanity check */
+ return -6;
+
+ q = readq(iommu->reg + DMAR_RTADDR_REG);
+ pr_debug("IOMMU %d: DMAR_RTADDR_REG:0x%16.16llx\n", iommu->seq_id, q);
+ if (!q)
+ return -1;
+
+ root_phys = (struct root_entry *) q; /* Adr(base of vector) */
+ root_phys += bus; /* Adr(entry we want) */
+
+ oldcopy(&root_temp, root_phys, sizeof(root_temp));
+
+ pr_debug("root_temp.val:0x%llx .rsvd1:0x%llx root_phys:0x%llx\n",
+ root_temp.val, root_temp.rsvd1, (u64)root_phys);
+
+ if (!root_present(&root_temp))
+ return -2;
+
+ pr_debug("B:D:F=%2.2x:%2.2x:%1.1x root_temp.val: %llx .rsvd1: %llx\n",
+ bus, devfn >> 3, devfn & 7, root_temp.val, root_temp.rsvd1);
+
+ if (root_temp.rsvd1) /* If (root_entry is bad) */
+ return -3;
+
+ context_phys = get_context_phys_from_root(&root_temp);
+ if (!context_phys)
+ return -4;
+
+ context_phys += devfn; /* Adr(context_entry we want) */
+
+
+ oldcopy(context_addr, context_phys, sizeof(*context_addr));
+
+ if (pr_dbg.domain_get)
+ pr_debug("LEAVE %s returning: phys:0x%12.12llx hi:0x%16.16llx lo:0x%16.16llx\n",
+ __func__, (u64) context_phys,
+ context_addr->hi, context_addr->lo);
+ return 0;
+}
+
+
+/* Get address_width of iova for a device from old kernel (if device existed) */
+static int
+domain_get_gaw_from_old_kernel(struct intel_iommu *iommu, struct pci_dev *pdev)
+{
+ int ret;
+ struct context_entry context_temp;
+
+ if (pr_dbg.domain_get)
+ pr_debug("ENTER %s B:D:F=%2.2x:%2.2x:%1.1x iommu:%d\n",
+ __func__, pdev->bus->number,
+ pdev->devfn >> 3, pdev->devfn & 7,
+ iommu->seq_id);
+
+ ret = context_get_entry(&context_temp, iommu,
+ pdev->bus->number, pdev->devfn);
+ if (ret < 0)
+ return ret;
+
+ return (int) agaw_to_width(context_get_aw(&context_temp));
+}
+
+
+/* Get domain_id for a device from old kernel (if device existed) */
+static int
+domain_get_did_from_old_kernel(struct intel_iommu *iommu, struct pci_dev *pdev)
+{
+ int ret;
+ struct context_entry context_temp;
+
+ if (pr_dbg.domain_get)
+ pr_debug("ENTER %s B:D:F=%2.2x:%2.2x:%1.1x iommu:%d\n",
+ __func__, pdev->bus->number,
+ pdev->devfn >> 3, pdev->devfn & 7,
+ iommu->seq_id);
+
+ ret = context_get_entry(&context_temp, iommu,
+ pdev->bus->number, pdev->devfn);
+ if (ret < 0)
+ return ret;
+
+ return (int) context_get_did(&context_temp);
+}
+
+
+/* Get adr(top page_table) for a device from old kernel (if device exists) */
+static u64
+domain_get_pgd_from_old_kernel(struct intel_iommu *iommu, struct pci_dev *pdev)
+{
+ int ret;
+ struct context_entry context_temp;
+ u64 phys;
+ u64 virt;
+
+ if (pr_dbg.domain_get)
+ pr_debug("ENTER %s B:D:F=%2.2x:%2.2x:%1.1x iommu:%d\n",
+ __func__, pdev->bus->number,
+ pdev->devfn >> 3, pdev->devfn & 7,
+ iommu->seq_id);
+
+ ret = context_get_entry(&context_temp, iommu,
+ pdev->bus->number, pdev->devfn);
+ if (ret < 0)
+ return 0;
+ if (!context_get_p(&context_temp))
+ return 0;
+
+ phys = context_get_asr(&context_temp) << VTD_PAGE_SHIFT;
+ if (pr_dbg.domain_get)
+ pr_debug("%s, phys: 0x%16.16llx\n", __func__, (u64) phys);
+
+ if (!phys)
+ return 0;
+
+ virt = (u64) phys_to_virt(phys);
+ if (pr_dbg.domain_get)
+ pr_debug("%s, virt: 0x%16.16llx\n", __func__, (u64) virt);
+
+ return virt;
+}
+
+
+/* Mark IOVAs that are in-use at time of panic by a device of the old kernel.
+ * Mark IOVAs in the domain for that device in the new kernel
+ * so that all new requests from the device driver for an IOVA will avoid
+ * re-using any IOVA that was in-use by the old kernel.
+ */
+static void
+domain_get_ranges_from_old_kernel(struct dmar_domain *domain,
+ struct intel_iommu *iommu,
+ struct pci_dev *pdev)
+{
+ u32 bus = pdev->bus->number;
+ int devfn = pdev->devfn;
+ struct device_domain_info *i = NULL; /* iterator for foreach */
+
+ pr_debug("ENTER %s, iommu=%d, B:D:F=%2.2x:%2.2x:%1.1x\n",
+ __func__, iommu->seq_id,
+ bus, devfn >> 3, devfn & 0x7);
+
+ list_for_each_entry(i, &device_domain_values_list[iommu->seq_id],
+ global) {
+ if (i->bus == bus && i->devfn == devfn) {
+ if (i->domain == NULL) {
+ pr_err("ERROR %s, iommu=%d, B:D:F=%2.2x:%2.2x:%1.1x\n",
+ __func__, iommu->seq_id,
+ bus, devfn >> 3, devfn & 0x7);
+
+ pr_err("FOUND B:D:F=%2.2x:%2.2x:%1.1x INFO domain-pointer is NULL\n",
+ bus, devfn >> 3, devfn & 0x7);
+ break;
+ }
+ pr_debug("FOUND B:D:F=%2.2x:%2.2x:%1.1x did:%4.4x\n",
+ bus, devfn >> 3, devfn & 0x7, i->domain->id);
+
+ copy_reserved_iova(&i->domain->iovad, &domain->iovad);
+ break;
+ }
+ }
+
+ pr_debug("LEAVE %s\n", __func__);
+}
+
+
+/* Mark domain-id's from old kernel as in-use on this iommu so that a new
+ * domain-id is allocated in the case where there is a device in the new kernel
+ * that was not in the old kernel -- and therefore a new domain-id is needed.
+ */
+static int intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu)
+{
+ unsigned long long q; /* quadword scratch */
+ struct root_entry *root_phys; /* Phys(in old kernel) */
+ struct root_entry *root_temp; /* Virt(Local copy) */
+ struct root_entry *re; /* Loop index */
+ struct context_entry *context_phys; /* Phys(in old kernel) */
+ struct context_entry *context_temp; /* Virt(Local copy) */
+ struct context_entry *ce; /* Loop index */
+ int did; /* Each domain-id found */
+ u32 bus; /* Index into root-entry-table */
+ u32 devfn; /* Index into context-entry-table */
+
+ pr_debug("ENTER %s iommu:%d\n", __func__, iommu->seq_id);
+
+ q = readq(iommu->reg + DMAR_RTADDR_REG);
+ pr_debug("IOMMU %d: DMAR_RTADDR_REG:0x%16.16llx\n", iommu->seq_id, q);
+ if (!q)
+ return -ENOMEM;
+
+ root_phys = (void *)q;
+ root_temp = (struct root_entry *)alloc_pgtable_page(iommu->node);
+ if (!root_temp)
+ return -ENOMEM;
+ oldcopy(root_temp, root_phys, PAGE_SIZE);
+
+ context_temp = (struct context_entry *)alloc_pgtable_page(iommu->node);
+ if (!context_temp) {
+ free_pgtable_page(root_temp);
+ return -ENOMEM;
+ }
+
+ for (bus = 0, re = root_temp; bus < 256; bus += 1, re += 1) {
+
+ if (!root_present(re))
+ continue;
+
+ pr_debug("ROOT B:%2.2x val: %16.16llx rsvd1: %16.16llx\n",
+ bus, re->val, re->rsvd1);
+
+ if (re->rsvd1) /* If (root_entry is bad) */
+ continue;
+
+ context_phys = get_context_phys_from_root(re);
+ if (!context_phys)
+ continue;
+
+ oldcopy(context_temp, context_phys, PAGE_SIZE);
+
+ for (devfn = 0, ce = context_temp; devfn < 256; devfn++, ce++) {
+ if (!context_get_p(ce))
+ continue;
+
+ did = context_get_did(ce);
+ set_bit(did, iommu->domain_ids);
+ pr_debug("DID B:D:F:%2.2x:%2.2x:%1.1x did:%d(0x%4.4x)\n",
+ bus, devfn >> 3, devfn & 0x7, did, did);
+ }
+
+ }
+ free_pgtable_page(root_temp);
+ free_pgtable_page(context_temp);
+ pr_debug("LEAVE %s iommu:%d\n", __func__, iommu->seq_id);
+ return 0;
+}
+#endif /* CONFIG_CRASH_DUMP */
--
Bill Sumner <[email protected]>

2014-01-10 22:08:45

by Sumner, William

Subject: [PATCHv3 2/6] Crashdump-Accepting-Active-IOMMU-Utility-functions

Most of the code for Crashdump Accepting Active IOMMU is contained
in a large section at the end of intel-iommu.c -- beginning here.

This patch contains small utility functions used to access the
bit fields of the context entry, plus one to copy from a physically-
addressed area of memory (primarily pages from the panicked kernel)
into a virtually-addressed area within the crashdump kernel.

v1->v2:
Updated patch description

v2->v3:
No change

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 17c4537..4172a2b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4455,3 +4455,77 @@ static void __init check_tylersburg_isoch(void)
printk(KERN_WARNING "DMAR: Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
vtisochctrl);
}
+#ifdef CONFIG_CRASH_DUMP
+
+/* ========================================================================
+ * Utility functions for accessing the iommu Translation Tables
+ * ------------------------------------------------------------------------
+ */
+static inline struct context_entry *
+get_context_phys_from_root(struct root_entry *root)
+{
+ return root_present(root) ?
+ (struct context_entry *)(root->val & VTD_PAGE_MASK) : NULL;
+}
+
+/* Bit-field accessors for a context entry (field layout per the VT-d spec):
+ * lo: bit 0 P, bit 1 FPD, bits 2-3 T, bits 12-63 ASR;
+ * hi: bits 0-2 AW, bits 3-6 AVAIL, bits 8-23 DID.
+ */
+static int
+context_get_p(struct context_entry *c) { return (c->lo >> 0) & 0x1; }
+
+static int
+context_get_fpdi(struct context_entry *c) { return (c->lo >> 1) & 0x1; }
+
+static int
+context_get_t(struct context_entry *c) { return (c->lo >> 2) & 0x3; }
+
+static u64
+context_get_asr(struct context_entry *c) { return c->lo >> 12; }
+
+static int
+context_get_aw(struct context_entry *c) { return (c->hi >> 0) & 0x7; }
+
+static int
+context_get_aval(struct context_entry *c) { return (c->hi >> 3) & 0xf; }
+
+static int
+context_get_did(struct context_entry *c) { return (c->hi >> 8) & 0xffff; }
+
+static void context_put_asr(struct context_entry *c, unsigned long asr)
+{
+ c->lo &= (~VTD_PAGE_MASK);
+ c->lo |= (asr << VTD_PAGE_SHIFT);
+}
+
+
+/*
+ * Copy memory from a physically-addressed area into a virtually-addressed area
+ */
+static int oldcopy(void *to, void *from, int size)
+{
+ size_t ret = 0; /* Length copied */
+ unsigned long pfn; /* Page Frame Number */
+ char *buf = to; /* Adr(Output buffer) */
+ size_t csize = (size_t)size; /* Num(bytes to copy) */
+ unsigned long offset; /* Lower 12 bits of from */
+ int userbuf = 0; /* to is in kernel space */
+
+ if (pr_dbg.enter_oldcopy)
+ pr_debug("ENTER %s to=%16.16llx, from = %16.16llx, size = %d\n",
+ __func__,
+ (unsigned long long) to,
+ (unsigned long long) from, size);
+
+ if (intel_iommu_translation_tables_are_mapped) {
+ memcpy(to, phys_to_virt((phys_addr_t)from), csize);
+ ret = csize; /* memcpy returns no count; report the full size */
+ } else {
+ pfn = ((unsigned long) from) >> VTD_PAGE_SHIFT;
+ offset = ((unsigned long) from) & (~VTD_PAGE_MASK);
+ ret = copy_oldmem_page(pfn, buf, csize, offset, userbuf);
+ }
+
+ if (pr_dbg.leave_oldcopy)
+ pr_debug("LEAVE %s ret=%d\n", __func__, (int) ret);
+
+ return (int) ret;
+}
+#endif /* CONFIG_CRASH_DUMP */
--
Bill Sumner <[email protected]>

2014-01-10 22:08:44

by Sumner, William

Subject: [PATCHv3 6/6] Crashdump-Accepting-Active-IOMMU-Call-From-Mainline

At a high level, this code operates primarily during iommu initialization
and device-driver initialization.

During intel-iommu hardware initialization:
In intel_iommu_init(void)
* If (This is the crash kernel)
. Set flag: crashdump_accepting_active_iommu (all changes below check this)
. Skip disabling the iommu hardware translations

In init_dmars()
* Duplicate the intel iommu translation tables from the old kernel
in the new kernel
. The root-entry table, all context-entry tables,
and all page-translation-entry tables
. The duplicate tables contain updated physical addresses to link them together.
. The duplicate tables are mapped into kernel virtual addresses
in the new kernel which allows most of the existing iommu code
to operate without change.
. Do some minimal sanity-checks during the copy
. Place the address of the new root-entry structure into "struct intel_iommu"

* Skip setting up new domains for 'si', 'rmrr', 'isa'
. Translations for 'rmrr' and 'isa' ranges have been copied from the old kernel
. This patch has not yet been tested with iommu pass-through enabled

* Existing (unchanged) code near the end of init_dmars():
. Loads the address of the (now new) root-entry structure from
"struct intel_iommu" into the iommu hardware and does the hardware flushes.
This changes the active translation tables from the ones in the old kernel
to the copies in the new kernel.
. This is legal because the translations in the two sets of tables are
currently identical:
Virtualization Technology for Directed I/O. Architecture Specification,
February 2011, Rev. 1.3 (section 11.2, paragraph 2)

In iommu_init_domains()
* Mark as in-use all domain-id's from the old kernel
. In case the new kernel contains a device that was not in the old kernel
and a new, unused domain-id is actually needed, the bitmap will give us one.

When a new domain is created for a device:
* If (this device has a context in the old kernel)
. Get domain-id, address-width, and IOVA ranges from the old kernel context;
. Get address(page-entry-tables) from the copy in the new kernel;
. And apply all of the above values to the new domain structure.
* Else
. Create a new domain as normal

v1->v2:
Updated patch description

v2->v3:
No change

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 272 +++++++++++++++++++++++++++++++++-----------
1 file changed, 204 insertions(+), 68 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 457ac80b..4f702d1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -21,6 +21,8 @@
* Author: Fenghua Yu <[email protected]>
*/

+/* #define DEBUG 1 */ /* Enable/Disable debug print in this source file */
+
#include <linux/init.h>
#include <linux/bitmap.h>
#include <linux/debugfs.h>
@@ -1357,6 +1359,12 @@ static int iommu_init_domains(struct intel_iommu *iommu)
*/
if (cap_caching_mode(iommu->cap))
set_bit(0, iommu->domain_ids);
+
+#ifdef CONFIG_CRASH_DUMP
+ if (crashdump_accepting_active_iommu)
+ intel_iommu_get_dids_from_old_kernel(iommu);
+#endif /* CONFIG_CRASH_DUMP */
+
return 0;
}

@@ -1430,7 +1438,8 @@ static struct dmar_domain *alloc_domain(void)
}

static int iommu_attach_domain(struct dmar_domain *domain,
- struct intel_iommu *iommu)
+ struct intel_iommu *iommu,
+ int domain_number)
{
int num;
unsigned long ndomains;
@@ -1440,12 +1449,15 @@ static int iommu_attach_domain(struct dmar_domain *domain,

spin_lock_irqsave(&iommu->lock, flags);

- num = find_first_zero_bit(iommu->domain_ids, ndomains);
- if (num >= ndomains) {
- spin_unlock_irqrestore(&iommu->lock, flags);
- printk(KERN_ERR "IOMMU: no free domain ids\n");
- return -ENOMEM;
- }
+ if (domain_number < 0) {
+ num = find_first_zero_bit(iommu->domain_ids, ndomains);
+ if (num >= ndomains) {
+ spin_unlock_irqrestore(&iommu->lock, flags);
+ printk(KERN_ERR "IOMMU: no free domain ids\n");
+ return -ENOMEM;
+ }
+ } else
+ num = domain_number;

domain->id = num;
set_bit(num, iommu->domain_ids);
@@ -2056,8 +2068,17 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
int bus = 0, devfn = 0;
int segment;
int ret;
+ int did = -1; /* Default to "no domain_id supplied" */

domain = find_domain(pdev);
+
+#ifdef CONFIG_CRASH_DUMP
+ if (domain)
+ if (pr_dbg.in_crashdump && crashdump_accepting_active_iommu)
+ pr_debug("IOMMU: Found domain (%d) for device %s\n",
+ domain->id, pci_name(pdev));
+#endif /* CONFIG_CRASH_DUMP */
+
if (domain)
return domain;

@@ -2088,6 +2109,12 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
}
}

+#ifdef CONFIG_CRASH_DUMP
+ if (pr_dbg.in_crashdump && crashdump_accepting_active_iommu)
+ pr_debug("IOMMU: Allocating new domain for device %s\n",
+ pci_name(pdev));
+#endif /* CONFIG_CRASH_DUMP */
+
domain = alloc_domain();
if (!domain)
goto error;
@@ -2102,7 +2129,26 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
}
iommu = drhd->iommu;

- ret = iommu_attach_domain(domain, iommu);
+#ifdef CONFIG_CRASH_DUMP
+ /* See if this device had a did & gaw in the old kernel */
+ if (crashdump_accepting_active_iommu) {
+ did = domain_get_did_from_old_kernel(iommu, pdev);
+ if (did > 0 || (did == 0 && !cap_caching_mode(iommu->cap))) {
+ ret = domain_get_gaw_from_old_kernel(iommu, pdev);
+ if (ret > 0)
+ gaw = ret;
+ else
+ did = -1;
+ } else
+ did = -1;
+ }
+
+ if (pr_dbg.in_crashdump && crashdump_accepting_active_iommu)
+ pr_debug("IOMMU: new domain for device %s: gaw(%d) did(%d)\n",
+ pci_name(pdev), gaw, did);
+#endif /* CONFIG_CRASH_DUMP */
+
+ ret = iommu_attach_domain(domain, iommu, did);
if (ret) {
free_domain_mem(domain);
goto error;
@@ -2113,6 +2159,23 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
goto error;
}

+#ifdef CONFIG_CRASH_DUMP
+ if (crashdump_accepting_active_iommu && did >= 0) {
+ u64 temp_pgd; /* Top page-translation-table */
+
+ domain_get_ranges_from_old_kernel(domain, iommu, pdev);
+
+ temp_pgd = domain_get_pgd_from_old_kernel(iommu, pdev);
+ if (temp_pgd) {
+ if (domain->pgd)
+ free_pgtable_page(domain->pgd);
+ domain->pgd = (struct dma_pte *)temp_pgd;
+ }
+ pr_debug("IOMMU: New Domain for device %s Did:%d Pgd: 0x%12.12llx\n",
+ pci_name(pdev), did, temp_pgd);
+ }
+#endif /* CONFIG_CRASH_DUMP */
+
/* register pcie-to-pci device */
if (dev_tmp) {
info = alloc_devinfo_mem();
@@ -2323,7 +2386,7 @@ static int __init si_domain_init(int hw)
pr_debug("Identity mapping domain is domain %d\n", si_domain->id);

for_each_active_iommu(iommu, drhd) {
- ret = iommu_attach_domain(si_domain, iommu);
+ ret = iommu_attach_domain(si_domain, iommu, (int) -1);
if (ret) {
domain_exit(si_domain);
return -EFAULT;
@@ -2531,6 +2594,10 @@ static int __init init_dmars(void)
struct pci_dev *pdev;
struct intel_iommu *iommu;
int i, ret;
+#ifdef CONFIG_CRASH_DUMP
+ struct root_entry *root_old_phys;
+ struct root_entry *root_new_virt;
+#endif /* CONFIG_CRASH_DUMP */

/*
* for each drhd
@@ -2578,16 +2645,41 @@ static int __init init_dmars(void)
if (ret)
goto error;

- /*
- * TBD:
- * we could share the same root & context tables
- * among all IOMMU's. Need to Split it later.
- */
- ret = iommu_alloc_root_entry(iommu);
- if (ret) {
- printk(KERN_ERR "IOMMU: allocate root entry failed\n");
- goto error;
+#ifdef CONFIG_CRASH_DUMP
+ if (crashdump_accepting_active_iommu) {
+ print_intel_iommu_registers(drhd);
+
+ pr_debug("Calling copy_intel_iommu_translation_tables\n");
+ pr_debug("(lists tables in OLD KERNEL during copy)\n");
+ ret = copy_intel_iommu_translation_tables(drhd,
+ &root_old_phys, &root_new_virt);
+ if (ret) {
+ pr_err("IOMMU: Copy translate tables failed\n");
+
+ /* Best to stop trying */
+ crashdump_accepting_active_iommu = false;
+ goto error;
+ }
+ iommu->root_entry = root_new_virt;
+ pr_debug("IOMMU: root_new_virt:0x%12.12llx phys:0x%12.12llx\n",
+ (u64)root_new_virt,
+ virt_to_phys(root_new_virt));
+ } else {
+#endif /* CONFIG_CRASH_DUMP */
+ /*
+ * TBD:
+ * we could share the same root & context tables
+ * among all IOMMU's. Need to Split it later.
+ */
+ ret = iommu_alloc_root_entry(iommu);
+ if (ret) {
+ printk(KERN_ERR "IOMMU: allocate root entry failed\n");
+ goto error;
+ }
+#ifdef CONFIG_CRASH_DUMP
}
+#endif /* CONFIG_CRASH_DUMP */
+
if (!ecap_pass_through(iommu->ecap))
hw_pass_through = 0;
}
@@ -2656,50 +2748,69 @@ static int __init init_dmars(void)

check_tylersburg_isoch();

- /*
- * If pass through is not set or not enabled, setup context entries for
- * identity mappings for rmrr, gfx, and isa and may fall back to static
- * identity mapping if iommu_identity_mapping is set.
- */
- if (iommu_identity_mapping) {
- ret = iommu_prepare_static_identity_mapping(hw_pass_through);
- if (ret) {
- printk(KERN_CRIT "Failed to setup IOMMU pass-through\n");
- goto error;
+#ifdef CONFIG_CRASH_DUMP
+ if (!crashdump_accepting_active_iommu) {
+ /* Skip setting-up new domains for si, rmrr, and the isa bus
+ * on the expectation that these translations
+ * were copied from the old kernel.
+ *
+ * NOTE: Indented the existing code below because it is now
+ * conditional upon the 'if' statement above.
+ * This pushed many of the lines over 80 characters.
+ * Chose to leave them and live with the 'checkpatch' warnings
+ * about "over 80 characters".
+ */
+#endif /* CONFIG_CRASH_DUMP */
+ /*
+ * If pass through is not set or not enabled, setup context entries for
+ * identity mappings for rmrr, gfx, and isa and may fall back to static
+ * identity mapping if iommu_identity_mapping is set.
+ */
+ if (iommu_identity_mapping) {
+ ret = iommu_prepare_static_identity_mapping(hw_pass_through);
+ if (ret) {
+ printk(KERN_CRIT "Failed to setup IOMMU pass-through\n");
+ goto error;
+ }
}
- }
- /*
- * For each rmrr
- * for each dev attached to rmrr
- * do
- * locate drhd for dev, alloc domain for dev
- * allocate free domain
- * allocate page table entries for rmrr
- * if context not allocated for bus
- * allocate and init context
- * set present in root table for this bus
- * init context with domain, translation etc
- * endfor
- * endfor
- */
- printk(KERN_INFO "IOMMU: Setting RMRR:\n");
- for_each_rmrr_units(rmrr) {
- for (i = 0; i < rmrr->devices_cnt; i++) {
- pdev = rmrr->devices[i];
- /*
- * some BIOS lists non-exist devices in DMAR
- * table.
- */
- if (!pdev)
- continue;
- ret = iommu_prepare_rmrr_dev(rmrr, pdev);
- if (ret)
- printk(KERN_ERR
- "IOMMU: mapping reserved region failed\n");
+ /*
+ * For each rmrr
+ * for each dev attached to rmrr
+ * do
+ * locate drhd for dev, alloc domain for dev
+ * allocate free domain
+ * allocate page table entries for rmrr
+ * if context not allocated for bus
+ * allocate and init context
+ * set present in root table for this bus
+ * init context with domain, translation etc
+ * endfor
+ * endfor
+ */
+ printk(KERN_INFO "IOMMU: Setting RMRR:\n");
+ for_each_rmrr_units(rmrr) {
+ for (i = 0; i < rmrr->devices_cnt; i++) {
+ pdev = rmrr->devices[i];
+ /*
+ * some BIOS lists non-exist devices in DMAR
+ * table.
+ */
+ if (!pdev)
+ continue;
+ ret = iommu_prepare_rmrr_dev(rmrr, pdev);
+ if (ret)
+ printk(KERN_ERR
+ "IOMMU: mapping reserved region failed\n");
+ }
}
- }

- iommu_prepare_isa();
+ iommu_prepare_isa();
+#ifdef CONFIG_CRASH_DUMP
+ } else {
+ intel_iommu_translation_tables_are_mapped = true;
+ pr_debug("intel_iommu_translation_tables_are_mapped = true\n");
+ }
+#endif /* CONFIG_CRASH_DUMP */

/*
* for each drhd
@@ -2893,6 +3004,12 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,

BUG_ON(dir == DMA_NONE);

+#ifdef CONFIG_CRASH_DUMP
+ if (pr_dbg.in_crashdump && crashdump_accepting_active_iommu)
+ pr_debug("ENTER %s paddr(0x%12.12llx) size(0x%12.12lx)\n",
+ __func__, paddr, size);
+#endif /* CONFIG_CRASH_DUMP */
+
if (iommu_no_mapping(hwdev))
return paddr;

@@ -2935,6 +3052,12 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,

start_paddr = (phys_addr_t)iova->pfn_lo << PAGE_SHIFT;
start_paddr += paddr & ~PAGE_MASK;
+
+#ifdef CONFIG_CRASH_DUMP
+ if (pr_dbg.in_crashdump && crashdump_accepting_active_iommu)
+ pr_debug("LEAVE %s dma_addr_t(0x%16.16llx)\n",
+ __func__, start_paddr);
+#endif /* CONFIG_CRASH_DUMP */
return start_paddr;

error:
@@ -3754,19 +3877,32 @@ int __init intel_iommu_init(void)
return -ENODEV;
}

+#ifdef CONFIG_CRASH_DUMP
/*
- * Disable translation if already enabled prior to OS handover.
+ * If (This is the crash kernel)
+ * Set: copy iommu translate tables from old kernel
+ * Skip disabling the iommu hardware translations
*/
- for_each_drhd_unit(drhd) {
- struct intel_iommu *iommu;
+ if (is_kdump_kernel()) {
+ crashdump_accepting_active_iommu = true;
+ pr_info("IOMMU crashdump_accepting_active_iommu = true\n");
+ pr_info("IOMMU Skip disabling iommu hardware translations\n");
+ } else
+#endif /* CONFIG_CRASH_DUMP */
+ /*
+ * Disable translation if already enabled prior to OS handover.
+ */
+ for_each_drhd_unit(drhd) {
+ struct intel_iommu *iommu;

- if (drhd->ignored)
- continue;
+ if (drhd->ignored)
+ continue;
+
+ iommu = drhd->iommu;
+ if (iommu->gcmd & DMA_GCMD_TE)
+ iommu_disable_translation(iommu);
+ }

- iommu = drhd->iommu;
- if (iommu->gcmd & DMA_GCMD_TE)
- iommu_disable_translation(iommu);
- }

if (dmar_dev_scope_init() < 0) {
if (force_on)
--
Bill Sumner <[email protected]>

2014-01-10 22:09:39

by Sumner, William

Subject: [PATCHv3 1/6] Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype

The following series implements a fix for a long-standing kdump problem
with DMA: when a kernel panics and boots into the kdump kernel, DMA started
by the panicked kernel is not stopped before the kdump kernel is booted, and
the kdump kernel disables the IOMMU while this DMA continues. The IOMMU then
stops translating the DMA addresses as IOVAs and begins to treat them as
physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors, (2) generate PCI SERR errors, or (3) transfer
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
1. to accept the iommu hardware in an active state,
2. to leave the current translations in-place so that legacy DMA will continue
using its current buffers until the device drivers in the crashdump kernel
initialize themselves and their devices,
3. to use different portions of the iova address ranges for the device drivers
in the crashdump kernel than the iova ranges that were in-use at the time
of the panic.
4. to mark as in-use each domain-id that an IOMMU was using at the
time of the panic -- so that if the crashdump kernel contains a
device that was not in the panicked kernel (and therefore needs
a new domain-id) that device will be given an unused domain-id
(a rough sketch follows this list).
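
A rough sketch of that reservation step (illustration only, not the
patch code; domain_id_from_old_context() is a hypothetical stand-in for
the patch's walk of the old kernel's context tables):

	static void reserve_old_kernel_dids(struct intel_iommu *iommu)
	{
		int bus, devfn, did;

		for (bus = 0; bus < 256; bus++)
			for (devfn = 0; devfn < 256; devfn++) {
				did = domain_id_from_old_context(iommu, bus, devfn);
				if (did >= 0 && did < cap_ndoms(iommu->cap))
					set_bit(did, iommu->domain_ids);
			}
	}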

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.

This patch contains global flags used by many sections of the code and
prototypes of the interface functions which are coded at the end of
the source file.

Static struct 'pr_dbg' contains bit-flags that control the amount of debug
print placed on the console. Note that the amount of print increases greatly
(probably geometrically if not faster) as additional bits further down the
structure are enabled. These flags and the pr_debug() lines scattered
throughout the code are only for the developer or analyst. At this time,
no means is provided to modify the flags without a re-compile.
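
For example, each debug site is gated by one of these flags; this is the
hunk from __intel_map_single() elsewhere in the series:

	#ifdef CONFIG_CRASH_DUMP
		if (pr_dbg.in_crashdump && crashdump_accepting_active_iommu)
			pr_debug("ENTER %s paddr(0x%12.12llx) size(0x%12.12lx)\n",
				 __func__, paddr, size);
	#endif /* CONFIG_CRASH_DUMP */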

v1->v2:
Updated patch description

v2->v3:
No change

Signed-off-by: Bill Sumner <[email protected]>
---
drivers/iommu/intel-iommu.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 43b9bfe..17c4537 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -48,6 +48,7 @@

#include "irq_remapping.h"
#include "pci.h"
+#include <linux/crash_dump.h>

#define ROOT_SIZE VTD_PAGE_SIZE
#define CONTEXT_SIZE VTD_PAGE_SIZE
@@ -164,6 +165,80 @@ static inline unsigned long virt_to_dma_pfn(void *p)
return page_to_dma_pfn(virt_to_page(p));
}

+#ifdef CONFIG_CRASH_DUMP
+/* ===================================================================
+ * Crashdump Accepting Active IOMMU
+ * Enhances the crashdump kernel to deal with an active iommu
+ * and legacy DMA from the (old) panic'd kernel in a manner similar to how
+ * legacy DMA is handled when no hardware iommu was in use by the old kernel --
+ * allow the legacy DMA to continue into its current buffers.
+ *
+ * This code:
+ * 1. accepts the iommu hardware in an active state from the old kernel,
+ * 2. leaves the current translations in-place so that legacy DMA will
+ * continue to use its current buffers,
+ * 3. allocates to the device drivers in the crashdump kernel
+ * portions of the iova address ranges that are different
+ * from the iova address ranges that were being used by the old kernel
+ * at the time of the panic.
+ * -------------------------------------------------------------------
+ */
+
+/* Flags for Crashdump Accepting Active IOMMU */
+
+static int crashdump_accepting_active_iommu; /* activate this feature */
+static int intel_iommu_translation_tables_are_mapped; /* table copy done */
+
+static struct { /* run-time pr_debug() flags */
+ unsigned in_crashdump:1; /* if crashdump_accepting_active_iommu */
+ unsigned domain_get:1; /* pr_debug in domain_get* functions */
+ unsigned copy_page_table:1; /* enter/leave copy_page_table() */
+ unsigned copy_page_addr:1; /* enter/leave copy_page_addr() */
+ unsigned addr_ranges:1; /* accumulated addr ranges */
+ unsigned reserved_ranges:1; /* accumulated addr ranges reserved */
+ unsigned page_addr:1; /* adr(each page table) */
+ unsigned enter_oldcopy:1; /* enter oldcopy() parameters */
+ unsigned leave_oldcopy:1; /* leave oldcopy() parameters */
+} pr_dbg = { /* Enable flags below here */
+ .in_crashdump = 1,
+ .domain_get = 1,
+ .copy_page_table = 1,
+ .copy_page_addr = 0,
+ .addr_ranges = 0,
+ .reserved_ranges = 0,
+ .page_addr = 0,
+ .enter_oldcopy = 0,
+ .leave_oldcopy = 0,
+};
+
+/* Prototypes of interface functions for Crashdump Accepting Active IOMMU */
+
+static int
+copy_intel_iommu_translation_tables(struct dmar_drhd_unit *drhd,
+ struct root_entry **root_old_p, struct root_entry **root_new_p);
+
+static int
+domain_get_did_from_old_kernel(struct intel_iommu *iommu, struct pci_dev *pdev);
+
+static int
+domain_get_gaw_from_old_kernel(struct intel_iommu *iommu, struct pci_dev *pdev);
+
+static u64
+domain_get_pgd_from_old_kernel(struct intel_iommu *iommu, struct pci_dev *pdev);
+
+static void
+domain_get_ranges_from_old_kernel(struct dmar_domain *domain,
+ struct intel_iommu *iommu, struct pci_dev *pdev);
+static int
+intel_iommu_get_dids_from_old_kernel(struct intel_iommu *iommu);
+
+/* Debug-print functions for Crashdump Accepting Active IOMMU */
+
+static void
+print_intel_iommu_registers(struct dmar_drhd_unit *drhd);
+#endif /* CONFIG_CRASH_DUMP */
+
+
/* global iommu list, set NULL for ignored DMAR units */
static struct intel_iommu **g_iommus;

--
Bill Sumner <[email protected]>

2014-01-25 02:45:00

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

Tested this patchset on my local HP Z420 workstation, and it works very
well.

Hi Bill,

Thanks for your effort.

I have several concerns.

Firstly, I think the patch logs need to be rearranged. The patch-set
cover letter can briefly explain the why and the how. If you think this
information is very useful, it can be split up and put into the
individual patch logs.

Then, for each patch, the log should accurately summarize why the patch
is needed and what it really does. If several patches are closely
correlated, they may need to be merged.

Secondly, each patch should get a separate subject line that tells what
the patch really does. Even after they are merged into the kernel git
tree, each of them remains an independent commit, and people may use or
depend on only one of them. Frankly, I don't like the current patch
subjects.

Thirdly, this patch set will be part of intel-iommu, even though it only
works in the kdump kernel. Within a subsystem, the style needs to be
consistent. I like the debug method that introduces the struct pr_dbg,
but the maintainers may not, because a debug utility can bloat the code
and get in the way of review. Personally I like lean code; the less
there is, the easier it is to review. Alternatively, put the debug code
in an independent patch at the end of the set and let the maintainer
decide whether it is OK to pull in.

Sorry to say so much; I think this solution is truly the right way. As
you know, it is a big problem for kdump when intel-iommu is active in
the first kernel. Because of this bug, many machines with an intel-iommu
have to be booted with intel_iommu=off, which hurts performance badly.

Thanks,
Baoquan

On 01/10/14 at 03:07pm, Bill Sumner wrote:
> v2->v3:
> 1. Commented-out "#define DEBUG 1" to eliminate debug messages
> 2. Updated the comments about changes in each version in all patches in the set.
> 3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
> struct as recommended by Baoquan He [[email protected]]
> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>
> v1->v2:
> The following series implements a fix for:
> A kdump problem about DMA that has been discussed for a long time. That is,
> when a kernel panics and boots into the kdump kernel, DMA started by the
> panicked kernel is not stopped before the kdump kernel is booted and the
> kdump kernel disables the IOMMU while this DMA continues. This causes the
> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them
> as physical memory addresses -- which causes the DMA to either:
> (1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer
> data to or from incorrect areas of memory. Often this causes the dump to fail.
>

2014-04-07 20:44:08

by Donald Dutile

[permalink] [raw]
Subject: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

On 01/10/2014 05:07 PM, Bill Sumner wrote:
> v2->v3:
> 1. Commented-out "#define DEBUG 1" to eliminate debug messages
> 2. Updated the comments about changes in each version in all patches in the set.
> 3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
> struct as recommended by Baoquan He [[email protected]]
> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>
> v1->v2:
> The following series implements a fix for:
> A kdump problem about DMA that has been discussed for a long time. That is,
> when a kernel panics and boots into the kdump kernel, DMA started by the
> panicked kernel is not stopped before the kdump kernel is booted and the
> kdump kernel disables the IOMMU while this DMA continues. This causes the
> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them
> as physical memory addresses -- which causes the DMA to either:
> (1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer
> data to or from incorrect areas of memory. Often this causes the dump to fail.
>
> This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
> 1. to accept the iommu hardware in an active state,
> 2. to leave the current translations in-place so that legacy DMA will continue
> using its current buffers until the device drivers in the crashdump kernel
> initialize and initialize their devices,
> 3. to use different portions of the iova address ranges for the device drivers
> in the crashdump kernel than the iova ranges that were in-use at the time
> of the panic.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
> for that device.
> 2. This approach behaves in a manner very similar to operation without an
> active iommu.

Sorry to be late to the game.... finally getting out of a deep hole &
asked to look at this proposal...

Along this concept -- similar to operation without an active iommu --
have you considered the following:
a) if (this is crash kernel), turn *off* DMAR faults (rough sketch below);
b) if (this is crash kernel), isolate all device DMA in IOMMU
c) as second kernel configures each device, have each device use IOMMU hw-passthrough,
i.e., the equivalent of having no IOMMU for the second, kexec'd kernel
but, having the benefit of keeping all the other (potentially bad) devices
sequestered / isolated, until they are initialized & re-configured in the second kernel,
*if at all* -- note: kexec'd kernels may not enable/configure all devices that
existed in the first kernel (Bill: I'm sure you know this, but others may not).
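
For reference, (a) is roughly the following on each iommu (an untested
sketch that just masks the fault-event interrupt; DMAR_FECTL_REG and
DMA_FECTL_IM are the existing register definitions):

	static void mask_dmar_faults(struct intel_iommu *iommu)
	{
		unsigned long flags;
		u32 fectl;

		raw_spin_lock_irqsave(&iommu->register_lock, flags);
		fectl = readl(iommu->reg + DMAR_FECTL_REG);
		/* setting IM masks further fault-event interrupts */
		writel(fectl | DMA_FECTL_IM, iommu->reg + DMAR_FECTL_REG);
		raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
	}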

RMRR's that were previously set up could continue to work if they are skipped in step (b),
unless the device has gone mad/bad. In that case, re-parsing the RMRR may or may not
clear up the issue.

Additionally, a tidbit of information like "some servers force NMI's on DMAR faults,
and cause a system reset, thereby, preventing a kdump to occur"
should have been included as one reason to stop DMAR faults from occurring on kexec-boot,
in addition to the fact that a flood of them can lock up a system.

Again, just turning off DMAR fault reporting for the 'if (this is crash kernel)',
short-term workaround sounds a whole lot less expensive to implement, as well as
'if (this is crash kernel), force hw-passthrough'.

If the IO devices are borked to the point that they won't complete DMA properly,
with or without IOMMU, the system is dead anyhow, game over.

Finally, copying the previous IOMMU state to the second kernel, and hoping
that the cause of the kernel crash wasn't an errant DMA (e.g., due to a device going bad,
or its DMA-state being corrupted & causing an improper IO), is omitting an important failure
case/space.
Keeping the first-kernel DMA isolated (IOMMU on, all translations off, all DMAR faults off),
and then allowing each device (driver) configuration to sanely reset the device &
start a new (hw-passthrough) domain seems simpler and cleaner for this dump-and-run kernel
effort.

- Don

> 3. Any activity between the IO-device and its RMRR areas is handled by the
> device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
> This supports the practice of creating a special kdump kernel without
> drivers for any devices that are not required for taking a crashdump.
>
> Changes since the RFC version of this patch:
> 1. Consolidated all of the operational code into the "copy..." functions.
> The "process..." functions were primarily used for diagnostics and
> exploration; however, there was a small amount of operational code that
> used the "process..." functions.
> This operational code has been moved into the "copy..." functions.
>
> 2. Removed the "Process ..." functions and the diagnostic code that ran
> on that function set. This removed about 1/4 of the code -- which this
> operational patch set no longer needs. These portions of the RFC patch
> could be formatted as a separate patch and submitted independently
> at a later date.
>
> 3. Re-formatted the code to the Linux Coding Standards.
> The checkpatch script still finds some lines to complain about;
> however most of these lines are either (1) lines that I did not change,
> or (2) lines that only changed by adding a level of indent which pushed
> them over 80-characters, or (3) new lines whose intent is far clearer when
> longer than 80-characters.
>
> 4. Updated the remaining debug print to be significantly more flexible.
> This allows control over the amount of debug print to the console --
> which can vary widely.
>
> 5. Fixed a couple of minor bugs found by testing on a machine with a
> very large IO configuration.
>
> At a high level, this code operates primarily during iommu initialization
> and device-driver initialization
>
> During intel-iommu hardware initialization:
> In intel_iommu_init(void)
> * If (This is the crash kernel)
> . Set flag: crashdump_accepting_active_iommu (all changes below check this)
> . Skip disabling the iommu hardware translations
>
> In init_dmars()
> * Duplicate the intel iommu translation tables from the old kernel
> in the new kernel
> . The root-entry table, all context-entry tables,
> and all page-translation-entry tables
> . The duplicate tables contain updated physical addresses to link them together.
> . The duplicate tables are mapped into kernel virtual addresses
> in the new kernel which allows most of the existing iommu code
> to operate without change.
> . Do some minimal sanity-checks during the copy
> . Place the address of the new root-entry structure into "struct intel_iommu"
>
> * Skip setting-up new domains for 'si', 'rmrr', 'isa'
> . Translations for 'rmrr' and 'isa' ranges have been copied from the old kernel
> . This patch has not yet been tested with iommu pass-through enabled
>
> * Existing (unchanged) code near the end of dmar_init:
> . Loads the address of the (now new) root-entry structure from
> "struct intel_iommu" into the iommu hardware and does the hardware flushes.
> This changes the active translation tables from the ones in the old kernel
> to the copies in the new kernel.
> . This is legal because the translations in the two sets of tables are
> currently identical:
> Virtualization Technology for Directed I/O. Architecture Specification,
> February 2011, Rev. 1.3 (section 11.2, paragraph 2)
>
> In iommu_init_domains()
> * Mark as in-use all domain-id's from the old kernel
> . In case the new kernel contains a device that was not in the old kernel
> and a new, unused domain-id is actually needed, the bitmap will give us one.
>
> When a new domain is created for a device:
> * If (this device has a context in the old kernel)
> . Get domain-id, address-width, and IOVA ranges from the old kernel context;
> . Get address(page-entry-tables) from the copy in the new kernel;
> . And apply all of the above values to the new domain structure.
> * Else
> . Create a new domain as normal
>
>
> Bill Sumner (6):
> Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
> Crashdump-Accepting-Active-IOMMU-Utility-functions
> Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
> Crashdump-Accepting-Active-IOMMU-Copy-Translations
> Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
> Crashdump-Accepting-Active-IOMMU-Call-From-Mainline
>
> drivers/iommu/intel-iommu.c | 1293 ++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 1225 insertions(+), 68 deletions(-)
>

2014-04-08 16:15:15

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

On Mon, 2014-04-07 at 16:43 -0400, Don Dutile wrote:
>
> Additionally, a tidbit of information like "some servers force NMI's
> on DMAR faults,
> and cause a system reset, thereby, preventing a kdump to occur"
> should have been included as one reason to stop DMAR faults from
> occurring on kexec-boot,
> in addition to the fact that a flood of them can lock up a system.

How about allocating a physical scratch page, and setting up a mapping
for each device such that *every* virtual address (apart from those
listed in RMRRs, perhaps) is mapped to that same scratch page?

That way you avoid the faults, but you also avoid stray DMA to parts of
the system that you don't want to get corrupted.
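
Something like this would do it (an untested sketch, not existing
intel-iommu code; DMA_PTE_* as in dma_remapping.h). Each level of a
VT-d page table holds 512 entries, so building one table per level,
with every entry pointing at the same lower-level table, maps the
whole address space onto a single scratch page:

	static u64 *build_scratch_tables(int levels)
	{
		void *prev = (void *)get_zeroed_page(GFP_KERNEL); /* scratch page */
		u64 *table = NULL;
		int lvl, i;

		if (!prev)
			return NULL;

		for (lvl = 0; lvl < levels; lvl++) {
			table = (u64 *)get_zeroed_page(GFP_KERNEL);
			if (!table)
				return NULL; /* a real version would unwind */
			for (i = 0; i < 512; i++)
				table[i] = virt_to_phys(prev) |
					   DMA_PTE_READ | DMA_PTE_WRITE;
			prev = table;
		}
		return table; /* would become the domain's pgd */
	}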

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation



2014-04-08 18:44:12

by Donald Dutile

[permalink] [raw]
Subject: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

On 04/08/2014 12:14 PM, David Woodhouse wrote:
> On Mon, 2014-04-07 at 16:43 -0400, Don Dutile wrote:
>>
>> Additionally, a tidbit of information like "some servers force NMI's
>> on DMAR faults,
>> and cause a system reset, thereby, preventing a kdump to occur"
>> should have been included as one reason to stop DMAR faults from
>> occurring on kexec-boot,
>> in addition to the fact that a flood of them can lock up a system.
>
> How about allocating a physical scratch page, and setting up a mapping
> for each device such that *every* virtual address (apart from those
> listed in RMRRs, perhaps) is mapped to that same scratch page?
>
> That way you avoid the faults, but you also avoid stray DMA to parts of
> the system that you don't want to get corrupted.
>
+1... more isolation as the second kernel boots sounds good.

2014-04-25 18:13:20

by Sumner, William

[permalink] [raw]
Subject: RE: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

Hi Don,
It seems that we agree in the area of keeping the iommu active and using it to
isolate the legacy DMA. We also agree that the device's IO driver should be
the software that deals with the device (e.g.: resets it, etc.)

It also seems that there are several differences in our two approaches to how
the iommu is then used to accomplish this common goal.

More detailed replies are among your comments below.

On Monday 4/7/2014: Don Dutile wrote:
>On 01/10/2014 05:07 PM, Bill Sumner wrote:
>> v2->v3:
>> 1. Commented-out "#define DEBUG 1" to eliminate debug messages
>> 2. Updated the comments about changes in each version in all patches in the set.
>> 3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
>> struct as recommended by Baoquan He [[email protected]]
>> init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
>>
>> v1->v2:
>> The following series implements a fix for:
>> A kdump problem about DMA that has been discussed for a long time. That is,
>> when a kernel panics and boots into the kdump kernel, DMA started by the
>> panicked kernel is not stopped before the kdump kernel is booted and the
>> kdump kernel disables the IOMMU while this DMA continues. This causes the
>> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them
>> as physical memory addresses -- which causes the DMA to either:
>> (1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer
>> data to or from incorrect areas of memory. Often this causes the dump to fail.
>>
>> This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
>> 1. to accept the iommu hardware in an active state,
>> 2. to leave the current translations in-place so that legacy DMA will continue
>> using its current buffers until the device drivers in the crashdump kernel
>> initialize and initialize their devices,
>> 3. to use different portions of the iova address ranges for the device drivers
>> in the crashdump kernel than the iova ranges that were in-use at the time
>> of the panic.
>>
>> Advantages of this approach:
>> 1. All manipulation of the IO-device is done by the Linux device-driver
>> for that device.
>> 2. This approach behaves in a manner very similar to operation without an
>> active iommu.
>
>Sorry to be late to the game.... finally getting out of a deep hole &
>asked to look at this proposal...
>
>Along this concept -- similar to operation without an active iommu --
>have you considered the following:
>a) if (this is crash kernel), turn *off* DMAR faults;
I did look at turning off DMAR faults and, although I chose not to disable
them in my current patch submissions, I think that this is still an open area
to explore -- especially for any cases where DMAR faults were already occurring
in the panicked kernel. We may want to explore expanding the patches to cover the
case where legacy DMAR faults are still occurring in the kdump kernel.
I would recommend tackling this as a separate follow-on effort.
In testing my patches, I have seen no new DMAR faults introduced by them.
In fact, the transition is seamless, leaving no time window where (for
instance) RMRR DMA would fail.

>b) if (this is crash kernel), isolate all device DMA in IOMMU
I agree, the iommu is a great resource to isolate device DMA into safe memory
areas -- we should leave it active and use it.

>c) as second kernel configures each device, have each device use IOMMU hw-passthrough,
> i.e., the equivalent of having no IOMMU for the second, kexec'd kernel
> but, having the benefit of keeping all the other (potentially bad) devices
> sequestered / isolated, until they are initialized & re-configured in the second kernel,
> *if at all* -- note: kexec'd kernels may not enable/configure all devices that
> existed in the first kernel (Bill: I'm sure you know this, but others may not).
>
>RMRR's that were previously set up could continue to work if they are skipped in step (b),
>unless the device has gone mad/bad. In that case, re-parsing the RMRR may or may not
>clear up the issue.
>
>Additionally, a tidbit of information like "some servers force NMI's on DMAR faults,
>and cause a system reset, thereby, preventing a kdump to occur"
>should have been included as one reason to stop DMAR faults from occurring on kexec-boot,
>in addition to the fact that a flood of them can lock up a system.
If a server will "force NMI's on DMAR faults, and cause a system reset,
thereby, preventing a kdump to occur" and the panicked kernel was already
causing DMAR faults, then our dump is already doomed. If the panicked
kernel was *not* causing DMAR faults, then any kdump technique that does
not introduce any new DMAR fault conditions should be ok.
>
>Again, just turning off DMAR fault reporting for the 'if (this is crash kernel)',
>short-term workaround sounds a whole lot less expensive to implement, as well as
>'if (this is crash kernel), force hw-passthrough'.
Actually, it may not be all that much less expensive once the details of
its implementation are explored.

1. Especially with the current patches already available and somewhat tested.

2. I think that implementing what you suggest will require some manipulation
of the translation tables and their base register in the iommu -- code
that seems likely to be similar to what is in the patches. If this effort
goes in the direction you suggest, perhaps some of the code in the patches
could be of help.
>
>
>If the IO devices are borked to the point that they won't complete DMA properly,
>with or without IOMMU, the system is dead anyhow, game over.
True, and there is probably not much we can do about it.
>
>Finally, copying the previous IOMMU state to the second kernel, and hoping
>that the cause of the kernel crash wasn't an errant DMA (e.g., due to a device going bad,
>or its DMA-state being corrupted & causing an improper IO), is omitting an important failure
>case/space.
I am not just "hoping":
a. The iommu isolates all device DMA to buffers dictated by the translation
tables -- whether those tables are the ones I copied or the ones that were
created in kdump to isolate the DMA to new memory buffers. By design,
the iommu hardware very strongly limits what memory a device can access --
even in the cited case where the device has a corrupted state and may be
making all sorts of bad requests.

b. The iommu tables of course could be "hosed", which would cause problems; but
in the normal case the translation tables are updated only by a small amount
of software in the iommu driver and the dma software. This fortunately
provides a relatively solid foundation that ensures the integrity of the
tables that the iommu hardware is using for translations.

c. Since the tables the patches copy come from a panicked kernel, they are
vetted as well as we can before they are copied and used in the kdump kernel.
This happens early in kdump __init, before any of the copied tables are used
by the kdump iommu driver software. Currently, if the patches find any bad
tables, they simply bail out of taking the dump -- another area where they
might be expanded -- to do something different in this case.

d. The most vulnerable place I have noticed for bugs in the Linux kernel to
generate bad translations in the iommu tables is the short path in the
device drivers between obtaining the memory buffer from some allocator
and then obtaining an IOVA for that buffer from the dma subsystem.
If the device driver loses the buffer address and/or hands the dma subsystem
a different buffer address when it requests the IOVA, then the iommu has
no way of knowing that the buffer address is bad, creates the translation,
and bad things happen.
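
For illustration (example mine, not code from the patches; 'dev' and
'len' are placeholders), the window in (d) is simply:

	buf = kmalloc(len, GFP_KERNEL);
	/* ... driver bookkeeping; if 'buf' is lost or confused with
	 * another pointer here, nothing downstream can detect it ... */
	dma_handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	/* the dma layer and the iommu faithfully create a translation
	 * for whatever address they were handed */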

>Keeping the first-kernel DMA isolated (IOMMU on, all translations off, all DMAR faults off),
>and then allowing each device (driver) configuration to sanely reset the device &
>start a new (hw-passthrough) domain seems simpler and cleaner for this dump-and-run kernel
>effort.
As you explore the technique of turning off translations, a few questions come to mind:
1. How is the RMRR activity supported? Any time windows where the device cannot
initiate a DMA transaction will need to be minimized.

2. How does the iommu code know when the device has been reset by its driver,
so that it is now safe to program new translations into the iommu tables?
The first communication the iommu code gets from the driver is when the driver
requests a translation via the dma services. I see no ordering requirement
on the driver that it must have already reset the device at this time.
The driver may be accumulating buffers that it will later send to the device
during the reset sequence for use as work queues, status inputs, or data
buffers.

3. What happens if hardware pass-through is enabled for a device before the
driver resets it? This would seem to allow any legacy DMA to run rampant.
>
>- Don
>
>> 3. Any activity between the IO-device and its RMRR areas is handled by the
>> device-driver in the same manner as during a non-kdump boot.
>> 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
>> This supports the practice of creating a special kdump kernel without
>> drivers for any devices that are not required for taking a crashdump.
>>
>> Changes since the RFC version of this patch:
>> 1. Consolidated all of the operational code into the "copy..." functions.
>> The "process..." functions were primarily used for diagnostics and
>> exploration; however, there was a small amount of operational code that
>> used the "process..." functions.
>> This operational code has been moved into the "copy..." functions.
>>
>> 2. Removed the "Process ..." functions and the diagnostic code that ran
>> on that function set. This removed about 1/4 of the code -- which this
>> operational patch set no longer needs. These portions of the RFC patch
>> could be formatted as a separate patch and submitted independently
>> at a later date.
>>
>> 3. Re-formatted the code to the Linux Coding Standards.
>> The checkpatch script still finds some lines to complain about;
>> however most of these lines are either (1) lines that I did not change,
>> or (2) lines that only changed by adding a level of indent which pushed
>> them over 80-characters, or (3) new lines whose intent is far clearer when
>> longer than 80-characters.
>>
>> 4. Updated the remaining debug print to be significantly more flexible.
>> This allows control over the amount of debug print to the console --
>> which can vary widely.
>>
>> 5. Fixed a couple of minor bugs found by testing on a machine with a
>> very large IO configuration.
>>
>> At a high level, this code operates primarily during iommu initialization
>> and device-driver initialization
>>
>> During intel-iommu hardware initialization:
>> In intel_iommu_init(void)
>> * If (This is the crash kernel)
>> . Set flag: crashdump_accepting_active_iommu (all changes below check this)
>> . Skip disabling the iommu hardware translations
>>
>> In init_dmars()
>> * Duplicate the intel iommu translation tables from the old kernel
>> in the new kernel
>> . The root-entry table, all context-entry tables,
>> and all page-translation-entry tables
>> . The duplicate tables contain updated physical addresses to link them together.
>> . The duplicate tables are mapped into kernel virtual addresses
>> in the new kernel which allows most of the existing iommu code
>> to operate without change.
>> . Do some minimal sanity-checks during the copy
>> . Place the address of the new root-entry structure into "struct intel_iommu"
>>
>> * Skip setting-up new domains for 'si', 'rmrr', 'isa'
>> . Translations for 'rmrr' and 'isa' ranges have been copied from the old kernel
>> . This patch has not yet been tested with iommu pass-through enabled
>>
>> * Existing (unchanged) code near the end of dmar_init:
>> . Loads the address of the (now new) root-entry structure from
>> "struct intel_iommu" into the iommu hardware and does the hardware flushes.
>> This changes the active translation tables from the ones in the old kernel
>> to the copies in the new kernel.
>> . This is legal because the translations in the two sets of tables are
>> currently identical:
>> Virtualization Technology for Directed I/O. Architecture Specification,
>> February 2011, Rev. 1.3 (section 11.2, paragraph 2)
>>
>> In iommu_init_domains()
>> * Mark as in-use all domain-id's from the old kernel
>> . In case the new kernel contains a device that was not in the old kernel
>> and a new, unused domain-id is actually needed, the bitmap will give us one.
>>
>> When a new domain is created for a device:
>> * If (this device has a context in the old kernel)
>> . Get domain-id, address-width, and IOVA ranges from the old kernel context;
>> . Get address(page-entry-tables) from the copy in the new kernel;
>> . And apply all of the above values to the new domain structure.
>> * Else
>> . Create a new domain as normal
>>
>>
>> Bill Sumner (6):
>> Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
>> Crashdump-Accepting-Active-IOMMU-Utility-functions
>> Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
>> Crashdump-Accepting-Active-IOMMU-Copy-Translations
>> Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
>> Crashdump-Accepting-Active-IOMMU-Call-From-Mainline
>>
>> drivers/iommu/intel-iommu.c | 1293 ++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 1225 insertions(+), 68 deletions(-)
>>
>
>
In summary:
1. We seem to agree that keeping the iommu active and using it is a good idea,
and there are a few different ideas about how to use it. It will be
a good conversation to explore these and settle on a final solution.

2. Since the patches are currently available on a recent stable baseline,
and the technique in them has already been somewhat tested by three companies
and found to work, I would recommend completing the review, testing, and
integration of these patches into the Linux kernel. They appear to solve
the majority of the currently observed problems.

3. After that, exploration of other methodologies or extensions of these
patches could be pursued.

-- Bill