2015-06-11 13:48:01

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH 0/4] More cleanups and fixes for Intel VT-d

Hi,

here are a number of additional fixes for the Intel VT-d
driver and its behavior in a kdump kernel. The most
important fix is the iommu=pt case, which is broken without
these patches.

Regards,

Joerg

Joerg Roedel (4):
iommu/vt-d: Remove iommu_attach_domain_with_id()
iommu/vt-d: Don't consider copied context entries as present
iommu/vt-d: Remove unmap_device_dma()
iommu/vt-d: Make sure si_domain is allocated for kdump kernel

drivers/iommu/intel-iommu.c | 104 ++++++++++++++++----------------------------
1 file changed, 38 insertions(+), 66 deletions(-)

--
1.9.1


2015-06-11 13:48:11

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH 1/4] iommu/vt-d: Remove iommu_attach_domain_with_id()

From: Joerg Roedel <[email protected]>

We don't re-use domain-ids from the old kernel, so this
function is not needed anymore. Remove it.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/intel-iommu.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 58abddb..b0fd5f2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1648,16 +1648,6 @@ static int iommu_attach_domain(struct dmar_domain *domain,
return num;
}

-static int iommu_attach_domain_with_id(struct dmar_domain *domain,
- struct intel_iommu *iommu,
- int domain_number)
-{
- if (domain_number >= 0)
- return domain_number;
-
- return iommu_attach_domain(domain, iommu);
-}
-
static int iommu_attach_vm_domain(struct dmar_domain *domain,
struct intel_iommu *iommu)
{
@@ -2326,7 +2316,6 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
u16 dma_alias;
unsigned long flags;
u8 bus, devfn;
- int did = -1; /* Default to "no domain_id supplied" */

domain = find_domain(dev);
if (domain)
@@ -2361,7 +2350,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
if (!domain)
return NULL;

- domain->id = iommu_attach_domain_with_id(domain, iommu, did);
+ domain->id = iommu_attach_domain(domain, iommu);
if (domain->id < 0) {
free_domain_mem(domain);
return NULL;
--
1.9.1

2015-06-11 13:48:32

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

From: Joerg Roedel <[email protected]>

Hide the copied context entries from the IOMMU driver by
considering them as non-present. This is implemented by
setting the first AVL bit (bit 67) in the context entry to
one. If this bit is set, the context_present() function
returns false.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/intel-iommu.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b0fd5f2..f50d065 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -234,11 +234,21 @@ struct context_entry {
u64 hi;
};

-static inline bool context_present(struct context_entry *context)
+static inline bool __context_present(struct context_entry *context)
{
return (context->lo & 1);
}

+static inline bool context_copied(struct context_entry *context)
+{
+ return (((context->hi >> 3) & 0xfULL) == 1);
+}
+
+static inline bool context_present(struct context_entry *context)
+{
+ return __context_present(context) && !context_copied(context);
+}
+
static inline int context_fault_enable(struct context_entry *c)
{
return((c->lo >> 1) & 0x1);
@@ -269,6 +279,11 @@ static inline void context_set_present(struct context_entry *context)
context->lo |= 1;
}

+static inline void context_set_copied(struct context_entry *context)
+{
+ context->hi |= 1ULL << 3;
+}
+
static inline void context_set_fault_enable(struct context_entry *context)
{
context->lo &= (((u64)-1) << 2) | 1;
@@ -1919,6 +1934,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
}
}

+ /* First clear any left-overs, like a copied context entry */
+ context_clear_entry(context);
+
context_set_domain_id(context, id);

if (translation != CONTEXT_TT_PASS_THROUGH) {
@@ -4911,13 +4929,15 @@ static int copy_one_context_table(struct intel_iommu *iommu,
memcpy_fromio(&ce, &ctxt_tbl_old[devfn],
sizeof(struct context_entry));

- if (!context_present(&ce))
+ if (!__context_present(&ce))
continue;

did = context_domain_id(&ce);
if (did >=0 && did < cap_ndoms(iommu->cap))
set_bit(did, iommu->domain_ids);

+ context_set_copied(&ce);
+
ctxt_tbl[devfn] = ce;
}

--
1.9.1

2015-06-11 13:48:25

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH 3/4] iommu/vt-d: Remove unmap_device_dma()

From: Joerg Roedel <[email protected]>

This removes the unmap_device_dma() function and the
code-path calling it. That code looks like a left-over from
Bill Sumners patch-set, as it only makes sense when kdump
recovery is implemented like in his original patch-set.

With the way it is implemented now this code doesn't make
sense at all and can be removed.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/intel-iommu.c | 44 --------------------------------------------
1 file changed, 44 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f50d065..fcb9310 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -400,9 +400,6 @@ static LIST_HEAD(__iommu_remapped_mem);
*/
static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);

-static void unmap_device_dma(struct dmar_domain *domain,
- struct device *dev,
- struct intel_iommu *iommu);
static void iommu_check_pre_te_status(struct intel_iommu *iommu);
static u8 g_translation_pre_enabled;

@@ -3120,25 +3117,6 @@ static struct dmar_domain *__get_valid_domain_for_dev(struct device *dev)
return NULL;
}

- /* if in kdump kernel, we need to unmap the mapped dma pages,
- * detach this device first.
- */
- if (likely(domain_context_mapped(dev))) {
- iommu = domain_get_iommu(domain);
- if (iommu->pre_enabled_trans) {
- unmap_device_dma(domain, dev, iommu);
-
- domain = get_domain_for_dev(dev,
- DEFAULT_DOMAIN_ADDRESS_WIDTH);
- if (!domain) {
- pr_err("Allocating domain for %s failed",
- dev_name(dev));
- return NULL;
- }
- } else
- return domain;
- }
-
ret = domain_context_mapping(domain, dev, CONTEXT_TT_MULTI_LEVEL);
if (ret) {
pr_err("Domain context map for %s failed",
@@ -5063,28 +5041,6 @@ out_unmap:
return ret;
}

-static void unmap_device_dma(struct dmar_domain *domain,
- struct device *dev,
- struct intel_iommu *iommu)
-{
- struct context_entry *ce;
- struct iova *iova;
- phys_addr_t phys_addr;
- dma_addr_t dev_addr;
- struct pci_dev *pdev;
-
- pdev = to_pci_dev(dev);
- ce = iommu_context_addr(iommu, pdev->bus->number, pdev->devfn, 1);
- phys_addr = context_address_root(ce) << VTD_PAGE_SHIFT;
- dev_addr = phys_to_dma(dev, phys_addr);
-
- iova = find_iova(&domain->iovad, IOVA_PFN(dev_addr));
- if (iova)
- intel_unmap(dev, dev_addr);
-
- domain_remove_one_dev_info(domain, dev);
-}
-
static void iommu_check_pre_te_status(struct intel_iommu *iommu)
{
u32 sts;
--
1.9.1

2015-06-11 13:48:17

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH 4/4] iommu/vt-d: Make sure si_domain is allocated for kdump kernel

From: Joerg Roedel <[email protected]>

The si_domain needs to be allocated in the kdump kernel too.
Otherwise the iommu=pt case breaks with a panic of the kdump
kernel.

Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/iommu/intel-iommu.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fcb9310..e5af94a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2781,10 +2781,6 @@ static int __init iommu_prepare_static_identity_mapping(int hw)
int i;
int ret = 0;

- ret = si_domain_init(hw);
- if (ret)
- return -EFAULT;
-
for_each_pci_dev(pdev) {
ret = dev_prepare_static_identity_mapping(&pdev->dev, hw);
if (ret)
@@ -2798,7 +2794,7 @@ static int __init iommu_prepare_static_identity_mapping(int hw)

if (dev->bus != &acpi_bus_type)
continue;
-
+
adev= to_acpi_device(dev);
mutex_lock(&adev->physical_node_lock);
list_for_each_entry(pn, &adev->physical_node_list, node) {
@@ -2979,9 +2975,20 @@ static int __init init_dmars(void)
check_tylersburg_isoch();

/*
- * In the second kernel: Skip setting-up new domains for
- * si, rmrr, and the isa bus on the expectation that these
- * translations were copied from the old kernel.
+ * Make sure the si_domain is allocated in the case we are in a
+ * kdump kernel.
+ */
+ if (iommu_identity_mapping) {
+ ret = si_domain_init(hw_pass_through);
+ if (ret)
+ goto free_iommu;
+ }
+
+ /*
+ * In the second kernel: Skip setting-up new domains for rmrr
+ * and the isa bus, they will be allocated and assigned on the
+ * first DMA-API call. This is to keep the old mappings alive
+ * until the device driver of the kdump kernel takes over.
*/
if (g_translation_pre_enabled)
goto skip_new_domains_for_si_rmrr_isa;
--
1.9.1

2015-06-11 14:07:12

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, 2015-06-11 at 15:47 +0200, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> Hide the copied context entries from the IOMMU driver by
> considering them as non-present. This is implemented by
> setting the first AVL bit (bit 67) in the context entry to
> one. If this bit is set, the context_present() function
> returns false.
>
> Signed-off-by: Joerg Roedel <[email protected]>

In the extended context entry, bit 67 is the PGE bit. There are no bits
which are available to software, to my knowledge.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation


Attachments:
smime.p7s (5.56 kB)

2015-06-11 14:13:04

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, Jun 11, 2015 at 03:07:02PM +0100, David Woodhouse wrote:
> On Thu, 2015-06-11 at 15:47 +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <[email protected]>
> >
> > Hide the copied context entries from the IOMMU driver by
> > considering them as non-present. This is implemented by
> > setting the first AVL bit (bit 67) in the context entry to
> > one. If this bit is set, the context_present() function
> > returns false.
> >
> > Signed-off-by: Joerg Roedel <[email protected]>
>
> In the extended context entry, bit 67 is the PGE bit. There are no bits
> which are available to software, to my knowledge.

Ah you are right, I was looking at the context-entry format without
PASID. Then I have to solve this differently. Thanks for pointing that
out.


Joerg

2015-06-11 14:19:41

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, 2015-06-11 at 16:12 +0200, Joerg Roedel wrote:
> On Thu, Jun 11, 2015 at 03:07:02PM +0100, David Woodhouse wrote:
> > On Thu, 2015-06-11 at 15:47 +0200, Joerg Roedel wrote:
> > > From: Joerg Roedel <[email protected]>
> > >
> > > Hide the copied context entries from the IOMMU driver by
> > > considering them as non-present. This is implemented by
> > > setting the first AVL bit (bit 67) in the context entry to
> > > one. If this bit is set, the context_present() function
> > > returns false.
> > >
> > > Signed-off-by: Joerg Roedel <[email protected]>
> >
> > In the extended context entry, bit 67 is the PGE bit. There are no
> > bits
> > which are available to software, to my knowledge.
>
> Ah you are right, I was looking at the context-entry format without
> PASID. Then I have to solve this differently. Thanks for pointing
> that out

While looking for spare bits, I was pondering the FPD (fault processing
disable) bit. Which we're likely to want to set when we are seeing
fault storms from misbehaving devices, although that hasn't been
implemented yet.

That code path is going to want to keep some per-device state over and
above what's in the context entry (and it's going to tie in with
generic PCI/device error handling too).

In the short term, perhaps it's not unreasonable to set the FPD bit on
'inherited' domains? That might be what you wanted anyway?

Later on, we'll want to be able to turn that on and off at runtime and
it won't purely serve as a marker for 'inherited' domains. But as I
said, at that point we'll *need* external state anyway. So I think
that's OK, perhaps?

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-06-11 14:26:00

by Jörg Rödel

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, Jun 11, 2015 at 03:07:02PM +0100, David Woodhouse wrote:
> On Thu, 2015-06-11 at 15:47 +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <[email protected]>
> >
> > Hide the copied context entries from the IOMMU driver by
> > considering them as non-present. This is implemented by
> > setting the first AVL bit (bit 67) in the context entry to
> > one. If this bit is set, the context_present() function
> > returns false.
> >
> > Signed-off-by: Joerg Roedel <[email protected]>
>
> In the extended context entry, bit 67 is the PGE bit. There are no bits
> which are available to software, to my knowledge.

Okay, reading the VT-d spec again, the extended context-entry table seem
to exist in parallel to the current context-entry table, right? So this
patch should still work, even with extended entries present.

If that's true, then we only need to consider the kdump case while
setting up the extended context table entry (which means don't trust any
content that is already there).


Joerg

2015-06-11 14:45:02

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, 2015-06-11 at 16:25 +0200, Joerg Roedel wrote:
> On Thu, Jun 11, 2015 at 03:07:02PM +0100, David Woodhouse wrote:
> > On Thu, 2015-06-11 at 15:47 +0200, Joerg Roedel wrote:
> > > From: Joerg Roedel <[email protected]>
> > >
> > > Hide the copied context entries from the IOMMU driver by
> > > considering them as non-present. This is implemented by
> > > setting the first AVL bit (bit 67) in the context entry to
> > > one. If this bit is set, the context_present() function
> > > returns false.
> > >
> > > Signed-off-by: Joerg Roedel <[email protected]>
> >
> > In the extended context entry, bit 67 is the PGE bit. There are no bits
> > which are available to software, to my knowledge.
>
> Okay, reading the VT-d spec again, the extended context-entry table seem
> to exist in parallel to the current context-entry table, right? So this
> patch should still work, even with extended entries present.

No, the extended context-entry exists *instead* of the legacy
context-entry. Note that all the bits in the legacy context-entry are
present in precisely the same place in the extended context-entry. It's
just that the extended context-entry defines meanings for more of them.

When you enable the DMA_RTADDR_RTT bit in the Root Table Address
register, the context-entries magically double in size.

It used to look like this:


Root Table Address Register
|
V

Root Table (struct root_entry) Context Table (struct context_entry)
------------------------------ ------------------------------------
0x00: Context-table pointer -----> Context entry for 00:00.0
0x08: unused Context entry for 00:00.1
0x10: unused Context entry for 00:00.2
... ... ...
0xff8:... Context entry for ff:1f.7


Now it looks like this

Root Table Address Register
|
V

Root Table (struct root_entry) Context Table (struct context_entry)
------------------------------ ------------------------------------
0x00: Context-table ptr #1 -----> Context entry for 00:00.0: lo
0x08: Context-table ptr #2 --, Context entry for 00:00.0: hi
0x10: unused | Context entry for 00:00.1: lo
... ... | ...
0xff8:... | Context entry for 7f:1f.7: hi
|
|
| Context Table (struct context_entry)
--> ------------------------------------
0x00: Context entry for 80:00.0: lo
0x08: Context entry for 80:00.1: hi
... ...
0xff8: Context entry for ff:1f.7: hi


This was implemented in http://git.kernel.org/linus/03ecc32c52 but
*all* that patch did was allocate the second page of context-table,
fill in the appropriate new pointer in the root table, and adjust the
way we calculate the *location* of a context-entry. In 4.1 we're still
only using the same old bits of the context-entry, which as noted are
in the same place in both cases. Even the mapping from the old 2-bit T
field to the new 3-bit TT field works out that way, for now.

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-06-11 14:51:43

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, Jun 11, 2015 at 04:25:54PM +0200, Joerg Roedel wrote:
> Okay, reading the VT-d spec again, the extended context-entry table seem
> to exist in parallel to the current context-entry table, right? So this
> patch should still work, even with extended entries present.

I found section 3.4.4. of the Spec, and learned that the two tables do
not exist in parallel. The extended table ist just split into two parts
because the context entry was doubled in size.


Joerg

2015-06-11 15:12:46

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 2/4] iommu/vt-d: Don't consider copied context entries as present

On Thu, 2015-06-11 at 15:44 +0100, David Woodhouse wrote:
> It used to look like this
> ...
> Now it looks like this

That's not quite right; each root entry is for only *one* bus, and the
correct version looks more like this...

It used to look like this:

Root Table Address Register
|
V

Root Table (struct root_entry) Context Table (struct context_entry)
------------------------------ ------------------------------------
0x00: CTX TBL for bus #0 ---------> CTX entry for 00:00.0 (bits 63-0)
0x08: unused CTX entry for 00:00.0 (bits 127-64)
0x10: CTX TBL for bus #1 ----. CTX entry for 00:00.1 (bits 63-0)
... ... | ...
0xff8:unused | CTX entry for 00:1f.7 (bits 127-64)
|
|
| Context Table (struct context_entry)
| ------------------------------------
0x00: -----> CTX entry for 01:00.0 (bits 63-0)
0x00: CTX entry for 01:00.0 (bits 127-64)
0x08: CTX entry for 01:00.1 (bits 63-0)
... ...
0xff8: CTX entry for 01:1f.7 (bits 127-64)


Now it looks like this, with each context entry taking twice the amount
of space:

Root Table Address Register
|
V

Root Table (struct root_entry) Context Table (struct context_entry)
------------------------------ ------------------------------------
0x00: CTX TBL LO for bus #0 ------> CTX entry for 00:00.0 (bits 63-0)
0x08: CTX TBL HI for bus #0 --. CTX entry for 00:00.0 (bits 127-64)
0x10: CTX TBL LO for bus #1 | CTX entry for 00:00.0 (bits 191-128)
... ... | ...
0xff8:CTX TBL HI for bus #FF | CTX entry for 00:0f.7 (bits 255-192)
|
|
| Context Table (struct context_entry)
| ------------------------------------
0x00: ----> CTX entry for 00:10.0 (bits 63-0)
0x00: CTX entry for 00:10.0 (bits 127-64)
0x08: CTX entry for 00:10.0 (bits 191-128)
... ...
0xff8: CTX entry for 00:1f.7 (bits 63-0)

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-06-11 15:54:31

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 0/4] More cleanups and fixes for Intel VT-d

On Thu, 2015-06-11 at 15:47 +0200, Joerg Roedel wrote:
>
> here are a number of additional fixes for the Intel VT-d
> driver and its behavior in a kdump kernel. The most
> important fix is the iommu=pt case, which is broken without
> these patches.

It looks like the original kdump patch set has basically now been
rewritten. Perhaps the best thing to do is for Zhenhua to resubmit an
updated and clean version rather than preserving that
churn?

--
dwmw2


Attachments:
smime.p7s (5.56 kB)

2015-06-12 07:31:47

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/4] More cleanups and fixes for Intel VT-d

On Thu, Jun 11, 2015 at 04:54:26PM +0100, David Woodhouse wrote:
> It looks like the original kdump patch set has basically now been
> rewritten. Perhaps the best thing to do is for Zhenhua to resubmit an
> updated and clean version rather than preserving that
> churn?

Yeah, you are right. The history got really messy and is a horror to
backport. I'll post a clean rewrite of this soon.


Joerg

2015-06-12 20:29:33

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 0/4] More cleanups and fixes for Intel VT-d

On Thu, Jun 11, 2015 at 04:54:26PM +0100, David Woodhouse wrote:
> It looks like the original kdump patch set has basically now been
> rewritten. Perhaps the best thing to do is for Zhenhua to resubmit an
> updated and clean version rather than preserving that
> churn?

Here is a cleaned up version:

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git vt-d-kdump

It is based on v4.1-rc7 and in my testing it works as before. I also
implemented support for copying over extended root-entrys and context
tables.

Baoquan, Zhen-Hua, can you please give it a test too? I plan to rebuild
the current x86/vt-d branch with this cleaned up version.


Joerg