2009-06-18 18:05:27

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

IOMMU Identity Mapping Support: iommu_identity_mapping definition

Identity mapping for IOMMU defines a single domain to 1:1 map all pci devices
to all usable memory.

This will reduces map/unmap overhead in DMA API's and improve IOMMU performance.
On 10Gb network cards, Netperf shows no performance degradation compared to
non-IOMMU performance.

This method may lose some of DMA remapping benefits like isolation.

The first patch defines iommu_identity_mapping varialbe which controls the
identity mapping code and is 0 by default.

Signed-off-by: Fenghua Yu <[email protected]>

---

Documentation/kernel-parameters.txt | 1 +
arch/ia64/include/asm/iommu.h | 2 ++
arch/ia64/kernel/pci-dma.c | 1 +
arch/x86/include/asm/iommu.h | 1 +
arch/x86/kernel/pci-dma.c | 5 +++++
5 files changed, 10 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4b78797..6b1d7b4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -988,6 +988,7 @@ and is between 256 and 4096 characters. It is defined in the file
forcesac
soft
pt [x86, IA64]
+ identity [x86, IA64]

io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h
index 745e095..cc6d59f 100644
--- a/arch/ia64/include/asm/iommu.h
+++ b/arch/ia64/include/asm/iommu.h
@@ -11,8 +11,10 @@ extern int force_iommu, no_iommu;
extern int iommu_detected;
#ifdef CONFIG_DMAR
extern int iommu_pass_through;
+extern int iommu_identity_mapping;
#else
#define iommu_pass_through (0)
+#define iommu_identity_mapping (0)
#endif
extern void iommu_dma_init(void);
extern void machvec_init(const char *name);
diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 0569596..15b8555 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -33,6 +33,7 @@ int force_iommu __read_mostly;
#endif

int iommu_pass_through;
+int iommu_identity_mapping;

/* Dummy device used for NULL arguments (normally ISA). Better would
be probably a smaller DMA mask, but this is bug-to-bug compatible
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index fd6d21b..d2aee4a 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -7,6 +7,7 @@ extern struct dma_map_ops nommu_dma_ops;
extern int force_iommu, no_iommu;
extern int iommu_detected;
extern int iommu_pass_through;
+extern int iommu_identity_mapping;

/* 10 seconds */
#define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 049005e..489179a 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -33,6 +33,7 @@ int no_iommu __read_mostly;
int iommu_detected __read_mostly = 0;

int iommu_pass_through;
+int iommu_identity_mapping;

dma_addr_t bad_dma_address __read_mostly = 0;
EXPORT_SYMBOL(bad_dma_address);
@@ -208,6 +209,10 @@ static __init int iommu_setup(char *p)
forbid_dac = -1;
return 1;
}
+ if (!strncmp(p, "identity", 8)) {
+ iommu_identity_mapping = 1;
+ return 1;
+ }
#ifdef CONFIG_SWIOTLB
if (!strncmp(p, "soft", 4))
swiotlb = 1;


2009-06-18 18:11:47

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:

> IOMMU Identity Mapping Support: iommu_identity_mapping definition
>
> Identity mapping for IOMMU defines a single domain to 1:1 map all
> pci devices to all usable memory.

Why use VT-d at all in this case? Do you have a use-case in mind?

Cheers,
Muli

> This method may lose some of DMA remapping benefits like isolation.

I think you meant s/may/will/.

Cheers,
Muli
--
Muli Ben-Yehuda | [email protected] | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory

2009-06-18 18:14:58

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

>
>On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
>
>> IOMMU Identity Mapping Support: iommu_identity_mapping definition
>>
>> Identity mapping for IOMMU defines a single domain to 1:1 map all
>> pci devices to all usable memory.
>
>Why use VT-d at all in this case? Do you have a use-case in mind?

Some users want to use VT-d in KVM but are concerned of DMA remapping performance. They can use identity mapping and still have KVM on VT-d. They can also use pass through patch (sent out before) if hardware supports pass through.

By default, identity mapping is turned off. "iommu=identity" can turn it on

Thanks.

-Fenghua

2009-06-18 18:15:31

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

* Muli Ben-Yehuda ([email protected]) wrote:
> On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
>
> > IOMMU Identity Mapping Support: iommu_identity_mapping definition
> >
> > Identity mapping for IOMMU defines a single domain to 1:1 map all
> > pci devices to all usable memory.
>
> Why use VT-d at all in this case? Do you have a use-case in mind?

Device assignment.

2009-06-18 18:17:10

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

* Fenghua Yu ([email protected]) wrote:
> IOMMU Identity Mapping Support: iommu_identity_mapping definition
>
> Identity mapping for IOMMU defines a single domain to 1:1 map all pci devices
> to all usable memory.
>
> This will reduces map/unmap overhead in DMA API's and improve IOMMU performance.
> On 10Gb network cards, Netperf shows no performance degradation compared to
> non-IOMMU performance.
>
> This method may lose some of DMA remapping benefits like isolation.
>
> The first patch defines iommu_identity_mapping varialbe which controls the
> identity mapping code and is 0 by default.

The only real difference between "pt" and "identity" is hardware support.
We should have a single value we don't have to tell users to do different
things depending on their hardware (they won't even know what they have)
to achieve the same result.

thanks,
-chris

2009-06-18 18:28:17

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

>>
>> The first patch defines iommu_identity_mapping varialbe which controls
>the
>> identity mapping code and is 0 by default.
>
>The only real difference between "pt" and "identity" is hardware support.
>We should have a single value we don't have to tell users to do different
>things depending on their hardware (they won't even know what they have)
>to achieve the same result.

Technically keeping two separate options in base kernel might be clear and easy to understand. A distro might merge them together or have other usage model.

Thanks.

-Fenghua

2009-06-18 18:28:59

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

On Thu, Jun 18, 2009 at 11:14:51AM -0700, Yu, Fenghua wrote:
> >
> >On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
> >
> >> IOMMU Identity Mapping Support: iommu_identity_mapping definition
> >>
> >> Identity mapping for IOMMU defines a single domain to 1:1 map all
> >> pci devices to all usable memory.
> >
> >Why use VT-d at all in this case? Do you have a use-case in mind?
>
> Some users want to use VT-d in KVM but are concerned of DMA
> remapping performance. They can use identity mapping and still have
> KVM on VT-d. They can also use pass through patch (sent out before)
> if hardware supports pass through.

Sorry, I must be missing something. For the normal device assignment
case, we want the IOMMU page tables to have gpa->hpa mappings rather
than the 1-1 identity mapping. How do you envision the 1-1 mapping
being used in the device assignment case?

Cheers,
Muli
--
Muli Ben-Yehuda | [email protected] | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory

2009-06-18 18:33:16

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

* Muli Ben-Yehuda ([email protected]) wrote:
> On Thu, Jun 18, 2009 at 11:14:51AM -0700, Yu, Fenghua wrote:
> > >
> > >On Thu, Jun 18, 2009 at 11:05:14AM -0700, Fenghua Yu wrote:
> > >
> > >> IOMMU Identity Mapping Support: iommu_identity_mapping definition
> > >>
> > >> Identity mapping for IOMMU defines a single domain to 1:1 map all
> > >> pci devices to all usable memory.
> > >
> > >Why use VT-d at all in this case? Do you have a use-case in mind?
> >
> > Some users want to use VT-d in KVM but are concerned of DMA
> > remapping performance. They can use identity mapping and still have
> > KVM on VT-d. They can also use pass through patch (sent out before)
> > if hardware supports pass through.
>
> Sorry, I must be missing something. For the normal device assignment
> case, we want the IOMMU page tables to have gpa->hpa mappings rather
> than the 1-1 identity mapping. How do you envision the 1-1 mapping
> being used in the device assignment case?

The 1-1 mapping is for all the host devices _not_ assigned to guests.
To eliminate the i/o overhead imposed on all guests not using an
assigned device or from i/o from host.

It's just the same as VT-d PassThrough mode for hardware that doesn't
support it.

thanks,
-chris

2009-06-18 18:35:59

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

* Yu, Fenghua ([email protected]) wrote:
> >>
> >> The first patch defines iommu_identity_mapping varialbe which controls
> >the
> >> identity mapping code and is 0 by default.
> >
> >The only real difference between "pt" and "identity" is hardware support.
> >We should have a single value we don't have to tell users to do different
> >things depending on their hardware (they won't even know what they have)
> >to achieve the same result.
>
> Technically keeping two separate options in base kernel might be clear and easy to understand. A distro might merge them together or have other usage model.

This pushes burden to distros and users for no obvious gain. Just like
queued invalidation vs. register based invalidation...it's a hardware
detail that users don't really care that much about, they just care
about user visible functionality.

thanks,
-chris

2009-06-18 18:44:27

by Muli Ben-Yehuda

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

On Thu, Jun 18, 2009 at 11:31:21AM -0700, Chris Wright wrote:

> The 1-1 mapping is for all the host devices _not_ assigned to
> guests. To eliminate the i/o overhead imposed on all guests not
> using an assigned device or from i/o from host.
>
> It's just the same as VT-d PassThrough mode for hardware that
> doesn't support it.

Ok, that makes sense. Thanks, Chris. However, that doesn't appear to
be what the patch does---unless I'm misreading, if
iommu_identity_mapping is set, *all* devices get identity
mapping. Instead of a global command line option, we need to provide a
way to enable/disable pt or identity mapping (I agree that the user
shouldn't know or care which is used, the kernel should pick the best
one automatically) on a per BDF basis.

Cheers,
Muli
--
Muli Ben-Yehuda | [email protected] | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Haifa Research Laboratory

2009-06-18 18:51:09

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

>
>On Thu, Jun 18, 2009 at 11:31:21AM -0700, Chris Wright wrote:
>
>> The 1-1 mapping is for all the host devices _not_ assigned to
>> guests. To eliminate the i/o overhead imposed on all guests not
>> using an assigned device or from i/o from host.
>>
>> It's just the same as VT-d PassThrough mode for hardware that
>> doesn't support it.
>
>Ok, that makes sense. Thanks, Chris. However, that doesn't appear to
>be what the patch does---unless I'm misreading, if
>iommu_identity_mapping is set, *all* devices get identity
>mapping. Instead of a global command line option, we need to provide a
>way to enable/disable pt or identity mapping (I agree that the user
>shouldn't know or care which is used, the kernel should pick the best
>one automatically) on a per BDF basis.

The device in kvm is attached to a kvm domain by intel_iommu_attach_device(). In this function, domain_context_mapping() changes the device's domain to kvm domain from si_domain.

Actually there is a bug when a device is detached from kvm domain...in intel_iommu_detach_device(), I should assigned the device back to the si_domain. So the device can be used in native again.

Thanks.

-Fenghua

2009-06-18 18:53:17

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

* Muli Ben-Yehuda ([email protected]) wrote:
> On Thu, Jun 18, 2009 at 11:31:21AM -0700, Chris Wright wrote:
>
> > The 1-1 mapping is for all the host devices _not_ assigned to
> > guests. To eliminate the i/o overhead imposed on all guests not
> > using an assigned device or from i/o from host.
> >
> > It's just the same as VT-d PassThrough mode for hardware that
> > doesn't support it.
>
> Ok, that makes sense. Thanks, Chris. However, that doesn't appear to
> be what the patch does---unless I'm misreading, if
> iommu_identity_mapping is set, *all* devices get identity

Correct (and same as pt mode). Of course, a subsequent device assignement
will set that mapping to gpa<->hpa, so you'll get proper isolation for
the assigned device. Essentially it's saying the only reason you cared
about the IOMMU was for device assignment.

> mapping. Instead of a global command line option, we need to provide a
> way to enable/disable pt or identity mapping (I agree that the user
> shouldn't know or care which is used, the kernel should pick the best
> one automatically) on a per BDF basis.

Yeah, that's a possible enhancement. Although once you've disabled one
device's isolation you've created a hole...

thanks,
-chris

2009-06-18 19:10:06

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

>> mapping. Instead of a global command line option, we need to provide a
>> way to enable/disable pt or identity mapping (I agree that the user
>> shouldn't know or care which is used, the kernel should pick the best
>> one automatically) on a per BDF basis.
>
>Yeah, that's a possible enhancement. Although once you've disabled one
>device's isolation you've created a hole...
>
The isolation hole is not much better than a global 1:1 mapping. If one device is 1:1 mapped to all memory. All of other devices can access part of this device's DMA range. And this device can access all of other devices' DMA range. Only better place than a global 1:1 mapping is all of other devices are isolated from each other.

Thanks.

-Fenghua

2009-06-25 00:38:48

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

IA64 Compilation error fix for Intel IOMMU identity mapping

The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
compilation error on IA64 when IOMMU is configured:

drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory

This patch fixes this compilation error on IA64. It defines architecture
dependent memory map walk functions on x86 and ia64 seperately to set up
identity mapping for all devices.

Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Tony Luck <[email protected]>

---

This patch is against the latest upstream tree.

arch/ia64/kernel/pci-dma.c | 27 +++++++++++++++++++++++++++
arch/x86/kernel/pci-dma.c | 22 ++++++++++++++++++++++
drivers/pci/intel-iommu.c | 38 ++++++++++++++++++++++----------------
include/linux/intel-iommu.h | 4 ++++
4 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 0569596..9cb3700 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -10,7 +10,9 @@
#include <linux/dmar.h>
#include <asm/iommu.h>
#include <asm/machvec.h>
+#include <linux/efi.h>
#include <linux/dma-mapping.h>
+#include <linux/intel-iommu.h>

#include <asm/system.h>

@@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
#endif
}

+static int __initdata identity_mapped = 1;
+
+static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
+{
+ int ret;
+
+ ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ identity_mapped = 0;
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int __init iommu_setup_identity_map(struct pci_dev *pdev)
+{
+ efi_memmap_walk(__iommu_prepare_identity_map, pdev);
+ if (!identity_mapped)
+ return -EFAULT;
+
+ return 0;
+}
+
#endif
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4763047..775ece5 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -10,6 +10,7 @@
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/amd_iommu.h>
+#include <linux/intel-iommu.h>

static int forbid_dac __read_mostly;

@@ -314,3 +315,24 @@ static __devinit void via_no_dac(struct pci_dev *dev)
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
#endif
+
+int iommu_setup_identity_map(struct pci_dev *pdev)
+{
+ int i;
+ int ret;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+
+ if (ei->type == E820_RAM) {
+ ret = iommu_prepare_identity_map(pdev,
+ ei->addr, ei->addr + ei->size);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ return -EFAULT;
+ }
+ }
+ }
+
+ return 0;
+}
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..009a79a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
#include <linux/sysdev.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
-#include <asm/e820.h>
#include "pci.h"

#define ROOT_SIZE VTD_PAGE_SIZE
@@ -1845,9 +1844,8 @@ error:

static int iommu_identity_mapping;

-static int iommu_prepare_identity_map(struct pci_dev *pdev,
- unsigned long long start,
- unsigned long long end)
+int iommu_prepare_identity_map(struct pci_dev *pdev, unsigned long long start,
+ unsigned long long end)
{
struct dmar_domain *domain;
unsigned long size;
@@ -2081,8 +2079,8 @@ static int domain_add_dev_info(struct dmar_domain *domain,

static int iommu_prepare_static_identity_mapping(void)
{
- int i;
struct pci_dev *pdev = NULL;
+ int first_pdev = 1;
int ret;

ret = si_domain_init();
@@ -2091,17 +2089,25 @@ static int iommu_prepare_static_identity_mapping(void)

printk(KERN_INFO "IOMMU: Setting identity map:\n");
for_each_pci_dev(pdev) {
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
-
- if (ei->type == E820_RAM) {
- ret = iommu_prepare_identity_map(pdev,
- ei->addr, ei->addr + ei->size);
- if (ret) {
- printk(KERN_INFO "1:1 mapping to one domain failed.\n");
- return -EFAULT;
- }
- }
+ if (first_pdev) {
+ /*
+ * first pdev sets up the whole page table for
+ * si_domain.
+ */
+ first_pdev = 0;
+ ret = iommu_setup_identity_map(pdev);
+ if (ret)
+ return ret;
+ } else {
+ /*
+ * following pdev's just use the page table.
+ */
+ printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
+ pci_name(pdev));
+ ret = domain_context_mapping(si_domain, pdev,
+ CONTEXT_TT_MULTI_LEVEL);
+ if (ret)
+ return ret;
}
ret = domain_add_dev_info(si_domain, pdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 482dc91..5d470cf 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -359,5 +359,9 @@ extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
u64 addr, unsigned mask);

extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+extern int iommu_prepare_identity_map(struct pci_dev *pdev,
+ unsigned long long start,
+ unsigned long long end);

+extern int iommu_setup_identity_map(struct pci_dev *pdev);
#endif

2009-06-25 01:01:39

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Wed, 24 Jun 2009 17:38:41 -0700
Fenghua Yu <[email protected]> wrote:

> IA64 Compilation error fix for Intel IOMMU identity mapping
>
> The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
> compilation error on IA64 when IOMMU is configured:
>
> drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory
>
> This patch fixes this compilation error on IA64. It defines architecture
> dependent memory map walk functions on x86 and ia64 seperately to set up
> identity mapping for all devices.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Acked-by: Tony Luck <[email protected]>
>
> ---
>
> This patch is against the latest upstream tree.
>
> arch/ia64/kernel/pci-dma.c | 27 +++++++++++++++++++++++++++
> arch/x86/kernel/pci-dma.c | 22 ++++++++++++++++++++++
> drivers/pci/intel-iommu.c | 38 ++++++++++++++++++++++----------------
> include/linux/intel-iommu.h | 4 ++++
> 4 files changed, 75 insertions(+), 16 deletions(-)
>
> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
> index 0569596..9cb3700 100644
> --- a/arch/ia64/kernel/pci-dma.c
> +++ b/arch/ia64/kernel/pci-dma.c
> @@ -10,7 +10,9 @@
> #include <linux/dmar.h>
> #include <asm/iommu.h>
> #include <asm/machvec.h>
> +#include <linux/efi.h>
> #include <linux/dma-mapping.h>
> +#include <linux/intel-iommu.h>
>
> #include <asm/system.h>
>
> @@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
> #endif
> }
>
> +static int __initdata identity_mapped = 1;
> +
> +static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
> +{
> + int ret;
> +
> + ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
> + if (ret) {
> + printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> + identity_mapped = 0;
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +int __init iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> + efi_memmap_walk(__iommu_prepare_identity_map, pdev);
> + if (!identity_mapped)
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> #endif
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 4763047..775ece5 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -10,6 +10,7 @@
> #include <asm/gart.h>
> #include <asm/calgary.h>
> #include <asm/amd_iommu.h>
> +#include <linux/intel-iommu.h>
>
> static int forbid_dac __read_mostly;
>
> @@ -314,3 +315,24 @@ static __devinit void via_no_dac(struct pci_dev *dev)
> }
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
> #endif
> +
> +int iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> + int i;
> + int ret;
> +
> + for (i = 0; i < e820.nr_map; i++) {
> + struct e820entry *ei = &e820.map[i];
> +
> + if (ei->type == E820_RAM) {
> + ret = iommu_prepare_identity_map(pdev,
> + ei->addr, ei->addr + ei->size);

iommu_prepare_identity_map() is in drivers/pci/intel-iommu.c. So this
patch breaks x86_64 build on !CONFIG_DMAR. IA64 works since
arch/ia64/kernel/pci-dma.c is only build on CONFIG_DMAR though.

2009-06-25 04:16:23

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

IA64 compilation error fix for Intel IOMMU identity mapping

The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
compilation error on IA64 when IOMMU is configured:

drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory

This patch fixes this compilation error on IA64. It defines architecture
dependent memory map walk functions on x86 and ia64 seperately to set up
identity mapping for all devices.

Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Tony Luck <[email protected]>

---

This patch is against the latest upstream tree.

arch/ia64/kernel/pci-dma.c | 27 +++++++++++++++++++++++++++
arch/x86/kernel/pci-dma.c | 29 +++++++++++++++++++++++++++++
drivers/pci/intel-iommu.c | 38 ++++++++++++++++++++++----------------
include/linux/intel-iommu.h | 4 ++++
4 files changed, 82 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index 0569596..9cb3700 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -10,7 +10,9 @@
#include <linux/dmar.h>
#include <asm/iommu.h>
#include <asm/machvec.h>
+#include <linux/efi.h>
#include <linux/dma-mapping.h>
+#include <linux/intel-iommu.h>

#include <asm/system.h>

@@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
#endif
}

+static int __initdata identity_mapped = 1;
+
+static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
+{
+ int ret;
+
+ ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ identity_mapped = 0;
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+int __init iommu_setup_identity_map(struct pci_dev *pdev)
+{
+ efi_memmap_walk(__iommu_prepare_identity_map, pdev);
+ if (!identity_mapped)
+ return -EFAULT;
+
+ return 0;
+}
+
#endif
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 4763047..16dcc61 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -10,6 +10,7 @@
#include <asm/gart.h>
#include <asm/calgary.h>
#include <asm/amd_iommu.h>
+#include <linux/intel-iommu.h>

static int forbid_dac __read_mostly;

@@ -314,3 +315,31 @@ static __devinit void via_no_dac(struct pci_dev *dev)
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
#endif
+
+#ifdef CONFIG_DMAR
+int iommu_setup_identity_map(struct pci_dev *pdev)
+{
+ int i;
+ int ret;
+
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+
+ if (ei->type == E820_RAM) {
+ ret = iommu_prepare_identity_map(pdev,
+ ei->addr, ei->addr + ei->size);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ return -EFAULT;
+ }
+ }
+ }
+
+ return 0;
+}
+#else
+int iommu_setup_identity_map(struct pci_dev *pdev)
+{
+ return 0;
+}
+#endif
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..009a79a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
#include <linux/sysdev.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
-#include <asm/e820.h>
#include "pci.h"

#define ROOT_SIZE VTD_PAGE_SIZE
@@ -1845,9 +1844,8 @@ error:

static int iommu_identity_mapping;

-static int iommu_prepare_identity_map(struct pci_dev *pdev,
- unsigned long long start,
- unsigned long long end)
+int iommu_prepare_identity_map(struct pci_dev *pdev, unsigned long long start,
+ unsigned long long end)
{
struct dmar_domain *domain;
unsigned long size;
@@ -2081,8 +2079,8 @@ static int domain_add_dev_info(struct dmar_domain *domain,

static int iommu_prepare_static_identity_mapping(void)
{
- int i;
struct pci_dev *pdev = NULL;
+ int first_pdev = 1;
int ret;

ret = si_domain_init();
@@ -2091,17 +2089,25 @@ static int iommu_prepare_static_identity_mapping(void)

printk(KERN_INFO "IOMMU: Setting identity map:\n");
for_each_pci_dev(pdev) {
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
-
- if (ei->type == E820_RAM) {
- ret = iommu_prepare_identity_map(pdev,
- ei->addr, ei->addr + ei->size);
- if (ret) {
- printk(KERN_INFO "1:1 mapping to one domain failed.\n");
- return -EFAULT;
- }
- }
+ if (first_pdev) {
+ /*
+ * first pdev sets up the whole page table for
+ * si_domain.
+ */
+ first_pdev = 0;
+ ret = iommu_setup_identity_map(pdev);
+ if (ret)
+ return ret;
+ } else {
+ /*
+ * following pdev's just use the page table.
+ */
+ printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
+ pci_name(pdev));
+ ret = domain_context_mapping(si_domain, pdev,
+ CONTEXT_TT_MULTI_LEVEL);
+ if (ret)
+ return ret;
}
ret = domain_add_dev_info(si_domain, pdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 482dc91..5d470cf 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -359,5 +359,9 @@ extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
u64 addr, unsigned mask);

extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
+extern int iommu_prepare_identity_map(struct pci_dev *pdev,
+ unsigned long long start,
+ unsigned long long end);

+extern int iommu_setup_identity_map(struct pci_dev *pdev);
#endif

2009-06-25 04:49:21

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Wed, 24 Jun 2009 21:16:05 -0700
Fenghua Yu <[email protected]> wrote:

> IA64 compilation error fix for Intel IOMMU identity mapping
>
> The Intel IOMMU identity mapping uses e820 to walk the memory map. This causes
> compilation error on IA64 when IOMMU is configured:
>
> drivers/pci/intel-iommu.c:42:22: error: asm/e820.h: No such file or directory
>
> This patch fixes this compilation error on IA64. It defines architecture
> dependent memory map walk functions on x86 and ia64 seperately to set up
> identity mapping for all devices.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Acked-by: Tony Luck <[email protected]>
>
> ---
>
> This patch is against the latest upstream tree.
>
> arch/ia64/kernel/pci-dma.c | 27 +++++++++++++++++++++++++++
> arch/x86/kernel/pci-dma.c | 29 +++++++++++++++++++++++++++++
> drivers/pci/intel-iommu.c | 38 ++++++++++++++++++++++----------------
> include/linux/intel-iommu.h | 4 ++++
> 4 files changed, 82 insertions(+), 16 deletions(-)
>
> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
> index 0569596..9cb3700 100644
> --- a/arch/ia64/kernel/pci-dma.c
> +++ b/arch/ia64/kernel/pci-dma.c
> @@ -10,7 +10,9 @@
> #include <linux/dmar.h>
> #include <asm/iommu.h>
> #include <asm/machvec.h>
> +#include <linux/efi.h>
> #include <linux/dma-mapping.h>
> +#include <linux/intel-iommu.h>
>
> #include <asm/system.h>
>
> @@ -122,4 +124,29 @@ void __init pci_iommu_alloc(void)
> #endif
> }
>
> +static int __initdata identity_mapped = 1;
> +
> +static int __init __iommu_prepare_identity_map(u64 start, u64 end, void *pdev)
> +{
> + int ret;
> +
> + ret = iommu_prepare_identity_map((struct pci_dev *)pdev, start, end);
> + if (ret) {
> + printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> + identity_mapped = 0;
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +int __init iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> + efi_memmap_walk(__iommu_prepare_identity_map, pdev);
> + if (!identity_mapped)
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> #endif
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 4763047..16dcc61 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -10,6 +10,7 @@
> #include <asm/gart.h>
> #include <asm/calgary.h>
> #include <asm/amd_iommu.h>
> +#include <linux/intel-iommu.h>
>
> static int forbid_dac __read_mostly;
>
> @@ -314,3 +315,31 @@ static __devinit void via_no_dac(struct pci_dev *dev)
> }
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID, via_no_dac);
> #endif
> +
> +#ifdef CONFIG_DMAR
> +int iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> + int i;
> + int ret;
> +
> + for (i = 0; i < e820.nr_map; i++) {
> + struct e820entry *ei = &e820.map[i];
> +
> + if (ei->type == E820_RAM) {
> + ret = iommu_prepare_identity_map(pdev,
> + ei->addr, ei->addr + ei->size);
> + if (ret) {
> + printk(KERN_INFO "1:1 mapping to one domain failed.\n");
> + return -EFAULT;
> + }
> + }
> + }
> +
> + return 0;
> +}
> +#else
> +int iommu_setup_identity_map(struct pci_dev *pdev)
> +{
> + return 0;
> +}
> +#endif

I don't think that we need #else part.

This patch is really ugly; adds another ifdef and VT-D specific code
to the generic place (arch/x86/kernel/pci-dma.c).

However, I guess that we need to live with this for now since this
fixes the build error.

Another complaint from me is that, IIRC, the patch causes this build
error was posted was posted first during the merge window (18 June)
and merged few days later without properly reviewed (or compile
tested).

2009-06-25 07:11:57

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Thu, 2009-06-25 at 13:48 +0900, FUJITA Tomonori wrote:
> This patch is really ugly; adds another ifdef and VT-D specific code
> to the generic place (arch/x86/kernel/pci-dma.c).
>
> However, I guess that we need to live with this for now since this
> fixes the build error.

It raises the question: Why are we using firmware-specific interfaces to
list the available memory -- can't we get that from somewhere _generic_?

The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
the better off we'll be.

> Another complaint from me is that, IIRC, the patch causes this build
> error was posted was posted first during the merge window (18 June)
> and merged few days later without properly reviewed (or compile
> tested).

I apologise for that. I got to it rather late, just as I was reviewing
my outstanding code for Linus to pull. It _had_ been reviewed though --
this was the second incarnation of the patch, after review.

I didn't include it in my original pull request, and instead let it sit
in linux-next for a day (which is not long, I know, but better than
nothing). I then spent that day testing it myself, and tracking down a
bug in it.

I sent the fixed version to Linus at the end of the day -- I _thought_
I would have been told of any IA64 build failures in linux-next by then,
even if the original hadn't been tested on IA64 -- evidently I was
wrong.

I really must set myself up an IA64 cross-compiler. Every time I've
looked at that so far I've got distracted into the whole "why is
GCC/glibc build so incestuously broken" quagmire -- but I only need gcc,
so I'll steel myself to ignore the braindamage and just build a simple
compiler without glibc support.

Better still would be if I could find some IA64 hardware with this
IOMMU. Would kind of help with maintenance...

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-06-25 21:52:47

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
> It raises the question: Why are we using firmware-specific interfaces to
> list the available memory -- can't we get that from somewhere _generic_?
>
> The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
> the better off we'll be.

Does this work everywhere... ?

(Fenghua, why are we doing the whole setup per pci dev anyway -- why not
set up the page tables once and point all devices at the same page
tables?)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..1f0830a 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
#include <linux/sysdev.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
-#include <asm/e820.h>
#include "pci.h"

#define ROOT_SIZE VTD_PAGE_SIZE
@@ -2081,7 +2080,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,

static int iommu_prepare_static_identity_mapping(void)
{
- int i;
struct pci_dev *pdev = NULL;
int ret;

@@ -2091,16 +2089,18 @@ static int iommu_prepare_static_identity_mapping(void)

printk(KERN_INFO "IOMMU: Setting identity map:\n");
for_each_pci_dev(pdev) {
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
-
- if (ei->type == E820_RAM) {
- ret = iommu_prepare_identity_map(pdev,
- ei->addr, ei->addr + ei->size);
- if (ret) {
- printk(KERN_INFO "1:1 mapping to one domain failed.\n");
- return -EFAULT;
- }
+ struct zone *zone;
+
+ for_each_populated_zone(zone) {
+ unsigned long zone_end_pfn = zone->zone_start_pfn +
+ zone->spanned_pages;
+
+ ret = iommu_prepare_identity_map(pdev,
+ (uint64_t)zone->zone_start_pfn << PAGE_SHIFT,
+ (uint64_t)zone_end_pfn << PAGE_SHIFT);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ return -EFAULT;
}
}
ret = domain_add_dev_info(si_domain, pdev);



--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-06-25 21:56:27

by Fenghua Yu

[permalink] [raw]
Subject: RE: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

>-----Original Message-----
>From: David Woodhouse [mailto:[email protected]]
>Sent: Thursday, June 25, 2009 2:52 PM
>To: FUJITA Tomonori
>Cc: Yu, Fenghua; [email protected]; [email protected];
>[email protected]; Luck, Tony; [email protected];
>[email protected]; [email protected]
>Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity
>Mapping Support
>
>On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
>> It raises the question: Why are we using firmware-specific interfaces to
>> list the available memory -- can't we get that from somewhere _generic_?
>>
>> The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
>> the better off we'll be.
>
>Does this work everywhere... ?
>
>(Fenghua, why are we doing the whole setup per pci dev anyway -- why not
>set up the page tables once and point all devices at the same page
>tables?)
>

That's what the ia64 fix patch does...the first pci device set up the page table. Then all following pci devices use that page table by setting up the context mapping.

+ if (first_pdev) {
+ /*
+ * first pdev sets up the whole page table for
+ * si_domain.
+ */
+ first_pdev = 0;
+ ret = iommu_setup_identity_map(pdev);
+ if (ret)
+ return ret;
+ } else {
+ /*
+ * following pdev's just use the page table.
+ */
+ printk(KERN_INFO "IOMMU: identity mapping for device %s\n",
+ pci_name(pdev));
+ ret = domain_context_mapping(si_domain, pdev,
+ CONTEXT_TT_MULTI_LEVEL);
+ if (ret)
Thanks.

-Fenghua

2009-06-25 22:01:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support



On Thu, 25 Jun 2009, David Woodhouse wrote:

> On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
> > It raises the question: Why are we using firmware-specific interfaces to
> > list the available memory -- can't we get that from somewhere _generic_?
> >
> > The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
> > the better off we'll be.
>
> Does this work everywhere... ?

Whimper.

Please pleas please please.. Pretty please, let this work (instead of
duplicating the insanity).

Linus

2009-06-25 22:46:42

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Thu, Jun 25, 2009 at 3:00 PM, Linus
Torvalds<[email protected]> wrote:
>> Does this work everywhere... ?
>
> Whimper.
>
> Please pleas please please.. Pretty please, let this work (instead of
> duplicating the insanity).

Well it compiles on all the ia64 configurations (with and without
iommu configured). So that is a big step forward. It boots on
a machine w/o an iommu. Fenghua will have to test on his machine
that has an iommu to confirm that this works for ia64.

-Tony

2009-06-25 23:44:51

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

* David Woodhouse ([email protected]) wrote:
> On Thu, 2009-06-25 at 08:11 +0100, David Woodhouse wrote:
> > It raises the question: Why are we using firmware-specific interfaces to
> > list the available memory -- can't we get that from somewhere _generic_?
> >
> > The less we tie our code to these crappy BIOS, EFI and ACPI interfaces,
> > the better off we'll be.
>
> Does this work everywhere... ?

Why don't we just use what's already there? That should already be
working w/ IA-64.

Then it's clearer for later consolidation (there's no reason to have all
these different copies of same page tables).

thanks,
-chris
---
From: Chris Wright <[email protected]>
Subject: [PATCH] intel-iommu: fix Identity Mapping to be arch independent

Drop the e820 scanning and use existing function for finding valid RAM
regions to add to 1:1 mapping.

Signed-off-by: Chris Wright <[email protected]>
---
drivers/pci/intel-iommu.c | 17 ++++-------------
1 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..151e7d9 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
#include <linux/sysdev.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
-#include <asm/e820.h>
#include "pci.h"

#define ROOT_SIZE VTD_PAGE_SIZE
@@ -2081,7 +2080,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,

static int iommu_prepare_static_identity_mapping(void)
{
- int i;
struct pci_dev *pdev = NULL;
int ret;

@@ -2091,17 +2089,10 @@ static int iommu_prepare_static_identity_mapping(void)

printk(KERN_INFO "IOMMU: Setting identity map:\n");
for_each_pci_dev(pdev) {
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
-
- if (ei->type == E820_RAM) {
- ret = iommu_prepare_identity_map(pdev,
- ei->addr, ei->addr + ei->size);
- if (ret) {
- printk(KERN_INFO "1:1 mapping to one domain failed.\n");
- return -EFAULT;
- }
- }
+ ret = iommu_prepare_with_active_regions(pdev);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ return -EFAULT;
}
ret = domain_add_dev_info(si_domain, pdev);
if (ret)

2009-06-26 01:36:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support



On Thu, 25 Jun 2009, Chris Wright wrote:
>
> From: Chris Wright <[email protected]>
> Subject: [PATCH] intel-iommu: fix Identity Mapping to be arch independent
>
> Drop the e820 scanning and use existing function for finding valid RAM
> regions to add to 1:1 mapping.
>
> Signed-off-by: Chris Wright <[email protected]>

Is this tested on something that actually uses IOMMU?

If so, of the patches I've seen so far, this is obviously my favorite.

Linus

2009-06-26 01:54:01

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

* Linus Torvalds ([email protected]) wrote:
> On Thu, 25 Jun 2009, Chris Wright wrote:
> >
> > From: Chris Wright <[email protected]>
> > Subject: [PATCH] intel-iommu: fix Identity Mapping to be arch independent
> >
> > Drop the e820 scanning and use existing function for finding valid RAM
> > regions to add to 1:1 mapping.
> >
> > Signed-off-by: Chris Wright <[email protected]>
>
> Is this tested on something that actually uses IOMMU?
>
> If so, of the patches I've seen so far, this is obviously my favorite.

I tested it on x86; works fine and creates a slightly tighter set of page
tables (matching to e820). But I don't have an IA-64 box w/ an IOMMU.

It does have one problem, however, the graphics work around code isn't
compiled in on IA-64, so it needs:

(a) a small ifdef move
(b) to be tested on IA-64, since my hypothesis was based on the fact that
it was already being used on IA-64

Here's v2, fixing (a) above (compile tested on IA-64), still needing (b).

thanks,
-chris
---
From: Chris Wright <[email protected]>
Subject: [PATCH v2] intel-iommu: fix Identity Mapping to be arch independent

Drop the e820 scanning and use existing function for finding valid RAM
regions to add to 1:1 mapping.

Signed-off-by: Chris Wright <[email protected]>
---
drivers/pci/intel-iommu.c | 19 +++++--------------
1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index e53eacd..420afa8 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -39,7 +39,6 @@
#include <linux/sysdev.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
-#include <asm/e820.h>
#include "pci.h"

#define ROOT_SIZE VTD_PAGE_SIZE
@@ -1908,7 +1907,6 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
rmrr->end_address + 1);
}

-#ifdef CONFIG_DMAR_GFX_WA
struct iommu_prepare_data {
struct pci_dev *pdev;
int ret;
@@ -1943,6 +1941,7 @@ static int __init iommu_prepare_with_active_regions(struct pci_dev *pdev)
return data.ret;
}

+#ifdef CONFIG_DMAR_GFX_WA
static void __init iommu_prepare_gfx_mapping(void)
{
struct pci_dev *pdev = NULL;
@@ -2081,7 +2080,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,

static int iommu_prepare_static_identity_mapping(void)
{
- int i;
struct pci_dev *pdev = NULL;
int ret;

@@ -2091,17 +2089,10 @@ static int iommu_prepare_static_identity_mapping(void)

printk(KERN_INFO "IOMMU: Setting identity map:\n");
for_each_pci_dev(pdev) {
- for (i = 0; i < e820.nr_map; i++) {
- struct e820entry *ei = &e820.map[i];
-
- if (ei->type == E820_RAM) {
- ret = iommu_prepare_identity_map(pdev,
- ei->addr, ei->addr + ei->size);
- if (ret) {
- printk(KERN_INFO "1:1 mapping to one domain failed.\n");
- return -EFAULT;
- }
- }
+ ret = iommu_prepare_with_active_regions(pdev);
+ if (ret) {
+ printk(KERN_INFO "1:1 mapping to one domain failed.\n");
+ return -EFAULT;
}
ret = domain_add_dev_info(si_domain, pdev);
if (ret)

2009-06-26 02:01:08

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support



On Thu, 25 Jun 2009, Chris Wright wrote:
> I tested it on x86; works fine and creates a slightly tighter set of page
> tables (matching to e820). But I don't have an IA-64 box w/ an IOMMU.
>
> It does have one problem, however, the graphics work around code isn't
> compiled in on IA-64, so it needs:

Ok. Sounds like what I want is to get this patch through the ia64 people,
after it has gotten some testing there. Since ia64 is where the current
kernel fails anyway, that sounds like the motivation will be there too.

And there's little point in me merging it until it's been tested to fix
the actual problem we have now.

Thanks,

Linus

2009-06-26 02:10:52

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

* Linus Torvalds ([email protected]) wrote:
> Ok. Sounds like what I want is to get this patch through the ia64 people,
> after it has gotten some testing there. Since ia64 is where the current
> kernel fails anyway, that sounds like the motivation will be there too.

Yup, I agree.

thanks,
-chris

2009-06-26 11:54:03

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Thu, 25 Jun 2009, Chris Wright wrote:

> * Linus Torvalds ([email protected]) wrote:
>> Ok. Sounds like what I want is to get this patch through the ia64 people,
>> after it has gotten some testing there. Since ia64 is where the current
>> kernel fails anyway, that sounds like the motivation will be there too.
>
> Yup, I agree.

Tony, please could you test what's in git://git.infradead.org/iommu-2.6.git

It builds, but we'd like to check that it works correctly with iommu=pt on
IA64.

(I'm leaving Fenghua's performance improvement for later.)

--
dwmw2

2009-06-26 18:21:55

by David Woodhouse

[permalink] [raw]
Subject: RE: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Thu, 2009-06-25 at 14:56 -0700, Yu, Fenghua wrote:
>
> >(Fenghua, why are we doing the whole setup per pci dev anyway -- why
> not
> >set up the page tables once and point all devices at the same page
> >tables?)
> >
>
> That's what the ia64 fix patch does...the first pci device set up the
> page table. Then all following pci devices use that page table by
> setting up the context mapping.

I've sorted that out separately -- it's not necessarily part of this
fix, although it _would_ be nice to have it in 2.6.31, since your
original 1:1 mapping patch is quite slow (and takes quite a lot of
memory) without it.

This time, I _am_ going to let it bake in linux-next for a while first
though.

Until Linus pulls the IA64 build fix from the real iommu-2.6.git tree,
I've put this into git://git.infradead.org/~dwmw2/iommu-pt.git

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-06-27 00:05:34

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

* David Woodhouse ([email protected]) wrote:
> On Thu, 25 Jun 2009, Chris Wright wrote:
>
> > * Linus Torvalds ([email protected]) wrote:
> >> Ok. Sounds like what I want is to get this patch through the ia64 people,
> >> after it has gotten some testing there. Since ia64 is where the current
> >> kernel fails anyway, that sounds like the motivation will be there too.
> >
> > Yup, I agree.
>
> Tony, please could you test what's in git://git.infradead.org/iommu-2.6.git
>
> It builds, but we'd like to check that it works correctly with iommu=pt on
> IA64.

In the meantime, I booted this on an IA-64 box (w/out VT-d), forced it
to call intel_iommu_init() as if it had an IOMMU, and added a small bit
of debugging. Looks like it's doing the right thing.

efi based
e000000001000000-e000000078000000
e00000007c000000-e00000007e990000
e00000007f300000-e00000007fda0000
e000000100000000-e00000047b7f0000
e00000047f800000-e00000047fdc0000
e00000047fe80000-e00000047ffb0000
zone based
1000000-100000000
100000000-47ffb0000
online node based
1000000-78000000
7c000000-7e990000
7f300000-7fda0000
100000000-47b7f0000
47f800000-47fdc0000
47fe80000-47ffb0000

The efi based mem walker is returning virtual addresses, Fenghua did
you test that one? I'm surprised it worked.

This is quite similar to the type of output I see on x86_64 (except the
e820 map gives phys addrs, of course).

e820 based
10000-9d800
100000-bf341000
bf67e000-bf800000
100000000-340000000
zone based
10000-1000000
1000000-100000000
100000000-1c0000000
1c0000000-340000000
online node based
10000-9d000
100000-bf341000
bf67e000-bf800000
100000000-1c0000000
1c0000000-340000000

thanks,
-chris
---
debug patch for the really bored...

Index: linus-2.6/drivers/pci/intel-iommu.c
===================================================================
--- linus-2.6.orig/drivers/pci/intel-iommu.c
+++ linus-2.6/drivers/pci/intel-iommu.c
@@ -3110,10 +3110,66 @@ static int __init init_iommu_sysfs(void)
}
#endif /* CONFIG_PM */

+#ifdef CONFIG_X86
+#include <asm/e820.h>
+static void __init print_memmap(void)
+{
+ int i;
+ printk("e820 based\n");
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+
+ if (ei->type == E820_RAM)
+ printk("%llx-%llx\n", ei->addr, ei->addr + ei->size);
+ }
+}
+#elif CONFIG_IA64
+#include <linux/efi.h>
+static int __init efi_print_map(u64 start, u64 end, void *unused)
+{
+ printk("%llx-%llx\n", start, end);
+ return 0;
+}
+
+static void __init print_memmap(void)
+{
+ printk("efi based\n");
+ efi_memmap_walk(efi_print_map, NULL);
+}
+#endif
+
+static int __init print_region(unsigned long start_pfn, unsigned long end_pfn,
+ void *unused)
+{
+ printk("%lx-%lx\n", start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
+ return 0;
+}
+
+static void __init show_mem_regions(void)
+{
+ int nid;
+ struct zone *zone;
+
+ print_memmap();
+ printk("zone based\n");
+ for_each_populated_zone(zone) {
+ unsigned long zone_end_pfn = zone->zone_start_pfn +
+ zone->spanned_pages;
+ printk("%lx-%lx\n", zone->zone_start_pfn << PAGE_SHIFT,
+ zone_end_pfn << PAGE_SHIFT);
+ }
+
+ printk("online node based\n");
+ for_each_online_node(nid)
+ work_with_active_regions(nid, print_region, NULL);
+}
+
int __init intel_iommu_init(void)
{
int ret = 0;

+ show_mem_regions();
+
if (dmar_table_init())
return -ENODEV;

Index: linus-2.6/arch/ia64/kernel/pci-dma.c
===================================================================
--- linus-2.6.orig/arch/ia64/kernel/pci-dma.c
+++ linus-2.6/arch/ia64/kernel/pci-dma.c
@@ -47,7 +47,7 @@ extern struct dma_map_ops intel_dma_ops;

static int __init pci_iommu_init(void)
{
- if (iommu_detected)
+// if (iommu_detected)
intel_iommu_init();

return 0;

2009-06-27 11:44:50

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v2] IA64 Compilation Error Fix for Intel IOMMU Identity Mapping Support

On Thu, 2009-06-25 at 19:00 -0700, Linus Torvalds wrote:
>
> On Thu, 25 Jun 2009, Chris Wright wrote:
> > I tested it on x86; works fine and creates a slightly tighter set of page
> > tables (matching to e820). But I don't have an IA-64 box w/ an IOMMU.
> >
> > It does have one problem, however, the graphics work around code isn't
> > compiled in on IA-64, so it needs:
>
> Ok. Sounds like what I want is to get this patch through the ia64 people,
> after it has gotten some testing there. Since ia64 is where the current
> kernel fails anyway, that sounds like the motivation will be there too.
>
> And there's little point in me merging it until it's been tested to fix
> the actual problem we have now.

Well, the main problem is that the kernel doesn't build -- and it _has_
been tested to fix that.

As long as the daily linux-next test builds work, nobody really seems to
_care_ about VT-d on IA64 -- certainly not enough to get me a box when
they asked me to become VT-d maintainer.

Now that I chase it up, I've received only one test report and that was
"oh, even 2.6.29 doesn't boot on this box with VT-d enabled".

The patch _does_ fix the build breakage, definitely seems to be doing
the right thing, and the worst case is that the new feature doesn't work
on IA64. So I think we should merge it.

I'll bash some heads together and try to get myself some hardware so I
can actually test it on IA64 -- it looks like we have more problems
there than the small possibility that the new passthrough feature might
do the wrong thing (and in fact I suspect all IA64 VT-d hardware is new
enough to have the _hardware_ passthrough anyway, so won't need this
identity mapping set up at all).

Please pull Chris's fix from
git://git.infradead.org/iommu-2.6.git

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2009-07-04 18:40:56

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support: iommu_identity_mapping definition

On Thu, 2009-06-18 at 11:13 -0700, Chris Wright wrote:
> * Fenghua Yu ([email protected]) wrote:
> > IOMMU Identity Mapping Support: iommu_identity_mapping definition
> >
> > Identity mapping for IOMMU defines a single domain to 1:1 map all pci devices
> > to all usable memory.
> >
> > This will reduces map/unmap overhead in DMA API's and improve IOMMU performance.
> > On 10Gb network cards, Netperf shows no performance degradation compared to
> > non-IOMMU performance.
> >
> > This method may lose some of DMA remapping benefits like isolation.
> >
> > The first patch defines iommu_identity_mapping varialbe which controls the
> > identity mapping code and is 0 by default.
>
> The only real difference between "pt" and "identity" is hardware support.
> We should have a single value we don't have to tell users to do different
> things depending on their hardware (they won't even know what they have)
> to achieve the same result.

The _code_ ought to be a lot more shared than it is, too. Currently, the
hardware pass-through support has bugs that the software identity
mapping doesn't. It doesn't remove devices from the identity map if they
are limited to 32-bit DMA and a driver tries to set up mappings, which
is quite suboptimal. And it doesn't put them _back_ into the identity
map after they're detached from a VM, AFAICT.

I was going to fix that and unify the code paths, but then I found a bug
with the software identity mapping too -- if you have a PCI device which
is only capable of 32-bit DMA and it's behind a bridge (such as the
ohci1394 device on a Tylersburg SDV, although you'll have to hack the
kernel to pretend not to have the hardware PT support), it'll cause a
BUG() when it first sets up a mapping. What happens is this:

First it removes that device from si_domain because it can only address
4GiB of RAM, then get_domain_for_dev() will put it right back _in_ the
si_domain again, because it inherits its domain from the upstream PCI
bridge. And then we BUG() in domain_get_iommu() which _really_ doesn't
want to see the si_domain.

I _think_ this is the best fix for that...

>From 3dfc813d94bba2046c6aed216e0fd69ac93a8e03 Mon Sep 17 00:00:00 2001
From: David Woodhouse <[email protected]>
Date: Sat, 4 Jul 2009 19:11:08 +0100
Subject: [PATCH] intel-iommu: Don't use identity mapping for PCI devices behind bridges

Our current strategy for pass-through mode is to put all devices into
the 1:1 domain at startup (which is before we know what their dma_mask
will be), and only _later_ take them out of that domain, if it turns out
that they really can't address all of memory.

However, when there are a bunch of PCI devices behind a bridge, they all
end up with the same source-id on their DMA transactions, and hence in
the same IOMMU domain. This means that we _can't_ easily move them from
the 1:1 domain into their own domain at runtime, because there might be DMA
in-flight from their siblings.

So we have to adjust our pass-through strategy: For PCI devices not on
the root bus, and for the bridges which will take responsibility for
their transactions, we have to start up _out_ of the 1:1 domain, just in
case.

This fixes the BUG() we see when we have 32-bit-capable devices behind a
PCI-PCI bridge, and use the software identity mapping.

It does mean that we might end up using 'normal' mapping mode for some
devices which could actually live with the faster 1:1 mapping -- but
this is only for PCI devices behind bridges, which presumably aren't the
devices for which people are most concerned about performance.

Signed-off-by: David Woodhouse <[email protected]>
---
drivers/pci/intel-iommu.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f9fc4f3..360fb67 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2122,6 +2122,36 @@ static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
if (iommu_identity_mapping == 2)
return IS_GFX_DEVICE(pdev);

+ /*
+ * We want to start off with all devices in the 1:1 domain, and
+ * take them out later if we find they can't access all of memory.
+ *
+ * However, we can't do this for PCI devices behind bridges,
+ * because all PCI devices behind the same bridge will end up
+ * with the same source-id on their transactions.
+ *
+ * Practically speaking, we can't change things around for these
+ * devices at run-time, because we can't be sure there'll be no
+ * DMA transactions in flight for any of their siblings.
+ *
+ * So PCI devices (unless they're on the root bus) as well as
+ * their parent PCI-PCI or PCIe-PCI bridges must be left _out_ of
+ * the 1:1 domain, just in _case_ one of their siblings turns out
+ * not to be able to map all of memory.
+ */
+ if (!pdev->is_pcie) {
+ if (!pci_is_root_bus(pdev->bus))
+ return 0;
+ if (pdev->class >> 8 == PCI_CLASS_BRIDGE_PCI)
+ return 0;
+ } else if (pdev->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+ return 0;
+
+ /*
+ * At boot time, we don't yet know if devices will be 64-bit capable.
+ * Assume that they will -- if they turn out not to be, then we can
+ * take them out of the 1:1 domain later.
+ */
if (!startup)
return pdev->dma_mask > DMA_BIT_MASK(32);

--
1.6.2.5


--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation