2007-06-19 22:18:33

by Keshavamurthy, Anil S

Subject: [Intel IOMMU 00/10] Intel IOMMU support, take #2

Hi All,
This patch set supports the upcoming Intel IOMMU hardware,
a.k.a. Intel(R) Virtualization Technology for Directed I/O
Architecture. The hardware spec for the same can be found here:
http://www.intel.com/technology/virtualization/index.htm

This version of the patches incorporates feedback
obtained from previous postings.

Some of the major changes are:
1) Removed the resource pool (a.k.a. pre-allocate pool) patch.
2) For memory allocation in the DMA map API calls we
now use kmem_cache_alloc() and get_zeroed_page()
to allocate memory for internal data structures and for
page table setup.
3) Memory allocation in the DMA map API calls is very
critical, and to avoid failures there we evaluated several
techniques:
a) mempool - We found that mempool is pretty much useless
if we try to allocate memory with GFP_ATOMIC, which is
our case. We also found it difficult to judge how much
to reserve when creating the mempool.
b) PF_MEMALLOC - When a task's flags (current->flags)
have PF_MEMALLOC set, watermark checks are skipped
during memory allocation.
We chose the latter (option b) and made it a separate
patch which can be debated further; a minimal sketch of
the approach is shown below. Please see patch 6/10.
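
A minimal sketch of the PF_MEMALLOC approach, purely for illustration
(the cache pointer name "iova_cache" is hypothetical, not necessarily
what the patch uses): setting PF_MEMALLOC on the current task lets an
atomic allocation inside a DMA map call dip below the usual zone
watermarks instead of failing outright.

#include <linux/sched.h>
#include <linux/slab.h>

static void *alloc_dma_internal(struct kmem_cache *iova_cache)
{
        unsigned long was_memalloc = current->flags & PF_MEMALLOC;
        void *obj;

        current->flags |= PF_MEMALLOC;          /* skip watermark checks */
        obj = kmem_cache_alloc(iova_cache, GFP_ATOMIC);
        if (!was_memalloc)
                current->flags &= ~PF_MEMALLOC; /* restore caller's state */

        return obj;
}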

Other minor changes are mostly coding style fixes and
making sure that the patches pass checkpatch.pl.

Please include this set of patches in the next -mm release.

Thanks and regards,
-Anil S Keshavamurthy
E-mail: [email protected]

--


2007-06-26 06:46:18

by Andrew Morton

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, 19 Jun 2007 14:37:01 -0700 "Keshavamurthy, Anil S" <[email protected]> wrote:

> This patch set supports the upcoming Intel IOMMU hardware
> a.k.a. Intel(R) Virtualization Technology for Directed I/O
> Architecture

So... what's all this code for?

I assume that the intent here is to speed things up under Xen, etc? Do we
have any benchmark results to help us to decide whether a merge would be
justified?

Does it slow anything down?

2007-06-26 07:13:14

by Andi Kleen

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tuesday 26 June 2007 08:45:50 Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:01 -0700 "Keshavamurthy, Anil S" <[email protected]> wrote:
>
> > This patch set supports the upcoming Intel IOMMU hardware
> > a.k.a. Intel(R) Virtualization Technology for Directed I/O
> > Architecture
>
> So... what's all this code for?
>
> I assume that the intent here is to speed things up under Xen, etc?

Yes in some cases, but not with this code. That would be the Xen version
of this code, which could potentially assign whole devices to guests.
I expect this to be useful only in some special cases though, because
most hardware is not virtualizable and you typically want a separate
instance for each guest.

OK, at some point KVM might implement this too; I would likely
use this code for this.

> Do we
> have any benchmark results to help us to decide whether a merge would be
> justified?

The main advantage of doing it in the normal kernel is not performance, but
more safety. Broken devices won't be able to corrupt memory by doing
random DMA.

Unfortunately that doesn't work for graphics yet; for that,
user space interfaces for the X server are needed.

There are some potential performance benefits too:
- When you have a device that cannot address the complete address range,
an IOMMU can remap its memory instead of bounce buffering. Remapping
is likely cheaper than copying.
- The IOMMU can merge sg lists into a single virtual block. This could
potentially speed up SG IO when the device is slow at walking SG lists
(a rough sketch of what this looks like at the DMA API level is below).
[I long ago benchmarked 5% on some block benchmark with an old
MPT Fusion; but it probably depends a lot on the HBA]

And you get better driver debugging, because unexpected memory accesses
from the devices will cause a trappable event.
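
As a rough, illustrative sketch of the SG-merging point (not taken from
the patches themselves): with the generic DMA API, a driver maps a
scatterlist and the count returned can be smaller than the count passed
in when the IOMMU has merged discontiguous pages into one contiguous
bus-address range, so the device walks a shorter SG list.

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int queue_io(struct device *dev, struct scatterlist *sgl, int nents)
{
        int mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);

        if (mapped == 0)
                return -ENOMEM;         /* mapping failed */

        /* mapped <= nents; the IOMMU may have coalesced adjacent entries */
        return mapped;
}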

>
> Does it slow anything down?

It adds more overhead to each IO, so yes.

-Andi

2007-06-26 11:13:43

by Muli Ben-Yehuda

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 09:12:45AM +0200, Andi Kleen wrote:

> There are some potential performance benefits too:
> - When you have a device that cannot address the complete address range
> an IOMMU can remap its memory instead of bounce buffering. Remapping
> is likely cheaper than copying.

But those devices aren't likely to be found on modern systems.

> - The IOMMU can merge sg lists into a single virtual block. This could
> potentially speed up SG IO when the device is slow walking SG lists.
> [I long ago benchmarked 5% on some block benchmark with an old
> MPT Fusion; but it probably depends a lot on the HBA]

But most devices are SG-capable.

> And you get better driver debugging because unexpected memory
> accesses from the devices will cause a trappable event.

That and direct access for KVM are the big ones, IMHO, and definitely
justify merging.

> > Does it slow anything down?
>
> It adds more overhead to each IO so yes.

How much? We have numbers (to be presented at OLS later this week)
that show that on bare metal an IOMMU can cost as much as 15%-30% more
CPU utilization for an IO-intensive workload (netperf). It will be
interesting to see comparable numbers for VT-d.

Cheers,
Muli

2007-06-26 15:01:19

by Andi Kleen

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

Muli Ben-Yehuda <[email protected]> writes:

> On Tue, Jun 26, 2007 at 09:12:45AM +0200, Andi Kleen wrote:
>
> > There are some potential performance benefits too:
> > - When you have a device that cannot address the complete address range
> > an IOMMU can remap its memory instead of bounce buffering. Remapping
> > is likely cheaper than copying.
>
> But those devices aren't likely to be found on modern systems.

Not true. I don't see anybody designing DAC-capable USB or firewire
or sound or TV cards. And there are plenty of non-AHCI SATA interfaces too
(often the BIOS defaults are this way because XP doesn't deal
well with AHCI). And video cards generally don't support it
(although they don't like IOMMUs either). These devices might all
not be performance relevant though (except for the video cards).

> > - The IOMMU can merge sg lists into a single virtual block. This could
> > potentially speed up SG IO when the device is slow walking SG lists.
> > [I long ago benchmarked 5% on some block benchmark with an old
> > MPT Fusion; but it probably depends a lot on the HBA]
>
> But most devices are SG-capable.

Your point being? It depends on whether the SG hardware is slow
enough that it makes a difference. I found one case where that
was true, but it's unknown how common that is.

Only benchmarks can tell.

Also my results were on a pretty slow IOMMU implementation,
so with a fast one it might be different too.

> How much? we have numbers (to be presented at OLS later this week)
> that show that on bare-metal an IOMMU can cost as much as 15%-30% more
> CPU utilization for an IO intensive workload (netperf). It will be
> interesting to see comparable numbers for VT-d.

That is something that needs more work.

We should probably have a switch to use the IOMMU only for specific
devices (e.g. for the KVM case) or only when remapping is needed. Boot
options alone are probably not good enough for this. But that is something
that can be worked on once everything is in tree.

Also the user interface for the X server case needs more work.

-Andi

2007-06-26 15:05:55

by Arjan van de Ven

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

Muli Ben-Yehuda wrote:
> How much? we have numbers (to be presented at OLS later this week)
> that show that on bare-metal an IOMMU can cost as much as 15%-30% more
> CPU utilization for an IO intensive workload (netperf). It will be
> interesting to see comparable numbers for VT-d.

For VT-d it is a LOT less. I'll let Anil give you his data :)

2007-06-26 15:09:54

by Muli Ben-Yehuda

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 05:56:49PM +0200, Andi Kleen wrote:

> > > - The IOMMU can merge sg lists into a single virtual block. This could
> > > potentially speed up SG IO when the device is slow walking SG
> > > lists. [I long ago benchmarked 5% on some block benchmark with
> > > an old MPT Fusion; but it probably depends a lot on the HBA]
> >
> > But most devices are SG-capable.
>
> Your point being?

That the fact that an IOMMU can do SG for non-SG-capable cards is not
interesting from a "reason for inclusion" POV.

> > How much? we have numbers (to be presented at OLS later this week)
> > that show that on bare-metal an IOMMU can cost as much as 15%-30%
> > more CPU utilization for an IO intensive workload (netperf). It
> > will be interesting to see comparable numbers for VT-d.
>
> That is something that needs more work.

Yup. I'm working on it (mostly in the context of Calgary) but also
looking at improvements to the DMA-API interface and usage.

> We should probably have a switch to use the IOMMU only for specific
> devices (e.g. for the KVM case) or only when remapping is
> needed.

Calgary already does this internally (via calgary=disable=<BUSNUM>)
but that's pretty ugly. It would be better to do it in a generic
fashion when deciding which dma_ops to call (i.e., a dma_ops per bus
or device).
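
As a purely hypothetical sketch of that idea (none of these names are
an existing kernel interface), per-device dispatch could look roughly
like this: each device carries a dma_ops pointer chosen at bus-scan
time, so an IOMMU-translated device and a bypassed device can coexist.

#include <asm/io.h>

struct sample_dma_ops {
        unsigned long (*map_single)(void *cpu_addr, unsigned long size);
};

/* identity mapping, i.e. no IOMMU translation (sketch only) */
static unsigned long direct_map_single(void *cpu_addr, unsigned long size)
{
        return virt_to_phys(cpu_addr);
}

static const struct sample_dma_ops direct_dma_ops = {
        .map_single = direct_map_single,
};

struct sample_device {
        const struct sample_dma_ops *dma_ops;   /* e.g. IOMMU ops for a
                                                   KVM-assigned device */
};

static unsigned long sample_map_single(struct sample_device *dev,
                                       void *cpu_addr, unsigned long size)
{
        const struct sample_dma_ops *ops =
                dev->dma_ops ? dev->dma_ops : &direct_dma_ops;

        return ops->map_single(cpu_addr, size);
}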

> Also the user interface for X server case needs more work.

Is anyone working on it?

Cheers,
Muli

2007-06-26 15:11:56

by Muli Ben-Yehuda

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 08:03:59AM -0700, Arjan van de Ven wrote:
> Muli Ben-Yehuda wrote:
> >How much? we have numbers (to be presented at OLS later this week)
> >that show that on bare-metal an IOMMU can cost as much as 15%-30% more
> >CPU utilization for an IO intensive workload (netperf). It will be
> >interesting to see comparable numbers for VT-d.
>
> for VT-d it is a LOT less. I'll let anil give you his data :)

Looking forward to it. Note that this is on a large SMP machine with
Gigabit ethernet, with netperf TCP stream. Comparing numbers for other
benchmarks on other machines is ... less than useful, but the numbers
themselves are interesting.

Cheers,
Muli

2007-06-26 15:16:58

by Arjan van de Ven

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

>
> Also the user interface for X server case needs more work.
>

Actually, with the mode setting of X moving into the kernel, X won't
use /dev/mem anymore at all
(and I think it mostly already doesn't, even without that).

2007-06-26 15:34:19

by Andi Kleen

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 08:15:05AM -0700, Arjan van de Ven wrote:
> >
> >Also the user interface for X server case needs more work.
> >
>
> actually with the mode setting of X moving into the kernel... X won't
> use /dev/mem anymore at all

We'll see if that happens. It has been talked about forever,
but results are sparse.

> (and I think it mostly already doesn't even without that)

It uses /sys/bus/pci/* which is not any better as seen from the IOMMU.

Any interface will need to be explicit because user space needs to know which
DMA addresses to put into the hardware. It's not enough to just transparently
translate the mappings.

-Andi

2007-06-26 15:37:12

by Andi Kleen

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 11:09:40AM -0400, Muli Ben-Yehuda wrote:
> On Tue, Jun 26, 2007 at 05:56:49PM +0200, Andi Kleen wrote:
>
> > > > - The IOMMU can merge sg lists into a single virtual block. This could
> > > > potentially speed up SG IO when the device is slow walking SG
> > > > lists. [I long ago benchmarked 5% on some block benchmark with
> > > > an old MPT Fusion; but it probably depends a lot on the HBA]
> > >
> > > But most devices are SG-capable.
> >
> > Your point being?
>
> That the fact that an IOMMU can do SG for non-SG-capable cards is not
> interesting from a "reason for inclusion" POV.

You misunderstood me; my point was that some SG-capable devices
can go faster if they get shorter SG lists.

But yes, for non-SG-capable devices it is also interesting. I expect
it will obsolete most users of that ugly external patch to allocate large
memory areas for IOs. That's a point I didn't mention earlier.

> > Also the user interface for X server case needs more work.
>
> Is anyone working on it?

It's somewhere on the todo list.


-Andi

2007-06-26 15:52:46

by Keshavamurthy, Anil S

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 11:11:25AM -0400, Muli Ben-Yehuda wrote:
> On Tue, Jun 26, 2007 at 08:03:59AM -0700, Arjan van de Ven wrote:
> > Muli Ben-Yehuda wrote:
> > >How much? we have numbers (to be presented at OLS later this week)
> > >that show that on bare-metal an IOMMU can cost as much as 15%-30% more
> > >CPU utilization for an IO intensive workload (netperf). It will be
> > >interesting to see comparable numbers for VT-d.
> >
> > for VT-d it is a LOT less. I'll let anil give you his data :)
>
> Looking forward to it. Note that this is on a large SMP machine with
> Gigabit ethernet, with netperf TCP stream. Comparing numbers for other
> benchmarks on other machines is ... less than useful, but the numbers
> themeselves are interesting.
Our initial benchmark results showed around 3% extra CPU
utilization overhead when compared to native (i.e. without IOMMU).
Again, our benchmark was on a small SMP machine and we used
iperf and a 1G ethernet card.

Going forward we will do more benchmark tests and will share the
results.

-Anil

2007-06-26 16:00:51

by Muli Ben-Yehuda

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tue, Jun 26, 2007 at 08:48:04AM -0700, Keshavamurthy, Anil S wrote:

> Our initial benchmark results showed around 3% extra CPU
> utilization overhead when compared to native (i.e. without IOMMU).
> Again, our benchmark was on a small SMP machine and we used iperf and
> a 1G ethernet card.

Please try netperf and a bigger machine for a meaningful comparison :-)
I assume this is with e1000?

> Going forward we will do more benchmark tests and will share the
> results.

Looking forward to it.

Cheers,
Muli

2007-06-26 16:27:43

by Arjan van de Ven

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

Andi Kleen wrote:
> On Tue, Jun 26, 2007 at 08:15:05AM -0700, Arjan van de Ven wrote:
>>> Also the user interface for X server case needs more work.
>>>
>> actually with the mode setting of X moving into the kernel... X won't
>> use /dev/mem anymore at all
>
> We'll see if that happens. It has been talked about forever,
> but results are sparse.

jbarnes posted the code a few weeks ago.

>
>> (and I think it mostly already doesn't even without that)
>
> It uses /sys/bus/pci/* which is not any better as seen from the IOMMU.
>
> Any interface will need to be explicit because user space needs to know which
> DMA addresses to put into the hardware. It's not enough to just transparently
> translate the mappings.

that's what DRM is used for nowadays...

2007-06-26 17:32:13

by Andi Kleen

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

> >>(and I think it mostly already doesn't even without that)
> >
> >It uses /sys/bus/pci/* which is not any better as seen from the IOMMU.
> >
> >Any interface will need to be explicit because user space needs to know
> >which DMA addresses to put into the hardware. It's not enough to just
> >transparently translate the mappings.
>
> that's what DRM is used for nowadays...

But DRM does support much less hardware than the X server?

Perhaps we just need an ioctl where an X server can switch this.

-Andi

2007-06-26 20:11:13

by Jesse Barnes

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

On Tuesday, June 26, 2007 10:31:57 Andi Kleen wrote:
> > >>(and I think it mostly already doesn't even without that)
> > >
> > >It uses /sys/bus/pci/* which is not any better as seen from the IOMMU.
> > >
> > >Any interface will need to be explicit because user space needs to know
> > >which DMA addresses to put into the hardware. It's not enough to just
> > >transparently translate the mappings.
> >
> > that's what DRM is used for nowadays...
>
> But DRM does support much less hardware than the X server?

Yeah, the number of DRM drivers is relatively small compared to X or
fbdev, but for simple DMA they're fairly easy to write.

> Perhaps we just need an ioctl where an X server can switch this.

Switch what? Turn on or off transparent translation?

Jesse

2007-06-26 22:35:27

by Andi Kleen

Subject: Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2

>
> > Perhaps we just need an ioctl where an X server can switch this.
>
> Switch what? Turn on or off transparent translation?

Turn on/off bypass for its device.
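
Purely as illustration of what such a knob could look like (this is a
hypothetical interface, nothing like it exists in the tree): a
privileged process would ask the kernel to enable or disable IOMMU
translation for one PCI device it drives.

#include <linux/ioctl.h>

struct iommu_bypass_req {
        unsigned int  domain;           /* PCI domain number */
        unsigned char bus, devfn;       /* device address on that bus */
        unsigned int  enable_bypass;    /* 1 = identity map, 0 = translate */
};

#define IOMMU_SET_BYPASS  _IOW('I', 0x01, struct iommu_bypass_req)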

-Andi