2007-06-19 22:19:26

by Keshavamurthy, Anil S

Subject: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

The Intel IOMMU driver needs memory during DMA map calls to set up its internal
page tables and other data structures. These DMA map calls are mostly made in
interrupt context or with a spinlock held by the upper-level drivers
(network/storage drivers), so to avoid memory allocation failures under
low-memory conditions, this patch temporarily sets the PF_MEMALLOC flag on
the current task before making memory allocation calls.

We evaluated mempools as a backup for when kmem_cache_alloc() fails
and found that mempools are really not useful here because
1) we don't know for sure how much to reserve in advance, and
2) mempools are not useful for the GFP_ATOMIC case (we call the
memory allocation functions with GFP_ATOMIC).


With the PF_MEMALLOC flag set in current->flags, the VM subsystem skips
the watermark checks before allocating memory, effectively guaranteeing
memory down to the last free page. Further, looking at __alloc_pages() in
mm/page_alloc.c, this flag appears to be useful only in non-interrupt
context.

If we are in interrupt context and a memory allocation in the IOMMU
driver fails for some reason, the DMA map APIs will return failure and
it is up to the higher-level drivers to retry. If an upper-level driver
then programs the controller with a bogus DMA virtual address, the
IOMMU will block that DMA transaction, preventing any corruption of
main memory.

So far in our test scenario, we were unable to create
any memory allocation failure inside dma map api calls.

Signed-off-by: Anil S Keshavamurthy <[email protected]>

---
drivers/pci/intel-iommu.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)

Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:10:29.000000000 -0700
@@ -84,9 +84,31 @@
static struct kmem_cache *iommu_devinfo_cache;
static struct kmem_cache *iommu_iova_cache;

+static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep)
+{
+	unsigned int flags;
+	void *vaddr;
+
+	/* trying to avoid low memory issues */
+	flags = current->flags & PF_MEMALLOC;
+	current->flags |= PF_MEMALLOC;
+	vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC);
+	current->flags &= (~PF_MEMALLOC | flags);
+	return vaddr;
+}
+
+
static inline void *alloc_pgtable_page(void)
{
-	return (void *)get_zeroed_page(GFP_ATOMIC);
+	unsigned int flags;
+	void *vaddr;
+
+	/* trying to avoid low memory issues */
+	flags = current->flags & PF_MEMALLOC;
+	current->flags |= PF_MEMALLOC;
+	vaddr = (void *)get_zeroed_page(GFP_ATOMIC);
+	current->flags &= (~PF_MEMALLOC | flags);
+	return vaddr;
}

static inline void free_pgtable_page(void *vaddr)
@@ -96,7 +118,7 @@

static inline void *alloc_domain_mem(void)
{
-	return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC);
+	return iommu_kmem_cache_alloc(iommu_domain_cache);
}

static inline void free_domain_mem(void *vaddr)
@@ -106,7 +128,7 @@

static inline void * alloc_devinfo_mem(void)
{
-	return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC);
+	return iommu_kmem_cache_alloc(iommu_devinfo_cache);
}

static inline void free_devinfo_mem(void *vaddr)
@@ -116,7 +138,7 @@

struct iova *alloc_iova_mem(void)
{
-	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
+	return iommu_kmem_cache_alloc(iommu_iova_cache);
}

void free_iova_mem(struct iova *iova)

--


2007-06-19 23:26:14

by Christoph Lameter

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote:


> So far in our test scenario, we were unable to create
> any memory allocation failure inside dma map api calls.

All these functions should have gfp_t flags passed to them.
Otherwise you are locked into the use of GFP_ATOMIC. If this is a
parameter, then you may be able to use GFP_KERNEL in various places, and
you may develop the code to have fewer GFP_ATOMIC allocs in the future.
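
For illustration only, a minimal sketch of what that could look like for one
of the helpers (hypothetical; the posted patch does not do this), with the
mask simply threaded through from the caller:

/* Hypothetical variant: the caller supplies the gfp mask. */
static inline void *alloc_domain_mem(gfp_t flags)
{
	return kmem_cache_alloc(iommu_domain_cache, flags);
}

Process-context callers could then pass GFP_KERNEL, while interrupt-context
callers keep GFP_ATOMIC.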

2007-06-19 23:29:24

by Arjan van de Ven

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

Christoph Lameter wrote:
> On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote:
>
>
>> So far in our test scenario, we were unable to create
>> any memory allocation failure inside dma map api calls.
>
> All these functions should have gfp_t flags passed to them.

why?

> Otherwise you are locked into the use of GFP_ATOMIC.

all callers pretty much are either in irq context or with spinlocks
held. Good luck..... it's also called primarily from the PCI DMA API
which doesn't take a gfp_t argument in the first place...

so I'm not seeing the point.

2007-06-19 23:34:34

by Christoph Lameter

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Tue, 19 Jun 2007, Arjan van de Ven wrote:

> > Otherwise you are locked into the use of GFP_ATOMIC.
>
> all callers pretty much are either in irq context or with spinlocks held. Good
> luck..... it's also called primarily from the PCI DMA API which doesn't take a
> gfp_t argument in the first place...
>
> so I'm not seeing the point.

Hmmm... From my superficial look at things it seems that one could avoid
GFP_ATOMIC at times. I do not know too much about the driver, but it
seems a bit restrictive to always do GFP_ATOMIC allocs.

2007-06-20 01:10:51

by Arjan van de Ven

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Tue, 2007-06-19 at 16:34 -0700, Christoph Lameter wrote:
> On Tue, 19 Jun 2007, Arjan van de Ven wrote:
>
> > > Otherwise you are locked into the use of GFP_ATOMIC.
> >
> > all callers pretty much are either in irq context or with spinlocks held. Good
> > luck..... it's also called primarily from the PCI DMA API which doesn't take a
> > gfp_t argument in the first place...
> >
> > so I'm not seeing the point.
>
> Hmmm... From my superficial look at things it seems that one could avoid
> GFP_ATOMIC at times.

by changing ALL drivers. And then you realize the common scenario is
that it's taken in irq context ;)

> I do not know too much about the driver though but it
> seems a bit restrictive to always do GFP_ATOMIC allocs.

the only alternative to GFP_ATOMIC is GFP_NOIO, which is... barely
better. And then add that it can be used only sporadically...
feel free to first change all the drivers.. once the callers of these
functions have a gfp_t then I'm sure Anil will be happy to take one of
those as well ;-)

2007-06-20 08:07:05

by Peter Zijlstra

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Tue, 2007-06-19 at 14:37 -0700, Keshavamurthy, Anil S wrote:
> plain text document attachment (intel_iommu_pf_memalloc.patch)
> The Intel IOMMU driver needs memory during DMA map calls to set up its internal
> page tables and other data structures. These DMA map calls are mostly made in
> interrupt context or with a spinlock held by the upper-level drivers
> (network/storage drivers), so to avoid memory allocation failures under
> low-memory conditions, this patch temporarily sets the PF_MEMALLOC flag on
> the current task before making memory allocation calls.
>
> We evaluated mempools as a backup for when kmem_cache_alloc() fails
> and found that mempools are really not useful here because
> 1) we don't know for sure how much to reserve in advance, and

So you just unleashed an unbounded allocation context on PF_MEMALLOC?
seems like a really really bad idea.

> 2) mempools are not useful for the GFP_ATOMIC case (we call the
> memory allocation functions with GFP_ATOMIC).

Mempools work as intended with GFP_ATOMIC: the pool is filled up to the
specified number of elements using GFP_KERNEL (at creation time). This
gives GFP_ATOMIC allocations nr_elements extra items once allocations
would otherwise start failing.
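
As a minimal sketch of that scheme applied to one of the caches in this patch
(hypothetical; the pool size and the init hook are made up, and the posted
patch does not do this):

#include <linux/mempool.h>
#include <linux/errno.h>

static mempool_t *iova_pool;	/* hypothetical backup pool for iova structs */

static int iova_pool_init(void)
{
	/* Reserve is filled with GFP_KERNEL at creation time. */
	iova_pool = mempool_create_slab_pool(64, iommu_iova_cache);
	return iova_pool ? 0 : -ENOMEM;
}

struct iova *alloc_iova_mem(void)
{
	/* Falls back to the pre-filled reserve once GFP_ATOMIC would fail. */
	return mempool_alloc(iova_pool, GFP_ATOMIC);
}

void free_iova_mem(struct iova *iova)
{
	mempool_free(iova, iova_pool);
}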

> With the PF_MEMALLOC flag set in current->flags, the VM subsystem skips
> the watermark checks before allocating memory, effectively guaranteeing
> memory down to the last free page.

PF_MEMALLOC as is, is meant to salvage the VM from the typical VM
deadlock. Using it as you do now is not something a driver should ever
do, and I'm afraid I will have to strongly oppose this patch.

You really really want to calculate an upper bound on your memory
consumption and reserve this.

So, I'm afraid I'll have to..

NACK!

2007-06-20 13:04:51

by Arjan van de Ven

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

Peter Zijlstra wrote:
>
>
> PF_MEMALLOC as is, is meant to salvage the VM from the typical VM
> deadlock.

.. and this IS the typical VM deadlock.. it is your storage driver
trying to write out a piece of memory on behalf of the VM, and calls
the iommu to map it, which then needs a bit of memory....

2007-06-20 17:35:33

by Suresh Siddha

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, Jun 20, 2007 at 06:03:02AM -0700, Arjan van de Ven wrote:
> Peter Zijlstra wrote:
> >
> >
> >PF_MEMALLOC as is, is meant to salvage the VM from the typical VM
> >deadlock.
>
> .. and this IS the typical VM deadlock.. it is your storage driver
> trying to write out a piece of memory on behalf of the VM, and calls
> the iommu to map it, which then needs a bit of memory....

Today PF_MEMALLOC doesn't do much in interrupt context. If PF_MEMALLOC
is the right usage model for this, then we need to fix the behavior of
PF_MEMALLOC in the interrupt context (for our usage model, we do most
of the allocations in interrupt context).

I am not very familiar with PF_MEMALLOC. So experts please comment.

thanks,
suresh

2007-06-20 18:05:31

by Peter Zijlstra

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, 2007-06-20 at 10:30 -0700, Siddha, Suresh B wrote:
> On Wed, Jun 20, 2007 at 06:03:02AM -0700, Arjan van de Ven wrote:
> > Peter Zijlstra wrote:
> > >
> > >
> > >PF_MEMALLOC as is, is meant to salvage the VM from the typical VM
> > >deadlock.
> >
> > .. and this IS the typical VM deadlock.. it is your storage driver
> > trying to write out a piece of memory on behalf of the VM, and calls
> > the iommu to map it, which then needs a bit of memory....
>
> Today PF_MEMALLOC doesn't do much in interrupt context. If PF_MEMALLOC
> is the right usage model for this, then we need to fix the behavior of
> PF_MEMALLOC in the interrupt context (for our usage model, we do most
> of the allocations in interrupt context).

Right, I have patches that add GFP_EMERGENCY to do basically that.

> I am not very familiar with PF_MEMALLOC. So experts please comment.

PF_MEMALLOC is meant to avoid the VM deadlock - that is, we need memory
to free memory. The one constraint is that its use be bounded (which is
currently violated in that there is no bound on the number of direct
reclaim contexts - which is on my to-fix list).

So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
ensure they themselves will not block on a memory allocation. And it is
understood that these code paths have a bounded memory footprint.

Now, this code seems to be running from interrupt context, which makes
it impossible to tell if the work is being done on behalf of a reclaim
task. Is it possible to set up the needed data for the IRQ handler from
process context?

Blindly adding GFP_EMERGENCY to do this has the distinct disadvantage
that there is no inherent bound on the amount of memory consumed. In my
patch set I add an emergency reserve (below the current watermarks,
because ALLOC_HIGH and ALLOC_HARDER modify the threshold in a relative
way, and thus cannot provide a guaranteed limit). I then accurately
account all allocations made from this reserve to ensure I never cross
the set limit.

As has been said before: if possible, move to blocking allocs
(GFP_NOIO); if that is not possible, use mempools (for kmem_cache or
page allocations); if that is not possible, use ALLOC_NO_WATERMARKS
(PF_MEMALLOC, GFP_EMERGENCY), but put in a reserve and account its usage.

The last option basically boils down to reserve-based allocation,
something which I hope to introduce someday...

That is, failure is OK, unless you're in a reclaim context; those
should make progress.


One thing I'm confused about: in earlier discussions it was said that
mempools are not sufficient because they deplete the GFP_ATOMIC reserve
and only then use the mempool. This would not work because some
downstream allocation would then go splat --- but using
PF_MEMALLOC/GFP_EMERGENCY has exactly the same problem!



2007-06-20 19:16:43

by Arjan van de Ven

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

Peter Zijlstra wrote:
> So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
> ensure they themselves will not block on a memory allocation. And it is
> understood that these code paths have a bounded memory footprint.


that's a too simplistic view though; what happens is that kswapd will
queue the IO, but the irq context will then take the IO from the queue
and do the DMA mapping... which needs the memory.....

2007-06-20 20:09:20

by Peter Zijlstra

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote:
> Peter Zijlstra wrote:
> > So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
> > ensure they themselves will not block on a memory allocation. And it is
> > understood that these code paths have a bounded memory footprint.
>
>
> that's a too simplistic view though; what happens is that kswapd will
> queue the IO, but the irq context will then take the IO from the queue
> and do the DMA mapping... which needs the memory.....

Right, but who stops some unrelated interrupt handler from completely
depleting memory?

What I'm saying is that there should be some coupling between the
reclaim context and the irq context doing work on its behalf.

For instance, you know how many pages are in the queue, and which queue.
So you could preallocate enough memory to handle that many pages from
irq context and couple that reserve to the queue object. Then irq
context can use that memory to do the work.
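
One possible shape of such a coupling, as a very rough sketch (entirely
hypothetical: the names, the fixed size, and where this would hook into the
block layer are all made up):

#include <linux/spinlock.h>
#include <linux/mm.h>
#include <linux/errno.h>

/* Hypothetical per-queue reserve: filled from process context when the
 * IO is queued, consumed by the IRQ path that does the DMA mapping. */
struct queue_map_reserve {
	spinlock_t	lock;
	void		*pages[16];	/* arbitrary upper bound */
	int		nr_free;
};

/* Called from process context, where sleeping allocations are fine;
 * assumed to be serialized against other fillers by the queue itself. */
static int reserve_fill(struct queue_map_reserve *r, int nr)
{
	unsigned long flags;
	void *p;

	if (nr > 16)
		return -EINVAL;
	while (r->nr_free < nr) {
		p = (void *)__get_free_page(GFP_KERNEL);
		if (!p)
			return -ENOMEM;
		spin_lock_irqsave(&r->lock, flags);
		r->pages[r->nr_free++] = p;
		spin_unlock_irqrestore(&r->lock, flags);
	}
	return 0;
}

/* Called from the IRQ path instead of hitting the page allocator. */
static void *reserve_get(struct queue_map_reserve *r)
{
	unsigned long flags;
	void *p = NULL;

	spin_lock_irqsave(&r->lock, flags);
	if (r->nr_free)
		p = r->pages[--r->nr_free];
	spin_unlock_irqrestore(&r->lock, flags);
	return p;
}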



2007-06-20 23:08:08

by Keshavamurthy, Anil S

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, Jun 20, 2007 at 10:08:51PM +0200, Peter Zijlstra wrote:
> On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote:
> > Peter Zijlstra wrote:
> > > So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
> > > ensure they themselves will not block on a memory allocation. And it is
> > > understood that these code paths have a bounded memory footprint.
> >
> >
> > that's a too simplistic view though; what happens is that kswapd will
> > queue the IO, but the irq context will then take the IO from the queue
> > and do the DMA mapping... which needs the memory.....

As Arjan is saying, a reclaim context sets the PF_MEMALLOC flag
and submits the IO, but the controller driver may decide to queue the IO
and later, in interrupt context, dequeue it and call the
IOMMU driver to map the DMA physical address; it is in this DMA
map API call that we may need memory.
Hence PF_MEMALLOC set by the reclaim context should work from
interrupt context too; if it does not, that needs to be fixed.

>
> Right, but who stops some unrelated interrupt handler from completely
> depleting memory?
The DMA map APIs exposed by the IOMMU are called by the
storage/network controller drivers, and it is in this IOMMU
driver that we are talking about allocating memory. So we have no
clue how much I/O the controller above us is capable of submitting.
All we do is map the DMA physical address and
provide the caller with the virtual DMA address, and it
is in this process that we may need some memory.

>
> What I'm saying is that there should be some coupling between the
> reclaim context and the irq context doing work on its behalf.
Nope, this info is not available when upper-level drivers
call the standard DMA map APIs.

>
> For instance, you know how many pages are in the queue, and which queue.
> So you could preallocate enough memory to handle that many pages from
> irq context and couple that reserve to the queue object. Then irq
> context can use that memory to do the work.

As I have said earlier, we are not the storage or network controller
driver, and we have no idea how much IO the controller is capable of.


Again, this patch is a best effort not to fail the DMA map API;
if it fails due to lack of memory, we have no choice but to return
failure to the upper-level driver. But today, upper-level drivers are
not capable of handling failures from the DMA map APIs, hence this best
effort not to fail the DMA map API call.

thanks,
-Anil


2007-06-21 06:10:27

by Peter Zijlstra

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, 2007-06-20 at 16:03 -0700, Keshavamurthy, Anil S wrote:
> On Wed, Jun 20, 2007 at 10:08:51PM +0200, Peter Zijlstra wrote:
> > On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote:
> > > Peter Zijlstra wrote:
> > > > So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to
> > > > ensure they themselves will not block on a memory allocation. And it is
> > > > understood that these code paths have a bounded memory footprint.
> > >
> > >
> > > that's a too simplistic view though; what happens is that kswapd will
> > > queue the IO, but the irq context will then take the IO from the queue
> > > and do the DMA mapping... which needs the memory.....
>
> As Arjan is saying, a reclaim context sets the PF_MEMALLOC flag
> and submits the IO, but the controller driver may decide to queue the IO
> and later, in interrupt context, dequeue it and call the
> IOMMU driver to map the DMA physical address; it is in this DMA
> map API call that we may need memory.
> Hence PF_MEMALLOC set by the reclaim context should work from
> interrupt context too; if it does not, that needs to be fixed.

PF_MEMALLOC cannot work from interrupt context, nor should it. But there
are other options.

What I'm saying is that if you do use the reserves, you should ensure
the use is bounded. I'm not seeing anything like that.

This is a generic API, who is to ensure some other non-swap device will
not deplete memory and deadlock the reclaim process?


Also, please do explain why mempools are not usable? Those would have
exactly the same semantics as you are now getting (if PF_MEMALLOC
worked from interrupt context).


Also, explain to me how an IOMMU is different from bounce buffers? They
both do the same thing, no? They both need memory in order to complete
DMA.

Is it just a broken API you're working against? If so, isn't the Linux
way to fix these things, that is why we have the source code after all.


2007-06-21 06:12:54

by Arjan van de Ven

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

Peter Zijlstra wrote:
> What I'm saying is that if you do use the reserves, you should ensure
> the use is bounded. I'm not seeing anything like that.

each mapping takes at most 3 pages
>
> This is a generic API, who is to ensure some other non-swap device will
> not deplete memory and deadlock the reclaim process?
>

that information is not available at this level ;(

>
>
> Also, explain to me how an IOMMU is different from bounce buffers? They
> both do the same thing, no? They both need memory in order to complete
> DMA.

bounce buffers happen in a place where you can sleep.... that makes a
lot of difference.

>
> Is it just a broken API you're working against? If so, isn't the Linux
> way to fix these things, that is why we have the source code after all.

well yes and no... the other iommu's snuck in as well... it's not
entirely fair to hold this one back until a 2 year, 1400 driver
project is completed ;(

2007-06-21 06:29:48

by Peter Zijlstra

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
> Peter Zijlstra wrote:
> > What I'm saying is that if you do use the reserves, you should ensure
> > the use is bounded. I'm not seeing anything like that.
>
> each mapping takes at most 3 pages

That is a start, but the thing I'm most worried about is non-reclaim-related
devices using it when in dire straits.

> > This is a generic API, who is to ensure some other non-swap device will
> > not deplete memory and deadlock the reclaim process?
>
> that information is not available at this level ;(

Can we bring it there?

> > Also, explain to me how an IOMMU is different from bounce buffers? They
> > both do the same thing, no? They both need memory in order to complete
> > DMA.
>
> bounce buffers happen in a place where you can sleep.... that makes a
> lot of difference.

Right, can't you stick part of this work there?

> >
> > Is it just a broken API you're working against? If so, isn't the Linux
> > way to fix these things, that is why we have the source code after all.
>
> well yes and no... the other iommu's snuck in as well... it's not
> entirely fair to hold this one back until a 2 year, 1400 driver
> project is completed ;(

I understand, but at some point we should stop; we cannot keep taking
crap in deference to such things.

Also, the other iommu code you pointed me to, was happy to fail, it did
not attempt to use the emergency reserves.


But you left out the mempools question again. I have read the earlier
threads, and it was said mempools are no good because they first deplete
GFP_ATOMIC reserves and then down-stream allocs could go splat.
PF_MEMALLOC/GFP_EMERGENCY has exactly the same problem...

So why no mempools?

2007-06-21 06:35:10

by Keshavamurthy, Anil S

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, Jun 20, 2007 at 11:11:05PM -0700, Arjan van de Ven wrote:
> Peter Zijlstra wrote:
> >What I'm saying is that if you do use the reserves, you should ensure
> >the use is bounded. I'm not seeing anything like that.
>
> each mapping takes at most 3 pages
With 3 pages (a 3-level page table), the IOMMU can map at
most 2MB, and each additional last-level page maps another 2MB
(a 4KB page-table page holds 512 entries, each mapping a 4KB page).
Also, the IOMMU driver reuses virtual addresses instead of handing
out ever-growing contiguous virtual addresses, which keeps page
table growth down and lets the same page table entries be reused;
in that sense we are bounded. But we do not know how much IO will
be in flight on a particular system, so we cannot reserve that many
pages for IOMMU page table setup in advance.

-Anil

2007-06-21 06:41:56

by Keshavamurthy, Anil S

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
> On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
> > Peter Zijlstra wrote:
> Also, the other iommu code you pointed me to, was happy to fail, it did
> not attempt to use the emergency reserves.

Is the same behavior acceptable here?

-Anil

2007-06-21 07:13:25

by Peter Zijlstra

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, 2007-06-20 at 23:37 -0700, Keshavamurthy, Anil S wrote:
> On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
> > On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
> > > Peter Zijlstra wrote:
> > Also, the other iommu code you pointed me to, was happy to fail, it did
> > not attempt to use the emergency reserves.
>
> Is the same behavior acceptable here?

I would say it is. Failure is a part of life.

If you have a (small) mempool with 16 pages or so, that should give you
plenty of megabytes of io-space to get out of a tight spot. That is, you
can queue many pages with that. If it is depleted, you know you have at
least that many pages outstanding, so failing will just delay the next
pages.

Throughput is not a key issue when you are that low on memory; a guarantee
of progress is.
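
For concreteness, a rough sketch of that idea for the page-table pages
(hypothetical; only the 16-page figure comes from the suggestion above,
everything else is made up):

#include <linux/mempool.h>
#include <linux/mm.h>
#include <linux/errno.h>

static mempool_t *pgtable_pool;	/* hypothetical reserve of order-0 pages */

static int pgtable_pool_init(void)
{
	/* 16 pre-allocated pages, filled with GFP_KERNEL at creation time. */
	pgtable_pool = mempool_create_page_pool(16, 0);
	return pgtable_pool ? 0 : -ENOMEM;
}

static void *alloc_pgtable_page(void)
{
	struct page *page = mempool_alloc(pgtable_pool, GFP_ATOMIC);
	void *vaddr;

	if (!page)
		return NULL;	/* failure is acceptable; the caller must cope */
	vaddr = page_address(page);
	clear_page(vaddr);	/* the existing helper hands back zeroed pages */
	return vaddr;
}

static void free_pgtable_page(void *vaddr)
{
	mempool_free(virt_to_page(vaddr), pgtable_pool);
}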


2007-06-21 19:56:28

by Keshavamurthy, Anil S

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Thu, Jun 21, 2007 at 09:13:11AM +0200, Peter Zijlstra wrote:
> On Wed, 2007-06-20 at 23:37 -0700, Keshavamurthy, Anil S wrote:
> > On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote:
> > > > Peter Zijlstra wrote:
> > > Also, the other iommu code you pointed me to, was happy to fail, it did
> > > not attempt to use the emergency reserves.
> >
> > Is the same behavior acceptable here?
>
> I would say it is. Failure is a part of life.
>
> If you have a (small) mempool with 16 pages or so, that should give you
> plenty of megabytes of io-space to get out of a tight spot. That is, you
> can queue many pages with that. If it is depleted, you know you have at
> least that many pages outstanding, so failing will just delay the next
> pages.
>
> Throughput is not a key issue when you are that low on memory; a guarantee
> of progress is.

Andrew,
Can you please queue all the other patches except this one for your
next MM release? (Yes, we can safely drop this patch without any issues
in applying the rest of the patches.)


-thanks,
Anil Keshavamurthy

2007-06-26 05:35:28

by Andrew Morton

Subject: Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls

On Wed, 20 Jun 2007 10:06:39 +0200 Peter Zijlstra <[email protected]> wrote:

> On Tue, 2007-06-19 at 14:37 -0700, Keshavamurthy, Anil S wrote:
> > plain text document attachment (intel_iommu_pf_memalloc.patch)
> > The Intel IOMMU driver needs memory during DMA map calls to set up its internal
> > page tables and other data structures. These DMA map calls are mostly made in
> > interrupt context or with a spinlock held by the upper-level drivers
> > (network/storage drivers), so to avoid memory allocation failures under
> > low-memory conditions, this patch temporarily sets the PF_MEMALLOC flag on
> > the current task before making memory allocation calls.
> >
> > We evaluated mempools as a backup for when kmem_cache_alloc() fails
> > and found that mempools are really not useful here because
> > 1) we don't know for sure how much to reserve in advance, and
>
> So you just unleashed an unbounded allocation context on PF_MEMALLOC?
> seems like a really really bad idea.
>
> > 2) mempools are not useful for the GFP_ATOMIC case (we call the
> > memory allocation functions with GFP_ATOMIC).
>
> Mempools work as intended with GFP_ATOMIC: the pool is filled up to the
> specified number of elements using GFP_KERNEL (at creation time). This
> gives GFP_ATOMIC allocations nr_elements extra items once allocations
> would otherwise start failing.

Yup. Changelog is buggy.

> > With the PF_MEMALLOC flag set in current->flags, the VM subsystem skips
> > the watermark checks before allocating memory, effectively guaranteeing
> > memory down to the last free page.
>
> PF_MEMALLOC as is, is meant to salvage the VM from the typical VM
> deadlock. Using it as you do now is not something a driver should ever
> do, and I'm afraid I will have to strongly oppose this patch.
>
> You really really want to calculate an upper bound on your memory
> consumption and reserve this.
>
> So, I'm afraid I'll have to..
>
> NACK!

err, PF_MEMALLOC doesn't actually do anything if in_interrupt(), so your
reason-for-nacking isn't legitimate. And neither is Anil's patch ;)

So I'm thinking that if this patch passed all his testing, a patch which
didn't play these PF_MEMALLOC games would pass the same tests.