2008-03-25 20:47:35

by Jarod Wilson

Subject: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

There's a nasty memory leak in firewire-ohci's ar_context_tasklet(), in that
we're not freeing up some of the memory we use for each ar_buffer, due to a
moving pointer. The problem has been there for a while, but didn't start
to be noticed until we were doing a coherent allocation for the ar_buffer --
meaning we have a smaller pool of memory to work with now, so the problem
crops up sooner. The manifestation of this comes after doing a bunch of I/O to
a firewire disk, which eventually stalls, and this starts spewing to the
console:

PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0

The device there is one of my FireWire controllers trying to do I/O. The host
is a fairly new rev. Opteron.

Just need to make sure we're freeing the correct memory range on each pass
through ar_context_tasklet to fix it. Probably something we ought to sneak into
2.6.25 if it's still doable...

Signed-off-by: Jarod Wilson <[email protected]>
---

drivers/firewire/fw-ohci.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
index 8ff9059..e1d50f7 100644
--- a/drivers/firewire/fw-ohci.c
+++ b/drivers/firewire/fw-ohci.c
@@ -579,7 +579,8 @@ static void ar_context_tasklet(unsigned long data)

 	if (d->res_count == 0) {
 		size_t size, rest, offset;
-		dma_addr_t buffer_bus;
+		dma_addr_t start_bus;
+		void *start;

 		/*
 		 * This descriptor is finished and we may have a
@@ -588,9 +589,9 @@ static void ar_context_tasklet(unsigned long data)
 		 */

 		offset = offsetof(struct ar_buffer, data);
-		buffer_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
+		start = buffer = ab;
+		start_bus = le32_to_cpu(ab->descriptor.data_address) - offset;

-		buffer = ab;
 		ab = ab->next;
 		d = &ab->descriptor;
 		size = buffer + PAGE_SIZE - ctx->pointer;
@@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
 			buffer = handle_ar_packet(ctx, buffer);

 		dma_free_coherent(ohci->card.device, PAGE_SIZE,
-				  buffer, buffer_bus);
+				  start, start_bus);
 		ar_context_add_page(ctx);
 	} else {
 		buffer = ctx->pointer;

--
Jarod Wilson
[email protected]
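The "moving pointer" problem described in the patch can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver code: malloc()/free() stand in for dma_alloc_coherent()/dma_free_coherent(), and BUF_SIZE, PKT_SIZE, consume_packet() and drain_and_free() are hypothetical names invented for this illustration. The point is only that the base address is saved before the cursor starts moving, so the release call gets the address that was actually allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUF_SIZE 4096	/* stand-in for PAGE_SIZE */
#define PKT_SIZE 64	/* hypothetical fixed packet size */

/*
 * Hypothetical stand-in for handle_ar_packet(): consumes one packet
 * and returns a pointer just past it.
 */
static char *consume_packet(char *p)
{
	return p + PKT_SIZE;
}

/*
 * Walk a heap buffer with a moving cursor, then release the buffer.
 * Returns the number of packets consumed.
 */
static size_t drain_and_free(void *buf)
{
	char *start = buf;		/* remember the allocation's base */
	char *cursor = buf;		/* this pointer moves as we parse */
	char *end = start + BUF_SIZE;
	size_t n = 0;

	while (cursor < end) {
		cursor = consume_packet(cursor);
		n++;
	}

	/*
	 * cursor now equals end, not the base. Handing cursor to the
	 * release function would be the bug; use the saved base instead.
	 */
	assert(cursor != start);
	free(start);
	return n;
}
```

Calling drain_and_free() on a BUF_SIZE-byte malloc() allocation consumes BUF_SIZE / PKT_SIZE packets and releases the buffer through its original address.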


2008-03-25 22:30:30

by Stefan Richter

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

Jarod Wilson wrote:
> Just need to make sure we're freeing the correct memory

That would be a plus. :-)

> Probably something we ought to sneak into 2.6.25 if it's still doable...

Looks good and initial testing here is fine. I don't have a board with
IOMMU though. Will look over it once more tomorrow, then submit it.
--
Stefan Richter
-=====-==--- --== ==--=
http://arcgraph.de/sr/

2008-03-26 07:10:42

by Stefan Richter

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

Jarod Wilson wrote:
> @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
>  			buffer = handle_ar_packet(ctx, buffer);
>
>  		dma_free_coherent(ohci->card.device, PAGE_SIZE,
> -				  buffer, buffer_bus);
> +				  start, start_bus);
>  		ar_context_add_page(ctx);

On the other hand, why do we free a page + allocate a page?
Why don't we re-initialize and re-add the old page?
--
Stefan Richter
-=====-==--- --== ==-=-
http://arcgraph.de/sr/

2008-03-26 13:10:26

by Jarod Wilson

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

On Wednesday 26 March 2008 03:09:47 am Stefan Richter wrote:
> Jarod Wilson wrote:
> > @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
> >  			buffer = handle_ar_packet(ctx, buffer);
> >
> >  		dma_free_coherent(ohci->card.device, PAGE_SIZE,
> > -				  buffer, buffer_bus);
> > +				  start, start_bus);
> >  		ar_context_add_page(ctx);
>
> On the other hand, why do we free a page + allocate a page?
> Why don't we re-initialize and re-add the old page?

Oh good, I'm not crazy (outside of having firewire on the brain way too much
right now). I had that same thought tossing and turning in bed late last
night. :)

--
Jarod Wilson
[email protected]

2008-03-26 21:37:45

by Jarod Wilson

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

On Tuesday 25 March 2008 04:47:16 pm Jarod Wilson wrote:
> There's a nasty memory leak in firewire-ohci's ar_context_tasklet(), in
> that we're not freeing up some of the memory we use for each ar_buffer, due
> to a moving pointer. The problem has been there for a while, but didn't
> start to be noticed until we were doing a coherent allocation for the
> ar_buffer -- meaning we have a smaller pool of memory to work with now, so
> the problem crops up sooner. The manifestation of this comes after doing a
> bunch of I/O to a firewire disk, which eventually stalls, and this starts
> spewing to the console:
>
> PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0
>
> The device there is one of my FireWire controllers trying to do I/O. The
> host is a fairly new rev. Opteron.
>
> Just need to make sure we're freeing the correct memory range on each pass
> through ar_context_tasklet to fix it. Probably something we ought to sneak
> into 2.6.25 if it's still doable...

So as it turns out, while this is indeed a leak that needs to be plugged, it
does NOT remedy the 'out of IOMMU space' issue; it just delays it a while
longer. Still working on tracing the root cause of the memory exhaustion.


--
Jarod Wilson
[email protected]

2008-03-26 23:50:39

by Stefan Richter

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

I wrote:
> Jarod Wilson wrote:
>> @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
>>  			buffer = handle_ar_packet(ctx, buffer);
>>
>>  		dma_free_coherent(ohci->card.device, PAGE_SIZE,
>> -				  buffer, buffer_bus);
>> +				  start, start_bus);
>>  		ar_context_add_page(ctx);
>
> On the other hand, why do we free a page + allocate a page?
> Why don't we re-initialize and re-add the old page?


Meanwhile I tried a simple modification to ar_context_add_page and its
callers which results in _add_page simply re-adding the old page. I must
be doing something fundamentally wrong, though.

After plugging in a FW disk and starting hdparm -tT, I get the modified
_add_page called for the ar_request_ctx, then for the ar_response_ctx,
then for the ar_request_ctx again, then everything stalls in one of
these modes:
- No status write request reception is logged anymore, or
- status write request reception with evt_no_status is logged.
The number of _add_page calls for ar_request_ctx until failure
corresponds to the number of pages added in ar_context_init.
(Normally two, I also tried three and four.)

Just FYI, here is basically what I tested, with a debug printk in it.
---
drivers/firewire/fw-ohci.c | 34 +++++++++++++++-------------------
1 file changed, 15 insertions(+), 19 deletions(-)

Index: linux/drivers/firewire/fw-ohci.c
===================================================================
--- linux.orig/drivers/firewire/fw-ohci.c
+++ linux/drivers/firewire/fw-ohci.c
@@ -451,14 +451,19 @@ ohci_update_phy_reg(struct fw_card *card
 	return 0;
 }

-static int ar_context_add_page(struct ar_context *ctx)
+static int ar_context_add_page(struct ar_context *ctx, struct ar_buffer *ab)
 {
 	struct device *dev = ctx->ohci->card.device;
-	struct ar_buffer *ab;
 	dma_addr_t uninitialized_var(ab_bus);
-	size_t offset;
+	size_t offset = offsetof(struct ar_buffer, data);

-	ab = dma_alloc_coherent(dev, PAGE_SIZE, &ab_bus, GFP_ATOMIC);
+	if (ab == NULL)
+		ab = dma_alloc_coherent(dev, PAGE_SIZE, &ab_bus, GFP_KERNEL);
+	else {
+		ab_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
+		fw_notify("=== %s ===\n",
+			  ctx == &ctx->ohci->ar_request_ctx ? "Req " : "Resp");
+	}
 	if (ab == NULL)
 		return -ENOMEM;

@@ -466,7 +471,6 @@ static int ar_context_add_page(struct ar
 	ab->descriptor.control = cpu_to_le16(DESCRIPTOR_INPUT_MORE |
 					     DESCRIPTOR_STATUS |
 					     DESCRIPTOR_BRANCH_ALWAYS);
-	offset = offsetof(struct ar_buffer, data);
 	ab->descriptor.req_count = cpu_to_le16(PAGE_SIZE - offset);
 	ab->descriptor.data_address = cpu_to_le32(ab_bus + offset);
 	ab->descriptor.res_count = cpu_to_le16(PAGE_SIZE - offset);
@@ -569,8 +573,7 @@ static __le32 *handle_ar_packet(struct a
 static void ar_context_tasklet(unsigned long data)
 {
 	struct ar_context *ctx = (struct ar_context *)data;
-	struct fw_ohci *ohci = ctx->ohci;
-	struct ar_buffer *ab;
+	struct ar_buffer *ab, *old_ab;
 	struct descriptor *d;
 	void *buffer, *end;

@@ -578,9 +581,7 @@ static void ar_context_tasklet(unsigned
 	d = &ab->descriptor;

 	if (d->res_count == 0) {
-		size_t size, rest, offset;
-		dma_addr_t start_bus;
-		void *start;
+		size_t size, rest;

 		/*
 		 * This descriptor is finished and we may have a
@@ -588,10 +589,7 @@ static void ar_context_tasklet(unsigned
 		 * reuse the page for reassembling the split packet.
 		 */

-		offset = offsetof(struct ar_buffer, data);
-		start = buffer = ab;
-		start_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
-
+		buffer = old_ab = ab;
 		ab = ab->next;
 		d = &ab->descriptor;
 		size = buffer + PAGE_SIZE - ctx->pointer;
@@ -605,9 +603,7 @@ static void ar_context_tasklet(unsigned
 		while (buffer < end)
 			buffer = handle_ar_packet(ctx, buffer);

-		dma_free_coherent(ohci->card.device, PAGE_SIZE,
-				  start, start_bus);
-		ar_context_add_page(ctx);
+		ar_context_add_page(ctx, old_ab);
 	} else {
 		buffer = ctx->pointer;
 		ctx->pointer = end =
@@ -628,8 +624,8 @@ ar_context_init(struct ar_context *ctx,
 	ctx->last_buffer = &ab;
 	tasklet_init(&ctx->tasklet, ar_context_tasklet, (unsigned long)ctx);

-	ar_context_add_page(ctx);
-	ar_context_add_page(ctx);
+	ar_context_add_page(ctx, NULL);
+	ar_context_add_page(ctx, NULL);
 	ctx->current_buffer = ab.next;
 	ctx->pointer = ctx->current_buffer->data;


--
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/
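The general idea of re-initializing and re-adding a drained buffer, instead of freeing it and allocating a fresh one, can be sketched in user space as follows. This is only an illustration under stated assumptions, not the tested patch and not an explanation of why that patch stalls: struct rx_buffer, struct rx_chain and the chain_*() helpers are hypothetical names, malloc() stands in for the coherent DMA allocator, and nothing here models the controller side (descriptor branch addresses, waking the context, memory ordering).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUF_BYTES 4096		/* stand-in for PAGE_SIZE */

/* Hypothetical, simplified stand-in for struct ar_buffer. */
struct rx_buffer {
	size_t res_count;	/* bytes still free; 0 means full */
	struct rx_buffer *next;
	unsigned char data[];
};

struct rx_chain {
	struct rx_buffer *head;	/* buffer currently being drained */
	struct rx_buffer *tail;	/* last buffer queued for the producer */
};

/* Mark a buffer as empty again. */
static void buffer_reset(struct rx_buffer *b)
{
	b->res_count = BUF_BYTES - offsetof(struct rx_buffer, data);
	b->next = NULL;
}

/* Reset a buffer and queue it at the end of the chain. */
static void chain_append(struct rx_chain *c, struct rx_buffer *b)
{
	buffer_reset(b);
	if (c->tail != NULL)
		c->tail->next = b;
	else
		c->head = b;
	c->tail = b;
}

/* Free every buffer in the chain. */
static void chain_destroy(struct rx_chain *c)
{
	while (c->head != NULL) {
		struct rx_buffer *b = c->head;

		c->head = b->next;
		free(b);
	}
	c->tail = NULL;
}

/* Allocate nbufs buffers once, up front. Returns 0 or -1. */
static int chain_init(struct rx_chain *c, int nbufs)
{
	c->head = c->tail = NULL;
	for (int i = 0; i < nbufs; i++) {
		struct rx_buffer *b = malloc(BUF_BYTES);

		if (b == NULL) {
			chain_destroy(c);
			return -1;
		}
		chain_append(c, b);
	}
	return 0;
}

/* The head buffer has been fully drained: requeue it, no free/alloc. */
static void chain_recycle_head(struct rx_chain *c)
{
	struct rx_buffer *old = c->head;

	if (old->next == NULL) {	/* only one buffer: reset in place */
		buffer_reset(old);
		return;
	}
	c->head = old->next;
	chain_append(c, old);
}
```

With two buffers A and B queued by chain_init(), one call to chain_recycle_head() leaves B at the head and A at the tail, and the allocator is not touched again until chain_destroy().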

2008-03-27 00:13:20

by Stefan Richter

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

Jarod Wilson wrote:
> So as it turns out, while this is indeed a leak that needs to be plugged, it
> does NOT remedy the 'out of IOMMU space' issue; it just delays it a while
> longer. Still working on tracing the root cause of the memory exhaustion.

Do you want to change the wording of the patch description before I
submit it upstream?
--
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/

2008-03-27 02:18:01

by Jarod Wilson

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

On Wednesday 26 March 2008 08:12:51 pm Stefan Richter wrote:
> Jarod Wilson wrote:
> > So as it turns out, while this is indeed a leak that needs to be plugged,
> > it does NOT remedy the 'out of IOMMU space' issue; it just delays it a
> > while longer. Still working on tracing the root cause of the memory
> > exhaustion.
>
> Do you want to change the wording of the patch description before I
> submit it upstream?

Yeah, I'll whip something up in just a sec and get it out the door...

--
Jarod Wilson
[email protected]

2008-03-27 07:57:26

by Stefan Richter

Subject: Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler

I wrote:
> I wrote:
>> On the other hand, why do we free a page + allocate a page?
>> Why don't we re-initialize and re-add the old page?
>
> Meanwhile I tried a simple modification to ar_context_add_page and its
> callers which results in _add_page simply re-adding the old page. I must
> be doing something fundamentally wrong, though.

Besides, the current code which reassembles packets that reach into the
next buffer is broken for packets whose total size approaches PAGE_SIZE.
(Remember, async packets can be sized 4kB + 1394 headers + OHCI
trailer.) Reminds me of ohci1394 somehow. :-(
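
The size arithmetic behind that remark can be made concrete with a small sketch. All constants below are assumptions chosen for illustration rather than values read from the driver: a 4096-byte page, a 16-byte 1394 block-request header, a 4-byte OHCI trailer, and 24 bytes of per-buffer bookkeeping in front of the data area. The helper names are hypothetical as well.

```c
#include <assert.h>

/* Illustrative assumptions, not values taken from the driver source. */
#define PAGE_BYTES	4096u	/* assumed PAGE_SIZE */
#define MAX_PAYLOAD	4096u	/* the "4kB" payload from the text */
#define PKT_HEADER	16u	/* assumed 1394 block-request header */
#define OHCI_TRAILER	4u	/* assumed one trailer quadlet */
#define BUF_BOOKKEEPING	24u	/* assumed descriptor + next pointer */
#define DATA_PER_BUF	(PAGE_BYTES - BUF_BOOKKEEPING)

/* Largest packet the receive path would have to handle. */
static unsigned int max_packet_bytes(void)
{
	return PKT_HEADER + MAX_PAYLOAD + OHCI_TRAILER;
}

/* Can a packet of len bytes be reassembled inside a single page? */
static int fits_in_one_page(unsigned int len)
{
	return len <= PAGE_BYTES;
}

/*
 * Number of buffers touched by a packet of len bytes that starts
 * offset bytes into a data area of cap bytes (offset < cap).
 */
static unsigned int buffers_spanned(unsigned int cap, unsigned int offset,
				    unsigned int len)
{
	unsigned int first = cap - offset;	/* room left in buffer one */

	if (len <= first)
		return 1;
	/* Whole or partial buffers needed for the remainder, rounded up. */
	return 1 + (len - first + cap - 1) / cap;
}
```

Under these assumptions the largest packet is 4116 bytes, which is more than one page, so it cannot be reassembled inside a single page-sized buffer. It always touches at least two buffers, and three if it starts within the last few bytes of one.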

I will attempt to fix this post-2.6.25, unless you aspire to do so.
--
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/