2007-10-26 16:12:58

by Haavard Skinnemoen

Subject: [PATCH] DMA: Fix broken device refcounting

When a DMA device is unregistered, its reference count is decremented
twice for each channel: once in dma_class_dev_release() and once in
dma_chan_cleanup(). This may result in the DMA device driver's
remove() function completing before all channels have been cleaned
up, causing lots of use-after-free fun.

Fix it by incrementing the device's reference count twice for each
channel during registration.

Signed-off-by: Haavard Skinnemoen <[email protected]>
---
I'm not sure if this is the correct way to solve it, but it seems to
work. The remove() function does not hang, which indicates that the
device's reference count does drop all the way to zero on
unregistration, which in turn indicates that it did actually drop
_below_ zero before.

drivers/dma/dmaengine.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8248992..302eded 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -397,6 +397,8 @@ int dma_async_device_register(struct dma_device *device)
goto err_out;
}

+ /* One for the channel, one for the class device */
+ kref_get(&device->refcount);
kref_get(&device->refcount);
kref_init(&chan->refcount);
chan->slow_ref = 0;
--
1.5.2.5


2007-10-26 16:42:36

by Dan Williams

Subject: Re: [PATCH] DMA: Fix broken device refcounting


On Fri, 2007-10-26 at 09:12 -0700, Haavard Skinnemoen wrote:
> I'm not sure if this is the correct way to solve it, but it seems to
> work. The remove() function does not hang, which indicates that the
> device's reference count does drop all the way to zero on
> unregistration, which in turn indicates that it did actually drop
> _below_ zero before.

Yeah, Shannon ran into this too... I'd like to be able to clean this up by
reducing the number of times we take the device reference, but the
following patch is still showing problems in Shannon's environment, so I
missed one...

---

dmaengine: fix up dma_device refcounting

From: Dan Williams <[email protected]>

Currently the code drops too many references on the parent device. Change
the scheme to:

+ take a reference at registration:
dma_async_device_register()
+ take a reference for each channel device registered:
device_register(&chan->dev)
- drop a reference for each channel device unregistered:
device_unregister(&chan->dev)
- drop a reference at unregistration:
dma_async_device_unregister()

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 16 ++++------------
1 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 84257f7..d2b600b 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -186,10 +186,9 @@ static void dma_client_chan_alloc(struct dma_client *client)
/* we are done once this client rejects
* an available resource
*/
- if (ack == DMA_ACK) {
+ if (ack == DMA_ACK)
dma_chan_get(chan);
- kref_get(&device->refcount);
- } else if (ack == DMA_NAK)
+ else if (ack == DMA_NAK)
return;
}
}
@@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
{
struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
chan->device->device_free_chan_resources(chan);
- kref_put(&chan->device->refcount, dma_async_device_cleanup);
}
EXPORT_SYMBOL(dma_chan_cleanup);

@@ -276,11 +274,8 @@ static void dma_clients_notify_removed(struct dma_chan *chan)
/* client was holding resources for this channel so
* free it
*/
- if (ack == DMA_ACK) {
+ if (ack == DMA_ACK)
dma_chan_put(chan);
- kref_put(&chan->device->refcount,
- dma_async_device_cleanup);
- }
}

mutex_unlock(&dma_list_mutex);
@@ -320,11 +315,8 @@ void dma_async_client_unregister(struct dma_client *client)
ack = client->event_callback(client, chan,
DMA_RESOURCE_REMOVED);

- if (ack == DMA_ACK) {
+ if (ack == DMA_ACK)
dma_chan_put(chan);
- kref_put(&chan->device->refcount,
- dma_async_device_cleanup);
- }
}

list_del(&client->global_node);

2007-10-26 17:04:37

by Shannon Nelson

Subject: RE: [PATCH] DMA: Fix broken device refcounting

>From: Haavard Skinnemoen [mailto:[email protected]]
>
>When a DMA device is unregistered, its reference count is decremented
>twice for each channel: once in dma_class_dev_release() and once in
>dma_chan_cleanup(). This may result in the DMA device driver's
>remove() function completing before all channels have been cleaned
>up, causing lots of use-after-free fun.
>
>Fix it by incrementing the device's reference count twice for each
>channel during registration.
>
>Signed-off-by: Haavard Skinnemoen <[email protected]>
>---
>I'm not sure if this is the correct way to solve it, but it seems to
>work. The remove() function does not hang, which indicates that the
>device's reference count does drop all the way to zero on
>unregistration, which in turn indicates that it did actually drop
>_below_ zero before.
>
> drivers/dma/dmaengine.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
>index 8248992..302eded 100644
>--- a/drivers/dma/dmaengine.c
>+++ b/drivers/dma/dmaengine.c
>@@ -397,6 +397,8 @@ int dma_async_device_register(struct dma_device *device)
> goto err_out;
> }
>
>+ /* One for the channel, one for the class device */
>+ kref_get(&device->refcount);
> kref_get(&device->refcount);
> kref_init(&chan->refcount);
> chan->slow_ref = 0;
>--
>1.5.2.5
>

As Dan said, we've been discussing this offline and haven't come to an
agreement yet. My version of the patch is the opposite of yours -
instead of adding a kref_get(), I remove one of the kref_put() calls.
--

When a channel is removed from dmaengine, too many kref_put() calls
are made and the device removal happens too soon, usually causing
a panic.

Signed-off-by: Shannon Nelson <[email protected]>
---

drivers/dma/dmaengine.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8248992..144a1b7 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -131,7 +131,6 @@ static void dma_async_device_cleanup(struct kref *kref);
static void dma_class_dev_release(struct class_device *cd)
{
 struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
- kref_put(&chan->device->refcount, dma_async_device_cleanup);
}

static struct class dma_devclass = {

2007-10-26 17:11:15

by Shannon Nelson

Subject: RE: [PATCH] DMA: Fix broken device refcounting

>-----Original Message-----
>From: Nelson, Shannon
>Sent: Friday, October 26, 2007 10:00 AM
>To: 'Haavard Skinnemoen'
>Cc: Williams, Dan J; [email protected];
>[email protected]
>Subject: RE: [PATCH] DMA: Fix broken device refcounting
>
>--
>
>When a channel is removed from dmaengine, too many kref_put() calls
>are made and the device removal happens too soon, usually causing
>a panic.
>
>Signed-off-by: Shannon Nelson <[email protected]>
>---
>
> drivers/dma/dmaengine.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
>diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
>index 8248992..144a1b7 100644
>--- a/drivers/dma/dmaengine.c
>+++ b/drivers/dma/dmaengine.c
>@@ -131,7 +131,6 @@ static void dma_async_device_cleanup(struct kref *kref);
> static void dma_class_dev_release(struct class_device *cd)
> {
> struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
>- kref_put(&chan->device->refcount, dma_async_device_cleanup);
> }
>
> static struct class dma_devclass = {

Of course, to avoid compiler complaints about the now-unused chan variable,
it might be better as something like:

static void dma_class_dev_release(struct class_device *cd)
{
- struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
- kref_put(&chan->device->refcount, dma_async_device_cleanup);
+ return;
}

sln

2007-10-27 13:49:50

by Haavard Skinnemoen

Subject: Re: [PATCH] DMA: Fix broken device refcounting

On Fri, 26 Oct 2007 09:36:17 -0700
Dan Williams <[email protected]> wrote:

> @@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
> {
> struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
> chan->device->device_free_chan_resources(chan);
> - kref_put(&chan->device->refcount, dma_async_device_cleanup);
> }
> EXPORT_SYMBOL(dma_chan_cleanup);

While I can't see any problems with the rest of the patch, I think this
part is wrong for the same reasons removing the kref_put() from the
class device cleanup function is. I don't see any constraint that
guarantees that dma_chan_cleanup() will always be called before
dma_dev_release(), which means that "chan" may have been freed before
this function gets a chance to run. Please correct me if I'm wrong.

Håvard

2007-10-27 19:15:09

by Dan Williams

Subject: Re: [PATCH] DMA: Fix broken device refcounting

On Sat, 2007-10-27 at 06:49 -0700, Haavard Skinnemoen wrote:
> On Fri, 26 Oct 2007 09:36:17 -0700
> Dan Williams <[email protected]> wrote:
>
> > @@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
> > {
> > struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
> > chan->device->device_free_chan_resources(chan);
> > - kref_put(&chan->device->refcount, dma_async_device_cleanup);
> > }
> > EXPORT_SYMBOL(dma_chan_cleanup);
>
> While I can't see any problems with the rest of the patch, I think this
> part is wrong for the same reasons removing the kref_put() from the
> class device cleanup function is. I don't see any constraint that
> guarantees that dma_chan_cleanup() will always be called before
> dma_dev_release(), which means that "chan" may have been freed before
> this function gets a chance to run. Please correct me if I'm wrong.

Absolutely right. The driver, not dmaengine, frees the memory, so there
must be a per-channel reference on the device to hold off the driver's
remove routine.
>
> Håvard

So how about this...

---snip---
dmaengine: Fix broken device refcounting

From: Haavard Skinnemoen <[email protected]>

When a DMA device is unregistered, its reference count is decremented
twice for each channel: once in dma_class_dev_release() and once in
dma_chan_cleanup(). This may result in the DMA device driver's
remove() function completing before all channels have been cleaned
up, causing lots of use-after-free fun.

Fix it by incrementing the device's reference count twice for each
channel during registration.

Signed-off-by: Haavard Skinnemoen <[email protected]>
[[email protected]: kill unnecessary client refcounting]
Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 17 ++++++-----------
1 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 84257f7..ec7e871 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -186,10 +186,9 @@ static void dma_client_chan_alloc(struct dma_client *client)
/* we are done once this client rejects
* an available resource
*/
- if (ack == DMA_ACK) {
+ if (ack == DMA_ACK)
dma_chan_get(chan);
- kref_get(&device->refcount);
- } else if (ack == DMA_NAK)
+ else if (ack == DMA_NAK)
return;
}
}
@@ -276,11 +275,8 @@ static void dma_clients_notify_removed(struct dma_chan *chan)
/* client was holding resources for this channel so
* free it
*/
- if (ack == DMA_ACK) {
+ if (ack == DMA_ACK)
dma_chan_put(chan);
- kref_put(&chan->device->refcount,
- dma_async_device_cleanup);
- }
}

mutex_unlock(&dma_list_mutex);
@@ -320,11 +316,8 @@ void dma_async_client_unregister(struct dma_client *client)
ack = client->event_callback(client, chan,
DMA_RESOURCE_REMOVED);

- if (ack == DMA_ACK) {
+ if (ack == DMA_ACK)
dma_chan_put(chan);
- kref_put(&chan->device->refcount,
- dma_async_device_cleanup);
- }
}

list_del(&client->global_node);
@@ -401,6 +394,8 @@ int dma_async_device_register(struct dma_device *device)
goto err_out;
}

+ /* One for the channel, one for the class device */
+ kref_get(&device->refcount);
kref_get(&device->refcount);
kref_init(&chan->refcount);
chan->slow_ref = 0;

2007-10-28 19:18:02

by Shannon Nelson

Subject: Re: [PATCH] DMA: Fix broken device refcounting

On 10/27/07, Dan Williams <[email protected]> wrote:
> On Sat, 2007-10-27 at 06:49 -0700, Haavard Skinnemoen wrote:
> > On Fri, 26 Oct 2007 09:36:17 -0700
> > Dan Williams <[email protected]> wrote:
> >
> > > @@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
> > > {
> > > struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
> > > chan->device->device_free_chan_resources(chan);
> > > - kref_put(&chan->device->refcount, dma_async_device_cleanup);
> > > }
> > > EXPORT_SYMBOL(dma_chan_cleanup);
> >
> > While I can't see any problems with the rest of the patch, I think this
> > part is wrong for the same reasons removing the kref_put() from the
> > class device cleanup function is. I don't see any constraint that
> > guarantees that dma_chan_cleanup() will always be called before
> > dma_dev_release(), which means that "chan" may have been freed before
> > this function gets a chance to run. Please correct me if I'm wrong.
>
> Absolutely right, the driver, not dmaengine, frees the memory so there
> must be a per channel reference on the device to hold off the driver's
> remove routine.
> >
> > Håvard
>
> So how about this...
>

Thanks - when I get back in tomorrow morning I'll test this to see
that it gets rid of the panic that I've been getting.

sln
--
==============================================
Mr. Shannon Nelson Parents can't afford to be squeamish.

2007-10-29 16:02:53

by Shannon Nelson

Subject: RE: [PATCH] DMA: Fix broken device refcounting

>From: Williams, Dan J
>On Sat, 2007-10-27 at 06:49 -0700, Haavard Skinnemoen wrote:
>> On Fri, 26 Oct 2007 09:36:17 -0700
>> Dan Williams <[email protected]> wrote:
>>
>> > @@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
>> > {
>> > struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
>> > chan->device->device_free_chan_resources(chan);
>> > - kref_put(&chan->device->refcount, dma_async_device_cleanup);
>> > }
>> > EXPORT_SYMBOL(dma_chan_cleanup);
>>
>> While I can't see any problems with the rest of the patch, I think this
>> part is wrong for the same reasons removing the kref_put() from the
>> class device cleanup function is. I don't see any constraint that
>> guarantees that dma_chan_cleanup() will always be called before
>> dma_dev_release(), which means that "chan" may have been freed before
>> this function gets a chance to run. Please correct me if I'm wrong.
>
>Absolutely right, the driver, not dmaengine, frees the memory so there
>must be a per channel reference on the device to hold off the driver's
>remove routine.
>>
>> Håvard
>
>So how about this...
>

I tested this in my ioatdma setup and no longer get the panic. I'm good with this if you two are happy with it.

Signed-off-by: Shannon Nelson <[email protected]>

sln

2007-10-29 16:11:41

by Haavard Skinnemoen

Subject: Re: [PATCH] DMA: Fix broken device refcounting

On Mon, 29 Oct 2007 09:02:34 -0700
"Nelson, Shannon" <[email protected]> wrote:

> I tested this in my ioatdma setup and no longer get the panic. I'm good with this if you two are happy with it.

Looks good to me too, although I haven't had a chance to test it yet.

Thanks,
Håvard