2019-10-23 00:03:11

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

dma_chan_to_owner() dereferences the driver from the struct device to
obtain the owner and call module_[get|put](). However, if the backing
device is unbound before the dma_device is unregistered, the driver
will be cleared and this will cause a NULL pointer dereference.

Instead, store a pointer to the owner module in the dma_device struct
so the module reference can be properly put when the channel is put, even
if the backing device was destroyed first.

This change helps to support a safer unbind of DMA engines.
If the dma_device is unregistered in the driver's remove function,
there's no guarantee that there are no existing clients and a users
action may trigger the WARN_ONCE in dma_async_device_unregister()
which is unlikely to leave the system in a consistent state.
Instead, a better approach is to allow the backing driver to go away
and fail any subsequent requests to it.

Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/dma/dmaengine.c | 4 +++-
include/linux/dmaengine.h | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 03ac4b96117c..4b604086b1b3 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -179,7 +179,7 @@ __dma_device_satisfies_mask(struct dma_device *device,

static struct module *dma_chan_to_owner(struct dma_chan *chan)
{
- return chan->device->dev->driver->owner;
+ return chan->device->owner;
}

/**
@@ -919,6 +919,8 @@ int dma_async_device_register(struct dma_device *device)
return -EIO;
}

+ device->owner = device->dev->driver->owner;
+
if (dma_has_cap(DMA_MEMCPY, device->cap_mask) && !device->device_prep_dma_memcpy) {
dev_err(device->dev,
"Device claims capability %s, but op is not defined\n",
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 8fcdee1c0cf9..13aa0abb71de 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -674,6 +674,7 @@ struct dma_filter {
* @fill_align: alignment shift for memset operations
* @dev_id: unique device ID
* @dev: struct device reference for dma mapping api
+ * @owner: owner module (automatically set based on the provided dev)
* @src_addr_widths: bit mask of src addr widths the device supports
* Width is specified in bytes, e.g. for a device supporting
* a width of 4 the mask should have BIT(4) set.
@@ -737,6 +738,7 @@ struct dma_device {

int dev_id;
struct device *dev;
+ struct module *owner;

u32 src_addr_widths;
u32 dst_addr_widths;
--
2.20.1


2019-11-09 17:19:59

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

Hi Logan,

Sorry for delay in reply!

On 22-10-19, 15:46, Logan Gunthorpe wrote:
> dma_chan_to_owner() dereferences the driver from the struct device to
> obtain the owner and call module_[get|put](). However, if the backing
> device is unbound before the dma_device is unregistered, the driver
> will be cleared and this will cause a NULL pointer dereference.

Have you been able to repro this? If so how..?

The expectation is that the driver shall unregister before removed.
>
> Instead, store a pointer to the owner module in the dma_device struct
> so the module reference can be properly put when the channel is put, even
> if the backing device was destroyed first.
>
> This change helps to support a safer unbind of DMA engines.

For error cases which should be fixed, so maybe this is a right way and
gets things fixed :)

> If the dma_device is unregistered in the driver's remove function,
> there's no guarantee that there are no existing clients and a users
> action may trigger the WARN_ONCE in dma_async_device_unregister()
> which is unlikely to leave the system in a consistent state.
> Instead, a better approach is to allow the backing driver to go away
> and fail any subsequent requests to it.
>
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/dma/dmaengine.c | 4 +++-
> include/linux/dmaengine.h | 2 ++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 03ac4b96117c..4b604086b1b3 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -179,7 +179,7 @@ __dma_device_satisfies_mask(struct dma_device *device,
>
> static struct module *dma_chan_to_owner(struct dma_chan *chan)
> {
> - return chan->device->dev->driver->owner;
> + return chan->device->owner;
> }
>
> /**
> @@ -919,6 +919,8 @@ int dma_async_device_register(struct dma_device *device)
> return -EIO;
> }
>
> + device->owner = device->dev->driver->owner;
> +
> if (dma_has_cap(DMA_MEMCPY, device->cap_mask) && !device->device_prep_dma_memcpy) {
> dev_err(device->dev,
> "Device claims capability %s, but op is not defined\n",
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 8fcdee1c0cf9..13aa0abb71de 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -674,6 +674,7 @@ struct dma_filter {
> * @fill_align: alignment shift for memset operations
> * @dev_id: unique device ID
> * @dev: struct device reference for dma mapping api
> + * @owner: owner module (automatically set based on the provided dev)
> * @src_addr_widths: bit mask of src addr widths the device supports
> * Width is specified in bytes, e.g. for a device supporting
> * a width of 4 the mask should have BIT(4) set.
> @@ -737,6 +738,7 @@ struct dma_device {
>
> int dev_id;
> struct device *dev;
> + struct module *owner;
>
> u32 src_addr_widths;
> u32 dst_addr_widths;
> --
> 2.20.1

--
~Vinod

2019-11-11 16:51:32

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 2019-11-09 10:18 a.m., Vinod Koul wrote:
> Hi Logan,
>
> Sorry for delay in reply!
>
> On 22-10-19, 15:46, Logan Gunthorpe wrote:
>> dma_chan_to_owner() dereferences the driver from the struct device to
>> obtain the owner and call module_[get|put](). However, if the backing
>> device is unbound before the dma_device is unregistered, the driver
>> will be cleared and this will cause a NULL pointer dereference.
>
> Have you been able to repro this? If so how..?
>
> The expectation is that the driver shall unregister before removed.

Yes, with my new driver, if I do a PCI unbind (which unregisters) while
the DMA engine is in use, it panics. The point is the underlying driver
can go away before the channel is removed.

I suspect this is less of an issue for most devices as they wouldn't
normally be unbound while in use (for example there's really no reason
to ever unbind IOAT seeing it's built into the system). Though, the fact
is, the user could unbind these devices at anytime and we don't want to
panic if they do.

>>
>> Instead, store a pointer to the owner module in the dma_device struct
>> so the module reference can be properly put when the channel is put, even
>> if the backing device was destroyed first.
>>
>> This change helps to support a safer unbind of DMA engines.
>
> For error cases which should be fixed, so maybe this is a right way and
> gets things fixed :)

Yes, if you'd like to merge the first two patches ahead of the rest,
that would make sense to me.

Thanks,

Logan

2019-11-12 05:58:29

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

On 11-11-19, 09:50, Logan Gunthorpe wrote:
>
>
> On 2019-11-09 10:18 a.m., Vinod Koul wrote:
> > Hi Logan,
> >
> > Sorry for delay in reply!
> >
> > On 22-10-19, 15:46, Logan Gunthorpe wrote:
> >> dma_chan_to_owner() dereferences the driver from the struct device to
> >> obtain the owner and call module_[get|put](). However, if the backing
> >> device is unbound before the dma_device is unregistered, the driver
> >> will be cleared and this will cause a NULL pointer dereference.
> >
> > Have you been able to repro this? If so how..?
> >
> > The expectation is that the driver shall unregister before removed.
>
> Yes, with my new driver, if I do a PCI unbind (which unregisters) while
> the DMA engine is in use, it panics. The point is the underlying driver
> can go away before the channel is removed.

and in your driver remove you do not unregister? When unbind is invoked
the driver remove is invoked by core and you should unregister whatever
you have registered in your probe!

Said that, if someone is using the dmaengine at that point of time, it
is not a nice thing to do and can cause issues, but on idle it should
just work!

> I suspect this is less of an issue for most devices as they wouldn't
> normally be unbound while in use (for example there's really no reason
> to ever unbind IOAT seeing it's built into the system). Though, the fact
> is, the user could unbind these devices at anytime and we don't want to
> panic if they do.

There are many drivers which do modules so yes I am expecting unbind and
even a bind following that to work

--
~Vinod

2019-11-12 16:47:05

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 2019-11-11 10:56 p.m., Vinod Koul wrote:
> On 11-11-19, 09:50, Logan Gunthorpe wrote:
>>
>>
>> On 2019-11-09 10:18 a.m., Vinod Koul wrote:
>>> Hi Logan,
>>>
>>> Sorry for delay in reply!
>>>
>>> On 22-10-19, 15:46, Logan Gunthorpe wrote:
>>>> dma_chan_to_owner() dereferences the driver from the struct device to
>>>> obtain the owner and call module_[get|put](). However, if the backing
>>>> device is unbound before the dma_device is unregistered, the driver
>>>> will be cleared and this will cause a NULL pointer dereference.
>>>
>>> Have you been able to repro this? If so how..?
>>>
>>> The expectation is that the driver shall unregister before removed.
>>
>> Yes, with my new driver, if I do a PCI unbind (which unregisters) while
>> the DMA engine is in use, it panics. The point is the underlying driver
>> can go away before the channel is removed.
>
> and in your driver remove you do not unregister? When unbind is invoked
> the driver remove is invoked by core and you should unregister whatever
> you have registered in your probe!
>
> Said that, if someone is using the dmaengine at that point of time, it
> is not a nice thing to do and can cause issues, but on idle it should
> just work!

But that's the problem. We can't expect our users to be "nice" and not
unbind when the driver is in use. Killing the kernel if the user
unexpectedly unbinds is not acceptable.

>> I suspect this is less of an issue for most devices as they wouldn't
>> normally be unbound while in use (for example there's really no reason
>> to ever unbind IOAT seeing it's built into the system). Though, the fact
>> is, the user could unbind these devices at anytime and we don't want to
>> panic if they do.
>
> There are many drivers which do modules so yes I am expecting unbind and
> even a bind following that to work

Except they will panic if they unbind while in use, so that's a
questionable definition of "work".

Logan

2019-11-14 04:56:57

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

On 12-11-19, 09:45, Logan Gunthorpe wrote:
>
>
> On 2019-11-11 10:56 p.m., Vinod Koul wrote:
> > On 11-11-19, 09:50, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-11-09 10:18 a.m., Vinod Koul wrote:
> >>> Hi Logan,
> >>>
> >>> Sorry for delay in reply!
> >>>
> >>> On 22-10-19, 15:46, Logan Gunthorpe wrote:
> >>>> dma_chan_to_owner() dereferences the driver from the struct device to
> >>>> obtain the owner and call module_[get|put](). However, if the backing
> >>>> device is unbound before the dma_device is unregistered, the driver
> >>>> will be cleared and this will cause a NULL pointer dereference.
> >>>
> >>> Have you been able to repro this? If so how..?
> >>>
> >>> The expectation is that the driver shall unregister before removed.
> >>
> >> Yes, with my new driver, if I do a PCI unbind (which unregisters) while
> >> the DMA engine is in use, it panics. The point is the underlying driver
> >> can go away before the channel is removed.
> >
> > and in your driver remove you do not unregister? When unbind is invoked
> > the driver remove is invoked by core and you should unregister whatever
> > you have registered in your probe!
> >
> > Said that, if someone is using the dmaengine at that point of time, it
> > is not a nice thing to do and can cause issues, but on idle it should
> > just work!
>
> But that's the problem. We can't expect our users to be "nice" and not
> unbind when the driver is in use. Killing the kernel if the user
> unexpectedly unbinds is not acceptable.

And that is why we review the code and ensure this does not happen and
behaviour is as expected

> >> I suspect this is less of an issue for most devices as they wouldn't
> >> normally be unbound while in use (for example there's really no reason
> >> to ever unbind IOAT seeing it's built into the system). Though, the fact
> >> is, the user could unbind these devices at anytime and we don't want to
> >> panic if they do.
> >
> > There are many drivers which do modules so yes I am expecting unbind and
> > even a bind following that to work
>
> Except they will panic if they unbind while in use, so that's a
> questionable definition of "work".

dmaengine core has module reference so while they are being used they
won't be removed (unless I complete misread the driver core behaviour)

--
~Vinod

2019-11-14 17:05:40

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 2019-11-13 9:55 p.m., Vinod Koul wrote:
>> But that's the problem. We can't expect our users to be "nice" and not
>> unbind when the driver is in use. Killing the kernel if the user
>> unexpectedly unbinds is not acceptable.
>
> And that is why we review the code and ensure this does not happen and
> behaviour is as expected

Yes, but the current code can kill the kernel when the driver is unbound.

>>>> I suspect this is less of an issue for most devices as they wouldn't
>>>> normally be unbound while in use (for example there's really no reason
>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
>>>> is, the user could unbind these devices at anytime and we don't want to
>>>> panic if they do.
>>>
>>> There are many drivers which do modules so yes I am expecting unbind and
>>> even a bind following that to work
>>
>> Except they will panic if they unbind while in use, so that's a
>> questionable definition of "work".
>
> dmaengine core has module reference so while they are being used they
> won't be removed (unless I complete misread the driver core behaviour)

Yes, as I mentioned in my other email, holding a module reference does
not prevent the driver from being unbound. Any driver can be unbound by
the user at any time without the module being removed.

Essentially, at any time, a user can do this:

echo 0000:83:00.4 > /sys/bus/pci/drivers/plx_dma/unbind

Which will call plx_dma_remove() regardless of whether anyone has a
reference to the module, and regardless of whether the dma channel is
currently in use. I feel it is important that drivers support this
without crashing, and my plx_dma driver does the correct thing here.

Logan

2019-11-22 05:22:14

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

On 14-11-19, 10:03, Logan Gunthorpe wrote:
>
>
> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
> >> But that's the problem. We can't expect our users to be "nice" and not
> >> unbind when the driver is in use. Killing the kernel if the user
> >> unexpectedly unbinds is not acceptable.
> >
> > And that is why we review the code and ensure this does not happen and
> > behaviour is as expected
>
> Yes, but the current code can kill the kernel when the driver is unbound.
>
> >>>> I suspect this is less of an issue for most devices as they wouldn't
> >>>> normally be unbound while in use (for example there's really no reason
> >>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
> >>>> is, the user could unbind these devices at anytime and we don't want to
> >>>> panic if they do.
> >>>
> >>> There are many drivers which do modules so yes I am expecting unbind and
> >>> even a bind following that to work
> >>
> >> Except they will panic if they unbind while in use, so that's a
> >> questionable definition of "work".
> >
> > dmaengine core has module reference so while they are being used they
> > won't be removed (unless I complete misread the driver core behaviour)
>
> Yes, as I mentioned in my other email, holding a module reference does
> not prevent the driver from being unbound. Any driver can be unbound by
> the user at any time without the module being removed.

That sounds okay then.
>
> Essentially, at any time, a user can do this:
>
> echo 0000:83:00.4 > /sys/bus/pci/drivers/plx_dma/unbind
>
> Which will call plx_dma_remove() regardless of whether anyone has a
> reference to the module, and regardless of whether the dma channel is
> currently in use. I feel it is important that drivers support this
> without crashing, and my plx_dma driver does the correct thing here.
>
> Logan

--
~Vinod

2019-11-22 16:55:29

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 11/21/19 10:20 PM, Vinod Koul wrote:
> On 14-11-19, 10:03, Logan Gunthorpe wrote:
>>
>>
>> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
>>>> But that's the problem. We can't expect our users to be "nice" and not
>>>> unbind when the driver is in use. Killing the kernel if the user
>>>> unexpectedly unbinds is not acceptable.
>>>
>>> And that is why we review the code and ensure this does not happen and
>>> behaviour is as expected
>>
>> Yes, but the current code can kill the kernel when the driver is unbound.
>>
>>>>>> I suspect this is less of an issue for most devices as they wouldn't
>>>>>> normally be unbound while in use (for example there's really no reason
>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
>>>>>> is, the user could unbind these devices at anytime and we don't want to
>>>>>> panic if they do.
>>>>>
>>>>> There are many drivers which do modules so yes I am expecting unbind and
>>>>> even a bind following that to work
>>>>
>>>> Except they will panic if they unbind while in use, so that's a
>>>> questionable definition of "work".
>>>
>>> dmaengine core has module reference so while they are being used they
>>> won't be removed (unless I complete misread the driver core behaviour)
>>
>> Yes, as I mentioned in my other email, holding a module reference does
>> not prevent the driver from being unbound. Any driver can be unbound by
>> the user at any time without the module being removed.
>
> That sounds okay then.

I'm actually glad Logan is putting some work in addressing this. I also
ran into the same issue as well dealing with unbinds on my new driver.

>>
>> Essentially, at any time, a user can do this:
>>
>> echo 0000:83:00.4 > /sys/bus/pci/drivers/plx_dma/unbind
>>
>> Which will call plx_dma_remove() regardless of whether anyone has a
>> reference to the module, and regardless of whether the dma channel is
>> currently in use. I feel it is important that drivers support this
>> without crashing, and my plx_dma driver does the correct thing here.
>>
>> Logan
>

2019-11-22 20:53:27

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <[email protected]> wrote:
>
>
>
> On 11/21/19 10:20 PM, Vinod Koul wrote:
> > On 14-11-19, 10:03, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
> >>>> But that's the problem. We can't expect our users to be "nice" and not
> >>>> unbind when the driver is in use. Killing the kernel if the user
> >>>> unexpectedly unbinds is not acceptable.
> >>>
> >>> And that is why we review the code and ensure this does not happen and
> >>> behaviour is as expected
> >>
> >> Yes, but the current code can kill the kernel when the driver is unbound.
> >>
> >>>>>> I suspect this is less of an issue for most devices as they wouldn't
> >>>>>> normally be unbound while in use (for example there's really no reason
> >>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
> >>>>>> is, the user could unbind these devices at anytime and we don't want to
> >>>>>> panic if they do.
> >>>>>
> >>>>> There are many drivers which do modules so yes I am expecting unbind and
> >>>>> even a bind following that to work
> >>>>
> >>>> Except they will panic if they unbind while in use, so that's a
> >>>> questionable definition of "work".
> >>>
> >>> dmaengine core has module reference so while they are being used they
> >>> won't be removed (unless I complete misread the driver core behaviour)
> >>
> >> Yes, as I mentioned in my other email, holding a module reference does
> >> not prevent the driver from being unbound. Any driver can be unbound by
> >> the user at any time without the module being removed.
> >
> > That sounds okay then.
>
> I'm actually glad Logan is putting some work in addressing this. I also
> ran into the same issue as well dealing with unbinds on my new driver.

This was an original mistake of the dmaengine implementation that
Vinod inherited. Module pinning is distinct from preventing device
unbind which ultimately can't be prevented. Longer term I think we
need to audit dmaengine consumers to make sure they are prepared for
the driver to be removed similar to how other request based drivers
can gracefully return an error status when the device goes away rather
than crashing.

2019-11-22 20:59:49

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 2019-11-22 1:50 p.m., Dan Williams wrote:
> On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <[email protected]> wrote:
>>
>>
>>
>> On 11/21/19 10:20 PM, Vinod Koul wrote:
>>> On 14-11-19, 10:03, Logan Gunthorpe wrote:
>>>>
>>>>
>>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
>>>>>> But that's the problem. We can't expect our users to be "nice" and not
>>>>>> unbind when the driver is in use. Killing the kernel if the user
>>>>>> unexpectedly unbinds is not acceptable.
>>>>>
>>>>> And that is why we review the code and ensure this does not happen and
>>>>> behaviour is as expected
>>>>
>>>> Yes, but the current code can kill the kernel when the driver is unbound.
>>>>
>>>>>>>> I suspect this is less of an issue for most devices as they wouldn't
>>>>>>>> normally be unbound while in use (for example there's really no reason
>>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
>>>>>>>> is, the user could unbind these devices at anytime and we don't want to
>>>>>>>> panic if they do.
>>>>>>>
>>>>>>> There are many drivers which do modules so yes I am expecting unbind and
>>>>>>> even a bind following that to work
>>>>>>
>>>>>> Except they will panic if they unbind while in use, so that's a
>>>>>> questionable definition of "work".
>>>>>
>>>>> dmaengine core has module reference so while they are being used they
>>>>> won't be removed (unless I complete misread the driver core behaviour)
>>>>
>>>> Yes, as I mentioned in my other email, holding a module reference does
>>>> not prevent the driver from being unbound. Any driver can be unbound by
>>>> the user at any time without the module being removed.
>>>
>>> That sounds okay then.
>>
>> I'm actually glad Logan is putting some work in addressing this. I also
>> ran into the same issue as well dealing with unbinds on my new driver.
>
> This was an original mistake of the dmaengine implementation that
> Vinod inherited. Module pinning is distinct from preventing device
> unbind which ultimately can't be prevented. Longer term I think we
> need to audit dmaengine consumers to make sure they are prepared for
> the driver to be removed similar to how other request based drivers
> can gracefully return an error status when the device goes away rather
> than crashing.

Yes, but that will be a big project because there are a lot of drivers.
But I think the dmaengine common code needs to support removal properly,
which essentially means changing how all the drivers allocate and free
their structures, among other things.

The one saving grace is that most of the drivers are for SOCs which
can't be physically removed and there's really no use-case for the user
to call unbind.

Logan

2019-11-22 21:06:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <[email protected]> wrote:
>
>
>
> On 2019-11-22 1:50 p.m., Dan Williams wrote:
> > On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <[email protected]> wrote:
> >>
> >>
> >>
> >> On 11/21/19 10:20 PM, Vinod Koul wrote:
> >>> On 14-11-19, 10:03, Logan Gunthorpe wrote:
> >>>>
> >>>>
> >>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
> >>>>>> But that's the problem. We can't expect our users to be "nice" and not
> >>>>>> unbind when the driver is in use. Killing the kernel if the user
> >>>>>> unexpectedly unbinds is not acceptable.
> >>>>>
> >>>>> And that is why we review the code and ensure this does not happen and
> >>>>> behaviour is as expected
> >>>>
> >>>> Yes, but the current code can kill the kernel when the driver is unbound.
> >>>>
> >>>>>>>> I suspect this is less of an issue for most devices as they wouldn't
> >>>>>>>> normally be unbound while in use (for example there's really no reason
> >>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
> >>>>>>>> is, the user could unbind these devices at anytime and we don't want to
> >>>>>>>> panic if they do.
> >>>>>>>
> >>>>>>> There are many drivers which do modules so yes I am expecting unbind and
> >>>>>>> even a bind following that to work
> >>>>>>
> >>>>>> Except they will panic if they unbind while in use, so that's a
> >>>>>> questionable definition of "work".
> >>>>>
> >>>>> dmaengine core has module reference so while they are being used they
> >>>>> won't be removed (unless I complete misread the driver core behaviour)
> >>>>
> >>>> Yes, as I mentioned in my other email, holding a module reference does
> >>>> not prevent the driver from being unbound. Any driver can be unbound by
> >>>> the user at any time without the module being removed.
> >>>
> >>> That sounds okay then.
> >>
> >> I'm actually glad Logan is putting some work in addressing this. I also
> >> ran into the same issue as well dealing with unbinds on my new driver.
> >
> > This was an original mistake of the dmaengine implementation that
> > Vinod inherited. Module pinning is distinct from preventing device
> > unbind which ultimately can't be prevented. Longer term I think we
> > need to audit dmaengine consumers to make sure they are prepared for
> > the driver to be removed similar to how other request based drivers
> > can gracefully return an error status when the device goes away rather
> > than crashing.
>
> Yes, but that will be a big project because there are a lot of drivers.

Oh yes, in fact I think it's something that can only reasonably be
considered for new consumers.

> But I think the dmaengine common code needs to support removal properly,
> which essentially means changing how all the drivers allocate and free
> their structures, among other things.
>
> The one saving grace is that most of the drivers are for SOCs which
> can't be physically removed and there's really no use-case for the user
> to call unbind.

Yes, the SOC case is not so much my concern as the generic offload use
cases, especially if those offloads are in a similar hotplug domain as
a cpu.

2019-11-22 21:44:10

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 11/22/19 2:01 PM, Dan Williams wrote:
> On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <[email protected]> wrote:
>>
>>
>>
>> On 2019-11-22 1:50 p.m., Dan Williams wrote:
>>> On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 11/21/19 10:20 PM, Vinod Koul wrote:
>>>>> On 14-11-19, 10:03, Logan Gunthorpe wrote:
>>>>>>
>>>>>>
>>>>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
>>>>>>>> But that's the problem. We can't expect our users to be "nice" and not
>>>>>>>> unbind when the driver is in use. Killing the kernel if the user
>>>>>>>> unexpectedly unbinds is not acceptable.
>>>>>>>
>>>>>>> And that is why we review the code and ensure this does not happen and
>>>>>>> behaviour is as expected
>>>>>>
>>>>>> Yes, but the current code can kill the kernel when the driver is unbound.
>>>>>>
>>>>>>>>>> I suspect this is less of an issue for most devices as they wouldn't
>>>>>>>>>> normally be unbound while in use (for example there's really no reason
>>>>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
>>>>>>>>>> is, the user could unbind these devices at anytime and we don't want to
>>>>>>>>>> panic if they do.
>>>>>>>>>
>>>>>>>>> There are many drivers which do modules so yes I am expecting unbind and
>>>>>>>>> even a bind following that to work
>>>>>>>>
>>>>>>>> Except they will panic if they unbind while in use, so that's a
>>>>>>>> questionable definition of "work".
>>>>>>>
>>>>>>> dmaengine core has module reference so while they are being used they
>>>>>>> won't be removed (unless I complete misread the driver core behaviour)
>>>>>>
>>>>>> Yes, as I mentioned in my other email, holding a module reference does
>>>>>> not prevent the driver from being unbound. Any driver can be unbound by
>>>>>> the user at any time without the module being removed.
>>>>>
>>>>> That sounds okay then.
>>>>
>>>> I'm actually glad Logan is putting some work in addressing this. I also
>>>> ran into the same issue as well dealing with unbinds on my new driver.
>>>
>>> This was an original mistake of the dmaengine implementation that
>>> Vinod inherited. Module pinning is distinct from preventing device
>>> unbind which ultimately can't be prevented. Longer term I think we
>>> need to audit dmaengine consumers to make sure they are prepared for
>>> the driver to be removed similar to how other request based drivers
>>> can gracefully return an error status when the device goes away rather
>>> than crashing.
>>
>> Yes, but that will be a big project because there are a lot of drivers.
>
> Oh yes, in fact I think it's something that can only reasonably be
> considered for new consumers.
>
>> But I think the dmaengine common code needs to support removal properly,
>> which essentially means changing how all the drivers allocate and free
>> their structures, among other things.
>>
>> The one saving grace is that most of the drivers are for SOCs which
>> can't be physically removed and there's really no use-case for the user
>> to call unbind.
>
> Yes, the SOC case is not so much my concern as the generic offload use
> cases, especially if those offloads are in a similar hotplug domain as
> a cpu.
>

It becomes a bigger issue when "channels" can be reconfigured and can
come and go in a hot plug fashion.

2019-12-10 09:54:12

by Vinod Koul

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct

On 22-11-19, 14:42, Dave Jiang wrote:
>
>
> On 11/22/19 2:01 PM, Dan Williams wrote:
> > On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <[email protected]> wrote:
> > >
> > >
> > >
> > > On 2019-11-22 1:50 p.m., Dan Williams wrote:
> > > > On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <[email protected]> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 11/21/19 10:20 PM, Vinod Koul wrote:
> > > > > > On 14-11-19, 10:03, Logan Gunthorpe wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 2019-11-13 9:55 p.m., Vinod Koul wrote:
> > > > > > > > > But that's the problem. We can't expect our users to be "nice" and not
> > > > > > > > > unbind when the driver is in use. Killing the kernel if the user
> > > > > > > > > unexpectedly unbinds is not acceptable.
> > > > > > > >
> > > > > > > > And that is why we review the code and ensure this does not happen and
> > > > > > > > behaviour is as expected
> > > > > > >
> > > > > > > Yes, but the current code can kill the kernel when the driver is unbound.
> > > > > > >
> > > > > > > > > > > I suspect this is less of an issue for most devices as they wouldn't
> > > > > > > > > > > normally be unbound while in use (for example there's really no reason
> > > > > > > > > > > to ever unbind IOAT seeing it's built into the system). Though, the fact
> > > > > > > > > > > is, the user could unbind these devices at anytime and we don't want to
> > > > > > > > > > > panic if they do.
> > > > > > > > > >
> > > > > > > > > > There are many drivers which do modules so yes I am expecting unbind and
> > > > > > > > > > even a bind following that to work
> > > > > > > > >
> > > > > > > > > Except they will panic if they unbind while in use, so that's a
> > > > > > > > > questionable definition of "work".
> > > > > > > >
> > > > > > > > dmaengine core has module reference so while they are being used they
> > > > > > > > won't be removed (unless I complete misread the driver core behaviour)
> > > > > > >
> > > > > > > Yes, as I mentioned in my other email, holding a module reference does
> > > > > > > not prevent the driver from being unbound. Any driver can be unbound by
> > > > > > > the user at any time without the module being removed.
> > > > > >
> > > > > > That sounds okay then.
> > > > >
> > > > > I'm actually glad Logan is putting some work in addressing this. I also
> > > > > ran into the same issue as well dealing with unbinds on my new driver.
> > > >
> > > > This was an original mistake of the dmaengine implementation that
> > > > Vinod inherited. Module pinning is distinct from preventing device
> > > > unbind which ultimately can't be prevented. Longer term I think we
> > > > need to audit dmaengine consumers to make sure they are prepared for
> > > > the driver to be removed similar to how other request based drivers
> > > > can gracefully return an error status when the device goes away rather
> > > > than crashing.

Right finally wrapping my head of static dmaengine devices! we can
indeed have devices going away, but me wondering why this should be
handled in subsystems! Should the driver core not be doing this instead?
Would it be not a safe behaviour for unplug to unload the driver and
thus give a chance to unbind from subsystems too...


> > > Yes, but that will be a big project because there are a lot of drivers.
> >
> > Oh yes, in fact I think it's something that can only reasonably be
> > considered for new consumers.
> >
> > > But I think the dmaengine common code needs to support removal properly,
> > > which essentially means changing how all the drivers allocate and free
> > > their structures, among other things.
> > >
> > > The one saving grace is that most of the drivers are for SOCs which
> > > can't be physically removed and there's really no use-case for the user
> > > to call unbind.

yeah only a small set of drivers would need this for now!

> > Yes, the SOC case is not so much my concern as the generic offload use
> > cases, especially if those offloads are in a similar hotplug domain as
> > a cpu.
>
> It becomes a bigger issue when "channels" can be reconfigured and can come
> and go in a hot plug fashion.

--
~Vinod

2019-12-10 17:39:54

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 1/5] dmaengine: Store module owner in dma_device struct



On 2019-12-10 2:53 a.m., Vinod Koul wrote:
> On 22-11-19, 14:42, Dave Jiang wrote:
>>
>>
>> On 11/22/19 2:01 PM, Dan Williams wrote:
>>> On Fri, Nov 22, 2019 at 12:56 PM Logan Gunthorpe <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 2019-11-22 1:50 p.m., Dan Williams wrote:
>>>>> On Fri, Nov 22, 2019 at 8:53 AM Dave Jiang <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/21/19 10:20 PM, Vinod Koul wrote:
>>>>>>> On 14-11-19, 10:03, Logan Gunthorpe wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019-11-13 9:55 p.m., Vinod Koul wrote:
>>>>>>>>>> But that's the problem. We can't expect our users to be "nice" and not
>>>>>>>>>> unbind when the driver is in use. Killing the kernel if the user
>>>>>>>>>> unexpectedly unbinds is not acceptable.
>>>>>>>>>
>>>>>>>>> And that is why we review the code and ensure this does not happen and
>>>>>>>>> behaviour is as expected
>>>>>>>>
>>>>>>>> Yes, but the current code can kill the kernel when the driver is unbound.
>>>>>>>>
>>>>>>>>>>>> I suspect this is less of an issue for most devices as they wouldn't
>>>>>>>>>>>> normally be unbound while in use (for example there's really no reason
>>>>>>>>>>>> to ever unbind IOAT seeing it's built into the system). Though, the fact
>>>>>>>>>>>> is, the user could unbind these devices at anytime and we don't want to
>>>>>>>>>>>> panic if they do.
>>>>>>>>>>>
>>>>>>>>>>> There are many drivers which do modules so yes I am expecting unbind and
>>>>>>>>>>> even a bind following that to work
>>>>>>>>>>
>>>>>>>>>> Except they will panic if they unbind while in use, so that's a
>>>>>>>>>> questionable definition of "work".
>>>>>>>>>
>>>>>>>>> dmaengine core has module reference so while they are being used they
>>>>>>>>> won't be removed (unless I complete misread the driver core behaviour)
>>>>>>>>
>>>>>>>> Yes, as I mentioned in my other email, holding a module reference does
>>>>>>>> not prevent the driver from being unbound. Any driver can be unbound by
>>>>>>>> the user at any time without the module being removed.
>>>>>>>
>>>>>>> That sounds okay then.
>>>>>>
>>>>>> I'm actually glad Logan is putting some work in addressing this. I also
>>>>>> ran into the same issue as well dealing with unbinds on my new driver.
>>>>>
>>>>> This was an original mistake of the dmaengine implementation that
>>>>> Vinod inherited. Module pinning is distinct from preventing device
>>>>> unbind which ultimately can't be prevented. Longer term I think we
>>>>> need to audit dmaengine consumers to make sure they are prepared for
>>>>> the driver to be removed similar to how other request based drivers
>>>>> can gracefully return an error status when the device goes away rather
>>>>> than crashing.
>
> Right finally wrapping my head of static dmaengine devices! we can
> indeed have devices going away, but me wondering why this should be
> handled in subsystems! Should the driver core not be doing this instead?
> Would it be not a safe behaviour for unplug to unload the driver and
> thus give a chance to unbind from subsystems too...

Yes, I think it should be in the core. I was just worried about breaking
older drivers. But I'll see if I can move a bit more of the logic for a
v3 series.

Thanks,

Logan