Fix driver new_id sysfs-attribute removal deadlock by making sure to
not hold any locks that the attribute operations grab when removing the
attribute.
Specifically, usb_serial_deregister holds the table mutex when
deregistering the driver, which includes removing the new_id attribute.
This can lead to a deadlock as writing to new_id increments the
attribute's active count before trying to grab the same mutex in
usb_serial_probe.
The deadlock can easily be triggered by inserting a sleep in
usb_serial_deregister and writing the id of an unbound device to new_id
during module unload.
As the table mutex (in this case) is used to prevent subdriver unload
during probe, it should be sufficient to only hold the lock while
manipulating the usb-serial driver list during deregister. A racing
probe will then either fail to find a matching subdriver or fail to get
the corresponding module reference.
Since v3.15-rc1 this also triggers the following lockdep warning:
======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc2 #123 Tainted: G W
-------------------------------------------------------
modprobe/190 is trying to acquire lock:
(s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
but task is already holding lock:
(table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (table_lock){+.+.+.}:
[<c0075f84>] __lock_acquire+0x1694/0x1ce4
[<c0076de8>] lock_acquire+0xb4/0x154
[<c03af3cc>] _raw_spin_lock+0x4c/0x5c
[<c02bbc24>] usb_store_new_id+0x14c/0x1ac
[<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
[<c025f568>] drv_attr_store+0x30/0x3c
[<c01690e0>] sysfs_kf_write+0x5c/0x60
[<c01682c0>] kernfs_fop_write+0xd4/0x194
[<c010881c>] vfs_write+0xbc/0x198
[<c0108e4c>] SyS_write+0x4c/0xa0
[<c000f880>] ret_fast_syscall+0x0/0x48
-> #0 (s_active#4){++++.+}:
[<c03a7a28>] print_circular_bug+0x68/0x2f8
[<c0076218>] __lock_acquire+0x1928/0x1ce4
[<c0076de8>] lock_acquire+0xb4/0x154
[<c0166b70>] __kernfs_remove+0x254/0x310
[<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
[<c0169fb8>] remove_files.isra.1+0x48/0x84
[<c016a2fc>] sysfs_remove_group+0x58/0xac
[<c016a414>] sysfs_remove_groups+0x34/0x44
[<c02623b8>] driver_remove_groups+0x1c/0x20
[<c0260e9c>] bus_remove_driver+0x3c/0xe4
[<c026235c>] driver_unregister+0x38/0x58
[<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
[<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
[<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
[<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
[<c009d6cc>] SyS_delete_module+0x184/0x210
[<c000f880>] ret_fast_syscall+0x0/0x48
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(table_lock);
lock(s_active#4);
lock(table_lock);
lock(s_active#4);
*** DEADLOCK ***
1 lock held by modprobe/190:
#0: (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
stack backtrace:
CPU: 0 PID: 190 Comm: modprobe Tainted: G W 3.15.0-rc2 #123
[<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
[<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
[<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
[<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
[<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
[<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
[<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
[<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
[<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
[<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
[<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
[<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
[<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
[<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
[<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
[<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
[<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
[<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
[<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)
Signed-off-by: Johan Hovold <[email protected]>
---
I got this new lockdep warning with 3.15-rc, but this dates back to at
least 0daeed381c6a ("USB-BKL: Remove BKL use for usb serial driver
probing").
As far as I can tell, moving driver deregistration out of the table lock
should be fine. Another option would be to get a module reference in
new_id store, although that would still trigger the lockdep warning.
Thanks,
Johan
drivers/usb/serial/usb-serial.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index 81fc0dfcfdcf..6d40d56378d7 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -1347,10 +1347,12 @@ static int usb_serial_register(struct usb_serial_driver *driver)
static void usb_serial_deregister(struct usb_serial_driver *device)
{
pr_info("USB Serial deregistering driver %s\n", device->description);
+
mutex_lock(&table_lock);
list_del(&device->driver_list);
- usb_serial_bus_deregister(device);
mutex_unlock(&table_lock);
+
+ usb_serial_bus_deregister(device);
}
/**
--
1.8.3.2
cc'ing Li Zhong who's working on a simliar issue in the following
thread and quoting whole body.
http://thread.gmane.org/gmane.linux.kernel/1680706
Li, this is another variation of the same problem. Maybe this can be
covered by your work too?
Thanks.
On Wed, Apr 23, 2014 at 11:32:19AM +0200, Johan Hovold wrote:
> Fix driver new_id sysfs-attribute removal deadlock by making sure to
> not hold any locks that the attribute operations grab when removing the
> attribute.
>
> Specifically, usb_serial_deregister holds the table mutex when
> deregistering the driver, which includes removing the new_id attribute.
> This can lead to a deadlock as writing to new_id increments the
> attribute's active count before trying to grab the same mutex in
> usb_serial_probe.
>
> The deadlock can easily be triggered by inserting a sleep in
> usb_serial_deregister and writing the id of an unbound device to new_id
> during module unload.
>
> As the table mutex (in this case) is used to prevent subdriver unload
> during probe, it should be sufficient to only hold the lock while
> manipulating the usb-serial driver list during deregister. A racing
> probe will then either fail to find a matching subdriver or fail to get
> the corresponding module reference.
>
> Since v3.15-rc1 this also triggers the following lockdep warning:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.15.0-rc2 #123 Tainted: G W
> -------------------------------------------------------
> modprobe/190 is trying to acquire lock:
> (s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
>
> but task is already holding lock:
> (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (table_lock){+.+.+.}:
> [<c0075f84>] __lock_acquire+0x1694/0x1ce4
> [<c0076de8>] lock_acquire+0xb4/0x154
> [<c03af3cc>] _raw_spin_lock+0x4c/0x5c
> [<c02bbc24>] usb_store_new_id+0x14c/0x1ac
> [<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
> [<c025f568>] drv_attr_store+0x30/0x3c
> [<c01690e0>] sysfs_kf_write+0x5c/0x60
> [<c01682c0>] kernfs_fop_write+0xd4/0x194
> [<c010881c>] vfs_write+0xbc/0x198
> [<c0108e4c>] SyS_write+0x4c/0xa0
> [<c000f880>] ret_fast_syscall+0x0/0x48
>
> -> #0 (s_active#4){++++.+}:
> [<c03a7a28>] print_circular_bug+0x68/0x2f8
> [<c0076218>] __lock_acquire+0x1928/0x1ce4
> [<c0076de8>] lock_acquire+0xb4/0x154
> [<c0166b70>] __kernfs_remove+0x254/0x310
> [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
> [<c0169fb8>] remove_files.isra.1+0x48/0x84
> [<c016a2fc>] sysfs_remove_group+0x58/0xac
> [<c016a414>] sysfs_remove_groups+0x34/0x44
> [<c02623b8>] driver_remove_groups+0x1c/0x20
> [<c0260e9c>] bus_remove_driver+0x3c/0xe4
> [<c026235c>] driver_unregister+0x38/0x58
> [<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
> [<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
> [<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
> [<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
> [<c009d6cc>] SyS_delete_module+0x184/0x210
> [<c000f880>] ret_fast_syscall+0x0/0x48
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(table_lock);
> lock(s_active#4);
> lock(table_lock);
> lock(s_active#4);
>
> *** DEADLOCK ***
>
> 1 lock held by modprobe/190:
> #0: (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
>
> stack backtrace:
> CPU: 0 PID: 190 Comm: modprobe Tainted: G W 3.15.0-rc2 #123
> [<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
> [<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
> [<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
> [<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
> [<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
> [<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
> [<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
> [<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
> [<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
> [<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
> [<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
> [<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
> [<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
> [<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
> [<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
> [<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
> [<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
> [<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
> [<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)
>
> Signed-off-by: Johan Hovold <[email protected]>
> ---
>
> I got this new lockdep warning with 3.15-rc, but this dates back to at
> least 0daeed381c6a ("USB-BKL: Remove BKL use for usb serial driver
> probing").
>
> As far as I can tell, moving driver deregistration out of the table lock
> should be fine. Another option would be to get a module reference in
> new_id store, although that would still trigger the lockdep warning.
>
> Thanks,
> Johan
>
>
> drivers/usb/serial/usb-serial.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
> index 81fc0dfcfdcf..6d40d56378d7 100644
> --- a/drivers/usb/serial/usb-serial.c
> +++ b/drivers/usb/serial/usb-serial.c
> @@ -1347,10 +1347,12 @@ static int usb_serial_register(struct usb_serial_driver *driver)
> static void usb_serial_deregister(struct usb_serial_driver *device)
> {
> pr_info("USB Serial deregistering driver %s\n", device->description);
> +
> mutex_lock(&table_lock);
> list_del(&device->driver_list);
> - usb_serial_bus_deregister(device);
> mutex_unlock(&table_lock);
> +
> + usb_serial_bus_deregister(device);
> }
>
> /**
> --
> 1.8.3.2
>
--
tejun
On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> cc'ing Li Zhong who's working on a simliar issue in the following
> thread and quoting whole body.
>
> http://thread.gmane.org/gmane.linux.kernel/1680706
>
> Li, this is another variation of the same problem. Maybe this can be
> covered by your work too?
It seems to me that it is about write something to driver attribute, and
driver unloading. If so, maybe it's not easy to reuse the help functions
created for device attribute, and device removing.
But I guess the idea to break the active protection could still be
applied here:
Maybe we could try_module_get() here (like the other option suggested by
Johan?), and break active protection if we could get the module,
something like below?
================
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83e910a..6ce27e0 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -69,9 +69,24 @@ static ssize_t drv_attr_store(struct kobject *kobj, struct attribute *attr,
struct driver_attribute *drv_attr = to_drv_attr(attr);
struct driver_private *drv_priv = to_driver(kobj);
ssize_t ret = -EIO;
+ struct kernfs_node *kn;
+
+ if (!try_module_get(drv_priv->driver->owner))
+ return ret;
+
+ kn = kernfs_find_and_get(kobj->sd, attr->name);
+ if (WARN_ON_ONCE(!kn))
+ return ret;
+
+ kernfs_break_active_protection(kn);
if (drv_attr->store)
ret = drv_attr->store(drv_priv->driver, buf, count);
+
+ kernfs_unbreak_active_protection(kn);
+ kernfs_put(kn);
+
+ module_put(drv_priv->driver->owner);
return ret;
}
On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
> On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> > cc'ing Li Zhong who's working on a simliar issue in the following
> > thread and quoting whole body.
> >
> > http://thread.gmane.org/gmane.linux.kernel/1680706
> >
> > Li, this is another variation of the same problem. Maybe this can be
> > covered by your work too?
>
> It seems to me that it is about write something to driver attribute, and
> driver unloading. If so, maybe it's not easy to reuse the help functions
> created for device attribute, and device removing.
>
> But I guess the idea to break the active protection could still be
> applied here:
>
> Maybe we could try_module_get() here (like the other option suggested by
> Johan?), and break active protection if we could get the module,
> something like below?
I don't get why try_module_get() matters here. We can't call into
->store if the object at hand is already destroyed and the underlying
module can't go away if the target device is still alive.
try_module_get() doesn't actually protect the object. Why does that
matter? This is self removal, right? Can you please take a look at
kernfs_remove_self()?
Thanks.
--
tejun
On Thu, Apr 24, 2014 at 10:35:17AM -0400, Tejun Heo wrote:
> On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
> > On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> > > cc'ing Li Zhong who's working on a simliar issue in the following
> > > thread and quoting whole body.
> > >
> > > http://thread.gmane.org/gmane.linux.kernel/1680706
> > >
> > > Li, this is another variation of the same problem. Maybe this can be
> > > covered by your work too?
> >
> > It seems to me that it is about write something to driver attribute, and
> > driver unloading. If so, maybe it's not easy to reuse the help functions
> > created for device attribute, and device removing.
> >
> > But I guess the idea to break the active protection could still be
> > applied here:
> >
> > Maybe we could try_module_get() here (like the other option suggested by
> > Johan?), and break active protection if we could get the module,
> > something like below?
>
> I don't get why try_module_get() matters here. We can't call into
> ->store if the object at hand is already destroyed and the underlying
> module can't go away if the target device is still alive.
> try_module_get() doesn't actually protect the object. Why does that
> matter? This is self removal, right? Can you please take a look at
> kernfs_remove_self()?
No, this isn't self removal. The driver-attribute (not device-attribute)
store operation simply grabs a lock that is also held while the driver
is being deregistered at module unload. Taking a reference to the module
in this case will prevent deregistration while store is running.
But it seems like this can be solved for usb-serial by simply not
holding the lock while deregistering.
Johan
On Thu, 2014-04-24 at 10:35 -0400, Tejun Heo wrote:
> On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
> > On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> > > cc'ing Li Zhong who's working on a simliar issue in the following
> > > thread and quoting whole body.
> > >
> > > http://thread.gmane.org/gmane.linux.kernel/1680706
> > >
> > > Li, this is another variation of the same problem. Maybe this can be
> > > covered by your work too?
> >
> > It seems to me that it is about write something to driver attribute, and
> > driver unloading. If so, maybe it's not easy to reuse the help functions
> > created for device attribute, and device removing.
> >
> > But I guess the idea to break the active protection could still be
> > applied here:
> >
> > Maybe we could try_module_get() here (like the other option suggested by
> > Johan?), and break active protection if we could get the module,
> > something like below?
>
> I don't get why try_module_get() matters here. We can't call into
> ->store if the object at hand is already destroyed and the underlying
> module can't go away if the target device is still alive.
> try_module_get() doesn't actually protect the object. Why does that
> matter? This is self removal, right? Can you please take a look at
> kernfs_remove_self()?
This is about one process writing something to driver attributes, and
one process trying to unload this driver.
I think try_module_get() could detect whether the driver is being
unloaded, and if not, prevent it from being unloaded, so it could
protect the object here by not allow the driver to be unloaded.
And if the driver is being unloaded, we could abort the write operation
directly.
Thanks, Zhong
>
> Thanks.
>
On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
> On Thu, Apr 24, 2014 at 10:35:17AM -0400, Tejun Heo wrote:
> > On Thu, Apr 24, 2014 at 04:29:15PM +0800, Li Zhong wrote:
> > > On Wed, 2014-04-23 at 10:19 -0400, Tejun Heo wrote:
> > > > cc'ing Li Zhong who's working on a simliar issue in the following
> > > > thread and quoting whole body.
> > > >
> > > > http://thread.gmane.org/gmane.linux.kernel/1680706
> > > >
> > > > Li, this is another variation of the same problem. Maybe this can be
> > > > covered by your work too?
> > >
> > > It seems to me that it is about write something to driver attribute, and
> > > driver unloading. If so, maybe it's not easy to reuse the help functions
> > > created for device attribute, and device removing.
> > >
> > > But I guess the idea to break the active protection could still be
> > > applied here:
> > >
> > > Maybe we could try_module_get() here (like the other option suggested by
> > > Johan?), and break active protection if we could get the module,
> > > something like below?
> >
> > I don't get why try_module_get() matters here. We can't call into
> > ->store if the object at hand is already destroyed and the underlying
> > module can't go away if the target device is still alive.
> > try_module_get() doesn't actually protect the object. Why does that
> > matter? This is self removal, right? Can you please take a look at
> > kernfs_remove_self()?
>
> No, this isn't self removal. The driver-attribute (not device-attribute)
> store operation simply grabs a lock that is also held while the driver
> is being deregistered at module unload. Taking a reference to the module
> in this case will prevent deregistration while store is running.
>
> But it seems like this can be solved for usb-serial by simply not
> holding the lock while deregistering.
I didn't look carefully about this lock.
But I'm not sure whether there are such requirements for driver
attributes:
some lock needs be grabbed in the driver attributes store callbacks, and
the same lock also needs to be grabbed during driver unregister.
If we have such requirements currently or in the future, I think they
could all be solved by breaking active protection after get the module
reference.
Thanks, Zhong
>
> Johan
>
On Fri, Apr 25, 2014 at 10:16:57AM +0800, Li Zhong wrote:
> On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
> > No, this isn't self removal. The driver-attribute (not device-attribute)
> > store operation simply grabs a lock that is also held while the driver
> > is being deregistered at module unload. Taking a reference to the module
> > in this case will prevent deregistration while store is running.
> >
> > But it seems like this can be solved for usb-serial by simply not
> > holding the lock while deregistering.
>
> I didn't look carefully about this lock.
>
> But I'm not sure whether there are such requirements for driver
> attributes:
>
> some lock needs be grabbed in the driver attributes store callbacks, and
> the same lock also needs to be grabbed during driver unregister.
>
> If we have such requirements currently or in the future, I think they
> could all be solved by breaking active protection after get the module
> reference.
Yes, if we find any such cases, but I don't think it should be done
generally (as in one of your earlier posts).
Active protection is usually exactly what prevents the driver from being
deregistered, and would only need to be broken to avoid any ABBA
deadlocks from being reported by lockdep (the module reference would be
enough to prevent the actual deadlock).
Thanks,
Johan
On Fri, 25 Apr 2014, Li Zhong wrote:
> > I don't get why try_module_get() matters here. We can't call into
> > ->store if the object at hand is already destroyed and the underlying
> > module can't go away if the target device is still alive.
> > try_module_get() doesn't actually protect the object. Why does that
> > matter? This is self removal, right? Can you please take a look at
> > kernfs_remove_self()?
>
> This is about one process writing something to driver attributes, and
> one process trying to unload this driver.
>
> I think try_module_get() could detect whether the driver is being
> unloaded, and if not, prevent it from being unloaded, so it could
> protect the object here by not allow the driver to be unloaded.
That isn't how try_module_get() works. If the module is being
unloaded, try_module_get() simply fails. It does not prevent the
module from being unloaded -- that's why its name begins with "try".
Alan Stern
On Fri, 25 Apr 2014, Li Zhong wrote:
> > No, this isn't self removal. The driver-attribute (not device-attribute)
> > store operation simply grabs a lock that is also held while the driver
> > is being deregistered at module unload. Taking a reference to the module
> > in this case will prevent deregistration while store is running.
> >
> > But it seems like this can be solved for usb-serial by simply not
> > holding the lock while deregistering.
>
> I didn't look carefully about this lock.
>
> But I'm not sure whether there are such requirements for driver
> attributes:
>
> some lock needs be grabbed in the driver attributes store callbacks, and
> the same lock also needs to be grabbed during driver unregister.
In this case, the lock does _not_ need to be grabbed during driver
unregister. The driver grabs the lock, but it doesn't need to.
> If we have such requirements currently or in the future, I think they
> could all be solved by breaking active protection after get the module
> reference.
No! That would be very bad.
Unloading modules is quite different from unbinding drivers. After the
driver is unbound, its attribute callback routines can continue to run.
But after a driver module has been unloaded, its attribute callback
routines _cannot_ run because they aren't present in memory any more.
If we allowed a module to be unloaded while one of its callbacks was
running (because active protection was broken), imagine what would
happen...
Alan Stern
On Fri, Apr 25, 2014 at 09:54:33AM -0400, Alan Stern wrote:
> On Fri, 25 Apr 2014, Li Zhong wrote:
>
> > > I don't get why try_module_get() matters here. We can't call into
> > > ->store if the object at hand is already destroyed and the underlying
> > > module can't go away if the target device is still alive.
> > > try_module_get() doesn't actually protect the object. Why does that
> > > matter? This is self removal, right? Can you please take a look at
> > > kernfs_remove_self()?
> >
> > This is about one process writing something to driver attributes, and
> > one process trying to unload this driver.
> >
> > I think try_module_get() could detect whether the driver is being
> > unloaded, and if not, prevent it from being unloaded, so it could
> > protect the object here by not allow the driver to be unloaded.
>
> That isn't how try_module_get() works. If the module is being
> unloaded, try_module_get() simply fails. It does not prevent the
> module from being unloaded -- that's why its name begins with "try".
Well, to be fair, if the try_module_get() *succeeds* it will prevent the
module from being unloaded (and otherwise the sysfs operation bails
out).
Johan
On Fri, 2014-04-25 at 12:15 +0200, Johan Hovold wrote:
> On Fri, Apr 25, 2014 at 10:16:57AM +0800, Li Zhong wrote:
> > On Thu, 2014-04-24 at 16:52 +0200, Johan Hovold wrote:
>
> > > No, this isn't self removal. The driver-attribute (not device-attribute)
> > > store operation simply grabs a lock that is also held while the driver
> > > is being deregistered at module unload. Taking a reference to the module
> > > in this case will prevent deregistration while store is running.
> > >
> > > But it seems like this can be solved for usb-serial by simply not
> > > holding the lock while deregistering.
> >
> > I didn't look carefully about this lock.
> >
> > But I'm not sure whether there are such requirements for driver
> > attributes:
> >
> > some lock needs be grabbed in the driver attributes store callbacks, and
> > the same lock also needs to be grabbed during driver unregister.
> >
> > If we have such requirements currently or in the future, I think they
> > could all be solved by breaking active protection after get the module
> > reference.
>
> Yes, if we find any such cases, but I don't think it should be done
> generally (as in one of your earlier posts).
OK, maybe we only need to reconsider this only if we see some more
similar lockdep warnings in the future.
>
> Active protection is usually exactly what prevents the driver from being
> deregistered, and would only need to be broken to avoid any ABBA
> deadlocks from being reported by lockdep (the module reference would be
> enough to prevent the actual deadlock).
Yes, maybe try to get the module reference is not bad before writing to
driver attributes, as it doesn't make much sense to really call the
callbacks for the driver attribute if the driver is being unload.
And after we get the reference, it is safe for us to break the active.
But if we don't have such real cases(lockdep warnings), we actually
don't need to break it.
Thanks, Zhong
>
> Thanks,
> Johan
>
On Fri, 2014-04-25 at 09:54 -0400, Alan Stern wrote:
> On Fri, 25 Apr 2014, Li Zhong wrote:
>
> > > I don't get why try_module_get() matters here. We can't call into
> > > ->store if the object at hand is already destroyed and the underlying
> > > module can't go away if the target device is still alive.
> > > try_module_get() doesn't actually protect the object. Why does that
> > > matter? This is self removal, right? Can you please take a look at
> > > kernfs_remove_self()?
> >
> > This is about one process writing something to driver attributes, and
> > one process trying to unload this driver.
> >
> > I think try_module_get() could detect whether the driver is being
> > unloaded, and if not, prevent it from being unloaded, so it could
> > protect the object here by not allow the driver to be unloaded.
>
> That isn't how try_module_get() works. If the module is being
> unloaded, try_module_get() simply fails. It does not prevent the
> module from being unloaded -- that's why its name begins with "try".
Yes, I know that. What I said above is for the case when
try_module_get() detects the driver is NOT being unloaded, and increases
the reference counter.
Thanks, Zhong
>
> Alan Stern
>
On Fri, 2014-04-25 at 09:59 -0400, Alan Stern wrote:
> On Fri, 25 Apr 2014, Li Zhong wrote:
>
> > > No, this isn't self removal. The driver-attribute (not device-attribute)
> > > store operation simply grabs a lock that is also held while the driver
> > > is being deregistered at module unload. Taking a reference to the module
> > > in this case will prevent deregistration while store is running.
> > >
> > > But it seems like this can be solved for usb-serial by simply not
> > > holding the lock while deregistering.
> >
> > I didn't look carefully about this lock.
> >
> > But I'm not sure whether there are such requirements for driver
> > attributes:
> >
> > some lock needs be grabbed in the driver attributes store callbacks, and
> > the same lock also needs to be grabbed during driver unregister.
>
> In this case, the lock does _not_ need to be grabbed during driver
> unregister. The driver grabs the lock, but it doesn't need to.
OK.
>
> > If we have such requirements currently or in the future, I think they
> > could all be solved by breaking active protection after get the module
> > reference.
>
> No! That would be very bad.
>
> Unloading modules is quite different from unbinding drivers. After the
> driver is unbound, its attribute callback routines can continue to run.
> But after a driver module has been unloaded, its attribute callback
> routines _cannot_ run because they aren't present in memory any more.
>
> If we allowed a module to be unloaded while one of its callbacks was
> running (because active protection was broken), imagine what would
> happen...
I don't think the module could be unloaded after we increased the module
reference counter.
Thanks, Zhong
>
> Alan Stern
>