2019-02-04 23:14:32

by Sven Van Asbroeck

[permalink] [raw]
Subject: [RFC v1 0/3] Address potential user-after-free on module unload

I think there _might_ be potential use-after-free issues on module unload.

They are hard to trigger, but I think I've seen them bring the whole
kernel down when they do occur. Can be triggered by doing an insmod of
a vulnerable module, rapidly followed by an rmmod.

Caused by drivers which schedule work / delayed_work, but do not clean it up
properly on module unload. Which means the work function could run _after_
the module has unloaded.

A quick grep through the kernel sources brings up many instances.
I leave it to people more knowledgeable than me to determine if this problem
is likely to happen, and/or if it can be exploited to become a security risk.

Perhaps developers can be 'nudged' into doing the right thing by using
resource-managed versions of INIT_WORK() / INIT_DELAYED_WORK(), which may
address the issue quite elegantly.

Attached is a proposal patch, followed by sample fixes for two vulnerable
modules. As far as I can tell, many more modules are vulnerable.

Sven Van Asbroeck (3):
workqueue: Add resource-managed version of INIT_[DELAYED_]WORK()
max17042_battery: fix potential user-after-free on module unload
cap11xx: fix potential user-after-free on module unload

drivers/input/keyboard/cap11xx.c | 6 ++-
drivers/power/supply/max17042_battery.c | 5 ++-
include/linux/workqueue.h | 7 ++++
kernel/workqueue.c | 54 +++++++++++++++++++++++++
4 files changed, 70 insertions(+), 2 deletions(-)

--
2.17.1



2019-02-04 23:12:44

by Sven Van Asbroeck

[permalink] [raw]
Subject: [RFC v1 1/3] workqueue: Add resource-managed version of INIT_[DELAYED_]WORK()

In modules which extensively use devm_ resource management, it is often
easy to overlook (delayed) work that is left pending or running after the
module is unloaded. This could introduce user-after-free issues.

Nudge kernel developers into 'doing the right thing' by introducing a
resource-managed version of INIT_[DELAYED_]WORK(). This can be used as
an elegant way to ensure that work is not left pending or running after
its dependencies are released.

Functions introduced in workqueue.h :
- devm_init_work()
- devm_init_delayed_work()

Signed-off-by: Sven Van Asbroeck <[email protected]>
---
include/linux/workqueue.h | 7 +++++
kernel/workqueue.c | 54 +++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 60d673e15632..eee148eb9908 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -15,6 +15,7 @@
#include <linux/cpumask.h>
#include <linux/rcupdate.h>

+struct device;
struct workqueue_struct;

struct work_struct;
@@ -670,4 +671,10 @@ int workqueue_offline_cpu(unsigned int cpu);
int __init workqueue_init_early(void);
int __init workqueue_init(void);

+int __must_check devm_init_work(struct device *dev, struct work_struct *work,
+ work_func_t func);
+int __must_check devm_init_delayed_work(struct device *dev,
+ struct delayed_work *dw,
+ work_func_t func);
+
#endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fc5d23d752a5..ab814b0b6c81 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5837,3 +5837,57 @@ int __init workqueue_init(void)

return 0;
}
+
+static void devm_work_release(void *data)
+{
+ struct work_struct *work = data;
+
+ cancel_work_sync(work);
+}
+
+/**
+ * devm_init_work - resource-controlled version of INIT_WORK()
+ * @dev: valid struct device pointer
+ * @work: work pointer to initialize
+ * @func: work function to initialize 'work' with
+ *
+ * Initialize the work pointer just like INIT_WORK(), but use resource control
+ * to help ensure work is not left running or pending when dev is destroyed.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __must_check devm_init_work(struct device *dev, struct work_struct *work,
+ work_func_t func)
+{
+ INIT_WORK(work, func);
+ return devm_add_action(dev, devm_work_release, work);
+}
+EXPORT_SYMBOL_GPL(devm_init_work);
+
+static void devm_delayed_work_release(void *data)
+{
+ struct delayed_work *dw = data;
+
+ cancel_delayed_work_sync(dw);
+}
+
+/**
+ * devm_init_delayed_work - resource-controlled version of INIT_DELAYED_WORK()
+ * @dev: valid struct device pointer
+ * @dw: delayed_work pointer to initialize
+ * @func: work function to initialize 'dw' with
+ *
+ * Initialize the delayed_work pointer just like INIT_DELAYED_WORK(), but use
+ * resource control to help ensure delayed work is not left running or pending
+ * when dev is destroyed.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __must_check devm_init_delayed_work(struct device *dev,
+ struct delayed_work *dw,
+ work_func_t func)
+{
+ INIT_DELAYED_WORK(dw, func);
+ return devm_add_action(dev, devm_delayed_work_release, dw);
+}
+EXPORT_SYMBOL_GPL(devm_init_delayed_work);
--
2.17.1


2019-02-04 23:15:20

by Sven Van Asbroeck

[permalink] [raw]
Subject: [RFC v1 2/3] max17042_battery: fix potential user-after-free on module unload

The work which is scheduled on a POR boot is potentially left
pending or running until after the driver module is unloaded.

Fix by using resource-controlled version of INIT_WORK().

Signed-off-by: Sven Van Asbroeck <[email protected]>
---
drivers/power/supply/max17042_battery.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/power/supply/max17042_battery.c b/drivers/power/supply/max17042_battery.c
index 2a8d75e5e930..a61e2b81f68a 100644
--- a/drivers/power/supply/max17042_battery.c
+++ b/drivers/power/supply/max17042_battery.c
@@ -1100,7 +1100,10 @@ static int max17042_probe(struct i2c_client *client,

regmap_read(chip->regmap, MAX17042_STATUS, &val);
if (val & STATUS_POR_BIT) {
- INIT_WORK(&chip->work, max17042_init_worker);
+ ret = devm_init_work(&client->dev, &chip->work,
+ max17042_init_worker);
+ if (ret)
+ return ret;
schedule_work(&chip->work);
} else {
chip->init_complete = 1;
--
2.17.1


2019-02-04 23:15:20

by Sven Van Asbroeck

[permalink] [raw]
Subject: [RFC v1 3/3] cap11xx: fix potential user-after-free on module unload

The work which is scheduled by led_classdev->brightness_set() is
potentially left pending or running until after the driver module
is unloaded.

Fix by using resource-controlled version of INIT_WORK().

Signed-off-by: Sven Van Asbroeck <[email protected]>
---
drivers/input/keyboard/cap11xx.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/input/keyboard/cap11xx.c b/drivers/input/keyboard/cap11xx.c
index 312916f99597..a92dd8ee9ed7 100644
--- a/drivers/input/keyboard/cap11xx.c
+++ b/drivers/input/keyboard/cap11xx.c
@@ -312,7 +312,11 @@ static int cap11xx_init_leds(struct device *dev,
led->reg = reg;
led->priv = priv;

- INIT_WORK(&led->work, cap11xx_led_work);
+ error = devm_init_work(dev, &led->work, cap11xx_led_work);
+ if (error) {
+ of_node_put(child);
+ return error;
+ }

error = devm_led_classdev_register(dev, &led->cdev);
if (error) {
--
2.17.1


2019-02-05 08:19:26

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 3/3] cap11xx: fix potential user-after-free on module unload

Hi Sven,

On Mon, Feb 04, 2019 at 05:09:52PM -0500, Sven Van Asbroeck wrote:
> The work which is scheduled by led_classdev->brightness_set() is
> potentially left pending or running until after the driver module
> is unloaded.
>
> Fix by using resource-controlled version of INIT_WORK().

I believe this is wrong way of fixing this. The LED classdev objects are
refcounted, and may live beyond the point where we unwibd devm stack,
so we are still left with the same use-after-free that we currently
have.

This is a general issue with LED subsystem as it provides no callback
for properly tearing down device structures, but I think in this
particular case we can simply switch from set_brightness() to
set_brightness_blocking() which will use the work item internal to the
LED classdev and that one is being shut down properly.

Jacek, does the above sound right?

Thanks.

--
Dmitry

2019-02-05 08:35:06

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 2/3] max17042_battery: fix potential user-after-free on module unload

On Mon, Feb 04, 2019 at 05:09:51PM -0500, Sven Van Asbroeck wrote:
> The work which is scheduled on a POR boot is potentially left
> pending or running until after the driver module is unloaded.
>
> Fix by using resource-controlled version of INIT_WORK().
>
> Signed-off-by: Sven Van Asbroeck <[email protected]>
> ---
> drivers/power/supply/max17042_battery.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/power/supply/max17042_battery.c b/drivers/power/supply/max17042_battery.c
> index 2a8d75e5e930..a61e2b81f68a 100644
> --- a/drivers/power/supply/max17042_battery.c
> +++ b/drivers/power/supply/max17042_battery.c
> @@ -1100,7 +1100,10 @@ static int max17042_probe(struct i2c_client *client,
>
> regmap_read(chip->regmap, MAX17042_STATUS, &val);
> if (val & STATUS_POR_BIT) {
> - INIT_WORK(&chip->work, max17042_init_worker);
> + ret = devm_init_work(&client->dev, &chip->work,
> + max17042_init_worker);
> + if (ret)
> + return ret;
> schedule_work(&chip->work);

Are there many more instances of this? I am unsure if we need
devm_init_work() when we can easily do the same in remove() call.

Thanks.

--
Dmitry

2019-02-05 08:37:03

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 3/3] cap11xx: fix potential user-after-free on module unload

On Tue, Feb 05, 2019 at 12:18:46AM -0800, Dmitry Torokhov wrote:
> Hi Sven,
>
> On Mon, Feb 04, 2019 at 05:09:52PM -0500, Sven Van Asbroeck wrote:
> > The work which is scheduled by led_classdev->brightness_set() is
> > potentially left pending or running until after the driver module
> > is unloaded.
> >
> > Fix by using resource-controlled version of INIT_WORK().
>
> I believe this is wrong way of fixing this. The LED classdev objects are
> refcounted, and may live beyond the point where we unwibd devm stack,
> so we are still left with the same use-after-free that we currently
> have.

Hmm, I take it back, it looks like the lifetime of the outer structure
is limited to the time while driver is bound. I still wonder if using
set_brightness_blocking() would be better fix?

>
> This is a general issue with LED subsystem as it provides no callback
> for properly tearing down device structures, but I think in this
> particular case we can simply switch from set_brightness() to
> set_brightness_blocking() which will use the work item internal to the
> LED classdev and that one is being shut down properly.
>
> Jacek, does the above sound right?
>

--
Dmitry

2019-02-05 14:42:12

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 2/3] max17042_battery: fix potential user-after-free on module unload

On Tue, Feb 5, 2019 at 3:27 AM Dmitry Torokhov
<[email protected]> wrote:
>
> Are there many more instances of this?

Unfortunately I think so.
A simple grep brings up a couple of candidates, but I'm sure there are more:

drivers/regulator/arizona-micsupp.c
drivers/nfc/port100.c
drivers/power/supply/max14656_charger_detector.c
drivers/phy/broadcom/phy-bcm-ns2-usbdrd.c

> I am unsure if we need
> devm_init_work() when we can easily do the same in remove() call.

The devm_init_work() suggestion only addresses the problem for those modules
that use devm_. The others will need fixes in remove(). But this is not as
elegant and error-proof as using devm_init_work().

For example, take port100.c's disconnect function. It starts deallocating
resources, but the cmd_complete_work (scheduled by urb completion) may still
be in progress, and touch these resources after free.

At least, I _think_ that is the case. I'm not familiar enough with this driver
to be 100% sure, but it sounds likely.

static void port100_disconnect(struct usb_interface *interface)
{
struct port100 *dev;

dev = usb_get_intfdata(interface);
usb_set_intfdata(interface, NULL);

nfc_digital_unregister_device(dev->nfc_digital_dev);
nfc_digital_free_device(dev->nfc_digital_dev);

usb_kill_urb(dev->in_urb);
usb_kill_urb(dev->out_urb);

usb_free_urb(dev->in_urb);
usb_free_urb(dev->out_urb);
usb_put_dev(dev->udev);

kfree(dev->cmd);

nfc_info(&interface->dev, "Sony Port-100 NFC device disconnected\n");
}

2019-02-05 14:58:30

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Mon, Feb 4, 2019 at 10:09 PM Sven Van Asbroeck <[email protected]> wrote:
>
> I think there _might_ be potential use-after-free issues on module unload.
>
> They are hard to trigger, but I think I've seen them bring the whole
> kernel down when they do occur. Can be triggered by doing an insmod of
> a vulnerable module, rapidly followed by an rmmod.
>
> Caused by drivers which schedule work / delayed_work, but do not clean it up
> properly on module unload. Which means the work function could run _after_
> the module has unloaded.
>
> A quick grep through the kernel sources brings up many instances.
> I leave it to people more knowledgeable than me to determine if this problem
> is likely to happen, and/or if it can be exploited to become a security risk.
>
> Perhaps developers can be 'nudged' into doing the right thing by using
> resource-managed versions of INIT_WORK() / INIT_DELAYED_WORK(), which may
> address the issue quite elegantly.

Can a Coccinelle script get written to find module-use of the non-devm
work init?

It seems like finding these in __init functions should be relatively
easy? (Or can we add runtime detection in the existing INIT_*WORK()
code to see if it is running from the wrong place?)

-Kees

>
> Attached is a proposal patch, followed by sample fixes for two vulnerable
> modules. As far as I can tell, many more modules are vulnerable.
>
> Sven Van Asbroeck (3):
> workqueue: Add resource-managed version of INIT_[DELAYED_]WORK()
> max17042_battery: fix potential user-after-free on module unload
> cap11xx: fix potential user-after-free on module unload
>
> drivers/input/keyboard/cap11xx.c | 6 ++-
> drivers/power/supply/max17042_battery.c | 5 ++-
> include/linux/workqueue.h | 7 ++++
> kernel/workqueue.c | 54 +++++++++++++++++++++++++
> 4 files changed, 70 insertions(+), 2 deletions(-)
>
> --
> 2.17.1
>


--
Kees Cook

2019-02-05 15:25:15

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Tue, Feb 5, 2019 at 9:57 AM Kees Cook <[email protected]> wrote:
>
>
> Can a Coccinelle script get written to find module-use of the non-devm
> work init?

My thoughts exactly ! But sadly I'm not a Coccinelle expert. I did
look briefly at
its syntax, but I didn't immediately "get" how Cocci could find this class of
errors, without a huge false positive rate (which would make it worse than
useless).

>
> It seems like finding these in __init functions should be relatively
> easy? (Or can we add runtime detection in the existing INIT_*WORK()
> code to see if it is running from the wrong place?)
>

IMHO the problem isn't that they're called from __init functions.
Also, nothing is
wrong with the location of INIT_*WORK per se.

The real problem is that developers overlook calling cancel_work_sync()
on unload. I'm not sure how we could bolt on runtime detection to catch
a *missing* function. Again, without causing tons of false positives.

2019-02-05 17:45:03

by Sebastian Reichel

[permalink] [raw]
Subject: Re: [RFC v1 2/3] max17042_battery: fix potential user-after-free on module unload

Hi,

On Tue, Feb 05, 2019 at 09:27:49AM -0500, Sven Van Asbroeck wrote:
> On Tue, Feb 5, 2019 at 3:27 AM Dmitry Torokhov
> <[email protected]> wrote:
> >
> > Are there many more instances of this?
>
> Unfortunately I think so.
> A simple grep brings up a couple of candidates, but I'm sure there are more:
>
> drivers/regulator/arizona-micsupp.c
> drivers/nfc/port100.c
> drivers/power/supply/max14656_charger_detector.c
> drivers/phy/broadcom/phy-bcm-ns2-usbdrd.c
>
> > I am unsure if we need
> > devm_init_work() when we can easily do the same in remove() call.
>
> The devm_init_work() suggestion only addresses the problem for those modules
> that use devm_. The others will need fixes in remove(). But this is not as
> elegant and error-proof as using devm_init_work().
>
> [...]

I would for sure appreciate a devm_init_work(). There are a lot of
devm_ users in power-supply and this helper would definitely
simplify things.

-- Sebastian


Attachments:
(No filename) (988.00 B)
signature.asc (849.00 B)
Download all attachments

2019-02-05 19:02:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Tue, Feb 05, 2019 at 02:57:11PM +0000, Kees Cook wrote:
> On Mon, Feb 4, 2019 at 10:09 PM Sven Van Asbroeck <[email protected]> wrote:
> >
> > I think there _might_ be potential use-after-free issues on module unload.

There are loads of issues with module unloading, which is why it pretty
much is a "best effort" type of thing. It never happens automatically
and the only way you can do it is if you have root access, at which
point, there are loads of other things you can do that are much worse :)

thanks,

greg k-h

2019-02-05 19:03:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Tue, Feb 05, 2019 at 10:22:50AM -0500, Sven Van Asbroeck wrote:
> On Tue, Feb 5, 2019 at 9:57 AM Kees Cook <[email protected]> wrote:
> >
> >
> > Can a Coccinelle script get written to find module-use of the non-devm
> > work init?
>
> My thoughts exactly ! But sadly I'm not a Coccinelle expert. I did
> look briefly at
> its syntax, but I didn't immediately "get" how Cocci could find this class of
> errors, without a huge false positive rate (which would make it worse than
> useless).
>
> >
> > It seems like finding these in __init functions should be relatively
> > easy? (Or can we add runtime detection in the existing INIT_*WORK()
> > code to see if it is running from the wrong place?)
> >
>
> IMHO the problem isn't that they're called from __init functions.
> Also, nothing is
> wrong with the location of INIT_*WORK per se.
>
> The real problem is that developers overlook calling cancel_work_sync()
> on unload. I'm not sure how we could bolt on runtime detection to catch
> a *missing* function. Again, without causing tons of false positives.

It really should happen when the device is removed (if it is a driver
that binds to a device.) If this is not a driver, then there should be
some way to scan that cancel_work_sync() is never called or not, right?

thanks,

greg k-h

2019-02-05 19:56:00

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Tue, Feb 5, 2019 at 1:43 PM Greg KH <[email protected]> wrote:
>
>
> It really should happen when the device is removed (if it is a driver
> that binds to a device.)

Absolutely. That's why I'm advocating adding a devm_init_work(),
which will take care of this automatically.

But it's of course not universally applicable. Not all drivers use devm.

> If this is not a driver, then there should be
> some way to scan that cancel_work_sync() is never called or not, right?

Are you saying the same thing as Kees, that ideally there should be
infrastructure that WARN()s if work isn't cleaned up properly?

I guess for that to work, the code would need to 'know' what resources
the work function is touching. And warn if one of the resources is freed
without cancelling the work.

Also, cancel_work_sync() is only really needed when running the work
on a global or shared workqueue. If it's a private one, then destroy_workqueue()
is good enough to cancel the work.

Sounds like more of a job for static code analysis? Coccinelle?

2019-02-05 21:26:19

by Jacek Anaszewski

[permalink] [raw]
Subject: Re: [RFC v1 3/3] cap11xx: fix potential user-after-free on module unload

Hi Dmitry,

On 2/5/19 9:18 AM, Dmitry Torokhov wrote:
> Hi Sven,
>
> On Mon, Feb 04, 2019 at 05:09:52PM -0500, Sven Van Asbroeck wrote:
>> The work which is scheduled by led_classdev->brightness_set() is
>> potentially left pending or running until after the driver module
>> is unloaded.
>>
>> Fix by using resource-controlled version of INIT_WORK().
>
> I believe this is wrong way of fixing this. The LED classdev objects are
> refcounted, and may live beyond the point where we unwibd devm stack,
> so we are still left with the same use-after-free that we currently
> have.

Could you please share what LED classdev objects refcounting
do you have on mind?

> This is a general issue with LED subsystem as it provides no callback
> for properly tearing down device structures, but I think in this
> particular case we can simply switch from set_brightness() to
> set_brightness_blocking() which will use the work item internal to the
> LED classdev and that one is being shut down properly.
>
> Jacek, does the above sound right?

Yes, since the introduction of brightness_set_blocking op there is no
need for out-of-led-core workqueues for deferring brightness setting.
And we do flush_work() in led_classdev_unregister().

--
Best regards,
Jacek Anaszewski

2019-02-05 22:03:55

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 3/3] cap11xx: fix potential user-after-free on module unload

On Tue, Feb 05, 2019 at 10:24:47PM +0100, Jacek Anaszewski wrote:
> Hi Dmitry,
>
> On 2/5/19 9:18 AM, Dmitry Torokhov wrote:
> > Hi Sven,
> >
> > On Mon, Feb 04, 2019 at 05:09:52PM -0500, Sven Van Asbroeck wrote:
> > > The work which is scheduled by led_classdev->brightness_set() is
> > > potentially left pending or running until after the driver module
> > > is unloaded.
> > >
> > > Fix by using resource-controlled version of INIT_WORK().
> >
> > I believe this is wrong way of fixing this. The LED classdev objects are
> > refcounted, and may live beyond the point where we unwibd devm stack,
> > so we are still left with the same use-after-free that we currently
> > have.
>
> Could you please share what LED classdev objects refcounting
> do you have on mind?

My mind was in a state of confusion when I wrote the above ;)

>
> > This is a general issue with LED subsystem as it provides no callback
> > for properly tearing down device structures, but I think in this
> > particular case we can simply switch from set_brightness() to
> > set_brightness_blocking() which will use the work item internal to the
> > LED classdev and that one is being shut down properly.
> >
> > Jacek, does the above sound right?
>
> Yes, since the introduction of brightness_set_blocking op there is no
> need for out-of-led-core workqueues for deferring brightness setting.
> And we do flush_work() in led_classdev_unregister().

OK, great, I'll write up a patch for cap11xx and others if I find them.

Thanks.

--
Dmitry

2019-02-05 22:13:33

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 3/3] cap11xx: fix potential user-after-free on module unload

On Tue, Feb 5, 2019 at 4:43 PM Dmitry Torokhov
<[email protected]> wrote:
>
> OK, great, I'll write up a patch for cap11xx and others if I find them.
>

Possibly also drivers/input/keyboard/lm8323.c

2019-02-06 16:47:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Tue, Feb 05, 2019 at 02:12:31PM -0500, Sven Van Asbroeck wrote:
> On Tue, Feb 5, 2019 at 1:43 PM Greg KH <[email protected]> wrote:
> >
> >
> > It really should happen when the device is removed (if it is a driver
> > that binds to a device.)
>
> Absolutely. That's why I'm advocating adding a devm_init_work(),
> which will take care of this automatically.
>
> But it's of course not universally applicable. Not all drivers use devm.

Ick, no, watch out for devm() calls. Odds are this is _NOT_ what you
want to do for a device. Remember when devm calls get freed (hint, not
at driver unbind/unload, but at device structure removal.

By creating a work queue, you are suddenly tying module code to a device
memory structure lifespan, both of which are totally independant.

It's the same issue with the devm irq call, that has been nothing but a
nightmare as everyone gets it wrong. Try to learn from our past
mistakes please :)

thanks,

greg k-h

2019-02-06 18:00:53

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Wed, Feb 6, 2019 at 8:47 AM Greg KH <[email protected]> wrote:
>
> On Tue, Feb 05, 2019 at 02:12:31PM -0500, Sven Van Asbroeck wrote:
> > On Tue, Feb 5, 2019 at 1:43 PM Greg KH <[email protected]> wrote:
> > >
> > >
> > > It really should happen when the device is removed (if it is a driver
> > > that binds to a device.)
> >
> > Absolutely. That's why I'm advocating adding a devm_init_work(),
> > which will take care of this automatically.
> >
> > But it's of course not universally applicable. Not all drivers use devm.
>
> Ick, no, watch out for devm() calls. Odds are this is _NOT_ what you
> want to do for a device. Remember when devm calls get freed (hint, not
> at driver unbind/unload, but at device structure removal.


??? We unwind devm on probe() failure and after remove() is called.
The device can live on.

>
>
> By creating a work queue, you are suddenly tying module code to a device
> memory structure lifespan, both of which are totally independant.
>
> It's the same issue with the devm irq call, that has been nothing but a
> nightmare as everyone gets it wrong. Try to learn from our past
> mistakes please :)


Yeah. But devm irq gave most trouble because we did not have enough
devm APIs so we often ended up with mixed devm/non-devm usage and that
is what was causing most of the issues. If we can switch everything to
devm then devm irq is not that troublesome.

I have 2+ drivers that currently use devm_add_action_or_reset() to
install action canceling work, they could be switched to
devm_init_work().

Thanks,

--
Dmitry

2019-02-06 18:08:32

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Wed, Feb 6, 2019 at 12:30 PM Dmitry Torokhov
<[email protected]> wrote:
>
> Yeah. But devm irq gave most trouble because we did not have enough
> devm APIs so we often ended up with mixed devm/non-devm usage and that
> is what was causing most of the issues. If we can switch everything to
> devm then devm irq is not that troublesome.
>

It sounds to me like _incomplete_ devm_ is worse than no devm at all.

Imagine a devm_ resource depends on a non-devm one:

int acme_probe(struct device *dev)
{
...
r = create_something();
d = devm_create_thing(dev, r);
}

Then remove could get us into some serious trouble:

void acme_remove(struct device *dev)
{
/* r _must_ be released here, we have no other place to do it */
destroy_something(r);
/* here, d is still alive because it's devm
* which is cleaned up _after_ remove().
* Now we have a live resource using a released resource.
* use-after-free anyone?
*/
}

This is a more generalized version of the issue I originally
observed, where r => struct work_struct.

I'm sure there must be plenty of these around the codebase.

I wish we had a Coccinelle script to catch these, because it's
one thing to fix them today. More will be added tomorrow.
devm_ is so elegant that people frequently use it without
thinking it through.

I certainly would have, before yesterday :)

2019-02-07 21:51:18

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Tue, Feb 5, 2019 at 9:57 AM Kees Cook <[email protected]> wrote:
>
> Can a Coccinelle script get written to find module-use of the non-devm
> work init?

Ok so I hacked together a Coccinelle script to find these
user-after-free issues,
related to work left running when the device or module is removed.

As far as I can see, these issues may crash/corrupt the kernel even on
_device_ remove/unplug. Users don't need root for that...

I got 71 hits. At least one is a false positive.
34 out of 71 could benefit from devm_init_work().

- is this important enough to ping back to authors of affected modules?
- should this be added to the kernel as part of 'make coccicheck' ?
- does this result make people "feel better" about devm_init_work() ?

Script results, and script itself, follows.

$ make coccicheck COCCI=./init_work.cocci MODE=report M=./drivers/

./drivers//usb/phy/phy-twl6030-usb.c:368:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//power/supply/ds2782_battery.c:429:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//nfc/port100.c:1558:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//net/ethernet/intel/iavf/iavf_main.c:3721:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//iio/proximity/as3935.c:371:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//hwmon/xgene-hwmon.c:649:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//gpu/drm/bridge/adv7511/adv7511_drv.c:1195:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use
devm_)
./drivers//block/sx8.c:1440:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//power/supply/isp1704_charger.c:488:1-10: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//power/supply/da9150-charger.c:592:2-11: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//phy/renesas/phy-rcar-gen3-usb2.c:453:2-11: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//pci/pcie/pme.c:330:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/atheros/atlx/atl1.c:3077:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//media/pci/b2c2/flexcop-pci.c:384:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//hid/intel-ish-hid/ishtp-hid-client.c:812:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use
devm_)
./drivers//thermal/samsung/exynos_tmu.c:1051:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//spi/spi-mpc52xx.c:472:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//rtc/rtc-88pm860x.c:403:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//net/ethernet/ibm/ibmvnic.c:4759:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/cavium/thunder/nicvf_main.c:2188:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//media/platform/qcom/venus/core.c:277:1-18: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//video/fbdev/smscufx.c:1672:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//usb/renesas_usbhs/common.c:727:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//usb/gadget/udc/snps_udc_plat.c:186:2-19: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//power/supply/twl4030_charger.c:1009:1-10: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//power/supply/twl4030_charger.c:1010:1-18: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//power/supply/max17042_battery.c:1103:2-11: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//net/fjes/fjes_main.c:1250:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/intel/e100.c:2902:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//mfd/si476x-i2c.c:772:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//input/serio/ps2-gpio.c:412:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//extcon/extcon-sm5502.c:573:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//atm/idt77252.c:3624:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//video/fbdev/omap2/omapfb/displays/panel-dsi-cm.c:1239:1-18:
missing clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
(maybe use devm_)
./drivers//net/ethernet/renesas/ravb_main.c:2055:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/natsemi/ns83820.c:1971:1-10: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/cisco/enic/enic_main.c:2885:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//mfd/da903x.c:512:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//iio/adc/envelope-detector.c:343:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//hwmon/sht15.c:931:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//hwmon/sht15.c:932:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//extcon/extcon-rt8973a.c:581:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//usb/gadget/udc/renesas_usb3.c:2701:1-10: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//usb/gadget/udc/renesas_usb3.c:2740:1-10: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//power/supply/max14656_charger_detector.c:281:1-18: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use
devm_)
./drivers//net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:6060:1-10:
missing clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/xircom/xirc2ps_cs.c:497:4-13: missing clean-up
of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/microchip/enc28j60.c:1576:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/microchip/enc28j60.c:1577:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/microchip/enc28j60.c:1578:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/microchip/enc28j60.c:1579:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/calxeda/xgmac.c:1726:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//mmc/host/mxcmmc.c:1159:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//power/supply/sc2731_charger.c:468:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//platform/olpc/olpc-ec.c:271:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//media/pci/saa7164/saa7164-core.c:1280:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//media/i2c/adv7842.c:3552:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//iio/adc/xilinx-xadc-core.c:1182:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//bluetooth/btsdio.c:314:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//auxdisplay/ht16k33.c:448:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//scsi/ibmvscsi/ibmvfc.c:4785:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//power/supply/ltc2941-battery-gauge.c:548:1-18: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use
devm_)
./drivers//phy/ti/phy-twl4030-usb.c:748:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//nfc/trf7970a.c:2069:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)
./drivers//net/ethernet/qualcomm/emac/emac.c:698:1-10: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/chelsio/cxgb/cxgb2.c:1065:3-12: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//net/ethernet/chelsio/cxgb/cxgb2.c:1067:3-20: missing
clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//memstick/host/rtsx_usb_ms.c:795:1-18: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here
./drivers//char/pcmcia/synclink_cs.c:531:1-10: missing clean-up of
INIT_WORK/INIT_DELAYED_WORK initialized here

Coccinelle script:

/// Find INIT_(DELAYED_)WORKs in module probe() that are not cancelled
/// or flushed explicitly.
///
/// The following is a common use-after-free anti-pattern:
/// - INIT_WORK() or INIT_DELAYED_WORK() in module probe.
/// - schedule this work on the global workqueue (e.g. schedule_work() )
/// anywhere in the module.
/// - on module unload, forget to either:
/// + explicitly cancel the work, or
/// + wait for its completion.
/// This could mean that the work is still running/pending after the module
/// has unloaded, which is a use-after-free issue.
//
// Licence: GPLv2
//

virtual org
virtual report
virtual context
virtual patch

@find_probe depends on !patch && (context || org || report)@
identifier __probe_name, s, driver;
@@

(
static struct s driver = {
* .probe = __probe_name,
};
|
static struct s driver = {
.ops = {
* .add = __probe_name,
},
};
)

@find_init depends on find_probe && !patch && (context || org || report)@
identifier find_probe.__probe_name;
identifier e, __work, work_fn;
position j0;
@@

__probe_name(...)
{
...
(
* INIT_WORK@j0(&e->__work, work_fn)
|
* INIT_WORK@j0(&e.__work, work_fn)
|
* INIT_DELAYED_WORK@j0(&e->__work, work_fn)
|
* INIT_DELAYED_WORK@j0(&e.__work, work_fn)
)
...
}

@find_schedule depends on find_init && !patch && (context || org || report)@
identifier find_init.__work;
identifier e;
expression delay;
@@

(
schedule_work(&e->__work)
|
schedule_work(&e.__work)
|
schedule_delayed_work(&e->__work, delay)
|
schedule_delayed_work(&e.__work, delay)
)

@find_cancel depends on find_schedule && !patch && (context || org || report)@
identifier find_init.__work;
identifier e;
@@

(
cancel_work_sync(&e->__work)
|
cancel_work_sync(&e.__work)
|
flush_work(&e->__work)
|
flush_work(&e.__work)
|
cancel_delayed_work_sync(&e->__work)
|
cancel_delayed_work_sync(&e.__work)
|
flush_delayed_work(&e->__work)
|
flush_delayed_work(&e->__work)
)

@find_devm depends on find_probe && !patch && (context || org || report)@
identifier find_probe.__probe_name;
@@

__probe_name(...)
{
...
(
devm_kzalloc
|
devm_kcalloc
)
...
}

@script:python missing_cancel depends on find_schedule && !find_cancel
&& !find_devm && report@
j0 << find_init.j0;
@@

msg = "missing clean-up of INIT_WORK/INIT_DELAYED_WORK initialized here"
coccilib.report.print_report(j0[0], msg)

@script:python missing_cancel_devm depends on find_schedule &&
!find_cancel && find_devm && report@
j0 << find_init.j0;
@@

msg = "missing clean-up of INIT_WORK/INIT_DELAYED_WORK initialized
here (maybe use devm_)"
coccilib.report.print_report(j0[0], msg)

2019-02-07 22:21:51

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

Hi Sven,

On Thu, Feb 7, 2019 at 1:49 PM Sven Van Asbroeck <[email protected]> wrote:
>
> On Tue, Feb 5, 2019 at 9:57 AM Kees Cook <[email protected]> wrote:
> >
> > Can a Coccinelle script get written to find module-use of the non-devm
> > work init?
>
> Ok so I hacked together a Coccinelle script to find these
> user-after-free issues,
> related to work left running when the device or module is removed.
>
> As far as I can see, these issues may crash/corrupt the kernel even on
> _device_ remove/unplug. Users don't need root for that...
>
> I got 71 hits. At least one is a false positive.
> 34 out of 71 could benefit from devm_init_work().
...
> ./drivers//input/serio/ps2-gpio.c:412:1-18: missing clean-up of
> INIT_WORK/INIT_DELAYED_WORK initialized here (maybe use devm_)

OK, this seems to be a real issue. We need to make sure we flush the
TX work inside ps2_gpio_close().

> ./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
> of INIT_WORK/INIT_DELAYED_WORK initialized here

This is not as simple. The work in question is scheduled from
matrix_keypad_start() and matrix_keypad_stop() uses flush_work() to
make sure currently running instance completes after making sure that
it will not be rescheduled. And matrix_keypad_stop() is guaranteed to
be called by input core when input device is being unregistered if
input device was opened. So in effect we do not actually leak work
past driver remove().

Thanks.

--
Dmitry

2019-02-07 22:28:04

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Thu, Feb 7, 2019 at 5:21 PM Dmitry Torokhov
<[email protected]> wrote:
>
> > ./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
> > of INIT_WORK/INIT_DELAYED_WORK initialized here
>
> This is not as simple.
[...]
> So in effect we do not actually leak work
> past driver remove().

That's expected. No Coccinelle script is perfect
(especially not my feeble attempts!)
so there will be false positives.

2019-02-07 22:34:32

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Thu, Feb 7, 2019 at 5:21 PM Dmitry Torokhov
<[email protected]> wrote:
>
> > ./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
> > of INIT_WORK/INIT_DELAYED_WORK initialized here
>
> This is not as simple.
>

PS If you change
flush_work(&keypad->work.work);
to
flush_delayed_work(&keypad->work);

then the Coccinelle script works correctly, and does not flag
this driver.

2019-02-07 22:50:33

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Thu, Feb 7, 2019 at 2:32 PM Sven Van Asbroeck <[email protected]> wrote:
>
> On Thu, Feb 7, 2019 at 5:21 PM Dmitry Torokhov
> <[email protected]> wrote:
> >
> > > ./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
> > > of INIT_WORK/INIT_DELAYED_WORK initialized here
> >
> > This is not as simple.
> >
>
> PS If you change
> flush_work(&keypad->work.work);
> to
> flush_delayed_work(&keypad->work);
>
> then the Coccinelle script works correctly, and does not flag
> this driver.

Yes, I just sent out the patch as it was in fact a bug: we could have
missed work that is scheduled but not queued.

--
Dmitry

2019-02-08 04:31:27

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Thu, Feb 7, 2019 at 11:33 PM Sven Van Asbroeck <[email protected]> wrote:
>
> On Thu, Feb 7, 2019 at 5:21 PM Dmitry Torokhov
> <[email protected]> wrote:
> >
> > > ./drivers//input/keyboard/matrix_keypad.c:512:1-18: missing clean-up
> > > of INIT_WORK/INIT_DELAYED_WORK initialized here
> >
> > This is not as simple.
> >
>
> PS If you change
> flush_work(&keypad->work.work);
> to
> flush_delayed_work(&keypad->work);
>
> then the Coccinelle script works correctly, and does not flag
> this driver.

Similarly, in drivers/auxdisplay/ht16k33.c, the cancel_delayed_work()
is there, instead of cancel_delayed_work_sync(). Having the script
suggest this change would be useful, too (i.e. instead of the devm_
change, assuming the cancel_delayed_work() is already there).

Thanks!
Miguel

2019-02-08 06:52:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

On Wed, Feb 06, 2019 at 09:30:29AM -0800, Dmitry Torokhov wrote:
> On Wed, Feb 6, 2019 at 8:47 AM Greg KH <[email protected]> wrote:
> >
> > On Tue, Feb 05, 2019 at 02:12:31PM -0500, Sven Van Asbroeck wrote:
> > > On Tue, Feb 5, 2019 at 1:43 PM Greg KH <[email protected]> wrote:
> > > >
> > > >
> > > > It really should happen when the device is removed (if it is a driver
> > > > that binds to a device.)
> > >
> > > Absolutely. That's why I'm advocating adding a devm_init_work(),
> > > which will take care of this automatically.
> > >
> > > But it's of course not universally applicable. Not all drivers use devm.
> >
> > Ick, no, watch out for devm() calls. Odds are this is _NOT_ what you
> > want to do for a device. Remember when devm calls get freed (hint, not
> > at driver unbind/unload, but at device structure removal.
>
>
> ??? We unwind devm on probe() failure and after remove() is called.
> The device can live on.

{sigh} you are right, I don't know what I was thinking. Then why were
the DRM developers so upset that they didn't see this happening
recently? Anyway, all should be fine here, nevermind...

thanks,

greg k-h

2019-02-08 17:08:33

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC v1 1/3] workqueue: Add resource-managed version of INIT_[DELAYED_]WORK()

On Mon, Feb 04, 2019 at 05:09:50PM -0500, Sven Van Asbroeck wrote:
> In modules which extensively use devm_ resource management, it is often
> easy to overlook (delayed) work that is left pending or running after the
> module is unloaded. This could introduce user-after-free issues.
>
> Nudge kernel developers into 'doing the right thing' by introducing a
> resource-managed version of INIT_[DELAYED_]WORK(). This can be used as
> an elegant way to ensure that work is not left pending or running after
> its dependencies are released.
>
> Functions introduced in workqueue.h :
> - devm_init_work()
> - devm_init_delayed_work()

I don't object to the basic idea but cancel_[delayed_]work_sync()
works iff queueing is disabled already, so there can be situations
where this can lead to surprising / subtle failures. Given that, it
*might* not be a bad idea to keep this explicit unless there is a way
to reliably block future queueing.

Thanks.

--
tejun

2019-02-08 18:16:05

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 1/3] workqueue: Add resource-managed version of INIT_[DELAYED_]WORK()

Hi Tejun,

On Fri, Feb 8, 2019 at 12:07 PM Tejun Heo <[email protected]> wrote:
>
> I don't object to the basic idea but cancel_[delayed_]work_sync()
> works iff queueing is disabled already, so there can be situations
> where this can lead to surprising / subtle failures. Given that, it
> *might* not be a bad idea to keep this explicit unless there is a way
> to reliably block future queueing.
>

Yes, I'm "coming around" to your opinion myself.

There's also the question of which one is appropriate for clean-up:
cancel_work_sync() or flush_work().

And what about work scheduled on more than one workqueue?
Or work scheduled on multi-threaded workqueues?

The workqueue API sounds too complicated to have a devm_
helper. It would lull developers into a false sense of security.

2019-02-10 18:05:38

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

Hi Miguel,

On Thu, Feb 7, 2019 at 11:30 PM Miguel Ojeda
<[email protected]> wrote:
>
> Similarly, in drivers/auxdisplay/ht16k33.c, the cancel_delayed_work()
> is there, instead of cancel_delayed_work_sync(). Having the script
> suggest this change would be useful, too (i.e. instead of the devm_
> change, assuming the cancel_delayed_work() is already there).
>

For relatively straightforward problems, I'd say yes.

However, the problems flagged by this script are not trivial at all.
In many cases, the missing _sync is just a symptom of general
synchronization issues on disconnect(), and simply adding it
will not fix the problem. Sometimes, it's just a false positive.

2019-02-14 10:12:43

by Miguel Ojeda

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

Hi Sven,

On Sun, Feb 10, 2019 at 7:05 PM Sven Van Asbroeck <[email protected]> wrote:
>
> For relatively straightforward problems, I'd say yes.
>
> However, the problems flagged by this script are not trivial at all.
> In many cases, the missing _sync is just a symptom of general
> synchronization issues on disconnect(), and simply adding it
> will not fix the problem. Sometimes, it's just a false positive.

Indeed! Note that I didn't mean to say they are trivial issues or that
the solution is simply to replace cancel_delayed_work() with the
_sync() version in all cases -- only that it might be a good idea to
suggest its existence (instead of only suggesting devm_, because in
cases like drivers/auxdisplay/ht16k33.c it may be the simplest fix).

Cheers,
Miguel

2019-02-15 00:46:41

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [RFC v1 0/3] Address potential user-after-free on module unload

Hi Miguel,

On Wed, Feb 13, 2019 at 8:11 PM Miguel Ojeda
<[email protected]> wrote:
>
> Indeed! Note that I didn't mean to say they are trivial issues or that
> the solution is simply to replace cancel_delayed_work() with the
> _sync() version in all cases -- only that it might be a good idea to
> suggest its existence (instead of only suggesting devm_, because in
> cases like drivers/auxdisplay/ht16k33.c it may be the simplest fix).
>

I will try to get your suggestion in, as I move forward with this
Coccinelle script.

Cheers,
Sven