2012-06-05 08:59:31

by Ming Lei

[permalink] [raw]
Subject: [PATCH] driver core: fix shutdown races with probe/remove

Firstly, .shutdown callback may touch a uninitialized hardware
if dev->driver is set and .probe is not completed.

Secondly, device_shutdown() may dereference a null pointer to cause
oops when dev->driver is cleared after it is checked in device_shutdown().

This patch uses SRCU and ACCESS_ONCE to avoid the races:
- don't start .shutdown until .probe in progess is completed
- avoid running .probe and .remove once shutdown is started

Cc: Alan Stern <[email protected]>
Cc: Paul McKenney <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---

Thanks Paul for helpping me on using SRCU and ACCESS_ONCE to solve
the problem.

drivers/base/base.h | 4 ++++
drivers/base/core.c | 6 ++++++
drivers/base/dd.c | 14 ++++++++++++++
drivers/base/init.c | 3 +++
4 files changed, 27 insertions(+)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 6ee17bb..06f9dc1 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -1,4 +1,5 @@
#include <linux/notifier.h>
+#include <linux/srcu.h>

/**
* struct subsys_private - structure to hold the private to the driver core portions of the bus_type/class structure.
@@ -138,3 +139,6 @@ extern int devtmpfs_init(void);
#else
static inline int devtmpfs_init(void) { return 0; }
#endif
+
+extern struct srcu_struct driver_srcu;
+extern int device_shutdown_started;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 346be8b..c3d597b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -49,6 +49,9 @@ static struct kobject *dev_kobj;
struct kobject *sysfs_dev_char_kobj;
struct kobject *sysfs_dev_block_kobj;

+struct srcu_struct driver_srcu;
+int device_shutdown_started;
+
#ifdef CONFIG_BLOCK
static inline int device_is_not_partition(struct device *dev)
{
@@ -1803,6 +1806,9 @@ void device_shutdown(void)
{
struct device *dev;

+ ACCESS_ONCE(device_shutdown_started) = 1;
+ synchronize_srcu(&driver_srcu);
+
spin_lock(&devices_kset->list_lock);
/*
* Walk the devices list backward, shutting down each in turn.
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 1b1cbb5..5587037 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -250,8 +250,13 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
static int really_probe(struct device *dev, struct device_driver *drv)
{
int ret = 0;
+ int idx;

+ idx = srcu_read_lock(&driver_srcu);
atomic_inc(&probe_count);
+ if (ACCESS_ONCE(device_shutdown_started))
+ goto done;
+
pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
drv->bus->name, __func__, drv->name, dev_name(dev));
WARN_ON(!list_empty(&dev->devres_head));
@@ -305,6 +310,7 @@ probe_failed:
done:
atomic_dec(&probe_count);
wake_up(&probe_waitqueue);
+ srcu_read_unlock(&driver_srcu, idx);
return ret;
}

@@ -467,6 +473,12 @@ EXPORT_SYMBOL_GPL(driver_attach);
static void __device_release_driver(struct device *dev)
{
struct device_driver *drv;
+ int idx;
+
+ idx = srcu_read_lock(&driver_srcu);
+
+ if (ACCESS_ONCE(device_shutdown_started))
+ goto exit;

drv = dev->driver;
if (drv) {
@@ -494,6 +506,8 @@ static void __device_release_driver(struct device *dev)
dev);

}
+exit:
+ srcu_read_unlock(&driver_srcu, idx);
}

/**
diff --git a/drivers/base/init.c b/drivers/base/init.c
index c16f0b8..1b7dae6 100644
--- a/drivers/base/init.c
+++ b/drivers/base/init.c
@@ -27,6 +27,9 @@ void __init driver_init(void)
firmware_init();
hypervisor_init();

+ /* sync shutdown with probe/release */
+ init_srcu_struct(&driver_srcu);
+
/* These are also core pieces, but must come after the
* core core pieces.
*/
--
1.7.9.5


2012-06-05 09:19:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Tue, Jun 05, 2012 at 04:59:07PM +0800, Ming Lei wrote:
> Firstly, .shutdown callback may touch a uninitialized hardware
> if dev->driver is set and .probe is not completed.

When would that ever happen?

> Secondly, device_shutdown() may dereference a null pointer to cause
> oops when dev->driver is cleared after it is checked in device_shutdown().

Again, when would that happen? A new device/driver being added to the
system as it is being shutdown?

Has anyone ever hit that before?

Is this really needed?

greg k-h

2012-06-05 09:38:44

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

Hi,

On Tue, Jun 5, 2012 at 5:18 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Jun 05, 2012 at 04:59:07PM +0800, Ming Lei wrote:
>> Firstly, .shutdown callback may touch a uninitialized hardware
>> if dev->driver is set and .probe is not completed.
>
> When would that ever happen?

It may happen when someone plugs a device, then run 'shutdown' immediately.

>
>> Secondly, device_shutdown() may dereference a null pointer to cause
>> oops when dev->driver is cleared after it is checked in device_shutdown().
>
> Again, when would that happen? ?A new device/driver being added to the
> system as it is being shutdown?

It is usually that someone presses 'power off' button and unplugs
its usb devices or pci/e devices, maybe together with a new plug.

In such case, the oops may be triggered.

>
> Has anyone ever hit that before?

Looks someone has reported it, I also can trigger it
with some delay added in device_shutdown.

>
> Is this really needed?

In case that it is triggered, oops will prevent system from being
shutdown, and it should be a big problem.

IMO, we can avoid it without much change and side effect, so
looks worthy the change.

Thanks,
--
Ming Lei

2012-06-05 14:47:08

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Tue, 5 Jun 2012, Ming Lei wrote:

> Firstly, .shutdown callback may touch a uninitialized hardware
> if dev->driver is set and .probe is not completed.
>
> Secondly, device_shutdown() may dereference a null pointer to cause
> oops when dev->driver is cleared after it is checked in device_shutdown().
>
> This patch uses SRCU and ACCESS_ONCE to avoid the races:
> - don't start .shutdown until .probe in progess is completed
> - avoid running .probe and .remove once shutdown is started

Avoid running probe, that's fine. But avoiding remove can lead to
problems, because the subsystem and the driver will no longer agree on
who should manage the device.

> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -49,6 +49,9 @@ static struct kobject *dev_kobj;
> struct kobject *sysfs_dev_char_kobj;
> struct kobject *sysfs_dev_block_kobj;
>
> +struct srcu_struct driver_srcu;
> +int device_shutdown_started;
> +
> #ifdef CONFIG_BLOCK
> static inline int device_is_not_partition(struct device *dev)
> {
> @@ -1803,6 +1806,9 @@ void device_shutdown(void)
> {
> struct device *dev;
>
> + ACCESS_ONCE(device_shutdown_started) = 1;

This does not need to be ACCESS_ONCE. ACCESS_ONCE is needed only in
situations where the compiler might insert multiple reads or writes.

> + synchronize_srcu(&driver_srcu);
> +
> spin_lock(&devices_kset->list_lock);
> /*
> * Walk the devices list backward, shutting down each in turn.
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 1b1cbb5..5587037 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -250,8 +250,13 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
> int ret = 0;
> + int idx;
>
> + idx = srcu_read_lock(&driver_srcu);
> atomic_inc(&probe_count);
> + if (ACCESS_ONCE(device_shutdown_started))

This doesn't need to be ACCESS_ONCE either. If it would make you feel
better, you could write

if (rcu_dereference(&device_shutdown_started))

> + goto done;

You need to set ret to something other than 0, because the probe has
not succeeded. For example, set it to -ESHUTDOWN.

> +
> pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
> drv->bus->name, __func__, drv->name, dev_name(dev));
> WARN_ON(!list_empty(&dev->devres_head));
> @@ -305,6 +310,7 @@ probe_failed:
> done:
> atomic_dec(&probe_count);
> wake_up(&probe_waitqueue);
> + srcu_read_unlock(&driver_srcu, idx);
> return ret;
> }
>
> @@ -467,6 +473,12 @@ EXPORT_SYMBOL_GPL(driver_attach);
> static void __device_release_driver(struct device *dev)
> {
> struct device_driver *drv;
> + int idx;
> +
> + idx = srcu_read_lock(&driver_srcu);
> +
> + if (ACCESS_ONCE(device_shutdown_started))
> + goto exit;

As mentioned above, this is not good. I don't know... maybe it won't
ever cause any difficulties.

Alan Stern

2012-06-05 15:17:15

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Tue, Jun 5, 2012 at 10:47 PM, Alan Stern <[email protected]> wrote:
>
> Avoid running probe, that's fine. ?But avoiding remove can lead to
> problems, because the subsystem and the driver will no longer agree on
> who should manage the device.

After device_shutdown() has been called, the whole system will enter power down
or reset later, so it doesn't matter if who should manage the device.

Also once shutdown callback is called for the device, looks its other callbacks
should not touch the device any more.

>
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -49,6 +49,9 @@ static struct kobject *dev_kobj;
>> ?struct kobject *sysfs_dev_char_kobj;
>> ?struct kobject *sysfs_dev_block_kobj;
>>
>> +struct srcu_struct driver_srcu;
>> +int device_shutdown_started;
>> +
>> ?#ifdef CONFIG_BLOCK
>> ?static inline int device_is_not_partition(struct device *dev)
>> ?{
>> @@ -1803,6 +1806,9 @@ void device_shutdown(void)
>> ?{
>> ? ? ? struct device *dev;
>>
>> + ? ? ACCESS_ONCE(device_shutdown_started) = 1;
>
> This does not need to be ACCESS_ONCE. ?ACCESS_ONCE is needed only in
> situations where the compiler might insert multiple reads or writes.

I have talked with Paul about the ACCESS_ONCE, and Paul thought it is needed
to prevent the compiler from doing byte-at-a-time stores.

Maybe Paul has a detailed explanation about it.

>
>> + ? ? synchronize_srcu(&driver_srcu);
>> +
>> ? ? ? spin_lock(&devices_kset->list_lock);
>> ? ? ? /*
>> ? ? ? ?* Walk the devices list backward, shutting down each in turn.
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 1b1cbb5..5587037 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -250,8 +250,13 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
>> ?static int really_probe(struct device *dev, struct device_driver *drv)
>> ?{
>> ? ? ? int ret = 0;
>> + ? ? int idx;
>>
>> + ? ? idx = srcu_read_lock(&driver_srcu);
>> ? ? ? atomic_inc(&probe_count);
>> + ? ? if (ACCESS_ONCE(device_shutdown_started))
>
> This doesn't need to be ACCESS_ONCE either. ?If it would make you feel

Because the global variable of device_shutdown_started is not changed inside
the function, the compiler may put the above line after the line of
'dev->driver = drv',
so introduce the ACCESS_ONCE to avoid the compiler optimization.

> better, you could write
>
> ? ? ? ?if (rcu_dereference(&device_shutdown_started))

The usage of SRCU refers to the part 'Cleaning Up Safely' of the SRCU
paper 'Sleepable RCU'[1]. Also the patch doesn't call rcu_assign_pointer,
so rcu_dereference isn't needed.

[1], http://lwn.net/Articles/202847/

>
>> + ? ? ? ? ? ? goto done;
>
> You need to set ret to something other than 0, because the probe has
> not succeeded. ?For example, set it to -ESHUTDOWN.

Looks reasonable.

>
>> +
>> ? ? ? pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>> ? ? ? ? ? ? ? ?drv->bus->name, __func__, drv->name, dev_name(dev));
>> ? ? ? WARN_ON(!list_empty(&dev->devres_head));
>> @@ -305,6 +310,7 @@ probe_failed:
>> ?done:
>> ? ? ? atomic_dec(&probe_count);
>> ? ? ? wake_up(&probe_waitqueue);
>> + ? ? srcu_read_unlock(&driver_srcu, idx);
>> ? ? ? return ret;
>> ?}
>>
>> @@ -467,6 +473,12 @@ EXPORT_SYMBOL_GPL(driver_attach);
>> ?static void __device_release_driver(struct device *dev)
>> ?{
>> ? ? ? struct device_driver *drv;
>> + ? ? int idx;
>> +
>> + ? ? idx = srcu_read_lock(&driver_srcu);
>> +
>> + ? ? if (ACCESS_ONCE(device_shutdown_started))
>> + ? ? ? ? ? ? goto exit;
>
> As mentioned above, this is not good. ?I don't know... maybe it won't
> ever cause any difficulties.

Looks same with above.

Thanks,
--
Ming Lei

2012-06-05 17:09:47

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Tue, 5 Jun 2012, Ming Lei wrote:

> On Tue, Jun 5, 2012 at 10:47 PM, Alan Stern <[email protected]> wrote:
> >
> > Avoid running probe, that's fine. ?But avoiding remove can lead to
> > problems, because the subsystem and the driver will no longer agree on
> > who should manage the device.
>
> After device_shutdown() has been called, the whole system will enter power down
> or reset later, so it doesn't matter if who should manage the device.

Maybe. But there might be quite some time between the shutdown call
and the eventual power-off or reboot.

> Also once shutdown callback is called for the device, looks its other callbacks
> should not touch the device any more.

You shouldn't depend on that. Shutdown methods generally put the
device into a state suitable for power-off or reboot; they don't often
guarantee that the driver won't change the state later on.

On the whole, it might be easier just to hold the device lock during
the shutdown call.

> >> @@ -1803,6 +1806,9 @@ void device_shutdown(void)
> >> ?{
> >> ? ? ? struct device *dev;
> >>
> >> + ? ? ACCESS_ONCE(device_shutdown_started) = 1;
> >
> > This does not need to be ACCESS_ONCE. ?ACCESS_ONCE is needed only in
> > situations where the compiler might insert multiple reads or writes.
>
> I have talked with Paul about the ACCESS_ONCE, and Paul thought it is needed
> to prevent the compiler from doing byte-at-a-time stores.

Why on earth would a compiler change this into a byte-at-a-time store?
And even if it did, what problem would that cause? Only one of the
bytes would be different from the original value.

> Maybe Paul has a detailed explanation about it.

> >> ?static int really_probe(struct device *dev, struct device_driver *drv)
> >> ?{
> >> ? ? ? int ret = 0;
> >> + ? ? int idx;
> >>
> >> + ? ? idx = srcu_read_lock(&driver_srcu);
> >> ? ? ? atomic_inc(&probe_count);
> >> + ? ? if (ACCESS_ONCE(device_shutdown_started))
> >
> > This doesn't need to be ACCESS_ONCE either. ?If it would make you feel
>
> Because the global variable of device_shutdown_started is not changed inside
> the function, the compiler may put the above line after the line of
> 'dev->driver = drv',

No, it can't do that. If the compiler moved this read farther down,
and as a result the CPU wrote to dev->driver when it shouldn't have,
the result would be incorrect (i.e., different from what would have
happened if the statement hadn't been moved). Hence the compiler is
prohibited from making such an optimization.

> so introduce the ACCESS_ONCE to avoid the compiler optimization.
>
> > better, you could write
> >
> > ? ? ? ?if (rcu_dereference(&device_shutdown_started))
>
> The usage of SRCU refers to the part 'Cleaning Up Safely' of the SRCU
> paper 'Sleepable RCU'[1]. Also the patch doesn't call rcu_assign_pointer,
> so rcu_dereference isn't needed.

True, it's not needed any more than ACCESS_ONCE is.

> > As mentioned above, this is not good. ?I don't know... maybe it won't
> > ever cause any difficulties.
>
> Looks same with above.

You could try going into an infinite loop here, but that's probably not
a good idea.

Alan Stern

2012-06-05 20:21:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Tue, Jun 05, 2012 at 01:09:45PM -0400, Alan Stern wrote:
> On Tue, 5 Jun 2012, Ming Lei wrote:
>
> > On Tue, Jun 5, 2012 at 10:47 PM, Alan Stern <[email protected]> wrote:
> > >
> > > Avoid running probe, that's fine. ?But avoiding remove can lead to
> > > problems, because the subsystem and the driver will no longer agree on
> > > who should manage the device.
> >
> > After device_shutdown() has been called, the whole system will enter power down
> > or reset later, so it doesn't matter if who should manage the device.
>
> Maybe. But there might be quite some time between the shutdown call
> and the eventual power-off or reboot.
>
> > Also once shutdown callback is called for the device, looks its other callbacks
> > should not touch the device any more.
>
> You shouldn't depend on that. Shutdown methods generally put the
> device into a state suitable for power-off or reboot; they don't often
> guarantee that the driver won't change the state later on.
>
> On the whole, it might be easier just to hold the device lock during
> the shutdown call.

That sounds much simpler to me.

greg k-h

2012-06-05 20:44:18

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, 6 Jun 2012, Greg Kroah-Hartman wrote:

> > On the whole, it might be easier just to hold the device lock during
> > the shutdown call.
>
> That sounds much simpler to me.

Maybe use device_trylock() in a loop, and if the lock can't be acquired
within a second or so, either call the shutdown method without the lock
or else skip the device entirely.

That way, crashed or buggy drivers will have less chance of interfering
with system shutdown.

Alan Stern

2012-06-06 02:27:16

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 6, 2012 at 1:09 AM, Alan Stern <[email protected]> wrote:

>> After device_shutdown() has been called, the whole system will enter power down
>> or reset later, so it doesn't matter if who should manage the device.
>
> Maybe. ?But there might be quite some time between the shutdown call
> and the eventual power-off or reboot.

Between device_shutdown and machine_halt, there is only syscore_shutdown left
to complete, so no much time is involved.

>
> On the whole, it might be easier just to hold the device lock during
> the shutdown call.

Yes, it is easier, but it is not efficient because there are very less
drivers which implemented .shutdown callback. Suppose there are some
drivers which have no .shutdown but are holding device lock, extra
time will be consumed in the lock acquiring, which may slow down 'power
down'.

>
>> >> @@ -1803,6 +1806,9 @@ void device_shutdown(void)
>> >> ?{
>> >> ? ? ? struct device *dev;
>> >>
>> >> + ? ? ACCESS_ONCE(device_shutdown_started) = 1;
>> >
>> > This does not need to be ACCESS_ONCE. ?ACCESS_ONCE is needed only in
>> > situations where the compiler might insert multiple reads or writes.
>>
>> I have talked with Paul about the ACCESS_ONCE, and Paul thought it is needed
>> to prevent the compiler from doing byte-at-a-time stores.
>
> Why on earth would a compiler change this into a byte-at-a-time store?
> And even if it did, what problem would that cause? ?Only one of the
> bytes would be different from the original value.

Maybe Paul can answer the question...

>
>> Maybe Paul has a detailed explanation about it.
>
>> >> ?static int really_probe(struct device *dev, struct device_driver *drv)
>> >> ?{
>> >> ? ? ? int ret = 0;
>> >> + ? ? int idx;
>> >>
>> >> + ? ? idx = srcu_read_lock(&driver_srcu);
>> >> ? ? ? atomic_inc(&probe_count);
>> >> + ? ? if (ACCESS_ONCE(device_shutdown_started))
>> >
>> > This doesn't need to be ACCESS_ONCE either. ?If it would make you feel
>>
>> Because the global variable of device_shutdown_started is not changed inside
>> the function, the compiler may put the above line after the line of
>> 'dev->driver = drv',
>
> No, it can't do that. ?If the compiler moved this read farther down,
> and as a result the CPU wrote to dev->driver when it shouldn't have,
> the result would be incorrect (i.e., different from what would have
> happened if the statement hadn't been moved). ?Hence the compiler is
> prohibited from making such an optimization.

There are some similar examples(check on global variable is moved) with
ACCESS_ONCE usage in Documentation/atomic_ops.txt. (from line 88)


Thanks,
--
Ming Lei

2012-06-06 13:42:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 06, 2012 at 10:27:04AM +0800, Ming Lei wrote:
> On Wed, Jun 6, 2012 at 1:09 AM, Alan Stern <[email protected]> wrote:
>
> >> After device_shutdown() has been called, the whole system will enter power down
> >> or reset later, so it doesn't matter if who should manage the device.
> >
> > Maybe. ?But there might be quite some time between the shutdown call
> > and the eventual power-off or reboot.
>
> Between device_shutdown and machine_halt, there is only syscore_shutdown left
> to complete, so no much time is involved.
>
> >
> > On the whole, it might be easier just to hold the device lock during
> > the shutdown call.
>
> Yes, it is easier, but it is not efficient because there are very less
> drivers which implemented .shutdown callback. Suppose there are some
> drivers which have no .shutdown but are holding device lock, extra
> time will be consumed in the lock acquiring, which may slow down 'power
> down'.
>
> >
> >> >> @@ -1803,6 +1806,9 @@ void device_shutdown(void)
> >> >> ?{
> >> >> ? ? ? struct device *dev;
> >> >>
> >> >> + ? ? ACCESS_ONCE(device_shutdown_started) = 1;
> >> >
> >> > This does not need to be ACCESS_ONCE. ?ACCESS_ONCE is needed only in
> >> > situations where the compiler might insert multiple reads or writes.
> >>
> >> I have talked with Paul about the ACCESS_ONCE, and Paul thought it is needed
> >> to prevent the compiler from doing byte-at-a-time stores.
> >
> > Why on earth would a compiler change this into a byte-at-a-time store?
> > And even if it did, what problem would that cause? ?Only one of the
> > bytes would be different from the original value.
>
> Maybe Paul can answer the question...

No sane compiler would change it to a byte-at-a-time store, but the
compiler would nevertheless be within its rights to do so. And a quick
review of certain LKML threads could easily cause anyone to question gcc's
sanity. Furthermore, the compiler is permitted to make transformations
like the following, which it might well do to save a branch:

if (b) a = 0;
a = 1; if (b)
else a = 1;
a = 0;

In short, without ACCESS_ONCE(), the compiler is within its rights to
assume that the code is single-threaded. There are a large number of
non-obvious optimizations that are safe in single-threaded code but
destructive in multithreaded code.

In addition, and perhaps more important, ACCESS_ONCE() serves as useful
documentation of the fact that the variable is accessed outside of locks.

Thanx, Paul

> >> Maybe Paul has a detailed explanation about it.
> >
> >> >> ?static int really_probe(struct device *dev, struct device_driver *drv)
> >> >> ?{
> >> >> ? ? ? int ret = 0;
> >> >> + ? ? int idx;
> >> >>
> >> >> + ? ? idx = srcu_read_lock(&driver_srcu);
> >> >> ? ? ? atomic_inc(&probe_count);
> >> >> + ? ? if (ACCESS_ONCE(device_shutdown_started))
> >> >
> >> > This doesn't need to be ACCESS_ONCE either. ?If it would make you feel
> >>
> >> Because the global variable of device_shutdown_started is not changed inside
> >> the function, the compiler may put the above line after the line of
> >> 'dev->driver = drv',
> >
> > No, it can't do that. ?If the compiler moved this read farther down,
> > and as a result the CPU wrote to dev->driver when it shouldn't have,
> > the result would be incorrect (i.e., different from what would have
> > happened if the statement hadn't been moved). ?Hence the compiler is
> > prohibited from making such an optimization.
>
> There are some similar examples(check on global variable is moved) with
> ACCESS_ONCE usage in Documentation/atomic_ops.txt. (from line 88)
>
>
> Thanks,
> --
> Ming Lei
>

2012-06-06 14:44:08

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, 6 Jun 2012, Ming Lei wrote:

> On Wed, Jun 6, 2012 at 1:09 AM, Alan Stern <[email protected]> wrote:
>
> >> After device_shutdown() has been called, the whole system will enter power down
> >> or reset later, so it doesn't matter if who should manage the device.
> >
> > Maybe. ?But there might be quite some time between the shutdown call
> > and the eventual power-off or reboot.
>
> Between device_shutdown and machine_halt, there is only syscore_shutdown left
> to complete, so no much time is involved.

Not much time from the user's point of view, but computers can do a lot
in a short time.

> > On the whole, it might be easier just to hold the device lock during
> > the shutdown call.
>
> Yes, it is easier, but it is not efficient because there are very less
> drivers which implemented .shutdown callback. Suppose there are some
> drivers which have no .shutdown but are holding device lock, extra
> time will be consumed in the lock acquiring, which may slow down 'power
> down'.

The main reasons for holding the device lock are probe and remove
callbacks. In your patch, synchronize_srcu has to wait for those too.
So there won't be much extra time.

> >> Because the global variable of device_shutdown_started is not changed inside
> >> the function, the compiler may put the above line after the line of
> >> 'dev->driver = drv',
> >
> > No, it can't do that. ?If the compiler moved this read farther down,
> > and as a result the CPU wrote to dev->driver when it shouldn't have,
> > the result would be incorrect (i.e., different from what would have
> > happened if the statement hadn't been moved). ?Hence the compiler is
> > prohibited from making such an optimization.
5B5B>
> There are some similar examples(check on global variable is moved) with
> ACCESS_ONCE usage in Documentation/atomic_ops.txt. (from line 88)

Most of those examples involve repeatedly reading a global variable.
In your case there is no repetition, and you are writing rather than
reading.

Furthermore, I think some of those examples go a little too far.
Here's an extract from the file:

-------------------------------------------------------------------
For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed:

if (a)
b = 9;
else
b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following:

b = 42;
if (a)
b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero. To prevent
the compiler from doing this, write something like:

if (a)
ACCESS_ONCE(b) = 9;
else
ACCESS_ONCE(b) = 42;
-------------------------------------------------------------------

That just seems wrong. By the same reasoning, the compiler is within
its rights to transform either the original code or the code using
ACCESS_ONCE into:

b = 999;
if (a)
b = 9;
else
b = 42;

and again, other code would be confused. The simple fact is that
SMP-safe code is not likely to be produced by a compiler that assumes
everything is single-threaded.

Alan Stern

2012-06-06 15:21:55

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, 6 Jun 2012, Paul E. McKenney wrote:

> No sane compiler would change it to a byte-at-a-time store, but the
> compiler would nevertheless be within its rights to do so. And a quick
> review of certain LKML threads could easily cause anyone to question gcc's
> sanity. Furthermore, the compiler is permitted to make transformations
> like the following, which it might well do to save a branch:
>
> if (b) a = 0;
> a = 1; if (b)
> else a = 1;
> a = 0;

The compiler would be forbidden if the original code were

if (b)
ACCESS_ONCE(a) = 1;
else
ACCESS_ONCE(a) = 0;

But if I remember correctly, the code snippet we were talking was more
like:

if (ACCESS_ONCE(b))
a = 1;

Isn't this use of ACCESS_ONCE unnecessary?

> In short, without ACCESS_ONCE(), the compiler is within its rights to
> assume that the code is single-threaded. There are a large number of
> non-obvious optimizations that are safe in single-threaded code but
> destructive in multithreaded code.

Followed to its logical conclusion, this means that virtually every
access to a shared variable that isn't protected by some sort of lock
or isn't one of the special atomic operations needs to use
ACCESS_ONCE. The kernel must be riddled with places that don't do
this.

Besides, how sure are you that even in the presence of ACCESS_ONCE, the
compiler will not make any unsafe transformations? For example, is the
compiler forbidden from transforming

if (a)
ACCESS_ONCE(b) = 5;

into

tmp = c;
c = 999;
if (a)
ACCESS_ONCE(b) = 5;
c = tmp;

? After all, the c variable isn't protected by ACCESS_ONCE in the
original code. And yet this transformation is clearly _unsafe_ for
multi-threaded operation.

> In addition, and perhaps more important, ACCESS_ONCE() serves as useful
> documentation of the fact that the variable is accessed outside of locks.

True.

Alan Stern


2012-06-06 15:27:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 06, 2012 at 10:44:05AM -0400, Alan Stern wrote:
> On Wed, 6 Jun 2012, Ming Lei wrote:

[ . . . ]

> > There are some similar examples(check on global variable is moved) with
> > ACCESS_ONCE usage in Documentation/atomic_ops.txt. (from line 88)
>
> Most of those examples involve repeatedly reading a global variable.
> In your case there is no repetition, and you are writing rather than
> reading.
>
> Furthermore, I think some of those examples go a little too far.
> Here's an extract from the file:
>
> -------------------------------------------------------------------
> For a final example, consider the following code, assuming that the
> variable a is set at boot time before the second CPU is brought online
> and never changed later, so that memory barriers are not needed:
>
> if (a)
> b = 9;
> else
> b = 42;
>
> The compiler is within its rights to manufacture an additional store
> by transforming the above code into the following:
>
> b = 42;
> if (a)
> b = 9;
>
> This could come as a fatal surprise to other code running concurrently
> that expected b to never have the value 42 if a was zero. To prevent
> the compiler from doing this, write something like:
>
> if (a)
> ACCESS_ONCE(b) = 9;
> else
> ACCESS_ONCE(b) = 42;
> -------------------------------------------------------------------
>
> That just seems wrong. By the same reasoning, the compiler is within
> its rights to transform either the original code or the code using
> ACCESS_ONCE into:
>
> b = 999;
> if (a)
> b = 9;
> else
> b = 42;
>
> and again, other code would be confused. The simple fact is that
> SMP-safe code is not likely to be produced by a compiler that assumes
> everything is single-threaded.

If you use ACCESS_ONCE(), the compiler is prohibited from inserting
the "b = 999". If you don't use ACCESS_ONCE(), the compiler really
is permitted to insert the "b = 999". So, why would the compiler do
such a thing? One possible reason would be from optimizations using
large registers to hold multiple values. A store from such a register
could clobber unrelated variables, but as long as the compiler fixes
up the clobbering after the fact, it is within its rights to do so.

The sad fact is that the C standard really does permit the compiler
to assume that it is generating sequential code.

Thanx, Paul

2012-06-06 15:44:53

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, 6 Jun 2012, Paul E. McKenney wrote:

> > That just seems wrong. By the same reasoning, the compiler is within
> > its rights to transform either the original code or the code using
> > ACCESS_ONCE into:
> >
> > b = 999;
> > if (a)
> > b = 9;
> > else
> > b = 42;
> >
> > and again, other code would be confused. The simple fact is that
> > SMP-safe code is not likely to be produced by a compiler that assumes
> > everything is single-threaded.
>
> If you use ACCESS_ONCE(), the compiler is prohibited from inserting
> the "b = 999".

What prohibits it?

> If you don't use ACCESS_ONCE(), the compiler really
> is permitted to insert the "b = 999". So, why would the compiler do
> such a thing? One possible reason would be from optimizations using
> large registers to hold multiple values. A store from such a register
> could clobber unrelated variables, but as long as the compiler fixes
> up the clobbering after the fact, it is within its rights to do so.
>
> The sad fact is that the C standard really does permit the compiler
> to assume that it is generating sequential code.

Compiling the kernel requires quite a few extensions to the C standard.
Assumptions about generating sequential code may well be among them.

Alan Stern

2012-06-06 15:48:36

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 06, 2012 at 11:21:52AM -0400, Alan Stern wrote:
> On Wed, 6 Jun 2012, Paul E. McKenney wrote:
>
> > No sane compiler would change it to a byte-at-a-time store, but the
> > compiler would nevertheless be within its rights to do so. And a quick
> > review of certain LKML threads could easily cause anyone to question gcc's
> > sanity. Furthermore, the compiler is permitted to make transformations
> > like the following, which it might well do to save a branch:
> >
> > if (b) a = 0;
> > a = 1; if (b)
> > else a = 1;
> > a = 0;
>
> The compiler would be forbidden if the original code were
>
> if (b)
> ACCESS_ONCE(a) = 1;
> else
> ACCESS_ONCE(a) = 0;
>
> But if I remember correctly, the code snippet we were talking was more
> like:
>
> if (ACCESS_ONCE(b))
> a = 1;
>
> Isn't this use of ACCESS_ONCE unnecessary?

That would depend on what else is nearby.

> > In short, without ACCESS_ONCE(), the compiler is within its rights to
> > assume that the code is single-threaded. There are a large number of
> > non-obvious optimizations that are safe in single-threaded code but
> > destructive in multithreaded code.
>
> Followed to its logical conclusion, this means that virtually every
> access to a shared variable that isn't protected by some sort of lock
> or isn't one of the special atomic operations needs to use
> ACCESS_ONCE. The kernel must be riddled with places that don't do
> this.

Agreed, we are relying on the compiler being unaggressive about
optimizations, which it usually is. But it will continue becoming
more aggressive over time, so we should at least apply ACCESS_ONCE()
to new code.

> Besides, how sure are you that even in the presence of ACCESS_ONCE, the
> compiler will not make any unsafe transformations? For example, is the
> compiler forbidden from transforming
>
> if (a)
> ACCESS_ONCE(b) = 5;
>
> into
>
> tmp = c;
> c = 999;
> if (a)
> ACCESS_ONCE(b) = 5;
> c = tmp;
>
> ? After all, the c variable isn't protected by ACCESS_ONCE in the
> original code. And yet this transformation is clearly _unsafe_ for
> multi-threaded operation.

There are some limitations because volatile accesses are not allowed to
move past "sequence points", but it would be possible to come up with
similar examples. This sort of thing is why C1x has a memory model and
why it allows variables to be designated as needing to be SMP-safe.

In the meantime, things like ACCESS_ONCE() are what is available, limited
though they are.

Thanx, Paul

> > In addition, and perhaps more important, ACCESS_ONCE() serves as useful
> > documentation of the fact that the variable is accessed outside of locks.
>
> True.
>
> Alan Stern
>
>
>

2012-06-06 15:57:58

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 06, 2012 at 11:44:50AM -0400, Alan Stern wrote:
> On Wed, 6 Jun 2012, Paul E. McKenney wrote:
>
> > > That just seems wrong. By the same reasoning, the compiler is within
> > > its rights to transform either the original code or the code using
> > > ACCESS_ONCE into:
> > >
> > > b = 999;
> > > if (a)
> > > b = 9;
> > > else
> > > b = 42;
> > >
> > > and again, other code would be confused. The simple fact is that
> > > SMP-safe code is not likely to be produced by a compiler that assumes
> > > everything is single-threaded.
> >
> > If you use ACCESS_ONCE(), the compiler is prohibited from inserting
> > the "b = 999".
>
> What prohibits it?

The compiler cannot move a volatile access across a sequence point, for
example, across a statement boundary.

That said, yes, there might be code preceding the "if" that allowed the
spurious store to "b" to be generated. And the compiler would definitely
be permitted to do something like this:

tmp = ACCESS_ONCE(a);
b = 999;
if (tmp)
b = 9;
else
b = 42;

I am having some difficulty coming up with a reasonable rationale for
this transformation, but it might happen if there was a variable "c"
adjacent to "b" in memory that was accessed after the "if" statement.

> > If you don't use ACCESS_ONCE(), the compiler really
> > is permitted to insert the "b = 999". So, why would the compiler do
> > such a thing? One possible reason would be from optimizations using
> > large registers to hold multiple values. A store from such a register
> > could clobber unrelated variables, but as long as the compiler fixes
> > up the clobbering after the fact, it is within its rights to do so.
> >
> > The sad fact is that the C standard really does permit the compiler
> > to assume that it is generating sequential code.
>
> Compiling the kernel requires quite a few extensions to the C standard.
> Assumptions about generating sequential code may well be among them.

Yep. We are making do with gcc extensions for the moment, imperfect though
they are.

Thanx, Paul

2012-06-06 16:05:14

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, 6 Jun 2012, Paul E. McKenney wrote:

> On Wed, Jun 06, 2012 at 11:21:52AM -0400, Alan Stern wrote:
> > On Wed, 6 Jun 2012, Paul E. McKenney wrote:
> >
> > > No sane compiler would change it to a byte-at-a-time store, but the
> > > compiler would nevertheless be within its rights to do so. And a quick
> > > review of certain LKML threads could easily cause anyone to question gcc's
> > > sanity. Furthermore, the compiler is permitted to make transformations
> > > like the following, which it might well do to save a branch:
> > >
> > > if (b) a = 0;
> > > a = 1; if (b)
> > > else a = 1;
> > > a = 0;
> >
> > The compiler would be forbidden if the original code were
> >
> > if (b)
> > ACCESS_ONCE(a) = 1;
> > else
> > ACCESS_ONCE(a) = 0;
> >
> > But if I remember correctly, the code snippet we were talking was more
> > like:
> >
> > if (ACCESS_ONCE(b))
> > a = 1;
> >
> > Isn't this use of ACCESS_ONCE unnecessary?
>
> That would depend on what else is nearby.

Here's the relevant part of the original patch:

@@ -467,6 +473,12 @@ EXPORT_SYMBOL_GPL(driver_attach);
static void __device_release_driver(struct device *dev)
{
struct device_driver *drv;
+ int idx;
+
+ idx = srcu_read_lock(&driver_srcu);
+
+ if (ACCESS_ONCE(device_shutdown_started))
+ goto exit;

drv = dev->driver;
if (drv) {
@@ -494,6 +506,8 @@ static void __device_release_driver(struct device *dev)
dev);

}
+exit:
+ srcu_read_unlock(&driver_srcu, idx);
}

> There are some limitations because volatile accesses are not allowed to
> move past "sequence points", but it would be possible to come up with
> similar examples. This sort of thing is why C1x has a memory model and
> why it allows variables to be designated as needing to be SMP-safe.

Almost certainly the kernel won't use this facility. Or else it will
just require that _all_ global variables be SMP-safe.

Alan Stern

2012-06-06 16:26:29

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 06, 2012 at 12:05:08PM -0400, Alan Stern wrote:
> On Wed, 6 Jun 2012, Paul E. McKenney wrote:
>
> > On Wed, Jun 06, 2012 at 11:21:52AM -0400, Alan Stern wrote:
> > > On Wed, 6 Jun 2012, Paul E. McKenney wrote:
> > >
> > > > No sane compiler would change it to a byte-at-a-time store, but the
> > > > compiler would nevertheless be within its rights to do so. And a quick
> > > > review of certain LKML threads could easily cause anyone to question gcc's
> > > > sanity. Furthermore, the compiler is permitted to make transformations
> > > > like the following, which it might well do to save a branch:
> > > >
> > > > if (b) a = 0;
> > > > a = 1; if (b)
> > > > else a = 1;
> > > > a = 0;
> > >
> > > The compiler would be forbidden if the original code were
> > >
> > > if (b)
> > > ACCESS_ONCE(a) = 1;
> > > else
> > > ACCESS_ONCE(a) = 0;
> > >
> > > But if I remember correctly, the code snippet we were talking was more
> > > like:
> > >
> > > if (ACCESS_ONCE(b))
> > > a = 1;
> > >
> > > Isn't this use of ACCESS_ONCE unnecessary?
> >
> > That would depend on what else is nearby.
>
> Here's the relevant part of the original patch:
>
> @@ -467,6 +473,12 @@ EXPORT_SYMBOL_GPL(driver_attach);
> static void __device_release_driver(struct device *dev)
> {
> struct device_driver *drv;
> + int idx;
> +
> + idx = srcu_read_lock(&driver_srcu);
> +
> + if (ACCESS_ONCE(device_shutdown_started))
> + goto exit;
>
> drv = dev->driver;
> if (drv) {
> @@ -494,6 +506,8 @@ static void __device_release_driver(struct device *dev)
> dev);
>
> }
> +exit:
> + srcu_read_unlock(&driver_srcu, idx);
> }

In this case, the ACCESS_ONCE() prevents the compiler from speculatively
executing the stuff following the "goto exit", which I freely admit is
insane even for compiler writers. But the documentation benefits still
stand.

> > There are some limitations because volatile accesses are not allowed to
> > move past "sequence points", but it would be possible to come up with
> > similar examples. This sort of thing is why C1x has a memory model and
> > why it allows variables to be designated as needing to be SMP-safe.
>
> Almost certainly the kernel won't use this facility. Or else it will
> just require that _all_ global variables be SMP-safe.

I will reserve judgment until after I see what effect requiring all
globals to be SMP-safe has on code generation. ;-)

Thanx, Paul

2012-06-06 16:58:32

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, 6 Jun 2012, Paul E. McKenney wrote:

> On Wed, Jun 06, 2012 at 11:44:50AM -0400, Alan Stern wrote:
> > On Wed, 6 Jun 2012, Paul E. McKenney wrote:
> >
> > > > That just seems wrong. By the same reasoning, the compiler is within
> > > > its rights to transform either the original code or the code using
> > > > ACCESS_ONCE into:
> > > >
> > > > b = 999;
> > > > if (a)
> > > > b = 9;
> > > > else
> > > > b = 42;
> > > >
> > > > and again, other code would be confused. The simple fact is that
> > > > SMP-safe code is not likely to be produced by a compiler that assumes
> > > > everything is single-threaded.
> > >
> > > If you use ACCESS_ONCE(), the compiler is prohibited from inserting
> > > the "b = 999".
> >
> > What prohibits it?
>
> The compiler cannot move a volatile access across a sequence point, for
> example, across a statement boundary.

How does inserting a store to a non-volatile value qualify as moving a
volatile access?

Alan Stern

2012-06-06 23:25:16

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 06, 2012 at 12:58:29PM -0400, Alan Stern wrote:
> On Wed, 6 Jun 2012, Paul E. McKenney wrote:
>
> > On Wed, Jun 06, 2012 at 11:44:50AM -0400, Alan Stern wrote:
> > > On Wed, 6 Jun 2012, Paul E. McKenney wrote:
> > >
> > > > > That just seems wrong. By the same reasoning, the compiler is within
> > > > > its rights to transform either the original code or the code using
> > > > > ACCESS_ONCE into:
> > > > >
> > > > > b = 999;
> > > > > if (a)
> > > > > b = 9;
> > > > > else
> > > > > b = 42;
> > > > >
> > > > > and again, other code would be confused. The simple fact is that
> > > > > SMP-safe code is not likely to be produced by a compiler that assumes
> > > > > everything is single-threaded.
> > > >
> > > > If you use ACCESS_ONCE(), the compiler is prohibited from inserting
> > > > the "b = 999".
> > >
> > > What prohibits it?
> >
> > The compiler cannot move a volatile access across a sequence point, for
> > example, across a statement boundary.
>
> How does inserting a store to a non-volatile value qualify as moving a
> volatile access?

Assuming that there is no access to "b" prior to the "if", then any
access to or modification of "b" would have to be associated with the
two assignments, which could not precede the volatile (at least as I
understand the sequencing rules). If there is some use of "b" prior
to the "if", then yes, the "b = 999" might be associated with those
prior accesses.

Thanx, Paul

2012-06-07 09:30:06

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove

On Wed, Jun 6, 2012 at 10:44 PM, Alan Stern <[email protected]> wrote:

>> > On the whole, it might be easier just to hold the device lock during
>> > the shutdown call.
>>
>> Yes, it is easier, but it is not efficient because there are very less
>> drivers which implemented .shutdown callback. Suppose there are some
>> drivers which have no .shutdown but are holding device lock, ?extra
>> time will be consumed in the lock acquiring, which may slow down 'power
>> down'.
>
> The main reasons for holding the device lock are probe and remove
> callbacks. ?In your patch, synchronize_srcu has to wait for those too.
> So there won't be much extra time.

There are differences between holding dev->lock(parent->lock) and
using synchronize_srcu:

- synchronize_srcu just wait for the ongoing probe/release
to complete

- holding dev->lock and parent->lock before shutdown means
every dev->lock and parent->lock is to be acquired, so any
drivers holding dev lock during device_shutdown will slow down
'power down' or 'reset'.

- if the device has not .shutdown implemented, it is not
necessary to acquire the lock and parent->lock, either of
which may has been held in other places already.

But after thinking about the problem further, the patch isn't good enough:

- even probe/remove is disallowed during shutdown, device_add/
device_del can be completed and only probe and release things
are bypassed, so .shutdown may see a deleted device from view of
driver model. For example, pci_device_shutdown need to access
its devres which may have been deleted by device_del already.

- violate driver model's implicit rule: once device_release_driver
(in device_del path) is completed, other callbacks for the device(driver)
won't be seen or called by others

So take the simply approach of holding lock to fix the races.


Thanks,
--
Ming Lei