2008-02-20 16:43:46

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Tue, 19 Feb 2008 [email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10030
>
>
>
>
>
> ------- Comment #14 from [email protected] 2008-02-19 15:23 -------
> Thanks a lot for the debugging work!
>
> First, the patch triggers, which means that the problem discovered by Alan is
> troubling us. [Alan, do you have an idea how to fix that cleanly?]

I suggest we ask the maintainer for the MMC subsystem.

Pierre, you can find the details in the bugzilla entry. Briefly,
there's a pathway in the MMC core suspend routine (if the driver
doesn't implement a resume hook) which could lead to the host being
removed during a system suspend. This is an illegal operation and it
will deadlock.

Do you have a suggestion for a way to fix it?

Alan Stern


2008-02-20 17:31:06

by Pierre Ossman

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wed, 20 Feb 2008 11:42:56 -0500 (EST)
Alan Stern <[email protected]> wrote:

> >
> > ------- Comment #14 from [email protected] 2008-02-19 15:23 -------
> > Thanks a lot for the debugging work!
> >
> > First, the patch triggers, which means that the problem discovered by Alan is
> > troubling us. [Alan, do you have an idea how to fix that cleanly?]
>
> I suggest we ask the maintainer for the MMC subsystem.
>
> Pierre, you can find the details in the bugzilla entry. Briefly,
> there's a pathway in the MMC core suspend routine (if the driver
> doesn't implement a resume hook) which could lead to the host being
> removed during a system suspend. This is an illegal operation and it
> will deadlock.
>
> Do you have a suggestion for a way to fix it?
>

Not really. But you have some things confused. What it checks is if the mmc bus handler (not a proper driver model, just a way of separating the MMC, SD and SDIO stuff) has a resume function. And if it doesn't, it removes the card (since it cannot revive it at resume).

So the only thing I can think of is to delay the removal until the resume routine, if that is safer.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-02-20 19:26:31

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wed, 20 Feb 2008, Pierre Ossman wrote:

> Not really. But you have some things confused. What it checks is if
> the mmc bus handler (not a proper driver model, just a way of
> separating the MMC, SD and SDIO stuff) has a resume function. And if
> it doesn't, it removes the card (since it cannot revive it at
> resume).
>
> So the only thing I can think of is to delay the removal until the
> resume routine, if that is safer.

Do I understand this correctly? You've got special handling for the
case where a bus handler doesn't have a resume routine, but no special
handling for the case where it doesn't have a suspend routine? Why
bother to remove the device if neither routine exists (there won't be
any need to revive it since the bus never got suspended)?

And why not simply fail the suspend if the resume routine doesn't exist
and the suspend routine does? Maybe with an error message in the
system log.

Alan Stern

2008-02-20 20:51:55

by Pierre Ossman

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wed, 20 Feb 2008 14:26:16 -0500 (EST)
Alan Stern <[email protected]> wrote:

>
> Do I understand this correctly? You've got special handling for the
> case where a bus handler doesn't have a resume routine, but no special
> handling for the case where it doesn't have a suspend routine?

Hmm... There should be checks for both, but the code seems to suggest otherwise.

> Why bother to remove the device if neither routine exists (there won't be
> any need to revive it since the bus never got suspended)?

The bus always gets suspended. The checks are to determine if state is saved or not. If it isn't, then a suspend/resume is treated as a removal/insertion.

> And why not simply fail the suspend if the resume routine doesn't exist
> and the suspend routine does? Maybe with an error message in the
> system log.

For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-02-20 21:00:22

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wednesday, 20 of February 2008, Pierre Ossman wrote:
> On Wed, 20 Feb 2008 14:26:16 -0500 (EST)
> Alan Stern <[email protected]> wrote:
>
> >
> > Do I understand this correctly? You've got special handling for the
> > case where a bus handler doesn't have a resume routine, but no special
> > handling for the case where it doesn't have a suspend routine?
>
> Hmm... There should be checks for both, but the code seems to suggest otherwise.
>
> > Why bother to remove the device if neither routine exists (there won't be
> > any need to revive it since the bus never got suspended)?
>
> The bus always gets suspended. The checks are to determine if state is saved or not. If it isn't, then a suspend/resume is treated as a removal/insertion.
>
> > And why not simply fail the suspend if the resume routine doesn't exist
> > and the suspend routine does? Maybe with an error message in the
> > system log.
>
> For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling.

I think I know how to handle that, but there's a more urgent issue I need to
fix first. Stay tuned. :-)

Thanks,
Rafael

2008-02-20 21:06:24

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wed, 20 Feb 2008, Pierre Ossman wrote:

> > And why not simply fail the suspend if the resume routine doesn't exist
> > and the suspend routine does? Maybe with an error message in the
> > system log.
>
> For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling.

Then it appears the correct approach is to use the new
device_pm_schedule_removal() routine. It schedules the removal of a
suspended device for a time when the system suspend is over.

Alan Stern

2008-02-20 22:18:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wednesday, 20 of February 2008, Alan Stern wrote:
> On Wed, 20 Feb 2008, Pierre Ossman wrote:
>
> > > And why not simply fail the suspend if the resume routine doesn't exist
> > > and the suspend routine does? Maybe with an error message in the
> > > system log.
> >
> > For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling.
>
> Then it appears the correct approach is to use the new
> device_pm_schedule_removal() routine. It schedules the removal of a
> suspended device for a time when the system suspend is over.

Well, below is an uncompiled and untested but illustrating the idea that
might allow people not to bother with device_pm_schedule_removal()
explicitly and can fix the issue at hand.

[There are some cases that need handling and are not covered here.]

Please have a look.

Thanks,
Rafael

---
drivers/base/core.c | 5 ++++-
drivers/base/power/main.c | 22 ++++++++++++++++++++++
drivers/base/power/power.h | 5 +++++
3 files changed, 31 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem);

int (*platform_enable_wakeup)(struct device *dev, int is_on);

+static struct task_struct *suspending_task;
+static DEFINE_MUTEX(suspending_task_mtx);
+
+bool in_suspend_context(void)
+{
+ bool result;
+
+ mutex_lock(&suspending_task_mtx);
+ result = (suspending_task == current);
+ mutex_unlock(&suspending_task_mtx);
+ return result;
+}
+
+static void set_suspending_task(struct task_struct *task)
+{
+ mutex_lock(&suspending_task_mtx);
+ suspending_task = task;
+ mutex_unlock(&suspending_task_mtx);
+}
+
/**
* device_pm_add - add a device to the list of active devices
* @dev: Device to be added to the list
@@ -272,6 +292,7 @@ static void dpm_resume(void)
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
+ set_suspending_task(NULL);
}

/**
@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
{
int error = 0;

+ set_suspending_task(current);
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
dev = class_find_device(class, &devt, __match_devt);
if (dev) {
put_device(dev);
- device_unregister(dev);
+ if (in_suspend_context())
+ device_pm_schedule_removal(dev);
+ else
+ device_unregister(dev);
}
}
EXPORT_SYMBOL_GPL(device_destroy);
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
}

+extern bool in_suspend_context(void);
extern void device_pm_add(struct device *);
extern void device_pm_remove(struct device *);
extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);

#else /* CONFIG_PM_SLEEP */

+static inline bool in_suspend_context(void)
+{
+ return false;
+}

static inline void device_pm_add(struct device *dev)
{

2008-02-20 22:24:26

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:

> Well, below is an uncompiled and untested but illustrating the idea that
> might allow people not to bother with device_pm_schedule_removal()
> explicitly and can fix the issue at hand.
>
> [There are some cases that need handling and are not covered here.]
>
> Please have a look.
>
> Thanks,
> Rafael

> +static struct task_struct *suspending_task;
> +static DEFINE_MUTEX(suspending_task_mtx);

I suspect you don't really need this mutex.

> +bool in_suspend_context(void)
> +{
> + bool result;
> +
> + mutex_lock(&suspending_task_mtx);
> + result = (suspending_task == current);
> + mutex_unlock(&suspending_task_mtx);
> + return result;
> +}

If suspending_task == current then you are guaranteed to be serialized,
because everything a single task does is serial.

> @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
> dev = class_find_device(class, &devt, __match_devt);
> if (dev) {
> put_device(dev);
> - device_unregister(dev);
> + if (in_suspend_context())
> + device_pm_schedule_removal(dev);
> + else
> + device_unregister(dev);
> }
> }

But what about device_del()? Can a similar change be made there?

Alan Stern

2008-02-20 22:43:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wednesday, 20 of February 2008, Alan Stern wrote:
> On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
>
> > Well, below is an uncompiled and untested but illustrating the idea that
> > might allow people not to bother with device_pm_schedule_removal()
> > explicitly and can fix the issue at hand.
> >
> > [There are some cases that need handling and are not covered here.]
> >
> > Please have a look.
> >
> > Thanks,
> > Rafael
>
> > +static struct task_struct *suspending_task;
> > +static DEFINE_MUTEX(suspending_task_mtx);
>
> I suspect you don't really need this mutex.
>
> > +bool in_suspend_context(void)
> > +{
> > + bool result;
> > +
> > + mutex_lock(&suspending_task_mtx);
> > + result = (suspending_task == current);
> > + mutex_unlock(&suspending_task_mtx);
> > + return result;
> > +}
>
> If suspending_task == current then you are guaranteed to be serialized,
> because everything a single task does is serial.

But in principle there could be a concurrent thread removind the device and
that should block on dev->sem held by us.

Right now that's not very likely to happen thanks to the freezer, but we're
doing all this stuff, because we want to get rid of the freezer eventually. :-)

> > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
> > dev = class_find_device(class, &devt, __match_devt);
> > if (dev) {
> > put_device(dev);
> > - device_unregister(dev);
> > + if (in_suspend_context())
> > + device_pm_schedule_removal(dev);
> > + else
> > + device_unregister(dev);
> > }
> > }
>
> But what about device_del()? Can a similar change be made there?

I believe so.

I'm working on a more complete patch right now.

Thanks,
Rafael

2008-02-21 00:03:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Wednesday, 20 of February 2008, Alan Stern wrote:
> On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
>
> > Well, below is an uncompiled and untested but illustrating the idea that
> > might allow people not to bother with device_pm_schedule_removal()
> > explicitly and can fix the issue at hand.
> >
> > [There are some cases that need handling and are not covered here.]
> >
> > Please have a look.
> >
> > Thanks,
> > Rafael
>
> > +static struct task_struct *suspending_task;
> > +static DEFINE_MUTEX(suspending_task_mtx);
>
> I suspect you don't really need this mutex.
>
> > +bool in_suspend_context(void)
> > +{
> > + bool result;
> > +
> > + mutex_lock(&suspending_task_mtx);
> > + result = (suspending_task == current);
> > + mutex_unlock(&suspending_task_mtx);
> > + return result;
> > +}
>
> If suspending_task == current then you are guaranteed to be serialized,
> because everything a single task does is serial.

As I said before (but that doesn't seem to reach the list, so I'm repeating),
this is to protect other tasks from reading an inconsistent value of
suspending_task in case they attempt to remove a device concurrently with
respect to us.

While this is not likely to happen right now, because of the freezer, it may
very well happen when the freezer is finally removed.

> > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
> > dev = class_find_device(class, &devt, __match_devt);
> > if (dev) {
> > put_device(dev);
> > - device_unregister(dev);
> > + if (in_suspend_context())
> > + device_pm_schedule_removal(dev);
> > + else
> > + device_unregister(dev);
> > }
> > }
>
> But what about device_del()? Can a similar change be made there?

Something like the patch below, perhaps? Again, untested, but compiled this
time. :-)

Thanks,
Rafael

---
drivers/base/core.c | 5 +++++
drivers/base/power/main.c | 22 ++++++++++++++++++++++
drivers/base/power/power.h | 5 +++++
3 files changed, 32 insertions(+)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem);

int (*platform_enable_wakeup)(struct device *dev, int is_on);

+static struct task_struct *suspending_task;
+static DEFINE_MUTEX(suspending_task_mtx);
+
+bool in_suspend_context(void)
+{
+ bool result;
+
+ mutex_lock(&suspending_task_mtx);
+ result = (suspending_task == current);
+ mutex_unlock(&suspending_task_mtx);
+ return result;
+}
+
+static void set_suspending_task(struct task_struct *task)
+{
+ mutex_lock(&suspending_task_mtx);
+ suspending_task = task;
+ mutex_unlock(&suspending_task_mtx);
+}
+
/**
* device_pm_add - add a device to the list of active devices
* @dev: Device to be added to the list
@@ -272,6 +292,7 @@ static void dpm_resume(void)
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
+ set_suspending_task(NULL);
}

/**
@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
{
int error = 0;

+ set_suspending_task(current);
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -929,6 +929,11 @@ void device_del(struct device *dev)
struct device *parent = dev->parent;
struct class_interface *class_intf;

+ if (in_suspend_context()) {
+ get_device(dev);
+ device_pm_schedule_removal(dev);
+ return;
+ }
device_pm_remove(dev);
if (parent)
klist_del(&dev->knode_parent);
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
}

+extern bool in_suspend_context(void);
extern void device_pm_add(struct device *);
extern void device_pm_remove(struct device *);
extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);

#else /* CONFIG_PM_SLEEP */

+static inline bool in_suspend_context(void)
+{
+ return false;
+}

static inline void device_pm_add(struct device *dev)
{

2008-02-21 16:28:32

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

> > > +bool in_suspend_context(void)
> > > +{
> > > + bool result;
> > > +
> > > + mutex_lock(&suspending_task_mtx);
> > > + result = (suspending_task == current);
> > > + mutex_unlock(&suspending_task_mtx);
> > > + return result;
> > > +}
> >
> > If suspending_task == current then you are guaranteed to be serialized,
> > because everything a single task does is serial.
>
> As I said before (but that doesn't seem to reach the list, so I'm repeating),
> this is to protect other tasks from reading an inconsistent value of
> suspending_task in case they attempt to remove a device concurrently with
> respect to us.
>
> While this is not likely to happen right now, because of the freezer, it may
> very well happen when the freezer is finally removed.

Sorry, I don't understand. Are you worried that process A might set
suspending_task = A but then process B might still see suspending_task
== NULL? Or that A might set suspend_task = NULL but then B might
still see suspending_task == A?

Neither one will cause any problem, since the only case that matters is
when B sees suspending_task == B -- and that can happen if and only if
B was the last process to set suspending_task.

In fact, you might as well get rid of the set_suspending_task() routine
entirely and just put the assignments inline.

> --- linux-2.6.orig/drivers/base/core.c
> +++ linux-2.6/drivers/base/core.c
> @@ -929,6 +929,11 @@ void device_del(struct device *dev)
> struct device *parent = dev->parent;
> struct class_interface *class_intf;
>
> + if (in_suspend_context()) {
> + get_device(dev);

Where is this get_device() undone? Shouldn't there be an extra
put_device() added to unregister_dropped_devices()?

> + device_pm_schedule_removal(dev);
> + return;
> + }
> device_pm_remove(dev);
> if (parent)
> klist_del(&dev->knode_parent);

And now the change to device_destroy() isn't needed at all.

Alan Stern

2008-02-21 16:40:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Thursday, 21 of February 2008, Alan Stern wrote:
> On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
>
> > > > +bool in_suspend_context(void)
> > > > +{
> > > > + bool result;
> > > > +
> > > > + mutex_lock(&suspending_task_mtx);
> > > > + result = (suspending_task == current);
> > > > + mutex_unlock(&suspending_task_mtx);
> > > > + return result;
> > > > +}
> > >
> > > If suspending_task == current then you are guaranteed to be serialized,
> > > because everything a single task does is serial.
> >
> > As I said before (but that doesn't seem to reach the list, so I'm repeating),
> > this is to protect other tasks from reading an inconsistent value of
> > suspending_task in case they attempt to remove a device concurrently with
> > respect to us.
> >
> > While this is not likely to happen right now, because of the freezer, it may
> > very well happen when the freezer is finally removed.
>
> Sorry, I don't understand. Are you worried that process A might set
> suspending_task = A but then process B might still see suspending_task
> == NULL? Or that A might set suspend_task = NULL but then B might
> still see suspending_task == A?
>
> Neither one will cause any problem, since the only case that matters is
> when B sees suspending_task == B -- and that can happen if and only if
> B was the last process to set suspending_task.
>
> In fact, you might as well get rid of the set_suspending_task() routine
> entirely and just put the assignments inline.

OK, I will.

> > --- linux-2.6.orig/drivers/base/core.c
> > +++ linux-2.6/drivers/base/core.c
> > @@ -929,6 +929,11 @@ void device_del(struct device *dev)
> > struct device *parent = dev->parent;
> > struct class_interface *class_intf;
> >
> > + if (in_suspend_context()) {
> > + get_device(dev);
>
> Where is this get_device() undone? Shouldn't there be an extra
> put_device() added to unregister_dropped_devices()?

No, I don't think so, because unregister_dropped_devices() calls
device_unregister() that does the put_device() eventually.

If we are called by device_unregister(), the get_device() is needed to balance
the put_device() that will be called by device_unregister() after we return.

OTOH, if we are called directly, then we need to balance the put_device()
that will be done by device_unregister() called from
unregister_dropped_devices().

I hope I didn't miss anything.

> > + device_pm_schedule_removal(dev);
> > + return;
> > + }
> > device_pm_remove(dev);
> > if (parent)
> > klist_del(&dev->knode_parent);
>
> And now the change to device_destroy() isn't needed at all.

No, it's not. Didn't I remove it? I thought I did.

Thanks,
Rafael

2008-02-21 17:49:14

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

> > > --- linux-2.6.orig/drivers/base/core.c
> > > +++ linux-2.6/drivers/base/core.c
> > > @@ -929,6 +929,11 @@ void device_del(struct device *dev)
> > > struct device *parent = dev->parent;
> > > struct class_interface *class_intf;
> > >
> > > + if (in_suspend_context()) {
> > > + get_device(dev);
> >
> > Where is this get_device() undone? Shouldn't there be an extra
> > put_device() added to unregister_dropped_devices()?
>
> No, I don't think so, because unregister_dropped_devices() calls
> device_unregister() that does the put_device() eventually.

Ah, yes.

> If we are called by device_unregister(), the get_device() is needed to balance
> the put_device() that will be called by device_unregister() after we return.
>
> OTOH, if we are called directly, then we need to balance the put_device()
> that will be done by device_unregister() called from
> unregister_dropped_devices().
>
> I hope I didn't miss anything.

Okay, that sounds right.

> > > + device_pm_schedule_removal(dev);
> > > + return;
> > > + }
> > > device_pm_remove(dev);
> > > if (parent)
> > > klist_del(&dev->knode_parent);
> >
> > And now the change to device_destroy() isn't needed at all.
>
> No, it's not. Didn't I remove it? I thought I did.

Oh yes, you did.

I see a possible problem in device_resume(). My copy isn't fully
up-to-date, but it looks like you call unregister_dropped_devices()
before doing the up_write(&pm_sleep_rwsem). Won't this cause
warnings in device_del() about a suspicious caller?

Alan Stern

2008-02-21 22:49:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Thursday, 21 of February 2008, Alan Stern wrote:
> On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
>
> > > > --- linux-2.6.orig/drivers/base/core.c
> > > > +++ linux-2.6/drivers/base/core.c
> > > > @@ -929,6 +929,11 @@ void device_del(struct device *dev)
> > > > struct device *parent = dev->parent;
> > > > struct class_interface *class_intf;
> > > >
> > > > + if (in_suspend_context()) {
> > > > + get_device(dev);
> > >
> > > Where is this get_device() undone? Shouldn't there be an extra
> > > put_device() added to unregister_dropped_devices()?
> >
> > No, I don't think so, because unregister_dropped_devices() calls
> > device_unregister() that does the put_device() eventually.
>
> Ah, yes.
>
> > If we are called by device_unregister(), the get_device() is needed to balance
> > the put_device() that will be called by device_unregister() after we return.
> >
> > OTOH, if we are called directly, then we need to balance the put_device()
> > that will be done by device_unregister() called from
> > unregister_dropped_devices().
> >
> > I hope I didn't miss anything.
>
> Okay, that sounds right.
>
> > > > + device_pm_schedule_removal(dev);
> > > > + return;
> > > > + }
> > > > device_pm_remove(dev);
> > > > if (parent)
> > > > klist_del(&dev->knode_parent);
> > >
> > > And now the change to device_destroy() isn't needed at all.
> >
> > No, it's not. Didn't I remove it? I thought I did.
>
> Oh yes, you did.
>
> I see a possible problem in device_resume(). My copy isn't fully
> up-to-date, but it looks like you call unregister_dropped_devices()
> before doing the up_write(&pm_sleep_rwsem). Won't this cause
> warnings in device_del() about a suspicious caller?

No, it won't, because the devices' semaphores are unlocked by
unregister_dropped_devices() before calling device_unregister().

BTW, below is a simplified version of the patch, without the mutex protecting
suspending_task. I'd like to push it upstream if it looks good.

Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030.
There seems to be another issue related to us holding devices' semaphores.
Namely, it looks like, when the user removes the card, a concurrent thread
(from a workqueue) calls device_del() and blocks on the dev->sem held by
us and then something else deadlocks with this thread. I'll be looking into
this tomorrow.

Thanks,
Rafael

---
drivers/base/core.c | 5 +++++
drivers/base/power/main.c | 9 +++++++++
drivers/base/power/power.h | 5 +++++
3 files changed, 19 insertions(+)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);

int (*platform_enable_wakeup)(struct device *dev, int is_on);

+static struct task_struct *suspending_task;
+
+bool in_suspend_context(void)
+{
+ return (suspending_task == current);
+}
+
/**
* device_pm_add - add a device to the list of active devices
* @dev: Device to be added to the list
@@ -272,6 +279,7 @@ static void dpm_resume(void)
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
+ suspending_task = NULL;
}

/**
@@ -460,6 +468,7 @@ static int dpm_suspend(pm_message_t stat
{
int error = 0;

+ suspending_task = current;
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -929,6 +929,11 @@ void device_del(struct device *dev)
struct device *parent = dev->parent;
struct class_interface *class_intf;

+ if (in_suspend_context()) {
+ get_device(dev);
+ device_pm_schedule_removal(dev);
+ return;
+ }
device_pm_remove(dev);
if (parent)
klist_del(&dev->knode_parent);
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
}

+extern bool in_suspend_context(void);
extern void device_pm_add(struct device *);
extern void device_pm_remove(struct device *);
extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);

#else /* CONFIG_PM_SLEEP */

+static inline bool in_suspend_context(void)
+{
+ return false;
+}

static inline void device_pm_add(struct device *dev)
{

2008-02-21 23:05:21

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

> BTW, below is a simplified version of the patch, without the mutex protecting
> suspending_task. I'd like to push it upstream if it looks good.

It does look good. Go ahead and push.

Acked-by: Alan Stern <[email protected]>

> Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030.
> There seems to be another issue related to us holding devices' semaphores.
> Namely, it looks like, when the user removes the card, a concurrent thread
> (from a workqueue) calls device_del() and blocks on the dev->sem held by
> us and then something else deadlocks with this thread. I'll be looking into
> this tomorrow.

I've been too busy with other things to look at the activity on that
bug report. Tonight or tomorrow...

Alan Stern

2008-02-23 01:31:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Friday, 22 of February 2008, Alan Stern wrote:
> On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
>
> > BTW, below is a simplified version of the patch, without the mutex protecting
> > suspending_task. I'd like to push it upstream if it looks good.
>
> It does look good. Go ahead and push.
>
> Acked-by: Alan Stern <[email protected]>
>
> > Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030.
> > There seems to be another issue related to us holding devices' semaphores.
> > Namely, it looks like, when the user removes the card, a concurrent thread
> > (from a workqueue) calls device_del() and blocks on the dev->sem held by
> > us and then something else deadlocks with this thread. I'll be looking into
> > this tomorrow.
>
> I've been too busy with other things to look at the activity on that
> bug report. Tonight or tomorrow...

Unfortunately, I missed your Bugzilla comment at
http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28

Well, in the face of it, I'm considering to remove the code that
acquires device semaphores from the suspend core for now. Evidently, this
change turns out to be painfully premature.

Also, we have apparent problems with pm_sleep_lock()
being take in device_add() (see
http://bugzilla.kernel.org/show_bug.cgi?id=9874).

Thanks,
Rafael

2008-02-23 04:39:25

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

> Unfortunately, I missed your Bugzilla comment at
> http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28

Strange... But obviously you did see it eventually.

> Well, in the face of it, I'm considering to remove the code that
> acquires device semaphores from the suspend core for now. Evidently, this
> change turns out to be painfully premature.

I wonder if that's really the best thing. How would we then learn
about drivers trying to register or unregister a device during a sleep
transition?

Do you think it might be possible instead to somehow allow these
unregistrations to go through, while still failing or blocking
registrations? It shouldn't be too hard to modify the driver core so
that it calls the driver's remove() method without trying to acquire
dev->sem if your in_suspend_context() test succeeds.

I have to admit, I still don't understand what's going on with the MMC
driver. Why is there a workqueue involved? If the workqueue fails to
unregister the device, why should it bother the suspend routine? After
all, if the suspend routine can afford to wait for the workqueue to
finish then it could just as well afford to do the unregistration
itself.

> Also, we have apparent problems with pm_sleep_lock()
> being take in device_add() (see
> http://bugzilla.kernel.org/show_bug.cgi?id=9874).

We'll have to get more information from the bug reporter to figure out
what really happened there.

Ultimately it may turn out some drivers just aren't very careful about
when they try to register new devices. Doing the registration by way
of a workqueue can be problematic if the workqueue happens to run
during a system sleep transition. That will still be true if you
revert the acquire-all-semaphores patch.

Alan Stern

2008-02-23 20:18:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Saturday, 23 of February 2008, Alan Stern wrote:
> On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
>
> > Unfortunately, I missed your Bugzilla comment at
> > http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28
>
> Strange... But obviously you did see it eventually.
>
> > Well, in the face of it, I'm considering to remove the code that
> > acquires device semaphores from the suspend core for now. Evidently, this
> > change turns out to be painfully premature.
>
> I wonder if that's really the best thing.

Ultimately no, it's not. However, we are now late in the -rc2 time frame and
the release of -rc3 seems to be imminent. At this point, IMO, that's the
safest thing to do. BTW, appended is the patch I'd like to get applied.

> How would we then learn about drivers trying to register or unregister a
> device during a sleep transition?

I think we should fix the ones we know about and try to reintroduce this
change in the 2.6.26 time frame. It also seems reasonable to reconsider what
we want to achieve, as there may be a better way to do that.

> Do you think it might be possible instead to somehow allow these
> unregistrations to go through, while still failing or blocking
> registrations?

Yes, that should be possible, but as I said above, I think it's not the right
time for doing that right now.

> It shouldn't be too hard to modify the driver core so that it calls the
> driver's remove() method without trying to acquire dev->sem if your
> in_suspend_context() test succeeds.

I agree that the in_suspend_context() test may be useful and this is one
of the reasons why I think the entire approach requires reconsideration.

> I have to admit, I still don't understand what's going on with the MMC
> driver.

Me neither, and that alone is a good enough reason for resigning from acquiring
device semaphores in the suspend core, at least in the mainline kernel, until
we understand the problem.

> Why is there a workqueue involved? If the workqueue fails to
> unregister the device, why should it bother the suspend routine? After
> all, if the suspend routine can afford to wait for the workqueue to
> finish then it could just as well afford to do the unregistration
> itself.

Yes, that's strange.

> > Also, we have apparent problems with pm_sleep_lock()
> > being take in device_add() (see
> > http://bugzilla.kernel.org/show_bug.cgi?id=9874).
>
> We'll have to get more information from the bug reporter to figure out
> what really happened there.

Yes.

> Ultimately it may turn out some drivers just aren't very careful about
> when they try to register new devices.

That's almost certain to me.

> Doing the registration by way of a workqueue can be problematic if the
> workqueue happens to run during a system sleep transition. That will still
> be true if you revert the acquire-all-semaphores patch.

Yes, but our taking of device semaphores exposes these problems in a rather
drastic fashion. :-)

Thanks,
Rafael

---
drivers/base/power/main.c | 84 ++--------------------------------------------
1 file changed, 4 insertions(+), 80 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
*/

LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
static LIST_HEAD(dpm_off);
static LIST_HEAD(dpm_off_irq);
static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
*/
void device_pm_remove(struct device *dev)
{
- /*
- * If this function is called during a suspend, it will be blocked,
- * because we're holding the device's semaphore at that time, which may
- * lead to a deadlock. In that case we want to print a warning.
- * However, it may also be called by unregister_dropped_devices() with
- * the device's semaphore released, in which case the warning should
- * not be printed.
- */
- if (down_trylock(&dev->sem)) {
- if (down_read_trylock(&pm_sleep_rwsem)) {
- /* No suspend in progress, wait on dev->sem */
- down(&dev->sem);
- up_read(&pm_sleep_rwsem);
- } else {
- /* Suspend in progress, we may deadlock */
- dev_warn(dev, "Suspicious %s during suspend\n",
- __FUNCTION__);
- dump_stack();
- /* The user has been warned ... */
- down(&dev->sem);
- }
- }
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
- up(&dev->sem);
}

/**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);

- list_move_tail(entry, &dpm_locked);
+ list_move_tail(entry, &dpm_active);
mutex_unlock(&dpm_list_mtx);
resume_device(dev);
mutex_lock(&dpm_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
}

/**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list. Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
- struct device *dev = to_device(entry);
-
- list_move(entry, &dpm_active);
- up(&dev->sem);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* unregister_dropped_devices - Unregister devices scheduled for removal
*
* Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;
struct device *dev = to_device(entry);

- up(&dev->sem);
mutex_unlock(&dpm_list_mtx);
/* This also removes the device from the list */
device_unregister(dev);
@@ -324,7 +280,6 @@ void device_resume(void)
{
might_sleep();
dpm_resume();
- unlock_all_devices();
unregister_dropped_devices();
up_write(&pm_sleep_rwsem);
}
@@ -461,8 +416,8 @@ static int dpm_suspend(pm_message_t stat
int error = 0;

mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
+ while (!list_empty(&dpm_active)) {
+ struct list_head *entry = dpm_active.prev;
struct device *dev = to_device(entry);

list_del_init(&dev->power.entry);
@@ -478,7 +433,7 @@ static int dpm_suspend(pm_message_t stat
""));
mutex_lock(&dpm_list_mtx);
if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_locked);
+ list_add_tail(&dev->power.entry, &dpm_active);
break;
}
mutex_lock(&dpm_list_mtx);
@@ -491,36 +446,6 @@ static int dpm_suspend(pm_message_t stat
}

/**
- * lock_all_devices - Acquire every device's semaphore
- *
- * Go through the dpm_active list. Carefully lock each device's
- * semaphore and put it in on the dpm_locked list.
- */
-static void lock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_active)) {
- struct list_head *entry = dpm_active.next;
- struct device *dev = to_device(entry);
-
- /* Required locking order is dev->sem first,
- * then dpm_list_mutex. Hence this awkward code.
- */
- get_device(dev);
- mutex_unlock(&dpm_list_mtx);
- down(&dev->sem);
- mutex_lock(&dpm_list_mtx);
-
- if (list_empty(entry))
- up(&dev->sem); /* Device was removed */
- else
- list_move_tail(entry, &dpm_locked);
- put_device(dev);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* device_suspend - Save state and stop all devices in system.
* @state: new power management state
*
@@ -533,7 +458,6 @@ int device_suspend(pm_message_t state)

might_sleep();
down_write(&pm_sleep_rwsem);
- lock_all_devices();
error = dpm_suspend(state);
if (error)
device_resume();

2008-02-23 23:31:58

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

> Ultimately no, it's not. However, we are now late in the -rc2 time frame and
> the release of -rc3 seems to be imminent. At this point, IMO, that's the
> safest thing to do. BTW, appended is the patch I'd like to get applied.

In the interest of having a safe release, I guess you're right. It's
unfortunate, though -- there's no good way to get thorough testing
without putting the code in Linus's tree.

> > How would we then learn about drivers trying to register or unregister a
> > device during a sleep transition?
>
> I think we should fix the ones we know about and try to reintroduce this
> change in the 2.6.26 time frame. It also seems reasonable to reconsider what
> we want to achieve, as there may be a better way to do that.

Below is my suggested approach, based on yours. It might even solve
the MMC problem, who knows? And it adds some potentially useful
debugging, although it would be better to dump just the stack of
suspending_task. Do you know how to do that from within a timer
routine?

I still have no clear idea about what's going on with the ACPI bug in
#9874. However perhaps the bug wouldn't occur if we blocked device
registration until after the resume was finished, instead of failing it
outright. Since the registration in this case was done in a workqueue
thread, blocking wouldn't cause an immediate deadlock.

Alan Stern



Index: usb-2.6/drivers/base/power/main.c
===================================================================
--- usb-2.6.orig/drivers/base/power/main.c
+++ usb-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
#include <linux/pm.h>
#include <linux/resume-trace.h>
#include <linux/rwsem.h>
+#include <linux/sched.h>

#include "../base.h"
#include "power.h"
@@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);

int (*platform_enable_wakeup)(struct device *dev, int is_on);

+static struct task_struct *suspending_task;
+
+bool in_suspend_context(void)
+{
+ return (suspending_task == current);
+}
+
/**
* device_pm_add - add a device to the list of active devices
* @dev: Device to be added to the list
@@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
/* No suspend in progress, wait on dev->sem */
down(&dev->sem);
up_read(&pm_sleep_rwsem);
- } else {
- /* Suspend in progress, we may deadlock */
+ } else if (!in_suspend_context()) {
+ /* Suspending in another task, we may deadlock */
dev_warn(dev, "Suspicious %s during suspend\n",
__FUNCTION__);
dump_stack();
/* The user has been warned ... */
down(&dev->sem);
}
+ /* Otherwise we're okay */
}
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
@@ -272,6 +281,7 @@ static void dpm_resume(void)
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
+ suspending_task = NULL;
}

/**
@@ -410,6 +420,17 @@ int device_power_down(pm_message_t state
}
EXPORT_SYMBOL_GPL(device_power_down);

+/* Provide debugging info if we hang or deadlock during suspend */
+static struct timer_list suspend_timer;
+
+static void suspend_timeout(unsigned long _dev)
+{
+ struct device *dev = (struct device *) _dev;
+
+ dev_err(dev, "deadlock during suspend!\n");
+ show_state();
+}
+
/**
* suspend_device - Save state of one device.
* @dev: Device.
@@ -419,6 +440,10 @@ int suspend_device(struct device *dev, p
{
int error = 0;

+ /* Provide debugging output in case of a deadlock */
+ setup_timer(&suspend_timer, suspend_timeout, (unsigned long) dev);
+ mod_timer(&suspend_timer, jiffies + 5*HZ);
+
if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
dev->power.power_state.event, state.event);
@@ -441,6 +466,8 @@ int suspend_device(struct device *dev, p
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
+
+ del_timer_sync(&suspend_timer);
return error;
}

@@ -460,6 +487,7 @@ static int dpm_suspend(pm_message_t stat
{
int error = 0;

+ suspending_task = current;
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: usb-2.6/drivers/base/power/power.h
===================================================================
--- usb-2.6.orig/drivers/base/power/power.h
+++ usb-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
}

+extern bool in_suspend_context(void);
extern void device_pm_add(struct device *);
extern void device_pm_remove(struct device *);
extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);

#else /* CONFIG_PM_SLEEP */

+static inline bool in_suspend_context(void)
+{
+ return false;
+}

static inline void device_pm_add(struct device *dev)
{
Index: usb-2.6/drivers/base/dd.c
===================================================================
--- usb-2.6.orig/drivers/base/dd.c
+++ usb-2.6/drivers/base/dd.c
@@ -325,10 +325,18 @@ void device_release_driver(struct device
* If anyone calls device_release_driver() recursively from
* within their ->remove callback for the same device, they
* will deadlock right here.
+ *
+ * We avoid locking dev->sem if we are in the context of a
+ * task doing a system suspend, in order that drivers can
+ * unregister devices from within their suspend() methods.
+ * This is okay because the suspending task will already own
+ * all the device semaphores.
*/
- down(&dev->sem);
+ if (!in_suspend_context())
+ down(&dev->sem);
__device_release_driver(dev);
- up(&dev->sem);
+ if (!in_suspend_context())
+ up(&dev->sem);
}
EXPORT_SYMBOL_GPL(device_release_driver);

2008-02-24 00:22:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
>
> > Ultimately no, it's not. However, we are now late in the -rc2 time frame and
> > the release of -rc3 seems to be imminent. At this point, IMO, that's the
> > safest thing to do. BTW, appended is the patch I'd like to get applied.
>
> In the interest of having a safe release, I guess you're right.

That's what I'm concerned about at the moment. I'm afraid there won't be
enough time to nail down all the issues (there may be some we're not even
aware of).

> It's unfortunate, though -- there's no good way to get thorough testing
> without putting the code in Linus's tree.

Absolutely. Still, the code in question introduces unexpected behavior that
we don't really understand and it's not safe to leave it in the mainline.

> > > How would we then learn about drivers trying to register or unregister a
> > > device during a sleep transition?
> >
> > I think we should fix the ones we know about and try to reintroduce this
> > change in the 2.6.26 time frame. It also seems reasonable to reconsider what
> > we want to achieve, as there may be a better way to do that.
>
> Below is my suggested approach, based on yours. It might even solve
> the MMC problem, who knows? And it adds some potentially useful
> debugging, although it would be better to dump just the stack of
> suspending_task. Do you know how to do that from within a timer
> routine?

No, I don't.

> I still have no clear idea about what's going on with the ACPI bug in
> #9874.

It seems that the ACPI code is not prepared to cope with a failing device
registration, in which case it'd need fixing. Frankly, I'm afraid there may
be more issues like this throughout the tree.

> However perhaps the bug wouldn't occur if we blocked device
> registration until after the resume was finished, instead of failing it
> outright. Since the registration in this case was done in a workqueue
> thread, blocking wouldn't cause an immediate deadlock.

No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.

If we block that code and the things it's supposed to do turn out to be
necessary for suspending later on, we'll be in trouble.

> Index: usb-2.6/drivers/base/power/main.c
> ===================================================================
> --- usb-2.6.orig/drivers/base/power/main.c
> +++ usb-2.6/drivers/base/power/main.c
> @@ -25,6 +25,7 @@
> #include <linux/pm.h>
> #include <linux/resume-trace.h>
> #include <linux/rwsem.h>
> +#include <linux/sched.h>
>
> #include "../base.h"
> #include "power.h"
> @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
>
> int (*platform_enable_wakeup)(struct device *dev, int is_on);
>
> +static struct task_struct *suspending_task;
> +
> +bool in_suspend_context(void)
> +{
> + return (suspending_task == current);
> +}
> +
> /**
> * device_pm_add - add a device to the list of active devices
> * @dev: Device to be added to the list
> @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
> /* No suspend in progress, wait on dev->sem */
> down(&dev->sem);
> up_read(&pm_sleep_rwsem);
> - } else {
> - /* Suspend in progress, we may deadlock */
> + } else if (!in_suspend_context()) {
> + /* Suspending in another task, we may deadlock */

Well, that shouldn't really deadlock. If it does, there is a potential design
issue somewhere. I think it might be better to set up a timer in here too.

Although IMO it would be even better to just set up a timer and remove the
warning altogehter.

> dev_warn(dev, "Suspicious %s during suspend\n",
> __FUNCTION__);
> dump_stack();
> /* The user has been warned ... */
> down(&dev->sem);
> }
> + /* Otherwise we're okay */
> }
> pr_debug("PM: Removing info for %s:%s\n",
> dev->bus ? dev->bus->name : "No Bus",

There's a problem here, because we shouldn't release the semaphore if we're
in the suspend context.

> @@ -272,6 +281,7 @@ static void dpm_resume(void)
> mutex_lock(&dpm_list_mtx);
> }
> mutex_unlock(&dpm_list_mtx);
> + suspending_task = NULL;
> }
>
> /**
> @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state
> }
> EXPORT_SYMBOL_GPL(device_power_down);
>
> +/* Provide debugging info if we hang or deadlock during suspend */
> +static struct timer_list suspend_timer;
> +
> +static void suspend_timeout(unsigned long _dev)
> +{
> + struct device *dev = (struct device *) _dev;
> +
> + dev_err(dev, "deadlock during suspend!\n");
> + show_state();

The show_state() seems to be overkill and won't really help if the user can't
collect the output on the fly using a serial console or something like this.
[The debug stuff printed here should fit a typical laptop screen, ideally.]

> +}
> +
> /**
> * suspend_device - Save state of one device.
> * @dev: Device.
> @@ -419,6 +440,10 @@ int suspend_device(struct device *dev, p
> {
> int error = 0;
>
> + /* Provide debugging output in case of a deadlock */
> + setup_timer(&suspend_timer, suspend_timeout, (unsigned long) dev);
> + mod_timer(&suspend_timer, jiffies + 5*HZ);
> +
> if (dev->power.power_state.event) {
> dev_dbg(dev, "PM: suspend %d-->%d\n",
> dev->power.power_state.event, state.event);
> @@ -441,6 +466,8 @@ int suspend_device(struct device *dev, p
> error = dev->bus->suspend(dev, state);
> suspend_report_result(dev->bus->suspend, error);
> }
> +
> + del_timer_sync(&suspend_timer);
> return error;
> }
>
> @@ -460,6 +487,7 @@ static int dpm_suspend(pm_message_t stat
> {
> int error = 0;
>
> + suspending_task = current;
> mutex_lock(&dpm_list_mtx);
> while (!list_empty(&dpm_locked)) {
> struct list_head *entry = dpm_locked.prev;
> Index: usb-2.6/drivers/base/power/power.h
> ===================================================================
> --- usb-2.6.orig/drivers/base/power/power.h
> +++ usb-2.6/drivers/base/power/power.h
> @@ -11,6 +11,7 @@ static inline struct device *to_device(s
> return container_of(entry, struct device, power.entry);
> }
>
> +extern bool in_suspend_context(void);
> extern void device_pm_add(struct device *);
> extern void device_pm_remove(struct device *);
> extern int pm_sleep_lock(void);
> @@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);
>
> #else /* CONFIG_PM_SLEEP */
>
> +static inline bool in_suspend_context(void)
> +{
> + return false;
> +}
>
> static inline void device_pm_add(struct device *dev)
> {
> Index: usb-2.6/drivers/base/dd.c
> ===================================================================
> --- usb-2.6.orig/drivers/base/dd.c
> +++ usb-2.6/drivers/base/dd.c
> @@ -325,10 +325,18 @@ void device_release_driver(struct device
> * If anyone calls device_release_driver() recursively from
> * within their ->remove callback for the same device, they
> * will deadlock right here.
> + *
> + * We avoid locking dev->sem if we are in the context of a
> + * task doing a system suspend, in order that drivers can
> + * unregister devices from within their suspend() methods.
> + * This is okay because the suspending task will already own
> + * all the device semaphores.
> */
> - down(&dev->sem);
> + if (!in_suspend_context())
> + down(&dev->sem);
> __device_release_driver(dev);
> - up(&dev->sem);
> + if (!in_suspend_context())
> + up(&dev->sem);
> }

I'd prefer

if (in_suspend_context()) {
__device_release_driver(dev);
} else {
down(&dev->sem);
__device_release_driver(dev);
up(&dev->sem);
}

> EXPORT_SYMBOL_GPL(device_release_driver);

Well, I'd really feel much more comfortable if we removed the troublesome code
for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26.

Of course, this doesn't mean we can't send debug patches for testing to the
users who have alraedy reported problems with it. :-)

Thanks,
Rafael

2008-02-24 03:26:00

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> On Sunday, 24 of February 2008, Alan Stern wrote:
> > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> >
> > > Ultimately no, it's not. However, we are now late in the -rc2 time frame and
> > > the release of -rc3 seems to be imminent. At this point, IMO, that's the
> > > safest thing to do. BTW, appended is the patch I'd like to get applied.
> >
> > In the interest of having a safe release, I guess you're right.
>
> That's what I'm concerned about at the moment. I'm afraid there won't be
> enough time to nail down all the issues (there may be some we're not even
> aware of).

All right. You'll need to enlarge your big reversal patch by reverting
commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.

> > Below is my suggested approach, based on yours. It might even solve
> > the MMC problem, who knows? And it adds some potentially useful
> > debugging, although it would be better to dump just the stack of
> > suspending_task. Do you know how to do that from within a timer
> > routine?
>
> No, I don't.

On further checking the answer turns out to be sched_show_task().
That's the routine which gets called for each process when you type
Alt-SysRq-t.

> > I still have no clear idea about what's going on with the ACPI bug in
> > #9874.
>
> It seems that the ACPI code is not prepared to cope with a failing device
> registration, in which case it'd need fixing. Frankly, I'm afraid there may
> be more issues like this throughout the tree.

The fact remains that it is unsafe to register a device during a sleep
transition, although if the device's driver doesn't have a suspend or
resume method you can probably get away with it.

> > However perhaps the bug wouldn't occur if we blocked device
> > registration until after the resume was finished, instead of failing it
> > outright. Since the registration in this case was done in a workqueue
> > thread, blocking wouldn't cause an immediate deadlock.
>
> No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.

It happened in a workqueue. There could be lots of similar cases: Some
interrupt-driven event causes a hotplug action. Since the action can't
be carried out in interrupt context, the driver has no choice but to
defer it to a workqueue or kernel thread. Blocking that workqueue or
kernel thread won't cause a problem _unless_ the driver's
suspend/resume methods try to synchronize with it. That's the
difficult case.

> If we block that code and the things it's supposed to do turn out to be
> necessary for suspending later on, we'll be in trouble.

Exactly.

> > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
> > /* No suspend in progress, wait on dev->sem */
> > down(&dev->sem);
> > up_read(&pm_sleep_rwsem);
> > - } else {
> > - /* Suspend in progress, we may deadlock */
> > + } else if (!in_suspend_context()) {
> > + /* Suspending in another task, we may deadlock */
>
> Well, that shouldn't really deadlock. If it does, there is a potential design
> issue somewhere. I think it might be better to set up a timer in here too.

At this point the driver core owns the device semaphore, so the
unregistration task will block until the sleep is over. If the
driver's resume method has to synchronize with the unregistration task
then it will deadlock. I agree, it would be a design issue. In fact,
it's the same design issue described earlier in this email.

> Although IMO it would be even better to just set up a timer and remove the
> warning altogehter.

That warning was just for debugging purposes. A timer in the resume
path could do the same thing with fewer false alarms, just like the
timer in the suspend path. I have added it to the new patch.

> > dev_warn(dev, "Suspicious %s during suspend\n",
> > __FUNCTION__);
> > dump_stack();
> > /* The user has been warned ... */
> > down(&dev->sem);
> > }
> > + /* Otherwise we're okay */
> > }
> > pr_debug("PM: Removing info for %s:%s\n",
> > dev->bus ? dev->bus->name : "No Bus",
>
> There's a problem here, because we shouldn't release the semaphore if we're
> in the suspend context.

Yes, you're right. It's fixed in the new version.

> > +static void suspend_timeout(unsigned long _dev)
> > +{
> > + struct device *dev = (struct device *) _dev;
> > +
> > + dev_err(dev, "deadlock during suspend!\n");
> > + show_state();
>
> The show_state() seems to be overkill and won't really help if the user can't
> collect the output on the fly using a serial console or something like this.
> [The debug stuff printed here should fit a typical laptop screen, ideally.]

Changed.

> I'd prefer
>
> if (in_suspend_context()) {
> __device_release_driver(dev);
> } else {
> down(&dev->sem);
> __device_release_driver(dev);
> up(&dev->sem);
> }

Okay.

You know, with this new patch we probably don't need
device_pm_schedule_removal() any more. However I have left it in for
now.

> Well, I'd really feel much more comfortable if we removed the troublesome code
> for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26.
>
> Of course, this doesn't mean we can't send debug patches for testing to the
> users who have alraedy reported problems with it. :-)

Certainly!

Here's the new version. Zdenek, could you try it out with none of
Rafael's patches installed?

Alan Stern



Index: usb-2.6/drivers/base/power/main.c
===================================================================
--- usb-2.6.orig/drivers/base/power/main.c
+++ usb-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
#include <linux/pm.h>
#include <linux/resume-trace.h>
#include <linux/rwsem.h>
+#include <linux/sched.h>

#include "../base.h"
#include "power.h"
@@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);

int (*platform_enable_wakeup)(struct device *dev, int is_on);

+static struct task_struct *suspending_task;
+
+bool in_suspend_context(void)
+{
+ return (suspending_task == current);
+}
+
/**
* device_pm_add - add a device to the list of active devices
* @dev: Device to be added to the list
@@ -82,27 +90,14 @@ void device_pm_add(struct device *dev)
void device_pm_remove(struct device *dev)
{
/*
- * If this function is called during a suspend, it will be blocked,
- * because we're holding the device's semaphore at that time, which may
- * lead to a deadlock. In that case we want to print a warning.
- * However, it may also be called by unregister_dropped_devices() with
- * the device's semaphore released, in which case the warning should
- * not be printed.
+ * If this function is called during a sleep then it will block
+ * because we're holding the device's semaphore at that time;
+ * this may lead to a deadlock. But if the calling thread is
+ * the same as the thread doing the sleep then it already owns
+ * all the device semaphores, so we make an exception.
*/
- if (down_trylock(&dev->sem)) {
- if (down_read_trylock(&pm_sleep_rwsem)) {
- /* No suspend in progress, wait on dev->sem */
- down(&dev->sem);
- up_read(&pm_sleep_rwsem);
- } else {
- /* Suspend in progress, we may deadlock */
- dev_warn(dev, "Suspicious %s during suspend\n",
- __FUNCTION__);
- dump_stack();
- /* The user has been warned ... */
- down(&dev->sem);
- }
- }
+ if (!in_suspend_context())
+ down(&dev->sem);
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
@@ -110,7 +105,8 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
- up(&dev->sem);
+ if (!in_suspend_context())
+ up(&dev->sem);
}

/**
@@ -157,6 +153,18 @@ void pm_sleep_unlock(void)
}


+/* Provide debugging info if we hang or deadlock during suspend/resume */
+static struct timer_list method_timer;
+
+static void method_timeout(unsigned long _dev)
+{
+ struct device *dev = (struct device *) _dev;
+
+ dev_err(dev, "deadlock during suspend or resume!\n");
+ sched_show_task(suspending_task);
+}
+
+
/*------------------------- Resume routines -------------------------*/

/**
@@ -230,6 +238,10 @@ static int resume_device(struct device *
TRACE_DEVICE(dev);
TRACE_RESUME(0);

+ /* Provide debugging output in case of a deadlock */
+ setup_timer(&method_timer, method_timeout, (unsigned long) dev);
+ mod_timer(&method_timer, jiffies + 5*HZ);
+
if (dev->bus && dev->bus->resume) {
dev_dbg(dev,"resuming\n");
error = dev->bus->resume(dev);
@@ -245,6 +257,8 @@ static int resume_device(struct device *
error = dev->class->resume(dev);
}

+ del_timer_sync(&method_timer);
+
TRACE_RESUME(error);
return error;
}
@@ -272,6 +286,7 @@ static void dpm_resume(void)
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
+ suspending_task = NULL;
}

/**
@@ -419,6 +434,10 @@ int suspend_device(struct device *dev, p
{
int error = 0;

+ /* Provide debugging output in case of a deadlock */
+ setup_timer(&method_timer, method_timeout, (unsigned long) dev);
+ mod_timer(&method_timer, jiffies + 5*HZ);
+
if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
dev->power.power_state.event, state.event);
@@ -441,6 +460,8 @@ int suspend_device(struct device *dev, p
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
+
+ del_timer_sync(&method_timer);
return error;
}

@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
{
int error = 0;

+ suspending_task = current;
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: usb-2.6/drivers/base/power/power.h
===================================================================
--- usb-2.6.orig/drivers/base/power/power.h
+++ usb-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
}

+extern bool in_suspend_context(void);
extern void device_pm_add(struct device *);
extern void device_pm_remove(struct device *);
extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);

#else /* CONFIG_PM_SLEEP */

+static inline bool in_suspend_context(void)
+{
+ return false;
+}

static inline void device_pm_add(struct device *dev)
{
Index: usb-2.6/drivers/base/dd.c
===================================================================
--- usb-2.6.orig/drivers/base/dd.c
+++ usb-2.6/drivers/base/dd.c
@@ -325,10 +325,20 @@ void device_release_driver(struct device
* If anyone calls device_release_driver() recursively from
* within their ->remove callback for the same device, they
* will deadlock right here.
+ *
+ * We avoid locking dev->sem if we are in the context of a
+ * task doing a system sleep, in order that drivers can
+ * unregister devices from within their suspend() methods.
+ * This is okay because the suspending task will already own
+ * all the device semaphores.
*/
- down(&dev->sem);
- __device_release_driver(dev);
- up(&dev->sem);
+ if (in_suspend_context()) {
+ __device_release_driver(dev);
+ } else {
+ down(&dev->sem);
+ __device_release_driver(dev);
+ up(&dev->sem);
+ }
}
EXPORT_SYMBOL_GPL(device_release_driver);

2008-02-24 04:26:44

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted

On Sat, 23 Feb 2008, Alan Stern wrote:

> It happened in a workqueue. There could be lots of similar cases: Some
> interrupt-driven event causes a hotplug action. Since the action can't
> be carried out in interrupt context, the driver has no choice but to
> defer it to a workqueue or kernel thread. Blocking that workqueue or
> kernel thread won't cause a problem _unless_ the driver's
> suspend/resume methods try to synchronize with it. That's the
> difficult case.

In fact this turns out to be exactly the problem with the MMC
subsystem. It uses a dedicated workqueue (kmmcd) to handle state
changes, and mmc_suspend_host() does a flush_workqueue(). If the
workqueue contains an entry to register or unregister a device, this is
guaranteed to deadlock -- and that's exactly what happens when Zdenek
inserts or removes a card during the disk-drive spindown time of a
system sleep.

It looks like the best solution is for mmc_suspend_host() not to flush
the workqueue; instead the workqueue should prevent itself from
running during a suspend. Right now the easiest way to accomplish this
is for the workqueue routines (mmc_detect() and siblings) to acquire
the mmc_host's device semaphore. If in the future the MMC subsystem
adds dynamic PM support then a more complicated arrangement will be
needed.

So the patch below, in combination with the previous patch, might just
fix the whole problem.

Alan Stern



Index: usb-2.6/drivers/mmc/core/core.c
===================================================================
--- usb-2.6.orig/drivers/mmc/core/core.c
+++ usb-2.6/drivers/mmc/core/core.c
@@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host
*/
int mmc_suspend_host(struct mmc_host *host, pm_message_t state)
{
- mmc_flush_scheduled_work();
-
mmc_bus_get(host);
if (host->bus_ops && !host->bus_dead) {
if (host->bus_ops->suspend)
Index: usb-2.6/drivers/mmc/core/mmc.c
===================================================================
--- usb-2.6.orig/drivers/mmc/core/mmc.c
+++ usb-2.6/drivers/mmc/core/mmc.c
@@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host *
BUG_ON(!host);
BUG_ON(!host->card);

+ down(&mmc_classdev(host)->sem);
mmc_claim_host(host);

/*
@@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host *
err = mmc_send_status(host->card, NULL);

mmc_release_host(host);
+ up(&mmc_classdev(host)->sem);

if (err) {
mmc_remove(host);
Index: usb-2.6/drivers/mmc/core/sd.c
===================================================================
--- usb-2.6.orig/drivers/mmc/core/sd.c
+++ usb-2.6/drivers/mmc/core/sd.c
@@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos
BUG_ON(!host);
BUG_ON(!host->card);

+ down(&mmc_classdev(host)->sem);
mmc_claim_host(host);

/*
@@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos
err = mmc_send_status(host->card, NULL);

mmc_release_host(host);
+ up(&mmc_classdev(host)->sem);

if (err) {
mmc_sd_remove(host);
Index: usb-2.6/drivers/mmc/core/sdio.c
===================================================================
--- usb-2.6.orig/drivers/mmc/core/sdio.c
+++ usb-2.6/drivers/mmc/core/sdio.c
@@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h
BUG_ON(!host);
BUG_ON(!host->card);

+ down(&mmc_classdev(host)->sem);
mmc_claim_host(host);

/*
@@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h
err = mmc_select_card(host->card);

mmc_release_host(host);
+ up(&mmc_classdev(host)->sem);

if (err) {
mmc_sdio_remove(host);

2008-02-24 13:34:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
>
> > On Sunday, 24 of February 2008, Alan Stern wrote:
> > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> > >
> > > > Ultimately no, it's not. However, we are now late in the -rc2 time frame and
> > > > the release of -rc3 seems to be imminent. At this point, IMO, that's the
> > > > safest thing to do. BTW, appended is the patch I'd like to get applied.
> > >
> > > In the interest of having a safe release, I guess you're right.
> >
> > That's what I'm concerned about at the moment. I'm afraid there won't be
> > enough time to nail down all the issues (there may be some we're not even
> > aware of).
>
> All right. You'll need to enlarge your big reversal patch by reverting
> commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.

Thanks for the hint.

In fact, I had to add a couple of fixes in device_power_down() and
dpm_suspend(). The entire patch is appended.

I'll comment your new patch in a separate message.

Thanks,
Rafael

---
drivers/base/power/main.c | 97 +++-------------------------------------------
drivers/usb/core/usb.c | 2
2 files changed, 8 insertions(+), 91 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
*/

LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
static LIST_HEAD(dpm_off);
static LIST_HEAD(dpm_off_irq);
static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
*/
void device_pm_remove(struct device *dev)
{
- /*
- * If this function is called during a suspend, it will be blocked,
- * because we're holding the device's semaphore at that time, which may
- * lead to a deadlock. In that case we want to print a warning.
- * However, it may also be called by unregister_dropped_devices() with
- * the device's semaphore released, in which case the warning should
- * not be printed.
- */
- if (down_trylock(&dev->sem)) {
- if (down_read_trylock(&pm_sleep_rwsem)) {
- /* No suspend in progress, wait on dev->sem */
- down(&dev->sem);
- up_read(&pm_sleep_rwsem);
- } else {
- /* Suspend in progress, we may deadlock */
- dev_warn(dev, "Suspicious %s during suspend\n",
- __FUNCTION__);
- dump_stack();
- /* The user has been warned ... */
- down(&dev->sem);
- }
- }
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
- up(&dev->sem);
}

/**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);

- list_move_tail(entry, &dpm_locked);
+ list_move_tail(entry, &dpm_active);
mutex_unlock(&dpm_list_mtx);
resume_device(dev);
mutex_lock(&dpm_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
}

/**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list. Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
- struct device *dev = to_device(entry);
-
- list_move(entry, &dpm_active);
- up(&dev->sem);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* unregister_dropped_devices - Unregister devices scheduled for removal
*
* Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;
struct device *dev = to_device(entry);

- up(&dev->sem);
mutex_unlock(&dpm_list_mtx);
/* This also removes the device from the list */
device_unregister(dev);
@@ -324,7 +280,6 @@ void device_resume(void)
{
might_sleep();
dpm_resume();
- unlock_all_devices();
unregister_dropped_devices();
up_write(&pm_sleep_rwsem);
}
@@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
error = suspend_device_late(dev, state);
if (error) {
printk(KERN_ERR "Could not power down device %s: "
"error %d\n",
kobject_name(&dev->kobj), error);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
break;
}
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off_irq);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off_irq);
}

if (!error)
@@ -461,11 +413,10 @@ static int dpm_suspend(pm_message_t stat
int error = 0;

mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
+ while (!list_empty(&dpm_active)) {
+ struct list_head *entry = dpm_active.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
error = suspend_device(dev, state);
if (error) {
@@ -476,14 +427,11 @@ static int dpm_suspend(pm_message_t stat
(error == -EAGAIN ?
" (please convert to suspend_late)" :
""));
- mutex_lock(&dpm_list_mtx);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_locked);
break;
}
mutex_lock(&dpm_list_mtx);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off);
}
mutex_unlock(&dpm_list_mtx);

@@ -491,36 +439,6 @@ static int dpm_suspend(pm_message_t stat
}

/**
- * lock_all_devices - Acquire every device's semaphore
- *
- * Go through the dpm_active list. Carefully lock each device's
- * semaphore and put it in on the dpm_locked list.
- */
-static void lock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_active)) {
- struct list_head *entry = dpm_active.next;
- struct device *dev = to_device(entry);
-
- /* Required locking order is dev->sem first,
- * then dpm_list_mutex. Hence this awkward code.
- */
- get_device(dev);
- mutex_unlock(&dpm_list_mtx);
- down(&dev->sem);
- mutex_lock(&dpm_list_mtx);
-
- if (list_empty(entry))
- up(&dev->sem); /* Device was removed */
- else
- list_move_tail(entry, &dpm_locked);
- put_device(dev);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* device_suspend - Save state and stop all devices in system.
* @state: new power management state
*
@@ -533,7 +451,6 @@ int device_suspend(pm_message_t state)

might_sleep();
down_write(&pm_sleep_rwsem);
- lock_all_devices();
error = dpm_suspend(state);
if (error)
device_resume();
Index: linux-2.6/drivers/usb/core/usb.c
===================================================================
--- linux-2.6.orig/drivers/usb/core/usb.c
+++ linux-2.6/drivers/usb/core/usb.c
@@ -234,7 +234,7 @@ static int ksuspend_usb_init(void)
* singlethreaded. Its job doesn't justify running on more
* than one CPU.
*/
- ksuspend_usb_wq = create_singlethread_workqueue("ksuspend_usbd");
+ ksuspend_usb_wq = create_freezeable_workqueue("ksuspend_usbd");
if (!ksuspend_usb_wq)
return -ENOMEM;
return 0;

2008-02-24 13:52:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> > On Sunday, 24 of February 2008, Alan Stern wrote:
> > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
>
> > > I still have no clear idea about what's going on with the ACPI bug in
> > > #9874.
> >
> > It seems that the ACPI code is not prepared to cope with a failing device
> > registration, in which case it'd need fixing. Frankly, I'm afraid there may
> > be more issues like this throughout the tree.
>
> The fact remains that it is unsafe to register a device during a sleep
> transition, although if the device's driver doesn't have a suspend or
> resume method you can probably get away with it.
>
> > > However perhaps the bug wouldn't occur if we blocked device
> > > registration until after the resume was finished, instead of failing it
> > > outright. Since the registration in this case was done in a workqueue
> > > thread, blocking wouldn't cause an immediate deadlock.
> >
> > No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.
>
> It happened in a workqueue. There could be lots of similar cases: Some
> interrupt-driven event causes a hotplug action. Since the action can't
> be carried out in interrupt context, the driver has no choice but to
> defer it to a workqueue or kernel thread. Blocking that workqueue or
> kernel thread won't cause a problem _unless_ the driver's
> suspend/resume methods try to synchronize with it. That's the
> difficult case.

Yes.

> > If we block that code and the things it's supposed to do turn out to be
> > necessary for suspending later on, we'll be in trouble.
>
> Exactly.
>
> > > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
> > > /* No suspend in progress, wait on dev->sem */
> > > down(&dev->sem);
> > > up_read(&pm_sleep_rwsem);
> > > - } else {
> > > - /* Suspend in progress, we may deadlock */
> > > + } else if (!in_suspend_context()) {
> > > + /* Suspending in another task, we may deadlock */
> >
> > Well, that shouldn't really deadlock. If it does, there is a potential design
> > issue somewhere. I think it might be better to set up a timer in here too.
>
> At this point the driver core owns the device semaphore, so the
> unregistration task will block until the sleep is over. If the
> driver's resume method has to synchronize with the unregistration task
> then it will deadlock. I agree, it would be a design issue. In fact,
> it's the same design issue described earlier in this email.
>
> > Although IMO it would be even better to just set up a timer and remove the
> > warning altogehter.
>
> That warning was just for debugging purposes. A timer in the resume
> path could do the same thing with fewer false alarms, just like the
> timer in the suspend path. I have added it to the new patch.

OK

> > > dev_warn(dev, "Suspicious %s during suspend\n",
> > > __FUNCTION__);
> > > dump_stack();
> > > /* The user has been warned ... */
> > > down(&dev->sem);
> > > }
> > > + /* Otherwise we're okay */
> > > }
> > > pr_debug("PM: Removing info for %s:%s\n",
> > > dev->bus ? dev->bus->name : "No Bus",
> >
> > There's a problem here, because we shouldn't release the semaphore if we're
> > in the suspend context.
>
> Yes, you're right. It's fixed in the new version.
>
> > > +static void suspend_timeout(unsigned long _dev)
> > > +{
> > > + struct device *dev = (struct device *) _dev;
> > > +
> > > + dev_err(dev, "deadlock during suspend!\n");
> > > + show_state();
> >
> > The show_state() seems to be overkill and won't really help if the user can't
> > collect the output on the fly using a serial console or something like this.
> > [The debug stuff printed here should fit a typical laptop screen, ideally.]
>
> Changed.
>
> > I'd prefer
> >
> > if (in_suspend_context()) {
> > __device_release_driver(dev);
> > } else {
> > down(&dev->sem);
> > __device_release_driver(dev);
> > up(&dev->sem);
> > }
>
> Okay.
>
> You know, with this new patch we probably don't need
> device_pm_schedule_removal() any more.

No, we don't. However, because of that dpm_suspend() and device_power_down()
need to be changed not to assume that the removed devices will end up on
dpm_destroy.

> However I have left it in for now.

Well, it's used by the b43, leds and hwrng drivers as well as by some CPU
hotplug notifiers. They all will have to be changed along with removing it.

> > Well, I'd really feel much more comfortable if we removed the troublesome code
> > for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26.
> >
> > Of course, this doesn't mean we can't send debug patches for testing to the
> > users who have alraedy reported problems with it. :-)
>
> Certainly!
>
> Here's the new version.

IMO you also should add this change in device_power_down():

@@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
error = suspend_device_late(dev, state);
if (error) {
printk(KERN_ERR "Could not power down device %s: "
"error %d\n",
kobject_name(&dev->kobj), error);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
break;
}
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off_irq);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off_irq);
}

if (!error)

and the analogous one in dpm_suspend(). Otherwise there would be a problem
if suspend_device_late() itself removed the device (same for device_suspend()).
Of course, that would lead to a problem if device_pm_schedule_removal() were
used for the removal, so the last "!list_empty(&dev->power.entry)" is really
not sufficient - until device_pm_schedule_removal() is removed.

Thanks,
Rafael

2008-02-24 14:02:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sat, 23 Feb 2008, Alan Stern wrote:
>
> > It happened in a workqueue. There could be lots of similar cases: Some
> > interrupt-driven event causes a hotplug action. Since the action can't
> > be carried out in interrupt context, the driver has no choice but to
> > defer it to a workqueue or kernel thread. Blocking that workqueue or
> > kernel thread won't cause a problem _unless_ the driver's
> > suspend/resume methods try to synchronize with it. That's the
> > difficult case.
>
> In fact this turns out to be exactly the problem with the MMC
> subsystem. It uses a dedicated workqueue (kmmcd) to handle state
> changes, and mmc_suspend_host() does a flush_workqueue(). If the
> workqueue contains an entry to register or unregister a device, this is
> guaranteed to deadlock -- and that's exactly what happens when Zdenek
> inserts or removes a card during the disk-drive spindown time of a
> system sleep.
>
> It looks like the best solution is for mmc_suspend_host() not to flush
> the workqueue; instead the workqueue should prevent itself from
> running during a suspend. Right now the easiest way to accomplish this
> is for the workqueue routines (mmc_detect() and siblings) to acquire
> the mmc_host's device semaphore. If in the future the MMC subsystem
> adds dynamic PM support then a more complicated arrangement will be
> needed.
>
> So the patch below, in combination with the previous patch, might just
> fix the whole problem.

The patch looks sane and I think you're right.

Still, it depends on us holding the device semaphores (if I understand it
correctly). For this reason, I'd prefer to include it into the patch that adds
device locking to the PM core.

I think we should first remove the device locking from
drivers/base/power/main.c (that's the patch I'd like to push upstream now) and
then create a patch that will add it back along with all of the necessary
additional changes we know of (BTW, I've just received a message from another
user affected by it).

but we need to be careful with
all the details. IMO, it would be

Thanks,
Rafael


> Index: usb-2.6/drivers/mmc/core/core.c
> ===================================================================
> --- usb-2.6.orig/drivers/mmc/core/core.c
> +++ usb-2.6/drivers/mmc/core/core.c
> @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host
> */
> int mmc_suspend_host(struct mmc_host *host, pm_message_t state)
> {
> - mmc_flush_scheduled_work();
> -
> mmc_bus_get(host);
> if (host->bus_ops && !host->bus_dead) {
> if (host->bus_ops->suspend)
> Index: usb-2.6/drivers/mmc/core/mmc.c
> ===================================================================
> --- usb-2.6.orig/drivers/mmc/core/mmc.c
> +++ usb-2.6/drivers/mmc/core/mmc.c
> @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host *
> BUG_ON(!host);
> BUG_ON(!host->card);
>
> + down(&mmc_classdev(host)->sem);
> mmc_claim_host(host);
>
> /*
> @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host *
> err = mmc_send_status(host->card, NULL);
>
> mmc_release_host(host);
> + up(&mmc_classdev(host)->sem);
>
> if (err) {
> mmc_remove(host);
> Index: usb-2.6/drivers/mmc/core/sd.c
> ===================================================================
> --- usb-2.6.orig/drivers/mmc/core/sd.c
> +++ usb-2.6/drivers/mmc/core/sd.c
> @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos
> BUG_ON(!host);
> BUG_ON(!host->card);
>
> + down(&mmc_classdev(host)->sem);
> mmc_claim_host(host);
>
> /*
> @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos
> err = mmc_send_status(host->card, NULL);
>
> mmc_release_host(host);
> + up(&mmc_classdev(host)->sem);
>
> if (err) {
> mmc_sd_remove(host);
> Index: usb-2.6/drivers/mmc/core/sdio.c
> ===================================================================
> --- usb-2.6.orig/drivers/mmc/core/sdio.c
> +++ usb-2.6/drivers/mmc/core/sdio.c
> @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h
> BUG_ON(!host);
> BUG_ON(!host->card);
>
> + down(&mmc_classdev(host)->sem);
> mmc_claim_host(host);
>
> /*
> @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h
> err = mmc_select_card(host->card);
>
> mmc_release_host(host);
> + up(&mmc_classdev(host)->sem);
>
> if (err) {
> mmc_sdio_remove(host);
>
>
>



--
"Premature optimization is the root of all evil." - Donald Knuth

2008-02-24 15:34:06

by Alan Stern

[permalink] [raw]
Subject: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> > In fact this turns out to be exactly the problem with the MMC
> > subsystem. It uses a dedicated workqueue (kmmcd) to handle state
> > changes, and mmc_suspend_host() does a flush_workqueue(). If the
> > workqueue contains an entry to register or unregister a device, this is
> > guaranteed to deadlock -- and that's exactly what happens when Zdenek
> > inserts or removes a card during the disk-drive spindown time of a
> > system sleep.
> >
> > It looks like the best solution is for mmc_suspend_host() not to flush
> > the workqueue; instead the workqueue should prevent itself from
> > running during a suspend. Right now the easiest way to accomplish this
> > is for the workqueue routines (mmc_detect() and siblings) to acquire
> > the mmc_host's device semaphore. If in the future the MMC subsystem
> > adds dynamic PM support then a more complicated arrangement will be
> > needed.
> >
> > So the patch below, in combination with the previous patch, might just
> > fix the whole problem.
>
> The patch looks sane and I think you're right.

Well, the patch itself isn't really adequate; it was just a first pass
intended to illustrate the nature of the problem.

The problems in the MMC subsystem go deeper.

The first is the way it uses flush_workqueue(). This is almost always
a bad function to call, because it introduces unnecessary dependencies.
cancel_delayed_work_sync() is much better.

But even changing that won't solve the second issue, which is a genuine
bug. There is a race between detect events and suspend events. The
mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
before calling the host's suspend routine. So what happens if another
detect event occurs in between?

This whole area of synchronization between card insertion, card
removal, suspend, and resume needs to be thought out better.
Especially keeping in mind that device registration or unregistration
during a system sleep is likely to block until the sleep is over.

Lastly, there are some other questions about confusing aspects of the
subsystem. What's the story with __mmc_claim_host()? Is it really
nothing more than an abortable mutex_lock()? Why does it ever need to
be aborted? Why is its second argument an atomic_t instead of a normal
int?

Why are mmc_detect() and related routines described in comments as
"Card detection callback from host"? They don't handle card
_detection_ -- they handle card _removal_.

What's the purpose of the card-counting loop in
host/omap.c:mmc_omap_switch_handler()? It looks like dead code.

Alan Stern

2008-02-24 18:27:58

by Pavel Machek

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

Hi!

> Index: usb-2.6/drivers/base/power/main.c
> ===================================================================
> --- usb-2.6.orig/drivers/base/power/main.c
> +++ usb-2.6/drivers/base/power/main.c
> @@ -25,6 +25,7 @@
> #include <linux/pm.h>
> #include <linux/resume-trace.h>
> #include <linux/rwsem.h>
> +#include <linux/sched.h>
>
> #include "../base.h"
> #include "power.h"
> @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
>
> int (*platform_enable_wakeup)(struct device *dev, int is_on);
>
> +static struct task_struct *suspending_task;

What locking protects this variable? What happens when suspending_task
exits? (Hmm, that would probably be bug, anyway?)

Or are we running UP when this is accessed? This at least needs a big
fat comment.

> +bool in_suspend_context(void)
> +{
> + return (suspending_task == current);
> +}


Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-24 19:03:25

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Pavel Machek wrote:

> Hi!
>
> > Index: usb-2.6/drivers/base/power/main.c
> > ===================================================================
> > --- usb-2.6.orig/drivers/base/power/main.c
> > +++ usb-2.6/drivers/base/power/main.c
> > @@ -25,6 +25,7 @@
> > #include <linux/pm.h>
> > #include <linux/resume-trace.h>
> > #include <linux/rwsem.h>
> > +#include <linux/sched.h>
> >
> > #include "../base.h"
> > #include "power.h"
> > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
> >
> > int (*platform_enable_wakeup)(struct device *dev, int is_on);
> >
> > +static struct task_struct *suspending_task;
>
> What locking protects this variable? What happens when suspending_task
> exits? (Hmm, that would probably be bug, anyway?)

It's protected by whatever existing locking scheme allows only one
task to start a system sleep at a time. For example, the suspending
task has to get a write lock on pm_sleep_rwsem.

Yes, if the suspending task exits before the system has woken up,
you're in trouble regardless.

> Or are we running UP when this is accessed? This at least needs a big
> fat comment.
>
> > +bool in_suspend_context(void)
> > +{
> > + return (suspending_task == current);
> > +}

We aren't necessarily UP. But since all that matters is whether or not
suspend_task is equal to the current task, no extra locking is needed.

I'll add a comment explaining all this.

Alan Stern

2008-02-24 19:28:12

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> > You know, with this new patch we probably don't need
> > device_pm_schedule_removal() any more.
>
> No, we don't. However, because of that dpm_suspend() and device_power_down()
> need to be changed not to assume that the removed devices will end up on
> dpm_destroy.

Yes. The safest approach will be to check whether the device is still
at the end of the dpm_locked or dpm_off list; if it hasn't then we can
assume that it has been removed (or scheduled for removal).

> IMO you also should add this change in device_power_down():
>
> @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
> struct list_head *entry = dpm_off.prev;
> struct device *dev = to_device(entry);
>
> - list_del_init(&dev->power.entry);
> error = suspend_device_late(dev, state);
> if (error) {
> printk(KERN_ERR "Could not power down device %s: "
> "error %d\n",
> kobject_name(&dev->kobj), error);
> - if (list_empty(&dev->power.entry))
> - list_add(&dev->power.entry, &dpm_off);
> break;
> }
> - if (list_empty(&dev->power.entry))
> - list_add(&dev->power.entry, &dpm_off_irq);
> + if (!list_empty(&dev->power.entry))
> + list_move(&dev->power.entry, &dpm_off_irq);
> }

Actually I changed the last two lines to:

+
+ /* Don't move the entry if the device has been unregistered */
+ if (entry == dpm_off.prev)
+ list_move(entry, &dpm_off_irq);

with analogous changes to dpm_suspend().

Any more suggestions for updates?

Alan Stern

2008-02-24 19:42:53

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008/2/24, Alan Stern <[email protected]>:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
>
>
> > > You know, with this new patch we probably don't need
> > > device_pm_schedule_removal() any more.
> >
> > No, we don't. However, because of that dpm_suspend() and device_power_down()
> > need to be changed not to assume that the removed devices will end up on
> > dpm_destroy.
>
>
> with analogous changes to dpm_suspend().
>
> Any more suggestions for updates?
>

Is there any current patch I should actually test - it looks there is
ongoing discussion from the time Rafael has proposed his last patch to
check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961&action=view

But I think it's visible the only suspend update will not resolve my
problems and the problems is deeper in the mmc.

So should I still test yesterdays' patch ?

Zdenek

Zdenek

2008-02-24 20:10:56

by Pavel Machek

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

Hi!

> > > @@ -25,6 +25,7 @@
> > > #include <linux/pm.h>
> > > #include <linux/resume-trace.h>
> > > #include <linux/rwsem.h>
> > > +#include <linux/sched.h>
> > >
> > > #include "../base.h"
> > > #include "power.h"
> > > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
> > >
> > > int (*platform_enable_wakeup)(struct device *dev, int is_on);
> > >
> > > +static struct task_struct *suspending_task;
> >
> > What locking protects this variable? What happens when suspending_task
> > exits? (Hmm, that would probably be bug, anyway?)
>
> It's protected by whatever existing locking scheme allows only one
> task to start a system sleep at a time. For example, the suspending
> task has to get a write lock on pm_sleep_rwsem.

And readers of suspending_task are protected by?

At the very least, you'd need rmb() before reading it and wmb() after
writing to it, but I'm not sure if that's enough on every obscure
architecture out there.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-24 20:11:24

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Zdenek Kabelac wrote:
> 2008/2/24, Alan Stern <[email protected]>:
> > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> >
> >
> > > > You know, with this new patch we probably don't need
> > > > device_pm_schedule_removal() any more.
> > >
> > > No, we don't. However, because of that dpm_suspend() and device_power_down()
> > > need to be changed not to assume that the removed devices will end up on
> > > dpm_destroy.
> >
> >
> > with analogous changes to dpm_suspend().
> >
> > Any more suggestions for updates?
> >
>
> Is there any current patch I should actually test - it looks there is
> ongoing discussion from the time Rafael has proposed his last patch to
> check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961&action=view
>
> But I think it's visible the only suspend update will not resolve my
> problems and the problems is deeper in the mmc.

Yes, there are problems in there, but your feedback is needed nevertheless.

> So should I still test yesterdays' patch ?

Yes please. It will be valuable information to us whether it fixes the suspend
issue at least.

Thanks,
Rafael

2008-02-24 20:27:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Rafael J. Wysocki wrote:
> On Sunday, 24 of February 2008, Alan Stern wrote:
> > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> >
> > > On Sunday, 24 of February 2008, Alan Stern wrote:
> > > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> > > >
> > > > > Ultimately no, it's not. However, we are now late in the -rc2 time frame and
> > > > > the release of -rc3 seems to be imminent. At this point, IMO, that's the
> > > > > safest thing to do. BTW, appended is the patch I'd like to get applied.
> > > >
> > > > In the interest of having a safe release, I guess you're right.
> > >
> > > That's what I'm concerned about at the moment. I'm afraid there won't be
> > > enough time to nail down all the issues (there may be some we're not even
> > > aware of).
> >
> > All right. You'll need to enlarge your big reversal patch by reverting
> > commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.
>
> Thanks for the hint.
>
> In fact, I had to add a couple of fixes in device_power_down() and
> dpm_suspend(). The entire patch is appended.

I hope it's fine by everyone to push the patch below to Greg for 2.6.25.

Thanks,
Rafael


---
From: Rafael J. Wysocki <[email protected]>

Remove the code that acquires all device semaphores from the suspend
code path as it causes multiple problems to appear (most notably,
http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
depending on the code being removed.

Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/core.c | 8 ---
drivers/base/power/main.c | 97 +++-------------------------------------------
drivers/usb/core/usb.c | 2
3 files changed, 8 insertions(+), 99 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
*/

LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
static LIST_HEAD(dpm_off);
static LIST_HEAD(dpm_off_irq);
static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
*/
void device_pm_remove(struct device *dev)
{
- /*
- * If this function is called during a suspend, it will be blocked,
- * because we're holding the device's semaphore at that time, which may
- * lead to a deadlock. In that case we want to print a warning.
- * However, it may also be called by unregister_dropped_devices() with
- * the device's semaphore released, in which case the warning should
- * not be printed.
- */
- if (down_trylock(&dev->sem)) {
- if (down_read_trylock(&pm_sleep_rwsem)) {
- /* No suspend in progress, wait on dev->sem */
- down(&dev->sem);
- up_read(&pm_sleep_rwsem);
- } else {
- /* Suspend in progress, we may deadlock */
- dev_warn(dev, "Suspicious %s during suspend\n",
- __FUNCTION__);
- dump_stack();
- /* The user has been warned ... */
- down(&dev->sem);
- }
- }
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
- up(&dev->sem);
}

/**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);

- list_move_tail(entry, &dpm_locked);
+ list_move_tail(entry, &dpm_active);
mutex_unlock(&dpm_list_mtx);
resume_device(dev);
mutex_lock(&dpm_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
}

/**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list. Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
- struct device *dev = to_device(entry);
-
- list_move(entry, &dpm_active);
- up(&dev->sem);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* unregister_dropped_devices - Unregister devices scheduled for removal
*
* Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;
struct device *dev = to_device(entry);

- up(&dev->sem);
mutex_unlock(&dpm_list_mtx);
/* This also removes the device from the list */
device_unregister(dev);
@@ -324,7 +280,6 @@ void device_resume(void)
{
might_sleep();
dpm_resume();
- unlock_all_devices();
unregister_dropped_devices();
up_write(&pm_sleep_rwsem);
}
@@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
error = suspend_device_late(dev, state);
if (error) {
printk(KERN_ERR "Could not power down device %s: "
"error %d\n",
kobject_name(&dev->kobj), error);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
break;
}
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off_irq);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off_irq);
}

if (!error)
@@ -461,11 +413,10 @@ static int dpm_suspend(pm_message_t stat
int error = 0;

mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
+ while (!list_empty(&dpm_active)) {
+ struct list_head *entry = dpm_active.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
error = suspend_device(dev, state);
if (error) {
@@ -476,14 +427,11 @@ static int dpm_suspend(pm_message_t stat
(error == -EAGAIN ?
" (please convert to suspend_late)" :
""));
- mutex_lock(&dpm_list_mtx);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_locked);
break;
}
mutex_lock(&dpm_list_mtx);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off);
}
mutex_unlock(&dpm_list_mtx);

@@ -491,36 +439,6 @@ static int dpm_suspend(pm_message_t stat
}

/**
- * lock_all_devices - Acquire every device's semaphore
- *
- * Go through the dpm_active list. Carefully lock each device's
- * semaphore and put it in on the dpm_locked list.
- */
-static void lock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_active)) {
- struct list_head *entry = dpm_active.next;
- struct device *dev = to_device(entry);
-
- /* Required locking order is dev->sem first,
- * then dpm_list_mutex. Hence this awkward code.
- */
- get_device(dev);
- mutex_unlock(&dpm_list_mtx);
- down(&dev->sem);
- mutex_lock(&dpm_list_mtx);
-
- if (list_empty(entry))
- up(&dev->sem); /* Device was removed */
- else
- list_move_tail(entry, &dpm_locked);
- put_device(dev);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* device_suspend - Save state and stop all devices in system.
* @state: new power management state
*
@@ -533,7 +451,6 @@ int device_suspend(pm_message_t state)

might_sleep();
down_write(&pm_sleep_rwsem);
- lock_all_devices();
error = dpm_suspend(state);
if (error)
device_resume();
Index: linux-2.6/drivers/usb/core/usb.c
===================================================================
--- linux-2.6.orig/drivers/usb/core/usb.c
+++ linux-2.6/drivers/usb/core/usb.c
@@ -234,7 +234,7 @@ static int ksuspend_usb_init(void)
* singlethreaded. Its job doesn't justify running on more
* than one CPU.
*/
- ksuspend_usb_wq = create_singlethread_workqueue("ksuspend_usbd");
+ ksuspend_usb_wq = create_freezeable_workqueue("ksuspend_usbd");
if (!ksuspend_usb_wq)
return -ENOMEM;
return 0;
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -770,13 +770,6 @@ int device_add(struct device *dev)
struct class_interface *class_intf;
int error;

- error = pm_sleep_lock();
- if (error) {
- dev_warn(dev, "Suspicious %s during suspend\n", __FUNCTION__);
- dump_stack();
- return error;
- }
-
dev = get_device(dev);
if (!dev || !strlen(dev->bus_id)) {
error = -EINVAL;
@@ -843,7 +836,6 @@ int device_add(struct device *dev)
}
Done:
put_device(dev);
- pm_sleep_unlock();
return error;
BusError:
device_pm_remove(dev);

2008-02-24 20:33:18

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Pavel Machek wrote:

> > > What locking protects this variable? What happens when suspending_task
> > > exits? (Hmm, that would probably be bug, anyway?)
> >
> > It's protected by whatever existing locking scheme allows only one
> > task to start a system sleep at a time. For example, the suspending
> > task has to get a write lock on pm_sleep_rwsem.
>
> And readers of suspending_task are protected by?

I added a comment about that too.

> At the very least, you'd need rmb() before reading it and wmb() after
> writing to it, but I'm not sure if that's enough on every obscure
> architecture out there.

No, neither one is needed because of the way suspending_task is used.

It's not necessary for a reader R to see the variable's actual value;
all R needs to know is whether or not suspending_task is equal to R.
Since the only process which can set suspending_task to R is R itself,
and since R will set suspending_task back to NULL before releasing the
write lock on pm_sleep_rwsem, there's never any ambiguity.

Alan Stern

2008-02-24 20:45:31

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[email protected]>
>
> Remove the code that acquires all device semaphores from the suspend
> code path as it causes multiple problems to appear (most notably,
> http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> depending on the code being removed.
>
> Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
> the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Do you actually know whether removing that lock fixes the reported bug?

I agree, the lock should be removed for now. But I'd sure like to get
some feedback about what's going wrong. It's starting to look as
though we'd be a lot better off blocking device registration during
sleep instead of failing it.

Shouldn't resume_device() and suspend_device() now acquire the device
semaphore before calling the various methods? That's the way they
used to work.

Alan Stern

2008-02-24 20:58:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Remove the code that acquires all device semaphores from the suspend
> > code path as it causes multiple problems to appear (most notably,
> > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> > depending on the code being removed.
> >
> > Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
> > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.
>
> Do you actually know whether removing that lock fixes the reported bug?

In fact I'm waiting for the confirmation, but given the symptoms described by
Lukas I'm quite confident that the problem shouldn't be reproducible with
the lock removed.

> I agree, the lock should be removed for now. But I'd sure like to get
> some feedback about what's going wrong.

No question about that.

> It's starting to look as though we'd be a lot better off blocking device
> registration during sleep instead of failing it.

Agreed.

> Shouldn't resume_device() and suspend_device() now acquire the device
> semaphore before calling the various methods? That's the way they
> used to work.

Yes, thanks for reminding me about that.

Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki <[email protected]>

Remove the code that acquires all device semaphores from the suspend
code path as it causes multiple problems to appear (most notably,
http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
depending on the code being removed.

Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/core.c | 8 ---
drivers/base/power/main.c | 106 ++++++----------------------------------------
drivers/usb/core/usb.c | 2
3 files changed, 17 insertions(+), 99 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
*/

LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
static LIST_HEAD(dpm_off);
static LIST_HEAD(dpm_off_irq);
static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
*/
void device_pm_remove(struct device *dev)
{
- /*
- * If this function is called during a suspend, it will be blocked,
- * because we're holding the device's semaphore at that time, which may
- * lead to a deadlock. In that case we want to print a warning.
- * However, it may also be called by unregister_dropped_devices() with
- * the device's semaphore released, in which case the warning should
- * not be printed.
- */
- if (down_trylock(&dev->sem)) {
- if (down_read_trylock(&pm_sleep_rwsem)) {
- /* No suspend in progress, wait on dev->sem */
- down(&dev->sem);
- up_read(&pm_sleep_rwsem);
- } else {
- /* Suspend in progress, we may deadlock */
- dev_warn(dev, "Suspicious %s during suspend\n",
- __FUNCTION__);
- dump_stack();
- /* The user has been warned ... */
- down(&dev->sem);
- }
- }
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
- up(&dev->sem);
}

/**
@@ -230,6 +206,8 @@ static int resume_device(struct device *
TRACE_DEVICE(dev);
TRACE_RESUME(0);

+ down(&dev->sem);
+
if (dev->bus && dev->bus->resume) {
dev_dbg(dev,"resuming\n");
error = dev->bus->resume(dev);
@@ -245,6 +223,8 @@ static int resume_device(struct device *
error = dev->class->resume(dev);
}

+ up(&dev->sem);
+
TRACE_RESUME(error);
return error;
}
@@ -266,7 +246,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);

- list_move_tail(entry, &dpm_locked);
+ list_move_tail(entry, &dpm_active);
mutex_unlock(&dpm_list_mtx);
resume_device(dev);
mutex_lock(&dpm_list_mtx);
@@ -275,25 +255,6 @@ static void dpm_resume(void)
}

/**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list. Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
- struct device *dev = to_device(entry);
-
- list_move(entry, &dpm_active);
- up(&dev->sem);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* unregister_dropped_devices - Unregister devices scheduled for removal
*
* Unregister all devices on the dpm_destroy list.
@@ -305,7 +266,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;
struct device *dev = to_device(entry);

- up(&dev->sem);
mutex_unlock(&dpm_list_mtx);
/* This also removes the device from the list */
device_unregister(dev);
@@ -324,7 +284,6 @@ void device_resume(void)
{
might_sleep();
dpm_resume();
- unlock_all_devices();
unregister_dropped_devices();
up_write(&pm_sleep_rwsem);
}
@@ -388,18 +347,15 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
error = suspend_device_late(dev, state);
if (error) {
printk(KERN_ERR "Could not power down device %s: "
"error %d\n",
kobject_name(&dev->kobj), error);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
break;
}
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off_irq);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off_irq);
}

if (!error)
@@ -419,6 +375,8 @@ static int suspend_device(struct device
{
int error = 0;

+ down(&dev->sem);
+
if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
dev->power.power_state.event, state.event);
@@ -441,6 +399,9 @@ static int suspend_device(struct device
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
+
+ up(&dev->sem);
+
return error;
}

@@ -461,11 +422,10 @@ static int dpm_suspend(pm_message_t stat
int error = 0;

mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_locked)) {
- struct list_head *entry = dpm_locked.prev;
+ while (!list_empty(&dpm_active)) {
+ struct list_head *entry = dpm_active.prev;
struct device *dev = to_device(entry);

- list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
error = suspend_device(dev, state);
if (error) {
@@ -476,14 +436,11 @@ static int dpm_suspend(pm_message_t stat
(error == -EAGAIN ?
" (please convert to suspend_late)" :
""));
- mutex_lock(&dpm_list_mtx);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_locked);
break;
}
mutex_lock(&dpm_list_mtx);
- if (list_empty(&dev->power.entry))
- list_add(&dev->power.entry, &dpm_off);
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_off);
}
mutex_unlock(&dpm_list_mtx);

@@ -491,36 +448,6 @@ static int dpm_suspend(pm_message_t stat
}

/**
- * lock_all_devices - Acquire every device's semaphore
- *
- * Go through the dpm_active list. Carefully lock each device's
- * semaphore and put it in on the dpm_locked list.
- */
-static void lock_all_devices(void)
-{
- mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_active)) {
- struct list_head *entry = dpm_active.next;
- struct device *dev = to_device(entry);
-
- /* Required locking order is dev->sem first,
- * then dpm_list_mutex. Hence this awkward code.
- */
- get_device(dev);
- mutex_unlock(&dpm_list_mtx);
- down(&dev->sem);
- mutex_lock(&dpm_list_mtx);
-
- if (list_empty(entry))
- up(&dev->sem); /* Device was removed */
- else
- list_move_tail(entry, &dpm_locked);
- put_device(dev);
- }
- mutex_unlock(&dpm_list_mtx);
-}
-
-/**
* device_suspend - Save state and stop all devices in system.
* @state: new power management state
*
@@ -533,7 +460,6 @@ int device_suspend(pm_message_t state)

might_sleep();
down_write(&pm_sleep_rwsem);
- lock_all_devices();
error = dpm_suspend(state);
if (error)
device_resume();
Index: linux-2.6/drivers/usb/core/usb.c
===================================================================
--- linux-2.6.orig/drivers/usb/core/usb.c
+++ linux-2.6/drivers/usb/core/usb.c
@@ -234,7 +234,7 @@ static int ksuspend_usb_init(void)
* singlethreaded. Its job doesn't justify running on more
* than one CPU.
*/
- ksuspend_usb_wq = create_singlethread_workqueue("ksuspend_usbd");
+ ksuspend_usb_wq = create_freezeable_workqueue("ksuspend_usbd");
if (!ksuspend_usb_wq)
return -ENOMEM;
return 0;
Index: linux-2.6/drivers/base/core.c
===================================================================
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -770,13 +770,6 @@ int device_add(struct device *dev)
struct class_interface *class_intf;
int error;

- error = pm_sleep_lock();
- if (error) {
- dev_warn(dev, "Suspicious %s during suspend\n", __FUNCTION__);
- dump_stack();
- return error;
- }
-
dev = get_device(dev);
if (!dev || !strlen(dev->bus_id)) {
error = -EINVAL;
@@ -843,7 +836,6 @@ int device_add(struct device *dev)
}
Done:
put_device(dev);
- pm_sleep_unlock();
return error;
BusError:
device_pm_remove(dev);

2008-02-24 21:11:23

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[email protected]>
>
> Remove the code that acquires all device semaphores from the suspend
> code path as it causes multiple problems to appear (most notably,
> http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> depending on the code being removed.
>
> Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
> the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

This looks okay.

Are you going to submit it just for 2.6.25, leaving the existing code
as-is for 2.6.26?

Alan Stern

2008-02-24 21:42:41

by Pavel Machek

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun 2008-02-24 15:33:01, Alan Stern wrote:
> On Sun, 24 Feb 2008, Pavel Machek wrote:
>
> > > > What locking protects this variable? What happens when suspending_task
> > > > exits? (Hmm, that would probably be bug, anyway?)
> > >
> > > It's protected by whatever existing locking scheme allows only one
> > > task to start a system sleep at a time. For example, the suspending
> > > task has to get a write lock on pm_sleep_rwsem.
> >
> > And readers of suspending_task are protected by?
>
> I added a comment about that too.
>
> > At the very least, you'd need rmb() before reading it and wmb() after
> > writing to it, but I'm not sure if that's enough on every obscure
> > architecture out there.
>
> No, neither one is needed because of the way suspending_task is used.
>
> It's not necessary for a reader R to see the variable's actual value;
> all R needs to know is whether or not suspending_task is equal to R.
> Since the only process which can set suspending_task to R is R itself,
> and since R will set suspending_task back to NULL before releasing the
> write lock on pm_sleep_rwsem, there's never any ambiguity.

Subtle.

Very subtly wrong ;-).

imagine suspending_task == 0xabcdef01. Now task "R" with current ==
0xabcd0000 reads suspending_task while the other cpu is writing to it,
and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
believes that "R" == suspending_task.

I agree it is very unlikely, and it will not happen on i386. But what
about just using atomic_t suspending_task, and store current->pid into
it?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-24 22:20:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Remove the code that acquires all device semaphores from the suspend
> > code path as it causes multiple problems to appear (most notably,
> > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> > depending on the code being removed.
> >
> > Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
> > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.
>
> This looks okay.
>
> Are you going to submit it just for 2.6.25, leaving the existing code
> as-is for 2.6.26?

No, I'd like to start over on top of 2.6.25. That will be much cleaner from
the upstream point of view.

We seem to have all the pieces anyway, so that's only a matter of putting them
back together.

Thanks,
Rafael

2008-02-24 22:22:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sunday, 24 of February 2008, Pavel Machek wrote:
> On Sun 2008-02-24 15:33:01, Alan Stern wrote:
> > On Sun, 24 Feb 2008, Pavel Machek wrote:
> >
> > > > > What locking protects this variable? What happens when suspending_task
> > > > > exits? (Hmm, that would probably be bug, anyway?)
> > > >
> > > > It's protected by whatever existing locking scheme allows only one
> > > > task to start a system sleep at a time. For example, the suspending
> > > > task has to get a write lock on pm_sleep_rwsem.
> > >
> > > And readers of suspending_task are protected by?
> >
> > I added a comment about that too.
> >
> > > At the very least, you'd need rmb() before reading it and wmb() after
> > > writing to it, but I'm not sure if that's enough on every obscure
> > > architecture out there.
> >
> > No, neither one is needed because of the way suspending_task is used.
> >
> > It's not necessary for a reader R to see the variable's actual value;
> > all R needs to know is whether or not suspending_task is equal to R.
> > Since the only process which can set suspending_task to R is R itself,
> > and since R will set suspending_task back to NULL before releasing the
> > write lock on pm_sleep_rwsem, there's never any ambiguity.
>
> Subtle.
>
> Very subtly wrong ;-).
>
> imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> 0xabcd0000 reads suspending_task while the other cpu is writing to it,
> and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
> believes that "R" == suspending_task.
>
> I agree it is very unlikely, and it will not happen on i386. But what
> about just using atomic_t suspending_task, and store current->pid into
> it?

I'd rather use a lock, frankly. For example, we can require the readers to
take pm_sleep_rwsem for reading in order to access that.

Thanks,
Rafael

2008-02-25 02:19:23

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Pavel Machek wrote:

> > > At the very least, you'd need rmb() before reading it and wmb() after
> > > writing to it, but I'm not sure if that's enough on every obscure
> > > architecture out there.
> >
> > No, neither one is needed because of the way suspending_task is used.
> >
> > It's not necessary for a reader R to see the variable's actual value;
> > all R needs to know is whether or not suspending_task is equal to R.
> > Since the only process which can set suspending_task to R is R itself,
> > and since R will set suspending_task back to NULL before releasing the
> > write lock on pm_sleep_rwsem, there's never any ambiguity.
>
> Subtle.
>
> Very subtly wrong ;-).
>
> imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> 0xabcd0000 reads suspending_task while the other cpu is writing to it,
> and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
> believes that "R" == suspending_task.

I always thought that reads and writes of pointers are atomic, just
like reads and writes of longs. Is that wrong?

Now if pointers were the same size as long long then I would agree with
you.

> I agree it is very unlikely, and it will not happen on i386. But what
> about just using atomic_t suspending_task, and store current->pid into
> it?

That would work just as well. Except that it wouldn't need to be
atomic_t, because current->pid is always an integer (not guaranteed, I
suppose, but that's what it is on all current architectures) and
reads/writes of ints _are_ atomic.

Alan Stern

2008-02-25 02:21:59

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> > Very subtly wrong ;-).
> >
> > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > 0xabcd0000 reads suspending_task while the other cpu is writing to it,
> > and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
> > believes that "R" == suspending_task.
> >
> > I agree it is very unlikely, and it will not happen on i386. But what
> > about just using atomic_t suspending_task, and store current->pid into
> > it?
>
> I'd rather use a lock, frankly. For example, we can require the readers to
> take pm_sleep_rwsem for reading in order to access that.

That certainly won't work. Imagine what would happen when the reader
_was_ the suspending task.

But if you really twist my arm, I'll go along with Pavel's suggestion.

Alan Stern

2008-02-25 11:30:43

by Pavel Machek

[permalink] [raw]
Subject: using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)

Hi!

Alan thinks that `subj` is correct...

> > > > At the very least, you'd need rmb() before reading it and wmb() after
> > > > writing to it, but I'm not sure if that's enough on every obscure
> > > > architecture out there.
> > >
> > > No, neither one is needed because of the way suspending_task is used.
> > >
> > > It's not necessary for a reader R to see the variable's actual value;
> > > all R needs to know is whether or not suspending_task is equal to R.
> > > Since the only process which can set suspending_task to R is R itself,
> > > and since R will set suspending_task back to NULL before releasing the
> > > write lock on pm_sleep_rwsem, there's never any ambiguity.
> >
> > Subtle.
> >
> > Very subtly wrong ;-).
> >
> > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > 0xabcd0000 reads suspending_task while the other cpu is writing to it,
> > and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
> > believes that "R" == suspending_task.
>
> I always thought that reads and writes of pointers are atomic, just
> like reads and writes of longs. Is that wrong?

...but I'm not that sure. Can someone clarify?

I guess it only works as long as longs are aligned? Should it be
written down to atomic_ops.txt?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-02-25 11:41:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Monday, 25 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Pavel Machek wrote:
>
> > > > At the very least, you'd need rmb() before reading it and wmb() after
> > > > writing to it, but I'm not sure if that's enough on every obscure
> > > > architecture out there.
> > >
> > > No, neither one is needed because of the way suspending_task is used.
> > >
> > > It's not necessary for a reader R to see the variable's actual value;
> > > all R needs to know is whether or not suspending_task is equal to R.
> > > Since the only process which can set suspending_task to R is R itself,
> > > and since R will set suspending_task back to NULL before releasing the
> > > write lock on pm_sleep_rwsem, there's never any ambiguity.
> >
> > Subtle.
> >
> > Very subtly wrong ;-).
> >
> > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > 0xabcd0000 reads suspending_task while the other cpu is writing to it,
> > and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
> > believes that "R" == suspending_task.
>
> I always thought that reads and writes of pointers are atomic, just
> like reads and writes of longs. Is that wrong?

That depends on the architecture. It may be wrong on alpha, IIRC.

> Now if pointers were the same size as long long then I would agree with
> you.

That certainly is true on x86-64. On alpha probably too.

> > I agree it is very unlikely, and it will not happen on i386. But what
> > about just using atomic_t suspending_task, and store current->pid into
> > it?
>
> That would work just as well. Except that it wouldn't need to be
> atomic_t, because current->pid is always an integer (not guaranteed, I
> suppose, but that's what it is on all current architectures) and
> reads/writes of ints _are_ atomic.

That also depends on the arch, I'm afraid, and in general if we assume
something to be atomic, it's better to make the code reflect that assumption.

Thanks,
Rafael

2008-02-25 11:43:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Bug 10030] Suspend doesn't work when SD card is inserted

On Monday, 25 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
>
> > > Very subtly wrong ;-).
> > >
> > > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > > 0xabcd0000 reads suspending_task while the other cpu is writing to it,
> > > and sees 0xabcd0000 (0xef01 was not yet written) -- and mistakenly
> > > believes that "R" == suspending_task.
> > >
> > > I agree it is very unlikely, and it will not happen on i386. But what
> > > about just using atomic_t suspending_task, and store current->pid into
> > > it?
> >
> > I'd rather use a lock, frankly. For example, we can require the readers to
> > take pm_sleep_rwsem for reading in order to access that.
>
> That certainly won't work. Imagine what would happen when the reader
> _was_ the suspending task.

Yeah, I've already realized it was a stupid idea. :-)

> But if you really twist my arm, I'll go along with Pavel's suggestion.

So can you do that, pretty please?

Thanks,
Rafael

2008-02-25 14:46:21

by Alan Stern

[permalink] [raw]
Subject: Re: using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)

On Mon, 25 Feb 2008, Pavel Machek wrote:

> Hi!
>
> Alan thinks that `subj` is correct...

More precisely, reads and writes of pointers are always atomic. That
is, if a write and a read occur concurrently, it is guaranteed that the
read will obtain either the old or the new value of the pointer, never
a mish-mash of the two. If this were not so then RCU wouldn't work.

> I guess it only works as long as longs are aligned? Should it be
> written down to atomic_ops.txt?

Forget it, I'm going to withdraw the entire thing. The whole approach
is wrong. More details in a new email thread, to follow soon.

Alan Stern

2008-02-25 17:41:43

by Pierre Ossman

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Sun, 24 Feb 2008 10:33:34 -0500 (EST)
Alan Stern <[email protected]> wrote:

>
> Well, the patch itself isn't really adequate; it was just a first pass
> intended to illustrate the nature of the problem.
>
> The problems in the MMC subsystem go deeper.
>
> The first is the way it uses flush_workqueue(). This is almost always
> a bad function to call, because it introduces unnecessary dependencies.
> cancel_delayed_work_sync() is much better.
>
> But even changing that won't solve the second issue, which is a genuine
> bug. There is a race between detect events and suspend events. The
> mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
> before calling the host's suspend routine. So what happens if another
> detect event occurs in between?
>

The idea is that host drivers shouldn't do that. Once they've called
mmc_suspend_host(), then they shouldn't be poking the MMC core in any
other way. None of this is of course properly documented. :/

> This whole area of synchronization between card insertion, card
> removal, suspend, and resume needs to be thought out better.
> Especially keeping in mind that device registration or unregistration
> during a system sleep is likely to block until the sleep is over.

Trying to keep up with the PM changes is a full time job. For now I've
mostly ignored it as I don't even have time for the other stuff.
Patches welcome.

>
> Lastly, there are some other questions about confusing aspects of the
> subsystem. What's the story with __mmc_claim_host()? Is it really
> nothing more than an abortable mutex_lock()? Why does it ever need to
> be aborted? Why is its second argument an atomic_t instead of a normal
> int?
>

It got that way when SDIO got in. It was needed to make it abortable to
solve a rather nasty deadlock situation. I don't remember the details
right now, but it should be in the LKML archives.

> Why are mmc_detect() and related routines described in comments as
> "Card detection callback from host"? They don't handle card
> _detection_ -- they handle card _removal_.

They handle both.

>
> What's the purpose of the card-counting loop in
> host/omap.c:mmc_omap_switch_handler()? It looks like dead code.
>

I'm not too familiar with that driver, but they've been playing around
with multiplexing several cards into one controller. Might be bits and
pieces of that.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-02-25 17:58:44

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

Thanks for the explanations.

On Mon, 25 Feb 2008, Pierre Ossman wrote:

> Trying to keep up with the PM changes is a full time job. For now I've
> mostly ignored it as I don't even have time for the other stuff.
> Patches welcome.

What do you think of the patch attached to comment #40 in the Bugzilla
entry?

Alan Stern

2008-02-25 18:32:18

by Pierre Ossman

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Mon, 25 Feb 2008 12:58:30 -0500 (EST)
Alan Stern <[email protected]> wrote:

> Thanks for the explanations.
>
> On Mon, 25 Feb 2008, Pierre Ossman wrote:
>
> > Trying to keep up with the PM changes is a full time job. For now I've
> > mostly ignored it as I don't even have time for the other stuff.
> > Patches welcome.
>
> What do you think of the patch attached to comment #40 in the Bugzilla
> entry?
>

Looks ok. As long as those two synchronization points are guaranteed,
then I'm happy.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-02-25 20:01:03

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Mon, 25 Feb 2008, Pierre Ossman wrote:

> > What do you think of the patch attached to comment #40 in the Bugzilla
> > entry?
> >
>
> Looks ok. As long as those two synchronization points are guaranteed,
> then I'm happy.

Maybe a better approach would be to leave the workqueue unfreezable,
and keep cancel_delayed_work_sync() in mmc_suspend_host(). It would
then be necessary to add a test to verify, if there is a card attached,
that the card is indeed suspended. After all, it's possible that the
cancel_delayed_work_sync() ended up waiting for a job already running
on the workqueue to register a new card! (The same would be true even
with flush_scheduled_work.)

Also, as a bit of defensive programming, it might be a good idea to add
a "suspended" flag to the mmc_host structure. If mmc_rescan() sees
that the flag is set then it should return immediately. This would
protect against host drivers that aren't careful to disable detect
IRQs before calling mmc_suspend_host().

Alan Stern

2008-02-25 22:52:14

by Felipe Balbi

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

<snip>

> > What's the purpose of the card-counting loop in
> > host/omap.c:mmc_omap_switch_handler()? It looks like dead code.
> >
>
> I'm not too familiar with that driver, but they've been playing around
> with multiplexing several cards into one controller. Might be bits and
> pieces of that.
>

AFAIK, that came after mmc multislot support. Omap2 has one mmc
controller for several (2 in practice) mm slots (cards).
It's not really a dead code since it's used in n800 and n810, it's in
linux-omap archives and git tree.

Carlos Aguiar, added in the loop, might have some comments about it.

--
Best Regards,
Felipe Balbi
[email protected]

2008-03-01 14:13:01

by Pierre Ossman

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Mon, 25 Feb 2008 15:00:51 -0500 (EST)
Alan Stern <[email protected]> wrote:

>
> Maybe a better approach would be to leave the workqueue unfreezable,
> and keep cancel_delayed_work_sync() in mmc_suspend_host(). It would
> then be necessary to add a test to verify, if there is a card attached,
> that the card is indeed suspended. After all, it's possible that the
> cancel_delayed_work_sync() ended up waiting for a job already running
> on the workqueue to register a new card! (The same would be true even
> with flush_scheduled_work.)

How would that be handled? I'd prefer a patch as I'm evidently not up to date with how to fondle the pm stuff properly. :)

>
> Also, as a bit of defensive programming, it might be a good idea to add
> a "suspended" flag to the mmc_host structure. If mmc_rescan() sees
> that the flag is set then it should return immediately. This would
> protect against host drivers that aren't careful to disable detect
> IRQs before calling mmc_suspend_host().

Indeed. I'll add that to my todo.

Rgds

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-03-01 14:36:46

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Sat, 1 Mar 2008, Pierre Ossman wrote:

> On Mon, 25 Feb 2008 15:00:51 -0500 (EST)
> Alan Stern <[email protected]> wrote:
>
> >
> > Maybe a better approach would be to leave the workqueue unfreezable,
> > and keep cancel_delayed_work_sync() in mmc_suspend_host(). It would
> > then be necessary to add a test to verify, if there is a card attached,
> > that the card is indeed suspended. After all, it's possible that the
> > cancel_delayed_work_sync() ended up waiting for a job already running
> > on the workqueue to register a new card! (The same would be true even
> > with flush_scheduled_work.)
>
> How would that be handled? I'd prefer a patch as I'm evidently not up to date with how to fondle the pm stuff properly. :)

The easiest way to do this will be to wait until some planned updates
are added to the PM core; then this will be a quick and simple change.
That probably means waiting until after 2.6.25 is released, however.

> > Also, as a bit of defensive programming, it might be a good idea to add
> > a "suspended" flag to the mmc_host structure. If mmc_rescan() sees
> > that the flag is set then it should return immediately. This would
> > protect against host drivers that aren't careful to disable detect
> > IRQs before calling mmc_suspend_host().
>
> Indeed. I'll add that to my todo.

That could go in along with the previous change. The PM update
mentioned above involves adding a "suspended" flag to every struct
device. With that available, this amounts to nothing more than an
extra test added at the beginning of mmc_rescan().

Alan Stern

2008-03-01 14:48:22

by Pierre Ossman

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Sat, 1 Mar 2008 09:36:36 -0500 (EST)
Alan Stern <[email protected]> wrote:

>
> The easiest way to do this will be to wait until some planned updates
> are added to the PM core; then this will be a quick and simple change.
> That probably means waiting until after 2.6.25 is released, however.
>

Ok. Just give me a poke when it's time.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-03-03 14:44:39

by Pavel Machek

[permalink] [raw]
Subject: [patch] Re: using long instead of atomic_t when only set/read is required

Hi!

> > Alan thinks that `subj` is correct...
>
> More precisely, reads and writes of pointers are always atomic. That
> is, if a write and a read occur concurrently, it is guaranteed that the
> read will obtain either the old or the new value of the pointer, never
> a mish-mash of the two. If this were not so then RCU wouldn't work.

Ok, so linux actually atomicity of long?

If so, this should probably be applied...

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 4ef2450..0a7d180 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -21,6 +21,9 @@ local_t is very similar to atomic_t. If
updated by one CPU, local_t is probably more appropriate. Please see
Documentation/local_ops.txt for the semantics of local_t.

+long (and int and void *) can be used instead of atomic_t, if all you
+need is atomic setting and atomic reading.
+
The first operations to implement for atomic_t's are the initializers and
plain reads.




--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-03 15:42:45

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, 3 Mar 2008, Pavel Machek wrote:

> Hi!
>
> > > Alan thinks that `subj` is correct...
> >
> > More precisely, reads and writes of pointers are always atomic. That
> > is, if a write and a read occur concurrently, it is guaranteed that the
> > read will obtain either the old or the new value of the pointer, never
> > a mish-mash of the two. If this were not so then RCU wouldn't work.

Right, Paul?

> Ok, so linux actually atomicity of long?
>
> If so, this should probably be applied...
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
> index 4ef2450..0a7d180 100644
> --- a/Documentation/atomic_ops.txt
> +++ b/Documentation/atomic_ops.txt
> @@ -21,6 +21,9 @@ local_t is very similar to atomic_t. If
> updated by one CPU, local_t is probably more appropriate. Please see
> Documentation/local_ops.txt for the semantics of local_t.
>
> +long (and int and void *) can be used instead of atomic_t, if all you
> +need is atomic setting and atomic reading.
> +
> The first operations to implement for atomic_t's are the initializers and
> plain reads.

Yes indeed. This fact doesn't seem to be documented anywhere, but it
is clearly a requirement of the kernel. I would make the text a little
more explicit, see below.

Alan Stern

-------------------------------------------------------


Atomicity of reads of write for pointers and integral types (other than
long long) should be documented.

Signed-off-by: Alan Stern <[email protected]>

---

Index: usb-2.6/Documentation/atomic_ops.txt
===================================================================
--- usb-2.6.orig/Documentation/atomic_ops.txt
+++ usb-2.6/Documentation/atomic_ops.txt
@@ -21,6 +21,21 @@ local_t is very similar to atomic_t. If
updated by one CPU, local_t is probably more appropriate. Please see
Documentation/local_ops.txt for the semantics of local_t.

+For all properly-aligned pointer and integral types other than long
+long, the kernel requires simple reads and writes to be atomic with
+respect to each other. That is, if one CPU reads a pointer value at
+the same time as another CPU overwrites the pointer, it is guaranteed
+that the reader will obtain either the old or the new value of the
+pointer, never some mish-mash combination of the two. Likewise, if
+one CPU writes a long value at the same time as another CPU does, it
+is guaranteed that one or the other of the values will end up stored
+in memory, not some mish-mash combination of bits.
+
+Thus, if all you need is atomicity of reading and writing then you can
+use plain old ints, longs, or pointers; you don't need to use
+atomic_t. (But note: This guarantee emphatically does not apply to
+long long values or unaligned values!)
+
The first operations to implement for atomic_t's are the initializers and
plain reads.

2008-03-03 16:02:53

by Alan

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

> Ok, so linux actually atomicity of long?

No it doesn't. And even if it did you couldn't use long for this because
atomic_t also ensures the points operations complete are defined. You
might just about get away with volatile long * objects on x86 for simple
assignments but for anything else gcc can and will generate code to
update values whichever way it feels best - which includes turning

long *x = a + b;

into

*x = a;
*x += b;

Alan

2008-03-03 16:12:17

by Alan

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required


> Atomicity of reads of write for pointers and integral types (other than
> long long) should be documented.

NAK.

Atomicity of reads or writes for pointers and integral types is NOT
guaranteed. Gcc doesn't believe in your guarantee.

Alan

2008-03-03 17:11:22

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, 3 Mar 2008, Alan Cox wrote:

> > Atomicity of reads of write for pointers and integral types (other than
> > long long) should be documented.
>
> NAK.
>
> Atomicity of reads or writes for pointers and integral types is NOT
> guaranteed. Gcc doesn't believe in your guarantee.

Miscommunication and lack of clarity. CPU reads and writes _are_
guaranteed to be atomic. What is not guaranteed is that the compiler
will generate a single read or write instruction corresponding to a
particular expression in C.

Consider a routine like the following:

static task_struct *the_task;

void store_task(void)
{
the_task = current;
}

Is it possible to say whether readers examining "the_task" are
guaranteed to see a coherent value?

Alan Stern

2008-03-03 17:34:48

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Tuesday 04 March 2008 02:53, Alan Cox wrote:
> > Atomicity of reads of write for pointers and integral types (other than
> > long long) should be documented.
>
> NAK.
>
> Atomicity of reads or writes for pointers and integral types is NOT
> guaranteed. Gcc doesn't believe in your guarantee.

Are you sure gcc doesn't? Or is it just "C"?

Linux wouldn't work today if gcc did something non-atomic there
(presuming you're talking about naturally aligned pointers/ints).
It is widely used and accepted.

RCU users are far from the only places to rely on this, although
I guess they are the main ones when it comes to assigning pointers
atomically.

2008-03-03 17:35:52

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, Mar 03, 2008 at 10:42:35AM -0500, Alan Stern wrote:
> On Mon, 3 Mar 2008, Pavel Machek wrote:
>
> > Hi!
> >
> > > > Alan thinks that `subj` is correct...
> > >
> > > More precisely, reads and writes of pointers are always atomic. That
> > > is, if a write and a read occur concurrently, it is guaranteed that the
> > > read will obtain either the old or the new value of the pointer, never
> > > a mish-mash of the two. If this were not so then RCU wouldn't work.
>
> Right, Paul?

Yep, as long as they aligned naturally, e.g., 32-bit pointers to a
four-byte boundary, 64-bit pointers to an eight-byte boundary.

Note that this is a backdoor agreement between the Linux kernel and gcc.
The C/C++ standard does -not- require atomic reads or writes for anything
other than a volatile sig_atomic_t. In fact, there are a lot of things
that C/C++ doesn't guarantee, which is why Linux makes use of so many
of gcc's non-standard extensions. ;-)

> > Ok, so linux actually atomicity of long?

Yep, same alignment rules as pointers. Ah, and the alignment is
covered below. Very good!

> > If so, this should probably be applied...
> >
> > Signed-off-by: Pavel Machek <[email protected]>
> >
> > diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
> > index 4ef2450..0a7d180 100644
> > --- a/Documentation/atomic_ops.txt
> > +++ b/Documentation/atomic_ops.txt
> > @@ -21,6 +21,9 @@ local_t is very similar to atomic_t. If
> > updated by one CPU, local_t is probably more appropriate. Please see
> > Documentation/local_ops.txt for the semantics of local_t.
> >
> > +long (and int and void *) can be used instead of atomic_t, if all you
> > +need is atomic setting and atomic reading.
> > +
> > The first operations to implement for atomic_t's are the initializers and
> > plain reads.
>
> Yes indeed. This fact doesn't seem to be documented anywhere, but it
> is clearly a requirement of the kernel. I would make the text a little
> more explicit, see below.
>
> Alan Stern
>
> -------------------------------------------------------
>
>
> Atomicity of reads of write for pointers and integral types (other than
> long long) should be documented.
>
> Signed-off-by: Alan Stern <[email protected]>
>
> ---
>
> Index: usb-2.6/Documentation/atomic_ops.txt
> ===================================================================
> --- usb-2.6.orig/Documentation/atomic_ops.txt
> +++ usb-2.6/Documentation/atomic_ops.txt
> @@ -21,6 +21,21 @@ local_t is very similar to atomic_t. If
> updated by one CPU, local_t is probably more appropriate. Please see
> Documentation/local_ops.txt for the semantics of local_t.
>
> +For all properly-aligned pointer and integral types other than long
> +long, the kernel requires simple reads and writes to be atomic with
> +respect to each other. That is, if one CPU reads a pointer value at
> +the same time as another CPU overwrites the pointer, it is guaranteed
> +that the reader will obtain either the old or the new value of the
> +pointer, never some mish-mash combination of the two. Likewise, if
> +one CPU writes a long value at the same time as another CPU does, it
> +is guaranteed that one or the other of the values will end up stored
> +in memory, not some mish-mash combination of bits.
> +
> +Thus, if all you need is atomicity of reading and writing then you can
> +use plain old ints, longs, or pointers; you don't need to use
> +atomic_t. (But note: This guarantee emphatically does not apply to
> +long long values or unaligned values!)
> +
> The first operations to implement for atomic_t's are the initializers and
> plain reads.
>
>

2008-03-03 17:36:44

by Pavel Machek

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

Hi!

> > Ok, so linux actually atomicity of long?
^~-- assumes should be here.

> No it doesn't. And even if it did you couldn't use long for this because
> atomic_t also ensures the points operations complete are defined. You
> might just about get away with volatile long * objects on x86 for simple
> assignments but for anything else gcc can and will generate code to
> update values whichever way it feels best - which includes turning
>
> long *x = a + b;
>
> into
>
> *x = a;
> *x += b;

Ok, I can understand the gcc side. But do we actually run on an
architecture where

long *x;

*x = 0;

racing with

*x = 0x12345678;

can produce

*x == 0x12340000;

or something like that? I'm told RCU relies on architectures not doing
this, and I'd like to get this clarified.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-03 17:38:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required



On Mon, 3 Mar 2008, Alan Stern wrote:
>
> Consider a routine like the following:
>
> static task_struct *the_task;
>
> void store_task(void)
> {
> the_task = current;
> }
>
> Is it possible to say whether readers examining "the_task" are
> guaranteed to see a coherent value?

Yes, we do depend on this. All the RCU stuff (and in general *anything*
that depends on memory ordering as opposed to full locking, and we have
quite a lot of it) is very fundamentally dependent on the fact that things
like pointers get read and written atomically.

HOWEVER, it is worth pointing out that it's generally true in a
"different" sense than the actual atomic accesses. For example, if you
test a single bit of a word, it's still quite possible that gcc will have
turned that "atomic" read into a single byte read, so it's not necessarily
the case that we'll actually even read the whole word.

(Writes are different: if you do things like bitwise updates they simply
*will*not* be atomic, but that's simply not what we depend on anyway).

So in that sense, the atomicity guarantees are a lot weaker than the ones
we do for IO accesses, but that's all fine. Memory isn't IO, and doesn't
have side effects.

Linus

2008-03-03 17:39:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Tue, Mar 04, 2008 at 04:16:33AM +1100, Nick Piggin wrote:
> On Tuesday 04 March 2008 02:53, Alan Cox wrote:
> > > Atomicity of reads of write for pointers and integral types (other than
> > > long long) should be documented.
> >
> > NAK.
> >
> > Atomicity of reads or writes for pointers and integral types is NOT
> > guaranteed. Gcc doesn't believe in your guarantee.
>
> Are you sure gcc doesn't? Or is it just "C"?
>
> Linux wouldn't work today if gcc did something non-atomic there
> (presuming you're talking about naturally aligned pointers/ints).
> It is widely used and accepted.
>
> RCU users are far from the only places to rely on this, although
> I guess they are the main ones when it comes to assigning pointers
> atomically.

It is true that gcc can refetch pointers/ints if it runs out of registers,
which is why rcu_dereference() recently had an ACCESS_ONCE() added to it.

But such refetching cannot result in a mish-mash of two different
pointer values, confusing though it might be to the affected code.

Thanx, Paul

2008-03-03 17:44:36

by Pavel Machek

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

Hi!

> > Consider a routine like the following:
> >
> > static task_struct *the_task;
> >
> > void store_task(void)
> > {
> > the_task = current;
> > }
> >
> > Is it possible to say whether readers examining "the_task" are
> > guaranteed to see a coherent value?
>
> Yes, we do depend on this. All the RCU stuff (and in general *anything*
> that depends on memory ordering as opposed to full locking, and we have
> quite a lot of it) is very fundamentally dependent on the fact that things
> like pointers get read and written atomically.
>
> HOWEVER, it is worth pointing out that it's generally true in a
> "different" sense than the actual atomic accesses. For example, if you
> test a single bit of a word, it's still quite possible that gcc will have
> turned that "atomic" read into a single byte read, so it's not necessarily
> the case that we'll actually even read the whole word.
>
> (Writes are different: if you do things like bitwise updates they simply
> *will*not* be atomic, but that's simply not what we depend on anyway).

Ok... can we get Alan Stern's patch into Documentation/atomic_ops.txt
, then? I was not aware of this, and there seems to be lot of
confusion around...

Plus... I really don't think we can "just access" this as normal
pointers... due to the compiler issues Alan Cox mentioned, and due to
the ACCESS_ONCE() issue.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-03 17:48:32

by Alan

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

> Are you sure gcc doesn't? Or is it just "C"?

gcc doesn't

> Linux wouldn't work today if gcc did something non-atomic there
> (presuming you're talking about naturally aligned pointers/ints).
> It is widely used and accepted.

Yes and we've had tty layer traces in the past clearly showing it isn't
always safe, especially if any math is involved anywhere near the
assignment. That may be why pointer flipping happens to work.

Alan

2008-03-03 19:27:25

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, 3 Mar 2008, Pavel Machek wrote:

> Ok... can we get Alan Stern's patch into Documentation/atomic_ops.txt
> , then? I was not aware of this, and there seems to be lot of
> confusion around...
>
> Plus... I really don't think we can "just access" this as normal
> pointers... due to the compiler issues Alan Cox mentioned, and due to
> the ACCESS_ONCE() issue.

Here's an updated version of the patch, including the issue Alan Cox
brought up.

Alan Stern

-----------------------------------------------------------------

Atomicity of reads of write for pointers and integral types (other than
long long) should be documented, along with the limitations imposed by
the compiler.

Signed-off-by: Alan Stern <[email protected]>

---

Index: usb-2.6/Documentation/atomic_ops.txt
===================================================================
--- usb-2.6.orig/Documentation/atomic_ops.txt
+++ usb-2.6/Documentation/atomic_ops.txt
@@ -21,6 +21,24 @@ local_t is very similar to atomic_t. If
updated by one CPU, local_t is probably more appropriate. Please see
Documentation/local_ops.txt for the semantics of local_t.

+For all properly-aligned pointer and integral types other than long
+long, the kernel requires simple reads and writes to be atomic with
+respect to each other. That is, if one CPU reads a pointer value at
+the same time as another CPU overwrites the pointer, it is guaranteed
+that the reader will obtain either the old or the new value of the
+pointer, never some mish-mash combination of the two. Likewise, if
+one CPU writes a long value at the same time as another CPU does, it
+is guaranteed that one or the other of the values will end up stored
+in memory, not some mish-mash combination of bits.
+
+Thus, if all you need is atomicity of reading and writing then you can
+use plain old ints, longs, or pointers; you don't need to use
+atomic_t. But note: This guarantee emphatically does not apply to
+long long values or unaligned values! Note also that gcc does not
+guarantee to compile all C assignment expressions into simple writes.
+For example, a statement like "x = a + b" might cause gcc to emit code
+equivalent to "x = a; x += b", which is decidedly non-atomic.
+
The first operations to implement for atomic_t's are the initializers and
plain reads.

2008-03-03 20:29:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Monday, 3 of March 2008, Pavel Machek wrote:
> Hi!
>
> > > Ok, so linux actually atomicity of long?
> ^~-- assumes should be here.
>
> > No it doesn't. And even if it did you couldn't use long for this because
> > atomic_t also ensures the points operations complete are defined. You
> > might just about get away with volatile long * objects on x86 for simple
> > assignments but for anything else gcc can and will generate code to
> > update values whichever way it feels best - which includes turning
> >
> > long *x = a + b;
> >
> > into
> >
> > *x = a;
> > *x += b;
>
> Ok, I can understand the gcc side. But do we actually run on an
> architecture where
>
> long *x;
>
> *x = 0;
>
> racing with
>
> *x = 0x12345678;
>
> can produce
>
> *x == 0x12340000;
>
> or something like that?

Well something like this could happen, in theory, on a "32-bit" architecture
with a 16-bit bus. In that case, one can imagine, the first word of the
first write may be sent through the bus immediately followed by the first
word of the second write, followed by the second word of the second
write and by the second word of the first write, in this order.

> I'm told RCU relies on architectures not doing this, and I'd like to get this
> clarified.

Yes, it would be good to know that for sure.

Rafael

2008-03-03 21:13:09

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, Mar 03, 2008 at 09:27:29PM +0100, Rafael J. Wysocki wrote:
> On Monday, 3 of March 2008, Pavel Machek wrote:
> > Hi!
> >
> > > > Ok, so linux actually atomicity of long?
> > ^~-- assumes should be here.
> >
> > > No it doesn't. And even if it did you couldn't use long for this because
> > > atomic_t also ensures the points operations complete are defined. You
> > > might just about get away with volatile long * objects on x86 for simple
> > > assignments but for anything else gcc can and will generate code to
> > > update values whichever way it feels best - which includes turning
> > >
> > > long *x = a + b;
> > >
> > > into
> > >
> > > *x = a;
> > > *x += b;
> >
> > Ok, I can understand the gcc side. But do we actually run on an
> > architecture where
> >
> > long *x;
> >
> > *x = 0;
> >
> > racing with
> >
> > *x = 0x12345678;
> >
> > can produce
> >
> > *x == 0x12340000;
> >
> > or something like that?
>
> Well something like this could happen, in theory, on a "32-bit" architecture
> with a 16-bit bus. In that case, one can imagine, the first word of the
> first write may be sent through the bus immediately followed by the first
> word of the second write, followed by the second word of the second
> write and by the second word of the first write, in this order.
>
> > I'm told RCU relies on architectures not doing this, and I'd like to get this
> > clarified.
>
> Yes, it would be good to know that for sure.

Most rcu_assign_pointer() calls are protected by locks, but there might
be a few that are not. However, the case that concerns me most would be
the following:

o Task 0 writes the lower 16 bits of the pointer.

o Task 1 reads the lower 16 bits of the pointer.

o Task 1 reads the upper 16 bits of the pointer.

o Task 0 writes the upper 16 bits of the pointer.

This would result in task 1 getting a mish-mash of the old and new
versions of the pointer. Very bad!!! RCU heavily relies on the reader
seeing either the initial value of the pointer or on the value written
by some single write.

But doesn't this require a -multi-CPU- system with a 16-bit data path
from the ALU to the L0 cache? This seems a bit unlikely. Or am I being
naive about embedded CPUs?

On the other hand, if you have a 32-bit single-CPU system with a 16-bit
path to memory, all we need is that interrupts be restricted to happening
at instruction boundaries rather than in the middle of instructions.

Thanx, Paul

2008-03-03 21:59:50

by David Brownell

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Monday 25 February 2008, Pierre Ossman wrote:
> On Sun, 24 Feb 2008 10:33:34 -0500 (EST)
> Alan Stern <[email protected]> wrote:
>
> > But even changing that won't solve the second issue, which is a genuine
> > bug. There is a race between detect events and suspend events. The
> > mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
> > before calling the host's suspend routine. So what happens if another
> > detect event occurs in between?
> >
>
> The idea is that host drivers shouldn't do that. Once they've called
> mmc_suspend_host(), then they shouldn't be poking the MMC core in any
> other way. None of this is of course properly documented. :/

Card insert/remove events can be system wake events though. Which
makes that restriction impractical.

I think hosts need to be able to call mmc_detect_change() as soon as
they see a stable signal. The MMC core can hold off handling that
for a while, if it needs to wait until the code walking the device
tree gets around to resuming that host. It's a lot more natural to
hold off such stuff one time there than in N host drivers; especially
since the MMC core already has such hold-off code.

- Dave

2008-03-03 22:24:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required



On Mon, 3 Mar 2008, Rafael J. Wysocki wrote:
>
> Well something like this could happen, in theory, on a "32-bit" architecture
> with a 16-bit bus.

No it couldn't.

That would only be true if there is no cache, and no cache coherency.

Basically, Linux requires a cache-coherent architecture to work in SMP.
Anything else is insane (except as a cluster).

So there is no way we can see partial updates, except with terminally
broken hardware that we would never support anyway for tons of other
reasons.

Linus

2008-03-04 06:05:35

by Pierre Ossman

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Mon, 3 Mar 2008 13:59:37 -0800
David Brownell <[email protected]> wrote:

>
> Card insert/remove events can be system wake events though. Which
> makes that restriction impractical.
>
> I think hosts need to be able to call mmc_detect_change() as soon as
> they see a stable signal. The MMC core can hold off handling that
> for a while, if it needs to wait until the code walking the device
> tree gets around to resuming that host. It's a lot more natural to
> hold off such stuff one time there than in N host drivers; especially
> since the MMC core already has such hold-off code.
>

That actually sorts itself out as the MMC core reprobes on wakeup, but I see your point. Right now things will work peachy if the controllers just make sure to disable their card detection logic before telling the core to suspend.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-03-04 09:44:22

by David Brownell

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Monday 03 March 2008, Pierre Ossman wrote:
> On Mon, 3 Mar 2008 13:59:37 -0800
> David Brownell <[email protected]> wrote:
>
> >
> > Card insert/remove events can be system wake events though. Which
> > makes that restriction impractical.

This part seems to be ignored by your comment ... wake events.


> > I think hosts need to be able to call mmc_detect_change() as soon as
> > they see a stable signal. The MMC core can hold off handling that
> > for a while, if it needs to wait until the code walking the device
> > tree gets around to resuming that host. It's a lot more natural to
> > hold off such stuff one time there than in N host drivers; especially
> > since the MMC core already has such hold-off code.
> >
>
> That actually sorts itself out as the MMC core reprobes on wakeup, but
> I see your point. Right now things will work peachy if the controllers
> just make sure to disable their card detection logic before telling the
> core to suspend.

That is, the MMC core doesn't understand wakeup events.

Or, as pointed out elsewhere, well-behaved MMC hosts ... which don't
need either such reprobing or the associated remove-on-suspend.

- Dave

2008-03-04 10:00:24

by Pierre Ossman

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tue, 4 Mar 2008 01:44:09 -0800
David Brownell <[email protected]> wrote:

> On Monday 03 March 2008, Pierre Ossman wrote:
> > On Mon, 3 Mar 2008 13:59:37 -0800
> > David Brownell <[email protected]> wrote:
> >
> > >
> > > Card insert/remove events can be system wake events though. Which
> > > makes that restriction impractical.
>
> This part seems to be ignored by your comment ... wake events.
>

How so? How is a wake caused by the MMC controller different than any other source?

>
> That is, the MMC core doesn't understand wakeup events.
>
> Or, as pointed out elsewhere, well-behaved MMC hosts ... which don't
> need either such reprobing or the associated remove-on-suspend.
>

You need well behaved _systems_, not just hosts to achieve that guarantee. SDHCI controllers have the ability to wake up the system on card removal, but there is zero guarantee that the platform actually wired the controller up in a way that actually allows it to do this. Throw suspend to disk, where the system might completely lose power, into the mix and you're completely screwed.

So for the default behaviour to change, we need one of two things:

- Certainty that removals cannot go unnoticed. (even then, you also need wakeup latency guarantees)
- Ability to detect a removal after the fact.

For the general case, the first one is impossible given todays hardware. The second might be solvable, but noone has done the work.

(If your complete system can satisfy the first option, feel free to add the "unsafe" option to you defconfig, but that is hardly adequate reason to have it on by default.)

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

2008-03-04 17:50:41

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Mon, 3 Mar 2008, David Brownell wrote:

> Card insert/remove events can be system wake events though. Which
> makes that restriction impractical.
>
> I think hosts need to be able to call mmc_detect_change() as soon as
> they see a stable signal. The MMC core can hold off handling that
> for a while, if it needs to wait until the code walking the device
> tree gets around to resuming that host. It's a lot more natural to
> hold off such stuff one time there than in N host drivers; especially
> since the MMC core already has such hold-off code.

That's what ended up happening. The workqueue used by
mmc_detect_change() was made freezable, so hosts could call the routine
at any time but it wouldn't do anything until the system sleep was
over.

A more flexible approach would avoid freezing the workqueue, and allow
it to process card removals at any time. But card insertions would be
ignored if the mmc_host device was suspended; at resume time the core
probes for changes that occurred during the sleep.

Alan Stern

2008-03-04 17:53:56

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tue, 4 Mar 2008, David Brownell wrote:

> Or, as pointed out elsewhere, well-behaved MMC hosts ... which don't
> need either such reprobing or the associated remove-on-suspend.

I don't understand this comment. Suppose a card was inserted while the
system was hibernating. If the core didn't reprobe, when would that
card be discovered?

Alan Stern

2008-03-04 19:13:50

by David Brownell

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tuesday 04 March 2008, Alan Stern wrote:
> On Tue, 4 Mar 2008, David Brownell wrote:
>
> > Or, as pointed out elsewhere, well-behaved MMC hosts ... which don't
> > need either such reprobing or the associated remove-on-suspend.
>
> I don't understand this comment. Suppose a card was inserted while the
> system was hibernating. If the core didn't reprobe, when would that
> card be discovered?

The host controller would tell the core to check for a card, exactly
like it does at all other times.

That's the natural alternative to having the MMC core assume that card
detection was broken in low power states, so that the core needed to
forcibly remove the cards before suspend, and reprobe during resume
processing.

Having the MMC core make such needless assumptions can cause problems
for the upper layers, including filesystems.

- Dave

2008-03-04 19:51:48

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tue, 4 Mar 2008, David Brownell wrote:

> > I don't understand this comment. Suppose a card was inserted while the
> > system was hibernating. If the core didn't reprobe, when would that
> > card be discovered?
>
> The host controller would tell the core to check for a card, exactly
> like it does at all other times.

But how would the host controller know to do that? Isn't card
insertion detection often driven by interrupts? If a card is
inserted while the computer is off, no interrupt will be generated.

> That's the natural alternative to having the MMC core assume that card
> detection was broken in low power states, so that the core needed to
> forcibly remove the cards before suspend, and reprobe during resume
> processing.

Is that the assumption the MMC core was really making? Are you sure it
wasn't assuming something else (perhaps equally as bad)?

> Having the MMC core make such needless assumptions can cause problems
> for the upper layers, including filesystems.

What's wrong with a superfluous probe at resume time, besides the waste
of a few milliseconds?

Alan Stern

2008-03-04 20:31:27

by David Brownell

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tuesday 04 March 2008, Alan Stern wrote:
> On Tue, 4 Mar 2008, David Brownell wrote:
>
> > > I don't understand this comment. Suppose a card was inserted while the
> > > system was hibernating. If the core didn't reprobe, when would that
> > > card be discovered?
> >
> > The host controller would tell the core to check for a card, exactly
> > like it does at all other times.
>
> But how would the host controller know to do that? Isn't card
> insertion detection often driven by interrupts? If a card is
> inserted while the computer is off, no interrupt will be generated.

I'm talking about Real Suspend States (tm), not hibernation.
The systems in question tend to not support hibernation.

In particular: system sleep states where IRQs work ... that's a
common way for non-x86 platforms to work.


> > That's the natural alternative to having the MMC core assume that card
> > detection was broken in low power states, so that the core needed to
> > forcibly remove the cards before suspend, and reprobe during resume
> > processing.
>
> Is that the assumption the MMC core was really making? Are you sure it
> wasn't assuming something else (perhaps equally as bad)?

On the systems I was looking at, that assumption seems equivalent to
what the latest MMC core is assuming. (I think this came as part of
the rewrite/refactoring a while back.) But since the assumptions aren't
documented as such, you could say all guesses are equal. ;)


> > Having the MMC core make such needless assumptions can cause problems
> > for the upper layers, including filesystems.
>
> What's wrong with a superfluous probe at resume time, besides the waste
> of a few milliseconds?

I'm more concerned with the undesirable removal of devices at suspend
time ... ones with mounted filesystems etc.

- Dave

2008-03-04 21:01:04

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tue, 4 Mar 2008, David Brownell wrote:

> > What's wrong with a superfluous probe at resume time, besides the waste
> > of a few milliseconds?
>
> I'm more concerned with the undesirable removal of devices at suspend
> time ... ones with mounted filesystems etc.

On that we can agree. The removal is done if the host doesn't define a
resume method. There doesn't seem to be any point to that, given that
the probing during resume will determine whether a card has in fact
been removed.

Alan Stern

2008-03-04 23:56:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Wednesday, 5 of March 2008, Peter Hartley wrote:
> On Mon, 2008-03-03 at 18:24 +0100, Pavel Machek wrote:
> > Ok, I can understand the gcc side. But do we actually run on an
> > architecture where
> >
> > long *x;
> >
> > *x = 0;
> >
> > racing with
> >
> > *x = 0x12345678;
> >
> > can produce
> >
> > *x == 0x12340000;
> >
> > or something like that? I'm told RCU relies on architectures not doing
> > this, and I'd like to get this clarified.
>
> ARM6, ARM7500 and similar do exactly this for short (and unsigned
> short), although not for int, long, or pointers:
>
> > struct foo { short b; short c; };
> > void baa(struct foo *f, short cc) { f->c = cc; }
>
> becomes (arm-linux-gcc -mcpu=arm6):
>
> > baa:
> > mov r3, r1, lsr #8
> > strb r3, [r0, #3]
> > strb r1, [r0, #2]
> > mov pc, lr
>
> note the two single-byte stores, as ARM6 didn't have the "store
> halfword" instruction.
>
> So I think Alan Stern's
> "For all properly-aligned pointer and integral types other than long
> long..."
> should be amended to
> "For all properly-aligned pointer and integral types other than short or
> long long..."

Well, perhaps it's sufficient to document just pointers? In fact this is what
RCU relies on.

Thanks,
Rafael

2008-03-05 00:17:15

by Peter Hartley

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, 2008-03-03 at 18:24 +0100, Pavel Machek wrote:
> Ok, I can understand the gcc side. But do we actually run on an
> architecture where
>
> long *x;
>
> *x = 0;
>
> racing with
>
> *x = 0x12345678;
>
> can produce
>
> *x == 0x12340000;
>
> or something like that? I'm told RCU relies on architectures not doing
> this, and I'd like to get this clarified.

ARM6, ARM7500 and similar do exactly this for short (and unsigned
short), although not for int, long, or pointers:

> struct foo { short b; short c; };
> void baa(struct foo *f, short cc) { f->c = cc; }

becomes (arm-linux-gcc -mcpu=arm6):

> baa:
> mov r3, r1, lsr #8
> strb r3, [r0, #3]
> strb r1, [r0, #2]
> mov pc, lr

note the two single-byte stores, as ARM6 didn't have the "store
halfword" instruction.

So I think Alan Stern's
"For all properly-aligned pointer and integral types other than long
long..."
should be amended to
"For all properly-aligned pointer and integral types other than short or
long long..."

Peter

2008-03-05 00:26:39

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Wed, Mar 05, 2008 at 12:54:03AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, 5 of March 2008, Peter Hartley wrote:
> > On Mon, 2008-03-03 at 18:24 +0100, Pavel Machek wrote:
> > > Ok, I can understand the gcc side. But do we actually run on an
> > > architecture where
> > >
> > > long *x;
> > >
> > > *x = 0;
> > >
> > > racing with
> > >
> > > *x = 0x12345678;
> > >
> > > can produce
> > >
> > > *x == 0x12340000;
> > >
> > > or something like that? I'm told RCU relies on architectures not doing
> > > this, and I'd like to get this clarified.
> >
> > ARM6, ARM7500 and similar do exactly this for short (and unsigned
> > short), although not for int, long, or pointers:
> >
> > > struct foo { short b; short c; };
> > > void baa(struct foo *f, short cc) { f->c = cc; }
> >
> > becomes (arm-linux-gcc -mcpu=arm6):
> >
> > > baa:
> > > mov r3, r1, lsr #8
> > > strb r3, [r0, #3]
> > > strb r1, [r0, #2]
> > > mov pc, lr
> >
> > note the two single-byte stores, as ARM6 didn't have the "store
> > halfword" instruction.
> >
> > So I think Alan Stern's
> > "For all properly-aligned pointer and integral types other than long
> > long..."
> > should be amended to
> > "For all properly-aligned pointer and integral types other than short or
> > long long..."
>
> Well, perhaps it's sufficient to document just pointers? In fact this is what
> RCU relies on.

One can do RCU on array indexes (ints/longs) as well as pointers, so we
need to keep "integral" in there.

Thanx, Paul

2008-03-05 17:07:58

by Matti Linnanvuori

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

On Mon, 2008-03-03 at 18:24 +0100, Pavel Machek wrote:
> "For all properly-aligned pointer and integral types other than short or
> long long..."

As far as I know, Linux kernel supports Alpha that does not have atomic
updates of integral types with less than 32 bits.


____________________________________________________________________________________
Looking for last minute shopping deals?
Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping

2008-03-06 15:58:43

by Mark Lord

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

Linus Torvalds wrote:
>
> On Mon, 3 Mar 2008, Alan Stern wrote:
>> Consider a routine like the following:
>>
>> static task_struct *the_task;
>>
>> void store_task(void)
>> {
>> the_task = current;
>> }
>>
>> Is it possible to say whether readers examining "the_task" are
>> guaranteed to see a coherent value?
>
> Yes, we do depend on this. All the RCU stuff (and in general *anything*
> that depends on memory ordering as opposed to full locking, and we have
> quite a lot of it) is very fundamentally dependent on the fact that things
> like pointers get read and written atomically.
..

But also consider something like this:

void store_task(void)
{
*the_task = current;
}

In this case, there is no guarantee that the assignment
can be done atomically on all CPU types. Some RISC archs
(eg. MIPS R2xxx) require an (interruptible) instruction pair
to store values to a potentially unaligned address.

This was a BIG issue on a different system that I once worked on.

Cheers

2008-03-06 16:13:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required



On Thu, 6 Mar 2008, Mark Lord wrote:
>
> But also consider something like this:
>
> void store_task(void)
> {
> *the_task = current;
> }
>
> In this case, there is no guarantee that the assignment
> can be done atomically on all CPU types. Some RISC archs
> (eg. MIPS R2xxx) require an (interruptible) instruction pair
> to store values to a potentially unaligned address.

You'd better not be using unaligned accesses for memory-ordering-sensitive
things (I think x86 happens get even that right for most cases, but I
don't think the architecture specification guarantees it, and I'm pretty
sure that you might find problems on cache crossing writes, for example)

But quite frankly, if you have an architecture that can't do the above as
a single write when it's a pointer, then you have a totally broken
architecture. It's not worth supporting.

(There are data structures that are harder than native words: bytes and
shorts can require load-modify-write cycles, and "u64" and friends can
obviously be multiple words, so you shouldn't depend on things for those
"complex" cases. But we *definitely* depend on atomicity for regular word
accesses).

Linus

2008-03-06 16:27:45

by Mark Lord

[permalink] [raw]
Subject: Re: [patch] Re: using long instead of atomic_t when only set/read is required

Linus Torvalds wrote:
>
> On Thu, 6 Mar 2008, Mark Lord wrote:
>> But also consider something like this:
>>
>> void store_task(void)
>> {
>> *the_task = current;
>> }
>>
>> In this case, there is no guarantee that the assignment
>> can be done atomically on all CPU types. Some RISC archs
>> (eg. MIPS R2xxx) require an (interruptible) instruction pair
>> to store values to a potentially unaligned address.
>
> You'd better not be using unaligned accesses for memory-ordering-sensitive
> things (I think x86 happens get even that right for most cases, but I
> don't think the architecture specification guarantees it, and I'm pretty
> sure that you might find problems on cache crossing writes, for example)
..

Yeah. For the MIPS R2xxx CPU, the question was whether the compiler
could guarantee the alignment of the things the pointer could point at.
In cases where it could not, it would emit the interruptable instruction
pair instead of a single load instruction.

I don't know what gcc does on MIPS for this. Perhaps it simply always
assumes an aligned access? In that case, there's no issue unless some
putz actually creates/uses a pointer to an unaligned data object.

What about other architectures like ARM ? Probably also assumes aligned,
in which case all is well.

> But quite frankly, if you have an architecture that can't do the above as
> a single write when it's a pointer, then you have a totally broken
> architecture. It's not worth supporting.
>
> (There are data structures that are harder than native words: bytes and
> shorts can require load-modify-write cycles, and "u64" and friends can
> obviously be multiple words, so you shouldn't depend on things for those
> "complex" cases. But we *definitely* depend on atomicity for regular word
> accesses).

2008-03-06 20:21:01

by Pavel Machek

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Tue 2008-03-04 16:00:51, Alan Stern wrote:
> On Tue, 4 Mar 2008, David Brownell wrote:
>
> > > What's wrong with a superfluous probe at resume time, besides the waste
> > > of a few milliseconds?
> >
> > I'm more concerned with the undesirable removal of devices at suspend
> > time ... ones with mounted filesystems etc.
>
> On that we can agree. The removal is done if the host doesn't define a
> resume method. There doesn't seem to be any point to that, given that
> the probing during resume will determine whether a card has in fact
> been removed.

Hmm, if the driver is sleeping too deeply, user might have removed the
card and put in different one, without driver noticing. That would be
_bad_.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-06 20:33:50

by Alan Stern

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Thu, 6 Mar 2008, Pavel Machek wrote:

> On Tue 2008-03-04 16:00:51, Alan Stern wrote:
> > On Tue, 4 Mar 2008, David Brownell wrote:
> >
> > > > What's wrong with a superfluous probe at resume time, besides the waste
> > > > of a few milliseconds?
> > >
> > > I'm more concerned with the undesirable removal of devices at suspend
> > > time ... ones with mounted filesystems etc.
> >
> > On that we can agree. The removal is done if the host doesn't define a
> > resume method. There doesn't seem to be any point to that, given that
> > the probing during resume will determine whether a card has in fact
> > been removed.
>
> Hmm, if the driver is sleeping too deeply, user might have removed the
> card and put in different one, without driver noticing. That would be
> _bad_.

Ironically, the very same problem now exists with the USB mass-storage
driver. I don't see any way for the driver itself to solve it,
especially during a hibernation (which can be a _very_ deep sleep).

One thing that could be done is for filesystems to verify, after a
system sleep, that their superblocks haven't changed. There could
still be issues with non-mounted partitions, if they have live entries
in the block cache, but it would be an improvement.

Do you know the right people to mention this to? Anybody in filesystem
development interested in suspend/hibernation issues?

Alan Stern

2008-03-06 20:54:52

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008/3/6, Alan Stern <[email protected]>:
> On Thu, 6 Mar 2008, Pavel Machek wrote:
>
> > On Tue 2008-03-04 16:00:51, Alan Stern wrote:
> > > On Tue, 4 Mar 2008, David Brownell wrote:
> > >
> > > > > What's wrong with a superfluous probe at resume time, besides the waste
> > > > > of a few milliseconds?
> > > >
> > > > I'm more concerned with the undesirable removal of devices at suspend
> > > > time ... ones with mounted filesystems etc.
> > >
> > > On that we can agree. The removal is done if the host doesn't define a
> > > resume method. There doesn't seem to be any point to that, given that
> > > the probing during resume will determine whether a card has in fact
> > > been removed.
> >
> > Hmm, if the driver is sleeping too deeply, user might have removed the
> > card and put in different one, without driver noticing. That would be
> > _bad_.
>
>
> Ironically, the very same problem now exists with the USB mass-storage
> driver. I don't see any way for the driver itself to solve it,
> especially during a hibernation (which can be a _very_ deep sleep).
>
> One thing that could be done is for filesystems to verify, after a
> system sleep, that their superblocks haven't changed. There could
> still be issues with non-mounted partitions, if they have live entries
> in the block cache, but it would be an improvement.
>
> Do you know the right people to mention this to? Anybody in filesystem
> development interested in suspend/hibernation issues?


IMHO the way would be to try to unmount fs if it's possible - if not -
user should be notified on suspend/hibernation that he must preserve
media in its place after resume and it should be checked and user
should be notified if different devices/fs were find...

Zdenek

2008-03-06 21:23:42

by David Brownell

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]


> > > > Card insert/remove events can be system wake events though. Which
> > > > makes that restriction impractical.
> >
> > This part seems to be ignored by your comment ... wake events.
>
> How so? How is a wake caused by the MMC controller different
> than any other source?

They aren't. Which is part of why the way MMC currently assumes that
insert/remove events don't work is a problem.


> > That is, the MMC core doesn't understand wakeup events.
> >
> > Or, as pointed out elsewhere, well-behaved MMC hosts ... which don't
> > need either such reprobing or the associated remove-on-suspend.
>
> You need well behaved _systems_, not just hosts to achieve that
> guarantee.

Sure, but a host can't be well behaved all by itself!

And in any case, I had already made clear I was talking
about _systems_ that behave properly.


> SDHCI controllers have the ability to wake up the
> system on card removal, but there is zero guarantee that the
> platform actually wired the controller up in a way that actually
> allows it to do this. Throw suspend to disk, where the system might
> completely lose power, into the mix and you're completely screwed.

I'm talking about generic MMC/SD controllers of the type
that have been around for years ... on systems which won't
use hibernation ("suspend to disk"), but do use real system
sleep states where card detection (by IRQs) works.


> So for the default behaviour to change, we need one of two things:
>
> - Certainty that removals cannot go unnoticed. (even then, you
> also need wakeup latency guarantees)
> - Ability to detect a removal after the fact.
>
> For the general case, the first one is impossible given todays
> hardware.

Odd that it's very possible on the systems I mentioned.


> The second might be solvable, but noone has done the work.

I don't know what you mean by "detect a removal after
the fact", or why it'd be needed if you detected it
in the first place.


> (If your complete system can satisfy the first option, feel
> free to add the "unsafe" option to you defconfig, but that is
> hardly adequate reason to have it on by default.)

Thing is, I had also pointed out that it wasn't "unsafe"
in the least on many systems.

I'll refresh the patch which improves that mechanism and
updates its description.

- Dave

2008-03-06 21:32:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

On Thursday, 6 of March 2008, Zdenek Kabelac wrote:
> 2008/3/6, Alan Stern <[email protected]>:
> > On Thu, 6 Mar 2008, Pavel Machek wrote:
> >
> > > On Tue 2008-03-04 16:00:51, Alan Stern wrote:
> > > > On Tue, 4 Mar 2008, David Brownell wrote:
> > > >
> > > > > > What's wrong with a superfluous probe at resume time, besides the waste
> > > > > > of a few milliseconds?
> > > > >
> > > > > I'm more concerned with the undesirable removal of devices at suspend
> > > > > time ... ones with mounted filesystems etc.
> > > >
> > > > On that we can agree. The removal is done if the host doesn't define a
> > > > resume method. There doesn't seem to be any point to that, given that
> > > > the probing during resume will determine whether a card has in fact
> > > > been removed.
> > >
> > > Hmm, if the driver is sleeping too deeply, user might have removed the
> > > card and put in different one, without driver noticing. That would be
> > > _bad_.
> >
> >
> > Ironically, the very same problem now exists with the USB mass-storage
> > driver. I don't see any way for the driver itself to solve it,
> > especially during a hibernation (which can be a _very_ deep sleep).
> >
> > One thing that could be done is for filesystems to verify, after a
> > system sleep, that their superblocks haven't changed. There could
> > still be issues with non-mounted partitions, if they have live entries
> > in the block cache, but it would be an improvement.
> >
> > Do you know the right people to mention this to? Anybody in filesystem
> > development interested in suspend/hibernation issues?

I don't really think so (but I may be wrong ...)

> IMHO the way would be to try to unmount fs if it's possible - if not -
> user should be notified on suspend/hibernation that he must preserve
> media in its place after resume and it should be checked and user
> should be notified if different devices/fs were find...

This is a very long standing issue which IMO can only be solved by making
filesystems suspend-aware.

Thanks,
Rafael