Hi,
After thinking a lot about how to fix properly the hangs caused by
insert/removal of mmc card during suspend/resume, and default behavior
of not trusting the card persistence over suspend, I finally come to
conclusion that changing the del_gendisk is wrong.
First of all there are 2 types of removal possible. First one happens
when system detects that some device is gone. At that point there is
really no point in syncing it.
The other type of removal is controlled removal, usually on user
request. Surly we must sync the device of this request.
This type of removal _shouldn't_ happen during suspend/resume
transaction. The only case when it does is today to protect against user
carelessness of switching the cards during suspend.
I think that it is just wrong to sync the device in suspend/resume time.
At that time userspace is frozen, but also its not known which drivers
are still running. They might even suspend asynchronously...
So, such cases should be moved to pm-notifier, thing that my patch does
for mmc.
Other users should be fixed as well.
We can, in addition to that, add a temporary hack to del_gendisk with
loud WARN_ON though.
If card is really removed during suspend, then we can just introduce
del_gendisk_dead or something like that which will be safe to call
during suspend.
I didn't do that but rather I made the card detection thread freezeable,
thus eliminated the whole problem.
If you remove the card during suspend, system will notice at end of
resume.
---
commit 395a40018e09b109811a1fc5b09572f30e7db9d6
Author: Maxim Levitsky <[email protected]>
Date: Fri Jun 11 20:14:13 2010 +0300
MMC: fix all hangs related to mmc/sd card insert/removal during suspend/resume.
If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
in pm notifier, while userspace is still running.
Thus it will be possible to sync it properly.
Card detect workqueue is now freezable, therefore a card insert/removal event will
wait till userspace is unfrozen.
Tested with and without CONFIG_MMC_UNSAFE_RESUME, with suspend and hibernate.
Signed-off-by: Maxim Levitsky <[email protected]>
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 85d15de..0cba53a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1264,19 +1264,6 @@ int mmc_suspend_host(struct mmc_host *host)
if (host->bus_ops && !host->bus_dead) {
if (host->bus_ops->suspend)
err = host->bus_ops->suspend(host);
- if (err == -ENOSYS || !host->bus_ops->resume) {
- /*
- * We simply "remove" the card in this case.
- * It will be redetected on resume.
- */
- if (host->bus_ops->remove)
- host->bus_ops->remove(host);
- mmc_claim_host(host);
- mmc_detach_bus(host);
- mmc_release_host(host);
- host->pm_flags = 0;
- err = 0;
- }
}
mmc_bus_put(host);
@@ -1308,12 +1295,6 @@ int mmc_resume_host(struct mmc_host *host)
printk(KERN_WARNING "%s: error %d during resume "
"(card was removed?)\n",
mmc_hostname(host), err);
- if (host->bus_ops->remove)
- host->bus_ops->remove(host);
- mmc_claim_host(host);
- mmc_detach_bus(host);
- mmc_release_host(host);
- /* no need to bother upper layers */
err = 0;
}
}
@@ -1328,6 +1309,37 @@ int mmc_resume_host(struct mmc_host *host)
return err;
}
+/* Do the card removal on suspend if card is assumed removeable
+ * Do that in pm notifier while userspace isn't yet frozen, so we will be able
+ to sync the card.
+*/
+int mmc_pm_notify(struct notifier_block *notify_block,
+ unsigned long mode, void *unused)
+{
+ struct mmc_host *host = container_of(
+ notify_block, struct mmc_host, pm_notify);
+
+
+ switch (mode) {
+ case PM_HIBERNATION_PREPARE:
+ case PM_SUSPEND_PREPARE:
+
+ if (!host->bus_ops || host->bus_ops->suspend)
+ break;
+
+ if (host->bus_ops->remove)
+ host->bus_ops->remove(host);
+ mmc_claim_host(host);
+ mmc_detach_bus(host);
+ mmc_release_host(host);
+ host->pm_flags = 0;
+ break;
+
+ }
+
+ return 0;
+}
+
EXPORT_SYMBOL(mmc_resume_host);
#endif
@@ -1336,7 +1348,7 @@ static int __init mmc_init(void)
{
int ret;
- workqueue = create_singlethread_workqueue("kmmcd");
+ workqueue = create_freezeable_workqueue("kmmcd");
if (!workqueue)
return -ENOMEM;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 4735390..53f3406 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -17,6 +17,7 @@
#include <linux/pagemap.h>
#include <linux/leds.h>
#include <linux/slab.h>
+#include <linux/suspend.h>
#include <linux/mmc/host.h>
@@ -85,6 +86,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
init_waitqueue_head(&host->wq);
INIT_DELAYED_WORK(&host->detect, mmc_rescan);
INIT_DELAYED_WORK_DEFERRABLE(&host->disable, mmc_host_deeper_disable);
+ host->pm_notify.notifier_call = mmc_pm_notify;
+
/*
* By default, hosts do not support SGIO or large requests.
@@ -132,6 +135,7 @@ int mmc_add_host(struct mmc_host *host)
mmc_add_host_debugfs(host);
#endif
+ register_pm_notifier(&host->pm_notify);
mmc_start_host(host);
return 0;
@@ -158,6 +162,8 @@ void mmc_remove_host(struct mmc_host *host)
device_del(&host->class_dev);
led_trigger_unregister_simple(host->led);
+
+ unregister_pm_notifier(&host->pm_notify);
}
EXPORT_SYMBOL(mmc_remove_host);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index f65913c..b45812d 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -124,6 +124,7 @@ struct mmc_host {
unsigned int f_min;
unsigned int f_max;
u32 ocr_avail;
+ struct notifier_block pm_notify;
#define MMC_VDD_165_195 0x00000080 /* VDD voltage 1.65 - 1.95 */
#define MMC_VDD_20_21 0x00000100 /* VDD voltage 2.0 ~ 2.1 */
@@ -257,6 +258,8 @@ int mmc_card_can_sleep(struct mmc_host *host);
int mmc_host_enable(struct mmc_host *host);
int mmc_host_disable(struct mmc_host *host);
int mmc_host_lazy_disable(struct mmc_host *host);
+int mmc_pm_notify(struct notifier_block *notify_block, unsigned long, void *);
+
static inline void mmc_set_disable_delay(struct mmc_host *host,
unsigned int disable_delay)
Sorry, forgot one hunk...
Will reply with the patch.
Best regards,
Maxim Levitsky
If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
in pm notified while userspace is still running.
Thus it will be possible to sync it propely.
Card detect workqueue is now freezeable, therefore a card insert/removal event will
wait till userspace is unfrozen.
Tested with and without CONFIG_MMC_UNSAFE_RESUME with suspend and hibernate.
Signed-off-by: Maxim Levitsky <[email protected]>
---
drivers/mmc/core/core.c | 54 +++++++++++++++++++++++++++------------------
drivers/mmc/core/host.c | 6 +++++
include/linux/mmc/host.h | 3 ++
3 files changed, 41 insertions(+), 22 deletions(-)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 569e94d..0cba53a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1259,26 +1259,11 @@ int mmc_suspend_host(struct mmc_host *host)
if (host->caps & MMC_CAP_DISABLE)
cancel_delayed_work(&host->disable);
- cancel_delayed_work(&host->detect);
- mmc_flush_scheduled_work();
mmc_bus_get(host);
if (host->bus_ops && !host->bus_dead) {
if (host->bus_ops->suspend)
err = host->bus_ops->suspend(host);
- if (err == -ENOSYS || !host->bus_ops->resume) {
- /*
- * We simply "remove" the card in this case.
- * It will be redetected on resume.
- */
- if (host->bus_ops->remove)
- host->bus_ops->remove(host);
- mmc_claim_host(host);
- mmc_detach_bus(host);
- mmc_release_host(host);
- host->pm_flags = 0;
- err = 0;
- }
}
mmc_bus_put(host);
@@ -1310,12 +1295,6 @@ int mmc_resume_host(struct mmc_host *host)
printk(KERN_WARNING "%s: error %d during resume "
"(card was removed?)\n",
mmc_hostname(host), err);
- if (host->bus_ops->remove)
- host->bus_ops->remove(host);
- mmc_claim_host(host);
- mmc_detach_bus(host);
- mmc_release_host(host);
- /* no need to bother upper layers */
err = 0;
}
}
@@ -1330,6 +1309,37 @@ int mmc_resume_host(struct mmc_host *host)
return err;
}
+/* Do the card removal on suspend if card is assumed removeable
+ * Do that in pm notifier while userspace isn't yet frozen, so we will be able
+ to sync the card.
+*/
+int mmc_pm_notify(struct notifier_block *notify_block,
+ unsigned long mode, void *unused)
+{
+ struct mmc_host *host = container_of(
+ notify_block, struct mmc_host, pm_notify);
+
+
+ switch (mode) {
+ case PM_HIBERNATION_PREPARE:
+ case PM_SUSPEND_PREPARE:
+
+ if (!host->bus_ops || host->bus_ops->suspend)
+ break;
+
+ if (host->bus_ops->remove)
+ host->bus_ops->remove(host);
+ mmc_claim_host(host);
+ mmc_detach_bus(host);
+ mmc_release_host(host);
+ host->pm_flags = 0;
+ break;
+
+ }
+
+ return 0;
+}
+
EXPORT_SYMBOL(mmc_resume_host);
#endif
@@ -1338,7 +1348,7 @@ static int __init mmc_init(void)
{
int ret;
- workqueue = create_singlethread_workqueue("kmmcd");
+ workqueue = create_freezeable_workqueue("kmmcd");
if (!workqueue)
return -ENOMEM;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 4735390..53f3406 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -17,6 +17,7 @@
#include <linux/pagemap.h>
#include <linux/leds.h>
#include <linux/slab.h>
+#include <linux/suspend.h>
#include <linux/mmc/host.h>
@@ -85,6 +86,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
init_waitqueue_head(&host->wq);
INIT_DELAYED_WORK(&host->detect, mmc_rescan);
INIT_DELAYED_WORK_DEFERRABLE(&host->disable, mmc_host_deeper_disable);
+ host->pm_notify.notifier_call = mmc_pm_notify;
+
/*
* By default, hosts do not support SGIO or large requests.
@@ -132,6 +135,7 @@ int mmc_add_host(struct mmc_host *host)
mmc_add_host_debugfs(host);
#endif
+ register_pm_notifier(&host->pm_notify);
mmc_start_host(host);
return 0;
@@ -158,6 +162,8 @@ void mmc_remove_host(struct mmc_host *host)
device_del(&host->class_dev);
led_trigger_unregister_simple(host->led);
+
+ unregister_pm_notifier(&host->pm_notify);
}
EXPORT_SYMBOL(mmc_remove_host);
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index f65913c..b45812d 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -124,6 +124,7 @@ struct mmc_host {
unsigned int f_min;
unsigned int f_max;
u32 ocr_avail;
+ struct notifier_block pm_notify;
#define MMC_VDD_165_195 0x00000080 /* VDD voltage 1.65 - 1.95 */
#define MMC_VDD_20_21 0x00000100 /* VDD voltage 2.0 ~ 2.1 */
@@ -257,6 +258,8 @@ int mmc_card_can_sleep(struct mmc_host *host);
int mmc_host_enable(struct mmc_host *host);
int mmc_host_disable(struct mmc_host *host);
int mmc_host_lazy_disable(struct mmc_host *host);
+int mmc_pm_notify(struct notifier_block *notify_block, unsigned long, void *);
+
static inline void mmc_set_disable_delay(struct mmc_host *host,
unsigned int disable_delay)
--
1.7.0.4
--- On Fri, 6/11/10, Maxim Levitsky <[email protected]> wrote:
> After thinking a lot about how to fix properly the hangs
> caused by
> insert/removal of mmc card during suspend/resume, and default behavior
> of not trusting the card persistence over suspend,
Right; there are two types of driver: ones
that can correctly report whether the card
has been removed ... and ones that can't That
default behavior presumes the latter.
The MMC/SD framework doesn't know about these
two types
(For
reference: the easy way to do the former involves
setting up the GPIO used for card detect as a
(wakeup?) IRQ source.
> First of all there are 2 types of removal possible. First
> one happens
> when system detects that some device is gone. At that point
> there is
> really no point in syncing it.
One suggestion was syncing before suspend...
but that would require coordinating the
block layer and PM framework. That seems
like it'd be generally a wise thing to do;
no point in losing the write cache, ever.
> The other type of removal is controlled removal, usually on user request.
On Fri, 11 Jun 2010, Maxim Levitsky wrote:
> Hi,
>
> After thinking a lot about how to fix properly the hangs caused by
> insert/removal of mmc card during suspend/resume, and default behavior
> of not trusting the card persistence over suspend, I finally come to
> conclusion that changing the del_gendisk is wrong.
>
> First of all there are 2 types of removal possible. First one happens
> when system detects that some device is gone. At that point there is
> really no point in syncing it.
>
> The other type of removal is controlled removal, usually on user
> request. Surly we must sync the device of this request.
> This type of removal _shouldn't_ happen during suspend/resume
> transaction. The only case when it does is today to protect against user
> carelessness of switching the cards during suspend.
There are other pathological cases which can cause it to happen, but
they are pretty unlikely.
> I think that it is just wrong to sync the device in suspend/resume time.
> At that time userspace is frozen, but also its not known which drivers
> are still running. They might even suspend asynchronously...
> So, such cases should be moved to pm-notifier, thing that my patch does
> for mmc.
> Other users should be fixed as well.
>
> We can, in addition to that, add a temporary hack to del_gendisk with
> loud WARN_ON though.
>
> If card is really removed during suspend, then we can just introduce
> del_gendisk_dead or something like that which will be safe to call
> during suspend.
>
> I didn't do that but rather I made the card detection thread freezeable,
> thus eliminated the whole problem.
> If you remove the card during suspend, system will notice at end of
> resume.
I don't know why the mmc subsystem works differently from USB. In USB,
the equivalent of UNSAFE_RESUME is a per-device flag that can be
controlled via sysfs (see Documentation/usb/persist.txt). And it
almost always defaults to ON, i.e., the kernel assumes that if a device
is present before suspend and after resume it is the same device --
although some checking is done to try to verify this (the descriptors
have to remain the same). We started out being more cautious (the
default was OFF), but Linus complained about it being _too_ cautious.
And like you have done here, in USB the kernel thread that handles
registering and unregistering devices is freezable, so things never get
added or removed at an unsafe time.
Alan Stern
On Fri, 2010-06-11 at 17:00 -0400, Alan Stern wrote:
> On Fri, 11 Jun 2010, Maxim Levitsky wrote:
>
> > Hi,
> >
> > After thinking a lot about how to fix properly the hangs caused by
> > insert/removal of mmc card during suspend/resume, and default behavior
> > of not trusting the card persistence over suspend, I finally come to
> > conclusion that changing the del_gendisk is wrong.
> >
> > First of all there are 2 types of removal possible. First one happens
> > when system detects that some device is gone. At that point there is
> > really no point in syncing it.
> >
> > The other type of removal is controlled removal, usually on user
> > request. Surly we must sync the device of this request.
> > This type of removal _shouldn't_ happen during suspend/resume
> > transaction. The only case when it does is today to protect against user
> > carelessness of switching the cards during suspend.
>
> There are other pathological cases which can cause it to happen, but
> they are pretty unlikely.
>
> > I think that it is just wrong to sync the device in suspend/resume time.
> > At that time userspace is frozen, but also its not known which drivers
> > are still running. They might even suspend asynchronously...
> > So, such cases should be moved to pm-notifier, thing that my patch does
> > for mmc.
> > Other users should be fixed as well.
> >
> > We can, in addition to that, add a temporary hack to del_gendisk with
> > loud WARN_ON though.
> >
> > If card is really removed during suspend, then we can just introduce
> > del_gendisk_dead or something like that which will be safe to call
> > during suspend.
> >
> > I didn't do that but rather I made the card detection thread freezeable,
> > thus eliminated the whole problem.
> > If you remove the card during suspend, system will notice at end of
> > resume.
>
> I don't know why the mmc subsystem works differently from USB. In USB,
> the equivalent of UNSAFE_RESUME is a per-device flag that can be
> controlled via sysfs (see Documentation/usb/persist.txt). And it
> almost always defaults to ON, i.e., the kernel assumes that if a device
> is present before suspend and after resume it is the same device --
> although some checking is done to try to verify this (the descriptors
> have to remain the same). We started out being more cautious (the
> default was OFF), but Linus complained about it being _too_ cautious.
I will be very happy to see the description and default value of
MMC_UNSAFE_RESUME changed.
This patch fixes both cases.
>
> And like you have done here, in USB the kernel thread that handles
> registering and unregistering devices is freezable, so things never get
> added or removed at an unsafe time.
Very nice!
Best regards,
Maxim Levitsky
On Fri, 2010-06-11 at 22:19 +0300, Maxim Levitsky wrote:
> If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
> in pm notified while userspace is still running.
> Thus it will be possible to sync it propely.
>
> Card detect workqueue is now freezeable, therefore a card insert/removal event will
> wait till userspace is unfrozen.
>
> Tested with and without CONFIG_MMC_UNSAFE_RESUME with suspend and hibernate.
And I say it is now well tested.
All attempts to hang the system failed.
Best regards,
Maxim Levitsky
On Fri, 11 Jun 2010 22:19:55 +0300
Maxim Levitsky <[email protected]> wrote:
> If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
> in pm notified while userspace is still running.
> Thus it will be possible to sync it propely.
That paragraph is a disaster and I'm not sure that I understand it well
enough to repair it. I think you wanted s/now/not/ and s/in/if/ and
s/it will be/it will not be/. Or perhaps you didn't.
Please send a comprehensible replacement.
On Fri, 11 Jun 2010 22:19:55 +0300
Maxim Levitsky <[email protected]> wrote:
> If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
> in pm notified while userspace is still running.
> Thus it will be possible to sync it propely.
>
> Card detect workqueue is now freezeable, therefore a card insert/removal event will
> wait till userspace is unfrozen.
>
> Tested with and without CONFIG_MMC_UNSAFE_RESUME with suspend and hibernate.
>
>
> ...
>
> @@ -158,6 +162,8 @@ void mmc_remove_host(struct mmc_host *host)
> device_del(&host->class_dev);
>
> led_trigger_unregister_simple(host->led);
> +
> + unregister_pm_notifier(&host->pm_notify);
> }
This looks a little risky. There's a window where the pm notifier
remains registered after we've done the device_del() and the
led_trigger_unregister_simple().
I don't know if the code's really buggy, nor if it might become buggy
in the future as things evolve. But as register_pm_notifier() is the
last thing we do before mmc_start_host(), I'd have though that
unregister_pm_notifier() should be the first thing we do after
mmc_stop_host()?
>
> ...
>
> --- a/include/linux/mmc/host.h
> +++ b/include/linux/mmc/host.h
> @@ -124,6 +124,7 @@ struct mmc_host {
> unsigned int f_min;
> unsigned int f_max;
> u32 ocr_avail;
> + struct notifier_block pm_notify;
>
> #define MMC_VDD_165_195 0x00000080 /* VDD voltage 1.65 - 1.95 */
> #define MMC_VDD_20_21 0x00000100 /* VDD voltage 2.0 ~ 2.1 */
> @@ -257,6 +258,8 @@ int mmc_card_can_sleep(struct mmc_host *host);
> int mmc_host_enable(struct mmc_host *host);
> int mmc_host_disable(struct mmc_host *host);
> int mmc_host_lazy_disable(struct mmc_host *host);
> +int mmc_pm_notify(struct notifier_block *notify_block, unsigned long, void *);
It's unusual to provide names for some of the arguments and to leave
them out for others.
On Mon, 2010-06-14 at 16:01 -0700, Andrew Morton wrote:
> On Fri, 11 Jun 2010 22:19:55 +0300
> Maxim Levitsky <[email protected]> wrote:
>
> > If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
> > in pm notified while userspace is still running.
> > Thus it will be possible to sync it propely.
> >
> > Card detect workqueue is now freezeable, therefore a card insert/removal event will
> > wait till userspace is unfrozen.
> >
> > Tested with and without CONFIG_MMC_UNSAFE_RESUME with suspend and hibernate.
> >
> >
> > ...
> >
> > @@ -158,6 +162,8 @@ void mmc_remove_host(struct mmc_host *host)
> > device_del(&host->class_dev);
> >
> > led_trigger_unregister_simple(host->led);
> > +
> > + unregister_pm_notifier(&host->pm_notify);
> > }
>
> This looks a little risky. There's a window where the pm notifier
> remains registered after we've done the device_del() and the
> led_trigger_unregister_simple().
>
> I don't know if the code's really buggy, nor if it might become buggy
> in the future as things evolve. But as register_pm_notifier() is the
> last thing we do before mmc_start_host(), I'd have though that
> unregister_pm_notifier() should be the first thing we do after
> mmc_stop_host()?
Absolutely right!
>
> >
> > ...
> >
> > --- a/include/linux/mmc/host.h
> > +++ b/include/linux/mmc/host.h
> > @@ -124,6 +124,7 @@ struct mmc_host {
> > unsigned int f_min;
> > unsigned int f_max;
> > u32 ocr_avail;
> > + struct notifier_block pm_notify;
> >
> > #define MMC_VDD_165_195 0x00000080 /* VDD voltage 1.65 - 1.95 */
> > #define MMC_VDD_20_21 0x00000100 /* VDD voltage 2.0 ~ 2.1 */
> > @@ -257,6 +258,8 @@ int mmc_card_can_sleep(struct mmc_host *host);
> > int mmc_host_enable(struct mmc_host *host);
> > int mmc_host_disable(struct mmc_host *host);
> > int mmc_host_lazy_disable(struct mmc_host *host);
> > +int mmc_pm_notify(struct notifier_block *notify_block, unsigned long, void *);
>
> It's unusual to provide names for some of the arguments and to leave
> them out for others.
Will fix that too.
Best regards,
Maxim Levitsky
On Mon, 2010-06-14 at 15:51 -0700, Andrew Morton wrote:
> On Fri, 11 Jun 2010 22:19:55 +0300
> Maxim Levitsky <[email protected]> wrote:
>
> > If you don't use CONFIG_MMC_UNSAFE_RESUME, card will now be removed
> > in pm notified while userspace is still running.
> > Thus it will be possible to sync it propely.
>
> That paragraph is a disaster and I'm not sure that I understand it well
> enough to repair it. I think you wanted s/now/not/ and s/in/if/ and
> s/it will be/it will not be/. Or perhaps you didn't.
>
> Please send a comprehensible replacement.
>
Sure.
If you don't use CONFIG_MMC_UNSAFE_RESUME, as soon as you attempt to
suspend, the card will be removed, therefore this patch doesn't change
the behavior of this option.
However the removal will be done by pm notifier, which runs while
userspace is still not frozen and thus can freely use del_gendisk,
without the risk of deadlock which would happen otherwise.
If you do use
Best regards,
Maxim Levitsky