2009-01-02 06:40:35

by Sriram V

Subject: Power Management with rootfs on SDMMC.

Hi,
I am using linux-2.6.27. I am testing power management after
booting from an SD/MMC card.
My root file system is on an SD card.

I am issuing the following command to suspend

$ echo -n mem > /sys/power/state

What happens is that the kernel hangs and does not come out of suspend,
even after I press a key on the keypad or generate serial input data.

# echo -n mem > /sys/power/state
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
mmc0: card e624 removed
--- kernel hangs---
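
For reference, the suspend request can also be scripted with a check that the kernel actually advertises the requested state. This is a minimal user-space sketch, not something from the original report: the helper names are made up for illustration, and it assumes that reading /sys/power/state returns a space-separated list of supported states. Writing to the real file needs root.

```python
from pathlib import Path


def supported_sleep_states(state_file="/sys/power/state"):
    """Return the sleep states the kernel lists, e.g. ['standby', 'mem']."""
    return Path(state_file).read_text().split()


def request_suspend(state="mem", state_file="/sys/power/state"):
    """Write `state` to the sysfs file, refusing states the kernel does not list."""
    states = supported_sleep_states(state_file)
    if state not in states:
        raise ValueError(f"{state!r} not supported; kernel offers {states}")
    # Same effect as: echo -n mem > /sys/power/state
    Path(state_file).write_text(state)
```

Calling request_suspend("mem") on a real system blocks until the machine resumes, or never returns if resume hangs as described here.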

The same test works if I boot over NFS and mount an SD card:
suspend and resume work fine. This suggests that
power management in the SD/MMC driver itself works properly.

However, I have a problem when the rootfs is on SDMMC card.
Has anyone tried this before?

Am I missing something here?

Regards,
sriram


2009-01-02 10:22:08

by Andreas Mohr

Subject: Re: Power Management with rootfs on SDMMC.

Hi,

> I am using linux-2.6.27. I am testing power management after
> booting out of a SD/MMC card.
> My root file system is on a SD card.
>
> I am issuing the following command to suspend
>
> $ echo -n mem > /sys/power/state
> What happens is, The kernel hangs and it does not come out of suspend.
> even after i press keypad/generate serial input data.


> Has anyone tried this before?
>
> Am i missing something here?

I don't think you're missing much, and you're definitely not alone.

There have been long threads on mobile phone and netbook related forums about issues
with seemingly "any slightly advanced use whatsoever" of partitions on SD cards.

IMHO in this strongly increasingly netbook- and mobile phone-enabled world it's
a bloody shame that:
- we have a hanging suspend/resume on an SD rootfs (often the only way of
achieving serious Linux use on a mobile phone!)
- we lose partition mounts due to full device re-probing instead of re-using the
same minor device ID after resume
- installing a swap partition on an SD card and then resuming can easily
go as far as __even completely corrupting__ the entire SD card partitioning
plus first partition (corrupts first 1kB of the card: both table and partition)
People then immediately resort to a non-helpful "Don't Do This, Ever" reply
(using swap partition on SD and suspend, see http://dev.laptop.org/ticket/6532#comment:10),
but to this I'd say:
News Flash, if this can theoretically be made to work at all using software
(i.e. there are no VM-related _hard_ blockers to such an operation
of using swap itself on a non-fixed SD slot), then this should goddamn be made
to work practically on Linux, _somehow_, since on SSD netbooks this is
the most natural thing to do to avoid wear of the builtin device.

CC'd Pierre Ossman since he might know more valuable details of some of those
problems. And CC'd Linus directly since, IMHO, some of these grave problems are
full BLOCKERS and should be fully investigated now given the general direction
that the hardware market is strongly going into.

We simply cannot afford being this terrible in this space.

Related URLs about such issues:
http://dev.laptop.org/ticket/6532 (note: marked FIXED,
but that's only an unhelpful local fix, not of the underlying issues)
https://kerneltrap.org/mailarchive/linux-kernel/2007/7/19/118682
http://wiki.laptop.org/go/OLPC_SW_ECO_-_SD_CARD_CORRUPTION
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=504391
http://forum.eeeuser.com/viewtopic.php?pid=472151
http://www.olpcnews.com/forum/index.php?topic=2842.0;wap2
https://dev.openwrt.org/browser/trunk/target/linux/s3c24xx/patches-2.6.24/1144-config-add-back-MMC_UNSAFE_RESUME.patch.patch?rev=13613
http://www.olpcnews.com/software/sugar/wanted_software_testers.html
https://bugs.launchpad.net/moblin-kernel/+bug/193177
http://www.gossamer-threads.com/lists/linux/kernel/984970

A possibly helpful clue might be provided by a similar CardBus incident,
http://marc.info/?l=linux-wireless&m=122539058601588&w=4

I think that CONFIG_MMC_UNSAFE_RESUME referenced by some of the URLs above
is indeed unsafe and one should try to find a proper solution for this issue
by doing some kind of active media re-identification
(media management via media IDs or so) upon resume.
Plus, in general such a hard CONFIG_ switch is entirely unsuitable for the
device flexibility one would take for granted these days.
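
To make the "media re-identification upon resume" idea concrete, here is a small sketch of the decision it implies. It is an illustration only, not how the MMC core is implemented: the names are hypothetical, and it assumes the card identity is available as a hex string (for example the card's CID register as exposed in sysfs).

```python
from enum import Enum


class ResumeAction(Enum):
    REATTACH = "reattach"  # same card: keep the old block device and its mounts
    DISCARD = "discard"    # card gone or swapped: drop references and warn


def decide_on_resume(saved_cid, current_cid):
    """Compare the identity recorded at suspend with what the slot reports now.

    `current_cid` is None when no card answers after resume.
    """
    if current_cid is None:
        return ResumeAction.DISCARD
    if saved_cid.strip().lower() == current_cid.strip().lower():
        return ResumeAction.REATTACH
    return ResumeAction.DISCARD
```

A matching identity only shows that the same card is present. It cannot show that the card was left untouched, since it may have been written in another device while the host was asleep.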

Any thoughts, anyone?
(I'm affected by both
hangs during resume when using SD
and corruption when using swap, depending on which kernel version or
setup I happen to be using on my A110L)

IOW, I have stopped using both(!) SD slots on my device altogether
(a nice 16GB card wasted), that's how bad the combined effect
of these issues was for me.
And there are multiple other examples of users in the URLs above where they have
entirely given up on this due to the existing problems.


(or, slightly reworded: I think it's high time for some kernel God to buy a measly
netbook or some such instead of 16-core mainframes to get a feeling for the
amount of issues that one hits there)

This reply admittedly is a hodge-podge of references to existing reports,
but this is just because I think that it's about time that these media management issues
ought to be investigated massively (e.g. by dedicating someone for a 24/7 job
until these things are fixed the proper way - Linux Foundation?).

Thanks,

Andreas Mohr

2009-01-02 11:22:18

by Pierre Ossman

Subject: Re: Power Management with rootfs on SDMMC.

On Fri, 2 Jan 2009 11:21:52 +0100
Andreas Mohr <[email protected]> wrote:

>
> There have been long threads on mobile phone and netbook related forums about issues
> with seemingly "any slightly advanced use whatsoever" of partitions on SD cards.
>

As you may notice, you only get egg on your face when you suspend, so
it's really just the single problem. Granted, it's still a big one.

> IMHO in this strongly increasingly netbook- and mobile phone-enabled world it's
> a bloody shame that:
> - we have a hanging suspend/resume on an SD rootfs (often the only way of
> achieving serious Linux use on a mobile phone!)

I take it this is without CONFIG_MMC_UNSAFE_RESUME.

The fundamental problem is that we have no way of detecting if a card
was removed during suspend, meaning we cannot guarantee that we'll
return the hardware to the upper layers in the same state it was
before the suspend.

There are two improvements that can be made here:

- Don't power down the card during suspend. This eats more power and
might not be supported on all systems, but it allows us to detect any
removal. This has been on my todo list for ages, but I haven't found
any time to implement it (or even test if I have any systems that
might support it).

- Have upper layers handle removal detection. E.g. in the common case
of rootfs, the filesystem driver verifies that the storage is in the
same state when it resumes as it was when it suspended. This requires
a lot of work though as AFAIK there is no suspend functionality in
either the block layer or the VFS.
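
The second option (upper layers verifying the storage at resume) can be sketched in user space as follows. This is only a toy model under stated assumptions: the class and function names are invented, and hashing the first 4 KiB stands in for whatever a real filesystem would compare (superblock fields, write counters and so on).

```python
import hashlib


def fingerprint(device_path, length=4096):
    """Hash the first `length` bytes of the device (partition table / superblock area)."""
    with open(device_path, "rb") as dev:
        return hashlib.sha256(dev.read(length)).hexdigest()


class ResumeGuard:
    """Records a fingerprint at suspend and checks it again at resume."""

    def __init__(self, device_path):
        self.device_path = device_path
        self._saved = None

    def on_suspend(self):
        self._saved = fingerprint(self.device_path)

    def on_resume(self):
        """Return True if the on-media state still matches the suspend-time record."""
        if self._saved is None:
            raise RuntimeError("on_suspend() was not called")
        try:
            return fingerprint(self.device_path) == self._saved
        except OSError:
            return False  # device missing or unreadable after resume
```

A check this shallow misses changes outside the hashed region, which is part of why doing this properly in the block layer or VFS is a lot of work.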

> - we lose partition mounts due to full device re-probing instead of re-using the
> same minor device ID after resume

This is a block layer issue, and I don't know if it's fixable.

Basically the problem is that someone is keeping the resources
associated with the pre-suspend block device pinned in memory. When the
post-suspend block device is created, it cannot reuse the device IDs
since they are still in use.
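
The effect of the pinned resources can be shown with a toy allocator. This is a simplified model, not the block layer's actual code; the class and function names are made up for illustration.

```python
class MinorAllocator:
    """Toy allocator that hands out the lowest free index, like a device-ID table."""

    def __init__(self):
        self._in_use = set()

    def allocate(self):
        minor = 0
        while minor in self._in_use:
            minor += 1
        self._in_use.add(minor)
        return minor

    def release(self, minor):
        self._in_use.discard(minor)


def simulate(release_old_first):
    """Return (index before suspend, index after resume) for one card."""
    alloc = MinorAllocator()
    old = alloc.allocate()  # card probed at boot gets index 0
    # Suspend "removes" the card. If a mounted filesystem still holds a
    # reference, the old index cannot be released before the re-probe.
    if release_old_first:
        alloc.release(old)
    new = alloc.allocate()  # card re-probed at resume
    return old, new
```

With the old index still held, simulate(False) gives (0, 1): the same card comes back under a new ID while existing mounts still point at the old one. Only simulate(True) gives (0, 0).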

> - installing a swap partition on an SD card and then resuming can easily
> go as far as __even completely corrupting__ the entire SD card partitioning
> plus first partition (corrupts first 1kB of the card: both table and partition)
> People then immediately resort to a non-helpful "Don't Do This, Ever" reply
> (using swap partition on SD and suspend, see http://dev.laptop.org/ticket/6532#comment:10),

Hmm... I was under the impression that they got this fixed nice and
proper. Perhaps comment 34 should be sent to lkml and/or added to the
kernel bugzilla.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.



2009-01-02 12:21:34

by Andreas Mohr

Subject: Re: Power Management with rootfs on SDMMC.

On Fri, Jan 02, 2009 at 12:21:48PM +0100, Pierre Ossman wrote:
> On Fri, 2 Jan 2009 11:21:52 +0100
> Andreas Mohr <[email protected]> wrote:
>
> >
> > There have been long threads on mobile phone and netbook related forums about issues
> > with seemingly "any slightly advanced use whatsoever" of partitions on SD cards.
> >
>
> As you may notice, you only get egg on your face when you suspend, so
> it's really just the single problem. Granted, it's still a big one.

The problem being that I (just like many other users) am trying to suspend
"all the time" (my God-Given Right ;), with issues popping up "all the time"
(Intel VC switching, microcode module, ath5k, and SD slots, just to name
all resume issues - now mostly working - on one single machine recently).
And I'm just fed up with it, sorry to have to put it that bluntly.

> > IMHO in this strongly increasingly netbook- and mobile phone-enabled world it's
> > a bloody shame that:
> > - we have a hanging suspend/resume on an SD rootfs (often the only way of
> > achieving serious Linux use on a mobile phone!)
>
> I take it this is without CONFIG_MMC_UNSAFE_RESUME.

Indeed (and I admittedly haven't even done any .28 tests yet about the
previous observations of suspend hangs and resume hangs
and partition corruption).

But one of my items was that CONFIG_MMC_UNSAFE_RESUME itself seems a
pretty inflexible and _hard-wired-selectable_ workaround measure anyway.

> The fundamental problem is that we have no way of detecting if a card
> was removed during suspend, meaning we cannot guarantee that we'll
> return the hardware to the upper layers in the same state it was
> before the suspend.
>
> There are two improvements that can be made here:
>
> - Don't power down the card during suspend. This eats more power and
> might not be supported on all systems, but it allows us to detect any
> removal. This has been on my todo list for ages, but I haven't found
> any time to implement it (or even test if I have any systems that
> might support it).

While this would improve things, it seems to be the second-best solution only,
especially since this probably requires properly working removal
notification for _every_ controller type.

> - Have upper layers handle removal detection. E.g. in the common case
> of rootfs, the filesystem driver verifies that the storage is in the
> same state when it resumes as it was when it suspended. This requires
> a lot of work though as AFAIK there is no suspend functionality in
> either the block layer or the VFS.

To me this seems to be the clearly preferred method.
(CC'd VFS, already pondered before whether I should do this but decided not to yet)

> > - we lose partition mounts due to full device re-probing instead of re-using the
> > same minor device ID after resume
>
> This is a block layer issue, and I don't know if it's fixable.
>
> Basically the problem is that someone is keeping the resources
> associated with the pre-suspend block device pinned in memory. When the
> post-suspend block device is created, it cannot reuse the device IDs
> since they are still in use.

I thought so, but someone would need to get to the bottom of this
and figure out a way to get decent suspend support or at least a workaround.

> > - installing a swap partition on an SD card and then resuming can easily
> > go as far as __even completely corrupting__ the entire SD card partitioning
> > plus first partition (corrupts first 1kB of the card: both table and partition)
> > People then immediately resort to a non-helpful "Don't Do This, Ever" reply
> > (using swap partition on SD and suspend, see http://dev.laptop.org/ticket/6532#comment:10),
>
> Hmm... I was under the impression that they got this fixed nice and
> proper. Perhaps comment 34 should be sent to lkml and/or added to the
> kernel bugzilla.

Right, #34 seems to describe pretty much what I think should be done
(keep things powered-down, then resume and compare with existing remembered
media id and revive old device handle in case it's actually same card).

("media id" above preferably being a generic kernel concept of a media id
mechanism supported for all sorts of different media that a controller
may allow the kernel to support).


As a side note, I'm voicing a "me too" of not being too happy
to see people hard-coding timeouts there to try to "fix" this issue
instead of directly trying to come up with a synchronized signalling method
to fix this race there.


Am I right in thinking that if this is fixed properly, it would be the
CONFIG_MMC_UNSAFE_RESUME way of handling things, just in a sufficiently safe
manner? (notwithstanding user stupidity, i.e. hard removal of cards)
(i.e. CONFIG_MMC_UNSAFE_RESUME would then just be made default?)
Or... hmm... perhaps CONFIG_MMC_UNSAFE_RESUME actually would already
work for me entirely with my PCIE hotplug controller
in case its driver already provides reliably timed controller reinit
/ media re-detection after resume...

Anyway, the general thinking here _has_ to be:
if a mounted card remains in the slot during suspend, then it _should_ get
re-assigned properly, and if it has been removed despite not being
unmounted, then after resume the kernel should actively discard all references
(and throw a warning or some such).
And having a special CONFIG_MMC_UNSAFE_RESUME isn't really helpful here AFAICS,
VFS (and all related layers) should be able to handle this on its own
in its entirety, and if it's not able to do this
then it is to be considered very buggy and ought to be fixed.

But this is all common wisdom anyway I'd think, someone would have to actually
implement things to correctly work this way.


I actually thought of digging into this myself some time, but as opposed
to libata UDMA issues or WLAN LED support it's way too problematic
to tackle for me since it's said to be deep in VFS lands and debugging on
this measly machine would additionally take ages (2 hours in case one
needs an entire kernel build), plus limited time.

Thanks for your comments!

Andreas Mohr

2009-01-03 20:11:17

by Pavel Machek

Subject: Re: Power Management with rootfs on SDMMC.

Hi!

> > I am using linux-2.6.27. I am testing power management after
> > booting out of a SD/MMC card.
> > My root file system is on a SD card.
> >
> > I am issuing the following command to suspend
> >
> > $ echo -n mem > /sys/power/state
> > What happens is, The kernel hangs and it does not come out of suspend.
> > even after i press keypad/generate serial input data.
>
>
> > Has anyone tried this before?
> >
> > Am i missing something here?
>
> I don't think you're missing much, and you're definitely not alone.
>
> There have been long threads on mobile phone and netbook related forums about issues
> with seemingly "any slightly advanced use whatsoever" of partitions on SD cards.
>
> IMHO in this strongly increasingly netbook- and mobile phone-enabled world it's
> a bloody shame that:
...
> - installing a swap partition on an SD card and then resuming can easily
> go as far as __even completely corrupting__ the entire SD card partitioning
> plus first partition (corrupts first 1kB of the card: both table and partition)
> People then immediately resort to a non-helpful "Don't Do This, Ever" reply
> (using swap partition on SD and suspend, see http://dev.laptop.org/ticket/6532#comment:10),
> but to this I'd say:
> News Flash, if this can theoretically be made to work at all using software
> (i.e. there are no VM-related _hard_ blockers to such an operation
> of using swap itself on a non-fixed SD slot), then this should goddamn be made
> to work practically on Linux, _somehow_, since on SSD netbooks this is
> the most natural thing to do to avoid wear of the builtin device.

I'd like to help with this one... can you reproduce this?


> (or, slightly reworded: I think it's high time for some kernel God to buy a measly
> netbook or some such instead of 16-core mainframes to get a feeling for the
> amount of issues that one hits there)

I have one and yes, it's full of problems; see my blog post about 'evil
little cards'.

OTOH currently you can't safely use ext3 on a flash card, so suspend
problems seem a little 'uninteresting' compared to that.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-01-03 20:24:38

by Linus Torvalds

Subject: Re: Power Management with rootfs on SDMMC.



On Fri, 2 Jan 2009, Pavel Machek wrote:
>
> > (or, slightly reworded: I think it's high time for some kernel God to buy a measly
> > netbook or some such instead of 16-core mainframes to get a feeling for the
> > amount of issues that one hits there)
>
> I have one and yes, its full of problems; see my blog post about 'evil
> little cards'.

Btw, why do people think that CONFIG_USB_PERSIST got removed, and is now
always on? Exactly because of this issue, and because one kernel God (me)
wanted to be able to suspend their netbook.

That said, right now most flash performance is so incredibly bad that I
can't reasonably be expected to use that thing for anything real. They may
say 15-20 MB/s, but that's best-case streaming performance. Real life
write performance is often in the tens of kB (yes, *kilo* bytes) per
second. I hope it will improve, and no, it's not just about ext3 journal
behavior, although that certainly makes some of the issues much much
worse.

So no, I don't use my netbook very much. But this _is_ largely fixed on
the USB side, even if you may need to do

echo 1 >/sys/bus/usb/devices/.../power/persist

and I argued that we should default to it for any mounted device. It's
hard to do, since the device doesn't even know about whether it's mounted
or not.
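
For anyone scripting the persist setting, a minimal sketch follows. The function name is hypothetical, and the USB device directory (the part elided as "..." above) has to be supplied by the caller; nothing here tries to guess it.

```python
from pathlib import Path


def enable_persist(usb_device_dir):
    """Write '1' to <usb_device_dir>/power/persist and return the path written."""
    attr = Path(usb_device_dir) / "power" / "persist"
    if not attr.exists():
        raise FileNotFoundError(f"no persist attribute at {attr}")
    attr.write_text("1")  # same effect as: echo 1 > .../power/persist
    return attr
```

The existence check is there so that a wrong directory fails loudly instead of silently creating a stray file.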

Linus

2009-01-03 20:41:47

by Pavel Machek

Subject: Re: Power Management with rootfs on SDMMC.

On Sat 2009-01-03 12:23:39, Linus Torvalds wrote:
>
>
> On Fri, 2 Jan 2009, Pavel Machek wrote:
> >
> > > (or, slightly reworded: I think it's high time for some kernel God to buy a measly
> > > netbook or some such instead of 16-core mainframes to get a feeling for the
> > > amount of issues that one hits there)
> >
> > I have one and yes, its full of problems; see my blog post about 'evil
> > little cards'.
>
> Btw, why do people think that CONFIG_USB_PERSIST got removed, and is now
> always on? Exactly because of this issue, and because one kernel God (me)
> wanted to be able to suspend their netbook.
>
> That said, right now most flash performance is so incredibly bad that I
> can't reasonably be expected to use that thing for anything real. They may
> say 15-20 MB/s, but that's best-case streaming performance. Real life
> write performance is often in the tens of kB (yes, *kilo* bytes) per
> second. I hope it will improve, and no, it's not just about ext3 journal
> behavior, although that certainly makes some of the issues much much
> worse.
>
> So no, I don't use my netbook very much. But this _is_ largely fixed on
> the USB side, even if you may need to do
>
> echo 1 >/sys/bus/usb/devices/.../power/persist
>
> and I argued that we should default to it for any mounted device. It's
> hard to do, since the device doesn't even know about whether it's mounted
> or not.

Unfortunately, that corrupts filesystems by default on devices like
olympus c-750 digital camera:

* when USB is powered, olympus unmounts its filesystem, and goes into
mass storage mode.

* when USB is unpowered, olympus mounts it back.

So you have mounted filesystem on linux side and suspend. That powers
down the usb and camera mounts the filesystem automagically... then
linux resumes, and state of in-memory caches does not match real
filesystem state...
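
The failure mode can be modelled in a few lines. This is a toy simulation with invented names, not a description of any real driver: the "gadget" rewrites a block while the host sleeps, and the host either keeps its cache (persist) or drops it (device treated as new).

```python
class Gadget:
    """Toy device that owns its storage and may write to it while the host sleeps."""

    def __init__(self):
        self.blocks = {0: b"directory v1"}


class Host:
    """Toy host with a simple read cache in front of the gadget."""

    def __init__(self, gadget):
        self.gadget = gadget
        self.cache = {}

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.gadget.blocks[block]
        return self.cache[block]

    def resume(self, persist):
        if not persist:
            self.cache.clear()  # device treated as new: cached data is dropped


def scenario(persist):
    """Return True if the host's view matches the media after resume."""
    gadget = Gadget()
    host = Host(gadget)
    host.read(0)                        # host caches block 0 before suspend
    gadget.blocks[0] = b"directory v2"  # gadget rewrites it during suspend
    host.resume(persist)
    return host.read(0) == gadget.blocks[0]
```

Here scenario(True) returns False (stale cache) and scenario(False) returns True, which is the trade-off being argued about: persistence keeps the mount alive but trusts that nothing changed underneath it.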

[Now, olympus uses VFAT and probably will not write to the filesystem
unless you take a picture... with mp3 player, I'd expect mtime to be
updated on the playlist files...]
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-01-03 20:45:37

by Alan

Subject: Re: Power Management with rootfs on SDMMC.

> [Now, olympus uses VFAT and probably will not write to the filesystem
> unless you take a picture... with mp3 player, I'd expect mtime to be
> updated on the playlist files...]

Some of the MP3/OGG players (especially the bigger disk based ones) use
the file system to store their own metadata so yes it makes a nasty mess.

2009-01-03 21:17:40

by Linus Torvalds

Subject: Re: Power Management with rootfs on SDMMC.



On Sat, 3 Jan 2009, Alan Cox wrote:

> > [Now, olympus uses VFAT and probably will not write to the filesystem
> > unless you take a picture... with mp3 player, I'd expect mtime to be
> > updated on the playlist files...]
>
> Some of the MP3/OGG players (especially the bigger disk based ones) use
> the file system to store their own metadata so yes it makes a nasty mess.

Well, it goes both ways. You can make a nasty mess right now by suspending
and simply not having a working computer when it comes back - all your
work being lost.

At least with cameras and mp3 players you _can_ choose to just unmount
them before suspending (and most distros would hopefully mount them with
something like automount anyway?). In contrast, if your root (or /home)
directory is on a SD card, and that card is over USB rather than some IDE
controller, you're basically screwed.

Yes, distros can set the /sys/bus/usb/devices/.../power/persist thing for
important filesystems automatically, and I'm sure there are magic udev
rules that could be written (and perhaps even exist). The likelihood that
they actually get things right is pretty low, though. So I suspect we'd be
much better off having sane defaults in the kernel instead.

And yes, the "sane defaults" may well be that FATFS does _not_ make the
media be persistent.

Think "door lock" commands (aka "prevent/allow medium removal"). This is
really not that very different. Some filesystems are so important that the
user messing with them is deadly anyway - so we should "lock" them and
consider them persistent - and if the user does something bad, there was
really never any good solution for it. Other filesystems we're better off
just letting the user rip out, because we can be reasonably expected to
handle it gracefully.

So it boils down to the fact that if you have something like / or /home
mounted, we really _cannot_ do any better than "assume the user doesn't
screw us up".
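
One possible reading of these "sane defaults", written out as a policy function. This is a sketch of the idea only: the sets of mount points and filesystem types are assumptions chosen for illustration, and the input is text in the /proc/mounts layout (device, mount point, filesystem type, ...).

```python
# Assumed policy inputs, chosen for illustration only.
CRITICAL_MOUNT_POINTS = {"/", "/home"}
NON_PERSISTENT_FILESYSTEMS = {"vfat", "msdos"}


def should_persist(mount_point, fstype):
    """FAT-style media are never persistent; critical mount points are."""
    if fstype in NON_PERSISTENT_FILESYSTEMS:
        return False
    return mount_point in CRITICAL_MOUNT_POINTS


def persistent_devices(mounts_text):
    """Parse /proc/mounts-style text; return devices to mark persistent."""
    devices = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fstype = fields[:3]
        if should_persist(mount_point, fstype):
            devices.append(device)
    return devices
```

Checking the filesystem type first means a FAT-formatted root would not be marked persistent. That ordering is a design choice in this sketch, not something settled in the thread.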

A per-filesystem callback to re-verify at resume might be a good idea, but
a lot of filesystems cannot reasonably do a lot of verification.

Linus

2009-01-03 23:10:23

by Alan

Subject: Re: Power Management with rootfs on SDMMC.

> Well, it goes both ways. You can make a nasty mess right now by suspending
> and simply not having a working computer when it comes back - all your
> work being lost.

Yes but these are both symptoms of the same problem.

> they actually get things right is pretty low, though. So I suspect we'd be
> much better off having sane defaults in the kernel instead.

I don't believe "auto-destroy my music collection" is a sane default...

> So it boils down to the fact that if you have something like / or /home
> mounted, we really _cannot_ do any better than "assume the user doesn't
> screw us up".
>
> A per-filesystem callback to re-verify at resume might be a good idea, but
> a lot of filesystems cannot reasonably do a lot of verification.

A per file system sync and quiesce is I think also part of the
requirement. Having the file system media consistent but still mounted
before suspending is a good thing anyway (especially with stuff like USB
keys that people do then go and remove post suspend) and you can put the
device into a consistent state and revalidate it *regardless* of
whether it is / or a music player. What you do if revalidating / fails is
another question ;)

Alan

2009-01-04 03:00:10

by Linus Torvalds

Subject: Re: Power Management with rootfs on SDMMC.



On Sat, 3 Jan 2009, Alan Cox wrote:
>
> I don't believe "auto-destroy my music collection" is a sane default...

You are missing the point.

You have one totally made-up example of something that may happen as a
result of a default I didn't even advocate (but you didn't read my email).

And you use that as an argument against another case that wasn't even made
up, but a real issue where suspend simply didn't work, because the /home
partition no longer worked afterwards.

There really was nothing theoretical in my issue. On a certain class of
hardware, you absolutely _have_ to make your /home or / partition be
behind a USB thing, because nothing else has enough space on it.

And your made-up example wouldn't even trigger if we just made a per-mount
decision to mark devices persistent.

So why are you arguing?

Linus

2009-01-04 12:04:20

by Alan

Subject: Re: Power Management with rootfs on SDMMC.

> > I don't believe "auto-destroy my music collection" is a sane default...
>
> You are missing the point.

Nope.

> You have one totally made-up example of something that may happen as a
> result of a default I didn't even advocate (but you didn't read my email).

Actually I went and verified the behaviour of the iRiver iHP20.

> There really was nothing theoretical in my issue. On a certain class of
> hardware, you absolutely _have_ to make your /home or / partition be
> behind a USB thing, because nothing else has enough space on it.

Yes clearly.

> And your made-up example wouldn't even trigger if we just made a per-mount

What made up example ?

> decision to mark devices persistent.

The distribution question is 'how do you make that decision reliably and
correctly ?'. That is closely followed by 'what state should it end up in
if the relevant scripts don't run for some reason ?' - which is clearly
"not persistent" for safety reasons.

> So why are you arguing?

I'm not aware I was. I was simply pointing out that

- the general distribution default cannot be one that harms user data on
music players
- that you can fix it more elegantly by quiescing and validating file
systems across a suspend/resume

neither of which appears to disagree with the point that you want USB (or
increasingly SD card) to be able to autoresume when it holds file
systems. It does however make clear which way around any kernel default
should be.

Alan

2009-01-04 17:53:41

by Linus Torvalds

Subject: Re: Power Management with rootfs on SDMMC.



On Sun, 4 Jan 2009, Alan Cox wrote:
>
> The distribution question is 'how do you make that decision reliably and
> correctly ?'. That is closely followed by 'what state should it end up in
> if the relevant scripts don't run for some reason ?' - which is clearly
> "not persistent" for safety reasons.

No.

You clearly haven't even read my emails. I quote:

And yes, the "sane defaults" may well be that FATFS does _not_ make the
media be persistent.

however, the sane default remains that things like / and /home _should_ be
marked persistent.


> I was simply pointing out that
>
> - the general distribution default cannot be one that harms user data on
> music players

No you were not. You were repeating that mantra as if it was relevant,
when it isn't.

> - that you can fix it more elegantly by quiescing and validating file
> systems across a suspend/resume

Send me a patch.

Hint: you can't sanely even validate it without teaching the USB layer
_not_ to disconnect and re-connect the damn thing on resume (ie set the
"persistent" flag), because that can easily cause totally unrelated
changes to cause the device to be re-enumerated. How do you even find it?

In other words, you're wrong.

The only sane thing to do is what I already outlined: teach the
mount-points (and yes, on a per-mount-point basis) to mark the devices
persistent, and make the device drivers honor that.

Then, in _addition_, once the device stays around, you can also
add a filesystem callback for suspend/resume, and make the filesystem try
to do extra sanity checking.

But quite frankly, that's very much the secondary stage, since it won't
matter in any sane real situation (ie once you've simply said that "we
don't care about FAT being persistent"). And it's a secondary stage also
simply because it needs to happen _after_ you've already made the ones you
care about persistent, since it simply won't work otherwise.

Linus

2009-03-29 16:26:19

by Andreas Mohr

Subject: Re: Power Management with rootfs on SDMMC.

Hi,

On Fri, Jan 02, 2009 at 06:24:13PM +0100, Pavel Machek wrote:
> > IMHO in this strongly increasingly netbook- and mobile phone-enabled world it's
> > a bloody shame that:
> ...
> > - installing a swap partition on an SD card and then resuming can easily
> > go as far as __even completely corrupting__ the entire SD card partitioning
> > plus first partition (corrupts first 1kB of the card: both table and partition)
> > People then immediately resort to a non-helpful "Don't Do This, Ever" reply
> > (using swap partition on SD and suspend, see http://dev.laptop.org/ticket/6532#comment:10),
> > but to this I'd say:
> > News Flash, if this can theoretically be made to work at all using software
> > (i.e. there are no VM-related _hard_ blockers to such an operation
> > of using swap itself on a non-fixed SD slot), then this should goddamn be made
> > to work practically on Linux, _somehow_, since on SSD netbooks this is
> > the most natural thing to do to avoid wear of the builtin device.
>
> I'd like to help with this one... can you reproduce this?

Replying now since I'd like to give a status update:

I'm currently running a 2.6.29-rc8 with CONFIG_MMC_UNSAFE_RESUME enabled,
and an external SD card mount (ext2) _does_ survive resume properly
(most likely without this option it would still corrupt the partition
after resume).
Not sure whether use of the swap partition on this card is trouble-free during
resume, though... (not seen much use so far, and no issues yet either)

However while UNSAFE_RESUME does work, obviously it's not a good idea
at all to have the Kconfig option described as:

config MMC_UNSAFE_RESUME
	bool "Allow unsafe resume (DANGEROUS)"
	help
	  If you say Y here, the MMC layer will assume that all cards
	  stayed in their respective slots during the suspend. The
	  normal behaviour is to remove them at suspend and
	  redetecting them at resume. Breaking this assumption will
	  in most cases result in data corruption.

	  This option is usually just for embedded systems which use
	  a MMC/SD card for rootfs. Most people should say N here.


since in such a use case it's obviously dangerous to _not_ have
this option enabled.
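
To find out which way a given kernel was built, the option can be looked up in the kernel's configuration text. A minimal sketch, with a made-up function name; where the text comes from (for example a config file shipped in /boot, or /proc/config.gz where the kernel provides it) is left to the caller and is an assumption about the system at hand.

```python
def config_value(config_text, symbol):
    """Return the value of `symbol` ('y', 'm', ...) or None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(f"{symbol}="):
            return line.split("=", 1)[1]
        if line == f"# {symbol} is not set":
            return None
    return None
```

For example, config_value(text, "CONFIG_MMC_UNSAFE_RESUME") returns "y" on a kernel built with the option and None otherwise.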

Or, in other words (as I have written before), media management should
get smarter to not need this option at all.

The least that should be done now is to drastically change this Kconfig
description (mention that _both_ settings can be dangerous, and help
people decide which one to use), to reflect current knowledge.
I'd be willing to submit a patch... (--> Pierre, right?)


[[ JFYI:

I'm very relieved to see those severe SSD bio performance problems
being addressed currently (Fengguang et al).

2.6.29-rc8 is too rough otherwise:
still i915 issues: x.org crash every ~ half-dozen resumes;
some ath5k weirdness (a suspected ath5k oops when doing rfkill, ...).
Plus e100 non-MII variants still not supported in 2.6.29 (let's see
how many complaints we get ;).

Going to install 2.6.29.1 (the one with network dropout fixed) once it appears.

]]

Thanks,

Andreas Mohr

2009-04-05 18:45:23

by Pierre Ossman

Subject: Re: Power Management with rootfs on SDMMC.

On Sun, 29 Mar 2009 18:26:01 +0200
Andreas Mohr <[email protected]> wrote:

>
> The least that should be done now is to drastically change this Kconfig
> description (mention that _both_ settings can be dangerous, and help
> people decide which one to use), to reflect current knowledge.
> I'd be willing to submit a patch... (--> Pierre, right?)
>

Right. And it probably needs a better description, yes. What should
also be clear is that one failure mode results in noisy corruption, and
the other in silent corruption (which is much worse IMO).

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

