2007-05-10 10:35:53

by Haavard Skinnemoen

[permalink] [raw]
Subject: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

At some point before 2.6.20, the mmc subsystem moved the card
detection code to its own workqueue. One consequence of this change
is that when using an mmc card as a root device, the card may get
detected after the init task attempts to mount the root filesystem,
causing a kernel panic because the root device could not be opened.

This patch adds a call to mmc_flush_scheduled_work() late in the boot
sequence so that we can be sure the mmc card detection scans are
complete before attempting to use an mmc device as a root device.

Signed-off-by: Haavard Skinnemoen <[email protected]>
---
drivers/mmc/core/sysfs.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/mmc/core/sysfs.c b/drivers/mmc/core/sysfs.c
index 843b1fb..168b20f 100644
--- a/drivers/mmc/core/sysfs.c
+++ b/drivers/mmc/core/sysfs.c
@@ -358,3 +358,16 @@ static void __exit mmc_exit(void)

module_init(mmc_init);
module_exit(mmc_exit);
+
+#ifndef MODULE
+/*
+ * Make sure scanning for new cards has completed before attempting
+ * to mount the root filesystem.
+ */
+static int __init mmc_finish_detect(void)
+{
+ mmc_flush_scheduled_work();
+ return 0;
+}
+late_initcall(mmc_finish_detect);
+#endif
--
1.4.4.4


2007-05-10 12:06:33

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

Haavard Skinnemoen wrote:
> At some point before 2.6.20, the mmc subsystem moved the card
> detection code to its own workqueue. One consequence of this change
> is that when using an mmc card as a root device, the card may get
> detected after the init task attempts to mount the root filesystem,
> causing a kernel panic because the root device could not be opened.
>
> This patch adds a call to mmc_flush_scheduled_work() late in the boot
> sequence so that we can be sure the mmc card detection scans are
> complete before attempting to use an mmc device as a root device.
>
> Signed-off-by: Haavard Skinnemoen <[email protected]>

NAK. This is still hackish and not a reliable, controlled way of handling the
issue of rootfs on removable media.

For reference, how is this handled in USB (which is conceptually identical)? The
normal case for removable media is usually an initrd that can wait for hotplug
events.

Rgds
Pierre


Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2007-05-10 12:37:48

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Thu, 10 May 2007 14:04:25 +0200
Pierre Ossman <[email protected]> wrote:

> Haavard Skinnemoen wrote:
> > At some point before 2.6.20, the mmc subsystem moved the card
> > detection code to its own workqueue. One consequence of this change
> > is that when using an mmc card as a root device, the card may get
> > detected after the init task attempts to mount the root filesystem,
> > causing a kernel panic because the root device could not be opened.
> >
> > This patch adds a call to mmc_flush_scheduled_work() late in the boot
> > sequence so that we can be sure the mmc card detection scans are
> > complete before attempting to use an mmc device as a root device.
> >
> > Signed-off-by: Haavard Skinnemoen <[email protected]>
>
> NAK. This is still hackish and not a reliable, controlled way of handling the
> issue of rootfs on removable media.

What exactly makes this unreliable? This is done almost exactly the
same way for SCSI. See drivers/scsi/scsi_wait_scan.c.

> For reference, how is this handled in USB (which is conceptually identical)? The
> normal case for removable media is usually an initrd that can wait for hotplug
> events.

I don't know about USB, but root=/dev/mmcblk0p1 used to work before
2.6.20 and it doesn't work anymore. Doesn't that make this a regression?

Haavard

2007-05-10 13:47:00

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

Haavard Skinnemoen wrote:
>
> What exactly makes this unreliable? This is done almost exactly the
> same way for SCSI. See drivers/scsi/scsi_wait_scan.c.
>

I am not against the function of waiting for an initial scan, what I oppose is
using side effects to achieve that function. I do not want to take
responsibility for something that easily breaks because we use a kernel
subsystem for something it wasn't meant for.

That said, if there is a precedent for achieving this function a certain way, I
might be convinced to let it in. I'll have a look at that scsi example.

>
> I don't know about USB, but root=/dev/mmcblk0p1 used to work before
> 2.6.20 and it doesn't work anymore. Doesn't that make this a regression?
>

Yes and no. It depends on if it was an official function, or just an effect of
how things currently were implemented. As far as I can see, it's the latter. The
MMC layer goes through several steps that could very well get delayed or happen
in parallel. So the fact that it happened to work the way you wanted it to was
sheer luck.

Rgds
Pierre



Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2007-05-10 14:34:00

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Thu, 10 May 2007 15:45:54 +0200
Pierre Ossman <[email protected]> wrote:

> Haavard Skinnemoen wrote:
> >
> > What exactly makes this unreliable? This is done almost exactly the
> > same way for SCSI. See drivers/scsi/scsi_wait_scan.c.
> >
>
> I am not against the function of waiting for an initial scan, what I oppose is
> using side effects to achieve that function. I do not want to take
> responsibility for something that easily breaks because we use a kernel
> subsystem for something it wasn't meant for.

Ok, is there any better way to achieve the same this? As far as I
can tell, mmc_remove_host() uses mmc_flush_scheduled_work() for the
same purpose -- it ensures that scans that have already been started
will have completed before the function continues.

> That said, if there is a precedent for achieving this function a certain way, I
> might be convinced to let it in. I'll have a look at that scsi example.

Thanks.

> > I don't know about USB, but root=/dev/mmcblk0p1 used to work before
> > 2.6.20 and it doesn't work anymore. Doesn't that make this a regression?
> >
>
> Yes and no. It depends on if it was an official function, or just an effect of
> how things currently were implemented. As far as I can see, it's the latter. The
> MMC layer goes through several steps that could very well get delayed or happen
> in parallel. So the fact that it happened to work the way you wanted it to was
> sheer luck.

I wouldn't call it luck. The way mmc_rescan() is implemented, any scans
that are started before late_initcall time are completed before
mmc_finish_detect() returns. The steps are all done synchronously in
the workqueue function.

And I never realized that using a block device as a parameter to the
root= parameter could possibly be "unofficial" functionality...?

Haavard

2007-05-10 15:59:32

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

Haavard Skinnemoen wrote:
>
> Ok, is there any better way to achieve the same this? As far as I
> can tell, mmc_remove_host() uses mmc_flush_scheduled_work() for the
> same purpose -- it ensures that scans that have already been started
> will have completed before the function continues.
>

Not quite. mmc_remove_host() doesn't care if a detect has been executed or not,
it only cares about whether or not one is executing right now. So in order for
your patch to work, there must be no way that mmc_finish_detect() is called
before the detect is scheduled.

>
> I wouldn't call it luck. The way mmc_rescan() is implemented, any scans
> that are started before late_initcall time are completed before
> mmc_finish_detect() returns. The steps are all done synchronously in
> the workqueue function.
>

Indeed. But it is not by design that they are scheduled before late_initcall().
Also, flushing the workqueue is also not a by-design way of completing a scan,
it just happens to be that way right now.

> And I never realized that using a block device as a parameter to the
> root= parameter could possibly be "unofficial" functionality...?

Then you've learned something new. ;)

Only some devices work that way (generally only those with "simple" interfaces).
And most things are these days behind more advanced buses, more akin to a network.

Generally, not waiting for a lot of hardware is a good thing as it speeds up
boot times. In my mind, the proper way is having a script in an initrd that
waits for just the parts of the hardware that this particular system needs.
Everything else can be brought up in the background.

Rgds
Pierre



Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2007-05-10 16:28:04

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Thu, 10 May 2007 17:58:27 +0200
Pierre Ossman <[email protected]> wrote:

> Haavard Skinnemoen wrote:
> >
> > Ok, is there any better way to achieve the same this? As far as I
> > can tell, mmc_remove_host() uses mmc_flush_scheduled_work() for the
> > same purpose -- it ensures that scans that have already been started
> > will have completed before the function continues.
> >
>
> Not quite. mmc_remove_host() doesn't care if a detect has been executed or not,
> it only cares about whether or not one is executing right now. So in order for
> your patch to work, there must be no way that mmc_finish_detect() is called
> before the detect is scheduled.

Is there any way this can happen? Host controller drivers register
themselves from module_init(), their probe() function gets called,
which calls mmc_add_host(), which schedules the initial scan. If
multiple host controllers are present, they will all schedule their
scans before all module_init()s have been called. Am I missing
something?

> > I wouldn't call it luck. The way mmc_rescan() is implemented, any scans
> > that are started before late_initcall time are completed before
> > mmc_finish_detect() returns. The steps are all done synchronously in
> > the workqueue function.
> >
>
> Indeed. But it is not by design that they are scheduled before late_initcall().
> Also, flushing the workqueue is also not a by-design way of completing a scan,
> it just happens to be that way right now.

So how is it supposed to work "by design"?

> > And I never realized that using a block device as a parameter to the
> > root= parameter could possibly be "unofficial" functionality...?
>
> Then you've learned something new. ;)

Guess so. It seems like the prepare_namespace stuff is getting less
useful every day.

> Only some devices work that way (generally only those with "simple" interfaces).
> And most things are these days behind more advanced buses, more akin to a network.

It's funny that NFS root does not have these kinds of problems then...

> Generally, not waiting for a lot of hardware is a good thing as it speeds up
> boot times. In my mind, the proper way is having a script in an initrd that
> waits for just the parts of the hardware that this particular system needs.
> Everything else can be brought up in the background.

Yeah, but I suspect most users will rather have a system that actually
boots than a system that would have booted very quickly if it had booted
at all.

Btw, how can I wait for the scanning to complete from early userspace?

Haavard

2007-05-10 17:52:01

by Matt Reimer

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On 5/10/07, Haavard Skinnemoen <[email protected]> wrote:
> Btw, how can I wait for the scanning to complete from early userspace?

A workaround is to pass the kernel argument rootdelay=5 or something
like that (the value specifies seconds).

Matt

2007-05-10 19:42:19

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

Haavard Skinnemoen wrote:
> On Thu, 10 May 2007 17:58:27 +0200
> Pierre Ossman <[email protected]> wrote:
>
> Is there any way this can happen? Host controller drivers register
> themselves from module_init(), their probe() function gets called,
> which calls mmc_add_host(), which schedules the initial scan. If
> multiple host controllers are present, they will all schedule their
> scans before all module_init()s have been called. Am I missing
> something?
>

You're assuming two things:

1. The bus the host controller reside on is synchronous both in adding devices
and drivers. This is true for most buses, but not all. As we can see, the MMC
bus is an example of one that isn't.

2. The bus the host controller reside on is scanned before any sync function we add.

The second probably isn't much of a stretch, but the first is more likely to
break. E.g. an usb based controller would not satisfy 1 as usb is scanned
asynchronously. Platform devices are case by case (e.g. some device might be
delayed since it needs time to power up).

>> Indeed. But it is not by design that they are scheduled before late_initcall().
>> Also, flushing the workqueue is also not a by-design way of completing a scan,
>> it just happens to be that way right now.
>
> So how is it supposed to work "by design"?
>

It isn't. The MMC bus is a pesky bugger in that it is slow and involves a lot of
sleeping and asynchronous callbacks. As such, synchronisation needs to be very
explicit using for example completions.

>>> And I never realized that using a block device as a parameter to the
>>> root= parameter could possibly be "unofficial" functionality...?
>> Then you've learned something new. ;)
>
> Guess so. It seems like the prepare_namespace stuff is getting less
> useful every day.
>

Probably. But I'd like to hope we're gaining more than we lose. There are some
horror stories from the scsi camp where synchronous scanning means it takes an
hour to boot a machine.

>> Only some devices work that way (generally only those with "simple" interfaces).
>> And most things are these days behind more advanced buses, more akin to a network.
>
> It's funny that NFS root does not have these kinds of problems then...
>

I'm not familiar with that code, but I would guess it has a whole bunch of
prerequisite conditions. And the nfs root installations I've seen all use
initrd, so I would assume there are some restrictions to letting the kernel
handle it.

>> Generally, not waiting for a lot of hardware is a good thing as it speeds up
>> boot times. In my mind, the proper way is having a script in an initrd that
>> waits for just the parts of the hardware that this particular system needs.
>> Everything else can be brought up in the background.
>
> Yeah, but I suspect most users will rather have a system that actually
> boots than a system that would have booted very quickly if it had booted
> at all.
>

I think most people can live with having an initrd, so I wouldn't paint quite
such a bleak picture.

> Btw, how can I wait for the scanning to complete from early userspace?
>

Monitor /proc/partitions or /sys until the device you need for rootfs shows up.


The big problem from my point of view is that I do not want to start promising
people a feature we might not be able to support in the future, and perhaps not
even now with some hardware.

Is there some reason you cannot use an initrd or is it just the inconvenience?


Rgds
Pierre


Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2007-05-11 07:44:44

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Thu, 10 May 2007 10:51:50 -0700
"Matt Reimer" <[email protected]> wrote:

> On 5/10/07, Haavard Skinnemoen <[email protected]> wrote:
> > Btw, how can I wait for the scanning to complete from early userspace?
>
> A workaround is to pass the kernel argument rootdelay=5 or something
> like that (the value specifies seconds).

Yeah, I know. It's just that this is still a workaround and I wanted to
fix the real issue. But if the real fix is "use initrd", I guess I'll
just have to look into that...

Haavard

2007-05-11 08:39:27

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Thu, 10 May 2007 21:41:22 +0200
Pierre Ossman <[email protected]> wrote:

> You're assuming two things:
>
> 1. The bus the host controller reside on is synchronous both in adding devices
> and drivers. This is true for most buses, but not all. As we can see, the MMC
> bus is an example of one that isn't.
>
> 2. The bus the host controller reside on is scanned before any sync function we add.
>
> The second probably isn't much of a stretch, but the first is more likely to
> break. E.g. an usb based controller would not satisfy 1 as usb is scanned
> asynchronously. Platform devices are case by case (e.g. some device might be
> delayed since it needs time to power up).

You're right about my assumptions. Are there any existing drivers that
break them? Are there even any usb-based controllers around? I though
most usb-mmc controllers used the USB Mass Storage class and thus
don't use the mmc subsystem at all.

> >> Indeed. But it is not by design that they are scheduled before late_initcall().
> >> Also, flushing the workqueue is also not a by-design way of completing a scan,
> >> it just happens to be that way right now.
> >
> > So how is it supposed to work "by design"?
> >
>
> It isn't. The MMC bus is a pesky bugger in that it is slow and involves a lot of
> sleeping and asynchronous callbacks. As such, synchronisation needs to be very
> explicit using for example completions.

I see. The flush_workqueue approach might end up waiting for other
things than just scanning, is that the problem? Would it be better to
add a per-host "inital scan complete" completion that we could wait on
instead?

> >>> And I never realized that using a block device as a parameter to the
> >>> root= parameter could possibly be "unofficial" functionality...?
> >> Then you've learned something new. ;)
> >
> > Guess so. It seems like the prepare_namespace stuff is getting less
> > useful every day.
> >
>
> Probably. But I'd like to hope we're gaining more than we lose. There are some
> horror stories from the scsi camp where synchronous scanning means it takes an
> hour to boot a machine.

Yeah, but I don't understand why we keep breaking it without providing
any real alternative (apart from distro-specific initrd scripts.)

Also, with scsi you can do the scanning in parallel, but wait for all
of them to complete before mounting the root filesystem. This is what I
want the mmc subsystem to do as well.

> >> Only some devices work that way (generally only those with "simple" interfaces).
> >> And most things are these days behind more advanced buses, more akin to a network.
> >
> > It's funny that NFS root does not have these kinds of problems then...
> >
>
> I'm not familiar with that code, but I would guess it has a whole bunch of
> prerequisite conditions. And the nfs root installations I've seen all use
> initrd, so I would assume there are some restrictions to letting the kernel
> handle it.

I've never used initrd with nfs root. I suspect most distributions use
initrd so that they can build a generic, modular kernel and just load
the modules they need from initrd.

And nfs root often uses DHCP to obtain an IP address, it has to do
portmap lookups, etc, so it definitely qualifies as a pesky,
asynchronous bugger. But it still manages to be synchronous enough to
be able to mount root without depending on initrd or rootdelay.

> >> Generally, not waiting for a lot of hardware is a good thing as it speeds up
> >> boot times. In my mind, the proper way is having a script in an initrd that
> >> waits for just the parts of the hardware that this particular system needs.
> >> Everything else can be brought up in the background.
> >
> > Yeah, but I suspect most users will rather have a system that actually
> > boots than a system that would have booted very quickly if it had booted
> > at all.
> >
>
> I think most people can live with having an initrd, so I wouldn't paint quite
> such a bleak picture.

I'm not sure how many embedded people actually know how to build an
initrd for a custom board.

> > Btw, how can I wait for the scanning to complete from early userspace?
> >
>
> Monitor /proc/partitions or /sys until the device you need for rootfs shows up.

Simple enough, I suppose. But it still adds complexity to the system
which used to be unnecessary.

> The big problem from my point of view is that I do not want to start promising
> people a feature we might not be able to support in the future, and perhaps not
> even now with some hardware.

The problem from _my_ point of view is that this feature already worked
until 2.6.20, so the promise has already been made.

> Is there some reason you cannot use an initrd or is it just the inconvenience?

Well, it's mostly inconvenience I suppose. But I'm also worried about
increasing the boot time of systems that used to boot in 4-5 seconds.
One second extra spent doing decompression, polling files in sysfs,
etc. could have a huge impact, relatively speaking.

Now, I'm not really opposed to the whole concept of doing filesystem
mounting from early userspace -- I think that might be a quite elegant
approach. But a) the prepare_namespace code is still in the kernel, so
it ought to work, b) klibc has not yet been merged so each user is on
his own, and c) Repeatedly reading sysfs files to see when the root
device is available is one thing that I don't think is very elegant.

But if you don't want this issue fixed (i.e. you don't think of it as
an issue) I guess I have to either start working on yet another initrd
setup or just apply the patch to our vendor kernel and be done with it.
The latter certainly is the most tempting, but I suppose the former is
more like how things are supposed to be done in the future.

Haavard

2007-05-13 13:36:00

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

Haavard Skinnemoen wrote:
>
> You're right about my assumptions. Are there any existing drivers that
> break them? Are there even any usb-based controllers around? I though
> most usb-mmc controllers used the USB Mass Storage class and thus
> don't use the mmc subsystem at all.
>

Yes, but they might show up in the future. My point was that we know of
scenarios that will break this, so it won't be a universal solution (even though
it might work right now).

>
> I see. The flush_workqueue approach might end up waiting for other
> things than just scanning, is that the problem? Would it be better to
> add a per-host "inital scan complete" completion that we could wait on
> instead?
>

That would be a cleaner solution yes. That way we don't exploit any current
behaviour that might change in the future.

>
> I'm not sure how many embedded people actually know how to build an
> initrd for a custom board.
>

All the ones I have on my desk right now use initrd. ;)

>
> But if you don't want this issue fixed (i.e. you don't think of it as
> an issue) I guess I have to either start working on yet another initrd
> setup or just apply the patch to our vendor kernel and be done with it.
> The latter certainly is the most tempting, but I suppose the former is
> more like how things are supposed to be done in the future.
>

Of course I see it as an issue. My concern is if we gain more than we lose.

I had a chat with David Woodhouse and Segher Boessenkool and I think we have
another approach. Basically, we move the waiting which would normally go into
the initrd and move it into the kernel. So you get something like:

"Waiting for root device /dev/mmcblk0p1..."

The only problem here is if the device never shows up, but if that was the case
you would previously get a panic, so it should not be any worse.

Does that sound like something that would work for you? From my point of view
it's a much cleaner solution that has the benefit of not being tied into a
certain subsystem (i.e. this would "fix" usb aswell).

Rgds
Pierre


Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2007-05-13 13:47:56

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Sun, 13 May 2007 15:34:51 +0200
Pierre Ossman <[email protected]> wrote:

> > I'm not sure how many embedded people actually know how to build an
> > initrd for a custom board.
> >
>
> All the ones I have on my desk right now use initrd. ;)

Ok, I guess we'll just have to look into that, then.

> I had a chat with David Woodhouse and Segher Boessenkool and I think we have
> another approach. Basically, we move the waiting which would normally go into
> the initrd and move it into the kernel. So you get something like:
>
> "Waiting for root device /dev/mmcblk0p1..."
>
> The only problem here is if the device never shows up, but if that was the case
> you would previously get a panic, so it should not be any worse.

Indeed, it will even be pretty obvious what happened, and the user can
probably just insert a card at that point to get a working system.

> Does that sound like something that would work for you? From my point of view
> it's a much cleaner solution that has the benefit of not being tied into a
> certain subsystem (i.e. this would "fix" usb aswell).

Yes, that would definitely work.

Haavard

2007-05-13 14:25:42

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

Haavard Skinnemoen wrote:
> On Sun, 13 May 2007 15:34:51 +0200
> Pierre Ossman <[email protected]> wrote:
>
>> Does that sound like something that would work for you? From my point of view
>> it's a much cleaner solution that has the benefit of not being tied into a
>> certain subsystem (i.e. this would "fix" usb aswell).
>
> Yes, that would definitely work.
>

I'll commence the kernel whipping. Will you be available for testing once I have
something working?

Rgds
Pierre



Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2007-05-13 14:38:13

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: [PATCH] MMC: Flush mmc workqueue late in the boot sequence

On Sun, 13 May 2007 16:24:55 +0200
Pierre Ossman <[email protected]> wrote:

> Haavard Skinnemoen wrote:
> > On Sun, 13 May 2007 15:34:51 +0200
> > Pierre Ossman <[email protected]> wrote:
> >
> >> Does that sound like something that would work for you? From my point of view
> >> it's a much cleaner solution that has the benefit of not being tied into a
> >> certain subsystem (i.e. this would "fix" usb aswell).
> >
> > Yes, that would definitely work.
> >
>
> I'll commence the kernel whipping. Will you be available for testing once I have
> something working?

Sure. I'll see if I can find a few more testers over at the avrfreaks
forums as well.

Haavard