2009-01-09 17:15:25

by Enrik Berkhan

[permalink] [raw]
Subject: Reference counting of MMC host driver modules

Hi,

I've noticed recently that the MMC/SD block driver does not reference
count the MMC/SD host driver module that it uses via the MMC/SD core
layer. Thus, I can rmmod my host driver module while, for example, a
partition on a SD card is mounted.

Assuming this does not happen intentionally, what would be the correct
fix? Is a try_module_get(card->host->parent->driver->owner) in
mmc_blk_alloc() in drivers/mmc/card/block.c the right thing to do? If
so, I could provide a patch fixing the issue. If not, please give me
advice how to do it better :) (May be I'm missing something that has to
do with hotplugging, as the devices I'm using are non-removeable).

Or am I possibly doing something wrong in my host driver and this
problem is not present for others at all?

Enrik


2009-01-09 18:21:16

by David Vrabel

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

Enrik Berkhan wrote:
> Hi,
>
> I've noticed recently that the MMC/SD block driver does not reference
> count the MMC/SD host driver module that it uses via the MMC/SD core
> layer. Thus, I can rmmod my host driver module while, for example, a
> partition on a SD card is mounted.

This is same as removing the card while it's in use.

David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/

2009-01-09 20:02:25

by Enrik Berkhan

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

On Fri, Jan 09, 2009 at 06:20:32PM +0000, David Vrabel wrote:
> Enrik Berkhan wrote:
> > I've noticed recently that the MMC/SD block driver does not reference
> > count the MMC/SD host driver module that it uses via the MMC/SD core
> > layer. Thus, I can rmmod my host driver module while, for example, a
> > partition on a SD card is mounted.
>
> This is same as removing the card while it's in use.

I don't think so.

Being able to remove the card while it is in use is a bug of the
mechanical interface. Can't be fixed easily.

Being able to remove the module while it is in use is a bug of the
software. Can be fixed.

Enrik

2009-01-10 09:49:28

by Stefan Richter

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

Enrik Berkhan wrote:
> On Fri, Jan 09, 2009 at 06:20:32PM +0000, David Vrabel wrote:
>> Enrik Berkhan wrote:
>>> I've noticed recently that the MMC/SD block driver does not reference
>>> count the MMC/SD host driver module that it uses via the MMC/SD core
>>> layer. Thus, I can rmmod my host driver module while, for example, a
>>> partition on a SD card is mounted.
>> This is same as removing the card while it's in use.
>
> I don't think so.
>
> Being able to remove the card while it is in use is a bug of the
> mechanical interface. Can't be fixed easily.
>
> Being able to remove the module while it is in use is a bug of the
> software. Can be fixed.

Here is my implementation experience with the issue, concerning the two
FireWire driver stacks which I maintain:

In case of firewire (new FireWire stack), module removal of the
controller driver will trigger removal of all child and grandchild
devices, e.g. sd_shutdown of FireWire HDDs.

This will have the same effect as if the cable of the HDD was pulled.
I don't see a need to prevent this because whoever runs
# modprobe -r firewire-ohci
is supposed to know what he is doing.

But in case of ieee1394 (old FireWire stack), the sbp2 storage driver
calls try_module_get() on the host driver whenever it starts using a
device, and of course module_put() when it is done. I.e. modprobe -r
ohci1394 is blocked as long as the sbp2 driver is in touch with a
FireWire drive.

I added this at some point because of unclean layering of the ieee1394
stack: The high-level drivers video1394 and dv1394 do not only depend
on the ieee1394 core driver, but also directly on the ohci1394 low-level
driver. Now, if somebody runs
# modprobe -r video1394
it would also remove ohci1394 if sbp2 hadn't artificially increased
ohci1394's reference count.

*However*, note that try_module_get() and module_put() don't actually do
what you asked for: They do _not_ guarantee that the host driver gets
unbound from the MMC (or FireWire...) controller:
You can still do
# echo -n ${ID} > /sys/module/${MODULE}/drivers/${DRIVER}/unbind

So, if somebody asked me to copy my ieee1394/sbp2 safeguard into
firewire/fw-sbp2, I would reject that on the grounds that killing the
connection to the FireWire disk is the *expected result* of
# modprobe -r firewire-ohci
--
Stefan Richter
-=====-==--= ---= -=-=-
http://arcgraph.de/sr/

2009-01-11 09:46:43

by Pierre Ossman

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

On Sat, 10 Jan 2009 10:49:00 +0100
Stefan Richter <[email protected]> wrote:

>
> So, if somebody asked me to copy my ieee1394/sbp2 safeguard into
> firewire/fw-sbp2, I would reject that on the grounds that killing the
> connection to the FireWire disk is the *expected result* of
> # modprobe -r firewire-ohci

I have to agree with Stefan's reasoning here. The reference counting is
about protecting kernel integrity, not about saving the user's foot.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

2009-01-12 16:32:18

by Enrik Berkhan

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

Hi,

Pierre Ossman wrote:
> Stefan Richter <[email protected]> wrote:
> > So, if somebody asked me to copy my ieee1394/sbp2 safeguard into
> > firewire/fw-sbp2, I would reject that on the grounds that killing the
> > connection to the FireWire disk is the *expected result* of
> > # modprobe -r firewire-ohci
> I have to agree with Stefan's reasoning here. The reference counting is
> about protecting kernel integrity, not about saving the user's foot.

Yes, meanwhile, I have to admit you all are right and I have been
looking in the wrong direction.

My original problem was the following:

- MD raid0 made of some MMC/SD devices
- raid0 activated
- rmmod mmc_host_driver (user error, not noticing the raid is active)
- insmod mmc_host_driver (user error, still not noticing the raid is active)
- write to raid0 device
- kernel crash

I have debugged this a little and found the following reason for the
crash:

When removing the mmc_host_driver, everything seems to be fine; the
MMC/SD block device has been deactivated by mmc_blk_remove(), that in
turn has stopped the queue via mmc_cleanup_queue(). mmc_cleanup_queue()
calls blk_cleanup_queue() on the underlying struct request_queue. By
this, the reference count of the struct request_queues kboj drops to
zero. The MD driver still has the block device open and, actually,
things work fine unless the memory of the struct request_queue isn't
touched, because it is marked dead. Of course, accessing the MD device
returns EIO, but that's fine.

When the mmc_host_driver is reloaded, new struct request_queues will be
allocated and with some probability, the old memory will be re-used for
them or the old memory locations will be re-used for something else. The
key point is that the queues still in use by the MD layer will
effectively no longer be marked dead or completely corrupted.

I don't know if this is a problem of the MD layer, the MMC/SD block
driver, a more general problem or even no problem at all :)

How can this be fixed?

Enrik

2009-01-12 17:10:44

by Stefan Richter

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

Enrik Berkhan wrote:
> When removing the mmc_host_driver, everything seems to be fine; the
> MMC/SD block device has been deactivated by mmc_blk_remove(), that in
> turn has stopped the queue via mmc_cleanup_queue(). mmc_cleanup_queue()
> calls blk_cleanup_queue() on the underlying struct request_queue. By
> this, the reference count of the struct request_queues kboj drops to
> zero. The MD driver still has the block device open and, actually,
> things work fine unless the memory of the struct request_queue isn't
> touched, because it is marked dead. Of course, accessing the MD device
> returns EIO, but that's fine.
>
> When the mmc_host_driver is reloaded, new struct request_queues will be
> allocated and with some probability, the old memory will be re-used for
> them or the old memory locations will be re-used for something else. The
> key point is that the queues still in use by the MD layer will
> effectively no longer be marked dead or completely corrupted.

So in short, the request_queue's reference count goes to zero even
though something still points to it?
--
Stefan Richter
-=====-==--= ---= -==--
http://arcgraph.de/sr/

2009-01-12 20:02:25

by Enrik Berkhan

[permalink] [raw]
Subject: Re: Reference counting of MMC host driver modules

Stefan Richter wrote:
> Enrik Berkhan wrote:
> > When removing the mmc_host_driver, everything seems to be fine; the
> > MMC/SD block device has been deactivated by mmc_blk_remove(), that in
> > turn has stopped the queue via mmc_cleanup_queue(). mmc_cleanup_queue()
> > calls blk_cleanup_queue() on the underlying struct request_queue. By
> > this, the reference count of the struct request_queues kboj drops to
> > zero. The MD driver still has the block device open and, actually,
> > things work fine unless the memory of the struct request_queue isn't
> > touched, because it is marked dead. Of course, accessing the MD device
> > returns EIO, but that's fine.
> >
> > When the mmc_host_driver is reloaded, new struct request_queues will be
> > allocated and with some probability, the old memory will be re-used for
> > them or the old memory locations will be re-used for something else. The
> > key point is that the queues still in use by the MD layer will
> > effectively no longer be marked dead or completely corrupted.
>
> So in short, the request_queue's reference count goes to zero even
> though something still points to it?

Exactly. AFAICS.

I haven't checked yet if this happens using other block device
infrastructure, too.

Enrik