Hi,
I've noticed recently that the MMC/SD block driver does not reference
count the MMC/SD host driver module that it uses via the MMC/SD core
layer. Thus, I can rmmod my host driver module while, for example, a
partition on a SD card is mounted.
Assuming this does not happen intentionally, what would be the correct
fix? Is a try_module_get(card->host->parent->driver->owner) in
mmc_blk_alloc() in drivers/mmc/card/block.c the right thing to do? If
so, I could provide a patch fixing the issue. If not, please give me
advice how to do it better :) (May be I'm missing something that has to
do with hotplugging, as the devices I'm using are non-removeable).
Or am I possibly doing something wrong in my host driver and this
problem is not present for others at all?
Enrik
Enrik Berkhan wrote:
> Hi,
>
> I've noticed recently that the MMC/SD block driver does not reference
> count the MMC/SD host driver module that it uses via the MMC/SD core
> layer. Thus, I can rmmod my host driver module while, for example, a
> partition on a SD card is mounted.
This is same as removing the card while it's in use.
David
--
David Vrabel, Senior Software Engineer, Drivers
CSR, Churchill House, Cambridge Business Park, Tel: +44 (0)1223 692562
Cowley Road, Cambridge, CB4 0WZ http://www.csr.com/
On Fri, Jan 09, 2009 at 06:20:32PM +0000, David Vrabel wrote:
> Enrik Berkhan wrote:
> > I've noticed recently that the MMC/SD block driver does not reference
> > count the MMC/SD host driver module that it uses via the MMC/SD core
> > layer. Thus, I can rmmod my host driver module while, for example, a
> > partition on a SD card is mounted.
>
> This is same as removing the card while it's in use.
I don't think so.
Being able to remove the card while it is in use is a bug of the
mechanical interface. Can't be fixed easily.
Being able to remove the module while it is in use is a bug of the
software. Can be fixed.
Enrik
Enrik Berkhan wrote:
> On Fri, Jan 09, 2009 at 06:20:32PM +0000, David Vrabel wrote:
>> Enrik Berkhan wrote:
>>> I've noticed recently that the MMC/SD block driver does not reference
>>> count the MMC/SD host driver module that it uses via the MMC/SD core
>>> layer. Thus, I can rmmod my host driver module while, for example, a
>>> partition on a SD card is mounted.
>> This is same as removing the card while it's in use.
>
> I don't think so.
>
> Being able to remove the card while it is in use is a bug of the
> mechanical interface. Can't be fixed easily.
>
> Being able to remove the module while it is in use is a bug of the
> software. Can be fixed.
Here is my implementation experience with the issue, concerning the two
FireWire driver stacks which I maintain:
In case of firewire (new FireWire stack), module removal of the
controller driver will trigger removal of all child and grandchild
devices, e.g. sd_shutdown of FireWire HDDs.
This will have the same effect as if the cable of the HDD was pulled.
I don't see a need to prevent this because whoever runs
# modprobe -r firewire-ohci
is supposed to know what he is doing.
But in case of ieee1394 (old FireWire stack), the sbp2 storage driver
calls try_module_get() on the host driver whenever it starts using a
device, and of course module_put() when it is done. I.e. modprobe -r
ohci1394 is blocked as long as the sbp2 driver is in touch with a
FireWire drive.
I added this at some point because of unclean layering of the ieee1394
stack: The high-level drivers video1394 and dv1394 do not only depend
on the ieee1394 core driver, but also directly on the ohci1394 low-level
driver. Now, if somebody runs
# modprobe -r video1394
it would also remove ohci1394 if sbp2 hadn't artificially increased
ohci1394's reference count.
*However*, note that try_module_get() and module_put() don't actually do
what you asked for: They do _not_ guarantee that the host driver gets
unbound from the MMC (or FireWire...) controller:
You can still do
# echo -n ${ID} > /sys/module/${MODULE}/drivers/${DRIVER}/unbind
So, if somebody asked me to copy my ieee1394/sbp2 safeguard into
firewire/fw-sbp2, I would reject that on the grounds that killing the
connection to the FireWire disk is the *expected result* of
# modprobe -r firewire-ohci
--
Stefan Richter
-=====-==--= ---= -=-=-
http://arcgraph.de/sr/
On Sat, 10 Jan 2009 10:49:00 +0100
Stefan Richter <[email protected]> wrote:
>
> So, if somebody asked me to copy my ieee1394/sbp2 safeguard into
> firewire/fw-sbp2, I would reject that on the grounds that killing the
> connection to the FireWire disk is the *expected result* of
> # modprobe -r firewire-ohci
I have to agree with Stefan's reasoning here. The reference counting is
about protecting kernel integrity, not about saving the user's foot.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
rdesktop, core developer http://www.rdesktop.org
WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.
Hi,
Pierre Ossman wrote:
> Stefan Richter <[email protected]> wrote:
> > So, if somebody asked me to copy my ieee1394/sbp2 safeguard into
> > firewire/fw-sbp2, I would reject that on the grounds that killing the
> > connection to the FireWire disk is the *expected result* of
> > # modprobe -r firewire-ohci
> I have to agree with Stefan's reasoning here. The reference counting is
> about protecting kernel integrity, not about saving the user's foot.
Yes, meanwhile, I have to admit you all are right and I have been
looking in the wrong direction.
My original problem was the following:
- MD raid0 made of some MMC/SD devices
- raid0 activated
- rmmod mmc_host_driver (user error, not noticing the raid is active)
- insmod mmc_host_driver (user error, still not noticing the raid is active)
- write to raid0 device
- kernel crash
I have debugged this a little and found the following reason for the
crash:
When removing the mmc_host_driver, everything seems to be fine; the
MMC/SD block device has been deactivated by mmc_blk_remove(), that in
turn has stopped the queue via mmc_cleanup_queue(). mmc_cleanup_queue()
calls blk_cleanup_queue() on the underlying struct request_queue. By
this, the reference count of the struct request_queues kboj drops to
zero. The MD driver still has the block device open and, actually,
things work fine unless the memory of the struct request_queue isn't
touched, because it is marked dead. Of course, accessing the MD device
returns EIO, but that's fine.
When the mmc_host_driver is reloaded, new struct request_queues will be
allocated and with some probability, the old memory will be re-used for
them or the old memory locations will be re-used for something else. The
key point is that the queues still in use by the MD layer will
effectively no longer be marked dead or completely corrupted.
I don't know if this is a problem of the MD layer, the MMC/SD block
driver, a more general problem or even no problem at all :)
How can this be fixed?
Enrik
Enrik Berkhan wrote:
> When removing the mmc_host_driver, everything seems to be fine; the
> MMC/SD block device has been deactivated by mmc_blk_remove(), that in
> turn has stopped the queue via mmc_cleanup_queue(). mmc_cleanup_queue()
> calls blk_cleanup_queue() on the underlying struct request_queue. By
> this, the reference count of the struct request_queues kboj drops to
> zero. The MD driver still has the block device open and, actually,
> things work fine unless the memory of the struct request_queue isn't
> touched, because it is marked dead. Of course, accessing the MD device
> returns EIO, but that's fine.
>
> When the mmc_host_driver is reloaded, new struct request_queues will be
> allocated and with some probability, the old memory will be re-used for
> them or the old memory locations will be re-used for something else. The
> key point is that the queues still in use by the MD layer will
> effectively no longer be marked dead or completely corrupted.
So in short, the request_queue's reference count goes to zero even
though something still points to it?
--
Stefan Richter
-=====-==--= ---= -==--
http://arcgraph.de/sr/
Stefan Richter wrote:
> Enrik Berkhan wrote:
> > When removing the mmc_host_driver, everything seems to be fine; the
> > MMC/SD block device has been deactivated by mmc_blk_remove(), that in
> > turn has stopped the queue via mmc_cleanup_queue(). mmc_cleanup_queue()
> > calls blk_cleanup_queue() on the underlying struct request_queue. By
> > this, the reference count of the struct request_queues kboj drops to
> > zero. The MD driver still has the block device open and, actually,
> > things work fine unless the memory of the struct request_queue isn't
> > touched, because it is marked dead. Of course, accessing the MD device
> > returns EIO, but that's fine.
> >
> > When the mmc_host_driver is reloaded, new struct request_queues will be
> > allocated and with some probability, the old memory will be re-used for
> > them or the old memory locations will be re-used for something else. The
> > key point is that the queues still in use by the MD layer will
> > effectively no longer be marked dead or completely corrupted.
>
> So in short, the request_queue's reference count goes to zero even
> though something still points to it?
Exactly. AFAICS.
I haven't checked yet if this happens using other block device
infrastructure, too.
Enrik