Hi Jens,
I've been trying to sort out some bugs in the MMC layer's block driver,
but my knowledge about the block layer is severely lacking. So I was
hoping you could educate me a bit. :)
Upon creation, the following happens:
alloc_disk()
spin_lock_init()
blk_init_queue()
blk_queue_*() (Set up limits)
disk->* = * (assign members)
blk_queue_hardsect_size()
set_capacity()
add_disk()
And on a clean removal, where there are no users of a card when it is
removed:
del_gendisk()
put_disk()
blk_cleanup_queue()
So far everything seems nice and peachy. The question is what to do when
a card is removed when the device is open.
In that case, del_gendisk() will be called, which seems to be documented
as blocking any new requests to be added to the queue. But there will be
a lot of outstanding requests in the queue.
Is it up to each block device to iterate and fail these or can I tell
the kernel "I'm broken, go away!"?
When the queue eventually drains (without too many oopses) and the user
calls close(), then put_disk() and blk_cleanup_queue() will be called.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Tue, Nov 14 2006, Pierre Ossman wrote:
> Hi Jens,
>
> I've been trying to sort out some bugs in the MMC layer's block driver,
> but my knowledge about the block layer is severely lacking. So I was
> hoping you could educate me a bit. :)
>
> Upon creation, the following happens:
>
> alloc_disk()
> spin_lock_init()
> blk_init_queue()
> blk_queue_*() (Set up limits)
> disk->* = * (assign members)
> blk_queue_hardsect_size()
> set_capacity()
> add_disk()
>
> And on a clean removal, where there are no users of a card when it is
> removed:
>
> del_gendisk()
> put_disk()
> blk_cleanup_queue()
>
> So far everything seems nice and peachy. The question is what to do when
> a card is removed when the device is open.
>
> In that case, del_gendisk() will be called, which seems to be documented
> as blocking any new requests to be added to the queue. But there will be
> a lot of outstanding requests in the queue.
>
> Is it up to each block device to iterate and fail these or can I tell
> the kernel "I'm broken, go away!"?
>
> When the queue eventually drains (without too many oopses) and the user
> calls close(), then put_disk() and blk_cleanup_queue() will be called.
There is no helper to kill already queued requests when a device is
removed, if you look at SCSI you'll see that it handles this "manually"
as well in the request_fn handler. So you'll need a "device dead or
gone" check in your request_fn handler, and do it from there.
--
Jens Axboe
Jens Axboe wrote:
>
> There is no helper to kill already queued requests when a device is
> removed, if you look at SCSI you'll see that it handles this "manually"
> as well in the request_fn handler. So you'll need a "device dead or
> gone" check in your request_fn handler, and do it from there.
>
>
Is there some part of the current infrastructure I can use to determine
this. If del_gendisk() grabs the queue lock (and hence is "safe" wrt the
request handler), then perhaps there is a test that can be done to test
if the disk has been deleted?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Tue, Nov 14 2006, Pierre Ossman wrote:
> Jens Axboe wrote:
> >
> > There is no helper to kill already queued requests when a device is
> > removed, if you look at SCSI you'll see that it handles this "manually"
> > as well in the request_fn handler. So you'll need a "device dead or
> > gone" check in your request_fn handler, and do it from there.
> >
> >
>
> Is there some part of the current infrastructure I can use to determine
> this. If del_gendisk() grabs the queue lock (and hence is "safe" wrt the
> request handler), then perhaps there is a test that can be done to test
> if the disk has been deleted?
SCSI just sets ->queuedata to NULL, if you store your device there you
may do just that. Or just mark your device structure appropriately,
there are no special places in the queue for that.
--
Jens Axboe
Jens Axboe wrote:
> SCSI just sets ->queuedata to NULL, if you store your device there you
> may do just that. Or just mark your device structure appropriately,
> there are no special places in the queue for that.
>
>
I've had another look at it, and I believe I have a solution. There is
one assumption I need to verify though.
After del_gendisk() and after I've flushed out any remaining requests,
is it ok to kill off the queue? Someone might still have the disk open,
so that would mean the queue is gone by the time gendisk's release
function is called.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Tue, Nov 14 2006, Pierre Ossman wrote:
> Jens Axboe wrote:
> > SCSI just sets ->queuedata to NULL, if you store your device there you
> > may do just that. Or just mark your device structure appropriately,
> > there are no special places in the queue for that.
> >
> >
>
> I've had another look at it, and I believe I have a solution. There is
> one assumption I need to verify though.
>
> After del_gendisk() and after I've flushed out any remaining requests,
> is it ok to kill off the queue? Someone might still have the disk open,
> so that would mean the queue is gone by the time gendisk's release
> function is called.
What do you mean by "killing off the queue"? As long as the queue can be
gotten at, it needs to remain valid. That is what the references are
for.
--
Jens Axboe
On Tue, Nov 14, 2006 at 09:54:58AM +0100, Pierre Ossman wrote:
> I've had another look at it, and I believe I have a solution. There is
> one assumption I need to verify though.
>
> After del_gendisk() and after I've flushed out any remaining requests,
> is it ok to kill off the queue? Someone might still have the disk open,
> so that would mean the queue is gone by the time gendisk's release
> function is called.
Just arrange for the mmc_queue_thread() to empty the queue when
MMC_QUEUE_EXIT is set, and then exit. I thought this was something
that the block layer looked after (Jens must have missed this in his
original review of the MMC code.)
The handling of userspace keeping the device open despite the hardware
having been removed is already in place.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
Russell King wrote:
> Just arrange for the mmc_queue_thread() to empty the queue when
> MMC_QUEUE_EXIT is set, and then exit. I thought this was something
> that the block layer looked after (Jens must have missed this in his
> original review of the MMC code.)
>
mmc_queue_thread() will empty the thread when MMC_QUEUE_EXIT is set. The
problem is that we do not set that bit until the last person closes the
device. In order to avoid problems we need to empty the queue before
mmc_blk_remove() exits (after which the card structure is no longer valid).
> The handling of userspace keeping the device open despite the hardware
> having been removed is already in place.
>
>
Ok, that's one less problem for me to worry about. :)
Jens Axboe wrote:
> What do you mean by "killing off the queue"? As long as the queue can be
> gotten at, it needs to remain valid. That is what the references are
> for.
>
I do:
del_gendisk();
(wait for queue to become empty, i.e. elv_next_request() == NULL)
blk_cleanup_queue();
and then assume that the request function will no longer be called for
this queue.
Suggested patch:
diff --git a/drivers/mmc/mmc_block.c b/drivers/mmc/mmc_block.c
index f9027c8..5025abe 100644
--- a/drivers/mmc/mmc_block.c
+++ b/drivers/mmc/mmc_block.c
@@ -83,7 +83,6 @@ static void mmc_blk_put(struct mmc_blk_d
md->usage--;
if (md->usage == 0) {
put_disk(md->disk);
- mmc_cleanup_queue(&md->queue);
kfree(md);
}
mutex_unlock(&open_lock);
@@ -553,12 +552,11 @@ static void mmc_blk_remove(struct mmc_ca
if (md) {
int devidx;
+ /* Stop new requests from getting into the queue */
del_gendisk(md->disk);
- /*
- * I think this is needed.
- */
- md->disk->queue = NULL;
+ /* Then flush out any already in there */
+ mmc_cleanup_queue(&md->queue);
devidx = md->disk->first_minor >> MMC_SHIFT;
__clear_bit(devidx, dev_use);
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Tue, Nov 14 2006, Pierre Ossman wrote:
> Russell King wrote:
> > Just arrange for the mmc_queue_thread() to empty the queue when
> > MMC_QUEUE_EXIT is set, and then exit. I thought this was something
> > that the block layer looked after (Jens must have missed this in his
> > original review of the MMC code.)
> >
>
> mmc_queue_thread() will empty the thread when MMC_QUEUE_EXIT is set. The
> problem is that we do not set that bit until the last person closes the
> device. In order to avoid problems we need to empty the queue before
> mmc_blk_remove() exits (after which the card structure is no longer valid).
>
> > The handling of userspace keeping the device open despite the hardware
> > having been removed is already in place.
> >
> >
>
> Ok, that's one less problem for me to worry about. :)
>
> Jens Axboe wrote:
> > What do you mean by "killing off the queue"? As long as the queue can be
> > gotten at, it needs to remain valid. That is what the references are
> > for.
> >
>
> I do:
>
> del_gendisk();
> (wait for queue to become empty, i.e. elv_next_request() == NULL)
> blk_cleanup_queue();
elv_next_request() returning NULL means nothing wrt the queue being
empty.
> and then assume that the request function will no longer be called for
> this queue.
>
> Suggested patch:
I think you are making this way too complicated, it's actually pretty
simple: you call blk_put_queue() or blk_cleanup_queue() (same thing)
when _you_ drop your reference to the queue. That's just normal cleanup.
When a device goes away, you make sure that you know about this. I said
that SCSI clears q->queuedata, so it knows that when ->request_fn is
invoked with a NULL q->queuedata (where it stores the device pointer),
the device is not there and the request should just be flushed to
heaven.
Don't make any assumptions about when request_fn will be called or not.
That's bound to be racy anyway.
--
Jens Axboe
On Tue, Nov 14, 2006 at 12:33:47PM +0100, Pierre Ossman wrote:
> Jens Axboe wrote:
> > What do you mean by "killing off the queue"? As long as the queue can be
> > gotten at, it needs to remain valid. That is what the references are
> > for.
> >
>
> I do:
>
> del_gendisk();
> (wait for queue to become empty, i.e. elv_next_request() == NULL)
> blk_cleanup_queue();
>
> and then assume that the request function will no longer be called for
> this queue.
>
> Suggested patch:
>
> diff --git a/drivers/mmc/mmc_block.c b/drivers/mmc/mmc_block.c
> index f9027c8..5025abe 100644
> --- a/drivers/mmc/mmc_block.c
> +++ b/drivers/mmc/mmc_block.c
> @@ -83,7 +83,6 @@ static void mmc_blk_put(struct mmc_blk_d
> md->usage--;
> if (md->usage == 0) {
> put_disk(md->disk);
> - mmc_cleanup_queue(&md->queue);
> kfree(md);
> }
> mutex_unlock(&open_lock);
> @@ -553,12 +552,11 @@ static void mmc_blk_remove(struct mmc_ca
> if (md) {
> int devidx;
>
> + /* Stop new requests from getting into the queue */
> del_gendisk(md->disk);
>
> - /*
> - * I think this is needed.
> - */
> - md->disk->queue = NULL;
> + /* Then flush out any already in there */
> + mmc_cleanup_queue(&md->queue);
>
> devidx = md->disk->first_minor >> MMC_SHIFT;
> __clear_bit(devidx, dev_use);
You now have a disk which may be in use (and a block device) for which
there is no queue. I couldn't see any locking what so ever in del_gendisk
so it's quite possible that the queue could still be referenced while
the disk is still open.
I also don't think del_gendisk() is sufficient to ensure that no new
requests appear in the queue, which is why I'm setting md->disk->queue
to NULL there.
But shug, I don't know the block layer. Jens is your best bet.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
Jens Axboe wrote:
> I think you are making this way too complicated, it's actually pretty
> simple: you call blk_put_queue() or blk_cleanup_queue() (same thing)
> when _you_ drop your reference to the queue. That's just normal cleanup.
> When a device goes away, you make sure that you know about this. I said
> that SCSI clears q->queuedata, so it knows that when ->request_fn is
> invoked with a NULL q->queuedata (where it stores the device pointer),
> the device is not there and the request should just be flushed to
> heaven.
>
What about the gendisk object? Since I assigned the queue pointer to it,
it didn't naturally get a chance to increase the reference count. When
can I safely drop my reference without the gendisk getting upset?
> Don't make any assumptions about when request_fn will be called or not.
> That's bound to be racy anyway.
>
>
Things get a bit muddy by the fact that the mmc layer has a thread that
handles the queue. So I guess we need to have a way to shut down that
thread, but still be able to throw away any stray requests from the
block layer?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
How about this patch? It is basically the same as the previous, but it
also sets queuedata to NULL and tests for that. It does not address if
someone still has dependencies on the queue but hasn't gotten itself a
reference (as I haven't gotten any word on if that is a problem):
diff --git a/drivers/mmc/mmc_block.c b/drivers/mmc/mmc_block.c
index f9027c8..5025abe 100644
--- a/drivers/mmc/mmc_block.c
+++ b/drivers/mmc/mmc_block.c
@@ -83,7 +83,6 @@ static void mmc_blk_put(struct mmc_blk_d
md->usage--;
if (md->usage == 0) {
put_disk(md->disk);
- mmc_cleanup_queue(&md->queue);
kfree(md);
}
mutex_unlock(&open_lock);
@@ -553,12 +552,11 @@ static void mmc_blk_remove(struct mmc_ca
if (md) {
int devidx;
+ /* Stop new requests from getting into the queue */
del_gendisk(md->disk);
- /*
- * I think this is needed.
- */
- md->disk->queue = NULL;
+ /* Then flush out any already in there */
+ mmc_cleanup_queue(&md->queue);
devidx = md->disk->first_minor >> MMC_SHIFT;
__clear_bit(devidx, dev_use);
diff --git a/drivers/mmc/mmc_queue.c b/drivers/mmc/mmc_queue.c
index 4ccdd82..b6769e2 100644
--- a/drivers/mmc/mmc_queue.c
+++ b/drivers/mmc/mmc_queue.c
@@ -111,6 +111,19 @@ static int mmc_queue_thread(void *d)
static void mmc_request(request_queue_t *q)
{
struct mmc_queue *mq = q->queuedata;
+ struct request *req;
+ int ret;
+
+ if (!mq) {
+ printk(KERN_ERR "MMC: killing requests for dead queue\n");
+ while ((req = elv_next_request(q)) != NULL) {
+ do {
+ ret = end_that_request_chunk(req, 0,
+ req->current_nr_sectors << 9);
+ } while (ret);
+ }
+ return;
+ }
if (!mq->req)
wake_up(&mq->thread_wq);
@@ -179,6 +192,15 @@ EXPORT_SYMBOL(mmc_init_queue);
void mmc_cleanup_queue(struct mmc_queue *mq)
{
+ request_queue_t *q = mq->queue;
+ unsigned long flags;
+
+ /* Mark that we should start throwing out stragglers */
+ spin_lock_irqsave(q->queue_lock, flags);
+ q->queuedata = NULL;
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ /* Then terminate our worker thread */
mq->flags |= MMC_QUEUE_EXIT;
wake_up(&mq->thread_wq);
wait_for_completion(&mq->thread_complete);
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> How about this patch? It is basically the same as the previous, but it
> also sets queuedata to NULL and tests for that. It does not address if
> someone still has dependencies on the queue but hasn't gotten itself a
> reference (as I haven't gotten any word on if that is a problem):
Ok, I've done some more digging through the block layer. A lot of voodoo
in there, but gendisk seems to make sure it has its own reference to the
queue.
As gendisk drops it reference to the queue in del_gendisk(), I have to
assume that is would be invalid for any part of kernel to look at
disk->queue at this point (since it might now have been released).
So I think this patch should cover all bases. It cleanly kills of our
extra thread and also handles any requests that might be left behind.
I hope you can look at this soon as I'm eager to get rid of this oops.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Tue, Nov 14, 2006 at 03:34:25PM +0100, Pierre Ossman wrote:
> How about this patch? It is basically the same as the previous, but it
> also sets queuedata to NULL and tests for that. It does not address if
> someone still has dependencies on the queue but hasn't gotten itself a
> reference (as I haven't gotten any word on if that is a problem):
>
> diff --git a/drivers/mmc/mmc_block.c b/drivers/mmc/mmc_block.c
> index f9027c8..5025abe 100644
> --- a/drivers/mmc/mmc_block.c
> +++ b/drivers/mmc/mmc_block.c
> @@ -83,7 +83,6 @@ static void mmc_blk_put(struct mmc_blk_d
> md->usage--;
> if (md->usage == 0) {
> put_disk(md->disk);
> - mmc_cleanup_queue(&md->queue);
> kfree(md);
> }
> mutex_unlock(&open_lock);
> @@ -553,12 +552,11 @@ static void mmc_blk_remove(struct mmc_ca
> if (md) {
> int devidx;
>
> + /* Stop new requests from getting into the queue */
> del_gendisk(md->disk);
I'm not entirely convinced that del_gendisk() will prevent any new
requests getting into the queue, which is why the comment below
says:
>
> - /*
> - * I think this is needed.
> - */
> - md->disk->queue = NULL;
Yes, del_gendisk() removes the disk from view for new opens, but what
about existing users?
As the code currently stands, when I remove a mounted MMC card, I get:
generic_make_request: Trying to access nonexistent block-device mmcblk0p1 (26)
FAT: Directory bread(block 26) failed
generic_make_request: Trying to access nonexistent block-device mmcblk0p1 (27)
FAT: Directory bread(block 27) failed
... etc ...
and this comes from:
q = bdev_get_queue(bio->bi_bdev);
if (!q) {
printk(KERN_ERR ...
in block/ll_rw_blk.c. bdev_get_queue merely does:
return bdev->bd_disk->queue;
> + /* Then flush out any already in there */
> + mmc_cleanup_queue(&md->queue);
Now, the thing is calling mmc_cleanup_queue() will shutdown the thread
and wait for it to complete its business, free the scatterlist, and
call blk_cleanup_queue() on that queue.
Ergo, once we've shut down the thread, we should not get any further
MMC requests, so:
> diff --git a/drivers/mmc/mmc_queue.c b/drivers/mmc/mmc_queue.c
> index 4ccdd82..b6769e2 100644
> --- a/drivers/mmc/mmc_queue.c
> +++ b/drivers/mmc/mmc_queue.c
> @@ -111,6 +111,19 @@ static int mmc_queue_thread(void *d)
> static void mmc_request(request_queue_t *q)
> {
> struct mmc_queue *mq = q->queuedata;
> + struct request *req;
> + int ret;
> +
> + if (!mq) {
> + printk(KERN_ERR "MMC: killing requests for dead queue\n");
> + while ((req = elv_next_request(q)) != NULL) {
> + do {
> + ret = end_that_request_chunk(req, 0,
> + req->current_nr_sectors << 9);
> + } while (ret);
> + }
> + return;
> + }
>
> if (!mq->req)
> wake_up(&mq->thread_wq);
> @@ -179,6 +192,15 @@ EXPORT_SYMBOL(mmc_init_queue);
>
> void mmc_cleanup_queue(struct mmc_queue *mq)
> {
> + request_queue_t *q = mq->queue;
> + unsigned long flags;
> +
> + /* Mark that we should start throwing out stragglers */
> + spin_lock_irqsave(q->queue_lock, flags);
> + q->queuedata = NULL;
> + spin_unlock_irqrestore(q->queue_lock, flags);
> +
> + /* Then terminate our worker thread */
> mq->flags |= MMC_QUEUE_EXIT;
> wake_up(&mq->thread_wq);
> wait_for_completion(&mq->thread_complete);
These hunks do not achieve us anything.
However, what we _do_ need to do is to arrange for the MMC queue thread
to error out all pending requests before dying if MMC_QUEUE_EXIT is set.
That's already handled since the queue thread only ever exits if there
are no requests pending _and_ MMC_QUEUE_EXIT has been set.
So I think all you really need is this change:
diff --git a/drivers/mmc/mmc_block.c b/drivers/mmc/mmc_block.c
index f9027c8..9173067 100644
--- a/drivers/mmc/mmc_block.c
+++ b/drivers/mmc/mmc_block.c
@@ -83,7 +83,6 @@ static void mmc_blk_put(struct mmc_blk_d
md->usage--;
if (md->usage == 0) {
put_disk(md->disk);
- mmc_cleanup_queue(&md->queue);
kfree(md);
}
mutex_unlock(&open_lock);
@@ -559,6 +558,7 @@ static void mmc_blk_remove(struct mmc_ca
* I think this is needed.
*/
md->disk->queue = NULL;
+ mmc_cleanup_queue(&md->queue);
devidx = md->disk->first_minor >> MMC_SHIFT;
__clear_bit(devidx, dev_use);
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
Russell King wrote:
> I'm not entirely convinced that del_gendisk() will prevent any new
> requests getting into the queue, which is why the comment below
> says:
>
>
Jens wasn't either, so my second patch also modifies the request
function to drop requests.
>>
>> - /*
>> - * I think this is needed.
>> - */
>> - md->disk->queue = NULL;
>>
>
> Yes, del_gendisk() removes the disk from view for new opens, but what
> about existing users?
>
I find it a bit disconcerting to modify that structure without any locks
held. Once we've given the queue to gendisk, it should do its own
reference handling. If that assign to NULL is needed then IMHO,
something is broken elsewhere and should be fixed.
> These hunks do not achieve us anything.
>
> However, what we _do_ need to do is to arrange for the MMC queue thread
> to error out all pending requests before dying if MMC_QUEUE_EXIT is set.
> That's already handled since the queue thread only ever exits if there
> are no requests pending _and_ MMC_QUEUE_EXIT has been set.
>
>
Well, Jens seemed to suggest that the proper way was not to try and
prevent everyone from putting new stuff into the queue, but to start
failing requests. Hence my changes.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org