2009-06-04 18:01:00

by Stefan Bader

Subject: [PATCH] mmc: prevent dangling block device from accessing stale queues

Kernel: 2.6.30-rc7 based
Worked in 2.6.28 (probably only because things went at a different speed)

Testcase: Use ext3/ext4 on an SD card partitioned with one primary DOS partition
and leave it mounted across suspend/resume.

Result: After resume the partition table of the SD card has been erased.

The detailed description can be found at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/383668

In essence the mmc block device frees the generic request queue before the last
user of the gendisk has stopped using it, leaving an invalid queue pointer which
unfortunately gets re-used before more requests come in for the old device.

The bugfix will cause more I/O error messages and might not be the ultimate way
things should work, but it prevents data from getting lost.
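
A minimal userspace sketch of the behaviour described above, assuming the
approach stated in the patch comment (the queue is shut down at removal and
rejects new requests, but its storage is kept until the gendisk is released).
This is not the patch itself: every `defer_*` name is hypothetical and only
models the idea, not the kernel's request queue or gendisk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's request_queue / gendisk. */
struct defer_queue {
    bool dead;                  /* set once the device has been removed */
};

struct defer_disk {
    struct defer_queue *queue;  /* pointer the disk keeps using */
    int users;                  /* e.g. a filesystem that is still mounted */
};

/* Device removal: stop accepting requests, but keep the queue storage. */
void defer_remove_device(struct defer_disk *disk)
{
    disk->queue->dead = true;
}

/* A request arriving via the disk fails cleanly once the queue is dead. */
int defer_submit(struct defer_disk *disk)
{
    if (disk->queue->dead)
        return -EIO;
    return 0;
}

/* Drop one user; the queue is freed only when the last user is gone. */
bool defer_put_disk(struct defer_disk *disk)
{
    if (--disk->users > 0)
        return false;
    free(disk->queue);
    disk->queue = NULL;
    return true;
}

/* Walks through the sequence; returns 0 if every step behaves as expected. */
int defer_demo(void)
{
    struct defer_queue *q = calloc(1, sizeof(*q));
    struct defer_disk disk = { .queue = q, .users = 1 };

    if (!q)
        return -ENOMEM;
    if (defer_submit(&disk) != 0)       /* normal operation */
        return 1;
    defer_remove_device(&disk);         /* suspend: device goes away */
    if (defer_submit(&disk) != -EIO)    /* late write gets an I/O error */
        return 2;
    if (!defer_put_disk(&disk))         /* unmount: last user leaves */
        return 3;
    return 0;
}
```

The late request ends in an I/O error instead of reaching another device,
which matches the trade-off mentioned above: more error messages, no lost data.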

Stefan


Attachments:
0001-UBUNTU-Upstream-mmc-prevent-dangling-block-device-fr.patch (2.46 kB)

2009-06-04 18:29:18

by Pierre Ossman

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

On Thu, 04 Jun 2009 20:00:52 +0200
Stefan Bader wrote:

> Kernel: 2.6.30-rc7 based
> Worked in 2.6.28 (probably only because things went at a different speed)
>
> Testcase: Use ext3/ext4 on a SD card partitioned with one primary DOS partition
> and leave it mounted while suspend/resume.
>
> Result: After resume the partition table of the SD card has been erased.
>
> The detailed description can be found at:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/383668
>
> In essence the mmc block device frees the generic request queue before the last
> user of the gendisk has stopped using it leaving an invalid queue pointer which
> unfortunately gets re-used before more requests come in for the old device.
>
> The bugfix will cause more I/O error messages and might not be the ultimate way
> things should work, but it prevents data from getting lost.
>

You seem to have dug a bit further than I've had time for. Do you have
anything substantial to back this up:

> + /*
> + * Calling blk_cleanup_queue() would be too soon here. As long as
> + * the gendisk has a reference to it and is not released we should
> + * keep the queue. It has been shutdown and will not accept any new
> + * requests, so that should be safe.
> + */

?

It would seem that gendisk is making some bad assumptions and needs to
be changed if that is the case.

This part from the launchpad report also seems incredibly broken:

> What makes the whole thing a disaster is the fact that the block device queue objects are taken from a slub cache. Which means on resume, the newly created block device will get the same queue object as the old one, initializes it and
> after the tasks have been resumed, ext3 feels obliged to write out the invalidated superblocks (still not sure why it goes for sector 0) which will happily migrate to the new block device and cause confusion.

Jens, comments?

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.



2009-06-04 19:00:51

by Stefan Bader

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

Pierre Ossman wrote:
> On Thu, 04 Jun 2009 20:00:52 +0200
> Stefan Bader wrote:
>
>> Kernel: 2.6.30-rc7 based
>> Worked in 2.6.28 (probably only because things went at a different speed)
>>
>> Testcase: Use ext3/ext4 on a SD card partitioned with one primary DOS partition
>> and leave it mounted while suspend/resume.
>>
>> Result: After resume the partition table of the SD card has been erased.
>>
>> The detailed description can be found at:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/383668
>>
>> In essence the mmc block device frees the generic request queue before the last
>> user of the gendisk has stopped using it leaving an invalid queue pointer which
>> unfortunately gets re-used before more requests come in for the old device.
>>
>> The bugfix will cause more I/O error messages and might not be the ultimate way
>> things should work, but it prevents data from getting lost.
>>
>
> You seem to have dug a bit further than I've had time for. Do you have
> anything substantial to back this up:
>
>> + /*
>> + * Calling blk_cleanup_queue() would be too soon here. As long as
>> + * the gendisk has a reference to it and is not released we should
>> + * keep the queue. It has been shutdown and will not accept any new
>> + * requests, so that should be safe.
>> + */
>

This is mostly based on the debug output. But it seems hard to get around it
without having a way to increment the refcount of the queue. It is probably not
the most common use case to remove a device while it is mounted.
Hm, not sure this is what you wanted to know... On the launchpad report there
are logs which I took with lots of printk's enabled. This shows that after
resume the queue receives a request from mmcblk0 (which no longer exists) but
uses the same pointer as mmcblk1 which was just created.

>
> It would seem that gendisk is making some bad assumptions and needs to
> be changed if that is the case.

I think the setup and release of the gendisk would need to have access to
blk_queue_get and blk_queue_put. When the gendisk is created and the queue
pointer is stored, it should take a reference, and when the object is finally
released, the reference to the queue would get dropped.
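
A rough userspace sketch of that get/put idea, assuming a plain integer
reference count. The `ref_*` names are hypothetical and merely mirror the
blk_queue_get/blk_queue_put naming used above; they are not the block layer
API, and real code would need atomic counting.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct ref_queue {
    int refcount;               /* not atomic: single-threaded sketch only */
};

struct ref_disk {
    struct ref_queue *queue;
};

static int ref_queues_freed;    /* counts frees so the demo can check them */

/* Allocate a queue; the initial reference belongs to the driver. */
struct ref_queue *ref_queue_alloc(void)
{
    struct ref_queue *q = calloc(1, sizeof(*q));

    if (q)
        q->refcount = 1;
    return q;
}

struct ref_queue *ref_queue_get(struct ref_queue *q)
{
    q->refcount++;
    return q;
}

/* Drop a reference; the storage goes away with the last one. */
void ref_queue_put(struct ref_queue *q)
{
    if (--q->refcount == 0) {
        ref_queues_freed++;
        free(q);
    }
}

/* Disk setup stores the pointer and takes its own reference. */
void ref_disk_setup(struct ref_disk *d, struct ref_queue *q)
{
    d->queue = ref_queue_get(q);
}

/* Final disk release drops that reference again. */
void ref_disk_release(struct ref_disk *d)
{
    ref_queue_put(d->queue);
    d->queue = NULL;
}

/* Returns 0 if the queue outlives the driver's put and dies with the disk. */
int ref_demo(void)
{
    struct ref_disk disk;
    struct ref_queue *q = ref_queue_alloc();

    ref_queues_freed = 0;
    if (!q)
        return -1;
    ref_disk_setup(&disk, q);
    ref_queue_put(q);                   /* driver removes the device */
    if (ref_queues_freed != 0)          /* disk still pins the queue */
        return 1;
    if (disk.queue->refcount != 1)
        return 2;
    ref_disk_release(&disk);            /* last user of the disk is gone */
    if (ref_queues_freed != 1)
        return 3;
    return 0;
}
```

With the disk holding its own reference, the driver dropping its reference at
removal no longer frees storage that the disk still points at.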

> This part from the launchpad report also seems incredibly broken:
>
>> What makes the whole thing a disaster is the fact that the block device queue objects are taken from a slub cache. Which means on resume, the newly created block device will get the same queue object as the old one, initializes it and
>> after the tasks have been resumed, ext3 feels obliged to write out the invalidated superblocks (still not sure why it goes for sector 0) which will happily migrate to the new block device and cause confusion.

I don't think that part is that much broken. It is more an unfortunate result of
the previous events. Maybe the part of ext3 writing to sector 0 is a bit
worrying as I would only expect it to update the mount information which I think
is somewhere around sector 10.

> Jens, comments?
>
> Rgds


--

When all other means of communication fail, try words!

2009-06-04 19:15:23

by Matt Fleming

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

On Thu, Jun 04, 2009 at 09:00:42PM +0200, Stefan Bader wrote:
>
> Hm, not sure this is what you wanted to know... On the launchpad report
> there are logs which I took with lots of printk's enabled. This shows that
> after resume the queue receives a request from mmcblk0 (which no longer
> exists) but uses the same pointer as mmcblk1 which was just created.
>

Maybe I'm missing something, but why is the device instance being
destroyed during a suspend? E.g why do you have mmcblk0 before suspend and
mmcblk1 after suspend?

2009-06-04 19:21:18

by Pierre Ossman

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

On Thu, 04 Jun 2009 21:00:42 +0200
Stefan Bader wrote:

> Pierre Ossman wrote:
> >
> > You seem to have dug a bit further than I've had time for. Do you have
> > anything substantial to back this up:
> >
> >> + /*
> >> + * Calling blk_cleanup_queue() would be too soon here. As long as
> >> + * the gendisk has a reference to it and is not released we should
> >> + * keep the queue. It has been shutdown and will not accept any new
> >> + * requests, so that should be safe.
> >> + */
> >
>
> This is mostly based on the debug output. But it seems hard to get around it
> without having a way to increment the refcount of the queue. It is probably not
> the most common use case to remove a device while it is mounted.
> Hm, not sure this is what you wanted to know... On the launchpad report there
> are logs which I took with lots of printk's enabled. This shows that after
> resume the queue receives a request from mmcblk0 (which no longer exists) but
> uses the same pointer as mmcblk1 which was just created.
>

I was hoping you had dug around in the block layer and had some idea
why gendisk requires someone else to keep the queue around for it. Is
it just a simple case of a missing reference, or is there some
architectural problem?

> > This part from the launchpad report also seems incredibly broken:
> >
> >> What makes the whole thing a disaster is the fact that the block device queue objects are taken from a slub cache. Which means on resume, the newly created block device will get the same queue object as the old one, initializes it and
> >> after the tasks have been resumed, ext3 feels obliged to write out the invalidated superblocks (still not sure why it goes for sector 0) which will happily migrate to the new block device and cause confusion.
>
> I don't think that part is that much broken. It is more an unfortunate result of
> the previous events. Maybe the part of ext3 writing to sector 0 is a bit
> worrying as I would only expect it to update the mount information which I think
> is somewhere around sector 10.
>

The incredibly broken part is how requests for the old queue wind up on
the new queue. Such a thing should never be possible.

Rgds
--
-- Pierre Ossman


2009-06-04 19:22:59

by Pierre Ossman

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

On Thu, 4 Jun 2009 20:15:13 +0100
Matt Fleming wrote:

>
> Maybe I'm missing something, but why is the device instance being
> destroyed during a suspend? E.g why do you have mmcblk0 before suspend and
> mmcblk1 after suspend?

Because the card gets powered down during suspend and we have no way of
detecting what has happened to it when we come back up. USB does the
same thing (although it has slightly more intelligent hardware which
can keep track of removal as long as the host has some power).

Rgds
--
-- Pierre Ossman


2009-06-04 19:23:41

by Stefan Bader

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

Matt Fleming wrote:
> On Thu, Jun 04, 2009 at 09:00:42PM +0200, Stefan Bader wrote:
>> Hm, not sure this is what you wanted to know... On the launchpad report
>> there are logs which I took with lots of printk's enabled. This shows that
>> after resume the queue receives a request from mmcblk0 (which no longer
>> exists) but uses the same pointer as mmcblk1 which was just created.
>>
>
> Maybe I'm missing something, but why is the device instance being
> destroyed during a suspend? E.g why do you have mmcblk0 before suspend and
> mmcblk1 after suspend?

That is the way mmcblock works (without unsafe resume set) in conjunction with
(probably) slow userspace. On suspend the block device is removed, but the
mount is cleaned up by userspace (in that case hald) doing a forced unmount. The
timing seems to be such that part of the unmount is only done on the way back up.

--

When all other means of communication fail, try words!

2009-06-04 19:37:27

by Stefan Bader

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

Pierre Ossman wrote:
> On Thu, 04 Jun 2009 21:00:42 +0200
> Stefan Bader wrote:
>
>> Pierre Ossman wrote:
>>> You seem to have dug a bit further than I've had time for. Do you have
>>> anything substantial to back this up:
>>>
>>>> + /*
>>>> + * Calling blk_cleanup_queue() would be too soon here. As long as
>>>> + * the gendisk has a reference to it and is not released we should
>>>> + * keep the queue. It has been shutdown and will not accept any new
>>>> + * requests, so that should be safe.
>>>> + */
>> This is mostly based on the debug output. But it seems hard to get around it
>> without having a way to increment the refcount of the queue. It is probably not
>> the most common use case to remove a device while it is mounted.
>> Hm, not sure this is what you wanted to know... On the launchpad report there
>> are logs which I took with lots of printk's enabled. This shows that after
>> resume the queue receives a request from mmcblk0 (which no longer exists) but
>> uses the same pointer as mmcblk1 which was just created.
>>
>
> I was hoping you had dug around in the block layer and had some idea
> why gendisk requires someone else to keep the queue around for it. Is
> it just a simple case of a missing reference, or is there some
> architectural problem?
>

You could say architectural. The driver gets a queue object and the pointer to
that gets stored in the gendisk object. This is used in generic_make_request() to
get the queue for a bdev. The reference to the bdev (this is a bit of a guess) is
kept by the filesystem.
The mmc block device will not release the disk reference before the last user
is gone (again the fs). Another approach would have been to set the queue
pointer to NULL after the queue has been released. But there is no locking
around getting the pointer, so that seemed dangerous as well.

>>> This part from the launchpad report also seems incredibly broken:
>>>
>>>> What makes the whole thing a disaster is the fact that the block device queue objects are taken from a slub cache. Which means on resume, the newly created block device will get the same queue object as the old one, initializes it and
>>>> after the tasks have been resumed, ext3 feels obliged to write out the invalidated superblocks (still not sure why it goes for sector 0) which will happily migrate to the new block device and cause confusion.
>> I don't think that part is that much broken. It is more an unfortunate result of
>> the previous events. Maybe the part of ext3 writing to sector 0 is a bit
>> worrying as I would only expect it to update the mount information which I think
>> is somewhere around sector 10.
>>
>
> The incredibly broken part is how requests for the old queue wind up on
> the new queue. Such a thing should never be possible.
>

That is only possible as the queue object is created from a cache. The old queue
has been released and the new one re-uses that storage. This would be ok, but
now the pointer in the old gendisk in fact points at the new device's queue.
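
A small userspace sketch of that aliasing, assuming a one-slot object cache
that hands the same storage back on the next allocation. All `pool_*` names
are hypothetical. The slot lives in static storage so that reading through the
stale pointer is well defined in this model; with a real slab cache it would
be a use-after-free.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct pool_queue {
    char owner[16];             /* which device initialised this queue */
};

struct pool_disk {
    const char *name;
    struct pool_queue *queue;
};

/* One-slot cache: a freed object is handed out again at the same address. */
static struct pool_queue pool_slot;
static bool pool_slot_used;

struct pool_queue *pool_alloc(const char *owner)
{
    if (pool_slot_used)
        return NULL;
    pool_slot_used = true;
    memset(&pool_slot, 0, sizeof(pool_slot));
    strncpy(pool_slot.owner, owner, sizeof(pool_slot.owner) - 1);
    return &pool_slot;
}

void pool_free(struct pool_queue *q)
{
    (void)q;                    /* only one slot, so just mark it free */
    pool_slot_used = false;
}

/*
 * Returns 1 if a request issued through the old disk's stale pointer
 * would land on the queue now owned by the new disk.
 */
int pool_alias_demo(void)
{
    struct pool_disk old_disk = { "mmcblk0", pool_alloc("mmcblk0") };
    struct pool_disk new_disk;
    int aliased;

    /* Suspend: the queue is freed, but old_disk.queue is left dangling. */
    pool_free(old_disk.queue);

    /* Resume: the new device gets the very same storage back. */
    new_disk.name = "mmcblk1";
    new_disk.queue = pool_alloc("mmcblk1");

    aliased = old_disk.queue == new_disk.queue &&
              strcmp(old_disk.queue->owner, "mmcblk1") == 0;

    pool_free(new_disk.queue);
    return aliased;
}
```

Nothing in the old disk changes between suspend and resume, yet its pointer
ends up referring to a live queue that belongs to a different device.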

I think (but I have not debugged much in that direction) that I saw bad
pointer dereferences on just ejecting the mounted SD card, which was probably
caused by the same issue. In that case the pointer is simply invalid, because
no new device has been created to be hit.


> Rgds


--

When all other means of communication fail, try words!

2009-06-10 21:03:01

by Pavel Machek

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

On Thu 2009-06-04 20:00:52, Stefan Bader wrote:
> Kernel: 2.6.30-rc7 based
> Worked in 2.6.28 (probably only because things went at a different speed)
>
> Testcase: Use ext3/ext4 on a SD card partitioned with one primary DOS
> partition and leave it mounted while suspend/resume.
>
> Result: After resume the partition table of the SD card has been erased.
>
> The detailed description can be found at:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/383668
>
> In essence the mmc block device frees the generic request queue before
> the last user of the gendisk has stopped using it leaving an invalid
> queue pointer which unfortunately gets re-used before more requests come
> in for the old device.
>
> The bugfix will cause more I/O error messages and might not be the
> ultimate way things should work, but it prevents data from getting lost.

Thanks for finding the root cause of this!
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-23 15:01:22

by Stefan Bader

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

I did not see any news related to this issue. Pierre, are you waiting
on more info from my side? Or did I miss a change somewhere else?

Stefan


> On Thu 2009-06-04 20:00:52, Stefan Bader wrote:
>> Kernel: 2.6.30-rc7 based
>> Worked in 2.6.28 (probably only because things went at a different speed)
>>
>> Testcase: Use ext3/ext4 on a SD card partitioned with one primary DOS
>> partition and leave it mounted while suspend/resume.
>>
>> Result: After resume the partition table of the SD card has been erased.
>>
>> The detailed description can be found at:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/383668
>>
>> In essence the mmc block device frees the generic request queue before
>> the last user of the gendisk has stopped using it leaving an invalid
>> queue pointer which unfortunately gets re-used before more requests come
>> in for the old device.
>>
>> The bugfix will cause more I/O error messages and might not be the
>> ultimate way things should work, but it prevents data from getting lost.
>

2009-07-01 11:09:30

by Pierre Ossman

Subject: Re: [PATCH] mmc: prevent dangling block device from accessing stale queues

On Tue, 23 Jun 2009 17:01:14 +0200
Stefan Bader wrote:

> I did not see any news related to this issue. Pierre, are you waiting
> on more info from my side? Or did I miss a change somewhere else?
>

I plan to have a closer look at this first. I believe your solution is
a workaround and doesn't solve the real issue. As such, I want to have
one more go at finding and fixing the real problem before committing
this. Your analysis should help a lot in that effort.

Rgds
--
-- Pierre Ossman
