2009-06-04 23:35:33

by Deepak Saxena

Subject: Re: [PATCH] MMC: Add 400ms to CAFE SD controller resume path

On May 28 2009, at 10:48, Pierre Ossman was caught saying:
> On Thu, 28 May 2009 01:36:29 -0700
> Andrew Morton <[email protected]> wrote:
>
> > On Fri, 22 May 2009 13:51:50 +0200 Pierre Ossman <[email protected]> wrote:
> >
> > >
> > > Reading through that report, I don't believe you properly worked around
> > > the bug. You only avoid bug 1339, but that's only mildly related.
> > >
> > > What this workaround does is to make sure that MMC_UNSAFE_RESUME
> > > actually works. But if you change cards during suspend, the VFS bug
> > > should reappear and you'll corrupt the partition table.
> >
> > What do you think the VFS did wrong here?
> >
>
> Might be the block layer as well. Somehow requests associated with an
> old block device end up on the queue of a new block device. I don't
> see how this can happen given Linux' device model, but somehow it does.

The request is not ending up in the queue of the new device. If I
recall correctly, what is happening is that userspace does an unmount on the
old device when it receives the notification that it is gone (since it is not
redetected and a new entry is created). By the time that unmount is called, the
various data structures for the device have been zeroed out but the various
operation pointers are still non-NULL, so when we go back to write the
superblock for the mounted partition, we overwrite the device's partition
table as the partition offset is now 0. The main issue here is that the kernel
is allowing I/O to a device that is technically still there but not from the
kernel's POV.
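
To illustrate the failure mode described above, here is a minimal userspace
sketch, not kernel code. Every name in it (toy_partition, toy_disk, toy_write,
toy_teardown, toy_sync_super) is hypothetical, and it assumes for simplicity
that the superblock lives at relative sector 0 of the partition. It only shows
how a zeroed start offset combined with a still-valid operation pointer
redirects the superblock write to absolute sector 0, where the partition table
lives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define DISK_SECTORS 64

/* Hypothetical stand-in for a partition descriptor; NOT a kernel structure. */
struct toy_partition {
    uint32_t start_sect;   /* first absolute sector of the partition */
    /* operation pointer; returns the absolute sector that was written */
    int (*write)(struct toy_partition *p, uint32_t rel_sect, const void *buf);
};

/* Simulated disk: absolute sector 0 plays the role of the partition table. */
static uint8_t toy_disk[DISK_SECTORS][SECTOR_SIZE];

/* Write one sector at an offset relative to the partition start. */
static int toy_write(struct toy_partition *p, uint32_t rel_sect, const void *buf)
{
    uint32_t abs_sect = p->start_sect + rel_sect;

    assert(abs_sect < DISK_SECTORS);
    memcpy(toy_disk[abs_sect], buf, SECTOR_SIZE);
    return (int)abs_sect;
}

/* Assumed teardown: data fields are zeroed, the operation pointer is left set. */
static void toy_teardown(struct toy_partition *p)
{
    p->start_sect = 0;
}

/* Late unmount: write the superblock (assumed to be at relative sector 0). */
static int toy_sync_super(struct toy_partition *p)
{
    uint8_t sb[SECTOR_SIZE];

    memset(sb, 0x53, sizeof sb);   /* arbitrary marker byte for the superblock */
    if (p->write == NULL)
        return -1;                 /* a cleared pointer would have stopped the I/O */
    return p->write(p, 0, sb);
}
```

With a partition starting at sector 32, toy_sync_super() writes to sector 32
before toy_teardown() and to sector 0 after it. In this toy model, clearing
the operation pointer along with the data (or checking a "device gone" flag)
would refuse the write instead.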

~Deepak

--
In the end, they will not say, "those were dark times," they will ask
"why were their poets silent?" - Bertold Brecht


2009-06-13 10:35:06

by Pierre Ossman

Subject: Re: [PATCH] MMC: Add 400ms to CAFE SD controller resume path

On Thu, 4 Jun 2009 23:33:49 +0000
Deepak Saxena <[email protected]> wrote:

>
> The request is not ending up in the queue of the new device. If I
> recall correctly, what is happening is that userspace does an unmount on the
> old device when it receives the notification that it is gone (since it is not
> redetected and a new entry is created). By the time that unmount is called, the
> various data structures for the device have been zeroed out but the various
> operation pointers are still non-NULL, so when we go back to write the
> superblock for the mounted partition, we overwrite the device's partition
> table as the partition offset is now 0. The main issue here is that the kernel
> is allowing I/O to a device that is technically still there but not from the
> kernel's POV.

But if the device is not there from the kernel's POV then it should be
impossible to write to it. It seems this is a kernel bug though. See
this thread:

http://marc.info/?t=124413860500006&r=1&w=2

Rgds
--
-- Pierre Ossman

WARNING: This correspondence is being monitored by the
Swedish government. Make sure your server uses encryption
for SMTP traffic and consider using PGP for end-to-end
encryption.

