2006-02-26 22:50:53

by col-pepper

[permalink] [raw]
Subject: o_sync in vfat driver

Hi,

OMG what do I have to do to post here? 10th attempt.
{part2}

Here is a non-exhaustive list of typical device types requiring FAT/vfat
support:

fd ide-hd scsi-hd usb-hd cdrom usb-handheld (iPod, iRiver etc)
usb-flash (usbsticks, cameras, some music devices.)

IIRC the sync mount option for vfat is ignored for file systems >2G; this
effectively (and probably intentionally) excludes nearly all hd partitions
and iPod type devices.

sync has no meaning for CD/DVD media.


2006-02-27 13:28:51

by Lennart Sorensen

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected] wrote:
> Hi,
>
> OMG what do I have to do to post here? 10th attempt.
> {part2}
>
> Here is a non-exhaustive list of typical devices types requiring fat vfat
> support:
>
> fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod, iRiver etc)
> usb-flash (usbsticks, cameras, some music devices.)
>
> IIRC the sync mount option for vfat is ignored for file systems >2G, this
> effectively (and probably intentionally) excludes nearly all hd partitions
> and iPod type devices.

I think many people wish it was ignored on smaller devices too given
what it does to write performance. And if your device is flash based
and is one of the ones that doesn't have proper wear leveling, the card
won't last long with sync enabled (even with wear leveling, rewriting the
FAT as often as sync seems to do can't be good for the lifespan of the
flash).

I suspect either vfat should ignore sync all the time, or it should at
least warn about its use so distributions don't think enabling it on all
removable media is a good idea in general. Or perhaps the vfat driver
could be made to wait for a file to be closed, or at least have some
timeout, before updating the FAT again. Not sure.
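
To make the difference between the two strategies concrete, here is a toy
model (not the vfat driver's actual logic) that counts FAT updates for one
file. It assumes, as the thread suggests, that sync updates the FAT once per
cluster written, and that the deferred variant updates it once at close. The
function name and the 4096-byte cluster size are illustrative assumptions.

```python
def fat_updates(file_bytes, cluster_bytes, update_on_every_cluster):
    """Count FAT updates needed to write one file in this toy model."""
    # Ceiling division: number of clusters the file occupies.
    clusters = -(-file_bytes // cluster_bytes)
    if update_on_every_cluster:
        # Assumed sync-like behaviour: one FAT update per cluster written.
        return clusters
    # Assumed deferred behaviour: a single FAT update when the file is closed.
    return min(clusters, 1)


one_megabyte = 1_000_000
print(fat_updates(one_megabyte, 4096, update_on_every_cluster=True))
print(fat_updates(one_megabyte, 4096, update_on_every_cluster=False))
```

Under these assumptions the per-cluster strategy touches the FAT hundreds of
times for a 1 MB file, while the deferred one touches it once.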

Len Sorensen

2006-02-27 13:50:33

by Arjan van de Ven

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
> On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected] wrote:
> > Hi,
> >
> > OMG what do I have to do to post here? 10th attempt.
> > {part2}
> >
> > Here is a non-exhaustive list of typical devices types requiring fat vfat
> > support:
> >
> > fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod, iRiver etc)
> > usb-flash (usbsticks, cameras, some music devices.)
> >
> > IIRC the sync mount option for vfat is ignored for file systems >2G, this
> > effectively (and probably intentionally) excludes nearly all hd partitions
> > and iPod type devices.
>
> I think many people wish it was ignored on smaller devices too given
> what it does to write performance.

well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!



> And if your device is flash based
> and is one of the ones that doesn't have proper wear leveling the card
> won't last long with sync enabled (even with wear leveling rewriting the
> fat that often as sync seems to do can't be good for the lifespan of the
> flash).

patient> doctor doctor it hurts when I do this
doctor> Then don't do that





2006-02-27 14:06:28

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 2006-02-27 at 14:50 +0100, Arjan van de Ven wrote:
> On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
> > On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected] wrote:
> > > Hi,
> > >
> > > OMG what do I have to do to post here? 10th attempt.
> > > {part2}
> > >
> > > Here is a non-exhaustive list of typical devices types requiring fat vfat
> > > support:
> > >
> > > fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod, iRiver etc)
> > > usb-flash (usbsticks, cameras, some music devices.)
> > >
> > > IIRC the sync mount option for vfat is ignored for file systems >2G, this
> > > effectively (and probably intentionally) excludes nearly all hd partitions
> > > and iPod type devices.
> >
> > I think many people wish it was ignored on smaller devices too given
> > what it does to write performance.
>
> well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!

That is easy to say when you are using the command line... Modern
distros (as you know I am sure) mount all hot-plug devices like usb
keys, usb hard disks, etc automatically at plug-in time and at least
some distros use "-o sync" for everything so you don't get (too much)
data loss when the user unplugs a device and so a umount to unplug the
device does not take ages...

Being someone who maintains a distribution based on one of the big
distributions I can tell you that figuring out how to change that
default behaviour is not always pretty. Usually involves hacking files
deep in the bowels of the hotplug framework on the system.
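
For anyone wanting to check whether their automounter did this, a small
sketch that scans text in the /proc/mounts format (device, mount point,
filesystem type, comma-separated options) for vfat mounts carrying the sync
option. The function name and the sample device paths are made up for
illustration.

```python
def sync_vfat_mounts(mounts_text):
    """Return mount points of vfat filesystems mounted with 'sync'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Split on commas so that 'dirsync' or 'async' do not match 'sync'.
        if fstype == "vfat" and "sync" in options.split(","):
            hits.append(mount_point)
    return hits


# Hypothetical sample; on a real system pass open("/proc/mounts").read().
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb1 /media/usbdisk vfat rw,sync,nosuid,nodev 0 0
/dev/sdc1 /media/camera vfat rw,nosuid,nodev 0 0
"""

print(sync_vfat_mounts(SAMPLE))
```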

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2006-02-27 14:26:26

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: o_sync in vfat driver


On Mon, 27 Feb 2006, Lennart Sorensen wrote:

> On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected] wrote:
>> Hi,
>>
>> OMG what do I have to do to post here? 10th attempt.
>> {part2}
>>
>> Here is a non-exhaustive list of typical devices types requiring fat vfat
>> support:
>>
>> fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod, iRiver etc)
>> usb-flash (usbsticks, cameras, some music devices.)
>>
>> IIRC the sync mount option for vfat is ignored for file systems >2G, this
>> effectively (and probably intentionally) excludes nearly all hd partitions
>> and iPod type devices.
>
> I think many people wish it was ignored on smaller devices too given
> what it does to write performance. And if your device is flash based
> and is one of the ones that doesn't have proper wear leveling the card
> won't last long with sync enabled (even with wear leveling rewriting the
> fat that often as sync seems to do can't be good for the lifespan of the
> flash).
>
> I suspect either vfat should ignore sync all the time, or it should at
> least warn about its use so distributions don't think enabling it on all
> removeable media is a good idea in general. Or perhaps the vfat driver
> could be made to wait for a file to be closed or at least have some
> timeout before updating the fat table again. Not sure.
>
> Len Sorensen

I really don't think one needs to worry about this! The flash-file-
system designers know how to minimize wear and spread the wear
throughout the device. It's not up to the file-systems to be
concerned whatsoever! The filesystems need to concern themselves
with the proper implementation of their structural details, nothing
else. Any special device considerations do not belong in the
file-system code. If there are any special device considerations,
they need to be in the device driver, nowhere else.

BTW, even the drivers can't effectively compensate for any
potential wear because they don't know where the physical
write will occur. The physical sectors (pages) of many of
these devices are 64k. All of the access, both read and
write, is buffered in read/write static RAM. It's only
when the disk emulator of the FlashRAM decides that the
static RAM needs to be flushed to flash, that the write
actually occurs. Typically, a LRU 64k page is erased
and re-written. Then a table is updated to reference the
new correct block. This is all transparent, and it
needs to be, because erasing a 64k block takes nearly
a second! Without the buffering, write performance
would be unacceptable.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.53 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2006-02-27 14:27:21

by Arjan van de Ven

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 2006-02-27 at 14:06 +0000, Anton Altaparmakov wrote:
> On Mon, 2006-02-27 at 14:50 +0100, Arjan van de Ven wrote:
> > On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
> > > On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected] wrote:
> > > > Hi,
> > > >
> > > > OMG what do I have to do to post here? 10th attempt.
> > > > {part2}
> > > >
> > > > Here is a non-exhaustive list of typical devices types requiring fat vfat
> > > > support:
> > > >
> > > > fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod, iRiver etc)
> > > > usb-flash (usbsticks, cameras, some music devices.)
> > > >
> > > > IIRC the sync mount option for vfat is ignored for file systems >2G, this
> > > > effectively (and probably intentionally) excludes nearly all hd partitions
> > > > and iPod type devices.
> > >
> > > I think many people wish it was ignored on smaller devices too given
> > > what it does to write performance.
> >
> > well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!
>
> That is easy to say when you are using the command line... Modern
> distros (as you know I am sure) mount all hot-plug devices like usb
> keys, usb hard disks, etc automatically at plug-in time and at least
> some distros use "-o sync"

that is a bad misdesign of that distro or at least the tool the distro
uses for this (I don't know which it is so I can say that without
sounding partial :)

the tool that decides to use "sync", or at least the author thereof,
should be aware of what flash is, and that it has a limited lifespan etc
etc, and that you thus want maximum caching etc.



2006-02-27 14:42:03

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 2006-02-27 at 15:27 +0100, Arjan van de Ven wrote:
> On Mon, 2006-02-27 at 14:06 +0000, Anton Altaparmakov wrote:
> > On Mon, 2006-02-27 at 14:50 +0100, Arjan van de Ven wrote:
> > > On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
> > > > On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected] wrote:
> > > > > Hi,
> > > > >
> > > > > OMG what do I have to do to post here? 10th attempt.
> > > > > {part2}
> > > > >
> > > > > Here is a non-exhaustive list of typical devices types requiring fat vfat
> > > > > support:
> > > > >
> > > > > fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod, iRiver etc)
> > > > > usb-flash (usbsticks, cameras, some music devices.)
> > > > >
> > > > > IIRC the sync mount option for vfat is ignored for file systems >2G, this
> > > > > effectively (and probably intentionally) excludes nearly all hd partitions
> > > > > and iPod type devices.
> > > >
> > > > I think many people wish it was ignored on smaller devices too given
> > > > what it does to write performance.
> > >
> > > well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!
> >
> > That is easy to say when you are using the command line... Modern
> > distros (as you know I am sure) mount all hot-plug devices like usb
> > keys, usb hard disks, etc automatically at plug-in time and at least
> > some distros use "-o sync"
>
> that is a bad misdesign of that distro or at least the tool the distro
> uses for this (I don't know which it is so I can say that without
> sounding partial :)
>
> the tool that decides to use "sync", or at least the author thereof,
> should be aware of what flash is, and that it has a limited lifespan etc
> etc, and that you thus want maximum caching etc.

I agree completely which is why we hack the system to remove the o_sync
on our distro derivative. (-:

But my point was that your solution of "don't do that then" is not much
use to the average user who sits in front of such a distro's graphical
desktop, as they are not technical enough to find and hack their hotplug
system to work properly...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2006-02-27 18:53:39

by Jan Engelhardt

[permalink] [raw]
Subject: Re: o_sync in vfat driver

>
>I really don't think one needs to worry about this! The flash-file-
>system designers know how to minimize wear and spread the wear
>throughout the device. It's not up to the file-systems to be
>concerned whatsoever!

Yes, the filesystem designers, JFFS and such. But most people unfortunately
have to use something not-optimized-for-flash called VFAT to be able to
read it on Win32 too. I would like to use UDF instead, but Windows seems to
have a nogo with UDF on non-CDROMs.


Jan Engelhardt
--

2006-02-27 21:04:46

by col-pepper

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 27 Feb 2006 15:41:44 +0100, Anton Altaparmakov <[email protected]>
wrote:

> On Mon, 2006-02-27 at 15:27 +0100, Arjan van de Ven wrote:
>> On Mon, 2006-02-27 at 14:06 +0000, Anton Altaparmakov wrote:
>> > On Mon, 2006-02-27 at 14:50 +0100, Arjan van de Ven wrote:
>> > > On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
>> > > > On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected]
>> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > OMG what do I have to do to post here? 10th attempt.
>> > > > > {part2}
>> > > > >
>> > > > > Here is a non-exhaustive list of typical devices types
>> requiring fat vfat
>> > > > > support:
>> > > > >
>> > > > > fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod,
>> iRiver etc)
>> > > > > usb-flash (usbsticks, cameras, some music devices.)
>> > > > >
>> > > > > IIRC the sync mount option for vfat is ignored for file systems
>> >2G, this
>> > > > > effectively (and probably intentionally) excludes nearly all hd
>> partitions
>> > > > > and iPod type devices.
>> > > >
>> > > > I think many people wish it was ignored on smaller devices too
>> given
>> > > > what it does to write performance.
>> > >
>> > > well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND
>> LINE* !!!
>> >
>> > That is easy to say when you are using the command line... Modern
>> > distros (as you know I am sure) mount all hot-plug devices like usb
>> > keys, usb hard disks, etc automatically at plug-in time and at least
>> > some distros use "-o sync"
>>
>> that is a bad misdesign of that distro or at least the tool the distro
>> uses for this (I don't know which it is so I can say that without
>> sounding partial :)
>>
>> the tool that decides to use "sync", or at least the author thereof,
>> should be aware of what flash is, and that it has a limited lifespan etc
>> etc, and that you thus want maximum caching etc.
>
> I agree completely which is why we hack the system to remove the o_sync
> on our distro derivative. (-:
>
> But my point was that your solution of "don't do that then" is not much
> use to your average user who sits in front of such distro in graphical
> desktop as they are not technical enough to find and hack their hotplug
> system to work properly...
>
> Best regards,
>
> Anton

>> If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!

Yeah, clever.
That is not really a constructive response. I don't use , I do use command
line mount all the time. I never was in danger of damaging my drive with
this new "feature".

Telling a user who has just burnt out a brand new 1GB usb device he should
have RTFM and modified that HAL configuration to ensure it did not use
sync is not likely to win much confidence in the linux kernel.

The point of raising this is that the vast majority of linux users have no
awareness of this. If there is a danger of this sync implementation
damaging hardware it should be done differently.

More importantly this sync strategy is very likely _increasing_ the danger
of data loss that is the core reason for using sync in the first place.

To quote from my earlier post:

The new model attempts to be more rigorous by updating the FAT every time
a block of data is written. Thus the "hammering" of the physical memory
hosting the FAT record.

In view of the nature of flash memory this may actually be drastically
increasing the chance that the whole FAT gets erased.

If a pullout occurs during a write, there is now a near 50% chance that
this takes out the entire FAT.

Now if that analysis is inaccurate I'd like to be corrected. But flash has to
be zeroed to be written. If every second write is zeroing the FAT this
would seem much more likely to destroy the whole fs than to provide better
protection from an untimely pull-out.


[Note: I am not subscribed to LKML, if you wish me to receive any follow
ups please BCC: col-pepper at piments point com . thx]

2006-02-27 21:17:31

by Arjan van de Ven

[permalink] [raw]
Subject: Re: o_sync in vfat driver


> Telling a user who has just burnt out a brand new 1GB usb device he should
> have RTFM and modified that HAL configuration to insure it did not use
> sync it not likely to win much confidence in the linux kernel.

or in HAL. really.


there was a very long discussion about kernel stability.
The problem is that once depending on the absence of a feature becomes
ABI ... there is a big problem.


2006-02-27 21:32:13

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: o_sync in vfat driver


On Mon, 27 Feb 2006 [email protected] wrote:

> On Mon, 27 Feb 2006 15:41:44 +0100, Anton Altaparmakov <[email protected]>
> wrote:
>
>> On Mon, 2006-02-27 at 15:27 +0100, Arjan van de Ven wrote:
>>> On Mon, 2006-02-27 at 14:06 +0000, Anton Altaparmakov wrote:
>>>> On Mon, 2006-02-27 at 14:50 +0100, Arjan van de Ven wrote:
>>>>> On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
>>>>>> On Sun, Feb 26, 2006 at 11:50:40PM +0100, [email protected]
>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> OMG what do I have to do to post here? 10th attempt.
>>>>>>> {part2}
>>>>>>>
>>>>>>> Here is a non-exhaustive list of typical devices types
>>> requiring fat vfat
>>>>>>> support:
>>>>>>>
>>>>>>> fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod,
>>> iRiver etc)
>>>>>>> usb-flash (usbsticks, cameras, some music devices.)
>>>>>>>
>>>>>>> IIRC the sync mount option for vfat is ignored for file systems
>>>> 2G, this
>>>>>>> effectively (and probably intentionally) excludes nearly all hd
>>> partitions
>>>>>>> and iPod type devices.
>>>>>>
>>>>>> I think many people wish it was ignored on smaller devices too
>>> given
>>>>>> what it does to write performance.
>>>>>
>>>>> well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND
>>> LINE* !!!
>>>>
>>>> That is easy to say when you are using the command line... Modern
>>>> distros (as you know I am sure) mount all hot-plug devices like usb
>>>> keys, usb hard disks, etc automatically at plug-in time and at least
>>>> some distros use "-o sync"
>>>
>>> that is a bad misdesign of that distro or at least the tool the distro
>>> uses for this (I don't know which it is so I can say that without
>>> sounding partial :)
>>>
>>> the tool that decides to use "sync", or at least the author thereof,
>>> should be aware of what flash is, and that it has a limited lifespan etc
>>> etc, and that you thus want maximum caching etc.
>>
>> I agree completely which is why we hack the system to remove the o_sync
>> on our distro derivative. (-:
>>
>> But my point was that your solution of "don't do that then" is not much
>> use to your average user who sits in front of such distro in graphical
>> desktop as they are not technical enough to find and hack their hotplug
>> system to work properly...
>>
>> Best regards,
>>
>> Anton
>
>>> If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!
>
> Yeah, cleaver.
> That is not really a constructive responce. I dont use , I do use command
> line mount all the time. I never was in danger of damaging my drive with
> this new "feature".
>
> Telling a user who has just burnt out a brand new 1GB usb device he should
> have RTFM and modified that HAL configuration to insure it did not use
> sync it not likely to win much confidence in the linux kernel.
>
> The point of raising this is that the vast majority of linux users have no
> awareness of this. If there is a danger of this sync implementation
> damaging hardware it should be done differently.
>
> More importantly this sync strategy is very likely _increasing_ the danger
> of data loss that is the core reason for using sync in the first place.
>
> To quote from my earlier post:
>
> The new model attempts to be more rigourous by updating the FAT every time
> a block of data is written. Thus the "hammering" of the physical memory
> hosting the FAT record.

Nobody should care.

>
> In view of the nature of flash memory this may actually be drastically
> increasing the chance that the whole FAT gets erased.
>

Will not happen because that's not how they work.

> If a pullout occurs during write , there is now a near 50% chance that
> this takes out the entire FAT.
>

If a pullout or a power-failure occurs, you just have an incomplete
write, an old FAT entry just like ejecting a floppy during a write.

> Now if that analysis is inaccurate I'd like be corrected. But flash has to
> be zeroed to be written. If every second write is zeroing the FAT this
> would seem much more likely to destroy the whole fs than to provide better
> protection from a untimely pull-out.
>

Flash does not get zeroed to be written! It gets erased, which sets all
the bits to '1', i.e., all bytes to 0xff. Further, the designers of
flash disks are not stupid as you assume. The direct access occurs
to static RAM (read/write stuff). After a few milliseconds of it
becoming dirty, and/or when a new page needs to be accessed, the
chip erases some page that was not used yet, or was used a long
time ago and is not on the active list. Then, it becomes busy,
writes the current sector to the newly erased sector, and (after
that write occurs) replaces the entry in the table that tells the
disk implementation the logical to physical translation of that page.
In the case where a page will be changed, the new page's data is read
from the device into static RAM before access. In any case, the chip
then becomes non-busy. The power can fail at any time and you just
have the previous data instead of the new data, just like a real
disk drive, except that the sectors are large (64 k).
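
The remapping scheme described above can be sketched roughly as follows.
This is a toy model of the idea only, not any real controller's firmware:
the class name, the tiny 4-byte block size, and the fail_before_commit flag
are invented for illustration, and it assumes there are always more physical
blocks than logical blocks in use.

```python
class RemappingFlash:
    """Toy disk emulator: erase a spare block, program it, then remap."""

    def __init__(self, physical_blocks, block_size=4):
        self.block_size = block_size
        erased = bytes([0xFF]) * block_size  # erased flash reads as all 1 bits
        self.blocks = [erased for _ in range(physical_blocks)]
        self.erase_count = [0] * physical_blocks
        self.table = {}  # logical block -> physical block
        self.free = set(range(physical_blocks))

    def write(self, logical, data, fail_before_commit=False):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        # Pick the least-worn spare block (ties broken by block number).
        target = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.blocks[target] = bytes([0xFF]) * self.block_size  # erase
        self.erase_count[target] += 1
        self.blocks[target] = bytes(data)  # program the new contents
        if fail_before_commit:
            return  # simulated power loss: the table still points at old data
        # Commit: only now does the logical block refer to the new copy.
        self.free.discard(target)
        old = self.table.get(logical)
        self.table[logical] = target
        if old is not None:
            self.free.add(old)  # the old copy becomes a spare

    def read(self, logical):
        return self.blocks[self.table[logical]]


flash = RemappingFlash(physical_blocks=4)
flash.write(0, b"AAAA")
flash.write(0, b"BBBB", fail_before_commit=True)  # interrupted write
print(flash.read(0))  # the previous data survives
flash.write(0, b"CCCC")
print(flash.read(0), flash.erase_count)
```

In this model repeated writes to one logical block rotate through the spare
physical blocks, and an interrupted write leaves the old mapping in place.
Whether a given device behaves this way is exactly the point in dispute.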

You see, these are not just flash-RAM chips. They are disc drive
emulators that contain an ASIC for the bus interface and control
logic, some static RAM, and the flash RAM.

The IDE emulators, like CompactFlash, as tiny as they are, actually
have the same pin-outs as an IDE drive!!

>
> [Note: I am not subscribed to LKML, if you wish me to recieve any follow
> ups please BCC: col-pepper at piments point com . thx]
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.53 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

2006-02-27 23:22:16

by col-pepper

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 27 Feb 2006 22:17:21 +0100, Arjan van de Ven <[email protected]>
wrote:

>
>> Telling a user who has just burnt out a brand new 1GB usb device he
>> should
>> have RTFM and modified that HAL configuration to insure it did not use
>> sync it not likely to win much confidence in the linux kernel.
>
> or in HAL. really.

It may unfairly reflect on HAL in the users' mind but hal still does
exactly what it is set up to do.

>
>
> there was a very long discussion abuot kernel stability.
> The problem is that once depending on the absence of a feature becomes
> ABI ... there is a big problem.
>
>
>

It was not totally absent. If it was absent no-one would configure
anything to use it anyway. It seems the big problem was that its
functionality was fundamentally changed, but it was passed on like a minor
mod that no-one needed to worry about, and the doc was not updated at the
time.

2006-02-27 23:22:18

by col-pepper

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
<[email protected]> wrote:

> Flash does not get zeroed to be written! It gets erased, which sets all
> the bits to '1', i.e., all bytes to 0xff.

Thanks for the correction, but that does not change the discussion.

> Further, the designers of
> flash disks are not stupid as you assume. The direct access occurs
> to static RAM (read/write stuff).

I'm not assuming anything. Some hardware has been killed by this issue.
http://lkml.org/lkml/2005/5/13/144

It seems that it's you making the assumption that all of these devices are
manufactured the same way.

The constant dirtying of the buffer will still cause excessive use of the
flash block hosting the FAT. Clearly not all devices use a load spreading
mechanism and this can lead to premature failure.


2006-02-28 13:10:49

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: o_sync in vfat driver


On Mon, 27 Feb 2006 [email protected] wrote:

> On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
> <[email protected]> wrote:
>
>> Flash does not get zeroed to be written! It gets erased, which sets all
>> the bits to '1', i.e., all bytes to 0xff.
>
> Thanks for the correction, but that does not change the discussion.
>
>> Further, the designers of
>> flash disks are not stupid as you assume. The direct access occurs
>> to static RAM (read/write stuff).
>
> I'm not assuming anything . Some hardware has been killed by this issue.
> http://lkml.org/lkml/2005/5/13/144

No. That hardware was not killed by that issue. The writer, or another
who had encountered the same issue, eventually repartitioned and
reformatted the device. The partition table had gotten corrupted by
some experiments and the writer assumed that the device was broken.
It wasn't.

Also, if you read other elements in this thread, you would have
learned about something that has become somewhat of a red herring.

It takes about a second to erase a 64k physical sector. This is
a required operation before it is written. Since the projected
life of these new devices is about 5 to 10 million such cycles,
(older NAND flash used in modems was only 100-200k) the writer
would have to be running that "brand new device" for at least
5 million seconds. Let's see:

60 seconds per minute
3600 seconds per hour
86400 seconds per day.

5,000,000 / 86400 = 57 days of continuous writes to the same
sector. The writer surely would have a strange file because
he states that even a single large file can destroy the drive
if it is mounted with the "sync" option.
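
The same arithmetic as a small sketch, using only the figures quoted in
this message (one second per erase, 5 to 10 million cycles for new devices,
100-200k for older parts). The function name is made up, and the figures
themselves are the poster's claims, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_wear_out(erase_cycles, seconds_per_erase=1.0):
    """Days of back-to-back erases of one block until its cycle budget is spent."""
    return erase_cycles * seconds_per_erase / SECONDS_PER_DAY


for cycles in (100_000, 5_000_000, 10_000_000):
    print(f"{cycles:>10} cycles: {days_to_wear_out(cycles):.2f} days")
```

With these inputs, 5 million cycles come to just under 58 days, and the
older 100k parts would last a little over one day of continuous erasing.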

Also, the failure mode of NAND flash is not that it becomes
"destroyed". The failure mode is a slow loss of data. The
devices no longer retain data for a zillion years, only a
few hundred, eventually, only a year or so. So when somebody
claims that the flash has gotten destroyed, they need to have
written it for a few months, then waited for a few years before
reporting the event.

Clearly the writer is wrong.

>
> It seems that it's you making the assumption that all of these devices are
> manufactured the same way.
>
> The constant dirtying of the buffer will still cause excessive use of the
> flash block hosting the FAT. Clearly not all devices use a load spreading
> mechanism and this can lead to premature failure.
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.53 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

2006-02-28 15:18:59

by Lennart Sorensen

[permalink] [raw]
Subject: Re: o_sync in vfat driver

On Tue, Feb 28, 2006 at 08:10:44AM -0500, linux-os (Dick Johnson) wrote:
>
> On Mon, 27 Feb 2006 [email protected] wrote:
>
> > On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
> > <[email protected]> wrote:
> >
> >> Flash does not get zeroed to be written! It gets erased, which sets all
> >> the bits to '1', i.e., all bytes to 0xff.
> >
> > Thanks for the correction, but that does not change the discussion.
> >
> >> Further, the designers of
> >> flash disks are not stupid as you assume. The direct access occurs
> >> to static RAM (read/write stuff).
> >
> > I'm not assuming anything . Some hardware has been killed by this issue.
> > http://lkml.org/lkml/2005/5/13/144
>
> No. That hardware was not killed by that issue. The writer, or another
> who had encountered the same issue, eventually repartitioned and
> reformatted the device. The partition table had gotten corrupted by
> some experiments and the writer assumed that the device was broken.
> It wasn't.
>
> Also, if you read other elements in this thread, you would have
> learned about something that has become somewhat of a red herring.
>
> It takes about a second to erase a 64k physical sector. This is
> a required operation before it is written. Since the projected
> life of these new devices is about 5 to 10 million such cycles,
> (older NAND flash used in modems was only 100-200k) the writer
> would have to be running that "brand new device" for at least
> 5 million seconds. Let's see:

How come I can write to my compact flash at about 2M/s if you claim it
takes 1s to erase a 64k sector? Somehow I think your number is much too
high. Or it can do multiple erases at the same time.
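
A quick sketch of that sanity check. It assumes "2M/s" means 2 MiB per
second and "64k" means 64 KiB, and that every block written needs one erase;
the constant and function names are illustrative only.

```python
ERASE_BLOCK_BYTES = 64 * 1024          # 64k physical sector from the claim
OBSERVED_WRITE_RATE = 2 * 1024 * 1024  # assumption: "2M/s" read as 2 MiB/s


def blocks_per_second(write_rate, block_bytes):
    """How many erase blocks must be rewritten per second at this rate."""
    return write_rate / block_bytes


def max_serial_erase_seconds(write_rate, block_bytes):
    """Longest erase time that still allows this rate with one erase at a time."""
    return 1.0 / blocks_per_second(write_rate, block_bytes)


print(blocks_per_second(OBSERVED_WRITE_RATE, ERASE_BLOCK_BYTES))
print(max_serial_erase_seconds(OBSERVED_WRITE_RATE, ERASE_BLOCK_BYTES))
```

Under these assumptions the card handles 32 blocks per second, so either an
erase takes about 31 ms or less, or roughly 32 erases run in parallel.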

Also the 5 to 10 million is a lot higher than the numbers the makers of
the compact flash cards I use claim.

> 60 seconds per minute
> 3600 seconds per hour
> 86400 seconds per day.
>
> 5,000,000 / 86400 = 57 days of continuous writes to the same
> sector. The writer surely would have a strange file because
> he states that even a single large file can destroy the drive
> if it is mounted with the "sync" option.
>
> Also, the failure mode of NAND flash is not that it becomes
> "destroyed". The failure mode is a slow loss of data. The
> devices no longer retain data for a zillion years, only a
> few hundred, eventually, only a year or so. So when somebody
> claims that the flash has gotten destroyed, they need to have
> written it for a few months, then waited for a few years before
> reporting the event.

Some flash devices can be "destroyed" by losing power in the middle of
a write, since this causes them to corrupt their table of blocks and
only the manufacturer has the ability to reset that. Fortunately this
doesn't seem like too common a design.

Len Sorensen

2006-02-28 16:11:46

by Helge Hafting

Subject: Re: o_sync in vfat driver

[email protected] wrote:

> On Mon, 27 Feb 2006 15:41:44 +0100, Anton Altaparmakov
> <[email protected]> wrote:
>
>> On Mon, 2006-02-27 at 15:27 +0100, Arjan van de Ven wrote:
>>
>>> On Mon, 2006-02-27 at 14:06 +0000, Anton Altaparmakov wrote:
>>> > On Mon, 2006-02-27 at 14:50 +0100, Arjan van de Ven wrote:
>>> > > On Mon, 2006-02-27 at 08:28 -0500, Lennart Sorensen wrote:
>>> > > > On Sun, Feb 26, 2006 at 11:50:40PM +0100,
>>> [email protected] wrote:
>>> > > > > Hi,
>>> > > > >
>>> > > > > OMG what do I have to do to post here? 10th attempt.
>>> > > > > {part2}
>>> > > > >
>>> > > > > Here is a non-exhaustive list of typical devices types
>>> requiring fat vfat
>>> > > > > support:
>>> > > > >
>>> > > > > fd ide-hd scsi-hd usb-hd cdrom usb-hd usb-handheld (iPod,
>>> iRiver etc)
>>> > > > > usb-flash (usbsticks, cameras, some music devices.)
>>> > > > >
>>> > > > > IIRC the sync mount option for vfat is ignored for file
>>> systems >2G, this
>>> > > > > effectively (and probably intentionally) excludes nearly all
>>> hd partitions
>>> > > > > and iPod type devices.
>>> > > >
>>> > > > I think many people wish it was ignored on smaller devices
>>> too given
>>> > > > what it does to write performance.
>>> > >
>>> > > well. If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND
>>> LINE* !!!
>>> >
>>> > That is easy to say when you are using the command line... Modern
>>> > distros (as you know I am sure) mount all hot-plug devices like usb
>>> > keys, usb hard disks, etc automatically at plug-in time and at least
>>> > some distros use "-o sync"
>>>
>>> that is a bad misdesign of that distro or at least the tool the distro
>>> uses for this (I don't know which it is so I can say that without
>>> sounding partial :)
>>>
>>> the tool that decides to use "sync", or at least the author thereof,
>>> should be aware of what flash is, and that it has a limited lifespan
>>> etc
>>> etc, and that you thus want maximum caching etc.
>>
>>
>> I agree completely which is why we hack the system to remove the o_sync
>> on our distro derivative. (-:
>>
>> But my point was that your solution of "don't do that then" is not much
>> use to your average user who sits in front of such distro in graphical
>> desktop as they are not technical enough to find and hack their hotplug
>> system to work properly...
>>
>> Best regards,
>>
>> Anton
>
>
>>> If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!
>>
>
> Yeah, clever.
> That is not really a constructive response. I don't use , I do use
> command line mount all the time. I never was in danger of damaging my
> drive with this new "feature".
>
> Telling a user who has just burnt out a brand new 1GB usb device he
> should have RTFM and modified that HAL configuration to ensure it did
> not use sync is not likely to win much confidence in the linux kernel.

No problem in the kernel. The system is set up wrong. A simple user may
not be able to figure out his distro's hotplug setup to fix this - but
then this problem is the fault of _the distro_, not the kernel. Complain
to distributors instead.

There is no need for the kernel to treat o_sync VFAT in any special way.
The users, or more likely the distros, can skip that o_sync part.

Not all distros have such problems either. On Debian, I had to set up
/etc/fstab myself - where not specifying sync is easy enough.

>
> The point of raising this is that the vast majority of linux users
> have no awareness of this. If there is a danger of this sync
> implementation damaging hardware it should be done differently.

Which is why people are working on the "sync on close" alternative.

>
> More importantly this sync strategy is very likely _increasing_ the
> danger of data loss that is the core reason for using sync in the
> first place.
>
> To quote from my earlier post:
>
> The new model attempts to be more rigourous by updating the FAT every
> time
> a block of data is written. Thus the "hammering" of the physical memory
> hosting the FAT record.
>
> In view of the nature of flash memory this may actually be drastically
> increasing the chance that the whole FAT gets erased.
>
> If a pullout occurs during write , there is now a near 50% chance that
> this takes out the entire FAT.

No, only one FAT entry. And the users who pull out during writes _really_
get what they deserve anyway. You don't need deep linux knowledge for that.
In the day of the floppy, people respected the activity light regardless
of OS.

Helge Hafting

2006-02-28 16:16:42

by linux-os (Dick Johnson)

Subject: Re: o_sync in vfat driver


On Tue, 28 Feb 2006, Lennart Sorensen wrote:

> On Tue, Feb 28, 2006 at 08:10:44AM -0500, linux-os (Dick Johnson) wrote:
>>
>> On Mon, 27 Feb 2006 [email protected] wrote:
>>
>>> On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
>>> <[email protected]> wrote:
>>>
>>>> Flash does not get zeroed to be written! It gets erased, which sets all
>>>> the bits to '1', i.e., all bytes to 0xff.
>>>
>>> Thanks for the correction, but that does not change the discussion.
>>>
>>>> Further, the designers of
>>>> flash disks are not stupid as you assume. The direct access occurs
>>>> to static RAM (read/write stuff).
>>>
>>> I'm not assuming anything . Some hardware has been killed by this issue.
>>> http://lkml.org/lkml/2005/5/13/144
>>
>> No. That hardware was not killed by that issue. The writer, or another
>> who had encountered the same issue, eventually repartitioned and
>> reformatted the device. The partition table had gotten corrupted by
>> some experiments and the writer assumed that the device was broken.
>> It wasn't.
>>
>> Also, if you read other elements in this thread, you would have
>> learned about something that has become somewhat of a red herring.
>>
>> It takes about a second to erase a 64k physical sector. This is
>> a required operation before it is written. Since the projected
>> life of these new devices is about 5 to 10 million such cycles,
>> (older NAND flash used in modems was only 100-200k) the writer
>> would have to be running that "brand new device" for at least
>> 5 million seconds. Let's see:
>
> How come I can write to my compact flash at about 2M/s if you claim it
> takes 1s to erase a 64k sector? Somehow I think your number is much too
> high. Or it can do multiple erases at the same time.
>
> Also the 5 to 10 million is a lot higher than the numbers the makers of
> the compact flash cards I use claim.
>

Here is an instrumented erase function on a driver that rewrites
the first sector of a BIOS ROM. Unlike the Flash DISKS, the
BIOS ROM has no buffering in static RAM so you can guesstimate
the actual time to erase...

//-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
//
// This erases a page and waits for the erasure to complete. It
// returns false if it failed.
//
static int erase(void *bios, int page)
{
    int era;
    flags_t flags;
    jiffie_t ticks, start;

    spin_lock_irqsave(&info->lock, flags);
    erase_page(bios, page);
    spin_unlock_irqrestore(&info->lock, flags);

    start = jiffies;

    ticks = jiffies + (ERA_TIME * HZ);
    era = 0x00;
    while(time_before(jiffies, ticks))
    {
        if((era = check_erase(bios, page)))
            break;
        if(signal_pending(current))
            break;
        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(1);
    }
    set_current_state(TASK_RUNNING);

    printk("They don't believe... %d\n", (int) (jiffies - start));

    return era;
}

[SNIPPED...]

On the system I rewrite a BIOS sector on, jiffies is 1024 ticks/second.

parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
lp0: using parport0 (interrupt-driven).
lp0: console ready
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
device eth0 entered promiscuous mode
device eth0 left promiscuous mode
Analogic-BiosDev : Initialization complete
They don't believe... 1004

Now, the wait for erase always sleeps for at least a timer-tick
(about a millisecond) so this may take longer than the physical
erase, but not much longer.
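
For reference, converting the logged tick count to wall-clock time is a single division. A minimal sketch, assuming the stated 1024 ticks per second (the function name is hypothetical): 1004 ticks comes to about 0.98 seconds.

```c
#include <assert.h>

/* Convert a jiffies delta to seconds for a given tick rate (HZ).
 * Hypothetical helper; the kernel has its own conversion routines. */
static double ticks_to_seconds(unsigned long ticks, unsigned long hz)
{
    assert(hz > 0);
    return (double)ticks / (double)hz;
}
```

Because the polling loop sleeps one tick per iteration, the result is accurate to roughly one tick, which is just under a millisecond at this rate.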

The erase function is:

#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#
# This erases the NVRAM page (block). It doesn't wait for completion.
# Each block is 64k in length.
# M29W040B chip
#
.section .text
erase_page:
pushl %ebx
movl BUF(%esp), %ebx # Address of the chip
movl DAT(%esp), %ecx # The page
andl $0x07, %ecx # Max pages
shll $0x10, %ecx # Times 64k
movb $0xf0, (%ebx) # Reset
movb $0xaa, 0x555(%ebx)
movb $0x55, 0x2aa(%ebx)
movb $0x80, 0x555(%ebx)
movb $0xaa, 0x555(%ebx)
movb $0x55, 0x2aa(%ebx)
movb $0x30, (%ecx,%ebx)
popl %ebx
ret
.size erase_page,.-erase_page
.type erase_page,@function
.global erase_page
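
The same command sequence reads more easily in C. This is a sketch of what the assembly above does, not the driver's actual code; the function name is made up, and the comments on the command bytes reflect my reading of the AMD-style command set rather than anything stated in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start erasing one 64k block of a memory-mapped flash chip and return
 * without waiting for completion. Mirrors the assembly: the page number
 * is masked to 0..7 and shifted to a 64k offset. Hypothetical name. */
static void erase_block_start(volatile uint8_t *chip, unsigned page)
{
    uint32_t offset = (uint32_t)(page & 0x07) << 16;

    assert(chip != NULL);
    chip[0]      = 0xf0;   /* reset */
    chip[0x555]  = 0xaa;   /* first unlock pair */
    chip[0x2aa]  = 0x55;
    chip[0x555]  = 0x80;   /* erase setup (assumed meaning) */
    chip[0x555]  = 0xaa;   /* second unlock pair */
    chip[0x2aa]  = 0x55;
    chip[offset] = 0x30;   /* erase the block at this offset */
}
```

The `volatile` qualifier matters on real hardware, since the compiler must not merge or reorder the repeated stores to the same address.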


And the check-erase function is this:

#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#
# This reads the whole M29W040B page, looking for all 0xffff words.
# It returns non-zero if it has been erased and zero otherwise.
#
check_erase:
pushl %edi
movl BUF(%esp), %edi # Point to buffer
movl DAT(%esp), %eax # 64k page
andl $0x07, %eax # Max pages possible
shll $0x10, %eax # Times 64k
addl %eax, %edi # Offset to start
cld
movl $0x8000, %ecx # Number of words to check
movl $-1, %eax # What to look for
repz scasw # Look for all 0xffff
jz 1f # All erased
incl %eax # -1 becomes zero
1: popl %edi
ret
.size check_erase,.-check_erase
.type check_erase,@function
.global check_erase
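
A C rendering of the same check, again only as a sketch with a hypothetical name. It scans the 64k block byte by byte instead of in 16-bit words, and returns 1 rather than -1 for "erased"; both are non-zero, so callers that test for truth behave the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return non-zero if every byte of the selected 64k block reads 0xff,
 * zero otherwise. The page number is masked to 0..7 as in the assembly. */
static int block_is_erased(const volatile uint8_t *chip, unsigned page)
{
    const volatile uint8_t *p;
    size_t i;

    assert(chip != NULL);
    p = chip + ((size_t)(page & 0x07) << 16);
    for (i = 0; i < 0x10000; i++) {
        if (p[i] != 0xff)
            return 0;   /* found a bit that is still programmed */
    }
    return 1;
}
```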



>> 60 seconds per minute
>> 3600 seconds per hour
>> 86400 seconds per day.
>>
>> 5,000,000 / 86400 = 57 days of continuous writes to the same
>> sector. The writer surely would have a strange file because
>> he states that even a single large file can destroy the drive
>> if it is mounted with the "sync" option.
>>
>> Also, the failure mode of NAND flash is not that it becomes
>> "destroyed". The failure mode is a slow loss of data. The
>> devices no longer retain data for a zillion years, only a
>> few hundred, eventually, only a year or so. So when somebody
>> claims that the flash has gotten destroyed, they need to have
>> written it for a few months, then waited for a few years before
>> reporting the event.
>
> Some flash devices can be "destroyed" by losing power in the middle of
> a write, since this causes them to corrupt their table of blocks and
> only the manufacturer has the ability to reset that. Fortunately this
> doesn't seem like too common a design.
>

# dd if=/dev/zero of=/dev/whatever bs=1M count=128

Fixes a 128 megabyte flash disk, plug in other values for other
sizes.

> Len Sorensen

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.54 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

2006-02-28 18:09:04

by Krzysztof Halasa

Subject: Re: o_sync in vfat driver

"linux-os \(Dick Johnson\)" <[email protected]> writes:

> Here is an instrumented erase function on a driver that rewrites
> the first sector of a BIOS ROM. Unlike the Flash DISKS, the
> BIOS ROM has no buffering in static RAM so you can guesstimate
> the actual time to erase...

The NOR flash is different, but the Samsung manual for the K9F5608U0A-YCB0,
K9F5608U0A-YIB0 32M x 8 Bit NAND Flash Memory says:

FEATURES
- Page Program : (512 + 16)Byte
- Block Erase : (16K + 512)Byte
- Program time : 200us(Typ.)
- Block Erase Time : 2ms(Typ.)
- Endurance : 100K Program/Erase Cycles
- Data Retention : 10 Years
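
Plugging the typical figures above into a rough model gives a sense of scale. This sketch assumes a 16K block holds 32 pages of 512 bytes, counts only erase and program time, and ignores bus transfer and controller overhead; the function names are hypothetical. Rewriting a whole block then takes about 8.4 ms, or a little under 2 MB/s.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds to erase one block and then program every page in it. */
static uint64_t block_rewrite_us(uint64_t pages_per_block,
                                 uint64_t program_us, uint64_t erase_us)
{
    return erase_us + pages_per_block * program_us;
}

/* Sustained rate in bytes per second for back-to-back block rewrites. */
static uint64_t rewrite_bytes_per_second(uint64_t block_bytes,
                                         uint64_t rewrite_us)
{
    assert(rewrite_us > 0);
    return block_bytes * 1000000u / rewrite_us;
}
```

With these numbers the erase itself is under a quarter of the total; most of the time goes to programming the pages.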
--
Krzysztof Halasa

2006-02-28 19:04:33

by col-pepper

Subject: Re: o_sync in vfat driver

On Tue, 28 Feb 2006 14:10:44 +0100, linux-os (Dick Johnson)
<[email protected]> wrote:

> No. That hardware was not killed by that issue. The writer, or another
> who had encountered the same issue, eventually repartitioned and
> reformatted the device. The partition table had gotten corrupted by
> some experiments and the writer assumed that the device was broken.
> It wasn't.

I did not get the info you posted from that thread so maybe I missed
something you saw. Or indeed it was someone else.


Many thanks for your comments. If this is a false alert all the better.


> Also, the failure mode of NAND flash is not that it becomes
> "destroyed". The failure mode is a slow loss of data. The
> devices no longer retain data for a zillion years, only a
> few hundred, eventually, only a year or so.

There was a comment about the failure mode, no time scale was given. I see
no reason why the degradation would stop at a year though.

> Since the projected life of these new devices is about 5 to 10 million
> such cycles, (older NAND flash used in modems was only 100-200k)

Maybe some of the cheap devices are not using the new flash memory in
which case it would come down to between 24 and 48hrs of constant use.
This would be a realistic problem.
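
To put numbers on that, here is a minimal sketch of the wear-out estimate. It assumes the one-second-per-erase figure claimed earlier in the thread and continuous rewriting of a single block with no wear leveling; the function name is hypothetical. Under those assumptions 100k to 200k cycles works out to roughly 28 to 56 hours, and 5 million cycles to about 58 days.

```c
#include <assert.h>

/* Hours of continuous rewriting of one block before the rated number
 * of erase cycles is used up. Hypothetical helper for illustration. */
static double hours_to_wear_out(double cycles, double seconds_per_cycle)
{
    assert(seconds_per_cycle > 0.0);
    return cycles * seconds_per_cycle / 3600.0;
}
```

A shorter erase time shortens these estimates in proportion, so the result depends heavily on which erase figure is right.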

Alan Cox referred to some devices that could be damaged as "crap", so it
seems he is aware of some hardware differences.

In conclusion it seems from Andrew Morton's posts that the way this is
handled is under review so I am confident that a robust and stable
solution will result.

Thanks again for your thoughts on this.

2006-02-28 22:37:38

by Pavel Machek

Subject: Re: o_sync in vfat driver

Hi!

> > I agree completely which is why we hack the system to remove the o_sync
> > on our distro derivative. (-:
> >
> > But my point was that your solution of "don't do that then" is not much
> > use to your average user who sits in front of such distro in graphical
> > desktop as they are not technical enough to find and hack their hotplug
> > system to work properly...
> >
> > Best regards,
> >
> > Anton
>
> >> If you don't want it *DO NOT USE IT AT THE MOUNT COMMAND LINE* !!!
>
> Yeah, clever.
> That is not really a constructive response. I don't use , I do use command
> line mount all the time. I never was in danger of damaging my drive with
> this new "feature".
>
> Telling a user who has just burnt out a brand new 1GB usb device he should
> have RTFM and modified that HAL configuration to ensure it did not use
> sync is not likely to win much confidence in the linux kernel.

Return that 1GB usb device to manufacturer, it was broken.

> The point of raising this is that the vast majority of linux users have no
> awareness of this. If there is a danger of this sync implementation
> damaging hardware it should be done differently.

Fix the distribution, then.
Pavel
--
Web maintainer for suspend.sf.net (http://www.sf.net/projects/suspend) wanted...

2006-02-28 22:39:17

by Pavel Machek

Subject: Re: o_sync in vfat driver

On Út 28-02-06 00:21:53, [email protected] wrote:
> On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
> <[email protected]> wrote:
>
> > Flash does not get zeroed to be written! It gets erased, which sets all
> > the bits to '1', i.e., all bytes to 0xff.
>
> Thanks for the correction, but that does not change the discussion.
>
> > Further, the designers of
> > flash disks are not stupid as you assume. The direct access occurs
> > to static RAM (read/write stuff).
>
> I'm not assuming anything . Some hardware has been killed by this issue.
> http://lkml.org/lkml/2005/5/13/144

I have seen flash disk dead in 5 minutes, even without o-sync. Those
devices are often crap. (I copied tar file to flash by cat foo.tar >
/dev/sda. That was apparently enough to kill that flash. Label "Yahoo"
should have warned me).
Pavel
--
Web maintainer for suspend.sf.net (http://www.sf.net/projects/suspend) wanted...

2006-02-28 23:10:25

by Kamran Karimi

Subject: why VM_SHM has been removed from mm.h?

Hello all,

VM_SHM is used by DIPC to quickly recognise when we are dealing with a
System V IPC segment. It has been "removed" from recent kernels (set to 0).
Is there an easy way to find out if a segment is a Sys V shm? if not, I
suggest we re-activate it.

-Kamran


2006-03-01 03:02:42

by Phillip Susi

Subject: Re: why VM_SHM has been removed from mm.h?

Is there a reason that you posted this as a reply to the "o_sync in
vfat" thread?

Kamran Karimi wrote:
> Hello all,
>
> VM_SHM is used by DIPC to quickly recognise when we are dealing with a
> System V IPC segment. It has been "removed" from recent kernels (set to
> 0). Is there an easy way to find out if a segment is a Sys V shm? if
> not, I suggest we re-activate it.
>
> -Kamran
>

2006-03-01 04:28:41

by Kyle Moffett

Subject: Re: o_sync in vfat driver

On Feb 28, 2006, at 17:38:55, Pavel Machek wrote:
> I have seen flash disk dead in 5 minutes, even without o-sync.
> Those devices are often crap. (I copied tar file to flash by cat
> foo.tar > /dev/sda. That was apparently enough to kill that flash.
> Label "Yahoo" should have warned me).

Sometimes a flash device can have a temporary error condition that is
solved by rewriting the data. (I've seen it triggered by buggy USB
hubs that don't provide the rated power). It seems that a number of
flash drives have internal checks, and when those trigger it reports
a bad sector (even if it isn't permanently bad). My 1GB flashdrive
failed in that way, and I was able to fix the error by erasing with
"dd if=/dev/full of=/dev/usbkey" and reformatting. After the error
occurred I started md5summing every file I put on the drive, but I've
been using it for a month now and not a single checksum has miscomputed.

Cheers,
Kyle Moffett

2006-03-01 07:55:53

by Hugh Dickins

Subject: Re: why VM_SHM has been removed from mm.h?

On Tue, 28 Feb 2006, Kamran Karimi wrote:
>
> VM_SHM is used by DIPC to quickly recognise when we are dealing with a System
> V IPC segment. It has been "removed" from recent kernels (set to 0).

Curious: VM_SHM wasn't set on a System V IPC shm vma in any 2.4 or 2.6
kernel that I know of; but was set on the vmas of a random collection
of drivers. Perhaps you've been using your own patch to set it on
SysV IPC shm vmas, and clear it from drivers' vmas?

(We'll remove VM_SHM entirely once I've trawled through those drivers.)

> Is there an easy way to find out if a segment is a Sys V shm?

Nothing easy and reliable springs immediately to mind - from a VM point
of view, they're treated much the same as tmpfs files; but there
probably is some hacky way if we think about it long enough.

> if not, I suggest we re-activate it.

It seems that either you've been doing the wrong thing up to now,
and never noticed it; or that you've been using your own flag in
your own patch, and can continue to do so. No need for vanilla
kernel to reinstate VM_SHM.

Are you sure you need to recognize them?

Hugh

2006-03-01 14:58:43

by Kamran Karimi

Subject: Re: why VM_SHM has been removed from mm.h?

Thank you Hugh for the reply. Last time I used VM_SHM was in 2.2.x kernels.
I have a programme called DIPC which makes System V shared memory segments
(and also messages and semaphores) work over a network.

In the arch/xyz/mm/fault.c file, it checks the VM_SHM flag and then calls
its logic. As a substitute I've been trying this ad-hoc code to see if a vma
represents a Sys V shm:

    file = vma->vm_file;
    if(file && (file->f_dentry) && (file->f_dentry->d_inode) &&
       (id = file->f_dentry->d_inode->i_ino)) {
        shp = shm_lock(id);
        if(shp == NULL)
            return 0; // not a Sys V shm
    }
    else return 0; // not a Sys V shm

But the kernel hangs with an invalid-pointer error message. Any suggestions?

-Kamran


>On Tue, 28 Feb 2006, Kamran Karimi wrote:
> >
> > VM_SHM is used by DIPC to quickly recognise when we are dealing with a
>System
> > V IPC segment. It has been "removed" from recent kernels (set to 0).
>
>Curious: VM_SHM wasn't set on a System V IPC shm vma in any 2.4 or 2.6
>kernel that I know of; but was set on the vmas of a random collection
>of drivers. Perhaps you've been using your own patch to set it on
>SysV IPC shm vmas, and clear it from drivers' vmas?
>
>(We'll remove VM_SHM entirely once I've trawled through those drivers.)
>
> > Is there an easy way to find out if a segment is a Sys V shm?
>
>Nothing easy and reliable springs immediately to mind - from a VM point
>of view, they're treated much the same as tmpfs files; but there
>probably is some hacky way if we think about it long enough.
>
> > if not, I suggest we re-activate it.
>
>It seems that either you've been doing the wrong thing up to now,
>and never noticed it; or that you've been using your own flag in
>your own patch, and can continue to do so. No need for vanilla
>kernel to reinstate VM_SHM.
>
>Are you sure you need to recognize them?
>
>Hugh


2006-03-01 16:23:25

by Hugh Dickins

Subject: Re: why VM_SHM has been removed from mm.h?

On Wed, 1 Mar 2006, Kamran Karimi wrote:
>
> Thank you Hugh for the reply. Last time I used VM_SHM was in 2.2.x kernels.
> I have a programme called DIPC which makes System V shared memory segments
> (and also messages and semaphores) work over a network.
>
> In the arch/xyz/mm/fault.c file, it checks the VM_SHM flag and then calls
> its logic. As a substitute I've been trying this ad-hoc code to see if a vma
> represents a Sys V shm:
>
> file = vma->vm_file;
> if(file && (file->f_dentry) && (file->f_dentry->d_inode) &&
> (id = file->f_dentry->d_inode->i_ino)) {
> shp = shm_lock(id);
> if(shp == NULL)
> return 0; // not a Sys V shm
> }
> else return 0; // not a Sys V shm
>
> But the kernel hangs with an invalid-pointer error message. Any suggestions?

It's not obvious to me why the kernel would hang with an invalid pointer
error message there: ipc_lock appears to have good safety against being
passed a random id. Perhaps the invalid pointer message comes from
other code you've not shown (for example, I hope you shm_unlock(shp)
and return 1 when shm_lock succeeds), or perhaps I'm misreading.

But what you're doing there looks entirely weird and meaningless to me:
if shm_lock happens to succeed or fail on the inode number of some file
on some filesystem, that tells you nothing about whether that file is
SysV shm or not. Ah, I see ipc/shm.c saves id in i_ino: so if you're
dealing with a SysV shm file, then indeed that ought to tell whether
you're dealing with a SysV shm file - but that hasn't helped much!

Since you're already patching base kernel source (you mention
arch/xyz/mm/fault.c), why don't you just patch your own VM_SYSVSHM
into include/linux/mm.h, and set it on the vma in ipc/shm.c?

Hugh

2006-03-01 16:55:22

by Kamran Karimi

Subject: Re: why VM_SHM has been removed from mm.h?

>It's not obvious to me why the kernel would hang with an invalid pointer
>error message there: ipc_lock appears to have good safety against being
>passed a random id. Perhaps the invalid pointer message comes from
>other code you've not shown (for example, I hope you shm_unlock(shp)
>and return 1 when shm_lock succeeds), or perhaps I'm misreading.

I have put printk() statements all over the place. The hang (which is during
boot time) occurs within the block of code that I sent you. There is a
shm_unlock() statement after the code, but it is never reached.

>Since you're already patching base kernel source (you mention
>arch/xyz/mm/fault.c), why don't you just patch your own VM_SYSVSHM
>into include/linux/mm.h, and set it on the vma in ipc/shm.c?

Yes this looks like a good solution. I have changed VM_SHM in mm.h to be
0x0800000 and am looking for a good place to include it in the
vma->vm_flags. shmat() looks like a good place. How can I find the vma of a
SysV shm in that routine?

-Kamran


2006-03-01 17:50:19

by Hugh Dickins

Subject: Re: why VM_SHM has been removed from mm.h?

On Wed, 1 Mar 2006, Kamran Karimi wrote:
>
> > Since you're already patching base kernel source (you mention
> > arch/xyz/mm/fault.c), why don't you just patch your own VM_SYSVSHM
> > into include/linux/mm.h, and set it on the vma in ipc/shm.c?
>
> Yes this looks like a good solution. I have changed VM_SHM in mm.h to be
> 0x0800000 and am looking for a good place to include it in the vma->vm_flags.

I already pointed out that several drivers are setting VM_SHM; and that
we shall remove it in due course. Your DIPC patch should use its own flag.

> shmat() looks like a good place. How can I find the vma of a SysV shm in that
> routine?

shm_mmap would be the right place: shmat's do_mmap will call it.

Hugh

2006-03-02 08:23:26

by col-pepper

Subject: Re: o_sync in vfat driver

On Tue, 28 Feb 2006 23:38:55 +0100, Pavel Machek <[email protected]> wrote:

> On Út 28-02-06 00:21:53, [email protected] wrote:
>> On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
>> <[email protected]> wrote:
>>
>> > Flash does not get zeroed to be written! It gets erased, which sets
>> all
>> > the bits to '1', i.e., all bytes to 0xff.
>>
>> Thanks for the correction, but that does not change the discussion.
>>
>> > Further, the designers of
>> > flash disks are not stupid as you assume. The direct access occurs
>> > to static RAM (read/write stuff).
>>
>> I'm not assuming anything . Some hardware has been killed by this issue.
>> http://lkml.org/lkml/2005/5/13/144
>
> I have seen flash disk dead in 5 minutes, even without o-sync. Those
> devices are often crap. (I copied tar file to flash by cat foo.tar >
> /dev/sda. That was apparently enough to kill that flash. Label "Yahoo"
> should have warned me).
> Pavel

If I'm not mistaken, writing to the device with cat will output that file
byte by byte. This would probably be even harder on the device than using
a formatted device with o_sync, since it would dirty a 64k block 64k times!

It seems some of the less elaborate devices don't support this type of use.

I suspect if you had tried using dd with a suitable bs you may still own a
crap Yahoo usb device.

Just because the linux kernel lets us use the abstract /dev devices freely
does not mean everything you can do with a /dev is a good idea for all h/w
that gets a device name.

I think that is the heart of the problem. Manufacturers are designing
these devices for the windows market. They are specifically designed and
supplied, preformatted with a fat fs, to be used in that way.

If linux distros, MacOS or anybody else wants to claim to support these
devices the default setup should probably handle the devices in a
_similar_ way to the native windows drivers.




2006-03-02 08:32:18

by Pavel Machek

Subject: Re: o_sync in vfat driver

On Čt 02-03-06 09:23:02, [email protected] wrote:
> On Tue, 28 Feb 2006 23:38:55 +0100, Pavel Machek <[email protected]> wrote:
>
> >On Út 28-02-06 00:21:53, [email protected] wrote:
> >>On Mon, 27 Feb 2006 22:32:07 +0100, linux-os (Dick Johnson)
> >><[email protected]> wrote:
> >>
> >>> Flash does not get zeroed to be written! It gets erased, which sets
> >>all
> >>> the bits to '1', i.e., all bytes to 0xff.
> >>
> >>Thanks for the correction, but that does not change the discussion.
> >>
> >>> Further, the designers of
> >>> flash disks are not stupid as you assume. The direct access occurs
> >>> to static RAM (read/write stuff).
> >>
> >>I'm not assuming anything . Some hardware has been killed by this issue.
> >>http://lkml.org/lkml/2005/5/13/144
> >
> >I have seen flash disk dead in 5 minutes, even without o-sync. Those
> >devices are often crap. (I copied tar file to flash by cat foo.tar >
> >/dev/sda. That was apparently enough to kill that flash. Label "Yahoo"
> >should have warned me).
>
> If I'm not mistaken, writing to the device with cat will output that file
> byte by byte. This would probably be even harder on the device than using
> a formatted device with o_sync, since it would dirty a 64k block 64k
> times!

No.
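
As background to that "No": the cat implementations I am aware of copy with read() and write() in buffer-sized chunks, not one byte at a time, and writes to a block device normally pass through the kernel's cache before reaching the hardware. The sketch below shows the general shape of such a copy loop on a POSIX system. It is not taken from any real cat source; the function name and the write-call counter are made up for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through buf, one chunk at a time.
 * Returns the number of bytes copied, or -1 on error. *write_calls is
 * set to the number of write() calls that were issued. */
static long copy_chunked(int in_fd, int out_fd, char *buf, size_t buf_size,
                         long *write_calls)
{
    long total = 0;
    ssize_t n;

    assert(buf_size > 0);
    *write_calls = 0;
    while ((n = read(in_fd, buf, buf_size)) > 0) {
        ssize_t done = 0;
        while (done < n) {          /* handle short writes */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
            (*write_calls)++;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

With a 64k buffer, a 64k region of the output is covered by one write() call rather than 65536 of them.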

> It seems some of the less elaborate devices don't support this type of use.
>
> I suspect if you had tried using dd with a suitable bs you may still own a
> crap Yahoo usb device.
>
> Just because the linux kernel lets us use the abstract /dev devices freely
> does not mean everything you can do with a /dev is a good idea for all h/w
> that gets a device name.
>
> I think that is the heart of the problem. Manufacturers are designing
> these devices for the windows market. They are specifically designed and
> supplied, preformatted with a fat fs, to be used in that way.

There's a USB mass storage specification that says nothing about FAT
or the expected use of the device... if your device is a broken FAT thing
that will break if used any other way, do not advertise it as USB mass
storage.
Pavel
--
Web maintainer for suspend.sf.net (http://www.sf.net/projects/suspend) wanted...