According to the man page for fsync, it copies in-core data to disk
prior to its return. Does that take async I/O to the media into account?
I.e., does it wait for completion of the async I/O to the disk?
/Anders
> According to the man page for fsync, it copies in-core data to disk
> prior to its return. Does that take async I/O to the media into account?
> I.e., does it wait for completion of the async I/O to the disk?
Undefined.
In theory, for a journalling file system it means the change is committed to the
log and the log to the media; for other file systems, it means the change is
committed to the final disk and is recoverable by fsck in the worst case.
In practice, some IDE disks do write merging and small amounts of write
caching in the drive firmware, so you cannot trust it 100%. In addition, some
higher-end controllers will store to battery-backed memory caches, which is
normally just fine since the reboot will play through the RAM cache.
Hi,
On Tue, Feb 06, 2001 at 02:52:40PM +0000, Alan Cox wrote:
> > According to the man page for fsync, it copies in-core data to disk
> > prior to its return. Does that take async I/O to the media into account?
> > I.e., does it wait for completion of the async I/O to the disk?
>
> Undefined.
> In practice, some IDE disks do write merging and small amounts of write
> caching in the drive firmware, so you cannot trust it 100%.
It's worth noting that it *is* defined unambiguously in the standards:
fsync waits until all the data is hard on disk. Linux will obey that
if it possibly can: only in cases where the hardware is actively lying
about when the data has hit disk will the guarantee break down.
--Stephen
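[Aside: the guarantee Stephen describes is the one a portable program
relies on. A minimal sketch of the usual pattern; the filename is
arbitrary:

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("data", O_WRONLY | O_CREAT, 0644);

        if (fd < 0 || write(fd, "payload\n", 8) != 8)
            return 1;
        if (fsync(fd) != 0) {   /* blocks until the data is "hard on disk" */
            perror("fsync");
            return 1;
        }
        return close(fd);
    }

Per the standard, once fsync() returns 0 the data should be on stable
storage; as the rest of this thread discusses, a drive that acknowledges
writes out of its firmware cache can silently break this.]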
Hello,
On Tue, 6 Feb 2001, Alan Cox wrote:
[snip]
> In theory, for a journalling file system it means the change is committed to the
> log and the log to the media; for other file systems, it means the change is
> committed to the final disk and is recoverable by fsck in the worst case.
>
> In practice, some IDE disks do write merging and small amounts of write
> caching in the drive firmware, so you cannot trust it 100%. In addition, some
> higher-end controllers will store to battery-backed memory caches, which is
> normally just fine since the reboot will play through the RAM cache.
>
Does this imply that in order to ensure my data hits the drives, I should
do a warm reboot and then shut down from the lilo: prompt or similar?
Apologies for bugging you with a simple question, but I can see other
people here worrying about data loss too.
--
/jbm
[email protected] said:
> Linux will obey that if it possibly can: only in cases where the
> hardware is actively lying about when the data has hit disk will the
> guarantee break down.
Do we attempt to ask SCSI disks nicely to flush their write caches in this
situation? cf. http://www.danbbs.dk/~dino/SCSI/SCSI2-09.html#9.2.18
Or do we instruct all SCSI disks not to do write caching in the first place?
--
dwmw2
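[Aside: SYNCHRONIZE CACHE is the command David refers to (opcode 0x35).
For reference, a user-space sketch of issuing it through the sg driver's
SG_IO ioctl, assuming the v3 sg interface and /dev/sg0 as an example
device:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>

    int main(void)
    {
        unsigned char cdb[10] = { 0x35, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
        unsigned char sense[32];
        struct sg_io_hdr io;
        int fd = open("/dev/sg0", O_RDWR);

        memset(&io, 0, sizeof(io));
        io.interface_id    = 'S';
        io.dxfer_direction = SG_DXFER_NONE;   /* command has no data phase */
        io.cmd_len         = sizeof(cdb);
        io.cmdp            = cdb;
        io.sbp             = sense;
        io.mx_sb_len       = sizeof(sense);
        io.timeout         = 10000;           /* milliseconds */

        if (fd < 0 || ioctl(fd, SG_IO, &io) != 0) {
            perror("SYNCHRONIZE CACHE");
            return 1;
        }
        return 0;
    }

Whether the kernel itself sends this on fsync is the question at hand.]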
> Does this imply that in order to ensure my data hits the drives, I should
> do a warm reboot and then shut down from the lilo: prompt or similar?
As far as I can tell, the IDE drives are write caching at most a second or two
of data. Andre may know more.
On Tue, 6 Feb 2001, Stephen C. Tweedie wrote:
> It's worth noting that it *is* defined unambiguously in the standards:
> fsync waits until all the data is hard on disk. Linux will obey that
> if it possibly can: only in cases where the hardware is actively lying
> about when the data has hit disk will the guarantee break down.
It is defined for writes that have begun before the fsync() started.
fsync() has no bearing on AIO writes until the async writes have completed.
If people are worried about the interaction between an fsync in their app
and an async write, they should be using synchronous writes (which are
perfectly usable with async I/O).
-ben
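[Aside: a minimal sketch of Ben's suggestion, assuming POSIX AIO (link
with -lrt) and an arbitrary filename. Opening with O_SYNC makes each
write synchronous to the media, so completion of the aio request itself
already implies the data reached the disk; no later fsync() is needed to
catch in-flight AIO:

    #include <aio.h>
    #include <fcntl.h>
    #include <string.h>

    int main(void)
    {
        static char buf[] = "payload\n";
        struct aiocb cb;
        const struct aiocb *list[1] = { &cb };
        int fd = open("data", O_WRONLY | O_CREAT | O_SYNC, 0644);

        if (fd < 0)
            return 1;

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf) - 1;
        cb.aio_offset = 0;

        if (aio_write(&cb) != 0)
            return 1;
        aio_suspend(list, 1, NULL);          /* wait for the request */
        return aio_error(&cb) == 0 ? 0 : 1;  /* 0: written through to disk */
    }
]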
Hi,
On Tue, Feb 06, 2001 at 05:54:41PM +0000, David Woodhouse wrote:
>
> [email protected] said:
> > Linux will obey that if it possibly can: only in cases where the
> > hardware is actively lying about when the data has hit disk will the
> > guarantee break down.
>
> Do we attempt to ask SCSI disks nicely to flush their write caches in this
> situation? cf. http://www.danbbs.dk/~dino/SCSI/SCSI2-09.html#9.2.18
No, we simply omit to instruct them to enable write-back caching.
Linux assumes that the WCE (write cache enable) bit in a disk's
caching mode page is zero.
--Stephen
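[Aside: the WCE bit Stephen mentions lives in bit 2 of byte 2 of the
caching mode page (page 0x08). A sketch of checking it from user space
with MODE SENSE(6) via SG_IO, assuming /dev/sg0 and with block
descriptors disabled (DBD set) so the page follows the 4-byte header:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>

    int main(void)
    {
        /* MODE SENSE(6), DBD=1, page code 0x08 (caching), 255 bytes */
        unsigned char cdb[6] = { 0x1a, 0x08, 0x08, 0, 255, 0 };
        unsigned char buf[255], sense[32];
        struct sg_io_hdr io;
        int fd = open("/dev/sg0", O_RDWR);

        memset(&io, 0, sizeof(io));
        io.interface_id    = 'S';
        io.dxfer_direction = SG_DXFER_FROM_DEV;
        io.cmd_len         = sizeof(cdb);
        io.cmdp            = cdb;
        io.dxferp          = buf;
        io.dxfer_len       = sizeof(buf);
        io.sbp             = sense;
        io.mx_sb_len       = sizeof(sense);
        io.timeout         = 10000;           /* milliseconds */

        if (fd < 0 || ioctl(fd, SG_IO, &io) != 0)
            return 1;
        /* 4-byte mode parameter header, then the caching page itself */
        printf("WCE is %s\n", (buf[4 + 2] & 0x04) ? "on" : "off");
        return 0;
    }
]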
"Stephen C. Tweedie" wrote:
>
> Hi,
>
> On Tue, Feb 06, 2001 at 02:52:40PM +0000, Alan Cox wrote:
> > > According to the man page for fsync, it copies in-core data to disk
> > > prior to its return. Does that take async I/O to the media into account?
> > > I.e., does it wait for completion of the async I/O to the disk?
> >
> > Undefined.
>
> > In practice, some IDE disks do write merging and small amounts of write
> > caching in the drive firmware, so you cannot trust it 100%.
>
> It's worth noting that it *is* defined unambiguously in the standards:
> fsync waits until all the data is hard on disk. Linux will obey that
> if it possibly can: only in cases where the hardware is actively lying
> about when the data has hit disk will the guarantee break down.
Sometimes I want to know that the write is safely on disk, and sometimes
I only need to know that the I/O has gone over the bus and is on its way
to disk. In the latter case the buffer/page can be unlocked a lot
sooner. Please correct me if I'm wrong, but I don't think the current
API can make that distinction for IDE, much less provide a uniform way
of controlling this behaviour across all types of block devices. We
need that, or else we have to choose between the following: 1) slow, or
2) risky.
I'd like to be able to set a bit in the buffer_head that says 'get back
to me when it's on disk' vs 'get back to me when it's hit the bus'.
--
Daniel
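[Aside: a purely hypothetical illustration of the distinction Daniel is
asking for; the BH_WaitMedia bit and the submit_write() helper are
invented here and do not exist in the stock 2.4 kernel:

    #include <linux/fs.h>

    /* BH_WaitMedia is an invented buffer state bit, for illustration */
    #define BH_WaitMedia 30

    /* Hypothetical: pick completion semantics per buffer_head.  "hard"
     * asks that b_end_io fire only once the data is on the media;
     * otherwise it may fire as soon as the transfer has hit the bus. */
    static void submit_write(struct buffer_head *bh, int hard)
    {
        if (hard)
            set_bit(BH_WaitMedia, &bh->b_state);
        else
            clear_bit(BH_WaitMedia, &bh->b_state);
        ll_rw_block(WRITE, 1, &bh);
    }

The lower layers would then have to translate the bit into, say, a SCSI
force-unit-access flag or an IDE cache flush.]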
On Tue, 6 Feb 2001, Stephen C. Tweedie wrote:
> Hi,
>
> On Tue, Feb 06, 2001 at 05:54:41PM +0000, David Woodhouse wrote:
> >
> > [email protected] said:
> > > Linux will obey that if it possibly can: only in cases where the
> > > hardware is actively lying about when the data has hit disk will the
> > > guarantee break down.
> >
> > Do we attempt to ask SCSI disks nicely to flush their write caches in this
> > situation? cf. http://www.danbbs.dk/~dino/SCSI/SCSI2-09.html#9.2.18
>
> No, we simply omit to instruct them to enable write-back caching.
> Linux assumes that the WCE (write cache enable) bit in a disk's
> caching mode page is zero.
Stephen,
You cannot be so blind as to omit the command.
You have to issue an active command to disable WCE.
All modern drives come with it enabled by default, especially ATA disks.
Andre Hedrick
Linux ATA Development
ASL Kernel Development
-----------------------------------------------------------------------------
ASL, Inc. Toll free: 1-877-ASL-3535
1757 Houret Court Fax: 1-408-941-2071
Milpitas, CA 95035 Web: http://www.aslab.com
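[Aside: a sketch of what "an active command" can look like for an ATA
disk: roughly what hdparm -W0 does, using the HDIO_DRIVE_CMD ioctl to
send SETFEATURES with the disable-write-cache subcommand (0x82). The
device name is an example, and the args layout should be checked against
your kernel's hdreg.h:

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/hdreg.h>

    int main(void)
    {
        /* { command, sector number, feature, sector count } */
        unsigned char args[4] = { WIN_SETFEATURES, 0, 0x82, 0 };
        int fd = open("/dev/hda", O_RDONLY);

        if (fd < 0 || ioctl(fd, HDIO_DRIVE_CMD, args) != 0) {
            perror("disable write cache");
            return 1;
        }
        return 0;
    }
]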
Hi,
On Tue, Feb 06, 2001 at 11:25:00AM -0800, Andre Hedrick wrote:
> On Tue, 6 Feb 2001, Stephen C. Tweedie wrote:
> > No, we simply omit to instruct them to enable write-back caching.
> > Linux assumes that the WCE (write cache enable) bit in a disk's
> > caching mode page is zero.
>
> You cannot be so blind as to omit the command.
Linux has traditionally ignored the issue. Don't ask me to defend it
--- the last advice I got from anybody who knew SCSI well was that
SCSI disks were defaulting to WCE-disabled.
Note that disabling SCSI WCE doesn't disable the cache, it just
enforces synchronous completion. With tagged command queuing,
writeback caching doesn't necessarily mean a huge performance
increase. But if WCE is being enabled by default on modern SCSI
drives, then that's something which the scsi stack really does need to
fix --- the upper block layers will most definitely break if we have
WCE enabled and we don't set force-unit-access on the scsi commands.
The ll_rw_block interface is perfectly clear: it expects the data to
be written to persistent storage once the buffer_head end_io is
called. If that's not the case, somebody needs to fix the lower
layers.
Cheers,
Stephen
On Tue, 6 Feb 2001, Stephen C. Tweedie wrote:
> The ll_rw_block interface is perfectly clear: it expects the data to
> be written to persistent storage once the buffer_head end_io is
> called. If that's not the case, somebody needs to fix the lower
> layers.
Sure, in 2.5, when I have a cleaner method of setting up hooks to allow
testing and changing of the mode; but you cannot assume that this stuff is
off by default and will stay that way.
At this time I am working to clean up an IBM mess of drives that do random
dumping of the drive cache to the platters when power is pulled. This is
a nice dirty erratum that I have heard about but never seen, though I can
believe it is real. The painful part is that now that drives have these
huge buffers of up to 4MB, we have only a second or two to hit the platters
before the head float and spindle sync needed for writing depart from the
allowable range and the data does not get to disk... oops!
I suspect that with all of the new NVRAM hosts coming to market soon we
will see more filesystem death until things settle.
Cheers,
Andre Hedrick
Linux ATA Development
ASL Kernel Development