2005-03-04 06:34:40

by Junfeng Yang

Subject: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?


Hi,

FiSC (our file system checker) emits several warnings on ext2, jfs and
reiserfs, complaining that directories or files are lost while FiSC
believes they should already be persistent on disk. (ext3 behaves
correctly.)

All warnings boil down to a single cause: when these file systems are
mounted -o sync or dirsync, dirty blocks are still written out
asynchronously. It appears to me that these mount options don't have any
effect on these file systems. Is this the intended behavior?

man mount shows:

    sync    All I/O to the file system should be done synchronously.

    dirsync
            All directory updates within the file system should be
            done synchronously. This affects the following system
            calls: creat, link, unlink, symlink, mkdir, rmdir, mknod
            and rename.

Any clarification on this would be very helpful,

-Junfeng
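
The two options quoted from the man page correspond to the mount(2) flag bits MS_SYNCHRONOUS and MS_DIRSYNC, which are what a file system can later test in sb->s_flags. A minimal sketch of that mapping follows; the helper name sync_mount_flags is hypothetical and only these two option names are handled.

```c
#include <string.h>
#include <sys/mount.h>   /* MS_SYNCHRONOUS, MS_DIRSYNC */

/*
 * Hypothetical helper (not part of any library): translate a
 * comma-separated option string into the two mount(2) flag bits
 * discussed above. Any other option name is ignored.
 */
unsigned long sync_mount_flags(const char *opts)
{
	unsigned long flags = 0;

	while (*opts) {
		size_t len = strcspn(opts, ",");   /* length of this option */

		if (len == 4 && strncmp(opts, "sync", 4) == 0)
			flags |= MS_SYNCHRONOUS;
		else if (len == 7 && strncmp(opts, "dirsync", 7) == 0)
			flags |= MS_DIRSYNC;

		opts += len;
		if (*opts == ',')
			opts++;                    /* skip the separator */
	}
	return flags;
}
```

With root privileges the result could be passed as the fourth argument of mount(2), for example mount("/dev/hda9", "/mnt/sbd1", "ext2", sync_mount_flags("sync,dirsync"), NULL).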


2005-03-04 07:17:51

by Matt Mackall

Subject: Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

On Thu, Mar 03, 2005 at 10:33:40PM -0800, Junfeng Yang wrote:
>
> Hi,
>
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that directories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)
>
> All warnings boil down to a single cause: when these file systems are
> mounted -o sync or dirsync, dirty blocks are still written out
> asynchronously. It appears to me that these mount options don't have any
> effect on these file systems. Is this the intended behavior?

I don't believe so. The sync option should definitionally make calls
to fsync for integrity redundant. This probably got broken ages ago
for ext2 in one of the many buffer/page cache refactorings.

--
Mathematics is the supreme nostalgia of our time.

2005-03-04 07:34:28

by Jan Engelhardt

Subject: Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

>All warnings boil down to a single cause: when these file systems are
>mounted -o sync or dirsync, dirty blocks are still written out
>asynchronously. It appears to me that these mount options don't have any
>effect on these file systems. Is this the intended behavior?

At least my HDD LED flashes regularly when I add -o sync...
(Using `mount / -o remount,sync`)

It may be that FiSC reads the disk before the write command has even finished.
With all the HD head movement optimization in the kernel (block layer,
boiling down to TCQ/NCQ), this sounds possible.


Jan Engelhardt
--

2005-03-04 08:02:21

by Junfeng Yang

Subject: Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

> It may be that FiSC reads the disk before the write command has even finished.
> With all the HD head movement optimization in the kernel (block layer,
> boiling down to TCQ/NCQ), this sounds possible.

FiSC "crashes" the kernel immediately after a file system operation
(creat, mkdir, write, etc.) returns. Presumably, if a file system is
mounted -o sync, all FS operations should be done synchronously, i.e.,
if creat("foo") returns, the file "foo" had better be on disk. That turns
out not to be the case for ext2, jfs and reiserfs.
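
For comparison, the guarantee expected from -o sync can be requested explicitly on an ordinary mount by calling fsync() on both the new file and its parent directory. A minimal sketch, assuming a Linux system where a directory can be opened read-only and passed to fsync(); the function name create_durable is hypothetical:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create dir/name, then flush both the new inode and the directory
 * that holds its entry before returning.
 * Returns 0 on success, -1 on any error (including "already exists").
 */
int create_durable(const char *dir, const char *name)
{
	char path[4096];
	int n, fd, dfd, ret = -1;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	dfd = open(dir, O_RDONLY);          /* descriptor for the parent directory */
	if (dfd < 0) {
		close(fd);
		return -1;
	}

	/* Flush the file first, then the directory entry that names it. */
	if (fsync(fd) == 0 && fsync(dfd) == 0)
		ret = 0;

	close(dfd);
	close(fd);
	return ret;
}
```

A crash test like the one described above should never lose a file created this way once the call has returned 0, provided the disk's write cache does not reorder or drop the writes.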

-Junfeng

2005-03-04 08:43:54

by Junfeng Yang

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

On Thu, 3 Mar 2005, Junfeng Yang wrote:

>
> Hi,
>
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that directories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)

I forgot to mention that we are mainly looking for crash-recovery bugs. The
warnings can be triggered this way:
1. do several file system operations
2. "crash" the test machine
3. get the crashed disk image, run fsck to recover
4. mount the recovered disk image
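
Step 4 ends with a comparison between what the test created before the crash and what is visible after recovery. The sketch below shows only that final comparison; it is an assumption about the shape of such a check, not FiSC's actual implementation, and the name count_lost is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical post-recovery check: given the mount point of the
 * recovered image and the names that were created (and had returned)
 * before the crash, report and count the ones that no longer exist.
 */
int count_lost(const char *mnt, const char *const names[], int n)
{
	int lost = 0;

	for (int i = 0; i < n; i++) {
		char path[4096];
		struct stat st;

		snprintf(path, sizeof(path), "%s/%s", mnt, names[i]);
		if (stat(path, &st) != 0) {
			fprintf(stderr, "lost after recovery: %s\n", path);
			lost++;
		}
	}
	return lost;
}
```

On a file system mounted with -o sync,dirsync the expected result is 0 for every operation that returned before the crash.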

I'm able to reproduce the same warnings on ext2 using the following
program:

#include <fcntl.h>      /* creat */
#include <stdlib.h>     /* system */
#include <sys/stat.h>   /* mkdir */

int main(void)
{
	system("sudo umount /dev/hda9");
	system("/sbin/mke2fs /dev/hda9");
	system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");
	creat("/mnt/sbd1/0002", 0777);
	mkdir("/mnt/sbd1/0003", 0777);
	/* unplug your power cord here :) then use e2fsck to recover */
	return 0;
}

uname -a shows
Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux

2005-03-04 09:15:39

by Andrew Morton

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

Junfeng Yang <[email protected]> wrote:
>
> On Thu, 3 Mar 2005, Junfeng Yang wrote:
>
> >
> > Hi,
> >
> > FiSC (our file system checker) emits several warnings on ext2, jfs and
> > reiserfs, complaining that directories or files are lost while FiSC
> > believes they should already be persistent on disk. (ext3 behaves
> > correctly.)
>
> I forgot to mention that we are mainly looking for crash-recovery bugs. The
> warnings can be triggered this way:
> 1. do several file system operations
> 2. "crash" the test machine
> 3. get the crashed disk image, run fsck to recover
> 4. mount the recovered disk image
>
> I'm able to reproduce the same warnings on ext2 using the following
> program:
>
> main()
> {
> system("sudo umount /dev/hda9");
> system("/sbin/mke2fs /dev/hda9");
> system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");
> creat("/mnt/sbd1/0002", 0777);
> mkdir("/mnt/sbd1/0003", 0777);
> // unplug your power cord here :) then use e2fsck to recover
> }

That would be a bug. Please send the e2fsck output.

> uname -a shows
> Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux

It would be much better to test vaguely contemporary kernels.

2005-03-04 09:44:46

by Junfeng Yang

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

> That would be a bug. Please send the e2fsck output.

Here is the trace

1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
and mounted with -o sync,dirsync

2. operations FiSC did:

creat(/mnt/sbd0/0001)
write(/mnt/sbd0/0001)
rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
mkdir(/mnt/sbd0/0003)

3. FiSC "crashed" the test machine after mkdir returned. The crashed
disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2

e2fsck output is:

e2fsck 1.36 (05-Feb-2005)
/dev/hda9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_blocks is 16, should be 2. Fix? yes

Pass 2: Checking directory structure
Entry '0003' in / (2) has deleted/unused inode 13. Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -21
Fix? yes

Free blocks count wrong for group #0 (38, counted=39).
Fix? yes

Free blocks count wrong (38, counted=39).
Fix? yes

Inode bitmap differences: -13
Fix? yes

Free inodes count wrong for group #0 (3, counted=4).
Fix? yes

Directories count wrong for group #0 (3, counted=2).
Fix? yes

Free inodes count wrong (3, counted=4).
Fix? yes


/dev/hda9: ***** FILE SYSTEM WAS MODIFIED *****
/dev/hda9: 12/16 files (0.0% non-contiguous), 21/60 blocks

>
>
> It would be much better to test vaguely contemporary kernels.
>

I'm going to check 2.6.11 tonight.

2005-03-04 10:27:23

by Lars Marowsky-Bree

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

On 2005-03-04T01:44:06, Junfeng Yang <[email protected]> wrote:

> > That would be a bug. Please send the e2fsck output.
>
> Here is the trace
>
> 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
> and mounted with -o sync,dirsync
>
> 1. operations FiSC did:
>
> creat(/mnt/sbd0/0001)
> write(/mnt/sbd0/0001)
> rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
> mkdir(/mnt/sbd0/0003)
>
> 2. FiSC "crashed" the test machine after mkdir returns. Crashed
> disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2

I've run into similar issues. For example, a "touch foo" also isn't
synchronous with -o sync, but stays entirely in the cache. Andrea tells
me this is expected behaviour, so I've given up on this one...


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

2005-03-04 11:25:45

by Andrew Morton

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

Lars Marowsky-Bree <[email protected]> wrote:
>
> On 2005-03-04T01:44:06, Junfeng Yang <[email protected]> wrote:
>
> > > That would be a bug. Please send the e2fsck output.
> >
> > Here is the trace
> >
> > 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60
> > and mounted with -o sync,dirsync
> >
> > 1. operations FiSC did:
> >
> > creat(/mnt/sbd0/0001)
> > write(/mnt/sbd0/0001)
> > rename(/mnt/sbd0/0001, /mnt/sbd0/0002)
> > mkdir(/mnt/sbd0/0003)
> >
> > 2. FiSC "crashed" the test machine after mkdir returns. Crashed
> > disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2
>
> I've run into similar issues. For example, a "touch foo" also isn't
> synchronous with -o sync, but stays entirely in the cache. Andrea tells
> me this is expected behaviour, so I've given up on this one...
>

Why is that expected behaviour? I have vague memories which agree with
that, but I cannot remember the reasoning.

From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
there might be some O_SYNC ones there as well.

Problem is, it's subtle because we try to defer I/O until the last stage,
to avoid doing extra I/O.

So this wild scattergun patch probably does extra work and possibly extra
I/O all over the place, but I'd be interested if Junfeng could give it a
quick test. It's against 2.6.11.

A real patch would take some painstaking work.


diff -puN fs/ext2/balloc.c~ext2-sync-fix fs/ext2/balloc.c
--- 25/fs/ext2/balloc.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/balloc.c 2005-03-04 02:49:00.000000000 -0800
@@ -139,8 +139,9 @@ static void release_blocks(struct super_
}
}

-static int group_reserve_blocks(struct ext2_sb_info *sbi, int group_no,
- struct ext2_group_desc *desc, struct buffer_head *bh, int count)
+static int group_reserve_blocks(struct super_block *sb,
+ struct ext2_sb_info *sbi, int group_no, struct ext2_group_desc *desc,
+ struct buffer_head *bh, int count)
{
unsigned free_blocks;

@@ -154,6 +155,8 @@ static int group_reserve_blocks(struct e
desc->bg_free_blocks_count = cpu_to_le16(free_blocks - count);
spin_unlock(sb_bgl_lock(sbi, group_no));
mark_buffer_dirty(bh);
+ if (sb->s_flags & MS_SYNCHRONOUS)
+ sync_dirty_buffer(bh);
return count;
}

@@ -170,6 +173,8 @@ static void group_release_blocks(struct
spin_unlock(sb_bgl_lock(sbi, group_no));
sb->s_dirt = 1;
mark_buffer_dirty(bh);
+ if (sb->s_flags & MS_SYNCHRONOUS)
+ sync_dirty_buffer(bh);
}
}

@@ -377,7 +382,7 @@ int ext2_new_block(struct inode *inode,
goto io_error;
}

- group_alloc = group_reserve_blocks(sbi, group_no, desc,
+ group_alloc = group_reserve_blocks(sb, sbi, group_no, desc,
gdp_bh, es_alloc);
if (group_alloc) {
ret_block = ((goal - le32_to_cpu(es->s_first_data_block)) %
@@ -413,7 +418,7 @@ retry:
desc = ext2_get_group_desc(sb, group_no, &gdp_bh);
if (!desc)
goto io_error;
- group_alloc = group_reserve_blocks(sbi, group_no, desc,
+ group_alloc = group_reserve_blocks(sb, sbi, group_no, desc,
gdp_bh, es_alloc);
}
if (!group_alloc) {
diff -puN fs/ext2/ialloc.c~ext2-sync-fix fs/ext2/ialloc.c
--- 25/fs/ext2/ialloc.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ialloc.c 2005-03-04 02:54:13.000000000 -0800
@@ -86,6 +86,8 @@ static void ext2_release_inode(struct su
percpu_counter_dec(&EXT2_SB(sb)->s_dirs_counter);
sb->s_dirt = 1;
mark_buffer_dirty(bh);
+ if (sb->s_flags & MS_SYNCHRONOUS)
+ sync_dirty_buffer(bh);
}

/*
@@ -563,6 +565,8 @@ got:

sb->s_dirt = 1;
mark_buffer_dirty(bh2);
+ if (sb->s_flags & MS_SYNCHRONOUS)
+ sync_dirty_buffer(bh2);
inode->i_uid = current->fsuid;
if (test_opt (sb, GRPID))
inode->i_gid = dir->i_gid;
@@ -614,7 +618,7 @@ got:
DQUOT_FREE_INODE(inode);
goto fail2;
}
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);
ext2_debug("allocating inode %lu\n", inode->i_ino);
ext2_preread_inode(inode);
return inode;
diff -puN fs/ext2/super.c~ext2-sync-fix fs/ext2/super.c
--- 25/fs/ext2/super.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/super.c 2005-03-04 02:49:00.000000000 -0800
@@ -1097,6 +1097,8 @@ static ssize_t ext2_quota_write(struct s
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
unlock_buffer(bh);
+ if (sb->s_flags & MS_SYNCHRONOUS)
+ sync_dirty_buffer(bh);
brelse(bh);
offset = 0;
towrite -= tocopy;
@@ -1110,8 +1112,8 @@ out:
i_size_write(inode, off+len-towrite);
inode->i_version++;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
- mark_inode_dirty(inode);
up(&inode->i_sem);
+ ext2_mark_inode_dirty(inode);
return len - towrite;
}

diff -puN fs/ext2/xattr.c~ext2-sync-fix fs/ext2/xattr.c
--- 25/fs/ext2/xattr.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/xattr.c 2005-03-04 02:49:00.000000000 -0800
@@ -348,6 +348,8 @@ static void ext2_xattr_update_super_bloc
sb->s_dirt = 1;
mark_buffer_dirty(EXT2_SB(sb)->s_sbh);
unlock_super(sb);
+ if (sb->s_flags & MS_SYNCHRONOUS)
+ sync_dirty_buffer(EXT2_SB(sb)->s_sbh);
}

/*
diff -puN fs/ext2/dir.c~ext2-sync-fix fs/ext2/dir.c
--- 25/fs/ext2/dir.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/dir.c 2005-03-04 02:49:00.000000000 -0800
@@ -428,7 +428,7 @@ void ext2_set_link(struct inode *dir, st
ext2_put_page(page);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
- mark_inode_dirty(dir);
+ ext2_mark_inode_dirty(dir);
}

/*
@@ -518,7 +518,7 @@ got_it:
err = ext2_commit_chunk(page, from, to);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
- mark_inode_dirty(dir);
+ ext2_mark_inode_dirty(dir);
/* OFFSET_CACHE */
out_put:
ext2_put_page(page);
@@ -566,7 +566,7 @@ int ext2_delete_entry (struct ext2_dir_e
err = ext2_commit_chunk(page, from, to);
inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
EXT2_I(inode)->i_flags &= ~EXT2_BTREE_FL;
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);
out:
ext2_put_page(page);
return err;
diff -puN fs/ext2/inode.c~ext2-sync-fix fs/ext2/inode.c
--- 25/fs/ext2/inode.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/inode.c 2005-03-04 02:49:00.000000000 -0800
@@ -41,6 +41,17 @@ MODULE_LICENSE("GPL");
static int ext2_update_inode(struct inode * inode, int do_sync);

/*
+ * dirty an ext2 inode and sync it if needed
+ */
+int ext2_mark_inode_dirty(struct inode *inode)
+{
+ mark_inode_dirty(inode);
+ if (inode_needs_sync(inode))
+ return ext2_update_inode(inode, 1);
+ return 0;
+}
+
+/*
* Test whether an inode is a fast symlink.
*/
static inline int ext2_inode_is_fast_symlink(struct inode *inode)
@@ -60,8 +71,7 @@ void ext2_delete_inode (struct inode * i
if (is_bad_inode(inode))
goto no_delete;
EXT2_I(inode)->i_dtime = get_seconds();
- mark_inode_dirty(inode);
- ext2_update_inode(inode, inode_needs_sync(inode));
+ ext2_mark_inode_dirty(inode);

inode->i_size = 0;
if (inode->i_blocks)
diff -puN fs/ext2/acl.c~ext2-sync-fix fs/ext2/acl.c
diff -puN fs/ext2/ioctl.c~ext2-sync-fix fs/ext2/ioctl.c
--- 25/fs/ext2/ioctl.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ioctl.c 2005-03-04 02:49:00.000000000 -0800
@@ -60,7 +60,7 @@ int ext2_ioctl (struct inode * inode, st

ext2_set_inode_flags(inode);
inode->i_ctime = CURRENT_TIME_SEC;
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);
return 0;
}
case EXT2_IOC_GETVERSION:
@@ -73,7 +73,7 @@ int ext2_ioctl (struct inode * inode, st
if (get_user(inode->i_generation, (int __user *) arg))
return -EFAULT;
inode->i_ctime = CURRENT_TIME_SEC;
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);
return 0;
default:
return -ENOTTY;
diff -puN fs/ext2/namei.c~ext2-sync-fix fs/ext2/namei.c
--- 25/fs/ext2/namei.c~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/namei.c 2005-03-04 02:55:15.000000000 -0800
@@ -132,7 +132,7 @@ static int ext2_create (struct inode * d
inode->i_mapping->a_ops = &ext2_nobh_aops;
else
inode->i_mapping->a_ops = &ext2_aops;
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);
err = ext2_add_nondir(dentry, inode);
}
return err;
@@ -153,7 +153,7 @@ static int ext2_mknod (struct inode * di
#ifdef CONFIG_EXT2_FS_XATTR
inode->i_op = &ext2_special_inode_operations;
#endif
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);
err = ext2_add_nondir(dentry, inode);
}
return err;
@@ -191,7 +191,7 @@ static int ext2_symlink (struct inode *
memcpy((char*)(EXT2_I(inode)->i_data),symname,l);
inode->i_size = l-1;
}
- mark_inode_dirty(inode);
+ ext2_mark_inode_dirty(inode);

err = ext2_add_nondir(dentry, inode);
out:
diff -puN fs/ext2/ext2.h~ext2-sync-fix fs/ext2/ext2.h
--- 25/fs/ext2/ext2.h~ext2-sync-fix 2005-03-04 02:49:00.000000000 -0800
+++ 25-akpm/fs/ext2/ext2.h 2005-03-04 02:49:00.000000000 -0800
@@ -116,6 +116,7 @@ extern unsigned long ext2_count_free (st
/* inode.c */
extern void ext2_read_inode (struct inode *);
extern int ext2_write_inode (struct inode *, int);
+int ext2_mark_inode_dirty(struct inode *inode);
extern void ext2_delete_inode (struct inode *);
extern int ext2_sync_inode (struct inode *);
extern void ext2_discard_prealloc (struct inode *);
_
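
The pattern the patch repeats is "mark the buffer or inode dirty, then write it out immediately if the mount is synchronous". The following is a user-space toy model of that pattern, written only to illustrate the trade-off against deferred I/O. All names (toy_mount, toy_buffer, toy_update) are hypothetical stand-ins and none of this is kernel code.

```c
#include <stdbool.h>

/* Toy stand-ins, not kernel types: a mount and one cached block. */
struct toy_mount  { bool synchronous; };          /* models MS_SYNCHRONOUS */
struct toy_buffer { bool dirty; int disk_writes; };

/* Deferred path: only remember that the block changed. */
static void toy_mark_dirty(struct toy_buffer *b)
{
	b->dirty = true;
}

/* Write-through path: push the block out now and clear the dirty bit. */
static void toy_sync_dirty(struct toy_buffer *b)
{
	if (b->dirty) {
		b->disk_writes++;
		b->dirty = false;
	}
}

/* Dirty first, then sync only if the mount asked for it. */
void toy_update(const struct toy_mount *m, struct toy_buffer *b)
{
	toy_mark_dirty(b);
	if (m->synchronous)
		toy_sync_dirty(b);
}
```

In this model, three updates to the same block on a synchronous mount cost three writes, while an asynchronous mount leaves one dirty block to be written later. That is the extra I/O the message above warns about.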

2005-03-05 00:37:06

by Junfeng Yang

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

> From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> there might be some O_SYNC ones there as well.

I should be able to easily add an O_SYNC check to FiSC. Several questions:
1. Does O_SYNC apply to directories as well?
2. For the same file, if I open it twice, once with O_SYNC and once
without, only writes through the O_SYNC fd will be synchronous, right?
3. I open a file w/o O_SYNC, issue a bunch of writes, then call
ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
Only the second set of writes is synchronous?

btw, the man page shows that O_DSYNC and O_RSYNC are just O_SYNC. Is this true
for the current Linux kernel (2.6)?

> So this wild scattergun patch probably does extra work and possibly extra
> I/O all over the place, but I'd be interested if Junfeng could give it a
> quick test. It's against 2.6.11.

I checked 2.6.11 with your patch just now. Looks like the problem is
still there. If you need more information, let me know. Image is at
http://fisc.stanford.edu/bug2/crash-1.img.bz2. Below is the output from
e2fsck.

e2fsck 1.36 (05-Feb-2005)
/dev/ide/host0/bus0/target0/lun0/part9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 13, i_blocks is 16, should be 2. Fix? yes

Inode 15 is a zero-length directory. Clear? yes

Pass 2: Checking directory structure
Entry '0005' in / (2) has deleted/unused inode 15. Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2 ref count is 4, should be 3. Fix? yes

Pass 5: Checking group summary information
Block bitmap differences: -21
Fix? yes

Free blocks count wrong for group #0 (38, counted=39).
Fix? yes

Free blocks count wrong (38, counted=39).
Fix? yes

Inode bitmap differences: -15
Fix? yes

Free inodes count wrong for group #0 (1, counted=2).
Fix? yes

Directories count wrong for group #0 (3, counted=2).
Fix? yes

Free inodes count wrong (1, counted=2).
Fix? yes


/dev/ide/host0/bus0/target0/lun0/part9: ***** FILE SYSTEM WAS MODIFIED *****
/dev/ide/host0/bus0/target0/lun0/part9: 14/16 files (0.0% non-contiguous), 21/60 blocks


2005-03-05 00:36:55

by Andrew Morton

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

Junfeng Yang <[email protected]> wrote:
>
> > From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> > there might be some O_SYNC ones there as well.
>
> I should be able to easily add an O_SYNC check to FiSC. Several questions:
> 1. Does O_SYNC apply to directories as well?

Only if you can open directories for writing ;)

> 2. For the same file, if I open it twice, once with O_SYNC and once
> without, only writes through the O_SYNC fd will be synchronous, right?

Yes, O_SYNC is a per-fd thing.

> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
> Only the second set of writes is synchronous?

FIOASYNC is unrelated to O_SYNC. O_SYNC can only be set at open().

> btw, the man page shows that O_DSYNC and O_RSYNC are just O_SYNC. Is this true
> for the current Linux kernel (2.6)?

The kernel only supports O_SYNC (equivalent behaviour to O_RSYNC|O_DSYNC).
Perhaps glibc does a conversion.

> > So this wild scattergun patch probably does extra work and possibly extra
> > I/O all over the place, but I'd be interested if Junfeng could give it a
> > quick test. It's against 2.6.11.
>
> I checked 2.6.11 with your patch just now. Looks like the problem is
> still there. If you need more information, let me know. Image is at
> http://fisc.stanford.edu/bug2/crash-1.img.bz2. Below is the output from
> e2fsck.

ugh. Thanks.
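
The per-fd nature of O_SYNC described in the answers above can be observed from user space: two descriptors for the same file report different status flags through fcntl(F_GETFL). A minimal sketch, assuming Linux; the helper names open_pair and fd_is_osync are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the same path twice: once with O_SYNC and once without.
 * Returns 0 on success and stores both descriptors, -1 on error.
 */
int open_pair(const char *path, int *sync_fd, int *plain_fd)
{
	*sync_fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
	if (*sync_fd < 0)
		return -1;

	*plain_fd = open(path, O_WRONLY);
	if (*plain_fd < 0) {
		close(*sync_fd);
		return -1;
	}
	return 0;
}

/* Returns 1 if fd carries O_SYNC, 0 if it does not, -1 if fcntl() fails. */
int fd_is_osync(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return -1;
	return (fl & O_SYNC) == O_SYNC;
}
```

Only write() calls issued through the first descriptor are expected to wait for the data to reach the disk. The Linux fcntl(2) manual does not list O_SYNC among the flags F_SETFL can change, which matches the statement that it has to be chosen at open() time.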

2005-03-07 17:31:27

by Alan

Subject: Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

The IDE layer default is still unfortunately broken and leaves write
caching enabled. Turn it off with hdparm.

2005-03-07 23:04:38

by Junfeng Yang

Subject: Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?


FiSC can still get those warnings with hdparm -W 0, or with a simple
ramdisk that serves the disk requests whenever they are submitted.

Thanks,
-Junfeng

On Mon, 7 Mar 2005, Alan Cox wrote:

> The IDE layer default is still unfortunately broken and leaves write
> caching enabled. Turn it off with hdparm.
>

2005-03-08 00:32:16

by Bernd Eckenfels

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

In article <[email protected]> you wrote:
> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
> Only the second set of writes is synchronous?

I also am curious if one can open a file, write to it, close it, open it and
do fsync()/fdatasync() on it?

Greetings
Bernd

2005-03-19 22:35:06

by Florian Weimer

Subject: Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?

* Bernd Eckenfels:

> In article <[email protected]> you wrote:
>> 3. I open a file w/o O_SYNC, issue a bunch of writes, then call
>> ioctl(FIOASYNC) to set the fd sync, then issue a second set of writes.
>> Only the second set of writes is synchronous?
>
> I also am curious if one can open a file, write to it, close it, open it and
> do fsync()/fdatasync() on it?

Hopefully the fsync/fdatasync call will flush all previous writes
(even from other processes). Berkeley DB relies on this behavior for
correct operation.
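
The call sequence in question (write, close, reopen, fsync) looks like the sketch below. It relies on the Linux fsync(2) description, which speaks of the modified in-core data of the file the descriptor refers to rather than of writes made through that particular descriptor. The code shows the sequence only and cannot itself prove durability; the function name is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write data through one descriptor, close it, then flush the file
 * through a second, freshly opened descriptor.
 * Returns 0 on success, -1 on error.
 */
int write_close_reopen_fsync(const char *path, const char *data)
{
	size_t len = strlen(data);
	int fd, ret;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	if (close(fd) != 0)                 /* close() alone does not flush to disk */
		return -1;

	fd = open(path, O_WRONLY);          /* new descriptor, same file */
	if (fd < 0)
		return -1;
	ret = fsync(fd);                    /* flush whatever is dirty for this file */
	close(fd);
	return ret;
}
```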