2008-05-12 15:56:29

by Eric Sandeen

[permalink] [raw]
Subject: barriers off by default?

As I look at my shiny new 500G disks with 32MB of cache, I find myself
wondering why the default for ext3 and ext4 is to have barriers disabled.

This is a pretty dangerous default w.r.t. filesystem integrity on power
loss, no?

Thanks,

-Eric


2008-05-12 17:30:05

by Eric Anopolsky

[permalink] [raw]
Subject: Re: barriers off by default?

On Mon, May 12, 2008 at 9:56 AM, Eric Sandeen <[email protected]> wrote:
> As I look at my shiny new 500G disks with 32MB of cache, I find myself
> wondering why the default for ext3 and ext4 is to have barriers disabled.
>
> This is a pretty dangerous default w.r.t. filesystem integrity on power
> loss, no?

>From reading the documentation, I was under the impression that write
barriers don't always do what they're supposed to do.

Cheers,
Eric

2008-05-12 17:53:04

by Eric Sandeen

[permalink] [raw]
Subject: Re: barriers off by default?

Eric A wrote:
> On Mon, May 12, 2008 at 9:56 AM, Eric Sandeen <[email protected]> wrote:
>> As I look at my shiny new 500G disks with 32MB of cache, I find myself
>> wondering why the default for ext3 and ext4 is to have barriers disabled.
>>
>> This is a pretty dangerous default w.r.t. filesystem integrity on power
>> loss, no?
>
> From reading the documentation, I was under the impression that write
> barriers don't always do what they're supposed to do.

Which documentation?

Thanks,
-Eric

2008-05-15 14:43:56

by Jan Kara

[permalink] [raw]
Subject: Re: barriers off by default?

> As I look at my shiny new 500G disks with 32MB of cache, I find myself
> wondering why the default for ext3 and ext4 is to have barriers disabled.
>
> This is a pretty dangerous default w.r.t. filesystem integrity on power
> loss, no?
JFYI: SUSE kernel carries for ages a patch which changes this default.
I'd be more than happy to drop it ;).

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2008-05-15 16:00:43

by Eric Sandeen

[permalink] [raw]
Subject: Re: barriers off by default?

Jan Kara wrote:
>> As I look at my shiny new 500G disks with 32MB of cache, I find myself
>> wondering why the default for ext3 and ext4 is to have barriers disabled.
>>
>> This is a pretty dangerous default w.r.t. filesystem integrity on power
>> loss, no?
> JFYI: SUSE kernel carries for ages a patch which changes this default.
> I'd be more than happy to drop it ;).
>
> Honza

Did you ever send it upstream? I'd ACK it :)

(I have a copy of it here... the fsync change seems sane too...)

@@ -85,7 +86,10 @@ int ext3_sync_file(struct file * file, s
.sync_mode = WB_SYNC_ALL,
.nr_to_write = 0, /* sys_fsync did this */
};
+ journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
ret = sync_inode(inode, &wbc);
+ if (journal && (journal->j_flags & JFS_BARRIER))
+ blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
}
out:

I assume we need that for some power-plug-pull scenarios ... in fact I
had just been meaning to do something similar after reading an old
thread on barriers. reiserfs & xfs do this already in their sync paths.

Thanks,

-Eric

2008-05-15 16:21:44

by Andreas Dilger

[permalink] [raw]
Subject: Re: barriers off by default?

On May 15, 2008 11:00 -0500, Eric Sandeen wrote:
> ... the fsync change seems sane too...
>
> @@ -85,7 +86,10 @@ int ext3_sync_file(struct file * file, s
> .sync_mode = WB_SYNC_ALL,
> .nr_to_write = 0, /* sys_fsync did this */
> };
> + journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
> ret = sync_inode(inode, &wbc);
> + if (journal && (journal->j_flags & JFS_BARRIER))
> + blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
> }
> out:
>
> I assume we need that for some power-plug-pull scenarios ... in fact I
> had just been meaning to do something similar after reading an old
> thread on barriers. reiserfs & xfs do this already in their sync paths.

Wouldn't it make more sense to add this into the generic do_fsync()
routine?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-05-15 18:34:31

by Eric Sandeen

[permalink] [raw]
Subject: Re: barriers off by default?

Andreas Dilger wrote:
> On May 15, 2008 11:00 -0500, Eric Sandeen wrote:
>> ... the fsync change seems sane too...
>>
>> @@ -85,7 +86,10 @@ int ext3_sync_file(struct file * file, s
>> .sync_mode = WB_SYNC_ALL,
>> .nr_to_write = 0, /* sys_fsync did this */
>> };
>> + journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
>> ret = sync_inode(inode, &wbc);
>> + if (journal && (journal->j_flags & JFS_BARRIER))
>> + blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
>> }
>> out:
>>
>> I assume we need that for some power-plug-pull scenarios ... in fact I
>> had just been meaning to do something similar after reading an old
>> thread on barriers. reiserfs & xfs do this already in their sync paths.
>
> Wouldn't it make more sense to add this into the generic do_fsync()
> routine?

blkdev_issue_flush() depends on barrier support:

submit_bio(1 << BIO_RW_BARRIER, bio);

and I think only the filesystem can know for sure if it has barriers
currently in use and available?

-Eric

2008-05-15 20:44:13

by Eric Sandeen

[permalink] [raw]
Subject: Re: barriers off by default?

Jan Kara wrote:

>> As I look at my shiny new 500G disks with 32MB of cache, I find myself
>> wondering why the default for ext3 and ext4 is to have barriers disabled.
>>
>> This is a pretty dangerous default w.r.t. filesystem integrity on power
>> loss, no?
>>
> JFYI: SUSE kernel carries for ages a patch which changes this default.
> I'd be more than happy to drop it ;).
>
> Honza
>
What do folks think of this?

the show_options change is a little funky since jbd may do a test
write and fail... (actually I was thinking maybe at fill_super we
should do a test barrier write and get it out of the way early...)

-Eric


diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index b45f3c1..daab1f5 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -52,8 +52,16 @@ commit=nrsec (*) Ext3 can be told to sync all its data and metadata
Setting it to very large values will improve
performance.

-barrier=1 This enables/disables barriers. barrier=0 disables
- it, barrier=1 enables it.
+barrier=<0|1(*)> This enables/disables the use of write barriers in
+ the jbd code. barrier=0 disables, barrier=1 enables.
+ This also requires an IO stack which can support
+ barriers, and if jbd gets an error on a barrier
+ write, it will disable again with a warning.
+ Write barriers enforce proper on-disk ordering
+ of journal commits, making volatile disk write caches
+ safe to use, at some performance penalty. If
+ your disks are battery-backed in one way or another,
+ disabling barriers may safely improve performance.

orlov (*) This enables the new Orlov block allocator. It is
enabled by default.
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index fe3119a..d06e0f3 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -555,6 +555,7 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
struct super_block *sb = vfs->mnt_sb;
struct ext3_sb_info *sbi = EXT3_SB(sb);
struct ext3_super_block *es = sbi->s_es;
+ journal_t *journal = EXT3_SB(sb)->s_journal;
unsigned long def_mount_opts;

def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
@@ -613,8 +614,16 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_printf(seq, ",commit=%u",
(unsigned) (sbi->s_commit_interval / HZ));
}
- if (test_opt(sb, BARRIER))
- seq_puts(seq, ",barrier=1");
+ if (!test_opt(sb, BARRIER)) {
+ seq_puts(seq, ",barrier=0");
+ } else {
+ /*
+ * jbd inherits the barrier flag from ext3, and jbd may actually
+ * turn off barriers if a write fails, so it's the real test.
+ */
+ if (journal && !(journal->j_flags & JFS_BARRIER))
+ seq_puts(seq, ",barrier=1(failed)");
+ }
if (test_opt(sb, NOBH))
seq_puts(seq, ",nobh");

@@ -1589,6 +1598,7 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
sbi->s_resgid = le16_to_cpu(es->s_def_resgid);

set_opt(sbi->s_mount_opt, RESERVATION);
+ set_opt(sbi->s_mount_opt, BARRIER);

if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
NULL, 0))


2008-05-16 00:21:47

by Jan Kara

[permalink] [raw]
Subject: Re: barriers off by default?

On Thu 15-05-08 15:44:04, Eric Sandeen wrote:
> Jan Kara wrote:
>
> >> As I look at my shiny new 500G disks with 32MB of cache, I find myself
> >> wondering why the default for ext3 and ext4 is to have barriers disabled.
> >>
> >> This is a pretty dangerous default w.r.t. filesystem integrity on power
> >> loss, no?
> >>
> > JFYI: SUSE kernel carries for ages a patch which changes this default.
> > I'd be more than happy to drop it ;).
> >
> > Honza
> >
> What do folks think of this?
>
> the show_options change is a little funky since jbd may do a test
> write and fail... (actually I was thinking maybe at fill_super we
> should do a test barrier write and get it out of the way early...)
Yes, the patch looks fine with me.

Acked-by: Jan Kara <[email protected]>

> diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
> index b45f3c1..daab1f5 100644
> --- a/Documentation/filesystems/ext3.txt
> +++ b/Documentation/filesystems/ext3.txt
> @@ -52,8 +52,16 @@ commit=nrsec (*) Ext3 can be told to sync all its data and metadata
> Setting it to very large values will improve
> performance.
>
> -barrier=1 This enables/disables barriers. barrier=0 disables
> - it, barrier=1 enables it.
> +barrier=<0|1(*)> This enables/disables the use of write barriers in
> + the jbd code. barrier=0 disables, barrier=1 enables.
> + This also requires an IO stack which can support
> + barriers, and if jbd gets an error on a barrier
> + write, it will disable again with a warning.
> + Write barriers enforce proper on-disk ordering
> + of journal commits, making volatile disk write caches
> + safe to use, at some performance penalty. If
> + your disks are battery-backed in one way or another,
> + disabling barriers may safely improve performance.
>
> orlov (*) This enables the new Orlov block allocator. It is
> enabled by default.
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index fe3119a..d06e0f3 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -555,6 +555,7 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
> struct super_block *sb = vfs->mnt_sb;
> struct ext3_sb_info *sbi = EXT3_SB(sb);
> struct ext3_super_block *es = sbi->s_es;
> + journal_t *journal = EXT3_SB(sb)->s_journal;
> unsigned long def_mount_opts;
>
> def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
> @@ -613,8 +614,16 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
> seq_printf(seq, ",commit=%u",
> (unsigned) (sbi->s_commit_interval / HZ));
> }
> - if (test_opt(sb, BARRIER))
> - seq_puts(seq, ",barrier=1");
> + if (!test_opt(sb, BARRIER)) {
> + seq_puts(seq, ",barrier=0");
> + } else {
> + /*
> + * jbd inherits the barrier flag from ext3, and jbd may actually
> + * turn off barriers if a write fails, so it's the real test.
> + */
> + if (journal && !(journal->j_flags & JFS_BARRIER))
> + seq_puts(seq, ",barrier=1(failed)");
> + }
> if (test_opt(sb, NOBH))
> seq_puts(seq, ",nobh");
>
> @@ -1589,6 +1598,7 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
> sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
>
> set_opt(sbi->s_mount_opt, RESERVATION);
> + set_opt(sbi->s_mount_opt, BARRIER);
>
> if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
> NULL, 0))
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-05-16 00:58:19

by Eric Sandeen

[permalink] [raw]
Subject: Re: barriers off by default?

Jan Kara wrote:
> On Thu 15-05-08 15:44:04, Eric Sandeen wrote:


>> + if (!test_opt(sb, BARRIER)) {
>> + seq_puts(seq, ",barrier=0");
>> + } else {
>> + /*
>> + * jbd inherits the barrier flag from ext3, and jbd may actually
>> + * turn off barriers if a write fails, so it's the real test.
>> + */
>> + if (journal && !(journal->j_flags & JFS_BARRIER))
>> + seq_puts(seq, ",barrier=1(failed)");
>> + }

Actually, since /proc/mounts should substitute for mtab I should not be
putting chatty informational messages in there, should I!

/*
* jbd inherits the barrier flag from ext3, and jbd may actually
* turn off barriers if a write fails, so it's the real test.
*/
if (!test_opt(sb, BARRIER) ||
(journal && !(journal->j_flags & JFS_BARRIER)))
seq_puts(seq, ",barrier=0");

is a better plan.

-Eric

2008-05-17 17:38:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: barriers off by default?

Given the discussion on the other thread about the performance impact
and hard-to-hit nature of barrier problems, how about this?

- Ted

ext4: Add EXT4_DEFM_BARRIER and EXT4_DEFM_I_VERSION default mount options

Allow the i_version and barrier mount options to be set by default via
flags in the ext4 superblock.

Signed-off-by: "Theodore Ts'o" <[email protected]>
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b7ffb91..afeba1f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -785,6 +785,8 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
#define EXT4_DEFM_JMODE_DATA 0x0020
#define EXT4_DEFM_JMODE_ORDERED 0x0040
#define EXT4_DEFM_JMODE_WBACK 0x0060
+#define EXT4_DEFM_BARRIER 0x0080
+#define EXT4_DEFM_I_VERSION 0x0100

/*
* Structure of a directory entry
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 09d9359..c0a657c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -729,7 +729,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_printf(seq, ",commit=%u",
(unsigned) (sbi->s_commit_interval / HZ));
}
- if (test_opt(sb, BARRIER))
+ if (test_opt(sb, BARRIER) && !(def_mount_opts & EXT4_DEFM_BARRIER))
seq_puts(seq, ",barrier=1");
if (test_opt(sb, NOBH))
seq_puts(seq, ",nobh");
@@ -737,7 +737,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",noextents");
if (!test_opt(sb, MBALLOC))
seq_puts(seq, ",nomballoc");
- if (test_opt(sb, I_VERSION))
+ if (test_opt(sb, I_VERSION) && !(def_mount_opts & EXT4_DEFM_I_VERSION))
seq_puts(seq, ",i_version");

if (sbi->s_stripe)
@@ -1896,6 +1896,12 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_WBACK)
sbi->s_mount_opt |= EXT4_MOUNT_WRITEBACK_DATA;

+ if (def_mount_opts & EXT4_DEFM_BARRIER)
+ set_opt(sbi->s_mount_opt, BARRIER);
+
+ if (def_mount_opts & EXT4_DEFM_I_VERSION)
+ sb->s_flags |= MS_I_VERSION;
+
if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC)
set_opt(sbi->s_mount_opt, ERRORS_PANIC);
else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_CONTINUE)

2008-05-17 19:03:00

by Eric Sandeen

[permalink] [raw]
Subject: Re: barriers off by default?

Theodore Tso wrote:
> Given the discussion on the other thread about the performance impact
> and hard-to-hit nature of barrier problems, how about this?

I still want to do a bit of testing to see if it's really that hard to
hit :) (trying to hook up my x10-power-killer is a challenge with no
serial ports....)

But in any case, adding these to potential default mount options sounds
good to me.

-Eric

> - Ted
>
> ext4: Add EXT4_DEFM_BARRIER and EXT4_DEFM_I_VERSION default mount options
>
> Allow the i_version and barrier mount options to be set by default via
> flags in the ext4 superblock.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b7ffb91..afeba1f 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -785,6 +785,8 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
> #define EXT4_DEFM_JMODE_DATA 0x0020
> #define EXT4_DEFM_JMODE_ORDERED 0x0040
> #define EXT4_DEFM_JMODE_WBACK 0x0060
> +#define EXT4_DEFM_BARRIER 0x0080
> +#define EXT4_DEFM_I_VERSION 0x0100
>
> /*
> * Structure of a directory entry
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 09d9359..c0a657c 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -729,7 +729,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
> seq_printf(seq, ",commit=%u",
> (unsigned) (sbi->s_commit_interval / HZ));
> }
> - if (test_opt(sb, BARRIER))
> + if (test_opt(sb, BARRIER) && !(def_mount_opts & EXT4_DEFM_BARRIER))
> seq_puts(seq, ",barrier=1");
> if (test_opt(sb, NOBH))
> seq_puts(seq, ",nobh");
> @@ -737,7 +737,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
> seq_puts(seq, ",noextents");
> if (!test_opt(sb, MBALLOC))
> seq_puts(seq, ",nomballoc");
> - if (test_opt(sb, I_VERSION))
> + if (test_opt(sb, I_VERSION) && !(def_mount_opts & EXT4_DEFM_I_VERSION))
> seq_puts(seq, ",i_version");
>
> if (sbi->s_stripe)
> @@ -1896,6 +1896,12 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
> else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_WBACK)
> sbi->s_mount_opt |= EXT4_MOUNT_WRITEBACK_DATA;
>
> + if (def_mount_opts & EXT4_DEFM_BARRIER)
> + set_opt(sbi->s_mount_opt, BARRIER);
> +
> + if (def_mount_opts & EXT4_DEFM_I_VERSION)
> + sb->s_flags |= MS_I_VERSION;
> +
> if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC)
> set_opt(sbi->s_mount_opt, ERRORS_PANIC);
> else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_CONTINUE)