This patch-set makes ext4 file-system stop using the VFS '->write_supers()'
call-back and the '->s_dirt' superblock field because I plan to remove them
once all users are gone.
Most of the ext4 changes were done or suggested by Jan Kara (thanks!). This
patch does not do anything complex - we basically notice that ext4 does not
really needed the 'write_super()' functionality and we can remove it. We
mark the superblock buffer as dirty directly, rather than postponing this till
the next 'sync_supers()' kernel thread wake-up.
However, this means that we'll end up with calculating the checksum more often,
see patch 3 for more information.
I've also dropped the fifth patch which removes the 'now' argument from
'__ext4_handle_dirty_super()' - simply because I do not understand why we
should prefer marking the buffer as dirty directly instead of calling
'ext4_commit_super()': the speed difference must be neglectable and
'ext4_commit_super()' performs some error checking. I can submit this patch
later - just need some more explanation from Jan.
Tested with xfstests (both journalled and non-journalled ext4 modes).
Reminder:
The goal is to get rid of the 'sync_supers()' kernel thread. This kernel thread
wakes up every 5 seconds (by default) and calls '->write_super()' for all
mounted file-systems. And the bad thing is that this is done even if all the
superblocks are clean. Moreover, many file-systems do not even need this end
they do not register the '->write_super()' method at all (e.g., btrfs).
So 'sync_supers()' most often just generates useless wake-ups and wastes power.
I am trying to make all file-systems independent of '->write_super()' and plan
to remove 'sync_supers()' and '->write_super()' completely once there are no
more users.
======
Overall status:
1. ext4: patches submitted,
https://lkml.org/lkml/2012/7/3/210
2. exofs: patch submitted,
https://lkml.org/lkml/2012/6/4/211
3. sysv: patches submitted,
http://lkml.org/lkml/2012/7/3/250
4. udf: patch submitted, sits in Jan Kara's tree:
https://lkml.org/lkml/2012/6/4/233
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs for_testing
5. affs: patches submitted, sit in Al Viro's tree:
https://lkml.org/lkml/2012/6/6/400
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs for-next
6. hfs: patches submitted, sit Andrew Morton's tree
http://lkml.org/lkml/2012/6/12/82
7. hfsplus: patches submitted, sit in Andrew Morton's tree:
https://lkml.org/lkml/2012/6/13/195
8. ext2: done, see commit f72cf5e223a28d3b3ea7dc9e40464fd534e359e8
9. vfat: done, see commit 78491189ddb6d84d4a4abae992ed891a236d0263
10. jffs2: done, see commit 208b14e507c00ff7f108e1a388dd3d8cc805a443
11. reiserfs: done, see commit 033369d1af1264abc23bea2e174aa47cdd212f6f
TODO: ufs
======
fs/ext4/ext4.h | 10 +---------
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/file.c | 14 +++++++++++++-
fs/ext4/ialloc.c | 2 --
fs/ext4/mballoc.c | 2 --
fs/ext4/super.c | 15 ++-------------
6 files changed, 17 insertions(+), 28 deletions(-)
Thanks,
Artem.
From: Jan Kara <[email protected]>
The last user of ext4_mark_super_dirty() in ext4_file_open() is so rare it
can well be modifying the superblock properly by journalling the change.
Change it and get rid of ext4_mark_super_dirty() as it's not needed anymore.
Artem: small amendments.
Artem: tested using xfstests for both journalled and non-journalled ext4.
Signed-off-by: Jan Kara <[email protected]>
Tested-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/ext4.h | 9 ---------
fs/ext4/file.c | 14 +++++++++++++-
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index cfc4e01..0c4042e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2321,15 +2321,6 @@ static inline void ext4_unlock_group(struct super_block *sb,
spin_unlock(ext4_group_lock_ptr(sb, group));
}
-static inline void ext4_mark_super_dirty(struct super_block *sb)
-{
- struct ext4_super_block *es = EXT4_SB(sb)->s_es;
-
- ext4_superblock_csum_set(sb, es);
- if (EXT4_SB(sb)->s_journal == NULL)
- sb->s_dirt =1;
-}
-
/*
* Block validity checking
*/
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8c7642a..3a77494 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -181,9 +181,21 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
path.dentry = mnt->mnt_root;
cp = d_path(&path, buf, sizeof(buf));
if (!IS_ERR(cp)) {
+ handle_t *handle;
+ int err;
+
+ handle = ext4_journal_start_sb(sb, 1);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ err = ext4_journal_get_write_access(handle, sbi->s_sbh);
+ if (err) {
+ ext4_journal_stop(handle);
+ return err;
+ }
strlcpy(sbi->s_es->s_last_mounted, cp,
sizeof(sbi->s_es->s_last_mounted));
- ext4_mark_super_dirty(sb);
+ ext4_handle_dirty_super(handle, sb);
+ ext4_journal_stop(handle);
}
}
/*
--
1.7.7.6
From: Artem Bityutskiy <[email protected]>
We do not depend on VFS's '->write_super()' anymore and do not need the
's_dirt' flag anymore. This patch weeds out 'ext4_write_super()' and the
's_dirt' usage in VFS.
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/super.c | 10 ----------
1 files changed, 0 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9b26ba0..546c6ed 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -73,7 +73,6 @@ static const char *ext4_decode_error(struct super_block *sb, int errno,
static int ext4_remount(struct super_block *sb, int *flags, char *data);
static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf);
static int ext4_unfreeze(struct super_block *sb);
-static void ext4_write_super(struct super_block *sb);
static int ext4_freeze(struct super_block *sb);
static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
const char *dev_name, void *data);
@@ -1193,7 +1192,6 @@ static const struct super_operations ext4_nojournal_sops = {
.dirty_inode = ext4_dirty_inode,
.drop_inode = ext4_drop_inode,
.evict_inode = ext4_evict_inode,
- .write_super = ext4_write_super,
.put_super = ext4_put_super,
.statfs = ext4_statfs,
.remount_fs = ext4_remount,
@@ -4202,7 +4200,6 @@ int ext4_commit_super(struct super_block *sb, int sync)
es->s_free_inodes_count =
cpu_to_le32(percpu_counter_sum_positive(
&EXT4_SB(sb)->s_freeinodes_counter));
- sb->s_dirt = 0;
BUFFER_TRACE(sbh, "marking dirty");
ext4_superblock_csum_set(sb, es);
mark_buffer_dirty(sbh);
@@ -4309,13 +4306,6 @@ int ext4_force_commit(struct super_block *sb)
return ret;
}
-static void ext4_write_super(struct super_block *sb)
-{
- lock_super(sb);
- ext4_commit_super(sb, 1);
- unlock_super(sb);
-}
-
static int ext4_sync_fs(struct super_block *sb, int wait)
{
int ret = 0;
--
1.7.7.6
From: Artem Bityutskiy <[email protected]>
This patch changes the '__ext4_handle_dirty_super()' function which is used
by ext4 to update the superblock via the journal in the following cases:
1. When creating the first large file on a file system without
EXT4_FEATURE_RO_COMPAT_LARGE_FILE feature.
2. When re-sizing the file-system.
3. When creating an xattr on a file-system without the
EXT4_FEATURE_COMPAT_EXT_ATTR feature.
4. When adding or deleting an orphan (because we update the 's_last_orphan'
superblock field). This happens actually on every delete operation.
This function, however, falls back to just marking the superblock as dirty
if the file-system has no journal. This means that we delay the actual
superblock I/O submission by 5 seconds (roughly speaking). Namely, the
'sync_supers()' kernel thread will call 'ext4_write_super()' later, where
we actually will submit the superblock down to the media.
However:
1. For cases 1-3 it does not add any value to delay the I/O submission. These
events are rare and we may just commit submit the superblock for
asynchronous I/O right away.
2. But case N4 is a bit less obvious.
a) If checksumming is disabled, then it is probably all-right to just mark
the superblock buffer as dirty right away.
b) If checksumming is enabled, then we'll end up doing more work on deletion
because we'll end up with re-calculating checksum on every deletion.
But I think that the right way to solve this issue is to calculate
checksum at the time when the buffer is actually sent to the I/O queue.
If the checksum is a function of the buffer plus some seed data, we
should be able to implement some kind of call-back which will be called
for the buffer just before it is sent to the I/O queue.
So it looks like my change in the b) case is exposing an existing flaw,
not introducing a new one. And this is why I think it is OK.
This patch also removes 's_dirt' condition on the unmount path because we never
set it anymore, so we should not test it.
Tested using xfstests for both journalled and non-journalled ext4.
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/super.c | 5 ++---
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0c4042e..b2439d5 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2041,6 +2041,7 @@ extern int ext4_superblock_csum_verify(struct super_block *sb,
struct ext4_super_block *es);
extern void ext4_superblock_csum_set(struct super_block *sb,
struct ext4_super_block *es);
+extern int ext4_commit_super(struct super_block *sb, int sync);
extern void *ext4_kvmalloc(size_t size, gfp_t flags);
extern void *ext4_kvzalloc(size_t size, gfp_t flags);
extern void ext4_kvfree(void *ptr);
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 90f7c2e..27354df 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -156,6 +156,6 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
(struct ext4_super_block *)bh->b_data);
mark_buffer_dirty(bh);
} else
- sb->s_dirt = 1;
+ err = ext4_commit_super(sb, 0);
return err;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb7aa3e..9b26ba0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -63,7 +63,6 @@ static struct ext4_features *ext4_feat;
static int ext4_load_journal(struct super_block *, struct ext4_super_block *,
unsigned long journal_devnum);
static int ext4_show_options(struct seq_file *seq, struct dentry *root);
-static int ext4_commit_super(struct super_block *sb, int sync);
static void ext4_mark_recovery_complete(struct super_block *sb,
struct ext4_super_block *es);
static void ext4_clear_journal_err(struct super_block *sb,
@@ -896,7 +895,7 @@ static void ext4_put_super(struct super_block *sb)
EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
es->s_state = cpu_to_le16(sbi->s_mount_state);
}
- if (sb->s_dirt || !(sb->s_flags & MS_RDONLY))
+ if (!(sb->s_flags & MS_RDONLY))
ext4_commit_super(sb, 1);
if (sbi->s_proc) {
@@ -4155,7 +4154,7 @@ static int ext4_load_journal(struct super_block *sb,
return 0;
}
-static int ext4_commit_super(struct super_block *sb, int sync)
+int ext4_commit_super(struct super_block *sb, int sync)
{
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
--
1.7.7.6
From: Jan Kara <[email protected]>
Commit a0375156 properly notes that superblock doesn't need to be marked
as dirty when only number of free inodes / blocks / number of directories
changes since that is recomputed on each mount anyway. However that comment
leaves some unnecessary markings as dirty in place. Remove these.
Artem: tested using xfstests for both journalled and non-journalled ext4.
Signed-off-by: Jan Kara <[email protected]>
Tested-by: Artem Bityutskiy <[email protected]>
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/ext4/ialloc.c | 2 --
fs/ext4/mballoc.c | 2 --
2 files changed, 0 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index d48e8b1..25b918c 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -315,7 +315,6 @@ out:
err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
if (!fatal)
fatal = err;
- ext4_mark_super_dirty(sb);
} else
ext4_error(sb, "bit already cleared for inode %lu", ino);
@@ -830,7 +829,6 @@ got:
percpu_counter_dec(&sbi->s_freeinodes_counter);
if (S_ISDIR(mode))
percpu_counter_inc(&sbi->s_dirs_counter);
- ext4_mark_super_dirty(sb);
if (sbi->s_log_groups_per_flex) {
flex_group = ext4_flex_group(sbi, group);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1cd6994..eabfb4c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2825,7 +2825,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
out_err:
- ext4_mark_super_dirty(sb);
brelse(bitmap_bh);
return err;
}
@@ -4694,7 +4693,6 @@ do_more:
put_bh(bitmap_bh);
goto do_more;
}
- ext4_mark_super_dirty(sb);
error_return:
brelse(bitmap_bh);
ext4_std_error(sb, err);
--
1.7.7.6