This patch-set makes the HFSPLUS file-system stop using the VFS '->write_super()'
call-back and the '->s_dirt' superblock field, because I plan to remove them
once all users are gone.
Like the other similar patch-sets, we switch to a delayed job for writing out
the superblock instead of using the 's_dirt' flag. Additionally, this patch-set
includes several clean-ups.
Reminder:
The goal is to get rid of the 'sync_supers()' kernel thread. This kernel thread
wakes up every 5 seconds (by default) and calls '->write_super()' for all
mounted file-systems. The bad thing is that this is done even if all the
superblocks are clean. Moreover, many file-systems do not even need this, as
they do not register the '->write_super()' method at all (e.g., btrfs).
So 'sync_supers()' most often just generates useless wake-ups and wastes power.
I am trying to make all file-systems independent of '->write_super()' and plan
to remove 'sync_supers()' and '->write_super()' completely once there are no
more users.
Al, in the past I tried to upstream patches which optimized 'sync_supers()',
but you wanted me to kill it completely instead, which I am trying to do
now, see http://lkml.org/lkml/2010/7/22/96
Tested using the fsstress test from the LTP project.
======
Overall status:
1. ext4: patches submitted, waiting for review from Ted Ts'o:
https://lkml.org/lkml/2012/4/2/111
2. hfs: patches submitted, see
http://lkml.org/lkml/2012/6/12/82
3. hfsplus: this patch-set
4. udf: patch submitted, should be in Jan Kara's tree:
https://lkml.org/lkml/2012/6/4/233
5. exofs: patch submitted, not sure if it will go to the exofs tree:
https://lkml.org/lkml/2012/6/4/211
6. affs: patches submitted, should be in Al Viro's tree:
https://lkml.org/lkml/2012/6/6/400
7. ext2: done, see commit f72cf5e223a28d3b3ea7dc9e40464fd534e359e8
8. vfat: done, see commit 78491189ddb6d84d4a4abae992ed891a236d0263
9. jffs2: done, see commit 208b14e507c00ff7f108e1a388dd3d8cc805a443
10. reiserfs: done, see commit 033369d1af1264abc23bea2e174aa47cdd212f6f
TODO: sysv, ufs
======
fs/hfsplus/bitmap.c | 4 ++--
fs/hfsplus/dir.c | 2 +-
fs/hfsplus/hfsplus_fs.h | 7 +++++--
fs/hfsplus/inode.c | 6 +++---
fs/hfsplus/super.c | 46 ++++++++++++++++++++++++++++++++++------------
5 files changed, 45 insertions(+), 20 deletions(-)
Thanks,
Artem.
From: Artem Bityutskiy <[email protected]>
This patch makes hfsplus stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.
The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no dirty superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.
Tested using fsstress from the LTP project.
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/hfsplus/bitmap.c | 4 ++--
fs/hfsplus/dir.c | 2 +-
fs/hfsplus/hfsplus_fs.h | 6 +++++-
fs/hfsplus/inode.c | 6 +++---
fs/hfsplus/super.c | 41 +++++++++++++++++++++++++++++++++--------
5 files changed, 44 insertions(+), 15 deletions(-)
diff --git a/fs/hfsplus/bitmap.c b/fs/hfsplus/bitmap.c
index 1cad80c..4cfbe2e 100644
--- a/fs/hfsplus/bitmap.c
+++ b/fs/hfsplus/bitmap.c
@@ -153,7 +153,7 @@ done:
kunmap(page);
*max = offset + (curr - pptr) * 32 + i - start;
sbi->free_blocks -= *max;
- sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(sb);
dprint(DBG_BITMAP, "-> %u,%u\n", start, *max);
out:
mutex_unlock(&sbi->alloc_mutex);
@@ -228,7 +228,7 @@ out:
set_page_dirty(page);
kunmap(page);
sbi->free_blocks += len;
- sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(sb);
mutex_unlock(&sbi->alloc_mutex);
return 0;
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index 26b53fb..8375504 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -316,7 +316,7 @@ static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
inode->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
sbi->file_count++;
- dst_dir->i_sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(dst_dir->i_sb);
out:
mutex_unlock(&sbi->vh_mutex);
return res;
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index 66a9365..558dbb4 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -153,8 +153,11 @@ struct hfsplus_sb_info {
gid_t gid;
int part, session;
-
unsigned long flags;
+
+ int work_queued; /* non-zero if delayed work is queued */
+ struct delayed_work sync_work; /* FS sync delayed work */
+ spinlock_t work_lock; /* protects sync_work and work_queued */
};
#define HFSPLUS_SB_WRITEBACKUP 0
@@ -428,6 +431,7 @@ int hfsplus_show_options(struct seq_file *, struct dentry *);
/* super.c */
struct inode *hfsplus_iget(struct super_block *, unsigned long);
+void hfsplus_mark_mdb_dirty(struct super_block *sb);
/* tables.c */
extern u16 hfsplus_case_fold_table[];
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 82b69ee..ad1f16f 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -431,7 +431,7 @@ struct inode *hfsplus_new_inode(struct super_block *sb, umode_t mode)
sbi->file_count++;
insert_inode_hash(inode);
mark_inode_dirty(inode);
- sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(sb);
return inode;
}
@@ -442,7 +442,7 @@ void hfsplus_delete_inode(struct inode *inode)
if (S_ISDIR(inode->i_mode)) {
HFSPLUS_SB(sb)->folder_count--;
- sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(sb);
return;
}
HFSPLUS_SB(sb)->file_count--;
@@ -455,7 +455,7 @@ void hfsplus_delete_inode(struct inode *inode)
inode->i_size = 0;
hfsplus_file_truncate(inode);
}
- sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(sb);
}
void hfsplus_inode_read_fork(struct inode *inode, struct hfsplus_fork_raw *fork)
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index f4f3d54..cb6b9c1 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -124,7 +124,7 @@ static int hfsplus_system_write_inode(struct inode *inode)
if (fork->total_size != cpu_to_be64(inode->i_size)) {
set_bit(HFSPLUS_SB_WRITEBACKUP, &sbi->flags);
- inode->i_sb->s_dirt = 1;
+ hfsplus_mark_mdb_dirty(inode->i_sb);
}
hfsplus_inode_write_fork(inode, fork);
if (tree)
@@ -173,7 +173,7 @@ static int hfsplus_sync_fs(struct super_block *sb, int wait)
dprint(DBG_SUPER, "hfsplus_sync_fs\n");
- sb->s_dirt = 0;
+ cancel_delayed_work(&sbi->sync_work);
/*
* Explicitly write out the special metadata inodes.
@@ -226,12 +226,34 @@ out:
return error;
}
-static void hfsplus_write_super(struct super_block *sb)
+static void delayed_sync_fs(struct work_struct *work)
{
- if (!(sb->s_flags & MS_RDONLY))
- hfsplus_sync_fs(sb, 1);
- else
- sb->s_dirt = 0;
+ struct hfsplus_sb_info *sbi;
+
+ sbi = container_of(work, struct hfsplus_sb_info, sync_work.work);
+
+ spin_lock(&sbi->work_lock);
+ sbi->work_queued = 0;
+ spin_unlock(&sbi->work_lock);
+
+ hfsplus_sync_fs(sbi->alloc_file->i_sb, 1);
+}
+
+void hfsplus_mark_mdb_dirty(struct super_block *sb)
+{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+ unsigned long delay;
+
+ if (sb->s_flags & MS_RDONLY)
+ return;
+
+ spin_lock(&sbi->work_lock);
+ if (!sbi->work_queued) {
+ delay = msecs_to_jiffies(dirty_writeback_interval * 10);
+ queue_delayed_work(system_long_wq, &sbi->sync_work, delay);
+ sbi->work_queued = 1;
+ }
+ spin_unlock(&sbi->work_lock);
}
static void hfsplus_put_super(struct super_block *sb)
@@ -240,6 +262,8 @@ static void hfsplus_put_super(struct super_block *sb)
dprint(DBG_SUPER, "hfsplus_put_super\n");
+ cancel_delayed_work_sync(&sbi->sync_work);
+
if (!(sb->s_flags & MS_RDONLY) && sbi->s_vhdr) {
struct hfsplus_vh *vhdr = sbi->s_vhdr;
@@ -325,7 +349,6 @@ static const struct super_operations hfsplus_sops = {
.write_inode = hfsplus_write_inode,
.evict_inode = hfsplus_evict_inode,
.put_super = hfsplus_put_super,
- .write_super = hfsplus_write_super,
.sync_fs = hfsplus_sync_fs,
.statfs = hfsplus_statfs,
.remount_fs = hfsplus_remount,
@@ -352,6 +375,8 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
sb->s_fs_info = sbi;
mutex_init(&sbi->alloc_mutex);
mutex_init(&sbi->vh_mutex);
+ spin_lock_init(&sbi->work_lock);
+ INIT_DELAYED_WORK(&sbi->sync_work, delayed_sync_fs);
hfsplus_fill_defaults(sbi);
err = -EINVAL;
--
1.7.7.6
From: Artem Bityutskiy <[email protected]>
Print correct function name in the debugging print of the
'hfsplus_sync_fs()' function.
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/hfsplus/super.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 5df771e..9e9c2788 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -171,7 +171,7 @@ static int hfsplus_sync_fs(struct super_block *sb, int wait)
if (!wait)
return 0;
- dprint(DBG_SUPER, "hfsplus_write_super\n");
+ dprint(DBG_SUPER, "hfsplus_sync_fs\n");
sb->s_dirt = 0;
--
1.7.7.6
From: Artem Bityutskiy <[email protected]>
This check is useless because 'sb->s_fs_info' is always non-NULL here.
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/hfsplus/super.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 9e9c2788..f4f3d54 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -240,9 +240,6 @@ static void hfsplus_put_super(struct super_block *sb)
dprint(DBG_SUPER, "hfsplus_put_super\n");
- if (!sb->s_fs_info)
- return;
-
if (!(sb->s_flags & MS_RDONLY) && sbi->s_vhdr) {
struct hfsplus_vh *vhdr = sbi->s_vhdr;
--
1.7.7.6
From: Artem Bityutskiy <[email protected]>
... because it is used only in fs/hfsplus/super.c.
Signed-off-by: Artem Bityutskiy <[email protected]>
---
fs/hfsplus/hfsplus_fs.h | 1 -
fs/hfsplus/super.c | 2 +-
2 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index 4e75ac6..66a9365 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -428,7 +428,6 @@ int hfsplus_show_options(struct seq_file *, struct dentry *);
/* super.c */
struct inode *hfsplus_iget(struct super_block *, unsigned long);
-int hfsplus_sync_fs(struct super_block *sb, int wait);
/* tables.c */
extern u16 hfsplus_case_fold_table[];
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index a9bca4b..5df771e 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -161,7 +161,7 @@ static void hfsplus_evict_inode(struct inode *inode)
}
}
-int hfsplus_sync_fs(struct super_block *sb, int wait)
+static int hfsplus_sync_fs(struct super_block *sb, int wait)
{
struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
struct hfsplus_vh *vhdr = sbi->s_vhdr;
--
1.7.7.6
On Wed, 2012-06-13 at 14:37 +0300, Artem Bityutskiy wrote:
> This patch-set makes the HFSPLUS file-system stop using the VFS '->write_super()'
> call-back and the '->s_dirt' superblock field, because I plan to remove them
> once all users are gone.
Hi, Al or Andrew, would you please consider picking this patch-set?
--
Best Regards,
Artem Bityutskiy
On Wed, 13 Jun 2012 14:37:51 +0300
Artem Bityutskiy <[email protected]> wrote:
> From: Artem Bityutskiy <[email protected]>
>
> This patch makes hfsplus stop using the VFS '->write_super()' method along with
> the 's_dirt' superblock flag, because they are on their way out.
>
> The whole "superblock write-out" VFS infrastructure is served by the
> 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
> writes out all dirty superblocks using the '->write_super()' call-back. But the
> problem with this thread is that it wastes power by waking up the system every
> 5 seconds, even if there are no dirty superblocks, or there are no client
> file-systems which would need this (e.g., btrfs does not use
> '->write_super()'). So we want to kill it completely and thus, we need to make
> file-systems stop using the '->write_super()' VFS service, and then remove
> it together with the kernel thread.
>
> Tested using fsstress from the LTP project.
>
>
> ...
>
> --- a/fs/hfsplus/hfsplus_fs.h
> +++ b/fs/hfsplus/hfsplus_fs.h
> @@ -153,8 +153,11 @@ struct hfsplus_sb_info {
> gid_t gid;
>
> int part, session;
> -
> unsigned long flags;
> +
> + int work_queued; /* non-zero delayed work is queued */
This would be a little nicer if it had the bool type.
> + struct delayed_work sync_work; /* FS sync delayed work */
> + spinlock_t work_lock; /* protects sync_work and work_queued */
I'm not sure that this lock really needs to exist.
> -static void hfsplus_write_super(struct super_block *sb)
> +static void delayed_sync_fs(struct work_struct *work)
> {
> - if (!(sb->s_flags & MS_RDONLY))
> - hfsplus_sync_fs(sb, 1);
> - else
> - sb->s_dirt = 0;
> + struct hfsplus_sb_info *sbi;
> +
> + sbi = container_of(work, struct hfsplus_sb_info, sync_work.work);
> +
> + spin_lock(&sbi->work_lock);
> + sbi->work_queued = 0;
> + spin_unlock(&sbi->work_lock);
Here it is "protecting" a single write.
> + hfsplus_sync_fs(sbi->alloc_file->i_sb, 1);
> +}
> +
> +void hfsplus_mark_mdb_dirty(struct super_block *sb)
> +{
> + struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
> + unsigned long delay;
> +
> + if (sb->s_flags & MS_RDONLY)
> + return;
> +
> + spin_lock(&sbi->work_lock);
> + if (!sbi->work_queued) {
> + delay = msecs_to_jiffies(dirty_writeback_interval * 10);
> + queue_delayed_work(system_long_wq, &sbi->sync_work, delay);
> + sbi->work_queued = 1;
> + }
> + spin_unlock(&sbi->work_lock);
> }
And I think it could be made to go away here, perhaps by switching to
test_and_set_bit or similar.
And I wonder about the queue_delayed_work(). iirc this does nothing to
align timer expiries, so someone who has a lot of filesystems could end
up with *more* timer wakeups. Shouldn't we do something here to make
the system do larger amounts of work per timer expiry? Such as the
timer-slack infrastructure?
It strikes me that this whole approach improves the small system with
little write activity, but makes things worse for the larger system
with a lot of filesystems?
On Thu, 2012-06-21 at 12:41 -0700, Andrew Morton wrote:
> > + spin_lock(&sbi->work_lock);
> > + if (!sbi->work_queued) {
> > + delay = msecs_to_jiffies(dirty_writeback_interval * 10);
> > + queue_delayed_work(system_long_wq, &sbi->sync_work, delay);
> > + sbi->work_queued = 1;
> > + }
> > + spin_unlock(&sbi->work_lock);
> > }
>
> And I think it could be made to go away here, perhaps by switching to
> test_and_set_bit or similar.
Yes, you are right, and I thought about this. But I did not want to make it
more complicated than needed. A spinlock is simple and straightforward, and
this is not a fast path.
> And I wonder about the queue_delayed_work(). iirc this does nothing to
> align timer expiries, so someone who has a lot of filesystems could end
> up with *more* timer wakeups. Shouldn't we do something here to make
> the system do larger amounts of work per timer expiry? Such as the
> timer-slack infrastructure?
Well, I also thought about this. But again, I did not want to invent
anything complex, because the main file-systems - ext4, btrfs, xfs - do not
use 'write_super()' at all. What is left is only these dying / rare
file-systems like btrfs / hfs, so I did not feel like over-engineering
was needed.
If someone is affected by the extra timers, which I really doubt, we can
further optimize this using deferrable timers for file-systems which do not
sync anything super-important, and range hrtimers for file-systems which sync
something more or less important. The former is easy to do; the latter would
need improving the workqueue infrastructure.
> It strikes me that this whole approach improves the small system with
> little write activity, but makes things worse for the larger system
> with a lot of filesystems?
Well, if we have a lot of those rare FSes and not much write activity, then we
do not wake up. If we have a lot of activity, a few extra wake-ups probably do
not hurt. Of course there are situations where this would hurt, but again, I
doubt such systems are common, and that anyone really cares.
IOW, I do not dismiss your point, but I think we should rather see if it
affects anyone and then optimize with deferrable/range timers.
--
Best Regards,
Artem Bityutskiy
On Thu, 2012-06-21 at 23:30 +0300, Artem Bityutskiy wrote:
> Well, I also thought about this. But again, I did not want to invent
> anything complex, because the main file-systems - ext4, btrfs, xfs - do not
> use 'write_super()' at all.
Sorry, ext4 does use it in non-journal mode, but I have patches which kill it,
although Ted has not looked at them so far.
> What is left is only these dying / rare
> file-systems like btrfs / hfs, so I did not feel like over-engineering
> was needed.
Sorry, of course I meant reiserfs, not btrfs.
So what's left is hfs/hfsplus, affs, ufs, udf, reiserfs, jffs2. I could
optimize them if needed, but is this worth the effort?
--
Best Regards,
Artem Bityutskiy