2022-08-24 16:04:29

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes

ea_inodes are using i_version for storing part of the reference count so
we really need to leave it alone.

The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().

Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
---
v2, v3, v4: no change

fs/ext4/inode.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 601214453c3a..2a220be34caa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
}
ext4_fc_track_inode(handle, inode);

- if (IS_I_VERSION(inode))
+ /*
+ * ea_inodes are using i_version for storing reference count, don't
+ * mess with it
+ */
+ if (IS_I_VERSION(inode) &&
+ !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
inode_inc_iversion(inode);

/* the do_update_inode consumes one bh->b_count */
--
2.37.1


2022-08-24 16:04:31

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

Currently the I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
true, however ext4 will only update the on-disk inode in
->dirty_inode(), not on actual writeback. As a result if the inode
already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
into on-disk inode and will not get updated until the next I_DIRTY_INODE
update, which might never come if we crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally changes in
xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Lukas Czerner <[email protected]>
Suggested-by: Jan Kara <[email protected]>
---
v2: Reworked according to suggestions from Jan
v3: Update documentation, add comments, change flag to flags in
xfs_fs_dirty_inode()
v4: Update documentation, simplify condition in xfs_fs_dirty_inode()

Documentation/filesystems/vfs.rst | 3 +++
fs/fs-writeback.c | 34 ++++++++++++++++++++-----------
fs/xfs/xfs_super.c | 10 +++++++--
include/linux/fs.h | 9 ++++----
4 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 6cd6953e175b..b2ef2449aed9 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -274,6 +274,9 @@ or bottom half).
This is specifically for the inode itself being marked dirty,
not its data. If the update needs to be persisted by fdatasync(),
then I_DIRTY_DATASYNC will be set in the flags argument.
+ I_DIRTY_TIME will be set in the flags in case lazytime is enabled
+ and struct inode has times updated since the last ->dirty_inode
+ call.

``write_inode``
this method is called when the VFS needs to write an inode to
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
*/
if (!(inode->i_state & I_DIRTY_ALL))
inode_cgwb_move_to_attached(inode, wb);
- else if (!(inode->i_state & I_SYNC_QUEUED) &&
- (inode->i_state & I_DIRTY))
- redirty_tail_locked(inode, wb);
+ else if (!(inode->i_state & I_SYNC_QUEUED)) {
+ if ((inode->i_state & I_DIRTY))
+ redirty_tail_locked(inode, wb);
+ else if (inode->i_state & I_DIRTY_TIME) {
+ inode->dirtied_when = jiffies;
+ inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+ }
+ }

spin_unlock(&wb->list_lock);
inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags);

if (flags & I_DIRTY_INODE) {
+
+ /* Inode timestamp update will piggback on this dirtying */
+ if (inode->i_state & I_DIRTY_TIME) {
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & I_DIRTY_TIME) {
+ inode->i_state &= ~I_DIRTY_TIME;
+ flags |= I_DIRTY_TIME;
+ }
+ spin_unlock(&inode->i_lock);
+ }
+
/*
* Notify the filesystem about the inode being dirtied, so that
* (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
trace_writeback_dirty_inode_start(inode, flags);
if (sb->s_op->dirty_inode)
- sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+ sb->s_op->dirty_inode(inode,
+ flags & (I_DIRTY_INODE | I_DIRTY_TIME));
trace_writeback_dirty_inode(inode, flags);

/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
smp_mb();

- if (((inode->i_state & flags) == flags) ||
- (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+ if ((inode->i_state & flags) == flags)
return;

spin_lock(&inode->i_lock);
- if (dirtytime && (inode->i_state & I_DIRTY_INODE))
- goto out_unlock_inode;
if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY;

inode_attach_wb(inode, NULL);

- /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
- if (flags & I_DIRTY_INODE)
- inode->i_state &= ~I_DIRTY_TIME;
inode->i_state |= flags;

/*
@@ -2486,7 +2497,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
out_unlock:
if (wb)
spin_unlock(&wb->list_lock);
-out_unlock_inode:
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9ac59814bbb6..f029c6702dda 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -653,7 +653,7 @@ xfs_fs_destroy_inode(
static void
xfs_fs_dirty_inode(
struct inode *inode,
- int flag)
+ int flags)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -661,7 +661,13 @@ xfs_fs_dirty_inode(

if (!(inode->i_sb->s_flags & SB_LAZYTIME))
return;
- if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+
+ /*
+ * Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC)
+ * and has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be passed
+ * in flags possibly together with I_DIRTY_SYNC.
+ */
+ if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME))
return;

if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..56a4b4b02477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2371,13 +2371,14 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
* don't have to write inode on fdatasync() when only
* e.g. the timestamps have changed.
* I_DIRTY_PAGES Inode has dirty pages. Inode itself may be clean.
- * I_DIRTY_TIME The inode itself only has dirty timestamps, and the
+ * I_DIRTY_TIME The inode itself has dirty timestamps, and the
* lazytime mount option is enabled. We keep track of this
* separately from I_DIRTY_SYNC in order to implement
* lazytime. This gets cleared if I_DIRTY_INODE
- * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
- * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- * i_state, but not both. I_DIRTY_PAGES may still be set.
+ * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ * in place because writeback might already be in progress
+ * and we don't want to lose the time update
* I_NEW Serves as both a mutex and completion notification.
* New inodes set I_NEW. If two processes both create
* the same inode, one of them will release its inode and
--
2.37.1

2022-08-24 16:04:52

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter

From: Jeff Layton <[email protected]>

The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mountoption flag).

Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.

Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, remove the handling of
Opt_i_version as well, since we're almost to 5.20 anyway.

Ideally, we'd couple this change with a way to disable the i_version
counter (just in case), but the way the iversion mount option was
implemented makes that difficult to do. We'd need to add a new mount
option altogether or do something with tune2fs. That's probably best
left to later patches if it turns out to be needed.

[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]

Cc: Dave Chinner <[email protected]>
Cc: Benjamin Coddington <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
v3: Removed leftover bits of i_version from ext4_apply_options
v4: no change

fs/ext4/inode.c | 5 ++---
fs/ext4/super.c | 21 ++++-----------------
2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2a220be34caa..c77d40f05763 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}

- if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+ if (attr->ia_size != inode->i_size)
inode_inc_iversion(inode);

if (shrink) {
@@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
* ea_inodes are using i_version for storing reference count, don't
* mess with it
*/
- if (IS_I_VERSION(inode) &&
- !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+ if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
inode_inc_iversion(inode);

/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9a66abcca1a8..1c953f6d400e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
Opt_inlinecrypt,
Opt_usrjquota, Opt_grpjquota, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
- Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+ Opt_usrquota, Opt_grpquota, Opt_prjquota,
Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
fsparam_flag ("barrier", Opt_barrier),
fsparam_u32 ("barrier", Opt_barrier),
fsparam_flag ("nobarrier", Opt_nobarrier),
- fsparam_flag ("i_version", Opt_i_version),
fsparam_flag ("dax", Opt_dax),
fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
fsparam_u32 ("stripe", Opt_stripe),
@@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
case Opt_abort:
ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
return 0;
- case Opt_i_version:
- ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
- ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
- ctx_set_flags(ctx, SB_I_VERSION);
- return 0;
case Opt_inlinecrypt:
#ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
sb->s_flags &= ~ctx->mask_s_flags;
sb->s_flags |= ctx->vals_s_flags;

- /*
- * i_version differs from common mount option iversion so we have
- * to let vfs know that it was set, otherwise it would get cleared
- * on remount
- */
- if (ctx->mask_s_flags & SB_I_VERSION)
- fc->sb_flags |= SB_I_VERSION;
-
#define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
APPLY(s_commit_interval);
APPLY(s_stripe);
@@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
- if (sb->s_flags & SB_I_VERSION)
- SEQ_OPTS_PUTS("i_version");
if (nodefs || sbi->s_stripe)
SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);

+ /* i_version is always enabled now */
+ sb->s_flags |= SB_I_VERSION;
+
if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
(ext4_has_compat_features(sb) ||
ext4_has_ro_compat_features(sb) ||
--
2.37.1

2022-08-24 17:50:03

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

On Wed 24-08-22 18:03:48, Lukas Czerner wrote:
> Currently the I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
> true, however ext4 will only update the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result if the inode
> already has I_DIRTY_INODE state by the time we get to
> __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> into on-disk inode and will not get updated until the next I_DIRTY_INODE
> update, which might never come if we crash or get a power failure.
>
> The problem can be reproduced on ext4 by running xfstest generic/622
> with -o iversion mount option.
>
> Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> I_DIRTY_INODE. Also make sure that the case is properly handled in
> writeback_single_inode() as well. Additionally changes in
> xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag.
>
> Thanks Jan Kara for suggestions on how to make this work properly.
>
> Cc: Dave Chinner <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Signed-off-by: Lukas Czerner <[email protected]>
> Suggested-by: Jan Kara <[email protected]>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Just two nits below:

> @@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> trace_writeback_mark_inode_dirty(inode, flags);
>
> if (flags & I_DIRTY_INODE) {
> +

Pointless empty line here.

> + /* Inode timestamp update will piggback on this dirtying */

Maybe expand this comment to:

/*
* Inode timestamp update will piggback on this dirtying.
* We tell ->dirty_inode callback that timestamps need to
* be updated by setting I_DIRTY_TIME in flags.
*/
> + if (inode->i_state & I_DIRTY_TIME) {
> + spin_lock(&inode->i_lock);
> + if (inode->i_state & I_DIRTY_TIME) {
> + inode->i_state &= ~I_DIRTY_TIME;
> + flags |= I_DIRTY_TIME;
> + }
> + spin_unlock(&inode->i_lock);
> + }
> +

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-08-25 10:14:35

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH v5] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

Currently the I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
true, however ext4 will only update the on-disk inode in
->dirty_inode(), not on actual writeback. As a result if the inode
already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
into on-disk inode and will not get updated until the next I_DIRTY_INODE
update, which might never come if we crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally changes in
xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Lukas Czerner <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
v2: Reworked according to suggestions from Jan
v3: Update documentation, add comments, change flag to flags in
xfs_fs_dirty_inode()
v4: Update documentation, simplify condition in xfs_fs_dirty_inode()
v5: Update comment for condition in __mark_inode_dirty()


Documentation/filesystems/vfs.rst | 3 +++
fs/fs-writeback.c | 37 +++++++++++++++++++++----------
fs/xfs/xfs_super.c | 10 +++++++--
include/linux/fs.h | 9 ++++----
4 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 6cd6953e175b..b2ef2449aed9 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -274,6 +274,9 @@ or bottom half).
This is specifically for the inode itself being marked dirty,
not its data. If the update needs to be persisted by fdatasync(),
then I_DIRTY_DATASYNC will be set in the flags argument.
+ I_DIRTY_TIME will be set in the flags in case lazytime is enabled
+ and struct inode has times updated since the last ->dirty_inode
+ call.

``write_inode``
this method is called when the VFS needs to write an inode to
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..45860591d51f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
*/
if (!(inode->i_state & I_DIRTY_ALL))
inode_cgwb_move_to_attached(inode, wb);
- else if (!(inode->i_state & I_SYNC_QUEUED) &&
- (inode->i_state & I_DIRTY))
- redirty_tail_locked(inode, wb);
+ else if (!(inode->i_state & I_SYNC_QUEUED)) {
+ if ((inode->i_state & I_DIRTY))
+ redirty_tail_locked(inode, wb);
+ else if (inode->i_state & I_DIRTY_TIME) {
+ inode->dirtied_when = jiffies;
+ inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+ }
+ }

spin_unlock(&wb->list_lock);
inode_sync_complete(inode);
@@ -2369,6 +2374,20 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags);

if (flags & I_DIRTY_INODE) {
+ /*
+ * Inode timestamp update will piggback on this dirtying.
+ * We tell ->dirty_inode callback that timestamps need to
+ * be updated by setting I_DIRTY_TIME in flags.
+ */
+ if (inode->i_state & I_DIRTY_TIME) {
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & I_DIRTY_TIME) {
+ inode->i_state &= ~I_DIRTY_TIME;
+ flags |= I_DIRTY_TIME;
+ }
+ spin_unlock(&inode->i_lock);
+ }
+
/*
* Notify the filesystem about the inode being dirtied, so that
* (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2397,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
trace_writeback_dirty_inode_start(inode, flags);
if (sb->s_op->dirty_inode)
- sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+ sb->s_op->dirty_inode(inode,
+ flags & (I_DIRTY_INODE | I_DIRTY_TIME));
trace_writeback_dirty_inode(inode, flags);

/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2419,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
smp_mb();

- if (((inode->i_state & flags) == flags) ||
- (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+ if ((inode->i_state & flags) == flags)
return;

spin_lock(&inode->i_lock);
- if (dirtytime && (inode->i_state & I_DIRTY_INODE))
- goto out_unlock_inode;
if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY;

inode_attach_wb(inode, NULL);

- /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
- if (flags & I_DIRTY_INODE)
- inode->i_state &= ~I_DIRTY_TIME;
inode->i_state |= flags;

/*
@@ -2486,7 +2500,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
out_unlock:
if (wb)
spin_unlock(&wb->list_lock);
-out_unlock_inode:
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9ac59814bbb6..f029c6702dda 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -653,7 +653,7 @@ xfs_fs_destroy_inode(
static void
xfs_fs_dirty_inode(
struct inode *inode,
- int flag)
+ int flags)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -661,7 +661,13 @@ xfs_fs_dirty_inode(

if (!(inode->i_sb->s_flags & SB_LAZYTIME))
return;
- if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+
+ /*
+ * Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC)
+ * and has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be passed
+ * in flags possibly together with I_DIRTY_SYNC.
+ */
+ if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME))
return;

if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..56a4b4b02477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2371,13 +2371,14 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
* don't have to write inode on fdatasync() when only
* e.g. the timestamps have changed.
* I_DIRTY_PAGES Inode has dirty pages. Inode itself may be clean.
- * I_DIRTY_TIME The inode itself only has dirty timestamps, and the
+ * I_DIRTY_TIME The inode itself has dirty timestamps, and the
* lazytime mount option is enabled. We keep track of this
* separately from I_DIRTY_SYNC in order to implement
* lazytime. This gets cleared if I_DIRTY_INODE
- * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
- * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- * i_state, but not both. I_DIRTY_PAGES may still be set.
+ * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ * in place because writeback might already be in progress
+ * and we don't want to lose the time update
* I_NEW Serves as both a mutex and completion notification.
* New inodes set I_NEW. If two processes both create
* the same inode, one of them will release its inode and
--
2.37.1

2022-08-26 16:14:03

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter

On Wed, 2022-08-24 at 18:03 +0200, Lukas Czerner wrote:
> From: Jeff Layton <[email protected]>
>
> The original i_version implementation was pretty expensive, requiring a
> log flush on every change. Because of this, it was gated behind a mount
> option (implemented via the MS_I_VERSION mountoption flag).
>
> Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> i_version flag much less expensive, so there is no longer a performance
> penalty from enabling it. xfs and btrfs already enable it
> unconditionally when the on-disk format can support it.
>
> Have ext4 ignore the SB_I_VERSION flag, and just enable it
> unconditionally. While we're in here, remove the handling of
> Opt_i_version as well, since we're almost to 5.20 anyway.
>
> Ideally, we'd couple this change with a way to disable the i_version
> counter (just in case), but the way the iversion mount option was
> implemented makes that difficult to do. We'd need to add a new mount
> option altogether or do something with tune2fs. That's probably best
> left to later patches if it turns out to be needed.
>
> [ Removed leftover bits of i_version from ext4_apply_options() since it
> now can't ever be set in ctx->mask_s_flags -- lczerner ]
>
> Cc: Dave Chinner <[email protected]>
> Cc: Benjamin Coddington <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Darrick J. Wong <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> Signed-off-by: Lukas Czerner <[email protected]>
> Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
> v3: Removed leftover bits of i_version from ext4_apply_options
> v4: no change
>
> fs/ext4/inode.c | 5 ++---
> fs/ext4/super.c | 21 ++++-----------------
> 2 files changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2a220be34caa..c77d40f05763 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> return -EINVAL;
> }
>
> - if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> + if (attr->ia_size != inode->i_size)
> inode_inc_iversion(inode);
>
> if (shrink) {
> @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> * ea_inodes are using i_version for storing reference count, don't
> * mess with it
> */
> - if (IS_I_VERSION(inode) &&
> - !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> + if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> inode_inc_iversion(inode);
>
> /* the do_update_inode consumes one bh->b_count */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9a66abcca1a8..1c953f6d400e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1585,7 +1585,7 @@ enum {
> Opt_inlinecrypt,
> Opt_usrjquota, Opt_grpjquota, Opt_quota,
> Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> - Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> + Opt_usrquota, Opt_grpquota, Opt_prjquota,
> Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
> Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
> fsparam_flag ("barrier", Opt_barrier),
> fsparam_u32 ("barrier", Opt_barrier),
> fsparam_flag ("nobarrier", Opt_nobarrier),
> - fsparam_flag ("i_version", Opt_i_version),
> fsparam_flag ("dax", Opt_dax),
> fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
> fsparam_u32 ("stripe", Opt_stripe),
> @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
> case Opt_abort:
> ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
> return 0;
> - case Opt_i_version:
> - ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> - ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> - ctx_set_flags(ctx, SB_I_VERSION);
> - return 0;
> case Opt_inlinecrypt:
> #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
> ctx_set_flags(ctx, SB_INLINECRYPT);
> @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
> sb->s_flags &= ~ctx->mask_s_flags;
> sb->s_flags |= ctx->vals_s_flags;
>
> - /*
> - * i_version differs from common mount option iversion so we have
> - * to let vfs know that it was set, otherwise it would get cleared
> - * on remount
> - */
> - if (ctx->mask_s_flags & SB_I_VERSION)
> - fc->sb_flags |= SB_I_VERSION;
> -
> #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
> APPLY(s_commit_interval);
> APPLY(s_stripe);
> @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
> if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
> SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> - if (sb->s_flags & SB_I_VERSION)
> - SEQ_OPTS_PUTS("i_version");
> if (nodefs || sbi->s_stripe)
> SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
> if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
> (test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
>
> + /* i_version is always enabled now */
> + sb->s_flags |= SB_I_VERSION;
> +
> if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
> (ext4_has_compat_features(sb) ||
> ext4_has_ro_compat_features(sb) ||

Hi Lukas,

I know I had originally asked you to shepherd this patch into mainline,
but I think it may be better to wait on it for now. Since I asked that,
we've since found out that ext4 is bumping the i_version counter on
atime updates. It'd be best to get that fixed before we turn this on
unconditionally, since it could cause a performance regression in some
cases. I'll plan to pick this back up for my latest i_version series if
that sounds ok to you.

Sorry for the back and forth, and thanks again!

Cheers,
--
Jeff Layton <[email protected]>

2022-08-29 08:19:01

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter

On Fri, Aug 26, 2022 at 12:11:23PM -0400, Jeff Layton wrote:
> On Wed, 2022-08-24 at 18:03 +0200, Lukas Czerner wrote:
> > From: Jeff Layton <[email protected]>
> >
> > The original i_version implementation was pretty expensive, requiring a
> > log flush on every change. Because of this, it was gated behind a mount
> > option (implemented via the MS_I_VERSION mountoption flag).
> >
> > Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> > i_version flag much less expensive, so there is no longer a performance
> > penalty from enabling it. xfs and btrfs already enable it
> > unconditionally when the on-disk format can support it.
> >
> > Have ext4 ignore the SB_I_VERSION flag, and just enable it
> > unconditionally. While we're in here, remove the handling of
> > Opt_i_version as well, since we're almost to 5.20 anyway.
> >
> > Ideally, we'd couple this change with a way to disable the i_version
> > counter (just in case), but the way the iversion mount option was
> > implemented makes that difficult to do. We'd need to add a new mount
> > option altogether or do something with tune2fs. That's probably best
> > left to later patches if it turns out to be needed.
> >
> > [ Removed leftover bits of i_version from ext4_apply_options() since it
> > now can't ever be set in ctx->mask_s_flags -- lczerner ]
> >
> > Cc: Dave Chinner <[email protected]>
> > Cc: Benjamin Coddington <[email protected]>
> > Cc: Christoph Hellwig <[email protected]>
> > Cc: Darrick J. Wong <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > Signed-off-by: Lukas Czerner <[email protected]>
> > Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
> > Reviewed-by: Jan Kara <[email protected]>
> > ---
> > v3: Removed leftover bits of i_version from ext4_apply_options
> > v4: no change
> >
> > fs/ext4/inode.c | 5 ++---
> > fs/ext4/super.c | 21 ++++-----------------
> > 2 files changed, 6 insertions(+), 20 deletions(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 2a220be34caa..c77d40f05763 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> > return -EINVAL;
> > }
> >
> > - if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> > + if (attr->ia_size != inode->i_size)
> > inode_inc_iversion(inode);
> >
> > if (shrink) {
> > @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> > * ea_inodes are using i_version for storing reference count, don't
> > * mess with it
> > */
> > - if (IS_I_VERSION(inode) &&
> > - !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > + if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > inode_inc_iversion(inode);
> >
> > /* the do_update_inode consumes one bh->b_count */
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 9a66abcca1a8..1c953f6d400e 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1585,7 +1585,7 @@ enum {
> > Opt_inlinecrypt,
> > Opt_usrjquota, Opt_grpjquota, Opt_quota,
> > Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > - Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> > + Opt_usrquota, Opt_grpquota, Opt_prjquota,
> > Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
> > Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> > Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> > @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
> > fsparam_flag ("barrier", Opt_barrier),
> > fsparam_u32 ("barrier", Opt_barrier),
> > fsparam_flag ("nobarrier", Opt_nobarrier),
> > - fsparam_flag ("i_version", Opt_i_version),
> > fsparam_flag ("dax", Opt_dax),
> > fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
> > fsparam_u32 ("stripe", Opt_stripe),
> > @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > case Opt_abort:
> > ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
> > return 0;
> > - case Opt_i_version:
> > - ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> > - ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> > - ctx_set_flags(ctx, SB_I_VERSION);
> > - return 0;
> > case Opt_inlinecrypt:
> > #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
> > ctx_set_flags(ctx, SB_INLINECRYPT);
> > @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
> > sb->s_flags &= ~ctx->mask_s_flags;
> > sb->s_flags |= ctx->vals_s_flags;
> >
> > - /*
> > - * i_version differs from common mount option iversion so we have
> > - * to let vfs know that it was set, otherwise it would get cleared
> > - * on remount
> > - */
> > - if (ctx->mask_s_flags & SB_I_VERSION)
> > - fc->sb_flags |= SB_I_VERSION;
> > -
> > #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
> > APPLY(s_commit_interval);
> > APPLY(s_stripe);
> > @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> > SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
> > if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
> > SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> > - if (sb->s_flags & SB_I_VERSION)
> > - SEQ_OPTS_PUTS("i_version");
> > if (nodefs || sbi->s_stripe)
> > SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
> > if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> > @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
> > (test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
> >
> > + /* i_version is always enabled now */
> > + sb->s_flags |= SB_I_VERSION;
> > +
> > if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
> > (ext4_has_compat_features(sb) ||
> > ext4_has_ro_compat_features(sb) ||
>
> Hi Lukas,
>
> I know I had originally asked you to shepherd this patch into mainline,
> but I think it may be better to wait on it for now. Since I asked that,
> we've since found out that ext4 is bumping the i_version counter on
> atime updates. It'd be best to get that fixed before we turn this on
> unconditionally, since it could cause a performance regression in some
> cases. I'll plan to pick this back up for my latest i_version series if
> that sounds ok to you.
>
> Sorry for the back and forth, and thanks again!

Hi Jeff,

sure, no problem. I can drop the patch. The rest of the series is still
valid though.

Thanks!
-Lukas

>
> Cheers,
> --
> Jeff Layton <[email protected]>
>

2022-08-29 10:18:40

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter

On Mon, 2022-08-29 at 10:17 +0200, Lukas Czerner wrote:
> On Fri, Aug 26, 2022 at 12:11:23PM -0400, Jeff Layton wrote:
> > On Wed, 2022-08-24 at 18:03 +0200, Lukas Czerner wrote:
> > > From: Jeff Layton <[email protected]>
> > >
> > > The original i_version implementation was pretty expensive, requiring a
> > > log flush on every change. Because of this, it was gated behind a mount
> > > option (implemented via the MS_I_VERSION mountoption flag).
> > >
> > > Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> > > i_version flag much less expensive, so there is no longer a performance
> > > penalty from enabling it. xfs and btrfs already enable it
> > > unconditionally when the on-disk format can support it.
> > >
> > > Have ext4 ignore the SB_I_VERSION flag, and just enable it
> > > unconditionally. While we're in here, remove the handling of
> > > Opt_i_version as well, since we're almost to 5.20 anyway.
> > >
> > > Ideally, we'd couple this change with a way to disable the i_version
> > > counter (just in case), but the way the iversion mount option was
> > > implemented makes that difficult to do. We'd need to add a new mount
> > > option altogether or do something with tune2fs. That's probably best
> > > left to later patches if it turns out to be needed.
> > >
> > > [ Removed leftover bits of i_version from ext4_apply_options() since it
> > > now can't ever be set in ctx->mask_s_flags -- lczerner ]
> > >
> > > Cc: Dave Chinner <[email protected]>
> > > Cc: Benjamin Coddington <[email protected]>
> > > Cc: Christoph Hellwig <[email protected]>
> > > Cc: Darrick J. Wong <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > Signed-off-by: Lukas Czerner <[email protected]>
> > > Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
> > > Reviewed-by: Jan Kara <[email protected]>
> > > ---
> > > v3: Removed leftover bits of i_version from ext4_apply_options
> > > v4: no change
> > >
> > > fs/ext4/inode.c | 5 ++---
> > > fs/ext4/super.c | 21 ++++-----------------
> > > 2 files changed, 6 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 2a220be34caa..c77d40f05763 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> > > return -EINVAL;
> > > }
> > >
> > > - if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> > > + if (attr->ia_size != inode->i_size)
> > > inode_inc_iversion(inode);
> > >
> > > if (shrink) {
> > > @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> > > * ea_inodes are using i_version for storing reference count, don't
> > > * mess with it
> > > */
> > > - if (IS_I_VERSION(inode) &&
> > > - !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > > + if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > > inode_inc_iversion(inode);
> > >
> > > /* the do_update_inode consumes one bh->b_count */
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index 9a66abcca1a8..1c953f6d400e 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -1585,7 +1585,7 @@ enum {
> > > Opt_inlinecrypt,
> > > Opt_usrjquota, Opt_grpjquota, Opt_quota,
> > > Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > > - Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> > > + Opt_usrquota, Opt_grpquota, Opt_prjquota,
> > > Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
> > > Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> > > Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> > > @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
> > > fsparam_flag ("barrier", Opt_barrier),
> > > fsparam_u32 ("barrier", Opt_barrier),
> > > fsparam_flag ("nobarrier", Opt_nobarrier),
> > > - fsparam_flag ("i_version", Opt_i_version),
> > > fsparam_flag ("dax", Opt_dax),
> > > fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
> > > fsparam_u32 ("stripe", Opt_stripe),
> > > @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > > case Opt_abort:
> > > ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
> > > return 0;
> > > - case Opt_i_version:
> > > - ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> > > - ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> > > - ctx_set_flags(ctx, SB_I_VERSION);
> > > - return 0;
> > > case Opt_inlinecrypt:
> > > #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
> > > ctx_set_flags(ctx, SB_INLINECRYPT);
> > > @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
> > > sb->s_flags &= ~ctx->mask_s_flags;
> > > sb->s_flags |= ctx->vals_s_flags;
> > >
> > > - /*
> > > - * i_version differs from common mount option iversion so we have
> > > - * to let vfs know that it was set, otherwise it would get cleared
> > > - * on remount
> > > - */
> > > - if (ctx->mask_s_flags & SB_I_VERSION)
> > > - fc->sb_flags |= SB_I_VERSION;
> > > -
> > > #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
> > > APPLY(s_commit_interval);
> > > APPLY(s_stripe);
> > > @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> > > SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
> > > if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
> > > SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> > > - if (sb->s_flags & SB_I_VERSION)
> > > - SEQ_OPTS_PUTS("i_version");
> > > if (nodefs || sbi->s_stripe)
> > > SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
> > > if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> > > @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > > sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
> > > (test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
> > >
> > > + /* i_version is always enabled now */
> > > + sb->s_flags |= SB_I_VERSION;
> > > +
> > > if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
> > > (ext4_has_compat_features(sb) ||
> > > ext4_has_ro_compat_features(sb) ||
> >
> > Hi Lukas,
> >
> > I know I had originally asked you to shepherd this patch into mainline,
> > but I think it may be better to wait on it for now. Since I asked that,
> > we've since found out that ext4 is bumping the i_version counter on
> > atime updates. It'd be best to get that fixed before we turn this on
> > unconditionally, since it could cause a performance regression in some
> > cases. I'll plan to pick this back up for my latest i_version series if
> > that sounds ok to you.
> >
> > Sorry for the back and forth, and thanks again!
>
> Hi Jeff,
>
> sure, no problem. I can drop the patch. The rest of the series is still
> valid though.
>
> Thanks!
> -Lukas
>
>

Yes, the rest is fine (AFAICT)

Thanks!
--
Jeff Layton <[email protected]>

2022-09-29 15:06:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes

On Wed, 24 Aug 2022 18:03:47 +0200, Lukas Czerner wrote:
> ea_inodes are using i_version for storing part of the reference count so
> we really need to leave it alone.
>
> The problem can be reproduced by xfstest ext4/026 when iversion is
> enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> inodes in ext4_mark_iloc_dirty().
>
> [...]

Applied, thanks!

[1/3] ext4: don't increase iversion counter for ea_inodes
commit: 6c7c5ade428cc65b58e4aba1925b5347970f4456
[2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
commit: 625e1e67b66245b93ccae868cd4a950d257de003
[3/3] ext4: unconditionally enable the i_version counter
commit: 59772a0cb09a7ec77362653e8be207a464fa04af

Best regards,
--
Theodore Ts'o <[email protected]>

2022-09-29 15:06:32

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v5] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

On Thu, 25 Aug 2022 12:06:57 +0200, Lukas Czerner wrote:
> Currently the I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
> true, however ext4 will only update the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result if the inode
> already has I_DIRTY_INODE state by the time we get to
> __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> into on-disk inode and will not get updated until the next I_DIRTY_INODE
> update, which might never come if we crash or get a power failure.
>
> [...]

Applied, thanks!

[1/1] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
commit: 625e1e67b66245b93ccae868cd4a950d257de003

Best regards,
--
Theodore Ts'o <[email protected]>