2022-08-03 10:58:27

by Lukas Czerner

Subject: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

Currently I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes
I_DIRTY_TIME. That's true, however ext4 only updates the on-disk inode
in ->dirty_inode(), not on actual writeback. As a result, if the inode
already has I_DIRTY_INODE set by the time we get to __mark_inode_dirty()
with only I_DIRTY_TIME, the previous time was already copied into the
on-disk inode and the new one will not get there until the next
I_DIRTY_INODE update, which might never come if we crash or lose power.

The problem can be reproduced on ext4 by running xfstests generic/622
with the -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that this case is properly handled in
writeback_single_inode(). Additionally, xfs_fs_dirty_inode() was changed
to accommodate I_DIRTY_TIME being passed in flag.

Thanks to Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Lukas Czerner <[email protected]>
Suggested-by: Jan Kara <[email protected]>
---
v2: Reworked according to suggestions from Jan

fs/fs-writeback.c | 34 ++++++++++++++++++++++------------
fs/xfs/xfs_super.c | 3 ++-
include/linux/fs.h | 6 +++---
3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
*/
if (!(inode->i_state & I_DIRTY_ALL))
inode_cgwb_move_to_attached(inode, wb);
- else if (!(inode->i_state & I_SYNC_QUEUED) &&
- (inode->i_state & I_DIRTY))
- redirty_tail_locked(inode, wb);
+ else if (!(inode->i_state & I_SYNC_QUEUED)) {
+ if ((inode->i_state & I_DIRTY))
+ redirty_tail_locked(inode, wb);
+ else if (inode->i_state & I_DIRTY_TIME) {
+ inode->dirtied_when = jiffies;
+ inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+ }
+ }

spin_unlock(&wb->list_lock);
inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags);

if (flags & I_DIRTY_INODE) {
+
+ /* Inode timestamp update will piggyback on this dirtying */
+ if (inode->i_state & I_DIRTY_TIME) {
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & I_DIRTY_TIME) {
+ inode->i_state &= ~I_DIRTY_TIME;
+ flags |= I_DIRTY_TIME;
+ }
+ spin_unlock(&inode->i_lock);
+ }
+
/*
* Notify the filesystem about the inode being dirtied, so that
* (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
trace_writeback_dirty_inode_start(inode, flags);
if (sb->s_op->dirty_inode)
- sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+ sb->s_op->dirty_inode(inode,
+ flags & (I_DIRTY_INODE | I_DIRTY_TIME));
trace_writeback_dirty_inode(inode, flags);

/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
smp_mb();

- if (((inode->i_state & flags) == flags) ||
- (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+ if ((inode->i_state & flags) == flags)
return;

spin_lock(&inode->i_lock);
- if (dirtytime && (inode->i_state & I_DIRTY_INODE))
- goto out_unlock_inode;
if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY;

inode_attach_wb(inode, NULL);

- /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
- if (flags & I_DIRTY_INODE)
- inode->i_state &= ~I_DIRTY_TIME;
inode->i_state |= flags;

/*
@@ -2486,7 +2497,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
out_unlock:
if (wb)
spin_unlock(&wb->list_lock);
-out_unlock_inode:
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index aa977c7ea370..cff05a4771b5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -658,7 +658,8 @@ xfs_fs_dirty_inode(

if (!(inode->i_sb->s_flags & SB_LAZYTIME))
return;
- if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+ if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
+ !((inode->i_state | flag) & I_DIRTY_TIME))
return;

if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..2243797badf2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
* lazytime mount option is enabled. We keep track of this
* separately from I_DIRTY_SYNC in order to implement
* lazytime. This gets cleared if I_DIRTY_INODE
- * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
- * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- * i_state, but not both. I_DIRTY_PAGES may still be set.
+ * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ * in place.
* I_NEW Serves as both a mutex and completion notification.
* New inodes set I_NEW. If two processes both create
* the same inode, one of them will release its inode and
--
2.37.1



2022-08-05 08:16:27

by Eric Biggers

Subject: Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ad5e3520fae..2243797badf2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> * The inode itself only has dirty timestamps, and the
> * lazytime mount option is enabled. We keep track of this
> * separately from I_DIRTY_SYNC in order to implement
> * lazytime. This gets cleared if I_DIRTY_INODE
> - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
> - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> - * i_state, but not both. I_DIRTY_PAGES may still be set.
> + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> + * in place.

I'm still having a hard time understanding the new semantics. The first
sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
itself only has dirty timestamps", right?

Also, have you checked all the places that I_DIRTY_TIME is used and verified
they do the right thing now? What about inode_is_dirtytime_only()?

Also what is the precise meaning of the flags argument to ->dirty_inode now?

sb->s_op->dirty_inode(inode,
flags & (I_DIRTY_INODE | I_DIRTY_TIME));

Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

- Eric

2022-08-05 12:26:03

by Lukas Czerner

Subject: Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE

On Fri, Aug 05, 2022 at 01:05:45AM -0700, Eric Biggers wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9ad5e3520fae..2243797badf2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> > * The inode itself only has dirty timestamps, and the
> > * lazytime mount option is enabled. We keep track of this
> > * separately from I_DIRTY_SYNC in order to implement
> > * lazytime. This gets cleared if I_DIRTY_INODE
> > - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
> > - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> > - * i_state, but not both. I_DIRTY_PAGES may still be set.
> > + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> > + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> > + * in place.
>
> I'm still having a hard time understanding the new semantics. The first
> sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
> itself only has dirty timestamps", right?

The problem is that it was always assumed that I_DIRTY_INODE supersedes
I_DIRTY_TIME, and so I_DIRTY_TIME gets cleared in __mark_inode_dirty()
when we have I_DIRTY_INODE. That part is true: we call
sb->s_op->dirty_inode(), the time update gets pushed into the on-disk
inode structure, I_DIRTY_TIME is cleared, and the inode gets queued for
writeback.

Any subsequent dirtying with I_DIRTY_TIME gets ignored simply because
I_DIRTY_INODE is already set in i_state. But in ext4 this time update
will never get pushed into the on-disk inode, and since I_DIRTY_TIME is
not set either, once the writeback is done we've lost all the
I_DIRTY_TIME updates in between, even if there was a sync.

Now, we still clear I_DIRTY_TIME when we get I_DIRTY_INODE, but any
subsequent I_DIRTY_TIME-only updates won't be ignored and we set the
flag in i_state. After the writeback is done, the inode will be moved to
the b_dirty_time list.
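
To make the before/after behaviour concrete, here is a minimal userspace
sketch of the flag handling described above. It is a toy model, not
kernel code: the M_* flag values, struct model_inode and all helper
names are made up for illustration, and writeback is reduced to a
sync-style pass that flushes a pending timestamp through the
->dirty_inode() path, which is the only place an ext4-like filesystem
copies fields into the on-disk inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real flags live in include/linux/fs.h. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_TIME     (1u << 2)
#define M_DIRTY_INODE    (M_DIRTY_SYNC | M_DIRTY_DATASYNC)

/* Hypothetical stand-in for struct inode plus its on-disk copy. */
struct model_inode {
	unsigned int i_state;	/* in-memory dirty flags */
	long mem_time;		/* in-memory timestamp */
	long disk_time;		/* timestamp in the on-disk inode */
};

/* ext4-like ->dirty_inode(): the only place the on-disk copy is updated. */
static void model_dirty_inode(struct model_inode *inode)
{
	inode->disk_time = inode->mem_time;
}

/* Simplified __mark_inode_dirty(); 'patched' selects the new behaviour. */
static void model_mark_dirty(struct model_inode *inode, unsigned int flags,
			     bool patched)
{
	if (flags & M_DIRTY_INODE) {
		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
		model_dirty_inode(inode);
		inode->i_state &= ~M_DIRTY_TIME;
		flags &= ~M_DIRTY_TIME;
	} else if (!patched && (inode->i_state & M_DIRTY_INODE)) {
		/* Old behaviour: a timestamp-only update is dropped. */
		return;
	}
	inode->i_state |= flags;
}

/* Sync-style writeback: flush a pending timestamp, then clear the flags. */
static void model_sync_writeback(struct model_inode *inode)
{
	if (inode->i_state & M_DIRTY_TIME)
		model_dirty_inode(inode);
	inode->i_state = 0;
}

/* Dirty the inode, update the timestamp, sync; return the on-disk time. */
static long model_run(bool patched)
{
	struct model_inode inode = { .i_state = 0, .mem_time = 1, .disk_time = 0 };

	model_mark_dirty(&inode, M_DIRTY_SYNC, patched);
	inode.mem_time = 2;
	model_mark_dirty(&inode, M_DIRTY_TIME, patched);
	model_sync_writeback(&inode);
	return inode.disk_time;
}

/* Old behaviour keeps the stale time on disk, new behaviour does not. */
static void model_self_check(void)
{
	assert(model_run(false) == 1);
	assert(model_run(true) == 2);
}
```

In this model the unpatched path ends with the stale timestamp on disk
even though a sync happened, while the patched path carries the second
update through.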

So I am not sure how you would like it to be re-worded. Would simply
removing the 'only' be ok?

>
> Also, have you checked all the places that I_DIRTY_TIME is used and verified
> they do the right thing now? What about inode_is_dirtytime_only()?

Yes, that's fine, despite the slightly misleading name ;)
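
For reference, a small sketch of why the name is misleading. It assumes
the helper in include/linux/fs.h masks only I_DIRTY_TIME, I_NEW,
I_FREEING and I_WILL_FREE, which is how I remember it and is worth
checking against the tree. The M_* values and the function name below
are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not the kernel's bit assignments. */
#define M_DIRTY_SYNC (1u << 0)
#define M_DIRTY_TIME (1u << 1)
#define M_NEW        (1u << 2)
#define M_FREEING    (1u << 3)
#define M_WILL_FREE  (1u << 4)

/*
 * Model of the check as assumed above: I_DIRTY_INODE bits are not part of
 * the mask, so they do not influence the result.
 */
static bool model_is_dirtytime_only(unsigned int i_state)
{
	return (i_state & (M_DIRTY_TIME | M_NEW | M_FREEING | M_WILL_FREE)) ==
	       M_DIRTY_TIME;
}

/* The helper answers "true" even when I_DIRTY_SYNC is set as well. */
static void model_self_check(void)
{
	assert(model_is_dirtytime_only(M_DIRTY_TIME));
	assert(model_is_dirtytime_only(M_DIRTY_TIME | M_DIRTY_SYNC));
	assert(!model_is_dirtytime_only(M_DIRTY_SYNC));
	assert(!model_is_dirtytime_only(M_DIRTY_TIME | M_FREEING));
}
```

Under that assumption the result does not change when I_DIRTY_TIME and
I_DIRTY_SYNC are set together, so "only" in the name refers to the
inode not being new or on its way out rather than to the absence of
other dirty bits.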

>
> Also what is the precise meaning of the flags argument to ->dirty_inode now?
>
> sb->s_op->dirty_inode(inode,
> flags & (I_DIRTY_INODE | I_DIRTY_TIME));
>
> Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

Don't know. It already doesn't mention I_DIRTY_SYNC, which can be there
as well. Additionally, it can have I_DIRTY_TIME to inform the fs that we
have a dirty timestamp as well (in case of lazytime).

Perhaps we can add:

If the inode has a dirty timestamp and lazytime is enabled, I_DIRTY_TIME
will be set in the flags.
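
As a rough illustration of how a filesystem could interpret the flags
argument under that wording, here is a userspace sketch that mirrors the
condition from the xfs_fs_dirty_inode() hunk in this patch. The M_*
values, the 'lazytime' parameter (standing in for the SB_LAZYTIME check)
and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real flags live in include/linux/fs.h. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_TIME     (1u << 2)

/*
 * Decide whether a ->dirty_inode() call should log timestamps. Mirrors
 * the patched xfs check: apart from I_DIRTY_TIME the flag must be exactly
 * I_DIRTY_SYNC, and a dirty timestamp must be signalled either in
 * i_state or in the flag itself.
 */
static bool model_should_log_timestamps(bool lazytime, unsigned int i_state,
					unsigned int flag)
{
	if (!lazytime)
		return false;
	if ((flag & ~M_DIRTY_TIME) != M_DIRTY_SYNC)
		return false;
	if (!((i_state | flag) & M_DIRTY_TIME))
		return false;
	return true;
}

/* A few representative combinations. */
static void model_self_check(void)
{
	/* Timestamp piggybacked in the flag argument. */
	assert(model_should_log_timestamps(true, 0, M_DIRTY_SYNC | M_DIRTY_TIME));
	/* Timestamp still recorded in i_state. */
	assert(model_should_log_timestamps(true, M_DIRTY_TIME, M_DIRTY_SYNC));
	/* No dirty timestamp anywhere. */
	assert(!model_should_log_timestamps(true, 0, M_DIRTY_SYNC));
	/* Not a plain I_DIRTY_SYNC dirtying. */
	assert(!model_should_log_timestamps(true, 0, M_DIRTY_DATASYNC | M_DIRTY_TIME));
}
```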

-Lukas

>
> - Eric
>