2020-03-06 00:46:49

by Eric Biggers

[permalink] [raw]
Subject: lazytime causing inodes to remain dirty after sync?

While testing my patch "fscrypt: don't evict dirty inodes after removing key"
(https://lkml.kernel.org/r/[email protected]), I've
run into an issue where even after the filesystem is sync'ed and no files are
in-use, inodes can remain dirty if the filesystem is mounted with -o lazytime.
Thus, my patch causes some inodes to not be evicted when they should be.

(lazytime is the default on f2fs, but ext4 supports it too.)

This is caused by the following code in __writeback_single_inode() that
redirties the inode if its access time is dirty:

if (dirty & I_DIRTY_TIME)
mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
if (ret == 0)
ret = err;
}
trace_writeback_single_inode(inode, wbc, nr_to_write);
return ret;

Here's a reproducer in the kvm-xfstests test appliance which demonstrates the
problem using sync(), without fscrypt involved at all:

sysctl vm.dirty_expire_centisecs=500
umount /vdc
mkfs.ext4 -F /dev/vdc
mount /vdc -o lazytime
echo contents > /vdc/file
sync
ino=$(stat -c %i /vdc/file)
echo 1 | tee /sys/kernel/debug/tracing/events/writeback/writeback_{single_inode_start,mark_inode_dirty,lazytime}/enable
echo "ino == $ino" | tee /sys/kernel/debug/tracing/events/writeback/writeback_{single_inode_start,mark_inode_dirty,lazytime}/filter
echo > /sys/kernel/debug/tracing/trace
cat /vdc/file > /dev/null
sync
cat /sys/kernel/debug/tracing/trace_pipe

The tracing shows that the inode for /vdc/file is written during the sync at
7.28s. But then, still during the sync, it's immediately re-dirtied. It then
gets written again later in the background, after the sync.

cat-286 [001] ...1 7.279433: writeback_mark_inode_dirty: bdi 254:32: ino=12 state= flags=I_DIRTY_TIME
kworker/u8:0-8 [003] ...1 7.282647: writeback_single_inode_start: bdi 254:32: ino=12 state=I_SYNC|I_DIRTY_TIME|I_DIRTY_TIME_EXPIRED dirtied_when=4294879420 age=0 index=1 to_write=9223372036854775807 wrote=0 cgroup_ino=1
kworker/u8:0-8 [003] ...2 7.282660: writeback_lazytime: dev 254,32 ino 12 dirtied 4294879420 state I_SYNC|I_DIRTY_TIME|I_DIRTY_TIME_EXPIRED mode 0100644
kworker/u8:0-8 [003] ...1 7.283204: writeback_mark_inode_dirty: bdi 254:32: ino=12 state=I_SYNC flags=I_DIRTY_SYNC
kworker/u8:0-8 [003] ...1 12.412079: writeback_single_inode_start: bdi 254:32: ino=12 state=I_DIRTY_SYNC|I_SYNC dirtied_when=4294879421 age=5 index=1 to_write=13312 wrote=0 cgroup_ino=1

Is this behavior intentional at all? It seems like a bug; it seems the inode
should be written just once, during the sync.

- Eric


2020-03-07 02:01:37

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

In the case that an inode has dirty timestamp for longer than the
lazytime expiration timeout (or if all such inodes are being flushed
out due to a sync or syncfs system call), we need to inform the file
system that the inode is dirty so that the inode's timestamps can be
copied out to the on-disk data structures. That's because if the file
system supports lazytime, it will have ignored the dirty_inode(inode,
I_DIRTY_TIME) notification when the timestamp was modified in memory.q

Previously, this was accomplished by calling mark_inode_dirty_sync(),
but that has the unfortunate side effect of also putting the inode the
writeback list, and that's not necessary in this case, since we will
immediately call write_inode() afterwards.

Eric Biggers noticed that this was causing problems for fscrypt after
the key was removed[1].

[1] https://lore.kernel.org/r/[email protected]

Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
---
fs/fs-writeback.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..32101349ba97 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)

spin_unlock(&inode->i_lock);

- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
+ /* This was a lazytime expiration; we need to tell the file system */
+ if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
+ inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
--
2.24.1

2020-03-11 03:20:33

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Fri, Mar 06, 2020 at 09:00:43PM -0500, Theodore Ts'o wrote:
> In the case that an inode has dirty timestamp for longer than the
> lazytime expiration timeout (or if all such inodes are being flushed
> out due to a sync or syncfs system call), we need to inform the file
> system that the inode is dirty so that the inode's timestamps can be
> copied out to the on-disk data structures. That's because if the file
> system supports lazytime, it will have ignored the dirty_inode(inode,
> I_DIRTY_TIME) notification when the timestamp was modified in memory.q
>
> Previously, this was accomplished by calling mark_inode_dirty_sync(),
> but that has the unfortunate side effect of also putting the inode the
> writeback list, and that's not necessary in this case, since we will
> immediately call write_inode() afterwards.
>
> Eric Biggers noticed that this was causing problems for fscrypt after
> the key was removed[1].
>
> [1] https://lore.kernel.org/r/[email protected]
>
> Reported-by: Eric Biggers <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>
> ---
> fs/fs-writeback.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 76ac9c7d32ec..32101349ba97 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>
> spin_unlock(&inode->i_lock);
>
> - if (dirty & I_DIRTY_TIME)
> - mark_inode_dirty_sync(inode);
> + /* This was a lazytime expiration; we need to tell the file system */
> + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
> /* Don't write the inode if only I_DIRTY_PAGES was set */
> if (dirty & ~I_DIRTY_PAGES) {
> int err = write_inode(inode, wbc);
> --

Thanks Ted! This fixes the fscrypt test failure.

However, are you sure this works correctly on all filesystems? I'm not sure
about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its
->dirty_inode() it does:

static void
xfs_fs_dirty_inode(
struct inode *inode,
int flag)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
struct xfs_trans *tp;

if (!(inode->i_sb->s_flags & SB_LAZYTIME))
return;
if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
return;

if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
return;
xfs_ilock(ip, XFS_ILOCK_EXCL);
xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
xfs_trans_commit(tp);
}


So flag=I_DIRTY_TIME_EXPIRED will be a no-op.

Maybe you should be using I_DIRTY_SYNC instead? Or perhaps XFS should be
checking for either I_DIRTY_TIME_EXPIRED or I_DIRTY_SYNC?

- Eric

2020-03-11 13:00:02

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote:
> Thanks Ted! This fixes the fscrypt test failure.
>
> However, are you sure this works correctly on all filesystems? I'm not sure
> about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its
> ->dirty_inode() it does:
...
> if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> return;

That's true, but when the timestamps were originally modified,
dirty_inode() will be called with flag == I_DIRTY_TIME, which will
*not* be a no-op; which is to say, XFS will force the timestamps to be
updated on disk when the timestamps are first dirtied, because it
doesn't support I_DIRTY_TIME.

So I think we're fine.

- Ted

2020-03-11 23:56:19

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote:
> On Fri, Mar 06, 2020 at 09:00:43PM -0500, Theodore Ts'o wrote:
> > In the case that an inode has dirty timestamp for longer than the
> > lazytime expiration timeout (or if all such inodes are being flushed
> > out due to a sync or syncfs system call), we need to inform the file
> > system that the inode is dirty so that the inode's timestamps can be
> > copied out to the on-disk data structures. That's because if the file
> > system supports lazytime, it will have ignored the dirty_inode(inode,
> > I_DIRTY_TIME) notification when the timestamp was modified in memory.q
> >
> > Previously, this was accomplished by calling mark_inode_dirty_sync(),
> > but that has the unfortunate side effect of also putting the inode the
> > writeback list, and that's not necessary in this case, since we will
> > immediately call write_inode() afterwards.
> >
> > Eric Biggers noticed that this was causing problems for fscrypt after
> > the key was removed[1].
> >
> > [1] https://lore.kernel.org/r/[email protected]
> >
> > Reported-by: Eric Biggers <[email protected]>
> > Signed-off-by: Theodore Ts'o <[email protected]>
> > ---
> > fs/fs-writeback.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 76ac9c7d32ec..32101349ba97 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> >
> > spin_unlock(&inode->i_lock);
> >
> > - if (dirty & I_DIRTY_TIME)
> > - mark_inode_dirty_sync(inode);
> > + /* This was a lazytime expiration; we need to tell the file system */
> > + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> > + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);
> > /* Don't write the inode if only I_DIRTY_PAGES was set */
> > if (dirty & ~I_DIRTY_PAGES) {
> > int err = write_inode(inode, wbc);
> > --
>
> Thanks Ted! This fixes the fscrypt test failure.
>
> However, are you sure this works correctly on all filesystems? I'm not sure
> about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its
> ->dirty_inode() it does:
>
> static void
> xfs_fs_dirty_inode(
> struct inode *inode,
> int flag)
> {
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
> struct xfs_trans *tp;
>
> if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> return;
> if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> return;
>
> if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
> return;
> xfs_ilock(ip, XFS_ILOCK_EXCL);
> xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
> xfs_trans_commit(tp);
> }
>
>
> So flag=I_DIRTY_TIME_EXPIRED will be a no-op.
>
> Maybe you should be using I_DIRTY_SYNC instead? Or perhaps XFS should be
> checking for either I_DIRTY_TIME_EXPIRED or I_DIRTY_SYNC?

Right, XFS does not use the VFS inode writeback code at all - we
track all metadata changes internally via journalling. The VFS uses
I_DIRTY_SYNC to indicate and inode is metadata dirty and a writeback
candidate. Hence if we need to mark an inode dirty for integrity
purposes for _any reason_, then I_DIRTY_SYNC is the correct flag to
be passing to ->dirty_inode.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2020-03-12 00:08:15

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Wed, Mar 11, 2020 at 08:57:49AM -0400, Theodore Y. Ts'o wrote:
> On Tue, Mar 10, 2020 at 08:20:09PM -0700, Eric Biggers wrote:
> > Thanks Ted! This fixes the fscrypt test failure.
> >
> > However, are you sure this works correctly on all filesystems? I'm not sure
> > about XFS. XFS only implements ->dirty_inode(), not ->write_inode(), and in its
> > ->dirty_inode() it does:
> ...
> > if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> > return;
>
> That's true, but when the timestamps were originally modified,
> dirty_inode() will be called with flag == I_DIRTY_TIME, which will
> *not* be a no-op; which is to say, XFS will force the timestamps to be
> updated on disk when the timestamps are first dirtied, because it
> doesn't support I_DIRTY_TIME.

We log the initial timestamp change, and then ignore timestamp
updates until the dirty time expires and the inode is set
I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have
time stamps that may be 24 hours out of date in memory, and they
still need to be flushed to the journal.

However, your change does not mark the inode dirtying on expiry
anymore, so...

> So I think we're fine.

... we're not fine. This breaks XFS and any other filesystem that
relies on a I_DIRTY_SYNC notification to handle dirty time expiry
correctly.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2020-03-12 14:35:13

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Thu, Mar 12, 2020 at 11:07:17AM +1100, Dave Chinner wrote:
> > That's true, but when the timestamps were originally modified,
> > dirty_inode() will be called with flag == I_DIRTY_TIME, which will
> > *not* be a no-op; which is to say, XFS will force the timestamps to be
> > updated on disk when the timestamps are first dirtied, because it
> > doesn't support I_DIRTY_TIME.
>
> We log the initial timestamp change, and then ignore timestamp
> updates until the dirty time expires and the inode is set
> I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have
> time stamps that may be 24 hours out of date in memory, and they
> still need to be flushed to the journal.
>
> However, your change does not mark the inode dirtying on expiry
> anymore, so...
>
> > So I think we're fine.
>
> ... we're not fine. This breaks XFS and any other filesystem that
> relies on a I_DIRTY_SYNC notification to handle dirty time expiry
> correctly.

I haven't seen the original mail this replies to, but if we could
get the lazytime expirty by some other means (e.g. an explicit
callback), XFS could opt out of all the VFS inode tracking again,
which would simplify a few things.

2020-03-12 22:40:12

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Thu, Mar 12, 2020 at 07:34:45AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 12, 2020 at 11:07:17AM +1100, Dave Chinner wrote:
> > > That's true, but when the timestamps were originally modified,
> > > dirty_inode() will be called with flag == I_DIRTY_TIME, which will
> > > *not* be a no-op; which is to say, XFS will force the timestamps to be
> > > updated on disk when the timestamps are first dirtied, because it
> > > doesn't support I_DIRTY_TIME.
> >
> > We log the initial timestamp change, and then ignore timestamp
> > updates until the dirty time expires and the inode is set
> > I_DIRTY_SYNC via __mark_inode_dirty_sync(). IOWs, on expiry, we have
> > time stamps that may be 24 hours out of date in memory, and they
> > still need to be flushed to the journal.
> >
> > However, your change does not mark the inode dirtying on expiry
> > anymore, so...
> >
> > > So I think we're fine.
> >
> > ... we're not fine. This breaks XFS and any other filesystem that
> > relies on a I_DIRTY_SYNC notification to handle dirty time expiry
> > correctly.
>
> I haven't seen the original mail this replies to,

The original problem was calling mark_inode_dirty_sync() on expiry
during inode writeback was causing the inode to be put back on the
dirty inode list and so ext4 was flushing it twice - once on expiry
and once 5 seconds later on the next background writeback pass.

This is a problem that XFS does not have because it does not
implement ->write_inode...

> but if we could
> get the lazytime expirty by some other means (e.g. an explicit
> callback), XFS could opt out of all the VFS inode tracking again,
> which would simplify a few things.

Yes, that would definitely make things simpler for XFS, and it would
also solve the problem that the generic lazytime expiry code has....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2020-03-20 02:48:01

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] writeback: avoid double-writing the inode on a lazytime expiration

On Thu, Mar 12, 2020 at 07:34:45AM -0700, Christoph Hellwig wrote:
> I haven't seen the original mail this replies to, but if we could
> get the lazytime expirty by some other means (e.g. an explicit
> callback), XFS could opt out of all the VFS inode tracking again,
> which would simplify a few things.

Part of my thinking of calling

inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_TIME_EXPIRED);

So that it would be an explicit callback to XFS. So why don't I break
this as two patches --- one which uses I_DIRTY_SYNC, as before, and a
second one which changes calls dirty_inode() with
I_DIRTY_TIME_EXPIRED, and with a change to XFS so that it recognizes
I_DIRTY_TIME_EXPIRED as if it were I_DIRTY_SYNC. If this would then
allow XFS to simplify how it handles VFS tracking, you could do that
in a separate patch.

Does that work? I'll send out the two patches, and if you can
review/ack the second patch, that would be great.

- Ted

2020-03-20 02:53:45

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration

In the case that an inode has dirty timestamp for longer than the
lazytime expiration timeout (or if all such inodes are being flushed
out due to a sync or syncfs system call), we need to inform the file
system that the inode is dirty so that the inode's timestamps can be
copied out to the on-disk data structures. That's because if the file
system supports lazytime, it will have ignored the dirty_inode(inode,
I_DIRTY_TIME) notification when the timestamp was modified in memory.q

Previously, this was accomplished by calling mark_inode_dirty_sync(),
but that has the unfortunate side effect of also putting the inode the
writeback list, and that's not necessary in this case, since we will
immediately call write_inode() afterwards.

Eric Biggers noticed that this was causing problems for fscrypt after
the key was removed[1].

[1] https://lore.kernel.org/r/[email protected]

Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
---
fs/fs-writeback.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..867454997c9d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1504,8 +1504,9 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)

spin_unlock(&inode->i_lock);

- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
+ /* This was a lazytime expiration; we need to tell the file system */
+ if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
+ inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
--
2.24.1

2020-03-25 09:21:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration

> spin_unlock(&inode->i_lock);
>
> - if (dirty & I_DIRTY_TIME)
> - mark_inode_dirty_sync(inode);
> + /* This was a lazytime expiration; we need to tell the file system */
> + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);

I think this needs a very clear comment explaining why we don't go
through __mark_inode_dirty.

But as said before I'd rather have a new lazytime_expired operation that
makes it very clear what is happening. We currenly have 4 file systems
(ext4, f2fs, ubifs and xfs) that support lazytime, so this won't really
be a major churn.

2020-03-25 15:22:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration

On Wed, Mar 25, 2020 at 02:20:57AM -0700, Christoph Hellwig wrote:
> > spin_unlock(&inode->i_lock);
> >
> > - if (dirty & I_DIRTY_TIME)
> > - mark_inode_dirty_sync(inode);
> > + /* This was a lazytime expiration; we need to tell the file system */
> > + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> > + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
>
> I think this needs a very clear comment explaining why we don't go
> through __mark_inode_dirty.

I can take the explanation which is in the git commit description and
move it into the comment.

> But as said before I'd rather have a new lazytime_expired operation that
> makes it very clear what is happening. We currenly have 4 file systems
> (ext4, f2fs, ubifs and xfs) that support lazytime, so this won't really
> be a major churn.

Again, I believe patch #2 does what you want; if it doesn't can you
explain why passing I_DIRTY_TIME_EXPIRED to s_op->dirty_inode() isn't
"a new lazytime expired operation that makes very clear what is
happening"?

I separated out patch #1 and patch #2 because patch #1 preserves
current behavior, and patch #2 modifies XFS code, which I don't want
to push Linus without an XFS reviewed-by.

N.b. None of the other file systems required a change for patch #2,
so if you want, we can have the XFS tree carry patch #2, and/or
combine that with whatever other simplifying changes that you want.
Or I can combine patch #1 and patch #2, with an XFS Reviewed-by, and
send it through the ext4 tree.

What's your pleasure?

- Ted

2020-03-25 15:49:15

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH 1/2] writeback: avoid double-writing the inode on a lazytime expiration

On Wed, Mar 25, 2020 at 11:21:13AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Mar 25, 2020 at 02:20:57AM -0700, Christoph Hellwig wrote:
> > > spin_unlock(&inode->i_lock);
> > >
> > > - if (dirty & I_DIRTY_TIME)
> > > - mark_inode_dirty_sync(inode);
> > > + /* This was a lazytime expiration; we need to tell the file system */
> > > + if (dirty & I_DIRTY_TIME_EXPIRED && inode->i_sb->s_op->dirty_inode)
> > > + inode->i_sb->s_op->dirty_inode(inode, I_DIRTY_SYNC);
> >
> > I think this needs a very clear comment explaining why we don't go
> > through __mark_inode_dirty.
>
> I can take the explanation which is in the git commit description and
> move it into the comment.
>
> > But as said before I'd rather have a new lazytime_expired operation that
> > makes it very clear what is happening. We currenly have 4 file systems
> > (ext4, f2fs, ubifs and xfs) that support lazytime, so this won't really
> > be a major churn.
>
> Again, I believe patch #2 does what you want; if it doesn't can you
> explain why passing I_DIRTY_TIME_EXPIRED to s_op->dirty_inode() isn't
> "a new lazytime expired operation that makes very clear what is
> happening"?
>
> I separated out patch #1 and patch #2 because patch #1 preserves
> current behavior, and patch #2 modifies XFS code, which I don't want
> to push Linus without an XFS reviewed-by.
>
> N.b. None of the other file systems required a change for patch #2,
> so if you want, we can have the XFS tree carry patch #2, and/or
> combine that with whatever other simplifying changes that you want.
> Or I can combine patch #1 and patch #2, with an XFS Reviewed-by, and
> send it through the ext4 tree.
>
> What's your pleasure?

TBH while I'm pretty sure this does actually maintain more or less the
same behavior on xfs, I prefer Christoph's explicit ->lazytime_expired
approach[1] over squinting at bitflag manipulations.

(It also took me a while to realize that this patch duo even existed, as
it was kinda buried in its parent thread...)

--D

[1] https://lore.kernel.org/linux-fsdevel/[email protected]/T/#t

>
> - Ted
>