From: Masayoshi Mizuma <[email protected]>
/proc/mounts doesn't show 'i_version' even if iversion
mount option is set to XFS.
iversion mount option is a VFS option, not ext4 specific option.
Move the handler to show_sb_opts() so that /proc/mounts can show
'i_version' on not only ext4 but also the other filesystem.
Signed-off-by: Masayoshi Mizuma <[email protected]>
---
fs/ext4/super.c | 2 --
fs/proc_namespace.c | 1 +
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a29e8ea1a7ab..879289de1818 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2325,8 +2325,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
- if (sb->s_flags & SB_I_VERSION)
- SEQ_OPTS_PUTS("i_version");
if (nodefs || sbi->s_stripe)
SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
if (nodefs || EXT4_MOUNT_DATA_FLAGS &
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 3059a9394c2d..848c4afaef05 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -49,6 +49,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
{ SB_DIRSYNC, ",dirsync" },
{ SB_MANDLOCK, ",mand" },
{ SB_LAZYTIME, ",lazytime" },
+ { SB_I_VERSION, ",i_version" },
{ 0, NULL }
};
const struct proc_fs_opts *fs_infop;
--
2.18.4
On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma <[email protected]>
>
> /proc/mounts doesn't show 'i_version' even if iversion
> mount option is set to XFS.
>
> iversion mount option is a VFS option, not ext4 specific option.
> Move the handler to show_sb_opts() so that /proc/mounts can show
> 'i_version' on not only ext4 but also the other filesystem.
SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
mount option.
On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma <[email protected]>
> >
> > /proc/mounts doesn't show 'i_version' even if iversion
> > mount option is set to XFS.
> >
> > iversion mount option is a VFS option, not ext4 specific option.
> > Move the handler to show_sb_opts() so that /proc/mounts can show
> > 'i_version' on not only ext4 but also the other filesystem.
>
> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
> mount option.
Thank you for pointing it out, I misunderstood that for XFS.
iversion mount option is a VFS option, so I think it's good to
show 'i_version' on /proc/mounts if the filesystem has i_version feature,
like as:
$ cat /proc/mounts
...
/dev/vdb1 /mnt xfs rw,i_version,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
...
- Masa
On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma <[email protected]>
> >
> > /proc/mounts doesn't show 'i_version' even if iversion
> > mount option is set to XFS.
> >
> > iversion mount option is a VFS option, not ext4 specific option.
> > Move the handler to show_sb_opts() so that /proc/mounts can show
> > 'i_version' on not only ext4 but also the other filesystem.
>
> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
> mount option.
It probably *should* be a kernel internal flag, but it seems to work as
a mount option too.
By coincidence I've just been looking at a bug report showing that
i_version support is getting accidentally turned off on XFS whenever
userspace does a read-write remount.
Is there someplace in the xfs mount code where it should be throwing out
SB_I_VERSION?
Ideally there'd be entirely different fields for mount options and
internal feature flags. But I don't know, maybe SB_I_VERSION is the
only flag we have like this.
--b.
On 6/17/20 10:58 AM, J. Bruce Fields wrote:
> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
>>> From: Masayoshi Mizuma <[email protected]>
>>>
>>> /proc/mounts doesn't show 'i_version' even if iversion
>>> mount option is set to XFS.
>>>
>>> iversion mount option is a VFS option, not ext4 specific option.
>>> Move the handler to show_sb_opts() so that /proc/mounts can show
>>> 'i_version' on not only ext4 but also the other filesystem.
>>
>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
>> mount option.
>
> It probably *should* be a kernel internal flag, but it seems to work as
> a mount option too.
Not on XFS AFAICT:
[600280.685810] xfs: Unknown parameter 'i_version'
so we can't be exporting "mount options" for xfs that aren't actually
going to be accepted by the filesystem.
> By coincidence I've just been looking at a bug report showing that
> i_version support is getting accidentally turned off on XFS whenever
> userspace does a read-write remount.
>
> Is there someplace in the xfs mount code where it should be throwing out
> SB_I_VERSION?
<cc xfs list>
XFS doesn't manipulate that flag on remount. We just turn it on by default
for modern filesystem formats:
/* version 5 superblocks support inode version counters. */
if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
sb->s_flags |= SB_I_VERSION;
Also, this behavior doesn't seem unique to xfs:
# mount -o loop,i_version fsfile test_iversion
# grep test_iversion /proc/mounts
/dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime,i_version 0 0
# mount -o remount test_iversion
# grep test_iversion /proc/mounts
/dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime 0 0
# uname -a
Linux <hostname> 5.7.0-rc4+ #7 SMP Wed Jun 10 14:01:34 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
-Eric
> Ideally there'd be entirely different fields for mount options and
> internal feature flags. But I don't know, maybe SB_I_VERSION is the
> only flag we have like this.
>
> --b.
>
On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote:
>
>
> On 6/17/20 10:58 AM, J. Bruce Fields wrote:
> > On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
> >> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
> >>> From: Masayoshi Mizuma <[email protected]>
> >>>
> >>> /proc/mounts doesn't show 'i_version' even if iversion
> >>> mount option is set to XFS.
> >>>
> >>> iversion mount option is a VFS option, not ext4 specific option.
> >>> Move the handler to show_sb_opts() so that /proc/mounts can show
> >>> 'i_version' on not only ext4 but also the other filesystem.
> >>
> >> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
> >> mount option.
> >
> > It probably *should* be a kernel internal flag, but it seems to work as
> > a mount option too.
>
> Not on XFS AFAICT:
>
> [600280.685810] xfs: Unknown parameter 'i_version'
Yeah, because the mount option is 'iversion', not 'i_version'. Even if
you were going to expose the flag state in /proc/mounts, the text string
should match the mount option.
> so we can't be exporting "mount options" for xfs that aren't actually
> going to be accepted by the filesystem.
>
> > By coincidence I've just been looking at a bug report showing that
> > i_version support is getting accidentally turned off on XFS whenever
> > userspace does a read-write remount.
> >
> > Is there someplace in the xfs mount code where it should be throwing out
> > SB_I_VERSION?
>
> <cc xfs list>
>
> XFS doesn't manipulate that flag on remount. We just turn it on by default
> for modern filesystem formats:
>
> /* version 5 superblocks support inode version counters. */
> if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> sb->s_flags |= SB_I_VERSION;
>
> Also, this behavior doesn't seem unique to xfs:
>
> # mount -o loop,i_version fsfile test_iversion
> # grep test_iversion /proc/mounts
> /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime,i_version 0 0
> # mount -o remount test_iversion
> # grep test_iversion /proc/mounts
> /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime 0 0
> # uname -a
> Linux <hostname> 5.7.0-rc4+ #7 SMP Wed Jun 10 14:01:34 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
Probably because do_mount clears it and I guess xfs don't re-enable
it in any of their remount functions...
--D
> -Eric
>
> > Ideally there'd be entirely different fields for mount options and
> > internal feature flags. But I don't know, maybe SB_I_VERSION is the
> > only flag we have like this.
> >
> > --b.
> >
On 6/17/20 12:24 PM, Darrick J. Wong wrote:
> On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote:
>>
>>
>> On 6/17/20 10:58 AM, J. Bruce Fields wrote:
>>> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
>>>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
>>>>> From: Masayoshi Mizuma <[email protected]>
>>>>>
>>>>> /proc/mounts doesn't show 'i_version' even if iversion
>>>>> mount option is set to XFS.
>>>>>
>>>>> iversion mount option is a VFS option, not ext4 specific option.
>>>>> Move the handler to show_sb_opts() so that /proc/mounts can show
>>>>> 'i_version' on not only ext4 but also the other filesystem.
>>>>
>>>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
>>>> mount option.
>>>
>>> It probably *should* be a kernel internal flag, but it seems to work as
>>> a mount option too.
>>
>> Not on XFS AFAICT:
>>
>> [600280.685810] xfs: Unknown parameter 'i_version'
>
> Yeah, because the mount option is 'iversion', not 'i_version'. Even if
unless you're ext4:
{Opt_i_version, "i_version"},
ok "iversion" is what mount(8) takes and translates into MS_I_VERSION (thanks Darrick)
# strace -vv -emount mount -oloop,iversion fsfile mnt
mount("/dev/loop0", "/tmp/mnt", "xfs", MS_I_VERSION, NULL) = 0
FWIW, mount actually seems to pass what it finds in /proc/mounts back in on remount for ext4:
# strace -vv -emount mount -o remount mnt
mount("/dev/loop0", "/tmp/mnt", 0x55bfcbdca150, MS_REMOUNT|MS_RELATIME, "seclabel,i_version,data=ordered") = 0
but it still looks unhandled on remount. Perhaps if /proc/mounts exposed
"iversion" (not "i_version") then mount -o remount would DTRT.
-Eric
> you were going to expose the flag state in /proc/mounts, the text string
> should match the mount option.
>
>> so we can't be exporting "mount options" for xfs that aren't actually
>> going to be accepted by the filesystem.
>>
>>> By coincidence I've just been looking at a bug report showing that
>>> i_version support is getting accidentally turned off on XFS whenever
>>> userspace does a read-write remount.
>>>
>>> Is there someplace in the xfs mount code where it should be throwing out
>>> SB_I_VERSION?
>>
>> <cc xfs list>
>>
>> XFS doesn't manipulate that flag on remount. We just turn it on by default
>> for modern filesystem formats:
>>
>> /* version 5 superblocks support inode version counters. */
>> if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
>> sb->s_flags |= SB_I_VERSION;
>>
>> Also, this behavior doesn't seem unique to xfs:
>>
>> # mount -o loop,i_version fsfile test_iversion
>> # grep test_iversion /proc/mounts
>> /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime,i_version 0 0
>> # mount -o remount test_iversion
>> # grep test_iversion /proc/mounts
>> /dev/loop4 /tmp/test_iversion ext4 rw,seclabel,relatime 0 0
>> # uname -a
>> Linux <hostname> 5.7.0-rc4+ #7 SMP Wed Jun 10 14:01:34 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> Probably because do_mount clears it and I guess xfs don't re-enable
> it in any of their remount functions...
On Wed, Jun 17, 2020 at 12:55:24PM -0500, Eric Sandeen wrote:
> On 6/17/20 12:24 PM, Darrick J. Wong wrote:
> > On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote:
> >>
> >>
> >> On 6/17/20 10:58 AM, J. Bruce Fields wrote:
> >>> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
> >>>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
> >>>>> From: Masayoshi Mizuma <[email protected]>
> >>>>>
> >>>>> /proc/mounts doesn't show 'i_version' even if iversion
> >>>>> mount option is set to XFS.
> >>>>>
> >>>>> iversion mount option is a VFS option, not ext4 specific option.
> >>>>> Move the handler to show_sb_opts() so that /proc/mounts can show
> >>>>> 'i_version' on not only ext4 but also the other filesystem.
> >>>>
> >>>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
> >>>> mount option.
> >>>
> >>> It probably *should* be a kernel internal flag, but it seems to work as
> >>> a mount option too.
> >>
> >> Not on XFS AFAICT:
> >>
> >> [600280.685810] xfs: Unknown parameter 'i_version'
> >
> > Yeah, because the mount option is 'iversion', not 'i_version'. Even if
>
> unless you're ext4:
>
> {Opt_i_version, "i_version"},
>
> ok "iversion" is what mount(8) takes and translates into MS_I_VERSION (thanks Darrick)
>
> # strace -vv -emount mount -oloop,iversion fsfile mnt
> mount("/dev/loop0", "/tmp/mnt", "xfs", MS_I_VERSION, NULL) = 0
>
> FWIW, mount actually seems to pass what it finds in /proc/mounts back in on remount for ext4:
>
> # strace -vv -emount mount -o remount mnt
> mount("/dev/loop0", "/tmp/mnt", 0x55bfcbdca150, MS_REMOUNT|MS_RELATIME, "seclabel,i_version,data=ordered") = 0
>
> but it still looks unhandled on remount. Perhaps if /proc/mounts exposed
> "iversion" (not "i_version") then mount -o remount would DTRT.
I'd rather just eliminate the option, to the extent possible.
It was only ever a mount option since it caused a performance regression
in some filesystems, but I *think* that was addressed by Jeff Layton's
work (f02a9ad1f15d "fs: handle inode->i_version more efficiently").
XFS in particular is just using this flag to tell knfsd that it should
use i_version. I don't think it was really intended for userspace to be
able to turn this off.
--b.
On 6/17/20 1:18 PM, J. Bruce Fields wrote:
> On Wed, Jun 17, 2020 at 12:55:24PM -0500, Eric Sandeen wrote:
>> On 6/17/20 12:24 PM, Darrick J. Wong wrote:
>>> On Wed, Jun 17, 2020 at 12:14:28PM -0500, Eric Sandeen wrote:
>>>>
>>>>
>>>> On 6/17/20 10:58 AM, J. Bruce Fields wrote:
>>>>> On Wed, Jun 17, 2020 at 01:03:14AM -0700, Christoph Hellwig wrote:
>>>>>> On Tue, Jun 16, 2020 at 04:21:23PM -0400, Masayoshi Mizuma wrote:
>>>>>>> From: Masayoshi Mizuma <[email protected]>
>>>>>>>
>>>>>>> /proc/mounts doesn't show 'i_version' even if iversion
>>>>>>> mount option is set to XFS.
>>>>>>>
>>>>>>> iversion mount option is a VFS option, not ext4 specific option.
>>>>>>> Move the handler to show_sb_opts() so that /proc/mounts can show
>>>>>>> 'i_version' on not only ext4 but also the other filesystem.
>>>>>>
>>>>>> SB_I_VERSION is a kernel internal flag. XFS doesn't have an i_version
>>>>>> mount option.
>>>>>
>>>>> It probably *should* be a kernel internal flag, but it seems to work as
>>>>> a mount option too.
>>>>
>>>> Not on XFS AFAICT:
>>>>
>>>> [600280.685810] xfs: Unknown parameter 'i_version'
>>>
>>> Yeah, because the mount option is 'iversion', not 'i_version'. Even if
>>
>> unless you're ext4:
>>
>> {Opt_i_version, "i_version"},
>>
>> ok "iversion" is what mount(8) takes and translates into MS_I_VERSION (thanks Darrick)
>>
>> # strace -vv -emount mount -oloop,iversion fsfile mnt
>> mount("/dev/loop0", "/tmp/mnt", "xfs", MS_I_VERSION, NULL) = 0
>>
>> FWIW, mount actually seems to pass what it finds in /proc/mounts back in on remount for ext4:
>>
>> # strace -vv -emount mount -o remount mnt
>> mount("/dev/loop0", "/tmp/mnt", 0x55bfcbdca150, MS_REMOUNT|MS_RELATIME, "seclabel,i_version,data=ordered") = 0
>>
>> but it still looks unhandled on remount. Perhaps if /proc/mounts exposed
>> "iversion" (not "i_version") then mount -o remount would DTRT.
>
> I'd rather just eliminate the option, to the extent possible.
>
> It was only ever a mount option since it caused a performance regression
> in some filesystems, but I *think* that was addressed by Jeff Layton's
> work (f02a9ad1f15d "fs: handle inode->i_version more efficiently").
>
> XFS in particular is just using this flag to tell knfsd that it should
> use i_version. I don't think it was really intended for userspace to be
> able to turn this off.
but mount(8) has already exposed this interface:
iversion
Every time the inode is modified, the i_version field will be incremented.
noiversion
Do not increment the i_version inode field.
so now what?
FWIW, exporting "iversion" in /proc/mounts will
1) tell us whether the SB_I_VERSION is or is not in fact set on the fs, and
2) will make remount DTRT because mount(8) will properly parse it and send it
back in via the "flags" var during remount.
# mount -o loop fsfile mnt
# grep mnt /proc/mounts
/dev/loop0 /tmp/mnt xfs rw,iversion,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
# strace -vv -emount mount -o remount mnt
mount("/dev/loop0", "/tmp/mnt", 0x55c11d0e3150, MS_REMOUNT|MS_RELATIME|MS_I_VERSION, "seclabel,attr2,inode64,logbufs=8"...) = 0
# grep mnt /proc/mounts
/dev/loop0 /tmp/mnt xfs rw,iversion,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
ext4 i_version will just map back to iversion:
# mount -o i_version,loop fsfile mnt
# grep mnt /proc/mounts
/dev/loop0 /tmp/mnt ext4 rw,iversion,seclabel,relatime 0 0
# mount -o remount mnt
# grep mnt /proc/mounts
/dev/loop0 /tmp/mnt ext4 rw,iversion,seclabel,relatime 0 0
One wrinkle is that xfs will not honor "noiversion" currently but that's a different question.
-Eric
On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> but mount(8) has already exposed this interface:
>
> iversion
> Every time the inode is modified, the i_version field will be incremented.
>
> noiversion
> Do not increment the i_version inode field.
>
> so now what?
It's not like anyone's actually depending on i_version *not* being
incremented. (Can you even observe it from userspace other than over
NFS?)
So, just silently turn on the "iversion" behavior and ignore noiversion,
and I doubt you're going to break any real application.
--b.
On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote:
> On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> > but mount(8) has already exposed this interface:
> >
> > iversion
> > Every time the inode is modified, the i_version field will be incremented.
> >
> > noiversion
> > Do not increment the i_version inode field.
> >
> > so now what?
>
> It's not like anyone's actually depending on i_version *not* being
> incremented. (Can you even observe it from userspace other than over
> NFS?)
>
> So, just silently turn on the "iversion" behavior and ignore noiversion,
> and I doubt you're going to break any real application.
I suppose it's probably good to remain the options for user compatibility,
however, it seems that iversion and noiversiont are useful for
only ext4.
How about moving iversion and noiversion description on mount(8)
to ext4 specific option?
And fixing the remount issue for XFS (maybe btrfs has the same
issue as well)?
For XFS like as:
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 379cbff438bc..2ddd634cfb0b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1748,6 +1748,9 @@ xfs_fc_reconfigure(
return error;
}
+ if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
+ mp->m_super->s_flags |= SB_I_VERSION;
+
return 0;
}
Thanks,
Masa
On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote:
> On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote:
> > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> > > but mount(8) has already exposed this interface:
> > >
> > > iversion
> > > Every time the inode is modified, the i_version field will be incremented.
> > >
> > > noiversion
> > > Do not increment the i_version inode field.
> > >
> > > so now what?
> >
> > It's not like anyone's actually depending on i_version *not* being
> > incremented. (Can you even observe it from userspace other than over
> > NFS?)
> >
> > So, just silently turn on the "iversion" behavior and ignore noiversion,
> > and I doubt you're going to break any real application.
>
> I suppose it's probably good to remain the options for user compatibility,
> however, it seems that iversion and noiversiont are useful for
> only ext4.
> How about moving iversion and noiversion description on mount(8)
> to ext4 specific option?
>
> And fixing the remount issue for XFS (maybe btrfs has the same
> issue as well)?
> For XFS like as:
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 379cbff438bc..2ddd634cfb0b 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure(
> return error;
> }
>
> + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> + mp->m_super->s_flags |= SB_I_VERSION;
> +
I wonder, does this have to be done at the top of this function because
the vfs already removed S_I_VERSION from s_flags?
--D
> return 0;
> }
>
> Thanks,
> Masa
On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote:
> On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote:
> > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> > > but mount(8) has already exposed this interface:
> > >
> > > iversion
> > > Every time the inode is modified, the i_version field will be incremented.
> > >
> > > noiversion
> > > Do not increment the i_version inode field.
> > >
> > > so now what?
> >
> > It's not like anyone's actually depending on i_version *not* being
> > incremented. (Can you even observe it from userspace other than over
> > NFS?)
> >
> > So, just silently turn on the "iversion" behavior and ignore noiversion,
> > and I doubt you're going to break any real application.
>
> I suppose it's probably good to remain the options for user compatibility,
> however, it seems that iversion and noiversiont are useful for
> only ext4.
> How about moving iversion and noiversion description on mount(8)
> to ext4 specific option?
>
> And fixing the remount issue for XFS (maybe btrfs has the same
> issue as well)?
> For XFS like as:
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 379cbff438bc..2ddd634cfb0b 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure(
> return error;
> }
>
> + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> + mp->m_super->s_flags |= SB_I_VERSION;
> +
> return 0;
> }
no this doesn't work, because the sueprblock flags are modified
after ->reconfigure is called.
i.e. reconfigure_super() does this:
if (fc->ops->reconfigure) {
retval = fc->ops->reconfigure(fc);
if (retval) {
if (!force)
goto cancel_readonly;
/* If forced remount, go ahead despite any errors */
WARN(1, "forced remount of a %s fs returned %i\n",
sb->s_type->name, retval);
}
}
WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
(fc->sb_flags & fc->sb_flags_mask)));
And it's the WRITE_ONCE() line that clears SB_I_VERSION out of
sb->s_flags. Hence adding it in ->reconfigure doesn't help.
What we actually want to do here in xfs_fc_reconfigure() is this:
if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
fc->sb_flags_mask |= SB_I_VERSION;
So that the SB_I_VERSION is not cleared from sb->s_flags.
I'll also note that btrfs will need the same fix, because it also
sets SB_I_VERSION unconditionally, as will any other filesystem that
does this, too.
Really, this is just indicative of the mess that the mount
flags vs superblock feature flags are. Filesystems can choose to
unconditionally support various superblock features, and no mount
option futzing from userspace should -ever- be able to change that
feature. Filesystems really do need to be able to override mount
options that were parsed in userspace and turned into a binary
flag...
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Jun 17, 2020 at 06:44:29PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote:
> > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> > > > but mount(8) has already exposed this interface:
> > > >
> > > > iversion
> > > > Every time the inode is modified, the i_version field will be incremented.
> > > >
> > > > noiversion
> > > > Do not increment the i_version inode field.
> > > >
> > > > so now what?
> > >
> > > It's not like anyone's actually depending on i_version *not* being
> > > incremented. (Can you even observe it from userspace other than over
> > > NFS?)
> > >
> > > So, just silently turn on the "iversion" behavior and ignore noiversion,
> > > and I doubt you're going to break any real application.
> >
> > I suppose it's probably good to remain the options for user compatibility,
> > however, it seems that iversion and noiversiont are useful for
> > only ext4.
> > How about moving iversion and noiversion description on mount(8)
> > to ext4 specific option?
> >
> > And fixing the remount issue for XFS (maybe btrfs has the same
> > issue as well)?
> > For XFS like as:
> >
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 379cbff438bc..2ddd634cfb0b 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure(
> > return error;
> > }
> >
> > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> > + mp->m_super->s_flags |= SB_I_VERSION;
> > +
>
> I wonder, does this have to be done at the top of this function because
> the vfs already removed S_I_VERSION from s_flags?
Ah, I found the above code doesn't work...
sb->s_flags will be updated after reconfigure():
int reconfigure_super(struct fs_context *fc)
{
...
if (fc->ops->reconfigure) {
retval = fc->ops->reconfigure(fc);
if (retval) {
if (!force)
goto cancel_readonly;
/* If forced remount, go ahead despite any errors */
WARN(1, "forced remount of a %s fs returned %i\n",
sb->s_type->name, retval);
}
}
WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
(fc->sb_flags & fc->sb_flags_mask)));
Here, fc->sb_flags_mask should be MS_RMT_MASK, so SB_I_VERSION will be
dropped except iversion mount opstion (MS_I_VERSION) is set.
- Masa
On Thu, Jun 18, 2020 at 01:05:39PM +1000, Dave Chinner wrote:
> On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote:
> > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> > > > but mount(8) has already exposed this interface:
> > > >
> > > > iversion
> > > > Every time the inode is modified, the i_version field will be incremented.
> > > >
> > > > noiversion
> > > > Do not increment the i_version inode field.
> > > >
> > > > so now what?
> > >
> > > It's not like anyone's actually depending on i_version *not* being
> > > incremented. (Can you even observe it from userspace other than over
> > > NFS?)
> > >
> > > So, just silently turn on the "iversion" behavior and ignore noiversion,
> > > and I doubt you're going to break any real application.
> >
> > I suppose it's probably good to remain the options for user compatibility,
> > however, it seems that iversion and noiversiont are useful for
> > only ext4.
> > How about moving iversion and noiversion description on mount(8)
> > to ext4 specific option?
> >
> > And fixing the remount issue for XFS (maybe btrfs has the same
> > issue as well)?
> > For XFS like as:
> >
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 379cbff438bc..2ddd634cfb0b 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure(
> > return error;
> > }
> >
> > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> > + mp->m_super->s_flags |= SB_I_VERSION;
> > +
> > return 0;
> > }
>
> no this doesn't work, because the sueprblock flags are modified
> after ->reconfigure is called.
>
> i.e. reconfigure_super() does this:
>
> if (fc->ops->reconfigure) {
> retval = fc->ops->reconfigure(fc);
> if (retval) {
> if (!force)
> goto cancel_readonly;
> /* If forced remount, go ahead despite any errors */
> WARN(1, "forced remount of a %s fs returned %i\n",
> sb->s_type->name, retval);
> }
> }
>
> WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
> (fc->sb_flags & fc->sb_flags_mask)));
>
> And it's the WRITE_ONCE() line that clears SB_I_VERSION out of
> sb->s_flags. Hence adding it in ->reconfigure doesn't help.
>
> What we actually want to do here in xfs_fc_reconfigure() is this:
>
> if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> fc->sb_flags_mask |= SB_I_VERSION;
>
> So that the SB_I_VERSION is not cleared from sb->s_flags.
>
> I'll also note that btrfs will need the same fix, because it also
> sets SB_I_VERSION unconditionally, as will any other filesystem that
> does this, too.
Thank you for pointed it out.
How about following change? I believe it works both xfs and btrfs...
diff --git a/fs/super.c b/fs/super.c
index b0a511bef4a0..42fc6334d384 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc)
}
}
+ if (sb->s_flags & SB_I_VERSION)
+ fc->sb_flags |= MS_I_VERSION;
+
WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
(fc->sb_flags & fc->sb_flags_mask)));
/* Needs to be ordered wrt mnt_is_readonly() */
- Masa
>
> Really, this is just indicative of the mess that the mount
> flags vs superblock feature flags are. Filesystems can choose to
> unconditionally support various superblock features, and no mount
> option futzing from userspace should -ever- be able to change that
> feature. Filesystems really do need to be able to override mount
> options that were parsed in userspace and turned into a binary
> flag...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote:
> On Thu, Jun 18, 2020 at 01:05:39PM +1000, Dave Chinner wrote:
> > On Wed, Jun 17, 2020 at 09:30:26PM -0400, Masayoshi Mizuma wrote:
> > > On Wed, Jun 17, 2020 at 02:45:07PM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jun 17, 2020 at 01:28:11PM -0500, Eric Sandeen wrote:
> > > > > but mount(8) has already exposed this interface:
> > > > >
> > > > > iversion
> > > > > Every time the inode is modified, the i_version field will be incremented.
> > > > >
> > > > > noiversion
> > > > > Do not increment the i_version inode field.
> > > > >
> > > > > so now what?
> > > >
> > > > It's not like anyone's actually depending on i_version *not* being
> > > > incremented. (Can you even observe it from userspace other than over
> > > > NFS?)
> > > >
> > > > So, just silently turn on the "iversion" behavior and ignore noiversion,
> > > > and I doubt you're going to break any real application.
> > >
> > > I suppose it's probably good to remain the options for user compatibility,
> > > however, it seems that iversion and noiversiont are useful for
> > > only ext4.
> > > How about moving iversion and noiversion description on mount(8)
> > > to ext4 specific option?
> > >
> > > And fixing the remount issue for XFS (maybe btrfs has the same
> > > issue as well)?
> > > For XFS like as:
> > >
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 379cbff438bc..2ddd634cfb0b 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -1748,6 +1748,9 @@ xfs_fc_reconfigure(
> > > return error;
> > > }
> > >
> > > + if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> > > + mp->m_super->s_flags |= SB_I_VERSION;
> > > +
> > > return 0;
> > > }
> >
> > no this doesn't work, because the sueprblock flags are modified
> > after ->reconfigure is called.
> >
> > i.e. reconfigure_super() does this:
> >
> > if (fc->ops->reconfigure) {
> > retval = fc->ops->reconfigure(fc);
> > if (retval) {
> > if (!force)
> > goto cancel_readonly;
> > /* If forced remount, go ahead despite any errors */
> > WARN(1, "forced remount of a %s fs returned %i\n",
> > sb->s_type->name, retval);
> > }
> > }
> >
> > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
> > (fc->sb_flags & fc->sb_flags_mask)));
> >
> > And it's the WRITE_ONCE() line that clears SB_I_VERSION out of
> > sb->s_flags. Hence adding it in ->reconfigure doesn't help.
> >
> > What we actually want to do here in xfs_fc_reconfigure() is this:
> >
> > if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> > fc->sb_flags_mask |= SB_I_VERSION;
> >
> > So that the SB_I_VERSION is not cleared from sb->s_flags.
> >
> > I'll also note that btrfs will need the same fix, because it also
> > sets SB_I_VERSION unconditionally, as will any other filesystem that
> > does this, too.
>
> Thank you for pointed it out.
> How about following change? I believe it works both xfs and btrfs...
>
> diff --git a/fs/super.c b/fs/super.c
> index b0a511bef4a0..42fc6334d384 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc)
> }
> }
>
> + if (sb->s_flags & SB_I_VERSION)
> + fc->sb_flags |= MS_I_VERSION;
> +
> WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
> (fc->sb_flags & fc->sb_flags_mask)));
> /* Needs to be ordered wrt mnt_is_readonly() */
This will prevent SB_I_VERSION from being turned off at all. That
will break existing filesystems that allow SB_I_VERSION to be turned
off on remount, such as ext4.
The manipulations here need to be in the filesystem specific code;
we screwed this one up so badly there is no "one size fits all"
behaviour that we can implement in the generic code...
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote:
> On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote:
> > Thank you for pointed it out.
> > How about following change? I believe it works both xfs and btrfs...
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index b0a511bef4a0..42fc6334d384 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc)
> > }
> > }
> >
> > + if (sb->s_flags & SB_I_VERSION)
> > + fc->sb_flags |= MS_I_VERSION;
> > +
> > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
> > (fc->sb_flags & fc->sb_flags_mask)));
> > /* Needs to be ordered wrt mnt_is_readonly() */
>
> This will prevent SB_I_VERSION from being turned off at all. That
> will break existing filesystems that allow SB_I_VERSION to be turned
> off on remount, such as ext4.
>
> The manipulations here need to be in the filesystem specific code;
> we screwed this one up so badly there is no "one size fits all"
> behaviour that we can implement in the generic code...
My memory was that after Jeff Layton's i_version patches, there wasn't
really a significant performance hit any more, so the ability to turn it
off is no longer useful.
But looking back through Jeff's postings, I don't see him claiming that;
e.g. in:
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/linux-nfs/[email protected]/
https://lore.kernel.org/linux-nfs/[email protected]/
he reports comparing old iversion behavior to new iversion behavior, but
not new iversion behavior to new noiversion behavior.
--b.
On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote:
> > On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote:
> > > Thank you for pointed it out.
> > > How about following change? I believe it works both xfs and btrfs...
> > >
> > > diff --git a/fs/super.c b/fs/super.c
> > > index b0a511bef4a0..42fc6334d384 100644
> > > --- a/fs/super.c
> > > +++ b/fs/super.c
> > > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc)
> > > }
> > > }
> > >
> > > + if (sb->s_flags & SB_I_VERSION)
> > > + fc->sb_flags |= MS_I_VERSION;
> > > +
> > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
> > > (fc->sb_flags & fc->sb_flags_mask)));
> > > /* Needs to be ordered wrt mnt_is_readonly() */
> >
> > This will prevent SB_I_VERSION from being turned off at all. That
> > will break existing filesystems that allow SB_I_VERSION to be turned
> > off on remount, such as ext4.
> >
> > The manipulations here need to be in the filesystem specific code;
> > we screwed this one up so badly there is no "one size fits all"
> > behaviour that we can implement in the generic code...
>
> My memory was that after Jeff Layton's i_version patches, there wasn't
> really a significant performance hit any more, so the ability to turn it
> off is no longer useful.
Yes, I completely agree with you here. However, with some
filesystems allowing it to be turned off, we can't just wave our
hands and force enable the option. Those filesystems - if the
maintainers chose to always enable iversion - will have to go
through a mount option deprecation period before permanently
enabling it.
> But looking back through Jeff's postings, I don't see him claiming that;
> e.g. in:
>
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/linux-nfs/[email protected]/
> https://lore.kernel.org/linux-nfs/[email protected]/
>
> he reports comparing old iversion behavior to new iversion behavior, but
> not new iversion behavior to new noiversion behavior.
Yeah, it's had to compare noiversion behaviour on filesystems where
it was understood that it couldn't actually be turned off. And,
realistically, the comaprison to noiversion wasn't really relevant
to the problem Jeff's patchset was addressing...
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Fri, 2020-06-19 at 12:44 +1000, Dave Chinner wrote:
> On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote:
> > > On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote:
> > > > Thank you for pointed it out.
> > > > How about following change? I believe it works both xfs and btrfs...
> > > >
> > > > diff --git a/fs/super.c b/fs/super.c
> > > > index b0a511bef4a0..42fc6334d384 100644
> > > > --- a/fs/super.c
> > > > +++ b/fs/super.c
> > > > @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc)
> > > > }
> > > > }
> > > >
> > > > + if (sb->s_flags & SB_I_VERSION)
> > > > + fc->sb_flags |= MS_I_VERSION;
> > > > +
> > > > WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
> > > > (fc->sb_flags & fc->sb_flags_mask)));
> > > > /* Needs to be ordered wrt mnt_is_readonly() */
> > >
> > > This will prevent SB_I_VERSION from being turned off at all. That
> > > will break existing filesystems that allow SB_I_VERSION to be turned
> > > off on remount, such as ext4.
> > >
> > > The manipulations here need to be in the filesystem specific code;
> > > we screwed this one up so badly there is no "one size fits all"
> > > behaviour that we can implement in the generic code...
> >
> > My memory was that after Jeff Layton's i_version patches, there wasn't
> > really a significant performance hit any more, so the ability to turn it
> > off is no longer useful.
>
> Yes, I completely agree with you here. However, with some
> filesystems allowing it to be turned off, we can't just wave our
> hands and force enable the option. Those filesystems - if the
> maintainers chose to always enable iversion - will have to go
> through a mount option deprecation period before permanently
> enabling it.
>
> > But looking back through Jeff's postings, I don't see him claiming that;
> > e.g. in:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/linux-nfs/[email protected]/
> > https://lore.kernel.org/linux-nfs/[email protected]/
> >
> > he reports comparing old iversion behavior to new iversion behavior, but
> > not new iversion behavior to new noiversion behavior.
>
> Yeah, it's had to compare noiversion behaviour on filesystems where
> it was understood that it couldn't actually be turned off. And,
> realistically, the comaprison to noiversion wasn't really relevant
> to the problem Jeff's patchset was addressing...
>
I actually did do some comparison with that patchset vs. noiversion
mounted ext4, and found that there was a small performance delta. It
wasn't much but it was measurable enough that I didn't want to propose
removing the option from ext4 altogether at the time. It may be worth it
to do that now though.
--
Jeff Layton <[email protected]>
On Fri, Jun 19, 2020 at 08:39:48AM +1000, Dave Chinner wrote:
> This will prevent SB_I_VERSION from being turned off at all. That
> will break existing filesystems that allow SB_I_VERSION to be turned
> off on remount, such as ext4.
>
> The manipulations here need to be in the filesystem specific code;
> we screwed this one up so badly there is no "one size fits all"
> behaviour that we can implement in the generic code...
Yes. SB_I_VERSION should never be set by common code.
On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote:
> On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > My memory was that after Jeff Layton's i_version patches, there wasn't
> > really a significant performance hit any more, so the ability to turn it
> > off is no longer useful.
>
> Yes, I completely agree with you here. However, with some
> filesystems allowing it to be turned off, we can't just wave our
> hands and force enable the option. Those filesystems - if the
> maintainers chose to always enable iversion - will have to go
> through a mount option deprecation period before permanently
> enabling it.
I don't understand why.
The filesystem can continue to let people set iversion or noiversion as
they like, while under the covers behaving as if iversion is always set.
I can't see how that would break any application. (Or even how an
application would be able to detect that the filesystem was doing this.)
--b.
>
> > But looking back through Jeff's postings, I don't see him claiming that;
> > e.g. in:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/linux-nfs/[email protected]/
> > https://lore.kernel.org/linux-nfs/[email protected]/
> >
> > he reports comparing old iversion behavior to new iversion behavior, but
> > not new iversion behavior to new noiversion behavior.
>
> Yeah, it's had to compare noiversion behaviour on filesystems where
> it was understood that it couldn't actually be turned off. And,
> realistically, the comaprison to noiversion wasn't really relevant
> to the problem Jeff's patchset was addressing...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote:
> On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote:
> > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > > My memory was that after Jeff Layton's i_version patches, there wasn't
> > > really a significant performance hit any more, so the ability to turn it
> > > off is no longer useful.
> >
> > Yes, I completely agree with you here. However, with some
> > filesystems allowing it to be turned off, we can't just wave our
> > hands and force enable the option. Those filesystems - if the
> > maintainers chose to always enable iversion - will have to go
> > through a mount option deprecation period before permanently
> > enabling it.
>
> I don't understand why.
>
> The filesystem can continue to let people set iversion or noiversion as
> they like, while under the covers behaving as if iversion is always set.
> I can't see how that would break any application. (Or even how an
> application would be able to detect that the filesystem was doing this.)
It doesn't break functionality, but it affects performance. IOWs, it
can make certain workloads go a lot slower in some circumstances.
And that can result in unexectedly breaking SLAs or slow down a
complex, finely tuned data center wide workload to the point it no
longer meets requirements. Such changes in behaviour are considered
a regression, especially if they result from a change that just
ignores the mount option that turned off that specific feature.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote:
> On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote:
> > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote:
> > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > > > My memory was that after Jeff Layton's i_version patches, there wasn't
> > > > really a significant performance hit any more, so the ability to turn it
> > > > off is no longer useful.
> > >
> > > Yes, I completely agree with you here. However, with some
> > > filesystems allowing it to be turned off, we can't just wave our
> > > hands and force enable the option. Those filesystems - if the
> > > maintainers chose to always enable iversion - will have to go
> > > through a mount option deprecation period before permanently
> > > enabling it.
> >
> > I don't understand why.
> >
> > The filesystem can continue to let people set iversion or noiversion as
> > they like, while under the covers behaving as if iversion is always set.
> > I can't see how that would break any application. (Or even how an
> > application would be able to detect that the filesystem was doing this.)
>
> It doesn't break functionality, but it affects performance.
I thought you just agreed above that any performance hit was not
"significant".
> IOWs, it can make certain workloads go a lot slower in some
> circumstances. And that can result in unexectedly breaking SLAs or
> slow down a complex, finely tuned data center wide workload to the
> point it no longer meets requirements. Such changes in behaviour are
> considered a regression, especially if they result from a change that
> just ignores the mount option that turned off that specific feature.
I get that, but, what's the threshhold here for a significant risk of
regression?
The "noiversion" behavior is kinda painful for NFS.
--b.
On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
> On Fri, Jun 19, 2020 at 06:28:43PM -0400, J. Bruce Fields wrote:
> > On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote:
> > > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote:
> > > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote:
> > > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > > > > > My memory was that after Jeff Layton's i_version patches, there wasn't
> > > > > > really a significant performance hit any more, so the ability to turn it
> > > > > > off is no longer useful.
> > > > >
> > > > > Yes, I completely agree with you here. However, with some
> > > > > filesystems allowing it to be turned off, we can't just wave our
> > > > > hands and force enable the option. Those filesystems - if the
> > > > > maintainers chose to always enable iversion - will have to go
> > > > > through a mount option deprecation period before permanently
> > > > > enabling it.
> > > >
> > > > I don't understand why.
> > > >
> > > > The filesystem can continue to let people set iversion or noiversion as
> > > > they like, while under the covers behaving as if iversion is always set.
> > > > I can't see how that would break any application. (Or even how an
> > > > application would be able to detect that the filesystem was doing this.)
> > >
> > > It doesn't break functionality, but it affects performance.
> >
> > I thought you just agreed above that any performance hit was not
> > "significant".
>
> Yes, but that's just /my opinion/.
>
> However, other people have different opinions on this matter (and we
> know that from the people who considered XFS v4 -> v5 going slower
> because iversion a major regression), and so we must acknowledge
> those opinions even if we don't agree with them.
Do you have any of those reports handy? Were there numbers?
--b.
On Fri, Jun 19, 2020 at 06:28:43PM -0400, J. Bruce Fields wrote:
> On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote:
> > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote:
> > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote:
> > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > > > > My memory was that after Jeff Layton's i_version patches, there wasn't
> > > > > really a significant performance hit any more, so the ability to turn it
> > > > > off is no longer useful.
> > > >
> > > > Yes, I completely agree with you here. However, with some
> > > > filesystems allowing it to be turned off, we can't just wave our
> > > > hands and force enable the option. Those filesystems - if the
> > > > maintainers chose to always enable iversion - will have to go
> > > > through a mount option deprecation period before permanently
> > > > enabling it.
> > >
> > > I don't understand why.
> > >
> > > The filesystem can continue to let people set iversion or noiversion as
> > > they like, while under the covers behaving as if iversion is always set.
> > > I can't see how that would break any application. (Or even how an
> > > application would be able to detect that the filesystem was doing this.)
> >
> > It doesn't break functionality, but it affects performance.
>
> I thought you just agreed above that any performance hit was not
> "significant".
Yes, but that's just /my opinion/.
However, other people have different opinions on this matter (and we
know that from the people who considered XFS v4 -> v5 going slower
because iversion a major regression), and so we must acknowledge
those opinions even if we don't agree with them.
That is, if people are using noiversion for performance reasons on
filesystems that are not designed/intended to have it permanently
enabled, then yanking that functionality out from under them without
warning is a Bad Thing To Do.
If we want to change this behaviour, the mount option needs to be
deprecated and a date/kernel release placed on when it will no
longer function (at least a year in the future). When the mount
option is used, it needs to log a deprecation warning to the syslog
so that users are informed that the option is going away. Then when
the deprecation date passes, the mount option can then be ignored by
the kernel.
> > IOWs, it can make certain workloads go a lot slower in some
> > circumstances. And that can result in unexectedly breaking SLAs or
> > slow down a complex, finely tuned data center wide workload to the
> > point it no longer meets requirements. Such changes in behaviour are
> > considered a regression, especially if they result from a change that
> > just ignores the mount option that turned off that specific feature.
>
> I get that, but, what's the threshhold here for a significant risk of
> regression?
<shrug>
I have no real idea because the filesystems that are affected are
not ones I'm actively involved in supporting or developing for.
That's for the people with domain specific expertise - the
individual filesystem maintainers - to decide.
> The "noiversion" behavior is kinda painful for NFS.
Sure, but that's all you ever get on XFS v4 filesystems and any
other filesystem that doesn't support persistent iversion storage.
So the NFS implemenations are going to have to live with filesystems
without iversion support at all for many more years to come.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On 6/19/20 8:56 PM, J. Bruce Fields wrote:
> On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
...
>> However, other people have different opinions on this matter (and we
>> know that from the people who considered XFS v4 -> v5 going slower
>> because iversion a major regression), and so we must acknowledge
>> those opinions even if we don't agree with them.
>
> Do you have any of those reports handy? Were there numbers?
I can't answer that but did a little digging. MS_I_VERSION as an option
appeared here:
commit 7a224228ed79d587ece2304869000aad1b8e97dd
Author: Jean Noel Cordenner <[email protected]>
Date: Mon Jan 28 23:58:27 2008 -0500
vfs: Add 64 bit i_version support
The i_version field of the inode is changed to be a 64-bit counter that
is set on every inode creation and that is incremented every time the
inode data is modified (similarly to the "ctime" time-stamp).
The aim is to fulfill a NFSv4 requirement for rfc3530.
This first part concerns the vfs, it converts the 32-bit i_version in
the generic inode to a 64-bit, a flag is added in the super block in
order to check if the feature is enabled and the i_version is
incremented in the vfs.
Signed-off-by: Mingming Cao <[email protected]>
Signed-off-by: Jean Noel Cordenner <[email protected]>
Signed-off-by: Kalpak Shah <[email protected]>
and ext4's Opt_i_version mount option appeared here:
commit 25ec56b518257a56d2ff41a941d288e4b5ff9488
Author: Jean Noel Cordenner <[email protected]>
Date: Mon Jan 28 23:58:27 2008 -0500
ext4: Add inode version support in ext4
This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field while the high 32 bits
are stored in the i_version_hi field newly created in the ext4_inode.
This field is incremented in case the ext4_inode is large enough. A
i_version mount option has been added to enable the feature.
Signed-off-by: Mingming Cao <[email protected]>
Signed-off-by: Andreas Dilger <[email protected]>
Signed-off-by: Kalpak Shah <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Jean Noel Cordenner <[email protected]>
so the optional enablement was there on day one, without any real explanation
of why.
-Eric
On Sat, Jun 20, 2020 at 12:00:43PM -0500, Eric Sandeen wrote:
> On 6/19/20 8:56 PM, J. Bruce Fields wrote:
> > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
>
> ...
>
> >> However, other people have different opinions on this matter (and we
> >> know that from the people who considered XFS v4 -> v5 going slower
> >> because iversion a major regression), and so we must acknowledge
> >> those opinions even if we don't agree with them.
> >
> > Do you have any of those reports handy? Were there numbers?
>
> I can't answer that but did a little digging. MS_I_VERSION as an option
> appeared here:
>
...
> so the optional enablement was there on day one, without any real explanation
> of why.
My memory is that they didn't have measurements at first, but worried
that there might be a performance issue. Which later mesurements
confirmed.
But that Jeff Layton's work eliminated most of that.
I think ext4 was the focuse of the concern, but xfs might also have had
a (less serious) regression, and btrfs might have actually had it worst?
But I don't have references and my memory may be wrong.
--b.
On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote:
> On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
> > On Fri, Jun 19, 2020 at 06:28:43PM -0400, J. Bruce Fields wrote:
> > > On Sat, Jun 20, 2020 at 08:10:44AM +1000, Dave Chinner wrote:
> > > > On Fri, Jun 19, 2020 at 04:40:33PM -0400, J. Bruce Fields wrote:
> > > > > On Fri, Jun 19, 2020 at 12:44:55PM +1000, Dave Chinner wrote:
> > > > > > On Thu, Jun 18, 2020 at 10:20:05PM -0400, J. Bruce Fields wrote:
> > > > > > > My memory was that after Jeff Layton's i_version patches, there wasn't
> > > > > > > really a significant performance hit any more, so the ability to turn it
> > > > > > > off is no longer useful.
> > > > > >
> > > > > > Yes, I completely agree with you here. However, with some
> > > > > > filesystems allowing it to be turned off, we can't just wave our
> > > > > > hands and force enable the option. Those filesystems - if the
> > > > > > maintainers chose to always enable iversion - will have to go
> > > > > > through a mount option deprecation period before permanently
> > > > > > enabling it.
> > > > >
> > > > > I don't understand why.
> > > > >
> > > > > The filesystem can continue to let people set iversion or noiversion as
> > > > > they like, while under the covers behaving as if iversion is always set.
> > > > > I can't see how that would break any application. (Or even how an
> > > > > application would be able to detect that the filesystem was doing this.)
> > > >
> > > > It doesn't break functionality, but it affects performance.
> > >
> > > I thought you just agreed above that any performance hit was not
> > > "significant".
> >
> > Yes, but that's just /my opinion/.
> >
> > However, other people have different opinions on this matter (and we
> > know that from the people who considered XFS v4 -> v5 going slower
> > because iversion a major regression), and so we must acknowledge
> > those opinions even if we don't agree with them.
>
> Do you have any of those reports handy? Were there numbers?
e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7.
Numbers were 40-47% performance degradation for in-cache writes
caused by the original IVERSION implementation using iozone. There
were others I recall, all realted to similar high-IOP small random
writes workloads typical of databases....
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Sun, Jun 21, 2020 at 09:54:08AM +1000, Dave Chinner wrote:
> On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote:
> > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
> > > However, other people have different opinions on this matter (and we
> > > know that from the people who considered XFS v4 -> v5 going slower
> > > because iversion a major regression), and so we must acknowledge
> > > those opinions even if we don't agree with them.
> >
> > Do you have any of those reports handy? Were there numbers?
>
> e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7.
> Numbers were 40-47% performance degradation for in-cache writes
> caused by the original IVERSION implementation using iozone. There
> were others I recall, all realted to similar high-IOP small random
> writes workloads typical of databases....
Thanks, that's an interesting bug! Though a bit tangled. This is where
you identified the change attribute as the main culprit:
https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c42
The test was running at 70,000 writes/s (2.2GB/s), so it was one
transaction per write() syscall: timestamp updates. On CRC
enabled filesystems, we have a change counter for NFSv4 - it
gets incremented on every write() syscall, even when the
timestamp doesn't change. That's the difference in behaviour and
hence performance in this test.
In RHEL8, or anything post-v4.16, the frequency of change attribute
updates should be back down to that of timestamp updates on this
workload. So it'd be interesting to repeat that experiment now.
The bug was reporting in-house testing, and doesn't show any evidence
that particular regression was encountered by users; Eric said:
https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c52
Root cause of this minor in-memory regression was inode
versioning behavior; as it's unlikely to have real-world effects
(and has been open for years with no customer complaints) I'm
closing this WONTFIX to get it off the radar.
The typical user may just skip an upgrade or otherwise work around the
problem rather than root-causing it like this, so absence of reports
isn't conclusive. I understand wanting to err on the side of caution.
But if that regression's mostly addressed now, then I'm still inclined
to think it's time to just leave this on everywhere.
--b.
On Mon, Jun 22, 2020 at 05:26:12PM -0400, J. Bruce Fields wrote:
> On Sun, Jun 21, 2020 at 09:54:08AM +1000, Dave Chinner wrote:
> > On Fri, Jun 19, 2020 at 09:56:33PM -0400, J. Bruce Fields wrote:
> > > On Sat, Jun 20, 2020 at 11:49:57AM +1000, Dave Chinner wrote:
> > > > However, other people have different opinions on this matter (and we
> > > > know that from the people who considered XFS v4 -> v5 going slower
> > > > because iversion a major regression), and so we must acknowledge
> > > > those opinions even if we don't agree with them.
> > >
> > > Do you have any of those reports handy? Were there numbers?
> >
> > e.g. RH BZ #1355813 when v5 format was enabled by default in RHEL7.
> > Numbers were 40-47% performance degradation for in-cache writes
> > caused by the original IVERSION implementation using iozone. There
> > were others I recall, all realted to similar high-IOP small random
> > writes workloads typical of databases....
>
> Thanks, that's an interesting bug! Though a bit tangled. This is where
> you identified the change attribute as the main culprit:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c42
>
> The test was running at 70,000 writes/s (2.2GB/s), so it was one
> transaction per write() syscall: timestamp updates. On CRC
> enabled filesystems, we have a change counter for NFSv4 - it
> gets incremented on every write() syscall, even when the
> timestamp doesn't change. That's the difference in behaviour and
> hence performance in this test.
>
> In RHEL8, or anything post-v4.16, the frequency of change attribute
> updates should be back down to that of timestamp updates on this
> workload. So it'd be interesting to repeat that experiment now.
Yup, which in itself has been a problem for similar workloads.
There's a reason we now recommend the use of lazytime for high
performance database workloads that can do hundreds of thousands of
small write IOs a second...
> The bug was reporting in-house testing, and doesn't show any evidence
> that particular regression was encountered by users; Eric said:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1355813#c52
>
> Root cause of this minor in-memory regression was inode
> versioning behavior; as it's unlikely to have real-world effects
> (and has been open for years with no customer complaints) I'm
> closing this WONTFIX to get it off the radar.
It's just the first I found because bugzilla has a slow, less than
useful search engine. We know that real applications have
hit this, and we know even the overhead of timestamp updates on
writes is way too high for them.
> The typical user may just skip an upgrade or otherwise work around the
> problem rather than root-causing it like this, so absence of reports
> isn't conclusive. I understand wanting to err on the side of caution.
Yup, it's a generic problem - just because we've worked around or
mitigated the most common situations it impacts performance, that
doesn't mean they work for everyone....
Cheers,
Dave.
--
Dave Chinner
[email protected]
On 6/18/20 3:39 PM, Dave Chinner wrote:
> On Wed, Jun 17, 2020 at 11:45:35PM -0400, Masayoshi Mizuma wrote:
...
>> Thank you for pointed it out.
>> How about following change? I believe it works both xfs and btrfs...
>>
>> diff --git a/fs/super.c b/fs/super.c
>> index b0a511bef4a0..42fc6334d384 100644
>> --- a/fs/super.c
>> +++ b/fs/super.c
>> @@ -973,6 +973,9 @@ int reconfigure_super(struct fs_context *fc)
>> }
>> }
>>
>> + if (sb->s_flags & SB_I_VERSION)
>> + fc->sb_flags |= MS_I_VERSION;
>> +
>> WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
>> (fc->sb_flags & fc->sb_flags_mask)));
>> /* Needs to be ordered wrt mnt_is_readonly() */
>
> This will prevent SB_I_VERSION from being turned off at all. That
> will break existing filesystems that allow SB_I_VERSION to be turned
> off on remount, such as ext4.
>
> The manipulations here need to be in the filesystem specific code;
> we screwed this one up so badly there is no "one size fits all"
> behaviour that we can implement in the generic code...
I wandered back into this thread for some reason ... ;)
Since iversion/noiversion is /already/ advertised as a vfs-level mount option,
wouldn't exposing it in /proc/mounts solve the original problem here?
("i_version" is wrong, because it's ext4-specific, but "iversion" is handled
by the vfs, so it's meaningful for any filesystems, and it will also trivially
allow mount(2) to preserve it across remounts for all filesystems that set it by
default.)
Seems like that's the fastest path to fixing the current problems, even if a
long-term goal may be to deprecate it altogether.
-Eric
On Mon, Jul 13, 2020 at 04:45:19PM -0700, Eric Sandeen wrote:
> I wandered back into this thread for some reason ... ;)
>
> Since iversion/noiversion is /already/ advertised as a vfs-level mount option,
> wouldn't exposing it in /proc/mounts solve the original problem here?
>
> ("i_version" is wrong, because it's ext4-specific, but "iversion" is handled
> by the vfs, so it's meaningful for any filesystems, and it will also trivially
> allow mount(2) to preserve it across remounts for all filesystems that set it by
> default.)
>
> Seems like that's the fastest path to fixing the current problems, even if a
> long-term goal may be to deprecate it altogether.
But they should not be exposed as a mount option. E.g. for XFS we
decide internally if we have a useful i_version or not, totally
independent of the mount option that leaked into the VFS. So we'll
need to fix how the flag is used before doing any new work in this area.
On 7/14/20 1:30 AM, Christoph Hellwig wrote:
> On Mon, Jul 13, 2020 at 04:45:19PM -0700, Eric Sandeen wrote:
>> I wandered back into this thread for some reason ... ;)
>>
>> Since iversion/noiversion is /already/ advertised as a vfs-level mount option,
>> wouldn't exposing it in /proc/mounts solve the original problem here?
>>
>> ("i_version" is wrong, because it's ext4-specific, but "iversion" is handled
>> by the vfs, so it's meaningful for any filesystems, and it will also trivially
>> allow mount(2) to preserve it across remounts for all filesystems that set it by
>> default.)
>>
>> Seems like that's the fastest path to fixing the current problems, even if a
>> long-term goal may be to deprecate it altogether.
>
> But they should not be exposed as a mount option. E.g. for XFS we
> decide internally if we have a useful i_version or not, totally
> independent of the mount option that leaked into the VFS. So we'll
> need to fix how the flag is used before doing any new work in this area.
>
It's been explicitly exposed, documented, fixed, updated etc for about
12 years now.
I was just hoping to make the current situation - even if we regret its
mere existence - less broken, because going down a deprecation path will
take us a while even if we choose it.
In the meantime I'll just make sure xfs isn't broken on remount, but had
hoped for a more general fix. *shrug*
-Eric