2022-08-19 12:03:28

by Jeff Layton

Subject: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

From: Jeff Layton <[email protected]>

The NFS server and IMA both rely heavily on the i_version counter, but
it's largely invisible to userland, which makes it difficult to test its
behavior. This value would also be of use to userland NFS servers, and
other applications that want a reliable way to know if there was an
explicit change to an inode since they last checked.

Claim one of the spare fields in struct statx to hold a 64-bit inode
version attribute. This value must change with any explicit, observable
metadata or data change. Note that atime updates are excluded from this,
unless they are due to an explicit change via utimes or a similar mechanism.

When statx requests this attribute on an IS_I_VERSION inode, do an
inode_query_iversion and fill the result in the field. Also, update the
test-statx.c program to display the inode version and the mountid.

Cc: David Howells <[email protected]>
Cc: Frank Filz <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/stat.c | 7 +++++++
include/linux/stat.h | 1 +
include/uapi/linux/stat.h | 3 ++-
samples/vfs/test-statx.c | 8 ++++++--
4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 9ced8860e0f3..d892909836aa 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -17,6 +17,7 @@
 #include <linux/syscalls.h>
 #include <linux/pagemap.h>
 #include <linux/compat.h>
+#include <linux/iversion.h>

 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -118,6 +119,11 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
 				  STATX_ATTR_DAX);

+	if ((request_mask & STATX_INO_VERSION) && IS_I_VERSION(inode)) {
+		stat->result_mask |= STATX_INO_VERSION;
+		stat->ino_version = inode_query_iversion(inode);
+	}
+
 	mnt_userns = mnt_user_ns(path->mnt);
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(mnt_userns, path, stat,
@@ -611,6 +617,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_dev_major = MAJOR(stat->dev);
 	tmp.stx_dev_minor = MINOR(stat->dev);
 	tmp.stx_mnt_id = stat->mnt_id;
+	tmp.stx_ino_version = stat->ino_version;

 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 7df06931f25d..9cd77eb7bc1a 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -50,6 +50,7 @@ struct kstat {
 	struct timespec64 btime;			/* File creation time */
 	u64		blocks;
 	u64		mnt_id;
+	u64		ino_version;
 };

 #endif
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1500a0f58041..48d9307d7f31 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -124,7 +124,7 @@ struct statx {
 	__u32	stx_dev_minor;
 	/* 0x90 */
 	__u64	stx_mnt_id;
-	__u64	__spare2;
+	__u64	stx_ino_version; /* Inode change attribute */
 	/* 0xa0 */
 	__u64	__spare3[12];	/* Spare space for future expansion */
 	/* 0x100 */
@@ -152,6 +152,7 @@ struct statx {
 #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
 #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
 #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
+#define STATX_INO_VERSION	0x00002000U	/* Want/got stx_ino_version */

 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */

diff --git a/samples/vfs/test-statx.c b/samples/vfs/test-statx.c
index 49c7a46cee07..23e68036fdfb 100644
--- a/samples/vfs/test-statx.c
+++ b/samples/vfs/test-statx.c
@@ -107,6 +107,8 @@ static void dump_statx(struct statx *stx)
 	printf("Device: %-15s", buffer);
 	if (stx->stx_mask & STATX_INO)
 		printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino);
+	if (stx->stx_mask & STATX_MNT_ID)
+		printf(" MountId: %llx", (unsigned long long) stx->stx_mnt_id);
 	if (stx->stx_mask & STATX_NLINK)
 		printf(" Links: %-5u", stx->stx_nlink);
 	if (stx->stx_mask & STATX_TYPE) {
@@ -145,7 +147,9 @@ static void dump_statx(struct statx *stx)
 	if (stx->stx_mask & STATX_CTIME)
 		print_time("Change: ", &stx->stx_ctime);
 	if (stx->stx_mask & STATX_BTIME)
-		print_time(" Birth: ", &stx->stx_btime);
+		print_time("Birth: ", &stx->stx_btime);
+	if (stx->stx_mask & STATX_INO_VERSION)
+		printf("Inode Version: 0x%llx\n", (unsigned long long) stx->stx_ino_version);

 	if (stx->stx_attributes_mask) {
 		unsigned char bits, mbits;
@@ -218,7 +222,7 @@ int main(int argc, char **argv)
 	struct statx stx;
 	int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;

-	unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
+	unsigned int mask = STATX_BASIC_STATS | STATX_BTIME | STATX_MNT_ID | STATX_INO_VERSION;

 	for (argv++; *argv; argv++) {
 		if (strcmp(*argv, "-F") == 0) {
--
2.37.2
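
For readers who want to poke at the proposed interface from userspace, here is a minimal sketch of a caller. It assumes a kernel carrying this patch; the `SKETCH_` names and the `query_ino_version()` helper are hypothetical and not part of the patch. Stock libc headers have no `stx_ino_version` member, so the sketch reads the 64-bit value at offset 0x98, which is where the layout comments in the patched `struct statx` place it (right after `stx_mnt_id` at 0x90). On an unpatched kernel the result is not meaningful: mainline later assigned mask bit 0x2000 to a different attribute (`STATX_DIOALIGN`).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Assumptions taken from the patch above, not from installed headers. */
#define SKETCH_STATX_INO_VERSION   0x00002000U /* proposed STATX_INO_VERSION */
#define SKETCH_INO_VERSION_OFFSET  0x98        /* proposed stx_ino_version */

/*
 * Hypothetical helper.
 * Returns 1 and fills *version if the kernel reported the attribute,
 * 0 if statx succeeded but the mask bit was not set in stx_mask
 * (for example, the inode is not IS_I_VERSION), and -1 on error.
 */
static int query_ino_version(const char *path, uint64_t *version)
{
	struct statx stx;

	assert(version != NULL);
	memset(&stx, 0, sizeof(stx));

	if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW,
		  STATX_BASIC_STATS | SKETCH_STATX_INO_VERSION, &stx) != 0)
		return -1;

	/* The kernel only sets the bit when it filled in the field. */
	if (!(stx.stx_mask & SKETCH_STATX_INO_VERSION))
		return 0;

	memcpy(version, (const char *)&stx + SKETCH_INO_VERSION_OFFSET,
	       sizeof(*version));
	return 1;
}
```

A caller has to check the return value rather than the field alone, because with this patch the bit is only set in the result mask for IS_I_VERSION inodes.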


2022-08-23 13:11:43

by Florian Weimer

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

* Jeff Layton:

> From: Jeff Layton <[email protected]>
>
> The NFS server and IMA both rely heavily on the i_version counter, but
> it's largely invisible to userland, which makes it difficult to test its
> behavior. This value would also be of use to userland NFS servers, and
> other applications that want a reliable way to know if there was an
> explicit change to an inode since they last checked.
>
> Claim one of the spare fields in struct statx to hold a 64-bit inode
> version attribute. This value must change with any explicit, observable
> metadata or data change. Note that atime updates are excluded from this,
> unless they are due to an explicit change via utimes or a similar mechanism.
>
> When statx requests this attribute on an IS_I_VERSION inode, do an
> inode_query_iversion and fill the result in the field. Also, update the
> test-statx.c program to display the inode version and the mountid.

Will the version survive reboots? Is it stored on disks? Can backup
tools (and others) use this to check if the file has changed since the
last time the version has been observed?

Thanks,
Florian

2022-08-23 13:21:17

by Jeff Layton

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

On Tue, 2022-08-23 at 12:01 +0200, Florian Weimer wrote:
> * Jeff Layton:
>
> > From: Jeff Layton <[email protected]>
> >
> > The NFS server and IMA both rely heavily on the i_version counter, but
> > it's largely invisible to userland, which makes it difficult to test its
> > behavior. This value would also be of use to userland NFS servers, and
> > other applications that want a reliable way to know if there was an
> > explicit change to an inode since they last checked.
> >
> > Claim one of the spare fields in struct statx to hold a 64-bit inode
> > version attribute. This value must change with any explicit, observable
> > metadata or data change. Note that atime updates are excluded from this,
> > unless they are due to an explicit change via utimes or a similar mechanism.
> >
> > When statx requests this attribute on an IS_I_VERSION inode, do an
> > inode_query_iversion and fill the result in the field. Also, update the
> > test-statx.c program to display the inode version and the mountid.
>
> Will the version survive reboots? Is it stored on disks? Can backup
> tools (and others) use this to check if the file has changed since the
> last time the version has been observed?
>


The answer to all of those questions is "yes".
--
Jeff Layton <[email protected]>
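
To make the backup-tool question concrete, here is a minimal sketch of the comparison such a tool might perform between two scans. The `struct snapshot` record and the `needs_backup()` helper are hypothetical, not anything from the patch. The sketch assumes the tool saved whether the version attribute was reported at all, since the patch only sets the mask bit for IS_I_VERSION inodes, and it falls back to ctime otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file record a backup tool might persist between runs. */
struct snapshot {
	bool     has_version; /* was the version bit set in stx_mask? */
	uint64_t version;     /* meaningful only when has_version is true */
	int64_t  ctime_sec;
	uint32_t ctime_nsec;
};

/* Return true if the file must be treated as changed since 'old'. */
static bool needs_backup(const struct snapshot *old,
			 const struct snapshot *cur)
{
	/*
	 * Compare for inequality only: the patch description says the
	 * value must change, not that it is ordered in any way.
	 */
	if (old->has_version && cur->has_version)
		return old->version != cur->version;

	/* No version on one side: fall back to the ctime comparison. */
	return old->ctime_sec != cur->ctime_sec ||
	       old->ctime_nsec != cur->ctime_nsec;
}
```

The first branch is what catches a change that lands within the timestamp granularity, where ctime looks identical in both scans.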

2022-08-23 22:11:42

by Dave Chinner

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

On Fri, Aug 19, 2022 at 07:56:41AM -0400, Jeff Layton wrote:
> From: Jeff Layton <[email protected]>
>
> The NFS server and IMA both rely heavily on the i_version counter, but
> it's largely invisible to userland, which makes it difficult to test its
> behavior. This value would also be of use to userland NFS servers, and
> other applications that want a reliable way to know if there was an
> explicit change to an inode since they last checked.
>
> Claim one of the spare fields in struct statx to hold a 64-bit inode
> version attribute. This value must change with any explicit, observable
> metadata or data change. Note that atime updates are excluded from this,
> unless they are due to an explicit change via utimes or a similar mechanism.
>
> When statx requests this attribute on an IS_I_VERSION inode, do an
> inode_query_iversion and fill the result in the field. Also, update the
> test-statx.c program to display the inode version and the mountid.
>
> Cc: David Howells <[email protected]>
> Cc: Frank Filz <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>

NAK.

There's no definition of what constitutes an "inode change" and this
exposes internal filesystem implementation details (i.e. on disk
format behaviour) directly to userspace. That means when the
internal filesystem behaviour changes, userspace applications will
see stat->ino_version change, and that could break them.

We *need a documented specification* for the behaviour we are exposing to
userspace here, and then individual filesystems need to opt into
providing this information as they are modified to conform to the
behaviour we are exposing directly to userspace.

Jeff - can you please stop posting iversion patches to different
subsystems as individual, unrelated patchsets and start posting all
the changes - statx, ext4, xfs, man pages, etc as a single patchset
so the discussion can be centralised in one place and not spread
over half a dozen disconnected threads?

-Dave.
--
Dave Chinner
[email protected]

2022-08-24 10:20:04

by Jeff Layton

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

On Wed, 2022-08-24 at 07:53 +1000, Dave Chinner wrote:
> On Fri, Aug 19, 2022 at 07:56:41AM -0400, Jeff Layton wrote:
> > From: Jeff Layton <[email protected]>
> >
> > The NFS server and IMA both rely heavily on the i_version counter, but
> > it's largely invisible to userland, which makes it difficult to test its
> > behavior. This value would also be of use to userland NFS servers, and
> > other applications that want a reliable way to know if there was an
> > explicit change to an inode since they last checked.
> >
> > Claim one of the spare fields in struct statx to hold a 64-bit inode
> > version attribute. This value must change with any explicit, observable
> > metadata or data change. Note that atime updates are excluded from this,
> > unless they are due to an explicit change via utimes or a similar mechanism.
> >
> > When statx requests this attribute on an IS_I_VERSION inode, do an
> > inode_query_iversion and fill the result in the field. Also, update the
> > test-statx.c program to display the inode version and the mountid.
> >
> > Cc: David Howells <[email protected]>
> > Cc: Frank Filz <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
>
> NAK.
>
> There's no definition of what constitutes an "inode change" and this
> exposes internal filesystem implementation details (i.e. on disk
> format behaviour) directly to userspace. That means when the
> internal filesystem behaviour changes, userspace applications will
> see stat->ino_version change, and that could break them.
>
> We *need a documented specification* for the behaviour we are exposing to
> userspace here, and then individual filesystems need to opt into
> providing this information as they are modified to conform to the
> behaviour we are exposing directly to userspace.
>
> Jeff - can you please stop posting iversion patches to different
> subsystems as individual, unrelated patchsets and start posting all
> the changes - statx, ext4, xfs, man pages, etc as a single patchset
> so the discussion can be centralised in one place and not spread
> over half a dozen disconnected threads?
>


Sure. Give me a few days and I'll post a more coherent set of patches.

Thanks,
--
Jeff Layton <[email protected]>

2022-08-25 18:51:49

by Colin Walters

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes



On Tue, Aug 23, 2022, at 5:53 PM, Dave Chinner wrote:
>
> There's no definition of what constitutes an "inode change" and this
> exposes internal filesystem implementation details (i.e. on disk
> format behaviour) directly to userspace. That means when the
> internal filesystem behaviour changes, userspace applications will
> see stat->ino_version change, and that could break them.

As a userspace developer (ostree, etc.) who is definitely interested in this functionality, I do agree with this concern. A random drive-by comment, though: would it be helpful to expose iversion (or other bits like this from the vfs) via e.g. debugfs to start? I think that'd unblock writing fstests in the short term, right?


2022-08-25 20:02:35

by Jeff Layton

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

On Thu, 2022-08-25 at 14:48 -0400, Colin Walters wrote:
>
> On Tue, Aug 23, 2022, at 5:53 PM, Dave Chinner wrote:
> >
> > There's no definition of what constitutes an "inode change" and this
> > exposes internal filesystem implementation details (i.e. on disk
> > format behaviour) directly to userspace. That means when the
> > internal filesystem behaviour changes, userspace applications will
> > see stat->ino_version change, and that could break them.
>
> As a userspace developer (ostree, etc.) who is definitely interested in this functionality, I do agree with this concern. A random drive-by comment, though: would it be helpful to expose iversion (or other bits like this from the vfs) via e.g. debugfs to start? I think that'd unblock writing fstests in the short term, right?
>
>

It's great to hear from userland developers who are interested in this!

I don't think there is a lot of controversy about the idea of presenting
a value like this via statx. The usefulness seems pretty obvious if
you've ever had to deal with timestamp granularity issues.

The part we're wrestling with now is that applications will need a clear
(and testable!) definition of what this value means. We need to be very
careful how we define this so that userland developers don't get stuck
dealing with semantics that vary per fstype, while still allowing the
broadest range of filesystems to support it.

My current thinking is to define this such that the reported ino_version
MUST change any time that the ctime would change (even if the timestamp
doesn't appear to change). That should also catch mtime updates.

The part I'm still conflicted about is whether we should allow for a
conformant implementation to increment the value even when there is no
apparent change to the inode.

IOW, should this value mean that something _did_ change in the inode or
that something _may_ have changed in it?

Implementations that do spurious increments would be less than ideal, but
defining it that way might allow a broader range of filesystems to
present this value.

What would you prefer, as a userland developer?
--
Jeff Layton <[email protected]>
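
The two candidate definitions discussed in this message can be written down as a small checker of the kind a test suite might use. This is only a sketch under stated assumptions: `check_step()` and its parameters are hypothetical, and the test harness is assumed to know whether the operation it just performed is one that would update ctime. With `allow_spurious` set, the value means "something may have changed"; with it clear, the value means "something did change".

```c
#include <stdbool.h>
#include <stdint.h>

enum verdict { VERDICT_OK, VERDICT_VIOLATION };

/*
 * Hypothetical conformance check for one test step.
 *
 * ctime_should_change: the step performed an operation that would
 *                      update ctime (even if the timestamp looks equal).
 * before/after:        version values sampled around the step.
 * allow_spurious:      true for "may have changed" semantics, false for
 *                      the stricter "did change" semantics.
 */
static enum verdict check_step(bool ctime_should_change,
			       uint64_t before, uint64_t after,
			       bool allow_spurious)
{
	bool changed = before != after;

	/* Required under both definitions: a ctime update bumps the value. */
	if (ctime_should_change && !changed)
		return VERDICT_VIOLATION;

	/* Only the strict definition forbids a bump with no change. */
	if (!ctime_should_change && changed && !allow_spurious)
		return VERDICT_VIOLATION;

	return VERDICT_OK;
}
```

The only place the two definitions disagree is the second check, which is exactly the case of an increment with no apparent change to the inode.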

2022-08-26 08:46:57

by Colin Walters

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

Bigger picture, I think eventually I'm going to rework stuff related to my use case to be more similar to the container stack, specifically using overlayfs; so it's quite possible by the time iversion is exposed to userspace, I won't have any strong want/need of it myself.

On Thu, Aug 25, 2022, at 3:48 PM, Jeff Layton wrote:

> IOW, should this value mean that something _did_ change in the inode or
> that something _may_ have changed in it?

In my case it's basically the same as IMA - we want to only compute the sha256 digest of files that actually changed. Some false positives are hence OK - but that also means the usefulness of the feature degrades in proportion to that number.

A bit more detail:

I didn't deep dive into the XFS mention about internal/background iversion changes, but AIUI at a high level it sounds like those iversion changes happen mainly (only?) when the file is recently created and pending writeback, which doesn't seem like a problem in practice. I do agree with Ingo's old quote about atime though in https://lwn.net/Articles/244829/ and this thread reminded me to use `noatime` on my main workstation (again; I'd recently changed how I provision it).
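
The digest use case described above, where a false positive costs one extra hash and nothing else, can be sketched as a version-keyed cache. Everything here is hypothetical: `struct cache_entry`, `cached_digest()` and the call counter are illustration only, and a 64-bit FNV-1a hash stands in for sha256 to keep the sketch self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache slot for one file. */
struct cache_entry {
	bool     valid;
	uint64_t version; /* version seen when 'digest' was computed */
	uint64_t digest;
};

static unsigned long digest_calls; /* counts the expensive recomputations */

/* Stand-in for sha256: 64-bit FNV-1a over the file contents. */
static uint64_t toy_digest(const unsigned char *data, size_t len)
{
	uint64_t h = 14695981039346656037ULL;

	digest_calls++;
	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Recompute the digest only when the reported version differs. */
static uint64_t cached_digest(struct cache_entry *e, uint64_t version,
			      const unsigned char *data, size_t len)
{
	if (!e->valid || e->version != version) {
		e->digest = toy_digest(data, len);
		e->version = version;
		e->valid = true;
	}
	return e->digest;
}
```

A spurious increment shows up here as one more call to `toy_digest()` that returns the same value as before, which is why the usefulness of the attribute degrades with the number of such increments rather than failing outright.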




2022-08-27 07:47:00

by Greg Kroah-Hartman

Subject: Re: [PATCH] vfs: report an inode version in statx for IS_I_VERSION inodes

On Thu, Aug 25, 2022 at 02:48:02PM -0400, Colin Walters wrote:
>
>
> On Tue, Aug 23, 2022, at 5:53 PM, Dave Chinner wrote:
> >
> > There's no definition of what constitutes an "inode change" and this
> > exposes internal filesystem implementation details (i.e. on disk
> > format behaviour) directly to userspace. That means when the
> > internal filesystem behaviour changes, userspace applications will
> > see stat->ino_version change, and that could break them.
>
> As a userspace developer (ostree, etc.) who is definitely interested in this functionality, I do agree with this concern. A random drive-by comment, though: would it be helpful to expose iversion (or other bits like this from the vfs) via e.g. debugfs to start? I think that'd unblock writing fstests in the short term, right?
>
>

This would not work at all for "virtual" filesystems like debugfs and
sysfs which only create the data when the file is read, and there's no
way to know if the data is going to be different than the last time it
was read, sorry.

greg k-h