The setattr codepath is still using coarse-grained timestamps, even on
multigrain filesystems. To fix this, we need to fetch the timestamp for
ctime updates later, at the point where the assignment occurs in
setattr_copy.
On a multigrain inode, ignore the ia_ctime in the attrs, and always
update the ctime to the current clock value. Update the atime and mtime
with the same value (if needed) unless they are being set to other
specific values, a'la utimes().
Note that we don't want to do this universally however, as some
filesystems (e.g. most networked fs) want to do an explicit update
elsewhere before updating the local inode.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/attr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/fs/attr.c b/fs/attr.c
index a8ae5f6d9b16..8ba330e6a582 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -275,6 +275,42 @@ int inode_newsize_ok(const struct inode *inode, loff_t offset)
}
EXPORT_SYMBOL(inode_newsize_ok);
+/**
+ * setattr_copy_mgtime - update timestamps for mgtime inodes
+ * @inode: inode timestamps to be updated
+ * @attr: attrs for the update
+ *
+ * With multigrain timestamps, we need to take more care to prevent races
+ * when updating the ctime. Always update the ctime to the very latest
+ * using the standard mechanism, and use that to populate the atime and
+ * mtime appropriately (unless we're setting those to specific values).
+ */
+static void setattr_copy_mgtime(struct inode *inode, const struct iattr *attr)
+{
+ unsigned int ia_valid = attr->ia_valid;
+ struct timespec64 now;
+
+ /*
+ * If the ctime isn't being updated then nothing else should be
+ * either.
+ */
+ if (!(ia_valid & ATTR_CTIME)) {
+ WARN_ON_ONCE(ia_valid & (ATTR_ATIME|ATTR_MTIME));
+ return;
+ }
+
+ now = inode_set_ctime_current(inode);
+ if (ia_valid & ATTR_ATIME_SET)
+ inode->i_atime = attr->ia_atime;
+ else if (ia_valid & ATTR_ATIME)
+ inode->i_atime = now;
+
+ if (ia_valid & ATTR_MTIME_SET)
+ inode->i_mtime = attr->ia_mtime;
+ else if (ia_valid & ATTR_MTIME)
+ inode->i_mtime = now;
+}
+
/**
* setattr_copy - copy simple metadata updates into the generic inode
* @idmap: idmap of the mount the inode was found from
@@ -307,12 +343,6 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode,
i_uid_update(idmap, attr, inode);
i_gid_update(idmap, attr, inode);
- if (ia_valid & ATTR_ATIME)
- inode->i_atime = attr->ia_atime;
- if (ia_valid & ATTR_MTIME)
- inode->i_mtime = attr->ia_mtime;
- if (ia_valid & ATTR_CTIME)
- inode_set_ctime_to_ts(inode, attr->ia_ctime);
if (ia_valid & ATTR_MODE) {
umode_t mode = attr->ia_mode;
if (!in_group_or_capable(idmap, inode,
@@ -320,6 +350,16 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode,
mode &= ~S_ISGID;
inode->i_mode = mode;
}
+
+ if (is_mgtime(inode))
+ return setattr_copy_mgtime(inode, attr);
+
+ if (ia_valid & ATTR_ATIME)
+ inode->i_atime = attr->ia_atime;
+ if (ia_valid & ATTR_MTIME)
+ inode->i_mtime = attr->ia_mtime;
+ if (ia_valid & ATTR_CTIME)
+ inode_set_ctime_to_ts(inode, attr->ia_ctime);
}
EXPORT_SYMBOL(setattr_copy);
---
base-commit: c5d9e87c18026d4caee80cd60a143f2b6564d446
change-id: 20230830-kdevops-2f154434ffd3
Best regards,
--
Jeff Layton <[email protected]>
On Wed, 30 Aug 2023 14:28:43 -0400, Jeff Layton wrote:
> The setattr codepath is still using coarse-grained timestamps, even on
> multigrain filesystems. To fix this, we need to fetch the timestamp for
> ctime updates later, at the point where the assignment occurs in
> setattr_copy.
>
> On a multigrain inode, ignore the ia_ctime in the attrs, and always
> update the ctime to the current clock value. Update the atime and mtime
> with the same value (if needed) unless they are being set to other
> specific values, a'la utimes().
>
> [...]
Picking this into vfs.ctime. Let me know if this has to go somewhere else.
---
Applied to the vfs.ctime branch of the vfs/vfs.git tree.
Patches in the vfs.ctime branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.ctime
[1/1] fs: have setattr_copy handle multigrain timestamps appropriately
https://git.kernel.org/vfs/vfs/c/2846ab555339
Hi Jeff,
On Wed, Aug 30, 2023 at 02:28:43PM -0400, Jeff Layton wrote:
> The setattr codepath is still using coarse-grained timestamps, even on
> multigrain filesystems. To fix this, we need to fetch the timestamp for
> ctime updates later, at the point where the assignment occurs in
> setattr_copy.
>
> On a multigrain inode, ignore the ia_ctime in the attrs, and always
> update the ctime to the current clock value. Update the atime and mtime
> with the same value (if needed) unless they are being set to other
> specific values, a'la utimes().
>
> Note that we don't want to do this universally however, as some
> filesystems (e.g. most networked fs) want to do an explicit update
> elsewhere before updating the local inode.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/attr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/fs/attr.c b/fs/attr.c
> index a8ae5f6d9b16..8ba330e6a582 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -275,6 +275,42 @@ int inode_newsize_ok(const struct inode *inode, loff_t offset)
> }
> EXPORT_SYMBOL(inode_newsize_ok);
>
> +/**
> + * setattr_copy_mgtime - update timestamps for mgtime inodes
> + * @inode: inode timestamps to be updated
> + * @attr: attrs for the update
> + *
> + * With multigrain timestamps, we need to take more care to prevent races
> + * when updating the ctime. Always update the ctime to the very latest
> + * using the standard mechanism, and use that to populate the atime and
> + * mtime appropriately (unless we're setting those to specific values).
> + */
> +static void setattr_copy_mgtime(struct inode *inode, const struct iattr *attr)
> +{
> + unsigned int ia_valid = attr->ia_valid;
> + struct timespec64 now;
> +
> + /*
> + * If the ctime isn't being updated then nothing else should be
> + * either.
> + */
> + if (!(ia_valid & ATTR_CTIME)) {
> + WARN_ON_ONCE(ia_valid & (ATTR_ATIME|ATTR_MTIME));
> + return;
> + }
After this change in -next as commit d6f106662147 ("fs: have
setattr_copy handle multigrain timestamps appropriately"), I see the
following warning on all of my machines when starting my containers with
podman/docker:
[ 0.000000] Linux version 6.6.0-rc1-00001-gd6f106662147 ([email protected]) (x86_64-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Tue Sep 12 16:01:41 MST 2023
...
[ 91.484884] ------------[ cut here ]------------
[ 91.484889] WARNING: CPU: 2 PID: 721 at fs/attr.c:298 setattr_copy+0x106/0x1b0
[ 91.484920] Modules linked in:
[ 91.484923] CPU: 2 PID: 721 Comm: podman Not tainted 6.6.0-rc1-00001-gd6f106662147 #1 33e7e76d587862399371bcd9c318a44e2ec9ced1
[ 91.484927] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 91.484930] RIP: 0010:setattr_copy+0x106/0x1b0
[ 91.484933] Code: 44 0f 44 f8 48 8b 43 28 66 44 89 3b 48 8b 40 28 f6 40 08 40 0f 84 59 ff ff ff 45 8b 2c 24 41 f6 c5 40 75 49 41 83 e5 30 74 94 <0f> 0b eb 90 48 8b 43 28 41 8b 54 24 0c 4c 89 f7 48 8b b0 90 04 00
[ 91.484934] RSP: 0018:ffffc90003d8f778 EFLAGS: 00010206
[ 91.484940] RAX: ffffffff9bb407c0 RBX: ffff888102ae44f8 RCX: ffff8881127ad0c0
[ 91.484941] RDX: ffffc90003d8f900 RSI: ffff888102ae44f8 RDI: ffffffff9bb346a0
[ 91.484942] RBP: ffffc90003d8f7a0 R08: ffff888101017300 R09: ffff888102ae44f8
[ 91.484943] R10: 000000006500ee8f R11: 000000000cf2ee13 R12: ffffc90003d8f900
[ 91.484943] R13: 0000000000000030 R14: ffffffff9bb346a0 R15: 0000000000000000
[ 91.484946] FS: 00007f155effd6c0(0000) GS:ffff88846fc80000(0000) knlGS:0000000000000000
[ 91.484947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 91.484948] CR2: 00007f155dffbd58 CR3: 000000010694a000 CR4: 0000000000350ee0
[ 91.484950] Call Trace:
[ 91.484951] <TASK>
[ 91.484952] ? setattr_copy+0x106/0x1b0
[ 91.484956] ? __warn+0x81/0x130
[ 91.484969] ? setattr_copy+0x106/0x1b0
[ 91.484971] ? report_bug+0x171/0x1a0
[ 91.484987] ? handle_bug+0x3c/0x80
[ 91.484991] ? exc_invalid_op+0x17/0x70
[ 91.484992] ? asm_exc_invalid_op+0x1a/0x20
[ 91.485007] ? setattr_copy+0x106/0x1b0
[ 91.485009] btrfs_setattr+0x3d0/0x830
[ 91.485021] ? btrfs_setattr+0x3ea/0x830
[ 91.485023] notify_change+0x1f5/0x4b0
[ 91.485028] ? ovl_set_timestamps.isra.0+0x7d/0xa0
[ 91.485033] ovl_set_timestamps.isra.0+0x7d/0xa0
[ 91.485040] ovl_set_attr.part.0+0x9f/0xb0
[ 91.485043] ovl_copy_up_metadata+0xb1/0x210
[ 91.485046] ? ovl_mkdir_real+0x32/0xc0
[ 91.485048] ovl_copy_up_one+0x6ab/0x14f0
[ 91.485050] ? btrfs_search_slot+0x8c8/0xd00
[ 91.485056] ? xa_load+0x8c/0xe0
[ 91.485063] ovl_copy_up_flags+0xcf/0x100
[ 91.485067] ovl_do_remove+0xa5/0x500
[ 91.485068] ? inode_permission+0xde/0x190
[ 91.485075] ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[ 91.485081] ? security_inode_permission+0x3e/0x60
[ 91.485090] vfs_unlink+0x112/0x280
[ 91.485094] do_unlinkat+0x14b/0x320
[ 91.485096] __x64_sys_unlinkat+0x37/0x70
[ 91.485098] do_syscall_64+0x60/0x90
[ 91.485100] ? syscall_exit_to_user_mode+0x2b/0x40
[ 91.485104] ? do_syscall_64+0x6c/0x90
[ 91.485106] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 91.485108] RIP: 0033:0x55d4e18b04ee
[ 91.485112] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 91.485113] RSP: 002b:000000c0003874a0 EFLAGS: 00000212 ORIG_RAX: 0000000000000107
[ 91.485114] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 000055d4e18b04ee
[ 91.485115] RDX: 0000000000000000 RSI: 000000c00027e1c0 RDI: 000000000000000a
[ 91.485116] RBP: 000000c0003874e0 R08: 0000000000000000 R09: 0000000000000000
[ 91.485116] R10: 0000000000000000 R11: 0000000000000212 R12: 000000000000001c
[ 91.485117] R13: 000000c000100800 R14: 000000c0000061a0 R15: 000000c0002f0060
[ 91.485119] </TASK>
[ 91.485119] ---[ end trace 0000000000000000 ]---
Is this expected?
Cheers,
Nathan
On Tue, 2023-09-12 at 16:10 -0700, Nathan Chancellor wrote:
> Hi Jeff,
>
> On Wed, Aug 30, 2023 at 02:28:43PM -0400, Jeff Layton wrote:
> > The setattr codepath is still using coarse-grained timestamps, even on
> > multigrain filesystems. To fix this, we need to fetch the timestamp for
> > ctime updates later, at the point where the assignment occurs in
> > setattr_copy.
> >
> > On a multigrain inode, ignore the ia_ctime in the attrs, and always
> > update the ctime to the current clock value. Update the atime and mtime
> > with the same value (if needed) unless they are being set to other
> > specific values, a'la utimes().
> >
> > Note that we don't want to do this universally however, as some
> > filesystems (e.g. most networked fs) want to do an explicit update
> > elsewhere before updating the local inode.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/attr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/attr.c b/fs/attr.c
> > index a8ae5f6d9b16..8ba330e6a582 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -275,6 +275,42 @@ int inode_newsize_ok(const struct inode *inode, loff_t offset)
> > }
> > EXPORT_SYMBOL(inode_newsize_ok);
> >
> > +/**
> > + * setattr_copy_mgtime - update timestamps for mgtime inodes
> > + * @inode: inode timestamps to be updated
> > + * @attr: attrs for the update
> > + *
> > + * With multigrain timestamps, we need to take more care to prevent races
> > + * when updating the ctime. Always update the ctime to the very latest
> > + * using the standard mechanism, and use that to populate the atime and
> > + * mtime appropriately (unless we're setting those to specific values).
> > + */
> > +static void setattr_copy_mgtime(struct inode *inode, const struct iattr *attr)
> > +{
> > + unsigned int ia_valid = attr->ia_valid;
> > + struct timespec64 now;
> > +
> > + /*
> > + * If the ctime isn't being updated then nothing else should be
> > + * either.
> > + */
> > + if (!(ia_valid & ATTR_CTIME)) {
> > + WARN_ON_ONCE(ia_valid & (ATTR_ATIME|ATTR_MTIME));
> > + return;
> > + }
>
> After this change in -next as commit d6f106662147 ("fs: have
> setattr_copy handle multigrain timestamps appropriately"), I see the
> following warning on all of my machines when starting my containers with
> podman/docker:
>
> [ 0.000000] Linux version 6.6.0-rc1-00001-gd6f106662147 ([email protected]) (x86_64-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Tue Sep 12 16:01:41 MST 2023
> ...
> [ 91.484884] ------------[ cut here ]------------
> [ 91.484889] WARNING: CPU: 2 PID: 721 at fs/attr.c:298 setattr_copy+0x106/0x1b0
> [ 91.484920] Modules linked in:
> [ 91.484923] CPU: 2 PID: 721 Comm: podman Not tainted 6.6.0-rc1-00001-gd6f106662147 #1 33e7e76d587862399371bcd9c318a44e2ec9ced1
> [ 91.484927] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 91.484930] RIP: 0010:setattr_copy+0x106/0x1b0
> [ 91.484933] Code: 44 0f 44 f8 48 8b 43 28 66 44 89 3b 48 8b 40 28 f6 40 08 40 0f 84 59 ff ff ff 45 8b 2c 24 41 f6 c5 40 75 49 41 83 e5 30 74 94 <0f> 0b eb 90 48 8b 43 28 41 8b 54 24 0c 4c 89 f7 48 8b b0 90 04 00
> [ 91.484934] RSP: 0018:ffffc90003d8f778 EFLAGS: 00010206
> [ 91.484940] RAX: ffffffff9bb407c0 RBX: ffff888102ae44f8 RCX: ffff8881127ad0c0
> [ 91.484941] RDX: ffffc90003d8f900 RSI: ffff888102ae44f8 RDI: ffffffff9bb346a0
> [ 91.484942] RBP: ffffc90003d8f7a0 R08: ffff888101017300 R09: ffff888102ae44f8
> [ 91.484943] R10: 000000006500ee8f R11: 000000000cf2ee13 R12: ffffc90003d8f900
> [ 91.484943] R13: 0000000000000030 R14: ffffffff9bb346a0 R15: 0000000000000000
> [ 91.484946] FS: 00007f155effd6c0(0000) GS:ffff88846fc80000(0000) knlGS:0000000000000000
> [ 91.484947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 91.484948] CR2: 00007f155dffbd58 CR3: 000000010694a000 CR4: 0000000000350ee0
> [ 91.484950] Call Trace:
> [ 91.484951] <TASK>
> [ 91.484952] ? setattr_copy+0x106/0x1b0
> [ 91.484956] ? __warn+0x81/0x130
> [ 91.484969] ? setattr_copy+0x106/0x1b0
> [ 91.484971] ? report_bug+0x171/0x1a0
> [ 91.484987] ? handle_bug+0x3c/0x80
> [ 91.484991] ? exc_invalid_op+0x17/0x70
> [ 91.484992] ? asm_exc_invalid_op+0x1a/0x20
> [ 91.485007] ? setattr_copy+0x106/0x1b0
> [ 91.485009] btrfs_setattr+0x3d0/0x830
> [ 91.485021] ? btrfs_setattr+0x3ea/0x830
> [ 91.485023] notify_change+0x1f5/0x4b0
> [ 91.485028] ? ovl_set_timestamps.isra.0+0x7d/0xa0
> [ 91.485033] ovl_set_timestamps.isra.0+0x7d/0xa0
> [ 91.485040] ovl_set_attr.part.0+0x9f/0xb0
> [ 91.485043] ovl_copy_up_metadata+0xb1/0x210
> [ 91.485046] ? ovl_mkdir_real+0x32/0xc0
> [ 91.485048] ovl_copy_up_one+0x6ab/0x14f0
> [ 91.485050] ? btrfs_search_slot+0x8c8/0xd00
> [ 91.485056] ? xa_load+0x8c/0xe0
> [ 91.485063] ovl_copy_up_flags+0xcf/0x100
> [ 91.485067] ovl_do_remove+0xa5/0x500
> [ 91.485068] ? inode_permission+0xde/0x190
> [ 91.485075] ? __pfx_bpf_lsm_inode_permission+0x10/0x10
> [ 91.485081] ? security_inode_permission+0x3e/0x60
> [ 91.485090] vfs_unlink+0x112/0x280
> [ 91.485094] do_unlinkat+0x14b/0x320
> [ 91.485096] __x64_sys_unlinkat+0x37/0x70
> [ 91.485098] do_syscall_64+0x60/0x90
> [ 91.485100] ? syscall_exit_to_user_mode+0x2b/0x40
> [ 91.485104] ? do_syscall_64+0x6c/0x90
> [ 91.485106] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 91.485108] RIP: 0033:0x55d4e18b04ee
> [ 91.485112] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
> [ 91.485113] RSP: 002b:000000c0003874a0 EFLAGS: 00000212 ORIG_RAX: 0000000000000107
> [ 91.485114] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 000055d4e18b04ee
> [ 91.485115] RDX: 0000000000000000 RSI: 000000c00027e1c0 RDI: 000000000000000a
> [ 91.485116] RBP: 000000c0003874e0 R08: 0000000000000000 R09: 0000000000000000
> [ 91.485116] R10: 0000000000000000 R11: 0000000000000212 R12: 000000000000001c
> [ 91.485117] R13: 000000c000100800 R14: 000000c0000061a0 R15: 000000c0002f0060
> [ 91.485119] </TASK>
> [ 91.485119] ---[ end trace 0000000000000000 ]---
>
> Is this expected?
>
No. Most likely this is a bug in overlayfs. It's trying to set the atime
and mtime w/o setting the ctime. I'll have to look at an appropriate
fix. Stay tuned.
Thanks for the bug report,
--
Jeff Layton <[email protected]>