2023-07-18 13:01:55

by Jeffrey Layton

[permalink] [raw]
Subject: Re: linux-next ext4 inode size 128 corrupted

On Mon, 2023-07-17 at 20:43 -0700, Hugh Dickins wrote:
> Hi Jeff,
>
> I've been unable to run my kernel builds on ext4 on loop0 on tmpfs
> swapping load on linux-next recently, on one machine: various kinds
> of havoc, most common symptoms being ext4_find_dest_de:2107 errors,
> systemd-journald errors, segfaults. But no problem observed running
> on a more recent installation.
>
> Bisected yesterday to 979492850abd ("ext4: convert to ctime accessor
> functions").
>
> I've mostly averted my eyes from the EXT4_INODE macro changes there,
> but I think that's where the problem lies. Reading the comment in
> fs/ext4/ext4.h above EXT4_FITS_IN_INODE() led me to try "tune2fs -l"
> and look at /etc/mke2fs.conf. It's an old installation, its own
> inodes are 256, but that old mke2fs.conf does default to 128 for small
> FSes, and what I use for the load test is small. Passing -I 256 to the
> mkfs makes the problems go away.
>
> (What's most alarming about the corruption is that it appears to extend
> beyond just the throwaway test filesystem: segfaults on bash and libc.so
> from the root filesystem. But no permanent damage done there.)
>
> One oddity I noticed in scrutinizing that commit, didn't help with
> the issues above, but there's a hunk in ext4_rename() which changes
> - old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
> + old.dir->i_mtime = inode_set_ctime_current(old.inode);
>
>

I suspect the problem here is the i_crtime, which lives wholly in the
extended part of the inode. The old macros would just not store anything
if the i_crtime didn't fit, but the new ones would still store the
tv_sec field in that case, which could be a memory corruptor. This patch
should fix it, and I'm testing it now.

Hugh, if you're able to give this a spin on your setup, then that would
be most helpful. This is also in the "ctime" branch in my kernel.org
tree if that helps. If this looks good, I'll ask Christian to fold this
into the ext4 conversion patch.

Thanks for the bug report!

---------------------------8<--------------------------

[PATCH] ext4: fix the time handling macros when ext4 is using small inodes

If ext4 is using small on-disk inodes, then it may not be able to store
fine grained timestamps. It also can't store the i_crtime at all in that
case since that fully lives in the extended part of the inode.

979492850abd got the EXT4_EINODE_{GET,SET}_XTIME macros wrong, and would
still store the tv_sec field of the i_crtime into the raw_inode, even
when they were small, corrupting adjacent memory.

This fixes those macros to skip setting anything in the raw_inode if the
tv_sec field doesn't fit.

Cc: Jan Kara <[email protected]>
Fixes: 979492850abd ("ext4: convert to ctime accessor functions")
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/ext4/ext4.h | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2af347669db7..1e2259d9967d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -900,8 +900,10 @@ do { \
#define EXT4_INODE_SET_CTIME(inode, raw_inode) \
EXT4_INODE_SET_XTIME_VAL(i_ctime, inode, raw_inode, inode_get_ctime(inode))

-#define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \
- EXT4_INODE_SET_XTIME_VAL(xtime, &((einode)->vfs_inode), raw_inode, (einode)->xtime)
+#define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \
+ if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
+ EXT4_INODE_SET_XTIME_VAL(xtime, &((einode)->vfs_inode), \
+ raw_inode, (einode)->xtime)

#define EXT4_INODE_GET_XTIME_VAL(xtime, inode, raw_inode) \
(EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra) ? \
@@ -922,9 +924,14 @@ do { \
EXT4_INODE_GET_XTIME_VAL(i_ctime, inode, raw_inode)); \
} while (0)

-#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
-do { \
- (einode)->xtime = EXT4_INODE_GET_XTIME_VAL(xtime, &(einode->vfs_inode), raw_inode); \
+#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
+do { \
+ if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
+ (einode)->xtime = \
+ EXT4_INODE_GET_XTIME_VAL(xtime, &(einode->vfs_inode), \
+ raw_inode); \
+ else \
+ (einode)->xtime = (struct timespec64){0, 0}; \
} while (0)

#define i_disk_version osd1.linux1.l_i_version
--
2.41.0




2023-07-18 18:14:41

by Hugh Dickins

[permalink] [raw]
Subject: Re: linux-next ext4 inode size 128 corrupted

On Tue, 18 Jul 2023, Jeff Layton wrote:
> On Mon, 2023-07-17 at 20:43 -0700, Hugh Dickins wrote:
> > Hi Jeff,
> >
> > I've been unable to run my kernel builds on ext4 on loop0 on tmpfs
> > swapping load on linux-next recently, on one machine: various kinds
> > of havoc, most common symptoms being ext4_find_dest_de:2107 errors,
> > systemd-journald errors, segfaults. But no problem observed running
> > on a more recent installation.
> >
> > Bisected yesterday to 979492850abd ("ext4: convert to ctime accessor
> > functions").
> >
> > I've mostly averted my eyes from the EXT4_INODE macro changes there,
> > but I think that's where the problem lies. Reading the comment in
> > fs/ext4/ext4.h above EXT4_FITS_IN_INODE() led me to try "tune2fs -l"
> > and look at /etc/mke2fs.conf. It's an old installation, its own
> > inodes are 256, but that old mke2fs.conf does default to 128 for small
> > FSes, and what I use for the load test is small. Passing -I 256 to the
> > mkfs makes the problems go away.
> >
> > (What's most alarming about the corruption is that it appears to extend
> > beyond just the throwaway test filesystem: segfaults on bash and libc.so
> > from the root filesystem. But no permanent damage done there.)
> >
> > One oddity I noticed in scrutinizing that commit, didn't help with
> > the issues above, but there's a hunk in ext4_rename() which changes
> > - old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
> > + old.dir->i_mtime = inode_set_ctime_current(old.inode);
> >
> >
>
> I suspect the problem here is the i_crtime, which lives wholly in the
> extended part of the inode. The old macros would just not store anything
> if the i_crtime didn't fit, but the new ones would still store the
> tv_sec field in that case, which could be a memory corruptor. This patch
> should fix it, and I'm testing it now.

That makes sense.

>
> Hugh, if you're able to give this a spin on your setup, then that would
> be most helpful. This is also in the "ctime" branch in my kernel.org
> tree if that helps. If this looks good, I'll ask Christian to fold this
> into the ext4 conversion patch.

Yes, it's now running fine on the problem machine, and on the no-problem.

Tested-by: Hugh Dickins <[email protected]>

>
> Thanks for the bug report!

And thanks for the quick turnaround!

But I'm puzzled by your dismissing that
- old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
+ old.dir->i_mtime = inode_set_ctime_current(old.inode);
in ext4_rename() as "actually looks fine".

Different issue, nothing to do with the corruption, sure. Much less
important, sure. But updating ctime on the wrong inode is "fine"?

Hugh

>
> ---------------------------8<--------------------------
>
> [PATCH] ext4: fix the time handling macros when ext4 is using small inodes
>
> If ext4 is using small on-disk inodes, then it may not be able to store
> fine grained timestamps. It also can't store the i_crtime at all in that
> case since that fully lives in the extended part of the inode.
>
> 979492850abd got the EXT4_EINODE_{GET,SET}_XTIME macros wrong, and would
> still store the tv_sec field of the i_crtime into the raw_inode, even
> when they were small, corrupting adjacent memory.
>
> This fixes those macros to skip setting anything in the raw_inode if the
> tv_sec field doesn't fit.
>
> Cc: Jan Kara <[email protected]>
> Fixes: 979492850abd ("ext4: convert to ctime accessor functions")
> Reported-by: Hugh Dickins <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/ext4/ext4.h | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 2af347669db7..1e2259d9967d 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -900,8 +900,10 @@ do { \
> #define EXT4_INODE_SET_CTIME(inode, raw_inode) \
> EXT4_INODE_SET_XTIME_VAL(i_ctime, inode, raw_inode, inode_get_ctime(inode))
>
> -#define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \
> - EXT4_INODE_SET_XTIME_VAL(xtime, &((einode)->vfs_inode), raw_inode, (einode)->xtime)
> +#define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \
> + if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
> + EXT4_INODE_SET_XTIME_VAL(xtime, &((einode)->vfs_inode), \
> + raw_inode, (einode)->xtime)
>
> #define EXT4_INODE_GET_XTIME_VAL(xtime, inode, raw_inode) \
> (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra) ? \
> @@ -922,9 +924,14 @@ do { \
> EXT4_INODE_GET_XTIME_VAL(i_ctime, inode, raw_inode)); \
> } while (0)
>
> -#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
> -do { \
> - (einode)->xtime = EXT4_INODE_GET_XTIME_VAL(xtime, &(einode->vfs_inode), raw_inode); \
> +#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
> +do { \
> + if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
> + (einode)->xtime = \
> + EXT4_INODE_GET_XTIME_VAL(xtime, &(einode->vfs_inode), \
> + raw_inode); \
> + else \
> + (einode)->xtime = (struct timespec64){0, 0}; \
> } while (0)
>
> #define i_disk_version osd1.linux1.l_i_version
> --
> 2.41.0
>
>
>

2023-07-18 18:41:17

by Jeffrey Layton

[permalink] [raw]
Subject: Re: linux-next ext4 inode size 128 corrupted

On Tue, 2023-07-18 at 11:03 -0700, Hugh Dickins wrote:
> On Tue, 18 Jul 2023, Jeff Layton wrote:
> > On Mon, 2023-07-17 at 20:43 -0700, Hugh Dickins wrote:
> > > Hi Jeff,
> > >
> > > I've been unable to run my kernel builds on ext4 on loop0 on tmpfs
> > > swapping load on linux-next recently, on one machine: various kinds
> > > of havoc, most common symptoms being ext4_find_dest_de:2107 errors,
> > > systemd-journald errors, segfaults. But no problem observed running
> > > on a more recent installation.
> > >
> > > Bisected yesterday to 979492850abd ("ext4: convert to ctime accessor
> > > functions").
> > >
> > > I've mostly averted my eyes from the EXT4_INODE macro changes there,
> > > but I think that's where the problem lies. Reading the comment in
> > > fs/ext4/ext4.h above EXT4_FITS_IN_INODE() led me to try "tune2fs -l"
> > > and look at /etc/mke2fs.conf. It's an old installation, its own
> > > inodes are 256, but that old mke2fs.conf does default to 128 for small
> > > FSes, and what I use for the load test is small. Passing -I 256 to the
> > > mkfs makes the problems go away.
> > >
> > > (What's most alarming about the corruption is that it appears to extend
> > > beyond just the throwaway test filesystem: segfaults on bash and libc.so
> > > from the root filesystem. But no permanent damage done there.)
> > >
> > > One oddity I noticed in scrutinizing that commit, didn't help with
> > > the issues above, but there's a hunk in ext4_rename() which changes
> > > - old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
> > > + old.dir->i_mtime = inode_set_ctime_current(old.inode);
> > >
> > >
> >
> > I suspect the problem here is the i_crtime, which lives wholly in the
> > extended part of the inode. The old macros would just not store anything
> > if the i_crtime didn't fit, but the new ones would still store the
> > tv_sec field in that case, which could be a memory corruptor. This patch
> > should fix it, and I'm testing it now.
>
> That makes sense.
>
> >
> > Hugh, if you're able to give this a spin on your setup, then that would
> > be most helpful. This is also in the "ctime" branch in my kernel.org
> > tree if that helps. If this looks good, I'll ask Christian to fold this
> > into the ext4 conversion patch.
>
> Yes, it's now running fine on the problem machine, and on the no-problem.
>
> Tested-by: Hugh Dickins <[email protected]>
>
> >
> > Thanks for the bug report!
>
> And thanks for the quick turnaround!
>
> But I'm puzzled by your dismissing that
> - old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
> + old.dir->i_mtime = inode_set_ctime_current(old.inode);
> in ext4_rename() as "actually looks fine".
>
> Different issue, nothing to do with the corruption, sure. Much less
> important, sure. But updating ctime on the wrong inode is "fine"?

Ahh , sorry I wasn't looking at that properly. I think you're correct.
The right fix is probably to move ext4 to use generic_rename_timestamp.
I'll test and send another patch for that.

Thanks again!
--
Jeff Layton <[email protected]>