2021-01-05 06:28:02

by yangerkun

[permalink] [raw]
Subject: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:

cd /dev/shm
mkdir test/
fallocate -l 128M img
mkfs.ext4 -b 1024 img
mount img test/
dd if=/dev/zero of=test/foo bs=1M count=128
mkdir test/dir/ && cd test/dir/
for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
/dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
ext4_rename will return ENOSPC!!
cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
We will get the output:
"ls: cannot access 'test/dir/file1': Structure needs cleaning"
and the dmesg show:
"EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
deleted inode referenced: 139"

ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.

Signed-off-by: yangerkun <[email protected]>
---
fs/ext4/namei.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index b17a082b7db1..90f7ebeb69c8 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3593,9 +3593,6 @@ static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
return retval2;
}
}
- brelse(ent->bh);
- ent->bh = NULL;
-
return retval;
}

@@ -3794,6 +3791,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
}
}

+ old_file_type = old.de->file_type;
if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
ext4_handle_sync(handle);

@@ -3821,7 +3819,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
force_reread = (new.dir->i_ino == old.dir->i_ino &&
ext4_test_inode_flag(new.dir, EXT4_INODE_INLINE_DATA));

- old_file_type = old.de->file_type;
if (whiteout) {
/*
* Do this before adding a new entry, so the old entry is sure
@@ -3919,15 +3916,19 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
retval = 0;

end_rename:
- brelse(old.dir_bh);
- brelse(old.bh);
- brelse(new.bh);
if (whiteout) {
- if (retval)
+ if (retval) {
+ ext4_setent(handle, &old,
+ old.inode->i_ino, old_file_type);
drop_nlink(whiteout);
+ }
unlock_new_inode(whiteout);
iput(whiteout);
+
}
+ brelse(old.dir_bh);
+ brelse(old.bh);
+ brelse(new.bh);
if (handle)
ext4_journal_stop(handle);
return retval;
--
2.25.4


2021-01-05 14:32:55

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

On Tue 05-01-21 14:28:57, yangerkun wrote:
> We got a "deleted inode referenced" warning cross our fsstress test. The
> bug can be reproduced easily with following steps:
>
> cd /dev/shm
> mkdir test/
> fallocate -l 128M img
> mkfs.ext4 -b 1024 img
> mount img test/
> dd if=/dev/zero of=test/foo bs=1M count=128
> mkdir test/dir/ && cd test/dir/
> for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
> cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
> /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
> ext4_rename will return ENOSPC!!
> cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
> We will get the output:
> "ls: cannot access 'test/dir/file1': Structure needs cleaning"
> and the dmesg show:
> "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
> deleted inode referenced: 139"
>
> ext4_rename will create a special inode for whiteout and use this 'ino'
> to replace the source file's dir entry 'ino'. Once error happens
> latter(the error above was the ENOSPC return from ext4_add_entry in
> ext4_rename since all space has been consumed), the cleanup do drop the
> nlink for whiteout, but forget to restore 'ino' with source file. This
> will trigger the bug describle as above.
>
> Signed-off-by: yangerkun <[email protected]>

Thanks! The patch looks good to me now. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/namei.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index b17a082b7db1..90f7ebeb69c8 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3593,9 +3593,6 @@ static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
> return retval2;
> }
> }
> - brelse(ent->bh);
> - ent->bh = NULL;
> -
> return retval;
> }
>
> @@ -3794,6 +3791,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> }
> }
>
> + old_file_type = old.de->file_type;
> if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
> ext4_handle_sync(handle);
>
> @@ -3821,7 +3819,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> force_reread = (new.dir->i_ino == old.dir->i_ino &&
> ext4_test_inode_flag(new.dir, EXT4_INODE_INLINE_DATA));
>
> - old_file_type = old.de->file_type;
> if (whiteout) {
> /*
> * Do this before adding a new entry, so the old entry is sure
> @@ -3919,15 +3916,19 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> retval = 0;
>
> end_rename:
> - brelse(old.dir_bh);
> - brelse(old.bh);
> - brelse(new.bh);
> if (whiteout) {
> - if (retval)
> + if (retval) {
> + ext4_setent(handle, &old,
> + old.inode->i_ino, old_file_type);
> drop_nlink(whiteout);
> + }
> unlock_new_inode(whiteout);
> iput(whiteout);
> +
> }
> + brelse(old.dir_bh);
> + brelse(old.bh);
> + brelse(new.bh);
> if (handle)
> ext4_journal_stop(handle);
> return retval;
> --
> 2.25.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-01-14 03:53:51

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

On Tue, Jan 05, 2021 at 02:28:57PM +0800, yangerkun wrote:
> We got a "deleted inode referenced" warning cross our fsstress test. The
> bug can be reproduced easily with following steps:
>
> cd /dev/shm
> mkdir test/
> fallocate -l 128M img
> mkfs.ext4 -b 1024 img
> mount img test/
> dd if=/dev/zero of=test/foo bs=1M count=128
> mkdir test/dir/ && cd test/dir/
> for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
> cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
> /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
> ext4_rename will return ENOSPC!!
> cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
> We will get the output:
> "ls: cannot access 'test/dir/file1': Structure needs cleaning"
> and the dmesg show:
> "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
> deleted inode referenced: 139"
>
> ext4_rename will create a special inode for whiteout and use this 'ino'
> to replace the source file's dir entry 'ino'. Once error happens
> latter(the error above was the ENOSPC return from ext4_add_entry in
> ext4_rename since all space has been consumed), the cleanup do drop the
> nlink for whiteout, but forget to restore 'ino' with source file. This
> will trigger the bug describle as above.
>
> Signed-off-by: yangerkun <[email protected]>

Thanks, replied.

- Ted

2021-01-20 06:59:49

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

On Thu, Jan 14, 2021 at 5:53 AM Theodore Ts'o <[email protected]> wrote:
>
> On Tue, Jan 05, 2021 at 02:28:57PM +0800, yangerkun wrote:
> > We got a "deleted inode referenced" warning cross our fsstress test. The
> > bug can be reproduced easily with following steps:
> >
> > cd /dev/shm
> > mkdir test/
> > fallocate -l 128M img
> > mkfs.ext4 -b 1024 img
> > mount img test/
> > dd if=/dev/zero of=test/foo bs=1M count=128
> > mkdir test/dir/ && cd test/dir/
> > for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
> > cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
> > /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
> > ext4_rename will return ENOSPC!!
> > cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
> > We will get the output:
> > "ls: cannot access 'test/dir/file1': Structure needs cleaning"
> > and the dmesg show:
> > "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
> > deleted inode referenced: 139"
> >
> > ext4_rename will create a special inode for whiteout and use this 'ino'
> > to replace the source file's dir entry 'ino'. Once error happens
> > latter(the error above was the ENOSPC return from ext4_add_entry in
> > ext4_rename since all space has been consumed), the cleanup do drop the
> > nlink for whiteout, but forget to restore 'ino' with source file. This
> > will trigger the bug describle as above.
> >
> > Signed-off-by: yangerkun <[email protected]>
>

Apropos RENAME_WHITEOUT, it seems to be missing __ext4_fc_track_link().
I guess test coverage of RENAME_WHITEOUT in fstests is not much.
I have been seeing trickles of bug fixes for RENAME_WHITEOUT for almost
every filesystem that supports it.

But I must say it would have been very hard to catch missing ext4_fc_track_*
without specialized fs fuzzer such as the CrashMonkey generated tests.

And as long as I am ranting, I'd like to point out that it is a shame
that whiteout
was not implemented as a special (constant) inode whose nlink is irrelevant
(or a special dirent with d_ino 0 and d_type DT_WHT for that matter).
It would have been a rather small RO_COMPAT on-disk change for ext4.
It could also be implemented in slightly more backward compat manner by
maintaining a valid nlink and postpone setting the RO_COMPAT flag until
EXT4_LINK_MAX is reached.

As things stand now, overlayfs makes an effort to maintain a singleton
hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT
and filesystems have to take special care to journal the metadata of all
individual whiteout inodes, without any added value to the only user
(overlayfs).

But I guess that train has left the station long ago...

Thanks,
Amir.

2021-01-20 08:44:23

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

On Wed, Jan 20, 2021 at 7:57 AM Amir Goldstein <[email protected]> wrote:

> And as long as I am ranting, I'd like to point out that it is a shame
> that whiteout
> was not implemented as a special (constant) inode whose nlink is irrelevant
> (or a special dirent with d_ino 0 and d_type DT_WHT for that matter).
> It would have been a rather small RO_COMPAT on-disk change for ext4.
> It could also be implemented in slightly more backward compat manner by
> maintaining a valid nlink and postpone setting the RO_COMPAT flag until
> EXT4_LINK_MAX is reached.
>
> As things stand now, overlayfs makes an effort to maintain a singleton
> hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT
> and filesystems have to take special care to journal the metadata of all
> individual whiteout inodes, without any added value to the only user
> (overlayfs).
>
> But I guess that train has left the station long ago...

Not so, I believe. Kernel internal interfaces are easy to change, and
adding support for DT_WHT to overlayfs would mostly be a trivial
undertaking.

The big issue (as always) is userspace API's and not introducing
DT_WHT there was a very deliberate choice. Adding a translation layer
from an internal whiteout representation to the userspace API also
does not seem to be a very complex problem, but I haven't looked into
that deeply.

So AFAICS there's really nothing preventing the addition of whiteout
objects to filesystems, other than developer dedication.

Thanks,
Miklos

2021-01-22 19:51:38

by harshad shirwadkar

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

Thanks Amir for pointing that out. Yes we are missing fast commit
tracking in whiteout. I'll send out a fix for that.

> But I must say it would have been very hard to catch missing ext4_fc_track_*
> without specialized fs fuzzer such as the CrashMonkey generated tests.

I agree, it's been on my to-do list to run CrashMonkey tests with fast
commits. I'm curious what kind of CrashMonkey tests you ran that
helped you catch this? Were you running Overlayfs on top of Ext4 with
fast commits?

Thanks,
Harshad

On Wed, Jan 20, 2021 at 12:42 AM Miklos Szeredi <[email protected]> wrote:
>
> On Wed, Jan 20, 2021 at 7:57 AM Amir Goldstein <[email protected]> wrote:
>
> > And as long as I am ranting, I'd like to point out that it is a shame
> > that whiteout
> > was not implemented as a special (constant) inode whose nlink is irrelevant
> > (or a special dirent with d_ino 0 and d_type DT_WHT for that matter).
> > It would have been a rather small RO_COMPAT on-disk change for ext4.
> > It could also be implemented in slightly more backward compat manner by
> > maintaining a valid nlink and postpone setting the RO_COMPAT flag until
> > EXT4_LINK_MAX is reached.
> >
> > As things stand now, overlayfs makes an effort to maintain a singleton
> > hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT
> > and filesystems have to take special care to journal the metadata of all
> > individual whiteout inodes, without any added value to the only user
> > (overlayfs).
> >
> > But I guess that train has left the station long ago...
>
> Not so, I believe. Kernel internal interfaces are easy to change, and
> adding support for DT_WHT to overlayfs would mostly be a trivial
> undertaking.
>
> The big issue (as always) is userspace API's and not introducing
> DT_WHT there was a very deliberate choice. Adding a translation layer
> from an internal whiteout representation to the userspace API also
> does not seem to be a very complex problem, but I haven't looked into
> that deeply.
>
> So AFAICS there's really nothing preventing the addition of whiteout
> objects to filesystems, other than developer dedication.
>
> Thanks,
> Miklos

2021-01-22 20:36:07

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

On Fri, Jan 22, 2021 at 9:21 PM harshad shirwadkar
<[email protected]> wrote:
>
> Thanks Amir for pointing that out. Yes we are missing fast commit
> tracking in whiteout. I'll send out a fix for that.
>
> > But I must say it would have been very hard to catch missing ext4_fc_track_*
> > without specialized fs fuzzer such as the CrashMonkey generated tests.
>
> I agree, it's been on my to-do list to run CrashMonkey tests with fast
> commits. I'm curious what kind of CrashMonkey tests you ran that
> helped you catch this? Were you running Overlayfs on top of Ext4 with
> fast commits?
>

Neither. I just guessed RENAME_WHITEOUT might be missed as
developers are rarely aware of it.
I never ran CrashMonkey tests myself.
I found a few crash consistency bugs using xfstest generic/455.
I suggest that you run it with fast commits
and try using NUM_OPS and NUM_FILES larger than the test defaults
to let the test run for a longer time.

Thanks,
Amir.

2021-03-09 07:34:06

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT

On Fri, Jan 22, 2021 at 9:21 PM harshad shirwadkar
<[email protected]> wrote:
>
> Thanks Amir for pointing that out. Yes we are missing fast commit
> tracking in whiteout. I'll send out a fix for that.
>

Ping.

Harshad,

Did you forget or did I miss the patch?

Thanks,
Amir.