2021-04-13 08:01:09

by Haotian Li

[permalink] [raw]
Subject: [PATCH] e2fsck: try write_primary_superblock() again when it failed

Function write_primary_superblock() has two ways to flush
superblock, byte-by-byte as default. It may use
io_channel_write_byte() many times. If some errors occur
during these funcs, the superblock may become inconsistent
and produce checksum error.

Try write_primary_superblock() with whole-block way again
when it failed on byte-by-byte way.
---
lib/ext2fs/closefs.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/ext2fs/closefs.c b/lib/ext2fs/closefs.c
index 69cbdd8c..1fc27fb5 100644
--- a/lib/ext2fs/closefs.c
+++ b/lib/ext2fs/closefs.c
@@ -223,10 +223,8 @@ static errcode_t write_primary_superblock(ext2_filsys fs,
retval = io_channel_write_byte(fs->io,
SUPERBLOCK_OFFSET + (2 * write_idx), size,
new_super + write_idx);
- if (retval == EXT2_ET_UNIMPLEMENTED)
- goto fallback;
if (retval)
- return retval;
+ goto fallback;
}
memcpy(fs->orig_super, super, SUPERBLOCK_SIZE);
return 0;
--
2.23.0


2021-05-10 03:51:24

by Zhiqiang Liu

[permalink] [raw]
Subject: Re: [PATCH] e2fsck: try write_primary_superblock() again when it failed

friendly ping...

On 2021/4/13 11:19, Haotian Li wrote:
> Function write_primary_superblock() has two ways to flush
> superblock, byte-by-byte as default. It may use
> io_channel_write_byte() many times. If some errors occur
> during these funcs, the superblock may become inconsistent
> and produce checksum error.
>
> Try write_primary_superblock() with whole-block way again
> when it failed on byte-by-byte way.
> ---
> lib/ext2fs/closefs.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/lib/ext2fs/closefs.c b/lib/ext2fs/closefs.c
> index 69cbdd8c..1fc27fb5 100644
> --- a/lib/ext2fs/closefs.c
> +++ b/lib/ext2fs/closefs.c
> @@ -223,10 +223,8 @@ static errcode_t write_primary_superblock(ext2_filsys fs,
> retval = io_channel_write_byte(fs->io,
> SUPERBLOCK_OFFSET + (2 * write_idx), size,
> new_super + write_idx);
> - if (retval == EXT2_ET_UNIMPLEMENTED)
> - goto fallback;
> if (retval)
> - return retval;
> + goto fallback;
> }
> memcpy(fs->orig_super, super, SUPERBLOCK_SIZE);
> return 0;

2021-05-10 21:00:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] e2fsck: try write_primary_superblock() again when it failed

On Tue, Apr 13, 2021 at 11:19:30AM +0800, Haotian Li wrote:
> Function write_primary_superblock() has two ways to flush
> superblock, byte-by-byte as default. It may use
> io_channel_write_byte() many times. If some errors occur
> during these funcs, the superblock may become inconsistent
> and produce checksum error.
>
> Try write_primary_superblock() with whole-block way again
> when it failed on byte-by-byte way.

So unless you're using Direct I/O, this patch really shouldn't be
making any difference. (And tune2fs doesn't actually support Direct
I/O access, and that's the only e2fsprogs program that should normally
be making changes to the superblock on a regular basis.) That's
becuase with buffered I/O, we aren't going to be actually trying to
write to the device. The byte-by-byte writes will only be to
in-memory buffer caches, and so write errors should be *very* rare.
Are you actually seeing this happen in actual practice? If so, what
userspace program is trying to write the superblock, and under what
circumstances.

The problem with writing the whole superblock at one go is that this
can race with changes being made by the kernel --- for example, if we
are unlinking or truncating a file, and the kernel is updated the
first inode in the orphaned inode list, this can lead to other types
of inconsistencies / file system corruption if we get unlucky. That's
why we switched the byte-by-byte update approach (at least for those
I/O managers which supported it; those that didn't were things like
the Windows I/O manager where concurrent update by the kernel wasn't
an issue).

It is true that since we have added metadata checksums, this can lead
to checksum inconsistencies. In practice, since we don't validate the
superblock while the file system is mounted, this should only happen
in two cases if there is a race between tune2fs modifying the kernel
and the kernel trying to update the superblock, and we crash before a
subsequent attempt by the kernel to update the superblock; for
example, when the orphaned inode list is being modified, or when the
file system is unmounted.

What we should do fix this, at least for the long term, is to add new
generic ioctls for updating the label and UUID. This is something I
had discussed with Darrick as something that multiple file systems
would find useful. For the other use cases, what I think we should do
is to implement an ext4-specific ioctl which takes a pointer to an
in-memory 1k superblock, and a 32-bit flag word. One bit in the flag
word might cause the ioctl to update the following fields:

__le16 s_mnt_count; /* Mount count */
__le16 s_max_mnt_count; /* Maximal mount count */
__le32 s_lastcheck; /* time of last check */
__le32 s_checkinterval; /* max. time between checks */

Another bit might mean updating these fields:

__le16 s_errors; /* Behaviour when detecting errors */
__le32 s_r_blocks_count_lo; /* Reserved blocks count */
__le32 s_r_blocks_count_hi; /* Reserved blocks count */


Yet another bit might mean updating these fields:

__le32 s_default_mount_opts;
__u8 s_mount_opts[64];

etc. If the flag word is 0, then the ioctl will return success, and
the flag word will be updated with the set of flags supported by the
currently running kernel.

In that way, tune2fs could test and see if it is running on a kernel
which supports superblock updates via the ioctl mechanism. If it
does, it will use this in preference to trying to update the
superblock via direct writes to the block device, and since the actual
commit is being done by the kernel, it can run those changes through
the journal. The kernel can also make sure the superblock checksum is
updated in a completely consistent way and with appropriate locking
with other changes to the superblock.

Does that make sense?

- Ted