My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
failure for generic/454 on the 4k file system configuration and all other
configurations using a 4k block size. This failure reproduces with 100%
reliability and continues to appear as of 6.2-rc4.
The test output indicates that the file system under test is inconsistent.
e2fsck reports:
*** fsck.ext4 output ***
fsck from util-linux 2.36.1
e2fsck 1.46.2 (28-Feb-2021)
Pass 1: Checking inodes, blocks, and sizes
Extended attribute in inode 131074 has a hash (857950233) which is invalid
Clear? no
Extended attribute in inode 131074 has a hash (736302368) which is invalid
Clear? no
Extended attribute in inode 131074 has a hash (674453032) which is invalid
Clear? no
Extended attribute in inode 131074 has a hash (2299266654) which is invalid
Clear? no
Extended attribute in inode 131074 has a hash (3503002490) which is invalid
Clear? no
< and continues with more of the same >
The failure bisects to the following commit in -rc1:
3bc753c06dd0 ("kbuild: treat char as always unsigned")
The comment for this commit suggests that it's likely to cause things to
break where there has been type misuse for char; presumably, that's what's
happened here.
Eric
On Jan 17, 2023, at 11:31 AM, Eric Whitney <[email protected]> wrote:
>
> My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
> failure for generic/454 on the 4k file system configuration and all other
> configurations using a 4k block size. This failure reproduces with 100%
> reliability and continues to appear as of 6.2-rc4.
>
> The test output indicates that the file system under test is inconsistent.
There is actually support in the superblock for both signed and unsigned char
hash calculations, exactly because there was a bug like this in the past.
It looks like the ext4 code/build is still using the signed hash functions:
static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
{
:
:
if (i & EXT2_FLAGS_UNSIGNED_HASH)
sbi->s_hash_unsigned = 3;
else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
#ifdef __CHAR_UNSIGNED__
if (!sb_rdonly(sb))
es->s_flags |=
cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
sbi->s_hash_unsigned = 3;
#else
if (!sb_rdonly(sb))
es->s_flags |=
cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
#endif
}
It looks like this *should* be detecting the unsigned/signed char type
automatically based on __CHAR_UNSIGNED__, but that isn't working properly
in this case. I have no idea whether this is a compiler or kernel issue,
just thought I'd point out the background of what ext4 is doing here.
Cheers, Andreas
> e2fsck reports:
>
> *** fsck.ext4 output ***
> fsck from util-linux 2.36.1
> e2fsck 1.46.2 (28-Feb-2021)
> Pass 1: Checking inodes, blocks, and sizes
> Extended attribute in inode 131074 has a hash (857950233) which is invalid
> Clear? no
>
> Extended attribute in inode 131074 has a hash (736302368) which is invalid
> Clear? no
>
> Extended attribute in inode 131074 has a hash (674453032) which is invalid
> Clear? no
>
> Extended attribute in inode 131074 has a hash (2299266654) which is invalid
> Clear? no
>
> Extended attribute in inode 131074 has a hash (3503002490) which is invalid
> Clear? no
>
> < and continues with more of the same >
>
> The failure bisects to the following commit in -rc1:
>
> 3bc753c06dd0 ("kbuild: treat char as always unsigned")
>
> The comment for this commit suggests that it's likely to cause things to
> break where there has been type misuse for char; presumably, that's what's
> happened here.
>
> Eric
>
Cheers, Andreas
[Added some Cc's, and updated subject to reflect what this is really about]
On Tue, Jan 17, 2023 at 05:10:55PM -0700, Andreas Dilger wrote:
> On Jan 17, 2023, at 11:31 AM, Eric Whitney <[email protected]> wrote:
> >
> > My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
> > failure for generic/454 on the 4k file system configuration and all other
> > configurations using a 4k block size. This failure reproduces with 100%
> > reliability and continues to appear as of 6.2-rc4.
> >
> > The test output indicates that the file system under test is inconsistent.
>
> There is actually support in the superblock for both signed and unsigned char
> hash calculations, exactly because there was a bug like this in the past.
> It looks like the ext4 code/build is still using the signed hash functions:
>
>
> static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> {
> :
> :
> if (i & EXT2_FLAGS_UNSIGNED_HASH)
> sbi->s_hash_unsigned = 3;
> else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
> #ifdef __CHAR_UNSIGNED__
> if (!sb_rdonly(sb))
> es->s_flags |=
> cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
> sbi->s_hash_unsigned = 3;
> #else
> if (!sb_rdonly(sb))
> es->s_flags |=
> cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
> #endif
> }
>
> It looks like this *should* be detecting the unsigned/signed char type
> automatically based on __CHAR_UNSIGNED__, but that isn't working properly
> in this case. I have no idea whether this is a compiler or kernel issue,
> just thought I'd point out the background of what ext4 is doing here.
>
> Cheers, Andreas
Well, since v6.2-rc1 the kernel is always compiled with -funsigned-char, so of
course the above no longer works to detect the "default" signedness of a char.
Below is one very ugly solution. It seems to work, based on the output of
'make V=1'; fs/ext4/char.c is compiled *without* -funsigned-char, and everything
else is still compiled with -funsigned-char. Though, I'm not sure that the
trick I'm using with KBUILD_CFLAGS is meant to be supported.
Better ideas would be appreciated. If the default signedness of 'char' is a
per-arch thing, maybe each arch could explicitly select
ARCH_HAVE_DEFAULT_SIGNED_CHAR or ARCH_HAVE_DEFAULT_UNSIGNED_CHAR? Or is there
any chance that this code is obsolete and can be removed from ext4?
From 87b77d02c399d684d906832862ad234ec321ff12 Mon Sep 17 00:00:00 2001
From: Eric Biggers <[email protected]>
Date: Tue, 17 Jan 2023 19:21:35 -0800
Subject: [PATCH] ext4: fix detection of default char signedness
For strange reasons involving a historical bug in ext4's on-disk format,
ext4 needs to know the default signedness of a char. Since the kernel
is now always compiled with -funsigned-char, checking __CHAR_UNSIGNED__
no longer works. To make it work again, check __CHAR_UNSIGNED__ in a
separate translation unit that is compiled without -funsigned-char.
Fixes: 3bc753c06dd0 ("kbuild: treat char as always unsigned")
Reported-by: Eric Whitney <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/Makefile | 43 +++++++++++++++++++++++++++++++++++++------
fs/ext4/char.c | 24 ++++++++++++++++++++++++
fs/ext4/ext4.h | 2 ++
fs/ext4/super.c | 20 ++++++++++----------
4 files changed, 73 insertions(+), 16 deletions(-)
create mode 100644 fs/ext4/char.c
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 72206a2926765..fa7dc62fa1a2c 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -5,12 +5,43 @@
obj-$(CONFIG_EXT4_FS) += ext4.o
-ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
- extents_status.o file.o fsmap.o fsync.o hash.o ialloc.o \
- indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
- mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
- super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \
- xattr_user.o fast_commit.o orphan.o
+ext4-y := balloc.o \
+ bitmap.o \
+ block_validity.o \
+ char.o \
+ dir.o \
+ ext4_jbd2.o \
+ extents.o \
+ extents_status.o \
+ fast_commit.o \
+ file.o \
+ fsmap.o \
+ fsync.o \
+ hash.o \
+ ialloc.o \
+ indirect.o \
+ inline.o \
+ inode.o \
+ ioctl.o \
+ mballoc.o \
+ migrate.o \
+ mmp.o \
+ move_extent.o \
+ namei.o \
+ orphan.o \
+ page-io.o \
+ readpage.o \
+ resize.o \
+ super.o \
+ symlink.o \
+ sysfs.o \
+ xattr.o \
+ xattr_hurd.o \
+ xattr_trusted.o \
+ xattr_user.o
+
+# char.c needs to be compiled with the default char signedness.
+$(obj)/char.o: KBUILD_CFLAGS := $(filter-out -funsigned-char,$(KBUILD_CFLAGS))
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
diff --git a/fs/ext4/char.c b/fs/ext4/char.c
new file mode 100644
index 0000000000000..2a8b3df44262c
--- /dev/null
+++ b/fs/ext4/char.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Detect whether char is signed or unsigned by default on this platform,
+ * disregarding the fact that since v6.2, char is always unsigned in the kernel,
+ * i.e. the kernel is now always built with -funsigned char.
+ *
+ * To do this, check __CHAR_UNSIGNED__ in a translation unit that is compiled
+ * *without* -funsigned-char.
+ *
+ * Do *not* include any headers in this file, since it's no longer being tested
+ * that kernel-internal headers build cleanly without -funsigned-char.
+ */
+
+int ext4_is_char_unsigned(void);
+
+int ext4_is_char_unsigned(void)
+{
+#ifdef __CHAR_UNSIGNED__
+ return 1;
+#else
+ return 0;
+#endif
+}
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 140e1eb300d17..bdadad0b4e7ab 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3675,6 +3675,8 @@ extern int ext4_check_blockref(const char *, unsigned int,
extern int ext4_sb_block_valid(struct super_block *sb, struct inode *inode,
ext4_fsblk_t start_blk, unsigned int count);
+/* char.c */
+int ext4_is_char_unsigned(void);
/* extents.c */
struct ext4_ext_path;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 260c1b3e3ef2c..2bd6d1b15d041 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5189,16 +5189,16 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
if (i & EXT2_FLAGS_UNSIGNED_HASH)
sbi->s_hash_unsigned = 3;
else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
-#ifdef __CHAR_UNSIGNED__
- if (!sb_rdonly(sb))
- es->s_flags |=
- cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
- sbi->s_hash_unsigned = 3;
-#else
- if (!sb_rdonly(sb))
- es->s_flags |=
- cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
-#endif
+ if (ext4_is_char_unsigned()) {
+ if (!sb_rdonly(sb))
+ es->s_flags |=
+ cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
+ sbi->s_hash_unsigned = 3;
+ } else {
+ if (!sb_rdonly(sb))
+ es->s_flags |=
+ cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
+ }
}
}
base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
--
2.39.0
On Tue, Jan 17, 2023 at 07:55:57PM -0800, Eric Biggers wrote:
> [Added some Cc's, and updated subject to reflect what this is really about]
>
> On Tue, Jan 17, 2023 at 05:10:55PM -0700, Andreas Dilger wrote:
> > On Jan 17, 2023, at 11:31 AM, Eric Whitney <[email protected]> wrote:
> > >
> > > My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
> > > failure for generic/454 on the 4k file system configuration and all other
> > > configurations using a 4k block size. This failure reproduces with 100%
> > > reliability and continues to appear as of 6.2-rc4.
> > >
> > > The test output indicates that the file system under test is inconsistent.
> >
> > There is actually support in the superblock for both signed and unsigned char
> > hash calculations, exactly because there was a bug like this in the past.
> > It looks like the ext4 code/build is still using the signed hash functions:
> >
> >
> > static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > {
> > :
> > :
> > if (i & EXT2_FLAGS_UNSIGNED_HASH)
> > sbi->s_hash_unsigned = 3;
> > else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
> > #ifdef __CHAR_UNSIGNED__
> > if (!sb_rdonly(sb))
> > es->s_flags |=
> > cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
> > sbi->s_hash_unsigned = 3;
> > #else
> > if (!sb_rdonly(sb))
> > es->s_flags |=
> > cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
> > #endif
> > }
> >
> > It looks like this *should* be detecting the unsigned/signed char type
> > automatically based on __CHAR_UNSIGNED__, but that isn't working properly
> > in this case. I have no idea whether this is a compiler or kernel issue,
> > just thought I'd point out the background of what ext4 is doing here.
> >
> > Cheers, Andreas
>
> Well, since v6.2-rc1 the kernel is always compiled with -funsigned-char, so of
> course the above no longer works to detect the "default" signedness of a char.
>
> Below is one very ugly solution. It seems to work, based on the output of
> 'make V=1'; fs/ext4/char.c is compiled *without* -funsigned-char, and everything
> else is still compiled with -funsigned-char. Though, I'm not sure that the
> trick I'm using with KBUILD_CFLAGS is meant to be supported.
>
> Better ideas would be appreciated. If the default signedness of 'char' is a
> per-arch thing, maybe each arch could explicitly select
> ARCH_HAVE_DEFAULT_SIGNED_CHAR or ARCH_HAVE_DEFAULT_UNSIGNED_CHAR? Or is there
> any chance that this code is obsolete and can be removed from ext4?
>
... and ext4's xattr hashing also depends on the signedness of char, so the
following would be needed too:
From 2b47d4ab7fa6aefe81e851417b298372a2e9f391 Mon Sep 17 00:00:00 2001
From: Eric Biggers <[email protected]>
Date: Tue, 17 Jan 2023 20:13:01 -0800
Subject: [PATCH] ext4: make ext4_xattr_hash_entry() use default char
signedness
Annoyingly, ext4 uses different xattr hash algorithms depending on the
default signedness of 'char'. Since 'char' is now always unsigned in
the kernel, ext4 now needs to explicitly check the default signedness of
'char' in order to decide which algorithm to use.
This fixes handling of xattrs whose names contain characters outside the
ASCII range, including xfstest generic/454 which tests that case.
Reported-by: Eric Whitney <[email protected]>
Fixes: 3bc753c06dd0 ("kbuild: treat char as always unsigned")
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/xattr.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7decaaf27e82b..b96d134786b71 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -3078,10 +3078,18 @@ static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
{
__u32 hash = 0;
- while (name_len--) {
- hash = (hash << NAME_HASH_SHIFT) ^
- (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
- *name++;
+ if (ext4_is_char_unsigned()) {
+ while (name_len--) {
+ hash = (hash << NAME_HASH_SHIFT) ^
+ (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
+ *(unsigned char *)name++;
+ }
+ } else {
+ while (name_len--) {
+ hash = (hash << NAME_HASH_SHIFT) ^
+ (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
+ *(signed char *)name++;
+ }
}
while (value_count--) {
hash = (hash << VALUE_HASH_SHIFT) ^
base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
prerequisite-patch-id: e2b312341ee6de27e940ca5470787f3a0dde4541
--
2.39.0
On Tue, Jan 17, 2023 at 7:56 PM Eric Biggers <[email protected]> wrote:
>
> [Added some Cc's, and updated subject to reflect what this is really about]
Hmm.
I really hate this.
> On Tue, Jan 17, 2023 at 05:10:55PM -0700, Andreas Dilger wrote:
> > >
> > > My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
> > > failure for generic/454 on the 4k file system configuration and all other
> > > configurations using a 4k block size. This failure reproduces with 100%
> > > reliability and continues to appear as of 6.2-rc4.
> > >
> > > The test output indicates that the file system under test is inconsistent.
> >
> > There is actually support in the superblock for both signed and unsigned char
> > hash calculations, exactly because there was a bug like this in the past.
> > It looks like the ext4 code/build is still using the signed hash functions:
So clearly ext4 is completely buggy in this respect, but this is
exactly what would happen if you just mount a disk that was written to
on (old, pre-funsigned-char) x86, and then mount it on, say, an arm
machine that has always been unsigned-char.
That was always supposed to work.
So switching to a new kernel really should just be *exactly* the same
as moving the disk to a different machine.
And that's still "supposed to just work".
And this whole "let's get it wrong, and make x86 act as if 'char' is
signed, even when it isn't" seems entirely the wrong way around.
So the bug here is that __ext4_fill_super() seems to not look at the
actual on-disk thing, but instead do
> > static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > {
...
> > else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
> > #ifdef __CHAR_UNSIGNED__
but at the same time this code is *exactly* the code that is trying to
deal with "oh, you moved the disk from a signed architecture to an
unsigned one".
So the above code is literally what should fix up that movement -
taking the *actual* new signedness (or lack thereof, in this case)
into account.
Now, it apparently doesn't work very well, and I suspect the reason it
doesn't work is that the xattr code doesn't actually test these
EXT2_FLAGS_SIGNED_HASH bits (and the s_hash_unsigned field value that
goes along with it).
But we should *fix* that.
Instead, the patch is self-admittedly very ugly:
> Below is one very ugly solution. It seems to work [..]
but I really don't think it works. It just perpetuates the bug that
you can't move a filesystem from one architecture to another.
So I really think that the solution is either
(a) just admit that ext4 was buggy, and say "char is now unsigned",
and know that generic/454 will fail when you switch from a buggy
kernel to a new one that no longer has this signedness bug.
(b) fix ext4_xattr_hash_entry() to actually see "oh, this filesystem
was created with signed chars, and so we'll use that algorithm even
though our chars are always unsigned".
Honestly, the only actual case of breakage I have heard of is that
test, so I was hoping that (a) is simply the acceptable and simplest
solution. It basically says "nobody really cares, we're now always
unsigned, real people didn't use non-ASCII xattr names".
Anyway, here's a TOTALLY UNTESTED patch to do (b). Maybe it's entirely
broken, but I think you can see what I'm aiming for.
Comments?
Linus
On Tue, Jan 17, 2023 at 08:27:23PM -0800, Linus Torvalds wrote:
> On Tue, Jan 17, 2023 at 7:56 PM Eric Biggers <[email protected]> wrote:
> >
> > [Added some Cc's, and updated subject to reflect what this is really about]
>
> Hmm.
>
> I really hate this.
>
> > On Tue, Jan 17, 2023 at 05:10:55PM -0700, Andreas Dilger wrote:
> > > >
> > > > My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
> > > > failure for generic/454 on the 4k file system configuration and all other
> > > > configurations using a 4k block size. This failure reproduces with 100%
> > > > reliability and continues to appear as of 6.2-rc4.
> > > >
> > > > The test output indicates that the file system under test is inconsistent.
> > >
> > > There is actually support in the superblock for both signed and unsigned char
> > > hash calculations, exactly because there was a bug like this in the past.
> > > It looks like the ext4 code/build is still using the signed hash functions:
>
> So clearly ext4 is completely buggy in this respect, but this is
> exactly what would happen if you just mount a disk that was written to
> on (old, pre-funsigned-char) x86, and then mount it on, say, an arm
> machine that has always been unsigned-char.
>
> That was always supposed to work.
>
> So switching to a new kernel really should just be *exactly* the same
> as moving the disk to a different machine.
>
> And that's still "supposed to just work".
>
> And this whole "let's get it wrong, and make x86 act as if 'char' is
> signed, even when it isn't" seems entirely the wrong way around.
>
> So the bug here is that __ext4_fill_super() seems to not look at the
> actual on-disk thing, but instead do
>
> > > static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > > {
> ...
> > > else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
> > > #ifdef __CHAR_UNSIGNED__
>
> but at the same time this code is *exactly* the code that is trying to
> deal with "oh, you moved the disk from a signed architecture to an
> unsigned one".
>
> So the above code is literally what should fix up that movement -
> taking the *actual* new signedness (or lack thereof, in this case)
> into account.
>
> Now, it apparently doesn't work very well, and I suspect the reason it
> doesn't work is that the xattr code doesn't actually test these
> EXT2_FLAGS_SIGNED_HASH bits (and the s_hash_unsigned field value that
> goes along with it).
>
> But we should *fix* that.
Well, reading the code more carefully, the on-disk ext4 superblock can contain
EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither. "Neither" is the
legacy case. The above existing code in ext4 is handling the "neither" case by
setting the flag corresponding to the default signedness of char. So yes, that
migration code was always broken if you moved the disk from a platform with
signed char (e.g. x86) to a platform with unsigned char (e.g. arm). But,
-funsigned-char breaks it whenever the disk stays on a platform with signed
char. That seems much worse. Though, it's also a migration for legacy
filesystems, so maybe that code isn't needed often anymore anyway...
Anyway, as I found out after my initial reply, and which you noticed too, the
thing that actually caused the breakage here is the xattr hash. The xattr hash
isn't related to the logic in __ext4_fill_super(), or to EXT2_FLAGS_SIGNED_HASH
and EXT2_FLAGS_UNSIGNED_HASH. Those are only for the dirhash. Unfortunately,
ext4's xattr hash didn't get fixed years ago when the dirhash did. The xattr
hash still just always uses 'char'.
> (a) just admit that ext4 was buggy, and say "char is now unsigned",
> and know that generic/454 will fail when you switch from a buggy
> kernel to a new one that no longer has this signedness bug.
It seems kind of crazy to intentionally break xattrs with non-ASCII names upon a
kernel upgrade...
>
> (b) fix ext4_xattr_hash_entry() to actually see "oh, this filesystem was
> created with signed chars, and so we'll use that algorithm even though our
> chars are always unsigned".
>
> Honestly, the only actual case of breakage I have heard of is that test, so I
> was hoping that (a) is simply the acceptable and simplest solution. It
> basically says "nobody really cares, we're now always unsigned, real people
> didn't use non-ASCII xattr names".
>
> Anyway, here's a TOTALLY UNTESTED patch to do (b). Maybe it's entirely broken,
> but I think you can see what I'm aiming for.
>
> Comments?
>
> Linus
> fs/ext4/xattr.c | 41 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 7decaaf27e82..69a1b8c6a2ec 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -81,6 +81,8 @@ ext4_xattr_block_cache_find(struct inode *, struct ext4_xattr_header *,
> struct mb_cache_entry **);
> static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
> size_t value_count);
> +static __le32 ext4_xattr_hash_entry_signed(char *name, size_t name_len, __le32 *value,
> + size_t value_count);
> static void ext4_xattr_rehash(struct ext4_xattr_header *);
>
> static const struct xattr_handler * const ext4_xattr_handler_map[] = {
> @@ -470,8 +472,21 @@ ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
> tmp_data = cpu_to_le32(hash);
> e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len,
> &tmp_data, 1);
> - if (e_hash != entry->e_hash)
> - return -EFSCORRUPTED;
> + /* All good? */
> + if (e_hash == entry->e_hash)
> + return 0;
> +
> + /*
> + * Not good. Maybe the entry hash was calculated
> + * using the buggy signed char version?
> + */
> + e_hash = ext4_xattr_hash_entry_signed(entry->e_name, entry->e_name_len,
> + &tmp_data, 1);
> + if (e_hash == entry->e_hash)
> + return 0;
> +
> + /* Still no match - bad */
> + return -EFSCORRUPTED;
> }
> return 0;
> }
> @@ -3091,6 +3106,28 @@ static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
> return cpu_to_le32(hash);
> }
>
> +/*
> + * ext4_xattr_hash_entry_signed()
> + *
> + * Compute the hash of an extended attribute incorrectly.
> + */
> +static __le32 ext4_xattr_hash_entry_signed(char *name, size_t name_len, __le32 *value, size_t value_count)
> +{
> + __u32 hash = 0;
> +
> + while (name_len--) {
> + hash = (hash << NAME_HASH_SHIFT) ^
> + (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
> + (signed char)*name++;
> + }
> + while (value_count--) {
> + hash = (hash << VALUE_HASH_SHIFT) ^
> + (hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^
> + le32_to_cpu(*value++);
> + }
> + return cpu_to_le32(hash);
> +}
> +
I think that what your patch does is allow filesystems to contain both signed
and unsigned xattr hashes, and write out new ones as unsigned. That might work,
though e2fsprogs would need to be fixed too, and old versions of e2fsck would
corrupt xattrs unless a new ext4 filesystem feature flag was added.
Maybe using the solution that the dirhash uses, where the signedness of the hash
is explicitly stored in the superblock on-disk, would be better. ext4 would
still need to know the default signedness of a char in order to set the on-disk
flag in the first place, though, unless it assumes that the correct value is
already conveyed by the filesystem having either EXT2_FLAGS_SIGNED_HASH or
EXT2_FLAGS_UNSIGNED_HASH set. (But those are currently meant for the dirhash,
not the xattr hash, and apparently they aren't set on legacy filesystems.)
- Eric
On Tue, Jan 17, 2023 at 9:14 PM Eric Biggers <[email protected]> wrote:
>
> Well, reading the code more carefully, the on-disk ext4 superblock can contain
> EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither. "Neither" is the
> legacy case. The above existing code in ext4 is handling the "neither" case by
> setting the flag corresponding to the default signedness of char. So yes, that
> migration code was always broken if you moved the disk from a platform with
> signed char (e.g. x86) to a platform with unsigned char (e.g. arm). But,
> -funsigned-char breaks it whenever the disk stays on a platform with signed
> char. That seems much worse. Though, it's also a migration for legacy
> filesystems, so maybe that code isn't needed often anymore anyway...
The xattr hash is also broken if it stays on one single machine, but
is accessed two different ways.
Example: about half our architectures are mainly tested inside qemu,
so if you ever end up using the same disk image across two emulated
environments, you'll hit the exact same thing.
Basically, any filesystem that depends on host byte order, or on
host-specific data sizes - or on host signedness rules - is simply
completely and utterly broken (unless it's something like 'tmpfs' that
doesn't have any existence outside of that machine).
So ext4 has been broken from day one when it comes to xattr hashing.
And nobody ever noticed, because very few people use xattrs to begin
with, and when they do it tends to be very limited. And they seldom
mix architectures.
But "nobody noticed" doesn't mean it wasn't broken. It was always
completely and unambiguously buggy.
> > (a) just admit that ext4 was buggy, and say "char is now unsigned",
> > and know that generic/454 will fail when you switch from a buggy
> > kernel to a new one that no longer has this signedness bug.
>
> It seems kind of crazy to intentionally break xattrs with non-ASCII names upon a
> kernel upgrade...
I really don't think they happen very much, and if we can fix a bug
without doing anything about it, and nobody notices, that would be
fine by me.
But:
> I think that what your patch does is allow filesystems to contain both signed
> and unsigned xattr hashes, and write out new ones as unsigned.
Right. Nobody seems to actually care about the hash, as far as I can tell.
It's used for that corruption check. And it is used by
ext4_xattr_block_cache_find() to basically reuse a cached entry, but
it has no actual semantic meaning.
> That might work,
> though e2fsprogs would need to be fixed too, and old versions of e2fsck would
> corrupt xattrs unless a new ext4 filesystem feature flag was added.
The thing is, ef2progs NEEDS TO BE FIXED REGARDLESS!
You don't seem to realize that this is a fundamental filesystem bug.
It was not introduced by "-funsigned-char". It's been there for decades.
Re-introducing the "let's try to hide this bug" logic like your patch
does is disgusting and actively wrong.
This bug needs to be *fixed*.
And since we don't seem to have a "this filesystem uses stupid signed
hash arithmetic" flag (the EXT2_FLAGS_SIGNED_HASH only covers the
filename case), and since nobody actually cares, the best option seems
to be to just do what the code should have done originally, ie not
rely on 'char' being sign-extended.
A simpler patch would be to actually just entirely remove the check of
the e_hash value entiely, ie just this:
- if (e_hash != entry->e_hash)
- return -EFSCORRUPTED;
and just say that the hash was always broken and the test for a random
value is not worth it.
Linus
On Wed, Jan 18, 2023 at 07:48:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 17, 2023 at 9:14 PM Eric Biggers <[email protected]> wrote:
> >
> > Well, reading the code more carefully, the on-disk ext4 superblock can contain
> > EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither. "Neither" is the
> > legacy case. The above existing code in ext4 is handling the "neither" case by
> > setting the flag corresponding to the default signedness of char. So yes, that
> > migration code was always broken if you moved the disk from a platform with
> > signed char (e.g. x86) to a platform with unsigned char (e.g. arm). But,
> > -funsigned-char breaks it whenever the disk stays on a platform with signed
> > char. That seems much worse. Though, it's also a migration for legacy
> > filesystems, so maybe that code isn't needed often anymore anyway...
>
> The xattr hash is also broken if it stays on one single machine, but
> is accessed two different ways.
>
> Example: about half our architectures are mainly tested inside qemu,
> so if you ever end up using the same disk image across two emulated
> environments, you'll hit the exact same thing.
>
> Basically, any filesystem that depends on host byte order, or on
> host-specific data sizes - or on host signedness rules - is simply
> completely and utterly broken (unless it's something like 'tmpfs' that
> doesn't have any existence outside of that machine).
>
> So ext4 has been broken from day one when it comes to xattr hashing.
>
> And nobody ever noticed, because very few people use xattrs to begin
> with, and when they do it tends to be very limited. And they seldom
> mix architectures.
>
> But "nobody noticed" doesn't mean it wasn't broken. It was always
> completely and unambiguously buggy.
>
> > > (a) just admit that ext4 was buggy, and say "char is now unsigned",
> > > and know that generic/454 will fail when you switch from a buggy
> > > kernel to a new one that no longer has this signedness bug.
> >
> > It seems kind of crazy to intentionally break xattrs with non-ASCII names upon a
> > kernel upgrade...
>
> I really don't think they happen very much, and if we can fix a bug
> without doing anything about it, and nobody notices, that would be
> fine by me.
>
> But:
>
> > I think that what your patch does is allow filesystems to contain both signed
> > and unsigned xattr hashes, and write out new ones as unsigned.
>
> Right. Nobody seems to actually care about the hash, as far as I can tell.
>
> It's used for that corruption check. And it is used by
> ext4_xattr_block_cache_find() to basically reuse a cached entry, but
> it has no actual semantic meaning.
>
> > That might work,
> > though e2fsprogs would need to be fixed too, and old versions of e2fsck would
> > corrupt xattrs unless a new ext4 filesystem feature flag was added.
>
> The thing is, ef2progs NEEDS TO BE FIXED REGARDLESS!
>
> You don't seem to realize that this is a fundamental filesystem bug.
>
> It was not introduced by "-funsigned-char". It's been there for decades.
>
> Re-introducing the "let's try to hide this bug" logic like your patch
> does is disgusting and actively wrong.
>
> This bug needs to be *fixed*.
>
> And since we don't seem to have a "this filesystem uses stupid signed
> hash arithmetic" flag (the EXT2_FLAGS_SIGNED_HASH only covers the
> filename case), and since nobody actually cares, the best option seems
> to be to just do what the code should have done originally, ie not
> rely on 'char' being sign-extended.
>
> A simpler patch would be to actually just entirely remove the check of
> the e_hash value entiely, ie just this:
>
> - if (e_hash != entry->e_hash)
> - return -EFSCORRUPTED;
>
> and just say that the hash was always broken and the test for a random
> value is not worth it.
>
Of course I understand there's a fundamental filesystem bug. The question is
what to *do* about it. The patches I suggested were *only* intended to make the
ext4 code work the way it did in v6.1 as a workaround, and to start a discussion
about how to detect the platform's default signedness of a char, since it does
seem that's still going to be needed regardless, even though it of course never
should have been needed in the first place. I'm glad that you're interested in
helping with a more fundamental fix as well.
Now, the options for that are:
For the dirhash:
1a.) When mounting a filesystem that doesn't have the signedness of the dirhash
explicitly stored, assume it's the platform's default signedness, and
explicitly store that. (Behavior in v6.1 and earlier.)
1b.) When mounting a filesystem that doesn't have the signedness of the dirhash
explicitly stored, assume it's unsigned, and explicitly store that on-disk.
(Behavior in v6.2-rc4.)
For the xattr hash:
2a.) When mounting a filesystem that doesn't have the signedness of the xattr
hash explicitly stored, assume it's the platform's default signedness, and
explicitly store that. (Like how the dirhash worked.)
2b.) Change the xattr hash to always be unsigned. (Behavior in v6.2-rc4.)
2c.) Write new xattr hashes as unsigned, and allow the filesystem to contain
both unsigned and signed xattr hashes, without any explicit indication. If
the hash fails to verify as unsigned, try verifying it as signed too.
(1a) and (2a) would be the least likely to break users. [(2a) instead of (2c),
since (2c) would make old versions of e2fsck break filesystems on platforms with
signed char.] And those solutions require being able to detect the platform's
default signedness, however much we hate it.
Now, we seem to have gotten the "let's break userspace, lol" version of Linus
today, not the "SHUT THE FUCK UP, WE DO NOT BREAK USERSPACE" version of Linus
(https://lore.kernel.org/r/CA+55aFy98A+LJK4+GWMcbzaa1zsPBRo76q+ioEjbx-uaMKH6Uw@mail.gmail.com).
So sure, if we're extremely confident that no one, or at least no one we care
about, is mounting very old filesystems that haven't been mounted in a long
time, or using non-ASCII xattr names, then sure we could break those cases.
These cases were indeed already broken if a filesystem moved between platforms
with different char signedness, so that could be a reason not to care, although
"moving between platforms" is *much* less common than "same platform".
- Eric
On Wed, Jan 18, 2023 at 11:14 AM Eric Biggers <[email protected]> wrote:
>
> Now, we seem to have gotten the "let's break userspace, lol" version of Linus
> today, not the "SHUT THE FUCK UP, WE DO NOT BREAK USERSPACE" version of Linus
Heh.
Note that the reason I'm so laissez-faire about it is that "broken
test case" is something very different from "actually broken user
space".
I haven't actually seen anybody _report_ this as a problem, I've only
seen the generic/454 xfstest failures.
And "test failure" is simply not the same thing as "user failure".
Test failures are interesting in that they can most definitely
pinpoint the source of _potential_ user failures, but sometimes they
are just esoteric corner cases that don't happen in reality.
So the fact that we have had this bug since forever makes me suspect
absolutely nobody cares in real life. Yes, what's new is that it
happens on the same machine, but people have definitely moved ext4 USB
sticks around etc. I've most definitely done that myself, and it's not
been just between x86 machines.
Of course, it may also be that the filesystem patterns when you move a
USB stick around is very different from, say, the root filesystem
where you _don't_ necessarily tend to do it. So maybe the lack of
reports over the decades is not because people don't use xattrs with
the high bit set in the xattr names, but because it only happens in
situations that don't have that filesystem movement.
I dunno. On my system, at least, there is absolutely no sign of any
odd xattr names, according to something disgusting like
find / -xdev -type f -print0 | xargs -0 getfattr
but who knows.
Linus
On Tue, Jan 17, 2023 at 09:14:12PM -0800, Eric Biggers wrote:
>
> Well, reading the code more carefully, the on-disk ext4 superblock can contain
> EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither. "Neither" is the
> legacy case. The above existing code in ext4 is handling the "neither" case by
> setting the flag corresponding to the default signedness of char. So yes, that
> migration code was always broken if you moved the disk from a platform with
> signed char (e.g. x86) to a platform with unsigned char (e.g. arm). But,
> -funsigned-char breaks it whenever the disk stays on a platform with signed
> char. That seems much worse. Though, it's also a migration for legacy
> filesystems, so maybe that code isn't needed often anymore anyway...
The migration code will detect what the current signedness of char,
and then *set* the SIGNED/UNSIGNED flag. Once we've set the flag,
we'll continue to use it, and modern versions of e2fsprogs will set
the SIGNED/UNSIGNED flag again using the default signedness of char.
Once the signed/unsigned flag is set, then we can freely move the file
system between architectures and it's not a problem. So if you have a
file system that was created before 2007 (e2fsprogs v1.40), the first
time you mount that file system on a kernel or run e2fsck on a scheme
that understands the new SIGNED/UNSIGNED scheme, then we use whatever
hash variant that had been used on the pre-2007 kernel/e2fsprogs
forever.
commit f77704e416fca7dbe4cc91abba674d2ae3c14f6f
Author: Theodore Ts'o <[email protected]>
Date: Sat Nov 11 22:32:35 2006 -0500
Add directory hashed signed/unsigned hint to superblock
The e2fsprogs and kernel implementation of directory hash tree has a
bug which causes the implementation to be dependent on whether
characters are signed or unsigned. Platforms such as the PowerPC,
Arm, and S/390 have signed characters by default, which means that
hash directories on those systems are incompatible with hash
directories on other systems, such as the x86.
To fix this we add a new flags field to the superblock, and define two
new bits in that field to indicate whether or not the directory should
be signed or unsigned. If the bits are not set, e2fsck and fixed
kernels will set them to the signed/unsigned value of the currently
running platform, and then respect those bits when calculating the
directory hash. This allows compatibility with current filesystems,
as well as allowing cross-architectural compatibility.
Addresses Debian Bug: #389772
Signed-off-by: "Theodore Ts'o" <[email protected]>
So techncially speaking the migration code is probably not needed any
more, but it's not a lot of code (just a few lines of code to set the
flags if neither flag is set), and it's always possible that that you
might try to mount a pre-2006 file system on a 2022 system, so we
might as well keep it.
> Anyway, as I found out after my initial reply, and which you noticed too, the
> thing that actually caused the breakage here is the xattr hash. The xattr hash
> isn't related to the logic in __ext4_fill_super(), or to EXT2_FLAGS_SIGNED_HASH
> and EXT2_FLAGS_UNSIGNED_HASH. Those are only for the dirhash. Unfortunately,
> ext4's xattr hash didn't get fixed years ago when the dirhash did. The xattr
> hash still just always uses 'char'.
I *think* we're OK, because we only use XOR (^) on the char in the
xattr entry hash:
static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
size_t value_count)
{
__u32 hash = 0;
while (name_len--) {
hash = (hash << NAME_HASH_SHIFT) ^
(hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
*name++;
}
while (value_count--) {
hash = (hash << VALUE_HASH_SHIFT) ^
(hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^
le32_to_cpu(*value++);
}
return cpu_to_le32(hash);
}
And XOR shouldn't matter whether *name is signed or unsigned. Am I
missing something?
- Ted
On Wed, Jan 18, 2023 at 03:14:04PM -0600, Linus Torvalds wrote:
> You're missing the fact that 'char' gets expanded to 'int', and in the
> process but #7 gets copied to bits 8-31 if it is signed.
>
> Then the xor and the later shifting will move those bits around..
Doh! One of those C pitfalls that I don't know how I mad missed.
I agree with your analysis that in actual practice, almost no one
actually uses non-ASCII characters for xattr names. (Filenames, yes,
but in general xattr names are set by programs, not by users.) So
besides xfstests generic/454, how likely is it that people would be
using things like Octopus emoji's or Unicode characters such as <GREEK
UPSILON WITH ACUTE AND HOOK SYMBOL>? Very unlikely, I'd argue. I
might be a bit more worried about userspace applications written for,
say, Red Flag Linux in China using chinese characters in xattrs, but
I'd argue even there it's much more likely that this would be in the
xattr values as opposed to the name.
In terms of what should we do for next steps, if we only pick signed,
then it's possible if there are some edge case users who actually did
use non-ASCII characters in the xattr name on PowerPC, ARM, or S/390,
they would be broken. That's simpler, and if we think there are
darned few of them, I guess we could do that.
That being said, it's not that much more work to use a flag in the
superblock to indicate whether or not we should be explicitly casting
*name to either a signed or unsigned char, and then setting that flag
automagically to avoid problems on people who started the file system
on say, x86 before the signed to unsigned char transition, and who
started natively on a PowerPC, ARM, or S/390.
The one bit which makes this a bit more complex is either way, we need
to change both the kernel and e2fsprogs, which is why if we do the
signed/unsigned xattr hash flag, it's important to set the flag value
to be whatever the "default" signeded would be on that architecture
pre 6.2-rc1.
- Ted
> On Jan 18, 2023, at 12:39 PM, Linus Torvalds <[email protected]> wrote:
>> Note that the reason I'm so laissez-faire about it is that "broken
>> test case" is something very different from "actually broken user
>> space".
>>
>> I haven't actually seen anybody _report_ this as a problem, I've only
>> seen the generic/454 xfstest failures.
That is likely because the number of 6.2+ kernel users who are using
Unicode xattr names is small, but they would likely come out of the
woodwork once Ubuntu/RHEL start using those kernels, and by then it
would be too late to fix this in a compatible manner.
> On Wed, Jan 18, 2023 at 03:14:04PM -0600, Linus Torvalds wrote:
>> You're missing the fact that 'char' gets expanded to 'int', and in the
>> process but #7 gets copied to bits 8-31 if it is signed.
>>
>> Then the xor and the later shifting will move those bits around..
>
> I agree with your analysis that in actual practice, almost no one
> actually uses non-ASCII characters for xattr names. (Filenames, yes,
> but in general xattr names are set by programs, not by users.) So
> besides xfstests generic/454, how likely is it that people would be
> using things like Octopus emoji's or Unicode characters such as <GREEK
> UPSILON WITH ACUTE AND HOOK SYMBOL>? Very unlikely, I'd argue. I
> might be a bit more worried about userspace applications written for,
> say, Red Flag Linux in China using chinese characters in xattrs, but
> I'd argue even there it's much more likely that this would be in the
> xattr values as opposed to the name.
>
> In terms of what should we do for next steps, if we only pick signed,
> then it's possible if there are some edge case users who actually did
> use non-ASCII characters in the xattr name on PowerPC, ARM, or S/390,
> they would be broken. That's simpler, and if we think there are
> darned few of them, I guess we could do that.
>
> That being said, it's not that much more work to use a flag in the
> superblock to indicate whether or not we should be explicitly casting
> *name to either a signed or unsigned char, and then setting that flag
> automagically to avoid problems on people who started the file system
> on say, x86 before the signed to unsigned char transition, and who
> started natively on a PowerPC, ARM, or S/390.
>
> The one bit which makes this a bit more complex is either way, we need
> to change both the kernel and e2fsprogs, which is why if we do the
> signed/unsigned xattr hash flag, it's important to set the flag value
> to be whatever the "default" signeded would be on that architecture
> pre 6.2-rc1.
It makes sense to use the existing UNSIGNED/SIGNED flag in the superblock
for the dir hash also for the xattr hash. That would give the historical
value for the xattr hashes prior to the 6.2 unsigned char change, and is
correct for all filesystems and xattrs *except* non-ASCII xattr names
created on 6.2+ kernels (which should indeed be relatively few cases).
e2fsck could do the same, and would again be correct for all xattrs names
except those created with kernel 6.2+. It could check both the signed
and unsigned forms and correct those few that are reversed compared to
the superblock flag, which should be rare, but isn't much work and avoids
incorrectly clearing the "corrupted" xattrs.
Cheers, Andreas
On Wed, Jan 18, 2023 at 03:20:03PM -0700, Andreas Dilger wrote:
> > In terms of what should we do for next steps, if we only pick signed,
> > then it's possible if there are some edge case users who actually did
> > use non-ASCII characters in the xattr name on PowerPC, ARM, or S/390,
> > they would be broken. That's simpler, and if we think there are
> > darned few of them, I guess we could do that.
> >
> > That being said, it's not that much more work to use a flag in the
> > superblock to indicate whether or not we should be explicitly casting
> > *name to either a signed or unsigned char, and then setting that flag
> > automagically to avoid problems on people who started the file system
> > on say, x86 before the signed to unsigned char transition, and who
> > started natively on a PowerPC, ARM, or S/390.
> >
> > The one bit which makes this a bit more complex is either way, we need
> > to change both the kernel and e2fsprogs, which is why if we do the
> > signed/unsigned xattr hash flag, it's important to set the flag value
> > to be whatever the "default" signeded would be on that architecture
> > pre 6.2-rc1.
>
> It makes sense to use the existing UNSIGNED/SIGNED flag in the superblock
> for the dir hash also for the xattr hash. That would give the historical
> value for the xattr hashes prior to the 6.2 unsigned char change, and is
> correct for all filesystems and xattrs *except* non-ASCII xattr names
> created on 6.2+ kernels (which should indeed be relatively few cases).
>
> e2fsck could do the same, and would again be correct for all xattrs names
> except those created with kernel 6.2+. It could check both the signed
> and unsigned forms and correct those few that are reversed compared to
> the superblock flag, which should be rare, but isn't much work and avoids
> incorrectly clearing the "corrupted" xattrs.
Reusing the existing SIGNED/UNSIGNED dirhash superblock flag wouldn't work in
the case where a filesystem is created on x86, then flashed to an ARM device.
That's exactly how Android works. The dirhash is then explicitly SIGNED,
whereas the xattr hash is implicitly unsigned.
Yes, that case has always been broken if there were non-ASCII xattr names in the
filesystem to begin with, before it was flashed to the device. That doesn't
really happen though. What *is* plausible is that some random application
running on the device uses non-ASCII xattr names.
So to be safe, a new pair of SIGNED/UNSIGNED superblock flags for the xattr hash
would be needed. The existing pair would be for the dirhash only.
- Eric
On Wed, Jan 18, 2023 at 2:20 PM Andreas Dilger <[email protected]> wrote:
>
> It makes sense to use the existing UNSIGNED/SIGNED flag in the superblock
> for the dir hash also for the xattr hash.
No. As Eric says, the existing flag for this - while related to a very
similar issue - just doesn't work.
It's for a different hash, and the flag has been set (or not) based on
previous history of that particular filesystem that isn't necessarily
at all relevant. The "signedness" of the xattrs hash simply doesn't
necessarily match the signedness of the existing flag.
Also, honestly, it's just entirely POINTLESS.
Here's the deal: these kinds of flags simply SHOULD NOT EXIST. A
filesystem that has dynamic byte order flags is an INCOMPETENT
filesystem. Yes, I know they exist. The people involved should be
ashamed of them.
For things like byte order flags, it's simply unambiguously better to
just have an unconditional on-disk byte order, even when that byte
order doesn't match the "native" byte order of the machine. Doing an
unconditional 'bswap' operation is literally smaller, simpler and
faster than conditionally doing nothing, and makes for much easier
verification (because you can use static typing rules, like we use for
"__le32" and types like that).
The EXACT same thing is true of architecture-specific signedness.
Yes, you can add flags to dynamically choose one or the other, but
that's actively *worse* then just picking one statically and saying
"this is the correct one".
And honestly, the unsigned version was always the correct one. It's
not even half-way ambiguous like the whole little-endian vs big-endian
discussion used to be. In this very thread Ted made clear that he
hadn't even realized that this signed case could happen and would
matter. The signed case is surprising and unintentional.
So, given that
(a) it's just a *fact* that the static choice is better
(b) of the two choices, one is clearly correct
I'm just going to put my foot down and say that right way forward is
clearly to just make that be the de-facto rule.
And honestly, it's the simpler patch too, since "-funsigned-char" has
already done all the actual "pick the right hash" for us.
We have four levels of "how much do we care" and simplicity:
(1) do nothing, see if anybody actually ever notices outside of generic/454
(2) just remove the hash check entirely, it's not doing anything
semantically meaningful
(3) keep the hash check, replace the "return -EFSCORRUPTED" with a
"pr_warn_once()" and returning zero
(4) the "keep the hash check, if it fails, test it the other way,
always write the correct new hash" patch I already posted
but the thing I refuse to do is to either just maintain the bug as-is
with extra complexity (Eric's patch) or add a new flag (or erroneously
re-use an existing flag) to make it dynamic.
The e2fsck argument is bugus, because *every* case requires e2fsck
changes. Even if we maintain the completely bogus historical behavior
of "the on-disk filesystem representation depends on architecture",
e2fsck had better start warning about it.
And again, I guarantee that the "fix the signedness" is the simpler
thing even for e2fsck, since at least then there is one unambiguously
correct version and one incorrect one, instead of some wishy-washy
"oh, this filesystem is correct for *THESE* architectures".
I'm attaching my suggested patch here again - this time with a commit
message - because while I'm ok with any of options 1-4 above, I
already wrote the patch for the most complicated case, and it's not
*that* complex.
We could possibly add a "pr_warn_once()" so that we have a combination
of (3) and (4).
In the longer term, we should
(a) remove the #ifdef __CHAR_UNSIGNED__ in ext4 as always true
(happily, I grepped - it's the only case in the whole source tree)
(b) probably plan on _eventually_ just remove my disgusting
"calculate signed version" thing, once we think it's all blown over
and we have never seen one of those pr_warn_on() messages outside of
generic/454
but I *really* don't want to add some new filesystem flag to deal with
this situation when a non-flag version is just simpler.
Hmm?
Linus