please be noted we felt quite strange why this change could cause
xfstests.generic.454.fail
but even by rebuild and rerun, the issue seems persistent while keeping clean
on parent
daf4218bf8dd9750 3bc753c06dd02a3517c9b498e38
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:10 100% 10:10 xfstests.generic.454.fail
so we just report out FYI to seek any insights about this.
Greeting,
FYI, we noticed xfstests.generic.454.fail due to commit (built with gcc-11):
commit: 3bc753c06dd02a3517c9b498e3846ebfc94ac3ee ("kbuild: treat char as always unsigned")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linux-next/master c76083fac3bae1a87ae3d005b5cb1cbc761e31d5]
in testcase: xfstests
version: xfstests-x86_64-fb6575e-1_20221226
with following parameters:
disk: 4HDD
fs: ext4
test: generic-group-22
test-description: xfstests is a regression test suite for xfs and other files ystems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]
generic/454 _check_generic_filesystem: filesystem on /dev/sda4 is inconsistent
(see /lkp/benchmarks/xfstests/results//generic/454.full for details)
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
On Thu, Dec 29, 2022 at 12:49 AM kernel test robot
<[email protected]> wrote:
>
> generic/454 _check_generic_filesystem: filesystem on /dev/sda4 is inconsistent
The commentary on that test is:
Create xattrs with multiple keys that all appear the same
(in unicode, anyway) but point to different values. In theory all
Linux filesystems should allow this (filenames are a sequence of
arbitrary bytes) even if the user implications are horrifying.
and looking at the script it seems to indeed just do setfattr and
getfattr with some unusual data (ie high bit set).
Adding Ted, since this is apparently all on ext4. I guess it could be
the vfs layer too, but it really doesn't tend to look very much at the
xattr data, so.. Adding Christian Brauner anyway, since he's been
working in this area for other reasons.
Ted, Christian - I cut down the report mercilessly. It's not really
all that interesting, apart from the basic information of "xfstest
generic/454 started failing consistently on ext4 at commit
3bc753c06dd0 ('kbuild: treat char as always unsigned')".
If you think you need more, see
https://lore.kernel.org/all/[email protected]/
Also, I'm surprised this hasn't been an issue earlier - 'char' has
always been unsigned on arm (among other architectures), so if this
test started failing now on x86-64 due to -funsigned-char, it has
presumably been failing on arm the whole time.
I assume it's something that compares a 'char *name' by value, but the
ones I looked at (eg xattr_find_entry() used strlen()/memcmp() which
should be all good).
Oh, I think I see one potential problem in ext4:
ext4_xattr_hash_entry() is hot garbage. Lookie here:
while (name_len--) {
hash = (hash << NAME_HASH_SHIFT) ^
(hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
*name++;
}
so that hash will now depend on the sign of that 'char *name' pointer.
If that hash ever has any long-term meaning (ie saved on disk or
exposed some other way), that would be problematic.
Of course, if it's just an ephemeral thing only used on that one
machine, then it isn't a problem - having different hashes on
different machines (and different boots) is a non-issue. But
considering that it then does
return cpu_to_le32(hash);
at the end does seem to imply to me that it all really *tries* to be
architecture-neutral, and has just failed horrendously.
Because that does look like code that might get confused by the sign of a char.
There might be other cases, I only looked very quickly for these kinds
of problems.
Linus
On Thu, Dec 29, 2022 at 10:55:05AM -0800, Linus Torvalds wrote:
> Also, I'm surprised this hasn't been an issue earlier - 'char' has
> always been unsigned on arm (among other architectures), so if this
> test started failing now on x86-64 due to -funsigned-char, it has
> presumably been failing on arm the whole time.
That's the curious part, indeed...
> Oh, I think I see one potential problem in ext4:
>
> ext4_xattr_hash_entry() is hot garbage. Lookie here:
>
> while (name_len--) {
> hash = (hash << NAME_HASH_SHIFT) ^
> (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
> *name++;
> }
>
> so that hash will now depend on the sign of that 'char *name' pointer.
>
> If that hash ever has any long-term meaning (ie saved on disk or
> exposed some other way), that would be problematic.
Note that ext4 has lots of sign-specific code for hashing. Only some of
it can now be removed, since compatibility with old file systems must be
preserved. But what I mean is the code that begins in super.c:
i = le32_to_cpu(es->s_flags);
if (i & EXT2_FLAGS_UNSIGNED_HASH)
sbi->s_hash_unsigned = 3;
else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
#ifdef __CHAR_UNSIGNED__
if (!sb_rdonly(sb))
es->s_flags |=
cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
sbi->s_hash_unsigned = 3;
#else
if (!sb_rdonly(sb))
es->s_flags |=
cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
#endif
}
The second part of that #else can now go away. And then maybe the whole
expression can be simplified.
These actually wind up being used in namei.c:
hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
, which then sets the hash version that's selected in hash.c:
switch (hinfo->hash_version) {
case DX_HASH_LEGACY_UNSIGNED:
hash = dx_hack_hash_unsigned(name, len);
break;
case DX_HASH_LEGACY:
hash = dx_hack_hash_signed(name, len);
break;
case DX_HASH_HALF_MD4_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
fallthrough;
case DX_HASH_HALF_MD4:
p = name;
[...]
And so on. dx_hack_hash_unsigned() and dx_hack_hash_signed() are the
same functions, except one uses `unsigned char` and the other uses
`signed char`. It's unfortunate these exist, but now it's part of the
on-disk format, so they have to stick around (along with other warts
like "halfmd4").
But at least for new file systems, things should be unified. Anyway, it
looks like for *these* hashes, the ext4 developers did consider the
signedness issue.
Sounds like maybe it was left out of ext4_xattr_hash_entry(), which does
indeed look like it's part of the on-disk representation:
static int
ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
struct ext4_xattr_entry *entry, void *buffer,
size_t size)
{
u32 hash;
/* Verify stored hash matches calculated hash. */
hash = ext4_xattr_inode_hash(EXT4_SB(ea_inode->i_sb), buffer, size);
if (hash != ext4_xattr_inode_get_hash(ea_inode))
return -EFSCORRUPTED;
if (entry) {
__le32 e_hash, tmp_data;
/* Verify entry hash. */
tmp_data = cpu_to_le32(hash);
e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len,
&tmp_data, 1);
if (e_hash != entry->e_hash)
return -EFSCORRUPTED;
}
return 0;
}
So if ext4_xattr_hash_entry() is indeed broken with respect to
heisensignness, then this stuff has always been broken, and it's
probably good that this is being unearthed...
Jason
[Note: this mail contains only information for Linux kernel regression
tracking. Mails like these contain '#forregzbot' in the subject to make
then easy to spot and filter out. The author also tried to remove most
or all individuals from the list of recipients to spare them the hassle.]
On 29.12.22 09:49, kernel test robot wrote:
>
> please be noted we felt quite strange why this change could cause
> xfstests.generic.454.fail
> but even by rebuild and rerun, the issue seems persistent while keeping clean
> on parent
>
> daf4218bf8dd9750 3bc753c06dd02a3517c9b498e38
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :10 100% 10:10 xfstests.generic.454.fail
>
> so we just report out FYI to seek any insights about this.
>
>
> Greeting,
>
> FYI, we noticed xfstests.generic.454.fail due to commit (built with gcc-11):
>
> commit: 3bc753c06dd02a3517c9b498e3846ebfc94ac3ee ("kbuild: treat char as always unsigned")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linux-next/master c76083fac3bae1a87ae3d005b5cb1cbc761e31d5]
>
> in testcase: xfstests
> version: xfstests-x86_64-fb6575e-1_20221226
> with following parameters:
>
> disk: 4HDD
> fs: ext4
> test: generic-group-22
>
Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:
#regzbot ^introduced 3bc753c06dd02a
#regzbot title ext4/acls: xfstests.generic.454 suddenly fails for the
kernel test robot
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
On Thu, Dec 29, 2022 at 10:55:05AM -0800, Linus Torvalds wrote:
> On Thu, Dec 29, 2022 at 12:49 AM kernel test robot
> <[email protected]> wrote:
> >
> > generic/454 _check_generic_filesystem: filesystem on /dev/sda4 is inconsistent
>
> The commentary on that test is:
>
> Create xattrs with multiple keys that all appear the same
> (in unicode, anyway) but point to different values. In theory all
> Linux filesystems should allow this (filenames are a sequence of
> arbitrary bytes) even if the user implications are horrifying.
>
> and looking at the script it seems to indeed just do setfattr and
> getfattr with some unusual data (ie high bit set).
>
> Adding Ted, since this is apparently all on ext4. I guess it could be
> the vfs layer too, but it really doesn't tend to look very much at the
> xattr data, so.. Adding Christian Brauner anyway, since he's been
> working in this area for other reasons.
The test uses the user.* xattr namespace which should be unaffected by
the xattr changes we did the last few cycles.
>
> Ted, Christian - I cut down the report mercilessly. It's not really
> all that interesting, apart from the basic information of "xfstest
> generic/454 started failing consistently on ext4 at commit
> 3bc753c06dd0 ('kbuild: treat char as always unsigned')".
>
> If you think you need more, see
>
> https://lore.kernel.org/all/[email protected]/
>
> Also, I'm surprised this hasn't been an issue earlier - 'char' has
> always been unsigned on arm (among other architectures), so if this
> test started failing now on x86-64 due to -funsigned-char, it has
> presumably been failing on arm the whole time.
>
> I assume it's something that compares a 'char *name' by value, but the
> ones I looked at (eg xattr_find_entry() used strlen()/memcmp() which
> should be all good).
>
> Oh, I think I see one potential problem in ext4:
>
> ext4_xattr_hash_entry() is hot garbage. Lookie here:
>
> while (name_len--) {
> hash = (hash << NAME_HASH_SHIFT) ^
> (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
> *name++;
> }
>
> so that hash will now depend on the sign of that 'char *name' pointer.
>
> If that hash ever has any long-term meaning (ie saved on disk or
> exposed some other way), that would be problematic.
If the xattrs aren't storable in the inode then they are stored in a
separate block. The consist of a header and after that is an array of
struct ext4_xattr_entry entries.
Each of the entries store a hash value e_hash which is a hash of the
xattr name and xattr value.
The aforementioned header contains another h_hash field which seems to
be a hash of all e_hash fields of all xattrs.
For in-inode/ea-inode xattrs a hash for xattr name and value is stored
on disk as well but there's no header. The i_atime field contains a
checksum of only the value that is stored in the inode.
IOW, the bug might also depend on how the xattrs are stored. For
example, what xattrs might be stored in the inode depends on the inode
size that is chosen when the filesystem is created. But if I'm not
mistaken, it also depends on the block size.
So if the block size chosen for x86 differs from arm that might have an
impact on how the xattrs are stored and thus make the bug more or less
likely to appear?...