2023-05-09 18:21:51

by Eric Whitney

Subject: 6.4-rc1 xfstests-bld adv regressions

Hi Jan:

I'm seeing two test regressions on 6.4-rc1 while running the adv test case
with kvm-xfstests. Both tests fail with 100% reliability in 100 trial runs,
and the failures appear to depend solely upon the fast commit mount option.

The first is generic/065, where the relevant info from 065.full is:

_check_generic_filesystem: filesystem on /dev/vdc is inconsistent
*** fsck.ext4 output ***
fsck from util-linux 2.36.1
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Directories count wrong for group #16 (4294967293, counted=0).
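
A side note on the number in that last line: 4294967293 equals 2^32 - 3, which is what -3 looks like when it is held in an unsigned 32-bit value. One possible reading, not confirmed anywhere in this report, is that the directory count was decremented below zero. The following is a minimal C sketch of that arithmetic only; the helper names are hypothetical and the 32-bit width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decrement an unsigned 32-bit counter with no
 * underflow check. Unsigned arithmetic wraps modulo 2^32. */
uint32_t dec_count(uint32_t count, uint32_t n)
{
    return count - n;
}

/* Interpret a 32-bit unsigned value as the signed number it would
 * represent in two's complement, without relying on casts whose
 * result is implementation-defined. */
int64_t as_signed32(uint32_t v)
{
    if (v > (uint32_t)INT32_MAX)
        return -(int64_t)(UINT32_MAX - v) - 1;
    return (int64_t)v;
}

/* Usage: ties the sketch back to the value printed by e2fsck. */
void check_reported_value(void)
{
    assert(dec_count(0, 3) == 4294967293u);
    assert(as_signed32(4294967293u) == -3);
}
```

Under these assumptions, a counter that should be 0 and is decremented three times too many would print exactly as shown in the fsck output.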


The second is generic/535, where the test output is:

QA output created by 535
Silence is golden
+Before: 755
+After : 777

Both test failures bisect to: e360c6ed7274 ("ext4: Drop special handling of
journalled data from ext4_sync_file()"). Reverting this patch eliminates the
test failures. So, I thought I'd bring these to your attention.

Eric


2023-05-09 19:11:20

by Jan Kara

Subject: Re: 6.4-rc1 xfstests-bld adv regressions

Hi Eric!

On Tue 09-05-23 14:20:15, Eric Whitney wrote:
> I'm seeing two test regressions on 6.4-rc1 while running the adv test case
> with kvm-xfstests. Both tests fail with 100% reliability in 100 trial runs,
> and the failures appear to depend solely upon the fast commit mount option.
>
> The first is generic/065, where the relevant info from 065.full is:
>
> _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
> *** fsck.ext4 output ***
> fsck from util-linux 2.36.1
> e2fsck 1.47.0 (5-Feb-2023)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Directories count wrong for group #16 (4294967293, counted=0).
>
>
> The second is generic/535, where the test output is:
>
> QA output created by 535
> Silence is golden
> +Before: 755
> +After : 777
>
> Both test failures bisect to: e360c6ed7274 ("ext4: Drop special handling of
> journalled data from ext4_sync_file()"). Reverting this patch eliminates the
> test failures. So, I thought I'd bring these to your attention.

Thanks for the report! Yeah, when doing commit e360c6ed7274 I forgot about
directories, which can also be fsynced and which need special treatment. I
have to think a bit about what the best way to fix this is.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


by Thorsten Leemhuis

Subject: Re: 6.4-rc1 xfstests-bld adv regressions

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 09.05.23 20:20, Eric Whitney wrote:
>
> I'm seeing two test regressions on 6.4-rc1 while running the adv test case
> with kvm-xfstests. Both tests fail with 100% reliability in 100 trial runs,
> and the failures appear to depend solely upon the fast commit mount option.
>
> The first is generic/065, where the relevant info from 065.full is:
>
> _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
> *** fsck.ext4 output ***
> fsck from util-linux 2.36.1
> e2fsck 1.47.0 (5-Feb-2023)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Directories count wrong for group #16 (4294967293, counted=0).
>
>
> The second is generic/535, where the test output is:
>
> QA output created by 535
> Silence is golden
> +Before: 755
> +After : 777
>
> Both test failures bisect to: e360c6ed7274 ("ext4: Drop special handling of
> journalled data from ext4_sync_file()"). Reverting this patch eliminates the
> test failures. So, I thought I'd bring these to your attention.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e360c6ed7274
#regzbot title ext4: adv test cases of kvm-xfstests fail
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-05-24 11:04:02

by Jan Kara

Subject: Re: 6.4-rc1 xfstests-bld adv regressions

Hello!

Due to conferences this took longer than it should have. I'm sorry for that.

On Tue 09-05-23 21:09:30, Jan Kara wrote:
> On Tue 09-05-23 14:20:15, Eric Whitney wrote:
> > I'm seeing two test regressions on 6.4-rc1 while running the adv test case
> > with kvm-xfstests. Both tests fail with 100% reliability in 100 trial runs,
> > and the failures appear to depend solely upon the fast commit mount option.
> >
> > The first is generic/065, where the relevant info from 065.full is:
> >
> > _check_generic_filesystem: filesystem on /dev/vdc is inconsistent
> > *** fsck.ext4 output ***
> > fsck from util-linux 2.36.1
> > e2fsck 1.47.0 (5-Feb-2023)
> > Pass 1: Checking inodes, blocks, and sizes
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > Directories count wrong for group #16 (4294967293, counted=0).
> >
> >
> > The second is generic/535, where the test output is:
> >
> > QA output created by 535
> > Silence is golden
> > +Before: 755
> > +After : 777
> >
> > Both test failures bisect to: e360c6ed7274 ("ext4: Drop special handling of
> > journalled data from ext4_sync_file()"). Reverting this patch eliminates the
> > test failures. So, I thought I'd bring these to your attention.
>
> Thanks for report! Yeah, when doing commit e360c6ed7274 I forgot about
> directories which can be also fsynced and which need special treatment. I
> have to think a bit what's the best way to fix this.

After digging a bit in the code I understand now what has confused me. The
thing is that fast commit does not track metadata changes on directories, but
neither does it mark the filesystem as ineligible when they happen. So
ext4_fc_commit() implicitly relies on the fact that it never gets called
in any case other than fsync(2) on a regular file.

I believe we should improve the fast commit code to handle directories better,
or at least not have these implicit assumptions, but for now the easiest fix
is to bring back the explicit full commit for non-regular files. I'll send
a patch.
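
To illustrate the decision described above (full commit for anything that is not a regular file, fast commit otherwise), here is a minimal userspace C sketch. It is not the actual patch and not kernel code: choose_commit(), enum commit_kind and the fast_commit_enabled flag are hypothetical names that only model the choice, using the standard S_ISREG() macro on the inode mode.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFREG/S_IFDIR in strict modes */
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical names: they model the decision only, not kernel APIs. */
enum commit_kind { COMMIT_FAST, COMMIT_FULL };

/*
 * Pick the commit type for an fsync(2) on an inode with the given mode.
 * fast_commit_enabled stands in for the fast commit mount option.
 */
enum commit_kind choose_commit(mode_t mode, bool fast_commit_enabled)
{
    /* Without the mount option there is nothing to decide. */
    if (!fast_commit_enabled)
        return COMMIT_FULL;

    /* Directories and other non-regular files: explicit full commit,
     * since their metadata changes are not tracked by fast commit. */
    if (!S_ISREG(mode))
        return COMMIT_FULL;

    return COMMIT_FAST;
}

/* Usage: a regular file may take the fast path, a directory may not. */
void check_examples(void)
{
    assert(choose_commit(S_IFREG | 0644, true) == COMMIT_FAST);
    assert(choose_commit(S_IFDIR | 0755, true) == COMMIT_FULL);
    assert(choose_commit(S_IFREG | 0644, false) == COMMIT_FULL);
}
```

The point of the sketch is that the check is made explicitly at the fsync entry point, so the commit path no longer depends on only ever being called for regular files.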

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR