2024-06-10 16:25:24

by Ben Hutchings

Subject: Re: linux: ext4 corruption with symlinks

On Sun, 5 Nov 2023 16:12:41 +0000 Hervé Werner <[email protected]>
wrote:
> Hello
>
> I'm sorry for the delay.
>
> > Are you able to reliably reproduce the issue and can bisect it to
> > the introducing commit?
> I faced this issue on real data but I struggled to find a reliable
> scenario to reproduce it. Here is what I just came up with:
>   sudo mkfs -t ext4 -O fast_commit,inline_data /dev/sdb
>   sudo mount /dev/sdb /mnt/
>   sudo install -d -o myuser /mnt/annex
>   cd /mnt/annex
>   git init && git annex init
>   for i in {1..2}; do
>     for i in {1..10000}; do
>       dd if=/dev/urandom of=file-${i} bs=1K count=1 2>/dev/null
>     done
>     git annex add -J cpus . >/dev/null && git annex sync -J cpus && git annex fsck -J cpus >/dev/null
>     git rm * && git annex sync  && git annex dropunused all
>   done
>
> Then at some point the following error appears:
>   EXT4-fs error (device sdb): ext4_map_blocks:577: inode #3942343: block 4: comm git-annex:w: lblock 1 mapped to illegal pblock 4 (length 1)
[...]

I can also reproduce this error message using the above script and:

- Linux 6.10-rc2
- A 2 GiB loopback device instead of /dev/sdb
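
The loopback setup itself is not shown in this message. One way to do it, as a minimal sketch: the image name ext4-test.img and the helper attach_and_mount are hypothetical, and the mkfs options and mount point are taken from the script quoted above.

```shell
#!/bin/sh
# Hypothetical image file name; not taken from the thread.
img=ext4-test.img

# Create a sparse file with an apparent size of 2 GiB.
truncate -s 2G "$img"

# Hypothetical helper: attach the image to the first free loop device,
# format it with the same features as the reproducer, and mount it.
# Needs root, so it is only defined here and not called.
attach_and_mount() {
    loopdev=$(sudo losetup --find --show "$img") || return 1
    sudo mkfs -t ext4 -O fast_commit,inline_data "$loopdev" &&
    sudo mount "$loopdev" /mnt/
}
```

After calling attach_and_mount, the rest of the quoted script (from `sudo install -d` onwards) can presumably be run unchanged.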

I bisected this back to:

commit 9725958bb75cdfa10f2ec11526fdb23e7485e8e4
Author: Xin Yin <[email protected]>
Date: Thu Dec 23 11:23:37 2021 +0800

ext4: fast commit may miss tracking unwritten range during ftruncate

It is still possible to cleanly revert that commit from 6.10-rc2, and
doing so removes the error message.

Ben.

--
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.



2024-06-14 16:19:05

by Luis Henriques

Subject: Re: linux: ext4 corruption with symlinks

On Mon 10 Jun 2024 06:03:58 PM +02, Ben Hutchings wrote:

> On Sun, 5 Nov 2023 16:12:41 +0000 Hervé Werner <[email protected]>
> wrote:
>> Hello
>>
>> I'm sorry for the delay.
>>
>> > Are you able to reliably reproduce the issue and can bisect it to
>> > the introducing commit?
>> I faced this issue on real data but I struggled to find a reliable
>> scenario to reproduce it. Here is what I just came up with:
>>   sudo mkfs -t ext4 -O fast_commit,inline_data /dev/sdb
>>   sudo mount /dev/sdb /mnt/
>>   sudo install -d -o myuser /mnt/annex
>>   cd /mnt/annex
>>   git init && git annex init
>>   for i in {1..2}; do
>>     for i in {1..10000}; do
>>       dd if=/dev/urandom of=file-${i} bs=1K count=1 2>/dev/null
>>     done
>>     git annex add -J cpus . >/dev/null && git annex sync -J cpus && git annex fsck -J cpus >/dev/null
>>     git rm * && git annex sync  && git annex dropunused all
>>   done
>>
>> Then at some point the following error appears:
>>   EXT4-fs error (device sdb): ext4_map_blocks:577: inode #3942343: block 4: comm git-annex:w: lblock 1 mapped to illegal pblock 4 (length 1)
> [...]
>
> I can also reproduce this error message using the above script and:
>
> - Linux 6.10-rc2
> - A 2 GiB loopback device instead of /dev/sdb
>
> I bisected this back to:
>
> commit 9725958bb75cdfa10f2ec11526fdb23e7485e8e4
> Author: Xin Yin <[email protected]>
> Date: Thu Dec 23 11:23:37 2021 +0800
>
> ext4: fast commit may miss tracking unwritten range during ftruncate
>
> It is still possible to cleanly revert that commit from 6.10-rc2, and
> doing so removes the error message.

Because I recently fixed an issue in the fast commit code[1] I was hoping
that you were hitting the same bug. I've executed the reproducer with the
fix (which hasn't been merged yet) and realised it's definitely a
different problem.

I debugged the issue a bit, and it seems to be related to the fact that
ext4_fc_write_inode_data() isn't able to cope with
'ei->i_fc_lblk_len' being set to EXT_MAX_BLOCKS.

I'm CC'ing Harshad, maybe he has some idea.

[1] https://lore.kernel.org/all/[email protected]

Cheers,
--
Luís