2023-11-29 06:15:22

by Daniel Dawson

[permalink] [raw]
Subject: [inline_data] ext4: Stale flags before sync when convert to non-inline

When a file is converted from inline to non-inline, it has stale flags
until sync.

If a file with inline data is written to such that it must become
non-inline, it temporarily appears to have the inline data flag and not
(if applicable) the extent flag. This is corrected on sync, but can
cause problems in certain situations.

Details:

All that is needed to show this behavior is the following command:

$ rm -r test-file; dd if=/dev/zero of=test-file bs=64 count=3
status=none; lsattr test-file

Assuming extents are in use, this should show

--------------e------- test-file

but instead shows

------------------N--- test-file

until test-file is synced. Despite this, the file is already non-inline
and is treated as such for most purposes.

Why is this a problem? Because some code will fail under such a
condition, for example, lseek(..., SEEK_HOLE) will result in ENOENT. I
ran into this with Gentoo's Portage, which uses the call to handle
sparse files when copying. Sometimes, an ebuild creates a temporary file
that is quickly copied, and apparently the temporary is written in small
increments, triggering this.

Here is a small program that reproduces the SEEK_HOLE problem (pass it
the pathname of a file to create):
https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a

Tested with kernel: 6.7.0-rc3 (also 6.6 series)
/proc/version: Linux version 6.7.0-rc3 ([email protected]) (gcc
(Gentoo 13.2.1_p20231014 p8) 13.2.1 20231014, GNU ld (Gentoo 2.41 p2)
2.41.0) #4 SMP PREEMPT_DYNAMIC Tue Nov 28 20:09:05 PST 2023
Operating System: Gentoo Linux
uname -mi: x86_64 GenuineIntel
.config: https://gist.github.com/ddawson/2f2e60c6e44a62047d7b7d99c7ce5632
dmesg output:
https://gist.github.com/ddawson/026ea63f099ee3e0c301f522dff00764

--
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A



2023-11-29 20:06:00

by Daniel Dawson

[permalink] [raw]
Subject: Re: [inline_data] ext4: Stale flags before sync when convert to non-inline

On 11/28/23 10:15 PM, Daniel Dawson wrote:
> $ rm -r test-file; dd if=/dev/zero of=test-file bs=64 count=3
> status=none; lsattr test-file

My apologies. That should be "rm -f" at the start.

--
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


2023-11-30 04:07:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [inline_data] ext4: Stale flags before sync when convert to non-inline

On Tue, Nov 28, 2023 at 10:15:09PM -0800, Daniel Dawson wrote:
> When a file is converted from inline to non-inline, it has stale flags until
> sync.
>
> If a file with inline data is written to such that it must become
> non-inline, it temporarily appears to have the inline data flag and not (if
> applicable) the extent flag. This is corrected on sync, but can cause
> problems in certain situations.

The issue here is delayed allocation. When you write to a file with
delayed allocation enabled, the file system doesn't decide where the
data will be written on the storage device until the last possible
minute, when writeback occurs. This can be triggered by a fsync(2) or
sync(2) system call,

> Why is this a problem? Because some code will fail under such a condition,
> for example, lseek(..., SEEK_HOLE) will result in ENOENT. I ran into this
> with Gentoo's Portage, which uses the call to handle sparse files when
> copying. Sometimes, an ebuild creates a temporary file that is quickly
> copied, and apparently the temporary is written in small increments,
> triggering this.

This is caused by missing logic in ext4_iomap_begin_report(). For the
non-inline case, this function does this:

ret = ext4_map_blocks(NULL, inode, &map, 0);
if (ret < 0)
return ret;
if (ret == 0)
delalloc = ext4_iomap_is_delalloc(inode, &map);

For a non-inline file, if you write some data blocks that hasn't been
written back due to delayed allocation, ext4_map_blocks() won't be
able to map the logical block to a physical block. This logic is
missing in the inline_data case:

if (ext4_has_inline_data(inode)) {
ret = ext4_inline_data_iomap(inode, iomap);
if (ret != -EAGAIN) {
if (ret == 0 && offset >= iomap->length)
ret = -ENOENT;
return ret;
}
}

What's missing is a call to ext4_iomap_is_delalloc() in the case where
ret == 0, and then setting the delayed allocation flag:

if (delalloc && iomap->type == IOMAP_HOLE)
iomap->type = IOMAP_DELALLOC;

This will deal with the combination of inline_data and delayed
allocation for SEEK_HOLE and for FIEMAP.

I'll need to turn this into an actual patch and then create a test to
validate the patch but I'm pretty sure this should deal with the
problem you've come across.

Cheers,

- Ted

2023-12-28 09:29:21

by Daniel Dawson

[permalink] [raw]
Subject: Re: [inline_data] ext4: Stale flags before sync when convert to non-inline

On 11/29/23 8:06 PM, Theodore Ts'o wrote:
> I'll need to turn this into an actual patch and then create a test to
> validate the patch but I'm pretty sure this should deal with the
> problem you've come across.
Is there anything new on this (apologies if I managed to miss it)? Let
me know if there's anything I might do to help, like maybe testing.

--
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


2024-01-23 06:06:34

by Daniel Dawson

[permalink] [raw]
Subject: Re: [inline_data] ext4: Stale flags before sync when convert to non-inline

On 11/28/23 10:15 PM, Daniel Dawson wrote:
> When a file is converted from inline to non-inline, it has stale flags
> until sync.

> Why is this a problem? Because some code will fail under such a
> condition, for example, lseek(..., SEEK_HOLE) will result in ENOENT.


Just tested. Still happening on 6.8-rc1.

--
PGP fingerprint: 5BBD5080FEB0EF7F142F8173D572B791F7B4422A


2024-01-24 17:13:31

by Luis Henriques

[permalink] [raw]
Subject: Re: [inline_data] ext4: Stale flags before sync when convert to non-inline

Daniel Dawson <[email protected]> writes:

> On 11/28/23 10:15 PM, Daniel Dawson wrote:
>> When a file is converted from inline to non-inline, it has stale flags until
>> sync.
>
>> Why is this a problem? Because some code will fail under such a condition, for
>> example, lseek(..., SEEK_HOLE) will result in ENOENT.
>
>
> Just tested. Still happening on 6.8-rc1.

FWIW, I've been looking into a similar issue related with inline-data and
delayed allocation. It may however be quite different because it seems to
add small block sizes into the mix:

https://bugzilla.kernel.org/show_bug.cgi?id=200681

Unfortunately, I'm still trying to find my way around all this code and I
can't say I fully understand the whole flow using the reproducer provided
in that bugzilla.

Bellow, I'm inlining a patch that started as debug patch that I've used to
try to understand what was going on. It seems to workaround that bug, but
I know it's not a real fix -- I don't yet understand what's going on.

Regarding your specific usecase, I can reproduce it and, unfortunately, I
don't thing Ted's suggestion will fix it as I don't even see
ext4_iomap_begin_report() being executed at all. Anyway, just my 2
cents... let's see if I can come up with something.

Cheers,
--
Luís

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index d5bd1e3a5d36..d0c3d6fd48de 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -528,7 +528,19 @@ int ext4_readpage_inline(struct inode *inode, struct folio *folio)
if (!folio->index)
ret = ext4_read_inline_folio(inode, folio);
else if (!folio_test_uptodate(folio)) {
- folio_zero_segment(folio, 0, folio_size(folio));
+ struct buffer_head *bh, *head;
+ size_t start = 0;
+
+ head = folio_buffers(folio);
+ if (head) {
+ bh = head;
+ do {
+ if (!buffer_uptodate(bh))
+ break;
+ start += inode->i_sb->s_blocksize;
+ } while ((bh = bh->b_this_page) != head);
+ }
+ folio_zero_segment(folio, start, folio_size(folio));
folio_mark_uptodate(folio);
}

2024-01-29 11:17:31

by Luis Henriques

[permalink] [raw]
Subject: Re: [inline_data] ext4: Stale flags before sync when convert to non-inline

Daniel Dawson <[email protected]> writes:

> I didn't see your message until now. Sorry.
>
> On 1/24/24 9:13 AM, Luis Henriques wrote:
>> Bellow, I'm inlining a patch that started as debug patch that I've used to
>> try to understand what was going on. It seems to workaround that bug, but
>> I know it's not a real fix -- I don't yet understand what's going on.
>
> Thanks for this. I'm not sure if you meant to say you think it works around the
> present issue. I just tested it, and it does not. In case you missed the start

I was referring to the bugzilla bug [1] I've been looking into recently.
My patch was a debug patch for that bug, but it definitely does not fix
the issue you're reporting. Sorry for mixing up everything and confusing
everyone ;-)

[1] https://bugzilla.kernel.org/show_bug.cgi?id=200681

> of the thread, here is the test I gave for triggering the issue:
>
> $ rm -f test-file; dd if=/dev/zero of=test-file bs=64 count=3 status=none;
> lsattr test-file
>
> Instead of writing the file all at once, it splits it into 3 writes, where the
> first is small enough to make the file inline, and then it becomes
> non-inline. Ideally, the output should be
>
> --------------e------- test-file
>
> but delayed allocation means it instead shows
>
> ------------------N--- test-file
>
> until sync. I also gave this code for testing SEEK_HOLE:
>
> https://gist.github.com/ddawson/22cfd4cac32916f6f1dcc86f90eed21a
>
>> Regarding your specific usecase, I can reproduce it and, unfortunately, I
>> don't thing Ted's suggestion will fix it as I don't even see
>> ext4_iomap_begin_report() being executed at all.
>
> To be clear, that function is called in a few specific circumstances, such as
> when lseek() is called with SEEK_HOLE or SEEK_DATA, or with FIEMAP. When I
> traced the kernel myself, I did see it being executed from the lseek() call. The
> changes are to address the file not yet being converted from inline, where the
> contents are still written where the map would otherwise be. If you treat it as
> the map, you get nonsense. Something else needs to be done.
>
> I'm not clear on whether his proposed changes would then allow an application to
> function properly under such a condition, but it should at least *not* give
> ENOENT.
>
> After testing what I think are the changes he proposed, I find it doesn't
> work. If I remove the "&& iomap->type == IOMAP_HOLE", lseek() no longer gives an
> error, but instead returns 0, which I'm pretty sure won't work for the affected
> use case. Either way, I'm not sure I interpreted his description of the changes
> correctly.

Sure, I can understand under which circumstances ext4_iomap_begin_report()
can be executed. What I meant was that, for your very simple test case
(using 'dd' and 'lsattr'), this function will not be executed and thus the
suggested fix will still show the 'N' in the file attributes.

Anyway, thanks a lot for the extra information your providing here. I'll
try to find some time in the next few days to keep debugging the issue and
hopefully get some more useful info (and, who knows! a patch).

Cheers,
--
Luís