在 2024/5/3 18:38, Anand Jain 写道:
> This patch, along with the dependent patches below, adds support for
> ext4 unwritten file extents as preallocated file extent in btrfs.
>
> btrfs-progs: convert: refactor ext2_create_file_extents add argument ext2_inode
> btrfs-progs: convert: struct blk_iterate_data, add ext2-specific file inode pointers
> btrfs-progs: convert: refactor __btrfs_record_file_extent to add a prealloc flag
>
> The patch is developed with POV of portability with the current
> e2fsprogs library.
>
> Testcase:
>
> $ dd if=/dev/urandom of=/mnt/test/foo bs=4K count=1 conv=fsync status=none
> $ dd if=/dev/urandom of=/mnt/test/foo bs=4K count=2 conv=fsync seek=1 status=none
> $ xfs_io -f -c 'falloc -k 12K 12K' /mnt/test/foo
> $ dd if=/dev/zero of=/mnt/test/foo bs=4K count=1 conv=fsync seek=6 status=none
>
> $ filefrag -v /mnt/test/foo
> Filesystem type is: ef53
> File size of /mnt/test/foo is 28672 (7 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 0: 33280.. 33280: 1:
> 1: 1.. 2: 33792.. 33793: 2: 33281:
> 2: 3.. 5: 33281.. 33283: 3: 33794: unwritten
> 3: 6.. 6: 33794.. 33794: 1: 33284: last,eof
>
> $ sha256sum /mnt/test/foo
> 18619b678a5c207a971a0aa931604f48162e307c57ecdec450d5f095fe9f32c7 /mnt/test/foo
>
> Convert and compare the checksum
>
> Before:
>
> $ filefrag -v /mnt/test/foo
> Filesystem type is: 9123683e
> File size of /mnt/test/foo is 28672 (7 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 0: 33280.. 33280: 1: shared
> 1: 1.. 2: 33792.. 33793: 2: 33281: shared
> 2: 3.. 6: 33281.. 33284: 4: 33794: last,shared,eof
> /mnt/test/foo: 3 extents found
>
> $ sha256sum /mnt/test/foo
> 6874a1733e5785682210d69c07f256f684cf5433c7235ed29848b4a4d52030e0 /mnt/test/foo
>
> After:
>
> $ filefrag -v /mnt/test/foo
> Filesystem type is: 9123683e
> File size of /mnt/test/foo is 28672 (7 blocks of 4096 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 0: 33280.. 33280: 1: shared
> 1: 1.. 2: 33792.. 33793: 2: 33281: shared
> 2: 3.. 5: 33281.. 33283: 3: 33794: unwritten,shared
> 3: 6.. 6: 33794.. 33794: 1: 33284: last,shared,eof
> /mnt/test/foo: 4 extents found
>
> $ sha256sum /mnt/test/foo
> 18619b678a5c207a971a0aa931604f48162e307c57ecdec450d5f095fe9f32c7 /mnt/test/foo
>
> Signed-off-by: Anand Jain <[email protected]>
> ---
> RFC: Limited tested. Is there a ready file or test case available to
> exercise the feature?
>
> convert/source-fs.c | 49 ++++++++++++++++++++++++++++++++++++++++++++-
> convert/source-fs.h | 1 +
> 2 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/convert/source-fs.c b/convert/source-fs.c
> index 9039b0e66758..647ea1f29060 100644
> --- a/convert/source-fs.c
> +++ b/convert/source-fs.c
> @@ -239,6 +239,45 @@ fail:
> return ret;
> }
>
> +int find_prealloc(struct blk_iterate_data *data, int index, bool *prealloc)
This function is called for every file extent we're going to create.
I'm not a big fan of doing so many lookup.
My question is, is this the only way to determine the flag of the data
extent?
My instinct says there should be a straight forward way to determine if
a file extent is preallocated or not, just like what we do in our file
extent items.
Thus during the ext2fs_block_iterate2(), there should be some way to
tell if a block is preallocated or not.
Thus adding ext4 ML to get some feedback.
Thanks,
Qu
> +{
> + ext2_extent_handle_t handle;
> + struct ext2fs_extent extent;
> +
> + if (ext2fs_extent_open2(data->ext2_fs, data->ext2_ino,
> + data->ext2_inode, &handle)) {
> + error("ext2fs_extent_open2 failed, inode %d", data->ext2_ino);
> + return -EINVAL;
> + }
> +
> + if (ext2fs_extent_goto2(handle, 0, index)) {
> + error("ext2fs_extent_goto2 failed, inode %d num_blocks %llu",
> + data->ext2_ino, data->num_blocks);
> + ext2fs_extent_free(handle);
> + return -EINVAL;
> + }
> +
> + memset(&extent, 0, sizeof(struct ext2fs_extent));
> + if (ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, &extent)) {
> + error("ext2fs_extent_get EXT2_EXTENT_CURRENT failed inode %d",
> + data->ext2_ino);
> + ext2fs_extent_free(handle);
> + return -EINVAL;
> + }
> +
> + if (extent.e_pblk != data->disk_block) {
> + error("inode %d index %d found wrong extent e_pblk %llu wanted disk_block %llu",
> + data->ext2_ino, index, extent.e_pblk, data->disk_block);
> + ext2fs_extent_free(handle);
> + return -EINVAL;
> + }
> +
> + if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT)
> + *prealloc = true;
> +
> + return 0;
> +}
> +
> /*
> * Record a file extent in original filesystem into btrfs one.
> * The special point is, old disk_block can point to a reserved range.
> @@ -257,6 +296,7 @@ int record_file_blocks(struct blk_iterate_data *data,
> u64 old_disk_bytenr = disk_block * sectorsize;
> u64 num_bytes = num_blocks * sectorsize;
> u64 cur_off = old_disk_bytenr;
> + int index = data->first_block;
>
> /* Hole, pass it to record_file_extent directly */
> if (old_disk_bytenr == 0)
> @@ -276,6 +316,12 @@ int record_file_blocks(struct blk_iterate_data *data,
> u64 extent_num_bytes;
> u64 real_disk_bytenr;
> u64 cur_len;
> + bool prealloc = false;
> +
> + if (find_prealloc(data, index, &prealloc)) {
> + data->errcode = -1;
> + return -EINVAL;
> + }
>
> key.objectid = data->convert_ino;
> key.type = BTRFS_EXTENT_DATA_KEY;
> @@ -317,11 +363,12 @@ int record_file_blocks(struct blk_iterate_data *data,
> ret = btrfs_record_file_extent(data->trans, data->root,
> data->objectid, data->inode, file_pos,
> real_disk_bytenr, cur_len,
> - false);
> + prealloc);
> if (ret < 0)
> break;
> cur_off += cur_len;
> file_pos += cur_len;
> + index++;
>
> /*
> * No need to care about csum
> diff --git a/convert/source-fs.h b/convert/source-fs.h
> index 0e71e79eddcc..3922c444de10 100644
> --- a/convert/source-fs.h
> +++ b/convert/source-fs.h
> @@ -156,5 +156,6 @@ int read_disk_extent(struct btrfs_root *root, u64 bytenr,
> u32 num_bytes, char *buffer);
> int record_file_blocks(struct blk_iterate_data *data,
> u64 file_block, u64 disk_block, u64 num_blocks);
> +int find_prealloc(struct blk_iterate_data *data, int index, bool *prealloc);
>
> #endif
On 5/3/24 17:37, Qu Wenruo wrote:
>
>
> 在 2024/5/3 18:38, Anand Jain 写道:
>> This patch, along with the dependent patches below, adds support for
>> ext4 unwritten file extents as preallocated file extent in btrfs.
>>
>> btrfs-progs: convert: refactor ext2_create_file_extents add argument
>> ext2_inode
>> btrfs-progs: convert: struct blk_iterate_data, add ext2-specific
>> file inode pointers
>> btrfs-progs: convert: refactor __btrfs_record_file_extent to add a
>> prealloc flag
>>
>> The patch is developed with POV of portability with the current
>> e2fsprogs library.
>>
>> Testcase:
>>
>> $ dd if=/dev/urandom of=/mnt/test/foo bs=4K count=1 conv=fsync
>> status=none
>> $ dd if=/dev/urandom of=/mnt/test/foo bs=4K count=2 conv=fsync
>> seek=1 status=none
>> $ xfs_io -f -c 'falloc -k 12K 12K' /mnt/test/foo
>> $ dd if=/dev/zero of=/mnt/test/foo bs=4K count=1 conv=fsync
>> seek=6 status=none
>>
>> $ filefrag -v /mnt/test/foo
>> Filesystem type is: ef53
>> File size of /mnt/test/foo is 28672 (7 blocks of 4096 bytes)
>> ext: logical_offset: physical_offset: length:
>> expected: flags:
>> 0: 0.. 0: 33280.. 33280: 1:
>> 1: 1.. 2: 33792.. 33793: 2: 33281:
>> 2: 3.. 5: 33281.. 33283: 3:
>> 33794: unwritten
>> 3: 6.. 6: 33794.. 33794: 1:
>> 33284: last,eof
>>
>> $ sha256sum /mnt/test/foo
>>
>> 18619b678a5c207a971a0aa931604f48162e307c57ecdec450d5f095fe9f32c7
>> /mnt/test/foo
>>
>> Convert and compare the checksum
>>
>> Before:
>>
>> $ filefrag -v /mnt/test/foo
>> Filesystem type is: 9123683e
>> File size of /mnt/test/foo is 28672 (7 blocks of 4096 bytes)
>> ext: logical_offset: physical_offset: length:
>> expected: flags:
>> 0: 0.. 0: 33280.. 33280:
>> 1: shared
>> 1: 1.. 2: 33792.. 33793: 2:
>> 33281: shared
>> 2: 3.. 6: 33281.. 33284: 4:
>> 33794: last,shared,eof
>> /mnt/test/foo: 3 extents found
>>
>> $ sha256sum /mnt/test/foo
>>
>> 6874a1733e5785682210d69c07f256f684cf5433c7235ed29848b4a4d52030e0
>> /mnt/test/foo
>>
>> After:
>>
>> $ filefrag -v /mnt/test/foo
>> Filesystem type is: 9123683e
>> File size of /mnt/test/foo is 28672 (7 blocks of 4096 bytes)
>> ext: logical_offset: physical_offset: length:
>> expected: flags:
>> 0: 0.. 0: 33280.. 33280:
>> 1: shared
>> 1: 1.. 2: 33792.. 33793: 2:
>> 33281: shared
>> 2: 3.. 5: 33281.. 33283: 3:
>> 33794: unwritten,shared
>> 3: 6.. 6: 33794.. 33794: 1:
>> 33284: last,shared,eof
>> /mnt/test/foo: 4 extents found
>>
>> $ sha256sum /mnt/test/foo
>>
>> 18619b678a5c207a971a0aa931604f48162e307c57ecdec450d5f095fe9f32c7
>> /mnt/test/foo
>>
>> Signed-off-by: Anand Jain <[email protected]>
>> ---
>> RFC: Limited tested. Is there a ready file or test case available to
>> exercise the feature?
>>
>> convert/source-fs.c | 49 ++++++++++++++++++++++++++++++++++++++++++++-
>> convert/source-fs.h | 1 +
>> 2 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/convert/source-fs.c b/convert/source-fs.c
>> index 9039b0e66758..647ea1f29060 100644
>> --- a/convert/source-fs.c
>> +++ b/convert/source-fs.c
>> @@ -239,6 +239,45 @@ fail:
>> return ret;
>> }
>> +int find_prealloc(struct blk_iterate_data *data, int index, bool
>> *prealloc)
>
> This function is called for every file extent we're going to create.
> I'm not a big fan of doing so many lookup.
>
> My question is, is this the only way to determine the flag of the data
> extent?
>
> My instinct says there should be a straight forward way to determine if
> a file extent is preallocated or not, just like what we do in our file
> extent items.
> Thus during the ext2fs_block_iterate2(), there should be some way to
> tell if a block is preallocated or not.
Unfortunately, the callback doesn't provide the extent flags. Unless, I
miss something?
Thx, Anand
>
> Thus adding ext4 ML to get some feedback.
>
> Thanks,
> Qu
>
>> +{
>> + ext2_extent_handle_t handle;
>> + struct ext2fs_extent extent;
>> +
>> + if (ext2fs_extent_open2(data->ext2_fs, data->ext2_ino,
>> + data->ext2_inode, &handle)) {
>> + error("ext2fs_extent_open2 failed, inode %d", data->ext2_ino);
>> + return -EINVAL;
>> + }
>> +
>> + if (ext2fs_extent_goto2(handle, 0, index)) {
>> + error("ext2fs_extent_goto2 failed, inode %d num_blocks %llu",
>> + data->ext2_ino, data->num_blocks);
>> + ext2fs_extent_free(handle);
>> + return -EINVAL;
>> + }
>> +
>> + memset(&extent, 0, sizeof(struct ext2fs_extent));
>> + if (ext2fs_extent_get(handle, EXT2_EXTENT_CURRENT, &extent)) {
>> + error("ext2fs_extent_get EXT2_EXTENT_CURRENT failed inode %d",
>> + data->ext2_ino);
>> + ext2fs_extent_free(handle);
>> + return -EINVAL;
>> + }
>> +
>> + if (extent.e_pblk != data->disk_block) {
>> + error("inode %d index %d found wrong extent e_pblk %llu wanted
>> disk_block %llu",
>> + data->ext2_ino, index, extent.e_pblk, data->disk_block);
>> + ext2fs_extent_free(handle);
>> + return -EINVAL;
>> + }
>> +
>> + if (extent.e_flags & EXT2_EXTENT_FLAGS_UNINIT)
>> + *prealloc = true;
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * Record a file extent in original filesystem into btrfs one.
>> * The special point is, old disk_block can point to a reserved range.
>> @@ -257,6 +296,7 @@ int record_file_blocks(struct blk_iterate_data *data,
>> u64 old_disk_bytenr = disk_block * sectorsize;
>> u64 num_bytes = num_blocks * sectorsize;
>> u64 cur_off = old_disk_bytenr;
>> + int index = data->first_block;
>> /* Hole, pass it to record_file_extent directly */
>> if (old_disk_bytenr == 0)
>> @@ -276,6 +316,12 @@ int record_file_blocks(struct blk_iterate_data
>> *data,
>> u64 extent_num_bytes;
>> u64 real_disk_bytenr;
>> u64 cur_len;
>> + bool prealloc = false;
>> +
>> + if (find_prealloc(data, index, &prealloc)) {
>> + data->errcode = -1;
>> + return -EINVAL;
>> + }
>> key.objectid = data->convert_ino;
>> key.type = BTRFS_EXTENT_DATA_KEY;
>> @@ -317,11 +363,12 @@ int record_file_blocks(struct blk_iterate_data
>> *data,
>> ret = btrfs_record_file_extent(data->trans, data->root,
>> data->objectid, data->inode, file_pos,
>> real_disk_bytenr, cur_len,
>> - false);
>> + prealloc);
>> if (ret < 0)
>> break;
>> cur_off += cur_len;
>> file_pos += cur_len;
>> + index++;
>> /*
>> * No need to care about csum
>> diff --git a/convert/source-fs.h b/convert/source-fs.h
>> index 0e71e79eddcc..3922c444de10 100644
>> --- a/convert/source-fs.h
>> +++ b/convert/source-fs.h
>> @@ -156,5 +156,6 @@ int read_disk_extent(struct btrfs_root *root, u64
>> bytenr,
>> u32 num_bytes, char *buffer);
>> int record_file_blocks(struct blk_iterate_data *data,
>> u64 file_block, u64 disk_block, u64 num_blocks);
>> +int find_prealloc(struct blk_iterate_data *data, int index, bool
>> *prealloc);
>> #endif
在 2024/5/3 21:55, Anand Jain 写道:
[...]
>>> +int find_prealloc(struct blk_iterate_data *data, int index, bool
>>> *prealloc)
>>
>> This function is called for every file extent we're going to create.
>> I'm not a big fan of doing so many lookup.
>>
>> My question is, is this the only way to determine the flag of the data
>> extent?
>>
>> My instinct says there should be a straight forward way to determine
>> if a file extent is preallocated or not, just like what we do in our
>> file extent items.
>
>
>> Thus during the ext2fs_block_iterate2(), there should be some way to
>> tell if a block is preallocated or not.
>
> Unfortunately, the callback doesn't provide the extent flags. Unless, I
> miss something?
You're right, the iterator interface does not provide any extra info.
And I also checked the kernel implementation, they have extra
ext4_map_blocks() to do the resolve, and then ext4_es_lookup_extent() to
determine if it's unwritten.
So I'm afraid we have to go this solution.
Meanwhile related to the implementation, can we put the prealloc flat
into blk_iterate_data?
So that we can handle different fses' preallocated extents in a more
common way.
Thanks,
Qu
On 5/4/24 06:23, Qu Wenruo wrote:
>
>
> 在 2024/5/3 21:55, Anand Jain 写道:
> [...]
>>>> +int find_prealloc(struct blk_iterate_data *data, int index, bool
>>>> *prealloc)
>>>
>>> This function is called for every file extent we're going to create.
>>> I'm not a big fan of doing so many lookup.
>>>
>>> My question is, is this the only way to determine the flag of the
>>> data extent?
>>>
>>> My instinct says there should be a straight forward way to determine
>>> if a file extent is preallocated or not, just like what we do in our
>>> file extent items.
>>
>>
>>> Thus during the ext2fs_block_iterate2(), there should be some way to
>>> tell if a block is preallocated or not.
>>
>> Unfortunately, the callback doesn't provide the extent flags. Unless,
>> I miss something?
>
> You're right, the iterator interface does not provide any extra info.
>
> And I also checked the kernel implementation, they have extra
> ext4_map_blocks() to do the resolve, and then ext4_es_lookup_extent() to
> determine if it's unwritten.
>
> So I'm afraid we have to go this solution.
>
>
> Meanwhile related to the implementation, can we put the prealloc flat
> into blk_iterate_data?
> So that we can handle different fses' preallocated extents in a more
> common way.
>
I initially thought so, but is blk_iterate_data::num_blocks always
equal to extent::e_len in all file data extent situations mixed
with hole and unwritten combinations? If not, then the flag might
not be appropriate there, as it doesn't apply to the entirety of
blk_iterate_data::num_blocks.
Thanks, Anand
> Thanks,
> Qu
在 2024/5/4 08:57, Anand Jain 写道:
>
>
> On 5/4/24 06:23, Qu Wenruo wrote:
>>
>>
>> 在 2024/5/3 21:55, Anand Jain 写道:
>> [...]
>>>>> +int find_prealloc(struct blk_iterate_data *data, int index, bool
>>>>> *prealloc)
>>>>
>>>> This function is called for every file extent we're going to create.
>>>> I'm not a big fan of doing so many lookup.
>>>>
>>>> My question is, is this the only way to determine the flag of the
>>>> data extent?
>>>>
>>>> My instinct says there should be a straight forward way to determine
>>>> if a file extent is preallocated or not, just like what we do in our
>>>> file extent items.
>>>
>>>
>>>> Thus during the ext2fs_block_iterate2(), there should be some way to
>>>> tell if a block is preallocated or not.
>>>
>>> Unfortunately, the callback doesn't provide the extent flags. Unless,
>>> I miss something?
>>
>> You're right, the iterator interface does not provide any extra info.
>>
>> And I also checked the kernel implementation, they have extra
>> ext4_map_blocks() to do the resolve, and then ext4_es_lookup_extent()
>> to determine if it's unwritten.
>>
>> So I'm afraid we have to go this solution.
>>
>>
>> Meanwhile related to the implementation, can we put the prealloc flat
>> into blk_iterate_data?
>> So that we can handle different fses' preallocated extents in a more
>> common way.
>>
>
> I initially thought so, but is blk_iterate_data::num_blocks always
> equal to extent::e_len in all file data extent situations mixed
> with hole and unwritten combinations? If not, then the flag might
> not be appropriate there, as it doesn't apply to the entirety of
> blk_iterate_data::num_blocks.
I do not think we need @num_blocks to match extent_len.
We're already doing some kind of merge inside block_iterate_proc(), and
if we find previous extent flag doesn't match the current one, we just
need to submit the previous one.
Although I also believe we need some better abstraction for the common code.
The current one doesn't explain everything well for things parameters
like disk_block/file_block.
Thanks,
Qu
>
> Thanks, Anand
>
>> Thanks,
>> Qu