2023-10-04 07:56:38

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH v3 0/4] btrfs: RAID stripe tree updates

This batch of RST updates contains the on-disk format changes Qu
suggested. It drastically simplifies the write and path, especially for
RAID10.

Instead of recording all strides of a striped RAID into one stripe tree
entry, we create multiple entries per stride. This allows us to remove the
length in the stride as we can use the length from the key. Using this
method RAID10 becomes RAID1 and RAID0 becomes single from the point of
view of the stripe tree.

---
- Link to first batch: https://lore.kernel.org/r/[email protected]
- Link to second batch: https://lore.kernel.org/r/[email protected]

---
Johannes Thumshirn (4):
btrfs: change RST write
btrfs: remove stride length check on read
btrfs: remove raid stride length in tree printer
btrfs: remove stride length from on-disk format

fs/btrfs/accessors.h | 2 -
fs/btrfs/print-tree.c | 5 +-
fs/btrfs/raid-stripe-tree.c | 173 ++--------------------------------------
include/uapi/linux/btrfs_tree.h | 2 -
4 files changed, 7 insertions(+), 175 deletions(-)
---
base-commit: 8d3aed36ee6cac09c7bd6bee6ad67dc2a35615af
change-id: 20230915-rst-updates-8c55784ca4ef

Best regards,
--
Johannes Thumshirn <[email protected]>


2023-10-04 07:56:44

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH v3 2/4] btrfs: remove stride length check on read

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/raid-stripe-tree.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 248e048810d3..944e8f1862aa 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -237,16 +237,8 @@ int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
for (int i = 0; i < num_stripes; i++) {
struct btrfs_raid_stride *stride = &stripe_extent->strides[i];
u64 devid = btrfs_raid_stride_devid(leaf, stride);
- u64 len = btrfs_raid_stride_length(leaf, stride);
u64 physical = btrfs_raid_stride_physical(leaf, stride);

- if (offset >= len) {
- offset -= len;
-
- if (offset >= BTRFS_STRIPE_LEN)
- continue;
- }
-
if (devid != stripe->dev->devid)
continue;


--
2.41.0

2023-10-04 07:56:47

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH v3 3/4] btrfs: remove raid stride length in tree printer

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/print-tree.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 2ade2672ed58..7e46aa8a0444 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -215,10 +215,9 @@ static void print_raid_stripe_key(const struct extent_buffer *eb, u32 item_size,
btrfs_raid_array[encoding].raid_name : "unknown");

for (int i = 0; i < num_stripes; i++)
- pr_info("\t\t\tstride %d devid %llu physical %llu length %llu\n",
+ pr_info("\t\t\tstride %d devid %llu physical %llu\n",
i, btrfs_raid_stride_devid(eb, &stripe->strides[i]),
- btrfs_raid_stride_physical(eb, &stripe->strides[i]),
- btrfs_raid_stride_length(eb, &stripe->strides[i]));
+ btrfs_raid_stride_physical(eb, &stripe->strides[i]));
}

/*

--
2.41.0

2023-10-04 07:56:50

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH v3 4/4] btrfs: remove stride length from on-disk format

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/accessors.h | 2 --
include/uapi/linux/btrfs_tree.h | 2 --
2 files changed, 4 deletions(-)

diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index b780d9087490..aa0844535644 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -309,12 +309,10 @@ BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
BTRFS_SETGET_FUNCS(stripe_extent_encoding, struct btrfs_stripe_extent, encoding, 8);
BTRFS_SETGET_FUNCS(raid_stride_devid, struct btrfs_raid_stride, devid, 64);
BTRFS_SETGET_FUNCS(raid_stride_physical, struct btrfs_raid_stride, physical, 64);
-BTRFS_SETGET_FUNCS(raid_stride_length, struct btrfs_raid_stride, length, 64);
BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_encoding,
struct btrfs_stripe_extent, encoding, 8);
BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_devid, struct btrfs_raid_stride, devid, 64);
BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_physical, struct btrfs_raid_stride, physical, 64);
-BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_length, struct btrfs_raid_stride, length, 64);

/* struct btrfs_dev_extent */
BTRFS_SETGET_FUNCS(dev_extent_chunk_tree, struct btrfs_dev_extent, chunk_tree, 64);
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 9fafcaebf44d..c25fc9614594 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -737,8 +737,6 @@ struct btrfs_raid_stride {
__le64 devid;
/* The physical location on disk. */
__le64 physical;
- /* The length of stride on this disk. */
- __le64 length;
} __attribute__ ((__packed__));

/* The stripe_extent::encoding, 1:1 mapping of enum btrfs_raid_types. */

--
2.41.0

2023-10-04 07:57:33

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH v3 1/4] btrfs: change RST write

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/raid-stripe-tree.c | 165 ++------------------------------------------
1 file changed, 5 insertions(+), 160 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 8a38f07a3246..248e048810d3 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -75,12 +75,12 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
}

static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
- int num_stripes,
struct btrfs_io_context *bioc)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_key stripe_key;
struct btrfs_root *stripe_root = fs_info->stripe_root;
+ const int num_stripes = btrfs_bg_type_to_factor(bioc->map_type);
u8 encoding = btrfs_bg_flags_to_raid_index(bioc->map_type);
struct btrfs_stripe_extent *stripe_extent;
const size_t item_size = struct_size(stripe_extent, strides, num_stripes);
@@ -107,7 +107,6 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,

btrfs_set_stack_raid_stride_devid(raid_stride, devid);
btrfs_set_stack_raid_stride_physical(raid_stride, physical);
- btrfs_set_stack_raid_stride_length(raid_stride, length);
}

stripe_key.objectid = bioc->logical;
@@ -124,173 +123,19 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
return ret;
}

-static int btrfs_insert_mirrored_raid_extents(struct btrfs_trans_handle *trans,
- struct btrfs_ordered_extent *ordered,
- u64 map_type)
-{
- int num_stripes = btrfs_bg_type_to_factor(map_type);
- struct btrfs_io_context *bioc;
- int ret;
-
- list_for_each_entry(bioc, &ordered->bioc_list, rst_ordered_entry) {
- ret = btrfs_insert_one_raid_extent(trans, num_stripes, bioc);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
-static int btrfs_insert_striped_mirrored_raid_extents(
- struct btrfs_trans_handle *trans,
- struct btrfs_ordered_extent *ordered,
- u64 map_type)
-{
- struct btrfs_io_context *bioc;
- struct btrfs_io_context *rbioc;
- const size_t nstripes = list_count_nodes(&ordered->bioc_list);
- const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
- const int substripes = btrfs_raid_array[index].sub_stripes;
- const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices,
- substripes);
- int left = nstripes;
- int i;
- int ret = 0;
- u64 stripe_end;
- u64 prev_end;
- int stripe;
-
- if (nstripes == 1)
- return btrfs_insert_mirrored_raid_extents(trans, ordered, map_type);
-
- rbioc = kzalloc(struct_size(rbioc, stripes, nstripes * substripes), GFP_NOFS);
- if (!rbioc)
- return -ENOMEM;
-
- rbioc->map_type = map_type;
- rbioc->logical = list_first_entry(&ordered->bioc_list, typeof(*rbioc),
- rst_ordered_entry)->logical;
-
- stripe_end = rbioc->logical;
- prev_end = stripe_end;
- i = 0;
- stripe = 0;
- list_for_each_entry(bioc, &ordered->bioc_list, rst_ordered_entry) {
- rbioc->size += bioc->size;
- for (int j = 0; j < substripes; j++) {
- stripe = i + j;
- rbioc->stripes[stripe].dev = bioc->stripes[j].dev;
- rbioc->stripes[stripe].physical = bioc->stripes[j].physical;
- rbioc->stripes[stripe].length = bioc->size;
- }
-
- stripe_end += rbioc->size;
- if (i >= nstripes ||
- (stripe_end - prev_end >= max_stripes * BTRFS_STRIPE_LEN)) {
- ret = btrfs_insert_one_raid_extent(trans, stripe + 1, rbioc);
- if (ret)
- goto out;
-
- left -= stripe + 1;
- if (left <= 0)
- break;
-
- i = 0;
- rbioc->logical += rbioc->size;
- rbioc->size = 0;
- } else {
- i += substripes;
- prev_end = stripe_end;
- }
- }
-
- if (left > 0) {
- bioc = list_prev_entry(bioc, rst_ordered_entry);
- ret = btrfs_insert_one_raid_extent(trans, substripes, bioc);
- }
-
-out:
- kfree(rbioc);
- return ret;
-}
-
-static int btrfs_insert_striped_raid_extents(struct btrfs_trans_handle *trans,
- struct btrfs_ordered_extent *ordered,
- u64 map_type)
-{
- struct btrfs_io_context *bioc;
- struct btrfs_io_context *rbioc;
- const size_t nstripes = list_count_nodes(&ordered->bioc_list);
- int i;
- int ret = 0;
-
- rbioc = kzalloc(struct_size(rbioc, stripes, nstripes), GFP_NOFS);
- if (!rbioc)
- return -ENOMEM;
- rbioc->map_type = map_type;
- rbioc->logical = list_first_entry(&ordered->bioc_list, typeof(*rbioc),
- rst_ordered_entry)->logical;
-
- i = 0;
- list_for_each_entry(bioc, &ordered->bioc_list, rst_ordered_entry) {
- rbioc->size += bioc->size;
- rbioc->stripes[i].dev = bioc->stripes[0].dev;
- rbioc->stripes[i].physical = bioc->stripes[0].physical;
- rbioc->stripes[i].length = bioc->size;
-
- if (i == nstripes - 1) {
- ret = btrfs_insert_one_raid_extent(trans, nstripes, rbioc);
- if (ret)
- goto out;
-
- i = 0;
- rbioc->logical += rbioc->size;
- rbioc->size = 0;
- } else {
- i++;
- }
- }
-
- if (i && i < nstripes - 1)
- ret = btrfs_insert_one_raid_extent(trans, i, rbioc);
-
-out:
- kfree(rbioc);
- return ret;
-}
-
int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
struct btrfs_ordered_extent *ordered_extent)
{
struct btrfs_io_context *bioc;
- u64 map_type;
int ret;

if (!btrfs_fs_incompat(trans->fs_info, RAID_STRIPE_TREE))
return 0;

- map_type = list_first_entry(&ordered_extent->bioc_list, typeof(*bioc),
- rst_ordered_entry)->map_type;
-
- switch (map_type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
- case BTRFS_BLOCK_GROUP_DUP:
- case BTRFS_BLOCK_GROUP_RAID1:
- case BTRFS_BLOCK_GROUP_RAID1C3:
- case BTRFS_BLOCK_GROUP_RAID1C4:
- ret = btrfs_insert_mirrored_raid_extents(trans, ordered_extent, map_type);
- break;
- case BTRFS_BLOCK_GROUP_RAID0:
- ret = btrfs_insert_striped_raid_extents(trans, ordered_extent, map_type);
- break;
- case BTRFS_BLOCK_GROUP_RAID10:
- ret = btrfs_insert_striped_mirrored_raid_extents(trans, ordered_extent,
- map_type);
- break;
- default:
- btrfs_err(trans->fs_info, "trying to insert unknown block group profile %lld",
- map_type & BTRFS_BLOCK_GROUP_PROFILE_MASK);
- ret = -EINVAL;
- break;
+ list_for_each_entry(bioc, &ordered_extent->bioc_list, rst_ordered_entry) {
+ ret = btrfs_insert_one_raid_extent(trans, bioc);
+ if (ret)
+ return ret;
}

while (!list_empty(&ordered_extent->bioc_list)) {

--
2.41.0

2023-10-04 08:26:31

by Qu Wenruo

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] btrfs: RAID stripe tree updates



On 2023/10/4 18:26, Johannes Thumshirn wrote:
> This batch of RST updates contains the on-disk format changes Qu
> suggested. It drastically simplifies the write and path, especially for
> RAID10.
>
> Instead of recording all strides of a striped RAID into one stripe tree
> entry, we create multiple entries per stride. This allows us to remove the
> length in the stride as we can use the length from the key. Using this
> method RAID10 becomes RAID1 and RAID0 becomes single from the point of
> view of the stripe tree.

Great the idea can simplify the code.
So I'm very glad I can provide some help on RST.

Although one concern is about the compatibility, but I guess since rst
is still covered under experimental flags for progs, we can more or less
ignore the compatibility for now?

The other concern is, how would those patches be merged, would David
just fold them, and we can check the misc-next, or there would be
another branch for us to view the code?

Thanks,
Qu
>
> ---
> - Link to first batch: https://lore.kernel.org/r/[email protected]
> - Link to second batch: https://lore.kernel.org/r/[email protected]
>
> ---
> Johannes Thumshirn (4):
> btrfs: change RST write
> btrfs: remove stride length check on read
> btrfs: remove raid stride length in tree printer
> btrfs: remove stride length from on-disk format
>
> fs/btrfs/accessors.h | 2 -
> fs/btrfs/print-tree.c | 5 +-
> fs/btrfs/raid-stripe-tree.c | 173 ++--------------------------------------
> include/uapi/linux/btrfs_tree.h | 2 -
> 4 files changed, 7 insertions(+), 175 deletions(-)
> ---
> base-commit: 8d3aed36ee6cac09c7bd6bee6ad67dc2a35615af
> change-id: 20230915-rst-updates-8c55784ca4ef
>
> Best regards,

2023-10-04 08:37:19

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] btrfs: RAID stripe tree updates

On 04.10.23 10:26, Qu Wenruo wrote:
>
>
> On 2023/10/4 18:26, Johannes Thumshirn wrote:
>> This batch of RST updates contains the on-disk format changes Qu
>> suggested. It drastically simplifies the write and path, especially for
>> RAID10.
>>
>> Instead of recording all strides of a striped RAID into one stripe tree
>> entry, we create multiple entries per stride. This allows us to remove the
>> length in the stride as we can use the length from the key. Using this
>> method RAID10 becomes RAID1 and RAID0 becomes single from the point of
>> view of the stripe tree.
>
> Great the idea can simplify the code.
> So I'm very glad I can provide some help on RST.
>
> Although one concern is about the compatibility, but I guess since rst
> is still covered under experimental flags for progs, we can more or less
> ignore the compatibility for now?

They're only in misc-next by now so I hope we can.

> The other concern is, how would those patches be merged, would David
> just fold them, and we can check the misc-next, or there would be
> another branch for us to view the code?

I was under the impression, David is going to fold them in.

2023-10-04 12:43:52

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] btrfs: change RST write

Fixes: 33f7a25346e3 ("btrfs: add support for inserting raid stripe extents")

2023-10-04 12:44:15

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] btrfs: remove stride length check on read

Fixes: 2d606e264676 ("btrfs: lookup physical address from stripe extent")

2023-10-04 12:44:53

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] btrfs: remove raid stride length in tree printer

Fixes: 380bf60956f4 ("btrfs: add raid stripe tree pretty printer")

2023-10-04 12:45:15

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] btrfs: remove stride length from on-disk format

Fixes: cc85b2cdbf32 ("btrfs: add raid stripe tree definitions")

2023-10-04 13:22:06

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] btrfs: RAID stripe tree updates

On Wed, Oct 04, 2023 at 12:56:15AM -0700, Johannes Thumshirn wrote:
> This batch of RST updates contains the on-disk format changes Qu
> suggested. It drastically simplifies the write and path, especially for
> RAID10.
>
> Instead of recording all strides of a striped RAID into one stripe tree
> entry, we create multiple entries per stride. This allows us to remove the
> length in the stride as we can use the length from the key. Using this
> method RAID10 becomes RAID1 and RAID0 becomes single from the point of
> view of the stripe tree.
>
> ---
> - Link to first batch: https://lore.kernel.org/r/[email protected]
> - Link to second batch: https://lore.kernel.org/r/[email protected]
>
> ---
> Johannes Thumshirn (4):
> btrfs: change RST write
> btrfs: remove stride length check on read
> btrfs: remove raid stride length in tree printer
> btrfs: remove stride length from on-disk format

Folded to the commits, thanks.