2023-09-18 15:52:54

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 0/4] btrfs: RAID stripe tree updates

Here is a first batch of updates for the RAID stripe tree patchset in
for-next which address some of the review comments.

---
Johannes Thumshirn (4):
btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents
btrfs: break loop in case set_io_stripe fails
btrfs: rename ordered_enmtry in btrfs_io_context
btrfs: add tree-checker for RAID-stripe-tree

fs/btrfs/bio.c | 3 ++-
fs/btrfs/raid-stripe-tree.c | 26 +++++++++++++-------------
fs/btrfs/tree-checker.c | 42 ++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/volumes.c | 8 +++++++-
fs/btrfs/volumes.h | 2 +-
5 files changed, 65 insertions(+), 16 deletions(-)
---
base-commit: 7368638e1a1f30dbf34c141dc2355a96ca2a2932
change-id: 20230915-rst-updates-8c55784ca4ef

Best regards,
--
Johannes Thumshirn <[email protected]>


2023-09-18 15:53:30

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents

Fix modpost error due to 64bit division on 32bit systems in
btrfs_insert_striped_mirrored_raid_extents.

Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/raid-stripe-tree.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 85e8e389990f..0c0e620ed8b9 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
{
struct btrfs_io_context *bioc;
struct btrfs_io_context *rbioc;
- const int nstripes = list_count_nodes(&ordered->bioc_list);
- const int index = btrfs_bg_flags_to_raid_index(map_type);
- const int substripes = btrfs_raid_array[index].sub_stripes;
- const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
+ const size_t nstripes = list_count_nodes(&ordered->bioc_list);
+ const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
+ const u8 substripes = btrfs_raid_array[index].sub_stripes;
+ const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
int left = nstripes;
int i;
int ret = 0;

--
2.41.0

2023-09-18 16:09:48

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 3/4] btrfs: rename ordered_enmtry in btrfs_io_context

Rename the 'ordered_entry' member of 'struct btrfs_io_context' to
'rst_ordered_entry' to make it clear, it is only there for updates to the
raid-stripe-tree.

Cc: Qu Wenru <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/bio.c | 3 ++-
fs/btrfs/raid-stripe-tree.c | 18 +++++++++---------
fs/btrfs/volumes.h | 2 +-
3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 4c234024beae..d0a029c859e1 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -708,7 +708,8 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
* can't happen until after the last submission.
*/
btrfs_get_bioc(bioc);
- list_add_tail(&bioc->ordered_entry, &bbio->ordered->bioc_list);
+ list_add_tail(&bioc->rst_ordered_entry,
+ &bbio->ordered->bioc_list);
}

/*
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 0c0e620ed8b9..1bf84258dbfa 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -132,7 +132,7 @@ static int btrfs_insert_mirrored_raid_extents(struct btrfs_trans_handle *trans,
struct btrfs_io_context *bioc;
int ret;

- list_for_each_entry(bioc, &ordered->bioc_list, ordered_entry) {
+ list_for_each_entry(bioc, &ordered->bioc_list, rst_ordered_entry) {
ret = btrfs_insert_one_raid_extent(trans, num_stripes, bioc);
if (ret)
return ret;
@@ -167,12 +167,12 @@ static int btrfs_insert_striped_mirrored_raid_extents(

rbioc->map_type = map_type;
rbioc->logical = list_first_entry(&ordered->bioc_list, typeof(*rbioc),
- ordered_entry)->logical;
+ rst_ordered_entry)->logical;

stripe_end = rbioc->logical;
prev_end = stripe_end;
i = 0;
- list_for_each_entry(bioc, &ordered->bioc_list, ordered_entry) {
+ list_for_each_entry(bioc, &ordered->bioc_list, rst_ordered_entry) {
rbioc->size += bioc->size;
for (int j = 0; j < substripes; j++) {
int stripe = i + j;
@@ -201,7 +201,7 @@ static int btrfs_insert_striped_mirrored_raid_extents(
}

if (left) {
- bioc = list_prev_entry(bioc, ordered_entry);
+ bioc = list_prev_entry(bioc, rst_ordered_entry);
ret = btrfs_insert_one_raid_extent(trans, substripes, bioc);
}

@@ -225,10 +225,10 @@ static int btrfs_insert_striped_raid_extents(struct btrfs_trans_handle *trans,
return -ENOMEM;
rbioc->map_type = map_type;
rbioc->logical = list_first_entry(&ordered->bioc_list, typeof(*rbioc),
- ordered_entry)->logical;
+ rst_ordered_entry)->logical;

i = 0;
- list_for_each_entry(bioc, &ordered->bioc_list, ordered_entry) {
+ list_for_each_entry(bioc, &ordered->bioc_list, rst_ordered_entry) {
rbioc->size += bioc->size;
rbioc->stripes[i].dev = bioc->stripes[0].dev;
rbioc->stripes[i].physical = bioc->stripes[0].physical;
@@ -266,7 +266,7 @@ int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
return 0;

map_type = list_first_entry(&ordered_extent->bioc_list, typeof(*bioc),
- ordered_entry)->map_type;
+ rst_ordered_entry)->map_type;

switch (map_type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
case BTRFS_BLOCK_GROUP_DUP:
@@ -291,8 +291,8 @@ int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,

while (!list_empty(&ordered_extent->bioc_list)) {
bioc = list_first_entry(&ordered_extent->bioc_list,
- typeof(*bioc), ordered_entry);
- list_del(&bioc->ordered_entry);
+ typeof(*bioc), rst_ordered_entry);
+ list_del(&bioc->rst_ordered_entry);
btrfs_put_bioc(bioc);
}

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 54549e34a306..b511d17c6717 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -432,7 +432,7 @@ struct btrfs_io_context {

u64 logical;
u64 size;
- struct list_head ordered_entry;
+ struct list_head rst_ordered_entry;

/*
* The total number of stripes, including the extra duplicated

--
2.41.0

2023-09-18 16:22:27

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 4/4] btrfs: add tree-checker for RAID-stripe-tree

Add a tree checker for RAID stripe tree items.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/tree-checker.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 01bba79165e7..ea84ca2767e9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -30,6 +30,7 @@
#include "file-item.h"
#include "inode-item.h"
#include "dir-item.h"
+#include "raid-stripe-tree.h"

/*
* Error message should follow the following format:
@@ -1635,6 +1636,44 @@ static int check_inode_ref(struct extent_buffer *leaf,
return 0;
}

+static int check_raid_stripe_extent(struct extent_buffer *leaf,
+ struct btrfs_key *key, int slot)
+{
+ struct btrfs_stripe_extent *stripe_extent =
+ btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
+
+ if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) {
+ generic_err(leaf, slot,
+"invalid key objectid for raid stripe extent, have %llu expect aligned to %u",
+ key->objectid, leaf->fs_info->sectorsize);
+ return -EUCLEAN;
+ }
+
+ if (unlikely(!btrfs_fs_incompat(leaf->fs_info, RAID_STRIPE_TREE))) {
+ generic_err(leaf, slot,
+ "RAID_STRIPE_EXTENT present but RAID_STRIPE_TREE incompat bit unset");
+ return -EUCLEAN;
+ }
+
+ switch (btrfs_stripe_extent_encoding(leaf, stripe_extent)) {
+ case BTRFS_STRIPE_RAID0:
+ case BTRFS_STRIPE_RAID1:
+ case BTRFS_STRIPE_DUP:
+ case BTRFS_STRIPE_RAID10:
+ case BTRFS_STRIPE_RAID5:
+ case BTRFS_STRIPE_RAID6:
+ case BTRFS_STRIPE_RAID1C3:
+ case BTRFS_STRIPE_RAID1C4:
+ break;
+ default:
+ generic_err(leaf, slot, "invalid raid stripe encoding %u",
+ btrfs_stripe_extent_encoding(leaf, stripe_extent));
+ return -EUCLEAN;
+ }
+
+ return 0;
+}
+
/*
* Common point to switch the item-specific validation.
*/
@@ -1689,6 +1728,9 @@ static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
case BTRFS_EXTENT_DATA_REF_KEY:
ret = check_extent_data_ref(leaf, key, slot);
break;
+ case BTRFS_RAID_STRIPE_KEY:
+ ret = check_raid_stripe_extent(leaf, key, slot);
+ break;
}

if (ret)

--
2.41.0

2023-09-18 16:40:53

by Johannes Thumshirn

[permalink] [raw]
Subject: [PATCH 2/4] btrfs: break loop in case set_io_stripe fails

Break out of the loop in case st_io_stripe() fails.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/volumes.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0c5bd8d2ea06..d2a0ae9d91c4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6576,11 +6576,15 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
*/
bioc->full_stripe_logical = em->start +
btrfs_stripe_nr_to_offset(stripe_nr * data_stripes);
- for (i = 0; i < num_stripes; i++)
+ for (i = 0; i < num_stripes; i++) {
ret = set_io_stripe(fs_info, op, logical, length,
&bioc->stripes[i], map,
(i + stripe_nr) % num_stripes,
stripe_offset, stripe_nr);
+ if (ret)
+ break;
+ }
+
} else {
/*
* For all other non-RAID56 profiles, just copy the target
@@ -6590,6 +6594,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
ret = set_io_stripe(fs_info, op, logical, length,
&bioc->stripes[i], map, stripe_index,
stripe_offset, stripe_nr);
+ if (ret)
+ break;
stripe_index++;
}
}

--
2.41.0

2023-09-18 18:55:10

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents

Hi David,

On Mon, Sep 18, 2023 at 6:31 PM David Sterba <[email protected]> wrote:
> On Mon, Sep 18, 2023 at 03:03:10PM +0000, Johannes Thumshirn wrote:
> > On 18.09.23 16:19, Geert Uytterhoeven wrote:
> > > Hi Johannes,
> > >
> > > On Mon, Sep 18, 2023 at 4:14 PM Johannes Thumshirn
> > > <[email protected]> wrote:
> > >> Fix modpost error due to 64bit division on 32bit systems in
> > >> btrfs_insert_striped_mirrored_raid_extents.
> > >>
> > >> Cc: Geert Uytterhoeven <[email protected]>
> > >> Signed-off-by: Johannes Thumshirn <[email protected]>
> > >
> > > Thanks for your patch!
> > >
> > >> --- a/fs/btrfs/raid-stripe-tree.c
> > >> +++ b/fs/btrfs/raid-stripe-tree.c
> > >> @@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
> > >> {
> > >> struct btrfs_io_context *bioc;
> > >> struct btrfs_io_context *rbioc;
> > >> - const int nstripes = list_count_nodes(&ordered->bioc_list);
> > >> - const int index = btrfs_bg_flags_to_raid_index(map_type);
> > >> - const int substripes = btrfs_raid_array[index].sub_stripes;
> > >> - const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
> > >> + const size_t nstripes = list_count_nodes(&ordered->bioc_list);
> > >> + const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
> > >> + const u8 substripes = btrfs_raid_array[index].sub_stripes;
> > >> + const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
> > >
> > > What if the quotient does not fit in a signed 32-bit value?
> >
> > Then you've bought a lot of HDDs ;-)
> >
> > Jokes aside, yes this is theoretically correct. Dave can you fix
> > max_stripes up to be u64 when applying?
>
> I think we can keep it int, or unsigned int if needed, we can't hit such
> huge values for rw_devices. The 'theoretically' would fit for a machine
> with infinite resources, otherwise the maximum number of devices I'd
> expect is a few thousand.

rw_devices and various other *_devices are u64.
Is there a good reason they are that big?
With the fs fuzzing threads in mind, is any validation done on their values?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-09-18 22:35:53

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents

On Mon, Sep 18, 2023 at 08:31:19PM +0200, Geert Uytterhoeven wrote:
> Hi David,
>
> On Mon, Sep 18, 2023 at 6:31 PM David Sterba <[email protected]> wrote:
> > On Mon, Sep 18, 2023 at 03:03:10PM +0000, Johannes Thumshirn wrote:
> > > On 18.09.23 16:19, Geert Uytterhoeven wrote:
> > > > Hi Johannes,
> > > >
> > > > On Mon, Sep 18, 2023 at 4:14 PM Johannes Thumshirn
> > > > <[email protected]> wrote:
> > > >> Fix modpost error due to 64bit division on 32bit systems in
> > > >> btrfs_insert_striped_mirrored_raid_extents.
> > > >>
> > > >> Cc: Geert Uytterhoeven <[email protected]>
> > > >> Signed-off-by: Johannes Thumshirn <[email protected]>
> > > >
> > > > Thanks for your patch!
> > > >
> > > >> --- a/fs/btrfs/raid-stripe-tree.c
> > > >> +++ b/fs/btrfs/raid-stripe-tree.c
> > > >> @@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
> > > >> {
> > > >> struct btrfs_io_context *bioc;
> > > >> struct btrfs_io_context *rbioc;
> > > >> - const int nstripes = list_count_nodes(&ordered->bioc_list);
> > > >> - const int index = btrfs_bg_flags_to_raid_index(map_type);
> > > >> - const int substripes = btrfs_raid_array[index].sub_stripes;
> > > >> - const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
> > > >> + const size_t nstripes = list_count_nodes(&ordered->bioc_list);
> > > >> + const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
> > > >> + const u8 substripes = btrfs_raid_array[index].sub_stripes;
> > > >> + const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
> > > >
> > > > What if the quotient does not fit in a signed 32-bit value?
> > >
> > > Then you've bought a lot of HDDs ;-)
> > >
> > > Jokes aside, yes this is theoretically correct. Dave can you fix
> > > max_stripes up to be u64 when applying?
> >
> > I think we can keep it int, or unsigned int if needed, we can't hit such
> > huge values for rw_devices. The 'theoretically' would fit for a machine
> > with infinite resources, otherwise the maximum number of devices I'd
> > expect is a few thousand.
>
> rw_devices and various other *_devices are u64.
> Is there a good reason they are that big?

Many members' types of the on-disk structures are generous and u64 was
the default choice, in many cases practically meaning "you don't have to
care about it for the whole fileystem lifetime" or when u32 would be
close to some potentially reachable value (like 4GiB chunks). You could
find examples where u64 is too much but it's not a big deal for data
stored once and over time I don't remember that we'd have to regret that
some struct member is not big enough.

> With the fs fuzzing threads in mind, is any validation done on their values?

I think the superblock is the most fuzzed structure of btrfs and we do a
lot of direct validation,

https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/tree/fs/btrfs/disk-io.c#n2299

regarding the number of devices there's a warning when the value is
larger than "1<<31"

https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/tree/fs/btrfs/disk-io.c#n2433

The rw_devices are counting how many devices are actually found (i.e.
represented by a block device) and compared against the value stored in
the super block.

The u64 is also convenient for calculations where a e.g. a type counting
zones was u32 because it's a sane type but then we need to convert it to
bytes the shift overflows, we had such bugs. Fortunatelly the sector_t
is u64 for a long time but it was also source of subtle errors when
converting to bytes.

2023-09-18 22:57:13

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents

On Mon, Sep 18, 2023 at 03:03:10PM +0000, Johannes Thumshirn wrote:
> On 18.09.23 16:19, Geert Uytterhoeven wrote:
> > Hi Johannes,
> >
> > On Mon, Sep 18, 2023 at 4:14 PM Johannes Thumshirn
> > <[email protected]> wrote:
> >> Fix modpost error due to 64bit division on 32bit systems in
> >> btrfs_insert_striped_mirrored_raid_extents.
> >>
> >> Cc: Geert Uytterhoeven <[email protected]>
> >> Signed-off-by: Johannes Thumshirn <[email protected]>
> >
> > Thanks for your patch!
> >
> >> --- a/fs/btrfs/raid-stripe-tree.c
> >> +++ b/fs/btrfs/raid-stripe-tree.c
> >> @@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
> >> {
> >> struct btrfs_io_context *bioc;
> >> struct btrfs_io_context *rbioc;
> >> - const int nstripes = list_count_nodes(&ordered->bioc_list);
> >> - const int index = btrfs_bg_flags_to_raid_index(map_type);
> >> - const int substripes = btrfs_raid_array[index].sub_stripes;
> >> - const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
> >> + const size_t nstripes = list_count_nodes(&ordered->bioc_list);
> >> + const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
> >> + const u8 substripes = btrfs_raid_array[index].sub_stripes;
> >> + const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
> >
> > What if the quotient does not fit in a signed 32-bit value?
>
> Then you've bought a lot of HDDs ;-)
>
> Jokes aside, yes this is theoretically correct. Dave can you fix
> max_stripes up to be u64 when applying?

I think we can keep it int, or unsigned int if needed, we can't hit such
huge values for rw_devices. The 'theoretically' would fit for a machine
with infinite resources, otherwise the maximum number of devices I'd
expect is a few thousand.

2023-09-19 02:36:04

by Qu Wenruo

[permalink] [raw]
Subject: Re: [PATCH 4/4] btrfs: add tree-checker for RAID-stripe-tree



On 2023/9/18 23:44, Johannes Thumshirn wrote:
> Add a tree checker for RAID stripe tree items.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---
> fs/btrfs/tree-checker.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 01bba79165e7..ea84ca2767e9 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -30,6 +30,7 @@
> #include "file-item.h"
> #include "inode-item.h"
> #include "dir-item.h"
> +#include "raid-stripe-tree.h"
>
> /*
> * Error message should follow the following format:
> @@ -1635,6 +1636,44 @@ static int check_inode_ref(struct extent_buffer *leaf,
> return 0;
> }
>
> +static int check_raid_stripe_extent(struct extent_buffer *leaf,
> + struct btrfs_key *key, int slot)
> +{
> + struct btrfs_stripe_extent *stripe_extent =
> + btrfs_item_ptr(leaf, slot, struct btrfs_stripe_extent);
> +
> + if (unlikely(!IS_ALIGNED(key->objectid, leaf->fs_info->sectorsize))) {
> + generic_err(leaf, slot,
> +"invalid key objectid for raid stripe extent, have %llu expect aligned to %u",
> + key->objectid, leaf->fs_info->sectorsize);
> + return -EUCLEAN;
> + }
> +
> + if (unlikely(!btrfs_fs_incompat(leaf->fs_info, RAID_STRIPE_TREE))) {
> + generic_err(leaf, slot,
> + "RAID_STRIPE_EXTENT present but RAID_STRIPE_TREE incompat bit unset");
> + return -EUCLEAN;
> + }
> +
> + switch (btrfs_stripe_extent_encoding(leaf, stripe_extent)) {
> + case BTRFS_STRIPE_RAID0:
> + case BTRFS_STRIPE_RAID1:
> + case BTRFS_STRIPE_DUP:
> + case BTRFS_STRIPE_RAID10:
> + case BTRFS_STRIPE_RAID5:
> + case BTRFS_STRIPE_RAID6:
> + case BTRFS_STRIPE_RAID1C3:
> + case BTRFS_STRIPE_RAID1C4:
> + break;
> + default:
> + generic_err(leaf, slot, "invalid raid stripe encoding %u",
> + btrfs_stripe_extent_encoding(leaf, stripe_extent));
> + return -EUCLEAN;
> + }

Another thing we can check is the item size, the item size should be
aligned to a single record, or we can get garbage reading the last record.

Thanks,
Qu
> +
> + return 0;
> +}
> +
> /*
> * Common point to switch the item-specific validation.
> */
> @@ -1689,6 +1728,9 @@ static enum btrfs_tree_block_status check_leaf_item(struct extent_buffer *leaf,
> case BTRFS_EXTENT_DATA_REF_KEY:
> ret = check_extent_data_ref(leaf, key, slot);
> break;
> + case BTRFS_RAID_STRIPE_KEY:
> + ret = check_raid_stripe_extent(leaf, key, slot);
> + break;
> }
>
> if (ret)
>

2023-09-19 08:08:37

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 0/4] btrfs: RAID stripe tree updates

On Mon, Sep 18, 2023 at 07:14:29AM -0700, Johannes Thumshirn wrote:
> Here is a first batch of updates for the RAID stripe tree patchset in
> for-next which address some of the review comments.

Thanks.

> ---
> Johannes Thumshirn (4):
> btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents
> btrfs: break loop in case set_io_stripe fails
> btrfs: rename ordered_enmtry in btrfs_io_context

Folded to the patches.

> btrfs: add tree-checker for RAID-stripe-tree

Added as a new one.

2023-09-19 10:45:14

by Qu Wenruo

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents



On 2023/9/19 01:54, David Sterba wrote:
> On Mon, Sep 18, 2023 at 03:03:10PM +0000, Johannes Thumshirn wrote:
>> On 18.09.23 16:19, Geert Uytterhoeven wrote:
>>> Hi Johannes,
>>>
>>> On Mon, Sep 18, 2023 at 4:14 PM Johannes Thumshirn
>>> <[email protected]> wrote:
>>>> Fix modpost error due to 64bit division on 32bit systems in
>>>> btrfs_insert_striped_mirrored_raid_extents.
>>>>
>>>> Cc: Geert Uytterhoeven <[email protected]>
>>>> Signed-off-by: Johannes Thumshirn <[email protected]>
>>>
>>> Thanks for your patch!
>>>
>>>> --- a/fs/btrfs/raid-stripe-tree.c
>>>> +++ b/fs/btrfs/raid-stripe-tree.c
>>>> @@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
>>>> {
>>>> struct btrfs_io_context *bioc;
>>>> struct btrfs_io_context *rbioc;
>>>> - const int nstripes = list_count_nodes(&ordered->bioc_list);
>>>> - const int index = btrfs_bg_flags_to_raid_index(map_type);
>>>> - const int substripes = btrfs_raid_array[index].sub_stripes;
>>>> - const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
>>>> + const size_t nstripes = list_count_nodes(&ordered->bioc_list);
>>>> + const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
>>>> + const u8 substripes = btrfs_raid_array[index].sub_stripes;
>>>> + const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
>>>
>>> What if the quotient does not fit in a signed 32-bit value?
>>
>> Then you've bought a lot of HDDs ;-)
>>
>> Jokes aside, yes this is theoretically correct. Dave can you fix
>> max_stripes up to be u64 when applying?
>
> I think we can keep it int, or unsigned int if needed, we can't hit such
> huge values for rw_devices. The 'theoretically' would fit for a machine
> with infinite resources, otherwise the maximum number of devices I'd
> expect is a few thousand.

In fact, we already have an check in btrfs_validate_super(), if the
num_devices is over 1<<31, we would reject the fs.

I think we should be safe to further reduce the threshold.

U16_MAX sounds a valid and sane value to me.
If no rejection I can send out a patch for this.

And later change internal rw_devices/num_devices to u16.

Thanks,
Qu

2023-09-19 19:36:12

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents

On Tue, Sep 19, 2023 at 10:07:00AM +0930, Qu Wenruo wrote:
> On 2023/9/19 01:54, David Sterba wrote:
> > On Mon, Sep 18, 2023 at 03:03:10PM +0000, Johannes Thumshirn wrote:
> >> On 18.09.23 16:19, Geert Uytterhoeven wrote:
> >>> Hi Johannes,
> >>>
> >>> On Mon, Sep 18, 2023 at 4:14 PM Johannes Thumshirn
> >>> <[email protected]> wrote:
> >>>> Fix modpost error due to 64bit division on 32bit systems in
> >>>> btrfs_insert_striped_mirrored_raid_extents.
> >>>>
> >>>> Cc: Geert Uytterhoeven <[email protected]>
> >>>> Signed-off-by: Johannes Thumshirn <[email protected]>
> >>>
> >>> Thanks for your patch!
> >>>
> >>>> --- a/fs/btrfs/raid-stripe-tree.c
> >>>> +++ b/fs/btrfs/raid-stripe-tree.c
> >>>> @@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
> >>>> {
> >>>> struct btrfs_io_context *bioc;
> >>>> struct btrfs_io_context *rbioc;
> >>>> - const int nstripes = list_count_nodes(&ordered->bioc_list);
> >>>> - const int index = btrfs_bg_flags_to_raid_index(map_type);
> >>>> - const int substripes = btrfs_raid_array[index].sub_stripes;
> >>>> - const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
> >>>> + const size_t nstripes = list_count_nodes(&ordered->bioc_list);
> >>>> + const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
> >>>> + const u8 substripes = btrfs_raid_array[index].sub_stripes;
> >>>> + const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
> >>>
> >>> What if the quotient does not fit in a signed 32-bit value?
> >>
> >> Then you've bought a lot of HDDs ;-)
> >>
> >> Jokes aside, yes this is theoretically correct. Dave can you fix
> >> max_stripes up to be u64 when applying?
> >
> > I think we can keep it int, or unsigned int if needed, we can't hit such
> > huge values for rw_devices. The 'theoretically' would fit for a machine
> > with infinite resources, otherwise the maximum number of devices I'd
> > expect is a few thousand.
>
> In fact, we already have an check in btrfs_validate_super(), if the
> num_devices is over 1<<31, we would reject the fs.

No, it's just a warning in that case.

> I think we should be safe to further reduce the threshold.
>
> U16_MAX sounds a valid and sane value to me.
> If no rejection I can send out a patch for this.
>
> And later change internal rw_devices/num_devices to u16.

U16 does not make sense here, it's not a native int type on many
architectures and generates awkward assembly code. We use it in
justified cases where it's saving space in structures that are allocated
thousand times. The arbitrary limit 65536 is probably sane but not
much different than 1<<31, practically not hit and was useful to
note fuzzed superblocks.

2023-09-20 07:15:47

by Qu Wenruo

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents



On 2023/9/19 23:28, David Sterba wrote:
> On Tue, Sep 19, 2023 at 10:07:00AM +0930, Qu Wenruo wrote:
>> On 2023/9/19 01:54, David Sterba wrote:
>>> On Mon, Sep 18, 2023 at 03:03:10PM +0000, Johannes Thumshirn wrote:
>>>> On 18.09.23 16:19, Geert Uytterhoeven wrote:
>>>>> Hi Johannes,
>>>>>
>>>>> On Mon, Sep 18, 2023 at 4:14 PM Johannes Thumshirn
>>>>> <[email protected]> wrote:
>>>>>> Fix modpost error due to 64bit division on 32bit systems in
>>>>>> btrfs_insert_striped_mirrored_raid_extents.
>>>>>>
>>>>>> Cc: Geert Uytterhoeven <[email protected]>
>>>>>> Signed-off-by: Johannes Thumshirn <[email protected]>
>>>>>
>>>>> Thanks for your patch!
>>>>>
>>>>>> --- a/fs/btrfs/raid-stripe-tree.c
>>>>>> +++ b/fs/btrfs/raid-stripe-tree.c
>>>>>> @@ -148,10 +148,10 @@ static int btrfs_insert_striped_mirrored_raid_extents(
>>>>>> {
>>>>>> struct btrfs_io_context *bioc;
>>>>>> struct btrfs_io_context *rbioc;
>>>>>> - const int nstripes = list_count_nodes(&ordered->bioc_list);
>>>>>> - const int index = btrfs_bg_flags_to_raid_index(map_type);
>>>>>> - const int substripes = btrfs_raid_array[index].sub_stripes;
>>>>>> - const int max_stripes = trans->fs_info->fs_devices->rw_devices / substripes;
>>>>>> + const size_t nstripes = list_count_nodes(&ordered->bioc_list);
>>>>>> + const enum btrfs_raid_types index = btrfs_bg_flags_to_raid_index(map_type);
>>>>>> + const u8 substripes = btrfs_raid_array[index].sub_stripes;
>>>>>> + const int max_stripes = div_u64(trans->fs_info->fs_devices->rw_devices, substripes);
>>>>>
>>>>> What if the quotient does not fit in a signed 32-bit value?
>>>>
>>>> Then you've bought a lot of HDDs ;-)
>>>>
>>>> Jokes aside, yes this is theoretically correct. Dave can you fix
>>>> max_stripes up to be u64 when applying?
>>>
>>> I think we can keep it int, or unsigned int if needed, we can't hit such
>>> huge values for rw_devices. The 'theoretically' would fit for a machine
>>> with infinite resources, otherwise the maximum number of devices I'd
>>> expect is a few thousand.
>>
>> In fact, we already have an check in btrfs_validate_super(), if the
>> num_devices is over 1<<31, we would reject the fs.
>
> No, it's just a warning in that case.

We can make it a proper reject.

>
>> I think we should be safe to further reduce the threshold.
>>
>> U16_MAX sounds a valid and sane value to me.
>> If no rejection I can send out a patch for this.
>>
>> And later change internal rw_devices/num_devices to u16.
>
> U16 does not make sense here, it's not a native int type on many
> architectures and generates awkward assembly code. We use it in
> justified cases where it's saving space in structures that are allocated
> thousand times. The arbitrary limit 65536 is probably sane but not
> much different than 1<<31, practically not hit and was useful to
> note fuzzed superblocks.

OK, we can make it unsigned int (mostly u32) for fs_info::*_devices, but
still do extra limits on things like device add to limit it to U16_MAX.

Would this be a better solution?
At least it would still half the width while keep it native to most (if
not all) archs.

Thanks,
Qu

2023-09-20 19:55:34

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 1/4] btrfs: fix 64bit division in btrfs_insert_striped_mirrored_raid_extents

On Wed, Sep 20, 2023 at 07:20:49AM +0930, Qu Wenruo wrote:
> >>>>> What if the quotient does not fit in a signed 32-bit value?
> >>>>
> >>>> Then you've bought a lot of HDDs ;-)
> >>>>
> >>>> Jokes aside, yes this is theoretically correct. Dave can you fix
> >>>> max_stripes up to be u64 when applying?
> >>>
> >>> I think we can keep it int, or unsigned int if needed, we can't hit such
> >>> huge values for rw_devices. The 'theoretically' would fit for a machine
> >>> with infinite resources, otherwise the maximum number of devices I'd
> >>> expect is a few thousand.
> >>
> >> In fact, we already have an check in btrfs_validate_super(), if the
> >> num_devices is over 1<<31, we would reject the fs.
> >
> > No, it's just a warning in that case.
>
> We can make it a proper reject.
>
> >
> >> I think we should be safe to further reduce the threshold.
> >>
> >> U16_MAX sounds a valid and sane value to me.
> >> If no rejection I can send out a patch for this.
> >>
> >> And later change internal rw_devices/num_devices to u16.
> >
> > U16 does not make sense here, it's not a native int type on many
> > architectures and generates awkward assembly code. We use it in
> > justified cases where it's saving space in structures that are allocated
> > thousand times. The arbitrary limit 65536 is probably sane but not
> > much different than 1<<31, practically not hit and was useful to
> > note fuzzed superblocks.
>
> OK, we can make it unsigned int (mostly u32) for fs_info::*_devices, but
> still do extra limits on things like device add to limit it to U16_MAX.
>
> Would this be a better solution?
> At least it would still half the width while keep it native to most (if
> not all) archs.

I don't see much point changing it from u64, it copies the on-disk type,
we validate the value on input, then use it as an int type. There are
not even theoretical problems stemming from the type width. With the
validations in place we don't need to add any artificial limits to the
number of devices, like we don't add such limitations elsewhere if not
necessary.