2024-05-24 16:29:31

by Johannes Thumshirn

Subject: [PATCH v5 0/3] btrfs: zoned: always set aside a zone for relocation

For zoned filesystems we heavily rely on relocation for garbage collection,
as we cannot do any in-place updates of disk blocks.

But there can be situations where we run out of space to perform the
relocation.

To solve this, always keep a zone reserved for relocation.
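
Condensed, the reservation step added in patch 2 boils down to the
following (a sketch only, with the lockdep annotation and comments
trimmed; the full implementation is in the patch below):

	void btrfs_reserve_relocation_bg(struct btrfs_fs_info *fs_info)
	{
		struct btrfs_space_info *sinfo = fs_info->data_sinfo;
		struct btrfs_block_group *bg;
		u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags);
		u64 bytenr;

		if (!btrfs_is_zoned(fs_info) || fs_info->data_reloc_bg)
			return;

		/* Prefer an already empty data block group ... */
		bytenr = find_empty_block_group(sinfo, flags);
		if (!bytenr) {
			struct btrfs_trans_handle *trans;
			int ret;

			/* ... otherwise force-allocate a new data chunk. */
			trans = btrfs_join_transaction(fs_info->tree_root);
			if (IS_ERR(trans))
				return;

			ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE);
			btrfs_end_transaction(trans);
			if (ret)
				return;

			bytenr = find_empty_block_group(sinfo, flags);
			if (!bytenr)
				return;
		}

		/* Activate the zone and pin it as the data relocation BG. */
		bg = btrfs_lookup_block_group(fs_info, bytenr);
		if (!bg)
			return;
		if (!btrfs_zone_activate(bg))
			bytenr = 0;
		btrfs_put_block_group(bg);

		spin_lock(&fs_info->relocation_bg_lock);
		if (!fs_info->data_reloc_bg)
			fs_info->data_reloc_bg = bytenr;
		spin_unlock(&fs_info->relocation_bg_lock);
	}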

This is a subset of another approach to this problem that I submitted in
https://lore.kernel.org/r/[email protected]

---
Changes in v5:
- Split out one patch to skip relocation of the data relocation bg
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Skip data_reloc_bg in delete_unused_bgs() and reclaim_bgs_work()
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Rename btrfs_reserve_relocation_zone -> btrfs_reserve_relocation_bg
- Bail out if we already have a relocation bg set
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Incorporate Naohiro's review
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Johannes Thumshirn (3):
btrfs: don't try to relocate the data relocation block-group
btrfs: zoned: reserve relocation block-group on mount
btrfs: reserve new relocation block-group after successful relocation

fs/btrfs/block-group.c | 11 ++++++++
fs/btrfs/disk-io.c | 2 ++
fs/btrfs/relocation.c | 14 +++++++++++
fs/btrfs/volumes.c | 2 ++
fs/btrfs/zoned.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/zoned.h | 3 +++
6 files changed, 100 insertions(+)
---
base-commit: 2aabf192868a0f6d9ee3e35f9b0a8d97c77a46da
change-id: 20240514-zoned-gc-2ce793459eb7

Best regards,
--
Johannes Thumshirn <[email protected]>



2024-05-24 16:29:44

by Johannes Thumshirn

Subject: [PATCH v5 1/3] btrfs: don't try to relocate the data relocation block-group

From: Johannes Thumshirn <[email protected]>

When relocating block-groups, either via auto reclaim or manual
balance, skip the data relocation block-group.

Signed-off-by: Johannes Thumshirn <[email protected]>
---
fs/btrfs/block-group.c | 2 ++
fs/btrfs/relocation.c | 7 +++++++
fs/btrfs/volumes.c | 2 ++
3 files changed, 11 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9910bae89966..9a01bbad45f6 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1921,6 +1921,8 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
div64_u64(zone_unusable * 100, bg->length));
trace_btrfs_reclaim_block_group(bg);
ret = btrfs_relocate_chunk(fs_info, bg->start);
+ if (ret == -EBUSY)
+ ret = 0;
if (ret) {
btrfs_dec_block_group_ro(bg);
btrfs_err(fs_info, "error relocating chunk %llu",
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 5f1a909a1d91..39e2db9af64f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4037,6 +4037,13 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
int rw = 0;
int err = 0;

+ spin_lock(&fs_info->relocation_bg_lock);
+ if (group_start == fs_info->data_reloc_bg) {
+ spin_unlock(&fs_info->relocation_bg_lock);
+ return -EBUSY;
+ }
+ spin_unlock(&fs_info->relocation_bg_lock);
+
/*
* This only gets set if we had a half-deleted snapshot on mount. We
* cannot allow relocation to start while we're still trying to clean up
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3f70f727dacf..75da3a32885b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3367,6 +3367,8 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
btrfs_scrub_pause(fs_info);
ret = btrfs_relocate_block_group(fs_info, chunk_offset);
btrfs_scrub_continue(fs_info);
+ if (ret == -EBUSY)
+ return 0;
if (ret) {
/*
* If we had a transaction abort, stop all running scrubs.

--
2.43.0


2024-05-24 16:30:02

by Johannes Thumshirn

Subject: [PATCH v5 2/3] btrfs: zoned: reserve relocation block-group on mount

From: Johannes Thumshirn <[email protected]>

Reserve one zone as a data relocation target on each mount. If we already
find an empty block group, there's no need to force a chunk allocation;
we can use this empty data block group as our relocation target.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Naohiro Aota <[email protected]>
---
fs/btrfs/block-group.c | 9 +++++++
fs/btrfs/disk-io.c | 2 ++
fs/btrfs/zoned.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/zoned.h | 3 +++
4 files changed, 82 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9a01bbad45f6..167ded78af89 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1500,6 +1500,15 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
btrfs_put_block_group(block_group);
continue;
}
+
+ spin_lock(&fs_info->relocation_bg_lock);
+ if (block_group->start == fs_info->data_reloc_bg) {
+ btrfs_put_block_group(block_group);
+ spin_unlock(&fs_info->relocation_bg_lock);
+ continue;
+ }
+ spin_unlock(&fs_info->relocation_bg_lock);
+
spin_unlock(&fs_info->unused_bgs_lock);

btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 78d3966232ae..16bb52bcb69e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3547,6 +3547,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
}
btrfs_discard_resume(fs_info);

+ btrfs_reserve_relocation_bg(fs_info);
+
if (fs_info->uuid_root &&
(btrfs_test_opt(fs_info, RESCAN_UUID_TREE) ||
fs_info->generation != btrfs_super_uuid_tree_generation(disk_super))) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index c52a0063f7db..f4962935efef 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -17,6 +17,7 @@
#include "fs.h"
#include "accessors.h"
#include "bio.h"
+#include "transaction.h"

/* Maximum number of zones to report per blkdev_report_zones() call */
#define BTRFS_REPORT_NR_ZONES 4096
@@ -2637,3 +2638,70 @@ void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info)
}
spin_unlock(&fs_info->zone_active_bgs_lock);
}
+
+static u64 find_empty_block_group(struct btrfs_space_info *sinfo, u64 flags)
+{
+ struct btrfs_block_group *bg;
+
+ for (int i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+ list_for_each_entry(bg, &sinfo->block_groups[i], list) {
+ if (bg->flags != flags)
+ continue;
+ if (bg->used == 0)
+ return bg->start;
+ }
+ }
+
+ return 0;
+}
+
+void btrfs_reserve_relocation_bg(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_root *tree_root = fs_info->tree_root;
+ struct btrfs_space_info *sinfo = fs_info->data_sinfo;
+ struct btrfs_trans_handle *trans;
+ struct btrfs_block_group *bg;
+ u64 flags = btrfs_get_alloc_profile(fs_info, sinfo->flags);
+ u64 bytenr = 0;
+
+ lockdep_assert_not_held(&fs_info->relocation_bg_lock);
+
+ if (!btrfs_is_zoned(fs_info))
+ return;
+
+ if (fs_info->data_reloc_bg)
+ return;
+
+ bytenr = find_empty_block_group(sinfo, flags);
+ if (!bytenr) {
+ int ret;
+
+ trans = btrfs_join_transaction(tree_root);
+ if (IS_ERR(trans))
+ return;
+
+ ret = btrfs_chunk_alloc(trans, flags, CHUNK_ALLOC_FORCE);
+ btrfs_end_transaction(trans);
+ if (ret)
+ return;
+
+ bytenr = find_empty_block_group(sinfo, flags);
+ if (!bytenr)
+ return;
+
+ }
+
+ bg = btrfs_lookup_block_group(fs_info, bytenr);
+ if (!bg)
+ return;
+
+ if (!btrfs_zone_activate(bg))
+ bytenr = 0;
+
+ btrfs_put_block_group(bg);
+
+ spin_lock(&fs_info->relocation_bg_lock);
+ if (!fs_info->data_reloc_bg)
+ fs_info->data_reloc_bg = bytenr;
+ spin_unlock(&fs_info->relocation_bg_lock);
+}
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index ff605beb84ef..56c1c19d52bc 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -95,6 +95,7 @@ int btrfs_zone_finish_one_bg(struct btrfs_fs_info *fs_info);
int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info, bool do_finish);
void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info);
+void btrfs_reserve_relocation_bg(struct btrfs_fs_info *fs_info);
#else /* CONFIG_BLK_DEV_ZONED */

static inline int btrfs_get_dev_zone_info_all_devices(struct btrfs_fs_info *fs_info)
@@ -264,6 +265,8 @@ static inline int btrfs_zoned_activate_one_bg(struct btrfs_fs_info *fs_info,

static inline void btrfs_check_active_zone_reservation(struct btrfs_fs_info *fs_info) { }

+static inline void btrfs_reserve_relocation_bg(struct btrfs_fs_info *fs_info) { }
+
#endif

static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)

--
2.43.0


2024-05-24 16:30:11

by Johannes Thumshirn

Subject: [PATCH v5 3/3] btrfs: reserve new relocation block-group after successful relocation

From: Johannes Thumshirn <[email protected]>

After we've committed a relocation transaction, we know we have just freed
up space. Set that space as the hint for the next relocation.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Naohiro Aota <[email protected]>
---
fs/btrfs/relocation.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 39e2db9af64f..29d235003ff1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3811,6 +3811,13 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
ret = btrfs_commit_transaction(trans);
if (ret && !err)
err = ret;
+
+ /*
+ * We know we have just freed space, set it as hint for the
+ * next relocation.
+ */
+ if (!err)
+ btrfs_reserve_relocation_bg(fs_info);
out_free:
ret = clean_dirty_subvols(rc);
if (ret < 0 && !err)

--
2.43.0


2024-05-29 04:48:30

by Naohiro Aota

Subject: Re: [PATCH v5 1/3] btrfs: don't try to relocate the data relocation block-group

On Fri, May 24, 2024 at 06:29:09PM GMT, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <[email protected]>
>
> When relocating block-groups, either via auto reclaim or manual
> balance, skip the data relocation block-group.
>
> Signed-off-by: Johannes Thumshirn <[email protected]>
> ---

Thinking about this patch raised one question for me. What happens when we
manually relocate a data relocation block group with a RAID-type conversion?

prepare_allocation_zoned() gives the data_reloc_bg hint, which points to
the pre-conversion RAID level. That block group is rejected because of the
ffe_ctl->flags mismatch. Then, find_free_extent() begins its loop with
ffe_ctl->index = (post-conversion RAID index). All the BGs with that RAID
level are rejected because they are not fs_info->data_reloc_bg.

When ffe_ctl->index reaches the pre-conversion RAID index, the data
relocation BG could be selected. But, as we reject a SINGLE BG for a RAID
(DUP) allocation, that won't work for a SINGLE to RAID conversion.

The loop will eventually end up allocating a new chunk with the
post-conversion RAID level. However, that chunk is still not eligible for
allocation because it is not fs_info->data_reloc_bg.

This question may be out of the scope of this patch. But, that scenario
should be addressed in this series or another series.

Considering this scenario, I'm not sure just skipping the data relocation
BG is a good solution. It will make it impossible to convert the data
relocation BG.
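
To make the failure mode concrete, the allocator behaves roughly like this
(pseudo-C for illustration only; for_each_candidate_bg() and the index
variable names are made up, the real logic lives in find_free_extent() and
its zoned helpers):

	/*
	 * Balance of the data relocation BG with e.g. -dconvert=raid1:
	 * ffe_ctl->flags carries the post-conversion RAID level, while
	 * fs_info->data_reloc_bg still has the pre-conversion level.
	 */
	for (index = post_conversion_index; index < BTRFS_NR_RAID_TYPES; index++) {
		for_each_candidate_bg(bg, index) {
			/* zoned data relocation only writes to the reloc BG */
			if (bg->start != fs_info->data_reloc_bg)
				continue;
			/* but the reloc BG has the pre-conversion level */
			if (bg->flags != ffe_ctl->flags)
				continue;
			/* never reached: no BG satisfies both checks */
		}
	}
	/*
	 * Even a newly force-allocated post-conversion chunk is rejected
	 * for the same reason: it is not fs_info->data_reloc_bg either.
	 */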

> fs/btrfs/block-group.c | 2 ++
> fs/btrfs/relocation.c | 7 +++++++
> fs/btrfs/volumes.c | 2 ++
> 3 files changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 9910bae89966..9a01bbad45f6 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1921,6 +1921,8 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> div64_u64(zone_unusable * 100, bg->length));
> trace_btrfs_reclaim_block_group(bg);
> ret = btrfs_relocate_chunk(fs_info, bg->start);
> + if (ret == -EBUSY)
> + ret = 0;
> if (ret) {
> btrfs_dec_block_group_ro(bg);
> btrfs_err(fs_info, "error relocating chunk %llu",
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 5f1a909a1d91..39e2db9af64f 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4037,6 +4037,13 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
> int rw = 0;
> int err = 0;
>
> + spin_lock(&fs_info->relocation_bg_lock);
> + if (group_start == fs_info->data_reloc_bg) {
> + spin_unlock(&fs_info->relocation_bg_lock);
> + return -EBUSY;
> + }
> + spin_unlock(&fs_info->relocation_bg_lock);
> +
> /*
> * This only gets set if we had a half-deleted snapshot on mount. We
> * cannot allow relocation to start while we're still trying to clean up
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3f70f727dacf..75da3a32885b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3367,6 +3367,8 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> btrfs_scrub_pause(fs_info);
> ret = btrfs_relocate_block_group(fs_info, chunk_offset);
> btrfs_scrub_continue(fs_info);
> + if (ret == -EBUSY)
> + return 0;
> if (ret) {
> /*
> * If we had a transaction abort, stop all running scrubs.
>
> --
> 2.43.0
>