2023-08-10 04:35:14

by xiaoshoukui

Subject: [PATCH] btrfs: fix return value when race occur between balance and cancel/pause

If a pause or cancel ioctl is issued after __btrfs_balance has already
checked for pending pause and cancel requests and found none, the
balance completes normally and the ioctl still returns 0. This misleads
the user into believing that the pause or cancel request succeeded, when
in fact the balance was neither paused nor canceled.

When this race occurs, a non-zero errno should be returned to the user.

Signed-off-by: xiaoshoukui <[email protected]>
---
fs/btrfs/fs.h | 6 ++++++
fs/btrfs/volumes.c | 14 +++++++++-----
2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..c27def881922 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -93,6 +93,12 @@ enum {
*/
BTRFS_FS_BALANCE_RUNNING,

+ /* Indicate that balance has been paused. */
+ BTRFS_FS_BALANCE_PAUSED,
+
+ /* Indicate that balance has been canceled. */
+ BTRFS_FS_BALANCE_CANCELED,
+
/*
* Indicate that relocation of a chunk has started, it's set per chunk
* and is toggled between chunks.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2ecb76cf3d91..839ce1808f23 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4267,7 +4267,6 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
u64 num_devices;
unsigned seq;
bool reducing_redundancy;
- bool paused = false;
int i;

if (btrfs_fs_closing(fs_info) ||
@@ -4390,6 +4389,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
ASSERT(!test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
set_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags);
describe_balance_start_or_resume(fs_info);
+ clear_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags);
+ clear_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags);
mutex_unlock(&fs_info->balance_mutex);

ret = __btrfs_balance(fs_info);
@@ -4398,7 +4399,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
btrfs_info(fs_info, "balance: paused");
btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
- paused = true;
+ set_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags);
}
/*
* Balance can be canceled by:
@@ -4415,8 +4416,10 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
*
* So here we only check the return value to catch canceled balance.
*/
- else if (ret == -ECANCELED || ret == -EINTR)
+ else if (ret == -ECANCELED || ret == -EINTR) {
btrfs_info(fs_info, "balance: canceled");
+ set_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags);
+ }
else
btrfs_info(fs_info, "balance: ended with status: %d", ret);

@@ -4428,7 +4431,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
}

/* We didn't pause, we can clean everything up. */
- if (!paused) {
+ if (!test_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags)) {
reset_balance_state(fs_info);
btrfs_exclop_finish(fs_info);
}
@@ -4587,6 +4590,7 @@ int btrfs_pause_balance(struct btrfs_fs_info *fs_info)
/* we are good with balance_ctl ripped off from under us */
BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
atomic_dec(&fs_info->balance_pause_req);
+ ret = test_bit(BTRFS_FS_BALANCE_PAUSED, &fs_info->flags) ? 0 : -EINVAL;
} else {
ret = -ENOTCONN;
}
@@ -4642,7 +4646,7 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags));
atomic_dec(&fs_info->balance_cancel_req);
mutex_unlock(&fs_info->balance_mutex);
- return 0;
+ return test_bit(BTRFS_FS_BALANCE_CANCELED, &fs_info->flags) ? 0 : -EINVAL;
}

int btrfs_uuid_scan_kthread(void *data)
--
2.20.1



2023-08-10 14:25:07

by David Sterba

Subject: Re: [PATCH] btrfs: fix return value when race occur between balance and cancel/pause

On Wed, Aug 09, 2023 at 11:48:10PM -0400, xiaoshoukui wrote:
> Issue a pause or cancel IOCTL request after judging that there is no
> pause or cancel request on the path of __btrfs_balance to return 0,
> which will mislead the user that the pause or cancel requests are
> successful.In fact, the balance request has not been paused or canceled.
>
> On that race condition, a non-zero errno should be returned to the user.
>
> Signed-off-by: xiaoshoukui <[email protected]>
> ---
> fs/btrfs/fs.h | 6 ++++++
> fs/btrfs/volumes.c | 14 +++++++++-----
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 203d2a267828..c27def881922 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -93,6 +93,12 @@ enum {
> */
> BTRFS_FS_BALANCE_RUNNING,
>
> + /* Indicate that balance has been paused. */
> + BTRFS_FS_BALANCE_PAUSED,
> +
> + /* Indicate that balance has been canceled. */
> + BTRFS_FS_BALANCE_CANCELED,

I don't like that the status is tracked in several bits like that, in
addition to the already complicated locking and state transitions of a
restarted balance. I think this is a hint that some things can be
simplified or combined together, though it could be difficult.

2023-08-11 03:22:55

by xiaoshoukui

Subject: Re: [PATCH] btrfs: fix return value when race occur between balance and cancel/pause

My first thought was to solve the problem with locks, but after trying it,
it turned out that this would make the original code even more complex.

Tracking the status in extra flag bits may be just a workaround. A better
solution may be to refactor the balance-related code.

I think the interface provided to the user is very important for reliability.
I'm looking forward to a better solution. If needed, I can put some effort
into testing and reproducing.