2021-07-27 07:15:50

by Desmond Cheong Zhi Xi

Subject: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

When removing a writeable device in __btrfs_free_extra_devids, the rw
device count should be decremented.

This error was caught by Syzbot which reported a warning in
close_fs_devices because fs_devices->rw_devices was not 0 after
closing all devices. Here is the call trace that was observed:

btrfs_mount_root():
  btrfs_scan_one_device():
    device_list_add();                <---------------- device added
  btrfs_open_devices():
    open_fs_devices():
      btrfs_open_one_device();        <-------- writable device opened,
                                                rw device count ++
  btrfs_fill_super():
    open_ctree():
      btrfs_free_extra_devids():
        __btrfs_free_extra_devids();  <--- writable device removed,
                                           rw device count not decremented
    fail_tree_roots:
      btrfs_close_devices():
        close_fs_devices();           <------- rw device count off by 1

As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
mount if we don't have replace item with target device"), rw_devices
was decremented on removing a writable device in
__btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
was not set for the device. However, this check does not need to be
reinstated as it is now redundant and incorrect.

In __btrfs_free_extra_devids, we skip removing the device if it is the
target for replacement. This is done by checking whether device->devid
== BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
and so it's redundant to test for that bit.
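
As a rough sketch (simplified from the kernel source, not verbatim), the
start of the loop in __btrfs_free_extra_devids already handles the
replace target before any state bits matter:

	list_for_each_entry_safe(device, next, &fs_devices->devices, dev_list) {
		/*
		 * The replace target is kept around so a suspended replace
		 * can be resumed; it never reaches the writeable-device
		 * removal below, so a REPLACE_TGT check there would be
		 * dead code.
		 */
		if (device->devid == BTRFS_DEV_REPLACE_DEVID)
			continue;

		/* ... removal of the extra device, as changed below ... */
	}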

Additionally, following commit 82372bc816d7 ("Btrfs: make
the logic of source device removing more clear"), rw_devices is
incremented whenever a writeable device is added to the alloc
list (including the target device in btrfs_dev_replace_finishing), so
all removals of writable devices from the alloc list should also be
accompanied by a decrement to rw_devices.
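
The counting rule can be sketched as follows (illustrative only, not the
exact kernel code); the add and remove sides are meant to stay
symmetric:

	/* add side, e.g. btrfs_open_one_device() or replace finishing */
	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
		list_add_tail(&device->dev_alloc_list, &fs_devices->alloc_list);
		fs_devices->rw_devices++;
	}

	/* remove side, e.g. __btrfs_free_extra_devids() with this patch */
	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
		list_del_init(&device->dev_alloc_list);
		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
		fs_devices->rw_devices--;
	}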

Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
Reported-by: [email protected]
Tested-by: [email protected]
Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
---
fs/btrfs/volumes.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 807502cd6510..916c25371658 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
 		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
 			list_del_init(&device->dev_alloc_list);
 			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+			fs_devices->rw_devices--;
 		}
 		list_del_init(&device->dev_list);
 		fs_devices->num_devices--;
--
2.25.1



2021-07-28 13:04:01

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> When removing a writeable device in __btrfs_free_extra_devids, the rw
> device count should be decremented.
>
> This error was caught by Syzbot which reported a warning in
> close_fs_devices because fs_devices->rw_devices was not 0 after
> closing all devices. Here is the call trace that was observed:
>
> btrfs_mount_root():
> btrfs_scan_one_device():
> device_list_add(); <---------------- device added
> btrfs_open_devices():
> open_fs_devices():
> btrfs_open_one_device(); <-------- writable device opened,
> rw device count ++
> btrfs_fill_super():
> open_ctree():
> btrfs_free_extra_devids():
> __btrfs_free_extra_devids(); <--- writable device removed,
> rw device count not decremented
> fail_tree_roots:
> btrfs_close_devices():
> close_fs_devices(); <------- rw device count off by 1
>
> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
> mount if we don't have replace item with target device"), rw_devices
> was decremented on removing a writable device in
> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
> was not set for the device. However, this check does not need to be
> reinstated as it is now redundant and incorrect.
>
> In __btrfs_free_extra_devids, we skip removing the device if it is the
> target for replacement. This is done by checking whether device->devid
> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
> and so it's redundant to test for that bit.
>
> Additionally, following commit 82372bc816d7 ("Btrfs: make
> the logic of source device removing more clear"), rw_devices is
> incremented whenever a writeable device is added to the alloc
> list (including the target device in btrfs_dev_replace_finishing), so
> all removals of writable devices from the alloc list should also be
> accompanied by a decrement to rw_devices.
>
> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
> Reported-by: [email protected]
> Tested-by: [email protected]
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> Reviewed-by: Anand Jain <[email protected]>

Added to misc-next, thanks.

2021-08-12 10:44:41

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> When removing a writeable device in __btrfs_free_extra_devids, the rw
> device count should be decremented.
>
> This error was caught by Syzbot which reported a warning in
> close_fs_devices because fs_devices->rw_devices was not 0 after
> closing all devices. Here is the call trace that was observed:
>
> btrfs_mount_root():
> btrfs_scan_one_device():
> device_list_add(); <---------------- device added
> btrfs_open_devices():
> open_fs_devices():
> btrfs_open_one_device(); <-------- writable device opened,
> rw device count ++
> btrfs_fill_super():
> open_ctree():
> btrfs_free_extra_devids():
> __btrfs_free_extra_devids(); <--- writable device removed,
> rw device count not decremented
> fail_tree_roots:
> btrfs_close_devices():
> close_fs_devices(); <------- rw device count off by 1
>
> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
> mount if we don't have replace item with target device"), rw_devices
> was decremented on removing a writable device in
> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
> was not set for the device. However, this check does not need to be
> reinstated as it is now redundant and incorrect.
>
> In __btrfs_free_extra_devids, we skip removing the device if it is the
> target for replacement. This is done by checking whether device->devid
> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
> and so it's redundant to test for that bit.
>
> Additionally, following commit 82372bc816d7 ("Btrfs: make
> the logic of source device removing more clear"), rw_devices is
> incremented whenever a writeable device is added to the alloc
> list (including the target device in btrfs_dev_replace_finishing), so
> all removals of writable devices from the alloc list should also be
> accompanied by a decrement to rw_devices.
>
> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
> Reported-by: [email protected]
> Tested-by: [email protected]
> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> Reviewed-by: Anand Jain <[email protected]>
> ---
> fs/btrfs/volumes.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 807502cd6510..916c25371658 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> list_del_init(&device->dev_alloc_list);
> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> + fs_devices->rw_devices--;
> }
> list_del_init(&device->dev_list);
> fs_devices->num_devices--;

I've hit a crash on the master branch with a stacktrace very similar to
the one this bug was supposed to fix. It's a failed assertion on device
close. This patch was the last one to touch that code and it matches
some of the keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit check
that used to be in the original code but was not reinstated in your fix.

I'm not sure how reproducible it is, right now I have only one instance
and am hunting another strange problem. They could be related.

assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150

https://susepaste.org/view/raw/18223056 full log with other stacktraces,
possibly related

[ 3310.383268] kernel BUG at fs/btrfs/ctree.h:3431!
[ 3310.384586] invalid opcode: 0000 [#1] PREEMPT SMP
[ 3310.385848] CPU: 1 PID: 3982 Comm: umount Tainted: G W 5.14.0-rc5-default+ #1532
[ 3310.388216] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 3310.391054] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 3310.397628] RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
[ 3310.399079] RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
[ 3310.400990] RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
[ 3310.402504] RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
[ 3310.404025] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
[ 3310.405074] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
[ 3310.406130] FS: 00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
[ 3310.407649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3310.409022] CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
[ 3310.410111] Call Trace:
[ 3310.410561] btrfs_close_one_device.cold+0x11/0x55 [btrfs]
[ 3310.411788] close_fs_devices+0x44/0xb0 [btrfs]
[ 3310.412654] btrfs_close_devices+0x48/0x160 [btrfs]
[ 3310.413449] generic_shutdown_super+0x69/0x100
[ 3310.414155] kill_anon_super+0x14/0x30
[ 3310.414802] btrfs_kill_super+0x12/0x20 [btrfs]
[ 3310.415767] deactivate_locked_super+0x2c/0xa0
[ 3310.416562] cleanup_mnt+0x144/0x1b0
[ 3310.417153] task_work_run+0x59/0xa0
[ 3310.417744] exit_to_user_mode_loop+0xe7/0xf0
[ 3310.418440] exit_to_user_mode_prepare+0xaf/0xf0
[ 3310.419425] syscall_exit_to_user_mode+0x19/0x50
[ 3310.420281] do_syscall_64+0x4a/0x90
[ 3310.420934] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3310.421718] RIP: 0033:0x7f61c91654db

2021-08-12 15:45:38

by Desmond Cheong Zhi Xi

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On 12/8/21 6:38 pm, David Sterba wrote:
> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>> When removing a writeable device in __btrfs_free_extra_devids, the rw
>> device count should be decremented.
>>
>> This error was caught by Syzbot which reported a warning in
>> close_fs_devices because fs_devices->rw_devices was not 0 after
>> closing all devices. Here is the call trace that was observed:
>>
>> btrfs_mount_root():
>> btrfs_scan_one_device():
>> device_list_add(); <---------------- device added
>> btrfs_open_devices():
>> open_fs_devices():
>> btrfs_open_one_device(); <-------- writable device opened,
>> rw device count ++
>> btrfs_fill_super():
>> open_ctree():
>> btrfs_free_extra_devids():
>> __btrfs_free_extra_devids(); <--- writable device removed,
>> rw device count not decremented
>> fail_tree_roots:
>> btrfs_close_devices():
>> close_fs_devices(); <------- rw device count off by 1
>>
>> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
>> mount if we don't have replace item with target device"), rw_devices
>> was decremented on removing a writable device in
>> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
>> was not set for the device. However, this check does not need to be
>> reinstated as it is now redundant and incorrect.
>>
>> In __btrfs_free_extra_devids, we skip removing the device if it is the
>> target for replacement. This is done by checking whether device->devid
>> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
>> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
>> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
>> and so it's redundant to test for that bit.
>>
>> Additionally, following commit 82372bc816d7 ("Btrfs: make
>> the logic of source device removing more clear"), rw_devices is
>> incremented whenever a writeable device is added to the alloc
>> list (including the target device in btrfs_dev_replace_finishing), so
>> all removals of writable devices from the alloc list should also be
>> accompanied by a decrement to rw_devices.
>>
>> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
>> Reported-by: [email protected]
>> Tested-by: [email protected]
>> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
>> Reviewed-by: Anand Jain <[email protected]>
>> ---
>> fs/btrfs/volumes.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 807502cd6510..916c25371658 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>> list_del_init(&device->dev_alloc_list);
>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>> + fs_devices->rw_devices--;
>> }
>> list_del_init(&device->dev_list);
>> fs_devices->num_devices--;
>
> I've hit a crash on master branch with stacktrace very similar to one
> this bug was supposed to fix. It's a failed assertion on device close.
> This patch was the last one to touch it and it matches some of the
> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> the original patch but was not reinstated in your fix.
>
> I'm not sure how reproducible it is, right now I have only one instance
> and am hunting another strange problem. They could be related.
>
> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
>
> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> possibly relatedg
>

Looking at the logs, it seems that a dev_replace was started, then
suspended. But it wasn't canceled or resumed before the fs devices were
closed.

I'll investigate further, just throwing some observations out there.

> [ 3310.383268] kernel BUG at fs/btrfs/ctree.h:3431!
> [ 3310.384586] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 3310.385848] CPU: 1 PID: 3982 Comm: umount Tainted: G W 5.14.0-rc5-default+ #1532
> [ 3310.388216] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> [ 3310.391054] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
> [ 3310.397628] RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
> [ 3310.399079] RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
> [ 3310.400990] RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
> [ 3310.402504] RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
> [ 3310.404025] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
> [ 3310.405074] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
> [ 3310.406130] FS: 00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
> [ 3310.407649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3310.409022] CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
> [ 3310.410111] Call Trace:
> [ 3310.410561] btrfs_close_one_device.cold+0x11/0x55 [btrfs]
> [ 3310.411788] close_fs_devices+0x44/0xb0 [btrfs]
> [ 3310.412654] btrfs_close_devices+0x48/0x160 [btrfs]
> [ 3310.413449] generic_shutdown_super+0x69/0x100
> [ 3310.414155] kill_anon_super+0x14/0x30
> [ 3310.414802] btrfs_kill_super+0x12/0x20 [btrfs]
> [ 3310.415767] deactivate_locked_super+0x2c/0xa0
> [ 3310.416562] cleanup_mnt+0x144/0x1b0
> [ 3310.417153] task_work_run+0x59/0xa0
> [ 3310.417744] exit_to_user_mode_loop+0xe7/0xf0
> [ 3310.418440] exit_to_user_mode_prepare+0xaf/0xf0
> [ 3310.419425] syscall_exit_to_user_mode+0x19/0x50
> [ 3310.420281] do_syscall_64+0x4a/0x90
> [ 3310.420934] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 3310.421718] RIP: 0033:0x7f61c91654db
>

2021-08-12 18:15:42

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
> On 12/8/21 6:38 pm, David Sterba wrote:
> > On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> >> When removing a writeable device in __btrfs_free_extra_devids, the rw
> >> device count should be decremented.
> >>
> >> This error was caught by Syzbot which reported a warning in
> >> close_fs_devices because fs_devices->rw_devices was not 0 after
> >> closing all devices. Here is the call trace that was observed:
> >>
> >> btrfs_mount_root():
> >> btrfs_scan_one_device():
> >> device_list_add(); <---------------- device added
> >> btrfs_open_devices():
> >> open_fs_devices():
> >> btrfs_open_one_device(); <-------- writable device opened,
> >> rw device count ++
> >> btrfs_fill_super():
> >> open_ctree():
> >> btrfs_free_extra_devids():
> >> __btrfs_free_extra_devids(); <--- writable device removed,
> >> rw device count not decremented
> >> fail_tree_roots:
> >> btrfs_close_devices():
> >> close_fs_devices(); <------- rw device count off by 1
> >>
> >> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
> >> mount if we don't have replace item with target device"), rw_devices
> >> was decremented on removing a writable device in
> >> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
> >> was not set for the device. However, this check does not need to be
> >> reinstated as it is now redundant and incorrect.
> >>
> >> In __btrfs_free_extra_devids, we skip removing the device if it is the
> >> target for replacement. This is done by checking whether device->devid
> >> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
> >> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
> >> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
> >> and so it's redundant to test for that bit.
> >>
> >> Additionally, following commit 82372bc816d7 ("Btrfs: make
> >> the logic of source device removing more clear"), rw_devices is
> >> incremented whenever a writeable device is added to the alloc
> >> list (including the target device in btrfs_dev_replace_finishing), so
> >> all removals of writable devices from the alloc list should also be
> >> accompanied by a decrement to rw_devices.
> >>
> >> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
> >> Reported-by: [email protected]
> >> Tested-by: [email protected]
> >> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
> >> Reviewed-by: Anand Jain <[email protected]>
> >> ---
> >> fs/btrfs/volumes.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 807502cd6510..916c25371658 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> >> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> >> list_del_init(&device->dev_alloc_list);
> >> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> >> + fs_devices->rw_devices--;
> >> }
> >> list_del_init(&device->dev_list);
> >> fs_devices->num_devices--;
> >
> > I've hit a crash on master branch with stacktrace very similar to one
> > this bug was supposed to fix. It's a failed assertion on device close.
> > This patch was the last one to touch it and it matches some of the
> > keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> > the original patch but was not reinstated in your fix.
> >
> > I'm not sure how reproducible it is, right now I have only one instance
> > and am hunting another strange problem. They could be related.
> >
> > assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> >
> > https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> > possibly relatedg
> >
>
> Looking at the logs, it seems that a dev_replace was started, then
> suspended. But it wasn't canceled or resumed before the fs devices were
> closed.
>
> I'll investigate further, just throwing some observations out there.

Thanks. I'm testing the patch revert; no crash after the first loop.
I'll run a few more to be sure, as it's not entirely reliable.

Sending the revert is the option of last resort as we're approaching the
end of the 5.14 dev cycle and the crash prevents testing (unlike the
fuzzer warning).

2021-08-12 18:40:52

by Desmond Cheong Zhi Xi

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On 12/8/21 11:50 pm, David Sterba wrote:
> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
>> On 12/8/21 6:38 pm, David Sterba wrote:
>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>>>> When removing a writeable device in __btrfs_free_extra_devids, the rw
>>>> device count should be decremented.
>>>>
>>>> This error was caught by Syzbot which reported a warning in
>>>> close_fs_devices because fs_devices->rw_devices was not 0 after
>>>> closing all devices. Here is the call trace that was observed:
>>>>
>>>> btrfs_mount_root():
>>>> btrfs_scan_one_device():
>>>> device_list_add(); <---------------- device added
>>>> btrfs_open_devices():
>>>> open_fs_devices():
>>>> btrfs_open_one_device(); <-------- writable device opened,
>>>> rw device count ++
>>>> btrfs_fill_super():
>>>> open_ctree():
>>>> btrfs_free_extra_devids():
>>>> __btrfs_free_extra_devids(); <--- writable device removed,
>>>> rw device count not decremented
>>>> fail_tree_roots:
>>>> btrfs_close_devices():
>>>> close_fs_devices(); <------- rw device count off by 1
>>>>
>>>> As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
>>>> mount if we don't have replace item with target device"), rw_devices
>>>> was decremented on removing a writable device in
>>>> __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
>>>> was not set for the device. However, this check does not need to be
>>>> reinstated as it is now redundant and incorrect.
>>>>
>>>> In __btrfs_free_extra_devids, we skip removing the device if it is the
>>>> target for replacement. This is done by checking whether device->devid
>>>> == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
>>>> only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
>>>> should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
>>>> and so it's redundant to test for that bit.
>>>>
>>>> Additionally, following commit 82372bc816d7 ("Btrfs: make
>>>> the logic of source device removing more clear"), rw_devices is
>>>> incremented whenever a writeable device is added to the alloc
>>>> list (including the target device in btrfs_dev_replace_finishing), so
>>>> all removals of writable devices from the alloc list should also be
>>>> accompanied by a decrement to rw_devices.
>>>>
>>>> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
>>>> Reported-by: [email protected]
>>>> Tested-by: [email protected]
>>>> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
>>>> Reviewed-by: Anand Jain <[email protected]>
>>>> ---
>>>> fs/btrfs/volumes.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 807502cd6510..916c25371658 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>>> list_del_init(&device->dev_alloc_list);
>>>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>>>> + fs_devices->rw_devices--;
>>>> }
>>>> list_del_init(&device->dev_list);
>>>> fs_devices->num_devices--;
>>>
>>> I've hit a crash on master branch with stacktrace very similar to one
>>> this bug was supposed to fix. It's a failed assertion on device close.
>>> This patch was the last one to touch it and it matches some of the
>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
>>> the original patch but was not reinstated in your fix.
>>>
>>> I'm not sure how reproducible it is, right now I have only one instance
>>> and am hunting another strange problem. They could be related.
>>>
>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
>>>
>>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
>>> possibly relatedg
>>>
>>
>> Looking at the logs, it seems that a dev_replace was started, then
>> suspended. But it wasn't canceled or resumed before the fs devices were
>> closed.
>>
>> I'll investigate further, just throwing some observations out there.
>
> Thanks. I'm testing the patch revert, no crash after first loop, I'll
> run a few more to be sure as it's not entirely reliable.
>
> Sending the revert is option of last resort as we're approaching end of
> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
> warning).
>

I might be missing something, so any thoughts would be appreciated. But
I don't think the assertion in btrfs_close_one_device is correct.

From what I see, this crash happens when close_ctree is called while a
dev_replace hasn't completed. In close_ctree, we suspend the
dev_replace, but keep the replace target around so that we can resume
the dev_replace procedure when we mount the root again. This is the call
trace:

close_ctree():
  btrfs_dev_replace_suspend_for_unmount();
  btrfs_close_devices():
    btrfs_close_fs_devices():
      btrfs_close_one_device():
        ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
                         &device->dev_state));

However, since the replace target sticks around, there is a device with
BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
btrfs_close_one_device.

Two options I can think of:

- We could remove the assertion.

- Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
btrfs_dev_replace_suspend_for_unmount (a rough sketch follows below).
This is fine since the bit is set again in btrfs_init_dev_replace if
dev_replace->replace_state is BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED.
But this approach strikes me as a little odd because the device is still
the replace target when mounting in the future.
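
For illustration only, option 2 could amount to something like this in
btrfs_dev_replace_suspend_for_unmount (hypothetical and untested, field
names from memory):

	/* hypothetical: drop the TGT bit on the target while suspending */
	if (dev_replace->tgtdev)
		clear_bit(BTRFS_DEV_STATE_REPLACE_TGT,
			  &dev_replace->tgtdev->dev_state);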

Thoughts?

2021-08-13 09:09:31

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
> On 12/8/21 11:50 pm, David Sterba wrote:
> > On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
> >> On 12/8/21 6:38 pm, David Sterba wrote:
> >>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> >>>> --- a/fs/btrfs/volumes.c
> >>>> +++ b/fs/btrfs/volumes.c
> >>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> >>>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> >>>> list_del_init(&device->dev_alloc_list);
> >>>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> >>>> + fs_devices->rw_devices--;
> >>>> }
> >>>> list_del_init(&device->dev_list);
> >>>> fs_devices->num_devices--;
> >>>
> >>> I've hit a crash on master branch with stacktrace very similar to one
> >>> this bug was supposed to fix. It's a failed assertion on device close.
> >>> This patch was the last one to touch it and it matches some of the
> >>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> >>> the original patch but was not reinstated in your fix.
> >>>
> >>> I'm not sure how reproducible it is, right now I have only one instance
> >>> and am hunting another strange problem. They could be related.
> >>>
> >>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> >>>
> >>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> >>> possibly relatedg
> >>>
> >>
> >> Looking at the logs, it seems that a dev_replace was started, then
> >> suspended. But it wasn't canceled or resumed before the fs devices were
> >> closed.
> >>
> >> I'll investigate further, just throwing some observations out there.
> >
> > Thanks. I'm testing the patch revert, no crash after first loop, I'll
> > run a few more to be sure as it's not entirely reliable.
> >
> > Sending the revert is option of last resort as we're approaching end of
> > 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
> > warning).
> >
>
> I might be missing something, so any thoughts would be appreciated. But
> I don't think the assertion in btrfs_close_one_device is correct.
>
> From what I see, this crash happens when close_ctree is called while a
> dev_replace hasn't completed. In close_ctree, we suspend the
> dev_replace, but keep the replace target around so that we can resume
> the dev_replace procedure when we mount the root again. This is the call
> trace:
>
> close_ctree():
> btrfs_dev_replace_suspend_for_unmount();
> btrfs_close_devices():
> btrfs_close_fs_devices():
> btrfs_close_one_device():
> ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
> &device->dev_state));
>
> However, since the replace target sticks around, there is a device with
> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
> btrfs_close_one_device.
>
> Two options I can think of:
>
> - We could remove the assertion.
>
> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set
> again in btrfs_init_dev_replace if the dev_replace->replace_state is
> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as
> a little odd because the device is still the replace target when
> mounting in the future.

Option #2 does not sound safe because the TGT bit is checked in several
places where the device list is queried for various reasons, even
without a mounted filesystem.

Removing the assertion makes more sense, but I'm still not convinced
that this is an expected/allowed state of a closed device.

2021-08-13 10:14:30

by Desmond Cheong Zhi Xi

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On 13/8/21 4:51 pm, David Sterba wrote:
> On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
>> On 12/8/21 11:50 pm, David Sterba wrote:
>>> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
>>>> On 12/8/21 6:38 pm, David Sterba wrote:
>>>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>>>>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>>>>> list_del_init(&device->dev_alloc_list);
>>>>>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>>>>>> + fs_devices->rw_devices--;
>>>>>> }
>>>>>> list_del_init(&device->dev_list);
>>>>>> fs_devices->num_devices--;
>>>>>
>>>>> I've hit a crash on master branch with stacktrace very similar to one
>>>>> this bug was supposed to fix. It's a failed assertion on device close.
>>>>> This patch was the last one to touch it and it matches some of the
>>>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
>>>>> the original patch but was not reinstated in your fix.
>>>>>
>>>>> I'm not sure how reproducible it is, right now I have only one instance
>>>>> and am hunting another strange problem. They could be related.
>>>>>
>>>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
>>>>>
>>>>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
>>>>> possibly relatedg
>>>>>
>>>>
>>>> Looking at the logs, it seems that a dev_replace was started, then
>>>> suspended. But it wasn't canceled or resumed before the fs devices were
>>>> closed.
>>>>
>>>> I'll investigate further, just throwing some observations out there.
>>>
>>> Thanks. I'm testing the patch revert, no crash after first loop, I'll
>>> run a few more to be sure as it's not entirely reliable.
>>>
>>> Sending the revert is option of last resort as we're approaching end of
>>> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
>>> warning).
>>>
>>
>> I might be missing something, so any thoughts would be appreciated. But
>> I don't think the assertion in btrfs_close_one_device is correct.
>>
>> From what I see, this crash happens when close_ctree is called while a
>> dev_replace hasn't completed. In close_ctree, we suspend the
>> dev_replace, but keep the replace target around so that we can resume
>> the dev_replace procedure when we mount the root again. This is the call
>> trace:
>>
>> close_ctree():
>> btrfs_dev_replace_suspend_for_unmount();
>> btrfs_close_devices():
>> btrfs_close_fs_devices():
>> btrfs_close_one_device():
>> ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
>> &device->dev_state));
>>
>> However, since the replace target sticks around, there is a device with
>> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
>> btrfs_close_one_device.
>>
>> Two options I can think of:
>>
>> - We could remove the assertion.
>>
>> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
>> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set
>> again in btrfs_init_dev_replace if the dev_replace->replace_state is
>> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as
>> a little odd because the device is still the replace target when
>> mounting in the future.
>
> The option #2 does not sound safe because the TGT bit is checked in
> several places where device list is queried for various reasons, even
> without a mounted filesystem.
>
> Removing the assertion makes more sense but I'm still not convinced that
> the this is expected/allowed state of a closed device.
>

Would it be better if we cleared the REPLACE_TGT bit only when closing
the device where device->devid == BTRFS_DEV_REPLACE_DEVID?

The first conditional in btrfs_close_one_device assumes that we can come
across such a device. If we come across it, we should properly reset it.
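
For reference, the first conditional I mean looks roughly like this
(paraphrased from memory, not a verbatim copy of volumes.c):

	if (device->devid != BTRFS_DEV_REPLACE_DEVID &&
	    test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
		list_del_init(&device->dev_alloc_list);
		fs_devices->rw_devices--;
	}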

If any other device has this bit set, the ASSERT will still catch it and
let us know something is wrong.

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 70f94b75f25a..a5afebb78ecf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 		fs_devices->rw_devices--;
 	}
 
+	if (device->devid == BTRFS_DEV_REPLACE_DEVID)
+		clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
+
 	if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
 		fs_devices->missing_devices--;

2021-08-13 11:48:45

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Fri, Aug 13, 2021 at 05:57:26PM +0800, Desmond Cheong Zhi Xi wrote:
> On 13/8/21 4:51 pm, David Sterba wrote:
> > On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
> >> On 12/8/21 11:50 pm, David Sterba wrote:
> >>> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
> >>>> On 12/8/21 6:38 pm, David Sterba wrote:
> >>>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> >>>>>> --- a/fs/btrfs/volumes.c
> >>>>>> +++ b/fs/btrfs/volumes.c
> >>>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> >>>>>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> >>>>>> list_del_init(&device->dev_alloc_list);
> >>>>>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> >>>>>> + fs_devices->rw_devices--;
> >>>>>> }
> >>>>>> list_del_init(&device->dev_list);
> >>>>>> fs_devices->num_devices--;
> >>>>>
> >>>>> I've hit a crash on master branch with stacktrace very similar to one
> >>>>> this bug was supposed to fix. It's a failed assertion on device close.
> >>>>> This patch was the last one to touch it and it matches some of the
> >>>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> >>>>> the original patch but was not reinstated in your fix.
> >>>>>
> >>>>> I'm not sure how reproducible it is, right now I have only one instance
> >>>>> and am hunting another strange problem. They could be related.
> >>>>>
> >>>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> >>>>>
> >>>>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> >>>>> possibly relatedg
> >>>>>
> >>>>
> >>>> Looking at the logs, it seems that a dev_replace was started, then
> >>>> suspended. But it wasn't canceled or resumed before the fs devices were
> >>>> closed.
> >>>>
> >>>> I'll investigate further, just throwing some observations out there.
> >>>
> >>> Thanks. I'm testing the patch revert, no crash after first loop, I'll
> >>> run a few more to be sure as it's not entirely reliable.
> >>>
> >>> Sending the revert is option of last resort as we're approaching end of
> >>> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
> >>> warning).
> >>>
> >>
> >> I might be missing something, so any thoughts would be appreciated. But
> >> I don't think the assertion in btrfs_close_one_device is correct.
> >>
> >> From what I see, this crash happens when close_ctree is called while a
> >> dev_replace hasn't completed. In close_ctree, we suspend the
> >> dev_replace, but keep the replace target around so that we can resume
> >> the dev_replace procedure when we mount the root again. This is the call
> >> trace:
> >>
> >> close_ctree():
> >> btrfs_dev_replace_suspend_for_unmount();
> >> btrfs_close_devices():
> >> btrfs_close_fs_devices():
> >> btrfs_close_one_device():
> >> ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
> >> &device->dev_state));
> >>
> >> However, since the replace target sticks around, there is a device with
> >> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
> >> btrfs_close_one_device.
> >>
> >> Two options I can think of:
> >>
> >> - We could remove the assertion.
> >>
> >> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
> >> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set
> >> again in btrfs_init_dev_replace if the dev_replace->replace_state is
> >> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as
> >> a little odd because the device is still the replace target when
> >> mounting in the future.
> >
> > The option #2 does not sound safe because the TGT bit is checked in
> > several places where device list is queried for various reasons, even
> > without a mounted filesystem.
> >
> > Removing the assertion makes more sense but I'm still not convinced that
> > the this is expected/allowed state of a closed device.
> >
>
> Would it be better if we cleared the REPLACE_TGT bit only when closing
> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
>
> The first conditional in btrfs_close_one_device assumes that we can come
> across such a device. If we come across it, we should properly reset it.
>
> If other devices has this bit set, the ASSERT will still catch it and
> let us know something is wrong.

That sounds great.

> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 70f94b75f25a..a5afebb78ecf 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
> fs_devices->rw_devices--;
> }
>
> + if (device->devid == BTRFS_DEV_REPLACE_DEVID)
> + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
> +
> if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
> fs_devices->missing_devices--;

I'll do a few test rounds, thanks.

2021-08-19 17:14:32

by Desmond Cheong Zhi Xi

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On 13/8/21 6:30 pm, David Sterba wrote:
> On Fri, Aug 13, 2021 at 05:57:26PM +0800, Desmond Cheong Zhi Xi wrote:
>> On 13/8/21 4:51 pm, David Sterba wrote:
>>> On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
>>>> On 12/8/21 11:50 pm, David Sterba wrote:
>>>>> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
>>>>>> On 12/8/21 6:38 pm, David Sterba wrote:
>>>>>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>>>>>>>> --- a/fs/btrfs/volumes.c
>>>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>>>>>>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>>>>>>> list_del_init(&device->dev_alloc_list);
>>>>>>>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>>>>>>>> + fs_devices->rw_devices--;
>>>>>>>> }
>>>>>>>> list_del_init(&device->dev_list);
>>>>>>>> fs_devices->num_devices--;
>>>>>>>
>>>>>>> I've hit a crash on master branch with stacktrace very similar to one
>>>>>>> this bug was supposed to fix. It's a failed assertion on device close.
>>>>>>> This patch was the last one to touch it and it matches some of the
>>>>>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
>>>>>>> the original patch but was not reinstated in your fix.
>>>>>>>
>>>>>>> I'm not sure how reproducible it is, right now I have only one instance
>>>>>>> and am hunting another strange problem. They could be related.
>>>>>>>
>>>>>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
>>>>>>>
>>>>>>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
>>>>>>> possibly relatedg
>>>>>>>
>>>>>>
>>>>>> Looking at the logs, it seems that a dev_replace was started, then
>>>>>> suspended. But it wasn't canceled or resumed before the fs devices were
>>>>>> closed.
>>>>>>
>>>>>> I'll investigate further, just throwing some observations out there.
>>>>>
>>>>> Thanks. I'm testing the patch revert, no crash after first loop, I'll
>>>>> run a few more to be sure as it's not entirely reliable.
>>>>>
>>>>> Sending the revert is option of last resort as we're approaching end of
>>>>> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
>>>>> warning).
>>>>>
>>>>
>>>> I might be missing something, so any thoughts would be appreciated. But
>>>> I don't think the assertion in btrfs_close_one_device is correct.
>>>>
>>>> From what I see, this crash happens when close_ctree is called while a
>>>> dev_replace hasn't completed. In close_ctree, we suspend the
>>>> dev_replace, but keep the replace target around so that we can resume
>>>> the dev_replace procedure when we mount the root again. This is the call
>>>> trace:
>>>>
>>>> close_ctree():
>>>> btrfs_dev_replace_suspend_for_unmount();
>>>> btrfs_close_devices():
>>>> btrfs_close_fs_devices():
>>>> btrfs_close_one_device():
>>>> ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
>>>> &device->dev_state));
>>>>
>>>> However, since the replace target sticks around, there is a device with
>>>> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
>>>> btrfs_close_one_device.
>>>>
>>>> Two options I can think of:
>>>>
>>>> - We could remove the assertion.
>>>>
>>>> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
>>>> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set
>>>> again in btrfs_init_dev_replace if the dev_replace->replace_state is
>>>> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as
>>>> a little odd because the device is still the replace target when
>>>> mounting in the future.
>>>
>>> The option #2 does not sound safe because the TGT bit is checked in
>>> several places where device list is queried for various reasons, even
>>> without a mounted filesystem.
>>>
>>> Removing the assertion makes more sense but I'm still not convinced that
>>> the this is expected/allowed state of a closed device.
>>>
>>
>> Would it be better if we cleared the REPLACE_TGT bit only when closing
>> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
>>
>> The first conditional in btrfs_close_one_device assumes that we can come
>> across such a device. If we come across it, we should properly reset it.
>>
>> If other devices has this bit set, the ASSERT will still catch it and
>> let us know something is wrong.
>
> That sounds great.
>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 70f94b75f25a..a5afebb78ecf 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
>> fs_devices->rw_devices--;
>> }
>>
>> + if (device->devid == BTRFS_DEV_REPLACE_DEVID)
>> + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
>> +
>> if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
>> fs_devices->missing_devices--;
>
> I'll do a few test rounds, thanks.
>

Hi David,

Just following up. Did that resolve the issue or is further
investigation needed?

2021-08-19 17:40:27

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Fri, Aug 20, 2021 at 01:11:58AM +0800, Desmond Cheong Zhi Xi wrote:
> >>> The option #2 does not sound safe because the TGT bit is checked in
> >>> several places where device list is queried for various reasons, even
> >>> without a mounted filesystem.
> >>>
> >>> Removing the assertion makes more sense but I'm still not convinced that
> >>> the this is expected/allowed state of a closed device.
> >>>
> >>
> >> Would it be better if we cleared the REPLACE_TGT bit only when closing
> >> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
> >>
> >> The first conditional in btrfs_close_one_device assumes that we can come
> >> across such a device. If we come across it, we should properly reset it.
> >>
> >> If other devices has this bit set, the ASSERT will still catch it and
> >> let us know something is wrong.
> >
> > That sounds great.
> >
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 70f94b75f25a..a5afebb78ecf 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
> >> fs_devices->rw_devices--;
> >> }
> >>
> >> + if (device->devid == BTRFS_DEV_REPLACE_DEVID)
> >> + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
> >> +
> >> if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
> >> fs_devices->missing_devices--;
> >
> > I'll do a few test rounds, thanks.
>
> Just following up. Did that resolve the issue or is further
> investigation needed?

The fix seems to work, I haven't seen the assertion fail anymore;
incidentally, the crash also stopped showing up on an unpatched branch.

2021-08-20 03:11:10

by Desmond Cheong Zhi Xi

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On 20/8/21 1:34 am, David Sterba wrote:
> On Fri, Aug 20, 2021 at 01:11:58AM +0800, Desmond Cheong Zhi Xi wrote:
>>>>> The option #2 does not sound safe because the TGT bit is checked in
>>>>> several places where device list is queried for various reasons, even
>>>>> without a mounted filesystem.
>>>>>
>>>>> Removing the assertion makes more sense but I'm still not convinced that
>>>>> the this is expected/allowed state of a closed device.
>>>>>
>>>>
>>>> Would it be better if we cleared the REPLACE_TGT bit only when closing
>>>> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
>>>>
>>>> The first conditional in btrfs_close_one_device assumes that we can come
>>>> across such a device. If we come across it, we should properly reset it.
>>>>
>>>> If other devices has this bit set, the ASSERT will still catch it and
>>>> let us know something is wrong.
>>>
>>> That sounds great.
>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 70f94b75f25a..a5afebb78ecf 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
>>>> fs_devices->rw_devices--;
>>>> }
>>>>
>>>> + if (device->devid == BTRFS_DEV_REPLACE_DEVID)
>>>> + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
>>>> +
>>>> if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
>>>> fs_devices->missing_devices--;
>>>
>>> I'll do a few test rounds, thanks.
>>
>> Just following up. Did that resolve the issue or is further
>> investigation needed?
>
> The fix seems to work, I haven't seen the assertion fail anymore,
> incidentally the crash also stopped to show up on an unpatched branch.
>

Sounds good, thanks for the update. If there's anything else I can help
with, please let me know.

2021-08-20 11:06:48

by David Sterba

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On Fri, Aug 20, 2021 at 11:09:05AM +0800, Desmond Cheong Zhi Xi wrote:
> On 20/8/21 1:34 am, David Sterba wrote:
> > On Fri, Aug 20, 2021 at 01:11:58AM +0800, Desmond Cheong Zhi Xi wrote:
> >>>>> The option #2 does not sound safe because the TGT bit is checked in
> >>>>> several places where device list is queried for various reasons, even
> >>>>> without a mounted filesystem.
> >>>>>
> >>>>> Removing the assertion makes more sense but I'm still not convinced that
> >>>>> the this is expected/allowed state of a closed device.
> >>>>>
> >>>>
> >>>> Would it be better if we cleared the REPLACE_TGT bit only when closing
> >>>> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
> >>>>
> >>>> The first conditional in btrfs_close_one_device assumes that we can come
> >>>> across such a device. If we come across it, we should properly reset it.
> >>>>
> >>>> If other devices has this bit set, the ASSERT will still catch it and
> >>>> let us know something is wrong.
> >>>
> >>> That sounds great.
> >>>
> >>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>>> index 70f94b75f25a..a5afebb78ecf 100644
> >>>> --- a/fs/btrfs/volumes.c
> >>>> +++ b/fs/btrfs/volumes.c
> >>>> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
> >>>> fs_devices->rw_devices--;
> >>>> }
> >>>>
> >>>> + if (device->devid == BTRFS_DEV_REPLACE_DEVID)
> >>>> + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
> >>>> +
> >>>> if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
> >>>> fs_devices->missing_devices--;
> >>>
> >>> I'll do a few test rounds, thanks.
> >>
> >> Just following up. Did that resolve the issue or is further
> >> investigation needed?
> >
> > The fix seems to work, I haven't seen the assertion fail anymore,
> > incidentally the crash also stopped to show up on an unpatched branch.
> >
>
> Sounds good, thanks for the update. If there's anything else I can help
> with, please let me know.

So are you going to send the patch with the fix?

2021-08-20 17:55:36

by Desmond Cheong Zhi Xi

Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids

On 20/8/21 6:58 pm, David Sterba wrote:
> On Fri, Aug 20, 2021 at 11:09:05AM +0800, Desmond Cheong Zhi Xi wrote:
>> On 20/8/21 1:34 am, David Sterba wrote:
>>> On Fri, Aug 20, 2021 at 01:11:58AM +0800, Desmond Cheong Zhi Xi wrote:
>>>>>>> The option #2 does not sound safe because the TGT bit is checked in
>>>>>>> several places where device list is queried for various reasons, even
>>>>>>> without a mounted filesystem.
>>>>>>>
>>>>>>> Removing the assertion makes more sense but I'm still not convinced that
>>>>>>> the this is expected/allowed state of a closed device.
>>>>>>>
>>>>>>
>>>>>> Would it be better if we cleared the REPLACE_TGT bit only when closing
>>>>>> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
>>>>>>
>>>>>> The first conditional in btrfs_close_one_device assumes that we can come
>>>>>> across such a device. If we come across it, we should properly reset it.
>>>>>>
>>>>>> If other devices has this bit set, the ASSERT will still catch it and
>>>>>> let us know something is wrong.
>>>>>
>>>>> That sounds great.
>>>>>
>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>> index 70f94b75f25a..a5afebb78ecf 100644
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
>>>>>> fs_devices->rw_devices--;
>>>>>> }
>>>>>>
>>>>>> + if (device->devid == BTRFS_DEV_REPLACE_DEVID)
>>>>>> + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
>>>>>> +
>>>>>> if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
>>>>>> fs_devices->missing_devices--;
>>>>>
>>>>> I'll do a few test rounds, thanks.
>>>>
>>>> Just following up. Did that resolve the issue or is further
>>>> investigation needed?
>>>
>>> The fix seems to work, I haven't seen the assertion fail anymore,
>>> incidentally the crash also stopped to show up on an unpatched branch.
>>>
>>
>> Sounds good, thanks for the update. If there's anything else I can help
>> with, please let me know.
>
> So are you going to send the patch with the fix?
>

Right, just sent. For some reason I thought it was already patched.