2022-12-26 10:58:29

by Hou Tao

Subject: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume

From: Hou Tao <[email protected]>

The freeing of relinquished volume will wake up the pending volume
acquisition by using wake_up_bit(), however it is mismatched with
wait_var_event() used in fscache_wait_on_volume_collision() and it will
never wake up the waiter in the wait-queue because these two functions
operate on different wait-queues.
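
The mismatch described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: it assumes that a waiter is woken only when the waker's key (address plus bit number) equals the waiter's key, and that variable-style waits use a sentinel bit number of -1. The names `wait_key`, `key_for_bit()`, `key_for_var()` and `wake_matches()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the key a waiter registers and a waker supplies. */
struct wait_key {
	const void *word;	/* address being waited on */
	int bit_nr;		/* bit number, or -1 for a "whole variable" wait */
};

/* Key used by a bit-style wait or wake on (word, bit). */
static struct wait_key key_for_bit(const void *word, int bit)
{
	struct wait_key k = { word, bit };
	return k;
}

/* Key used by a variable-style wait on var (assumed sentinel bit -1). */
static struct wait_key key_for_var(const void *var)
{
	struct wait_key k = { var, -1 };
	return k;
}

/* A wakeup reaches the waiter only if both parts of the key agree. */
static bool wake_matches(struct wait_key waiter, struct wait_key waker)
{
	return waiter.word == waker.word && waiter.bit_nr == waker.bit_nr;
}
```

Under these assumptions, a waiter registered with `key_for_var(&flags)` is never matched by a waker using `key_for_bit(&flags, bit)`, while a bit-style waiter on the same bit is.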

According to the implementation in fscache_wait_on_volume_collision(),
if the wake-up of pending acquisition is delayed longer than 20 seconds
(e.g., due to the delay of on-demand fd closing), the first
wait_var_event_timeout() will timeout and the following wait_var_event()
will hang forever as shown below:

FS-Cache: Potential volume collision new=00000024 old=00000022
......
INFO: task mount:1148 blocked for more than 122 seconds.
Not tainted 6.1.0-rc6+ #1
task:mount state:D stack:0 pid:1148 ppid:1
Call Trace:
<TASK>
__schedule+0x2f6/0xb80
schedule+0x67/0xe0
fscache_wait_on_volume_collision.cold+0x80/0x82
__fscache_acquire_volume+0x40d/0x4e0
erofs_fscache_register_volume+0x51/0xe0 [erofs]
erofs_fscache_register_fs+0x19c/0x240 [erofs]
erofs_fc_fill_super+0x746/0xaf0 [erofs]
vfs_get_super+0x7d/0x100
get_tree_nodev+0x16/0x20
erofs_fc_get_tree+0x20/0x30 [erofs]
vfs_get_tree+0x24/0xb0
path_mount+0x2fa/0xa90
do_mount+0x7c/0xa0
__x64_sys_mount+0x8b/0xe0
do_syscall_64+0x30/0x60
entry_SYSCALL_64_after_hwframe+0x46/0xb0

Considering that wake_up_bit() is more selective, so fixing it by using
wait_on_bit() instead of wait_var_event() to wait for the freeing of
relinquished volume. In addition because waitqueue_active() is used in
wake_up_bit() and clear_bit() doesn't imply any memory barrier, so also
adding smp_mb__after_atomic() before wake_up_bit().

Fixes: 62ab63352350 ("fscache: Implement volume registration")
Signed-off-by: Hou Tao <[email protected]>
---
fs/fscache/volume.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index ab8ceddf9efa..fc3dd3bc851d 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -141,13 +141,14 @@ static bool fscache_is_acquire_pending(struct fscache_volume *volume)
static void fscache_wait_on_volume_collision(struct fscache_volume *candidate,
unsigned int collidee_debug_id)
{
- wait_var_event_timeout(&candidate->flags,
- !fscache_is_acquire_pending(candidate), 20 * HZ);
+ wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
+ TASK_UNINTERRUPTIBLE, 20 * HZ);
if (fscache_is_acquire_pending(candidate)) {
pr_notice("Potential volume collision new=%08x old=%08x",
candidate->debug_id, collidee_debug_id);
fscache_stat(&fscache_n_volumes_collision);
- wait_var_event(&candidate->flags, !fscache_is_acquire_pending(candidate));
+ wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
+ TASK_UNINTERRUPTIBLE);
}
}

@@ -348,6 +349,11 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
if (fscache_volume_same(cursor, volume)) {
fscache_see_volume(cursor, fscache_volume_see_hash_wake);
clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
+ /*
+ * Paired with barrier in wait_on_bit(). Check
+ * wake_up_bit() and waitqueue_active() for details.
+ */
+ smp_mb__after_atomic();
wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
return;
}
--
2.29.2


2023-01-11 16:15:30

by David Howells

Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume

Hou Tao <[email protected]> wrote:

> clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
> + /*
> + * Paired with barrier in wait_on_bit(). Check
> + * wake_up_bit() and waitqueue_active() for details.
> + */
> + smp_mb__after_atomic();
> wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);

What two values are you applying a partial ordering to?

David

2023-01-12 02:26:24

by Hou Tao

Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume

Hi,

On 1/12/2023 12:06 AM, David Howells wrote:
> Hou Tao <[email protected]> wrote:
>
>> clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
>> + /*
>> + * Paired with barrier in wait_on_bit(). Check
>> + * wake_up_bit() and waitqueue_active() for details.
>> + */
>> + smp_mb__after_atomic();
>> wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
> What two values are you applying a partial ordering to?
cursor->flags and wq->head. fscache_wake_pending_volume() writes
cursor->flags and then reads wq->head through waitqueue_active(), while the
waiter writes wq->head and then reads cursor->flags.
>
> David
>
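
The two-variable ordering described in this reply can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel implementation: `flags` and `nr_waiters` are hypothetical stand-ins for cursor->flags and wq->head, `PENDING_BIT` is a made-up mask, and `atomic_thread_fence(memory_order_seq_cst)` plays the role of the full barrier on each side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PENDING_BIT 0x1UL	/* hypothetical stand-in for the pending flag */

static atomic_ulong flags = PENDING_BIT;	/* stand-in for cursor->flags */
static atomic_int nr_waiters;			/* stand-in for wq->head */

/*
 * Waker side: write flags, full fence, then read the waiter count.
 * Returns true if a wakeup has to be issued.
 */
static bool waker_clear_and_check(void)
{
	atomic_fetch_and_explicit(&flags, ~PENDING_BIT, memory_order_relaxed);
	/* Without this fence the load below may be ordered before the store. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&nr_waiters, memory_order_relaxed) > 0;
}

/*
 * Waiter side: publish the waiter, full fence, then read flags.
 * Returns true if the waiter still has to sleep.
 */
static bool waiter_enqueue_and_must_sleep(void)
{
	atomic_fetch_add_explicit(&nr_waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&flags, memory_order_relaxed) &
		PENDING_BIT) != 0;
}
```

With a full fence on both sides, at least one of the two reads observes the other side's write: either the waker sees the waiter and wakes it, or the waiter sees the cleared bit and does not sleep.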

2023-01-12 04:00:33

by Jingbo Xu

Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume



On 12/26/22 6:33 PM, Hou Tao wrote:
> From: Hou Tao <[email protected]>
>
> The freeing of relinquished volume will wake up the pending volume
> acquisition by using wake_up_bit(), however it is mismatched with
> wait_var_event() used in fscache_wait_on_volume_collision() and it will
> never wake up the waiter in the wait-queue because these two functions
> operate on different wait-queues.
>
> According to the implementation in fscache_wait_on_volume_collision(),
> if the wake-up of pending acquisition is delayed longer than 20 seconds
> (e.g., due to the delay of on-demand fd closing), the first
> wait_var_event_timeout() will timeout and the following wait_var_event()
> will hang forever as shown below:
>
> FS-Cache: Potential volume collision new=00000024 old=00000022
> ......
> INFO: task mount:1148 blocked for more than 122 seconds.
> Not tainted 6.1.0-rc6+ #1
> task:mount state:D stack:0 pid:1148 ppid:1
> Call Trace:
> <TASK>
> __schedule+0x2f6/0xb80
> schedule+0x67/0xe0
> fscache_wait_on_volume_collision.cold+0x80/0x82
> __fscache_acquire_volume+0x40d/0x4e0
> erofs_fscache_register_volume+0x51/0xe0 [erofs]
> erofs_fscache_register_fs+0x19c/0x240 [erofs]
> erofs_fc_fill_super+0x746/0xaf0 [erofs]
> vfs_get_super+0x7d/0x100
> get_tree_nodev+0x16/0x20
> erofs_fc_get_tree+0x20/0x30 [erofs]
> vfs_get_tree+0x24/0xb0
> path_mount+0x2fa/0xa90
> do_mount+0x7c/0xa0
> __x64_sys_mount+0x8b/0xe0
> do_syscall_64+0x30/0x60
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Considering that wake_up_bit() is more selective, so fixing it by using
                                                       ^
                                                       fix
> wait_on_bit() instead of wait_var_event() to wait for the freeing of
> relinquished volume. In addition because waitqueue_active() is used in
> wake_up_bit() and clear_bit() doesn't imply any memory barrier, so also
> adding smp_mb__after_atomic() before wake_up_bit().

... doesn't imply any memory barrier, add ...

>
> Fixes: 62ab63352350 ("fscache: Implement volume registration")
> Signed-off-by: Hou Tao <[email protected]>


Otherwise LGTM :)

Reviewed-by: Jingbo Xu <[email protected]>

> ---
> fs/fscache/volume.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
> index ab8ceddf9efa..fc3dd3bc851d 100644
> --- a/fs/fscache/volume.c
> +++ b/fs/fscache/volume.c
> @@ -141,13 +141,14 @@ static bool fscache_is_acquire_pending(struct fscache_volume *volume)
> static void fscache_wait_on_volume_collision(struct fscache_volume *candidate,
> unsigned int collidee_debug_id)
> {
> - wait_var_event_timeout(&candidate->flags,
> - !fscache_is_acquire_pending(candidate), 20 * HZ);
> + wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
> + TASK_UNINTERRUPTIBLE, 20 * HZ);
> if (fscache_is_acquire_pending(candidate)) {
> pr_notice("Potential volume collision new=%08x old=%08x",
> candidate->debug_id, collidee_debug_id);
> fscache_stat(&fscache_n_volumes_collision);
> - wait_var_event(&candidate->flags, !fscache_is_acquire_pending(candidate));
> + wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
> + TASK_UNINTERRUPTIBLE);
> }
> }
>
> @@ -348,6 +349,11 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
> if (fscache_volume_same(cursor, volume)) {
> fscache_see_volume(cursor, fscache_volume_see_hash_wake);
> clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
> + /*
> + * Paired with barrier in wait_on_bit(). Check
> + * wake_up_bit() and waitqueue_active() for details.
> + */
> + smp_mb__after_atomic();
> wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
> return;
> }

--
Thanks,
Jingbo

2023-01-12 04:13:18

by Jingbo Xu

Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume



On 1/12/23 12:06 AM, David Howells wrote:
> Hou Tao <[email protected]> wrote:
>
>> clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
>> + /*
>> + * Paired with barrier in wait_on_bit(). Check
>> + * wake_up_bit() and waitqueue_active() for details.
>> + */
>> + smp_mb__after_atomic();
>> wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
>
> What two values are you applying a partial ordering to?

Yes, Hou Tao has explained that a full barrier is needed here to avoid
potential reordering on the waker side.

I have also been looking into this recently, so I'd like to share my
thoughts in the hope that they give some insight :)

Without the barrier on the waker side, it may suffer from the following
race:

```
CPU0 - waker                    CPU1 - waiter

if (waitqueue_active(wq_head)) <-- find no wq_entry in wq_head list
    wake_up(wq_head);

                                for (;;) {
                                    prepare_to_wait(...);
                                    # add wq_entry into wq_head list

                                    if (@cond)  <-- @cond is false
                                        break;
                                    schedule(); <-- wq_entry still in
                                                    wq_head list,
                                                    wait for next wakeup
                                }
                                finish_wait(&wq_head, &wait);

@cond = true;
```

in which case the waiter misses the wakeup for one time.
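
The interleaving in the diagram can be replayed deterministically in a single thread. This is a toy model, not kernel code: `struct sim` and all helper names are hypothetical, and the "reordering" is modelled simply by running the waker's queue check before its store to the condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state shared by the simulated waker and waiter. */
struct sim {
	bool cond;	/* the condition the waiter sleeps on */
	bool queued;	/* waiter's entry is on the wait queue */
	bool asleep;	/* waiter went to sleep and has not been woken */
};

/* Waker: the waitqueue_active()-style check. */
static bool waker_check_queue(const struct sim *s) { return s->queued; }

/* Waker: make the condition true. */
static void waker_set_cond(struct sim *s) { s->cond = true; }

/* Waker: issue the wakeup only if the earlier check saw a waiter. */
static void waker_wake(struct sim *s, bool saw_waiter)
{
	if (saw_waiter)
		s->asleep = false;
}

/* Waiter: enqueue, test the condition, sleep if it is still false. */
static void waiter_prepare_and_check(struct sim *s)
{
	s->queued = true;
	if (!s->cond)
		s->asleep = true;
}

/* Queue check effectively runs before the store: the wakeup is lost. */
static bool replay_reordered(void)
{
	struct sim s = { false, false, false };
	bool saw = waker_check_queue(&s);	/* sees an empty queue */

	waiter_prepare_and_check(&s);		/* cond still false: sleeps */
	waker_set_cond(&s);
	waker_wake(&s, saw);			/* no wakeup issued */
	return s.asleep;
}

/* Store is ordered before the queue check: the waiter sees cond. */
static bool replay_ordered(void)
{
	struct sim s = { false, false, false };
	bool saw;

	waker_set_cond(&s);
	saw = waker_check_queue(&s);
	waiter_prepare_and_check(&s);		/* cond already true */
	waker_wake(&s, saw);
	return s.asleep;
}
```

In this model `replay_reordered()` leaves the waiter asleep, while `replay_ordered()` does not, which is the effect the barrier on the waker side is meant to guarantee.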

--
Thanks,
Jingbo

2023-01-12 06:43:23

by Hou Tao

Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume

Hi,

On 1/12/2023 11:58 AM, Jingbo Xu wrote:
>
> On 1/12/23 12:06 AM, David Howells wrote:
>> Hou Tao <[email protected]> wrote:
>>
>>> clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
>>> + /*
>>> + * Paired with barrier in wait_on_bit(). Check
>>> + * wake_up_bit() and waitqueue_active() for details.
>>> + */
>>> + smp_mb__after_atomic();
>>> wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
>> What two values are you applying a partial ordering to?
> Yes, Hou Tao has explained that a full barrier is needed here to avoid
> potential reordering on the waker side.
>
> I have also been looking into this recently, so I'd like to share my
> thoughts in the hope that they give some insight :)
>
> Without the barrier on the waker side, it may suffer from the following
> race:
>
> ```
> CPU0 - waker                    CPU1 - waiter
>
> if (waitqueue_active(wq_head)) <-- find no wq_entry in wq_head list
>     wake_up(wq_head);
>
>                                 for (;;) {
>                                     prepare_to_wait(...);
>                                     # add wq_entry into wq_head list
>
>                                     if (@cond)  <-- @cond is false
>                                         break;
>                                     schedule(); <-- wq_entry still in
>                                                     wq_head list,
>                                                     wait for next wakeup
>                                 }
>                                 finish_wait(&wq_head, &wait);
>
> @cond = true;
> ```
>
> in which case the waiter misses the wakeup for one time.
Thanks for the detailed annotation. It is exactly what I was trying to say but failed to.
>

2023-01-12 06:48:09

by Hou Tao

Subject: Re: [PATCH v2 1/2] fscache: Use wait_on_bit() to wait for the freeing of relinquished volume

Hi,

On 1/12/2023 11:47 AM, Jingbo Xu wrote:
>
> On 12/26/22 6:33 PM, Hou Tao wrote:
>> From: Hou Tao <[email protected]>
>>
>> The freeing of relinquished volume will wake up the pending volume
>> acquisition by using wake_up_bit(), however it is mismatched with
>> wait_var_event() used in fscache_wait_on_volume_collision() and it will
>> never wake up the waiter in the wait-queue because these two functions
>> operate on different wait-queues.
>>
>> According to the implementation in fscache_wait_on_volume_collision(),
>> if the wake-up of pending acquisition is delayed longer than 20 seconds
>> (e.g., due to the delay of on-demand fd closing), the first
>> wait_var_event_timeout() will timeout and the following wait_var_event()
>> will hang forever as shown below:
>>
>> FS-Cache: Potential volume collision new=00000024 old=00000022
>> ......
>> INFO: task mount:1148 blocked for more than 122 seconds.
>> Not tainted 6.1.0-rc6+ #1
>> task:mount state:D stack:0 pid:1148 ppid:1
>> Call Trace:
>> <TASK>
>> __schedule+0x2f6/0xb80
>> schedule+0x67/0xe0
>> fscache_wait_on_volume_collision.cold+0x80/0x82
>> __fscache_acquire_volume+0x40d/0x4e0
>> erofs_fscache_register_volume+0x51/0xe0 [erofs]
>> erofs_fscache_register_fs+0x19c/0x240 [erofs]
>> erofs_fc_fill_super+0x746/0xaf0 [erofs]
>> vfs_get_super+0x7d/0x100
>> get_tree_nodev+0x16/0x20
>> erofs_fc_get_tree+0x20/0x30 [erofs]
>> vfs_get_tree+0x24/0xb0
>> path_mount+0x2fa/0xa90
>> do_mount+0x7c/0xa0
>> __x64_sys_mount+0x8b/0xe0
>> do_syscall_64+0x30/0x60
>> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>
>> Considering that wake_up_bit() is more selective, so fixing it by using
>                                                       ^
>                                                       fix
>> wait_on_bit() instead of wait_var_event() to wait for the freeing of
>> relinquished volume. In addition because waitqueue_active() is used in
>> wake_up_bit() and clear_bit() doesn't imply any memory barrier, so also
>> adding smp_mb__after_atomic() before wake_up_bit().
> ... doesn't imply any memory barrier, add ...
Thanks for the suggestions above. Will update in v3.
>
>> Fixes: 62ab63352350 ("fscache: Implement volume registration")
>> Signed-off-by: Hou Tao <[email protected]>
>
> Otherwise LGTM :)
>
> Reviewed-by: Jingbo Xu <[email protected]>
Thanks for the review.
>
>> ---
>> fs/fscache/volume.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
>> index ab8ceddf9efa..fc3dd3bc851d 100644
>> --- a/fs/fscache/volume.c
>> +++ b/fs/fscache/volume.c
>> @@ -141,13 +141,14 @@ static bool fscache_is_acquire_pending(struct fscache_volume *volume)
>> static void fscache_wait_on_volume_collision(struct fscache_volume *candidate,
>> unsigned int collidee_debug_id)
>> {
>> - wait_var_event_timeout(&candidate->flags,
>> - !fscache_is_acquire_pending(candidate), 20 * HZ);
>> + wait_on_bit_timeout(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
>> + TASK_UNINTERRUPTIBLE, 20 * HZ);
>> if (fscache_is_acquire_pending(candidate)) {
>> pr_notice("Potential volume collision new=%08x old=%08x",
>> candidate->debug_id, collidee_debug_id);
>> fscache_stat(&fscache_n_volumes_collision);
>> - wait_var_event(&candidate->flags, !fscache_is_acquire_pending(candidate));
>> + wait_on_bit(&candidate->flags, FSCACHE_VOLUME_ACQUIRE_PENDING,
>> + TASK_UNINTERRUPTIBLE);
>> }
>> }
>>
>> @@ -348,6 +349,11 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
>> if (fscache_volume_same(cursor, volume)) {
>> fscache_see_volume(cursor, fscache_volume_see_hash_wake);
>> clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
>> + /*
>> + * Paired with barrier in wait_on_bit(). Check
>> + * wake_up_bit() and waitqueue_active() for details.
>> + */
>> + smp_mb__after_atomic();
>> wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
>> return;
>> }