2018-02-02 17:57:14

by Tejun Heo

Subject: [PATCH 1/2] bdi: make sure congestion states are clear on free

FUSE has a bug where it fails to clear congestion states if a
connection gets aborted while congested, which can leave
nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
wait spuriously.

While the bdi owner, FUSE, is primarily responsible for clearing
congestion states before destroying bdi_writebacks, bdi layer can
ensure that congestion states are not leaked beyond bdi_writeback
lifecycle.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Joshua Miller <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
---
include/linux/backing-dev.h | 14 +++++++++++++-
mm/backing-dev.c | 2 +-
2 files changed, 14 insertions(+), 2 deletions(-)

--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -220,6 +220,18 @@ static inline int bdi_sched_wait(void *w
 	return 0;
 }

+static inline void __wb_congested_free(struct bdi_writeback_congested *congested)
+{
+	/*
+	 * Make sure congestion states are cleared before freeing to avoid
+	 * nr_wb_congested() corruption which can lead to misbehaving
+	 * wait_iff_congested().
+	 */
+	clear_wb_congested(congested, BLK_RW_SYNC);
+	clear_wb_congested(congested, BLK_RW_ASYNC);
+	kfree(congested);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK

 struct bdi_writeback_congested *
@@ -409,7 +421,7 @@ wb_congested_get_create(struct backing_d
 static inline void wb_congested_put(struct bdi_writeback_congested *congested)
 {
 	if (atomic_dec_and_test(&congested->refcnt))
-		kfree(congested);
+		__wb_congested_free(congested);
 }

 static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi)
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -509,7 +509,7 @@ void wb_congested_put(struct bdi_writeba
 	}

 	spin_unlock_irqrestore(&cgwb_lock, flags);
-	kfree(congested);
+	__wb_congested_free(congested);
 }

 static void cgwb_release_workfn(struct work_struct *work)


2018-02-02 17:56:23

by Tejun Heo

Subject: [PATCH 2/2] FUSE: fix congested state leak on aborted connections

If a connection gets aborted while congested, FUSE can leave
nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
wait spuriously which can lead to severe performance degradation.

The leak is caused by gating congestion state clearing with
fc->connected test in request_end(). This was added way back in 2009
by 26c3679101db ("fuse: destroy bdi on umount"). While the commit
description doesn't explain why the test was added, it most likely was
to avoid dereferencing bdi after it got destroyed.

Since then, bdi lifetime rules have changed many times and now we're
always guaranteed to have access to the bdi while the superblock is
alive (fc->sb).

Drop fc->connected conditional to avoid leaking congestion states.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Joshua Miller <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
---
fs/fuse/dev.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -381,8 +381,7 @@ static void request_end(struct fuse_conn
 		if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
 			wake_up(&fc->blocked_waitq);

-		if (fc->num_background == fc->congestion_threshold &&
-		    fc->connected && fc->sb) {
+		if (fc->num_background == fc->congestion_threshold && fc->sb) {
 			clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);
 			clear_bdi_congested(fc->sb->s_bdi, BLK_RW_ASYNC);
 		}

2018-02-05 23:04:28

by Johannes Weiner

Subject: Re: [PATCH 1/2] bdi: make sure congestion states are clear on free

On Fri, Feb 02, 2018 at 09:53:28AM -0800, Tejun Heo wrote:
> FUSE has a bug where it fails to clear congestion states if a
> connection gets aborted while congested, which can leave
> nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
> wait spuriously.
>
> While the bdi owner, FUSE, is primarily responsible for clearing
> congestion states before destroying bdi_writebacks, bdi layer can
> ensure that congestion states are not leaked beyond bdi_writeback
> lifecycle.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Joshua Miller <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: [email protected]

Acked-by: Johannes Weiner <[email protected]>

2018-02-06 16:21:07

by Jan Kara

Subject: Re: [PATCH 1/2] bdi: make sure congestion states are clear on free

On Fri 02-02-18 09:53:28, Tejun Heo wrote:
> FUSE has a bug where it fails to clear congestion states if a
> connection gets aborted while congested, which can leave
> nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
> wait spuriously.
>
> While the bdi owner, FUSE, is primarily responsible for clearing
> congestion states before destroying bdi_writebacks, bdi layer can
> ensure that congestion states are not leaked beyond bdi_writeback
> lifecycle.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Joshua Miller <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: [email protected]

Looks good. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> include/linux/backing-dev.h | 14 +++++++++++++-
> mm/backing-dev.c | 2 +-
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -220,6 +220,18 @@ static inline int bdi_sched_wait(void *w
> return 0;
> }
>
> +static inline void __wb_congested_free(struct bdi_writeback_congested *congested)
> +{
> + /*
> + * Make sure congestion states are cleared before freeing to avoid
> + * nr_wb_congested() corruption which can lead to misbehaving
> + * wait_iff_congested().
> + */
> + clear_wb_congested(congested, BLK_RW_SYNC);
> + clear_wb_congested(congested, BLK_RW_ASYNC);
> + kfree(congested);
> +}
> +
> #ifdef CONFIG_CGROUP_WRITEBACK
>
> struct bdi_writeback_congested *
> @@ -409,7 +421,7 @@ wb_congested_get_create(struct backing_d
> static inline void wb_congested_put(struct bdi_writeback_congested *congested)
> {
> if (atomic_dec_and_test(&congested->refcnt))
> - kfree(congested);
> + __wb_congested_free(congested);
> }
>
> static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi)
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -509,7 +509,7 @@ void wb_congested_put(struct bdi_writeba
> }
>
> spin_unlock_irqrestore(&cgwb_lock, flags);
> - kfree(congested);
> + __wb_congested_free(congested);
> }
>
> static void cgwb_release_workfn(struct work_struct *work)
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-02-06 16:26:36

by Jan Kara

Subject: Re: [PATCH 2/2] FUSE: fix congested state leak on aborted connections

On Fri 02-02-18 09:54:14, Tejun Heo wrote:
> If a connection gets aborted while congested, FUSE can leave
> nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
> wait spuriously which can lead to severe performance degradation.
>
> The leak is caused by gating congestion state clearing with
> fc->connected test in request_end(). This was added way back in 2009
> by 26c3679101db ("fuse: destroy bdi on umount"). While the commit
> description doesn't explain why the test was added, it most likely was
> to avoid dereferencing bdi after it got destroyed.
>
> Since then, bdi lifetime rules have changed many times and now we're
> always guaranteed to have access to the bdi while the superblock is
> alive (fc->sb).
>
> Drop fc->connected conditional to avoid leaking congestion states.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Joshua Miller <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Miklos Szeredi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: [email protected]

Yeah, this should be fine AFAICT but my knowledge of FUSE is very cursory.
Anyway:

Acked-by: Jan Kara <[email protected]>

Honza

> ---
> fs/fuse/dev.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -381,8 +381,7 @@ static void request_end(struct fuse_conn
> if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
> wake_up(&fc->blocked_waitq);
>
> - if (fc->num_background == fc->congestion_threshold &&
> - fc->connected && fc->sb) {
> + if (fc->num_background == fc->congestion_threshold && fc->sb) {
> clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);
> clear_bdi_congested(fc->sb->s_bdi, BLK_RW_ASYNC);
> }
--
Jan Kara <[email protected]>
SUSE Labs, CR

2018-05-30 14:23:11

by Miklos Szeredi

Subject: Re: [PATCH 2/2] FUSE: fix congested state leak on aborted connections

On Tue, Feb 6, 2018 at 5:25 PM, Jan Kara <[email protected]> wrote:
> On Fri 02-02-18 09:54:14, Tejun Heo wrote:
>> If a connection gets aborted while congested, FUSE can leave
>> nr_wb_congested[] stuck until reboot causing wait_iff_congested() to
>> wait spuriously which can lead to severe performance degradation.
>>
>> The leak is caused by gating congestion state clearing with
>> fc->connected test in request_end(). This was added way back in 2009
>> by 26c3679101db ("fuse: destroy bdi on umount"). While the commit
>> description doesn't explain why the test was added, it most likely was
>> to avoid dereferencing bdi after it got destroyed.
>>
>> Since then, bdi lifetime rules have changed many times and now we're
>> always guaranteed to have access to the bdi while the superblock is
>> alive (fc->sb).
>>
>> Drop fc->connected conditional to avoid leaking congestion states.
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>> Reported-by: Joshua Miller <[email protected]>
>> Cc: Johannes Weiner <[email protected]>
>> Cc: Miklos Szeredi <[email protected]>
>> Cc: Jan Kara <[email protected]>
>> Cc: [email protected]
>
> Yeah, this should be fine AFAICT but my knowledge of FUSE is very cursory.
> Anyway:
>
> Acked-by: Jan Kara <[email protected]>

Can't say I fully understand how the global "is any bdi congested"
state is used in direct reclaim, but the patch is an obvious
improvement, so applied.

Thanks,
Miklos