2023-05-09 02:04:05

by Tejun Heo

Subject: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

BACKGROUND
==========

When multiple work items are queued to a workqueue, their execution order
doesn't match the queueing order. They may get executed in any order and
simultaneously. When fully serialized execution - one by one in the queueing
order - is needed, an ordered workqueue should be used which can be created
with alloc_ordered_workqueue().
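
For illustration, a minimal sketch of the difference (hypothetical names;
assumes work_a and work_b are initialized work items):

  struct workqueue_struct *concurrent_wq, *ordered_wq;

  /* concurrent: up to 4 items may execute at once, in any order */
  concurrent_wq = alloc_workqueue("example-wq", WQ_UNBOUND, 4);

  /* ordered: items execute strictly one at a time, in queueing order */
  ordered_wq = alloc_ordered_workqueue("example-ordered-wq", 0);

  queue_work(ordered_wq, &work_a);      /* runs first */
  queue_work(ordered_wq, &work_b);      /* runs only after work_a completes */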

However, alloc_ordered_workqueue() was a later addition. Before it, an
ordered workqueue could be obtained by creating an UNBOUND workqueue with
@max_active==1. This originally was an implementation side-effect which was
broken by 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound
workqueues"). Because there were users that depended on the ordered
execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") made the workqueue allocation path implicitly promote UNBOUND
workqueues w/ @max_active==1 to ordered workqueues.

While this has worked okay, overloading the UNBOUND allocation interface
this way creates other issues. It's difficult to tell whether a given
workqueue actually needs to be ordered, and users that legitimately want a
minimum concurrency level unexpectedly get an ordered workqueue instead.
With planned UNBOUND workqueue updates to improve execution locality and the
growing prevalence of chiplet designs which can benefit from such
improvements, this isn't a state we want to be in forever.

This patch series audits all callsites that create an UNBOUND workqueue w/
@max_active==1 and converts them to alloc_ordered_workqueue() as necessary.

WHAT TO LOOK FOR
================

The conversions are from

alloc_workqueue(WQ_UNBOUND | flags, 1, args...)

to

alloc_ordered_workqueue(flags, args...)

which don't cause any functional changes. If you know that fully ordered
execution is not necessary, please let me know. I'll drop the conversion and
instead add a comment noting the fact to reduce confusion while the
conversion is in progress.

If you aren't fully sure, it's completely fine to let the conversion
through. The behavior will stay exactly the same and we can always
reconsider later.

As there are follow-up workqueue core changes, I'd really appreciate it if
the patch can be routed through the workqueue tree w/ your acks. Thanks.

v3: btrfs_alloc_workqueue() excluded again. The concurrency level of
workqueues allocated through btrfs_alloc_workqueue() may be dynamically
adjusted through thresh_exec_hook(), so they can't be ordered.

v2: btrfs_alloc_workqueue() updated too as suggested by Wang.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
---
fs/btrfs/disk-io.c | 2 +-
fs/btrfs/scrub.c | 6 ++++--
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59ea049fe7ee..32d08aed88b6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2217,7 +2217,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
fs_info->qgroup_rescan_workers =
btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
fs_info->discard_ctl.discard_workers =
- alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
+ alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);

if (!(fs_info->workers && fs_info->hipri_workers &&
fs_info->delalloc_workers && fs_info->flush_workers &&
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 836725a19661..d5401d7813a5 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2734,8 +2734,10 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
if (refcount_inc_not_zero(&fs_info->scrub_workers_refcnt))
return 0;

- scrub_workers = alloc_workqueue("btrfs-scrub", flags,
- is_dev_replace ? 1 : max_active);
+ if (is_dev_replace)
+ scrub_workers = alloc_ordered_workqueue("btrfs-scrub", flags);
+ else
+ scrub_workers = alloc_workqueue("btrfs-scrub", flags, max_active);
if (!scrub_workers)
goto fail_scrub_workers;

--
2.40.1


2023-05-09 15:43:02

by David Sterba

Subject: Re: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

On Mon, May 08, 2023 at 03:50:27PM -1000, Tejun Heo wrote:
> BACKGROUND
> ==========
>
> When multiple work items are queued to a workqueue, their execution order
> doesn't match the queueing order. They may get executed in any order and
> simultaneously. When fully serialized execution - one by one in the queueing
> order - is needed, an ordered workqueue should be used which can be created
> with alloc_ordered_workqueue().
>
> However, alloc_ordered_workqueue() was a later addition. Before it, an
> ordered workqueue could be obtained by creating an UNBOUND workqueue with
> @max_active==1. This originally was an implementation side-effect which was
> broken by 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound
> workqueues"). Because there were users that depended on the ordered
> execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
> ordered") made the workqueue allocation path implicitly promote UNBOUND
> workqueues w/ @max_active==1 to ordered workqueues.
>
> While this has worked okay, overloading the UNBOUND allocation interface
> this way creates other issues. It's difficult to tell whether a given
> workqueue actually needs to be ordered, and users that legitimately want a
> minimum concurrency level unexpectedly get an ordered workqueue instead.
> With planned UNBOUND workqueue updates to improve execution locality and the
> growing prevalence of chiplet designs which can benefit from such
> improvements, this isn't a state we want to be in forever.
>
> This patch series audits all callsites that create an UNBOUND workqueue w/
> @max_active==1 and converts them to alloc_ordered_workqueue() as necessary.
>
> WHAT TO LOOK FOR
> ================
>
> The conversions are from
>
> alloc_workqueue(WQ_UNBOUND | flags, 1, args...)
>
> to
>
> alloc_ordered_workqueue(flags, args...)
>
> which don't cause any functional changes. If you know that fully ordered
> execution is not necessary, please let me know. I'll drop the conversion and
> instead add a comment noting the fact to reduce confusion while the
> conversion is in progress.
>
> If you aren't fully sure, it's completely fine to let the conversion
> through. The behavior will stay exactly the same and we can always
> reconsider later.
>
> As there are follow-up workqueue core changes, I'd really appreciate it if
> the patch can be routed through the workqueue tree w/ your acks. Thanks.
>
> v3: btrfs_alloc_workqueue() excluded again. The concurrency level of
> workqueues allocated through btrfs_alloc_workqueue() may be dynamically
> adjusted through thresh_exec_hook(), so they can't be ordered.
>
> v2: btrfs_alloc_workqueue() updated too as suggested by Wang.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Wang Yugui <[email protected]>
> Cc: Chris Mason <[email protected]>
> Cc: Josef Bacik <[email protected]>
> Cc: David Sterba <[email protected]>
> Cc: [email protected]
> ---
> fs/btrfs/disk-io.c | 2 +-
> fs/btrfs/scrub.c | 6 ++++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 59ea049fe7ee..32d08aed88b6 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2217,7 +2217,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> fs_info->qgroup_rescan_workers =
> btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
> fs_info->discard_ctl.discard_workers =
> - alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
> + alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);
>
> if (!(fs_info->workers && fs_info->hipri_workers &&
> fs_info->delalloc_workers && fs_info->flush_workers &&

I think there are a few more conversions missing. There's a local flags
variable in btrfs_init_workqueues

2175 static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
2176 {
2177 u32 max_active = fs_info->thread_pool_size;
2178 unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;

And used like

2194 fs_info->fixup_workers =
2195 btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);

2213 fs_info->qgroup_rescan_workers =
2214 btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);

WQ_UNBOUND is not mentioned explicitly like for the "btrfs_discard"
workqueue. Patch v2 did the switch in btrfs_alloc_workqueue according
to the max_active/limit_active parameter, but this would affect all
queues, and not all of them need to be ordered.

In btrfs_resize_thread_pool, workqueue_set_max_active is called
directly or indirectly, so the max_active can be set to a user-defined
mount option value. Could this be a problem or trigger a warning? This
would lead to max_active==1 + WQ_UNBOUND.
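
For illustration, the pattern I mean looks roughly like this (simplified
sketch, not the exact btrfs code):

  /* created UNBOUND with a concurrency level from thread_pool= */
  wq = alloc_workqueue("example", WQ_UNBOUND | WQ_FREEZABLE, max_active);

  /* later, btrfs_resize_thread_pool can lower it, possibly down to 1 */
  workqueue_set_max_active(wq, new_pool_size);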

2023-05-09 16:03:19

by Tejun Heo

Subject: Re: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

Hello, David.

Thanks for taking a look.

On Tue, May 09, 2023 at 04:53:32PM +0200, David Sterba wrote:
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 59ea049fe7ee..32d08aed88b6 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -2217,7 +2217,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> > fs_info->qgroup_rescan_workers =
> > btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
> > fs_info->discard_ctl.discard_workers =
> > - alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
> > + alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);
> >
> > if (!(fs_info->workers && fs_info->hipri_workers &&
> > fs_info->delalloc_workers && fs_info->flush_workers &&
>
> I think there are a few more conversions missing. There's a local flags
> variable in btrfs_init_workqueues
>
> 2175 static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> 2176 {
> 2177 u32 max_active = fs_info->thread_pool_size;
> 2178 unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
>
> And used like
>
> 2194 fs_info->fixup_workers =
> 2195 btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
>
> 2213 fs_info->qgroup_rescan_workers =
> 2214 btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);

Right you are.

> WQ_UNBOUND is not mentioned explicitly like for the "btrfs_discard"
> workqueue. Patch v2 did the switch in btrfs_alloc_workqueue according
> to the max_active/limit_active parameter, but this would affect all
> queues, and not all of them need to be ordered.

The thresh mechanism which auto-adjusts max_active means that the workqueues
allocated by btrfs_alloc_workqueue() can't be ordered, right? When thresh is
smaller than DFT_THRESHOLD, the mechanism is disabled, but that looks like an
optimization.
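
For reference, the allocation-time check looks roughly like this (abbreviated
sketch of btrfs_alloc_workqueue()):

  if (thresh == 0)
          thresh = DFT_THRESHOLD;
  /* For low threshold, disabling threshold is a better choice */
  if (thresh < DFT_THRESHOLD) {
          ret->current_active = limit_active;
          ret->thresh = NO_THRESHOLD;
  } else {
          /* let concurrency grow on demand via thresh_exec_hook() */
          ret->current_active = 1;
          ret->thresh = thresh;
  }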

> In btrfs_resize_thread_pool, workqueue_set_max_active is called
> directly or indirectly, so the max_active can be set to a user-defined
> mount option value. Could this be a problem or trigger a warning? This
> would lead to max_active==1 + WQ_UNBOUND.

That's not a problem. The only thing we need to make sure is that the
workqueues which actually *must* be ordered use alloc_ordered_workqueue() as
they won't be implicitly treated as ordered in the future.

* The current patch converts two - fs_info->discard_ctl.discard_workers and
scrub_workers when @is_dev_replace is set. Do they actually need to be
ordered?

* As you pointed out, fs_info->fixup_workers and
fs_info->qgroup_rescan_workers are also currently implicitly ordered. Do
they actually need to be ordered?

Thanks.

--
tejun

2023-05-10 00:05:33

by Tejun Heo

Subject: Re: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

Hello, David.

On Wed, May 10, 2023 at 01:36:20AM +0200, David Sterba wrote:
...
> Yeah I think so but I'm not entirely sure. The ordering should not be
> required for any queue that starts with max_active > 1; there the
> parallelization and out-of-order processing are expected, and things are
> serialized or decided once the work is done.
>
> > > In btrfs_resize_thread_pool, workqueue_set_max_active is called
> > > directly or indirectly, so the max_active can be set to a user-defined
> > > mount option value. Could this be a problem or trigger a warning? This
> > > would lead to max_active==1 + WQ_UNBOUND.
> >
> > That's not a problem. The only thing we need to make sure is that the
> > workqueues which actually *must* be ordered use alloc_ordered_workqueue() as
> > they won't be implicitly treated as ordered in the future.
> >
> > * The current patch converts two - fs_info->discard_ctl.discard_workers and
> > scrub_workers when @is_dev_replace is set. Do they actually need to be
> > ordered?
> >
> > * As you pointed out, fs_info->fixup_workers and
> > fs_info->qgroup_rescan_workers are also currently implicitly ordered. Do
> > they actually need to be ordered?
>
> I think all of them somehow implicitly depend on the ordering. The
> replace process sequentially goes over a block group and copies blocks.
>
> The fixup process is quite obscure and we should preserve the semantics
> as much as possible. It has something to do with pages that get out of
> sync with the extent state without btrfs knowing. The chance that more
> such requests happen at the same time is low, but once it happens it can
> lead to corruptions.
>
> Quota rescan is by nature also a sequential process, but I think it
> does not need to be ordered; it's started from a higher-level context like
> enabling quotas or rescan, but there are also calls at remount time, so
> this makes it less clear.
>
> In summary, if the ordered queue could be used then I'd recommend doing
> so as the safe option.

I see. It seems rather error-prone to make workqueues implicitly ordered
in btrfs_alloc_workqueue(). I'll see if I can make it explicit and keep
all workqueues which are currently guaranteed to be ordered as ordered.

Thanks.

--
tejun

2023-05-10 00:12:32

by David Sterba

Subject: Re: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

On Tue, May 09, 2023 at 05:57:16AM -1000, Tejun Heo wrote:
> Hello, David.
>
> Thanks for taking a look.
>
> On Tue, May 09, 2023 at 04:53:32PM +0200, David Sterba wrote:
> > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > > index 59ea049fe7ee..32d08aed88b6 100644
> > > --- a/fs/btrfs/disk-io.c
> > > +++ b/fs/btrfs/disk-io.c
> > > @@ -2217,7 +2217,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> > > fs_info->qgroup_rescan_workers =
> > > btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
> > > fs_info->discard_ctl.discard_workers =
> > > - alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
> > > + alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);
> > >
> > > if (!(fs_info->workers && fs_info->hipri_workers &&
> > > fs_info->delalloc_workers && fs_info->flush_workers &&
> >
> > I think there are a few more conversions missing. There's a local flags
> > variable in btrfs_init_workqueues
> >
> > 2175 static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
> > 2176 {
> > 2177 u32 max_active = fs_info->thread_pool_size;
> > 2178 unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
> >
> > And used like
> >
> > 2194 fs_info->fixup_workers =
> > 2195 btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
> >
> > 2213 fs_info->qgroup_rescan_workers =
> > 2214 btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
>
> Right you are.
>
> > WQ_UNBOUND is not mentioned explicitly like for the "btrfs_discard"
> > workqueue. Patch v2 did the switch in btrfs_alloc_workqueue according
> > to the max_active/limit_active parameter, but this would affect all
> > queues, and not all of them need to be ordered.
>
> The thresh mechanism which auto-adjusts max_active means that the workqueues
> allocated by btrfs_alloc_workqueue() can't be ordered, right? When thresh is
> smaller than DFT_THRESHOLD, the mechanism is disabled, but that looks like an
> optimization.

Yeah I think so but I'm not entirely sure. The ordering should not be
required for any queue that starts with max_active > 1; there the
parallelization and out-of-order processing are expected, and things are
serialized or decided once the work is done.

> > In btrfs_resize_thread_pool, workqueue_set_max_active is called
> > directly or indirectly, so the max_active can be set to a user-defined
> > mount option value. Could this be a problem or trigger a warning? This
> > would lead to max_active==1 + WQ_UNBOUND.
>
> That's not a problem. The only thing we need to make sure is that the
> workqueues which actually *must* be ordered use alloc_ordered_workqueue() as
> they won't be implicitly treated as ordered in the future.
>
> * The current patch converts two - fs_info->discard_ctl.discard_workers and
> scrub_workers when @is_dev_replace is set. Do they actually need to be
> ordered?
>
> * As you pointed out, fs_info->fixup_workers and
> fs_info->qgroup_rescan_workers are also currently implicitly ordered. Do
> they actually need to be ordered?

I think all of them somehow implicitly depend on the ordering. The
replace process sequentially goes over a block group and copies blocks.

The fixup process is quite obscure and we should preserve the semantics
as much as possible. It has something to do with pages that get out of
sync with the extent state without btrfs knowing. The chance that more
such requests happen at the same time is low, but once it happens it can
lead to corruptions.

Quota rescan is by nature also a sequential process, but I think it
does not need to be ordered; it's started from a higher-level context like
enabling quotas or rescan, but there are also calls at remount time, so
this makes it less clear.

In summary, if the ordered queue could be used then I'd recommend doing
so as the safe option.

2023-05-25 23:46:03

by Tejun Heo

Subject: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

BACKGROUND
==========

When multiple work items are queued to a workqueue, their execution order
doesn't match the queueing order. They may get executed in any order and
simultaneously. When fully serialized execution - one by one in the queueing
order - is needed, an ordered workqueue should be used which can be created
with alloc_ordered_workqueue().

However, alloc_ordered_workqueue() was a later addition. Before it, an
ordered workqueue could be obtained by creating an UNBOUND workqueue with
@max_active==1. This originally was an implementation side-effect which was
broken by 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound
workqueues"). Because there were users that depended on the ordered
execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") made the workqueue allocation path implicitly promote UNBOUND
workqueues w/ @max_active==1 to ordered workqueues.

While this has worked okay, overloading the UNBOUND allocation interface
this way creates other issues. It's difficult to tell whether a given
workqueue actually needs to be ordered, and users that legitimately want a
minimum concurrency level unexpectedly get an ordered workqueue instead.
With planned UNBOUND workqueue updates to improve execution locality and the
growing prevalence of chiplet designs which can benefit from such
improvements, this isn't a state we want to be in forever.

This patch series audits all callsites that create an UNBOUND workqueue w/
@max_active==1 and converts them to alloc_ordered_workqueue() as necessary.

BTRFS
=====

* fs_info->scrub_workers initialized in scrub_workers_get() was setting
@max_active to 1 when @is_dev_replace is set and it seems that the
workqueue actually needs to be ordered if @is_dev_replace. Update the code
so that alloc_ordered_workqueue() is used if @is_dev_replace.

* fs_info->discard_ctl.discard_workers initialized in
btrfs_init_workqueues() was directly using alloc_workqueue() w/
@max_active==1. Converted to alloc_ordered_workqueue().

* fs_info->fixup_workers and fs_info->qgroup_rescan_workers initialized in
btrfs_init_workqueues() use btrfs's workqueue wrapper, btrfs_workqueue,
and are allocated with btrfs_alloc_workqueue().

btrfs_workqueue implements automatic @max_active adjustment which is
disabled when the specified max limit is below a certain threshold, so
calling btrfs_alloc_workqueue() with @limit_active==1 yields an ordered
workqueue whose @max_active won't be changed, as the auto-tuning is
disabled.

This is rather brittle in that nothing clearly indicates that the two
workqueues must be ordered or that btrfs_alloc_workqueue() must disable
auto-tuning when @limit_active==1.

This patch factors out the common btrfs_workqueue init code into
btrfs_init_workqueue() and adds an explicit btrfs_alloc_ordered_workqueue().
The two workqueues are converted to use the new ordered allocation
interface.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
---
Hello,

David, I think this is a bit too invasive to carry through the workqueue
tree. If this looks okay, can you please route it through the btrfs tree?

Thanks.

fs/btrfs/async-thread.c | 43 ++++++++++++++++++++++++++++++++++++++-----
fs/btrfs/async-thread.h | 3 +++
fs/btrfs/disk-io.c | 8 +++++---
fs/btrfs/scrub.c | 6 ++++--
4 files changed, 50 insertions(+), 10 deletions(-)

--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -71,6 +71,16 @@ bool btrfs_workqueue_normal_congested(co
return atomic_read(&wq->pending) > wq->thresh * 2;
}

+static void btrfs_init_workqueue(struct btrfs_workqueue *wq,
+ struct btrfs_fs_info *fs_info)
+{
+ wq->fs_info = fs_info;
+ atomic_set(&wq->pending, 0);
+ INIT_LIST_HEAD(&wq->ordered_list);
+ spin_lock_init(&wq->list_lock);
+ spin_lock_init(&wq->thres_lock);
+}
+
struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info,
const char *name, unsigned int flags,
int limit_active, int thresh)
@@ -80,9 +90,9 @@ struct btrfs_workqueue *btrfs_alloc_work
if (!ret)
return NULL;

- ret->fs_info = fs_info;
+ btrfs_init_workqueue(ret, fs_info);
+
ret->limit_active = limit_active;
- atomic_set(&ret->pending, 0);
if (thresh == 0)
thresh = DFT_THRESHOLD;
/* For low threshold, disabling threshold is a better choice */
@@ -106,9 +116,32 @@ struct btrfs_workqueue *btrfs_alloc_work
return NULL;
}

- INIT_LIST_HEAD(&ret->ordered_list);
- spin_lock_init(&ret->list_lock);
- spin_lock_init(&ret->thres_lock);
+ trace_btrfs_workqueue_alloc(ret, name);
+ return ret;
+}
+
+struct btrfs_workqueue *btrfs_alloc_ordered_workqueue(
+ struct btrfs_fs_info *fs_info, const char *name,
+ unsigned int flags)
+{
+ struct btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_KERNEL);
+
+ if (!ret)
+ return NULL;
+
+ btrfs_init_workqueue(ret, fs_info);
+
+ /* ordered workqueues don't allow @max_active adjustments */
+ ret->limit_active = 1;
+ ret->current_active = 1;
+ ret->thresh = NO_THRESHOLD;
+
+ ret->normal_wq = alloc_ordered_workqueue("btrfs-%s", flags, name);
+ if (!ret->normal_wq) {
+ kfree(ret);
+ return NULL;
+ }
+
trace_btrfs_workqueue_alloc(ret, name);
return ret;
}
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -31,6 +31,9 @@ struct btrfs_workqueue *btrfs_alloc_work
unsigned int flags,
int limit_active,
int thresh);
+struct btrfs_workqueue *btrfs_alloc_ordered_workqueue(
+ struct btrfs_fs_info *fs_info, const char *name,
+ unsigned int flags);
void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
btrfs_func_t ordered_func, btrfs_func_t ordered_free);
void btrfs_queue_work(struct btrfs_workqueue *wq,
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2177,6 +2177,7 @@ static int btrfs_init_workqueues(struct
{
u32 max_active = fs_info->thread_pool_size;
unsigned int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
+ unsigned int ordered_flags = WQ_MEM_RECLAIM | WQ_FREEZABLE;

fs_info->workers =
btrfs_alloc_workqueue(fs_info, "worker", flags, max_active, 16);
@@ -2196,7 +2197,7 @@ static int btrfs_init_workqueues(struct
btrfs_alloc_workqueue(fs_info, "cache", flags, max_active, 0);

fs_info->fixup_workers =
- btrfs_alloc_workqueue(fs_info, "fixup", flags, 1, 0);
+ btrfs_alloc_ordered_workqueue(fs_info, "fixup", ordered_flags);

fs_info->endio_workers =
alloc_workqueue("btrfs-endio", flags, max_active);
@@ -2215,9 +2216,10 @@ static int btrfs_init_workqueues(struct
btrfs_alloc_workqueue(fs_info, "delayed-meta", flags,
max_active, 0);
fs_info->qgroup_rescan_workers =
- btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
+ btrfs_alloc_ordered_workqueue(fs_info, "qgroup-rescan",
+ ordered_flags);
fs_info->discard_ctl.discard_workers =
- alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
+ alloc_ordered_workqueue("btrfs_discard", WQ_FREEZABLE);

if (!(fs_info->workers && fs_info->hipri_workers &&
fs_info->delalloc_workers && fs_info->flush_workers &&
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2734,8 +2734,10 @@ static noinline_for_stack int scrub_work
if (refcount_inc_not_zero(&fs_info->scrub_workers_refcnt))
return 0;

- scrub_workers = alloc_workqueue("btrfs-scrub", flags,
- is_dev_replace ? 1 : max_active);
+ if (is_dev_replace)
+ scrub_workers = alloc_ordered_workqueue("btrfs-scrub", flags);
+ else
+ scrub_workers = alloc_workqueue("btrfs-scrub", flags, max_active);
if (!scrub_workers)
goto fail_scrub_workers;


2023-05-26 13:09:42

by David Sterba

Subject: Re: [PATCH 08/13] btrfs: Use alloc_ordered_workqueue() to create ordered workqueues

On Thu, May 25, 2023 at 01:33:08PM -1000, Tejun Heo wrote:
> BACKGROUND
> ==========
>
> When multiple work items are queued to a workqueue, their execution order
> doesn't match the queueing order. They may get executed in any order and
> simultaneously. When fully serialized execution - one by one in the queueing
> order - is needed, an ordered workqueue should be used which can be created
> with alloc_ordered_workqueue().
>
> However, alloc_ordered_workqueue() was a later addition. Before it, an
> ordered workqueue could be obtained by creating an UNBOUND workqueue with
> @max_active==1. This originally was an implementation side-effect which was
> broken by 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound
> workqueues"). Because there were users that depended on the ordered
> execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
> ordered") made the workqueue allocation path implicitly promote UNBOUND
> workqueues w/ @max_active==1 to ordered workqueues.
>
> While this has worked okay, overloading the UNBOUND allocation interface
> this way creates other issues. It's difficult to tell whether a given
> workqueue actually needs to be ordered, and users that legitimately want a
> minimum concurrency level unexpectedly get an ordered workqueue instead.
> With planned UNBOUND workqueue updates to improve execution locality and the
> growing prevalence of chiplet designs which can benefit from such
> improvements, this isn't a state we want to be in forever.
>
> This patch series audits all callsites that create an UNBOUND workqueue w/
> @max_active==1 and converts them to alloc_ordered_workqueue() as necessary.
>
> BTRFS
> =====
>
> * fs_info->scrub_workers initialized in scrub_workers_get() was setting
> @max_active to 1 when @is_dev_replace is set and it seems that the
> workqueue actually needs to be ordered if @is_dev_replace. Update the code
> so that alloc_ordered_workqueue() is used if @is_dev_replace.
>
> * fs_info->discard_ctl.discard_workers initialized in
> btrfs_init_workqueues() was directly using alloc_workqueue() w/
> @max_active==1. Converted to alloc_ordered_workqueue().
>
> * fs_info->fixup_workers and fs_info->qgroup_rescan_workers initialized in
> btrfs_init_workqueues() use btrfs's workqueue wrapper, btrfs_workqueue,
> and are allocated with btrfs_alloc_workqueue().
>
> btrfs_workqueue implements automatic @max_active adjustment which is
> disabled when the specified max limit is below a certain threshold, so
> calling btrfs_alloc_workqueue() with @limit_active==1 yields an ordered
> workqueue whose @max_active won't be changed, as the auto-tuning is
> disabled.
>
> This is rather brittle in that nothing clearly indicates that the two
> workqueues must be ordered or that btrfs_alloc_workqueue() must disable
> auto-tuning when @limit_active==1.
>
> This patch factors out the common btrfs_workqueue init code into
> btrfs_init_workqueue() and adds an explicit btrfs_alloc_ordered_workqueue().
> The two workqueues are converted to use the new ordered allocation
> interface.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Wang Yugui <[email protected]>
> Cc: Chris Mason <[email protected]>
> Cc: Josef Bacik <[email protected]>
> Cc: David Sterba <[email protected]>
> Cc: [email protected]
> ---
> Hello,
>
> David, I think this is a bit too invasive to carry through the workqueue
> tree. If this looks okay, can you please route it through the btrfs tree?

Yes, and I actually prefer to take such patches via the btrfs tree unless
there's a strong dependency on other patches from another subsystem.
Thanks.