2024-02-03 15:44:12

by Waiman Long

Subject: [PATCH-wq v2 0/5] workqueue: Enable unbound cpumask update on ordered workqueues

v2:
- [v1] https://lore.kernel.org/all/[email protected]/
- Rebased on top of wq's for-v6.9 branch.
- Use the new pwq_tryinc_nr_active() mechanism to freeze the new
pwq of an ordered workqueue until the old pwq has been properly
drained to maintain ordering.
- Make rescuer follow changes in workqueue unbound cpumask as well
as its sysfs cpumask, if available.

Ordered workqueues do not currently follow changes made to the
global unbound cpumask because per-pool workqueue changes may break
the ordering guarantee. IOW, a work function in an ordered workqueue
may run on a cpuset-isolated CPU.

This series enables ordered workqueues to follow changes made to
the global unbound cpumask by temporarily freezing the newly allocated
pool_workqueue with the new frozen flag, which holds off execution of
newly queued work items until the old pwq has been properly flushed.

The cpumask of the rescuer task of each workqueue is also made to follow
changes in workqueue unbound cpumask as well as its sysfs cpumask,
if available.

Juri Lelli (1):
kernel/workqueue: Let rescuers follow unbound wq cpumask changes

Waiman Long (4):
workqueue: Skip __WQ_DESTROYING workqueues when updating global
unbound cpumask
workqueue: Enable unbound cpumask update on ordered workqueues
workqueue: Thaw frozen pwq in workqueue_apply_unbound_cpumask()
workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask

kernel/workqueue.c | 128 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 113 insertions(+), 15 deletions(-)

--
2.39.3



2024-02-03 15:44:26

by Waiman Long

Subject: [PATCH-wq v2 1/5] workqueue: Skip __WQ_DESTROYING workqueues when updating global unbound cpumask

Skip updating workqueues with __WQ_DESTROYING bit set when updating
global unbound cpumask to avoid unnecessary work and other complications.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ffb625db9771..7ef393f4012e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6313,7 +6313,7 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 	lockdep_assert_held(&wq_pool_mutex);
 
 	list_for_each_entry(wq, &workqueues, list) {
-		if (!(wq->flags & WQ_UNBOUND))
+		if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_DESTROYING))
 			continue;
 
 		/* creating multiple pwqs breaks ordering guarantee */
--
2.39.3


2024-02-03 15:44:57

by Waiman Long

Subject: [PATCH-wq v2 5/5] workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask

Commit 85f0ab43f9de ("kernel/workqueue: Bind rescuer to unbound
cpumask for WQ_UNBOUND") modified init_rescuer() to bind the rescuer of
an unbound workqueue to the cpumask in wq->unbound_attrs. However,
the unbound_attrs->cpumask of every workqueue is initialized to
cpu_possible_mask and is only changed if the workqueue has the WQ_SYSFS
flag, which exposes a cpumask sysfs file that users can write to. So
that commit doesn't achieve what it is intended to do.

If an unbound workqueue is created after wq_unbound_cpumask is modified
and there is no further unbound cpumask update after that, the unbound
rescuer will be bound to all CPUs unless the workqueue is created
with the WQ_SYSFS flag and a user explicitly modifies its cpumask
sysfs file. Fix this problem by binding directly to wq_unbound_cpumask
in init_rescuer().

Fixes: 85f0ab43f9de ("kernel/workqueue: Bind rescuer to unbound cpumask for WQ_UNBOUND")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 172f7299aa71..1e022899c13c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5149,7 +5149,7 @@ static int init_rescuer(struct workqueue_struct *wq)
 
 	wq->rescuer = rescuer;
 	if (wq->flags & WQ_UNBOUND)
-		kthread_bind_mask(rescuer->task, wq->unbound_attrs->cpumask);
+		kthread_bind_mask(rescuer->task, wq_unbound_cpumask);
 	else
 		kthread_bind_mask(rescuer->task, cpu_possible_mask);
 	wake_up_process(rescuer->task);
--
2.39.3


2024-02-03 15:45:02

by Waiman Long

Subject: [PATCH-wq v2 3/5] workqueue: Thaw frozen pwq in workqueue_apply_unbound_cpumask()

workqueue_apply_unbound_cpumask() cannot proceed with an ordered
workqueue if its dfl_pwq is still frozen. Just doing a sleep wait for
it to be thawed may not work in some cases if pwq_release_workfn() is
somehow prevented from being called due to resources (e.g. wq_pool_mutex)
that are held by its caller.

To break the logjam, we have to actively check if the frozen dfl_pwq
is ready to be thawed and call thaw_pwq() directly if so.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/workqueue.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f089e532758a..ee934c2c6ea8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6354,6 +6354,32 @@ void thaw_workqueues(void)
 }
 #endif /* CONFIG_FREEZER */
 
+/*
+ * Check the given ordered workqueue to see if its non-default pwq's have
+ * zero reference count and if so thaw the frozen default pwq.
+ *
+ * Return:
+ * %true if dfl_pwq has been thawed or %false otherwise.
+ */
+static bool ordered_workqueue_ref_check(struct workqueue_struct *wq)
+{
+	int refs = 0;
+	struct pool_workqueue *pwq;
+
+	if (!READ_ONCE(wq->dfl_pwq->frozen))
+		return true;
+	mutex_lock(&wq->mutex);
+	for_each_pwq(pwq, wq) {
+		if (pwq == wq->dfl_pwq)
+			continue;
+		refs += pwq->refcnt;
+	}
+	if (!refs)
+		thaw_pwq(wq->dfl_pwq);
+	mutex_unlock(&wq->mutex);
+	return !refs;
+}
+
 static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 {
 	LIST_HEAD(ctxs);
@@ -6378,12 +6404,12 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 
 		if (!(wq->flags & __WQ_ORDERED_EXPLICIT)) {
 			wq->flags &= ~__WQ_ORDERED;
-		} else if (pwq && pwq->frozen) {
+		} else if (pwq && !ordered_workqueue_ref_check(wq)) {
 			int i;
 
 			for (i = 0; i < 10; i++) {
 				msleep(10);
-				if (!pwq->frozen)
+				if (ordered_workqueue_ref_check(wq))
 					break;
 			}
 			if (WARN_ON_ONCE(pwq->frozen))
--
2.39.3


2024-02-03 15:48:19

by Waiman Long

Subject: [PATCH-wq v2 2/5] workqueue: Enable unbound cpumask update on ordered workqueues

Ordered workqueues do not currently follow changes made to the
global unbound cpumask because per-pool workqueue changes may break
the ordering guarantee. IOW, a work function in an ordered workqueue
may run on an isolated CPU.

This patch enables ordered workqueues to follow changes made to
the global unbound cpumask by temporarily freezing the newly allocated
pool_workqueue with the new frozen flag, which holds off execution of
newly queued work items until the old pwq has been properly flushed.

This enables ordered workqueues to follow the unbound cpumask changes
like other unbound workqueues at the expense of some delay in execution
of work functions during the transition period.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/workqueue.c | 93 +++++++++++++++++++++++++++++++++++++++-------
1 file changed, 80 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7ef393f4012e..f089e532758a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -242,6 +242,7 @@ struct pool_workqueue {
 	int			refcnt;		/* L: reference count */
 	int			nr_in_flight[WORK_NR_COLORS];
 						/* L: nr of in_flight works */
+	int			frozen;		/* L: temporarily frozen */
 
 	/*
 	 * nr_active management and WORK_STRUCT_INACTIVE:
@@ -1667,6 +1668,9 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 
 	lockdep_assert_held(&pool->lock);
 
+	if (pwq->frozen)
+		return false;
+
 	if (!nna) {
 		/* per-cpu workqueue, pwq->nr_active is sufficient */
 		obtained = pwq->nr_active < READ_ONCE(wq->max_active);
@@ -1747,6 +1751,21 @@ static bool pwq_activate_first_inactive(struct pool_workqueue *pwq, bool fill)
 	}
 }
 
+/**
+ * thaw_pwq - thaw a frozen pool_workqueue
+ * @pwq: pool_workqueue to be thawed
+ */
+static void thaw_pwq(struct pool_workqueue *pwq)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&pwq->pool->lock, flags);
+	pwq->frozen = false;
+	if (pwq_activate_first_inactive(pwq, true))
+		kick_pool(pwq->pool);
+	raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
+}
+
 /**
  * node_activate_pending_pwq - Activate a pending pwq on a wq_node_nr_active
  * @nna: wq_node_nr_active to activate a pending pwq for
@@ -4595,6 +4614,14 @@ static void pwq_release_workfn(struct kthread_work *work)
 		mutex_lock(&wq->mutex);
 		list_del_rcu(&pwq->pwqs_node);
 		is_last = list_empty(&wq->pwqs);
+
+		/*
+		 * For ordered workqueue with a frozen dfl_pwq, thaw it now.
+		 */
+		if (!is_last && (wq->flags & __WQ_ORDERED_EXPLICIT) &&
+		    wq->dfl_pwq->frozen)
+			thaw_pwq(wq->dfl_pwq);
+
 		mutex_unlock(&wq->mutex);
 	}
 
@@ -4758,10 +4785,30 @@ static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx)
 {
 	if (ctx) {
 		int cpu;
+		bool refcheck = false;
 
 		for_each_possible_cpu(cpu)
 			put_pwq_unlocked(ctx->pwq_tbl[cpu]);
+
+		/*
+		 * For ordered workqueue with a frozen dfl_pwq and a reference
+		 * count of 1 in ctx->dfl_pwq, it is highly likely that the
+		 * refcnt will become 0 after the final put_pwq(). Acquire
+		 * wq->mutex to ensure that the pwq won't be freed by
+		 * pwq_release_workfn() when we check pwq later.
+		 */
+		if ((ctx->wq->flags & __WQ_ORDERED_EXPLICIT) &&
+		    ctx->wq->dfl_pwq->frozen &&
+		    (ctx->dfl_pwq->refcnt == 1)) {
+			mutex_lock(&ctx->wq->mutex);
+			refcheck = true;
+		}
 		put_pwq_unlocked(ctx->dfl_pwq);
+		if (refcheck) {
+			if (!ctx->dfl_pwq->refcnt)
+				thaw_pwq(ctx->wq->dfl_pwq);
+			mutex_unlock(&ctx->wq->mutex);
+		}
 
 		free_workqueue_attrs(ctx->attrs);
 
@@ -4821,6 +4868,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->attrs = new_attrs;
 
+	/*
+	 * For initialized ordered workqueues, there is only one pwq (dfl_pwq).
+	 * Temporarily set the frozen flag of ctx->dfl_pwq to freeze execution
+	 * of newly queued work items until execution of older work items in
+	 * the old pwq has completed.
+	 */
+	if (!list_empty(&wq->pwqs) && (wq->flags & __WQ_ORDERED_EXPLICIT))
+		ctx->dfl_pwq->frozen = true;
+
 	ctx->wq = wq;
 	return ctx;
 
@@ -4861,13 +4917,8 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 	if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
 		return -EINVAL;
 
-	/* creating multiple pwqs breaks ordering guarantee */
-	if (!list_empty(&wq->pwqs)) {
-		if (WARN_ON(wq->flags & __WQ_ORDERED_EXPLICIT))
-			return -EINVAL;
-
+	if (!list_empty(&wq->pwqs) && !(wq->flags & __WQ_ORDERED_EXPLICIT))
 		wq->flags &= ~__WQ_ORDERED;
-	}
 
 	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (IS_ERR(ctx))
@@ -6316,11 +6367,28 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 		if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_DESTROYING))
 			continue;
 
-		/* creating multiple pwqs breaks ordering guarantee */
+		/*
+		 * We does not support changing cpumask of an ordered workqueue
+		 * again before the previous cpumask change is completed.
+		 * Sleep up to 100ms in 10ms interval to allow previous
+		 * operation to complete and skip it if not done by then.
+		 */
 		if (!list_empty(&wq->pwqs)) {
-			if (wq->flags & __WQ_ORDERED_EXPLICIT)
-				continue;
-			wq->flags &= ~__WQ_ORDERED;
+			struct pool_workqueue *pwq = wq->dfl_pwq;
+
+			if (!(wq->flags & __WQ_ORDERED_EXPLICIT)) {
+				wq->flags &= ~__WQ_ORDERED;
+			} else if (pwq && pwq->frozen) {
+				int i;
+
+				for (i = 0; i < 10; i++) {
+					msleep(10);
+					if (!pwq->frozen)
+						break;
+				}
+				if (WARN_ON_ONCE(pwq->frozen))
+					continue;
+			}
 		}
 
 		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
@@ -6836,9 +6904,8 @@ int workqueue_sysfs_register(struct workqueue_struct *wq)
 	int ret;
 
 	/*
-	 * Adjusting max_active or creating new pwqs by applying
-	 * attributes breaks ordering guarantee. Disallow exposing ordered
-	 * workqueues.
+	 * Adjusting max_active breaks ordering guarantee. Disallow exposing
+	 * ordered workqueues.
 	 */
 	if (WARN_ON(wq->flags & __WQ_ORDERED_EXPLICIT))
 		return -EINVAL;
--
2.39.3


2024-02-03 15:48:39

by Waiman Long

Subject: [PATCH-wq v2 4/5] kernel/workqueue: Let rescuers follow unbound wq cpumask changes

From: Juri Lelli <[email protected]>

When workqueue cpumask changes are committed, the affinity of the
associated rescuer (if one exists) is not touched, and this might be a
problem down the line for isolated setups.

Make sure the rescuer's affinity is updated every time a workqueue
cpumask changes, so that rescuers can't break isolation.

[longman: set_cpus_allowed_ptr() will block until the designated task
is enqueued on an allowed CPU, no wake_up_process() needed. Also use
the unbound_effective_cpumask() helper as suggested by Tejun.]

Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
---
kernel/workqueue.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ee934c2c6ea8..172f7299aa71 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4905,6 +4905,11 @@ static void apply_wqattrs_commit(struct apply_wqattrs_ctx *ctx)
 	/* update node_nr_active->max */
 	wq_update_node_max_active(ctx->wq, -1);
 
+	/* rescuer needs to respect wq cpumask changes */
+	if (ctx->wq->rescuer)
+		set_cpus_allowed_ptr(ctx->wq->rescuer->task,
+				     unbound_effective_cpumask(ctx->wq));
+
 	mutex_unlock(&ctx->wq->mutex);
 }
 
--
2.39.3


2024-02-04 10:15:33

by kernel test robot

Subject: Re: [PATCH-wq v2 3/5] workqueue: Thaw frozen pwq in workqueue_apply_unbound_cpumask()

Hi Waiman,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tj-wq/for-next]
[also build test WARNING on next-20240202]
[cannot apply to linus/master v6.8-rc2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/workqueue-Skip-__WQ_DESTROYING-workqueues-when-updating-global-unbound-cpumask/20240203-234626
base: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-next
patch link: https://lore.kernel.org/r/20240203154334.791910-4-longman%40redhat.com
patch subject: [PATCH-wq v2 3/5] workqueue: Thaw frozen pwq in workqueue_apply_unbound_cpumask()
config: x86_64-randconfig-122-20240204 (https://download.01.org/0day-ci/archive/20240204/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240204/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

sparse warnings: (new ones prefixed by >>)
kernel/workqueue.c:361:40: sparse: sparse: duplicate [noderef]
kernel/workqueue.c:361:40: sparse: sparse: multiple address spaces given: __percpu & __rcu
>> kernel/workqueue.c:6373:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/workqueue.c:6373:25: sparse: struct pool_workqueue *
kernel/workqueue.c:6373:25: sparse: struct pool_workqueue [noderef] __rcu *

vim +6373 kernel/workqueue.c

  6356	
  6357	/*
  6358	 * Check the given ordered workqueue to see if its non-default pwq's have
  6359	 * zero reference count and if so thaw the frozen default pwq.
  6360	 *
  6361	 * Return:
  6362	 * %true if dfl_pwq has been thawed or %false otherwise.
  6363	 */
  6364	static bool ordered_workqueue_ref_check(struct workqueue_struct *wq)
  6365	{
  6366		int refs = 0;
  6367		struct pool_workqueue *pwq;
  6368	
  6369		if (!READ_ONCE(wq->dfl_pwq->frozen))
  6370			return true;
  6371		mutex_lock(&wq->mutex);
  6372		for_each_pwq(pwq, wq) {
> 6373			if (pwq == wq->dfl_pwq)
  6374				continue;
  6375			refs += pwq->refcnt;
  6376		}
  6377		if (!refs)
  6378			thaw_pwq(wq->dfl_pwq);
  6379		mutex_unlock(&wq->mutex);
  6380		return !refs;
  6381	}
  6382	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-02-04 16:07:32

by Waiman Long

Subject: Re: [PATCH-wq v2 3/5] workqueue: Thaw frozen pwq in workqueue_apply_unbound_cpumask()

On 2/4/24 05:15, kernel test robot wrote:
> Hi Waiman,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on tj-wq/for-next]
> [also build test WARNING on next-20240202]
> [cannot apply to linus/master v6.8-rc2]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/workqueue-Skip-__WQ_DESTROYING-workqueues-when-updating-global-unbound-cpumask/20240203-234626
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-next
> patch link: https://lore.kernel.org/r/20240203154334.791910-4-longman%40redhat.com
> patch subject: [PATCH-wq v2 3/5] workqueue: Thaw frozen pwq in workqueue_apply_unbound_cpumask()
> config: x86_64-randconfig-122-20240204 (https://download.01.org/0day-ci/archive/20240204/[email protected]/config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240204/[email protected]/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
>
> sparse warnings: (new ones prefixed by >>)
> kernel/workqueue.c:361:40: sparse: sparse: duplicate [noderef]
> kernel/workqueue.c:361:40: sparse: sparse: multiple address spaces given: __percpu & __rcu
>>> kernel/workqueue.c:6373:25: sparse: sparse: incompatible types in comparison expression (different address spaces):
> kernel/workqueue.c:6373:25: sparse: struct pool_workqueue *
> kernel/workqueue.c:6373:25: sparse: struct pool_workqueue [noderef] __rcu *

OK, I didn't realize that a __rcu tag was added to dfl_pwq in 6.9.
Will change the patch series to use the appropriate helpers to avoid
this kind of warning.
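
Something along the lines of this sketch is what I have in mind (helper
name and exact lockdep condition TBD in v3):

static struct pool_workqueue *wq_dfl_pwq(struct workqueue_struct *wq)
{
	/* wq->dfl_pwq is __rcu in 6.9; dereference through the RCU API */
	return rcu_dereference_check(wq->dfl_pwq,
				     lockdep_is_held(&wq->mutex));
}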

Cheers,
Longman


2024-02-05 09:48:44

by Juri Lelli

Subject: Re: [PATCH-wq v2 0/5] workqueue: Enable unbound cpumask update on ordered workqueues

Hi,

On 03/02/24 10:43, Waiman Long wrote:
> v2:
> - [v1] https://lore.kernel.org/all/[email protected]/
> - Rebased on top of wq's for-v6.9 branch.
> - Use the new pwq_tryinc_nr_active() mechanism to freeze the new
> pwq of an ordered workqueue until the old pwq has been properly
> drained to maintain ordering.
> - Make rescuer follow changes in workqueue unbound cpumask as well
> as its sysfs cpumask, if available.
>
> Ordered workqueues do not currently follow changes made to the
> global unbound cpumask because per-pool workqueue changes may break
> the ordering guarantee. IOW, a work function in an ordered workqueue
> may run on a cpuset-isolated CPU.
>
> This series enables ordered workqueues to follow changes made to
> the global unbound cpumask by temporarily freezing the newly allocated
> pool_workqueue with the new frozen flag, which holds off execution of
> newly queued work items until the old pwq has been properly flushed.
>
> The cpumask of the rescuer task of each workqueue is also made to follow
> changes in workqueue unbound cpumask as well as its sysfs cpumask,
> if available.

From a testing point of view this now looks good to me.

Tested-by: Juri Lelli <[email protected]>

Best,
Juri


2024-02-05 18:00:42

by Tejun Heo

Subject: Re: [PATCH-wq v2 1/5] workqueue: Skip __WQ_DESTROYING workqueues when updating global unbound cpumask

Hello,

On Sat, Feb 03, 2024 at 10:43:30AM -0500, Waiman Long wrote:
> Skip updating workqueues with __WQ_DESTROYING bit set when updating
> global unbound cpumask to avoid unnecessary work and other complications.
>
> Signed-off-by: Waiman Long <[email protected]>

Applied to wq/for-6.9.

Thanks.

--
tejun

2024-02-05 21:18:02

by Tejun Heo

Subject: Re: [PATCH-wq v2 2/5] workqueue: Enable unbound cpumask update on ordered workqueues

Hello, Waiman.

On Sat, Feb 03, 2024 at 10:43:31AM -0500, Waiman Long wrote:
> @@ -242,6 +242,7 @@ struct pool_workqueue {
> int refcnt; /* L: reference count */
> int nr_in_flight[WORK_NR_COLORS];
> /* L: nr of in_flight works */
> + int frozen; /* L: temporarily frozen */

Can we say "plugged" instead? This is a bit confusing because freeze/thaw
are used by PM.

> @@ -1667,6 +1668,9 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
>
> lockdep_assert_held(&pool->lock);
>
> + if (pwq->frozen)
> + return false;

Given that this only applies to unbound workqueues, this can be after the if
(!nna) block, right? And also maybe add unlikely()?
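
IOW, something like the following completely untested sketch, with the
test moved below the per-cpu fast path:

	if (!nna) {
		/* per-cpu workqueue, pwq->nr_active is sufficient */
		obtained = pwq->nr_active < READ_ONCE(wq->max_active);
		goto out;
	}

	/* only unbound pwqs get frozen; don't start new work on them */
	if (unlikely(pwq->frozen))
		return false;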

> +/**
> + * thaw_pwq - thaw a frozen pool_workqueue
> + * @pwq: pool_workqueue to be thawed
> + */
> +static void thaw_pwq(struct pool_workqueue *pwq)

and "unplug".

> @@ -4595,6 +4614,14 @@ static void pwq_release_workfn(struct kthread_work *work)
> mutex_lock(&wq->mutex);
> list_del_rcu(&pwq->pwqs_node);
> is_last = list_empty(&wq->pwqs);
> +
> + /*
> + * For ordered workqueue with a frozen dfl_pwq, thaw it now.
> + */
> + if (!is_last && (wq->flags & __WQ_ORDERED_EXPLICIT) &&
> + wq->dfl_pwq->frozen)
> + thaw_pwq(wq->dfl_pwq);

It should thaw the last pwq in the wq->pwqs list, not the current dfl_pwq,
right? Imagine an ordered workqueue went through two unbound mask changes.
There will be three pwq's. Let's say there are work items queued on all
three. A, B, C are the pwqs in oldest to newest order. 0, 1, 2.. are work
items in the queueing order.

dfl_pwq --\
          |
          v
pwqs ---> C -> B -> A (oldest)
          |    |    |
          3    2    0
               |
               1

dfl_pwq should point to the latest pwq as that's where the new work items
should be queued. But, execution should only be allowed in the oldest pwq to
maintain execution order.

I think maybe a simpler way to express the logic is "always keep only the
last pwq in the pwq list unplugged".

Also, we likely want to test __WQ_ORDERED as this should also apply to
implicitly ordered workqueues (they should go away eventually but still with
us for now). The problem is that __WQ_ORDERED can go off and when it goes of
it'd need to unplug all the pwqs and so on.

> @@ -4758,10 +4785,30 @@ static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx)
> {
> if (ctx) {
> int cpu;
> + bool refcheck = false;
>
> for_each_possible_cpu(cpu)
> put_pwq_unlocked(ctx->pwq_tbl[cpu]);
> +
> + /*
> + * For ordered workqueue with a frozen dfl_pwq and a reference
> + * count of 1 in ctx->dfl_pwq, it is highly likely that the
> + * refcnt will become 0 after the final put_pwq(). Acquire
> + * wq->mutex to ensure that the pwq won't be freed by
> + * pwq_release_workfn() when we check pwq later.
> + */
> + if ((ctx->wq->flags & __WQ_ORDERED_EXPLICIT) &&
> + ctx->wq->dfl_pwq->frozen &&
> + (ctx->dfl_pwq->refcnt == 1)) {
> + mutex_lock(&ctx->wq->mutex);
> + refcheck = true;
> + }
> put_pwq_unlocked(ctx->dfl_pwq);
> + if (refcheck) {
> + if (!ctx->dfl_pwq->refcnt)
> + thaw_pwq(ctx->wq->dfl_pwq);
> + mutex_unlock(&ctx->wq->mutex);
> + }

I don't think it's a worthwhile optimization to not grab wq->mutex in
apply_wqattrs path. Can you please simplify the code?
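
e.g. something like this untested sketch, with the same logic but
unconditional locking:

		mutex_lock(&ctx->wq->mutex);
		put_pwq_unlocked(ctx->dfl_pwq);
		if ((ctx->wq->flags & __WQ_ORDERED_EXPLICIT) &&
		    !ctx->dfl_pwq->refcnt && ctx->wq->dfl_pwq->frozen)
			thaw_pwq(ctx->wq->dfl_pwq);
		mutex_unlock(&ctx->wq->mutex);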

> @@ -6316,11 +6367,28 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
> if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_DESTROYING))
> continue;
>
> - /* creating multiple pwqs breaks ordering guarantee */
> + /*
> + * We does not support changing cpumask of an ordered workqueue
         ^
         do

> + * again before the previous cpumask change is completed.

Maybe explain why?

> + * Sleep up to 100ms in 10ms interval to allow previous
> + * operation to complete and skip it if not done by then.
> + */
> if (!list_empty(&wq->pwqs)) {
> - if (wq->flags & __WQ_ORDERED_EXPLICIT)
> - continue;
> - wq->flags &= ~__WQ_ORDERED;
> + struct pool_workqueue *pwq = wq->dfl_pwq;
> +
> + if (!(wq->flags & __WQ_ORDERED_EXPLICIT)) {
> + wq->flags &= ~__WQ_ORDERED;
> + } else if (pwq && pwq->frozen) {
> + int i;
> +
> + for (i = 0; i < 10; i++) {
> + msleep(10);
> + if (!pwq->frozen)
> + break;
> + }
> + if (WARN_ON_ONCE(pwq->frozen))
> + continue;
> + }

I don't understand this. Why do we need to block further changes?

Thanks.

--
tejun