2020-06-01 06:10:38

by Lai Jiangshan

Subject: [PATCH] workqueue: ensure all flush_work() completed when being destroyed

In the old days, worker threads were not shared among different
workqueues, and destroy_workqueue() used kthread_stop() to destroy
all workers before tearing down the workqueue structures.
kthread_stop() ensured that the scheduled (worker->scheduled)
work items and the linked work items queued by flush_work()
had completed.

For wq users, a workqueue being finished/unused means that all
queued work items have completed and all flush_work() calls have
returned; only then is it legitimate to pass the workqueue to
destroy_workqueue().

But commit
e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
made worker pools and workers shared among different workqueues,
and kthread_stop() is no longer used to synchronize the completion
of work items. destroy_workqueue() uses drain_workqueue() to drain
user work items, but the internal work items queued by flush_work()
are not drained because they have no color.

So a problem can occur when wq_barrier_func() calls
complete(&barr->done) and the woken-up wq-user code calls
destroy_workqueue(): destroy_workqueue() can run earlier than the
process_one_work() that does the put_pwq(), so the sanity check in
destroy_workqueue() sees the not-yet-released pwq->refcnt and warns
with a false positive.
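
For illustration, a minimal, hypothetical wq-user sketch of the
pattern that can trip the sanity check (my_wq, my_work and
my_teardown() are invented names, not part of this patch):

	/* assumes my_work has been queued on my_wq at some point */
	static struct workqueue_struct *my_wq;
	static struct work_struct my_work;

	static void my_teardown(void)
	{
		/*
		 * Returns as soon as wq_barrier_func() has completed
		 * barr->done; the barrier work item may still be inside
		 * process_one_work(), before its put_pwq().
		 */
		flush_work(&my_work);

		/*
		 * Can run before the worker drops its pwq reference, so
		 * the sanity check sees a pwq->refcnt that has not been
		 * put yet and emits a false-positive warning.
		 */
		destroy_workqueue(my_wq);
	}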

The problem could easily be fixed by removing WORK_NO_COLOR and
making the internal work items queued by flush_work() inherit the
color of the work items being flushed, but that would undo the
design and the benefits of WORK_NO_COLOR.

The color-based flush_workqueue() machinery is flexible enough that
it can also be adapted to flush WORK_NO_COLOR work items. That is
the approach taken by this patch, which introduces flush_no_color()
to flush all WORK_NO_COLOR work items.

Signed-off-by: Lai Jiangshan <[email protected]>
---
include/linux/workqueue.h | 5 +--
kernel/workqueue.c | 72 +++++++++++++++++++++++++++++++++------
2 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 8b505d22fc0e..85aea89fb6fc 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -55,8 +55,9 @@ enum {
* The last color is no color used for works which don't
* participate in workqueue flushing.
*/
- WORK_NR_COLORS = (1 << WORK_STRUCT_COLOR_BITS) - 1,
- WORK_NO_COLOR = WORK_NR_COLORS,
+ WORK_NR_COLORS = (1 << WORK_STRUCT_COLOR_BITS),
+ WORK_NR_FLUSH_COLORS = (WORK_NR_COLORS - 1),
+ WORK_NO_COLOR = WORK_NR_FLUSH_COLORS,

/* not bound to any CPU, prefer the local CPU */
WORK_CPU_UNBOUND = NR_CPUS,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7a1fc9fe6314..cf281e273b28 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -585,7 +585,7 @@ static int get_work_color(struct work_struct *work)

static int work_next_color(int color)
{
- return (color + 1) % WORK_NR_COLORS;
+ return (color + 1) % WORK_NR_FLUSH_COLORS;
}

/*
@@ -1167,19 +1167,18 @@ static void pwq_activate_first_delayed(struct pool_workqueue *pwq)
*/
static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color)
{
- /* uncolored work items don't participate in flushing or nr_active */
- if (color == WORK_NO_COLOR)
- goto out_put;
+ /* uncolored work items don't participate in nr_active */
+ if (color != WORK_NO_COLOR) {
+ pwq->nr_active--;
+ if (!list_empty(&pwq->delayed_works)) {
+ /* one down, submit a delayed one */
+ if (pwq->nr_active < pwq->max_active)
+ pwq_activate_first_delayed(pwq);
+ }
+ }

pwq->nr_in_flight[color]--;

- pwq->nr_active--;
- if (!list_empty(&pwq->delayed_works)) {
- /* one down, submit a delayed one */
- if (pwq->nr_active < pwq->max_active)
- pwq_activate_first_delayed(pwq);
- }
-
/* is flush in progress and are we at the flushing tip? */
if (likely(pwq->flush_color != color))
goto out_put;
@@ -2682,6 +2681,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
}

debug_work_activate(&barr->work);
+ pwq->nr_in_flight[WORK_NO_COLOR]++;
insert_work(pwq, &barr->work, head,
work_color_to_flags(WORK_NO_COLOR) | linked);
}
@@ -2915,6 +2915,55 @@ void flush_workqueue(struct workqueue_struct *wq)
}
EXPORT_SYMBOL(flush_workqueue);

+/*
+ * Called from destroy_workqueue() to flush all uncompleted internal
+ * work items queued by flush_work().
+ */
+static void flush_no_color(struct workqueue_struct *wq)
+{
+ struct wq_flusher this_flusher = {
+ .list = LIST_HEAD_INIT(this_flusher.list),
+ .flush_color = -1,
+ .done = COMPLETION_INITIALIZER_ONSTACK_MAP(this_flusher.done, wq->lockdep_map),
+ };
+
+ lock_map_acquire(&wq->lockdep_map);
+ lock_map_release(&wq->lockdep_map);
+
+ mutex_lock(&wq->mutex);
+ if (WARN_ON_ONCE(wq->first_flusher))
+ goto out_unlock;
+
+ wq->first_flusher = &this_flusher;
+ this_flusher.flush_color = wq->work_color;
+ WARN_ON_ONCE(wq->flush_color != this_flusher.flush_color);
+ WARN_ON_ONCE(!list_empty(&wq->flusher_overflow));
+ WARN_ON_ONCE(!list_empty(&wq->flusher_queue));
+
+ if (!flush_workqueue_prep_pwqs(wq, WORK_NO_COLOR, -1)) {
+ /* nothing to flush, done */
+ WRITE_ONCE(wq->first_flusher, NULL);
+ goto out_unlock;
+ }
+
+ check_flush_dependency(wq, NULL);
+
+ mutex_unlock(&wq->mutex);
+
+ wait_for_completion(&this_flusher.done);
+
+ mutex_lock(&wq->mutex);
+ WARN_ON_ONCE(wq->first_flusher != &this_flusher);
+ WARN_ON_ONCE(!list_empty(&wq->flusher_overflow));
+ WARN_ON_ONCE(!list_empty(&wq->flusher_queue));
+ WARN_ON_ONCE(!list_empty(&this_flusher.list));
+ WARN_ON_ONCE(wq->flush_color != this_flusher.flush_color);
+ WRITE_ONCE(wq->first_flusher, NULL);
+
+out_unlock:
+ mutex_unlock(&wq->mutex);
+}
+
/**
* drain_workqueue - drain a workqueue
* @wq: workqueue to drain
@@ -4353,6 +4402,7 @@ void destroy_workqueue(struct workqueue_struct *wq)

/* drain it before proceeding with destruction */
drain_workqueue(wq);
+ flush_no_color(wq);

/* kill rescuer, if sanity checks fail, leave it w/o rescuer */
if (wq->rescuer) {
--
2.20.1


2020-06-01 15:34:36

by Tejun Heo

Subject: Re: [PATCH] workqueue: ensure all flush_work() completed when being destroyed

Hello, Lai.

On Mon, Jun 01, 2020 at 06:08:02AM +0000, Lai Jiangshan wrote:
> +static void flush_no_color(struct workqueue_struct *wq)
> +{
...

I'm not too sure about using the colored flushing system for this. Given
that the requirements are a lot simpler, I'd prefer to keep it separate and
dumb rather than intertwining it with regular flushes. E.g. each wq can keep
a count of flush work items in flight and a simple wake-up mechanism for
the task doing destroy_workqueue().
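
Roughly something like this (hypothetical sketch only; the V2 patch
below is the real implementation and happens to use the same field
names):

	/* per-wq state, initialized with a count of 1 at alloc time */
	atomic_t nr_flush_work;			/* barrier work items in flight */
	struct completion flush_work_done;	/* destroy_workqueue() waits here */

	/* insert_wq_barrier(): account for the barrier being queued */
	atomic_inc(&wq->nr_flush_work);

	/* pwq_dec_nr_in_flight(), WORK_NO_COLOR case: barrier finished */
	if (atomic_dec_and_test(&wq->nr_flush_work))
		complete(&wq->flush_work_done);

	/* destroy_workqueue(): drop the initial count and wait */
	if (atomic_dec_and_test(&wq->nr_flush_work))
		complete(&wq->flush_work_done);
	wait_for_completion(&wq->flush_work_done);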

Thanks.

--
tejun

2020-06-02 13:52:15

by Lai Jiangshan

Subject: [PATCH] workqueue: ensure all flush_work() completed when being destroyed

In the old days, worker threads were not shared among different
workqueues, and destroy_workqueue() used kthread_stop() to destroy
all workers before tearing down the workqueue structures.
kthread_stop() ensured that the scheduled (worker->scheduled)
work items and the linked work items queued by flush_work()
had completed.

For wq users, a workqueue being finished/unused means that all
queued work items have completed and all flush_work() calls have
returned; only then is it legitimate to pass the workqueue to
destroy_workqueue().

But commit
e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
made worker pools and workers shared among different workqueues,
and kthread_stop() is no longer used to synchronize the completion
of work items. destroy_workqueue() uses drain_workqueue() to drain
user work items, but the internal work items queued by flush_work()
are not drained because they have no color.

So a problem can occur when wq_barrier_func() calls
complete(&barr->done) and the woken-up wq-user code calls
destroy_workqueue(): destroy_workqueue() can run earlier than the
process_one_work() that does the put_pwq(), so the sanity check in
destroy_workqueue() sees the not-yet-released pwq->refcnt and warns
with a false positive.

The problem could easily be fixed by removing WORK_NO_COLOR and
making the internal work items queued by flush_work() inherit the
color of the work items being flushed, but that would undo the
design and the benefits of WORK_NO_COLOR.

This patch simply adds an atomic counter for in-flight flush_work()
barriers and a completion that destroy_workqueue() waits on.

Signed-off-by: Lai Jiangshan <[email protected]>
---
Changes since V1:
Switched from the flush_no_color()-based mechanism to an atomic
counter + completion, as TJ suggested.

kernel/workqueue.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1921c982f920..71272beb8e01 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -253,6 +253,9 @@ struct workqueue_struct {
int nr_drainers; /* WQ: drain in progress */
int saved_max_active; /* WQ: saved pwq max_active */

+ atomic_t nr_flush_work; /* flush work in progress */
+ struct completion flush_work_done; /* sync flush_work() */
+
struct workqueue_attrs *unbound_attrs; /* PW: only for unbound wqs */
struct pool_workqueue *dfl_pwq; /* PW: only for unbound wqs */

@@ -1154,6 +1157,12 @@ static void pwq_activate_first_delayed(struct pool_workqueue *pwq)
pwq_activate_delayed_work(work);
}

+static void dec_nr_in_flight_flush_work(struct workqueue_struct *wq)
+{
+ if (atomic_dec_and_test(&wq->nr_flush_work))
+ complete(&wq->flush_work_done);
+}
+
/**
* pwq_dec_nr_in_flight - decrement pwq's nr_in_flight
* @pwq: pwq of interest
@@ -1168,8 +1177,10 @@ static void pwq_activate_first_delayed(struct pool_workqueue *pwq)
static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color)
{
/* uncolored work items don't participate in flushing or nr_active */
- if (color == WORK_NO_COLOR)
+ if (color == WORK_NO_COLOR) {
+ dec_nr_in_flight_flush_work(pwq->wq);
goto out_put;
+ }

pwq->nr_in_flight[color]--;

@@ -2682,6 +2693,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
}

debug_work_activate(&barr->work);
+ atomic_inc(&pwq->wq->nr_flush_work);
insert_work(pwq, &barr->work, head,
work_color_to_flags(WORK_NO_COLOR) | linked);
}
@@ -4278,6 +4290,9 @@ struct workqueue_struct *alloc_workqueue(const char *fmt,
wq_init_lockdep(wq);
INIT_LIST_HEAD(&wq->list);

+ atomic_set(&wq->nr_flush_work, 1);
+ init_completion(&wq->flush_work_done);
+
if (alloc_and_link_pwqs(wq) < 0)
goto err_unreg_lockdep;

@@ -4354,6 +4369,10 @@ void destroy_workqueue(struct workqueue_struct *wq)
/* drain it before proceeding with destruction */
drain_workqueue(wq);

+ /* flush all uncompleted internal work items queued by flush_work() */
+ dec_nr_in_flight_flush_work(wq);
+ wait_for_completion(&wq->flush_work_done);
+
/* kill rescuer, if sanity checks fail, leave it w/o rescuer */
if (wq->rescuer) {
struct worker *rescuer = wq->rescuer;
--
2.20.1

2020-06-02 16:18:36

by Tejun Heo

Subject: Re: [PATCH] workqueue: ensure all flush_work() completed when being destroyed

Hello, Lai.

On Tue, Jun 02, 2020 at 01:49:14PM +0000, Lai Jiangshan wrote:
> +static void dec_nr_in_flight_flush_work(struct workqueue_struct *wq)
> +{
> + if (atomic_dec_and_test(&wq->nr_flush_work))

Do you think it'd make sense to put this in pwq so that it can be
synchronized with the pool lock instead of using a separate atomic counter?

Makes sense to me otherwise.

Thanks.

--
tejun

2020-06-02 23:21:12

by Lai Jiangshan

Subject: Re: [PATCH] workqueue: ensure all flush_work() completed when being destroyed

On Wed, Jun 3, 2020 at 12:13 AM Tejun Heo <[email protected]> wrote:
>
> Hello, Lai.
>
> On Tue, Jun 02, 2020 at 01:49:14PM +0000, Lai Jiangshan wrote:
> > +static void dec_nr_in_flight_flush_work(struct workqueue_struct *wq)
> > +{
> > + if (atomic_dec_and_test(&wq->nr_flush_work))
>
> Do you think it'd make sense to put this in pwq so that it can be
> synchronized with the pool lock instead of using a separate atomic counter?
>

Hello, Tejun

If the counter is put in the pwq, we would need a per-pwq counter,
a per-pwq completion (or a pointer to one) and a
flush_workqueue_prep_pwqs()-like function (although simpler) to set
them up before waiting for them; see the sketch below.

That would end up looking much like the V1 patch. A per-pwq counter
may perform better than a per-wq atomic, but the gain seems too small
to justify the extra code complexity. V1 has the simplest
pwq_dec_nr_in_flight(), except that there are too many
WARN_ON_ONCE() calls in flush_no_color().
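
Roughly, that per-pwq variant would look something like this
(hypothetical sketch with invented field names, never actually
posted as a patch):

	/* hypothetical fields in struct pool_workqueue, protected by pool->lock */
	int			nr_flush_work;	/* barriers in flight on this pwq */
	struct completion	*flush_done;	/* armed by destroy_workqueue() */

	/* insert_wq_barrier(), already under pool->lock */
	pwq->nr_flush_work++;

	/* pwq_dec_nr_in_flight(), WORK_NO_COLOR case, under pool->lock */
	if (!--pwq->nr_flush_work && pwq->flush_done)
		complete(pwq->flush_done);

	/* destroy_workqueue(): arm and wait, one pwq at a time */
	for_each_pwq(pwq, wq) {
		DECLARE_COMPLETION_ONSTACK(barriers_done);
		bool wait = false;

		raw_spin_lock_irq(&pwq->pool->lock);
		if (pwq->nr_flush_work) {
			pwq->flush_done = &barriers_done;
			wait = true;
		}
		raw_spin_unlock_irq(&pwq->pool->lock);

		if (wait) {
			wait_for_completion(&barriers_done);
			/* clear before the on-stack completion goes away */
			raw_spin_lock_irq(&pwq->pool->lock);
			pwq->flush_done = NULL;
			raw_spin_unlock_irq(&pwq->pool->lock);
		}
	}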

Thanks
Lai

> Makes sense to me otherwise.
>
> Thanks.
>
> --
> tejun