2014-02-15 19:39:26

by Peter Hurley

[permalink] [raw]
Subject: [PATCH] workqueue: Document exceptions to work item non-reentrancy guarantee

Since commit a2c1c57be8d9fd5b716113c8991d3d702eeacf77,
workqueue: consider work function when searching for busy work items
work items whose work functions are re-assigned are no longer guaranteed
non-reentrancy with the previously assigned work function. For example,

PREPARE_WORK(&work, funcA)
schedule_work(&work)
.
< funcA starts >
.
PREPARE_WORK(&work, funcB)
schedule_work(&work)

funcA() may run concurrently with funcB().

The work item non-reentrancy guarantee is a crucial design guarantee
which, if violated, may not have obvious consequences. For example,
the entire firewire subsystem expects the as-documented per-work item
non-reentrancy guarantee, which was the behavior prior to the above
commit, not the per-work function + per-work item behavior since.

Document the known exceptions to this guarantee.

Cc: Stefan Richter <[email protected]>
Cc: <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
Documentation/workqueue.txt | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/workqueue.txt b/Documentation/workqueue.txt
index f81a65b..5b49491 100644
--- a/Documentation/workqueue.txt
+++ b/Documentation/workqueue.txt
@@ -223,9 +223,17 @@ resources, scheduled and executed.

This flag is meaningless for unbound wq.

-Note that the flag WQ_NON_REENTRANT no longer exists as all workqueues
-are now non-reentrant - any work item is guaranteed to be executed by
-at most one worker system-wide at any given time.
+Note that the flag WQ_NON_REENTRANT no longer exists.
+Generally, workqueues are non-reentrant -- a work item will be executed
+by at most one worker system-wide at any given time. However, this
+non-reentrancy guarantee does not extend to work items whose function
+is re-assigned (for example, via PREPARE_WORK/PREPARE_DELAYED_WORK)
+and re-queued for execution; the newly scheduled work may run concurrently
+with the previous work item function.
+
+Note that WQ_UNBOUND workqueues, such as system_unbound_wq, have no
+non-reentrancy guarantees as these workqueues do not provide concurrency
+management.

@max_active:

--
1.8.1.2


2014-02-18 01:44:18

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH] workqueue: Document exceptions to work item non-reentrancy guarantee

On Sat, 2014-02-15 at 14:38 -0500, Peter Hurley wrote:
> Since commit a2c1c57be8d9fd5b716113c8991d3d702eeacf77,
> workqueue: consider work function when searching for busy work items
> work items whose work functions are re-assigned are no longer guaranteed
> non-reentrancy with the previously assigned work function. For example,
>
> PREPARE_WORK(&work, funcA)
> schedule_work(&work)
> .
> < funcA starts >
> .
> PREPARE_WORK(&work, funcB)
> schedule_work(&work)
>
> funcA() may run concurrently with funcB().
>
> The work item non-reentrancy guarantee is a crucial design guarantee
> which, if violated, may not have obvious consequences. For example,
> the entire firewire subsystem expects the as-documented per-work item
> non-reentrancy guarantee, which was the behavior prior to the above
> commit, not the per-work function + per-work item behavior since.
>
> Document the known exceptions to this guarantee.
[...]

It never would have occurred to me that you could safely change the
function for a work item that is already scheduled or running.
Especially given that PREPARE_WORK() is just a simple assignment (i.e.
no serialisation).

Perhaps there is a need to document more generally how PREPARE_WORK()
can be used, not just this quirk of reentrancy.

Ben.

--
Ben Hutchings
Nothing is ever a complete failure; it can always serve as a bad example.


Attachments:
signature.asc (811.00 B)
This is a digitally signed message part

2014-02-18 02:29:40

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH] workqueue: Document exceptions to work item non-reentrancy guarantee

On 02/17/2014 08:43 PM, Ben Hutchings wrote:
> On Sat, 2014-02-15 at 14:38 -0500, Peter Hurley wrote:
>> Since commit a2c1c57be8d9fd5b716113c8991d3d702eeacf77,
>> workqueue: consider work function when searching for busy work items
>> work items whose work functions are re-assigned are no longer guaranteed
>> non-reentrancy with the previously assigned work function. For example,
>>
>> PREPARE_WORK(&work, funcA)
>> schedule_work(&work)
>> .
>> < funcA starts >
>> .
>> PREPARE_WORK(&work, funcB)
>> schedule_work(&work)
>>
>> funcA() may run concurrently with funcB().
>>
>> The work item non-reentrancy guarantee is a crucial design guarantee
>> which, if violated, may not have obvious consequences. For example,
>> the entire firewire subsystem expects the as-documented per-work item
>> non-reentrancy guarantee, which was the behavior prior to the above
>> commit, not the per-work function + per-work item behavior since.
>>
>> Document the known exceptions to this guarantee.
> [...]
>
> It never would have occurred to me that you could safely change the
> function for a work item that is already scheduled or running.
> Especially given that PREPARE_WORK() is just a simple assignment (i.e.
> no serialisation).

process_one_work() has an established order that safely allows for
resetting the work function and scheduling the work, and further
guaranteeing that the new work function will run.

Further, existing memory barriers ensure that
1. The new work function is visible on all cpus before testing if
the work is already pending.
2. The new work function is stored as the worker's current function
before the work is marked as not pending.

If this wasn't possible, then single-threaded workqueues could
not be used for multiple functions without flushing work.

I wonder if the floppy driver is broken too.

Regards,
Peter Hurley

2014-02-18 15:33:22

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] workqueue: Document exceptions to work item non-reentrancy guarantee

On Mon, Feb 17, 2014 at 09:29:34PM -0500, Peter Hurley wrote:
> >It never would have occurred to me that you could safely change the
> >function for a work item that is already scheduled or running.
> >Especially given that PREPARE_WORK() is just a simple assignment (i.e.
> >no serialisation).
>
> process_one_work() has an established order that safely allows for
> resetting the work function and scheduling the work, and further
> guaranteeing that the new work function will run.
>
> Further, existing memory barriers ensure that
> 1. The new work function is visible on all cpus before testing if
> the work is already pending.
> 2. The new work function is stored as the worker's current function
> before the work is marked as not pending.
>
> If this wasn't possible, then single-threaded workqueues could
> not be used for multiple functions without flushing work.
>
> I wonder if the floppy driver is broken too.

Ugh... I'd just rather remove PREPARE_WORK altogether. It's a pretty
dumb thing to do anyway. I'll look into it.

Thanks.

--
tejun

2014-02-18 16:31:24

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH] workqueue: Document exceptions to work item non-reentrancy guarantee

On 02/18/2014 10:30 AM, Tejun Heo wrote:
> On Mon, Feb 17, 2014 at 09:29:34PM -0500, Peter Hurley wrote:
>>> It never would have occurred to me that you could safely change the
>>> function for a work item that is already scheduled or running.
>>> Especially given that PREPARE_WORK() is just a simple assignment (i.e.
>>> no serialisation).
>>
>> process_one_work() has an established order that safely allows for
>> resetting the work function and scheduling the work, and further
>> guaranteeing that the new work function will run.
>>
>> Further, existing memory barriers ensure that
>> 1. The new work function is visible on all cpus before testing if
>> the work is already pending.
>> 2. The new work function is stored as the worker's current function
>> before the work is marked as not pending.
>>
>> If this wasn't possible, then single-threaded workqueues could
>> not be used for multiple functions without flushing work.
>>
>> I wonder if the floppy driver is broken too.
>
> Ugh... I'd just rather remove PREPARE_WORK altogether.

Ok.

That doesn't make the use-case go away; it simply moves it outside
the workqueue subsystem.

For example, in the case of the firewire subsystem, this technique
was used to essentially single-thread per-device work using only one
designated workqueue for all devices. The possibility of accidentally
running a work item 2x is a non-issue since the device state is
managed atomically.

Of the other use cases in the kernel, it seems only the floppy
driver uses a similar technique. But maybe that's ok because it's
on a single-threaded workqueue.

USB and AFS use PREPARE_{DELAYED}_WORK to reschedule from within
the current work function to a new function, which seems ok.

fwserial already serializes its use of PREPARE_WORK with &peer->lock
(and checks if the work is already pending).

> It's a pretty dumb thing to do anyway.

Fragile, yes; dumb, no. At least not from the point-of-view of the
documentation and what the workqueue actually did. But obviously from
your reaction, unintentional design.

> I'll look into it.

Thanks.

Regards,
Peter Hurley

2014-02-18 16:42:32

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] workqueue: Document exceptions to work item non-reentrancy guarantee

On Tue, Feb 18, 2014 at 11:31:18AM -0500, Peter Hurley wrote:
> >It's a pretty dumb thing to do anyway.
>
> Fragile, yes; dumb, no. At least not from the point-of-view of the
> documentation and what the workqueue actually did. But obviously from
> your reaction, unintentional design.

Well, maybe dumb is too strong a word but these types of edge case
interfaces tend to cause subtle issues over time while being
marginally useful at best. Sure it helps a couple odd users to write
slightly less code, but that's such a marginal gain compared to the
fragility it adds by being another one-off thing. So, yeah, it's a
silly thing to include in an interface as widely use as workqueue and
it's more of a historical remanant than anything else at this point.

Thanks.

--
tejun