On May 22 Tejun Heo wrote:
> Hello,
>
> On Tue, May 21, 2013 at 12:54:04PM -0400, Peter Hurley wrote:
> >> This rescuer thread is responsible for keeping the queue working
> >> even under high memory pressure, when a memory allocation might
> >> sleep. If that happens, all work of that workqueue is handed over to
> >> that particular rescuer thread, and the work in this rescuer thread
> >> is done strictly sequentially. Now we have the situation that the
> >> rescuer thread runs
> >> fw_device_init->read_config_rom->read_rom->fw_run_transaction.
> >> fw_run_transaction blocks waiting for the completion object. This
> >> completion object will be completed in bus_reset_work, but this work
> >> will never be executed by the rescuer thread.
> >
> > Interesting.
> >
> > Tejun, is this workqueue behavior as designed? I.e., that a workqueue
> > used as a domain for forward-progress guarantees collapses under
> > certain conditions, such as scheduler overhead, and no longer ensures
> > forward progress?
>
> Yeap, from Documentation/workqueue.txt
>
> WQ_MEM_RECLAIM
>
> All wq which might be used in the memory reclaim paths _MUST_
> have this flag set. The wq is guaranteed to have at least one
> execution context regardless of memory pressure.
>
> All it guarantees is that there will be at least one execution thread
> working on the workqueue under any conditions. If there are
> inter-dependent work items which are necessary to make forward
> progress in memory reclaim, they must be put into separate workqueues.
> In turn, workqueues w/ WQ_RESCUER set *must* be able to make forward
> progress in all cases at the concurrency level of 1. Probably the
> documentation needs a bit of clarification.
[...]
> > I thought the whole point of needing WQ_MEM_RECLAIM is that an SBP-2
> > device may be used as swap.
> >
> > FWIW, I still believe that we should revert to running the bus reset
> > handling as a tasklet, as originally, and redo the TI workaround to
> > use non-sleeping, TI-workaround-specific versions of the PHY accesses.
>
> The right fix would be either dropping WQ_MEM_RECLAIM or breaking it
> into two workqueues so that work items don't have interdependencies.
>
> Thanks.
Argh, suddenly it all seems so obvious. Tejun, Peter, Stephan, thank you
for getting this clarified.
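To spell out what suddenly became obvious, here is a minimal sketch of the
self-deadlock pattern (illustrative names only, not the actual firewire
work items; the real chain is fw_device_init -> read_config_rom ->
read_rom -> fw_run_transaction waiting on a completion that only
bus_reset_work would signal):

#include <linux/workqueue.h>
#include <linux/completion.h>

/* Illustrative only.  Both works go onto the same WQ_MEM_RECLAIM queue,
 * e.g. wq = alloc_workqueue("example", WQ_MEM_RECLAIM, 0).  Under memory
 * pressure the single rescuer thread executes the queue's works strictly
 * one after the other. */
static struct workqueue_struct *wq;
static DECLARE_COMPLETION(done);

static void signal_work_fn(struct work_struct *work)
{
	complete(&done);	/* never runs while the rescuer is stuck below */
}

static void waiter_work_fn(struct work_struct *work)
{
	wait_for_completion(&done);	/* blocks the one and only rescuer */
}

static DECLARE_WORK(signal_work, signal_work_fn);
static DECLARE_WORK(waiter_work, waiter_work_fn);

static void trigger(void)
{
	queue_work(wq, &waiter_work);	/* picked up first by the rescuer... */
	queue_work(wq, &signal_work);	/* ...so this never gets a chance */
}

Outside of the concurrency-of-1 situation, the two works simply run on
different pool workers and nothing bad happens, which is why this only
bites under memory pressure (or the other situations mentioned above).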
A third (fourth?) way to fix it --- feasible or not --- would be to break
the dependency between the worklets. In this case, use a timer to cancel
outbound transactions if the request-transmit IRQ event was not received
before a timeout. We had such a timeout in the older ieee1394 drivers and
we also had it in earlier versions of the firewire drivers, at the risk
of a race between the CPU and the OHCI.
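A rough sketch of that kind of timeout (the names and the 100 ms value
are illustrative, not the old ieee1394 or earlier firewire code):

#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/completion.h>
#include <linux/errno.h>

struct outbound_request {
	struct completion done;
	struct timer_list timeout;
	int status;		/* e.g. RCODE_CANCELLED on timeout */
};

static void request_timeout(unsigned long data)
{
	struct outbound_request *r = (struct outbound_request *)data;

	r->status = -ETIMEDOUT;
	complete(&r->done);	/* wake the waiter even without the IRQ event */
}

static void submit_and_wait(struct outbound_request *r)
{
	init_completion(&r->done);
	setup_timer(&r->timeout, request_timeout, (unsigned long)r);
	mod_timer(&r->timeout, jiffies + msecs_to_jiffies(100));

	/* ...hand the packet to the OHCI here... */

	wait_for_completion(&r->done);	/* completed by IRQ handler or timer */
	del_timer_sync(&r->timeout);

	/* The race mentioned above: the request-transmit event may still
	 * arrive from the OHCI after the timer has already cancelled the
	 * transaction; the real code would have to cope with that. */
}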
--
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/
> A third (fourth?) way to fix it --- feasible or not --- would be to
> break the dependency between the worklets. In this case, use a timer
> to cancel outbound transactions if the request-transmit IRQ event was
> not received before a timeout. We had such a timeout in the older
> ieee1394 drivers and we also had it in earlier versions of the
> firewire drivers, at the risk of a race between the CPU and the OHCI.
Why do we need a timer? If we guarantee that bus_reset_work() always
makes progress (by putting it into its own queue), it should always be
able to complete the corresponding completion object that the other
work items are waiting for.
Regards,
Stephan
On May 22 Stephan Gatzka wrote:
[I wrote:]
> > A third (fourth?) way to fix it --- feasible or not --- would be to break
> > the dependency between the worklets. In this case, use a timer to cancel
> > outbound transactions if the request-transmit IRQ event was not received
> > before a timeout. We had such a timeout in the older ieee1394 drivers and
> > we also had it in earlier versions of the firewire drivers, at the
> > risk of a race between the CPU and the OHCI.
>
> Why do we need a timer? If we guarantee that bus_reset_work() always
> makes progress (by putting it into its own queue), it should always be
> able to complete the corresponding completion object that the other
> work items are waiting for.
We don't need a timer if the dual-wq solution is implemented. It's just
that there are four different solutions possible in principle. Any one
of these fixes is sufficient on its own:
Problem --- A WQ_MEM_RECLAIM workqueue will, for some periods of time,
offer a concurrency level of only 1; particularly during memory reclaim,
but also in some other situations due to the way it is implemented (as
far as I understood yesterday's posts). Therefore, interdependent work
items must not be inserted into the same WQ_MEM_RECLAIM workqueue
instance.
Solution 1 --- Don't set the WQ_MEM_RECLAIM flag. This is only possible
if memory reclaim does not depend on any of the work items that are
inserted into this queue. (This solution is therefore not available to
an SBP-2 initiator.)
Solution 2 --- Perform interdependent work items in different queue
instances. (Keep the WQ_MEM_RECLAIM flag set on those workqueues that
have to take work which is necessary for progress of memory reclaim. If
this and only this solution is employed for an SBP-2 initiator, we need
two if not more WQ_MEM_RECLAIM workqueue instances. A rough sketch
follows below, after solution 3b.)
Solution 3 --- Remove the dependency between worklets:
Solution 3a --- Remove the lower-level worklet altogether.
E.g. reimplement the lower-level worklet as a tasklet.
Solution 3b --- Remove the higher-level worklet's dependency.
E.g. reimplement the higher-level worklet such that it is woken by
a timer and then aborts or reschedules ( = lets the lower-level
worklet bubble up in the queue).
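As a rough sketch of what solution 2 could look like (the second queue's
name and the exact split of works between the two queues are assumptions,
not an actual patch):

#include <linux/workqueue.h>
#include <linux/init.h>
#include <linux/errno.h>

static struct workqueue_struct *fw_workqueue;		/* fw_device_init etc. */
static struct workqueue_struct *fw_bus_reset_wq;	/* bus_reset_work only */

static int __init setup_queues(void)
{
	fw_workqueue = alloc_workqueue("firewire", WQ_MEM_RECLAIM, 0);
	if (!fw_workqueue)
		return -ENOMEM;

	fw_bus_reset_wq = alloc_workqueue("firewire-bus-reset",
					  WQ_MEM_RECLAIM, 0);
	if (!fw_bus_reset_wq) {
		destroy_workqueue(fw_workqueue);
		return -ENOMEM;
	}
	return 0;
}

/* bus_reset_work would then be queued with
 * queue_work(fw_bus_reset_wq, ...) instead of queue_work(fw_workqueue, ...),
 * so it always has an execution context of its own and can complete the
 * object that works on fw_workqueue are waiting for. */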
--
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/
> Solution 2 --- Perform interdependent work items in different queue
> instances.
> (Keep the WQ_MEM_RECLAIM flag set on those workqueues that have to take
> work which is necessary for progress of memory reclaim. If this and only
> this solution is employed for an SBP-2 initiator, we need two if not
> more WQ_MEM_RECLAIM workqueue instances.)
I would go for this solution. I have no problems with lots of workqueues
around, because there is only a relatively small structure required for
each workqueue.
>
> Solution 3 --- Remove the dependency between worklets:
>
> Solution 3a --- Remove the lower-level worklet altogether.
> E.g. reimplement the lower-level worklet as a tasklet.
No, I like the workqueue context. :)
> Solution 3b --- Remove the higher-level worklet's dependency.
> E.g. reimplement the higher-level worklet such that it is woken by
> a timer and then aborts or reschedules ( = lets the lower-level
> worklet bubble up in the queue).
This looks more difficult to me and not so easy to test.
Stephan
On 05/22/2013 09:06 AM, Stephan Gatzka wrote:
>> Solution 2 --- Perform interdependent works in different queue instances.
>> (Keep the WQ_MEM_RECLAIM flag set at those workqueues that have to take
>> work which is necessary for progress of memory reclaim. If this and only
>> this solution is employed for an SBP-2 initiator, we need two if not more
>> WQ_MEM_RECLAIM workqueue instances.)
>
> I would go for this solution. I have no problems with lots of
> workqueues around, because there is only a relatively small structure
> required for each workqueue.
Now that we more fully understand the cause, this would be my choice as
well (despite my earlier advocacy for the return to tasklets).
Although we do need to carefully review the other work items (bm_work,
br_work) to ensure that there aren't unwanted dependencies with those as
well.
Also, workqueues with WQ_MEM_RECLAIM aren't free, as each such wq will
have a dedicated rescuer thread.
[The other advantage of tasklets is the reduced latency, but I do
understand the advantages of the RT characteristics of worklets.]
Regards,
Peter Hurley