(Adding Cc: LKML and Tejun. Not adding Ralf, I think he is subscribed.)
On May 21 Stephan Gatzka wrote:
> Hi!
>
> I want to keep you informed about the issue that Ralf described some
> months ago:
>
> http://marc.info/?l=linux1394-devel&m=136152042914650&w=2
>
> We instrumented the workqueue and got a lot of insight into how it works. :)
>
> The deadlock occurs mainly because the firewire workqueue is allocated
> with WQ_MEM_RECLAIM. That's why a so-called rescuer thread is started
> when the wq is allocated.
>
> This rescuer thread is responsible for keeping the queue working even
> under high memory pressure, when a memory allocation might sleep. If
> that happens, all work of that workqueue is handed over to that
> particular rescuer thread. The rescuer thread executes its work strictly
> sequentially. Now we have the situation that the rescuer thread runs
> fw_device_init->read_config_rom->read_rom->fw_run_transaction.
> fw_run_transaction blocks waiting for the completion object. This
> completion object would be completed in bus_reset_work, but that work
> will never be executed by the rescuer thread.
>
> The interesting question is why the firewire work is handed over to the
> rescuer thread at all, because we are definitely not under high memory
> pressure. I looked into the workqueue code and found that the work is
> shifted to the rescuer if a new worker thread cannot be allocated within
> a certain time (static bool maybe_create_worker(struct global_cwq *gcwq)
> in workqueue.c). This timeout (MAYDAY_INITIAL_TIMEOUT) is set to 10 ms.
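The pattern Stephan describes boils down to something like the sketch
below. This is not firewire code; the work functions, the names, and the
module wrapper are stand-ins of mine, only the workqueue and completion
APIs are the real kernel interfaces. "waiter" plays the role of
fw_device_init -> fw_run_transaction, "signaller" the role of
bus_reset_work; once both items are funneled to the queue's single
rescuer thread, waiter_fn() blocks forever because signaller_fn() can no
longer be executed.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static struct workqueue_struct *demo_wq;
static DECLARE_COMPLETION(demo_event);

static void waiter_fn(struct work_struct *work)
{
	wait_for_completion(&demo_event);	/* like fw_run_transaction() */
}

static void signaller_fn(struct work_struct *work)
{
	complete(&demo_event);			/* like bus_reset_work */
}

static DECLARE_WORK(waiter, waiter_fn);
static DECLARE_WORK(signaller, signaller_fn);

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo", WQ_MEM_RECLAIM, 0);
	if (!demo_wq)
		return -ENOMEM;
	/* Both items sit on the same queue; under (real or apparent)
	 * memory pressure both are handed to the one rescuer, which runs
	 * them strictly sequentially -> deadlock. */
	queue_work(demo_wq, &waiter);
	queue_work(demo_wq, &signaller);
	return 0;
}
module_init(demo_init);

static void __exit demo_exit(void)
{
	complete(&demo_event);		/* let the waiter finish */
	destroy_workqueue(demo_wq);
}
module_exit(demo_exit);
MODULE_LICENSE("GPL");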
If I recall correctly what was written earlier in this thread, the fact
that firewire-ohci schedules its work from IRQ context seems to matter.
If so, then this raises the generally interesting question of how to use
a workqueue as the bottom half of an interrupt handler.¹ Does this
necessitate separate workqueues --- one for the BH worklets and another
for worklets that depend on the BH worklets?²
But Ralf wrote on March 12: "Other tests have shown that even its own
workqueue is not enough. The deadlock also occurs here. Not nearly as
often, but it happens. Only with the tasklet can I so far be sure that
the deadlock does not occur." So, having a private workqueue just for
this single BH worklet alone is still not the answer.
--------
¹) Worklets as bottom halves --- as an alternative to tasklets as bottom
halves --- have been suggested by RT folks in the past because there
is more control over CPU scheduling of worklets compared to tasklets,
obviously.
²) In the firewire subsystem, we have a subsystem workqueue for various
work from firewire-core and occasional work from upper-layer drivers,
IOW protocol drivers. A while ago we converted a low-level driver IRQ
BH from tasklet to worklet in order to be able to do some sleeping
stuff in it. However, progress of the core and upper-layer work
depends on the IRQ BH being executed concurrently.
> What I don't understand is why the create_worker() call is not
> satisfied within 10 ms. We run the stuff on a 400 MHz PowerPC with the
> Xenomai realtime extension. Because we have a lot of firewire
> participants, we get a lot of bus resets when starting up the whole
> system. Maybe we spend too much time in interrupts, but that's just a
> rough guess right now. We will look further into that issue.
Wait a minute --- I seem to be reading about the Xenomai RT extensions for
the first time in this thread. Has the problem been reproduced on a mainline
kernel too? (Mainline plus Ralf's firewire upper layer driver maybe, but
without any other 3rd party stuff please. Actually the issue should be
reproducible even without an upper layer driver performing
fw_iso_resource_manage in fw_workqueue context, shouldn't it?)
There were one or two /other/ reports about split transactions never
completing, i.e. not even timing out, on the ffado-user or ffado-devel
mailing list: The userland Jack audio server got stuck in
drivers/firewire/core-cdev.c::fw_device_op_release(). However, whether
this was the same problem or a different one altogether is not known to
me. There is at least one known hardware³ bug which we do not handle yet
and which can lead to this, orthogonally to the workqueue problem which
we are discussing here. --- Long story short, Ralf may be the only one
who suffers from this problem so far, which makes it even more important
to verify whether or not this is a mainline problem.
--------
³) JMB38x
> Our workaround is to declare the firewire workqueue as CPU_INTENSIVE.
> This gives a slightly different execution model in the normal case (no
> work running on the rescuer thread). Maybe it's worth thinking about
> because some work looks rather heavy to me.
>
> Regards,
>
> Stephan
There is quite a bit going on in firewire-ohci's and firewire-core's parts
of self-ID-complete event handling. But I suspect that stuff like that is
still not in the league for which CPU_INTENSIVE was intended
(cryptography etc.).
--
Stefan Richter
-=====-===-= -=-= =-=-=
http://arcgraph.de/sr/
[Stefan Richter wrote:]
> Wait a minute --- I seem to be reading about the Xenomai RT extensions
> for the first time in this thread. Has the problem been reproduced on a
> mainline kernel too? (Mainline plus Ralf's firewire upper layer driver
> maybe, but without any other 3rd party stuff please. Actually the issue
> should be reproducible even without an upper layer driver performing
> fw_iso_resource_manage in fw_workqueue context, shouldn't it?)
Unfortunately it's not that easy. While I agree that it should be
reproducible with mainline alone, we have to force our system into a
situation where it "thinks" it's under memory pressure, so that the work
is handed over to the rescuer thread. It looks like a simple printk from
a different driver might lead to that situation on our system.
As a first fix, I would say we should at least run bus_reset_work() on
its own workqueue, because otherwise we cannot guarantee progress of
bus_reset_work under memory pressure.
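For illustration, such a dedicated queue could look roughly like the
sketch below. This is not a finished patch; the queue name
fw_bus_reset_wq and the field name ohci->bus_reset_work are assumptions
of mine, only alloc_workqueue() and queue_work() are the real kernel API.

/* Hypothetical sketch: give bus reset handling its own WQ_MEM_RECLAIM
 * queue, so it has its own rescuer and can make progress even while
 * fw_workqueue's rescuer is blocked inside fw_run_transaction(). */
#include <linux/workqueue.h>

static struct workqueue_struct *fw_bus_reset_wq;	/* made-up name */

static int fw_bus_reset_wq_init(void)
{
	fw_bus_reset_wq = alloc_workqueue("firewire_bus_reset",
					  WQ_MEM_RECLAIM, 0);
	return fw_bus_reset_wq ? 0 : -ENOMEM;
}

/* The OHCI interrupt handler would then queue its bus reset work here
 * instead of on fw_workqueue, e.g.
 *
 *	queue_work(fw_bus_reset_wq, &ohci->bus_reset_work);
 *
 * where the field name is only illustrative. */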
> There is quite a bit going on in firewire-ohci's and firewire-core's
> parts of self-ID-complete event handling. But I suspect that stuff like
> that is still not in the league for which CPU_INTENSIVE was intended
> (cryptography etc.).
Some advice from Tejun would be great here...
Hello,
On Wed, May 22, 2013 at 11:08:43AM +0200, Stephan Gatzka wrote:
> >There is quite a bit going on in firewire-ohci's and firewire-core's
> >parts of self-ID-complete event handling. But I suspect that stuff
> >like that is still not in the league for which CPU_INTENSIVE was
> >intended (cryptography etc.).
>
> Some advice from Tejun would be great here...
No need to use CPU_INTENSIVE unless it's really burning a considerable
amount of CPU cycles. I don't know what the firewire work items are
doing, but I am pretty doubtful they would need the flag, and please
don't use it to work around deadlock problems. :)
Thanks.
--
tejun
On May 22 Stephan Gatzka wrote:
[I wrote:]
> > Wait a minute --- I seem to be reading about the Xenomai RT extensions for the
> > first time in this thread. Has the problem been reproduced on a mainline
> > kernel too? (Mainline plus Ralf's firewire upper layer driver maybe, but
> > without any other 3rd party stuff please. Actually the issue should be
> > reproducible even without an upper layer driver performing
> > fw_iso_resource_manage in fw_workqueue context, shouldn't it?)
>
> Unfortunately it's not that easy. While I agree that it should be
> reproducible with mainline alone, we have to force our system into a
> situation where it "thinks" it's under memory pressure, so that the work
> is handed over to the rescuer thread. It looks like a simple printk from
> a different driver might lead to that situation on our system.
[...]
I retract my request to get it retested with mainline since a) the
analysis has clearly shown that a mainline bug is at the root of the
problem, and b) Ralf has now confirmed that one of the possible fixes
(using two queues) does in fact work for him, just as the theory says it
should for mainline.
--
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/