LinuxLists.cc - Re: [PATCH 01/30] infiniband: update workqueue usage

2010-12-15 18:42:51

Subject: Re: [PATCH 01/30] infiniband: update workqueue usage

Thanks Tejun. A couple questions:

> * ib_wq is added, which is used as the common workqueue for infiniband
> instead of the system workqueue. All system workqueue usages
> including flush_scheduled_work() callers are converted to use and
> flush ib_wq. This is to prepare for deprecation of
> flush_scheduled_work().

Why do we want to move to a subsystem-specific workqueue? Can we just
replace flush_scheduled_work() by cancel_delayed_work_sync() as
appropriate and not create yet another work queue?

> * qib_wq is removed and ib_wq is used instead.

You obviously looked at the comment

- /*
- * We create our own workqueue mainly because we want to be
- * able to flush it when devices are being removed. We can't
- * use schedule_work()/flush_scheduled_work() because both
- * unregister_netdev() and linkwatch_event take the rtnl lock,
- * so flush_scheduled_work() can deadlock during device
- * removal.
- */
- qib_wq = create_workqueue("qib");

and know that with the new workqueue stuff, this issue no longer
exists. But for both my education and also the clarity of the changelog
for this patch, perhaps you could expand on why ib_wq is safe here.

> * create[_singlethread]_workqueue() usages are replaced with the new
> alloc[_ordered]_workqueue(). This removes rescuers from all
> infiniband workqueues.

What are rescuers?

Can we replace some of these driver-specific work queues by the ib_wq?

Are all these things just possibilities for future cleanup?

Thanks,
Roland

2010-12-16 16:51:08

by Tejun Heo

Subject: Re: [PATCH 01/30] infiniband: update workqueue usage

Hello, Roland. Sorry about the delay.

On 12/15/2010 07:33 PM, Roland Dreier wrote:
> Thanks Tejun. A couple questions:
>
> > * ib_wq is added, which is used as the common workqueue for infiniband
> > instead of the system workqueue. All system workqueue usages
> > including flush_scheduled_work() callers are converted to use and
> > flush ib_wq. This is to prepare for deprecation of
> > flush_scheduled_work().
>
> Why do we want to move to a subsystem-specific workqueue? Can we just
> replace flush_scheduled_work() by cancel_delayed_work_sync() as
> appropriate and not create yet another work queue?

Because there are places where work is used to free the containing
structure. Before a module is unloaded, all work items which use
functions in the module should be flushed; however, if a work item is
used to free the containing structure, that work item can't be flushed
explicitly, so the workqueue which processes such work items should be
flushed.

So, in this case, ib_wq is added primarily to serve as a flush domain.
For driver midlayers, this often seems necessary. Also, the workqueue
doesn't have any dedicated worker and is quite cheap.
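
The flush-domain argument above can be illustrated with a small userspace
toy model. This is only a sketch and is not the kernel workqueue API: every
name in it (toy_wq, toy_work, toy_dev, toy_queue_work, toy_flush_workqueue,
toy_demo) is hypothetical, and it is single-threaded for simplicity. It
shows a work item whose handler frees the structure that embeds it, so the
"unload" path keeps no pointer to the item and drains the whole queue
instead.

```c
/* Userspace toy model of a "flush domain". NOT the kernel workqueue API;
 * all names below are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_work {
	void (*fn)(struct toy_work *w);
	struct toy_work *next;
};

struct toy_wq {
	struct toy_work *head, *tail;
};

/* A device-like object that embeds its own release work item. */
struct toy_dev {
	int id;
	int *freed_counter;	/* lets the caller observe that the free ran */
	struct toy_work release_work;
};

/* Append a work item to the tail of the queue. */
static void toy_queue_work(struct toy_wq *wq, struct toy_work *w)
{
	w->next = NULL;
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
}

/* Run everything pending. The caller needs no pointer to any single item. */
static int toy_flush_workqueue(struct toy_wq *wq)
{
	int ran = 0;

	while (wq->head) {
		struct toy_work *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		w->fn(w);	/* may free the memory containing w */
		ran++;		/* w must not be touched after this point */
	}
	return ran;
}

/* Handler that frees the structure containing the work item. */
static void toy_dev_release(struct toy_work *w)
{
	struct toy_dev *dev = (struct toy_dev *)
		((char *)w - offsetof(struct toy_dev, release_work));

	(*dev->freed_counter)++;
	free(dev);
}

/* Queue three self-freeing items, then flush the queue as "unload" would. */
static int toy_demo(void)
{
	struct toy_wq wq = { NULL, NULL };
	int freed = 0;
	int ran;

	for (int i = 0; i < 3; i++) {
		struct toy_dev *dev = malloc(sizeof(*dev));

		if (!dev) {
			toy_flush_workqueue(&wq);
			return -1;
		}
		dev->id = i;
		dev->freed_counter = &freed;
		dev->release_work.fn = toy_dev_release;
		toy_queue_work(&wq, &dev->release_work);
	}

	/* No dev pointers are kept here: each dev belongs to its own work
	 * item, so the only safe thing left to flush is the queue itself. */
	ran = toy_flush_workqueue(&wq);
	assert(wq.head == NULL);
	return (ran == 3 && freed == 3) ? 0 : 1;
}
```

In this model toy_demo() returns 0 once all three items have run and freed
their containers. In the kernel the work runs concurrently on worker
threads, which is what makes holding on to a pointer to such a work item
unsafe; the toy model only mirrors the ownership pattern, not the
concurrency.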

>
> > * qib_wq is removed and ib_wq is used instead.
>
> You obviously looked at the comment
>
> - /*
> - * We create our own workqueue mainly because we want to be
> - * able to flush it when devices are being removed. We can't
> - * use schedule_work()/flush_scheduled_work() because both
> - * unregister_netdev() and linkwatch_event take the rtnl lock,
> - * so flush_scheduled_work() can deadlock during device
> - * removal.
> - */
> - qib_wq = create_workqueue("qib");
>
> and know that with the new workqueue stuff, this issue no longer
> exists. But for both my education and also the clarity of the changelog
> for this patch, perhaps you could expand on why ib_wq is safe here.

I think I got confused. I thought the comment was indicating the
separation between qib_wq and qib_cq_wq. It's between system_wq and
qib_wq, right? I'll drop this part from the series, but then again
what's the difference from ib_srp, which flushes the common workqueue?
Why doesn't ib_srp have the same problem?

> > * create[_singlethread]_workqueue() usages are replaced with the new
> > alloc[_ordered]_workqueue(). This removes rescuers from all
> > infiniband workqueues.
>
> What are rescuers?

Normally, all workqueues share a global per-CPU worker pool, but certain
workqueues need a forward progress guarantee under memory pressure (the
ones which are used to free memory). In this case, the workqueues are
created with WQ_MEM_RECLAIM and have a single rescuer worker reserved.
So, any workqueue which is in the memory reclaim path needs to have the
flag set to avoid the unlikely but still possible deadlock under
memory pressure.

> Can we replace some of these driver-specific work queues by the ib_wq?
>
> Are all these things just possibilities for future cleanup?

Hmm... Yeah, sure, they can be. With the new implementation, separate
workqueues are used for the following purposes.

* As a forward progress guarantee domain as described above.

* As a flushing domain.

* As a property domain. Different workqueues have different execution
and queueing properties set.

Unless one of the above is necessary, work items can be queued
together into the same workqueue. Concurrency-wise, it wouldn't make
any difference. They all use the same set of workers anyway, but I
don't know the code well enough to make the changes myself. If you're
interested in doing it, I'll be happy to help.

Thanks.

--
tejun

2010-12-23 22:12:13

by David Dillow

Subject: Re: [PATCH 01/30] infiniband: update workqueue usage

On Thu, 2010-12-16 at 17:50 +0100, Tejun Heo wrote:
> On 12/15/2010 07:33 PM, Roland Dreier wrote:
> >
> > > * qib_wq is removed and ib_wq is used instead.
> >
> > You obviously looked at the comment
> >
> > - /*
> > - * We create our own workqueue mainly because we want to be
> > - * able to flush it when devices are being removed. We can't
> > - * use schedule_work()/flush_scheduled_work() because both
> > - * unregister_netdev() and linkwatch_event take the rtnl lock,
> > - * so flush_scheduled_work() can deadlock during device
> > - * removal.
> > - */
> > - qib_wq = create_workqueue("qib");
> >
> > and know that with the new workqueue stuff, this issue no longer
> > exists. But for both my education and also the clarity of the changelog
> > for this patch, perhaps you could expand on why ib_wq is safe here.
>
> I think I got confused. I thought the comment was indicating the
> separation between qib_wq and qib_cq_wq. It's between system_wq and
> qib_wq, right? I'll drop this part from the series, but then again
> what's the difference from ib_srp, which flushes the common workqueue?
> Why doesn't ib_srp have the same problem?

Looking at qib, I'm not sure the comment isn't confused -- the only
place I see where qib_wq or qib_cq_wq get flushed is in
destroy_workqueue() when the module is being unloaded. And we shouldn't
be there with rtnl_lock held by the caller.

Roland, please let me know how you plan to proceed -- I need to update
ib_srp to get rid of *scheduled_work(), and I can either use the IB
core's queue, or define my own. Since it's cheap, I don't suppose it
matters much, but I think I'd prefer to share if possible.

Thanks,
Dave