2009-12-14 21:33:45

by Alan Stern

[permalink] [raw]
Subject: Warn people about flush_scheduled_work()

Tejun:

You've spent some time working on the workqueue implementation, right?
I'd like to add comments or kerneldoc warning people about how
dangerous it can be to use flush_scheduled_work() and related
functions. Something like this:

Think twice before calling this function! It's very easy
to get into trouble if you don't take great care. Either
of the following situations will lead to deadlock:

Your code is running in the context of a scheduled
work routine.

Your code or its caller holds a lock needed by
one of the work items currently on the workqueue.

Since you generally don't know who your caller is, what locks
it holds, or what locks are needed by the items on the
workqueue, avoiding these situations is quite difficult.

Consider using cancel_work_sync() or cancel_delayed_work_sync()
instead. In most situations they will accomplish what you
need.

Does this sound like a good idea? Certainly flush_scheduled_work()
is used in places where it shouldn't be.

If comments like this are added, where do you think would be a good
place to put them?

Alan Stern


2009-12-14 21:41:58

by Oliver Neukum

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

Am Montag, 14. Dezember 2009 22:33:38 schrieb Alan Stern:
> Consider using cancel_work_sync() or cancel_delayed_work_sync()
> instead. In most situations they will accomplish what you
> need.

In which respect is cancel_work_sync() fundamentally safer?
If the work is already running and takes a lock you are holding,
then what?

Regards
Oliver

2009-12-14 22:02:55

by Alan Stern

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

On Mon, 14 Dec 2009, Oliver Neukum wrote:

> Am Montag, 14. Dezember 2009 22:33:38 schrieb Alan Stern:
> > Consider using cancel_work_sync() or cancel_delayed_work_sync()
> > instead. In most situations they will accomplish what you
> > need.
>
> In which respect is cancel_work_sync() fundamentally safer?
> If the work is already running and takes a lock you are holding,
> then what?

With cancel_work_sync() you _know_ what locks the work item is going to
take, since it's your work item. With flush_scheduled_work() you have
no idea what locks will be needed by the items on the queue. They
could come from anywhere.

Alan Stern

2009-12-14 22:12:28

by Oliver Neukum

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

Am Montag, 14. Dezember 2009 23:02:51 schrieb Alan Stern:
> On Mon, 14 Dec 2009, Oliver Neukum wrote:
>
> > Am Montag, 14. Dezember 2009 22:33:38 schrieb Alan Stern:
> > > Consider using cancel_work_sync() or cancel_delayed_work_sync()
> > > instead. In most situations they will accomplish what you
> > > need.
> >
> > In which respect is cancel_work_sync() fundamentally safer?
> > If the work is already running and takes a lock you are holding,
> > then what?
>
> With cancel_work_sync() you know what locks the work item is going to
> take, since it's your work item. With flush_scheduled_work() you have
> no idea what locks will be needed by the items on the queue. They
> could come from anywhere.

True, but what use is that if you don't know your call chain and the locks
it takes.

Regards
Oliver

2009-12-14 22:38:41

by Tejun Heo

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

Hello, Alan Stern.

On 12/15/2009 06:33 AM, Alan Stern wrote:
> You've spent some time working on the workqueue implementation, right?
> I'd like to add comments or kerneldoc warning people about how
> dangerous it can be to use flush_scheduled_work() and related
> functions. Something like this:
>
> Think twice before calling this function! It's very easy
> to get into trouble if you don't take great care. Either
> of the following situations will lead to deadlock:
>
> Your code is running in the context of a scheduled
> work routine.
>
> Your code or its caller holds a lock needed by
> one of the work items currently on the workqueue.
>
> Since you generally don't know who your caller is, what locks
> it holds, or what locks are needed by the items on the
> workqueue, avoiding these situations is quite difficult.

I think both problems can be detected by lockdep, right? So, they
aren't that difficult to detect.

> Consider using cancel_work_sync() or cancel_delayed_work_sync()
> instead. In most situations they will accomplish what you
> need.
>
> Does this sound like a good idea? Certainly flush_scheduled_work()
> is used in places where it shouldn't be.

Yeah, recommending more work-specific constructs definitely would be
better. It's bad that we can't recommend the use of flush_work() as
it doesn't do cross-cpu flushing. Maybe that needs explanation too.

> If comments like this are added, where do you think would be a good
> place to put them?

DocBook comment on top of each function, maybe?

Thanks.

--
tejun

2009-12-14 23:05:13

by Alan Stern

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

On Mon, 14 Dec 2009, Oliver Neukum wrote:

> Am Montag, 14. Dezember 2009 23:02:51 schrieb Alan Stern:
> > On Mon, 14 Dec 2009, Oliver Neukum wrote:
> >
> > > Am Montag, 14. Dezember 2009 22:33:38 schrieb Alan Stern:
> > > > Consider using cancel_work_sync() or cancel_delayed_work_sync()
> > > > instead. In most situations they will accomplish what you
> > > > need.
> > >
> > > In which respect is cancel_work_sync() fundamentally safer?
> > > If the work is already running and takes a lock you are holding,
> > > then what?
> >
> > With cancel_work_sync() you know what locks the work item is going to
> > take, since it's your work item. With flush_scheduled_work() you have
> > no idea what locks will be needed by the items on the queue. They
> > could come from anywhere.
>
> True, but what use is that if you don't know your call chain and the locks
> it takes.

Okay, clearly you have to know something about your call chain. The
comment should be toned down a little bit.

Alan Stern

2009-12-14 23:14:14

by Alan Stern

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

On Tue, 15 Dec 2009, Tejun Heo wrote:

> Hello, Alan Stern.
>
> On 12/15/2009 06:33 AM, Alan Stern wrote:
> > You've spent some time working on the workqueue implementation, right?
> > I'd like to add comments or kerneldoc warning people about how
> > dangerous it can be to use flush_scheduled_work() and related
> > functions. Something like this:
> >
> > Think twice before calling this function! It's very easy
> > to get into trouble if you don't take great care. Either
> > of the following situations will lead to deadlock:
> >
> > Your code is running in the context of a scheduled
> > work routine.
> >
> > Your code or its caller holds a lock needed by
> > one of the work items currently on the workqueue.
> >
> > Since you generally don't know who your caller is, what locks
> > it holds, or what locks are needed by the items on the
> > workqueue, avoiding these situations is quite difficult.
>
> I think both problems can be detected by lockdep, right? So, they
> aren't that difficult to detect.

Maybe they can, now. It used to be they couldn't.

However, lockdep doesn't help much -- it tells you the cause of the
deadlock but it doesn't prevent the deadlock from occurring. The
programmer has to do this, by avoiding flush_scheduled_work(). I guess
that could be added to the comment.

> > Consider using cancel_work_sync() or cancel_delayed_work_sync()
> > instead. In most situations they will accomplish what you
> > need.
> >
> > Does this sound like a good idea? Certainly flush_scheduled_work()
> > is used in places where it shouldn't be.
>
> Yeah, recommending more work-specific constructs definitely would be
> better. It's bad that we can't recommend the use of flush_work() as
> it doesn't do cross-cpu flushing. Maybe that needs explanation too.

I'll do my comments, and you can do yours. :-)

Should these changes go through Andrew Morton? Or can you take them?

Alan Stern

2009-12-14 23:25:20

by Tejun Heo

[permalink] [raw]
Subject: Re: Warn people about flush_scheduled_work()

Hello,

On 12/15/2009 08:14 AM, Alan Stern wrote:
>> I think both problems can be detected by lockdep, right? So, they
>> aren't that difficult to detect.
>
> Maybe they can, now. It used to be they couldn't.

I think they can now. Workqueue workers acquire pseudo lock known to
lockdep around execution of a work function and flush_workqueue()
grabs and releases the pseudo lock too, so both are known to lockdep.

> However, lockdep doesn't help much -- it tells you the cause of the
> deadlock but it doesn't prevent the deadlock from occurring. The
> programmer has to do this, by avoiding flush_scheduled_work(). I guess
> that could be added to the comment.

As lockdep computes deadlocks from all possible combinations, it
usually detects deadlocks pretty quickly during devel cycle, so it
doesn't prevent them from happening but still helps a lot.

>> Yeah, recommending more work-specific constructs definitely would be
>> better. It's bad that we can't recommend the use of flush_work() as
>> it doesn't do cross-cpu flushing. Maybe that needs explanation too.
>
> I'll do my comments, and you can do yours. :-)

Fair enough. :-)

> Should these changes go through Andrew Morton? Or can you take them?

I already have a tree set up for workqueue changes, so I guess I can
take them.

Thanks.

--
tejun

2009-12-16 15:52:24

by Alan Stern

[permalink] [raw]
Subject: [PATCH] Warn about flush_scheduled_work()

This patch (as1319) adds kerneldoc and a pointed warning to
flush_scheduled_work().

Signed-off-by: Alan Stern <[email protected]>

---

Index: usb-2.6/kernel/workqueue.c
===================================================================
--- usb-2.6.orig/kernel/workqueue.c
+++ usb-2.6/kernel/workqueue.c
@@ -720,6 +720,30 @@ int schedule_on_each_cpu(work_func_t fun
return 0;
}

+/**
+ * flush_scheduled_work - ensure that any scheduled work has run to completion.
+ *
+ * Forces execution of the kernel-global workqueue and blocks until its
+ * completion.
+ *
+ * Think twice before calling this function! It's very easy to get into
+ * trouble if you don't take great care. Either of the following situations
+ * will lead to deadlock:
+ *
+ * One of the work items currently on the workqueue needs to acquire
+ * a lock held by your code or its caller.
+ *
+ * Your code is running in the context of a work routine.
+ *
+ * They will be detected by lockdep when they occur, but the first might not
+ * occur very often. It depends on what work items are on the workqueue and
+ * what locks they need, which you have no control over.
+ *
+ * In most situations flushing the entire workqueue is overkill; you merely
+ * need to know that a particular work item isn't queued and isn't running.
+ * In such cases you should use cancel_delayed_work_sync() or
+ * cancel_work_sync() instead.
+ */
void flush_scheduled_work(void)
{
flush_workqueue(keventd_wq);

2009-12-21 08:23:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] Warn about flush_scheduled_work()

On 12/17/2009 12:52 AM, Alan Stern wrote:
> This patch (as1319) adds kerneldoc and a pointed warning to
> flush_scheduled_work().
>
> Signed-off-by: Alan Stern <[email protected]>

Queued in local queue. Will commit to wq#for-next when and if it
opens. If cmwq gets nacked upstream. I'll route this one separately.
Thanks.

--
tejun

2010-02-12 08:40:26

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] Warn about flush_scheduled_work()

On 12/21/2009 05:26 PM, Tejun Heo wrote:
> On 12/17/2009 12:52 AM, Alan Stern wrote:
>> This patch (as1319) adds kerneldoc and a pointed warning to
>> flush_scheduled_work().
>>
>> Signed-off-by: Alan Stern <[email protected]>

applied to wq tree. will push out when merge window opens.

Thanks.

--
tejun