2007-02-06 23:31:33

by Oleg Nesterov

Subject: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not
scheduled, because in that case cancel_delayed_work()->del_timer_sync() never
returns true.

I don't know if there are any callers which may have problems, but this is
not so convenient, and the fix is very simple.

Q: it looks like we don't need the "struct workqueue_struct *wq" parameter. If the
timer was aborted successfully, get_wq_data() == wq. Is it worth adding a
new function?

Signed-off-by: Oleg Nesterov <[email protected]>

--- 6.20-rc6-mm3/kernel/workqueue.c~3_cdw 2007-02-06 23:09:34.000000000 +0300
+++ 6.20-rc6-mm3/kernel/workqueue.c 2007-02-06 23:42:43.000000000 +0300
@@ -565,6 +565,10 @@ EXPORT_SYMBOL(flush_work_keventd);
 void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq,
 				       struct delayed_work *dwork)
 {
+	/* Was it ever queued ? */
+	if (!get_wq_data(&dwork->work))
+		return;
+
 	while (!cancel_delayed_work(dwork))
 		flush_workqueue(wq);
 }
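
The hang described in the changelog can be illustrated with a small userspace model. This is only a sketch, not kernel code: every toy_* name is hypothetical, the struct is a stand-in for struct delayed_work, and a spin limit replaces the endless loop so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct delayed_work; not the real kernel layout. */
struct toy_dwork {
	void *wq_data;		/* models get_wq_data(): NULL until first queued */
	bool timer_pending;	/* models a pending delayed-work timer */
};

/* Models cancel_delayed_work(): true only if a pending timer was removed. */
static bool toy_cancel_delayed_work(struct toy_dwork *dw)
{
	bool was_pending = dw->timer_pending;

	dw->timer_pending = false;
	return was_pending;
}

/*
 * Models the cancel loop from the patch. Returns the number of failed
 * cancel attempts, or -1 once max_spins attempts have failed (the point
 * where the real loop would simply keep spinning).
 */
static int toy_cancel_rearming(struct toy_dwork *dw, bool with_guard,
			       int max_spins)
{
	int spins = 0;

	/* The guard added by the patch: was it ever queued? */
	if (with_guard && dw->wq_data == NULL)
		return 0;

	while (!toy_cancel_delayed_work(dw)) {
		if (++spins >= max_spins)
			return -1;
		/* flush_workqueue(wq) would run here; it cannot arm an idle dwork */
	}
	return spins;
}
```

For a zero-initialized toy_dwork, toy_cancel_rearming(&dw, false, 1000) gives up with -1, while the guarded variant returns 0 at once.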


2007-02-07 14:32:00

by Daniel Drake

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

Oleg Nesterov wrote:
> cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not
> scheduled, because in that case cancel_delayed_work()->del_timer_sync() never
> returns true.

Thanks! We hit this problem before with the zd1211rw driver and avoided
using cancel_rearming_delayed_workqueue() for this reason. I never did
get around to looking into whether the function itself could be fixed,
although I see not much effort would have been needed :)

Daniel

2007-02-07 15:14:32

by Oleg Nesterov

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

On 02/07, Daniel Drake wrote:
>
> Oleg Nesterov wrote:
> >cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not
> >scheduled, because in that case cancel_delayed_work()->del_timer_sync()
> >never
> >returns true.
>
> Thanks! We hit this problem before with the zd1211rw driver and avoided
> using cancel_rearming_delayed_workqueue() for this reason.

Great. But I am afraid my changelog was incomplete. This patch only fixes
the cancel_rearming_delayed_workqueue(freshly_initialized_dwork) lockup.

The following code

schedule_delayed_work(dw);
cancel_rearming_delayed_workqueue(dw); // OK
cancel_rearming_delayed_workqueue(dw); // HANGS!

still doesn't work.

Is it worth fixing? The fix is very simple, and probably makes sense by
itself:

cancel_delayed_work:

- work_release(&work->work);
+ work->work.data = NULL;

Oleg.
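
The double-cancel case can be modelled in userspace as well. This is a sketch under a stated assumption: that work_release() leaves the workqueue pointer in place, so the "was it ever queued" guard no longer helps on the second call, whereas storing NULL does. All toy_* names are hypothetical and the spin limit stands in for the real endless loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct delayed_work; not the real kernel layout. */
struct toy_dwork {
	void *wq_data;		/* models work->data / get_wq_data() */
	bool timer_pending;	/* models a pending delayed-work timer */
};

static int toy_wq;		/* placeholder object standing in for a workqueue */

/* Models schedule_delayed_work(): remember the queue and arm the timer. */
static void toy_schedule(struct toy_dwork *dw)
{
	dw->wq_data = &toy_wq;
	dw->timer_pending = true;
}

/*
 * Models cancel_delayed_work(). With clear_data == false the queue pointer
 * survives (assumed work_release() behaviour); with clear_data == true it
 * is reset, as in the proposed "work->work.data = NULL".
 */
static bool toy_cancel(struct toy_dwork *dw, bool clear_data)
{
	if (!dw->timer_pending)
		return false;
	dw->timer_pending = false;
	if (clear_data)
		dw->wq_data = NULL;
	return true;
}

/* Returns true if the cancel completes, false if it would spin forever. */
static bool toy_cancel_rearming(struct toy_dwork *dw, bool clear_data)
{
	int spins;

	if (dw->wq_data == NULL)	/* guard from patch 3/6 */
		return true;
	for (spins = 0; spins < 1000; spins++)
		if (toy_cancel(dw, clear_data))
			return true;
	return false;
}
```

With clear_data == false the second toy_cancel_rearming() call on the same object reports a hang; with clear_data == true both calls complete.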

2007-02-07 17:41:40

by Oleg Nesterov

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

On 02/07, Oleg Nesterov wrote:
>
> The following code
>
> schedule_delayed_work(dw);
> cancel_rearming_delayed_workqueue(dw); // OK
> cancel_rearming_delayed_workqueue(dw); // HANGS!
>
> still doesn't work.

I think we have another problem with delayed_works.

cancel_rearming_delayed_workqueue() doesn't guarantee that ->func() is not
running upon return. I don't know if it is a bug or not; the comment says nothing
about that.

However, we have callers which seem to assume the opposite, for example

	net/ipv4/ipvs/ip_vs_core.c

	module_exit
		ip_vs_cleanup
			ip_vs_control_cleanup
				cancel_rearming_delayed_work
		// done

This is unsafe. The module may be unloaded and the memory may be freed
while defense_work_handler() is still running/preempted.

Unless I missed something, which side should be fixed?

Oleg.

2007-02-08 03:09:01

by Simon Horman

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

On Wed, Feb 07, 2007 at 08:43:55PM +0300, Oleg Nesterov wrote:
> On 02/07, Oleg Nesterov wrote:
> >
> > The following code
> >
> > schedule_delayed_work(dw);
> > cancel_rearming_delayed_workqueue(dw); // OK
> > cancel_rearming_delayed_workqueue(dw); // HANGS!
> >
> > still doesn't work.
>
> I think we have another problem with delayed_works.
>
> cancel_rearming_delayed_workqueue() doesn't guarantee that ->func() is not
> running upon return. I don't know if it is a bug or not; the comment says nothing
> about that.
>
> However, we have callers which seem to assume the opposite, for example
>
> net/ipv4/ipvs/ip_vs_core.c
>
> module_exit
> ip_vs_cleanup
> ip_vs_control_cleanup
> cancel_rearming_delayed_work
> // done
>
> This is unsafe. The module may be unloaded and the memory may be freed
> while defense_work_handler() is still running/preempted.
>
> Unless I missed something, which side should be fixed?

Assuming the decision is to fix the ipvs side, is the fix
just to remove the call to cancel_rearming_delayed_work() in
ip_vs_control_cleanup() ?

--
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/

2007-02-08 08:35:52

by Oleg Nesterov

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

On 02/08, Horms wrote:
>
> On Wed, Feb 07, 2007 at 08:43:55PM +0300, Oleg Nesterov wrote:
> >
> > I think we have another problem with delayed_works.
> >
> > cancel_rearming_delayed_workqueue() doesn't guarantee that ->func() is not
> > running upon return. I don't know if it is a bug or not; the comment says nothing
> > about that.
> >
> > However, we have callers which seem to assume the opposite, for example
> >
> > net/ipv4/ipvs/ip_vs_core.c
> >
> > module_exit
> > ip_vs_cleanup
> > ip_vs_control_cleanup
> > cancel_rearming_delayed_work
> > // done
> >
> > This is unsafe. The module may be unloaded and the memory may be freed
> > while defense_work_handler() is still running/preempted.
> >
> > Unless I missed something, which side should be fixed?
>
> Assuming the decision is to fix the ipvs side, is the fix
> just to remove the call to cancel_rearming_delayed_work() in
> ip_vs_control_cleanup() ?

I think ip_vs_control_cleanup() should also do flush_workqueue() after
cancel_rearming_delayed_work().

This is ugly, because we have flush_work() but can't use it on delayed
works. This is possible to change, but not so trivial.

Andrew, do you think it is worth tweaking delayed works so that it would be
possible to use flush_work(dwork->work) ?

Oleg.
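
The window being discussed, and the effect of a flush after the cancel, can be sketched in userspace. This is a single-threaded model with hypothetical toy_* names, not the ipvs code: it assumes the handler has already re-armed its timer but has not yet returned, which is the state in which a cancel alone can succeed while ->func() is still executing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a self-rearming delayed work; not the kernel layout. */
struct toy_dwork {
	bool timer_pending;	/* the handler has re-armed the timer */
	bool handler_running;	/* ->func() has not returned yet */
};

/* Models cancel_delayed_work(): only the timer is affected. */
static bool toy_cancel(struct toy_dwork *dw)
{
	bool was_pending = dw->timer_pending;

	dw->timer_pending = false;
	return was_pending;
}

/* Models flush_workqueue(): returns once the running handler is done. */
static void toy_flush(struct toy_dwork *dw)
{
	dw->handler_running = false;
}

/*
 * Models a module-exit path. Returns true if no handler can still be
 * running afterwards, i.e. it would be safe to free the module's memory.
 */
static bool toy_cleanup_is_safe(struct toy_dwork *dw, bool final_flush)
{
	bool cancelled = toy_cancel(dw);

	/* Precondition of this sketch: the timer was armed. */
	assert(cancelled);
	(void)cancelled;

	if (final_flush)
		toy_flush(dw);
	return !dw->handler_running;
}
```

Starting from { .timer_pending = true, .handler_running = true }, the cancel-only path reports an unsafe state, and the path with the final flush reports a safe one.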

2007-02-08 08:40:05

by Andrew Morton

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

On Thu, 8 Feb 2007 11:35:39 +0300 Oleg Nesterov <[email protected]> wrote:

> On 02/08, Horms wrote:
> >
> > On Wed, Feb 07, 2007 at 08:43:55PM +0300, Oleg Nesterov wrote:
> > >
> > > I think we have another problem with delayed_works.
> > >
> > > cancel_rearming_delayed_workqueue() doesn't guarantee that ->func() is not
> > > running upon return. I don't know if it is a bug or not; the comment says nothing
> > > about that.
> > >
> > > However, we have callers which seem to assume the opposite, for example
> > >
> > > net/ipv4/ipvs/ip_vs_core.c
> > >
> > > module_exit
> > > ip_vs_cleanup
> > > ip_vs_control_cleanup
> > > cancel_rearming_delayed_work
> > > // done
> > >
> > > This is unsafe. The module may be unloaded and the memory may be freed
> > > while defense_work_handler() is still running/preempted.
> > >
> > > Unless I missed something, which side should be fixed?
> >
> > Assuming the decision is to fix the ipvs side, is the fix
> > just to remove the call to cancel_rearming_delayed_work() in
> > ip_vs_control_cleanup() ?
>
> I think ip_vs_control_cleanup() should also do flush_workqueue() after
> cancel_rearming_delayed_work().
>
> This is ugly, because we have flush_work() but can't use it on delayed
> works. This is possible to change, but not so trivial.
>
> Andrew, do you think it is worth tweaking delayed works so that it would be
> possible to use flush_work(dwork->work) ?
>

I've completely lost track of what you've been doing in there (this is a
problem) but sure, if the patch isn't too horrid it's always better to be
robust in the core than to have to work around inadequacies in the callers.

2007-02-08 09:46:16

by Oleg Nesterov

Subject: Re: [PATCH 3/6] workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork

On 02/08, Andrew Morton wrote:
>
> On Thu, 8 Feb 2007 11:35:39 +0300 Oleg Nesterov <[email protected]> wrote:
>
> > Andrew, do you think it is worth tweaking delayed works so that it would be
> > possible to use flush_work(dwork->work) ?
> >
>
> I've completely lost track of what you've been doing in there (this is a
> problem) but sure, if the patch isn't too horrid it's always better to be
> robust in the core than to have to work around inadequacies in the callers.

It is not so obvious to me what should be done. Note that this problem is not
connected to the recent changes; these were (I hope) completely transparent to
delayed works.

The comment for cancel_delayed_work() says

Note that the work callback function may still be running on return from
cancel_delayed_work(). Run flush_scheduled_work() or flush_work() to wait
on it.

The same is true for cancel_rearming_delayed_work(), but it is not documented. Note
also that the comment above is wrong: we can't use flush_work(dwork->work). It
was never supposed to work, because queue_delayed_work() uses work->data "wrongly".

Now,

	- We can change cancel_rearming_delayed_work() so it does a final
	  flush_workqueue(). But this means that the 2 flavors of cancel delayed
	  work will have a subtle difference.

OR

	- Document the fact that cancel_rearming_delayed_work() doesn't
	  guarantee that ->func() is not running upon return, and fix the
	  affected callers.

Finally, we can also tweak delayed works so it will actually be possible
to use flush_work(dwork->work) after cancel_{,rearming_}delayed_work().
This seems to make sense, but needs (hopefully not too horrid) changes.

And there are other problems. Currently cancel_rearming_delayed_work(dwork) will hang
if dwork was never scheduled, or if cancel_rearming_delayed_work() was already
called before. The first problem is solved by this patch, the second is still
there. The fix is simple _unless_ we are going to implement "flush_work() works
on dwork->work" above.

Oh, I can't make a decision, please tell me...

Oleg.