2009-07-15 18:43:12

by Oleg Nesterov

Subject: Re: FW: avoiding run_workqueue() recursion

Hi Anirban,

On 07/14, Anirban Sinha wrote:
>
> >I had a question about one of your previous commits:
> >
> >: commit 2355b70fd59cb5be7de2052a9edeee7afb7ff099
> >: Author: Lai Jiangshan <[email protected]>
> >: Date: Thu Apr 2 16:58:24 2009 -0700
> >:
> >: workqueue: avoid recursion in run_workqueue()
> >
> >http://git.kernel.org/linus/2355b70fd59cb5be7de2052a9edeee7afb7ff099
> >
> >
> >I saw a few discussions on the mailing list around this. I also did see
> >your "I still don't know why I merged ..." comment on this. I have the
> >following observations. I am new in the kernel hacking world, so please
> >bear with me.
> >
> >(a) I do agree that flushing the work queues from within
> >run_workqueue() is buggy in itself.
> >
> >(b) I do also agree that a recursive call to run_workqueue() is bad
> >for the reasons cited in the commit log (even though I had a good
> >laugh when I saw the "morton gets to eat his hat" stuff :)).
> >
> >(c) I am a little puzzled by the change the patch made. If we let the
> >call sleep on completion when keventd is itself running the
> >flush_workqueue(), are we not introducing a deadlock? If the thread
> >that is itself responsible for walking the workqueue and dispatching
> >the work functions goes to sleep, who will wake it up?

Yes, this will deadlock. Note the WARN_ON().

> >In my honest opinion, I think we should simply return when (cwq->thread
> >== current) is true. I think in that condition, it should be just a
> >nop.

If we just return silently, we do not flush but hide the problem?
And this can lead to other problems which are very hard to
trigger/debug.

Oleg.


2009-07-15 18:52:42

by Anirban Sinha

Subject: RE: FW: avoiding run_workqueue() recursion

Hi Oleg:

>If we just return silently, we do not flush but hide the problem?
>And this can lead to other problems which are very hard to
>trigger/debug.

True. I think flushing is an invalid operation for a thread that is
already walking the work-queue, like keventd. It is inherently a bug in
the code somewhere else (maybe in a work function?). I liked your idea
of replacing WARN_ON() with BUG_ON(), but I do understand that a panic
could be a bigger hammer here. Maybe we can have some sort of
restrictions or conventions for writing work functions? I don't know.

Ani
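
For readers following the thread, the two options discussed above (return
silently vs. warn and then wait) can be compared in a small userspace C
model. This is only a sketch under stated assumptions: model_wq,
model_flush and the policy/outcome enums are hypothetical names invented
for illustration and are not the kernel's workqueue code, and the model
reports a would-be deadlock instead of actually blocking.

```c
/* Hypothetical userspace model of the self-flush case.
 * None of these names exist in the kernel. */
enum policy  { SILENT_RETURN, WARN_AND_WAIT };
enum outcome { FLUSHED, RETURNED_UNFLUSHED, WOULD_DEADLOCK };

struct model_wq {
    int worker_id;  /* the only thread that ever runs queued work */
    int pending;    /* work items queued or running, not yet finished */
    int warnings;   /* stands in for WARN_ON() firing */
};

static enum outcome model_flush(struct model_wq *wq, int caller_id,
                                enum policy p)
{
    if (caller_id != wq->worker_id) {
        /* Normal case: the caller sleeps, the worker drains the
         * queue and then wakes the caller up. */
        wq->pending = 0;
        return FLUSHED;
    }

    /* Self-flush: the caller is the worker thread itself. */
    if (p == SILENT_RETURN)
        return RETURNED_UNFLUSHED;  /* no warning, work still pending */

    wq->warnings++;                 /* make the bug visible */
    /* The worker would now sleep waiting for work that only it can
     * finish; the model reports that instead of blocking. */
    return WOULD_DEADLOCK;
}
```

With a queue whose worker has id 1 and two pending items,
model_flush(&wq, 1, SILENT_RETURN) leaves both items pending and records
no warning, which is the "hide the problem" case; WARN_AND_WAIT bumps the
warning counter before reporting the deadlock, so the misuse is at least
visible.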