2004-03-12 21:01:15

by Tim Hockin

[permalink] [raw]
Subject: calling flush_scheduled_work()

We've recently bumped into an issue, and I'm not sure which is the real bug.

In short we have a case where mntput() is called from the keventd workqueue.
When that mntput() hit an NFS mount, we got a deadlock. It turns out that
deep in the RPC code, someone calls flush_scheduled_work(). Deadlock.
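
The shape of it is roughly this (a minimal sketch only; the handler and the
mount pointer are made up for illustration, the workqueue calls are the real
ones):

#include <linux/workqueue.h>
#include <linux/mount.h>

static struct vfsmount *nfs_mnt;	/* illustrative: some NFS mount */

/* runs in keventd context once scheduled */
static void cleanup_handler(void *data)
{
	/*
	 * Dropping the last reference to an NFS mount ends up in the RPC
	 * shutdown path, which calls flush_scheduled_work().  keventd is
	 * then waiting for its own queue to drain, but the queue cannot
	 * drain until this handler returns: deadlock.
	 */
	mntput(nfs_mnt);
}

static DECLARE_WORK(cleanup_work, cleanup_handler, NULL);

static void trigger(void)
{
	schedule_work(&cleanup_work);	/* queued onto keventd */
}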

So what is the real bug?

Is it verboten to call mntput() from keventd? What other things might lead
to a flush_scheduled_work() and must therefore be avoided?

Should callers of flush_scheduled_work() be changed to use private
workqueues? There are 31 calls that I got from grep. 25 are in drivers/, 1
in ncpfs, 3 in nfs4, 2 in sunrpc. The drivers/ are *probably* ok. Should
those other 6 be changed?

Either way, it seems like there should maybe be a check and a badness
warning if flush_workqueue() is called from within that same workqueue.

Which avenue should we follow? Our own problem can be fixed differently,
but I didn't want to just ignore this unstated assumption that it is safe to
call flush_scheduled_work() anywhere.

Tim

--
Tim Hockin
Sun Microsystems, Linux Software Engineering
[email protected]
All opinions are my own, not Sun's


2004-03-12 22:00:52

by Trond Myklebust

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, 12/03/2004 at 15:58, Tim Hockin wrote:
> We've recently bumped into an issue, and I'm not sure which is the real bug.
>
> In short we have a case where mntput() is called from the keventd workqueue.
> When that mntput() hit an NFS mount, we got a deadlock. It turns out that
> deep in the RPC code, someone calls flush_scheduled_work(). Deadlock.
>
> So what is the real bug?
>
> Is it verboten to call mntput() from keventd? What other things might lead
> to a flush_scheduled_work() and must therefore be avoided?
>
> Should callers of flush_scheduled_work() be changed to use private
> workqueues? There are 31 calls that I got from grep. 25 are in drivers/, 1
> in ncpfs, 3 in nfs4, 2 in sunrpc. The drivers/ are *probably* ok. Should
> those other 6 be changed?

It would be dead easy to change RPC+NFS to use their own workqueue. In
fact I've already considered that several times in the places where the
NFSv4 state stuff gets hairy. ...but it might be nice not to have to
set up all these (mostly) idle threads for exclusive use by NFS just in
order to catch a corner case.

Could we therefore perhaps consider setting up one or more global
workqueues that would be alternatives to keventd?
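
For reference, the private-queue variant would be something like this (a
minimal sketch against the 2.6-era workqueue API; the queue name and the
call sites are illustrative only):

#include <linux/workqueue.h>

static struct workqueue_struct *rpciod_workqueue;

int rpciod_up(void)
{
	rpciod_workqueue = create_workqueue("rpciod");
	return rpciod_workqueue ? 0 : -ENOMEM;
}

void rpciod_down(void)
{
	flush_workqueue(rpciod_workqueue);	/* waits only for RPC's own work */
	destroy_workqueue(rpciod_workqueue);
}

/* callers then use queue_work(rpciod_workqueue, &some_work) and
 * flush_workqueue(rpciod_workqueue) instead of schedule_work() and
 * flush_scheduled_work(), so a flush never touches unrelated keventd work. */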

Cheers,
Trond

2004-03-12 22:38:43

by Tim Hockin

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, Mar 12, 2004 at 05:00:48PM -0500, Trond Myklebust wrote:
> It would be dead easy to change RPC+NFS to use their own workqueue. In
> fact I've already considered that several times in the places where the
> NFSv4 state stuff gets hairy. ...but it might be nice not to have to
> set up all these (mostly) idle threads for exclusive use by NFS just in
> order to catch a corner case.
>
> Could we therefore perhaps consider setting up one or more global
> workqueues that would be alternatives to keventd?

That just divides up the problem, but doesn't solve it. The simplest would
be to print a badness warning and somehow get out of the flush. Then anyone
who is triggering the corner case (us, in this case) can just solve it
themselves. It's not pretty.

In short, it's dubious for ANYONE to call flush_scheduled_work() on a
workqueue that they don't own.

Tim

2004-03-12 23:19:23

by Trond Myklebust

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, 12/03/2004 at 17:38, Tim Hockin wrote:
> That just divides up the problem, but doesn't solve it. The simplest would
> be to print a badness warning and somehow get out of the flush. Then anyone
> who is triggering the corner case (us, in this case) can just solve it
> themselves. It's not pretty.

Ewww.... That's saying "I'm just going to ignore the problem and pretend
I can continue". At least the hang will tell you that there *is* a
problem and points to where it is happening.

If I understood your description of where it is happening, then in this
case, continuing is likely to lead to an Oops later when all those open
RPC up/downcall pipes find that their parent directory has disappeared
from underneath them.

> In short, it's dubious for ANYONE to call flush_scheduled_work() on a
> workqueue that they don't own.

Huh? You're saying that just because work has been scheduled on some
third party thread, you should not be allowed to wait on completion of
that work? That is a clearly unreasonable statement... Imagine doing
even memory allocation in such an environment...

I agree that we need to identify and solve the deadlocking problems, but
let's not start constricting developers with completely arbitrary rules.

The *real* problem here is that mntput() is being run from keventd, and is
triggering code that was assumed to be running from an ordinary process
context. Let's concentrate on fixing that...

Cheers,
Trond

2004-03-12 23:25:58

by Andrew Morton

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

Tim Hockin <[email protected]> wrote:
>
> We've recently bumped into an issue, and I'm not sure which is the real bug.
>
> In short we have a case where mntput() is called from the keventd workqueue.
> When that mntput() hit an NFS mount, we got a deadlock. It turns out that
> deep in the RPC code, someone calls flush_scheduled_work(). Deadlock.

Seems simple enough to fix the workqueue code to handle this situation.

Wanna test this for me?

---

25-akpm/kernel/workqueue.c | 8 ++++++++
1 files changed, 8 insertions(+)

diff -puN kernel/workqueue.c~flush_scheduled_work-deadlock-fix kernel/workqueue.c
--- 25/kernel/workqueue.c~flush_scheduled_work-deadlock-fix Fri Mar 12 15:24:29 2004
+++ 25-akpm/kernel/workqueue.c Fri Mar 12 15:25:46 2004
@@ -229,6 +229,14 @@ void fastcall flush_workqueue(struct wor
continue;
cwq = wq->cpu_wq + cpu;

+ if (cwq->thread == current) {
+ /*
+ * Probably keventd trying to flush its own queue.
+ * So just run it by hand rather than deadlocking
+ */
+ run_workqueue(cwq);
+ continue;
+ }
spin_lock_irq(&cwq->lock);
sequence_needed = cwq->insert_sequence;


_

2004-03-12 23:31:43

by Tim Hockin

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, Mar 12, 2004 at 06:19:14PM -0500, Trond Myklebust wrote:
> Ewww.... That's saying "I'm just going to ignore the problem and pretend
> I can continue". At least the hang will tell you that there *is* a
> problem and points to where it is happening.

Well, it needs to be non-silent. Whether that is a BUG() or a badness or a
panic..

> > In short, it's dubious for ANYONE to call flush_scheduled_work() on a
> > workqueue that they don't own.
>
> Huh? You're saying that just because work has been scheduled on some
> third party thread, you should not be allowed to wait on completion of
> that work? That is a clearly unreasonable statement... Imagine doing
> even memory allocation in such an environment...

Waiting for completion of your work is one thing. But you can't know what
else has been scheduled. In this case RPC doesn't know that our work is
calling it. By waiting for ALL work, it is assuming (silently) that it will
never be called from a keventd.

Either that assumption is true, in which case there needs to be a BUG and a
way out, or that assumption is false. I'd like to believe that the
assumption is false.

> The *real* problem here is that mntput() is being run from keventd, and is
> triggering code that was assumed to be running from an ordinary process
> context. Let's concentrate on fixing that...

Well, we did fix it in a different way. I just wanted to bring this up as
something that warrants at least a BUG(). But a BUG() and deadlock is
almost a panic().

So if mntput() just CAN NOT be called from keventd, that is a fine answer,
but something should be written down about that, and about all the other
things that can not be called from keventd.

--
Tim Hockin
Sun Microsystems, Linux Software Engineering
[email protected]
All opinions are my own, not Sun's

2004-03-13 00:20:45

by Tim Hockin

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, Mar 12, 2004 at 03:27:47PM -0800, Andrew Morton wrote:
> Seems simple enough to fix the workqueue code to handle this situation.

Indeed. I should have done that myself. Thanks.

> Wanna test this for me?

Works for me.

--
Tim Hockin
Sun Microsystems, Linux Software Engineering
[email protected]
All opinions are my own, not Sun's

2004-03-13 01:17:57

by Trond Myklebust

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, 12/03/2004 at 18:27, Andrew Morton wrote:
> Seems simple enough to fix the workqueue code to handle this situation.
>
> Wanna test this for me?

What if one of the work items on the queue has another call to
flush_scheduled_work()? (which again finds itself calling another
instance, ...)
This fix doesn't really resolve the deadlock issue. It just reduces the
possibilities of it having any consequences...

Could you please at least include a dump_stack() before the call to
run_workqueue() so that we can catch the bug in the act, and attempt to
fix it.

Cheers,
Trond

2004-03-13 02:12:23

by Andrew Morton

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

Trond Myklebust <[email protected]> wrote:
>
> On Fri, 12/03/2004 at 18:27, Andrew Morton wrote:
> > Seems simple enough to fix the workqueue code to handle this situation.
> >
> > Wanna test this for me?
>
> What if one of the work items on the queue has another call to
> flush_scheduled_work()? (which again finds itself calling another
> instance, ...)

I fixed that with the old solution-via-changelog:


Because keventd is a resource which is shared between unrelated parts
of the kernel it is possible for one person's workqueue handler to
accidentally call another person's flush_scheduled_work(). thockin
managed it by calling mntput() from a workqueue handler. It deadlocks.

It's simple enough to fix: teach flush_scheduled_work() to go direct
when it discovers that the calling thread is the one which should be
running the work.

Note that this can cause recursion. The depth of that recursion is
equal to the number of currently-queued works which themselves want to
call flush_scheduled_work(). If this ever exceeds three I'll eat my hat.


Really, it's so unlikely it's hardly worth bothering about.

> This fix doesn't really resolve the deadlock issue. It just reduces the
> possibilities of it having any consequences...
>
> Could you please at least include a dump_stack() before the call to
> run_workqueue() so that we can catch the bug in the act, and attempt to
> fix it.

How's this?



Add a debug check for workqueues nested more than three deep via the
direct-run-workqueue() path.


---

25-akpm/kernel/workqueue.c | 10 ++++++++++
1 files changed, 10 insertions(+)

diff -puN kernel/workqueue.c~flush_scheduled_work-recursion-detect kernel/workqueue.c
--- 25/kernel/workqueue.c~flush_scheduled_work-recursion-detect 2004-03-12 17:54:32.859424968 -0800
+++ 25-akpm/kernel/workqueue.c 2004-03-12 18:11:09.939845704 -0800
@@ -47,6 +47,7 @@ struct cpu_workqueue_struct {
struct workqueue_struct *wq;
task_t *thread;

+ int run_depth; /* Detect run_workqueue() recursion depth */
} ____cacheline_aligned;

/*
@@ -140,6 +141,13 @@ static inline void run_workqueue(struct
* done.
*/
spin_lock_irqsave(&cwq->lock, flags);
+ cwq->run_depth++;
+ if (cwq->run_depth > 3) {
+ /* morton gets to eat his hat */
+ printk("%s: recusrion depth exceeded: %d\n",
+ __FUNCTION__, cwq->run_depth);
+ dump_stack();
+ }
while (!list_empty(&cwq->worklist)) {
struct work_struct *work = list_entry(cwq->worklist.next,
struct work_struct, entry);
@@ -157,6 +165,7 @@ static inline void run_workqueue(struct
cwq->remove_sequence++;
wake_up(&cwq->work_done);
}
+ cwq->run_depth--;
spin_unlock_irqrestore(&cwq->lock, flags);
}

@@ -286,6 +295,7 @@ struct workqueue_struct *create_workqueu
wq = kmalloc(sizeof(*wq), GFP_KERNEL);
if (!wq)
return NULL;
+ memset(wq, 0, sizeof(*wq));

for (cpu = 0; cpu < NR_CPUS; cpu++) {
if (!cpu_online(cpu))

_

2004-03-13 02:25:09

by Trond Myklebust

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

On Fri, 12/03/2004 at 21:12, Andrew Morton wrote:
> > Could you please at least include a dump_stack() before the call to
> > run_workqueue() so that we can catch the bug in the act, and attempt to
> > fix it.
>
> How's this?

Excellent (aside from the spelling error)... Just for the record:
chocolate hats will not be acceptable as a forfeit. 8-)

Cheers,
Trond

2004-03-13 11:44:24

by Stefan Rompf

[permalink] [raw]
Subject: Re: calling flush_scheduled_work()

Andrew Morton wrote:

>> In short we have a case where mntput() is called from the keventd
>> workqueue. When that mntput() hit an NFS mount, we got a deadlock. It
>> turns out that deep in the RPC code, someone calls
>> flush_scheduled_work(). Deadlock.
>
> Seems simple enough to fix the workqueue code to handle this situation.

Code fixing one corner case won't help. Some time ago, there was a
deadlock between a network driver that called flush_scheduled_work() while
the kernel held the rtnl semaphore and work scheduled by the linkwatch code
that needs rtnl.

I had posted a patch that changed linkwatch not to block waiting for rtnl;
however, it was dropped in favor of fixing the driver (I don't own that card,
so I can't tell you whether it works by now).

However, this is another example of the problem: any code can
schedule_work(), and any other code can wait in any place for that work to
complete. As long as we don't have some consensus on what functions running
inside the keventd workqueue may (and may not) do, and on when it is OK to
call flush_scheduled_work(), we are always at risk that the workqueue
mechanism creates a deadlock by accident.
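
The shape of that earlier deadlock was roughly the following (a sketch; the
driver function names are made up, and the real linkwatch code differs in
detail):

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

/* Driver path, entered via dev_close() with the rtnl semaphore held,
 * waiting for keventd to drain: */
static int foo_close(struct net_device *dev)
{
	flush_scheduled_work();		/* waits for ALL keventd work */
	return 0;
}

/* Meanwhile, a keventd work item queued by the linkwatch code needs
 * the rtnl semaphore before it can finish: */
static void foo_linkwatch_event(void *dummy)
{
	rtnl_lock();			/* blocks: the close path holds rtnl */
	/* ... process link state changes ... */
	rtnl_unlock();
}

/* keventd cannot complete foo_linkwatch_event() until rtnl is dropped,
 * and rtnl is not dropped until flush_scheduled_work() returns. */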

Stefan