2003-09-23 13:41:37

by Ian Kent

Subject: [PATCH] autofs4 deadlock during expire - kernel 2.6


I have noticed that a deadlock can occur in the autofs4 module in the 2.6
kernel and RedHat kernels with the O(1) scheduler patch. The easiest way
to reproduce it is with an autofs-4.0.0 tree mount with at least 10 mounts
within a Nautilus desktop environment with the module debug flag set.
Perhaps this is because of the longish amount of time spent in the expire
state.

The scenario:

1) An expire locates an expirable dentry and signals the user space
daemon to umount it.
2) During the umount operation Nautilus notices and scans entries in the
directory tree triggering a revalidate/lookup.
3) autofs4 places the requesting process on the umount wait queue.
4) At completion of the expire both processes are released from the wait
queue.
5) If the lookup gets a quantum first, a deadlock occurs and expiration
stops, leaving an autofs mount that is permanently busy.

The following patch fixes this by ensuring that the expire gets a look-in
before the revalidate/lookup continues.

The problem of the remount that this causes remains, and I am not sure how
to deal with that at this stage.

Comments and suggestions please.

Is this acceptable for inclusion in the kernel?
If so what should I do to make this happen?

diff -Nur autofs4-2.6.orig/fs/autofs4/autofs_i.h autofs4-2.6.deadlock/fs/autofs4/autofs_i.h
--- autofs4-2.6.orig/fs/autofs4/autofs_i.h 2003-09-09 03:50:14.000000000 +0800
+++ autofs4-2.6.deadlock/fs/autofs4/autofs_i.h 2003-09-22 22:48:11.000000000 +0800
@@ -82,6 +82,7 @@
char *name;
/* This is for status reporting upon return */
int status;
+ struct task_struct *owner;
int wait_ctr;
};

diff -Nur autofs4-2.6.orig/fs/autofs4/waitq.c autofs4-2.6.deadlock/fs/autofs4/waitq.c
--- autofs4-2.6.orig/fs/autofs4/waitq.c 2003-09-09 03:50:31.000000000 +0800
+++ autofs4-2.6.deadlock/fs/autofs4/waitq.c 2003-09-23 00:02:29.209789432 +0800
@@ -206,6 +206,11 @@

interruptible_sleep_on(&wq->queue);

+ if (waitqueue_active(&wq->queue) && current != wq->owner) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(wq->wait_ctr * (HZ/10));
+ }
+
spin_lock_irqsave(&current->sighand->siglock, irqflags);
current->blocked = oldset;
recalc_sigpending();

--

,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: [email protected]
v Web: http://themaw.net/


2003-09-23 15:32:59

by Mike Waychison

Subject: Re: [PATCH] autofs4 deadlock during expire - kernel 2.6

Ian Kent wrote:
> I have noticed that a deadlock can occur in the autofs4 module in the 2.6
> kernel and RedHat kernels with the O(1) scheduler patch. The easiest way
> to reproduce it is with an autofs-4.0.0 tree mount with at least 10 mounts
> within a Nautilus desktop environment with the module debug flag set.
> Perhaps this is because of the longish amount of time spent in the expire
> state.
>
> The scenario:
>
> 1) An expire locates an expirable dentry and signals the user space
> daemon to umount it.
> 2) During the umount operation Nautilus notices and scans entries in the
> directory tree triggering a revalidate/lookup.
> 3) autofs4 places the requesting process on the umount wait queue.
> 4) At completion of the expire both processes are released from the wait
> queue.
> 5) If the lookup gets a quantum first, a deadlock occurs and expiration
> stops, leaving an autofs mount that is permanently busy.
>
> The following patch fixes this by ensuring that the expire gets a look-in
> before the revalidate/lookup continues.

I'm not extremely familiar with the autofs4 implementation, but why is
the expiring process in the wait queue to start off with? Doesn't
autofs4 let the daemon's pgrp bypass the waitqueue completely?

>
> The problem of the remount that this causes remains, and I am not sure how
> to deal with that at this stage.

This is purely a userspace issue. Nautilus trampling all over the
filesystem is completely unjustified. If they want a decent filesystem
change notification system put in place, let them propose something.
Caching mountpoint entries is a bad idea.

>
> Comments and suggestions please.
>
> Is this acceptable for inclusion in the kernel?
> If so what should I do to make this happen?
>
> diff -Nur autofs4-2.6.orig/fs/autofs4/autofs_i.h autofs4-2.6.deadlock/fs/autofs4/autofs_i.h
> --- autofs4-2.6.orig/fs/autofs4/autofs_i.h 2003-09-09 03:50:14.000000000 +0800
> +++ autofs4-2.6.deadlock/fs/autofs4/autofs_i.h 2003-09-22 22:48:11.000000000 +0800
> @@ -82,6 +82,7 @@
> char *name;
> /* This is for status reporting upon return */
> int status;
> + struct task_struct *owner;

This new field isn't even set anywhere!!

> int wait_ctr;
> };
>
> diff -Nur autofs4-2.6.orig/fs/autofs4/waitq.c autofs4-2.6.deadlock/fs/autofs4/waitq.c
> --- autofs4-2.6.orig/fs/autofs4/waitq.c 2003-09-09 03:50:31.000000000 +0800
> +++ autofs4-2.6.deadlock/fs/autofs4/waitq.c 2003-09-23 00:02:29.209789432 +0800
> @@ -206,6 +206,11 @@
>
> interruptible_sleep_on(&wq->queue);
>
> + if (waitqueue_active(&wq->queue) && current != wq->owner) {
> + set_current_state(TASK_INTERRUPTIBLE);
> + schedule_timeout(wq->wait_ctr * (HZ/10));
> + }
> +

This doesn't avoid the problem; it just makes it go away 99.99% of the
time. I would suggest using a complete_all from the owner of the wq,
and let all other waiters wait for the completion.
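
Something along these lines might work (untested sketch; the completion
field and the two helpers are illustrative names only, not taken from the
autofs4 sources):

	#include <linux/completion.h>

	struct autofs_wait_queue {
		/* ... existing fields ... */
		struct completion expire_done;	/* init_completion() where the wq is created */
	};

	/* The owner of the wq (the process that posted the expire) releases
	 * everyone else once the daemon has finished the umount: */
	static void autofs4_expire_finish(struct autofs_wait_queue *wq, int status)
	{
		wq->status = status;
		complete_all(&wq->expire_done);
	}

	/* Every other waiter (revalidate/lookup) blocks on the completion
	 * instead of sleeping on the bare wait queue: */
	static int autofs4_expire_wait(struct autofs_wait_queue *wq)
	{
		wait_for_completion(&wq->expire_done);
		return wq->status;
	}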

Again, why is the umount on this queue?

Mike Waychison

2003-09-23 16:18:39

by Ian Kent

Subject: Re: [PATCH] autofs4 deadlock during expire - kernel 2.6

On Tue, 23 Sep 2003, Mike Waychison wrote:

> I'm not extremely familiar with the autofs4 implementation, but why is
> the expiring process in the wait queue to start off with? Doesn't
> autofs4 let the daemon's pgrp bypass the waitqueue completely?
>

The daemon issues expire requests to the kernel module every timeout
interval. When the kernel module identifies an eligible dentry to expire
it posts a message to the user space daemon and waits on a wait queue
for the daemon to signal completion. Any other processes that do a
lookup on the dentry being expired also wait on the queue for completion.
Upon completion the waiters are awoken. The O(1) scheduler never seems
to pick the process that posted the expire, whereas the 2.4 kernel does.
I have tried an exclusive wait queue without success (among other
things). Delaying waiters other than the owner seems a fairly effective
way of letting the owner complete its task before the other waiters
continue.
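
In outline the flow looks like this (simplified; the actual waitq.c code
differs in detail):

	/* Both the expire and any lookups on the expiring dentry end up
	 * sleeping on the same per-request queue: */
	interruptible_sleep_on(&wq->queue);

	/* The daemon's "expire done" ioctl sets the result and wakes every
	 * waiter at once; from here the expire and the lookups race for
	 * the CPU, and the O(1) scheduler tends to pick a lookup first: */
	wq->status = status;
	wake_up_interruptible(&wq->queue);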

> >
> > The problem of the remount that this causes remains, and I am not sure how
> > to deal with that at this stage.
>
> This is purely a userspace issue. Nautilus trampling all over the
> filesystem is completely unjustified. If they want a decent filesystem
> change notification system put in place, let them propose something.
> Caching mountpoint entries is a bad idea.

Yes, and maybe I can do something about it in the userspace daemon, but
it is a problem that exists and that we need to be aware of.

I have subscribed to the Nautilus list and got some pointers to the parts
of the code that may be in question. I haven't had a chance to examine it
yet.

>
> >
> > Comments and suggestions please.
> >
> > Is this acceptable for inclusion in the kernel?
> > If so what should I do to make this happen?
> >
> > diff -Nur autofs4-2.6.orig/fs/autofs4/autofs_i.h autofs4-2.6.deadlock/fs/autofs4/autofs_i.h
> > --- autofs4-2.6.orig/fs/autofs4/autofs_i.h 2003-09-09 03:50:14.000000000 +0800
> > +++ autofs4-2.6.deadlock/fs/autofs4/autofs_i.h 2003-09-22 22:48:11.000000000 +0800
> > @@ -82,6 +82,7 @@
> > char *name;
> > /* This is for status reporting upon return */
> > int status;
> > + struct task_struct *owner;
>
> This new field isn't even set anywhere!!

Right you are, sorry, I'll repost a corrected patch.

It's late and I'll have to test it again to make sure it works, so that
won't be till tomorrow.

Clearly this will only work for the case of one additional waiter being
scheduled before the wait owner, which is the case that occurs here.
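
Roughly, the missing assignment goes where the wait queue entry is first
created in autofs4_wait() (untested; surrounding initialisation elided):

	/* ... existing initialisation of the newly allocated wq ... */
	wq->owner = current;	/* first waiter posted the request and owns it */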

>
> > int wait_ctr;
> > };
> >
> > diff -Nur autofs4-2.6.orig/fs/autofs4/waitq.c autofs4-2.6.deadlock/fs/autofs4/waitq.c
> > --- autofs4-2.6.orig/fs/autofs4/waitq.c 2003-09-09 03:50:31.000000000 +0800
> > +++ autofs4-2.6.deadlock/fs/autofs4/waitq.c 2003-09-23 00:02:29.209789432 +0800
> > @@ -206,6 +206,11 @@
> >
> > interruptible_sleep_on(&wq->queue);
> >
> > + if (waitqueue_active(&wq->queue) && current != wq->owner) {
> > + set_current_state(TASK_INTERRUPTIBLE);
> > + schedule_timeout(wq->wait_ctr * (HZ/10));
> > + }
> > +
>
> This doesn't avoid the problem; it just makes it go away 99.99% of the
> time. I would suggest using a complete_all from the owner of the wq,
> and let all other waiters wait for the completion.

I thought of that also. I tried to work a completion into the code
without success. Anyway, an exclusive wait queue, similar in concept,
didn't solve the problem. It was still necessary to delay releasing the
other waiters.

Perhaps you would like to offer some code as an example of how best to do
this with a completion or other means.

--

,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: [email protected]
v Web: http://themaw.net/