2004-11-05 10:03:27

by Michael Gernoth

Subject: Hanging NFS umounts with 2.4.27

Hi,
we have two SMP machines running Linux 2.4.27 with autofs which have a high
rate of mounting/umounting. When they are heavily used, a umount process
gets stuck about once a day, and the only solution is to reboot the machine.
We can also see this happen (much less often) on our UP workstations,
where on average only one student is working on a single machine at
a time.

Searching through the Changesets I found 1.1402.1.19:
http://linux.bkbits.net:8080/linux-2.4/[email protected]
After reverting this one, we have a stable umount-behaviour again.

Grepping through the log of the serial console finds the following
hanging umounts (the nfs-servers were up at these moments):
19969 ? S 9:32 /bin/umount //local/perl-5.004/.arch.os
23437 ? S 0:00 /bin/umount //local/maple/.arch.os
19969 ? S 9:40 /bin/umount //local/perl-5.004/.arch.os
23437 ? S 0:00 /bin/umount //local/maple/.arch.os
19969 ? S 9:59 /bin/umount //local/perl-5.004/.arch.os
23437 ? S 0:00 /bin/umount //local/maple/.arch.os
19969 ? S 10:13 /bin/umount //local/perl-5.004/.arch.os
23437 ? S 0:00 /bin/umount //local/maple/.arch.os
22491 ? S 0:00 /bin/umount //local/gnu-utils-1.0/.arch.os
23560 ? S 0:00 /bin/umount //proj/cipadm

The file-systems are mounted from a Solaris 9 machine with the following
options on the client-side:
rw,nosuid,nodev,retry=5,intr,rsize=8192,noquota,wsize=8192,hard,tcp,addr=...

We are running Debian/Testing with autofs version 4.1.3.

Our current Kernel-config is at:
http://wwwcip.informatik.uni-erlangen.de/~simigern/cip-generic-config
(This Kernel is patched with ACLs and the current autofs-patch, but the
behaviour can be reproduced with a vanilla kernel.org kernel)

Regards,
Michael

Please CC me on replies, as I am not subscribed to linux-kernel. Thanks.

--
Michael Gernoth Department of Computer Science IV
Martensstrasse 1 D-91058 Erlangen Germany University of Erlangen-Nuremberg
http://wwwcip.informatik.uni-erlangen.de/


2004-11-05 19:59:18

by Trond Myklebust

Subject: Re: Hanging NFS umounts with 2.4.27

NFS: Always wake up tasks that are waiting on the sillyrenamed file to
complete.
---
 unlink.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.4.28-rc1/fs/nfs/unlink.c
===================================================================
--- linux-2.4.28-rc1.orig/fs/nfs/unlink.c	2004-11-05 11:26:07.832922087 -0800
+++ linux-2.4.28-rc1/fs/nfs/unlink.c	2004-11-05 11:44:38.241824060 -0800
@@ -130,13 +130,14 @@ nfs_async_unlink_done(struct rpc_task *t
 	if (nfs_async_handle_jukebox(task))
 		return;
 	if (!dir)
-		return;
+		goto out;
 	dir_i = dir->d_inode;
 	nfs_zap_caches(dir_i);
 	NFS_PROTO(dir_i)->unlink_done(dir, &task->tk_msg);
 	put_rpccred(data->cred);
 	data->cred = NULL;
 	dput(dir);
+out:
 	data->completed = 1;
 	wake_up(&data->waitq);
 }
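
The effect of the one-line change can be illustrated outside the kernel. What follows is only a single-threaded userspace sketch, not kernel code: struct wait_queue, struct unlink_data, wake_up_all(), unlink_done_buggy(), unlink_done_fixed() and stuck_waiters() are all hypothetical names made up for the illustration. It models a waiter as a counter and shows that an early return before the completion signal leaves that waiter asleep, while jumping to a shared exit label does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct wait_queue {
	int sleepers;		/* tasks blocked until wake_up_all() */
};

struct unlink_data {
	const char *dir;	/* NULL models the "!dir" case in the patch */
	bool completed;		/* models data->completed */
	bool dir_released;	/* stands in for the cache zap, cred put, dput */
	struct wait_queue waitq;	/* models data->waitq */
};

/* Wake every sleeper on the queue (assumption: a counter is enough here). */
static void wake_up_all(struct wait_queue *q)
{
	q->sleepers = 0;
}

/* Pre-patch shape: the early return skips the completion signal. */
static void unlink_done_buggy(struct unlink_data *d)
{
	if (d->dir == NULL)
		return;		/* waiters are never woken on this path */
	d->dir_released = true;
	d->completed = true;
	wake_up_all(&d->waitq);
}

/* Post-patch shape: every path falls through the shared exit label. */
static void unlink_done_fixed(struct unlink_data *d)
{
	if (d->dir == NULL)
		goto out;
	d->dir_released = true;
out:
	d->completed = true;
	wake_up_all(&d->waitq);
}

/*
 * Run one completion callback with a single sleeping task and return
 * how many tasks are still asleep afterwards.
 */
static int stuck_waiters(void (*done)(struct unlink_data *), const char *dir)
{
	struct unlink_data d = {
		.dir = dir,
		.completed = false,
		.dir_released = false,
		.waitq = { .sleepers = 1 },
	};

	assert(done != NULL);
	done(&d);
	return d.waitq.sleepers;
}
```

In this model, stuck_waiters(unlink_done_buggy, NULL) leaves one task asleep, whereas stuck_waiters(unlink_done_fixed, NULL) leaves none; with a non-NULL dir both variants wake the waiter.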


Attachments:
linux-2.4.28-fix_unlink.dif (792.00 B)

2004-11-05 20:22:56

by Michael Gernoth

Subject: Re: Hanging NFS umounts with 2.4.27

On Fri, Nov 05, 2004 at 11:55:41AM -0800, Trond Myklebust wrote:
> On Fri, 05.11.2004 at 11:02 (+0100), Michael Gernoth wrote:
>
> > Searching through the Changesets I found 1.1402.1.19:
> > http://linux.bkbits.net:8080/linux-2.4/[email protected]
> > After reverting this one, we have a stable umount-behaviour again.
>
> Does the attached patch help at all?

I just applied it and rebooted the machines.
I can say more on Monday, when the next horde of users is
working on the machines.

Thanks,
Michael

2004-11-08 21:21:44

by Michael Gernoth

Subject: Re: Hanging NFS umounts with 2.4.27

On Fri, Nov 05, 2004 at 11:55:41AM -0800, Trond Myklebust wrote:
> On Fri, 05.11.2004 at 11:02 (+0100), Michael Gernoth wrote:
>
> > Searching through the Changesets I found 1.1402.1.19:
> > http://linux.bkbits.net:8080/linux-2.4/[email protected]
> > After reverting this one, we have a stable umount-behaviour again.
>
> Does the attached patch help at all?
>
> NFS: Always wake up tasks that are waiting on the sillyrenamed file to
> complete.

This seems to fix it for us. Neither my stress-test during the weekend
nor the students today were able to reproduce the hanging umounts :-)

Thanks,
Michael