From: Birger Lammering
Message-ID: <15391.23663.215547.622349@stderr.science-computing.de>
Date: Tue, 18 Dec 2001 16:10:39 +0100
To: amd-dev@cs.columbia.edu, linux-kernel@vger.kernel.org
Subject: nfs3 problem: aix-server, amd, linux 2.4.10 - 2.4.17pre8 client

Hi,

forgot to cc this to linux-kernel and amd-dev. You might remember:
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0111.2/0562.html)
The bug is still hiding somewhere between nfs3 client (or server?) and
amd....

To: Ion Badulescu, trond.myklebust@fys.uio.no
Date: Mon, 17 Dec 2001 19:18:29 +0100

Hi Ion and Trond,

Ion Badulescu writes:
> So, can you try to get the same /proc/mounts line while mounting by hand,
> and see if the problem re-appears? The command should be something like
> mount -o nosuid,nodev,vers=3,tcp,intr,hard,rsize=32768,wsize=32768 ...

ok, done. But there was no lock-up.

> The reason I'm not really suspecting amd, but rather the kernel NFS
> client, is because amd is only involved in mounting the server, it doesn't
> do much after that. So unless there is a race condition somewhere which
> involves quickly unmounting and remounting the same share, I don't see how
> amd could be the cause here.
>
> > It's not only the bug in the TCP/IP or NFS driver that is
> > interesting. I guess for tracing it, it would be cool to have some
> > hint on what triggers the bug. So far I could not reproduce it with
> > manual mounts (I've printed out the man page and tried all kinds of
> > mount options already :-).
>
> Well, try the above. If that mount command doesn't reproduce the lock-up,
> then try forcing amd to keep the share mounted (simply have a shell
> chdir'ed into that directory) and see if it still locks up. If it does,
> then I'm at a loss...

The cp was started after cd'ing into the target directory, and it
locked up. So there is almost surely no race condition caused by
quickly mounting and umounting the share (btw. we would have seen that
in the tcpdump). I even "ls -l"'ed in another shell and saw that the
file size grew up to 786432 bytes until cp locked up. The remainder of
the file was copied in one go.

The lock-up cannot be reproduced in a trivial way without amd, and the
share is not umounted during the copy attempt.

I have no clue how to nail down the bug, unless Trond finds something
by inspecting the nfs-related changes from Linux 2.4.9 to
2.4.10.... (hint, hint :-) 2.4.9 and older don't show this behaviour...

An idea for a possible (and ugly) workaround that came up here was to
tell amd to use the mount command rather than the mount system
call. This can be done by editing the NIS map. I find this rather
inconvenient for our purpose - to say the least :-/.

Would it be possible to invent an amd.conf option
(e.g. 'nfs_program=mount') that tells amd to use the mount/umount
programs rather than the system calls? Or can I replace the mount
system call in conf/mount/mount_linux.c by a system("mount ...") call
and recompile? :-)

Cheers,
Birger
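
P.S. The "ls -l" observation above can be made repeatable with a small script, so that the stall point gets a timestamp that can be lined up with the tcpdump. The following is only a sketch under assumptions, not something that was run for this report: the function names (watch, find_stalls, in_wsize_units) and the 5-second stall threshold are made up for illustration, and WSIZE is simply the wsize=32768 from the mount options quoted above. Note that on a hard mount the stat call itself may block while the client is stuck, and attribute caching can delay the size seen on the client.

```python
import os
import time

WSIZE = 32768  # wsize from the quoted mount options (assumed relevant here)


def find_stalls(samples, min_stall=5.0):
    """Return (size, start_time, duration) for each period without growth.

    samples: iterable of (timestamp_seconds, size_bytes) in time order.
    A stall is a run of samples with identical size that spans at least
    min_stall seconds (hypothetical threshold, tune as needed).
    """
    stalls = []
    run_start = None
    run_size = None
    last_t = None
    for t, size in samples:
        if size != run_size:
            # The previous run ended; record it if it lasted long enough.
            if run_size is not None and last_t - run_start >= min_stall:
                stalls.append((run_size, run_start, last_t - run_start))
            run_start, run_size = t, size
        last_t = t
    # The final run may also be a stall (or simply the finished file).
    if run_size is not None and last_t - run_start >= min_stall:
        stalls.append((run_size, run_start, last_t - run_start))
    return stalls


def in_wsize_units(size, wsize=WSIZE):
    """Express a size as (whole wsize blocks, leftover bytes)."""
    return divmod(size, wsize)


def watch(path, interval=1.0, duration=60.0):
    """Poll the file size, like repeated 'ls -l', and collect samples."""
    samples = []
    deadline = time.time() + duration
    while time.time() < deadline:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            size = 0  # cp has not created the target yet
        samples.append((time.time(), size))
        time.sleep(interval)
    return samples
```

To use it, call watch() on the copy target from a second shell while cp runs, then pass the samples to find_stalls(). For the size reported above, in_wsize_units(786432) gives 24 blocks with no remainder, i.e. the copy stopped exactly on a wsize boundary; whether that means anything is a guess.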