From: "Ara.T.Howard"
To: Trond Myklebust
Cc: nfs@lists.sourceforge.net
Subject: Re: binaries becoming corrupt on nfs
Date: Wed, 16 Mar 2005 15:40:47 -0700 (MST)
In-Reply-To: <1110838426.24466.17.camel@lade.trondhjem.org>

On Mon, 14 Mar 2005, Trond Myklebust wrote:

> On Linux-2.4.x, mmap() will prevent the NFS client from clearing the cache
> in a timely fashion, and so pages that should have been thrown out of cache
> may become "frozen in" to the cache.
>
> In Linux-2.6.x, that problem should hopefully have been fixed due to a
> combination of updates to the memory management layer + NFS client fixes.

trond-

i'm still seeing this issue even though NO copying is occurring on mmap'd binaries.
the process used is now the built-in install program:

  install: all
  	$(install_prog) grid_ols/grid_ols $(bindir)
  	$(install_prog) subset/subset $(bindir)

install does not copy; it unlinks the dest and then writes a new file:

  jib:~/shared/dmspnl_new > strace install a b 2>&1 | tail -13
  unlink("b")                             = 0
  open("a", O_RDONLY|O_LARGEFILE)         = 3
  fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
  open("b", O_WRONLY|O_CREAT|O_LARGEFILE, 0100664) = 4
  fstat64(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
  fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
  read(3, "", 8192)                       = 0
  close(4)                                = 0
  close(3)                                = 0
  chmod("b", 0600)                        = 0
  chown32("b", -1, -1)                    = 0
  chmod("b", 0755)                        = 0
  exit_group(0)                           = ?

if i run 'make install' while these binaries are running on our cluster (almost
ensuring more than one of them has the file mmap'd), a small, random number of
nodes end up with corrupt caches, and every subsequent run of the binary on
those nodes fails.

this should not be - should it?

-a
--
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
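
For reference, the unlink-then-write sequence in the strace output can be sketched on a local POSIX filesystem. This is only an illustration of the behaviour the message expects, not a reproduction of the NFS problem: the file names, contents, and the helpers `replace_like_install` and `demo` are hypothetical, and the sketch says nothing about what an NFS client on another node does with its cached pages.

```python
import mmap
import os
import tempfile


def replace_like_install(src, dst):
    """Mimic the traced sequence: unlink dst, then create and write a new file."""
    try:
        os.unlink(dst)
    except FileNotFoundError:
        pass
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())
    os.chmod(dst, 0o755)


def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a")   # the freshly built binary (stand-in)
        dst = os.path.join(d, "b")   # the installed binary (stand-in)
        with open(src, "wb") as f:
            f.write(b"new binary\n")
        with open(dst, "wb") as f:
            f.write(b"old binary\n")

        # Stand-in for a running process that has the installed file mmap'd.
        fd = os.open(dst, os.O_RDONLY)
        mapped = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        old_ino = os.fstat(fd).st_ino

        replace_like_install(src, dst)

        # The path now names a different inode; the old mapping is untouched.
        new_ino = os.stat(dst).st_ino
        with open(dst, "rb") as f:
            on_disk = f.read()
        still_mapped = mapped[:]

        mapped.close()
        os.close(fd)
        return old_ino, new_ino, still_mapped, on_disk


if __name__ == "__main__":
    old_ino, new_ino, still_mapped, on_disk = demo()
    print("inode changed:", old_ino != new_ino)
    print("old mapping:", still_mapped)
    print("new file:", on_disk)
```

On a local filesystem the old mapping keeps the old bytes and the path resolves to a new inode, so a process started before the install and one started after it never share pages. The sketch assumes a POSIX system, where unlinking a file that is still open or mapped is allowed.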