From: "Ara.T.Howard" <Ara.T.Howard@noaa.gov>
Subject: Re: binaries becoming corrupt on nfs
Date: Wed, 16 Mar 2005 15:40:47 -0700 (MST)
Message-ID: <Pine.LNX.4.60.0503161535280.2370@harp.ngdc.noaa.gov>
References: <Pine.LNX.4.60.0503141414590.4702@harp.ngdc.noaa.gov>
 <1110835899.19295.42.camel@lade.trondhjem.org>
 <Pine.LNX.4.60.0503141434390.4702@harp.ngdc.noaa.gov>
 <1110836857.24466.4.camel@lade.trondhjem.org>
 <Pine.LNX.4.60.0503141456140.4702@harp.ngdc.noaa.gov>
 <1110838426.24466.17.camel@lade.trondhjem.org>
Reply-To: "Ara.T.Howard" <Ara.T.Howard@noaa.gov>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: nfs@lists.sourceforge.net
To: Trond Myklebust <trond.myklebust@fys.uio.no>
In-Reply-To: <1110838426.24466.17.camel@lade.trondhjem.org>
Sender: nfs-admin@lists.sourceforge.net
Errors-To: nfs-admin@lists.sourceforge.net

On Mon, 14 Mar 2005, Trond Myklebust wrote:

> On Linux-2.4.x, mmap() will prevent the NFS client from clearing the cache
> in a timely fashion, and so pages that should have been thrown out of cache
> may become "frozen in" to the cache.
>
> In Linux-2.6.x, that problem should hopefully have been fixed due to a
> combination of updates to the memory management layer + NFS client fixes.

trond-

i'm still seeing this issue even though NO copying is occurring on mmap'd
binaries.  the process used is now the built-in install program:

   install: all
           $(install_prog) grid_ols/grid_ols $(bindir)
           $(install_prog) subset/subset $(bindir)

install does not copy over the dest in place; it unlinks the dest and then
writes a new file:

   jib:~/shared/dmspnl_new > strace install a b 2>&1 | tail -13
   unlink("b")                             = 0
   open("a", O_RDONLY|O_LARGEFILE)         = 3
   fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
   open("b", O_WRONLY|O_CREAT|O_LARGEFILE, 0100664) = 4
   fstat64(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
   fstat64(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
   read(3, "", 8192)                       = 0
   close(4)                                = 0
   close(3)                                = 0
   chmod("b", 0600)                        = 0
   chown32("b", -1, -1)                    = 0
   chmod("b", 0755)                        = 0
   exit_group(0)                           = ?
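
the unlink-then-create sequence in the trace above can be sketched as
follows.  this is only an illustration on a local POSIX filesystem, not a
reproduction of the NFS problem: the helper names (replace_like_install,
demo) and the file contents are made up for the example.  it shows what i
would expect to happen, namely that a process which already has the old
file mmap'd keeps seeing the old bytes while the path now points at a new
inode.

```python
import mmap
import os
import tempfile


def replace_like_install(src_bytes: bytes, dest: str) -> None:
    """Mimic the traced sequence: unlink dest, create a new file, then chmod.

    Hypothetical helper for illustration; it is not the install program.
    """
    try:
        os.unlink(dest)
    except FileNotFoundError:
        pass
    # Create a fresh file at the same path (a new inode), as install does.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(src_bytes)
    os.chmod(dest, 0o755)


def demo() -> dict:
    """Map the old file, replace it, and report what each view contains."""
    with tempfile.TemporaryDirectory() as tmpdir:
        dest = os.path.join(tmpdir, "b")
        with open(dest, "wb") as f:
            f.write(b"old binary")
        old_ino = os.stat(dest).st_ino

        # Stand-in for a running process that has the binary mmap'd.
        with open(dest, "rb") as f:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        replace_like_install(b"new binary", dest)
        new_ino = os.stat(dest).st_ino

        with open(dest, "rb") as f:
            on_disk = f.read()
        result = {
            "inode_changed": old_ino != new_ino,
            "mapped": bytes(mapped[:]),
            "on_disk": on_disk,
        }
        mapped.close()
        return result


if __name__ == "__main__":
    print(demo())
```

on a local filesystem the old mapping and the new file are fully
independent, which is the behaviour i am assuming an NFS client should
also provide.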

if i run 'make install' while these binaries are running on our cluster
(almost ensuring more than one node has the file mmap'd), a small, random
number of nodes end up with corrupt caches, and every subsequent run of the
binary on those nodes fails.
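
one way to spot an affected node would be to compare a checksum of the
binary as that node reads it against a known-good checksum taken on the
server.  a minimal sketch, assuming sha256 is an acceptable check; the
function names (sha256_of, looks_corrupt) are hypothetical and this is not
something i have run across the cluster:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 8192) -> str:
    """Return the hex SHA-256 of the file as read through this client."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_corrupt(path: str, expected_hex: str) -> bool:
    """True if this client's view of the file differs from the reference."""
    return sha256_of(path) != expected_hex
```

the reference checksum would have to be computed somewhere the file is
known to be good, e.g. on the NFS server itself right after the install.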

this should not be - should it?

-a
-- 
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself.  --Shunryu Suzuki
===============================================================================

