From: Bill Schrier
To: Daniel Forrest
Cc: nfs@lists.sourceforge.net, it@neolinear.com, support@raidzone.com
Subject: Re: NFS Locking Issue - Solaris-Linux
Date: Mon, 28 Oct 2002 09:25:53 -0500
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <3DBD48F1.93924BAF@neolinear.com>
References: <3DB55BCC.447F0C32@neolinear.com> <200210221602.g9MG2BS25573@leinie.lmcg.wisc.edu> <3DB6EFF8.D8B659B0@neolinear.com> <200210251753.g9PHrqu01095@leinie.lmcg.wisc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: Discussion of NFS under Linux development, interoperability, and testing.

> >> Thanks for the input - it is dead on.  We tested your work around,
> >> and it worked exactly as you described.
>
> I'm actually glad that someone else saw this same bug.
>
> >> The one thing that confuses me is that Raidzone said that they were
> >> unable to reproduce the error - but you said you thought that
> >> disabling nlockmgr over TCP was a compile time option.  If this is
> >> the case, then I would assume that Raidzone would also be running
> >> with nlockmgr over TCP - since I think we are running the same
> >> kernel that they are.
>
> After I downgraded to 2.4.16-10, I upgraded another (non-Raidzone)
> machine to 2.4.18 and figured I could continue my testing.  Bzzzzt.
> Everything worked fine with NLM over TCP.  I am suspecting that not
> only is it a timing related bug, it is probably a hardware specific
> bug, and maybe even limited to only a subset of the Raidzone boxes if
> they can't reproduce the error themselves.  Otherwise they should have
> complaints from lots of Sun users, right?

We ended up downgrading back to 2.4.16 as well (we went to -7, since
that was what we had on hand).  We had attempted to clean out the NLM
stuff on 2.4.18-12, but it ended up messing up our clients pretty
badly, so we just downgraded.  Running the 2.4.16 kernel also fixed
the NLM problem for us.

Honestly, I was surprised when they told us that they couldn't
reproduce the problem.  They sent us output showing that they were
running NLM over TCP, but they said they weren't seeing the issue, so
I would agree that it might be limited to a certain subset of the
Raidzone boxes (lucky us, huh?).

> >> Does anyone know if this is indeed correct?  Is there a way to
> >> disable nlockmgr over TCP without a kernel recompile?
>
> I would be interested in this too in case I run into it again.
>
> >> We're not too interested in downrevving our kernel on this machine.
> >> Since a stock Redhat install appears to come with nlockmgr disabled
> >> on TCP by default, it seems that it got turned on somewhere along
> >> the way with the Raidzone kernel distributions, and really needs to
> >> be turned back off - especially in an environment with Solaris
> >> clients.
>
> If you turn on NFS over TCP you get NLM over TCP.  I assume they are
> turning on NFS over TCP for performance reasons.

Daniel,

Again, thanks for the info on the workaround.  We greatly appreciate
the help!

Bill

-- 
William J. Schrier          Phone: 412.968.5780 x151
Neolinear, Inc.             Fax:   412.968.5788
583 Epsilon Drive           Email: wschrier@neolinear.com
Pittsburgh, PA 15238

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
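
For anyone following the thread who wants to check which transports the
lock manager is registered on: below is a minimal sketch in Python that
scans `rpcinfo -p` style output for nlockmgr entries.  It assumes the
usual five-column portmapper listing (program, version, protocol, port,
service name); the thread does not say what form the output Raidzone
sent actually took, and the sample text is made up for illustration.
The function name `nlm_transports` is hypothetical.

```python
def nlm_transports(rpcinfo_text):
    """Return the set of transports (e.g. 'tcp', 'udp') with nlockmgr registered."""
    transports = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Expected columns: program, version, protocol, port, service name.
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        program, proto = fields[0], fields[2]
        service = fields[4] if len(fields) > 4 else ""
        # 100021 is the RPC program number assigned to the NFS lock manager.
        if program == "100021" or service == "nlockmgr":
            transports.add(proto)
    return transports


# Made-up sample listing for illustration only, not from any real machine.
SAMPLE = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100021    1   udp  32768  nlockmgr
    100021    3   tcp  32768  nlockmgr
"""

print(sorted(nlm_transports(SAMPLE)))
```

If "tcp" shows up in the result, the server is offering NLM over TCP.
On a live system the text could come from running `rpcinfo -p` against
the server and passing its output to the function; this only reports
what is registered and does not change any server configuration.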