Return-Path: linux-nfs-owner@vger.kernel.org Received: from acsinet15.oracle.com ([141.146.126.227]:38499 "EHLO acsinet15.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752620Ab1KCVwR convert rfc822-to-8bit (ORCPT ); Thu, 3 Nov 2011 17:52:17 -0400 Subject: Re: [BUG?] Maybe NFS bug since 2.6.37 on SPARC64 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Chuck Lever In-Reply-To: <1320356241.80563.YahooMailNeo@web24706.mail.ird.yahoo.com> Date: Thu, 3 Nov 2011 17:51:57 -0400 Cc: Jim Rees , Trond Myklebust , Linux NFS Mailing List Message-Id: <92DF2E31-FABF-40A5-8F78-89B64363568B@oracle.com> References: <1320349396.90614.YahooMailNeo@web24707.mail.ird.yahoo.com> <1320353685.18396.119.camel@lade.trondhjem.org> <20111103211100.GA8393@umich.edu> <1320356241.80563.YahooMailNeo@web24706.mail.ird.yahoo.com> To: Lukas Razik Sender: linux-nfs-owner@vger.kernel.org List-ID: On Nov 3, 2011, at 5:37 PM, Lukas Razik wrote: >> On Nov 3, 2011, at 5:11 PM, Jim Rees wrote: > >> >>> Trond Myklebust wrote: >>> >>>> [ 442.666622] NFS: failed to create MNT RPC client, status=-60 >>>> [ 442.666732] NFS: unable to mount server 137.226.167.241, error -60 >>>> [ 442.666868] VFS: Unable to mount root fs via NFS, trying floppy. >>>> [ 442.667032] VFS: Insert root floppy and press ENTER >>>> >>> Error 60 is ETIMEDOUT on SPARC, so it seems that the problem is >>> basically the same one that you see in your 2.6.32 trace (rpcbind: >>> server 137.226.167.241 not responding, timed out) except that now it is >>> a fatal error. >>> >>> Any idea why the first RPC calls might be failing here? A switch >>> misconfiguration or something like that perhaps? >>> >>> Wasn't there a change in the way nfs mount options are handled by the >> kernel >>> for nfsroot about the time of 2.6.39? Something about changing from >> default >>> udp to tcp maybe? >> >> There was a change, but it was changed back to UDP because of problems like >> this. Behavior in 3.0 or the latest 2.6.39 stable kernel may be improved. >> > > I don't know if this was a tip to test newest 2.6.39 but as I wrote in my first email > http://thread.gmane.org/gmane.linux.nfs/44596 > that's the output of linux-2.6.39.4 with "nfsdebug": > > [ 407.571521] IP-Config: Complete: > [ 407.571589] device=eth0, addr=137.226.167.242, mask=255.255.255.224, gw=137.226.167.225, > [ 407.571793] host=cluster2, domain=, nis-domain=(none), > [ 407.571907] bootserver=255.255.255.255, rootserver=137.226.167.241, rootpath= > [ 407.572332] Root-NFS: nfsroot=/srv/nfs/cluster2 > [ 407.572726] NFS: nfs mount opts='udp,nolock,addr=137.226.167.241' > [ 407.572927] NFS: parsing nfs mount option 'udp' > [ 407.572995] NFS: parsing nfs mount option 'nolock' > [ 407.573071] NFS: parsing nfs mount option 'addr=137.226.167.241' > [ 407.573139] NFS: MNTPATH: '/srv/nfs/cluster2' > [ 407.573203] NFS: sending MNT request for 137.226.167.241:/srv/nfs/cluster2 > [ 408.617894] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx > [ 408.638319] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 442.666622] NFS: failed to create MNT RPC client, status=-60 > [ 442.666732] NFS: unable to mount server 137.226.167.241, error -60 > [ 442.666868] VFS: Unable to mount root fs via NFS, trying floppy. > [ 442.667032] VFS: Insert root floppy and press ENTER > > And this behaviour is exactly the same as in all other 2.6.37 - 2.6.39.4 which I've tested. > So if anybody of you all have an idea what I could try to do, I'll follow... Find out why the very first RPC on your system always fails. As Trond says, the only reason this worked on the older kernels is because NFSROOT fell back to a default port for NFSD. This is also broken behavior, but in your case it happened to work so you never noticed it. I seem to recall there's a way to set the NFS and RPC debugging flags on the kernel command line so more information can be captured during boot. But I don't see it under Documentation/. You could add a line in fs/nfs/nfsroot.c:nfs_root_debug() to set flags also in the rpc_debug global variable to gather more information. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com