2009-04-15 20:52:37

by Chuck Lever

Subject: Re: klibc's nfsmount failure with 2.6.27.21, while 2.6.25.20 was fine

On Apr 15, 2009, at 3:17 PM, Hans-Peter Jansen wrote:
> On Wednesday, 15 April 2009, Chuck Lever wrote:
>> On Apr 15, 2009, at 1:23 PM, Hans-Peter Jansen wrote:
>>> On Wednesday, 15 April 2009, Chuck Lever wrote:
>>>> When using rpcbind instead of portmapper, what does the output of
>>>> "rpcinfo" look like on the server?
>>>
>>> It's dumped in the mail starting this thread.
>>
>> I don't see why the client's rpcbind attempt for the server's mountd
>> service should have failed.
>
> For completeness, here's the current rpcinfo view with portmap:
>
> # rpcinfo -p 172.16.23.110
>    Program Vers Proto   Port
>     100000    2   tcp    111  portmapper
>     100000    2   udp    111  portmapper
>     100005    1   udp  54838  mountd
>     100005    1   tcp  32772  mountd
>     100005    2   udp  54838  mountd
>     100005    2   tcp  32772  mountd
>     100005    3   udp  54838  mountd
>     100005    3   tcp  32772  mountd
>     100003    2   udp   2049  nfs
>     100003    3   udp   2049  nfs
>     100003    4   udp   2049  nfs
>     100021    1   udp  35501  nlockmgr
>     100021    3   udp  35501  nlockmgr
>     100021    4   udp  35501  nlockmgr
>     100003    2   tcp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   tcp   2049  nfs
>     100021    1   tcp  54766  nlockmgr
>     100021    3   tcp  54766  nlockmgr
>     100021    4   tcp  54766  nlockmgr
>     100024    1   udp  44650  status
>     100024    1   tcp  39765  status
>
>
>> Would it be possible for you to capture a packet trace of the client's
>> attempt to mount its root file system? (You will likely need to do
>> this for a bugzilla report, anyway.)
>
> Ahem, Chuck, may I ask you to look into the initial mail again? The
> failing case is attached there. Here I've attached the good one. Since I
> couldn't locate any mount attempt in the dump, I've left in a few more
> NFS transactions.

The client makes a PMAP_GETPORT request via TCP. The server's rpcbind
drops the connection without replying after receiving the request. I
didn't see anything immediately wrong with the request, although
Wireshark didn't like it either; I had to decode it by hand.
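For comparison when decoding such a request by hand, the following Python sketch builds and decodes a well-formed PMAP_GETPORT call as it would appear in a TCP payload. It assumes AUTH_NULL credential and verifier and a single record fragment; the constants come from ONC RPC v2 (RFC 5531) and portmapper v2 (RFC 1833), while the helper names are hypothetical, and the request klibc's nfsmount actually sends may differ (for example in credential flavor or in the mountd version it asks for). Asking for mountd version 1 over TCP below is an arbitrary illustrative choice.

```python
import struct

# Constants from ONC RPC v2 (RFC 5531) and portmapper v2 (RFC 1833).
RPC_CALL = 0
RPC_VERSION = 2
PMAP_PROG = 100000
PMAP_VERS = 2
PMAPPROC_GETPORT = 3
AUTH_NULL = 0
IPPROTO_TCP = 6
MOUNT_PROG = 100005
LAST_FRAGMENT = 0x80000000


def build_getport_call(xid, prog, vers, prot):
    """Hypothetical helper: XDR-encode a PMAP_GETPORT call body (AUTH_NULL)."""
    header = struct.pack(
        ">10I",
        xid, RPC_CALL, RPC_VERSION,
        PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT,
        AUTH_NULL, 0,  # credential: flavor, body length
        AUTH_NULL, 0,  # verifier: flavor, body length
    )
    # struct mapping { prog, vers, prot, port }; port is 0 in a query.
    return header + struct.pack(">4I", prog, vers, prot, 0)


def frame_for_tcp(body):
    """Prepend a record marker: high bit = last fragment, low 31 bits = length."""
    return struct.pack(">I", LAST_FRAGMENT | len(body)) + body


def decode_getport_call(frame):
    """Decode one single-fragment PMAP_GETPORT call, e.g. a TCP payload from a trace."""
    (marker,) = struct.unpack_from(">I", frame, 0)
    length = marker & 0x7FFFFFFF
    if not marker & LAST_FRAGMENT or len(frame) - 4 != length:
        raise ValueError("unexpected record marker 0x%08x" % marker)
    xid, mtype, rpcvers, prog, vers, proc = struct.unpack_from(">6I", frame, 4)
    off = 4 + 24
    # Skip credential and verifier (opaque_auth: flavor, length, body padded to 4).
    for _ in range(2):
        _flavor, auth_len = struct.unpack_from(">2I", frame, off)
        off += 8 + ((auth_len + 3) & ~3)
    mapping = struct.unpack_from(">4I", frame, off)
    return {
        "xid": xid, "msg_type": mtype, "rpcvers": rpcvers,
        "prog": prog, "vers": vers, "proc": proc,
        "mapping": mapping,
    }


frame = frame_for_tcp(build_getport_call(0x1234, MOUNT_PROG, 1, IPPROTO_TCP))
print(len(frame))                           # 60
print(decode_getport_call(frame)["mapping"])  # (100005, 1, 6, 0)
```

The record marker is the detail most easily gotten wrong over TCP: its value must equal the byte length of the RPC message that follows (here 56 bytes, so 0x80000038), otherwise the server cannot tell where the call ends.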

Restarting rpcbind usually means you lose all your rpc service
registrations until you restart those services, but it would be worth
trying this: stop the server's rpcbind service, then run rpcbind in a
terminal session with "-d" to see what it thinks the problem is when
it drops the connection.

Please attach a copy of your /etc/netconfig to your reply.

>> Also let us know what's running on your clients (distribution, kernel
>> version, etc.).
>
> Hmm, sure. The client setup is a legacy SuSE 9.3 diskless environment
> (unfortunately, Novell hasn't managed to create a distribution with
> similar stability since then; being an RPM junkie, I will soon check
> CentOS (again)).
>
> Client (relevant) versions:
> Kernel: 2.6.11.4
> Udev (nfsmount): 053
>
> Please let me know what more I can provide.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


2009-04-15 22:48:54

by Hans-Peter Jansen

Subject: Re: klibc's nfsmount failure with 2.6.27.21, while 2.6.25.20 was fine

On Wednesday, 15 April 2009, Chuck Lever wrote:
> On Apr 15, 2009, at 3:17 PM, Hans-Peter Jansen wrote:
> >
> > Ahem, Chuck, may I ask you to look into the initial mail again? The
> > failing case is attached there. Here I've attached the good one. Since I
> > couldn't locate any mount attempt in the dump, I've left in a few more
> > NFS transactions.
>
> The client makes a PMAP_GETPORT request via TCP. The server's rpcbind
> drops the connection without replying after receiving the request. I
> didn't see anything immediately wrong with the request, although
> Wireshark didn't like it either; I had to decode it by hand.

8[

> Restarting rpcbind usually means you lose all your rpc service
> registrations until you restart those services, but it would be worth
> trying this: stop the server's rpcbind service, then run rpcbind in a
> terminal session with "-d" to see what it thinks the problem is when
> it drops the connection.
>
> Please attach a copy of your /etc/netconfig to your reply.

Okay, but since it requires yet another portmap <-> rpcbind swap, please
give me a few days to find an appropriate slot.

Pete