2009-07-20 22:48:01

by Ben Greear

Subject: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

I tried mounting an NFS server running FC8 (2.6.26.8-57.fc8 kernel)
using a Fedora 11 system running an un-patched 2.6.31-rc3 64-bit kernel.

I am not sure at all that the FC8 system is set up to handle NFSv4 properly, but
I was expecting some sort of useful error if that was the case.

Instead, I get this continually spewing to /var/log/messages:

Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
...

On the file-server, I see this:
Jul 19 04:47:59 fs2 kernel: nfs4_cb: server 192.168.100.196 not responding, timed out
Jul 19 04:49:29 fs2 kernel: nfs4_cb: server 192.168.100.196 not responding, timed out
Jul 19 04:49:29 fs2 ntpd[2585]: kernel time sync status change 0001
Jul 19 04:50:59 fs2 kernel: nfs4_cb: server 192.168.100.196 not responding, timed out


I added some debug patches (on top of my other patches, including those to nfs)
and got some debug info:

It seems that nfs4_init_clientid is returning -2, and establish_clid is returning
-2. This causes reclaim lease logic to fail, and that causes state manager to print
out the error repeatedly. -2 means ENOENT.
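
For reference, kernel functions report failures as negated errno values, so the -2 above corresponds to errno 2. The mapping can be confirmed from userspace with Python's standard errno module. This is only a minimal sketch: the variable names are illustrative, and the message text printed depends on the C library in use.

```python
import errno
import os

# Kernel functions return negated errno values; -2 is the value reported above.
kernel_ret = -2

code = -kernel_ret
name = errno.errorcode[code]   # symbolic name for the errno value
message = os.strerror(code)    # human-readable text from the C library

print(f"{kernel_ret} -> {name}: {message}")
```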


As far as I can tell, the mount never completes, staying in D state and filling up
logs (I deleted a 16GB /var/log/messages file a few minutes ago!)

The mount command I'm trying is:
mount -t nfs4 192.168.100.6:/export/tmp /mnt/lf/nfs4-0


My kernel config is found here:
http://www.candelatech.com/oss/i7_config.txt


Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com



2009-07-21 18:28:17

by Trond Myklebust

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On Tue, 2009-07-21 at 11:01 -0700, Ben Greear wrote:
> On 07/21/2009 10:59 AM, Trond Myklebust wrote:
> > On Tue, 2009-07-21 at 10:36 -0700, Ben Greear wrote:
> >> On 07/21/2009 10:12 AM, Trond Myklebust wrote:
> >>> On Tue, 2009-07-21 at 09:49 -0700, Ben Greear wrote:
> >>>> On 07/21/2009 05:15 AM, Trond Myklebust wrote:
> >>>>
> >>>>> What does /var/lib/nfs/v4recovery look like on the server?
> >>>> The server was misconfigured, but I still think the client should
> >>>> behave better in this case. If you cannot reproduce it, let me know
> >>>> and I can try to be more specific. If you still want the v4recovery
> >>>> information, let me know and I'll send it.
> >>> So how should the client behave, when a screwed up server allows it to
> >>> mount but starts returning illegal values for setclientid? The only
> >>> thing I can see we could do is to tell the user EINSANESERVER...
> >> Well, it could just fail the mount and give up and not overly spam
> >> /var/log/messages in a tight loop perhaps?
> >
> > This doesn't happen at mount time. It happens when you open a file.
>
> Not for me, and evidently not for the other person that reported
> similar results. All I had to do was attempt the mount (which never
> completed).
>
> Thanks,
> Ben

Ah... You have NFS_V4_1 enabled despite the Kconfig warning... Does the
bug occur when you turn this off too?

Trond

2009-07-20 23:02:23

by Ben Greear

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On 07/20/2009 02:43 PM, Ben Greear wrote:
> I tried mounting an NFS server running FC8 (2.6.26.8-57.fc8 kernel)
> using a Fedora 11 system running an un-patched 2.6.31-rc3 64-bit kernel.

Ok, seems you have to have things mis-configured to reproduce this:

I had my /etc/exports looking like this on the file-server:

/export/tmp 192.168.100.0/24(rw)


When I changed it to be:
/export/tmp 192.168.100.0/24(rw,fsid=0)


And mounted 192.168.100.6:/ instead of 192.168.100.6:/export/tmp
then it mounts properly and appears to work just fine.

If I leave file-server configured properly with fsid=0, but try
to mount 192.168.100.6:/export/tmp then I get an error about
no such file or directory, which also appears to be correct
behaviour.
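
The misconfiguration described above (no export marked as the NFSv4 root) could be caught on the server before any client tries to mount. The following is a rough sketch, not part of nfs-utils: the function name exports_with_v4_root is hypothetical, and the parser is deliberately simplified (it ignores quoted paths and backslash line continuations). It treats fsid=root as equivalent to fsid=0, as documented in exports(5).

```python
import re

def exports_with_v4_root(exports_text):
    """Return the export paths whose options include fsid=0 (or fsid=root)."""
    roots = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)           # path, then the client specs
        path = parts[0]
        clients = parts[1] if len(parts) > 1 else ""
        # Options appear in parentheses after each client spec.
        for opts in re.findall(r"\(([^)]*)\)", clients):
            if any(o.strip() in ("fsid=0", "fsid=root") for o in opts.split(",")):
                roots.append(path)
                break
    return roots

# The two /etc/exports variants quoted above.
broken = "/export/tmp 192.168.100.0/24(rw)\n"
fixed = "/export/tmp 192.168.100.0/24(rw,fsid=0)\n"

print(exports_with_v4_root(broken))   # no NFSv4 root configured
print(exports_with_v4_root(fixed))    # /export/tmp is the NFSv4 root
```

An empty result for the server's exports would suggest that an NFSv4 mount of a path such as 192.168.100.6:/export/tmp is likely to hit the problem reported here.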

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2009-07-21 12:15:32

by Trond Myklebust

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On Mon, 2009-07-20 at 15:09 -0700, Ben Greear wrote:
> I tried mounting an NFS server running FC8 (2.6.26.8-57.fc8 kernel)
> using a Fedora 11 system running an un-patched 2.6.31-rc3 64-bit kernel.
>
> I am not sure at all that the FC8 system is set up to handle NFSv4 properly, but
> I was expecting some sort of useful error if that was the case.
>
> Instead, I get this continually spewing to /var/log/messages:
>
> Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
> Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
> Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
> Error: state manager failed on NFSv4 server 192.168.100.6 with error 2
> ...
>
> On the file-server, I see this:
> Jul 19 04:47:59 fs2 kernel: nfs4_cb: server 192.168.100.196 not responding, timed out
> Jul 19 04:49:29 fs2 kernel: nfs4_cb: server 192.168.100.196 not responding, timed out
> Jul 19 04:49:29 fs2 ntpd[2585]: kernel time sync status change 0001
> Jul 19 04:50:59 fs2 kernel: nfs4_cb: server 192.168.100.196 not responding, timed out
>
>
> I added some debug patches (on top of my other patches, including those to nfs)
> and got some debug info:
>
> It seems that nfs4_init_clientid is returning -2, and establish_clid is returning
> -2. This causes reclaim lease logic to fail, and that causes state manager to print
> out the error repeatedly. -2 means ENOENT.
>
>
> As far as I can tell, the mount never completes, staying in D state and filling up
> logs (I deleted a 16GB /var/log/messages file a few minutes ago!)
>
> The mount command I'm trying is:
> mount -t nfs4 192.168.100.6:/export/tmp /mnt/lf/nfs4-0
>
>
> My kernel config is found here:
> http://www.candelatech.com/oss/i7_config.txt
>

What does /var/lib/nfs/v4recovery look like on the server?

Trond


2009-07-21 16:49:12

by Ben Greear

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On 07/21/2009 05:15 AM, Trond Myklebust wrote:

>
> What does /var/lib/nfs/v4recovery look like on the server?

The server was misconfigured, but I still think the client should
behave better in this case. If you cannot reproduce it, let me know
and I can try to be more specific. If you still want the v4recovery
information, let me know and I'll send it.


I had my /etc/exports looking like this on the file-server:

/export/tmp 192.168.100.0/24(rw)


When I changed it to be:
/export/tmp 192.168.100.0/24(rw,fsid=0)


And mounted 192.168.100.6:/ instead of 192.168.100.6:/export/tmp
then it mounts properly and appears to work just fine.

If I leave file-server configured properly with fsid=0, but try
to mount 192.168.100.6:/export/tmp then I get an error about
no such file or directory, which also appears to be correct
behaviour.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2009-07-21 17:12:22

by Trond Myklebust

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On Tue, 2009-07-21 at 09:49 -0700, Ben Greear wrote:
> On 07/21/2009 05:15 AM, Trond Myklebust wrote:
>
> >
> > What does /var/lib/nfs/v4recovery look like on the server?
>
> The server was misconfigured, but I still think the client should
> behave better in this case. If you cannot reproduce it, let me know
> and I can try to be more specific. If you still want the v4recovery
> information, let me know and I'll send it.

So how should the client behave, when a screwed up server allows it to
mount but starts returning illegal values for setclientid? The only
thing I can see we could do is to tell the user EINSANESERVER...

Now, we _should_ fix the wretched NFS server so that it doesn't do NFSv4
mounts when there is no configured root partition. We _should_ also fix
the damned thing so that it doesn't return illegal values.

Trond


2009-07-21 17:36:55

by Ben Greear

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On 07/21/2009 10:12 AM, Trond Myklebust wrote:
> On Tue, 2009-07-21 at 09:49 -0700, Ben Greear wrote:
>> On 07/21/2009 05:15 AM, Trond Myklebust wrote:
>>
>>> What does /var/lib/nfs/v4recovery look like on the server?
>> The server was misconfigured, but I still think the client should
>> behave better in this case. If you cannot reproduce it, let me know
>> and I can try to be more specific. If you still want the v4recovery
>> information, let me know and I'll send it.
>
> So how should the client behave, when a screwed up server allows it to
> mount but starts returning illegal values for setclientid? The only
> thing I can see we could do is to tell the user EINSANESERVER...

Well, it could just fail the mount and give up and not overly spam
/var/log/messages in a tight loop perhaps?
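
To illustrate the general idea (this is not the kernel's state manager code, just a userspace sketch of "retry a bounded number of times with backoff, then give up"), something along these lines is what is meant. The names retry_with_backoff and always_enoent, and all parameter values, are made up for the example.

```python
import errno
import os
import time

def retry_with_backoff(op, max_attempts=5, base_delay=0.01, log=print):
    """Call op() until it succeeds; back off between failures, then give up."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except OSError as exc:
            log(f"attempt {attempt}/{max_attempts} failed: {exc}")
            if attempt == max_attempts:
                raise              # surface the error to the caller
            time.sleep(delay)      # wait before retrying
            delay *= 2             # double the wait each time

def always_enoent():
    # Stand-in for an operation that keeps failing with ENOENT.
    raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT))

messages = []
try:
    retry_with_backoff(always_enoent, max_attempts=3, base_delay=0.001,
                       log=messages.append)
except FileNotFoundError:
    print(f"gave up after {len(messages)} logged attempts")
```

With a bound like this the caller sees one error and the log gets a handful of lines, rather than an unbounded stream.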

> Now, we _should_ fix the wretched NFS server so that it doesn't do NFSv4
> mounts when there is no configured root partition. We _should_ also fix
> the damned thing so that it doesn't return illegal values.

Sounds fine to me.

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2009-07-21 17:49:53

by Frans Pop

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

Ben Greear wrote:
> On 07/21/2009 05:15 AM, Trond Myklebust wrote:
>> What does /var/lib/nfs/v4recovery look like on the server?
>
> The server was misconfigured, but I still think the client should
> behave better in this case. If you cannot reproduce it, let me know
> and I can try to be more specific. If you still want the v4recovery
> information, let me know and I'll send it.
>
> I had my /etc/exports looking like this on the file-server:
>
> /export/tmp 192.168.100.0/24(rw)
>
> When I changed it to be:
> /export/tmp 192.168.100.0/24(rw,fsid=0)
>
> And mounted 192.168.100.6:/ instead of 192.168.100.6:/export/tmp
> then it mounts properly and appears to work just fine.
>
> If I leave file-server configured properly with fsid=0, but try
> to mount 192.168.100.6:/export/tmp then I get an error about
> no such file or directory, which also appears to be correct
> behaviour.

Duh! I had exactly the same problem this week when I "quickly" set up an
NFS server on one of my boxes: forgot to add fsid=0 for the NFS4 root.
Result was the mount hanging on the client and the same errors as Ben
reported.

It would be great if this configuration problem was detected better.

P.S. /var/lib/nfs/v4recovery on the server was empty in my case.

2009-07-21 17:59:04

by Trond Myklebust

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On Tue, 2009-07-21 at 10:36 -0700, Ben Greear wrote:
> On 07/21/2009 10:12 AM, Trond Myklebust wrote:
> > On Tue, 2009-07-21 at 09:49 -0700, Ben Greear wrote:
> >> On 07/21/2009 05:15 AM, Trond Myklebust wrote:
> >>
> >>> What does /var/lib/nfs/v4recovery look like on the server?
> >> The server was misconfigured, but I still think the client should
> >> behave better in this case. If you cannot reproduce it, let me know
> >> and I can try to be more specific. If you still want the v4recovery
> >> information, let me know and I'll send it.
> >
> > So how should the client behave, when a screwed up server allows it to
> > mount but starts returning illegal values for setclientid? The only
> > thing I can see we could do is to tell the user EINSANESERVER...
>
> Well, it could just fail the mount and give up and not overly spam
> /var/log/messages in a tight loop perhaps?

This doesn't happen at mount time. It happens when you open a file.

Trond


2009-07-21 18:02:02

by Ben Greear

Subject: Re: Error mounting FC8 NFS server with 2.6.31-rc3 NFSv4 client.

On 07/21/2009 10:59 AM, Trond Myklebust wrote:
> On Tue, 2009-07-21 at 10:36 -0700, Ben Greear wrote:
>> On 07/21/2009 10:12 AM, Trond Myklebust wrote:
>>> On Tue, 2009-07-21 at 09:49 -0700, Ben Greear wrote:
>>>> On 07/21/2009 05:15 AM, Trond Myklebust wrote:
>>>>
>>>>> What does /var/lib/nfs/v4recovery look like on the server?
>>>> The server was misconfigured, but I still think the client should
>>>> behave better in this case. If you cannot reproduce it, let me know
>>>> and I can try to be more specific. If you still want the v4recovery
>>>> information, let me know and I'll send it.
>>> So how should the client behave, when a screwed up server allows it to
>>> mount but starts returning illegal values for setclientid? The only
>>> thing I can see we could do is to tell the user EINSANESERVER...
>> Well, it could just fail the mount and give up and not overly spam
>> /var/log/messages in a tight loop perhaps?
>
> This doesn't happen at mount time. It happens when you open a file.

Not for me, and evidently not for the other person that reported
similar results. All I had to do was attempt the mount (which never
completed).

Thanks,
Ben

>
> Trond


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com