2003-09-03 22:13:49

by Pascal Schmidt

Subject: [NFS] attempt to use V1 mount protocol on V3 server


Hi!

I don't know whether this is really a problem with the kernel
NFS client - I tried looking at the code but I cannot make too
much sense out of it. ;)

The problem is the following. I have written a user-space NFSv3
server and wanted to register the mount program for version 3 only.
But this leads to two problems:

a) when unmounting an NFS volume, the server gets sent an umount
request indicating version 1 of the protocol, sending a version 3
umount is not even attempted

b) when something goes wrong during the NFSv3 mount, the kernel
seems to fall back to NFSv2, re-attempting the mount with mount
protocol version 1

I think neither of these should be done when the remote side does not
advertise mount protocol version 1 support.

Question: is this a problem of the user-space mount utility or is
it an in-kernel problem?

I've worked around it for the moment by registering for the mount v1
protocol too, handling umount and returning an error if a mount is
attempted. I would much rather register for v3 only, though...
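
A minimal sketch of the workaround described above, assuming a hand-written dispatcher for MOUNT protocol version 1. The procedure numbers come from RFC 1094 (Appendix A); the function name `mnt1_dispatch`, the `mnt1_action` enum, and the choice of EACCES (13) as the error status are hypothetical and not taken from the actual server code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Procedure numbers of MOUNT protocol version 1 (RFC 1094, Appendix A). */
enum {
    MOUNTPROC_NULL    = 0,
    MOUNTPROC_MNT     = 1,
    MOUNTPROC_DUMP    = 2,
    MOUNTPROC_UMNT    = 3,
    MOUNTPROC_UMNTALL = 4,
    MOUNTPROC_EXPORT  = 5
};

/* In MOUNT v1 a non-zero fhstatus is a UNIX error number; 13 is EACCES.
 * Picking EACCES is an assumption of this sketch. */
#define MNT1_ERR_ACCES 13u

/* Hypothetical description of how the server should answer one call. */
enum mnt1_action {
    MNT1_REPLY_VOID,     /* send an empty (void) result */
    MNT1_REPLY_FHSTATUS, /* send an fhstatus carrying *status_out */
    MNT1_PROC_UNAVAIL    /* let the RPC layer answer PROC_UNAVAIL */
};

/* Accept unmount requests, refuse every v1 mount attempt. */
enum mnt1_action mnt1_dispatch(uint32_t proc, uint32_t *status_out)
{
    assert(status_out != NULL);
    *status_out = 0;

    switch (proc) {
    case MOUNTPROC_NULL:
    case MOUNTPROC_UMNT:
    case MOUNTPROC_UMNTALL:
        /* These procedures return void, so there is nothing to fill in. */
        return MNT1_REPLY_VOID;
    case MOUNTPROC_MNT:
        /* Never hand out a v1 (NFSv2) filehandle. */
        *status_out = MNT1_ERR_ACCES;
        return MNT1_REPLY_FHSTATUS;
    default:
        /* DUMP and EXPORT are left unimplemented in this sketch. */
        return MNT1_PROC_UNAVAIL;
    }
}
```

With this shape, a version 1 UMNT from the mount utility is acknowledged, while a version 1 MNT fails cleanly instead of producing an NFSv2 filehandle.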

--
Ciao,
Pascal


2003-09-03 23:33:42

by Trond Myklebust

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

>>>>> " " == Pascal Schmidt writes:

> a) when unmounting an NFS volume, the server gets sent an umount
> request indicating version 1 of the protocol, sending a
> version 3 umount is not even attempted

> b) when something goes wrong during the NFSv3 mount, the kernel
> seems to fall back to NFSv2, re-attempting the mount with
> mount protocol version 1

> I think both of this should not be done when the remote side
> does not advertise mount protocol version 1 support.

> Question: is this a problem of the user-space mount utility or
> is it an in-kernel problem?


a) Is a feature of the 'mount' program. An NFS server should in any
case not rely on the umount being sent: a client may have crashed
or been firewalled, or whatever...

b) Is a kernel feature which will never trigger if you are passing a
correct filehandle from your mountd.

Cheers,
Trond

2003-09-04 02:08:22

by Pascal Schmidt

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

On Thu, 04 Sep 2003 01:40:13 +0200, you wrote in linux.kernel:

> a) Is a feature of the 'mount' program. An NFS server should in any
> case not rely on the umount being sent: a client may have crashed
> or been firewalled, or whatever...

Okay. I'm not relying on it, anyway. I had just expected to get a V3
umount call and not a V1 umount call. I could understand the V1 as a
fallback. Seems I have to live with user-space tools calling me for
protocol versions I didn't even register with the portmapper.

> b) Is a kernel feature which will never trigger if you are passing a
> correct filehandle from your mountd.

That's assuming all NFSv3 servers do NFSv2 also. Mine doesn't. In this
case the bug was in my nfsd, which was not recognizing the filehandle
coming in via GETATTR as correct. ;)

So I'll have to live with registering for V1 also and handling umount
there and rejecting mount with an error. Oh well.

Thanks for the explanations!

Oh, BTW, that reminds me: the 2.6.0-test NFS client does not like
FSSTAT returning NFS3ERR_NOTSUPP. When I started coding, I got a hard
lockup of my system due to that, had to press the reset button, not
even Alt-SysRq wanted to work. I couldn't capture the output and
shutting down the system didn't work, plus I could not start any new
processes. Sure, that was a buggy server, but should that lock up
the kernel? Known problem?

I can probably reproduce that since changing my code to return NOTSUPP
again would be easy, if you are interested.

--
Ciao,
Pascal

2003-09-04 02:19:58

by Trond Myklebust

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

>>>>> " " == Pascal Schmidt writes:

> That's assuming all NFSv3 servers do NFSv2 also. I don't. In
> this case the bug was in my nfsd who was not recognizing the
> filehandle coming in via GETATTR as correct. ;)

> So I'll have to live with registering for V1 also and handling
> umount there and rejecting mount with an error. Oh well.

No. That won't make any difference. The kernel never talks to the
mountd.

It's being handed a bogus filehandle by the userland mount command
(which gets it from mountd). When it sends the initial NFSv3 GETATTR
call to the nfsd, and gets rejected, it just retries the same GETATTR
call as an NFSv2 call.

> Oh, BTW, that reminds me: the 2.6.0-test NFS client does not
> like FSSTAT returning NFS3ERR_NOTSUPP. When I started coding, I
> got a hard lockup of my system due to that, had to press the
> reset button, not even Alt-SysRq wanted to work. I couldn't
> capture the output and shutting down the system didn't work,
> plus I could not start any new processes. Sure, that was a
> buggy server, but should that lock up the kernel? Known
> problem?

I'll check what's happening. AFAICS, the NFS layer should not really
care, but it will pass some funny values back to the VFS, and this
might be screwing something up...

Cheers,
Trond

2003-09-04 02:41:46

by Pascal Schmidt

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

On Wed, 3 Sep 2003, Trond Myklebust wrote:

> It's being handed a bogus filehandle by the userland mount command
> (which gets it from mountd). When it sends the initial NFSv3 GETATTR
> call to the nfsd, and gets rejected, it just retries the same GETATTR
> call as an NFSv2 call.

Out of interest, how does this work? Not obvious to me since an NFSv3
filehandle is too big for an NFSv2 server.

> I'll check what's happening. AFAICS, the NFS layer should not really
> care, but it will pass some funny values back to the VFS, and this
> might be screwing something up...

Sounds likely, since basically the whole machine locked up and no
further fs operations seemed to be happening. I haven't checked whether
2.4 also shows the problem - I just fixed it in my code and then it
obviously did not happen anymore.

I can test patches or also send you my code if you want to test
things yourself. It's also available online, UNFS3 project at
SourceForge, but that's of course a version with working FSSTAT.

--
Ciao,
Pascal

2003-09-04 04:37:29

by Trond Myklebust

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

>>>>> " " == Pascal Schmidt writes:

> Out of interest, how does this work? Not obvious to me since an
> NFSv3 filehandle is too big for an NFSv2 server.

Most are not. An NFSv3 filehandle has a variable size (as opposed to
NFSv2 filehandles, which have a fixed size), and so most NFS servers
use the same filehandle for NFSv2 and NFSv3.
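
To illustrate the size relationship: an NFSv2 filehandle is always 32 bytes (RFC 1094), while an NFSv3 filehandle is variable-length opaque data of at most 64 bytes (RFC 1813). The sketch below shows when a v3 handle could be reused unchanged in a v2 call. The struct and function names are hypothetical, and requiring an exact 32-byte length (rather than zero-padding shorter handles) is an assumption of this sketch, not a description of what the Linux client does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFS2_FHSIZE 32 /* fixed filehandle size, RFC 1094 */
#define NFS3_FHSIZE 64 /* maximum filehandle size, RFC 1813 */

/* Hypothetical in-memory representations of the two handle types. */
struct nfs3_fh {
    size_t        len;               /* number of valid bytes in data */
    unsigned char data[NFS3_FHSIZE];
};

struct nfs2_fh {
    unsigned char data[NFS2_FHSIZE];
};

/*
 * Copy a v3 handle into a v2 handle if it has exactly the v2 size.
 * Returns 0 on success, -1 if the handle cannot be sent as NFSv2.
 */
int nfs3_fh_to_nfs2(const struct nfs3_fh *in, struct nfs2_fh *out)
{
    assert(in != NULL && out != NULL);

    if (in->len != NFS2_FHSIZE)
        return -1;

    memcpy(out->data, in->data, NFS2_FHSIZE);
    return 0;
}
```

Under this check, a server that hands out 32-byte handles for both protocol versions passes, while a full 64-byte NFSv3 handle is rejected.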

Note: The reason for this mess is that the early Linux-2.2.x knfsd
servers were NFSv2 only. Unfortunately, the associated kmountd daemon
would advertise that it did NFSv3 too, in which case it just returned
the same NFSv2 filehandles. By retrying the GETATTR call in the NFSv3
client, and automatically switching to NFSv2 we were able to catch
these buggy setups.


Note: The fact that we are now stuck with a schizophrenic NFSv3 client
is one of the many reasons why I am now *very* wary of trying to work
around server bugs by making fixes to the client code.

Cheers,
Trond

2003-09-04 14:28:02

by Pascal Schmidt

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

On Thu, 4 Sep 2003, Trond Myklebust wrote:

> Most are not. An NFSv3 filehandle has a variable size (as opposed to
> NFSv2 which are fixed size), and so most NFS servers use the same
> filehandle for NFSv2 and NFSv3.

Well, my filehandles are all 64 bytes at the moment. Doesn't matter
anyway since my nfsd does not handle NFSv2.

> Note: The fact that we are now stuck with a schizophrenic NFSv3 client
> is one of the many reasons why I am now *very* wary of trying to work
> around server bugs by making fixes to the client code.

Fine with me if a buggy server results in a failure to mount. However,
I was seeing crashes.

--
Ciao,
Pascal

2003-09-04 15:25:06

by Trond Myklebust

Subject: Re: [NFS] attempt to use V1 mount protocol on V3 server

>>>>> " " == Pascal Schmidt writes:

> Fine with me if a buggy server results in a failure to
> mount. However, I was seeing crashes.

I assume that your server's RPC engine is replying with a PROG_MISMATCH
the way it should when it cannot support NFSv2?

Hmm.. Looking at the code, we appear not to be handling that case very
well in the RPC client. PROG_UNAVAIL, PROG_MISMATCH, and PROC_UNAVAIL
are all handled incorrectly as if the replies were garbage...
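
For reference, a standalone sketch of the distinction being drawn here. The numeric accept status values are those defined by the ONC RPC specification (RFC 1831); the `ACCEPT_` prefix, the function name `check_accept_stat`, and the `retry` out-parameter are hypothetical, and treating unknown status values as retryable is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* accept_stat values of an accepted RPC reply (RFC 1831). */
enum accept_stat {
    ACCEPT_SUCCESS       = 0, /* call executed successfully */
    ACCEPT_PROG_UNAVAIL  = 1, /* program not exported by the server */
    ACCEPT_PROG_MISMATCH = 2, /* program version not supported */
    ACCEPT_PROC_UNAVAIL  = 3, /* procedure not supported */
    ACCEPT_GARBAGE_ARGS  = 4, /* server could not decode arguments */
    ACCEPT_SYSTEM_ERR    = 5  /* server-side failure, e.g. out of memory */
};

/*
 * Returns 0 if the call succeeded or may be retried (*retry is set to 1
 * in the latter case), or -EIO if retrying cannot help because the
 * server does not implement the requested program, version or procedure.
 */
int check_accept_stat(uint32_t stat, int *retry)
{
    assert(retry != NULL);
    *retry = 0;

    switch (stat) {
    case ACCEPT_SUCCESS:
        return 0;
    case ACCEPT_PROG_UNAVAIL:
    case ACCEPT_PROG_MISMATCH:
    case ACCEPT_PROC_UNAVAIL:
        /* A well-formed refusal: fail immediately. */
        return -EIO;
    case ACCEPT_GARBAGE_ARGS:
    default:
        /* Possibly a corrupted request or reply: worth another try. */
        *retry = 1;
        return 0;
    }
}
```

The point is that the three "unavailable" replies are definite answers from the server, so resending the same call only delays the error.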

Although this is harmless, we should really be returning an EIO
immediately, and reporting the error in the syslog...

Does the following patch (against 2.4.22) help?

Cheers,
Trond

--- linux-2.4.22-up/net/sunrpc/clnt.c.orig 2003-08-23 14:11:09.000000000 -0400
+++ linux-2.4.22-up/net/sunrpc/clnt.c 2003-09-04 11:21:28.000000000 -0400
@@ -932,6 +932,24 @@
 	switch ((n = ntohl(*p++))) {
 	case RPC_SUCCESS:
 		return p;
+	case RPC_PROG_UNAVAIL:
+		printk(KERN_WARNING "RPC: %4d call_verify: program %u is unsupported by server %s\n",
+				task->tk_pid, (unsigned int)task->tk_client->cl_prog,
+				task->tk_client->cl_server);
+		goto out_eio;
+	case RPC_PROG_MISMATCH:
+		printk(KERN_WARNING "RPC: %4d call_verify: program %u, version %u unsupported by server %s\n",
+				task->tk_pid, (unsigned int)task->tk_client->cl_prog,
+				(unsigned int)task->tk_client->cl_vers,
+				task->tk_client->cl_server);
+		goto out_eio;
+	case RPC_PROC_UNAVAIL:
+		printk(KERN_WARNING "RPC: %4d call_verify: proc %u unsupported by program %u, version %u on server %s\n",
+				task->tk_pid, (unsigned int)task->tk_msg.rpc_proc,
+				(unsigned int)task->tk_client->cl_prog,
+				(unsigned int)task->tk_client->cl_vers,
+				task->tk_client->cl_server);
+		goto out_eio;
 	case RPC_GARBAGE_ARGS:
 		break; /* retry */
 	default:
@@ -949,6 +967,7 @@
 		return NULL;
 	}
 	printk(KERN_WARNING "RPC: garbage, exit EIO\n");
+out_eio:
 	rpc_exit(task, -EIO);
 	return NULL;
 }