Hey Neil,
On 02/23/2014 10:23 PM, NeilBrown wrote:
>
> Protocol negotiation in mount.nfs does not correctly negotiate with a
> server which only supports NFSv3 and UDP.
>
> When mount.nfs attempts an NFSv4 mount and fails with ECONNREFUSED
> it does not fall back to NFSv3, as this is not recognised as a
> "does not support NFSv4" error.
> However ECONNREFUSED is a clear indication that the server doesn't
> support TCP, and ipso facto does not support NFSv4.
> So ECONNREFUSED should trigger a fallback from v4 to v2/3.
What server are you running this against?
I've started a number of different servers with the "-T -N 4"
rpc.nfsd flags; here is what I've found:
When I do the mount without the "-o v3" mount option, the mount
hangs (forever) in the server trunking-discovery code.
Mounting an older server like RHEL5 or RHEL6, the mount (w/out the patch)
succeeds because "RPC: Program not registered" is returned, which causes
the fallback to happen.
So I'm assuming you are using a recent Linux server, because
I do see the ECONNREFUSED with that server; but since I
have to use -o v3, this patch does not seem to help.
Basically I spin in this loop:
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 127.0.0.1 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
I have not debugged what is going on, but it does not look like
the patch is doing what is expected...
steved.
>
> However ECONNREFUSED may simply indicate that NFSv4 isn't supported
> *yet*. i.e. the server is still booting and isn't responding to NFS
> requests yet. So if we subsequently find that NFSv3 is supported, we
> need to check with rpcbind to confirm that NFSv4 really isn't
> supported.
> If rpcbind reports that v4 is not supported after reporting that v3
> is, we can safely use v4. If it reports that v4 is supported, we need
> to retry v4.
>
> One line in this patch requires explanation. It is
> + save_pmap.pm_vers = 4;
>
> This is needed as nfs_probe_nfsport is entered with pm_vers already
> set, by the line
> nfs_pmap->pm_vers = mntvers_to_nfs(*probe_vers);
> in nfs_probe_bothports(). This makes the passing of probe_nfs3_only
> and probe_nfs2_only to nfs_probe_port() in nfs_probe_nfsport()
> pointless, and the passing of probe_nfs4_only in the new code
> ineffectual, without resetting pm_vers.
>
> The setting of pm_vers to mntvers_to_nfs() should probably be removed,
> but as we don't have any regression tests, doing this would be unwise.
>
> Signed-off-by: NeilBrown <[email protected]>
> Reported-by: Carsten Ziepke <[email protected]>
>
> diff --git a/utils/mount/network.c b/utils/mount/network.c
> index 2fdd2c051be7..0521d5f6709f 100644
> --- a/utils/mount/network.c
> +++ b/utils/mount/network.c
> @@ -149,6 +149,11 @@ static const unsigned int probe_tcp_first[] = {
> 0,
> };
>
> +static const unsigned int probe_tcp_only[] = {
> + IPPROTO_TCP,
> + 0,
> +};
> +
> static const unsigned long probe_nfs2_only[] = {
> 2,
> 0,
> @@ -159,6 +164,11 @@ static const unsigned long probe_nfs3_only[] = {
> 0,
> };
>
> +static const unsigned long probe_nfs4_only[] = {
> + 4,
> + 0,
> +};
> +
> static const unsigned long probe_mnt1_first[] = {
> 1,
> 2,
> @@ -611,18 +621,34 @@ out_ok:
> * returned; rpccreateerr.cf_stat is set to reflect the nature of the error.
> */
> static int nfs_probe_nfsport(const struct sockaddr *sap, const socklen_t salen,
> - struct pmap *pmap)
> + struct pmap *pmap, int checkv4)
> {
> if (pmap->pm_vers && pmap->pm_prot && pmap->pm_port)
> return 1;
>
> if (nfs_mount_data_version >= 4) {
> const unsigned int *probe_proto;
> + int ret;
> + struct pmap save_pmap;
> + struct sockaddr_storage save_sa;
>
> probe_proto = nfs_default_proto();
> -
> - return nfs_probe_port(sap, salen, pmap,
> - probe_nfs3_only, probe_proto);
> + memcpy(&save_pmap, pmap, sizeof(save_pmap));
> + memcpy(&save_sa, sap, salen);
> +
> + ret = nfs_probe_port(sap, salen, pmap,
> + probe_nfs3_only, probe_proto);
> + if (!ret || !checkv4 || probe_proto != probe_tcp_first)
> + return ret;
> + save_pmap.pm_vers = 4;
> + ret = nfs_probe_port((struct sockaddr*)&save_sa, salen, &save_pmap,
> + probe_nfs4_only, probe_tcp_only);
> + if (ret) {
> + rpc_createerr.cf_stat = RPC_FAILED;
> + rpc_createerr.cf_error.re_errno = EAGAIN;
> + return 0;
> + }
> + return 1;
> } else
> return nfs_probe_port(sap, salen, pmap,
> probe_nfs2_only, probe_udp_only);
> @@ -671,7 +697,7 @@ static int nfs_probe_version_fixed(const struct sockaddr *mnt_saddr,
> const socklen_t nfs_salen,
> struct pmap *nfs_pmap)
> {
> - if (!nfs_probe_nfsport(nfs_saddr, nfs_salen, nfs_pmap))
> + if (!nfs_probe_nfsport(nfs_saddr, nfs_salen, nfs_pmap, 0))
> return 0;
> return nfs_probe_mntport(mnt_saddr, mnt_salen, mnt_pmap);
> }
> @@ -686,6 +712,8 @@ static int nfs_probe_version_fixed(const struct sockaddr *mnt_saddr,
> * @nfs_salen: length of NFS server's address
> * @nfs_pmap: IN: partially filled-in NFS RPC service tuple;
> * OUT: fully filled-in NFS RPC service tuple
> + * @checkv4: Flag indicating that if v3 is available we must also
> + * check v4, and if that is available, set re_errno to EAGAIN.
> *
> * Returns 1 and fills in both @pmap structs if the requested service
> * ports are unambiguous and pingable. Otherwise zero is returned;
> @@ -696,7 +724,8 @@ int nfs_probe_bothports(const struct sockaddr *mnt_saddr,
> struct pmap *mnt_pmap,
> const struct sockaddr *nfs_saddr,
> const socklen_t nfs_salen,
> - struct pmap *nfs_pmap)
> + struct pmap *nfs_pmap,
> + int checkv4)
> {
> struct pmap save_nfs, save_mnt;
> const unsigned long *probe_vers;
> @@ -717,7 +746,7 @@ int nfs_probe_bothports(const struct sockaddr *mnt_saddr,
>
> for (; *probe_vers; probe_vers++) {
> nfs_pmap->pm_vers = mntvers_to_nfs(*probe_vers);
> - if (nfs_probe_nfsport(nfs_saddr, nfs_salen, nfs_pmap) != 0) {
> + if (nfs_probe_nfsport(nfs_saddr, nfs_salen, nfs_pmap, checkv4) != 0) {
> mnt_pmap->pm_vers = *probe_vers;
> if (nfs_probe_mntport(mnt_saddr, mnt_salen, mnt_pmap) != 0)
> return 1;
> @@ -757,7 +786,7 @@ int probe_bothports(clnt_addr_t *mnt_server, clnt_addr_t *nfs_server)
> return nfs_probe_bothports(mnt_addr, sizeof(mnt_server->saddr),
> &mnt_server->pmap,
> nfs_addr, sizeof(nfs_server->saddr),
> - &nfs_server->pmap);
> + &nfs_server->pmap, 0);
> }
>
> /**
> diff --git a/utils/mount/network.h b/utils/mount/network.h
> index 9c75856c9ca8..d7636d7c54a6 100644
> --- a/utils/mount/network.h
> +++ b/utils/mount/network.h
> @@ -42,7 +42,7 @@ static const struct timeval RETRY_TIMEOUT = { 3, 0 };
> int probe_bothports(clnt_addr_t *, clnt_addr_t *);
> int nfs_probe_bothports(const struct sockaddr *, const socklen_t,
> struct pmap *, const struct sockaddr *,
> - const socklen_t, struct pmap *);
> + const socklen_t, struct pmap *, int);
> int nfs_gethostbyname(const char *, struct sockaddr_in *);
> int nfs_lookup(const char *hostname, const sa_family_t family,
> struct sockaddr *sap, socklen_t *salen);
> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
> index a642394d2f5a..1f4782563572 100644
> --- a/utils/mount/stropts.c
> +++ b/utils/mount/stropts.c
> @@ -484,7 +484,7 @@ static int nfs_construct_new_options(struct mount_options *options,
> * FALSE is returned if some failure occurred.
> */
> static int
> -nfs_rewrite_pmap_mount_options(struct mount_options *options)
> +nfs_rewrite_pmap_mount_options(struct mount_options *options, int checkv4)
> {
> union nfs_sockaddr nfs_address;
> struct sockaddr *nfs_saddr = &nfs_address.sa;
> @@ -534,7 +534,7 @@ nfs_rewrite_pmap_mount_options(struct mount_options *options)
> * negotiate. Bail now if we can't contact it.
> */
> if (!nfs_probe_bothports(mnt_saddr, mnt_salen, &mnt_pmap,
> - nfs_saddr, nfs_salen, &nfs_pmap)) {
> + nfs_saddr, nfs_salen, &nfs_pmap, checkv4)) {
> errno = ESPIPE;
> if (rpc_createerr.cf_stat == RPC_PROGNOTREGISTERED)
> errno = EOPNOTSUPP;
> @@ -595,7 +595,8 @@ static int nfs_sys_mount(struct nfsmount_info *mi, struct mount_options *opts)
> }
>
> static int nfs_do_mount_v3v2(struct nfsmount_info *mi,
> - struct sockaddr *sap, socklen_t salen)
> + struct sockaddr *sap, socklen_t salen,
> + int checkv4)
> {
> struct mount_options *options = po_dup(mi->options);
> int result = 0;
> @@ -637,7 +638,7 @@ static int nfs_do_mount_v3v2(struct nfsmount_info *mi,
> printf(_("%s: trying text-based options '%s'\n"),
> progname, *mi->extra_opts);
>
> - if (!nfs_rewrite_pmap_mount_options(options))
> + if (!nfs_rewrite_pmap_mount_options(options, checkv4))
> goto out_fail;
>
> result = nfs_sys_mount(mi, options);
> @@ -653,13 +654,13 @@ out_fail:
> * Returns TRUE if successful, otherwise FALSE.
> * "errno" is set to reflect the individual error.
> */
> -static int nfs_try_mount_v3v2(struct nfsmount_info *mi)
> +static int nfs_try_mount_v3v2(struct nfsmount_info *mi, int checkv4)
> {
> struct addrinfo *ai;
> int ret = 0;
>
> for (ai = mi->address; ai != NULL; ai = ai->ai_next) {
> - ret = nfs_do_mount_v3v2(mi, ai->ai_addr, ai->ai_addrlen);
> + ret = nfs_do_mount_v3v2(mi, ai->ai_addr, ai->ai_addrlen, checkv4);
> if (ret != 0)
> return ret;
>
> @@ -793,7 +794,8 @@ static int nfs_autonegotiate(struct nfsmount_info *mi)
> result = nfs_try_mount_v4(mi);
> if (result)
> return result;
> -
> +
> +check_errno:
> switch (errno) {
> case EPROTONOSUPPORT:
> /* A clear indication that the server or our
> @@ -807,12 +809,24 @@ static int nfs_autonegotiate(struct nfsmount_info *mi)
> /* Linux servers prior to 2.6.25 may return
> * EPERM when NFS version 4 is not supported. */
> goto fall_back;
> + case ECONNREFUSED:
> + /* UDP-Only servers won't support v4, but maybe it
> + * just isn't ready yet. So try v3, but double-check
> + * with rpcbind for v4. */
> + result = nfs_try_mount_v3v2(mi, 1);
> + if (result == 0 && errno == EAGAIN) {
> + /* v4 server seems to be registered now. */
> + result = nfs_try_mount_v4(mi);
> + if (result == 0 && errno != ECONNREFUSED)
> + goto check_errno;
> + }
> + return result;
> default:
> return result;
> }
>
> fall_back:
> - return nfs_try_mount_v3v2(mi);
> + return nfs_try_mount_v3v2(mi, 0);
> }
>
> /*
> @@ -831,7 +845,7 @@ static int nfs_try_mount(struct nfsmount_info *mi)
> break;
> case 2:
> case 3:
> - result = nfs_try_mount_v3v2(mi);
> + result = nfs_try_mount_v3v2(mi, 0);
> break;
> case 4:
> result = nfs_try_mount_v4(mi);
>
On Mon, 10 Mar 2014 17:27:27 -0400 Steve Dickson <[email protected]> wrote:
> Hey Neil,
>
> On 02/23/2014 10:23 PM, NeilBrown wrote:
> >
> > Protocol negotiation in mount.nfs does not correctly negotiate with a
> > server which only supports NFSv3 and UDP.
> >
> > When mount.nfs attempts an NFSv4 mount and fails with ECONNREFUSED
> > it does not fall back to NFSv3, as this is not recognised as a
> > "does not support NFSv4" error.
> > However ECONNREFUSED is a clear indication that the server doesn't
> > support TCP, and ipso facto does not support NFSv4.
> > So ECONNREFUSED should trigger a fallback from v4 to v2/3.
> What server are you running this against?
Not sure, either Linux 3.2 or 3.11.
I don't know exactly what server the bug was reported against; I'm
guessing some NAS thing.
https://bugzilla.novell.com/show_bug.cgi?id=863749
>
> I've started a number of different servers with the "-T -N 4"
> rpc.nfsd flags; here is what I've found:
>
> When I do the mount without the "-o v3" mount option, the mount
> hangs (forever) in the server trunking-discovery code.
>
> Mounting an older server like RHEL5 or RHEL6, the mount (w/out the patch)
> succeeds because "RPC: Program not registered" is returned, which causes
> the fallback to happen
>
> So I'm assuming you are using a recent Linux server, because
> I do see the ECONNREFUSED with that server; but since I
> have to use -o v3, this patch does not seem to help.
> Basically I spin in this loop:
>
> mount.nfs: prog 100003, trying vers=3, prot=6
> mount.nfs: trying 127.0.0.1 prog 100003 vers 3 prot TCP port 2049
> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
>
> I have not debugged what is going on, but it does not look like
> the patch is doing what is expected...
>
> steved.
With a 3.11.10 client talking to a 3.2.0 server I run
rpc.nfsd 0
rpc.nfsd -T -N4
on the server, then
rpcinfo -p SERVER | grep nfs
shows
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
On client I run
mount -v SERVER:/PATH /mnt
and I get
mount.nfs: trying text-based options 'vers=4,addr=192.168.1.3,clientaddr=192.168.1.2'
mount.nfs: mount(2): Connection refused
repeating every 10 seconds or so. It eventually times out after 2 minutes.
Same client to a 3.10 server I get the same behaviour.
3.2.0 client and 3.10 server, same behaviour again.
I have noticed that sometimes when I stop the NFS server the registration
with rpcbind doesn't go away. Not often, but sometimes. I wonder if that
could be confusing something? Can you check that nfsv4 has been
de-registered from rpcbind?
I note you are getting the error:
> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
This seems to suggest that rpcbind isn't running. Yet when I kill rpcbind
and try a v3 mount I get
mount.nfs: portmap query failed: RPC: Unable to receive - Connection refused
which is slightly different, so presumably there is a different cause in your
case.
Maybe you could turn on some rpcdebug tracing to see what is happening?
Thanks,
NeilBrown
On Tue, 11 Mar 2014 10:52:36 -0400 Steve Dickson <[email protected]> wrote:
> On 03/10/2014 06:01 PM, NeilBrown wrote:
> > [...]
> > Maybe you could turn on some rpcdebug tracing to see what is happening?
> Ok... I had to dial back my client to an older kernel (3.12)
> to start seeing what you were seeing...
>
> I would make one change and one comment... The change I would
> like to make (I'll re-post it) is to ping the server to see
> if v4 came up, instead of asking rpcbind if it's registered.
> Code-wise I think it's cleaner and quicker, plus I'm not sure
> it's a good idea to tie v4 and rpcbind together...
My logic was that if rpcbind was running at all, then any v4 server should
register with it. It would seem odd for rpcbind to report "v2 or v3" but for
v4 to be running anyway.
However I don't object in principle to your approach.
I'll have a look at the code.
>
> My comment is this... This code becomes obsolete with the 3.13
> kernel because the kernel never returns the timeout or the
> ECONNREFUSED... The mount just spins in the kernel until
> interrupted.
This sounds like a regression to me. For a system call that used to fail to
now hang sounds like an API change, and we usually discourage those.
Can it be fixed? Trond?
NeilBrown
>
> steved.
On Mar 12, 2014, at 1:38, NeilBrown <[email protected]> wrote:
> On Tue, 11 Mar 2014 10:52:36 -0400 Steve Dickson <[email protected]> wrote:
>
>> [...]
>>
>> My comment is this... This code becomes obsolete with the 3.13
>> kernel because the kernel never returns the timeout or the
>> ECONNREFUSED... The mount just spins in the kernel until
>> interrupted.
>
> This sounds like a regression to me. For a system call that used to fail to
> now hang sounds like an API change, and we usually discourage those.
>
> Can it be fixed? Trond?
Can someone please provide a test case that confirms that there has been such a change? I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect the kernel to never time out at all.
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
> >> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is
> >> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect
> >> the kernel to never time out at all.
> > It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client.
> >
>
> Which patch caused the behaviour to change?
561ec1603171cd9b38dcf6cac53e8710f437a48d is the first bad commit
commit 561ec1603171cd9b38dcf6cac53e8710f437a48d
Author: Trond Myklebust <[email protected]>
Date: Thu Sep 26 15:22:45 2013 -0400
SUNRPC: call_connect_status should recheck bind and connect status on error
Currently, we go directly to call_transmit which sends us to call_status
on error. If we know that the connect attempt failed, we should rather
just jump straight back to call_bind and call_connect.
Ditto for EAGAIN, except do not delay.
Signed-off-by: Trond Myklebust <[email protected]>
If I revert that commit from mainline (which may be a completely bogus thing
to do) then mainline works (at least for this specific simple test).
(The revert required some wiggling - I'll include it below).
To be precise, the test is to try to mount
mount server:/path /mnt
from a server which has run
rpc.nfsd -T -N4
"success" is getting periodic messages:
mount.nfs: trying text-based options 'retry=1,vers=4,addr=10.0.10.2,clientaddr=10.0.10.1'
mount.nfs: mount(2): Connection refused
"failure" is not getting those messages.
There is another change though.
With the commit above I do not get "Connection refused", but after 2
minutes I get
mount.nfs: mount(2): Connection timed out
With mainline, it waits forever.
I did a second git bisect for this change and found
2118071d3b0d57a03fad77885f4fdc364798aa87 is the first bad commit
commit 2118071d3b0d57a03fad77885f4fdc364798aa87
Author: Trond Myklebust <[email protected]>
Date: Tue Dec 31 13:22:59 2013 -0500
SUNRPC: Report connection error values to rpc_tasks on the pending queue
Currently we only report EAGAIN, which is not descriptive enough for
softconn tasks.
Signed-off-by: Trond Myklebust <[email protected]>
From this commit, a mount attempt which is getting connections denied will
block indefinitely.
Hope that is helpful.
NeilBrown
This is the revert that I mentioned - just for completeness.
From 9c1462ff54fcc2adc79c825b867c32c19e30a9a7 Mon Sep 17 00:00:00 2001
From: NeilBrown <[email protected]>
Date: Thu, 13 Mar 2014 11:38:54 +1100
Subject: [PATCH] Revert "SUNRPC: call_connect_status should recheck bind and
connect status on error"
This reverts commit 561ec1603171cd9b38dcf6cac53e8710f437a48d.
Conflicts:
net/sunrpc/clnt.c
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0edada973434..ba0cd114f0e1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1796,7 +1796,6 @@ call_connect_status(struct rpc_task *task)
dprint_status(task);
trace_rpc_connect_status(task, status);
- task->tk_status = 0;
switch (status) {
/* if soft mounted, test if we've timed out */
case -ETIMEDOUT:
@@ -1805,16 +1804,14 @@ call_connect_status(struct rpc_task *task)
case -ECONNREFUSED:
case -ECONNRESET:
case -ECONNABORTED:
- case -ENETUNREACH:
case -EHOSTUNREACH:
- /* retry with existing socket, after a delay */
- rpc_delay(task, 3*HZ);
+ case -ENETUNREACH:
if (RPC_IS_SOFTCONN(task))
break;
- case -EAGAIN:
- task->tk_action = call_bind;
- return;
+ /* retry with existing socket, after a delay */
case 0:
+ case -EAGAIN:
+ task->tk_status = 0;
clnt->cl_stats->netreconn++;
task->tk_action = call_transmit;
return;
On 03/12/2014 09:09 AM, Steve Dickson wrote:
>
>
> On 03/12/2014 07:22 AM, Trond Myklebust wrote:
>>
>> On Mar 12, 2014, at 6:57, Steve Dickson <[email protected]> wrote:
>>
>>>
>>>
>>> On 03/12/2014 05:15 AM, Trond Myklebust wrote:
>>>>
>>>> On Mar 12, 2014, at 1:38, NeilBrown <[email protected]> wrote:
>>>>
>>>>> On Tue, 11 Mar 2014 10:52:36 -0400 Steve Dickson <[email protected]> wrote:
>>>>>
>>>>>> On 03/10/2014 06:01 PM, NeilBrown wrote:
>>>>>>>
>>>>>>> With a 3.11.10 client talking to a 3.2.0 server I run
>>>>>>> rpc.nfsd 0
>>>>>>> rpc.nfsd -T -N4
>>>>>>> on the server, then
>>>>>>> rpcinfo -p SERVER | grep nfs
>>>>>>> shows
>>>>>>> 100003 2 udp 2049 nfs
>>>>>>> 100003 3 udp 2049 nfs
>>>>>>> 100227 2 udp 2049 nfs_acl
>>>>>>> 100227 3 udp 2049 nfs_acl
>>>>>>>
>>>>>>> On client I run
>>>>>>> mount -v SERVER:/PATH /mnt
>>>>>>> and I get
>>>>>>> mount.nfs: trying text-based options 'vers=4,addr=192.168.1.3,clientaddr=192.168.1.2'
>>>>>>> mount.nfs: mount(2): Connection refused
>>>>>>>
>>>>>>> repeating ever 10 seconds or so. It eventually times out after 2 minutes.
>>>>>>>
>>>>>>> Same client to a 3.10 server I get the same behaviour.
>>>>>>> 3.2.0 client and 3.10 server, same behaviour again.
>>>>>>>
>>>>>>> I have noticed that sometimes when I stop the NFS server the registration
>>>>>>> with rpcbind doesn't go away. Not often, but sometimes. I wonder if that
>>>>>>> could be confusing something? Can you check that nfsv4 has been
>>>>>>> de-registered from rpcbind?
>>>>>>>
>>>>>>> I note you are getting the error:
>>>>>>>
>>>>>>>> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
>>>>>>>
>>>>>>> This seems to suggest that rpcbind isn't running. Yet when I kill rpcbind
>>>>>>> and try a v3 mount I get
>>>>>>>
>>>>>>> mount.nfs: portmap query failed: RPC: Unable to receive - Connection refused
>>>>>>>
>>>>>>> which is slightly different, so presumably there is a different cause in your
>>>>>>> case.
>>>>>>>
>>>>>>> Maybe you could turn on some rpcdebug tracing to see what is happening?
>>>>>> Ok... I had to dial back my client to an older kernel (3.12)
>>>>>> to start seeing what you were seeing...
>>>>>>
>>>>>> I would make one change and one comment... The change I would
>>>>>> like to make (I'll re-post it) is to ping the server to see
>>>>>> if v4 came up instead of asking rpcbind if its registered.
>>>>>> Code wise I think it cleaner and quicker plus I'm not sure
>>>>>> its a good idea to tie v4 and rpcbind together...
>>>>>
>>>>> My logic was that if rpcbind was running at all, then any v4 server should
>>>>> register with it. It would seem odd for rpcbind to report "v2 or v3" but for
>>>>> v4 to be running anyway.
>>>>> However I don't object in principle to your approach.
>>>>> I'll have a look at the code.
>>>>>
>>>>>
>>>>>>
>>>>>> My comment is this... This code become obsolete with the 3.13
>>>>>> kernel because the kernel never returns the timeout or the
>>>>>> ECONNREFUSED... The mount just spins in the kernel until
>>>>>> interrupted.
>>>>>
>>>>> This sounds like a regression to me. For a systemcall that used to fail to
>>>>> now hang sounds like an API change, and we usually discourage those.
>>>>>
>>>>> Can it be fixed? Trond?
>>>>
>>>> Can someone please provide a test case that confirms that there has been such a change?
>>> On the server:
>>> rpc.nfsd 0
>>> rpc.nfsd -N4
>>>
>>> On the client
>>> mount <server>:/export /mnt
>>>
>>> I have a mount hanging/spinning since yesterday
>>> 19178 pts/2 D+ 0:26 /sbin/mount.nfs fedora:/home /mnt/home -v -o rw
>>>
>>> A stack dump from crash:
>>> PID: 19178 TASK: ffff8800ba2b41a0 CPU: 0 COMMAND: "mount.nfs"
>>> #0 [ffff8800b93115f8] __schedule at ffffffff815f0c3d
>>> #1 [ffff8800b9311660] schedule at ffffffff815f1179
>>> #2 [ffff8800b9311670] rpc_wait_bit_killable at ffffffffa03f7a35 [sunrpc]
>>> #3 [ffff8800b9311688] __wait_on_bit at ffffffff815ef200
>>> #4 [ffff8800b93116c8] out_of_line_wait_on_bit at ffffffff815ef2b7
>>> #5 [ffff8800b9311738] __rpc_execute at ffffffffa03f890a [sunrpc]
>>> #6 [ffff8800b9311798] rpc_execute at ffffffffa03f9fce [sunrpc]
>>> #7 [ffff8800b93117c8] rpc_run_task at ffffffffa03f01c0 [sunrpc]
>>> #8 [ffff8800b93117e8] rpc_call_sync at ffffffffa03f0230 [sunrpc]
>>> #9 [ffff8800b9311840] nfs4_proc_setclientid at ffffffffa06c9c49 [nfsv4]
>>> #10 [ffff8800b9311988] nfs40_discover_server_trunking at ffffffffa06d8489 [nfsv4]
>>> #11 [ffff8800b93119d0] nfs4_discover_server_trunking at ffffffffa06daf2d [nfsv4]
>>> #12 [ffff8800b9311a28] nfs4_init_client at ffffffffa06e29a4 [nfsv4]
>>> #13 [ffff8800b9311b20] nfs_get_client at ffffffffa06816ba [nfs]
>>> #14 [ffff8800b9311b80] nfs4_set_client at ffffffffa06e1fb0 [nfsv4]
>>> #15 [ffff8800b9311c00] nfs4_create_server at ffffffffa06e34ce [nfsv4]
>>> #16 [ffff8800b9311c88] nfs4_remote_mount at ffffffffa06db90e [nfsv4]
>>> #17 [ffff8800b9311cb0] mount_fs at ffffffff811b3c89
>>> #18 [ffff8800b9311cf8] vfs_kern_mount at ffffffff811cea8f
>>> #19 [ffff8800b9311d30] nfs_do_root_mount at ffffffffa06db836 [nfsv4]
>>> #20 [ffff8800b9311d70] nfs4_try_mount at ffffffffa06dbc24 [nfsv4]
>>> #21 [ffff8800b9311da0] nfs_fs_mount at ffffffffa068dcc5 [nfs]
>>> #22 [ffff8800b9311e28] mount_fs at ffffffff811b3c89
>>> #23 [ffff8800b9311e70] vfs_kern_mount at ffffffff811cea8f
>>> #24 [ffff8800b9311ea8] do_mount at ffffffff811d0e3e
>>> #25 [ffff8800b9311f28] sys_mount at ffffffff811d16b6
>>> #26 [ffff8800b9311f80] system_call_fastpath at ffffffff815fc0d9
>>>
>>>
>>>> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is
>>>> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect
>>>> the kernel to never time out at all.
>>> It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client.
>>>
>>
>> Which patch caused the behaviour to change?
> IDK.... I just know 3.12 (f19) kernel does return timeouts and 3.13 (f20) do not....
This goes for 3.14 kernels as well....
steved
On 03/12/2014 05:15 AM, Trond Myklebust wrote:
>
> On Mar 12, 2014, at 1:38, NeilBrown <[email protected]> wrote:
>
>> On Tue, 11 Mar 2014 10:52:36 -0400 Steve Dickson <[email protected]> wrote:
>>
>>> On 03/10/2014 06:01 PM, NeilBrown wrote:
>>>>
>>>> With a 3.11.10 client talking to a 3.2.0 server I run
>>>> rpc.nfsd 0
>>>> rpc.nfsd -T -N4
>>>> on the server, then
>>>> rpcinfo -p SERVER | grep nfs
>>>> shows
>>>> 100003 2 udp 2049 nfs
>>>> 100003 3 udp 2049 nfs
>>>> 100227 2 udp 2049 nfs_acl
>>>> 100227 3 udp 2049 nfs_acl
>>>>
>>>> On client I run
>>>> mount -v SERVER:/PATH /mnt
>>>> and I get
>>>> mount.nfs: trying text-based options 'vers=4,addr=192.168.1.3,clientaddr=192.168.1.2'
>>>> mount.nfs: mount(2): Connection refused
>>>>
>>>> repeating ever 10 seconds or so. It eventually times out after 2 minutes.
>>>>
>>>> Same client to a 3.10 server I get the same behaviour.
>>>> 3.2.0 client and 3.10 server, same behaviour again.
>>>>
>>>> I have noticed that sometimes when I stop the NFS server the registration
>>>> with rpcbind doesn't go away. Not often, but sometimes. I wonder if that
>>>> could be confusing something? Can you check that nfsv4 has been
>>>> de-registered from rpcbind?
>>>>
>>>> I note you are getting the error:
>>>>
>>>>> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
>>>>
>>>> This seems to suggest that rpcbind isn't running. Yet when I kill rpcbind
>>>> and try a v3 mount I get
>>>>
>>>> mount.nfs: portmap query failed: RPC: Unable to receive - Connection refused
>>>>
>>>> which is slightly different, so presumably there is a different cause in your
>>>> case.
>>>>
>>>> Maybe you could turn on some rpcdebug tracing to see what is happening?
>>> Ok... I had to dial back my client to an older kernel (3.12)
>>> to start seeing what you were seeing...
>>>
>>> I would make one change and one comment... The change I would
>>> like to make (I'll re-post it) is to ping the server to see
>>> if v4 came up instead of asking rpcbind if its registered.
>>> Code wise I think it cleaner and quicker plus I'm not sure
>>> its a good idea to tie v4 and rpcbind together...
>>
>> My logic was that if rpcbind was running at all, then any v4 server should
>> register with it. It would seem odd for rpcbind to report "v2 or v3" but for
>> v4 to be running anyway.
>> However I don't object in principle to your approach.
>> I'll have a look at the code.
>>
>>
>>>
>>> My comment is this... This code become obsolete with the 3.13
>>> kernel because the kernel never returns the timeout or the
>>> ECONNREFUSED... The mount just spins in the kernel until
>>> interrupted.
>>
>> This sounds like a regression to me. For a systemcall that used to fail to
>> now hang sounds like an API change, and we usually discourage those.
>>
>> Can it be fixed? Trond?
>
> Can someone please provide a test case that confirms that there has been such a change?
On the server:
rpc.nfsd 0
rpc.nfsd -N4
On the client
mount <server>:/export /mnt
I have a mount hanging/spinning since yesterday
19178 pts/2 D+ 0:26 /sbin/mount.nfs fedora:/home /mnt/home -v -o rw
A stack dump from crash:
PID: 19178 TASK: ffff8800ba2b41a0 CPU: 0 COMMAND: "mount.nfs"
#0 [ffff8800b93115f8] __schedule at ffffffff815f0c3d
#1 [ffff8800b9311660] schedule at ffffffff815f1179
#2 [ffff8800b9311670] rpc_wait_bit_killable at ffffffffa03f7a35 [sunrpc]
#3 [ffff8800b9311688] __wait_on_bit at ffffffff815ef200
#4 [ffff8800b93116c8] out_of_line_wait_on_bit at ffffffff815ef2b7
#5 [ffff8800b9311738] __rpc_execute at ffffffffa03f890a [sunrpc]
#6 [ffff8800b9311798] rpc_execute at ffffffffa03f9fce [sunrpc]
#7 [ffff8800b93117c8] rpc_run_task at ffffffffa03f01c0 [sunrpc]
#8 [ffff8800b93117e8] rpc_call_sync at ffffffffa03f0230 [sunrpc]
#9 [ffff8800b9311840] nfs4_proc_setclientid at ffffffffa06c9c49 [nfsv4]
#10 [ffff8800b9311988] nfs40_discover_server_trunking at ffffffffa06d8489 [nfsv4]
#11 [ffff8800b93119d0] nfs4_discover_server_trunking at ffffffffa06daf2d [nfsv4]
#12 [ffff8800b9311a28] nfs4_init_client at ffffffffa06e29a4 [nfsv4]
#13 [ffff8800b9311b20] nfs_get_client at ffffffffa06816ba [nfs]
#14 [ffff8800b9311b80] nfs4_set_client at ffffffffa06e1fb0 [nfsv4]
#15 [ffff8800b9311c00] nfs4_create_server at ffffffffa06e34ce [nfsv4]
#16 [ffff8800b9311c88] nfs4_remote_mount at ffffffffa06db90e [nfsv4]
#17 [ffff8800b9311cb0] mount_fs at ffffffff811b3c89
#18 [ffff8800b9311cf8] vfs_kern_mount at ffffffff811cea8f
#19 [ffff8800b9311d30] nfs_do_root_mount at ffffffffa06db836 [nfsv4]
#20 [ffff8800b9311d70] nfs4_try_mount at ffffffffa06dbc24 [nfsv4]
#21 [ffff8800b9311da0] nfs_fs_mount at ffffffffa068dcc5 [nfs]
#22 [ffff8800b9311e28] mount_fs at ffffffff811b3c89
#23 [ffff8800b9311e70] vfs_kern_mount at ffffffff811cea8f
#24 [ffff8800b9311ea8] do_mount at ffffffff811d0e3e
#25 [ffff8800b9311f28] sys_mount at ffffffff811d16b6
#26 [ffff8800b9311f80] system_call_fastpath at ffffffff815fc0d9
> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is
> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect
> the kernel to never time out at all.
It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client.
The patch I posted the other day fixed this by breaking out of the case statement
with an -ETIMEDOUT error in nfs4_discover_server_trunking(), instead of calling ssleep(1)
and then retrying the RPC....
steved.
On 03/10/2014 06:01 PM, NeilBrown wrote:
>
> With a 3.11.10 client talking to a 3.2.0 server I run
> rpc.nfsd 0
> rpc.nfsd -T -N4
> on the server, then
> rpcinfo -p SERVER | grep nfs
> shows
> 100003 2 udp 2049 nfs
> 100003 3 udp 2049 nfs
> 100227 2 udp 2049 nfs_acl
> 100227 3 udp 2049 nfs_acl
>
> On client I run
> mount -v SERVER:/PATH /mnt
> and I get
> mount.nfs: trying text-based options 'vers=4,addr=192.168.1.3,clientaddr=192.168.1.2'
> mount.nfs: mount(2): Connection refused
>
> repeating ever 10 seconds or so. It eventually times out after 2 minutes.
>
> Same client to a 3.10 server I get the same behaviour.
> 3.2.0 client and 3.10 server, same behaviour again.
>
> I have noticed that sometimes when I stop the NFS server the registration
> with rpcbind doesn't go away. Not often, but sometimes. I wonder if that
> could be confusing something? Can you check that nfsv4 has been
> de-registered from rpcbind?
>
> I note you are getting the error:
>
>> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
>
> This seems to suggest that rpcbind isn't running. Yet when I kill rpcbind
> and try a v3 mount I get
>
> mount.nfs: portmap query failed: RPC: Unable to receive - Connection refused
>
> which is slightly different, so presumably there is a different cause in your
> case.
>
> Maybe you could turn on some rpcdebug tracing to see what is happening?
Ok... I had to dial back my client to an older kernel (3.12)
to start seeing what you were seeing...
I would make one change and one comment... The change I would
like to make (I'll re-post it) is to ping the server to see
if v4 came up, instead of asking rpcbind if it's registered.
Code-wise I think it's cleaner and quicker, plus I'm not sure
it's a good idea to tie v4 and rpcbind together...
My comment is this... This code becomes obsolete with the 3.13
kernel because the kernel never returns the timeout or the
ECONNREFUSED... The mount just spins in the kernel until
interrupted.
steved.
On Mar 12, 2014, at 6:57, Steve Dickson <[email protected]> wrote:
>
>
> On 03/12/2014 05:15 AM, Trond Myklebust wrote:
>>
>> On Mar 12, 2014, at 1:38, NeilBrown <[email protected]> wrote:
>>
>>> On Tue, 11 Mar 2014 10:52:36 -0400 Steve Dickson <[email protected]> wrote:
>>>
>>>> On 03/10/2014 06:01 PM, NeilBrown wrote:
>>>>>
>>>>> With a 3.11.10 client talking to a 3.2.0 server I run
>>>>> rpc.nfsd 0
>>>>> rpc.nfsd -T -N4
>>>>> on the server, then
>>>>> rpcinfo -p SERVER | grep nfs
>>>>> shows
>>>>> 100003 2 udp 2049 nfs
>>>>> 100003 3 udp 2049 nfs
>>>>> 100227 2 udp 2049 nfs_acl
>>>>> 100227 3 udp 2049 nfs_acl
>>>>>
>>>>> On client I run
>>>>> mount -v SERVER:/PATH /mnt
>>>>> and I get
>>>>> mount.nfs: trying text-based options 'vers=4,addr=192.168.1.3,clientaddr=192.168.1.2'
>>>>> mount.nfs: mount(2): Connection refused
>>>>>
>>>>> repeating ever 10 seconds or so. It eventually times out after 2 minutes.
>>>>>
>>>>> Same client to a 3.10 server I get the same behaviour.
>>>>> 3.2.0 client and 3.10 server, same behaviour again.
>>>>>
>>>>> I have noticed that sometimes when I stop the NFS server the registration
>>>>> with rpcbind doesn't go away. Not often, but sometimes. I wonder if that
>>>>> could be confusing something? Can you check that nfsv4 has been
>>>>> de-registered from rpcbind?
>>>>>
>>>>> I note you are getting the error:
>>>>>
>>>>>> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
>>>>>
>>>>> This seems to suggest that rpcbind isn't running. Yet when I kill rpcbind
>>>>> and try a v3 mount I get
>>>>>
>>>>> mount.nfs: portmap query failed: RPC: Unable to receive - Connection refused
>>>>>
>>>>> which is slightly different, so presumably there is a different cause in your
>>>>> case.
>>>>>
>>>>> Maybe you could turn on some rpcdebug tracing to see what is happening?
>>>> Ok... I had to dial back my client to an older kernel (3.12)
>>>> to start seeing what you were seeing...
>>>>
>>>> I would make one change and one comment... The change I would
>>>> like to make (I'll re-post it) is to ping the server to see
>>>> if v4 came up instead of asking rpcbind if its registered.
>>>> Code wise I think it cleaner and quicker plus I'm not sure
>>>> its a good idea to tie v4 and rpcbind together...
>>>
>>> My logic was that if rpcbind was running at all, then any v4 server should
>>> register with it. It would seem odd for rpcbind to report "v2 or v3" but for
>>> v4 to be running anyway.
>>> However I don't object in principle to your approach.
>>> I'll have a look at the code.
>>>
>>>
>>>>
>>>> My comment is this... This code become obsolete with the 3.13
>>>> kernel because the kernel never returns the timeout or the
>>>> ECONNREFUSED... The mount just spins in the kernel until
>>>> interrupted.
>>>
>>> This sounds like a regression to me. For a systemcall that used to fail to
>>> now hang sounds like an API change, and we usually discourage those.
>>>
>>> Can it be fixed? Trond?
>>
>> Can someone please provide a test case that confirms that there has been such a change?
> On the server:
> rpc.nfsd 0
> rpc.nfsd -N4
>
> On the client
> mount <server>:/export /mnt
>
> I have a mount hanging/spinning since yesterday
> 19178 pts/2 D+ 0:26 /sbin/mount.nfs fedora:/home /mnt/home -v -o rw
>
> A stack dump from crash:
> PID: 19178 TASK: ffff8800ba2b41a0 CPU: 0 COMMAND: "mount.nfs"
> #0 [ffff8800b93115f8] __schedule at ffffffff815f0c3d
> #1 [ffff8800b9311660] schedule at ffffffff815f1179
> #2 [ffff8800b9311670] rpc_wait_bit_killable at ffffffffa03f7a35 [sunrpc]
> #3 [ffff8800b9311688] __wait_on_bit at ffffffff815ef200
> #4 [ffff8800b93116c8] out_of_line_wait_on_bit at ffffffff815ef2b7
> #5 [ffff8800b9311738] __rpc_execute at ffffffffa03f890a [sunrpc]
> #6 [ffff8800b9311798] rpc_execute at ffffffffa03f9fce [sunrpc]
> #7 [ffff8800b93117c8] rpc_run_task at ffffffffa03f01c0 [sunrpc]
> #8 [ffff8800b93117e8] rpc_call_sync at ffffffffa03f0230 [sunrpc]
> #9 [ffff8800b9311840] nfs4_proc_setclientid at ffffffffa06c9c49 [nfsv4]
> #10 [ffff8800b9311988] nfs40_discover_server_trunking at ffffffffa06d8489 [nfsv4]
> #11 [ffff8800b93119d0] nfs4_discover_server_trunking at ffffffffa06daf2d [nfsv4]
> #12 [ffff8800b9311a28] nfs4_init_client at ffffffffa06e29a4 [nfsv4]
> #13 [ffff8800b9311b20] nfs_get_client at ffffffffa06816ba [nfs]
> #14 [ffff8800b9311b80] nfs4_set_client at ffffffffa06e1fb0 [nfsv4]
> #15 [ffff8800b9311c00] nfs4_create_server at ffffffffa06e34ce [nfsv4]
> #16 [ffff8800b9311c88] nfs4_remote_mount at ffffffffa06db90e [nfsv4]
> #17 [ffff8800b9311cb0] mount_fs at ffffffff811b3c89
> #18 [ffff8800b9311cf8] vfs_kern_mount at ffffffff811cea8f
> #19 [ffff8800b9311d30] nfs_do_root_mount at ffffffffa06db836 [nfsv4]
> #20 [ffff8800b9311d70] nfs4_try_mount at ffffffffa06dbc24 [nfsv4]
> #21 [ffff8800b9311da0] nfs_fs_mount at ffffffffa068dcc5 [nfs]
> #22 [ffff8800b9311e28] mount_fs at ffffffff811b3c89
> #23 [ffff8800b9311e70] vfs_kern_mount at ffffffff811cea8f
> #24 [ffff8800b9311ea8] do_mount at ffffffff811d0e3e
> #25 [ffff8800b9311f28] sys_mount at ffffffff811d16b6
> #26 [ffff8800b9311f80] system_call_fastpath at ffffffff815fc0d9
>
>
>> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is
>> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect
>> the kernel to never time out at all.
> It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client.
>
Which patch caused the behaviour to change?
_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On 03/12/2014 07:22 AM, Trond Myklebust wrote:
>
> On Mar 12, 2014, at 6:57, Steve Dickson <[email protected]> wrote:
>
>>
>>
>> On 03/12/2014 05:15 AM, Trond Myklebust wrote:
>>>
>>> On Mar 12, 2014, at 1:38, NeilBrown <[email protected]> wrote:
>>>
>>>> On Tue, 11 Mar 2014 10:52:36 -0400 Steve Dickson <[email protected]> wrote:
>>>>
>>>>> On 03/10/2014 06:01 PM, NeilBrown wrote:
>>>>>>
>>>>>> With a 3.11.10 client talking to a 3.2.0 server I run
>>>>>> rpc.nfsd 0
>>>>>> rpc.nfsd -T -N4
>>>>>> on the server, then
>>>>>> rpcinfo -p SERVER | grep nfs
>>>>>> shows
>>>>>> 100003 2 udp 2049 nfs
>>>>>> 100003 3 udp 2049 nfs
>>>>>> 100227 2 udp 2049 nfs_acl
>>>>>> 100227 3 udp 2049 nfs_acl
>>>>>>
>>>>>> On client I run
>>>>>> mount -v SERVER:/PATH /mnt
>>>>>> and I get
>>>>>> mount.nfs: trying text-based options 'vers=4,addr=192.168.1.3,clientaddr=192.168.1.2'
>>>>>> mount.nfs: mount(2): Connection refused
>>>>>>
>>>>>> repeating ever 10 seconds or so. It eventually times out after 2 minutes.
>>>>>>
>>>>>> Same client to a 3.10 server I get the same behaviour.
>>>>>> 3.2.0 client and 3.10 server, same behaviour again.
>>>>>>
>>>>>> I have noticed that sometimes when I stop the NFS server the registration
>>>>>> with rpcbind doesn't go away. Not often, but sometimes. I wonder if that
>>>>>> could be confusing something? Can you check that nfsv4 has been
>>>>>> de-registered from rpcbind?
>>>>>>
>>>>>> I note you are getting the error:
>>>>>>
>>>>>>> mount.nfs: portmap query failed: RPC: Remote system error - Connection refused
>>>>>>
>>>>>> This seems to suggest that rpcbind isn't running. Yet when I kill rpcbind
>>>>>> and try a v3 mount I get
>>>>>>
>>>>>> mount.nfs: portmap query failed: RPC: Unable to receive - Connection refused
>>>>>>
>>>>>> which is slightly different, so presumably there is a different cause in your
>>>>>> case.
>>>>>>
>>>>>> Maybe you could turn on some rpcdebug tracing to see what is happening?
>>>>> Ok... I had to dial back my client to an older kernel (3.12)
>>>>> to start seeing what you were seeing...
>>>>>
>>>>> I would make one change and one comment... The change I would
>>>>> like to make (I'll re-post it) is to ping the server to see
>>>>> if v4 came up instead of asking rpcbind if its registered.
>>>>> Code wise I think it cleaner and quicker plus I'm not sure
>>>>> its a good idea to tie v4 and rpcbind together...
>>>>
>>>> My logic was that if rpcbind was running at all, then any v4 server should
>>>> register with it. It would seem odd for rpcbind to report "v2 or v3" but for
>>>> v4 to be running anyway.
>>>> However I don't object in principle to your approach.
>>>> I'll have a look at the code.
>>>>
>>>>
>>>>>
>>>>> My comment is this... This code become obsolete with the 3.13
>>>>> kernel because the kernel never returns the timeout or the
>>>>> ECONNREFUSED... The mount just spins in the kernel until
>>>>> interrupted.
>>>>
>>>> This sounds like a regression to me. For a systemcall that used to fail to
>>>> now hang sounds like an API change, and we usually discourage those.
>>>>
>>>> Can it be fixed? Trond?
>>>
>>> Can someone please provide a test case that confirms that there has been such a change?
>> On the server:
>> rpc.nfsd 0
>> rpc.nfsd -N4
>>
>> On the client
>> mount <server>:/export /mnt
>>
>> I have a mount hanging/spinning since yesterday
>> 19178 pts/2 D+ 0:26 /sbin/mount.nfs fedora:/home /mnt/home -v -o rw
>>
>> A stack dump from crash:
>> PID: 19178 TASK: ffff8800ba2b41a0 CPU: 0 COMMAND: "mount.nfs"
>> #0 [ffff8800b93115f8] __schedule at ffffffff815f0c3d
>> #1 [ffff8800b9311660] schedule at ffffffff815f1179
>> #2 [ffff8800b9311670] rpc_wait_bit_killable at ffffffffa03f7a35 [sunrpc]
>> #3 [ffff8800b9311688] __wait_on_bit at ffffffff815ef200
>> #4 [ffff8800b93116c8] out_of_line_wait_on_bit at ffffffff815ef2b7
>> #5 [ffff8800b9311738] __rpc_execute at ffffffffa03f890a [sunrpc]
>> #6 [ffff8800b9311798] rpc_execute at ffffffffa03f9fce [sunrpc]
>> #7 [ffff8800b93117c8] rpc_run_task at ffffffffa03f01c0 [sunrpc]
>> #8 [ffff8800b93117e8] rpc_call_sync at ffffffffa03f0230 [sunrpc]
>> #9 [ffff8800b9311840] nfs4_proc_setclientid at ffffffffa06c9c49 [nfsv4]
>> #10 [ffff8800b9311988] nfs40_discover_server_trunking at ffffffffa06d8489 [nfsv4]
>> #11 [ffff8800b93119d0] nfs4_discover_server_trunking at ffffffffa06daf2d [nfsv4]
>> #12 [ffff8800b9311a28] nfs4_init_client at ffffffffa06e29a4 [nfsv4]
>> #13 [ffff8800b9311b20] nfs_get_client at ffffffffa06816ba [nfs]
>> #14 [ffff8800b9311b80] nfs4_set_client at ffffffffa06e1fb0 [nfsv4]
>> #15 [ffff8800b9311c00] nfs4_create_server at ffffffffa06e34ce [nfsv4]
>> #16 [ffff8800b9311c88] nfs4_remote_mount at ffffffffa06db90e [nfsv4]
>> #17 [ffff8800b9311cb0] mount_fs at ffffffff811b3c89
>> #18 [ffff8800b9311cf8] vfs_kern_mount at ffffffff811cea8f
>> #19 [ffff8800b9311d30] nfs_do_root_mount at ffffffffa06db836 [nfsv4]
>> #20 [ffff8800b9311d70] nfs4_try_mount at ffffffffa06dbc24 [nfsv4]
>> #21 [ffff8800b9311da0] nfs_fs_mount at ffffffffa068dcc5 [nfs]
>> #22 [ffff8800b9311e28] mount_fs at ffffffff811b3c89
>> #23 [ffff8800b9311e70] vfs_kern_mount at ffffffff811cea8f
>> #24 [ffff8800b9311ea8] do_mount at ffffffff811d0e3e
>> #25 [ffff8800b9311f28] sys_mount at ffffffff811d16b6
>> #26 [ffff8800b9311f80] system_call_fastpath at ffffffff815fc0d9
>>
>>
>>> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is
>>> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect
>>> the kernel to never time out at all.
>> It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client.
>>
>
> Which patch caused the behaviour to change?
IDK.... I just know the 3.12 (f19) kernel does return timeouts and the 3.13 (f20) one does not....
steved.
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>