2007-07-29 19:25:25

by Steve Dickson

Subject: Re: Status of mount.nfs

Chuck Lever wrote:
>>> umount.nfs uses getmntdirbackward(), which probes /etc/mtab, as far
>>> as I can tell. One problem with this is that often the effective
>>> transport protocol isn't listed in /etc/mtab at all, if, say, the
>>> user requests TCP and the server supports only UDP.
>> This got lost in the translation... In older mount code (i.e. the one
>> in util-linux) /proc/mounts is used, which is a much simpler way
>> of dealing with this... imho..
>
> Miklos seems intent on eliminating /etc/mtab anyway...
Good...

>
>>> I can't see why we need to refer back to either file to determine the
>>> transport protocol for a umount request. Whatever transport mountd
>>> is advertising at the moment is what should be used, right?
>> Well for firewall reasons you generally want to use the protocol
>> that the mount used...
>
> That could have been a very long time ago, even months, and the server
> settings may have changed. Thus sending what mount used seems
> inherently unreliable. The race window is enormous!
hmm... I must be missing something... Why is umount-ing with the
same network protocol that mount used unreliable and racy?

>
>>> [ Steve, since you have a different recollection of how all this
>>> mount stuff works, I wonder if Amit took an older version of mount
>>> when he split out the new mount.nfs helper... Can you verify this?
>>> Maybe there are some fixes you made that need to be ported over. ]
>> No... I'm pretty sure I had Amit use the latest and greatest...
>> I just think there were some decisions made or liberties taken
>> without a complete understanding of what the ramifications were...
>
> Thanks for checking on this. I worried we may have missed some
> important bug fixes.
A while back I did a patch dump of all the bugs we found when
we added the new code to Fedora... Neil's tree has all the patches
we have...


> Well, if libtirpc is added to nfs-utils, the mount command could use
> that instead. We'd be able to fix any bugs in libtirpc quite easily.
> That seems like an excellent way to address every problem with glibc's
> RPC implementation, and immediately have a "simple" use case for testing
> libtirpc (or whatever we have to replace the RPC functionality in glibc).
I couldn't agree with you more... At this point both rpcbind and libtirpc
are now fully supported by both Bull and yours truly... Both
tarballs are available on SourceForge:

http://sourceforge.net/projects/libtirpc/
http://sourceforge.net/projects/rpcbind/
Git trees are at:
http://git.infradead.org/?p=users/steved/libtirpc.git
http://git.infradead.org/?p=users/steved/rpcbind.git

And of course the rpms are available from Fedora mirrors

At this point everything is not quite synced up, but that
will change very shortly... and new releases will be coming
because the code is being used and bugs are being fixed...

The next major step would be to port nfs-utils to use libtirpc
which is on my todo list along with a ton of other things... :-\

In the end, I think it would be a very good move for our community
to own the entire stack... including the RPC library code...

steved.


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-07-31 18:30:38

by Trond Myklebust

[permalink] [raw]
Subject: Re: Status of mount.nfs

On Sat, 2007-07-28 at 08:51 -0400, Steve Dickson wrote:
> Trond Myklebust wrote:
> > On Fri, 2007-07-27 at 13:07 -0400, Steve Dickson wrote:
> >
> >> After how long of a wait? If all the timeouts are controllable then
> >> I agree, but if we have to wait an indeterminate amount of time for
> >> every RPC retransmit, then I think we should do a ping...
> >
> > You should be able to set the timeout either on a per-RPC call basis by
> > using clnt_call(), or by changing the default timeout on the CLIENT
> > object using clnt_control() (man 3 rpc).
> True... but I was thinking of when clnt_create() calls pmap_getport(),
> which does not take a timeout value... In that case you are
> stuck with a 60-second hard-coded timeout, regardless of the timeout
> you pass in...

In the version of mount.nfs from the linux-nfs.org git tree, we call our
own private implementation of pmap_getport() instead of the one from
glibc.
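A sketch of what a private GETPORT implementation buys you: build the request yourself (field layout per RFC 1057; the helper name and xid are illustrative, not the actual nfs-utils code) and send it on a socket whose timeout the caller fully controls, instead of inheriting glibc's hard-coded 60 seconds:

```python
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
IPPROTO_TCP, IPPROTO_UDP = 6, 17

def getport_call(xid, prog, vers, proto):
    """Pack a portmapper GETPORT request for prog/vers on the given proto."""
    header = struct.pack(">6I", xid, 0, 2,           # xid, CALL, RPC v2
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    auth = struct.pack(">4I", 0, 0, 0, 0)            # AUTH_NONE cred + verf
    args = struct.pack(">4I", prog, vers, proto, 0)  # port argument is unused
    return header + auth + args

# The caller then controls the wait entirely, e.g.:
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.settimeout(5.0)   # our timeout, not glibc's hard-coded 60s
#   sock.sendto(getport_call(7, 100005, 3, IPPROTO_UDP), (server, 111))
```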

Cheers
Trond



2007-07-24 21:10:58

by Trond Myklebust

[permalink] [raw]
Subject: Re: Status of mount.nfs

On Tue, 2007-07-24 at 16:46 -0400, Chuck Lever wrote:

> The TCP case fails because mount.nfs is using the portmapper even though
> the user has specified the ports on the command line. Could that be the
> root cause of the failure?

If the user specifies a port, then there is no real good reason to use
the portmapper:

* For the case of the mount protocol, we should just try an RPC
call and then look at the returned RPC error values to figure
out which versions of the protocol are supported if it fails (or
alternatively, fall back to UDP if the TCP connection fails).
* If the mount call has succeeded and we have a port for the NFS
server, we should probably just try the mount call for a
sufficiently recent kernel, then look at the returned error
codes. For older kernels (pre 2.6.13?) which don't return decent
error values, then ping first in userland and look at the RPC
return values. Retry with UDP if TCP doesn't work...
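The fallback order described above can be sketched roughly as follows; `negotiate_transport` and its `try_call` callback are hypothetical names standing in for the real RPC probe and its error-value check:

```python
def negotiate_transport(try_call, prefer="tcp"):
    """Return the first transport for which try_call succeeds, else None.

    try_call(transport) stands in for issuing the actual RPC call and
    inspecting the returned RPC error values.
    """
    order = [prefer] + [t for t in ("tcp", "udp") if t != prefer]
    for transport in order:
        if try_call(transport):
            return transport
    return None
```

If the preferred transport's call fails outright (e.g. the TCP connection is refused), the loop simply retries with UDP, matching the "retry with UDP if TCP doesn't work" step.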

Trond



2007-07-26 12:20:26

by Steve Dickson

Subject: Re: rpcbind behavior on Fedora 7

Sorry for the delayed response... I was traveling...

Chuck Lever wrote:
> I was trying out the mount.nfs test case for another bug (see attached).
> The test case didn't work against a Fedora 7 server. Trying to mount
> with UDP against a specific port just hangs. So I tried an rpcinfo
> against it to see what the current configuration was.
In /etc/netconfig, switch the order of the udp/tcp and udp6/tcp6
entries, making the udp/tcp entries first. Similar to:

--- /etc/netconfig.orig 2005-05-18 01:10:50.000000000 -0400
+++ /etc/netconfig 2007-07-24 09:45:40.000000000 -0400
@@ -10,10 +10,10 @@
# The <device> and <nametoaddr_libs> fields are always empty in this
# implementation.
#
-udp6 tpi_clts v inet6 udp - -
-tcp6 tpi_cots_ord v inet6 tcp - -
udp tpi_clts v inet udp - -
tcp tpi_cots_ord v inet tcp - -
+udp6 tpi_clts v inet6 udp - -
+tcp6 tpi_cots_ord v inet6 tcp - -
rawip tpi_raw - inet - - -
local tpi_cots_ord - loopback - - -
unix tpi_cots_ord - loopback - - -


steved.



2007-07-26 12:47:16

by Steve Dickson

Subject: Re: Status of mount.nfs



Chuck Lever wrote:
>
> I'm looking at probe_nfsport() and probe_mntport() and I see that the
> portmapper call is avoided iff the protocol version, transport protocol,
> and the port number are all specified in advance.
>
> So if you specify:
>
> -o mountport=650,port=2049
>
> mount will still contact the server's portmapper to determine which
> transport protocols are available (and this breaks through a firewall
> that doesn't pass portmapper requests). Only if you specify:
>
> -o tcp,mountport=650,port=2049
>
> or
>
> -o udp,mountport=650,port=2049
>
> then the portmapper calls are avoided entirely. For some reason
> probe_bothports() sets the default NFS version to 3 but does not set a
> default transport protocol.
The protocol is set by probe_port().

>
> The transport protocols are probed differently for NFS and MNT: for MNT,
> UDP is probed first then TCP; for NFS, the opposite is true. The mount
> command is supposed to try both transport protocol types both for NFS
> and MNT, but it appears that it is failing to try the other type if the
> first fails... I see this is also problematic for umount.nfs.
The idea here was to use UDP to probe for both rpc.mountd and the
NFS server so as not to put (TCP) ports in TIME_WAIT, basically making
them unavailable for the actual mount. This allowed many more TCP
mounts to happen during autofs mount storms.

Note: if the protocol was explicitly specified (i.e. proto=tcp), only
that protocol was used during the probing and mounting. I have
been asked to change that as well, meaning even when proto=tcp is
specified, they still want the UDP probing to occur.

steved.




2007-07-27 15:00:58

by Steve Dickson

Subject: Re: Status of mount.nfs

Chuck Lever wrote:
>
> And umount.nfs always uses TCP for the mountd request. I have a patch
> that fixes that to behave more like mount.nfs does, which I will forward
> in the next day or two.
that's a bug... umount should use the protocol the mount did...
I thought I had fixed that... :-\

>
> I notice some problems if a share is mounted with TCP, but the server
> later disables TCP -- umount.nfs hiccups on that when it tries to umount
> using the same protocol as listed in /etc/mtab. Perhaps relying on
> /etc/mtab for setting the umount protocol is unnecessary.
I think I was using /proc/mounts...

>
> We have three requests that need to be made:
>
> 1. GETPORT -- I think this should be UDP all the time unless proto=tcp is
> explicitly specified;
Some people have asked that we first try UDP all the time... which
I have resisted but it might make sense...

>
> 2. MNT -- likewise, UDP unless proto=tcp is specified or GETPORT says
> UDP is not supported;
>
> 3. NFS -- this should be TCP all the time unless proto=udp is specified
> or GETPORT says TCP is not supported.
What about rollbacks... meaning if tcp is not supported do we try udp?
If v4 is not supported do we try v3 and then v2, or just fail the mount?

>
> Even better would be to use RPCB_DUMP instead of RPCB_GETPORT. That way
> we only need a single rpcbind call for both protocols, and can get
> transport protocol information as well, and make an "informed" choice.
Good point... but note, a while back I got a request to use GETPORT
instead of DUMP because some Cisco routers actually use the GETPORTs
to punch holes in their firewalls.
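A rough sketch of the single-DUMP idea: fetch the server's whole registration list once, then make the transport choice locally. The tuple layout and helper name below are assumptions for illustration, not the rpcbind wire format:

```python
NFS_PROG, MNT_PROG = 100003, 100005   # standard ONC RPC program numbers

def pick_transport(dump_entries, prog, vers, prefer=("tcp", "udp")):
    """From one dump of (prog, vers, proto, port) tuples, pick a transport.

    One DUMP round trip replaces a GETPORT per protocol, and the client
    can make an "informed" choice from the full list.
    """
    available = {proto: port
                 for p, v, proto, port in dump_entries
                 if p == prog and v == vers}
    for proto in prefer:
        if proto in available:
            return proto, available[proto]
    return None
```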
>
> Also, can we get rid of the clnt_ping()? If not, can we document why it
> is there? It adds two extra round trips to the whole process. If error
> reporting is the problem, maybe we can try the pings only if the kernel
> part of the mount process fails?
How do we avoid hangs down deep in RPC land (governed by
an uncontrollable timeout) when either mountd or nfsd is not up?

That was the main reason for the ping. Since neither portmapper nor
rpcbind pings its services before handing out the ports, there
is really no way of telling whether the service is up. So to avoid
the hang, we ping them... Sure it's costly network-wise, but
hanging during a boot because a server is not responding is
a bit more costly... imho...
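The "ping" here is an RPC NULL call (procedure 0). As a minimal sketch of what such a probe puts on the wire (field layout per RFC 1057; the helper name and xid value are arbitrary):

```python
import struct

def null_call(xid, prog, vers):
    """Pack an ONC RPC CALL message for NULLPROC (procedure 0), AUTH_NONE."""
    return struct.pack(
        ">10I",
        xid,    # transaction id, echoed back in the reply
        0,      # msg_type = CALL
        2,      # RPC protocol version
        prog,   # program number, e.g. 100005 for mountd
        vers,   # program version
        0,      # procedure = NULLPROC
        0, 0,   # credentials: AUTH_NONE, zero length
        0, 0,   # verifier:    AUTH_NONE, zero length
    )
```

Sent on a UDP socket with a short settimeout(), this answers "is the service up?" without tying up a TCP port, at the cost of the extra round trips being debated here.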

steved.



2007-07-27 15:56:16

by Trond Myklebust

Subject: Re: Status of mount.nfs

On Fri, 2007-07-27 at 11:00 -0400, Steve Dickson wrote:

> That was the main reason for the ping. Since neither portmapper nor
> rpcbind pings its services before handing out the ports, there
> is really no way of telling whether the service is up. So to avoid
> the hang, we ping them... Sure it's costly network-wise, but
> hanging during a boot because a server is not responding is
> a bit more costly... imho...

Right, but recent kernels both can and will ping the NFS service for
you.

Cheers
Trond



2007-07-27 16:16:58

by Steve Dickson

Subject: Re: Status of mount.nfs



Trond Myklebust wrote:
> On Fri, 2007-07-27 at 11:00 -0400, Steve Dickson wrote:
>
>> That was the main reason for the ping. Since neither portmapper nor
>> rpcbind pings its services before handing out the ports, there
>> is really no way of telling whether the service is up. So to avoid
>> the hang, we ping them... Sure it's costly network-wise, but
>> hanging during a boot because a server is not responding is
>> a bit more costly... imho...
>
> Right, but recent kernels both can and will ping the NFS service for
> you.
Good point... but that's just as costly (wrt network traffic) as
the mount command doing the pinging... plus the status of the
remote mountd is also needed.

steved.



2007-07-27 16:28:10

by Trond Myklebust

Subject: Re: Status of mount.nfs

On Fri, 2007-07-27 at 12:16 -0400, Steve Dickson wrote:
>
> Trond Myklebust wrote:
> > On Fri, 2007-07-27 at 11:00 -0400, Steve Dickson wrote:
> >
> >> That was the main reason for the ping. Since neither portmapper nor
> >> rpcbind pings its services before handing out the ports, there
> >> is really no way of telling whether the service is up. So to avoid
> >> the hang, we ping them... Sure it's costly network-wise, but
> >> hanging during a boot because a server is not responding is
> >> a bit more costly... imho...
> >
> > Right, but recent kernels both can and will ping the NFS service for
> > you.
> Good point... but that's just as costly (wrt network traffic) as
> the mount command doing the pinging... plus the status of the
> remote mountd is also needed.

The kernel ping is sent on the same connection to the server that the
NFS client will use, so we don't waste any extra TCP ports.

I agree that you need to figure out the mountd parameters, but the right
thing to do there is simply to try the command. The RPC return values
will tell you if the service or version you tried isn't supported.

Cheers
Trond



2007-07-27 17:07:29

by Steve Dickson

Subject: Re: Status of mount.nfs



Trond Myklebust wrote:
> On Fri, 2007-07-27 at 12:16 -0400, Steve Dickson wrote:
>> Trond Myklebust wrote:
>>> On Fri, 2007-07-27 at 11:00 -0400, Steve Dickson wrote:
>>>
>>>> That was the main reason for the ping. Since neither portmapper nor
>>>> rpcbind pings its services before handing out the ports, there
>>>> is really no way of telling whether the service is up. So to avoid
>>>> the hang, we ping them... Sure it's costly network-wise, but
>>>> hanging during a boot because a server is not responding is
>>>> a bit more costly... imho...
>>> Right, but recent kernels both can and will ping the NFS service for
>>> you.
>> Good point... but that's just as costly (wrt network traffic) as
>> the mount command doing the pinging... plus the status of the
>> remote mountd is also needed.
>
> The kernel ping is sent on the same connection to the server that the
> NFS client will use, so we don't waste any extra TCP ports.
>
> I agree that you need to figure out the mountd parameters, but the right
> thing to do there is simply to try the command. The RPC return values
> will tell you if the service or version you tried isn't supported.
After how long of a wait? If all the timeouts are controllable then
I agree, but if we have to wait an indeterminate amount of time for
every RPC retransmit, then I think we should do a ping...

steved.



2007-07-27 17:13:30

by Trond Myklebust

Subject: Re: Status of mount.nfs

On Fri, 2007-07-27 at 13:07 -0400, Steve Dickson wrote:

> After how long of a wait? If all the timeouts are controllable then
> I agree, but if we have to wait an indeterminate amount of time for
> every RPC retransmit, then I think we should do a ping...

You should be able to set the timeout either on a per-RPC call basis by
using clnt_call(), or by changing the default timeout on the CLIENT
object using clnt_control() (man 3 rpc).

Cheers
Trond



2007-07-28 12:53:14

by Steve Dickson

Subject: Re: Status of mount.nfs

Trond Myklebust wrote:
> On Fri, 2007-07-27 at 13:07 -0400, Steve Dickson wrote:
>
>> After how long of a wait? If all the timeouts are controllable then
>> I agree, but if we have to wait an indeterminate amount of time for
>> every RPC retransmit, then I think we should do a ping...
>
> You should be able to set the timeout either on a per-RPC call basis by
> using clnt_call(), or by changing the default timeout on the CLIENT
> object using clnt_control() (man 3 rpc).
True... but I was thinking of when clnt_create() calls pmap_getport(),
which does not take a timeout value... In that case you are
stuck with a 60-second hard-coded timeout, regardless of the timeout
you pass in...

steved.


2007-07-28 13:21:19

by Steve Dickson

Subject: Re: Status of mount.nfs

Chuck Lever wrote:
> Sorry about all the questions... and thanks for providing the history.
> The good news is that, now that [u]mount.nfs resides in nfs-utils, we
> can easily make this work a whole lot better.
>
> Steve Dickson wrote:
>> Chuck Lever wrote:
>>>
>>> And umount.nfs always uses TCP for the mountd request. I have a
>>> patch that fixes that to behave more like mount.nfs does, which I
>>> will forward in the next day or two.
>> that's a bug... umount should use the protocol the mount did...
>> I thought I had fixed that... :-\
>
> Nope... umount.nfs sets the transport protocol to TCP explicitly before
> doing the umount call. Check out utils/mount/nfsumount.c:_nfsumount() .
>
>>> I notice some problems if a share is mounted with TCP, but the server
>>> later disables TCP -- umount.nfs hiccups on that when it tries to
>>> umount using the same protocol as listed in /etc/mtab. Perhaps
>>> relying on /etc/mtab for setting the umount protocol is unnecessary.
>> I think I was using /proc/mounts...
>
> umount.nfs uses getmntdirbackward(), which probes /etc/mtab, as far as I
> can tell. One problem with this is that often the effective transport
> protocol isn't listed in /etc/mtab at all, if, say, the user requests
> TCP and the server supports only UDP.
This got lost in the translation... In older mount code (i.e. the one
in util-linux) /proc/mounts is used, which is a much simpler way
of dealing with this... imho..

>
> I can't see why we need to refer back to either file to determine the
> transport protocol for a umount request. Whatever transport mountd is
> advertising at the moment is what should be used, right?
Well for firewall reasons you generally want to use the protocol
that the mount used...

>
> [ Steve, since you have a different recollection of how all this mount
> stuff works, I wonder if Amit took an older version of mount when he
> split out the new mount.nfs helper... Can you verify this? Maybe there
> are some fixes you made that need to be ported over. ]
No... I'm pretty sure I had Amit use the latest and greatest...
I just think there were some decisions made or liberties taken
without a complete understanding of what the ramifications were...

>
>>> 2. MNT -- likewise, UDP unless proto=tcp is specified or GETPORT
>>> says UDP is not supported;
>>>
>>> 3. NFS -- this should be TCP all the time unless proto=udp is
>>> specified or GETPORT says TCP is not supported.
>> What about rollbacks... meaning if tcp is not supported do we try udp?
>> If v4 is not supported do we try v3 and then v2, or just fail the mount?
>
> I think breaking back can be supported by grabbing all data about the
> interesting services from portmapper at the start of a mount request.
> That way mount.nfs can build the correct request based on the list of
> services advertised on the server, and the list of options from the
> mount command line, then make a single set of requests. No retry logic
> is needed except for handling "bg".
right...


>>> Also, can we get rid of the clnt_ping()? If not, can we document why
>>> it is there? It adds two extra round trips to the whole process. If
>>> error reporting is the problem, maybe we can try the pings only if
>>> the kernel part of the mount process fails?
>> How do we avoid hangs down deep in RPC land (governed by
>> an uncontrollable timeout) when either mountd or nfsd is not up?
>
> I guess I don't see how a NULL RPC is different than sending a real
> request, when we're talking about a single MNT request from a user space
> application. If the service is down, it fails either way.
As long as the request does not get caught up in some unreasonably
long timeout in the RPC code... there is no difference... Waiting
60 seconds for each retry, or to find out some service is down, would
not be a good thing when a machine is coming up...

>
>> That was the main reason for the ping. Since neither portmapper nor
>> rpcbind pings its services before handing out the ports, there
>> is really no way of telling whether the service is up. So to avoid
>> the hang, we ping them... Sure it's costly network-wise, but
>> hanging during a boot because a server is not responding is
>> a bit more costly... imho...
>
> My feeling is we should then fix the kernel to behave more reasonably. I
> recently changed the kernel's rpcbind client to use "intr" instead of
> "nointr" for its requests, for example. Is it practical to track down
> the hangs and fix them?
In the kernel, yes; in glibc, no, because that code will not
change, period!

> Is it just the long time waiting for a failure,
> or do the mount processes actually get totally stuck?
It's a long wait that cannot be controlled...

steved.





2007-07-24 17:50:14

by Trond Myklebust

Subject: Re: Status of mount.nfs

On Tue, 2007-07-24 at 19:24 +0200, Steinar H. Gunderson wrote:
> On Mon, Jul 23, 2007 at 06:13:42PM -0400, Chuck Lever wrote:
> > It would help if we could take a look at a clean network trace of the bad
> > and the good mount operations.
>
> It was quite simple to test this myself. I started the kernel server on a
> machine, then shut down portmap. First I did:
>
> fugl:~> sudo mount -t nfs -o port=2049,mountport=901,nfsvers=3 192.168.0.101:/ /mnt
> mount: mount to NFS server '192.168.0.101' failed: System Error: Connection refused.
>
> The dump is attached as "default.dump". Then I did
>
> fugl:~> sudo mount -t nfs -o port=2049,mountport=901,nfsvers=3,udp 192.168.0.101:/ /mnt
>
> which is attached as "udp.dump".
>
> Note that in default.dump, UDP is simply never tried at all. I believe that
> to be a bug.

Nope. Nowhere in the documentation will you find a promise to fall back
to UDP.

Trond



2007-07-24 17:55:31

by Steinar H. Gunderson

Subject: Re: Status of mount.nfs

On Tue, Jul 24, 2007 at 01:50:05PM -0400, Trond Myklebust wrote:
>> Note that in default.dump, UDP is simply never tried at all. I believe that
>> to be a bug.
> Nope. Nowhere in the documentation will you find a promise to fall back
> to UDP.

OK, but in that case the regression should still be documented, as it seems
this worked before.

/* Steinar */
--
Homepage: http://www.sesse.net/


2007-08-01 11:00:07

by Steve Dickson

Subject: Re: Status of mount.nfs

Chuck Lever wrote:
> I was looking at this yesterday. The stock timeout for TCP connects on
> Linux is 75 seconds. The version of getport() used in the mount command
> might control the TCP connect timeout by using a non-blocking connect()
> with a select(). The select() then times out if the connection doesn't
> complete.
>
> But I'm wondering if we really want to continue using TCP for GETPORT
> calls. Solaris mount appears to use only UDP for GETPORT, for example.
As long as the GETPORTs don't use privileged ports I don't think it's
a problem... plus I don't think one size fits all... meaning due to
different firewall requirements both UDP and TCP GETPORTs will be
needed... imho...

steved.


2007-08-01 21:12:37

by Steve Dickson

Subject: Re: Status of mount.nfs



Chuck Lever wrote:
> Steve Dickson wrote:
>> Chuck Lever wrote:
>>> I was looking at this yesterday. The stock timeout for TCP connects
>>> on Linux is 75 seconds. The version of getport() used in the mount
>>> command might control the TCP connect timeout by using a non-blocking
>>> connect() with a select(). The select() then times out if the
>>> connection doesn't complete.
>>>
>>> But I'm wondering if we really want to continue using TCP for GETPORT
>>> calls. Solaris mount appears to use only UDP for GETPORT, for example.
>
>> As long as the GETPORTs don't use privileged ports I don't think it's
>> a problem...
>
> Not sure what you mean. Yesterday you said the TCP connect timeout
> *was* a problem. I've recommended two ways to address it.
TCP timeouts are a problem if you can't control them... But
point taken... UDP is probably the best way to query a
portmapper or rpcbind to get the needed info...

>
> The ephemeral port space is limited too, don't forget. It's simply a
> somewhat larger space than the privileged port space. If a large
> network application (say, a web server) is running on the system, that
> space can shrink fairly rapidly, and we're in nearly the same boat as
> with privileged ports. Using a TCP connection from an ephemeral port
> only mitigates the port space problem, it doesn't really correct it
> entirely.
It only mitigates the problem for a short time, and you'll always run
out of privileged ports before running out of non-privileged ones, but
again... point taken... eliminating the problem is probably
the answer...

>
>> plus I don't think one size fixes all.. meaning due to
>> different firewalls requirements both udp and tcp GETPORTS will be
>> needed... imho...
>
> We say "firewall!" a lot, but I would like to see typical use cases for
> mounting through a firewall so I understand what kind of implementation
> we're aiming for (and maybe even what kind of test cases to build!). Do
> our users really expect to mount NFS shares through any firewall with
> "-o defaults" ?
Yes! Mostly on the server side... meaning people wanted to set the
port the daemons listen on (via the initscripts) so clients can
access the server through a firewall... Is this a common setup?
No. But there are people that want a firewall between the
server and client... Also I can only assume the reason for the
'mountport=' option was to work better with firewalls...
but that is only speculation...


>
> I'd like to hear from the distributors what you consider are the use
> cases that absolutely must be supported. Otherwise we will end up
> standing on our left big toenail to support stuff that isn't worth the
> pain or is never used.
In the end, I think we need to be able to control the ports and
protocols mount uses, allowing people to punch holes in firewalls.


steved.
