2009-08-07 06:42:24

by Benny Halevy

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
> Anyone ?
>
> 2009/7/29 Carlos André <[email protected]>:
>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
>> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>
>> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
>> user logon hangs. Then i want configure it to timeout (if server down) after
>> 10-15 secs (MAX) on each mount attempt.
>>
>> I already make a lab and tried a LOT of combinations, there my findings
>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>
>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>> NFS server '172.16.0.10' failed: timed out (giving up))

Sounds like you're hitting the server's grace period.
You can try hacking it in /etc/init.d/nfs
by adding
echo 15 > /proc/fs/nfsd/nfsv4leasetime

before
daemon rpc.nfsd $RPCNFSDARGS $RPCNFSDCOUNT

Then, look at your /var/log/messages file for
NFSD: starting 15-second grace period

That said, I'm not sure why it can't be passed as a command line
option to the nfsd daemon and controlled via RPCNFSDARGS
in /etc/sysconfig/nfs

Benny

>>
>> Mounting manually using NFSv4 i got same timeouts of AutoFS.
>>
>> The only way to get a lower timeout value is using only proto=udp,retry=0
>> (not using sec=krb5) any another combination i get 3m9s (sec=krb5,proto=tcp
>>
>>
>> I tried change another NFS mount options putting a lower value (timeo,
>> retrans, retry), and they only change something then i use NFSv3 with
>> proto=udp. But i want NFSv4/TCP (coz Kerberos) and a timeout lower then 15
>> secs... :(
>>
>> I'm using these packages (server and client side):
>> autofs-5.0.1-0.rc2.102.el5_3.1
>> nfs-utils-1.0.9-40.el5
>> kernel-2.6.18-128.1.16.el5
>>
>> The only way to resolve this behavior is changing the source code? There's
>> no way to lower timeout with NFSv4/TCP in this case ?
>>
>> Thanks.
>>
>>
> _______________________________________________
> NFSv4 mailing list
> [email protected]
> http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4


2009-08-12 14:27:18

by Ian Kent

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Carlos André wrote:
> Chuck,
> Since we'll use some NFS servers, and (some) they are not critical to
> user work (if they're down, user can just sit and wait for them
> working on something else like a webapp, printing, etc...) and if we
> lose a server or a router (worst case) acumulative timeouts on
> workstation login-boot process will make users try kill us (lol). A
> acceptable timeout(first mount fail timeout) per automount try is
> something around 10-15 (max) seconds. For us the best is a option to
> permit us to do adjustments (1 to X secs), and putting this like a new
> feature/non-default option will not mess with another users.... I dont
> wanna make a mess on sources by myself just to modify automounter for
> our needs, I just want the "right" solution in our case...

But that also begs the question.
What version of autofs is in use?

The latest versions have the ability to specify a time to wait before
killing the mount process but I think that itself is an option that
hasn't had the bugs worked out yet because the RPC layer seems to want
to refuse to give up even when we kill the process.

Perhaps we can work on this further, depending on your version
constraints, of course.
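
A rough sketch of the "wait a while, then kill the mount process" idea,
written in Python rather than in autofs's own C. The helper name
run_with_deadline and the 15 second figure are illustrative assumptions,
not autofs options, and killing the process does not guarantee that the
kernel RPC layer gives up on the connection.

```python
import subprocess

def run_with_deadline(cmd, seconds):
    """Run cmd; return its exit status, or None if it was killed at the deadline."""
    try:
        # subprocess.run() kills the child and reaps it when the timeout expires.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None

# Hypothetical use (server address and paths are placeholders):
# status = run_with_deadline(
#     ["mount", "-t", "nfs4", "-o", "proto=tcp,retry=0",
#      "1.2.3.4:/areas/testdown", "/misc/areas/testdown"], 15)
# status is None when the deadline was hit, else mount's exit status.
```

If the child is stuck in an uninterruptible kernel wait, the kill itself may
not take effect promptly, which is the open issue described above.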

>
> Thanks for help.
>
> 2009/8/11 Chuck Lever <[email protected]>:
>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>> This long timeout is good if workstation need mount a critical
>>> directory using /etc/fstab on boot (for example)..
>>> But in my case, using this loooong timeout doesnt make any sense,
>>> since autofs retry mount directory on-access. This in fact gives me
>>> alot of headaches, coz user login 'll just hangs if one server goes
>>> down for any reason, and will again hangs if user try access directory
>>> pointing to a NFS down server...
>> "retry=0" means the mount command will fail as soon as the first mount(2)
>> system call fails. When you set SYN retries to 1, this means after 9
>> seconds, the connect fails, and that causes the mount(2) system call to
>> fail.
>>
>> Recent conversations with Ian suggested that a long timeout was desired for
>> automounter as well as other cases. Ian, is there something else we need to
>> consider to determine the correct retry timeout for NFS/TCP mount points
>> handled via automounter? How should mount.nfs wait so we don't make other
>> use cases worse? (Looks like most of the history is intact below).
>>
>> How long do you think is appropriate for the automounter to wait if the
>> server is down, in your case, Carlos?
>>
>>> Am losing something or there have was something weirdo...!?
>>> ------------------------------------------------
>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.000s
>>> user 0m0.002s
>>> sys 0m0.001s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.000s
>>> user 0m0.000s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.001s
>>> user 0m0.000s
>>> sys 0m0.003s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.001s
>>> user 0m0.002s
>>> sys 0m0.001s
>>>
>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 1m3.002s
>>> user 0m0.000s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 2m6.000s
>>> user 0m0.000s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 0m9.003s
>>> user 0m0.001s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 2m6.001s
>>> user 0m0.001s
>>> sys 0m0.002s
>>> [root@KSTATION ~]#
>>> ------------------------------------------------
>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>> using retry=0 without kerberos I got only 9s...
>>>
>>> *sigh*
>>>
>>>
>>>
>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>> Right. Normally the RPC client calls the kernel's socket connect
>>>> function,
>>>> which does 6 SYN retries. That one call usually takes longer than the
>>>> RPC
>>>> client's connect timeout, so it only makes one connect call, and then
>>>> fails.
>>>>
>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>> client
>>>> to retry the connect call until its connect timeout expires. Each
>>>> connect
>>>> call resets the SYN timeout to 3 seconds.
>>>>
>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real 3m9.000s
>>>>> user 0m0.000s
>>>>> sys 0m0.002s
>>>>>
>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real 2m6.004s
>>>>> user 0m0.000s
>>>>> sys 0m0.004s
>>>>>
>>>>> (3,6,3,6... secs interval)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>
>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>> 2049...
>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>
>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>> Bruce, no... you're right. I'm describing a situation where my
>>>>>>>> server
>>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>>> and 9 seconds...
>>>>>>> The 189 second timeout is likely how long it takes the kernel to give
>>>>>>> up
>>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>>> exponential retries, or something like that). For stock CentOS 5.3, I
>>>>>>> think
>>>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>>>>> kernel
>>>>>>> just
>>>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>>>> request.
>>>>>>>
>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>> components
>>>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>>>
>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>> Anyone ?
>>>>>>>>>>>
>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>>> Kerberos
>>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>>>>>>>>> LOOOOOOONG
>>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>
>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>>>> mount
>>>>>>>>>>>> hangs,
>>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server
>>>>>>>>>>>> down)
>>>>>>>>>>>> after
>>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>
>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>>>> findings
>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>>>>>>>>> basic
>>>>>>>>>>>> command
>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>
>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>>>> proto=udp)
>>>>>>>>>>>> it
>>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error
>>>>>>>>>>>> (mount:
>>>>>>>>>>>> mount to
>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>> the
>>>>>>>>> mount fail faster. But I may be misunderstanding.
>>>>>>>>>
>>>>>>>>> --b.
>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
>>>>>>>> in
>>>>>>>> the body of a message to [email protected]
>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>> --
>>>> Chuck Lever
>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>>>
>>>>
>>>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>


2009-08-12 14:13:54

by Ian Kent

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Chuck Lever wrote:
> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>> This long timeout is good if workstation need mount a critical
>> directory using /etc/fstab on boot (for example)..
>> But in my case, using this loooong timeout doesnt make any sense,
>> since autofs retry mount directory on-access. This in fact gives me
>> alot of headaches, coz user login 'll just hangs if one server goes
>> down for any reason, and will again hangs if user try access directory
>> pointing to a NFS down server...
>
> "retry=0" means the mount command will fail as soon as the first
> mount(2) system call fails. When you set SYN retries to 1, this means
> after 9 seconds, the connect fails, and that causes the mount(2) system
> call to fail.
>
> Recent conversations with Ian suggested that a long timeout was desired
> for automounter as well as other cases. Ian, is there something else we
> need to consider to determine the correct retry timeout for NFS/TCP
> mount points handled via automounter? How should mount.nfs wait so we
> don't make other use cases worse? (Looks like most of the history is
> intact below).

Of course we know that autofs is entirely at the mercy of mount(8) (and
mount.nfs in particular). This has always been a difficult situation for
the automounter because interactive mount invocations should wait. But I
believe automount mounts should always time out quickly, but that leads
to its own set of problems, especially when home directories are concerned.

I think adding "retry=0" is the right thing to do myself but I'm not
certain that will work as we expect. I'll have to do some experimentation.
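
The timings quoted below are consistent with simple SYN backoff
arithmetic. A small sketch in Python, assuming an initial SYN timeout of
3 seconds that doubles on each retransmission (this matches the 3, 6, 12,
24, 48, 96 second intervals reported from tcpdump). The function name is
illustrative only, and other kernels may use a different initial timeout.

```python
def connect_give_up_seconds(syn_retries, initial_timeout=3):
    """Seconds until one connect() attempt fails against a silent server.

    The initial SYN plus syn_retries retransmissions each wait twice as
    long as the previous one (assumed 3 s initial timeout).
    """
    return sum(initial_timeout * 2 ** i for i in range(syn_retries + 1))

print(connect_give_up_seconds(5))       # 3+6+12+24+48+96 = 189 s (3m9s)
print(connect_give_up_seconds(1))       # 3+6 = 9 s
print(14 * connect_give_up_seconds(1))  # 14 attempts of 9 s = 126 s (2m6s)
```

The last line corresponds to the 13 "retrying" messages plus the final
"giving up" seen with tcp_syn_retries set to 1.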

>
> How long do you think is appropriate for the automounter to wait if the
> server is down, in your case, Carlos?
>
>> Am losing something or there have was something weirdo...!?
>> ------------------------------------------------
>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 3m9.000s
>> user 0m0.002s
>> sys 0m0.001s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> sec=krb5p,proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 3m9.000s
>> user 0m0.000s
>> sys 0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 3m9.001s
>> user 0m0.000s
>> sys 0m0.003s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> sec=krb5p,proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 3m9.001s
>> user 0m0.002s
>> sys 0m0.001s
>>
>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 1m3.002s
>> user 0m0.000s
>> sys 0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> sec=krb5p,proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 2m6.000s
>> user 0m0.000s
>> sys 0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 0m9.003s
>> user 0m0.001s
>> sys 0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> sec=krb5p,proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 2m6.001s
>> user 0m0.001s
>> sys 0m0.002s
>> [root@KSTATION ~]#
>> ------------------------------------------------
>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>> using retry=0 without kerberos I got only 9s...
>>
>> *sigh*
>>
>>
>>
>> 2009/8/10 Chuck Lever <[email protected]>:
>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>
>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>
>>> Right. Normally the RPC client calls the kernel's socket connect
>>> function,
>>> which does 6 SYN retries. That one call usually takes longer than
>>> the RPC
>>> client's connect timeout, so it only makes one connect call, and then
>>> fails.
>>>
>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>> client
>>> to retry the connect call until its connect timeout expires. Each
>>> connect
>>> call resets the SYN timeout to 3 seconds.
>>>
>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>> sec=krb5p,proto=tcp
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>
>>>> real 3m9.000s
>>>> user 0m0.000s
>>>> sys 0m0.002s
>>>>
>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>
>>>> real 2m6.004s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> (3,6,3,6... secs interval)
>>>>
>>>>
>>>>
>>>>
>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>
>>>>> No, i'm just using packages from CentOS repo...
>>>>>
>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>> 2049...
>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>
>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>
>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>
>>>>>>> Bruce, no... you're right. I'm describing a situation where my
>>>>>>> server
>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>> and 9 seconds...
>>>>>>
>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>> give up
>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>> exponential retries, or something like that). For stock CentOS
>>>>>> 5.3, I
>>>>>> think
>>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>>>> kernel
>>>>>> just
>>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>>> request.
>>>>>>
>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>> components
>>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>>
>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>
>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>
>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Anyone ?
>>>>>>>>>>
>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>
>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>> Kerberos
>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>>>>>>>> LOOOOOOONG
>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>
>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>>> mount
>>>>>>>>>>> hangs,
>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server
>>>>>>>>>>> down)
>>>>>>>>>>> after
>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>
>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>>> findings
>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>>>>>>>> basic
>>>>>>>>>>> command
>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>
>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>>> proto=udp)
>>>>>>>>>>> it
>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error
>>>>>>>>>>> (mount:
>>>>>>>>>>> mount to
>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>
>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>
>>>>>>>> I thought he was describing a situation where the server
>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>> the
>>>>>>>> mount fail faster. But I may be misunderstanding.
>>>>>>>>
>>>>>>>> --b.
>>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>> linux-nfs" in
>>>>>>> the body of a message to [email protected]
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>>
>>>
>>>
>>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>


2009-08-13 15:18:55

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Filed bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=517349

Thanks!

2009/8/13 Carlos André <[email protected]>:
> 2009/8/13 Ian Kent <[email protected]>:
>> Carlos André wrote:
>>> Today (2009-08-12) I'm using:
>>> kernel-2.6.18-128.2.1.el5
>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>
>> Thanks,
>>
>> My mistake, the wait time I was referring to is used for umounts during
>> expires and is present in rev rc2.102.
>>
>> It shouldn't be hard to add this for mount as well.
>> Would you like me to put something together?
>
> Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
>
>>
>> Probably would be good to test something out to see if we can make a
>> difference with the killing mount after some configured timeout but, if
>> we make progress, probably the best way to deal with it is for you to
>> log a bug against rhel-5 so I can get it committed to the rhel package.
>> The possible issue is that I'm not sure if the RPC subsystem in the
>> above rhel kernel will respond well to process death with potential
>> outstanding requests. But we'll see.
>
> Ok, on my way :)
>
> Thanks a lot!
>
>>
>>>
>>>
>>> Look my last test:
>>> --------------------------------------------------------------
>>> [root@KSTATION areas]# time ls testdown
>>> ls: testdown: No such file or directory
>>>
>>> real    3m9.025s
>>> user    0m0.000s
>>> sys     0m0.002s
>>>
>>>
>>>
>>>
>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>> mounting root /misc/areas, mountpoint testdown, what
>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>> acl,sec=krb5p,proto=tcp,retry=0
>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>> calling mkdir_path /misc/areas/testdown
>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>> 3078093712 path /misc
>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>> submounts remaining in /misc
>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>>> 3078093712 path /misc stat 3
>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>>> exp 3078093712 finished, switching from 2 to 1
>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
>>> = 2 path /misc
>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>> 3078093712 path /misc
>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>> submounts remaining in /misc
>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>>> 3078093712 path /misc stat 3
>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>>> exp 3078093712 finished, switching from 2 to 1
>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
>>> = 2 path /misc
>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>> server '1.2.3.4' failed: timed out (giving up).
>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
>>> --------------------------------------------------------------
>>>
>>> 2009/8/12 Ian Kent <[email protected]>:
>>>> Carlos André wrote:
>>>>> Hi Ian,
>>>>> I'm getting crazy trying put "retry=" to work on mount... this option
>>>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
>>>>> like you can see on my previous emails...
>>>> Right, my mistake for not looking closely enough at post.
>>>>
>>>> Maybe this is related to the same sort of problem we had with mount in
>>>> the past, before the options parsing went into the kernel, where other
>>>> services, like portmapper (or rpcbind), were being done with different
>>>> timeout parameters before the RPC calls for mounting. That's just an
>>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>>>
>>>> But what version of autofs and kernel did you say you were using?
>>>>
>>>>> I appreciate any help.
>>>>>
>>>>> Carlos.
>>>>>
>>>>>
>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>> Chuck Lever wrote:
>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>> This long timeout is good if workstation need mount a critical
>>>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>>>> But in my case, using this loooong timeout doesnt make any sense,
>>>>>>>> since autofs retry mount directory on-access. This in fact gives me
>>>>>>>> alot of headaches, coz user login 'll just hangs if one server goes
>>>>>>>> down for any reason, and will again hangs if user try access directory
>>>>>>>> pointing to a NFS down server...
>>>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>>>> mount(2) system call fails. When you set SYN retries to 1, this means
>>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>>>>> call to fail.
>>>>>>>
>>>>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>>>>> for automounter as well as other cases. Ian, is there something else we
>>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>>>>> mount points handled via automounter? How should mount.nfs wait so we
>>>>>>> don't make other use cases worse? (Looks like most of the history is
>>>>>>> intact below).
>>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>>>>> mount.nfs in particular). This has always been a difficult situation for
>>>>>> the automounter because interactive mount invocations should wait. But I
>>>>>> believe automount mounts should always time out quickly, but that leads
>>>>>> to its own set of problems, especially when home directories are concerned.
>>>>>>
>>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>>>> certain that will work as we expect. I'll have to do some experimentation.
>>>>>>
>>>>>>> How long do you think is appropriate for the automounter to wait if the
>>>>>>> server is down, in your case, Carlos?
>>>>>>>
>>>>>>>> Am losing something or there have was something weirdo...!?
>>>>>>>> ------------------------------------------------
>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> proto=tcp,retry=1
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    3m9.000s
>>>>>>>> user    0m0.002s
>>>>>>>> sys     0m0.001s
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    3m9.000s
>>>>>>>> user    0m0.000s
>>>>>>>> sys     0m0.002s
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> proto=tcp,retry=0
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    3m9.001s
>>>>>>>> user    0m0.000s
>>>>>>>> sys     0m0.003s
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    3m9.001s
>>>>>>>> user    0m0.002s
>>>>>>>> sys     0m0.001s
>>>>>>>>
>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> proto=tcp,retry=1
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    1m3.002s
>>>>>>>> user    0m0.000s
>>>>>>>> sys     0m0.002s
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    2m6.000s
>>>>>>>> user    0m0.000s
>>>>>>>> sys     0m0.002s
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> proto=tcp,retry=0
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real    0m9.003s
>>>>>>>> user    0m0.001s
>>>>>>>> sys     0m0.002s
>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up)=
.
>>>>>>>>
>>>>>>>> real =A0 =A02m6.001s
>>>>>>>> user =A0 =A00m0.001s
>>>>>>>> sys =A0 =A0 0m0.002s
>>>>>>>> [root@KSTATION ~]#
>>>>>>>> ------------------------------------------------
>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... a=
nd
>>>>>>>> using retry=3D0 without kerberos I got only 9s...
>>>>>>>>
>>>>>>>> *sigh*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>> Right. Normally the RPC client calls the kernel's socket connect
>>>>>>>>> function, which does 6 SYN retries. That one call usually takes
>>>>>>>>> longer than the RPC client's connect timeout, so it only makes one
>>>>>>>>> connect call, and then fails.
>>>>>>>>>
>>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>>>>> client to retry the connect call until its connect timeout expires.
>>>>>>>>> Each connect call resets the SYN timeout to 3 seconds.
>>>>>>>>>
>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>> sec=krb5p,proto=tcp
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real    3m9.000s
>>>>>>>>>> user    0m0.000s
>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>
>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>>>> sec=krb5p,proto=tcp  ("retry=1" = no change)
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real    2m6.004s
>>>>>>>>>> user    0m0.000s
>>>>>>>>>> sys     0m0.004s
>>>>>>>>>>
>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>>>
>>>>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>>>>> 2049...
>>>>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>>>>
>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation where my
>>>>>>>>>>>>> server died... i need mount fail faster (10 or 15 secs max) than
>>>>>>>>>>>>> 3 minutes and 9 seconds...
>>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>>>>>>>> give up trying to connect a TCP socket to the server (6 SYN attempts
>>>>>>>>>>>> with exponential retries, or something like that). For stock CentOS
>>>>>>>>>>>> 5.3, I think user space does only a DNS lookup for normal NFSv4
>>>>>>>>>>>> mounts -- the kernel just tries to connect a TCP socket to port
>>>>>>>>>>>> 2049, with no preceding rpcbind request.
>>>>>>>>>>>>
>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>>>>>>>> Kerberos and AutoFS, but i got a problem: If NFS server goes down
>>>>>>>>>>>>>>>>> i get a LOOOOOOONG mount timeout on CentOS 5.3 (updated) NFSv4
>>>>>>>>>>>>>>>>> client...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>>>>>>>>> mount hangs, user logon hangs. Then i want configure it to timeout
>>>>>>>>>>>>>>>>> (if server down) after 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>>>>>>>>> findings (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10)
>>>>>>>>>>>>>>>>> using basic command
>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>>>>>>>>> proto=udp) it hangs for 189 secs (3m9s: real 3m9.001s) until show
>>>>>>>>>>>>>>>>> error (mount: mount to NFS server '172.16.0.10' failed: timed out
>>>>>>>>>>>>>>>>> (giving up))
>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>> I thought he was describing a situation where the server the server
>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>>>>> linux-nfs" in
>>>>>>>>>>>>> the body of a message to [email protected]
>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>> --
>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Chuck Lever
>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>
>>
>
_______________________________________________
NFSv4 mailing list
[email protected]
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

2009-08-10 19:18:11

by Chuck Lever III

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
> Bruce, no... you're right. I'm describing a situation where my server
> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
> and 9 seconds...

The 189 second timeout is likely how long it takes the kernel to give
up trying to connect a TCP socket to the server (6 SYN attempts with
exponential retries, or something like that). For stock CentOS 5.3, I
think user space does only a DNS lookup for normal NFSv4 mounts -- the
kernel just tries to connect a TCP socket to port 2049, with no
preceding rpcbind request.
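
As a rough sanity check on that figure (an illustration, not output from any tool): if the first SYN waits 3 seconds and every unanswered SYN doubles the wait, then tcp_syn_retries=5 gives 3+6+12+24+48+96 = 189 seconds. The 3-second starting value and the doubling are assumptions based on the intervals reported in this thread, and syn_connect_seconds is a made-up helper name.

```sh
#!/bin/sh
# Sum the waits for one connect attempt: the first SYN plus `retries`
# retransmits. Assumes a 3-second initial timeout that doubles after
# each unanswered SYN (an assumption, see the note above).
syn_connect_seconds() {
    retries=$1
    rto=3
    total=0
    i=0
    while [ "$i" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        i=$((i + 1))
    done
    echo "$total"
}

syn_connect_seconds 5   # prints 189 (default tcp_syn_retries)
syn_connect_seconds 1   # prints 9
```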

Carlos, let us know if you have replaced any NFS-related CentOS
components (kernel, nfs-utils) with something you've built yourself.

> 2009/8/7 J. Bruce Fields <[email protected]>:
>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>> wrote:
>>>> Anyone ?
>>>>
>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>> Kerberos
>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>> LOOOOOOONG
>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>
>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>> mount hangs,
>>>>> user logon hangs. Then i want configure it to timeout (if server
>>>>> down) after
>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>
>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>> findings
>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>> basic command
>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>
>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>> proto=udp) it
>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error
>>>>> (mount: mount to
>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>
>>> Sounds like you're hitting the server's grace period.
>>
>> I thought he was describing a situation where the server the server
>> is completely gone and isn't coming back, and wondering how to make
>> the
>> mount fail faster. But I may be misunderstanding.
>>
>> --b.
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2009-08-10 20:35:22

by Chuck Lever III

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
> Something funny: Using default tcp_syn_retries (5) i got
> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
> 1 i got "3,6,3,6,3,6..." secs interval...

Right. Normally the RPC client calls the kernel's socket connect
function, which does 6 SYN retries. That one call usually takes
longer than the RPC client's connect timeout, so it only makes one
connect call, and then fails.

Reducing the number of SYN retries per connect attempt causes the RPC
client to retry the connect call until its connect timeout expires.
Each connect call resets the SYN timeout to 3 seconds.
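
A small sketch of the two patterns, assuming a 3-second initial SYN timeout that doubles per retry and is reset by each new connect call. syn_intervals is a hypothetical helper; the number of connect calls is passed in by hand because the RPC client's actual connect timeout is not stated here.

```sh
#!/bin/sh
# Print the SYN wait intervals for `connects` back-to-back connect calls,
# each doing the first SYN plus `retries` retransmits. The 3-second start
# and the doubling are assumptions taken from the observed intervals.
syn_intervals() {
    retries=$1
    connects=$2
    out=""
    c=0
    while [ "$c" -lt "$connects" ]; do
        rto=3          # every new connect call starts again at 3 seconds
        i=0
        while [ "$i" -le "$retries" ]; do
            out="$out$rto "
            rto=$((rto * 2))
            i=$((i + 1))
        done
        c=$((c + 1))
    done
    echo "${out% }"    # drop the trailing space
}

syn_intervals 5 1   # prints: 3 6 12 24 48 96
syn_intervals 1 3   # prints: 3 6 3 6 3 6
```

With tcp_syn_retries=1 each connect call costs 3+6 = 9 seconds, so the 14 "timed out" messages in the transcript below would account for 14 x 9 = 126 seconds, which matches the 2m6s total.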

> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
> sec=krb5p,proto=tcp
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 3m9.000s
> user 0m0.000s
> sys 0m0.002s
>
> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
> sec=krb5p,proto=tcp ("retry=1" = no change)
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 2m6.004s
> user 0m0.000s
> sys 0m0.004s
>
> (3,6,3,6... secs interval)
>
>
>
>
> 2009/8/10 Carlos André <[email protected]>:
>> No, i'm just using packages from CentOS repo...
>>
>> And u're right about expo retries... with tcpdump i've monitored
>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>> 2049...
>> I tried use "retry=1" option on mount without any change... I dont
>> want change source or tcp timers... just NFSv4 client.
>>
>> 2009/8/10 Chuck Lever <[email protected]>:
>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>
>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>> and 9 seconds...
>>>
>>> The 189 second timeout is likely how long it takes the kernel to give
>>> up trying to connect a TCP socket to the server (6 SYN attempts with
>>> exponential retries, or something like that). For stock CentOS 5.3, I
>>> think user space does only a DNS lookup for normal NFSv4 mounts -- the
>>> kernel just tries to connect a TCP socket to port 2049, with no
>>> preceding rpcbind request.
>>>
>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>> components (kernel, nfs-utils) with something you've built yourself.
>>>
>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>
>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>
>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Anyone ?
>>>>>>>
>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>
>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>> Kerberos and AutoFS, but i got a problem: If NFS server goes down
>>>>>>>> i get a LOOOOOOONG mount timeout on CentOS 5.3 (updated) NFSv4
>>>>>>>> client...
>>>>>>>>
>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>> mount hangs, user logon hangs. Then i want configure it to timeout
>>>>>>>> (if server down) after 10-15 secs (MAX) on each mount attempt.
>>>>>>>>
>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>> findings (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10)
>>>>>>>> using basic command
>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>
>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>> proto=udp) it hangs for 189 secs (3m9s: real 3m9.001s) until show
>>>>>>>> error (mount: mount to NFS server '172.16.0.10' failed: timed out
>>>>>>>> (giving up))
>>>>>>
>>>>>> Sounds like you're hitting the server's grace period.
>>>>>
>>>>> I thought he was describing a situation where the server the server
>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>
>>>>> --b.
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>>
>>>
>>>
>>>
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2009-08-18 13:19:03

by Chuck Lever III

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 17, 2009, at 8:30 PM, Ian Kent wrote:
> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>> Filled bug report:
>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>
> Hi Carlos,
>
> I have a patched source rpm to add a mount wait parameter to autofs
> located at:
> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>
> Could you build it and see if it works?
> I haven't tested it at all but it is fairly straightforward.

For NFSv2/v3, we have an rpcbind query followed by a MNT request, on
separate transports that the kernel can set up with fast timeouts
(when the underlying transport implementations eventually support this).

It could be more challenging for NFSv4 because the initial connection
is done on the same NFS transport that will be used for normal
operation. The connection timeout parameters can't easily be changed
from giving up quickly to normal, at least with the current RPC
infrastructure. Perhaps the first few NFSv4 operations could be done
on a separate transport.

> It is still unclear if this is the right way to do this and what the
> consequences are in sending a term signal to mount. This mount request
> will likely be followed by other requests for the same mount causing
> an
> accumulation of mount(8) processes waiting for RPC timeouts before
> they
> can answer the TERM signal.

One concern is that this would leave a number of outgoing privileged
ports in use (until the mount attempts time out, then after another
two minutes the port leaves TIME_WAIT). Clearly we need an in-kernel
solution to canceling mount requests without leaving a bunch of
resources tied up. Probably the best approach, in general terms, is
to tell mount(2) how long to wait, then let the kernel abort it as
needed.
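
For comparison, the user-space approach in the patch below (wait a fixed time for mount(8), then send it SIGTERM) looks roughly like this in shell. This is only an analogue for experimenting, not the autofs code, and run_with_wait is a hypothetical name.

```sh
#!/bin/sh
# run_with_wait SECONDS COMMAND [ARGS...]
# Start COMMAND in the background; if it has not exited after SECONDS,
# send it SIGTERM. Returns COMMAND's exit status (above 128 if signalled).
run_with_wait() {
    limit=$1
    shift
    "$@" &
    child=$!
    # Watchdog subshell: sleep for the limit, then signal the child.
    ( sleep "$limit"; kill -TERM "$child" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    status=0
    wait "$child" || status=$?
    # The command has finished or was signalled; stop the watchdog if pending.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}

# Example (not run here):
#   run_with_wait 15 mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp
```

As the man page text in the patch notes, the signalled process may still not exit until the RPC layer times out, which is part of why an in-kernel limit looks preferable.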

> Anyway, for information the patch included in the source rpm above is:
>
> autofs-5.0.4 - add mount wait parameter
>
> From: Ian Kent <[email protected]>
>
> Often delays when trying to mount from a server that is not responding
> for some reason are undesirable. To try and prevent these delays we
> provide a configuration setting to limit the time that we wait for
> our spawned mount(8) process to complete before sending it a SIGTERM
> signal. This patch adds a configuration parameter to allow us to
> request we limit the time we wait for mount(8) to complete before
> sending it a TERM signal.
> ---
>
> daemon/spawn.c | 3 ++-
> include/defaults.h | 2 ++
> lib/defaults.c | 13 +++++++++++++
> man/auto.master.5.in | 7 +++++++
> redhat/autofs.sysconfig.in | 9 +++++++++
> samples/autofs.conf.default.in | 9 +++++++++
> 6 files changed, 42 insertions(+), 1 deletion(-)
>
>
> --- autofs-5.0.1.orig/daemon/spawn.c
> +++ autofs-5.0.1/daemon/spawn.c
> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
> unsigned int options;
> unsigned int retries = MTAB_LOCK_RETRIES;
> int update_mtab = 1, ret, printed = 0;
> + unsigned int wait = defaults_get_mount_wait();
> char buf[PATH_MAX];
>
> /* If we use mount locking we can't validate the location */
> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
> va_end(arg);
>
> while (retries--) {
> - ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
> + ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
> if (ret & MTAB_NOTUPDATED) {
> struct timespec tm = {3, 0};
>
> --- autofs-5.0.1.orig/include/defaults.h
> +++ autofs-5.0.1/include/defaults.h
> @@ -24,6 +24,7 @@
>
> #define DEFAULT_TIMEOUT 600
> #define DEFAULT_NEGATIVE_TIMEOUT 60
> +#define DEFAULT_MOUNT_WAIT -1
> #define DEFAULT_UMOUNT_WAIT 12
> #define DEFAULT_BROWSE_MODE 1
> #define DEFAULT_LOGGING 0
> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
> struct ldap_searchdn *defaults_get_searchdns(void);
> void defaults_free_searchdns(struct ldap_searchdn *);
> unsigned int defaults_get_append_options(void);
> +unsigned int defaults_get_mount_wait(void);
> unsigned int defaults_get_umount_wait(void);
> const char *defaults_get_auth_conf_file(void);
> unsigned int defaults_get_map_hash_table_size(void);
> --- autofs-5.0.1.orig/lib/defaults.c
> +++ autofs-5.0.1/lib/defaults.c
> @@ -45,6 +45,7 @@
> #define ENV_NAME_VALUE_ATTR "VALUE_ATTRIBUTE"
>
> #define ENV_APPEND_OPTIONS "APPEND_OPTIONS"
> +#define ENV_MOUNT_WAIT "MOUNT_WAIT"
> #define ENV_UMOUNT_WAIT "UMOUNT_WAIT"
> #define ENV_AUTH_CONF_FILE "AUTH_CONF_FILE"
>
> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
> check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value,
> to_syslog) ||
> check_set_config_value(key, ENV_NAME_VALUE_ATTR, value,
> to_syslog) ||
> check_set_config_value(key, ENV_APPEND_OPTIONS, value,
> to_syslog) ||
> + check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog)
> ||
> check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog)
> ||
> check_set_config_value(key, ENV_AUTH_CONF_FILE, value,
> to_syslog) ||
> check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value,
> to_syslog))
> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
> return res;
> }
>
> +unsigned int defaults_get_mount_wait(void)
> +{
> + long wait;
> +
> + wait = get_env_number(ENV_MOUNT_WAIT);
> + if (wait < 0)
> + wait = DEFAULT_MOUNT_WAIT;
> +
> + return (unsigned int) wait;
> +}
> +
> unsigned int defaults_get_umount_wait(void)
> {
> long wait;
> --- autofs-5.0.1.orig/man/auto.master.5.in
> +++ autofs-5.0.1/man/auto.master.5.in
> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
> 60). If the equivalent command line option is given it will override
> this
> setting.
> .TP
> +.B MOUNT_WAIT
> +Set the default time to wait for a response from a spawned mount(8)
> +before sending it a SIGTERM. Note that we still need to wait for the
> +RPC layer to timeout before the sub-process exits so this isn't ideal
> +but it is the best we can do. The default is to wait until mount(8)
> +returns without intervention.
> +.TP
> .B UMOUNT_WAIT
> Set the default time to wait for a response from a spawned umount(8)
> before sending it a SIGTERM. Note that we still need to wait for the
> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
> @@ -14,6 +14,15 @@ TIMEOUT=300
> #
> #NEGATIVE_TIMEOUT=60
> #
> +# MOUNT_WAIT - time to wait for a response from mount(8).
> +# Setting this timeout can cause problems when
> +# mount would otherwise wait for a server that
> +# is temporarily unavailable, such as when it's
> +# restarting. The default of waiting for mount(8)
> +# usually results in a wait of around 3 minutes.
> +#
> +#MOUNT_WAIT=-1
> +#
> # UMOUNT_WAIT - time to wait for a response from umount(8).
> #
> #UMOUNT_WAIT=12
> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
> +++ autofs-5.0.1/samples/autofs.conf.default.in
> @@ -14,6 +14,15 @@ TIMEOUT=300
> #
> #NEGATIVE_TIMEOUT=60
> #
> +# MOUNT_WAIT - time to wait for a response from mount(8).
> +# Setting this timeout can cause problems when
> +# mount would otherwise wait for a server that
> +# is temporarily unavailable, such as when it's
> +# restarting. The default of waiting for mount(8)
> +# usually results in a wait of around 3 minutes.
> +#
> +#MOUNT_WAIT=-1
> +#
> # UMOUNT_WAIT - time to wait for a response from umount(8).
> #
> #UMOUNT_WAIT=12
>
>
>>
>> Thanks!
>>
>> 2009/8/13 Carlos André <[email protected]>:
>>> 2009/8/13 Ian Kent <[email protected]>:
>>>> Carlos André wrote:
>>>>> Today (2009-08-12) I'm using:
>>>>> kernel-2.6.18-128.2.1.el5
>>>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>>>
>>>> Thanks,
>>>>
>>>> My mistake, the wait time I was referring to is used for umounts
>>>> during
>>>> expires and is present in rev rc2.102.
>>>>
>>>> It shouldn't be hard to add this for mount as well.
>>>> Would you like me to put something together?
>>>
>>> Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
>>>
>>>>
>>>> Probably would be good to test something out to see if we can
>>>> make a
>>>> difference with the killing mount after some configured timeout
>>>> but, if
>>>> we make progress, probably the best way to deal with it is for
>>>> you to
>>>> log a bug against rhel-5 so I can get it committed to the rhel
>>>> package.
>>>> The possible issue is that I'm not sure if the RPC subsystem in the
>>>> above rhel kernel will respond well to process death with potential
>>>> outstanding requests. But we'll see.
>>>
>>> Ok, on my way :)
>>>
>>> Thanks a lot!
>>>
>>>>
>>>>>
>>>>>
>>>>> Look my last test:
>>>>> --------------------------------------------------------------
>>>>> [root@KSTATION areas]# time ls testdown
>>>>> ls: testdown: No such file or directory
>>>>>
>>>>> real 3m9.025s
>>>>> user 0m0.000s
>>>>> sys 0m0.002s
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>>>> mounting root /misc/areas, mountpoint testdown, what
>>>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>>>> acl,sec=krb5p,proto=tcp,retry=0
>>>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount:
>>>>> mount(nfs):
>>>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount:
>>>>> mount(nfs):
>>>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount:
>>>>> mount(nfs):
>>>>> calling mkdir_path /misc/areas/testdown
>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount:
>>>>> mount(nfs):
>>>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1
>>>>> path /misc
>>>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>> 3078093712 path /misc
>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>> submounts remaining in /misc
>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got
>>>>> thid
>>>>> 3078093712 path /misc stat 3
>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup:
>>>>> sigchld:
>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready():
>>>>> state
>>>>> = 2 path /misc
>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1
>>>>> path /misc
>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>> 3078093712 path /misc
>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>> submounts remaining in /misc
>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got
>>>>> thid
>>>>> 3078093712 path /misc stat 3
>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup:
>>>>> sigchld:
>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready():
>>>>> state
>>>>> = 2 path /misc
>>>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>>>> server '1.2.3.4' failed: timed out (giving up).
>>>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/
>>>>> areas/testdown
>>>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1
>>>>> path /misc
>>>>> --------------------------------------------------------------
>>>>>
>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>> Carlos André wrote:
>>>>>>> Hi Ian,
>>>>>>> I'm getting crazy trying put "retry=" to work on mount... this
>>>>>>> option
>>>>>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/
>>>>>>> krb5i/krb5p)
>>>>>>> like you can see on my previous emails...
>>>>>> Right, my mistake for not looking closely enough at post.
>>>>>>
>>>>>> Maybe this is related to the same sort of problem we had with
>>>>>> mount in
>>>>>> the past, before the options parsing went into the kernel,
>>>>>> where other
>>>>>> services, like portmapper (or rpcbind), were being done with
>>>>>> different
>>>>>> timeout parameters before the RPC calls for mounting. That's
>>>>>> just an
>>>>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>>>>>
>>>>>> But what version of autofs and kernel did you say you were using?
>>>>>>
>>>>>>> I appreciate any help.
>>>>>>>
>>>>>>> Carlos.
>>>>>>>
>>>>>>>
>>>>>>> 2009/8/12 Ian Kent <[email protected]>:
>>>>>>>> Chuck Lever wrote:
>>>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>>>> This long timeout is good if workstation need mount a
>>>>>>>>>> critical
>>>>>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>>>>>> But in my case, using this loooong timeout doesnt make any
>>>>>>>>>> sense,
>>>>>>>>>> since autofs retry mount directory on-access. This in fact
>>>>>>>>>> gives me
>>>>>>>>>> alot of headaches, coz user login 'll just hangs if one
>>>>>>>>>> server goes
>>>>>>>>>> down for any reason, and will again hangs if user try
>>>>>>>>>> access directory
>>>>>>>>>> pointing to a NFS down server...
>>>>>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>>>>>> mount(2) system call fails. When you set SYN retries to 1, this means
>>>>>>>>> after 9 seconds, the connect fails, and that causes the mount(2)
>>>>>>>>> system call to fail.
>>>>>>>>>
>>>>>>>>> Recent conversations with Ian suggested that a long timeout was
>>>>>>>>> desired for automounter as well as other cases. Ian, is there
>>>>>>>>> something else we need to consider to determine the correct retry
>>>>>>>>> timeout for NFS/TCP mount points handled via automounter? How should
>>>>>>>>> mount.nfs wait so we don't make other use cases worse? (Looks like
>>>>>>>>> most of the history is intact below).
>>>>>>>> Of course we know that autofs is entirely at the mercy of mount(8)
>>>>>>>> (and mount.nfs in particular). This has always been a difficult
>>>>>>>> situation for the automounter because interactive mount invocations
>>>>>>>> should wait. But I believe automount mounts should always time out
>>>>>>>> quickly, but that leads to its own set of problems, especially when
>>>>>>>> home directories are concerned.
>>>>>>>>
>>>>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>>>>>> certain that will work as we expect. I'll have to do some
>>>>>>>> experimentation.
>>>>>>>>
>>>>>>>>> How long do you think is appropriate for the automounter to wait if
>>>>>>>>> the server is down, in your case, Carlos?
>>>>>>>>>
>>>>>>>>>> Am losing something or there have was something weirdo...!?
>>>>>>>>>> ------------------------------------------------
>>>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 3m9.000s
>>>>>>>>>> user 0m0.002s
>>>>>>>>>> sys 0m0.001s
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 3m9.000s
>>>>>>>>>> user 0m0.000s
>>>>>>>>>> sys 0m0.002s
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 3m9.001s
>>>>>>>>>> user 0m0.000s
>>>>>>>>>> sys 0m0.003s
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 3m9.001s
>>>>>>>>>> user 0m0.002s
>>>>>>>>>> sys 0m0.001s
>>>>>>>>>>
>>>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>>>
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 1m3.002s
>>>>>>>>>> user 0m0.000s
>>>>>>>>>> sys 0m0.002s
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 2m6.000s
>>>>>>>>>> user 0m0.000s
>>>>>>>>>> sys 0m0.002s
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 0m9.003s
>>>>>>>>>> user 0m0.001s
>>>>>>>>>> sys 0m0.002s
>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>
>>>>>>>>>> real 2m6.001s
>>>>>>>>>> user 0m0.001s
>>>>>>>>>> sys 0m0.002s
>>>>>>>>>> [root@KSTATION ~]#
>>>>>>>>>> ------------------------------------------------
>>>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>>>>>>> using retry=0 without kerberos I got only 9s...
>>>>>>>>>>
>>>>>>>>>> *sigh*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>>>> Right. Normally the RPC client calls the kernel's socket connect function,
>>>>>>>>>>> which does 6 SYN retries. That one call usually takes longer than the RPC
>>>>>>>>>>> client's connect timeout, so it only makes one connect call, and then fails.
>>>>>>>>>>>
>>>>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC client
>>>>>>>>>>> to retry the connect call until its connect timeout expires. Each connect
>>>>>>>>>>> call resets the SYN timeout to 3 seconds.
>>>>>>>>>>>
>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 3m9.000s
>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>> sys 0m0.002s
>>>>>>>>>>>>
>>>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>
>>>>>>>>>>>> real 2m6.004s
>>>>>>>>>>>> user 0m0.000s
>>>>>>>>>>>> sys 0m0.004s
>>>>>>>>>>>>
>>>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>>>>>
>>>>>>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>>>>>>> 2049...
>>>>>>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>>>>>>>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>>>>>>>>>> and 9 seconds...
>>>>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to give up
>>>>>>>>>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>>>>>>>>>> exponential retries, or something like that). For stock CentOS 5.3, I
>>>>>>>>>>>>>> think user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>>>>>>>>>>>> kernel just tries to connect a TCP socket to port 2049, with no preceding
>>>>>>>>>>>>>> rpcbind request.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
>>>>>>>>>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
>>>>>>>>>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
>>>>>>>>>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server down) after
>>>>>>>>>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my findings
>>>>>>>>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>>>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>>>>>>>>>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>>>>>>>>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> To unsubscribe from this list: send the line
>>>>>>>>>>>>>>> "unsubscribe
>>>>>>>>>>>>>>> linux-nfs" in
>>>>>>>>>>>>>>> the body of a message to [email protected]
>>>>>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Chuck Lever
>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Chuck Lever
>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2009-08-11 12:41:24

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

This long timeout is fine if a workstation needs to mount a critical
directory from /etc/fstab at boot (for example).
But in my case this long timeout doesn't make any sense, since autofs
retries the mount on access. It in fact gives me a lot of headaches,
because user login will just hang if one server goes down for any
reason, and will hang again if the user tries to access a directory
pointing to a down NFS server...


Am I missing something, or is something weird going on here...!?
------------------------------------------------
[root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries   # [DEFAULT]
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 3m9.000s
user 0m0.002s
sys 0m0.001s
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 3m9.000s
user 0m0.000s
sys 0m0.002s
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 3m9.001s
user 0m0.000s
sys 0m0.003s
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 3m9.001s
user 0m0.002s
sys 0m0.001s

[root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries   # [ 5 to 1 ]

[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 1m3.002s
user 0m0.000s
sys 0m0.002s
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 2m6.000s
user 0m0.000s
sys 0m0.002s
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 0m9.003s
user 0m0.001s
sys 0m0.002s
[root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 2m6.001s
user 0m0.001s
sys 0m0.002s
[root@KSTATION ~]#
------------------------------------------------
The max timeout drops to 2m6s when changing tcp_syn_retries from 5 to 1... and
with retry=0 and no Kerberos I get only 9s...

*sigh*
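
The timings from the tcp_syn_retries=1 runs above are consistent with
every failed connect attempt costing about 9 seconds. A rough sketch of
that arithmetic, assuming each "(retrying)" line and the final
"(giving up)" line correspond to one 9-second connect attempt (the
helper name mount_elapsed is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: elapsed seconds for one mount command, assuming
# each "(retrying)" line plus the final "(giving up)" line costs one
# failed connect attempt of per_attempt seconds.
mount_elapsed() {
    retrying_lines=$1     # number of "(retrying)" messages in the log
    per_attempt=$2        # seconds per failed connect (9 observed here)
    echo $(( (retrying_lines + 1) * per_attempt ))
}

mount_elapsed 6 9     # proto=tcp,retry=1: 63 s, the measured 1m3s
mount_elapsed 13 9    # sec=krb5p runs:   126 s, the measured 2m6s
mount_elapsed 0 9     # proto=tcp,retry=0:  9 s, the measured 0m9s
```

So the only open question is why the krb5p mounts keep retrying even
with retry=0; the per-attempt cost itself looks the same in every run.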



2009/8/10 Chuck Lever <[email protected]>:
> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>
>> Something funny: Using default tcp_syn_retries (5) i got
>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>> 1 i got "3,6,3,6,3,6..." secs interval...
>
> Right. Normally the RPC client calls the kernel's socket connect function,
> which does 6 SYN retries. That one call usually takes longer than the RPC
> client's connect timeout, so it only makes one connect call, and then fails.
>
> Reducing the number of SYN retries per connect attempt causes the RPC client
> to retry the connect call until its connect timeout expires. Each connect
> call resets the SYN timeout to 3 seconds.
>
>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 3m9.000s
>> user 0m0.000s
>> sys 0m0.002s
>>
>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp ("retry=1" = no change)
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real 2m6.004s
>> user 0m0.000s
>> sys 0m0.004s
>>
>> (3,6,3,6... secs interval)
>>
>>
>>
>>
>> 2009/8/10 Carlos André <[email protected]>:
>>>
>>> No, i'm just using packages from CentOS repo...
>>>
>>> And u're right about expo retries... with tcpdump i've monitored
>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>> 2049...
>>> I tried use "retry=1" option on mount without any change... I dont
>>> want change source or tcp timers... just NFSv4 client.
>>>
>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>
>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>
>>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>> and 9 seconds...
>>>>
>>>> The 189 second timeout is likely how long it takes the kernel to give up
>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>> exponential retries, or something like that). For stock CentOS 5.3, I
>>>> think user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>> kernel just tries to connect a TCP socket to port 2049, with no preceding
>>>> rpcbind request.
>>>>
>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>
>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>
>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>
>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Anyone ?
>>>>>>>>
>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>
>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>
>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server down) after
>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>
>>>>>>>>> I already make a lab and tried a LOT of combinations, there my findings
>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>
>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>
>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>
>>>>>> I thought he was describing a situation where the server
>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>
>>>>>> --b.
>>>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>>>
>>>>
>>>>
>>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>

2009-08-12 02:37:23

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Chuck,
We'll be using several NFS servers, and some of them are not critical to
user work (if they're down, the user can just wait for them while working
on something else, like a webapp, printing, etc.). But if we lose a server
or a router (worst case), the cumulative timeouts during the workstation
login/boot process will make users try to kill us (lol). An acceptable
timeout (time until the first mount failure) per automount attempt is
something around 10-15 seconds (max). For us the best would be an option
that lets us adjust it (1 to X secs); adding it as a new, non-default
option would not affect other users. I don't want to hack the sources
myself just to modify the automounter for our needs, I just want the
"right" solution for our case...
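
To put numbers on the cumulative effect, here is a small sketch. It
assumes the login process attempts its mounts one after another and that
every one of them hits a dead server; the function name login_delay is
hypothetical, while the 3 to 6 directories and the 189-second figure come
from earlier in this thread:

```shell
#!/bin/sh
# Hypothetical helper: worst-case login delay in seconds when every
# automount attempt has to time out in turn (sequential mounts assumed).
login_delay() {
    mounts=$1        # directories mounted at login
    per_mount=$2     # seconds until one mount attempt gives up
    echo $(( mounts * per_mount ))
}

login_delay 6 189    # current behaviour: 1134 s, almost 19 minutes
login_delay 6 15     # with a 15 s limit per mount: 90 s
```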

Thanks for help.

2009/8/11 Chuck Lever <[email protected]>:
> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>
>> This long timeout is good if workstation need mount a critical
>> directory using /etc/fstab on boot (for example)..
>> But in my case, using this loooong timeout doesnt make any sense,
>> since autofs retry mount directory on-access. This in fact gives me
>> alot of headaches, coz user login 'll just hangs if one server goes
>> down for any reason, and will again hangs if user try access directory
>> pointing to a NFS down server...
>
> "retry=0" means the mount command will fail as soon as the first mount(2)
> system call fails. When you set SYN retries to 1, this means after 9
> seconds, the connect fails, and that causes the mount(2) system call to
> fail.
>
> Recent conversations with Ian suggested that a long timeout was desired for
> automounter as well as other cases. Ian, is there something else we need to
> consider to determine the correct retry timeout for NFS/TCP mount points
> handled via automounter? How should mount.nfs wait so we don't make other
> use cases worse? (Looks like most of the history is intact below).
>
> How long do you think is appropriate for the automounter to wait if the
> server is down, in your case, Carlos?
>
>> Am losing something or there have was something weirdo...!?
>> ------------------------------------------------
>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    3m9.000s
>> user    0m0.002s
>> sys     0m0.001s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    3m9.000s
>> user    0m0.000s
>> sys     0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    3m9.001s
>> user    0m0.000s
>> sys     0m0.003s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    3m9.001s
>> user    0m0.002s
>> sys     0m0.001s
>>
>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    1m3.002s
>> user    0m0.000s
>> sys     0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    2m6.000s
>> user    0m0.000s
>> sys     0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    0m9.003s
>> user    0m0.001s
>> sys     0m0.002s
>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>
>> real    2m6.001s
>> user    0m0.001s
>> sys     0m0.002s
>> [root@KSTATION ~]#
>> ------------------------------------------------
>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>> using retry=0 without kerberos I got only 9s...
>>
>> *sigh*
>>
>>
>>
>> 2009/8/10 Chuck Lever <[email protected]>:
>>>
>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>
>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>
>>> Right. Normally the RPC client calls the kernel's socket connect function,
>>> which does 6 SYN retries. That one call usually takes longer than the RPC
>>> client's connect timeout, so it only makes one connect call, and then fails.
>>>
>>> Reducing the number of SYN retries per connect attempt causes the RPC client
>>> to retry the connect call until its connect timeout expires. Each connect
>>> call resets the SYN timeout to 3 seconds.
>>>
>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>
>>>> real    3m9.000s
>>>> user    0m0.000s
>>>> sys     0m0.002s
>>>>
>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp ("retry=1" = no change)
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>
>>>> real    2m6.004s
>>>> user    0m0.000s
>>>> sys     0m0.004s
>>>>
>>>> (3,6,3,6... secs interval)
>>>>
>>>>
>>>>
>>>>
>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>
>>>>> No, i'm just using packages from CentOS repo...
>>>>>
>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>> 2049...
>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>
>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>
>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>
>>>>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>> and 9 seconds...
>>>>>>
>>>>>> The 189 second timeout is likely how long it takes the kernel to give up
>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>> exponential retries, or something like that). For stock CentOS 5.3, I
>>>>>> think user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>>>> kernel just tries to connect a TCP socket to port 2049, with no preceding
>>>>>> rpcbind request.
>>>>>>
>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>>>
>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>
>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>
>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Anyone ?
>>>>>>>>>>
>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>
>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>
>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server down) after
>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>
>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my findings
>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>
>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>
>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>
>>>>>>>> I thought he was describing a situation where the server
>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>> the mount fail faster. But I may be misunderstanding.
>>>>>>>>
>>>>>>>> --b.
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>>
>>>
>>>
>>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>
_______________________________________________
NFSv4 mailing list
[email protected]
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

2009-08-11 20:00:34

by Chuck Lever III

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
> This long timeout is good if workstation need mount a critical
> directory using /etc/fstab on boot (for example)..
> But in my case, using this loooong timeout doesnt make any sense,
> since autofs retry mount directory on-access. This in fact gives me
> alot of headaches, coz user login 'll just hangs if one server goes
> down for any reason, and will again hangs if user try access directory
> pointing to a NFS down server...

"retry=0" means the mount command will fail as soon as the first
mount(2) system call fails. When you set SYN retries to 1, this means
after 9 seconds, the connect fails, and that causes the mount(2)
system call to fail.
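
The 9-second and 189-second figures in this thread both follow from exponential SYN backoff. A minimal sketch of the arithmetic, assuming a 3-second initial retransmission timeout that doubles on every retry (the 3, 6, 12, 24, 48, 96 intervals Carlos saw with tcpdump); the function name is illustrative and not taken from any kernel source:

```python
def connect_timeout_seconds(syn_retries, initial_rto=3):
    """Seconds until connect() gives up: the initial SYN plus `syn_retries`
    retransmissions, each wait twice as long as the previous one.
    initial_rto=3 is an assumption read off the tcpdump intervals."""
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))


for retries in (5, 1):
    # tcp_syn_retries=5 is the default shown in the transcripts
    print(retries, connect_timeout_seconds(retries))
```

With tcp_syn_retries=5 this gives 3+6+12+24+48+96 = 189 seconds (the 3m9s in the transcripts), and with tcp_syn_retries=1 it gives 3+6 = 9 seconds.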

Recent conversations with Ian suggested that a long timeout was
desired for automounter as well as other cases. Ian, is there
something else we need to consider to determine the correct retry
timeout for NFS/TCP mount points handled via automounter? How should
mount.nfs wait so we don't make other use cases worse? (Looks like
most of the history is intact below).

How long do you think is appropriate for the automounter to wait if
the server is down, in your case, Carlos?

> Am losing something or there have was something weirdo...!?
> ------------------------------------------------
> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 3m9.000s
> user 0m0.002s
> sys 0m0.001s
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> sec=krb5p,proto=tcp,retry=1
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 3m9.000s
> user 0m0.000s
> sys 0m0.002s
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 3m9.001s
> user 0m0.000s
> sys 0m0.003s
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> sec=krb5p,proto=tcp,retry=0
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 3m9.001s
> user 0m0.002s
> sys 0m0.001s
>
> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 1m3.002s
> user 0m0.000s
> sys 0m0.002s
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> sec=krb5p,proto=tcp,retry=1
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 2m6.000s
> user 0m0.000s
> sys 0m0.002s
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 0m9.003s
> user 0m0.001s
> sys 0m0.002s
> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> sec=krb5p,proto=tcp,retry=0
> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>
> real 2m6.001s
> user 0m0.001s
> sys 0m0.002s
> [root@KSTATION ~]#
> ------------------------------------------------
> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
> using retry=0 without kerberos I got only 9s...
>
> *sigh*
>
>
>
> 2009/8/10 Chuck Lever <[email protected]>:
>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>
>>> Something funny: Using default tcp_syn_retries (5) i got
>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>
>> Right. Normally the RPC client calls the kernel's socket connect function,
>> which does 6 SYN retries. That one call usually takes longer than the RPC
>> client's connect timeout, so it only makes one connect call, and then fails.
>>
>> Reducing the number of SYN retries per connect attempt causes the RPC client
>> to retry the connect call until its connect timeout expires. Each connect
>> call resets the SYN timeout to 3 seconds.
>>
>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.000s
>>> user 0m0.000s
>>> sys 0m0.002s
>>>
>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 2m6.004s
>>> user 0m0.000s
>>> sys 0m0.004s
>>>
>>> (3,6,3,6... secs interval)
>>>
>>>
>>>
>>>
>>> 2009/8/10 Carlos André <[email protected]>:
>>>>
>>>> No, i'm just using packages from CentOS repo...
>>>>
>>>> And u're right about expo retries... with tcpdump i've monitored
>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>> 2049...
>>>> I tried use "retry=1" option on mount without any change... I dont
>>>> want change source or tcp timers... just NFSv4 client.
>>>>
>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>
>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>
>>>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>> and 9 seconds...
>>>>>
>>>>> The 189 second timeout is likely how long it takes the kernel to give up
>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>> exponential retries, or something like that). For stock CentOS 5.3, I think
>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the kernel just
>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>> request.
>>>>>
>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>> components
>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>
>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>
>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>
>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Anyone ?
>>>>>>>>>
>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>
>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>> Kerberos
>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>>>>>>> LOOOOOOONG
>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>
>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>> mount
>>>>>>>>>> hangs,
>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server
>>>>>>>>>> down)
>>>>>>>>>> after
>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>
>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>> findings
>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic
>>>>>>>>>> command
>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>
>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>> proto=udp)
>>>>>>>>>> it
>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount:
>>>>>>>>>> mount to
>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>
>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>
>>>>>>> I thought he was describing a situation where the server the server
>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>> the
>>>>>>> mount fail faster. But I may be misunderstanding.
>>>>>>>
>>>>>>> --b.
>>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>>> the body of a message to [email protected]
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> --
>>>>> Chuck Lever
>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2009-08-10 20:11:14

by Chuck Lever III

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Aug 10, 2009, at 3:43 PM, Carlos André wrote:
> No, i'm just using packages from CentOS repo...
>
> And u're right about expo retries... with tcpdump i've monitored
> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
> 2049...
> I tried use "retry=1" option on mount without any change...

That won't have any effect on the kernel's TCP connect behavior. It
is simply used by the mount command to know when to stop redriving
mount(2) system calls. The current mount command doesn't actually
interrupt the mount(2) system call if it's taking longer than the
specified "retry=" setting.
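
The behaviour described above can be modelled in a few lines. This is only a sketch of the description in this message, not the mount command's actual source: it assumes "retry=" is a number of minutes, that the deadline is checked only between mount(2) calls, and that every mount(2) call against the dead server blocks for the same fixed time. The function and parameter names are hypothetical.

```python
def simulate_mount_command(retry_minutes, mount2_seconds):
    """Model a mount command that redrives mount(2) until `retry_minutes`
    have elapsed, never interrupting a call that is already in progress.
    Returns (attempts, total_seconds)."""
    deadline = retry_minutes * 60
    elapsed = 0
    attempts = 0
    while True:
        attempts += 1
        elapsed += mount2_seconds      # each mount(2) call runs to completion
        if elapsed >= deadline:        # the deadline is only checked between calls
            return attempts, elapsed


print(simulate_mount_command(retry_minutes=1, mount2_seconds=189))
print(simulate_mount_command(retry_minutes=1, mount2_seconds=9))
print(simulate_mount_command(retry_minutes=0, mount2_seconds=9))
```

Under these assumptions the model gives one 189-second attempt with default SYN retries, seven 9-second attempts (63 seconds) with tcp_syn_retries=1 and retry=1, and a single 9-second attempt with retry=0. Those agree with the non-Kerberos timings reported elsewhere in the thread (3m9s, 1m3s, 9s). The sec=krb5p runs, which take 2m6s regardless of "retry=", do not fit this simple model.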

> I don't want change source or tcp timers... just NFSv4 client.

I don't know of any way to effect a change in the kernel's TCP connect
behavior short of a code change, and that would affect all RPC/TCP
programs.

Basically the server is down. I suppose the client's kernel can
detect this is the case as soon as the ARP request for the server's
MAC address times out, but normally we retry TCP connects for a while
(even in this case) because we assume the server is coming back up as
quickly as it can, and want to catch it as quickly as possible.

But we can't shorten this timeout in the general case, I don't think.
It could take quite a while on a busy network or if a long round trip
is involved for a TCP connect to complete.

> 2009/8/10 Chuck Lever <[email protected]>:
>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>
>>> Bruce, no... you're right. I'm describing a situation where my
>>> server
>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>> and 9 seconds...
>>
>> The 189 second timeout is likely how long it takes the kernel to
>> give up
>> trying to connect a TCP socket to the server (6 SYN attempts with
>> exponential retries, or something like that). For stock CentOS
>> 5.3, I think
>> user space does only a DNS lookup for normal NFSv4 mounts -- the
>> kernel just
>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>> request.
>>
>> Carlos, let us know if you have replaced any NFS-related CentOS
>> components
>> (kernel, nfs-utils) with something you've built yourself.
>>
>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>
>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>
>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Anyone ?
>>>>>>
>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>
>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>> Kerberos
>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>>>> LOOOOOOONG
>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>
>>>>>>> Since i need mount some (3 to 6) dirs at user logon process,
>>>>>>> if mount
>>>>>>> hangs,
>>>>>>> user logon hangs. Then i want configure it to timeout (if
>>>>>>> server down)
>>>>>>> after
>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>
>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>> findings
>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>>>> basic
>>>>>>> command
>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>
>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>> proto=udp)
>>>>>>> it
>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error
>>>>>>> (mount:
>>>>>>> mount to
>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>
>>>>> Sounds like you're hitting the server's grace period.
>>>>
>>>> I thought he was describing a situation where the server the server
>>>> is completely gone and isn't coming back, and wondering how to
>>>> make the
>>>> mount fail faster. But I may be misunderstanding.
>>>>
>>>> --b.
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2009-08-18 00:30:18

by Ian Kent

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
> Filled bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=517349

Hi Carlos,

I have a patched source rpm to add a mount wait parameter to autofs
located at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1

Could you build it and see if it works.
I haven't tested it at all but it is fairly straight forward.
It is still unclear if this is the right way to do this and what the
consequences are in sending a term signal to mount. This mount request
will likely be followed by other requests for the same mount causing an
accumulation of mount(8) processes waiting for RPC timeouts before they
can answer the TERM signal.

Anyway, for information the patch included in the source rpm above is:

autofs-5.0.4 - add mount wait parameter

From: Ian Kent <[email protected]>

Often delays when trying to mount from a server that is not reponding
for some reason are undesirable. To try and prevent these delays we
provide a configuration setting to limit the time that we wait for
our spawned mount(8) process to complete before sending it a SIGTERM
signal. This patch adds a configuration parameter to allow us to
request we limit the time we wait for mount(8) to complete before
send it a TERM signal.
---

 daemon/spawn.c                 |    3 ++-
 include/defaults.h             |    2 ++
 lib/defaults.c                 |   13 +++++++++++++
 man/auto.master.5.in           |    7 +++++++
 redhat/autofs.sysconfig.in     |    9 +++++++++
 samples/autofs.conf.default.in |    9 +++++++++
 6 files changed, 42 insertions(+), 1 deletion(-)


--- autofs-5.0.1.orig/daemon/spawn.c
+++ autofs-5.0.1/daemon/spawn.c
@@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
 	unsigned int options;
 	unsigned int retries = MTAB_LOCK_RETRIES;
 	int update_mtab = 1, ret, printed = 0;
+	unsigned int wait = defaults_get_mount_wait();
 	char buf[PATH_MAX];
 
 	/* If we use mount locking we can't validate the location */
@@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
 	va_end(arg);
 
 	while (retries--) {
-		ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
+		ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
 		if (ret & MTAB_NOTUPDATED) {
 			struct timespec tm = {3, 0};
 
--- autofs-5.0.1.orig/include/defaults.h
+++ autofs-5.0.1/include/defaults.h
@@ -24,6 +24,7 @@
 
 #define DEFAULT_TIMEOUT			600
 #define DEFAULT_NEGATIVE_TIMEOUT	60
+#define DEFAULT_MOUNT_WAIT		-1
 #define DEFAULT_UMOUNT_WAIT		12
 #define DEFAULT_BROWSE_MODE		1
 #define DEFAULT_LOGGING			0
@@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
 struct ldap_searchdn *defaults_get_searchdns(void);
 void defaults_free_searchdns(struct ldap_searchdn *);
 unsigned int defaults_get_append_options(void);
+unsigned int defaults_get_mount_wait(void);
 unsigned int defaults_get_umount_wait(void);
 const char *defaults_get_auth_conf_file(void);
 unsigned int defaults_get_map_hash_table_size(void);
--- autofs-5.0.1.orig/lib/defaults.c
+++ autofs-5.0.1/lib/defaults.c
@@ -45,6 +45,7 @@
 #define ENV_NAME_VALUE_ATTR		"VALUE_ATTRIBUTE"
 
 #define ENV_APPEND_OPTIONS		"APPEND_OPTIONS"
+#define ENV_MOUNT_WAIT			"MOUNT_WAIT"
 #define ENV_UMOUNT_WAIT			"UMOUNT_WAIT"
 #define ENV_AUTH_CONF_FILE		"AUTH_CONF_FILE"
 
@@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
 		    check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
 		    check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
 		    check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
+		    check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
 		    check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
 		    check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
 		    check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
@@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
 	return res;
 }
 
+unsigned int defaults_get_mount_wait(void)
+{
+	long wait;
+
+	wait = get_env_number(ENV_MOUNT_WAIT);
+	if (wait < 0)
+		wait = DEFAULT_MOUNT_WAIT;
+
+	return (unsigned int) wait;
+}
+
 unsigned int defaults_get_umount_wait(void)
 {
 	long wait;
--- autofs-5.0.1.orig/man/auto.master.5.in
+++ autofs-5.0.1/man/auto.master.5.in
@@ -175,6 +175,13 @@ Set the default timeout for caching fail
 60). If the equivalent command line option is given it will override this
 setting.
 .TP
+.B MOUNT_WAIT
+Set the default time to wait for a response from a spawned mount(8)
+before sending it a SIGTERM. Note that we still need to wait for the
+RPC layer to timeout before the sub-process exits so this isn't ideal
+but it is the best we can do. The default is to wait until mount(8)
+returns without intervention.
+.TP
 .B UMOUNT_WAIT
 Set the default time to wait for a response from a spawned umount(8)
 before sending it a SIGTERM. Note that we still need to wait for the
--- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
+++ autofs-5.0.1/redhat/autofs.sysconfig.in
@@ -14,6 +14,15 @@ TIMEOUT=300
 #
 #NEGATIVE_TIMEOUT=60
 #
+# MOUNT_WAIT - time to wait for a response from umount(8).
+# 	       Setting this timeout can cause problems when
+# 	       mount would otherwise wait for a server that
+# 	       is temporarily unavailable, such as when it's
+# 	       restarting. The defailt of waiting for mount(8)
+# 	       usually results in a wait of around 3 minutes.
+#
+#MOUNT_WAIT=-1
+#
 # UMOUNT_WAIT - time to wait for a response from umount(8).
 #
 #UMOUNT_WAIT=12
--- autofs-5.0.1.orig/samples/autofs.conf.default.in
+++ autofs-5.0.1/samples/autofs.conf.default.in
@@ -14,6 +14,15 @@ TIMEOUT=300
 #
 #NEGATIVE_TIMEOUT=60
 #
+# MOUNT_WAIT - time to wait for a response from umount(8).
+# 	       Setting this timeout can cause problems when
+# 	       mount would otherwise wait for a server that
+# 	       is temporarily unavailable, such as when it's
+# 	       restarting. The defailt of waiting for mount(8)
+# 	       usually results in a wait of around 3 minutes.
+#
+#MOUNT_WAIT=-1
+#
 # UMOUNT_WAIT - time to wait for a response from umount(8).
 #
 #UMOUNT_WAIT=12


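The idea behind a mount wait setting (give the spawned mount(8) a bounded time, then send it SIGTERM) can be sketched outside autofs as follows. This is an illustration only, not the autofs implementation: `spawn_with_wait` is a hypothetical helper, and a negative `wait_seconds` is assumed to mean "wait indefinitely", mirroring the -1 default discussed in this thread.

```python
import subprocess
import sys


def spawn_with_wait(argv, wait_seconds):
    """Run argv; if wait_seconds >= 0 and the child has not exited in time,
    send it SIGTERM. Returns (return_code, timed_out)."""
    child = subprocess.Popen(argv)
    timeout = None if wait_seconds < 0 else wait_seconds
    try:
        return child.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        child.terminate()           # SIGTERM on POSIX systems
        return child.wait(), True   # still blocks until the child really exits


if __name__ == "__main__":
    # Stand-in for a hung mount(8): a child that sleeps far longer than the wait.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(spawn_with_wait(sleeper, wait_seconds=1))
```

The final `child.wait()` is where the caveat raised in this thread applies: a process blocked in the kernel waiting for an RPC timeout cannot act on the signal until that wait ends, so the caller may still be held up.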
>
> Thanks!
>
> 2009/8/13 Carlos André <[email protected]>:
> > 2009/8/13 Ian Kent <[email protected]>:
> >> Carlos André wrote:
> >>> Today (2009-08-12) I'm using:
> >>> kernel-2.6.18-128.2.1.el5
> >>> autofs-5.0.1-0.rc2.102.el5_3.1
> >>
> >> Thanks,
> >>
> >> My mistake, the wait time I was referring to is used for umounts during
> >> expires and is present in rev rc2.102.
> >>
> >> It shouldn't be hard to add this for mount as well.
> >> Would you like me to put something together?
> >
> > Sure! that 'll help me a lot (and for sure another ppl) :) Thanks :)
> >
> >>
> >> Probably would be good to test something out to see if we can make a
> >> difference with the killing mount after some configured timeout but, if
> >> we make progress, probably the best way to deal with it is for you to
> >> log a bug against rhel-5 so I can get it committed to the rhel package.
> >> The possible issue is that I'm not sure if the RPC subsystem in the
> >> above rhel kernel will respond well to process death with potential
> >> outstanding requests. But we'll see.
> >
> > Ok, on my way :)
> >
> > Thanks a lot!
> >
> >>
> >>>
> >>>
> >>> Look my last test:
> >>> --------------------------------------------------------------
> >>> [root@KSTATION areas]# time ls testdown
> >>> ls: testdown: No such file or directory
> >>>
> >>> real    3m9.025s
> >>> user    0m0.000s
> >>> sys     0m0.002s
> >>>
> >>>
> >>>
> >>>
> >>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
> >>> mounting root /misc/areas, mountpoint testdown, what
> >>> 1.2.3.4:/areas/testdown, fstype nfs4, options
> >>> acl,sec=krb5p,proto=tcp,retry=0
> >>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
> >>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
> >>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> >>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
> >>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> >>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> >>> calling mkdir_path /misc/areas/testdown
> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> >>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
> >>> 1.2.3.4:/areas/testdown /misc/areas/testdown
> >>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
> >>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
> >>> 3078093712 path /misc
> >>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
> >>> submounts remaining in /misc
> >>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
> >>> 3078093712 path /misc stat 3
> >>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
> >>> exp 3078093712 finished, switching from 2 to 1
> >>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
> >>> = 2 path /misc
> >>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
> >>> 3078093712 path /misc
> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
> >>> submounts remaining in /misc
> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
> >>> 3078093712 path /misc stat 3
> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
> >>> exp 3078093712 finished, switching from 2 to 1
> >>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
> >>> = 2 path /misc
> >>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
> >>> server '1.2.3.4' failed: timed out (giving up).
> >>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
> >>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> >>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
> >>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
> >>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
> >>> --------------------------------------------------------------
> >>>
> >>> 2009/8/12 Ian Kent <[email protected]>:
> >>>> Carlos André wrote:
> >>>>> Hi Ian,
> >>>>> I'm getting crazy trying put "retry=" to work on mount... this option
> >>>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
> >>>>> like you can see on my previous emails...
> >>>> Right, my mistake for not looking closely enough at post.
> >>>>
> >>>> Maybe this is related to the same sort of problem we had with mount in
> >>>> the past, before the options parsing went into the kernel, where other
> >>>> services, like portmapper (or rpcbind), were being done with different
> >>>> timeout parameters before the RPC calls for mounting. That's just an
> >>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
> >>>>
> >>>> But what version of autofs and kernel did you say you were using?
> >>>>
> >>>>> I appreciate any help.
> >>>>>
> >>>>> Carlos.
> >>>>>
> >>>>>
> >>>>> 2009/8/12 Ian Kent <[email protected]>:
> >>>>>> Chuck Lever wrote:
> >>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
> >>>>>>>> This long timeout is good if workstation need mount a critical
> >>>>>>>> directory using /etc/fstab on boot (for example)..
> >>>>>>>> But in my case, using this loooong timeout doesnt make any sense,
> >>>>>>>> since autofs retry mount directory on-access. This in fact gives me
> >>>>>>>> alot of headaches, coz user login 'll just hangs if one server goes
> >>>>>>>> down for any reason, and will again hangs if user try access directory
> >>>>>>>> pointing to a NFS down server...
> >>>>>>> "retry=0" means the mount command will fail as soon as the first
> >>>>>>> mount(2) system call fails.  When you set SYN retries to 1, this means
> >>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
> >>>>>>> call to fail.
> >>>>>>>
> >>>>>>> Recent conversations with Ian suggested that a long timeout was desired
> >>>>>>> for automounter as well as other cases.  Ian, is there something else we
> >>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
> >>>>>>> mount points handled via automounter?  How should mount.nfs wait so we
> >>>>>>> don't make other use cases worse?  (Looks like most of the history is
> >>>>>>> intact below).
> >>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
> >>>>>> mount.nfs in particular). This has always been a difficult situation for
> >>>>>> the automounter because interactive mount invocations should wait. But I
> >>>>>> believe automount mounts should always time out quickly, but that leads
> >>>>>> to its own set of problems, especially when home directories are concerned.
> >>>>>>
> >>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
> >>>>>> certain that will work as we expect. I'll have to do some experimentation.
> >>>>>>
> >>>>>>> How long do you think is appropriate for the automounter to wait if the
> >>>>>>> server is down, in your case, Carlos?
> >>>>>>>
> >>>>>>>> Am losing something or there have was something weirdo...!?
> >>>>>>>> ------------------------------------------------
> >>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries  [DEFAULT]
> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> >>>>>>>> proto=tcp,retry=1
> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
> >>>>>>>>
> >>>>>>>> real    3m9.000s
> >>>>>>>> user    0m0.002s
> >>>>>>>> sys     0m0.001s
> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> >>>>>>>> sec=krb5p,proto=tcp,retry=1
> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
> >>>>>>>>
> >>>>>>>> real    3m9.000s
> >>>>>>>> user    0m0.000s
> >>>>>>>> sys     0m0.002s
> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
> >>>>>>>> proto=tcp,retry=0
Cj4gPj4+Pj4+Pj4gbW91bnQ6IG1vdW50IHRvIE5GUyBzZXJ2ZXIgJzEuMi4zLjQnIGZhaWxlZDog
dGltZWQgb3V0IChnaXZpbmcgdXApLgo+ID4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4gcmVhbCAgICAzbTku
MDAxcwo+ID4+Pj4+Pj4+IHVzZXIgICAgMG0wLjAwMHMKPiA+Pj4+Pj4+PiBzeXMgICAgIDBtMC4w
MDNzCj4gPj4+Pj4+Pj4gW3Jvb3RAS1NUQVRJT04gfl0jIHRpbWUgbW91bnQgMS4yLjMuNDovYmxh
YmxhIC90bXAvIC10IG5mczQgLW8KPiA+Pj4+Pj4+PiBzZWM9a3JiNXAscHJvdG89dGNwLHJldHJ5
PTAKPiA+Pj4+Pj4+PiBtb3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVk
OiB0aW1lZCBvdXQgKGdpdmluZyB1cCkuCj4gPj4+Pj4+Pj4KPiA+Pj4+Pj4+PiByZWFsICAgIDNt
OS4wMDFzCj4gPj4+Pj4+Pj4gdXNlciAgICAwbTAuMDAycwo+ID4+Pj4+Pj4+IHN5cyAgICAgMG0w
LjAwMXMKPiA+Pj4+Pj4+Pgo+ID4+Pj4+Pj4+IFtyb290QEtTVEFUSU9OIH5dIyBlY2hvIDEgPiAv
cHJvYy9zeXMvbmV0L2lwdjQvdGNwX3N5bl9yZXRyaWVzIFsgNSB0byAxIF0KPiA+Pj4+Pj4+Pgo+
ID4+Pj4+Pj4+IFtyb290QEtTVEFUSU9OIH5dIyB0aW1lIG1vdW50IDEuMi4zLjQ6L2JsYWJsYSAv
dG1wLyAtdCBuZnM0IC1vCj4gPj4+Pj4+Pj4gcHJvdG89dGNwLHJldHJ5PTEKPiA+Pj4+Pj4+PiBt
b3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKHJl
dHJ5aW5nKS4gW3ggNl0KPiA+Pj4+Pj4+PiBtb3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4y
LjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKGdpdmluZyB1cCkuCj4gPj4+Pj4+Pj4KPiA+Pj4+Pj4+
PiByZWFsICAgIDFtMy4wMDJzCj4gPj4+Pj4+Pj4gdXNlciAgICAwbTAuMDAwcwo+ID4+Pj4+Pj4+
IHN5cyAgICAgMG0wLjAwMnMKPiA+Pj4+Pj4+PiBbcm9vdEBLU1RBVElPTiB+XSMgdGltZSBtb3Vu
dCAxLjIuMy40Oi9ibGFibGEgL3RtcC8gLXQgbmZzNCAtbwo+ID4+Pj4+Pj4+IHNlYz1rcmI1cCxw
cm90bz10Y3AscmV0cnk9MQo+ID4+Pj4+Pj4+IG1vdW50OiBtb3VudCB0byBORlMgc2VydmVyICcx
LjIuMy40JyBmYWlsZWQ6IHRpbWVkIG91dCAocmV0cnlpbmcpLiBbeCAxM10KPiA+Pj4+Pj4+PiBt
b3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKGdp
dmluZyB1cCkuCj4gPj4+Pj4+Pj4KPiA+Pj4+Pj4+PiByZWFsICAgIDJtNi4wMDBzCj4gPj4+Pj4+
Pj4gdXNlciAgICAwbTAuMDAwcwo+ID4+Pj4+Pj4+IHN5cyAgICAgMG0wLjAwMnMKPiA+Pj4+Pj4+
PiBbcm9vdEBLU1RBVElPTiB+XSMgdGltZSBtb3VudCAxLjIuMy40Oi9ibGFibGEgL3RtcC8gLXQg
bmZzNCAtbwo+ID4+Pj4+Pj4+IHByb3RvPXRjcCxyZXRyeT0wCj4gPj4+Pj4+Pj4gbW91bnQ6IG1v
dW50IHRvIE5GUyBzZXJ2ZXIgJzEuMi4zLjQnIGZhaWxlZDogdGltZWQgb3V0IChnaXZpbmcgdXAp
Lgo+ID4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4gcmVhbCAgICAwbTkuMDAzcwo+ID4+Pj4+Pj4+IHVzZXIg
ICAgMG0wLjAwMXMKPiA+Pj4+Pj4+PiBzeXMgICAgIDBtMC4wMDJzCj4gPj4+Pj4+Pj4gW3Jvb3RA
S1NUQVRJT04gfl0jIHRpbWUgbW91bnQgMS4yLjMuNDovYmxhYmxhIC90bXAvIC10IG5mczQgLW8K
PiA+Pj4+Pj4+PiBzZWM9a3JiNXAscHJvdG89dGNwLHJldHJ5PTAKPiA+Pj4+Pj4+PiBtb3VudDog
bW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKHJldHJ5aW5n
KS4gW3ggMTNdCj4gPj4+Pj4+Pj4gbW91bnQ6IG1vdW50IHRvIE5GUyBzZXJ2ZXIgJzEuMi4zLjQn
IGZhaWxlZDogdGltZWQgb3V0IChnaXZpbmcgdXApLgo+ID4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4gcmVh
bCAgICAybTYuMDAxcwo+ID4+Pj4+Pj4+IHVzZXIgICAgMG0wLjAwMXMKPiA+Pj4+Pj4+PiBzeXMg
ICAgIDBtMC4wMDJzCj4gPj4+Pj4+Pj4gW3Jvb3RAS1NUQVRJT04gfl0jCj4gPj4+Pj4+Pj4gLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCj4gPj4+Pj4+Pj4g
bWF4IHRpbWVvdXQgZ29lcyB0byAybTZzIGNoYW5naW5nIHRjcF9zeW5fcmV0cmllcyBmcm9tIDUg
dG8gMS4uLiBhbmQKPiA+Pj4+Pj4+PiB1c2luZyByZXRyeT0wIHdpdGhvdXQga2VyYmVyb3MgSSBn
b3Qgb25seSA5cy4uLgo+ID4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4gKnNpZ2gqCj4gPj4+Pj4+Pj4KPiA+
Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4gMjAwOS84LzEwIENodWNrIExldmVyIDxjaHVj
ay5sZXZlckBvcmFjbGUuY29tPjoKPiA+Pj4+Pj4+Pj4gT24gQXVnIDEwLCAyMDA5LCBhdCA0OjA1
IFBNLCBDYXJsb3MgQW5kcsOpIHdyb3RlOgo+ID4+Pj4+Pj4+Pj4gU29tZXRoaW5nIGZ1bm55OiBV
c2luZyBkZWZhdWx0IHRjcF9zeW5fcmV0cmllcyAoNSkgaSBnb3QKPiA+Pj4+Pj4+Pj4+ICIzLDYs
MTIsMjQsNDgsOTYiIHNlY3MgaW50ZXJ2YWwuLi4gYnV0IGlmIGkgY2hhbmdlIHRjcF9zeW5fcmV0
cmllcyB0bwo+ID4+Pj4+Pj4+Pj4gMSBpIGdvdCAiMyw2LDMsNiwzLDYuLi4iIHNlY3MgaW50ZXJ2
YWwuLi4KPiA+Pj4+Pj4+Pj4gUmlnaHQuICBOb3JtYWxseSB0aGUgUlBDIGNsaWVudCBjYWxscyB0
aGUga2VybmVsJ3Mgc29ja2V0IGNvbm5lY3QKPiA+Pj4+Pj4+Pj4gZnVuY3Rpb24sCj4gPj4+Pj4+
Pj4+IHdoaWNoIGRvZXMgNiBTWU4gcmV0cmllcy4gIFRoYXQgb25lIGNhbGwgdXN1YWxseSB0YWtl
cyBsb25nZXIgdGhhbgo+ID4+Pj4+Pj4+PiB0aGUgUlBDCj4gPj4+Pj4+Pj4+IGNsaWVudCdzIGNv
bm5lY3QgdGltZW91dCwgc28gaXQgb25seSBtYWtlcyBvbmUgY29ubmVjdCBjYWxsLCBhbmQgdGhl
bgo+ID4+Pj4+Pj4+PiBmYWlscy4KPiA+Pj4+Pj4+Pj4KPiA+Pj4+Pj4+Pj4gUmVkdWNpbmcgdGhl
IG51bWJlciBvZiBTWU4gcmV0cmllcyBwZXIgY29ubmVjdCBhdHRlbXB0IGNhdXNlcyB0aGUgUlBD
Cj4gPj4+Pj4+Pj4+IGNsaWVudAo+ID4+Pj4+Pj4+PiB0byByZXRyeSB0aGUgY29ubmVjdCBjYWxs
IHVudGlsIGl0cyBjb25uZWN0IHRpbWVvdXQgZXhwaXJlcy4gIEVhY2gKPiA+Pj4+Pj4+Pj4gY29u
bmVjdAo+ID4+Pj4+Pj4+PiBjYWxsIHJlc2V0cyB0aGUgU1lOIHRpbWVvdXQgdG8gMyBzZWNvbmRz
Lgo+ID4+Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Pj4gW3Jvb3RAS1NFUlZFUiBtbnRdIyB0aW1lIG1vdW50
IDEuMi4zLjQ6L2JsYWJsYSB0bXAvIC10IG5mczQgLW8KPiA+Pj4+Pj4+Pj4+IHNlYz1rcmI1cCxw
cm90bz10Y3AKPiA+Pj4+Pj4+Pj4+IG1vdW50OiBtb3VudCB0byBORlMgc2VydmVyICcxLjIuMy40
JyBmYWlsZWQ6IHRpbWVkIG91dCAoZ2l2aW5nIHVwKS4KPiA+Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+
PiByZWFsICAgIDNtOS4wMDBzCj4gPj4+Pj4+Pj4+PiB1c2VyICAgIDBtMC4wMDBzCj4gPj4+Pj4+
Pj4+PiBzeXMgICAgIDBtMC4wMDJzCj4gPj4+Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Pj4gW3Jvb3RAS1NF
UlZFUiAvXSMgZWNobyAxID4gL3Byb2Mvc3lzL25ldC9pcHY0L3RjcF9zeW5fcmV0cmllcwo+ID4+
Pj4+Pj4+Pj4gW3Jvb3RAS1NFUlZFUiBtbnRdIyB0aW1lIG1vdW50IDEuMi4zLjQ6L2JsYWJsYSB0
bXAvIC10IG5mczQgLW8KPiA+Pj4+Pj4+Pj4+IHNlYz1rcmI1cCxwcm90bz10Y3AgICgicmV0cnk9
MSIgPSBubyBjaGFuZ2UpCj4gPj4+Pj4+Pj4+PiBtb3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAn
MS4yLjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKHJldHJ5aW5nKS4KPiA+Pj4+Pj4+Pj4+IG1vdW50
OiBtb3VudCB0byBORlMgc2VydmVyICcxLjIuMy40JyBmYWlsZWQ6IHRpbWVkIG91dCAocmV0cnlp
bmcpLgo+ID4+Pj4+Pj4+Pj4gbW91bnQ6IG1vdW50IHRvIE5GUyBzZXJ2ZXIgJzEuMi4zLjQnIGZh
aWxlZDogdGltZWQgb3V0IChyZXRyeWluZykuCj4gPj4+Pj4+Pj4+PiBtb3VudDogbW91bnQgdG8g
TkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKHJldHJ5aW5nKS4KPiA+Pj4+
Pj4+Pj4+IG1vdW50OiBtb3VudCB0byBORlMgc2VydmVyICcxLjIuMy40JyBmYWlsZWQ6IHRpbWVk
IG91dCAocmV0cnlpbmcpLgo+ID4+Pj4+Pj4+Pj4gbW91bnQ6IG1vdW50IHRvIE5GUyBzZXJ2ZXIg
JzEuMi4zLjQnIGZhaWxlZDogdGltZWQgb3V0IChyZXRyeWluZykuCj4gPj4+Pj4+Pj4+PiBtb3Vu
dDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVkOiB0aW1lZCBvdXQgKHJldHJ5
aW5nKS4KPiA+Pj4+Pj4+Pj4+IG1vdW50OiBtb3VudCB0byBORlMgc2VydmVyICcxLjIuMy40JyBm
YWlsZWQ6IHRpbWVkIG91dCAocmV0cnlpbmcpLgo+ID4+Pj4+Pj4+Pj4gbW91bnQ6IG1vdW50IHRv
IE5GUyBzZXJ2ZXIgJzEuMi4zLjQnIGZhaWxlZDogdGltZWQgb3V0IChyZXRyeWluZykuCj4gPj4+
Pj4+Pj4+PiBtb3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcgZmFpbGVkOiB0aW1l
ZCBvdXQgKHJldHJ5aW5nKS4KPiA+Pj4+Pj4+Pj4+IG1vdW50OiBtb3VudCB0byBORlMgc2VydmVy
ICcxLjIuMy40JyBmYWlsZWQ6IHRpbWVkIG91dCAocmV0cnlpbmcpLgo+ID4+Pj4+Pj4+Pj4gbW91
bnQ6IG1vdW50IHRvIE5GUyBzZXJ2ZXIgJzEuMi4zLjQnIGZhaWxlZDogdGltZWQgb3V0IChyZXRy
eWluZykuCj4gPj4+Pj4+Pj4+PiBtb3VudDogbW91bnQgdG8gTkZTIHNlcnZlciAnMS4yLjMuNCcg
ZmFpbGVkOiB0aW1lZCBvdXQgKHJldHJ5aW5nKS4KPiA+Pj4+Pj4+Pj4+IG1vdW50OiBtb3VudCB0
byBORlMgc2VydmVyICcxLjIuMy40JyBmYWlsZWQ6IHRpbWVkIG91dCAoZ2l2aW5nIHVwKS4KPiA+
Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+PiByZWFsICAgIDJtNi4wMDRzCj4gPj4+Pj4+Pj4+PiB1c2Vy
ICAgIDBtMC4wMDBzCj4gPj4+Pj4+Pj4+PiBzeXMgICAgIDBtMC4wMDRzCj4gPj4+Pj4+Pj4+Pgo+
ID4+Pj4+Pj4+Pj4gKDMsNiwzLDYuLi4gc2VjcyBpbnRlcnZhbCkKPiA+Pj4+Pj4+Pj4+Cj4gPj4+
Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Pj4KPiA+Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+PiAyMDA5LzgvMTAg
Q2FybG9zIEFuZHLDqSA8Y2FuZHJlY25AZ21haWwuY29tPjoKPiA+Pj4+Pj4+Pj4+PiBObywgaSdt
IGp1c3QgdXNpbmcgcGFja2FnZXMgZnJvbSBDZW50T1MgcmVwby4uLgo+ID4+Pj4+Pj4+Pj4+Cj4g
Pj4+Pj4+Pj4+Pj4gQW5kIHUncmUgcmlnaHQgYWJvdXQgZXhwbyByZXRyaWVzLi4uIHdpdGggdGNw
ZHVtcCBpJ3ZlIG1vbml0b3JlZAo+ID4+Pj4+Pj4+Pj4+IHRyYWZmaWMgYW5kIGkgZ290IFNZTiBy
ZXRyaWVzIGluIDMsIDYsIDEyLCAyNCwgNDgsIDk2IHNlY3Mgb24gcG9ydAo+ID4+Pj4+Pj4+Pj4+
IDIwNDkuLi4KPiA+Pj4+Pj4+Pj4+PiBJIHRyaWVkIHVzZSAicmV0cnk9MSIgb3B0aW9uIG9uIG1v
dW50IHdpdGhvdXQgYW55IGNoYW5nZS4uLiBJIGRvbnQKPiA+Pj4+Pj4+Pj4+PiB3YW50IGNoYW5n
ZSBzb3VyY2Ugb3IgdGNwIHRpbWVycy4uLiBqdXN0IE5GU3Y0IGNsaWVudC4KPiA+Pj4+Pj4+Pj4+
Pgo+ID4+Pj4+Pj4+Pj4+IDIwMDkvOC8xMCBDaHVjayBMZXZlciA8Y2h1Y2subGV2ZXJAb3JhY2xl
LmNvbT46Cj4gPj4+Pj4+Pj4+Pj4+IE9uIEF1ZyAxMCwgMjAwOSwgYXQgMjoyOSBQTSwgQ2FybG9z
IEFuZHLDqSB3cm90ZToKPiA+Pj4+Pj4+Pj4+Pj4+IEJydWNlLCBuby4uLiB5b3UncmUgcmlnaHQu
ICBJJ20gZGVzY3JpYmluZyBhIHNpdHVhdGlvbiB3aGVyZSBteQo+ID4+Pj4+Pj4+Pj4+Pj4gc2Vy
dmVyCj4gPj4+Pj4+Pj4+Pj4+PiBkaWVkLi4uIGkgbmVlZCBtb3VudCBmYWlsIGZhc3RlciAoMTAg
b3IgMTUgc2VjcyBtYXgpIHRoYW4gMyBtaW51dGVzCj4gPj4+Pj4+Pj4+Pj4+PiBhbmQgOSBzZWNv
bmRzLi4uCj4gPj4+Pj4+Pj4+Pj4+IFRoZSAxODkgc2Vjb25kIHRpbWVvdXQgaXMgbGlrZWx5IGhv
dyBsb25nIGl0IHRha2VzIHRoZSBrZXJuZWwgdG8KPiA+Pj4+Pj4+Pj4+Pj4gZ2l2ZSB1cAo+ID4+
Pj4+Pj4+Pj4+PiB0cnlpbmcgdG8gY29ubmVjdCBhIFRDUCBzb2NrZXQgdG8gdGhlIHNlcnZlciAo
NiBTWU4gYXR0ZW1wdHMgd2l0aAo+ID4+Pj4+Pj4+Pj4+PiBleHBvbmVudGlhbCByZXRyaWVzLCBv
ciBzb21ldGhpbmcgbGlrZSB0aGF0KS4gIEZvciBzdG9jayBDZW50T1MKPiA+Pj4+Pj4+Pj4+Pj4g
NS4zLCBJCj4gPj4+Pj4+Pj4+Pj4+IHRoaW5rCj4gPj4+Pj4+Pj4+Pj4+IHVzZXIgc3BhY2UgZG9l
cyBvbmx5IGEgRE5TIGxvb2t1cCBmb3Igbm9ybWFsIE5GU3Y0IG1vdW50cyAtLSB0aGUKPiA+Pj4+
Pj4+Pj4+Pj4ga2VybmVsCj4gPj4+Pj4+Pj4+Pj4+IGp1c3QKPiA+Pj4+Pj4+Pj4+Pj4gdHJpZXMg
dG8gY29ubmVjdCBhIFRDUCBzb2NrZXQgdG8gcG9ydCAyMDQ5LCB3aXRoIG5vIHByZWNlZGluZyBy
cGNiaW5kCj4gPj4+Pj4+Pj4+Pj4+IHJlcXVlc3QuCj4gPj4+Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+
Pj4+IENhcmxvcywgbGV0IHVzIGtub3cgaWYgeW91IGhhdmUgcmVwbGFjZWQgYW55IE5GUy1yZWxh
dGVkIENlbnRPUwo+ID4+Pj4+Pj4+Pj4+PiBjb21wb25lbnRzCj4gPj4+Pj4+Pj4+Pj4+IChrZXJu
ZWwsIG5mcy11dGlscykgd2l0aCBzb21ldGhpbmcgeW91J3ZlIGJ1aWx0IHlvdXJzZWxmLgo+ID4+
Pj4+Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Pj4+Pj4gMjAwOS84LzcgSi4gQnJ1Y2UgRmllbGRzIDxiZmll
bGRzQGZpZWxkc2VzLm9yZz46Cj4gPj4+Pj4+Pj4+Pj4+Pj4gT24gRnJpLCBBdWcgMDcsIDIwMDkg
YXQgMDk6NDI6MThBTSArMDMwMCwgQmVubnkgSGFsZXZ5IHdyb3RlOgo+ID4+Pj4+Pj4+Pj4+Pj4+
PiBPbiBBdWcuIDA3LCAyMDA5LCAzOjE4ICswMzAwLCBDYXJsb3MgQW5kcsOpIDxjYW5kcmVjbkBn
bWFpbC5jb20+Cj4gPj4+Pj4+Pj4+Pj4+Pj4+IHdyb3RlOgo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4gQW55
b25lID8KPiA+Pj4+Pj4+Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+Pj4+Pj4+PiAyMDA5LzcvMjkgQ2Fy
bG9zIEFuZHLDqSA8Y2FuZHJlY25AZ21haWwuY29tPjoKPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBQUEws
IEkgbmVlZCBwdXQgYSBDZW50T1MgNS4zICh1cGRhdGVkKSBORlN2NCBzZXJ2ZXIgdG8gd29yayB3
aXRoCj4gPj4+Pj4+Pj4+Pj4+Pj4+Pj4gS2VyYmVyb3MKPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBhbmQg
QXV0b0ZTLCBidXQgaSBnb3QgYSBwcm9ibGVtOiBJZiBORlMgc2VydmVyIGdvZXMgZG93biBpIGdl
dCBhCj4gPj4+Pj4+Pj4+Pj4+Pj4+Pj4gTE9PT09PT09ORwo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+IG1v
dW50IHRpbWVvdXQgb24gQ2VudE9TIDUuMyAodXBkYXRlZCkgTkZTdjQgY2xpZW50Li4uCj4gPj4+
Pj4+Pj4+Pj4+Pj4+Pj4KPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBTaW5jZSBpIG5lZWQgbW91bnQgc29t
ZSAoMyB0byA2KSBkaXJzIGF0IHVzZXIgbG9nb24gcHJvY2VzcywgaWYKPiA+Pj4+Pj4+Pj4+Pj4+
Pj4+PiBtb3VudAo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+IGhhbmdzLAo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+
IHVzZXIgbG9nb24gaGFuZ3MuIFRoZW4gaSB3YW50IGNvbmZpZ3VyZSBpdCB0byB0aW1lb3V0IChp
ZiBzZXJ2ZXIKPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBkb3duKQo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+IGFm
dGVyCj4gPj4+Pj4+Pj4+Pj4+Pj4+Pj4gMTAtMTUgc2VjcyAoTUFYKSBvbiBlYWNoIG1vdW50IGF0
dGVtcHQuCj4gPj4+Pj4+Pj4+Pj4+Pj4+Pj4KPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBJIGFscmVhZHkg
bWFrZSBhIGxhYiBhbmQgdHJpZWQgYSBMT1Qgb2YgY29tYmluYXRpb25zLCB0aGVyZSBteQo+ID4+
Pj4+Pj4+Pj4+Pj4+Pj4+IGZpbmRpbmdzCj4gPj4+Pj4+Pj4+Pj4+Pj4+Pj4gKHNlcnZlciBET1dO
IElQOiAxNzIuMTYuMC4xMCAvIGNsaWVudCBJUDogMTcyLjE2LjEuMTApIHVzaW5nCj4gPj4+Pj4+
Pj4+Pj4+Pj4+Pj4gYmFzaWMKPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBjb21tYW5kCj4gPj4+Pj4+Pj4+
Pj4+Pj4+Pj4gKHRpbWUgbW91bnQgMTcyLjE2LjAuMTA6L3JlbW90ZWRpciAvbG9jYWxkaXIvIC10
IG5mczQgLW8KPiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBzZWM9a3JiNSxwcm90bz08dGNwL3VkcD4pIGZy
b20gTkZTIGNsaWVudDoKPiA+Pj4+Pj4+Pj4+Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+IC0g
T25jZSBpIHRyeSBhY2Nlc3MgbW91bnQgcG9pbnQgdXNpbmcgQXV0b0ZTIChwcm90bz10Y3AgT1IK
PiA+Pj4+Pj4+Pj4+Pj4+Pj4+PiBwcm90bz11ZHApCj4gPj4+Pj4+Pj4+Pj4+Pj4+Pj4gaXQKPiA+
Pj4+Pj4+Pj4+Pj4+Pj4+PiBoYW5ncyBmb3IgMTg5IHNlY3MgKDNtOXM6IHJlYWwgIDNtOS4wMDFz
KSAgdW50aWwgc2hvdyBlcnJvcgo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+IChtb3VudDoKPiA+Pj4+Pj4+
Pj4+Pj4+Pj4+PiBtb3VudCB0bwo+ID4+Pj4+Pj4+Pj4+Pj4+Pj4+IE5GUyBzZXJ2ZXIgJzE3Mi4x
Ni4wLjEwJyBmYWlsZWQ6IHRpbWVkIG91dCAoZ2l2aW5nIHVwKSkKPiA+Pj4+Pj4+Pj4+Pj4+Pj4g
U291bmRzIGxpa2UgeW91J3JlIGhpdHRpbmcgdGhlIHNlcnZlcidzIGdyYWNlIHBlcmlvZC4KPiA+
Pj4+Pj4+Pj4+Pj4+PiBJIHRob3VnaHQgaGUgd2FzIGRlc2NyaWJpbmcgYSBzaXR1YXRpb24gd2hl
cmUgdGhlIHNlcnZlciB0aGUgc2VydmVyCj4gPj4+Pj4+Pj4+Pj4+Pj4gaXMgY29tcGxldGVseSBn
b25lIGFuZCBpc24ndCBjb21pbmcgYmFjaywgYW5kIHdvbmRlcmluZyBob3cgdG8gbWFrZQo+ID4+
Pj4+Pj4+Pj4+Pj4+IHRoZQo+ID4+Pj4+Pj4+Pj4+Pj4+IG1vdW50IGZhaWwgZmFzdGVyLiAgQnV0
IEkgbWF5IGJlIG1pc3VuZGVyc3RhbmRpbmcuCj4gPj4+Pj4+Pj4+Pj4+Pj4KPiA+Pj4+Pj4+Pj4+
Pj4+PiAtLWIuCj4gPj4+Pj4+Pj4+Pj4+Pj4KPiA+Pj4+Pj4+Pj4+Pj4+IC0tCj4gPj4+Pj4+Pj4+
Pj4+PiBUbyB1bnN1YnNjcmliZSBmcm9tIHRoaXMgbGlzdDogc2VuZCB0aGUgbGluZSAidW5zdWJz
Y3JpYmUKPiA+Pj4+Pj4+Pj4+Pj4+IGxpbnV4LW5mcyIgaW4KPiA+Pj4+Pj4+Pj4+Pj4+IHRoZSBi
b2R5IG9mIGEgbWVzc2FnZSB0byBtYWpvcmRvbW9Admdlci5rZXJuZWwub3JnCj4gPj4+Pj4+Pj4+
Pj4+PiBNb3JlIG1ham9yZG9tbyBpbmZvIGF0ICBodHRwOi8vdmdlci5rZXJuZWwub3JnL21ham9y
ZG9tby1pbmZvLmh0bWwKPiA+Pj4+Pj4+Pj4+Pj4gLS0KPiA+Pj4+Pj4+Pj4+Pj4gQ2h1Y2sgTGV2
ZXIKPiA+Pj4+Pj4+Pj4+Pj4gY2h1Y2tbZG90XWxldmVyW2F0XW9yYWNsZVtkb3RdY29tCj4gPj4+
Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+Pj4+Cj4gPj4+Pj4+Pj4+Pj4+Cj4g
Pj4+Pj4+Pj4+IC0tCj4gPj4+Pj4+Pj4+IENodWNrIExldmVyCj4gPj4+Pj4+Pj4+IGNodWNrW2Rv
dF1sZXZlclthdF1vcmFjbGVbZG90XWNvbQo+ID4+Pj4+Pj4+Pgo+ID4+Pj4+Pj4+Pgo+ID4+Pj4+
Pj4+Pgo+ID4+Pj4+Pj4+Pgo+ID4+Pj4+Pj4gLS0KPiA+Pj4+Pj4+IENodWNrIExldmVyCj4gPj4+
Pj4+PiBjaHVja1tkb3RdbGV2ZXJbYXRdb3JhY2xlW2RvdF1jb20KPiA+Pj4+Pj4+Cj4gPj4+Pj4+
Pgo+ID4+Pj4+Pj4KPiA+Pj4+Cj4gPj4KPiA+Pgo+ID4KCl9fX19fX19fX19fX19fX19fX19fX19f
X19fX19fX19fX19fX19fX19fX19fX19fCk5GU3Y0IG1haWxpbmcgbGlzdApORlN2NEBsaW51eC1u
ZnMub3JnCmh0dHA6Ly9saW51eC1uZnMub3JnL2NnaS1iaW4vbWFpbG1hbi9saXN0aW5mby9uZnN2
NA==

2009-08-12 16:40:17

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Today (2009-08-12) I'm using:
kernel-2.6.18-128.2.1.el5
autofs-5.0.1-0.rc2.102.el5_3.1


Look my last test:
--------------------------------------------------------------
[root@KSTATION areas]# time ls testdown
ls: testdown: No such file or directory

real 3m9.025s
user 0m0.000s
sys 0m0.002s




Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
mounting root /misc/areas, mountpoint testdown, what
1.2.3.4:/areas/testdown, fstype nfs4, options
acl,sec=krb5p,proto=tcp,retry=0
Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
calling mkdir_path /misc/areas/testdown
Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
1.2.3.4:/areas/testdown /misc/areas/testdown
Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
3078093712 path /misc
Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
submounts remaining in /misc
Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
3078093712 path /misc stat 3
Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
exp 3078093712 finished, switching from 2 to 1
Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
= 2 path /misc
Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
3078093712 path /misc
Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
submounts remaining in /misc
Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
3078093712 path /misc stat 3
Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
exp 3078093712 finished, switching from 2 to 1
Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
= 2 path /misc
Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
server '1.2.3.4' failed: timed out (giving up).
Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
--------------------------------------------------------------
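
As a cross-check on the log above, the gap between the "calling mount" entry (12:57:07) and the "giving up" entry (13:00:16) can be computed directly. This is only an illustrative sketch: the helper name `elapsed_seconds` is hypothetical, and the year 2009 is an assumption taken from the message date, since syslog lines carry no year.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str, year: int = 2009) -> int:
    """Seconds between two syslog-style stamps such as 'Aug 12 12:57:07'."""
    # Syslog stamps omit the year, so an assumed year is prepended before parsing.
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return int((t1 - t0).total_seconds())


# "calling mount ..." entry versus "timed out (giving up)" entry
print(elapsed_seconds("Aug 12 12:57:07", "Aug 12 13:00:16"))  # 189
```

The 189 seconds match the 3m9s reported by `time ls testdown`, which suggests that nearly all of the delay is spent inside the single mount call rather than in automount itself.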

2009/8/12 Ian Kent <[email protected]>:
> Carlos André wrote:
>> Hi Ian,
>> I'm getting crazy trying put "retry=" to work on mount... this option
>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
>> like you can see on my previous emails...
>
> Right, my mistake for not looking closely enough at post.
>
> Maybe this is related to the same sort of problem we had with mount in
> the past, before the options parsing went into the kernel, where other
> services, like portmapper (or rpcbind), were being done with different
> timeout parameters before the RPC calls for mounting. That's just an
> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>
> But what version of autofs and kernel did you say you were using?
>
>>
>> I appreciate any help.
>>
>> Carlos.
>>
>>
>> 2009/8/12 Ian Kent <[email protected]>:
>>> Chuck Lever wrote:
>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>> This long timeout is good if workstation need mount a critical
>>>>> directory using /etc/fstab on boot (for example)..
>>>>> But in my case, using this loooong timeout doesnt make any sense,
>>>>> since autofs retry mount directory on-access. This in fact gives me
>>>>> alot of headaches, coz user login 'll just hangs if one server goes
>>>>> down for any reason, and will again hangs if user try access directory
>>>>> pointing to a NFS down server...
>>>> "retry=0" means the mount command will fail as soon as the first
>>>> mount(2) system call fails.  When you set SYN retries to 1, this means
>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>> call to fail.
>>>>
>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>> for automounter as well as other cases.  Ian, is there something else we
>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>> mount points handled via automounter?  How should mount.nfs wait so we
>>>> don't make other use cases worse?  (Looks like most of the history is
>>>> intact below).
>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>> mount.nfs in particular). This has always been a difficult situation for
>>> the automounter because interactive mount invocations should wait. But I
>>> believe automount mounts should always time out quickly, but that leads
>>> to its own set of problems, especially when home directories are concerned.
>>>
>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>> certain that will work as we expect. I'll have to do some experimentation.
>>>
>>>> How long do you think is appropriate for the automounter to wait if the
>>>> server is down, in your case, Carlos?
>>>>
>>>>> Am losing something or there have was something weirdo...!?
>>>>> ------------------------------------------------
>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries  [DEFAULT]
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> proto=tcp,retry=1
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    3m9.000s
>>>>> user    0m0.002s
>>>>> sys     0m0.001s
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp,retry=1
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    3m9.000s
>>>>> user    0m0.000s
>>>>> sys     0m0.002s
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> proto=tcp,retry=0
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    3m9.001s
>>>>> user    0m0.000s
>>>>> sys     0m0.003s
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp,retry=0
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    3m9.001s
>>>>> user    0m0.002s
>>>>> sys     0m0.001s
>>>>>
>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> proto=tcp,retry=1
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    1m3.002s
>>>>> user    0m0.000s
>>>>> sys     0m0.002s
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp,retry=1
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    2m6.000s
>>>>> user    0m0.000s
>>>>> sys     0m0.002s
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> proto=tcp,retry=0
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    0m9.003s
>>>>> user    0m0.001s
>>>>> sys     0m0.002s
>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp,retry=0
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real    2m6.001s
>>>>> user    0m0.001s
>>>>> sys     0m0.002s
>>>>> [root@KSTATION ~]#
>>>>> ------------------------------------------------
>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>> using retry=0 without kerberos I got only 9s...
>>>>>
>>>>> *sigh*
>>>>>
>>>>>
>>>>>
>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>> Right.  Normally the RPC client calls the kernel's socket connect
>>>>>> function,
>>>>>> which does 6 SYN retries.  That one call usually takes longer than
>>>>>> the RPC
>>>>>> client's connect timeout, so it only makes one connect call, and then
>>>>>> fails.
>>>>>>
>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>> client
>>>>>> to retry the connect call until its connect timeout expires.  Each
>>>>>> connect
>>>>>> call resets the SYN timeout to 3 seconds.
>>>>>>
>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>> sec=krb5p,proto=tcp
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>
>>>>>>> real    3m9.000s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.002s
>>>>>>>
>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>> sec=krb5p,proto=tcp  ("retry=1" = no change)
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>
>>>>>>> real    2m6.004s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.004s
>>>>>>>
>>>>>>> (3,6,3,6... secs interval)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>
>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>> 2049...
>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>
>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>> Bruce, no... you're right.  I'm describing a situation where my
>>>>>>>>>> server
>>>>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>>>>> and 9 seconds...
>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>>>>> give up
>>>>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>>>>> exponential retries, or something like that).  For stock CentOS
>>>>>>>>> 5.3, I
>>>>>>>>> think
>>>>>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>>>>>>> kernel
>>>>>>>>> just
>>>>>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>>>>>> request.
>>>>>>>>>
>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>> components
>>>>>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>
>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>>>>> Kerberos
>>>>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>>>>>>>>>>> LOOOOOOONG
>>>>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>>>>>> mount
>>>>>>>>>>>>>> hangs,
>>>>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server
>>>>>>>>>>>>>> down)
>>>>>>>>>>>>>> after
>>>>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>>>>>> findings
>>>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>> command
>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>>>>>> proto=udp)
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>> hangs for 189 secs (3m9s: real  3m9.001s)  until show error
>>>>>>>>>>>>>> (mount:
>>>>>>>>>>>>>> mount to
>>>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>> I thought he was describing a situation where the server the server
>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>> the
>>>>>>>>>>> mount fail faster.  But I may be misunderstanding.
>>>>>>>>>>>
>>>>>>>>>>> --b.
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>> linux-nfs" in
>>>>>>>>>> the body of a message to [email protected]
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>> --
>>>>>>>>> Chuck Lever
>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> --
>>>> Chuck Lever
>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>>>
>>>>
>>>
>
>
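
The timings reported in the thread above are consistent with a simple doubling backoff on the SYN timer. The sketch below is only a model, not kernel code: it assumes a 3-second initial timeout that doubles on each retransmission (the 3, 6, 12, 24, 48, 96 pattern seen with tcpdump), and the function name `connect_timeout` is hypothetical. Reading 2m6s and 1m3s as 14 and 7 back-to-back connect attempts is an inference from the number of "retrying" messages, not something the thread states.

```python
def connect_timeout(syn_retries: int, initial_rto: float = 3.0) -> float:
    """Total seconds before one connect() attempt gives up.

    Models the initial SYN plus `syn_retries` retransmissions, with the
    wait doubling each time (assumed behaviour, per the tcpdump output).
    """
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))


print(connect_timeout(5))       # 189.0 -> 3m9s with the default tcp_syn_retries=5
print(connect_timeout(1))       # 9.0   -> 9s with tcp_syn_retries=1 and retry=0
print(14 * connect_timeout(1))  # 126.0 -> 2m6s: 13 "retrying" plus 1 "giving up"
print(7 * connect_timeout(1))   # 63.0  -> 1m3s: 6 "retrying" plus 1 "giving up"
```

Under this model the 3m9s hang is one full connect attempt, and lowering tcp_syn_retries only shortens each attempt; the total wait then depends on how many times the caller retries the connect.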

2009-08-13 14:19:59

by Ian Kent

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Carlos André wrote:
> Today (2009-08-12) I'm using:
> kernel-2.6.18-128.2.1.el5
> autofs-5.0.1-0.rc2.102.el5_3.1

Thanks,

My mistake, the wait time I was referring to is used for umounts during
expires and is present in rev rc2.102.

It shouldn't be hard to add this for mount as well.
Would you like me to put something together?

It would probably be good to test something to see whether killing
mount after some configured timeout makes a difference. If we make
progress, the best way to deal with it is probably for you to log a
bug against rhel-5 so I can get it committed to the rhel package.
The possible issue is that I'm not sure if the RPC subsystem in the
above rhel kernel will respond well to process death with potential
outstanding requests. But we'll see.
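
The idea of killing mount after a configured timeout can be sketched in user space as follows. This is a minimal illustration and not how autofs implements anything: the helper name `run_with_deadline` is hypothetical, and a sleeping child process stands in for a hanging mount. A real mount blocked inside the kernel may not die promptly when killed, which is the open question about the RPC subsystem raised above.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd; return its exit status, or None if it was killed at the deadline."""
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising.
        return None


# Stand-in for a hanging mount: a child that sleeps well past the deadline.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(slow, 0.5))
```

A caller would treat a `None` result as a failed mount attempt and report the failure to the waiting process instead of blocking for the full connect timeout.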

>
>
> Look my last test:
> --------------------------------------------------------------
> [root@KSTATION areas]# time ls testdown
> ls: testdown: No such file or directory
>
> real 3m9.025s
> user 0m0.000s
> sys 0m0.002s
>
>
>
>
> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
> mounting root /misc/areas, mountpoint testdown, what
> 1.2.3.4:/areas/testdown, fstype nfs4, options
> acl,sec=krb5p,proto=tcp,retry=0
> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> calling mkdir_path /misc/areas/testdown
> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
> 1.2.3.4:/areas/testdown /misc/areas/testdown
> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
> 3078093712 path /misc
> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
> submounts remaining in /misc
> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
> 3078093712 path /misc stat 3
> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
> exp 3078093712 finished, switching from 2 to 1
> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
> =3D 2 path /misc
> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =3D
> 3078093712 path /misc
> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
> submounts remaining in /misc
> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
> 3078093712 path /misc stat 3
> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
> exp 3078093712 finished, switching from 2 to 1
> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
> =3D 2 path /misc
> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
> server '1.2.3.4' failed: timed out (giving up).
> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token =3D 17
> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/te=
stdown
> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
> --------------------------------------------------------------
> =

> 2009/8/12 Ian Kent <[email protected]>:
>> Carlos André wrote:
>>> Hi Ian,
>>> I'm getting crazy trying put "retry=" to work on mount... this option
>>> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
>>> like you can see on my previous emails...
>> Right, my mistake for not looking closely enough at post.
>>
>> Maybe this is related to the same sort of problem we had with mount in
>> the past, before the options parsing went into the kernel, where other
>> services, like portmapper (or rpcbind), were being done with different
>> timeout parameters before the RPC calls for mounting. That's just an
>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>
>> But what version of autofs and kernel did you say you were using?
>>
>>> I appreciate any help.
>>>
>>> Carlos.
>>>
>>>
>>> 2009/8/12 Ian Kent <[email protected]>:
>>>> Chuck Lever wrote:
>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>> This long timeout is good if workstation need mount a critical
>>>>>> directory using /etc/fstab on boot (for example)..
>>>>>> But in my case, using this loooong timeout doesnt make any sense,
>>>>>> since autofs retry mount directory on-access. This in fact gives me
>>>>>> alot of headaches, coz user login 'll just hangs if one server goes
>>>>>> down for any reason, and will again hangs if user try access directory
>>>>>> pointing to a NFS down server...
>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>> mount(2) system call fails. When you set SYN retries to 1, this means
>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>>> call to fail.
>>>>>
>>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>>> for automounter as well as other cases. Ian, is there something else we
>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>>> mount points handled via automounter? How should mount.nfs wait so we
>>>>> don't make other use cases worse? (Looks like most of the history is
>>>>> intact below).
>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>>> mount.nfs in particular). This has always been a difficult situation for
>>>> the automounter because interactive mount invocations should wait. But I
>>>> believe automount mounts should always time out quickly, but that leads
>>>> to its own set of problems, especially when home directories are concerned.
>>>>
>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>> certain that will work as we expect. I'll have to do some experimentation.
>>>>
>>>>> How long do you think is appropriate for the automounter to wait if the
>>>>> server is down, in your case, Carlos?
>>>>>
>>>>>> Am losing something or there have was something weirdo...!?
>>>>>> ------------------------------------------------
>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> proto=tcp,retry=1
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 3m9.000s
>>>>>> user 0m0.002s
>>>>>> sys 0m0.001s
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 3m9.000s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.002s
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> proto=tcp,retry=0
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 3m9.001s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.003s
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 3m9.001s
>>>>>> user 0m0.002s
>>>>>> sys 0m0.001s
>>>>>>
>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> proto=tcp,retry=1
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 1m3.002s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.002s
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> sec=krb5p,proto=tcp,retry=1
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 2m6.000s
>>>>>> user 0m0.000s
>>>>>> sys 0m0.002s
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> proto=tcp,retry=0
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 0m9.003s
>>>>>> user 0m0.001s
>>>>>> sys 0m0.002s
>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>>>>> sec=krb5p,proto=tcp,retry=0
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>
>>>>>> real 2m6.001s
>>>>>> user 0m0.001s
>>>>>> sys 0m0.002s
>>>>>> [root@KSTATION ~]#
>>>>>> ------------------------------------------------
>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>>> using retry=0 without kerberos I got only 9s...
>>>>>>
>>>>>> *sigh*
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>> Right. Normally the RPC client calls the kernel's socket connect
>>>>>>> function,
>>>>>>> which does 6 SYN retries. That one call usually takes longer than
>>>>>>> the RPC
>>>>>>> client's connect timeout, so it only makes one connect call, and then
>>>>>>> fails.
>>>>>>>
>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>>> client
>>>>>>> to retry the connect call until its connect timeout expires. Each
>>>>>>> connect
>>>>>>> call resets the SYN timeout to 3 seconds.
>>>>>>>
>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>> sec=krb5p,proto=tcp
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real 3m9.000s
>>>>>>>> user 0m0.000s
>>>>>>>> sys 0m0.002s
>>>>>>>>
>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>
>>>>>>>> real 2m6.004s
>>>>>>>> user 0m0.000s
>>>>>>>> sys 0m0.004s
>>>>>>>>
>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2009/8/10 Carlos André <[email protected]>:
>>>>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>>>>
>>>>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>>>>> 2049...
>>>>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>>>>
>>>>>>>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>> Bruce, no... you're right. I'm describing a situation where my
>>>>>>>>>>> server
>>>>>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>>>>>> and 9 seconds...
>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>>>>>> give up
>>>>>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>>>>>> exponential retries, or something like that). For stock CentOS
>>>>>>>>>> 5.3, I
>>>>>>>>>> think
>>>>>>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the
>>>>>>>>>> kernel
>>>>>>>>>> just
>>>>>>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>>>>>>> request.
>>>>>>>>>>
>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>>> components
>>>>>>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>
>>>>>>>>>>> 2009/8/7 J. Bruce Fields <[email protected]>:
>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2009/7/29 Carlos André <[email protected]>:
>>>>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with
>>>>>>>>>>>>>>> Kerberos
>>>>>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a
>>>>>>>>>>>>>>> LOOOOOOONG
>>>>>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if
>>>>>>>>>>>>>>> mount
>>>>>>>>>>>>>>> hangs,
>>>>>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server
>>>>>>>>>>>>>>> down)
>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my
>>>>>>>>>>>>>>> findings
>>>>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>>>>>>>>>>>> basic
>>>>>>>>>>>>>>> command
>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR
>>>>>>>>>>>>>>> proto=udp)
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error
>>>>>>>>>>>>>>> (mount:
>>>>>>>>>>>>>>> mount to
>>>>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>> I thought he was describing a situation where the server the server
>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>> the
>>>>>>>>>>>> mount fail faster. But I may be misunderstanding.
>>>>>>>>>>>>
>>>>>>>>>>>> --b.
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>>>>> linux-nfs" in
>>>>>>>>>>> the body of a message to [email protected]
>>>>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>>>> --
>>>>>>>>>> Chuck Lever
>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Chuck Lever
>>>>> chuck[dot]lever[at]oracle[dot]com
>>>>>
>>>>>
>>>>>
>>
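
A rough check of the timings quoted above with tcp_syn_retries set to 1: they are consistent with each connect attempt lasting 9 seconds (3 + 6) and mount printing one message per attempt. The sketch below only reproduces that arithmetic. The function name is made up for illustration, and the 9-second figure is taken from the retry=0 run without Kerberos in this thread, not from the mount.nfs source.

```shell
#!/bin/sh
# total_mount_seconds PER_ATTEMPT RETRYING_LINES
# Hypothetical helper: assumes one connect attempt per "retrying" message,
# plus the final attempt that ends in "giving up".
total_mount_seconds() {
    per_attempt=$1
    retrying_lines=$2
    echo $(( per_attempt * (retrying_lines + 1) ))
}

total_mount_seconds 9 0     # retry=0 without Kerberos: 9   (0m9s)
total_mount_seconds 9 6     # the "[x 6]" run:          63  (1m3s)
total_mount_seconds 9 13    # the "[x 13]" runs:        126 (2m6s)
```

The three results line up with the "real" values reported for those runs, which suggests the total is governed by how many connect attempts are made rather than by the retry= value alone.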

2009-08-13 14:43:53

by Carlos André

Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

2009/8/13 Ian Kent <[email protected]>:
> Carlos André wrote:
>> Today (2009-08-12) I'm using:
>> kernel-2.6.18-128.2.1.el5
>> autofs-5.0.1-0.rc2.102.el5_3.1
>
> Thanks,
>
> My mistake, the wait time I was referring to is used for umounts during
> expires and is present in rev rc2.102.
>
> It shouldn't be hard to add this for mount as well.
> Would you like me to put something together?

Sure! That will help me a lot (and surely other people too) :) Thanks :)

>
> Probably would be good to test something out to see if we can make a
> difference with the killing mount after some configured timeout but, if
> we make progress, probably the best way to deal with it is for you to
> log a bug against rhel-5 so I can get it committed to the rhel package.
> The possible issue is that I'm not sure if the RPC subsystem in the
> above rhel kernel will respond well to process death with potential
> outstanding requests. But we'll see.

Ok, on my way :)

Thanks a lot!
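
The idea discussed above, killing the mount after some configured timeout, can be sketched outside autofs as a small shell wrapper. This is only an illustration under stated assumptions: run_with_timeout is a hypothetical helper, not an autofs or mount option, and whether a mount blocked in the kernel reacts promptly to the signal is exactly the open question raised about the RPC subsystem.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of autofs): starts COMMAND in the background,
# sends it SIGTERM if it is still running after SECONDS, and returns the
# command's exit status (typically 143 = 128 + SIGTERM when it was killed).
run_with_timeout() {
    limit=$1
    shift

    "$@" &
    cmd_pid=$!

    # Watchdog: sleeps for the limit, then signals the command.
    # Its output is discarded so it never holds the caller's pipes open.
    ( sleep "$limit"; kill -TERM "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!

    status=0
    wait "$cmd_pid" || status=$?

    # The command finished or was killed; stop the watchdog if still waiting.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$status"
}

# Example with placeholder values from this thread (left commented out):
# run_with_timeout 15 mount -t nfs4 -o sec=krb5p,proto=tcp,retry=0 \
#     1.2.3.4:/areas/testdown /misc/areas/testdown
```

A non-zero return after the limit has passed means the command did not complete in time. The wrapper sends SIGTERM only; a process stuck in an uninterruptible kernel wait may not exit until the pending connect attempt returns.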


2009-08-07 14:04:25

by J. Bruce Fields

Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
> On Aug. 07, 2009, 3:18 +0300, Carlos André <[email protected]> wrote:
> > Anyone ?
> >
> > 2009/7/29 Carlos André <[email protected]>:
> >> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
> >> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
> >> mount timeout on CentOS 5.3 (updated) NFSv4 client...
> >>
> >> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
> >> user logon hangs. Then i want configure it to timeout (if server down) after
> >> 10-15 secs (MAX) on each mount attempt.
> >>
> >> I already make a lab and tried a LOT of combinations, there my findings
> >> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
> >> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
> >> sec=krb5,proto=<tcp/udp>) from NFS client:
> >>
> >> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
> >> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
> >> NFS server '172.16.0.10' failed: timed out (giving up))
>
> Sounds like you're hitting the server's grace period.

I thought he was describing a situation where the server the server
is completely gone and isn't coming back, and wondering how to make the
mount fail faster. But I may be misunderstanding.

--b.
_______________________________________________
NFSv4 mailing list
[email protected]
http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

2009-08-10 18:29:31

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Bruce, no... you're right. I'm describing a situation where my server
died... i need mount fail faster (10 or 15 secs max) than 3 minutes
and 9 seconds...

2009/8/7 J. Bruce Fields <[email protected]>:
> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>> Sounds like you're hitting the server's grace period.
>
> I thought he was describing a situation where the server
> is completely gone and isn't coming back, and wondering how to make the
> mount fail faster.  But I may be misunderstanding.
>
> --b.
>

2009-08-10 19:43:16

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

No, i'm just using packages from CentOS repo...

And u're right about expo retries... with tcpdump i've monitored
traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
2049...
I tried use "retry=1" option on mount without any change... I dont
want change source or tcp timers... just NFSv4 client.

2009/8/10 Chuck Lever <[email protected]>:
> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>
>> Bruce, no... you're right.  I'm describing a situation where my server
>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>> and 9 seconds...
>
> The 189 second timeout is likely how long it takes the kernel to give up
> trying to connect a TCP socket to the server (6 SYN attempts with
> exponential retries, or something like that).  For stock CentOS 5.3, I think
> user space does only a DNS lookup for normal NFSv4 mounts -- the kernel just
> tries to connect a TCP socket to port 2049, with no preceding rpcbind
> request.
>
> Carlos, let us know if you have replaced any NFS-related CentOS components
> (kernel, nfs-utils) with something you've built yourself.

2009-08-10 20:05:10

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Something funny: Using default tcp_syn_retries (5) i got
"3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
1 i got "3,6,3,6,3,6..." secs interval...

[root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
sec=krb5p,proto=tcp
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 3m9.000s
user 0m0.000s
sys 0m0.002s

[root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
[root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
sec=krb5p,proto=tcp ("retry=1" = no change)
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real 2m6.004s
user 0m0.000s
sys 0m0.004s

(3,6,3,6... secs interval)
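
The intervals reported above fit a plain doubling schedule. As a rough sketch (a model of the observed timings, not kernel code), assume the first SYN waits 3 seconds, which is the value seen in the tcpdump traces in this thread, and that every retransmission doubles the wait. The function names are made up for illustration.

```python
def syn_backoff_intervals(syn_retries, initial_rto=3):
    """Return the wait (in seconds) after each SYN, doubling every time.

    One initial SYN plus `syn_retries` retransmissions are sent, so there
    are `syn_retries + 1` waits before the connect attempt gives up.
    `initial_rto=3` is an assumption taken from the traces in this thread.
    """
    return [initial_rto * 2 ** i for i in range(syn_retries + 1)]


def connect_timeout(syn_retries, initial_rto=3):
    """Total time before one connect attempt to a silent host fails."""
    return sum(syn_backoff_intervals(syn_retries, initial_rto))


for retries in (5, 1):
    waits = syn_backoff_intervals(retries)
    print(f"tcp_syn_retries={retries}: waits={waits} total={connect_timeout(retries)}s")
```

Under these assumptions tcp_syn_retries=5 adds up to 189 seconds, which matches the 3m9s mount failures, and tcp_syn_retries=1 adds up to 9 seconds per connect attempt.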





2009-08-12 15:00:19

by Carlos André

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Hi Ian,
I'm getting crazy trying put "retry=" to work on mount... this option
just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
like you can see on my previous emails...

I appreciate any help.

Carlos.


2009/8/12 Ian Kent <[email protected]>:
> Chuck Lever wrote:
>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>> This long timeout is good if workstation need mount a critical
>>> directory using /etc/fstab on boot (for example)..
>>> But in my case, using this loooong timeout doesnt make any sense,
>>> since autofs retry mount directory on-access. This in fact gives me
>>> alot of headaches, coz user login 'll just hangs if one server goes
>>> down for any reason, and will again hangs if user try access directory
>>> pointing to a NFS down server...
>>
>> "retry=0" means the mount command will fail as soon as the first
>> mount(2) system call fails.  When you set SYN retries to 1, this means
>> after 9 seconds, the connect fails, and that causes the mount(2) system
>> call to fail.
>>
>> Recent conversations with Ian suggested that a long timeout was desired
>> for automounter as well as other cases.  Ian, is there something else we
>> need to consider to determine the correct retry timeout for NFS/TCP
>> mount points handled via automounter?  How should mount.nfs wait so we
>> don't make other use cases worse?  (Looks like most of the history is
>> intact below).
>
> Of course we know that autofs is entirely at the mercy of mount(8) (and
> mount.nfs in particular). This has always been a difficult situation for
> the automounter because interactive mount invocations should wait. But I
> believe automount mounts should always time out quickly, but that leads
> to its own set of problems, especially when home directories are concerned.
>
> I think adding "retry=0" is the right thing to do myself but I'm not
> certain that will work as we expect. I'll have to do some experimentation.
>
>>
>> How long do you think is appropriate for the automounter to wait if the
>> server is down, in your case, Carlos?
>>
>>> Am losing something or there have was something weirdo...!?
>>> ------------------------------------------------
>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries  [DEFAULT]
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    3m9.000s
>>> user    0m0.002s
>>> sys     0m0.001s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    3m9.000s
>>> user    0m0.000s
>>> sys     0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    3m9.001s
>>> user    0m0.000s
>>> sys     0m0.003s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    3m9.001s
>>> user    0m0.002s
>>> sys     0m0.001s
>>>
>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    1m3.002s
>>> user    0m0.000s
>>> sys     0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    2m6.000s
>>> user    0m0.000s
>>> sys     0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    0m9.003s
>>> user    0m0.001s
>>> sys     0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real    2m6.001s
>>> user    0m0.001s
>>> sys     0m0.002s
>>> [root@KSTATION ~]#
>>> ------------------------------------------------
>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>> using retry=0 without kerberos I got only 9s...
>>>
>>> *sigh*
>>>
>>>
>>>
>>> 2009/8/10 Chuck Lever <[email protected]>:
>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>
>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>
>>>> Right.  Normally the RPC client calls the kernel's socket connect function,
>>>> which does 6 SYN retries.  That one call usually takes longer than the RPC
>>>> client's connect timeout, so it only makes one connect call, and then fails.
>>>>
>>>> Reducing the number of SYN retries per connect attempt causes the RPC client
>>>> to retry the connect call until its connect timeout expires.  Each connect
>>>> call resets the SYN timeout to 3 seconds.
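
Chuck's explanation can be checked against the timings Carlos posted with tcp_syn_retries=1. The sketch below is only arithmetic on the numbers in this thread: each "retrying" line plus the final "giving up" line is counted as one 9-second connect attempt. The 60 and 120 second deadlines are guesses that happen to fit those numbers; they are not taken from the mount.nfs or kernel sources, and the function name is hypothetical.

```python
PER_ATTEMPT = 3 + 6  # seconds per connect attempt with tcp_syn_retries=1 (from the thread)


def attempts_until_deadline(deadline_s, per_attempt_s):
    """Count back-to-back connect attempts until a deadline check stops the loop.

    The deadline is only checked after an attempt has failed, so the last
    attempt may overrun it.
    """
    elapsed = 0
    attempts = 0
    while elapsed < deadline_s:
        elapsed += per_attempt_s
        attempts += 1
    return attempts, elapsed


# Observed in the thread: ("retrying" lines + 1 "giving up" line, wall-clock seconds).
observed = {
    "proto=tcp,retry=1": (6 + 1, 63),
    "sec=krb5p,proto=tcp,retry=1": (13 + 1, 126),
}

for options, (attempts, total_s) in observed.items():
    # Each attempt costing 9 seconds accounts for the whole run.
    print(options, attempts * PER_ATTEMPT == total_s)

# Hypothetical deadlines (assumptions, not documented values).
print(attempts_until_deadline(60, PER_ATTEMPT))
print(attempts_until_deadline(120, PER_ATTEMPT))
```

If those guessed deadlines are anywhere near right, the 1m3s and 2m6s results are simply 7 and 14 nine-second connects in a row, which is consistent with the "3,6,3,6..." pattern seen in tcpdump.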

2009-08-12 15:20:17

by Ian Kent

[permalink] [raw]
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Carlos André wrote:
> Hi Ian,
> I'm getting crazy trying put "retry=" to work on mount... this option
> just DONT WORK if use proto=tcp and/OR kerberos (sec=krb5/krb5i/krb5p)
> like you can see on my previous emails...

Right, my mistake for not looking closely enough at post.

Maybe this is related to the same sort of problem we had with mount in
the past, before the options parsing went into the kernel, where other
services, like portmapper (or rpcbind), were being done with different
timeout parameters before the RPC calls for mounting. That's just an
example as NFSv4 shouldn't be sensitive to portmapper anyway.

But what version of autofs and kernel did you say you were using?

>
> I appreciate any help.
>
> Carlos.