2002-05-13 19:53:16

by Ion Badulescu

Subject: Re: NFS retry on disconnection

On Mon, 13 May 2002 09:17:54 -0700, Bryan Henderson <[email protected]> wrote:

>>If you are adamant that you want all the aggravation of data loss,
>>programs crashing, user complaints, etc etc though, you can use the
>>'soft' mount option.
>
> You sound like someone who has not faced the aggravation of resources that
> are hung indefinitely because communication has been lost with an NFS
> server which is no longer of any relevance to anything. I'd say in many
> cases that must outweigh the aggravation of data loss and programs crashing
> and cause more user complaints. Or do you have a way besides timeouts to
> ease that aggravation?

It's a lose-lose situation, and really the only proper way to fix the mess
is to bring the downed server back up.

Having NFS time out underneath you is exactly the same as having a hard
drive fail underneath you. Most, if not all, applications are utterly
unprepared to deal with the resulting I/O errors, thus guaranteeing
data corruption.

Mounting 'soft' is only acceptable for read-only mounts, and even then
only if you really don't care about your input data disappearing at random.
That's AT RANDOM -- not a typo: you will get random failures, depending
on how loaded your network and/or server happen to be at the time.
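
To make that concrete, here is a minimal sketch (mine, not from any posting
or man page) of the checking a writer would need before 'soft' could even be
debated for writable data: every write() can fail or come up short, and with
client-side caching the error (typically EIO) may not show up until fsync()
or close().

/* Sketch only, not from the posting or the man pages.  The bare minimum a
 * writer needs before "soft" is even debatable: check every write(), cope
 * with short writes, and check fsync()/close(), where cached NFS writes
 * often report their errors. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* unrelated signal: just retry */
            return -1;                 /* e.g. EIO after a soft-mount timeout */
        }
        buf += n;                      /* short write: keep going */
        len -= (size_t)n;
    }
    return 0;
}

int main(int argc, char **argv)
{
    const char msg[] = "important data\n";
    int fd;

    if (argc < 2)
        return 1;
    if ((fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
        return 1;

    if (write_all(fd, msg, sizeof msg - 1) < 0 ||
        fsync(fd) < 0 ||               /* deferred writes can fail here... */
        close(fd) < 0) {               /* ...or even here */
        fprintf(stderr, "%s: %s\n", argv[1], strerror(errno));
        return 1;
    }
    return 0;
}

Almost nothing does all of that, which is why a soft mount that times out
halfway through a file means silent corruption.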

> Though you didn't answer the question, your advice implies that the Linux
> NFSv3 filesystem driver retries forever when communication with the NFS
> server has been lost. If not, please correct us.

The default is 'hard' which will retry forever.

You can mount with '-o soft' which will give you the desired data corruption.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.



2002-05-13 21:03:43

by Greg Lindahl

Subject: Re: Re: NFS retry on disconnection

On Mon, May 13, 2002 at 03:53:00PM -0400, Ion Badulescu wrote:

> > Though you didn't answer the question, your advice implies that the Linux
> > NFSv3 filesystem driver retries forever when communication with the NFS
> > server has been lost. If not, please correct us.
>
> The default is 'hard' which will retry forever.
>
> You can mount with '-o soft' which will give you the desired data corruption.

Linux NFS isn't any different from any other OS in its treatment of
"soft" and "hard", although I guess an OS could default to "soft",
which would be a bad idea.

"Best practices" is to mount "hard,intr" so the user can interrupt the
process if they wish to... otherwise it will wait until the problem is
fixed. You should also use automounts to ensure that a minimum number
of filesystems are mounted, especially if you have a complicated setup
with many infrequently-used filesystems.
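
At the application level, "interrupt the process" means the stuck operation
returns EINTR (see nfs(5)). A minimal, purely illustrative sketch of a
program that at least notices that, instead of carrying on as if the data
had arrived:

/* Sketch only: a read on a "hard,intr" mount can come back with EINTR when
 * the user interrupts it (or with EIO on a soft mount).  The point is simply
 * that the caller must look at errno and treat the data as missing, not
 * assume the read "mostly worked". */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[4096];
    ssize_t n;
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
        return 1;

    n = read(fd, buf, sizeof buf);
    if (n < 0 && errno == EINTR)
        fprintf(stderr, "%s: interrupted, data not read\n", argv[1]);
    else if (n < 0)
        fprintf(stderr, "%s: %s\n", argv[1], strerror(errno));
    else
        printf("read %zd bytes\n", n);

    close(fd);
    return n < 0;
}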

greg



2002-05-14 06:37:33

by Peter Astrand

Subject: Re: Re: NFS retry on disconnection


>Linux NFS isn't any different from any other OS in its treatment of
>"soft" and "hard", although I guess an OS could default to "soft",
>which would be a bad idea.
>
>"Best practices" is to mount "hard,intr" so the user can interrupt the
>process if they wish to... otherwise it will wait until the problem is
>fixed. You should also use automounts to ensure that a minimum number

One thing that keeps annoying me is that "intr" only allows interrupting
the file operation when a major timeout happens. I want to be able to
interrupt at any time (as far as I know, Solaris works like this). Any
good reasons for only interrupting at major timeouts?

--
/Peter Åstrand <[email protected]>





2002-05-14 08:17:08

by Trond Myklebust

Subject: Re: Re: NFS retry on disconnection

>>>>> " " == astrand <Peter> writes:


> One thing that keeps annoying me is that "intr" only allows
> interrupting the file operation when a major timeout happens. I

Nope. Major, minor, it all goes through the same code and both can be
interrupted. What can happen, though, is that one process could
actually be waiting on another process.

If, say, they are both waiting to read data from the same page, then
only one process actually does the RPC call. The VFS/MM layers will
put the other process to sleep in the global function 'lock_page()'.
That unfortunately means that it cannot be interrupted, since 'lock_page()'
does not do interruptible sleeps.
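
A purely user-space analogy (pthread mutexes, nothing to do with the actual
kernel code): the thread holding the lock plays the process doing the RPC,
and a signal sent to the blocked thread runs its handler and then the wait
simply continues, just as a signal cannot get a process out of 'lock_page()'.

/* Illustrative only; nothing to do with the NFS client or the VM layer.
 * Thread A holds the lock (it is "doing the RPC"); thread B blocks on the
 * same lock.  Like lock_page(), pthread_mutex_lock() does not sleep
 * interruptibly: a signal delivered to the blocked thread runs its handler
 * and the wait then continues.  Build with: cc -pthread */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

static void handler(int sig)
{
    static const char msg[] = "signal delivered, but the mutex wait goes on\n";
    (void)sig;
    (void)write(STDOUT_FILENO, msg, sizeof msg - 1);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&page_lock);    /* sleeps here; the signal won't break it */
    puts("waiter: ran only after the lock holder released the lock");
    pthread_mutex_unlock(&page_lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    signal(SIGUSR1, handler);
    pthread_mutex_lock(&page_lock);    /* play the process doing the RPC */
    pthread_create(&tid, NULL, waiter, NULL);

    sleep(1);                          /* let the waiter block on the lock */
    pthread_kill(tid, SIGUSR1);        /* handler runs, waiter stays blocked */
    sleep(1);

    pthread_mutex_unlock(&page_lock);  /* the "RPC" finally completes */
    pthread_join(tid, NULL);
    return 0;
}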

Cheers,
Trond


2002-05-14 08:39:03

by Peter Astrand

Subject: Re: Re: NFS retry on disconnection

> > One thing that keeps annoying me is that "intr" only allows
> > interrupting the file operation when a major timeout happens. I
>
> Nope. Major, minor, it all goes through the same code and both can be
> interrupted.

Does this mean that the man page nfs(5) is incorrect? It says:

"intr

If an NFS file operation has a major timeout and it is hard mounted, then
allow signals to interrupt the file operation and cause it to return EINTR
to the calling program."

> If, say, they are both waiting to read data from the same page, then
> only one process actually does the RPC call. The VFS/MM layers will
> put the other process to sleep in the global function 'lock_page()'.
> That unfortunately means that it cannot be interrupted, since 'lock_page()'
> does not do interruptible sleeps.

I see. And this is not possible to change?


--
/Peter Åstrand <[email protected]>





2002-05-14 09:31:46

by Trond Myklebust

Subject: Re: Re: NFS retry on disconnection

>>>>> " " == Peter Astrand <[email protected]> writes:

> Does this mean that the man page nfs(5) is incorrect? It says:

> "intr

> If an NFS file operation has a major timeout and it is hard
> mounted, then allow signals to interrupt the file operation and
> cause it to return EINTR to the calling program."

That looks to be incorrect, yes.

>> function 'lock_page()'. That unfortunately means that it
>> cannot be interrupted, since 'lock_page()' does not do interruptible
>> sleeps.

> I see. And this is not possible to change?

Not without great effort, and it would demand considerable auditing to
determine exactly where interruptible sleeps make sense.

A better alternative might be to use asynchronous I/O (as in the POSIX
'aio' library functions). There are patches floating around for that
sort of thing.
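
For reference, a minimal sketch using the POSIX interfaces (illustrative
only, and with the caveat that stock glibc emulates aio with helper threads
that still block, so the in-kernel patches are what would make this
genuinely useful): submit the read, poll for completion, and cancel after a
deadline instead of hanging forever in read().

/* Sketch only; link with -lrt.  Submit an asynchronous read and give up
 * after a deadline instead of blocking indefinitely on a dead server. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[4096];
    struct aiocb cb;
    int fd, err, waited;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
        return 1;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) < 0)
        return 1;

    /* Poll for up to ~10 seconds, then cancel rather than hang. */
    for (waited = 0; (err = aio_error(&cb)) == EINPROGRESS && waited < 10; waited++)
        sleep(1);

    if (err == EINPROGRESS) {
        aio_cancel(fd, &cb);
        fprintf(stderr, "%s: server not responding, giving up\n", argv[1]);
        return 1;
    }
    if (err != 0) {
        fprintf(stderr, "aio_read: %s\n", strerror(err));
        return 1;
    }
    printf("read %zd bytes\n", aio_return(&cb));
    return 0;
}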

Cheers,
Trond


2002-05-14 22:52:46

by Ion Badulescu

Subject: Re: Re: NFS retry on disconnection

On 14 May 2002 10:16:59 +0200, Trond Myklebust <[email protected]> wrote:

> Nope. Major, minor, it all goes through the same code and both can be
> interrupted. What can happen, though, is that one process could
> actually be waiting on another process.

What I've noticed is that, more often than not, the hanging processes
end up waiting for rpciod. Sending a SIGKILL to rpciod unhangs those
processes, but unfortunately only root can do that...

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
