2007-08-20 23:01:15

by Robin Lee Powell

Subject: NFS hang + umount -f: better behaviour requested.

(cc's to me appreciated)

It would be really, really nice if "umount -f" against a hung NFS
mount actually worked on Linux. As much as I hate Solaris, I
consider it the gold standard in this case: If I say
"umount -f /mount/that/is/hung" it just goes away, immediately, and
anything still trying to use it dies (with EIO, I'm told).

If I know the NFS server is down, that really is the correct
behaviour. I very much want this behaviour, and am willing to
bribe/pay for it, although my resources are limited.

Unless you're interested in details of my tests, stop here.

I'm bringing this up again (I know it's been mentioned here before)
because I had been told that NFS support had gotten better in Linux
recently, so I have been (for my $dayjob) testing the behaviour of
NFS (autofs NFS, specifically) under Linux with hard,intr and using
iptables to simulate a hang. fuser hangs, as far as I can tell
indefinitely, as does lsof. umount -f returns after a long time with
"busy", umount -l works after a long time but leaves the system in a
very unfortunate state such that I have to kill things by hand and
manually edit /etc/mtab to get autofs to work again.
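For anyone who wants to reproduce this, a rough sketch of the iptables trick (the server address is a placeholder; this needs root on the client):

```sh
# Simulate a dead NFS server by silently dropping all traffic to/from
# it, which makes hard-mounted NFS requests hang much like a crash.
# 192.0.2.10 is a placeholder address; substitute your NFS server.
SERVER=192.0.2.10

iptables -I OUTPUT -d "$SERVER" -j DROP
iptables -I INPUT  -s "$SERVER" -j DROP

# ... now exercise fuser, lsof, umount -f, umount -l on the hung mount ...

# Undo the rules to bring the "server" back.
iptables -D OUTPUT -d "$SERVER" -j DROP
iptables -D INPUT  -s "$SERVER" -j DROP
```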

The "correct solution" to this situation according to
http://nfs.sourceforge.net/ is cycles of "kill processes" and
"umount -f". This has two problems: 1. It sucks. 2. If fuser
and lsof both hang (and they do: fuser has been stuck in
stat("/home/rpowell/") for > 30 minutes now), I have no way to
pick which processes to kill.

I've read every man page I could find, and the only nfs option that
seems even vaguely helpful is "soft", but everything that mentions
"soft" also says to never use it.

This is the single worst aspect of adminning a Linux system that I,
as a career sysadmin, have to deal with. In fact, it's really the
only one I even dislike. At my current work place, we've lost
multiple person-days to this issue, having to go around and reboot
every Linux box that was hanging off a down NFS server.

I know many other admins who also really want Solaris style
"umount -f"; I'm sure if I passed the hat I could get a decent
bounty together for this feature; let me know if you're interested.

Thanks.

-Robin

--
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/


2007-08-20 23:27:21

by NeilBrown

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Monday August 20, [email protected] wrote:
> (cc's to me appreciated)
>
> It would be really, really nice if "umount -f" against a hung NFS
> mount actually worked on Linux. As much as I hate Solaris, I
> consider it the gold standard in this case: If I say
> "umount -f /mount/that/is/hung" it just goes away, immediately, and
> anything still trying to use it dies (with EIO, I'm told).

Have you tried "umount -l"? How far is that from your requirements?

Alternately:
mount --move /problem/path /somewhere/else
umount -f /somewhere/else
umount -l /somewhere/else

might be a little closer to what you want.

Though I agree that it would be nice if we could convince all
subsequent requests to a server to fail EIO instead of just the
currently active ones. I'm not sure that just changing "umount -f" is
the right interface though.... Maybe if all the server handles
appeared in sysfs and had an attribute which you could set to cause
all requests to fail...

NeilBrown

2007-08-20 23:34:27

by Robin Lee Powell

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, Aug 21, 2007 at 09:27:06AM +1000, Neil Brown wrote:
> On Monday August 20, [email protected] wrote:
> > (cc's to me appreciated)
> >
> > It would be really, really nice if "umount -f" against a hung
> > NFS mount actually worked on Linux. As much as I hate Solaris,
> > I consider it the gold standard in this case: If I say "umount
> > -f /mount/that/is/hung" it just goes away, immediately, and
> > anything still trying to use it dies (with EIO, I'm told).
>
> Have you tried "umount -l"? How far is that from your
> requirements?

I actually talked about that further down. The short version: quite
far.

The long version:

It leaves a bunch of hung processes, with no real way for me to
determine which processes are hung on the now-non-existent mount,
and (at least with autofs) it leaves /etc/mtab in an inconsistent
state, so I had to edit it to restart autofs. Only a mild
improvement on rebooting, says I.

Also, it took a really long time (minutes) to return.

> Alternately:
> mount --move /problem/path /somewhere/else
> umount -f /somewhere/else
> umount -l /somewhere/else
>
> might be a little closer to what you want.

I don't think that would solve the problem: the umount -f would
still hang and eventually return busy, fuser would still hang, and
umount -l would still leave inconsistent crap lying around.

> Though I agree that it would be nice if we could convince all
> subsequent requests to a server to fail EIO instead of just the
> currently active ones. I'm not sure that just changing "umount
> -f" is the right interface though.... Maybe if all the server
> handles appeared in sysfs and have an attribute which you could
> set to cause all requests to fail...

I have no opinion on interface details, I simply know that on
Solaris, "umount -f" Just Works, and I would love to have similar
behaviour on Linux.

-Robin

--
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/

2007-08-21 01:55:19

by Salah Coronya

Subject: Re: NFS hang + umount -f: better behaviour requested.

Robin Lee Powell <rlpowell <at> digitalkingdom.org> writes:


> > Though I agree that it would be nice if we could convince all
> > subsequent requests to a server to fail EIO instead of just the
> > currently active ones. I'm not sure that just changing "umount
> > -f" is the right interface though.... Maybe if all the server
> > handles appeared in sysfs and have an attribute which you could
> > set to cause all requests to fail...
>
> I have no opinion on interface details, I simply know that on
> Solaris, "umount -f" Just Works, and I would love to have similar
> behaviour on Linux.
>
> -Robin
>

What you are looking for is revoke()/frevoke(), which will yank the
file right out from under the descriptor. It's currently in -mm. Of
course, "umount" will still need to iterate over each open file on the
mount and revoke it.



2007-08-21 16:44:07

by John Stoffel

Subject: Re: NFS hang + umount -f: better behaviour requested.


Robin> I'm bringing this up again (I know it's been mentioned here
Robin> before) because I had been told that NFS support had gotten
Robin> better in Linux recently, so I have been (for my $dayjob)
Robin> testing the behaviour of NFS (autofs NFS, specifically) under
Robin> Linux with hard,intr and using iptables to simulate a hang.

So why are you mounting with hard,intr semantics? At my current
SysAdmin job, we mount everything (solaris included) with 'soft,intr'
and it works well. If an NFS server goes down, clients don't hang for
large periods of time.

Robin> fuser hangs, as far as I can tell indefinitely, as does
Robin> lsof. umount -f returns after a long time with "busy", umount
Robin> -l works after a long time but leaves the system in a very
Robin> unfortunate state such that I have to kill things by hand and
Robin> manually edit /etc/mtab to get autofs to work again.

Robin> The "correct solution" to this situation according to
Robin> http://nfs.sourceforge.net/ is cycles of "kill processes" and
Robin> "umount -f". This has two problems: 1. It sucks. 2. If fuser
Robin> and lsof both hang (and they do: fuser has been stuck in
Robin> stat("/home/rpowell/") for > 30 minutes now), I have no way to
Robin> pick which processes to kill.

Robin> I've read every man page I could find, and the only nfs option
Robin> that seems even vaguely helpful is "soft", but everything that
Robin> mentions "soft" also says to never use it.

I think the man pages are out of date, or ignoring reality. Try
mounting with soft,intr and see how it works for you. I think you'll
be happy.

Robin> This is the single worst aspect of adminning a Linux system that I,
Robin> as a career sysadmin, have to deal with. In fact, it's really the
Robin> only one I even dislike. At my current work place, we've lost
Robin> multiple person-days to this issue, having to go around and reboot
Robin> every Linux box that was hanging off a down NFS server.

Robin> I know many other admins who also really want Solaris style
Robin> "umount -f"; I'm sure if I passed the hat I could get a decent
Robin> bounty together for this feature; let me know if you're interested.

Robin> Thanks.

Robin> -Robin

Robin> --
Robin> http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Robin> Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Robin> Proud Supporter of the Singularity Institute - http://singinst.org/

2007-08-21 16:55:57

by J. Bruce Fields

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, Aug 21, 2007 at 12:43:47PM -0400, John Stoffel wrote:
> Robin> I've read every man page I could find, and the only nfs option
> Robin> that seems even vaguely helpful is "soft", but everything that
> Robin> mentions "soft" also says to never use it.
>
> I think the man pages are out of date, or ignoring reality.

No. The price of using "soft" is the chance of data corruption, since
an application may for example be left thinking that a write has
succeeded when it hasn't. See

http://nfs.sourceforge.net/#faq_e4

--b.

2007-08-21 17:02:14

by Peter Staubach

Subject: Re: NFS hang + umount -f: better behaviour requested.

John Stoffel wrote:
> Robin> I'm bringing this up again (I know it's been mentioned here
> Robin> before) because I had been told that NFS support had gotten
> Robin> better in Linux recently, so I have been (for my $dayjob)
> Robin> testing the behaviour of NFS (autofs NFS, specifically) under
> Robin> Linux with hard,intr and using iptables to simulate a hang.
>
> > So why are you mounting with hard,intr semantics? At my current
> SysAdmin job, we mount everything (solaris included) with 'soft,intr'
> and it works well. If an NFS server goes down, clients don't hang for
> large periods of time.
>
>

Wow! That's _really_ a bad idea. NFS READ operations which
time out can lead to executables which mysteriously fail, file
corruption, etc. NFS WRITE operations which fail may or may
not lead to file corruption.

Anything writable should _always_ be mounted "hard" for safety
purposes. Readonly mounted file systems _may_ be mounted "soft",
depending upon what is located on them.

> Robin> fuser hangs, as far as I can tell indefinitely, as does
> Robin> lsof. umount -f returns after a long time with "busy", umount
> Robin> -l works after a long time but leaves the system in a very
> Robin> unfortunate state such that I have to kill things by hand and
> Robin> manually edit /etc/mtab to get autofs to work again.
>
> Robin> The "correct solution" to this situation according to
> Robin> http://nfs.sourceforge.net/ is cycles of "kill processes" and
> Robin> "umount -f". This has two problems: 1. It sucks. 2. If fuser
> Robin> and lsof both hang (and they do: fuser has been stuck in
> Robin> stat("/home/rpowell/") for > 30 minutes now), I have no way to
> Robin> pick which processes to kill.
>
> Robin> I've read every man page I could find, and the only nfs option
> Robin> that seems even vaguely helpful is "soft", but everything that
> Robin> mentions "soft" also says to never use it.
>
> I think the man pages are out of date, or ignoring reality. Try
> mounting with soft,intr and see how it works for you. I think you'll
> be happy.
>
>

Please don't. You will end up regretting it in the long run.
Taking a chance on corrupted data or critical applications which
just fail is not worth the benefit.

It would be safer for us to implement something which works like
the Solaris forced umount support for NFS.

Thanx...

ps

> Robin> This is the single worst aspect of adminning a Linux system that I,
> Robin> as a career sysadmin, have to deal with. In fact, it's really the
> Robin> only one I even dislike. At my current work place, we've lost
> Robin> multiple person-days to this issue, having to go around and reboot
> Robin> every Linux box that was hanging off a down NFS server.
>
> Robin> I know many other admins who also really want Solaris style
> Robin> "umount -f"; I'm sure if I passed the hat I could get a decent
> Robin> bounty together for this feature; let me know if you're interested.
>
> Robin> Thanks.
>
> Robin> -Robin
>
> Robin> --
> Robin> http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
> Robin> Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
> Robin> Proud Supporter of the Singularity Institute - http://singinst.org/

2007-08-21 17:14:21

by Chakri n

Subject: Re: NFS hang + umount -f: better behaviour requested.

To add to the pain, lsof or fuser hang on unresponsive shares.

I wrote my own wrapper to go through the "/proc/<pid>" file tables and
find any process using the unresponsive mounts and kill those
processes. This works well.
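A sketch of that sort of wrapper (the function name is made up; it reads only the /proc symlinks via readlink, so unlike fuser/lsof it never touches the dead mount itself):

```sh
# List PIDs whose cwd, root, or any open fd lives under the given
# mount point, by reading /proc/<pid> symlinks only. readlink never
# stats the link target, so a hung NFS mount cannot block it.
procs_on_mount() {
    mnt=$1
    for pid_dir in /proc/[0-9]*; do
        pid=${pid_dir#/proc/}
        for link in "$pid_dir/cwd" "$pid_dir/root" "$pid_dir"/fd/*; do
            # Unreadable or vanished entries are silently skipped.
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$mnt"|"$mnt"/*) echo "$pid"; break ;;
            esac
        done
    done | sort -un
}
```

With the PID list in hand, "kill -9" plus "umount -f" becomes scriptable, e.g. procs_on_mount /home/rpowell | xargs -r kill -9.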

Also, it brings up another point. If the unresponsiveness problem
cannot be fixed for NFS data-corruption reasons, is it possible for a
mount to have both soft & hard semantics? Some processes might want to
use the mount point soft and other processes hard. This can be
implemented easily in the NFS & SUNRPC layers by adding timeouts to
requests, but it becomes tricky in the VFS layer. If a soft process is
waiting on an inode locked by a hard process, the soft process gets
hard semantics too.

Thanks
--Chakri

On 8/21/07, Peter Staubach <[email protected]> wrote:
> John Stoffel wrote:
> > Robin> I'm bringing this up again (I know it's been mentioned here
> > Robin> before) because I had been told that NFS support had gotten
> > Robin> better in Linux recently, so I have been (for my $dayjob)
> > Robin> testing the behaviour of NFS (autofs NFS, specifically) under
> > Robin> Linux with hard,intr and using iptables to simulate a hang.
> >
> > So why are you mounting with hard,intr semantics? At my current
> > SysAdmin job, we mount everything (solaris included) with 'soft,intr'
> > and it works well. If an NFS server goes down, clients don't hang for
> > large periods of time.
> >
> >
>
> Wow! That's _really_ a bad idea. NFS READ operations which
> timeout can lead to executables which mysteriously fail, file
> corruption, etc. NFS WRITE operations which fail may or may
> not lead to file corruption.
>
> Anything writable should _always_ be mounted "hard" for safety
> purposes. Readonly mounted file systems _may_ be mounted "soft",
> depending upon what is located on them.
>
> > Robin> fuser hangs, as far as I can tell indefinitely, as does
> > Robin> lsof. umount -f returns after a long time with "busy", umount
> > Robin> -l works after a long time but leaves the system in a very
> > Robin> unfortunate state such that I have to kill things by hand and
> > Robin> manually edit /etc/mtab to get autofs to work again.
> >
> > Robin> The "correct solution" to this situation according to
> > Robin> http://nfs.sourceforge.net/ is cycles of "kill processes" and
> > Robin> "umount -f". This has two problems: 1. It sucks. 2. If fuser
> > Robin> and lsof both hang (and they do: fuser has been stuck in
> > Robin> stat("/home/rpowell/") for > 30 minutes now), I have no way to
> > Robin> pick which processes to kill.
> >
> > Robin> I've read every man page I could find, and the only nfs option
> > Robin> that seems even vaguely helpful is "soft", but everything that
> > Robin> mentions "soft" also says to never use it.
> >
> > I think the man pages are out of date, or ignoring reality. Try
> > mounting with soft,intr and see how it works for you. I think you'll
> > be happy.
> >
> >
>
> Please don't. You will end up regretting it in the long run.
> Taking a chance on corrupted data or critical applications which
> just fail is not worth the benefit.
>
> It would be safer for us to implement something which works like
> the Solaris forced umount support for NFS.
>
> Thanx...
>
> ps
>
> > Robin> This is the single worst aspect of adminning a Linux system that I,
> > Robin> as a career sysadmin, have to deal with. In fact, it's really the
> > Robin> only one I even dislike. At my current work place, we've lost
> > Robin> multiple person-days to this issue, having to go around and reboot
> > Robin> every Linux box that was hanging off a down NFS server.
> >
> > Robin> I know many other admins who also really want Solaris style
> > Robin> "umount -f"; I'm sure if I passed the hat I could get a decent
> > Robin> bounty together for this feature; let me know if you're interested.
> >
> > Robin> Thanks.
> >
> > Robin> -Robin
> >
> > Robin> --
> > Robin> http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
> > Robin> Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
> > Robin> Proud Supporter of the Singularity Institute - http://singinst.org/

2007-08-21 17:14:40

by Robin Lee Powell

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, Aug 21, 2007 at 01:01:44PM -0400, Peter Staubach wrote:
> John Stoffel wrote:
> >Robin> I'm bringing this up again (I know it's been mentioned here
> >Robin> before) because I had been told that NFS support had gotten
> >Robin> better in Linux recently, so I have been (for my $dayjob)
> >Robin> testing the behaviour of NFS (autofs NFS, specifically) under
> >Robin> Linux with hard,intr and using iptables to simulate a hang.
> >
> >So why are you mounting with hard,intr semantics? At my current
> >SysAdmin job, we mount everything (solaris included) with
> >'soft,intr' and it works well. If an NFS server goes down,
> >clients don't hang for large periods of time.
>
> Wow! That's _really_ a bad idea. NFS READ operations which
> timeout can lead to executables which mysteriously fail, file
> corruption, etc. NFS WRITE operations which fail may or may not
> lead to file corruption.
>
> Anything writable should _always_ be mounted "hard" for safety
> purposes. Readonly mounted file systems _may_ be mounted "soft",
> depending upon what is located on them.

Does write + tcp make this any different?

-Robin

--
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/

2007-08-21 17:19:18

by Peter Staubach

Subject: Re: NFS hang + umount -f: better behaviour requested.

Robin Lee Powell wrote:
> On Tue, Aug 21, 2007 at 01:01:44PM -0400, Peter Staubach wrote:
>
>> John Stoffel wrote:
>>
>>> Robin> I'm bringing this up again (I know it's been mentioned here
>>> Robin> before) because I had been told that NFS support had gotten
>>> Robin> better in Linux recently, so I have been (for my $dayjob)
>>> Robin> testing the behaviour of NFS (autofs NFS, specifically) under
>>> Robin> Linux with hard,intr and using iptables to simulate a hang.
>>>
>>> So why are you mounting with hard,intr semantics? At my current
>>> SysAdmin job, we mount everything (solaris included) with
>>> 'soft,intr' and it works well. If an NFS server goes down,
>>> clients don't hang for large periods of time.
>>>
>> Wow! That's _really_ a bad idea. NFS READ operations which
>> timeout can lead to executables which mysteriously fail, file
>> corruption, etc. NFS WRITE operations which fail may or may not
>> lead to file corruption.
>>
>> Anything writable should _always_ be mounted "hard" for safety
>> purposes. Readonly mounted file systems _may_ be mounted "soft",
>> depending upon what is located on them.
>>
>
> Does write + tcp make this any different?

Nope...

TCP may make a difference if the problem is related to the network
being slow or lossy, but will not affect anything if the server
is just slow or down. Even if TCP would have eventually gotten
all of the packets in a request or response through, the client
may time out, cease waiting, and corruption may occur again.

ps

2007-08-21 18:51:00

by John Stoffel

Subject: Re: NFS hang + umount -f: better behaviour requested.

>>>>> "Peter" == Peter Staubach <[email protected]> writes:

Peter> John Stoffel wrote:
Robin> I'm bringing this up again (I know it's been mentioned here
Robin> before) because I had been told that NFS support had gotten
Robin> better in Linux recently, so I have been (for my $dayjob)
Robin> testing the behaviour of NFS (autofs NFS, specifically) under
Robin> Linux with hard,intr and using iptables to simulate a hang.
>>
>> So why are you mounting with hard,intr semantics? At my current
>> SysAdmin job, we mount everything (solaris included) with 'soft,intr'
>> and it works well. If an NFS server goes down, clients don't hang for
>> large periods of time.

Peter> Wow! That's _really_ a bad idea. NFS READ operations which
Peter> timeout can lead to executables which mysteriously fail, file
Peter> corruption, etc. NFS WRITE operations which fail may or may
Peter> not lead to file corruption.

Peter> Anything writable should _always_ be mounted "hard" for safety
Peter> purposes. Readonly mounted file systems _may_ be mounted
Peter> "soft", depending upon what is located on them.

Not in my experience. We use NetApps as our backing NFS servers, so
maybe my experience isn't totally relevant. But with a mix of Linux
and Solaris clients, we've never had problems with soft,intr on our
NFS clients.

We also don't see file corruption, mysterious executables failing to
run, etc.

Now maybe those issues are raised when you have a Linux NFS server
with Solaris clients. But in my book, reliable NFS servers are key,
and if they are reliable, 'soft,intr' works just fine.

Now maybe if we had NFS exported directories everywhere, and stuff
cross mounted all over the place with autofs, then we might change our
minds.

In any case, I don't disagree with the fundamental request to make
the NFS client code on Linux easier to work with. I bet Trond (who
works at NetApp) will have something to say on this issue.


John

2007-08-21 19:05:22

by Peter Staubach

Subject: Re: NFS hang + umount -f: better behaviour requested.

John Stoffel wrote:
>>>>>> "Peter" == Peter Staubach <[email protected]> writes:
>>>>>>
>
> Peter> John Stoffel wrote:
> Robin> I'm bringing this up again (I know it's been mentioned here
> Robin> before) because I had been told that NFS support had gotten
> Robin> better in Linux recently, so I have been (for my $dayjob)
> Robin> testing the behaviour of NFS (autofs NFS, specifically) under
> Robin> Linux with hard,intr and using iptables to simulate a hang.
>
>>> So why are you mounting with hard,intr semantics? At my current
>>> SysAdmin job, we mount everything (solaris included) with 'soft,intr'
>>> and it works well. If an NFS server goes down, clients don't hang for
>>> large periods of time.
>>>
>
> Peter> Wow! That's _really_ a bad idea. NFS READ operations which
> Peter> timeout can lead to executables which mysteriously fail, file
> Peter> corruption, etc. NFS WRITE operations which fail may or may
> Peter> not lead to file corruption.
>
> Peter> Anything writable should _always_ be mounted "hard" for safety
> Peter> purposes. Readonly mounted file systems _may_ be mounted
> Peter> "soft", depending upon what is located on them.
>
> Not in my experience. We use NetApps as our backing NFS servers, so
> maybe my experience isn't totally relevant. But with a mix of Linux
> and Solaris clients, we've never had problems with soft,intr on our
> NFS clients.
>
> We also don't see file corruption, mysterious executables failing to
> run, etc.
>
> Now maybe those issues are raised when you have a Linux NFS server
> with Solaris clients. But in my book, reliable NFS servers are key,
> and if they are reliable, 'soft,intr' works just fine.
>
> Now maybe if we had NFS exported directories everywhere, and stuff
> cross mounted all over the place with autofs, then we might change our
> minds.
>
> In any case, I don't disagree with the fundamental request to make
> the NFS client code on Linux easier to work with. I bet Trond (who
> works at NetApp) will have something to say on this issue.

Just for the others who may be reading this thread --

If you use sufficient network bandwidth and high quality
enough networks and NFS servers with plenty of resources,
then you _may_ be able to get away with "soft" mounting
for a some period of time.

However, any server, including Solaris and NetApp servers,
will fail, and those failures may or may not affect the
NFS service being provided. In fact, unless the system
is being carefully administered and the applications are
written very well, with error detection and recovery in
mind, then corruption can occur, and it can be silent and
unnoticed until too late. In fact, most failures do occur
silently and get chalked up to other causes because it will
not be possible to correlate the badness with the NFS
client giving up when attempting to communicate with an
NFS server.

I wish you the best of luck, although with the environment
that you describe, it seems like "hard" mounts would work
equally well and would not incur the risks.

ps

2007-08-21 19:26:15

by J. Bruce Fields

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, Aug 21, 2007 at 02:50:42PM -0400, John Stoffel wrote:
> Not in my experience. We use NetApps as our backing NFS servers, so
> maybe my experience isn't totally relevant. But with a mix of Linux
> and Solaris clients, we've never had problems with soft,intr on our
> NFS clients.
>
> We also don't see file corruption, mysterious executables failing to
> run, etc.
>
> Now maybe those issues are raised when you have a Linux NFS server
> with Solaris clients. But in my book, reliable NFS servers are key,
> and if they are reliable, 'soft,intr' works just fine.

The NFS server alone can't prevent the problems Peter Staubach refers
to. Their frequency also depends on the network and the way you're
using the filesystem. (A sufficiently paranoid application accessing
the filesystem could function correctly despite the problems caused by
soft mounts, but the degree of paranoia required probably isn't common.)

In practice, you may get away with soft mounts and never see problems.
But other people considering them should probably make sure they
understand the issues before trusting anything important to them.

--b.

2007-08-21 23:04:33

by Valdis Klētnieks

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, 21 Aug 2007 14:50:42 EDT, John Stoffel said:

> Now maybe those issues are raised when you have a Linux NFS server
> with Solaris clients. But in my book, reliable NFS servers are key,
> and if they are reliable, 'soft,intr' works just fine.

And you don't need all that ext3 journal overhead if your disk drives
are reliable too. Gotcha. :)



2007-08-22 10:03:47

by Theodore Ts'o

Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, Aug 21, 2007 at 07:04:16PM -0400, [email protected] wrote:
> On Tue, 21 Aug 2007 14:50:42 EDT, John Stoffel said:
>
> > Now maybe those issues are raised when you have a Linux NFS server
> > with Solaris clients. But in my book, reliable NFS servers are key,
> > and if they are reliable, 'soft,intr' works just fine.
>
> And you don't need all that ext3 journal overhead if your disk drives
> are reliable too. Gotcha. :)

Err, no. The ext3 journal overhead buys you not needing to fsck after
an unclean shutdown, and safety against crap getting written to the
inode table on an unclean power hit, when the memory goes insane from
the sagging voltage on the power supply rails before the DMA engine
and disk drive stop writing. (Hence my advice that if you use XFS on
Linux, make *sure* you have a UPS; on machines such as the SGI Indy
they added bigger capacitors to the PSU and a real power-fail
interrupt, but PC-class hardware is inexpensive/crappy, so it doesn't
have such niceties.)

- Ted


2007-08-22 15:27:42

by John Stoffel

Subject: Re: NFS hang + umount -f: better behaviour requested.

>>>>> "Valdis" == Valdis Kletnieks <[email protected]> writes:

Valdis> On Tue, 21 Aug 2007 14:50:42 EDT, John Stoffel said:
>> Now maybe those issues are raised when you have a Linux NFS server
>> with Solaris clients. But in my book, reliable NFS servers are key,
>> and if they are reliable, 'soft,intr' works just fine.

Valdis> And you don't need all that ext3 journal overhead if your disk
Valdis> drives are reliable too. Gotcha. :)

Yeah yeah... you got me. *grin* In a way. How to put this: NFS is
like ext2 in some ways; no real protection from errors unless you
turn on possibly performance-killing aspects of the code.

Ext3 takes it to a higher level of consistency without compromising as
much on the performance.

RAID can be the base of both of these things, and that helps a lot,
if your RAID is reliable.

So, my NetApps are reliable because they have NVRAM for performance,
and it's battery backed for reliability. On that they build the
Volume and Filesystem stuff, which also has performance and
reliability built-in.

On top of this, they have NFS (or CIFS or other protocols, but I use
only NFS). And we actually default to "proto=tcp,soft,intr" for all
our mounts. We do this for performance, because we're confident of
the underlying reliability of the layers below it. All the way down
to the Network switches in a way. Though I admit we don't dual-path
everything since we don't have enough need for that level of
reliability.

So that's where I'm coming from. Now, I'd be happy to be proven
wrong, but I'd like to see people give test scripts which can be run
on a client to simulate failures and such, so I can run them here in my
environment as a test. Maybe I'll change my mind. Maybe I won't.

At least we've got choice. :]

John

2007-08-24 15:37:42

by Peter Staubach

[permalink] [raw]
Subject: Re: NFS hang + umount -f: better behaviour requested.

Ric Wheeler wrote:
> J. Bruce Fields wrote:
>> On Tue, Aug 21, 2007 at 02:50:42PM -0400, John Stoffel wrote:
>>
>>> Not in my experience. We use NetApps as our backing NFS servers, so
>>> maybe my experience isn't totally relevant. But with a mix of Linux
>>> and Solaris clients, we've never had problems with soft,intr on our
>>> NFS clients.
>>>
>>> We also don't see file corruption, mysterious executables failing to
>>> run, etc.
>>> Now maybe those issues are raised when you have a Linux NFS server
>>> with Solaris clients. But in my book, reliable NFS servers are key,
>>> and if they are reliable, 'soft,intr' works just fine.
>>>
>>
>> The NFS server alone can't prevent the problems Peter Staubach refers
>> to. Their frequency also depends on the network and the way you're
>> using the filesystem. (A sufficiently paranoid application accessing
>> the filesystem could function correctly despite the problems caused by
>> soft mounts, but the degree of paranoia required probably isn't common.)
>>
> Would it be sufficient to ensure that the application always issues
> an fsync() before closing any recently written/updated file? Are there
> some other subtle paranoid techniques that should be used?

I suspect that this is not sufficient. The application should
be prepared to rewrite data if it can determine what data did
not get written. Using fsync will tell the application when
data was not written to the server correctly, but not which
part of the data.

Perhaps O_SYNC, or an fsync() following each write, would do it, but
either of these options will also cause a large performance
degradation.
The right solution is the use of TCP and hard mounting.

ps

2007-08-24 15:53:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: NFS hang + umount -f: better behaviour requested.

On Fri, Aug 24, 2007 at 11:09:14AM -0400, Ric Wheeler wrote:
> J. Bruce Fields wrote:
>> The NFS server alone can't prevent the problems Peter Staubach refers
>> to. Their frequency also depends on the network and the way you're
>> using the filesystem. (A sufficiently paranoid application accessing
>> the filesystem could function correctly despite the problems caused by
>> soft mounts, but the degree of paranoia required probably isn't common.)
>>
> Would it be sufficient to ensure that the application always issues
> an fsync() before closing any recently written/updated file? Are there
> some other subtle paranoid techniques that should be used?

NFS already syncs on close (and on unlock), so you should just need to
check the return values from any writes, fsyncs, closes, etc. (and
realize that an error there may mean some or all of the previous writes
to this file descriptor failed). And operations like mkdir have the
same problem--a timeout leaves you not knowing whether the directory was
created, because you don't know whether the operation reached the server
or not.

I assume the problems with executables that Peter Staubach refers to are
due to reads on mmap'd files timing out.

I don't use soft mounts myself and haven't had to debug user problems
with them, so my understanding of it all is purely theoretical--others
will have a better idea when and how these kinds of failures actually
manifest themselves in practice.

--b.

2007-08-24 16:16:17

by Ric Wheeler

[permalink] [raw]
Subject: Re: NFS hang + umount -f: better behaviour requested.

J. Bruce Fields wrote:
> On Tue, Aug 21, 2007 at 02:50:42PM -0400, John Stoffel wrote:
>
>> Not in my experience. We use NetApps as our backing NFS servers, so
>> maybe my experience isn't totally relevant. But with a mix of Linux
>> and Solaris clients, we've never had problems with soft,intr on our
>> NFS clients.
>>
>> We also don't see file corruption, mysterious executables failing to
>> run, etc.
>>
>> Now maybe those issues are raised when you have a Linux NFS server
>> with Solaris clients. But in my book, reliable NFS servers are key,
>> and if they are reliable, 'soft,intr' works just fine.
>>
>
> The NFS server alone can't prevent the problems Peter Staubach refers
> to. Their frequency also depends on the network and the way you're
> using the filesystem. (A sufficiently paranoid application accessing
> the filesystem could function correctly despite the problems caused by
> soft mounts, but the degree of paranoia required probably isn't common.)
>
Would it be sufficient to ensure that the application always issues an
fsync() before closing any recently written/updated file? Are there some
other subtle paranoid techniques that should be used?

ric

2007-08-31 08:06:59

by Ian Kent

[permalink] [raw]
Subject: Re: NFS hang + umount -f: better behaviour requested.

On Tue, 21 Aug 2007, John Stoffel wrote:

> >>>>> "Peter" == Peter Staubach <[email protected]> writes:
>
> Peter> John Stoffel wrote:
> Robin> I'm bringing this up again (I know it's been mentioned here
> Robin> before) because I had been told that NFS support had gotten
> Robin> better in Linux recently, so I have been (for my $dayjob)
> Robin> testing the behaviour of NFS (autofs NFS, specifically) under
> Robin> Linux with hard,intr and using iptables to simulate a hang.
> >>
> >> So why are you mounting with hard,intr semantics? At my current
> >> SysAdmin job, we mount everything (solaris included) with 'soft,intr'
> >> and it works well. If an NFS server goes down, clients don't hang for
> >> large periods of time.
>
> Peter> Wow! That's _really_ a bad idea. NFS READ operations which
> Peter> timeout can lead to executables which mysteriously fail, file
> Peter> corruption, etc. NFS WRITE operations which fail may or may
> Peter> not lead to file corruption.
>
> Peter> Anything writable should _always_ be mounted "hard" for safety
> Peter> purposes. Readonly mounted file systems _may_ be mounted
> Peter> "soft", depending upon what is located on them.
>
> Not in my experience. We use NetApps as our backing NFS servers, so
> maybe my experience isn't totally relevant. But with a mix of Linux
> and Solaris clients, we've never had problems with soft,intr on our
> NFS clients.

So, there's a power outage and the UPS had a glitch.
Oops, you've got to recover multiple TB and tell users that everything
since the last incremental backup is gone.

You use UPS in the computer room but management, in its cost-cutting
wisdom, hasn't provided UPS for your Unix workstations, and there's a
power outage. Oops, you've got lots of corrupt files but you don't know
which ones they are, so you've got to recover multiple TB and tell users
that everything since the last incremental backup is gone.

Ok, so hard mounting may not always save you in these circumstances but
soft mounting will surely get you in the neck.

Ian

2007-08-31 15:10:27

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: NFS hang + umount -f: better behaviour requested.

On Fri, 31 Aug 2007 16:06:36 +0800, Ian Kent said:
> So, there's a power outage and the UPS had a glitch.

Murphy can get a *lot* more creative than that.

So we'd outgrown the capacity on our UPS and diesel generator, and decided
to replace them. So we scheduled downtime for a Saturday. Rather scary: we
had a Sun E10K that had been powered up for several years, and just as expected,
a good fraction of its 400+ drives failed to spin back up. While recovering
from that, we discovered that although the vast majority of the 400 drives were
either mirrors or raidsets, due to a config error, the boot volume wasn't
mirrored (fortunately, it spun up OK so we dodged the bullet), so we fixed that.

Literally the next Friday, not even a week later, a contractor relocating a
door into our machine room shorted out a sensor circuit in our fire suppression
system, triggering a Halon dump. Of course, no amount of UPS and diesel was
going to save us now, because there was a safety interlock that killed the
power feeds if the Halon dumped. This time, since they'd all been stressed
just a week before, only 2 of the 400+ disks on the E10K failed to spin up.

Guess which two. ;)






2007-08-31 15:30:42

by Ian Kent

[permalink] [raw]
Subject: Re: NFS hang + umount -f: better behaviour requested.

On Fri, 2007-08-31 at 11:10 -0400, [email protected] wrote:
> On Fri, 31 Aug 2007 16:06:36 +0800, Ian Kent said:
> > So, there's a power outage and the UPS had a glitch.
>
> Murphy can get a *lot* more creative than that.
>
> So we'd outgrown the capacity on our UPS and diesel generator, and decided
> to replace them. So we scheduled downtime for a Saturday. Rather scary: we
> had a Sun E10K that had been powered up for several years, and just as expected,
> a good fraction of its 400+ drives failed to spin back up. While recovering
> from that, we discovered that although the vast majority of the 400 drives were
> either mirrors or raidsets, due to a config error, the boot volume wasn't
> mirrored (fortunately, it spun up OK so we dodged the bullet), so we fixed that.
>
> Literally the next Friday, not even a week later, a contractor relocating a
> door into our machine room shorted out a sensor circuit in our fire suppression
> system, triggering a Halon dump. Of course, no amount of UPS and diesel was
> going to save us now, because there was a safety interlock that killed the
> power feeds if the Halon dumped. This time, since they'd all been stressed
> just a week before, only 2 of the 400+ disks on the E10K failed to spin up.
>
> Guess which two. ;)

Eeeeeekkkk!!
The mirrors, of course.

Ian