2003-05-16 07:43:59

by Peter Astrand

[permalink] [raw]
Subject: Soft vs Hard mounts


> if using hard mounts, everybody sits and waits for the other server
> to come back online... if and when it does, and if it comes up
> at some other IP or some other NFS ports, other PCs need to be
> rebooted too
> - I don't like waiting around or rebooting

Question: Why is this soft-vs-hard-mount debate so common in the Linux NFS
world, but not in, for example, the Solaris community?

My answer: Because Solaris has a better "umount -f" command. Linux users
want to use soft mounts to avoid rebooting in case of an unavailable
server. Solaris users can use "umount -f" instead.

In case you don't know what Solaris umount -f does, here's the man page
info:

-f Forcibly unmount a file system.

Without this option, umount does not allow a file sys-
tem to be unmounted if a file on the file system is
busy. Using this option can cause data loss for open
files; programs which access files after the file sys-
tem has been unmounted will get an error (EIO).

umount -f is much better than soft mounts: even though it can lead to data
loss/corruption, you have to invoke the command manually. Data
loss/corruption won't occur behind your back whenever a timeout occurs.

It would be really nice to have this kind of "umount -f" in Linux.
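For reference, the soft-mount alternative people resort to looks roughly
like this in /etc/fstab (server and export names here are made up for the
example):

```
# hypothetical fstab entry: give up after retrans retries and return EIO
# to the application, instead of blocking forever like a hard mount
server:/export  /mnt/data  nfs  soft,timeo=10,retrans=3  0  0
```

The EIO shows up silently in whatever process happens to be doing I/O at
the time, which is exactly the "behind your back" failure mode above.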

--
/Peter Åstrand <[email protected]>





-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
http://www.enterpriselinuxforum.com

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-05-16 07:49:09

by Greg Lindahl

[permalink] [raw]
Subject: Re: Soft vs Hard mounts

On Fri, May 16, 2003 at 09:43:59AM +0200, Peter Åstrand wrote:

> Question: Why is this soft-vs-hard-mount debate so common in the Linux NFS
> world, but not in, for example, the Solaris community?
>
> My answer: Because Solaris has a better "umount -f" command. Linux users
> want to use soft mounts to avoid rebooting in case of an unavailable
> server. Solaris users can use "umount -f" instead.

That's part of it. The rest of the mystery is:

1) Linux's nfs client code is not reliable at allowing SIGINT
to kill processes when disks mounted "hard,intr" go astray, and

2) Linux's nfs client code is not sufficiently careful about not
sucking unrelated processes into the black hole of a failed "hard"
mount.

I admit I've never done a systematic study of the Linux client for
these issues, but when you run into soft-mounting-advocates, they
generally talk about all 3 of these issues, not only umount.

On Solaris, it used to be the case that (2) required that you be
careful about exactly where you mounted things; I don't know about
now, but it was part of the lore of how to use the automounter + hard
mounts and not get burned.

-- greg




2003-05-16 16:13:18

by Lever, Charles

[permalink] [raw]
Subject: RE: Soft vs Hard mounts

hi greg-

very succinct, and i think this is the crux of the problem.

recently trond mentioned that this issue was not likely to
be addressed until after 2.6 because it requires architectural
changes in the VFS layer and probably a new type of semaphore
implementation.

in the meantime, i think we can address some of the instability:

1. reduce or eliminate the ESTALE errors that occur after
a server reboot. this is probably some kind of client
bug.

2. get lazy unmounting (umount -l) to work for NFS. i tried
this recently, and it didn't work as advertised.

3. make soft mounts work more reliably by purging a file's
page cache contents when a soft timeout occurs.

4. educate sysadmins how to make soft mounts more reliable
by thoroughly testing to determine a reasonable set of
mount options.
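as a rough illustration of what "a reasonable set of mount options" in (4)
involves: under the classic UDP retransmission model described in nfs(5),
each retry doubles the previous timeout, so a soft mount with
timeo=10,retrans=3 gives up after roughly 15 seconds. exact kernel behavior
varies; the numbers below are just example values for the arithmetic:

```shell
# Rough arithmetic for a soft-mount major timeout (classic UDP model):
# the first timeout lasts timeo tenths of a second, and each of the
# retrans retransmissions doubles the previous one.
timeo=10      # initial timeout in deciseconds (example value)
retrans=3     # retransmissions before a major timeout (example value)
total=0
t=$timeo
for i in $(seq 0 $retrans); do
    total=$((total + t))
    t=$((t * 2))
done
echo "major timeout after roughly $((total / 10)) seconds"
```

the point of testing would be finding timeo/retrans values long enough to
ride out transient congestion but short enough to be useful as a failure
detector.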

> On Fri, May 16, 2003 at 09:43:59AM +0200, Peter Åstrand wrote:
>
> > Question: Why is this soft-vs-hard-mount debate so common in the Linux NFS
> > world, but not in, for example, the Solaris community?
> >
> > My answer: Because Solaris has a better "umount -f" command. Linux users
> > want to use soft mounts to avoid rebooting in case of an unavailable
> > server. Solaris users can use "umount -f" instead.
>
> That's part of it. The rest of the mystery is:
>
> 1) Linux's nfs client code is not reliable at allowing SIGINT
> to kill processes when disks mounted "hard,intr" go astray, and
>
> 2) Linux's nfs client code is not sufficiently careful about not
> sucking unrelated processes into the black hole of a failed "hard"
> mount.
>
> I admit I've never done a systematic study of the Linux client for
> these issues, but when you run into soft-mounting-advocates, they
> generally talk about all 3 of these issues, not only umount.
>
> On Solaris, it used to be the case that (2) required that you be
> careful about exactly where you mounted things; I don't know about
> now, but it was part of the lore of how to use the automounter + hard
> mounts and not get burned.
>
> -- greg



2003-05-16 16:26:01

by Lever, Charles

[permalink] [raw]
Subject: RE: Soft vs Hard mounts

> 4. educate sysadmins how to make soft mounts more reliable
> by thoroughly testing to determine a reasonable set of
> mount options.

let me clarify here: one of *us* should do the testing, and
we should post the results in the FAQ or how-to.



2003-05-16 21:50:18

by Greg Lindahl

[permalink] [raw]
Subject: Re: Soft vs Hard mounts

On Fri, May 16, 2003 at 09:13:18AM -0700, Lever, Charles wrote:

> recently trond mentioned that this issue was not likely to
> be addressed until after 2.6 because it requires architectural
> changes in the VFS layer and probably a new type of semaphor
> implementation.

Well, some of the issues might be separate, and some have workarounds.

For example, if stat() on an NFS mountpoint causes a process to hang,
and you have processes like ssh which want to make sure your conf file
is secure by walking the path back to /, there is a workaround
of putting each mount point in its own directory:

/home/foo/foo
/home/bar/bar

You'd also want to look into what the Linux automounter does. It
looks different from what I'm used to.

> 3. make soft mounts work more reliably by purging a file's
> page cache contents when a soft timeout occurs
>
> 4. educate sysadmins how to make soft mounts more reliable
> by thoroughly testing to determine a reasonable set of
> mount options.

Option (3) might make network congestion much worse, and turn
congestion into failure. (4) is really hard, because it's hard to
thoroughly test many different types of activity and congestion.

Distributed system people generally eventually end up thinking that
introducing any timeout will always result in congestion and odd
circumstances turning into failures. I vastly prefer systems which
have the behavior of "hard,intr".
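Concretely, that behavior is the hard default plus the intr flag
(hypothetical names again):

```
# hypothetical fstab entry: retry forever rather than time out, but let
# a signal (e.g. SIGINT) kill a process stuck waiting on a dead server
server:/export  /mnt/data  nfs  hard,intr  0  0
```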

greg


