2003-01-14 15:56:12

by Lever, Charles

Subject: RE: [NFS] Re: broken umount -f

"umount -f" doesn't end pending RPCs. if there are processes
with pending RPCs, then they are stuck and you will have to
reboot. "intr" may allow some of these processes to be killed
before trying the "umount."

however, if there are no outstanding RPCs on the client, but
the server is not available, umount -f works as advertised.

> -----Original Message-----
> From: Peter Åstrand [mailto:[email protected]]
> Sent: Monday, January 13, 2003 4:45 AM
> To: Trond Myklebust
> Cc: [email protected]; [email protected]
> Subject: [NFS] Re: broken umount -f
>
>
> >>For as long as I remember, umount -f has been broken. I got a reminder
> >>of this fact today when we took an older NFS server out of use. I had
> >>to reboot almost all machines that had mounts from this server. Not
> >>nice.
>
> ...
>
> > AFAICS It works for me.
> >
> > Are you using the 'intr' mount option,
>
> Yes, as often as I can. But IMHO, it should be possible to
> unmount an unreachable NFS fs even if it wasn't mounted with
> "intr". Otherwise we have a quite silly "sysadmin trap".
>
> >and are you remembering to kill
> > those processes that are actually using the mount point first?
>
> On some machines, I killed more or less everything. It
> didn't help. On some other machines, I couldn't kill so
> blindly. Remember, both "lsof" and "fuser" hang.
>
> Also, as far as I understand, Solaris 8 does not require that
> you kill all processes before unmounting, if you use the "-f"
> flag (processes will get EIO). Would it be possible to
> implement this feature in Linux? That would be really nice.
>
> Regards, Peter
>
>
> >>For as long as I remember, umount -f has been broken. I got a reminder
> >>of this fact today when we took an older NFS server out of use. I had
> >>to reboot almost all machines that had mounts from this server. Not
> >>nice.
> >>
> >>Does anyone know why -f does not work? When I try, I get:
> >>
> >># umount -f /import/applix
> >>Cannot MOUNTPROG RPC: RPC: Port mapper failure - RPC: Unable to receive
> >>umount2: Device or resource busy
> >>umount: /import/applix: device is busy
> >>
> >>lsof and fuser hang, as do "df" and "du". Really frustrating. It's
> >>not even possible to cleanly reboot the system, since RedHat's shutdown
> >>scripts want to unmount NFS fs's.
> >>
> >>I'm not exactly sure I understand what -f is supposed to do. Is it
> >>correct that it is supposed to unmount without contacting the NFS
> >>server? I assume that I still have to make sure no processes are using
> >>the FS? Would it be possible to add a "-9" flag (or something like
> >>that) that kills off all processes that use the NFS fs automatically?
> >>
> >>(I'm using all kinds of RedHat Linux versions, from 5.0 up to 7.3.
> >>From what I can tell, this problem exists in all versions.)
> >>
>
> -------------------------------------------------------
> This SF.NET email is sponsored by: FREE SSL Guide from
> Thawte are you planning your Web Server Security? Click here
> to get a FREE Thawte SSL guide and find the answers to all
> your SSL security issues.
> http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0026en
>
> _______________________________________________
> NFS maillist - [email protected]
> https://lists.sourceforge.net/lists/listinfo/nfs
>


2003-01-14 17:07:57

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Lever, Charles on Tue 14/01 07:56 -0800:
> however, if there are no outstanding RPCs on the client, but
> the server is not available, umount -f works as advertised.

I think his point was that Solaris will allow root to umount -f in any
case.

Right now I have to reboot my Linux NFS clients remotely with a "sync;
reboot -f" if they are stuck on a stale mount. `init 6' will time out
forever trying to unmount NFS filesystems (and yes, the init scripts use
`-f'), and by that time my sshd is killed so I have to drive out to the
colo site.



2003-01-14 19:06:15

by Trond Myklebust

Subject: Re: Re: broken umount -f

>>>>> " " == Scott Mcdermott <[email protected]> writes:

> Lever, Charles on Tue 14/01 07:56 -0800:
>> however, if there are no outstanding RPCs on the client, but
>> the server is not available, umount -f works as advertised.

> I think his point was that Solaris will allow root to umount -f
> in any case.

> Right now I have to reboot my Linux NFS clients remotely with a
> "sync; reboot -f" if they are stuck on a stale mount. `init 6'
> will time out forever trying to unmount NFS filesystems (and
> yes, the init scripts use `-f'), and by that time my sshd is
> killed so I have to drive out to the colo site.

Linux will not allow you to unmount without killing those processes,
and I'd be opposed to any patch that tries to kill active processes
from within the filesystem.
This is something that needs to be resolved in userland.

Cheers,
Trond



2003-01-14 19:19:55

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Trond Myklebust on Tue 14/01 20:06 +0100:
> Linux will not allow you to unmount without killing those processes,
> and I'd be opposed to any patch that tries to kill active processes
> from within the filesystem. This is something that needs to be
> resolved in userland.

Last I checked, the programs wouldn't die even with -KILL when they were
stuck in device-wait state. The only way to reboot a machine with such
processes is to reboot -f, which is wrong. The filesystems should be
able to have forced umount at sysadmin's discretion.



2003-01-14 19:35:56

by Trond Myklebust

Subject: Re: Re: broken umount -f

>>>>> " " == Scott Mcdermott <[email protected]> writes:

> Last I checked, the programs wouldn't die even with -KILL when
> they were stuck in device-wait state.

They will if you mount with 'intr', and make sure that you kill *all*
programs that are using that mountpoint.

If somebody wants to work on 'umount -f' then by all means do...

Cheers,
Trond



2003-01-14 19:39:21

by Benjamin LaHaise

Subject: Re: Re: broken umount -f

On Tue, Jan 14, 2003 at 08:06:09PM +0100, Trond Myklebust wrote:
> Linux will not allow you to unmount without killing those processes,
> and I'd be opposed to any patch that tries to kill active processes
> from within the filesystem.
> This is something that needs to be resolved in userland.

It's up to NFS to make the outstanding IOs against the filesystem
return -EIO.

-ben



2003-01-14 19:53:10

by Trond Myklebust

Subject: Re: Re: broken umount -f

>>>>> " " == Benjamin LaHaise <[email protected]> writes:

> It's up to NFS to make the outstanding IOs against the
> filesystem return -EIO.

It does that.

void
nfs_umount_begin(struct super_block *sb)
{
	struct nfs_server *server = NFS_SB(sb);
	struct rpc_clnt *rpc;

	/* -EIO all pending I/O */
	if ((rpc = server->client) != NULL)
		rpc_killall_tasks(rpc);
}

What it does not do is hunt down every task that holds an open file,
or has been started from an NFS directory.

Cheers,
Trond
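
For readers following along from userland: the "umount2:" prefix in the error output quoted earlier in the thread suggests that `umount -f` ends up in the umount2(2) system call, and `MNT_FORCE` is the flag that call documents for a forced unmount. The sketch below is only an illustration under that assumption; the helper name `force_umount` is hypothetical and not from the thread. It issues the request and reports the kernel's answer without trying to kill anything.

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not from the thread): request a forced unmount
 * of `target`. Returns 0 on success, or a negative errno value such as
 * -EBUSY when something still holds the mount, or -EPERM without
 * sufficient privilege.
 */
int force_umount(const char *target)
{
	/* MNT_FORCE asks the filesystem to abort pending requests first. */
	if (umount2(target, MNT_FORCE) == 0)
		return 0;
	return -errno;
}
```

A caller would typically compare the result with `-EBUSY` to tell "still in use" apart from other failures; as the message above notes, tasks holding open files or a working directory on the mount are not hunted down, so `-EBUSY` remains a possible outcome.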



2003-01-14 19:56:15

by Benjamin LaHaise

Subject: Re: Re: broken umount -f

On Tue, Jan 14, 2003 at 08:52:58PM +0100, Trond Myklebust wrote:
> What it does not do is hunt down every task that holds an open file,
> or has been started from an NFS directory.

Leaving them with stale directories that EIO on everything should work
fine. The unmount is actually a distinct operation from the freeing
of all the data structures, which thanks to Al Viro's changes will
now be properly garbage collected as the processes die off.

-ben
--
"Do you seek knowledge in time travel?"
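
The separation described above, detaching the mount now and freeing its data structures once the last user goes away, appears to correspond to what umount2(2) documents as a lazy unmount via `MNT_DETACH`. That mapping is an assumption on the reader's side, not something the message states, and the helper name `lazy_umount` is hypothetical. A minimal sketch:

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not from the thread): detach `target` from the
 * mount table immediately and let the kernel finish the unmount once
 * the mount is no longer busy. Returns 0 or a negative errno value.
 */
int lazy_umount(const char *target)
{
	if (umount2(target, MNT_DETACH) == 0)
		return 0;
	return -errno;
}
```

Note that detaching only hides the mount from new lookups; whether processes already blocked on the dead server get unstuck is a separate question that this call does not answer.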



2003-01-14 22:17:58

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Trond Myklebust on Tue 14/01 20:35 +0100:
> > Last I checked, the programs wouldn't die even with -KILL when
> > they were stuck in device-wait state.
>
> They will if you mount with 'intr', and make sure that you kill *all*
> programs that are using that mountpoint.

I don't want users to be able to corrupt their own data by using `intr'
when I am just bouncing the server for some reason. But, I want root to
be able to force it if necessary, as in the case of a client reboot when
the server is gone (eg when switching servers and they need new handles
for their automounts, but still have the old ones and the server is
never coming back up). Usually this means the admin screwed up, but
unfortunately that does happen.

"umount -f" seems like the right place for this.



2003-01-14 22:29:54

by Steven N. Hirsch

Subject: Re: Re: broken umount -f

On Tue, 14 Jan 2003, Trond Myklebust wrote:

> >>>>> " " == Scott Mcdermott <[email protected]> writes:
>
> > Last I checked, the programs wouldn't die even with -KILL when
> > they were stuck in device-wait state.
>
> They will if you mount with 'intr', and make sure that you kill *all*
> programs that are using that mountpoint.

I can honestly say that I end up rebooting more than 80% of the time when
the server goes away. Believe me, I've tried everything including taking
the system down to single-user mode and killing everything not bolted to
the kernel <g>.

Steve





2003-01-14 22:32:09

by Steven N. Hirsch

Subject: Re: Re: broken umount -f

On Tue, 14 Jan 2003, Scott Mcdermott wrote:

> Trond Myklebust on Tue 14/01 20:35 +0100:
> > > Last I checked, the programs wouldn't die even with -KILL when
> > > they were stuck in device-wait state.
> >
> > They will if you mount with 'intr', and make sure that you kill *all*
> > programs that are using that mountpoint.
>
> I don't want users to be able to corrupt their own data by using `intr'
> when I am just bouncing the server for some reason. But, I want root to
> be able to force it if necessary, as in the case of a client reboot when
> the server is gone (eg when switching servers and they need new handles
> for their automounts, but still have the old ones and the server is
> never coming back up). Usually this means the admin screwed up, but
> unfortunately that does happen.
>
> "umount -f" seems like the right place for this.

You have my vote. For sanity's sake, there needs to be a big, red lever
under the locked access hatch - labeled "I said umount -f, dammit!".



