LinuxLists.cc - broken umount -f

2002-12-27 16:44:15

Subject: broken umount -f

For as long as I remember, umount -f has been broken. I got a reminder of
this fact today when we took an older NFS server out of use. I had to
reboot almost all machines that had mounts from this server. Not nice.

Does anyone know why -f does not work? When I try, I get:

# umount -f /import/applix
Cannot MOUNTPROG RPC: RPC: Port mapper failure - RPC: Unable to receive
umount2: Device or resource busy
umount: /import/applix: device is busy

lsof and fuser hang, as do "df" and "du". Really frustrating. It's not
even possible to cleanly reboot the system, since Red Hat's shutdown
scripts want to unmount NFS filesystems.
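
When lsof and fuser hang, one possible workaround is to read the symlinks under /proc directly. The following is only a sketch under stated assumptions: procs_under is a hypothetical helper name, and it assumes that readlink on /proc/PID/cwd, /proc/PID/root and /proc/PID/fd/* returns the recorded path without contacting the NFS server. It will not see every kind of user (memory-mapped files, for instance).

```shell
# Sketch: list PIDs whose cwd, root or open files live under a mount point,
# using only readlink on /proc symlinks (no stat() of the NFS files).
# usage: procs_under <mountpoint> [procdir]   (procdir defaults to /proc)
procs_under() {
    mnt=${1%/}
    proc=${2:-/proc}
    for d in "$proc"/[0-9]*; do
        [ -d "$d" ] || continue
        for link in "$d/cwd" "$d/root" "$d"/fd/*; do
            # Unreadable or missing links are skipped silently.
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$mnt"|"$mnt"/*)
                    echo "${d##*/}"   # print the PID once, then move on
                    break
                    ;;
            esac
        done
    done
}
```

For example, `procs_under /import/applix` would print one PID per line, which could then be passed to kill.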

I'm not exactly sure I understand what -f is supposed to do. Is it correct
that it is supposed to unmount without contacting the NFS server? I assume
that I still have to make sure no processes are using the FS? Would it be
possible to add a "-9" flag (or something like that) that kills off all
processes that use the NFS fs automatically?

(I'm using all kinds of Red Hat Linux versions, from 5.0 up to 7.3. From
what I can tell, this problem exists in all versions.)

--
Peter Åstrand Telephone: +46-13-21 46 00
Cendio Systems E-mail: [email protected]
Teknikringen 3
583 30 Linköping

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-01-13 09:44:33

by Peter Astrand

Subject: Re: broken umount -f

>>For as long as I remember, umount -f has been broken. I got a reminder
>>of this fact today when we took an older NFS server out of use. I had to
>>reboot almost all machines that had mounts from this server. Not nice.

...

> AFAICS It works for me.
>
> Are you using the 'intr' mount option,

Yes, as often as I can. But IMHO, it should be possible to unmount an
unreachable NFS fs even if it wasn't mounted with "intr". Otherwise we
have a quite silly "sysadmin trap".

>and are you remembering to kill
> those processes that are actually using the mount point first?

On some machines, I killed more or less everything. It didn't help. On
some other machines, I couldn't kill so blindly. Remember, both "lsof" and
"fuser" hang.

Also, as far as I understand, Solaris 8 does not require that you kill all
processes before unmounting, if you use the "-f" flag (processes will get
EIO). Would it be possible to implement this feature in Linux? That would
be really nice.

Regards, Peter

2003-01-14 19:32:03

by Cole, Timothy D.

Subject: RE: Re: broken umount -f

> -----Original Message-----
> From: Trond Myklebust [mailto:[email protected]]
> Sent: Tuesday, January 14, 2003 14:06
> To: Scott Mcdermott
> Cc: [email protected]
> Subject: Re: [NFS] Re: broken umount -f

> Linux will not allow you to unmount without killing those processes,
> and I'd be opposed to any patch that tries to kill active processes
> from within the filesystem.

> This is something that needs to be resolved in userland.

The few times I've needed to use umount -f, the processes in question
weren't killable from userland. Is there an architectural reason for this?

(i.e. instead of killing them, why can't their pending system calls return
with -EIO if a umount is forced, as I've seen some other unices do in
similar situations [pending RPCs + a dead server pinning a mount]?)

2003-01-14 19:32:53

by Brian Tinsley

Subject: Re: Re: broken umount -f

Scott Mcdermott wrote:

>Trond Myklebust on Tue 14/01 20:06 +0100:
>
>
>>Linux will not allow you to unmount without killing those processes,
>>and I'd be opposed to any patch that tries to kill active processes
>>from within the filesystem. This is something that needs to be
>>resolved in userland.
>>
>>
>
>Last I checked, the programs wouldn't die even with -KILL when they were
>stuck in device-wait state. The only way to reboot a machine with such
>processes is to reboot -f, which is wrong. The filesystems should be
>able to have forced umount at sysadmin's discretion.
>
>
I've had luck with:

kill -9 <pids>
umount -f <nfs_dirs>
kill -9 <pids>
umount -f <nfs_dirs>

The last umount always works for me.
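
The repeated kill/umount sequence above could be wrapped in a bounded retry loop. This is a minimal sketch, not a guaranteed fix (other messages in this thread show cases where it does not help). The function name retry_force_umount and the UMOUNT, KILL and RETRY_DELAY variables are hypothetical hooks added so the loop can be dry-run; by default the real commands are used.

```shell
# Sketch: repeat "kill -9 <pids>; umount -f <dir>" up to a fixed number of times.
# UMOUNT/KILL are override hooks (an assumption, for dry runs); defaults are real.
: "${UMOUNT:=umount}"
: "${KILL:=kill}"

# usage: retry_force_umount <mountpoint> <max_tries> [pid...]
retry_force_umount() {
    mnt=$1
    tries=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        # Signal the given PIDs; errors from already-dead PIDs are ignored.
        if [ "$#" -gt 0 ]; then
            $KILL -9 "$@" 2>/dev/null || true
        fi
        if $UMOUNT -f "$mnt"; then
            echo "unmounted $mnt on attempt $i"
            return 0
        fi
        i=$((i + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    echo "giving up on $mnt after $tries attempts" >&2
    return 1
}
```

A call such as `retry_force_umount /mnt/tmp 3 1472` would try three rounds before giving up with a non-zero exit status.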

--

Brian Tinsley
Chief Systems Engineer
Emageon

2003-01-14 19:36:39

by Lever, Charles

Subject: RE: Re: broken umount -f

> -----Original Message-----
> From: Scott Mcdermott [mailto:[email protected]]
> Sent: Tuesday, January 14, 2003 2:20 PM
> To: Trond Myklebust
> Cc: [email protected]
> Subject: Re: [NFS] Re: broken umount -f
>
> Last I checked, the programs wouldn't die even with -KILL when they
> were stuck in device-wait state. The only way to reboot a machine
> with such processes is to reboot -f, which is wrong. The filesystems
> should be able to have forced umount at sysadmin's discretion.

do you remember which kernel this was?

trond fixed a long-standing "processes stuck in 'D' state" bug in
2.4.20. this bug may be the reason these processes didn't die
when you killed them.

2003-01-14 19:55:24

by Cole, Timothy D.

Subject: RE: Re: broken umount -f

> -----Original Message-----
> From: Trond Myklebust [mailto:[email protected]]
> Sent: Tuesday, January 14, 2003 14:36
> To: Scott Mcdermott
> Cc: [email protected]
> Subject: Re: [NFS] Re: broken umount -f
>
> They will if you mount with 'intr', and make sure that you kill *all*
> programs that are using that mountpoint.

That doesn't always appear to be the case in practice (maybe a long-standing
bug/strange interaction?). Can there be users not reported by fuser?

2003-01-15 05:19:31

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Lever, Charles on Tue 14/01 11:36 -0800:
> > Last I checked, the programs wouldn't die even with -KILL when they
> > were stuck in device-wait state. The only way to reboot a machine
> > with such processes is to reboot -f, which is wrong. The
> > filesystems should be able to have forced umount at sysadmin's
> > discretion.
>
> do you remember which kernel this was?
>
> trond fixed a long-standing "processes stuck in 'D' state" bug in
> 2.4.20. this bug may be the reason these processes didn't die when
> you killed them.

<on client>

# uname -r
2.4.21-pre3-NFS_ALL

# showmount --exports nfsserver
Export list for nfsserver
/tmp 10.0.0.5

# mount nfsserver:/tmp /mnt/tmp

# cd /mnt/tmp

# ssh nfsserver /etc/init.d/nfs stop
Shutting down NFS mountd: [ OK ]
Shutting down NFS daemon: [ OK ]
Shutting down NFS services: [ OK ]
Shutting down NFS quotas: [ OK ]

# /bin/pwd
nfs: server nfsserver not responding, still trying

<hangs>

<now on other tty with pwd=/>

# ps -eo state,wchan,pid,command | grep ^D
D end 1472 /bin/pwd

# kill -KILL 1472

# umount -f /mnt/tmp
Cannot MOUNTPROG RPC: RPC: Program not registered
umount2: Device or resource busy
umount: /mnt/tmp: device is busy

# kill -KILL 1472
# kill -KILL 1472

# umount -f /mnt/tmp
Cannot MOUNTPROG RPC: RPC: Program not registered
umount2: Device or resource busy
umount: /mnt/tmp: device is busy

# kill -KILL 1472
# kill -KILL 1472
# kill -KILL 1472
# kill -KILL 1472
# kill -KILL 1472
# kill -KILL 1472
# kill -KILL 1472

# umount -f /mnt/tmp
Cannot MOUNTPROG RPC: RPC: Program not registered
umount2: Device or resource busy
umount: /mnt/tmp: device is busy

# kill -KILL 1472

# ps -eo state,wchan,pid,command | grep ^D
D end 1472 /bin/pwd
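
To pull just the PID and command out of the ps listing used above (for feeding into kill, say), a small filter along these lines might do. It is a sketch only: d_state_procs is a hypothetical name, and it assumes the four-column `ps -eo state,wchan,pid,command` layout shown in this transcript.

```shell
# Sketch: read `ps -eo state,wchan,pid,command` output on stdin and print
# "PID COMMAND" for each process in uninterruptible sleep (state D).
d_state_procs() {
    awk '$1 == "D" {
        cmd = $4
        # Re-join the command line, which may contain spaces.
        for (i = 5; i <= NF; i++) cmd = cmd " " $i
        print $3, cmd
    }'
}
```

Typical use would be `ps -eo state,wchan,pid,command | d_state_procs`.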

2003-01-15 05:21:40

by Scott Mcdermott

Subject: Re: Re: broken umount -f

To [email protected] on Wed 15/01 00:19 -0500:
> Lever, Charles on Tue 14/01 11:36 -0800:
> <on client>

forgot to include

# grep nfsserver /proc/mounts
nfsserver:/tmp /mnt/tmp nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=nfsserver 0 0
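
Whether a given NFS mount has intr set can be read off the options field of a line like the one above. A rough sketch, assuming the /proc/mounts column layout shown here and that nointr is indicated simply by the absence of "intr" (nfs_intr_status is a hypothetical name; only filesystem type "nfs" is matched):

```shell
# Sketch: read /proc/mounts-format lines on stdin and print
# "MOUNTPOINT intr" or "MOUNTPOINT nointr" for each NFS mount.
nfs_intr_status() {
    awk '$3 == "nfs" {
        n = split($4, opt, ",")
        status = "nointr"
        # Look for an exact "intr" entry in the comma-separated options.
        for (i = 1; i <= n; i++) if (opt[i] == "intr") status = "intr"
        print $2, status
    }'
}
```

Run as `nfs_intr_status < /proc/mounts`; for the mount above it would report /mnt/tmp as nointr.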

2003-01-15 15:36:26

by Murata, Dennis W (SAIC)

Subject: RE: Re: broken umount -f

If you cd out of /mnt/tmp can you then umount -f? This looks similar to
what happens on Solaris 8 if the directory happens to be the home directory
of a user logged into a system. The login session has to be killed before
the umount -f works. I would not think in your case that the user would
have to be forced off.

Wayne

2003-01-15 14:45:56

by Lever, Charles

Subject: RE: Re: broken umount -f

> To [email protected] on Wed 15/01 00:19 -0500:
> > Lever, Charles on Tue 14/01 11:36 -0800:
> > <on client>
>
> forgot to include
>
> # grep nfsserver /proc/mounts
> nfsserver:/tmp /mnt/tmp nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=nfsserver 0 0

thanks!

what if you try it again with the intr mount option?

if that doesn't help, enable rpc level debugging and send me
the kernel log contents.

echo 3 > /proc/sys/sunrpc/rpc_debug # to enable debugging
echo 0 > /proc/sys/sunrpc/rpc_debug # to turn it off again
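
The two commands above can be wrapped around a single reproduction run so that debugging is always switched off again afterwards. A minimal sketch: with_rpc_debug is a hypothetical helper, and RPC_DEBUG_FILE is an override hook added for dry runs (it defaults to the path given above).

```shell
# Sketch: enable RPC debugging, run one command, then disable it again.
# RPC_DEBUG_FILE is an assumption for testing; the default is the real path.
: "${RPC_DEBUG_FILE:=/proc/sys/sunrpc/rpc_debug}"

# usage: with_rpc_debug <command> [args...]
with_rpc_debug() {
    echo 3 > "$RPC_DEBUG_FILE" || return 1   # enable debugging
    status=0
    "$@" || status=$?                        # reproduce the problem
    echo 0 > "$RPC_DEBUG_FILE"               # turn it off again
    return "$status"
}
```

For instance, `with_rpc_debug umount -f /mnt/tmp` followed by saving the kernel log (for example with dmesg) would capture the RPC messages for that one attempt.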

2003-01-15 16:32:38

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Lever, Charles on Wed 15/01 06:45 -0800:
> what if you try it again with the intr mount option?

I'm sure it *will* work with the `intr' mount option. But I don't want
my users to be able to corrupt their own data just because I decided to
bounce the server for whatever reason. Their IO to that filesystem
should hang, uninterruptibly, as is the conventional wisdom (that hard,
nointr is the Right Way), which I agree with.

"Pick one or the other" doesn't work because there are situations where
the filesystem IO will never complete and a umount *has* to be forced or
there is never any recovery option.

> if that doesn't help, enable rpc level debugging and send me the
> kernel log contents.

if you'd like I can still do this, but it probably works fine with intr;
that's not what the problem is. The problem is having a system that one
cannot even reboot without using "reboot -f" just because the server is
down and the client mounts with hard,nointr.

2003-01-15 17:05:05

by Lever, Charles

Subject: RE: Re: broken umount -f

> I'm sure it *will* work with the `intr' mount option. But I don't
> want my users to be able to corrupt their own data just because I
> decided to bounce the server for whatever reason. Their IO to that
> filesystem should hang, uninterruptibly, as is the conventional
> wisdom (that hard, nointr is the Right Way), which I agree with.

do you know what the risk of data corruption is when using "intr"?
seems pretty low to me.

2003-01-15 17:35:57

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Lever, Charles on Wed 15/01 09:04 -0800:
> > I'm sure it *will* work with the `intr' mount option. But I don't
> > want my users to be able to corrupt their own data just because I
> > decided to bounce the server for whatever reason.
>
> do you know what the risk of data corruption is when using "intr"?
> seems pretty low to me.

User saving his mail spool, sees "nfs server not responding, still
trying" and decides to try killing his MUA. Too bad it works and now
his spool is a steaming pile of ASCII.

There are any number of other possible problems with intr.

`soft' and `intr' are evil and should be banned.

2003-01-15 18:24:37

by Scott Mcdermott

Subject: Re: Re: broken umount -f

Bill Rugolsky Jr. on Wed 15/01 13:08 -0500:
> > do you know what the risk of data corruption is when using "intr"?
> > seems pretty low to me.
>
> Any reason why we can't add a mount option that makes cl_intr only
> effective for SIGKILL in sunrpc/clnt.c?

and how about also only if euid == 0

2003-01-15 18:22:32

by Bill Rugolsky Jr.

Subject: Re: Re: broken umount -f

[Sorry for the resend; I broke my return address on the first try.]

On Wed, Jan 15, 2003 at 09:04:58AM -0800, Lever, Charles wrote:
> do you know what the risk of data corruption is when using "intr"?
> seems pretty low to me.

Any reason why we can't add a mount option that makes cl_intr only
effective for SIGKILL in sunrpc/clnt.c? (A general sigmask= option
doesn't seem immediately useful, unless there is some use for [T]STOP.)

Regards,

Bill Rugolsky

2003-01-15 19:08:35

by Cuenta de la lista de linux

Subject: NFS problem: udp port nfs unreachable

Hi all:

I am having some problems running a small cluster with NFS.
I have an NFS server running Red Hat 7.2, kernel 2.4.7-10,
portmap 4.0-38, and I have upgraded to nfs-utils 1.0.1.

When I try the following from the client:
mount ipservidor:/home /mnt/home

I always get the following message in /var/log/messages:

authenticated mount request from client'sip:762

I have used tcpdump to dig into the problem, and all I can see is the server
sending the following ICMP message to the client:

udp port nfs unreachable

How can I raise the log level for syslog (/var/log/messages) in
order to get more messages and debug my problem?

Does anybody have any ideas about these error messages?
What is wrong?

David

2003-01-15 19:16:30

by Cole, Timothy D.

Subject: RE: Re: broken umount -f

> -----Original Message-----
> From: Scott Mcdermott [mailto:[email protected]]
> Sent: Wednesday, January 15, 2003 12:24
> To: [email protected]
> Subject: Re: [NFS] Re: broken umount -f
>
> User saving his mail spool, sees "nfs server not responding, still
> trying" and decides to try killing his MUA. Too bad it works and now
> his spool is a steaming pile of ASCII.

That's possible with non-NFS filesystems too -- just normally a smaller
window of opportunity. It doesn't require a filesystem hang in any case --
most mailbox operations are not a single atomic write(). Imagine someone
killing the MUA in the middle of deleting a large mail from a ~40MB mail
spool on any filesystem, local or remote.

Also consider the nointr case -- process hangs, user can't kill it, user
naively closes the terminal window. Server comes back up. SIGHUP is
finally handled when the write() returns. Process dies. ASCII soup again.

This is of course assuming that the NFS server _can_ come back up. If not,
totally unkillable processes are a pain-in-the-ssh.

> `soft' and `intr' are evil and should be banned.

Agreed WRT soft's evil-ness, anyway. But hard,intr seems to be a pretty
good combination, as far as safety from data corruption, and from a
standpoint of not having to reboot-and-kill-week-long-simulations just
because a few unrelated (but important) processes got wedged by a
recalcitrant NFS server.

2003-01-16 20:49:52

by Ion Badulescu

Subject: Re: Re: broken umount -f

On Wed, 15 Jan 2003 09:04:58 -0800, Lever, Charles <[email protected]> wrote:
>
> do you know what the risk of data corruption is when using "intr"?
> seems pretty low to me.

What is the risk, anyway? That the user will press Ctrl-C (SIGINT) and
kill the process? How is that different from doing the same thing when
NFS is not hung? Sure, you can envision a case where the process traps
SIGINT so it is not fatal, yet the NFS request ends up being canceled,
but it's hardly the end of the world.

I also think we have a misunderstanding here. SIGKILL should _always_ be able
to kill a process hanging on NFS. Unconditionally. It may not do it right
away, it may take a few seconds until the client exits the noninterruptible
sequence, but it should succeed eventually. The role of 'umount -f' then
becomes mostly to speed up the effects of the SIGKILL.

SIGINT should be able to do the same thing if the mount is done with 'intr'.
Nothing more and nothing less.

From the Solaris mount_nfs(1M) man page:

    intr | nointr
        Allow (do not allow) keyboard interrupts to kill
        a process that is hung while waiting for a
        response on a hard-mounted file system. The
        default is intr, which makes it possible for
        clients to interrupt applications that may be
        waiting for a remote mount.

Linux does the above, mostly. The biggest problem is that sometimes
the hanging NFS access will be done by rpciod, not by the process
itself, and so it's rpciod that needs the SIGKILL (or SIGINT) in
order to abort the access. Unfortunately, rpciod is owned by root,
so a regular user can't send it any signals. For the sysadmin,
killall -KILL rpciod combined with umount -f does the trick most of
the time.

The other problem I've seen occasionally is that umount -f hangs
(interruptibly) instead of aborting all outstanding I/O's. I haven't
been able to find a pattern as to when it happens, yet.

Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
