2014-03-21 20:17:30

by Chris Friesen

Subject: race-free exportfs and unmount?


Hi,

There was a linux-nfs thread in July 2012 with the subject "Linux NFS
and cached properties". It discussed the fact that you can't reliably do

exportfs -u 192.168.1.11:/mnt
umount /mnt

since there could be rpc users still running when exportfs returns, so
the umount fails thinking the filesystem is busy.

I'm running into this on a production system.

Was anything ever done to resolve this issue?
If not are there any workarounds?

Please cc me, I'm not subscribed to the list.

Chris


2014-03-21 22:58:19

by Chris Friesen

Subject: Re: race-free exportfs and unmount?

On 03/21/2014 02:20 PM, J. Bruce Fields wrote:
> On Fri, Mar 21, 2014 at 02:17:13PM -0600, Chris Friesen wrote:
>>
>> Hi,
>>
>> There was a linux-nfs thread in July 2012 with the subject "Linux
>> NFS and cached properties". It discussed the fact that you can't
>> reliably do
>>
>> exportfs -u 192.168.1.11:/mnt
>> umount /mnt
>>
>> since there could be rpc users still running when exportfs returns,
>> so the umount fails thinking the filesystem is busy.
>
> There could also be clients holding opens, locks, or delegations on the
> export.
>
>> I'm running into this on a production system.
>>
>> Was anything ever done to resolve this issue?
>> If not are there any workarounds?
>
> You can shut down the server completely, unmount, and restart.


What is different with shutting down the server completely vs unexporting?

Does shutting down the server somehow wait for in-flight operations to
complete whereas the unexport doesn't? I'm assuming that it can't just
cancel in-progress disk I/O and as long as that's happening then we
won't be able to unmount the filesystem.

Thanks,
Chris


2014-03-21 23:09:53

by NeilBrown

Subject: Re: race-free exportfs and unmount?

On Fri, 21 Mar 2014 16:58:07 -0600 Chris Friesen
<[email protected]> wrote:

> On 03/21/2014 02:20 PM, J. Bruce Fields wrote:
> > On Fri, Mar 21, 2014 at 02:17:13PM -0600, Chris Friesen wrote:
> >>
> >> Hi,
> >>
> >> There was a linux-nfs thread in July 2012 with the subject "Linux
> >> NFS and cached properties". It discussed the fact that you can't
> >> reliably do
> >>
> >> exportfs -u 192.168.1.11:/mnt
> >> umount /mnt
> >>
> >> since there could be rpc users still running when exportfs returns,
> >> so the umount fails thinking the filesystem is busy.
> >
> > There could also be clients holding opens, locks, or delegations on the
> > export.
> >
> >> I'm running into this on a production system.
> >>
> >> Was anything ever done to resolve this issue?
> >> If not are there any workarounds?
> >
> > You can shut down the server completely, unmount, and restart.
>
>
> What is different with shutting down the server completely vs unexporting?
>
> Does shutting down the server somehow wait for in-flight operations to
> complete whereas the unexport doesn't? I'm assuming that it can't just
> cancel in-progress disk I/O and as long as that's happening then we
> won't be able to unmount the filesystem.

Shutting down the server waits for all nfsd threads to complete what they are
currently doing.
I think you can simply:

exportfs -u the filesystem
N=`cat /proc/fs/nfsd/threads`
echo 0 > /proc/fs/nfsd/threads
echo $N > /proc/fs/nfsd/threads
umount the filesystem

to reliably unmount a filesystem used by nfsd.
NFS service will be stopped for a moment, but clients shouldn't notice anything
beyond a slight delay and the need to re-establish a connection.

If this doesn't work for some reason, we should probably fix it.
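
A minimal shell sketch of the sequence above, wrapped in a function so that
a failed step stops the rest. The function name and the THREADS_FILE
override are assumptions made for illustration (the override only exists so
the function can be tried against a scratch file without root); the
commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch only: unexport, bounce the nfsd threads, then unmount.
# THREADS_FILE defaults to the real control file; overriding it is a
# hypothetical convenience for trying the function out safely.
THREADS_FILE="${THREADS_FILE:-/proc/fs/nfsd/threads}"

nfsd_unexport_and_umount() {
    export_spec="$1"   # e.g. 192.168.1.11:/mnt
    mountpoint="$2"    # e.g. /mnt

    exportfs -u "$export_spec" || return 1

    # Remember the current thread count so it can be restored.
    n=$(cat "$THREADS_FILE") || return 1

    # Writing 0 stops the nfsd threads; the original count restarts them.
    echo 0 > "$THREADS_FILE" || return 1
    echo "$n" > "$THREADS_FILE" || return 1

    umount "$mountpoint"
}

# Usage (as root):
#   nfsd_unexport_and_umount 192.168.1.11:/mnt /mnt
```

Note that this bounces every export on the server, not just the one being
unmounted, and that the sketch does not try to restore the thread count if
a write to the control file fails partway through.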

NeilBrown

>
> Thanks,
> Chris



2014-03-21 20:20:44

by J. Bruce Fields

Subject: Re: race-free exportfs and unmount?

On Fri, Mar 21, 2014 at 02:17:13PM -0600, Chris Friesen wrote:
>
> Hi,
>
> There was a linux-nfs thread in July 2012 with the subject "Linux
> NFS and cached properties". It discussed the fact that you can't
> reliably do
>
> exportfs -u 192.168.1.11:/mnt
> umount /mnt
>
> since there could be rpc users still running when exportfs returns,
> so the umount fails thinking the filesystem is busy.

There could also be clients holding opens, locks, or delegations on the
export.

> I'm running into this on a production system.
>
> Was anything ever done to resolve this issue?
> If not are there any workarounds?

You can shut down the server completely, unmount, and restart.

What is it you need to do exactly?

--b.

2014-03-22 10:18:07

by Larry Keegan

Subject: Re: race-free exportfs and unmount?

On Fri, 21 Mar 2014 14:56:11 -0600
Chris Friesen <[email protected]> wrote:
> On 03/21/2014 02:20 PM, J. Bruce Fields wrote:
> > On Fri, Mar 21, 2014 at 02:17:13PM -0600, Chris Friesen wrote:
> >>
> >> Hi,
> >>
> >> There was a linux-nfs thread in July 2012 with the subject "Linux
> >> NFS and cached properties". It discussed the fact that you can't
> >> reliably do
> >>
> >> exportfs -u 192.168.1.11:/mnt

You forgot echo /mnt > /proc/fs/nfsd/unlock_filesystem

> >> umount /mnt
> >>
> >> since there could be rpc users still running when exportfs returns,
> >> so the umount fails thinking the filesystem is busy.

This is almost always the case on an active NFS server. Stuff 'em! Just
unlock the filesystem and your drbd flip should work just fine. I've
been doing it for years.
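
Put together, the sequence looks roughly like the sketch below. The
function name and the UNLOCK_FILE override are made up for illustration
(the override only lets the function be tried against a scratch file); the
three steps are the ones quoted above with the unlock inserted between
them. As far as I understand it, the unlock drops locks that nfsd holds on
behalf of clients for that filesystem, so clients may lose them.

```shell
#!/bin/sh
# Sketch only: unexport, release nfsd-held locks, then unmount.
# UNLOCK_FILE defaults to the real control file; the override is a
# hypothetical convenience for dry runs.
UNLOCK_FILE="${UNLOCK_FILE:-/proc/fs/nfsd/unlock_filesystem}"

unexport_unlock_umount() {
    export_spec="$1"   # e.g. 192.168.1.11:/mnt
    mountpoint="$2"    # e.g. /mnt

    exportfs -u "$export_spec" || return 1

    # Ask nfsd to release what it holds on this filesystem.
    echo "$mountpoint" > "$UNLOCK_FILE" || return 1

    umount "$mountpoint"
}

# Usage (as root):
#   unexport_unlock_umount 192.168.1.11:/mnt /mnt
```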

BOFH.

2014-03-21 20:56:23

by Chris Friesen

Subject: Re: race-free exportfs and unmount?

On 03/21/2014 02:20 PM, J. Bruce Fields wrote:
> On Fri, Mar 21, 2014 at 02:17:13PM -0600, Chris Friesen wrote:
>>
>> Hi,
>>
>> There was a linux-nfs thread in July 2012 with the subject "Linux
>> NFS and cached properties". It discussed the fact that you can't
>> reliably do
>>
>> exportfs -u 192.168.1.11:/mnt
>> umount /mnt
>>
>> since there could be rpc users still running when exportfs returns,
>> so the umount fails thinking the filesystem is busy.
>
> There could also be clients holding opens, locks, or delegations on the
> export.
>
>> I'm running into this on a production system.
>>
>> Was anything ever done to resolve this issue?
>> If not are there any workarounds?
>
> You can shut down the server completely, unmount, and restart.

Just to clarify, you mean shut down the NFS server processes? As in,
"/etc/init.d/nfsserver stop"?

Currently there is another filesystem that stays exported, and doing the
above would take it down too, but I might be able to make that work if
it's the only way.

> What is it you need to do exactly?

We have two servers that act as primary/secondary for a drbd-replicated
filesystem. The primary mounts the drbd filesystem and exports it via nfs.

This is used for OpenStack, so there should be very little
contention--each compute node generally only touches the files
corresponding to the VMs that it is hosting. I don't think they would
be doing NFS locks, but I could be wrong.

On a controlled failover, we need to take down the NFS server IP
address, unexport the filesystem, unmount the drbd device, and set drbd
to secondary.

What we're seeing is that the unexport passes, but the unmount fails. A
few minutes later one of our guys manually ran "exportfs -f" and that
seemed to unblock things.
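
For reference, the controlled failover amounts to something like the
following sketch. The function name, the interface, the address, and the
drbd resource name are placeholders, not our real configuration, and the
"exportfs -f" step is only there because it appeared to help when run by
hand; I don't know yet whether it is sufficient.

```shell
#!/bin/sh
# Sketch only: controlled failover on the current primary.
# All arguments are placeholders supplied by the caller.
controlled_failover() {
    vip="$1"          # NFS service address with prefix, e.g. 192.168.1.10/24
    dev="$2"          # interface carrying that address, e.g. eth0
    export_spec="$3"  # e.g. 192.168.1.11:/mnt
    mountpoint="$4"   # e.g. /mnt
    resource="$5"     # drbd resource name, e.g. r0

    # Take down the NFS server IP address.
    ip addr del "$vip" dev "$dev" || return 1

    # Unexport, then flush the kernel's export table.
    exportfs -u "$export_spec" || return 1
    exportfs -f || return 1

    # Unmount the drbd-backed filesystem; stop here if it is still busy.
    umount "$mountpoint" || return 1

    # Demote this node so the peer can take over.
    drbdadm secondary "$resource"
}

# Usage (as root):
#   controlled_failover 192.168.1.10/24 eth0 192.168.1.11:/mnt /mnt r0
```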

Chris