Hi,
I'm wondering if I'm missing something or if this is a bug.
An NFSv4 export has active clients. The export is removed from
/etc/exports and 'exportfs -r' is run. Clients immediately start
getting 'Stale file handle' errors, but the mountpoint is still busy
and cannot be unmounted. Killing off nfsd solves the problem, but is
undesirable for obvious reasons.
This is on Debian Linux, kernel 3.10-2-amd64, with nfs-utils 1.2.8.
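Roughly, the sequence looks like the following. This is only a sketch: the export path /srv/data is hypothetical, and it operates on a scratch copy of the exports file so it is safe to run anywhere; the commented lines are what actually happens on the server.

```shell
# Work on a scratch copy of the exports file; /srv/data is a
# hypothetical export path.
EXPORTS=$(mktemp)
printf '/srv/data\t*(rw,sync)\n/srv/other\t*(rw,sync)\n' > "$EXPORTS"

# Drop the export being retired:
sed -i '\|^/srv/data|d' "$EXPORTS"
grep -c '(' "$EXPORTS"    # prints 1: only the other export remains

# On the real server the same edit is made to /etc/exports, then:
#   exportfs -r        # re-sync the kernel's export table
#   umount /srv/data   # fails with EBUSY while nfsd holds client state
```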
Thanks,
mh
--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296
On Mon, Aug 19, 2013 at 11:55:58AM -0400, Martin Hicks wrote:
> Hi Bruce,
>
> On Fri, Aug 16, 2013 at 5:12 PM, J. Bruce Fields <[email protected]> wrote:
> > On Thu, Aug 15, 2013 at 12:04:33PM -0400, Martin Hicks wrote:
> >> I'm wondering if I'm missing something or if this is a bug.
> >>
> >> A NFS4 export has active clients. The mount is removed from
> >> /etc/exports and 'exportfs -r' is run. Clients immediately start
> >> getting 'Stale file handle' errors, but the mountpoint is still busy
> >> and cannot be unmounted. Killing off nfsd solves the problem, but is
> >> undesirable for obvious reasons.
> >>
> >> On debian linux, kernel version 3.10-2-amd64, with nfs-utils 1.2.8.
> >
> > Yeah, the clients may hold opens or locks on the filesystem and those
> > don't get removed on exportfs -r.
> >
> > For now shutting down the server is the only solution.
> >
> > We could possibly fix that, or provide some other way to do whatever it
> > is you're trying to do, but it's likely not a small change.
>
> Essentially I've got a NAS with two doors that have removable disks
> behind them. I get a signal from hardware when one of the doors is
> opened, and I need to kill services, unmount and remove the block
> device very quickly so the user can remove or swap disks. I was
> trying to avoid killing nfsd so that any clients connected to the
> block device behind the other door could continue uninterrupted.
OK, understood, so you're mainly worried about access to the remaining
data continuing uninterrupted.
That said--it's *really* not a problem that the other stuff starts
erroring out immediately? I imagine the typical application isn't
going to handle the errors very gracefully.
> If this isn't possible then I need to minimize the downtime to the
> other disk. With quick experiments this morning if I simply restart
> nfs it seems to take between 60 and 90 seconds for the client to start
> doing IO again. I haven't tracked down the reason yet, but it seems
> like the server is preventing the client from doing IO for some
> time...
It's probably the grace period (which will block pretty much any IO for
clients using NFSv4).
--b.
On Mon, Aug 19, 2013 at 2:42 PM, J. Bruce Fields <[email protected]> wrote:
> On Mon, Aug 19, 2013 at 11:55:58AM -0400, Martin Hicks wrote:
>> Hi Bruce,
>> >
>> > We could possibly fix that, or provide some other way to do whatever it
>> > is you're trying to do, but it's likely not a small change.
>>
>> Essentially I've got a NAS with two doors that have removable disks
>> behind them. I get a signal from hardware when one of the doors is
>> opened, and I need to kill services, unmount and remove the block
>> device very quickly so the user can remove or swap disks. I was
>> trying to avoid killing nfsd so that any clients connected to the
>> block device behind the other door could continue uninterrupted.
>
> OK, understood, so you're mainly worried about access to the remaining
> data continuing uninterrupted.
>
> That said--it's *really* not a problem that the other stuff starts
> erroring out immediately? I imagine the typical application isn't
> going to handle the errors very gracefully.
>
This is a product where the client users of the system will be
designing us in from the beginning, so their applications have to
tolerate this type of disk swapping.
>> If this isn't possible then I need to minimize the downtime to the
>> other disk. With quick experiments this morning if I simply restart
>> nfs it seems to take between 60 and 90 seconds for the client to start
>> doing IO again. I haven't tracked down the reason yet, but it seems
>> like the server is preventing the client from doing IO for some
>> time...
>
> It's probably the grace period (which will block pretty much any IO for
> clients using NFSv4).
This is correct. NFSv3 seems to recover quite quickly if I restart
the nfs services, in about 3 seconds. I can see with Wireshark that
the server is holding off the nfsv4 clients for quite a while with
NFS4ERR_GRACE.
Setting nfsv4gracetime (once I also found fs.nfs.nlm_grace_period)
gets the drop-out time down to just above 10 seconds, which is the
minimum nfsv4gracetime. That'll have to do for now.
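For anyone following along, the knobs involved are sketched below. The names are as found in this thread; whether they exist, and their exact location, depends on kernel version, and they need to be set while nfsd is down to take effect on the next start.

```shell
# Run as root on the server, with nfsd stopped (sketch only):
echo 10 > /proc/fs/nfsd/nfsv4gracetime    # NFSv4 grace, in seconds
                                          # (10 is the minimum here)
sysctl -w fs.nfs.nlm_grace_period=10      # NLM/lockd grace for NFSv3 clients
```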
Thanks again,
mh
--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296
On Thu, Aug 15, 2013 at 12:04:33PM -0400, Martin Hicks wrote:
> I'm wondering if I'm missing something or if this is a bug.
>
> A NFS4 export has active clients. The mount is removed from
> /etc/exports and 'exportfs -r' is run. Clients immediately start
> getting 'Stale file handle' errors, but the mountpoint is still busy
> and cannot be unmounted. Killing off nfsd solves the problem, but is
> undesirable for obvious reasons.
>
> On debian linux, kernel version 3.10-2-amd64, with nfs-utils 1.2.8.
Yeah, the clients may hold opens or locks on the filesystem and those
don't get removed on exportfs -r.
For now shutting down the server is the only solution.
We could possibly fix that, or provide some other way to do whatever it
is you're trying to do, but it's likely not a small change.
Nevertheless, for future reference it would be interesting to know what
exactly you're trying to do.
--b.
On Fri, 16 Aug 2013 17:12:18 -0400 "J. Bruce Fields" <[email protected]>
wrote:
> On Thu, Aug 15, 2013 at 12:04:33PM -0400, Martin Hicks wrote:
> > I'm wondering if I'm missing something or if this is a bug.
> >
> > A NFS4 export has active clients. The mount is removed from
> > /etc/exports and 'exportfs -r' is run. Clients immediately start
> > getting 'Stale file handle' errors, but the mountpoint is still busy
> > and cannot be unmounted. Killing off nfsd solves the problem, but is
> > undesirable for obvious reasons.
> >
> > On debian linux, kernel version 3.10-2-amd64, with nfs-utils 1.2.8.
>
> Yeah, the clients may hold opens or locks on the filesystem and those
> > don't get removed on exportfs -r.
>
> For now shutting down the server is the only solution.
How far does:
echo /path/to/export > /proc/fs/nfsd/unlock_filesystem
get you? Or does that just drop 'lockd' locks and not NFSv4 locks?
NeilBrown
>
> We could possibly fix that, or provide some other way to do whatever it
> is you're trying to do, but it's likely not a small change.
>
> Nevertheless, for future reference it would be interesting to know what
> exactly you're trying to do.
>
> --b.
On Wed, Aug 21, 2013 at 12:43:43PM +1000, NeilBrown wrote:
> On Fri, 16 Aug 2013 17:12:18 -0400 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Thu, Aug 15, 2013 at 12:04:33PM -0400, Martin Hicks wrote:
> > > I'm wondering if I'm missing something or if this is a bug.
> > >
> > > A NFS4 export has active clients. The mount is removed from
> > > /etc/exports and 'exportfs -r' is run. Clients immediately start
> > > getting 'Stale file handle' errors, but the mountpoint is still busy
> > > and cannot be unmounted. Killing off nfsd solves the problem, but is
> > > undesirable for obvious reasons.
> > >
> > > On debian linux, kernel version 3.10-2-amd64, with nfs-utils 1.2.8.
> >
> > Yeah, the clients may hold opens or locks on the filesystem and those
> > don't get removed on exportfs -r.
> >
> > For now shutting down the server is the only solution.
>
> How far does:
> echo /path/to/export > /proc/fs/nfsd/unlock_filesystem
> get you? Or does that just drop 'lockd' locks and not NFSv4 locks?
Right, it just does lockd locks. It should also do NFSv4 locks, opens,
and delegations. Happy if somebody wants to finish that job off--it
probably wouldn't be too hard? Although there may be a bit of work to
get the error returns right in the v4 case--I think we'd want to keep
the relevant stateids around and return NFS4ERR_ADMIN_REVOKED when a
client continues to use them.
--b.
Hi Bruce,
On Fri, Aug 16, 2013 at 5:12 PM, J. Bruce Fields <[email protected]> wrote:
> On Thu, Aug 15, 2013 at 12:04:33PM -0400, Martin Hicks wrote:
>> I'm wondering if I'm missing something or if this is a bug.
>>
>> A NFS4 export has active clients. The mount is removed from
>> /etc/exports and 'exportfs -r' is run. Clients immediately start
>> getting 'Stale file handle' errors, but the mountpoint is still busy
>> and cannot be unmounted. Killing off nfsd solves the problem, but is
>> undesirable for obvious reasons.
>>
>> On debian linux, kernel version 3.10-2-amd64, with nfs-utils 1.2.8.
>
> Yeah, the clients may hold opens or locks on the filesystem and those
> don't get removed on exportfs -r.
>
> For now shutting down the server is the only solution.
>
> We could possibly fix that, or provide some other way to do whatever it
> is you're trying to do, but it's likely not a small change.
Essentially I've got a NAS with two doors that have removable disks
behind them. I get a signal from hardware when one of the doors is
opened, and I need to kill services, unmount and remove the block
device very quickly so the user can remove or swap disks. I was
trying to avoid killing nfsd so that any clients connected to the
block device behind the other door could continue uninterrupted.
If this isn't possible then I need to minimize the downtime to the
other disk. With quick experiments this morning if I simply restart
nfs it seems to take between 60 and 90 seconds for the client to start
doing IO again. I haven't tracked down the reason yet, but it seems
like the server is preventing the client from doing IO for some
time...
Thanks,
mh
--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296
On Wed, Aug 21, 2013 at 7:27 AM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Aug 21, 2013 at 12:43:43PM +1000, NeilBrown wrote:
>> On Fri, 16 Aug 2013 17:12:18 -0400 "J. Bruce Fields" <[email protected]>
>> wrote:
>>
>> > On Thu, Aug 15, 2013 at 12:04:33PM -0400, Martin Hicks wrote:
>> > > I'm wondering if I'm missing something or if this is a bug.
>> > >
>> > > A NFS4 export has active clients. The mount is removed from
>> > > /etc/exports and 'exportfs -r' is run. Clients immediately start
>> > > getting 'Stale file handle' errors, but the mountpoint is still busy
>> > > and cannot be unmounted. Killing off nfsd solves the problem, but is
>> > > undesirable for obvious reasons.
>> > >
>> > > On debian linux, kernel version 3.10-2-amd64, with nfs-utils 1.2.8.
>> >
>> > Yeah, the clients may hold opens or locks on the filesystem and those
>> > don't get removed on exportfs -r.
>> >
>> > For now shutting down the server is the only solution.
>>
>> How far does:
>> echo /path/to/export > /proc/fs/nfsd/unlock_filesystem
>> get you? Or does that just drop 'lockd' locks and not NFSv4 locks?
>
> Right, it just does lockd locks. It should also do NFSv4 locks, opens,
> and delegations. Happy if somebody wants to finish that job off--it
> probably wouldn't be too hard? Although there may be a bit of work to
> get the error returns right in the v4 case--I think we'd want to keep
> the relevant stateids around and return NFS4ERR_ADMIN_REVOKED when a
> client continues to use them.
I don't have the bandwidth to take this on right now, but I may in the future.
Thanks for your help,
mh
--
Martin Hicks P.Eng. | [email protected]
Bork Consulting Inc. | +1 (613) 266-2296