2021-09-22 20:58:31

by Kazi Anwar

Subject: nfs4err_delay

Hi,
We are running NFS v4.1 on CentOS 7.6.
We are seeing an NFS issue where, when files/dirs are deleted from a
client, the deletion occasionally gets stuck in the unlinkat system call
(according to strace, it is stuck for 100.5 secs every time). Can anyone
explain this behavior?
Running tcpdump shows an NFS4ERR_DELAY status sent from the server to
the stuck client.

--
Kazi Anwar


2021-09-24 21:45:49

by J. Bruce Fields

Subject: Re: nfs4err_delay

On Wed, Sep 22, 2021 at 03:56:23PM -0500, Kazi Anwar wrote:
> We are running NFS v4.1 on CentOS 7.6.
> We are seeing an NFS issue where, when files/dirs are deleted from a
> client, the deletion occasionally gets stuck in the unlinkat system call
> (according to strace, it is stuck for 100.5 secs every time). Can anyone
> explain this behavior?
> Running tcpdump shows an NFS4ERR_DELAY status sent from the server to
> the stuck client.

Client and server are both centos 7.6?

Is the NFS4ERR_DELAY a response to a REMOVE?

Does /proc/locks show a delegation held on the file the client's trying
to remove?
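
If it helps, here is a rough sketch of how the /proc/locks output on the
server could be filtered for delegations. It assumes the usual column
layout (id, type, state, access, pid, major:minor:inode, start, end) and
that delegations show up with a type of DELEG or, on some kernels, LEASE;
the function and field names are made up for illustration.

```python
from typing import List, NamedTuple


class LockEntry(NamedTuple):
    kind: str     # "DELEG" or "LEASE" (assumed labels, may vary by kernel)
    state: str    # e.g. "ACTIVE" or "BREAKING"
    access: str   # e.g. "READ" or "WRITE"
    pid: int
    device: str   # "major:minor" as printed in /proc/locks
    inode: int


def find_delegations(locks_text: str) -> List[LockEntry]:
    """Return the lease/delegation entries found in /proc/locks-style text."""
    entries = []
    for line in locks_text.splitlines():
        fields = line.split()
        # Skip short lines and blocked-waiter lines, which look like "N: -> ...".
        if len(fields) < 6 or fields[1] == "->":
            continue
        kind = fields[1]
        if kind not in ("DELEG", "LEASE"):
            continue
        major, minor, inode = fields[5].split(":")
        entries.append(
            LockEntry(kind, fields[2], fields[3], int(fields[4]),
                      f"{major}:{minor}", int(inode))
        )
    return entries
```

On the server, something like find_delegations(open("/proc/locks").read())
would give the list; the inode numbers can then be compared with the
output of "ls -i" for the file being removed.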

--b.

2021-09-24 21:52:38

by Kazi Anwar

Subject: Re: nfs4err_delay

Yes, both clients and server are centos 7.6. And the NFS4ERR_DELAY is
a response to a REMOVE.
I will need to check on the locks the next time it happens. Can you
share what you are thinking?

thanks,
Kazi


2021-09-24 22:13:58

by J. Bruce Fields

Subject: Re: nfs4err_delay

On Fri, Sep 24, 2021 at 12:54:30PM -0500, Kazi Anwar wrote:
> Yes, both clients and server are centos 7.6. And the NFS4ERR_DELAY is
> a response to a REMOVE.
> I will need to check on the locks the next time it happens. Can you
> share what you are thinking?

Offhand, the only reason I can think of that a server would return DELAY
is that there's a delegation on the file being removed, and the
delegation recall and return isn't working for some reason.

If that's the case, it should succeed after about 90 seconds. Also, you
can work around the problem by turning off delegations and leases on the
server with "echo 0 >/proc/sys/fs/leases_enable".
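
As a back-of-the-envelope check on the 100.5 seconds from strace, here is
a small sketch. It assumes the client retries after NFS4ERR_DELAY with a
delay that starts at 0.1s, doubles each time, and is capped at 15s, and
that the server gives up on the delegation after about 90 seconds. Those
numbers are assumptions about the client and server defaults, not
something taken from your trace, and round-trip times are ignored.

```python
def first_retry_after(deadline_ms: int,
                      retry_min_ms: int = 100,
                      retry_max_ms: int = 15_000) -> int:
    """Time (ms) of the first retry sent at or after deadline_ms.

    Models a capped exponential backoff: wait retry_min_ms, then double
    the wait after every retry, never waiting longer than retry_max_ms.
    The default values are assumed, not measured.
    """
    elapsed = 0
    delay = retry_min_ms
    while elapsed < deadline_ms:
        elapsed += delay
        delay = min(delay * 2, retry_max_ms)
    return elapsed


# Assumed 90-second wait before the server stops returning DELAY.
print(first_retry_after(90_000) / 1000)  # prints 100.5
```

Under those assumptions the retries land at 0.1, 0.3, 0.7, ... 25.5, 40.5,
55.5, 70.5, 85.5 and 100.5 seconds, so the first one past the 90-second
mark would be at 100.5s, which would be consistent with what strace shows.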

--b.
