2015-11-13 09:22:54

by Christoph Hellwig

Subject: lock reclaim failed!

I see the following when running xfstests on the latest Linus tree against
a server from the same tree:

generic/089 [ 2000.358405] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[ 2001.769939] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[ 2001.770732] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[ 2122.608136] NFS: nfs4_reclaim_open_state: Lock reclaim failed!



2015-11-13 15:47:33

by Olga Kornievskaia

Subject: Re: lock reclaim failed!

On Fri, Nov 13, 2015 at 4:22 AM, Christoph Hellwig <[email protected]> wrote:
> I see the following when running xfstests on the latest Linus tree against
> a server from the same tree:
>
> generic/089 [ 2000.358405] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> [ 2001.769939] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> [ 2001.770732] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> [ 2122.608136] NFS: nfs4_reclaim_open_state: Lock reclaim failed!

Against a NetApp server, we have seen this error logged when the server
returned BAD_STATEID to I/O that used the delegation stateid. Is there
such an error in the network trace?

2015-11-14 00:59:04

by Jeff Layton

Subject: Re: lock reclaim failed!

On Fri, 13 Nov 2015 10:47:32 -0500
Olga Kornievskaia <[email protected]> wrote:

> On Fri, Nov 13, 2015 at 4:22 AM, Christoph Hellwig <[email protected]> wrote:
> > I see the following when running xfstests on the latest Linus tree against
> > a server from the same tree:
> >
> > generic/089 [ 2000.358405] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > [ 2001.769939] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > [ 2001.770732] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > [ 2122.608136] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
>
> Against a NetApp server, we have seen this error logged when the server
> returned BAD_STATEID to I/O that used the delegation stateid. Is there
> such an error in the network trace?

Yes, a capture is probably the best way to tell what's really going on.

I just tried generic/089 a couple of times on the Fedora kernel
here (on both server and client) and didn't see that message pop up:

http://koji.fedoraproject.org/koji/buildinfo?buildID=699182

What's the top commit of your kernel? I assume that the server didn't
restart or anything, right?

--
Jeff Layton <[email protected]>

2015-11-17 12:00:10

by Christoph Hellwig

Subject: Re: lock reclaim failed!

I've tried for a couple of days to reproduce this issue, but failed.
Looks like this was just a one-off, sorry for the noise.