Date: Thu, 18 Apr 2019 16:43:56 -0400
From: Scott Mayhew
To: Trond Myklebust
Cc: "linux-nfs@vger.kernel.org"
Subject: Re: Question about open(CLAIM_FH)
Message-ID: <20190418204356.GA15226@coeurl.usersys.redhat.com>
References: <20190418133728.GS3773@coeurl.usersys.redhat.com>
 <213d4ead8a7ae890dadc7785d59117e798f94748.camel@hammerspace.com>
In-Reply-To: <213d4ead8a7ae890dadc7785d59117e798f94748.camel@hammerspace.com>
User-Agent: Mutt/1.11.3 (2019-02-01)

On Thu, 18 Apr 2019, Trond Myklebust wrote:

> Hi Scott,
> 
> On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > When the client does an open(CLAIM_FH) and the server already has
> > open state for that open owner and file, what's supposed to happen?
> > Currently the server returns the existing stateid with the seqid
> > bumped, but it looks like the client is expecting a new stateid
> > (I'm seeing the state manager spending a lot of time waiting in
> > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being
> > set in the state flags by nfs_need_update_open_stateid()).
> > 
> > Looking at rfc5661 section 18.16.3, I see:
> > 
> > | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> > |                      | and there is no previous state associated  |
> > |                      | with the file for the client.  With        |
> > |                      | CLAIM_NULL, the file is identified by the  |
> > |                      | current filehandle and the specified       |
> > |                      | component name.  With CLAIM_FH (new to     |
> > |                      | NFSv4.1), the file is identified by just   |
> > |                      | the current filehandle.                    |
> > 
> > So it seems like maybe the server should be tossing the old state
> > and returning a new stateid?
> 
> No. As far as the protocol is concerned, the only difference between
> CLAIM_NULL and CLAIM_FH is how the client identifies the file (in the
> first case, through an implicit lookup, and in the second case
> through a file handle). The client should be free to intermix the two
> types of OPEN, and it should expect the resulting stateids to depend
> only on whether or not the open_owner matches. If the open_owner
> matches an existing stateid, then that stateid is bumped and
> returned.
> 
> I'm not aware of any expectation in the client that this should not
> be the case, so if you are seeing different behaviour, then something
> else must be at work here. Is the client perhaps mounting the same
> filesystem in two different places in such a way that the super block
> is not being shared?

No, it's just a single 4.1 mount with the default mount options.

For a bit of background, I've been trying to track down a problem in
RHEL where the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag is getting
permanently set because the nfs4_client->cl_revoked list on the server
is non-empty... yet there's no longer open state on the client.

I can reproduce it pretty easily in RHEL using 2 VMs, each with 2-4
CPUs and 4-8G of memory.  The server has 64 nfsd threads and a 15
second lease time.  On the client I'm running the following to add a
10ms delay to CB_RECALL replies:

# stap -gve 'global count = 0; probe module("nfsv4").function("nfs4_callback_recall") { printf("%s: %d\n", ppfunc(), ++count); mdelay(10); }'

then in another window I open a bunch of files:

# for i in `seq -w 1 5000`; do sleep 2m < /export/dir1/file.$i & done

At that point, any further SEQUENCE ops will have the recallable state
revoked flag set on the client until the fs is unmounted.

If I run the same steps on Fedora clients with recent kernels, I don't
have the problem with the recallable state revoked flag, but I'm
getting some other strangeness.  Everything starts out fine with
nfs_reap_expired_delegations() doing TEST_STATEID and FREE_STATEID,
but once the state manager starts calling nfs41_open_expired(), things
sort of grind to a halt and I see 1 OPEN and 1 or 2 TEST_STATEID ops
every 5 seconds in wireshark.  It stays that way until the files are
closed on the client, when I see a slew of DELEGRETURNs and
FREE_STATEIDs... but I'm only seeing 3 or 4 CLOSE ops.  If I poke
around in crash on the server, I see a ton of open stateids:

crash> epython fs/nfsd/print-client-state-info.py
nfsd_net = 0xffff93e473511000
nfs4_client = 0xffff93e3f7954980
nfs4_stateowner = 0xffff93e4058cc360
num_stateids = 4997        <---- only 3 CLOSE ops were received
num_openowners = 1
num_layouts = 0
num_delegations = 0
num_sessions = 1
num_copies = 0
num_revoked = 0
cl_cb_waitq_qlen = 0

Those stateids stick around until the fs is unmounted (and the
DESTROY_CLIENTID ops return NFS4ERR_CLIENTID_BUSY while doing so).

Both VMs are running 5.0.6-200.fc29.x86_64, but the server also has
the "nfsd: Don't release the callback slot unless it was actually
held" patch you sent a few weeks ago as well as the "nfsd: CB_RECALL
can race with FREE_STATEID" patch I sent today.

-Scott

> 
> Cheers
>   Trond
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
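
P.S. In case anyone wants to try this, here's the client side of the
reproducer rolled into a single script.  It's just the two commands
above glued together (the mount point, file count, 10ms delay, and the
sleep/wait plumbing are only what I happened to use, and the server
side, 64 nfsd threads and a 15 second lease, isn't scripted), so treat
it as a rough sketch rather than anything polished:

    #!/bin/bash
    # Client-side reproducer (sketch).  Assumes the server export is
    # mounted on the client and /export/dir1 already contains
    # file.0001 .. file.5000.

    # Add a 10ms delay to every CB_RECALL the client handles.
    stap -gve 'global count = 0;
        probe module("nfsv4").function("nfs4_callback_recall") {
            printf("%s: %d\n", ppfunc(), ++count); mdelay(10);
        }' &
    stap_pid=$!
    sleep 10    # give systemtap time to build and insert its module

    # Open 5000 files and hold each one open for 2 minutes.
    pids=()
    for i in $(seq -w 1 5000); do
        sleep 2m < /export/dir1/file.$i &
        pids+=($!)
    done

    wait "${pids[@]}"    # the files are closed when the sleeps exit
    kill "$stap_pid"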