2021-12-13 03:39:33

by [email protected]

[permalink] [raw]
Subject: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

Hi,
  While running our recent nfs related tests, we found a delegation regression by nfstests[1]:

======================================================================
/root 10:49# ./test.sh
nfsd       sock fh export svc proc fileop auth repcache xdr lockd
    INFO: 10:49:21.015795 - SYSTEM: Linux btrfs 5.16.0-rc5-custom #41 SMP PREEMPT Mon Dec 13 10:47:00 CST 2021 x86_64
    DBG1: 10:49:21.016102 - Get routing info: /usr/bin/ip route get 192.168.122.77
    DBG7: 10:49:21.029015 - SETUP starts
    DBG3: 10:49:21.029135 - Sync all buffers to disk
    DBG2: 10:49:21.912538 - Unmount volume: umount -f /mnt/t
    DBG2: 10:49:21.918442 - Mount volume: mount -o vers=4.2,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 192.168.122.77:/nfstest /mnt/t
    DBG5: 10:49:22.227222 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 10:49:22.235422 -     NFS version of mount point: 4.2
    DBG3: 10:49:22.235579 - Creating file [/mnt/t/nfstest_delegation_20211213_104921_f_001] 65536@0
    DBG3: 10:49:22.305370 - Creating file [/mnt/t/nfstest_delegation_20211213_104921_f_002] 65536@0
    DBG3: 10:49:22.367123 - Sync all buffers to disk
    DBG2: 10:49:22.382804 - Unmount volume: umount -f /mnt/t
    DBG7: 10:49:22.427507 - SETUP done
    TIME: 1.416896s
*** Basic READ delegation test
    TEST: Running test 'basic01'
    DBG3: 10:49:22.430398 - Sync all buffers to disk
    DBG2: 10:49:22.439226 - Unmount volume: umount -f /mnt/t
    DBG2: 10:49:22.444660 - Start remote procedure server locally
    DBG2: 10:49:22.552758 - Trace start: /usr/bin/tcpdump -i lo -n -B 196608 -s 0 -w /tmp/nfstest_delegation_20211213_104921_001.cap host 192.168.122.77
    DBG2: 10:49:22.629611 - Mount volume: mount -o vers=4.2,proto=tcp,sec=sys,hard,rsize=4096,wsize=4096 192.168.122.77:/nfstest /mnt/t
    DBG5: 10:49:22.758927 - Get the actual NFS version of mount point: findmnt /mnt/t
    DBG6: 10:49:22.766713 -     NFS version of mount point: 4.2
    DBG4: 10:49:22.766848 - Open /mnt/t/nfstest_delegation_20211213_104921_f_001 so open owner sticks around
    DBG2: 10:49:22.779669 - Open file for READ [/mnt/t/nfstest_delegation_20211213_104921_f_002]
    PASS: Open file for READ should succeed
    DBG3: 10:49:22.788609 - Read file [/mnt/t/nfstest_delegation_20211213_104921_f_002]
    PASS: Read file should succeed
    DBG2: 10:49:22.911668 - Open file for READ on same process [/mnt/t/nfstest_delegation_20211213_104921_f_002]
    PASS: Open file for READ on same process should succeed
    DBG3: 10:49:22.918108 - Read file on same process [/mnt/t/nfstest_delegation_20211213_104921_f_002]
    PASS: Read file on same process should succeed
    DBG2: 10:49:23.028735 - Open file for READ from a different process [/mnt/t/nfstest_delegation_20211213_104921_f_002]
    PASS: Open file for READ from a different process should succeed
    DBG3: 10:49:23.035165 - Read file from a different process [/mnt/t/nfstest_delegation_20211213_104921_f_002]
    PASS: Read file from a different process should succeed
    DBG3: 10:49:23.144099 - Sync all buffers to disk
    DBG2: 10:49:23.161263 - Unmount volume: umount -f /mnt/t
    DBG2: 10:49:23.224145 - Trace stop
    DBG2: 10:49:33.231485 - Stopping packet trace capture: stopping process 1550
    DBG1: 10:49:33.311940 - trace_open [/tmp/nfstest_delegation_20211213_104921_001.cap]
    FAIL: OPEN should be sent
    PASS: OPEN should be sent with CLAIM_NULL
    PASS: OPEN should be sent with the name of the file to be opened
    PASS: OPEN should be sent with the filehandle of the directory
    FAIL: READ delegation should be granted
    TIME: 10.946048s
=============================================================
/root 10:51# cat test.sh
dmesg -C
rpcdebug -m nfsd -s all
export PYTHONPATH=/root/nfstest/
cd /root/nfstest/test
./nfstest_delegation --nfsversion=4.2 -e /nfstest --server  192.168.122.77  --trcdelay 10 --runtest basic01 -v debug

rpcdebug -m nfsd -c all
=============================================================
The output of dmesg is on link[2].

The test first opens and creates file1 and file2. Does umount and mount again,
opens the files with O_RDONLY, and tries to get a READ delegation from the server.
However, the reply sever sent did not contain a delegation so it failed.
The test fails with kernel version v5.14..v5.16-rc5.

Please correct me if something below is wrong :)
After some investigations,  I realized there is nfsd file cache mechanism since
the commit 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd").
The last refcount of one dentry and the inode is always held by the nfs file lru list.
The refcount will be freed when nfsd_file_gc() is called.  And the times to call it
are 1) NFSD_LAUNDRETTE_DELAY elapsed. 2) nfsd is shutting down.

In the above scenario,  the client closed the file just created which triggered
nfsd_file_put() to queue nfsd_filecache_laundrette and umountd.
However, before the cleanup start (in NFSD_LAUNDRETTE_DELAY), the client
mounted and opened the file again. nfsd found the dentry because nfsd lru list entry
is still holding it. And nfsd4_check_conflicting_opens() refused to give a delegation
(-EAGAIN) because inode->i_writecount is 1 even @fp->fi_fds[O_RDONLY] is not NULL.

And the server won't give any READ delegation until the second umount (nfsd_file_free()
called by the nfsd_file_gc_worker()).

So I wonder if it's a real regression/bug or an expected "cost" of speedup on
nfsd file caches?


[1]: http://git.linux-nfs.org/?p=mora/nfstest.git;a=summary
[2]: https://pastebin.com/w8eBF5Qi

Thanks
--
Su


2021-12-13 21:55:53

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

On Mon, Dec 13, 2021 at 03:39:26AM +0000, [email protected] wrote:
> So I wonder if it's a real regression/bug or an expected "cost" of speedup on
> nfsd file caches?

Without looking at this case in detail:

Delegations are granted at the server's discretion, so this certainly
isn't a bug.

It might be suboptimal behavior. If there's evidence that this causes
significant regressions on some important workload, then we'd want to
look into fixing it.

--b.

2021-12-14 00:58:19

by [email protected]

[permalink] [raw]
Subject: Re: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

Thanks for your quick reply!

>>Without looking at this case in detail:

>>Delegations are granted at the server's discretion, so this certainly
>>isn't a bug.

Got it.

>>It might be suboptimal behavior. If there's evidence that this causes
>>significant regressions on some important workload, then we'd want to
>>look into fixing it.

If I understand the case correctly, the most common workload it influences like:

One nfs client opens a file with flag O_WRONLY/O_RDWR, close it.
Then some nfs clients open the file with O_RDONLY right now then it will prevent
server to give any delegation to other clients. It may cause many unnecessary
requests from clients because lack of delegations.

--
Su
>>--b.

________________________________________
From: J. Bruce Fields <[email protected]>
Sent: Tuesday, December 14, 2021 5:55
To: Su, Yue/?? Խ
Cc: [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

On Mon, Dec 13, 2021 at 03:39:26AM +0000, [email protected] wrote:
> So I wonder if it's a real regression/bug or an expected "cost" of speedup on
> nfsd file caches?

Without looking at this case in detail:

Delegations are granted at the server's discretion, so this certainly
isn't a bug.

It might be suboptimal behavior. If there's evidence that this causes
significant regressions on some important workload, then we'd want to
look into fixing it.

--b.

2021-12-14 16:45:09

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

On Tue, Dec 14, 2021 at 12:50:53AM +0000, [email protected] wrote:
> Thanks for your quick reply!
>
> >>Without looking at this case in detail:
>
> >>Delegations are granted at the server's discretion, so this certainly
> >>isn't a bug.
>
> Got it.
>
> >>It might be suboptimal behavior. If there's evidence that this causes
> >>significant regressions on some important workload, then we'd want to
> >>look into fixing it.
>
> If I understand the case correctly, the most common workload it influences like:
>
> One nfs client opens a file with flag O_WRONLY/O_RDWR, close it.
> Then some nfs clients open the file with O_RDONLY right now then it will prevent
> server to give any delegation to other clients. It may cause many unnecessary
> requests from clients because lack of delegations.

Right.

For the moment, this is something I'd accept patches for, but I'm not
actively working on.

I think it's been suggested that we could even turn off the file cache
completely in the v4 case, since in that case we don't have to re-open
on every IO.

--b.

2021-12-14 17:19:57

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

On Tue, 2021-12-14 at 11:45 -0500, J. Bruce Fields wrote:
> On Tue, Dec 14, 2021 at 12:50:53AM +0000, [email protected] wrote:
> > Thanks for your quick reply!
> >
> > > > Without looking at this case in detail:
> >
> > > > Delegations are granted at the server's discretion, so this certainly
> > > > isn't a bug.
> >
> > Got it.
> >
> > > > It might be suboptimal behavior. If there's evidence that this causes
> > > > significant regressions on some important workload, then we'd want to
> > > > look into fixing it.
> >
> > If I understand the case correctly, the most common workload it influences like:
> >
> > One nfs client opens a file with flag O_WRONLY/O_RDWR, close it.
> > Then some nfs clients open the file with O_RDONLY right now then it will prevent
> > server to give any delegation to other clients. It may cause many unnecessary
> > requests from clients because lack of delegations.
>
> Right.
>
> For the moment, this is something I'd accept patches for, but I'm not
> actively working on.
>
> I think it's been suggested that we could even turn off the file cache
> completely in the v4 case, since in that case we don't have to re-open
> on every IO.
>

Is that really desirable behavior? There is the bloom filter in
nfs4state.c too that prevents it from handing out a delegation too soon
after a delegrecall.

The situation above doesn't involve a recall, but it _could_ have if the
timing had been a little different. It's probably worth thinking about
how the rules for this ought to work in all cases.

Should we be treating inodes that experience real delegation recalls
differently from this case?
--
Jeff Layton <[email protected]>

2021-12-14 17:24:55

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [bug report] nfs clients fail to get read delegations after file open with O_RDWR

On Tue, Dec 14, 2021 at 12:19:54PM -0500, Jeff Layton wrote:
> On Tue, 2021-12-14 at 11:45 -0500, J. Bruce Fields wrote:
> > On Tue, Dec 14, 2021 at 12:50:53AM +0000, [email protected] wrote:
> > > If I understand the case correctly, the most common workload it influences like:
> > >
> > > One nfs client opens a file with flag O_WRONLY/O_RDWR, close it.
> > > Then some nfs clients open the file with O_RDONLY right now then it will prevent
> > > server to give any delegation to other clients. It may cause many unnecessary
> > > requests from clients because lack of delegations.
...
> Is that really desirable behavior? There is the bloom filter in
> nfs4state.c too that prevents it from handing out a delegation too soon
> after a delegrecall.
>
> The situation above doesn't involve a recall, but it _could_ have if the
> timing had been a little different. It's probably worth thinking about
> how the rules for this ought to work in all cases.
>
> Should we be treating inodes that experience real delegation recalls
> differently from this case?

It's not hard to imagine common cases where clients would want to read a
file soon after it's written. E.g. a distributed compile or processing
pipeline where clients are sitting there waiting for new files to appear
and then doing further processing on them when they do. I guess the
creator would do something in that case to indicate when the writing was
done--maybe rename the file to a common directory, or send some kind of
signal outside of NFS.

--b.