2021-04-14 07:43:33

by J. Bruce Fields

Subject: generic/430 COPY/delegation caching regression

generic/430 started failing in 5.12-rc3, as of 7c1d1dcc24b3 "nfsd: grant
read delegations to clients holding writes".

Looks like that reintroduced the problem fixed by 16abd2a0c124 "NFSv4.2:
fix client's attribute cache management for copy_file_range": the client
needs to invalidate its cache of the destination of a copy even when it
holds a delegation.

--b.


2021-04-14 12:18:03

by Trond Myklebust

Subject: Re: generic/430 COPY/delegation caching regression

On Tue, 2021-04-13 at 19:19 -0400, J. Bruce Fields wrote:
> generic/430 started failing in 5.12-rc3, as of 7c1d1dcc24b3 "nfsd:
> grant
> read delegations to clients holding writes".
>
> Looks like that reintroduced the problem fixed by 16abd2a0c124
> "NFSv4.2:
> fix client's attribute cache management for copy_file_range": the
> client
> needs to invalidate its cache of the destination of a copy even when it
> holds a delegation.
>
> --b.

Hmm.. The only thing I see that could be causing an issue is the fact
that we're relying on cache invalidation to change the file size.

        nfs_set_cache_invalid(
                dst_inode, NFS_INO_REVAL_PAGECACHE | NFS_INO_REVAL_FORCED |
                           NFS_INO_INVALID_SIZE | NFS_INO_INVALID_ATTR |
                           NFS_INO_INVALID_DATA);

The only problem there is that nfs_set_cache_invalid() will clobber the
NFS_INO_INVALID_SIZE because if we hold a delegation, then our client
is the sole authority for the size attribute (hence we don't allow it
to be invalidated). We therefore expect a call to i_size_write(), if
the file size grew.
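
For reference, the delegation check in nfs_set_cache_invalid() looks roughly
like this (paraphrased from fs/nfs/inode.c; a simplified illustration, not
the exact upstream code):

        void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
        {
                struct nfs_inode *nfsi = NFS_I(inode);
                bool have_delegation =
                        NFS_PROTO(inode)->have_delegation(inode, FMODE_READ);

                if (have_delegation) {
                        if (!(flags & NFS_INO_REVAL_FORCED))
                                flags &= ~NFS_INO_INVALID_OTHER;
                        /*
                         * The delegation makes this client authoritative for
                         * the change attribute and the size, so those bits
                         * are stripped even when NFS_INO_REVAL_FORCED is set.
                         */
                        flags &= ~(NFS_INO_INVALID_CHANGE |
                                   NFS_INO_INVALID_SIZE |
                                   NFS_INO_REVAL_PAGECACHE);
                } else if (flags & NFS_INO_REVAL_PAGECACHE)
                        flags |= NFS_INO_INVALID_CHANGE | NFS_INO_INVALID_SIZE;

                nfsi->cache_validity |= flags;
        }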

Otherwise, the setting of NFS_INO_INVALID_DATA should be redundant
because we've already punched a hole with truncate_pagecache_range().
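
In other words the COPY reply handling would need to do something along these
lines (just a sketch of the direction, with a made-up helper name, not a
tested patch):

        /* Hypothetical helper: bring the delegated destination inode up to
         * date after a successful COPY of 'copied' bytes at offset 'pos'. */
        static void nfs42_copy_update_dst(struct inode *dst_inode,
                                          loff_t pos, size_t copied)
        {
                loff_t newsize = pos + copied;

                /* Drop any cached pages covering the range we just wrote. */
                truncate_pagecache_range(dst_inode, pos, newsize - 1);

                spin_lock(&dst_inode->i_lock);
                /* Holding the delegation, we are the authority for the size,
                 * so grow it locally rather than revalidating it. */
                if (newsize > i_size_read(dst_inode))
                        i_size_write(dst_inode, newsize);
                nfs_set_cache_invalid(dst_inode, NFS_INO_INVALID_ATTR |
                                                 NFS_INO_INVALID_DATA);
                spin_unlock(&dst_inode->i_lock);
        }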

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-04-14 12:21:19

by Kornievskaia, Olga

Subject: Re: generic/430 COPY/delegation caching regression



On 4/13/21, 7:20 PM, "J. Bruce Fields" <[email protected]> wrote:

generic/430 started failing in 5.12-rc3, as of 7c1d1dcc24b3 "nfsd: grant
read delegations to clients holding writes".

Looks like that reintroduced the problem fixed by 16abd2a0c124 "NFSv4.2:
fix client's attribute cache management for copy_file_range": the client
needs to invalidate its cache of the destination of a copy even when it
holds a delegation.

[olga] I'm confused: what client version are you testing, and against what server? I haven't seen generic/430 failing while testing upstream versions against upstream server versions. What should I try (as in, what client version against what server version) to reproduce the failure?

--b.

2021-04-14 16:14:40

by J. Bruce Fields

Subject: Re: generic/430 COPY/delegation caching regression

On Wed, Apr 14, 2021 at 03:30:19AM +0000, Kornievskaia, Olga wrote:
> On 4/13/21, 7:20 PM, "J. Bruce Fields" <[email protected]> wrote:
> generic/430 started failing in 5.12-rc3, as of 7c1d1dcc24b3 "nfsd: grant
> read delegations to clients holding writes".
>
> Looks like that reintroduced the problem fixed by 16abd2a0c124 "NFSv4.2:
> fix client's attribute cache management for copy_file_range": the client
> needs to invalidate its cache of the destination of a copy even when it
> holds a delegation.
>
> [olga] I'm confused: what client version are you testing, and against what server? I haven't seen generic/430 failing while testing upstream versions against upstream server versions. What should I try (as in, what client version against what server version) to reproduce the failure?

You can reproduce it with client and server both on rc3.

(In more detail: you need a client with 7c1d1dcc24b3, but a server that
doesn't yet have 6ee65a773096 "Revert "nfsd4: a client's own opens
needn't prevent delegation"".

I have a patch that will restore the server's ability to grant
delegations to clients with write opens, but this regression was one of
the problems I ran across in testing....)

--b.

2021-04-14 16:25:58

by J. Bruce Fields

Subject: Re: generic/430 COPY/delegation caching regression

On Wed, Apr 14, 2021 at 03:09:18AM +0000, Trond Myklebust wrote:
> On Tue, 2021-04-13 at 19:19 -0400, J. Bruce Fields wrote:
> > generic/430 started failing in 5.12-rc3, as of 7c1d1dcc24b3 "nfsd:
> > grant
> > read delegations to clients holding writes".
> >
> > Looks like that reintroduced the problem fixed by 16abd2a0c124
> > "NFSv4.2:
> > fix client's attribute cache management for copy_file_range": the
> > client
> > needs to invalidate its cache of the destination of a copy even when it
> > holds a delegation.
> >
> > --b.
>
> Hmm.. The only thing I see that could be causing an issue is the fact
> that we're relying on cache invalidation to change the file size.
>
> nfs_set_cache_invalid(
> dst_inode, NFS_INO_REVAL_PAGECACHE | NFS_INO_REVAL_FORCED |
> NFS_INO_INVALID_SIZE | NFS_INO_INVALID_ATTR |
> NFS_INO_INVALID_DATA);
>
> The only problem there is that nfs_set_cache_invalid() will clobber the
> NFS_INO_INVALID_SIZE because if we hold a delegation, then our client
> is the sole authority for the size attribute (hence we don't allow it
> to be invalidated). We therefore expect a call to i_size_write(), if
> the file size grew.
>
> Otherwise, the setting of NFS_INO_INVALID_DATA should be redundant
> because we've already punched a hole with truncate_pagecache_range().

Looks like it's just copying a file and finding the destination still empty;
expected/actual output diff from xfstests is:

e11fbace556cba26bf0076e74cab90a3 TEST_DIR/test-430/file
e11fbace556cba26bf0076e74cab90a3 TEST_DIR/test-430/copy
Copy beginning of original file
+cmp: EOF on /mnt/test-430/beginning which is empty
md5sums after copying beginning:
e11fbace556cba26bf0076e74cab90a3 TEST_DIR/test-430/file
-cabe45dcc9ae5b66ba86600cca6b8ba8 TEST_DIR/test-430/beginning

The test script there is:

echo "Create the original file and then copy"
$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 1000' $testdir/file >> $seqres.full 2>&1
$XFS_IO_PROG -f -c 'pwrite -S 0x62 1000 1000' $testdir/file >> $seqres.full 2>&1
$XFS_IO_PROG -f -c 'pwrite -S 0x63 2000 1000' $testdir/file >> $seqres.full 2>&1
$XFS_IO_PROG -f -c 'pwrite -S 0x64 3000 1000' $testdir/file >> $seqres.full 2>&1
$XFS_IO_PROG -f -c 'pwrite -S 0x65 4000 1000' $testdir/file >> $seqres.full 2>&1
$XFS_IO_PROG -f -c "copy_range $testdir/file" "$testdir/copy"
cmp $testdir/file $testdir/copy
echo "Original md5sums:"
md5sum $testdir/{file,copy} | _filter_test_dir

echo "Copy beginning of original file"
$XFS_IO_PROG -f -c "copy_range -l 1000 $testdir/file" "$testdir/beginning"
cmp -n 1000 $testdir/file $testdir/beginning
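
For completeness, the failing step boiled down to a small standalone C
reproducer (an untested sketch; run it with the working directory on the NFS
mount, then check the size of "beginning"):

        /* Write a 5000-byte patterned "file", then copy_file_range() its
         * first 1000 bytes into a new "beginning" and report the size the
         * client thinks the destination has. */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/stat.h>
        #include <unistd.h>

        int main(void)
        {
                char buf[1000];
                off64_t in = 0, out = 0;
                struct stat st;
                int src, dst, i;

                src = open("file", O_RDWR | O_CREAT | O_TRUNC, 0644);
                dst = open("beginning", O_RDWR | O_CREAT | O_TRUNC, 0644);
                if (src < 0 || dst < 0) {
                        perror("open");
                        return 1;
                }
                for (i = 0; i < 5; i++) {       /* 0x61..0x65, 1000 bytes each */
                        memset(buf, 'a' + i, sizeof(buf));
                        if (write(src, buf, sizeof(buf)) != sizeof(buf)) {
                                perror("write");
                                return 1;
                        }
                }
                if (copy_file_range(src, &in, dst, &out, 1000, 0) != 1000) {
                        perror("copy_file_range");
                        return 1;
                }
                /* On the broken client the cached size stays 0 and a
                 * subsequent read/cmp sees EOF immediately. */
                if (fstat(dst, &st) == 0)
                        printf("beginning: size %lld\n", (long long)st.st_size);
                return 0;
        }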

If the client is just failing to notice that a newly created file's size has
grown as the result of a COPY, then I wonder why the first copy (of "file" to
"copy") didn't also fail.

--b.