2022-11-22 06:13:20

by Zorro Lang

Subject: Re: [PATCH] nfs: test files written size as expected

On Mon, Nov 21, 2022 at 11:37:16AM -0800, Darrick J. Wong wrote:
> On Tue, Nov 22, 2022 at 02:47:45AM +0800, Zorro Lang wrote:
> > On Sat, Nov 05, 2022 at 11:23:29AM +0800, Zorro Lang wrote:
> > > Test nfs and its underlying fs, make sure the file size is as expected
> > > after writing a file, and that the speculative allocation space can
> > > be shrunk.
> > >
> > > Signed-off-by: Zorro Lang <[email protected]>
> > > ---
> > >
> > > Hi,
> > >
> > > The original bug reproducer is:
> > > 1. mount nfs3 backed by xfs
> > > 2. dd if=/dev/zero of=/nfs/10M bs=1M count=10
> > > 3. du -sh /nfs/10M
> > > 16M /nfs/10M
> > >
> > > As this was an xfs issue, cc linux-xfs@ to get review.
> > >
> > > Thanks,
> > > Zorro
> > >
> > > tests/nfs/002 | 43 +++++++++++++++++++++++++++++++++++++++++++
> > > tests/nfs/002.out | 2 ++
> > > 2 files changed, 45 insertions(+)
> > > create mode 100755 tests/nfs/002
> > > create mode 100644 tests/nfs/002.out
> > >
> > > diff --git a/tests/nfs/002 b/tests/nfs/002
> > > new file mode 100755
> > > index 00000000..3d29958d
> > > --- /dev/null
> > > +++ b/tests/nfs/002
> > > @@ -0,0 +1,43 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2022 Red Hat, Inc. All Rights Reserved.
> > > +#
> > > +# FS QA Test 002
> > > +#
> > > +# Make sure nfs gets the expected file size after writing a big file. It's
> > > +# not only testing nfs, but also its underlying fs. For example, a known old
> > > +# bug on xfs (the underlying fs) caused nfs to get a larger file size (e.g. 16M)
> > > +# after writing 10M of data to a file. It's fixed by a series of patches around
> > > +# 579b62faa5fb16 ("xfs: add background scanning to clear eofblocks inodes")
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto rw
> > > +
> > > +# real QA test starts here
> > > +_supported_fs nfs
> > > +_require_test
> > > +
> > > +localfile=$TEST_DIR/testfile.$seq
> > > +rm -rf $localfile
> > > +
> > > +$XFS_IO_PROG -f -t -c "pwrite 0 10m" -c "fsync" $localfile >>$seqres.full 2>&1
> > > +block_size=`stat -c '%B' $localfile`
> > > +iblocks_expected=$((10 * 1024 * 1024 / $block_size))
> > > +# Retry several times so the speculatively allocated file size can be shrunk
> > > +res=1
> > > +for ((i=0; i<10; i++));do
> > > + iblocks_real=`stat -c '%b' $localfile`
> > > + if [ "$iblocks_expected" = "$iblocks_real" ];then
> > > + res=0
> > > + break
> > > + fi
> > > + sleep 10
> > > +done
> >
> > Hmm... this case sometimes fails on kernel 6.1.0-rc6 [1] (nfs4.2 based on xfs);
> > even after I changed the sleep time to 20s * 10, it still fails. But I can't
> > reproduce this failure if the underlying fs is ext4... cc linux-xfs, to check
> > whether I'm missing something for xfs, or whether this is an xfs issue?
>
> Could be anything, really -- speculative preallocation on the server, or
> xattrs blowing up the attr fork. You'd have to go query the file
> mappings and whatnot of the xfs file on the server to find out.

Hi Darrick,

Thanks for your reply.

When the number of blocks isn't as expected, I printed the bmap of the test
file on xfs; it really shows 20480 blocks [1], not 32640 blocks.
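
(For reference, a server-side extent map like [1] below can be printed with
xfs_bmap, roughly like this; the path is the one from my setup and the doubled
-v just adds the flags legend:)

# xfs_bmap -vv /mnt/xfstests/test/nfs-server/testfile.002
# stat -c 'size=%s blocks=%b blksize=%B' /mnt/xfstests/test/nfs-server/testfile.002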

And I checked the xattrs of the file on xfs: it doesn't have any attr blocks [2],
the attr fork is in local format. Even if it had some attr blocks, 32640 is far
bigger than 20480; I don't think selinux or any other xattr takes that many blocks.
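
(For reference, a dump of the on-disk inode like [2] can be obtained with xfs_db
on the server, roughly like this; the inode number and the block device are
placeholders for whatever the setup uses:)

# stat -c '%i' /mnt/xfstests/test/nfs-server/testfile.002
# xfs_db -r -c "inode <inum>" -c "print" <server xfs block device>
# getfattr -d -m - /mnt/xfstests/test/nfs-server/testfile.002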

Hmm... this test case passes on nfs3 and nfs4.0 with xfs, but fails on nfs4.1 and
nfs4.2 with xfs. Cc'ing the linux-nfs@ list to get more review.

BTW, this case was originally a regression test for a RHEL bug:
https://bugzilla.redhat.com/show_bug.cgi?id=955254 (you might not be able to open it),
which complained that "xfs files written over nfs are bigger due to speculative
allocation and may never shrink while fs is mounted".

Dave said:
For NFS servers, keeping the speculative prealloc around for as long as possible
is the desired behaviour. It will get truncated away when the inode is cycled
out of the cache by memory pressure on the NFS server. So the code is currently
behaving as designed.
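
(As a sketch of what "cycled out of the cache" means in practice, dropping the
inode cache on the NFS server should make xfs reclaim the inode and free the
post-EOF blocks; not part of the test, just for checking by hand:)

# sync
# echo 2 > /proc/sys/vm/drop_caches   (evict dentries/inodes so xfs reclaims the inode)
# stat -c '%b' /mnt/xfstests/test/nfs-server/testfile.002   (post-EOF blocks should be gone)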

But there are changes upstream that give different behaviour, so we merged the
xfs patches below to "fix" that bug (a way to kick the eofblocks scan by hand is
sketched after the list):

xfs: add EOFBLOCKS inode tagging/untagging
xfs: support a tag-based inode_ag_iterator
xfs: create helper to check whether to free eofblocks on inode
xfs: make xfs_free_eofblocks() non-static, return EAGAIN on trylock failure
xfs: create function to scan and clear EOFBLOCKS inodes
xfs: add XFS_IOC_FREE_EOFBLOCKS ioctl
xfs: add inode id filtering to eofblocks scan
xfs: support multiple inode id filtering in eofblocks scan
xfs: add minimum file size filtering to eofblocks scan
xfs: add background scanning to clear eofblocks inodes
xfs: limit speculative prealloc near ENOSPC thresholds
xfs: limit speculative prealloc size on sparse files
xfs: fix potential infinite loop in xfs_iomap_prealloc_size()
xfs: increase prealloc size to double that of the previous extent
xfs: fix xfs_iomap_eof_prealloc_initial_size type
xfs: Define a new function xfs_this_quota_on()
xfs: Define a new function xfs_inode_dquot()
xfs: reorganize xfs_iomap_prealloc_size to remove indentation
xfs: push rounddown_pow_of_two() to after prealloc throttle
xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t
xfs: xfs_dquot prealloc throttling watermarks and low free space
xfs: add quota-driven speculative preallocation throttling
xfs: xfs_iomap_prealloc_size() tracepoint
xfs: don't use speculative prealloc for small files
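
(As mentioned above, the scan can also be kicked by hand; with a reasonably new
xfsprogs, something like this on the server should go through the
XFS_IOC_FREE_EOFBLOCKS ioctl listed above. Just a sketch, the path is my
server-side export:)

# xfs_spaceman -c 'prealloc -s' /mnt/xfstests/test/nfs-server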

Now I'm not sure if it's a bug again, or an expected result for nfs4.2 with xfs.

Thanks,
Zorro

[1]
/mnt/xfstests/test/nfs-server/testfile.002:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..20479]: 192..20671 0 (192..20671) 20480 000000
FLAG Values:
0100000 Shared extent
0010000 Unwritten preallocated extent
0001000 Doesn't begin on stripe unit
0000100 Doesn't end on stripe unit
0000010 Doesn't begin on stripe width
0000001 Doesn't end on stripe width


[2]
core.format = 2 (extents)
...
core.size = 10485760
core.nblocks = 2560
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 24
core.aformat = 1 (local)
...
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,24,2560,0]
a.sfattr.hdr.totsize = 47
a.sfattr.hdr.count = 1
a.sfattr.list[0].namelen = 7
a.sfattr.list[0].valuelen = 33
a.sfattr.list[0].root = 0
a.sfattr.list[0].secure = 1
a.sfattr.list[0].name = "selinux"
a.sfattr.list[0].value = "system_u:object_r:unlabeled_t:s0\000"

>
> --D
>
> > Thanks,
> > Zorro
> >
> > [1]
> > # ./check nfs/002
> > FSTYP -- nfs
> > PLATFORM -- Linux/x86_64 dell-per640-04 6.1.0-rc6 #1 SMP PREEMPT_DYNAMIC Mon Nov 21 00:51:20 EST 2022
> > MKFS_OPTIONS -- dell-per640-04.dell2.lab.eng.bos.redhat.com:/mnt/xfstests/scratch/nfs-server
> > MOUNT_OPTIONS -- -o vers=4.2 -o context=system_u:object_r:root_t:s0 dell-per640-04.dell2.lab.eng.bos.redhat.com:/mnt/xfstests/scratch/nfs-server /mnt/xfstests/scratch/nfs-client
> >
> > nfs/002 3s ... - output mismatch (see /var/lib/xfstests/results//nfs/002.out.bad)
> > --- tests/nfs/002.out 2022-11-21 01:29:33.861770474 -0500
> > +++ /var/lib/xfstests/results//nfs/002.out.bad 2022-11-21 13:27:37.424199056 -0500
> > @@ -1,2 +1,3 @@
> > QA output created by 002
> > +Write 20480 blocks, but get 32640 blocks
> > Silence is golden
> > ...
> > (Run 'diff -u /var/lib/xfstests/tests/nfs/002.out /var/lib/xfstests/results//nfs/002.out.bad' to see the entire diff)
> > Ran: nfs/002
> > Failures: nfs/002
> > Failed 1 of 1 tests
> >
> >
> > > +if [ $res -ne 0 ];then
> > > + echo "Write $iblocks_expected blocks, but get $iblocks_real blocks"
> > > +fi
> > > +
> > > +echo "Silence is golden"
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/nfs/002.out b/tests/nfs/002.out
> > > new file mode 100644
> > > index 00000000..61705c7c
> > > --- /dev/null
> > > +++ b/tests/nfs/002.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 002
> > > +Silence is golden
> > > --
> > > 2.31.1
> > >
> >
>