2022-03-31 02:49:05

by Yi Wang

[permalink] [raw]
Subject: [PATCH] xfs: getattr ignore blocks beyond eof

From: Cheng Lin <[email protected]>

Blocks beyond EOF, which preallocated, will be reclaimed at some time.
These blocks can be ignored when getattr.

This patch will optimize query accuracy for getattr blocks.

Signed-off-by: Cheng Lin <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
---
fs/xfs/xfs_bmap_util.c | 1 +
fs/xfs/xfs_icache.c | 1 +
fs/xfs/xfs_inode.h | 1 +
fs/xfs/xfs_iomap.c | 5 +++++
fs/xfs/xfs_iops.c | 19 ++++++++++++++++++-
5 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index eb2e387..9f4081d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -753,6 +753,7 @@
if (error)
goto out_unlock;

+ ip->i_last_fsb = end_fsb;
xfs_inode_clear_eofblocks_tag(ip);
goto out_unlock;

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9644f93..43ffb9e 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -102,6 +102,7 @@ struct xfs_inode *
memset(&ip->i_df, 0, sizeof(ip->i_df));
ip->i_flags = 0;
ip->i_delayed_blks = 0;
+ ip->i_last_fsb = 0;
ip->i_diflags2 = mp->m_ino_geo.new_diflags2;
ip->i_nblocks = 0;
ip->i_forkoff = 0;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index b7e8f14..56fc41b 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -55,6 +55,7 @@
unsigned long i_flags; /* see defined flags below */
uint64_t i_delayed_blks; /* count of delay alloc blks */
xfs_fsize_t i_disk_size; /* number of bytes in file */
+ xfs_fileoff_t i_last_fsb; /* last fsb preallocated */
xfs_rfsblock_t i_nblocks; /* # of direct & btree blocks */
prid_t i_projid; /* owner's project id */
xfs_extlen_t i_extsize; /* basic/minimum extent size */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e552ce5..bd9266e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -881,6 +881,7 @@
bool eof = false, cow_eof = false, shared = false;
int allocfork = XFS_DATA_FORK;
int error = 0;
+ xfs_fileoff_t prealloc_last_fsb = 0;

if (xfs_is_shutdown(mp))
return -EIO;
@@ -1024,6 +1025,7 @@
XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
ASSERT(p_end_fsb > offset_fsb);
prealloc_blocks = p_end_fsb - end_fsb;
+ prealloc_last_fsb = p_end_fsb;
}
}

@@ -1049,6 +1051,9 @@
goto out_unlock;
}

+ if (prealloc_last_fsb && prealloc_blocks)
+ ip->i_last_fsb = prealloc_last_fsb;
+
if (allocfork == XFS_COW_FORK) {
trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
goto found_cow;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index b79b384..ca0372c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -559,8 +559,14 @@
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;

+ xfs_off_t fsb_beyond_eof;
+ xfs_fileoff_t end_fsb;
+
trace_xfs_getattr(ip);

+ end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
+ fsb_beyond_eof = ip->i_last_fsb - end_fsb;
+
if (xfs_is_shutdown(mp))
return -EIO;

@@ -574,7 +580,15 @@
stat->atime = inode->i_atime;
stat->mtime = inode->i_mtime;
stat->ctime = inode->i_ctime;
- stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
+
+ if (fsb_beyond_eof > 0) {
+ stat->blocks =
+ XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks)
+ - fsb_beyond_eof;
+ } else {
+ stat->blocks =
+ XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
+ }

if (xfs_has_v3inodes(mp)) {
if (request_mask & STATX_BTIME) {
@@ -988,6 +1002,9 @@
ip->i_disk_size = newsize;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

+ /* update i_last_fsb to newsize when truncate.*/
+ ip->i_last_fsb = XFS_B_TO_FSB(mp, newsize);
+
if (newsize <= oldsize) {
error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
if (error)
--
1.8.3.1


2022-03-31 03:15:53

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

On Thu, Mar 31, 2022 at 04:02:56PM +0800, Yi Wang wrote:
> From: Cheng Lin <[email protected]>
>
> Blocks beyond EOF, which preallocated, will be reclaimed at some time.
> These blocks can be ignored when getattr.
>
> This patch will optimize query accuracy for getattr blocks.

Huh? This subtracts posteof blocks from the query results, which makes
the results *less accurate*. Those blocks are mapped to the file, hence
they are supposed to be counted in nblocks.

--D

> Signed-off-by: Cheng Lin <[email protected]>
> Signed-off-by: Yi Wang <[email protected]>
> ---
> fs/xfs/xfs_bmap_util.c | 1 +
> fs/xfs/xfs_icache.c | 1 +
> fs/xfs/xfs_inode.h | 1 +
> fs/xfs/xfs_iomap.c | 5 +++++
> fs/xfs/xfs_iops.c | 19 ++++++++++++++++++-
> 5 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index eb2e387..9f4081d 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -753,6 +753,7 @@
> if (error)
> goto out_unlock;
>
> + ip->i_last_fsb = end_fsb;
> xfs_inode_clear_eofblocks_tag(ip);
> goto out_unlock;
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 9644f93..43ffb9e 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -102,6 +102,7 @@ struct xfs_inode *
> memset(&ip->i_df, 0, sizeof(ip->i_df));
> ip->i_flags = 0;
> ip->i_delayed_blks = 0;
> + ip->i_last_fsb = 0;
> ip->i_diflags2 = mp->m_ino_geo.new_diflags2;
> ip->i_nblocks = 0;
> ip->i_forkoff = 0;
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index b7e8f14..56fc41b 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -55,6 +55,7 @@
> unsigned long i_flags; /* see defined flags below */
> uint64_t i_delayed_blks; /* count of delay alloc blks */
> xfs_fsize_t i_disk_size; /* number of bytes in file */
> + xfs_fileoff_t i_last_fsb; /* last fsb preallocated */
> xfs_rfsblock_t i_nblocks; /* # of direct & btree blocks */
> prid_t i_projid; /* owner's project id */
> xfs_extlen_t i_extsize; /* basic/minimum extent size */
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e552ce5..bd9266e 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -881,6 +881,7 @@
> bool eof = false, cow_eof = false, shared = false;
> int allocfork = XFS_DATA_FORK;
> int error = 0;
> + xfs_fileoff_t prealloc_last_fsb = 0;
>
> if (xfs_is_shutdown(mp))
> return -EIO;
> @@ -1024,6 +1025,7 @@
> XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> ASSERT(p_end_fsb > offset_fsb);
> prealloc_blocks = p_end_fsb - end_fsb;
> + prealloc_last_fsb = p_end_fsb;
> }
> }
>
> @@ -1049,6 +1051,9 @@
> goto out_unlock;
> }
>
> + if (prealloc_last_fsb && prealloc_blocks)
> + ip->i_last_fsb = prealloc_last_fsb;
> +
> if (allocfork == XFS_COW_FORK) {
> trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
> goto found_cow;
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index b79b384..ca0372c 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -559,8 +559,14 @@
> struct xfs_inode *ip = XFS_I(inode);
> struct xfs_mount *mp = ip->i_mount;
>
> + xfs_off_t fsb_beyond_eof;
> + xfs_fileoff_t end_fsb;
> +
> trace_xfs_getattr(ip);
>
> + end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> + fsb_beyond_eof = ip->i_last_fsb - end_fsb;
> +
> if (xfs_is_shutdown(mp))
> return -EIO;
>
> @@ -574,7 +580,15 @@
> stat->atime = inode->i_atime;
> stat->mtime = inode->i_mtime;
> stat->ctime = inode->i_ctime;
> - stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
> +
> + if (fsb_beyond_eof > 0) {
> + stat->blocks =
> + XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks)
> + - fsb_beyond_eof;
> + } else {
> + stat->blocks =
> + XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
> + }
>
> if (xfs_has_v3inodes(mp)) {
> if (request_mask & STATX_BTIME) {
> @@ -988,6 +1002,9 @@
> ip->i_disk_size = newsize;
> xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>
> + /* update i_last_fsb to newsize when truncate.*/
> + ip->i_last_fsb = XFS_B_TO_FSB(mp, newsize);
> +
> if (newsize <= oldsize) {
> error = xfs_itruncate_extents(&tp, ip, XFS_DATA_FORK, newsize);
> if (error)
> --
> 1.8.3.1

2022-03-31 05:13:51

by Yi Wang

[permalink] [raw]
Subject: Re:[PATCH] xfs: getattr ignore blocks beyond eof

>> From: Cheng Lin <[email protected]>
>>
>> Blocks beyond EOF, which preallocated, will be reclaimed at some time.
>> These blocks can be ignored when getattr.
>>
>> This patch will optimize query accuracy for getattr blocks.
>Huh? This subtracts posteof blocks from the query results, which makes
>the results *less accurate*. Those blocks are mapped to the file, hence
>they are supposed to be counted in nblocks.
>--D
Yes, those blocks are mapped to the file. And the results including them are
absolutely real for xfs, at the moment of query.

But, those blocks are like the credit without consumption, are unstalbe, and
will be reclaimed at some time. This may cause trouble for the application.
e.g. in a case,
1. Firstly, write 100k data to file;
2. query the result;
3. close the file;
4. query the result.

fd stat wrt[96 @ 393216]: blks[896], size[397312].
fd stat wrt[97 @ 397312]: blks[896], size[401408].
fd stat wrt[98 @ 401408]: blks[896], size[405504].
fd stat wrt[99 @ 405504]: blks[896], size[409600].
lstat open: blks[896], size[409600].
lstat close: blks[800], size[409600].

Here two problems:
1. why the result different before between after file close?
2. why the result not change after writing data, or a big change?

The above problems can be explained by fs preallocation.
If the impact of preallocation are closed in fs, not visible to the outside, the result is stable and real for the application.

That is the reason for this patch.
Thanks very much.

2022-03-31 05:14:01

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

Hi Yi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on v5.17 next-20220330]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Yi-Wang/xfs-getattr-ignore-blocks-beyond-eof/20220331-082944
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: s390-randconfig-r002-20220330 (https://download.01.org/0day-ci/archive/20220331/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0f6d9501cf49ce02937099350d08f20c4af86f3d)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/e560188227f8fed285a1bd736e5708de984f0596
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yi-Wang/xfs-getattr-ignore-blocks-beyond-eof/20220331-082944
git checkout e560188227f8fed285a1bd736e5708de984f0596
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

>> fs/xfs/xfs_bmap_util.c:756:19: error: use of undeclared identifier 'end_fsb'
ip->i_last_fsb = end_fsb;
^
1 error generated.


vim +/end_fsb +756 fs/xfs/xfs_bmap_util.c

710
711 /*
712 * This is called to free any blocks beyond eof. The caller must hold
713 * IOLOCK_EXCL unless we are in the inode reclaim path and have the only
714 * reference to the inode.
715 */
716 int
717 xfs_free_eofblocks(
718 struct xfs_inode *ip)
719 {
720 struct xfs_trans *tp;
721 struct xfs_mount *mp = ip->i_mount;
722 int error;
723
724 /* Attach the dquots to the inode up front. */
725 error = xfs_qm_dqattach(ip);
726 if (error)
727 return error;
728
729 /* Wait on dio to ensure i_size has settled. */
730 inode_dio_wait(VFS_I(ip));
731
732 error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
733 if (error) {
734 ASSERT(xfs_is_shutdown(mp));
735 return error;
736 }
737
738 xfs_ilock(ip, XFS_ILOCK_EXCL);
739 xfs_trans_ijoin(tp, ip, 0);
740
741 /*
742 * Do not update the on-disk file size. If we update the on-disk file
743 * size and then the system crashes before the contents of the file are
744 * flushed to disk then the files may be full of holes (ie NULL files
745 * bug).
746 */
747 error = xfs_itruncate_extents_flags(&tp, ip, XFS_DATA_FORK,
748 XFS_ISIZE(ip), XFS_BMAPI_NODISCARD);
749 if (error)
750 goto err_cancel;
751
752 error = xfs_trans_commit(tp);
753 if (error)
754 goto out_unlock;
755
> 756 ip->i_last_fsb = end_fsb;
757 xfs_inode_clear_eofblocks_tag(ip);
758 goto out_unlock;
759
760 err_cancel:
761 /*
762 * If we get an error at this point we simply don't
763 * bother truncating the file.
764 */
765 xfs_trans_cancel(tp);
766 out_unlock:
767 xfs_iunlock(ip, XFS_ILOCK_EXCL);
768 return error;
769 }
770

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-03-31 05:15:09

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

Hi Yi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on v5.17 next-20220330]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Yi-Wang/xfs-getattr-ignore-blocks-beyond-eof/20220331-082944
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: sparc-randconfig-r005-20220330 (https://download.01.org/0day-ci/archive/20220331/[email protected]/config)
compiler: sparc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/e560188227f8fed285a1bd736e5708de984f0596
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yi-Wang/xfs-getattr-ignore-blocks-beyond-eof/20220331-082944
git checkout e560188227f8fed285a1bd736e5708de984f0596
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=sparc SHELL=/bin/bash fs/xfs/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

fs/xfs/xfs_bmap_util.c: In function 'xfs_free_eofblocks':
>> fs/xfs/xfs_bmap_util.c:756:26: error: 'end_fsb' undeclared (first use in this function)
756 | ip->i_last_fsb = end_fsb;
| ^~~~~~~
fs/xfs/xfs_bmap_util.c:756:26: note: each undeclared identifier is reported only once for each function it appears in


vim +/end_fsb +756 fs/xfs/xfs_bmap_util.c

710
711 /*
712 * This is called to free any blocks beyond eof. The caller must hold
713 * IOLOCK_EXCL unless we are in the inode reclaim path and have the only
714 * reference to the inode.
715 */
716 int
717 xfs_free_eofblocks(
718 struct xfs_inode *ip)
719 {
720 struct xfs_trans *tp;
721 struct xfs_mount *mp = ip->i_mount;
722 int error;
723
724 /* Attach the dquots to the inode up front. */
725 error = xfs_qm_dqattach(ip);
726 if (error)
727 return error;
728
729 /* Wait on dio to ensure i_size has settled. */
730 inode_dio_wait(VFS_I(ip));
731
732 error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
733 if (error) {
734 ASSERT(xfs_is_shutdown(mp));
735 return error;
736 }
737
738 xfs_ilock(ip, XFS_ILOCK_EXCL);
739 xfs_trans_ijoin(tp, ip, 0);
740
741 /*
742 * Do not update the on-disk file size. If we update the on-disk file
743 * size and then the system crashes before the contents of the file are
744 * flushed to disk then the files may be full of holes (ie NULL files
745 * bug).
746 */
747 error = xfs_itruncate_extents_flags(&tp, ip, XFS_DATA_FORK,
748 XFS_ISIZE(ip), XFS_BMAPI_NODISCARD);
749 if (error)
750 goto err_cancel;
751
752 error = xfs_trans_commit(tp);
753 if (error)
754 goto out_unlock;
755
> 756 ip->i_last_fsb = end_fsb;
757 xfs_inode_clear_eofblocks_tag(ip);
758 goto out_unlock;
759
760 err_cancel:
761 /*
762 * If we get an error at this point we simply don't
763 * bother truncating the file.
764 */
765 xfs_trans_cancel(tp);
766 out_unlock:
767 xfs_iunlock(ip, XFS_ILOCK_EXCL);
768 return error;
769 }
770

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-03-31 05:45:29

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

On Thu, Mar 31, 2022 at 11:28:59AM +0800, [email protected] wrote:
> >> From: Cheng Lin <[email protected]>
> >>
> >> Blocks beyond EOF, which preallocated, will be reclaimed at some time.
> >> These blocks can be ignored when getattr.
> >>
> >> This patch will optimize query accuracy for getattr blocks.
> >Huh? This subtracts posteof blocks from the query results, which makes
> >the results *less accurate*. Those blocks are mapped to the file, hence
> >they are supposed to be counted in nblocks.
> >--D
> Yes, those blocks are mapped to the file. And the results including them are
> absolutely real for xfs, at the moment of query.
>
> But, those blocks are like the credit without consumption, are unstalbe, and
> will be reclaimed at some time. This may cause trouble for the application.

What application is having trouble with this?

> e.g. in a case,
> 1. Firstly, write 100k data to file;
> 2. query the result;
> 3. close the file;
> 4. query the result.
>
> fd stat wrt[96 @ 393216]: blks[896], size[397312].
> fd stat wrt[97 @ 397312]: blks[896], size[401408].
> fd stat wrt[98 @ 401408]: blks[896], size[405504].
> fd stat wrt[99 @ 405504]: blks[896], size[409600].
> lstat open: blks[896], size[409600].
> lstat close: blks[800], size[409600].
>
> Here two problems:
> 1. why the result different before between after file close?
> 2. why the result not change after writing data, or a big change?

Because that's the way speculative preallocation works.

> The above problems can be explained by fs preallocation.
> If the impact of preallocation are closed in fs, not visible to
> the outside, the result is stable and real for the application.

You're a decade late to the party. Complaining about how visible
artifacts of speculative preallocation are going to cause the sky
to fall was something that happened when we fix implemented the
mechanism.

Once people realised how much better their filesystems resisted
fragmentation and hence aged more gracefully because the temporary
overallocation helped ensure large, contiguous extents got allocated
more often than not, those complaints went away.

Have you even considered what reflink/dedupe does to user visible
block accounting? That's right - it's a complete lie.

We do not, and have not ever tried to, hide allocation or block
usage artifacts from userspace because any application that depends
on specific block allocation patterns or accounting from the
filesystem is broken by design.

Every filesystem accounts blocks differently, and more often than
not the block count exposed to userspace also includes metadata
blocks (extent maps, xattr blocks, etc) and it might multiple count
other blocks (e.g. shared extents). Hence so you can't actually
use it for anything useful in userspace except reporting how many
blocks this file *might* use.

If your application is dependent on block counts exactly matching
the file data space for waht ever reason, then what speculative
preallocation does is the least of your problems.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2022-03-31 06:09:48

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

On Thu, Mar 31, 2022 at 04:33:40PM +1100, Dave Chinner wrote:
> On Thu, Mar 31, 2022 at 11:28:59AM +0800, [email protected] wrote:
> > The above problems can be explained by fs preallocation.
> > If the impact of preallocation are closed in fs, not visible to
> > the outside, the result is stable and real for the application.
....

> If your application is dependent on block counts exactly matching
> the file data space for waht ever reason, then what speculative
> preallocation does is the least of your problems.

That said, if you have written an application that cannot handle
this behaviour, then you can *turn it off* by using the allocsize
mount option. But then subsequent file fragmentation issues are your
own problem to solve, not ours.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2022-03-31 13:19:25

by Yi Wang

[permalink] [raw]
Subject: Re:[PATCH] xfs: getattr ignore blocks beyond eof

> We do not, and have not ever tried to, hide allocation or block
> usage artifacts from userspace because any application that depends
> on specific block allocation patterns or accounting from the
> filesystem is broken by design.
>
> Every filesystem accounts blocks differently, and more often than
> not the block count exposed to userspace also includes metadata
> blocks (extent maps, xattr blocks, etc) and it might multiple count
> other blocks (e.g. shared extents). Hence so you can't actually
> use it for anything useful in userspace except reporting how many
> blocks this file *might* use.
>
> If your application is dependent on block counts exactly matching
> the file data space for waht ever reason, then what speculative
> preallocation does is the least of your problems.
>

Thanks for your explaination.

Unfortunately, the app I'm using evaluates diskusage by querying
the changes of the backend filesystem (XFS) file before and after
the operation. Without giving up the benefits of preallocation, the
app's statistics will become obsolete and no chance to correct it
at a small cost, because of the silence reclaim of posteof blocks.
That is the app's problem.

Posteof blocks will be reclaimed sooner or later, it seems reasonable
to ignore them directly during query. This is my humble opinion in
this patch. At the query moment, it's not real, but it will become so
eventually. It's a speculative result for query.

2022-04-01 11:17:30

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

On Thu, Mar 31, 2022 at 04:32:07PM +0800, [email protected] wrote:
> > We do not, and have not ever tried to, hide allocation or block
> > usage artifacts from userspace because any application that depends
> > on specific block allocation patterns or accounting from the
> > filesystem is broken by design.
> >
> > Every filesystem accounts blocks differently, and more often than
> > not the block count exposed to userspace also includes metadata
> > blocks (extent maps, xattr blocks, etc) and it might multiple count
> > other blocks (e.g. shared extents). Hence so you can't actually
> > use it for anything useful in userspace except reporting how many
> > blocks this file *might* use.
> >
> > If your application is dependent on block counts exactly matching
> > the file data space for waht ever reason, then what speculative
> > preallocation does is the least of your problems.
> >
>
> Thanks for your explaination.
>
> Unfortunately, the app I'm using evaluates diskusage by querying
> the changes of the backend filesystem (XFS) file before and after
> the operation.

What application is this?

What is it trying to use this information for?

I'm trying to understand why someone thought this was a good idea,
and without actually being able to look up the code and see what it
is using the information for, I can't really say much more than
"this seems broken by design".

> Without giving up the benefits of preallocation, the
> app's statistics will become obsolete and no chance to correct it
> at a small cost, because of the silence reclaim of posteof blocks.
> That is the app's problem.

Yes it is.

> Posteof blocks will be reclaimed sooner or later, it seems reasonable

No, that is not guaranteed. If you the extend the file again, those
post eof blocks will no longer be post-eof blocks and instead
contain user data. Also, fallocate() can allocate post-eof blocks,
and in that case they can be retained permanently because the user
asked them to be placed beyond EOF.

So the assertion that post-eof blocks always get removed sooner or
later is not actually true.

> to ignore them directly during query. This is my humble opinion in
> this patch. At the query moment, it's not real, but it will become so
> eventually. It's a speculative result for query.

No, it's the _correct_ result for the current state of the file
being queried. The statx() man page says:

st_blocks
This field indicates the number of blocks allocated to the
file, in 512-byte units. (This may be smaller than st_size/512
when the file has holes.)

The POSIX specification just defines it as "Number of blocks
allocated for this object."

Neither say anything about how the filesystem should or shouldn't
account those blocks, that it must be stable, that it must reflect
the amount of data written to the file, etc. ALl they say is that
it is the amount of blocks allocated for that file.

As it is, hiding space usage like you propose is likely to cause
more problems than it solaves, because not du will not report all
the disk space used by a file and hence we'll end up with other
users reporting that the disk space reported by du does not match up
with the space the filesytem is using. Which, of course is also
expected, because reflink/dedupe result in du multiple counting
shared blocks.

IOWs, userspace tracking and aggregation of filesystem space usage
just doesn't work, and so papering over behaviours that expose the
fact it doesn't and can't work are in no-ones best interests.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2022-04-05 02:17:02

by Yi Wang

[permalink] [raw]
Subject: Re:[PATCH] xfs: getattr ignore blocks beyond eof

> On Thu, Mar 31, 2022 at 04:32:07PM +0800, [email protected] wrote:
> > > We do not, and have not ever tried to, hide allocation or block
> > > usage artifacts from userspace because any application that depends
> > > on specific block allocation patterns or accounting from the
> > > filesystem is broken by design.
> > >
> > > If your application is dependent on block counts exactly matching
> > > the file data space for waht ever reason, then what speculative
> > > preallocation does is the least of your problems.
> > >
> >
> > Thanks for your explaination.
> >
> > Unfortunately, the app I'm using evaluates diskusage by querying
> > the changes of the backend filesystem (XFS) file before and after
> > the operation.
>
> What application is this?
>
> What is it trying to use this information for?

Thanks very much, Dave.

I'm trying to use a new xlater(module) named 'simple-quota' in
glusterfs, which collects file's diskusage by stat, for quota function.

>
> I'm trying to understand why someone thought this was a good idea,
> and without actually being able to look up the code and see what it
> is using the information for, I can't really say much more than
> "this seems broken by design".
>
> > Without giving up the benefits of preallocation, the
> > app's statistics will become obsolete and no chance to correct it
> > at a small cost, because of the silence reclaim of posteof blocks.
> > That is the app's problem.

2022-04-05 02:52:28

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] xfs: getattr ignore blocks beyond eof

On Fri, Apr 01, 2022 at 04:09:40PM +0800, [email protected] wrote:
> > On Thu, Mar 31, 2022 at 04:32:07PM +0800, [email protected] wrote:
> > > > We do not, and have not ever tried to, hide allocation or block
> > > > usage artifacts from userspace because any application that depends
> > > > on specific block allocation patterns or accounting from the
> > > > filesystem is broken by design.
> > > >
> > > > If your application is dependent on block counts exactly matching
> > > > the file data space for waht ever reason, then what speculative
> > > > preallocation does is the least of your problems.
> > > >
> > >
> > > Thanks for your explaination.
> > >
> > > Unfortunately, the app I'm using evaluates diskusage by querying
> > > the changes of the backend filesystem (XFS) file before and after
> > > the operation.
> >
> > What application is this?
> >
> > What is it trying to use this information for?
>
> Thanks very much, Dave.
>
> I'm trying to use a new xlater(module) named 'simple-quota' in
> glusterfs, which collects file's diskusage by stat, for quota function.

So a company (kadalu) has forked gluster, then removed the gluster
quota implementation because of undefined "major performance
problems" and then implemented their own quota thing that stores
disk usage information in extended attributes. And:

Advised method is adding quota limit info before writing any
information to directory. Even otherwise we don’t need quota
crawl for updating quota-limit, but do a du -s -b $dir and
write the output into xattr.

Yeah, that's broken by design.

Oh, look at that:

Just run statfs() on all dirs which has Quota set, and send
a flag to update status of quota xlator through internal
xattr mechanism if quota limit exceeds for a directory! This
can run once in 5 or 10 seconds, or even lesser frequency if
the quota limit is huge!

That's pretty broken, too. Doesn't scale out, either...

Cheers,

Dave.
--
Dave Chinner
[email protected]