2022-11-03 10:40:24

by Gao Xiang

[permalink] [raw]
Subject: [PATCH] xfs: extend the freelist before available space check

There is a long standing issue which could cause fs shutdown due to
inode extent to btree conversion failure right after an extent
allocation in the same AG, which is absolutely unexpected due to the
proper minleft reservation in the previous allocation. Brian once
addressed one of the root cause [1], however, such symptom can still
occur after the commit is merged as reported [2], and our cloud
environment is also suffering from this issue.

From the description of the commit [1], I found that Zirong has an
in-house stress test reproducer for this issue, therefore I asked him
to reproduce again and he confirmed that such issue can still be
reproducable on RHEL 9.

Thanks to him, after dumping the transaction log items, I think
the root cause is as below:
1. BUF: ... start blkno:0x3bffe81 len:1 bmap size:1 flags:0x2800
AGF Buffer: (XAGF)
ver:1 seq#:3 len:2621424
root BNO:9 CNT:7
level BNO:2 CNT:2
1st:64 last:69 cnt:6 freeblks:18277 longest:6395

2. agfl (flfirst = 64, fllast = 69, flcount = 6)
64:547 65:167 66:1651 67:2040807 68:783 69:604

3. The log records a new AGF
blkno 62914177, len 1, map_size 1
00000000: 58 41 47 46 00 00 00 01 00 00 00 03 00 27 ff f0 XAGF.........'..
00000010: 00 00 00 09 00 00 00 07 00 00 00 00 00 00 00 02 ................
00000020: 00 00 00 02 00 00 00 00 00 00 00 41 00 00 00 45 ...........A...E
00000030: 00 00 00 05 00 00 47 65 00 00 18 fb 00 00 00 09 ......Ge........
00000040: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf
agf 3 flfirst: 65 (0x41) fllast: 69 (0x45) cnt: 5

4. agfl 64 (daddr 62918552) was then written as a cntbt block
log item:
type#011= 0x123c
flags#011= 0x8
blkno 62918552, len 8, map_size 1
00000000: 41 42 33 43 00 00 00 fd 00 1f 23 e4 ff ff ff ff AB3C......#.....
00000010: 00 00 00 00 03 c0 0f 98 00 00 00 00 00 00 00 00 ................
00000020: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf

5. Finally, kaboom.
Nov 1 07:56:09 dell-per750-08 kernel: ------------[ cut here ]------------
Nov 1 07:56:09 dell-per750-08 kernel: WARNING: CPU: 15 PID: 49290 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0xc51/0x1050 [xfs]
...
Nov 1 07:56:10 dell-per750-08 kernel: XFS (sda2): agno 3 agflcount 5 freeblks 18277 reservation 18276 6

In order to fix the issue above, freelist needs to be filled with the
minimal blocks at least before available space check, and then we also
know the freelist has enough blocks for the following emergency btree
allocations.

[1] commit 1ca89fbc48e1 ("xfs: don't account extra agfl blocks as available")
https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>
---
fs/xfs/libxfs/xfs_alloc.c | 187 ++++++++++++++++++++++++--------------
1 file changed, 121 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6261599bb389..5d5e0e0e5227 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2587,6 +2587,86 @@ xfs_exact_minlen_extent_available(
}
#endif

+/*
+ * The freelist has to be in a good state before the available space check
+ * since multiple allocations could be performed from a single AG and
+ * transaction under certain conditions. For example, A bmbt allocation
+ * request made for inode extent to bmap format conversion after an extent
+ * allocation is expected to be satisfied by the same AG. Such bmap conversion
+ * allocation can fail due to the available space check if allocbt required an
+ * extra btree block from the freelist in the previous allocation but without
+ * make the freelist longer.
+ */
+int
+xfs_fill_agfl(
+ struct xfs_alloc_arg *args,
+ int flags,
+ xfs_extlen_t need,
+ struct xfs_buf *agbp)
+{
+ struct xfs_trans *tp = args->tp;
+ struct xfs_perag *pag = agbp->b_pag;
+ struct xfs_alloc_arg targs = {
+ .tp = tp,
+ .mp = tp->t_mountp,
+ .agbp = agbp,
+ .agno = args->agno,
+ .alignment = 1,
+ .minlen = 1,
+ .prod = 1,
+ .type = XFS_ALLOCTYPE_THIS_AG,
+ .pag = pag,
+ };
+ struct xfs_buf *agflbp = NULL;
+ xfs_agblock_t bno;
+ int error;
+
+ if (flags & XFS_ALLOC_FLAG_NORMAP)
+ targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+ else
+ targs.oinfo = XFS_RMAP_OINFO_AG;
+
+ error = xfs_alloc_read_agfl(pag, tp, &agflbp);
+ if (error)
+ return error;
+
+ /* Make the freelist longer if it's too short. */
+ while (pag->pagf_flcount < need) {
+ targs.agbno = 0;
+ targs.maxlen = need - pag->pagf_flcount;
+ targs.resv = XFS_AG_RESV_AGFL;
+
+ /* Allocate as many blocks as possible at once. */
+ error = xfs_alloc_ag_vextent(&targs);
+ if (error)
+ goto out_agflbp_relse;
+
+ /*
+ * Stop if we run out. Won't happen if callers are obeying
+ * the restrictions correctly. Can happen for free calls
+ * on a completely full ag.
+ */
+ if (targs.agbno == NULLAGBLOCK) {
+ if (flags & XFS_ALLOC_FLAG_FREEING)
+ break;
+ error = -ENOSPC;
+ goto out_agflbp_relse;
+ }
+ /*
+ * Put each allocated block on the list.
+ */
+ for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
+ error = xfs_alloc_put_freelist(pag, tp, agbp,
+ agflbp, bno, 0);
+ if (error)
+ goto out_agflbp_relse;
+ }
+ }
+out_agflbp_relse:
+ xfs_trans_brelse(tp, agflbp);
+ return error;
+}
+
/*
* Decide whether to use this allocation group for this allocation.
* If so, fix up the btree freelist's size.
@@ -2600,8 +2680,7 @@ xfs_alloc_fix_freelist(
struct xfs_perag *pag = args->pag;
struct xfs_trans *tp = args->tp;
struct xfs_buf *agbp = NULL;
- struct xfs_buf *agflbp = NULL;
- struct xfs_alloc_arg targs; /* local allocation arguments */
+ struct xfs_owner_info oinfo;
xfs_agblock_t bno; /* freelist block */
xfs_extlen_t need; /* total blocks needed in freelist */
int error = 0;
@@ -2630,22 +2709,45 @@ xfs_alloc_fix_freelist(
goto out_agbp_relse;
}

- need = xfs_alloc_min_freelist(mp, pag);
- if (!xfs_alloc_space_available(args, need, flags |
- XFS_ALLOC_FLAG_CHECK))
- goto out_agbp_relse;
-
/*
- * Get the a.g. freespace buffer.
- * Can fail if we're not blocking on locks, and it's held.
+ * See the comment above xfs_fill_agfl() for the reason why we need to
+ * make freelist longer here. Assumed that such case is quite rare, so
+ * in order to simplify the code, let's take agbp unconditionally.
*/
- if (!agbp) {
- error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
- if (error) {
- /* Couldn't lock the AGF so skip this AG. */
- if (error == -EAGAIN)
- error = 0;
- goto out_no_agbp;
+ need = xfs_alloc_min_freelist(mp, pag);
+ if (pag->pagf_flcount < need) {
+ /*
+ * Get the a.g. freespace buffer.
+ * Can fail if we're not blocking on locks, and it's held.
+ */
+ if (!agbp) {
+ error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
+ if (error) {
+ /* Couldn't lock the AGF so skip this AG. */
+ if (error == -EAGAIN)
+ error = 0;
+ goto out_no_agbp;
+ }
+ }
+
+ need = xfs_alloc_min_freelist(mp, pag);
+ error = xfs_fill_agfl(args, flags, need, agbp);
+ if (error)
+ goto out_agbp_relse;
+ } else {
+ if (!xfs_alloc_space_available(args, need, flags |
+ XFS_ALLOC_FLAG_CHECK))
+ goto out_agbp_relse;
+
+ /* Get the a.g. freespace buffer. */
+ if (!agbp) {
+ error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
+ if (error) {
+ /* Couldn't lock the AGF so skip this AG. */
+ if (error == -EAGAIN)
+ error = 0;
+ goto out_no_agbp;
+ }
}
}

@@ -2691,69 +2793,22 @@ xfs_alloc_fix_freelist(
* regenerated AGFL, bnobt, and cntbt. See repair/phase5.c and
* repair/rmap.c in xfsprogs for details.
*/
- memset(&targs, 0, sizeof(targs));
- /* struct copy below */
if (flags & XFS_ALLOC_FLAG_NORMAP)
- targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+ oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
else
- targs.oinfo = XFS_RMAP_OINFO_AG;
+ oinfo = XFS_RMAP_OINFO_AG;
while (!(flags & XFS_ALLOC_FLAG_NOSHRINK) && pag->pagf_flcount > need) {
error = xfs_alloc_get_freelist(pag, tp, agbp, &bno, 0);
if (error)
goto out_agbp_relse;

/* defer agfl frees */
- xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+ xfs_defer_agfl_block(tp, args->agno, bno, &oinfo);
}

- targs.tp = tp;
- targs.mp = mp;
- targs.agbp = agbp;
- targs.agno = args->agno;
- targs.alignment = targs.minlen = targs.prod = 1;
- targs.type = XFS_ALLOCTYPE_THIS_AG;
- targs.pag = pag;
- error = xfs_alloc_read_agfl(pag, tp, &agflbp);
- if (error)
- goto out_agbp_relse;
-
- /* Make the freelist longer if it's too short. */
- while (pag->pagf_flcount < need) {
- targs.agbno = 0;
- targs.maxlen = need - pag->pagf_flcount;
- targs.resv = XFS_AG_RESV_AGFL;
-
- /* Allocate as many blocks as possible at once. */
- error = xfs_alloc_ag_vextent(&targs);
- if (error)
- goto out_agflbp_relse;
-
- /*
- * Stop if we run out. Won't happen if callers are obeying
- * the restrictions correctly. Can happen for free calls
- * on a completely full ag.
- */
- if (targs.agbno == NULLAGBLOCK) {
- if (flags & XFS_ALLOC_FLAG_FREEING)
- break;
- goto out_agflbp_relse;
- }
- /*
- * Put each allocated block on the list.
- */
- for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
- error = xfs_alloc_put_freelist(pag, tp, agbp,
- agflbp, bno, 0);
- if (error)
- goto out_agflbp_relse;
- }
- }
- xfs_trans_brelse(tp, agflbp);
args->agbp = agbp;
return 0;

-out_agflbp_relse:
- xfs_trans_brelse(tp, agflbp);
out_agbp_relse:
if (agbp)
xfs_trans_brelse(tp, agbp);
--
2.24.4



2022-11-03 13:09:20

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] xfs: extend the freelist before available space check

Hi Gao,

I love your patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on linus/master v6.1-rc3 next-20221103]
[cannot apply to djwong-xfs/djwong-devel]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20221103094639.39984-1-hsiangkao%40linux.alibaba.com
patch subject: [PATCH] xfs: extend the freelist before available space check
config: sparc-allyesconfig
compiler: sparc64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/427b4db33a93f832a6da62b2838bd5a558d024d9
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
git checkout 427b4db33a93f832a6da62b2838bd5a558d024d9
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sparc SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> fs/xfs/libxfs/xfs_alloc.c:2597:1: warning: no previous prototype for 'xfs_fill_agfl' [-Wmissing-prototypes]
2597 | xfs_fill_agfl(
| ^~~~~~~~~~~~~


vim +/xfs_fill_agfl +2597 fs/xfs/libxfs/xfs_alloc.c

2585
2586 /*
2587 * The freelist has to be in a good state before the available space check
2588 * since multiple allocations could be performed from a single AG and
2589 * transaction under certain conditions. For example, A bmbt allocation
2590 * request made for inode extent to bmap format conversion after an extent
2591 * allocation is expected to be satisfied by the same AG. Such bmap conversion
2592 * allocation can fail due to the available space check if allocbt required an
2593 * extra btree block from the freelist in the previous allocation but without
2594 * make the freelist longer.
2595 */
2596 int
> 2597 xfs_fill_agfl(
2598 struct xfs_alloc_arg *args,
2599 int flags,
2600 xfs_extlen_t need,
2601 struct xfs_buf *agbp)
2602 {
2603 struct xfs_trans *tp = args->tp;
2604 struct xfs_perag *pag = agbp->b_pag;
2605 struct xfs_alloc_arg targs = {
2606 .tp = tp,
2607 .mp = tp->t_mountp,
2608 .agbp = agbp,
2609 .agno = args->agno,
2610 .alignment = 1,
2611 .minlen = 1,
2612 .prod = 1,
2613 .type = XFS_ALLOCTYPE_THIS_AG,
2614 .pag = pag,
2615 };
2616 struct xfs_buf *agflbp = NULL;
2617 xfs_agblock_t bno;
2618 int error;
2619
2620 if (flags & XFS_ALLOC_FLAG_NORMAP)
2621 targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
2622 else
2623 targs.oinfo = XFS_RMAP_OINFO_AG;
2624
2625 error = xfs_alloc_read_agfl(pag, tp, &agflbp);
2626 if (error)
2627 return error;
2628
2629 /* Make the freelist longer if it's too short. */
2630 while (pag->pagf_flcount < need) {
2631 targs.agbno = 0;
2632 targs.maxlen = need - pag->pagf_flcount;
2633 targs.resv = XFS_AG_RESV_AGFL;
2634
2635 /* Allocate as many blocks as possible at once. */
2636 error = xfs_alloc_ag_vextent(&targs);
2637 if (error)
2638 goto out_agflbp_relse;
2639
2640 /*
2641 * Stop if we run out. Won't happen if callers are obeying
2642 * the restrictions correctly. Can happen for free calls
2643 * on a completely full ag.
2644 */
2645 if (targs.agbno == NULLAGBLOCK) {
2646 if (flags & XFS_ALLOC_FLAG_FREEING)
2647 break;
2648 error = -ENOSPC;
2649 goto out_agflbp_relse;
2650 }
2651 /*
2652 * Put each allocated block on the list.
2653 */
2654 for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
2655 error = xfs_alloc_put_freelist(pag, tp, agbp,
2656 agflbp, bno, 0);
2657 if (error)
2658 goto out_agflbp_relse;
2659 }
2660 }
2661 out_agflbp_relse:
2662 xfs_trans_brelse(tp, agflbp);
2663 return error;
2664 }
2665

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (4.77 kB)
config (328.46 kB)
Download all attachments

2022-11-03 13:49:00

by Gao Xiang

[permalink] [raw]
Subject: [PATCH v2] xfs: extend the freelist before available space check

There is a long standing issue which could cause fs shutdown due to
inode extent to btree conversion failure right after an extent
allocation in the same AG, which is absolutely unexpected due to the
proper minleft reservation in the previous allocation. Brian once
addressed one of the root cause [1], however, such symptom can still
occur after the commit is merged as reported [2], and our cloud
environment is also suffering from this issue.

From the description of the commit [1], I found that Zirong has an
in-house stress test reproducer for this issue, therefore I asked him
to reproduce again and he confirmed that such issue can still be
reproducable on RHEL 9.

Thanks to him, after dumping the transaction log items, I think
the root cause is as below:
1. Allocate space with the following condition:
freeblks: 18304 pagf_flcount: 6
reservation: 18276 need (min_free): 6
args->minleft: 1
available = freeblks + agflcount - reservation - need - minleft
= 18304 + min(6, 6) - 18276 - 6 - 1 = 27

The first allocation check itself is ok;

2. At that time, the AG state is
AGF Buffer: (XAGF)
ver:1 seq#:3 len:2621424
root BNO:9 CNT:7
level BNO:2 CNT:2
1st:64 last:69 cnt:6 freeblks:18277 longest:6395

agfl (flfirst = 64, fllast = 69, flcount = 6)
64:547 65:167 66:1651 67:2040807 68:783 69:604

3. Then, cntbt needs a new btree block (so take one block
from agfl), and then the log records a new AGF:
blkno 62914177, len 1, map_size 1
00000000: 58 41 47 46 00 00 00 01 00 00 00 03 00 27 ff f0 XAGF.........'..
00000010: 00 00 00 09 00 00 00 07 00 00 00 00 00 00 00 02 ................
00000020: 00 00 00 02 00 00 00 00 00 00 00 41 00 00 00 45 ...........A...E
00000030: 00 00 00 05 00 00 47 65 00 00 18 fb 00 00 00 09 ......Ge........
00000040: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf
agf 3 flfirst: 65 (0x41) fllast: 69 (0x45) cnt: 5
freeblks 18277

4. agfl 64 (daddr 62918552) was then written as a cntbt block
log item:
type#011= 0x123c
flags#011= 0x8
blkno 62918552, len 8, map_size 1
00000000: 41 42 33 43 00 00 00 fd 00 1f 23 e4 ff ff ff ff AB3C......#.....
00000010: 00 00 00 00 03 c0 0f 98 00 00 00 00 00 00 00 00 ................
00000020: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf

5. Finally, the following inode extent to btree allocation fails:
Nov 1 07:56:09 dell-per750-08 kernel: ------------[ cut here ]------------
Nov 1 07:56:09 dell-per750-08 kernel: WARNING: CPU: 15 PID: 49290 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0xc51/0x1050 [xfs]
...
Nov 1 07:56:10 dell-per750-08 kernel: XFS (sda2): agno 3 agflcount 5 freeblks 18277 reservation 18276 6

since

available = freeblks + agflcount - reservation - need - minleft
= 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1
kaboom.

In order to fix the issue above, freelist needs to be filled with the
minimal blocks at least before available space check, and then we also
know the freelist has enough blocks for the following emergency btree
allocations.

[1] 1ca89fbc48e1 ("xfs: don't account extra agfl blocks as available")
https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
Reported-by: Zirong Lang <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
changes since v1:
- refine commit message for better understanding;
- add a "Reported-by" tag to thank Zirong for reproducing.

fs/xfs/libxfs/xfs_alloc.c | 186 ++++++++++++++++++++++++--------------
1 file changed, 120 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6261599bb389..524497ccf52b 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2587,6 +2587,85 @@ xfs_exact_minlen_extent_available(
}
#endif

+/*
+ * The freelist has to be in a good state before the available space check
+ * since multiple allocations could be performed from a single AG and
+ * transaction under certain conditions. For example, A bmbt allocation
+ * request made for inode extent to bmap format conversion after an extent
+ * allocation is expected to be satisfied by the same AG. Such bmap conversion
+ * allocation can fail due to the available space check if allocbt required an
+ * extra btree block from the freelist in the previous allocation but without
+ * making the freelist longer.
+ */
+int
+xfs_fill_agfl(
+ struct xfs_alloc_arg *args,
+ int flags,
+ xfs_extlen_t need,
+ struct xfs_buf *agbp)
+{
+ struct xfs_trans *tp = args->tp;
+ struct xfs_perag *pag = agbp->b_pag;
+ struct xfs_alloc_arg targs = {
+ .tp = tp,
+ .mp = tp->t_mountp,
+ .agbp = agbp,
+ .agno = args->agno,
+ .alignment = 1,
+ .minlen = 1,
+ .prod = 1,
+ .type = XFS_ALLOCTYPE_THIS_AG,
+ .pag = pag,
+ };
+ struct xfs_buf *agflbp = NULL;
+ xfs_agblock_t bno;
+ int error;
+
+ if (flags & XFS_ALLOC_FLAG_NORMAP)
+ targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+ else
+ targs.oinfo = XFS_RMAP_OINFO_AG;
+
+ error = xfs_alloc_read_agfl(pag, tp, &agflbp);
+ if (error)
+ return error;
+
+ /* Make the freelist longer if it's too short. */
+ while (pag->pagf_flcount < need) {
+ targs.agbno = 0;
+ targs.maxlen = need - pag->pagf_flcount;
+ targs.resv = XFS_AG_RESV_AGFL;
+
+ /* Allocate as many blocks as possible at once. */
+ error = xfs_alloc_ag_vextent(&targs);
+ if (error)
+ goto out_agflbp_relse;
+
+ /*
+ * Stop if we run out. Won't happen if callers are obeying
+ * the restrictions correctly. Can happen for free calls
+ * on a completely full ag.
+ */
+ if (targs.agbno == NULLAGBLOCK) {
+ if (flags & XFS_ALLOC_FLAG_FREEING)
+ break;
+ goto out_agflbp_relse;
+ }
+ /*
+ * Put each allocated block on the list.
+ */
+ for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
+ error = xfs_alloc_put_freelist(pag, tp, agbp,
+ agflbp, bno, 0);
+ if (error)
+ goto out_agflbp_relse;
+ }
+ }
+out_agflbp_relse:
+ xfs_trans_brelse(tp, agflbp);
+ return error;
+}
+
/*
* Decide whether to use this allocation group for this allocation.
* If so, fix up the btree freelist's size.
@@ -2600,8 +2679,7 @@ xfs_alloc_fix_freelist(
struct xfs_perag *pag = args->pag;
struct xfs_trans *tp = args->tp;
struct xfs_buf *agbp = NULL;
- struct xfs_buf *agflbp = NULL;
- struct xfs_alloc_arg targs; /* local allocation arguments */
+ struct xfs_owner_info oinfo;
xfs_agblock_t bno; /* freelist block */
xfs_extlen_t need; /* total blocks needed in freelist */
int error = 0;
@@ -2630,22 +2708,45 @@ xfs_alloc_fix_freelist(
goto out_agbp_relse;
}

- need = xfs_alloc_min_freelist(mp, pag);
- if (!xfs_alloc_space_available(args, need, flags |
- XFS_ALLOC_FLAG_CHECK))
- goto out_agbp_relse;
-
/*
- * Get the a.g. freespace buffer.
- * Can fail if we're not blocking on locks, and it's held.
+ * See the comment above xfs_fill_agfl() for the reason why we need to
+ * make freelist longer here. Assumed that such case is quite rare, so
+ * in order to simplify the code, let's take agbp unconditionally.
*/
- if (!agbp) {
- error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
- if (error) {
- /* Couldn't lock the AGF so skip this AG. */
- if (error == -EAGAIN)
- error = 0;
- goto out_no_agbp;
+ need = xfs_alloc_min_freelist(mp, pag);
+ if (pag->pagf_flcount < need) {
+ /*
+ * Get the a.g. freespace buffer.
+ * Can fail if we're not blocking on locks, and it's held.
+ */
+ if (!agbp) {
+ error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
+ if (error) {
+ /* Couldn't lock the AGF so skip this AG. */
+ if (error == -EAGAIN)
+ error = 0;
+ goto out_no_agbp;
+ }
+ }
+
+ need = xfs_alloc_min_freelist(mp, pag);
+ error = xfs_fill_agfl(args, flags, need, agbp);
+ if (error)
+ goto out_agbp_relse;
+ } else {
+ if (!xfs_alloc_space_available(args, need, flags |
+ XFS_ALLOC_FLAG_CHECK))
+ goto out_agbp_relse;
+
+ /* Get the a.g. freespace buffer. */
+ if (!agbp) {
+ error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
+ if (error) {
+ /* Couldn't lock the AGF so skip this AG. */
+ if (error == -EAGAIN)
+ error = 0;
+ goto out_no_agbp;
+ }
}
}

@@ -2691,69 +2792,22 @@ xfs_alloc_fix_freelist(
* regenerated AGFL, bnobt, and cntbt. See repair/phase5.c and
* repair/rmap.c in xfsprogs for details.
*/
- memset(&targs, 0, sizeof(targs));
- /* struct copy below */
if (flags & XFS_ALLOC_FLAG_NORMAP)
- targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
+ oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
else
- targs.oinfo = XFS_RMAP_OINFO_AG;
+ oinfo = XFS_RMAP_OINFO_AG;
while (!(flags & XFS_ALLOC_FLAG_NOSHRINK) && pag->pagf_flcount > need) {
error = xfs_alloc_get_freelist(pag, tp, agbp, &bno, 0);
if (error)
goto out_agbp_relse;

/* defer agfl frees */
- xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo);
+ xfs_defer_agfl_block(tp, args->agno, bno, &oinfo);
}

- targs.tp = tp;
- targs.mp = mp;
- targs.agbp = agbp;
- targs.agno = args->agno;
- targs.alignment = targs.minlen = targs.prod = 1;
- targs.type = XFS_ALLOCTYPE_THIS_AG;
- targs.pag = pag;
- error = xfs_alloc_read_agfl(pag, tp, &agflbp);
- if (error)
- goto out_agbp_relse;
-
- /* Make the freelist longer if it's too short. */
- while (pag->pagf_flcount < need) {
- targs.agbno = 0;
- targs.maxlen = need - pag->pagf_flcount;
- targs.resv = XFS_AG_RESV_AGFL;
-
- /* Allocate as many blocks as possible at once. */
- error = xfs_alloc_ag_vextent(&targs);
- if (error)
- goto out_agflbp_relse;
-
- /*
- * Stop if we run out. Won't happen if callers are obeying
- * the restrictions correctly. Can happen for free calls
- * on a completely full ag.
- */
- if (targs.agbno == NULLAGBLOCK) {
- if (flags & XFS_ALLOC_FLAG_FREEING)
- break;
- goto out_agflbp_relse;
- }
- /*
- * Put each allocated block on the list.
- */
- for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
- error = xfs_alloc_put_freelist(pag, tp, agbp,
- agflbp, bno, 0);
- if (error)
- goto out_agflbp_relse;
- }
- }
- xfs_trans_brelse(tp, agflbp);
args->agbp = agbp;
return 0;

-out_agflbp_relse:
- xfs_trans_brelse(tp, agflbp);
out_agbp_relse:
if (agbp)
xfs_trans_brelse(tp, agbp);
--
2.24.4


2022-11-03 14:02:18

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH v2] xfs: extend the freelist before available space check

Hi,

On Thu, Nov 03, 2022 at 09:10:25PM +0800, Gao Xiang wrote:
> There is a long standing issue which could cause fs shutdown due to
> inode extent to btree conversion failure right after an extent
> allocation in the same AG, which is absolutely unexpected due to the
> proper minleft reservation in the previous allocation. Brian once
> addressed one of the root cause [1], however, such symptom can still
> occur after the commit is merged as reported [2], and our cloud
> environment is also suffering from this issue.
>
> From the description of the commit [1], I found that Zirong has an
> in-house stress test reproducer for this issue, therefore I asked him
> to reproduce again and he confirmed that such issue can still be
> reproducable on RHEL 9.
>
> Thanks to him, after dumping the transaction log items, I think
> the root cause is as below:
> 1. Allocate space with the following condition:
> freeblks: 18304 pagf_flcount: 6
> reservation: 18276 need (min_free): 6
> args->minleft: 1
> available = freeblks + agflcount - reservation - need - minleft
> = 18304 + min(6, 6) - 18276 - 6 - 1 = 27
>
> The first allocation check itself is ok;
>
> 2. At that time, the AG state is
> AGF Buffer: (XAGF)
> ver:1 seq#:3 len:2621424
> root BNO:9 CNT:7
> level BNO:2 CNT:2
> 1st:64 last:69 cnt:6 freeblks:18277 longest:6395
>
> agfl (flfirst = 64, fllast = 69, flcount = 6)
> 64:547 65:167 66:1651 67:2040807 68:783 69:604
>
> 3. Then, cntbt needs a new btree block (so take one block
> from agfl), and then the log records a new AGF:
> blkno 62914177, len 1, map_size 1
> 00000000: 58 41 47 46 00 00 00 01 00 00 00 03 00 27 ff f0 XAGF.........'..
> 00000010: 00 00 00 09 00 00 00 07 00 00 00 00 00 00 00 02 ................
> 00000020: 00 00 00 02 00 00 00 00 00 00 00 41 00 00 00 45 ...........A...E
> 00000030: 00 00 00 05 00 00 47 65 00 00 18 fb 00 00 00 09 ......Ge........
> 00000040: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf
> agf 3 flfirst: 65 (0x41) fllast: 69 (0x45) cnt: 5
> freeblks 18277
>
> 4. agfl 64 (daddr 62918552) was then written as a cntbt block
> log item:
> type#011= 0x123c
> flags#011= 0x8
> blkno 62918552, len 8, map_size 1
> 00000000: 41 42 33 43 00 00 00 fd 00 1f 23 e4 ff ff ff ff AB3C......#.....
> 00000010: 00 00 00 00 03 c0 0f 98 00 00 00 00 00 00 00 00 ................
> 00000020: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf
>
> 5. Finally, the following inode extent to btree allocation fails:
> Nov 1 07:56:09 dell-per750-08 kernel: ------------[ cut here ]------------
> Nov 1 07:56:09 dell-per750-08 kernel: WARNING: CPU: 15 PID: 49290 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0xc51/0x1050 [xfs]
> ...
> Nov 1 07:56:10 dell-per750-08 kernel: XFS (sda2): agno 3 agflcount 5 freeblks 18277 reservation 18276 6
>
> since
>
> available = freeblks + agflcount - reservation - need - minleft
> = 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1
> kaboom.
>

Perhaps it's still not a correct fix since the second conversion
allocation will fail as:

available = freeblks + agflcount - reservation - need - minleft
= 18276 + min(6, 6) - 18276 - 6 - 0 = 0 < 1

If we don't want to use the last blocks of the AG, we should shorten
args->maxlen to avoid touch these agfl blocks, thoughts?

2300 static bool
2301 xfs_alloc_space_available(
2302 struct xfs_alloc_arg *args,
2303 xfs_extlen_t min_free,
2304 int flags)
2305 {

...

2329 available = (int)(pag->pagf_freeblks + agflcount -
2330 reservation - min_free - args->minleft);

^ here available = 27

2331 if (available < (int)max(args->total, alloc_len))
2332 return false;
2333
2334 /*
2335 * Clamp maxlen to the amount of free space available for the actual
2336 * extent allocation.
2337 */
2338 if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
2339 args->maxlen = available;

^ so args->maxlen = 27 here

and then freeblks from 18304 - 27 = 18277, but with another agfl block
allocated (pagf_flcount from 6 to 5), the inequality will not satisfy:

available = freeblks + agflcount - reservation - need - minleft
= 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1

I think one of the correct fix is to fix args->maxlen above though, or
some better preferred idea to fix this?

Thanks,
Gao Xiang

2022-11-03 16:56:29

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] xfs: extend the freelist before available space check

Hi Gao,

I love your patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on linus/master v6.1-rc3 next-20221103]
[cannot apply to djwong-xfs/djwong-devel]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20221103094639.39984-1-hsiangkao%40linux.alibaba.com
patch subject: [PATCH] xfs: extend the freelist before available space check
config: hexagon-randconfig-r004-20221102
compiler: clang version 15.0.4 (https://github.com/llvm/llvm-project 5c68a1cb123161b54b72ce90e7975d95a8eaf2a4)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/427b4db33a93f832a6da62b2838bd5a558d024d9
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
git checkout 427b4db33a93f832a6da62b2838bd5a558d024d9
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash fs/xfs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from fs/xfs/libxfs/xfs_alloc.c:6:
In file included from fs/xfs/xfs.h:22:
In file included from fs/xfs/xfs_linux.h:31:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from fs/xfs/libxfs/xfs_alloc.c:6:
In file included from fs/xfs/xfs.h:22:
In file included from fs/xfs/xfs_linux.h:31:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from fs/xfs/libxfs/xfs_alloc.c:6:
In file included from fs/xfs/xfs.h:22:
In file included from fs/xfs/xfs_linux.h:31:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
>> fs/xfs/libxfs/xfs_alloc.c:2597:1: warning: no previous prototype for function 'xfs_fill_agfl' [-Wmissing-prototypes]
xfs_fill_agfl(
^
fs/xfs/libxfs/xfs_alloc.c:2596:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int
^
static
7 warnings generated.


vim +/xfs_fill_agfl +2597 fs/xfs/libxfs/xfs_alloc.c

2585
2586 /*
2587 * The freelist has to be in a good state before the available space check
2588 * since multiple allocations could be performed from a single AG and
2589 * transaction under certain conditions. For example, A bmbt allocation
2590 * request made for inode extent to bmap format conversion after an extent
2591 * allocation is expected to be satisfied by the same AG. Such bmap conversion
2592 * allocation can fail due to the available space check if allocbt required an
2593 * extra btree block from the freelist in the previous allocation but without
2594 * make the freelist longer.
2595 */
2596 int
> 2597 xfs_fill_agfl(
2598 struct xfs_alloc_arg *args,
2599 int flags,
2600 xfs_extlen_t need,
2601 struct xfs_buf *agbp)
2602 {
2603 struct xfs_trans *tp = args->tp;
2604 struct xfs_perag *pag = agbp->b_pag;
2605 struct xfs_alloc_arg targs = {
2606 .tp = tp,
2607 .mp = tp->t_mountp,
2608 .agbp = agbp,
2609 .agno = args->agno,
2610 .alignment = 1,
2611 .minlen = 1,
2612 .prod = 1,
2613 .type = XFS_ALLOCTYPE_THIS_AG,
2614 .pag = pag,
2615 };
2616 struct xfs_buf *agflbp = NULL;
2617 xfs_agblock_t bno;
2618 int error;
2619
2620 if (flags & XFS_ALLOC_FLAG_NORMAP)
2621 targs.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
2622 else
2623 targs.oinfo = XFS_RMAP_OINFO_AG;
2624
2625 error = xfs_alloc_read_agfl(pag, tp, &agflbp);
2626 if (error)
2627 return error;
2628
2629 /* Make the freelist longer if it's too short. */
2630 while (pag->pagf_flcount < need) {
2631 targs.agbno = 0;
2632 targs.maxlen = need - pag->pagf_flcount;
2633 targs.resv = XFS_AG_RESV_AGFL;
2634
2635 /* Allocate as many blocks as possible at once. */
2636 error = xfs_alloc_ag_vextent(&targs);
2637 if (error)
2638 goto out_agflbp_relse;
2639
2640 /*
2641 * Stop if we run out. Won't happen if callers are obeying
2642 * the restrictions correctly. Can happen for free calls
2643 * on a completely full ag.
2644 */
2645 if (targs.agbno == NULLAGBLOCK) {
2646 if (flags & XFS_ALLOC_FLAG_FREEING)
2647 break;
2648 error = -ENOSPC;
2649 goto out_agflbp_relse;
2650 }
2651 /*
2652 * Put each allocated block on the list.
2653 */
2654 for (bno = targs.agbno; bno < targs.agbno + targs.len; bno++) {
2655 error = xfs_alloc_put_freelist(pag, tp, agbp,
2656 agflbp, bno, 0);
2657 if (error)
2658 goto out_agflbp_relse;
2659 }
2660 }
2661 out_agflbp_relse:
2662 xfs_trans_brelse(tp, agflbp);
2663 return error;
2664 }
2665

--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (9.14 kB)
config (127.65 kB)
Download all attachments

2022-11-03 23:23:42

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v2] xfs: extend the freelist before available space check

On Thu, Nov 03, 2022 at 09:10:25PM +0800, Gao Xiang wrote:
> There is a long standing issue which could cause fs shutdown due to
> inode extent to btree conversion failure right after an extent
> allocation in the same AG, which is absolutely unexpected due to the
> proper minleft reservation in the previous allocation. Brian once
> addressed one of the root cause [1], however, such symptom can still
> occur after the commit is merged as reported [2], and our cloud
> environment is also suffering from this issue.
>
> From the description of the commit [1], I found that Zirong has an
> in-house stress test reproducer for this issue, therefore I asked him
> to reproduce again and he confirmed that such issue can still be
> reproducable on RHEL 9.
>
> Thanks to him, after dumping the transaction log items, I think
> the root cause is as below:
> 1. Allocate space with the following condition:
> freeblks: 18304 pagf_flcount: 6
> reservation: 18276 need (min_free): 6
> args->minleft: 1
> available = freeblks + agflcount - reservation - need - minleft
> = 18304 + min(6, 6) - 18276 - 6 - 1 = 27
>
> The first allocation check itself is ok;
>
> 2. At that time, the AG state is
> AGF Buffer: (XAGF)
> ver:1 seq#:3 len:2621424
> root BNO:9 CNT:7
> level BNO:2 CNT:2
> 1st:64 last:69 cnt:6 freeblks:18277 longest:6395
^^^^^^^^^^^^^^

Hold on - pag->pagf_freeblks != agf->freeblks, and if we start with
the agf freeblocks:

available = 18277 + 6 - 18276 - 6 - 1 = 0

IOWs, the allocation should never selected this AG in the first
place.

So why is pag->pagf_freeblks not equal to agf->freeblks when this
allocation was first checked? It's clearly not because the AGFL is
unpopulated - both the perag and the agf indicate it has the minimum
6 blocks already allocated....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2022-11-04 01:04:11

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH v2] xfs: extend the freelist before available space check

Hi Dave,

On Fri, Nov 04, 2022 at 10:05:42AM +1100, Dave Chinner wrote:
> On Thu, Nov 03, 2022 at 09:10:25PM +0800, Gao Xiang wrote:
> > There is a long standing issue which could cause fs shutdown due to
> > inode extent to btree conversion failure right after an extent
> > allocation in the same AG, which is absolutely unexpected due to the
> > proper minleft reservation in the previous allocation. Brian once
> > addressed one of the root cause [1], however, such symptom can still
> > occur after the commit is merged as reported [2], and our cloud
> > environment is also suffering from this issue.
> >
> > From the description of the commit [1], I found that Zirong has an
> > in-house stress test reproducer for this issue, therefore I asked him
> > to reproduce again and he confirmed that such issue can still be
> > reproducable on RHEL 9.
> >
> > Thanks to him, after dumping the transaction log items, I think
> > the root cause is as below:
> > 1. Allocate space with the following condition:
> > freeblks: 18304 pagf_flcount: 6
> > reservation: 18276 need (min_free): 6
> > args->minleft: 1
> > available = freeblks + agflcount - reservation - need - minleft
> > = 18304 + min(6, 6) - 18276 - 6 - 1 = 27
> >
> > The first allocation check itself is ok;
> >
> > 2. At that time, the AG state is
> > AGF Buffer: (XAGF)
> > ver:1 seq#:3 len:2621424
> > root BNO:9 CNT:7
> > level BNO:2 CNT:2
> > 1st:64 last:69 cnt:6 freeblks:18277 longest:6395
> ^^^^^^^^^^^^^^
>
> Hold on - pag->pagf_freeblks != agf->freeblks, and if we start with
> the agf freeblocks:
>
> available = 18277 + 6 - 18276 - 6 - 1 = 0
>
> IOWs, the allocation should never selected this AG in the first
> place.
>
> So why is pag->pagf_freeblks not equal to agf->freeblks when this
> allocation was first checked? It's clearly not because the AGFL is
> unpopulated - both the perag and the agf indicate it has the minimum
> 6 blocks already allocated....

Thanks for the reply.

I may mispresent 2) here since there are several AGF agno 3 recording,
the last completed trans printed by "xfs_logprint" is:

============================================================================
TRANS: tid:0xaf57a744 #items:621 trans:0xaf57a744 q:0x56104c44be70
CUD: cnt:1 total:1 a:0x56104c44e320 len:16
CUD: #regs: 1 id: 0xff110004e02bc1e8
EFI: cnt:1 total:1 a:0x56104c447b30 len:32
EFI: #regs:1 num_extents:1 id:0xff110001bd8c56e0
(s: 0xe7cc8d, l: 3)
EFD: cnt:1 total:1 a:0x56104c42d1b0 len:32
EFD: #regs: 1 num_extents: 1 id: 0xff110001bd8c56e0
BUF: cnt:2 total:2 a:0x56104c42f5c0 len:24 a:0x56104c4712e0 len:128
BUF: #regs:2 start blkno:0x3bffe81 len:1 bmap size:1 flags:0x2800
AGF Buffer: (XAGF)
ver:1 seq#:3 len:2621424
root BNO:9 CNT:7
level BNO:2 CNT:2
1st:64 last:69 cnt:6 freeblks:18304 longest:6395

So I think freeblks starts from 18304.

18277 is just an intermediate state in my mind (Actually there is also such AGF
record, but that is not the latest one because this is a stress test), sorry
for this.

In short, in order to do the first allocation, I think it allocates from
freeblks 18304 -> 18276
agflcount 6->5
And the second one fails,
available = freeblks + agflcount - reservation - need - minleft
= 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1
I also think it can happen in the current codebase.

Full xfs_logprint is too large to send by email to the mailing list, but
I could send this separately to you if really needed.

My debugging message catched when xfs_trans_cancel() attached in the
following reply of this email.

Thanks,
Gao Xiang

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]

2022-11-04 01:04:43

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH v2] xfs: extend the freelist before available space check

On Fri, Nov 04, 2022 at 08:31:04AM +0800, Gao Xiang wrote:
> Hi Dave,
>
> On Fri, Nov 04, 2022 at 10:05:42AM +1100, Dave Chinner wrote:
> > On Thu, Nov 03, 2022 at 09:10:25PM +0800, Gao Xiang wrote:
> > > There is a long standing issue which could cause fs shutdown due to
> > > inode extent to btree conversion failure right after an extent
> > > allocation in the same AG, which is absolutely unexpected due to the
> > > proper minleft reservation in the previous allocation. Brian once
> > > addressed one of the root cause [1], however, such symptom can still
> > > occur after the commit is merged as reported [2], and our cloud
> > > environment is also suffering from this issue.
> > >
> > > From the description of the commit [1], I found that Zirong has an
> > > in-house stress test reproducer for this issue, therefore I asked him
> > > to reproduce again and he confirmed that such issue can still be
> > > reproducable on RHEL 9.
> > >
> > > Thanks to him, after dumping the transaction log items, I think
> > > the root cause is as below:
> > > 1. Allocate space with the following condition:
> > > freeblks: 18304 pagf_flcount: 6
> > > reservation: 18276 need (min_free): 6
> > > args->minleft: 1
> > > available = freeblks + agflcount - reservation - need - minleft
> > > = 18304 + min(6, 6) - 18276 - 6 - 1 = 27
> > >
> > > The first allocation check itself is ok;
> > >
> > > 2. At that time, the AG state is
> > > AGF Buffer: (XAGF)
> > > ver:1 seq#:3 len:2621424
> > > root BNO:9 CNT:7
> > > level BNO:2 CNT:2
> > > 1st:64 last:69 cnt:6 freeblks:18277 longest:6395
> > ^^^^^^^^^^^^^^
> >
> > Hold on - pag->pagf_freeblks != agf->freeblks, and if we start with
> > the agf freeblocks:
> >
> > available = 18277 + 6 - 18276 - 6 - 1 = 0
> >
> > IOWs, the allocation should never selected this AG in the first
> > place.
> >
> > So why is pag->pagf_freeblks not equal to agf->freeblks when this
> > allocation was first checked? It's clearly not because the AGFL is
> > unpopulated - both the perag and the agf indicate it has the minimum
> > 6 blocks already allocated....
>
> Thanks for the reply.
>
> I may mispresent 2) here since there are several AGF agno 3 recording,
> the last completed trans printed by "xfs_logprint" is:
>
> ============================================================================
> TRANS: tid:0xaf57a744 #items:621 trans:0xaf57a744 q:0x56104c44be70
> CUD: cnt:1 total:1 a:0x56104c44e320 len:16
> CUD: #regs: 1 id: 0xff110004e02bc1e8
> EFI: cnt:1 total:1 a:0x56104c447b30 len:32
> EFI: #regs:1 num_extents:1 id:0xff110001bd8c56e0
> (s: 0xe7cc8d, l: 3)
> EFD: cnt:1 total:1 a:0x56104c42d1b0 len:32
> EFD: #regs: 1 num_extents: 1 id: 0xff110001bd8c56e0
> BUF: cnt:2 total:2 a:0x56104c42f5c0 len:24 a:0x56104c4712e0 len:128
> BUF: #regs:2 start blkno:0x3bffe81 len:1 bmap size:1 flags:0x2800
> AGF Buffer: (XAGF)
> ver:1 seq#:3 len:2621424
> root BNO:9 CNT:7
> level BNO:2 CNT:2
> 1st:64 last:69 cnt:6 freeblks:18304 longest:6395
>
> So I think freeblks starts from 18304.
>
> 18277 is just an intermediate state in my mind (Actually there is also such AGF
> record, but that is not the latest one because this is a stress test), sorry
> for this.
>
> In short, in order to do the first allocation, I think it allocates from
> freeblks 18304 -> 18276


^ sorry 18277 here, already too many numbers in my head

I tried to send dmesg.log.xz in this thread, since I'm not sure whether
@vger.kernel.org will drop this email directly or not.

Thanks,
Gao Xiang


> agflcount 6->5
> And the second one fails,
> available = freeblks + agflcount - reservation - need - minleft
> = 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1
> I also think it can happen in the current codebase.
>
> Full xfs_logprint is too large to send by email to the mailing list, but
> I could send this separately to you if really needed.
>
> My debugging message catched when xfs_trans_cancel() attached in the
> following reply of this email.
>
> Thanks,
> Gao Xiang
>
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > [email protected]


Attachments:
(No filename) (4.40 kB)
dmesg.log.xz (14.63 kB)
Download all attachments

2022-11-20 08:01:03

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2] xfs: extend the freelist before available space check


please be noted we noticed there are some discussion around this v2 patch,
Dave also gave some comments. it seems us there will be further refines.
we still sent out this report FYI the possible impact of this patch.

Greeting,

FYI, we noticed xfstests.xfs.076.fail due to commit (built with gcc-11):

commit: bdd2815e0ab101910e39dfcdc9a2a88e01a3d39b ("[PATCH v2] xfs: extend the freelist before available space check")
url: https://github.com/intel-lab-lkp/linux/commits/UPDATE-20221103-211312/Gao-Xiang/xfs-extend-the-freelist-before-available-space-check/20221103-174840
base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v2] xfs: extend the freelist before available space check

in testcase: xfstests
version: xfstests-x86_64-5a5e419-1_20221108
with following parameters:

disk: 4HDD
fs: xfs
test: xfs-group-07

test-description: xfstests is a regression test suite for xfs and other files ystems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


on test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):




If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


user :warn : [ 118.806130] run fstests xfs/076 at 2022-11-17 02:44:36
kern :notice: [ 119.589950] XFS (sda2): Mounting V5 Filesystem
kern :info : [ 119.657030] XFS (sda2): Ending clean mount
kern :warn : [ 119.663880] xfs filesystem being mounted at /fs/sda2 supports timestamps until 2038 (0x7fffffff)
kern :notice: [ 121.762735] XFS (sda5): Mounting V5 Filesystem
kern :info : [ 121.975291] XFS (sda5): Ending clean mount
kern :warn : [ 121.981897] xfs filesystem being mounted at /fs/scratch supports timestamps until 2038 (0x7fffffff)
kern :notice: [ 122.001845] XFS (sda5): Unmounting Filesystem
kern :notice: [ 122.315727] XFS (sda5): Mounting V5 Filesystem
kern :info : [ 122.371799] XFS (sda5): Ending clean mount
kern :warn : [ 122.378570] xfs filesystem being mounted at /fs/scratch supports timestamps until 2038 (0x7fffffff)
user :notice: [ 133.550084] xfs/076 IPMI BMC is not supported on this machine, skip bmc-watchdog setup!

kern :alert : [ 243.286999] XFS (sda5): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x9e8/0xe80 [xfs]
kern :warn : [ 243.300093] CPU: 1 PID: 36204 Comm: touch Not tainted 6.1.0-rc1-00031-gbdd2815e0ab1 #1
kern :warn : [ 243.308725] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
kern :warn : [ 243.317169] Call Trace:
kern :warn : [ 243.320320] <TASK>
kern :warn : [ 243.323131] dump_stack_lvl+0x34/0x44
kern :warn : [ 243.327505] xfs_trans_cancel+0x236/0x580 [xfs]
kern :warn : [ 243.332889] xfs_create+0x9e8/0xe80 [xfs]
kern :warn : [ 243.337742] ? xfs_irele+0xe0/0xe0 [xfs]
kern :warn : [ 243.342511] ? __wake_up_common_lock+0xe3/0x140
kern :warn : [ 243.347751] ? get_cached_acl_rcu+0xf0/0xf0
kern :warn : [ 243.352647] ? _raw_spin_lock+0x81/0xd0
kern :warn : [ 243.357192] xfs_generic_create+0x477/0x610 [xfs]
kern :warn : [ 243.362737] ? xfs_setup_iops+0x3b0/0x3b0 [xfs]
kern :warn : [ 243.368100] ? d_splice_alias+0xfa/0x4f0
kern :warn : [ 243.372727] ? arch_stack_walk+0x61/0xf0
kern :warn : [ 243.377355] ? xfs_vn_link+0x1c0/0x1c0 [xfs]
kern :warn : [ 243.382461] ? xfs_vn_lookup+0x143/0x170 [xfs]
kern :warn : [ 243.387739] ? inode_permission+0x91/0x500
kern :warn : [ 243.392548] ? xfs_vn_link+0x1c0/0x1c0 [xfs]
kern :warn : [ 243.397646] lookup_open+0xb47/0x1300
kern :warn : [ 243.402630] ? path_lookupat+0x650/0x650
kern :warn : [ 243.407266] ? down_write_killable+0x160/0x160
kern :warn : [ 243.412419] ? __legitimize_path+0x6b/0x160
kern :warn : [ 243.417308] open_last_lookups+0x436/0xeb0
kern :warn : [ 243.422109] ? kasan_set_track+0x21/0x30
kern :warn : [ 243.426738] path_openat+0x165/0x640
kern :warn : [ 243.431019] ? vfs_tmpfile_open+0x60/0x60
kern :warn : [ 243.435734] ? filemap_map_pages+0x718/0xb10
kern :warn : [ 243.440707] do_filp_open+0x1b1/0x3e0
kern :warn : [ 243.445080] ? filemap_map_pmd+0x630/0x630
kern :warn : [ 243.449886] ? may_open_dev+0xd0/0xd0
kern :warn : [ 243.454256] ? _raw_spin_lock+0x81/0xd0
kern :warn : [ 243.458807] ? _raw_write_lock_irq+0xd0/0xd0
kern :warn : [ 243.463789] ? alloc_fd+0x3b3/0x560
kern :warn : [ 243.467981] do_sys_openat2+0x122/0x400
kern :warn : [ 243.472524] ? build_open_flags+0x450/0x450
kern :warn : [ 243.477413] __x64_sys_openat+0x11f/0x1d0
kern :warn : [ 243.482125] ? __x64_sys_open+0x1a0/0x1a0
kern :warn : [ 243.486841] ? exit_to_user_mode_loop+0xbc/0x120
kern :warn : [ 243.492171] do_syscall_64+0x35/0x80
kern :warn : [ 243.496451] entry_SYSCALL_64_after_hwframe+0x5e/0xc8
kern :warn : [ 243.502206] RIP: 0033:0x7f727731b5a7
kern :warn : [ 243.506490] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
kern :warn : [ 243.525950] RSP: 002b:00007ffc0dcb7aa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
kern :warn : [ 243.534224] RAX: ffffffffffffffda RBX: 00007ffc0dcb7d58 RCX: 00007f727731b5a7
kern :warn : [ 243.542057] RDX: 0000000000000941 RSI: 00007ffc0dcb8795 RDI: 00000000ffffff9c
kern :warn : [ 243.549893] RBP: 00007ffc0dcb8795 R08: 0000000000000000 R09: 0000000000000000
kern :warn : [ 243.557728] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
kern :warn : [ 243.565570] R13: 00007ffc0dcb8795 R14: 0000000000000000 R15: 0000000000000000
kern :warn : [ 243.573404] </TASK>
kern :alert : [ 243.588136] XFS (sda5): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x24f/0x580 [xfs] (fs/xfs/xfs_trans.c:1098). Shutting down filesystem.
kern :alert : [ 243.603268] XFS (sda5): Please unmount the filesystem and rectify the problem(s)
kern :notice: [ 243.758781] XFS (sda5): Unmounting Filesystem
kern :notice: [ 243.944261] XFS (sda2): Unmounting Filesystem
user :notice: [ 244.041371] [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//xfs/076.out.bad)

user :notice: [ 244.060537] --- tests/xfs/076.out 2022-11-08 10:09:45.000000000 +0000

user :notice: [ 244.073022] +++ /lkp/benchmarks/xfstests/results//xfs/076.out.bad 2022-11-17 02:46:42.084042859 +0000

user :notice: [ 244.086146] @@ -1,2 +1,3 @@

user :notice: [ 244.093039] QA output created by 076

user :notice: [ 244.100648] -iusepct is in range

user :notice: [ 244.107677] +fpunch failed

user :notice: [ 244.115509] +(see /lkp/benchmarks/xfstests/results//xfs/076.full for details)

user :notice: [ 244.126194] ...


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (7.63 kB)
config-6.1.0-rc1-00031-gbdd2815e0ab1 (172.88 kB)
job-script (6.11 kB)
kmsg.xz (41.27 kB)
xfstests (1.73 kB)
job.yaml (4.97 kB)
reproduce (0.98 kB)
Download all attachments