Date: Thu, 3 Nov 2022 21:55:51 +0800
From: Gao Xiang
To: linux-xfs@vger.kernel.org, "Darrick J. Wong", Dave Chinner, Brian Foster
Cc: LKML, Zirong Lang
Subject: Re: [PATCH v2] xfs: extend the freelist before available space check
Mail-Followup-To: linux-xfs@vger.kernel.org, "Darrick J.
 Wong", Dave Chinner, Brian Foster, LKML, Zirong Lang
References: <20221103094639.39984-1-hsiangkao@linux.alibaba.com> <20221103131025.40064-1-hsiangkao@linux.alibaba.com>
In-Reply-To: <20221103131025.40064-1-hsiangkao@linux.alibaba.com>

Hi,

On Thu, Nov 03, 2022 at 09:10:25PM +0800, Gao Xiang wrote:
> There is a long-standing issue which could cause fs shutdown due to
> inode extent to btree conversion failure right after an extent
> allocation in the same AG, which is absolutely unexpected due to the
> proper minleft reservation in the previous allocation. Brian once
> addressed one of the root causes [1]; however, this symptom can still
> occur after the commit is merged as reported [2], and our cloud
> environment is also suffering from this issue.
>
> From the description of the commit [1], I found that Zirong has an
> in-house stress test reproducer for this issue, therefore I asked him
> to reproduce again and he confirmed that the issue is still
> reproducible on RHEL 9.
>
> Thanks to him, after dumping the transaction log items, I think
> the root cause is as below:
>  1. Allocate space with the following condition:
>       freeblks: 18304  pagf_flcount: 6
>       reservation: 18276  need (min_free): 6
>       args->minleft: 1
>       available = freeblks + agflcount - reservation - need - minleft
>                 = 18304 + min(6, 6) - 18276 - 6 - 1 = 27
>
>     The first allocation check itself is ok;
>
>  2. At that time, the AG state is
>       AGF Buffer: (XAGF)
>          ver:1 seq#:3 len:2621424
>          root BNO:9 CNT:7
>          level BNO:2 CNT:2
>          1st:64 last:69 cnt:6 freeblks:18277 longest:6395
>
>       agfl (flfirst = 64, fllast = 69, flcount = 6)
>          64:547 65:167 66:1651 67:2040807 68:783 69:604
>
>  3. Then, cntbt needs a new btree block (so take one block
>     from agfl), and then the log records a new AGF:
>       blkno 62914177, len 1, map_size 1
>       00000000: 58 41 47 46 00 00 00 01 00 00 00 03 00 27 ff f0 XAGF.........'..
>       00000010: 00 00 00 09 00 00 00 07 00 00 00 00 00 00 00 02 ................
>       00000020: 00 00 00 02 00 00 00 00 00 00 00 41 00 00 00 45 ...........A...E
>       00000030: 00 00 00 05 00 00 47 65 00 00 18 fb 00 00 00 09 ......Ge........
>       00000040: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf
>       agf 3 flfirst: 65 (0x41) fllast: 69 (0x45) cnt: 5
>       freeblks 18277
>
>  4. agfl 64 (daddr 62918552) was then written as a cntbt block
>     log item:
>       type    = 0x123c
>       flags   = 0x8
>       blkno 62918552, len 8, map_size 1
>       00000000: 41 42 33 43 00 00 00 fd 00 1f 23 e4 ff ff ff ff AB3C......#.....
>       00000010: 00 00 00 00 03 c0 0f 98 00 00 00 00 00 00 00 00 ................
>       00000020: 75 dc c1 b5 1a 45 40 2a 80 50 72 f0 59 6e 62 66 u....E@*.Pr.Ynbf
>
>  5. Finally, the following inode extent to btree allocation fails:
>     Nov 1 07:56:09 dell-per750-08 kernel: ------------[ cut here ]------------
>     Nov 1 07:56:09 dell-per750-08 kernel: WARNING: CPU: 15 PID: 49290 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0xc51/0x1050 [xfs]
>     ...
>     Nov 1 07:56:10 dell-per750-08 kernel: XFS (sda2): agno 3 agflcount 5 freeblks 18277 reservation 18276 6
>
>     since
>
>       available = freeblks + agflcount - reservation - need - minleft
>                 = 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1
>     kaboom.
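
The two checks quoted above (steps 1 and 5) can be reproduced with a small standalone helper. This is only a sketch: space_available() is a hypothetical userspace function, not the kernel's xfs_alloc_space_available(), and it assumes agflcount is min(pagf_flcount, need), as the quoted arithmetic implies.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the quoted formula (not kernel code):
 *   available = freeblks + agflcount - reservation - need - minleft
 * where agflcount is assumed to be min(flcount, need).
 */
int space_available(int freeblks, int flcount, int reservation,
		    int need, int minleft)
{
	int agflcount;

	/* The model only makes sense for non-negative counts. */
	assert(flcount >= 0 && need >= 0);

	/* Only AGFL blocks up to the required minimum are counted. */
	agflcount = flcount < need ? flcount : need;

	return freeblks + agflcount - reservation - need - minleft;
}
```

With the values from step 1, space_available(18304, 6, 18276, 6, 1) gives 27; with the values from step 5, space_available(18277, 5, 18276, 6, 0) gives 0, which is below the one block the conversion needs.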
>

Perhaps this is still not a correct fix, since the second conversion
allocation will fail as:

  available = freeblks + agflcount - reservation - need - minleft
            = 18276 + min(6, 6) - 18276 - 6 - 0 = 0 < 1

If we don't want to use the last blocks of the AG, we should shorten
args->maxlen to avoid touching these agfl blocks. Thoughts?

static bool
xfs_alloc_space_available(
	struct xfs_alloc_arg	*args,
	xfs_extlen_t		min_free,
	int			flags)
{
...
	available = (int)(pag->pagf_freeblks + agflcount -
			  reservation - min_free - args->minleft);
	^ here available = 27
	if (available < (int)max(args->total, alloc_len))
		return false;

	/*
	 * Clamp maxlen to the amount of free space available for the actual
	 * extent allocation.
	 */
	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
		args->maxlen = available;
		^ so args->maxlen = 27 here

freeblks then drops from 18304 to 18304 - 27 = 18277, but with another
agfl block allocated (pagf_flcount from 6 to 5), the inequality is no
longer satisfied:

  available = freeblks + agflcount - reservation - need - minleft
            = 18277 + min(5, 6) - 18276 - 6 - 0 = 0 < 1

I think one correct fix is to fix args->maxlen above, or is there
some better, preferred idea to fix this?

Thanks,
Gao Xiang
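
The sequence described above can be replayed in a small userspace model. This is a sketch under stated assumptions, not a proposed patch: struct ag_model, model_available() and model_conversion_ok() are hypothetical names; the model assumes the first allocation consumes the whole clamped maxlen and that the cntbt split takes exactly one AGFL block without changing freeblks, as in the log dumps. The maxlen_trim parameter is purely illustrative of what "shortening args->maxlen" would do to the numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-AG counters (not kernel code). */
struct ag_model {
	int freeblks;
	int flcount;
};

/* available = freeblks + min(flcount, min_free) - reservation - min_free - minleft */
int model_available(const struct ag_model *ag, int reservation,
		    int min_free, int minleft)
{
	int agflcount = ag->flcount < min_free ? ag->flcount : min_free;

	return ag->freeblks + agflcount - reservation - min_free - minleft;
}

/*
 * Replay the reported sequence with the numbers from this thread and
 * report whether the follow-up one-block conversion check would pass.
 * maxlen_trim is an illustrative knob: how many blocks maxlen is
 * shortened by below the clamped value.
 */
bool model_conversion_ok(int maxlen_trim)
{
	struct ag_model ag = { .freeblks = 18304, .flcount = 6 };
	const int reservation = 18276;
	const int min_free = 6;
	int maxlen;

	assert(maxlen_trim >= 0);

	/* First allocation: minleft = 1, maxlen clamped to available. */
	maxlen = model_available(&ag, reservation, min_free, 1) - maxlen_trim;

	ag.freeblks -= maxlen;	/* extent allocation uses maxlen blocks */
	ag.flcount -= 1;	/* cntbt split takes one AGFL block */

	/* Second allocation: extents-to-btree needs 1 block, minleft = 0. */
	return model_available(&ag, reservation, min_free, 0) >= 1;
}
```

In this model, model_conversion_ok(0) is false (maxlen = 27, so available ends at 0), while model_conversion_ok(1) is true (maxlen = 26 leaves available at 1). Whether a fixed one-block trim is the right amount in general is not something the model can answer.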