Message-ID: <2ab8e268-4e1a-8e97-4798-48fcdb651cdf@huawei.com>
Date: Fri, 28 Oct 2022 09:41:55 +0800
Subject: Re: [PATCH RFC] ext4:record error information when insert extent failed in 'ext4_split_extent_at'
From: zhanchengbin
To: Ye Bin
CC: Ye Bin
References: <20221024122725.3083432-1-yebin@huaweicloud.com>
In-Reply-To: <20221024122725.3083432-1-yebin@huaweicloud.com>
X-Mailing-List: linux-ext4@vger.kernel.org

There have been a lot of problems in this code path before, but the
underlying problem has not been fundamentally solved.

On 2022/10/24 20:27, Ye Bin wrote:
> From: Ye Bin
>
> The following issue shows up when testing with memory fault injection:
> [localhost]# fsck.ext4 -a image
> image: clean, 45595/655360 files, 466841/2621440 blocks
> [localhost]# fsck.ext4 -fn image
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -(1457230--1457256)
> Fix? no
>
> image: ********** WARNING: Filesystem still has errors **********
>
> image: 45595/655360 files (12.4% non-contiguous), 466841/2621440 blocks
>
> Inject context:
> -----------------------------------------------------------
> Inject function:kmem_cache_alloc (pid:177858) (return: 0)
> Calltrace Context:
> mem_cache_allock+0x73/0xcc
> ext4_mb_new_blocks+0x32e/0x540 [ext4]
> ext4_new_meta_blocks+0xc4/0x110 [ext4]
> ext4_ext_grow_indepth+0x68/0x250 [ext4]
> ext4_ext_create_new_leaf+0xc5/0x120 [ext4]
> ext4_ext_insert_extent+0x1bf/0x670 [ext4]
> ext4_split_extent_at+0x212/0x530 [ext4]
> ext4_split_extent+0x13a/0x1a0 [ext4]
> ext4_ext_handle_unwritten_extents+0x13d/0x240 [ext4]
> ext4_ext_map_blocks+0x459/0x8f0 [ext4]
> ext4_map_blocks+0x18e/0x5a0 [ext4]
> ext4_iomap_alloc+0xb0/0x1b0 [ext4]
> ext4_iomap_begin+0xb0/0x130 [ext4]
> iomap_apply+0x95/0x2e0
> __iomap_dio_rw+0x1cc/0x4b0
> iomap_dio_rw+0xe/0x40
> ext4_dio_write_iter+0x1a9/0x390 [ext4]
> new_sync_write+0x113/0x1b0
> vfs_write+0x1b7/0x250
> ksys_write+0x5f/0xe0
> do_syscall_64+0x33/0x40
> entry_SYSCALL_64_after_hwframe+0x61/0xc6
>
> Compare extent change in journal:
> Start:
> ee_block  ee_len  ee_start
> 75        32798   1457227  -> unwritten len=30
> 308       12      434489
> 355       5       442492
> =>
> ee_block  ee_len  ee_start
> 11        2       951584
> 74        32769   951647   -> unwritten len=1
> 75        32771   1457227  -> unwritten len=3, length decreased 27
> 211       15      960906
> 308       12      434489
> 355       5       442492
>
> Actually, the above issue can be repaired by 'fsck -fa'. But as the file
> system is 'clean', 'fsck' will not do a deep repair.
> Obviously, 27 blocks are lost in the end. The above issue may happen as follows:
> ext4_split_extent_at
> ...
> err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); -> return -ENOMEM
> if (err != -ENOSPC && err != -EDQUOT)
>         goto out; -> goto 'out' will not fix extent length, will
> ...
> fix_extent_len:
>         ex->ee_len = orig_ex.ee_len;
>         /*
>          * Ignore ext4_ext_dirty return value since we are already in error path
>          * and err is a non-zero error code.
>          */
>         ext4_ext_dirty(handle, inode, path + path->p_depth);
>         return err;
> out:
>         ext4_ext_show_leaf(inode, path);
>         return err;
> If 'ext4_ext_insert_extent' returns '-ENOMEM', 'ex->ee_len' is not restored
> to the old length. 'ext4_ext_insert_extent' will trigger an extent tree
> merge, so a fix like 'ex->ee_len = orig_ex.ee_len' may lead to new issues.
> To solve the above issue, record an error message when 'ext4_ext_insert_extent'
> returns an 'err' that is neither '-ENOSPC' nor '-EDQUOT'. If the filesystem is
> mounted with 'errors=continue', 'fsck' will repair the issue because the
> filesystem is no longer clean. If the filesystem is mounted with
> 'errors=remount-ro', the filesystem will be remounted read-only.
>
> Signed-off-by: Ye Bin
> ---
>  fs/ext4/extents.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f1956288307f..582a7d59d6e3 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3252,8 +3252,13 @@ static int ext4_split_extent_at(handle_t *handle,
>  		ext4_ext_mark_unwritten(ex2);
>  
>  	err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
> -	if (err != -ENOSPC && err != -EDQUOT)
> +	if (err != -ENOSPC && err != -EDQUOT) {
> +		if (err)
> +			EXT4_ERROR_INODE_ERR(inode, -err,
> +			"insert extent failed block = %d len = %d",
> +			ex2->ee_block, ex2->ee_len);
>  		goto out;
> +	}
>  
>  	if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
>  		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
>
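
To make the numbers in the quoted journal comparison easier to check, here
is a small user-space sketch (not kernel code). It assumes the usual ext4
convention that an unwritten extent is stored with 32768 added to ee_len,
and it uses a simplified, host-endian stand-in for struct ext4_extent. The
names 'struct extent', 'struct block_range', 'extent_actual_len' and
'leaked_range' are made up for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's EXT_INIT_MAX_LEN (1 << 15); a raw ee_len above this
 * value marks the extent as unwritten. */
#define INIT_MAX_LEN 32768u

/* Hypothetical, simplified stand-in for struct ext4_extent (host-endian,
 * physical start kept in one field instead of ee_start_hi/ee_start_lo). */
struct extent {
	uint32_t ee_block;	/* first logical block covered */
	uint16_t ee_len;	/* raw length field as shown in the journal dump */
	uint64_t ee_start;	/* first physical block */
};

/* Hypothetical helper type: an inclusive range of physical blocks. */
struct block_range {
	uint64_t first;
	uint64_t last;
	unsigned count;
};

static int extent_is_unwritten(const struct extent *ex)
{
	return ex->ee_len > INIT_MAX_LEN;
}

/* Number of blocks really covered, with the unwritten marker stripped. */
static unsigned extent_actual_len(const struct extent *ex)
{
	return ex->ee_len <= INIT_MAX_LEN ? ex->ee_len
					  : ex->ee_len - INIT_MAX_LEN;
}

/* Physical blocks that the shortened extent no longer covers. If nothing
 * else references them, they stay set in the block bitmap but unused. */
static struct block_range leaked_range(const struct extent *before,
				       const struct extent *after)
{
	unsigned old_len = extent_actual_len(before);
	unsigned new_len = extent_actual_len(after);
	struct block_range r;

	/* Only meaningful when the same extent was shortened in place. */
	assert(before->ee_start == after->ee_start && new_len < old_len);

	r.first = after->ee_start + new_len;
	r.last = before->ee_start + old_len - 1;
	r.count = old_len - new_len;
	return r;
}
```

Feeding in the entry at ee_block 75 before and after the failure (raw
ee_len 32798 and 32771, ee_start 1457227) gives actual lengths 30 and 3 and
a range of 1457230 through 1457256, 27 blocks. That is the same range fsck
reports under "Block bitmap differences".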