From: Jeffle Xu
To: enwlinux@gmail.com, linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, joseph.qi@linux.alibaba.com
Subject: [PATCH RFC] ext4: fix partial cluster initialization when splitting extent
Date: Thu, 14 May 2020 16:14:57 +0800
Message-Id: <1589444097-38535-1-git-send-email-jefflexu@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
X-Mailing-List: linux-ext4@vger.kernel.org

Hi Eric,

would you mind explaining why the magic number '2' is used here when
calculating the physical cluster number of the partial cluster in
commit f4226d9ea400 ("ext4: fix partial cluster initialization")?
```
+			/*
+			 * If we're going to split the extent, note that
+			 * the cluster containing the block after 'end' is
+			 * in use to avoid freeing it when removing blocks.
+			 */
+			if (sbi->s_cluster_ratio > 1) {
+				pblk = ext4_ext_pblock(ex) + end - ee_block + 2;
+				partial_cluster =
+					-(long long) EXT4_B2C(sbi, pblk);
+			}
```

As far as I understand, here we are initializing the partial cluster
describing the beginning of the split extent after 'end'. The
corresponding physical block number of the first block in the split
extent should be 'ext4_ext_pblock(ex) + end - ee_block + 1'.

This bug sometimes causes xfstests shared/298 to fail on ext4 with
bigalloc enabled. The ext4 error messages indicate that previously
freed blocks are being freed again, and the subsequent fsck fails
because the block bitmap and the block group descriptor are
inconsistent.

The following is an example case:

1. First, initialize an ext4 filesystem with a cluster size of '16K' and
a block size of '4K', in which case one cluster contains four blocks.

2. Create one file (e.g., xxx.img) on this ext4 filesystem. The extent
tree of this file now looks like:

...
36864:[0]4:220160
36868:[0]14332:145408
51200:[0]2:231424
...

3. Then execute PUNCH_HOLE fallocate on this file. The hole ranges look
like:

...
ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
...

4. The extent tree of this file after punching then looks like:

...
49507:[0]37:158047
49547:[0]58:158087
...

5. Detailed procedure of punching hole [49544, 49546]

5.1. The block address space:
```
lblk        ~49505  49506   49507~49543    49544~49546    49547~
          ---------+------+-------------+----------------+--------
            extent | hole |   extent    |      hole      | extent
          ---------+------+-------------+----------------+--------
pblk       ~158045  158046 158047~158083  158084~158086   158087~
```

5.2. The detailed layout of cluster 39521:
```
                    cluster 39521
           <------------------------------->

                     hole              extent
           <----------------------><--------

lblk       49544   49545   49546   49547
           +-------+-------+-------+-------+
           |       |       |       |       |
           +-------+-------+-------+-------+
pblk       158084  158085  158086  158087
```

5.3. The ftrace output when punching hole [49544, 49546]:

- ext4_ext_remove_space (start 49544, end 49546)
  - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40],
    partial [pclu 39522 lblk 0 state 2])
    - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546,
      partial [pclu 39522 lblk 0 state 2])
      - ext4_free_blocks: (block 158084 count 4)
        - ext4_mballoc_free (extent 1/6753/1)

In this case, the whole of cluster 39521 is mistakenly freed when
freeing pblocks 158084~158086 (i.e., the first three blocks of this
cluster), although pblock 158087 (the last remaining block of this
cluster) is still in use by the next extent.

The root cause of this issue is that the pclu of the partial cluster is
calculated incorrectly in ext4_ext_remove_space(). The correct
partial_cluster.pclu (i.e., the cluster number of the first block in the
next extent, that is, lblock 49547 (pblock 158087)) should be 39521
rather than 39522.

Fixes: f4226d9ea400 ("ext4: fix partial cluster initialization")
Signed-off-by: Jeffle Xu
---
 fs/ext4/extents.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f2b577b..cb74496 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2828,7 +2828,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
 			 * in use to avoid freeing it when removing blocks.
 			 */
 			if (sbi->s_cluster_ratio > 1) {
-				pblk = ext4_ext_pblock(ex) + end - ee_block + 2;
+				pblk = ext4_ext_pblock(ex) + end - ee_block + 1;
 				partial.pclu = EXT4_B2C(sbi, pblk);
 				partial.state = nofree;
 			}
-- 
1.8.3.1
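
For reference, the arithmetic in the example case can be checked with a
small userspace sketch. This is not kernel code: b2c() and
pblk_after_end() are hypothetical helpers introduced only for
illustration, and b2c() assumes that EXT4_B2C() amounts to dividing the
block number by the cluster ratio (four blocks per cluster in the
example).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for EXT4_B2C(): map a physical block number to
 * its cluster number, assuming a power-of-two cluster ratio.
 */
static uint64_t b2c(uint64_t pblk, unsigned int cluster_ratio)
{
	return pblk / cluster_ratio;
}

/*
 * Hypothetical helper: physical block located 'offset' blocks past
 * 'end' in an extent that starts at logical block ee_block and
 * physical block ee_pblk.
 */
static uint64_t pblk_after_end(uint64_t ee_pblk, uint32_t ee_block,
			       uint32_t end, unsigned int offset)
{
	return ee_pblk + end - ee_block + offset;
}

/*
 * Values taken from the example: extent [49507(158047), 40],
 * end = 49546, cluster ratio 4.
 */
static void check_example(void)
{
	/* '+ 1' gives the first block after 'end', in cluster 39521. */
	assert(pblk_after_end(158047, 49507, 49546, 1) == 158087);
	assert(b2c(158087, 4) == 39521);

	/* '+ 2' gives the block after that, already in cluster 39522. */
	assert(pblk_after_end(158047, 49507, 49546, 2) == 158088);
	assert(b2c(158088, 4) == 39522);
}
```

Calling check_example() from a test program passes with these numbers,
which matches the "pclu 39522" seen in the ftrace output before the fix
and the expected value of 39521 after it.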