From: Zhang Yi
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com, yi.zhang@huawei.com, yi.zhang@huaweicloud.com, chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: [PATCH v4 02/10] ext4: check the extent status again before inserting delalloc block
Date: Sat, 11 May 2024 19:26:11 +0800
Message-Id: <20240511112619.3656450-3-yi.zhang@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240511112619.3656450-1-yi.zhang@huaweicloud.com>
References: <20240511112619.3656450-1-yi.zhang@huaweicloud.com>
X-Mailing-List: linux-ext4@vger.kernel.org

From: Zhang Yi

ext4_da_map_blocks() first looks up an extent entry in the extent
status tree (without i_data_sem) and then looks up the on-disk extent
mapping (with i_data_sem held in read mode). If it finds a hole in the
extent status tree, or finds no entry at all, it takes i_data_sem in
write mode to add a da entry to the extent status tree. This can race
with the page mkwrite and fallocate paths.

Note that this is ok between
1. ext4 buffered-write path v/s ext4_page_mkwrite(), because of the
   folio lock
2. ext4 buffered-write path v/s ext4 fallocate, because of the inode
   lock.

But this can race between ext4_page_mkwrite() and the ext4 fallocate
path:

ext4_page_mkwrite()             ext4_fallocate()
 block_page_mkwrite()
  ext4_da_map_blocks()
   //find hole in extent status tree
                                 ext4_alloc_file_blocks()
                                  ext4_map_blocks()
                                   //allocate block and unwritten extent
   ext4_insert_delayed_block()
    ext4_da_reserve_space()
     //reserve one more block
    ext4_es_insert_delayed_block()
     //drop unwritten extent and add delayed extent by mistake

The delalloc extent then stays wrong until writeback, the extra
reserved block can never be released, and the following warning is
triggered:

 EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!

Fix the problem by looking up the extent status tree again while
i_data_sem is held in write mode. If there is still no entry, insert a
new da entry into the extent status tree.

Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi
Reviewed-by: Jan Kara
---
 fs/ext4/inode.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6a41172c06e1..6114ca79f464 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		if (ext4_es_is_hole(&es))
 			goto add_delayed;
 
+found:
 		/*
 		 * Delayed extent could be allocated by fallocate.
 		 * So we need to check it.
@@ -1781,6 +1782,26 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 add_delayed:
 	down_write(&EXT4_I(inode)->i_data_sem);
+	/*
+	 * Page fault path (ext4_page_mkwrite does not take i_rwsem)
+	 * and fallocate path (no folio lock) can race. Make sure we
+	 * lookup the extent status tree here again while i_data_sem
+	 * is held in write mode, before inserting a new da entry in
+	 * the extent status tree.
+	 */
+	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		if (!ext4_es_is_hole(&es)) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			goto found;
+		}
+	} else if (!ext4_has_inline_data(inode)) {
+		retval = ext4_map_query_blocks(NULL, inode, map);
+		if (retval) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			return retval;
+		}
+	}
+
 	retval = ext4_insert_delayed_block(inode, map->m_lblk);
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (retval)
-- 
2.39.2
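
For readers who want to see the race and the recheck in isolation, here
is a rough userspace sketch. It is not ext4 code. Every name in it
(toy_inode, toy_fallocate, toy_insert_delayed, toy_replay_race) is
hypothetical, a pthread mutex stands in for i_data_sem held in write
mode, and a small array stands in for the extent status tree. It only
replays the interleaving from the commit message in a fixed order, with
and without the second lookup under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical block states; the real extent status tree is far richer. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_DELAYED };

#define NBLOCKS 8

struct toy_inode {
	pthread_mutex_t data_sem;      /* stands in for i_data_sem (write mode) */
	enum blk_state state[NBLOCKS]; /* stands in for the extent status tree */
	int reserved;                  /* stands in for i_reserved_data_blocks */
};

static void toy_init(struct toy_inode *ino)
{
	pthread_mutex_init(&ino->data_sem, NULL);
	for (int i = 0; i < NBLOCKS; i++)
		ino->state[i] = BLK_HOLE;
	ino->reserved = 0;
}

/* fallocate-like path: mark the block allocated-unwritten under the lock. */
static void toy_fallocate(struct toy_inode *ino, int blk)
{
	assert(blk >= 0 && blk < NBLOCKS);
	pthread_mutex_lock(&ino->data_sem);
	ino->state[blk] = BLK_UNWRITTEN;
	pthread_mutex_unlock(&ino->data_sem);
}

/*
 * Delayed-insert path. With recheck == false it trusts the caller's
 * earlier unlocked lookup and overwrites whatever is there. With
 * recheck == true it looks the block up again under the lock and only
 * reserves and inserts if the block is still a hole.
 * Returns true if a block was reserved.
 */
static bool toy_insert_delayed(struct toy_inode *ino, int blk, bool recheck)
{
	bool inserted = false;

	assert(blk >= 0 && blk < NBLOCKS);
	pthread_mutex_lock(&ino->data_sem);
	if (!recheck || ino->state[blk] == BLK_HOLE) {
		ino->state[blk] = BLK_DELAYED;
		ino->reserved++;
		inserted = true;
	}
	pthread_mutex_unlock(&ino->data_sem);
	return inserted;
}

/*
 * Replay the interleaving in a fixed order: unlocked lookup sees a hole,
 * the fallocate-like path allocates the block, then the delayed insert
 * runs. Returns the number of reservations made for the already
 * allocated block, i.e. the leak.
 */
static int toy_replay_race(bool recheck)
{
	struct toy_inode ino;
	int reserved;

	toy_init(&ino);
	bool saw_hole = (ino.state[3] == BLK_HOLE); /* first, unlocked lookup */
	toy_fallocate(&ino, 3);                     /* racing allocation */
	if (saw_hole)
		toy_insert_delayed(&ino, 3, recheck);
	reserved = ino.reserved;
	pthread_mutex_destroy(&ino.data_sem);
	return reserved;
}
```

Calling toy_replay_race(false) models the old behaviour and leaves one
stray reservation behind. toy_replay_race(true) models the recheck and
leaves none. The real patch additionally queries the on-disk mapping
when the extent status tree has no entry at all, which this sketch does
not attempt to model.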