Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp6114239rdb; Mon, 18 Sep 2023 04:54:40 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFBdU1TbO2rsk30BdvY3VR5S7H6+tQ6z0vAS8kbayCRe3FM5mK2VIwySjMqq9W7l3b6TrnJ X-Received: by 2002:a05:6a21:3d82:b0:14c:d0c0:dd27 with SMTP id bj2-20020a056a213d8200b0014cd0c0dd27mr6829595pzc.44.1695038079854; Mon, 18 Sep 2023 04:54:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695038079; cv=none; d=google.com; s=arc-20160816; b=o41piXgj/UohK8vVVLkC0iRVHrNDycc9NTQPTdUA6s3M7iOPEJtksqO4abXmWYTMmt 8nBDubuG5ZnohuYv9IZ7xIorPqCN/W5Azmcx72Q8BtOZmwxaUqdwwd1DLTfxDMYKz49S y3tiB+jUXcHqCeghT9eaS/NcdonyLzHvJBdXdKA1f38MLdyT/Pg7uJmkyA5LCKpDnMhM jVxbnkzjy9rXEepinV81hDFteid3AzbqroiEf9s3ieVCCwaJsVF33Ekabn8NsVx1RTC+ zJb9LLgiI51Y0YQVLzIRODOSiPDhAMoHjkaHp8E0FYeSOPJQUILdXIQyFu7ftr3uzbnU tuSw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=VDKC0IgCtueKNyujpoZMuLZLOeOtYSFtTCp33XEvMbQ=; fh=PJMzc3Hp2p3Dvzr5TvkhdjVsfQ/xiDeCtVytQiIqKjY=; b=bau0AJYOQDvhrq1uCi+fBC3aoFWuySHM+9rZEklqGzZgwxSTa0NJ2nrdQVA732rWKf UYDJz5/Un8pp8cOxHK5iP1ANw8LYDAJH2pNtxj0V3dG8Jr60xZpvSHr2qIhqyhpJeM/3 VUNO38mLPnTK8sAlcbqYNi1Swc6KODh9HiR7uluPUfg66qKS6AIunltG4yyWyiN8hE6z fOOxT7+WzlauBKEAu1CK2yF1/qWPdaDnLMknoQ2asx7n3ryqHGoQh+PE9Mt4nUePhCqx DfMmYx4j4YI/q59u2e9MBZHc6A7rQzggcAbvY/j4fiqXdZFLLsAzgUGfFvrCZZamrcfB +TdA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=ndUBAxmZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id b9-20020a056a000cc900b0068fc7ab1fd5si8161432pfv.269.2023.09.18.04.54.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Sep 2023 04:54:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=ndUBAxmZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 0D4BF807C6EA; Mon, 18 Sep 2023 03:47:44 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241169AbjIRKrC (ORCPT + 99 others); Mon, 18 Sep 2023 06:47:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43124 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241220AbjIRKqi (ORCPT ); Mon, 18 Sep 2023 06:46:38 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9031A2; Mon, 18 Sep 2023 03:46:29 -0700 (PDT) Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38IAbjLR002153; Mon, 18 Sep 2023 10:46:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=VDKC0IgCtueKNyujpoZMuLZLOeOtYSFtTCp33XEvMbQ=; b=ndUBAxmZzSSO5fJPfKvfp428YzhVQO2nlhMHzCR20zdUEty4+sD5FGBsO/Q/scBtRw1i aKHS97alJEB/kAixgMOnmLn3qg13EC/MjQi8dn7fwH6BSzR+YWHa077/LnsRyaC0mb+q CDI4oWPcTU0zeiVypFzFBNO7YHa3hrX7mksU0zbIPt4suvh/tZEKw0u+A050sSsWlZcP S/RE52wjBrpzhZh897MmhzOJ7JnShZOzP/xfSsENGHo2sSllJNL3ocZ7D1pEHy1fQhuL KINXH13aNBKQkl+jRSj0jThDd6RVKqiOJtlE6MAcUbKQemrzOWsvF71MDg54waYDX0Dj Hw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t6mqhrj49-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 18 Sep 2023 10:46:25 +0000 Received: from m0360072.ppops.net (m0360072.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38IAbt7Q003407; Mon, 18 Sep 2023 10:46:24 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3t6mqhrj44-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 18 Sep 2023 10:46:24 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38I9FfR1016455; Mon, 18 Sep 2023 10:46:23 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3t5sd1hbup-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 18 Sep 2023 10:46:23 +0000 Received: from smtpav02.fra02v.mail.ibm.com (smtpav02.fra02v.mail.ibm.com [10.20.54.101]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38IAkMtK63635934 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 18 Sep 2023 10:46:22 GMT Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1A8D42004B; Mon, 18 Sep 2023 10:46:22 +0000 (GMT) Received: from smtpav02.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0B7F920043; Mon, 18 Sep 2023 10:46:13 +0000 (GMT) Received: from li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com.com (unknown [9.43.103.37]) by smtpav02.fra02v.mail.ibm.com (Postfix) with ESMTP; Mon, 18 Sep 2023 10:46:12 +0000 (GMT) From: Ojaswin Mujoo To: linux-ext4@vger.kernel.org, "Theodore Ts'o" Cc: Ritesh Harjani , linux-kernel@vger.kernel.org, Jan Kara Subject: [PATCH v3 1/1] ext4: Mark buffer new if it is unwritten to avoid stale data exposure Date: Mon, 18 Sep 2023 16:15:50 +0530 Message-Id: X-Mailer: git-send-email 2.39.3 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: jzQaErfWEcJamkHNs8YUyaO-4GHM4Mm6 X-Proofpoint-GUID: LIezmvTrat85M1iO0KxBRGJrTWMfJ6VW X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-18_02,2023-09-18_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 clxscore=1015 impostorscore=0 suspectscore=0 malwarescore=0 phishscore=0 spamscore=0 adultscore=0 bulkscore=0 priorityscore=1501 lowpriorityscore=0 mlxlogscore=980 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309180092 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 18 Sep 2023 03:47:44 -0700 (PDT) ** Short Version ** In ext4 with dioread_nolock, we could have a scenario where the bh returned by get_blocks (ext4_get_block_unwritten()) in __block_write_begin_int() has UNWRITTEN and MAPPED flag set. Since such a bh does not have NEW flag set we never zero out the range of bh that is not under write, causing whatever stale data is present in the folio at that time to be written out to disk. To fix this mark the buffer as new, in case it is unwritten, in ext4_get_block_unwritten(). ** Long Version ** The issue mentioned above was resulting in two different bugs: 1. On block size < page size case in ext4, generic/269 was reliably failing with dioread_nolock. The state of the write was as follows: * The write was extending i_size. * The last block of the file was fallocated and had an unwritten extent * We were near ENOSPC and hence we were switching to non-delayed alloc allocation. In this case, the back trace that triggers the bug is as follows: ext4_da_write_begin() /* switch to nodelalloc due to low space */ ext4_write_begin() ext4_should_dioread_nolock() // true since mount flags still have delalloc __block_write_begin(..., ext4_get_block_unwritten) __block_write_begin_int() for(each buffer head in page) { /* first iteration, this is bh1 which contains i_size */ if (!buffer_mapped) get_block() /* returns bh with only UNWRITTEN and MAPPED */ /* second iteration, bh2 */ if (!buffer_mapped) get_block() /* we fail here, could be ENOSPC */ } if (err) /* * this would zero out all new buffers and mark them uptodate. * Since bh1 was never marked new, we skip it here which causes * the bug later. */ folio_zero_new_buffers(); /* ext4_wrte_begin() error handling */ ext4_truncate_failed_write() ext4_truncate() ext4_block_truncate_page() __ext4_block_zero_page_range() if(!buffer_uptodate()) ext4_read_bh_lock() ext4_read_bh() -> ... ext4_submit_bh_wbc() BUG_ON(buffer_unwritten(bh)); /* !!! */ 2. The second issue is stale data exposure with page size >= blocksize with dioread_nolock. The conditions needed for it to happen are same as the previous issue ie dioread_nolock around ENOSPC condition. The issue is also similar where in __block_write_begin_int() when we call ext4_get_block_unwritten() on the buffer_head and the underlying extent is unwritten, we get an unwritten and mapped buffer head. Since it is not new, we never zero out the partial range which is not under write, thus writing stale data to disk. This can be easily observed with the following reproducer: fallocate -l 4k testfile xfs_io -c "pwrite 2k 2k" testfile # hexdump output will have stale data in from byte 0 to 2k in testfile hexdump -C testfile NOTE: To trigger this, we need dioread_nolock enabled and write happening via ext4_write_begin(), which is usually used when we have -o nodealloc. Since dioread_nolock is disabled with nodelalloc, the only alternate way to call ext4_write_begin() is to ensure that delayed alloc switches to nodelalloc ie ext4_da_write_begin() calls ext4_write_begin(). This will usually happen when ext4 is almost full like the way generic/269 was triggering it in Issue 1 above. This might make the issue harder to hit. Hence, for reliable replication, I used the below patch to temporarily allow dioread_nolock with nodelalloc and then mount the disk with -o nodealloc,dioread_nolock. With this you can hit the stale data issue 100% of times: @@ -508,8 +508,8 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) if (ext4_should_journal_data(inode)) return 0; /* temporary fix to prevent generic/422 test failures */ - if (!test_opt(inode->i_sb, DELALLOC)) - return 0; + // if (!test_opt(inode->i_sb, DELALLOC)) + // return 0; return 1; } After applying this patch to mark buffer as NEW, both the above issues are fixed. Signed-off-by: Ojaswin Mujoo Reviewed-by: Jan Kara --- fs/ext4/inode.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 6c490f05e2ba..8b286a800193 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -789,10 +789,22 @@ int ext4_get_block(struct inode *inode, sector_t iblock, int ext4_get_block_unwritten(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { + int ret = 0; + ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n", inode->i_ino, create); - return _ext4_get_block(inode, iblock, bh_result, + ret = _ext4_get_block(inode, iblock, bh_result, EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); + + /* + * If the buffer is marked unwritten, mark it as new to make sure it is + * zeroed out correctly in case of partial writes. Otherwise, there is + * a chance of stale data getting exposed. + */ + if (ret == 0 && buffer_unwritten(bh_result)) + set_buffer_new(bh_result); + + return ret; } /* Maximum number of blocks we map for direct IO at once. */ -- 2.39.3