Received: by 2002:a05:6358:700f:b0:131:369:b2a3 with SMTP id 15csp557432rwo; Tue, 1 Aug 2023 23:47:09 -0700 (PDT) X-Google-Smtp-Source: APBJJlFkfXxZwPkOzyFzlWckvrE5mrNZWpdJ0xB0cB0E9nTBtN84H+lLbnxreYpcVjejb3U98Z7g X-Received: by 2002:a17:906:c10e:b0:98e:3dac:6260 with SMTP id do14-20020a170906c10e00b0098e3dac6260mr5986020ejc.13.1690958829038; Tue, 01 Aug 2023 23:47:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690958829; cv=none; d=google.com; s=arc-20160816; b=JJRDjDwkSnf2Z3X1Ig5kGxavE1gF9zihvEUQg5D4m3ImqA4TVZNt33/0NHHRmewHJO QLeUbgAEYxqaPZtlW6IQ25OSAbTPvkjp0AN1qDV8BRggXerylf8lF68B2mNAVP9WT6up 1Xayiemnjw5uGJytJTsVa9+D7qRu3alSZvP/tfpYN/fh52pfRXmZtUl4kkRBNhVWie0A QrZiRAOT7uDjw8u6qDddeK44JwD+DAvXThhBnqZiTlbO6JxJZJ5svYy/O3ZY95bkUm0r ANC5yqOCEXv5bNRjwLUZyXR87foy1ZFyGlD6nA89Q460UgvDtzFZzZYPmxTRTTL3LJns F3uw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:references:to:cc:in-reply-to:date:subject :mime-version:message-id:from:dkim-signature; bh=89yQo6JvhtkY+bxBvF10uAWWy7qU/ZemUlqRNtERQBU=; fh=PmGbQwY7ZW+0v7eyBefY01C2lFpgBlV3eEbp24/G/dM=; b=Jam7StMxwWKAY1R8CbHNVaXB/u6v0lY/DKEpUVLzrFOvtLsXY0r6bRPd/lChsdCwOj eokUQVuQwCK+gxXEgsqYTEl7m3vDTU/990+iWhQo0I3Q4VpVMwmJwDA2SSa8V/xf34Q3 JM7CQD5R4kdA/T5OgOWcG2oBFQvME3YgIckgGmxeuZ0Y1Iikvcdg//c0XQhhL099MNSg fDvDYrwhbi2Nu8d/MR3nTaPFgpUZeeAiZIORCcSyDUmM58l+NcDtyY8G69cxoRVhe6RJ VDoC7uxIM9HAamRwH4/8DjEprheIYW59gjU0Y7atyuktteyRpwz9GPlRsREygvI7Uc9d 8sdA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@dilger-ca.20221208.gappssmtp.com header.s=20221208 header.b=atPdeHat; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u21-20020a170906069500b00992ca650b77si10215635ejb.416.2023.08.01.23.46.44; Tue, 01 Aug 2023 23:47:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@dilger-ca.20221208.gappssmtp.com header.s=20221208 header.b=atPdeHat; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231801AbjHBGHN (ORCPT + 99 others); Wed, 2 Aug 2023 02:07:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230155AbjHBGHL (ORCPT ); Wed, 2 Aug 2023 02:07:11 -0400 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FDA21707 for ; Tue, 1 Aug 2023 23:07:09 -0700 (PDT) Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-686efb9ee0cso6232727b3a.3 for ; Tue, 01 Aug 2023 23:07:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dilger-ca.20221208.gappssmtp.com; s=20221208; t=1690956429; x=1691561229; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=89yQo6JvhtkY+bxBvF10uAWWy7qU/ZemUlqRNtERQBU=; b=atPdeHatzfegDqTLuTNLAe2U/ht7BeGUbDRdU9BtQ1s0XKN7LGlnsHpQZ8JI+FxSll o202Wd5iR6QKo2YIimR5P108FnQQ0vCd86t4pYA6c4J7Ziwf7KQ7otdmsFbvIb+EMUls +I/BP0L3xBALA2oQ/ENYYYtH1o5LeC87jsTYuUuRoXMrvqdeT4PXz7R7t8aj+aDNv2eQ 83I9tjx4B5eVSLWhb7F07mN0lXSgzlXDAp7Fy66St9cZaj53tADeO6dh2TB1sDlmCHyz WOBqKmtX6Rpl6mxPDwfi9YG8Y7CSd+aeA0hD/mPTYmW7RR8170nmybcUUFpBOYy0tOhm VcbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690956429; x=1691561229; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=89yQo6JvhtkY+bxBvF10uAWWy7qU/ZemUlqRNtERQBU=; b=SnBE7zJERZSpkxMcg4sVWsHv17EATwGi5l67MqFErarCdSKODwx8fYVlxEEi9CQjPc Whl8uSzbuGVFQifnUX2LNQs5570cdl/svaJhlV+sv+bmZ0r3LQsnAQsojjKqA4YVsLPF pJrF3l93x2jGpkV+0mn1J8O7gl6xcqSOa2gKVwBqsef3TXhdINQW0kc5DbOO9liRODnJ opQvcWPQB4cqJlm8rDXMYtQ9+oFLhWzk9iMKw/TylhFsfNypsrzN/tQx7DF4jmU5/36m BY8bDSSM/+LyGxwIlJ8ij7m7Dmtqi4C3wt+Egq6CkTRQtEACCG7WZ5g+GeUFGEt3W0zN e4zA== X-Gm-Message-State: ABy/qLZCctgJcIHnGPLfQd9GLMJH7gBWFouv3WeMAOyw/mlZ9rnjgg/K eHOvmcJl6gJ6CoI/LPEZ9KjKrQ== X-Received: by 2002:a05:6a00:2316:b0:687:535f:6712 with SMTP id h22-20020a056a00231600b00687535f6712mr4525816pfh.14.1690956428586; Tue, 01 Aug 2023 23:07:08 -0700 (PDT) Received: from cabot.adilger.int (S01061cabc081bf83.cg.shawcable.net. [70.77.221.9]) by smtp.gmail.com with ESMTPSA id k5-20020aa78205000000b00682a0184742sm10523132pfi.148.2023.08.01.23.07.07 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Aug 2023 23:07:07 -0700 (PDT) From: Andreas Dilger Message-Id: Content-Type: multipart/signed; boundary="Apple-Mail=_7DC4E546-4390-4370-AF1A-053B5A4A010E"; protocol="application/pgp-signature"; micalg=pgp-sha256 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH v3] ext4: Fix rec_len verify error Date: Wed, 2 Aug 2023 00:07:04 -0600 In-Reply-To: <20230801151828.GB11332@frogsfrogsfrogs> Cc: zhangshida , Theodore Ts'o , Zhang Yi , Ext4 Developers List , Linux Kernel Mailing List , zhangshida@kylinos.cn, stable@kernel.org To: "Darrick J. Wong" References: <20230801112337.1856215-1-zhangshida@kylinos.cn> <20230801151828.GB11332@frogsfrogsfrogs> X-Mailer: Apple Mail (2.3273) X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --Apple-Mail=_7DC4E546-4390-4370-AF1A-053B5A4A010E Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On Aug 1, 2023, at 9:18 AM, Darrick J. Wong wrote: >=20 > On Tue, Aug 01, 2023 at 07:23:37PM +0800, zhangshida wrote: >> From: Shida Zhang >>=20 >> With the configuration PAGE_SIZE 64k and filesystem blocksize 64k, >> a problem occurred when more than 13 million files were directly = created >> under a directory: >>=20 >> EXT4-fs error (device xx): ext4_dx_csum_set:492: inode #xxxx: comm = xxxxx: dir seems corrupt? Run e2fsck -D. >> EXT4-fs error (device xx): ext4_dx_csum_verify:463: inode #xxxx: comm = xxxxx: dir seems corrupt? Run e2fsck -D. >> EXT4-fs error (device xx): dx_probe:856: inode #xxxx: block 8188: = comm xxxxx: Directory index failed checksum >>=20 >> When enough files are created, the fake_dirent->reclen will be = 0xffff. >> it doesn't equal to the blocksize 65536, i.e. 0x10000. >>=20 >> But it is not the same condition when blocksize equals to 4k. >> when enough files are created, the fake_dirent->reclen will be = 0x1000. >> it equals to the blocksize 4k, i.e. 0x1000. >>=20 >> The problem seems to be related to the limitation of the 16-bit field >> when the blocksize is set to 64k. >> To address this, helpers like ext4_rec_len_{from,to}_disk has already >> been introduce to complete the conversion between the encoded and the >> plain form of rec_len. >>=20 >> So fix this one by using the helper, and all the other >> le16_to_cpu(ext4_dir_entry{,_2}.rec_len) accesses in this file too. >>=20 >> Cc: stable@kernel.org >> Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree = nodes") >> Suggested-by: Andreas Dilger >> Suggested-by: Darrick J. Wong >> Signed-off-by: Shida Zhang >> --- >> v1->v2: >> Use the existing helper to covert the rec_len, as suggested by = Andreas. >> v2->v3: >> 1,Covert all the other rec_len if necessary, as suggested by Darrick. >> 2,Rephrase the commit message. >>=20 >> fs/ext4/namei.c | 16 ++++++++-------- >> 1 file changed, 8 insertions(+), 8 deletions(-) >>=20 >> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c >> index 0caf6c730ce3..8cb377b8ad86 100644 >> --- a/fs/ext4/namei.c >> +++ b/fs/ext4/namei.c >> @@ -346,14 +346,14 @@ static struct ext4_dir_entry_tail = *get_dirent_tail(struct inode *inode, >>=20 >> #ifdef PARANOID >> struct ext4_dir_entry *d, *top; >> + int blocksize =3D EXT4_BLOCK_SIZE(inode->i_sb); >>=20 >> d =3D (struct ext4_dir_entry *)bh->b_data; >> top =3D (struct ext4_dir_entry *)(bh->b_data + >> - (EXT4_BLOCK_SIZE(inode->i_sb) - >> - sizeof(struct ext4_dir_entry_tail))); >> - while (d < top && d->rec_len) >> + (blocksize - sizeof(struct ext4_dir_entry_tail))); >> + while (d < top && ext4_rec_len_from_disk(d->rec_len, blocksize)) >> d =3D (struct ext4_dir_entry *)(((void *)d) + >> - le16_to_cpu(d->rec_len)); >> + ext4_rec_len_from_disk(d->rec_len, blocksize)); >>=20 >> if (d !=3D top) >> return NULL; >=20 > This is sitll missing some pieces; what about this clause at line 367: >=20 > if (t->det_reserved_zero1 || > le16_to_cpu(t->det_rec_len) !=3D sizeof(struct = ext4_dir_entry_tail) || > t->det_reserved_zero2 || > t->det_reserved_ft !=3D EXT4_FT_DIR_CSUM) > return NULL; >=20 >> @@ -445,13 +445,13 @@ static struct dx_countlimit = *get_dx_countlimit(struct inode *inode, >> struct ext4_dir_entry *dp; >> struct dx_root_info *root; >> int count_offset; >> + int blocksize =3D EXT4_BLOCK_SIZE(inode->i_sb); >>=20 >> - if (le16_to_cpu(dirent->rec_len) =3D=3D = EXT4_BLOCK_SIZE(inode->i_sb)) >> + if (ext4_rec_len_from_disk(dirent->rec_len, blocksize) =3D=3D = blocksize) >> count_offset =3D 8; >> - else if (le16_to_cpu(dirent->rec_len) =3D=3D 12) { >> + else if (ext4_rec_len_from_disk(dirent->rec_len, blocksize) =3D=3D= 12) { >=20 > Why not lift this ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ to = a > local variable? @dirent doesn't change, right? >=20 >> dp =3D (struct ext4_dir_entry *)(((void *)dirent) + 12); >> - if (le16_to_cpu(dp->rec_len) !=3D >> - EXT4_BLOCK_SIZE(inode->i_sb) - 12) >> + if (ext4_rec_len_from_disk(dp->rec_len, blocksize) !=3D = blocksize - 12) >> return NULL; >> root =3D (struct dx_root_info *)(((void *)dp + 12)); >> if (root->reserved_zero || >=20 > What about dx_make_map? >=20 > Here's all the opencoded access I could find: >=20 > $ git grep le16.*rec_len fs/ext4/ > fs/ext4/namei.c:356: le16_to_cpu(d->rec_len)); > fs/ext4/namei.c:367: le16_to_cpu(t->det_rec_len) !=3D = sizeof(struct ext4_dir_entry_tail) || > fs/ext4/namei.c:449: if (le16_to_cpu(dirent->rec_len) =3D=3D = EXT4_BLOCK_SIZE(inode->i_sb)) > fs/ext4/namei.c:451: else if (le16_to_cpu(dirent->rec_len) =3D=3D = 12) { > fs/ext4/namei.c:453: if (le16_to_cpu(dp->rec_len) !=3D > fs/ext4/namei.c:1338: map_tail->size =3D = le16_to_cpu(de->rec_len); Not all of these cases are actual bugs. The ext4_rec_len_from_disk() function is only different for rec_len >=3D 2^16, so if it is comparing rec_len against "12" or "sizeof(struct ...)" then the inequality will be correct regardless of how it is decoded. That said, it makes sense to use ext4_rec_len_from_disk() to access rec_len consistently throughout the code, since that avoids potential bugs in the future. We know the code will eventually will be copied some place where rec_len >=3D 2^16 is actually important, and we may as well avoid that bug before it happens. One thing this discussion *does* expose is that ext4_rec_len_from_disk() is hard-coded at compile time to differentiate between PAGE_SIZE > 64k and PAGE_SIZE =3D 4K, because it was never possible to have blocksize > PAGE_SIZE, so only ARM/PPC ever had filesystems with blocksize=3D64KiB (and the Fujitsu Fugaku SPARC system with blocksize=3D256KiB). However, with the recent advent of the VM and IO layers allowing blocksize > PAGE_SIZE this function will need to be changed to allow the same on x86 PAGE_SIZE=3D4KiB systems. Instead of checking #if PAGE_SIZE >=3D 65536 it should handle this based on the filesystem blocksize at runtime: static inline unsigned int ext4_rec_len_from_disk(__le16 dlen, unsigned blocksize) { unsigned len =3D le16_to_cpu(dlen); if (blocksize < 65536) return len; if (len =3D=3D EXT4_MAX_REC_LEN || len =3D=3D 0) return blocksize; return (len & 65532) | ((len & 3) << 16); } Strictly speaking, ((len & 65532) | ((len & 3) << 16) should equal "len" for any filesystem with blocksize < 65536, but IMHO it is more clear if the code is written this way. Similarly, the encoding needs to be changed to handle large records at runtime for when we eventually allow ext4 with blocksize > PAGE_SIZE. static inline __le16 ext4_rec_len_to_disk(unsigned len, unsigned = blocksize) { BUG_ON(len > blocksize); BUG_ON(blocksize > (1 << 18)); BUG_ON(len & 3); if (len < 65536) /* always true for blocksize < 65536 */ return cpu_to_le16(len); if (len =3D=3D blocksize) { if (blocksize =3D=3D 65536) return cpu_to_le16(EXT4_MAX_REC_LEN); return cpu_to_le16(0); } return cpu_to_le16((len & 65532) | ((len >> 16) & 3)); } Cheers, Andreas --Apple-Mail=_7DC4E546-4390-4370-AF1A-053B5A4A010E Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Comment: GPGTools - http://gpgtools.org iQIzBAEBCAAdFiEEDb73u6ZejP5ZMprvcqXauRfMH+AFAmTJ8ogACgkQcqXauRfM H+CaKA//ajPC+6psxaK1xYxWe6I3WkSIJFKZqJ9wrvxvSZ3IhySoEBBgoQHK4AgB 71TDe4ijq8S8BQ33BHpxTStuIjfoUKz1qbKRvnE5+YHb1oi9w/YwZMnS3II6DxU1 UJCRu4rgl3ZEJmgNDXE9z4Lan8ynUMc/a1wowHSGEoBFrD2D1qzjnpPPfZdldq/r EV+iBOFNrAu/omzqW7SywaNU67hh8RsJDUBAoL6+JabWnRgRp81NPjccsJnTgGod J4b2JQHvISv5f7yeKN6KFBs7lg0k2N8YnAoxamTku3pOSQwvhFHYq7A105ydhDg2 KxyeF3daNeFcfCsXFzaDpsoFOHXL0qRxUBLV333g+pd87j0QPTFc3Xzg41LCkggh rSB6m6J2KMOTHA3Dc+RmaWk/hFPPjqHeuGyXgNO0w9EfuREFDtYBZaNqjChiwgQR rWsjQRcjfZT5UjKEDTJntShTkQRa1PHyVBz0EhqEsUz7TP7Fd/tNauOcAu4q+9hO qp/8buY9USPIgx/sou7d/YdLBzvrONYQNSf5lFb/3k7gWG6Qb5UjAV5zwv3Qf/uP MoX9R0DU4NHbN8uXGkrMWBdAbxfnP5a4bdQ+2H+ps0OSSBMTSt1Ag6UPWV7QGTjv oyLKQmgId2qtAa0OiTITkTBv1TgppKhZYoadYe171vxJaQ40OWk= =HuQO -----END PGP SIGNATURE----- --Apple-Mail=_7DC4E546-4390-4370-AF1A-053B5A4A010E--