2024-05-29 08:30:40

by Eugen Hristev

Subject: [PATCH v17 0/7] Case insensitive cleanup for ext4/f2fs

Hello,

I am trying to respin the series posted here:
https://www.spinics.net/lists/linux-ext4/msg85081.html

I resent some of the v9 patches and got some reviews from Gabriel.
I made the requested changes, and here is v17.

Changes in v17:
- in patch 2/7, the case-insensitive match helper, adjusted the logic and the
memcmp params slightly and made it return errors properly; also removed the
patches for logging errors, as the message is now included in the helper itself.

Changes in v16:
- rewrote patch 2/9 without `match`
- changed generic_ci_match to return the value coming from the utf8 compare
only in strict mode.
- changed f2fs_warn to *_ratelimited in 7/9
- removed the declaration of f2fs_cf_name_slab in recovery.c as it's no longer
needed.

Changes in v15:
- fix wrong check `ret<0` in 7/9
- fix memleak reintroduced in 8/9

Changes in v14:
- fix wrong kfree unchecked call
- changed the return code in 3/8

Changes in v13:
- removed stray wrong line in 2/8
- removed old R-b tags, as too much time has passed since they were given
- removed check for null buff in 2/8
- added new patch `f2fs: Log error when lookup of encoded dentry fails` as suggested
- rebased on unicode.git for-next branch

Changes in v12:
- reverted to the v10 comparison, propagating the error code from the utf8 comparison

Changes in v11:
- revert to the original v9 implementation for the comparison helper.

Changes in v10:
- reworked the comparison helper a bit to improve performance by
performing the exact lookup first.


* Original commit letter

The case-insensitive implementations in f2fs and ext4 have quite a bit
of duplicated code. This series simplifies the ext4 version, with the
goal of extracting ext4_ci_compare into a helper library that can be
used by both filesystems. It also reduces the clutter from many
codeguards for CONFIG_UNICODE; as requested by Linus, they are part of
the codeflow now.

While there, I noticed we can leverage the utf8 functions to detect
encoded names that are corrupted in the filesystem. Therefore, it also
adds an ext4 error in that scenario, to mark the filesystem as
corrupted.
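
For reference, the ext4_ci_compare() helper that patch 4/7 of this series
removes resolves an invalid character sequence either as an error (strict
encoding) or as an opaque byte sequence. The sketch below is a userspace
model of that decision, not kernel code: resolve_ci_compare() and its
parameters are hypothetical names, and cmp_ret stands in for the result of
utf8_strncasecmp(). It uses the old convention (0 means match).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the fallback in the removed
 * ext4_ci_compare().
 *
 * cmp_ret: result of the casefolded comparison
 *          (0 = equal, > 0 = different, < 0 = invalid sequence).
 * strict:  models sb_has_strict_encoding(sb).
 *
 * Returns 0 on match, 1 on mismatch, -EINVAL on an invalid sequence
 * in strict mode.
 */
static int resolve_ci_compare(int cmp_ret, bool strict,
			      const char *name, size_t name_len,
			      const char *entry, size_t entry_len)
{
	/* A valid comparison result is passed through unchanged. */
	if (cmp_ret >= 0)
		return cmp_ret;

	/* Strict mode: an invalid sequence is an error. */
	if (strict)
		return -EINVAL;

	/* Otherwise compare the names as opaque byte sequences. */
	if (name_len != entry_len)
		return 1;
	return !!memcmp(name, entry, entry_len);
}
```

In non-strict mode, two byte-identical names that are not valid UTF-8 still
match; in strict mode the same pair yields -EINVAL.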

This series survived passes of xfstests -g quick.

Gabriel Krisman Bertazi (7):
  ext4: Simplify the handling of cached casefolded names
  f2fs: Simplify the handling of cached casefolded names
  libfs: Introduce case-insensitive string comparison helper
  ext4: Reuse generic_ci_match for ci comparisons
  f2fs: Reuse generic_ci_match for ci comparisons
  ext4: Move CONFIG_UNICODE defguards into the code flow
  f2fs: Move CONFIG_UNICODE defguards into the code flow

 fs/ext4/crypto.c   |  10 +---
 fs/ext4/ext4.h     |  35 ++++++++-----
 fs/ext4/namei.c    | 126 +++++++++++++++------------------------------
 fs/ext4/super.c    |   4 +-
 fs/f2fs/dir.c      | 105 +++++++++++--------------------------
 fs/f2fs/f2fs.h     |  16 +++++-
 fs/f2fs/namei.c    |  10 ++--
 fs/f2fs/recovery.c |   9 +---
 fs/f2fs/super.c    |   8 +--
 fs/libfs.c         |  74 ++++++++++++++++++++++++++
 include/linux/fs.h |   4 ++
 11 files changed, 200 insertions(+), 201 deletions(-)

--
2.34.1



2024-05-29 08:30:41

by Eugen Hristev

Subject: [PATCH v17 4/7] ext4: Reuse generic_ci_match for ci comparisons

From: Gabriel Krisman Bertazi <[email protected]>

Instead of reimplementing ext4_match_ci, use the new libfs helper.

It also adds a comment explaining why fname->cf_name.name must be
checked prior to the encryption hash optimization, because that tripped
me before.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
---
fs/ext4/namei.c | 91 +++++++++++++++----------------------------------
1 file changed, 27 insertions(+), 64 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index ec4c9bfc1057..20668741a23c 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1390,58 +1390,6 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
 }

 #if IS_ENABLED(CONFIG_UNICODE)
-/*
- * Test whether a case-insensitive directory entry matches the filename
- * being searched for. If quick is set, assume the name being looked up
- * is already in the casefolded form.
- *
- * Returns: 0 if the directory entry matches, more than 0 if it
- * doesn't match or less than zero on error.
- */
-static int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
-			   u8 *de_name, size_t de_name_len, bool quick)
-{
-	const struct super_block *sb = parent->i_sb;
-	const struct unicode_map *um = sb->s_encoding;
-	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
-	struct qstr entry = QSTR_INIT(de_name, de_name_len);
-	int ret;
-
-	if (IS_ENCRYPTED(parent)) {
-		const struct fscrypt_str encrypted_name =
-				FSTR_INIT(de_name, de_name_len);
-
-		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
-		if (!decrypted_name.name)
-			return -ENOMEM;
-		ret = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
-						&decrypted_name);
-		if (ret < 0)
-			goto out;
-		entry.name = decrypted_name.name;
-		entry.len = decrypted_name.len;
-	}
-
-	if (quick)
-		ret = utf8_strncasecmp_folded(um, name, &entry);
-	else
-		ret = utf8_strncasecmp(um, name, &entry);
-	if (ret < 0) {
-		/* Handle invalid character sequence as either an error
-		 * or as an opaque byte sequence.
-		 */
-		if (sb_has_strict_encoding(sb))
-			ret = -EINVAL;
-		else if (name->len != entry.len)
-			ret = 1;
-		else
-			ret = !!memcmp(name->name, entry.name, entry.len);
-	}
-out:
-	kfree(decrypted_name.name);
-	return ret;
-}
-
 int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
 				 struct ext4_filename *name)
 {
@@ -1503,20 +1451,35 @@ static bool ext4_match(struct inode *parent,
 #if IS_ENABLED(CONFIG_UNICODE)
 	if (IS_CASEFOLDED(parent) &&
 	    (!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
-		if (fname->cf_name.name) {
-			if (IS_ENCRYPTED(parent)) {
-				if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
-					fname->hinfo.minor_hash !=
-						EXT4_DIRENT_MINOR_HASH(de)) {
+		int ret;

-					return false;
-				}
-			}
-			return !ext4_ci_compare(parent, &fname->cf_name,
-						de->name, de->name_len, true);
+		/*
+		 * Just checking IS_ENCRYPTED(parent) below is not
+		 * sufficient to decide whether one can use the hash for
+		 * skipping the string comparison, because the key might
+		 * have been added right after
+		 * ext4_fname_setup_ci_filename(). In this case, a hash
+		 * mismatch will be a false negative. Therefore, make
+		 * sure cf_name was properly initialized before
+		 * considering the calculated hash.
+		 */
+		if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
+		    (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
+		     fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
+			return false;
+
+		ret = generic_ci_match(parent, fname->usr_fname,
+				       &fname->cf_name, de->name,
+				       de->name_len);
+		if (ret < 0) {
+			/*
+			 * Treat comparison errors as not a match. The
+			 * only case where it happens is on a disk
+			 * corruption or ENOMEM.
+			 */
+			return false;
 		}
-		return !ext4_ci_compare(parent, fname->usr_fname, de->name,
-					de->name_len, false);
+		return ret;
 	}
 #endif

--
2.34.1
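
The hash short-circuit added to ext4_match() in the hunk above can be read
as a single predicate. The following is a minimal userspace sketch of that
condition only; struct hash_pair and can_skip_compare() are hypothetical
names introduced for illustration, and the two booleans stand in for
IS_ENCRYPTED(parent) and fname->cf_name.name being set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the (hash, minor_hash) pair of a name. */
struct hash_pair {
	uint32_t hash;
	uint32_t minor_hash;
};

/*
 * Returns true when the string comparison can be skipped and the entry
 * rejected outright. A hash mismatch is only trusted when the directory
 * is encrypted AND the casefolded name was initialized; otherwise the
 * lookup hash may be stale and a mismatch would be a false negative.
 */
static bool can_skip_compare(bool encrypted, bool cf_name_set,
			     struct hash_pair lookup, struct hash_pair dirent)
{
	return encrypted && cf_name_set &&
	       (lookup.hash != dirent.hash ||
		lookup.minor_hash != dirent.minor_hash);
}
```

When the predicate is false, the entry is not accepted yet; the code falls
through to the full case-insensitive comparison.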


2024-06-04 19:18:01

by Gabriel Krisman Bertazi

Subject: Re: [PATCH v17 4/7] ext4: Reuse generic_ci_match for ci comparisons

Eugen Hristev <[email protected]> writes:

> From: Gabriel Krisman Bertazi <[email protected]>
>
> Instead of reimplementing ext4_match_ci, use the new libfs helper.
>
> It also adds a comment explaining why fname->cf_name.name must be
> checked prior to the encryption hash optimization, because that tripped
> me before.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> Signed-off-by: Eugen Hristev <[email protected]>
> Reviewed-by: Eric Biggers <[email protected]>
> ---
> fs/ext4/namei.c | 91 +++++++++++++++----------------------------------
> 1 file changed, 27 insertions(+), 64 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index ec4c9bfc1057..20668741a23c 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1390,58 +1390,6 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
> }
>
> #if IS_ENABLED(CONFIG_UNICODE)
> -/*
> - * Test whether a case-insensitive directory entry matches the filename
> - * being searched for. If quick is set, assume the name being looked up
> - * is already in the casefolded form.
> - *
> - * Returns: 0 if the directory entry matches, more than 0 if it
> - * doesn't match or less than zero on error.
> - */
> -static int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
> - u8 *de_name, size_t de_name_len, bool quick)
> -{
> - const struct super_block *sb = parent->i_sb;
> - const struct unicode_map *um = sb->s_encoding;
> - struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> - struct qstr entry = QSTR_INIT(de_name, de_name_len);
> - int ret;
> -
> - if (IS_ENCRYPTED(parent)) {
> - const struct fscrypt_str encrypted_name =
> - FSTR_INIT(de_name, de_name_len);
> -
> - decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> - if (!decrypted_name.name)
> - return -ENOMEM;
> - ret = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
> - &decrypted_name);
> - if (ret < 0)
> - goto out;
> - entry.name = decrypted_name.name;
> - entry.len = decrypted_name.len;
> - }
> -
> - if (quick)
> - ret = utf8_strncasecmp_folded(um, name, &entry);
> - else
> - ret = utf8_strncasecmp(um, name, &entry);
> - if (ret < 0) {
> - /* Handle invalid character sequence as either an error
> - * or as an opaque byte sequence.
> - */
> - if (sb_has_strict_encoding(sb))
> - ret = -EINVAL;
> - else if (name->len != entry.len)
> - ret = 1;
> - else
> - ret = !!memcmp(name->name, entry.name, entry.len);
> - }
> -out:
> - kfree(decrypted_name.name);
> - return ret;
> -}
> -
> int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
> struct ext4_filename *name)
> {
> @@ -1503,20 +1451,35 @@ static bool ext4_match(struct inode *parent,
> #if IS_ENABLED(CONFIG_UNICODE)
> if (IS_CASEFOLDED(parent) &&
> (!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
> - if (fname->cf_name.name) {
> - if (IS_ENCRYPTED(parent)) {
> - if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
> - fname->hinfo.minor_hash !=
> - EXT4_DIRENT_MINOR_HASH(de)) {
> + int ret;
>
> - return false;
> - }
> - }
> - return !ext4_ci_compare(parent, &fname->cf_name,
> - de->name, de->name_len, true);
> + /*
> + * Just checking IS_ENCRYPTED(parent) below is not
> + * sufficient to decide whether one can use the hash for
> + * skipping the string comparison, because the key might
> + * have been added right after
> + * ext4_fname_setup_ci_filename(). In this case, a hash
> + * mismatch will be a false negative. Therefore, make
> + * sure cf_name was properly initialized before
> + * considering the calculated hash.
> + */
> + if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
> + (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
> + fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
> + return false;
> +
> + ret = generic_ci_match(parent, fname->usr_fname,
> + &fname->cf_name, de->name,
> + de->name_len);
> + if (ret < 0) {
> + /*
> + * Treat comparison errors as not a match. The
> + * only case where it happens is on a disk
> + * corruption or ENOMEM.
> + */
> + return false;
> }
> - return !ext4_ci_compare(parent, fname->usr_fname, de->name,
> - de->name_len, false);

With the changes to patch 3 in this iteration, this could become:

	/*
	 * Treat comparison errors as not a match. The
	 * only case where it happens is disk corruption
	 * or ENOMEM.
	 */
	return ext4_ci_compare(parent, fname->usr_fname, de->name,
			       de->name_len, false) > 0;

--
Gabriel Krisman Bertazi

2024-06-05 10:49:16

by Eugen Hristev

Subject: Re: [PATCH v17 4/7] ext4: Reuse generic_ci_match for ci comparisons

On 6/4/24 22:17, Gabriel Krisman Bertazi wrote:
> Eugen Hristev <[email protected]> writes:
>
>> From: Gabriel Krisman Bertazi <[email protected]>
>>
>> Instead of reimplementing ext4_match_ci, use the new libfs helper.
>>
>> It also adds a comment explaining why fname->cf_name.name must be
>> checked prior to the encryption hash optimization, because that tripped
>> me before.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>> Signed-off-by: Eugen Hristev <[email protected]>
>> Reviewed-by: Eric Biggers <[email protected]>
>> ---
>> fs/ext4/namei.c | 91 +++++++++++++++----------------------------------
>> 1 file changed, 27 insertions(+), 64 deletions(-)
>>
>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
>> index ec4c9bfc1057..20668741a23c 100644
>> --- a/fs/ext4/namei.c
>> +++ b/fs/ext4/namei.c
>> @@ -1390,58 +1390,6 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
>> }
>>
>> #if IS_ENABLED(CONFIG_UNICODE)
>> -/*
>> - * Test whether a case-insensitive directory entry matches the filename
>> - * being searched for. If quick is set, assume the name being looked up
>> - * is already in the casefolded form.
>> - *
>> - * Returns: 0 if the directory entry matches, more than 0 if it
>> - * doesn't match or less than zero on error.
>> - */
>> -static int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
>> - u8 *de_name, size_t de_name_len, bool quick)
>> -{
>> - const struct super_block *sb = parent->i_sb;
>> - const struct unicode_map *um = sb->s_encoding;
>> - struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
>> - struct qstr entry = QSTR_INIT(de_name, de_name_len);
>> - int ret;
>> -
>> - if (IS_ENCRYPTED(parent)) {
>> - const struct fscrypt_str encrypted_name =
>> - FSTR_INIT(de_name, de_name_len);
>> -
>> - decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>> - if (!decrypted_name.name)
>> - return -ENOMEM;
>> - ret = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>> - &decrypted_name);
>> - if (ret < 0)
>> - goto out;
>> - entry.name = decrypted_name.name;
>> - entry.len = decrypted_name.len;
>> - }
>> -
>> - if (quick)
>> - ret = utf8_strncasecmp_folded(um, name, &entry);
>> - else
>> - ret = utf8_strncasecmp(um, name, &entry);
>> - if (ret < 0) {
>> - /* Handle invalid character sequence as either an error
>> - * or as an opaque byte sequence.
>> - */
>> - if (sb_has_strict_encoding(sb))
>> - ret = -EINVAL;
>> - else if (name->len != entry.len)
>> - ret = 1;
>> - else
>> - ret = !!memcmp(name->name, entry.name, entry.len);
>> - }
>> -out:
>> - kfree(decrypted_name.name);
>> - return ret;
>> -}
>> -
>> int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
>> struct ext4_filename *name)
>> {
>> @@ -1503,20 +1451,35 @@ static bool ext4_match(struct inode *parent,
>> #if IS_ENABLED(CONFIG_UNICODE)
>> if (IS_CASEFOLDED(parent) &&
>> (!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
>> - if (fname->cf_name.name) {
>> - if (IS_ENCRYPTED(parent)) {
>> - if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
>> - fname->hinfo.minor_hash !=
>> - EXT4_DIRENT_MINOR_HASH(de)) {
>> + int ret;
>>
>> - return false;
>> - }
>> - }
>> - return !ext4_ci_compare(parent, &fname->cf_name,
>> - de->name, de->name_len, true);
>> + /*
>> + * Just checking IS_ENCRYPTED(parent) below is not
>> + * sufficient to decide whether one can use the hash for
>> + * skipping the string comparison, because the key might
>> + * have been added right after
>> + * ext4_fname_setup_ci_filename(). In this case, a hash
>> + * mismatch will be a false negative. Therefore, make
>> + * sure cf_name was properly initialized before
>> + * considering the calculated hash.
>> + */
>> + if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
>> + (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
>> + fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
>> + return false;
>> +
>> + ret = generic_ci_match(parent, fname->usr_fname,
>> + &fname->cf_name, de->name,
>> + de->name_len);
>> + if (ret < 0) {
>> + /*
>> + * Treat comparison errors as not a match. The
>> + * only case where it happens is on a disk
>> + * corruption or ENOMEM.
>> + */
>> + return false;
>> }
>> - return !ext4_ci_compare(parent, fname->usr_fname, de->name,
>> - de->name_len, false);
>
> With the changes to patch 3 in this iteration, this could become:
>
> 	/*
> 	 * Treat comparison errors as not a match. The
> 	 * only case where it happens is disk corruption
> 	 * or ENOMEM.
> 	 */
> 	return ext4_ci_compare(parent, fname->usr_fname, de->name,
> 			       de->name_len, false) > 0;
>

Do you mean

	return generic_ci_match(parent, fname->usr_fname,
				&fname->cf_name, de->name,
				de->name_len) > 0;

?

I ask because ext4_ci_compare is removed by this series.

Thanks,
Eugen