2024-04-05 12:14:20

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 0/9] Cache insensitive cleanup for ext4/f2fs

Hello,

I am trying to respin the series here :
https://www.spinics.net/lists/linux-ext4/msg85081.html

I resent some of the v9 patches and got some reviews from Gabriel,
I did changes as requested and here is v16.

Changes in v16:
- rewrote patch 2/9 without `match`
- changed to return value in generic_ci_match coming from utf8 compare only in
strict mode.
- changed f2fs_warn to *_ratelimited in 7/9
- removed the declaration of f2fs_cf_name_slab in recovery.c as it's no longer
needed.

Changes in v15:
- fix wrong check `ret<0` in 7/9
- fix memleak reintroduced in 8/9

Changes in v14:
- fix wrong kfree unchecked call
- changed the return code in 3/8

Changes in v13:
- removed stray wrong line in 2/8
- removed old R-b as it's too long since they were given
- removed check for null buff in 2/8
- added new patch `f2fs: Log error when lookup of encoded dentry fails` as suggested
- rebased on unicode.git for-next branch

Changes in v12:
- revert to v10 comparison with propagating the error code from utf comparison

Changes in v11:
- revert to the original v9 implementation for the comparison helper.

Changes in v10:
- reworked a bit the comparison helper to improve performance by
first performing the exact lookup.


* Original commit letter

The case-insensitive implementations in f2fs and ext4 have quite a bit
of duplicated code. This series simplifies the ext4 version, with the
goal of extracting ext4_ci_compare into a helper library that can be
used by both filesystems. It also reduces the clutter from many
codeguards for CONFIG_UNICODE; as requested by Linus, they are part of
the codeflow now.

While there, I noticed we can leverage the utf8 functions to detect
encoded names that are corrupted in the filesystem. Therefore, it also
adds an ext4 error on that scenario, to mark the filesystem as
corrupted.

This series survived passes of xfstests -g quick.

Eugen Hristev (1):
f2fs: Log error when lookup of encoded dentry fails

Gabriel Krisman Bertazi (8):
ext4: Simplify the handling of cached insensitive names
f2fs: Simplify the handling of cached insensitive names
libfs: Introduce case-insensitive string comparison helper
ext4: Reuse generic_ci_match for ci comparisons
f2fs: Reuse generic_ci_match for ci comparisons
ext4: Log error when lookup of encoded dentry fails
ext4: Move CONFIG_UNICODE defguards into the code flow
f2fs: Move CONFIG_UNICODE defguards into the code flow

fs/ext4/crypto.c | 10 +---
fs/ext4/ext4.h | 35 +++++++-----
fs/ext4/namei.c | 129 ++++++++++++++++-----------------------------
fs/ext4/super.c | 4 +-
fs/f2fs/dir.c | 108 ++++++++++++-------------------------
fs/f2fs/f2fs.h | 16 +++++-
fs/f2fs/namei.c | 10 ++--
fs/f2fs/recovery.c | 9 +---
fs/f2fs/super.c | 8 +--
fs/libfs.c | 73 +++++++++++++++++++++++++
include/linux/fs.h | 4 ++
11 files changed, 206 insertions(+), 200 deletions(-)

--
2.34.1



2024-04-05 12:14:37

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 2/9] f2fs: Simplify the handling of cached insensitive names

From: Gabriel Krisman Bertazi <[email protected]>

Keeping it as qstr avoids the unnecessary conversion in f2fs_match

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: port to 6.8-rc3 and minor changes]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/f2fs/dir.c | 53 ++++++++++++++++++++++++++--------------------
fs/f2fs/f2fs.h | 16 +++++++++++++-
fs/f2fs/recovery.c | 9 +-------
3 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 02c9355176d3..bdf980e25adb 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -42,35 +42,49 @@ static unsigned int bucket_blocks(unsigned int level)
return 4;
}

+#if IS_ENABLED(CONFIG_UNICODE)
/* If @dir is casefolded, initialize @fname->cf_name from @fname->usr_fname. */
int f2fs_init_casefolded_name(const struct inode *dir,
struct f2fs_filename *fname)
{
-#if IS_ENABLED(CONFIG_UNICODE)
struct super_block *sb = dir->i_sb;
+ unsigned char *buf;
+ int len;

if (IS_CASEFOLDED(dir) &&
!is_dot_dotdot(fname->usr_fname->name, fname->usr_fname->len)) {
- fname->cf_name.name = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
- GFP_NOFS, false, F2FS_SB(sb));
- if (!fname->cf_name.name)
+ buf = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
+ GFP_NOFS, false, F2FS_SB(sb));
+ if (!buf)
return -ENOMEM;
- fname->cf_name.len = utf8_casefold(sb->s_encoding,
- fname->usr_fname,
- fname->cf_name.name,
- F2FS_NAME_LEN);
- if ((int)fname->cf_name.len <= 0) {
- kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
- fname->cf_name.name = NULL;
+
+ len = utf8_casefold(sb->s_encoding, fname->usr_fname,
+ buf, F2FS_NAME_LEN);
+ if (len <= 0) {
+ kmem_cache_free(f2fs_cf_name_slab, buf);
if (sb_has_strict_encoding(sb))
return -EINVAL;
/* fall back to treating name as opaque byte sequence */
+ return 0;
}
+ fname->cf_name.name = buf;
+ fname->cf_name.len = len;
}
-#endif
+
return 0;
}

+void f2fs_free_casefolded_name(struct f2fs_filename *fname)
+{
+ unsigned char *buf = (unsigned char *)fname->cf_name.name;
+
+ if (buf) {
+ kmem_cache_free(f2fs_cf_name_slab, buf);
+ fname->cf_name.name = NULL;
+ }
+}
+#endif /* CONFIG_UNICODE */
+
static int __f2fs_setup_filename(const struct inode *dir,
const struct fscrypt_name *crypt_name,
struct f2fs_filename *fname)
@@ -142,12 +156,7 @@ void f2fs_free_filename(struct f2fs_filename *fname)
kfree(fname->crypto_buf.name);
fname->crypto_buf.name = NULL;
#endif
-#if IS_ENABLED(CONFIG_UNICODE)
- if (fname->cf_name.name) {
- kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
- fname->cf_name.name = NULL;
- }
-#endif
+ f2fs_free_casefolded_name(fname);
}

static unsigned long dir_block_index(unsigned int level,
@@ -235,11 +244,9 @@ static inline int f2fs_match_name(const struct inode *dir,
struct fscrypt_name f;

#if IS_ENABLED(CONFIG_UNICODE)
- if (fname->cf_name.name) {
- struct qstr cf = FSTR_TO_QSTR(&fname->cf_name);
-
- return f2fs_match_ci_name(dir, &cf, de_name, de_name_len);
- }
+ if (fname->cf_name.name)
+ return f2fs_match_ci_name(dir, &fname->cf_name,
+ de_name, de_name_len);
#endif
f.usr_fname = fname->usr_fname;
f.disk_name = fname->disk_name;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index fced2b7652f4..119769240ffb 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -531,7 +531,7 @@ struct f2fs_filename {
* internal operation where usr_fname is also NULL. In all these cases
* we fall back to treating the name as an opaque byte sequence.
*/
- struct fscrypt_str cf_name;
+ struct qstr cf_name;
#endif
};

@@ -3533,8 +3533,22 @@ int f2fs_get_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
/*
* dir.c
*/
+#if IS_ENABLED(CONFIG_UNICODE)
int f2fs_init_casefolded_name(const struct inode *dir,
struct f2fs_filename *fname);
+void f2fs_free_casefolded_name(struct f2fs_filename *fname);
+#else
+static inline int f2fs_init_casefolded_name(const struct inode *dir,
+ struct f2fs_filename *fname)
+{
+ return 0;
+}
+
+static inline void f2fs_free_casefolded_name(struct f2fs_filename *fname)
+{
+}
+#endif /* CONFIG_UNICODE */
+
int f2fs_setup_filename(struct inode *dir, const struct qstr *iname,
int lookup, struct f2fs_filename *fname);
int f2fs_prepare_lookup(struct inode *dir, struct dentry *dentry,
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index e7bf15b8240a..00c396973c8a 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -46,10 +46,6 @@

static struct kmem_cache *fsync_entry_slab;

-#if IS_ENABLED(CONFIG_UNICODE)
-extern struct kmem_cache *f2fs_cf_name_slab;
-#endif
-
bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi)
{
s64 nalloc = percpu_counter_sum_positive(&sbi->alloc_valid_block_count);
@@ -153,11 +149,8 @@ static int init_recovered_filename(const struct inode *dir,
if (err)
return err;
f2fs_hash_filename(dir, fname);
-#if IS_ENABLED(CONFIG_UNICODE)
/* Case-sensitive match is fine for recovery */
- kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ f2fs_free_casefolded_name(fname);
} else {
f2fs_hash_filename(dir, fname);
}
--
2.34.1


2024-04-05 12:14:47

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

From: Gabriel Krisman Bertazi <[email protected]>

generic_ci_match can be used by case-insensitive filesystems to compare
strings under lookup with dirents in a case-insensitive way. This
function is currently reimplemented by each filesystem supporting
casefolding, so this reduces code duplication in filesystem-specific
code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: rework to first test the exact match]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/libfs.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 4 +++
2 files changed, 77 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 3a6f2cb364f8..88fcd19685a3 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1811,6 +1811,79 @@ static const struct dentry_operations generic_ci_dentry_ops = {
.d_revalidate = fscrypt_d_revalidate,
#endif
};
+
+/**
+ * generic_ci_match() - Match a name (case-insensitively) with a dirent.
+ * This is a filesystem helper for comparison with directory entries.
+ * generic_ci_d_compare should be used in VFS' ->d_compare instead.
+ *
+ * @parent: Inode of the parent of the dirent under comparison
+ * @name: name under lookup.
+ * @folded_name: Optional pre-folded name under lookup
+ * @de_name: Dirent name.
+ * @de_name_len: dirent name length.
+ *
+ * Test whether a case-insensitive directory entry matches the filename
+ * being searched. If @folded_name is provided, it is used instead of
+ * recalculating the casefold of @name.
+ *
+ * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
+ * < 0 on error.
+ */
+int generic_ci_match(const struct inode *parent,
+ const struct qstr *name,
+ const struct qstr *folded_name,
+ const u8 *de_name, u32 de_name_len)
+{
+ const struct super_block *sb = parent->i_sb;
+ const struct unicode_map *um = sb->s_encoding;
+ struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
+ struct qstr dirent = QSTR_INIT(de_name, de_name_len);
+ int res = 0;
+
+ if (IS_ENCRYPTED(parent)) {
+ const struct fscrypt_str encrypted_name =
+ FSTR_INIT((u8 *) de_name, de_name_len);
+
+ if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
+ return -EINVAL;
+
+ decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
+ if (!decrypted_name.name)
+ return -ENOMEM;
+ res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
+ &decrypted_name);
+ if (res < 0)
+ goto out;
+ dirent.name = decrypted_name.name;
+ dirent.len = decrypted_name.len;
+ }
+
+ /*
+ * Attempt a case-sensitive match first. It is cheaper and
+ * should cover most lookups, including all the sane
+ * applications that expect a case-sensitive filesystem.
+ */
+ if (folded_name->name) {
+ if (dirent.len == folded_name->len &&
+ !memcmp(folded_name->name, dirent.name, dirent.len))
+ goto out;
+ res = utf8_strncasecmp_folded(um, folded_name, &dirent);
+ } else {
+ if (dirent.len == name->len &&
+ !memcmp(name->name, dirent.name, dirent.len) &&
+ (!sb_has_strict_encoding(sb) || !utf8_validate(um, name)))
+ goto out;
+ res = utf8_strncasecmp(um, name, &dirent);
+ }
+
+out:
+ kfree(decrypted_name.name);
+ if (res < 0 && sb_has_strict_encoding(sb))
+ return res;
+ return !res;
+}
+EXPORT_SYMBOL(generic_ci_match);
#endif

#ifdef CONFIG_FS_ENCRYPTION
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 883b72478f61..422488ed24cd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3346,6 +3346,10 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
extern int generic_check_addressable(unsigned, u64);

extern void generic_set_sb_d_ops(struct super_block *sb);
+extern int generic_ci_match(const struct inode *parent,
+ const struct qstr *name,
+ const struct qstr *folded_name,
+ const u8 *de_name, u32 de_name_len);

static inline bool sb_has_encoding(const struct super_block *sb)
{
--
2.34.1


2024-04-05 12:15:09

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 4/9] ext4: Reuse generic_ci_match for ci comparisons

From: Gabriel Krisman Bertazi <[email protected]>

Instead of reimplementing ext4_match_ci, use the new libfs helper.

It also adds a comment explaining why fname->cf_name.name must be
checked prior to the encryption hash optimization, because that tripped
me before.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/ext4/namei.c | 91 +++++++++++++++----------------------------------
1 file changed, 27 insertions(+), 64 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index b96983a4c185..2d0ee232fbe7 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1390,58 +1390,6 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
}

#if IS_ENABLED(CONFIG_UNICODE)
-/*
- * Test whether a case-insensitive directory entry matches the filename
- * being searched for. If quick is set, assume the name being looked up
- * is already in the casefolded form.
- *
- * Returns: 0 if the directory entry matches, more than 0 if it
- * doesn't match or less than zero on error.
- */
-static int ext4_ci_compare(const struct inode *parent, const struct qstr *name,
- u8 *de_name, size_t de_name_len, bool quick)
-{
- const struct super_block *sb = parent->i_sb;
- const struct unicode_map *um = sb->s_encoding;
- struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
- struct qstr entry = QSTR_INIT(de_name, de_name_len);
- int ret;
-
- if (IS_ENCRYPTED(parent)) {
- const struct fscrypt_str encrypted_name =
- FSTR_INIT(de_name, de_name_len);
-
- decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
- if (!decrypted_name.name)
- return -ENOMEM;
- ret = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
- &decrypted_name);
- if (ret < 0)
- goto out;
- entry.name = decrypted_name.name;
- entry.len = decrypted_name.len;
- }
-
- if (quick)
- ret = utf8_strncasecmp_folded(um, name, &entry);
- else
- ret = utf8_strncasecmp(um, name, &entry);
- if (ret < 0) {
- /* Handle invalid character sequence as either an error
- * or as an opaque byte sequence.
- */
- if (sb_has_strict_encoding(sb))
- ret = -EINVAL;
- else if (name->len != entry.len)
- ret = 1;
- else
- ret = !!memcmp(name->name, entry.name, entry.len);
- }
-out:
- kfree(decrypted_name.name);
- return ret;
-}
-
int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
struct ext4_filename *name)
{
@@ -1503,20 +1451,35 @@ static bool ext4_match(struct inode *parent,
#if IS_ENABLED(CONFIG_UNICODE)
if (IS_CASEFOLDED(parent) &&
(!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
- if (fname->cf_name.name) {
- if (IS_ENCRYPTED(parent)) {
- if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
- fname->hinfo.minor_hash !=
- EXT4_DIRENT_MINOR_HASH(de)) {
+ int ret;

- return false;
- }
- }
- return !ext4_ci_compare(parent, &fname->cf_name,
- de->name, de->name_len, true);
+ /*
+ * Just checking IS_ENCRYPTED(parent) below is not
+ * sufficient to decide whether one can use the hash for
+ * skipping the string comparison, because the key might
+ * have been added right after
+ * ext4_fname_setup_ci_filename(). In this case, a hash
+ * mismatch will be a false negative. Therefore, make
+ * sure cf_name was properly initialized before
+ * considering the calculated hash.
+ */
+ if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
+ (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
+ fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
+ return false;
+
+ ret = generic_ci_match(parent, fname->usr_fname,
+ &fname->cf_name, de->name,
+ de->name_len);
+ if (ret < 0) {
+ /*
+ * Treat comparison errors as not a match. The
+ * only case where it happens is on a disk
+ * corruption or ENOMEM.
+ */
+ return false;
}
- return !ext4_ci_compare(parent, fname->usr_fname, de->name,
- de->name_len, false);
+ return ret;
}
#endif

--
2.34.1


2024-04-05 12:15:31

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 6/9] ext4: Log error when lookup of encoded dentry fails

From: Gabriel Krisman Bertazi <[email protected]>

If the volume is in strict mode, ext4_ci_compare can report a broken
encoding name. This will not trigger on a bad lookup, which is caught
earlier, only if the actual disk name is bad.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/ext4/namei.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 2d0ee232fbe7..3268cf45d9db 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1477,6 +1477,9 @@ static bool ext4_match(struct inode *parent,
* only case where it happens is on a disk
* corruption or ENOMEM.
*/
+ if (ret == -EINVAL)
+ EXT4_ERROR_INODE(parent,
+ "Directory contains filename that is invalid UTF-8");
return false;
}
return ret;
--
2.34.1


2024-04-05 12:15:35

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 5/9] f2fs: Reuse generic_ci_match for ci comparisons

From: Gabriel Krisman Bertazi <[email protected]>

Now that ci_match is part of libfs, make f2fs reuse it instead of having
a different implementation.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/f2fs/dir.c | 58 ++++-----------------------------------------------
1 file changed, 4 insertions(+), 54 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index bdf980e25adb..cbd7a5e96a37 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -185,58 +185,6 @@ static struct f2fs_dir_entry *find_in_block(struct inode *dir,
return f2fs_find_target_dentry(&d, fname, max_slots);
}

-#if IS_ENABLED(CONFIG_UNICODE)
-/*
- * Test whether a case-insensitive directory entry matches the filename
- * being searched for.
- *
- * Returns 1 for a match, 0 for no match, and -errno on an error.
- */
-static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name,
- const u8 *de_name, u32 de_name_len)
-{
- const struct super_block *sb = dir->i_sb;
- const struct unicode_map *um = sb->s_encoding;
- struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
- struct qstr entry = QSTR_INIT(de_name, de_name_len);
- int res;
-
- if (IS_ENCRYPTED(dir)) {
- const struct fscrypt_str encrypted_name =
- FSTR_INIT((u8 *)de_name, de_name_len);
-
- if (WARN_ON_ONCE(!fscrypt_has_encryption_key(dir)))
- return -EINVAL;
-
- decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
- if (!decrypted_name.name)
- return -ENOMEM;
- res = fscrypt_fname_disk_to_usr(dir, 0, 0, &encrypted_name,
- &decrypted_name);
- if (res < 0)
- goto out;
- entry.name = decrypted_name.name;
- entry.len = decrypted_name.len;
- }
-
- res = utf8_strncasecmp_folded(um, name, &entry);
- /*
- * In strict mode, ignore invalid names. In non-strict mode,
- * fall back to treating them as opaque byte sequences.
- */
- if (res < 0 && !sb_has_strict_encoding(sb)) {
- res = name->len == entry.len &&
- memcmp(name->name, entry.name, name->len) == 0;
- } else {
- /* utf8_strncasecmp_folded returns 0 on match */
- res = (res == 0);
- }
-out:
- kfree(decrypted_name.name);
- return res;
-}
-#endif /* CONFIG_UNICODE */
-
static inline int f2fs_match_name(const struct inode *dir,
const struct f2fs_filename *fname,
const u8 *de_name, u32 de_name_len)
@@ -245,8 +193,10 @@ static inline int f2fs_match_name(const struct inode *dir,

#if IS_ENABLED(CONFIG_UNICODE)
if (fname->cf_name.name)
- return f2fs_match_ci_name(dir, &fname->cf_name,
- de_name, de_name_len);
+ return generic_ci_match(dir, fname->usr_fname,
+ &fname->cf_name,
+ de_name, de_name_len);
+
#endif
f.usr_fname = fname->usr_fname;
f.disk_name = fname->disk_name;
--
2.34.1


2024-04-05 12:15:51

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 7/9] f2fs: Log error when lookup of encoded dentry fails

If the volume is in strict mode, generi c_ci_compare can report a broken
encoding name. This will not trigger on a bad lookup, which is caught
earlier, only if the actual disk name is bad.

Suggested-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/f2fs/dir.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index cbd7a5e96a37..376f705aa3f1 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -192,11 +192,16 @@ static inline int f2fs_match_name(const struct inode *dir,
struct fscrypt_name f;

#if IS_ENABLED(CONFIG_UNICODE)
- if (fname->cf_name.name)
- return generic_ci_match(dir, fname->usr_fname,
- &fname->cf_name,
- de_name, de_name_len);
-
+ if (fname->cf_name.name) {
+ int ret = generic_ci_match(dir, fname->usr_fname,
+ &fname->cf_name,
+ de_name, de_name_len);
+ if (ret == -EINVAL)
+ f2fs_warn_ratelimited(F2FS_SB(dir->i_sb),
+ "Directory contains filename that is invalid UTF-8");
+
+ return ret;
+ }
#endif
f.usr_fname = fname->usr_fname;
f.disk_name = fname->disk_name;
--
2.34.1


2024-04-05 12:16:19

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 8/9] ext4: Move CONFIG_UNICODE defguards into the code flow

From: Gabriel Krisman Bertazi <[email protected]>

Instead of a bunch of ifdefs, make the unicode built checks part of the
code flow where possible, as requested by Torvalds.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: port to 6.8-rc3]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/ext4/crypto.c | 10 ++--------
fs/ext4/ext4.h | 33 +++++++++++++++++++++------------
fs/ext4/namei.c | 14 +++++---------
fs/ext4/super.c | 4 +---
4 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/crypto.c b/fs/ext4/crypto.c
index 7ae0b61258a7..0a056d97e640 100644
--- a/fs/ext4/crypto.c
+++ b/fs/ext4/crypto.c
@@ -31,11 +31,10 @@ int ext4_fname_setup_filename(struct inode *dir, const struct qstr *iname,

ext4_fname_from_fscrypt_name(fname, &name);

-#if IS_ENABLED(CONFIG_UNICODE)
err = ext4_fname_setup_ci_filename(dir, iname, fname);
if (err)
ext4_fname_free_filename(fname);
-#endif
+
return err;
}

@@ -51,11 +50,9 @@ int ext4_fname_prepare_lookup(struct inode *dir, struct dentry *dentry,

ext4_fname_from_fscrypt_name(fname, &name);

-#if IS_ENABLED(CONFIG_UNICODE)
err = ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname);
if (err)
ext4_fname_free_filename(fname);
-#endif
return err;
}

@@ -70,10 +67,7 @@ void ext4_fname_free_filename(struct ext4_filename *fname)
fname->usr_fname = NULL;
fname->disk_name.name = NULL;

-#if IS_ENABLED(CONFIG_UNICODE)
- kfree(fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ ext4_fname_free_ci_filename(fname);
}

static bool uuid_is_zero(__u8 u[16])
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index d7bc2062cf3f..816fda388d5e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2742,8 +2742,25 @@ ext4_fsblk_t ext4_inode_to_goal_block(struct inode *);

#if IS_ENABLED(CONFIG_UNICODE)
extern int ext4_fname_setup_ci_filename(struct inode *dir,
- const struct qstr *iname,
- struct ext4_filename *fname);
+ const struct qstr *iname,
+ struct ext4_filename *fname);
+
+static inline void ext4_fname_free_ci_filename(struct ext4_filename *fname)
+{
+ kfree(fname->cf_name.name);
+ fname->cf_name.name = NULL;
+}
+#else
+static inline int ext4_fname_setup_ci_filename(struct inode *dir,
+ const struct qstr *iname,
+ struct ext4_filename *fname)
+{
+ return 0;
+}
+
+static inline void ext4_fname_free_ci_filename(struct ext4_filename *fname)
+{
+}
#endif

/* ext4 encryption related stuff goes here crypto.c */
@@ -2766,16 +2783,11 @@ static inline int ext4_fname_setup_filename(struct inode *dir,
int lookup,
struct ext4_filename *fname)
{
- int err = 0;
fname->usr_fname = iname;
fname->disk_name.name = (unsigned char *) iname->name;
fname->disk_name.len = iname->len;

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, iname, fname);
-#endif
-
- return err;
+ return ext4_fname_setup_ci_filename(dir, iname, fname);
}

static inline int ext4_fname_prepare_lookup(struct inode *dir,
@@ -2787,10 +2799,7 @@ static inline int ext4_fname_prepare_lookup(struct inode *dir,

static inline void ext4_fname_free_filename(struct ext4_filename *fname)
{
-#if IS_ENABLED(CONFIG_UNICODE)
- kfree(fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ ext4_fname_free_ci_filename(fname);
}

static inline int ext4_ioctl_get_encryption_pwsalt(struct file *filp,
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 3268cf45d9db..a5d9e5b01015 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1834,8 +1834,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
}
}

-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
+ if (IS_ENABLED(CONFIG_UNICODE) && !inode && IS_CASEFOLDED(dir)) {
/* Eventually we want to call d_add_ci(dentry, NULL)
* for negative dentries in the encoding case as
* well. For now, prevent the negative dentry
@@ -1843,7 +1842,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
*/
return NULL;
}
-#endif
+
return d_splice_alias(inode, dentry);
}

@@ -3173,16 +3172,14 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
ext4_fc_track_unlink(handle, dentry);
retval = ext4_mark_inode_dirty(handle, dir);

-#if IS_ENABLED(CONFIG_UNICODE)
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at ext4_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif

end_rmdir:
brelse(bh);
@@ -3284,16 +3281,15 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
goto out_trace;

retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
+
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at ext4_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif

out_trace:
trace_ext4_unlink_exit(dentry, retval);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3fce1b80c419..4e212f01b761 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3604,14 +3604,12 @@ int ext4_feature_set_ok(struct super_block *sb, int readonly)
return 0;
}

-#if !IS_ENABLED(CONFIG_UNICODE)
- if (ext4_has_feature_casefold(sb)) {
+ if (!IS_ENABLED(CONFIG_UNICODE) && ext4_has_feature_casefold(sb)) {
ext4_msg(sb, KERN_ERR,
"Filesystem with casefold feature cannot be "
"mounted without CONFIG_UNICODE");
return 0;
}
-#endif

if (readonly)
return 1;
--
2.34.1


2024-04-05 12:16:29

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v16 9/9] f2fs: Move CONFIG_UNICODE defguards into the code flow

From: Gabriel Krisman Bertazi <[email protected]>

Instead of a bunch of ifdefs, make the unicode built checks part of the
code flow where possible, as requested by Torvalds.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: port to 6.8-rc3]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/f2fs/namei.c | 10 ++++------
fs/f2fs/super.c | 8 ++++----
2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e54f8c08bda8..1ecde2b45e99 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -576,8 +576,7 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
goto out_iput;
}
out_splice:
-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
+ if (IS_ENABLED(CONFIG_UNICODE) && !inode && IS_CASEFOLDED(dir)) {
/* Eventually we want to call d_add_ci(dentry, NULL)
* for negative dentries in the encoding case as
* well. For now, prevent the negative dentry
@@ -586,7 +585,7 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
trace_f2fs_lookup_end(dir, dentry, ino, err);
return NULL;
}
-#endif
+
new = d_splice_alias(inode, dentry);
trace_f2fs_lookup_end(dir, !IS_ERR_OR_NULL(new) ? new : dentry,
ino, IS_ERR(new) ? PTR_ERR(new) : err);
@@ -639,16 +638,15 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
f2fs_delete_entry(de, page, dir, inode);
f2fs_unlock_op(sbi);

-#if IS_ENABLED(CONFIG_UNICODE)
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at f2fs_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif
+
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
fail:
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a4bc26dfdb1a..68f3639d9b52 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -311,7 +311,7 @@ struct kmem_cache *f2fs_cf_name_slab;
static int __init f2fs_create_casefold_cache(void)
{
f2fs_cf_name_slab = f2fs_kmem_cache_create("f2fs_casefolded_name",
- F2FS_NAME_LEN);
+ F2FS_NAME_LEN);
return f2fs_cf_name_slab ? 0 : -ENOMEM;
}

@@ -1313,13 +1313,13 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
return -EINVAL;
}
#endif
-#if !IS_ENABLED(CONFIG_UNICODE)
- if (f2fs_sb_has_casefold(sbi)) {
+
+ if (!IS_ENABLED(CONFIG_UNICODE) && f2fs_sb_has_casefold(sbi)) {
f2fs_err(sbi,
"Filesystem with casefold feature cannot be mounted without CONFIG_UNICODE");
return -EINVAL;
}
-#endif
+
/*
* The BLKZONED feature indicates that the drive was formatted with
* zone alignment optimization. This is optional for host-aware
--
2.34.1


2024-04-05 12:18:53

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v16 0/9] Cache insensitive cleanup for ext4/f2fs

On Fri, Apr 05, 2024 at 03:13:23PM +0300, Eugen Hristev wrote:
> Hello,
>
> I am trying to respin the series here :
> https://www.spinics.net/lists/linux-ext4/msg85081.html

The subject here is "Cache insensitive cleanup for ext4/f2fs".
Cache insensitive means something entirely different
https://en.wikipedia.org/wiki/Cache-oblivious_algorithm

I suspect you mean "Case insensitive".

2024-04-05 13:02:43

by Eugen Hristev

[permalink] [raw]
Subject: Re: [PATCH v16 0/9] Cache insensitive cleanup for ext4/f2fs

On 4/5/24 15:18, Matthew Wilcox wrote:
> On Fri, Apr 05, 2024 at 03:13:23PM +0300, Eugen Hristev wrote:
>> Hello,
>>
>> I am trying to respin the series here :
>> https://www.spinics.net/lists/linux-ext4/msg85081.html
>
> The subject here is "Cache insensitive cleanup for ext4/f2fs".
> Cache insensitive means something entirely different
> https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
>
> I suspect you mean "Case insensitive".

You are correct, I apologize for the typo.

> _______________________________________________
> Kernel mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> This list is managed by https://mailman.collabora.com


2024-05-10 01:23:33

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v16 2/9] f2fs: Simplify the handling of cached insensitive names

On Fri, Apr 05, 2024 at 03:13:25PM +0300, Eugen Hristev wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Keeping it as qstr avoids the unnecessary conversion in f2fs_match
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> [[email protected]: port to 6.8-rc3 and minor changes]
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/f2fs/dir.c | 53 ++++++++++++++++++++++++++--------------------
> fs/f2fs/f2fs.h | 16 +++++++++++++-
> fs/f2fs/recovery.c | 9 +-------
> 3 files changed, 46 insertions(+), 32 deletions(-)

Reviewed-by: Eric Biggers <[email protected]>

(But please change "cached insensitive" to "case-insensitive")

- Eric

2024-05-10 01:25:53

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v16 8/9] ext4: Move CONFIG_UNICODE defguards into the code flow

On Fri, Apr 05, 2024 at 03:13:31PM +0300, Eugen Hristev wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Instead of a bunch of ifdefs, make the unicode built checks part of the
> code flow where possible, as requested by Torvalds.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> [[email protected]: port to 6.8-rc3]
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/ext4/crypto.c | 10 ++--------
> fs/ext4/ext4.h | 33 +++++++++++++++++++++------------
> fs/ext4/namei.c | 14 +++++---------
> fs/ext4/super.c | 4 +---
> 4 files changed, 29 insertions(+), 32 deletions(-)

Reviewed-by: Eric Biggers <[email protected]>

- Eric

2024-05-10 01:25:59

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v16 6/9] ext4: Log error when lookup of encoded dentry fails

On Fri, Apr 05, 2024 at 03:13:29PM +0300, Eugen Hristev wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> If the volume is in strict mode, ext4_ci_compare can report a broken
> encoding name. This will not trigger on a bad lookup, which is caught
> earlier, only if the actual disk name is bad.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/ext4/namei.c | 3 +++
> 1 file changed, 3 insertions(+)
>

Reviewed-by: Eric Biggers <[email protected]>

- Eric

2024-05-10 01:26:10

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v16 9/9] f2fs: Move CONFIG_UNICODE defguards into the code flow

On Fri, Apr 05, 2024 at 03:13:32PM +0300, Eugen Hristev wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Instead of a bunch of ifdefs, make the unicode built checks part of the
> code flow where possible, as requested by Torvalds.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> [[email protected]: port to 6.8-rc3]
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/f2fs/namei.c | 10 ++++------
> fs/f2fs/super.c | 8 ++++----
> 2 files changed, 8 insertions(+), 10 deletions(-)

Reviewed-by: Eric Biggers <[email protected]>

- Eric

2024-05-10 01:33:41

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

On Fri, Apr 05, 2024 at 03:13:26PM +0300, Eugen Hristev wrote:
> +/**
> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
> + * This is a filesystem helper for comparison with directory entries.
> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
> + *
> + * @parent: Inode of the parent of the dirent under comparison
> + * @name: name under lookup.
> + * @folded_name: Optional pre-folded name under lookup
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched. If @folded_name is provided, it is used instead of
> + * recalculating the casefold of @name.
> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> + const struct qstr *name,
> + const struct qstr *folded_name,
> + const u8 *de_name, u32 de_name_len)
> +{
> + const struct super_block *sb = parent->i_sb;
> + const struct unicode_map *um = sb->s_encoding;
> + struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> + struct qstr dirent = QSTR_INIT(de_name, de_name_len);
> + int res = 0;
> +
> + if (IS_ENCRYPTED(parent)) {
> + const struct fscrypt_str encrypted_name =
> + FSTR_INIT((u8 *) de_name, de_name_len);
> +
> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
> + return -EINVAL;
> +
> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> + if (!decrypted_name.name)
> + return -ENOMEM;
> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
> + &decrypted_name);
> + if (res < 0)
> + goto out;

If fscrypt_fname_disk_to_usr() returns an error and !sb_has_strict_encoding(sb),
then this function returns 0 (indicating no match) instead of the error code
(indicating an error). Is that the correct behavior? I would think that
strict_encoding should only have an effect on the actual name comparison.

> + /*
> + * Attempt a case-sensitive match first. It is cheaper and
> + * should cover most lookups, including all the sane
> + * applications that expect a case-sensitive filesystem.
> + */
> + if (folded_name->name) {
> + if (dirent.len == folded_name->len &&
> + !memcmp(folded_name->name, dirent.name, dirent.len))
> + goto out;
> + res = utf8_strncasecmp_folded(um, folded_name, &dirent);

Shouldn't the memcmp be done with the original user-specified name, not the
casefolded name? I would think that the user-specified name is the one that's
more likely to match the on-disk name, because of case preservation. In most
cases users will specify the same case on both file creation and later access.

- Eric

2024-05-12 21:28:16

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

Eric Biggers <[email protected]> writes:

> On Fri, Apr 05, 2024 at 03:13:26PM +0300, Eugen Hristev wrote:

>> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>> + return -EINVAL;
>> +
>> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>> + if (!decrypted_name.name)
>> + return -ENOMEM;
>> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>> + &decrypted_name);
>> + if (res < 0)
>> + goto out;
>
> If fscrypt_fname_disk_to_usr() returns an error and !sb_has_strict_encoding(sb),
> then this function returns 0 (indicating no match) instead of the error code
> (indicating an error). Is that the correct behavior? I would think that
> strict_encoding should only have an effect on the actual name
> comparison.

No. we *want* this return code to be propagated back to f2fs. In ext4 it
wouldn't matter since the error is not visible outside of ext4_match,
but f2fs does the right thing and stops the lookup.

Thinking about it, there is a second problem with this series.
Currently, if we are on strict_mode, f2fs_match_ci_name does not
propagate unicode errors back to f2fs. So, once a utf8 invalid sequence
is found during lookup, it will be considered not-a-match but the lookup
will continue. This allows some lookups to succeed even in a corrupted
directory. With this patch, we will abort the lookup on the first
error, breaking existing semantics. Note that these are different from
memory allocation failure and fscrypt_fname_disk_to_usr. For those, it
makes sense to abort.

Also, once patch 6 and 7 are added, if fscrypt fails with -EINVAL for
any reason unrelated to unicode (like in the WARN_ON above), we will
incorrectly print the error message saying there is a bad UTF8 string.

My suggestion would be to keep the current behavior. Make
generic_ci_match only propagate non-unicode related errors back to the
filesystem. This means that we need to move the error messages in patch
6 and 7 into this function, so they only trigger when utf8_strncasecmp*
itself fails.

>> + /*
>> + * Attempt a case-sensitive match first. It is cheaper and
>> + * should cover most lookups, including all the sane
>> + * applications that expect a case-sensitive filesystem.
>> + */
>> + if (folded_name->name) {
>> + if (dirent.len == folded_name->len &&
>> + !memcmp(folded_name->name, dirent.name, dirent.len))
>> + goto out;
>> + res = utf8_strncasecmp_folded(um, folded_name, &dirent);
>
> Shouldn't the memcmp be done with the original user-specified name, not the
> casefolded name? I would think that the user-specified name is the one that's
> more likely to match the on-disk name, because of case preservation. In most
> cases users will specify the same case on both file creation and later access.

Yes.

--
Gabriel Krisman Bertazi

2024-05-22 14:03:08

by Eugen Hristev

[permalink] [raw]
Subject: Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

On 5/13/24 00:27, Gabriel Krisman Bertazi wrote:
> Eric Biggers <[email protected]> writes:
>
>> On Fri, Apr 05, 2024 at 03:13:26PM +0300, Eugen Hristev wrote:
>
>>> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>> + return -EINVAL;
>>> +
>>> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>> + if (!decrypted_name.name)
>>> + return -ENOMEM;
>>> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>>> + &decrypted_name);
>>> + if (res < 0)
>>> + goto out;
>>
>> If fscrypt_fname_disk_to_usr() returns an error and !sb_has_strict_encoding(sb),
>> then this function returns 0 (indicating no match) instead of the error code
>> (indicating an error). Is that the correct behavior? I would think that
>> strict_encoding should only have an effect on the actual name
>> comparison.
>
> No. we *want* this return code to be propagated back to f2fs. In ext4 it
> wouldn't matter since the error is not visible outside of ext4_match,
> but f2fs does the right thing and stops the lookup.

In the previous version which I sent, you told me that the error should be
propagated only in strict_mode, and if !strict_mode, it should just return no match.
Originally I did not understand that this should be done only for utf8_strncasecmp
errors, and not for all the errors. I will change it here to fix that.

>
> Thinking about it, there is a second problem with this series.
> Currently, if we are on strict_mode, f2fs_match_ci_name does not
> propagate unicode errors back to f2fs. So, once a utf8 invalid sequence
> is found during lookup, it will be considered not-a-match but the lookup
> will continue. This allows some lookups to succeed even in a corrupted
> directory. With this patch, we will abort the lookup on the first
> error, breaking existing semantics. Note that these are different from
> memory allocation failure and fscrypt_fname_disk_to_usr. For those, it
> makes sense to abort.

So , in the case of f2fs , we must not propagate utf8 errors ? It should just
return no match even in strict mode ?
If this helper is common for both f2fs and ext4, we have to do the same for ext4 ?
Or we are no longer able to commonize the code altogether ?
>
> Also, once patch 6 and 7 are added, if fscrypt fails with -EINVAL for
> any reason unrelated to unicode (like in the WARN_ON above), we will
> incorrectly print the error message saying there is a bad UTF8 string.
>
> My suggestion would be to keep the current behavior. Make
> generic_ci_match only propagate non-unicode related errors back to the
> filesystem. This means that we need to move the error messages in patch
> 6 and 7 into this function, so they only trigger when utf8_strncasecmp*
> itself fails.
>

So basically unicode errors stop here, and print the error message here in that case.
Am I understanding it correctly ?
>>> + /*
>>> + * Attempt a case-sensitive match first. It is cheaper and
>>> + * should cover most lookups, including all the sane
>>> + * applications that expect a case-sensitive filesystem.
>>> + */
>>> + if (folded_name->name) {
>>> + if (dirent.len == folded_name->len &&
>>> + !memcmp(folded_name->name, dirent.name, dirent.len))
>>> + goto out;
>>> + res = utf8_strncasecmp_folded(um, folded_name, &dirent);
>>
>> Shouldn't the memcmp be done with the original user-specified name, not the
>> casefolded name? I would think that the user-specified name is the one that's
>> more likely to match the on-disk name, because of case preservation. In most
>> cases users will specify the same case on both file creation and later access.
>
> Yes.
>
so the utf8_strncasecmp_folded call here must use name->name instead of folded_name ?

Thanks for the review
Eugen


2024-05-22 23:06:09

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

Eugen Hristev <[email protected]> writes:

> On 5/13/24 00:27, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <[email protected]> writes:
>>
>>> On Fri, Apr 05, 2024 at 03:13:26PM +0300, Eugen Hristev wrote:
>>
>>>> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>>> + return -EINVAL;
>>>> +
>>>> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>>> + if (!decrypted_name.name)
>>>> + return -ENOMEM;
>>>> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>>>> + &decrypted_name);
>>>> + if (res < 0)
>>>> + goto out;
>>>
>>> If fscrypt_fname_disk_to_usr() returns an error and !sb_has_strict_encoding(sb),
>>> then this function returns 0 (indicating no match) instead of the error code
>>> (indicating an error). Is that the correct behavior? I would think that
>>> strict_encoding should only have an effect on the actual name
>>> comparison.
>>
>> No. we *want* this return code to be propagated back to f2fs. In ext4 it
>> wouldn't matter since the error is not visible outside of ext4_match,
>> but f2fs does the right thing and stops the lookup.
>
> In the previous version which I sent, you told me that the error should be
> propagated only in strict_mode, and if !strict_mode, it should just return no match.
> Originally I did not understand that this should be done only for utf8_strncasecmp
> errors, and not for all the errors. I will change it here to fix that.

Yes, it depends on which error we are talking about. For ENOMEM and
whatever error fscrypt_fname_disk_to_usr returns, we surely want to send
that back, such that f2fs can handle it (i.e abort the lookup). Unicode
casefolding errors don't need to stop the lookup.


>> Thinking about it, there is a second problem with this series.
>> Currently, if we are on strict_mode, f2fs_match_ci_name does not
>> propagate unicode errors back to f2fs. So, once a utf8 invalid sequence
>> is found during lookup, it will be considered not-a-match but the lookup
>> will continue. This allows some lookups to succeed even in a corrupted
>> directory. With this patch, we will abort the lookup on the first
>> error, breaking existing semantics. Note that these are different from
>> memory allocation failure and fscrypt_fname_disk_to_usr. For those, it
>> makes sense to abort.
>
> So , in the case of f2fs , we must not propagate utf8 errors ? It should just
> return no match even in strict mode ?
> If this helper is common for both f2fs and ext4, we have to do the same for ext4 ?
> Or we are no longer able to commonize the code altogether ?

We can have a common handler. It doesn't matter for Ext4 because it
ignores all errors. Perhaps ext4 can be improved too in a different
patchset.

>> My suggestion would be to keep the current behavior. Make
>> generic_ci_match only propagate non-unicode related errors back to the
>> filesystem. This means that we need to move the error messages in patch
>> 6 and 7 into this function, so they only trigger when utf8_strncasecmp*
>> itself fails.
>>
>
> So basically unicode errors stop here, and print the error message here in that case.
> Am I understanding it correctly ?

Yes, that is it. print the error message - only in strict mode - and
return not-a-match.

Is there any problem with this approach that I'm missing?

>>>> + /*
>>>> + * Attempt a case-sensitive match first. It is cheaper and
>>>> + * should cover most lookups, including all the sane
>>>> + * applications that expect a case-sensitive filesystem.
>>>> + */
>>>> + if (folded_name->name) {
>>>> + if (dirent.len == folded_name->len &&
>>>> + !memcmp(folded_name->name, dirent.name, dirent.len))
>>>> + goto out;
>>>> + res = utf8_strncasecmp_folded(um, folded_name, &dirent);
>>>
>>> Shouldn't the memcmp be done with the original user-specified name, not the
>>> casefolded name? I would think that the user-specified name is the one that's
>>> more likely to match the on-disk name, because of case preservation. In most
>>> cases users will specify the same case on both file creation and later access.
>>
>> Yes.
>>
> so the utf8_strncasecmp_folded call here must use name->name instead of folded_name ?

No, utf8_strncasecmp_folded requires a casefolded name. Eric's point is
that the *memcmp* should always compare against name->name since it's more
likely to match the name on disk than the folded version because the user
is probably doing a case-exact lookup.

This also means the memcmp can be moved outside the "if (folded_name->name)",
simplifying the patch!

--
Gabriel Krisman Bertazi

2024-05-26 11:50:01

by Eugen Hristev

[permalink] [raw]
Subject: Re: [PATCH v16 3/9] libfs: Introduce case-insensitive string comparison helper

On 5/23/24 02:05, Gabriel Krisman Bertazi wrote:
> Eugen Hristev <[email protected]> writes:
>
>> On 5/13/24 00:27, Gabriel Krisman Bertazi wrote:
>>> Eric Biggers <[email protected]> writes:
>>>
>>>> On Fri, Apr 05, 2024 at 03:13:26PM +0300, Eugen Hristev wrote:
>>>
>>>>> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>>>> + return -EINVAL;
>>>>> +
>>>>> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>>>> + if (!decrypted_name.name)
>>>>> + return -ENOMEM;
>>>>> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>>>>> + &decrypted_name);
>>>>> + if (res < 0)
>>>>> + goto out;
>>>>
>>>> If fscrypt_fname_disk_to_usr() returns an error and !sb_has_strict_encoding(sb),
>>>> then this function returns 0 (indicating no match) instead of the error code
>>>> (indicating an error). Is that the correct behavior? I would think that
>>>> strict_encoding should only have an effect on the actual name
>>>> comparison.
>>>
>>> No. we *want* this return code to be propagated back to f2fs. In ext4 it
>>> wouldn't matter since the error is not visible outside of ext4_match,
>>> but f2fs does the right thing and stops the lookup.
>>
>> In the previous version which I sent, you told me that the error should be
>> propagated only in strict_mode, and if !strict_mode, it should just return no match.
>> Originally I did not understand that this should be done only for utf8_strncasecmp
>> errors, and not for all the errors. I will change it here to fix that.
>
> Yes, it depends on which error we are talking about. For ENOMEM and
> whatever error fscrypt_fname_disk_to_usr returns, we surely want to send
> that back, such that f2fs can handle it (i.e abort the lookup). Unicode
> casefolding errors don't need to stop the lookup.
>
>
>>> Thinking about it, there is a second problem with this series.
>>> Currently, if we are on strict_mode, f2fs_match_ci_name does not
>>> propagate unicode errors back to f2fs. So, once a utf8 invalid sequence
>>> is found during lookup, it will be considered not-a-match but the lookup
>>> will continue. This allows some lookups to succeed even in a corrupted
>>> directory. With this patch, we will abort the lookup on the first
>>> error, breaking existing semantics. Note that these are different from
>>> memory allocation failure and fscrypt_fname_disk_to_usr. For those, it
>>> makes sense to abort.
>>
>> So , in the case of f2fs , we must not propagate utf8 errors ? It should just
>> return no match even in strict mode ?
>> If this helper is common for both f2fs and ext4, we have to do the same for ext4 ?
>> Or we are no longer able to commonize the code altogether ?
>
> We can have a common handler. It doesn't matter for Ext4 because it
> ignores all errors. Perhaps ext4 can be improved too in a different
> patchset.
>
>>> My suggestion would be to keep the current behavior. Make
>>> generic_ci_match only propagate non-unicode related errors back to the
>>> filesystem. This means that we need to move the error messages in patch
>>> 6 and 7 into this function, so they only trigger when utf8_strncasecmp*
>>> itself fails.
>>>
>>
>> So basically unicode errors stop here, and print the error message here in that case.
>> Am I understanding it correctly ?
>
> Yes, that is it. print the error message - only in strict mode - and
> return not-a-match.
>
> Is there any problem with this approach that I'm missing?

As the printing is moved here, in the common code, we cannot use either of
f2fs_warn nor EXT4_ERROR_INODE . Any suggestion ? Would have to be something
meaningful for the user and ratelimited I guess.

Thanks for the explanations !


>
>>>>> + /*
>>>>> + * Attempt a case-sensitive match first. It is cheaper and
>>>>> + * should cover most lookups, including all the sane
>>>>> + * applications that expect a case-sensitive filesystem.
>>>>> + */
>>>>> + if (folded_name->name) {
>>>>> + if (dirent.len == folded_name->len &&
>>>>> + !memcmp(folded_name->name, dirent.name, dirent.len))
>>>>> + goto out;
>>>>> + res = utf8_strncasecmp_folded(um, folded_name, &dirent);
>>>>
>>>> Shouldn't the memcmp be done with the original user-specified name, not the
>>>> casefolded name? I would think that the user-specified name is the one that's
>>>> more likely to match the on-disk name, because of case preservation. In most
>>>> cases users will specify the same case on both file creation and later access.
>>>
>>> Yes.
>>>
>> so the utf8_strncasecmp_folded call here must use name->name instead of folded_name ?
>
> No, utf8_strncasecmp_folded requires a casefolded name. Eric's point is
> that the *memcmp* should always compare against name->name since it's more
> likely to match the name on disk than the folded version because the user
> is probably doing a case-exact lookup.
>
> This also means the memcmp can be moved outside the "if (folded_name->name)",
> simplifying the patch!
>