2024-02-20 08:55:27

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v12 0/8] Cache insensitive cleanup for ext4/f2fs

Hello,

I am trying to respin the series here :
https://www.spinics.net/lists/linux-ext4/msg85081.html

I resent some of the v9 patches and got some reviews from Gabriel,
I did changes as requested and here is v12.

Changes in v12:
- revert to v10 comparison with propagating the error code from utf comparison

Changes in v11:
- revert to the original v9 implementation for the comparison helper.

Changes in v10:
- reworked a bit the comparison helper to improve performance by
first performing the exact lookup.


* Original commit letter

The case-insensitive implementations in f2fs and ext4 have quite a bit
of duplicated code. This series simplifies the ext4 version, with the
goal of extracting ext4_ci_compare into a helper library that can be
used by both filesystems. It also reduces the clutter from many
codeguards for CONFIG_UNICODE; as requested by Linus, they are part of
the codeflow now.

While there, I noticed we can leverage the utf8 functions to detect
encoded names that are corrupted in the filesystem. Therefore, it also
adds an ext4 error on that scenario, to mark the filesystem as
corrupted.

This series survived passes of xfstests -g quick.


Gabriel Krisman Bertazi (8):
ext4: Simplify the handling of cached insensitive names
f2fs: Simplify the handling of cached insensitive names
libfs: Introduce case-insensitive string comparison helper
ext4: Reuse generic_ci_match for ci comparisons
f2fs: Reuse generic_ci_match for ci comparisons
ext4: Log error when lookup of encoded dentry fails
ext4: Move CONFIG_UNICODE defguards into the code flow
f2fs: Move CONFIG_UNICODE defguards into the code flow

fs/ext4/crypto.c | 19 ++-----
fs/ext4/ext4.h | 35 +++++++-----
fs/ext4/namei.c | 129 ++++++++++++++++-----------------------------
fs/ext4/super.c | 4 +-
fs/f2fs/dir.c | 105 +++++++++++-------------------------
fs/f2fs/f2fs.h | 17 +++++-
fs/f2fs/namei.c | 10 ++--
fs/f2fs/recovery.c | 5 +-
fs/f2fs/super.c | 8 +--
fs/libfs.c | 85 +++++++++++++++++++++++++++++
include/linux/fs.h | 4 ++
11 files changed, 216 insertions(+), 205 deletions(-)

--
2.34.1



2024-02-20 08:56:39

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v12 3/8] libfs: Introduce case-insensitive string comparison helper

From: Gabriel Krisman Bertazi <[email protected]>

generic_ci_match can be used by case-insensitive filesystems to compare
strings under lookup with dirents in a case-insensitive way. This
function is currently reimplemented by each filesystem supporting
casefolding, so this reduces code duplication in filesystem-specific
code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: rework to first test the exact match]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/libfs.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 4 +++
2 files changed, 89 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index bb18884ff20e..65e2fb17a2b6 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1773,6 +1773,91 @@ static const struct dentry_operations generic_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
};
+
+/**
+ * generic_ci_match() - Match a name (case-insensitively) with a dirent.
+ * This is a filesystem helper for comparison with directory entries.
+ * generic_ci_d_compare should be used in VFS' ->d_compare instead.
+ *
+ * @parent: Inode of the parent of the dirent under comparison
+ * @name: name under lookup.
+ * @folded_name: Optional pre-folded name under lookup
+ * @de_name: Dirent name.
+ * @de_name_len: dirent name length.
+ *
+ * Test whether a case-insensitive directory entry matches the filename
+ * being searched. If @folded_name is provided, it is used instead of
+ * recalculating the casefold of @name.
+ *
+ * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
+ * < 0 on error.
+ */
+int generic_ci_match(const struct inode *parent,
+ const struct qstr *name,
+ const struct qstr *folded_name,
+ const u8 *de_name, u32 de_name_len)
+{
+ const struct super_block *sb = parent->i_sb;
+ const struct unicode_map *um = sb->s_encoding;
+ struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
+ struct qstr dirent = QSTR_INIT(de_name, de_name_len);
+ int res, match = 0;
+
+ if (IS_ENCRYPTED(parent)) {
+ const struct fscrypt_str encrypted_name =
+ FSTR_INIT((u8 *) de_name, de_name_len);
+
+ if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
+ return -EINVAL;
+
+ decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
+ if (!decrypted_name.name)
+ return -ENOMEM;
+ res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
+ &decrypted_name);
+ if (res < 0)
+ goto out;
+ dirent.name = decrypted_name.name;
+ dirent.len = decrypted_name.len;
+ }
+
+ /*
+ * Attempt a case-sensitive match first. It is cheaper and
+ * should cover most lookups, including all the sane
+ * applications that expect a case-sensitive filesystem.
+ *
+ * This comparison is safe under RCU because the caller
+ * guarantees the consistency between str and len. See
+ * __d_lookup_rcu_op_compare() for details.
+ */
+ if (folded_name->name) {
+ if (dirent.len == folded_name->len &&
+ !memcmp(folded_name->name, dirent.name, dirent.len)) {
+ match = 1;
+ goto out;
+ }
+ res = utf8_strncasecmp_folded(um, folded_name, &dirent);
+ } else {
+ if (dirent.len == name->len &&
+ !memcmp(name->name, dirent.name, dirent.len) &&
+ (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
+ match = 1;
+ goto out;
+ }
+ res = utf8_strncasecmp(um, name, &dirent);
+ }
+
+out:
+ kfree(decrypted_name.name);
+ if (match) /* matched by direct comparison */
+ return 1;
+ else if (!res) /* matched by utf8 comparison */
+ return 1;
+ else if (res < 0) /* error on utf8 comparison */
+ return res;
+ return 0; /* no match */
+}
+EXPORT_SYMBOL(generic_ci_match);
#endif

#ifdef CONFIG_FS_ENCRYPTION
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 820b93b2917f..7af691ff8d44 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3296,6 +3296,10 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
extern int generic_check_addressable(unsigned, u64);

extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
+extern int generic_ci_match(const struct inode *parent,
+ const struct qstr *name,
+ const struct qstr *folded_name,
+ const u8 *de_name, u32 de_name_len);

static inline bool sb_has_encoding(const struct super_block *sb)
{
--
2.34.1


2024-02-20 08:57:37

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v12 6/8] ext4: Log error when lookup of encoded dentry fails

From: Gabriel Krisman Bertazi <[email protected]>

If the volume is in strict mode, ext4_ci_compare can report a broken
encoding name. This will not trigger on a bad lookup, which is caught
earlier, only if the actual disk name is bad.

Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/ext4/namei.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 6e7af8dc4dde..7d357c417475 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1477,6 +1477,9 @@ static bool ext4_match(struct inode *parent,
* only case where it happens is on a disk
* corruption or ENOMEM.
*/
+ if (ret == -EINVAL)
+ EXT4_ERROR_INODE(parent,
+ "Directory contains filename that is invalid UTF-8");
return false;
}
return ret;
--
2.34.1


2024-02-20 09:07:43

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v12 2/8] f2fs: Simplify the handling of cached insensitive names

From: Gabriel Krisman Bertazi <[email protected]>

Keeping it as qstr avoids the unnecessary conversion in f2fs_match

Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: port to 6.8-rc3]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/f2fs/dir.c | 53 ++++++++++++++++++++++++++--------------------
fs/f2fs/f2fs.h | 17 ++++++++++++++-
fs/f2fs/recovery.c | 5 +----
3 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 3f20d94e12f9..f5b65cf36393 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -42,35 +42,49 @@ static unsigned int bucket_blocks(unsigned int level)
return 4;
}

+#if IS_ENABLED(CONFIG_UNICODE)
/* If @dir is casefolded, initialize @fname->cf_name from @fname->usr_fname. */
int f2fs_init_casefolded_name(const struct inode *dir,
struct f2fs_filename *fname)
{
-#if IS_ENABLED(CONFIG_UNICODE)
struct super_block *sb = dir->i_sb;
+ unsigned char *buf;
+ int len;

if (IS_CASEFOLDED(dir) &&
!is_dot_dotdot(fname->usr_fname->name, fname->usr_fname->len)) {
- fname->cf_name.name = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
- GFP_NOFS, false, F2FS_SB(sb));
- if (!fname->cf_name.name)
+ buf = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
+ GFP_NOFS, false, F2FS_SB(sb));
+ if (!buf)
return -ENOMEM;
- fname->cf_name.len = utf8_casefold(sb->s_encoding,
- fname->usr_fname,
- fname->cf_name.name,
- F2FS_NAME_LEN);
- if ((int)fname->cf_name.len <= 0) {
- kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
- fname->cf_name.name = NULL;
+
+ len = utf8_casefold(sb->s_encoding, fname->usr_fname,
+ buf, F2FS_NAME_LEN);
+ if (len <= 0) {
+ kmem_cache_free(f2fs_cf_name_slab, buf);
if (sb_has_strict_encoding(sb))
return -EINVAL;
/* fall back to treating name as opaque byte sequence */
+ return 0;
}
+ fname->cf_name.name = buf;
+ fname->cf_name.len = len;
}
-#endif
+
return 0;
}

+void f2fs_free_casefolded_name(struct f2fs_filename *fname)
+{
+ unsigned char *buf = (unsigned char *)fname->cf_name.name;
+
+ if (buf) {
+ kmem_cache_free(f2fs_cf_name_slab, buf);
+ fname->cf_name.name = NULL;
+ }
+}
+#endif /* CONFIG_UNICODE */
+
static int __f2fs_setup_filename(const struct inode *dir,
const struct fscrypt_name *crypt_name,
struct f2fs_filename *fname)
@@ -142,12 +156,7 @@ void f2fs_free_filename(struct f2fs_filename *fname)
kfree(fname->crypto_buf.name);
fname->crypto_buf.name = NULL;
#endif
-#if IS_ENABLED(CONFIG_UNICODE)
- if (fname->cf_name.name) {
- kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
- fname->cf_name.name = NULL;
- }
-#endif
+ f2fs_free_casefolded_name(fname);
}

static unsigned long dir_block_index(unsigned int level,
@@ -235,11 +244,9 @@ static inline int f2fs_match_name(const struct inode *dir,
struct fscrypt_name f;

#if IS_ENABLED(CONFIG_UNICODE)
- if (fname->cf_name.name) {
- struct qstr cf = FSTR_TO_QSTR(&fname->cf_name);
-
- return f2fs_match_ci_name(dir, &cf, de_name, de_name_len);
- }
+ if (fname->cf_name.name)
+ return f2fs_match_ci_name(dir, &fname->cf_name,
+ de_name, de_name_len);
#endif
f.usr_fname = fname->usr_fname;
f.disk_name = fname->disk_name;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 84c9fead3ad4..2ff8e52642ec 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -530,7 +530,7 @@ struct f2fs_filename {
* internal operation where usr_fname is also NULL. In all these cases
* we fall back to treating the name as an opaque byte sequence.
*/
- struct fscrypt_str cf_name;
+ struct qstr cf_name;
#endif
};

@@ -3533,8 +3533,23 @@ int f2fs_get_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
/*
* dir.c
*/
+unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de);
+#if IS_ENABLED(CONFIG_UNICODE)
int f2fs_init_casefolded_name(const struct inode *dir,
struct f2fs_filename *fname);
+void f2fs_free_casefolded_name(struct f2fs_filename *fname);
+#else
+static inline int f2fs_init_casefolded_name(const struct inode *dir,
+ struct f2fs_filename *fname)
+{
+ return 0;
+}
+
+static inline void f2fs_free_casefolded_name(struct f2fs_filename *fname)
+{
+}
+#endif /* CONFIG_UNICODE */
+
int f2fs_setup_filename(struct inode *dir, const struct qstr *iname,
int lookup, struct f2fs_filename *fname);
int f2fs_prepare_lookup(struct inode *dir, struct dentry *dentry,
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index aad1d1a9b3d6..8e8501a3a8e0 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -153,11 +153,8 @@ static int init_recovered_filename(const struct inode *dir,
if (err)
return err;
f2fs_hash_filename(dir, fname);
-#if IS_ENABLED(CONFIG_UNICODE)
/* Case-sensitive match is fine for recovery */
- kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ f2fs_free_casefolded_name(fname);
} else {
f2fs_hash_filename(dir, fname);
}
--
2.34.1


2024-02-20 09:16:16

by Eugen Hristev

[permalink] [raw]
Subject: [PATCH v12 7/8] ext4: Move CONFIG_UNICODE defguards into the code flow

From: Gabriel Krisman Bertazi <[email protected]>

Instead of a bunch of ifdefs, make the unicode built checks part of the
code flow where possible, as requested by Torvalds.

Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
[[email protected]: port to 6.8-rc3]
Signed-off-by: Eugen Hristev <[email protected]>
---
fs/ext4/crypto.c | 19 +++----------------
fs/ext4/ext4.h | 33 +++++++++++++++++++++------------
fs/ext4/namei.c | 14 +++++---------
fs/ext4/super.c | 4 +---
4 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/fs/ext4/crypto.c b/fs/ext4/crypto.c
index 7ae0b61258a7..1d2f8b79529c 100644
--- a/fs/ext4/crypto.c
+++ b/fs/ext4/crypto.c
@@ -31,12 +31,7 @@ int ext4_fname_setup_filename(struct inode *dir, const struct qstr *iname,

ext4_fname_from_fscrypt_name(fname, &name);

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, iname, fname);
- if (err)
- ext4_fname_free_filename(fname);
-#endif
- return err;
+ return ext4_fname_setup_ci_filename(dir, iname, fname);
}

int ext4_fname_prepare_lookup(struct inode *dir, struct dentry *dentry,
@@ -51,12 +46,7 @@ int ext4_fname_prepare_lookup(struct inode *dir, struct dentry *dentry,

ext4_fname_from_fscrypt_name(fname, &name);

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname);
- if (err)
- ext4_fname_free_filename(fname);
-#endif
- return err;
+ return ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname);
}

void ext4_fname_free_filename(struct ext4_filename *fname)
@@ -70,10 +60,7 @@ void ext4_fname_free_filename(struct ext4_filename *fname)
fname->usr_fname = NULL;
fname->disk_name.name = NULL;

-#if IS_ENABLED(CONFIG_UNICODE)
- kfree(fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ ext4_fname_free_ci_filename(fname);
}

static bool uuid_is_zero(__u8 u[16])
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 932bae88b4a7..18362054b59e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2742,8 +2742,25 @@ ext4_fsblk_t ext4_inode_to_goal_block(struct inode *);

#if IS_ENABLED(CONFIG_UNICODE)
extern int ext4_fname_setup_ci_filename(struct inode *dir,
- const struct qstr *iname,
- struct ext4_filename *fname);
+ const struct qstr *iname,
+ struct ext4_filename *fname);
+
+static inline void ext4_fname_free_ci_filename(struct ext4_filename *fname)
+{
+ kfree(fname->cf_name.name);
+ fname->cf_name.name = NULL;
+}
+#else
+static inline int ext4_fname_setup_ci_filename(struct inode *dir,
+ const struct qstr *iname,
+ struct ext4_filename *fname)
+{
+ return 0;
+}
+
+static inline void ext4_fname_free_ci_filename(struct ext4_filename *fname)
+{
+}
#endif

/* ext4 encryption related stuff goes here crypto.c */
@@ -2766,16 +2783,11 @@ static inline int ext4_fname_setup_filename(struct inode *dir,
int lookup,
struct ext4_filename *fname)
{
- int err = 0;
fname->usr_fname = iname;
fname->disk_name.name = (unsigned char *) iname->name;
fname->disk_name.len = iname->len;

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, iname, fname);
-#endif
-
- return err;
+ return ext4_fname_setup_ci_filename(dir, iname, fname);
}

static inline int ext4_fname_prepare_lookup(struct inode *dir,
@@ -2787,10 +2799,7 @@ static inline int ext4_fname_prepare_lookup(struct inode *dir,

static inline void ext4_fname_free_filename(struct ext4_filename *fname)
{
-#if IS_ENABLED(CONFIG_UNICODE)
- kfree(fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ ext4_fname_free_ci_filename(fname);
}

static inline int ext4_ioctl_get_encryption_pwsalt(struct file *filp,
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 7d357c417475..822bd16f76fa 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1835,8 +1835,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
}
}

-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
+ if (IS_ENABLED(CONFIG_UNICODE) && !inode && IS_CASEFOLDED(dir)) {
/* Eventually we want to call d_add_ci(dentry, NULL)
* for negative dentries in the encoding case as
* well. For now, prevent the negative dentry
@@ -1844,7 +1843,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
*/
return NULL;
}
-#endif
+
return d_splice_alias(inode, dentry);
}

@@ -3174,16 +3173,14 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
ext4_fc_track_unlink(handle, dentry);
retval = ext4_mark_inode_dirty(handle, dir);

-#if IS_ENABLED(CONFIG_UNICODE)
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at ext4_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif

end_rmdir:
brelse(bh);
@@ -3285,16 +3282,15 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
goto out_trace;

retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
+
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at ext4_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif

out_trace:
trace_ext4_unlink_exit(dentry, retval);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0f931d0c227d..933a9218c20c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3609,14 +3609,12 @@ int ext4_feature_set_ok(struct super_block *sb, int readonly)
return 0;
}

-#if !IS_ENABLED(CONFIG_UNICODE)
- if (ext4_has_feature_casefold(sb)) {
+ if (!IS_ENABLED(CONFIG_UNICODE) && ext4_has_feature_casefold(sb)) {
ext4_msg(sb, KERN_ERR,
"Filesystem with casefold feature cannot be "
"mounted without CONFIG_UNICODE");
return 0;
}
-#endif

if (readonly)
return 1;
--
2.34.1


2024-02-27 23:21:39

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v12 2/8] f2fs: Simplify the handling of cached insensitive names

Eugen Hristev <[email protected]> writes:

> From: Gabriel Krisman Bertazi <[email protected]>
>
> Keeping it as qstr avoids the unnecessary conversion in f2fs_match
>
> Reviewed-by: Eric Biggers <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> [[email protected]: port to 6.8-rc3]
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/f2fs/dir.c | 53 ++++++++++++++++++++++++++--------------------
> fs/f2fs/f2fs.h | 17 ++++++++++++++-
> fs/f2fs/recovery.c | 5 +----
> 3 files changed, 47 insertions(+), 28 deletions(-)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 3f20d94e12f9..f5b65cf36393 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -42,35 +42,49 @@ static unsigned int bucket_blocks(unsigned int level)
> return 4;
> }
>
> +#if IS_ENABLED(CONFIG_UNICODE)
> /* If @dir is casefolded, initialize @fname->cf_name from @fname->usr_fname. */
> int f2fs_init_casefolded_name(const struct inode *dir,
> struct f2fs_filename *fname)
> {
> -#if IS_ENABLED(CONFIG_UNICODE)
> struct super_block *sb = dir->i_sb;
> + unsigned char *buf;
> + int len;
>
> if (IS_CASEFOLDED(dir) &&
> !is_dot_dotdot(fname->usr_fname->name, fname->usr_fname->len)) {
> - fname->cf_name.name = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
> - GFP_NOFS, false, F2FS_SB(sb));
> - if (!fname->cf_name.name)
> + buf = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
> + GFP_NOFS, false, F2FS_SB(sb));
> + if (!buf)
> return -ENOMEM;
> - fname->cf_name.len = utf8_casefold(sb->s_encoding,
> - fname->usr_fname,
> - fname->cf_name.name,
> - F2FS_NAME_LEN);
> - if ((int)fname->cf_name.len <= 0) {
> - kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
> - fname->cf_name.name = NULL;
> +
> + len = utf8_casefold(sb->s_encoding, fname->usr_fname,
> + buf, F2FS_NAME_LEN);
> + if (len <= 0) {
> + kmem_cache_free(f2fs_cf_name_slab, buf);
> if (sb_has_strict_encoding(sb))
> return -EINVAL;
> /* fall back to treating name as opaque byte sequence */
> + return 0;
> }
> + fname->cf_name.name = buf;
> + fname->cf_name.len = len;
> }
> -#endif
> +
> return 0;
> }
>
> +void f2fs_free_casefolded_name(struct f2fs_filename *fname)
> +{
> + unsigned char *buf = (unsigned char *)fname->cf_name.name;
> +
> + if (buf) {
> + kmem_cache_free(f2fs_cf_name_slab, buf);
> + fname->cf_name.name = NULL;
> + }
> +}

If we use kfree here, we can drop the buf !=NULL check.

> +#endif /* CONFIG_UNICODE */
> +
> static int __f2fs_setup_filename(const struct inode *dir,
> const struct fscrypt_name *crypt_name,
> struct f2fs_filename *fname)
> @@ -142,12 +156,7 @@ void f2fs_free_filename(struct f2fs_filename *fname)
> kfree(fname->crypto_buf.name);
> fname->crypto_buf.name = NULL;
> #endif
> -#if IS_ENABLED(CONFIG_UNICODE)
> - if (fname->cf_name.name) {
> - kmem_cache_free(f2fs_cf_name_slab, fname->cf_name.name);
> - fname->cf_name.name = NULL;
> - }
> -#endif
> + f2fs_free_casefolded_name(fname);
> }
>
> static unsigned long dir_block_index(unsigned int level,
> @@ -235,11 +244,9 @@ static inline int f2fs_match_name(const struct inode *dir,
> struct fscrypt_name f;
>
> #if IS_ENABLED(CONFIG_UNICODE)
> - if (fname->cf_name.name) {
> - struct qstr cf = FSTR_TO_QSTR(&fname->cf_name);
> -
> - return f2fs_match_ci_name(dir, &cf, de_name, de_name_len);
> - }
> + if (fname->cf_name.name)
> + return f2fs_match_ci_name(dir, &fname->cf_name,
> + de_name, de_name_len);
> #endif
> f.usr_fname = fname->usr_fname;
> f.disk_name = fname->disk_name;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 84c9fead3ad4..2ff8e52642ec 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -530,7 +530,7 @@ struct f2fs_filename {
> * internal operation where usr_fname is also NULL. In all these cases
> * we fall back to treating the name as an opaque byte sequence.
> */
> - struct fscrypt_str cf_name;
> + struct qstr cf_name;
> #endif
> };
>
> @@ -3533,8 +3533,23 @@ int f2fs_get_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
> /*
> * dir.c
> */
> +unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de);

This is not part of the original patch and doesn't make sense here. It
seems to be included by a bad rebase?


--
Gabriel Krisman Bertazi

2024-02-27 23:33:43

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v12 3/8] libfs: Introduce case-insensitive string comparison helper

Eugen Hristev <[email protected]> writes:

> From: Gabriel Krisman Bertazi <[email protected]>
>
> generic_ci_match can be used by case-insensitive filesystems to compare
> strings under lookup with dirents in a case-insensitive way. This
> function is currently reimplemented by each filesystem supporting
> casefolding, so this reduces code duplication in filesystem-specific
> code.

Just a note that this conflicts with the other patchset to generic
helpers that I just applied. The conflict is trivial, If you could base
the next iteration on top of my for-next, it would be helpful.

>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> [[email protected]: rework to first test the exact match]
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/libfs.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 4 +++
> 2 files changed, 89 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index bb18884ff20e..65e2fb17a2b6 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1773,6 +1773,91 @@ static const struct dentry_operations generic_ci_dentry_ops = {
> .d_hash = generic_ci_d_hash,
> .d_compare = generic_ci_d_compare,
> };
> +
> +/**
> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
> + * This is a filesystem helper for comparison with directory entries.
> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
> + *
> + * @parent: Inode of the parent of the dirent under comparison
> + * @name: name under lookup.
> + * @folded_name: Optional pre-folded name under lookup
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched. If @folded_name is provided, it is used instead of
> + * recalculating the casefold of @name.
> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> + const struct qstr *name,
> + const struct qstr *folded_name,
> + const u8 *de_name, u32 de_name_len)
> +{
> + const struct super_block *sb = parent->i_sb;
> + const struct unicode_map *um = sb->s_encoding;
> + struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> + struct qstr dirent = QSTR_INIT(de_name, de_name_len);
> + int res, match = 0;
> +
> + if (IS_ENCRYPTED(parent)) {
> + const struct fscrypt_str encrypted_name =
> + FSTR_INIT((u8 *) de_name, de_name_len);
> +
> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
> + return -EINVAL;
> +
> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> + if (!decrypted_name.name)
> + return -ENOMEM;
> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
> + &decrypted_name);
> + if (res < 0)
> + goto out;
> + dirent.name = decrypted_name.name;
> + dirent.len = decrypted_name.len;
> + }
> +
> + /*
> + * Attempt a case-sensitive match first. It is cheaper and
> + * should cover most lookups, including all the sane
> + * applications that expect a case-sensitive filesystem.

> + * This comparison is safe under RCU because the caller
> + * guarantees the consistency between str and len. See
> + * __d_lookup_rcu_op_compare() for details.

As I mentioned in the previous review, there's no RCU here. This comment
makes no sense here.

> + */
> + if (folded_name->name) {
> + if (dirent.len == folded_name->len &&
> + !memcmp(folded_name->name, dirent.name, dirent.len)) {
> + match = 1;
> + goto out;
> + }
> + res = utf8_strncasecmp_folded(um, folded_name, &dirent);
> + } else {
> + if (dirent.len == name->len &&
> + !memcmp(name->name, dirent.name, dirent.len) &&
> + (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
> + match = 1;
> + goto out;
> + }
> + res = utf8_strncasecmp(um, name, &dirent);
> + }
> +
> +out:

> + kfree(decrypted_name.name);
> + if (match) /* matched by direct comparison */
> + return 1;
> + else if (!res) /* matched by utf8 comparison */
> + return 1;
> + else if (res < 0) /* error on utf8 comparison */
> + return res;
> + return 0; /* no match */
> +}

It can be simplified to

if (res < 0)
return res;
return (match || !res);

>
> #ifdef CONFIG_FS_ENCRYPTION
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 820b93b2917f..7af691ff8d44 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3296,6 +3296,10 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
> extern int generic_check_addressable(unsigned, u64);
>
> extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
> +extern int generic_ci_match(const struct inode *parent,
> + const struct qstr *name,
> + const struct qstr *folded_name,
> + const u8 *de_name, u32 de_name_len);
>
> static inline bool sb_has_encoding(const struct super_block *sb)
> {

--
Gabriel Krisman Bertazi

2024-02-27 23:38:27

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v12 6/8] ext4: Log error when lookup of encoded dentry fails

Eugen Hristev <[email protected]> writes:

> From: Gabriel Krisman Bertazi <[email protected]>
>
> If the volume is in strict mode, ext4_ci_compare can report a broken
> encoding name. This will not trigger on a bad lookup, which is caught
> earlier, only if the actual disk name is bad.
>
> Reviewed-by: Eric Biggers <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> Signed-off-by: Eugen Hristev <[email protected]>
> ---
> fs/ext4/namei.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 6e7af8dc4dde..7d357c417475 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1477,6 +1477,9 @@ static bool ext4_match(struct inode *parent,
> * only case where it happens is on a disk
> * corruption or ENOMEM.
> */
> + if (ret == -EINVAL)
> + EXT4_ERROR_INODE(parent,
> + "Directory contains filename that is invalid UTF-8");
> return false;
> }
> return ret;

Can you add a patch doing the same for f2fs?


--
Gabriel Krisman Bertazi

2024-02-27 23:49:13

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [PATCH v12 0/8] Cache insensitive cleanup for ext4/f2fs

Eugen Hristev <[email protected]> writes:

> Hello,
>
> I am trying to respin the series here :
> https://www.spinics.net/lists/linux-ext4/msg85081.html

This has a reviewed-by tag from Eric, but since its been years and we've
been going through more changes now, I'd ask you to drop the r-b until
Eric has had a chance to review it and give a new tag.

Thanks,

> I resent some of the v9 patches and got some reviews from Gabriel,
> I did changes as requested and here is v12.
>
> Changes in v12:
> - revert to v10 comparison with propagating the error code from utf comparison
>
> Changes in v11:
> - revert to the original v9 implementation for the comparison helper.
>
> Changes in v10:
> - reworked a bit the comparison helper to improve performance by
> first performing the exact lookup.
>
>
> * Original commit letter
>
> The case-insensitive implementations in f2fs and ext4 have quite a bit
> of duplicated code. This series simplifies the ext4 version, with the
> goal of extracting ext4_ci_compare into a helper library that can be
> used by both filesystems. It also reduces the clutter from many
> codeguards for CONFIG_UNICODE; as requested by Linus, they are part of
> the codeflow now.
>
> While there, I noticed we can leverage the utf8 functions to detect
> encoded names that are corrupted in the filesystem. Therefore, it also
> adds an ext4 error on that scenario, to mark the filesystem as
> corrupted.
>
> This series survived passes of xfstests -g quick.
>
>
> Gabriel Krisman Bertazi (8):
> ext4: Simplify the handling of cached insensitive names
> f2fs: Simplify the handling of cached insensitive names
> libfs: Introduce case-insensitive string comparison helper
> ext4: Reuse generic_ci_match for ci comparisons
> f2fs: Reuse generic_ci_match for ci comparisons
> ext4: Log error when lookup of encoded dentry fails
> ext4: Move CONFIG_UNICODE defguards into the code flow
> f2fs: Move CONFIG_UNICODE defguards into the code flow
>
> fs/ext4/crypto.c | 19 ++-----
> fs/ext4/ext4.h | 35 +++++++-----
> fs/ext4/namei.c | 129 ++++++++++++++++-----------------------------
> fs/ext4/super.c | 4 +-
> fs/f2fs/dir.c | 105 +++++++++++-------------------------
> fs/f2fs/f2fs.h | 17 +++++-
> fs/f2fs/namei.c | 10 ++--
> fs/f2fs/recovery.c | 5 +-
> fs/f2fs/super.c | 8 +--
> fs/libfs.c | 85 +++++++++++++++++++++++++++++
> include/linux/fs.h | 4 ++
> 11 files changed, 216 insertions(+), 205 deletions(-)

--
Gabriel Krisman Bertazi