2022-05-11 21:10:20

by Gabriel Krisman Bertazi

Subject: [PATCH v4 00/10] Clean up the case-insensitive lookup path

The case-insensitive implementations in f2fs and ext4 have quite a bit
of duplicated code. This series simplifies the ext4 version, with the
goal of extracting ext4_match_ci into a helper library that can be
used by both filesystems. It also reduces the clutter from the many
#if guards for CONFIG_UNICODE; as requested by Linus, those checks are
now part of the code flow.

While there, I noticed we can leverage the utf8 functions to detect
encoded names that are corrupted on disk. The series therefore also
reports an ext4 error in that scenario, marking the filesystem as
corrupted.

This series survived passes of xfstests -g quick.

Gabriel Krisman Bertazi (10):
ext4: Match the f2fs ci_compare implementation
ext4: Simplify the handling of cached insensitive names
f2fs: Simplify the handling of cached insensitive names
ext4: Implement ci comparison using unicode_name
ext4: Simplify hash check on ext4_match
ext4: Log error when lookup of encoded dentry fails
ext4: Move ext4_match_ci into libfs
f2fs: Reuse generic_ci_match for ci comparisons
ext4: Move CONFIG_UNICODE defguards into the code flow
f2fs: Move CONFIG_UNICODE defguards into the code flow

fs/ext4/ext4.h | 41 +++++++--------
fs/ext4/namei.c | 126 ++++++++++++++-------------------------------
fs/ext4/super.c | 4 +-
fs/f2fs/dir.c | 103 ++++++++++++------------------------
fs/f2fs/f2fs.h | 3 +-
fs/f2fs/namei.c | 12 ++---
fs/f2fs/recovery.c | 5 +-
fs/f2fs/super.c | 22 ++++----
fs/libfs.c | 61 ++++++++++++++++++++++
include/linux/fs.h | 8 +++
10 files changed, 185 insertions(+), 200 deletions(-)

--
2.36.1



2022-05-11 21:12:56

by Gabriel Krisman Bertazi

Subject: [PATCH v4 06/10] ext4: Log error when lookup of encoded dentry fails

If the volume is in strict mode, ext4_match_ci can report a name with
broken encoding. This does not trigger on a bad lookup name, which is
caught earlier; it triggers only if the actual on-disk name is bad.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

---

Changes since v1:
- reword error message "file in directory" -> "filename" (Eric)
---
fs/ext4/namei.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index cebbcabf0ff0..708811525411 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1458,6 +1458,9 @@ static bool ext4_match(struct inode *parent,
* only case where it happens is on a disk
* corruption or ENOMEM.
*/
+ if (ret == -EINVAL)
+ EXT4_ERROR_INODE(parent,
+ "Bad encoded filename");
return false;
}
return ret;
--
2.36.1
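
A minimal user-space sketch of the caller-side logic this patch extends, assuming the comparison helper keeps the tri-state return documented for ext4_match_ci() (> 0 match, 0 no match, < 0 error). The names toy_match_result() and report_bad_name() are hypothetical; the latter stands in for EXT4_ERROR_INODE().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for EXT4_ERROR_INODE(): count and log corruption. */
static int corruption_reports;

static void report_bad_name(const char *dir)
{
	corruption_reports++;
	fprintf(stderr, "dir %s: bad encoded filename\n", dir);
}

/*
 * Collapse a tri-state comparison result (> 0 match, 0 mismatch, < 0 error)
 * into a bool, as ext4_match() does: any error counts as "no match", and
 * only -EINVAL (invalid on-disk encoding in strict mode) is reported.
 * -ENOMEM is a transient failure and is not logged.
 */
static bool toy_match_result(const char *dir, int ret)
{
	if (ret < 0) {
		if (ret == -EINVAL)
			report_bad_name(dir);
		return false;
	}
	return ret > 0;
}
```

Errors still count as "no match"; the only addition is that -EINVAL is reported, while -ENOMEM is not.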


2022-05-11 22:01:29

by Gabriel Krisman Bertazi

Subject: [PATCH v4 09/10] ext4: Move CONFIG_UNICODE defguards into the code flow

Instead of a bunch of ifdefs, make the CONFIG_UNICODE build checks part of
the code flow where possible, as requested by Torvalds.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/ext4/ext4.h | 39 +++++++++++++++++++--------------------
fs/ext4/namei.c | 15 ++++++---------
fs/ext4/super.c | 4 +---
3 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 93a28fcb2e22..e3c55a8e23bd 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2725,11 +2725,17 @@ extern unsigned ext4_free_clusters_after_init(struct super_block *sb,
struct ext4_group_desc *gdp);
ext4_fsblk_t ext4_inode_to_goal_block(struct inode *);

-#if IS_ENABLED(CONFIG_UNICODE)
extern int ext4_fname_setup_ci_filename(struct inode *dir,
- const struct qstr *iname,
- struct ext4_filename *fname);
+ const struct qstr *iname,
+ struct ext4_filename *fname);
+
+static inline void ext4_fname_free_ci_filename(struct ext4_filename *fname)
+{
+#if IS_ENABLED(CONFIG_UNICODE)
+ kfree(fname->cf_name.name);
+ fname->cf_name.name = NULL;
#endif
+}

#ifdef CONFIG_FS_ENCRYPTION
static inline void ext4_fname_from_fscrypt_name(struct ext4_filename *dst,
@@ -2758,9 +2764,9 @@ static inline int ext4_fname_setup_filename(struct inode *dir,

ext4_fname_from_fscrypt_name(fname, &name);

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, iname, fname);
-#endif
+ if (IS_ENABLED(CONFIG_UNICODE))
+ err = ext4_fname_setup_ci_filename(dir, iname, fname);
+
return err;
}

@@ -2777,9 +2783,9 @@ static inline int ext4_fname_prepare_lookup(struct inode *dir,

ext4_fname_from_fscrypt_name(fname, &name);

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname);
-#endif
+ if (IS_ENABLED(CONFIG_UNICODE))
+ err = ext4_fname_setup_ci_filename(dir, &dentry->d_name, fname);
+
return err;
}

@@ -2794,10 +2800,7 @@ static inline void ext4_fname_free_filename(struct ext4_filename *fname)
fname->usr_fname = NULL;
fname->disk_name.name = NULL;

-#if IS_ENABLED(CONFIG_UNICODE)
- kfree(fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ ext4_fname_free_ci_filename(fname);
}
#else /* !CONFIG_FS_ENCRYPTION */
static inline int ext4_fname_setup_filename(struct inode *dir,
@@ -2810,9 +2813,8 @@ static inline int ext4_fname_setup_filename(struct inode *dir,
fname->disk_name.name = (unsigned char *) iname->name;
fname->disk_name.len = iname->len;

-#if IS_ENABLED(CONFIG_UNICODE)
- err = ext4_fname_setup_ci_filename(dir, iname, fname);
-#endif
+ if (IS_ENABLED(CONFIG_UNICODE))
+ err = ext4_fname_setup_ci_filename(dir, iname, fname);

return err;
}
@@ -2826,10 +2828,7 @@ static inline int ext4_fname_prepare_lookup(struct inode *dir,

static inline void ext4_fname_free_filename(struct ext4_filename *fname)
{
-#if IS_ENABLED(CONFIG_UNICODE)
- kfree(fname->cf_name.name);
- fname->cf_name.name = NULL;
-#endif
+ ext4_fname_free_ci_filename(fname);
}
#endif /* !CONFIG_FS_ENCRYPTION */

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 16fd0df5f8a8..0892f9ee15cf 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1757,8 +1757,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
}
}

-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
+ if (IS_ENABLED(CONFIG_UNICODE) && !inode && IS_CASEFOLDED(dir)) {
/* Eventually we want to call d_add_ci(dentry, NULL)
* for negative dentries in the encoding case as
* well. For now, prevent the negative dentry
@@ -1766,7 +1765,7 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
*/
return NULL;
}
-#endif
+
return d_splice_alias(inode, dentry);
}

@@ -3083,16 +3082,14 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
ext4_fc_track_unlink(handle, dentry);
retval = ext4_mark_inode_dirty(handle, dir);

-#if IS_ENABLED(CONFIG_UNICODE)
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at ext4_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif

end_rmdir:
brelse(bh);
@@ -3188,16 +3185,16 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
retval = __ext4_unlink(handle, dir, &dentry->d_name, d_inode(dentry));
if (!retval)
ext4_fc_track_unlink(handle, dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
+
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
* invalidating the dentries here, alongside with returning the
* negative dentries at ext4_lookup(), when it is better
* supported by the VFS for the CI case.
*/
- if (IS_CASEFOLDED(dir))
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
d_invalidate(dentry);
-#endif
+
if (handle)
ext4_journal_stop(handle);

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1847b46af808..fa0004459dd6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3645,14 +3645,12 @@ int ext4_feature_set_ok(struct super_block *sb, int readonly)
return 0;
}

-#if !IS_ENABLED(CONFIG_UNICODE)
- if (ext4_has_feature_casefold(sb)) {
+ if (!IS_ENABLED(CONFIG_UNICODE) && ext4_has_feature_casefold(sb)) {
ext4_msg(sb, KERN_ERR,
"Filesystem with casefold feature cannot be "
"mounted without CONFIG_UNICODE");
return 0;
}
-#endif

if (readonly)
return 1;
--
2.36.1
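
A user-space sketch of the pattern this patch adopts, with a hypothetical TOY_CONFIG_UNICODE macro standing in for IS_ENABLED(CONFIG_UNICODE). Because the condition is a compile-time constant, the guarded call is parsed and type-checked in every configuration, and the compiler can drop it as dead code when the option is off. Here the callee is defined in both configurations so the program links even without optimization; the patch, which appears to leave ext4_fname_setup_ci_filename() defined only under CONFIG_UNICODE, relies on the dead call being eliminated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build switch standing in for IS_ENABLED(CONFIG_UNICODE). */
#ifndef TOY_CONFIG_UNICODE
#define TOY_CONFIG_UNICODE 0
#endif

struct toy_dir {
	bool casefolded;
};

static int setup_ci_calls;

/*
 * Unlike code hidden behind #if, the call below is compiled and
 * type-checked in every configuration, so this function must at least
 * be declared even when the feature is off.
 */
static int toy_setup_ci_filename(struct toy_dir *dir)
{
	(void)dir;
	setup_ci_calls++;
	return 0;
}

static int toy_setup_filename(struct toy_dir *dir)
{
	int err = 0;

	/* Constant-false condition: the branch is dead when the option is off. */
	if (TOY_CONFIG_UNICODE && dir->casefolded)
		err = toy_setup_ci_filename(dir);
	return err;
}
```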


2022-05-12 00:29:34

by Gabriel Krisman Bertazi

Subject: [PATCH v4 07/10] ext4: Move ext4_match_ci into libfs

Matching case-insensitive names is a generic operation and can be shared
with f2fs. Move it next to the rest of the shared casefold fs code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/ext4/namei.c | 62 +---------------------------------------------
fs/libfs.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 3 +++
3 files changed, 65 insertions(+), 61 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 708811525411..16fd0df5f8a8 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1318,66 +1318,6 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
}

#if IS_ENABLED(CONFIG_UNICODE)
-/**
- * ext4_match_ci() - Match (case-insensitive) a name with a dirent.
- * @parent: Inode of the parent of the dentry.
- * @uname: name under lookup.
- * @de_name: Dirent name.
- * @de_name_len: dirent name length.
- *
- * Test whether a case-insensitive directory entry matches the filename
- * being searched.
- *
- * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
- * < 0 on error.
- */
-static int ext4_match_ci(const struct inode *parent,
- const struct unicode_name *uname,
- u8 *de_name, size_t de_name_len)
-{
- const struct super_block *sb = parent->i_sb;
- const struct unicode_map *um = sb->s_encoding;
- struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
- struct qstr entry = QSTR_INIT(de_name, de_name_len);
- int ret, match = false;
-
- if (IS_ENCRYPTED(parent)) {
- const struct fscrypt_str encrypted_name =
- FSTR_INIT(de_name, de_name_len);
-
- decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
- if (!decrypted_name.name)
- return -ENOMEM;
- ret = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
- &decrypted_name);
- if (ret < 0)
- goto out;
- entry.name = decrypted_name.name;
- entry.len = decrypted_name.len;
- }
-
- if (uname->folded_name->name)
- ret = utf8_strncasecmp_folded(um, uname->folded_name, &entry);
- else
- ret = utf8_strncasecmp(um, uname->usr_name, &entry);
-
- if (!ret)
- match = true;
- else if (ret < 0 && !sb_has_strict_encoding(sb)) {
- /*
- * In non-strict mode, fallback to a byte comparison if
- * the names have invalid characters.
- */
- ret = 0;
- match = ((uname->usr_name->len == entry.len) &&
- !memcmp(uname->usr_name->name, entry.name, entry.len));
- }
-
-out:
- kfree(decrypted_name.name);
- return (ret >= 0) ? match : ret;
-}
-
int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
struct ext4_filename *name)
{
@@ -1451,7 +1391,7 @@ static bool ext4_match(struct inode *parent,
};
int ret;

- ret = ext4_match_ci(parent, &u, de->name, de->name_len);
+ ret = generic_ci_match(parent, &u, de->name, de->name_len);
if (ret < 0) {
/*
* Treat comparison errors as not a match. The
diff --git a/fs/libfs.c b/fs/libfs.c
index 974125270a42..c14b3fa615f5 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1465,6 +1465,67 @@ static const struct dentry_operations generic_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
};
+
+/**
+ * generic_ci_match() - Match (case-insensitive) a name with a dirent.
+ * @parent: Inode of the parent of the dentry.
+ * @uname: name under lookup.
+ * @de_name: Dirent name.
+ * @de_name_len: dirent name length.
+ *
+ * Test whether a case-insensitive directory entry matches the filename
+ * being searched.
+ *
+ * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
+ * < 0 on error.
+ */
+int generic_ci_match(const struct inode *parent,
+ const struct unicode_name *uname,
+ u8 *de_name, size_t de_name_len)
+{
+ const struct super_block *sb = parent->i_sb;
+ const struct unicode_map *um = sb->s_encoding;
+ struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
+ struct qstr entry = QSTR_INIT(de_name, de_name_len);
+ int ret, match = false;
+
+ if (IS_ENCRYPTED(parent)) {
+ const struct fscrypt_str encrypted_name =
+ FSTR_INIT(de_name, de_name_len);
+
+ decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
+ if (!decrypted_name.name)
+ return -ENOMEM;
+ ret = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
+ &decrypted_name);
+ if (ret < 0)
+ goto out;
+ entry.name = decrypted_name.name;
+ entry.len = decrypted_name.len;
+ }
+
+ if (uname->folded_name->name)
+ ret = utf8_strncasecmp_folded(um, uname->folded_name, &entry);
+ else
+ ret = utf8_strncasecmp(um, uname->usr_name, &entry);
+
+ if (!ret)
+ match = true;
+ else if (ret < 0 && !sb_has_strict_encoding(sb)) {
+ /*
+ * In non-strict mode, fallback to a byte comparison if
+ * the names have invalid characters.
+ */
+ ret = 0;
+ match = ((uname->usr_name->len == entry.len) &&
+ !memcmp(uname->usr_name->name, entry.name, entry.len));
+ }
+
+out:
+ kfree(decrypted_name.name);
+ return (ret >= 0) ? match : ret;
+}
+EXPORT_SYMBOL(generic_ci_match);
#endif

#ifdef CONFIG_FS_ENCRYPTION
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f76a18a5f40..6a750b8704c9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3364,6 +3364,9 @@ struct unicode_name {
};

extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
+extern int generic_ci_match(const struct inode *parent,
+ const struct unicode_name *uname, u8 *de_name,
+ size_t de_name_len);

#ifdef CONFIG_MIGRATION
extern int buffer_migrate_page(struct address_space *,
--
2.36.1
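
A user-space sketch of generic_ci_match()'s decision logic, to make the strict/non-strict behaviour concrete. It assumes a hypothetical ASCII-only case-insensitive compare (toy_strncasecmp()) in place of utf8_strncasecmp(), treats any byte >= 0x80 as "invalid in the encoding", and omits decryption and the pre-folded fast path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_qstr {
	const unsigned char *name;
	size_t len;
};

/* Build a toy_qstr from a string literal (not from a pointer). */
#define TOY_QSTR(s) { (const unsigned char *)(s), sizeof(s) - 1 }

static unsigned char toy_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

static bool toy_valid(const struct toy_qstr *q)
{
	for (size_t i = 0; i < q->len; i++)
		if (q->name[i] >= 0x80)
			return false;
	return true;
}

/*
 * Hypothetical stand-in for utf8_strncasecmp(): 0 if equal ignoring ASCII
 * case, 1 if different, -EINVAL if either name is "invalid" (byte >= 0x80).
 */
static int toy_strncasecmp(const struct toy_qstr *a, const struct toy_qstr *b)
{
	if (!toy_valid(a) || !toy_valid(b))
		return -EINVAL;
	if (a->len != b->len)
		return 1;
	for (size_t i = 0; i < a->len; i++)
		if (toy_fold(a->name[i]) != toy_fold(b->name[i]))
			return 1;
	return 0;
}

/*
 * Same control flow as generic_ci_match(): > 0 match, 0 no match, < 0 error.
 * In non-strict mode, an encoding error falls back to an exact byte compare.
 */
static int toy_ci_match(bool strict, const struct toy_qstr *usr,
			const struct toy_qstr *entry)
{
	int ret = toy_strncasecmp(usr, entry);

	if (ret == 0)
		return 1;
	if (ret < 0 && !strict)
		return usr->len == entry->len &&
		       memcmp(usr->name, entry->name, entry->len) == 0;
	return ret < 0 ? ret : 0;
}
```

With strict set, comparing "Makefile" against an entry containing byte 0xE9 returns -EINVAL; without strict, the same pair falls back to memcmp() and simply reports no match.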


2022-05-12 01:31:03

by Gabriel Krisman Bertazi

Subject: [PATCH v4 04/10] ext4: Implement ci comparison using unicode_name

By using a new type here, we can hide most of the casefold caching logic
from ext4. The condition in ext4_match is now quite redundant, but this
is addressed in the next patch.

This doesn't use ext4_filename, to keep the function generic, since it
will be moved to libfs and shared with f2fs.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

--
Changes since v1:
- Instead of (ab)using fscrypt_name, create a new type (ebiggers).
---
fs/ext4/namei.c | 32 +++++++++++++++-----------------
include/linux/fs.h | 5 +++++
2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 84fdb23f09b8..5296ced2e43e 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1321,20 +1321,19 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
/**
* ext4_match_ci() - Match (case-insensitive) a name with a dirent.
* @parent: Inode of the parent of the dentry.
- * @name: name under lookup.
+ * @uname: name under lookup.
* @de_name: Dirent name.
* @de_name_len: dirent name length.
- * @quick: whether @name is already casefolded.
*
* Test whether a case-insensitive directory entry matches the filename
- * being searched. If quick is set, the @name being looked up is
- * already in the casefolded form.
+ * being searched.
*
* Return: > 0 if the directory entry matches, 0 if it doesn't match, or
* < 0 on error.
*/
-static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
- u8 *de_name, size_t de_name_len, bool quick)
+static int ext4_match_ci(const struct inode *parent,
+ const struct unicode_name *uname,
+ u8 *de_name, size_t de_name_len)
{
const struct super_block *sb = parent->i_sb;
const struct unicode_map *um = sb->s_encoding;
@@ -1357,10 +1356,10 @@ static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
entry.len = decrypted_name.len;
}

- if (quick)
- ret = utf8_strncasecmp_folded(um, name, &entry);
+ if (uname->folded_name->name)
+ ret = utf8_strncasecmp_folded(um, uname->folded_name, &entry);
else
- ret = utf8_strncasecmp(um, name, &entry);
+ ret = utf8_strncasecmp(um, uname->usr_name, &entry);

if (!ret)
match = true;
@@ -1370,8 +1369,8 @@ static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
* the names have invalid characters.
*/
ret = 0;
- match = ((name->len == entry.len) &&
- !memcmp(name->name, entry.name, entry.len));
+ match = ((uname->usr_name->len == entry.len) &&
+ !memcmp(uname->usr_name->name, entry.name, entry.len));
}

out:
@@ -1441,6 +1440,10 @@ static bool ext4_match(struct inode *parent,
#if IS_ENABLED(CONFIG_UNICODE)
if (parent->i_sb->s_encoding && IS_CASEFOLDED(parent) &&
(!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
+ struct unicode_name u = {
+ .folded_name = &fname->cf_name,
+ .usr_name = fname->usr_fname
+ };
int ret;

if (fname->cf_name.name) {
@@ -1452,14 +1455,9 @@ static bool ext4_match(struct inode *parent,
return false;
}
}
-
- ret = ext4_match_ci(parent, &fname->cf_name, de->name,
- de->name_len, true);
- } else {
- ret = ext4_match_ci(parent, fname->usr_fname,
- de->name, de->name_len, false);
}

+ ret = ext4_match_ci(parent, &u, de->name, de->name_len);
if (ret < 0) {
/*
* Treat comparison errors as not a match. The
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..3f76a18a5f40 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3358,6 +3358,11 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);

extern int generic_check_addressable(unsigned, u64);

+struct unicode_name {
+ const struct qstr *folded_name;
+ const struct qstr *usr_name;
+};
+
extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);

#ifdef CONFIG_MIGRATION
--
2.36.1
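
A small sketch of what struct unicode_name encodes, using hypothetical toy types: folded_name always points at the cached casefolded buffer, whose name may be NULL, and the comparison takes the cheaper pre-folded path only when that buffer exists. Note that the non-strict byte-compare fallback in the patch always uses usr_name.

```c
#include <assert.h>
#include <stddef.h>

struct toy_qstr {
	const unsigned char *name;
	size_t len;
};

/* Same two-pointer layout as struct unicode_name in the patch. */
struct toy_unicode_name {
	const struct toy_qstr *folded_name;	/* never NULL; ->name may be */
	const struct toy_qstr *usr_name;
};

enum toy_cmp_kind { TOY_CMP_FOLDED, TOY_CMP_FULL };

/*
 * Pick the name to compare: the cached casefolded copy if there is one
 * (like utf8_strncasecmp_folded()), else the user's name, which must then
 * be folded during the comparison (like utf8_strncasecmp()).
 */
static enum toy_cmp_kind toy_pick(const struct toy_unicode_name *u,
				  const struct toy_qstr **out)
{
	if (u->folded_name->name) {
		*out = u->folded_name;
		return TOY_CMP_FOLDED;
	}
	*out = u->usr_name;
	return TOY_CMP_FULL;
}
```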


2022-05-12 07:08:22

by Gabriel Krisman Bertazi

Subject: [PATCH v4 05/10] ext4: Simplify hash check on ext4_match

fname->cf_name.name can only be set when s_encoding is set and the
directory is casefolded (IS_CASEFOLDED), so this check can be simplified.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/ext4/namei.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5296ced2e43e..cebbcabf0ff0 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1438,25 +1438,19 @@ static bool ext4_match(struct inode *parent,
#endif

#if IS_ENABLED(CONFIG_UNICODE)
- if (parent->i_sb->s_encoding && IS_CASEFOLDED(parent) &&
- (!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
+ if (IS_ENCRYPTED(parent) && fname->cf_name.name) {
+ if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
+ fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de))
+ return false;
+ }
+
+ if (parent->i_sb->s_encoding && IS_CASEFOLDED(parent)) {
struct unicode_name u = {
.folded_name = &fname->cf_name,
.usr_name = fname->usr_fname
};
int ret;

- if (fname->cf_name.name) {
- if (IS_ENCRYPTED(parent)) {
- if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
- fname->hinfo.minor_hash !=
- EXT4_DIRENT_MINOR_HASH(de)) {
-
- return false;
- }
- }
- }
-
ret = ext4_match_ci(parent, &u, de->name, de->name_len);
if (ret < 0) {
/*
--
2.36.1


2022-05-12 08:45:18

by Eric Biggers

Subject: Re: [PATCH v4 09/10] ext4: Move CONFIG_UNICODE defguards into the code flow

On Wed, May 11, 2022 at 03:31:45PM -0400, Gabriel Krisman Bertazi wrote:
> Instead of a bunch of ifdefs, make the unicode built checks part of the
> code flow where possible, as requested by Torvalds.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/ext4/ext4.h | 39 +++++++++++++++++++--------------------
> fs/ext4/namei.c | 15 ++++++---------
> fs/ext4/super.c | 4 +---
> 3 files changed, 26 insertions(+), 32 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 93a28fcb2e22..e3c55a8e23bd 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2725,11 +2725,17 @@ extern unsigned ext4_free_clusters_after_init(struct super_block *sb,
> struct ext4_group_desc *gdp);
> ext4_fsblk_t ext4_inode_to_goal_block(struct inode *);
>
> -#if IS_ENABLED(CONFIG_UNICODE)
> extern int ext4_fname_setup_ci_filename(struct inode *dir,
> - const struct qstr *iname,
> - struct ext4_filename *fname);
> + const struct qstr *iname,
> + struct ext4_filename *fname);

I think this function should just have a !CONFIG_UNICODE stub that does nothing,
so that the callers can just call it unconditionally and not have to gate their
call on CONFIG_UNICODE themselves.

- Eric
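
A sketch of the stub approach suggested above, with hypothetical names and a TOY_CONFIG_UNICODE macro standing in for CONFIG_UNICODE: the header provides a no-op static inline when the option is off, so callers call the function unconditionally and need neither #if nor IS_ENABLED().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_UNICODE. */
#ifndef TOY_CONFIG_UNICODE
#define TOY_CONFIG_UNICODE 0
#endif

struct toy_filename {
	const char *cf_name;	/* casefolded copy, NULL if none */
};

#if TOY_CONFIG_UNICODE
/* Real work would live in a .c file; trivial here for illustration. */
static int toy_setup_ci_filename(const char *iname, struct toy_filename *f)
{
	f->cf_name = iname;	/* pretend iname is already folded */
	return 0;
}
#else
/* No-op stub: there is no folded name without the feature. */
static inline int toy_setup_ci_filename(const char *iname,
					struct toy_filename *f)
{
	(void)iname;
	f->cf_name = NULL;
	return 0;
}
#endif

/* The caller needs no configuration check of its own. */
static int toy_prepare_lookup(const char *iname, struct toy_filename *f)
{
	return toy_setup_ci_filename(iname, f);
}
```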

2022-05-12 08:50:41

by Eric Biggers

Subject: Re: [PATCH v4 04/10] ext4: Implement ci comparison using unicode_name

On Wed, May 11, 2022 at 03:31:40PM -0400, Gabriel Krisman Bertazi wrote:
> By using a new type here, we can hide most of the caching casefold logic
> from ext4. The condition in ext4_match is now quite redundant, but this
> is addressed in the next patch.
>
> This doesn't use ext4_filename to keep it generic, since the function
> will be moved to libfs to be shared with f2fs.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> --
> Changes since v1:
> - Instead of (ab)using fscrypt_name, create a new type (ebiggers).
> ---
> fs/ext4/namei.c | 32 +++++++++++++++-----------------
> include/linux/fs.h | 5 +++++
> 2 files changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 84fdb23f09b8..5296ced2e43e 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1321,20 +1321,19 @@ static void dx_insert_block(struct dx_frame *frame, u32 hash, ext4_lblk_t block)
> /**
> * ext4_match_ci() - Match (case-insensitive) a name with a dirent.
> * @parent: Inode of the parent of the dentry.
> - * @name: name under lookup.
> + * @uname: name under lookup.
> * @de_name: Dirent name.
> * @de_name_len: dirent name length.
> - * @quick: whether @name is already casefolded.
> *
> * Test whether a case-insensitive directory entry matches the filename
> - * being searched. If quick is set, the @name being looked up is
> - * already in the casefolded form.
> + * being searched.
> *
> * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> * < 0 on error.
> */
> -static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
> - u8 *de_name, size_t de_name_len, bool quick)
> +static int ext4_match_ci(const struct inode *parent,
> + const struct unicode_name *uname,
> + u8 *de_name, size_t de_name_len)
> {
> const struct super_block *sb = parent->i_sb;
> const struct unicode_map *um = sb->s_encoding;
> @@ -1357,10 +1356,10 @@ static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
> entry.len = decrypted_name.len;
> }
>
> - if (quick)
> - ret = utf8_strncasecmp_folded(um, name, &entry);
> + if (uname->folded_name->name)
> + ret = utf8_strncasecmp_folded(um, uname->folded_name, &entry);
> else
> - ret = utf8_strncasecmp(um, name, &entry);
> + ret = utf8_strncasecmp(um, uname->usr_name, &entry);
>
> if (!ret)
> match = true;
> @@ -1370,8 +1369,8 @@ static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
> * the names have invalid characters.
> */
> ret = 0;
> - match = ((name->len == entry.len) &&
> - !memcmp(name->name, entry.name, entry.len));
> + match = ((uname->usr_name->len == entry.len) &&
> + !memcmp(uname->usr_name->name, entry.name, entry.len));
> }
>
> out:
> @@ -1441,6 +1440,10 @@ static bool ext4_match(struct inode *parent,
> #if IS_ENABLED(CONFIG_UNICODE)
> if (parent->i_sb->s_encoding && IS_CASEFOLDED(parent) &&
> (!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
> + struct unicode_name u = {
> + .folded_name = &fname->cf_name,
> + .usr_name = fname->usr_fname
> + };
> int ret;
>
> if (fname->cf_name.name) {
> @@ -1452,14 +1455,9 @@ static bool ext4_match(struct inode *parent,
> return false;
> }
> }
> -
> - ret = ext4_match_ci(parent, &fname->cf_name, de->name,
> - de->name_len, true);
> - } else {
> - ret = ext4_match_ci(parent, fname->usr_fname,
> - de->name, de->name_len, false);
> }
>
> + ret = ext4_match_ci(parent, &u, de->name, de->name_len);
> if (ret < 0) {
> /*
> * Treat comparison errors as not a match. The
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e2d892b201b0..3f76a18a5f40 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3358,6 +3358,11 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
>
> extern int generic_check_addressable(unsigned, u64);
>
> +struct unicode_name {
> + const struct qstr *folded_name;
> + const struct qstr *usr_name;
> +};
> +
> extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
>
> #ifdef CONFIG_MIGRATION

I don't really see the point of this. The only times struct unicode_name gets
used are when one is initialized on the stack for a single call to
generic_ci_match(). So the end result is just that the function prototype is:

	int generic_ci_match(const struct inode *parent,
			     const struct unicode_name *uname,
			     const u8 *de_name, size_t de_name_len);

... instead of:

	int generic_ci_match(const struct inode *parent, const struct qstr *usr_fname,
			     const struct qstr *folded_name,
			     const u8 *de_name, size_t de_name_len);

So the only effect is to consolidate two parameters into one. I don't think
it's worth it, given that the struct is being created on-demand.

Also note that filenames are not necessarily valid Unicode, so "unicode_name" is
a bit misleading.

- Eric

2022-05-12 13:28:54

by Eric Biggers

Subject: Re: [PATCH v4 06/10] ext4: Log error when lookup of encoded dentry fails

On Wed, May 11, 2022 at 03:31:42PM -0400, Gabriel Krisman Bertazi wrote:
> If the volume is in strict mode, ext4_ci_compare can report a broken
> encoding name. This will not trigger on a bad lookup, which is caught
> earlier, only if the actual disk name is bad.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
>
> Changes since v1:
> - reword error message "file in directory" -> "filename" (Eric)
> ---
> fs/ext4/namei.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index cebbcabf0ff0..708811525411 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1458,6 +1458,9 @@ static bool ext4_match(struct inode *parent,
> * only case where it happens is on a disk
> * corruption or ENOMEM.
> */
> + if (ret == -EINVAL)
> + EXT4_ERROR_INODE(parent,
> + "Bad encoded filename");

This message is still quite vague; perhaps it should be more specific about what
a "bad" filename is? Maybe something like: "Directory contains filename that is
not valid UTF-8" (or whatever the encoding being enforced is).

- Eric
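
To make "bad" concrete for a UTF-8 volume: a name is bad if it is not well-formed UTF-8. The sketch below checks well-formedness per RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). toy_utf8_valid() is hypothetical; the kernel performs its own checks through the fs/unicode utf8 tables, which this does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool toy_utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t n = 0;				/* continuation bytes */
		unsigned char lo = 0x80, hi = 0xBF;	/* range of 2nd byte */

		if (c < 0x80) {
			i++;
			continue;
		} else if (c >= 0xC2 && c <= 0xDF) {
			n = 1;
		} else if (c >= 0xE0 && c <= 0xEF) {
			n = 2;
			if (c == 0xE0)
				lo = 0xA0;	/* reject overlong forms */
			else if (c == 0xED)
				hi = 0x9F;	/* reject UTF-16 surrogates */
		} else if (c >= 0xF0 && c <= 0xF4) {
			n = 3;
			if (c == 0xF0)
				lo = 0x90;	/* reject overlong forms */
			else if (c == 0xF4)
				hi = 0x8F;	/* reject > U+10FFFF */
		} else {
			return false;	/* 0x80-0xC1 and 0xF5-0xFF never lead */
		}
		if (len - i <= n)
			return false;	/* truncated sequence */
		if (s[i + 1] < lo || s[i + 1] > hi)
			return false;
		for (size_t k = 2; k <= n; k++)
			if (s[i + k] < 0x80 || s[i + k] > 0xBF)
				return false;
		i += n + 1;
	}
	return true;
}
```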

2022-05-13 02:23:41

by Eric Biggers

Subject: Re: [PATCH v4 07/10] ext4: Move ext4_match_ci into libfs

On Wed, May 11, 2022 at 03:31:43PM -0400, Gabriel Krisman Bertazi wrote:
> Matching case-insensitive names is a generic operation and can be shared
> with f2fs. Move it next to the rest of the shared casefold fs code.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/ext4/namei.c | 62 +---------------------------------------------
> fs/libfs.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 3 +++
> 3 files changed, 65 insertions(+), 61 deletions(-)

It might be a good idea to split this into two patches, one for the libfs part
and one for the ext4 part. That would make sorting out the dependencies of this
series easier in case it doesn't all go in in one cycle.

> +/**
> + * generic_ci_match() - Match (case-insensitive) a name with a dirent.
> + * @parent: Inode of the parent of the dentry.
> + * @uname: name under lookup.
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched.
> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> + const struct unicode_name *uname,
> + u8 *de_name, size_t de_name_len)

de_name should be const, like it is in the f2fs version. It does get cast away
temporarily when it is stored in a fscrypt_str, but it never gets modified (and
must not be) so const is appropriate.

- Eric
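
A sketch of the const pattern described above, with a hypothetical struct toy_fstr mirroring the non-const name field of struct fscrypt_str: the parameter is const because the function never writes to it, and the cast needed to wrap it is safe only because nothing writes through the wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of struct fscrypt_str: the name is not const. */
struct toy_fstr {
	unsigned char *name;
	unsigned int len;
};

static int toy_entry_equals(const unsigned char *de_name, size_t de_name_len,
			    const char *want)
{
	/* Cast away const only to fit the struct; entry.name is never written. */
	const struct toy_fstr entry = {
		.name = (unsigned char *)de_name,
		.len = (unsigned int)de_name_len,
	};

	return entry.len == strlen(want) &&
	       memcmp(entry.name, want, entry.len) == 0;
}
```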

2022-05-13 09:08:11

by Gabriel Krisman Bertazi

Subject: [PATCH v4 02/10] ext4: Simplify the handling of cached insensitive names

Keeping cf_name as a qstr avoids the unnecessary conversion in ext4_match().

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

--
Changes since v1:
- Simplify hunk (eric)
---
fs/ext4/ext4.h | 2 +-
fs/ext4/namei.c | 22 +++++++++++-----------
2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a743b1e3b89e..93a28fcb2e22 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2490,7 +2490,7 @@ struct ext4_filename {
struct fscrypt_str crypto_buf;
#endif
#if IS_ENABLED(CONFIG_UNICODE)
- struct fscrypt_str cf_name;
+ struct qstr cf_name;
#endif
};

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 59eb3ecfdea7..84fdb23f09b8 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1382,7 +1382,8 @@ static int ext4_match_ci(const struct inode *parent, const struct qstr *name,
int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
struct ext4_filename *name)
{
- struct fscrypt_str *cf_name = &name->cf_name;
+ struct qstr *cf_name = &name->cf_name;
+ unsigned char *buf;
struct dx_hash_info *hinfo = &name->hinfo;
int len;

@@ -1392,18 +1393,18 @@ int ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname,
return 0;
}

- cf_name->name = kmalloc(EXT4_NAME_LEN, GFP_NOFS);
- if (!cf_name->name)
+ buf = kmalloc(EXT4_NAME_LEN, GFP_NOFS);
+ if (!buf)
return -ENOMEM;

- len = utf8_casefold(dir->i_sb->s_encoding,
- iname, cf_name->name,
- EXT4_NAME_LEN);
+ len = utf8_casefold(dir->i_sb->s_encoding, iname, buf, EXT4_NAME_LEN);
if (len <= 0) {
- kfree(cf_name->name);
- cf_name->name = NULL;
+ kfree(buf);
+ buf = NULL;
}
+ cf_name->name = buf;
cf_name->len = (unsigned) len;
+
if (!IS_ENCRYPTED(dir))
return 0;

@@ -1443,8 +1444,6 @@ static bool ext4_match(struct inode *parent,
int ret;

if (fname->cf_name.name) {
- struct qstr cf = {.name = fname->cf_name.name,
- .len = fname->cf_name.len};
if (IS_ENCRYPTED(parent)) {
if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
fname->hinfo.minor_hash !=
@@ -1453,7 +1452,8 @@ static bool ext4_match(struct inode *parent,
return false;
}
}
- ret = ext4_match_ci(parent, &cf, de->name,
+
+ ret = ext4_match_ci(parent, &fname->cf_name, de->name,
de->name_len, true);
} else {
ret = ext4_match_ci(parent, fname->usr_fname,
--
2.36.1
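
A user-space sketch of the shape ext4_fname_setup_ci_filename() takes after this patch, using hypothetical toy names and an ASCII-only toy casefold in place of utf8_casefold(). Since a qstr's name is const, the folded name is built in a local non-const buffer and only then published; on failure the name is NULL, which callers treat as "no cached folded name". toy_free_cf_name() mirrors the kfree-and-clear done when the filename is released.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the name/len part of struct qstr: the name is const. */
struct toy_qstr {
	const unsigned char *name;
	unsigned int len;
};

#define TOY_NAME_LEN 255	/* stand-in for EXT4_NAME_LEN */

/* Toy casefold: ASCII lowercase. Returns the length, or -1 if it won't fit. */
static int toy_casefold(const struct toy_qstr *in, unsigned char *out,
			size_t cap)
{
	if (in->len > cap)
		return -1;
	for (unsigned int i = 0; i < in->len; i++) {
		unsigned char c = in->name[i];
		out[i] = (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
	}
	return (int)in->len;
}

static int toy_setup_cf_name(const struct toy_qstr *iname,
			     struct toy_qstr *cf_name)
{
	/* Fill a local non-const buffer, since qstr.name is const. */
	unsigned char *buf = malloc(TOY_NAME_LEN);
	int len;

	if (!buf)
		return -ENOMEM;
	len = toy_casefold(iname, buf, TOY_NAME_LEN);
	if (len <= 0) {
		/* No usable folded name: callers fall back to the user name. */
		free(buf);
		buf = NULL;
		len = 0;
	}
	cf_name->name = buf;
	cf_name->len = (unsigned int)len;
	return 0;
}

/* Release the folded copy and clear the pointer. */
static void toy_free_cf_name(struct toy_qstr *cf_name)
{
	free((void *)cf_name->name);
	cf_name->name = NULL;
}
```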


2022-05-13 17:55:06

by Eric Biggers

Subject: Re: [PATCH v4 05/10] ext4: Simplify hash check on ext4_match

On Wed, May 11, 2022 at 03:31:41PM -0400, Gabriel Krisman Bertazi wrote:
> The existence of fname->cf_name.name requires s_encoding & IS_CASEFOLDED,
> therefore this can be simplified.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/ext4/namei.c | 20 +++++++-------------
> 1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 5296ced2e43e..cebbcabf0ff0 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1438,25 +1438,19 @@ static bool ext4_match(struct inode *parent,
> #endif
>
> #if IS_ENABLED(CONFIG_UNICODE)
> - if (parent->i_sb->s_encoding && IS_CASEFOLDED(parent) &&
> - (!IS_ENCRYPTED(parent) || fscrypt_has_encryption_key(parent))) {
> + if (IS_ENCRYPTED(parent) && fname->cf_name.name) {
> + if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
> + fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de))
> + return false;
> + }
> +
> + if (parent->i_sb->s_encoding && IS_CASEFOLDED(parent)) {
> struct unicode_name u = {
> .folded_name = &fname->cf_name,
> .usr_name = fname->usr_fname
> };
> int ret;
>
> - if (fname->cf_name.name) {
> - if (IS_ENCRYPTED(parent)) {
> - if (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
> - fname->hinfo.minor_hash !=
> - EXT4_DIRENT_MINOR_HASH(de)) {
> -
> - return false;
> - }
> - }
> - }
> -

I don't think it's correct to delete the check for the encryption key here. If
lookup is by no-key name, then fscrypt_match_name() must be used, not
generic_ci_match(). And unlike f2fs, ext4 doesn't keep track of whether the
whole lookup is by no-key name; ext4 relies on this fscrypt_has_encryption_key()
check at the last minute when doing each individual comparison. (Which is not
great, but that's how it works now.)

- Eric
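
The objection above in miniature: modelling the two versions of the branch condition as booleans (hypothetical toy names) and enumerating every combination shows that they differ exactly for encrypted, casefolded directories whose key is not available, i.e. the no-key lookups that must go through fscrypt_match_name() instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags describing one directory-entry comparison. */
struct toy_ctx {
	bool has_encoding;	/* sb->s_encoding != NULL */
	bool casefolded;	/* IS_CASEFOLDED(parent) */
	bool encrypted;		/* IS_ENCRYPTED(parent) */
	bool has_key;		/* fscrypt_has_encryption_key(parent) */
};

/* Condition before patch 05: case-insensitive path only with the key. */
static bool old_takes_ci_path(const struct toy_ctx *c)
{
	return c->has_encoding && c->casefolded &&
	       (!c->encrypted || c->has_key);
}

/* Condition after patch 05: the key check is gone. */
static bool new_takes_ci_path(const struct toy_ctx *c)
{
	return c->has_encoding && c->casefolded;
}
```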

2022-05-18 03:32:08

by Theodore Ts'o

Subject: Re: [PATCH v4 00/10] Clean up the case-insensitive lookup path

On Wed, May 11, 2022 at 03:31:36PM -0400, Gabriel Krisman Bertazi wrote:
> The case-insensitive implementations in f2fs and ext4 have quite a bit
> of duplicated code. This series simplifies the ext4 version, with the
> goal of extracting ext4_ci_compare into a helper library that can be
> used by both filesystems. It also reduces the clutter from many
> codeguards for CONFIG_UNICODE; as requested by Linus, they are part of
> the codeflow now.
>
> While there, I noticed we can leverage the utf8 functions to detect
> encoded names that are corrupted in the filesystem. Therefore, it also
> adds an ext4 error on that scenario, to mark the filesystem as
> corrupted.

Gabriel, are you planning on doing another version of this patch series?

It looks like the first two patches for ext4 are not controversial, so
I could take those, while some of the other patches have questions
which Eric has raised.

Thanks,

- Ted


2022-05-18 03:39:29

by Theodore Ts'o

Subject: Re: [PATCH v4 00/10] Clean up the case-insensitive lookup path

On Tue, May 17, 2022 at 03:57:05PM -0400, Gabriel Krisman Bertazi wrote:
>
> I'll be reworking the series to apply Eric's comments and I might render
> patch 1 unnecessary. I'd be happy to send a v5 for the whole thing
> instead of applying the first two now.

OK, great, I'll wait for the v5 patch series.

Thanks,

- Ted

2022-05-18 04:23:09

by Gabriel Krisman Bertazi

Subject: Re: [PATCH v4 00/10] Clean up the case-insensitive lookup path

"Theodore Ts'o" <[email protected]> writes:

> On Wed, May 11, 2022 at 03:31:36PM -0400, Gabriel Krisman Bertazi wrote:
>> The case-insensitive implementations in f2fs and ext4 have quite a bit
>> of duplicated code. This series simplifies the ext4 version, with the
>> goal of extracting ext4_ci_compare into a helper library that can be
>> used by both filesystems. It also reduces the clutter from many
>> codeguards for CONFIG_UNICODE; as requested by Linus, they are part of
>> the codeflow now.
>>
>> While there, I noticed we can leverage the utf8 functions to detect
>> encoded names that are corrupted in the filesystem. Therefore, it also
>> adds an ext4 error on that scenario, to mark the filesystem as
>> corrupted.
>
> Gabriel, are you planning on doing another version of this patch
> series?
> It looks like the first two patches for ext4 are not controversial, so
> I could take those, while some of the other patches have questions
> which Eric has raised.

Hi Ted,

I'll be reworking the series to apply Eric's comments and I might render
patch 1 unnecessary. I'd be happy to send a v5 for the whole thing
instead of applying the first two now.

Thanks,


--
Gabriel Krisman Bertazi