Hi,
This is the v4 of the negative dentry support on case-insensitive
directories. It doesn't have any functional changes from v1. It applies
Eric's comments to bring the flags check closet together, improve the
documentation and improve comments in the code. I also relooked at the
locks to ensure the inode read lock is indeed enough in the lookup_slow
path.
As usual, retested with xfstests.
--
cover letter from v1.
This patchset enables negative dentries for case-insensitive directories
in ext4/f2fs. It solves the corner cases for this feature, including
those already tested by fstests (generic/556). It also solves an
existing bug with the existing implementation where old negative
dentries are left behind after a directory conversion to
case-insensitive.
Testing-wise, I ran sanity checks to show it properly uses the created
negative dentries, observed the expected performance increase of the
dentry cache hit, and showed it survives the quick group in fstests on
both f2fs and ext4 without regressions.
* Background
Negative dentries have always been disabled in case-insensitive
directories because, in their current form, they can't provide enough
assurances that all the case variations of a filename won't exist in a
directory, and the name-preserving case-insenstive semantics
during file creation prevents some negative dentries from being
instantiated unmodified.
Nevertheless, for the general case, the existing implementation would
already work with negative dentries, even though they are fully
disabled. That is: if the original lookup that created the dentry was
done in a case-insensitive way, the negative dentry can usually be
validated, since it assures that no other dcache entry exists, *and*
that no variation of the file exists on disk (since the lookup
failed). A following lookup would then be executed with the
case-insensitive-aware d_hash and d_lookup, which would find the right
negative dentry and use it.
The first corner case arises when a case-insensitive directory has
negative dentries that were created before the directory was flipped to
case-insensitive. A directory must be empty to be converted, but it
doesn't mean the directory doesn't have negative dentry children. If
that happens, the dangling dentries left behind can't assure that no
case-variation of the name exists. They only mean the exact name
doesn't exist. A further lookup would incorrectly validate them.
The code below demonstrates the problem. In this example $1 and $2 are
two strings, where:
(i) $1 != $2
(ii) casefold($1) == casefold($2)
(iii) hash($1) == hash($2) == hash(casefold($1))
Then, the following sequence could potentially return a ENOENT, even
though the case-insensitive lookup should exist:
mkdir d <- Case-sensitive directory
touch d/$1
touch d/$2
unlink d/$1 <- leaves negative dentry behind.
unlink d/$2 <- leaves *another* negative dentry behind.
chattr +F d <- make 'd' case-insensitive.
touch d/$1 <- Both negative dentries could match. finds one of them,
and instantiate
access d/$1 <- Find the other negative dentry, get -ENOENT.
In fact, this is a problem even on the current implementation, where
negative dentries for CI are disabled. There was a bug reported by Al
Viro in 2020, where a directory might end up with dangling negative
dentries created during a case-sensitive lookup, because they existed
before the +F attribute was set.
It is hard to trigger the issue, because condition (iii) is hard to test
on an unmodified kernel. By hacking the kernel to force the hash
collision, there are a few ways we can trigger this bizarre behavior in
case-insensitive directories through the insertion of negative dentries.
Another problem exists when turning a negative dentry to positive. If
the negative dentry has a different case than what is currently being
used for lookup, the dentry cannot be reused without changing its name,
in order to guarantee filename-preserving semantics to userspace. We
need to either change the name or invalidate the dentry. This issue is
currently avoided in mainline, since the negative dentry mechanism is
disabled.
* Proposal
The main idea is to differentiate negative dentries created in a
case-insensitive context from those created during a case-sensitive
lookup via a new dentry flag, D_CASEFOLD_LOOKUP, set by the filesystem
the d_lookup hook. Since the former can be used (except for the
name-preserving issue), d_revalidate will just check the flag to
quickly accept or reject the dentry.
A different solution would be to guarantee no negative dentry exists
during the case-sensitive to case-insensitive directory conversion (the
other direction is safe). It has the following problems:
1) It is not trivial to implement a race-free mechanism to ensure
negative dentries won't be recreated immediately after invalidation
while converting the directory.
2) The knowledge whether the negative dentry is valid (i.e. comes from
a case-insensitive lookup) is implicit on the fact that we are
correctly invalidating dentries when converting the directory.
Having a D_CASEFOLD_LOOKUP avoids both issues, and seems to be a cheap
solution to the problem.
But, as explained above, due to the filename preserving semantics, we
cannot just validate based on D_CASEFOLD_LOOKUP.
For that, one solution would be to invalidate the negative dentry when
it is decided to turn it positive, instead of reusing it. I implemented
that in the past (2018) but Al Viro made it clear we don't want to incur
costs on the VFS critical path for filesystems who don't care about
case-insensitiveness.
Instead, this patch invalidates negative dentries in casefold
directories in d_revalidate during creation lookups, iff the lookup name
is not exactly what is cached. Other kinds of lookups wouldn't need
this limitation.
* caveats
1) Encryption
Negative dentries on case-insensitive encrypted directories are also
disabled. No semantic change for them is intended in
this patchset; we just bypass the revalidation directly to fscrypt, for
positive dentries. Encryption support is future work.
2) revalidate the cached dentry using the name under lookup
Validating based on the lookup name is strange for a cache. the new
semantic is implemented by d_revalidate, to stay out of the critical
path of filesystems who don't care about case-insensitiveness, as much
as possible. The only change is the addition of a new flavor of
d_revalidate.
* Tests
There are a tests in place for most of the corner cases in generic/556.
They mainly verify the name-preserving semantics. The invalidation when
converting the directory is harder to test, because it is hard to force
the invalidation of specific cached dentries that occlude a dangling
invalid dentry. I tested it with forcing the positive dentries to be
removed, but I'm not sure how to write an upstreamable test.
It also survives fstests quick group regression testing on both ext4 and
f2fs.
* Performance
The latency of lookups of non-existing files is obviously improved, as
would be expected. The following numbers compare the execution time of 10^6
lookups of a non-existing file in a case-insensitive directory
pre-populated with 100k files in ext4.
Without the patch: 10.363s / 0.349s / 9.920s (real/user/sys)
With the patch: 1.752s / 0.276s / 1.472s (real/user/sys)
* patchset
Patch 1 introduces a new flavor of d_revalidate to provide the
filesystem with the name under lookup; Patch 2 introduces the new flag
to signal the dentry creation context; Patch 3 introduces a libfs helper
to revalidate negative dentries on case-insensitive directories; Patch 4
deals with encryption; Patch 5 cleans up the now redundant dentry
operations for case-insensitive with and without encryption; Finally,
Patch 6 and 7 enable support on case-insensitive directories
for ext4 and f2fs, respectively.
Gabriel Krisman Bertazi (7):
fs: Expose name under lookup to d_revalidate hook
fs: Add DCACHE_CASEFOLDED_NAME flag
libfs: Validate negative dentries in case-insensitive directories
libfs: Chain encryption checks after case-insensitive revalidation
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
ext4: Enable negative dentries on case-insensitive lookup
f2fs: Enable negative dentries on case-insensitive lookup
Documentation/filesystems/locking.rst | 3 +
Documentation/filesystems/vfs.rst | 12 ++++
fs/dcache.c | 10 ++-
fs/ext4/namei.c | 35 ++-------
fs/f2fs/namei.c | 25 ++-----
fs/libfs.c | 100 +++++++++++++++++---------
fs/namei.c | 23 +++---
include/linux/dcache.h | 9 +++
8 files changed, 123 insertions(+), 94 deletions(-)
--
2.41.0
From: Gabriel Krisman Bertazi <[email protected]>
Negative dentries support on case-insensitive ext4/f2fs will require
access to the name under lookup to ensure it matches the dentry. This
adds an optional new flavor of cached dentry revalidation hook to expose
this extra parameter.
I'm fine with extending d_revalidate instead of adding a new hook, if
it is considered cleaner and the approach is accepted. I wrote a new
hook to simplify reviewing.
Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
Changes since v2:
- Document d_revalidate_name hook. (Eric)
---
Documentation/filesystems/locking.rst | 3 +++
Documentation/filesystems/vfs.rst | 12 ++++++++++++
fs/dcache.c | 2 +-
fs/namei.c | 23 ++++++++++++++---------
include/linux/dcache.h | 1 +
5 files changed, 31 insertions(+), 10 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index ed148919e11a..d68997ba6584 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -18,6 +18,8 @@ dentry_operations
prototypes::
int (*d_revalidate)(struct dentry *, unsigned int);
+ int (*d_revalidate_name)(struct dentry *, const struct qstr *,
+ unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
int (*d_hash)(const struct dentry *, struct qstr *);
int (*d_compare)(const struct dentry *,
@@ -37,6 +39,7 @@ locking rules:
ops rename_lock ->d_lock may block rcu-walk
================== =========== ======== ============== ========
d_revalidate: no no yes (ref-walk) maybe
+d_revalidate_name: no no yes (ref-walk) maybe
d_weak_revalidate: no no yes no
d_hash no no no maybe
d_compare: yes no no maybe
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index cb2a97e49872..34c842bd7cb2 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1252,6 +1252,8 @@ defined:
struct dentry_operations {
int (*d_revalidate)(struct dentry *, unsigned int);
+ int (*d_revalidate_name)(struct dentry *, const struct qstr *,
+ unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
int (*d_hash)(const struct dentry *, struct qstr *);
int (*d_compare)(const struct dentry *,
@@ -1288,6 +1290,16 @@ defined:
return
-ECHILD and it will be called again in ref-walk mode.
+``d_revalidate_name``
+ Variant of d_revalidate that also provides the name under look-up. Most
+ filesystems will keep it as NULL, unless there are particular semantics
+ for filenames encoding that need to be handled during dentry
+ revalidation.
+
+ When available, it is called in lieu of d_revalidate and has the same
+ locking rules and return semantics. Refer to d_revalidate for more
+ information.
+
``d_weak_revalidate``
called when the VFS needs to revalidate a "jumped" dentry. This
is called when a path-walk ends at dentry that was not acquired
diff --git a/fs/dcache.c b/fs/dcache.c
index 52e6d5fdab6b..98521862e58a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1928,7 +1928,7 @@ void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
dentry->d_flags |= DCACHE_OP_HASH;
if (op->d_compare)
dentry->d_flags |= DCACHE_OP_COMPARE;
- if (op->d_revalidate)
+ if (op->d_revalidate || op->d_revalidate_name)
dentry->d_flags |= DCACHE_OP_REVALIDATE;
if (op->d_weak_revalidate)
dentry->d_flags |= DCACHE_OP_WEAK_REVALIDATE;
diff --git a/fs/namei.c b/fs/namei.c
index e56ff39a79bc..84df0ddd20db 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -853,11 +853,16 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
return false;
}
-static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
+static inline int d_revalidate(struct dentry *dentry,
+ const struct qstr *name,
+ unsigned int flags)
{
- if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
+
+ if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
+ if (dentry->d_op->d_revalidate_name)
+ return dentry->d_op->d_revalidate_name(dentry, name, flags);
return dentry->d_op->d_revalidate(dentry, flags);
- else
+ } else
return 1;
}
@@ -1565,7 +1570,7 @@ static struct dentry *lookup_dcache(const struct qstr *name,
{
struct dentry *dentry = d_lookup(dir, name);
if (dentry) {
- int error = d_revalidate(dentry, flags);
+ int error = d_revalidate(dentry, name, flags);
if (unlikely(error <= 0)) {
if (!error)
d_invalidate(dentry);
@@ -1636,19 +1641,19 @@ static struct dentry *lookup_fast(struct nameidata *nd)
if (read_seqcount_retry(&parent->d_seq, nd->seq))
return ERR_PTR(-ECHILD);
- status = d_revalidate(dentry, nd->flags);
+ status = d_revalidate(dentry, &nd->last, nd->flags);
if (likely(status > 0))
return dentry;
if (!try_to_unlazy_next(nd, dentry))
return ERR_PTR(-ECHILD);
if (status == -ECHILD)
/* we'd been told to redo it in non-rcu mode */
- status = d_revalidate(dentry, nd->flags);
+ status = d_revalidate(dentry, &nd->last, nd->flags);
} else {
dentry = __d_lookup(parent, &nd->last);
if (unlikely(!dentry))
return NULL;
- status = d_revalidate(dentry, nd->flags);
+ status = d_revalidate(dentry, &nd->last, nd->flags);
}
if (unlikely(status <= 0)) {
if (!status)
@@ -1676,7 +1681,7 @@ static struct dentry *__lookup_slow(const struct qstr *name,
if (IS_ERR(dentry))
return dentry;
if (unlikely(!d_in_lookup(dentry))) {
- int error = d_revalidate(dentry, flags);
+ int error = d_revalidate(dentry, name, flags);
if (unlikely(error <= 0)) {
if (!error) {
d_invalidate(dentry);
@@ -3421,7 +3426,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
if (d_in_lookup(dentry))
break;
- error = d_revalidate(dentry, nd->flags);
+ error = d_revalidate(dentry, &nd->last, nd->flags);
if (likely(error > 0))
break;
if (error)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6b351e009f59..b6188f2e8950 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -127,6 +127,7 @@ enum dentry_d_lock_class
struct dentry_operations {
int (*d_revalidate)(struct dentry *, unsigned int);
+ int (*d_revalidate_name)(struct dentry *, const struct qstr *, unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
int (*d_hash)(const struct dentry *, struct qstr *);
int (*d_compare)(const struct dentry *,
--
2.41.0
From: Gabriel Krisman Bertazi <[email protected]>
Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
equivalent. Merge them together and simplify the setup code.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
changes since v2:
- reword comment for clarity (Eric)
---
fs/libfs.c | 45 +++++++++++++--------------------------------
1 file changed, 13 insertions(+), 32 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index 44c02993adb4..957dd12c1f25 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1516,7 +1516,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
return fscrypt_d_revalidate(dentry, flags);
}
-static const struct dentry_operations generic_ci_dentry_ops = {
+static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
.d_revalidate_name = generic_ci_d_revalidate,
@@ -1529,26 +1529,19 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
};
#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
- .d_hash = generic_ci_d_hash,
- .d_compare = generic_ci_d_compare,
- .d_revalidate_name = generic_ci_d_revalidate,
-};
-#endif
-
/**
* generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
* @dentry: dentry to set ops on
*
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively. Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later. As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
+ * Casefolded directories need some dentry_operations set, so that the dentries
+ * contained in them are handled case-insensitively. Note that d_hash and
+ * d_compare are needed on the parent directory rather than on the dentries in
+ * it, and while the casefolding flag can be toggled on and off on an empty
+ * directory, dentry_operations can't be changed later. As a result, if the
+ * filesystem has casefolding support enabled at all, we have to give all
+ * dentries the casefolding operations even if their inode doesn't have the
+ * casefolding flag currently (and thus the casefolding ops would be no-ops for
+ * now).
*
* Encryption works differently in that the only dentry operation it needs is
* d_revalidate, which it only needs on dentries that have the no-key name flag.
@@ -1557,34 +1550,22 @@ static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
* Finally, to maximize compatibility with overlayfs (which isn't compatible
* with certain dentry operations) and to avoid taking an unnecessary
* performance hit, we use custom dentry_operations for each possible
- * combination rather than always installing all operations.
+ * combination of operations rather than always installing them.
*/
void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
{
-#ifdef CONFIG_FS_ENCRYPTION
- bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
#if IS_ENABLED(CONFIG_UNICODE)
- bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
- if (needs_encrypt_ops && needs_ci_ops) {
+ if (dentry->d_sb->s_encoding) {
d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
return;
}
#endif
#ifdef CONFIG_FS_ENCRYPTION
- if (needs_encrypt_ops) {
+ if (dentry->d_flags & DCACHE_NOKEY_NAME) {
d_set_d_op(dentry, &generic_encrypted_dentry_ops);
return;
}
#endif
-#if IS_ENABLED(CONFIG_UNICODE)
- if (needs_ci_ops) {
- d_set_d_op(dentry, &generic_ci_dentry_ops);
- return;
- }
-#endif
}
EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
--
2.41.0
From: Gabriel Krisman Bertazi <[email protected]>
Support encrypted dentries in generic_ci_d_revalidate by chaining
fscrypt_d_revalidate at the tail of the d_revalidate. This allows
filesystem to just call generic_ci_d_revalidate and let it handle any
case-insensitive dentry (encrypted or not).
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
Changes since v2:
- Enable negative dentries of encrypted filesystems (Eric)
---
fs/libfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index ed04c4dcc312..44c02993adb4 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1513,7 +1513,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
}
}
}
- return 1;
+ return fscrypt_d_revalidate(dentry, flags);
}
static const struct dentry_operations generic_ci_dentry_ops = {
@@ -1533,7 +1533,7 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
- .d_revalidate = fscrypt_d_revalidate,
+ .d_revalidate_name = generic_ci_d_revalidate,
};
#endif
--
2.41.0
From: Gabriel Krisman Bertazi <[email protected]>
Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
Changes since v2:
- Move dentry flag set closer to fscrypt code (Eric)
---
fs/ext4/namei.c | 35 ++++-------------------------------
1 file changed, 4 insertions(+), 31 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0caf6c730ce3..b22194a83e1a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1759,6 +1759,10 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir,
err = ext4_fname_prepare_lookup(dir, dentry, &fname);
generic_set_encrypted_ci_d_ops(dentry);
+
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_set_casefold_lookup(dentry);
+
if (err == -ENOENT)
return NULL;
if (err)
@@ -1866,16 +1870,6 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
}
}
-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
- /* Eventually we want to call d_add_ci(dentry, NULL)
- * for negative dentries in the encoding case as
- * well. For now, prevent the negative dentry
- * from being cached.
- */
- return NULL;
- }
-#endif
return d_splice_alias(inode, dentry);
}
@@ -3206,17 +3200,6 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
ext4_fc_track_unlink(handle, dentry);
retval = ext4_mark_inode_dirty(handle, dir);
-#if IS_ENABLED(CONFIG_UNICODE)
- /* VFS negative dentries are incompatible with Encoding and
- * Case-insensitiveness. Eventually we'll want avoid
- * invalidating the dentries here, alongside with returning the
- * negative dentries at ext4_lookup(), when it is better
- * supported by the VFS for the CI case.
- */
- if (IS_CASEFOLDED(dir))
- d_invalidate(dentry);
-#endif
-
end_rmdir:
brelse(bh);
if (handle)
@@ -3317,16 +3300,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
goto out_trace;
retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
- /* VFS negative dentries are incompatible with Encoding and
- * Case-insensitiveness. Eventually we'll want avoid
- * invalidating the dentries here, alongside with returning the
- * negative dentries at ext4_lookup(), when it is better
- * supported by the VFS for the CI case.
- */
- if (IS_CASEFOLDED(dir))
- d_invalidate(dentry);
-#endif
out_trace:
trace_ext4_unlink_exit(dentry, retval);
--
2.41.0
From: Gabriel Krisman Bertazi <[email protected]>
Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
Changes since v2:
- Move dentry flag set closer to fscrypt code (Eric)
---
fs/f2fs/namei.c | 25 ++++---------------------
1 file changed, 4 insertions(+), 21 deletions(-)
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index bee0568888da..fef8e2e77f75 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -533,6 +533,10 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
err = f2fs_prepare_lookup(dir, dentry, &fname);
generic_set_encrypted_ci_d_ops(dentry);
+
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_set_casefold_lookup(dentry);
+
if (err == -ENOENT)
goto out_splice;
if (err)
@@ -578,17 +582,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
goto out_iput;
}
out_splice:
-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
- /* Eventually we want to call d_add_ci(dentry, NULL)
- * for negative dentries in the encoding case as
- * well. For now, prevent the negative dentry
- * from being cached.
- */
- trace_f2fs_lookup_end(dir, dentry, ino, err);
- return NULL;
- }
-#endif
new = d_splice_alias(inode, dentry);
trace_f2fs_lookup_end(dir, !IS_ERR_OR_NULL(new) ? new : dentry,
ino, IS_ERR(new) ? PTR_ERR(new) : err);
@@ -641,16 +634,6 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
f2fs_delete_entry(de, page, dir, inode);
f2fs_unlock_op(sbi);
-#if IS_ENABLED(CONFIG_UNICODE)
- /* VFS negative dentries are incompatible with Encoding and
- * Case-insensitiveness. Eventually we'll want avoid
- * invalidating the dentries here, alongside with returning the
- * negative dentries at f2fs_lookup(), when it is better
- * supported by the VFS for the CI case.
- */
- if (IS_CASEFOLDED(dir))
- d_invalidate(dentry);
-#endif
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
fail:
--
2.41.0
On Thu, Jul 27, 2023 at 01:28:36PM -0400, Gabriel Krisman Bertazi wrote:
> This is the v4 of the negative dentry support on case-insensitive
> directories. It doesn't have any functional changes from v1. It applies
> Eric's comments to bring the flags check closet together, improve the
> documentation and improve comments in the code. I also relooked at the
> locks to ensure the inode read lock is indeed enough in the lookup_slow
> path.
Al, Christian, any thoughts or preferences for how we should handle
this patch series? I'm willing to take it through the ext4 tree, but
since it has vfs, ext4, and f2fs changes (and the bulk of the changes
are in the vfs), perhaps it should go through the vfs tree?
Also, Christian, I notice one of the five VFS patches in the series
has your Reviewed-by tag, but not the others? Is that because you
haven't had a chance to make a final determination on those patches,
or you have outstanding comments still to be addressed?
Cheers,
- Ted
"Theodore Ts'o" <[email protected]> writes:
> On Thu, Jul 27, 2023 at 01:28:36PM -0400, Gabriel Krisman Bertazi wrote:
>> This is the v4 of the negative dentry support on case-insensitive
>> directories. It doesn't have any functional changes from v1. It applies
>> Eric's comments to bring the flags check closet together, improve the
>> documentation and improve comments in the code. I also relooked at the
>> locks to ensure the inode read lock is indeed enough in the lookup_slow
>> path.
>
> Al, Christian, any thoughts or preferences for how we should handle
> this patch series? I'm willing to take it through the ext4 tree, but
> since it has vfs, ext4, and f2fs changes (and the bulk of the changes
> are in the vfs), perhaps it should go through the vfs tree?
>
> Also, Christian, I notice one of the five VFS patches in the series
> has your Reviewed-by tag, but not the others? Is that because you
> haven't had a chance to make a final determination on those patches,
> or you have outstanding comments still to be addressed?
Hi Ted,
Thanks for helping push it forward!
I'm not sure if I missed Christian's tag in a previous iteration. I
looked through my archive and didn't find it. Unless I'm mistaken, I
don't think I have any r-b from him here yet.
--
Gabriel Krisman Bertazi
On Thu, Jul 27, 2023 at 02:39:55PM -0400, Gabriel Krisman Bertazi wrote:
> > Also, Christian, I notice one of the five VFS patches in the series
> > has your Reviewed-by tag, but not the others? Is that because you
> > haven't had a chance to make a final determination on those patches,
> > or you have outstanding comments still to be addressed?
>
> I'm not sure if I missed Christian's tag in a previous iteration. I
> looked through my archive and didn't find it. Unless I'm mistaken, I
> don't think I have any r-b from him here yet.
Ah, right. I looked back and I'm not sure why I thought he had signed
off one of them; I must have hallucinated it....
- Ted
> since it has vfs, ext4, and f2fs changes (and the bulk of the changes
> are in the vfs), perhaps it should go through the vfs tree?
I've just waited for Eric to finish his review. I'll take a look later
and will get it into -next for long soaking.
On Thu, Jul 27, 2023 at 01:28:36PM -0400, Gabriel Krisman Bertazi wrote:
> Hi,
>
> This is the v4 of the negative dentry support on case-insensitive
> directories. It doesn't have any functional changes from v1. It applies
> Eric's comments to bring the flags check closet together, improve the
I'd like to please have Acks/RVBs from at least Eric for this since he's
been diligently reviewing this.
On Thu, Jul 27, 2023 at 01:28:37PM -0400, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Negative dentries support on case-insensitive ext4/f2fs will require
> access to the name under lookup to ensure it matches the dentry. This
> adds an optional new flavor of cached dentry revalidation hook to expose
> this extra parameter.
>
> I'm fine with extending d_revalidate instead of adding a new hook, if
> it is considered cleaner and the approach is accepted. I wrote a new
> hook to simplify reviewing.
>
> Reviewed-by: Theodore Ts'o <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
> Changes since v2:
> - Document d_revalidate_name hook. (Eric)
> ---
> Documentation/filesystems/locking.rst | 3 +++
> Documentation/filesystems/vfs.rst | 12 ++++++++++++
> fs/dcache.c | 2 +-
> fs/namei.c | 23 ++++++++++++++---------
> include/linux/dcache.h | 1 +
> 5 files changed, 31 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index ed148919e11a..d68997ba6584 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -18,6 +18,8 @@ dentry_operations
> prototypes::
>
> int (*d_revalidate)(struct dentry *, unsigned int);
> + int (*d_revalidate_name)(struct dentry *, const struct qstr *,
> + unsigned int);
I think we should just extend d_revalidate(). You can't reasonably
implement d_revalidate() and d_revalidate_name() and then have the VFS
call both. That's just weird. Imho, it belongs into d_revalidate()
proper. Documentation should come with the same warning about handling
d_inode in so far as under some condition d_name can change under the
caller.
> int (*d_weak_revalidate)(struct dentry *, unsigned int);
> int (*d_hash)(const struct dentry *, struct qstr *);
> int (*d_compare)(const struct dentry *,
> @@ -37,6 +39,7 @@ locking rules:
> ops rename_lock ->d_lock may block rcu-walk
> ================== =========== ======== ============== ========
> d_revalidate: no no yes (ref-walk) maybe
> +d_revalidate_name: no no yes (ref-walk) maybe
> d_weak_revalidate: no no yes no
> d_hash no no no maybe
> d_compare: yes no no maybe
> diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
> index cb2a97e49872..34c842bd7cb2 100644
> --- a/Documentation/filesystems/vfs.rst
> +++ b/Documentation/filesystems/vfs.rst
> @@ -1252,6 +1252,8 @@ defined:
>
> struct dentry_operations {
> int (*d_revalidate)(struct dentry *, unsigned int);
> + int (*d_revalidate_name)(struct dentry *, const struct qstr *,
> + unsigned int);
> int (*d_weak_revalidate)(struct dentry *, unsigned int);
> int (*d_hash)(const struct dentry *, struct qstr *);
> int (*d_compare)(const struct dentry *,
> @@ -1288,6 +1290,16 @@ defined:
> return
> -ECHILD and it will be called again in ref-walk mode.
>
> +``d_revalidate_name``
> + Variant of d_revalidate that also provides the name under look-up. Most
> + filesystems will keep it as NULL, unless there are particular semantics
> + for filenames encoding that need to be handled during dentry
> + revalidation.
> +
> + When available, it is called in lieu of d_revalidate and has the same
> + locking rules and return semantics. Refer to d_revalidate for more
> + information.
> +
> ``d_weak_revalidate``
> called when the VFS needs to revalidate a "jumped" dentry. This
> is called when a path-walk ends at dentry that was not acquired
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 52e6d5fdab6b..98521862e58a 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1928,7 +1928,7 @@ void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
> dentry->d_flags |= DCACHE_OP_HASH;
> if (op->d_compare)
> dentry->d_flags |= DCACHE_OP_COMPARE;
> - if (op->d_revalidate)
> + if (op->d_revalidate || op->d_revalidate_name)
> dentry->d_flags |= DCACHE_OP_REVALIDATE;
> if (op->d_weak_revalidate)
> dentry->d_flags |= DCACHE_OP_WEAK_REVALIDATE;
> diff --git a/fs/namei.c b/fs/namei.c
> index e56ff39a79bc..84df0ddd20db 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -853,11 +853,16 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
> return false;
> }
>
> -static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
> +static inline int d_revalidate(struct dentry *dentry,
> + const struct qstr *name,
> + unsigned int flags)
> {
> - if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
> +
> + if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
> + if (dentry->d_op->d_revalidate_name)
> + return dentry->d_op->d_revalidate_name(dentry, name, flags);
> return dentry->d_op->d_revalidate(dentry, flags);
This whole sequence got me thinking.
If you create an ext4 filesystem with casefolding like:
mkfs.ext4 -F -E encoding=utf8 /dev/sdb
and then
mount -t ext4 /dev/sdb /mnt
mkdir /mnt/casefold
chattr +F /mnt/casefold
then you can mount overlayfs on the non-casefolded root dentry at /mnt:
(1) mount -t overlay overlay -o upperdir=/upper,workdir=/work,lowerdir=/mnt /opt
but you cannot mount overlayfs on the casefolded root dentry at
/mnt/casefolded:
(2) mount -t overlay overlay -o upperdir=/upper,workdir=/work,lowerdir=/mnt/casefold /opt
because overlayfs rejects the dentry in ovl_dentry_weird() because the
dentry will have DCACHE_OP_HASH set because casefold libfs helpers rely
on a custom dentry hash function.
In any case (1) shouldn't a problem per se as overlayfs will return
EREMOTE from lookup because ovl_dentry_weird() will also be called by
overlayfs during lookup. So it should be safe though I haven't spent a
lot of mental effort to figure out whether this can somehow be otherwise
used to trigger nonsensical behavior or potential bugs.
But this logic is predicated on DCACHE_OP_HASH. So if for some crazy
reason a filesystem were to implement ->d_revalidate_name() but didn't
also implement ->d_hash() we'd be hosed because overlayfs calls
->d_revalidate() directly.
And then there's ecryptfs which is happily mountable over casefolding
directories:
ubuntu@imp1-vm:~$ sudo mount -t ecryptfs /mnt/test/casefold-dir /opt
ubuntu@imp1-vm:/opt$ findmnt | grep opt
└─/opt /mnt/test/casefold-dir ecryptfs rw,relatime,ecryptfs_sig=8567ee2ae5880f2d,ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_unlink_sigs
So it doesn't even seem to care if the underlying filesytem uses a
custom dentry hash function which seems problematic (So unrelated to
this change someone should likely explain why that doesn't matter.).
Afaict with your series this will be even more broken because ecryptfs
and overlayfs call ->d_revalidate() directly.
So this suggests that really you want to extend ->d_revalidate() and we
should at least similar to overlayfs make ecryptfs reject being mounted
on casefolding directories and refuse lookup requests for casefolding
directories.
Ideally we'd explicitly reject by having such fses detect casefolding
unless it's really enough to reject based on DCACHE_OP_HASH.
Christian Brauner <[email protected]> writes:
>> -static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
>> +static inline int d_revalidate(struct dentry *dentry,
>> + const struct qstr *name,
>> + unsigned int flags)
>> {
>> - if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
>> +
>> + if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
>> + if (dentry->d_op->d_revalidate_name)
>> + return dentry->d_op->d_revalidate_name(dentry, name, flags);
>> return dentry->d_op->d_revalidate(dentry, flags);
>
> This whole sequence got me thinking.
>
...
> ubuntu@imp1-vm:~$ sudo mount -t ecryptfs /mnt/test/casefold-dir /opt
> ubuntu@imp1-vm:/opt$ findmnt | grep opt
> └─/opt /mnt/test/casefold-dir ecryptfs rw,relatime,ecryptfs_sig=8567ee2ae5880f2d,ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_unlink_sigs
That's interesting. I was aware of overlayfs and wanted to eventually
get it to work together with casefold, but never considered an ecryptfs combo.
> So it doesn't even seem to care if the underlying filesytem uses a
> custom dentry hash function which seems problematic (So unrelated to
> this change someone should likely explain why that doesn't matter.).
>
> Afaict with your series this will be even more broken because ecryptfs
> and overlayfs call ->d_revalidate() directly.
>
> So this suggests that really you want to extend ->d_revalidate() and we
> should at least similar to overlayfs make ecryptfs reject being mounted
> on casefolding directories and refuse lookup requests for casefolding
> directories.
>
> Ideally we'd explicitly reject by having such fses detect casefolding
> unless it's really enough to reject based on DCACHE_OP_HASH.
Thanks for finding this issue. I'll follow up with merging d_revalidate
and d_revalidate_name and adding a patch to explicitly reject
combinations of ecryptfs/overlayfs with casefolding filesystems, and
safeguard the lookup.
--
Gabriel Krisman Bertazi