Hi,
This is the v2 of the negative dentry support on case-insensitive directories.
It doesn't have any functional changes from v1, but it adds more context and a
comment to the dentry->d_name access I'm doing in d_revalidate, documenting
why (i understand) it is safe to do it without protecting from the parallell
directory changes.
Please, let me know if the documentation is sufficient or if I'm missing some
case.
Retested with xfstests for ext4 and f2fs.
--
cover letter from v1.
This patchset enables negative dentries for case-insensitive directories
in ext4/f2fs. It solves the corner cases for this feature, including
those already tested by fstests (generic/556). It also solves an
existing bug with the existing implementation where old negative
dentries are left behind after a directory conversion to
case-insensitive.
Testing-wise, I ran sanity checks to show it properly uses the created
negative dentries, observed the expected performance increase of the
dentry cache hit, and showed it survives the quick group in fstests on
both f2fs and ext4 without regressions.
* Background
Negative dentries have always been disabled in case-insensitive
directories because, in their current form, they can't provide enough
assurances that all the case variations of a filename won't exist in a
directory, and the name-preserving case-insenstive semantics
during file creation prevents some negative dentries from being
instantiated unmodified.
Nevertheless, for the general case, the existing implementation would
already work with negative dentries, even though they are fully
disabled. That is: if the original lookup that created the dentry was
done in a case-insensitive way, the negative dentry can usually be
validated, since it assures that no other dcache entry exists, *and*
that no variation of the file exists on disk (since the lookup
failed). A following lookup would then be executed with the
case-insensitive-aware d_hash and d_lookup, which would find the right
negative dentry and use it.
The first corner case arises when a case-insensitive directory has
negative dentries that were created before the directory was flipped to
case-insensitive. A directory must be empty to be converted, but it
doesn't mean the directory doesn't have negative dentry children. If
that happens, the dangling dentries left behind can't assure that no
case-variation of the name exists. They only mean the exact name
doesn't exist. A further lookup would incorrectly validate them.
The code below demonstrates the problem. In this example $1 and $2 are
two strings, where:
(i) $1 != $2
(ii) casefold($1) == casefold($2)
(iii) hash($1) == hash($2) == hash(casefold($1))
Then, the following sequence could potentially return a ENOENT, even
though the case-insensitive lookup should exist:
mkdir d <- Case-sensitive directory
touch d/$1
touch d/$2
unlink d/$1 <- leaves negative dentry behind.
unlink d/$2 <- leaves *another* negative dentry behind.
chattr +F d <- make 'd' case-insensitive.
touch d/$1 <- Both negative dentries could match. finds one of them,
and instantiate
access d/$1 <- Find the other negative dentry, get -ENOENT.
In fact, this is a problem even on the current implementation, where
negative dentries for CI are disabled. There was a bug reported by Al
Viro in 2020, where a directory might end up with dangling negative
dentries created during a case-sensitive lookup, because they existed
before the +F attribute was set.
It is hard to trigger the issue, because condition (iii) is hard to test
on an unmodified kernel. By hacking the kernel to force the hash
collision, there are a few ways we can trigger this bizarre behavior in
case-insensitive directories through the insertion of negative dentries.
Another problem exists when turning a negative dentry to positive. If
the negative dentry has a different case than what is currently being
used for lookup, the dentry cannot be reused without changing its name,
in order to guarantee filename-preserving semantics to userspace. We
need to either change the name or invalidate the dentry. This issue is
currently avoided in mainline, since the negative dentry mechanism is
disabled.
* Proposal
The main idea is to differentiate negative dentries created in a
case-insensitive context from those created during a case-sensitive
lookup via a new dentry flag, D_CASEFOLD_LOOKUP, set by the filesystem
the d_lookup hook. Since the former can be used (except for the
name-preserving issue), d_revalidate will just check the flag to
quickly accept or reject the dentry.
A different solution would be to guarantee no negative dentry exists
during the case-sensitive to case-insensitive directory conversion (the
other direction is safe). It has the following problems:
1) It is not trivial to implement a race-free mechanism to ensure
negative dentries won't be recreated immediately after invalidation
while converting the directory.
2) The knowledge whether the negative dentry is valid (i.e. comes from
a case-insensitive lookup) is implicit on the fact that we are
correctly invalidating dentries when converting the directory.
Having a D_CASEFOLD_LOOKUP avoids both issues, and seems to be a cheap
solution to the problem.
But, as explained above, due to the filename preserving semantics, we
cannot just validate based on D_CASEFOLD_LOOKUP.
For that, one solution would be to invalidate the negative dentry when
it is decided to turn it positive, instead of reusing it. I implemented
that in the past (2018) but Al Viro made it clear we don't want to incur
costs on the VFS critical path for filesystems who don't care about
case-insensitiveness.
Instead, this patch invalidates negative dentries in casefold
directories in d_revalidate during creation lookups, iff the lookup name
is not exactly what is cached. Other kinds of lookups wouldn't need
this limitation.
* caveats
1) Encryption
Negative dentries on case-insensitive encrypted directories are also
disabled. No semantic change for them is intended in
this patchset; we just bypass the revalidation directly to fscrypt, for
positive dentries. Encryption support is future work.
2) revalidate the cached dentry using the name under lookup
Validating based on the lookup name is strange for a cache. the new
semantic is implemented by d_revalidate, to stay out of the critical
path of filesystems who don't care about case-insensitiveness, as much
as possible. The only change is the addition of a new flavor of
d_revalidate.
* Tests
There are a tests in place for most of the corner cases in generic/556.
They mainly verify the name-preserving semantics. The invalidation when
converting the directory is harder to test, because it is hard to force
the invalidation of specific cached dentries that occlude a dangling
invalid dentry. I tested it with forcing the positive dentries to be
removed, but I'm not sure how to write an upstreamable test.
It also survives fstests quick group regression testing on both ext4 and
f2fs.
* Performance
The latency of lookups of non-existing files is obviously improved, as
would be expected. The following numbers compare the execution time of 10^6
lookups of a non-existing file in a case-insensitive directory
pre-populated with 100k files in ext4.
Without the patch: 10.363s / 0.349s / 9.920s (real/user/sys)
With the patch: 1.752s / 0.276s / 1.472s (real/user/sys)
* patchset
Patch 1 introduces a new flavor of d_revalidate to provide the
filesystem with the name under lookup; Patch 2 introduces the new flag
to signal the dentry creation context; Patch 3 introduces a libfs helper
to revalidate negative dentries on case-insensitive directories; Patch 4
deals with encryption; Patch 5 cleans up the now redundant dentry
operations for case-insensitive with and without encryption; Finally,
Patch 6 and 7 enable support on case-insensitive directories
for ext4 and f2fs, respectively.
Gabriel Krisman Bertazi (7):
fs: Expose name under lookup to d_revalidate hook
fs: Add DCACHE_CASEFOLD_LOOKUP flag
libfs: Validate negative dentries in case-insensitive directories
libfs: Support revalidation of encrypted case-insensitive dentries
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
ext4: Enable negative dentries on case-insensitive lookup
f2fs: Enable negative dentries on case-insensitive lookup
fs/dcache.c | 10 +++++-
fs/ext4/namei.c | 34 ++----------------
fs/f2fs/namei.c | 23 ++----------
fs/libfs.c | 82 ++++++++++++++++++++++++++----------------
fs/namei.c | 23 +++++++-----
include/linux/dcache.h | 9 +++++
6 files changed, 88 insertions(+), 93 deletions(-)
--
2.40.0
From: Gabriel Krisman Bertazi <[email protected]>
Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
equivalent. Merge them together and simplify the setup code.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/libfs.c | 44 +++++++++++++-------------------------------
1 file changed, 13 insertions(+), 31 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index 0886044db593..348ec6130198 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1504,7 +1504,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
return fscrypt_d_revalidate(dentry, flags);
}
-static const struct dentry_operations generic_ci_dentry_ops = {
+static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
.d_revalidate_name = generic_ci_d_revalidate,
@@ -1517,26 +1517,20 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
};
#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
- .d_hash = generic_ci_d_hash,
- .d_compare = generic_ci_d_compare,
- .d_revalidate_name = generic_ci_d_revalidate,
-};
-#endif
-
/**
* generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
* @dentry: dentry to set ops on
*
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively. Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later. As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
+ * Casefolded directories need d_hash, d_compare and d_revalidate set, so
+ * that the dentries contained in them are handled case-insensitively,
+ * but implement support for fs_encryption. Note that these operations
+ * are needed on the parent directory rather than on the dentries in it,
+ * and while the casefolding flag can be toggled on and off on an empty
+ * directory, dentry_operations can't be changed later. As a result, if
+ * the filesystem has casefolding support enabled at all, we have to
+ * give all dentries the casefolding operations even if their inode
+ * doesn't have the casefolding flag currently (and thus the casefolding
+ * ops would be no-ops for now).
*
* Encryption works differently in that the only dentry operation it needs is
* d_revalidate, which it only needs on dentries that have the no-key name flag.
@@ -1549,30 +1543,18 @@ static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
*/
void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
{
-#ifdef CONFIG_FS_ENCRYPTION
- bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
#if IS_ENABLED(CONFIG_UNICODE)
- bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
- if (needs_encrypt_ops && needs_ci_ops) {
+ if (dentry->d_sb->s_encoding) {
d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
return;
}
#endif
#ifdef CONFIG_FS_ENCRYPTION
- if (needs_encrypt_ops) {
+ if (dentry->d_flags & DCACHE_NOKEY_NAME) {
d_set_d_op(dentry, &generic_encrypted_dentry_ops);
return;
}
#endif
-#if IS_ENABLED(CONFIG_UNICODE)
- if (needs_ci_ops) {
- d_set_d_op(dentry, &generic_ci_dentry_ops);
- return;
- }
-#endif
}
EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
--
2.40.0
From: Gabriel Krisman Bertazi <[email protected]>
Preserve the existing behavior for encrypted directories, by rejecting
negative dentries of encrypted+casefolded directories. This allows
generic_ci_d_revalidate to be used by filesystems with both features
enabled, as long as the directory is either casefolded or encrypted, but
not both at the same time.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/libfs.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index f8881e29c5d5..0886044db593 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1478,6 +1478,9 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
const struct inode *dir = READ_ONCE(parent->d_inode);
if (dir && needs_casefold(dir)) {
+ if (IS_ENCRYPTED(dir))
+ return 0;
+
if (!d_is_casefold_lookup(dentry))
return 0;
@@ -1497,7 +1500,8 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
}
}
}
- return 1;
+
+ return fscrypt_d_revalidate(dentry, flags);
}
static const struct dentry_operations generic_ci_dentry_ops = {
@@ -1517,7 +1521,7 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
- .d_revalidate = fscrypt_d_revalidate,
+ .d_revalidate_name = generic_ci_d_revalidate,
};
#endif
--
2.40.0
From: Gabriel Krisman Bertazi <[email protected]>
This flag marks a negative or positive dentry as being created after a
case-insensitive lookup operation. It is useful to differentiate
dentries this way to detect whether the negative dentry can be trusted
during a case-insensitive lookup.
Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/dcache.c | 8 ++++++++
include/linux/dcache.h | 8 ++++++++
2 files changed, 16 insertions(+)
diff --git a/fs/dcache.c b/fs/dcache.c
index 98521862e58a..05e4c7019e17 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1958,6 +1958,14 @@ void d_set_fallthru(struct dentry *dentry)
}
EXPORT_SYMBOL(d_set_fallthru);
+void d_set_casefold_lookup(struct dentry *dentry)
+{
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_CASEFOLD_LOOKUP;
+ spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL(d_set_casefold_lookup);
+
static unsigned d_flags_for_inode(struct inode *inode)
{
unsigned add_flags = DCACHE_REGULAR_TYPE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b6188f2e8950..457345123cb6 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -209,6 +209,7 @@ struct dentry_operations {
#define DCACHE_FALLTHRU 0x01000000 /* Fall through to lower layer */
#define DCACHE_NOKEY_NAME 0x02000000 /* Encrypted name encoded without key */
#define DCACHE_OP_REAL 0x04000000
+#define DCACHE_CASEFOLD_LOOKUP 0x08000000 /* Dentry comes from a casefold directory */
#define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with parent locked shared) */
#define DCACHE_DENTRY_CURSOR 0x20000000
@@ -497,6 +498,13 @@ static inline bool d_is_fallthru(const struct dentry *dentry)
return dentry->d_flags & DCACHE_FALLTHRU;
}
+extern void d_set_casefold_lookup(struct dentry *dentry);
+
+static inline bool d_is_casefold_lookup(const struct dentry *dentry)
+{
+ return dentry->d_flags & DCACHE_CASEFOLD_LOOKUP;
+}
+
extern int sysctl_vfs_cache_pressure;
--
2.40.0
From: Gabriel Krisman Bertazi <[email protected]>
Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/ext4/namei.c | 34 +++-------------------------------
1 file changed, 3 insertions(+), 31 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a5010b5b8a8c..35f87f7141fe 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1850,16 +1850,9 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, unsi
}
}
-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
- /* Eventually we want to call d_add_ci(dentry, NULL)
- * for negative dentries in the encoding case as
- * well. For now, prevent the negative dentry
- * from being cached.
- */
- return NULL;
- }
-#endif
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_set_casefold_lookup(dentry);
+
return d_splice_alias(inode, dentry);
}
@@ -3185,17 +3178,6 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
ext4_fc_track_unlink(handle, dentry);
retval = ext4_mark_inode_dirty(handle, dir);
-#if IS_ENABLED(CONFIG_UNICODE)
- /* VFS negative dentries are incompatible with Encoding and
- * Case-insensitiveness. Eventually we'll want avoid
- * invalidating the dentries here, alongside with returning the
- * negative dentries at ext4_lookup(), when it is better
- * supported by the VFS for the CI case.
- */
- if (IS_CASEFOLDED(dir))
- d_invalidate(dentry);
-#endif
-
end_rmdir:
brelse(bh);
if (handle)
@@ -3296,16 +3278,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
goto out_trace;
retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry);
-#if IS_ENABLED(CONFIG_UNICODE)
- /* VFS negative dentries are incompatible with Encoding and
- * Case-insensitiveness. Eventually we'll want avoid
- * invalidating the dentries here, alongside with returning the
- * negative dentries at ext4_lookup(), when it is better
- * supported by the VFS for the CI case.
- */
- if (IS_CASEFOLDED(dir))
- d_invalidate(dentry);
-#endif
out_trace:
trace_ext4_unlink_exit(dentry, retval);
--
2.40.0
From: Gabriel Krisman Bertazi <[email protected]>
Instead of invalidating negative dentries during case-insensitive
lookups, mark them as such and let them be added to the dcache.
d_ci_revalidate is able to properly filter them out if necessary based
on the dentry casefold flag.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/f2fs/namei.c | 23 ++---------------------
1 file changed, 2 insertions(+), 21 deletions(-)
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 11fc4c8036a9..57ca7ea86509 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -564,17 +564,8 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
goto out_iput;
}
out_splice:
-#if IS_ENABLED(CONFIG_UNICODE)
- if (!inode && IS_CASEFOLDED(dir)) {
- /* Eventually we want to call d_add_ci(dentry, NULL)
- * for negative dentries in the encoding case as
- * well. For now, prevent the negative dentry
- * from being cached.
- */
- trace_f2fs_lookup_end(dir, dentry, ino, err);
- return NULL;
- }
-#endif
+ if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+ d_set_casefold_lookup(dentry);
new = d_splice_alias(inode, dentry);
err = PTR_ERR_OR_ZERO(new);
trace_f2fs_lookup_end(dir, dentry, ino, !new ? -ENOENT : err);
@@ -627,16 +618,6 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
f2fs_delete_entry(de, page, dir, inode);
f2fs_unlock_op(sbi);
-#if IS_ENABLED(CONFIG_UNICODE)
- /* VFS negative dentries are incompatible with Encoding and
- * Case-insensitiveness. Eventually we'll want avoid
- * invalidating the dentries here, alongside with returning the
- * negative dentries at f2fs_lookup(), when it is better
- * supported by the VFS for the CI case.
- */
- if (IS_CASEFOLDED(dir))
- d_invalidate(dentry);
-#endif
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
fail:
--
2.40.0
Gabriel Krisman Bertazi <[email protected]> writes:
> Hi,
>
> This is the v2 of the negative dentry support on case-insensitive directories.
> It doesn't have any functional changes from v1, but it adds more context and a
> comment to the dentry->d_name access I'm doing in d_revalidate, documenting
> why (i understand) it is safe to do it without protecting from the parallell
> directory changes.
>
> Please, let me know if the documentation is sufficient or if I'm missing some
> case.
Hi Al, Christian,
Wanted to ping about this and see if we can get a review from the vfs
side to get it merged.
Thank you!
>
> Retested with xfstests for ext4 and f2fs.
>
> --
> cover letter from v1.
>
> This patchset enables negative dentries for case-insensitive directories
> in ext4/f2fs. It solves the corner cases for this feature, including
> those already tested by fstests (generic/556). It also solves an
> existing bug with the existing implementation where old negative
> dentries are left behind after a directory conversion to
> case-insensitive.
>
> Testing-wise, I ran sanity checks to show it properly uses the created
> negative dentries, observed the expected performance increase of the
> dentry cache hit, and showed it survives the quick group in fstests on
> both f2fs and ext4 without regressions.
>
> * Background
>
> Negative dentries have always been disabled in case-insensitive
> directories because, in their current form, they can't provide enough
> assurances that all the case variations of a filename won't exist in a
> directory, and the name-preserving case-insenstive semantics
> during file creation prevents some negative dentries from being
> instantiated unmodified.
>
> Nevertheless, for the general case, the existing implementation would
> already work with negative dentries, even though they are fully
> disabled. That is: if the original lookup that created the dentry was
> done in a case-insensitive way, the negative dentry can usually be
> validated, since it assures that no other dcache entry exists, *and*
> that no variation of the file exists on disk (since the lookup
> failed). A following lookup would then be executed with the
> case-insensitive-aware d_hash and d_lookup, which would find the right
> negative dentry and use it.
>
> The first corner case arises when a case-insensitive directory has
> negative dentries that were created before the directory was flipped to
> case-insensitive. A directory must be empty to be converted, but it
> doesn't mean the directory doesn't have negative dentry children. If
> that happens, the dangling dentries left behind can't assure that no
> case-variation of the name exists. They only mean the exact name
> doesn't exist. A further lookup would incorrectly validate them.
>
> The code below demonstrates the problem. In this example $1 and $2 are
> two strings, where:
>
> (i) $1 != $2
> (ii) casefold($1) == casefold($2)
> (iii) hash($1) == hash($2) == hash(casefold($1))
>
> Then, the following sequence could potentially return a ENOENT, even
> though the case-insensitive lookup should exist:
>
> mkdir d <- Case-sensitive directory
> touch d/$1
> touch d/$2
> unlink d/$1 <- leaves negative dentry behind.
> unlink d/$2 <- leaves *another* negative dentry behind.
> chattr +F d <- make 'd' case-insensitive.
> touch d/$1 <- Both negative dentries could match. finds one of them,
> and instantiate
> access d/$1 <- Find the other negative dentry, get -ENOENT.
>
> In fact, this is a problem even on the current implementation, where
> negative dentries for CI are disabled. There was a bug reported by Al
> Viro in 2020, where a directory might end up with dangling negative
> dentries created during a case-sensitive lookup, because they existed
> before the +F attribute was set.
>
> It is hard to trigger the issue, because condition (iii) is hard to test
> on an unmodified kernel. By hacking the kernel to force the hash
> collision, there are a few ways we can trigger this bizarre behavior in
> case-insensitive directories through the insertion of negative dentries.
>
> Another problem exists when turning a negative dentry to positive. If
> the negative dentry has a different case than what is currently being
> used for lookup, the dentry cannot be reused without changing its name,
> in order to guarantee filename-preserving semantics to userspace. We
> need to either change the name or invalidate the dentry. This issue is
> currently avoided in mainline, since the negative dentry mechanism is
> disabled.
>
> * Proposal
>
> The main idea is to differentiate negative dentries created in a
> case-insensitive context from those created during a case-sensitive
> lookup via a new dentry flag, D_CASEFOLD_LOOKUP, set by the filesystem
> the d_lookup hook. Since the former can be used (except for the
> name-preserving issue), d_revalidate will just check the flag to
> quickly accept or reject the dentry.
>
> A different solution would be to guarantee no negative dentry exists
> during the case-sensitive to case-insensitive directory conversion (the
> other direction is safe). It has the following problems:
>
> 1) It is not trivial to implement a race-free mechanism to ensure
> negative dentries won't be recreated immediately after invalidation
> while converting the directory.
>
> 2) The knowledge whether the negative dentry is valid (i.e. comes from
> a case-insensitive lookup) is implicit on the fact that we are
> correctly invalidating dentries when converting the directory.
>
> Having a D_CASEFOLD_LOOKUP avoids both issues, and seems to be a cheap
> solution to the problem.
>
> But, as explained above, due to the filename preserving semantics, we
> cannot just validate based on D_CASEFOLD_LOOKUP.
>
> For that, one solution would be to invalidate the negative dentry when
> it is decided to turn it positive, instead of reusing it. I implemented
> that in the past (2018) but Al Viro made it clear we don't want to incur
> costs on the VFS critical path for filesystems who don't care about
> case-insensitiveness.
>
> Instead, this patch invalidates negative dentries in casefold
> directories in d_revalidate during creation lookups, iff the lookup name
> is not exactly what is cached. Other kinds of lookups wouldn't need
> this limitation.
>
> * caveats
>
> 1) Encryption
>
> Negative dentries on case-insensitive encrypted directories are also
> disabled. No semantic change for them is intended in
> this patchset; we just bypass the revalidation directly to fscrypt, for
> positive dentries. Encryption support is future work.
>
> 2) revalidate the cached dentry using the name under lookup
>
> Validating based on the lookup name is strange for a cache. the new
> semantic is implemented by d_revalidate, to stay out of the critical
> path of filesystems who don't care about case-insensitiveness, as much
> as possible. The only change is the addition of a new flavor of
> d_revalidate.
>
> * Tests
>
> There are a tests in place for most of the corner cases in generic/556.
> They mainly verify the name-preserving semantics. The invalidation when
> converting the directory is harder to test, because it is hard to force
> the invalidation of specific cached dentries that occlude a dangling
> invalid dentry. I tested it with forcing the positive dentries to be
> removed, but I'm not sure how to write an upstreamable test.
>
> It also survives fstests quick group regression testing on both ext4 and
> f2fs.
>
> * Performance
>
> The latency of lookups of non-existing files is obviously improved, as
> would be expected. The following numbers compare the execution time of 10^6
> lookups of a non-existing file in a case-insensitive directory
> pre-populated with 100k files in ext4.
>
> Without the patch: 10.363s / 0.349s / 9.920s (real/user/sys)
> With the patch: 1.752s / 0.276s / 1.472s (real/user/sys)
>
> * patchset
>
> Patch 1 introduces a new flavor of d_revalidate to provide the
> filesystem with the name under lookup; Patch 2 introduces the new flag
> to signal the dentry creation context; Patch 3 introduces a libfs helper
> to revalidate negative dentries on case-insensitive directories; Patch 4
> deals with encryption; Patch 5 cleans up the now redundant dentry
> operations for case-insensitive with and without encryption; Finally,
> Patch 6 and 7 enable support on case-insensitive directories
> for ext4 and f2fs, respectively.
>
> Gabriel Krisman Bertazi (7):
> fs: Expose name under lookup to d_revalidate hook
> fs: Add DCACHE_CASEFOLD_LOOKUP flag
> libfs: Validate negative dentries in case-insensitive directories
> libfs: Support revalidation of encrypted case-insensitive dentries
> libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
> ext4: Enable negative dentries on case-insensitive lookup
> f2fs: Enable negative dentries on case-insensitive lookup
>
> fs/dcache.c | 10 +++++-
> fs/ext4/namei.c | 34 ++----------------
> fs/f2fs/namei.c | 23 ++----------
> fs/libfs.c | 82 ++++++++++++++++++++++++++----------------
> fs/namei.c | 23 +++++++-----
> include/linux/dcache.h | 9 +++++
> 6 files changed, 88 insertions(+), 93 deletions(-)
--
Gabriel Krisman Bertazi
On Fri, Apr 21, 2023 at 08:03:07PM -0400, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Preserve the existing behavior for encrypted directories, by rejecting
> negative dentries of encrypted+casefolded directories. This allows
> generic_ci_d_revalidate to be used by filesystems with both features
> enabled, as long as the directory is either casefolded or encrypted, but
> not both at the same time.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/libfs.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index f8881e29c5d5..0886044db593 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1478,6 +1478,9 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
> const struct inode *dir = READ_ONCE(parent->d_inode);
>
> if (dir && needs_casefold(dir)) {
> + if (IS_ENCRYPTED(dir))
> + return 0;
> +
Why not allow negative dentries in case-insensitive encrypted directories?
I can't think any reason why it wouldn't just work.
- Eric
On Fri, Apr 21, 2023 at 08:03:08PM -0400, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
> itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
> equivalent. Merge them together and simplify the setup code.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/libfs.c | 44 +++++++++++++-------------------------------
> 1 file changed, 13 insertions(+), 31 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 0886044db593..348ec6130198 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1504,7 +1504,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
> return fscrypt_d_revalidate(dentry, flags);
> }
>
> -static const struct dentry_operations generic_ci_dentry_ops = {
> +static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
> .d_hash = generic_ci_d_hash,
> .d_compare = generic_ci_d_compare,
> .d_revalidate_name = generic_ci_d_revalidate,
> @@ -1517,26 +1517,20 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
> };
> #endif
>
> -#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
> -static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
> - .d_hash = generic_ci_d_hash,
> - .d_compare = generic_ci_d_compare,
> - .d_revalidate_name = generic_ci_d_revalidate,
> -};
> -#endif
> -
> /**
> * generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
> * @dentry: dentry to set ops on
> *
> - * Casefolded directories need d_hash and d_compare set, so that the dentries
> - * contained in them are handled case-insensitively. Note that these operations
> - * are needed on the parent directory rather than on the dentries in it, and
> - * while the casefolding flag can be toggled on and off on an empty directory,
> - * dentry_operations can't be changed later. As a result, if the filesystem has
> - * casefolding support enabled at all, we have to give all dentries the
> - * casefolding operations even if their inode doesn't have the casefolding flag
> - * currently (and thus the casefolding ops would be no-ops for now).
> + * Casefolded directories need d_hash, d_compare and d_revalidate set, so
> + * that the dentries contained in them are handled case-insensitively,
> + * but implement support for fs_encryption. Note that these operations
The part ", but implement support for fs_encryption" is confusing. It would be
clearer with that deleted, since encryption is covered by the next paragraph.
> + * are needed on the parent directory rather than on the dentries in it,
> + * and while the casefolding flag can be toggled on and off on an empty
> + * directory, dentry_operations can't be changed later. As a result, if
> + * the filesystem has casefolding support enabled at all, we have to
> + * give all dentries the casefolding operations even if their inode
> + * doesn't have the casefolding flag currently (and thus the casefolding
> + * ops would be no-ops for now).
> *
> * Encryption works differently in that the only dentry operation it needs is
> * d_revalidate, which it only needs on dentries that have the no-key name flag.
> * The no-key flag can't be set "later", so we don't have to worry about that.
> *
> * Finally, to maximize compatibility with overlayfs (which isn't compatible
> * with certain dentry operations) and to avoid taking an unnecessary
> * performance hit, we use custom dentry_operations for each possible
> * combination rather than always installing all operations.
When I wrote the last paragraph, I think I had in mind "each possible
combination of features". Now it's changing in meaning to "each possible
combination of operations". Maybe replace it with that to make it clearer?
- Eric
On Fri, Apr 21, 2023 at 08:03:05PM -0400, Gabriel Krisman Bertazi wrote:
> @@ -209,6 +209,7 @@ struct dentry_operations {
> #define DCACHE_FALLTHRU 0x01000000 /* Fall through to lower layer */
> #define DCACHE_NOKEY_NAME 0x02000000 /* Encrypted name encoded without key */
> #define DCACHE_OP_REAL 0x04000000
> +#define DCACHE_CASEFOLD_LOOKUP 0x08000000 /* Dentry comes from a casefold directory */
>
> #define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with parent locked shared) */
> #define DCACHE_DENTRY_CURSOR 0x20000000
The first time I read DCACHE_CASEFOLD_LOOKUP, I got the wrong impression, since
it uses _LOOKUP in a different way from DCACHE_PAR_LOOKUP. DCACHE_PAR_LOOKUP
uses it to mean "dentry is currently being looked up", while
DCACHE_CASEFOLD_LOOKUP uses it to mean "dentry *was* looked up".
Maybe DCACHE_CASEFOLDED_NAME would be more logical? That would follow
DCACHE_NOKEY_NAME. (Also CASEFOLD => CASEFOLDED would be logical, I think.)
- Eric
On Fri, Apr 21, 2023 at 08:03:10PM -0400, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Instead of invalidating negative dentries during case-insensitive
> lookups, mark them as such and let them be added to the dcache.
> d_ci_revalidate is able to properly filter them out if necessary based
> on the dentry casefold flag.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> fs/f2fs/namei.c | 23 ++---------------------
> 1 file changed, 2 insertions(+), 21 deletions(-)
>
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index 11fc4c8036a9..57ca7ea86509 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -564,17 +564,8 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
> goto out_iput;
> }
> out_splice:
> -#if IS_ENABLED(CONFIG_UNICODE)
> - if (!inode && IS_CASEFOLDED(dir)) {
> - /* Eventually we want to call d_add_ci(dentry, NULL)
> - * for negative dentries in the encoding case as
> - * well. For now, prevent the negative dentry
> - * from being cached.
> - */
> - trace_f2fs_lookup_end(dir, dentry, ino, err);
> - return NULL;
> - }
> -#endif
> + if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
> + d_set_casefold_lookup(dentry);
I wonder if a more consistent place for the above code would be earlier in
f2fs_lookup(), next to the call to generic_set_encrypted_ci_d_ops()? That's
where the dentry_operations are set. It's also next to f2fs_prepare_lookup()
which is where DCACHE_NOKEY_NAME gets set if needed.
- Eric
Eric Biggers <[email protected]> writes:
> On Fri, Apr 21, 2023 at 08:03:07PM -0400, Gabriel Krisman Bertazi wrote:
>> From: Gabriel Krisman Bertazi <[email protected]>
>>
>> Preserve the existing behavior for encrypted directories, by rejecting
>> negative dentries of encrypted+casefolded directories. This allows
>> generic_ci_d_revalidate to be used by filesystems with both features
>> enabled, as long as the directory is either casefolded or encrypted, but
>> not both at the same time.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>> ---
>> fs/libfs.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/libfs.c b/fs/libfs.c
>> index f8881e29c5d5..0886044db593 100644
>> --- a/fs/libfs.c
>> +++ b/fs/libfs.c
>> @@ -1478,6 +1478,9 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
>> const struct inode *dir = READ_ONCE(parent->d_inode);
>>
>> if (dir && needs_casefold(dir)) {
>> + if (IS_ENCRYPTED(dir))
>> + return 0;
>> +
>
> Why not allow negative dentries in case-insensitive encrypted directories?
> I can't think any reason why it wouldn't just work.
TBH, I'm not familiar with the details of combined encrypted+casefold
support to be confident it works. This patch preserves the current
behavior of disabling them for encrypted+casefold directories.
I suspect it might require extra work that I'm not focusing on this
patchset. For instance, what should be the order of
fscrypt_d_revalidate and the checks I'm adding here? Note we will start
creating negative dentries in casefold directories after patch 6/7, so
unless we disable it here, we will start calling fscrypt_d_revalidate
for negative+casefold.
Should I just drop this hunk? Unless you are confident it works as is, I
prefer to add this support in stages and keep negative dentries of
encrypted+casefold directories disabled for now.
--
Gabriel Krisman Bertazi
On Tue, Jul 18, 2023 at 03:34:13PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <[email protected]> writes:
>
> > On Fri, Apr 21, 2023 at 08:03:07PM -0400, Gabriel Krisman Bertazi wrote:
> >> From: Gabriel Krisman Bertazi <[email protected]>
> >>
> >> Preserve the existing behavior for encrypted directories, by rejecting
> >> negative dentries of encrypted+casefolded directories. This allows
> >> generic_ci_d_revalidate to be used by filesystems with both features
> >> enabled, as long as the directory is either casefolded or encrypted, but
> >> not both at the same time.
> >>
> >> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> >> ---
> >> fs/libfs.c | 8 ++++++--
> >> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/libfs.c b/fs/libfs.c
> >> index f8881e29c5d5..0886044db593 100644
> >> --- a/fs/libfs.c
> >> +++ b/fs/libfs.c
> >> @@ -1478,6 +1478,9 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
> >> const struct inode *dir = READ_ONCE(parent->d_inode);
> >>
> >> if (dir && needs_casefold(dir)) {
> >> + if (IS_ENCRYPTED(dir))
> >> + return 0;
> >> +
> >
> > Why not allow negative dentries in case-insensitive encrypted directories?
> > I can't think any reason why it wouldn't just work.
>
> TBH, I'm not familiar with the details of combined encrypted+casefold
> support to be confident it works.This patch preserves the current
> behavior of disabling them for encrypted+casefold directories.
Not allowing that combination reduces the usefulness of this patchset.
Note that Android's use of casefold is always combined with encryption.
> I suspect it might require extra work that I'm not focusing on this
> patchset. For instance, what should be the order of
> fscrypt_d_revalidate and the checks I'm adding here?
Why would order matter? If either "feature" wants the dentry to be invalidated,
then the dentry gets invalidated.
> Note we will start creating negative dentries in casefold directories after
> patch 6/7, so unless we disable it here, we will start calling
> fscrypt_d_revalidate for negative+casefold.
fscrypt_d_revalidate() only cares about the DCACHE_NOKEY_NAME flag, so that's
not a problem.
>
> Should I just drop this hunk? Unless you are confident it works as is, I
> prefer to add this support in stages and keep negative dentries of
> encrypted+casefold directories disabled for now.
Unless I'm missing something, I think you're overcomplicating it. It should
just work if you don't go out of your way to prohibit this case. I.e., just
don't add the IS_ENCRYPTED(dir) check to generic_ci_d_revalidate().
- Eric
Eric Biggers <[email protected]> writes:
> Why would order matter? If either "feature" wants the dentry to be invalidated,
> then the dentry gets invalidated.
For instance, I was wondering makes sense for instance to memcmp d_name for
!DCACHE_NOKEY_NAME or if we wanted fscrypt_d_revalidate to come first.
>> Note we will start creating negative dentries in casefold directories after
>> patch 6/7, so unless we disable it here, we will start calling
>> fscrypt_d_revalidate for negative+casefold.
>
> fscrypt_d_revalidate() only cares about the DCACHE_NOKEY_NAME flag, so that's
> not a problem.
..I see now it is the first thing checked in fscrypt_d_revalidate.
>> Should I just drop this hunk? Unless you are confident it works as is, I
>> prefer to add this support in stages and keep negative dentries of
>> encrypted+casefold directories disabled for now.
>
> Unless I'm missing something, I think you're overcomplicating it.
Not overcomplicating. I'm just not familiar with fscrypt details enough to be
sure I could enable it. But yes, it seems safe.
> It should
> just work if you don't go out of your way to prohibit this case. I.e., just
> don't add the IS_ENCRYPTED(dir) check to generic_ci_d_revalidate().
I'll drop the check. And resend.
Thanks,
--
Gabriel Krisman Bertazi