2022-06-22 20:08:31

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 0/7] Support negative dentries on case-insensitive directories

Hi,

This survives every corner case I could think of for negative dentries
on case-insensitive filesystems, including those already tested by
fstests. I also ran sanity checks to show it uses the created negative
dentries, and I observed the expected performance increase of the
negative dentry cache hit.

* Background

Negative dentries have always been disabled in case-insensitive
directories because they don't provide enough assurance that every case
variation of a filename doesn't exist in a directory and because there
are known corner cases in file creation where negative dentries can't be
instantiated.

In the trivial case the upstream implementation already works for
negative dentries, even though it is disabled. That is: if the lookup
that caused the dentry to be created was done in a case-insensitive way,
the negative dentry can already be trusted, since it means that no valid
dcache entry exists, *and* that no variation of the file exists on
disk (since the lookup failed). A following lookup will then be executed
with case-insensitive-aware d_hash and d_lookup, it will find the right
negative dentry and can trust it. It has a creation problem, though,
discussed below.

The first problem appears when a case-insensitive directory has negative
dentries that were created when the directory was case-sensitive. A
further lookup would incorrectly trust it:

This sequence demonstrates the problem:

mkdir d
touch d/$1
touch d/$2
unlink d/$1 <- leaves negative dentry.
unlink d/$2 <- leaves negative dentry.
chattr +F d
touch d/$1 <- finds one of the negative dentries, makes it positive
< if d/$1 is d_drop somehow >
access d/$2 <- Might find the other negative dentry, get -ENOENT

There are actually a few problems here. The first is that a preexisting
negative dentry created during a case-sensitive lookup doesn't guarantee
that no other variation of the name exists. This is not a big problem
in the common case, since the directory has to be empty to be converted,
and the d_hash and d_compare are case-insensitive; which means they will
find the same dentry and reuse it most of the time (except for
invalidations and hash collisions). But it means that we are leaving
behind a stalled dentry that shouldn't be there.

The real problem happens if $1 and $2 are two strings where:
(i) casefold($1) == casefold($2)
(ii) hash($1) == hash($2) == hash(casefold($1))

This condition is the worst case. Both negative dentries can
potentially be found during a case-insensitive lookup if the wrong
dentry is invalidated.

In fact, this is a problem even on the current implementation. There
was a bug reported by Al in 2020 [1], where a directory might end up
with dangling negative dentries created during a case-sensitive lookup,
because when the +F attribute is set; even though that code requires an
empty directory, it doesn't check for negative dentries.

Condition (ii) is hard to test, but not impossible. But, even if it
is not present, we still leave negative dentries behind, which shouldn't
currently exist for a case-insensitive directory.

A completely different problem with negative dentries on
case-insensitive directories exist when turning a negative dentry to
positive. If the negative dentry has a different case than what is
currently being looked up, the dentry cannot be reused without changing
its name, because we guarantee filename-preserving semantics. We need
to either change the name or invalidate the dentry. This is currently
done in the upstream kernel by completely stopping negative dentries
from being created in the first place.

* Proposal

The proposed solution is to differentiate negative dentries created from
a case-insensitive context from those created during a case-sensitive
one via a new dentry flag, D_CASEFOLD_LOOKUP, set by the filesystem
during d_lookup. Since a negative dentry created during a
case-insensitive lookup can be trusted (except for the name-preserving
issue), we can check that flag during d_revalidate to quickly accept or
reject the negative dentry.

Another solution for that problem would be to guarantee that no negative
dentry exists during the Case-sensitive to case-insensitive directory
conversion (the other direction is safe). This has the following
problems:

1) It is not trivial to implement a race-free mechanism to ensure
negative dentries won't be recreated immediately after invalidation
while converting the directory.

2) The knowledge whether the negative dentry can be is
valid (i.e. comes from a case-insensitive lookup) is implicit on the
fact that we are correctly invalidating dentries when converting the
directory.

Having a D_CASEFOLD_LOOKUP avoids both issues, and seems to be a cheap
solution to explicitly decide whether to validate a negative dentry.

As explained, in order to maintain the filename preserving semantics, it
is not sufficient to reuse the dentry.

One solution would be to invalidate the negative dentry when it is
decided to turn it positive, instead of reusing it. I implemented that
in the past (2018) but my understanding is that we don't want to incur
costs on the VFS critical path for other filesystems who don't care
about case-insensitiveness. I think there is also a challenge in making
this invalidation race-free, but it might be simpler than I thought.

Instead, I'm suggesting that we only validate negative dentries in
casefold directories during lookups that will instantiate the dentry
when the lookup name is exactly what is cached.

* caveats

1) Encryption

Currently, negative dentries on encrypted directories are also disabled.
No semantic change on encrypted directories is intended in this
patchset; we just bypass the revalidation directly to fscrypt, for
positive dentries. I'm working on this case as future work.

2) revalidate the cached dentry using the name under lookup

This is strange for a cache. the new semantic is implemented on
d_revalidate() to stay out of the critical path of filesystems that
don't care about case-insensitive. But this requires the revalidation
hook to validate based on what name is under lookup, which is odd for a
cache.

* Tests

There are a few tests for the corner cases discussed above in
generic/556. They mainly verify the name-preserving semantics.
The invalidation when converting the directory is harder to test,
because it is hard to force the invalidation of specific cached dentries
that occlude a dangling invalid dentry. I tested it with forcing the
positive dentries to be removed, but I'm not sure how to write an
upstreamable test for it.

This also survives smoke test on ext4 and f2fs.

* patchset

Patch 1 introduces a new version of d_revalidate to provide the
filesystem with the name under lookup; Patch 2 introduces a new dentry
flag to mark dentries as created during a case-insensitive lookup; Patch
3 introduces a libfs helper to validate negative dentries on
case-insensitive directories; Patch 4 deals with encryption; Patch 5
cleans up the now redundant dentry operations for case-insensitive with
and without encryption; Finally, Patch 6 and 7 enable negative dentries
on case-insensitive directories for ext4 and f2fs, respectively.

Gabriel Krisman Bertazi (7):
fs: Expose name under lookup to d_revalidate hook
fs: Add DCACHE_CASEFOLD_LOOKUP flag
libfs: Validate negative dentries in case-insensitive directories
libfs: Support revalidation of encrypted case-insensitive dentries
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
ext4: Enable negative dentries on case-insensitive lookup
f2fs: Enable negative dentries on case-insensitive lookup

fs/dcache.c | 9 +++++-
fs/ext4/namei.c | 35 +++-----------------
fs/f2fs/namei.c | 23 ++------------
fs/libfs.c | 72 ++++++++++++++++++++++++------------------
fs/namei.c | 23 ++++++++------
include/linux/dcache.h | 9 ++++++
6 files changed, 78 insertions(+), 93 deletions(-)

--
2.36.1


2022-06-22 20:10:47

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 5/7] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops

Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
equivalent. Merge them together and simplify the setup code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/libfs.c | 44 +++++++++++++-------------------------------
1 file changed, 13 insertions(+), 31 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index e4da68ebd618..05f82e1a6f70 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1477,7 +1477,7 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
return fscrypt_d_revalidate(dentry, flags);
}

-static const struct dentry_operations generic_ci_dentry_ops = {
+static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
.d_revalidate_name = generic_ci_d_revalidate,
@@ -1490,26 +1490,20 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
};
#endif

-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
- .d_hash = generic_ci_d_hash,
- .d_compare = generic_ci_d_compare,
- .d_revalidate_name = generic_ci_d_revalidate,
-};
-#endif
-
/**
* generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
* @dentry: dentry to set ops on
*
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively. Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later. As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
+ * Casefolded directories need d_hash, d_compare and d_revalidate set, so
+ * that the dentries contained in them are handled case-insensitively,
+ * but implement support for fs_encryption. Note that these operations
+ * are needed on the parent directory rather than on the dentries in it,
+ * and while the casefolding flag can be toggled on and off on an empty
+ * directory, dentry_operations can't be changed later. As a result, if
+ * the filesystem has casefolding support enabled at all, we have to
+ * give all dentries the casefolding operations even if their inode
+ * doesn't have the casefolding flag currently (and thus the casefolding
+ * ops would be no-ops for now).
*
* Encryption works differently in that the only dentry operation it needs is
* d_revalidate, which it only needs on dentries that have the no-key name flag.
@@ -1522,29 +1516,17 @@ static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
*/
void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
{
-#ifdef CONFIG_FS_ENCRYPTION
- bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
#if IS_ENABLED(CONFIG_UNICODE)
- bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
- if (needs_encrypt_ops && needs_ci_ops) {
+ if (dentry->d_sb->s_encoding) {
d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
return;
}
#endif
#ifdef CONFIG_FS_ENCRYPTION
- if (needs_encrypt_ops) {
+ if (dentry->d_flags & DCACHE_NOKEY_NAME) {
d_set_d_op(dentry, &generic_encrypted_dentry_ops);
return;
}
#endif
-#if IS_ENABLED(CONFIG_UNICODE)
- if (needs_ci_ops) {
- d_set_d_op(dentry, &generic_ci_dentry_ops);
- return;
- }
-#endif
}
EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
--
2.36.1

2022-06-22 20:12:09

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 4/7] libfs: Support revalidation of encrypted case-insensitive dentries

Preserve the existing behavior for encrypted directories, by rejecting
negative dentries of encrypted+casefolded directories. This allows
generic_ci_d_revalidate to be used by filesystems with both features
enabled, as long as the directory is either casefolded or encrypted, but
not both at the same time.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/libfs.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index de43f3f585f1..e4da68ebd618 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1461,6 +1461,9 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
const struct inode *dir = READ_ONCE(parent->d_inode);

if (dir && needs_casefold(dir)) {
+ if (IS_ENCRYPTED(dir))
+ return 0;
+
if (!d_is_casefold_lookup(dentry))
return 0;

@@ -1470,7 +1473,8 @@ static inline int generic_ci_d_revalidate(struct dentry *dentry,
return 0;
}
}
- return 1;
+
+ return fscrypt_d_revalidate(dentry, flags);
}

static const struct dentry_operations generic_ci_dentry_ops = {
@@ -1490,7 +1494,7 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
- .d_revalidate = fscrypt_d_revalidate,
+ .d_revalidate_name = generic_ci_d_revalidate,
};
#endif

--
2.36.1

2022-06-22 20:12:34

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 2/7] fs: Add DCACHE_CASEFOLD_LOOKUP flag

This flag marks a negative or positive dentry as being created after a
case-insensitive lookup operation. It is useful to differentiate
dentries this way to detect whether the negative dentry can be trusted
during a case-insensitive lookup.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
fs/dcache.c | 7 +++++++
include/linux/dcache.h | 8 ++++++++
2 files changed, 15 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index a0fe9e3676fb..518ddb7fbe0c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1958,6 +1958,13 @@ void d_set_fallthru(struct dentry *dentry)
}
EXPORT_SYMBOL(d_set_fallthru);

+void d_set_casefold_lookup(struct dentry *dentry)
+{
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_CASEFOLD_LOOKUP;
+ spin_unlock(&dentry->d_lock);
+}
+
static unsigned d_flags_for_inode(struct inode *inode)
{
unsigned add_flags = DCACHE_REGULAR_TYPE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 871f65c8ef7f..8b71c5e418c2 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -208,6 +208,7 @@ struct dentry_operations {
#define DCACHE_FALLTHRU 0x01000000 /* Fall through to lower layer */
#define DCACHE_NOKEY_NAME 0x02000000 /* Encrypted name encoded without key */
#define DCACHE_OP_REAL 0x04000000
+#define DCACHE_CASEFOLD_LOOKUP 0x08000000 /* Dentry comes from a casefold directory */

#define DCACHE_PAR_LOOKUP 0x10000000 /* being looked up (with parent locked shared) */
#define DCACHE_DENTRY_CURSOR 0x20000000
@@ -497,6 +498,13 @@ static inline bool d_is_fallthru(const struct dentry *dentry)
return dentry->d_flags & DCACHE_FALLTHRU;
}

+extern void d_set_casefold_lookup(struct dentry *dentry);
+
+static inline bool d_is_casefold_lookup(const struct dentry *dentry)
+{
+ return dentry->d_flags & DCACHE_CASEFOLD_LOOKUP;
+}
+

extern int sysctl_vfs_cache_pressure;

--
2.36.1

2023-02-24 22:36:33

by Daniel Rosenberg

[permalink] [raw]
Subject: Re: [PATCH 0/7] Support negative dentries on case-insensitive directories

These look good to me. It will be nice to have negative dentries back for
casefolded directories.

-Daniel Rosenberg

2023-03-23 14:40:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/7] fs: Add DCACHE_CASEFOLD_LOOKUP flag

On Wed, Jun 22, 2022 at 03:45:58PM -0400, Gabriel Krisman Bertazi wrote:
> This flag marks a negative or positive dentry as being created after a
> case-insensitive lookup operation. It is useful to differentiate
> dentries this way to detect whether the negative dentry can be trusted
> during a case-insensitive lookup.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Reviewed-by: Theodore Ts'o <[email protected]>

2023-03-23 14:42:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 4/7] libfs: Support revalidation of encrypted case-insensitive dentries

On Wed, Jun 22, 2022 at 03:46:00PM -0400, Gabriel Krisman Bertazi wrote:
> Preserve the existing behavior for encrypted directories, by rejecting
> negative dentries of encrypted+casefolded directories. This allows
> generic_ci_d_revalidate to be used by filesystems with both features
> enabled, as long as the directory is either casefolded or encrypted, but
> not both at the same time.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Reviewed-by: Theodore Ts'o <[email protected]>

2023-03-23 14:42:38

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 5/7] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops

On Wed, Jun 22, 2022 at 03:46:01PM -0400, Gabriel Krisman Bertazi wrote:
> Now that casefold needs d_revalidate and calls fscrypt_d_revalidate
> itself, generic_encrypt_ci_dentry_ops and generic_ci_dentry_ops are now
> equivalent. Merge them together and simplify the setup code.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Reviewed-by: Theodore Ts'o <[email protected]>