From: Gabriel Krisman Bertazi <[email protected]>
Introduce a dentry revalidation helper to be used by case-insensitive
filesystems to check if it is safe to reuse a negative dentry.
A negative dentry is safe to be reused on a case-insensitive lookup if
it was created during a case-insensitive lookup and this is not a lookup
that will instantiate a dentry. If this is a creation lookup, we also
need to make sure the name matches sensitively the name under lookup in
order to assure the name preserving semantics.
dentry->d_name is only checked by the case-insensitive d_revalidate hook
in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
d_revalidate is always called with the parent inode read-locked, and
therefore the name cannot change from under us.
d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
lookup_open and lookup_fast:
- lookup_dcache always calls it with zeroed flags, with the exception
of when coming from __lookup_hash, which needs the parent locked
already, for instance in the open/creation path, which is locked in
open_last_lookups.
- In __lookup_slow, either the parent inode is read locked by the
caller (lookup_slow), or it is called with no flags (lookup_one*).
The read lock suffices to prevent ->d_name modifications, with the
exception of one case: __d_unalias, will call __d_move to fix a
directory accessible from multiple dentries, which effectively swaps
->d_name while holding only the shared read lock. This happens
through this flow:
lookup_slow() //LOOKUP_CREATE
d_lookup()
->d_lookup()
d_splice_alias()
__d_unalias()
__d_move()
Nevertheless, this case is not a problem because negative dentries
are not allowed to be moved with __d_move.
- lookup_open also requires the parent to be locked in the creation
case, which is done in open_last_lookups.
- lookup_fast will indeed be called with the parent unlocked, but it
shouldn't be called with LOOKUP_CREATE. Either it is called in the
link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in
open_last_lookups. But, in this case, it also never has LOOKUP_CREATE,
because it is only called on the !O_CREAT case, which means op->intent
doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is
set).
Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the
parents inodes are also locked.
Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
Changes since v3:
- Add comment regarding creation (Eric)
- Reorder checks to clarify !flags meaning (Eric)
- Add commit message explanaton of the inode read lock wrt.
__d_move. (Eric)
Changes since v2:
- Add comments to all rejection cases (Eric)
- safeguard against filesystem creating dentries without LOOKUP flags
---
fs/libfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/fs/libfs.c b/fs/libfs.c
index 5b851315eeed..ed04c4dcc312 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1462,9 +1462,64 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
return 0;
}
+static inline int generic_ci_d_revalidate(struct dentry *dentry,
+ const struct qstr *name,
+ unsigned int flags)
+{
+ if (d_is_negative(dentry)) {
+ const struct dentry *parent = READ_ONCE(dentry->d_parent);
+ const struct inode *dir = READ_ONCE(parent->d_inode);
+
+ if (dir && needs_casefold(dir)) {
+ /*
+ * Negative dentries created prior to turning the
+ * directory case-insensitive cannot be trusted, since
+ * they don't ensure any possible case version of the
+ * filename doesn't exist.
+ */
+ if (!d_is_casefold_lookup(dentry))
+ return 0;
+
+ /*
+ * Filesystems will call into d_revalidate without
+ * setting LOOKUP_ flags even for file creation (see
+ * lookup_one* variants). Reject negative dentries in
+ * this case, since we can't know for sure it won't be
+ * used for creation.
+ */
+ if (!flags)
+ return 0;
+
+ /*
+ * If the lookup is for creation, then a negative dentry
+ * can only be reused if it's a case-sensitive match,
+ * not just a case-insensitive one. This is needed to
+ * make the new file be created with the name the user
+ * specified, preserving case.
+ */
+ if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
+ /*
+ * ->d_name won't change from under us in the
+ * creation path only, since d_revalidate during
+ * creation and renames is always called with
+ * the parent inode locked. It isn't the case
+ * for all lookup callpaths, so ->d_name must
+ * not be touched outside
+ * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
+ */
+ if (dentry->d_name.len != name->len ||
+ memcmp(dentry->d_name.name, name->name, name->len))
+ return 0;
+ }
+ }
+ }
+ return 1;
+}
+
static const struct dentry_operations generic_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
+ .d_revalidate_name = generic_ci_d_revalidate,
};
#endif
--
2.41.0
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <[email protected]>
>
> Introduce a dentry revalidation helper to be used by case-insensitive
> filesystems to check if it is safe to reuse a negative dentry.
>
> A negative dentry is safe to be reused on a case-insensitive lookup if
> it was created during a case-insensitive lookup and this is not a lookup
> that will instantiate a dentry. If this is a creation lookup, we also
> need to make sure the name matches sensitively the name under lookup in
> order to assure the name preserving semantics.
>
> dentry->d_name is only checked by the case-insensitive d_revalidate hook
> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
> d_revalidate is always called with the parent inode read-locked, and
> therefore the name cannot change from under us.
>
> d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
> lookup_open and lookup_fast:
>
> - lookup_dcache always calls it with zeroed flags, with the exception
> of when coming from __lookup_hash, which needs the parent locked
> already, for instance in the open/creation path, which is locked in
> open_last_lookups.
>
> - In __lookup_slow, either the parent inode is read locked by the
> caller (lookup_slow), or it is called with no flags (lookup_one*).
> The read lock suffices to prevent ->d_name modifications, with the
> exception of one case: __d_unalias, will call __d_move to fix a
> directory accessible from multiple dentries, which effectively swaps
> ->d_name while holding only the shared read lock. This happens
> through this flow:
>
> lookup_slow() //LOOKUP_CREATE
> d_lookup()
> ->d_lookup()
> d_splice_alias()
> __d_unalias()
> __d_move()
>
> Nevertheless, this case is not a problem because negative dentries
> are not allowed to be moved with __d_move.
>
> - lookup_open also requires the parent to be locked in the creation
> case, which is done in open_last_lookups.
>
> - lookup_fast will indeed be called with the parent unlocked, but it
> shouldn't be called with LOOKUP_CREATE. Either it is called in the
> link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in
> open_last_lookups. But, in this case, it also never has LOOKUP_CREATE,
> because it is only called on the !O_CREAT case, which means op->intent
> doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is
> set).
>
> Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the
> parents inodes are also locked.
>
> Reviewed-by: Theodore Ts'o <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
>
> ---
> Changes since v3:
> - Add comment regarding creation (Eric)
> - Reorder checks to clarify !flags meaning (Eric)
> - Add commit message explanaton of the inode read lock wrt.
> __d_move. (Eric)
> Changes since v2:
> - Add comments to all rejection cases (Eric)
> - safeguard against filesystem creating dentries without LOOKUP flags
> ---
> fs/libfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 5b851315eeed..ed04c4dcc312 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1462,9 +1462,64 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
> return 0;
> }
>
> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
> + const struct qstr *name,
> + unsigned int flags)
> +{
> + if (d_is_negative(dentry)) {
> + const struct dentry *parent = READ_ONCE(dentry->d_parent);
> + const struct inode *dir = READ_ONCE(parent->d_inode);
> +
> + if (dir && needs_casefold(dir)) {
> + /*
> + * Negative dentries created prior to turning the
> + * directory case-insensitive cannot be trusted, since
> + * they don't ensure any possible case version of the
> + * filename doesn't exist.
> + */
> + if (!d_is_casefold_lookup(dentry))
> + return 0;
> +
> + /*
> + * Filesystems will call into d_revalidate without
> + * setting LOOKUP_ flags even for file creation (see
> + * lookup_one* variants). Reject negative dentries in
> + * this case, since we can't know for sure it won't be
> + * used for creation.
> + */
> + if (!flags)
> + return 0;
> +
> + /*
> + * If the lookup is for creation, then a negative dentry
> + * can only be reused if it's a case-sensitive match,
> + * not just a case-insensitive one. This is needed to
> + * make the new file be created with the name the user
> + * specified, preserving case.
> + */
> + if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> + /*
> + * ->d_name won't change from under us in the
> + * creation path only, since d_revalidate during
> + * creation and renames is always called with
> + * the parent inode locked. It isn't the case
> + * for all lookup callpaths, so ->d_name must
> + * not be touched outside
> + * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> + */
> + if (dentry->d_name.len != name->len ||
> + memcmp(dentry->d_name.name, name->name, name->len))
> + return 0;
> + }
> + }
> + }
> + return 1;
> +}
> +
> static const struct dentry_operations generic_ci_dentry_ops = {
> .d_hash = generic_ci_d_hash,
> .d_compare = generic_ci_d_compare,
> + .d_revalidate_name = generic_ci_d_revalidate,
> };
> #endif
Wouldn't it make sense to get rid of all this indentation?
const struct dentry *parent;
const struct inode *dir;
if (!d_is_negative(dentry))
return 1;
parent = READ_ONCE(dentry->d_parent);
dir = READ_ONCE(parent->d_inode);
if (!dir)
return 1;
if (!needs_casefold(dir))
return 1;
/*
* Negative dentries created prior to turning the
* directory case-insensitive cannot be trusted, since
* they don't ensure any possible case version of the
* filename doesn't exist.
*/
if (!d_is_casefold_lookup(dentry))
return 0;
/*
* Filesystems will call into d_revalidate without
* setting LOOKUP_ flags even for file creation (see
* lookup_one* variants). Reject negative dentries in
* this case, since we can't know for sure it won't be
* used for creation.
*/
if (!flags)
return 0;
/*
* If the lookup is for creation, then a negative dentry
* can only be reused if it's a case-sensitive match,
* not just a case-insensitive one. This is needed to
* make the new file be created with the name the user
* specified, preserving case.
*/
if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
/*
* ->d_name won't change from under us in the
* creation path only, since d_revalidate during
* creation and renames is always called with
* the parent inode locked. It isn't the case
* for all lookup callpaths, so ->d_name must
* not be touched outside
* (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
*/
if (dentry->d_name.len != name->len ||
memcmp(dentry->d_name.name, name->name, name->len))
return 0;
}
return 1;
Christian Brauner <[email protected]> writes:
> On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>
> Wouldn't it make sense to get rid of all this indentation?
I'm ok with making this change. I'll wait for more reviews and Eric
before sending a new version with this done.
Thanks!
--
Gabriel Krisman Bertazi
On Fri, Jul 28, 2023 at 11:09:46AM -0400, Gabriel Krisman Bertazi wrote:
> Christian Brauner <[email protected]> writes:
>
> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>
> >
> > Wouldn't it make sense to get rid of all this indentation?
>
> I'm ok with making this change. I'll wait for more reviews and Eric
> before sending a new version with this done.
>
> Thanks!
>
Well, the issue is that with patch 4, all the 'return 1;' would need to change
to 'return fscrypt_d_revalidate(dentry, flags);'.
A helper function could be used, though, if you prefer:
static int generic_ci_d_revalidate(struct dentry *dentry,
const struct qstr *name, unsigned int flags)
{
if (!ci_d_revalidate(dentry, name, flags))
return 0;
return fscrypt_d_revalidate(dentry, flags);
}
- Eric
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> - In __lookup_slow, either the parent inode is read locked by the
> caller (lookup_slow), or it is called with no flags (lookup_one*).
> The read lock suffices to prevent ->d_name modifications, with the
> exception of one case: __d_unalias, will call __d_move to fix a
> directory accessible from multiple dentries, which effectively swaps
> ->d_name while holding only the shared read lock. This happens
> through this flow:
>
> lookup_slow() //LOOKUP_CREATE
> d_lookup()
> ->d_lookup()
> d_splice_alias()
> __d_unalias()
> __d_move()
>
> Nevertheless, this case is not a problem because negative dentries
> are not allowed to be moved with __d_move.
Isn't it possible for a negative dentry to become a positive one concurrently?
- Eric
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> dentry->d_name is only checked by the case-insensitive d_revalidate hook
> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
> d_revalidate is always called with the parent inode read-locked, and
> therefore the name cannot change from under us.
"at least read-locked"? Or do you actually mean write-locked?
> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
> + const struct qstr *name,
> + unsigned int flags)
No need for inline here.
- Eric
Eric Biggers <[email protected]> writes:
> On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>> dentry->d_name is only checked by the case-insensitive d_revalidate hook
>> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
>> d_revalidate is always called with the parent inode read-locked, and
>> therefore the name cannot change from under us.
>
> "at least read-locked"? Or do you actually mean write-locked?
No. I mean read-locked, as in holding the read-part of the inode lock.
This is the case for lookup_slow, which is safe, despite the d_add_ci
case we discussed in the previous iteration. I'll reword to say "at
least read-locked and mention it is the case in lookup_slow".
>> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
>> + const struct qstr *name,
>> + unsigned int flags)
>
> No need for inline here.
sorry, I missed the inline from your previuos review. Will fix it up
for this one.
>
> - Eric
--
Gabriel Krisman Bertazi
Eric Biggers <[email protected]> writes:
> On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>> - In __lookup_slow, either the parent inode is read locked by the
>> caller (lookup_slow), or it is called with no flags (lookup_one*).
>> The read lock suffices to prevent ->d_name modifications, with the
>> exception of one case: __d_unalias, will call __d_move to fix a
>> directory accessible from multiple dentries, which effectively swaps
>> ->d_name while holding only the shared read lock. This happens
>> through this flow:
>>
>> lookup_slow() //LOOKUP_CREATE
>> d_lookup()
>> ->d_lookup()
>> d_splice_alias()
>> __d_unalias()
>> __d_move()
>>
>> Nevertheless, this case is not a problem because negative dentries
>> are not allowed to be moved with __d_move.
>
> Isn't it possible for a negative dentry to become a positive one concurrently?
Do you mean d_splice_alias racing with a dentry instantiation and
__d_move being called on a negative dentry that is turning positive?
It is not possible for __d_move to be called with a negative dentry for
d_splice_alias, since the inode->i_lock is locked during __d_find_alias,
so it can't race with __d_instantiate or d_add. Then, __d_find_alias
can't find negative dentries in the first place, so we either have a
positive dentry, in which case __d_move is fine with regard to
d_revalidate_name, or we don't have any aliases and don't call
__d_move.
Can you clarify what problem you see here?
--
Gabriel Krisman Bertazi
On Thu, Aug 03, 2023 at 01:37:45PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <[email protected]> writes:
>
> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> >> - In __lookup_slow, either the parent inode is read locked by the
> >> caller (lookup_slow), or it is called with no flags (lookup_one*).
> >> The read lock suffices to prevent ->d_name modifications, with the
> >> exception of one case: __d_unalias, will call __d_move to fix a
> >> directory accessible from multiple dentries, which effectively swaps
> >> ->d_name while holding only the shared read lock. This happens
> >> through this flow:
> >>
> >> lookup_slow() //LOOKUP_CREATE
> >> d_lookup()
> >> ->d_lookup()
> >> d_splice_alias()
> >> __d_unalias()
> >> __d_move()
> >>
> >> Nevertheless, this case is not a problem because negative dentries
> >> are not allowed to be moved with __d_move.
> >
> > Isn't it possible for a negative dentry to become a positive one concurrently?
>
> Do you mean d_splice_alias racing with a dentry instantiation and
> __d_move being called on a negative dentry that is turning positive?
>
> It is not possible for __d_move to be called with a negative dentry for
> d_splice_alias, since the inode->i_lock is locked during __d_find_alias,
> so it can't race with __d_instantiate or d_add. Then, __d_find_alias
> can't find negative dentries in the first place, so we either have a
> positive dentry, in which case __d_move is fine with regard to
> d_revalidate_name, or we don't have any aliases and don't call
> __d_move.
>
> Can you clarify what problem you see here?
>
I agree that negative dentries can't be moved --- I pointed this out earlier
(https://lore.kernel.org/linux-fsdevel/[email protected]).
The question is whether if ->d_revalidate sees a negative dentry, when can it
assume that it remains a negative dentry for the remainder of ->d_revalidate.
I'm not sure there is a problem, I just don't understand your explanation.
- Eric
Eric Biggers <[email protected]> writes:
> On Thu, Aug 03, 2023 at 01:37:45PM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <[email protected]> writes:
>>
>> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>> >> - In __lookup_slow, either the parent inode is read locked by the
>> >> caller (lookup_slow), or it is called with no flags (lookup_one*).
>> >> The read lock suffices to prevent ->d_name modifications, with the
>> >> exception of one case: __d_unalias, will call __d_move to fix a
>> >> directory accessible from multiple dentries, which effectively swaps
>> >> ->d_name while holding only the shared read lock. This happens
>> >> through this flow:
>> >>
>> >> lookup_slow() //LOOKUP_CREATE
>> >> d_lookup()
>> >> ->d_lookup()
>> >> d_splice_alias()
>> >> __d_unalias()
>> >> __d_move()
>> >>
>> >> Nevertheless, this case is not a problem because negative dentries
>> >> are not allowed to be moved with __d_move.
>> >
>> > Isn't it possible for a negative dentry to become a positive one concurrently?
>>
>> Do you mean d_splice_alias racing with a dentry instantiation and
>> __d_move being called on a negative dentry that is turning positive?
>>
>> It is not possible for __d_move to be called with a negative dentry for
>> d_splice_alias, since the inode->i_lock is locked during __d_find_alias,
>> so it can't race with __d_instantiate or d_add. Then, __d_find_alias
>> can't find negative dentries in the first place, so we either have a
>> positive dentry, in which case __d_move is fine with regard to
>> d_revalidate_name, or we don't have any aliases and don't call
>> __d_move.
>>
>> Can you clarify what problem you see here?
>>
>
> I agree that negative dentries can't be moved --- I pointed this out earlier
> (https://lore.kernel.org/linux-fsdevel/[email protected]).
> The question is whether if ->d_revalidate sees a negative dentry, when can it
> assume that it remains a negative dentry for the remainder of ->d_revalidate.
> I'm not sure there is a problem, I just don't understand your
> explanation.
I see. Thanks for clarifying, as I had previously misunderstood your
point.
So, first of all, if d_revalidate itself is not a creation, it doesn't
matter, because we won't touch ->d_name. We might invalidate a valid
dentry, but that is ok. The problem would be limited to d_revalidate
being on the creation path, where the parent (read-)lock is held. The
problem would be doing the memcmp(), while the dentry is turned positive
(d_instantiate), while someone else moves the name.
For the dentry to be turned positive during a d_revalidate, it would
then have to race with d_add or with d_instantiate. d_add shouldn't be
possible since we are holding the parent inode lock (at least
read-side), which will serialize file creation.
From my understanding of the code, d_instantiate also can't race with
d_revalidate for the same reason - is also serialized by the parent
inode lock, which is acquired in filename_create. At least for all paths
in ext4/f2fs. In fact, I'm failing to find a case where the lock is not
taken when instantiating a dentry, but I'm unsure if this is a guarantee
or just an artifact of the code.
It seems to be safe in the current code, but I don't know if it is a
guarantee. Can anyone comment on this?
--
Gabriel Krisman Bertazi