2023-11-23 17:13:15

by Al Viro

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Thu, Nov 23, 2023 at 10:57:22AM -0500, Gabriel Krisman Bertazi wrote:
> Linus Torvalds <[email protected]> writes:
>
> > Side note: Gabriel, as things are now, instead of that
> >
> > if (!d_is_casefolded_name(dentry))
> > return 0;
> >
> > in generic_ci_d_revalidate(), I would suggest that any time a
> > directory is turned into a case-folded one, you'd just walk all the
> > dentries for that directory and invalidate negative ones at that
> > point. Or was there some reason I missed that made it a good idea to
> > do it at run-time after-the-fact?
> >
>
> The problem I found with that approach, which I originally tried, was
> preventing concurrent lookups from racing with the invalidation and
> creating more 'case-sensitive' negative dentries. Did I miss a way to
> synchronize with concurrent lookups of the children of the dentry? We
> can trivially ensure the dentry doesn't have positive children by
> holding the parent lock, but that doesn't protect from concurrent
> lookups creating negative dentries, as far as I understand.

AFAICS, there is a problem with dentries that never came through
->lookup(). Unless I'm completely misreading your code, your
generic_ci_d_revalidate() is not called for them. Ever.

Hash lookups are controlled by ->d_op of parent; that's where ->d_hash()
and ->d_compare() come from. Revalidate comes from *child*. You need
->d_op->d_revalidate of child dentry to be set to your generic_ci_d_revalidate().

The place where it gets set is generic_set_encrypted_ci_d_ops(). Look
at its callchain; in case of ext4 it gets called from ext4_lookup_dentry(),
which is called from ext4_lookup(). And dentry passed to it is the
argument of ->lookup().

Now take a look at open-by-fhandle stuff; all methods in there
(->fh_to_dentry(), ->fh_to_parent(), ->get_parent()) end up
returning d_obtain_alias(some inode).

We *do* call ->lookup(), all right - in reconnect_one(), while
trying to connect those suckers with the main tree. But the way
it works is that d_splice_alias() in ext4_lookup() moves the
existing alias for subdirectory, connecting it to the parent.
That's not the dentry ext4_lookup() had set ->d_op on - that's
the dentry that came from d_obtain_alias(). And those do not
have ->d_op set by anything in your tree.

That's the problem I'd been talking about - there is a class of situations
where the work done by ext4_lookup() to set the state of dentry gets
completely lost. After lookup you do have a dentry in the right place,
with the right name and inode, etc., but with NULL ->d_op->d_revalidate.


2023-11-23 17:38:09

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

Al Viro <[email protected]> writes:

> On Thu, Nov 23, 2023 at 10:57:22AM -0500, Gabriel Krisman Bertazi wrote:
>> Linus Torvalds <[email protected]> writes:
>>
>> > Side note: Gabriel, as things are now, instead of that
>> >
>> > if (!d_is_casefolded_name(dentry))
>> > return 0;
>> >
>> > in generic_ci_d_revalidate(), I would suggest that any time a
>> > directory is turned into a case-folded one, you'd just walk all the
>> > dentries for that directory and invalidate negative ones at that
>> > point. Or was there some reason I missed that made it a good idea to
>> > do it at run-time after-the-fact?
>> >
>>
>> The problem I found with that approach, which I originally tried, was
>> preventing concurrent lookups from racing with the invalidation and
>> creating more 'case-sensitive' negative dentries. Did I miss a way to
>> synchronize with concurrent lookups of the children of the dentry? We
>> can trivially ensure the dentry doesn't have positive children by
>> holding the parent lock, but that doesn't protect from concurrent
>> lookups creating negative dentries, as far as I understand.
>
> AFAICS, there is a problem with dentries that never came through
> ->lookup(). Unless I'm completely misreading your code, your
> generic_ci_d_revalidate() is not called for them. Ever.
>
> Hash lookups are controlled by ->d_op of parent; that's where ->d_hash()
> and ->d_compare() come from. Revalidate comes from *child*. You need
> ->d_op->d_revalidate of child dentry to be set to your generic_ci_d_revalidate().
>
> The place where it gets set is generic_set_encrypted_ci_d_ops(). Look
> at its callchain; in case of ext4 it gets called from ext4_lookup_dentry(),
> which is called from ext4_lookup(). And dentry passed to it is the
> argument of ->lookup().
>
> Now take a look at open-by-fhandle stuff; all methods in there
> (->fh_to_dentry(), ->fh_to_parent(), ->get_parent()) end up
> returning d_obtain_alias(some inode).
>
> We *do* call ->lookup(), all right - in reconnect_one(), while
> trying to connect those suckers with the main tree. But the way
> it works is that d_splice_alias() in ext4_lookup() moves the
> existing alias for subdirectory, connecting it to the parent.
> That's not the dentry ext4_lookup() had set ->d_op on - that's
> the dentry that came from d_obtain_alias(). And those do not
> have ->d_op set by anything in your tree.
>
> That's the problem I'd been talking about - there is a class of situations
> where the work done by ext4_lookup() to set the state of dentry gets
> completely lost. After lookup you do have a dentry in the right place,
> with the right name and inode, etc., but with NULL
> ->d_op->d_revalidate.

I get the problem now. I admit to not understanding all the details yet,
which is why I haven't answered directly, but I understand already how
it can get borked. I'm studying your explanation.

Originally, ->d_op could be propagated trivially since we had sb->s_d_op
set, which would be set by __d_alloc, but that is no longer the case
since we combined fscrypt and CI support.

What I still don't understand is why we shouldn't fixup ->d_op when
calling d_obtain_alias (before __d_instantiate_anon) and you say we
better do it in d_splice_alias. The ->d_op is going to be the same
across the filesystem when the casefold feature is enabled, regardless
if the directory is casefolded. If we set it there, the alias already
has the right d_op from the start.

--
Gabriel Krisman Bertazi