2023-11-23 18:24:55

by Al Viro

Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

On Thu, Nov 23, 2023 at 12:37:43PM -0500, Gabriel Krisman Bertazi wrote:
> > That's the problem I'd been talking about - there is a class of situations
> > where the work done by ext4_lookup() to set the state of dentry gets
> > completely lost. After lookup you do have a dentry in the right place,
> > with the right name and inode, etc., but with NULL
> > ->d_op->d_revalidate.
>
> I get the problem now. I admit to not understanding all the details yet,
> which is why I haven't answered directly, but I understand already how
> it can get borked. I'm studying your explanation.
>
> Originally, ->d_op could be propagated trivially since we had sb->s_d_op
> set, which would be set by __d_alloc, but that is no longer the case
> since we combined fscrypt and CI support.
>
> What I still don't understand is why we shouldn't fixup ->d_op when
> calling d_obtain_alias (before __d_instantiate_anon) and you say we
> better do it in d_splice_alias. The ->d_op is going to be the same
> across the filesystem when the casefold feature is enabled, regardless
> if the directory is casefolded. If we set it there, the alias already
> has the right d_op from the start.

*blink*

A paragraph above you've said that it's not constant over the entire
filesystem.

Look, it's really simple - any setup work of that sort done in ->lookup()
is either misplaced, or should be somehow transferred over to the alias
if one gets picked.

As for d_obtain_alias()... AFAICS, it's far more limited in what information
it could access. It knows the inode, but it has no idea about the parent
to be.

The more I look at that, the more it feels like we need a method that would
tell the filesystem that this dentry is about to be spliced here. 9p is
another place where it would obviously simplify the things; ocfs2 'attach
lock' stuff is another case where the things get much more complicated
by having to do that stuff after splicing, etc.

It's not even hard to do:

1. turn bool exchange in __d_move() arguments into 3-value thing - move,
exchange or splice. Have the callers in d_splice_alias() and __d_unalias()
pass "splice" instead of false (aka normal move).

2. make __d_move() return an int (normally 0)

3. if asked to splice and if there's target->d_op->d_transfer(), let
__d_move() call it right after
	spin_lock_nested(&dentry->d_lock, 2);
	spin_lock_nested(&target->d_lock, 3);
in there. Passing it target and dentry, obviously. In unlikely case
of getting a non-zero returned by the method, undo locks and return
that value to __d_move() caller.

4. d_move() and d_exchange() would ignore the value returned by __d_move();
__d_unalias() turn
	__d_move(alias, dentry, false);
	ret = 0;
into
	ret = __d_move(alias, dentry, Splice);
d_splice_alias() turn
	__d_move(new, dentry, false);
	write_sequnlock(&rename_lock);
into
	err = __d_move(new, dentry, Splice);
	write_sequnlock(&rename_lock);
	if (unlikely(err)) {
		dput(new);
		new = ERR_PTR(err);
	}
(actually, dput()-on-error part would be common to all 3 branches
in there, so it would probably get pulled out of that if-else if-else).
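
Steps 1-4 can be sketched as a small userspace model. This is an illustration only, not kernel code: the enum names (D_MODE_*), the ->d_transfer() signature, the "locked" flag standing in for ->d_lock, and the example hooks are all assumptions made for the sketch, and the parent locking and the move itself are not modeled.

```c
/* Userspace model only; these are NOT the kernel's definitions. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement for the "bool exchange" argument (step 1). */
enum d_move_mode { D_MODE_MOVE, D_MODE_EXCHANGE, D_MODE_SPLICE };

struct dentry;

struct dentry_operations {
	int (*d_revalidate)(struct dentry *dentry);
	/* Hypothetical method from step 3; the signature is a guess. */
	int (*d_transfer)(struct dentry *target, struct dentry *dentry);
};

struct dentry {
	const struct dentry_operations *d_op;
	int locked;		/* stands in for ->d_lock */
};

/* Models steps 2 and 3: returns 0, or whatever ->d_transfer() returned. */
int model_d_move(struct dentry *dentry, struct dentry *target,
		 enum d_move_mode mode)
{
	dentry->locked = 1;	/* spin_lock_nested(&dentry->d_lock, 2) */
	target->locked = 1;	/* spin_lock_nested(&target->d_lock, 3) */

	if (mode == D_MODE_SPLICE && target->d_op && target->d_op->d_transfer) {
		int err = target->d_op->d_transfer(target, dentry);

		if (err) {
			/* undo locks and report the failure to the caller */
			target->locked = 0;
			dentry->locked = 0;
			return err;
		}
	}

	/* ... the actual move or exchange is not modeled ... */

	target->locked = 0;
	dentry->locked = 0;
	return 0;
}

/* Example filesystem hook: hand the state set up by ->lookup() to the alias. */
int example_d_transfer(struct dentry *target, struct dentry *dentry)
{
	dentry->d_op = target->d_op;
	return 0;
}

/* Example hook that always fails, to exercise the error path. */
int failing_d_transfer(struct dentry *target, struct dentry *dentry)
{
	(void)target;
	(void)dentry;
	return -5;	/* arbitrary non-zero value */
}

const struct dentry_operations example_ops = {
	.d_transfer = example_d_transfer,
};

const struct dentry_operations failing_ops = {
	.d_transfer = failing_d_transfer,
};

/* Exercises the three interesting cases; returns 0 when all hold. */
int model_selftest(void)
{
	struct dentry alias = { NULL, 0 };
	struct dentry target = { &example_ops, 0 };
	struct dentry bad = { &failing_ops, 0 };

	/* A plain move never calls the hook, so the alias keeps NULL ->d_op. */
	assert(model_d_move(&alias, &target, D_MODE_MOVE) == 0);
	assert(alias.d_op == NULL);

	/* A splice transfers the state from the lookup dentry to the alias. */
	assert(model_d_move(&alias, &target, D_MODE_SPLICE) == 0);
	assert(alias.d_op == &example_ops);

	/* A failing hook propagates its value and leaves both unlocked. */
	alias.d_op = NULL;
	assert(model_d_move(&alias, &bad, D_MODE_SPLICE) == -5);
	assert(alias.locked == 0 && bad.locked == 0);
	return 0;
}
```

In this model the alias is the first argument and the dentry that went through ->lookup() is the target, matching the __d_move(new, dentry, ...) call shape quoted in step 4.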

I can cook a patch doing that (and convert the obvious beneficiaries already
in the tree to it) and throw it into dcache branch - just need to massage
the series in there for repost...

PS: note, BTW, that fscrypt folks have already placed a hook into
__d_move(), exactly for the case of splice; I wonder if that would be
foldable into the same mechanism - haven't looked at the details yet.


2023-11-23 19:07:07

by Gabriel Krisman Bertazi

Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

Al Viro writes:

> On Thu, Nov 23, 2023 at 12:37:43PM -0500, Gabriel Krisman Bertazi wrote:
>> > That's the problem I'd been talking about - there is a class of situations
>> > where the work done by ext4_lookup() to set the state of dentry gets
>> > completely lost. After lookup you do have a dentry in the right place,
>> > with the right name and inode, etc., but with NULL
>> > ->d_op->d_revalidate.
>>
>> I get the problem now. I admit to not understanding all the details yet,
>> which is why I haven't answered directly, but I understand already how
>> it can get borked. I'm studying your explanation.
>>
>> Originally, ->d_op could be propagated trivially since we had sb->s_d_op
>> set, which would be set by __d_alloc, but that is no longer the case
>> since we combined fscrypt and CI support.
>>
>> What I still don't understand is why we shouldn't fixup ->d_op when
>> calling d_obtain_alias (before __d_instantiate_anon) and you say we
>> better do it in d_splice_alias. The ->d_op is going to be the same
>> across the filesystem when the casefold feature is enabled, regardless
>> if the directory is casefolded. If we set it there, the alias already
>> has the right d_op from the start.
>
> *blink*
>
> A paragraph above you've said that it's not constant over the entire
> filesystem.

The same ->d_op is used by every dentry in the filesystem if the superblock
has the casefold bit enabled, regardless of whether a specific inode is
casefolded or not. See generic_set_encrypted_ci_d_ops in my tree. It is
called unconditionally by ext4_lookup and only checks the superblock:

void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
{
	if (dentry->d_sb->s_encoding) {
		d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
		return;
	}
	...

What I meant was that this used to be set once at sb->s_d_op, and
propagated during dentry allocation. Therefore, the propagation to the
alias would happen inside __d_alloc. Once we enabled fscrypt and
casefold to work together, sb->s_d_op is NULL and we always set the same
handler for every dentry during lookup.
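
The difference between the two schemes can be sketched in a small userspace model. It is only an illustration under assumptions: the structures are cut down to the fields discussed here, and model_d_alloc(), model_set_ci_d_ops() and model_ci_ops are hypothetical stand-ins for the allocation step, the lookup-time helper quoted above, and the shared operations table.

```c
/* Userspace model only; these are NOT the kernel's definitions. */
#include <assert.h>
#include <stddef.h>

struct dentry_operations { int unused; };

struct super_block {
	const struct dentry_operations *s_d_op;	/* default for new dentries */
	const void *s_encoding;			/* non-NULL: casefold enabled */
};

struct dentry {
	struct super_block *d_sb;
	const struct dentry_operations *d_op;
};

const struct dentry_operations model_ci_ops = { 0 };

/* Models the allocation step: a new dentry inherits sb->s_d_op. */
void model_d_alloc(struct dentry *dentry, struct super_block *sb)
{
	dentry->d_sb = sb;
	dentry->d_op = sb->s_d_op;
}

/* Models the lookup-time helper: only the superblock is consulted. */
void model_set_ci_d_ops(struct dentry *dentry)
{
	if (dentry->d_sb->s_encoding)
		dentry->d_op = &model_ci_ops;
}

/* Compares both schemes; returns 0 when every expectation holds. */
int model_propagation_selftest(void)
{
	static const char encoding[] = "utf8";
	struct super_block old_sb = { &model_ci_ops, encoding };
	struct super_block new_sb = { NULL, encoding };
	struct dentry anon, looked_up;

	/* Old scheme: s_d_op is set, so even an anonymous alias gets the ops. */
	model_d_alloc(&anon, &old_sb);
	assert(anon.d_op == &model_ci_ops);

	/* Current scheme: a dentry that went through lookup gets the ops... */
	model_d_alloc(&looked_up, &new_sb);
	model_set_ci_d_ops(&looked_up);
	assert(looked_up.d_op == &model_ci_ops);

	/* ...but an alias allocated outside lookup is left with NULL ->d_op. */
	model_d_alloc(&anon, &new_sb);
	assert(anon.d_op == NULL);
	return 0;
}
```

The last case is the one under discussion: with s_d_op NULL, a dentry that never passes through lookup has nothing to inherit at allocation time.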

> Look, it's really simple - any setup work of that sort done in ->lookup()
> is either misplaced, or should be somehow transferred over to the alias
> if one gets picked.
>
> As for d_obtain_alias()... AFAICS, it's far more limited in what information
> it could access. It knows the inode, but it has no idea about the parent
> to be.

Since it has the inode, d_obtain_alias has the superblock. I think that's all
we need for generic_set_encrypted_ci_d_ops.

> The more I look at that, the more it feels like we need a method that would
> tell the filesystem that this dentry is about to be spliced here. 9p is
> another place where it would obviously simplify the things; ocfs2 'attach
> lock' stuff is another case where the things get much more complicated
> by having to do that stuff after splicing, etc.
>
> It's not even hard to do:
>
> 1. turn bool exchange in __d_move() arguments into 3-value thing - move,
> exchange or splice. Have the callers in d_splice_alias() and __d_unalias()
> pass "splice" instead of false (aka normal move).
>
> 2. make __d_move() return an int (normally 0)
>
> 3. if asked to splice and if there's target->d_op->d_transfer(), let
> __d_move() call it right after
> 	spin_lock_nested(&dentry->d_lock, 2);
> 	spin_lock_nested(&target->d_lock, 3);
> in there. Passing it target and dentry, obviously. In unlikely case
> of getting a non-zero returned by the method, undo locks and return
> that value to __d_move() caller.
>
> 4. d_move() and d_exchange() would ignore the value returned by __d_move();
> __d_unalias() turn
> 	__d_move(alias, dentry, false);
> 	ret = 0;
> into
> 	ret = __d_move(alias, dentry, Splice);
> d_splice_alias() turn
> 	__d_move(new, dentry, false);
> 	write_sequnlock(&rename_lock);
> into
> 	err = __d_move(new, dentry, Splice);
> 	write_sequnlock(&rename_lock);
> 	if (unlikely(err)) {
> 		dput(new);
> 		new = ERR_PTR(err);
> 	}
> (actually, dput()-on-error part would be common to all 3 branches
> in there, so it would probably get pulled out of that if-else if-else).
>
> I can cook a patch doing that (and convert the obvious beneficiaries already
> in the tree to it) and throw it into dcache branch - just need to massage
> the series in there for repost...

If you can write that, I'll definitely appreciate it. It will surely
take me much longer to figure it out myself.

--
Gabriel Krisman Bertazi